Re: big Packages.gz file
== Brian May [EMAIL PROTECTED] writes:

zhaoway == zhaoway [EMAIL PROTECTED] writes:
zhaoway This is only a small part of the whole story, IMHO. See
zhaoway my other email replying to you. ;)

Maybe there could be another version of Packages.gz without the extended descriptions -- I imagine they would take something like 33% of the Packages file, in line count at least.

zhaoway Exactly. A DIFF or RSYNC method for APT (as Goswin pointed
zhaoway out), or just separating the Descriptions out (as I pointed out
zhaoway and you got it too), and nearly 66% of the bits are
zhaoway saved. But this is only a hack, albeit an efficient one.

At the risk of getting flamed, I investigated the possibility of writing an apt-get method to support rsync. I would use this to access an already existing private mirror, not the main Debian archive, so the server load issue is not a problem. The only problem I have is downloading several megs of index files every time I want to install a new package (often under 100kb) from unstable, over a volume-charged 28.8 kbps PPP link, using apt-get[1].

I tried the same, but I used the copy method as a template, which is rather bad. I should have used http as the starting point. Can you send me your patch, please?

I think (if I understand correctly) that I found three problems with the design of apt-get:

1. It tries to download the compressed Packages file, and has no way to override that with the uncompressed file. I filed a bug report against apt-get on this, as I believe it will also be a problem with protocols like rproxy.

2. apt-get tries to be smart and passes the method a destination file name that is only a temporary file, not the final file. Hence, rsync cannot make a comparison between the local and remote versions of the file.

I wrote to the deity mailing list concerning those two problems, with two possible solutions. So far the only answer I got was "NO, we don't want rsync", after pressing the issue here on debian-devel.

3.
Instead, rsync creates its own temporary file while downloading, so apt-get cannot display the progress of the download, because as far as it is concerned the destination file is still empty.

Hmm, isn't there an informational message you can output to hint at the progress?

We would have to patch rsync to generate that style of progress output, or fork and parse the output of rsync and pass on altered output.

I think the only way to fix both 2 and 3 is to allow some coordination between apt-get and rsync about where to put the temporary file and where to find the previous version of the file.

Doing some more thinking, I like the second solution to the problem more and more:

1. Include a template (some file that apt-get thinks matches best) in the fetch request. The rsync method can then copy that file to the destination and rsync on it. This would be the uncompressed Packages file, a previous deb, or the old source.

2. Return whether the file is compressed or not simply by passing back the destination filename with the appropriate extension (.gz). So the destination filename is altered to reflect the file format.

MfG Goswin
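Goswin's first proposal (seed the destination with the best local match before rsyncing) can be sketched roughly as below. The function names and the `--inplace` flag of modern rsync are my own illustration, not the actual apt method interface:

```python
import os
import shutil
import subprocess

def seed_destination(template, destfile):
    """Copy the best local match (e.g. the old uncompressed Packages
    file, or a previous .deb) to the destination, so rsync has local
    data to delta against. Returns True if a template was used."""
    if template and os.path.exists(template):
        shutil.copyfile(template, destfile)
        return True
    return False

def rsync_fetch(uri, destfile, template=None):
    # Sketch only: a real apt method would also report progress and
    # handle the temporary-file coordination discussed in the message.
    seed_destination(template, destfile)
    subprocess.run(["rsync", "--inplace", uri, destfile], check=True)
```

With no template available, the method degrades to a plain full download, which is the worst case rsync gives anyway.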
Re: big Packages.gz file
On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:

A big package index IMHO is the current bottleneck of the Debian package system.

What is the real problem with the large package files? They take a long time to download, but so do emacs and other bloatware.

Hamish
-- Hamish Moffatt VK3SB [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: big Packages.gz file
Hamish Moffatt [EMAIL PROTECTED] writes:

What is the real problem with the large package files? They take a long time to download, but so do emacs and other bloatware.

Yeah, but how often do you download emacs? The Packages file gets downloaded _every single time_ you do an update, and for those of us with a slow modem link, that really sucks.

-Miles
-- Love is a snowmobile racing across the tundra. Suddenly it flips over, pinning you underneath. At night the ice weasels come. --Nietzsche
Re: big Packages.gz file
On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:

Hamish Moffatt [EMAIL PROTECTED] writes:

What is the real problem with the large package files? They take a long time to download, but so do emacs and other bloatware.

Yeah, but how often do you download emacs?

Never, I wouldn't touch that thing with a 40 foot barge pole!

The packages file gets downloaded _every single time_ you do an update, and for those of us with a slow modem link, that really sucks.

True enough. I haven't really been following the discussion, to be honest. Maybe there could be another version of Packages.gz without the extended descriptions -- I imagine they would take something like 33% of the Packages file, in line count at least.

Hamish
-- Hamish Moffatt VK3SB [EMAIL PROTECTED] [EMAIL PROTECTED]
Re: big Packages.gz file
From: Hamish Moffatt [EMAIL PROTECTED]
Subject: Re: big Packages.gz file
Date: Tue, 9 Jan 2001 23:40:01 +1100

On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:

The packages file gets downloaded _every single time_ you do an update, and for those of us with a slow modem link, that really sucks.

This is only a small part of the whole story, IMHO. See my other email replying to you. ;)

Maybe there could be another version of Packages.gz without the extended descriptions -- I imagine they would take something like 33% of the Packages file, in line count at least.

Exactly. A DIFF or RSYNC method for APT (as Goswin pointed out), or just separating the Descriptions out (as I pointed out and you got it too), and nearly 66% of the bits are saved. But this is only a hack, albeit an efficient one, because it does not solve the problem of the package pool within the package pool system; it does it on the protocol and client tool side.

1) AIUI, the package pool should be a storage system, which should have a smart algorithm for deleting packages that no distribution or other package references. (Garbage collection by reference counts.)

2) A distribution, putting aside the work of our honoured release manager, should be a partial package index listing, and thus should be separated from the storage system. The current ``testing'' distribution doesn't do it well enough. (Thus, it has a regulation on upload frequency.)

With these two things in mind, RSYNC can help very little, and the package pool's indexing problem remains. In my previous letters, I tried to start a discussion on one of my humble attempts to help. ;) As soon as I have enough time, and enough discussion, I may write a more prepared document. But I need discussion first. Thanks!

-- echo EOF |cpp - -|egrep -v '(^#|^$)' /* =|=X ++ * /\+_ p7 [EMAIL PROTECTED] */ EOF
Re: big Packages.gz file
From: Hamish Moffatt [EMAIL PROTECTED]
Subject: Re: big Packages.gz file
Date: Tue, 9 Jan 2001 19:59:13 +1100

On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:

A big package index IMHO is the current bottleneck of the Debian package system.

What is the real problem with the large package files? They take a long time to download, but so do emacs and other bloatware.

The problem is, IMHO, this: ;) every once in a while, when you want to update a package to the newest version, you have to update the package index first. That is not absolutely necessary if you look into the problem, and the size of the package index is constantly growing. With Emacs, nearly all of the bits are necessary for the functionality, you don't download it for every trivial update task, and it is not growing in size as rapidly as the package index is.

To look further: if we allow translation of the Packages index, it could be even bigger. And if we allow multiple versions of a package in the package pool (as Manoj mentioned in another thread), a big package index could be even more troublesome.

Hope I have made myself clearer. ;) And thank you for discussing this with me! ;)

-- echo EOF |cpp - -|egrep -v '(^#|^$)' /* =|=X ++ * /\+_ p7 [EMAIL PROTECTED] */ EOF
Re: big Packages.gz file
On Tue, Jan 09, 2001 at 11:40:01PM +1100, Hamish Moffatt wrote:

On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:

Hamish Moffatt [EMAIL PROTECTED] writes:

What is the real problem with the large package files? They take a long time to download, but so do emacs and other bloatware.

The packages file gets downloaded _every single time_ you do an update, and for those of us with a slow modem link, that really sucks.

True enough. I haven't really been following the discussion, to be honest. Maybe there could be another version of Packages.gz without the extended descriptions -- I imagine they would take something like 33% of the Packages file, in line count at least.

Please excuse me if I am jumping into the discussion unprepared or if this has already been mentioned. How hard would it be to make daily diffs of the Packages file? Most people running unstable update every other day, and this would require downloading and applying only a couple of diff files. The whole process can be easily automated.

Sluncho [EMAIL PROTECTED]
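Sluncho's daily-diff idea can be sketched with stdlib tools. The snapshot file names here are invented for illustration; a real setup would publish the diffs next to Packages.gz on the mirror:

```python
import difflib

def packages_diff(old_lines, new_lines,
                  old_name="Packages.2001-01-08",
                  new_name="Packages.2001-01-09"):
    """Unified diff between two daily snapshots of the Packages file.
    A client that already holds old_name fetches only this (small)
    diff and applies it with patch(1)."""
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_name, tofile=new_name))

# Toy snapshots: one package bumped a revision overnight.
old = ["Package: foo\n", "Version: 1.0-1\n", "\n"]
new = ["Package: foo\n", "Version: 1.0-2\n", "\n"]
diff = packages_diff(old, new)
print("".join(diff))
```

Note that each diff names the exact snapshot it applies against, which addresses the ordering concern raised later in the thread: a client that is two days behind must fetch and apply two diffs, in order.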
Re: big Packages.gz file
sluncho == sluncho [EMAIL PROTECTED] writes:
sluncho How hard would it be to make daily diffs of the Packages
sluncho file? Most people running unstable update every other day
sluncho and this will require downloading and applying only a
sluncho couple of diff files.
sluncho The whole process can be easily automated.

Sounds remarkably like the process (weekly, not daily, though) used to distribute Fidonet nodelist diffs. Also similar to kernel diffs, I guess. Seems a good idea to me (until better solutions like rproxy are implemented), but you have to be careful not to apply diffs in the wrong order.
-- Brian May [EMAIL PROTECTED]
Re: big Packages.gz file
== Brian May [EMAIL PROTECTED] writes:

sluncho == sluncho [EMAIL PROTECTED] writes:
sluncho How hard would it be to make daily diffs of the Packages
sluncho file? Most people running unstable update every other day
sluncho and this will require downloading and applying only a
sluncho couple of diff files.
sluncho The whole process can be easily automated.

Sounds remarkably like the process (weekly, not daily, though) used to distribute Fidonet nodelist diffs. Also similar to kernel diffs, I guess. Seems a good idea to me (until better solutions like rproxy are implemented), but you have to be careful not to apply diffs in the wrong order.
-- Brian May [EMAIL PROTECTED]

Or missing one, or having a corrupted file to begin with, or any other of 1000 possibilities. Also, mirrors will always lag behind and have erratic timestamps on those files, and so on. I think it would become a mess pretty soon. The nice thing about rsync is that it's self-repairing. It's also more efficient than a normal diff.

MfG Goswin
Re: big Packages.gz file
zhaoway == zhaoway [EMAIL PROTECTED] writes:
zhaoway This is only a small part of the whole story, IMHO. See
zhaoway my other email replying to you. ;)

Maybe there could be another version of Packages.gz without the extended descriptions -- I imagine they would take something like 33% of the Packages file, in line count at least.

zhaoway Exactly. A DIFF or RSYNC method for APT (as Goswin pointed
zhaoway out), or just separating the Descriptions out (as I pointed out
zhaoway and you got it too), and nearly 66% of the bits are
zhaoway saved. But this is only a hack, albeit an efficient one.

At the risk of getting flamed, I investigated the possibility of writing an apt-get method to support rsync. I would use this to access an already existing private mirror, not the main Debian archive, so the server load issue is not a problem. The only problem I have is downloading several megs of index files every time I want to install a new package (often under 100kb) from unstable, over a volume-charged 28.8 kbps PPP link, using apt-get[1].

I think (if I understand correctly) that I found three problems with the design of apt-get:

1. It tries to download the compressed Packages file, and has no way to override that with the uncompressed file. I filed a bug report against apt-get on this, as I believe it will also be a problem with protocols like rproxy.

2. apt-get tries to be smart and passes the method a destination file name that is only a temporary file, not the final file. Hence, rsync cannot make a comparison between the local and remote versions of the file.

3. Instead, rsync creates its own temporary file while downloading, so apt-get cannot display the progress of the download, because as far as it is concerned the destination file is still empty.

I think the only way to fix both 2 and 3 is to allow some coordination between apt-get and rsync about where to put the temporary file and where to find the previous version of the file.
Note: [1] Normally I try to find the files manually via lynx, but right at the moment this is rather difficult, as I seem to try numerous directories but not get the expected result. Some packages -- Brian May [EMAIL PROTECTED]
Re: big Packages.gz file
Brian == Brian May [EMAIL PROTECTED] writes:
Brian Note: [1] Normally I try to find the files manually via
Brian lynx, but right at the moment this is rather difficult, as
Brian I seem to try numerous directories but not get the expected
Brian result. Some packages

Damn - sent that message before I had finished typing :-( Anyway, I meant to say that some packages are hard to find manually while they haven't all been moved to the package pool system yet.
-- Brian May [EMAIL PROTECTED]
Re: package pool and big Packages.gz file
== Jason Gunthorpe [EMAIL PROTECTED] writes:

On 8 Jan 2001, Goswin Brederlow wrote:

I don't need to get a file listing, apt-get tells me the name. :)

You have missed the point; the presence of the ability to do file listings prevents the adoption of rsync servers with high connection limits.

Then that feature should be limited to non-recursive listings, or turned off. Or .listing files should be created that are simply served.

Reversed checksums (with a detached checksum file) is something someone should implement for debian-cd. You could even quite reasonably do that totally over HTTP and not run the risk of rsync load at all.

At the moment the client calculates one rolling checksum and md5sum per block.

I know how rsync works, and it uses MD4.

Oops, then s/5/4/g.

Given a 650MB file, I don't want to know the hit/miss ratios for the rolling checksum and the md5sum. Must be really bad.

The ratio is supposed to scale only with block size, so it should be the same for big files and small files (ignoring the increase in block size with file size). The amount of time expended doing this calculation is not trivial, however.

Hmm, the technical paper says it creates a 16-bit external hash, each entry a linked list of items containing the full 32-bit rolling checksum (or the other 16 bits) and the md4sum. So when you have more blocks, the hash will fill up, you get more hits on the first level, and you need to search a linked list. With a block size of 1K, a CD image has 10 items per hash entry; it's 1000% full. The time wasted just checking the rolling checksums must be huge. And with 650,000 rolling checksums for the image, there's a ~10/65536 chance of hitting the same checksum with a different md4sum, so that's about 100 times per CD just by pure chance. If the images match, then it's 650,000 times. So the better the match, and the more blocks you have, the more CPU it takes. Of course larger blocks take more time to compute an md4sum, but then you will have fewer blocks.
For CD images the concern is of course available disk bandwidth; reversed checksums eliminate that bottleneck.

That anyway. And RAM.

MfG Goswin
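The hash-table arithmetic in Goswin's message, and the O(1) rolling property that makes rsync's per-byte scan feasible at all, can be checked with a small sketch. The checksum below is the rsync-style weak checksum in spirit; the real implementation differs in detail:

```python
def weak_checksum(data):
    # rsync-style weak checksum: s1 is the byte sum, s2 the sum of
    # running sums, both kept to 16 bits and packed into 32 bits
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) & 0xFFFF
        s2 = (s2 + s1) & 0xFFFF
    return (s2 << 16) | s1

def roll(old, out_byte, in_byte, blocksize):
    # slide the window one byte in O(1) instead of re-summing the block
    s1 = old & 0xFFFF
    s2 = (old >> 16) & 0xFFFF
    s1 = (s1 - out_byte + in_byte) & 0xFFFF
    s2 = (s2 - blocksize * out_byte + s1) & 0xFFFF
    return (s2 << 16) | s1

# the rolling update agrees with recomputing each window from scratch
data = bytes(range(1, 64))
bs = 16
c = weak_checksum(data[:bs])
for i in range(1, 32):
    c = roll(c, data[i - 1], data[i - 1 + bs], bs)
    assert c == weak_checksum(data[i:i + bs])

# the 16-bit first-level hash for a 650MB image in 1K blocks
blocks = (650 * 1024 * 1024) // 1024
print(blocks / 65536)   # roughly 10 entries per bucket, as the mail says
```

So the rolling scan itself is cheap; the cost Goswin worries about is the bucket occupancy (chained lookups plus false first-level hits that trigger strong-hash computations) once the block count far exceeds the 65,536 buckets.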
Re: package pool and big Packages.gz file
On 8 Jan 2001, Goswin Brederlow wrote:

Then that feature should be limited to non-recursive listings, or turned off. Or .listing files should be created that are simply served.

*couf* rproxy *couf*

So when you have more blocks, the hash will fill up. So you have more hits on the first level and need to search a linked list. With a block size of 1K, a CD image has 10 items per hash entry; it's 1000% full. The time wasted just checking the rolling checksums must be huge.

Sure, but that is trivially solvable and is really a minor amount of time compared with computing the MD4 hashes. In fact, when you start talking about 650,000 blocks you want to reconsider the design choices that were made with rsync's searching - it is geared toward small files and is not really optimal for big ones.

So the better the match, and the more blocks you have, the more CPU it takes. Of course larger blocks take more time to compute an md4sum, but then you will have fewer blocks.

No. The smaller the blocks, the more CPU time it will take to compute the MD4 hashes. Expect MD4 to run at 100 meg/sec on modern hardware, so you are looking at burning 6 seconds of CPU time to verify the local CD image. If you start getting 32-bit checksum matches with md4 mismatches due to too large a block size, then you could easily double or triple the number of md4 calculations you need. That is still totally dwarfed by the 10 meg/sec IO throughput you can expect with a copy of a 600 meg ISO file.

Jason
Re: Linux Gazette [Was: Re: big Packages.gz file]
On 2001-01-07, Goswin Brederlow [EMAIL PROTECTED] wrote:

zhaoway 1) It prevents many more packages from coming into Debian; for
zhaoway example, the newest issues of the Linux Gazette are not present
zhaoway in Debian. People occasionally got fucked up by packages

Any reason why the Linux Gazette is not present anymore? And is there a virtual package for the Linux Gazette that always depends on the newest version?

Another solution would be to have only an installer which installs the latest version of the LG from a server that keeps it. That keeps the Packages.gz file clean, and LG readers happy. Or am I missing something?
-- Andreas Fuchs, [EMAIL PROTECTED], [EMAIL PROTECTED], antifuchs Hail RMS! Hail Cthulhu! Hail Eris! All hail Discordia!
Re: big Packages.gz file
On Sun, Jan 07, 2001 at 05:18:02PM -0500, Chris Gray wrote:

Brian May writes:
bm What do large packages have to do with the size of the index file,
bm Packages?

I think the point was that every package adds about 30-45 lines to the Packages file. You don't need to download any of the Linux Gazette to have the 33 lines each issue takes up in the Packages file.

A big package index IMHO is the current bottleneck of the Debian package system. While most people are more interested in RSYNC as the cure, IMHO RSYNC is overkill and not a clean kill: it prevents easy mirroring of Debian by requiring an RSYNC service on the mirror system, and it won't solve the pool's problem, just give a hack. ;) A relatively straightforward solution, on the other hand, is:

* Separate Packages.gz so that each package carries its own index file. Caesar's belongs to Caesar. ;) I.e., each pkg_ver-sub_arch.deb comes with a pkg_ver-sub_arch.idx.

* At the same time, provide a big Packages.gz by collecting these small files, for compatibility. Or maybe even a trimmed Packages.gz made by removing all of the Description: fields.

* Optionally, provide hard links or symlinks along with each package, e.g., pkg_[stable|unstable|testing]_arch.idx - pkg_ver-sub_arch.idx. Note: this won't hurt mirrors; OTOH it could even help partial mirrors.

* And enable multiple versions of a package in the package pool.

This way, the general package index is optional, and release management could move towards those more finely tuned task-* like packages. Nothing lost. ;)

Just for discussion; I would be glad to hear critiques. ;)

-- echo EOF |cpp - -|egrep -v '(^#|^$)' /* =|=X ++ * /\+_ p7 [EMAIL PROTECTED] */ EOF
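The "trimmed Packages.gz by removing all of the Description:s" part of zhaoway's proposal is mechanical. A sketch of the stripping step, relying only on the control-file convention that continuation lines begin with a space:

```python
def strip_descriptions(stanza):
    """Drop the Description: field and its indented continuation
    lines from one package stanza, keeping every other field."""
    out, in_desc = [], False
    for line in stanza.splitlines(keepends=True):
        if line.startswith("Description:"):
            in_desc = True
            continue
        if in_desc and line.startswith(" "):
            continue        # continuation line of the description
        in_desc = False
        out.append(line)
    return "".join(out)

stanza = ("Package: foo\nVersion: 1.0\n"
          "Description: a short summary\n extended text\n more text\n"
          "Section: misc\n")
print(strip_descriptions(stanza))
```

Running this over every stanza and gzipping the result would give the compatibility index without the roughly one-third of lines the extended descriptions occupy.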
Re: big Packages.gz file
Hello,

On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:

* Separate Packages.gz so that each package carries its own index file. Caesar's belongs to Caesar. ;) I.e., each pkg_ver-sub_arch.deb comes with a pkg_ver-sub_arch.idx.

No, that's not a win. You would end up checking time stamps for thousands of files in the case of an update. I liked the idea of alphabetical splitting into Packages-[a-z0-9].gz.

* At the same time, provide a big Packages.gz by collecting these small files, for compatibility. Or maybe even a trimmed Packages.gz made by removing all of the Description: fields.

Yup, just keep a copy of Packages.gz and provide backwards compatibility.

Bastian Kleineidam
Re: Linux Gazette [Was: Re: big Packages.gz file]
On Mon, Jan 8, 2001 at 18:20:16 +0100, Andreas Fuchs wrote:

On 2001-01-07, Goswin Brederlow [EMAIL PROTECTED] wrote:

zhaoway 1) It prevents many more packages from coming into Debian; for
zhaoway example, the newest issues of the Linux Gazette are not present
zhaoway in Debian. People occasionally got fucked up by packages

Any reason why the Linux Gazette is not present anymore? And is there a virtual package for the Linux Gazette that always depends on the newest version?

Another solution would be to have only an installer which installs the latest version of the LG from a server that keeps it. That keeps the Packages.gz file clean, and LG readers happy. Or am I missing something?

To answer the questions: a) it is present, but I haven't updated it in a while (busy). Wouter Verhelst has offered to take over the package, but he's new to packaging, so things are taking a bit of time. b) nope - I haven't done a virtual "latest" package yet; there is a bug about it, I think (or Wouter suggested it). c) personally, I like the LG since I find the issues useful - I found useful articles in all the ones I read. Unfortunately, since I left uni I haven't been sufficiently bored to remember to download and read them (and hence to package them). d) I was hoping the data section of Debian would get into policy so I could move the packages there and out of main.

Adrian

Email: [EMAIL PROTECTED] Windows NT - Unix in beta-testing. GPG/PGP keys available on public key servers Debian GNU/Linux -*- By professionals for professionals -*- www.debian.org
Re: package pool and big Packages.gz file
On Fri, 5 Jan 2001 09:33:05 -0700 (MST) Jason Gunthorpe [EMAIL PROTECTED] wrote: If that suits your needs, feel free to write a bugreport on apt about this. Yes, I enjoy closing such bug reports with a terse response. Hint: Read the bug page for APT to discover why! From bug report #76118: No. Debian can not support the use of rsync for anything other than mirroring, APT will never support it. Why? Because if everyone used rsync, the loads on the servers that supported rsync would be too high? Or something else? -- Sam Vilain, [EMAIL PROTECTED]WWW: http://sam.vilain.net/ GPG public key: http://sam.vilain.net/sam.asc
Re: package pool and big Packages.gz file
On Fri, 5 Jan 2001 19:08:38 +0200 [EMAIL PROTECTED] (Sami Haahtinen) wrote:

Or, can rsync sync binary files? hmm.. this sounds like something worth implementing..

rsync can, but the problem is that with a compressed stream, if you insert or alter data early on in the stream, the data after that change is radically different. But... you could use it successfully against the .tar files inside the .deb, which are normally compressed. This would probably require some special implementation of rsync, or having the uncompressed packages on the server and putting the magic in apt.

Or perhaps the program apt-mirror is called for, which talks its own protocol to other copies of itself and does a magic job of selectively updating mirror copies of the Debian archive using the rsync algorithm. This would be similar to the apt-get and apt-move pair, but actually sticking it into a directory structure that looks like the Debian mirror. Then, if you want to enable it, turn on the server version and share your mirror with your friends inside your corporate network! Or an authenticated version, so that a person with their own permanent internet connection could share their archive with a handful of friends - having an entire mirror would be too costly for them. I think this has some potential to be quite useful and reduce bandwidth requirements. It could use GPG signatures to check that nothing funny is going on, too.

Either that, or keep a number of patch files or .xd files for a couple of old revs per package against the uncompressed contents of packages, to allow small changes to packages to be quick. Or perhaps implement this as patch packages, which are special .debs that only contain the changed files and upgrade the package.
-- Sam Vilain, [EMAIL PROTECTED] WWW: http://sam.vilain.net/ GPG public key: http://sam.vilain.net/sam.asc
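Sam's point about compressed streams is easy to demonstrate: one byte changed early in the input leaves the raw file almost entirely block-identical, while the compressed streams diverge from near the change onward and share essentially nothing at fixed offsets. A hedged sketch, with zlib standing in for gzip's deflate and fixed 1K blocks standing in for rsync's matching:

```python
import hashlib
import zlib

def block_hashes(data, blocksize=1024):
    # one strong hash per fixed-size block
    return [hashlib.md5(data[i:i + blocksize]).digest()
            for i in range(0, len(data), blocksize)]

def matching_blocks(a, b):
    # count blocks identical at the same offset in both files
    return sum(x == y for x, y in zip(block_hashes(a), block_hashes(b)))

base = b"".join(b"line %06d\n" % i for i in range(20000))   # ~200KB, compressible
edited = base[:5] + b"X" + base[6:]                         # one byte changed early

raw_match = matching_blocks(base, edited)
comp_match = matching_blocks(zlib.compress(base, 9), zlib.compress(edited, 9))
print(raw_match, len(block_hashes(base)))   # raw: all blocks but the first match
print(comp_match)                           # compressed: almost nothing matches
```

This is exactly why the thread keeps coming back to rsyncing the uncompressed tar members or keeping uncompressed copies server-side: the delta algorithm only helps when the byte stream it sees preserves the similarity.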
Re: package pool and big Packages.gz file
== Sam Vilain [EMAIL PROTECTED] writes:

On Fri, 5 Jan 2001 09:33:05 -0700 (MST) Jason Gunthorpe [EMAIL PROTECTED] wrote:

If that suits your needs, feel free to write a bugreport on apt about this.

Yes, I enjoy closing such bug reports with a terse response. Hint: Read the bug page for APT to discover why!

From bug report #76118: No. Debian can not support the use of rsync for anything other than mirroring, APT will never support it.

Why? Because if everyone used rsync, the loads on the servers that supported rsync would be too high? Or something else?
-- Sam Vilain, [EMAIL PROTECTED] WWW: http://sam.vilain.net/ GPG public key: http://sam.vilain.net/sam.asc

Actually the load should drop, given the following feature add-ons:

1. cached checksums, and pulling instead of pushing
2. client-side unpacking of compressed streams

That way the rsync servers would first serve the checksum file from cache (being 200-1000 times smaller than the real file) and then just the blocks the client asks for. So if 1% of the file being rsynced matches, it breaks even, and everything above that saves bandwidth. The current mode of operation of rsync works in reverse, so all the computation is done on the server every time, which of course is a heavy load on the server.

I hope both features will work without changing the server, but if not, we will have to wait till servers catch up with the feature.

MfG Goswin
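Goswin's feature 1 (serve a precomputed signature and let the client pull) can be given a minimal sketch. This compares at fixed block offsets only, unlike real rsync's sliding window discussed elsewhere in the thread, and all names are illustrative:

```python
import hashlib

BLOCKSIZE = 1024

def make_signature(data):
    """Server side, computed once and cached: one strong hash per
    block. The signature is hundreds of times smaller than the file."""
    return [hashlib.md5(data[i:i + BLOCKSIZE]).hexdigest()
            for i in range(0, len(data), BLOCKSIZE)]

def blocks_needed(local_data, remote_signature):
    """Client side: compare the downloaded signature against local
    blocks and return the indices that must actually be fetched."""
    local = make_signature(local_data)
    return [i for i, h in enumerate(remote_signature)
            if i >= len(local) or local[i] != h]
```

With this division of labour the server does no per-client computation at all, which is the load reduction Goswin describes; the cost moves to cache space for the signatures, plus ordinary byte-range serving of the requested blocks.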
Re: package pool and big Packages.gz file
Sam Vilain [EMAIL PROTECTED] writes: On Fri, 5 Jan 2001 19:08:38 +0200 [EMAIL PROTECTED] (Sami Haahtinen) wrote: Or, can rsync sync binary files? hmm.. this sounds like something worth implementing.. rsync can, but the problem is with a compressed stream if you insert or alter data early on in the stream, the data after that change is radically different. But... you could use it successfully against the .tar files inside the .deb, which are normally compressed. This would probably require some special implementation of rsync, or to have the uncompressed packages on the server and put the magic in apt. [...] Either that or keep a number of patch files or .xd files for a couple of old revs per packages against the uncompressed contents of packages to allow small changes to packages to be quick. Or perhaps implement this as patch packages, which are a special .deb that only contain the changed files and upgrade the package. I suggest you have a look at 'tje' by Joost Witteveen (http://joostje.op.het.net/tje/index.html). It is specifically written with the goal in mind to sync Debian mirrors with minimum bandwidth use. It doesn't use the rsync algorithm, but something similar. It understands .debs and claims to have less server CPU usage than rsync, since it caches diffs and md5sums. It would be really nice if anybody with an up-to-date mirror could volunteer to provide a machine to set up a tje server to test it a little more... Falk
Re: package pool and big Packages.gz file
On Sun, Jan 07, 2001 at 03:49:43PM +0100, Goswin Brederlow wrote:

Actually the load should drop, given the following feature add-ons: [...]

The load should drop from that induced by the current rsync setup (for the mirrors), but if many, many more clients start using rsync (instead of FTP/HTTP), I think there will still be a significant net increase in load. Whether it would be enough to cause a problem is debatable, and I honestly don't know either way.
-- - mdz
Re: package pool and big Packages.gz file
== Matt Zimmerman [EMAIL PROTECTED] writes:

On Sun, Jan 07, 2001 at 03:49:43PM +0100, Goswin Brederlow wrote:

Actually the load should drop, given the following feature add-ons: [...]

The load should drop from that induced by the current rsync setup (for the mirrors), but if many, many more clients start using rsync (instead of FTP/HTTP), I think there will still be a significant net increase in load. Whether it would be enough to cause a problem is debatable, and I honestly don't know either way.

Once the checksums are cached, there will be no CPU load caused by rsync, since it will only transfer the file. And the checksum files will be really small, as I said, so if some similarity is found, the reduction in data will more than make up for the checksum download. The only increase is the space needed to store the checksums in some form of cache.

MfG Goswin
Re: big Packages.gz file
Brian May writes:

zhaoway == zhaoway [EMAIL PROTECTED] writes:
zhaoway 1) It prevents many more packages from coming into Debian; for
zhaoway example, the newest issues of the Linux Gazette are not present
zhaoway in Debian. People occasionally got fucked up by packages
zhaoway like anachism-doc because of the precious bandwidth. And
zhaoway some occasional discussion on L10N packages disturbs the
zhaoway lives of others who don't need them.

bm ...only if you download and install the package in question.
bm What do large packages have to do with the size of the index file,
bm Packages?

I think the point was that every package adds about 30-45 lines to the Packages file. You don't need to download any of the Linux Gazette to have the 33 lines each issue takes up in the Packages file.

Cheers, Chris
-- Got jag? http://www.tribsoft.com
Linux Gazette [Was: Re: big Packages.gz file]
== Chris Gray [EMAIL PROTECTED] writes:

Brian May writes:

zhaoway == zhaoway [EMAIL PROTECTED] writes:
zhaoway 1) It prevents many more packages from coming into Debian; for
zhaoway example, the newest issues of the Linux Gazette are not present
zhaoway in Debian. People occasionally got fucked up by packages

Any reason why the Linux Gazette is not present anymore? And is there a virtual package for the Linux Gazette that always depends on the newest version?

MfG Goswin
Re: package pool and big Packages.gz file
On 7 Jan 2001, Goswin Brederlow wrote:

Actually the load should drop, given the following feature add-ons:

1. cached checksums, and pulling instead of pushing
2. client-side unpacking of compressed streams

Apparently reversing the direction of rsync infringes on a patent. Plus there is the simple matter that the file listing and file download features cannot be separated, and doing a listing of all files on our site is non-trivial. Once you strip all that out, you have rproxy.

Reversed checksums (with a detached checksum file) are something someone should implement for debian-cd. You could even quite reasonably do that totally over HTTP and not run the risk of rsync load at all. Such a system for Packages files would also be acceptable, I think.

Jason
Re: package pool and big Packages.gz file
Goswin == Goswin Brederlow [EMAIL PROTECTED] writes: Goswin Actually the load should drop, providing the following Goswin feature add ons: How does rproxy cope? Does it require a high load on the server? I suspect not, but need to check on this. I think of rsync as just being a quick hack, rproxy is the (long-term) direction we should be headed. rproxy is the same as rsync, but based on the HTTP protocol, so it should be possible (in theory) to integrate into programs like Squid, Apache and Mozilla (or so the authors claim). -- Brian May [EMAIL PROTECTED]
Re: package pool and big Packages.gz file
== Brian May [EMAIL PROTECTED] writes:

Goswin == Goswin Brederlow [EMAIL PROTECTED] writes:
Goswin Actually the load should drop, providing the following
Goswin feature add ons:

How does rproxy cope? Does it require a high load on the server? I suspect not, but need to check on this. I think of rsync as just being a quick hack; rproxy is the (long-term) direction we should be headed. rproxy is the same as rsync, but based on the HTTP protocol, so it should be possible (in theory) to integrate into programs like Squid, Apache and Mozilla (or so the authors claim).
-- Brian May [EMAIL PROTECTED]

URL? Sounds more like encapsulation of an rsync-like protocol in html, but it's hard to tell from the few words you write. Could be interesting, though. Anyway, it will not solve the problem with compressed files if it's just like rsync.

MfG Goswin
Re: package pool and big Packages.gz file
Goswin == Goswin Brederlow [EMAIL PROTECTED] writes: Goswin URL? URL:http://linuxcare.com.au/projects/rproxy/ The documentation seems very comprehensive, but I am not sure when it was last updated. Goswin Sounds more like encapsulation of an rsync-similar Goswin protocol in html, but it's hard to tell from the few Goswin words you write. Could be interesting though. errr... I think you mean http, not html. Goswin Anyway, it will not resolve the problem with compressed Goswin files if it's just like rsync. True; however, I was thinking more in the context of Packages and other uncompressed files. It would be good, though, if these issues regarding deb packages could be resolved. Then again, perhaps I was a bit blunt with that statement on rsync; rsync will always have its uses, e.g. copying private data. -- Brian May [EMAIL PROTECTED]
Re: package pool and big Packages.gz file
== Jason Gunthorpe [EMAIL PROTECTED] writes: On 7 Jan 2001, Goswin Brederlow wrote: Actually the load should drop, providing the following feature add-ons: 1. cached checksums and pulling instead of pushing 2. client side unpacking of compressed streams Apparently reversing the direction of rsync infringes on a patent. When I rsync a file, rsync starts ssh to connect to the remote host and starts rsync there in the reverse mode. You say that the receiving end is violating a patent and the sending end is not? Hmm, which patent anyway? So I have to fork an rsync-non-US because of a patent? Plus there is the simple matter that the file listing and file download features cannot be separated. Doing a listing of all files on our site is non-trivial. I don't need to get a file listing, apt-get tells me the name. :) Also I can do rsync -v host::dir and parse the output to grab the actual files with another rsync. So file listing and downloading are absolutely separable. Doing a listing of all files probably results in a timeout. The hard drives are too slow. Once you strip all that out you have rproxy. Reversed checksums (with a detached checksum file) are something someone should implement for debian-cd. You could even quite reasonably do that totally using HTTP and not run the risk of rsync load at all. At the moment the client calculates one rolling checksum and md5sum per block. The server, on the other hand, calculates the rolling checksum per byte and for each hit it calculates an md5sum for one block. Given a 650MB file, I don't want to know the hit/miss ratios for the rolling checksum and the md5sum. Must be really bad. The smaller the file, the fewer wrong md5sums need to be calculated. Such a system for Packages files would also be acceptable, I think. For Packages files even cvs -z9 would be fine. They are comparatively small compared to the rest of the load, I would think.
But I, just as you do, think that it would be a really good idea to have precalculated rolling checksums and md5sums, maybe even for various block sizes, and let the client do the time-consuming guessing and calculating. That would keep rsync from reading every file served twice, as it does now when the files are dissimilar. May the Source be with you. Goswin
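The per-block versus per-byte cost discussed above hinges on the rolling property of rsync's weak checksum: sliding the window one byte is O(1), so only the side scanning for matches ever pays the per-byte price. A toy version of such a rolling checksum (rsync's real weak checksum is a 32-bit Adler-32 variant; the modulus and names here are simplified for illustration):

```python
M = 1 << 16  # checksums are kept modulo 2^16 in this toy version


def weak(block):
    """Weak checksum of a block: a is the plain byte sum, b weights
    earlier bytes more heavily (the same shape as rsync's weak sum)."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, outgoing, incoming, blocklen):
    """Slide the window one byte in O(1): drop `outgoing`, add `incoming`."""
    a = (a - outgoing + incoming) % M
    b = (b - blocklen * outgoing + a) % M
    return a, b
```

With precalculated (a, b) pairs published per block, only the scanning side needs roll(); the other side computes weak() once per block, which is exactly the reversal Goswin asks for.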
Re: package pool and big Packages.gz file
On 8 Jan 2001, Goswin Brederlow wrote: Apparently reversing the direction of rsync infringes on a patent. When I rsync a file, rsync starts ssh to connect to the remote host and starts rsync there in the reverse mode. Not really, you have to use quite a different set of operations to do it one way vs the other. The core computation is the same, mind you. Hmm, which patent anyway? Don't know, I never heard back from Tridge on that. I don't need to get a file listing, apt-get tells me the name. :) You have missed the point, the presence of the ability to do file listings prevents the adoption of rsync servers with high connection limits. Reversed checksums (with a detached checksum file) are something someone should implement for debian-cd. You could even quite reasonably do that totally using HTTP and not run the risk of rsync load at all. At the moment the client calculates one rolling checksum and md5sum per block. I know how rsync works, and it uses MD4. Given a 650MB file, I don't want to know the hit/miss ratios for the rolling checksum and the md5sum. Must be really bad. The ratio is supposed to only scale with block size, so it should be the same for big files and small files (ignoring the increase in block size with file size). The amount of time expended doing this calculation is not trivial, however. For CD images the concern is of course available disk bandwidth; reversed checksums eliminate that bottleneck. Jason
Re: package pool and big Packages.gz file
Quoting Goswin Brederlow [EMAIL PROTECTED]: == Sami Haahtinen [EMAIL PROTECTED] writes: Or, can rsync sync binary files? Of course, but forget it with compressed data. Doesn't gzip have a --rsync option, or somesuch? Apparently Andrew Tridgell (Samba, Rsync) has a patch to do this, but I don't know whether he passed it on to the gzip maintainers. (Apparently he's working on a --fuzzy flag for matching rsyncs between, say, foo-1.0.deb and foo-1.1.deb. He says it should be called the --debian flag.) Cheerio, Andrew Stribblehill Systems programmer, IT Service, University of Durham, England
Re: big Packages.gz file
On 2001-01-05, Brian May [EMAIL PROTECTED] wrote: What do large packages have to do with the size of the index file, Packages? They waste one byte per multiple of 10 bytes of package size. (-; Bad joke? So sue me. -- Andreas Fuchs, [EMAIL PROTECTED], [EMAIL PROTECTED], antifuchs Hail RMS! Hail Cthulhu! Hail Eris! All hail Discordia!
Re: big Packages.gz file
On 2001-01-05, Brian May [EMAIL PROTECTED] wrote: What do large packages have to do with the size of the index file, Packages? Andreas Fuchs [EMAIL PROTECTED] wrote: They waste one byte per multiple of 10 bytes of package size. (-; You mean one byte per order of magnitude of package size. ;) Bad joke? So sue me. Yes, very bad. I couldn't resist correcting, which makes me at least as bad. -- Sam Couter | Internet Engineer | http://www.topic.com.au/ [EMAIL PROTECTED]| tSA Consulting | OpenPGP key available on key servers OpenPGP fingerprint: A46B 9BB5 3148 7BEA 1F05 5BD5 8530 03AE DE89 C75C pgpSGNJSoIRqT.pgp Description: PGP signature
Re: package pool and big Packages.gz file
Andrew Stribblehill [EMAIL PROTECTED] wrote: Doesn't gzip have a --rsync option, or somesuch? Apparently Andrew Tridgell (Samba, Rsync) has a patch to do this, but I don't know whether he passed it onto the gzip maintainers. I like the idea of having plugins for rsync to handle different kinds of data. So the gzip plugin will decompress the data, and the rsync algorithm can work on the decompressed data. Much better. (Apparently he's working on a --fuzzy flag for matching rsyncs between, say foo-1.0.deb and foo-1.1.deb. He says it should be called the --debian flag.) A deb plugin would be better. :) -- Sam Couter | Internet Engineer | http://www.topic.com.au/ [EMAIL PROTECTED]| tSA Consulting | OpenPGP key available on key servers OpenPGP fingerprint: A46B 9BB5 3148 7BEA 1F05 5BD5 8530 03AE DE89 C75C pgpSUf10GUoEI.pgp Description: PGP signature
Re: package pool and big Packages.gz file
Sam == Sam Couter [EMAIL PROTECTED] writes: Sam Andrew Stribblehill [EMAIL PROTECTED] wrote: Doesn't gzip have a --rsync option, or somesuch? Apparently Andrew Tridgell (Samba, Rsync) has a patch to do this, but I don't know whether he passed it onto the gzip maintainers. Sam I like the idea of having plugins for rsync to handle Sam different kinds of data. So the gzip plugin will decompress Sam the data, and the rsync algorithm can work on the Sam decompressed data. Much better. (Apparently he's working on a --fuzzy flag for matching rsyncs between, say foo-1.0.deb and foo-1.1.deb. He says it should be called the --debian flag.) Sam A deb plugin would be better. :) Sounds like a good idea to me. Although don't get the two issues confused: 1. difference in filename. 2. format of file. Although, I guess in most cases the two will always be linked (eg. choosing the best filename really depends on the format, as ideally the most similar *.deb package should be used, and this means implementing debian rules for comparing versions), will this always be the case? -- Brian May [EMAIL PROTECTED]
Re: package pool and big Packages.gz file
On Sun, Jan 07, 2001 at 11:43:39AM +1100, Sam Couter wrote: A deb plugin would be better. :) One problem with a deb plugin is that .debs are signed in compressed form. gzip isn't guaranteed to produce the same compressed file from identical uncompressed files on different architectures and releases. Varying the compression flags can also change the compressed file. -Drake
Re: package pool and big Packages.gz file
On Sun, Jan 07, 2001 at 12:53:14PM +1100, Drake Diedrich wrote: On Sun, Jan 07, 2001 at 11:43:39AM +1100, Sam Couter wrote: A deb plugin would be better. :) One problem with a deb plugin is that .debs are signed in compressed form. gzip isn't guaranteed to produce the same compressed file from identical uncompressed files on different architectures and releases. Varying the compression flags can also change the compressed file. It shouldn't be a problem to tweak things so that the resulting files end up exactly the same. This is rsync, after all, and that is the program's goal. For instance, uncompressed blocks could be used for comparison, but the gzip header copied exactly. -- - mdz
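The point above, that identical uncompressed data need not yield identical compressed bytes, can be demonstrated directly. A small sketch using Python's gzip module: the same payload compressed at two levels produces different compressed streams (so a signature over the compressed file would differ), yet identical uncompressed content, which is the level at which comparison would have to happen.

```python
import gzip

payload = b"identical uncompressed data\n" * 100

# The same bytes compressed at two different levels (mtime pinned so the
# header is deterministic across runs): the compressed streams differ,
# if only in the header's XFL byte and the matches chosen...
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)
assert fast != best

# ...yet both decompress back to the identical payload.
assert gzip.decompress(fast) == gzip.decompress(best) == payload
```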
Re: [FINAL, for now ;-)] (Was: Re: package pool and big Packages.gz file)
If you don't like large Packages files, implement a rsync transfer method for them. -- see shy jo
big Packages.gz file
[sorry, either fetchmail or my ISP made me lose 30 or so emails.] The problem with the bigger and bigger Packages.gz [I thought it was obvious. :-(] is: 1) It prevents many more packages from coming into Debian; for example, the newest issues of Linux Gazette are not present in Debian, people occasionally get screwed by packages like anarchism-doc because of the precious bandwidth, and there is occasional discussion of L10N packages disturbing the lives of others who don't need them. 2) It doesn't scale. Release management is difficult; the RM in general only considers RC bugs on most of the packages he is not familiar with. ;-) Now consider mechanisms such as DIFF and RSYNC for Packages.gz: 1) They're difficult to set up, though it _should_ be easy considering it's end-user stuff. With the current state of testing, i.e., an often-updated Packages.gz and a more or less stable state, people tend to update very often. 2) They have a FIXED TIME problem, i.e., if you don't RSYNC or DIFF for a long time, they won't save you extra bandwidth, while my approach does. 3) They don't scale just as well. ;-) Now consider mechanisms to section Packages.gz by functionality, or just as the package pool does: 1) Due to the complicated dependency problem, they're doomed to fail. ;-) Okay, now see my approach. [See my previous mail. The FINAL one. ;-)] 1) It's compatible with old tools. (Only discuss it with me!!) 2) It scales well, both for release management and for including as many packages in Debian as our hard disks permit. 3) It is very easy for the end user to set up. 4) No extra burden on developers as to how frequently they should upload. 5) No FIXED TIME problem. (See above.) 6) Possibilities exist for packages to provide a changelog to users for them to consider whether to upgrade. This will help developers avoid some fake bug reports. So why not bother to discuss it with me? ;-) zw
Re: big Packages.gz file
zhaoway == zhaoway [EMAIL PROTECTED] writes: zhaoway 1) It prevents many more packages from coming into zhaoway Debian; for example, the newest issues of Linux Gazette zhaoway are not present in Debian. People occasionally get zhaoway screwed by packages like anarchism-doc because of the zhaoway precious bandwidth. And some occasional discussion on zhaoway L10N packages disturbs the lives of others who don't zhaoway need them. ...only if you download and install the package in question. What do large packages have to do with the size of the index file, Packages? zhaoway 2) They have a FIXED TIME problem, i.e., if you don't zhaoway RSYNC or DIFF for a long time, they won't save you extra zhaoway bandwidth, while my approach does. You only download what has changed. Nothing more, nothing less. I could equally argue: if you wait a while, then exactly one package in each section will change, causing you to have to re-download all index files. I am not trying to argue that your method is a bad idea, but please try and get your facts straight first. Now back on topic: another similar alternative to rsync might be protocols like rproxy, which add rsync capabilities to HTTP. Apparently the authors want to include this functionality (not sure what time frame they are talking about here) in Squid and Apache. This would mean rsync support in apt-get may be less important; you just need to force it to download Packages, not Packages.gz. -- Brian May [EMAIL PROTECTED]
Re: package pool and big Packages.gz file
On 5 Jan 2001, Goswin Brederlow wrote: If that suits your needs, feel free to write a bugreport on apt about this. Yes, I enjoy closing such bug reports with a terse response. Hint: Read the bug page for APT to discover why! Jason
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 03:05:03AM +0100, Goswin Brederlow wrote: What's the problem with a big Packages file? If you don't want to download it again and again just because of small changes, I have a better solution for you: rsync. apt-get update could rsync all Packages files (yes, not the .gz ones) and thereby download only the changed parts. On uncompressed files rsync is very effective, and the changes can be compressed for the actual transfer. So on update you will practically get a diff.gz against your old Packages file. this would bring us to, apt renaming the old deb (if there is one) to the name of the new package and rsyncing those. and we would save some time once again... Or, can rsync sync binary files? hmm.. this sounds like something worth implementing.. -- every nerd knows how to enjoy the little things of life, like: rm -rf windows
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 05:46:35AM +0800, zhaoway wrote: how about diffs between dinstall runs?.. sorry, but i don't understand here. dinstall is a server side thing here? yes, when dinstall runs it would copy the old Packages file to, let's say, packages.old and write its changes to the new file.. after it's done it would diff packages.old and packages... packages-010102-010103.gz packages-010103-010104.gz packages.gz apt would download the changes after the last update, and merge these into the Packages file; if the file gets corrupted, it would attempt a full update. on the top, some pkg-gz-deb lists packages on the leaves of the dependency tree, and each pkg-gz-deb won't get bigger than 100k, and each of them depends on some more basic pkg-gz-deb below, some other pkg-gz-deb like the base sub-system. this way, when a user installs xdm, apt-get first installs the pkg-gz-deb which lists xdm, then, while dependency checking, it will install base-a-pkg-gz-deb etc. etc., then xdm gets installed. this way, all of xdm's dependencies will be fulfilled with the newest information available. and you can see this will surely ease up the bandwidth. (when updating gcc, i won't get additional bits of Packages.gz about xdm, xfree etc.) wouldn't this make it a BIT too difficult? -- every nerd knows how to enjoy the little things of life, like: rm -rf windows
Re: package pool and big Packages.gz file
Previously Sami Haahtinen wrote: this would bring us to, apt renaming the old deb (if there is one) to the name of the new package and rsync those. and we would save some time once again... There is a --fuzzy-names patch for rsync that makes rsync do that itself. Or, can rsync sync binary files? Yes. hmm.. this sounds like something worth implementing.. Don't bother, it's been done already. Ask Rusty for details. Wichert. -- / Generally uninteresting signature - ignore at your convenience \ | [EMAIL PROTECTED] http://www.liacs.nl/~wichert/ | | 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0 2805 3CB8 9250 2FA3 BC2D |
Re: package pool and big Packages.gz file
== Sami Haahtinen [EMAIL PROTECTED] writes: On Fri, Jan 05, 2001 at 03:05:03AM +0100, Goswin Brederlow wrote: What's the problem with a big Packages file? If you don't want to download it again and again just because of small changes, I have a better solution for you: rsync. apt-get update could rsync all Packages files (yes, not the .gz ones) and thereby download only the changed parts. On uncompressed files rsync is very effective, and the changes can be compressed for the actual transfer. So on update you will practically get a diff.gz against your old Packages file. this would bring us to, apt renaming the old deb (if there is one) to the name of the new package and rsyncing those. and we would save some time once again... That's what the debian-mirror script does (about half of the script is just for that). It also uses old tar.gz, orig.tar.gz, diff.gz and dsc files. Or, can rsync sync binary files? Of course, but forget it with compressed data. hmm.. this sounds like something worth implementing.. I'm currently discussing some changes to the rsync client with some people from the rsync ML which would uncompress compressed data on the client side (no changes to the server) and rsync that. Sounds like it wouldn't improve anything, but when you read the full description it actually does. Without that, rsyncing new debs against old ones hardly ever saves anything. Where it helps is with big packages like xfree, where several packages are identical between releases. MfG Goswin
Re: package pool and big Packages.gz file
== Jason Gunthorpe [EMAIL PROTECTED] writes: On 5 Jan 2001, Goswin Brederlow wrote: If that suits your needs, feel free to write a bug report on apt about this. Yes, I enjoy closing such bug reports with a terse response. Hint: Read the bug page for APT to discover why! Jason I couldn't find any existing bug report concerning rsync support for apt-get in the long list of bugs. So why would you close such a wishlist bug report? And why with a terse response? MfG Goswin
Re: package pool and big Packages.gz file
In 05 Jan 2001 19:51:08 +0100 Goswin Brederlow [EMAIL PROTECTED] cum veritate scripsit : Hello, I'm currently discussing some changes to the rsync client with some people from the rsync ML which would uncompress compressed data on the client side (no changes to the server) and rsync that. Sounds like it wouldn't improve anything, but when you read the full description it actually does. Without that, rsyncing new debs against old ones hardly ever saves anything. Where it helps is with big packages like xfree, where several packages are identical between releases. No offence, but wouldn't it be a tad difficult to play around with it, since deb packages are not just gzipped archives, but an ar archive containing gzipped tar archives? regards, junichi -- University: [EMAIL PROTECTED]Netfort: [EMAIL PROTECTED] dancer, a.k.a. Junichi Uekawa http://www.netfort.gr.jp/~dancer Dept. of Knowledge Engineering and Computer Science, Doshisha University. ... Long Live Free Software, LIBERTAS OMNI VINCIT.
Re: package pool and big Packages.gz file
== Junichi Uekawa [EMAIL PROTECTED] writes: In 05 Jan 2001 19:51:08 +0100 Goswin Brederlow [EMAIL PROTECTED] cum veritate scripsit : Hello, I'm currently discussing some changes to the rsync client with some people from the rsync ML which would uncompress compressed data on the client side (no changes to the server) and rsync that. Sounds like it wouldn't improve anything, but when you read the full description it actually does. Without that, rsyncing new debs against old ones hardly ever saves anything. Where it helps is with big packages like xfree, where several packages are identical between releases. No offence, but wouldn't it be a tad difficult to play around with it, since deb packages are not just gzipped archives, but an ar archive containing gzipped tar archives? Yes and no. The problem is that deb files are special ar archives, so you can't just download the files and ar them together. One way would be to download the files in the ar, ar them together and rsync again. Since ar does not change the data in it, the deb has the same data, just at different places, and rsync handles that well. This would be possible, but would require server changes. The trick is to know a bit about ar, but not too much. Just rsync the header of the ar file up to the first real file in it, then rsync that recursively, then a bit more ar file data and another file, and so on. Knowing when subfiles start and how long they are is enough. The question will be how much intelligence to teach rsync. I like rsync stupid, but still intelligent enough to do the job. It's pretty tricky, so it will be some time before anything in that direction is usable. MfG Goswin
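Goswin's "know a bit about ar, but not too much" amounts to parsing the ar member headers to learn where each sub-file starts and how long it is. A minimal sketch of that bookkeeping (the writer helper and its field values are invented for the demonstration; a real .deb additionally fixes the member order to debian-binary, control.tar.gz, data.tar.gz):

```python
AR_MAGIC = b"!<arch>\n"


def ar_member(name, data):
    """Build one ar member: a 60-byte textual header, then the data,
    padded to a 2-byte boundary (mtime/uid/gid/mode are arbitrary here)."""
    hdr = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}".format(
        name + "/", 0, 0, 0, "100644", len(data)).encode("ascii") + b"`\n"
    return hdr + data + (b"\n" if len(data) % 2 else b"")


def ar_members(blob):
    """Return (name, data_offset, size) for every member of an ar
    archive -- exactly the 'when subfiles start and how long they are'
    bookkeeping an rsync-like tool would need."""
    assert blob.startswith(AR_MAGIC)
    off, members = len(AR_MAGIC), []
    while off < len(blob):
        hdr = blob[off:off + 60]
        name = hdr[0:16].decode("ascii").rstrip(" /")
        size = int(hdr[48:58])
        members.append((name, off + 60, size))
        off += 60 + size + (size & 1)  # skip the padding byte if size is odd
    return members
```

With these offsets in hand, each contained tar.gz could be synced (or decompressed and synced) on its own, without teaching the tool anything about the rest of the container.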
Re: package pool and big Packages.gz file
Jason Gunthorpe wrote: Hint: Read the bug page for APT to discover why! Looking through the apt bugs, I saw this one, rejected: Bug#77054: wish: show current-upgraded versions on upgrade -u My private solution to this is the following patch to `apt-get':

--- algorithms.cc-ORG	Sat May 13 06:08:43 2000
+++ algorithms.cc	Sat Sep  9 22:11:19 2000
@@ -47,9 +47,13 @@
 {
    // Adapt the iterator
    PkgIterator Pkg = Sim.FindPkg(iPkg.Name());
+   const char *oldver = Pkg->CurrentVer ? Pkg.CurrentVer().VerStr() : "-";
+   const char *newver = Pkg->VersionList ? Pkg.VersionList().VerStr() : "-";
+
    Flags[Pkg->ID] = 1;
-   cout << "Inst " << Pkg.Name();
+   cout << "Inst " << Pkg.Name() << " (" << oldver << " " << newver << ")";
+
    Sim.MarkInstall(Pkg,false);

    // Look for broken conflicts+predepends.

This informs me about versions when doing apt-get --no-act install package. I like this very much, and would appreciate this going into the official apt-get command. -- Thanks, -o) Matthijs Melchior Maarssen /\\ mailto:[EMAIL PROTECTED] +31 346 570616 Netherlands _\_v
package pool and big Packages.gz file
hi, [i'm not sure if this has been resolved, lart me if you like.] my proposal to resolve the big Packages.gz is through the package pool system. add 36 or so new debian packages, namely, [a-z0-9]-packages-gz_date_all.deb the contents of each are quite obvious. ;-) and a virtual unstable-packages-gz depends on all of them. finished. apt-get update should deal with it. 1) as default, install packages-gz.deb, and finished. (against some policy ...) 2) otherwise, let the user choose from them, that is a ui design... ;-) release management could just ;-) upload a woody-packages-gz_test-1_all.deb episode I finished. episode II involves the package pool deletion algorithm. a package should only be deleted when no *-packages-gz debs reference it. my 2'c thanks for bearing with me ;-) zw
Re: package pool and big Packages.gz file
[read my previous semi-proposal] this has some more benefits, 1) package maintainers could upload (to the pool) at whatever frequency they like. 2) the release is separated from the package pool, which is a storage system; the release is a qa system. 3) releases could be managed through the BTS on the specific package-gz.deb. that surely would put much more burden on the BTS, ;-) 4) if apt-get could deal with it well, i hope all of the sub-mirror'ing issues will be gone easily. just apt-get install some-rel-packages-gz then apt-get mirror (just like download and move ...) my 2'c more ;-) zw
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 03:17:30AM +0800, zhaoway wrote: [read my previous semi-proposal] this has some more benefits, 1) package maintainers could upload (to the pool) at whatever frequency they like. in an ideal world, developers should upload to an ''xxx-auto-builder'' ;-) i'm turning out to be crappy now. ;-) bye,
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 03:02:15AM +0800, zhaoway wrote: my proposal to resolve the big Packages.gz is through the package pool system. add 36 or so new debian packages, namely, [a-z0-9]-packages-gz_date_all.deb the contents of each are quite obvious. ;-) and a virtual unstable-packages-gz depends on all of them. finished. apt-get update should deal with it how about diffs between dinstall runs?.. packages-010102-010103.gz packages-010103-010104.gz packages.gz apt would download the changes after the last update, and merge these into the Packages file; if the file gets corrupted, it would attempt a full update. This wouldn't make a big difference in the load that the master ftp site has to handle, at least if a maximum of some 7 of these were stored. Regards, Sami Haahtinen -- every nerd knows how to enjoy the little things of life, like: rm -rf windows
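The dated-diff scheme above can be sketched as follows, treating a Packages index as a map from package name to stanza: dinstall would publish make_diff output between runs, and the client would apply the diffs in order, falling back to a full fetch when a checksum mismatch signals corruption. All names and formats here are invented for the illustration.

```python
import hashlib


def make_diff(old, new):
    """Diff two package indexes (package name -> stanza text), as
    dinstall could do between two runs."""
    removed = sorted(p for p in old if p not in new)
    changed = {p: s for p, s in new.items() if old.get(p) != s}
    return removed, changed


def apply_diff(index, diff):
    """Client side: merge one dated diff into the local index."""
    removed, changed = diff
    merged = {p: s for p, s in index.items() if p not in removed}
    merged.update(changed)
    return merged


def index_checksum(index):
    """Published alongside the full index so the client can detect a
    corrupted merge and fall back to a full download."""
    text = "".join(p + s for p, s in sorted(index.items()))
    return hashlib.md5(text.encode()).hexdigest()
```

Keeping only the last handful of diffs on the server (the "some 7 of these" above) bounds the storage cost; a client that lags further behind simply takes the full-download path.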
Re: package pool and big Packages.gz file
The only other possibility not yet proposed (?) would be to split the Packages file by section: base-packages games-packages x11-packages net-packages Then a server that just doesn't do x11 or doesn't do games has no need to keep up with the available x11 or games packages. Vince Mulhollon
Re: package pool and big Packages.gz file
On Thu, Jan 04, 2001 at 03:07:00PM -0600, Vince Mulhollon wrote: The only other possibility not yet proposed (?) would be to split the Packages file by section: base-packages games-packages x11-packages net-packages Then a server that just doesn't do x11 or doesn't do games has no need to keep up with the available x11 or games packages. how would the package manager (namely apt) know which ones you need.. even if you don't have X11 installed (and apt assumes you don't need the X11 packages file), that doesn't mean that you wouldn't want the x11 packages file. the same goes for net (which is a weird definition in any case) and games; base is the only reasonable one. but without the others it's not needed either... -- every nerd knows how to enjoy the little things of life, like: rm -rf windows
Re: package pool and big Packages.gz file
On Thu, Jan 04, 2001 at 11:01:15PM +0200, Sami Haahtinen wrote: On Fri, Jan 05, 2001 at 03:02:15AM +0800, zhaoway wrote: my proposal to resolve the big Packages.gz is through the package pool system. add 36 or so new debian packages, namely, [a-z0-9]-packages-gz_date_all.deb the contents of each are quite obvious. ;-) and a virtual unstable-packages-gz depends on all of them. finished. apt-get update should deal with it how about diffs between dinstall runs?.. sorry, but i don't understand here. dinstall is a server side thing here? packages-010102-010103.gz packages-010103-010104.gz packages.gz apt would download the changes after the last update, and merge these into the Packages file; if the file gets corrupted, it would attempt a full update. This wouldn't make a big difference in the load that the master ftp site has to handle, at least if a maximum of some 7 of these were stored. okay, try to group packages according to dependency. on the top, some pkg-gz-deb lists packages on the leaves of the dependency tree, and each pkg-gz-deb won't get bigger than 100k, and each of them depends on some more basic pkg-gz-deb below, some other pkg-gz-deb like the base sub-system. this way, when a user installs xdm, apt-get first installs the pkg-gz-deb which lists xdm, then, while dependency checking, it will install base-a-pkg-gz-deb etc. etc., then xdm gets installed. this way, all of xdm's dependencies will be fulfilled with the newest information available. and you can see this will surely ease up the bandwidth. (when updating gcc, i won't get additional bits of Packages.gz about xdm, xfree etc.) regards, zw
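zhaoway's grouping would turn apt-get install into a walk of the dependency tree, fetching only the small index fragments actually needed. A toy sketch of that walk (the fragment format and fetch interface are made up; real dependency resolution also has versions, alternatives, and virtual packages to cope with):

```python
def install(pkg, fetch_fragment, installed=None):
    """Resolve pkg by fetching only its own small index fragment (its
    'pkg-gz'), then recursing into its dependencies, so updating gcc
    never pulls index data about xdm or xfree."""
    if installed is None:
        installed = set()
    if pkg in installed:          # already resolved on this run
        return installed
    installed.add(pkg)
    for dep in fetch_fragment(pkg)["Depends"]:
        install(dep, fetch_fragment, installed)
    return installed
```

The bandwidth saving is exactly that fetch_fragment is only ever called for packages on the path from the requested package down to the base system.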
Re: package pool and big Packages.gz file
On Thu, Jan 04, 2001 at 11:19:59PM +0200, Sami Haahtinen wrote: how would the package manager (namely apt) know which ones you need.. even if you don't have X11 installed (and apt assumes you don't need the X11 packages file), that doesn't mean that you wouldn't want the x11 packages file. another solution is to let every single deb provide its.pkg-gz then, apt-get update will do nothing; apt-get install some.deb will first download some.pkg-gz, then check its dependencies, then grab all of them.pkg-gz, then install. and a virtual release-pkgs-gz.deb will depend on some selected part of those any.pkg-gz to make up a release. then katie will remove a package only when no release-pkgs-gz.deb (or testing, or whatever) depends on its.pkg-gz regards, zw
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 06:07:20AM +0800, zhaoway wrote: another solution is to let every single deb provide its.pkg-gz then, apt-get update will do nothing; apt-get install some.deb will first download some.pkg-gz, then check its dependencies, then grab all of them.pkg-gz, then install. that is a minimum. isn't it? ;) and then we will need some ``apt-get info pkg'' hehe.. and a virtual release-pkgs-gz.deb will depend on some selected part of those any.pkg-gz to make up a release. say one release contains 2000 packages, and each package name is 10 chars; then the whole information is a little more than 20k, compared with more than 1M nowadays. and you could have base-3.3-release, and gnome-4.4-release which depends on base-3.2-release and x-5.6-release. and chinese-2.0-release etc. ... then katie will remove a package only when no release-pkgs-gz.deb (or testing, or whatever) depends on its.pkg-gz zw
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 06:07:20AM +0800 , zhaoway wrote: On Thu, Jan 04, 2001 at 11:19:59PM +0200, Sami Haahtinen wrote: how would the package manager (namely apt) know which ones you need.. even if you don't have X11 installed (and apt assumes you don't need the X11 packages file), that doesn't mean that you wouldn't want the x11 packages file. another solution is to let every single deb provide its.pkg-gz then, apt-get update will do nothing; apt-get install some.deb will first download some.pkg-gz, then check its dependencies, then grab all of them.pkg-gz, then install. but it will immensely restrict its view of dependencies - think about virtual packages. This is really not the way. Maybe splitting as in pool/, so you only download the changed part of the whole thing. But that's about it. Maybe you can leave some part out, but .. Petr Cech -- Debian GNU/Linux maintainer - www.debian.{org,cz} [EMAIL PROTECTED] * Joy notes some people think Unix is a misspelling of Unics which is a misspelling of Emacs :)
Re: package pool and big Packages.gz file
[quote myself, ;-) this is semi-final now ;-)] another solution is to let every single deb provide its.pkg-gz then, apt-get update will do nothing; apt-get install some.deb will first download some.pkg-gz, then check its dependencies, then grab all of them.pkg-gz, then install. that is a minimum. isn't it? ;) and then we will need some ``apt-get info pkg'' hehe.. and a virtual release-pkgs-gz.deb will depend on some selected part of those any.pkg-gz to make up a release. say one release contains 2000 packages, and each package name is 10 chars; then the whole information is a little more than 20k, compared with more than 1M nowadays. and you could still do ``apt-get dist-upgrade'': just first install release-pkgs-gz.deb then go on..., OR, first get a list of all installed debs then update each of them. [some more thoughts here..., later] and you could have base-3.3-release, and gnome-4.4-release which depends on base-3.2-release and x-5.6-release. and chinese-2.0-release etc. ... then katie will remove a package only when no release-pkgs-gz.deb (or testing, or whatever) depends on its.pkg-gz zw
Re: package pool and big Packages.gz file
On Thu, Jan 04, 2001 at 11:19:25PM +0100, Petr Cech wrote:
> On Fri, Jan 05, 2001 at 06:07:20AM +0800, zhaoway wrote:
> > then apt-get update will do nothing; apt-get install some.deb will
> > first download some.pkg-gz, then check its dependencies, then grab
> > all of their .pkg-gz files, then install.
>
> But that will immensely restrict its view of dependencies -- think
> about virtual packages. This is really not the way. Maybe splitting
> the file, as in pool/, so you only download the changed part of the
> whole thing. But that's about it. Maybe you can leave some part out,
> but ..

virtual packages are weird here ;-) but they could be resolved by a
some-virtual.pkg-gz ;-)

as for the tree view of the dependency tree, like in console-apt [see
my other semi-final mail.. ;-)]: in general, if you want a tree view
of the whole tree, you will need to download the whole tree anyway,
and my approach won't prevent you from doing that. ;-)

kind regards, zw
[FINAL, for now ;-)] (Was: Re: package pool and big Packages.gz file)
final thoughts ;-)

On bigger and bigger Packages.gz files, a try. The directory structure
looks roughly like this:

  debian/dists/woody/main/binary-all/Packages.deb
  debian/pool/main/a/abba/abba_1989.orig.tar.gz
                          abba_1989-12.diff.gz
                          abba_1989-12.dsc
                          abba_1989-12_all.deb
                          abba_1989-12_all.pkg
  debian/pool/main/r/rel-chinese/rel-music_0.9_all.pkg
                                 rel-music_0.9_all.deb
                     rel-base/rel-base_200_all.pkg
                              rel-base_200_all.deb

The contents of rel-music_0.9_all.pkg are as follows. rel-base or even
rel-woody is just much more complicated. Hope so. rel-music.deb is
nearly an empty package.

  Package: rel-music
  Priority: optional
  Section: misc
  Installed-Size: 12
  Maintainer: Anthony and Cleopatra
  Architecture: all
  Source: rel-chinese
  Version: 0.9
  Depends: rel-base (= 200), abba (= 1989-12), beatles (= 1979-100),
   garbage (= 1998-7), wearied-ear (= 2.1)
  Provides: music | abba | beatles
  Filename: debian/pool/main/r/rel-chinese/rel-music_0.9_all.deb
  Size: 3492
  MD5sum: c8c730ea650cf14638d83d6bb7707cdb
  Description: Simplified music environment
   This 'task package' installs programs, data files, fonts, and
   documentation that make it easier to use Debian for simplified
   music related operations.

(Surprise, surprise, garbage didn't provide music!) Note that music is
a virtual package provided by abba and beatles.

The contents of abba_1989-12_all.pkg are as follows.

  Package: abba
  Priority: optional
  Section: sound
  Installed-Size: 140
  Maintainer: Old Man Billy
  Architecture: all
  Version: 1989-12
  Replaces: beatles
  Provides: music
  Depends: wearied-ear (= 2.0)
  Filename: pool/main/a/abba/abba_1989-12_all.deb
  Size: 33256
  MD5sum: e07899b62b7ad12c545e9998adb7c8d7
  Description: A Swedish Music Band
   ABBA was popular in the 1980's, in the last millennium. Don't
   confuse ABBA with ADDA, which is a heavy metal band.

Here, music is a virtual package provided by packages abba and
beatles.

Let's simulate some typical scenarios here.

1) apt-get update

There are roughly two purposes for this action.
One is to get an overview, to ease further processing of things like
virtual packages; the other is to install a specific package, or to do
dist-upgrade. For the second purpose, apt-get here will do nothing.
(See below.) For the first purpose, apt-get will have to download and
parse the current distribution's .pkg file according to the user's
configuration. Say, it downloads rel-music, and then sees that the
virtual package music is provided by abba and beatles.

So, generally, ``apt-get update'' will deal with rel-some__all.pkg to
get all of the overall information it will need further on. Then,
where does rel-some__all.pkg get its information? We don't want the
release manager to track down all of this information. So, where's
katie? ;-) I think the trade-off is worthwhile (indeed, only katie
gets a little more complicated) considering the scalability being
gained. Read on.

2) apt-get install abba

apt-get will first parse the previously downloaded rel-music.pkg, and
learn that abba is at version 1989-12, and that it depends on
wearied-ear (= 2.0), and wow! rel-music happens to provide wearied-ear
(= 2.1), so that's okay. Then apt-get goes on to download abba's .pkg
and parse it, and so on. When all required .pkg files have been
downloaded and parsed (an updated Packages.gz), apt-get then goes on
to download and install each of the debs. (Maybe there will be more
complicated issues; just let me know. See what's going on. ;-) Thus,
minimum data downloaded. ;-)

3) apt-get dist-upgrade

I don't know the details, but I think it's not very complicated given
the above information. (All the necessary things are there, aren't
they? ;-)

4) Packages upload

The .pkg file is generated automatically. No extra burden on most of
the developers. And developers could upload just as frequently as they
see fit. ;-) Katie will be a little ;-) more complicated. A package
will get deleted from the package pool only when no rel-X depends on
it. rel-X packages are treated specially. And some fine-tuned mirrors
could be set up.
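The ``apt-get install abba'' walk described above amounts to a recursive fetch-and-parse loop over per-package .pkg files. A toy sketch of that loop, with the pool replaced by an in-memory dict and using the made-up package names from the example (real resolution would also handle versions, Provides, and conflicts):

```python
# Toy model of the proposed scheme: fetch one small .pkg stanza per
# package and walk Depends recursively, instead of downloading one
# huge Packages file. PKG_POOL stands in for the package pool; the
# names are the invented ones from the mail.
PKG_POOL = {
    "rel-music":   {"Depends": ["rel-base", "abba", "wearied-ear"]},
    "rel-base":    {"Depends": []},
    "abba":        {"Depends": ["wearied-ear"], "Provides": ["music"]},
    "wearied-ear": {"Depends": []},
}

def fetch_pkg(name):
    # Stands in for downloading name_version_arch.pkg from the pool.
    return PKG_POOL[name]

def resolve(name, seen=None):
    """Return the closure of packages needed to install `name`."""
    seen = set() if seen is None else seen
    if name in seen:
        return seen
    seen.add(name)
    for dep in fetch_pkg(name).get("Depends", []):
        resolve(dep, seen)
    return seen
```

With this model, `resolve("rel-music")` touches only the four stanzas actually involved, which is the "minimum data downloaded" property the mail is after.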
And release management could benefit from the Bug Tracking System and
become more flexible. IMHO. ;-)

Kind regards, zw
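The .pkg stanzas shown in the proposal use the same RFC-822-style field syntax as Debian control files. A minimal illustrative parser (not apt's actual one) is short; note that continuation lines, which start with whitespace, belong to the previous field:

```python
# Minimal parser for RFC-822-style control stanzas like the .pkg
# examples above. Illustrative sketch only, not apt's real parser.
def parse_stanza(text):
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank line: end of stanza content
        if line[0] in " \t" and key is not None:
            # continuation line: append to the previous field
            fields[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    return fields

stanza = parse_stanza(
    "Package: abba\n"
    "Version: 1989-12\n"
    "Depends: wearied-ear (= 2.0)\n"
    "Description: A Swedish Music Band\n"
    " ABBA was popular in the 1980's.\n"
)
```

A real implementation would also split the Depends value into individual version-constrained relationships.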
Re: package pool and big Packages.gz file
== zhaoway [EMAIL PROTECTED] writes:

> hi, [i'm not sure if this has been resolved, lart me if you like.]
> my proposal to resolve the big Packages.gz is through the package
> pool system.

What's the problem with a big Packages file? If you don't want to
download it again and again just because of small changes, I have a
better solution for you: rsync.

apt-get update could rsync all the Packages files (yes, not the .gz
ones) and thereby download only the changed parts. On uncompressed
files rsync is very effective, and the changes can be compressed for
the actual transfer. So on update you will practically get a diff.gz
against your old Packages file.

If that suits your needs, feel free to file a bug report on apt about
this.

MfG Goswin
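The point about rsync being effective on an uncompressed Packages file can be illustrated with a toy diff: when one package version bumps, only a few lines change, so only those need to cross the wire. Here `difflib` stands in for rsync's rolling-checksum delta algorithm, and the stanzas are invented for the example:

```python
# Toy illustration of the rsync argument: a one-line change in an
# uncompressed Packages file means only that line must be fetched.
# difflib stands in for rsync's actual delta-transfer algorithm.
import difflib

old = [
    "Package: abba",
    "Version: 1989-11",
    "",
    "Package: beatles",
    "Version: 1979-100",
]
new = [
    "Package: abba",
    "Version: 1989-12",  # the only line that changed
    "",
    "Package: beatles",
    "Version: 1979-100",
]

diff = list(difflib.unified_diff(old, new, lineterm="", n=0))
# Lines the client would actually have to download.
added = [l[1:] for l in diff if l.startswith("+") and not l.startswith("+++")]
```

This is exactly why rsync works poorly on the .gz file: gzip's output changes almost everywhere after a small edit, defeating block matching, which is why the mail insists on syncing the uncompressed Packages file and compressing only the transfer.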