Re: package pool and big Packages.gz file
== Jason Gunthorpe <[EMAIL PROTECTED]> writes:

> On 8 Jan 2001, Goswin Brederlow wrote:
>> I don't need to get a file listing, apt-get tells me the name. :)
> You have missed the point: the presence of the ability to do file
> listings prevents the adoption of rsync servers with high connection
> limits.

Then that feature should be limited to non-recursive listings or turned
off. Or .listing files could be created that are just served.

> Reversed checksums (with a detached checksum file) is something someone
> should implement for debian-cd. You could even quite reasonably do that
> entirely over HTTP and not run the risk of rsync load at all.

>> At the moment the client calculates one rolling checksum and md5sum
>> per block.
> I know how rsync works, and it uses MD4.

Oops, then s/5/4/g.

>> Given a 650MB file, I don't want to know the hit/miss ratios for the
>> rolling checksum and the md5sum. Must be really bad.
> The ratio is supposed to only scale with block size, so it should be
> the same for big files and small files (ignoring the increase in block
> size with file size). The amount of time expended doing this
> calculation is not trivial however.

Hmm, the technical paper says rsync builds a 16-bit external hash table,
each entry a linked list of items containing the full 32-bit rolling
checksum (or the other 16 bits) and the md4sum. So when you have more
blocks, the hash table fills up; you get more hits on the first level and
need to search a linked list. With a block size of 1K a CD image has
about 10 items per hash entry - it's 1000% full. The time wasted just
checking the rolling checksums must be huge.

And with ~650,000 rolling checksum lookups for the image, there's a
~10/65536 chance of hitting the same rolling checksum with a different
md4sum, so that's about 100 times per CD, just by pure chance. If the
images match, it's 65 times. So the better the match and the more blocks
you have, the more CPU it takes. Of course larger blocks take more time
to compute an md4sum, but you will have fewer blocks then.
> For CD images the concern is of course available disk bandwidth;
> reversed checksums eliminate that bottleneck.

That anyway. And RAM.

MfG
        Goswin
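The "10 items per hash entry" figure above is easy to sanity-check. A minimal back-of-envelope sketch, using the sizes the message assumes (650MB image, 1K blocks, rsync's 16-bit first-level hash):

```python
# Sketch of the hash-table load argument: rsync files each block into one of
# 65536 buckets keyed by a 16-bit hash. Numbers are the thread's assumptions.
image_bytes = 650 * 1024 * 1024   # one CD image
block_size = 1024                  # 1K blocks, as in the message
hash_buckets = 1 << 16             # rsync's 16-bit first-level hash table

blocks = image_bytes // block_size
items_per_bucket = blocks / hash_buckets

print(blocks)             # 665600 blocks to index
print(items_per_bucket)   # ~10.16 entries per bucket, i.e. "1000% full"
```

So every first-level hit costs a walk down a roughly ten-entry linked list, which is the overload being complained about.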
Re: package pool and big Packages.gz file
On 8 Jan 2001, Goswin Brederlow wrote:

> Then that feature should be limited to non-recursive listings or turned
> off. Or .listing files could be created that are just served.

*cough* rproxy *cough*

> So when you have more blocks, the hash table fills up. So you have more
> hits on the first level and need to search a linked list. With a block
> size of 1K a CD image has 10 items per hash entry - it's 1000% full.
> The time wasted just checking the rolling checksums must be huge.

Sure, but that is trivially solvable and is really a minor amount of time
compared with computing the MD4 hashes. In fact, when you start talking
about 650,000 blocks you want to reconsider the design choices that were
made with rsync's searching - it is geared toward small files and is not
really optimal for big ones.

> So the better the match, the more blocks you have, the more CPU it
> takes. Of course larger blocks take more time to compute an md4sum, but
> you will have fewer blocks then.

No. The smaller the blocks, the more CPU time it will take to compute the
MD4 hashes. Expect MD4 to run at about 100 MB/sec on modern hardware, so
you are looking at burning 6 seconds of CPU time to verify the local CD
image. If you start getting lots of 32-bit checksum matches with MD4
mismatches due to too large a block size, you could easily double or
triple the number of MD4 calculations you need. That is still totally
dwarfed by the 10 MB/sec I/O throughput you can expect while reading a
600 MB ISO file.

Jason
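Jason's CPU-versus-I/O argument can be checked with one line of arithmetic. The throughput figures below are the thread's assumptions about circa-2001 hardware, not measurements:

```python
# Back-of-envelope check of the claim above: hashing is cheap compared with
# reading the image off disk. Rates are the thread's assumptions.
iso_mb = 600
md4_mb_per_sec = 100   # assumed MD4 hashing throughput
disk_mb_per_sec = 10   # assumed sequential disk read throughput

cpu_seconds = iso_mb / md4_mb_per_sec   # time spent computing MD4 sums
io_seconds = iso_mb / disk_mb_per_sec   # time spent just reading the ISO

print(cpu_seconds)  # 6.0  -> the "6 seconds of CPU time"
print(io_seconds)   # 60.0 -> I/O dwarfs the hashing cost ten to one
```

Even tripling the MD4 work (18 seconds) stays well under the minute of disk time, which is why reversed checksums - eliminating the server's read of the whole file - attack the real bottleneck.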
Re: package pool and big Packages.gz file
On Fri, 5 Jan 2001 09:33:05 -0700 (MST) Jason Gunthorpe <[EMAIL PROTECTED]> wrote:

>> If that suits your needs, feel free to write a bugreport on apt about
>> this.
> Yes, I enjoy closing such bug reports with a terse response. Hint: Read
> the bug page for APT to discover why!

From bug report #76118: "No. Debian can not support the use of rsync for
anything other than mirroring, APT will never support it."

Why? Because if everyone used rsync, the load on the servers that
supported rsync would be too high? Or something else?
-- 
Sam Vilain, [EMAIL PROTECTED]  WWW: http://sam.vilain.net/
GPG public key: http://sam.vilain.net/sam.asc
Re: package pool and big Packages.gz file
On Fri, 5 Jan 2001 19:08:38 +0200 [EMAIL PROTECTED] (Sami Haahtinen) wrote:

>> Or, can rsync sync binary files?
> hmm.. this sounds like something worth implementing..

rsync can, but the problem is that with a compressed stream, if you
insert or alter data early in the stream, the data after that change is
radically different. But... you could use it successfully against the
.tar files inside the .deb, which are normally compressed. This would
probably require some special implementation of rsync, or having the
uncompressed packages on the server and putting the magic in apt.

Or perhaps a program "apt-mirror" is called for, which talks its own
protocol to other copies of itself and does a magic job of selectively
updating mirror copies of the Debian archive using the rsync algorithm.
This would be similar to the apt-get and apt-move pair, but actually
sticking things into a directory structure that looks like the Debian
mirror. Then, if you want to enable it, turn on the server version and
share your mirror with your friends inside your corporate network! Or an
authenticated version, so that a person with their own permanent internet
connection could share their archive with a handful of friends - having
an entire mirror would be too costly for them. I think this has some
potential to be quite useful and reduce bandwidth requirements. It could
use GPG signatures to check that nothing funny is going on, too.

Either that, or keep a number of patch files or .xd files for a couple of
old revs per package against the uncompressed contents of packages, to
allow small changes to packages to be quick. Or perhaps implement this as
patch packages: a special .deb that contains only the changed files and
upgrades the package.
-- 
Sam Vilain, [EMAIL PROTECTED]  WWW: http://sam.vilain.net/
GPG public key: http://sam.vilain.net/sam.asc
Re: package pool and big Packages.gz file
== Sam Vilain <[EMAIL PROTECTED]> writes:

> From bug report #76118: "No. Debian can not support the use of rsync
> for anything other than mirroring, APT will never support it."
> Why? Because if everyone used rsync, the load on the servers that
> supported rsync would be too high? Or something else?

Actually the load should drop, provided the following features are added:

1. cached checksums, and pulling instead of pushing
2. client-side unpacking of compressed streams

That way the rsync servers would first serve the checksum file from cache
(being 200-1000 times smaller than the real file) and then just the
blocks the client asks for. So if 1% of the file being rsynced matches,
it breaks even, and everything above that saves bandwidth.

The current mode of operation of rsync works in reverse, so all the
computation is done on the server every time, which of course is a heavy
load on the server.

I hope both features will work without changing the server, but if not,
we will have to wait until servers catch up with the feature.

MfG
        Goswin
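The "200-1000 times smaller" figure can be sanity-checked. Assuming rsync's usual layout of 4 bytes of rolling checksum plus 16 bytes of strong checksum per block (an assumption; the message does not spell out the format), the detached checksum file shrinks relative to the data as the block size grows:

```python
# How many times smaller a detached checksum file is than the data it
# describes, assuming 4 bytes of rolling checksum + 16 bytes of MD4 per
# block (rsync's usual per-block signature size).
per_block_bytes = 4 + 16

def ratio(block_size):
    """Data bytes represented per checksum byte."""
    return block_size / per_block_bytes

print(ratio(4 * 1024))    # 204.8 -> ~200x smaller with 4K blocks
print(ratio(16 * 1024))   # 819.2 -> ~800x smaller with 16K blocks
```

So the quoted 200-1000x range corresponds to block sizes of roughly 4K to 20K, which is in the range rsync actually picks for large files.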
Re: package pool and big Packages.gz file
Sam Vilain <[EMAIL PROTECTED]> writes:

> rsync can, but the problem is that with a compressed stream, if you
> insert or alter data early in the stream, the data after that change is
> radically different. But... you could use it successfully against the
> .tar files inside the .deb, which are normally compressed. This would
> probably require some special implementation of rsync, or having the
> uncompressed packages on the server and putting the magic in apt.
[...]
> Either that, or keep a number of patch files or .xd files for a couple
> of old revs per package against the uncompressed contents of packages,
> to allow small changes to packages to be quick. Or perhaps implement
> this as patch packages: a special .deb that contains only the changed
> files and upgrades the package.

I suggest you have a look at 'tje' by Joost Witteveen
(http://joostje.op.het.net/tje/index.html). It is written specifically
with the goal of syncing Debian mirrors with minimum bandwidth use. It
doesn't use the rsync algorithm, but something similar. It understands
.debs and claims to have lower server CPU usage than rsync, since it
caches diffs and md5sums.

It would be really nice if anybody with an up-to-date mirror could
volunteer to provide a machine to set up a tje server, to test it a
little more...

Falk
Re: package pool and big Packages.gz file
On Sun, Jan 07, 2001 at 03:49:43PM +0100, Goswin Brederlow wrote:

> Actually the load should drop, provided the following features are
> added: [...]

The load should drop from that induced by the current rsync setup (for
the mirrors), but if many, many more clients start using rsync (instead
of FTP/HTTP), I think there will still be a significant net increase in
load. Whether it would be enough to cause a problem is debatable, and I
honestly don't know either way.

-- 
 - mdz
Re: package pool and big Packages.gz file
== Matt Zimmerman <[EMAIL PROTECTED]> writes:

> The load should drop from that induced by the current rsync setup (for
> the mirrors), but if many, many more clients start using rsync (instead
> of FTP/HTTP), I think there will still be a significant net increase in
> load. Whether it would be enough to cause a problem is debatable, and I
> honestly don't know either way.

When the checksums are cached there will be no CPU load caused by rsync,
since it will only transfer the file. And the checksum files will be
really small, as I said, so if some similarity is found, the reduction in
transferred data will more than make up for the checksum download. The
only increase is the space needed to store the checksums in some form of
cache.

MfG
        Goswin
Re: package pool and big Packages.gz file
On 7 Jan 2001, Goswin Brederlow wrote:

> Actually the load should drop, provided the following features are
> added:
> 1. cached checksums, and pulling instead of pushing
> 2. client-side unpacking of compressed streams

Apparently reversing the direction of rsync infringes on a patent. Plus
there is the simple matter that the file listing and file download
features cannot be separated, and doing a listing of all files on our
site is non-trivial. Once you strip all that out you have rproxy.

Reversed checksums (with a detached checksum file) is something someone
should implement for debian-cd. You could even quite reasonably do that
entirely over HTTP and not run the risk of rsync load at all. Such a
system for Packages files would also be acceptable, I think.

Jason
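A sketch of what "reversed checksums over HTTP" might look like: the server publishes one strong checksum per fixed block, once, and each client compares its own local copy against that file and fetches only the mismatching blocks (e.g. with HTTP Range requests). The checksum-file layout, block size, and function names below are invented for illustration; the thread does not specify a format:

```python
import hashlib

BLOCK = 8192  # assumed block size; the thread leaves this open

def make_checksum_file(data):
    """Server side, run once per release: one strong hash per block."""
    return [hashlib.md5(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def blocks_to_fetch(local, remote_sums):
    """Client side: compare local blocks at fixed offsets against the
    published checksums; return indices that must be re-downloaded."""
    needed = []
    for n, remote in enumerate(remote_sums):
        chunk = local[n * BLOCK:(n + 1) * BLOCK]
        if hashlib.md5(chunk).hexdigest() != remote:
            needed.append(n)
    return needed

old = b"A" * BLOCK + b"B" * BLOCK + b"C" * BLOCK   # client's stale copy
new = b"A" * BLOCK + b"X" * BLOCK + b"C" * BLOCK   # what the server has
print(blocks_to_fetch(old, make_checksum_file(new)))  # [1] - only block 1 changed
```

Note this does fixed-offset comparison only (no rolling search), which is why the server's CPU and disk cost drop to zero after the checksum file is generated - all the comparing happens on the client.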
Re: package pool and big Packages.gz file
Goswin == Goswin Brederlow <[EMAIL PROTECTED]> writes:

> Actually the load should drop, providing the following feature add-ons:

How does rproxy cope? Does it require a high load on the server? I
suspect not, but need to check on this. I think of rsync as just being a
quick hack; rproxy is the (long-term) direction we should be headed in.

rproxy is the same as rsync, but based on the HTTP protocol, so it should
be possible (in theory) to integrate it into programs like Squid, Apache
and Mozilla (or so the authors claim).
-- 
Brian May <[EMAIL PROTECTED]>
Re: package pool and big Packages.gz file
== Brian May <[EMAIL PROTECTED]> writes:

> How does rproxy cope? Does it require a high load on the server? I
> suspect not, but need to check on this. I think of rsync as just being
> a quick hack; rproxy is the (long-term) direction we should be headed
> in. rproxy is the same as rsync, but based on the HTTP protocol, so it
> should be possible (in theory) to integrate it into programs like
> Squid, Apache and Mozilla (or so the authors claim).

URL? Sounds more like encapsulation of an rsync-similar protocol in html,
but it's hard to tell from the few words you write. Could be interesting
though.

Anyway, it will not resolve the problem with compressed files if it's
just like rsync.

MfG
        Goswin
Re: package pool and big Packages.gz file
Goswin == Goswin Brederlow <[EMAIL PROTECTED]> writes:

> URL?

<URL:http://linuxcare.com.au/projects/rproxy/>

The documentation seems very comprehensive, but I am not sure when it was
last updated.

> Sounds more like encapsulation of an rsync-similar protocol in html,
> but it's hard to tell from the few words you write. Could be
> interesting though.

errr... I think you mean http, not html.

> Anyway, it will not resolve the problem with compressed files if it's
> just like rsync.

True; however, I was thinking more in the context of Packages files and
other uncompressed files. It would be good, though, if these issues
regarding deb packages could be resolved. Then again, perhaps I was a bit
blunt with that statement on rsync; rsync will always have its uses,
e.g. copying private data.
-- 
Brian May <[EMAIL PROTECTED]>
Re: package pool and big Packages.gz file
== Jason Gunthorpe <[EMAIL PROTECTED]> writes:

> Apparently reversing the direction of rsync infringes on a patent.

When I rsync a file, rsync starts ssh to connect to the remote host and
starts rsync there in the reverse mode. You say that the receiving end is
violating a patent and the sending end is not? Hmm, which patent anyway?
So I have to fork an rsync-non-US because of a patent?

> Plus there is the simple matter that the file listing and file download
> features cannot be separated. Doing a listing of all files on our site
> is non-trivial.

I don't need to get a file listing, apt-get tells me the name. :) Also I
can do "rsync -v host::dir" and parse the output to grab the actual files
with another rsync. So file listing and downloading are absolutely
separable. Doing a listing of all files probably results in a timeout;
the hard drives are too slow.

> Once you strip all that out you have rproxy.
>
> Reversed checksums (with a detached checksum file) is something someone
> should implement for debian-cd. You could even quite reasonably do that
> entirely over HTTP and not run the risk of rsync load at all.

At the moment the client calculates one rolling checksum and md5sum per
block. The server, on the other hand, calculates the rolling checksum per
byte, and for each hit it calculates an md5sum for one block. Given a
650MB file, I don't want to know the hit/miss ratios for the rolling
checksum and the md5sum. Must be really bad. The smaller the file, the
fewer wrong md5sums need to be calculated.

> Such a system for Package files would also be acceptable I think.

For Packages files even cvs -z9 would be fine. They are comparatively
small compared to the rest of the load, I would think.
But I, just as you do, think that it would be a really good idea to have
precalculated rolling checksums and md5sums, maybe even for various block
sizes, and let the client do the time-consuming guessing and calculating.
That would keep rsync from reading every file served twice, as it does
now when the files are dissimilar.

May the Source be with you.
        Goswin
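The "rolling checksum per byte" the server computes is only feasible because the weak checksum can be updated in O(1) as the window slides, rather than recomputed from scratch. A sketch of an rsync-style weak checksum (two 16-bit sums, in the style of the rsync technical report; this is an illustration, not rsync's exact code):

```python
# rsync-style weak rolling checksum: two 16-bit sums (s1, s2) over a window,
# updatable in O(1) when the window slides forward by one byte.
M = 1 << 16

def weak_sum(block):
    """Compute the checksum of a whole block from scratch."""
    s1 = sum(block) % M
    s2 = sum((len(block) - i) * b for i, b in enumerate(block)) % M
    return (s2 << 16) | s1

def roll(s, out_byte, in_byte, blocklen):
    """Slide the window one byte: drop out_byte at the front, append in_byte."""
    s1, s2 = s & 0xFFFF, s >> 16
    s1 = (s1 - out_byte + in_byte) % M
    s2 = (s2 - blocklen * out_byte + s1) % M
    return (s2 << 16) | s1

data = bytes(range(200))
n = 64
s = weak_sum(data[0:n])
for i in range(1, 100):
    s = roll(s, data[i - 1], data[i - 1 + n], n)
    assert s == weak_sum(data[i:i + n])  # rolled sum matches recomputation
print("rolling update matches full recomputation")
```

This is exactly what makes precalculated checksums attractive: the cheap rolling scan can run on whichever side already has to read the file, while the expensive strong-hash comparisons happen only on weak-sum hits.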
Re: package pool and big Packages.gz file
On 8 Jan 2001, Goswin Brederlow wrote:

>> Apparently reversing the direction of rsync infringes on a patent.
> When I rsync a file, rsync starts ssh to connect to the remote host and
> starts rsync there in the reverse mode.

Not really; you have to use quite a different set of operations to do it
one way vs the other. The core computation is the same, mind you.

> Hmm, which patent anyway?

Don't know, I never heard back from Tridge on that.

> I don't need to get a file listing, apt-get tells me the name. :)

You have missed the point: the presence of the ability to do file
listings prevents the adoption of rsync servers with high connection
limits.

>> Reversed checksums (with a detached checksum file) is something
>> someone should implement for debian-cd. You could even quite
>> reasonably do that entirely over HTTP and not run the risk of rsync
>> load at all.
> At the moment the client calculates one rolling checksum and md5sum per
> block.

I know how rsync works, and it uses MD4.

> Given a 650MB file, I don't want to know the hit/miss ratios for the
> rolling checksum and the md5sum. Must be really bad.

The ratio is supposed to only scale with block size, so it should be the
same for big files and small files (ignoring the increase in block size
with file size). The amount of time expended doing this calculation is
not trivial, however. For CD images the concern is of course available
disk bandwidth; reversed checksums eliminate that bottleneck.

Jason
Re: package pool and big Packages.gz file
Quoting Goswin Brederlow <[EMAIL PROTECTED]>:

>> Or, can rsync sync binary files?
> Of course, but forget it with compressed data.

Doesn't gzip have a --rsync option, or somesuch? Apparently Andrew
Tridgell (Samba, rsync) has a patch to do this, but I don't know whether
he passed it on to the gzip maintainers.

(Apparently he's working on a --fuzzy flag for matching rsyncs between,
say, foo-1.0.deb and foo-1.1.deb. He says it should be called the
--debian flag.)

Cheerio,

Andrew Stribblehill
Systems programmer, IT Service, University of Durham, England
Re: package pool and big Packages.gz file
Andrew Stribblehill <[EMAIL PROTECTED]> wrote:

> Doesn't gzip have a --rsync option, or somesuch? Apparently Andrew
> Tridgell (Samba, rsync) has a patch to do this, but I don't know
> whether he passed it on to the gzip maintainers.

I like the idea of having plugins for rsync to handle different kinds of
data. So the gzip plugin will decompress the data, and the rsync
algorithm can work on the decompressed data. Much better.

> (Apparently he's working on a --fuzzy flag for matching rsyncs between,
> say, foo-1.0.deb and foo-1.1.deb. He says it should be called the
> --debian flag.)

A deb plugin would be better. :)
-- 
Sam Couter | Internet Engineer | http://www.topic.com.au/
[EMAIL PROTECTED] | tSA Consulting | OpenPGP key available on key servers
OpenPGP fingerprint: A46B 9BB5 3148 7BEA 1F05 5BD5 8530 03AE DE89 C75C
Re: package pool and big Packages.gz file
Sam == Sam Couter <[EMAIL PROTECTED]> writes:

> I like the idea of having plugins for rsync to handle different kinds
> of data. So the gzip plugin will decompress the data, and the rsync
> algorithm can work on the decompressed data. Much better.
> [...]
> A deb plugin would be better. :)

Sounds like a good idea to me. Although don't get the two issues
confused:

1. difference in filename.
2. format of file.

Although I guess in most cases the two will always be linked (e.g.
choosing the best filename really depends on the format, as ideally the
most similar *.deb package should be used, and this means implementing
Debian rules for comparing versions), will this always be the case?
-- 
Brian May <[EMAIL PROTECTED]>
Re: package pool and big Packages.gz file
On Sun, Jan 07, 2001 at 11:43:39AM +1100, Sam Couter wrote:

> A deb plugin would be better. :)

One problem with a deb plugin is that .debs are signed in compressed
form. gzip isn't guaranteed to produce the same compressed file from
identical uncompressed files on different architectures and releases.
Varying the compression flags can also change the compressed file.

-Drake
Re: package pool and big Packages.gz file
On Sun, Jan 07, 2001 at 12:53:14PM +1100, Drake Diedrich wrote:

>> A deb plugin would be better. :)
> One problem with a deb plugin is that .debs are signed in compressed
> form. gzip isn't guaranteed to produce the same compressed file from
> identical uncompressed files on different architectures and releases.
> Varying the compression flags can also change the compressed file.

It shouldn't be a problem to tweak things so that the resulting files end
up exactly the same. This is rsync, after all, and that is the program's
goal. For instance, uncompressed blocks could be used for comparison, but
the gzip header copied exactly.

-- 
 - mdz
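The non-determinism Drake describes is easy to demonstrate with any modern gzip implementation: the same bytes compressed with different flags give different compressed output, even though both decompress identically. A minimal sketch using Python's gzip module (standing in for the gzip tool the thread is discussing):

```python
import gzip

data = b"identical uncompressed contents\n" * 1000

# Same input, different compression flags -> different compressed bytes
# (the .deb's signed, compressed representation would differ too).
out_fast = gzip.compress(data, compresslevel=1, mtime=0)
out_best = gzip.compress(data, compresslevel=9, mtime=0)

print(out_fast != out_best)               # True: compressed bytes differ
print(gzip.decompress(out_fast) == data)  # True: contents still identical
print(gzip.decompress(out_best) == data)  # True
```

(Note `mtime=0` is needed even here: the gzip header embeds a timestamp, another source of byte-level variation between otherwise identical compressions.) This is why comparing uncompressed blocks while copying the compressed header verbatim, as mdz suggests, is attractive.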
Re: [FINAL, for now ;-)] (Was: Re: package pool and big Packages.gz file)
If you don't like large Packages files, implement an rsync transfer
method for them.

-- 
see shy jo
Re: package pool and big Packages.gz file
On 5 Jan 2001, Goswin Brederlow wrote:

> If that suits your needs, feel free to write a bugreport on apt about
> this.

Yes, I enjoy closing such bug reports with a terse response. Hint: Read
the bug page for APT to discover why!

Jason
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 03:05:03AM +0100, Goswin Brederlow wrote:

> Whats the problem with a big Packages file? If you don't want to
> download it again and again just because of small changes I have a
> better solution for you: rsync.
>
> apt-get update could rsync all Packages files (yes, not the .gz ones)
> and thereby download only the changed parts. On uncompressed files
> rsync is very effective and the changes can be compressed for the
> actual transfer. So on update you will practically get a diff.gz
> against your old Packages file.

this would bring us to apt renaming the old deb (if there is one) to the
name of the new package and rsyncing those. and we would save some time
once again...

or, can rsync sync binary files? hmm.. this sounds like something worth
implementing..

-- 
every nerd knows how to enjoy the little things of life, like:
rm -rf windows
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 05:46:35AM +0800, zhaoway wrote:

> how about diffs between dinstall runs?..

sorry, but i don't understand here. dinstall is a server side thing here?
yes, when dinstall runs it would copy the old Packages file to, let's
say, packages.old and write its changes to the new file.. after it's done
it would diff packages.old and packages...

  packages-010102-010103.gz
  packages-010103-010104.gz
  packages.gz

apt would download the changes made since its last update and merge them
into the Packages file; if the file gets corrupted, it would fall back to
a full update.

> on the top, some pkg-gz-deb lists packages at the leaf of the
> dependency tree, and each pkg-gz-deb won't get bigger than 100k, and
> each of them depends on some more basic pkg-gz-deb below, like the base
> sub-system. this way, when the user installs xdm, apt-get first
> installs the pkg-gz-deb which lists xdm, then, through dependency
> checking, it will install base-a-pkg-gz-deb etc. etc., then xdm gets
> installed. this way, all of xdm's dependencies will be fulfilled with
> the newest information available. and you can see this will surely ease
> up the bandwidth. (when updating gcc, i won't get additional bits of
> Packages.gz about xdm, xfree etc.)

wouldn't this make it a BIT too difficult?

-- 
every nerd knows how to enjoy the little things of life, like:
rm -rf windows
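The dated-diff scheme above can be sketched in a few lines: the client applies each day's diff in sequence and falls back to a full download if a result fails to verify. The diff format here (replacement lines keyed by line number, plus an md5 of the expected result) is invented purely for illustration; a real implementation would presumably ship standard diff/patch output:

```python
# Sketch of the dated-diff idea: apply packages-DATE1-DATE2 diffs in order,
# verifying each intermediate result; on mismatch, fall back to the full file.
# The diff representation below is a made-up stand-in, not a real format.
import hashlib

def md5(text):
    return hashlib.md5(text.encode()).hexdigest()

def apply_diffs(local, diffs, full_copy):
    for diff in diffs:                        # e.g. packages-010102-010103, ...
        lines = local.splitlines(True)
        for lineno, newline in diff["changes"]:
            lines[lineno] = newline
        local = "".join(lines)
        if md5(local) != diff["result_md5"]:
            return full_copy                  # corrupted: do a full update
    return local

old = "apt 0.3.19\ndpkg 1.8.0\n"
new = "apt 0.5.0\ndpkg 1.8.0\n"
diff = {"changes": [(0, "apt 0.5.0\n")], "result_md5": md5(new)}
print(apply_diffs(old, [diff], new) == new)   # True
```

Keeping only some seven dated diffs, as suggested below in the thread, bounds both the server's storage and the longest chain a client ever applies.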
Re: package pool and big Packages.gz file
Previously Sami Haahtinen wrote:

> this would bring us to apt renaming the old deb (if there is one) to
> the name of the new package and rsyncing those. and we would save some
> time once again...

There is a --fuzzy-names patch for rsync that makes rsync do that itself.

> Or, can rsync sync binary files?

Yes.

> hmm.. this sounds like something worth implementing..

Don't bother, it's been done already. Ask Rusty for details.

Wichert.

-- 
 / Generally uninteresting signature - ignore at your convenience  \
| [EMAIL PROTECTED]  http://www.liacs.nl/~wichert/ |
| 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0 2805 3CB8 9250 2FA3 BC2D |
Re: package pool and big Packages.gz file
== Sami Haahtinen <[EMAIL PROTECTED]> writes:

> this would bring us to apt renaming the old deb (if there is one) to
> the name of the new package and rsyncing those. and we would save some
> time once again...

That's what the debian-mirror script does (about half of the script is
just for that). It also uses old tar.gz, orig.tar.gz, diff.gz and dsc
files.

> Or, can rsync sync binary files?

Of course, but forget it with compressed data.

> hmm.. this sounds like something worth implementing..

I'm currently discussing some changes to the rsync client with some
people from the rsync ML which would uncompress compressed data on the
client side (no changes to the server) and rsync that. Sounds like it
wouldn't improve anything, but when you read the full description it
actually does. Before that, rsyncing new debs against old ones hardly
ever saves anything. Where it helps is with big packages like xfree,
where several packages are identical between releases.

MfG
        Goswin
Re: package pool and big Packages.gz file
== Jason Gunthorpe <[EMAIL PROTECTED]> writes:

> Yes, I enjoy closing such bug reports with a terse response. Hint: Read
> the bug page for APT to discover why!

I couldn't find any existing bug report concerning rsync support for
apt-get in the long list of bugs. So why would you close such a wishlist
bug report? And why with a terse response?

MfG
        Goswin
Re: package pool and big Packages.gz file
In 05 Jan 2001 19:51:08 +0100 Goswin Brederlow <[EMAIL PROTECTED]> cum
veritate scripsit:

> I'm currently discussing some changes to the rsync client with some
> people from the rsync ML which would uncompress compressed data on the
> client side (no changes to the server) and rsync that.

No offence, but wouldn't it be a tad difficult to play around with,
since deb packages are not just gzipped archives, but ar archives
containing gzipped tar archives?

regards,
        junichi
-- 
University: [EMAIL PROTECTED]  Netfort: [EMAIL PROTECTED]
dancer, a.k.a. Junichi Uekawa   http://www.netfort.gr.jp/~dancer
Dept. of Knowledge Engineering and Computer Science, Doshisha University.
... Long Live Free Software, LIBERTAS OMNI VINCIT.
Re: package pool and big Packages.gz file
== Junichi Uekawa <[EMAIL PROTECTED]> writes:

> No offence, but wouldn't it be a tad difficult to play around with,
> since deb packages are not just gzipped archives, but ar archives
> containing gzipped tar archives?

Yes and no. The problem is that deb files are special ar archives, so you
can't just download the member files and ar them together. One way would
be to download the files in the ar, ar them together and rsync again.
Since ar does not change the data in it, the deb has the same data, just
at different places, and rsync handles that well. This would be possible,
but would require server changes.

The trick is to know a bit about ar, but not too much. Just rsync the
header of the ar file up to the first real file in it, then rsync that
file recursively, then a bit more ar file data and another file, and so
on. Knowing when subfiles start and how long they are is enough.

The question will be how much intelligence to teach rsync. I like rsync
stupid, but still intelligent enough to do the job. It's pretty tricky,
so it will be some time before anything in that direction is usable.

MfG
        Goswin
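"Knowing when subfiles start and how long they are" really is all the ar format requires: an 8-byte magic, then per-member 60-byte headers whose bytes 48-57 hold the member size in decimal. A minimal sketch that builds and walks a .deb-shaped ar archive in memory (the payloads are dummy stand-ins, not a real control.tar.gz/data.tar.gz):

```python
# Minimal ar walker: a .deb is "!<arch>\n" followed by members, each with a
# 60-byte header (name 16, mtime 12, uid 6, gid 6, mode 8, size 10, "`\n").
def ar_member(name, payload):
    header = "%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (name, 0, 0, 0, "100644", len(payload))
    padding = b"\n" if len(payload) % 2 else b""   # members are 2-byte aligned
    return header.encode() + payload + padding

deb = (b"!<arch>\n"
       + ar_member("debian-binary", b"2.0\n")
       + ar_member("control.tar.gz", b"<control dummy>")
       + ar_member("data.tar.gz", b"<data dummy bytes>"))

def members(blob):
    """Yield (name, payload_offset, size) for each ar member."""
    assert blob[:8] == b"!<arch>\n"
    off = 8
    while off < len(blob):
        name = blob[off:off + 16].decode().rstrip()
        size = int(blob[off + 48:off + 58])
        yield name, off + 60, size
        off += 60 + size + (size % 2)              # skip header, body, padding

print([(n, s) for n, _, s in members(deb)])
# [('debian-binary', 4), ('control.tar.gz', 15), ('data.tar.gz', 18)]
```

That offset/size table is exactly what a "just intelligent enough" rsync would need: rsync the header bytes literally, and recurse into each member (decompressing it first) at its known offset.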
Re: package pool and big Packages.gz file
Jason Gunthorpe wrote:

> Hint: Read the bug page for APT to discover why!

Looking through the apt bugs, I saw this one, rejected:

  Bug#77054: wish: show current->upgraded versions on upgrade -u

My private solution to this is the following patch to `apt-get':

--- algorithms.cc-ORG   Sat May 13 06:08:43 2000
+++ algorithms.cc       Sat Sep  9 22:11:19 2000
@@ -47,9 +47,13 @@
 {
    // Adapt the iterator
    PkgIterator Pkg = Sim.FindPkg(iPkg.Name());
+   const char *oldver = Pkg->CurrentVer ? Pkg.CurrentVer().VerStr() : "-";
+   const char *newver = Pkg->VersionList ? Pkg.VersionList().VerStr() : "-";
+
    Flags[Pkg->ID] = 1;
-   cout << "Inst " << Pkg.Name();
+   cout << "Inst " << Pkg.Name() << " (" << oldver << " -> " << newver << ")";
+
    Sim.MarkInstall(Pkg,false);
    // Look for broken conflicts+predepends.

This informs me about versions when doing apt-get --no-act install
package. I like this very much, and would appreciate this going into the
official apt-get command.

-- 
Thanks,                                           -o)
Matthijs Melchior                   Maarssen      /\\
mailto:[EMAIL PROTECTED]  +31 346 570616  Netherlands  _\_v
Re: package pool and big Packages.gz file
[read my previous semi-proposal]

this has some more benefits,

1) package maintainers could upload (to the pool) at whatever frequency
   they like.
2) release is separated from the package pool, which is a storage system;
   release is a QA system.
3) releases could be managed through the BTS on specific package-gz.deb
   files. that surely would put much more burden on the BTS ;-)
4) if apt-get could deal with it well, i hope all of the sub-mirroring
   issues will be gone easily. just apt-get install some-rel-packages-gz
   then apt-get mirror (just like download and move ...)

my more 2'c ;-)

zw
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 03:17:30AM +0800, zhaoway wrote:

> [read my previous semi-proposal] this has some more benefits,
> 1) package maintainers could upload (to the pool) at whatever frequency
> they like.

in an ideal world, developers should upload to an ''xxx-auto-builder''
;-) i'm turning out to be crappy now. ;-)

bye,
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 03:02:15AM +0800, zhaoway wrote:

> my proposal to resolve big Packages.gz is through the package pool
> system. add 36 or so new debian packages, namely,
> [a-zA-Z0-1]-packages-gz_date_all.deb - contents of each is quite
> obvious. ;-) and a virtual unstable-packages-gz depends on all of them.
> finished. apt-get update should deal with it

how about diffs between dinstall runs?..

  packages-010102-010103.gz
  packages-010103-010104.gz
  packages.gz

apt would download the changes made since its last update and merge them
into the Packages file; if the file gets corrupted, it would fall back to
a full update. This wouldn't be a big difference in the load that the
master FTP site has to handle, at least if only some 7 of these were
stored at maximum.

Regards, Sami Haahtinen

-- 
every nerd knows how to enjoy the little things of life, like:
rm -rf windows
Re: package pool and big Packages.gz file
The only other possibility not yet proposed (?) would be to split the Packages file by section:

  base-packages
  games-packages
  x11-packages
  net-packages

Then a server that just doesn't do x11, or doesn't do games, has no need to keep up with the available x11 or games packages.

Vince Mulhollon
Re: package pool and big Packages.gz file
On Thu, Jan 04, 2001 at 03:07:00PM -0600, Vince Mulhollon wrote:
> The only other possibility not yet proposed (?) would be to split the
> packages file by section. Then a server that just doesn't do x11 or
> doesn't do games has no need to keep up with available x11 or games
> packages.

how would the package manager (namely apt) know which ones you need? even if you don't have X11 installed (and apt assumes you don't need the X11 packages file), that doesn't mean that you wouldn't want to install x11 packages. the same goes for net (which is a weird definition in any case) and games; base is the only reasonable one. but without the others it's not needed either...

--
every nerd knows how to enjoy the little things of life, like: rm -rf windows
Re: package pool and big Packages.gz file
On Thu, Jan 04, 2001 at 11:01:15PM +0200, Sami Haahtinen wrote:
> On Fri, Jan 05, 2001 at 03:02:15AM +0800, zhaoway wrote:
> > my proposal to resolve big Packages.gz is through the package pool system.
> > add 36 or so new debian packages, namely [a-zA-Z0-1]-packages-gz_date_all.deb
> > contents of each is quite obvious. ;-) and a virtual unstable-packages-gz
> > depends on all of them. finished. apt-get update should deal with it
>
> how about diffs between dinstall runs?..

sorry, but i don't understand here. dinstall is a server side thing here?

> packages-010102-010103.gz
> packages-010103-010104.gz
> packages.gz
>
> apt would download the changes after the last update, and merge these
> into the package file; if the file gets corrupted, it would attempt to
> do a full update. This wouldn't be a big difference in the load that
> the master-ftp has to handle, at least when some 7 of these would be
> stored at maximum.

okay, try to group packages according to dependency. at the top, some pkg-gz-deb lists the packages at the leaves of the dependency tree; each pkg-gz-deb won't get bigger than 100k, and each of them depends on some more basic pkg-gz-deb below it, some other pkg-gz-deb like the base sub-system.

this way, when a user installs xdm, apt-get first installs the pkg-gz-deb which lists xdm; then, through dependency checking, it will install base-a-pkg-gz-deb etc. etc., and then xdm gets installed. this way, all of xdm's dependencies will be fulfilled with the newest information available. and you can see this will surely ease up the bandwidth. (when updating gcc, i won't get additional bits of Packages.gz about xdm, xfree, etc.)

regards, zw
Re: package pool and big Packages.gz file
On Thu, Jan 04, 2001 at 11:19:59PM +0200, Sami Haahtinen wrote:
> how would the package manager (namely apt) know which ones you need?
> even if you don't have X11 installed (and apt assumes you don't need
> the X11 packages file), that doesn't mean that you wouldn't want to
> install x11 packages.

another solution is to let every single deb provide its.pkg-gz. then, apt-get update will do nothing; apt-get install some.deb will first download some.pkg-gz, then check its dependencies, then grab all of them.pkg-gz, then install.

and a virtual release-pkgs-gz.deb will depend on some selected part of those any.pkg-gz to make up a release. then katie will remove a package only when no release-pkgs-gz.deb (or testing, or whatever) depends on its.pkg-gz.

regards, zw
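The per-deb .pkg idea amounts to fetching the dependency closure one small file at a time. A rough sketch, ignoring version constraints and virtual packages (fetch_pkg and the simplified stanza format are assumptions, not anything apt actually implements):

```python
def resolve(root, fetch_pkg):
    """Collect the dependency closure of one package by fetching each
    package's own .pkg metadata on demand.

    fetch_pkg(name) returns a parsed stanza as a dict with a "Depends"
    list of package names (a hypothetical simplified format; real
    stanzas also carry version constraints and virtual packages).
    """
    have = {}
    todo = [root]
    while todo:
        name = todo.pop()
        if name in have:
            continue                  # this .pkg was already downloaded
        stanza = fetch_pkg(name)      # one small download per package
        have[name] = stanza
        todo.extend(stanza.get("Depends", []))
    return have                       # a minimal, local Packages view
```

The download cost scales with the size of the closure being installed, not with the size of the whole archive.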
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 06:07:20AM +0800, zhaoway wrote:
> another solution is to let every single deb provide its.pkg-gz.
> then, apt-get update will do nothing; apt-get install some.deb will
> first download some.pkg-gz, then check its dependencies, then grab
> all of them.pkg-gz, then install.

that is a minimum, isn't it? ;) and then we will need some ``apt-get info pkg'' hehe..

> and a virtual release-pkgs-gz.deb will depend on some selected part
> of those any.pkg-gz to make up a release.

say one release contains 2000 pkgs, and each pkg name is 10 chars; then the whole information is a little more than 20k, compared with more than 1M nowadays. and you could have base-3.3-release, and gnome-4.4-release which depends on base-3.2-release and x-5.6-release, and chinese-2.0-release, etc. ...

> then katie will remove a package only when no release-pkgs-gz.deb
> (or testing, or whatever) depends on its.pkg-gz

zw
Re: package pool and big Packages.gz file
On Fri, Jan 05, 2001 at 06:07:20AM +0800, zhaoway wrote:
> On Thu, Jan 04, 2001 at 11:19:59PM +0200, Sami Haahtinen wrote:
> > how would the package manager (namely apt) know which ones you need?
> > even if you don't have X11 installed (and apt assumes you don't need
> > the X11 packages file), that doesn't mean that you wouldn't want to
> > install x11 packages.
>
> another solution is to let every single deb provide its.pkg-gz.
> then, apt-get update will do nothing; apt-get install some.deb will
> first download some.pkg-gz, then check its dependencies, then grab
> all of them.pkg-gz, then install.

but it will immensely restrict its view of dependencies - think about virtual packages. This is really not the way. Maybe splitting by letter as in pool/, so you only download the changed part of the whole thing. But that's about it. Maybe you can leave some part out, but ..

Petr Cech
--
Debian GNU/Linux maintainer - www.debian.{org,cz}
[EMAIL PROTECTED]

* Joy notes some people think Unix is a misspelling of Unics which is a misspelling of Emacs :)
Re: package pool and big Packages.gz file
[quote myself, ;-) this is semi-final now ;-)]

> another solution is to let every single deb provide its.pkg-gz.
> then, apt-get update will do nothing; apt-get install some.deb will
> first download some.pkg-gz, then check its dependencies, then grab
> all of them.pkg-gz, then install.

that is a minimum, isn't it? ;) and then we will need some ``apt-get info pkg'' hehe..

> and a virtual release-pkgs-gz.deb will depend on some selected part
> of those any.pkg-gz to make up a release.

say one release contains 2000 pkgs, and each pkg name is 10 chars; then the whole information is a little more than 20k, compared with more than 1M nowadays.

and you could still do ``apt-get dist-upgrade'': just first install release-pkgs-gz.deb, then go on..., OR, first get a list of all installed debs, then update each of them. [some more thoughts here..., later]

and you could have base-3.3-release, and gnome-4.4-release which depends on base-3.2-release and x-5.6-release, and chinese-2.0-release, etc. ...

> then katie will remove a package only when no release-pkgs-gz.deb
> (or testing, or whatever) depends on its.pkg-gz

zw
Re: package pool and big Packages.gz file
On Thu, Jan 04, 2001 at 11:19:25PM +0100, Petr Cech wrote:
> On Fri, Jan 05, 2001 at 06:07:20AM +0800, zhaoway wrote:
> > then, apt-get update will do nothing; apt-get install some.deb will
> > first download some.pkg-gz, then check its dependencies, then grab
> > all of them.pkg-gz, then install.
>
> but it will immensely restrict its view of dependencies - think about
> virtual packages. This is really not the way. Maybe splitting by letter
> as in pool/, so you only download the changed part of the whole thing.
> But that's about it. Maybe you can leave some part out, but ..

virtual packages are weird here. ;-) but they could be resolved by a some-virtual.pkg-gz ;-) and as for the tree view of the dependency tree, like in console-apt [see my other semi-final mail.. ;-)]: in general, if you want a tree view of the whole tree, you will need to download the whole tree anyway, and my approach won't prevent you from doing that. ;-)

kinda regards, zw
[FINAL, for now ;-)] (Was: Re: package pool and big Packages.gz file)
final thoughts ;-)

On the bigger and bigger Packages.gz file, a try.

The directory structure looks roughly like this:

  debian/dists/woody/main/binary-all/Packages.deb
  debian/pool/main/a/abba/abba_1989.orig.tar.gz
                          abba_1989-12.diff.gz
                          abba_1989-12.dsc
                          abba_1989-12_all.deb
                          abba_1989-12_all.pkg
  debian/pool/main/r/rel-chinese/rel-music_0.9_all.pkg
                                 rel-music_0.9_all.deb
  debian/pool/main/r/rel-base/rel-base_200_all.pkg
                              rel-base_200_all.deb

The contents of rel-music_0.9_all.pkg are as follows. rel-base or even rel-woody is just much more complicated. Hope so. rel-music.deb is nearly an empty package.

  Package: rel-music
  Priority: optional
  Section: misc
  Installed-Size: 12
  Maintainer: Anthony and Cleopatra
  Architecture: all
  Source: rel-chinese
  Version: 0.9
  Depends: rel-base (>= 200), abba (>= 1989-12), beatles (>= 1979-100), garbage (>= 1998-7), wearied-ear (>= 2.1)
  Provides: music, abba, beatles
  Filename: debian/pool/main/r/rel-chinese/rel-music_0.9_all.deb
  Size: 3492
  MD5sum: c8c730ea650cf14638d83d6bb7707cdb
  Description: Simplified music environment
   This 'task package' installs programs, data files, fonts, and
   documentation that makes it easier to use Debian for simplified
   music related operations.

(Surprise, surprise, garbage didn't provide music!) Note, music is a virtual package provided by abba and beatles.

The contents of abba_1989-12_all.pkg are as follows.

  Package: abba
  Priority: optional
  Section: sound
  Installed-Size: 140
  Maintainer: Old Man Billy
  Architecture: all
  Version: 1989-12
  Replaces: beatles
  Provides: music
  Depends: wearied-ear (>= 2.0)
  Filename: pool/main/a/abba/abba_1989-12_all.deb
  Size: 33256
  MD5sum: e07899b62b7ad12c545e9998adb7c8d7
  Description: A Swedish Music Band
   ABBA was popular in the 1980's, in the last millennium. Don't
   confuse ABBA with ADDA, which is a heavy metal band.

Here, music is a virtual package provided by the packages abba and beatles.

Let's simulate some typical scenarios here.

1) apt-get update

There are roughly two purposes for this action.
One is to get an overview, to ease further processing like virtual packages; the other purpose is to install a specific package, or do dist-upgrade.

For the second purpose, apt-get here will do nothing. (See below.)

For the first purpose, apt-get will have to download and parse the current distribution's .pkg files according to the user configuration. Say, download rel-music, and then see that the virtual package music is provided by abba and beatles.

So, generally, ``apt-get update'' will deal with the rel-some__all.pkg files to get all of the overall information it will need further on.

Then, where do the rel-some__all.pkg files get their information? We don't want the release manager to track down all of this information. So, where's katie? ;-) I think the trade-off is worthwhile (indeed, only katie gets to be a little more complicated) considering the scalability being gained. Read on.

2) apt-get install abba

apt-get will first parse the previously downloaded rel-music.pkg, learn that abba is at version 1989-12 and that it depends on wearied-ear (>= 2.0), and wow! rel-music happens to provide wearied-ear (>= 2.1), so that's okay. Then apt-get goes on to download abba's .pkg and parse it, and so on. When all the required .pkg files have been downloaded and parsed (an updated Packages.gz), apt-get then goes on to download and install each of the debs. (Maybe there will be more complicated issues; do let me know. See what's going on. ;-) Thus, minimum data downloaded. ;-)

3) apt-get dist-upgrade

I don't know the details, but I think it's not very complicated given the above information. (All the necessary things are there, aren't they? ;-)

4) Package uploads

The .pkg file is generated automatically, so there is no extra burden on most of the developers. And developers could upload just as frequently as they see fit. ;-) Katie will be a little ;-) more complicated. Packages will get deleted from the package pool only when no rel-X depends on them. rel-X packages are treated specially. And some fine-tuned mirrors could be set up.
And release management could benefit from the Bug Tracking System and become more flexible. IMHO. ;-)

Kind regards, zw
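The pool-cleaning rule above (katie deletes a package only when no rel-X depends on it) is essentially mark-and-sweep over the rel-X dependency closures. A sketch under a simplified pool model (names mapped to plain dependency lists, no versions; this is not katie's actual code):

```python
def removable(pool, releases):
    """Return the pool packages that no rel-X release still reaches.

    pool:     dict of package name -> list of dependency names
    releases: names of the rel-X entry points (the roots to keep)
    """
    keep = set()
    todo = list(releases)
    while todo:
        name = todo.pop()
        if name in keep or name not in pool:
            continue                  # already marked, or not in the pool
        keep.add(name)
        todo.extend(pool[name])       # mark everything a release needs
    return set(pool) - keep           # sweep: unreachable packages
```

Dropping a rel-X from the releases list automatically frees everything only that release was holding in the pool.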
Re: package pool and big Packages.gz file
== zhaoway [EMAIL PROTECTED] writes:

> hi, [i'm not sure if this has been resolved, lart me if you like.]
> my proposal to resolve big Packages.gz is through the package pool system.

What's the problem with a big Packages file? If you don't want to download it again and again just because of small changes, I have a better solution for you: rsync.

apt-get update could rsync all the Packages files (yes, not the .gz ones) and thereby download only the changed parts. On uncompressed files rsync is very effective, and the changes can be compressed for the actual transfer. So on upload you will practically get a diff.gz to your old Packages file.

If that suits your needs, feel free to write a bug report on apt about this.

MfG
Goswin
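Why does rsyncing the uncompressed file work so well? The receiver checksums fixed-size blocks of its old copy; the sender then sends cheap block references where blocks match and literal bytes only where they don't. A toy version of the block matching (real rsync uses a rolling weak checksum plus MD4 with a strong check on collision, as discussed earlier in this thread; here one MD5 per block stands in for both):

```python
import hashlib

BLOCK = 8  # toy block size; real rsync uses several hundred bytes

def signatures(old):
    """Map each block of the receiver's old file to its offset."""
    return {hashlib.md5(old[i:i + BLOCK]).hexdigest(): i
            for i in range(0, len(old), BLOCK)}

def delta(new, sigs):
    """Scan the new file: emit ("ref", offset) where the receiver
    already holds the block, ("lit", bytes) for everything else."""
    out, i, lit = [], 0, b""
    while i < len(new):
        h = hashlib.md5(new[i:i + BLOCK]).hexdigest()
        if h in sigs:
            if lit:
                out.append(("lit", lit))
                lit = b""
            out.append(("ref", sigs[h]))
            i += BLOCK                # skip over the matched block
        else:
            lit += new[i:i + 1]       # no match: ship this byte literally
            i += 1
    if lit:
        out.append(("lit", lit))
    return out
```

On an uncompressed Packages file, a one-stanza change leaves almost every other block intact, so nearly everything goes over as references. Gzip output, by contrast, changes globally after a local edit, which is why the .gz files rsync poorly.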