Re: Packages file missing from unstable archive
Anthony Towns writes:

> Hrm, thinking about it, I guess zsync probably works by storing the
> state of the gzip table at certain points in the file and doing a
> rolling hash of the contents and recompressing each chunk of the file;
> that'd result in the size of the .gz not necessarily being the same, let
> alone the md5sum.

zsync has to recompress the raw data locally, and for that it has to
guess at the implementation used to compress the initial file. But for
debs that should be deterministic. zsync can guarantee that
recompressing gives the same result by checking that it does when
creating the checksum files. If the input file and zsync's recompression
agree at that point, they will always agree, unless zsync changes its
gzip implementation.

> Feh, trying to verify this with ~512kB of random data, gzipped, I just
> keep getting "Aborting, download available in zsyncnew.gz.part". That's
> not terribly reassuring. And trying it with gzipped text data, I get
> stuck on 99.0%, with zsync repeatedly requesting around 700 bytes.
>
> Anyway, if it's recompressing like I think, there's no way to get the
> same compressed md5sum -- even if the information could be transferred,
> there's no guarantee the local gzip _can_ produce the same output as
> the remote gzip -- imagine if it had used gzip -9 and your local gzip
> only supports -1 through -5, eg.

zsync doesn't fork off some unknown local gzip, and it knows what its
own gzip routines can produce. It can easily be guaranteed that the
zsync client behaves the same way as the remote zsync checksum program
that would test for recompressibility.

The failure to sync the file is definitely a bug in zsync. Even if the
recompression fails (which it should know beforehand), it should fall
back to syncing the compressed data and produce the expected result.

> Hrm, it probably also means that mirrors can't use zsync -- that is,
> if you zsync fooA to fooB you probably can't use fooA.zsync to zsync
> from fooB to fooC.
> Anyway, just because you get a different file, that doesn't mean it'll
> act differently; so we could just use an "authentication" mechanism
> that reflects that. That might involve providing sizes and sha1s of the
> uncompressed contents of the ar in the packages file, instead of the
> md5sum of the ar. Except the previous note probably means that you'd
> still need to use the md5sum of the .deb to verify mirrors; which means
> mirrors and users would have different ways of verifying their
> downloads, which is probably fairly undesirable.

Too bad Packages files contain the md5sum of the full deb. Changing
that would be an ugly and lengthy process, so let's not do that. The
only sane way is to make zsync produce identical debs. It isn't
trivial, but it's not impossible either.

> Relatedly, mirrors (and apt-proxy users, etc) need to provide Packages.gz
> of a particular md5sum/size, so they can't use Packages.diff to speed
> up their diffs. It might be worth considering changing the Release file
> definition to just authenticate the uncompressed files and expect tools
> like apt and debootstrap to authenticate only after uncompressing. A
> "Compression-Methods: gz, bz2" header might suffice to help tools work
> out whether to try downloading Packages.gz, Packages.bz2 or just plain
> Packages first. Possibly "Packages-Compress:" and "Sources-Compress:"
> might be better.
>
> Cheers,
> aj

% gunzip Packages.gz.2
% gunzip Packages.gz.3
% gunzip Packages.gz.4
% gunzip Packages.gz.5
% md5sum *
172930d0165cf3f7b23324ec79e52847  Packages.gz
be00244619e0ed53ae2ba5a454aa3fee  Packages.gz.2
d4c7c8e04d963beb4d3bee4ac8e7bd0f  Packages.gz.3
764c5aa8168cb58d5e4d6412333516a5  Packages.gz.4
764c5aa8168cb58d5e4d6412333516a5  Packages.gz.5

The problem is the timestamp in gzip files. If you patch dak to use
the -n switch, then Packages.diff can be used to update Packages and
then recompress it. Further, zsync could include the timestamp in the
.zsync file and recompress to the same timestamp.
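The timestamp effect is easy to demonstrate with Python's gzip module, which exposes the header mtime directly (mtime=0 is what gzip's -n switch writes); this is only an illustrative sketch of the point, not dak's or zsync's actual code:

```python
import gzip

payload = b"Package: foo\nVersion: 1.0\n"

# Same input, different header timestamps -> different .gz bytes,
# which is exactly why a recompressed Packages.gz changes its md5sum.
with_ts_1 = gzip.compress(payload, mtime=1130000000)
with_ts_2 = gzip.compress(payload, mtime=1130000001)
assert with_ts_1 != with_ts_2

# mtime=0 omits the timestamp, like `gzip -n`: the output now depends
# only on the input bytes, so recompression is reproducible.
no_ts_a = gzip.compress(payload, mtime=0)
no_ts_b = gzip.compress(payload, mtime=0)
assert no_ts_a == no_ts_b

# Decompression is unaffected either way.
assert gzip.decompress(with_ts_1) == gzip.decompress(no_ts_a) == payload
```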
MfG
        Goswin

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of
"unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: Packages file missing from unstable archive
> On Tue, Nov 01, 2005 at 09:54:09AM -0500, Michael Vogt wrote:
> > A problem is that zsync needs to be taught to deal with deb files (that
> > is, that it needs to unpack the data.tar and use that for the syncs).

[Anthony Towns]
> That seems kinda awkward -- you'd need to start by downloading the ar
> header, working out where in the file the data.tar.gz starts, then
> redownloading from there. I guess you could include that info in the
> .zsync file though.

Right, the latter. Having downloaded the .zsync file, you calculate
your local checksums against the ones in that file, and you know
exactly what's left to be downloaded and what to do with it. The .zsync
file includes a sort of map of the structure of the target, not unlike
a jigdo file.

> OTOH, there should be savings in the control.tar.gz too, surely --
> it'd change less than data.tar.gz most of the time, no?

He was only comparing data.tar.gz because that made for a simpler
mock-up. zsync doesn't currently dig into a .deb at all, so this was
just a simulation, as it were.

> Hrm, thinking about it, I guess zsync probably works by storing the
> state of the gzip table at certain points in the file and doing a
> rolling hash of the contents and recompressing each chunk of the
> file

I haven't actually looked at the implementation of zsync, but I've
always assumed that zsync assumes a homogeneous (i.e., predictable)
gzip algorithm everywhere, works out the unknown variables by trial and
error, and stores the appropriate amount of state to reproduce the gzip
file exactly, given the assumptions about the gzip implementation. For
that to be correct assumes a certain homogeneity of the zlib used by
zsync implementations; for it to be efficient assumes the same about
whatever is used to compress files in gzip format. I've always harbored
my doubts about the deployment scalability of this approach.
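The "rolling hash" referred to above is the rsync-family weak checksum: a 32-bit sum over a fixed-size window that can be slid one byte at a time without rescanning the window. A minimal sketch (not zsync's actual code; the constant and names are illustrative):

```python
M = 1 << 16  # arithmetic is done mod 2^16, as in rsync's weak checksum

def weak(block):
    """Direct computation of the two 16-bit halves for one block."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, old, new, n):
    """Slide an n-byte window one byte: drop `old`, append `new`."""
    a2 = (a - old + new) % M
    b2 = (b - n * old + a2) % M
    return a2, b2

# Rolling across the data gives the same values as recomputing each
# window from scratch -- that is what makes block matching cheap.
data = bytes(range(7, 200)) * 3
n = 16
a, b = weak(data[:n])
for i in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[i - 1], data[i + n - 1], n)
    assert (a, b) == weak(data[i:i + n])
```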
> Anyway, just because you get a different file, that doesn't mean
> it'll act differently; so we could just use an "authentication"
> mechanism that reflects that. That might involve providing sizes and
> sha1s of the uncompressed contents of the ar in the packages file,
> instead of the md5sum of the ar.

Authenticating uncompressed content is a good design choice anyway. It
makes it easier, for instance, to add gpg signatures inside the ar
file without invalidating existing checksum authentication.

Conceptually, authenticating content based on a container which is
essentially nondeterministic is a bit like refusing to authenticate a
person because he or she is wearing different clothes from the ones
noted in the auth database.
Re: Packages file missing from unstable archive
On Fri, 11 Nov 2005 14:51:30 +1000, Anthony Towns wrote:
> Anyway, if it's recompressing like I think, there's no way to get the
> same compressed md5sum -- even if the information could be
> transferred, there's no guarantee the local gzip _can_ produce the
> same output as the remote gzip -- imagine if it had used gzip -9 and
> your local gzip only supports -1 through -5, eg.

We could just mandate in policy what the gzip level is supposed to be.
If we're going to do that, it's probably easier to just use --rsyncable
and teach zsync to look in the ar instead of in the gz. Then we
wouldn't have the md5sum problem on the data.tar.gz either.

Note that I haven't tested the efficiency of --rsyncable...

grts Tim
Re: Packages file missing from unstable archive
On Tue, Nov 01, 2005 at 09:54:09AM -0500, Michael Vogt wrote:
> My next test was to use only the data.tar.gz of the two
> archives. Zsync will extract the gzip file then and use the tar as the
> base. With that I got:
> 8<
> Read data.tar.gz. Target 34.1% complete.
> used 1056768 local, fetched 938415
> 8<
> The size of the data.tar.gz is 1210514.

Fetching 938kB instead of 1210kB is a 22.5% saving, so 12% of the
desired data was apparently already present, but redownloaded anyway.

> A problem is that zsync needs to be taught to deal with deb files (that
> is, that it needs to unpack the data.tar and use that for the syncs).

That seems kinda awkward -- you'd need to start by downloading the ar
header, working out where in the file the data.tar.gz starts, then
redownloading from there. I guess you could include that info in the
.zsync file though.

OTOH, there should be savings in the control.tar.gz too, surely --
it'd change less than data.tar.gz most of the time, no?

How much zsync data is required for that 22.5% saving over 1MB? I
guess it'd be about 16 bytes per 4kB of uncompressed data; assuming 33%
compression, that's 16 bytes per 3kB, or 0.5% overhead. For 100GB of
debs in the archive, that's about an extra half gig of space used.

Hrm, thinking about it, I guess zsync probably works by storing the
state of the gzip table at certain points in the file and doing a
rolling hash of the contents and recompressing each chunk of the file;
that'd result in the size of the .gz not necessarily being the same,
let alone the md5sum.

Feh, trying to verify this with ~512kB of random data, gzipped, I just
keep getting "Aborting, download available in zsyncnew.gz.part". That's
not terribly reassuring. And trying it with gzipped text data, I get
stuck on 99.0%, with zsync repeatedly requesting around 700 bytes.
Anyway, if it's recompressing like I think, there's no way to get the
same compressed md5sum -- even if the information could be transferred,
there's no guarantee the local gzip _can_ produce the same output as
the remote gzip -- imagine if it had used gzip -9 and your local gzip
only supports -1 through -5, eg.

Hrm, it probably also means that mirrors can't use zsync -- that is,
if you zsync fooA to fooB you probably can't use fooA.zsync to zsync
from fooB to fooC.

Anyway, just because you get a different file, that doesn't mean it'll
act differently; so we could just use an "authentication" mechanism
that reflects that. That might involve providing sizes and sha1s of the
uncompressed contents of the ar in the packages file, instead of the
md5sum of the ar. Except the previous note probably means that you'd
still need to use the md5sum of the .deb to verify mirrors; which means
mirrors and users would have different ways of verifying their
downloads, which is probably fairly undesirable.

Relatedly, mirrors (and apt-proxy users, etc) need to provide Packages.gz
of a particular md5sum/size, so they can't use Packages.diff to speed
up their diffs. It might be worth considering changing the Release file
definition to just authenticate the uncompressed files and expect tools
like apt and debootstrap to authenticate only after uncompressing. A
"Compression-Methods: gz, bz2" header might suffice to help tools work
out whether to try downloading Packages.gz, Packages.bz2 or just plain
Packages first. Possibly "Packages-Compress:" and "Sources-Compress:"
might be better.

Cheers,
aj
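The back-of-the-envelope overhead estimate in the message above can be checked directly; the figures (16 bytes of checksum per 4 kB block, roughly 3 kB compressed per 4 kB uncompressed) are the message's own assumptions, not measured values:

```python
checksum_bytes = 16          # per block, per the estimate above
block = 4096                 # uncompressed block size
compressed_block = 3 * 1024  # ~33% compression: 4 kB -> ~3 kB on disk

# Overhead is measured against the compressed archive size.
overhead = checksum_bytes / compressed_block
print(f"overhead: {overhead:.2%}")             # ~0.52%

archive = 100 * 10**9        # ~100 GB of debs in the archive
extra = archive * overhead
print(f"extra space: {extra / 10**9:.2f} GB")  # ~half a gig
```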
Re: Packages file missing from unstable archive
On Wed, Nov 09, 2005 at 04:26:59PM +0100, Goswin von Brederlow wrote:
> Anthony Towns writes:
> > On Sun, Oct 30, 2005 at 09:48:35AM +0100, Goswin von Brederlow wrote:
> >> Zsync checksum files are, depending on block size, about 3% of the
> >> file size. For the full archive that means under 10G more data. As
> >> comparison adding amd64 needs ~30G. After the scc split there might be
> >> enough space on mirrors for both.
> > Adding amd64 needs 30G? Since when?
> With stable/testing/unstable/experimental it should end up around
> there, I think. It's 6-7G for the amd64 sarge debs, so depending on
> overlap you get more or less.

Assuming no overlap and your numbers, you get 3 * 7 = 21 << 30. For
architectures in the archive, including oldstable through experimental,
disk space used by the debs of one architecture ranges from 9GB (m68k)
to 14GB (i386, ia64), including 13GB arch:all packages.

It's necessary to have accurate numbers on these things, rather than
pulling things out of the air.

Cheers,
aj
Re: Packages file missing from unstable archive
Michael Vogt <[EMAIL PROTECTED]> writes:
> 8<
> Read data.tar.gz. Target 34.1% complete.
> used 1056768 local, fetched 938415
> 8<
> The size of the data.tar.gz is 1210514.

So your simple test shows 34% savings for a mixed binary/doc package.
That is very promising. Now imagine syncing the X fonts that didn't
actually change contents between releases.

> A problem is that zsync needs to be taught to deal with deb files (that
> is, that it needs to unpack the data.tar and use that for the syncs).
>
> Having it inside dak is not (at the beginning) a requirement. Zsync
> seems to be able to deal with URLs, so we could create a pool with
> zsync files on any server and let them point to ftp.debian.org.

Correct. Someone just needs to set up a Debian mirror, run zsync over
every new file and make the checksum files available somewhere public.
Something I'm willing to do, using alioth to serve the checksum files.
But first zsync has to look into debs.

> We need to guarantee that the md5sum of the synced deb must match the
> md5sum in the Packages file. Initial tests indicate that that is not
> the case. Only the md5sum of the unpacked data.tar file matches, not
> from the gzip file (or the deb). This is a serious showstopper IMHO.

Isn't that just a problem of the data.tar.gz containing a timestamp
that now differs? How many bytes differ between the original and the
zsync result?

MfG
        Goswin
Re: Packages file missing from unstable archive
Anthony Towns writes:
> On Sun, Oct 30, 2005 at 09:48:35AM +0100, Goswin von Brederlow wrote:
>> Zsync checksum files are, depending on block size, about 3% of the
>> file size. For the full archive that means under 10G more data. As
>> comparison adding amd64 needs ~30G. After the scc split there might be
>> enough space on mirrors for both.
>
> Adding amd64 needs 30G? Since when?

With stable/testing/unstable/experimental it should end up around
there, I think. It's 6-7G for the amd64 sarge debs, so depending on
overlap you get more or less.

> And stuff doesn't go on the mirrors because it's "under 30G", it goes on
> the mirrors because it provides useful benefits. Where're the statistics
> showing how much zsync signatures actually help?

Before zsync looks into debs as well as gz files there won't be any
reasonable gain for debs, and for the Packages/Sources files the diff
method works better. So for now there is no big gain and no strong
argument for adding zsync. But the potential is there for anyone
willing to invest some time and improve the code.

> Cheers,
> aj

MfG
        Goswin
Re: Packages file missing from unstable archive
On Thu, Oct 27, 2005 at 10:06:22AM +0200, Robert Lemmen wrote:
> On Wed, Oct 26, 2005 at 09:15:38PM -0400, Joey Hess wrote:
> > (And yes, we still need a solution to speed up the actual deb file
> > downloads..)
[..]
> if zsync would be taught to handle .deb files as it does .gz files, and
> a method for apt be written, how big are the chances that support could
> be integrated into dak? the effort wouldn't be *that* big...

I did a pretty unscientific test with apt and the changes from 0.6.41
-> 0.6.42.1. It contains a good mix of code changes, documentation
updates and translation updates [1].

With the two normal debs I got no effect at all, because no usable
data was found. I then repacked the data.tar.gz and control.tar.gz
inside the deb with "--rsyncable" (and reassembled the deb). This
resulted in: "Read apt_0.6.41_i386.deb. Target 0.8% complete." So this
didn't have much effect either.

My next test was to use only the data.tar.gz of the two archives.
Zsync will extract the gzip file then and use the tar as the base.
With that I got:
8<
Read data.tar.gz. Target 34.1% complete.
used 1056768 local, fetched 938415
8<
The size of the data.tar.gz is 1210514.

A problem is that zsync needs to be taught to deal with deb files (that
is, that it needs to unpack the data.tar and use that for the syncs).

Having it inside dak is not (at the beginning) a requirement. Zsync
seems to be able to deal with URLs, so we could create a pool with
zsync files on any server and let them point to ftp.debian.org.

We need to guarantee that the md5sum of the synced deb matches the
md5sum in the Packages file. Initial tests indicate that that is not
the case. Only the md5sum of the unpacked data.tar file matches, not
that of the gzip file (or the deb). This is a serious showstopper IMHO.

Cheers,
 Michael

[1] I would love to hear results from other people testing it with
different packages and different changes.

--
Linux is not The Answer. Yes is the answer. Linux is The Question.
- Neo
Re: Packages file missing from unstable archive
On Sun, Oct 30, 2005 at 09:48:35AM +0100, Goswin von Brederlow wrote:
> Zsync checksum files are, depending on block size, about 3% of the
> file size. For the full archive that means under 10G more data. As
> comparison adding amd64 needs ~30G. After the scc split there might be
> enough space on mirrors for both.

Adding amd64 needs 30G? Since when?

And stuff doesn't go on the mirrors because it's "under 30G", it goes on
the mirrors because it provides useful benefits. Where're the statistics
showing how much zsync signatures actually help?

Cheers,
aj
Re: Packages file missing from unstable archive
Henrique de Moraes Holschuh <[EMAIL PROTECTED]> writes:
> On Thu, 27 Oct 2005, Robert Lemmen wrote:
>> if zsync would be taught to handle .deb files as it does .gz files, and
>
> You are talking about a freaking lot of metadata here, and about
> changing some key stuff to get --rsyncable compression.
>
> I may not understand why most apt metadata in .gz (Packages, Sources,
> Contents...) is not made --rsyncable, but I am quite sure the chances of
> anyone doing official changes to dpkg to use --rsyncable right now are nil.

Zsync checksum files are, depending on block size, about 3% of the
file size. For the full archive that means under 10G more data. As a
comparison, adding amd64 needs ~30G. After the scc split there might be
enough space on mirrors for both.

zsync is also more capable than rsync and can sync a normal gzip file
efficiently from the checksums of the uncompressed file. It will
download the chunks of the gzip file containing changes and reconstruct
the gzip file from the local uncompressed data and those chunks. The
--rsyncable option is not needed, as zsync can pinpoint the exact byte
where a changed uncompressed block starts in the gzipped file.

MfG
        Goswin
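The mechanism described here -- reuse local blocks where the checksums match, fetch only the changed chunks -- can be sketched in a few lines. This is the generic rsync/zsync block-matching idea over uncompressed data, with whole-block matching only; it is not zsync's real checksum format or wire protocol:

```python
import hashlib

BLOCK = 4096

def block_map(old):
    """Index the blocks we already have locally, keyed by strong hash."""
    return {hashlib.md5(old[i:i + BLOCK]).digest(): i
            for i in range(0, len(old), BLOCK)}

def plan(old, new):
    """For each target block decide: copy from local data, or fetch."""
    have = block_map(old)
    steps = []
    for i in range(0, len(new), BLOCK):
        chunk = new[i:i + BLOCK]
        key = hashlib.md5(chunk).digest()
        if key in have:
            steps.append(("local", have[key], len(chunk)))
        else:
            steps.append(("fetch", i, len(chunk)))  # would be an HTTP range request
    return steps

def rebuild(old, new, steps):
    out = bytearray()
    for op, off, length in steps:
        src = old if op == "local" else new  # "fetch" is simulated from `new`
        out += src[off:off + length]
    return bytes(out)

old = b"A" * 10000 + b"B" * 10000
new = b"A" * 10000 + b"C" * 100 + b"B" * 10000  # small insertion
steps = plan(old, new)
assert rebuild(old, new, steps) == new
fetched = sum(n for op, _, n in steps if op == "fetch")
print(f"fetched {fetched} of {len(new)} bytes")
```

A real implementation uses a rolling weak checksum so that matches are found at every byte offset, not only on block boundaries; that is what lets it cope with insertions like the one above far more efficiently.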
Re: Packages file missing from unstable archive
Kurt Roeckx <[EMAIL PROTECTED]> writes:
> On Wed, Oct 26, 2005 at 05:11:00AM -0700, Ian Bruce wrote:
>> If the .deb files were compressed using the gzip "--rsyncable" option,
>> then fetching them with zsync (or rsync) would be considerably more
>> efficient than straight HTTP transfers.
>
> No it wouldn't. Remember that .deb files are never supposed to
> change. For other files like Packages and Sources this might
> work indeed.

Two things to note:

1) the apt cache (or a local mirror) can contain the previous version
   of a deb, and that can be used as a template to zsync the new
   package.

2) Packages files now have the new diff files, which consist of ed
   scripts and are even smaller than what rsync/zsync would achieve.

MfG
        Goswin
Re: Packages file missing from unstable archive
On 10/27/05, Henrique de Moraes Holschuh <[EMAIL PROTECTED]> wrote:
> On Thu, 27 Oct 2005, Robert Lemmen wrote:
> > if zsync would be taught to handle .deb files as it does .gz files, and
>
> You are talking about a freaking lot of metadata here, and about
> changing some key stuff to get --rsyncable compression.
>
> I may not understand why most apt metadata in .gz (Packages, Sources,
> Contents...) is not made --rsyncable, but I am quite sure the chances of
> anyone doing official changes to dpkg to use --rsyncable right now are nil.

--rsyncable does not change the format of the output; it merely tweaks
the compressor in such a way that the result _tends_ to be more
rsyncable. It can be decompressed in exactly the same way as before.
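The same compatibility property holds for any knob that only changes how the compressor chooses its output, such as the compression level; --rsyncable itself isn't exposed by Python's gzip module, so this sketch uses compresslevel as a stand-in to make the analogous point:

```python
import gzip

payload = b"x" * 10000 + b"hello world " * 200

# Different compressor settings produce different compressed bytes...
variants = [gzip.compress(payload, compresslevel=l, mtime=0)
            for l in (1, 6, 9)]
assert len(set(variants)) > 1

# ...but every variant is still a standard gzip stream, and any stock
# decompressor recovers the identical payload. --rsyncable behaves the
# same way: it changes where blocks start, not the format.
assert all(gzip.decompress(v) == payload for v in variants)
```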
Re: Packages file missing from unstable archive
On Thu, 27 Oct 2005, Robert Lemmen wrote:
> if zsync would be taught to handle .deb files as it does .gz files, and

You are talking about a freaking lot of metadata here, and about
changing some key stuff to get --rsyncable compression.

I may not understand why most apt metadata in .gz (Packages, Sources,
Contents...) is not made --rsyncable, but I am quite sure the chances
of anyone doing official changes to dpkg to use --rsyncable right now
are nil.

--
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
Re: Packages file missing from unstable archive
On Wed, Oct 26, 2005 at 09:15:38PM -0400, Joey Hess wrote:
> (And yes, we still need a solution to speed up the actual deb file
> downloads..)

i think zsync is the way to go here. it would cause no load on the
servers, as rsync does, and only require a few percent more mirror
space. if zsync would be taught to handle .deb files as it does .gz
files, and a method for apt be written, how big are the chances that
support could be integrated into dak? the effort wouldn't be *that*
big...

cu  robert

--
Robert Lemmen                               http://www.semistable.com
Re: Packages file missing from unstable archive
On Wed, Oct 26, 2005 at 04:47:21PM -0700, Ian Bruce wrote:
> As explained, I wish to use rsync (or preferably, zsync) to update the
> local packages list; repeatedly downloading the 3.6MB "Packages.gz" file
> over a 56kb/s link is highly undesirable. I am unable to understand why
> this ambition is considered to be unreasonable.

as joey already said, the index diff stuff is the way to go (it's also
more efficient than rsync). if you can't use apt-get from experimental,
there is also a script by aba and a c implementation by me that you can
both get from http://www.semistable.com/files. in the script you should
replace "ed" with "red"; the c implementation hasn't been touched in a
while and i don't know how well it works now.

cu  robert

--
Robert Lemmen                               http://www.semistable.com
Re: Packages file missing from unstable archive
Ian Bruce wrote:
> As explained, I wish to use rsync (or preferably, zsync) to update the
> local packages list; repeatedly downloading the 3.6MB "Packages.gz" file
> over a 56kb/s link is highly undesirable. I am unable to understand why
> this ambition is considered to be unreasonable.

Is there some reason you're ignoring the parts of this thread where the
diff stuff is explained? This is an apt-get update, using that, on
dialup. 28.8 or so, my line sucks. I haven't updated for a couple of
days.

[EMAIL PROTECTED]:~>sudo apt-get update
Get:1 http://ftp.debian.org unstable Release.gpg [189B]
Ign http://ftp.debian.org unstable/main Translation-en
Ign http://ftp.debian.org unstable/contrib Translation-en
Ign http://ftp.debian.org unstable/non-free Translation-en
Ign http://ftp.debian.org unstable/main Translation-en
Ign http://ftp.debian.org unstable/contrib Translation-en
Ign http://ftp.debian.org unstable/non-free Translation-en
Get:2 http://ftp.debian.org ../project/experimental Release.gpg [189B]
Ign http://ftp.debian.org ../project/experimental/main Translation-en
Get:3 http://uqm.debian.net unstable/ Release.gpg [189B]
Hit http://ftp.debian.org unstable Release
Ign http://uqm.debian.net unstable/ Translation-en
Get:4 http://ftp.debian.org ../project/experimental Release [21.6kB]
Get:5 http://ftp.debian.org unstable/main Packages/DiffIndex [1760B]
Get:6 http://ftp.debian.org unstable/contrib Packages/DiffIndex [1609B]
Get:7 http://ftp.debian.org unstable/non-free Packages/DiffIndex [919B]
Get:8 http://ftp.debian.org unstable/main Sources/DiffIndex [1747B]
Get:9 http://ftp.debian.org unstable/contrib Sources/DiffIndex [1609B]
Get:10 http://ftp.debian.org unstable/non-free Sources/DiffIndex [919B]
Ign http://ftp.debian.org ../project/experimental/main Packages/DiffIndex
Get:11 2005-10-25-1310.14.pdiff [16.9kB]
Get:12 2005-10-25-1310.14.pdiff [16.9kB]
Get:13 2005-10-25-1310.14.pdiff [16.9kB]
Get:14 2005-10-25-1310.14.pdiff [189B]
Get:15 2005-10-25-1310.14.pdiff [189B]
Get:16 2005-10-25-1310.14.pdiff [326B]
Get:17 2005-10-25-1310.14.pdiff [326B]
Get:18 2005-10-25-1310.14.pdiff [9562B]
Hit http://uqm.debian.net unstable/ Release
Get:19 2005-10-25-1310.14.pdiff [189B]
Get:20 2005-10-25-1310.14.pdiff [326B]
Get:21 2005-10-25-1310.14.pdiff [9562B]
Get:22 2005-10-25-1310.14.pdiff [9562B]
Get:23 2005-10-25-1310.14.pdiff [31B]
Get:24 2005-10-25-1310.14.pdiff [31B]
Get:25 2005-10-25-1310.14.pdiff [31B]
Get:26 2005-10-25-1310.14.pdiff [255B]
Get:27 2005-10-25-1310.14.pdiff [255B]
Get:28 2005-10-25-1310.14.pdiff [255B]
Ign http://ftp.debian.org ../project/experimental/main Packages
Get:29 2005-10-26-1312.16.pdiff [13.0kB]
Ign http://uqm.debian.net unstable/ Packages/DiffIndex
Get:30 2005-10-26-1312.16.pdiff [13.0kB]
Get:31 2005-10-26-1312.16.pdiff [13.0kB]
Get:32 2005-10-26-1312.16.pdiff [240B]
Get:33 2005-10-26-1312.16.pdiff [240B]
Ign http://uqm.debian.net unstable/ Packages
Get:34 2005-10-26-1312.16.pdiff [7942B]
Get:35 2005-10-26-1312.16.pdiff [240B]
Get:36 2005-10-26-1312.16.pdiff [7942B]
Get:37 2005-10-26-1312.16.pdiff [7942B]
Hit http://uqm.debian.net unstable/ Packages
Get:38 2005-10-26-1312.16.pdiff [260B]
Get:39 2005-10-26-1312.16.pdiff [260B]
Get:40 2005-10-26-1312.16.pdiff [260B]
Get:41 http://ftp.debian.org ../project/experimental/main Packages [214kB]
Fetched 42.1MB in 2m4s (338kB/s)
^

If experimental had these diff files too, I could shave another 20
seconds or so off, but as it is, 2 minute Packages updates over dialup
is faster than things have been for at least ten years.

If you go back and read several of the threads about different ways to
speed up Packages file downloads, most of their volume is people
standing around bikeshedding, promoting their favorite pet ideas. The
fact that some people went off and produced a working solution despite
that wins them my utmost respect. Ignoring what they've done and
repeating the same tired arguments is lame.
(And yes, we still need a solution to speed up the actual deb file
downloads..)

--
see shy jo
Re: Packages file missing from unstable archive
On Thu, 27 Oct 2005 00:24:36 +0200, Joerg Jaspert <[EMAIL PROTECTED]> wrote:
> > Returning to the original question: Does anybody know why the
> > uncompressed "Packages" file has disappeared from the "unstable"
> > archive?
>
> Because relevant tools do not / should not use that file since years.
> It was announced *long* ago "to be in a few days", so now it happened.
> See:
> http://lists.debian.org/debian-devel-announce/2002/08/msg8.html

I hadn't seen that announcement before, but it still doesn't answer
the question of "why".

As explained, I wish to use rsync (or preferably, zsync) to update the
local packages list; repeatedly downloading the 3.6MB "Packages.gz"
file over a 56kb/s link is highly undesirable. I am unable to
understand why this ambition is considered to be unreasonable.

(At this point, somebody is sure to say "because rsync imposes too much
computational load on the network servers." Shouldn't the decision of
whether or not to offer rsync access be up to the administrators of
each individual mirror? In any case, zsync is the solution to the
problem; it would decrease the servers' network load without increasing
their compute load.)

As far as I can see, updating the packages list with rsync requires
either an uncompressed "Packages" file, or a "Packages.gz" file
compressed with the "--rsyncable" option. Currently, neither of these
exists in the "unstable" archive (and according to that announcement,
the "testing" archive will follow).

Why is rsync considered to be an undesirable method of accessing the
archive? The relative costs of network traffic versus CPU cycles are
quite different in many places outside the United States. Why are the
needs of sites with poor network connectivity considered unimportant?

If there are any "relevant tools" which can update the package lists
without downloading the whole file, and without using rsync/zsync,
please advise me of such. I'm not committed to any particular solution
or piece of software.
I just don't understand why the issue of minimizing network traffic is
thought to be universally irrelevant. Why shouldn't there be a variety
of access methods, to address the varying situations of different
client and mirror sites?

-- Ian Bruce
Re: Packages file missing from unstable archive
On 10454 March 1977, Ian Bruce wrote:
> Returning to the original question: Does anybody know why the
> uncompressed "Packages" file has disappeared from the "unstable"
> archive?

Because relevant tools do not / should not use that file since years.
It was announced *long* ago "to be in a few days", so now it happened.
See: http://lists.debian.org/debian-devel-announce/2002/08/msg8.html

--
bye Joerg
It seems to me that the account creation step could be fully automated:
checking the box "approved by DAM" could trigger an insert into the
LDAP database thereby creating the account.  <[EMAIL PROTECTED]>
Re: Packages file missing from unstable archive
On Wed, 26 Oct 2005 19:12:30 +0200, Kurt Roeckx <[EMAIL PROTECTED]> wrote:
> > If the .deb files were compressed using the gzip "--rsyncable"
> > option, then fetching them with zsync (or rsync) would be
> > considerably more efficient than straight HTTP transfers.
>
> No it wouldn't. Remember that .deb files are never supposed to
> change. For other files like Packages and Sources this might
> work indeed.

zsync has an option ("-i") to specify a local file with another name
to be used as a reference for the difference algorithm. In the case of
apt-get, or especially apt-proxy, where you have previous versions of
the package lying around with similar filenames, this is obviously the
way zsync would be used.

Returning to the original question: Does anybody know why the
uncompressed "Packages" file has disappeared from the "unstable"
archive? Can it either be replaced, or alternatively, can the
"Packages.gz" file be compressed using the "--rsyncable" option, so
that rsync can again be used for updating the packages list?

-- Ian Bruce
Re: Packages file missing from unstable archive
On Wed, Oct 26, 2005 at 05:11:00AM -0700, Ian Bruce wrote:
> If the .deb files were compressed using the gzip "--rsyncable" option,
> then fetching them with zsync (or rsync) would be considerably more
> efficient than straight HTTP transfers.

No it wouldn't. Remember that .deb files are never supposed to
change. For other files like Packages and Sources this might work
indeed.

Kurt
Re: Packages file missing from unstable archive
On Wed, 26 Oct 2005, Ian Bruce wrote:

> option was implemented. Perhaps it's thought that more testing is
> required before it can be used for the archives; is there any other
> reason not to use it?

The way gzip --rsyncable works is perfectly safe; it cannot cause data loss, AFAIK. It just makes gzip begin compression blocks at predictable places in the plaintext data, places that tend to stay constant. OTOH, it does decrease the compression ratio (probably *very* little). But if compression ratio were important, we would have switched to bzip2 for everything anyway.

AFAIK, there is no technical reason for not using gzip --rsyncable, other than the simple fact that nobody has modified the dak code yet.

-- "One disk to rule them all, One disk to find them. One disk to bring them all and in the darkness grind them. In the Land of Redmond where the shadows lie." -- The Silicon Valley Tarot

Henrique Holschuh
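Both claims above (no data loss, small ratio cost) are easy to check locally. A minimal sketch, assuming a GNU gzip recent enough to support --rsyncable; the file names are invented:

```shell
# Check that --rsyncable is lossless and costs little compression.
# Assumes GNU gzip with the --rsyncable option; file names invented.
seq 1 100000 > packages.txt       # stand-in for an uncompressed Packages file
gzip -9 -c packages.txt > plain.gz
gzip -9 --rsyncable -c packages.txt > rsyncable.gz

# both forms decompress back to the original bytes
gunzip -c plain.gz     | cmp -s - packages.txt && echo "plain: lossless"
gunzip -c rsyncable.gz | cmp -s - packages.txt && echo "rsyncable: lossless"

# the size penalty for --rsyncable is visible but small
wc -c < plain.gz
wc -c < rsyncable.gz
```

The design trade-off is that resetting compression blocks at content-defined boundaries keeps unchanged regions of the plaintext mapping to identical compressed bytes, which is what rsync's rolling checksum needs to find matches.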
Re: Packages file missing from unstable archive
On Wed, 26 Oct 2005 12:05:08 +0200
Goswin von Brederlow <[EMAIL PROTECTED]> wrote:

> > -- has there been any progress towards providing zsync access to the
> > archives? It would seem that this would result in greatly reduced
> > data traffic on the network servers, without increasing the
> > computational load, as rsync does; I gather that this is the main
> > objection to its use.
>
> zsync uses http so you already have access. What is missing are the
> checksum files. Also, last I checked, zsync didn't yet have support to
> sync the contents of a deb as opposed to syncing the compressed data
> itself. Before that the savings are minimal at best.

To speak of zsync access obviously implies the existence of the control files, which, as you say, are not there. Therefore the archives are not currently accessible with zsync.

If the .deb files were compressed using the gzip "--rsyncable" option, then fetching them with zsync (or rsync) would be considerably more efficient than straight HTTP transfers. That's the reason the option was implemented. Perhaps it's thought that more testing is required before it can be used for the archives; is there any other reason not to use it?

-- Ian Bruce
Re: Packages file missing from unstable archive
Ian Bruce <[EMAIL PROTECTED]> writes:

> Some related questions:
>
> -- what is the purpose of the "Packages.diff/" directory which has
> appeared in the "testing" and "unstable" archives? Is there some piece
> of software which makes use of this for updating the packages lists?

apt-get (experimental only, iirc) uses this to download only the changes since the last update. This has been brewing for a long time and was previously available from people.d.o/~aba. It is now integrated into tiffany.

> -- is it possible that the "Packages.gz" files could be compressed using
> the gzip "--rsyncable" option? Or is this already the case?

Not sure about --rsyncable, but the file contains the timestamp (and possibly file permissions or something), as gzip output does, so running gzip on Packages locally does not give an identical result. For debmirror I rsync the Packages.gz, gunzip it, rsync the Packages, bzip2 it, then rsync the Packages.bz2 (each step only if the md5sums don't match).

> -- has there been any progress towards providing zsync access to the
> archives? It would seem that this would result in greatly reduced data
> traffic on the network servers, without increasing the computational
> load, as rsync does; I gather that this is the main objection to its
> use.

zsync uses http, so you already have access. What is missing are the checksum files. Also, last I checked, zsync didn't yet have support for syncing the contents of a deb as opposed to syncing the compressed data itself. Before that, the savings are minimal at best.

MfG Goswin
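The debmirror routine above can be sketched like this. The mirror URL, directory, and helper name are all invented for illustration, and in practice the wanted md5sums would come from the archive's Release file:

```shell
# Sketch: transfer a file only when its md5sum differs from the wanted
# one; locally regenerated copies seed rsync so only deltas move.
# MIRROR, DIR, and fetch_if_changed are invented names for illustration.
MIRROR=rsync://ftp.debian.org/debian
DIR=dists/unstable/main/binary-i386

fetch_if_changed() {   # $1 = remote path, $2 = local file, $3 = wanted md5
    local have
    have=$(md5sum "$2" 2>/dev/null | cut -d' ' -f1)
    [ "$have" = "$3" ] && return 0      # already current: skip the transfer
    rsync "$MIRROR/$1" "$2"
}

# usage sketch (not executed here), following the order described above:
# fetch_if_changed "$DIR/Packages.gz" Packages.gz "$md5_gz"
# gunzip -kf Packages.gz                    # seed the uncompressed copy
# fetch_if_changed "$DIR/Packages" Packages "$md5_plain"
# bzip2 -kf Packages                        # seed the .bz2 copy
# fetch_if_changed "$DIR/Packages.bz2" Packages.bz2 "$md5_bz2"
```

The trick is that each locally regenerated file is byte-for-byte close to the remote one, so when rsync does run it transfers only the small differences rather than the whole file.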
Re: Packages file missing from unstable archive
On Wed, 26 Oct 2005 21:32, Ian Bruce wrote:

> It seems that recently, the uncompressed version of the "Packages" file
> has disappeared from the "unstable" archive on the Debian network
> servers and all their mirrors.
>
> http://ftp.debian.org/debian/dists/unstable/main/binary-i386/
>
> On the other hand, the uncompressed file is still available for the
> "stable" and "testing" archives.
>
> http://ftp.debian.org/debian/dists/stable/main/binary-i386/
> http://ftp.debian.org/debian/dists/testing/main/binary-i386/
>
> What is the explanation for this decision? It makes it impossible to
> use rsync to update the packages list. (Perhaps this was actually the
> motivation for the change, but shouldn't it be up to the administrators
> of each mirror whether or not they want to allow rsync access?)
>
> Some related questions:
>
> -- what is the purpose of the "Packages.diff/" directory which has
> appeared in the "testing" and "unstable" archives? Is there some piece
> of software which makes use of this for updating the packages lists?
>
> -- is it possible that the "Packages.gz" files could be compressed
> using the gzip "--rsyncable" option? Or is this already the case?
>
> -- has there been any progress towards providing zsync access to the
> archives? It would seem that this would result in greatly reduced data
> traffic on the network servers, without increasing the computational
> load, as rsync does; I gather that this is the main objection to its
> use.
>
> Perhaps the answers to these questions are available in some obvious
> place; I looked everywhere that occurred to me, but didn't find
> anything.

I got caught by this too. Then I remembered a discussion some time back to the effect that downloading the full Packages file was a waste of time and that diffs would do the job. I personally don't bother with the diffs, but use

bunzip2 -fk .../dists/sid/main/binary-hurd-i386/Packages.bz2

Phil.
-- Philip Charles; 39a Paterson Street, Abbotsford, Dunedin, New Zealand
+64 3 488 2818   Fax +64 3 488 2875   Mobile 025 267 9420
[EMAIL PROTECTED] - preferred. [EMAIL PROTECTED]
I sell GNU/Linux & GNU/Hurd CDs & DVDs. See http://www.copyleft.co.nz
Packages file missing from unstable archive
It seems that recently, the uncompressed version of the "Packages" file has disappeared from the "unstable" archive on the Debian network servers and all their mirrors.

http://ftp.debian.org/debian/dists/unstable/main/binary-i386/

On the other hand, the uncompressed file is still available for the "stable" and "testing" archives.

http://ftp.debian.org/debian/dists/stable/main/binary-i386/
http://ftp.debian.org/debian/dists/testing/main/binary-i386/

What is the explanation for this decision? It makes it impossible to use rsync to update the packages list. (Perhaps this was actually the motivation for the change, but shouldn't it be up to the administrators of each mirror whether or not they want to allow rsync access?)

Some related questions:

-- what is the purpose of the "Packages.diff/" directory which has appeared in the "testing" and "unstable" archives? Is there some piece of software which makes use of this for updating the packages lists?

-- is it possible that the "Packages.gz" files could be compressed using the gzip "--rsyncable" option? Or is this already the case?

-- has there been any progress towards providing zsync access to the archives? It would seem that this would result in greatly reduced data traffic on the network servers, without increasing the computational load, as rsync does; I gather that this is the main objection to its use.

Perhaps the answers to these questions are available in some obvious place; I looked everywhere that occurred to me, but didn't find anything.

-- Ian Bruce