Re: Packages file missing from unstable archive

2005-11-19 Thread Goswin von Brederlow
Anthony Towns  writes:

> Hrm, thinking about it, I guess zsync probably works by storing the
> state of the gzip table at certain points in the file and doing a
> rolling hash of the contents and recompressing each chunk of the file;
> that'd result in the size of the .gz not necessarily being the same, let
> alone the md5sum.

zsync has to recompress the raw data locally, and for that it has to
guess at the implementation used to compress the initial file. But for
debs that should be deterministic. zsync can guarantee that
recompressing gives the same result by checking that it does when
creating the checksum files. If the input file and zsync's
recompression come out the same then, they will always be the same
unless zsync changes its gzip implementation.
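A minimal sketch of that check (my own illustration, not zsync's actual code; the fixed level and `mtime=0` stand in for "the same gzip implementation and options"):

```python
import gzip
import hashlib

def recompression_is_deterministic(gz_bytes: bytes, level: int = 9) -> bool:
    """Decompress, recompress with fixed settings, compare digests."""
    raw = gzip.decompress(gz_bytes)
    # mtime=0 and a fixed level emulate "same implementation, same options";
    # real gzip output also varies with the header's filename and OS fields.
    redone = gzip.compress(raw, compresslevel=level, mtime=0)
    return hashlib.md5(redone).digest() == hashlib.md5(gz_bytes).digest()

# A file we know was made with those exact settings passes the check:
original = gzip.compress(b"Package: foo\n" * 200, compresslevel=9, mtime=0)
```

If the check fails at .zsync-creation time, the server side knows up front that clients cannot reproduce the byte-identical .gz.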

> Feh, trying to verify this with ~512kB of random data, gzipped, I just
> keep getting "Aborting, download available in zsyncnew.gz.part". That's
> not terribly reassuring. And trying it with gzipped text data, I get
> stuck on 99.0%, with zsync repeatedly requesting around 700 bytes.
>
> Anyway, if it's recompressing like I think, there's no way to get the
> same compressed md5sum -- even if the information could be transferred,
> there's no guarantee the local gzip _can_ produce the same output as
> the remote gzip -- imagine if it had used gzip -9 and your local gzip
> only supports -1 through -5, eg.

zsync doesn't fork off some unknown local gzip, and it knows what its
own gzip routines can produce. It can easily be guaranteed that the
zsync client behaves the same way as the remote zsync checksum program
that would test for recompressability.

The failure to sync the file is definitely a bug in zsync. Even if the
recompression fails (which it should know beforehand), it should fall
back to syncing the compressed data and still produce the expected result.

> Hrm, it probably also means that mirrors can't use zsync -- that is,
> if you zsync fooA to fooB you probably can't use fooA.zsync to zsync
> from fooB to fooC.
>
> Anyway, just because you get a different file, that doesn't mean it'll
> act differently; so we could just use an "authentication" mechanism
> that reflects that. That might involve providing sizes and sha1s of the
> uncompressed contents of the ar in the packages file, instead of the
> md5sum of the ar. Except the previous note probably means that you'd
> still need to use the md5sum of the .deb to verify mirrors; which means
> mirrors and users would have different ways of verifying their
> downloads, which is probably fairly undesirable.

Too bad Packages files contain the md5sum of the full deb. Changing
that would be an ugly and lengthy process, so let's not do that.

The only sane way is to make zsync produce identical debs. That isn't
trivial, but it isn't impossible either.

> Relatedly, mirrors (and apt-proxy users, etc) need to provide Packages.gz
> of a particular md5sum/size, so they can't use Packages.diff to speed
> up their diffs. It might be worth considering changing the Release file
> definition to just authenticate the uncompressed files and expect tools
> like apt and debootstrap to authenticate only after uncompressing. A
> "Compression-Methods: gz, bz2" header might suffice to help tools work
> out whether to try downloading Packages.gz, Packages.bz2 or just plain
> Packages first. Possibly "Packages-Compress:" and "Sources-Compress:"
> might be better.
>
> Cheers,
> aj

% gunzip Packages.gz.2
% gunzip Packages.gz.3
% gunzip Packages.gz.4
% gunzip Packages.gz.5
% md5sum *
172930d0165cf3f7b23324ec79e52847  Packages.gz
be00244619e0ed53ae2ba5a454aa3fee  Packages.gz.2
d4c7c8e04d963beb4d3bee4ac8e7bd0f  Packages.gz.3
764c5aa8168cb58d5e4d6412333516a5  Packages.gz.4
764c5aa8168cb58d5e4d6412333516a5  Packages.gz.5

The problem is the timestamp in gzip files. If you patch dak to use
the -n switch, then Packages.diff can be used to update the
uncompressed Packages file, and recompressing it yields an identical
Packages.gz.

Furthermore, zsync could include the timestamp in the .zsync file and
recompress with the same timestamp.
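The timestamp effect is easy to demonstrate with Python's gzip module (illustrative only; `mtime=0` corresponds to what `gzip -n` leaves in the header):

```python
import gzip

data = b"Package: foo\nVersion: 1.0\n" * 50

# Same payload, different MTIME header field -> different .gz bytes:
a = gzip.compress(data, mtime=1)
b = gzip.compress(data, mtime=2)
assert a != b
assert gzip.decompress(a) == gzip.decompress(b)

# With a fixed timestamp the compressed output is reproducible:
c = gzip.compress(data, mtime=0)
d = gzip.compress(data, mtime=0)
assert c == d
```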

MfG
Goswin


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: Packages file missing from unstable archive

2005-11-11 Thread Peter Samuelson

> On Tue, Nov 01, 2005 at 09:54:09AM -0500, Michael Vogt wrote:
> > A problem is that zsync needs to be taught to deal with deb files (that
> > is, it needs to unpack the data.tar and use that for the syncs).

[Anthony Towns]
> That seems kinda awkward -- you'd need to start by downloading the ar
> header, working out where in the file the data.tar.gz starts, then
> redownloading from there. I guess you could include that info in the
> .zsync file though.

Right, the latter.  Having downloaded the .zsync file, you calculate
your local checksums against the ones in that file and you know exactly
what's left to be downloaded and what to do with it.  The .zsync file
includes a sort of map of the structure of the target, not unlike a
jigdo file.

> OTOH, there should be savings in the control.tar.gz too, surely --
> it'd change less than data.tar.gz most of the time, no?

He was only comparing data.tar.gz because that made for a simpler
mock-up.  zsync doesn't currently dig into a .deb at all, so this was
just a simulation, as it were.

> Hrm, thinking about it, I guess zsync probably works by storing the
> state of the gzip table at certain points in the file and doing a
> rolling hash of the contents and recompressing each chunk of the
> file

I haven't actually looked at the implementation of zsync, but I've
always assumed that zsync assumes a homogeneous (i.e., predictable)
gzip algorithm everywhere, works out the known variables by trial and
error, and stores the appropriate amount of state to reproduce the gzip
file exactly, given the assumptions about the gzip implementation.

For that to be correct assumes a certain homogeneity of the zlib used
by zsync implementations; for it to be efficient assumes the same about
whatever is used to compress files in gzip format.  I've always
harbored my doubts about the deployment scalability of this approach.

> Anyway, just because you get a different file, that doesn't mean
> it'll act differently; so we could just use an "authentication"
> mechanism that reflects that. That might involve providing sizes and
> sha1s of the uncompressed contents of the ar in the packages file,
> instead of the md5sum of the ar.

Authenticating uncompressed content is a good design choice anyway.
Makes it easier, for instance, to add gpg signatures inside the ar
file, without invalidating existing checksum authentication.

Conceptually, authenticating content based on a container which is
essentially nondeterministic is a bit like refusing to authenticate a
person because he or she is wearing different clothes from the ones
noted in the auth database.


signature.asc
Description: Digital signature


Re: Packages file missing from unstable archive

2005-11-11 Thread Tim Dijkstra
On Fri, 11 Nov 2005 14:51:30 +1000
Anthony Towns  wrote:

> Anyway, if it's recompressing like I think, there's no way to get the
> same compressed md5sum -- even if the information could be
> transferred, there's no guarantee the local gzip _can_ produce the
> same output as the remote gzip -- imagine if it had used gzip -9 and
> your local gzip only supports -1 through -5, eg.

We could just mandate in policy what the gzip level is supposed to
be. If we're going to do that, it's probably easier to just use
--rsyncable and teach zsync to look inside the ar instead of inside
the gz. That way we also wouldn't have the md5sum problem on the
data.tar.gz. Note that I haven't tested the efficiency of --rsyncable...

grts Tim


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: Packages file missing from unstable archive

2005-11-10 Thread Anthony Towns
On Tue, Nov 01, 2005 at 09:54:09AM -0500, Michael Vogt wrote:
> My next test was to use only the data.tar.gz of the two
> archives. Zsync will extract the gzip file then and use the tar as the
> base. With that I got:
> 8<
> Read data.tar.gz. Target 34.1% complete.
> used 1056768 local, fetched 938415
> 8<
> The size of the data.tar.gz is 1210514. 

Fetching 938kB instead of 1210kB is a 22.5% saving, so 12% of the desired
data was apparently already present, but redownloaded anyway.

> A problem is that zsync needs to be taught to deal with deb files (that
> is, it needs to unpack the data.tar and use that for the syncs).

That seems kinda awkward -- you'd need to start by downloading the
ar header, working out where in the file the data.tar.gz starts, then
redownloading from there. I guess you could include that info in the
.zsync file though. OTOH, there should be savings in the control.tar.gz
too, surely -- it'd change less than data.tar.gz most of the time, no?

How much zsync data is required for that 22.5% saving over 1MB? I guess
it'd be about 16 bytes per 4kB of uncompressed data; assuming 33%
compression, that's 16 bytes per 3kB of compressed data, or ~0.5%
overhead. For 100GB of debs in the archive, that's about an extra half
gig of space used.
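The arithmetic above, spelled out (same assumed figures):

```python
# 16 bytes of checksum per 4 kB of uncompressed data; at ~33% compression
# those 4 kB correspond to roughly 3 kB of compressed .deb.
checksum_bytes = 16
compressed_block = 3 * 1024

overhead = checksum_bytes / compressed_block      # fraction of deb size
archive_bytes = 100 * 1024**3                     # 100 GB of debs
extra = archive_bytes * overhead                  # extra mirror space

print(f"overhead: {overhead:.2%}")                # about 0.5%
print(f"extra:    {extra / 1024**3:.2f} GB")      # about half a gig
```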

Hrm, thinking about it, I guess zsync probably works by storing the
state of the gzip table at certain points in the file and doing a
rolling hash of the contents and recompressing each chunk of the file;
that'd result in the size of the .gz not necessarily being the same, let
alone the md5sum.

Feh, trying to verify this with ~512kB of random data, gzipped, I just
keep getting "Aborting, download available in zsyncnew.gz.part". That's
not terribly reassuring. And trying it with gzipped text data, I get
stuck on 99.0%, with zsync repeatedly requesting around 700 bytes.

Anyway, if it's recompressing like I think, there's no way to get the
same compressed md5sum -- even if the information could be transferred,
there's no guarantee the local gzip _can_ produce the same output as
the remote gzip -- imagine if it had used gzip -9 and your local gzip
only supports -1 through -5, eg.

Hrm, it probably also means that mirrors can't use zsync -- that is,
if you zsync fooA to fooB you probably can't use fooA.zsync to zsync
from fooB to fooC.

Anyway, just because you get a different file, that doesn't mean it'll
act differently; so we could just use an "authentication" mechanism
that reflects that. That might involve providing sizes and sha1s of the
uncompressed contents of the ar in the packages file, instead of the
md5sum of the ar. Except the previous note probably means that you'd
still need to use the md5sum of the .deb to verify mirrors; which means
mirrors and users would have different ways of verifying their
downloads, which is probably fairly undesirable.

Relatedly, mirrors (and apt-proxy users, etc) need to provide Packages.gz
of a particular md5sum/size, so they can't use Packages.diff to speed
up their diffs. It might be worth considering changing the Release file
definition to just authenticate the uncompressed files and expect tools
like apt and debootstrap to authenticate only after uncompressing. A
"Compression-Methods: gz, bz2" header might suffice to help tools work
out whether to try downloading Packages.gz, Packages.bz2 or just plain
Packages first. Possibly "Packages-Compress:" and "Sources-Compress:"
might be better.

Cheers,
aj


signature.asc
Description: Digital signature


Re: Packages file missing from unstable archive

2005-11-10 Thread Anthony Towns
On Wed, Nov 09, 2005 at 04:26:59PM +0100, Goswin von Brederlow wrote:
> Anthony Towns  writes:
> > On Sun, Oct 30, 2005 at 09:48:35AM +0100, Goswin von Brederlow wrote:
> >> Zsync checksum files are, depending on block size, about 3% of the
> >> file size. For the full archive that means under 10G more data. As
> >> comparison adding amd64 needs ~30G. After the scc split there might be
> >> enough space on mirrors for both.
> > Adding amd64 needs 30G? Since when?
> With stable/testing/unstable/experimental it should end up around
> there I think. It's 6-7G for the amd64 sarge debs, so depending on
> overlap you get more or less.

Assuming no overlap and your numbers, you get 3 * 7 = 21 << 30.

For the architectures in the archive, counting oldstable through
experimental, the disk space used by the debs of each architecture
ranges from 9GB (m68k) to 14GB (i386, ia64), including 13GB of
arch:all packages.

It's necessary to have accurate numbers on these things, rather than
pulling things out of the air.

Cheers,
aj



signature.asc
Description: Digital signature


Re: Packages file missing from unstable archive

2005-11-09 Thread Goswin von Brederlow
Michael Vogt <[EMAIL PROTECTED]> writes:

> 8<
> Read data.tar.gz. Target 34.1% complete.
> used 1056768 local, fetched 938415
> 8<
> The size of the data.tar.gz is 1210514. 

So your simple test shows 34% savings for a mixed binary/doc
package. That is very promising. Now imagine syncing the X fonts that
didn't actually change contents between releases.

> A problem is that zsync needs to be taught to deal with deb files (that
> is, it needs to unpack the data.tar and use that for the syncs).
>
> Having it inside dak is not (at the beging) a requirement. Zsync seems
> to be able to deal with URLs, so we could create a pool with zsync
> files on any server and let them point to ftp.debian.org.

Correct. Someone just needs to set up a Debian mirror, run zsync over
every new file and make the checksum files available somewhere
public. That is something I'm willing to do, using alioth to serve the
checksum files. But first zsync has to look into debs.

> We need to guarantee that the md5sum of the synced deb matches the
> md5sum in the Packages file. Initial tests indicate that this is not
> the case. Only the md5sum of the unpacked data.tar file matches, not
> that of the gzip file (or the deb). This is a serious showstopper IMHO.

Isn't that just a problem of the data.tar.gz containing a timestamp
that now differs? How many bytes differ between the original and the
zsync result?

MfG
Goswin





Re: Packages file missing from unstable archive

2005-11-09 Thread Goswin von Brederlow
Anthony Towns  writes:

> On Sun, Oct 30, 2005 at 09:48:35AM +0100, Goswin von Brederlow wrote:
>> Zsync checksum files are, depending on block size, about 3% of the
>> file size. For the full archive that means under 10G more data. As
>> comparison adding amd64 needs ~30G. After the scc split there might be
>> enough space on mirrors for both.
>
> Adding amd64 needs 30G? Since when?

With stable/testing/unstable/experimental it should end up around
there I think. It's 6-7G for the amd64 sarge debs, so depending on
overlap you get more or less.

> And stuff doesn't go on the mirrors because it's "under 30G", it goes on
> the mirrors because it provides useful benefits. Where're the statistics
> showing how much zsync signatures actually help?

Until zsync looks into debs as well as gz files, there won't be any
reasonable gain for debs, and for the Packages/Sources files the diff
method works better. So for now there is no big gain and no strong
argument for adding zsync. But the potential is there for anyone
willing to invest some time and improve the code.

> Cheers,
> aj

MfG
Goswin





Re: Packages file missing from unstable archive

2005-11-01 Thread Michael Vogt
On Thu, Oct 27, 2005 at 10:06:22AM +0200, Robert Lemmen wrote:
> On Wed, Oct 26, 2005 at 09:15:38PM -0400, Joey Hess wrote:
> > (And yes, we still need a solution to speed up the actual deb file
> > downloads..)
[..]
> if zsync were taught to handle .deb files as it does .gz files, and
> a method for apt were written, how big are the chances that support could
> be integrated into dak? the effort wouldn't be *that* big...

I did a pretty unscientific test with apt and the changes from 
0.6.41 -> 0.6.42.1. It contains a good mix of code changes,
documentation updates and translation updates [1].

With the two normal debs I got no effect at all, because no usable
data was found.

I then repacked the data.tar.gz and control.tar.gz inside the deb with
"--rsyncable" (and reassembled the deb). This resulted in:
"Read apt_0.6.41_i386.deb. Target 0.8% complete."
So this didn't have much effect either.

My next test was to use only the data.tar.gz of the two
archives. zsync will then extract the gzip file and use the tar as the
base. With that I got:
8<
Read data.tar.gz. Target 34.1% complete.
used 1056768 local, fetched 938415
8<
The size of the data.tar.gz is 1210514. 

A problem is that zsync needs to be taught to deal with deb files (that
is, it needs to unpack the data.tar and use that for the syncs).

Having it inside dak is not (at the beginning) a requirement. zsync
seems to be able to deal with URLs, so we could create a pool of zsync
files on any server and have them point to ftp.debian.org.

We need to guarantee that the md5sum of the synced deb matches the
md5sum in the Packages file. Initial tests indicate that this is not
the case. Only the md5sum of the unpacked data.tar file matches, not
that of the gzip file (or the deb). This is a serious showstopper IMHO.


Cheers,
 Michael

[1] I would love to hear results from other people testing it with
different packages and different changes.
-- 
Linux is not The Answer. Yes is the answer. Linux is The Question. - Neo





Re: Packages file missing from unstable archive

2005-11-01 Thread Anthony Towns
On Sun, Oct 30, 2005 at 09:48:35AM +0100, Goswin von Brederlow wrote:
> Zsync checksum files are, depending on block size, about 3% of the
> file size. For the full archive that means under 10G more data. As
> comparison adding amd64 needs ~30G. After the scc split there might be
> enough space on mirrors for both.

Adding amd64 needs 30G? Since when?

And stuff doesn't go on the mirrors because it's "under 30G", it goes on
the mirrors because it provides useful benefits. Where're the statistics
showing how much zsync signatures actually help?

Cheers,
aj




Re: Packages file missing from unstable archive

2005-10-30 Thread Goswin von Brederlow
Henrique de Moraes Holschuh <[EMAIL PROTECTED]> writes:

> On Thu, 27 Oct 2005, Robert Lemmen wrote:
>> if zsync were taught to handle .deb files as it does .gz files, and
>
> You are talking about a freaking lot of metadata here, and about changing
> some key stuff to get --rsyncable compression.
>
> I may not understand why most apt metadata in .gz (Packages, Sources,
> Contents...) is not made --rsyncable, but I am quite sure the chances of
> anyone doing official changes to dpkg to use --rsyncable right now are nil.

Zsync checksum files are, depending on block size, about 3% of the
file size. For the full archive that means under 10G more data. As
comparison adding amd64 needs ~30G. After the scc split there might be
enough space on mirrors for both.

zsync is also more capable than rsync and can sync a normal gzip file
efficiently from the checksums of the uncompressed file. It will
download the chunks of the gzip file containing changes and reconstruct
the gzip file from the local uncompressed data and those chunks. The
--rsyncable option is not needed, as zsync can pinpoint the exact byte
where a changed uncompressed block starts in the gzipped file.
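The matching step described here rests on a weak rolling checksum that can be slid across the local data one byte at a time in O(1). A minimal rsync-style version (my own illustration; zsync's actual code and parameters differ):

```python
def weak_checksum(block: bytes) -> tuple:
    """rsync-style weak checksum of a block: (a, b) pair mod 2^16."""
    a = sum(block) % 65536
    b = sum((len(block) - i) * c for i, c in enumerate(block)) % 65536
    return a, b

def roll(a: int, b: int, out: int, inp: int, blocklen: int) -> tuple:
    """Slide the window one byte: drop `out`, take in `inp`, in O(1)."""
    a = (a - out + inp) % 65536
    b = (b - blocklen * out + a) % 65536
    return a, b

data = bytes(range(10)) * 3
blk = 8
a, b = weak_checksum(data[0:blk])
for i in range(1, 5):
    a, b = roll(a, b, data[i - 1], data[i + blk - 1], blk)
    # rolled value matches a from-scratch recomputation at every offset
    assert (a, b) == weak_checksum(data[i:i + blk])
```

Blocks whose weak checksum matches one in the .zsync file are then confirmed with a strong hash; everything unmatched is what gets downloaded.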

MfG
Goswin





Re: Packages file missing from unstable archive

2005-10-29 Thread Goswin von Brederlow
Kurt Roeckx <[EMAIL PROTECTED]> writes:

> On Wed, Oct 26, 2005 at 05:11:00AM -0700, Ian Bruce wrote:
>> 
>> If the .deb files were compressed using the gzip "--rsyncable" option,
>> then fetching them with zsync (or rsync) would be considerably more
>> efficient than straight HTTP transfers.
>
> No it wouldn't.  Remember that .deb files are never supposed to
> change.  For other files like Packages and Sources this might
> work indeed.
>
>
> Kurt

Two things to note:

1) the apt cache (or a local mirror) can contain the previous version of
a deb, and that can be used as a template to zsync the new package.

2) Packages files now have the new diff files, which consist of ed
scripts that are even smaller than what rsync/zsync would achieve.
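Those ed scripts are what apt fetches as pdiffs. A toy applier (my own sketch; it covers only the `a`/`c`/`d` commands that `diff -e` emits, and the sample lines are made up) shows how a few small edits update a large Packages file:

```python
import re

def apply_ed_script(lines, script):
    """Apply a diff -e style ed script (addresses in descending order)."""
    out = list(lines)
    i = 0
    while i < len(script):
        m = re.fullmatch(r"(\d+)(?:,(\d+))?([acd])", script[i])
        i += 1
        start = int(m.group(1))
        end = int(m.group(2) or m.group(1))
        op = m.group(3)
        body = []
        if op in "ac":                      # read text up to the lone "."
            while script[i] != ".":
                body.append(script[i])
                i += 1
            i += 1
        if op == "c":
            out[start - 1:end] = body       # change lines start..end
        elif op == "d":
            out[start - 1:end] = []         # delete lines start..end
        else:
            out[start:start] = body         # append after line `start`
    return out

old = ["Package: foo", "Version: 1.0", "Package: bar"]
# "append after line 3, change line 2" -- descending order keeps the
# earlier addresses valid while later lines are edited first:
script = ["3a", "Package: baz", ".", "2c", "Version: 1.1", "."]
new = apply_ed_script(old, script)
```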

MfG
Goswin





Re: Packages file missing from unstable archive

2005-10-27 Thread Bryan Donlan
On 10/27/05, Henrique de Moraes Holschuh <[EMAIL PROTECTED]> wrote:
> On Thu, 27 Oct 2005, Robert Lemmen wrote:
> > if zsync were taught to handle .deb files as it does .gz files, and
>
> You are talking about a freaking lot of metadata here, and about changing
> some key stuff to get --rsyncable compression.
>
> I may not understand why most apt metadata in .gz (Packages, Sources,
> Contents...) is not made --rsyncable, but I am quite sure the chances of
> anyone doing official changes to dpkg to use --rsyncable right now are nil.

--rsyncable does not change the format of the output; it merely tweaks
the compressor in such a way that the result _tends_ to be more
rsyncable. It can be decompressed in exactly the same way as before.



Re: Packages file missing from unstable archive

2005-10-27 Thread Henrique de Moraes Holschuh
On Thu, 27 Oct 2005, Robert Lemmen wrote:
> if zsync were taught to handle .deb files as it does .gz files, and

You are talking about a freaking lot of metadata here, and about
changing some key stuff to get --rsyncable compression.

I may not understand why most apt metadata in .gz (Packages, Sources,
Contents...) is not made --rsyncable, but I am quite sure the chances of
anyone doing official changes to dpkg to use --rsyncable right now are nil.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh





Re: Packages file missing from unstable archive

2005-10-27 Thread Robert Lemmen
On Wed, Oct 26, 2005 at 09:15:38PM -0400, Joey Hess wrote:
> (And yes, we still need a solution to speed up the actual deb file
> downloads..)

i think zsync is the way to go here. it wouldn't cause load on the
servers the way rsync does, and it only requires a few percent more
mirror space.

if zsync were taught to handle .deb files as it does .gz files, and
a method for apt were written, how big are the chances that support
could be integrated into dak? the effort wouldn't be *that* big...

cu  robert

-- 
Robert Lemmen   http://www.semistable.com 




Re: Packages file missing from unstable archive

2005-10-27 Thread Robert Lemmen
On Wed, Oct 26, 2005 at 04:47:21PM -0700, Ian Bruce wrote:
> As explained, I wish to use rsync (or preferably, zsync) to update the
> local packages list; repeatedly downloading the 3.6MB "Packages.gz" file
> over a 56kb/s link is highly undesirable. I am unable to understand why
> this ambition is considered to be unreasonable.

as joey already said, the index diff stuff is the way to go (it's also
more efficient than rsync). if you can't use apt-get from experimental,
there is also a script by aba and a c implementation by me; you can get
both from http://www.semistable.com/files. in the script you should
replace "ed" with "red". the c implementation hasn't been touched in a
while and i don't know how well it works now

cu  robert

-- 
Robert Lemmen   http://www.semistable.com 




Re: Packages file missing from unstable archive

2005-10-26 Thread Joey Hess
Ian Bruce wrote:
> As explained, I wish to use rsync (or preferably, zsync) to update the
> local packages list; repeatedly downloading the 3.6MB "Packages.gz" file
> over a 56kb/s link is highly undesirable. I am unable to understand why
> this ambition is considered to be unreasonable.

Is there some reason you're ignoring the parts of this thread where the
diff stuff is explained?

This is an apt-get update, using that on dialup. 28.8 or so, my line sucks.
I haven't updated for a couple of days.

[EMAIL PROTECTED]:~>sudo apt-get update
Get:1 http://ftp.debian.org unstable Release.gpg [189B]
Ign http://ftp.debian.org unstable/main Translation-en
Ign http://ftp.debian.org unstable/contrib Translation-en
Ign http://ftp.debian.org unstable/non-free Translation-en
Ign http://ftp.debian.org unstable/main Translation-en
Ign http://ftp.debian.org unstable/contrib Translation-en
Ign http://ftp.debian.org unstable/non-free Translation-en
Get:2 http://ftp.debian.org ../project/experimental Release.gpg [189B]
Ign http://ftp.debian.org ../project/experimental/main Translation-en
Get:3 http://uqm.debian.net unstable/ Release.gpg [189B]
Hit http://ftp.debian.org unstable Release
Ign http://uqm.debian.net unstable/ Translation-en
Get:4 http://ftp.debian.org ../project/experimental Release [21.6kB]
Get:5 http://ftp.debian.org unstable/main Packages/DiffIndex [1760B]
Get:6 http://ftp.debian.org unstable/contrib Packages/DiffIndex [1609B]
Get:7 http://ftp.debian.org unstable/non-free Packages/DiffIndex [919B]
Get:8 http://ftp.debian.org unstable/main Sources/DiffIndex [1747B]
Get:9 http://ftp.debian.org unstable/contrib Sources/DiffIndex [1609B]
Get:10 http://ftp.debian.org unstable/non-free Sources/DiffIndex [919B]
Ign http://ftp.debian.org ../project/experimental/main Packages/DiffIndex
Get:11 2005-10-25-1310.14.pdiff [16.9kB]
Get:12 2005-10-25-1310.14.pdiff [16.9kB]
Get:13 2005-10-25-1310.14.pdiff [16.9kB]
Get:14 2005-10-25-1310.14.pdiff [189B]
Get:15 2005-10-25-1310.14.pdiff [189B]
Get:16 2005-10-25-1310.14.pdiff [326B]
Get:17 2005-10-25-1310.14.pdiff [326B]
Get:18 2005-10-25-1310.14.pdiff [9562B]
Hit http://uqm.debian.net unstable/ Release
Get:19 2005-10-25-1310.14.pdiff [189B]
Get:20 2005-10-25-1310.14.pdiff [326B]
Get:21 2005-10-25-1310.14.pdiff [9562B]
Get:22 2005-10-25-1310.14.pdiff [9562B]
Get:23 2005-10-25-1310.14.pdiff [31B]
Get:24 2005-10-25-1310.14.pdiff [31B]
Get:25 2005-10-25-1310.14.pdiff [31B]
Get:26 2005-10-25-1310.14.pdiff [255B]
Get:27 2005-10-25-1310.14.pdiff [255B]
Get:28 2005-10-25-1310.14.pdiff [255B]
Ign http://ftp.debian.org ../project/experimental/main Packages
Get:29 2005-10-26-1312.16.pdiff [13.0kB]
Ign http://uqm.debian.net unstable/ Packages/DiffIndex
Get:30 2005-10-26-1312.16.pdiff [13.0kB]
Get:31 2005-10-26-1312.16.pdiff [13.0kB]
Get:32 2005-10-26-1312.16.pdiff [240B]
Get:33 2005-10-26-1312.16.pdiff [240B]
Ign http://uqm.debian.net unstable/ Packages
Get:34 2005-10-26-1312.16.pdiff [7942B]
Get:35 2005-10-26-1312.16.pdiff [240B]
Get:36 2005-10-26-1312.16.pdiff [7942B]
Get:37 2005-10-26-1312.16.pdiff [7942B]
Hit http://uqm.debian.net unstable/ Packages
Get:38 2005-10-26-1312.16.pdiff [260B]
Get:39 2005-10-26-1312.16.pdiff [260B]
Get:40 2005-10-26-1312.16.pdiff [260B]
Get:41 http://ftp.debian.org ../project/experimental/main Packages [214kB]
Fetched 42.1MB in 2m4s (338kB/s)
^ 

If experimental had these diff files too, I could shave another 20
seconds or so off, but as it is, 2-minute Packages updates over dialup
are faster than things have been for at least ten years.

If you go back and read several of the threads about different
ways to speed up Packages file downloads, most of their volume is people
standing around bikeshedding, promoting their favorite pet ideas. The
fact that some people went off and produced a working solution despite
that wins them my utmost respect. Ignoring what they've done and
repeating the same tired arguments is lame.

(And yes, we still need a solution to speed up the actual deb file
downloads..)

--
see shy jo




Re: Packages file missing from unstable archive

2005-10-26 Thread Ian Bruce
On Thu, 27 Oct 2005 00:24:36 +0200
Joerg Jaspert <[EMAIL PROTECTED]> wrote:

> > Returning to the original question: Does anybody know why the
> > uncompressed "Packages" file has disappeared from the "unstable"
> > archive?
> 
> Because relevant tools have not used / should not use that file for
> years. It was announced *long* ago to happen "in a few days", so now
> it has happened.
> See:
> http://lists.debian.org/debian-devel-announce/2002/08/msg8.html

I hadn't seen that announcement before, but it still doesn't answer the
question of "why".

As explained, I wish to use rsync (or preferably, zsync) to update the
local packages list; repeatedly downloading the 3.6MB "Packages.gz" file
over a 56kb/s link is highly undesirable. I am unable to understand why
this ambition is considered to be unreasonable.

(At this point, somebody is sure to say "because rsync imposes too much
computational load on the network servers." Shouldn't the decision of
whether or not to offer rsync access be up to the administrators of each
individual mirror? In any case, zsync is the solution to the problem; it
would decrease the servers' network load without increasing their
compute load.)

As far as I can see, updating the packages list with rsync requires
either an uncompressed "Packages" file, or a "Packages.gz" file
compressed with the "--rsyncable" option. Currently, neither of these
exists in the "unstable" archive (and according to that announcement,
the "testing" archive will follow). Why is rsync considered to be an
undesirable method of accessing the archive? The relative costs of
network traffic versus CPU cycles are quite different in many places
outside the United States. Why are the needs of sites with poor network
connectivity considered unimportant?

If there are any "relevant tools" which can update the package lists
without downloading the whole file, and without using rsync/zsync,
please advise me of such. I'm not committed to any particular solution
or piece of software. I just don't understand why the issue of
minimizing network traffic is thought to be universally irrelevant. Why
shouldn't there be a variety of access methods, to address the varying
situations of different client and mirror sites?


-- Ian Bruce





Re: Packages file missing from unstable archive

2005-10-26 Thread Joerg Jaspert
On 10454 March 1977, Ian Bruce wrote:

> Returning to the original question: Does anybody know why the
> uncompressed "Packages" file has disappeared from the "unstable"
> archive?

Because relevant tools have not used / should not use that file for
years. It was announced *long* ago to happen "in a few days", so now it
has happened.
See: http://lists.debian.org/debian-devel-announce/2002/08/msg8.html


-- 
bye Joerg
It seems to me that the account creation step could be fully automated:
checking the box "approved by DAM" could trigger an insert into the LDAP
database thereby creating the account.
   <[EMAIL PROTECTED]>




Re: Packages file missing from unstable archive

2005-10-26 Thread Ian Bruce
On Wed, 26 Oct 2005 19:12:30 +0200
Kurt Roeckx <[EMAIL PROTECTED]> wrote:

> > If the .deb files were compressed using the gzip "--rsyncable"
> > option, then fetching them with zsync (or rsync) would be
> > considerably more efficient than straight HTTP transfers.
> 
> No it wouldn't.  Remember that .deb files are never supposed to
> change.  For other files like Packages and Sources this might
> work indeed.

zsync has an option ("-i") to specify a local file with another
name to be used as a reference for the difference algorithm. In the
case of apt-get, or especially apt-proxy, where you have previous
versions of the package lying around, with similar filenames, this is
obviously the way zsync would be used.

Returning to the original question: Does anybody know why the
uncompressed "Packages" file has disappeared from the "unstable"
archive?

Can it either be replaced, or alternatively, can the "Packages.gz" file
be compressed using the "--rsyncable" option, so that rsync can again
be used for updating the packages list?


-- Ian Bruce





Re: Packages file missing from unstable archive

2005-10-26 Thread Kurt Roeckx
On Wed, Oct 26, 2005 at 05:11:00AM -0700, Ian Bruce wrote:
> 
> If the .deb files were compressed using the gzip "--rsyncable" option,
> then fetching them with zsync (or rsync) would be considerably more
> efficient than straight HTTP transfers.

No it wouldn't.  Remember that .deb files are never supposed to
change.  For other files like Packages and Sources this might
work indeed.


Kurt





Re: Packages file missing from unstable archive

2005-10-26 Thread Henrique de Moraes Holschuh
On Wed, 26 Oct 2005, Ian Bruce wrote:
> option was implemented. Perhaps it's thought that more testing is
> required before it can be used for the archives; is there any other
> reason not to use it?

The way gzip --rsyncable works is perfectly safe; it cannot cause data loss,
AFAIK.  It just makes gzip begin compression blocks at predictable places in
the plaintext data, places that tend to stay constant across versions.
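To make that concrete, here is a toy sketch of the idea in Python. This is an
illustration of content-defined block boundaries, not gzip's actual
implementation: the compressor is restarted wherever a rolling sum over the
plaintext hits a chosen value, so block boundaries depend only on nearby bytes
and a local edit cannot shift every later block.

```python
import zlib

def rsyncable_compress(data: bytes, window: int = 32, mask: int = 0xFFF):
    """Toy model of gzip --rsyncable: compress independent blocks whose
    boundaries are picked by a rolling sum over the plaintext, so the
    same content tends to produce the same blocks even after local edits."""
    chunks, start, rolling = [], 0, 0
    for i in range(len(data)):
        rolling += data[i]
        if i >= window:
            rolling -= data[i - window]          # slide the window
        # Start a new block when the low bits of the sum match the mask
        # (and the current block isn't trivially small).
        if (rolling & mask) == mask and i - start > 256:
            chunks.append(zlib.compress(data[start:i]))
            start = i
    chunks.append(zlib.compress(data[start:]))
    return chunks
```

Concatenating the decompressed blocks recovers the input exactly; an
rsync-style tool can then re-use any block whose checksum it already has.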

OTOH, it does decrease compression ratio (probably *very* little).  But if
compression ratio was important, we would have switched to bzip2 for
everything anyway.

AFAIK, there is no technical reason for not using gzip --rsyncable, other
than the simple fact that nobody has modified the dak code yet.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh





Re: Packages file missing from unstable archive

2005-10-26 Thread Ian Bruce
On Wed, 26 Oct 2005 12:05:08 +0200
Goswin von Brederlow <[EMAIL PROTECTED]> wrote:

> > -- has there been any progress towards providing zsync access to the
> > archives? It would seem that this would result in greatly reduced
> > data traffic on the network servers, without increasing the
> > computational load, as rsync does; I gather that this is the main
> > objection to its use.
> 
> zsync uses http so you already have access. What is missing are the
> checksum files. Also, last I checked, zsync didn't yet have support to
> sync the contents of a deb as opposed to syncing the compressed data
> itself. Before that the savings are minimal at best.

To speak of zsync access obviously implies the existence of the control
files, which, as you say, are not there. Therefore the archives are not
currently accessible with zsync.

If the .deb files were compressed using the gzip "--rsyncable" option,
then fetching them with zsync (or rsync) would be considerably more
efficient than straight HTTP transfers. That's the reason why that
option was implemented. Perhaps it's thought that more testing is
required before it can be used for the archives; is there any other
reason not to use it?


-- Ian Bruce





Re: Packages file missing from unstable archive

2005-10-26 Thread Goswin von Brederlow
Ian Bruce <[EMAIL PROTECTED]> writes:

> Some related questions:
>
> -- what is the purpose of the "Packages.diff/" directory which has
> appeared in the "testing" and "unstable" archives? Is there some piece
> of software which makes use of this for updating the packages lists?

apt-get (experimental only iirc) uses this to only download changes
since the last update. This has been brewing for a long time and was
previously available from people.d.o/~aba. It is now integrated into
tiffany.
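For reference, the files under Packages.diff/ are ed-style diffs, applied in
sequence to the old Packages file. A minimal sketch of applying one such
script (simplified; the real tools also verify checksums against the
accompanying Index file):

```python
import re

def apply_ed_script(text: str, script: str) -> str:
    """Apply an ed-style diff (the format used by the pdiff files) to
    text.  Commands look like "5,7c", "3d", or "10a"; for a/c the new
    lines follow, terminated by a line containing only ".".  Commands
    are assumed to appear in descending line order, as diff -e emits
    them, so earlier line numbers stay valid while we apply."""
    lines = text.split("\n")
    it = iter(script.split("\n"))
    for cmd in it:
        m = re.match(r"(\d+)(?:,(\d+))?([acd])$", cmd)
        if not m:
            continue
        start, end = int(m.group(1)), int(m.group(2) or m.group(1))
        op = m.group(3)
        body = []
        if op in "ac":
            for line in it:
                if line == ".":
                    break
                body.append(line)
        if op == "a":                     # append after line `start`
            lines[start:start] = body
        elif op == "c":                   # change lines start..end
            lines[start - 1:end] = body
        else:                             # "d": delete lines start..end
            lines[start - 1:end] = []
    return "\n".join(lines)
```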

> -- is it possible that the "Packages.gz" files could be compressed using
> the gzip "--rsyncable" option? Or is this already the case?

Not sure about --rsyncable, but the file contains the timestamp, and
possibly file permissions or something, as gzip output does.  Running gzip
on Packages locally does not give an identical result.  For debmirror I
rsync the Packages.gz, gunzip it, rsync the Packages, bzip2 it, and rsync
the Packages.bz2 (each step only if the md5sums don't match).
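The timestamp point is easy to demonstrate with Python's gzip module, which
exposes the header's mtime field directly (illustrative only; the archive's
Packages.gz is of course produced by a different gzip implementation):

```python
import gzip
import hashlib

data = b"Package: example\nVersion: 1.0-1\n" * 1000

# gzip stores a 4-byte modification time in its header, so compressing
# identical input at different times yields byte-different archives
# and therefore different md5sums.
a = gzip.compress(data, mtime=1_000_000)
b = gzip.compress(data, mtime=2_000_000)
print(hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest())  # False

# Pinning mtime (with the same implementation and compression level)
# makes the output reproducible.
print(gzip.compress(data, mtime=1_000_000) == a)  # True
```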

> -- has there been any progress towards providing zsync access to the
> archives? It would seem that this would result in greatly reduced data
> traffic on the network servers, without increasing the computational
> load, as rsync does; I gather that this is the main objection to its
> use.

zsync uses http so you already have access. What is missing are the
checksum files. Also, last I checked, zsync didn't yet have support to
sync the contents of a deb as opposed to syncing the compressed data
itself. Before that the savings are minimal at best.

MfG
Goswin





Re: Packages file missing from unstable archive

2005-10-26 Thread Philip Charles
On Wed, 26 Oct 2005 21:32, Ian Bruce wrote:
> It seems that recently, the uncompressed version of the "Packages" file
> has disappeared from the "unstable" archive on the Debian network
> servers and all their mirrors.
>
> http://ftp.debian.org/debian/dists/unstable/main/binary-i386/
>
> On the other hand, the uncompressed file is still available for the
> "stable" and "testing" archives.
>
> http://ftp.debian.org/debian/dists/stable/main/binary-i386/
> http://ftp.debian.org/debian/dists/testing/main/binary-i386/
>
> What is the explanation for this decision? It makes it impossible to
> use rsync to update the packages list. (Perhaps this was actually the
> motivation for the change, but shouldn't it be up to the administrators
> of each mirror whether or not they want to allow rsync access?)
>
> Some related questions:
>
> -- what is the purpose of the "Packages.diff/" directory which has
> appeared in the "testing" and "unstable" archives? Is there some piece
> of software which makes use of this for updating the packages lists?
>
> -- is it possible that the "Packages.gz" files could be compressed
> using the gzip "--rsyncable" option? Or is this already the case?
>
> -- has there been any progress towards providing zsync access to the
> archives? It would seem that this would result in greatly reduced data
> traffic on the network servers, without increasing the computational
> load, as rsync does; I gather that this is the main objection to its
> use.
>
> Perhaps the answers to these questions are available in some obvious
> place; I looked everywhere that occurred to me, but didn't find
> anything.


I got caught by this too.  Then I remembered a discussion some time back 
to the effect that downloading the full Packages file was a waste of time 
and that diffs would do the job.

I personally don't bother with the diffs, but use 
bunzip2 -fk .../dists/sid/main/binary-hurd-i386/Packages.bz2

Phil.
--
  Philip Charles; 39a Paterson Street, Abbotsford, Dunedin, New Zealand
   +64 3 488 2818    Fax +64 3 488 2875    Mobile 025 267 9420
 [EMAIL PROTECTED] - preferred.  [EMAIL PROTECTED]
  I sell GNU/Linux & GNU/Hurd CDs & DVDs.   See http://www.copyleft.co.nz





Packages file missing from unstable archive

2005-10-26 Thread Ian Bruce
It seems that recently, the uncompressed version of the "Packages" file
has disappeared from the "unstable" archive on the Debian network
servers and all their mirrors.

http://ftp.debian.org/debian/dists/unstable/main/binary-i386/

On the other hand, the uncompressed file is still available for the
"stable" and "testing" archives.

http://ftp.debian.org/debian/dists/stable/main/binary-i386/
http://ftp.debian.org/debian/dists/testing/main/binary-i386/

What is the explanation for this decision? It makes it impossible to
use rsync to update the packages list. (Perhaps this was actually the
motivation for the change, but shouldn't it be up to the administrators
of each mirror whether or not they want to allow rsync access?)

Some related questions:

-- what is the purpose of the "Packages.diff/" directory which has
appeared in the "testing" and "unstable" archives? Is there some piece
of software which makes use of this for updating the packages lists?

-- is it possible that the "Packages.gz" files could be compressed using
the gzip "--rsyncable" option? Or is this already the case?

-- has there been any progress towards providing zsync access to the
archives? It would seem that this would result in greatly reduced data
traffic on the network servers, without increasing the computational
load, as rsync does; I gather that this is the main objection to its
use.

Perhaps the answers to these questions are available in some obvious
place; I looked everywhere that occurred to me, but didn't find
anything.


-- Ian Bruce

