Re: package pool and big Packages.gz file

2001-01-08 Thread Goswin Brederlow
   == Jason Gunthorpe [EMAIL PROTECTED] writes:

  On 8 Jan 2001, Goswin Brederlow wrote:
 
 I don't need to get a file listing, apt-get tells me the
 name. :)

  You have missed the point: the presence of the ability to do
  file listings prevents the adoption of rsync servers with high
  connection limits.

Then that feature should be limited to non-recursive listings or
turned off. Or pre-generated .listing files could be created and simply served.

  Reversed checksums (with a detached checksum file) is something
  someone should implement for debian-cd. You could even quite
  reasonably do that totally using HTTP and not run the risk of
  rsync load at all.
 
 At the moment the client calculates one rolling checksum and
 md5sum per block.

  I know how rsync works, and it uses MD4.

Oops, then s/5/4/g.

 Given a 650MB file, I don't want to know the hit/miss ratios
 for the rolling checksum and the md5sum. Must be really bad.

  The ratio is supposed to only scale with block size, so it
  should be the same for big files and small files (ignoring the
  increase in block size with file size).  The amount of time
  expended doing this calculation is not trivial however.

Hmm, the technical paper says it builds a 16-bit external hash
table, each entry being a linked list of items containing the full
32-bit rolling checksum (or the other 16 bits) and the md4sum.

So when you have more blocks, the hash fills up: you get more hits
on the first level and have to search a linked list. With a block
size of 1K, a CD image has about 10 items per hash entry; the table
is 1000% full. The time wasted just checking the rolling checksums
must be huge.
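
To make that concrete, here is a minimal C++ sketch of the lookup structure
the paper describes (the identifiers and the bucket function are illustrative,
not rsync's actual code; real rsync derives its 16-bit hash from the rolling
sum rather than simply truncating it):

#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

struct BlockSum {
    uint32_t rolling;   // full 32-bit rolling checksum of one block
    uint8_t  md4[16];   // strong checksum of the same block
    size_t   offset;    // block position in the reference file
};

// 65536 buckets; with 650,000 blocks each bucket averages ~10 entries.
static std::vector<BlockSum> table[65536];

// Probe order: cheap 16-bit bucket, then the 32-bit sum, and only then
// the expensive MD4 of the current window (computed lazily, at most once).
static const BlockSum *lookup(uint32_t rolling,
                              const std::function<void(uint8_t *)> &md4Of) {
    uint8_t md4[16];
    bool computed = false;
    for (const BlockSum &b : table[rolling & 0xFFFF]) {
        if (b.rolling != rolling)
            continue;                 // first-level hit, 32-bit miss
        if (!computed) { md4Of(md4); computed = true; }
        if (std::memcmp(b.md4, md4, 16) == 0)
            return &b;                // true block match
        // 32-bit hit with a different MD4: the pure-chance case below
    }
    return nullptr;
}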

And with 650,000 rolling checksums for the image, there's a ~10/65536
chance of hitting the same checksum with a different md4sum, so
that's about 100 times per CD, just by pure chance.

If the images match, then it's 650,000 times.

So the better the match and the more blocks you have, the more CPU it
takes. Of course larger blocks take more time to compute an md4sum, but
then you have fewer blocks.

  For CD images the concern is of course available disk
  bandwidth, reversed checksums eliminate that bottleneck.

That too. And RAM.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-08 Thread Jason Gunthorpe

On 8 Jan 2001, Goswin Brederlow wrote:

 Then that feature should be limited to non-recursive listings or
 turned off. Or pre-generated .listing files could be created and simply served.

*couf* rproxy *couf*

 So when you have more blocks, the hash fills up: you get more hits
 on the first level and have to search a linked list. With a block
 size of 1K, a CD image has about 10 items per hash entry; the table
 is 1000% full. The time wasted just checking the rolling checksums
 must be huge.

Sure, but that is trivially solvable and is really a minor amount of
time when compared with computing the MD4 hashes. In fact when you
start talking about 650,000 blocks you want to reconsider the design
choices that were made with rsync's searching - it is geared toward
small files and is not really optimal for big ones.

 So the better the match and the more blocks you have, the more CPU it
 takes. Of course larger blocks take more time to compute an md4sum, but
 then you have fewer blocks.

No. The smaller the blocks, the more CPU time it will take to compute MD4
hashes. Expect MD4 to run at ~100meg/sec on modern hardware, so you are
looking at burning 6 seconds of CPU time to verify the local CD image.

If you start getting 32-bit checksum matches with md4 mismatches due
to too large a block size, then you could easily double or triple the
number of md4 calculations you need. That is still totally dwarfed by the
~10meg/sec I/O throughput you can expect when copying a 600 meg ISO
file. 
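
The arithmetic behind those figures, as a sketch (the two rates are the
assumptions stated above, not measurements):

#include <cstdio>

int main() {
    const double image_mb = 650.0;   // one CD image
    const double md4_mb_s = 100.0;   // assumed MD4 throughput
    const double disk_mb_s = 10.0;   // assumed disk read throughput
    std::printf("MD4 pass:  %.1f s CPU\n", image_mb / md4_mb_s);   // ~6.5 s
    std::printf("disk pass: %.1f s I/O\n", image_mb / disk_mb_s);  // ~65 s
    // Even tripling the MD4 work stays well under the cost of one read.
    return 0;
}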
 
Jason




Re: package pool and big Packages.gz file

2001-01-07 Thread Sam Vilain
On Fri, 5 Jan 2001 09:33:05 -0700 (MST)
Jason Gunthorpe [EMAIL PROTECTED] wrote:

  If that suits your needs, feel free to write a bug report on apt about
  this.
 Yes, I enjoy closing such bug reports with a terse response.
 Hint: Read the bug page for APT to discover why!

From bug report #76118:

No. Debian can not support the use of rsync for anything other than
mirroring, APT will never support it.

Why?  Because if everyone used rsync, the loads on the servers that supported 
rsync would be too high?  Or something else?
--
Sam Vilain, [EMAIL PROTECTED]    WWW: http://sam.vilain.net/
GPG public key: http://sam.vilain.net/sam.asc




Re: package pool and big Packages.gz file

2001-01-07 Thread Sam Vilain
On Fri, 5 Jan 2001 19:08:38 +0200
[EMAIL PROTECTED] (Sami Haahtinen) wrote:

 Or, can rsync sync binary files?
 hmm.. this sounds like something worth implementing..

rsync can, but the problem is with compressed streams: if you insert or alter 
data early on in the stream, the data after that change is radically different.
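
A small zlib demonstration of the effect (illustrative; any deflate-based
compressor behaves this way): flip one byte near the start of the input and
almost none of the two compressed streams stays identical.

#include <cstdio>
#include <vector>
#include <zlib.h>

int main() {
    std::vector<unsigned char> a(1 << 20);
    for (size_t i = 0; i < a.size(); ++i)
        a[i] = (unsigned char)((i * 31) % 251);    // compressible pattern
    std::vector<unsigned char> b = a;
    b[100] ^= 0xFF;                                // one byte changed early

    uLongf na = compressBound(a.size()), nb = na;
    std::vector<unsigned char> ca(na), cb(nb);
    compress(ca.data(), &na, a.data(), a.size());
    compress(cb.data(), &nb, b.data(), b.size());

    uLongf same = 0;
    while (same < na && same < nb && ca[same] == cb[same])
        ++same;
    std::printf("outputs %lu and %lu bytes, common prefix only %lu bytes\n",
                (unsigned long)na, (unsigned long)nb, (unsigned long)same);
    return 0;
}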

But... you could use it successfully against the .tar files inside the .deb, 
which are normally compressed.  This would probably require some special 
implementation of rsync, or having the uncompressed packages on the server and 
putting the magic in apt.

Or perhaps the program apt-mirror is called for, which talks its own protocol 
to other copies of itself, and will do a magic job of selectively updating 
mirror copies of the debian archive using the rsync algorithm.  This would be 
similar to the apt-get and apt-move pair, but actually sticking it into a 
directory structure that looks like the debian mirror.  Then, if you want to 
enable it, turn on the server version and share your mirror with your friends 
inside your corporate network!  Or an authenticated version, so that a person 
with their own permanent internet connection could share their archive with a 
handful of friends - having an entire mirror would be too costly for them.  I 
think this has some potential to be quite useful and reduce bandwidth 
requirements.  It could use GPG signatures to check that nothing funny is going 
on, too.

Either that, or keep a number of patch files or .xd files for a couple of old 
revs per package against the uncompressed contents of packages, to allow small 
changes to packages to be quick.  Or perhaps implement this as patch packages, 
which are special .debs that only contain the changed files and upgrade the 
package.
--
Sam Vilain, [EMAIL PROTECTED]    WWW: http://sam.vilain.net/
GPG public key: http://sam.vilain.net/sam.asc




Re: package pool and big Packages.gz file

2001-01-07 Thread Goswin Brederlow
   == Sam Vilain [EMAIL PROTECTED] writes:

  On Fri, 5 Jan 2001 09:33:05 -0700 (MST) Jason Gunthorpe
  [EMAIL PROTECTED] wrote:

  If that suits your needs, feel free to write a bug report on
 apt about this.  Yes, I enjoy closing such bug reports with a
 terse response.  Hint: Read the bug page for APT to discover
 why!

 From bug report #76118:

  No. Debian can not support the use of rsync for anything other
  than mirroring, APT will never support it.

  Why?  Because if everyone used rsync, the loads on the servers
  that supported rsync would be too high?  Or something else?

Actually the load should drop, provided the following feature
add-ons:

1. cached checksums and pulling instead of pushing
2. client-side unpacking of compressed streams

That way the rsync servers would first serve the checksum file from
cache (it being 200-1000 times smaller than the real file) and then
just the blocks the client asks for. So if 1% of the file being
rsynced matches, that's break-even, and everything above that saves
bandwidth.

The current mode of operation of rsync works in reverse, so all
the computation is done on the server every time, which of course is a
heavy load on the server.
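
A sketch of what such a pull-style client could look like, with plain local
files standing in for the two HTTP transfers. The checksum-file format, the
file names and the weak sum are all simplifications for illustration, not a
real rsync wire format, and a real client would confirm every hit with a
strong checksum before trusting it:

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <map>
#include <vector>

static const size_t B = 1024;   // block size

// rsync-style weak checksum; cheap, and slideable one byte at a time.
static uint32_t weakSum(const unsigned char *p, size_t n) {
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; ++i) { s1 += p[i]; s2 += s1; }
    return (s1 & 0xFFFF) | (s2 << 16);
}

static std::vector<unsigned char> slurp(const char *path) {
    std::vector<unsigned char> v;
    if (FILE *f = std::fopen(path, "rb")) {
        unsigned char buf[4096];
        size_t n;
        while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
            v.insert(v.end(), buf, buf + n);
        std::fclose(f);
    }
    return v;
}

int main() {
    // 1. The cached checksum file, one uint32 weak sum per remote block
    //    ("remote.sums" stands in for the first HTTP download).
    std::vector<unsigned char> sums = slurp("remote.sums");
    std::vector<unsigned char> local = slurp("local.old");

    std::map<uint32_t, size_t> remote;   // weak sum -> remote block number
    for (size_t i = 0; i + 4 <= sums.size(); i += 4) {
        uint32_t w;
        std::memcpy(&w, &sums[i], 4);
        remote[w] = i / 4;
    }

    // 2. Slide over the old local file; every hit is a block we do NOT
    //    have to fetch. (Recomputed per offset for clarity; a real
    //    client rolls the sum forward one byte at a time.)
    std::vector<bool> have(sums.size() / 4, false);
    for (size_t off = 0; off + B <= local.size(); ++off) {
        auto it = remote.find(weakSum(&local[off], B));
        if (it != remote.end())
            have[it->second] = true;
    }

    // 3. Whatever is still missing would be requested as HTTP ranges.
    size_t missing = 0;
    for (size_t i = 0; i < have.size(); ++i)
        if (!have[i]) ++missing;
    std::printf("%zu of %zu blocks must be fetched\n", missing, have.size());
    return 0;
}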

I hope both features will work without changing the server, but if not,
we will have to wait till servers catch up with the feature.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-07 Thread Falk Hueffner
Sam Vilain [EMAIL PROTECTED] writes:

 On Fri, 5 Jan 2001 19:08:38 +0200
 [EMAIL PROTECTED] (Sami Haahtinen) wrote:
 
  Or, can rsync sync binary files?
  hmm.. this sounds like something worth implementing..
 
 rsync can, but the problem is with compressed streams: if you insert
 or alter data early on in the stream, the data after that change is
 radically different.
 
 But... you could use it successfully against the .tar files inside
 the .deb, which are normally compressed.  This would probably
 require some special implementation of rsync, or having the
 uncompressed packages on the server and putting the magic in apt.

 [...]
 
 Either that, or keep a number of patch files or .xd files for a
 couple of old revs per package against the uncompressed contents of
 packages, to allow small changes to packages to be quick.  Or perhaps
 implement this as patch packages, which are special .debs that only
 contain the changed files and upgrade the package.

I suggest you have a look at 'tje' by Joost Witteveen
(http://joostje.op.het.net/tje/index.html). It is specifically written
with the goal of syncing Debian mirrors with minimum bandwidth
use. It doesn't use the rsync algorithm, but something similar. It
understands .debs and claims to have lower server CPU usage than rsync,
since it caches diffs and md5sums. It would be really nice if anybody
with an up-to-date mirror could volunteer to provide a machine to set
up a tje server to test it a little more...

Falk




Re: package pool and big Packages.gz file

2001-01-07 Thread Matt Zimmerman
On Sun, Jan 07, 2001 at 03:49:43PM +0100, Goswin Brederlow wrote:

 Actually the load should drop, provided the following feature
 add-ons:
 [...]

The load should drop from that induced by the current rsync setup (for the
mirrors), but if many, many more clients start using rsync (instead of
FTP/HTTP), I think there will still be a significant net increase in load.

Whether it would be enough to cause a problem is debatable, and I honestly
don't know either way.

-- 
 - mdz




Re: package pool and big Packages.gz file

2001-01-07 Thread Goswin Brederlow
   == Matt Zimmerman [EMAIL PROTECTED] writes:

  On Sun, Jan 07, 2001 at 03:49:43PM +0100, Goswin Brederlow
  wrote:
 Actually the load should drop, provided the following feature
 add-ons: [...]

  The load should drop from that induced by the current rsync
  setup (for the mirrors), but if many, many more clients start
  using rsync (instead of FTP/HTTP), I think there will still be
  a significant net increase in load.

  Whether it would be enough to cause a problem is debatable, and
  I honestly don't know either way.

When the checksums are cached there will be no CPU load caused by
rsync, since it will only transfer the file. And the checksum files
will be really small, as I said, so if some similarity is found the
reduction in data will more than make up for the checksum download.
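
Rough break-even arithmetic for that claim, under assumed sizes (a 4-byte
weak plus a 16-byte strong checksum per 8K block; the 200-1000 figure above
implies still larger blocks, or weak sums only):

#include <cstdio>

int main() {
    const double file = 650e6;        // bytes in the served file
    const double block = 8192;        // assumed block size
    const double per_block = 4 + 16;  // weak + strong checksum bytes
    double sums = file / block * per_block;
    std::printf("checksum file: %.1f MB, 1/%.0f of the file\n",
                sums / 1e6, file / sums);   // ~1.6 MB, ~1/410
    // So once matched blocks save more than ~0.25% of the file, the
    // checksum download has already paid for itself.
    return 0;
}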

The only increase is the space needed to store the checksums in some
form of cache.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-07 Thread Jason Gunthorpe

On 7 Jan 2001, Goswin Brederlow wrote:

 Actually the load should drop, provided the following feature
 add-ons:
 
 1. cached checksums and pulling instead of pushing
 2. client-side unpacking of compressed streams

Apparently reversing the direction of rsync infringes on a patent.

Plus there is the simple matter that the file listing and file download
features cannot be separated. Doing a listing of all files on our site is
non-trivial.

Once you strip all that out you have rproxy.

Reversed checksums (with a detached checksum file) is something someone
should implement for debian-cd. You could even quite reasonably do that
totally using HTTP and not run the risk of rsync load at all.

Such a system for Package files would also be acceptable I think.

Jason




Re: package pool and big Packages.gz file

2001-01-07 Thread Brian May
 Goswin == Goswin Brederlow [EMAIL PROTECTED] writes:

Goswin Actually the load should drop, provided the following
Goswin feature add-ons:

How does rproxy cope? Does it require a high load on the server?  I
suspect not, but need to check on this.

I think of rsync as just being a quick hack, rproxy is the (long-term)
direction we should be headed. rproxy is the same as rsync, but based
on the HTTP protocol, so it should be possible (in theory) to
integrate into programs like Squid, Apache and Mozilla (or so the
authors claim).
-- 
Brian May [EMAIL PROTECTED]




Re: package pool and big Packages.gz file

2001-01-07 Thread Goswin Brederlow
   == Brian May [EMAIL PROTECTED] writes:

 Goswin == Goswin Brederlow [EMAIL PROTECTED] writes:
Goswin Actually the load should drop, provided the following
Goswin feature add-ons:

  How does rproxy cope? Does it require a high load on the
  server?  I suspect not, but need to check on this.

  I think of rsync as just being a quick hack, rproxy is the
  (long-term) direction we should be headed. rproxy is the same
  as rsync, but based on the HTTP protocol, so it should be
  possible (in theory) to integrate into programs like Squid,
  Apache and Mozilla (or so the authors claim).

URL?

Sounds more like encapsulation of an rsync-like protocol in html,
but it's hard to tell from the few words you write. Could be interesting
though.

Anyway, it will not resolve the problem with compressed files if it's
just like rsync.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-07 Thread Brian May
 Goswin == Goswin Brederlow [EMAIL PROTECTED] writes:

Goswin URL?

http://linuxcare.com.au/projects/rproxy/

The documentation seems very comprehensive, but I am not sure when it
was last updated.

Goswin Sounds more like encapsulation of an rsync-like
Goswin protocol in html, but it's hard to tell from the few words
Goswin you write. Could be interesting though.

errr... I think you mean http, not html.

Goswin Anyway, it will not resolve the problem with compressed
Goswin files if it's just like rsync.

True; however, I was thinking more in the context of Packages
and other uncompressed files. It would be good though if these
issues regarding deb packages could be resolved.

Then again, perhaps I was a bit blunt with that statement on rsync;
rsync will still always have its uses, e.g. copying private data.
-- 
Brian May [EMAIL PROTECTED]




Re: package pool and big Packages.gz file

2001-01-07 Thread Goswin Brederlow
   == Jason Gunthorpe [EMAIL PROTECTED] writes:

  On 7 Jan 2001, Goswin Brederlow wrote:

 Actually the load should drop, provided the following feature
 add-ons:
 
 1. cached checksums and pulling instead of pushing
 2. client-side unpacking of compressed streams

  Apparently reversing the direction of rsync infringes on a
  patent.

When I rsync a file, rsync starts ssh to connect to the remote host
and starts rsync there in the reverse mode.

You say that the receiving end is violating a patent and the sending
end not?

Hmm, which patent anyway?

So I have to fork a rsync-non-US because of a patent?

  Plus there is the simple matter that the file listing and file
  download features cannot be seperated. Doing a listing of all
  files on our site is non-trivial.

I don't need to get a file listing, apt-get tells me the name. :)
Also I can do rsync -v host::dir and parse the output to grab the
actual files with another rsync. So file listing and downloading are
absolutely separable.

Doing a listing of all files probably results in a timeout. The
hard drives are too slow.

  Once you strip all that out you have rproxy.

  Reversed checksums (with a detached checksum file) is something
  someone should implement for debian-cd. You could even quite
  reasonably do that totally using HTTP and not run the risk of
  rsync load at all.

At the moment the client calculates one rolling checksum and md5sum per
block.

The server, on the other hand, calculates the rolling checksum per
byte and for each hit it calculates an md5sum for one block.

Given a 650MB file, I don't want to know the hit/miss ratios for the
rolling checksum and the md5sum. Must be really bad.

The smaller the file, the fewer wrong md5sums need to be calculated.

  Such a system for Package files would also be acceptable I
  think.

For Packages files even cvs -z9 would be fine. They are comparatively
small next to the rest of the load, I would think.

But I, just as you do, think that it would be a really good idea to
have precalculated rolling checksums and md5sums, maybe even for
various block sizes, and let the client do the time-consuming guessing
and calculating. That would keep rsync from reading every file served
twice, as it does now when the files are dissimilar.
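
A sketch of that precalculation step, writing one (weak, MD4) pair per
fixed-size block to a detached file. MD4 is what rsync uses; OpenSSL's MD4()
is borrowed here for brevity (it exists, though deprecated in current
OpenSSL), and the weak sum is the same simplified one as in the client
sketch earlier in the thread:

#include <cstdint>
#include <cstdio>
#include <openssl/md4.h>

static uint32_t weakSum(const unsigned char *p, size_t n) {
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; ++i) { s1 += p[i]; s2 += s1; }
    return (s1 & 0xFFFF) | (s2 << 16);
}

int main(int argc, char **argv) {
    if (argc != 3) return 1;          // usage: mksums <file> <file.sums>
    FILE *in = std::fopen(argv[1], "rb");
    FILE *out = std::fopen(argv[2], "wb");
    if (!in || !out) return 1;

    unsigned char block[1024], md[MD4_DIGEST_LENGTH];
    size_t n;
    while ((n = std::fread(block, 1, sizeof block, in)) > 0) {
        uint32_t w = weakSum(block, n);
        MD4(block, n, md);                   // strong checksum per block
        std::fwrite(&w, 4, 1, out);          // a real format would pin
        std::fwrite(md, sizeof md, 1, out);  // the byte order here
    }
    std::fclose(in);
    std::fclose(out);
    return 0;
}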

May the Source be with you.
Goswin




Re: package pool and big Packages.gz file

2001-01-07 Thread Jason Gunthorpe

On 8 Jan 2001, Goswin Brederlow wrote:

   Apparently reversing the direction of rsync infringes on a
   patent.
 
 When I rsync a file, rsync starts ssh to connect to the remote host
 and starts rsync there in the reverse mode.

Not really, you have to use quite a different set of operations to do it
one way vs the other. The core computation is the same, mind you.
 
 Hmm, which patent anyway?

Don't know, I never heard back from Tridge on that.
 
 I don't need to get a file listing, apt-get tells me the name. :)

You have missed the point: the presence of the ability to do file listings
prevents the adoption of rsync servers with high connection limits.

   Reversed checksums (with a detached checksum file) is something
   someone should implement for debian-cd. You could even quite
   reasonably do that totally using HTTP and not run the risk of
   rsync load at all.
 
 At the moment the client calculates one rolling checksum and md5sum per
 block.

I know how rsync works, and it uses MD4.

 Given a 650MB file, I don't want to know the hit/miss ratios for the
 rolling checksum and the md5sum. Must be really bad.

The ratio is supposed to only scale with block size, so it should be the
same for big files and small files (ignoring the increase in block size
with file size).  The amount of time expended doing this calculation is
not trivial however. 

For CD images the concern is of course available disk bandwidth, reversed
checksums eliminate that bottleneck.

Jason




Re: package pool and big Packages.gz file

2001-01-06 Thread Andrew Stribblehill
Quoting Goswin Brederlow [EMAIL PROTECTED]:
== Sami Haahtinen [EMAIL PROTECTED] writes:

   Or, can rsync sync binary files?
 
 Of course, but forget it with compressed data.

Doesn't gzip have a --rsync option, or somesuch? Apparently Andrew
Tridgell (Samba, Rsync) has a patch to do this, but I don't know
whether he passed it on to the gzip maintainers.

(Apparently he's working on a --fuzzy flag for matching rsyncs
between, say foo-1.0.deb and foo-1.1.deb. He says it should be
called the --debian flag.)
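
The idea behind such a patch, sketched with an illustrative window size and
mask (the real change sits inside gzip's deflate loop): keep a rolling sum
of the last few KB of input and restart the compressor whenever the sum hits
a magic value. Boundaries then depend only on nearby content, so an edit
perturbs just the chunk it lands in and the two streams resynchronize at the
next boundary.

#include <cstddef>
#include <cstdint>

static const size_t WIN = 4096;       // assumed window size
static const uint32_t MASK = 4095;    // boundary when (sum & MASK) == 0

struct BoundaryFinder {
    uint32_t sum;
    uint8_t  win[WIN];
    size_t   pos;
    BoundaryFinder() : sum(0), pos(0) {
        for (size_t i = 0; i < WIN; ++i) win[i] = 0;
    }

    // Feed one input byte; returns true when the compressor should be
    // flushed and reset, making chunk boundaries content-defined.
    bool feed(uint8_t byte) {
        sum += byte;
        sum -= win[pos];              // drop the byte leaving the window
        win[pos] = byte;
        pos = (pos + 1) % WIN;
        return (sum & MASK) == 0;
    }
};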

Cheerio,

Andrew Stribblehill
Systems programmer, IT Service, University of Durham, England




Re: package pool and big Packages.gz file

2001-01-06 Thread Sam Couter
Andrew Stribblehill [EMAIL PROTECTED] wrote:
 
 Doesn't gzip have a --rsync option, or somesuch? Apparently Andrew
 Tridgell (Samba, Rsync) has a patch to do this, but I don't know
 whether he passed it on to the gzip maintainers.

I like the idea of having plugins for rsync to handle different kinds of
data. So the gzip plugin will decompress the data, and the rsync algorithm
can work on the decompressed data. Much better.

 (Apparently he's working on a --fuzzy flag for matching rsyncs
 between, say foo-1.0.deb and foo-1.1.deb. He says it should be
 called the --debian flag.)

A deb plugin would be better. :)
-- 
Sam Couter  |   Internet Engineer   |   http://www.topic.com.au/
[EMAIL PROTECTED]|   tSA Consulting  |
OpenPGP key available on key servers
OpenPGP fingerprint:  A46B 9BB5 3148 7BEA 1F05  5BD5 8530 03AE DE89 C75C




Re: package pool and big Packages.gz file

2001-01-06 Thread Brian May
 Sam == Sam Couter [EMAIL PROTECTED] writes:

Sam Andrew Stribblehill [EMAIL PROTECTED] wrote:
  Doesn't gzip have a --rsync option, or somesuch? Apparently
 Andrew Tridgell (Samba, Rsync) has a patch to do this, but I
 don't know whether he passed it on to the gzip maintainers.

Sam I like the idea of having plugins for rsync to handle
Sam different kinds of data. So the gzip plugin will decompress
Sam the data, and the rsync algorithm can work on the
Sam decompressed data. Much better.

 (Apparently he's working on a --fuzzy flag for matching rsyncs
 between, say foo-1.0.deb and foo-1.1.deb. He says it should be
 called the --debian flag.)

Sam A deb plugin would be better. :) 

Sounds like a good idea to me.

Although don't get the two issues confused:

1. difference in filename.
2. format of file.

Although, I guess in most cases the two will always be linked (e.g.
choosing the best filename really depends on the format, as ideally
the most similar *.deb package should be used, and this means
implementing Debian rules for comparing versions), will this always be
the case?
-- 
Brian May [EMAIL PROTECTED]




Re: package pool and big Packages.gz file

2001-01-06 Thread Drake Diedrich
On Sun, Jan 07, 2001 at 11:43:39AM +1100, Sam Couter wrote:
 
 A deb plugin would be better. :)

   One problem with a deb plugin is that .debs are signed in compressed
form.  gzip isn't guaranteed to produce the same compressed file from
identical uncompressed files on different architectures and releases.
Varying the compression flags can also change the compressed file.

-Drake




Re: package pool and big Packages.gz file

2001-01-06 Thread Matt Zimmerman
On Sun, Jan 07, 2001 at 12:53:14PM +1100, Drake Diedrich wrote:

 On Sun, Jan 07, 2001 at 11:43:39AM +1100, Sam Couter wrote:
  
  A deb plugin would be better. :)
 
One problem with a deb plugin is that .debs are signed in compressed
 form.  gzip isn't guaranteed to produce the same compressed file from
 identical uncompressed files on different architectures and releases.
 Varying the compression flags can also change the compressed file.

It shouldn't be a problem to tweak things so that the resulting files end up
exactly the same.  This is rsync, after all, and that is the program's goal.
For instance, uncompressed blocks could be used for comparison, but the gzip
header copied exactly.
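
For reference, the fixed gzip header fields in question (per RFC 1952);
MTIME and OS are exactly the bytes that vary across machines and runs even
when the compressed payload itself could be reproduced bit-for-bit:

struct GzipHeader {              // the fixed 10-byte prefix of a member
    unsigned char id1, id2;      // 0x1f, 0x8b: gzip magic
    unsigned char cm;            // 8 = deflate
    unsigned char flg;           // flags for optional fields (FNAME, ...)
    unsigned char mtime[4];      // modification time: differs per build
    unsigned char xfl;           // extra flags (compression-level hint)
    unsigned char os;            // 3 = Unix, ...: differs per machine
};
// Hence the suggestion above: match on uncompressed blocks, but carry
// these header bytes over verbatim so the rebuilt .deb hashes identically.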

-- 
 - mdz




Re: [FINAL, for now ;-)] (Was: Re: package pool and big Packages.gz file)

2001-01-05 Thread Joey Hess
If you don't like large Packages files, implement an rsync transfer
method for them.

-- 
see shy jo




Re: package pool and big Packages.gz file

2001-01-05 Thread Jason Gunthorpe

On 5 Jan 2001, Goswin Brederlow wrote:

 If that suits your needs, feel free to write a bug report on apt about
 this.

Yes, I enjoy closing such bug reports with a terse response.

Hint: Read the bug page for APT to discover why!

Jason




Re: package pool and big Packages.gz file

2001-01-05 Thread Sami Haahtinen
On Fri, Jan 05, 2001 at 03:05:03AM +0100, Goswin Brederlow wrote:
 What's the problem with a big Packages file?
 
 If you don't want to download it again and again just because of small
 changes I have a better solution for you:
 
 rsync
 
 apt-get update could rsync all Packages files (yes, not the .gz ones)
 and thereby download only changed parts. On uncompressed files rsync
 is very effective and the changes can be compressed for the actual
 transfer. So on update you will practically get a diff.gz against your old
 Packages file.


this would bring us to apt renaming the old deb (if there is one) to the
name of the new package and rsyncing those. and we would save some time once
again... 

Or, can rsync sync binary files?

hmm.. this sounds like something worth implementing..

-- 
every nerd knows how to enjoy the little things of life,
like: rm -rf windows




Re: package pool and big Packages.gz file

2001-01-05 Thread Sami Haahtinen
On Fri, Jan 05, 2001 at 05:46:35AM +0800, zhaoway wrote:
  how about diffs between dinstall runs?..
 
 sorry, but i don't understand here. dinstall is a server-side thing here?

yes, when dinstall runs it would copy the old packages file to, let's say,
packages.old and write its changes to the new file.. after it's done it would
diff packages.old and packages...

  packages-010102-010103.gz
  packages-010103-010104.gz
  packages.gz
  
  apt would download the changes after the last update, and merge them into
  the package file; if the file gets corrupted, it would attempt to do a full
  update.
 
 at the top, some pkg-gz-debs list packages at the leaves of the dependency tree,
 and each pkg-gz-deb won't get bigger than 100k, and each of them depends
 on some more basic pkg-gz-deb.
 
 below that, some other pkg-gz-debs, like the base sub-system.
 
 this way, when a user installs xdm,
 apt-get first installs the pkg-gz-deb which lists xdm, then, as dependency checking goes,
 it will install base-a-pkg-gz-deb etc., then xdm gets installed; this way,
 all of xdm's dependencies will be fulfilled with the newest information available.
 
 and you can see this will surely ease up the bandwidth. (when updating gcc, i
 won't get additional bits of Packages.gz about xdm, xfree, etc.)

wouldn't this make it a BIT too difficult?

-- 
every nerd knows how to enjoy the little things of life,
like: rm -rf windows




Re: package pool and big Packages.gz file

2001-01-05 Thread Wichert Akkerman
Previously Sami Haahtinen wrote:
 this would bring us to apt renaming the old deb (if there is one) to the
 name of the new package and rsyncing those. and we would save some time once
 again... 

There is a --fuzzy-names patch for rsync that makes rsync do that itself.

 Or, can rsync sync binary files?

Yes.

 hmm.. this sounds like something worth implementing..

Don't bother, it's been done already. Ask Rusty for details.

Wichert.

-- 
   
 / Generally uninteresting signature - ignore at your convenience  \
| [EMAIL PROTECTED]  http://www.liacs.nl/~wichert/ |
| 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0  2805 3CB8 9250 2FA3 BC2D |




Re: package pool and big Packages.gz file

2001-01-05 Thread Goswin Brederlow
   == Sami Haahtinen [EMAIL PROTECTED] writes:

  On Fri, Jan 05, 2001 at 03:05:03AM +0100, Goswin Brederlow
  wrote:
 What's the problem with a big Packages file?
 
 If you don't want to download it again and again just because
 of small changes I have a better solution for you:
 
 rsync
 
 apt-get update could rsync all Packages files (yes, not the .gz
 ones) and thereby download only changed parts. On uncompressed
 files rsync is very effective and the changes can be compressed
 for the actual transfer. So on update you will practically get a
 diff.gz against your old Packages file.


  this would bring us to apt renaming the old deb (if there is
  one) to the name of the new package and rsyncing those. and we
  would save some time once again...

That's what the debian-mirror script does (about half of the
script is just for that). It also uses old tar.gz, orig.tar.gz, diff.gz
and dsc files.

  Or, can rsync sync binary files?

Of course, but forget it with compressed data.

  hmm.. this sounds like something worth implementing..

I'm currently discussing some changes to the rsync client with some
people from the rsync ML which would uncompress compressed data on the
client side (no changes to the server) and rsync that. It sounds like
it wouldn't improve anything, but when you read the full description
it actually does.

Before that, rsyncing new debs with old ones hardly ever saves
anything. Where it helps is with big packages like xfree, where several
packages are identical between releases.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-05 Thread Goswin Brederlow
   == Jason Gunthorpe [EMAIL PROTECTED] writes:

  On 5 Jan 2001, Goswin Brederlow wrote:

 If that suits your needs, feel free to write a bug report on apt
 about this.

  Yes, I enjoy closing such bug reports with a terse response.

  Hint: Read the bug page for APT to discover why!

  Jason

I couldn't find any existing bug report concerning rsync support for
apt-get in the long list of bugs.

So why would you close such a wishlist bug report?
And why with a terse response?

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-05 Thread Junichi Uekawa
In 05 Jan 2001 19:51:08 +0100 Goswin Brederlow [EMAIL PROTECTED] cum veritate 
scripsit :

Hello,

 I'm currently discussing some changes to the rsync client with some
 people from the rsync ML which would uncompress compressed data on the
 client side (no changes to the server) and rsync that. It sounds like
 it wouldn't improve anything, but when you read the full description
 it actually does.
 
 Before that, rsyncing new debs with old ones hardly ever saves
 anything. Where it helps is with big packages like xfree, where several
 packages are identical between releases.

No offence, but wouldn't it be a tad difficult to play around with it, 
since deb packages are not just gzipped archives, but ar archives containing 
gzipped tar archives?


regards,
junichi

--
University: [EMAIL PROTECTED]Netfort: [EMAIL PROTECTED]
dancer, a.k.a. Junichi Uekawa   http://www.netfort.gr.jp/~dancer
 Dept. of Knowledge Engineering and Computer Science, Doshisha University.
... Long Live Free Software, LIBERTAS OMNI VINCIT.




Re: package pool and big Packages.gz file

2001-01-05 Thread Goswin Brederlow
   == Junichi Uekawa [EMAIL PROTECTED] writes:

  In 05 Jan 2001 19:51:08 +0100 Goswin Brederlow
  [EMAIL PROTECTED] cum veritate
  scripsit : Hello,

 I'm currently discussing some changes to the rsync client with
 some people from the rsync ML which would uncompress compressed
 data on the client side (no changes to the server) and rsync
 that. It sounds like it wouldn't improve anything, but when you
 read the full description it actually does.
 
 Before that, rsyncing new debs with old ones hardly ever saves
 anything. Where it helps is with big packages like xfree, where
 several packages are identical between releases.

  No offence, but wouldn't it be a tad difficult to play around
  with it, since deb packages are not just gzipped archives, but
  ar archives containing gzipped tar archives?

Yes and no.

The problem is that deb files are special ar archives, so you can't
just download the files and ar them together.

One way would be to download the files in the ar, ar them together and
rsync again. Since ar does not change the data in it, the deb has the
same data, just at different places, and rsync handles that well.
This would be possible, but would require server changes.

The trick is to know a bit about ar, but not too much. Just rsync the
header of the ar file till the first real file in it, then rsync that
file recursively, then a bit more ar file data and another file and so
on. Knowing where subfiles start and how long they are is enough.
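
A sketch of that minimal ar knowledge (the .deb container is a standard ar
archive; the program name is made up): walk the 60-byte member headers and
report each sub-file's offset and size, which is all a layered rsync would
need to know.

#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char **argv) {
    if (argc != 2) return 1;   // usage: arls <package.deb>
    FILE *f = std::fopen(argv[1], "rb");
    if (!f) return 1;

    char magic[8];
    if (std::fread(magic, 1, 8, f) != 8 ||
        std::memcmp(magic, "!<arch>\n", 8) != 0)
        return 1;              // not an ar archive

    char hdr[60];              // fixed-size ar member header
    while (std::fread(hdr, 1, 60, f) == 60) {
        char name[17] = {0}, sizestr[11] = {0};
        std::memcpy(name, hdr, 16);          // bytes 0-15: member name
        std::memcpy(sizestr, hdr + 48, 10);  // bytes 48-57: size, decimal
        long size = std::strtol(sizestr, nullptr, 10);
        std::printf("%-16s offset %ld, %ld bytes\n",
                    name, std::ftell(f), size);
        std::fseek(f, size + (size & 1), SEEK_CUR);  // 2-byte alignment
    }
    std::fclose(f);
    return 0;
}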

The question will be how much intelligence to teach rsync. I like
rsync stupid but still intelligent enough to do the job.

It's pretty tricky, so it will be some time before anything in that
direction is usable.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-05 Thread Matthijs Melchior
Jason Gunthorpe wrote:
 

 Hint: Read the bug page for APT to discover why!
 


Looking through the apt bugs, I saw this one, rejected:

Bug#77054: wish: show current-upgraded versions on upgrade -u


My private solution to this is the following patch to `apt-get':


--- algorithms.cc-ORG   Sat May 13 06:08:43 2000
+++ algorithms.cc   Sat Sep  9 22:11:19 2000
@@ -47,9 +47,13 @@
 {
    // Adapt the iterator
    PkgIterator Pkg = Sim.FindPkg(iPkg.Name());
+   const char *oldver = Pkg->CurrentVer ? Pkg.CurrentVer().VerStr() : "-";
+   const char *newver = Pkg->VersionList ? Pkg.VersionList().VerStr() : "-";
+
    Flags[Pkg->ID] = 1;
 
-   cout << "Inst " << Pkg.Name();
+   cout << "Inst " << Pkg.Name() << " (" << oldver << " " << newver << ")";
+
    Sim.MarkInstall(Pkg,false);
 
    // Look for broken conflicts+predepends.


This informs me about versions when doing apt-get --no-act install package.

I like this very much, and would appreciate this going into the official
apt-get command.


-- 
Thanks,
  -o)
Matthijs Melchior   Maarssen  /\\
mailto:[EMAIL PROTECTED]  +31 346 570616Netherlands _\_v
 




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
[read my previous semi-proposal]

this has some more benefits,

1) package maintainers could upload (to the pool) at whatever
frequency they like.

2) the release is separated from the package pool, which is a storage
system; the release is a QA system.

3) releases could be managed through the BTS on specific package-gz.debs.
that surely would put much more burden on the BTS, ;-)

4) if apt-get could deal with it well, i hope all of the sub-mirroring issues
will go away easily. just apt-get install some-rel-packages-gz, then
apt-get mirror (just like download and move ...)

my 2c more ;-)

zw




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Fri, Jan 05, 2001 at 03:17:30AM +0800, zhaoway wrote:
 [read my previous semi-proposal]
 
 this has some more benefits,
 
 1) package maintainer could upload (to pool) in whatever
 frequency they like.

in an ideal world, developers should upload to ''xxx-auto-builder'' ;-)

i'm turning out to be crappy now. ;-)

bye,




Re: package pool and big Packages.gz file

2001-01-04 Thread Sami Haahtinen
On Fri, Jan 05, 2001 at 03:02:15AM +0800, zhaoway wrote:
 my proposal to resolve big Packages.gz is through package
 pool system.
 
 add 36 or so new debian package, namely,
 
 [a-z0-9]-packages-gz_date_all.deb
 
 contents of each is quite obvious. ;-)
 and a virtual unstable-packages-gz depends on all of them. finished.
 
 apt-get update should deal with it

how about diffs between dinstall runs?..

packages-010102-010103.gz
packages-010103-010104.gz
packages.gz

apt would download the changes after the last update, and merge them into the
package file; if the file gets corrupted, it would attempt to do a full update.

This wouldn't be a big difference in the load that the master-ftp has to
handle, at least when a maximum of some 7 of these are stored.

Regards, Sami Haahtinen

-- 
every nerd knows how to enjoy the little things of life,
like: rm -rf windows




Re: package pool and big Packages.gz file

2001-01-04 Thread Vince Mulhollon

The only other possibility not yet proposed (?) would be to split the
packages file by section.

base-packages
games-packages
x11-packages
net-packages

Then a server that just doesn't do x11 or doesn't do games has no need to
keep up with available x11 or games packages.





On 01/04/2001 03:01 PM, Sami Haahtinen wrote to debian-devel@lists.debian.org:

On Fri, Jan 05, 2001 at 03:02:15AM +0800, zhaoway wrote:
 my proposal to resolve big Packages.gz is through package
 pool system.

 add 36 or so new debian package, namely,

 [a-z0-9]-packages-gz_date_all.deb

 contents of each is quite obvious. ;-)
 and a virtual unstable-packages-gz depends on all of them. finished.

 apt-get update should deal with it

how about diffs between dinstall runs?..

packages-010102-010103.gz
packages-010103-010104.gz
packages.gz

apt would download the changes after the last update, and merge them into the
package file; if the file gets corrupted, it would attempt to do a full
update.

This wouldn't be a big difference in the load that the master-ftp has to
handle, at least when a maximum of some 7 of these are stored.

Regards, Sami Haahtinen

--
every nerd knows how to enjoy the little things of life,
like: rm -rf windows










Re: package pool and big Packages.gz file

2001-01-04 Thread Sami Haahtinen
On Thu, Jan 04, 2001 at 03:07:00PM -0600, Vince Mulhollon wrote:
 
 The only other possibility not yet proposed (?) would be to split the
 packages file by section.
 
 base-packages
 games-packages
 x11-packages
 net-packages
 
 Then a server that just doesn't do x11 or doesn't do games has no need to
 keep up with available x11 or games packages.

how would the package manager (namely apt) know which ones you need? even if
you don't have X11 installed (and apt assumes you don't need the X11 Packages
file), that doesn't mean that you wouldn't want to install the x11 Packages file.

the same goes for net (which is a weird definition in any case) and games; base is
the only reasonable one. But without the others it's not needed either...

-- 
every nerd knows how to enjoy the little things of life,
like: rm -rf windows




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Thu, Jan 04, 2001 at 11:01:15PM +0200, Sami Haahtinen wrote:
 On Fri, Jan 05, 2001 at 03:02:15AM +0800, zhaoway wrote:
  my proposal to resolve big Packages.gz is through package
  pool system.
  
  add 36 or so new debian package, namely,
  
  [a-z0-9]-packages-gz_date_all.deb
  
  contents of each is quite obvious. ;-)
  and a virtual unstable-packages-gz depends on all of them. finished.
  
  apt-get update should deal with it
 
 how about diffs between dinstall runs?..

sorry, but i don't understand here. dinstall is a server-side thing here?

 packages-010102-010103.gz
 packages-010103-010104.gz
 packages.gz
 
 apt would download the changes after the last update, and merge them into the
 package file; if the file gets corrupted, it would attempt to do a full
 update.
 
 This wouldn't be a big difference in the load that the master-ftp has to
 handle, at least when a maximum of some 7 of these are stored.

okay, try to group packages according to dependency:

at the top, some pkg-gz-debs list packages at the leaves of the dependency tree,
and each pkg-gz-deb won't get bigger than 100k, and each of them depends
on some more basic pkg-gz-deb.

below that, some other pkg-gz-debs, like the base sub-system.

this way, when a user installs xdm,
apt-get first installs the pkg-gz-deb which lists xdm, then, as dependency checking goes,
it will install base-a-pkg-gz-deb etc., then xdm gets installed; this way,
all of xdm's dependencies will be fulfilled with the newest information available.

and you can see this will surely ease up the bandwidth. (when updating gcc, i
won't get additional bits of Packages.gz about xdm, xfree, etc.)

regards,

zw




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Thu, Jan 04, 2001 at 11:19:59PM +0200, Sami Haahtinen wrote:
 how would the package manager (namely apt) know which ones you need? even if
 you don't have X11 installed (and apt assumes you don't need the X11 Packages
 file), that doesn't mean that you wouldn't want to install the x11 Packages file.

another solution is to let every single deb provide its.pkg-gz

then, apt-get update will do nothing,
apt-get install some.deb will first download some.pkg-gz, then check its
dependencies,
then grab them.pkg-gz all, then install.

and a virtual release-pkgs-gz.deb will depend on some selected part of those
any.pkg-gz to make up a release.

then katie will remove a package only when no release-pkgs-gz.deb (or
testing, or whatever) depends on its.pkg-gz

regards,
zw




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Fri, Jan 05, 2001 at 06:07:20AM +0800, zhaoway wrote:
 another solution is to let every single deb provide its.pkg-gz
 
 then, apt-get update will do nothing,
 apt-get install some.deb will first download some.pkg-gz, then check its
 dependencies,
 then grab them.pkg-gz all, then install.

that is a minimum. isn't it? ;)
and then we will need some ``apt-get info pkg'' hehe..

 and a virtual release-pkgs-gz.deb will depend on some selected part of those
 any.pkg-gz to make up a release.

say one release contains 2000 pkgz, each pkg name is 10 chars; then the whole
information is a little more than 20k, compared with the more than 1M of today.

and you could have base-3.3-release, and gnome-4.4-release which depends on
base-3.2-release and x-5.6-release. and chinese-2.0-release etc. ...

 then katie will remove a package only when no release-pkgs-gz.deb (or
 testing, or whatever) depends on its.pkg-gz

zw




Re: package pool and big Packages.gz file

2001-01-04 Thread Petr Cech
On Fri, Jan 05, 2001 at 06:07:20AM +0800 , zhaoway wrote:
 On Thu, Jan 04, 2001 at 11:19:59PM +0200, Sami Haahtinen wrote:
  how would the package manager (namely apt) know which ones you need? even
  if you don't have X11 installed (and apt assumes you don't need the X11
  Packages file), that doesn't mean that you wouldn't want to install the
  x11 Packages file.
 
 another solution is to let every single deb provide its.pkg-gz
 
 then, apt-get update will do nothing,
 apt-get install some.deb will first download some.pkg-gz, then check its
 dependencies,
 then grab them.pkg-gz all, then install.

but it will immensely restrict its view of dependencies - think about
virtual packages. This is really not the way. Maybe splitting it as in pool/,
so you only download the changed part of the whole thing. But that's about it.
Maybe you can leave some part out, but ..

Petr Cech
-- 
Debian GNU/Linux maintainer - www.debian.{org,cz}
   [EMAIL PROTECTED]

* Joy notes some people think Unix is a misspelling of Unics which is a 
misspelling of Emacs :)




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
[quote myself, ;-) this is semi-final now ;-)]

another solution is to let every single deb provide its.pkg-gz

then, apt-get update will do nothing,
apt-get install some.deb will first download some.pkg-gz, then check its
dependencies,
then grab them.pkg-gz all, then install.

that is a minimum. isn't it? ;)
and then we will need some ``apt-get info pkg'' hehe..

and a virtual release-pkgs-gz.deb will depend on some selected part of those
any.pkg-gz to make up a release.

say one release contains 2000 pkgz, each pkg name is 10 chars; then the whole
information is a little more than 20k, compared with the more than 1M of today.

and you could still do ``apt-get dist-upgrade'': just first install
release-pkgs-gz.deb and then go on..., OR, first get a list of all installed debs
and then update each of them. [some more thoughts here..., later]

and you could have base-3.3-release, and gnome-4.4-release which depends on
base-3.2-release and x-5.6-release. and chinese-2.0-release etc. ...

then katie will remove a package only when no release-pkgs-gz.deb (or
testing, or whatever) depends on its.pkg-gz

zw




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Thu, Jan 04, 2001 at 11:19:25PM +0100, Petr Cech wrote:
 On Fri, Jan 05, 2001 at 06:07:20AM +0800 , zhaoway wrote:
  then, apt-get update will do nothing,
  apt-get install some.deb will first download some.pkg-gz, then check its 
  dependency,
  then grab them.pkg-gz all, then install.
 
 but it will immensly restrict it's view on dependencies - think about
 virtual packages. This is really not the way. Maybe spliting by as in pool/
 so you only download changed part of the whole thing. But that's about it.
 Maybe you can leave some part out, but ..

virtual packages are weird here. ;-)
but could be resolved by a some-virtual.pkg-gz ;-)

and as for the tree view of the dependency tree, like in console-apt
[see my other semi-final mail.. ;-)]: in general, if you want a tree view
of the whole tree, you will need to download the whole tree anyway, and my
approach won't prevent you from doing that. ;-)

kinda regards,
zw




[FINAL, for now ;-)] (Was: Re: package pool and big Packages.gz file)

2001-01-04 Thread zhaoway
final thoughts ;-)


On bigger and bigger Packages.gz file, a try


The directory structure looks roughly like this:

debian/dists/woody/main/binary-all/Packages.deb

debian/pool/main/a/abba/abba_1989.orig.tar.gz
                        abba_1989-12.diff.gz
                        abba_1989-12.dsc
                        abba_1989-12_all.deb
                        abba_1989-12_all.pkg

debian/pool/main/r/rel-chinese/rel-music_0.9_all.pkg
                               rel-music_0.9_all.deb
debian/pool/main/r/rel-base/rel-base_200_all.pkg
                            rel-base_200_all.pkg


The contents of rel-music_0.9_all.pkg are as follows. rel-base
or even rel-woody is just much more complicated. Hope so. rel-chinese.deb
is nearly an empty package.

Package: rel-music
Priority: optional
Section: misc
Installed-Size: 12
Maintainer: Anthony and Cleopatra
Architecture: all
Source: rel-chinese
Version: 0.9
Depends: rel-base (>= 200), abba (>= 1989-12), beatles (>= 1979-100),
 garbage (>= 1998-7), wearied-ear (>= 2.1)
Provides: music | abba | beatles
Filename: debian/pool/main/r/rel-chinese/rel-music_0.9_all.deb
Size: 3492
MD5sum: c8c730ea650cf14638d83d6bb7707cdb
Description: Simplified music environment
 This 'task package' installs programs, data files, fonts, and
 documentation that makes it easier to use Debian for
 Simplified music related operations. (Surprise, surprise, garbage
 didn't provide music!)


Note, music is a virtual package provided by abba and beatles.

The contents of abba_1989-12_all.pkg are as follows.

Package: abba
Priority: optional
Section: sound
Installed-Size: 140
Maintainer: Old Man Billy
Architecture: all
Version: 1998-12
Replaces: beatles
Provides: music
Depends: wearied-ear (>= 2.0)
Filename: pool/main/a/abba/abba_1989-12_all.deb
Size: 33256
MD5sum: e07899b62b7ad12c545e9998adb7c8d7
Description: A Swedish Music Band
 ABBA is popular in 1980's in last millenium. Don't be confused by ABBA
 and ADDA which is a heavy metal band.


Here, music is a virtual package provided by packages abba and beatles.


Let's simulate some typical scenarios here.

1) apt-get update

There are roughly two purposes for this action. One is to get an
overview, to ease up further processing like virtual packages; another
purpose is to install a specific package or do a dist-upgrade.

For the second purpose, apt-get here will do nothing. (See below.)

For the first purpose, apt-get will have to download and parse the
current distribution's .pkg file according to user configuration.
Say, download rel-music, and then see that the virtual package music
is provided by abba and beatles.

So, generally, ``apt-get update'' will deal with rel-some__all.pkg
to get all of the overall information it will need further on.

Then, where does the rel-some__all.pkg get its information? We don't
want the release manager to track down all of this information. So, where's
katie? ;-) I think the trade-off is worth it (indeed, only katie gets to be a
little more complicated) considering the scalability being gained. Read on.


2) apt-get install abba

apt-get will first parse the previously downloaded rel-music.pkg, and see that
abba is at version 1998-12 and depends on wearied-ear (>= 2.0), and wow!
rel-music happens to provide wearied-ear (>= 2.1), so that's okay. Then apt-get
goes on to download abba's .pkg and parses it, and so on.

When all required .pkg files are downloaded and parsed (an updated Packages.gz),
apt-get then goes on to download and install each of the debs.

(Maybe there will be more complicated issues, just let me know. See how it
goes. ;-)

Thus, minimum data downloaded. ;-)


3) apt-get dist-upgrade

I don't know the details, but I think it's not very complicated given the
above information. (All necessary things are there, aren't they? ;-)


4) Package uploads

The .pkg file is generated automatically. No extra burden on most of the
developers. And developers could upload just as frequently as they see
fit. ;-)

Katie will be a little ;-) more complicated.

Packages will get deleted from the package pool only when no rel-X
depends on them. rel-X are treated specially.

And some fine-tuned mirrors could be set up.

And release management could benefit from the Bug Tracking System and become
more flexible. IMHO. ;-)


Kind regards,
zw




Re: package pool and big Packages.gz file

2001-01-04 Thread Goswin Brederlow
   == zhaoway  [EMAIL PROTECTED] writes:

  hi, [i'm not sure if this has been resolved, lart me if you
  like.]

  my proposal to resolve big Packages.gz is through package pool
  system.


What's the problem with a big Packages file?

If you don't want to download it again and again just because of small
changes I have a better solution for you:

rsync

apt-get update could rsync all Packages files (yes, not the .gz ones)
and thereby download only changed parts. On uncompressed files rsync
is very effective and the changes can be compressed for the actual
transfer. So on update you will practically get a diff.gz against your old
Packages file.

If that suits your needs, feel free to write a bug report on apt about
this.

MfG
Goswin