Re: big Packages.gz file

2001-01-10 Thread Goswin Brederlow
Brian May [EMAIL PROTECTED] writes:

 zhaoway == zhaoway  [EMAIL PROTECTED] writes:
zhaoway This is only a small part of the whole story, IMHO. See
zhaoway my other email replying you. ;)

 Maybe there could be another version of Packages.gz without
 the extended descriptions -- I imagine they would take
 something like 33% of the Packages file, in line count at
 least.

zhaoway Exactly. DIFF or RSYNC method of APT (as Goswin pointed
zhaoway out), or just separate Descriptions out (as I pointed out
zhaoway and you got it too), nearly 66% of the bits are
zhaoway saved. But this is only a hack, albeit efficient.

  At the risk of getting flamed, I investigated the possibility
  of writing an apt-get method to support rsync. I would use this
  to access an already existing private mirror, and not the main
  Debian archive. Hence the server load issue is not a
  problem. The only problem I have is downloading several megs of
  index files every time I want to install a new package (often
  under 100kb) from unstable, over a volume charged 28.8 kbps PPP
  link, using apt-get[1].

I tried the same, but I used the copy method as a template, which was
rather bad. I should have used http as the starting point.

Can you send me your patch, please?

  I think (if I understand correctly) that I found three problems
  with the design of apt-get:

  1. It tries to download the compressed Packages file, and has
  no way to override it with the uncompressed file. I filed a bug
  report against apt-get on this, as I believe this will also be
  a problem with protocols like rproxy too.

  2. apt-get tries to be smart and passes the method a
  destination file name that is only a temporary file, and not
  the final file. Hence, rsync cannot make a comparison between
  local and remote versions of the file.

I wrote to the deity mailing list concerning those two problems, with two
possible solutions. So far the only answer I got was "NO, we don't
want rsync", after pressing the issue here on debian-devel.

  3. Instead, rsync creates its own temporary file while
  downloading, so apt-get cannot display the progress of the
  download operation because as far as it is concerned the
  destination file is still empty.

Hmm, isn't there an informational message you can output to hint at the
progress? We would have to patch rsync to generate that style of
progress output, or fork rsync, parse its output, and pass on altered
output.
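
Something like the following fork-and-parse sketch might do, assuming
rsync's --progress output; the command line and the rewritten format are
illustrative only, and apt's real method protocol is not shown:

#include <cstdio>

// Run rsync, read its --progress lines and re-emit them in our own
// format. Command and output format here are placeholders.
int main() {
    FILE *p = popen("rsync --progress host::mirror/Packages /tmp/Packages 2>&1", "r");
    if (!p) { perror("popen"); return 1; }
    char line[512];
    long bytes;
    while (fgets(line, sizeof line, p))
        // rsync's per-file progress lines begin with the byte count
        if (sscanf(line, " %ld", &bytes) == 1)
            fprintf(stderr, "progress: %ld bytes\n", bytes);
    return pclose(p);
}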

  I think the only way to fix both 2 and 3 is to allow some
  coordination between apt-get and rsync where to put the
  temporary file and where to find the previous version of the
  file.

Doing some more thinking, I like the second solution to the problem
more and more:

1. Include a template (some file that apt-get thinks matches best) in
the fetch request. The rsync method can then copy that file to the
destination and rsync on it. This would be the uncompressed Packages
file, a previous deb, or the old source.

2. Return whether the file is compressed or not simply by passing
back the destination filename with the appropriate extension (.gz), so
the destination filename is altered to reflect the file format.

MfG
Goswin




Re: big Packages.gz file

2001-01-09 Thread Hamish Moffatt
On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:
 A big package index IMHO is the current bottleneck of the Debian package system.

What is the real problem with the large package files? They take a long
time to download, but so do emacs and other bloatware.


Hamish
-- 
Hamish Moffatt VK3SB [EMAIL PROTECTED] [EMAIL PROTECTED]




Re: big Packages.gz file

2001-01-09 Thread Miles Bader
Hamish Moffatt [EMAIL PROTECTED] writes:
 What is the real problem with the large package files? They take a long
 time to download, but so do emacs and other bloatware.

Yeah, but how often do you download emacs?

The packages file gets downloaded _every single time_ you do an update,
and for those of us with a slow modem link, that really sucks.

-Miles
-- 
Love is a snowmobile racing across the tundra.  Suddenly it flips over,
pinning you underneath.  At night the ice weasels come.  --Nietzsche




Re: big Packages.gz file

2001-01-09 Thread Hamish Moffatt
On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:
 Hamish Moffatt [EMAIL PROTECTED] writes:
  What is the real problem with the large package files? They take a long
  time to download, but so do emacs and other bloatware.
 
 Yeah, but how often do you download emacs?

Never, I wouldn't touch that thing with a 40 foot barge pole!

 The packages file gets downloaded _every single time_ you do an update,
 and for those of us with a slow modem link, that really sucks.

True enough. I haven't really been following the discussion, to be honest.

Maybe there could be another version of Packages.gz without the
extended descriptions -- I imagine they would take something like
33% of the Packages file, in line count at least.


Hamish
-- 
Hamish Moffatt VK3SB [EMAIL PROTECTED] [EMAIL PROTECTED]




Re: big Packages.gz file

2001-01-09 Thread zhaoway
From: Hamish Moffatt [EMAIL PROTECTED]
Subject: Re: big Packages.gz file
Date: Tue, 9 Jan 2001 23:40:01 +1100

 On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:
  The packages file gets downloaded _every single time_ you do an update,
  and for those of us with a slow modem link, that really sucks.

This is only a small part of the whole story, IMHO. See my other email
replying to you. ;)

 Maybe there could be another version of Packages.gz without the
 extended descriptions -- I imagine they would take something like
 33% of the Packages file, in line count at least.

Exactly. DIFF or RSYNC method of APT (as Goswin pointed out), or just
separate Descriptions out (as I pointed out and you got it too):
nearly 66% of the bits are saved. But this is only a hack, albeit an
efficient one.

Because this does not solve the problem of the package pool within the
package pool system; it works on the protocol and client tool side.

1) AIUI, the package pool should be a storage system, which should have
a smart algorithm for deleting packages that no distribution or other
package references. (Garbage collection by reference counts; see the
sketch after this list.)

2) A distribution, setting aside the work of our honoured release
manager, should be a partial package index listing, and thus should be
separated from the storage system. The current ``testing'' distribution
doesn't do this well enough. (Thus, it has a regulation on upload
frequency.)
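
Here is a tiny sketch of 1), with all names invented by me (a toy,
not a design):

#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Each distribution index holds a reference on every pool file it
// lists; a file whose count drops to zero may be deleted.
struct Pool {
    std::map<std::string, int> refs;  // pool file -> reference count

    void add_index(const std::vector<std::string> &files) {
        for (const auto &f : files) ++refs[f];
    }
    void drop_index(const std::vector<std::string> &files) {
        for (const auto &f : files) --refs[f];
    }
    // Garbage collection: sweep everything no index references any more.
    void collect() {
        for (auto it = refs.begin(); it != refs.end(); )
            if (it->second <= 0) { printf("delete %s\n", it->first.c_str()); it = refs.erase(it); }
            else ++it;
    }
};

int main() {
    Pool pool;
    pool.add_index({"gcc_2.95.2-1_i386.deb", "xdm_3.3.6-1_i386.deb"});  // unstable
    pool.add_index({"gcc_2.95.2-1_i386.deb"});                          // testing
    pool.drop_index({"gcc_2.95.2-1_i386.deb", "xdm_3.3.6-1_i386.deb"}); // unstable moves on
    pool.collect();   // deletes xdm, keeps gcc (testing still wants it)
}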

With these two things in mind, RSYNC can help very little, and the
package pool's indexing problem remains. In my previous letters, I
tried to start a discussion on one of my humble attempts to help. ;)

As soon as I have enough time, and enough discussion, I may write a
more fully prepared document. But I need discussion first. Thanks!

--
echo EOF |cpp - -|egrep -v '(^#|^$)'
/*   =|=X ++
 *   /\+_ p7 [EMAIL PROTECTED] */
EOF




Re: big Packages.gz file

2001-01-09 Thread zhaoway
From: Hamish Moffatt [EMAIL PROTECTED]
Subject: Re: big Packages.gz file
Date: Tue, 9 Jan 2001 19:59:13 +1100

 On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:
  A big package index IMHO is the current bottleneck of the Debian package system.
 
 What is the real problem with the large package files? They take a long
 time to download, but so do emacs and other bloatware.

The problem, IMHO, is this: ;)

Every once in a while, when you want to update a package to the newest
version, you have to update the package index first. That is not
absolutely necessary, if you look into this problem. And the size of
the package index is constantly growing.

With Emacs, nearly all of the bits are necessary for the
functionality, and you don't download it for every trivial update
task. And it is not growing in size as rapidly as the package index is.

Looking further, if we allow translations of the Packages index, it
could be even bigger. Or if we allow multiple versions of a package to
come into the package pool (as Manoj mentioned in another thread), a
big Packages index could be even more troublesome.

I hope I have made myself clearer. ;) And thank you for discussing this with me! ;)

--
echo EOF |cpp - -|egrep -v '(^#|^$)'
/*   =|=X ++
 *   /\+_ p7 [EMAIL PROTECTED] */
EOF




Re: big Packages.gz file

2001-01-09 Thread sluncho
On Tue, Jan 09, 2001 at 11:40:01PM +1100, Hamish Moffatt wrote:
 On Tue, Jan 09, 2001 at 06:04:58PM +0900, Miles Bader wrote:
  Hamish Moffatt [EMAIL PROTECTED] writes:
   What is the real problem with the large package files? They take a long
   time to download, but so do emacs and other bloatware.

  The packages file gets downloaded _every single time_ you do an update,
  and for those of us with a slow modem link, that really sucks.
 
 True enough. I haven't really been following the discussion, to be honest.
 
 Maybe there could be another version of Packages.gz without the
 extended descriptions -- I imagine they would take something like
 33% of the Packages file, in line count at least.

Please excuse me if I am jumping into the discussion unprepared or if
this has already been mentioned.

How hard would it be to make daily diffs of the Packages file? Most people
running unstable update every other day, so this would require downloading
and applying only a couple of diff files.

The whole process can be easily automated.
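
For illustration, the client side could look roughly like this, assuming
each dinstall run is numbered and publishes Packages.diff-<n>-<n+1>.gz
(the naming and the helpers are invented here, stubbed so the sketch
compiles):

#include <cstdio>
#include <string>
#include <vector>

// Stubs standing in for an HTTP fetch and for applying a diff.
static bool fetch(const std::string &name) { printf("GET %s\n", name.c_str()); return true; }
static bool apply_patch(const std::string &name) { printf("patch < %s\n", name.c_str()); return true; }

// Fetch the chain of diffs from the run we have up to the current run
// and apply them oldest-first; any gap or corrupt patch means falling
// back to a full download of Packages.gz.
static bool update(int have, int current) {
    std::vector<std::string> chain;
    for (int n = have; n < current; ++n) {
        char buf[64];
        snprintf(buf, sizeof buf, "Packages.diff-%d-%d.gz", n, n + 1);
        if (!fetch(buf)) return false;      // gap in the chain
        chain.push_back(buf);
    }
    for (const auto &p : chain)             // order matters
        if (!apply_patch(p)) return false;  // corrupted: full update
    return true;
}

int main() { return update(42, 45) ? 0 : 1; }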

Sluncho [EMAIL PROTECTED]




Re: big Packages.gz file

2001-01-09 Thread Brian May
 sluncho == sluncho  [EMAIL PROTECTED] writes:

sluncho How hard would it be to make daily diffs of the Package
sluncho file? Most people running unstable update every other day
sluncho and this will require downloading and applying only a
sluncho couple of diff files.

sluncho The whole process can be easily automated.

Sounds remarkably like the process (weekly, not daily, though) used to
distribute Fidonet nodelist diffs. Also similar to kernel diffs, I
guess.

Seems a good idea to me (until better solutions like rproxy are
properly implemented), but you have to be careful not to apply diffs in
the wrong order.
-- 
Brian May [EMAIL PROTECTED]




Re: big Packages.gz file

2001-01-09 Thread Goswin Brederlow
Brian May [EMAIL PROTECTED] writes:

 sluncho == sluncho  [EMAIL PROTECTED] writes:
sluncho How hard would it be to make daily diffs of the Package
sluncho file? Most people running unstable update every other day
sluncho and this will require downloading and applying only a
sluncho couple of diff files.

sluncho The whole process can be easily automated.

  Sounds remarkably like the process (weekly, not daily, though) used
  to distribute Fidonet nodelist diffs. Also similar to kernel
  diffs, I guess.

  Seems a good idea to me (until better solutions like rproxy are
  properly implemented), but you have to be careful not to apply
  diffs in the wrong order.

Or missing one or having a corrupted file to begin with or any other
of 1000 possibilities.

Also, mirrors will always lag behind, have erratic timestamps on
those files, and so on. I think it would become a mess pretty soon.

The nice thing about rsync is that it's self-repairing. It's also more
efficient than a normal diff.

MfG
Goswin




Re: big Packages.gz file

2001-01-09 Thread Brian May
 zhaoway == zhaoway  [EMAIL PROTECTED] writes:

zhaoway This is only a small part of the whole story, IMHO. See
zhaoway my other email replying you. ;)

 Maybe there could be another version of Packages.gz without the
 extended descriptions -- I imagine they would take something
 like 33% of the Packages file, in line count at least.

zhaoway Exactly. DIFF or RSYNC method of APT (as Goswin pointed
zhaoway out), or just separate Descriptions out (as I pointed out
zhaoway and you got it too), nearly 66% of the bits are
zhaoway saved. But this is only a hack, albeit efficient.

At the risk of getting flamed, I investigated the possibility of
writing an apt-get method to support rsync. I would use this to access
an already existing private mirror, and not the main Debian
archive. Hence the server load issue is not a problem. The only
problem I have is downloading several megs of index files every time I
want to install a new package (often under 100kb) from unstable, over
a volume charged 28.8 kbps PPP link, using apt-get[1].

I think (if I understand correctly) that I found three problems with
the design of apt-get:

1. It tries to download the compressed Packages file, and has no way
to override that with the uncompressed file. I filed a bug report
against apt-get on this, as I believe this will be a problem with
protocols like rproxy too.

2. apt-get tries to be smart and passes the method a destination file
name that is only a temporary file, and not the final file. Hence,
rsync cannot make a comparison between local and remote versions of
the file.

3. Instead, rsync creates its own temporary file while downloading, so
apt-get cannot display the progress of the download operation because
as far as it is concerned the destination file is still empty.

I think the only way to fix both 2 and 3 is to allow some coordination
between apt-get and rsync where to put the temporary file and where to
find the previous version of the file.

Note:
[1] Normally I try to find the files manually via lynx, but right
at the moment this is rather difficult, as I seem to try numerous
directories but not get the expected result. Some packages
-- 
Brian May [EMAIL PROTECTED]




Re: big Packages.gz file

2001-01-09 Thread Brian May
 Brian == Brian May [EMAIL PROTECTED] writes:

Brian Note: [1] Normally I try to find the files manually via
Brian lynx, but right at the moment this is rather difficult, as
Brian I seem to try numerous directories but not get the expected
Brian result. Some packages 

Damn - sent that message before I had finished typing :-(

Anyway, I meant to say some packages are hard to find manually while
they haven't all been moved to the package pool system yet.
-- 
Brian May [EMAIL PROTECTED]




Re: package pool and big Packages.gz file

2001-01-08 Thread Goswin Brederlow
Jason Gunthorpe [EMAIL PROTECTED] writes:

  On 8 Jan 2001, Goswin Brederlow wrote:
 
 I don't need to get a file listing, apt-get tells me the
 name. :)

  You have missed the point, the presence of the ability to do
  file listings prevents the adoption of rsync servers with high
  connection limits.

Then that feature should be limited to non-recursive listings or
turned off. Or .listing files should be created that are just served.

 Reversed checksums (with a detached checksum file) is
 something someone should implement for debian-cd. You could
 even quite reasonably do that totally using HTTP and not run
 the risk of rsync load at all.
 
 At the moment the client calculates one roling checksum and
 md5sum per block.

  I know how rsync works, and it uses MD4.

Oops, then s/5/4/g.

 Given a 650MB file, I don't want to know the hit/miss ratios
 for the rolling checksum and the md5sum. Must be really bad.

  The ratio is supposed to only scale with block size, so it
  should be the same for big files and small files (ignoring the
  increase in block size with file size).  The amount of time
  expended doing this calculation is not trivial however.

Hmm, the technical paper says that rsync creates a 16-bit external
hash table, each entry being a linked list of items containing the full
32-bit rolling checksum (or the other 16 bits) and the md4sum.

So when you have more blocks, the hash fills up: you get more hits on
the first level and have to search a linked list. With a block size of
1K, a CD image puts about 10 items in each hash entry; the table is
1000% full. The time wasted just checking rolling checksums must be huge.

And with ~650000 rolling checksums for the image, there's a ~10/65536
chance of hitting the same checksum with a different md4sum, so
that's about 100 times per CD, just by pure chance.

If the images match, then it's ~650000 times.

So the better the match and the more blocks you have, the more CPU it
takes. Of course larger blocks take more time to compute an md4sum each,
but then you have fewer blocks.
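
The arithmetic behind those numbers, for anyone who wants to check it
(rough estimates of mine, not measurements):

#include <cstdio>

int main() {
    double blocks  = 650.0 * 1024 * 1024 / 1024; // ~650000 1K blocks per CD image
    double buckets = 65536;                      // 16-bit first-level hash
    double chain   = blocks / buckets;
    printf("avg chain length %.1f, table %.0f%% full\n", chain, 100 * chain);
    // Per computed rolling checksum, the chance of a full 32-bit match
    // whose md4sum then differs is roughly chain / 2^16.
    printf("chance md4sums on matching images: ~%.0f\n", blocks * chain / buckets);
}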

  For CD images the concern is of course available disk
  bandwidth, reversed checksums eliminate that bottleneck.

That anyway. And ram.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-08 Thread Jason Gunthorpe

On 8 Jan 2001, Goswin Brederlow wrote:

 Then that feature should be limited to non-recursive listings or
 turned off. Or .listing files should be created that are just served.

*couf* rproxy *couf*

 So when you have more blocks, the hash will fill up. So you have more
 hits on the first level and need to search a linked list. With a block
 size of 1K a CD image has 10 items per hash entry; it's 1000% full. The
 time wasted alone to check the rolling checksum must be huge.

Sure, but that is trivially solvable and is really a minor amount of
time when compared with computing the MD4 hashes. In fact, when you
start talking about ~650000 blocks you want to reconsider the design choices
that were made with rsync's searching - it is geared toward small files
and is not really optimal for big ones.

 So the better the match, the more blocks you have, the more cpu it
 takes. Of course larger blocks take more time to compute an md4sum, but
 you will have less blocks then.

No. The smaller the blocks, the more CPU time it will take to compute MD4
hashes. Expect MD4 to run at >100MB/sec on modern hardware, so you are
looking at burning 6 seconds of CPU time to verify the local CD image.

If you start getting lots of 32-bit checksum matches with md4 mismatches, due
to too large a block size, then you could easily double or triple the
number of md4 calculations you need. That is still totally dwarfed by the
<10MB/sec IO throughput you can expect with a copy of a 600MB ISO
file.
 
Jason




Re: Linux Gazette [Was: Re: big Packages.gz file]

2001-01-08 Thread Andreas Fuchs
On 2001-01-07, Goswin Brederlow
[EMAIL PROTECTED] wrote:
 zhaoway 1) It prevents many more packages from coming into Debian; for
 zhaoway example, the newest issues of Linux Gazette are not present
 zhaoway in Debian. People occasionally got fucked up by packages

 Any reasons why the Linux gazette is not present anymore?
 And is there a virtual package for the Linux gazette that always
 depends on the newest version?

Another solution would be to have only an installer which installs the
latest version of the LG from a server that keeps it. Keeps the
Packages.gz file clean, and LG readers happy.

Or am I missing something?

-- 
Andreas Fuchs, [EMAIL PROTECTED], [EMAIL PROTECTED], antifuchs
Hail RMS! Hail Cthulhu! Hail Eris! All hail Discordia!




Re: big Packages.gz file

2001-01-08 Thread zhaoway
On Sun, Jan 07, 2001 at 05:18:02PM -0500, Chris Gray wrote:
  Brian May writes:
 bm What do large packages have to do with the size of the index file,
 bm Packages?
 
 I think the point was that every package adds about 30-45 lines to the
 Packages file.  You don't need to download any of the Linux Gazette to
 have the 33 lines each issue takes up in the Packages file.

A big package index is IMHO the current bottleneck of the Debian package
system. While most people are more interested in RSYNC coming to the
rescue, IMHO RSYNC is overkill and not a clean kill. It prevents easy
mirroring of Debian by requiring an RSYNC service on the mirror system,
and it won't solve the pool's problem, only give a hack. ;)

While OTOH a relatively straight solution is:

* To separate Packages.gz to be along with each package as another separate
  file. Caesar's things belong to Caesar. ;)
  i.e., each pkg_ver-sub_arch.deb with a pkg_ver-sub_arch.idx
* At the same time, provide a big Packages.gz by collecting these small
  files for compatibility. Or maybe even a trimmed Packages.gz, by removing
  all of the Description:s.
* Optionally, provide hard links or symlinks along with each package,
  i.e., pkg_[stable|unstable|testing]_arch.idx -> pkg_ver-sub_arch.idx
  Note: this won't hurt mirrors; OTOH it could even help partial mirrors.
* And enable multiple versions of a package in the package pool.

This way, the general package index is optional, and release management
could move towards those more finely tuned task-* like packages. Nothing
lost. ;)

Just for discussion; I would be glad to hear criticism. ;)

-- 
echo EOF |cpp - -|egrep -v '(^#|^$)'
/*   =|=X ++
 *   /\+_ p7 [EMAIL PROTECTED] */
EOF




Re: big Packages.gz file

2001-01-08 Thread calvin
Hello,

On Tue, Jan 09, 2001 at 03:04:10AM +0800, zhaoway wrote:
 * To separate Packages.gz to be along with each package as another separate
   file. Caesar's things belong to Caesar. ;)
   i.e., each pkg_ver-sub_arch.deb with a pkg_ver-sub_arch.idx
No, that's not a win. You would end up checking time stamps for thousands
of files in the case of an update.
I liked the idea of alphabetical splitting into Packages-[a-z0-9].gz.

 * At the same time, provide a big Packages.gz by collecting these small
   files for compatibility. Or, maybe even a trimmed Packages.gz by removing
   all of the Description:s.
Yup, just keep a copy of Packages.gz and provide backwards compatibility.

Bastian Kleineidam




Re: Linux Gazette [Was: Re: big Packages.gz file]

2001-01-08 Thread Adrian Bridgett
On Mon, Jan  8, 2001 at 18:20:16 +0100, Andreas Fuchs wrote:
 On 2001-01-07, Goswin Brederlow
 [EMAIL PROTECTED] wrote:
  zhaoway 1) It prevents many more packages from coming into Debian; for
  zhaoway example, the newest issues of Linux Gazette are not present
  zhaoway in Debian. People occasionally got fucked up by packages
 
  Any reasons why the Linux gazette is not present anymore?
  And is there a virtual package for the Linux gazette that always
  depends on the newest version?
 
 Another solution would be to have only an installer which installs the
 latest version of the LG from a server that keeps it. Keeps the
 Packages.gz file clean, and LG readers happy.
 
 Or am I missing something?

To answer the questions:

a) it is present, but I haven't updated it in a while (busy).  Wouter Verhelst
has offered to take over the package, but he's new to packaging, so things are
taking a bit of time.

b) nope - I haven't done a virtual latest package yet; there is a bug about
it, I think (or Wouter suggested it).

c) personally, I like the LG since I find the issues useful - I found useful
articles in all the ones I read.  Unfortunately, since I left uni I haven't
been sufficiently bored to remember to download and read them (and hence to
package them).

d) I was hoping the data section of Debian would get into policy so I
could move the packages there and out of main.

Adrian

Email: [EMAIL PROTECTED]
Windows NT - Unix in beta-testing. GPG/PGP keys available on public key servers
Debian GNU/Linux  -*-  By professionals for professionals  -*-  www.debian.org




Re: package pool and big Packages.gz file

2001-01-07 Thread Sam Vilain
On Fri, 5 Jan 2001 09:33:05 -0700 (MST)
Jason Gunthorpe [EMAIL PROTECTED] wrote:

  If that suits your needs, feel free to write a bugreport on apt about
  this.
 Yes, I enjoy closing such bug reports with a terse response.
 Hint: Read the bug page for APT to discover why!

From bug report #76118:

No. Debian can not support the use of rsync for anything other than
mirroring, APT will never support it.

Why?  Because if everyone used rsync, the loads on the servers that supported 
rsync would be too high?  Or something else?
--
Sam Vilain, [EMAIL PROTECTED]WWW: http://sam.vilain.net/
GPG public key: http://sam.vilain.net/sam.asc




Re: package pool and big Packages.gz file

2001-01-07 Thread Sam Vilain
On Fri, 5 Jan 2001 19:08:38 +0200
[EMAIL PROTECTED] (Sami Haahtinen) wrote:

 Or, can rsync sync binary files?
 hmm.. this sounds like something worth implementing..

rsync can, but the problem with a compressed stream is that if you insert or
alter data early in the stream, the data after that change is radically different.
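
A small demo of the effect, assuming zlib is available (link with -lz):
flip one early byte and see how little of the compressed output survives.

#include <cstdio>
#include <vector>
#include <zlib.h>

static std::vector<unsigned char> gz(const std::vector<unsigned char> &in) {
    uLongf len = compressBound(in.size());
    std::vector<unsigned char> out(len);
    compress(out.data(), &len, in.data(), in.size());  // deflate the buffer
    out.resize(len);
    return out;
}

int main() {
    std::vector<unsigned char> a(1 << 16);
    for (size_t i = 0; i < a.size(); ++i) a[i] = (unsigned char)(i * 31 % 251);
    std::vector<unsigned char> b = a;
    b[10] ^= 1;                          // one-bit change near the start

    std::vector<unsigned char> ca = gz(a), cb = gz(b);
    size_t same = 0;
    while (same < ca.size() && same < cb.size() && ca[same] == cb[same]) ++same;
    printf("compressed sizes %zu/%zu, identical prefix only %zu bytes\n",
           ca.size(), cb.size(), same);
}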

But... you could use it successfully against the .tar files inside the .deb,
which are normally compressed.  This would probably require some special
implementation of rsync, or having the uncompressed packages on the server and
putting the magic in apt.

Or perhaps the program apt-mirror is called for, which talks its own protocol
to other copies of itself, and will do a magic job of selectively updating
mirror copies of the debian archive using the rsync algorithm.  This would be
similar to the apt-get and apt-move pair, but actually sticking it into a
directory structure that looks like the debian mirror.

Then, if you want to enable it, turn on the server version and share your
mirror with your friends inside your corporate network!  Or an authenticated
version, so that a person with their own permanent internet connection could
share their archive with a handful of friends - having an entire mirror would
be too costly for them.  I think this has some potential to be quite useful
and reduce bandwidth requirements.  It could use GPG signatures to check that
nothing funny is going on, too.

Either that, or keep a number of patch files or .xd files for a couple of old
revs per package, against the uncompressed contents of the packages, to make
small changes to packages quick.  Or perhaps implement this as patch packages,
which are a special .deb that only contains the changed files and upgrades the
package.
--
Sam Vilain, [EMAIL PROTECTED]WWW: http://sam.vilain.net/
GPG public key: http://sam.vilain.net/sam.asc




Re: package pool and big Packages.gz file

2001-01-07 Thread Goswin Brederlow
Sam Vilain [EMAIL PROTECTED] writes:

  On Fri, 5 Jan 2001 09:33:05 -0700 (MST) Jason Gunthorpe
  [EMAIL PROTECTED] wrote:

  If that suits your needs, feel free to write a bugreport on
  apt about this.

  Yes, I enjoy closing such bug reports with a terse response.
  Hint: Read the bug page for APT to discover why!

 From bug report #76118:

  No. Debian can not support the use of rsync for anything other
  than mirroring, APT will never support it.

  Why?  Because if everyone used rsync, the loads on the servers
  that supported rsync would be too high?  Or something else?

Actually the load should drop, provided the following feature add-ons:

1. cached checksums and pulling instead of pushing
2. client side unpacking of compressed streams

That way the rsync server would first serve the checksum file from
cache (200-1000 times smaller than the real file) and then just the
blocks the client asks for. So if 1% of the file being rsynced matches,
it breaks even, and everything above that saves bandwidth.

The current mode of operation of rsync works in reverse, so all
the computation is done on the server every time, which of course is a
heavy load on the server.
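
A toy end-to-end sketch of that pull scheme (the names and the trivial
stand-in checksum are mine; real rsync uses the rolling checksum plus md4):

#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// FNV-1a stands in for rolling checksum + md4 to keep the sketch short.
static uint32_t sum(const std::string &s) {
    uint32_t h = 2166136261u;
    for (unsigned char c : s) h = (h ^ c) * 16777619u;
    return h;
}

int main() {
    const size_t B = 4;                        // block size
    std::string remote = "aaaabbbbccccdddd";   // new file, on the server
    std::string local  = "aaaaccccdddd";       // old file, on the client

    // The server computes this once and serves it from cache.
    std::vector<uint32_t> sigs;
    for (size_t o = 0; o < remote.size(); o += B)
        sigs.push_back(sum(remote.substr(o, B)));

    // The client does all the scanning: index every offset of its file.
    std::map<uint32_t, size_t> have;
    for (size_t o = 0; o + B <= local.size(); ++o)
        have[sum(local.substr(o, B))] = o;

    for (size_t i = 0; i < sigs.size(); ++i)   // only misses cost bandwidth
        if (have.count(sigs[i])) printf("block %zu: copy from local file\n", i);
        else                     printf("block %zu: fetch from server\n", i);
}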

I hope both features will work without changing the server, but if not,
we will have to wait till servers catch up with the feature.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-07 Thread Falk Hueffner
Sam Vilain [EMAIL PROTECTED] writes:

 On Fri, 5 Jan 2001 19:08:38 +0200
 [EMAIL PROTECTED] (Sami Haahtinen) wrote:
 
  Or, can rsync sync binary files?
  hmm.. this sounds like something worth implementing..
 
 rsync can, but the problem is with a compressed stream if you insert
 or alter data early on in the stream, the data after that change is
 radically different.
 
 But... you could use it successfully against the .tar files inside
 the .deb, which are normally compressed.  This would probably
 require some special implementation of rsync, or to have the
 uncompressed packages on the server and put the magic in apt.

 [...]
 
 Either that or keep a number of patch files or .xd files for a
 couple of old revs per packages against the uncompressed contents of
 packages to allow small changes to packages to be quick.  Or perhaps
 implement this as patch packages, which are a special .deb that only
 contain the changed files and upgrade the package.

I suggest you have a look at 'tje' by Joost Witteveen
(http://joostje.op.het.net/tje/index.html). It is specifically written
with the goal in mind to sync Debian mirrors with minimum bandwidth
use. It doesn't use the rsync algorithm, but something similar. It
understands .debs and claims to have less server CPU usage than rsync,
since it caches diffs and md5sums. It would be really nice if anybody
with an up-to-date mirror could volunteer to provide a machine to set
up a tje server to test it a little more...

Falk




Re: package pool and big Packages.gz file

2001-01-07 Thread Matt Zimmerman
On Sun, Jan 07, 2001 at 03:49:43PM +0100, Goswin Brederlow wrote:

 Actually the load should drop, providing the following feature add
 ons:
 [...]

The load should drop from that induced by the current rsync setup (for the
mirrors), but if many, many more clients start using rsync (instead of
FTP/HTTP), I think there will still be a significant net increase in load.

Whether it would be enough to cause a problem is debatable, and I honestly
don't know either way.

-- 
 - mdz




Re: package pool and big Packages.gz file

2001-01-07 Thread Goswin Brederlow
Matt Zimmerman [EMAIL PROTECTED] writes:

  On Sun, Jan 07, 2001 at 03:49:43PM +0100, Goswin Brederlow
  wrote:
 Actually the load should drop, providing the following feature
 add ons: [...]

  The load should drop from that induced by the current rsync
  setup (for the mirrors), but if many, many more client start
  using rsync (instead of FTP/HTTP), I think there will still be
  a significant net increase in load.

  Whether it would be enough to cause a problem is debatable, and
  I honestly don't know either way.

When the checksums are cached there will be no CPU load caused by
rsync, since it will only transfer the file. And the checksum files
will be really small, as I said, so if some similarity is found, the
reduction in data will more than make up for the checksum download.

The only increase is the space needed to store the checksums in some
form of cache.

MfG
Goswin




Re: big Packages.gz file

2001-01-07 Thread Chris Gray
 Brian May writes:

 zhaoway == zhaoway  [EMAIL PROTECTED] writes:
zhaoway 1) It prevents many more packages from coming into Debian; for
zhaoway example, the newest issues of Linux Gazette are not present
zhaoway in Debian. People occasionally got fucked up by packages
zhaoway like anarchism-doc because of the precious band-width. And
zhaoway some occasional discussion on L10N packages disturbs the
zhaoway lives of others who don't need it.

bm ...only if you download and install the package in question.

bm What do large packages have to do with the size of the index file,
bm Packages?

I think the point was that every package adds about 30-45 lines to the
Packages file.  You don't need to download any of the Linux Gazette to
have the 33 lines each issue takes up in the Packages file.

Cheers,
Chris

-- 
Got jag?  http://www.tribsoft.com




Linux Gazette [Was: Re: big Packages.gz file]

2001-01-07 Thread Goswin Brederlow
Chris Gray [EMAIL PROTECTED] writes:

 Brian May writes:
 zhaoway == zhaoway  [EMAIL PROTECTED] writes:
zhaoway 1) It prevents many more packages from coming into Debian; for
zhaoway example, the newest issues of Linux Gazette are not present
zhaoway in Debian. People occasionally got fucked up by packages

Any reasons why the Linux gazette is not present anymore?

And is there a virtual package for the Linux gazette that always
depends on the newest version?

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-07 Thread Jason Gunthorpe

On 7 Jan 2001, Goswin Brederlow wrote:

 Actually the load should drop, providing the following feature add
 ons:
 
 1. cached checksums and pulling instead of pushing
 2. client side unpacking of compressed streams

Apparently reversing the direction of rsync infringes on a patent.

Plus there is the simple matter that the file listing and file download
features cannot be separated. Doing a listing of all files on our site is
non-trivial.

Once you strip all that out you have rproxy.

Reversed checksums (with a detached checksum file) are something someone
should implement for debian-cd. You could even quite reasonably do that
totally using HTTP and not run the risk of rsync load at all.

Such a system for Package files would also be acceptable I think.

Jason




Re: package pool and big Packages.gz file

2001-01-07 Thread Brian May
 Goswin == Goswin Brederlow [EMAIL PROTECTED] writes:

Goswin Actually the load should drop, providing the following
Goswin feature add ons:

How does rproxy cope? Does it require a high load on the server?  I
suspect not, but need to check on this.

I think of rsync as just being a quick hack, rproxy is the (long-term)
direction we should be headed. rproxy is the same as rsync, but based
on the HTTP protocol, so it should be possible (in theory) to
integrate into programs like Squid, Apache and Mozilla (or so the
authors claim).
-- 
Brian May [EMAIL PROTECTED]




Re: package pool and big Packages.gz file

2001-01-07 Thread Goswin Brederlow
Brian May [EMAIL PROTECTED] writes:

 Goswin == Goswin Brederlow [EMAIL PROTECTED] writes:
Goswin Actually the load should drop, providing the following
Goswin feature add ons:

  How does rproxy cope? Does it require a high load on the
  server?  I suspect not, but need to check on this.

  I think of rsync as just being a quick hack, rproxy is the
  (long-term) direction we should be headed. rproxy is the same
  as rsync, but based on the HTTP protocol, so it should be
  possible (in theory) to integrate into programs like Squid,
  Apache and Mozilla (or so the authors claim).

URL?

Sounds more like encapsulation of an rsync-similar protocol in html,
but it's hard to tell from the few words you write. Could be interesting
though.

Anyway, it will not solve the problem with compressed files if it's
just like rsync.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-07 Thread Brian May
 Goswin == Goswin Brederlow [EMAIL PROTECTED] writes:

Goswin URL?

URL:http://linuxcare.com.au/projects/rproxy/

The documentation seems very comprehensive, but I am not sure when it
was last updated.

Goswin Sounds more like encapsulation of an rsync similar
Goswin protocol in html, but its hard to tell from the few words
Goswin you write. Could be intresting though.

errr... I think you mean http, not html.

Goswin Anyway, it will not resolve the problem with compressed
Goswin files if its just like rsync.

True; however, I was thinking more in the context of Packages
and other uncompressed files. It would be good though if these
issues regarding deb packages could be resolved.

Then again, perhaps I was a bit blunt with that statement on rsync;
rsync will still always have its uses, e.g. copying private data.
-- 
Brian May [EMAIL PROTECTED]




Re: package pool and big Packages.gz file

2001-01-07 Thread Goswin Brederlow
Jason Gunthorpe [EMAIL PROTECTED] writes:

  On 7 Jan 2001, Goswin Brederlow wrote:

 Actually the load should drop, providing the following feature
 add ons:
 
 1. cached checksums and pulling instead of pushing 2. client
 side unpacking of compressed streams

  Apparently reversing the direction of rsync infringes on a
  patent.

When I rsync a file, rsync starts ssh to connect to the remote host
and starts rsync there in the reverse mode.

You say that the receiving end is violating a patent and the sending
end is not?

Hmm, which patent anyway?

So I have to fork a rsync-non-US because of a patent?

  Plus there is the simple matter that the file listing and file
  download features cannot be seperated. Doing a listing of all
  files on our site is non-trivial.

I don't need to get a file listing, apt-get tells me the name. :)
Also, I can do rsync -v host::dir and parse the output to grab the
actual files with another rsync. So file listing and downloading are
absolutely separable.

Doing a listing of all files probably results in a timeout. The
hard drives are too slow.

  Once you strip all that out you have rproxy.

  Reversed checksums (with a detached checksum file) is something
  someone should implement for debian-cd. You calud even quite
  reasonably do that totally using HTTP and not run the risk of
  rsync load at all.

At the moment the client calculates one rolling checksum and md5sum per
block.

The server, on the other hand, calculates the rolling checksum per
byte and for each hit it calculates an md5sum for one block.

Given a 650MB file, I don't want to know the hit/miss ratios for the
rolling checksum and the md5sum. They must be really bad.

The smaller the file, the fewer wrong md5sums need to be calculated.
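
For reference, the weak checksum and its O(1) roll look roughly like
this (my notation, following the rsync technical report):

#include <cstdint>
#include <cstdio>

// Two 16-bit halves, Adler-style: a = sum of bytes, b = position-weighted
// sum; both mod 2^16. The roll drops one byte and appends one in O(1),
// which is what lets the server try every byte offset.
struct Rolling {
    uint32_t a = 0, b = 0;
    size_t len = 0;

    void init(const unsigned char *buf, size_t n) {
        a = b = 0; len = n;
        for (size_t i = 0; i < n; ++i) {
            a = (a + buf[i]) & 0xffff;
            b = (b + (uint32_t)((n - i) * buf[i])) & 0xffff;
        }
    }
    void roll(unsigned char out, unsigned char in) {
        a = (a - out + in) & 0xffff;
        b = (b - (uint32_t)(len * out) + a) & 0xffff;
    }
    uint32_t digest() const { return a | (b << 16); }
};

int main() {
    const unsigned char data[] = "the quick brown fox jumps over the lazy dog";
    const size_t W = 8, N = sizeof(data) - 1;
    Rolling r; r.init(data, W);
    for (size_t k = 1; k + W <= N; ++k) {      // slide across the buffer
        r.roll(data[k - 1], data[k - 1 + W]);
        Rolling fresh; fresh.init(data + k, W);
        if (r.digest() != fresh.digest()) { printf("mismatch at %zu\n", k); return 1; }
    }
    printf("rolled checksums match full recomputation\n");
}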

  Such a system for Package files would also be acceptable I
  think.

For Packages files even cvs -z9 would be fine. They are comparatively
small next to the rest of the load, I would think.

But I, just as you do, think that it would be a really good idea to
have precalculated rolling checksums and md5sums, maybe even for
various block sizes, and let the client do the time-consuming guessing
and calculating. That would keep rsync from reading every file served
twice, as it does now when they are dissimilar.

May the Source be with you.
Goswin




Re: package pool and big Packages.gz file

2001-01-07 Thread Jason Gunthorpe

On 8 Jan 2001, Goswin Brederlow wrote:

   Apparently reversing the direction of rsync infringes on a
   patent.
 
 When I rsync a file, rsync starts ssh to connect to the remote host
 and starts rsync there in the reverse mode.

Not really, you have to use quite a different set of operations to do it
one way vs the other. The core computation is the same, mind you.
 
 Hmm, which patent anyway?

Don't know, I never heard back from Tridge on that.
 
 I don't need to get a file listing, apt-get tells me the name. :)

You have missed the point, the presence of the ability to do file listings
prevents the adoption of rsync servers with high connection limits.

   Reversed checksums (with a detached checksum file) is something
   someone should implement for debian-cd. You could even quite
   reasonably do that totally using HTTP and not run the risk of
   rsync load at all.
 
 At the moment the client calculates one rolling checksum and md5sum per
 block.

I know how rsync works, and it uses MD4.

 Given a 650MB file, I don't want to know the hit/miss ratios for the
 rolling checksum and the md5sum. Must be really bad.

The ratio is supposed to only scale with block size, so it should be the
same for big files and small files (ignoring the increase in block size
with file size).  The amount of time expended doing this calculation is
not trivial however. 

For CD images the concern is of course available disk bandwidth, reversed
checksums eliminate that bottleneck.

Jason




Re: package pool and big Packages.gz file

2001-01-06 Thread Andrew Stribblehill
Quoting Goswin Brederlow [EMAIL PROTECTED]:
Sami Haahtinen [EMAIL PROTECTED] writes:

   Or, can rsync sync binary files?
 
 Of cause, but forget it with compressed data.

Doesn't gzip have a --rsync option, or somesuch? Apparently Andrew
Tridgell (Samba, Rsync) has a patch to do this, but I don't know
whether he passed it onto the gzip maintainers.

(Apparently he's working on a --fuzzy flag for matching rsyncs
between, say foo-1.0.deb and foo-1.1.deb. He says it should be
called the --debian flag.)

Cheerio,

Andrew Stribblehill
Systems programmer, IT Service, University of Durham, England




Re: big Packages.gz file

2001-01-06 Thread Andreas Fuchs
On 2001-01-05, Brian May [EMAIL PROTECTED] wrote:
 What do large packages have to do with the size of the index file,
 Packages?

They waste one byte per multiple of 10 bytes of package size. (-;

Bad joke? So sue me.
-- 
Andreas Fuchs, [EMAIL PROTECTED], [EMAIL PROTECTED], antifuchs
Hail RMS! Hail Cthulhu! Hail Eris! All hail Discordia!




Re: big Packages.gz file

2001-01-06 Thread Sam Couter
 On 2001-01-05, Brian May [EMAIL PROTECTED] wrote:
  What do large packages have to do with the size of the index file,
  Packages?

Andreas Fuchs [EMAIL PROTECTED] wrote:
 They waste one byte per multiple of 10 bytes of package size. (-;

You mean one byte per order of magnitude of package size. ;)

 Bad joke? So sue me.

Yes, very bad. I couldn't resist correcting, which makes me at least as bad.
-- 
Sam Couter  |   Internet Engineer   |   http://www.topic.com.au/
[EMAIL PROTECTED]|   tSA Consulting  |
OpenPGP key available on key servers
OpenPGP fingerprint:  A46B 9BB5 3148 7BEA 1F05  5BD5 8530 03AE DE89 C75C




Re: package pool and big Packages.gz file

2001-01-06 Thread Sam Couter
Andrew Stribblehill [EMAIL PROTECTED] wrote:
 
 Doesn't gzip have a --rsync option, or somesuch? Apparently Andrew
 Tridgell (Samba, Rsync) has a patch to do this, but I don't know
 whether he passed it onto the gzip maintainers.

I like the idea of having plugins for rsync to handle different kinds of
data. So the gzip plugin will decompress the data, and the rsync algorithm
can work on the decompressed data. Much better.
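
An interface for that could look something like this (invented here, not
anything rsync actually has):

#include <cstdio>
#include <vector>

// A codec maps wire data to a canonical form the rsync algorithm can
// diff well, and back. A gzip codec would inflate in decode() and
// deflate in encode(); a deb codec would split the ar members and
// recurse into control.tar.gz and data.tar.gz with the gzip codec.
struct Codec {
    virtual ~Codec() {}
    virtual const char *name() const = 0;
    virtual std::vector<unsigned char> decode(const std::vector<unsigned char> &raw) = 0;
    virtual std::vector<unsigned char> encode(const std::vector<unsigned char> &plain) = 0;
};

// Trivial pass-through codec, just so the sketch is complete.
struct IdentityCodec : Codec {
    const char *name() const { return "identity"; }
    std::vector<unsigned char> decode(const std::vector<unsigned char> &raw) { return raw; }
    std::vector<unsigned char> encode(const std::vector<unsigned char> &plain) { return plain; }
};

int main() {
    IdentityCodec id;
    printf("codec: %s\n", id.name());
}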

 (Apparently he's working on a --fuzzy flag for matching rsyncs
 between, say foo-1.0.deb and foo-1.1.deb. He says it should be
 called the --debian flag.)

A deb plugin would be better. :)
-- 
Sam Couter  |   Internet Engineer   |   http://www.topic.com.au/
[EMAIL PROTECTED]|   tSA Consulting  |
OpenPGP key available on key servers
OpenPGP fingerprint:  A46B 9BB5 3148 7BEA 1F05  5BD5 8530 03AE DE89 C75C




Re: package pool and big Packages.gz file

2001-01-06 Thread Brian May
 Sam == Sam Couter [EMAIL PROTECTED] writes:

Sam Andrew Stribblehill [EMAIL PROTECTED] wrote:
 Doesn't gzip have a --rsync option, or somesuch? Apparently
 Andrew Tridgell (Samba, Rsync) has a patch to do this, but I
 don't know whether he passed it onto the gzip maintainers.

Sam I like the idea of having plugins for rsync to handle
Sam different kinds of data. So the gzip plugin will decompress
Sam the data, and the rsync algorithm can work on the
Sam decompressed data. Much better.

 (Apparently he's working on a --fuzzy flag for matching rsyncs
 between, say foo-1.0.deb and foo-1.1.deb. He says it should be
 called the --debian flag.)

Sam A deb plugin would be better. :) 

Sounds like a good idea to me.

Although don't get the two issues confused:

1. difference in filename.
2. format of file.

Although, I guess in most cases the two will always be linked (e.g.
choosing the best filename really depends on the format, as ideally
the most similar *.deb package should be used, and this means
implementing Debian's rules for comparing versions). Will this always
be the case?
-- 
Brian May [EMAIL PROTECTED]




Re: package pool and big Packages.gz file

2001-01-06 Thread Drake Diedrich
On Sun, Jan 07, 2001 at 11:43:39AM +1100, Sam Couter wrote:
 
 A deb plugin would be better. :)

   One problem with a deb plugin is that .debs are signed in compressed
form.  gzip isn't guaranteed to produce the same compressed file from
identical uncompressed files on different architectures and releases.
Varying the compression flags can also change the compressed file.

-Drake




Re: package pool and big Packages.gz file

2001-01-06 Thread Matt Zimmerman
On Sun, Jan 07, 2001 at 12:53:14PM +1100, Drake Diedrich wrote:

 On Sun, Jan 07, 2001 at 11:43:39AM +1100, Sam Couter wrote:
  
  A deb plugin would be better. :)
 
One problem with a deb plugin is that .debs are signed in compressed
 form.  gzip isn't guaranteed to produce the same compressed file from
 identical uncompressed files on different architectures and releases.
 Varying the compression flags can also change the compressed file.

It shouldn't be a problem to tweak things so that the resulting files end up
exactly the same.  This is rsync, after all, and that is the program's goal.
For instance, uncompressed blocks could be used for comparison, but the gzip
header copied exactly.

-- 
 - mdz




Re: [FINAL, for now ;-)] (Was: Re: package pool and big Packages.gz file)

2001-01-05 Thread Joey Hess
If you don't like large Packages files, implement a rsync transfer
method for them.

-- 
see shy jo




big Packages.gz file

2001-01-05 Thread zhaoway
[sorry, either fetchmail or my ISP made me lose 30 or so emails.]

The problem with a bigger and bigger Packages.gz
[which I thought was obvious :-(] is,

1) It prevents many more packages from coming into Debian; for example,
the newest issues of Linux Gazette are not present in Debian. People
occasionally got fucked up by packages like anarchism-doc because of
the precious band-width. And some occasional discussion on L10N
packages disturbs the lives of others who don't need it.

2) It doesn't scale. Release management is difficult; the RM in general
only considers RC bugs for most of the packages he is not familiar
with. ;-)

Now consider mechanisms such as DIFF and RSYNC for Packages.gz:

1) They're difficult to set up, though they _should_ be easy, considering
they're end-user stuff. With the current state of testing, i.e., an
often-updated Packages.gz and a more or less stable state, people tend
to update very often.

2) They have a FIX TIME problem. I.e., if you don't RSYNC or DIFF for
a long time, they won't save you extra bandwidth. While my approach
does.

3) They don't scale all that well either. ;-)

Now consider mechanisms that section Packages.gz by functionality, or
just as the package pool does:

1) Due to the complicated dependency problem, they're doomed to fail. ;-)

Okay, now see my approach. [See my previous mail. The FINAL one. ;-)]

1) It is compatible with old tools. (If only you would discuss it with me!!)

2) It scales well, both for release management and for including as many
packages into Debian as our hard disks permit.

3) It is very easy for the end user to set up.

4) No extra burden on developers as to how frequently they should upload.

5) No FIX TIME problem. (See above.)

6) Possibilities exist for a package to provide its changelog to users,
for them to consider whether to upgrade. This will help developers
avoid some fake bug reports.

So why not discuss it with me? ;-)

zw




Re: big Packages.gz file

2001-01-05 Thread Brian May
 zhaoway == zhaoway  [EMAIL PROTECTED] writes:

zhaoway 1) It prevents many more packages from coming into Debian; for
zhaoway example, the newest issues of Linux Gazette are not present
zhaoway in Debian. People occasionally got fucked up by packages
zhaoway like anarchism-doc because of the precious band-width. And
zhaoway some occasional discussion on L10N packages disturbs the
zhaoway lives of others who don't need it.

...only if you download and install the package in question.

What do large packages have to do with the size of the index file,
Packages?

zhaoway 2) They have a FIX TIME problem. I.e., if you don't RSYNC
zhaoway or DIFF for a long time, they won't save you extra
zhaoway bandwidth. While my approach does.

You only download what has changed. Nothing more, nothing less.

I could equally argue that if you wait a while, exactly one package
in each section will change, causing you to have to re-download all
index files.

I am not trying to argue that your method is a bad idea, but please
try and get your facts straight first.


Now back on topic: another similar alternative to rsync might be
protocols like rproxy, which add rsync capabilities to
HTTP. Apparently the authors want to include the functionality (not sure
what time frame they are talking about here) in Squid and Apache. This
would mean rsync support in apt-get may be less important; you would
just need to force it to download Packages, not Packages.gz.
-- 
Brian May [EMAIL PROTECTED]




Re: package pool and big Packages.gz file

2001-01-05 Thread Jason Gunthorpe

On 5 Jan 2001, Goswin Brederlow wrote:

 If that suits your needs, feel free to write a bugreport on apt about
 this.

Yes, I enjoy closing such bug reports with a terse response.

Hint: Read the bug page for APT to discover why!

Jason




Re: package pool and big Packages.gz file

2001-01-05 Thread Sami Haahtinen
On Fri, Jan 05, 2001 at 03:05:03AM +0100, Goswin Brederlow wrote:
 Whats the problem with a big Packages file?
 
 If you don't want to download it again and again just because of small
 changes I have a better solution for you:
 
 rsync
 
 apt-get update could rsync all Packages files (yes, not the .gz ones)
 and thereby download only changed parts. On uncompressed files rsync
 is very effective and the changes can be compressed for the actual
 transfer. So on upload you will practically get a diff.gz to your old
 Packages file.


This would bring us to apt renaming the old deb (if there is one) to the
name of the new package and rsyncing those, and we would save some time
once again...

Or, can rsync sync binary files?

hmm.. this sounds like something worth implementing..

-- 
every nerd knows how to enjoy the little things of life,
like: rm -rf windows




Re: package pool and big Packages.gz file

2001-01-05 Thread Sami Haahtinen
On Fri, Jan 05, 2001 at 05:46:35AM +0800, zhaoway wrote:
  how about diffs between dinstall runs?..
 
 sorry, but i don't understand here. dinstall is a server side thing here?

Yes, when dinstall runs it would copy the old Packages file to, let's say,
packages.old and make its changes to the new file; after it's done it would
diff packages.old and packages...

  packages-010102-010103.gz
  packages-010103-010104.gz
  packages.gz
  
  apt would download the changes after the last update, and merge these to
  the package file, if the file gets corrupted, it would attempt to do a full
  update.
 
 on the top, some pkg-gz-debs list packages at the leaves of the dependency tree,
 and each pkg-gz-deb won't get bigger than 100k, and each of them depends
 on some more basic pkg-gz-deb
 
 below, some other pkg-gz-deb like the base sub-system.
 
 this way, when a user installs xdm,
 apt-get first installs the pkg-gz-deb which lists xdm; then, as dependency checking
 proceeds, it will install base-a-pkg-gz-deb etc. etc., and xdm gets installed. this way,
 all of xdm's dependencies will be fulfilled with the newest information available.
 
 and you can see this will surely ease up the bandwidth. (when updating gcc, i
 won't get additional bits of Packages.gz about xdm, xfree etc.)

wouldn't this make it a BIT too difficult?

-- 
every nerd knows how to enjoy the little things of life,
like: rm -rf windows




Re: package pool and big Packages.gz file

2001-01-05 Thread Wichert Akkerman
Previously Sami Haahtinen wrote:
 this would bring us to, apt renaming the old deb (if there is one) to the
 name of the new package and rsync those. and we would save some time once
 again... 

There is a --fuzzy-names patch for rsync that makes rsync do that itself.

 Or, can rsync sync binary files?

Yes.

 hmm.. this sounds like something worth implementing..

Don't bother, it's been done already. Ask Rusty for details.

Wichert.

-- 
   
 / Generally uninteresting signature - ignore at your convenience  \
| [EMAIL PROTECTED]  http://www.liacs.nl/~wichert/ |
| 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0  2805 3CB8 9250 2FA3 BC2D |




Re: package pool and big Packages.gz file

2001-01-05 Thread Goswin Brederlow
Sami Haahtinen [EMAIL PROTECTED] writes:

  On Fri, Jan 05, 2001 at 03:05:03AM +0100, Goswin Brederlow
  wrote:
 Whats the problem with a big Packages file?
 
 If you don't want to download it again and again just because
 of small changes I have a better solution for you:
 
 rsync
 
 apt-get update could rsync all Packages files (yes, not the .gz
 ones) and thereby download only changed parts. On uncompressed
 files rsync is very effective and the changes can be compressed
 for the actual transfer. So on upload you will practically get a
 diff.gz to your old Packages file.


  this would bring us to, apt renaming the old deb (if there is
  one) to the name of the new package and rsync those. and we
  would save some time once again...

That's what the debian-mirror script does (about half of the script is
just for that). It also uses old tar.gz, orig.tar.gz, diff.gz
and dsc files.

  Or, can rsync sync binary files?

Of course, but forget it with compressed data.

  hmm.. this sounds like something worth implementing..

I'm currently discussing some changes to the rsync client with some
people from the rsync ML which would uncompress compressed data on the
client side (no changes to the server) and rsync that. It sounds like
it doesn't improve anything, but when you read the full description
it actually does.

Without that, rsyncing new debs against old ones hardly ever saves
anything. Where it helps is with big packages like xfree, where several
packages are identical between releases.

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-05 Thread Goswin Brederlow
Jason Gunthorpe [EMAIL PROTECTED] writes:

  On 5 Jan 2001, Goswin Brederlow wrote:

 If that suits your needs, feel free to write a bugreport on apt
 about this.

  Yes, I enjoy closing such bug reports with a terse response.

  Hint: Read the bug page for APT to discover why!

  Jason

I couldn't find any existing bugreport concerning rsync support for
apt-get in the long list of bugs.

So why would you close such a wishlist bugreport?
And why with a terse response?

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-05 Thread Junichi Uekawa
In 05 Jan 2001 19:51:08 +0100 Goswin Brederlow [EMAIL PROTECTED] cum veritate 
scripsit :

Hello,

 I'm currently discussing some changes to the rsync client with some
 people from the rsync ML which would uncompress compressed data on the
 client side (no changes to the server) and rsync those. Sounds like
 not improving anything, but when reading the full description on this
 it actually does.
 
 Before that, rsyncing new debs with old ones hardly ever saves
 anything. Where it helps is with big packages like xfree, where several
 packages are identical between releases.

No offence, but wouldn't it be a tad difficult to play around with,
since deb packages are not just gzipped archives, but an ar archive
containing gzipped tar archives?


regards,
junichi

--
University: [EMAIL PROTECTED]Netfort: [EMAIL PROTECTED]
dancer, a.k.a. Junichi Uekawa   http://www.netfort.gr.jp/~dancer
 Dept. of Knowledge Engineering and Computer Science, Doshisha University.
... Long Live Free Software, LIBERTAS OMNI VINCIT.




Re: package pool and big Packages.gz file

2001-01-05 Thread Goswin Brederlow
Junichi Uekawa [EMAIL PROTECTED] writes:

  In 05 Jan 2001 19:51:08 +0100 Goswin Brederlow
  [EMAIL PROTECTED] cum veritate
  scripsit : Hello,

 I'm currently discussing some changes to the rsync client with
 some people from the rsync ML which would uncompress compressed
 data on the client side (no changes to the server) and rsync
 those. Sounds like not improving anything, but when reading the
 full description on this it actually does.
 
  Before that, rsyncing new debs with old ones hardly ever saves
  anything. Where it helps is with big packages like xfree, where
 several packages are identical between releases.

  No offence, but wouldn't it be a tad difficult to play around
  with it, since deb packages are not just gzipped archives, but
  ar archive containing gzipped tar archives?

Yes and no.

The problem is that deb files are special ar archives, so you can't
just download the files and ar them together.

One way would be to download the files in the ar, ar them together and
rsync again. Since ar does not change the data in it, the deb has the
same data, just at different places, and rsync handles that well.

This would be possible, but would require server changes.

The trick is to know a bit about ar, but not too much. Just rsync the
header of the ar file up to the first real file in it, then rsync
that recursively, then a bit more ar file data and another file, and so
on. Knowing where subfiles start and how long they are is enough.

The question will be how much intelligence to teach rsync. I like
rsync stupid but still intelligent enough to do the job.

It's pretty tricky, so it will be some time before anything in that
direction is usable.
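
To show how little ar knowledge is needed, here is a sketch that walks a
.deb and prints where each member (debian-binary, control.tar.gz,
data.tar.gz) starts, so each could be rsynced on its own (ar(5) layout;
error handling kept minimal):

#include <cstdio>
#include <cstdlib>
#include <cstring>

int main(int argc, char **argv) {
    if (argc != 2) { fprintf(stderr, "usage: %s file.deb\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    char magic[8];
    if (fread(magic, 1, 8, f) != 8 || memcmp(magic, "!<arch>\n", 8) != 0) {
        fprintf(stderr, "not an ar archive\n"); return 1;
    }
    char hdr[60];                               // fixed-size member header
    while (fread(hdr, 1, 60, f) == 60) {
        char name[17], size_s[11];
        memcpy(name, hdr, 16);        name[16] = 0;   // member name field
        memcpy(size_s, hdr + 48, 10); size_s[10] = 0; // decimal size field
        long size = strtol(size_s, 0, 10);
        printf("%-16s at offset %ld, %ld bytes\n", name, ftell(f), size);
        fseek(f, size + (size & 1), SEEK_CUR);  // members are 2-byte aligned
    }
    fclose(f);
    return 0;
}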

MfG
Goswin




Re: package pool and big Packages.gz file

2001-01-05 Thread Matthijs Melchior
Jason Gunthorpe wrote:
 

 Hint: Read the bug page for APT to discover why!
 


Looking through the apt bugs, I saw this one, rejected:

Bug#77054: wish: show current-upgraded versions on upgrade -u


My private solution to this is the following patch to `apt-get':


--- algorithms.cc-ORG   Sat May 13 06:08:43 2000
+++ algorithms.cc   Sat Sep  9 22:11:19 2000
@@ -47,9 +47,13 @@
 {
    // Adapt the iterator
    PkgIterator Pkg = Sim.FindPkg(iPkg.Name());
+   const char *oldver = Pkg->CurrentVer ? Pkg.CurrentVer().VerStr() : "-";
+   const char *newver = Pkg->VersionList ? Pkg.VersionList().VerStr() : "-";
+
    Flags[Pkg->ID] = 1;

-   cout << "Inst " << Pkg.Name();
+   cout << "Inst " << Pkg.Name() << " (" << oldver << " -> " << newver << ")";
+
    Sim.MarkInstall(Pkg,false);

    // Look for broken conflicts+predepends.


This informs me about versions when doing apt-get --no-act install package.

I like this very much, and would appreciate this going into the official
apt-get command.


-- 
Thanks,
  -o)
Matthijs Melchior   Maarssen  /\\
mailto:[EMAIL PROTECTED]  +31 346 570616Netherlands _\_v
 




package pool and big Packages.gz file

2001-01-04 Thread zhaoway
hi,

[i'm not sure if this has been resolved, lart me if you like.]

my proposal to resolve big Packages.gz is through package
pool system.

add 36 or so new Debian packages, namely,

[a-z0-9]-packages-gz_date_all.deb

contents of each is quite obvious. ;-)
and a virtual unstable-packages-gz depends on all of them. finished.

apt-get update should deal with it.

1) by default, install packages-gz.deb, and be done. (against some policy ...)
2) otherwise, let the user choose, that is a ui design issue... ;-)

release management could just ;-) upload a woody-packages-gz_test-1_all.deb

episode I finished.

episode II involves the package pool deletion algorithms.

a package should only be deleted when no *-packages-gz debs reference it.

my 2'c

thanks for bearing with me ;-)

zw




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
[read my previous semi-proposal]

this has some more benefits,

1) package maintainers could upload (to the pool) at whatever
frequency they like.

2) the release is separated from the package pool, which is a storage
system. the release is a qa system.

3) releases could be managed through the BTS on the specific packages-gz.deb.
that would surely put much more burden on the BTS, ;-)

4) if apt-get deals with it well, i hope all of the sub-mirroring issues
will go away easily. just apt-get install some-rel-packages-gz then
apt-get mirror (just like download and move ...)

my more 2'c ;-)

zw




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Fri, Jan 05, 2001 at 03:17:30AM +0800, zhaoway wrote:
 [read my previous semi-proposal]
 
 this has some more benefits,
 
 1) package maintainers could upload (to the pool) at whatever
 frequency they like.

in an ideal world, developers should upload to an ''xxx-auto-builder'' ;-)

i'm turning out to be crappy now. ;-)

bye,




Re: package pool and big Packages.gz file

2001-01-04 Thread Sami Haahtinen
On Fri, Jan 05, 2001 at 03:02:15AM +0800, zhaoway wrote:
 my proposal to resolve big Packages.gz is through package
 pool system.
 
 add 36 or so new debian package, namely,
 
 [a-z0-9]-packages-gz_date_all.deb
 
 contents of each is quite obvious. ;-)
 and a virtual unstable-packages-gz depends on all of them. finished.
 
 apt-get update should deal with it

how about diffs between dinstall runs?..

packages-010102-010103.gz
packages-010103-010104.gz
packages.gz

apt would download the changes after the last update and merge them into the
package file; if the file gets corrupted, it would attempt to do a full update.

This wouldn't be a big difference in the load that the master-ftp has to
handle, at least when some 7 of these would be stored at maximum.
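
For what it's worth, the client side of that would be small. Below is a
minimal sketch, assuming the daily diffs are ed-style scripts (what
"diff -e" produces -- no format is specified above); the program name is
made up:

// pkgs-patch.cc: merge one daily diff into a local Packages file.
// "diff -e" emits hunks bottom-up, so applying the commands in order
// keeps the earlier line numbers valid.
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Read the body of an a/c command, terminated by a lone ".".
static vector<string> readBody(istream &in)
{
   vector<string> body;
   string line;
   while (getline(in, line) && line != ".")
      body.push_back(line);
   return body;
}

int main(int argc, char **argv)
{
   if (argc != 3)
   { cerr << "usage: pkgs-patch Packages packages-diff\n"; return 1; }

   vector<string> text;
   string line;
   ifstream base(argv[1]);
   while (getline(base, line))
      text.push_back(line);

   ifstream diff(argv[2]);
   while (getline(diff, line))
   {
      // Commands look like "12,15c", "7d" or "3a".
      unsigned long from = 0, to = 0;
      char op = 0;
      if (sscanf(line.c_str(), "%lu,%lu%c", &from, &to, &op) != 3)
      {
         if (sscanf(line.c_str(), "%lu%c", &from, &op) != 2)
         { cerr << "bad command: " << line << "\n"; return 1; }
         to = from;
      }
      if (op == 'd' || op == 'c')      // drop the addressed lines
         text.erase(text.begin() + (from - 1), text.begin() + to);
      if (op == 'a' || op == 'c')      // insert the new lines
      {
         vector<string> body = readBody(diff);
         text.insert(text.begin() + (op == 'c' ? from - 1 : from),
                     body.begin(), body.end());
      }
   }
   for (unsigned long i = 0; i < text.size(); i++)
      cout << text[i] << "\n";
   return 0;
}

apt would run this once per stored diff, oldest first, and fall back to
fetching the whole Packages.gz if anything doesn't apply cleanly.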

Regards, Sami Haahtinen

-- 
every nerd knows how to enjoy the little things of life,
like: rm -rf windows




Re: package pool and big Packages.gz file

2001-01-04 Thread Vince Mulhollon

The only other possibility not yet proposed (?) would be to split the
Packages file by section:

base-packages
games-packages
x11-packages
net-packages

Then a server that just doesn't do x11 or doesn't do games has no need to
keep up with available x11 or games packages.
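
Producing those per-section files from the existing Packages file would
be trivial. A minimal sketch (the Packages.<section> output names are
made up):

// split-packages.cc: split a Packages file (stdin) into one file per
// Section, so a partial mirror only carries the sections it serves.
#include <fstream>
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main()
{
   map<string, ofstream *> out;
   string line, stanza, section = "unknown";
   bool more = true;
   while (more)
   {
      more = (bool)getline(cin, line);
      if (more && !line.empty())
      {
         stanza += line + "\n";
         if (line.compare(0, 9, "Section: ") == 0)
         {
            section = line.substr(9);
            // "contrib/games" etc. must map to a legal file name
            for (string::size_type i = 0; i < section.size(); i++)
               if (section[i] == '/')
                  section[i] = '_';
         }
         continue;
      }
      if (!stanza.empty())             // a blank line or EOF ends a stanza
      {
         if (out.count(section) == 0)
            out[section] = new ofstream(("Packages." + section).c_str());
         *out[section] << stanza << "\n";
         stanza.clear();
         section = "unknown";
      }
   }
   for (map<string, ofstream *>::iterator i = out.begin(); i != out.end(); ++i)
   { i->second->close(); delete i->second; }
   return 0;
}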





Re: package pool and big Packages.gz file

2001-01-04 Thread Sami Haahtinen
On Thu, Jan 04, 2001 at 03:07:00PM -0600, Vince Mulhollon wrote:
 
 The only other possibility not yet proposed (?) would be to split the
 packages file by section.
 
 base-packages
 games-packages
 x11-packages
 net-packages
 
 Then a server that just doesn't do x11 or doesn't do games has no need to
 keep up with available x11 or games packages.

how would the package manager (namely apt) know which ones you need? even if
you don't have X11 installed (and apt assumes you don't need the X11 packages
file), that doesn't mean that you wouldn't want to install the x11 packages file.

the same goes for net (which is a weird definition in any case) and games; base
is the only reasonable one. But without the others it's not needed either...

-- 
every nerd knows how to enjoy the little things of life,
like: rm -rf windows




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Thu, Jan 04, 2001 at 11:01:15PM +0200, Sami Haahtinen wrote:
 On Fri, Jan 05, 2001 at 03:02:15AM +0800, zhaoway wrote:
  my proposal to resolve big Packages.gz is through package
  pool system.
  
  add 36 or so new debian package, namely,
  
  [a-z0-9]-packages-gz_date_all.deb
  
  contents of each is quite obvious. ;-)
  and a virtual unstable-packages-gz depends on all of them. finished.
  
  apt-get update should deal with it
 
 how about diffs between dinstall runs?..

sorry, but i don't understand here. is dinstall a server-side thing?

 packages-010102-010103.gz
 packages-010103-010104.gz
 packages.gz
 
 apt would download the changes after the last update and merge them into the
 package file; if the file gets corrupted, it would attempt to do a full
 update.
 
 This wouldn't be a big difference in the load that the master-ftp has to
 handle, at least when some 7 of these would be stored at maximum.

okay, try to group packages according to dependency,

at the top, some pkg-gz-deb lists packages at the leaves of the dependency
tree, and each pkg-gz-deb won't get bigger than 100k, and each of them depends
on some more basic pkg-gz-deb

below, some other pkg-gz-deb like the base sub-system.

this way, when a user installs xdm,
apt-get first installs the pkg-gz-deb which lists xdm, then through dependency
checking it will install base-a-pkg-gz-deb etc. etc., then xdm gets installed.
this way, all of xdm's dependencies will be fulfilled with the newest
information available.

and you can see this will surely ease up the bandwidth. (when updating gcc, i
won't get additional bits of Packages.gz about xdm, xfree etc.)

regards,

zw




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Thu, Jan 04, 2001 at 11:19:59PM +0200, Sami Haahtinen wrote:
 how would the package manager (namely apt) know which ones you need? even if
 you don't have X11 installed (and apt assumes you don't need the X11 packages
 file), that doesn't mean that you wouldn't want to install the x11 packages file.

another solution is to let every single deb provide its own .pkg-gz

then, apt-get update will do nothing,
apt-get install some.deb will first download some.pkg-gz, then check its
dependencies,
then grab all of their .pkg-gz, then install.

and a virtual release-pkgs-gz.deb will depend on some selected part of those
.pkg-gz files to make up a release.

then katie will remove a package only when no release-pkgs-gz.deb (or
testing, or whatever) depends on its .pkg-gz

regards,
zw




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Fri, Jan 05, 2001 at 06:07:20AM +0800, zhaoway wrote:
 another solution is to let every single deb provide its own .pkg-gz
 
 then, apt-get update will do nothing,
 apt-get install some.deb will first download some.pkg-gz, then check its
 dependencies,
 then grab all of their .pkg-gz, then install.

that is a minimum, isn't it? ;)
and then we will need some ``apt-get info pkg'' hehe..

 and a virtual release-pkgs-gz.deb will depend on some selected part of those
 .pkg-gz files to make up a release.

say one release contains 2000 pkgs, each pkg name is 10 chars, then the whole
information is a little more than 20k, compared with the more than 1M of today.

and you could have base-3.3-release, and gnome-4.4-release which depends on
base-3.2-release and x-5.6-release. and chinese-2.0-release etc. ...

 then katie will remove a package only when no release-pkgs-gz.deb (or
 testing, or whatever) depends on its .pkg-gz

zw




Re: package pool and big Packages.gz file

2001-01-04 Thread Petr Cech
On Fri, Jan 05, 2001 at 06:07:20AM +0800 , zhaoway wrote:
 On Thu, Jan 04, 2001 at 11:19:59PM +0200, Sami Haahtinen wrote:
  how would the package manager (namely apt) know which ones you need? even if
  you don't have X11 installed (and apt assumes you don't need the X11 packages
  file), that doesn't mean that you wouldn't want to install the x11 packages file.
 
 another solution is to let every single deb provide its own .pkg-gz
 
 then, apt-get update will do nothing,
 apt-get install some.deb will first download some.pkg-gz, then check its
 dependencies,
 then grab all of their .pkg-gz, then install.

but it will immensely restrict its view of dependencies - think about
virtual packages. This is really not the way. Maybe splitting by letter, as in
pool/, so you only download the changed part of the whole thing. But that's
about it. Maybe you can leave some part out, but ..

Petr Cech
-- 
Debian GNU/Linux maintainer - www.debian.{org,cz}
   [EMAIL PROTECTED]

* Joy notes some people think Unix is a misspelling of Unics which is a 
misspelling of Emacs :)




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
[quote myself, ;-) this is semi-final now ;-)]

another solution is to let every single deb provide its own .pkg-gz

then, apt-get update will do nothing,
apt-get install some.deb will first download some.pkg-gz, then check its
dependencies,
then grab all of their .pkg-gz, then install.

that is a minimum, isn't it? ;)
and then we will need some ``apt-get info pkg'' hehe..

and a virtual release-pkgs-gz.deb will depend on some selected part of those
.pkg-gz files to make up a release.

say one release contains 2000 pkgs, each pkg name is 10 chars, then the whole
information is a little more than 20k, compared with the more than 1M of today.

and you could still do ``apt-get dist-upgrade'', just first install
release-pkgs-gz.deb then go on..., OR, first get a list of all installed debs
then update each of them. [some more thoughts here..., later]

and you could have base-3.3-release, and gnome-4.4-release which depends on
base-3.2-release and x-5.6-release. and chinese-2.0-release etc. ...

then katie will remove a package only when no release-pkgs-gz.deb (or
testing, or whatever) depends on its .pkg-gz

zw




Re: package pool and big Packages.gz file

2001-01-04 Thread zhaoway
On Thu, Jan 04, 2001 at 11:19:25PM +0100, Petr Cech wrote:
 On Fri, Jan 05, 2001 at 06:07:20AM +0800 , zhaoway wrote:
  then, apt-get update will do nothing,
  apt-get install some.deb will first download some.pkg-gz, then check its 
  dependency,
  then grab them.pkg-gz all, then install.
 
 but it will immensely restrict its view of dependencies - think about
 virtual packages. This is really not the way. Maybe splitting by letter, as in
 pool/, so you only download the changed part of the whole thing. But that's
 about it. Maybe you can leave some part out, but ..

virtual packages are weird here. ;-)
but could be resolved by some-virtual.pkg-gz ;-)

and the tree view of the dependency tree, like in console-apt, that means,
[see my other semi-final mail.. ;-)] in general, if you want a tree view
of the whole tree, you will need to download the whole tree anyway, and my
approach won't prevent you from doing that. ;-)

kinda regards,
zw




[FINAL, for now ;-)] (Was: Re: package pool and big Packages.gz file)

2001-01-04 Thread zhaoway
final thoughts ;-)


On bigger and bigger Packages.gz file, a try


The directory structure looks roughly like this:

debian/dists/woody/main/binary-all/Packages.deb

debian/pool/main/a/abba/abba_1989.orig.tar.gz
                        abba_1989-12.diff.gz
                        abba_1989-12.dsc
                        abba_1989-12_all.deb
                        abba_1989-12_all.pkg

debian/pool/main/r/rel-chinese/rel-music_0.9_all.pkg
                               rel-music_0.9_all.deb
debian/pool/main/r/rel-base/rel-base_200_all.pkg
                            rel-base_200_all.deb


The contents of rel-music_0.9_all.pkg are as follows. rel-base
or even rel-woody would just be much more complicated. Hope so. rel-music.deb
is nearly an empty package.

Package: rel-music
Priority: optional
Section: misc
Installed-Size: 12
Maintainer: Anthony and Cleopatra
Architecture: all
Source: rel-chinese
Version: 0.9
Depends: rel-base (>= 200), abba (>= 1989-12), beatles (>= 1979-100),
 garbage (>= 1998-7), wearied-ear (>= 2.1)
Provides: music | abba | beatles
Filename: debian/pool/main/r/rel-chinese/rel-music_0.9_all.deb
Size: 3492
MD5sum: c8c730ea650cf14638d83d6bb7707cdb
Description: Simplified music environment
 This 'task package' installs programs, data files, fonts, and
 documentation that makes it easier to use Debian for
 Simplified music related operations. (Surprise, surprise, garbage
 didn't provide music!)


Note, music is a virtual package provided by abba and beatles.

The contents of abba_1989-12_all.pkg are as follows.

Package: abba
Priority: optional
Section: sound
Installed-Size: 140
Maintainer: Old Man Billy
Architecture: all
Version: 1989-12
Replaces: beatles
Provides: music
Depends: wearied-ear (>= 2.0)
Filename: pool/main/a/abba/abba_1989-12_all.deb
Size: 33256
MD5sum: e07899b62b7ad12c545e9998adb7c8d7
Description: A Swedish Music Band
 ABBA was popular in the 1980s, in the last millennium. Don't confuse ABBA
 with ADDA, which is a heavy metal band.


Here, music is a virtual package provided by packages abba and beatles.


Let's simulate some typical scenarios here.

1) apt-get update

There are roughly two purposes for this action. One is to get an
overview, to ease further processing like virtual packages; the other
purpose is to install a specific package, or do a dist-upgrade.

For the second purpose, apt-get here will do nothing. (See below)

For the first purpose, apt-get will have to download and parse the
current distribution's .pkg file according to the user's configuration.
Say, download rel-music, and then see that the virtual package music
is provided by abba and beatles.

So, generally, ``apt-get update'' will deal with rel-some__all.pkg
to get all of the overall information it will need further on.

Then, where does rel-some__all.pkg get its information? We don't
want the release manager to track down all of this information. So, where's
katie? ;-) I think the trade-off is worthwhile (indeed, only katie gets a
little more complicated) considering the scalability gained. Read on.


2) apt-get install abba

apt-get will first parse the previously downloaded rel-music.pkg, and learn
that abba is at version 1989-12 and depends on wearied-ear (>= 2.0), and wow!
rel-music happens to provide wearied-ear (>= 2.1), so that's okay. Then apt-get
goes on to download abba's .pkg and parse it, and so on.

When all required .pkg files have been downloaded and parsed (an updated
Packages.gz), apt-get then goes on to download and install each of the debs.

(Maybe there will be more complicated issues; if so, let me know. See what
comes up. ;-)

Thus, minimum data downloaded. ;-)
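
The resolution loop in 2) is simple enough to sketch. My illustration
only: it reads <name>.pkg files from the current directory where a real
client would fetch them from the pool, and it ignores version
constraints, alternatives and multi-line Depends fields:

// resolve.cc: starting from one package, fetch .pkg stanzas and follow
// Depends recursively, collecting the set that would be installed.
#include <fstream>
#include <iostream>
#include <set>
#include <sstream>
#include <string>
using namespace std;

static set<string> seen;

static void resolve(const string &name)
{
   if (!seen.insert(name).second)
      return;                          // already handled
   ifstream pkg((name + ".pkg").c_str());
   if (!pkg)
   { cerr << name << ".pkg not found\n"; return; }
   string line;
   while (getline(pkg, line))
   {
      if (line.compare(0, 9, "Depends: ") != 0)
         continue;
      // Split "a (>= 1), b, c" on commas; the first word of each
      // element is the package name, the version constraint is dropped.
      stringstream deps(line.substr(9));
      string dep;
      while (getline(deps, dep, ','))
      {
         stringstream field(dep);
         string depname;
         field >> depname;
         if (!depname.empty())
            resolve(depname);
      }
   }
}

int main(int argc, char **argv)
{
   for (int i = 1; i < argc; i++)
      resolve(argv[i]);
   cout << "would download and install:";
   for (set<string>::iterator i = seen.begin(); i != seen.end(); ++i)
      cout << " " << *i;
   cout << "\n";
   return 0;
}

Run as "resolve abba": it loads abba.pkg, sees wearied-ear in Depends,
loads wearied-ear.pkg, and so on -- only the stanzas actually needed.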


3) apt-get dist-upgrade

I don't know the details, but I think it's not very complicated given the
above information. (All the necessary things are there, aren't they? ;-)


4) Package uploads

The .pkg file is generated automatically. No extra burden on most of the
developers. And developers could upload just as frequently as they see
fit. ;-)

Katie will be a little ;-) more complicated.

Packages will get deleted from the package pool only when no rel-X
depends on them. rel-X packages are treated specially.
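
That deletion rule is just reachability. A minimal sketch with made-up
inputs: one file listing every Filename: referenced by any rel-X
closure, one listing what is actually in the pool:

// pool-sweep.cc: print the pool files no release references any more.
#include <fstream>
#include <iostream>
#include <set>
#include <string>
using namespace std;

// Load one file name per line into a set.
static set<string> load(const char *path)
{
   set<string> s;
   ifstream in(path);
   string line;
   while (getline(in, line))
      s.insert(line);
   return s;
}

int main(int argc, char **argv)
{
   if (argc != 3)
   { cerr << "usage: pool-sweep referenced-files pool-files\n"; return 1; }
   set<string> referenced = load(argv[1]);
   set<string> pool = load(argv[2]);
   // Whatever no rel-X reaches is garbage and may be removed.
   for (set<string>::iterator i = pool.begin(); i != pool.end(); ++i)
      if (referenced.count(*i) == 0)
         cout << "can delete: " << *i << "\n";
   return 0;
}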

And some fine-tuned mirrors could be set up.

And release management could benefit from the Bug Tracking System and be
more flexible. IMHO. ;-)


Kind regards,
zw




Re: package pool and big Packages.gz file

2001-01-04 Thread Goswin Brederlow
   == zhaoway  [EMAIL PROTECTED] writes:

  hi, [i'm not sure if this has been resolved, lart me if you
  like.]

  my proposal to resolve big Packages.gz is through package pool
  system.


What's the problem with a big Packages file?

If you don't want to download it again and again just because of small
changes, I have a better solution for you:

rsync

apt-get update could rsync all Packages files (yes, not the .gz ones)
and thereby download only the changed parts. On uncompressed files rsync
is very effective, and the changes can be compressed for the actual
transfer. So on update you will practically get a diff.gz against your old
Packages file.
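
The reason rsync does so well on the uncompressed file is its weak
rolling checksum: it can be slid along one byte at a time, so unchanged
blocks are found at any offset even after insertions shift everything
below them. A simplified version (modelled on rsync's, not copied from
it):

// rollsum.cc: demonstrate a weak rolling checksum over a fixed window.
#include <cstdio>
#include <string>

struct RollSum
{
   unsigned a, b, len;
   void init(const unsigned char *p, unsigned n)
   {
      a = b = 0; len = n;
      for (unsigned i = 0; i < n; i++)
      { a += p[i]; b += (n - i) * p[i]; }
   }
   // Slide the window one byte: drop 'out' at the front, take 'in' at
   // the end. O(1), which is what makes searching at every offset cheap.
   void roll(unsigned char out, unsigned char in)
   {
      a += in - out;
      b += a - len * out;
   }
   unsigned digest() const { return (a & 0xffff) | (b << 16); }
};

int main()
{
   std::string data = "Package: abba\nVersion: 1989-12\n";
   const unsigned BLOCK = 16;
   RollSum rs;
   rs.init((const unsigned char *)data.data(), BLOCK);
   for (unsigned i = 0; i + BLOCK < data.size(); i++)
   {
      printf("offset %2u checksum %08x\n", i, rs.digest());
      rs.roll(data[i], data[i + BLOCK]);
   }
   return 0;
}

In rsync the receiver checksums each block of its old file and sends
those; the sender slides this window over the new file and only
transmits the bytes for which no block matches.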

If that suits your needs, feel free to file a bug report against apt about
this.

MfG
Goswin