Re: RFDisscusion: Big Packages.gz and Statistics and Comparing solution

2001-01-08 Thread zhaoway
On Mon, Jan 08, 2001 at 12:53:52AM +0100, Marcin Owsiany wrote:
> Something like this should be implemented anyway, since once
> translated Descriptions are supported the Packages file
> will grow roughly sixfold.

Oh, man, you've got another strong point against a general package
index. (A big Packages.gz could get overwhelmingly big. hehe.. ;)

-- 
echo < */
EOF




Re: RFDisscusion: Big Packages.gz and Statistics and Comparing solution

2001-01-07 Thread Marcin Owsiany
On Sun, Jan 07, 2001 at 06:30:33PM -0500, Thomas Smith wrote:
> * Keep old Packages.gz file with Descriptions.
> * Make new Small-Packages.gz file w/o Descriptions, and have
>   new version of apt look for it, if so configured.
> * Some method of getting the descriptions separately.  Maybe
>   Descriptions.gz or maybe per-package or whatever.
> * Perhaps merge Descriptions (if they're downloaded), or put
>   placeholders (Description: ), into files
>   in /var/state/apt/lists/ so there's no compatibility break
>   in those.

Something like this should be implemented anyway, since once
translated Descriptions are supported the Packages file
will grow roughly sixfold.

Marcin
-- 
Marcin Owsiany <[EMAIL PROTECTED]>
http://student.uci.agh.edu.pl/~porridge/
GnuPG: 1024D/60F41216 FE67 DA2D 0ACA FC5E 3F75  D6F6 3A0D 8AA0 60F4 1216




Re: RFDisscusion: Big Packages.gz and Statistics and Comparing solution

2001-01-07 Thread Thomas Smith
hi.

On Sun, Jan 07, 2001 at 11:42:04PM +0800, zhaoway wrote:

> 3. Additional benefits
> 
> Separating changelog.Debian and `Description:' etc. out into a meta-info
> file could help users: 1) reduce the bandwidth eaten, 2) make their upgrade
> decisions more easily.
I like this idea.  There are some ways to allow the
non-Description download without breaking compatibility.
Here's one:

* Keep old Packages.gz file with Descriptions.
* Make new Small-Packages.gz file w/o Descriptions, and have
  new version of apt look for it, if so configured.
* Some method of getting the descriptions separately.  Maybe
  Descriptions.gz or maybe per-package or whatever.
* Perhaps merge Descriptions (if they're downloaded), or put
  placeholders (Description: ), into files
  in /var/state/apt/lists/ so there's no compatibility break
  in those.
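
A minimal sketch of the last point in Python, assuming the small index
is a text file of blank-line-separated stanzas and that descriptions
arrive separately as a package-to-text mapping. The function names and
the placeholder text are hypothetical, and this is not how apt itself
stores its lists:

```python
def package_name(stanza):
    """Return the value of the Package field of one stanza."""
    for line in stanza.splitlines():
        if line.startswith("Package:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("stanza has no Package field")


def merge_descriptions(small_index, descriptions,
                       placeholder="(description not downloaded)"):
    """Add a Description field to every stanza of a description-less index.

    Packages without a downloaded description get the placeholder, so
    readers that expect the field still find one.
    """
    merged = []
    for stanza in small_index.strip().split("\n\n"):
        text = descriptions.get(package_name(stanza), placeholder)
        merged.append(stanza + "\nDescription: " + text)
    return "\n\n".join(merged) + "\n"


# Made-up sample data, only to exercise the function.
small_index = "Package: hello\nVersion: 1.0\n\nPackage: world\nVersion: 2.0\n"
descriptions = {"hello": "prints a greeting"}
merged = merge_descriptions(small_index, descriptions)
print(merged)
```

The merged text keeps the stanza layout of the old file, which is the
point of the placeholder idea: nothing that reads the lists has to learn
a new format.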

have a nice day,
 thomas
-- 
Thomas Smith <[EMAIL PROTECTED]>  
http://finbar.dyndns.org/ 
gpg key id 0xACABA81E, fingerprint:
3A47 CFA5 0E5D CF4A 5B22  12D3 FF1B 84FE ACAB A81E





Re: RFDisscusion: Big Packages.gz and Statistics and Comparing solution

2001-01-07 Thread Colin Watson
Goswin Brederlow <[EMAIL PROTECTED]> wrote:
>Also think of the benefit when updating. With some extra code on the
>client side (for example in apt) a pseudo deb can be created from the
>installed version and then rsynced against the new version.

Coo, yes, and you don't even need that much extra code: check in
/var/cache/apt/archives, and otherwise dpkg-repack. That would be nice.
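
A rough sketch of that lookup in Python, assuming cached archives use
the usual <package>_<version>_<arch>.deb naming. The function names are
hypothetical, and the dpkg-repack command is only returned, not run:

```python
import glob
import os


def find_cached_deb(package, version, cache_dir="/var/cache/apt/archives"):
    """Return the path of a cached .deb for this package and version, or None."""
    # Assumes the usual <package>_<version>_<arch>.deb file naming.
    pattern = os.path.join(
        glob.escape(cache_dir),
        glob.escape(f"{package}_{version}_") + "*.deb",
    )
    matches = sorted(glob.glob(pattern))
    return matches[0] if matches else None


def old_deb_plan(package, version, cache_dir="/var/cache/apt/archives"):
    """Decide where the 'old' side of the rsync would come from."""
    cached = find_cached_deb(package, version, cache_dir)
    if cached is not None:
        return ("cache", cached)
    # Fall back to rebuilding a .deb from the installed files.
    # The command is only returned here, not executed.
    return ("repack", ["dpkg-repack", package])
```

Whatever file this yields would then serve as the basis that rsync
compares against the new version on the server.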

-- 
Colin Watson [EMAIL PROTECTED]




Re: RFDisscusion: Big Packages.gz and Statistics and Comparing solution

2001-01-07 Thread Goswin Brederlow
> " " == zhaoway  <[EMAIL PROTECTED]> writes:

 > [A quick reply. And thanks for discussing this with me! And no
 > need to Cc: me anymore, I updated my DB info.]

 > On Sun, Jan 07, 2001 at 05:51:26PM +0100, Goswin Brederlow
 > wrote:
>> The problem is that people want to browse descriptions to find
>> a package fairly often or just run "apt-cache show package" to
>> see what a package is about. So you need a method to download
>> all descriptions.

 > The big Packages.gz is still there. No conflict between the two
 > methods.  And the newest, most updated information is always on
 > freshmeat.net. ;)

>> As far as I see there's no server support needed for rsync
>> support to operate better on compressed files.

 > Um, I don't know. But doesn't RSYNC need a server side RSYNC to
 > run?  Or, can I expect a HTTP server to provide RSYNC? (Maybe I
 > am stupid, I'll read RSYNC man page, later.)

Yes, either rsyncd or rshd/sshd needs to be running. But that's
already the case.

What I meant was that the new feature to uncompress archives before
rsyncing can (hopefully) be done without any changes to existing
servers and without unpacking on the server side. All old servers
should do fine. That's what I aim to achieve.

>> If you update often, saving 1 Byte every time is worth it. If
>> you update seldom, it doesn't really matter that you download
>> a big Packages.gz. You would have to download all the small
>> Packages.gz files also.

 > There is an approach to help this. But that is another
 > story. Later.

>> So you see, between potato and woody diff saves about 60%.
>> Also note that rsync usually performs better than cvs, since it
>> does not include the to be removed lines in the download.

 > A pretty sound argument. My only criticism of DIFF or RSYNC now
 > is just server support. (Again, I'll read RSYNC man page
 > later. ;-)

 > The point is, can a storage server which provides merely HTTP
 > and/or FTP service do the job for apt-get?

Nope, but rsync servers already exist. Time to push people to convert
their services by pushing the users to use them.

Also think of the benefit when updating. With some extra code on the
client side (for example in apt) a pseudo deb can be created from the
installed version and then rsynced against the new version. You
wouldn't need a local mirror and you still save a lot of download.

Of course this all needs support to rsync compressed archives
uncompressed in the rsync client.

MfG
Goswin




Re: RFDisscusion: Big Packages.gz and Statistics and Comparing solution

2001-01-07 Thread zhaoway
[A quick reply. And thanks for discussing this with me! And no need to
Cc: me anymore, I updated my DB info.]

On Sun, Jan 07, 2001 at 05:51:26PM +0100, Goswin Brederlow wrote:
> The problem is that people want to browse descriptions to find a
> package fairly often or just run "apt-cache show package" to see what
> a package is about. So you need a method to download all descriptions.

The big Packages.gz is still there. No conflict between the two methods.
And the newest, most updated information is always on freshmeat.net. ;)

> As far as I see there's no server support needed for rsync support to
> operate better on compressed files.

Um, I don't know. But doesn't RSYNC need a server side RSYNC to run?
Or, can I expect a HTTP server to provide RSYNC? (Maybe I am stupid,
I'll read RSYNC man page, later.)

> If you update often, saving 1 Byte every time is worth it. If you
> update seldom, it doesn't really matter that you download a big
> Packages.gz. You would have to download all the small Packages.gz
> files also.

There is an approach to help this. But that is another story. Later.

> So you see, between potato and woody diff saves about 60%.
> Also note that rsync usually performs better than cvs, since it does
> not include the to be removed lines in the download.

A pretty sound argument. My only criticism of DIFF or RSYNC now is just
server support. (Again, I'll read RSYNC man page later. ;-)

The point is, can a storage server which provides merely HTTP and/or
FTP service do the job for apt-get?

-- 
echo < */
EOF




Re: RFDisscusion: Big Packages.gz and Statistics and Comparing solution

2001-01-07 Thread Goswin Brederlow
> " " == zhaoway  <[EMAIL PROTECTED]> writes:

 > Hi, [Sorry for breaking the thread, my POP3 provider stopped.]
 > [Please Cc: me! <[EMAIL PROTECTED]>. Sorry! ;-)]

 > 1. RFDiscussion on big Packages.gz

 > 1.1. Some statistics

 > % grep-dctrl -P \
 >     -sPackage,Priority,Installed-Size,Version,Depends,Provides,Conflicts,Filename,Size,MD5sum \
 >     -r '.*' ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages \
 >     | gzip -9 > test.pkg.gz
 > % gzip -9 ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages
 > % ls -alF *.gz
 > -rw-r--r--  1 zw  zw  1157494 Jan  7 21:20 ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages.gz
 > -rw-r--r--  1 zw  zw   341407 Jan  7 21:23 test.pkg.gz
 > %

Ahh, what does it do? Just take out the descriptions?

 > This approach is simple and straightforward and almost
 > compatible, and it could accept 10K more packages coming into
 > Debian with little loss. Worth consideration. IMHO.

 > Better, if `Description:' etc. could come in a separate gzipped
 > file along with the Debian package.

The problem is that people want to browse descriptions to find a
package fairly often or just run "apt-cache show package" to see what
a package is about. So you need a method to download all descriptions.

Also many small files compress far less than one big file.
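
A small Python sketch of that effect, using made-up stanzas rather than
a real Packages file. Each separately compressed file pays the gzip
header and trailer again and cannot reuse strings seen in its
neighbours:

```python
import gzip

# Hypothetical sample data: 200 similar stanzas, not a real Packages file.
stanzas = [
    f"Package: pkg{i}\nPriority: optional\nSection: misc\n"
    f"Description: example package number {i}\n"
    for i in range(200)
]

# One archive holding everything versus one archive per stanza.
one_big = len(gzip.compress("\n".join(stanzas).encode("utf-8")))
many_small = sum(len(gzip.compress(s.encode("utf-8"))) for s in stanzas)

print("one big file:     ", one_big, "bytes")
print("many small files: ", many_small, "bytes in total")
```

How large the gap is on real per-package files would depend on how much
text each one carries; the sketch only shows the direction.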


 > 2. Compare with DIFF and RSYNC method of APT

 > 2.1. They need server support. (More than a directory layout
 > and client tool changing.)

As far as I see there's no server support needed for rsync support to
operate better on compressed files.

 > 2.2. If you don't update for a long time, DIFF won't
 > help. RSYNC helps less.

If you update often, saving 1 Byte every time is worth it. If you
update seldom, it doesn't really matter that you download a big
Packages.gz. You would have to download all the small Packages.gz
files also.

And after that you download 500 MB of updates. So who cares about 2MB
packages.gz?

Also, diff and rsync do a great job even after a long time:

% diff potato_Packages woody_Packages | gzip -9 | wc --bytes
 339831

% ls -l /debian/dists/woody/main/binary-i386/Packages.gz
-rw-r--r--  1 mrvn mrvn   955259 Jan  6 21:03 /debian/dists/woody/main/binary-i386/Packages.gz

So you see, between potato and woody diff saves about 60%.
Also note that rsync usually performs better than cvs, since it does
not include the to be removed lines in the download.
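
The saving can be checked from the two byte counts quoted above. A short
Python calculation (the variable names are only for illustration):

```python
diff_gz = 339831  # bytes: gzip -9 of the potato -> woody diff
full_gz = 955259  # bytes: woody Packages.gz

# Fraction of the download avoided by fetching the diff instead.
saving = 1 - diff_gz / full_gz
print(f"diff is {diff_gz / full_gz:.1%} of the full file, saving {saving:.1%}")
```

That comes out a little above the "about 60%" figure, at roughly 64%.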

 > 3. Additional benefits

 > Separating changelog.Debian and `Description:' etc. out into
 > a meta-info file could help users: 1) reduce the bandwidth eaten,
 > 2) make their upgrade decisions more easily.

A global Description.gz might benefit from the fact that the
description doesn't change for each update, but the extra work needed
for this to really work is not worth it. It would only benefit people
that do daily mirroring, where rsync would do just as well.

MfG
Goswin




RFDisscusion: Big Packages.gz and Statistics and Comparing solution

2001-01-07 Thread zhaoway
Hi,

[Sorry for breaking the thread, my POP3 provider stopped.]
[Please Cc: me! <[EMAIL PROTECTED]>. Sorry! ;-)]

1. RFDiscussion on big Packages.gz

1.1. Some statistics

% grep-dctrl -P \
    -sPackage,Priority,Installed-Size,Version,Depends,Provides,Conflicts,Filename,Size,MD5sum \
    -r '.*' ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages \
    | gzip -9 > test.pkg.gz
% gzip -9 ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages
% ls -alF *.gz
-rw-r--r--  1 zw  zw  1157494 Jan  7 21:20 ftp.jp.debian.org_debian_dists_unstable_main_binary-i386_Packages.gz
-rw-r--r--  1 zw  zw   341407 Jan  7 21:23 test.pkg.gz
%

This approach is simple and straightforward and almost compatible, and it
could accept 10K more packages coming into Debian with little loss. Worth
consideration. IMHO.
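
For readers without grep-dctrl, the same measurement can be approximated
in Python. This is only a sketch: it assumes stanzas are separated by
blank lines and that continuation lines start with whitespace, the
function names are hypothetical, and the sample stanza is made up:

```python
import gzip

# Fields kept by the -s option in the transcript above.
KEEP = ("Package", "Priority", "Installed-Size", "Version", "Depends",
        "Provides", "Conflicts", "Filename", "Size", "MD5sum")


def keep_fields(packages_text, keep=KEEP):
    """Return the control data with only the listed fields in each stanza."""
    stanzas = []
    for stanza in packages_text.strip().split("\n\n"):
        lines, keeping = [], False
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t"):
                # Continuation line: belongs to the field seen last.
                if keeping:
                    lines.append(line)
            else:
                keeping = line.split(":", 1)[0] in keep
                if keeping:
                    lines.append(line)
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n"


def gzip_size(text):
    """Size in bytes after gzip at the highest level (like gzip -9)."""
    return len(gzip.compress(text.encode("utf-8"), compresslevel=9))


# Made-up sample stanza, only to exercise the functions.
sample = (
    "Package: hello\n"
    "Version: 1.0\n"
    "Description: short summary\n"
    " A longer description that spans\n"
    " more than one line.\n"
    "Size: 1234\n"
)
slim = keep_fields(sample)
print(gzip_size(sample), gzip_size(slim))
```

Run on a real Packages file, gzip_size() of the original and of the
keep_fields() output should correspond to the two sizes listed by ls
above.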

Better, if `Description:' etc. could come in a separate gzipped file along
with the Debian package.

1.2. Little math

Suppose: 1) Site A gets K hits of `apt-get update' per day. With every day
            that passes, M extra hits are added, as Debian gets more popular.
         2) N new packages come into Debian every day. After `gzip -9',
            each contributes 206 bytes to the old package index file, and
            61 to the new format index file. The current package number is P.
         3) Days passed is the X axis.
         4) B is the byte size of the data flow for `apt-get update' for
            that day, on the server side. (Client side: K = 1, M = 0.)

  B = (K + M*X) * (P + N*X) * 206   is for old format package index
  B = (K + M*X) * (P + N*X) * 61    is for new format package index
  
[It's still an X^2 function, anyway, so it's, in theory, not a big deal. ;-)]
[Only if we could eliminate the need for Package Index. That is possible. ]

  For K = 500, P = 6000, X = 0, Server side B is,
  [EMAIL PROTECTED] ~/tmp % echo $((6000*500*206))
  618000000
  [EMAIL PROTECTED] ~/tmp % echo $((6000*500*61))
  183000000
  [EMAIL PROTECTED] ~/tmp % 
  
[Though the caches could help a great lot for servers in such cases.]
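
The two formulas can be wrapped in one small Python function to try
other values. The function name is hypothetical, and the M and N values
in the last lines are invented purely to show the growth term:

```python
def daily_bytes(x, k, m, p, n, bytes_per_package):
    """B = (K + M*X) * (P + N*X) * bytes_per_package, as defined in 1.2."""
    return (k + m * x) * (p + n * x) * bytes_per_package


OLD_FORMAT = 206  # bytes per package after gzip -9, old index
NEW_FORMAT = 61   # bytes per package after gzip -9, new index

# K = 500, P = 6000, X = 0: M and N drop out of the product.
old_today = daily_bytes(0, k=500, m=0, p=6000, n=0, bytes_per_package=OLD_FORMAT)
new_today = daily_bytes(0, k=500, m=0, p=6000, n=0, bytes_per_package=NEW_FORMAT)
print(old_today)  # 618000000
print(new_today)  # 183000000

# Invented growth rates (M = 1 hit/day, N = 5 packages/day), one year on.
print(daily_bytes(365, k=500, m=1, p=6000, n=5, bytes_per_package=OLD_FORMAT))
print(daily_bytes(365, k=500, m=1, p=6000, n=5, bytes_per_package=NEW_FORMAT))
```

Because both formats share the same (K + M*X) * (P + N*X) factor, the
ratio between them stays at 61/206 for every X; only the absolute
traffic changes.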
 
2. Compare with DIFF and RSYNC method of APT

2.1. They need server support. (More than a directory layout and client tool
 changing.)

2.2. If you don't update for a long time, DIFF won't help. RSYNC helps less.

3. Additional benefits

Separating changelog.Debian and `Description:' etc. out into a meta-info
file could help users: 1) reduce the bandwidth eaten, 2) make their upgrade
decisions more easily.

-- 
echo < */
EOF