Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-10 Thread Andrew Lenharth
  No, this won't work with very many compression algorithms.  Most
  algorithms update their dictionaries/probability tables dynamically based
  on input.  There isn't just one static table that could be used for
  another file, since the table is automatically updated after every (or
  near every) transmitted or decoded symbol.  Further, the algorithms start
  with blank tables on both ends (compression and decompression); the
  algorithm doesn't transmit the tables (which can be quite large for
  higher-order statistical models).
  
 Well, the table is perfectly static when the compression ends. Even if
 the table isn't transmitted itself, its information is contained in the
 compressed file; otherwise the file couldn't be decompressed at all.

But the tables you have at the end of the compression are NOT what you
want to use for the entire process.  The point of a dynamic table is to
allow the probabilities of different symbols to change dynamically as the
compression happens.  The tables used by the end of the file may be very
different from those used early in the file, to the point where they are
useless for the early part of the file.

Without fear of sounding redundant, EACH symbol is encoded with a
different set of tables.  That is, the probability tables or dictionaries
are different for EACH and EVERY character of the file.  And as I said
before, the tables from later in the compression (which you propose
using all the time) will not even generate the same compressed file as the
one they are based on, nor will they be anywhere near optimal for the file.
That is, gzip --compress-like=foo.gz foo would generate an entirely
different foo.gz than gzip foo would.
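
To make that concrete, here is a toy sketch in Python of an order-0
adaptive model (nothing like gzip's real code; the helper and the example
input are invented purely for illustration): the table each symbol is
coded with is built from everything seen before it, so the table left
over at the end of a run is not the table any particular symbol was
actually coded with.

from collections import Counter

def adaptive_tables(data: bytes):
    # Toy order-0 adaptive model: yield the table in effect for each symbol.
    # Coder and decoder both start with an empty table and update it after
    # every symbol, in lock-step, so the table never has to be transmitted.
    table = Counter()
    for symbol in data:
        yield symbol, dict(table)   # the table this symbol would be coded with
        table[symbol] += 1          # updated after (nearly) every symbol

# The first byte of "abracadabra" sees an empty table; the last byte sees
# a table that already reflects the ten bytes before it.
for sym, tbl in adaptive_tables(b"abracadabra"):
    print(chr(sym), tbl)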

I really suggest you investigate LZW-based algorithms.  You would find
they do not behave as you think.  Only incredibly simple static
compression algorithms have the properties you desire.

Andrew Lenharth




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-10 Thread Otto Wyss
  gzip --compress-like=old-foo foo

gzip creates a dictionary (that gets really large) of strings that are
used and encodes references to them. At the start the dictionary is
empty, so the first char is pretty much unencoded and inserted into
the dictionary. The next char is encoded using the first one and so
on. That way longer and longer strings enter the dictionary.

...

So, as you see, that method is not possible.

Okay, let's assume gzip somehow knows the table of the previous compression.
It starts compressing as usual, getting the first value to encode. Now,
instead of just putting it at the first position, it looks in the old
table and finds it at position 3. Gzip puts it at position 3, leaving the
first two unused. Now everything goes fine until a value isn't found. This
time gzip appends it to the end of the table. Of course, at a certain
point these two tables diverge too much, so gzip starts using a new table.
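
A rough sketch of that idea in Python (a toy dictionary coder, not gzip;
the function name and the divergence threshold are invented for
illustration) could look like this:

def encode_with_seed(tokens, old_table, max_misses=64):
    # Seed the dictionary with the positions from a previous compression so
    # that unchanged input encodes to the same references.  Unknown symbols
    # are appended at the end; if too many appear, the tables have diverged
    # too much and we start over with a fresh table (a real format would
    # have to signal that reset in the output stream).
    table = dict(old_table)
    next_pos = max(table.values(), default=-1) + 1
    out, misses = [], 0
    for tok in tokens:
        if tok not in table:
            table[tok] = next_pos
            next_pos += 1
            misses += 1
        out.append(table[tok])
        if misses > max_misses:
            table, next_pos, misses = {}, 0, 0
    return out

old = {"the": 0, "quick": 1, "fox": 2}
print(encode_with_seed("the quick brown fox".split(), old))   # [0, 1, 3, 2]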

I don't know the outcome of such a compression, but I think it would
correspond much better to the sources. Besides, this algorithm could be used as
gzip --compress=i386.gz

where i386.gz contains an optimal table for i386 binaries. It would give
a better initial compression rate while retaining an adaptable compression,
and it allows specifying the optimal compression for any case.

I don't think mine is the best solution, and I don't know if it works;
it just gives an idea of how the problem could be solved. The principle is
to use the old compression scheme and adapt it as little as possible but
as much as necessary.

But there is a little optimisation that can be used (and is used by
the --rsyncable patch):

This is of course a very good solution; I only wouldn't call it
--rsyncable. I wouldn't make it an option at all. Anyhow, it's not the
non-plus-ultra solution; there are cases where it will fail.

  Maybe it's time to design a compression algorithm which has
  this functionality (same difference rate as the source) from
  the ground up.

There are such algorithms, but they either always use the same
dictionary or table (like some i386.exe runtime compressors that are
specialized to the patterns used in opcodes) or they waste space by
adding the dictionary/table to the compressed file. That's a huge waste
with all the small diff files we have.

These all have fixed compression; as far as I know there isn't any which
combines a fixed with an adaptable compression.

O. Wyss




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-09 Thread Martijn van Oosterhout
Goswin Brederlow wrote:
  gzip --rsyncable, already implemented, ask Rusty Russell.
 
   The --rsyncable switch might yield the same result (I haven't
   checked it so far) but will need some internal knowledge of how to
   determine the old compression.
 
 As far as I understand the patch, it forces gzip to compress the binary
 into chunks of 8K. So every 8K there's a break where rsync can try to
 match blocks. It seems to help somehow, but I think it handles
 movement of data in a file badly (like when a line is inserted).

It's much cleverer than that. The blocks are not a fixed size, but
are determined by a rolling checksum based on a certain amount of
data. So if a line is inserted, the data that causes the block
to finish moves too, so that block is bigger and the rest of
the file is compressed the same way as in the other file.
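
Something along these lines shows why the cut points move with the data
(a toy content-defined chunker in Python; the window size, mask, checksum
and helper name here are made up and differ from the real patch):

import os

WINDOW = 32            # bytes of context the rolling sum looks at (made up)
MASK = (1 << 12) - 1   # gives a cut roughly every 4 KiB on random data (made up)

def cut_points(data: bytes):
    # A block ends whenever the sum of the last WINDOW bytes matches a fixed
    # pattern, so boundaries depend only on nearby content, not on absolute
    # file offsets.
    cuts, rolling = [], 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # drop the byte leaving the window
        if (rolling & MASK) == 0:
            cuts.append(i + 1)
    return cuts

a = os.urandom(100_000)
b = a[:5000] + b"an inserted line\n" + a[5000:]
shift = len(b) - len(a)
after_a = [c for c in cut_points(a) if c > 6000]
after_b = [c - shift for c in cut_points(b) if c > 6000 + shift]
print(after_a == after_b)   # True: later cut points just shift with the insertion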

Neat huh?
-- 
Martijn van Oosterhout [EMAIL PROTECTED]
http://cupid.suninternet.com/~kleptog/




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-09 Thread Otto Wyss
  gzip --compress-like=old-foo foo
  
  where foo will be compressed as old-foo was, or as closely as
  possible. Gzip does not need to know anything about foo except how it
  was compressed. The switch --compress-like could be added to any
  compression algorithm (bzip?) as long as it's easy to retrieve the
 
 No, this won't work with very many compression algorithms.  Most
 algorithms update their dictionaries/probability tables dynamically based
 on input.  There isn't just one static table that could be used for
 another file, since the table is automatically updated after every (or
 near every) transmitted or decoded symbol.  Further, the algorithms start
 with blank tables on both ends (compression and decompression); the
 algorithm doesn't transmit the tables (which can be quite large for
 higher-order statistical models).
 
Well, the table is perfectly static when the compression ends. Even if
the table isn't transmitted itself, its information is contained in the
compressed file; otherwise the file couldn't be decompressed at all.

O. Wyss




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-09 Thread Otto Wyss
   gzip --compress-like=old-foo foo
 
 AFAIK that's NOT possible with gzip. Same with bzip2.
 
Why not?

 I wish it were that simple.
 
I'm not saying it's simple, I'm saying it's possible. I'm not a
compression specialist, but in theory there is nothing which
prevents this except the actual implementation.

Maybe it's time to design a compression algorithm which has this
functionality (same difference rate as the source) from the ground up.

O. Wyss




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-09 Thread Goswin Brederlow
   == Otto Wyss [EMAIL PROTECTED] writes:

  gzip --compress-like=old-foo foo
 
 AFAIK that's NOT possible with gzip. Same with bzip2.
 
  Why not?

gzip creates a dictionary (that gets really large) of strings that are
used and encodes references to them. At the start the dictionary is
empty, so the first char is pretty much unencoded and inserted into
the dictionary. The next char is encoded using the first one and so
on. That way longer and longer strings enter the dictionary.

Every sequence of bytes creates a unique (maybe not unique, but
pretty much so) dictionary that can be completely reconstructed from
the compressed data. Given the dictionary after the first n characters,
the (n+1)th character can be decoded and the next dictionary can be
calculated.
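
To illustrate, here is textbook LZW in Python (gzip itself actually uses
LZ77 back-references rather than an explicit string table, and the
example inputs here are invented, but the way the table depends on all
the data seen so far is the same):

def lzw_encode(data: bytes):
    # The string table is built from the data itself and never transmitted;
    # the decoder reconstructs the identical table while decoding.
    table = {bytes([i]): i for i in range(256)}   # start with single bytes
    current, out = b"", []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate             # keep extending the match
        else:
            out.append(table[current])
            table[candidate] = len(table)   # a longer string enters the table
            current = bytes([byte])
    if current:
        out.append(table[current])
    return out, table

codes1, table1 = lzw_encode(b"abababababab")
codes2, table2 = lzw_encode(b"abababaXabab")   # one changed byte
# Every table entry created after the changed byte differs, and so do the
# codes, which is why you can't reuse one file's table for another file.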

I think it's pretty much impossible to find two files resulting in the
same dictionary. It certainly is impossible at the speed we need.

You cannot encode two random files with the same dictionary, not
without adding the dictionary to the file, which gzip does not do (since
it's a waste).

So, as you see, that method is not possible.

But there is a little optimisation that can be used (and is used by
the --rsyncable patch):

If the dictionary gets too big, the compression ratio drops. It becomes
ineffective. Then gzip flushes the dictionary and starts again with an
empty one.

The --rsyncable patch now changes the moments when that will
happen. It looks for blocks of bytes that have a certain rolling
checksum, and if it matches, it flushes the dictionary. Most likely two
similar files will therefore flush the dictionary at exactly the same
places. If two files are equal after such a flush, the data will be
encoded the same way and rsync can match those blocks.

The author claims that it takes about 0.1-0.2% more space for
rsyncable gzip files, which is a loss I think everybody is willing to
pay.

 I wish it were that simple.
 
  I'm not saying it's simple, I'm saying it's possible. I'm not a
  compression specialist, but in theory there is nothing
  which prevents this except the actual implementation.

  Maybe it's time to design a compression algorithm which has
  this functionality (same difference rate as the source) from
  the ground up.

There are such algorithms, but they either always use the same
dictionary or table (like some i386.exe runtime compressors that are
specialized to the patterns used in opcodes) or they waste space by
adding the dictionary/table to the compressed file. That's a huge waste
with all the small diff files we have.


The --rsyncable patch looks promising for a start and will greatly
reduce the downloads for source mirrors, if it's used.

MfG
Goswin




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-09 Thread Goswin Brederlow
   == Otto Wyss [EMAIL PROTECTED] writes:

  gzip --compress-like=old-foo foo   where foo will be
 compressed as old-foo was, or as closely as possible. Gzip
 does not need to know anything about foo except how it was
 compressed. The switch --compress-like could be added to any
 compression algorithm (bzip?) as long as it's easy to
 retrieve the
 
 No, this won't work with very many compression algorithms.
 Most algorithms update their dictionaries/probability tables
 dynamically based on input.  There isn't just one static table
 that could be used for another file, since the table is
 automatically updated after every (or near every) transmitted
 or decoded symbol.  Further, the algorithms start with blank
 tables on both ends (compression and decompression); the
 algorithm doesn't transmit the tables (which can be quite large
 for higher-order statistical models).
 
  Well, the table is perfectly static when the compression
  ends. Even if the table isn't transmitted itself, its
  information is contained in the compressed file; otherwise the
  file couldn't be decompressed at all.

Yes THEY are. Most of the time each character is encoded by its own
table, which is constructed out of all the characters encoded or
decoded before. The tables are static but 100% dependent on the
data. Change one char and all later tables change (except when gzip
cleans the dictionary; see my other mail).

MfG
Goswin




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread Goswin Brederlow
   == Jason Gunthorpe [EMAIL PROTECTED] writes:

  On 7 Jan 2001, Bdale Garbee wrote:

  gzip --rsyncable, already implemented, ask Rusty Russell.
 
 I have a copy of Rusty's patch, but have not applied it since I
 don't like diverging Debian packages from upstream this way.
 Wichert, have you or Rusty or anyone taken this up with the
 gzip upstream maintainer?

  Has anyone checked out what the size hit is, and how well
  rsyncing debs like this performs in actual use? A study using
  xdelta on rsyncable debs would be quite nice to see. I recall
  that the results of xdelta on the uncompressed data were not
  that great.

That might be a problem with xdelta; I heard it's pretty ineffective.

MfG
Goswin




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread Peter Eckersley
On Mon, Jan 08, 2001 at 08:27:53AM +1100, Sam Couter wrote:
 Otto Wyss [EMAIL PROTECTED] wrote:
  
  So why not solve the compression problem at the root? Why not try to
  change the compression in a way so it does produce a compressed result
  with the same (or similar) difference rate as the source? 
 
 Are you going to hack at *every* different kind of file format that you
 might ever want to rsync, to make it rsync friendly?
 
 Surely it makes more sense to make rsync able to deal more efficiently with
 different formats.

I think you reach the right conclusion, but for the wrong reason.

Either you fix rsync for each of n file formats, or you fix n file formats
for rsync :)

The advantage of doing it in rsync-land is that you can do a better job; you
apply the inverse of the compression format at both ends, calculate the
differences, and re-apply compression (probably gzip rather than the original
algorithm, but it depends) to these.  Trying to hack compression algorithms to
fit rsync is in general a bad idea.  Rusty could probably get away with it for
gzip, because it is very simple - decompression of gzip is interpreting codes
like "repeat the 17 characters you saw 38 characters ago".
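
That interpretation step looks roughly like this (a Python sketch of just
the LZ77 copy, with an invented helper name; not DEFLATE's real bit-level
format):

def apply_copy(output: bytearray, distance: int, length: int) -> None:
    # Interpret one code of the form "repeat `length` bytes starting
    # `distance` bytes back".  Copying byte by byte is deliberate: the copy
    # may overlap its own output (e.g. distance 1, length 10 repeats the
    # previous byte ten times), which DEFLATE allows.
    start = len(output) - distance
    for i in range(length):
        output.append(output[start + i])

buf = bytearray(b"blah blah ")
apply_copy(buf, distance=5, length=10)   # overlapping copy of "blah " twice
print(buf.decode())                      # "blah blah blah blah "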

Other, more sophisticated algorithms, like bzip2 (go and read about the
Burrows-Wheeler Transform, it's amazing ;) would be much harder to hack in any
reasonable way.

--

| |= -+- |= |
|  |-  |  |- |\

Peter Eckersley
([EMAIL PROTECTED])
http://www.cs.mu.oz.au/~pde

for techno-leftie inspiration, take a look at
http://www.computerbank.org.au/




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread Peter Eckersley
On Mon, Jan 08, 2001 at 11:58:26PM +1100, Peter Eckersley wrote:
 On Mon, Jan 08, 2001 at 08:27:53AM +1100, Sam Couter wrote:
  Otto Wyss [EMAIL PROTECTED] wrote:
   
   So why not solve the compression problem at the root? Why not try to
   change the compression in a way so it does produce a compressed result
   with the same (or similar) difference rate as the source? 
  
  Are you going to hack at *every* different kind of file format that you
  might ever want to rsync, to make it rsync friendly?
  
  Surely it makes more sense to make rsync able to deal more efficiently with
  different formats.
 
 I think you reach the right conclusion, but for the wrong reason.
 
 Either you fix rsync for each of n file formats, or you fix n file formats
 for rsync :)
 
 The advantage of doing it in rsync-land is that you can do a better job; you
 apply the inverse of the compression format at both ends, calculate the
 differences, and re-apply compression (probably gzip rather than the original

smacks self in head

That wasn't very clear of me, was it?  Gzip compression is fine for
transferring data on the wire.  At the other end, you still need to reuse the
original algorithm.  Re-applying compression is, of course, a potential
problem (as someone pointed out in another thread, even gzip is not in general
re-applicable).

So more correctly, an rsync module is better where it's possible...

 algorithm, but it depends) to these.  Trying to hack compression algorithms to
 fit rsync is in general a bad idea.  Rusty could probably get away with it for
 gzip, because it is very simple - decompression of gzip is interpreting codes
 like "repeat the 17 characters you saw 38 characters ago".

--
Peter Eckersley http://www.cs.mu.oz.au/~pde 
([EMAIL PROTECTED])  TLI:  http://www.computerbank.org.au
~sig temporarily conservative pending divine intervention~
GPG fingerprint: 30BF 6A78 2013 DCFA 5985  E255 9D31 4A9A 7574 65BC




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread Otto Wyss
 So why not solve the compression problem at the root? Why not try to
 change the compression in a way so it does produce a compressed result
 with the same (or similar) difference rate as the source?

Are you going to hack at *every* different kind of file format that you
might ever want to rsync, to make it rsync friendly?

No, I want rsync not even to be mentioned. All I want is something
similar to

gzip --compress-like=old-foo foo

where foo will be compressed as old-foo was, or as closely as
possible. Gzip does not need to know anything about foo except how it
was compressed. The switch --compress-like could be added to any
compression algorithm (bzip?) as long as it's easy to retrieve the
compression scheme. Besides, the following is completely legal but
probably not very sensible

gzip --compress-like=foo bar

where bar will be compressed as foo even if they might be totally
unrelated.

Rsync-ing Debian packages will certainly take advantage of this solution,
but the solution itself is 100% compression-specific. Anything
which needs identical compression could profit from this switch. It's up
to the profiting application to provide the necessary wrapper around it.

gzip --rsyncable, already implemented, ask Rusty Russell.

The --rsyncable switch might yield the same result (I haven't checked it
so far) but will need some internal knowledge of how to determine the old
compression.

As I read my mail again, the syntax for compressing like another file could be

gzip --compress=foo bar

where bar is compressed as foo was. Foo is of course a compressed file
(how else could the compression be retrieved?) while bar is not.

O. Wyss




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread Matt Zimmerman
On Mon, Jan 08, 2001 at 05:28:56PM +0100, Otto Wyss wrote:

  So why not solve the compression problem at the root? Why not try to
  change the compression in a way so it does produce a compressed result
  with the same (or similar) difference rate as the source?
 
 Are you going to hack at *every* different kind of file format that you
 might ever want to rsync, to make it rsync friendly?
 
 No, I want rsync not even to be mentioned. All I want is something
 similar to
 
 gzip --compress-like=old-foo foo
 
 where foo will be compressed as old-foo was, or as closely as
 possible. Gzip does not need to know anything about foo except how it
 was compressed. The switch --compress-like could be added to any
 compression algorithm (bzip?) as long as it's easy to retrieve the
 compression scheme.

This breaks, though, when foo and old-foo are being compressed on different
systems, by different people, etc., as is often the case with Debian packages.
Something like --rsyncable seems more generally applicable, though users
probably need to ensure that they use the same compression level both times.
(This should not be a problem for Debian, as this is all done by tools.)

-- 
 - mdz




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread Andrew Lenharth
 No, I want rsync not even to be mentioned. All I want is something
 similar to
 
 gzip --compress-like=old-foo foo
 
 where foo will be compressed as old-foo was, or as closely as
 possible. Gzip does not need to know anything about foo except how it
 was compressed. The switch --compress-like could be added to any
 compression algorithm (bzip?) as long as it's easy to retrieve the
 compression scheme. Besides, the following is completely legal but
 probably not very sensible

No, this won't work with very many compression algorithms.  Most
algorithms update their dictionaries/probability tables dynamically based
on input.  There isn't just one static table that could be used for
another file, since the table is automatically updated after every (or
near every) transmitted or decoded symbol.  Further, the algorithms start
with blank tables on both ends (compression and decompression); the
algorithm doesn't transmit the tables (which can be quite large for
higher-order statistical models).

I suggest you read about LZW and arithmetic encoding with higher-order
statistical models.  Try The Data Compression Book by Nelson (?) for a
fairly good overview of how these work.

What is better and easier is to ensure that the compression is
deterministic (gzip by default is not, bzip2 seems to be), so that rsync
can decompress, rsync, compress, and get the exact file back on the other
side.
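
For bzip2 that check is easy to express (a Python sketch with an invented
helper name; for gzip you would additionally have to pin down the header
timestamp, original name and compression level before the recompressed
bytes could match):

import bz2

def roundtrips_exactly(compressed: bytes, level: int = 9) -> bool:
    # Decompress, recompress with the same settings, and see whether we get
    # the original bytes back -- the property rsync would need in order to
    # work on the uncompressed data and still deliver an identical file.
    plain = bz2.decompress(compressed)
    return bz2.compress(plain, level) == compressed

original = bz2.compress(b"pretend this is a .deb payload\n" * 1000, 9)
print(roundtrips_exactly(original))   # True, as long as the level matches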

Andrew Lenharth




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread Goswin Brederlow
   == Otto Wyss [EMAIL PROTECTED] writes:

 So why not solve the compression problem at the root? Why not
 try to change the compression in a way so it does produce a
 compressed result with the same (or similar) difference rate as
 the source?
  Are you going to hack at *every* different kind of file format
 that you might ever want to rsync, to make it rsync friendly?
 
  No, I want rsync not even to be mentioned. All I want is
  something similar to

  gzip --compress-like=old-foo foo

AFAIK that's NOT possible with gzip. Same with bzip2.

  where foo will be compressed as old-foo was, or as closely as
  possible. Gzip does not need to know anything about foo except
  how it was compressed. The switch --compress-like could be
  added to any compression algorithm (bzip?) as long as it's
  easy to retrieve the compression scheme. Besides, the following
  is completely legal but probably not very sensible

  gzip --compress-like=foo bar

  where bar will be compressed as foo even if they might be
  totally unrelated.

  Rsync-ing Debian packages will certainly take advantage of this
  solution, but the solution itself is 100% compression-specific.
  Anything which needs identical compression could
  profit from this switch. It's up to the profiting application to
  provide the necessary wrapper around it.

 gzip --rsyncable, already implemented, ask Rusty Russell.

  The --rsyncable switch might yield the same result (I haven't
  checked it so far) but will need some internal knowledge of how to
  determine the old compression.

As far as I understand the patch, it forces gzip to compress the binary
into chunks of 8K. So every 8K there's a break where rsync can try to
match blocks. It seems to help somehow, but I think it handles
movement of data in a file badly (like when a line is inserted).

  As I read my mail again, the syntax for compressing like another
  file could be

  gzip --compress=foo bar

  where bar is compressed as foo was. Foo is of course a
  compressed file (how else could the compression be retrieved?)
  while bar is not.

I wish it were that simple.

MfG
Goswin




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread Goswin Brederlow
   == Andrew Lenharth [EMAIL PROTECTED] writes:

  What is better and easier is to ensure that the compression is
  deterministic (gzip by default is not, bzip2 seems to be), so
  that rsync can decompress, rsync, compress, and get the exact
  file back on the other side.

gzip encodes timestamps, which makes identical files seem to be
different when compressed.

Given the same file with the same timestamp, gzip should always
generate identical output.

Of course that also depends on the options used.
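
The timestamp effect is easy to see with Python's gzip module (the mtime
keyword needs Python 3.8 or newer); on the command line, gzip -n /
--no-name leaves the timestamp and original name out for the same reason:

import gzip

data = b"identical payload\n" * 1000

a = gzip.compress(data, mtime=1)
b = gzip.compress(data, mtime=2)
print(a == b)   # False: only the 4-byte mtime field in the header differs

c = gzip.compress(data, mtime=0)
d = gzip.compress(data, mtime=0)
print(c == d)   # True: with the timestamp pinned, the output is reproducible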

MfG
Goswin




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread John O Sullivan
There were a few discussions on the rsync mailing lists about how to
handle compressed files, specifically .debs.
I'd like to see some way of handling it better, but I don't think
it'll happen at the rsync end. Reasons include higher server CPU load
to (de)compress every file that is transferred and problems related to
different compression rates.
See this link for more info:
http://lists.samba.org/pipermail/rsync/1999-October/001403.html

johno




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-08 Thread Goswin Brederlow
   == John O Sullivan [EMAIL PROTECTED] writes:

  There were a few discussions on the rsync mailing lists about
  how to handle compressed files, specifically .debs. I'd like to
  see some way of handling it better, but I don't think it'll
  happen at the rsync end. Reasons include higher server CPU load
  to (de)compress every file that is transferred and problems
  related to different compression rates.  See this link for
  more info:
  http://lists.samba.org/pipermail/rsync/1999-October/001403.html

Did you read my proposal a few days back? That should do the trick: it
works without unpacking on the server side and actually reduces the
load on the server, because it can then cache the checksums,
i.e. calculate them once and reuse them every time.

MfG
Goswin




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-07 Thread Wichert Akkerman
Previously Otto Wyss wrote:
 So why not solve the compression problem at the root? Why not try to
 change the compression in a way so it does produce a compressed result
 with the same (or similar) difference rate as the source? 

gzip --rsyncable, already implemented, ask Rusty Russell.

Wichert.

-- 
   
 / Generally uninteresting signature - ignore at your convenience  \
| [EMAIL PROTECTED]  http://www.liacs.nl/~wichert/ |
| 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0  2805 3CB8 9250 2FA3 BC2D |




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-07 Thread Sam Couter
Otto Wyss [EMAIL PROTECTED] wrote:
 
 So why not solve the compression problem at the root? Why not try to
 change the compression in a way so it does produce a compressed result
 with the same (or similar) difference rate as the source? 

Are you going to hack at *every* different kind of file format that you
might ever want to rsync, to make it rsync friendly?

Surely it makes more sense to make rsync able to deal more efficiently with
different formats.
-- 
Sam Couter  |   Internet Engineer   |   http://www.topic.com.au/
[EMAIL PROTECTED]|   tSA Consulting  |
OpenPGP key available on key servers
OpenPGP fingerprint:  A46B 9BB5 3148 7BEA 1F05  5BD5 8530 03AE DE89 C75C




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-07 Thread Goswin Brederlow
   == Otto Wyss [EMAIL PROTECTED] writes:

  It's commonly agreed that compression does prevent rsync from
  profiting from older versions of packages when synchronizing Debian
  mirrors. All the discussion about fixing rsync to solve this,
  even through a deb plugin, is IMHO not the right way. Rsync's
  task is to synchronize files without knowing what's inside.

  So why not solve the compression problem at the root? Why not
  try to change the compression in a way so it does produce a
  compressed result with the same (or similar) difference rate as
  the source?

  As my understanding of compression goes, all have a kind of
  lookup table at the beginning where all compression codes were
  declared. Each time this table is created anew, each time
  slightly different from the previous one depending on the

Nope. Only a few compression programs use a table at the start of the
file. Most build the table as they go along. Not copying the table
saves a lot of space.

gzip (I hope I remember that correctly) for example grows its table
with every character it encodes, so when you compress a file that
contains only zeros, the table will not contain any a's, so an 'a' can't
even be encoded.

bzip2, on the other hand, sorts the input in some way to get better
compression ratios. You can't sort the input in the same way with
different data; the compression rate would drop dramatically otherwise.

ppm, as a third example, builds a new table for every character that's
transferred and encodes the probability range of the real character in
one of the current contexts. And the contexts are based on all
previous characters. The first character will be plain text, and the
rest of the file will (most likely) differ if that char changes.

  source. So to get similar results when compressing means using
  the same or at least an equivalent lookup table.  If it were
  possible to feed the lookup table of the previous compressed
  file to the new compression process, an equal or at least
  similar compression could be achieved.

  Of course, always using the same lookup table means a decrease
  of the compression rate. If there is an algorithm which
  compares the old rate with an optimal rate, even this could be
  solved. This means a completely different compression from time
  to time. It all depends on how easily an equivalent lookup table
  could be created without losing too much of the compression rate.

Knowing the structure of the data can greatly increase the compression
ratio. Also knowing the structure can greatly reduce the differences
needed to sync two files.

So why should rsync stay stupid?

MfG
Goswin




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-07 Thread Bdale Garbee
[EMAIL PROTECTED] (Wichert Akkerman) writes:

 gzip --rsyncable, already implemented, ask Rusty Russell.

I have a copy of Rusty's patch, but have not applied it since I don't like
diverging Debian packages from upstream this way.  Wichert, have you or Rusty
or anyone taken this up with the gzip upstream maintainer?

Bdale




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-07 Thread Jason Gunthorpe

On 7 Jan 2001, Bdale Garbee wrote:

  gzip --rsyncable, already implemented, ask Rusty Russell.
 
 I have a copy of Rusty's patch, but have not applied it since I don't like
 diverging Debian packages from upstream this way.  Wichert, have you or Rusty
 or anyone taken this up with the gzip upstream maintainer?

Has anyone checked out what the size hit is, and how well rsyncing debs
like this performs in actual use? A study using xdelta on rsyncable debs
would be quite nice to see. I recall that the results of xdelta on the
uncompressed data were not that great.

Jason




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-07 Thread Matt Zimmerman
On Sun, Jan 07, 2001 at 08:16:08PM -0700, Bdale Garbee wrote:

 [EMAIL PROTECTED] (Wichert Akkerman) writes:
 
  gzip --rsyncable, already implemented, ask Rusty Russell.
 
 I have a copy of Rusty's patch, but have not applied it since I don't like
 diverging Debian packages from upstream this way.  Wichert, have you or Rusty
 or anyone taken this up with the gzip upstream maintainer?

As you know, it's been eons since the last upstream gzip release.  What are the
chances of upstream being interested in making this patch official?  Maybe we
should develop this stuff on a parallel branch, work the bugs out, and hope to
integrate it down the road?

-- 
 - mdz




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-07 Thread Bdale Garbee
[EMAIL PROTECTED] (Matt Zimmerman) writes:

 As you know, it's been eons since the last upstream gzip release.

On the advice of the current FSF upstream, we moved to 1.3 in November 2000.

I think it is entirely reasonable to talk to upstream about this before 
contemplating forking.

Bdale




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-07 Thread Wichert Akkerman
Previously Jason Gunthorpe wrote:
 Has anyone checked out what the size hit is, and how well rsyncing debs
 like this performs in actual use?

Rusty has; the size difference is amazingly minimal.

Wichert.

-- 
   
 / Generally uninteresting signature - ignore at your convenience  \
| [EMAIL PROTECTED]  http://www.liacs.nl/~wichert/ |
| 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0  2805 3CB8 9250 2FA3 BC2D |




Re: Solving the compression dilema when rsync-ing Debian versions

2001-01-07 Thread Wichert Akkerman
Previously Bdale Garbee wrote:
 Wichert, have you or Rusty or anyone taken this up with the gzip upstream
 maintainer?

I'm not sure; I'll meet Rusty next week at linux.conf.au and ask
him.

Wichert.

-- 
   
 / Generally uninteresting signature - ignore at your convenience  \
| [EMAIL PROTECTED]  http://www.liacs.nl/~wichert/ |
| 1024D/2FA3BC2D 576E 100B 518D 2F16 36B0  2805 3CB8 9250 2FA3 BC2D |