On Thursday 31 Jul 2003 18:55, Tom Kaitchuck wrote:

> The entire directory:         2617995
> The tar of the directory:     2703360
> The zip of the directory:     583382   Ratio: 4.48 to 1
> Gzip of the tar (default):    497157   Ratio: 5.26 to 1 (compared to the dir size)
> Gzip of the tar (--best):     492510   Ratio: 5.31 to 1 (compared to the dir size)
> Bzip2 of the tar:             437351   Ratio: 5.98 to 1 (compared to the dir size)

> This means that, if we assume this compression ratio on a hypothetical
> size index, Bzip will result in enough improvement to move to the next
> power of 2 size 3/8ths of the time. Bottom line: on Freesites that are
> using HTML containers with between 4KB and 4MB of uncompressed
> content, bzips will only use about 80% of the bandwidth that zips do.
> Please bear in mind that zips are already reducing this bandwidth to
> about 23% of what it once was, and that, space-wise, this is a small
> part of Freenet's content.

Moving down to the next power of two yields a 50% reduction in space. There 
is no in-between. So, if bzip2 is 20% smaller than zip on average, given the 
powers-of-2 distribution of file sizes, how often will that yield that 50% 
improvement?
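
For a rough feel for the numbers: under the (purely assumed) model that 
zip-compressed sizes are spread log-uniformly within a power-of-2 bucket, a 
20% reduction drops a file to the next bucket log2(1.25), or about 32%, of 
the time, which is in the same ballpark as Tom's 3/8ths. A throwaway Java 
sketch of that calculation:

    import java.util.Random;

    public class BucketDrop {
        public static void main(String[] args) {
            Random rnd = new Random(42);
            double ratio = 0.8;               // assumed bzip2/zip size ratio
            int trials = 1000000, drops = 0;
            for (int i = 0; i < trials; i++) {
                // zip size spread log-uniformly across one power-of-2 bucket
                double zipSize = Math.pow(2.0, 20.0 + rnd.nextDouble());
                double bzSize = zipSize * ratio;
                // bucket = smallest power of two that holds the data
                int zipBucket = (int) Math.ceil(Math.log(zipSize) / Math.log(2.0));
                int bzBucket  = (int) Math.ceil(Math.log(bzSize)  / Math.log(2.0));
                if (bzBucket < zipBucket) drops++;
            }
            // analytically: log2(1/0.8) = log2(1.25), roughly 32%
            System.out.println("dropped a bucket: " + (100.0 * drops / trials) + "%");
        }
    }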

> Is taking this number down to 19% worth all the extra effort it would
> take?

It depends on how often that would result in reducing the space usage by a 
notch. And I am not convinced that the effort required to use one 
compression library instead of another is all that great.

However, it has to be noted that for archives, zip does appear to be a more 
suitable format, because we don't have to decompress the whole archive 
before extracting the file(s). In that case, it would make sense to use the 
same compression library for both purposes, for the sake of consistency. 
This advantage only holds true until archive decompression caching is 
implemented; then the two cases become pretty close, especially on smaller 
archives.
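
For illustration, that random-access property is visible in the standard 
java.util.zip API: ZipFile can hand back a stream for a single entry without 
inflating the rest of the archive. (The file and entry names below are just 
placeholders.)

    import java.io.InputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public class ExtractOne {
        public static void main(String[] args) throws Exception {
            ZipFile archive = new ZipFile("freesite.zip");   // placeholder name
            ZipEntry entry = archive.getEntry("index.html"); // placeholder name
            InputStream in = archive.getInputStream(entry);  // inflates only this entry
            // ... read the page from 'in' ...
            in.close();
            archive.close();
        }
    }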

> When you consider just how much better things are with compression,
> I think the most important thing that can be done is to ensure that
> everything that is inserted, and is not already compressed, gets
> compressed.

I completely agree there. I don't think there is any question about whether 
some compression will be implemented. It is purely a question of which 
algorithm/library will be used.
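
As a minimal sketch of that policy (my own illustration, not Freenet's 
actual insert code): deflate the data and keep the compressed form only if 
it actually shrank, which naturally filters out content that is already 
compressed.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.DeflaterOutputStream;

    public class MaybeCompress {
        // Hypothetical helper: returns the deflated form if it is smaller,
        // otherwise the original bytes untouched.
        static byte[] maybeCompress(byte[] data) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DeflaterOutputStream def = new DeflaterOutputStream(buf);
            def.write(data);
            def.finish();
            byte[] compressed = buf.toByteArray();
            // Already-compressed input (zips, jpegs, ...) will not shrink.
            return compressed.length < data.length ? compressed : data;
        }
    }

The node would of course also need to record which form was stored, so it 
knows whether to inflate on retrieval.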

> Beyond that you can easily play games in the insertion
> utilities, like having 2 or 3 zips for a site and shuffling files around
> so they get padded less.

Now you are talking about archives. I am talking about individual file 
compression.

> However I think that it ultimately comes down to the
> fact that zips are so much easier to implement.

As I said, I am not sure that the difference will be that big. I don't think 
the bzip2 library API will be that much more verbose than the built-in Java 
zip.
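
For what it's worth, the stream-level code ends up the same shape either 
way. The sketch below uses the Apache Commons Compress bzip2 stream purely 
as an example of a third-party bzip2 API; it is not a claim about which 
library would actually be used:

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.util.zip.GZIPOutputStream;
    import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

    public class SameShape {
        public static void main(String[] args) throws Exception {
            byte[] data = "hello freenet".getBytes("UTF-8");

            // built-in java.util.zip stream
            OutputStream gz = new GZIPOutputStream(new FileOutputStream("out.gz"));
            gz.write(data);
            gz.close();

            // third-party bzip2 stream: line-for-line the same calling code
            OutputStream bz = new BZip2CompressorOutputStream(new FileOutputStream("out.bz2"));
            bz.write(data);
            bz.close();
        }
    }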

Gordan