On Wednesday 16 December 2009 16:11:33 Evan Daniel wrote:
> On Wed, Dec 16, 2009 at 10:20 AM, Daniel Cheng
> <j16sdiz+freenet at gmail.com> wrote:
> > On Wed, Dec 16, 2009 at 8:12 PM, Florent Daigniere
> > <nextgens at freenetproject.org> wrote:
> >> * Gregory Maxwell <gmaxwell at gmail.com> [2009-12-15 18:39:24]:
> >>
> >>> On Mon, Dec 14, 2009 at 11:51 AM, Florent Daigniere
> >>> <nextgens at freenetproject.org> wrote:
> >>> > Modern compression algorithms allow FAST decompression. We are talking
> >>> > 10 to 20 times faster here!
> >>> >
> >>> > http://en.wikipedia.org/wiki/Lempel-Ziv-Markov_chain_algorithm
> >>> > # Compression speed: approximately 1 MiB per second on a 2 GHz CPU
> >>> > # Decompression speed: 10-20 MiB per second on a 2 GHz CPU
> >>> >
> >>> > Anyway, the assumption is and has always been that CPU cycles are cheap,
> >>> > in contrast to network traffic. Moore's law doesn't apply to networks.
> >>>
> >>> It does, in fact. Networks are just a combination of processing,
> >>> storage, and connectivity. Each of those is itself a combination of
> >>> processing, storage, and connectivity. At the limit of this recursion
> >>> the performance of all of these, except the connectivity, is driven by
> >>> transistor density: Moore's law.
> >>>
> >>
> >> The keyword here is "except". What characterizes a bottleneck is the weakest
> >> part of the chain...
> >>
> >>> There is a *ton* of available bandwidth on optical fibers. For the
> >>> communication part we have Butters' law: "Butters' Law says the amount
> >>> of data coming out of an optical fiber is doubling every nine
> >>> months"[1]
> >>>
> >>
> >> Very informative... except you're talking about transit connectivity. Here
> >> we are writing p2p software; what matters is what the average user has on
> >> his local loop.
> >>
> >> ADSL for most of them... or worse. If Freenet were run from servers with
> >> fiber connectivity and high uptimes it would perform much better.
> >>
> >>> It took me a day to find a graph of historical wholesale internet transit
> >>> prices:
> >>>
> >>> http://www.drpeering.net/a/Peering_vs_Transit___The_Business_Case_for_Peering_files/droppedImage_1.png
> >>> (In fact, this graph appears to be overstating the current cost for
> >>> bulk rate transit. Advertised pricing at the gbit port level is down
> >>> to $2/mbit/sec/month from some cut-rate providers; negotiated prices
> >>> can be lower still)
> >>>
> >>> Of course, Freenet does a lot of network criss-crossing... this shifts
> >>> the balance in favour of stronger compression but that doesn't
> >>> magically make compression that only gives a 1% reduction a win.
> >>>
> >>> [1] http://www.eetimes.com/story/OEG20000926S0065
> >>
> >> As usual we are arguing over not much: Freenet does heavy encryption and
> >> FEC anyway... Adding compression to the mix is not much of an overhead
> >> compared to the rest.
> >>
> >> Actually I'm surprised no one suggested getting rid of encryption
> >> altogether; it would be waaaaaayyyy faster for sure.
> >>
> >> All I gathered from Ian's report is that we probably shouldn't show a
> >> COMPRESSION stage in the user interface. Users obviously know better, but
> >> they still don't understand what is being achieved here and press cancel.
> >> If we weren't saying what we are doing, they would just complain about
> >> speed, which we are used to.
> >>
> >> This is not the first time we have argued over it:
> >> Feb 11 Todd Walton (24) [Tech] CHKs, Metadata, Encryption, Compression,
> >> Hashing
> >> Aug 12 Matthew Toseland (55) [Tech] Should we try to compress all files?
> >> => that one is worth reading, Ian already says he wants the client to
> >> choose whether to compress or not... and we already argued over
> >> recompressing video:
> >> http://archives.freenetproject.org/message/20060812.161518.1148b5c5.en.html
> >> Jun 03 Matthew Toseland (50) [freenet-dev] Memory overhead of compression
> >> codecs
> >>
> >
> > Personally, I prefer very light compression (e.g. gzip -2).
> > Most media files are already compressed; using more CPU for
> > compression does not help much and degrades the user experience.
> > The other files are mostly text, which compresses well with gzip -2.
> >
> > LZMA and similar algorithms take memory (>32 MiB). If we want to target
> > lower-end computers (I think we do, according to the recent discussion
> > about Chinese users), LZMA does not make sense.
> 
> For some files (including text), LZMA is a big win.
> 
> Using index_00 from the wAnnA spider index:
> 
> 17509696 Search-24-index_00.xml
> 1786075  Search-24-index_00.xml.bz2
> 2280034  Search-24-index_00.xml.gz
> 2583718  Search-24-index_00.xml.gz2
> 1238409  Search-24-index_00.xml.lzma.9
> 1272083  Search-24-index_00.xml.lzma.default
> 
> .gz is default gzip; .gz2 is gzip -2.  lzma.default is default lzma;
> lzma.9 is lzma -9.
> 
> For that file, it's obvious that lzma or even lzma -9 is appropriate.
> It will be uploaded once, and downloaded many times, in a context
> where the user cares about time to retrieve the final, decompressed
> file.  FEC is slow compared to decompression, so it makes sense to
> compress it well even before you account for the slow network.
> 
> Obviously, different files need different settings.  I like the
> proposal of testing compression on a portion of the file, and then
> deciding based on those results.  However, I think we should use a
> larger section than 64 KiB; I propose 1 MiB.  Test compression is
> still fast, and accuracy is improved, especially on files where the
> header and the contents are different.
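
Roughly what that probe could look like, as a sketch only (plain
java.util.zip; the class name CompressionProbe, the helper sampleRatio() and
the 0.95 cut-off below are invented for illustration, not anything we
currently have):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.zip.Deflater;

    public class CompressionProbe {
        static final int SAMPLE_SIZE = 1024 * 1024; // 1 MiB, as proposed above

        /** Deflate the first SAMPLE_SIZE bytes; return compressed/original. */
        static double sampleRatio(String path, int level) throws IOException {
            byte[] sample = new byte[SAMPLE_SIZE];
            FileInputStream in = new FileInputStream(path);
            int read = 0;
            try {
                int n;
                while (read < SAMPLE_SIZE
                        && (n = in.read(sample, read, SAMPLE_SIZE - read)) != -1)
                    read += n;
            } finally {
                in.close();
            }
            if (read == 0) return 1.0;

            Deflater def = new Deflater(level);
            def.setInput(sample, 0, read);
            def.finish();
            byte[] buf = new byte[64 * 1024];
            long compressed = 0;
            while (!def.finished())
                compressed += def.deflate(buf);
            def.end();
            return (double) compressed / read;
        }

        public static void main(String[] args) throws IOException {
            double ratio = sampleRatio(args[0], Deflater.BEST_COMPRESSION);
            // Arbitrary cut-off: if the sample barely shrinks (e.g. already
            // compressed media), skip the real compression pass entirely.
            System.out.println(ratio < 0.95 ? "worth compressing" : "skip");
        }
    }

The same probe could be run once per codec to choose between gzip and LZMA,
rather than only to decide compress-or-not.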
> 
> I suspect that, in general, the relative speed of lzma decompression
> and FEC decode means that lzma is the correct choice in any case where
> it beats gzip.  (LZMA decode is very fast, unlike e.g. bzip2.  The
> compression is slow, but in most contexts a file will be uploaded once
> and downloaded multiple times.)  Also, I'd have to check, but I
> believe FEC has comparable memory requirements, meaning that using
> gzip instead of lzma doesn't reduce the amount of memory needed.

They are separate. We control the amount of memory needed via the dictionary
size. At the moment this is static, but it should depend on the file size for
small files, and we should therefore be able to compress more or fewer files in
parallel depending on available memory (and of course on whether they are going
to disk, in which case doing them in parallel may be silly; but in some
interesting cases, e.g. searching, they are not).
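
For what it's worth, a back-of-the-envelope version of "dictionary size
depends on file size" might look like this (the 64 KiB floor and 32 MiB
ceiling are illustrative numbers, not decisions, and dictionarySizeFor() is a
made-up name):

    public class DictionarySizing {
        // Illustrative bounds only; the real limits are a policy decision.
        static final int MIN_DICT = 64 * 1024;        // 64 KiB floor (assumed)
        static final int MAX_DICT = 32 * 1024 * 1024; // 32 MiB ceiling (assumed)

        /** Smallest power of two >= fileSize, clamped to [MIN_DICT, MAX_DICT]. */
        static int dictionarySizeFor(long fileSize) {
            int dict = MIN_DICT;
            while (dict < fileSize && dict < MAX_DICT)
                dict <<= 1;
            return dict;
        }

        public static void main(String[] args) {
            System.out.println(dictionarySizeFor(200 * 1024)); // 262144 (256 KiB)
            System.out.println(dictionarySizeFor(1L << 30));   // 33554432 (32 MiB cap)
        }
    }

With something like that in place, the number of files compressed in parallel
could be budgeted as roughly the sum of their dictionary sizes against the
available memory.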