Matthew Toseland wrote:
> On Thursday 23 October 2008 14:02, Florent Daigniere wrote:
>> Florent Daigniere wrote:
>>> Matthew Toseland wrote:
>>>> On Thursday 23 October 2008 10:39, NextGen$ wrote:
>>>>> * Daniel Cheng <j16sdiz+freenet at gmail.com> [2008-10-23 08:12:14]:
>>>>>
>>>>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$
>>>>>> <nextgens at freenetproject.org> wrote:
>>>>>>> * Matthew Toseland <toad at amphibian.dyndns.org> [2008-10-22 20:48:24]:
>>>>>>>
>>>>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
>>>>>>>>> * Matthew Toseland <toad at amphibian.dyndns.org> [2008-10-21 20:53:51]:
>>>>>>>>>> On Tuesday 21 October 2008 16:24, nextgens at freenetproject.org wrote:
>>>>>>>>>>> Author: nextgens
>>>>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
>>>>>>>>>>> New Revision: 23014
>>>>>>>>>>>
>>>>>>>>>>> Modified:
>>>>>>>>>>> trunk/freenet/src/freenet/client/ArchiveManager.java
>>>>>>>>>>> trunk/freenet/src/freenet/client/ArchiveStoreContext.java
>>>>>>>>>>> trunk/freenet/src/freenet/client/ClientMetadata.java
>>>>>>>>>>> trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
>>>>>>>>>>> trunk/freenet/src/freenet/client/Metadata.java
>>>>>>>>>>> trunk/freenet/src/freenet/client/async/ClientPutter.java
>>>>>>>>>>> trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
>>>>>>>>>>> trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
>>>>>>>>>>> trunk/freenet/src/freenet/client/async/SingleFileInserter.java
>>>>>>>>>>> trunk/freenet/src/freenet/client/async/SplitFileInserter.java
>>>>>>>>>>> trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
>>>>>>>>>>> trunk/freenet/src/freenet/frost/message/FrostMessage.java
>>>>>>>>>>> trunk/freenet/src/freenet/node/NodeARKInserter.java
>>>>>>>>>>> trunk/freenet/src/freenet/node/TextModeClientInterface.java
>>>>>>>>>>> trunk/freenet/src/freenet/node/fcp/ClientPut.java
>>>>>>>>>>> trunk/freenet/src/freenet/node/fcp/DirPutFile.java
>>>>>>>>>>> trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
>>>>>>>>>>>
>>>>>>>>>>> Log:
>>>>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
>>>>>>>>>>> It's still not backward compatible with stable but should be
>>>>>>>>>>> forward-compatible ;)
>>>>>>>>> [...] see r23023
>>>>>>>>>
>>>>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip now?
>>>>>>>>>> Shouldn't there be a max size configuration above which we don't try
>>>>>>>>>> bzip2, perhaps unless asked to via FCP? bzip2'ing ISOs could take a
>>>>>>>>>> really long time ...
>>>>>>>>> I don't think we need one. Big files will take long to compress, but
>>>>>>>>> they will take long to insert too. I think it's worth spending a few
>>>>>>>>> more CPU cycles to spare the insertion of a few blocks (plus their
>>>>>>>>> FEC blocks).
>>>>>>>> I'm not convinced that this is acceptable from a usability point of
>>>>>>>> view. Maybe we can provide a progress bar within the compression
>>>>>>>> phase? On the new UI it is proposed to separate downloads which are
>>>>>>>> not yet finalised (i.e. haven't fetched the last lot of metadata)
>>>>>>>> from downloads that are... we could do something similar with inserts
>>>>>>>> in compression.
>>>>>>>>
>>>>>>> Have a look at what I have committed. From now on the compression is
>>>>>>> fully serialized... We have one mutex, and only one compression job
>>>>>>> (just like we do for FEC encoding, in fact), which means an even
>>>>>>> higher latency.
>>>>>> Is it feasible to insert some blocks of data while compressing?
>>>>>> Gzip, bzip2 and lzma all support streams. We can collect the output
>>>>>> data as we feed data to them.
>>>>>>
>>>>> Right now we attempt to compress the full data using all the compression
>>>>> algorithms and we keep the smallest resulting bucket. How do you plan to
>>>>> choose the best-performing algorithm before actually compressing the
>>>>> data?
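The "compress with every algorithm, keep the smallest resulting bucket" scheme described above can be sketched as follows. This is an illustration only, not Freenet's actual Compressor code: the JDK bundles only GZIP and DEFLATE, so those stand in here for the gzip/bzip2/lzma set discussed in the thread, and all class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

public class SmallestCodec {

    // Compress with GZIP.
    static byte[] gzip(byte[] in) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(bos);
            gz.write(in);
            gz.close();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compress with raw DEFLATE at the strongest setting.
    static byte[] deflate(byte[] in) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DeflaterOutputStream d =
                new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION));
            d.write(in);
            d.close();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Run every codec over the full data and keep whichever output is
    // smallest; fall back to the uncompressed data if no codec wins.
    static byte[] best(byte[] in) {
        byte[] best = in;
        for (byte[] candidate : new byte[][] { gzip(in), deflate(in) })
            if (candidate.length < best.length) best = candidate;
        return best;
    }

    public static void main(String[] args) {
        byte[] data = new byte[64 * 1024]; // zero-filled, highly compressible
        System.out.println(best(data).length < data.length); // prints "true"
    }
}
```

Note that this structure is exactly why the winner cannot be known in advance: each codec must see the whole input before the sizes can be compared.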
>>>>> I don't think that we can evaluate how well algorithms compress over a
>>>>> single segment: it's just too small.
>>>>>
>>>>>> As soon as we get enough compressed data for FEC, we can insert them.
>>>>>> This would be a great performance improvement for large files on SMP.
>>>>>>
>>>>> That would involve rewriting most of the client layer.
>>>>>
>>>>>> Is this doable without changing the data format?
>>>>>>
>>>>> It's not about the data format; we insert the manifest at the end unless
>>>>> told not to via the earlyEncode parameter.
>>>>>
>>>>> IMHO we are debating for no real reason here: the real time taken by the
>>>>> compression phase is insignificant compared to the time taken by the
>>>>> insertion process. Sure, trunk will take at least 3 times longer than
>>>>> current stable before it starts inserting anything; but is that a big
>>>>> deal? You will need real numbers to convince me here.
>>>> I'd like some numbers ... IIRC it takes around 2 days to insert a
>>>> CD-sized ISO? How long does it take to bzip2 it?
>>>>
>>> It obviously depends on various factors, including how fast you can do
>>> I/Os, the block size and the number of cores you have.
>>>
>>> Here is what is likely to be "the worst case scenario":
>>> $ time bzip2 -c iso > iso.bz2 | grep real
>>> real 3m57.552s
>>> $ time gzip -c iso > iso.gz | grep real
>>> real 0m46.079s
>>> $ du -hs iso*
>>> 560M iso
>>> 506M iso.bz2
>>> 506M iso.gz
>>>
>>> There is no clear gain in bzip2'ing the content... but compression is
>>> worth it: we spare 54*2=108 MB (you have to count FEC blocks too, to be
>>> fair)! Now if you tell me that freenet is able to insert 108 MB of data
>>> in less than 5 mins, I will consider optimizing the compression step.
>>>
>>> There are solutions for guesstimating the efficiency of a given
>>> compression algorithm, but I am not sure they are worth implementing.
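Daniel's streaming proposal above (all three codecs are stream-based, so compressed output can be collected while data is still being fed in) could look roughly like this sketch. All names here are hypothetical, and, as the thread points out, the approach conflicts with keeping the smallest of several codecs, since the winner is only known after everything has been compressed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class StreamingCompressor {
    static final int BLOCK = 32 * 1024; // one CHK holds 32 KiB of data

    // Compress incrementally and emit full 32 KiB blocks as soon as the
    // codec has produced them, instead of waiting for the whole file.
    static List<byte[]> compressInBlocks(byte[] input, int chunk) {
        List<byte[]> blocks = new ArrayList<>();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            GZIPOutputStream gz = new GZIPOutputStream(buf, true); // syncFlush
            for (int off = 0; off < input.length; off += chunk) {
                gz.write(input, off, Math.min(chunk, input.length - off));
                gz.flush(); // push pending compressed bytes out of the codec
                drain(buf, blocks, false);
                // a real client would hand finished blocks to FEC/insert here
            }
            gz.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        drain(buf, blocks, true); // final, possibly short, block
        return blocks;
    }

    // Carve completed 32 KiB blocks off the buffer, keeping the remainder.
    static void drain(ByteArrayOutputStream buf, List<byte[]> out, boolean tail) {
        byte[] pending = buf.toByteArray();
        int off = 0;
        while (pending.length - off >= BLOCK) {
            byte[] b = new byte[BLOCK];
            System.arraycopy(pending, off, b, 0, BLOCK);
            out.add(b);
            off += BLOCK;
        }
        buf.reset();
        buf.write(pending, off, pending.length - off);
        if (tail && buf.size() > 0) {
            out.add(buf.toByteArray());
            buf.reset();
        }
    }

    public static void main(String[] args) {
        List<byte[]> blocks = compressInBlocks(new byte[300 * 1024], 64 * 1024);
        System.out.println(!blocks.isEmpty()); // prints "true"
    }
}
```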
>>>
>> Here is some more representative data on a dual-core system:
>>
>> real 24m55.472s
>> user 23m4.947s
>> sys 0m10.633s
>> 1884544 iso.lzma
>>
>> real 13m32.442s
>> user 12m6.937s
>> sys 0m7.784s
>> 1934324 iso.bz2
>> (My implementation of BZIP2 uses only one of the two cores.)
>>
>> real 3m19.066s
>> user 2m11.332s
>> sys 0m6.284s
>> 1935056 iso.gz
>>
>> And the original:
>> 2026416 iso
>>
>> So, we have:
>> 63325 blocks for the original
>> 60470 blocks with GZIP (4.5% gain)
>> 60447 blocks with BZIP2 (4.5% gain)
>> 58892 blocks with LZMA (7% gain)
>>
>> Of course those don't include the FEC blocks. So to sum up: yes, I think
>> it's worth spending half an hour of CPU time to "win" 4433*2=8866 blocks.
>> And that's still true on a single-core system, where we would spend 1 hour.
>
> Okay, then the current trunk code is fine. Lzma would be great, if you can
> solve the DoS issues; we'd probably use -5, I'm definitely not comfortable
> with -7 (memory needed to compress / decompress at each level):
>
> -1   2 MB    1 MB
> -2   12 MB   2 MB
> -3   12 MB   1 MB
> -4   16 MB   2 MB
> -5   26 MB   3 MB
> -6   45 MB   5 MB
> -7   83 MB   9 MB
> -8   159 MB  17 MB
> -9   311 MB  33 MB
>
> Eventually we should show a progress bar within compression. In the short
> term, it would be reasonably easy for fproxy to show "Compressing" when it
> is compressing, instead of just having no progress bar. After that it
> should move to "Starting", and after that show a progress bar. If you want
> me to deal with that I'll get around to it eventually; should I file a bug?
> The UI changes are needed for 0.8 but not for 1166.
>
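The block counts quoted above fall out directly from du's 1 KiB units and the 32 KiB CHK payload size, rounding down. A quick check of the arithmetic (the class name is mine, the figures are from the thread):

```java
public class BlockMath {
    static final long BLOCK_KIB = 32; // one CHK holds 32 KiB of data

    // du reports sizes in 1 KiB units; the thread's block counts are
    // simply size / 32, rounded down.
    static long blocks(long sizeKiB) {
        return sizeKiB / BLOCK_KIB;
    }

    public static void main(String[] args) {
        long original = blocks(2026416); // 63325 blocks
        long gz       = blocks(1935056); // 60470 blocks (4.5% gain)
        long bz2      = blocks(1934324); // 60447 blocks (4.5% gain)
        long lzma     = blocks(1884544); // 58892 blocks (7% gain)
        long saved    = original - lzma; // 4433 data blocks
        // Doubling roughly accounts for the FEC check blocks as well.
        System.out.println(saved + " data blocks, ~" + (2 * saved) + " with FEC");
        // prints "4433 data blocks, ~8866 with FEC"
    }
}
```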
We produce an event when we start the compression; are you sure we are not displaying "Compressing"? File a ticket in any case.
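The phase-event plumbing being discussed could be as simple as the sketch below. These are hypothetical names, not Freenet's actual client-event API: the insert publishes its current phase, and fproxy merely renders the latest one as a status label until a real progress bar exists.

```java
import java.util.ArrayList;
import java.util.List;

public class InsertPhase {
    enum Phase { QUEUED, COMPRESSING, STARTING, INSERTING }

    interface PhaseListener { void onPhase(Phase p); }

    private final List<PhaseListener> listeners = new ArrayList<>();
    private volatile Phase phase = Phase.QUEUED;

    void addListener(PhaseListener l) { listeners.add(l); }

    Phase phase() { return phase; }

    // Called by the insert as it moves through its lifecycle; the UI just
    // mirrors the most recent phase it has been told about.
    void setPhase(Phase p) {
        phase = p;
        for (PhaseListener l : listeners) l.onPhase(p);
    }

    public static void main(String[] args) {
        InsertPhase insert = new InsertPhase();
        insert.addListener(p -> System.out.println("status: " + p));
        insert.setPhase(Phase.COMPRESSING); // prints "status: COMPRESSING"
        insert.setPhase(Phase.STARTING);    // prints "status: STARTING"
    }
}
```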