Matthew Toseland wrote:
> On Thursday 23 October 2008 14:02, Florent Daigniere wrote:
>> Florent Daigniere wrote:
>>> Matthew Toseland wrote:
>>>> On Thursday 23 October 2008 10:39, NextGen$ wrote:
>>>>> * Daniel Cheng <j16sdiz+freenet at gmail.com> [2008-10-23 08:12:14]:
>>>>>
>>>>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <nextgens at freenetproject.org> wrote:
>>>>>>> * Matthew Toseland <toad at amphibian.dyndns.org> [2008-10-22 20:48:24]:
>>>>>>>
>>>>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
>>>>>>>>> * Matthew Toseland <toad at amphibian.dyndns.org> [2008-10-21 20:53:51]:
>>>>>>>>>> On Tuesday 21 October 2008 16:24, nextgens at freenetproject.org wrote:
>>>>>>>>>>> Author: nextgens
>>>>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
>>>>>>>>>>> New Revision: 23014
>>>>>>>>>>>
>>>>>>>>>>> Modified:
>>>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveManager.java
>>>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveStoreContext.java
>>>>>>>>>>>    trunk/freenet/src/freenet/client/ClientMetadata.java
>>>>>>>>>>>    trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
>>>>>>>>>>>    trunk/freenet/src/freenet/client/Metadata.java
>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/ClientPutter.java
>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileInserter.java
>>>>>>>>>>>    trunk/freenet/src/freenet/client/async/SplitFileInserter.java
>>>>>>>>>>>    trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
>>>>>>>>>>>    trunk/freenet/src/freenet/frost/message/FrostMessage.java
>>>>>>>>>>>    trunk/freenet/src/freenet/node/NodeARKInserter.java
>>>>>>>>>>>    trunk/freenet/src/freenet/node/TextModeClientInterface.java
>>>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/ClientPut.java
>>>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/DirPutFile.java
>>>>>>>>>>>    trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
>>>>>>>>>>> Log:
>>>>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
>>>>>>>>>>> It's still not backward compatible with stable but should be forward-compatible ;)
>>>>>>>>> [...] see r23023
>>>>>>>>>
>>>>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip now? Shouldn't there be a max size configuration above which we don't try bzip2, perhaps unless asked to via FCP? bzip2'ing ISOs could take a really long time ...
>>>>>>>>> I don't think we need one. Big files take a long time to compress, but they take a long time to insert too. I think it's worth spending a few more CPU cycles to spare the insertion of a few blocks (plus their FEC blocks).
>>>>>>>> I'm not convinced that this is acceptable from a usability point of view. Maybe we can provide a progress bar within the compression phase? On the new UI it is proposed to separate downloads which are not yet finalised (i.e. haven't fetched the last lot of metadata) from downloads that are... we could do something similar with inserts in compression.
>>>>>>>>
>>>>>>> Have a look at what I have committed. From now on the compression is fully serialized... We have one mutex and only one compression job (just like we do for FEC encoding, in fact), which means an even higher latency.
>>>>>> Is it feasible to insert some blocks of data while compressing? Gzip, bzip2 and lzma all support streaming. We can collect the output data as we feed data to them.
>>>>>>
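Daniel's streaming idea can be sketched roughly as follows. This is a hypothetical illustration, not Freenet's actual client layer: the JDK's DEFLATE codec stands in for gzip/bzip2/lzma, the 32 KiB figure is assumed to match Freenet's CHK block size, and the class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch: because the codec is a stream, compressed output can
// be cut into insert-sized blocks while the input is still being fed in.
public class StreamingCompress {
    static final int BLOCK = 32 * 1024; // assumed CHK payload size

    /** Compress `data`, handing off every complete 32 KiB block of
     *  compressed output as soon as it exists (a real client would start
     *  inserting each block here instead of collecting them in a list). */
    static List<byte[]> compressToBlocks(byte[] data) {
        final List<byte[]> blocks = new ArrayList<>();
        final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        OutputStream sink = new OutputStream() {
            @Override public void write(int b) {
                pending.write(b);
                if (pending.size() == BLOCK) {   // a full block is ready
                    blocks.add(pending.toByteArray());
                    pending.reset();
                }
            }
        };
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        if (pending.size() > 0)
            blocks.add(pending.toByteArray()); // trailing partial block
        return blocks;
    }
}
```

The catch, as discussed below, is that the smallest codec cannot be chosen until the whole file has been compressed, so early blocks would have to be inserted before knowing which codec wins.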
>>>>> Right now we attempt to compress the full data using all the compression algorithms and we keep the smallest resulting bucket. How do you plan to choose the best-performing algorithm before actually compressing the data? I don't think that we can evaluate how well algorithms compress over a single segment: it's just too small.
>>>>>
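The "compress with everything, keep the smallest bucket" strategy Florent describes looks roughly like this. A hypothetical sketch only, not Freenet's real code: since bzip2 and LZMA are not in the Java standard library, two JDK DEFLATE-based codecs stand in for the full codec list, and the class name is invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of "try every codec, keep the smallest result".
public class SmallestCodec {

    static byte[] gzip(byte[] in) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static byte[] deflateBest(byte[] in) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(
                bos, new Deflater(Deflater.BEST_COMPRESSION))) {
            out.write(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Run every candidate codec over the data and keep the smallest
     *  output; fall back to the uncompressed original if nothing shrinks. */
    static byte[] pickSmallest(byte[] in) {
        byte[] best = in;
        for (byte[] candidate : new byte[][] { gzip(in), deflateBest(in) }) {
            if (candidate.length < best.length) best = candidate;
        }
        return best;
    }
}
```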
>>>>>> As soon as we get enough compressed data for FEC, we can insert those blocks.
>>>>>> This would be a great performance improvement for large files on SMP.
>>>>>>
>>>>> That would involve rewriting most of the client layer.
>>>>>
>>>>>> Is this doable without changing the data format?
>>>>>>
>>>>> It's not about the data format; we insert the manifest at the end unless told otherwise via the earlyEncode parameter.
>>>>>
>>>>> IMHO we are debating for no real reason here: the real time taken by the compression phase is insignificant compared to the time taken by the insertion process. Sure, trunk will take at least 3 times longer than current stable before it starts inserting anything; but is that a big deal? You will need real numbers to convince me here.
>>>> I'd like some numbers ... IIRC it takes around 2 days to insert a CD-sized ISO? How long does it take to bzip2 it?
>>>>
>>> It obviously depends on various factors including how fast you can do 
>>> I/Os, the block size and the number of cores you have.
>>>
>>> Here, on what is likely to be "the worst case scenario":
>>> $time bzip2 -c iso > iso.bz2|grep real
>>> real 3m57.552s
>>> $time gzip -c iso > iso.gz|grep real
>>> real 0m46.079s
>>> $du -hs iso*
>>> 560M iso
>>> 506M iso.bz2
>>> 506M iso.gz
>>>
>>> There is no clear gain from bzip2'ing the content... but compression is worth it: we spare 54*2=108 MB (you have to count FEC blocks too to be fair)! Now if you tell me that freenet is able to insert 108 MB of data in less than 5 minutes, I will consider optimizing the compression step.
>>>
>>> There are solutions for guesstimating the efficiency of a given compression algorithm, but I am not sure they are worth implementing.
>>>
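One such guesstimate, purely illustrative and exactly the kind of heuristic whose value is doubted above, is to deflate only a fixed-size sample of the data and extrapolate the ratio. The 64 KiB sample size, class name, and choice of DEFLATE are all assumptions for the sketch.

```java
import java.util.zip.Deflater;

// Hypothetical heuristic: deflate a prefix sample of the data and use its
// compression ratio as a cheap estimate of how well a codec would do on
// the whole file. As noted above, a small sample can easily mislead.
public class CompressEstimate {
    static final int SAMPLE = 64 * 1024;

    /** Returns estimated (compressed size / original size); below 1.0
     *  means the data looks compressible. */
    static double estimateRatio(byte[] data) {
        int n = Math.min(data.length, SAMPLE);
        Deflater d = new Deflater(Deflater.BEST_SPEED);
        d.setInput(data, 0, n);
        d.finish();
        byte[] out = new byte[n + 1024]; // slack for incompressible input
        int total = 0;
        while (!d.finished() && total < out.length)
            total += d.deflate(out, total, out.length - total);
        d.end();
        return (double) total / n;
    }
}
```

A header-only sample of an ISO would likely misestimate the body, which is one reason such heuristics may not be worth implementing.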
>> Here is some more representative data on a dual-core system:
>>
>> real    24m55.472s
>> user    23m4.947s
>> sys     0m10.633s
>> 1884544 iso.lzma
>>
>> real    13m32.442s
>> user    12m6.937s
>> sys     0m7.784s
>> 1934324 iso.bz2
>> My implementation of BZIP2 uses only one of the two cores
>>
>> real    3m19.066s
>> user    2m11.332s
>> sys     0m6.284s
>> 1935056 iso.gz
>>
>> And the original:
>> 2026416 iso
>>
>> So, we have:
>>      63325 blocks for the original
>>      60470 blocks with GZIP (4.5% gain)
>>      60447 blocks with BZIP2 (4.5% gain)
>>      58892 blocks with LZMA (7% gain)
>>
>> Of course those don't include the FEC blocks. So to sum up: yes, I think it's worth spending half an hour of CPU time to "win" 4433*2=8866 blocks. And that's still true on a single-core system where we would spend 1 hour.
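The block counts above check out if the quoted sizes are in du(1) 1 KiB units and a block is assumed to hold 32 KiB, using truncating division to match the figures quoted (both units and rounding are assumptions; real block counts would round up):

```java
// Hypothetical check of the arithmetic in the summary above: sizes in KiB
// (du default units), 32 KiB assumed per Freenet block.
public class BlockMath {
    static long blocks(long sizeKiB) {
        return sizeKiB / 32; // truncating, to match the quoted figures
    }

    static double gainPercent(long origKiB, long compKiB) {
        return 100.0 * (blocks(origKiB) - blocks(compKiB)) / blocks(origKiB);
    }
}
```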
> 
> Okay, then the current trunk code is fine. LZMA would be great, if you can solve the DoS issues; we'd probably use -5, and I'm definitely not comfortable with -7:
> 
>      preset   compress mem   decompress mem
>      -1            2 MB            1 MB
>      -2           12 MB            2 MB
>      -3           12 MB            1 MB
>      -4           16 MB            2 MB
>      -5           26 MB            3 MB
>      -6           45 MB            5 MB
>      -7           83 MB            9 MB
>      -8          159 MB           17 MB
>      -9          311 MB           33 MB
> 
> Eventually we should show a progress bar within compression. In the short 
> term, it would be reasonably easy for fproxy to show "Compressing" when it is 
> compressing, instead of just having no progress bar. After that it should 
> move to "Starting", and after that show a progress bar. If you want me to 
> deal with that I'll get around to it eventually; should I file a bug? The UI 
> changes are needed for 0.8 but not for 1166.
> 

We produce an event when we start the compression; are you sure we are 
not displaying "Compressing"?
File a ticket in any case.
