Florent Daigniere wrote:
> Matthew Toseland wrote:
>> On Thursday 23 October 2008 10:39, NextGen$ wrote:
>>> * Daniel Cheng <[EMAIL PROTECTED]> [2008-10-23 08:12:14]:
>>>
>>>> On Thu, Oct 23, 2008 at 6:49 AM, NextGen$ <[EMAIL PROTECTED]> wrote:
>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-22 20:48:24]:
>>>>>
>>>>>> On Wednesday 22 October 2008 01:09, NextGen$ wrote:
>>>>>>> * Matthew Toseland <[EMAIL PROTECTED]> [2008-10-21 20:53:51]:
>>>>>>>
>>>>>>>> On Tuesday 21 October 2008 16:24, [EMAIL PROTECTED] wrote:
>>>>>>>>> Author: nextgens
>>>>>>>>> Date: 2008-10-21 15:24:47 +0000 (Tue, 21 Oct 2008)
>>>>>>>>> New Revision: 23014
>>>>>>>>>
>>>>>>>>> Modified:
>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveManager.java
>>>>>>>>>    trunk/freenet/src/freenet/client/ArchiveStoreContext.java
>>>>>>>>>    trunk/freenet/src/freenet/client/ClientMetadata.java
>>>>>>>>>    trunk/freenet/src/freenet/client/HighLevelSimpleClientImpl.java
>>>>>>>>>    trunk/freenet/src/freenet/client/Metadata.java
>>>>>>>>>    trunk/freenet/src/freenet/client/async/ClientPutter.java
>>>>>>>>>    trunk/freenet/src/freenet/client/async/SimpleManifestPutter.java
>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileFetcher.java
>>>>>>>>>    trunk/freenet/src/freenet/client/async/SingleFileInserter.java
>>>>>>>>>    trunk/freenet/src/freenet/client/async/SplitFileInserter.java
>>>>>>>>>    trunk/freenet/src/freenet/clients/http/WelcomeToadlet.java
>>>>>>>>>    trunk/freenet/src/freenet/frost/message/FrostMessage.java
>>>>>>>>>    trunk/freenet/src/freenet/node/NodeARKInserter.java
>>>>>>>>>    trunk/freenet/src/freenet/node/TextModeClientInterface.java
>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/ClientPut.java
>>>>>>>>>    trunk/freenet/src/freenet/node/fcp/DirPutFile.java
>>>>>>>>>    trunk/freenet/src/freenet/node/simulator/BootstrapPushPullTest.java
>>>>>>>>> Log:
>>>>>>>>> more work on bug #71: *** IT NEEDS TESTING! ***
>>>>>>>>> It's still not backward compatible with stable but should be
>>>>>>>>> forward-compatible ;)
>>>>>>> [...] see r23023
>>>>>>>
>>>>>>>> Do we attempt to compress all files with bzip2 as well as gzip now?
>>>>>>>> Shouldn't there be a configurable maximum size above which we don't try
>>>>>>>> bzip2, perhaps unless asked to via FCP? bzip2'ing ISOs could take a
>>>>>>>> really long time...
>>>>>>> I don't think we need one. Big files will take a long time to compress,
>>>>>>> but they will take a long time to insert too. I think it's worth spending
>>>>>>> a few more CPU cycles to spare the insertion of a few blocks (plus their
>>>>>>> FEC blocks).
>>>>>> I'm not convinced that this is acceptable from a usability point of
>>>>>> view. Maybe we can provide a progress bar for the compression phase? In
>>>>>> the new UI it is proposed to separate downloads which are not yet
>>>>>> finalised (i.e. haven't fetched the last lot of metadata) from downloads
>>>>>> that are... we could do something similar with inserts that are still
>>>>>> being compressed.
>>>>>>
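
A progress bar for the compression phase could be driven by counting how many bytes of the source the compressor has consumed so far. The following is a minimal Java sketch under that assumption; `CompressionProgress`, `ProgressInputStream` and `ProgressDemo` are hypothetical names, not existing Freenet classes, and the JDK's gzip stands in for whichever codec is actually running.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

/** Hypothetical progress callback; not an existing Freenet interface. */
interface CompressionProgress {
    void update(long bytesDone, long bytesTotal);
}

/** Counts bytes as the compressor pulls them from the source and reports them. */
class ProgressInputStream extends FilterInputStream {
    private final long total;
    private final CompressionProgress listener;
    private long done;

    ProgressInputStream(InputStream in, long total, CompressionProgress listener) {
        super(in);
        this.total = total;
        this.listener = listener;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) report(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) report(n);
        return n;
    }

    private void report(int n) {
        done += n;
        listener.update(done, total);
    }
}

class ProgressDemo {
    /** Gzips data through a ProgressInputStream; returns the last percentage reported. */
    static int compressReportingPercent(byte[] data) {
        int[] lastPercent = {0};
        CompressionProgress bar = (d, t) -> lastPercent[0] = (int) (100 * d / t);
        try (InputStream in = new ProgressInputStream(new ByteArrayInputStream(data), data.length, bar);
             GZIPOutputStream gz = new GZIPOutputStream(new ByteArrayOutputStream())) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                gz.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lastPercent[0];
    }

    public static void main(String[] args) {
        System.out.println(compressReportingPercent(new byte[1_000_000]) + "%");
    }
}
```

Because the counter sits on the input side, the same wrapper would work for any streaming compressor, and the percentage reaches 100 once the whole source has been read.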
>>>>> Have a look at what I have committed. From now on, compression is fully
>>>>> serialized: we have one mutex and only one compression job at a time
>>>>> (just like we do for FEC encoding), which means an even higher latency.
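
Serializing compression behind a single mutex, as described above, could look roughly like the following Java sketch. It is not the code from r23014: the lock and counter names are hypothetical, gzip stands in for the real codecs, and the `active`/`maxActive` counters exist only to show that the lock admits one job at a time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/** One global lock around compression, so at most one compression job runs at a time. */
class SerializedCompression {
    private static final Object COMPRESSION_LOCK = new Object(); // the single mutex
    private static int active;    // jobs currently inside the lock (only touched while holding it)
    private static int maxActive; // highest value of `active` ever observed

    static byte[] compress(byte[] data) {
        synchronized (COMPRESSION_LOCK) {
            active++;
            maxActive = Math.max(maxActive, active);
            try {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                    gz.write(data);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                active--;
            }
        }
    }

    /** Starts `jobs` threads that all try to compress at once; returns the peak concurrency seen. */
    static int runConcurrently(int jobs) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < jobs; i++) {
            Thread t = new Thread(() -> compress(new byte[1 << 20]));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (COMPRESSION_LOCK) {
            return maxActive;
        }
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent compressions: " + runConcurrently(4));
    }
}
```

The cost is the one mentioned in the message: jobs queue up behind the lock, so latency grows with the number of pending inserts.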
>>>> Is it feasible to insert some blocks of data while compressing?
>>>> Gzip, bzip2 and LZMA all support streaming. We can collect the output
>>>> data as we feed data to them.
>>>>
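
The streaming idea can be sketched in Java with the JDK's `GZIPOutputStream`: feed the input in chunks and, after each chunk, cut off every complete block of compressed output so it could be handed on for FEC encoding and insertion straight away. This is a hypothetical sketch, assuming 32 KiB blocks; bzip2 and LZMA are not part of the JDK and would need another library. Note that it commits to a single codec before any data is compressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Feeds input to gzip chunk by chunk and cuts full blocks off the compressed output as they appear. */
class StreamingBlocks {
    static final int BLOCK_SIZE = 32 * 1024; // assumed 32 KiB data blocks

    static List<byte[]> compressToBlocks(byte[] input, int chunkSize) {
        List<byte[]> blocks = new ArrayList<>();
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(pending)) {
            for (int off = 0; off < input.length; off += chunkSize) {
                gz.write(input, off, Math.min(chunkSize, input.length - off));
                // Full blocks are available here, before the rest of the input is compressed.
                takeFullBlocks(pending, blocks);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        takeFullBlocks(pending, blocks); // closing gz flushed the tail and the gzip trailer
        if (pending.size() > 0) blocks.add(pending.toByteArray()); // short final block
        return blocks;
    }

    private static void takeFullBlocks(ByteArrayOutputStream pending, List<byte[]> blocks) {
        byte[] buf = pending.toByteArray();
        int off = 0;
        while (buf.length - off >= BLOCK_SIZE) {
            blocks.add(Arrays.copyOfRange(buf, off, off + BLOCK_SIZE));
            off += BLOCK_SIZE;
        }
        pending.reset();
        pending.write(buf, off, buf.length - off); // keep the incomplete remainder
    }

    /** Checks that concatenating the blocks gives back a valid gzip stream of the input. */
    static boolean roundTrips(byte[] input, List<byte[]> blocks) {
        ByteArrayOutputStream joined = new ByteArrayOutputStream();
        for (byte[] b : blocks) joined.write(b, 0, b.length);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(joined.toByteArray()))) {
            return Arrays.equals(input, in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = new byte[300_000];
        new Random(1).nextBytes(input); // incompressible, so several blocks come out
        List<byte[]> blocks = compressToBlocks(input, 64 * 1024);
        System.out.println(blocks.size() + " blocks, round trip ok: " + roundTrips(input, blocks));
    }
}
```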
>>> Right now we attempt to compress the full data using all the compression
>>> algorithms and we keep the smallest resulting bucket. How do you plan to
>>> choose the best-performing algorithm before actually compressing the data?
>>>
>>> I don't think that we can evaluate how well the algorithms compress over a
>>> single segment: it's just too small.
>>>
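
The current behaviour described above (compress with every algorithm, keep the smallest result) can be sketched in Java as follows. Only JDK codecs are used (gzip, zlib and "none"); bzip2 and LZMA are not part of the JDK, so in this sketch they would have to be registered from another library. The class and method names are hypothetical. Keeping "none" as a candidate, and only replacing the current best on a strictly smaller result, means data that does not shrink is left uncompressed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.UnaryOperator;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

/** Sketch of "compress with every codec, keep the smallest result". */
class SmallestCodecPicker {
    interface StreamFactory { OutputStream wrap(OutputStream out) throws IOException; }

    static final class Choice {
        final String codec;
        final byte[] data;
        Choice(String codec, byte[] data) { this.codec = codec; this.data = data; }
    }

    static byte[] run(byte[] input, StreamFactory factory) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (OutputStream out = factory.wrap(buf)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Candidate codecs in preference order; ties go to the earlier entry. */
    static Map<String, UnaryOperator<byte[]>> codecs() {
        Map<String, UnaryOperator<byte[]>> m = new LinkedHashMap<>();
        m.put("none", in -> in);
        m.put("gzip", in -> run(in, GZIPOutputStream::new));
        m.put("zlib", in -> run(in, DeflaterOutputStream::new));
        // bzip2 / LZMA are not in the JDK; a third-party codec would be registered here.
        return m;
    }

    static Choice pickSmallest(byte[] input) {
        Choice best = null;
        for (Map.Entry<String, UnaryOperator<byte[]>> e : codecs().entrySet()) {
            byte[] out = e.getValue().apply(input); // compresses the full data with this codec
            if (best == null || out.length < best.data.length) {
                best = new Choice(e.getKey(), out);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Choice zeros = pickSmallest(new byte[100_000]);
        System.out.println("zeros -> " + zeros.codec + ", " + zeros.data.length + " bytes");
        byte[] noise = new byte[1000];
        new Random(42).nextBytes(noise);
        Choice random = pickSmallest(noise);
        System.out.println("noise -> " + random.codec + ", " + random.data.length + " bytes");
    }
}
```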
>>>> As soon as we have enough compressed data for FEC, we can insert it.
>>>> This would be a great performance improvement for large files on SMP.
>>>>
>>> That would involve rewriting most of the client layer.
>>>
>>>> Is this doable without changing the data format?
>>>>
>>> It's not about the data format; we insert the manifest at the end unless
>>> told otherwise by the earlyEncode parameter.
>>>
>>> IMHO we are debating for no real reason here: the real time taken by the
>>> compression phase is insignificant compared to the time taken by the
>>> insertion process. Sure, trunk will take at least 3 times longer than
>>> current stable before it starts inserting anything, but is that a big deal?
>>> You will need real numbers to convince me here.
>> I'd like some numbers... IIRC it takes around 2 days to insert a CD-sized
>> ISO? How long does it take to bzip2 it?
>>
> 
> It obviously depends on various factors, including how fast you can do
> I/O, the block size and the number of cores you have.
> 
> Here is what is likely to be "the worst-case scenario":
> $time bzip2 -c iso > iso.bz2|grep real
> real 3m57.552s
> $time gzip -c iso > iso.gz|grep real
> real 0m46.079s
> $du -hs iso*
> 560M iso
> 506M iso.bz2
> 506M iso.gz
> 
> There is no clear gain from bzip2'ing the content... but compression is
> worth it: we spare 54*2=108 MB (you have to count the FEC blocks too, to be
> fair)! Now, if you tell me that Freenet is able to insert 108 MB of data in
> less than 5 minutes, I will consider optimizing the compression step.
> 
> There are ways of guesstimating the efficiency of a given compression
> algorithm, but I am not sure they are worth implementing.
> 
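
The trade-off in the quoted message can be expressed as a break-even insert rate: compressing first pays off whenever inserting the saved data would take longer than the compression did. A small Java sketch of that arithmetic, using the figures from the message (reading the bzip2 time as 3 min 57.552 s and counting the 54 MB saving twice for FEC, as the message does); the class name is hypothetical.

```java
/** Break-even insert rate for the bzip2/gzip figures in the message above. */
class CompressionBreakEven {
    /** Rate (MB/s) at which inserting savedMB takes exactly as long as compressing did. */
    static double breakEvenRateMBps(double savedMB, double compressSeconds) {
        return savedMB / compressSeconds;
    }

    public static void main(String[] args) {
        double savedMB = (560 - 506) * 2;      // 54 MB of data, doubled for FEC as in the message
        double bzip2Seconds = 3 * 60 + 57.552; // bzip2 wall-clock time
        double gzipSeconds = 46.079;           // gzip wall-clock time
        System.out.printf("bzip2 break-even: %.2f MB/s%n", breakEvenRateMBps(savedMB, bzip2Seconds));
        System.out.printf("gzip break-even:  %.2f MB/s%n", breakEvenRateMBps(savedMB, gzipSeconds));
        // The message's own threshold: 108 MB in 5 minutes.
        System.out.printf("5-minute threshold: %.2f MB/s%n", breakEvenRateMBps(savedMB, 5 * 60));
    }
}
```

With these inputs the break-even rate is roughly 0.45 MB/s for bzip2 and 2.3 MB/s for gzip. If the earlier recollection of about 2 days for a CD-sized ISO is right, that is on the order of a few KB/s, far below either figure.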

Here is some more representative data on a dual-core system:

real    24m55.472s
user    23m4.947s
sys     0m10.633s
1884544 iso.lzma

real    13m32.442s
user    12m6.937s
sys     0m7.784s
1934324 iso.bz2
(My bzip2 implementation uses only one of the two cores.)

real    3m19.066s
user    2m11.332s
sys     0m6.284s
1935056 iso.gz

And the original:
2026416 iso

So, we have:
        63325 blocks for the original
        60470 blocks with GZIP (4.5% gain)
        60447 blocks with BZIP2 (4.5% gain)
        58892 blocks with LZMA (7% gain)

Of course, those don't include the FEC blocks. So, to sum up: yes, I think
it's worth spending half an hour of CPU time to "win" 4433*2=8866
blocks. And that's still true on a single-core system, where we would
spend an hour.
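
The block counts above can be reproduced in a few lines of Java, assuming the sizes are in KiB, a data block holds 32 KiB, counts are rounded down (which is what the figures above match), and FEC doubles the number of blocks saved, as the message assumes. The class name is hypothetical.

```java
/** Recomputes the block counts and savings quoted above from the file sizes. */
class BlockSavings {
    static final long BLOCK_KIB = 32; // assumed: sizes are in KiB and a data block holds 32 KiB

    /** Whole blocks, rounded down to match the figures above. */
    static long blocks(long sizeKiB) {
        return sizeKiB / BLOCK_KIB;
    }

    /** Blocks saved including FEC, using the message's factor of 2. */
    static long savedWithFec(long originalKiB, long compressedKiB) {
        return 2 * (blocks(originalKiB) - blocks(compressedKiB));
    }

    public static void main(String[] args) {
        long original = 2026416;
        long[] sizes = {1935056, 1934324, 1884544};
        String[] names = {"gzip", "bzip2", "lzma"};
        for (int i = 0; i < sizes.length; i++) {
            long b = blocks(sizes[i]);
            double gain = 100.0 * (blocks(original) - b) / blocks(original);
            System.out.printf("%-5s %d blocks, %.1f%% gain, %d blocks saved incl. FEC%n",
                    names[i], b, gain, savedWithFec(original, sizes[i]));
        }
    }
}
```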
_______________________________________________
Devl mailing list
Devl@freenetproject.org
http://emu.freenetproject.org/cgi-bin/mailman/listinfo/devl
