[freenet-dev] compressing files unnecessarily

Ximin Luo Mon, 14 Dec 2009 12:03:32 +0000

Florent Daigniere wrote:
> Ximin Luo wrote:
>> Florent Daigniere wrote:
>>> Anyway, how do you determine if a file is already compressed or not without
>>> actually compressing it? Did you do the maths? 
>> A heuristic that should get this right most of the time is to count how many
>> times each byte value (0x00, 0x01, 0x02, etc.) appears in the first few
>> megabytes. If the distribution is not even, then theoretically the data can
>> be compressed further.
>>
>> X
> 
> Well, that would typically fail with all the dictionary-based 
> compression algorithms I'm familiar with, as the dictionary itself can 
> be compressed further using another compression algorithm, possibly not 
> a dictionary-based one.


How big could the dictionary be? We could run the count on the last few
megabytes instead, or, say, on a random interval in the last 3/4 of the file.
Or we could take several random samples from across the file, but that would
take more time.
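
A rough sketch of what I mean, in Python. The names (byte_entropy, sample_tail,
looks_compressible), the 1 MiB sample size and the 7.9 bits/byte threshold are
all made up for illustration, and I am using Shannon entropy of the byte
histogram as one possible way to measure "not even"; nothing here is tuned.

```python
import math
import os
import random
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value appears
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sample_tail(path: str, sample_size: int = 1 << 20, rng=random) -> bytes:
    """Read one sample starting at a random offset in the last 3/4 of the file."""
    size = os.path.getsize(path)
    start_min = size // 4
    # For files shorter than the sample, fall back to reading from start_min to EOF.
    start_max = max(start_min, size - sample_size)
    offset = rng.randint(start_min, start_max)
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(sample_size)


def looks_compressible(path: str, threshold: float = 7.9) -> bool:
    """Guess that compression is worthwhile if the sampled bytes are visibly uneven.

    The threshold is an assumed value, not a measured one.
    """
    sample = sample_tail(path)
    return bool(sample) and byte_entropy(sample) < threshold
```

Note this only catches one kind of redundancy: an uneven histogram means the
data can be compressed further, but an even one proves nothing. For example,
the bytes 0x00..0xFF repeated over and over have a perfectly flat histogram
and still compress very well.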

Also, I should think video/audio compression is somewhat more sophisticated
than dictionary coding? JPEG, for example, uses discrete cosine transforms...

X
