Ximin Luo wrote:
> Florent Daigniere wrote:
>> Ximin Luo wrote:
>>> Florent Daigniere wrote:
>>>> Anyway, how do you determine if a file is already compressed or not without
>>>> actually compressing it? Did you do the maths?
>>> An heuristic that should get this right most of the time is just count how many
>>> times each byte value (0x00, 0x01, 0x02) etc appears in the first few
>>> megabytes. If the distribution is not even then theoretically the data can be
>>> compressed further.
>>>
>>> X
>> Well, that would typically fail with all the dictionary-based
>> compression algorithms I'm familiar with... As the dictionary can be
>> compressed further using another compression algorithm, possibly not
>> dictionary based.
>
> How big could the dictionary be? We could do it for the last few megabytes
> then, or say, a random interval in the last 3/4 of the file? Or take random
> samples from the file, but that would take more time.

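
The byte-counting heuristic quoted above can be sketched roughly as follows. This is only an illustration in Python: the names `byte_entropy` and `looks_compressible` and the 7.5 bits/byte cutoff are made up for the example and are not taken from any existing code. The caller is assumed to pass in the sample (for instance the first few megabytes of the file).

```python
import math
from collections import Counter


def byte_entropy(sample: bytes) -> float:
    """Shannon entropy of the byte-value histogram, in bits per byte (0.0 to 8.0)."""
    if not sample:
        return 0.0
    total = len(sample)
    counts = Counter(sample)  # how many times each byte value 0x00..0xFF appears
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_compressible(sample: bytes, threshold: float = 7.5) -> bool:
    """Guess that the data can be compressed further if the histogram is uneven.

    The threshold is an assumed cutoff, chosen only for illustration.
    """
    return byte_entropy(sample) < threshold


# Plain text has a very uneven byte distribution, so it is flagged as compressible.
print(looks_compressible(b"hello world " * 1000))
```

An even histogram does not prove the data is incompressible: the sequence 0x00..0xFF repeated over and over scores a full 8.0 bits per byte here, yet is trivially compressible. So this only catches the "obviously not compressed" case.
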
The problem with anything random is that the behaviour would then no longer be determined by the file alone. If you insert a file and then reinsert it, you expect to get the same blocks (and hopefully the same key altogether).

> Also, I should think video/audio compression is somewhat more sophisticated
> than dictionary? JPEG uses discrete cosine transforms...
>
> X

- Volodya

-- 
http://freedom.libsyn.com/ Echo of Freedom, Radical Podcast
http://www.freedomporn.org/ Freedom Porn, anarchist and activist smut

"None of us are free until all of us are free." ~ Mihail Bakunin