Florent Daigniere wrote:
> Ximin Luo wrote:
>> Florent Daigniere wrote:
>>> Anyway, how do you determine if a file is already compressed or not without
>>> actually compressing it? Did you do the maths?
>> An heuristic that should get this right most of the time is just count how
>> many times each byte value (0x00, 0x01, 0x02) etc appears in the first few
>> megabytes. If the distribution is not even then theoretically the data can be
>> compressed further.
>>
>> X
>
> Well, that would typically fail with all the dictionary-based
> compression algorithms I'm familiar with... As the dictionary can be
> compressed further using another compression algorithm, possibly not
> dictionary based.

How big could the dictionary be? We could count over the last few megabytes
instead, or over a random interval in the last 3/4 of the file. Or we could
take random samples from across the file, but that would take more time.

Also, I should think video/audio compression is somewhat more sophisticated
than dictionary coding? JPEG uses discrete cosine transforms...

X
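
A rough sketch of the byte-count idea, combined with random samples taken from the last 3/4 of the file. Everything here is an assumption for illustration: the function names, the sample count and size, and especially the 7.5 bits/byte threshold, which is arbitrary. It measures how uneven the byte distribution is as Shannon entropy (8.0 means perfectly even). It only looks at single-byte frequencies, so it would not notice the longer-range redundancy Florent mentions.

```python
import math
import os
import random
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value appears
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sample_tail(path, n_samples=16, sample_size=64 * 1024, seed=None):
    """Concatenate random chunks read from the last 3/4 of the file."""
    rng = random.Random(seed)
    size = os.path.getsize(path)
    start = size // 4  # skip the first quarter (headers, embedded tables, ...)
    last_offset = max(start, size - sample_size)
    chunks = []
    with open(path, "rb") as f:
        for _ in range(n_samples):
            f.seek(rng.randint(start, last_offset))
            chunks.append(f.read(sample_size))
    return b"".join(chunks)


def looks_compressible(path, threshold=7.5, **sample_args):
    """Guess: an uneven byte distribution (low entropy) means more compression is possible."""
    return byte_entropy(sample_tail(path, **sample_args)) < threshold
```

Calling `looks_compressible("some_file")` (hypothetical path) reads about 1 MiB in total with the defaults, so it should stay cheap even on large files. Samples may overlap, which seems acceptable for a heuristic.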