Hi all,

Recently I've been investigating various ways to improve FLAC compression, and I've stumbled upon a rather small change with large implications.

Flake, an alternative compressor using the FLAC format, has always provided better compression than FLAC. I've found out why: Flake uses doubles (64-bit floating point) for calculating autocorrelation values, while FLAC uses regular floats (32-bit floating point). The largest problem with implementing this in FLAC is that the intrinsics routines (for SSE and VSX) have to be rewritten.

I've done quite a bit of testing and comparing, see the next two PDFs:

http://www.audiograaf.nl/misc_stuff/double-autoc-with-sse2-intrinsics.pdf
http://www.audiograaf.nl/misc_stuff/double-autoc-with-sse2-intrinsics-per-track.pdf

There are four lines, each running from setting -4 as the rightmost (fastest) point through -5, -6 and -7 to -8 as the leftmost (slowest) point.

- the dark blue line is current git
- the green line is current git, but with the SSE intrinsics for autocorrelation calculation disabled
- the light blue line is calculating autocorrelation in doubles instead of floats
- the red line is calculating autocorrelation in doubles, but with new SSE2 intrinsics routines

As you can see in the PDFs, the overall gain for setting -4 is large (0.3 percentage points, or 0.5%) with minimal slowdown. At higher settings the gain shrinks while the slowdown grows.

The -per-track PDF shows that the gain is highly dependent on the kind of audio being compressed. Tracks with strong tonal components, like piano music (14 and 15), benefit the most. Orchestral music (2, 6, 9 and 10) and electronic music (4 and 13) benefit to varying degrees. Music with noisier content, like metal (3, 5 and 12), sees almost no benefit. However, in the tracks that do benefit, gains can be large. Track 15, which is piano music, sees a gain of 2.2 percentage points (5%) for setting -4 and 1 percentage point (2%) for -8.

Code is here: https://github.com/ktmf01/flac/tree/autoc-sse2

Before I send a pull request, I'd like to discuss a choice that has to be made.

I see a few options:

- Don't switch to autoc[] as doubles: keep the current speed and ignore the possible compression gain.
- Switch to autoc[] as doubles, but keep the current intrinsics routines. This means some platforms (those with SSE but not SSE2, or with VSX) will get less compression, but won't see a large slowdown.
- Switch to autoc[] as doubles, but remove the current SSE intrinsics and disable the VSX intrinsics until someone updates them (I don't have any POWER8 or POWER9 hardware to test on). This means all platforms will get the same compression, but some (those with SSE but not SSE2, or with VSX) will see a large slowdown.

Thanks in advance for your replies and comments on this.

Kind regards,
Martijn van Beurden
_______________________________________________
flac-dev mailing list
flac-dev@xiph.org
http://lists.xiph.org/mailman/listinfo/flac-dev
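
P.S. For anyone who wants to see the float-versus-double effect in isolation, below is a minimal sketch. It is not the code from the branch above and not libFLAC's actual routine: the function names (autoc_float, autoc_double, fill_tone, max_rel_diff) are hypothetical, no window is applied, and the test tone comes from a simple two-term recurrence rather than real audio. The only thing it is meant to show is that the same lag products can be accumulated either in a 32-bit or in a 64-bit sum.

```c
#include <assert.h>
#include <stddef.h>

/* Autocorrelation for lags 0..max_lag-1, accumulating in a 32-bit float.
 * Hypothetical helper, only meant to mimic single-precision accumulation. */
void autoc_float(const float *x, size_t n, size_t max_lag, double *out)
{
    assert(max_lag <= n);
    for (size_t lag = 0; lag < max_lag; lag++) {
        float sum = 0.0f;
        for (size_t i = lag; i < n; i++)
            sum += x[i] * x[i - lag];
        out[lag] = sum;
    }
}

/* Same computation, but products and sum are carried in 64-bit doubles. */
void autoc_double(const float *x, size_t n, size_t max_lag, double *out)
{
    assert(max_lag <= n);
    for (size_t lag = 0; lag < max_lag; lag++) {
        double sum = 0.0;
        for (size_t i = lag; i < n; i++)
            sum += (double)x[i] * (double)x[i - lag];
        out[lag] = sum;
    }
}

/* Fill x with a tonal test signal using the recurrence
 * y[i] = k * y[i-1] - y[i-2], which yields a sinusoid for |k| < 2.
 * The value of k is an arbitrary choice by the caller. */
void fill_tone(float *x, size_t n, double k)
{
    double prev2 = 0.0, prev1 = 1.0;
    for (size_t i = 0; i < n; i++) {
        x[i] = (float)prev2;
        double next = k * prev1 - prev2;
        prev2 = prev1;
        prev1 = next;
    }
}

/* Largest absolute difference between a[] and b[], scaled by b[0].
 * Assumes b[0] > 0 (lag 0 of a non-silent signal). */
double max_rel_diff(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst / b[0];
}
```

To try it, fill a block of, say, 4096 samples with fill_tone(x, 4096, 1.99), run both routines for a handful of lags, and print max_rel_diff of the two results. How large the difference is depends on the signal and on the compiler: on builds that evaluate float expressions in wider precision (see FLT_EVAL_METHOD), the two routines may agree exactly.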