It is relatively easy to convert some of the SSE2/3/4 code to AVX2: just use AVX2 intrinsics instead of the SSE ones within the existing logic of the functions. Unfortunately my CPU doesn't have AVX2, but today I managed to briefly test the AVX2 code on an i5 Haswell CPU. I wasn't able to run the full test suite on Haswell, but the new code seems to work correctly. The results of a quick performance test are:
16-bit WAV encoding: ~20% speed increase
24-bit WAV encoding: ~40% speed increase

The speed increase isn't impressive for 16-bit input, and this code requires Haswell. Still, it is some speed improvement, at the cost of another increase in the size of the executable files (by 20-30 kB). What do you think?

Also, the new code requires AVX CPU/OS support detection code to be added to cpu.c. I'd like to simplify cpu.c slightly further before this, for example by removing the 3DNow code, because it's hardly relevant these days.

_______________________________________________
flac-dev mailing list
flac-dev@xiph.org
http://lists.xiph.org/mailman/listinfo/flac-dev
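
For reference, the AVX CPU/OS support detection mentioned above could look roughly like the sketch below. It is not the code that is in (or planned for) cpu.c: the names `has_avx2` and `read_xcr0` are made up for illustration, and it assumes GCC or Clang on x86 (it relies on their `<cpuid.h>` helpers and inline asm). The point is that checking the CPUID AVX2 bit alone is not enough: the OS must also have enabled saving of the YMM registers, which is reported through OSXSAVE and XCR0.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang: __get_cpuid, __get_cpuid_max, __cpuid_count */

/* Hypothetical helper: read XCR0 with XGETBV.
 * Only call this after CPUID has reported OSXSAVE. */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical helper: returns 1 if AVX2 code can be used, else 0. */
static int has_avx2(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1: ECX bit 27 = OSXSAVE, ECX bit 28 = AVX. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)) || !(ecx & (1u << 28)))
        return 0;

    /* XCR0 bit 1 (SSE state) and bit 2 (AVX state) must both be
     * enabled by the OS, otherwise YMM registers are not preserved. */
    if ((read_xcr0() & 0x6) != 0x6)
        return 0;

    /* CPUID leaf 7, subleaf 0: EBX bit 5 = AVX2. */
    if (__get_cpuid_max(0, 0) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx & (1u << 5)) != 0;
}
#else
/* Non-x86 builds: AVX2 is never available. */
static int has_avx2(void) { return 0; }
#endif
```

A caller would evaluate `has_avx2()` once at startup and pick the AVX2 or the SSE routines based on the result.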