It is relatively easy to convert some SSE2/3/4 code into AVX2: just
use AVX2 intrinsics instead of the SSE ones and keep the logic of the
functions unchanged.
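
To illustrate what such a conversion can look like, here is a minimal
sketch. It is not taken from libFLAC: the functions scale_scalar,
scale_sse41, scale_avx2 and scale are hypothetical, and the GCC/Clang
target attribute and __builtin_cpu_supports are assumed so that the file
builds without extra compiler flags. The shift count is assumed to be
in the range 0..31.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain C reference: out[i] = (in[i] * coeff) >> shift, keeping the low
 * 32 bits of the product (same as the SIMD mullo instructions).
 * Assumes arithmetic right shift of negative values, as GCC/Clang do. */
static void scale_scalar(const int32_t *in, int32_t *out, size_t n,
                         int32_t coeff, int shift)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int32_t)((uint32_t)in[i] * (uint32_t)coeff) >> shift;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* SSE4.1 version: four 32-bit lanes per iteration. */
__attribute__((target("sse4.1")))
static void scale_sse41(const int32_t *in, int32_t *out, size_t n,
                        int32_t coeff, int shift)
{
    const __m128i c = _mm_set1_epi32(coeff);
    const __m128i cnt = _mm_cvtsi32_si128(shift);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
        v = _mm_sra_epi32(_mm_mullo_epi32(v, c), cnt);
        _mm_storeu_si128((__m128i *)(out + i), v);
    }
    scale_scalar(in + i, out + i, n - i, coeff, shift); /* tail */
}

/* AVX2 version: same logic, eight 32-bit lanes per iteration.
 * Only the register type and the intrinsic prefixes change. */
__attribute__((target("avx2")))
static void scale_avx2(const int32_t *in, int32_t *out, size_t n,
                       int32_t coeff, int shift)
{
    const __m256i c = _mm256_set1_epi32(coeff);
    const __m128i cnt = _mm_cvtsi32_si128(shift); /* count stays 128-bit */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(in + i));
        v = _mm256_sra_epi32(_mm256_mullo_epi32(v, c), cnt);
        _mm256_storeu_si256((__m256i *)(out + i), v);
    }
    scale_scalar(in + i, out + i, n - i, coeff, shift); /* tail */
}
#endif

/* Hypothetical dispatcher: pick the widest version the CPU can run. */
static void scale(const int32_t *in, int32_t *out, size_t n,
                  int32_t coeff, int shift)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2")) {
        scale_avx2(in, out, n, coeff, shift);
        return;
    }
    if (__builtin_cpu_supports("sse4.1")) {
        scale_sse41(in, out, n, coeff, shift);
        return;
    }
#endif
    scale_scalar(in, out, n, coeff, shift);
}
```

The two SIMD functions differ only in vector width and intrinsic names,
which is why this kind of port is mostly mechanical.
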
Unfortunately my CPU doesn't have AVX2, but today I managed to briefly
test the AVX2 code on an i5 Haswell CPU. I wasn't able to run the
full test suite on Haswell, but it seems that the new code works correctly.
The results of a quick performance test are:

16-bit WAV encoding: ~20% speed increase
24-bit WAV encoding: ~40% speed increase

The speed increase isn't impressive for 16-bit input, and
this code requires a Haswell CPU. Still, it is a speed
improvement, at the cost of a further increase in the size
of the executable files (by 20-30 kB).

What do you think?


Also, the new code requires AVX CPU/OS support detection code to be added
to cpu.c. I'd like to simplify cpu.c slightly further before this, for
example by removing the 3DNow! code, because it's hardly relevant these days.
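
For reference, the usual AVX detection sequence can be sketched as
follows. This is only an illustration, not the code proposed for cpu.c:
the names read_xcr0, cpu_has_avx and cpu_has_avx2 are hypothetical, and
GCC/Clang with <cpuid.h> on x86 is assumed. The CPU must report the
OSXSAVE and AVX bits in CPUID leaf 1, the OS must have enabled the XMM
and YMM state in XCR0, and AVX2 itself is reported in CPUID leaf 7.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Read XCR0 with XGETBV (ECX = 0). The opcode is emitted as raw bytes
 * so that old assemblers without the mnemonic still accept it. */
static uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile(".byte 0x0f, 0x01, 0xd0"
                     : "=a"(eax), "=d"(edx)
                     : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

/* Returns 1 if both the CPU and the OS support AVX, else 0. */
static int cpu_has_avx(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27))) /* OSXSAVE: XGETBV is usable */
        return 0;
    if (!(ecx & (1u << 28))) /* AVX supported by the CPU */
        return 0;
    /* XCR0 bit 1 = XMM state, bit 2 = YMM state; the OS must save both. */
    return (read_xcr0() & 0x6) == 0x6;
}

/* Returns 1 if AVX2 can be used, else 0. */
static int cpu_has_avx2(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!cpu_has_avx())
        return 0;
    if (__get_cpuid_max(0, 0) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx >> 5) & 1; /* CPUID.(EAX=7,ECX=0):EBX bit 5 = AVX2 */
}
#else
/* Non-x86 or unknown compiler: report no AVX support. */
static int cpu_has_avx(void)  { return 0; }
static int cpu_has_avx2(void) { return 0; }
#endif
```

The XCR0 check is the OS part of the detection: without it, AVX code
could run on a system that does not preserve the YMM registers across
context switches.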
_______________________________________________
flac-dev mailing list
flac-dev@xiph.org
http://lists.xiph.org/mailman/listinfo/flac-dev
