Hi,

Comparing the performance of fft-test and fft-fixed-test on a Cortex-A8 (with NEON support enabled), I see only a very small performance increase with the 16-bit fixed-point version over the float version, regardless of the FFT size (64, 256, 4096). I didn't find any performance numbers in Måns' original patch post.


The obvious question is: what limits the performance of the fixed-point implementation? My assumption is that, for many of the operations involved, it should be possible to process twice as many elements in the same amount of time.
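
To make that assumption concrete, here is a rough sketch in C of the lane arithmetic behind it. It only counts how many elements fit in one 128-bit NEON Q register; the helper names are hypothetical (nothing here comes from libav), and it deliberately ignores anything that could eat into the gain, such as widening multiplies, rounding, or shuffles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Width in bytes of one NEON quadword (Q) register: 128 bits. */
#define Q_REGISTER_BYTES 16

/* Hypothetical helper (not part of libav): number of elements of the
 * given size that fit in one Q register. */
static size_t lanes_per_q_register(size_t elem_size)
{
    /* Only element sizes that evenly divide the register make sense. */
    assert(elem_size > 0 && Q_REGISTER_BYTES % elem_size == 0);
    return Q_REGISTER_BYTES / elem_size;
}

/* Idealised ratio of 16-bit lanes to float lanes per register.
 * Assumption: every operation maps 1:1 between the two versions,
 * which a real fixed-point FFT kernel may not achieve. */
static size_t ideal_int16_speedup(void)
{
    return lanes_per_q_register(sizeof(int16_t)) /
           lanes_per_q_register(sizeof(float));
}
```

With a 4-byte float this gives 4 float lanes against 8 16-bit lanes per register, i.e. an ideal factor of 2 for operations that vectorise identically in both versions.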

(The underlying data type isn't changed for the fixed-point test, i.e. the data is not 16-bit packed. For small sizes, though, the L1 data cache should be pretty warm anyway, so I don't suspect that the implementation is memory-bound.)


Thanks,
Orjan

--
Orjan Friberg
FlatFrog Laboratories AB
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
