Hi,
Comparing the performance of fft-test and fft-fixed-test on a Cortex-A8
(with NEON support enabled), I see only a very small performance increase
with the 16-bit fixed-point version compared to the float version,
regardless of the FFT size (64, 256, 4096). I didn't see any
performance numbers in Måns' original patch post.
The obvious question is: what limits the performance of the fixed-point
implementation? My assumption is that, for many of the operations
involved, it should be possible to process twice the number of elements
in the same amount of time.
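To make that assumption concrete, here is a minimal sketch (not taken from the libav sources; the helper names are hypothetical) of the lane arithmetic behind it. A NEON Q register is 128 bits wide, so it holds four 32-bit floats or eight 16-bit integers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NEON Q registers are 128 bits (16 bytes) wide. */
enum { Q_REGISTER_BYTES = 16 };

/* Hypothetical helper: how many elements of the given size fit in one
 * Q register, i.e. how many elements one vector instruction can touch. */
static size_t lanes_per_q_register(size_t element_bytes)
{
    return Q_REGISTER_BYTES / element_bytes;
}

/* Hypothetical check: 16-bit elements give twice the lanes of 32-bit floats. */
static void check_lane_counts(void)
{
    assert(lanes_per_q_register(sizeof(int16_t)) ==
           2 * lanes_per_q_register(sizeof(float)));
}
```

This factor of two is only an upper bound per instruction. Any step that widens 16-bit inputs to 32-bit intermediates (for example a widening multiply) produces only four results per Q register, which may eat into the expected gain.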
(The underlying data type isn't changed for the fixed-point test, i.e.
the data is not 16-bit packed, but for small sizes the L1 data cache
should be pretty warm anyway, so I don't suspect that the implementation
is throttled by memory.)
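As a rough check of the cache argument, the sketch below computes the size of the data buffer for each FFT size. It assumes each complex sample occupies two 32-bit components (8 bytes), matching the unpacked layout described above; the helper name is hypothetical and twiddle tables are not counted:

```c
#include <stddef.h>

/* Assumption: one complex sample = two 32-bit components (re, im),
 * i.e. the data is not packed to 16 bits. */
enum { BYTES_PER_COMPLEX_SAMPLE = 8 };

/* Hypothetical helper: bytes occupied by the data buffer of an
 * FFT with fft_size complex points. */
static size_t fft_buffer_bytes(size_t fft_size)
{
    return fft_size * BYTES_PER_COMPLEX_SAMPLE;
}
```

Under that assumption the buffers are 512 bytes, 2 KiB and 32 KiB for sizes 64, 256 and 4096. The Cortex-A8 L1 data cache is configurable as 16 KB or 32 KB, so the warm-cache reasoning holds comfortably for 64 and 256 points but not necessarily for 4096.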
Thanks,
Orjan
--
Orjan Friberg
FlatFrog Laboratories AB