Hi Jamie,
Judging from some simple tests I've done with SSE intrinsics using GCC
on Mac OS X, I would say that quite impressive performance gains can be
achieved.
Now why GCC cannot cope with ARM NEON intrinsics, that's another
question. Did you see the example on http://hilbert-space.de/?p=22 under
"First Results"? What comes out of the compiler is just awful! And that
matches my own experience with GCC and intrinsics quite well - it works
fine with SSE on Intel CPUs but fails completely with NEON on ARM.
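
For anyone who hasn't written intrinsics by hand, here is a minimal sketch of the kind of code being compared: the same multiply-add loop written once with SSE intrinsics and once with NEON intrinsics, with a plain scalar loop as fallback. The function name mul_add and the unaligned loads are assumptions made for illustration only; this is not the code from the linked article or from my tests, and it says nothing about what a particular GCC version generates from it.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__)
  #include <xmmintrin.h>   // SSE intrinsics (x86)
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>    // NEON intrinsics (ARM)
#endif

// Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i in [0, n).
void mul_add(const float* a, const float* b, const float* c,
             float* out, std::size_t n)
{
    assert(n == 0 || (a && b && c && out));
    std::size_t i = 0;
#if defined(__SSE__)
    // Process four floats per iteration; unaligned loads keep the sketch simple.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Same structure with NEON: vmlaq_f32(acc, x, y) computes acc + x * y.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t vc = vld1q_f32(c + i);
        vst1q_f32(out + i, vmlaq_f32(vc, va, vb));
    }
#endif
    // Scalar tail, or the whole loop when neither instruction set is enabled.
    for (; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

Whether the compiler turns either loop into tight machine code is exactly the open question; looking at the generated assembly (for example with -S) is the only way to know.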
Best regards,
Fritz
On 26.09.2012 14:47, Jamie Bullock wrote:
Hi Dan,
Similar question to Fritz, but do you know of any benchmarks to back up the
claim of massive performance improvements on Intel chips?
best,
Jamie
--
http://jamiebullock.com
On 25 Sep 2012, at 08:58, Dan Stowell <[email protected]> wrote:
Hi,
FWIW, let me mention this lib by a colleague of mine:
"nova-simd" is a C++ header-only template library for taking advantage
of SIMD instructions in the kind of DSP processes used in audio
processing (i.e. not just basic stuff like vectorised mul+add and loop
unrolling, but also some more audio-specific things like ramps,
distortions, etc).
It's created by Tim Blechmann and it lives here:
https://github.com/timblechmann/nova-simd
It was originally implemented for SSE and SSE2 - we use it in
SuperCollider and it gives us massive performance improvements on
Intel chips. It's got ARM NEON support too - I don't think I've seen any ARM
benchmarks of it yet.
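
To give an idea of what an "audio-specific" SIMD operation such as a ramp can look like, here is a rough sketch of a linear gain ramp applied to a buffer. It is written directly against SSE/NEON intrinsics with a scalar fallback; the name apply_gain_ramp and its parameters are made up for this example and are not taken from nova-simd, whose actual interface should be looked up in the repository.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__)
  #include <xmmintrin.h>
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
  #include <arm_neon.h>
#endif

// Hypothetical helper (not nova-simd API):
// out[i] = in[i] * (start + slope * i) for i in [0, n).
void apply_gain_ramp(const float* in, float* out, std::size_t n,
                     float start, float slope)
{
    assert(n == 0 || (in && out));
    std::size_t i = 0;
#if defined(__SSE__)
    // Hold four consecutive gain values and advance them by 4 * slope per step.
    __m128 gain = _mm_setr_ps(start, start + slope,
                              start + 2.0f * slope, start + 3.0f * slope);
    const __m128 step = _mm_set1_ps(4.0f * slope);
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(out + i, _mm_mul_ps(_mm_loadu_ps(in + i), gain));
        gain = _mm_add_ps(gain, step);
    }
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    const float first[4] = {start, start + slope,
                            start + 2.0f * slope, start + 3.0f * slope};
    float32x4_t gain = vld1q_f32(first);
    const float32x4_t step = vdupq_n_f32(4.0f * slope);
    for (; i + 4 <= n; i += 4) {
        vst1q_f32(out + i, vmulq_f32(vld1q_f32(in + i), gain));
        gain = vaddq_f32(gain, step);
    }
#endif
    // Scalar tail, or the whole loop when no SIMD instruction set is enabled.
    for (; i < n; ++i)
        out[i] = in[i] * (start + slope * static_cast<float>(i));
}
```

The vector paths accumulate the gain, so over long buffers they can drift slightly from the directly computed scalar value; for typical audio block sizes that is usually acceptable, but it is a design choice worth knowing about.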
Dan
On 25/09/12 01:54, Chris Townsend wrote:
I'm wondering if any of you have experience doing floating point DSP
processing on a moderately recent ARM processor, such as the Cortex-A8
or Cortex-A9? I keep hearing about how powerful ARM processors have
become in recent years, and that they have an exceptionally good
price-to-performance ratio. For DSP-type processing I don't see a whole lot
of information out there, but I've found some floating point
benchmarks and the results seem to be extremely poor. Using the Linpack
benchmark, this paper shows 23 MFLOPS for a 600MHz A8 processor, versus
almost 1 GFLOPS for a 1.6GHz Intel Atom!
http://www.slideshare.net/napoleaninlondon/arm-cortex-a8-vs-intel-atomarchitectural-and-benchmark-comparisons
Apparently the Cortex-A8 floating point processing is not even
pipelined, so it's no wonder that FPU performance is dreadful! I also
don't know if this benchmark is using the NEON SIMD instructions, but
even doubling or quadrupling this figure is still very poor. The A9
processor has a pipelined FPU, so potentially that could make a huge
difference. The latest generation Cortex-A15 processor has dual SIMD
floating point execution units, which sounds good on paper, but I
haven't found any real world data on that.
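
To put the figures above on a per-clock basis, the arithmetic can be sketched as below. The helper name flops_per_cycle is invented for this example, and "almost a GFLOP" is rounded up to 1000 MFLOPS as an assumption, so the Atom number is an upper bound rather than a measured value.

```cpp
#include <cassert>

// Sustained floating-point operations per clock cycle.
// mflops: benchmark result in MFLOPS; mhz: core clock in MHz.
double flops_per_cycle(double mflops, double mhz)
{
    assert(mhz > 0.0);
    return mflops / mhz;
}
```

With these inputs, flops_per_cycle(23.0, 600.0) is about 0.038 for the A8 and flops_per_cycle(1000.0, 1600.0) is 0.625 for the Atom, roughly a 16x gap per clock. Even a quadrupled A8 figure, flops_per_cycle(92.0, 600.0), comes to about 0.15, still around a quarter of the Atom's per-clock throughput.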
Thoughts?
Thanks,
Chris
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp
links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
--
Dan Stowell
Postdoctoral Research Assistant
Centre for Digital Music
Queen Mary, University of London
Mile End Road, London E1 4NS
http://www.elec.qmul.ac.uk/digitalmusic/people/dans.htm
http://www.mcld.co.uk/