Hi Jamie,

Judging from some simple tests I've done with SSE intrinsics using GCC on Mac OS X, I would say that quite impressive performance gains can be achieved.

Why GCC cannot cope with ARM NEON intrinsics is another question. Did you see the example on http://hilbert-space.de/?p=22 under "First Results"? What comes out of the compiler is just awful! And that matches my own experience with GCC and intrinsics quite well: it works fine with SSE on Intel CPUs but fails completely with NEON on ARM.

Best regards,
Fritz

On 26.09.2012 14:47, Jamie Bullock wrote:

Hi Dan,

Similar question to Fritz, but do you know of any benchmarks to back up the
claim of massive performance improvements on Intel chips?

best,

Jamie

--
http://jamiebullock.com

On 25 Sep 2012, at 08:58, Dan Stowell <[email protected]> wrote:

Hi,

FWIW, let me mention this lib by a colleague of mine:

"nova-simd" is a C++ header-only template library for taking advantage
of SIMD instructions in the kind of DSP processes used in audio
processing (i.e. not just basic stuff like vectorised mul+add and loop
unrolling, but also some more audio-specific things like ramps,
distortions, etc).
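
As a rough illustration of the "vectorised mul+add" case mentioned above, here is a minimal sketch using raw SSE intrinsics with a scalar fallback. The function name `mul_add` and its signature are made up for this example and are not nova-simd's API; nova-simd is described above as a template library, so its own interface will look different.

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics (_mm_loadu_ps, _mm_mul_ps, ...)
#endif

// Hypothetical helper, not part of nova-simd: out[i] = a[i] * b[i] + c[i].
void mul_add(const float* a, const float* b, const float* c,
             float* out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE__)
    // Process four floats per iteration; unaligned loads keep the sketch simple.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
#endif
    // Scalar tail, and the full loop on targets where SSE is not available.
    for (; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

Whether this beats what the compiler's auto-vectoriser produces depends on the compiler and flags, so it is worth measuring rather than assuming.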

It's created by Tim Blechmann and it lives here:
https://github.com/timblechmann/nova-simd

It was originally implemented for SSE and SSE2 - we use it in
SuperCollider and it gives us massive performance improvements on
Intel chips. It's got ARM NEON support too, though I don't think I've
seen any ARM benchmarks of it yet.

Dan


On 25/09/12 01:54, Chris Townsend wrote:
I'm wondering if any of you have experience doing floating point DSP
processing on a moderately recent ARM processor, such as the Cortex-A8
or Cortex-A9?  I keep hearing about how powerful ARM processors have
become in recent years, and that they have an exceptionally good
price-to-performance ratio.  For DSP-type processing I don't see a whole
lot of information out there, but I've found some floating point
benchmarks and the results seem to be extremely poor.  Using the Linpack
benchmark, this paper shows 23 MFLOPS for a 600MHz A8 processor, versus
almost 1 GFLOPS for a 1.6GHz Intel Atom!

http://www.slideshare.net/napoleaninlondon/arm-cortex-a8-vs-intel-atomarchitectural-and-benchmark-comparisons
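
To put the two figures on the same footing, one can divide by the clock rate. A small sketch of that arithmetic follows; the helper name is hypothetical, and the 1000 MFLOPS value is only an upper-bound stand-in for "almost a GFLOP", not a number taken from the paper.

```cpp
// Floating point operations per clock cycle: MFLOPS divided by MHz.
// Hypothetical helper for illustration only.
constexpr double flops_per_cycle(double mflops, double clock_mhz)
{
    return mflops / clock_mhz;
}

// Cortex-A8 figure quoted above: 23 MFLOPS at 600 MHz.
constexpr double a8_per_cycle = flops_per_cycle(23.0, 600.0);

// Atom: ASSUMED upper bound of 1000 MFLOPS at 1600 MHz.
constexpr double atom_per_cycle = flops_per_cycle(1000.0, 1600.0);
```

With these inputs the A8 comes out at roughly 0.04 floating point operations per cycle against at most about 0.6 for the Atom, so the gap stays at an order of magnitude or more even after accounting for the clock difference.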

Apparently the Cortex-A8 floating point processing is not even
pipelined, so it's no wonder that FPU performance is dreadful!  I also
don't know if this benchmark is using the NEON SIMD instructions, but
even doubling or quadrupling this figure is still very poor.  The A9
processor has a pipelined FPU, so potentially that could make a huge
difference.  The latest generation Cortex-A15 processor has dual SIMD
floating point execution units, which sounds good on paper, but I
haven't found any real world data on that.

Thoughts?


Thanks,
Chris
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp


--
Dan Stowell
Postdoctoral Research Assistant
Centre for Digital Music
Queen Mary, University of London
Mile End Road, London E1 4NS
http://www.elec.qmul.ac.uk/digitalmusic/people/dans.htm
http://www.mcld.co.uk/