Re: Performance of floating point instructions
On Wed, Mar 10, 2010 at 8:54 PM, Siarhei Siamashka wrote:
[...]
> I wonder why the compiler does not use real NEON instructions with the
> -ffast-math option, it should be quite useful even for scalar code.
>
> something like:
>
> vld1.32 {d0[0]}, [r0]
> vadd.f32 d0, d0, d0
> vst1.32 {d0[0]}, [r0]
>
> instead of:
>
> flds s0, [r0]
> fadds s0, s0, s0
> fsts s0, [r0]
>
> for:
>
> *float_ptr = *float_ptr + *float_ptr;
>
> At least NEON is pipelined and should be a lot faster on more complex code
> examples where it can actually benefit from pipelining. On x86, SSE2 is used
> quite nicely for floating point math.

Even if fast-math is known to break some rules, it only breaks C rules IIRC.
OTOH, NEON FP is not fully IEEE754 compliant: it flushes denormals to zero
and always returns the default NaN.

Anyway you're perhaps looking for -mfpu=neon, no?

Laurent
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers
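For readers who want to try the NEON sequence quoted above without writing assembly, here is a minimal sketch using the intrinsics from arm_neon.h. The function name double_in_place is hypothetical, and the compiler is not guaranteed to emit exactly the three instructions from the quoted mail. When NEON is not enabled at compile time, the sketch falls back to the plain C statement.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* NEON path: operate on lane 0 of a 64-bit D register.
 * double_in_place is a hypothetical name used for illustration only. */
void double_in_place(float *float_ptr)
{
    float32x2_t v = vdup_n_f32(0.0f);     /* lane 1 is unused padding */
    v = vld1_lane_f32(float_ptr, v, 0);   /* roughly: vld1.32 {d0[0]}, [r0] */
    v = vadd_f32(v, v);                   /* roughly: vadd.f32 d0, d0, d0 */
    vst1_lane_f32(float_ptr, v, 0);       /* roughly: vst1.32 {d0[0]}, [r0] */
}
#else
/* Portable fallback: the compiler picks VFP or whatever the target offers. */
void double_in_place(float *float_ptr)
{
    *float_ptr = *float_ptr + *float_ptr;
}
#endif
```

Building with -mfpu=neon (plus -mfloat-abi=softfp on the Fremantle SDK, as mentioned elsewhere in the thread) selects the first variant. Inspecting the output of gcc -S is the way to check what was actually generated.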
Re: Performance of floating point instructions
On Wed, Mar 10, 2010 at 7:29 PM, Alberto Mardegan wrote:
> Alberto Mardegan wrote:
>>
>> Does anyone have any figures on the performance of the FPU compared
>> to integer operations?
>
> I added some profiling to the code, and I measured the time spent by a
> function which is operating on an array of points (whose coordinates are
> integers), transforming each of them into geographic coordinates
> (latitude and longitude, floating point) and calculating the distance from
> the previous point.
>
> http://vcs.maemo.org/git?p=maemo-mapper;a=shortlog;h=refs/heads/gps_control
> map_path_calculate_distances() is in path.c,
> calculate_distance() is in utils.c,
> unit2latlon() is a pointer to unit2latlon_google() in tile_source.c
>
> The output (application compiled with -O0):
>
> double:
>
> map_path_calculate_distances: 110 ms for 8250 points
> map_path_calculate_distances: 5 ms for 430 points
>
> map_path_calculate_distances: 109 ms for 8250 points
> map_path_calculate_distances: 5 ms for 430 points
>
> float:
>
> map_path_calculate_distances: 60 ms for 8250 points
> map_path_calculate_distances: 3 ms for 430 points
>
> map_path_calculate_distances: 60 ms for 8250 points
> map_path_calculate_distances: 3 ms for 430 points
>
> float with fast FPU mode:
>
> map_path_calculate_distances: 50 ms for 8250 points
> map_path_calculate_distances: 2 ms for 430 points
>
> map_path_calculate_distances: 50 ms for 8250 points
> map_path_calculate_distances: 2 ms for 430 points
>
> So, it seems that there's a huge improvement when switching from doubles to
> floats; although I wonder if it's because of the FPU or just because the
> amount of data passed around is smaller.
> On the other hand, the improvement obtained by enabling the fast FPU mode
> is rather small -- but that might be due to the fact that the FPU operations
> are not a major player in this piece of code.

The "fast" mode only gains 1 or 2 cycles per FP instruction.
The FPU on Cortex-A8 is not pipelined and the fast mode can't change that :-)

> One curious thing is that while making these changes, I forgot to change the
> math functions to their float version, so that instead of using:
>
> float x, y;
> x = sinf(y);
>
> I was using:
>
> float x, y;
> x = sin(y);
>
> The timings obtained this way are surprisingly (at least to me) bad:
>
> map_path_calculate_distances: 552 ms for 8250 points
> map_path_calculate_distances: 92 ms for 430 points
>
> map_path_calculate_distances: 552 ms for 8250 points
> map_path_calculate_distances: 91 ms for 430 points
>
> Much worse than the double version. The only reason I can think of, is the
> conversion from float to double and vice versa, but is it really that
> expensive?

This looks odd given that the 2 additional conversion instructions take 5 and
7 cycles.

> Anyway, I'll stick to using 32bit floats. :-)

As long as it fits your needs that seems wise :)

Laurent
Re: Performance of floating point instructions
On Wed, Mar 10, 2010 at 10:46 AM, Ove Kaaven wrote:
> Alberto Mardegan wrote:
>> Does anyone know any tricks to optimize certain operations on arrays of
>> data?
>
> The answer to that is, obviously, to use the Cortex-A-series SIMD
> engine, NEON.
>
> Supposedly you may be able to make gcc generate NEON instructions with
> -mfpu=neon -ffast-math -ftree-vectorize (and perhaps -mfloat-abi=softfp,
> but that's the default in the Fremantle SDK anyway), but it's still not
> very good at it, so writing the asm by hand is still better... and I'm
> not sure if it can automatically vectorize library calls like sqrt.

One has to be careful with that approach: a Cortex-A9 SoC won't necessarily
come with a NEON SIMD unit, as it's optional. So it'd be better to also
include code that doesn't assume a NEON unit is present.

Laurent
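One way to ship code that does not assume NEON is to keep a scalar routine next to the NEON one and pick between them at startup. The sketch below assumes a 32-bit ARM Linux system whose C library provides getauxval() (glibc 2.16 or later, which is newer than this thread; older systems would have to read /proc/cpuinfo or /proc/self/auxv instead). The names cpu_has_neon, scale_scalar, scale_neon and select_scale are hypothetical.

```c
#include <stddef.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON */
#endif

/* Returns nonzero if the kernel reports a NEON unit (32-bit ARM Linux only).
 * On every other platform this sketch conservatively reports "no NEON". */
int cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return 0;
#endif
}

typedef void (*scale_fn)(float *data, size_t n, float factor);

/* Scalar version: works on any CPU. */
void scale_scalar(float *data, size_t n, float factor)
{
    for (size_t i = 0; i < n; i++)
        data[i] *= factor;
}

#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
#include <arm_neon.h>

/* NEON version: four floats per iteration, scalar loop for the tail. */
void scale_neon(float *data, size_t n, float factor)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        vst1q_f32(data + i, vmulq_n_f32(vld1q_f32(data + i), factor));
    for (; i < n; i++)
        data[i] *= factor;
}
#endif

/* Pick the implementation once, e.g. at program start. */
scale_fn select_scale(void)
{
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    if (cpu_has_neon())
        return scale_neon;
#endif
    return scale_scalar;
}
```

In a real build the NEON routine would normally live in its own source file compiled with -mfpu=neon, while the rest of the program is built without it. Otherwise the compiler is free to emit NEON instructions in the supposedly safe scalar path as well.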
Re: rpm vs. deb and "universal binaries/packages"
On Tue, Feb 16, 2010 at 12:17 PM, Christopher Intemann wrote:
[...]
> Apple had a great success story when they almost seamlessly switched
> from PPC to Intel by introducing their universal binaries.

That wouldn't work too well for ARM: you'd want ARMv6 with or without VFP,
and ARMv7-A with or without VFP and with or without NEON (plus the case of a
poor VFP, where you should use NEON instead).

Of course one can always hope devs would select the fastest function for the
platform at runtime, but that's only hope :-)

Laurent
Re: Fremantle OpenGL wrapper?
On Wed, Jun 3, 2009 at 9:25 AM, Kate Alhola wrote:
>
> OpenGL-ES2.0 != OpenGL 1.x . You can check http://wiki.maemo.org/OpenGL-ES
> . To port applications that use OpenGL-1.x API to OpenGL-ES2.0 need to use
> new API
> that you need to use if you are porting for desktop OpenGL 3.0 .

My understanding is that OpenGL 3.0 still has all of the features of
OpenGL 2.x. On the other hand, OpenGL 3.1 removed the features that were
deprecated in 3.0 (such as glBegin/glEnd).

Laurent
Re: Qemu Error on Maemo
On Wed, Aug 13, 2008 at 6:47 PM, David Greaves <[EMAIL PROTECTED]> wrote:
>
> Someone pointed me at this:
> http://lists.gnu.org/archive/html/qemu-devel/2006-03/msg00202.html

This information is obsolete: qemu now supports ARMv6 and v7 instruction sets.

Laurent