Re: Performance of floating point instructions

2010-03-10 Thread Laurent Desnogues
On Wed, Mar 10, 2010 at 8:54 PM, Siarhei Siamashka wrote:
[...]
> I wonder why the compiler does not use real NEON instructions with -ffast-math
> option, it should be quite useful even for scalar code.
>
> something like:
>
> vld1.32  {d0[0]}, [r0]
> vadd.f32 d0, d0, d0
> vst1.32  {d0[0]}, [r0]
>
> instead of:
>
> flds     s0, [r0]
> fadds    s0, s0, s0
> fsts     s0, [r0]
>
> for:
>
> *float_ptr = *float_ptr + *float_ptr;
>
> At least NEON is pipelined and should be a lot faster on more complex code
> examples where it can actually benefit from pipelining. On x86, SSE2 is used
> quite nicely for floating point math.

Even if fast-math is known to break some rules, it only
breaks C rules IIRC.  OTOH, NEON FP has no support
for NaN and other nice things from IEEE754.

Anyway you're perhaps looking for -mfpu=neon, no?
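
For anyone who wants to check this on their own toolchain, here is a minimal sketch of the scalar case quoted above plus a loop version. The function names are made up for illustration, and the command line in the comment just combines the flags mentioned in this thread; whether GCC actually emits vadd.f32 instead of fadds depends on the compiler version, so read the generated assembly rather than assuming.

```c
/* Sketch only. Hypothetical build line for a 32-bit ARM GCC:
 *   gcc -O2 -mfpu=neon -mfloat-abi=softfp -ffast-math -ftree-vectorize -S add.c
 * Then look in add.s for vadd.f32 (NEON) versus fadds (VFP). */
#include <stddef.h>

/* The scalar statement from the quoted message. */
void double_in_place(float *float_ptr)
{
    *float_ptr = *float_ptr + *float_ptr;
}

/* The same operation over an array, which gives the vectorizer
 * something it can actually turn into SIMD code. */
void double_array(float *v, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        v[i] = v[i] + v[i];
}
```

Both functions are plain C and behave the same whichever unit ends up executing them; only the generated instructions differ.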


Laurent
___
maemo-developers mailing list
maemo-developers@maemo.org
https://lists.maemo.org/mailman/listinfo/maemo-developers


Re: Performance of floating point instructions

2010-03-10 Thread Laurent Desnogues
On Wed, Mar 10, 2010 at 7:29 PM, Alberto Mardegan wrote:
> Alberto Mardegan wrote:
>>
>> Does one have any figure about how the performance of the FPU is, compared
>> to integer operations?
>
> I added some profiling to the code, and I measured the time spent by a
> function which is operating on an array of points (whose coordinates are
> integers) and transforming each of them into geographic coordinates
> (latitude and longitude, floating point) and calculating the distance from
> the previous point.
>
> http://vcs.maemo.org/git?p=maemo-mapper;a=shortlog;h=refs/heads/gps_control
> map_path_calculate_distances() is in path.c,
> calculate_distance() is in utils.c,
> unit2latlon() is a pointer to unit2latlon_google() in tile_source.c
>
>
> The output (application compiled with -O0):
>
>
> double:
>
> map_path_calculate_distances: 110 ms for 8250 points
> map_path_calculate_distances: 5 ms for 430 points
>
> map_path_calculate_distances: 109 ms for 8250 points
> map_path_calculate_distances: 5 ms for 430 points
>
>
> float:
>
> map_path_calculate_distances: 60 ms for 8250 points
> map_path_calculate_distances: 3 ms for 430 points
>
> map_path_calculate_distances: 60 ms for 8250 points
> map_path_calculate_distances: 3 ms for 430 points
>
>
> float with fast FPU mode:
>
> map_path_calculate_distances: 50 ms for 8250 points
> map_path_calculate_distances: 2 ms for 430 points
>
> map_path_calculate_distances: 50 ms for 8250 points
> map_path_calculate_distances: 2 ms for 430 points
>
>
> So, it seems that there's a huge improvement when switching from doubles to
> floats; although I wonder if it's because of the FPU or just because the
> amount of data passed around is smaller.
> On the other hand, the improvement obtained by enabling the fast FPU mode
> is rather small -- but that might be due to the fact that the FPU operations
> are not a major player in this piece of code.

The "fast" mode only gains 1 or 2 cycles per FP instruction.
The FPU on Cortex-A8 is not pipelined and the fast mode
can't change that :-)

> One curious thing is that while making these changes, I forgot to change the
> math functions to their float versions, so that instead of using:
>
> float x, y;
> x = sinf(y);
>
> I was using:
>
> float x, y;
> x = sin(y);
>
> The timings obtained this way are surprisingly (at least to me) bad:
>
> map_path_calculate_distances: 552 ms for 8250 points
> map_path_calculate_distances: 92 ms for 430 points
>
> map_path_calculate_distances: 552 ms for 8250 points
> map_path_calculate_distances: 91 ms for 430 points
>
> Much worse than the double version. The only reason I can think of, is the
> conversion from float to double and vice versa, but is it really that
> expensive?

This looks odd given that the 2 additional instructions
take 5 and 7 cycles.

> Anyway, I'll stick to using 32bit floats. :-)

As long as it fits your needs that seems wise :)


Laurent


Re: Performance of floating point instructions

2010-03-10 Thread Laurent Desnogues
On Wed, Mar 10, 2010 at 10:46 AM, Ove Kaaven wrote:
> Alberto Mardegan skrev:
>> Does anyone know any tricks to optimize certain operations on arrays of
>> data?
>
> The answer to that is, obviously, to use the Cortex-A-series SIMD
> engine, NEON.
>
> Supposedly you may be able to make gcc generate NEON instructions with
> -mfpu=neon -ffast-math -ftree-vectorize (and perhaps -mfloat-abi=softfp,
> but that's the default in the Fremantle SDK anyway), but it's still not
> very good at it, so writing the asm by hand is still better... and I'm
> not sure if it can automatically vectorize library calls like sqrt.

One has to be careful with that approach:  Cortex-A9 SoCs won't
necessarily come with a NEON SIMD unit, as it's optional.  So it'd
be better to also include code that doesn't assume one has a
NEON unit.
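
A rough sketch of what such a fallback could look like, assuming Linux on 32-bit ARM with a glibc recent enough to provide getauxval(). All function names here are invented for illustration. On any other platform the check simply reports "no NEON" and the plain C routine is used.

```c
#include <stddef.h>

#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON */
#endif

#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
#include <arm_neon.h>
#define HAVE_NEON_BUILD 1
/* NEON path: four floats per iteration, scalar tail for the rest. */
void double_all_neon(float *v, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        float32x4_t x = vld1q_f32(v + i);
        vst1q_f32(v + i, vaddq_f32(x, x));
    }
    for (; i < n; i++)
        v[i] = v[i] + v[i];
}
#endif

/* Portable path that assumes nothing about the FPU. */
void double_all_generic(float *v, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        v[i] = v[i] + v[i];
}

/* Returns nonzero only when the kernel reports a NEON unit. */
int cpu_has_neon(void)
{
#if defined(__linux__) && defined(__arm__) && defined(HWCAP_NEON)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return 0;
#endif
}

typedef void (*double_all_fn)(float *, size_t);

/* Pick the implementation once, then call through the pointer. */
double_all_fn select_double_all(void)
{
#ifdef HAVE_NEON_BUILD
    if (cpu_has_neon())
        return double_all_neon;
#endif
    return double_all_generic;
}
```

Usage would be along the lines of: double_all_fn f = select_double_all(); f(data, count);. One caveat with this layout: if the whole file is built with -mfpu=neon, the compiler is free to use NEON in the generic routine as well, so in a real project the NEON variant would probably live in its own source file compiled with different flags.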


Laurent


Re: rpm vs. deb and "universal binaries/packages"

2010-02-16 Thread Laurent Desnogues
On Tue, Feb 16, 2010 at 12:17 PM, Christopher Intemann wrote:
[...]
> Apple had a great success story when they almost seamlessly switched
> from PPC to Intel by introducing their universal binaries.

That wouldn't work too well for ARM:  you'd want ARMv6 with
or without VFP, ARMv7-A with or without VFP + with or
without NEON (and also with a poor VFP so that you should
use NEON).  Of course one can always hope devs would
select at runtime the fastest function for the platform, but that's
only hope :-)


Laurent


Re: Fremantle OpenGL wrapper?

2009-06-03 Thread Laurent Desnogues
On Wed, Jun 3, 2009 at 9:25 AM, Kate Alhola wrote:
>
> OpenGL-ES2.0 != OpenGL 1.x. You can check http://wiki.maemo.org/OpenGL-ES .
> To port applications that use the OpenGL-1.x API to OpenGL-ES2.0 you need
> to use the new API, the same one you need to use if you are porting to
> desktop OpenGL 3.0.

My understanding is that OpenGL 3.0 still has all of the features of OpenGL 2.x.
On the other hand OpenGL 3.1 removed the features that were deprecated in
3.0 (such as glBegin/glEnd).


Laurent


Re: Qemu Error on Maemo

2008-08-13 Thread Laurent Desnogues
On Wed, Aug 13, 2008 at 6:47 PM, David Greaves wrote:
>
> Someone pointed me at this:
>  http://lists.gnu.org/archive/html/qemu-devel/2006-03/msg00202.html

This information is obsolete: qemu now supports ARMv6 and v7 instruction
sets.


Laurent