On Wednesday 02 May 2007 18:48, Eero Tamminen wrote:
> On x86 I prefer valgrind/cachegrind/callgrind/kcachegrind as
> that way one can browse the source code interactively with
> the profiling information.  Getting to know how the source
> really works is sometimes more useful than knowing the exact
> bottleneckedness percentage of some function.

Sure, I'm also using valgrind/cachegrind/callgrind/kcachegrind in my 
work quite often. It's a very nice tool.

But callgrind's statistics do not include any information about floating point
math and integer divisions, so real results on ARM may be very different.

Also, cache behaviour on the Nokia 770's arm926ej-s core is very different
from cache behaviour on x86. The arm926ej-s does not allocate a cache line on
a write miss, while all x86 CPUs do. This makes a very big difference for code
which does lots of writes to memory that is not already in the cache.
Cachegrind only simulates a write-allocate cache.

I created the following patch for simulating read-allocate behaviour 
in callgrind (for more precise arm926ej-s simulation):
http://ufo2000.sourceforge.net/files/vg-read-allocate-cache-patch.diff

However, the arm1136jf-s core in the N800 supports a write-allocate cache,
so this patch is not needed when optimizing for the N800 :)
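
To illustrate the difference between the two allocation policies, here is a
toy direct-mapped cache model. It is only a sketch: it is not Cachegrind code
and not the patch linked above, and the line size, the number of lines and all
names (cache_sim, cache_access, count_fill_misses) are made up for this
example. It also ignores the write buffer of a real core, so it only shows
how the miss counts diverge, not the real cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u   /* bytes per cache line (illustrative value) */
#define NUM_LINES 256u  /* direct-mapped, 8 KiB in total (illustrative value) */

typedef struct {
    uint32_t tag[NUM_LINES];
    uint8_t  valid[NUM_LINES];
    int      write_allocate;  /* 1: allocate on write miss, 0: read-allocate only */
    unsigned long read_misses;
    unsigned long write_misses;
} cache_sim;

static void cache_init(cache_sim *c, int write_allocate)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
    c->write_allocate = write_allocate;
}

/* Simulate one byte access; returns 1 on a hit and 0 on a miss. */
static int cache_access(cache_sim *c, uint32_t addr, int is_write)
{
    uint32_t line = addr / LINE_SIZE;
    uint32_t idx  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;

    if (c->valid[idx] && c->tag[idx] == tag)
        return 1;

    if (is_write) {
        c->write_misses++;
        if (!c->write_allocate)
            return 0;  /* the write bypasses the cache, no line is fetched */
    } else {
        c->read_misses++;
    }

    /* Allocate the line: on any read miss, and on a write miss only
       when the write-allocate policy is selected. */
    c->valid[idx] = 1;
    c->tag[idx]   = tag;
    return 0;
}

/* Fill nbytes consecutive bytes starting at base, using writes only,
   and return the number of write misses seen by the model. */
static unsigned long count_fill_misses(int write_allocate,
                                       uint32_t base, uint32_t nbytes)
{
    cache_sim c;
    uint32_t i;

    cache_init(&c, write_allocate);
    for (i = 0; i < nbytes; i++)
        cache_access(&c, base + i, 1);
    return c.write_misses;
}
```

With these parameters, count_fill_misses(1, 0, 1024) reports 32 misses (one
per cache line), while count_fill_misses(0, 0, 1024) reports 1024, because
without write-allocate no write to the buffer ever brings a line into the
cache.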

> > Did anybody try installing newer toolchains in scratchbox and use them
> > with maemo SDK? I just don't have much free time for these experiments
> > and don't want to break my installation of scratchbox which works now
> > (more or less acceptable)
>
> Installing new toolchains for Sbox shouldn't be a problem (if it's
> already available for it) and you can make a new Sbox target for each
> toolchain you want to test.

Thanks, I'll try that. In my preliminary tests, mplayer becomes a few percent
faster for mpeg4 decoding when switching to gcc 4.1.1 (I tested a build
compiled with a cross compiler outside scratchbox, with no audio/video output
except for SDL, so it is not really useful for end users, but fine for
benchmarking with gprof).

> > Building packages with new toolchain would probably need to have
> > libstdc++ linked statically for C++ applications to work on 770/N800, but
> > otherwise everything should be fine.
>
> Actually, you cannot really build static binaries with Glibc.
> It links some stuff always dynamically (nss for example).
> I don't know whether this is a problem in practice though.

I'm not going to link statically with glibc, only with libstdc++ (the standard
C++ library). There are a few known tricks to make gcc link libstdc++
statically while still linking all the other libraries dynamically. One of
them is to create a symlink to libstdc++.a in some empty directory and pass
this directory with the -L option on the gcc command line. When gcc starts
linking, it will be fooled into using the static libstdc++ library. But I
guess just removing libstdc++.so in scratchbox will do the job. After that,
the compiler should in theory create binaries which run with no problems on
the device, even for C++ applications.

> > http://arm.com/documentation/ARMProcessor_Cores/index.html
> > 'ARM1136JF-S and ARM1136J-S r1p1 Technical Reference Manual'
> > Chapter 4 'Unaligned and Mixed-Endian Data Access Support'
>
> Did you read the section on "ARMv6 unaligned data access restrictions"?
> Basically it doesn't work in all cases, the accesses are not atomic and
> have performance implications.

Did you also read the Intel docs? Unaligned access has some restrictions
on x86 as well. Do you have an example of some practical case where the
hardware unaligned access support of ARM11 would work worse than on x86?

The compiler should do the job of aligning data for performance reasons (as it
does on x86 as well). But if you happen to have some unaligned data in
memory anyway, just reading it with a minor unavoidable performance
penalty will be faster than reading the data one byte at a time and combining
it into a 32-bit or 16-bit value (instruction timings can also be found in
this Technical Reference Manual).
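
For reference, the "one byte at a time" approach mentioned above looks
roughly like the first function below. The second one is a memcpy based
variant, which a compiler is free to turn into a single load on hardware
where unaligned access is supported. This is just a sketch, and the function
names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value one byte at a time and combine the
   bytes. Works for any alignment of p and on any host byte order, at
   the cost of four byte loads plus shifts and ORs. */
static uint32_t load_le32_bytes(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Read a 32-bit value in host byte order through memcpy. This is also
   safe for any alignment of p. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* The non-portable hack would be *(const uint32_t *)(buf + 1). It is not
   executed here: it is undefined behaviour in C, and on a core without
   hardware unaligned access support it may trap or return unexpected
   data. */
static uint32_t read_at_odd_offset(void)
{
    unsigned char buf[8] = { 0x00, 0x78, 0x56, 0x34, 0x12, 0x00, 0x00, 0x00 };
    uint32_t v = load_le32_bytes(buf + 1);  /* buf + 1 is not 4-byte aligned */

    assert(v == 0x12345678u);
    return v;
}
```

Both helpers give the right result regardless of alignment; the question in
this thread is only how much faster a single hardware unaligned load would be
compared to them.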

Enabling hardware unaligned access support should make the explicit pointer
conversion hacks that are sometimes used in not very portable C code work
just like they do on x86, which is a good thing in my opinion.

> > As ARM11 core used in N800 is little endian, does have floating point
> > unit and supports unaligned memory access in hardware (which only needs
> > to be enabled). It probably doesn't have any serious portability issues
> > to be aware of anymore and vast majority of software initially developed
> > for x86 should be easy to compile and run on it even without doing any
> > modifications.
>
> Compiler aligns everything correctly if your code is correct.
> I think non-aligned code is bug and performance issue.

In the real world, such buggy code unfortunately exists. And it works fine on
x86, which is probably the most widely used platform for software development.

> > Enabling unaligned memory support will make life much easier for
> > developers unfamiliar with ARM platform. The number of applications for
> > N800 should grow, as fewer newbie developers will be turned away
> > frustrated by the alignment bugs they have never heard about before.
>
> Can you give examples of issues people have with this?

I don't know if it is a good example, but it took me some time to figure out
what the hell was going on when I first started trying to port code to the
Nokia 770 :) I did not start flooding the mailing list or forums with
requests for help, but found a solution myself after some time spent
searching for information. I did know about endian issues on Macs
before, but it was completely unknown to me that there are
architectures which have strict alignment requirements (except for
SSE instructions on x86, which seemed more like an exception than
a rule to me). Maybe some people also found a solution on their own,
but maybe some have just given up without even complaining, and we
lost some developers.

The number of posts asking for help is also nonzero:
http://www.internettablettalk.com/forums/showthread.php?t=2668

In addition, I remember explaining alignment issues to a few people on the
#maemo channel over all this time; they all came complaining about
applications working on x86 but crashing on ARM.

So in my opinion this problem really exists, even if it is not so significant.
_______________________________________________
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
