Re: What is the best way to resolve ARM alignment issues for large modules?

2010-05-08 Thread Martin Guy
On 5/7/10, Shaun Pinney shaun.pin...@bil.konicaminolta.us wrote:
  Essentially, we have code which works fine on x86/PowerPC but fails on ARM due
  to differences in how misaligned accesses are handled.  The failures occur in
  multiple large modules developed outside of our team and we need to find a
  solution.  The best question to sum this up is, how can we use the compiler to
  arrive at a complete solution to quickly identify all code locations which
  generate misaligned accesses and/or prevent the compiler from generating
  misaligned accesses?

Dunno about the compiler, but if you use the Linux kernel you can fiddle with
/proc/cpu/alignment.

By default it's set to 0, which silently gives garbage results when
unaligned accesses are made.

echo 3 > /proc/cpu/alignment

will fix those misalignments using a kernel trap to emulate correct
behaviour (i.e. loading from bytes (char *)a to (char *)a + 3 in the
case of an int). Alternatively,

echo 5 > /proc/cpu/alignment

will make an unaligned access cause a Bus Error, which usually kills
the process and you can identify the offending code by running it
under gdb.
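The value written to /proc/cpu/alignment is a small bit mask. The sketch below decodes it, assuming the bit layout of the ARM Linux alignment trap handler (arch/arm/mm/alignment.c): bit 0 logs the trap, bit 1 fixes up the access and bit 2 sends SIGBUS. describe_alignment_mode() is a hypothetical helper, not a kernel or libc interface.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bit layout assumed from the ARM Linux alignment trap handler. */
enum {
    ALIGN_WARN   = 1 << 0, /* log the trap to the kernel log */
    ALIGN_FIXUP  = 1 << 1, /* emulate the access and carry on */
    ALIGN_SIGNAL = 1 << 2  /* send SIGBUS to the process */
};

/* Hypothetical helper: describe a mode value as e.g. "fixup+warn".
 * buf must have room for at least one character (len > 0). */
static void describe_alignment_mode(unsigned mode, char *buf, size_t len)
{
    const char *parts[3];
    size_t nparts = 0;

    if (mode & ALIGN_FIXUP)
        parts[nparts++] = "fixup";
    if (mode & ALIGN_SIGNAL)
        parts[nparts++] = "signal";
    if (mode & ALIGN_WARN)
        parts[nparts++] = "warn";

    if (nparts == 0) {
        /* No bits set: the kernel neither fixes up, signals nor logs. */
        snprintf(buf, len, "ignored");
        return;
    }

    buf[0] = '\0';
    for (size_t i = 0; i < nparts; i++) {
        size_t used = strlen(buf);
        if (used + 1 >= len)
            break; /* no room left; output stays NUL-terminated */
        snprintf(buf + used, len - used, "%s%s", i ? "+" : "", parts[i]);
    }
}
```

With that layout, 3 decodes as "fixup+warn" and 5 as "signal+warn", which matches the two settings above.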

Eliminating the unaligned accesses is tedious work, but the result
will run slightly faster than relying on fixups, as well as making it
portable to any word-aligned system.
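One common way to do that elimination in C is to replace casts such as *(uint32_t *)p with memcpy() into a local variable. Because memcpy() makes no alignment assumption about a byte pointer, a compiler targeting a strict-alignment core such as ARMv5 will typically use byte-sized accesses (or a library call), while on x86 it usually becomes a single load. A minimal sketch; load_u32() and store_u32() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical replacement for "*(uint32_t *)p": copies the bytes instead
 * of dereferencing a possibly misaligned uint32_t pointer. The parameter is
 * a byte pointer so the compiler cannot infer 4-byte alignment from its
 * type. The result is in host byte order, just like the original cast. */
static inline uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical replacement for "*(uint32_t *)p = v". */
static inline void store_u32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

For example, `uint32_t len = load_u32(buf + 1);` reads the four bytes buf[1] to buf[4] in host byte order, which is what the x86 code expected from the cast.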

   M


Re: What is the best way to resolve ARM alignment issues for large modules?

2010-05-08 Thread Mikael Pettersson
Shaun Pinney writes:
  Hello all,
  
  Essentially, we have code which works fine on x86/PowerPC but fails on ARM due
  to differences in how misaligned accesses are handled.  The failures occur in
  multiple large modules developed outside of our team and we need to find a
  solution.  The best question to sum this up is, how can we use the compiler to
  arrive at a complete solution to quickly identify all code locations which
  generate misaligned accesses and/or prevent the compiler from generating
  misaligned accesses?  Thanks for any advice.  I'll go into more detail below.
  
  ---
  We're using an ARM9 core (ARMv5) and notice that GCC generates misaligned load
  instructions for certain modules in our platform.  For these modules, which work
  correctly on x86/PowerPC, the misaligned loads cause failures.  This is because
  the ARM rounds down misaligned addresses to the correct alignment, performs the
  memory load, and rotates the data before placing it in a register.  As a result,
  a misaligned multi-byte load instruction on ARM actually loads memory below the
  requested address and does not load all upper bytes from address to address +
  size - 1, so it appears to these modules as incorrect data.  On x86/PowerPC,
  loads do provide bytes from address to address + size - 1 regardless of
  alignment, so there are no problems.
  
  Fixing the code manually for ARM alignment is hard.  Due to the large code
  volume of these external modules, it is difficult to identify all locations
  which may be affected by misaligned accesses so the code can be rewritten.
  Currently, the only way we have to detect these issues is to use -Wcast-align
  and view the output to get a list of potential alignment issues.  This appears
  to list a large number of false positives, so sorting through them and
  investigating the code to locate the true problems looks very time-consuming.
  On the runtime side, we've enabled alignment exceptions to catch some
  additional cases, but the problem is that exceptions are only raised for code
  that actually runs.  There is always the chance that some unexecuted 'hidden'
  code is waiting to fail when the right circumstance occurs.  I'd like to
  provably remove the problem entirely and quickly.
  
  One idea to guarantee that no load/store alignment problems affect our product
  was to force the compiler to generate single-byte load/store instructions in
  place of multi-byte load/store instructions whenever it cannot verify the
  alignment.  For example: pointer typecasts where the alignment is increased
  (e.g. char * to int *), accesses to misaligned fields of packed data
  structures, accesses to structure fields not allocated on the stack, etc.  Is
  this available?  Obviously, this will add performance overhead, but it would
  clearly resolve the issue for the affected modules.
  
  Does the ARM compiler provide any other techniques to help with these types of
  problems?  It'd be very helpful to find a fast and complete way to do this work.
  Thanks!
  
  Thanks again for your advice.
  
  Best regards,
  Shaun
  
  BTW - our ARM also allows us to change the behavior of multi-byte load/store
  instructions so they read from 'address' to 'address + size - 1'.  However, our
  OS indicates that it intentionally uses misaligned loads/stores, so changing
  the ARM's load/store behavior to fix the module alignment problems would break
  the OS in unknown places.  Also, because of this we cannot permanently enable
  alignment exceptions either.  I plan to discuss this more with our OS vendor.

You don't name the platform OS, but the obvious solution (to me anyway) is to
run the code on ARM/Linux. On that platform you can instruct the kernel to take
various actions on alignment faults. In particular, by

 echo 5 > /proc/cpu/alignment

you tell the kernel to log misalignment traps and then kill the offending
process.

So you:

1. Run the application. It gets killed.
2. Retrieve the fault PC from the kernel message log.
3. Map it back to the application source. Fix the problem or add debugging code.
4. Repeat from step 1 until all alignment faults have been eliminated.
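Step 2 relies on the kernel log. If that is awkward to reach on the target, the process can also report the fault itself: a SIGBUS handler installed with sigaction() and SA_SIGINFO can print the faulting data address before the process dies. A minimal sketch with hypothetical names (install_sigbus_reporter(), format_hex(), write_all()); whether the ARM kernel fills in si_addr and sets si_code to BUS_ADRALN for alignment traps depends on the kernel version, so treat the output as a hint and keep the PC from the kernel log as the authoritative location.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Write "0x" plus fixed-width hex into out, which needs room for
 * 2 + 2 * sizeof(uintptr_t) + 1 chars. Avoids stdio so it can be used
 * in a signal handler. Returns the length excluding the NUL. */
static size_t format_hex(char *out, uintptr_t value)
{
    static const char digits[] = "0123456789abcdef";
    const size_t ndigits = 2 * sizeof value;

    out[0] = '0';
    out[1] = 'x';
    for (size_t i = 0; i < ndigits; i++)
        out[2 + i] = digits[(value >> (4 * (ndigits - 1 - i))) & 0xF];
    out[2 + ndigits] = '\0';
    return 2 + ndigits;
}

/* write() to stderr, retrying on partial writes (async-signal-safe). */
static void write_all(const char *s, size_t n)
{
    while (n > 0) {
        ssize_t w = write(STDERR_FILENO, s, n);
        if (w <= 0)
            return;
        s += w;
        n -= (size_t)w;
    }
}

static void sigbus_reporter(int sig, siginfo_t *info, void *ctx)
{
    static const char prefix[] = "SIGBUS, data address ";
    static const char adraln[] = " (BUS_ADRALN)";
    char hex[2 + 2 * sizeof(uintptr_t) + 1];
    size_t len = format_hex(hex, (uintptr_t)info->si_addr);

    (void)ctx;
    write_all(prefix, sizeof prefix - 1);
    write_all(hex, len);
    if (info->si_code == BUS_ADRALN)
        write_all(adraln, sizeof adraln - 1);
    write_all("\n", 1);

    /* SA_RESETHAND restored the default action on entry, so re-raising
     * terminates the process with SIGBUS (now or when the handler returns). */
    raise(sig);
}

/* Hypothetical helper: install the reporter; returns 0 on success. */
static int install_sigbus_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = sigbus_reporter;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    return sigaction(SIGBUS, &sa, NULL);
}
```

Call install_sigbus_reporter() early in main() and run with the echo 5 setting above; the process still dies, but it names the data address it was touching.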

You can also instruct the kernel to (correctly) handle and emulate misaligned
loads/stores without killing the process. That allows you to run the code
correctly, though the fault handling will induce some performance overhead.

If you can't run Linux on your target HW then you could do the debugging in an
ARM emulator such as QEMU.


What is the best way to resolve ARM alignment issues for large modules?

2010-05-07 Thread Shaun Pinney
Hello all,

Essentially, we have code which works fine on x86/PowerPC but fails on ARM due
to differences in how misaligned accesses are handled.  The failures occur in
multiple large modules developed outside of our team and we need to find a
solution.  The best question to sum this up is, how can we use the compiler to
arrive at a complete solution to quickly identify all code locations which
generate misaligned accesses and/or prevent the compiler from generating
misaligned accesses?  Thanks for any advice.  I'll go into more detail below.

---
We're using an ARM9 core (ARMv5) and notice that GCC generates misaligned load
instructions for certain modules in our platform.  For these modules, which work
correctly on x86/PowerPC, the misaligned loads cause failures.  This is because
the ARM rounds down misaligned addresses to the correct alignment, performs the
memory load, and rotates the data before placing it in a register.  As a result,
a misaligned multi-byte load instruction on ARM actually loads memory below the
requested address and does not load all upper bytes from address to address +
size - 1, so it appears to these modules as incorrect data.  On x86/PowerPC,
loads do provide bytes from address to address + size - 1 regardless of
alignment, so there are no problems.
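The rotate-on-misaligned-load behaviour described above can be modelled in a few lines of C, which makes it easy to see why the data comes out scrambled rather than merely shifted. A minimal sketch, assuming a little-endian core, 32-bit LDR only (misaligned halfword loads are not modelled), and that mem[0] sits on a word boundary; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit value from mem[addr .. addr+3]: the x86 result. */
static uint32_t load_bytewise(const uint8_t *mem, size_t addr)
{
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}

/* Model of a misaligned LDR on a pre-ARMv6 little-endian core with
 * alignment checking off: load the aligned word containing addr, then
 * rotate it right by 8 * (addr & 3) bits. */
static uint32_t load_armv5_ldr(const uint8_t *mem, size_t addr)
{
    uint32_t word = load_bytewise(mem, addr & ~(size_t)3);
    unsigned rot = 8u * (unsigned)(addr & 3);

    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}
```

With mem = {0x11, 0x22, ..., 0x88}, load_bytewise(mem, 1) gives 0x55443322 (the x86 result), while load_armv5_ldr(mem, 1) gives 0x11443322: the byte at offset 0, below the requested address, has rotated into the top of the register.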

Fixing the code manually for ARM alignment is hard.  Due to the large code
volume of these external modules, it is difficult to identify all locations
which may be affected by misaligned accesses so the code can be rewritten.
Currently, the only way we have to detect these issues is to use -Wcast-align
and view the output to get a list of potential alignment issues.  This appears
to list a large number of false positives, so sorting through them and
investigating the code to locate the true problems looks very time-consuming.
On the runtime side, we've enabled alignment exceptions to catch some
additional cases, but the problem is that exceptions are only raised for code
that actually runs.  There is always the chance that some unexecuted 'hidden'
code is waiting to fail when the right circumstance occurs.  I'd like to
provably remove the problem entirely and quickly.

One idea to guarantee that no load/store alignment problems affect our product
was to force the compiler to generate single-byte load/store instructions in
place of multi-byte load/store instructions whenever it cannot verify the
alignment.  For example: pointer typecasts where the alignment is increased
(e.g. char * to int *), accesses to misaligned fields of packed data
structures, accesses to structure fields not allocated on the stack, etc.  Is
this available?  Obviously, this will add performance overhead, but it would
clearly resolve the issue for the affected modules.
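Short of a global compiler switch, the same effect can be had per access with GCC's packed attribute: GCC does not assume that a member of a packed structure is naturally aligned, so on a strict-alignment target such as ARMv5 it accesses such a member with byte loads and stores. A minimal GCC/Clang-specific sketch with hypothetical names; routing the suspect casts (char * to int * and the like) through helpers like these confines the byte-access cost to the places that need it. Reading a char buffer through a struct type also bends ISO C aliasing rules, so this idiom is usually compiled with -fno-strict-aliasing; memcpy() into a local variable is the portable alternative.

```c
#include <stdint.h>

/* One-member structs with alignment 1: the compiler may not assume
 * that 'value' is naturally aligned, so it uses byte-safe accesses. */
struct una_u16 {
    uint16_t value;
} __attribute__((packed));

struct una_u32 {
    uint32_t value;
} __attribute__((packed));

/* Hypothetical replacement for "*(uint16_t *)p". */
static inline uint16_t get_una_u16(const void *p)
{
    return ((const struct una_u16 *)p)->value;
}

/* Hypothetical replacement for "*(uint32_t *)p". */
static inline uint32_t get_una_u32(const void *p)
{
    return ((const struct una_u32 *)p)->value;
}

/* Hypothetical replacement for "*(uint32_t *)p = v". */
static inline void put_una_u32(void *p, uint32_t v)
{
    ((struct una_u32 *)p)->value = v;
}
```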

Does the ARM compiler provide any other techniques to help with these types of
problems?  It'd be very helpful to find a fast and complete way to do this work.
Thanks!

Thanks again for your advice.

Best regards,
Shaun

BTW - our ARM also allows us to change the behavior of multi-byte load/store
instructions so they read from 'address' to 'address + size - 1'.  However, our
OS indicates that it intentionally uses misaligned loads/stores, so changing
the ARM's load/store behavior to fix the module alignment problems would break
the OS in unknown places.  Also, because of this we cannot permanently enable
alignment exceptions either.  I plan to discuss this more with our OS vendor.