from:"Richard Earnshaw"

Re: [Linaro-TCWG-CI] gcc-15-3607-g9a94c8ffdc8b: FAIL: 23 regressions: 22 improvements on master-thumb_m23_soft_eabi

2024-09-25 Thread Richard Earnshaw (lists)


On 24/09/2024 22:20, Maxim Kuvyrkov wrote:

On Sep 25, 2024, at 05:13, Richard Earnshaw (lists) wrote:

On 21/09/2024 08:49, ci_not...@linaro.org wrote:

Dear contributor, our automatic CI has detected problems related to your 
patch(es). Please find some details below.  If you have any questions, please 
follow up on linaro-toolchain@lists.linaro.org mailing list, Libera's 
#linaro-tcwg channel, or ping your favourite Linaro toolchain developer on the 
usual project channel.
We understand that it might be difficult to find the necessary logs or 
reproduce the issue locally. If you can't get what you need from our CI within 
minutes, let us know and we will be happy to help.
We track this report status in https://linaro.atlassian.net/browse/GNU-1349 , 
please let us know if you are looking at the problem and/or when you have a fix.
In  arm-eabi cortex-m23 soft after:
   | commit gcc-15-3607-g9a94c8ffdc8b
   | Author: Richard Earnshaw 
   | Date:   Thu Sep 12 14:24:55 2024 +0100
   |
   | arm: testsuite: make use of -mcpu=unset/-march=unset
   |
   | This patch makes use of the new ability to unset the CPU or
   | architecture flags on the command line to enable several more tests on
   | Arm.  It doesn't cover every case and it does enable some tests that
   | now fail for different reasons when the tests are no-longer skipped;
   | these were failing anyway for other testsuite configurations, so it's
   | ... 22 lines of the commit log omitted.
FAIL: 23 regressions: 22 improvements
regressions.sum:
=== gcc tests ===
Running gcc:gcc.target/arm/arm.exp ...
FAIL: gcc.target/arm/scd42-2.c scan-assembler mov[ \t].*272
Running gcc:gcc.target/arm/cmse/cmse.exp ...
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O2   scan-assembler 
lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O2   scan-assembler 
lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O3 -g   scan-assembler 
lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O3 -g   scan-assembler 
lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
... and 19 more entries
improvements.sum:
=== gcc tests ===
Running gcc:gcc.target/arm/cmse/cmse.exp ...
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp 
-mthumb  -O3 -g   scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp 
-mthumb  -O3 -g   scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp 
-mthumb  -O2   scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp 
-mthumb  -O2   scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp 
-mthumb  -O2   scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp 
-mthumb  -O2   scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp 
-mthumb  -O3 -g   scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
... and 16 more entries


I can't make any sense of this at all.  After hours wasted trying to find the 
configuration information from the logs (it's there, but to the inexperienced 
user of your reports, it is buried far too deep), I'm still none-the-wiser.


Hi Richard,

Thanks for looking into this.  Do send us a quick email if you can't immediately find 
what you are looking for.  As our email says "We understand that it might be 
difficult to find the necessary logs or reproduce the issue locally. If you can't get 
what you need from our CI within minutes, let us know and we will be happy to help."

Regarding adding configure information to our reports -- we are working on it.



Great.  Where do I find the dejagnu target-list information for a run? 
ie the site.exp file (or whatever the DEJAGNU environment variable 
points at).




  All I can see is that things like

FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp 
-mthumb  -O2   scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1

have changed to

FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c  -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O2   scan-assembler 
lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1

(ie that -mcpu=unset has been added to the test name).

That's not a regression, it's a simple FAIL->FAIL

Re: [Linaro-TCWG-CI] gcc-15-3607-g9a94c8ffdc8b: FAIL: 23 regressions: 22 improvements on master-thumb_m23_soft_eabi

2024-09-24 Thread Richard Earnshaw (lists)


On 21/09/2024 08:49, ci_not...@linaro.org wrote:

Dear contributor, our automatic CI has detected problems related to your 
patch(es).  Please find some details below.  If you have any questions, please 
follow up on linaro-toolchain@lists.linaro.org mailing list, Libera's 
#linaro-tcwg channel, or ping your favourite Linaro toolchain developer on the 
usual project channel.

We understand that it might be difficult to find the necessary logs or 
reproduce the issue locally. If you can't get what you need from our CI within 
minutes, let us know and we will be happy to help.

We track this report status in https://linaro.atlassian.net/browse/GNU-1349 , 
please let us know if you are looking at the problem and/or when you have a fix.

In  arm-eabi cortex-m23 soft after:

   | commit gcc-15-3607-g9a94c8ffdc8b
   | Author: Richard Earnshaw 
   | Date:   Thu Sep 12 14:24:55 2024 +0100
   |
   | arm: testsuite: make use of -mcpu=unset/-march=unset
   |
   | This patch makes use of the new ability to unset the CPU or
   | architecture flags on the command line to enable several more tests on
   | Arm.  It doesn't cover every case and it does enable some tests that
   | now fail for different reasons when the tests are no-longer skipped;
   | these were failing anyway for other testsuite configurations, so it's
   | ... 22 lines of the commit log omitted.

FAIL: 23 regressions: 22 improvements

regressions.sum:
=== gcc tests ===

Running gcc:gcc.target/arm/arm.exp ...
FAIL: gcc.target/arm/scd42-2.c scan-assembler mov[ \t].*272

Running gcc:gcc.target/arm/cmse/cmse.exp ...
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O2   scan-assembler 
lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O2   scan-assembler 
lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O3 -g   scan-assembler 
lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O3 -g   scan-assembler 
lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
... and 19 more entries

improvements.sum:
=== gcc tests ===

Running gcc:gcc.target/arm/cmse/cmse.exp ...
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp 
-mthumb  -O3 -g   scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp 
-mthumb  -O3 -g   scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp 
-mthumb  -O2   scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp 
-mthumb  -O2   scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp 
-mthumb  -O2   scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp 
-mthumb  -O2   scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp 
-mthumb  -O3 -g   scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1
... and 16 more entries


I can't make any sense of this at all.  After hours wasted trying to 
find the configuration information from the logs (it's there, but to the 
inexperienced user of your reports, it is buried far too deep), I'm 
still none-the-wiser.  All I can see is that things like


FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c 
-march=armv8.1-m.main+fp -mthumb  -O2   scan-assembler 
lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1


have changed to

FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c  -mcpu=unset 
-march=armv8.1-m.main+fp -mthumb  -O2   scan-assembler 
lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1


(ie that -mcpu=unset has been added to the test name).

That's not a regression, it's a simple FAIL->FAIL
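
To make the point concrete, here is a minimal sketch of a comparison that
treats two test names as the same test when they differ only by one added
option.  The helper names drop_option and same_test_ignoring are hypothetical;
they are not part of the Linaro CI scripts or of DejaGnu, and the sketch
assumes that names are split on spaces only.

```c
#include <stddef.h>
#include <string.h>

/* Copy SRC into DST (capacity CAP), dropping every space-separated
   token equal to OPT and collapsing runs of spaces.  Returns DST.  */
char *
drop_option (char *dst, size_t cap, const char *src, const char *opt)
{
  size_t out = 0;
  size_t optlen = strlen (opt);

  if (cap == 0)
    return dst;

  while (*src)
    {
      /* Skip the separators in front of the next token.  */
      while (*src == ' ')
        src++;
      size_t len = strcspn (src, " ");
      if (len == 0)
        break;
      /* Keep the token unless it is exactly OPT.  */
      if (!(len == optlen && strncmp (src, opt, len) == 0))
        {
          if (out > 0 && out + 1 < cap)
            dst[out++] = ' ';
          for (size_t i = 0; i < len && out + 1 < cap; i++)
            dst[out++] = src[i];
        }
      src += len;
    }
  dst[out] = '\0';
  return dst;
}

/* Return 1 if A and B name the same test once OPT is ignored.  */
int
same_test_ignoring (const char *a, const char *b, const char *opt)
{
  char na[512], nb[512];

  drop_option (na, sizeof na, a, opt);
  drop_option (nb, sizeof nb, b, opt);
  return strcmp (na, nb) == 0;
}
```

With something like this, a FAIL whose name only gained -mcpu=unset would pair
up with the old FAIL rather than being counted once as a regression and once
as an improvement.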

R.



You can find the failure logs in *.log.1.xz files in
  - 
https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m23_soft_eabi-build/144/artifact/artifacts/00-sumfiles/
The full lists of regressions and improvements as well as configure and make 
commands are in
  - 
https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m23_soft_eabi-build/144/artifact/artifacts/notify/
The list of [ignored] baseline and flaky failures are in
  - 
https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m23_soft_eabi-build/144/artifact/artifacts/sumfiles/xfails.xfail

The configuration of this build is:
CI config tcwg_gnu_embed_check_gcc arm-eab

Re: [Linaro-TCWG-CI] gcc-14-8887-gd9459129ea8: FAIL: 29 regressions on master-thumb_m33_eabi

2024-02-12 Thread Richard Earnshaw


I think all of these actually fall under

"I suspect there are still some further issues to address here, since
the framework does not correctly test that the multilibs and startup
code enable alternative format; but this is still an improvement over
what we had before."

All the failures are execution test failures due to the fact that we 
don't check the available hardware/multilibs for running the test; so 
blindly adding options and then running the test is incorrect.  But we 
currently lack such a test in the framework.


It's also less than clear exactly what these tests are checking and 
which part of what they are checking that really requires the options 
they add.  I suspect that they previously passed only by accident (they 
didn't really add enough flags to enable what their author thought they 
were checking).


R.

On 10/02/2024 02:43, ci_not...@linaro.org wrote:
Dear contributor, our automatic CI has detected problems related to your 
patch(es).  Please find some details below.  If you have any questions, 
please follow up on linaro-toolchain@lists.linaro.org mailing list, 
Libera's #linaro-tcwg channel, or ping your favourite Linaro toolchain 
developer on the usual project channel.


We appreciate that it might be difficult to find the necessary logs or 
reproduce the issue locally. If you can't get what you need from our CI 
within minutes, let us know and we will be happy to help.


We track this report status in 
https://linaro.atlassian.net/browse/GNU-1149 , please let us know if 
you are looking at the problem and/or when you have a fix.


In  arm-eabi cortex-m33 hard after:

   | commit gcc-14-8887-gd9459129ea8
   | Author: Richard Earnshaw 
   | Date:   Mon Feb 5 17:16:45 2024 +
   |
   | arm: testsuite: fix issues relating to fp16 alternative testing
   |
   | The v*_fp16_xN_1.c tests on Arm have been unstable since they were
   | added.  This is not a problem with the tests themselves, or even the
   | patches that were added, but with the testsuite infrastructure.  It
   | turned out that another set of dg- tests for fp16 were corrupting the
   | cached set of options used by the new tests, leading to running the
   | ... 45 lines of the commit log omitted.

FAIL: 29 regressions

regressions.sum:
     === g++ tests ===

Running g++:g++.dg/dg.exp ...
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=c++14 execution test
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=c++17 execution test
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=c++20 execution test
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=c++98 execution test
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C -std=gnu++14 execution test
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C -std=gnu++17 execution test
FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C -std=gnu++20 execution test
... and 26 more entries

You can find the failure logs in *.log.1.xz files in
  - 
https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts/00-sumfiles/
The full lists of regressions and progressions as well as configure and 
make commands are in
  - 
https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts/notify/

The list of [ignored] baseline and flaky failures are in
  - 
https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts/sumfiles/xfails.xfail


The configuration of this build is:
CI config tcwg_gnu_embed_check_gcc arm-eabi -mthumb 
-march=armv8-m.main+dsp+fp -mtune=cortex-m33 -mfloat-abi=hard -mfpu=auto


-8<--8<--8<--
The information below can be used to reproduce a debug environment:

Current build   : 
https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts
Reference build : 
https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/362/artifact/artifacts


Reproduce last good and first bad builds: 
https://git-us.linaro.org/toolchain/ci/interesting-commits.git/plain/gcc/sha1/d9459129ea8f8c3cbd6150b90e842decba7952a3/tcwg_gnu_embed_check_gcc/master-thumb_m33_eabi/reproduction_instructions.tx

Re: [PATCH v2] ARM: Block predication on atomics [PR111235]

2024-01-15 Thread Richard Earnshaw





On 02/10/2023 18:12, Wilco Dijkstra wrote:

Hi Ramana,


I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf
--build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the
default armhf settings are incorrect. I shouldn't need the --with-float=hard
since that is obviously implied by armhf, and they should also imply armv7-a
with vfpv3 according to documentation. It seems to get confused and skip some
tests. I tried using --with-fpu=auto, but that doesn't work at all, so in the
end I forced it like: --with-arch=armv8-a --with-fpu=neon-fp-armv8. With this
it runs a few more tests.


Yeah that's a wart that I don't like.

armhf just implies the hard float ABI and came into being to help
distinguish from the Base PCS for some of the distros at the time
(2010s). However we didn't want to set a baseline arch at that time
given the imminent arrival of v8-a and thus the specification of
--with-arch , --with-fpu and --with-float became second nature to many
of us working on it at that time.


Looking at it, the default is indeed incorrect, you get:
'-mcpu=arm10e' '-mfloat-abi=hard' '-marm' '-march=armv5te+fp'

That's not incorrect.  It's the first version of the architecture that 
can support the hard-float ABI.




That's like 25 years out of date!


It's not a matter of being out of date (and it's only 22 years since 
arm1020e was announced ;) it's a matter of being as compatible as we can 
be with existing hardware out-of-the-box.  Distros are free, of course, 
to set a higher bar and do so.




However all the armhf distros have Armv7-a as the baseline and use Thumb-2:
'-mfloat-abi=hard' '-mthumb' '-march=armv7-a+fp'


Wrong.  Rawhide uses Arm state (or it did last I checked).  As I 
mentioned above, distros are free to set a higher bar.




So the issue is that dg-require-effective-target arm_arch_v7a_ok doesn't work on
armhf. It seems that if you specify an architecture even with hard-float
configured, it turns off FP and then complains because hard-float implies you
must have FP...


OK, I think I see the problem there, it's in the data for
proc add_options_for_arm_arch_FUNC

in lib/target-supports.exp.  In order to work correctly with -mfpu=auto, 
the -march flags in the table need "+fp" adding in most cases (pretty 
much everything from armv5e onwards) - that's harmless whenever the 
float-abi is soft, but should do the right thing when softfp or hard are 
used.




So in most configurations (including the one used by distro compilers) we
basically skip lots of tests for no apparent reason...


Ok, thanks for promising to do so - I trust you to get it done. Please
try out various combinations of -march v7ve, v7-a , v8-a with the tool
as each of them have slightly different rules. For instance v7ve
allows LDREXD and STREXD to be single copy atomic for 64 bit loads
whereas v7-a did not .


You mean LDRD may be generated on CPUs with LPAE. We use LDREXD by
default since that is always atomic on v7-a.


Ok if no regressions but as you might get nagged by the post commit CI ...


Thanks, I've committed it. Those links don't show anything concrete, however I
do note the CI didn't pick up v2.

Btw you're happy with backports if there are no issues reported for a few days?

Cheers,
Wilco


R.
___
linaro-toolchain mailing list -- linaro-toolchain@lists.linaro.org
To unsubscribe send an email to linaro-toolchain-le...@lists.linaro.org

Re: What -mfpu option is used with neon, vfpv3 and vfpd32 flag?

2016-07-22 Thread Richard Earnshaw

On 22/07/16 05:21, Jeffrey Walton wrote:
> On Fri, Jul 22, 2016 at 12:19 AM, Jim Wilson  wrote:
>> On Thu, Jul 21, 2016 at 9:13 PM, Jeffrey Walton  wrote:
>>> So I guess the question is, what do I use for -mfpu=neon-vfp3 (or
>>> -mfpu=neon-vfp3-d32)? Is -mfpu=neon enough?
>>
>> The -mfpu=neon option is enough.  neon implies vfpv3 and 32 D registers.
>
> Perfect, thanks.
>
> Jeff
> ___
> linaro-toolchain mailing list
> linaro-toolchain@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/linaro-toolchain
>

According to https://beagleboard.org/black, this board contains a
Cortex-A8.  So -mfpu=neon is correct.

https://community.arm.com/groups/tools/blog/2013/04/15/arm-cortex-a-processors-and-gcc-command-lines

R.
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.


Re: -mfpu=neon-fp-armv8 and unrecognized command line option

2016-07-21 Thread Richard Earnshaw

On 20/07/16 22:33, Jim Wilson wrote:
> On Wed, Jul 20, 2016 at 2:14 PM, Jeffrey Walton  wrote:
>> I'm having trouble with ARMv8/Aarch64. One is an early Mustang server
>
> ARMv8 implies 32-bit code (aarch32).  AArch64 implies 64-bit code.
> These are two different compilers, with two different sets of command
> line options.

Er, no.  ARMv8 (pedantically ARMv8-A, since there are also ARMv8-R and
ARMv8-M specifications as well) is an architecture, not an ISA.  The
ARMv8 architecture defines two execution modes: AArch32 and AArch64.
The AArch32 execution mode further has two states with separate ISAs: A32
and T32, more traditionally known as ARM and Thumb states.

GCC has two separate compilers for ARMv8.  One handles the AArch32
execution mode (configurations based on arm-*-* for legacy reasons) and
the other AArch64 (configurations based on aarch64-*-*).

The -mfpu option only applies to the AArch32 compiler.
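
As an illustration of that split, the following sketch reports which
execution state a translation unit is being compiled for.  It relies only on
the predefined macros __aarch64__, __arm__, __thumb__ and __ARM_NEON; the
function names execution_state and has_advanced_simd are made up for this
example.

```c
/* Report which Arm execution state (and ISA) this file is compiled for.
   An aarch64-*-* compiler defines __aarch64__; an arm-*-* compiler
   defines __arm__, plus __thumb__ when generating T32 code.  */
const char *
execution_state (void)
{
#if defined (__aarch64__)
  return "AArch64 (A64)";
#elif defined (__arm__) && defined (__thumb__)
  return "AArch32 (T32)";
#elif defined (__arm__)
  return "AArch32 (A32)";
#else
  return "not an Arm target";
#endif
}

/* Return 1 if Advanced SIMD (Neon) is available to this compilation.
   On AArch32 this depends on options such as -mfpu; on AArch64 it is
   normally on by default.  */
int
has_advanced_simd (void)
{
#if defined (__ARM_NEON)
  return 1;
#else
  return 0;
#endif
}
```

Built with an arm-*-* compiler, the result of has_advanced_simd would be
expected to follow the -mfpu setting; built with an aarch64-*-* compiler there
is no -mfpu to pass in the first place.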

R.

>
>> $ g++ -DDEBUG -g3 -O0 -mfpu=neon-fp-armv8 -fPIC -pipe -c cryptlib.cpp
>> g++: error: unrecognized command line option ‘-mfpu=neon-fp-armv8’
>> GNUmakefile:753: recipe for target 'cryptlib.o' failed
>
> -mfpu=neon-fp-armv8 is an arm (32-bit) compiler option.  The aarch64
> (64-bit) compiler will not accept it.
>
> Because FP and Neon support is optional in the 32-bit arm
> architecture, there are compiler options to enable fp and/or neon
> support.  Usually FP support is enabled by default for a linux distro,
> but the neon support usually is not, and you can enable neon by using
> this -mfpu=neon-fp-armv8 option if running 32-bit code on an ARMv8
> architecture part.
>
> Meanwhile, the aarch64 spec requires FP and ASIMD instruction support
> in the linux ABI, so there are no options to enable them, they are on
> by default.  If you really want to disable them, you can do so by
> using a -march= option, e.g. -march=armv8-a+fp+simd enables them, and
> -march=armv8-a+nofp+nosimd disables them.  However, if you disable fp
> support, you will break the ABI, and your code may not compile or run,
> so don't do that unless perhaps you have an embedded target, and have
> your own OS build and your own ABI, or no code that uses FP.  You can
> also enable/disable crc (crypto) support this way, but a better way
> is to use a -mcpu= option, and let gcc figure out if the target has
> crc instructions.
>
> See the aarch64 compiler docs here
> https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options
>
> Jim
>


Re: gcc 5.2 code quality

2016-03-03 Thread Richard Earnshaw

On 03/03/16 00:44, kugan wrote:
>
>
>>> I have just switched to gcc 5.2 from 4.9.2 and the code quality does
>>> seem to have improved significantly. For example, it now seems much
>>> better at using ldp/stp and it seems to have stopped gratuitous use of
>>> the SIMD registers.
>>>
>>> However, I still have a few whinges:-)
>>>
>>> See attached copy.c / copy.s (This is a performance critical function
>>> from OpenJDK)
>>>
>>> pd_disjoint_words:
>>>  cmp x2, 8         <<< (1)
>>>  sub sp, sp, #64   <<< (2)
>>>  bhi .L2
>>>  cmp w2, 8         <<< (1)
>>>  bls .L15
>>> .L2:
>>>  add sp, sp, 64    <<< (2)
>>>
>>> (1) If count as a 64 bit unsigned is <= 8 then it is probably still
>>> <= 8 as a 32 bit unsigned.
>>>
>> Agreed.  This could probably be done by the mid-end based on value range
>> propagation.  Please can you file a report in gcc bugzilla?
>
> Not sure how we can do this in VRP. It seems that this is generated
> during the RTL expansion time. Maybe it has to be done during expansion.
> optimized tree looks like:
>
>

Ramana and I looked further into this last night.  It turns out this is
due to the way we expand switch tables.  The ARM and AArch64 back-ends
both use the casesi pattern which is defined to do a range check and a
branch into the table.  The range check is based on a 32-bit value.

Because this example uses a 64-bit type as the controlling expression,
the mid-end has to insert another check that the original value is
within range; this renders the second check redundant but there's then
no way to remove that.  You're correct that VRP isn't going to help here.

We're looking at whether we can adjust things to use the tablejump
expander, since that should eliminate the need for the second check.
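
A rough source-level model of the two checks, under the assumption that the
switch covers cases 0..8 as in pd_disjoint_words.  The function names are
invented for this sketch and it does not reproduce what the expander actually
emits; it only shows why the first check is needed and why the second one
then becomes dead.

```c
#include <stdint.h>

/* Model of the dispatch with both checks.  Returns the table index,
   or -1 when the default case would be taken.  */
int
table_index_two_checks (uint64_t count)
{
  /* Check on the full 64-bit value.  Without it, a value such as
     0x100000003 would truncate to 3 and wrongly hit case 3.  */
  if (count > 8)
    return -1;

  /* 32-bit range check in the style of casesi.  After the check above
     this can never be true.  */
  uint32_t narrowed = (uint32_t) count;
  if (narrowed > 8)
    return -1;

  return (int) narrowed;
}

/* The same dispatch with only the check that is really needed.  */
int
table_index_one_check (uint64_t count)
{
  return count > 8 ? -1 : (int) count;
}
```

Both functions give the same answer for every input, which is the sense in
which the second compare in the generated code is redundant.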



> ;; Function pd_disjoint_words (pd_disjoint_words, funcdef_no=0,
> decl_uid=2763, cgraph_uid=0, symbol_order=0)
>
> Removing basic block 13
> pd_disjoint_words (HeapWord * from, HeapWord * to, size_t count)
> {
>   long int t$b;
>   long int t$a;
>   struct unit t;
>   struct unit t;
>   struct unit t;
>   struct unit t;
>   struct unit t;
>   struct unit t;
>   long int _5;
>
>   :
>   switch (count_2(D)) , case 0: , case 1: , case
> 2: , case 3: , case 4: , case 5: , case 6: , case
> 7: , case 8: >
>
> :
>   _5 = *from_4(D);
>   *to_6(D) = _5;
>   goto  ();
>
> :
>   t$a_8 = MEM[(struct unit *)from_4(D)];
>   t$b_9 = MEM[(struct unit *)from_4(D) + 8B];
>   MEM[(struct unit *)to_6(D)] = t$a_8;
>   MEM[(struct unit *)to_6(D) + 8B] = t$b_9;
>   goto  ();
>
> :
>   t = MEM[(struct unit *)from_4(D)];
>   MEM[(struct unit *)to_6(D)] = t;
>   t ={v} {CLOBBER};
>   goto  ();
>
> :
>   t = MEM[(struct unit *)from_4(D)];
>   MEM[(struct unit *)to_6(D)] = t;
>   t ={v} {CLOBBER};
>   goto  ();
>
> :
>   t = MEM[(struct unit *)from_4(D)];
>   MEM[(struct unit *)to_6(D)] = t;
>   t ={v} {CLOBBER};
>   goto  ();
>
> :
>   t = MEM[(struct unit *)from_4(D)];
>   MEM[(struct unit *)to_6(D)] = t;
>   t ={v} {CLOBBER};
>   goto  ();
>
> :
>   t = MEM[(struct unit *)from_4(D)];
>   MEM[(struct unit *)to_6(D)] = t;
>   t ={v} {CLOBBER};
>   goto  ();
>
> :
>   t = MEM[(struct unit *)from_4(D)];
>   MEM[(struct unit *)to_6(D)] = t;
>   t ={v} {CLOBBER};
>   goto  ();
>
> :
>   _Copy_disjoint_words (from_4(D), to_6(D), count_2(D)); [tail call]
>
> :
>   return;
>
> }
>
>
>
> Thanks,
> Kugan
>


Re: gcc 5.2 code quality

2016-03-02 Thread Richard Earnshaw

On 02/03/16 11:35, Edward Nevill wrote:
> Hi,
>
> I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to 
> have improved significantly. For example, it now seems much better at using 
> ldp/stp and it seems to have stopped gratuitous use of the SIMD registers.
>
> However, I still have a few whinges:-)
>
> See attached copy.c / copy.s (This is a performance critical function from 
> OpenJDK)
>
> pd_disjoint_words:
> cmp x2, 8         <<< (1)
> sub sp, sp, #64   <<< (2)
> bhi .L2
> cmp w2, 8         <<< (1)
> bls .L15
> .L2:
> add sp, sp, 64    <<< (2)
>
> (1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a 
> 32 bit unsigned.
>
Agreed.  This could probably be done by the mid-end based on value range
propagation.  Please can you file a report in gcc bugzilla?

> (2) Nowhere in the function does it store anything on the stack, so why
> drop and restore the stack every time. Also, minor quibble in the
> disass, why does sub use #64 whereas add uses just '64' (appreciate this
> is probably binutils, not gcc).
>

This is a known problem.  What's happened is that in the early phase of
compilation you had an object that appeared to need stack space.  Later
on that was optimized away, but the stack slot is not freed.  In large
functions where there is often other data on the stack anyway this
equates to little more than some wasted stack space, but in small
functions it can often make the difference between needing stack
adjustments and not.

> .L15:
> adrp x3, .L4
> add x3, x3, :lo12:.L4
> ldrb w2, [x3,w2,uxtw] <<< (3)
> adr x3, .Lrtx4
> add x2, x3, w2, sxtb #2
> br  x2
>
> (3) Why use a byte table, this is not some sort of embedded system. Use
> a word table and this becomes.
>
> .L15:
> adrp x3, .L4
> add x3, x3, :lo12:.L4
> ldr x2, [x3, x2, lsl #3]
> br  x2
>
> An aligned word load takes exactly the same time as a byte load and we
> save the faffing about calculating the address.
>

That doesn't work for PIC (or PIE) and can also significantly increase
cache pressure.
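
A small sketch of the size side of that trade-off, assuming the nine cases
(0..8) of pd_disjoint_words.  The names below are hypothetical.  A table of
absolute addresses would also need fixing up at load time in position-
independent code, whereas offsets relative to a base label do not depend on
where the code is loaded.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CASES 9   /* cases 0..8 in pd_disjoint_words */

/* Bytes used by a table of one-byte offsets from a base label.  */
size_t
offset_table_bytes (void)
{
  return NUM_CASES * sizeof (int8_t);
}

/* Bytes used by a table of absolute code addresses.  */
size_t
address_table_bytes (void)
{
  return NUM_CASES * sizeof (void (*) (void));
}

/* Compute a branch target from a base address and a signed byte offset
   scaled by 4, in the spirit of "add x2, x3, w2, sxtb #2".  */
uintptr_t
target_from_offset (uintptr_t base, int8_t scaled_offset)
{
  return base + (uintptr_t) ((intptr_t) scaled_offset * 4);
}
```

On a target with 8-byte pointers the address table is eight times the size of
the byte table, which is where the extra cache pressure comes from.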

> .L10:
> ldp x6, x7, [x0]
> ldp x4, x5, [x0, 16]
> ldp x2, x3, [x0, 32]  <<< (4)
> stp x2, x3, [x1, 32]  <<< (4)
> stp x6, x7, [x1]
> stp x4, x5, [x1, 16]
>
> (4) Seems to be something wrong with the load scheduler here? Why not
> move the stp x2, x3 to the end. It does this repeatedly.
>

You don't say what compilation options you used but a simple build with
-O3 on gcc trunk shows the stores in the correct order.


> Unfortunately as this function is performance critical it means I will
> probably end up doing it in inline assembler which is time consuming,
> error prone and non portable.
>
> * Whinge mode off
>
> Ed
>
>
> copy.c
>
>
> #include 
>
> typedef long HeapWord;
>
> extern void _Copy_disjoint_words(HeapWord* from, HeapWord* to, size_t count);
>
> void pd_disjoint_words(HeapWord* from, HeapWord* to, size_t count) {
> switch (count) {
>   case 0: return;
>   case 1: to[0] = from[0]; return;
>   case 2: {
> struct unit { HeapWord a, b; } *p, *q, t;
>   p = (struct unit *)from;
>   q = (struct unit *)to;
>   t = *p;
>   *q = t;
>   return;
>   }
>   case 3: {
> struct unit { HeapWord a, b, c; } *p, *q, t;
>   p = (struct unit *)from;
>   q = (struct unit *)to;
>   t = *p;
>   *q = t;
>   return;
>   }
>   case 4: {
> struct unit { HeapWord a, b, c, d; } *p, *q, t;
>   p = (struct unit *)from;
>   q = (struct unit *)to;
>   t = *p;
>   *q = t;
>   return;
>   }
>   case 5: {
> struct unit { HeapWord a, b, c, d, e; } *p, *q, t;
>   p = (struct unit *)from;
>   q = (struct unit *)to;
>   t = *p;
>   *q = t;
>   return;
>   }
>   case 6: {
> struct unit { HeapWord a, b, c, d, e, f; } *p, *q, t;
>   p = (struct unit *)from;
>   q = (struct unit *)to;
>   t = *p;
>   *q = t;
>   return;
>   }
>   case 7: {
> struct unit { HeapWord a, b, c, d, e, f, g; } *p, *q, t;
>   p = (struct unit *)from;
>   q = (struct unit *)to;
>   t = *p;
>   *q = t;
>   return;
>   }
>   case 8: {
> struct unit { HeapWord a, b, c, d, e, f, g, h; } *p, *q, t;
>   p = (struct unit *)from;
>   q = (struct unit *)to;
>   t = *p;
>   *q = t;
>   return;
>   }
>   default:
> _Copy_disjoint_words(from, to, count);
> }
> }
>
>
> copy.s
>
>
>   .cpu generic+fp+simd
>   .file   "copy.c"
>   .text
>   .align  2
>   .p2align 3,,7
>   .global pd_disjoint_words
>   .type   pd_disjoint_words, %function
> pd_disjoint_words:
>   cmp x2, 8
>   sub sp, sp, #64
>

Re: gcc 5.2 code quality

2016-03-02 Thread Richard Earnshaw

On 02/03/16 14:25, Renato Golin wrote:
> On 2 March 2016 at 11:35, Edward Nevill  wrote:
>> cmp x2, 8   <<< (1)
>> (1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as 
>> a 32 bit unsigned.
>
> You mean to use "cmp w2, 8" instead? Is there any difference?
>

No, it's code equivalent to

unsigned long x;

if (x <= 8)
{
  if ((unsigned) x <= 8)
  {
...
  }
}

Where the inner test is clearly redundant (for unsigned).

R.


>
>> (2) Nowhere in the function does it store anything on the stack, so why
>> drop and restore the stack every time. Also, minor quibble in the
>> disass, why does sub use #64 whereas add uses just '64' (appreciate this
>> is probably binutils, not gcc).
>
> My reading of the AAPCS64 is that it's not necessary to have a frame
> at all, only that if you do, it must be quad-word aligned.
>
> Clang/LLVM doesn't seem to bother with the push and pop, but it also
> uses "cmp x".
>
>
>> .L15:
>> adrp x3, .L4
>> add x3, x3, :lo12:.L4
>> ldr x2, [x3, x2, lsl #3]
>> br  x2
>
> Hum, this is *exactly* what Clang generates... :)
>
>
>> (4) Seems to be something wrong with the load scheduler here? Why not
>> move the stp x2, x3 to the end. It does this repeatedly.
>
> Again, Clang seems to do what you want...
>
> Have you tried building OpenJDK with Clang?
>
> cheers,
> --renato
> ___
> linaro-toolchain mailing list
> linaro-toolchain@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/linaro-toolchain
>

IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain

Re: Linaro 4.9 toolchain has issues with allyesconfig

2016-01-27 Thread Richard Earnshaw

On 27/01/16 16:57, Jim Wilson wrote:
> On Wed, Jan 27, 2016 at 8:37 AM, Christophe Lyon
>  wrote:
>> I confirm the trampolines are not inserted for ld -r.
>
> We can't insert the trampolines with ld -r.  We don't have all of the
> symbols required for this with a relocatable link.  Also, for the
> symbols we do have, we don't know the final addresses, so we don't
> know which branches/calls will be out of range.
>
>> Maybe using -ffunction-sections would help?
>
> Using -mlong-calls might be easier.  You would only want this enabled
> for a allyesconfig build of course.  This should solve the
> out-of-range calls to the spin lock functions.  it probably doesn't
> help for the out-of-range branches to the .text.unlikely sections, but
> this is a gcc optimization that can be disabled with
> -fno-reorder-functions.  This is again something you would only want
> for an allyesconfig build.
>
> Jim
>

Long calls would probably solve the problem, but would likely be
horribly expensive in performance.

The best solution would be to have an option to prevent ld -r from
merging like-named sections (instead just aggregating multiple sections
with similar names into one object file).  This is possible in ELF.

R.


Re: Linaro 4.9 toolchain has issues with allyesconfig

2016-01-27 Thread Richard Earnshaw

On 27/01/16 14:52, William Mills wrote:
>
>
> On 01/27/2016 08:35 AM, Richard Earnshaw wrote:
>> On 26/01/16 17:25, Christophe Lyon wrote:
>>> On 26 January 2016 at 18:23, Dan Murphy  wrote:
>>>> Christophe
>>>>
>>>>
>>>> On 01/26/2016 10:58 AM, Christophe Lyon wrote:
>>>>>
>>>>> On 25 January 2016 at 17:21, Dan Murphy  wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> When using the linaro-4.9-2015.05 toolchain on the Linux master and on
>>>>>> Linux stable releases
>>>>>> I am seeing a build issue below when using the allyesconfig.  This does
>>>>>> not seem to occur on the 5.2 tool chain.
>>>>>> We cannot move to 5.2 tool chain for our releases as they are based on
>>>>>> 4.9.
>>>>>>
>>>>>>   LD  init/built-in.o
>>>>>> arch/arm/kernel/built-in.o:(.text.fixup+0x1d4): relocation truncated to
>>>>>> fit: R_ARM_JUMP24 against `.text.unlikely'
>>>>>> arch/arm/kernel/built-in.o:(.text.fixup+0x1e0): relocation truncated to
>>>>>> fit: R_ARM_JUMP24 against `.text.unlikely'
>>>>>> arch/arm/kernel/built-in.o:(.text.fixup+0x1ec): relocation truncated to
>>>>>> fit: R_ARM_JUMP24 against `.text.unlikely'
>>>>>> drivers/built-in.o: In function `combiner_handle_cascade_irq':
>>>>>> :(.text+0x834): relocation truncated to fit: R_ARM_CALL against symbol
>>>>>> `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o
>>>>>> :(.text+0x868): relocation truncated to fit: R_ARM_CALL against symbol
>>>>>> `_raw_spin_unlock' defined in .spinlock.text section in kernel/built-in.o
>>>>>> drivers/built-in.o: In function `hip04_irq_set_type':
>>>>>> :(.text+0xad0): relocation truncated to fit: R_ARM_CALL against symbol
>>>>>> `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o
>>>>>> :(.text+0xb10): relocation truncated to fit: R_ARM_CALL against symbol
>>>>>> `_raw_spin_unlock' defined in .spinlock.text section in kernel/built-in.o
>>>>>> drivers/built-in.o: In function `hip04_raise_softirq':
>>>>>> :(.text+0xc8c): relocation truncated to fit: R_ARM_CALL against symbol
>>>>>> `_raw_spin_lock_irqsave' defined in .spinlock.text section in
>>>>>> kernel/built-in.o
>>>>>> :(.text+0xdc8): relocation truncated to fit: R_ARM_CALL against symbol
>>>>>> `_raw_spin_unlock_irqrestore' defined in .spinlock.text section in
>>>>>> kernel/built-in.o
>>>>>> drivers/built-in.o: In function `hip04_irq_set_affinity':
>>>>>> :(.text+0xefc): relocation truncated to fit: R_ARM_CALL against symbol
>>>>>> `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o
>>>>>> :(.text+0xf78): additional relocation overflows omitted from the output
>>>>>> make: *** [vmlinux] Error 1
>>>>>>
>>>>>> Please advise to how to resolve this issue within the 4.9 Linaro tool
>>>>>> chain
>>>>>
>>>>> Hi Dan,
>>>>>
>>>>> It would be better to report this kind of problem using our bugzilla:
>>>>> https://bugs.linaro.org/
>>>>>
>>>>> I've managed to reproduce it, except for the errors with R_ARM_JUMP24.
>>>>> Maybe that's because I used linux-4.3.3.tar.xz.
>>>>>
>>>>> Anyway, these errors indicate branches trying to reach a function which
>>>>> is too far away from the call site.
>>>>> Normally, the linker inserts stubs (trampolines) to handle such
>>>>> situations, but here the .text section
>>>>> of drivers/built-in.o is really huge: 84247436 bytes (84MB) in my case.
>>>>> The linker is able to insert trampolines at section boundaries
>>>>> (between object files), but in this case
>>>>> it cannot insert one close enough, hence the error you are seeing.
>>>>
>>>>
>>>> Do we know if this is a linux kernel issue or a linker issue that got
>>>> exposed
>>>> when a patch came in?
>>>>
>>>
>>> I don't know, but I'm not a kernel expert.
>>>
>>> It's

Re: Linaro 4.9 toolchain has issues with allyesconfig

2016-01-27 Thread Richard Earnshaw

On 26/01/16 17:25, Christophe Lyon wrote:
> On 26 January 2016 at 18:23, Dan Murphy  wrote:
>> Christophe
>>
>>
>> On 01/26/2016 10:58 AM, Christophe Lyon wrote:
>>>
>>> On 25 January 2016 at 17:21, Dan Murphy  wrote:

>>>> Hi!
>>>>
>>>> When using the linaro-4.9-2015.05 toolchain on the Linux master and on
>>>> Linux stable releases
>>>> I am seeing a build issue below when using the allyesconfig.  This does
>>>> not seem to occur on the 5.2 tool chain.
>>>> We cannot move to 5.2 tool chain for our releases as they are based on
>>>> 4.9.
>>>>
>>>>   LD  init/built-in.o
>>>> arch/arm/kernel/built-in.o:(.text.fixup+0x1d4): relocation truncated to
>>>> fit: R_ARM_JUMP24 against `.text.unlikely'
>>>> arch/arm/kernel/built-in.o:(.text.fixup+0x1e0): relocation truncated to
>>>> fit: R_ARM_JUMP24 against `.text.unlikely'
>>>> arch/arm/kernel/built-in.o:(.text.fixup+0x1ec): relocation truncated to
>>>> fit: R_ARM_JUMP24 against `.text.unlikely'
>>>> drivers/built-in.o: In function `combiner_handle_cascade_irq':
>>>> :(.text+0x834): relocation truncated to fit: R_ARM_CALL against symbol
>>>> `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o
>>>> :(.text+0x868): relocation truncated to fit: R_ARM_CALL against symbol
>>>> `_raw_spin_unlock' defined in .spinlock.text section in kernel/built-in.o
>>>> drivers/built-in.o: In function `hip04_irq_set_type':
>>>> :(.text+0xad0): relocation truncated to fit: R_ARM_CALL against symbol
>>>> `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o
>>>> :(.text+0xb10): relocation truncated to fit: R_ARM_CALL against symbol
>>>> `_raw_spin_unlock' defined in .spinlock.text section in kernel/built-in.o
>>>> drivers/built-in.o: In function `hip04_raise_softirq':
>>>> :(.text+0xc8c): relocation truncated to fit: R_ARM_CALL against symbol
>>>> `_raw_spin_lock_irqsave' defined in .spinlock.text section in
>>>> kernel/built-in.o
>>>> :(.text+0xdc8): relocation truncated to fit: R_ARM_CALL against symbol
>>>> `_raw_spin_unlock_irqrestore' defined in .spinlock.text section in
>>>> kernel/built-in.o
>>>> drivers/built-in.o: In function `hip04_irq_set_affinity':
>>>> :(.text+0xefc): relocation truncated to fit: R_ARM_CALL against symbol
>>>> `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o
>>>> :(.text+0xf78): additional relocation overflows omitted from the output
>>>> make: *** [vmlinux] Error 1
>>>>
>>>> Please advise to how to resolve this issue within the 4.9 Linaro tool
>>>> chain
>>>
>>> Hi Dan,
>>>
>>> It would be better to report this kind of problem using our bugzilla:
>>> https://bugs.linaro.org/
>>>
>>> I've managed to reproduce it, except for the errors with R_ARM_JUMP24.
>>> Maybe that's because I used linux-4.3.3.tar.xz.
>>>
>>> Anyway, these errors indicate branches trying to reach a function which
>>> is too far away from the call site.
>>> Normally, the linker inserts stubs (trampolines) to handle such
>>> situations, but here the .text section
>>> of drivers/built-in.o is really huge: 84247436 bytes (84MB) in my case.
>>> The linker is able to insert trampolines at section boundaries
>>> (between object files), but in this case
>>> it cannot insert one close enough, hence the error you are seeing.
>>
>>
>> Do we know if this is a linux kernel issue or a linker issue that got
>> exposed
>> when a patch came in?
>>
>
> I don't know, but I'm not a kernel expert.
>
> It's likely that the allyes config produces large object files.
>
> Did it ever work for you?
>
> With a known-to-work starting point we could try to bisect
> and identify when the problem appeared.
>

I find it hard to believe that a compiler could generate a single object
file that contained 84MB of code; it would take an inordinate amount of
code to do that even if optimizations were turned off.  So that leaves
two likely options:

1) The file is constructed by concatenating multiple object files,
perhaps with ld -r to produce a partially linked object file.
2) The file contains some directives that are inserting large amounts of
padding.

Either way, I suspect that this is going to require fixing in the Linux
build system.  You've hit a fundamental limit of the AArch32
architecture, as Christophe has mentioned, and there's nothing at this
point that the tools can do to help you.
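
To put a number on that limit, here is a hedged sketch of the range check behind "relocation truncated to fit" for ARM-state B/BL (the instructions that R_ARM_JUMP24 and R_ARM_CALL apply to). The helper name arm_branch_in_range is hypothetical, and the sketch ignores Thumb interworking. These instructions encode a signed 24-bit word offset, so the reach is roughly +/-32MB from the branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an ARM-state B/BL at insn_addr can reach target_addr.
   The offset is relative to insn_addr + 8 (the value the PC reads as in
   ARM state) and must fit in a signed 24-bit count of 4-byte words. */
bool arm_branch_in_range(uint32_t insn_addr, uint32_t target_addr)
{
    int64_t disp = (int64_t)target_addr - ((int64_t)insn_addr + 8);

    return (disp % 4) == 0
        && disp >= -(INT64_C(1) << 25)        /* -33554432 bytes */
        && disp <= (INT64_C(1) << 25) - 4;    /* +33554428 bytes */
}
```

With the 84247436-byte .text section reported above, a call from near the start of that section to a symbol placed after it is well outside this range, and a stub can only help if it can itself be placed within reach of the call site.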

R.

> Christophe.
>
>
>> Dan
>>
>>
>>> The range of branch instructions is indicated for instance here:
>>>
>>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/Cihfddaf.html
>>>
>>> I'm not sure why this would not happen with the 5.2 toolchain, I haven't
>>> tried.
>>>
>>> Christophe.
>>>
>>>
>>>> Dan
>>>>
>>>> --
>>>> --
>>>> Dan Murphy
>>>>
>>>> ___
>>>> linaro-toolchain mailing list
>>>> linaro-toolchain@lists.linaro.org
>>>> https://lists.linaro.org/mailman/listinfo/linaro-toolchain
>>
>>
>>
>> --
>> --
>> Dan Murphy
>>
> ___

Re: [ANNOUNCE] Linaro GCC 4.9 2014.06 released

2014-06-13 Thread Richard Earnshaw

On 13/06/14 09:52, Matthias Klose wrote:
> Am 13.06.2014 10:10, schrieb Yvan Roux:
>> Linaro GCC 4.9 2014.06 is the third Linaro GCC source package release in the
>> 4.9 series.  It is based on FSF GCC 4.9.1+svn211054 and includes performance
>> improvements and bug fixes.
> 
> This sounds like 4.9.1 is already released. Do you mean 4.9.0+svn211054 ?
> 
> 
> ___
> linaro-toolchain mailing list
> linaro-toolchain@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-toolchain
> 

BASE_VER is bumped in SVN immediately after the release has been made,
so the svn branch probably does say 4.9.1.

This should really be called 4.9.1-pre+svn211054.

R.



Re: [RFC][AArch64] Remove CORE_REGS from reg_class

2014-05-15 Thread Richard Earnshaw


On 15/05/14 00:22, Kugan wrote:

Hi All,

The AArch64 back-end defines GENERAL_REGS and CORE_REGS with the same set
of registers. Is there any reason why we need this?

Target hooks like aarch64_register_move_cost don't handle CORE_REGS.
In addition, the IRA cost calculation also has logic such as making the common
class the biggest of best and alternate; this might get confused with this.

Attached RFC patch removes it. Regression tested for
aarch64-none-linux-gnu on qemu-aarch64 with no new regressions. Is this OK?



Patches for gcc need to be sent to gcc-patches...

R.



Thanks,
Kugan

gcc/

2014-05-14  Kugan Vivekanandarajah  

* config/aarch64/aarch64.c (aarch64_regno_regclass) : Change CORE_REGS
to GENERAL_REGS.
(aarch64_secondary_reload) : Likewise.
(aarch64_class_max_nregs) : Remove CORE_REGS.
* config/aarch64/aarch64.h (enum reg_class) : Remove CORE_REGS.
(REG_CLASS_NAMES) : Likewise.
(REG_CLASS_CONTENTS) : Likewise.
(INDEX_REG_CLASS) : Change CORE_REGS to GENERAL_REGS.






Re: -mfpu=softvfp+vfp in LLVM

2014-02-18 Thread Richard Earnshaw

On 18/02/14 12:59, Renato Golin wrote:
> Richard,
> 
> I found some emails about you implementing softvfp back in 2003, and
> I'd like to know what is the expected behaviour when it conflicts with
> the target triple, for example:
> 
>  -triple arm-linux-gnueabihf + -mfpu=softvfp+vfp
> 
> In this case, in LLVM, the triple sets "-float-abi=hard" but the fpu
> would set "+soft-float-abi", which are contradictory flags.
> 
> Is that case even possible? If so, what's the expected behaviour? Soft
> or hard float?
> 
> Do the extra flags always override the triple behaviour? Is it
> expected that *every* compiler flag will work on a
> last-seen-sets-behaviour manner?
> 
> cheers,
> --renato
> 

I honestly don't remember what -mfpu=softvfp+vfp is without going to
look it up... you're talking about code that was written 11 years ago!
I suspect it dates to the time when we were starting to phase out
support for the old FPA instructions (if you don't remember those, think
yourself lucky :-); where softvfp meant to use the floating-point data
format that was used with the VFP; the +vfp was probably meant to imply
that vfp instructions could be used as well, but didn't change the ABI
(doesn't imply float-abi=hard) -- I would say the combination you
describe above is probably meaningless.

In the gcc world triplets are only used during configuration of the
compiler to set the various defaults, they never override something
given on the command line at run time.  It is possible to create
meaningless combinations of some options (thumb1 + hard-float ABI is
currently one that GCC can't handle).

R.


Re: MMU Off / Strict Alignment

2013-12-19 Thread Richard Earnshaw

On 18/12/13 05:06, Jonathan S. Shapiro wrote:
> At the risk of sticking my nose in, this isn't a startup code issue.
> It's a contract issue.
> 
> First, I don't buy Richard's argument about memcpy() startup costs and
> hard-to-predict branches. We do those tests on essentially every
> *other* RISC platform without complaint, and it's very easy to order
> those branches so that the currently efficient cases run well. Perhaps
> more to the point, I haven't seen anybody put forward quantitative
> data that using the MMU for unaligned references is any better than
> executing those branches. Speaking as a recovering processor
> architect, that assumption needs to be validated quantitatively. My
> guess is that the branches are faster if properly arranged.
> 
> Second, this is a contract issue. If newlib intends to support
> embedded platforms, then it needs to implement algorithms that are
> functionally correct without relying on an MMU. By all means use
> simpler or smarter algorithms when an MMU can be assumed to be
> available in a given configuration, but provide an algorithm that is
> functionally correct when no MMU is available. "Good overall
> performance in memcpy" is a fine thing, but it is subject to the
> requirement of meeting functional specifications. As Jochen Liedtke
> famously put it (read this in a heavy German accent): "Fast, ya. But
> correct? (shrug) Eh!"
> 
> So: we need a normative statement saying what the contract is. The
> rest of the answer will fall out from that.
> 
> I do agree with Richard that startup code is special. I've built
> deeply embedded runtimes of one form or another for 25 years now, and
> I have yet to see a system where optimizing a simplistic byte-wise
> memcpy during bootstrap would have made any difference in anything
> overall. That said, if the specification of memcpy requires it to
> handle incompatibly aligned pointers (and it does), and the contract
> for newlib requires it to operate in MMU-less scenarios in a given
> configuration (which, at least in some cases, it does), it's
> completely legitimate to expect that bootstrap code can call memcpy()
> and expect behavior that meets specifications.
> 
> So what's the contract?
>
I disagree with your assertion that newlib *requires* it to operate in
an MMU-less scenario for all targets; it only does so when the target
can reasonably be expected to not have an MMU.

The only contract that exists is the one written in the C standard:

7.23.2.1#2 The memcpy function copies n characters from the object
pointed to by s2 into the object pointed to by s1. If copying takes
place between objects that overlap, the behavior is undefined.

But that is written on the assumption that we're in a normal execution
environment, not in some special case.

What you're missing is that AArch64 is (in ARM ARM terms) an A-profile
only environment where an MMU is mandated in the system.  Furthermore,
processors implementing the architecture will *expect* that the MMU be
turned on as soon as possible after boot, since without this the caches
cannot be used and without those the performance will be truly horrible.
 Once the caches are enabled, it's perfectly reasonable to assume that
memcpy will only be used for copies to and from NORMAL memory, since
other types of memory have potential side effects, which means that use
of memcpy would be unsafe.

If you want to write an MMU-less memcpy, then feel free to write one;
but please install it with a different interface -- something like
__memcpy_nommu().  Don't penalise the standard case for the non-standard
exceptional one.
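
As a rough sketch of what such a separate routine could look like (not a proposal for newlib itself): the name memcpy_nommu below is hypothetical and drops the leading underscores, since those are reserved for the implementation. It copies strictly one byte at a time, so every access is naturally aligned. The volatile qualifiers are an assumption made here to discourage the compiler from merging the byte accesses into wider ones or turning the loop back into a call to memcpy.

```c
#include <stddef.h>

/* Byte-wise copy: slow, but never issues an access wider than one byte,
   so it cannot take an alignment fault on strongly-ordered or device
   memory.  Overlapping buffers are not supported, as with memcpy. */
void *memcpy_nommu(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

Startup code that runs before the MMU is enabled would call this routine explicitly, leaving the optimized memcpy untouched for the normal case.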

R.


Re: MMU Off / Strict Alignment

2013-12-17 Thread Richard Earnshaw

On 16/12/13 17:54, Christopher Covington wrote:
> Hi,
> 
> On 11/20/2013 03:45 PM, Matthew Gretton-Dann wrote:
>> On 20 November 2013 17:57, Christopher Covington  wrote:
>>> Hi,
>>>
>>> We've noticed an issue trying to use the Linaro AArch64 binary bare metal
>>> toolchain release with the MMU turned off for some low-level tests.
>>>
>>> Anytime puts, sprintf, etc. gets called, a reent structure gets created with
>>> references to STDIN, STDOUT, STDERR FILE types. A member in the __sFile
>>> struct, _mbstate, is an 8 byte struct, but is not aligned on an 8 byte
>>> boundary. This means that when memset (or a similar function) gets called on
>>> this struct, and doesn't operate one byte at a time, a data alignment fault
>>> will be generated when operating out of device memory, such as on a system
>>> where the MMU has not yet been turned on.
> 
> We believe we have narrowed down the issue to the AArch64 optimized
> memcpy/memset implementations that assume unaligned accesses will not fault.
> While the current AArch64 libgloss startup code turns the MMU on so such
> accesses will succeed, I don't think turning on the MMU should be required of
> all startup code. Would it be possible to modify these routines to make only
> size-aligned accesses without degrading performance? If a single
> implementation can't make everyone happy, should the ifdefs around them
> perhaps be expanded to include something about requiring the MMU to be on?
> 

Quite frankly, I doubt it.  Good overall performance in memcpy means
avoiding hard-to-predict branches (it's not unusual for the code to be
called with completely random copy sizes); removing the unaligned
accesses would mean many more compares and branches than are currently
required, each of which would carry a significant risk of an avoidable
branch mispredict.

Furthermore, completely unaligned copies would then need to be entirely
rewritten to use byte-shifting techniques; that would significantly
impact the overall performance.

My personal feeling is that startup code is really special.  If you need
to copy some memory during this time and the MMU has not been enabled,
then you can't assume that it's safe to call memcpy.

R.


Re: Overheating Pandas

2013-07-03 Thread Richard Earnshaw


On 03/07/13 17:41, Renato Golin wrote:

On 3 July 2013 17:22, Mans Rullgard <mans.rullg...@linaro.org> wrote:

I repeat, the 4460 will run at 1.2GHz indefinitely without thermal
management.


My mistake, I said 1.3GHz when it was actually 1.2GHz. So, at 1.2GHz, it
freezes every few hours on full load on both 4430 and 4460.

linaro@linaro-panda-01:~$ cat
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
120

Now what?



Keep lowering the clock limit (.../cpufreq/scaling_max_freq) until you
get stability.  If you don't, then it isn't a heating problem.


Remember that manufacturers match the form of packaging to the expected 
TDP of the intended usage environment (to keep product costs down).   In 
a mobile part that probably means relatively cheap plastic package 
because a hot chip would burn a hole in your pocket -- literally.  The 
package almost certainly doesn't have a high thermal conductivity from 
the chip to the external surface so while a heat sink might help, it 
won't be as effective as with other packaging options.


Chips expected to dissipate large amounts of power normally have a metal 
pad on the package so that a heat sink with thermal grease will make a 
good thermal contact.


R.




Re: [ACTIVITY] Week 12

2013-03-25 Thread Richard Earnshaw


On 25/03/13 06:00, Pinski, Andrew wrote:

Yes, I know what you want to do, but I think this is better done without adding a
new tree code.
You want to expand the following:
a = x < y
c = z ? a : b

Where a is only used in the assignment of c.

I think it is better to add a target-specific hook for doing expansions which
are complex and only affect one target.


Two (at least).


Then the patch becomes almost all target specific and no longer needs to add a
new tree code which is only good for the arm target.


Not necessarily.  It's often easier to lower something more specific
into general code when that feature isn't present than to optimize for
it by identifying specific patterns when it is.  The existing code
stands almost no chance of being able to deal with repeated conditional
comparison operations, it just gets too hairy.
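
For readers unfamiliar with the term, the following is only an illustration of the kind of source involved; the function name both_less is hypothetical. The second comparison is the "conditional" one: it only matters when the first comparison is true. On a target with a conditional compare instruction, the compiler could in principle evaluate it under the condition set by the first compare instead of branching around it (whether any given compiler does so for this exact function is an assumption).

```c
/* Returns 1 when both comparisons hold.  In short-circuit terms,
   "c < d" is evaluated only if "a < b" is true. */
int both_less(int a, int b, int c, int d)
{
    return a < b && c < d;
}
```

Chaining more terms (a < b && c < d && e < f ...) gives the repeated conditional comparisons mentioned above.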


R.



Thanks,
Andrew

From: Zhenqiang Chen [zhenqiang.c...@linaro.org]
Sent: Sunday, March 24, 2013 9:34 PM
To: Pinski, Andrew
Cc: linaro-toolchain
Subject: Re: [ACTIVITY] Week 12

On 25 March 2013 12:05, Pinski, Andrew  wrote:

* Investigate how to expand conditional compare GIMPLE to RTL and emit asm.


I think maybe we should start adding target specific expanders.  Then all you
need to do is take that expander and, when you get a COND_EXPR, look at the
TRE provided information, which can then expand the conditional compare without
adding a new tree code.  Note I think this should be discussed on the GCC list
directly anyway rather than on the Linaro forum because it is more likely to be
accepted if talked about there.


Thanks for the comments.

The "conditional compare" mentioned here is different from
COND_EXPR. If my understanding is correct,

COND_EXPR is for "c1? v1: v2"

"conditional compare" here is to represent the second operand of
short-circuit, e.g. TRUTH_ANDIF_EXPR. It is more like "c1? CMP (v1,
v2): c1"

It is just a CMP (GT, NE, etc)_EXPR if ignoring the "conditional" part.

Agree with you, we should discuss it on GCC list. But before that, I
want a prototype and estimate the efforts.

Thanks!
-Zhenqiang




Re: AArch64 asm statement question

2013-02-25 Thread Richard Earnshaw


On 22/02/13 09:54, Yvan Roux wrote:

Hi Richard,

thanks for the reminder; my previous example was just an attempt to
find the right asm statement constraints to generate a correct ldxp
instruction. My real objective is to implement 128-bit single-copy
atomic load/store and to do this I use ldxp without any matching stxp
for the atomic_load :

 __asm__ __volatile__(
   "   ldxp %0, %1, [%2]"
   : "=&r" (res.v1), "=&r" (res._v2)
   : "r" (addr)
  );

and a "fake" ldxp with the "real" stxp for the atomic_store:

 do {
   __asm__ __volatile__(
   "   ldxp  %0, %1, %3\n"
   "   stxp %w2, %4, %5, %3"
   : "=&r" (fake_val.v1), "=&r" (fake_val.v2), "=&r" (status), "+Q"
(*addr)
   : "r" (value.v1), "r" (value.v2)
   );
 } while (status);

do you think that it is the right way to do it ?



Sadly, no.  Only the STXP instruction is single-copy atomic.  To get 
atomicity on a read you need to repeat the sequence



    LDXP    Xn, Xm, [addr]
    STXP    Ws, Xn, Xm, [addr]

until the store succeeds.

However, this isn't needed on 32-bit LDXP sequences.
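
A minimal sketch of that read sequence as GCC inline asm, reusing the operand style from the atomic_store attempt quoted in this thread ("+Q" for the memory operand, %w for the 32-bit status register). The type pair128 and the function atomic_load_128 are hypothetical names. The location must be 16-byte aligned and writable, because the read is only confirmed by storing the same value back. The non-AArch64 branch is a plain, non-atomic copy that exists only so the sketch compiles on other hosts.

```c
#include <stdint.h>

typedef struct {
    _Alignas(16) uint64_t v1;   /* LDXP/STXP need 16-byte alignment */
    uint64_t v2;
} pair128;

pair128 atomic_load_128(pair128 *addr)
{
    pair128 res;
#if defined(__aarch64__)
    uint32_t status;

    /* Load the pair, then store the same pair back.  Only when the
       exclusive store succeeds (status == 0) do we know the two halves
       were read as one single-copy-atomic unit.  Both instructions stay
       in one asm block so no other memory access can come between them. */
    do {
        __asm__ __volatile__(
            "ldxp %0, %1, %3\n\t"
            "stxp %w2, %0, %1, %3"
            : "=&r" (res.v1), "=&r" (res.v2), "=&r" (status), "+Q" (*addr)
            :
            : "memory");
    } while (status);
#else
    /* Placeholder for non-AArch64 hosts: NOT atomic. */
    res = *addr;
#endif
    return res;
}
```

Note that this sketch adds no acquire/release ordering; if ordering is needed, LDAXP/STLXP or explicit barriers would have to be considered separately.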

R.


Thanks
Yvan

On 21 February 2013 19:31, Richard Earnshaw  wrote:

On 21/02/13 15:54, Yvan Roux wrote:


Hi,

in the example below I want to explicitly generate a "store exclusive
pair" instruction with an asm statement:

typedef struct {
  long unsigned int v1;
  long unsigned int v2;
} mtype;


int main () {
mtype val[2] ;
val[0].v1 = 1234;
val[0].v2 = 5678;
int status;

do {
  __asm__ __volatile__(
  "  stxp %0, %2, %3, %1"
  : "=&r" (status), "=Q" (val[1])
  : "r" (val[0].v1), "r" (val[0].v2)
  );
} while (status != 0);

if (val[1].v1 == 1234 && val[1].v2 == 5678)
  return 0;
return 1;
}

The generated assembly is:

.L7:
  ldr x0, [sp]
  ldr x1, [sp,8]
.L3:
  add x3, sp, 16
  stxp x2, x0, x1, [x3]
  cbnz w2, .L7

and the issue is that the assembler is not happy with the register x2
used to store the exclusive access status, it should be w2, but
looking at constraint.md it seems that there is no constraint to say
that we want the 32bit version of the register. Any idea ?



You may already be aware of this, but like AArch32, the architecture
restricts the use of load and store operations that are permitted between
LDXP and STXP, which essentially means that any ASM block that uses LDXP
must also contain the matching STXP that depends on it.  If you don't do
this the compiler may introduce random load/store operations (eg
spills/reloads) that will kill your exclusive access and make the code
unable to proceed.

R.









Re: AArch64 asm statement question

2013-02-21 Thread Richard Earnshaw


On 21/02/13 15:54, Yvan Roux wrote:

Hi,

in the example below I want to explicitly generate a "store exclusive
pair" instruction with an asm statement:

typedef struct {
 long unsigned int v1;
 long unsigned int v2;
} mtype;


int main () {
   mtype val[2] ;
   val[0].v1 = 1234;
   val[0].v2 = 5678;
   int status;

   do {
 __asm__ __volatile__(
 "  stxp %0, %2, %3, %1"
 : "=&r" (status), "=Q" (val[1])
 : "r" (val[0].v1), "r" (val[0].v2)
 );
   } while (status != 0);

   if (val[1].v1 == 1234 && val[1].v2 == 5678)
 return 0;
   return 1;
}

The generated assembly is:

.L7:
 ldr x0, [sp]
 ldr x1, [sp,8]
.L3:
 add x3, sp, 16
 stxp x2, x0, x1, [x3]
 cbnz w2, .L7

and the issue is that the assembler is not happy with the register x2
used to store the exclusive access status, it should be w2, but
looking at constraint.md it seems that there is no constraint to say
that we want the 32bit version of the register. Any idea ?



You may already be aware of this, but like AArch32, the architecture 
restricts the use of load and store operations that are permitted 
between LDXP and STXP, which essentially means that any ASM block that 
uses LDXP must also contain the matching STXP that depends on it.  If 
you don't do this the compiler may introduce random load/store 
operations (eg spills/reloads) that will kill your exclusive access and 
make the code unable to proceed.


R.




Re: aarch64 does not run

2013-02-13 Thread Richard Earnshaw

On 13/02/13 02:08, Wink Saville wrote:

Thanks, I installed ia32-lib and now it works, but what a poor error
message. Too bad it wouldn't at least tell you what file wasn't found.

I agree, but that's a feature of the OS, rather than the program you 
were trying to run.  The programs you installed never even started to 
run because the OS couldn't find the required libraries.
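
To find out which file was actually missing in a case like this, one can look at the interpreter path recorded in the executable's PT_INTERP program header. The following is a hedged sketch for Linux using the definitions from <elf.h>; the function name elf_interp is hypothetical, and the code assumes the file has the same byte order as the host (true for an x86 binary inspected on an x86_64 machine).

```c
#include <elf.h>      /* ELF structure definitions (Linux) */
#include <stdio.h>
#include <string.h>

/* Read n bytes at offset off; returns 0 on success. */
static int read_at(FILE *f, long off, void *buf, size_t n)
{
    return (fseek(f, off, SEEK_SET) == 0 && fread(buf, 1, n, f) == n) ? 0 : -1;
}

/* Copy the PT_INTERP path of the ELF file `path` into out.
   Returns 0 on success, 1 if there is no PT_INTERP segment (for
   example a static executable), -1 on read error or non-ELF input. */
int elf_interp(const char *path, char *out, size_t outlen)
{
    unsigned char ident[EI_NIDENT];
    unsigned long off = 0, len = 0;
    int found = 0, rc = -1;
    unsigned i;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    if (read_at(f, 0, ident, sizeof ident) || memcmp(ident, ELFMAG, SELFMAG))
        goto done;

    if (ident[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        Elf32_Phdr ph;
        if (read_at(f, 0, &eh, sizeof eh))
            goto done;
        for (i = 0; i < eh.e_phnum && !found; i++) {
            long pos = (long)(eh.e_phoff + (unsigned long)i * eh.e_phentsize);
            if (read_at(f, pos, &ph, sizeof ph))
                goto done;
            if (ph.p_type == PT_INTERP) {
                off = ph.p_offset; len = ph.p_filesz; found = 1;
            }
        }
    } else if (ident[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;
        Elf64_Phdr ph;
        if (read_at(f, 0, &eh, sizeof eh))
            goto done;
        for (i = 0; i < eh.e_phnum && !found; i++) {
            long pos = (long)(eh.e_phoff + (unsigned long)i * eh.e_phentsize);
            if (read_at(f, pos, &ph, sizeof ph))
                goto done;
            if (ph.p_type == PT_INTERP) {
                off = ph.p_offset; len = ph.p_filesz; found = 1;
            }
        }
    } else {
        goto done;
    }

    if (!found) { rc = 1; goto done; }
    if (len == 0 || len > outlen || read_at(f, (long)off, out, len))
        goto done;
    out[len - 1] = '\0';   /* the stored path is NUL-terminated */
    rc = 0;
done:
    fclose(f);
    return rc;
}
```

If the path this returns does not exist on the system, execve() fails with ENOENT even though the executable itself is present, which matches the strace output quoted below.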

R.

-- Wink

On Sun, Feb 10, 2013 at 3:23 AM, Mans Rullgard <mans.rullg...@linaro.org> wrote:

On 10 February 2013 04:36, Wink Saville <w...@saville.com> wrote:
 > I downloaded the aarch64 binaries to  a ubuntu machine:
 >
 >   wink@ssi-primary:~$ uname -a
 >   Linux ssi-primary 3.5.0-21-generic #32-Ubuntu SMP Tue Dec 11
18:51:59 UTC
 > 2012 x86_64 x86_64 x86_64 GNU/Linux
 >
 >
 > And when I try to run gcc-4.7.3:
 >
 >   wink@ssi-primary:~$ ls -al
 >

~/aarch64-toolchain/gcc-linaro-aarch64-linux-gnu-4.7+bzr115029-20121015+bzr2506_linux/bin/aarch64-linux-gnu-gcc-4.7.3
 >   -rwxr-xr-x 1 wink wink 553068 Oct 18 14:21
 >

/home/wink/aarch64-toolchain/gcc-linaro-aarch64-linux-gnu-4.7+bzr115029-20121015+bzr2506_linux/bin/aarch64-linux-gnu-gcc-4.7.3
 >
 > I get a file not found:
 >
 >   wink@ssi-primary:~$ strace
 >

/home/wink/aarch64-toolchain/gcc-linaro-aarch64-linux-gnu-4.7+bzr115029-20121015+bzr2506_linux/bin/aarch64-linux-gnu-gcc-4.7.3
 > -v
 >
 >

execve("/home/wink/aarch64-toolchain/gcc-linaro-aarch64-linux-gnu-4.7+bzr115029-20121015+bzr2506_linux/bin/aarch64-linux-gnu-gcc-4.7.3",
 > ["/home/wink/aarch64-toolchain/gcc"..., "-v"], [/* 19 vars */]) =
   -1
 > ENOENT (No such file or directory)

This error usually means the executable is requesting a non-existent
"interpreter" (dynamic loader).  You need to install the 32-bit compat
lib package.  I don't remember what it's called on ubuntu, probably
ia32-libs or similar.

--
Mans Rullgard / mru


Re: FPU error with arm-linux-gnueabihf-gcc 3.7.3

2012-12-12 Thread Richard Earnshaw


On 11/12/12 19:39, Matthew Gretton-Dann wrote:

On 10 December 2012 14:53, Zhangfei Gao  wrote:

We met build error when using arm-linux-gnueabihf-gcc 3.7.3 for Huawei
s40v200 kernel, which is 3.0.8, log is below.

HAVE issue:
gcc-linaro-arm-linux-gnueabihf-4.7-2012.11-20121123_linux.tar.bz2
gcc-linaro-arm-linux-gnueabihf-4.7-2012.10-20121022_linux.tar.bz2
gcc-linaro-arm-linux-gnueabi-2012.03-20120326_linux.tar.bz2, with build flag
--with-float=softfp


NO issue:
arm-eabi-gcc 4.4.0, which comes from android package.

Any suggestion?

Thanks

make[1]: `include/generated/mach-types.h' is up to date.
  CALL    scripts/checksyscalls.sh
  CHK     include/generated/compile.h
  AS      arch/arm/mach-godbox/hi_pm_sleep.o
arch/arm/mach-godbox/hi_pm_sleep.S: Assembler messages:
arch/arm/mach-godbox/hi_pm_sleep.S:456: Error: selected processor does not
support requested special purpose register -- `mrs r10,FPEXC'
arch/arm/mach-godbox/hi_pm_sleep.S:456: Error: selected processor does not
support requested special purpose register -- `msr FPEXC,r2'
arch/arm/mach-godbox/hi_pm_sleep.S:456: Error: selected processor does not
support requested special purpose register -- `mrs r2,FPSCR'
arch/arm/mach-godbox/hi_pm_sleep.S:546: Error: selected processor does not
support requested special purpose register -- `msr FPEXC,r2'
arch/arm/mach-godbox/hi_pm_sleep.S:546: Error: selected processor does not
support requested special purpose register -- `msr FPSCR,r10'
arch/arm/mach-godbox/hi_pm_sleep.S:546: Error: selected processor does not
support requested special purpose register -- `msr FPEXC,r9'
make[1]: *** [arch/arm/mach-godbox/hi_pm_sleep.o] Error 1
make: *** [arch/arm/mach-godbox] Error 2
make: *** Waiting for unfinished jobs


Can you show me the complete command line for assembling
arch/arm/mach-godbox/hi_pm_sleep.S into
arch/arm/mach-godbox/hi_pm_sleep.o please?  And also the output of gcc
-v?

My guess is that you are building GCC yourself and have specified
--with-float=softfp, but not specified the actual floating point
architecture with --with-fpu= on the GCC configure line.  See the
documentation for -mfpu= in the GCC manuals to see what values are
valid here (again guessing but you probably want --with-fpu=vfpv3 or
--with-fpu=neon - but I don't know what architecture you are actually
compiling for).



Hmm, building the kernel with floating-point enabled is probably a 
no-no!  The kernel has to preserve the user-space floating point context 
as it isn't saved on kernel entry.  IIRC, insns in the kernel that 
really must access the co-processors have to use generic instructions 
(MCR/MRC/LDC/STC, etc).
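
To make "generic instructions" concrete: a read of FPEXC into r10 can be written as the coprocessor move "mrc p10, 7, r10, cr8, cr0, 0", which the assembler accepts without any FPU being selected. The sketch below only computes the ARM-state encoding of such an MRC from its fields, so it runs on any host; the function name encode_mrc is invented for this illustration, and the coprocessor numbers for FPEXC and FPSCR are taken from the architecture manual rather than from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an ARM-state MRC (coprocessor to core register move), condition AL.
 * Field layout: cond[31:28] 1110[27:24] opc1[23:21] L=1[20] CRn[19:16]
 * Rt[15:12] coproc[11:8] opc2[7:5] 1[4] CRm[3:0]. */
uint32_t encode_mrc(unsigned coproc, unsigned opc1, unsigned rt,
                    unsigned crn, unsigned crm, unsigned opc2)
{
    assert(coproc < 16 && opc1 < 8 && rt < 16 &&
           crn < 16 && crm < 16 && opc2 < 8);
    /* 0xEE100010 = cond AL, the fixed 1110 pattern, L = 1 and bit 4 set. */
    return 0xEE100010u | (opc1 << 21) | (crn << 16) | (rt << 12) |
           (coproc << 8) | (opc2 << 5) | crm;
}
```

For example, encode_mrc(10, 7, 10, 8, 0, 0) gives 0xEEF8AA10, which should be the same word that the VFP mnemonic for reading FPEXC into r10 assembles to when an FPU is enabled.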


However, you might get a more useful answer if you talked to the kernel 
folk.


R.



Re: AND vs UXTB

2012-08-03 Thread Richard Earnshaw

On 03/08/12 13:49, Mans Rullgard wrote:
> I have noticed gcc has a preference for generating UXTB instructions
> when an AND with #255 would do the same thing.  This is bad, because
> on A9 UXTB has two cycles latency compared to one cycle for AND.  On
> A8 both instructions have one cycle latency.
> 

UXTB on the other hand is a 16-bit instruction, whereas AND is a 32-bit one.

Of the cores I'm aware of, only A9 has this performance anomaly.
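
For reference, the two source forms being compared can be sketched as below. Both compute the same value; which instruction a given compiler picks for either of them depends on its version, tuning and options, so the comments describe what may be emitted, not what will be. The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Written as a mask: a compiler may emit AND Rd, Rn, #255. */
uint32_t low_byte_mask(uint32_t x)
{
    return x & 255u;
}

/* Written as a narrowing conversion: a compiler may emit UXTB Rd, Rn. */
uint32_t low_byte_cast(uint32_t x)
{
    return (uint8_t)x;
}

/* The two forms are interchangeable as far as the result is concerned. */
void check_equivalent(uint32_t x)
{
    assert(low_byte_mask(x) == low_byte_cast(x));
}
```

Compiling both with -S for the target of interest shows which encoding was chosen in each case.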

R.



Re: Distinguishing SF/HF ABI binaries, take two

2012-08-03 Thread Richard Earnshaw

On 02/08/12 18:39, Mans Rullgard wrote:
> Nevertheless, the tags in the .ARM.attributes section are the standard,
> published way to identify FP ABI as well as a number of other properties
> that might be relevant to a linker.

1) The attributes are only visible in the section view (as used by linkable
object files).  You can't rely on that being present in an executable image.

2) The encoding has been arranged for density, not performance; it's
unsuitable for low-cost look-ups when searching a chain of libraries.

Attributes were never intended for use at run time; IMO it would be a
mistake to try and coerce them into such a role.
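
Point 1 can be checked mechanically: an image with no section header table has no way of locating .ARM.attributes. A minimal sketch, assuming a Linux host with <elf.h> and an ELF32 image in host byte order; the helper name is invented here, and the extended-numbering case (where e_shnum is 0 but sections exist) is deliberately ignored.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the ELF32 image advertises a section header table,
 * 0 if it has none, and -1 if the buffer is not an ELF32 image. */
int elf32_has_section_table(const unsigned char *image, size_t len)
{
    Elf32_Ehdr eh;

    assert(image != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;
    /* No table offset or no entries: only the segment view is available. */
    return eh.e_shoff != 0 && eh.e_shnum != 0;
}
```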

R.


Re: Fwd: [cbuild] gcc-4.8~svn189808 armv7l failed

2012-07-25 Thread Richard Earnshaw

On 25/07/12 05:16, Michael Hope wrote:
> FYI GCC trunk r189808 fails to build with a bootstrap comparison error:
> 
> Comparing stages 2 and 3
> warning: gcc/cc1-checksum.o differs
> warning: gcc/cc1plus-checksum.o differs
> warning: gcc/cc1obj-checksum.o differs
> warning: gcc/cc1objplus-checksum.o differs
> Bootstrap comparison failure!
> arm-linux-gnueabi/libgcc/unwind-arm.o differs
> arm-linux-gnueabi/libgcc/unwind-arm_s.o differs
> 
> 189575 was fine on hard float.  189745 is fine on softfp.
> 

189792 is fine for softfp as well.

R.

> -- Michael
> 
> -- Forwarded message --
> From: Linaro Toolchain Builder 
> Date: 25 July 2012 15:59
> Subject: [cbuild] gcc-4.8~svn189808 armv7l failed
> To: "michael.hope+not...@linaro.org" 
> 
> 
> ursa3 finished running job gcc-4.8~svn189808 on
> armv7l-precise-cbuild348-ursa3-cortexa9hfr1.
> 
> The results are here:
>  http://builds.linaro.org/toolchain/gcc-4.8~svn189808
> 
> 
> 
> This email is sent from a cbuild (https://launchpad.net/cbuild) based
> bot which is administered by Michael Hope .
> 






Re: Armhf dynamic linker path

2012-05-02 Thread Richard Earnshaw

On 02/05/12 13:25, Mans Rullgard wrote:
> On 2 May 2012 05:15, Michael Hope  wrote:
>> On 27 April 2012 11:59, Michael Hope  wrote:
>>> On 23 April 2012 14:23, Jon Masters  wrote:
>>>> On 04/22/2012 06:06 PM, Michael Hope wrote:
>>>>> On 21 April 2012 09:10, Jon Masters  wrote:
>>>>>> Hey everyone,
>>>>>>
>>>>>> Following up here. Where do we stand? We need to have upstream patches
>>>>>> before we can pull them into the distro - is that piece done?
>>>>>
>>>>> Hi Jon.  I've been away, sorry.  I've just sent the GCC patch and
>>>>> Carlos is on the hook for the GLIBC side.
>>>>
>>>> I saw the email. Could folks do me a favor and let me know the moment
>>>> this lands in upstream and I'll arrange for us to pull it immediately.
>>>>
>>>> (I'm on all the libc lists, but then I'm on almost every list,
>>>> everywhere, so it takes a bit of time to get to it)
>>>
>>> Hi Jon.  There's a fault with the GCC patch so it's still in progress.
>>>  Carlos sent the GLIBC patch out for review today.
>>
>> Hi Jon.  The GCC patch is now upstream as r186859 and r187012.
> 
> I noticed that it now sets the dynamic loader to /lib/ld-linux-armhf.so.3
> even when configured for soft-float ABI and linking against a soft-float
> rootfs.  The resulting binaries then fail to run.  Passing -mfloat-abi=softfp
> to the link command fixes it.  Is this change in behaviour intentional?
> 

Eh?  Exactly what command line did you invoke the compiler with, and
what was the configuration?  Are you sure you don't have a compiler
configured to hard-float by default?

R.



Re: Update on stack (re-)alignment issues

2012-04-19 Thread Richard Earnshaw

On 18/04/12 18:36, Ulrich Weigand wrote:
> 
> Hello,
> 
> I've been following up on the discussion we had on Monday regarding stack
> alignment, and noticed that I had mis-remembered the current state of
> affairs.  Ramana asked me on Tuesday to provide a write-up of the actual
> status, so here we go ...
> 
> 
> To summarize the background of the problem:  on ARM, the incoming stack
> pointer is only guaranteed to be aligned to an 8 byte boundary.  This means
> that objects on the stack (local variables, spill slots, temporaries etc.)
> cannot easily be aligned to more than 8 bytes.  This can potentially cause
> problems in two situations:
> 
> 1) The object's default alignment (according to its type) is larger than 8
> bytes
> 2) The object has a forced non-default alignment that is larger than 8
> bytes
> 
> The first situation should in theory never appear, since according to the
> ARM ABI all types have a default alignment of at most 8 bytes.   However,
> due to the current mix-up in GCC, vector types actually are considered to
> have a 16-byte alignment requirement in GCC.
> 

> The second situation can only appear with local variables that are declared
> using __attribute__ ((aligned)).
> 
> 
> We had discussed on Monday that we need to fix the second situation, since
> this can always occur and is supported on other platforms.   By doing so,
> we would then automatically fix the first situation as well.
> 
> However, this reasoning turns out to be incorrect.  There are currently in
> GCC *two* completely separate mechanisms that can be used to align objects
> on the stack to larger than the ABI guaranteed stack pointer alignment:
> 
> A) Re-alignment of the full stack frame.  This is what is used by the Intel
> back-end (and only the Intel back-end).  At function entry, generated code
> will align the stack pointer itself to whatever is necessary to fulfil
> alignment requirements of all objects on the stack.  This may necessitate
> follow-on changes: the frame pointer, if there is one, will likewise need
> to be aligned at runtime.  Also, since incoming stack arguments are now no
> longer at a fixed offset relative to the stack pointer *or* frame pointer
> in some cases, we might need an extra register as argument pointer.  This
> method allows extra alignment for *any* object on the stack, but needs
> significant back-end support in order to be enabled on any non-Intel
> architecture.
> 
> B) Dynamic allocation of selected stack variables.  This is implemented by
> common code with no involvement of the back-end.  In effect, the code in
> cfgexpand.c:expand_stack_vars that decides on how to allocate local
> variables on the stack will remove all variables that require extra
> alignment and place them into an extra structure.  Generated prologue code
> will then in effect dynamically allocate and align that structure on the
> stack, and just store a pointer to it as "variable" into the normal stack
> frame.  All other areas of the frame are unaffected.  Since this method
> just simulates code the programmer could have written themselves using
> alloca, it does not require *any* back-end support and is enabled by
> default everywhere.  However, it only works for regular local variables,
> and not for any other objects on the stack.

I read the C11 standard briefly a few months back, and I believe that B)
is all that is needed there.  The standard excludes over-aligning
function arguments.
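
Method B is described above as simulating what a programmer could write by hand with alloca. A rough sketch of that hand-written form follows; it assumes a C library that provides <alloca.h> (glibc does), and the function name and the choice of a 16-byte alignment are illustrative only, not taken from GCC's implementation.

```c
#include <alloca.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sum four ints after copying them into a 16-byte-aligned scratch area that
 * is carved out of the stack by hand.  *was_aligned reports whether the
 * rounded-up address really is a multiple of 16. */
int sum4_via_aligned_scratch(const int32_t in[4], int *was_aligned)
{
    const size_t align = 16;
    unsigned char *raw;
    uintptr_t addr;
    int32_t *scratch;

    assert(in != NULL && was_aligned != NULL);

    /* Over-allocate, then round the address up to the next multiple of 16;
     * the stack pointer itself only needs its usual ABI alignment. */
    raw = alloca(4 * sizeof(int32_t) + align - 1);
    addr = ((uintptr_t)raw + align - 1) & ~(uintptr_t)(align - 1);
    scratch = (int32_t *)addr;

    memcpy(scratch, in, 4 * sizeof(int32_t));
    *was_aligned = (addr % align) == 0;
    return scratch[0] + scratch[1] + scratch[2] + scratch[3];
}
```

The cost is up to align - 1 wasted bytes per over-aligned object, which is the trade-off for not re-aligning the whole frame.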

> 
> Objects on the stack *except* local variables always use default alignment.
> Since on most platforms, except Intel and *currently* ARM, the ABI stack
> pointer alignment is sufficient to implement default alignments, method B)
> as above is able to fulfil all stack alignments.   Intel uses method A), so
> they're also OK.   In effect, it's only ARM due to the vector type
> alignment problem that runs into the situation that neither method works.
> 
> 
> Under those circumstances, given that:
> - we want to fix vector type alignment in order to become ABI compliant
> - once we've fixed this, we're in the same situation as other platforms and
> method B) already fixes stack alignment problems
> - implementing method A) is therefore both quite involved *and* actually
> superfluous
> 
> I'd now rather recommend that we *don't* try to implement method A)  (full
> stack-frame re-alignment) on ARM.
> 
> Comments?
> 

Yes, sounds like the right solution to me.

Technically, GCC's vector mechanism allows the creation of any size of
vector, which will be aligned to the size of the vector.  We only run
into problems when that size exceeds the maximum alignment.  Such values
passed by value to functions should also be over-aligned.  I think if we
were to continue supporting such non-standard types we would have to
change the rules to pass them by reference and have caller copying.
We'd still need to deal with the 16-byte vectors somehow though.

So overall, I think the only practical solution is to limit vectors to
8-byte

Re: getting armv7 linker to emit armv4t thumb interworking

2012-04-13 Thread Richard Earnshaw

On 13/04/12 12:05, Richard Earnshaw wrote:
> On 13/04/12 11:58, Mans Rullgard wrote:
>> On 13 April 2012 11:47, Richard Earnshaw  wrote:
>>> On 12/04/12 20:10, Allen Martin wrote:
>>>> I have a cross toolchain I configured with "--with-arch=armv7-a 
>>>> --with-cpu=cortex-a9 --with-tune=cortex-a9" and I want the linker to emit 
>>>> armv4t compatible thumb interworking, but I can't seem to get it to.
>>>>
>>>> I noticed that if I create a armv4t toolchain with "--with-arch=armv4t 
>>>> --with-cpu=arm7tdmi --with-tune=arm7tdmi" and then I pass "--use-blx" to 
>>>> the linker it will emit armv7 thumb interworking.  There doesn't seem to 
>>>> be any inverse "--no-use-blx" type switch though.  Is this a 
>>>> bug/limitation of the linker or am I misunderstanding something?
>>>>
>>>
>>> it's all in the friendly manual :-)
>>>
>>> The option you need is --fix-v4bx.
>>
>> That option is for supporting pre-thumb cores, which is not necessary
>> here.
> 
> Oops, misread the question.  Sorry.
> 
> To get v4t style interworking you need to ensure all your objects are
> built for v4t.  The use of blx should only occur if the linker detects
> an object file in the input list that already contains support for blx
> (for example, because it was compiled with v5 or later).
> 
> R.
> 
> 

I've just built an assembler/linker with the options you mention.

Compiling the following testcase:

	.text
	.arm
	.cpu arm7tdmi
	.global _start
	.type _start, %function
_start:
	bl	bar
	bx	lr
	.thumb
	.global bar
	.type bar, %function
bar:
	bx	lr

Results in:

.../as /tmp/asm.s -o /tmp/asm.o
.../ld -o /tmp/asm /tmp/asm.o

.../objdump -d /tmp/asm

/tmp/asm: file format elf32-littlearm


Disassembly of section .text:

00008000 <_start>:
    8000:   eb000002    bl      8010 <__bar_from_arm>
    8004:   e12fff1e    bx      lr

00008008 <bar>:
    8008:   4770        bx      lr
    800a:   46c0        nop                     ; (mov r8, r8)
    800c:   0000        movs    r0, r0
    ...

00008010 <__bar_from_arm>:
    8010:   e59fc000    ldr     ip, [pc]        ; 8018 <__bar_from_arm+0x8>
    8014:   e12fff1c    bx      ip
    8018:   00008009    .word   0x00008009
    801c:   00000000    .word   0x00000000

Change the .cpu directive to
.cpu cortex-a9
and you get

objdump -d /tmp/asm

/tmp/asm: file format elf32-littlearm


Disassembly of section .text:

00008000 <_start>:
    8000:   fa000000    blx     8008 <bar>
    8004:   e12fff1e    bx      lr

00008008 <bar>:
    8008:   4770        bx      lr
    800a:   bf00        nop
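
The branch words in the two disassemblies can be cross-checked by encoding them by hand. This is only a sketch of the ARM-state immediate BL and BLX encodings with condition AL (BL) or the unconditional BLX form; the function names are made up, and no range checking is done beyond the alignment assert.

```c
#include <assert.h>
#include <stdint.h>

/* ARM-state BL <label>, condition AL: 0xEB000000 | imm24.  The offset is
 * relative to the address of the instruction plus 8. */
uint32_t encode_arm_bl(uint32_t insn_addr, uint32_t target)
{
    int32_t offset = (int32_t)(target - (insn_addr + 8));

    assert((offset & 3) == 0);          /* ARM targets are word aligned */
    return 0xEB000000u | (((uint32_t)offset >> 2) & 0x00FFFFFFu);
}

/* ARM-state BLX <label> (always switches to Thumb):
 * 0xFA000000 | H << 24 | imm24, where H carries bit 1 of the offset
 * because Thumb targets are only halfword aligned. */
uint32_t encode_arm_blx(uint32_t insn_addr, uint32_t target)
{
    uint32_t t = target & ~1u;          /* drop the Thumb bit, if present */
    int32_t offset = (int32_t)(t - (insn_addr + 8));
    uint32_t h = ((uint32_t)offset >> 1) & 1u;

    return 0xFA000000u | (h << 24) | (((uint32_t)offset >> 2) & 0x00FFFFFFu);
}
```

With the addresses from the listings, the call from 0x8000 to the veneer at 0x8010 is a BL with a word offset of 2, and the direct call from 0x8000 to bar at 0x8008 is a BLX with an offset of 0.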


So exactly how are you calling the linker?



Re: [PATCH] ld: add switch to disable use of BLX instructions

2012-04-13 Thread Richard Earnshaw

> +target2_type, fix_v4bx, use_blx, no_use_blx,
>  vfp11_denorm_fix, no_enum_size_warning,
>  no_wchar_size_warning,
>  pic_veneer, fix_cortex_a8,
> @@ -533,6 +534,7 @@ PARSE_AND_LIST_PROLOGUE='
>  #define OPTION_NO_MERGE_EXIDX_ENTRIES   316
>  #define OPTION_FIX_ARM1176   317
>  #define OPTION_NO_FIX_ARM1176318
> +#define OPTION_NO_USE_BLX319
>  '
>
>  PARSE_AND_LIST_SHORTOPTS=p
> @@ -547,6 +549,7 @@ PARSE_AND_LIST_LONGOPTS='
>{ "fix-v4bx", no_argument, NULL, OPTION_FIX_V4BX},
>{ "fix-v4bx-interworking", no_argument, NULL, 
> OPTION_FIX_V4BX_INTERWORKING},
>{ "use-blx", no_argument, NULL, OPTION_USE_BLX},
> +  { "no-use-blx", no_argument, NULL, OPTION_NO_USE_BLX},
>{ "vfp11-denorm-fix", required_argument, NULL, OPTION_VFP11_DENORM_FIX},
>{ "no-enum-size-warning", no_argument, NULL, OPTION_NO_ENUM_SIZE_WARNING},
>{ "pic-veneer", no_argument, NULL, OPTION_PIC_VENEER},
> @@ -567,7 +570,7 @@ PARSE_AND_LIST_OPTIONS='
>fprintf (file, _("  --target2=Specify definition of 
> R_ARM_TARGET2\n"));
>fprintf (file, _("  --fix-v4bx  Rewrite BX rn as MOV pc, 
> rn for ARMv4\n"));
>fprintf (file, _("  --fix-v4bx-interworking Rewrite BX rn branch to 
> ARMv4 interworking veneer\n"));
> -  fprintf (file, _("  --use-blx   Enable use of BLX 
> instructions\n"));
> +  fprintf (file, _("  --[no-]use-blx  Disable/enable use of BLX 
> instructions\n"));
>fprintf (file, _("  --vfp11-denorm-fix  Specify how to fix VFP11 
> denorm erratum\n"));
>fprintf (file, _("  --no-enum-size-warning  Don'\''t warn about 
> objects with incompatible\n"
>  "enum sizes\n"));
> @@ -625,6 +628,10 @@ PARSE_AND_LIST_ARGS_CASES='
>use_blx = 1;
>break;
>
> +case OPTION_NO_USE_BLX:
> +  no_use_blx = 1;
> +  break;
> +
>  case OPTION_VFP11_DENORM_FIX:
>if (strcmp (optarg, "none") == 0)
>  vfp11_denorm_fix = BFD_ARM_VFP11_FIX_NONE;


--
Richard Earnshaw Email: richard.earns...@arm.com
Engineering Manager  Phone: +44 1223 400569 (Direct + VoiceMail)
OpenSource Tools Switchboard: +44 1223 400400
ARM Ltd  Fax: +44 1223 400410
110 Fulbourn Rd  Web: http://www.arm.com/
Cambridge, UK. CB1 9NJ

-- IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium.  Thank you.



Re: getting armv7 linker to emit armv4t thumb interworking

2012-04-13 Thread Richard Earnshaw

On 13/04/12 11:58, Mans Rullgard wrote:
> On 13 April 2012 11:47, Richard Earnshaw  wrote:
>> On 12/04/12 20:10, Allen Martin wrote:
>>> I have a cross toolchain I configured with "--with-arch=armv7-a 
>>> --with-cpu=cortex-a9 --with-tune=cortex-a9" and I want the linker to emit 
>>> armv4t compatible thumb interworking, but I can't seem to get it to.
>>>
>>> I noticed that if I create a armv4t toolchain with "--with-arch=armv4t 
>>> --with-cpu=arm7tdmi --with-tune=arm7tdmi" and then I pass "--use-blx" to 
>>> the linker it will emit armv7 thumb interworking.  There doesn't seem to be 
>>> any inverse "--no-use-blx" type switch though.  Is this a bug/limitation of 
>>> the linker or am I misunderstanding something?
>>>
>>
>> it's all in the friendly manual :-)
>>
>> The option you need is --fix-v4bx.
>
> That option is for supporting pre-thumb cores, which is not necessary
> here.

Oops, misread the question.  Sorry.

To get v4t style interworking you need to ensure all your objects are
built for v4t.  The use of blx should only occur if the linker detects
an object file in the input list that already contains support for blx
(for example, because it was compiled with v5 or later).

R.



Re: getting armv7 linker to emit armv4t thumb interworking

2012-04-13 Thread Richard Earnshaw

On 12/04/12 20:10, Allen Martin wrote:
> I have a cross toolchain I configured with "--with-arch=armv7-a 
> --with-cpu=cortex-a9 --with-tune=cortex-a9" and I want the linker to emit 
> armv4t compatible thumb interworking, but I can't seem to get it to.
>
> I noticed that if I create a armv4t toolchain with "--with-arch=armv4t 
> --with-cpu=arm7tdmi --with-tune=arm7tdmi" and then I pass "--use-blx" to the 
> linker it will emit armv7 thumb interworking.  There doesn't seem to be any 
> inverse "--no-use-blx" type switch though.  Is this a bug/limitation of the 
> linker or am I misunderstanding something?
>

it's all in the friendly manual :-)

The option you need is --fix-v4bx.

BTW, taking a v7a toolchain and trying to use it to build v4 binaries is
likely to be problematic.  For it to work you'll need to ensure that
*all* your libraries are built to support back-conversion to v4,
including those that are normally built as part of the toolchain
(libgcc, etc).

R.


Re: Phone call (was Re: Armhf dynamic linker path)

2012-04-13 Thread Richard Earnshaw

On 12/04/12 19:29, Dennis Gilmore wrote:
> 
> off topic but i find aarch64 weird and too generic is it arm alpha amd
> atom.  
> 

That's only 'cos it's new.  It's no different from names like ia64.

R.



Re: New conference call numbers

2012-03-05 Thread Richard Earnshaw

Number is back up, but no host...



On 5 Mar 2012, at 09:38, "Richard Earnshaw"  wrote:

> Call went silent on me.  Redialing is giving number unobtainable :-(
>
>
>
> On 4 Mar 2012, at 22:35, "Michael Hope"  wrote:
>
>> On Mon, Mar 5, 2012 at 11:31 AM, Peter Maydell  
>> wrote:
>>> On 4 March 2012 22:21, Michael Hope  wrote:
>>>> The new conference call numbers have arrived.  Let's use them for today's 
>>>> call.
>>>>
>>>> The toll free numbers are:
>>>>
>>>> * Australia:  1 800 804 786
>>>> * China:  +400 120 05 90
>>>> * Germany:  01801 003 899
>>>> * New Zealand:  0800 452 947
>>>> * Sweden:  0200 125 588
>>>> * UK:  0845 351 2782
>>>
>>> This isn't toll-free : UK 0845 numbers are traditionally "local rate".
>>> Your wiki page correctly lists this as "lo-call".
>>>
>>>> * USA:  1-866-398-2885
>>>>
>>>> Alternates, including cheaper (for Linaro) local numbers are:
>>>>
>>>> * International toll free - Germany:  0800 588 9170
>>>> * International toll free - UK:  0800 358 6385
>>>
>>> ...and indeed this is the UK toll free number.
>>
>> Ta.  Use whichever is affordable and best.
>>
>> -- Michael
>>
>> ___
>> linaro-toolchain mailing list
>> linaro-toolchain@lists.linaro.org
>> http://lists.linaro.org/mailman/listinfo/linaro-toolchain
>>


Re: New conference call numbers

2012-03-05 Thread Richard Earnshaw

Call went silent on me.  Redialing is giving number unobtainable :-(



On 4 Mar 2012, at 22:35, "Michael Hope"  wrote:

> On Mon, Mar 5, 2012 at 11:31 AM, Peter Maydell  
> wrote:
>> On 4 March 2012 22:21, Michael Hope  wrote:
>>> The new conference call numbers have arrived.  Let's use them for today's 
>>> call.
>>>
>>> The toll free numbers are:
>>>
>>>  * Australia:  1 800 804 786
>>>  * China:  +400 120 05 90
>>>  * Germany:  01801 003 899
>>>  * New Zealand:  0800 452 947
>>>  * Sweden:  0200 125 588
>>>  * UK:  0845 351 2782
>>
>> This isn't toll-free : UK 0845 numbers are traditionally "local rate".
>> Your wiki page correctly lists this as "lo-call".
>>
>>>  * USA:  1-866-398-2885
>>>
>>> Alternates, including cheaper (for Linaro) local numbers are:
>>>
>>>  * International toll free - Germany:  0800 588 9170
>>>  * International toll free - UK:  0800 358 6385
>>
>> ...and indeed this is the UK toll free number.
>
> Ta.  Use whichever is affordable and best.
>
> -- Michael
>
> ___
> linaro-toolchain mailing list
> linaro-toolchain@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-toolchain
>


Re: gcc: Thumb interworking and weakly linked functions

2012-02-23 Thread Richard Earnshaw

On 23/02/12 10:27, Aneesh V wrote:
> Ok. Agree. I never used to use %function when I wrote assembly
> functions earlier. I am sure a lot of code will break if this was
> enforced.

If you've not used %function on ARM, then your code is semantically
broken even if it isn't syntactically broken.

The ABI rules for dealing with interworking and out-of-range
branches all rely on %function being used correctly.

R.


Re: Operations on intrinsic types

2012-01-06 Thread Richard Earnshaw

On 06/01/12 02:17, Michael Hope wrote:
> Hi Ramana.  You were right about being able to do operations on
> intrinsic types.  Instead of doing the admittedly made up:
>
> int16x4_t foo2(int16x4_t a, int16x4_t b)
> {
>   int16x4_t ca = vdup_n_s16(0.2126*256);
>   int16x4_t cb = vdup_n_s16(0.7152*256);
>
>   return vadd_s16(vmul_s16(ca, a), vmul_s16(cb, b));
> }
>
> you can do:
>
> int16x4_t foo3(int16x4_t a, int16x4_t b)
> {
>   int16x4_t ca = vdup_n_s16(0.2126*256);
>   int16x4_t cb = vdup_n_s16(0.7152*256);
>
>   return ca*a + cb*b;
> }
>
> which is more readable and, as an added bonus, generates the
> multiply-and-accumulate that I missed when using intrinsics.  Nice.

This is a GCC extension.  It's not portable, and in particular it's not
supported by ARM's own compiler.

There are also difficulties if you start doing operations directly when
it comes to dealing with big-endian as there is a degree of divergence
between GCC's own interpretation of vectors and the intrinsic view;
mixing and matching can lead to subtle problems with lane numbering.
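
For anyone wanting to try the operator form on a machine without NEON, the same extension works on GCC's generic vector types. The sketch below mirrors foo3 using a vector_size typedef instead of int16x4_t, so it is not the intrinsic type discussed above and says nothing about lane numbering; the names v4hi and weighted_sum4 are made up, and the constants are the truncated values of 0.2126*256 and 0.7152*256.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Generic GCC vector type: four 16-bit lanes, 8 bytes in total. */
typedef int16_t v4hi __attribute__((vector_size(8)));

/* out[i] = 54 * a[i] + 183 * b[i], using operators on whole vectors. */
void weighted_sum4(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    v4hi va, vb, ca, cb, r;

    assert(a != NULL && b != NULL && out != NULL);
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    ca = (v4hi){54, 54, 54, 54};        /* 0.2126 * 256, truncated */
    cb = (v4hi){183, 183, 183, 183};    /* 0.7152 * 256, truncated */
    r = ca * va + cb * vb;              /* element-wise, GCC extension */
    memcpy(out, &r, sizeof r);
}
```

Passing the data through plain arrays keeps the vector type out of the function's interface, which sidesteps questions about how such types are passed on a given ABI.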



Re: How to fail an ARMv5T u-boot build when libgcc is ARMv7T2?

2011-11-10 Thread Richard Earnshaw


On 10 Nov 2011, at 23:47, "Loïc Minier"  wrote:

> On Thu, Nov 10, 2011, Richard Earnshaw wrote:
>> You can't rely on finding it.  Executable images are only required to
>> have segment headers and the attributes system uses section headers.
>> Section headers can be removed when an image gets stripped.
>
> but if I have it during the u-boot build, it's reliable?

I've no idea how uboot works in detail.  I'm describing the global rules for 
elf images.
>
> If not, can I check the architecture for which libgcc was built
> programmatically?

If it's a static library, then you could extract one of the object files and 
look at its attributes.  If it's a shared lib, then the same restrictions would 
apply as to an executable.

R.
>
> --
> Loïc Minier
>


Re: How to fail an ARMv5T u-boot build when libgcc is ARMv7T2?

2011-11-10 Thread Richard Earnshaw

You can't rely on finding it.  Executable images are only required to have 
segment headers and the attributes system uses section headers.  Section 
headers can be removed when an image gets stripped.



On 10 Nov 2011, at 16:46, "Loïc Minier" <loic.min...@linaro.org> wrote:

> On Thu, Nov 10, 2011, Richard Earnshaw wrote:
>> The build attributes were never intended to be used in executables; the
>> format is too expensive to decode in most situations.  Instead the ABI
>> specification included an optional PHEADER that had a highly simplified
>> indication of the executable compatibility of an image (essentially the
>> architecture).  Unfortunately this has never been added to GNU Binutils.
>>
>> See PT_ARM_ARCHEXT in the ARM ELF specification (available from infocenter).
>
> As a poor man's test, would it be reliable to test Tag_CPU_name on the
> resulting u-boot ELF binary at the end of the build?  If I read you
> correctly, you seem to suggest that it would be too fragile?
>
> --
> Loïc Minier



Re: How to fail an ARMv5T u-boot build when libgcc is ARMv7T2?

2011-11-10 Thread Richard Earnshaw

ch/arm/cpu/arm926ejs/versatile/libversatile.o arch/arm/lib/libarm.o 
common/libcommon.o disk/libdisk.o drivers/bios_emulator/libatibiosemu.o 
drivers/block/libblock.o drivers/dma/libdma.o drivers/fpga/libfpga.o 
drivers/gpio/libgpio.o drivers/hwmon/libhwmon.o drivers/i2c/libi2c.o 
drivers/input/libinput.o drivers/misc/libmisc.o drivers/mmc/libmmc.o 
drivers/mtd/libmtd.o drivers/mtd/nand/libnand.o 
drivers/mtd/onenand/libonenand.o drivers/mtd/spi/libspi_flash.o 
drivers/mtd/ubi/libubi.o drivers/net/libnet.o drivers/net/phy/libphy.o 
drivers/pci/libpci.o drivers/pcmcia/libpcmcia.o drivers/power/libpower.o 
drivers/rtc/librtc.o drivers/serial/libserial.o drivers/spi/libspi.o 
drivers/twserial/libtws.o drivers/usb/eth/libusb_eth.o 
drivers/usb/gadget/libusb_gadget.o drivers/usb/host/libusb_host.o 
drivers/usb/musb/libusb_musb.o drivers/usb/phy/libusb_phy.o 
drivers/video/libvideo.o drivers/watchdog/libwatchdog.o fs/cramfs/libcramfs.o 
fs/ext2/libext2fs.o fs/fat/libfat.o fs/fdos/libfdos.o fs/jffs2/libjffs2.o 
fs/reiserfs/libreiserfs.o fs/ubifs/libubifs.o fs/yaffs2/libyaffs2.o 
lib/libfdt/libfdt.o lib/libgeneric.o lib/lzma/liblzma.o lib/lzo/liblzo.o 
lib/zlib/libz.o net/libnet.o post/libpost.o
 board/armltd/versatile/libversatile.o --end-group 
/home/lool/git/denx/u-boot/obj-v-broken/arch/arm/lib/eabi_compat.o  -L 
/usr/lib/gcc/arm-linux-gnueabi/4.6.1 -lgcc -Map u-boot.map -o u-boot
>
>  I verified that all the .o files passed above have Tag_CPU_name:
>  "5TE" in their arm-linux-gnueabi-readelf -A output; the only
>  problematic file is -lgcc.
>
>  Note that the final link is done with arm-linux-gnueabi-ld and doesn't
>  set any architecture; I changed it manually to use gcc and pass the
>  -marm -march=armv5te, and had to set -nostdlib too when using gcc:
> arm-linux-gnueabi-gcc -marm -march=armv5te -nostdlib -pie -T 
> /home/lool/git/denx/u-boot/obj-v-broken/u-boot.lds -Bstatic -Ttext 0x1 
> $UNDEF_SYM arch/arm/cpu/arm926ejs/start.o -Wl,--start-group api/libapi.o 
> arch/arm/cpu/arm926ejs/libarm926ejs.o 
> arch/arm/cpu/arm926ejs/versatile/libversatile.o arch/arm/lib/libarm.o 
> common/libcommon.o disk/libdisk.o drivers/bios_emulator/libatibiosemu.o 
> drivers/block/libblock.o drivers/dma/libdma.o drivers/fpga/libfpga.o 
> drivers/gpio/libgpio.o drivers/hwmon/libhwmon.o drivers/i2c/libi2c.o 
> drivers/input/libinput.o drivers/misc/libmisc.o drivers/mmc/libmmc.o 
> drivers/mtd/libmtd.o drivers/mtd/nand/libnand.o 
> drivers/mtd/onenand/libonenand.o drivers/mtd/spi/libspi_flash.o 
> drivers/mtd/ubi/libubi.o drivers/net/libnet.o drivers/net/phy/libphy.o 
> drivers/pci/libpci.o drivers/pcmcia/libpcmcia.o drivers/power/libpower.o 
> drivers/rtc/librtc.o drivers/serial/libserial.o drivers/spi/libspi.o 
> drivers/twserial/libtws.o drivers/usb/eth/libusb_eth.o 
> drivers/usb/gadget/libusb_gadget.o drivers/usb/host/libusb_host.o 
> drivers/usb/musb/libusb_musb.o drivers/usb/phy/libusb_phy.o 
> drivers/video/libvideo.o drivers/watchdog/libwatchdog.o fs/cramfs/libcramfs.o 
> fs/ext2/libext2fs.o fs/fat/libfat.o fs/fdos/libfdos.o fs/jffs2/libjffs2.o 
> fs/reiserfs/libreiserfs.o fs/ubifs/libubifs.o fs/yaffs2/libyaffs2.o 
> lib/libfdt/libfdt.o lib/libgeneric.o lib/lzma/liblzma.o lib/lzo/liblzo.o 
> lib/zlib/libz.o net/libnet.o post/libpost.o 
> board/armltd/versatile/libversatile.o -Wl,--end-group 
> /home/lool/git/denx/u-boot/obj-v-broken/arch/arm/lib/eabi_compat.o  -L 
> /usr/lib/gcc/arm-linux-gnueabi/4.6.1 -lgcc -Wl,-Map u-boot.map -o u-boot
>
>  But this command works and produces an u-boot ELF which has
>  Tag_CPU_name: "7-A".
>

The build attributes were never intended to be used in executables; the
format is too expensive to decode in most situations.  Instead the ABI
specification included an optional PHEADER that had a highly simplified
indication of the executable compatibility of an image (essentially the
architecture).  Unfortunately this has never been added to GNU Binutils.

See PT_ARM_ARCHEXT in the ARM ELF specification (available from infocenter).

R.
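
For the question quoted in this message, one stopgap is to check the build attributes of every link input (libgcc included) before the final link, rather than inspecting the linked image. The following is only a sketch under assumptions: it presumes the text printed by `arm-linux-gnueabi-readelf -A` has already been captured into a string, that it contains a line of the form `Tag_CPU_name: "5TE"` as quoted in this thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Copy the value of the first Tag_CPU_name entry found in `attrs`
   (captured readelf -A text) into `out`.
   Returns 1 on success, 0 if the tag is missing or does not fit. */
int cpu_name_from_attrs(const char *attrs, char *out, size_t out_len)
{
    static const char key[] = "Tag_CPU_name: \"";
    const char *start, *end;

    assert(attrs != NULL && out != NULL);

    start = strstr(attrs, key);
    if (start == NULL)
        return 0;
    start += sizeof key - 1;          /* skip past the opening quote */

    end = strchr(start, '"');         /* closing quote of the value */
    if (end == NULL || (size_t)(end - start) >= out_len)
        return 0;

    memcpy(out, start, (size_t)(end - start));
    out[end - start] = '\0';
    return 1;
}

/* Returns 1 only when the attributes name exactly the expected CPU. */
int attrs_match_cpu(const char *attrs, const char *expected)
{
    char name[32];

    return cpu_name_from_attrs(attrs, name, sizeof name)
           && strcmp(name, expected) == 0;
}
```

A build script could run this over the attributes of each object and stop when `attrs_match_cpu(text, "5TE")` returns 0. Only the first tag in the text is examined, so for an archive the output would have to be split per member first.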

>  How would I break the build when libgcc isn't ARMv5T?
>
>Thanks,


--
Richard Earnshaw Email: richard.earns...@arm.com
Engineering Manager  Phone: +44 1223 400569 (Direct + VoiceMail)
OpenSource Tools Switchboard: +44 1223 400400
ARM Ltd  Fax: +44 1223 400410
110 Fulbourn Rd  Web: http://www.arm.com/
Cambridge, UK. CB1 9NJ


Re: eglibc and fun with config.sub

2011-09-16 Thread Richard Earnshaw

On 16/09/11 15:12, Ulrich Weigand wrote:
> Richard Sandiford  wrote:
>> David Gilbert  writes:
>>> My current patch:
>>>   * adds armv6 and armv7 to config.sub
>>>   * adds arm/eabi/armv7 and arm/eabi/armv6t2   and one assembler
>>> routine in there.
>>>   * If $machine is just 'arm' then it autodetects from gcc's #defines
>>>   * else if $machine is armv then that's still $machine
>>
>> I'm taking you literally here, but I think you want things like
>> armeb-linux-gnueabi to be treated like arm-linux-gnueabi.
>>
>> TBH, I think unconditionally using the autodetect (but setting $machine
>> rather than $submachine, as you say) would be easier and more consistent
>> across packages.  gcc and eglibc will then agree on the target, whereas
>> the extra complication in the current scheme is there simply to make
>> eglibc and gcc disagree in certain cases.  But I realise we might not
>> want to fight that fight.
> 
> FWIW I'd tend to agree that encoding the architecture level into the
> target triplet seems to lead to more confusion than it helps ...
> 

Agreed.  Especially as

1) It's incomplete -- doesn't cover FPU/Neon capabilities
2) It's generally mixed up with other things -- like endianness.  This
makes it all the more confusing
3) It's too ad-hoc -- sometimes it's an architecture, sometimes it's a
CPU.  This leads to all sorts of weird and wonderful names appearing in
the config string which have to be understood.

> We certainly never did that on s390 (or powerpc, for that matter);
> instead, the way to select an architecture level is to build your
> system compiler to default to that level, and then build all your
> system libraries with that compiler; the libraries will detect
> the desired architecture level from GCC defines.

I think it comes from the i[345]86 mess.  I suspect it has origins in
Red Hat Linux where the config triplet is built directly from the 'cpu'
reported by the kernel uname -m component.

> 
> On the other hand, given that arm has already gone down the road of
> using the target triplet, I guess I can see why it might make sense
> to continue that.  In the end, that's for the platform maintainers
> to decide ...
> 

I'd prefer that we retrenched if possible.  Let's not make the rat-hole
even bigger.

R.
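
The approach described in this thread, where a library detects the architecture level from GCC's defines instead of from the target triplet, can be sketched roughly as follows. The function name is hypothetical; the macros tested are ones GCC predefines for ARM targets, and the list of levels is deliberately incomplete.

```c
/* Report the ARM architecture level the compiler is targeting,
   or 0 when the code is not being compiled for ARM at all. */
int target_arm_arch_level(void)
{
#if defined(__ARM_ARCH)
    /* Newer compilers expose the level directly as a number. */
    return __ARM_ARCH;
#elif defined(__ARM_ARCH_7A__)
    return 7;
#elif defined(__ARM_ARCH_6T2__)
    return 6;
#elif defined(__ARM_ARCH_5TE__)
    return 5;
#else
    return 0;
#endif
}
```

The same preprocessor tests could equally be evaluated at configure time to pick a source directory, so that the compiler's default and the library's choice cannot disagree.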

> 
> Mit freundlichen Gruessen / Best Regards
> 
> Ulrich Weigand
> 
> --
>   Dr. Ulrich Weigand | Phone: +49-7031/16-3727
>   STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
>   IBM Deutschland Research & Development GmbH
>   Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
> Wittkopp
>   Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
> Stuttgart, HRB 243294
> 
> 

Re: A question about disabling -gtoggle in bootstrap run

2011-03-04 Thread Richard Earnshaw


On Fri, 2011-03-04 at 15:33 +0200, Revital1 Eres wrote:
> Hello,
> 
> I am looking for a way to disable '-gtoggle' flag in the run of stage 2 in
> bootstrap; when
> configuring ARM with (*).
> The flag seems to be applied in stage 2 but not in stage 3 which seems to
> cause bootstrap failure when
> testing SMS as in stage 2 SMS fails because of  debug_insn caused
> by -gtoggle disturbing do-loop; while in stage 3 SMS succeeds; resulting
> in different .o files and bootstrap failure.
> 
> (*) This is the configure I used:
> ../gcc/configure --prefix=/home/eres/mainline/build --enable-checking
> --enable-languages=c --enable-bootstrap

By definition that's regarded as a bug in the compiler.  Code generated
with debug enabled is required to be identical to code generated without
it.

R.




Re: Performance Test Results using gcc-linaro-4.5-2010.11-1

2010-12-09 Thread Richard Earnshaw

On Thu, 2010-12-09 at 13:31 +1300, Michael Hope wrote:
> On Wed, Dec 8, 2010 at 7:10 PM, Michael Hope  wrote:
> -fno-common also gives a small improvement in various benchmarks, but
> may break some programs.

Any breakage with -fno-common would be detected at link time, so if your
program compiles successfully it should run successfully.  

The breakage is that some non-strictly conforming programs that declare
variables tentatively with, for example:

int x;

in multiple translation units will get multiple-definition errors at
link time.

R.
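
A program that trips over this can usually be fixed by keeping a single definition and declaring the variable `extern` everywhere else. The sketch below shows that pattern in one block for brevity; the file names in the comments and the helper `bump_x` are hypothetical.

```c
/* shared.h (hypothetical): declaration only, safe to include from
   every translation unit. */
extern int x;

/* Exactly one .c file: the single definition of x. */
int x = 0;

/* Any other .c file: uses x through the extern declaration and
   does not repeat "int x;". */
int bump_x(void)
{
    return ++x;
}
```

With this layout the program links the same way with or without -fno-common, because no translation unit relies on tentative definitions being merged.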


Re: Using inline NEON code

2010-12-03 Thread Richard Earnshaw

On Fri, 2010-12-03 at 10:49 +1300, Michael Hope wrote:
> Hi there.  Currently you can't use NEON instructions in inline
> assembly if the compiler is set to -mfpu=vfp such as Ubuntu's
> -mfpu=vfpv3-d16.  Trying code like this:
> 
> int main()
> {
>     asm("veor d1, d2, d3");
>     return 0;
> }
> 
> gives an error message like:
> 
> test.s: Assembler messages:
> test.s:29: Error: selected processor does not support Thumb mode `veor 
> d1,d2,d3'
> 
> The problem is that -mfpu=vfpv3-d16 has two jobs:  it tells the
> compiler what instructions to use, and also tells the assembler what
> instructions are valid.  We might want the compiler to use the VFP for
> compatibility or power reasons, but still be able to use NEON
> instructions in inline assembler without passing extra flags.

Sorry, I disagree with that distinction.

> 
> Inserting ".fpu neon" to the start of the inline assembly fixes the
> problem.  Is this valid?  

Not really.  It changes the global setting for the rest of the file and
changes the global attributes on your object.

> Are assembly files with multiple .fpu
> statements allowed?  Passing '-Wa,-mfpu=neon' to GCC doesn't work as
> gas seems to ignore the second -mfpu.
> 
> What's the best way to handle this? Some options are:
>  * Add '.fpu neon' directives to the start of any inline assembly

Nope, .fpu neon is used in building the attributes data.  The attribute
architecture in the ABI supports attributes on sections or symbols, it
does not support them on arbitrary snippets of code.  The point of
attributes is to reason about compatibility and users' intentions.  You
can't do that if the attributes data is messed up.

Currently BFD/GAS/GOLD cannot generate/interpret attributes on
sub-elements of object files; they can only work on file-scope
attributes.  That's a bug, but it's not really relevant to this
discussion.

>  * Separate out the features, so you can specify the capabilities with
> one option and restrict the compiler to a subset with another.
> Something like '-mfpu=neon -mfpu-tune=vfpv3-d16'

That'll just confuse users, IMO.  Also, see below

>  * Relax the assembler so that any instructions are accepted.  We'd
> lose some checking of GCC's output though.

Trust me, you just don't want to do that (see above).

You've missed the right answer though: make the compiler smarter.  The
compiler should be able to work out when it's beneficial to use neon and
when it's not: adding hooks to try and defeat a poor choice by the
compiler sounds like bad design to me.

R.
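
One way to avoid `.fpu` directives in inline assembly altogether, offered here as an assumption rather than something proposed in the thread, is to write the operation with NEON intrinsics guarded by the compiler's predefined macro. The NEON path is then only compiled when the translation unit is actually built for NEON, and the object's attributes stay truthful. The function name `xor64` is hypothetical.

```c
#include <stdint.h>

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* XOR two 64-bit values, using a NEON D-register operation when the
   compiler is targeting NEON and plain C otherwise. */
uint64_t xor64(uint64_t a, uint64_t b)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    uint64x1_t va = vcreate_u64(a);
    uint64x1_t vb = vcreate_u64(b);
    return vget_lane_u64(veor_u64(va, vb), 0);
#else
    return a ^ b;
#endif
}
```

A file that must use NEON regardless of the default can be compiled on its own with -mfpu=neon, which keeps the attribute change confined to that object file.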


Re: Help on merging two patches

2010-11-05 Thread Richard Earnshaw


On Wed, 2010-11-03 at 17:39 +0800, Yao Qi wrote:
> Hi,
> I am backporting some patches from FSF mainline, which may improve Linaro 
> 4.5 gcc on thumb2 speed.
> 
> The first one is done by Richard E. "Improve optimization to transform 
> TST into LSLS"
> http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02518.html
> After it was applied to the Linaro 4.5 tree, the EEMBC speed numbers got 
> worse, while code size was reduced to some extent.  The code difference 
> looks like this:
> 
>   6801        ldr     r1, [r0, #0]
>   f831 3013   ldrh.w  r3, [r1, r3, lsl #1]
> - f413 6f00   tst.w   r3, #2048   ; 0x800
> - f43f af41   beq.w   cc 
> + 0518        lsls    r0, r3, #20
> + f57f af44   bpl.w   cc 
>   4610        mov     r0, r2
> 
> After reading the Cortex-A8 TRM, I can't find the exact cycle timing of lsls. 
> With Chung-Lin's help, we suspect that lsls is slower than tst, but we 
> don't have any evidence to prove it.  If anyone is familiar with the ARM 
> microarchitecture, help is welcome.  If our assumption is correct, we could 
> restrict this patch to an optimization for size only.
> 
> The second patch is Bernd's "Fix an if statement in arm_rtx_costs_1"
> http://gcc.gnu.org/ml/gcc-patches/2010-07/msg02096.html
> After this patch was applied, the EEMBC benchmark numbers did not change.  
> Shall we merge this patch into the Linaro 4.5 tree?  I am inclined to merge 
> it, but if you have concerns about this patch, let us discuss them here.

So I have no reason to expect lsls to ever take longer to execute than
tst.  I suspect what you are seeing here is some unfortunate side effect
that can't be explained from the small code snippet.  An example would
include BTAC aliasing, but there could be other reasons for this
happening.

So overall, I'd expect the change to be a Good Thing (tm), but there's
always the chance that individual blocks of code may run more slowly.

R.
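
For readers following the quoted diff, the two sequences test the same bit: `tst r3, #2048` followed by `beq` branches when bit 11 is clear, while `lsls r0, r3, #20` moves bit 11 into bit 31 (the N flag) so that `bpl` branches on the same condition. The C model below is only an illustration of that equivalence; the function names are made up.

```c
#include <stdint.h>

/* tst r3, #2048 ; beq target
   The branch is taken when bit 11 of r3 is clear. */
int taken_tst_beq(uint32_t r3)
{
    return (r3 & 0x800u) == 0;
}

/* lsls r0, r3, #20 ; bpl target
   Shifting left by 20 puts bit 11 in bit 31, which sets the N flag.
   The branch is taken when the result is non-negative (N clear). */
int taken_lsls_bpl(uint32_t r3)
{
    uint32_t shifted = r3 << 20;
    return (shifted & 0x80000000u) == 0;
}
```

The shift form has a 16-bit encoding in Thumb-2, which is where the code-size saving in the diff comes from; the price is that it needs a scratch destination register (r0 here).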



Re: Host strip corrupts cross-built binaries

2010-09-29 Thread Richard Earnshaw

On Wed, 2010-09-22 at 08:36 -0700, Mark Mitchell wrote:
> On 9/22/2010 8:34 AM, Loïc Minier wrote:
> 
> >  Which component is to blame here?  Are we looking at a binutils or a
> >  gcc bug for not being able to set or read enough data that the
> >  architecture mismatch isn't detected?  What could we do about it?
> 
> This is definitely a binutils bug.

So I agree that it's a bug in binutils that this produces incorrect
output.  

I don't think it's necessarily a bug that a generic ELF strip program is
unable to strip all ARM ELF binary files.  This is particularly true for
unlinked object files that can contain additional sections that refer to
the symbol section.

R.


Re: Linaro GCC 4.4 and 4.5 2010.09 released

2010-09-20 Thread Richard Earnshaw


On Mon, 2010-09-20 at 09:14 +0100, Dave Martin wrote:
> On Fri, Sep 17, 2010 at 11:31 PM, Loïc Minier  wrote:
> > On Wed, Sep 15, 2010, Michael Hope wrote:
> >> GLIBC has a mechanism for picking the best routines to use based on the
> >> CPU capabilities.  This means that GLIBC can include A8 and A9
> >> versions both with and without NEON, Ubuntu can ship all of these
> >> versions, and the dynamic linker can choose the best one based on the
> >> chip it is running on.
> >
> >  Actually I understand STT_GNU_IFUNC would allow that, we just lack a
> >  good test
> 
> Is STT_GNU_IFUNC implemented yet?
> 

No.  We need to sort out the ABI specs first.

R.
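
Until STT_GNU_IFUNC is available, the kind of selection discussed in this thread can be approximated in plain C with a function pointer chosen once at start-up. This is a minimal sketch under assumptions: `resolve_copy`, the `has_neon` flag and both copy routines are hypothetical, and the "NEON" variant is only a stand-in that calls memcpy.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline routine, valid on every CPU. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Placeholder for a NEON-tuned routine; a real one would be built
   separately with NEON enabled. */
static void *copy_neon_placeholder(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick an implementation from a capability flag that the caller has
   obtained elsewhere (for example from the kernel's hwcap data). */
copy_fn resolve_copy(int has_neon)
{
    return has_neon ? copy_neon_placeholder : copy_generic;
}
```

An IFUNC symbol would let the dynamic linker run such a resolver itself and bind the symbol directly, avoiding the extra indirection at each call.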



Re: Thumb2 code size improvements

2010-09-17 Thread Richard Earnshaw


On Fri, 2010-09-17 at 18:21 +0800, Yao Qi wrote:
> Michael Hope wrote:
> > It's only part of the puzzle, but I run speed benchmarks as part of
> > the continuous build:
> >  http://ex.seabright.co.nz/helpers/buildlog
> >  http://ex.seabright.co.nz/helpers/benchcompare
> >  
> > http://ex.seabright.co.nz/build/gcc-linaro-4.5-2010.09-1/logs/armv7l-maverick-cbuild4-pavo4/pybench-test.txt
> > 
> > I've just modified this to build different variants as well.  ffmpeg
> > now builds as supplied (-O2 and others), with -Os, with hand-written
> > assembler turned off, and with -mfpu=neon.  corebench builds in -O2
> > and -Os.
> 
> Here are some options we may have to use in our benchmarks,
> {-Os,-O2} -fno-common -mthumb -mfloat-abi={hard,soft} -mfpu=neon
> 
> IIRC, hardfp will increase the code size to some extent.

hard-float should show a significant code size saving over pure
soft-float for anything with floating point code as the compiler will be
able to use single instructions for many operations rather than library
calls.

R.
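
A small function makes the size difference easy to see when compiled both ways and disassembled. The function name is hypothetical, and the comment describes the typical outcome rather than a guaranteed instruction sequence.

```c
/* With an FPU enabled, the compiler can typically emit one multiply
   and one add instruction here.  With pure soft-float, each operation
   becomes a call to a library helper such as __aeabi_fmul and
   __aeabi_fadd, plus the register shuffling around those calls. */
float scale_and_offset(float a, float b)
{
    return a * b + a;
}
```

Comparing the output of `size` on the two object files for code like this is a quick way to check the claim on a particular toolchain.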



Re: Thumb2 code size improvements

2010-09-10 Thread Richard Earnshaw


On Tue, 2010-09-07 at 12:24 +0100, Julian Brown wrote:
> On Tue, 7 Sep 2010 12:55:59 +0200
> Loïc Minier  wrote:
> 
> > On Tue, Sep 07, 2010, Julian Brown wrote:
> > >   Do
> > > you still have the code fragment handy (I don't remember exactly how
> > > it went)?
> > 
> >  You can extract it from the wiki history with the "Info" action on
> > the page and then diffing revisions:
> 
> Oh right, I should have realised that :-).
> 
> > 1. stmdb/ldmia registers that are not used
> >  * Observations
> > {{{
> > Dump of assembler code for function history_expand_line_internal:
> >0x1c1c <+0>: stmdb sp!, {r4, r5, r6, r7, r8, lr}
> 
> This could be:
> 
>   push {r3, r4, r5, r6, r7, lr}
> 
> >0x1c20 <+4>: movs r1, #0
> > >0x1c22 <+6>: ldr r5, [pc, #52] ; (0x1c58 )
> > >0x1c24 <+8>: mov r2, r1
> >0x1c26 <+10>: mov r6, r0
> >0x1c28 <+12>: ldr r7, [r5, #0]
> >0x1c2a <+14>: str r1, [r5, #0]
> >0x1c2c <+16>: bl 0x1c2c 
> >0x1c30 <+20>: str r7, [r5, #0]
> >0x1c32 <+22>: cmp r0, r6
> >0x1c34 <+24>: mov r4, r0
> >0x1c36 <+26>: bne.n 0x1c52 
> >0x1c38 <+28>: bl 0x1c38 
> > >0x1c3c <+32>: ldr r1, [pc, #28] ; (0x1c5c )
> > >0x1c3e <+34>: movw r2, #1850 ; 0x73a
> > >0x1c42 <+38>: adds r0, #1
> >0x1c44 <+40>: bl 0x1c44 
> >0x1c48 <+44>: mov r1, r4
> >0x1c4a <+46>: ldmia.w sp!, {r4, r5, r6, r7, r8, lr}
> 
> This must remain a wide instruction...
> 
>   ldmia.w sp!, {r3, r4, r5, r6, r7, lr}
> 
> >0x1c4e <+50>: b.w 0x1c4e 
> >0x1c52 <+54>: ldmia.w sp!, {r4, r5, r6, r7, r8, pc}
> 
> But this could be:
> 
>   pop {r3, r4, r5, r6, r7, pc}
> 
> >0x1c56 <+58>: nop
> >0x1c58 <+60>: andeq r0, r0, r0
> >0x1c5c <+64>: andeq r0, r0, r0
> > }}}
> > Register r8 is not used in this function, so no need to save/restore
> > r8.
> >  * Possible improvements
> 
> So yeah, I think there is indeed a possible improvement here (and we
> don't even need to break the EABI, I don't think). Unless I've
> overlooked something, anyway...
> 

GCC 4.5 should already do this:

2009-06-02  Richard Earnshaw  

* arm.c (arm_get_frame_offsets): Prefer using r3 for padding a
push/pop multiple to 8-byte alignment.

R.
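
The idea behind that change can be modelled as follows: when an odd number of 4-byte registers would be pushed, one more register is added to keep the stack 8-byte aligned, and choosing r3 rather than r8 keeps the list within what the 16-bit Thumb PUSH can encode (r0-r7 plus lr). This is a simplified sketch, not the logic in arm.c; the names are hypothetical, and a real compiler also has to confirm that overwriting r3 on the way out is safe.

```c
#include <stdint.h>

#define REG(n) (1u << (n))
#define REG_LR 14

/* Count the registers named in a push/pop mask. */
static int reg_count(uint32_t mask)
{
    int n = 0;
    while (mask) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}

/* Pad an odd-sized register list to an even count so the stack stays
   8-byte aligned, preferring r3 as the padding register.  If r3 is
   already in the list, this sketch leaves the mask unchanged. */
uint32_t pad_push_mask(uint32_t mask)
{
    if (reg_count(mask) % 2 == 0)
        return mask;
    if ((mask & REG(3)) == 0)
        return mask | REG(3);
    return mask;
}

/* The 16-bit Thumb PUSH can only name r0-r7 and lr. */
int fits_narrow_push(uint32_t mask)
{
    return (mask & ~(0xffu | REG(REG_LR))) == 0;
}
```

For the function in the quoted dump, the registers actually needed are r4-r7 and lr; padding with r3 gives `push {r3, r4, r5, r6, r7, lr}`, whereas padding with r8 forces the 32-bit `stmdb` form.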



Re: Thumb2 code size improvements

2010-09-10 Thread Richard Earnshaw


On Tue, 2010-09-07 at 13:09 +0100, Andrew Stubbs wrote:
> On 07/09/10 13:01, Yao Qi wrote:
> >> * Investigate reduced alignment constraints?
> >
> > Any details on this?
> 
> No, I just know that some targets like to align functions to 
> cache-lines. This is a useful speed optimization, but does lead to lots 
> of "blank" gaps in the code. I have no real idea if ARM does this kind 
> of thing, or if the ABI has anything to say about it.
> 
> I just suggest that we should check it out - or at least ask an ARM 
> expert if I'm talking nonsense. :)

I'm pretty certain we don't do this gratuitously with -Os on ARM.  We
may, however, align some thumb functions to a 32-bit boundary
unnecessarily (still needed if there's a literal pool).

R.


