Re: [Linaro-TCWG-CI] gcc-15-3607-g9a94c8ffdc8b: FAIL: 23 regressions: 22 improvements on master-thumb_m23_soft_eabi
On 24/09/2024 22:20, Maxim Kuvyrkov wrote: On Sep 25, 2024, at 05:13, Richard Earnshaw (lists) wrote: On 21/09/2024 08:49, ci_not...@linaro.org wrote: Dear contributor, our automatic CI has detected problems related to your patch(es). Please find some details below. If you have any questions, please follow up on linaro-toolchain@lists.linaro.org mailing list, Libera's #linaro-tcwg channel, or ping your favourite Linaro toolchain developer on the usual project channel. We understand that it might be difficult to find the necessary logs or reproduce the issue locally. If you can't get what you need from our CI within minutes, let us know and we will be happy to help. We track this report status in https://linaro.atlassian.net/browse/GNU-1349 , please let us know if you are looking at the problem and/or when you have a fix. In arm-eabi cortex-m23 soft after: | commit gcc-15-3607-g9a94c8ffdc8b | Author: Richard Earnshaw | Date: Thu Sep 12 14:24:55 2024 +0100 | | arm: testsuite: make use of -mcpu=unset/-march=unset | | This patch makes use of the new ability to unset the CPU or | architecture flags on the command line to enable several more tests on | Arm. It doesn't cover every case and it does enable some tests that | now fail for different reasons when the tests are no-longer skipped; | these were failing anyway for other testsuite configurations, so it's | ... 22 lines of the commit log omitted. FAIL: 23 regressions: 22 improvements regressions.sum: === gcc tests === Running gcc:gcc.target/arm/arm.exp ... FAIL: gcc.target/arm/scd42-2.c scan-assembler mov[ \t].*272 Running gcc:gcc.target/arm/cmse/cmse.exp ... FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 ... and 19 more entries improvements.sum: === gcc tests === Running gcc:gcc.target/arm/cmse/cmse.exp ... FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 ... and 16 more entries I can't make any sense of this at all. After hours wasted trying to find the configuration information from the logs (it's there, but to the inexperienced user of your reports, it is buried far too deep), I'm still none-the-wiser. Hi Richard, Thanks for looking into this. Do send us a quick email if you can't immideatelly find what you are looking for. As our email says "We understand that it might be difficult to find the necessary logs or reproduce the issue locally. If you can't get what you need from our CI within minutes, let us know and we will be happy to help." Regarding adding configure information to our reports -- we are working on it. Great. Where do I find the dejagnu target-list information for a run? ie the site.exp file (or whatever the DEJAGNU environment variable points at). All I can see is that things like FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 have changed to FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 (ie that -mcpu=unset has been added to the test name). That's not a regression, it's a simple FAIL->FAIL
Re: [Linaro-TCWG-CI] gcc-15-3607-g9a94c8ffdc8b: FAIL: 23 regressions: 22 improvements on master-thumb_m23_soft_eabi
On 21/09/2024 08:49, ci_not...@linaro.org wrote: Dear contributor, our automatic CI has detected problems related to your patch(es). Please find some details below. If you have any questions, please follow up on linaro-toolchain@lists.linaro.org mailing list, Libera's #linaro-tcwg channel, or ping your favourite Linaro toolchain developer on the usual project channel. We understand that it might be difficult to find the necessary logs or reproduce the issue locally. If you can't get what you need from our CI within minutes, let us know and we will be happy to help. We track this report status in https://linaro.atlassian.net/browse/GNU-1349 , please let us know if you are looking at the problem and/or when you have a fix. In arm-eabi cortex-m23 soft after: | commit gcc-15-3607-g9a94c8ffdc8b | Author: Richard Earnshaw | Date: Thu Sep 12 14:24:55 2024 +0100 | | arm: testsuite: make use of -mcpu=unset/-march=unset | | This patch makes use of the new ability to unset the CPU or | architecture flags on the command line to enable several more tests on | Arm. It doesn't cover every case and it does enable some tests that | now fail for different reasons when the tests are no-longer skipped; | these were failing anyway for other testsuite configurations, so it's | ... 22 lines of the commit log omitted. FAIL: 23 regressions: 22 improvements regressions.sum: === gcc tests === Running gcc:gcc.target/arm/arm.exp ... FAIL: gcc.target/arm/scd42-2.c scan-assembler mov[ \t].*272 Running gcc:gcc.target/arm/cmse/cmse.exp ... FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 ... and 19 more entries improvements.sum: === gcc tests === Running gcc:gcc.target/arm/cmse/cmse.exp ... FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsls\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-6.c -march=armv8.1-m.main+fp -mthumb -O3 -g scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 ... and 16 more entries I can't make any sense of this at all. After hours wasted trying to find the configuration information from the logs (it's there, but to the inexperienced user of your reports, it is buried far too deep), I'm still none-the-wiser. All I can see is that things like FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 have changed to FAIL: gcc.target/arm/cmse/mainline/8_1m/bitfield-4.c -mcpu=unset -march=armv8.1-m.main+fp -mthumb -O2 scan-assembler lsrs\t(r[3-9]|r10|fp|ip), \\1, #1.*blxns\t\\1 (ie that -mcpu=unset has been added to the test name). That's not a regression, it's a simple FAIL->FAIL R. You can find the failure logs in *.log.1.xz files in - https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m23_soft_eabi-build/144/artifact/artifacts/00-sumfiles/ The full lists of regressions and improvements as well as configure and make commands are in - https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m23_soft_eabi-build/144/artifact/artifacts/notify/ The list of [ignored] baseline and flaky failures are in - https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m23_soft_eabi-build/144/artifact/artifacts/sumfiles/xfails.xfail The configuration of this build is: CI config tcwg_gnu_embed_check_gcc arm-eab
Re: [Linaro-TCWG-CI] gcc-14-8887-gd9459129ea8: FAIL: 29 regressions on master-thumb_m33_eabi
I think all of these actually fall under "I suspect there are still some further issues to address here, since the framework does not correctly test that the multilibs and startup code enable alternative format; but this is still an improvement over what we had before." All the failures are execution test failures due to the fact that we don't check the available hardware/multilibs for running the test; so blindly adding options and then running the test is incorrect. But we currently lack such a test in the framework. It's also less than clear exactly what these tests are checking and which part of what they are checking that really requires the options they add. I suspect that they previously passed only by accident (they didn't really add enough flags to enable what they author thought they were checking). R. On 10/02/2024 02:43, ci_not...@linaro.org wrote: Dear contributor, our automatic CI has detected problems related to your patch(es). Please find some details below. If you have any questions, please follow up on linaro-toolchain@lists.linaro.org mailing list, Libera's #linaro-tcwg channel, or ping your favourite Linaro toolchain developer on the usual project channel. We appreciate that it might be difficult to find the necessary logs or reproduce the issue locally. If you can't get what you need from our CI within minutes, let us know and we will be happy to help. We track this report status in https://linaro.atlassian.net/browse/GNU-1149 <https://linaro.atlassian.net/browse/GNU-1149> , please let us know if you are looking at the problem and/or when you have a fix. In arm-eabi cortex-m33 hard after: | commit gcc-14-8887-gd9459129ea8 | Author: Richard Earnshaw | Date: Mon Feb 5 17:16:45 2024 + | | arm: testsuite: fix issues relating to fp16 alternative testing | | The v*_fp16_xN_1.c tests on Arm have been unstable since they were | added. This is not a problem with the tests themselves, or even the | patches that were added, but with the testsuite infrastructure. It | turned out that another set of dg- tests for fp16 were corrupting the | cached set of options used by the new tests, leading to running the | ... 45 lines of the commit log omitted. FAIL: 29 regressions regressions.sum: === g++ tests === Running g++:g++.dg/dg.exp ... FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=c++14 execution test FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=c++17 execution test FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=c++20 execution test FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-3.C -std=c++98 execution test FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C -std=gnu++14 execution test FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C -std=gnu++17 execution test FAIL: g++.dg/ext/arm-fp16/arm-fp16-ops-4.C -std=gnu++20 execution test ... and 26 more entries You can find the failure logs in *.log.1.xz files in - https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts/00-sumfiles/ <https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts/00-sumfiles/> The full lists of regressions and progressions as well as configure and make commands are in - https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts/notify/ <https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts/notify/> The list of [ignored] baseline and flaky failures are in - https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts/sumfiles/xfails.xfail <https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts/sumfiles/xfails.xfail> The configuration of this build is: CI config tcwg_gnu_embed_check_gcc arm-eabi -mthumb -march=armv8-m.main+dsp+fp -mtune=cortex-m33 -mfloat-abi=hard -mfpu=auto -8<--8<--8<-- The information below can be used to reproduce a debug environment: Current build : https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts <https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/363/artifact/artifacts> Reference build : https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/362/artifact/artifacts <https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m33_eabi-build/362/artifact/artifacts> Reproduce last good and first bad builds: https://git-us.linaro.org/toolchain/ci/interesting-commits.git/plain/gcc/sha1/d9459129ea8f8c3cbd6150b90e842decba7952a3/tcwg_gnu_embed_check_gcc/master-thumb_m33_eabi/reproduction_instructions.tx
Re: [PATCH v2] ARM: Block predication on atomics [PR111235]
On 02/10/2023 18:12, Wilco Dijkstra wrote: Hi Ramana, I used --target=arm-none-linux-gnueabihf --host=arm-none-linux-gnueabihf --build=arm-none-linux-gnueabihf --with-float=hard. However it seems that the default armhf settings are incorrect. I shouldn't need the --with-float=hard since that is obviously implied by armhf, and they should also imply armv7-a with vfpv3 according to documentation. It seems to get confused and skip some tests. I tried using --with-fpu=auto, but that doesn't work at all, so in the end I forced it like: --with-arch=armv8-a --with-fpu=neon-fp-armv8. With this it runs a few more tests. Yeah that's a wart that I don't like. armhf just implies the hard float ABI and came into being to help distinguish from the Base PCS for some of the distros at the time (2010s). However we didn't want to set a baseline arch at that time given the imminent arrival of v8-a and thus the specification of --with-arch , --with-fpu and --with-float became second nature to many of us working on it at that time. Looking at it, the default is indeed incorrect, you get: '-mcpu=arm10e' '-mfloat-abi=hard' '-marm' '-march=armv5te+fp' That's not incorrect. It's the first version of the architecture that can support the hard-float ABI. That's like 25 years out of date! It's not a matter of being out of date (and it's only 22 years since arm1020e was announced ;) it's a matter of being as compatible as we can be with existing hardware out-of-the-box. Distros are free, of course, to set a higher bar and do so. However all the armhf distros have Armv7-a as the baseline and use Thumb-2: '-mfloat-abi=hard' '-mthumb' '-march=armv7-a+fp' Wrong. Rawhide uses Arm state (or it did last I checked). As I mentioned above, distros are free to set a higher bar. So the issue is that dg-require-effective-target arm_arch_v7a_ok doesn't work on armhf. It seems that if you specify an architecture even with hard-float configured, it turns off FP and then complains because hard-float implies you must have FP... OK, I think I see the problem there, it's in the data for proc add_options_for_arm_arch_FUNC in lib/target-supports.exp. In order to work correctly with -mfpu=auto, the -march flags in the table need "+fp" adding in most cases (pretty much everything from armv5e onwards) - that's harmless whenever the float-abi is soft, but should do the right thing when softfp or hard are used. So in most configurations Iincluding the one used by distro compilers) we basically skip lots of tests for no apparent reason... Ok, thanks for promising to do so - I trust you to get it done. Please try out various combinations of -march v7ve, v7-a , v8-a with the tool as each of them have slightly different rules. For instance v7ve allows LDREXD and STREXD to be single copy atomic for 64 bit loads whereas v7-a did not . You mean LDRD may be generated on CPUs with LPAE. We use LDREXD by default since that is always atomic on v7-a. Ok if no regressions but as you might get nagged by the post commit CI ... Thanks, I've committed it. Those links don't show anything concrete, however I do note the CI didn't pick up v2. Btw you're happy with backports if there are no issues reported for a few days? Cheers, Wilco R. ___ linaro-toolchain mailing list -- linaro-toolchain@lists.linaro.org To unsubscribe send an email to linaro-toolchain-le...@lists.linaro.org
Re: What -mfpu option is used with neon, vfpv3 and vfpd32 flag?
On 22/07/16 05:21, Jeffrey Walton wrote: > On Fri, Jul 22, 2016 at 12:19 AM, Jim Wilson wrote: >> On Thu, Jul 21, 2016 at 9:13 PM, Jeffrey Walton wrote: >>> So I guess the question is, what do I use for -mfpu=neon-vfp3 (or >>> -mfpu=neon-vfp3-d32)? Is -mfpu=neon enough? >> >> The -mfpu=neon option is enough. neon implies vfpv3 and 32 D registers. > > Perfect, thanks. > > Jeff > ___ > linaro-toolchain mailing list > linaro-toolchain@lists.linaro.org > https://lists.linaro.org/mailman/listinfo/linaro-toolchain > According to https://beagleboard.org/black, this board contains a Cortex-A8. So -mfpu=neon is correct. https://community.arm.com/groups/tools/blog/2013/04/15/arm-cortex-a-processors-and-gcc-command-lines R. IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: -mfpu=neon-fp-armv8 and unrecognized command line option
On 20/07/16 22:33, Jim Wilson wrote: > On Wed, Jul 20, 2016 at 2:14 PM, Jeffrey Walton wrote: >> I'm having trouble with ARMv8/Aarch64. One is an early Mustang server > > ARMv8 implies 32-bit code (aarch32). Aaarch64 implies 64-bit code. > These are two different compilers, with two different sets of command > line options. Er, no. ARMv8 (pedantically ARMv8-A, since there are also ARMv8-R and ARMv8-M specifications as well) is an architecture, not an ISA. The ARMv8 architecture defines two execution modes: AArch32 and AArch64. The AArch32 execution mode further has to states with separate ISAs: A32 and T32, more traditionally known as ARM and Thumb states. GCC has two separate compilers for ARMv8. One handles the AArch32 execution mode (configurations based arm-*-* for legacy reasons) and the other AArch64 (configurations based on aarch64-*-*). The -mfpu option only applies to the AArch32 compiler. R. > >> $ g++ -DDEBUG -g3 -O0 -mfpu=neon-fp-armv8 -fPIC -pipe -c cryptlib.cpp >> g++: error: unrecognized command line option ‘-mfpu=neon-fp-armv8’ >> GNUmakefile:753: recipe for target 'cryptlib.o' failed > > -mfpu=neon-fp-armv8 is an arm (32-bit) compiler option. The aarch64 > (64-bit) compiler will not accept it. > > Because FP and Neon support is optional in the 32-bit arm > architecture, there are compiler options to enable fp and/or neon > support. Usually FP support is enabled by default for a linux distro, > but the neon support usually is not, and you can enable neon by using > this -mcpu=neon-fp-armv8 option if running 32-bit code on an ARMv8 > architecture part. > > Meanwhile, the aarch64 spec requires FP and ASIMD instruction support > in the linux ABI, so there are no options to enable them, they are on > by default. If you really want to disable them, you can do so by > using a -march= option, e.g. -march=aarch64+fp+simd enables them, and > -march=aarch64+nofp+nosimd disables them. However, if you disable fp > support, you will break the ABI, and your code may not compile or run, > so don't do that unless perhaps you have an embedded target, and have > your own OS build and your own ABI. or no code that uses FP You can > also enable/.disable crc (crypto) support this way, but a better way > is to use a -mcpu= option, and let gcc figure out if the target has > crc instructions. > > See the aarch64 compiler docs here > https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options > > Jim > ___ > linaro-toolchain mailing list > linaro-toolchain@lists.linaro.org > https://lists.linaro.org/mailman/listinfo/linaro-toolchain > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: gcc 5.2 code quality
On 03/03/16 00:44, kugan wrote: > > >>> I have just switched to gcc 5.2 from 4.9.2 and the code quality does >>> seem to have improved significantly. For example, it now seems much >>> better at using ldp/stp and it seems to has stopped gratuitous use of >>> the SIMD registers. >>> >>> However, I still have a few whinges:-) >>> >>> See attached copy.c / copy.s (This is a performance critical function >>> from OpenJDK) >>> >>> pd_disjoint_words: >>> cmp x2, 8 <<< (1) >>> sub sp, sp, #64 <<< (2) >>> bhi .L2 >>> cmp w2, 8 <<< (1) >>> bls .L15 >>> .L2: >>> add sp, sp, 64<<< (2) >>> >>> (1) If count as a 64 bit unsigned is <= 8 then it is probably still >>> <= 8 as a 32 bit unsigned. >>> >> Agreed. This could probably be done by the mid-end based on value range >> propagation. Please can you file a report in gcc bugzilla? > > Not sure how we can do this in VRP. It seems that this is generated > during the RTL expansion time. Maybe,it has to be done during expansion. > optimized tree looks like: > > Ramana and I looked further into thsi last night. It turns out this is due to the way we expand switch tables. The ARM and AArch64 back-ends both use the casesi pattern which is defined to do a range check and a branch into the table. The range check is based on a 32-bit value. Because this example uses a 64-bit type as the controlling expression, the mid-end has to insert another check that the original value is within range; this renders the second check redundant but there's then no way to remove that. You're correct that VRP isn't going to help here. We're looking at whether we can adjust things to use the tablejump expander, since that should eliminate the need for the second check. > ;; Function pd_disjoint_words (pd_disjoint_words, funcdef_no=0, > decl_uid=2763, cgraph_uid=0, symbol_order=0) > > Removing basic block 13 > pd_disjoint_words (HeapWord * from, HeapWord * to, size_t count) > { > long int t$b; > long int t$a; > struct unit t; > struct unit t; > struct unit t; > struct unit t; > struct unit t; > struct unit t; > long int _5; > > : > switch (count_2(D)) , case 0: , case 1: , case > 2: , case 3: , case 4: , case 5: , case 6: , case > 7: , case 8: > > > : > _5 = *from_4(D); > *to_6(D) = _5; > goto (); > > : > t$a_8 = MEM[(struct unit *)from_4(D)]; > t$b_9 = MEM[(struct unit *)from_4(D) + 8B]; > MEM[(struct unit *)to_6(D)] = t$a_8; > MEM[(struct unit *)to_6(D) + 8B] = t$b_9; > goto (); > > : > t = MEM[(struct unit *)from_4(D)]; > MEM[(struct unit *)to_6(D)] = t; > t ={v} {CLOBBER}; > goto (); > > : > t = MEM[(struct unit *)from_4(D)]; > MEM[(struct unit *)to_6(D)] = t; > t ={v} {CLOBBER}; > goto (); > > : > t = MEM[(struct unit *)from_4(D)]; > MEM[(struct unit *)to_6(D)] = t; > t ={v} {CLOBBER}; > goto (); > > : > t = MEM[(struct unit *)from_4(D)]; > MEM[(struct unit *)to_6(D)] = t; > t ={v} {CLOBBER}; > goto (); > > : > t = MEM[(struct unit *)from_4(D)]; > MEM[(struct unit *)to_6(D)] = t; > t ={v} {CLOBBER}; > goto (); > > : > t = MEM[(struct unit *)from_4(D)]; > MEM[(struct unit *)to_6(D)] = t; > t ={v} {CLOBBER}; > goto (); > > : > _Copy_disjoint_words (from_4(D), to_6(D), count_2(D)); [tail call] > > : > return; > > } > > > > Thanks, > Kugan > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: gcc 5.2 code quality
On 02/03/16 11:35, Edward Nevill wrote: > Hi, > > I have just switched to gcc 5.2 from 4.9.2 and the code quality does seem to > have improved significantly. For example, it now seems much better at using > ldp/stp and it seems to has stopped gratuitous use of the SIMD registers. > > However, I still have a few whinges:-) > > See attached copy.c / copy.s (This is a performance critical function from > OpenJDK) > > pd_disjoint_words: > cmp x2, 8 <<< (1) > sub sp, sp, #64 <<< (2) > bhi .L2 > cmp w2, 8 <<< (1) > bls .L15 > .L2: > add sp, sp, 64<<< (2) > > (1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as a > 32 bit unsigned. > Agreed. This could probably be done by the mid-end based on value range propagation. Please can you file a report in gcc bugzilla? > (2) Nowhere in the function does it store anything on the stack, so why > drop and restore the stack every time. Also, minor quibble in the > disass, why does sub use #64 whereas add uses just '64' (appreciate this > is probably binutils, not gcc). > This is a known problem. What's happened is that in the early phase of compilation you had an object that appeared to need stack space. Later on that was optimized away, but the stack slot is not freed. In large functions where there is often other data on the stack anyway this equates to little more than some wasted stack space, but in small functions it can often make the difference between needing stack adjustments and not. > .L15: > adrpx3, .L4 > add x3, x3, :lo12:.L4 > ldrbw2, [x3,w2,uxtw] <<< (3) > adr x3, .Lrtx4 > add x2, x3, w2, sxtb #2 > br x2 > > (3) Why use a byte table, this is not some sort of embedded system. Use > a word table and this becomes. > > .L15: > adrpx3, .L4 > add x3, x3, :lo12:.L4 > ldr x2, [x3, x2, lsl #3] > br x2 > > An aligned word load takes exactly the same time as a byte load and we > save the faffing about calculating the address. > That doesn't work for PIC (or PIE) and can also significantly increase cache pressure. > .L10: > ldp x6, x7, [x0] > ldp x4, x5, [x0, 16] > ldp x2, x3, [x0, 32] <<< (4) > stp x2, x3, [x1, 32] <<< (4) > stp x6, x7, [x1] > stp x4, x5, [x1, 16] > > (4) Seems to be something wrong with the load scheduler here? Why not > move the stp x2, x3 to the end. It does this repeatedly. > You don't say what compilation options you used but a simple build with -O3 on gcc trunk shows the stores in the correct order. > Unfortunately as this function is performance critical it means I will > probably end up doing it in inline assembler which is time consuming, > error prone and non portable. > > * Whinge mode off > > Ed > > > copy.c > > > #include > > typedef long HeapWord; > > extern void _Copy_disjoint_words(HeapWord* from, HeapWord* to, size_t count); > > void pd_disjoint_words(HeapWord* from, HeapWord* to, size_t count) { > switch (count) { > case 0: return; > case 1: to[0] = from[0]; return; > case 2: { > struct unit { HeapWord a, b; } *p, *q, t; > p = (struct unit *)from; > q = (struct unit *)to; > t = *p; > *q = t; > return; > } > case 3: { > struct unit { HeapWord a, b, c; } *p, *q, t; > p = (struct unit *)from; > q = (struct unit *)to; > t = *p; > *q = t; > return; > } > case 4: { > struct unit { HeapWord a, b, c, d; } *p, *q, t; > p = (struct unit *)from; > q = (struct unit *)to; > t = *p; > *q = t; > return; > } > case 5: { > struct unit { HeapWord a, b, c, d, e; } *p, *q, t; > p = (struct unit *)from; > q = (struct unit *)to; > t = *p; > *q = t; > return; > } > case 6: { > struct unit { HeapWord a, b, c, d, e, f; } *p, *q, t; > p = (struct unit *)from; > q = (struct unit *)to; > t = *p; > *q = t; > return; > } > case 7: { > struct unit { HeapWord a, b, c, d, e, f, g; } *p, *q, t; > p = (struct unit *)from; > q = (struct unit *)to; > t = *p; > *q = t; > return; > } > case 8: { > struct unit { HeapWord a, b, c, d, e, f, g, h; } *p, *q, t; > p = (struct unit *)from; > q = (struct unit *)to; > t = *p; > *q = t; > return; > } > default: > _Copy_disjoint_words(from, to, count); > } > } > > > copy.s > > > .cpu generic+fp+simd > .file "copy.c" > .text > .align 2 > .p2align 3,,7 > .global pd_disjoint_words > .type pd_disjoint_words, %function > pd_disjoint_words: > cmp x2, 8 > sub sp, sp, #64 >
Re: gcc 5.2 code quality
On 02/03/16 14:25, Renato Golin wrote: > On 2 March 2016 at 11:35, Edward Nevill wrote: >> cmp x2, 8 <<< (1) >> (1) If count as a 64 bit unsigned is <= 8 then it is probably still <= 8 as >> a 32 bit unsigned. > > You mean to use "cmp w2, 8" instead? Is there any difference? > No, it's code equivalent to unsigned long x; if (x <= 8) { if ((unsigned) x <= 8) { ... } } Where the inner test is clearly redundant (for unsigned). R. > >> (2) Nowhere in the function does it store anything on the stack, so why >> drop and restore the stack every time. Also, minor quibble in the >> disass, why does sub use #64 whereas add uses just '64' (appreciate this >> is probably binutils, not gcc). > > My reading of the AAPCS64 is that it's not necessary to have a frame > at all, only that if you do, it must be quad-word aligned. > > Clang/LLVM doesn't seem to bother with the push and pop, but it also > uses "cmp x". > > >> .L15: >> adrpx3, .L4 >> add x3, x3, :lo12:.L4 >> ldr x2, [x3, x2, lsl #3] >> br x2 > > Hum, this is *exactly* what Clang generates... :) > > >> (4) Seems to be something wrong with the load scheduler here? Why not >> move the stp x2, x3 to the end. It does this repeatedly. > > Again, Clang seems to do what you want... > > Have you tried building OpenJDK with Clang? > > cheers, > --renato > ___ > linaro-toolchain mailing list > linaro-toolchain@lists.linaro.org > https://lists.linaro.org/mailman/listinfo/linaro-toolchain > IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Linaro 4.9 toolchain has issues with allyesconfig
On 27/01/16 16:57, Jim Wilson wrote: > On Wed, Jan 27, 2016 at 8:37 AM, Christophe Lyon > wrote: >> I confirm the trampolines are not inserted for ld -r. > > We can't insert the trampolines with ld -r. We don't have all of the > symbols required for this with a relocatable link. Also, for the > symbols we do have, we don't know the final addresses, so we don't > know which branches/calls will be out of range. > >> Maybe using -ffunction-sections would help? > > Using -mlong-calls might be easier. You would only want this enabled > for a allyesconfig build of course. This should solve the > out-of-range calls to the spin lock functions. it probably doesn't > help for the out-of-range branches to the .text.unlikely sections, but > this is a gcc optimization that can be disabled with > -fno-reorder-functions. This is again something you would only want > for an allyesconfig build. > > Jim > Long calls would probably solve the problem, but would likely be horribly expensive in performance. The best solution would be to have an option to prevent ld -r from merging like-named sections (instead just aggregating multiple sections with similar names into one object file). This is possible in ELF. R. IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Linaro 4.9 toolchain has issues with allyesconfig
On 27/01/16 14:52, William Mills wrote: > > > On 01/27/2016 08:35 AM, Richard Earnshaw wrote: >> On 26/01/16 17:25, Christophe Lyon wrote: >>> On 26 January 2016 at 18:23, Dan Murphy wrote: >>>> Christophe >>>> >>>> >>>> On 01/26/2016 10:58 AM, Christophe Lyon wrote: >>>>> >>>>> On 25 January 2016 at 17:21, Dan Murphy wrote: >>>>>> >>>>>> Hi! >>>>>> >>>>>> When using the linaro-4.9-2015.05 toolchain on the Linux master and on >>>>>> Linux stable releases >>>>>> I am seeing a build issue below when using the allyesconfig. This does >>>>>> not seem to occur on the 5.2 tool chain. >>>>>> We cannot move to 5.2 tool chain for our releases as they are based on >>>>>> 4.9. >>>>>> >>>>>> LD init/built-in.o >>>>>> arch/arm/kernel/built-in.o:(.text.fixup+0x1d4): relocation truncated to >>>>>> fit: R_ARM_JUMP24 against `.text.unlikely' >>>>>> arch/arm/kernel/built-in.o:(.text.fixup+0x1e0): relocation truncated to >>>>>> fit: R_ARM_JUMP24 against `.text.unlikely' >>>>>> arch/arm/kernel/built-in.o:(.text.fixup+0x1ec): relocation truncated to >>>>>> fit: R_ARM_JUMP24 against `.text.unlikely' >>>>>> drivers/built-in.o: In function `combiner_handle_cascade_irq': >>>>>> :(.text+0x834): relocation truncated to fit: R_ARM_CALL against symbol >>>>>> `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o >>>>>> :(.text+0x868): relocation truncated to fit: R_ARM_CALL against symbol >>>>>> `_raw_spin_unlock' defined in .spinlock.text section in kernel/built-in.o >>>>>> drivers/built-in.o: In function `hip04_irq_set_type': >>>>>> :(.text+0xad0): relocation truncated to fit: R_ARM_CALL against symbol >>>>>> `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o >>>>>> :(.text+0xb10): relocation truncated to fit: R_ARM_CALL against symbol >>>>>> `_raw_spin_unlock' defined in .spinlock.text section in kernel/built-in.o >>>>>> drivers/built-in.o: In function `hip04_raise_softirq': >>>>>> :(.text+0xc8c): relocation truncated to fit: R_ARM_CALL against symbol >>>>>> `_raw_spin_lock_irqsave' defined in .spinlock.text section in >>>>>> kernel/built-in.o >>>>>> :(.text+0xdc8): relocation truncated to fit: R_ARM_CALL against symbol >>>>>> `_raw_spin_unlock_irqrestore' defined in .spinlock.text section in >>>>>> kernel/built-in.o >>>>>> drivers/built-in.o: In function `hip04_irq_set_affinity': >>>>>> :(.text+0xefc): relocation truncated to fit: R_ARM_CALL against symbol >>>>>> `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o >>>>>> :(.text+0xf78): additional relocation overflows omitted from the output >>>>>> make: *** [vmlinux] Error 1 >>>>>> >>>>>> Please advise to how to resolve this issue within the 4.9 Linaro tool >>>>>> chain >>>>> >>>>> Hi Dan, >>>>> >>>>> It would be better to report this kind of problem using our bugzilla: >>>>> https://bugs.linaro.org/ >>>>> >>>>> I've managed to reproduce it, except for the errors with R__ARM_JUMP24. >>>>> Maybe that's because I used linux-4.3.3.tar.xz. >>>>> >>>>> Anyway, these errors indicate branches trying reach a function which >>>>> is too far away from the call site. >>>>> Normally, the linker inserts stubs (trampolines) to handle such >>>>> situations, but here the .text section >>>>> of drivers/built-in.o is really huge: 84247436 bytes (84MB) in my case. >>>>> The linker is able to insert trampolines at section boundaries >>>>> (between object files), but in this case >>>>> it cannot insert one close enough, hence the error you are seeing. >>>> >>>> >>>> Do we know if this is a linux kernel issue or a linker issue that got >>>> exposed >>>> when a patch came in? >>>> >>> >>> I don't know, but I'm not a kernel expert. >>> >>> It's
Re: Linaro 4.9 toolchain has issues with allyesconfig
On 26/01/16 17:25, Christophe Lyon wrote: > On 26 January 2016 at 18:23, Dan Murphy wrote: >> Christophe >> >> >> On 01/26/2016 10:58 AM, Christophe Lyon wrote: >>> >>> On 25 January 2016 at 17:21, Dan Murphy wrote: Hi! When using the linaro-4.9-2015.05 toolchain on the Linux master and on Linux stable releases I am seeing a build issue below when using the allyesconfig. This does not seem to occur on the 5.2 tool chain. We cannot move to 5.2 tool chain for our releases as they are based on 4.9. LD init/built-in.o arch/arm/kernel/built-in.o:(.text.fixup+0x1d4): relocation truncated to fit: R_ARM_JUMP24 against `.text.unlikely' arch/arm/kernel/built-in.o:(.text.fixup+0x1e0): relocation truncated to fit: R_ARM_JUMP24 against `.text.unlikely' arch/arm/kernel/built-in.o:(.text.fixup+0x1ec): relocation truncated to fit: R_ARM_JUMP24 against `.text.unlikely' drivers/built-in.o: In function `combiner_handle_cascade_irq': :(.text+0x834): relocation truncated to fit: R_ARM_CALL against symbol `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o :(.text+0x868): relocation truncated to fit: R_ARM_CALL against symbol `_raw_spin_unlock' defined in .spinlock.text section in kernel/built-in.o drivers/built-in.o: In function `hip04_irq_set_type': :(.text+0xad0): relocation truncated to fit: R_ARM_CALL against symbol `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o :(.text+0xb10): relocation truncated to fit: R_ARM_CALL against symbol `_raw_spin_unlock' defined in .spinlock.text section in kernel/built-in.o drivers/built-in.o: In function `hip04_raise_softirq': :(.text+0xc8c): relocation truncated to fit: R_ARM_CALL against symbol `_raw_spin_lock_irqsave' defined in .spinlock.text section in kernel/built-in.o :(.text+0xdc8): relocation truncated to fit: R_ARM_CALL against symbol `_raw_spin_unlock_irqrestore' defined in .spinlock.text section in kernel/built-in.o drivers/built-in.o: In function `hip04_irq_set_affinity': :(.text+0xefc): relocation truncated to fit: R_ARM_CALL against symbol `_raw_spin_lock' defined in .spinlock.text section in kernel/built-in.o :(.text+0xf78): additional relocation overflows omitted from the output make: *** [vmlinux] Error 1 Please advise to how to resolve this issue within the 4.9 Linaro tool chain >>> >>> Hi Dan, >>> >>> It would be better to report this kind of problem using our bugzilla: >>> https://bugs.linaro.org/ >>> >>> I've managed to reproduce it, except for the errors with R__ARM_JUMP24. >>> Maybe that's because I used linux-4.3.3.tar.xz. >>> >>> Anyway, these errors indicate branches trying reach a function which >>> is too far away from the call site. >>> Normally, the linker inserts stubs (trampolines) to handle such >>> situations, but here the .text section >>> of drivers/built-in.o is really huge: 84247436 bytes (84MB) in my case. >>> The linker is able to insert trampolines at section boundaries >>> (between object files), but in this case >>> it cannot insert one close enough, hence the error you are seeing. >> >> >> Do we know if this is a linux kernel issue or a linker issue that got >> exposed >> when a patch came in? >> > > I don't know, but I'm not a kernel expert. > > It's likely that the allyes config produces large object files. > > Did it ever work for you? > > With a known-to-work starting point we could try to bisect > and identify went the problem appeared. > I find it hard to believe that a compiler could generate a single object file that contained 84Mb of code, it would take an inordinate amount of code to do that even if optimizations were turned off. So that leaves two likely options: 1) The file is constructed by concatenating multiple object files, perhaps with ld -r to produce a partially linked object file. 2) The file contains some directives that are inserting large amounts of padding. Either way, I suspect that this is going to require fixing in the Linux build system. You've hit a fundamental limit of the AArch32 architecture, as Christophe has mentioned and there's nothing at this point that the tools can do to help you. R. > Christophe. > > >> Dan >> >> >>> The range of branch instructions is indicated for instance here: >>> >>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0204j/Cihfddaf.html >>> >>> I'm not sure why this would no happen with the 5.2 toolchain, I haven't >>> tried. >>> >>> Christophe. >>> >>> Dan -- -- Dan Murphy ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-toolchain >> >> >> >> -- >> -- >> Dan Murphy >> > ___
Re: [ANNOUNCE] Linaro GCC 4.9 2014.06 released
On 13/06/14 09:52, Matthias Klose wrote: > Am 13.06.2014 10:10, schrieb Yvan Roux: >> Linaro GCC 4.9 2014.06 is the third Linaro GCC source package release in the >> 4.9 series. It is based on FSF GCC 4.9.1+svn211054 and includes performance >> improvements and bug fixes. > > This sounds like 4.9.1 is already released. Do you mean 4.9.0+svn211054 ? > > > ___ > linaro-toolchain mailing list > linaro-toolchain@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/linaro-toolchain > BASE_VER is bumped in SVN immediately after the release has been made, so the svn branch probably does say 4.9.1. This should really be called 4.9.1-pre+svn211054. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: [RFC][AArch64] Remove CORE_REGS form reg_class
On 15/05/14 00:22, Kugan wrote: Hi All, AAarch64 back-end defines GENERAL_REGS and CORE_REGS with the same set of register. Is there any reason why we need this? target hooks like aarch64_register_move_cost doesn’t handle CORE_REGS. In addition, IRA cost calculation also has logics like make common class biggest of best and alternate; this might get confused with this. Attached RFC patch removes it. regression tested for aarch64-none-linux-gnu on qemu-aarch64 with now new regression. Is this OK ? Patches for gcc need to be sent to gcc-patches... R. Thanks, Kugan gcc/ 2014-05-14 Kugan Vivekanandarajah * config/aarch64/aarch64.c (aarch64_regno_regclass) : Change CORE_REGS to GENERAL_REGS. (aarch64_secondary_reload) : LikeWise. (aarch64_class_max_nregs) : Remove CORE_REGS. * config/aarch64/aarch64.h (enum reg_class) : Remove CORE_REGS. (REG_CLASS_NAMES) : Likewise. (REG_CLASS_CONTENTS) : LikeWise. (INDEX_REG_CLASS) : Change CORE_REGS to GENERAL_REGS. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: -mfpu=softvfp+vfp in LLVM
On 18/02/14 12:59, Renato Golin wrote: > Richard, > > I found some emails about you implementing softvfp back in 2003, and > I'd like to know what is the expected behaviour when it conflicts with > the target triple, for example: > > -triple arm-linux-gnueabihf + -mfpu=sofvfp+vfp > > In this case, in LLVM, the triple sets "-float-abi=hard" but the fpu > would set "+soft-float-abi", which are contradictory flags. > > Is that case even possible? If so, what's the expected behaviour? Soft > or hard float? > > Do the extra flags always override the triple behaviour? Is it > expected that *every* compiler flag will work on a > last-seen-sets-behaviour manner? > > cheers, > --renato > I honestly don't remember what -mfpu=softvfp+vfp is without going to look it up... you're talking about code that was written 11 years ago! I suspect it dates to the time when we were starting to phase out support for the old FPA instructions (if you don't remember those, think yourself lucky :-); where softvfp meant to use the floating-point data format that was used with the VFP; the +vfp was probably meant to imply that vfp instructions could be used as well, but didn't change the ABI (doesn't imply float-abi=hard) -- I would say the combination you describe above is probably meaningless. In the gcc world triplets are only used during configuration of the compiler to set the various defaults, they never override something given on the command line at run time. It is possible to create meaningless combinations of some options (thumb1 + hard-float ABI is currently one that GCC can't handle). R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: MMU Off / Strict Alignment
On 18/12/13 05:06, Jonathan S. Shapiro wrote: > At the risk of sticking my nose in, this isn't a startup code issue. > It's a contract issue. > > First, I don't buy Richard's argument about memcpy() startup costs and > hard-to-predict branches. We do those tests on essentially every > *other* RISC platform without complaint, and it's very easy to order > those branches so that the currently efficient cases run well. Perhaps > more to the point, I haven't seen anybody put forward quantitative > data that using the MMU for unaligned references is any better than > executing those branches. Speaking as a recovering processor > architect, that assumption needs to be validated quantitatively. My > guess is that the branches are faster if properly arranged. > > Second, this is a contract issue. If newlib intends to support > embedded platforms, then it needs to implement algorithms that are > functionally correct without relying on an MMU. By all means use > simpler or smarter algorithms when an MMU can be assumed to be > available in a given configuration, but provide an algorithm that is > functionally correct when no MMU is available. "Good overall > performance in memcpy" is a fine thing, but it is subject to the > requirement of meeting functional specifications. As Jochen Liedtke > famously put it (read this in a heavy German accent): "Fast, ya. But > correct? (shrug) Eh!" > > So: we need a normative statement saying what the contract is. The > rest of the answer will fall out from that. > > I do agree with Richard that startup code is special. I've built > deeply embedded runtimes of one form or another for 25 years now, and > I have yet to see a system where optimizing a simplistic byte-wise > memcpy during bootstrap would have made any difference in anything > overall. That said, if the specification of memcpy requires it to > handle incompatibly aligned pointers (and it does), and the contract > for newlib requires it to operate in MMU-less scenarios in a given > configuration (which, at least in some cases, it does), it's > completely legitimate to expect that bootstrap code can call memcpy() > and expect behavior that meets specifications. > > So what's the contract? > I disagree with your assertion that newlib *requires* it to operate in an MMU-less scenario for all targets; it only does so when the target can reasonably be expected to not have an MMU. The only contract that exists is the one written in the C standard: 7.23.2.1#2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined. But that is written on the assumption that we're in a normal execution environment, not in some special case. What you're missing is that AArch64 is (in ARM ARM terms) an A-profile only environment where an MMU is mandated in the system. Furthermore, processors implementing the architecture will *expect* that the MMU be turned on as soon as possible after boot, since without this the caches cannot be used and without those the performance will be truly horrible. Once the caches are enabled, it's perfectly reasonable to assume that memcpy will only be used for copies to and from NORMAL memory, since other types of memory have potential side effects, which means that use of memcpy would be unsafe. If you want to write an MMU-less memcpy, then feel free to write one; but please install it with a different interface -- something like __memcpy_nommu(). Don't penalise the standard case for the non-standard exceptional one. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: MMU Off / Strict Alignment
On 16/12/13 17:54, Christopher Covington wrote: > Hi, > > On 11/20/2013 03:45 PM, Matthew Gretton-Dann wrote: >> On 20 November 2013 17:57, Christopher Covington wrote: >>> Hi, >>> >>> We've noticed an issue trying to use the Linaro AArch64 binary bare metal >>> toolchain release with the MMU turned off for some low-level tests. >>> >>> Anytime puts, sprintf, etc. gets called, a reent structure gets created with >>> references to STDIN, STDOUT, STDERR FILE types. A member in the __sFile >>> struct, _mbstate, is an 8 byte struct, but is not aligned on an 8 byte >>> boundary. This means that when memset (or a similar function) gets called on >>> this struct, and doesn't operate one byte at a time, a data alignment fault >>> will be generated when operating out of device memory, such as on a system >>> where the MMU has not yet been turned on yet. > > We believe to have narrowed down the issue to the AArch64 optimized > memcpy/memset implementations that assume unaligned accesses will not fault. > While the current AArch64 libgloss startup code turns the MMU on so such > accesses will succeed, I don't think turning on the MMU should be required of > all startup code. Would it be possible to modify these routines to make only > size-aligned accesses without degrading performance? If a single > implementation can't make everyone happy, should the ifdefs around them > perhaps be expanded to include something about requiring the MMU to be on? > Quite frankly, I doubt it. Good overall performance in memcpy means avoiding hard-to-predict branches (it's not unusual for the code to be called with completely random copy sizes); removing the unaligned accesses would mean many more compares and branches than are currently required, each of which would carry a significant risk of an avoidable branch mispredict. Furthermore, completely unaligned copies would then need to be entirely rewritten to use byte-shifting techniques; that would significantly impact the overall performance. My personal feeling is that startup code is really special. If you need to copy some memory during this time and the MMU has not been enabled, then you can't assume that it's safe to call memcpy. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Overheating Pandas
On 03/07/13 17:41, Renato Golin wrote: On 3 July 2013 17:22, Mans Rullgard mailto:mans.rullg...@linaro.org>> wrote: I repeat, the 4460 will run at 1.2GHz indefinitely without thermal management. My mistake, I said 1.3GHz when it was actually 1.2GHz. So, at 1.2GHz, it freezes every few hours on full load on both 4430 and 4460. linaro@linaro-panda-01:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq 120 Now what? keep lowering the clock limit (.../cpufreq/scaling_max_freq) until you get stability. If you don't, then it isn't a heating problem. Remember that manufacturers match the form of packaging to the expected TDP of the intended usage environment (to keep product costs down). In a mobile part that probably means relatively cheap plastic package because a hot chip would burn a hole in your pocket -- literally. The package almost certainly doesn't have a high thermal conductivity from the chip to the external surface so while a heat sink might help, it won't be as effective as with other packaging options. Chips expected to dissipate large amounts of power normally have a metal pad on the package so that a heat sink with thermal grease will make a good thermal contact. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: [ACTIVITY] Week 12
On 25/03/13 06:00, Pinski, Andrew wrote: Yes I know what you want to do but I think this is better to without adding a new tree code. You want to expand the following: a = x < y c = z ? a : b Where a is only used in the assignment of a. I think this is better to add target specific hook for doing expansions which are complex and only effect one target. Two (at least). Then the patch becomes almost all target specific and no longer need to add a new tree code which is only good for the arm target. Not necessarily. It's often easier to lower something more specific into general code when that feature isn't present that to optimize for it by identifying specific patterns when it is. The existing code stands almost no chance of being able to deal with repeated conditional comparison operations, it just gets too hairy. R. Thanks, Andrew From: Zhenqiang Chen [zhenqiang.c...@linaro.org] Sent: Sunday, March 24, 2013 9:34 PM To: Pinski, Andrew Cc: linaro-toolchain Subject: Re: [ACTIVITY] Week 12 On 25 March 2013 12:05, Pinski, Andrew wrote: * Investigate how to expand conditional compare GIMPLE to RTL and emit asm. I think maybe we should start adding target specific expanders. Then all you need to do is take that expander and when you get a COND_EXPR and then looks at TRE provided information which then can exapnd the conditional compare without adding a new tree code. Note I think this should be discussed on the GCC list directly anyways rather than on the linaro form because it is more likely be accepted if talked about there. Thanks for the comments. The "conditional compare" mentioned here is a different from COND_EXPR. If my understanding is correct, COND_EXPR is for "c1? v1: v2" "conditional compare" here is to represent the second operand of short-circuit, e.g. TRUTH_ANDIF_EXPR. It is more like "c1? CMP (v1, v2): c1" It is just a CMP (GT, NE, etc)_EXPR if ignoring the "conditional" part. Agree with you, we should discuss it on GCC list. But before that, I want a prototype and estimate the efforts. Thanks! -Zhenqiang ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: AArch64 asm statement question
On 22/02/13 09:54, Yvan Roux wrote: Hi Richard, thanks for the reminding, my previous example was just an attempt to find the good asm statement constraints to generate a correct ldxp instruction. My real objective is to implement 128-bit single-copy atomic load/store and to do this I use ldxp without any matching stxp for the atomic_load : __asm__ __volatile__( " ldxp %0, %1, [%2]" : "=&r" (res.v1), "=&r" (res._v2) : "r" (addr) ); and a "fake" ldxp with the "real" stxp for the atomic_store: do { __asm__ __volatile__( " ldxp %0, %1, %3\n" " stxp %w2, %4, %5, %3" : "=&r" (fake_val.v1), "=&r" (fake_val.v2), "=&r" (status), "+Q" (*addr) : "r" (value.v1), "r" (value.v2) ); } while (status); do you think that it is the right way to do it ? Sadly, no. Only the STXP instruction is single-copy atomic. To get atomicity on a read you need to repeat the sequence LDXPXn, Xm, [addr] STXPWs, Xn, Xm, [addr] until the store succeeds. However, this isn't needed on 32-bit LDXP sequences. R. Thanks Yvan On 21 February 2013 19:31, Richard Earnshaw wrote: On 21/02/13 15:54, Yvan Roux wrote: Hi, in the example below I want to explicitly generate a "store exclusive pair" instruction with an asm statement: typedef struct { long unsigned int v1; long unsigned int v2; } mtype; int main () { mtype val[2] ; val[0].v1 = 1234; val[0].v2 = 5678; int status; do { __asm__ __volatile__( " stxp %0, %2, %3, %1" : "=&r" (status), "=Q" (val[1]) : "r" (val[0].v1), "r" (val[0].v2) ); } while (status != 0); if (val[1].v1 == 1234 && val[1].v2 == 5678) return 0; return 1; } The generated assembly is: .L7: ldr x0, [sp] ldr x1, [sp,8] .L3: add x3, sp, 16 stxpx2, x0, x1, [x3] cbnzw2, .L7 and the issue is that the assembler is not happy of the register x2 used to store the exclusive access status, it should be w2, but looking at constraint.md it seems that there is no constraint to say that we want the 32bit version of the register. Any idea ? You may already be aware of this, but like AArch32, the architecture restricts the use of load and store operations that are permitted between LDXP and STXP, which essentially means that any ASM block that uses LDXP must also contain the matching STXP that depends on it. If you don't do this the compiler may introduce random load/store operations (eg spills/reloads) that will kill your exclusive access and make the code unable to proceed. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: AArch64 asm statement question
On 21/02/13 15:54, Yvan Roux wrote: Hi, in the example below I want to explicitly generate a "store exclusive pair" instruction with an asm statement: typedef struct { long unsigned int v1; long unsigned int v2; } mtype; int main () { mtype val[2] ; val[0].v1 = 1234; val[0].v2 = 5678; int status; do { __asm__ __volatile__( " stxp %0, %2, %3, %1" : "=&r" (status), "=Q" (val[1]) : "r" (val[0].v1), "r" (val[0].v2) ); } while (status != 0); if (val[1].v1 == 1234 && val[1].v2 == 5678) return 0; return 1; } The generated assembly is: .L7: ldr x0, [sp] ldr x1, [sp,8] .L3: add x3, sp, 16 stxpx2, x0, x1, [x3] cbnzw2, .L7 and the issue is that the assembler is not happy of the register x2 used to store the exclusive access status, it should be w2, but looking at constraint.md it seems that there is no constraint to say that we want the 32bit version of the register. Any idea ? You may already be aware of this, but like AArch32, the architecture restricts the use of load and store operations that are permitted between LDXP and STXP, which essentially means that any ASM block that uses LDXP must also contain the matching STXP that depends on it. If you don't do this the compiler may introduce random load/store operations (eg spills/reloads) that will kill your exclusive access and make the code unable to proceed. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: aarch64 does not run
On 13/02/13 02:08, Wink Saville wrote: Thanks, I install ia32-lib and now it works, but what a poor error message. To bad i would at least tell you what file wasn't found. I agree, but that's a feature of the OS, rather than the program you were trying to run. The programs you installed never even started to run because the OS couldn't find the required libraries. R. -- Wink On Sun, Feb 10, 2013 at 3:23 AM, Mans Rullgard mailto:mans.rullg...@linaro.org>> wrote: On 10 February 2013 04:36, Wink Saville mailto:w...@saville.com>> wrote: > I downloaded the aarch64 binaries to a ubuntu machine: > > wink@ssi-primary:~$ uname -a > Linux ssi-primary 3.5.0-21-generic #32-Ubuntu SMP Tue Dec 11 18:51:59 UTC > 2012 x86_64 x86_64 x86_64 GNU/Linux > > > And when I try to run gcc-4.7.3: > > wink@ssi-primary:~$ ls -al > ~/aarch64-toolchain/gcc-linaro-aarch64-linux-gnu-4.7+bzr115029-20121015+bzr2506_linux/bin/aarch64-linux-gnu-gcc-4.7.3 > -rwxr-xr-x 1 wink wink 553068 Oct 18 14:21 > /home/wink/aarch64-toolchain/gcc-linaro-aarch64-linux-gnu-4.7+bzr115029-20121015+bzr2506_linux/bin/aarch64-linux-gnu-gcc-4.7.3 > > I get a file not found: > > wink@ssi-primary:~$ strace > /home/wink/aarch64-toolchain/gcc-linaro-aarch64-linux-gnu-4.7+bzr115029-20121015+bzr2506_linux/bin/aarch64-linux-gnu-gcc-4.7.3 > -v > > execve("/home/wink/aarch64-toolchain/gcc-linaro-aarch64-linux-gnu-4.7+bzr115029-20121015+bzr2506_linux/bin/aarch64-linux-gnu-gcc-4.7.3", > ["/home/wink/aarch64-toolchain/gcc"..., "-v"], [/* 19 vars */]) = -1 > ENOENT (No such file or directory) This error usually means the executable is requesting a non-existent "interpreter" (dynamic loader). You need to install the 32-bit compat lib package. I don't remember what it's called on ubuntu, probably ia32-libs or similar. -- Mans Rullgard / mru ATT1..txt ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: FPU error with arm-linux-gnueabihf-gcc 3.7.3
On 11/12/12 19:39, Matthew Gretton-Dann wrote: On 10 December 2012 14:53, Zhangfei Gao wrote: We met build error when using arm-linux-gnueabihf-gcc 3.7.3 for Huawei s40v200 kernel, which is 3.0.8, log is below. HAVE issue: gcc-linaro-arm-linux-gnueabihf-4.7-2012.11-20121123_linux.tar.bz2 gcc-linaro-arm-linux-gnueabihf-4.7-2012.10-20121022_linux.tar.bz2 gcc-linaro-arm-linux-gnueabi-2012.03-20120326_linux.tar.bz2, with build flag --with-float=softfp NO issue: arm-eabi-gcc 4.4.0, which comes from android package. Any suggestion? Thanks make[1]: `include/generated/mach-types.h' is up to date. CALLscripts/checksyscalls.sh CHK include/generated/compile.h AS arch/arm/mach-godbox/hi_pm_sleep.o arch/arm/mach-godbox/hi_pm_sleep.S: Assembler messages: arch/arm/mach-godbox/hi_pm_sleep.S:456: Error: selected processor does not support requested special purpose register -- `mrs r10,FPEXC' arch/arm/mach-godbox/hi_pm_sleep.S:456: Error: selected processor does not support requested special purpose register -- `msr FPEXC,r2' arch/arm/mach-godbox/hi_pm_sleep.S:456: Error: selected processor does not support requested special purpose register -- `mrs r2,FPSCR' arch/arm/mach-godbox/hi_pm_sleep.S:546: Error: selected processor does not support requested special purpose register -- `msr FPEXC,r2' arch/arm/mach-godbox/hi_pm_sleep.S:546: Error: selected processor does not support requested special purpose register -- `msr FPSCR,r10' arch/arm/mach-godbox/hi_pm_sleep.S:546: Error: selected processor does not support requested special purpose register -- `msr FPEXC,r9' make[1]: *** [arch/arm/mach-godbox/hi_pm_sleep.o] Error 1 make: *** [arch/arm/mach-godbox] Error 2 make: *** Waiting for unfinished jobs Can you show me the complete command line for assembling arch/arm/mach-godbox/hi_pm_sleep.S into arch/arm/mach-godbox/hi_pm_sleep.o please? And also the output of gcc -v? My guess is that you are building GCC yourself have specified --with-float=softfp, but not specified the actual floating point architecture with --with-fpu= on the GCC configure line. See the documentation for -mfpu= in the GCC manuals to see what values are valid here (again guessing but you probably want --with-fpu=vfpv3 or --with-fpu=neon - but I don't know what architecture you are actually compiling for). Hmm, building the kernel with floating-point enabled is probably a no-no! The kernel has to preserve the user-space floating point context as it isn't saved on kernel entry. IIRC, insns in the kernel that really must access the co-processors have to use generic instrucions (MCR/MRC/LDC/STC, etc). However, you might get a more useful answer if you talked to the kernel folk. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: AND vs UXTB
On 03/08/12 13:49, Mans Rullgard wrote: > I have noticed gcc has a preference for generating UXTB instructions > when an AND with #255 would do the same thing. This is bad, because > on A9 UXTB has two cycles latency compared to one cycle for AND. On > A8 both instructions have one cycle latency. > UXTB on the other hand is a 16-bit instruction, whereas AND is a 32-bit one. Of the cores I'm aware of, only A9 has this performance anomaly. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Distinguishing SF/HF ABI binaries, take two
On 02/08/12 18:39, Mans Rullgard wrote: > Nevertheless, the tags in the .ARM.attributes section are the standard, > published way to identify FP ABI as well as a number of other properties > that might be relevant to a linker. 1) The attributes only visible in the section view (as used by linkable object files). You can't rely on that being present in an executable image. 2) The encoding has been arranged for density, not performance; it's unsuitable for low-cost look-ups when searching a chain of libraries. Attributes were never intended for use at run time; IMO it would be a mistake to try and coerce them into such a role. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Fwd: [cbuild] gcc-4.8~svn189808 armv7l failed
On 25/07/12 05:16, Michael Hope wrote: > FYI GCC trunk r189808 fails to build with a bootstrap comparison error: > > Comparing stages 2 and 3 > warning: gcc/cc1-checksum.o differs > warning: gcc/cc1plus-checksum.o differs > warning: gcc/cc1obj-checksum.o differs > warning: gcc/cc1objplus-checksum.o differs > Bootstrap comparison failure! > arm-linux-gnueabi/libgcc/unwind-arm.o differs > arm-linux-gnueabi/libgcc/unwind-arm_s.o differs > > 189575 was fine on hard float. 189745 is fine on softfp. > 189792 is fine for softfp as well. R. > -- Michael > > -- Forwarded message -- > From: Linaro Toolchain Builder > Date: 25 July 2012 15:59 > Subject: [cbuild] gcc-4.8~svn189808 armv7l failed > To: "michael.hope+not...@linaro.org" > > > ursa3 finished running job gcc-4.8~svn189808 on > armv7l-precise-cbuild348-ursa3-cortexa9hfr1. > > The results are here: > http://builds.linaro.org/toolchain/gcc-4.8~svn189808 > > > > This email is sent from a cbuild (https://launchpad.net/cbuild) based > bot which is administered by Michael Hope . > ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Armhf dynamic linker path
On 02/05/12 13:25, Mans Rullgard wrote: > On 2 May 2012 05:15, Michael Hope wrote: >> On 27 April 2012 11:59, Michael Hope wrote: >>> On 23 April 2012 14:23, Jon Masters wrote: On 04/22/2012 06:06 PM, Michael Hope wrote: > On 21 April 2012 09:10, Jon Masters wrote: >> Hey everyone, >> >> Following up here. Where do we stand? We need to have upstream patches >> before we can pull them into the distro - is that piece done? > > Hi Jon. I've been away, sorry. I've just sent the GCC patch and > Carlos is on the hook for the GLIBC side. I saw the email. Could folks do me a favor and let me know the moment this lands in upstream and I'll arrange for us to pull it immediately. (I'm on all the libc lists, but then I'm on almost every list, everywhere, so it takes a bit of time to get to it) >>> >>> Hi Jon. There's a fault with the GCC patch so it's still in progress. >>> Carlos sent the GLIBC patch out for review today. >> >> Hi Jon. The GCC patch is now upstream as r186859 and r187012. > > I noticed that it now sets the dynamic loader to /lib/ld-linux-armhf.so.3 > even when configured for soft-float ABI and linking against a soft-float > rootfs. The resulting binaries then fail to run. Passing -mfloat-abi=softfp > to the link command fixes it. Is this change in behaviour intentional? > Eh? Exactly what command line did you invoke the compiler with, and what was the configuration? Are you sure you don't have a compiler configured to hard-float by default? R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Update on stack (re-)alignment issues
On 18/04/12 18:36, Ulrich Weigand wrote: > > Hello, > > I've been following up on the discussion we had on Monday regarding stack > alignment, and noticed that I had mis-remembered the current state of > affairs. Ramana asked me on Tuesday to provide a write-up of the actual > status, so here we go ... > > > To summarize the background of the problem: on ARM, the incoming stack > pointer is only guaranteed to be aligned to an 8 byte boundary. This means > that objects on the stack (local variables, spill slots, temporaries etc.) > cannot easily be aligned to more than 8 bytes. This can potentially cause > problems in two situations: > > 1) The object's default alignment (according to its type) is larger than 8 > bytes > 2) The object has a forced non-default alignment that is larger than 8 > bytes > > The first situation should in theory never appear, since according to the > ARM ABI all types have a default alignment of at most 8 bytes. However, > due to the current mix-up in GCC, vector types actually are considered to > have a 16-byte alignment requirement in GCC. > > The second situation can only appear with local variables that are declared > using attribute ((aligned)). > > > We had discussed on Monday that we need to fix the second situation, since > this can always occur and is supported on other platforms. By doing so, > we would then automatically fix the first situation as well. > > However, this reasoning turns out to be incorrect. There are currently in > GCC *two* completely separate mechanisms that can be used to align objects > on the stack to larger than the ABI guaranteed stack pointer alignment: > > A) Re-alignment of the full stack frame. This is what is used by the Intel > back-end (and only the Intel back-end). At function entry, generated code > will align the stack pointer itself to whatever is necessary to fulfil > alignment requirements of all objects on the stack. This may necessitate > follow-on changes: the frame pointer, if there is one, will likewise need > to be aligned at runtime. Also, since incoming stack arguments are now no > longer at a fixed offset relative to the stack pointer *or* frame pointer > in some cases, we might need an extra register as argument pointer. This > method allows extra alignment for *any* object on the stack, but needs > significant back-end support in order to be enabled on any non-Intel > architecture. > > B) Dynamic allocation of selected stack variables. This is implemented by > common code with no involvement of the back-end. In effect, the code in > cfgexpand.c:expand_stack_vars that decides on how to allocate local > variables on the stack will remove all variables that require extra > alignment and place them into an extra structure. Generated prologue code > will then in effect dynamically allocate and align that structure on the > stack, and just store a pointer to it as "variable" into the normal stack > frame. All other areas of the frame are unaffected. Since this method > just simulates code the programmer could have written themselves using > alloca, it does not require *any* back-end support and is enabled by > default everywhere. However, it only works for regular local variables, > and not for any other objects on the stack. I read the C11 standard briefly a few months back, and I believe that B) is all that is needed there. The standard excludes over-aligning function arguments. > > Objects on the stack *except* local variables always use default alignment. > Since on most platforms, except Intel and *currently* ARM, the ABI stack > pointer alignment is sufficient to implement default alignments, method B) > as above is able to fulfil all stack alignments. Intel uses method A), so > they're also OK. In effect, it's only ARM due to the vector type > alignment problem that runs into the situation that neither method works. > > > Under those circumstances, given that: > - we want to fix vector type alignment in order to become ABI compliant > - once we've fixed this, we're in the same situation as other platforms and > method B) already fixes stack alignment problems > - implementing method A) is therefore both quite involved *and* actually > superfluous > > I'd now rather recommend that we *don't* try to implement method A) (full > stack-frame re-alignment) on ARM. > > Comments? > Yes, sounds like the right solution to me. Technically, GCC's vector mechanism allows the creation of any size of vector, which will be aligned to the size of the vector. We only run into problems when that size exceeds the maximum alignment. Such values passed by value to functions should also be over-aligned. I think if we were to continue supporting such non-standard types we would have to change the rules to pass them by reference and have caller copying. We'd still need to deal with the 16-byte vectors somehow though. So overall, I think the only practical solution is to limit vectors to 8-byte
Re: getting armv7 linker to emit armv4t thumb interworking
On 13/04/12 12:05, Richard Earnshaw wrote: > On 13/04/12 11:58, Mans Rullgard wrote: >> On 13 April 2012 11:47, Richard Earnshaw wrote: >>> On 12/04/12 20:10, Allen Martin wrote: >>>> I have a cross toolchain I configured with "--with-arch=armv7-a >>>> --with-cpu=cortex-a9 --with-tune=cortex-a9" and I want the linker to emit >>>> armv4t compatible thumb interworking, but I can't seem to get it to. >>>> >>>> I noticed that if I create a armv4t toolchain with "--with-arch=armv4t >>>> --with-cpu=arm7tdmi --with-tune=arm7tdmi" and then I pass "--use-blx" to >>>> the linker it will emit armv7 thumb interworking. There doesn't seem to >>>> be any inverse "--no-use-blx" type switch though. Is this a >>>> bug/limitation of the linker or am I misunderstanding something? >>>> >>> >>> it's all in the friendly manual :-) >>> >>> The option you need is --fix-v4bx. >> >> That option is for supporting pre-thumb cores, which is not necessary >> here. > > Oops, misread the question. Sorry. > > To get v4t style interworking you need to ensure all your objects are > built for v4t. The use of blx should only occur if the linker detects > an object file in the input list that already contains support for blx > (for example, because it was compiled with v5 or later). > > R. > > I've just built an assembler/linker with the options you mention. Compiling the following testcase: .text .arm .cpu arm7tdmi .global _start .type _start, %function _start: bl bar bx lr .thumb .global bar .type bar, %function bar: bx lr Results in: .../as /tmp/asm.s -o /tmp/asm.o .../ld -o /tmp/asm /tmp/asm.o .../objdump -d /tmp/asm /tmp/asm: file format elf32-littlearm Disassembly of section .text: 8000 <_start>: 8000: eb02bl 8010 <__bar_from_arm> 8004: e12fff1ebx lr 8008 : 8008: 4770bx lr 800a: 46c0nop ; (mov r8, r8) 800c: movsr0, r0 ... 8010 <__bar_from_arm>: 8010: e59fc000ldr ip, [pc]; 8018 <__bar_from_arm+0x8> 8014: e12fff1cbx ip 8018: 8009.word 0x8009 801c: .word 0x Change the .cpu directive to .cpu cortex-a9 and you get objdump -d /tmp/asm /tmp/asm: file format elf32-littlearm Disassembly of section .text: 8000 <_start>: 8000: fa00blx 8008 8004: e12fff1ebx lr 8008 : 8008: 4770bx lr 800a: bf00nop So exactly how are you calling the linker? ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: [PATCH] ld: add switch to disable use of BLX instructions
> +target2_type, fix_v4bx, use_blx, no_use_blx, > vfp11_denorm_fix, no_enum_size_warning, > no_wchar_size_warning, > pic_veneer, fix_cortex_a8, > @@ -533,6 +534,7 @@ PARSE_AND_LIST_PROLOGUE=' > #define OPTION_NO_MERGE_EXIDX_ENTRIES 316 > #define OPTION_FIX_ARM1176 317 > #define OPTION_NO_FIX_ARM1176318 > +#define OPTION_NO_USE_BLX319 > ' > > PARSE_AND_LIST_SHORTOPTS=p > @@ -547,6 +549,7 @@ PARSE_AND_LIST_LONGOPTS=' >{ "fix-v4bx", no_argument, NULL, OPTION_FIX_V4BX}, >{ "fix-v4bx-interworking", no_argument, NULL, > OPTION_FIX_V4BX_INTERWORKING}, >{ "use-blx", no_argument, NULL, OPTION_USE_BLX}, > + { "no-use-blx", no_argument, NULL, OPTION_NO_USE_BLX}, >{ "vfp11-denorm-fix", required_argument, NULL, OPTION_VFP11_DENORM_FIX}, >{ "no-enum-size-warning", no_argument, NULL, OPTION_NO_ENUM_SIZE_WARNING}, >{ "pic-veneer", no_argument, NULL, OPTION_PIC_VENEER}, > @@ -567,7 +570,7 @@ PARSE_AND_LIST_OPTIONS=' >fprintf (file, _(" --target2=Specify definition of > R_ARM_TARGET2\n")); >fprintf (file, _(" --fix-v4bx Rewrite BX rn as MOV pc, > rn for ARMv4\n")); >fprintf (file, _(" --fix-v4bx-interworking Rewrite BX rn branch to > ARMv4 interworking veneer\n")); > - fprintf (file, _(" --use-blx Enable use of BLX > instructions\n")); > + fprintf (file, _(" --[no-]use-blx Disable/enable use of BLX > instructions\n")); >fprintf (file, _(" --vfp11-denorm-fix Specify how to fix VFP11 > denorm erratum\n")); >fprintf (file, _(" --no-enum-size-warning Don'\''t warn about > objects with incompatible\n" > "enum sizes\n")); > @@ -625,6 +628,10 @@ PARSE_AND_LIST_ARGS_CASES=' >use_blx = 1; >break; > > +case OPTION_NO_USE_BLX: > + no_use_blx = 1; > + break; > + > case OPTION_VFP11_DENORM_FIX: >if (strcmp (optarg, "none") == 0) > vfp11_denorm_fix = BFD_ARM_VFP11_FIX_NONE; -- Richard Earnshaw Email: richard.earns...@arm.com Engineering Manager Phone: +44 1223 400569 (Direct + VoiceMail) OpenSource Tools Switchboard: +44 1223 400400 ARM Ltd Fax: +44 1223 400410 110 Fulbourn Rd Web: http://www.arm.com/ Cambridge, UK. CB1 9NJ -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: getting armv7 linker to emit armv4t thumb interworking
On 13/04/12 11:58, Mans Rullgard wrote: > On 13 April 2012 11:47, Richard Earnshaw wrote: >> On 12/04/12 20:10, Allen Martin wrote: >>> I have a cross toolchain I configured with "--with-arch=armv7-a >>> --with-cpu=cortex-a9 --with-tune=cortex-a9" and I want the linker to emit >>> armv4t compatible thumb interworking, but I can't seem to get it to. >>> >>> I noticed that if I create a armv4t toolchain with "--with-arch=armv4t >>> --with-cpu=arm7tdmi --with-tune=arm7tdmi" and then I pass "--use-blx" to >>> the linker it will emit armv7 thumb interworking. There doesn't seem to be >>> any inverse "--no-use-blx" type switch though. Is this a bug/limitation of >>> the linker or am I misunderstanding something? >>> >> >> it's all in the friendly manual :-) >> >> The option you need is --fix-v4bx. > > That option is for supporting pre-thumb cores, which is not necessary > here. Oops, misread the question. Sorry. To get v4t style interworking you need to ensure all your objects are built for v4t. The use of blx should only occur if the linker detects an object file in the input list that already contains support for blx (for example, because it was compiled with v5 or later). R. -- Richard Earnshaw Email: richard.earns...@arm.com Engineering Manager Phone: +44 1223 400569 (Direct + VoiceMail) OpenSource Tools Switchboard: +44 1223 400400 ARM Ltd Fax: +44 1223 400410 110 Fulbourn Rd Web: http://www.arm.com/ Cambridge, UK. CB1 9NJ -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: getting armv7 linker to emit armv4t thumb interworking
On 12/04/12 20:10, Allen Martin wrote: > I have a cross toolchain I configured with "--with-arch=armv7-a > --with-cpu=cortex-a9 --with-tune=cortex-a9" and I want the linker to emit > armv4t compatible thumb interworking, but I can't seem to get it to. > > I noticed that if I create a armv4t toolchain with "--with-arch=armv4t > --with-cpu=arm7tdmi --with-tune=arm7tdmi" and then I pass "--use-blx" to the > linker it will emit armv7 thumb interworking. There doesn't seem to be any > inverse "--no-use-blx" type switch though. Is this a bug/limitation of the > linker or am I misunderstanding something? > it's all in the friendly manual :-) The option you need is --fix-v4bx. BTW, taking a v7a toolchain and trying to use it to build v4 binaries is likely to be prolematic. For it to work you'll need to ensure that *all* your libraries are built to support back-conversion to v4, including those that are normally built as part of the toolchain (libgcc, etc). R. -- Richard Earnshaw Email: richard.earns...@arm.com Engineering Manager Phone: +44 1223 400569 (Direct + VoiceMail) OpenSource Tools Switchboard: +44 1223 400400 ARM Ltd Fax: +44 1223 400410 110 Fulbourn Rd Web: http://www.arm.com/ Cambridge, UK. CB1 9NJ -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Phone call (was Re: Armhf dynamic linker path)
On 12/04/12 19:29, Dennis Gilmore wrote: > > off topic but i find aarch64 weird and too generic is it arm alpha amd > atom. > That's only 'cos it's new. It's no different from names like ia64. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: New conference call numbers
Number is back up, but no host... On 5 Mar 2012, at 09:38, "Richard Earnshaw" wrote: > Call went silent on me. Redialing is giving number unobtainable :-( > > > > On 4 Mar 2012, at 22:35, "Michael Hope" wrote: > >> On Mon, Mar 5, 2012 at 11:31 AM, Peter Maydell >> wrote: >>> On 4 March 2012 22:21, Michael Hope wrote: >>>> The new conference call numbers have arrived. Let's use them for today's >>>> call. >>>> >>>> The toll free numbers are: >>>> >>>> * Australia: 1 800 804 786 >>>> * China: +400 120 05 90 >>>> * Germany: 01801 003 899 >>>> * New Zealand: 0800 452 947 >>>> * Sweden: 0200 125 588 >>>> * UK: 0845 351 2782 >>> >>> This isn't toll-free : UK 0845 numbers are traditionally "local rate". >>> Your wiki page correctly lists this as "lo-call". >>> >>>> * USA: 1-866-398-2885 >>>> >>>> Alternates, including cheaper (for Linaro) local numbers are: >>>> >>>> * International toll free - Germany: 0800 588 9170 >>>> * International toll free - UK: 0800 358 6385 >>> >>> ...and indeed this is the UK toll free number. >> >> Ta. Use whichever is affordable and best. >> >> -- Michael >> >> ___ >> linaro-toolchain mailing list >> linaro-toolchain@lists.linaro.org >> http://lists.linaro.org/mailman/listinfo/linaro-toolchain >> -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: New conference call numbers
Call went silent on me. Redialing is giving number unobtainable :-( On 4 Mar 2012, at 22:35, "Michael Hope" wrote: > On Mon, Mar 5, 2012 at 11:31 AM, Peter Maydell > wrote: >> On 4 March 2012 22:21, Michael Hope wrote: >>> The new conference call numbers have arrived. Let's use them for today's >>> call. >>> >>> The toll free numbers are: >>> >>> * Australia: 1 800 804 786 >>> * China: +400 120 05 90 >>> * Germany: 01801 003 899 >>> * New Zealand: 0800 452 947 >>> * Sweden: 0200 125 588 >>> * UK: 0845 351 2782 >> >> This isn't toll-free : UK 0845 numbers are traditionally "local rate". >> Your wiki page correctly lists this as "lo-call". >> >>> * USA: 1-866-398-2885 >>> >>> Alternates, including cheaper (for Linaro) local numbers are: >>> >>> * International toll free - Germany: 0800 588 9170 >>> * International toll free - UK: 0800 358 6385 >> >> ...and indeed this is the UK toll free number. > > Ta. Use whichever is affordable and best. > > -- Michael > > ___ > linaro-toolchain mailing list > linaro-toolchain@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/linaro-toolchain > -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: gcc: Thumb interworking and weakly linked functions
On 23/02/12 10:27, Aneesh V wrote: > Ok. Agree. I never used to use %function when I wrote assembly > functions earlier. I am sure a lot of code will break if this was > enforced. If you've not used %function on ARM, then your code is semantically broken even if it isn't syntactically broken. The ABI rules for dealing with interworking and and out-of-range branches all rely on %function being used correctly. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Operations on intrinsic types
On 06/01/12 02:17, Michael Hope wrote: > Hi Ramana. You were right about being able to do operations on > intrinsic types. Instead of doing the admittedly made up: > > int16x4_t foo2(int16x4_t a, int16x4_t b) > { > int16x4_t ca = vdup_n_s16(0.2126*256); > int16x4_t cb = vdup_n_s16(0.7152*256); > > return vadd_s16(vmul_s16(ca, a), vmul_s16(cb, b)); > } > > you can do: > > int16x4_t foo3(int16x4_t a, int16x4_t b) > { > int16x4_t ca = vdup_n_s16(0.2126*256); > int16x4_t cb = vdup_n_s16(0.7152*256); > > return ca*a + cb*b; > } > > which is more readable and, as an added bonus, generates the > multiply-and-accumulate that I missed when using intrinsics. Nice. This is a GCC extension. It's not portable, and in particular it's not supported by ARM's own compiler. There are also difficulties if you start doing operations directly when it comes to dealing with big-endian as there is a degree of divergence between GCC's own interpretation of vectors and the intrinsic view; mixing and matching can lead to subtle problems with lane numbering. -- Richard Earnshaw Email: richard.earns...@arm.com Engineering Manager Phone: +44 1223 400569 (Direct + VoiceMail) OpenSource Tools Switchboard: +44 1223 400400 ARM Ltd Fax: +44 1223 400410 110 Fulbourn Rd Web: http://www.arm.com/ Cambridge, UK. CB1 9NJ -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: How to fail an ARMv5T u-boot build when libgcc is ARMv7T2?
On 10 Nov 2011, at 23:47, "Loïc Minier" wrote: > On Thu, Nov 10, 2011, Richard Earnshaw wrote: >> You can't rely on finding it. Executable images are only required to >> have segment headers an d the attributes system uses section headers. >> Section headers can be removed when an image gets stripped. > > but if I have it during the u-boot build, it's reliable? I've no idea how uboot works in detail. I'm describing the global rules for elf images. > > If not, can I check the architecture for which libgcc was built > programatically? If it's a static library, the you could extract one of the object files and look at its attributes. If it's a shared lib, then the same restrictions would apply as to an executable. R. > > -- > Loïc Minier > -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: How to fail an ARMv5T u-boot build when libgcc is ARMv7T2?
You can't rely on finding it. Executable images are only required to have segment headers an d the attributes system uses section headers. Section headers can be removed when an image gets stripped. On 10 Nov 2011, at 16:46, "Loïc Minier" mailto:loic.min...@linaro.org>> wrote: On Thu, Nov 10, 2011, Richard Earnshaw wrote: The build attributes were never intended to be used in executables; the format is too expensive to decode in most situations. Instead the ABI specification included an optional PHEADER that had a highly simplified indication of the executable compatibility of an image (essentially the architecture). Unfortunately this has never been added to GNU Binutils. See PT_ARM_ARCHEXT in the ARM ELF specification (available from infocenter). As a poor man's test, would it be reliable to test Tag_CPU_name on the resulting u-boot ELF binary at the end of the build? If I read you correctly, you seem to suggest that it would be too fragile? -- Loïc Minier -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: How to fail an ARMv5T u-boot build when libgcc is ARMv7T2?
ch/arm/cpu/arm926ejs/versatile/libversatile.o arch/arm/lib/libarm.o common/libcommon.o disk/libdisk.o drivers/bios_emulator/libatibiosemu.o drivers/block/libblock.o drivers/dma/libdma.o drivers/fpga/ libfpga.o drivers/gpio/libgpio.o drivers/hwmon/libhwmon.o drivers/i2c/libi2c.o drivers/input/libinput.o drivers/misc/libmisc.o drivers/mmc/libmmc.o drivers/mtd/libmtd.o drivers/mtd/nand/libnand.o drivers/mtd/onenand/libonenand.o drivers/mtd/spi/libspi_flash.o drivers/mtd/ubi/libubi.o drivers/net/libnet.o drivers/net/phy/libphy.o drivers/pci/libpci.o drivers/pcmcia/libpcmcia.o drivers/power/libpower.o drivers/rtc/librtc.o drivers/serial/libserial.o drivers/spi/libspi.o drivers/twserial/libtws.o drivers/usb/eth/libusb_eth.o drivers/usb/gadget/libusb_gadget.o drivers/usb/host/libusb_host.o drivers/usb/musb/libusb_musb.o drivers/usb/phy/libusb_phy.o drivers/video/libvideo.o drivers/watchdog/libwatchdog.o fs/cramfs/libcramfs.o fs/ext2/libext2fs.o fs/fat/libfat.o fs/fdos/libfdos.o fs/jffs2/libjffs2.o fs/reiserfs/libreiserfs.o fs/ubifs/libubifs.o fs/yaffs2/libyaffs2.o lib/libfdt/libfdt.o lib/libgeneric.o lib/lzma/liblzma.o lib/lzo/liblzo.o lib/zlib/libz.o net/libnet.o post/libpost.o board/armltd/versatile/libversatile.o --end-group /home/lool/git/denx/u-boot/obj-v-broken/arch/arm/lib/eabi_compat.o -L /usr/lib/gcc/arm-linux-gnueabi/4.6.1 -lgcc -Map u-boot.map -o u-boot > > I verified that all the .o files passed above have Tag_CPU_name: > "5TE" in their arm-linux-gnueabi-readelf -A output; the only > problematic file is -lgcc. > > Note that the final link is done with arm-linux-gnueabi-ld and doesn't > set any architecture; I changed it manually to use gcc and pass the > -marm -march=armv5te, and had to set -nostdlib too when using gcc: > arm-linux-gnueabi-gcc -marm -march=armv5te -nostdlib -pie -T > /home/lool/git/denx/u-boot/obj-v-broken/u-boot.lds -Bstatic -Ttext 0x1 > $UNDEF_SYM arch/arm/cpu/arm926ejs/start.o -Wl,--start-group api/libapi.o > arch/arm/cpu/arm926ejs/libarm926ejs.o > arch/arm/cpu/arm926ejs/versatile/libversatile.o arch/arm/lib/libarm.o > common/libcommon.o disk/libdisk.o drivers/bios_emulator/libatibiosemu.o > drivers/block/libblock.o drivers/dma/libdma.o drivers/fpga/libfpga.o > drivers/gpio/libgpio.o drivers/hwmon/libhwmon.o drivers/i2c/libi2c.o > drivers/input/libinput.o drivers/misc/libmisc.o drivers/mmc/libmmc.o > drivers/mtd/libmtd.o drivers/mtd/nand/libnand.o > drivers/mtd/onenand/libonenand.o drivers/mtd/spi/libspi_flash.o > drivers/mtd/ubi/libubi.o drivers/net/libnet.o drivers/net/phy/libphy.o > drivers/pci/libpci.o drivers/pcmcia/libpcmcia.o drivers/power/libpower.o > drivers/rtc/librtc.o drivers/serial/libserial.o drivers/spi/libspi.o > drivers/twserial/libtws.o drivers/usb/eth/libusb_eth.o drivers/usb/ gadget/libusb_gadget.o drivers/usb/host/libusb_host.o drivers/usb/musb/libusb_musb.o drivers/usb/phy/libusb_phy.o drivers/video/libvideo.o drivers/watchdog/libwatchdog.o fs/cramfs/libcramfs.o fs/ext2/libext2fs.o fs/fat/libfat.o fs/fdos/libfdos.o fs/jffs2/libjffs2.o fs/reiserfs/libreiserfs.o fs/ubifs/libubifs.o fs/yaffs2/libyaffs2.o lib/libfdt/libfdt.o lib/libgeneric.o lib/lzma/liblzma.o lib/lzo/liblzo.o lib/zlib/libz.o net/libnet.o post/libpost.o board/armltd/versatile/libversatile.o -Wl,--end-group /home/lool/git/denx/u-boot/obj-v-broken/arch/arm/lib/eabi_compat.o -L /usr/lib/gcc/arm-linux-gnueabi/4.6.1 -lgcc -Wl,-Map u-boot.map -o u-boot > > But this command works and produces an u-boot ELF which has > Tag_CPU_name: "7-A". > The build attributes were never intended to be used in executables; the format is too expensive to decode in most situations. Instead the ABI specification included an optional PHEADER that had a highly simplified indication of the executable compatibility of an image (essentially the architecture). Unfortunately this has never been added to GNU Binutils. See PT_ARM_ARCHEXT in the ARM ELF specification (available from infocenter). R. > How would I break the build when libgcc isn't ARMv5T? > >Thanks, -- Richard Earnshaw Email: richard.earns...@arm.com Engineering Manager Phone: +44 1223 400569 (Direct + VoiceMail) OpenSource Tools Switchboard: +44 1223 400400 ARM Ltd Fax: +44 1223 400410 110 Fulbourn Rd Web: http://www.arm.com/ Cambridge, UK. CB1 9NJ -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: eglibc and fun with config.sub
On 16/09/11 15:12, Ulrich Weigand wrote: > Richard Sandiford wrote: >> David Gilbert writes: >>> My current patch: >>> * adds armv6 and armv7 to config.sub >>> * adds arm/eabi/armv7 and arm/eabi/armv6t2 and one assembler >>> routine in there. >>> * If $machine is just 'arm' then it autodetects from gcc's #defines >>> * else if $machine is armv then that's still $machine >> >> I'm taking you literally here, but I think you want things like >> armeb-linux-gnueabi to be treated like arm-linux-gnueabi. >> >> TBH, I think unconditionally using the autodetect (but setting $machine >> rather than $submachine, as you say) would be easier and more consistent >> across packages. gcc and eglibc will then agree on the target, whereas >> the extra complication in the current scheme is there simply to make >> eglibc and gcc disagree in certain cases. But I realise we might not >> want to fight that fight. > > FWIW I'd tend to agree that encoding the architecture level into the > target triplet seems to lead to more confusion than that it helps ... > Agreed. Especially as 1) It's incomplete -- doesn't cover FPU/Neon capabilities 2) It's generally mixed up with other things -- like endianness. This makes it all the more confusing 3) It's too ad-hoc -- sometimes it's an architecture, sometimes it's a CPU. This leads to all sorts of weird and wonderful names appearing in the config string which have to be understood. > We certainly never did that on s390 (or powerpc, for that matter); > instead, the way to select an architecture level is to build your > system compiler to default to that level, and then build all your > system libraries with that compiler; the libraries will detect > the desired architecture level from GCC defines. I think it comes from the i[345]86 mess. I suspect it has origins in red-hat linux where the config triplet is built directly from the 'cpu' reported by the kernel uname -m component. > > On the other hand, given that arm has already gone down the road of > using the target triplet, I guess I can see why it might make sense > to continue that. In the end, that's for the platform maintainers > to decide ... > I'd prefer that we retrenched if possible. Let's not make the rat-hole even bigger. R. > > Mit freundlichen Gruessen / Best Regards > > Ulrich Weigand > > -- > Dr. Ulrich Weigand | Phone: +49-7031/16-3727 > STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E. > IBM Deutschland Research & Development GmbH > Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk > Wittkopp > Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht > Stuttgart, HRB 243294 > > > ___ > linaro-toolchain mailing list > linaro-toolchain@lists.linaro.org > http://lists.linaro.org/mailman/listinfo/linaro-toolchain > ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: A question about disabling -gtoggle in bootstrap run
On Fri, 2011-03-04 at 15:33 +0200, Revital1 Eres wrote: > Hello, > > I am looking for a way to disable '-gtoggle' flag in the run of stage 2 in > bootstrap; when > configuring ARM with (*). > The flag seems to be applied in stage 2 but not in stage 3 which seems to > cause bootstrap failure when > testing SMS as in stage 2 SMS fails because of debug_insn caused > by -gtoggle disturbing do-loop; while in stage 3 SMS succeeds; resulting > in different .o files and bootsrtrap failure. > > (*) This the configure I used: > ../gcc/configure --prefix=/home/eres/mainline/build --enable-checking > --enable-languages=c --enable-bootstrap By definition that's regarded as a bug in the compiler. Code generated with debug enabled is required to be identical to code generated without it. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Perfromance Test Results using gcc-linaro-4.5-2010.11-1
On Thu, 2010-12-09 at 13:31 +1300, Michael Hope wrote: > On Wed, Dec 8, 2010 at 7:10 PM, Michael Hope wrote: > -fno-common also gives a small improvement in various benchmarks, but > may break some programs. Any breakage with -fno-common would be detected at link time, so if your program compiles successfully it should run successfully. The breakage is that some non-strictly conforming programs that declare variables tentatively with, for example: int x; in multiple translation units will get multiple-definition errors at link time. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Using inline NEON code
On Fri, 2010-12-03 at 10:49 +1300, Michael Hope wrote: > Hi there. Currently you can't use NEON instructions in inline > assembly if the compiler is set to -mfpu=vfp such as Ubuntu's > -mfpu=vfpv3-d16. Trying code like this: > > int main() > { >asm("veor d1, d2, d3"); >return 0; > } > > gives an error message like: > > test.s: Assembler messages: > test.s:29: Error: selected processor does not support Thumb mode `veor > d1,d2,d3' > > The problem is that -mfpu=vfpv3-d16 has two jobs: it tells the > compiler what instructions to use, and also tells the assembler what > instructions are valid. We might want the compiler to use the VFP for > compatibility or power reasons, but still be able to use NEON > instructions in inline assembler without passing extra flags. Sorry, I disagree with that distinction. > > Inserting ".fpu neon" to the start of the inline assembly fixes the > problem. Is this valid? Not really. It changes the global setting for the rest of the file and changes the global attributes on your object. > Are assembly files with multiple .fpu > statements allowed? Passing '-Wa,-mfpu=neon' to GCC doesn't work as > gas seems to ignore the second -mfpu. > > What's the best way to handle this? Some options are: > * Add '.fpu neon' directives to the start of any inline assembly Nope, .fpu neon is used in building the attributes data. The attribute architecture in the ABI supports attributes on sections or symbols, it does not support them on arbitrary snippets of code. The point of attributes is to reason about compatibility and users intentions. You can't do that if the attributes data is messed up. Currently BFD/GAS/GOLD cannot generate/interpret attributes on sub-elements of object files; they can only work on file-scope attributes. That's a bug, but it's not really relevant to this discussion. > * Separate out the features, so you can specify the capabilities with > one option and restrict the compiler to a subset with another. > Something like '-mfpu=neon -mfpu-tune=vfpv3-d16' That'll just confuse users, IMO. Also, see below > * Relax the assembler so that any instructions are accepted. We'd > lose some checking of GCC's output though. Trust me, you just don't want to do that (see above). You've missed the right answer though: make the compiler smarter. The compiler should be able to work out when it's beneficial to use neon and when it's not: adding hooks to try and defeat a poor choice by the compiler sounds like bad design to me. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Help on merging two patches
On Wed, 2010-11-03 at 17:39 +0800, Yao Qi wrote: > Hi, > I am backporint some patches from FSF mainline, which may improve Linaro > 4.5 gcc on thumb2 speed. > > The first one is done by Richard E. "Improve optimization to transform > TST into LSLS" > http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02518.html > After it applied to Linaro 4.5 tree, EEMBC speed number downgrades, > while code size is reduced to some extent. The code difference is like > this, > > 6801ldr r1, [r0, #0] > f831 3013 ldrh.w r3, [r1, r3, lsl #1] > -f413 6f00tst.w r3, #2048 ; 0x800 > -f43f af41beq.w cc > +0518 lslsr0, r3, #20 > +f57f af44bpl.w cc > 4610mov r0, r2 > > After reading cortex-a8 TRM, I can't find exact timing cycles of lsls. > Under Chung-Lin's help, we feel that lsls should be slower than tst, but > don't have any evidence to prove. If any people is familiar with arm > microarch, help is welcome. If our assumption is correct, we may can > change this patch to an optimization specific to size only. > > The second patch is Bernd's "Fix an if statement in arm_rtx_costs_1" > http://gcc.gnu.org/ml/gcc-patches/2010-07/msg02096.html > After this patch applied, EEMBC benchmark number is not changed. Shall > we merge this patch to linaro 4.5 tree? I am inclined to merge it, but > if you have concerns on this patch, let us discuss here. So I have no reason to expect lsls to ever take longer to execute than tst. I suspect what you are seeing here is some unfortunate side effect that can't be explained from the small code snippet. An example would include BTAC aliasing, but there could be other reasons for this happening. So overall, I'd expect the change to be a Good Thing (tm), but there's always the chance that individual blocks of code may run more slowly. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Host strip corrupts cross-built binaries
On Wed, 2010-09-22 at 08:36 -0700, Mark Mitchell wrote: > On 9/22/2010 8:34 AM, Loïc Minier wrote: > > > Which component is to blame here? Are we looking at a binutils or a > > gcc bug for not being able to set or read enough data that the > > architecture mismatch isn't detected? What could we do about it? > > This is definitely a binutils bug. So I agree that it's a bug in binutils that this produces incorrect output. I don't think it's necessarily a bug that a generic ELF strip program is unable to strip all ARM ELF binary files. This is particularly true for unlinked object files that can contain additional sections that refer to the symbol section. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Linaro GCC 4.4 and 4.5 2010.09 released
On Mon, 2010-09-20 at 09:14 +0100, Dave Martin wrote: > On Fri, Sep 17, 2010 at 11:31 PM, Loïc Minier wrote: > > On Wed, Sep 15, 2010, Michael Hope wrote: > >> GLIBC a mechanism for picking the best routines to use based on the > >> CPU capabilities. This means that GLIBC can include A8 and A9 > >> versions both with and without NEON, Ubuntu can ship all of these > >> versions, and the dynamic linker can choose the best one based on the > >> chip it is running on. > > > > Actually I understand STT_GNU_IFUNC would allow that, we just lack a > > good test > > Is STT_GNU_IFUNC implemented yet? > No. We need to sort out the ABI specs first. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Thumb2 code size improvements
On Fri, 2010-09-17 at 18:21 +0800, Yao Qi wrote: > Michael Hope wrote: > > It's only part of the puzzle, but I run speed benchmarks as part of > > the continious build: > > http://ex.seabright.co.nz/helpers/buildlog > > http://ex.seabright.co.nz/helpers/benchcompare > > > > http://ex.seabright.co.nz/build/gcc-linaro-4.5-2010.09-1/logs/armv7l-maverick-cbuild4-pavo4/pybench-test.txt > > > > I've just modified this to build different variants as well. ffmpeg > > now builds as supplied (-O2 and others), with -Os, with hand-written > > assembler turned off, and with -mfpu=neon. corebench builds in -O2 > > and -Os. > > Here are some options we may have to use in our benchmarks, > {-Os,-O2} -fno-common -mthumb --mfloat-abi={hard,soft} -mfpu=neon > > IIRC, hardfp will increase the code size to some extent. hard-float should show a significant code size saving over pure soft-float for anything with floating point code as the compiler will be able to use single instructions for many operations rather than library calls. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Thumb2 code size improvements
On Tue, 2010-09-07 at 12:24 +0100, Julian Brown wrote: > On Tue, 7 Sep 2010 12:55:59 +0200 > Loïc Minier wrote: > > > On Tue, Sep 07, 2010, Julian Brown wrote: > > > Do > > > you still have the code fragment handy (I don't remember exactly how > > > it went)? > > > > You can extract it from the wiki history with the "Info" action on > > the page and then diffing revisions: > > Oh right, I should have realised that :-). > > > 1. stmdb/ldmia registers that are not used > > * Observations > > {{{ > > Dump of assembler code for function history_expand_line_internal: > >0x1c1c <+0>: stmdb sp!, {r4, r5, r6, r7, r8, lr} > > This could be: > > push {r3, r4, r5, r6, r7, lr} > > >0x1c20 <+4>: movs r1, #0 > >0x1c22 <+6>: ldr r5, [pc, #52] ; (0x1c58 > > ) 0x1c24 <+8>: mov r2, r1 > >0x1c26 <+10>: mov r6, r0 > >0x1c28 <+12>: ldr r7, [r5, #0] > >0x1c2a <+14>: str r1, [r5, #0] > >0x1c2c <+16>: bl 0x1c2c > >0x1c30 <+20>: str r7, [r5, #0] > >0x1c32 <+22>: cmp r0, r6 > >0x1c34 <+24>: mov r4, r0 > >0x1c36 <+26>: bne.n 0x1c52 > >0x1c38 <+28>: bl 0x1c38 > >0x1c3c <+32>: ldr r1, [pc, #28] ; (0x1c5c > > ) 0x1c3e <+34>: movw r2, #1850 ; > > 0x73a 0x1c42 <+38>: adds r0, #1 > >0x1c44 <+40>: bl 0x1c44 > >0x1c48 <+44>: mov r1, r4 > >0x1c4a <+46>: ldmia.w sp!, {r4, r5, r6, r7, r8, lr} > > This must remain a wide instruction... > > ldmia.w sp!, {r3, r4, r5, r6, r7, lr} > > >0x1c4e <+50>: b.w 0x1c4e > >0x1c52 <+54>: ldmia.w sp!, {r4, r5, r6, r7, r8, pc} > > But this could be: > > pop {r3, r4, r5, r6, r7, pc} > > >0x1c56 <+58>: nop > >0x1c58 <+60>: andeq r0, r0, r0 > >0x1c5c <+64>: andeq r0, r0, r0 > > }}} > > Register r8 is not used in this function, so no need to save/restore > > r8. > > * Possible improvements > > So yeah, I think there is indeed a possible improvement here (and we > don't even need to break the EABI, I don't think). Unless I've > overlooked something, anyway... > GCC 4.5 should already do this: 2009-06-02 Richard Earnshaw * arm.c (arm_get_frame_offsets): Prefer using r3 for padding a push/pop multiple to 8-byte alignment. R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Thumb2 code size improvements
On Tue, 2010-09-07 at 13:09 +0100, Andrew Stubbs wrote: > On 07/09/10 13:01, Yao Qi wrote: > >> * Investigate reduced alignment constraints? > > > > Any details on this? > > No, I just know that some targets like to align functions to > cache-lines. This is a useful speed optimization, but does lead to lots > of "blank" gaps in the code. I have no real idea if ARM does this kind > of thing, or if the ABI has anything to say about it. > > I just suggest that we should check it out - or at least ask an ARM > expert if I'm talking nonsense. :) I'm pretty certain we don't do this with gratuitously -Os on ARM. We may, however, align some thumb functions to a 32-bit boundary unnecessarily (still needed if there's a literal pool). R. ___ linaro-toolchain mailing list linaro-toolchain@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-toolchain