On Wed, Feb 17, 2016 at 10:20 AM, Christophe Lyon <christophe.l...@linaro.org> wrote: > On 17 February 2016 at 11:05, Kyrill Tkachov > <kyrylo.tkac...@foss.arm.com> wrote: >> >> On 17/02/16 10:03, Christophe Lyon wrote: >>> >>> On 15 February 2016 at 12:32, Kyrill Tkachov >>> <kyrylo.tkac...@foss.arm.com> wrote: >>>> >>>> On 04/02/16 08:58, Ramana Radhakrishnan wrote: >>>>> >>>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wil...@linaro.org> >>>>> wrote: >>>>>> >>>>>> This is my suggested fix for PR 65932, which is a linux kernel >>>>>> miscompile with gcc-5.1. >>>>>> >>>>>> The problem here is caused by a chain of events. The first is that >>>>>> the relatively new eipa_sra pass creates fake parameters that behave >>>>>> slightly differently than normal parameters. The second is that the >>>>>> optimizer creates phi nodes that copy local variables to fake >>>>>> parameters and/or vice versa. The third is that the ouf-of-ssa pass >>>>>> assumes that it can emit simple move instructions for these phi nodes. >>>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that >>>>>> forces QImode and HImode to unsigned, but a >>>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and >>>>>> short parameters have different in register representations than local >>>>>> variables, and require a conversion when copying between them, a >>>>>> conversion that the out-of-ssa pass can't easily emit. >>>>>> >>>>>> Ultimately, I think this is a problem in the arm backend. It should >>>>>> not have a PROMOTE_MODE macro that is changing the sign of char and >>>>>> short local variables. I also think that we should merge the >>>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to >>>>>> prevent this from happening again. >>>>>> >>>>>> I see four general problems with the current ARM PROMOTE_MODE >>>>>> definition. >>>>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb >>>>>> instruction was added. It is a lose for armv6 and later. >>>>>> 2) Unsigned short was only faster for targets that don't support >>>>>> unaligned accesses. Support for these targets was removed a while >>>>>> ago, and this PROMODE_MODE hunk should have been removed at the same >>>>>> time. It was accidentally left behind. >>>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was >>>>>> converted to a function, the PROMOTE_MODE code was copied without the >>>>>> UNSIGNEDP changes. Thus it is only an accident that >>>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree. Changing >>>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE >>>>>> changes to resolve the difference are safe. >>>>>> 4) There is a general principle that you should only change signedness >>>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results >>>>>> in extra conversion instructions that make code slower. The mips64 >>>>>> hardware for instance requires that 32-bit values be sign-extended >>>>>> regardless of type, and instructions may trap if this is not true. >>>>>> However, it has a set of 32-bit instructions that operate on these >>>>>> values, and hence no conversions are required. There is no similar >>>>>> case on ARM. Thus the conversions are unnecessary and unwise. This >>>>>> can be seen in the testcases where gcc emits both a zero-extend and a >>>>>> sign-extend inside a loop, as the sign-extend is required for a >>>>>> compare, and the zero-extend is required by PROMOTE_MODE. >>>>> >>>>> Given Kyrill's testing with the patch and the reasonably detailed >>>>> check of the effects of code generation changes - The arm.h hunk is ok >>>>> - I do think we should make this explicit in the documentation that >>>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and >>>>> better still maybe put in a checking assert for the same in the >>>>> mid-end but that could be the subject of a follow-up patch. >>>>> >>>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of >>>>> the testsuite fallout separately. >>>> >>>> Hi all, >>>> >>>> I'd like to backport the arm.h from this ( r233130) to the GCC 5 >>>> branch. As the CSE patch from my series had some fallout on x86_64 >>>> due to a deficiency in the AVX patterns that is too invasive to fix >>>> at this stage (and presumably backport), I'd like to just backport >>>> this arm.h fix and adjust the tests to XFAIL the fallout that comes >>>> with not applying the CSE patch. The attached patch does that. >>>> >>>> The code quality fallout on code outside the testsuite is not >>>> that gread. The SPEC benchmarks are not affected by not applying >>>> the CSE change, and only a single sequence in a popular embedded >>>> benchmark >>>> shows some degradation for -mtune=cortex-a9 in the same way as the >>>> wmul-1.c and wmul-2.c tests. >>>> >>>> I think that's a fair tradeoff for fixing the wrong code bug on that >>>> branch. >>>> >>>> Ok to backport r233130 and the attached testsuite patch to the GCC 5 >>>> branch? >>>> >>>> Thanks, >>>> Kyrill >>>> >>>> 2016-02-15 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>>> >>>> PR target/65932 >>>> * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options. >>>> xfail the scan-assembler test. >>>> * gcc.target/arm/wmul-2.c: Likewise. >>>> * gcc.target/arm/wmul-3.c: Simplify test to generate a single >>>> smulbb. >>>> >>>> >>> Hi Kyrill, >>> >>> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC >>> configuration to: >>> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16 >>> (target arm-none-linux-gnueabihf) >>> >>> The generated code is: >>> sxth r0, r0 >>> sxth r1, r1 >>> mul r0, r1, r0 >>> instead of >>> smulbb r0, r1, r0 >>> on trunk. >>> >>> I guess we don't worry? >> >> >> Hi Christophe, >> Hmmm, I suspect we might want to backport >> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html >> to fix backend the costing logic of smulbb. >> Could you please try that patch to see if it helps? >> > > Ha indeed, with the attached patch, we now generate smulbb. > I didn't run a full make check though. > > OK with a suitable ChangeLog entry?
Ok with a suitable Changelog entry and if no regressions. Ramana > > Christophe. > >> Thanks, >> Kyrill >> >> >>>> >>>>> regards >>>>> Ramana >>>>> >>>>> >>>>> >>>>> >>>>>> My change was tested with an arm bootstrap, make check, and SPEC >>>>>> CPU2000 run. The original poster verified that this gives a linux >>>>>> kernel that boots correctly. >>>>>> >>>>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail. These >>>>>> are tests to verify that smulbb and/or smlabb are generated. >>>>>> Eliminating the unnecessary sign conversions causes us to get better >>>>>> code that doesn't include the smulbb and smlabb instructions. I had >>>>>> to modify the testcases to get them to emit the desired instructions. >>>>>> With the testcase changes there are no additional testsuite failures, >>>>>> though I'm concerned that these testcases with the changes may be >>>>>> fragile, and future changes may break them again. >>>>> >>>>> >>>>> >>>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla, >>>>>> then maybe we should try to add new patterns to get the instructions >>>>>> emitted again for the unmodified testcases. >>>>>> >>>>>> Jim >>>> >>>> >>