Re: Target attribute hooks questions
Hi Kyrill,

You are right, it's not easy to find one's way among all those macros. My main source of inspiration for ARM was the x86 implementation. You can have a look at the ARM implementation to start with (on gcc-patches, under review). It would be best not to diverge too much; aarch64 might have some code to share with the arm back end.

FYI, I'm planning to add the fpu/neon attribute extensions.

A few quick answers below, ask if you need more.

Cheers
Christian

On 05/05/2015 03:38 PM, Kyrill Tkachov wrote:
> Hi all,
>
> I'm looking at implementing target attributes for aarch64 and I have some
> questions about the hooks involved.
> I haven't looked at this part of the compiler before, so forgive me if some
> of them seem obvious. I couldn't figure it out from the documentation
> (https://gcc.gnu.org/onlinedocs/gccint/Target-Attributes.html#Target-Attributes)
>
> * Seems to me that TARGET_OPTION_VALID_ATTRIBUTE_P is the most important one
> that parses the string inside the __attribute__ ((target ("..."))) and sets
> the target-specific flags appropriately. Is that correct?

Yes, it parses the string that goes into DECL_FUNCTION_SPECIFIC_TARGET (fndecl) and then builds the struct gcc_options that will be switched between functions. Note that this must go through the option_override machinery again, since global options can be affected by the target options.

>
> * What is TARGET_ATTRIBUTE_TABLE used for? It's supposed to map attributes to
> handlers?
> Isn't that what TARGET_OPTION_VALID_ATTRIBUTE_P is for?

I think it's different. TARGET_ATTRIBUTE_TABLE specifies specific attributes (e.g. naked, interrupt, ...) while the target attribute allows passing target flags (e.g. -marm, -mfpu=neon, ...).

>
> * What is the use of TARGET_OPTION_SAVE and TARGET_OPTION_RESTORE? Is that
> used during something like LTO when different object files and functions are
> compiled with different flags? Are these functions just supposed to 'backup'
> various tuning and ISA decisions?
>

This is to save custom function information that is not restored by TARGET_SET_CURRENT_FUNCTION. I didn't need it for arm/thumb.

> * Is TARGET_COMP_TYPE_ATTRIBUTES the one that's supposed to handle
> incompatible attributes being specified? (for example incompatible endianness
> or architecture levels)?

Like TARGET_ATTRIBUTE_TABLE, it's different and doesn't pertain to the target attribute.

Cheers
Christian

>
> Thanks for any insight,
> Kyrill
>
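To make the parsing step described above a little more concrete, here is a minimal sketch in C of what "parse the attribute string and build a per-function option set" could look like. It is only a toy model: `struct toy_target_opts` and `toy_parse_target_attr` are made-up names, the real hook works on `struct gcc_options` and goes through the option override machinery, and the "fpu=" token is an assumption based on the extension mentioned as planned, not an existing syntax.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for the per-function option
   state; GCC's real struct gcc_options is far larger.  */
struct toy_target_opts {
  bool thumb;     /* models -mthumb (true) versus -marm (false) */
  char fpu[16];   /* models -mfpu=<name>; empty when unset */
};

/* Parse a comma-separated attribute string such as "thumb,fpu=neon"
   into *opts.  Returns false on an unknown or overlong token and
   leaves *opts untouched in that case.  */
bool toy_parse_target_attr(const char *str, struct toy_target_opts *opts)
{
  struct toy_target_opts tmp = *opts;   /* work on a copy */
  char buf[64];
  size_t len = strlen(str);

  if (len >= sizeof buf)
    return false;
  memcpy(buf, str, len + 1);            /* strtok modifies its input */

  for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
    if (strcmp(tok, "thumb") == 0) {
      tmp.thumb = true;
    } else if (strcmp(tok, "arm") == 0) {
      tmp.thumb = false;
    } else if (strncmp(tok, "fpu=", 4) == 0) {
      const char *name = tok + 4;
      if (*name == '\0' || strlen(name) >= sizeof tmp.fpu)
        return false;
      strcpy(tmp.fpu, name);
    } else {
      return false;                     /* unknown token: reject */
    }
  }

  *opts = tmp;                          /* commit only on success */
  return true;
}
```

Working on a copy and committing only on success mirrors the idea that a rejected attribute should not leave the function with half-applied flags.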
Re: interest for ARM/thumb multiversioning ?
To clarify, my use case was slightly different from the x86 one, which requires a runtime cpu-check builtin. I was more focused on a link-time problem (so we don't even need to go through a function pointer).

Christian

On 04/30/2015 08:45 AM, Christian Bruel wrote:
>
>
> On 04/29/2015 05:36 PM, Ramana Radhakrishnan wrote:
>>
>>
>> On 29/04/2015 09:24, Christian Bruel wrote:
>>> Hi Ramana, Richard
>>>
>>> After playing with the attribute ((target ("[thumb,arm]"))), during the
>>> pending review, I added the "default" selector to neutralize
>>> -mflip-thumb for the setjmp/longjmp based tests.
>>>
>>> I was wondering if there would be an interest in leveraging this to
>>> implement multiprocessing, like on the x86 ?
>>>
>>
>> You mean multiversioning ? How would the dispatcher work in this case ?
>
> not sure what you mean, the function's name will need to be mangled with
> the target specialization. The dispatching would be made based on the
> caller mode.
>
> Could it be also a direction to help LTOization with the proper FPU
> flags (follow bz target/65837) given at link time.
>
> My concern is that this is limited to C++ for x86. I haven't checked the
> details, just ideas.
>
> Cheers
>
> Christian
>
>> Ramana
>>
>>> something that would allow (from the x86 doc)
>>>
>>> __attribute__ ((target ("default")))
>>> int foo ()
>>> {
>>>    asm("...");
>>>    return 0;
>>> }
>>>
>>> __attribute__ ((target ("thumb")))
>>> int foo ()
>>> {
>>>    asm("...");
>>>    return 0;
>>> }
>>>
>>> int main ()
>>> {
>>>    int (*p)() = &foo;
>>>    assert ((*p) () == foo ());
>>>    return 0;
>>> }
>>>
>>> I had initially not planned to do it, but this is a simple extension of
>>> the attribute target, if someone finds a use for this I can implement it
>>> on the fly.
>>>
>>> Best Regards,
>>>
>>> Christian
>>>
Re: interest for ARM/thumb multiversioning ?
On 04/29/2015 05:36 PM, Ramana Radhakrishnan wrote:
>
>
> On 29/04/2015 09:24, Christian Bruel wrote:
>> Hi Ramana, Richard
>>
>> After playing with the attribute ((target ("[thumb,arm]"))), during the
>> pending review, I added the "default" selector to neutralize
>> -mflip-thumb for the setjmp/longjmp based tests.
>>
>> I was wondering if there would be an interest in leveraging this to
>> implement multiprocessing, like on the x86 ?
>>
>
> You mean multiversioning ? How would the dispatcher work in this case ?

Not sure what you mean. The function's name will need to be mangled with the target specialization. The dispatching would be made based on the caller mode.

Could it also be a direction to help LTOization with the proper FPU flags (follow bz target/65837) given at link time ?

My concern is that this is limited to C++ for x86. I haven't checked the details, these are just ideas.

Cheers

Christian

> Ramana
>
>> something that would allow (from the x86 doc)
>>
>> __attribute__ ((target ("default")))
>> int foo ()
>> {
>>    asm("...");
>>    return 0;
>> }
>>
>> __attribute__ ((target ("thumb")))
>> int foo ()
>> {
>>    asm("...");
>>    return 0;
>> }
>>
>> int main ()
>> {
>>    int (*p)() = &foo;
>>    assert ((*p) () == foo ());
>>    return 0;
>> }
>>
>> I had initially not planned to do it, but this is a simple extension of
>> the attribute target, if someone finds a use for this I can implement it
>> on the fly.
>>
>> Best Regards,
>>
>> Christian
>>
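As a rough illustration of "mangle the name with the target specialization and dispatch on the caller mode", here is a small sketch in plain C. Everything in it is hypothetical: the `_default` / `_thumb` suffixes stand in for whatever mangling would be chosen, `enum toy_mode` stands in for the caller's ARM/Thumb state, and the two versions return different values only so that the selection is observable. A real implementation could resolve this at compile or link time rather than through a function pointer.

```c
/* Hypothetical model of the caller's instruction-set mode.  */
enum toy_mode { TOY_MODE_ARM, TOY_MODE_THUMB };

/* Two versions of "foo" with made-up mangled names: base name plus a
   suffix naming the target specialization.  */
static int foo_default(void) { return 0; }
static int foo_thumb(void)   { return 1; }

typedef int (*foo_fn)(void);

/* Select a version from the caller's mode; fall back to the default
   version when no specialization exists for that mode.  */
foo_fn toy_resolve_foo(enum toy_mode caller_mode)
{
  switch (caller_mode) {
  case TOY_MODE_THUMB:
    return foo_thumb;
  default:
    return foo_default;
  }
}
```

A caller in Thumb mode would then end up in `foo_thumb`, and any other caller in `foo_default`.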
interest for ARM/thumb multiversioning ?
Hi Ramana, Richard

After playing with the attribute ((target ("[thumb,arm]"))), during the pending review, I added the "default" selector to neutralize -mflip-thumb for the setjmp/longjmp based tests.

I was wondering if there would be an interest in leveraging this to implement multiprocessing, like on the x86 ?

something that would allow (from the x86 doc)

__attribute__ ((target ("default")))
int foo ()
{
   asm("...");
   return 0;
}

__attribute__ ((target ("thumb")))
int foo ()
{
   asm("...");
   return 0;
}

int main ()
{
   int (*p)() = &foo;
   assert ((*p) () == foo ());
   return 0;
}

I had initially not planned to do it, but this is a simple extension of the attribute target. If someone finds a use for this I can implement it on the fly.

Best Regards,

Christian
Re: GCC 5 Status Report (2015-01-08), Stage 4 to start soon
Hi Ramana,

Any chance of getting the attribute target support for ARM reviewed in time for stage 4 ?

Many thanks
Christian

On 01/08/2015 11:32 AM, Jakub Jelinek wrote:
> The trunk is still in Stage 3 now, which means it is open for general
> bugfixing, but will enter Stage 4 on Friday, 16th, end of day (timezone
> of your preference). Once that happens, only wrong-code fixes, regression
> bugfixes and documentation fixes will be allowed, as is normal for our
> release branches too. There are still a few patches that have been posted
> during Stage 1, please get them committed into trunk before Stage 4 starts.
>
> Still misleading quality data below - some P3 bugs have not been
> re-prioritized.
>
> Quality Data
>
> Priority      #   Change from last report
> --------    ---   -----------------------
> P1           39   + 24
> P2           98   + 15
> P3           48   - 84
> --------    ---   -----------------------
> Total       185   - 45
>
> Previous Report
> ===============
>
> https://gcc.gnu.org/ml/gcc/2014-11/msg00249.html
Re: MULTILIB_OPTIONS and DRIVER_SELF_SPEC
On 05/11/2012 03:16 PM, Paulo J. Matos wrote:
> Hi,
>
> MULTILIB_OPTIONS containing options defined in DRIVER_SELF_SPEC seemed
> to be fine in GCC46 but fail in GCC47.
>
> For example, I have:
> xap.h:
> #define DRIVER_SELF_SPECS \
> "%{help:-v} % "%{mno-args-span-regs-and-mem:-mno-split-args}
> % "%{mno-inline-block-copy-mode:-mno-block-copy}
> % "%{mpu:-mno-block-copy -mfunction-ptr-pi} %
> t-xap:
> MULTILIB_OPTIONS= msmall-mode/mpu
>
> However, while building GCC I get that xgcc cannot understand -mpu:
> Running configure in multilib subdir mpu
> pwd:
> /home/pm18/p4ws/pm18_binutils/bc/main/result/linux/intermediate/FirmwareGcc47Package/xap-local-xap
> mkdir mpu
> configure: creating cache ./config.cache
> checking build system type... i686-pc-linux-gnu
> checking host system type... xap-local-xap
> checking for --enable-version-specific-runtime-libs... no
> checking for a BSD-compatible install... /usr/bin/install -c
> checking for gawk... gawk
> xgcc: error: unrecognized command line option '-mpu'
>
>
> What happened in GCC47 for this to occur?

Options that are not explicitly declared to the compiler before being used in spec rules are now rejected. So you probably need to declare it in your target option file (something like xap.opt).

Cheers

Christian

>
> Cheers,
>
Re: gcc doesn't accept specs options anymore
On 05/07/2012 03:11 PM, Christian Bruel wrote:
>
>
>> What about a generic name such as -fextension- (or both -fextension- and
>> -mextension-) for options that GCC itself will ignore, if -mbsp= is
>> considered inappropriate? I'd prefer that to delimiting such options with
>> --start-specs and --end-specs.
>>
>
> you mean, gcc would ignore options in the -fextension string ?. For
> instance an invocation would be
>
> gcc -specs=board.spec -foo -fextension-foo ?

As a matter of fact,

gcc -specs=board.spec -fextension-foo

could be enough.

>
> instead of
>
> gcc -specs=board.spec --start-specs -foo --end-specs
>
> OK, both allow to fix the problem, with a minor backward compatibility
> issue for our BSP integrator that could be handled easily.
>
> If we agree, are you going to propose something ?, or do you want to
> wait for me to propose a patch ?
>
> Many thanks,
>
> Christian
Re: gcc doesn't accept specs options anymore
> What about a generic name such as -fextension- (or both -fextension- and
> -mextension-) for options that GCC itself will ignore, if -mbsp= is
> considered inappropriate? I'd prefer that to delimiting such options with
> --start-specs and --end-specs.
>

You mean, gcc would ignore options in the -fextension string ? For instance an invocation would be

gcc -specs=board.spec -foo -fextension-foo ?

instead of

gcc -specs=board.spec --start-specs -foo --end-specs

OK, both allow fixing the problem, with a minor backward-compatibility issue for our BSP integrator that could be handled easily.

If we agree, are you going to propose something, or do you want to wait for me to propose a patch ?

Many thanks,

Christian
Re: gcc doesn't accept specs options anymore
> I think http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49858 is
> essentially this issue. It can probably be closed as "won't fix",
> though I notice the spec file format is still documented in the user
> manual.
>
> Peter
>

Yes, same root problem, although BSP design is a different usage (yet quite common). I wouldn't be in favor of moving all the spec support into the GCC internals if this deprecates the '-specs=' user option.

Many thanks

Christian
Re: gcc doesn't accept specs options anymore
On 05/07/2012 12:09 PM, Joseph S. Myers wrote:
> On Mon, 7 May 2012, Christian Bruel wrote:
>
>> Making the driver aware of all possible user-defined options seems
>> unpredictable. Was there any justification for removing this
>> functionality or did I miss a point with the EXTRA_SPECS ?
>
> There are several motivations behind requiring all options to be defined
> in .opt files, including:
>
> * For multilibs to be selected based on the semantics of options, using
> values set in gcc_options structures by the same code as in cc1, rather
> than by textual matching attempting to replicate semantics, the driver
> needs to understand the semantics of options as similarly as possible to
> cc1, rather than treating any options purely textually.
>
> * Every option supported by the compiler should be listed in --help (and
> if the missing help information were all filled in, we could then make it
> a build failure to have an option without help information).

True, but this removes the flexibility for a user or a BSP maintainer to define new options (e.g. for the linker, not the compiler) through a --specs= file, without access to the compiler sources.

>
> * Structured option information enables consistency in how options are
> processed and errors given for unknown options or arguments.
>
> * It would be useful for the compiler to be able to export structured
> information about all its options for use by tools such as IDEs.

If the option is only supported by a BSP, and not by the compiler, I don't see how the compiler could report it, since it doesn't depend on static information known at build time.

A direction would be to add this information in the user spec rules:

*ldruntime:
+ %{foo: -lfoo} %{help: "describe foo "}

I'm not aware of such machinery. Maybe an idea for improvement ?

>
> There is certainly room for more extensibility in option handling -
> ideally there would not be one big enum with OPT_* values for all options
> and one header with all the macros and structures, but instead front-end
> and back-end options would use some form of separate namespace for their
> options so the language- and target-independent compiler doesn't see the
> options for other parts of the compiler; that fits into the modularity
> theme that ideally it would be possible to build multiple front ends and
> back ends into the compiler at once, or to build front ends and back ends
> separately from the compiler. But defining options through use in specs
> wouldn't be part of that; rather more structured information about each
> option would need to be provided somehow by a separately built front end
> or back end.
>
> If you want -m options to select arbitrary board support packages (and the
> existing ability to use -T to name a linker script isn't sufficient), then
> a -mbsp= option, whose argument is not interpreted by GCC but may be
> processed by whatever specs you are adding after GCC is installed, would
> seem better than lots of separate -m options.

I don't like this -mbsp= alternative a lot. It seems confusing, not elegant, and not general enough for other uses (it could be a runtime customization, not a BSP). What about delimiters, something like --start-specs ... --end-specs ?

>
> As for options in specs included with GCC: they are all meant to be in the
> .opt files. I went through all the specs in all the config/ headers in
> GCC and added options found to .opt files before disallowing options not
> included in .opt files, but as there are about 500 such headers it's quite
> possible I missed some specs-defined options in the process.

Yes, it looks OK for the GCC specs; the problem is with the user spec files. This is a new legacy issue. I thought it was worth reporting it, to see if this needs to be (or can be) fixed.

Many thanks

Christian

>
gcc doesn't accept specs options anymore
Hello,

There are a few EXTRA_SPECS rules that are used to customize target runtime support. For instance, "ldruntime" is used on superh for board configurations and to dynamically support different runtime behaviors.

An illustration of this use with a silly reduced spec:

*ldruntime:
+ %{mfoo: -lfoo}

The same kind of example could be found with the x86 cc1_cpu spec rules.

However, since revision:

r171307 | jsm28 | 2011-03-22 23:19:01 +0100 (Tue, 22 Mar 2011) | 5 lines

	* gcc.c (driver_unknown_option_callback): Only permit and save
	unknown -Wno- options.
	(driver_wrong_lang_callback): Save options directly instead of via
	driver_unknown_option_callback.

using a spec-defined option results in a driver error:

gcc: error: unrecognized command line option '-mfoo'

Making the driver aware of all possible user-defined options seems unpredictable. Was there any justification for removing this functionality, or did I miss a point with the EXTRA_SPECS ?

Any thoughts ?

Thanks a lot,

Christian
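To illustrate the behavior change described in the ChangeLog entry above, here is a toy model in C of a driver that only accepts options it has a record of, plus unknown -Wno- options. It is a sketch under stated assumptions, not the gcc.c code: `toy_known_opts` is a made-up list standing in for options declared in .opt files, and `toy_driver_accepts` is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical list standing in for the options declared in .opt files.  */
static const char *const toy_known_opts[] = { "-O2", "-g", "-mieee" };

/* Returns true if this model of the driver accepts OPT.  Unknown options
   are rejected, except unknown -Wno-* options, which are let through.  */
bool toy_driver_accepts(const char *opt)
{
  size_t n = sizeof toy_known_opts / sizeof toy_known_opts[0];

  for (size_t i = 0; i < n; i++)
    if (strcmp(opt, toy_known_opts[i]) == 0)
      return true;

  return strncmp(opt, "-Wno-", 5) == 0;
}
```

In this model a spec-only option such as -mfoo is refused before any spec rule gets a chance to match it, which is the situation reported above.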
Re: Discussion: What is unspec_volatile?
On 11/13/2010 08:40 PM, Peter Bergner wrote:
> On Sat, 2010-11-13 at 11:27 +0100, Paolo Bonzini wrote:
>> On 11/12/2010 03:25 PM, H.J. Lu wrote:
>>> IRA may move instructions across an unspec_volatile,
>>
>> Do you have a testcase?
>
> Are you sure it's IRA and not our old friend update_equiv_regs() which
> IRA calls? http://gcc.gnu.org/PR41171 shows an example where
> update_equiv_regs() moves code around.
>
> Peter

I'm just having a similar issue on SH4. The machine description inserts an unspec_volatile when generating a PIC access to the stack_chk_guard symbol, to avoid combining the insns into a mem (R0,Rx) addressing mode, which generates an "unable to find a register to spill in class 'R0_REGS'" spill failure.

The simplified RTL sequence looked like this before IRA:

(insn 33 32 34 5 (set (reg:SI 175) (plus:SI (reg/f:SI 174) (reg:SI 12 r12)))
(insn 34 33 35 5 (unspec_volatile [ (const_int 0 [0]) ] 0)
(insn 35 34 36 5 (set (reg/f:SI 173) (mem/u/c:SI (reg:SI 175) [0 S4 A32]))

Then during IRA:

(insn 35 56 55 5 (set (reg/f:SI 1 r1 [173]) (mem/u/c:SI (plus:SI (reg/f:SI 1 r1 [174]) (reg:SI 12 r12)) [0 S4 A32]))

So insn 33 has been moved across the unspec_volatile by 'update_equiv_regs'.

So, back to the original question: is unspec_volatile expected to prevent this ?

The conservative, illustrative patch below fixed my problem, but it clearly needs to be refined because it also prevents combining insns that are not concerned by the blockage. I also suspect that there are other places in the compiler where instructions could be combined without checking for the unspec_volatile.

Index: ira.c
===================================================================
--- ira.c	(revision 166230)
+++ ira.c	(working copy)
@@ -2304,6 +2304,16 @@
 	     only mark all destinations as having no known equivalence.  */
 	  if (set == 0)
 	    {
+	      if (GET_CODE (PATTERN (insn)) == UNSPEC_VOLATILE)
+		{
+		  int i;
+		  /* UNSPEC_VOLATILE is considered to use and clobber all hard
+		     registers and all of memory.  This blocks insns from being
+		     combined across this point.  */
+		  for (i = FIRST_PSEUDO_REGISTER; i < reg_equiv_init_size; i++)
+		    reg_equiv[i].replace = 0;
+		}
+
 	      note_stores (PATTERN (insn), no_equiv, NULL);
 	      continue;
 	    }
Re: SH optimized software floating point routines
Joern Rennecke wrote:
> Quoting Christian Bruel :
>
>> Using the ieee-sf.S + this patch OK
>
> Is this only a proof-of-concept, because you only change the ne[sd]f2
> implementation?

I also changed the unordered comparison patterns (cmpunsf_i1, cmpundf_i1). But yes, the other functions that would need the same kind of check would be unordsf2 and all the comparisons (gtsf2, gesf2f...) for floats and doubles. But I will only consider those after/if we all agree that this needs to be done instead of keeping the current QNaN-only restrictions.

> And you go out of your way to only accept a restricted set of values.

This holds for the original optimized implementation as well; for example, I don't think that 0x7f81 was caught. In fact, implementing the isnan check correctly, without a restricted set of values, makes the original discussion pointless, since the Q/S bit patterns are only a subset of all possible NaN encodings, which are those with any fractional part != 0.

> Plus, the overuse of the arithmetic unit hurts SH4-100 / SH4-200
> instruction pairing.
> AFAICT you need only one cycle penalty, in the check_nan path:
>
> GLOBAL(nesf2):
> 	/* If the raw values are unequal, the result is unequal, unless
> 	   both values are +-zero.
> 	   If the raw values are equal, the result is equal, unless
> 	   the values are NaN.  */
> 	cmp/eq	r4,r5
> 	mov.l	LOCAL(inf2),r1
> 	bt/s	LOCAL(check_nan)
> 	mov	r4,r0
> 	or	r5,r0
> 	rts
> 	add	r0,r0
> LOCAL(check_nan):
> 	add	r0,r0
> 	cmp/hi	r1,r0
> 	rts
> 	movt	r0
> 	.balign 4
> LOCAL(inf2):
> 	.long	0xff000000
>
> You could even save four bytes by putting the check_nan label into the
> delay slot, but I'm not sure if that'll discomfit any branch prediction
> mechanism.

Thanks a lot for this one. It should fix the original problem on the restricted set of values as well. The cmpund patterns fix should probably have similar checks.

> Disclaimer: I've not tested this code.
>
> For the DFmode case, what about NaNs denoted by the low word, e.g.
> 0x7ff0 1 ?
> If so, the DFmode code could become something like this:
>
> GLOBAL(nedf2):
> 	cmp/eq	DBL0L,DBL1L
> 	mov.l	LOCAL(inf2),r1
> 	bf	LOCAL(ne)
> 	cmp/eq	DBL0H,DBL1H
> 	bt/s	LOCAL(check_nan)
> 	mov	DBL0H,r0
> 	or	DBL1H,r0
> 	add	r0,r0
> 	rts
> 	or	DBL0L,r0
> LOCAL(check_nan):
> 	tst	DBL0L,DBL0L
> 	add	r0,r0
> 	subc	r1,r0
> 	mov	#-1,r0
> 	rts
> 	negc	r0,r0
> LOCAL(ne):
> 	rts
> 	mov	#1,r0
> 	.balign 4
> LOCAL(inf2):
> 	.long	0xffe00000
>
> For an actual patch, you need to use the SL* macros from
> config/sh/lib1funcs.h because the SH1 does not have delayed branches.

OK, thanks
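For readers who do not read SH assembly, the single-precision routine quoted above can be modelled in C on the raw bit patterns. This is only a sketch of the logic stated in the assembly comment, assuming 32-bit IEEE single precision; `toy_nesf2` is a made-up name, and like the assembly it returns nonzero for "unequal" and zero for "equal".

```c
#include <stdint.h>

/* Returns nonzero when the two single-precision bit patterns compare
   unequal under IEEE rules, zero when they compare equal.  */
uint32_t toy_nesf2(uint32_t a, uint32_t b)
{
  if (a == b)
    /* Same bits: equal unless NaN.  Shifting out the sign bit leaves
       exponent and fraction; infinity becomes 0xff000000, so anything
       strictly above that is a NaN, whatever its fraction bits are.  */
    return (uint32_t)(a << 1) > UINT32_C(0xff000000);

  /* Different bits: unequal unless both are +-0, in which case
     (a | b) << 1 is zero.  */
  return (uint32_t)((a | b) << 1);
}
```

Note that the NaN test here does not look at any particular quiet/signaling bit: every encoding with an all-ones exponent and a nonzero fraction is caught by the single unsigned comparison.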
Re: SH optimized software floating point routines
Oops, resending it with a small typo fix (a branch became delayed :-().

Just in case we accept that SNaNs and QNaNs are not exclusive and mimic the C model, here is the result of a synthetic illustrative test case. Compile with

sh-superh-elf-gcc -O2 -mieee -m4-nofpu snan.c snan2.c -g -o l.u ; sh-superh-elf-run l.u ; echo $?

Original 4.6 fp-bit C model:          OK
Using the ieee-sf.S implementation:   FAIL
Using the ieee-sf.S + this patch:     OK

Same for sh4-linux.

Best Regards,

Christian

Christian Bruel wrote:
> Christian Bruel wrote:
>> Hi Kaz,
>>
>> Kaz Kojima wrote:
>>> BTW, it looks that softfp __unord?f2 routines check signaling NaNs
>>> only. This makes __builtin_isnan return false for quiet NaNs for which
>>> current fp-bit ones return true when -mieee enabled. Perhaps that
>>> change of behavior might be OK for software FP.
>>
>> I use the attached patch to handle the QNaNs in the assembly soft-fp.
>> Needs to be updated for trunk (and update the dates in changelogs).
>> Will do.
>
> Edited to apply on top of Joern's latest patch. Certainly not optimal,
> but it fixes the QNaN checks for builtins and inlined unordered
> comparisons for -mieee or -fno-finite-math-only.
>
> Best Regards
>
> Christian

diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S	2010-07-21 18:04:17.94995 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S	2010-07-21 18:09:10.602376000 +0200
@@ -92,11 +92,12 @@
 HIDDEN_FUNC(GLOBAL(nedf2))
 GLOBAL(nedf2):
 	cmp/eq	DBL0L,DBL1L
-	mov.l	LOCAL(c_DF_NAN_MASK),r1
-	bf	LOCAL(ne)
+	bf.s	LOCAL(ne)
+	mov	#1,r0
 	cmp/eq	DBL0H,DBL1H
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	bt.s	LOCAL(check_nan)
 	not	DBL0H,r0
-	bt	LOCAL(check_nan)
 	mov	DBL0H,r0
 	or	DBL1H,r0
 	add	r0,r0
@@ -104,11 +105,17 @@
 	or	DBL0L,r0
 LOCAL(check_nan):
 	tst	r1,r0
-	rts
+	bt.s	LOCAL(nan)
+	mov	#12,r2
+	shll16	r2
+	xor	r2,r1
+	tst	r1,r0
+LOCAL(nan):
 	movt	r0
 LOCAL(ne):
 	rts
-	mov	#1,r0
+	nop
+
 	.balign 4
 LOCAL(c_DF_NAN_MASK):
 	.long DF_NAN_MASK
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S	2010-07-22 14:21:50.606831000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S	2010-07-22 15:30:17.928097000 +0200
@@ -58,6 +58,12 @@
 	add	r0,r0
 LOCAL(check_nan):
 	tst	r1,r0
+	bt.s	LOCAL(nan)
+	mov	#96,r2
+	shll16	r2
+	xor	r2,r1
+	tst	r1,r0
+ LOCAL(nan):
 	rts
 	movt	r0
 	.balign 4
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/sh.md gnu_trunk/gcc/gcc/config/sh/sh.md
--- gnu_trunk.ref/gcc/gcc/config/sh/sh.md	2010-07-21 18:06:25.978547000 +0200
+++ gnu_trunk/gcc/gcc/config/sh/sh.md	2010-07-22 09:13:12.599669000 +0200
@@ -10262,6 +10262,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
   "jsr	@%1%#"
@@ -10337,13 +10338,18 @@
 
 (define_insn "cmpunsf_i1"
   [(set (reg:SI T_REG)
-	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
-		      (match_operand:SF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+		      (match_operand:SF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+"not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+\tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+\tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3
+ 0:"
+[(set_attr "length" "28")])
 
 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.
@@ -11069,6 +11075,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+
Re: SH optimized software floating point routines
Joern Rennecke wrote:
> Quoting Christian Bruel :
>
>> Edited to apply on top of latest Joern's patch. Certainly not optimal
>> but it fixes the QNaNs checks for builtins and inlined unordered
>> comparisons for -mieee or -fno-finite-math-only.
>
> You are still on the wrong track; as I said in my earlier message, we
> should not emit the library call for SH4 in the first place.
> Please try the attached patch instead.

Hello,

Sorry for the mails that crossed.

I think we are dealing with two different problems here that have the same root. The original one was about the undefined __unorddf2/__unordsf2 regression, for which you said that the library functions should not be called. I agree, and my patch is not exclusive with yours in this regard.

I was dealing with functional issues in the SNaN bit checking in the cmpun_ patterns (in addition to the floating point comparison functions), which is exposed by the regression test that I provided (for -m4-nofpu -mieee).

About the other part of your answer, not supporting SNaNs in fp-bit.c: it is a possibility that I didn't consider in my fix. This restriction is quite a surprise to me because, as far as NaNs are concerned, it is not what I infer from the implementation of fp-bit.c's isnan function, which checks for both CLASS_SNAN and CLASS_QNAN. See for example the result of

static int misnanf(float v)
{
  return (v != v);
}

called with either a QNaN or an SNaN.

IMO the assembly model should have the same semantics as the C model, which is not the case today. Using -fsignaling-nans and possibly putting #ifdef __SUPPORT_SNAN__ around the check doesn't change anything, since the same call is made to the floating point comparison function, which really needs to check for both formats.

If you are concerned about the extra cycles needed in the nesf2f implementation (which is nothing anyway compared to the C model), we could certainly provide a specialized one just for -fsignaling-nans.

Best Regards

Christian
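The point about the C model can be checked on a host machine with a short sketch. It assumes 32-bit IEEE single precision for `float`; the function names are made up for this illustration. `toy_isnan_bits` tests the raw encoding (exponent all ones, any nonzero fraction), and `toy_isnan_cmp` applies the `v != v` comparison from the message to the same bit pattern. Which fraction bit marks "quiet" versus "signaling" is deliberately not encoded here, since the argument is that the test should not depend on it.

```c
#include <stdint.h>
#include <string.h>

/* NaN test on raw single-precision bits: exponent all ones and a
   nonzero fraction, regardless of which fraction bits are set.  */
int toy_isnan_bits(uint32_t bits)
{
  return (bits & UINT32_C(0x7f800000)) == UINT32_C(0x7f800000)
         && (bits & UINT32_C(0x007fffff)) != 0;
}

/* The C-level self-comparison check, applied to a bit pattern.
   Assumes float is a 32-bit IEEE single.  */
int toy_isnan_cmp(uint32_t bits)
{
  float v;
  memcpy(&v, &bits, sizeof v);   /* reinterpret the bits as a float */
  return v != v;
}
```

On an IEEE host both functions are expected to agree for every input, whether or not the most significant fraction bit is set.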
Re: SH optimized software floating point routines
Christian Bruel wrote:
> Hi Kaz,
>
> Kaz Kojima wrote:
>> BTW, it looks that softfp __unord?f2 routines check signaling NaNs
>> only. This makes __builtin_isnan return false for quiet NaNs for which
>> current fp-bit ones return true when -mieee enabled. Perhaps that
>> change of behavior might be OK for software FP.
>
> I use the attached patch to handle the QNaNs in the assembly soft-fp.
> Needs to be updated for trunk (and update the dates in changelogs).
> Will do.

Edited to apply on top of Joern's latest patch. Certainly not optimal, but it fixes the QNaN checks for builtins and inlined unordered comparisons for -mieee or -fno-finite-math-only.

Best Regards

Christian

2010-07-22  Christian Bruel

	* gcc.dg/builtins-nan.c: New test.

2010-07-22  Christian Bruel

	* config/sh/ieee-754-df.S (nedf2f): Don't check Qbit for NaNs.
	* config/sh/ieee-754-sf.S (nesf2f): Likewise.
	* config/sh/sh.md (cmpunsf_i1, cmpundf_i1): Likewise.
	(cmpnesf_i1, cmpnedf_i1): Clobber R2.

diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-df.S	2010-07-21 18:04:17.0 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-df.S	2010-07-21 18:09:10.0 +0200
@@ -92,11 +92,12 @@
 HIDDEN_FUNC(GLOBAL(nedf2))
 GLOBAL(nedf2):
 	cmp/eq	DBL0L,DBL1L
-	mov.l	LOCAL(c_DF_NAN_MASK),r1
-	bf	LOCAL(ne)
+	bf.s	LOCAL(ne)
+	mov	#1,r0
 	cmp/eq	DBL0H,DBL1H
+	mov.l	LOCAL(c_DF_NAN_MASK),r1
+	bt.s	LOCAL(check_nan)
 	not	DBL0H,r0
-	bt	LOCAL(check_nan)
 	mov	DBL0H,r0
 	or	DBL1H,r0
 	add	r0,r0
@@ -104,11 +105,17 @@
 	or	DBL0L,r0
 LOCAL(check_nan):
 	tst	r1,r0
-	rts
+	bt.s	LOCAL(nan)
+	mov	#12,r2
+	shll16	r2
+	xor	r2,r1
+	tst	r1,r0
+LOCAL(nan):
 	movt	r0
 LOCAL(ne):
 	rts
-	mov	#1,r0
+	nop
+
 	.balign 4
 LOCAL(c_DF_NAN_MASK):
 	.long DF_NAN_MASK
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S
--- gnu_trunk.ref/gcc/gcc/config/sh/ieee-754-sf.S	2010-07-21 18:04:18.0 +0200
+++ gnu_trunk/gcc/gcc/config/sh/ieee-754-sf.S	2010-07-21 18:09:10.0 +0200
@@ -51,13 +51,19 @@
 	cmp/eq	r4,r5
 	mov.l	LOCAL(c_SF_NAN_MASK),r1
 	not	r4,r0
-	bt	LOCAL(check_nan)
+	bt.s	LOCAL(check_nan)
 	mov	r4,r0
 	or	r5,r0
 	rts
 	add	r0,r0
 LOCAL(check_nan):
 	tst	r1,r0
+	bt.s	LOCAL(nan)
+	mov	#96,r2
+	shll16	r2
+	xor	r2,r1
+	tst	r1,r0
+ LOCAL(nan):
 	rts
 	movt	r0
 	.balign 4
diff '--exclude=.svn' '--exclude=*.rej' '--exclude=*~' -ubrN gnu_trunk.ref/gcc/gcc/config/sh/sh.md gnu_trunk/gcc/gcc/config/sh/sh.md
--- gnu_trunk.ref/gcc/gcc/config/sh/sh.md	2010-07-21 18:06:25.0 +0200
+++ gnu_trunk/gcc/gcc/config/sh/sh.md	2010-07-22 09:13:12.0 +0200
@@ -10262,6 +10262,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
   "jsr	@%1%#"
@@ -10337,13 +10338,18 @@
 
 (define_insn "cmpunsf_i1"
   [(set (reg:SI T_REG)
-	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
-		      (match_operand:SF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+	(unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+		      (match_operand:SF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+"not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+\tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+\tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3
+ 0:"
+[(set_attr "length" "28")])
 
 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.
@@ -11069,6 +11075,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI
Re: SH optimized software floating point routines
Hi Kaz,

Kaz Kojima wrote:
> BTW, it looks that softfp __unord?f2 routines check signaling NaNs
> only. This makes __builtin_isnan return false for quiet NaNs for
> which current fp-bit ones return true when -mieee enabled. Perhaps
> that change of behavior might be OK for software FP.

I use the attached patch to handle QNaNs in the assembly soft-fp routines. It needs to be updated for trunk (and the dates in the ChangeLogs need updating). Will do.

Cheers

Christian

2010-04-20  Christian Bruel

        * gcc.dg/builtins-nan.c: New test.

2010-04-20  Christian Bruel

        * config/sh/ieee-754-df.S (nedf2f): Don't check Qbit for NaNs.
        * config/sh/ieee-754-sf.S (nesf2f): Likewise.
        * config/sh/sh.md (cmpunsf_i1, cmpundf_i1): Likewise. Clobber R2.

Index: gcc/config/sh/ieee-754-df.S
===
--- gcc/config/sh/ieee-754-df.S (revision 1352)
+++ gcc/config/sh/ieee-754-df.S (revision 1373)
@@ -88,11 +88,12 @@
 HIDDEN_FUNC(GLOBAL(nedf2f))
 GLOBAL(nedf2f):
        cmp/eq  DBL0L,DBL1L
+       bf.s    LOCAL(ne)
+       mov     #1,r0
+       cmp/eq  DBL0H,DBL1H
        mov.l   LOCAL(c_DF_NAN_MASK),r1
-       bf      LOCAL(ne)
-       cmp/eq  DBL0H,DBL1H
-       not     DBL0H,r0
-       bt      LOCAL(check_nan)
+       bt.s    LOCAL(check_nan)
+       not     DBL0H,r0
        mov     DBL0H,r0
        or      DBL1H,r0
        add     r0,r0
@@ -100,11 +101,17 @@
        or      DBL0L,r0
 LOCAL(check_nan):
        tst     r1,r0
-       rts
+       bt.s    LOCAL(nan)
+       mov     #12,r2
+       shll16  r2
+       xor     r2,r1
+       tst     r1,r0
+LOCAL(nan):
        movt    r0
 LOCAL(ne):
        rts
-       mov     #1,r0
+       nop
+
        .balign 4
 LOCAL(c_DF_NAN_MASK):
        .long   DF_NAN_MASK
Index: gcc/config/sh/ieee-754-sf.S
===
--- gcc/config/sh/ieee-754-sf.S (revision 1352)
+++ gcc/config/sh/ieee-754-sf.S (revision 1373)
@@ -55,19 +55,27 @@
    the values are NaN.  */
        cmp/eq  r4,r5
        mov.l   LOCAL(c_SF_NAN_MASK),r1
+       bt.s    LOCAL(check_nan)
        not     r4,r0
-       bt      LOCAL(check_nan)
        mov     r4,r0
        or      r5,r0
        rts
        add     r0,r0
 LOCAL(check_nan):
        tst     r1,r0
+       bt.s    LOCAL(nan)
+       mov     #96,r2
+       shll16  r2
+       xor     r2,r1
+       tst     r1,r0
+LOCAL(nan):
        rts
        movt    r0
+
        .balign 4
 LOCAL(c_SF_NAN_MASK):
        .long   SF_NAN_MASK
+LOCAL(c_SF_SNAN_MASK):
        ENDFUNC(GLOBAL(nesf2f))
 #endif /* L_nesf2f */
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md (revision 1352)
+++ gcc/config/sh/sh.md (revision 1373)
@@ -11182,6 +11182,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
   "jsr @%1%#"
@@ -11257,13 +11258,18 @@

 (define_insn "cmpunsf_i1"
   [(set (reg:SI T_REG)
-       (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r,r")
-                     (match_operand:SF 1 "arith_reg_operand" "r,r")))
-   (use (match_operand:SI 2 "arith_reg_operand" "r,r"))
-   (clobber (match_scratch:SI 3 "=0,&r"))]
+       (unordered:SI (match_operand:SF 0 "arith_reg_operand" "r")
+                     (match_operand:SF 1 "arith_reg_operand" "r")))
+   (use (match_operand:SI 2 "arith_reg_operand" "r"))
+   (clobber (match_scratch:SI 3 "=&r"))]
   "TARGET_SH1 && ! TARGET_SH2E"
-  "not\t%0,%3\;tst\t%2,%3\;not\t%1,%3\;bt\t0f\;tst\t%2,%3\;0:"
-  [(set_attr "length" "10")])
+  "not\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3\;bt.s\t0f
+\tmov\t#96,%3\;shll16\t%3\;xor\t%3,%2
+\tnot\t%0,%3\;tst\t%2,%3\;bt.s\t0f
+\tnot\t%1,%3\;tst\t%2,%3
+0:"
+  [(set_attr "length" "28")])

 ;; ??? This is a lot of code with a lot of branches; a library function
 ;; might be better.
@@ -11967,6 +11973,7 @@
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
    (clobber (reg:SI R1_REG))
+   (clobber (reg:SI R2_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1_SOFTFP"
   "jsr @%1%#"
@@ -12008,13 +12015,18 @@
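For readers following the NaN discussion, here is a minimal C++ sketch, assuming the IEEE-754 binary32 layout, of a bit-pattern test that reports quiet and signaling NaNs alike. It is not the SH assembly from the patch and does not use its masks; the names is_nan_bits, unordered_bits, float_bits and check_examples are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed layout: IEEE-754 binary32 (1 sign, 8 exponent, 23 fraction bits).
constexpr std::uint32_t kExpMask = 0x7f800000u;  // all exponent bits
constexpr std::uint32_t kAbsMask = 0x7fffffffu;  // everything but the sign

// True for every NaN encoding: exponent all ones and a non-zero fraction,
// whatever the top fraction bit (the quiet/signaling marker) says.
constexpr bool is_nan_bits(std::uint32_t bits) {
    return (bits & kAbsMask) > kExpMask;
}

// Hypothetical helper: "unordered" holds when either operand is a NaN.
constexpr bool unordered_bits(std::uint32_t a, std::uint32_t b) {
    return is_nan_bits(a) || is_nan_bits(b);
}

// Reinterpret a float as its bit pattern without aliasing violations.
inline std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "binary32 expected");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// A few spot checks on hand-picked encodings.
inline void check_examples() {
    assert(is_nan_bits(0x7fc00000u));        // NaN, top fraction bit set
    assert(is_nan_bits(0x7fa00000u));        // NaN, top fraction bit clear
    assert(!is_nan_bits(0x7f800000u));       // +infinity is not a NaN
    assert(!is_nan_bits(float_bits(1.0f)));  // ordinary number
}
```

Comparing the magnitude against the exponent mask avoids looking at the quiet bit at all, so the result does not depend on which convention a given core uses for quiet versus signaling NaNs.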
fno-branch-count-reg misleading documentation woes
Hello,

The documentation for -fno-branch-count-reg explains that a decrement-test-and-branch instruction is replaced by an equivalent sequence of instructions that decrement a register, compare it against 0, and branch (see the use of the word *instead*).

This is not really true, since this option first of all disables the loop reversal transformation (loop-init.c::gate_rtl_doloop). As such, the generated code will not necessarily have a reversed, decrementing loop count created, and the register test will not necessarily be a decrement-test-branch sequence.

Another comment: with -fbranch-count-reg, the instruction is not necessarily a "decrement and branch" but could also be a "decrement and compare", as on the SH.

Would a rephrasing like the following be more accurate?

Thanks.

-c

Index: invoke.texi
===
--- invoke.texi (revision 135611)
+++ invoke.texi (working copy)
@@ -5420,10 +5420,7 @@

 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
-Do not use ``decrement and branch'' instructions on a count register,
-but instead generate a sequence of instructions that decrement a
-register, compare it against zero, then branch based upon the result.
-This option is only meaningful on architectures that support such
+Do not use ``decrement and branch/compare'' instructions on a count register. By setting this flag, loop inversion will be disabled. This option is only meaningful on architectures that support such
 instructions, which include x86, PowerPC, IA-64 and S/390. The
 default is @option{-fbranch-count-reg}.
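To make the distinction concrete, here is a source-level C++ sketch of the two loop shapes being discussed. It is only an analogy: the compiler performs the transformation on RTL, and which form GCC actually emits depends on the target and options. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Forward loop: increment an index, compare it against the bound, branch.
inline long sum_forward(const int *v, std::size_t n) {
    assert(v != nullptr || n == 0);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

// Hand-written analogue of a reversed loop: a dedicated counter runs down
// to zero, so the exit test is "decrement, compare with zero, branch".
// This is the shape a decrement-and-branch instruction can implement.
inline long sum_countdown(const int *v, std::size_t n) {
    assert(v != nullptr || n == 0);
    long total = 0;
    for (std::size_t remaining = n; remaining != 0; --remaining)
        total += v[n - remaining];
    return total;
}
```

Both functions compute the same sum; only the loop control differs. If the reversal is not performed, nothing guarantees that the forward form ends up as a decrement-test-branch sequence.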
Re: -mfmovd enabled by default for SH2A but not for SH4
Hello,

Looks like you are mixing ABIs. What is your fpscr setting?

From my understanding, if the fpscr PR bit is set to 0, the 64-bit operation behaves as two 32-bit operations (paired single precision), so I don't think you get an address error here.

The well-defined behavior of the fmov instruction depends on the endianness and on the SZ/PR bit settings in the fpscr register. My guess is that the default gcc choice of 32-bit fmov instructions is the one that best matches all sh4 configurations (SZ/PR=1 is even undefined for some cores).

Changing the default would be possible if you change your ABI or have another multilib setting for the startup files. But the current situation is that it is usually left to users to explicitly provide their own fpscr setting when they want to change the fmov size and alignment.

Cheers,

Christian

Naveen H.S. wrote:
> Hi,
>
>> Have you got this error on the real SH2A-FPU hardware?
>
> Yes, we got this error on SH72513 (SH2A) hardware. When the same code
> is run on the simulator, the "address error" occurs on encountering
> the "fmov.d" instruction.
>
>> couldn't find any description for 8-byte alignment restriction for
>> double data on memory in my SH2A manual
>
> Please refer to section 3.3 "address errors" in the SH2A software
> manual at the following link:
> http://documentation.renesas.com/eng/products/mpumcu/rej09b0051_sh2a.pdf
> It is mentioned that "Double longword data accessed from other than
> double longword boundary" results in an address error.
>
> Regards,
> Naveen.H.S.
> KPIT Cummins Infosystems Ltd, Pune (INDIA)
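The alignment rule quoted from the manual can be sketched as a simple address check. This is a minimal C++ illustration under the assumption that a 64-bit move faults unless its address is a multiple of 8; is_aligned and can_use_64bit_move are hypothetical names, not part of GCC or of any SH runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when p is a multiple of `alignment` bytes.
// The alignment must be a non-zero power of two.
inline bool is_aligned(const void *p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical predicate: may a double at this address be accessed with a
// single 64-bit move, assuming such a move requires 8-byte alignment?
inline bool can_use_64bit_move(const void *p) {
    return is_aligned(p, 8);
}
```

For example, with a buffer declared `alignas(8) unsigned char buf[16]`, `can_use_64bit_move(buf)` holds while `can_use_64bit_move(buf + 4)` does not. A double placed at the second address would have to be moved as two 32-bit halves.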
Re: C++: operator new and disabled exceptions
Hello,

There is a difference between calling new and new (std::nothrow) from an -fno-exceptions context:

- new (std::nothrow) would return 0 in case of error
- new () would throw std::bad_alloc, which would end up in std::terminate() or abort()

So there is a possible difference in behavior if an -fno-exceptions application relies on std::terminate(). If you patch gcc to use the nothrow form internally, you would need to compile all your applications and all your libraries and runtimes with -fcheck-new.

Christian

Christophe LYON wrote:
> Hello,
>
> I have already asked this question on gcc-help (see
> http://gcc.gnu.org/ml/gcc-help/2007-09/msg00328.html), but I would
> like advice from GCC developers.
>
> Basically, when I compile with -fno-exceptions, I wonder why the G++
> compiler still generates calls to the standard new operator (the one
> that throws bad_alloc when it runs out of memory), rather than
> new(nothrow) (_ZnwjRKSt9nothrow_t)?
>
> In addition, do you think I can patch my GCC such that it calls
> new(nothrow) when compiling with -fno-exceptions, or is it a bad
> idea? (compatibility issues, ...)
>
> Thanks for your recommendation,
>
> Christophe.
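The two forms can be put side by side in a short C++ sketch. It only illustrates the difference in how failure is reported and the null check that the nothrow form requires from the caller; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Non-throwing form: failure is reported as a null pointer,
// so every caller has to check the result.
inline char *alloc_nothrow(std::size_t n) {
    return new (std::nothrow) char[n];
}

// Throwing form: failure raises std::bad_alloc. With no handler in the
// program, that ends in std::terminate().
inline char *alloc_throwing(std::size_t n) {
    return new char[n];
}

inline void release(char *p) {
    delete[] p;  // deleting a null pointer is a no-op
}

// Typical use of the nothrow form: allocate, check, use, free.
inline bool nothrow_roundtrip(std::size_t n) {
    assert(n > 0);
    char *p = alloc_nothrow(n);
    if (p == nullptr)  // the check that callers of plain new do not write
        return false;
    p[0] = 'x';
    release(p);
    return true;
}
```

Code written against plain new usually omits the null check, which is why silently switching the allocation function changes what happens when memory runs out.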
-fstrict-overflow example 4.2 status
Hello,

The example provided with the -fstrict-overflow description on the http://gcc.gnu.org/gcc-4.2/changes.html page is not optimized as described. Is it only a documentation bug? The example is optimized as expected on the trunk.

Regards,

-c
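As background, the kind of simplification that strict overflow semantics permit can be sketched as follows. This is a generic, commonly cited case and not necessarily the example from the changes page; plus_ten_is_greater is a hypothetical name.

```cpp
#include <cassert>
#include <climits>

// Because signed overflow is undefined, a compiler that assumes strict
// overflow semantics may fold the comparison below to "true" without
// emitting the addition at all.
inline bool plus_ten_is_greater(int i) {
    assert(i <= INT_MAX - 10);  // keep the sketch free of undefined behavior
    return i + 10 > i;
}
```

Whether a given GCC release actually performs the folding can be checked by compiling with and without -fstrict-overflow and comparing the generated assembly.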