Re: [PATCH v2] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2022-08-15 Thread Kewen.Lin via Gcc-patches
Hi Xionghu,

Thanks for the updated version of the patch; some comments are inlined below.

on 2022/8/11 14:15, Xionghu Luo wrote:
> 
> 
> On 2022/8/11 01:07, Segher Boessenkool wrote:
>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>>> On 2022/8/9 11:01, Kewen.Lin wrote:
 I have some concerns about those changed "altivec_*_direct" patterns; IMHO the
 suffix "_direct" normally indicates that the define_insn is mapped to the
 corresponding hw insn directly.  With this change, for example,
 altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, which looks
 misleading.  Maybe we can add the corresponding _direct_le and _direct_be
 versions, both mapped into the same insn but with different RTL
 patterns.  Looking forward to Segher's and David's suggestions.
>>>
>>> Thanks!  Do you mean same RTL patterns with different hw insn?
>>
>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
>> instruction, never a vmrglb instead.  Misleading names are an expensive
>> problem.
>>
>>
> 
> Thanks.  Then on LE platforms, if a user calls altivec_vmrghw, it will be
> expanded to RTL (vec_select (vec_concat R0 R1) (0 4 1 5)), and
> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
> For BE it is just straightforward, which seems clearer :-). OK for master?
> 
> 
> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS 
> [PR106069]
> 
> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern into _be and _le versions with the same RTL but
> different insns.
> 
> The native RTL expression for vec_mrghw should be the same for BE and LE,
> as it is register- and endian-independent.  So both BE and LE need to
> generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
> 
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>    (subreg:V4SI (reg:V16QI 139) 0)
>    (subreg:V4SI (reg:V16QI 140) 0))
>    [const_int 0 4 1 5]))
> 
> Then the combine pass can do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
> 
> =>
> 
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
> 
> The endianness check is needed only once, at final ASM generation.
> The resulting ASM is better because the nested vec_select is simplified
> to a simple scalar load.
> 
> Regression tested, passing on Power8{LE,BE}{32,64} and Power{9,10}LE{64}
> Linux (thanks to Kewen).
> 
> gcc/ChangeLog:
> 
> PR target/106069
> * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
> (altivec_vmrghb_direct_be): New pattern for BE.
> (altivec_vmrglb_direct_le): New pattern for LE.
> (altivec_vmrghh_direct): Remove.
> (altivec_vmrghh_direct_be): New pattern for BE.
> (altivec_vmrglh_direct_le): New pattern for LE.
> (altivec_vmrghw_direct_): Remove.
> (altivec_vmrghw_direct__be): New pattern for BE.
> (altivec_vmrglw_direct__le): New pattern for LE.
> (altivec_vmrglb_direct): Remove.
> (altivec_vmrglb_direct_be): New pattern for BE.
> (altivec_vmrghb_direct_le): New pattern for LE.
> (altivec_vmrglh_direct): Remove.
> (altivec_vmrglh_direct_be): New pattern for BE.
> (altivec_vmrghh_direct_le): New pattern for LE.
> (altivec_vmrglw_direct_): Remove.
> (altivec_vmrglw_direct__be): New pattern for BE.
> (altivec_vmrghw_direct__le): New pattern for LE.
> * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
> Adjust.
> * config/rs6000/vsx.md: Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> PR target/106069
> * g++.target/powerpc/pr106069.C: New test.
> 
> Signed-off-by: Xionghu Luo 
> ---
>  gcc/config/rs6000/altivec.md    | 223 ++--
>  gcc/config/rs6000/rs6000.cc |  36 ++--
>  gcc/config/rs6000/vsx.md    |  24 +--
>  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++
>  4 files changed, 305 insertions(+), 98 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..78245f470e9 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -    : gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (16, GE

Re: [RFC]rs6000: split complicated constant to memory

2022-08-15 Thread Jiufu Guo via Gcc-patches
Jiufu Guo  writes:

> Hi,
>
> Richard Biener  writes:
>
>> On Mon, Aug 15, 2022 at 7:26 AM Jiufu Guo via Gcc-patches
>>  wrote:
>>>
>>> Hi,
>>>
>>> This patch tries to put the constant into constant pool if building the
>>> constant requires 3 or more instructions.
>>>
>>> But there is a concern: I'm wondering if this patch is really profitable.
>>>
>>> Because, as I tested: 1. for a simple case, if instructions cannot run
>>> in parallel, loading the constant from memory may be faster; but 2. if
>>> some instructions can run in parallel, loading the constant from memory
>>> is not a win compared with building the constant.  See the examples below.
>>>
>>> For f1.c and f3.c, 'loading' the constant would be acceptable in runtime
>>> terms; for f2.c and f4.c, 'loading' the constant is visibly slower.
>>>
>>> For real-world cases, both kinds of code sequences exist.
>>>
>>> So, I'm not sure if we need to push this patch.
>>>
>>> Run the functions below a lot of times (10) to check runtime.
>>> f1.c:
>>> long foo (long *arg, long*, long *)
>>> {
>>>   *arg = 0x12345678;
>>> }
>>> asm building constant:
>>> lis 10,0x1234
>>> ori 10,10,0x5678
>>> sldi 10,10,32
>>> vs.  asm loading
>>> addis 10,2,.LC0@toc@ha
>>> ld 10,.LC0@toc@l(10)
>>> The runtime between 'building' and 'loading' is similar: sometimes
>>> 'building' is faster; sometimes 'loading' is faster.  And the difference
>>> is slight.
>>
>> I wonder if it is possible to decide this during scheduling - chose the
>> variant that, when the result is needed, is cheaper?  Post-RA might
>> be a bit difficult (I see the load from memory needs the TOC, but then
>> when the TOC is not available we could just always emit the build form),
>> and pre-reload precision might be not good enough to make this worth
>> the experiment?
> Thanks a lot for your comments!
>
> Yes, Post-RA may not handle all cases.
> If there is no TOC available, we are not able to load the constant through
> the TOC.  As Segher pointed out, crtl->uses_const_pool may be an
> approximate way to check.
> The sched2 pass can optimize some cases (e.g. f2.c and f4.c), but in
> some cases it may not distribute those 'building' instructions.
>
> So, maybe we could add a peephole after sched2: if the five instructions
> building the constant are still consecutive, replace them with a 'load'
> (after checking that the TOC is available).
> But I'm not sure if it is worthwhile.

Oh, checking the object files (from GCC bootstrap and SPEC), it is rare
that the five instructions are consecutive.  Often 1 (or 2) insns are
distributed elsewhere, and the other 4 (or 3) instructions are consecutive.
So, a peephole may not be very helpful.

BR,
Jeff(Jiufu)

>
>>
>> Of course the scheduler might lack on the technical side as well.
>
>
> BR,
> Jeff(Jiufu)
>
>>
>>>
>>> f2.c
>>> long foo (long *arg, long *arg2, long *arg3)
>>> {
>>>   *arg = 0x12345678;
>>>   *arg2 = 0x79652347;
>>>   *arg3 = 0x46891237;
>>> }
>>> asm building constant:
>>> lis 7,0x1234
>>> lis 10,0x7965
>>> lis 9,0x4689
>>> ori 7,7,0x5678
>>> ori 10,10,0x2347
>>> ori 9,9,0x1237
>>> sldi 7,7,32
>>> sldi 10,10,32
>>> sldi 9,9,32
>>> vs. loading
>>> addis 7,2,.LC0@toc@ha
>>> addis 10,2,.LC1@toc@ha
>>> addis 9,2,.LC2@toc@ha
>>> ld 7,.LC0@toc@l(7)
>>> ld 10,.LC1@toc@l(10)
>>> ld 9,.LC2@toc@l(9)
>>> For this case, 'loading' is always slower than 'building' (>15%).
>>>
>>> f3.c
>>> long foo (long *arg, long *, long *)
>>> {
>>>   *arg = 384307168202282325;
>>> }
>>> lis 10,0x555
>>> ori 10,10,0x
>>> sldi 10,10,32
>>> oris 10,10,0x
>>> ori 10,10,0x
>>> For this case, 'building' (through 5 instructions) is slower, and 'loading'
>>> is faster by ~5%;
>>>
>>> f4.c
>>> long foo (long *arg, long *arg2, long *arg3)
>>> {
>>>   *arg = 384307168202282325;
>>>   *arg2 = -6148914691236517205;
>>>   *arg3 = 768614336404564651;
>>> }
>>> lis 7,0x555
>>> lis 10,0x
>>> lis 9,0xaaa
>>> ori 7,7,0x
>>> ori 10,10,0x
>>> ori 9,9,0x
>>> sldi 7,7,32
>>> sldi 10,10,32
>>> sldi 9,9,32
>>> oris 7,7,0x
>>> oris 10,10,0x
>>> oris 9,9,0x
>>> ori 7,7,0x
>>> ori 10,10,0xaaab
>>> ori 9,9,0xaaab
>>> For this case, since 'building' the constants runs in parallel, 'loading' is
>>> slower: ~8%. On p10, 'loading' (through 'pld') is also slower, by >4%.
>>>
>>>
>>> BR,
>>> Jeff(Jiufu)
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000.cc| 14 ++
>>>  gcc/testsuite/gcc.target/powerpc/pr63281.c | 11 +++
>>>  2 files changed, 25 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 4b727d2a500..3798e11bdbc 100644
>>> --- a

[PATCH][pushed] jobserver: fix fifo mode by opening pipe in proper mode

2022-08-15 Thread Martin Liška
The current jobserver_info relies on non-blocking FDs,
thus open the pipe in such a mode.

Tested locally with a GCC LTO bootstrap that was stuck before this revision.

I'm going to push the change.

Martin

gcc/ChangeLog:

* opts-common.cc (jobserver_info::connect): Open fifo
in non-blocking mode.
---
 gcc/opts-common.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/opts-common.cc b/gcc/opts-common.cc
index 5d79f174a38..4dec9f94447 100644
--- a/gcc/opts-common.cc
+++ b/gcc/opts-common.cc
@@ -2064,7 +2064,7 @@ void
 jobserver_info::connect ()
 {
   if (!pipe_path.empty ())
-pipefd = open (pipe_path.c_str (), O_RDWR);
+pipefd = open (pipe_path.c_str (), O_RDWR | O_NONBLOCK);
 }
 
 void
-- 
2.37.1



Re: [PATCH v3] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-08-15 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the review!

on 2022/8/16 05:30, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Jun 27, 2022 at 10:47:26AM +0800, Kewen.Lin wrote:
>> on 2022/6/25 00:49, Segher Boessenkool wrote:
>> Many thanks for all the further explanation above!  The attached patch
>> updates the misleading comments as you pointed out and suggested; could
>> you take another look?
> 
> Please do not send proposed patches in the middle of a mail thread.  It
> is harder to track things than needed this way, and it makes it hard to
> apply your patches as well (for test builds, say).
> 

Got it!

>> Subject: [PATCH] rs6000: Adjust mov optabs for opaque modes [PR103353]
>>
>> As PR103353 shows, we may want to continue to expand built-in
>> function __builtin_vsx_lxvp, even if we have already emitted
>> error messages about some missing required conditions.  As
>> shown in that PR, without one explicit mov optab on OOmode
>> provided, it would call emit_move_insn recursively.
> 
>> -  rs6000_emit_move (operands[0], operands[1], OOmode);
>> -  DONE;
>> +  if (TARGET_MMA) {
>> +rs6000_emit_move (operands[0], operands[1], OOmode);
>> +DONE;
>> +  }
> 
>   if (TARGET_MMA)
> {
>   rs6000_emit_move (operands[0], operands[1], OOmode);
>   DONE;
> }
> 

Good catch, done!

>> +  /* PR103353 shows we may want to continue to expand the __builtin_vsx_lxvp
>> + built-in function, even if we have already emitted error messages about
>> + some missing required conditions.  As shown in that PR, without one
>> + explicit mov optab on OOmode provided, it would call emit_move_insn
>> + recursively.  So we allow this pattern to be generated when we are
>> + expanding to RTL and have seen errors.  It would not cause further ICEs
>> + as the compilation would stop soon after expanding.  */
>> +  else if (currently_expanding_to_rtl && seen_error ())
>> +;
> 
> The comment goes inside this "else if" branch.  Maybe make it a braced
> block if the semicolon looks out of place without it.
> 

Done.

> Removing the TARGET_MMA requirement is the correct thing to do no matter
> what: this is not an MMA insn at all after all, and neither is it only
> useful if MMA is enabled, etc.

Yeah, it needs one subsequent patch as we agree on.

> 
> Is the later "unreachable" ever useful?  If not, you could just fall
> through after that "if (TARGET_MMA)" thing.  Worst that will happen you
> get an OO move in the insn stream that we will error on later :-)

Yeah, I agree that "just FAIL" works for this issue, but for now, without
the valid condition in the context, we only expect this expansion to be
performed for the bif's further expansion (knowing it's taken as bad); the
"unreachable" is meant to punt on the other unexpected cases.  :)

> 
> The patch is okay for trunk with the indentation fixed.  Thanks!
> 

Indentation and comments problems addressed, committed in r13-2062.

Thanks again!

BR,
Kewen


Re: [PATCH] xtensa: Turn on -fsplit-wide-types-early by default

2022-08-15 Thread Max Filippov via Gcc-patches
On Sun, Aug 14, 2022 at 2:31 AM Takayuki 'January June' Suwa
 wrote:
>
> Since GCC10, the "subreg2" optimization pass was no longer tied to enabling
> "subreg1" unless -fsplit-wide-types-early was turned on (PR88233).  However
> on the Xtensa port, the lack of "subreg2" can degrade the quality of the
> output code, especially for those that produce many D[FC]mode pseudos.
>
> This patch turns on -fsplit-wide-types-early by default in order to restore
> the previous behavior.
>
> gcc/ChangeLog:
>
> * common/config/xtensa/xtensa-common.cc
> (xtensa_option_optimization_table): Add OPT_fsplit_wide_types_early
> for OPT_LEVELS_ALL in order to restore pre-GCC10 behavior.
> ---
>  gcc/common/config/xtensa/xtensa-common.cc | 2 ++
>  1 file changed, 2 insertions(+)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [RFC]rs6000: split complicated constant to memory

2022-08-15 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Mon, Aug 15, 2022 at 7:26 AM Jiufu Guo via Gcc-patches
>  wrote:
>>
>> Hi,
>>
>> This patch tries to put the constant into constant pool if building the
>> constant requires 3 or more instructions.
>>
>> But there is a concern: I'm wondering if this patch is really profitable.
>>
>> Because, as I tested: 1. for a simple case, if instructions cannot run
>> in parallel, loading the constant from memory may be faster; but 2. if
>> some instructions can run in parallel, loading the constant from memory
>> is not a win compared with building the constant.  See the examples below.
>>
>> For f1.c and f3.c, 'loading' the constant would be acceptable in runtime
>> terms; for f2.c and f4.c, 'loading' the constant is visibly slower.
>>
>> For real-world cases, both kinds of code sequences exist.
>>
>> So, I'm not sure if we need to push this patch.
>>
>> Run the functions below a lot of times (10) to check runtime.
>> f1.c:
>> long foo (long *arg, long*, long *)
>> {
>>   *arg = 0x12345678;
>> }
>> asm building constant:
>> lis 10,0x1234
>> ori 10,10,0x5678
>> sldi 10,10,32
>> vs.  asm loading
>> addis 10,2,.LC0@toc@ha
>> ld 10,.LC0@toc@l(10)
>> The runtime between 'building' and 'loading' is similar: sometimes
>> 'building' is faster; sometimes 'loading' is faster.  And the difference
>> is slight.
>
> I wonder if it is possible to decide this during scheduling - chose the
> variant that, when the result is needed, is cheaper?  Post-RA might
> be a bit difficult (I see the load from memory needs the TOC, but then
> when the TOC is not available we could just always emit the build form),
> and pre-reload precision might be not good enough to make this worth
> the experiment?
Thanks a lot for your comments!

Yes, Post-RA may not handle all cases.
If there is no TOC available, we are not able to load the constant through
the TOC.  As Segher pointed out, crtl->uses_const_pool may be an
approximate way to check.
The sched2 pass can optimize some cases (e.g. f2.c and f4.c), but in
some cases it may not distribute those 'building' instructions.

So, maybe we could add a peephole after sched2: if the five instructions
building the constant are still consecutive, replace them with a 'load'
(after checking that the TOC is available).
But I'm not sure if it is worthwhile.

>
> Of course the scheduler might lack on the technical side as well.


BR,
Jeff(Jiufu)

>
>>
>> f2.c
>> long foo (long *arg, long *arg2, long *arg3)
>> {
>>   *arg = 0x12345678;
>>   *arg2 = 0x79652347;
>>   *arg3 = 0x46891237;
>> }
>> asm building constant:
>> lis 7,0x1234
>> lis 10,0x7965
>> lis 9,0x4689
>> ori 7,7,0x5678
>> ori 10,10,0x2347
>> ori 9,9,0x1237
>> sldi 7,7,32
>> sldi 10,10,32
>> sldi 9,9,32
>> vs. loading
>> addis 7,2,.LC0@toc@ha
>> addis 10,2,.LC1@toc@ha
>> addis 9,2,.LC2@toc@ha
>> ld 7,.LC0@toc@l(7)
>> ld 10,.LC1@toc@l(10)
>> ld 9,.LC2@toc@l(9)
>> For this case, 'loading' is always slower than 'building' (>15%).
>>
>> f3.c
>> long foo (long *arg, long *, long *)
>> {
>>   *arg = 384307168202282325;
>> }
>> lis 10,0x555
>> ori 10,10,0x
>> sldi 10,10,32
>> oris 10,10,0x
>> ori 10,10,0x
>> For this case, 'building' (through 5 instructions) is slower, and 'loading'
>> is faster by ~5%;
>>
>> f4.c
>> long foo (long *arg, long *arg2, long *arg3)
>> {
>>   *arg = 384307168202282325;
>>   *arg2 = -6148914691236517205;
>>   *arg3 = 768614336404564651;
>> }
>> lis 7,0x555
>> lis 10,0x
>> lis 9,0xaaa
>> ori 7,7,0x
>> ori 10,10,0x
>> ori 9,9,0x
>> sldi 7,7,32
>> sldi 10,10,32
>> sldi 9,9,32
>> oris 7,7,0x
>> oris 10,10,0x
>> oris 9,9,0x
>> ori 7,7,0x
>> ori 10,10,0xaaab
>> ori 9,9,0xaaab
>> For this case, since 'building' the constants runs in parallel, 'loading' is
>> slower: ~8%. On p10, 'loading' (through 'pld') is also slower, by >4%.
>>
>>
>> BR,
>> Jeff(Jiufu)
>>
>> ---
>>  gcc/config/rs6000/rs6000.cc| 14 ++
>>  gcc/testsuite/gcc.target/powerpc/pr63281.c | 11 +++
>>  2 files changed, 25 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
>>
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 4b727d2a500..3798e11bdbc 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -10098,6 +10098,20 @@ rs6000_emit_set_const (rtx dest, rtx source)
>>   c = ((c & 0x) ^ 0x8000) - 0x8000;
>>   emit_move_insn (lo, GEN_INT (c));
>> }
>> +  else if (base_reg_operand (dest, mode)
>> +  && num_insns_constant (source, mode) > 2)
>> +   {
>> + rtx sym = force_const_mem (mode, source);
>> +   

Re: [PATCH V2] arm: add -static-pie support

2022-08-15 Thread Lance Fredrickson via Gcc-patches

Yes, this is a first submission.
Yes, I guess I was going for DCO rules.
I will look at running the test suite.  Does this need to be done on the
target?  Because my arm target is a measly dual-core 1 GHz embedded chip
with low RAM (a Netgear R7000 router, actually).


regards
Lance

On 8/10/2022 2:30 PM, Ramana Radhakrishnan wrote:

Hi Lance,

Thanks for your contribution - looks like your first one to GCC ?

The patch looks good to me, though it should probably go through a
full test suite run on arm-linux-gnueabihf and get a ChangeLog - see
here for more https://gcc.gnu.org/contribute.html#patches.

This is probably small enough to go under the 10 line rule but since
you've used Signed-off-by in your patch, is that indicating you are
contributing under DCO rules -
https://gcc.gnu.org/contribute.html#legal ?

regards
Ramana


On Thu, Aug 4, 2022 at 5:48 PM Lance Fredrickson via Gcc-patches
 wrote:

Just a follow up trying to get some eyes on my static-pie patch
submission for arm.
Feedback welcome.
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598610.html

thanks,
Lance Fredrickson




[PATCH] RISC-V missing __builtin_lceil and __builtin_lfloor

2022-08-15 Thread Kevin Lee
Hello,
Currently, __builtin_lceil and __builtin_lfloor don't generate the
existing instruction fcvt, but rather call ceil and floor from the
library.  This patch adds the missing iterator and attributes for lceil
and lfloor to produce the optimized code.
The test cases check the correct generation of the fcvt instruction for
float/double to int/long/long long conversions.  Passed the tests on
riscv-linux.  Could this patch be committed?

gcc/ChangeLog:
   Michael Collison  
* config/riscv/riscv.md (RINT): Add UNSPEC_LCEIL and UNSPEC_LFLOOR to iterator.
(rint_pattern): Add ceil and floor.
(rint_rm): Add rup and rdn.

gcc/testsuite/ChangeLog:
Kevin Lee  
* gcc.target/riscv/lfloor-lceil.c: New test.
---
 gcc/config/riscv/riscv.md | 13 ++-
 gcc/testsuite/gcc.target/riscv/lfloor-lceil.c | 79 +++
 2 files changed, 88 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/lfloor-lceil.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index c6399b1389e..070004fa7fe 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -43,6 +43,9 @@ (define_c_enum "unspec" [
   UNSPEC_LRINT
   UNSPEC_LROUND

+  UNSPEC_LCEIL
+  UNSPEC_LFLOOR
+
   ;; Stack tie
   UNSPEC_TIE
 ])
@@ -345,10 +348,12 @@ (define_mode_attr UNITMODE [(SF "SF") (DF "DF")])
 ;; the controlling mode.
 (define_mode_attr HALFMODE [(DF "SI") (DI "SI") (TF "DI")])

-;; Iterator and attributes for floating-point rounding instructions.
-(define_int_iterator RINT [UNSPEC_LRINT UNSPEC_LROUND])
-(define_int_attr rint_pattern [(UNSPEC_LRINT "rint") (UNSPEC_LROUND
"round")])
-(define_int_attr rint_rm [(UNSPEC_LRINT "dyn") (UNSPEC_LROUND "rmm")])
+;; Iterator and attributes for floating-point rounding instructions.f
+(define_int_iterator RINT [UNSPEC_LRINT UNSPEC_LROUND UNSPEC_LCEIL
UNSPEC_LFLOOR])
+(define_int_attr rint_pattern [(UNSPEC_LRINT "rint") (UNSPEC_LROUND
"round")
+ (UNSPEC_LCEIL "ceil") (UNSPEC_LFLOOR
"floor")])
+(define_int_attr rint_rm [(UNSPEC_LRINT "dyn") (UNSPEC_LROUND "rmm")
+(UNSPEC_LCEIL "rup") (UNSPEC_LFLOOR "rdn")])

 ;; Iterator and attributes for quiet comparisons.
 (define_int_iterator QUIET_COMPARISON [UNSPEC_FLT_QUIET UNSPEC_FLE_QUIET])
diff --git a/gcc/testsuite/gcc.target/riscv/lfloor-lceil.c
b/gcc/testsuite/gcc.target/riscv/lfloor-lceil.c
new file mode 100644
index 000..4d81c12cefa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/lfloor-lceil.c
@@ -0,0 +1,79 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int
+ceil1(float i)
+{
+  return __builtin_lceil(i);
+}
+
+long
+ceil2(float i)
+{
+  return __builtin_lceil(i);
+}
+
+long long
+ceil3(float i)
+{
+  return __builtin_lceil(i);
+}
+
+int
+ceil4(double i)
+{
+  return __builtin_lceil(i);
+}
+
+long
+ceil5(double i)
+{
+  return __builtin_lceil(i);
+}
+
+long long
+ceil6(double i)
+{
+  return __builtin_lceil(i);
+}
+
+int
+floor1(float i)
+{
+  return __builtin_lfloor(i);
+}
+
+long
+floor2(float i)
+{
+  return __builtin_lfloor(i);
+}
+
+long long
+floor3(float i)
+{
+  return __builtin_lfloor(i);
+}
+
+int
+floor4(double i)
+{
+  return __builtin_lfloor(i);
+}
+
+long
+floor5(double i)
+{
+  return __builtin_lfloor(i);
+}
+
+long long
+floor6(double i)
+{
+  return __builtin_lfloor(i);
+}
+
+/* { dg-final { scan-assembler-times "fcvt.l.s" 6 } } */
+/* { dg-final { scan-assembler-times "fcvt.l.d" 6 } } */
+/* { dg-final { scan-assembler-not "call" } } */
-- 
2.25.1


Re: [PATCH v3] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-08-15 Thread Segher Boessenkool
Hi!

On Mon, Jun 27, 2022 at 10:47:26AM +0800, Kewen.Lin wrote:
> on 2022/6/25 00:49, Segher Boessenkool wrote:
> Many thanks for all the further explanation above!  The attached patch
> updates the misleading comments as you pointed out and suggested; could
> you take another look?

Please do not send proposed patches in the middle of a mail thread.  It
is harder to track things than needed this way, and it makes it hard to
apply your patches as well (for test builds, say).

> Subject: [PATCH] rs6000: Adjust mov optabs for opaque modes [PR103353]
> 
> As PR103353 shows, we may want to continue to expand built-in
> function __builtin_vsx_lxvp, even if we have already emitted
> error messages about some missing required conditions.  As
> shown in that PR, without one explicit mov optab on OOmode
> provided, it would call emit_move_insn recursively.

> -  rs6000_emit_move (operands[0], operands[1], OOmode);
> -  DONE;
> +  if (TARGET_MMA) {
> +rs6000_emit_move (operands[0], operands[1], OOmode);
> +DONE;
> +  }

  if (TARGET_MMA)
{
  rs6000_emit_move (operands[0], operands[1], OOmode);
  DONE;
}

> +  /* PR103353 shows we may want to continue to expand the __builtin_vsx_lxvp
> + built-in function, even if we have already emitted error messages about
> + some missing required conditions.  As shown in that PR, without one
> + explicit mov optab on OOmode provided, it would call emit_move_insn
> + recursively.  So we allow this pattern to be generated when we are
> + expanding to RTL and have seen errors.  It would not cause further ICEs
> + as the compilation would stop soon after expanding.  */
> +  else if (currently_expanding_to_rtl && seen_error ())
> +;

The comment goes inside this "else if" branch.  Maybe make it a braced
block if the semicolon looks out of place without it.

Removing the TARGET_MMA requirement is the correct thing to do no matter
what: this is not an MMA insn at all after all, and neither is it only
useful if MMA is enabled, etc.

Is the later "unreachable" ever useful?  If not, you could just fall
through after that "if (TARGET_MMA)" thing.  Worst that will happen you
get an OO move in the insn stream that we will error on later :-)

The patch is okay for trunk with the indentation fixed.  Thanks!


Segher


Re: [RFC]rs6000: split complicated constant to memory

2022-08-15 Thread Segher Boessenkool
Hi!

On Mon, Aug 15, 2022 at 01:25:19PM +0800, Jiufu Guo wrote:
> This patch tries to put the constant into constant pool if building the
> constant requires 3 or more instructions.
> 
> But there is a concern: I'm wondering if this patch is really profitable.
> 
> Because, as I tested: 1. for a simple case, if instructions cannot run
> in parallel, loading the constant from memory may be faster; but 2. if
> some instructions can run in parallel, loading the constant from memory
> is not a win compared with building the constant.  See the examples below.
> 
> For f1.c and f3.c, 'loading' the constant would be acceptable in runtime
> terms; for f2.c and f4.c, 'loading' the constant is visibly slower.
> 
> For real-world cases, both kinds of code sequences exist.
> 
> So, I'm not sure if we need to push this patch.
> 
> Run the functions below a lot of times (10) to check runtime.
> f1.c:
> long foo (long *arg, long*, long *)
> {
>   *arg = 0x12345678;
> }
> asm building constant:
>   lis 10,0x1234
>   ori 10,10,0x5678
>   sldi 10,10,32
> vs.  asm loading
>   addis 10,2,.LC0@toc@ha
>   ld 10,.LC0@toc@l(10)

This is just a load insn, unless this is the only thing needing the TOC.
You can use crtl->uses_const_pool as an approximation here, to figure
out if we have that case?

> The runtime between 'building' and 'loading' is similar: sometimes
> 'building' is faster; sometimes 'loading' is faster.  And the difference
> is slight.

When there is only one constant, sure.  But that isn't the expensive
case we need to avoid :-)

>   addis 9,2,.LC2@toc@ha
>   ld 7,.LC0@toc@l(7)
>   ld 10,.LC1@toc@l(10)
>   ld 9,.LC2@toc@l(9)
> For this case, 'loading' is always slower than 'building' (>15%).

Only if there is nothing else to do, and only in cases where code size
does not matter (i.e. microbenchmarks).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
> @@ -0,0 +1,11 @@
> +/* PR target/63281 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -std=c99" } */

Why std=c99 btw?  The default is c17.  Is there something we need to
disable here?


Segher


Re: [PATCH] Implement __builtin_issignaling

2022-08-15 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 15, 2022 at 12:12 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> The following patch implements a new builtin, __builtin_issignaling,
> which can be used to implement the ISO/IEC TS 18661-1  issignaling
> macro.
>
> It is implemented as type-generic function, so there is just one
> builtin, not many with various suffixes.
> This patch doesn't address PR56831 nor PR58416, but I think, compared to
> using the glibc issignaling macro, it could make some cases better (as
> the builtin is always expanded inline and for SFmode/DFmode just
> reinterprets a memory or pseudo register as SImode/DImode, so it could
> avoid some raising of exceptions + turning an sNaN into a qNaN before the
> builtin can analyze it).
>
> For floating point modes that do not have NaNs it will return 0;
> otherwise I've tried to implement this for all the other supported
> real formats.
> It handles both the MIPS/PA floats, where an sNaN has the mantissa
> MSB set, and the rest, where an sNaN has it cleared, with the exception
> of formats which are known never to be in the MIPS/PA form.
> The MIPS/PA floats are handled using a test like
> (x & mask) == mask,
> the other usually as
> ((x ^ bit) & mask) > val
> where bit, mask and val are some constants.
> IBM double double is done by doing DFmode test on the most significant
> half, and Intel/Motorola extended (12 or 16 bytes) and IEEE quad are
> handled by extracting 32-bit/16-bit words or 64-bit parts from the
> value and testing those.
> On x86, XFmode is handled by a special optab so that even pseudo numbers
> are considered signaling, like in glibc and like the i386 specific testcase
> tests.
>
> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux and
> powerpc64-linux (the last tested with -m32/-m64), ok for trunk?
>
> 2022-08-15  Jakub Jelinek  
>
> gcc/
> * builtins.def (BUILT_IN_ISSIGNALING): New built-in.
> * builtins.cc (expand_builtin_issignaling): New function.
> (expand_builtin_signbit): Don't overwrite target.
> (expand_builtin): Handle BUILT_IN_ISSIGNALING.
> (fold_builtin_classify): Likewise.
> (fold_builtin_1): Likewise.
> * optabs.def (issignaling_optab): New.
> * fold-const-call.cc (fold_const_call_ss): Handle
> BUILT_IN_ISSIGNALING.
> * config/i386/i386.md (issignalingxf2): New expander.
> * doc/extend.texi (__builtin_issignaling): Document.
> * doc/md.texi (issignaling2): Likewise.
> gcc/c-family/
> * c-common.cc (check_builtin_function_arguments): Handle
> BUILT_IN_ISSIGNALING.
> gcc/c/
> * c-typeck.cc (convert_arguments): Handle BUILT_IN_ISSIGNALING.
> gcc/fortran/
> * f95-lang.cc (gfc_init_builtin_functions): Initialize
> BUILT_IN_ISSIGNALING.
> gcc/testsuite/
> * gcc.dg/torture/builtin-issignaling-1.c: New test.
> * gcc.dg/torture/builtin-issignaling-2.c: New test.
> * gcc.target/i386/builtin-issignaling-1.c: New test.

OK for x86 part.

Thanks,
Uros.

>
> --- gcc/builtins.def.jj 2022-01-11 23:11:21.548301986 +0100
> +++ gcc/builtins.def2022-08-11 12:15:14.200908656 +0200
> @@ -908,6 +908,7 @@ DEF_GCC_BUILTIN(BUILT_IN_ISLESS,
>  DEF_GCC_BUILTIN(BUILT_IN_ISLESSEQUAL, "islessequal", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
>  DEF_GCC_BUILTIN(BUILT_IN_ISLESSGREATER, "islessgreater", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
>  DEF_GCC_BUILTIN(BUILT_IN_ISUNORDERED, "isunordered", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
> +DEF_GCC_BUILTIN(BUILT_IN_ISSIGNALING, "issignaling", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
>  DEF_LIB_BUILTIN(BUILT_IN_LABS, "labs", BT_FN_LONG_LONG, ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_C99_BUILTIN(BUILT_IN_LLABS, "llabs", BT_FN_LONGLONG_LONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_LONGJMP, "longjmp", BT_FN_VOID_PTR_INT, ATTR_NORETURN_NOTHROW_LIST)
> --- gcc/builtins.cc.jj  2022-07-26 10:32:23.250277352 +0200
> +++ gcc/builtins.cc 2022-08-12 17:13:06.158423558 +0200
> @@ -123,6 +123,7 @@ static rtx expand_builtin_fegetround (tr
>  static rtx expand_builtin_feclear_feraise_except (tree, rtx, machine_mode,
>   optab);
>  static rtx expand_builtin_cexpi (tree, rtx);
> +static rtx expand_builtin_issignaling (tree, rtx);
>  static rtx expand_builtin_int_roundingfn (tree, rtx);
>  static rtx expand_builtin_int_roundingfn_2 (tree, rtx);
>  static rtx expand_builtin_next_arg (void);
> @@ -2747,6 +2748,294 @@ build_call_nofold_loc (location_t loc, t
>return fn;
>  }
>
> +/* Expand the __builtin_issignaling builtin.  This needs to handle
> +   all floating point formats that do support NaNs (for those that
> +   don't it just sets target to 0).  */
> +
> +static rtx
> +expand_builtin_issignaling (tree exp, rtx target)
> +{
> +  if (!validate_arglist (exp, REAL_TYPE, VOID_TYPE)

[committed] d: Defer compiling inline definitions until after the module has finished.

2022-08-15 Thread Iain Buclaw via Gcc-patches
Hi,

This patch is a follow-up to r13-2002 and r12-8673 (PR d/106563), which
instead pushes inline definitions onto a deferred list to be compiled
after the current module has finished.

This prevents the case where, while generating the methods of a struct
type, we accidentally emit an inline function that references that
struct while the outer struct itself is still incomplete.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32,
committed to mainline, and backported to the releases/gcc-12 branch.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-tree.h (d_defer_declaration): Declare.
* decl.cc (function_needs_inline_definition_p): Defer checking
DECL_UNINLINABLE and DECL_DECLARED_INLINE_P.
(maybe_build_decl_tree): Call d_defer_declaration instead of
build_decl_tree.
* modules.cc (deferred_inline_declarations): New variable.
(build_module_tree): Set deferred_inline_declarations and handle
declarations pushed to it.
(d_defer_declaration): New function.
---
 gcc/d/d-tree.h   |  1 +
 gcc/d/decl.cc| 22 +++---
 gcc/d/modules.cc | 20 
 3 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/gcc/d/d-tree.h b/gcc/d/d-tree.h
index 809a242ea93..4885cfe2b15 100644
--- a/gcc/d/d-tree.h
+++ b/gcc/d/d-tree.h
@@ -673,6 +673,7 @@ extern tree maybe_expand_intrinsic (tree);
 extern void build_module_tree (Module *);
 extern tree d_module_context (void);
 extern void register_module_decl (Declaration *);
+extern void d_defer_declaration (Declaration *);
 extern void d_finish_compilation (tree *, int);
 
 /* In runtime.cc.  */
diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index 0131b01dcc9..e91aee30845 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -1046,18 +1046,10 @@ function_needs_inline_definition_p (FuncDeclaration *fd)
   if (!DECL_EXTERNAL (fd->csym))
 return false;
 
-  /* Non-inlineable functions are always external.  */
-  if (DECL_UNINLINABLE (fd->csym))
-return false;
-
   /* No function body available for inlining.  */
   if (!fd->fbody)
 return false;
 
-  /* Ignore functions that aren't decorated with `pragma(inline)'.  */
-  if (fd->inlining != PINLINE::always)
-return false;
-
   /* These functions are tied to the module they are defined in.  */
   if (fd->isFuncLiteralDeclaration ()
   || fd->isUnitTestDeclaration ()
@@ -1070,6 +1062,14 @@ function_needs_inline_definition_p (FuncDeclaration *fd)
   if (function_defined_in_root_p (fd))
 return false;
 
+  /* Non-inlineable functions are always external.  */
+  if (DECL_UNINLINABLE (fd->csym))
+return false;
+
+  /* Ignore functions that aren't decorated with `pragma(inline)'.  */
+  if (!DECL_DECLARED_INLINE_P (fd->csym))
+return false;
+
   /* Weak functions cannot be inlined.  */
   if (lookup_attribute ("weak", DECL_ATTRIBUTES (fd->csym)))
 return false;
@@ -1081,8 +1081,8 @@ function_needs_inline_definition_p (FuncDeclaration *fd)
   return true;
 }
 
-/* If the variable or function declaration in DECL needs to be defined, call
-   build_decl_tree on it now before returning its back-end symbol.  */
+/* If the variable or function declaration in DECL needs to be defined, add it
+   to the list of deferred declarations to build later.  */
 
 static tree
 maybe_build_decl_tree (Declaration *decl)
@@ -1103,7 +1103,7 @@ maybe_build_decl_tree (Declaration *decl)
   if (function_needs_inline_definition_p (fd))
{
  DECL_EXTERNAL (fd->csym) = 0;
- build_decl_tree (fd);
+ d_defer_declaration (fd);
}
 }
 
diff --git a/gcc/d/modules.cc b/gcc/d/modules.cc
index edc79122365..0aac8fe3545 100644
--- a/gcc/d/modules.cc
+++ b/gcc/d/modules.cc
@@ -121,6 +121,9 @@ static module_info *current_testing_module;
 
 static Module *current_module_decl;
 
+/* Any inline symbols that were deferred during codegen.  */
+vec<Declaration *> *deferred_inline_declarations;
+
 /* Returns an internal function identified by IDENT.  This is used
by both module initialization and dso handlers.  */
 
@@ -724,6 +727,9 @@ build_module_tree (Module *decl)
   current_testing_module = &mitest;
   current_module_decl = decl;
 
+  vec<Declaration *> deferred_decls = vNULL;
+  deferred_inline_declarations = &deferred_decls;
+
   /* Layout module members.  */
   if (decl->members)
 {
@@ -811,9 +817,14 @@ build_module_tree (Module *decl)
   layout_moduleinfo (decl);
 }
 
+  /* Process all deferred functions after finishing module.  */
+  for (size_t i = 0; i < deferred_decls.length (); ++i)
+build_decl_tree (deferred_decls[i]);
+
   current_moduleinfo = NULL;
   current_testing_module = NULL;
   current_module_decl = NULL;
+  deferred_inline_declarations = NULL;
 }
 
 /* Returns the current function or module context for the purpose
@@ -888,6 +899,15 @@ register_module_decl (Declaration *d)
 }
 }
 
+/* Add DECL as a declaration to emit at the end of the current module.  */
+
+void
+d_defer_declarat

[committed] d: Fix internal compiler error: Segmentation fault at gimple-expr.cc:88

2022-08-15 Thread Iain Buclaw via Gcc-patches
Hi,

This patch fixes an ICE in the middle-end caused by the D front-end's
code generation for the special enum representing native complex types.

Because complex types are deprecated in the language, the new way to
expose native complex types is by defining an enum with a base type of a
library-defined struct that is implicitly treated as if it were native.
As casts are not implicitly added by the front-end when downcasting from
an enum to its underlying type, we must insert an explicit cast during
the code generation pass.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32/-mx32,
committed to mainline and backported to the releases/gcc-12 branch.

Regards,
Iain.

---
PR d/106623

gcc/d/ChangeLog:

* d-codegen.cc (underlying_complex_expr): New function.
(d_build_call): Handle passing native complex objects as the
library-defined equivalent.
* d-tree.h (underlying_complex_expr): Declare.
* expr.cc (ExprVisitor::visit (DotVarExp *)): Call
underlying_complex_expr instead of build_vconvert.

gcc/testsuite/ChangeLog:

* gdc.dg/torture/pr106623.d: New test.
---
 gcc/d/d-codegen.cc  | 34 +
 gcc/d/d-tree.h  |  1 +
 gcc/d/expr.cc   |  2 +-
 gcc/testsuite/gdc.dg/torture/pr106623.d | 28 
 4 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gdc.dg/torture/pr106623.d

diff --git a/gcc/d/d-codegen.cc b/gcc/d/d-codegen.cc
index d02da1f81e3..aa6bc9e53e4 100644
--- a/gcc/d/d-codegen.cc
+++ b/gcc/d/d-codegen.cc
@@ -1588,6 +1588,32 @@ complex_expr (tree type, tree re, tree im)
  type, re, im);
 }
 
+/* Build a two-field record TYPE representing the complex expression EXPR.  */
+
+tree
+underlying_complex_expr (tree type, tree expr)
+{
+  gcc_assert (list_length (TYPE_FIELDS (type)) == 2);
+
+  expr = d_save_expr (expr);
+
+  /* Build a constructor from the real and imaginary parts.  */
+  if (COMPLEX_FLOAT_TYPE_P (TREE_TYPE (expr))
+      && (!INDIRECT_REF_P (expr)
+	  || !CONVERT_EXPR_CODE_P (TREE_CODE (TREE_OPERAND (expr, 0)))))
+    {
+      vec <constructor_elt, va_gc> *ve = NULL;
+      CONSTRUCTOR_APPEND_ELT (ve, TYPE_FIELDS (type),
+			      real_part (expr));
+      CONSTRUCTOR_APPEND_ELT (ve, TREE_CHAIN (TYPE_FIELDS (type)),
+			      imaginary_part (expr));
+      return build_constructor (type, ve);
+    }
+
+  /* Replace type in the reinterpret cast with a cast to the record type.  */
+  return build_vconvert (type, expr);
+}
+
 /* Cast EXP (which should be a pointer) to TYPE* and then indirect.
The back-end requires this cast in many cases.  */
 
@@ -2214,6 +2240,14 @@ d_build_call (TypeFunction *tf, tree callable, tree object,
 					 build_address (targ));
 	    }
 
+	  /* Complex types are exposed as special types with an underlying
+	     struct representation, if we are passing the native type to a
+	     function that accepts the library-defined version, then ensure
+	     it is properly reinterpreted as the underlying struct type.  */
+	  if (COMPLEX_FLOAT_TYPE_P (TREE_TYPE (targ))
+	      && arg->type->isTypeStruct ())
+	    targ = underlying_complex_expr (build_ctype (arg->type), targ);
+
  /* Type `noreturn` is a terminator, as no other arguments can possibly
 be evaluated after it.  */
  if (TREE_TYPE (targ) == noreturn_type_node)
diff --git a/gcc/d/d-tree.h b/gcc/d/d-tree.h
index c3e95e4d2d2..809a242ea93 100644
--- a/gcc/d/d-tree.h
+++ b/gcc/d/d-tree.h
@@ -576,6 +576,7 @@ extern tree size_mult_expr (tree, tree);
 extern tree real_part (tree);
 extern tree imaginary_part (tree);
 extern tree complex_expr (tree, tree, tree);
+extern tree underlying_complex_expr (tree, tree);
 extern tree indirect_ref (tree, tree);
 extern tree build_deref (tree);
 extern tree build_pointer_index (tree, tree);
diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index 40c2689a3b9..140df7ee41d 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -1892,7 +1892,7 @@ public:
   underlying is really a complex type.  */
if (e->e1->type->ty == TY::Tenum
&& e->e1->type->isTypeEnum ()->sym->isSpecial ())
- object = build_vconvert (build_ctype (tb), object);
+ object = underlying_complex_expr (build_ctype (tb), object);
 
this->result_ = component_ref (object, get_symbol_decl (vd));
  }
diff --git a/gcc/testsuite/gdc.dg/torture/pr106623.d 
b/gcc/testsuite/gdc.dg/torture/pr106623.d
new file mode 100644
index 000..d782b236861
--- /dev/null
+++ b/gcc/testsuite/gdc.dg/torture/pr106623.d
@@ -0,0 +1,28 @@
+// https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106623
+// { dg-do compile }
+private struct _Complex(T) { T re; T im; }
+enum __c_complex_double : _Complex!double;
+
+pragma(inline, true)
+ulong hashOf()(scope const dou

Re: [PATCH][_GLIBCXX_DEBUG] Add basic_string::starts_with/ends_with checks

2022-08-15 Thread François Dumont via Gcc-patches

With the patch !

On 14/08/22 17:32, François Dumont wrote:

I think we can add those checks.

Note that I wonder if it was needed as in basic_string_view I see 
usages of __attribute__((__nonnull__)). But running the test I saw no 
impact even after I try to apply this attribute to the 
starts_with/ends_with methods themselves.


Also note that several checks like the ones I am adding here are XFAILs
when using 'make check' because of the segfault rather than a proper
debug check. Would you prefer to add dg-require-debug-mode to those?


    libstdc++: [_GLIBCXX_DEBUG] Add basic_string::starts_with/ends_with checks

    Add simple checks on C string parameters which should not be null.

    Review null string checks to show:
    _String != nullptr

    rather than:
    _String != 0

    libstdc++-v3/ChangeLog:

	* include/bits/basic_string.h (starts_with, ends_with): Add
	__glibcxx_check_string.
	* include/bits/cow_string.h (starts_with, ends_with): Likewise.
	* include/debug/debug.h: Use nullptr rather than '0' in checks
	in C++11.
	* include/debug/string: Likewise.
	* testsuite/21_strings/basic_string/operations/ends_with/char.cc:
	Use __gnu_test::string.
	* testsuite/21_strings/basic_string/operations/ends_with/wchar_t.cc:
	Use __gnu_test::wstring.
	* testsuite/21_strings/basic_string/operations/starts_with/wchar_t.cc:
	Use __gnu_test::wstring.
	* testsuite/21_strings/basic_string/operations/starts_with/char.cc:
	Use __gnu_test::string.
	* testsuite/21_strings/basic_string/operations/ends_with/char_neg.cc:
	New test.
	* testsuite/21_strings/basic_string/operations/ends_with/wchar_t_neg.cc:
	New test.
	* testsuite/21_strings/basic_string/operations/starts_with/char_neg.cc:
	New test.
	* testsuite/21_strings/basic_string/operations/starts_with/wchar_t_neg.cc:
	New test.


Tested under linux normal and debug modes.

François


diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index b04fba95678..d06330e6c48 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3402,7 +3402,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   constexpr bool
   starts_with(const _CharT* __x) const noexcept
-  { return __sv_type(this->data(), this->size()).starts_with(__x); }
+  {
+	__glibcxx_requires_string(__x);
+	return __sv_type(this->data(), this->size()).starts_with(__x);
+  }
 
   constexpr bool
   ends_with(basic_string_view<_CharT, _Traits> __x) const noexcept
@@ -3414,7 +3417,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
   constexpr bool
   ends_with(const _CharT* __x) const noexcept
-  { return __sv_type(this->data(), this->size()).ends_with(__x); }
+  {
+	__glibcxx_requires_string(__x);
+	return __sv_type(this->data(), this->size()).ends_with(__x);
+  }
 #endif // C++20
 
 #if __cplusplus > 202002L
diff --git a/libstdc++-v3/include/bits/cow_string.h b/libstdc++-v3/include/bits/cow_string.h
index f16e33ac1ef..59b36a1006a 100644
--- a/libstdc++-v3/include/bits/cow_string.h
+++ b/libstdc++-v3/include/bits/cow_string.h
@@ -3014,7 +3014,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   bool
   starts_with(const _CharT* __x) const noexcept
-  { return __sv_type(this->data(), this->size()).starts_with(__x); }
+  {
+	__glibcxx_requires_string(__x);
+	return __sv_type(this->data(), this->size()).starts_with(__x);
+  }
 
   bool
   ends_with(basic_string_view<_CharT, _Traits> __x) const noexcept
@@ -3026,7 +3029,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   bool
   ends_with(const _CharT* __x) const noexcept
-  { return __sv_type(this->data(), this->size()).ends_with(__x); }
+  {
+	__glibcxx_requires_string(__x);
+	return __sv_type(this->data(), this->size()).ends_with(__x);
+  }
 #endif // C++20
 
 #if __cplusplus > 202011L
diff --git a/libstdc++-v3/include/debug/debug.h b/libstdc++-v3/include/debug/debug.h
index 28e250f0c50..f4233760426 100644
--- a/libstdc++-v3/include/debug/debug.h
+++ b/libstdc++-v3/include/debug/debug.h
@@ -118,10 +118,17 @@ namespace __gnu_debug
   __glibcxx_check_heap(_First,_Last)
 # define __glibcxx_requires_heap_pred(_First,_Last,_Pred)	\
   __glibcxx_check_heap_pred(_First,_Last,_Pred)
-# define __glibcxx_requires_string(_String)	\
+# if __cplusplus < 201103L
+#  define __glibcxx_requires_string(_String)	\
   _GLIBCXX_DEBUG_PEDASSERT(_String != 0)
-# define __glibcxx_requires_string_len(_String,_Len)	\
+#  define __glibcxx_requires_string_len(_String,_Len)	\
   _GLIBCXX_DEBUG_PEDASSERT(_String != 0 || _Len == 0)
+# else
+#  define __glibcxx_requires_string(_String)	\
+  _GLIBCXX_DEBUG_PEDASSERT(_String != nullptr)
+#  define __glibcxx_requires_string_len(_String,_Len)	\
+  _GLIBCXX_DEBUG_PEDASSERT(_String != nullptr || _Len == 0)
+# endif
 # define __glibcxx_requires_irreflexive(

Re: [PATCH] fortran: Expand ieee_arithmetic module's ieee_value inline [PR106579]

2022-08-15 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 15, 2022 at 10:00:02PM +0200, FX wrote:
> I have two questions, on this and the ieee_class patch:
> 
> 
> > +  tree type = TREE_TYPE (arg);
> > +  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
> > +  tree field = NULL_TREE;
> > +  for (tree f = TYPE_FIELDS (type); f != NULL_TREE; f = DECL_CHAIN (f))
> > +if (TREE_CODE (f) == FIELD_DECL)
> > +  {
> > +   gcc_assert (field == NULL_TREE);
> > +   field = f;
> > +  }
> > +  gcc_assert (field);
> 
> Why looping over fields? The class type is a simple type with only one member 
> (and it should be an integer, we can assert that).

I wanted to make sure it has exactly one field.
The ieee_arithmetic.F90 module in libgfortran surely does that, but I've
been worried about some user overriding that module with something
different.  At least in the C/C++ FEs we have had tons of bugs filed in
the past for cases where some builtin made assumptions about certain
headers and the data types in them, and then somebody ran a testcase
that violated those assumptions.  Even a failed gcc_assert isn't the
best answer to that; ideally one would verify it upfront and then either
error, sorry, or ignore the call (leave it as is).  In that last case,
it might be better to do the check on the gfortran FE types instead of
trees (verify that the return type or second argument type is the
ieee_class_type derived type with a single integral (hidden) field).

> > +   case IEEE_POSITIVE_ZERO:
> > + /* Make this also the default: label.  */
> > + label = gfc_build_label_decl (NULL_TREE);
> > + tmp = build_case_label (NULL_TREE, NULL_TREE, label);
> > + gfc_add_expr_to_block (&body, tmp);
> > + real_from_integer (&real, TYPE_MODE (type), 0, SIGNED);
> > + break;
> 
> Do we need a default label? It’s not like this is a more likely case than 
> anything else…

The libgfortran version had a default: label:
switch (type) \
{ \
  case IEEE_SIGNALING_NAN: \
return __builtin_nans ## SUFFIX (""); \
  case IEEE_QUIET_NAN: \
return __builtin_nan ## SUFFIX (""); \
  case IEEE_NEGATIVE_INF: \
return - __builtin_inf ## SUFFIX (); \
  case IEEE_NEGATIVE_NORMAL: \
return -42; \
  case IEEE_NEGATIVE_DENORMAL: \
return -(GFC_REAL_ ## TYPE ## _TINY) / 2; \
  case IEEE_NEGATIVE_ZERO: \
return -(GFC_REAL_ ## TYPE) 0; \
  case IEEE_POSITIVE_ZERO: \
return 0; \
  case IEEE_POSITIVE_DENORMAL: \
return (GFC_REAL_ ## TYPE ## _TINY) / 2; \
  case IEEE_POSITIVE_NORMAL: \
return 42; \
  case IEEE_POSITIVE_INF: \
return __builtin_inf ## SUFFIX (); \
  default: \
return 0; \
} \
and I've tried to translate that into what it generates.
There is at least the IEEE_OTHER_VALUE (aka 0) value
that isn't covered in the switch, but the class is just an
integer under the hood, so it could have any other value.

Jakub



Re: [PATCH] fortran: Expand ieee_arithmetic module's ieee_class inline [PR106579]

2022-08-15 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 15, 2022 at 09:47:45PM +0200, FX wrote:
> Question to the Fortran maintainers:
> 
> Do you know if the standard allows IEEE_CLASS and IEEE_VALUE to be used as 
> procedure pointers? I think not, because they do not follow (in F2008) the 
> standard constraint C729 / R740.
> 
> If so, we need to keep these functions implementations in libgfortran for now 
> (for ABI compatibility) but can remove them at the next breakage. Is one 
> planned? Where is this tracked, is it still at 
> https://gcc.gnu.org/wiki/LibgfortranAbiCleanup or do we have another place 
> (e.g. in bugzilla)?

Both are elemental generic procedures, and we have
"Procedure pointer %qs at %L shall not be elemental"
and
"Interface %qs at %L may not be generic"
errors for these two cases (trying to create a procedure
pointer to an elemental procedure, and trying to create a
procedure pointer to a generic procedure).

Jakub



Re: [PATCH] c: Implement C23 nullptr (N3042)

2022-08-15 Thread Jason Merrill via Gcc-patches

On 8/13/22 14:35, Marek Polacek wrote:

This patch implements the C23 nullptr literal:
, which is
intended to replace the problematic definition of NULL which might be
either of integer type or void*.

Since C++ has had nullptr for over a decade now, it was relatively easy
to just move the built-in node definitions from the C++ FE to the C/C++
common code.  Also, our DWARF emitter already handles NULLPTR_TYPE by
emitting DW_TAG_unspecified_type.  However, I had to handle a lot of
contexts such as ?:, comparison, conversion, etc.

There are some minor differences, e.g. in C you can do

   bool b = nullptr;

but in C++ you have to use direct-initialization:

   bool b{nullptr};

And I think that

   nullptr_t n = 0;

is only valid in C++.

Of course, C doesn't have to handle mangling, RTTI, substitution,
overloading, ...

This patch also defines nullptr_t in .  I'm uncertain about
the __STDC_VERSION__ version I should be checking.  Also, I'm not
defining __STDC_VERSION_STDDEF_H__ yet, because I don't know what value
it should be defined to.  Do we know yet?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


The C++ changes are OK, but you probably want a comment in 
c_common_nodes_and_builtins that we aren't setting the alignment there 
for C++ backward ABI bug compatibility.  Or perhaps set it there and 
then break it in the C++ front end when abi < 9.



gcc/c-family/ChangeLog:

* c-common.cc (c_common_reswords): Enable nullptr in C.
(c_common_nodes_and_builtins): Create the built-in node for nullptr.
* c-common.h (enum c_tree_index): Add CTI_NULLPTR, CTI_NULLPTR_TYPE.
(nullptr_node): Define.
(nullptr_type_node): Define.
(NULLPTR_TYPE_P): Define.
* c-pretty-print.cc (c_pretty_printer::simple_type_specifier): Handle
NULLPTR_TYPE.
(c_pretty_printer::direct_abstract_declarator): Likewise.
(c_pretty_printer::constant): Likewise.

gcc/c/ChangeLog:

* c-convert.cc (c_convert) : Handle NULLPTR_TYPE.
Give a better diagnostic when converting to nullptr_t.
* c-decl.cc (c_init_decl_processing): Perform C-specific nullptr
initialization.
* c-parser.cc (c_parser_postfix_expression): Handle RID_NULLPTR.
* c-typeck.cc (null_pointer_constant_p): Return true for NULLPTR_TYPE_P.
(build_unary_op) : Handle NULLPTR_TYPE.
(build_conditional_expr): Handle the case when the second/third operand
is NULLPTR_TYPE and third/second operand is POINTER_TYPE.
(convert_for_assignment): Handle converting an expression of type
nullptr_t to pointer/bool.
(build_binary_op) : Handle NULLPTR_TYPE.
: Likewise.

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index): Remove CTI_NULLPTR, CTI_NULLPTR_TYPE.
Move it to c_tree_index.
(nullptr_node): No longer define here.
(nullptr_type_node): Likewise.
(NULLPTR_TYPE_P): Likewise.
* decl.cc (cxx_init_decl_processing): Only keep C++-specific nullptr
initialization; move the shared code to c_common_nodes_and_builtins.

gcc/ChangeLog:

* ginclude/stddef.h: Define nullptr_t.

gcc/testsuite/ChangeLog:

* gcc.dg/Wcxx-compat-2.c: Remove nullptr test.
* gcc.dg/c2x-nullptr-1.c: New test.
* gcc.dg/c2x-nullptr-2.c: New test.
* gcc.dg/c2x-nullptr-3.c: New test.
* gcc.dg/c2x-nullptr-4.c: New test.
* gcc.dg/c2x-nullptr-5.c: New test.
---
  gcc/c-family/c-common.cc |  13 +-
  gcc/c-family/c-common.h  |   8 +
  gcc/c-family/c-pretty-print.cc   |   7 +
  gcc/c/c-convert.cc   |  19 ++-
  gcc/c/c-decl.cc  |   6 +
  gcc/c/c-parser.cc|   8 +
  gcc/c/c-typeck.cc|  55 +-
  gcc/cp/cp-tree.h |   8 -
  gcc/cp/decl.cc   |   8 +-
  gcc/ginclude/stddef.h|   8 +
  gcc/testsuite/gcc.dg/Wcxx-compat-2.c |   1 -
  gcc/testsuite/gcc.dg/c2x-nullptr-1.c | 239 +++
  gcc/testsuite/gcc.dg/c2x-nullptr-2.c |   9 +
  gcc/testsuite/gcc.dg/c2x-nullptr-3.c |  62 +++
  gcc/testsuite/gcc.dg/c2x-nullptr-4.c |  10 ++
  gcc/testsuite/gcc.dg/c2x-nullptr-5.c |  11 ++
  16 files changed, 448 insertions(+), 24 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-1.c
  create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-2.c
  create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-3.c
  create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-4.c
  create mode 100644 gcc/testsuite/gcc.dg/c2x-nullptr-5.c

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 6e41ceb38e9..809e7ff5804 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -500,7 +500,7 @@ const struct c_common_resword c_common_reswords[] =
{ "namespace",RID_NAMESPACE,  D_CXXONLY | D_CXXWARN },
{

Re: [PATCH] fortran: Expand ieee_arithmetic module's ieee_value inline [PR106579]

2022-08-15 Thread FX via Gcc-patches
Hi Jakub,

I have two questions, on this and the ieee_class patch:


> +  tree type = TREE_TYPE (arg);
> +  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
> +  tree field = NULL_TREE;
> +  for (tree f = TYPE_FIELDS (type); f != NULL_TREE; f = DECL_CHAIN (f))
> +if (TREE_CODE (f) == FIELD_DECL)
> +  {
> + gcc_assert (field == NULL_TREE);
> + field = f;
> +  }
> +  gcc_assert (field);

Why looping over fields? The class type is a simple type with only one member 
(and it should be an integer, we can assert that).


> + case IEEE_POSITIVE_ZERO:
> +   /* Make this also the default: label.  */
> +   label = gfc_build_label_decl (NULL_TREE);
> +   tmp = build_case_label (NULL_TREE, NULL_TREE, label);
> +   gfc_add_expr_to_block (&body, tmp);
> +   real_from_integer (&real, TYPE_MODE (type), 0, SIGNED);
> +   break;

Do we need a default label? It’s not like this is a more likely case than 
anything else…


Thanks,
FX

Re: [PATCH] c++: Implement P2327R1 - De-deprecating volatile compound operations

2022-08-15 Thread Jason Merrill via Gcc-patches

On 8/15/22 03:31, Jakub Jelinek wrote:


 From what I can see, this has been voted in as a DR and as it means
we warn less often than before in -std={gnu,c}++2{0,3} modes or with
-Wvolatile, I wonder if it shouldn't be backported to affected release
branches as well.


If people are complaining about it on release branches, sure.


Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2022-08-15  Jakub Jelinek  

* typeck.cc (cp_build_modify_expr): Implement
P2327R1 - De-deprecating volatile compound operations.  Don't warn
for |=, &= or ^= with volatile lhs.
* expr.cc (mark_use) : Adjust warning wording,
leave out simple.

* g++.dg/cpp2a/volatile1.C: Adjust for de-deprecation of volatile
compound |=, &= and ^= operations.
* g++.dg/cpp2a/volatile3.C: Likewise.
* g++.dg/cpp2a/volatile5.C: Likewise.

--- gcc/cp/typeck.cc.jj	2022-06-17 17:36:19.689107831 +0200
+++ gcc/cp/typeck.cc	2022-08-14 11:14:15.368316963 +0200
@@ -9136,10 +9136,14 @@ cp_build_modify_expr (location_t loc, tr
 
 	  /* An expression of the form E1 op= E2.  [expr.ass] says:
 	     "Such expressions are deprecated if E1 has volatile-qualified
-	     type."  We warn here rather than in cp_genericize_r because
+	     type and op is not one of the bitwise operators |, &, ^."
+	     We warn here rather than in cp_genericize_r because
 	     for compound assignments we are supposed to warn even if the
 	     assignment is a discarded-value expression.  */
-	  if (TREE_THIS_VOLATILE (lhs) || CP_TYPE_VOLATILE_P (lhstype))
+	  if (modifycode != BIT_AND_EXPR
+	      && modifycode != BIT_IOR_EXPR
+	      && modifycode != BIT_XOR_EXPR
+	      && (TREE_THIS_VOLATILE (lhs) || CP_TYPE_VOLATILE_P (lhstype)))
 	    warning_at (loc, OPT_Wvolatile,
			"compound assignment with %<volatile%>-qualified left "
			"operand is deprecated");
--- gcc/cp/expr.cc.jj	2022-06-27 11:18:02.268063761 +0200
+++ gcc/cp/expr.cc	2022-08-14 11:41:37.555649422 +0200
@@ -220,7 +220,7 @@ mark_use (tree expr, bool rvalue_p, bool
     case MODIFY_EXPR:
       {
	tree lhs = TREE_OPERAND (expr, 0);
-	/* [expr.ass] "A simple assignment whose left operand is of
+	/* [expr.ass] "An assignment whose left operand is of
	   a volatile-qualified type is deprecated unless the assignment
	   is either a discarded-value expression or appears in an
	   unevaluated context."  */
@@ -230,7 +230,7 @@ mark_use (tree expr, bool rvalue_p, bool
	    && !TREE_THIS_VOLATILE (expr))
	  {
	    if (warning_at (location_of (expr), OPT_Wvolatile,
-			    "using value of simple assignment with "
+			    "using value of assignment with "
			    "%<volatile%>-qualified left operand is "
			    "deprecated"))
	      /* Make sure not to warn about this assignment again.  */
--- gcc/testsuite/g++.dg/cpp2a/volatile1.C.jj	2020-07-28 15:39:10.013756159 +0200
+++ gcc/testsuite/g++.dg/cpp2a/volatile1.C	2022-08-14 11:46:42.721626890 +0200
@@ -56,6 +56,9 @@ fn2 ()
   vi = i;
   vi = i = 42;
   i = vi = 42; // { dg-warning "assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
+  i = vi |= 42; // { dg-warning "using value of assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
+  i = vi &= 42; // { dg-warning "using value of assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
+  i = vi ^= 42; // { dg-warning "using value of assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
   &(vi = i); // { dg-warning "assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
   (vi = 42, 45);
   (i = vi = 42, 10); // { dg-warning "assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
@@ -74,8 +77,9 @@ fn2 ()
   vi += i; // { dg-warning "assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
   vi -= i; // { dg-warning "assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
   vi %= i; // { dg-warning "assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
-  vi ^= i; // { dg-warning "assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
-  vi |= i; // { dg-warning "assignment with .volatile.-qualified left operand is deprecated" "" { target c++20 } }
+  vi ^= i; // { dg-bogus "assignment with .volatile.-qualified left operand is deprecated" }
+  vi |= i; // { dg-bogus "assignment with .volatile.-qualified left operand is deprecated" }
+  vi &= i; // { dg-bogus "assignment with .volatile.-qualified left operand is deprecated" }

Re: [PATCH] c++: Implement -Wself-move warning [PR81159]

2022-08-15 Thread Jason Merrill via Gcc-patches

On 8/9/22 09:37, Marek Polacek wrote:

About 5 years ago we got a request to implement -Wself-move, which
warns about useless moves like this:

   int x;
   x = std::move (x);

This patch implements that warning.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/81159

gcc/c-family/ChangeLog:

* c.opt (Wself-move): New option.

gcc/cp/ChangeLog:

* typeck.cc (maybe_warn_self_move): New.
(cp_build_modify_expr): Call maybe_warn_self_move.

gcc/ChangeLog:

* doc/invoke.texi: Document -Wself-move.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wself-move1.C: New test.
---
  gcc/c-family/c.opt  |  4 ++
  gcc/cp/typeck.cc| 48 +-
  gcc/doc/invoke.texi | 23 ++-
  gcc/testsuite/g++.dg/warn/Wself-move1.C | 87 +
  4 files changed, 160 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wself-move1.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 44e1a60ce24..a098ae1830d 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1229,6 +1229,10 @@ Wselector
  ObjC ObjC++ Var(warn_selector) Warning
  Warn if a selector has multiple methods.
  
+Wself-move
+C++ ObjC++ Var(warn_self_move) Warning LangEnabledBy(C++ ObjC++, Wall)
+Warn when a value is moved to itself with std::move.
+
  Wsequence-point
  C ObjC C++ ObjC++ Var(warn_sequence_point) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall)
  Warn about possible violations of sequence point rules.
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 6e4f23af982..f05913c0fac 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -8897,7 +8897,51 @@ cp_build_c_cast (location_t loc, tree type, tree expr,
  
return error_mark_node;

  }
-
+
+/* Warn when a value is moved to itself with std::move.  LHS is the target,
+   RHS may be the std::move call, and LOC is the location of the whole
+   assignment.  */
+
+static void
+maybe_warn_self_move (location_t loc, tree lhs, tree rhs)
+{
+  if (!warn_self_move)
+return;
+
+  /* C++98 doesn't know move.  */
+  if (cxx_dialect < cxx11)
+return;
+
+  if (processing_template_decl)
+return;
+
+  /* We're looking for *std::move ((T &) &arg), or
+ *std::move ((T &) (T *) r) if the argument it a reference.  */
+  if (!REFERENCE_REF_P (rhs)
+  || TREE_CODE (TREE_OPERAND (rhs, 0)) != CALL_EXPR)
+return;
+  tree fn = TREE_OPERAND (rhs, 0);
+  if (!is_std_move_p (fn))
+return;
+  tree arg = CALL_EXPR_ARG (fn, 0);
+  if (TREE_CODE (arg) != NOP_EXPR)
+return;
+  /* Strip the (T &).  */
+  arg = TREE_OPERAND (arg, 0);
+  /* Strip the (T *) or &.  */
+  arg = TREE_OPERAND (arg, 0);


Are you sure these are the only two expressions that can make it here? 
What if the argument to move is *Tptr?



+  arg = convert_from_reference (arg);
+  /* So that we catch (i) = std::move (i);.  */
+  lhs = maybe_undo_parenthesized_ref (lhs);
+  STRIP_ANY_LOCATION_WRAPPER (lhs);
+  if (cp_tree_equal (lhs, arg))
+{
+  auto_diagnostic_group d;
+  if (warning_at (loc, OPT_Wself_move, "moving a variable to itself"))
   inform (loc, "remove %<std::move%> call");
+}
+}
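Not part of the patch: a minimal standalone sketch of the code the new check targets.  For a trivial type the self-move degenerates to self-assignment, so it has no effect, which is exactly why it is worth diagnosing:

```cpp
#include <utility>

// What -Wself-move would flag: moving a variable to itself.  For an
// int this is just self-assignment, so the value is unchanged; for a
// class type with a naive move constructor it can even lose data.
inline int self_move_demo ()
{
  int i = 42;
  i = std::move (i);  // -Wself-move: moving a variable to itself
  return i;           // still 42: the std::move had no effect
}
```

Note the early returns in maybe_warn_self_move above: the warning is suppressed in C++98 mode and inside templates.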
+
  /* For use from the C common bits.  */
  tree
  build_modify_expr (location_t location,
@@ -9101,6 +9145,8 @@ cp_build_modify_expr (location_t loc, tree lhs, enum 
tree_code modifycode,
  
if (modifycode == NOP_EXPR)

{
+ maybe_warn_self_move (loc, lhs, rhs);
+
  if (c_dialect_objc ())
{
  result = objc_maybe_build_modify_expr (lhs, rhs);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index f3e9429b2ca..28cf36b94c6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -264,7 +264,7 @@ in the following sections.
  -Wreorder  -Wregister @gol
  -Wstrict-null-sentinel  -Wno-subobject-linkage  -Wtemplates @gol
  -Wno-non-template-friend  -Wold-style-cast @gol
--Woverloaded-virtual  -Wno-pmf-conversions -Wsign-promo @gol
+-Woverloaded-virtual  -Wno-pmf-conversions -Wself-move -Wsign-promo @gol
  -Wsized-deallocation  -Wsuggest-final-methods @gol
  -Wsuggest-final-types  -Wsuggest-override  @gol
  -Wno-terminate  -Wuseless-cast  -Wno-vexing-parse  @gol
@@ -5841,6 +5841,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect 
Options}.
  -Wreorder   @gol
  -Wrestrict   @gol
  -Wreturn-type  @gol
+-Wself-move @r{(only for C++)}  @gol
  -Wsequence-point  @gol
  -Wsign-compare @r{(only in C++)}  @gol
  -Wsizeof-array-div @gol
@@ -6826,6 +6827,26 @@ of a declaration:
  
  This warning is enabled by @option{-Wall}.
  
+@item -Wno-self-move @r{(C++ and Objective-C++ only)}
+@opindex Wself-move
+@opindex Wno-self-move
+This warning warns when a value is moved to itself with @code{std::move}.
+Such a @code{std::move} has no effect.


...unless it naively breaks the object, like

T(T&& ot): data(ot.data) { ot.data = nullptr; } // oops


+@smallexample
+struct T @{
+@dots{}
+@};
+void fn()

Re: [PATCH] fortran: Expand ieee_arithmetic module's ieee_class inline [PR106579]

2022-08-15 Thread FX via Gcc-patches
Question to the Fortran maintainers:

Do you know if the standard allows IEEE_CLASS and IEEE_VALUE to be used as 
procedure pointers? I think not, because they do not follow (in F2008) the 
standard constraint C729 / R740.

If so, we need to keep these functions' implementations in libgfortran for now 
(for ABI compatibility) but can remove them at the next breakage. Is one 
planned? Where is this tracked, is it still at 
https://gcc.gnu.org/wiki/LibgfortranAbiCleanup or do we have another place 
(e.g. in bugzilla)?

Thanks,
FX

[committed] d: Build internal TypeInfo types when module name is "object"

2022-08-15 Thread Iain Buclaw via Gcc-patches
Hi,

This patch is a fix-up for a previous change in r13-1070.

If for whatever reason the module declaration doesn't exist in the
object file, ensure that the internal definitions for TypeInfo and
TypeInfo_Class are still created, otherwise an ICE could occur later if
they are required for a run-time helper call.

Regression tested on x86_64-linux-gnu, and committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-compiler.cc (Compiler::onParseModule): Call create_tinfo_types
when module name is object.
* typeinfo.cc (create_tinfo_types): Add guard for multiple
invocations.
---
 gcc/d/d-compiler.cc | 11 +--
 gcc/d/typeinfo.cc   |  4 
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/d/d-compiler.cc b/gcc/d/d-compiler.cc
index ada9721541b..ef19df12437 100644
--- a/gcc/d/d-compiler.cc
+++ b/gcc/d/d-compiler.cc
@@ -130,8 +130,7 @@ Compiler::onParseModule (Module *m)
 {
   if (md->packages.length == 0)
{
- Identifier *id = (md && md->id) ? md->id : m->ident;
- if (!strcmp (id->toChars (), "object"))
+ if (!strcmp (md->id->toChars (), "object"))
{
  create_tinfo_types (m);
  return;
@@ -147,6 +146,14 @@ Compiler::onParseModule (Module *m)
}
}
 }
+  else if (m->ident)
+{
+  if (!strcmp (m->ident->toChars (), "object"))
+   {
+ create_tinfo_types (m);
+ return;
+   }
+}
 
   if (!flag_no_builtin)
 d_add_builtin_module (m);
diff --git a/gcc/d/typeinfo.cc b/gcc/d/typeinfo.cc
index d1f0d59952f..3577f669ed1 100644
--- a/gcc/d/typeinfo.cc
+++ b/gcc/d/typeinfo.cc
@@ -244,6 +244,10 @@ make_frontend_typeinfo (Identifier *ident, 
ClassDeclaration *base = NULL)
 void
 create_tinfo_types (Module *mod)
 {
+  /* Already generated internal types for the object module.  */
+  if (object_module != NULL)
+return;
+
   /* Build the internal TypeInfo and ClassInfo types.
  See TypeInfoVisitor for documentation of field layout.  */
   make_internal_typeinfo (TK_TYPEINFO_TYPE, Identifier::idPool ("TypeInfo"),
-- 
2.34.1



Re: [PATCH v2] c++: Extend -Wredundant-move for const-qual objects [PR90428]

2022-08-15 Thread Jason Merrill via Gcc-patches

On 8/8/22 13:27, Marek Polacek wrote:

On Sat, Aug 06, 2022 at 03:58:13PM -0800, Jason Merrill wrote:

On 8/6/22 11:13, Marek Polacek wrote:

In this PR, Jon suggested extending the -Wredundant-move warning
to warn when the user is moving a const object as in:

struct T { };

T f(const T& t)
{
  return std::move(t);
}

where the std::move is redundant, because T does not have
a T(const T&&) constructor (which is very unlikely).  Even with
the std::move, T(T&&) would not be used because it would mean
losing the const.  Instead, T(const T&) will be called.
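As a standalone illustration (not from the patch) of that overload resolution, one can tag each constructor and observe which one is selected; std::move on a const lvalue yields a const rvalue, which T(T&&) cannot bind to:

```cpp
#include <string>
#include <utility>

struct T {
  std::string ctor;
  T () : ctor ("default") {}
  T (const T &) : ctor ("copy") {}
  T (T &&) : ctor ("move") {}
};

// std::move (t) has type const T&&.  T(T&&) cannot bind to it without
// losing const, so T(const T&) is selected and the move is redundant.
inline T f (const T &t) { return std::move (t); }
```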

I had to restructure the function a bit, but it's better now.  This patch
depends on my other recent patches to maybe_warn_pessimizing_move.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/90428

gcc/cp/ChangeLog:

* typeck.cc (maybe_warn_pessimizing_move): Extend the
-Wredundant-move warning to warn about std::move on a
const-qualified object.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/Wredundant-move1.C: Adjust dg-warning.
* g++.dg/cpp0x/Wredundant-move9.C: Likewise.
* g++.dg/cpp0x/Wredundant-move10.C: New test.
---
   gcc/cp/typeck.cc  | 157 +++---
   gcc/testsuite/g++.dg/cpp0x/Wredundant-move1.C |   3 +-
   .../g++.dg/cpp0x/Wredundant-move10.C  |  61 +++
   gcc/testsuite/g++.dg/cpp0x/Wredundant-move9.C |   3 +-
   4 files changed, 162 insertions(+), 62 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/Wredundant-move10.C

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 70a5efc45de..802bc9c43fb 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -10411,72 +10411,109 @@ maybe_warn_pessimizing_move (tree expr, tree type, 
bool return_p)
return;
   }
-  /* We're looking for *std::move ((T &) &arg).  */
-  if (REFERENCE_REF_P (expr)
-  && TREE_CODE (TREE_OPERAND (expr, 0)) == CALL_EXPR)
-{
-  tree fn = TREE_OPERAND (expr, 0);
-  if (is_std_move_p (fn))
-   {
- tree arg = CALL_EXPR_ARG (fn, 0);
- tree moved;
- if (TREE_CODE (arg) != NOP_EXPR)
-   return;
- arg = TREE_OPERAND (arg, 0);
- if (TREE_CODE (arg) != ADDR_EXPR)
-   return;
- arg = TREE_OPERAND (arg, 0);
- arg = convert_from_reference (arg);
- if (can_do_rvo_p (arg, type))
-   {
- auto_diagnostic_group d;
- if (!warning_suppressed_p (expr, OPT_Wpessimizing_move)
- && warning_at (loc, OPT_Wpessimizing_move,
-"moving a temporary object prevents copy "
-"elision"))
-   inform (loc, "remove %<std::move%> call");
-   }
- /* The rest of the warnings is only relevant for when we are
-returning from a function.  */
- else if (!return_p)
-   return;
- /* Warn if we could do copy elision were it not for the move.  */
- else if (can_do_nrvo_p (arg, type))
+  /* First, check if this is a call to std::move.  */
+  if (!REFERENCE_REF_P (expr)
+  || TREE_CODE (TREE_OPERAND (expr, 0)) != CALL_EXPR)
+return;
+  tree fn = TREE_OPERAND (expr, 0);
+  if (!is_std_move_p (fn))
+return;
+  tree arg = CALL_EXPR_ARG (fn, 0);
+  if (TREE_CODE (arg) != NOP_EXPR)
+return;
+  /* If we're looking at *std::move ((T &) &arg), do the pessimizing N/RVO
+ and implicitly-movable warnings.  */
+  if (TREE_CODE (TREE_OPERAND (arg, 0)) == ADDR_EXPR)
+{
+  arg = TREE_OPERAND (arg, 0);
+  arg = TREE_OPERAND (arg, 0);
+  arg = convert_from_reference (arg);
+  if (can_do_rvo_p (arg, type))


Incidentally, this function should probably have a different name if we're
checking it in non-return situations.


I've renamed it to can_elide_copy_prvalue_p in this patch.
  

+   {
+ auto_diagnostic_group d;
+ if (!warning_suppressed_p (expr, OPT_Wpessimizing_move)
+ && warning_at (loc, OPT_Wpessimizing_move,
+"moving a temporary object prevents copy elision"))
+   inform (loc, "remove %<std::move%> call");
+   }
+  /* The rest of the warnings is only relevant for when we are returning
+from a function.  */
+  if (!return_p)
+   return;
+
+  tree moved;
+  /* Warn if we could do copy elision were it not for the move.  */
+  if (can_do_nrvo_p (arg, type))
+   {
+ auto_diagnostic_group d;
+ if (!warning_suppressed_p (expr, OPT_Wpessimizing_move)
+ && warning_at (loc, OPT_Wpessimizing_move,
+"moving a local object in a return statement "
+"prevents copy elision"))
+   inform (loc, "remove %<std::move%> call");
+   }
+  /* Warn if the move is redundant.  It is redundant when we would
+do maybe-rvalue overload resolution even without std::move.  */
+  else if (warn_redundant_m

[committed] d: Field names of anonymous delegates should be same as regular delegate types.

2022-08-15 Thread Iain Buclaw via Gcc-patches
Hi,

This patch adjusts the field names of delegate csts constructed inline.
This doesn't change code generation or the ABI, but it makes anonymous
delegates consistent with regular delegates, so the field names match up
when inspecting tree dumps.

Regression tested on x86_64-linux-gnu and committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-codegen.cc (build_delegate_cst): Give anonymous delegate field
names same as per ABI spec.
---
 gcc/d/d-codegen.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/d/d-codegen.cc b/gcc/d/d-codegen.cc
index 3fd4bee58f6..d02da1f81e3 100644
--- a/gcc/d/d-codegen.cc
+++ b/gcc/d/d-codegen.cc
@@ -419,8 +419,8 @@ build_delegate_cst (tree method, tree object, Type *type)
 {
   /* Convert a function method into an anonymous delegate.  */
   ctype = make_struct_type ("delegate()", 2,
-   get_identifier ("object"), TREE_TYPE (object),
-   get_identifier ("func"), TREE_TYPE (method));
+   get_identifier ("ptr"), TREE_TYPE (object),
+   get_identifier ("funcptr"), TREE_TYPE (method));
   TYPE_DELEGATE (ctype) = 1;
 }
 
-- 
2.34.1



Re: [PATCH] Support threading of just the exit edge

2022-08-15 Thread Aldy Hernandez via Gcc-patches
On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod  wrote:
>
> heh. or just
>
>
> +  int_range<2> r;
> +  if (!fold_range (r, const_cast  (cond_stmt))
> +  || !r.singleton_p (&val))
>
>
> if you do not provide a range_query to any of the fold_using_range code,
> it defaults to:
>
> fur_source::fur_source (range_query *q)
> {
>if (q)
>  m_query = q;
>else if (cfun)
>  m_query = get_range_query (cfun);
>else
>  m_query = get_global_range_query ();
>m_gori = NULL;
> }
>

Sweet.  Even better!
Aldy



Re: [PATCH] Support threading of just the exit edge

2022-08-15 Thread Andrew MacLeod via Gcc-patches

heh. or just


+  int_range<2> r;
+  if (!fold_range (r, const_cast  (cond_stmt))
+      || !r.singleton_p (&val))


if you do not provide a range_query to any of the fold_using_range code, 
it defaults to:


fur_source::fur_source (range_query *q)
{
  if (q)
    m_query = q;
  else if (cfun)
    m_query = get_range_query (cfun);
  else
    m_query = get_global_range_query ();
  m_gori = NULL;
}

so it will default to the one you provide, and if there isn't one, it 
will try to use the cfun version if cfun is available; otherwise it 
defaults to the global range query.  So you don't need to provide the 
cfun version.


This applies to the 5 basic routines in gimple-fold-range.h:

// Fold stmt S into range R using range query Q.
bool fold_range (vrange &r, gimple *s, range_query *q = NULL);
// Recalculate stmt S into R using range query Q as if it were on edge 
ON_EDGE.

bool fold_range (vrange &v, gimple *s, edge on_edge, range_query *q = NULL);

// These routines require the operands to be specified when manually folding.
// Any excess queries will be drawn from the current range_query.
bool fold_range (vrange &r, gimple *s, vrange &r1);
bool fold_range (vrange &r, gimple *s, vrange &r1, vrange &r2);
bool fold_range (vrange &r, gimple *s, unsigned num_elements, vrange 
**vector);
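For illustration, a hedged GCC-internals sketch (not standalone code; it assumes the existing find_taken_edge (basic_block, tree) helper from tree-cfg.cc) of how these entry points could provide the find_taken_edge-style simplification Richard asked about:

```cpp
/* Sketch only: fold BB's exit conditional using the default range query
   and, if it folds to a constant, return the statically taken edge.  */
static edge
static_taken_edge (basic_block bb, gcond *cond_stmt)
{
  int_range<2> r;
  tree val;
  /* No range_query argument: fur_source falls back to
     get_range_query (cfun), as described above.  */
  if (fold_range (r, cond_stmt) && r.singleton_p (&val))
    return find_taken_edge (bb, val);
  return NULL;
}
```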


Andrew


On 8/15/22 15:09, Aldy Hernandez wrote:

On Mon, Aug 15, 2022 at 11:39 AM Richard Biener  wrote:

On Fri, 12 Aug 2022, Aldy Hernandez wrote:


On Fri, Aug 12, 2022 at 2:01 PM Richard Biener  wrote:

This started with noticing we add ENTRY_BLOCK to our threads
just for the sake of simplifying the conditional at the end of
the first block in a function.  That's not really threading
anything but it ends up duplicating the entry block, and
re-writing the result instead of statically folding the jump.

Hmmm, but threading 2 blocks is not really threading at all??  Unless
I'm misunderstanding something, this was even documented in the
backwards threader:

[snip]
  That's not really a jump threading opportunity, but instead is
  simple cprop & simplification.  We could handle it here if we
  wanted by wiring up all the incoming edges.  If we run this
  early in IPA, that might be worth doing.   For now we just
  reject that case.  */
   if (m_path.length () <= 1)
   return false;

Which you undoubtedly ran into because you're specifically eliding the check:


- if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
- &irreducible)
+ if ((m_path.length () == 1
+  || m_profit.profitable_path_p (m_path, m_name, taken_edge,
+ &irreducible))

Correct.  But currently the threader just "cheats", picks up one more
block and then "threads" the case anyway, doing this simple simplification
in the most expensive way possible ...

Ah.


The following tries to handle those by recording simplifications
of the exit conditional as a thread of length one.  That requires
special-casing them in the backward copier since if we do not
have any block to copy but modify the jump in place and remove
not taken edges this confuses the hell out of remaining threads.

So back_jt_path_registry::update_cfg now first marks all
edges we know are never taken and then prunes the threading
candidates when they include such edge.  Then it makes sure
to first perform unreachable edge removal (so we avoid
copying them when other thread paths contain the prevailing
edge) before continuing to apply the remaining threads.

This is all beyond my pay grade.  I'm not very well versed in the
threader per se.  So if y'all think it's a good idea, by all means.
Perhaps Jeff can chime in, or remembers the above comment?


In statistics you can see this avoids quite a bunch of useless
threads (I've investigated 3 random files from cc1files with
dropped stats in any of the thread passes).

Still thinking about it, it would be nice to avoid the work of
discovering those candidates we have to throw away later
which could eventually be done by having the backward threader
perform a RPO walk over the CFG, skipping edges that can be
statically determined as not being executed.  Below I'm
abusing the path range query to statically analyze the exit
branch but I assume there's a simpler way of folding this stmt
which could then better integrate with such a walk.

Unreachable paths can be queried with
path_range_query::unreachable_path_p ().  Could you leverage this?
The idea was that if we ever resolved any SSA name to UNDEFINED, the
path itself was unreachable.

The situation is that we end up with paths where an intermediate
branch on the path can be simplified to false - but of course only
if we put all intermediate branch dependences into the list of
imports to consider.

I don't like it very much to use the "threading" code to perform
the simplification but I couldn't figure a cheap way to perform
the simplification without invoking a full EVRP

[PATCH] tree-object-size: Support strndup and strdup

2022-08-15 Thread Siddhesh Poyarekar
Use the string length of the input to strdup to determine the usable
size of the resulting object.  Avoid doing the same for strndup since
the input may be too large, resulting in unnecessary overhead, or worse,
may not be NUL-terminated, resulting in a crash where there would
otherwise have been none.
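A standalone sketch of the allocation sizes this relies on (plain POSIX strdup/strndup semantics, independent of the pass itself; the values match the new tests below):

```cpp
#include <string.h>
#include <stdlib.h>

// strdup allocates exactly strlen (in) + 1 bytes, so the pass can reuse
// the input's string length.  strndup allocates at most bound + 1 bytes;
// its input need not be NUL-terminated past the bound, which is why the
// pass does not try to take the input's strlen in that case.
inline size_t strdup_size (const char *in)
{
  char *p = strdup (in);
  size_t sz = strlen (p) + 1;
  free (p);
  return sz;
}

inline size_t strndup_size (const char *in, size_t bound)
{
  char *p = strndup (in, bound);
  size_t sz = strlen (p) + 1;
  free (p);
  return sz;
}
```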

gcc/ChangeLog:

* tree-object-size.cc (get_whole_object): New function.
(addr_object_size): Use it.
(strdup_object_size): New function.
(call_object_size): Use it.
(pass_data_object_sizes, pass_data_early_object_sizes): Set
todo_flags_finish to TODO_update_ssa_no_phi.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): New tests.
(main): Call them.
* gcc.dg/builtin-dynamic-object-size-1.c: Silence overread
warnings.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-object-size-1.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-2.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-4.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
---
 .../gcc.dg/builtin-dynamic-object-size-0.c| 43 +++
 .../gcc.dg/builtin-dynamic-object-size-1.c|  2 +-
 .../gcc.dg/builtin-dynamic-object-size-2.c|  2 +-
 .../gcc.dg/builtin-dynamic-object-size-3.c|  2 +-
 .../gcc.dg/builtin-dynamic-object-size-4.c|  2 +-
 gcc/testsuite/gcc.dg/builtin-object-size-1.c  | 64 +++-
 gcc/testsuite/gcc.dg/builtin-object-size-2.c  | 63 ++-
 gcc/testsuite/gcc.dg/builtin-object-size-3.c  | 63 ++-
 gcc/testsuite/gcc.dg/builtin-object-size-4.c  | 63 ++-
 gcc/tree-object-size.cc   | 76 +--
 10 files changed, 366 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index 01a280b2d7b..7f023708b15 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -479,6 +479,40 @@ test_loop (int *obj, size_t sz, size_t start, size_t end, 
int incr)
   return __builtin_dynamic_object_size (ptr, 0);
 }
 
+/* strdup/strndup.  */
+
+size_t
+__attribute__ ((noinline))
+test_strdup (const char *in)
+{
+  char *res = __builtin_strdup (in);
+  return __builtin_dynamic_object_size (res, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_strndup (const char *in, size_t bound)
+{
+  char *res = __builtin_strndup (in, bound);
+  return __builtin_dynamic_object_size (res, 0);
+}
+
+size_t
+__attribute__ ((noinline))
+test_strdup_min (const char *in)
+{
+  char *res = __builtin_strdup (in);
+  return __builtin_dynamic_object_size (res, 2);
+}
+
+size_t
+__attribute__ ((noinline))
+test_strndup_min (const char *in, size_t bound)
+{
+  char *res = __builtin_strndup (in, bound);
+  return __builtin_dynamic_object_size (res, 2);
+}
+
 /* Other tests.  */
 
 struct TV4
@@ -651,6 +685,15 @@ main (int argc, char **argv)
   int *t = test_pr105736 (&val3);
   if (__builtin_dynamic_object_size (t, 0) != -1)
 FAIL ();
+  const char *str = "hello world";
+  if (test_strdup (str) != __builtin_strlen (str) + 1)
+FAIL ();
+  if (test_strndup (str, 4) != 5)
+FAIL ();
+  if (test_strdup_min (str) != __builtin_strlen (str) + 1)
+FAIL ();
+  if (test_strndup_min (str, 4) != 0)
+FAIL ();
 
   if (nfails > 0)
 __builtin_abort ();
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
index 7cc8b1c9488..8f17c8edcaf 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -Wno-stringop-overread" } */
 /* { dg-require-effective-target alloca } */
 
 #define __builtin_object_size __builtin_dynamic_object_size
diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c 
b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
index 267dbf48ca7..3677782ff1c 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -Wno-stringop-overread" }

Re: [PATCH] Support threading of just the exit edge

2022-08-15 Thread Aldy Hernandez via Gcc-patches
On Mon, Aug 15, 2022 at 11:39 AM Richard Biener  wrote:
>
> On Fri, 12 Aug 2022, Aldy Hernandez wrote:
>
> > On Fri, Aug 12, 2022 at 2:01 PM Richard Biener  wrote:
> > >
> > > This started with noticing we add ENTRY_BLOCK to our threads
> > > just for the sake of simplifying the conditional at the end of
> > > the first block in a function.  That's not really threading
> > > anything but it ends up duplicating the entry block, and
> > > re-writing the result instead of statically folding the jump.
> >
> > Hmmm, but threading 2 blocks is not really threading at all??  Unless
> > I'm misunderstanding something, this was even documented in the
> > backwards threader:
> >
> > [snip]
> >  That's not really a jump threading opportunity, but instead is
> >  simple cprop & simplification.  We could handle it here if we
> >  wanted by wiring up all the incoming edges.  If we run this
> >  early in IPA, that might be worth doing.   For now we just
> >  reject that case.  */
> >   if (m_path.length () <= 1)
> >   return false;
> >
> > Which you undoubtedly ran into because you're specifically eliding the 
> > check:
> >
> > > - if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > > - &irreducible)
> > > + if ((m_path.length () == 1
> > > +  || m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > > + &irreducible))
>
> Correct.  But currently the threader just "cheats", picks up one more
> block and then "threads" the case anyway, doing this simple simplification
> in the most expensive way possible ...

Ah.

>
> > >
> > > The following tries to handle those by recording simplifications
> > > of the exit conditional as a thread of length one.  That requires
> > > special-casing them in the backward copier since if we do not
> > > have any block to copy but modify the jump in place and remove
> > > not taken edges this confuses the hell out of remaining threads.
> > >
> > > So back_jt_path_registry::update_cfg now first marks all
> > > edges we know are never taken and then prunes the threading
> > > candidates when they include such edge.  Then it makes sure
> > > to first perform unreachable edge removal (so we avoid
> > > copying them when other thread paths contain the prevailing
> > > edge) before continuing to apply the remaining threads.
> >
> > This is all beyond my pay grade.  I'm not very well versed in the
> > threader per se.  So if y'all think it's a good idea, by all means.
> > Perhaps Jeff can chime in, or remembers the above comment?
> >
> > >
> > > In statistics you can see this avoids quite a bunch of useless
> > > threads (I've investigated 3 random files from cc1files with
> > > dropped stats in any of the thread passes).
> > >
> > > Still thinking about it, it would be nice to avoid the work of
> > > discovering those candidates we have to throw away later
> > > which could eventually be done by having the backward threader
> > > perform a RPO walk over the CFG, skipping edges that can be
> > > statically determined as not being executed.  Below I'm
> > > abusing the path range query to statically analyze the exit
> > > branch but I assume there's a simpler way of folding this stmt
> > > which could then better integrate with such a walk.
> >
> > Unreachable paths can be queried with
> > path_range_query::unreachable_path_p ().  Could you leverage this?
> > The idea was that if we ever resolved any SSA name to UNDEFINED, the
> > path itself was unreachable.
>
> The situation is that we end up with paths where an intermediate
> branch on the path can be simplified to false - but of course only
> if we put all intermediate branch dependences into the list of
> imports to consider.
>
> I don't like it very much to use the "threading" code to perform
> the simplification but I couldn't figure a cheap way to perform
> the simplification without invoking a full EVRP?  That said,
> the backwards threader simply does
>
>   basic_block bb;
>   FOR_EACH_BB_FN (bb, m_fun)
> if (EDGE_COUNT (bb->succs) > 1)
>   maybe_thread_block (bb);
>
>   bool changed = m_registry.thread_through_all_blocks (true);
>
> instead of that we should only consider edges that may be executable
> by instead walking the CFG along such edges, simplifying BB exit
> conditionals.  Unfortunately EVRP is now a C++ maze so I couldn't
> find how to actually do such simplification, not knowing how
> interacting with ranger influences the path query use either.
> If you or Andrew has any suggestions on how to essentially
> do a
>
>   if (edge e = find_taken_edge (bb))
> {
> ...
> }
>
> where find_taken_edge should be at least as powerful as using
> the path query for a single bb then I'd be all ears.  As said,
> I tried to find the code to cut&paste in EVRP but failed to ...

Interesting... If what you need is find_taken_edge(bb), I think we can
do this quite cleanly.

What you're 

[committed] analyzer: fix direction of -Wanalyzer-out-of-bounds note [PR106626]

2022-08-15 Thread David Malcolm via Gcc-patches
Fix a read/write typo.

Also, add more test coverage of -Wanalyzer-out-of-bounds to help
establish a baseline for experiments on tweaking the wording of
the warning (PR analyzer/106626).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-2054-g23e8c0b0d99f58.

gcc/analyzer/ChangeLog:
PR analyzer/106626
* region-model.cc (buffer_overread::emit): Fix copy&paste error in
direction of the access in the note.

gcc/testsuite/ChangeLog:
PR analyzer/106626
* gcc.dg/analyzer/out-of-bounds-read-char-arr.c: New test.
* gcc.dg/analyzer/out-of-bounds-read-int-arr.c: New test.
* gcc.dg/analyzer/out-of-bounds-write-char-arr.c: New test.
* gcc.dg/analyzer/out-of-bounds-write-int-arr.c: New test.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc  |  4 +-
 .../analyzer/out-of-bounds-read-char-arr.c| 55 +++
 .../analyzer/out-of-bounds-read-int-arr.c | 54 ++
 .../analyzer/out-of-bounds-write-char-arr.c   | 55 +++
 .../analyzer/out-of-bounds-write-int-arr.c| 54 ++
 5 files changed, 220 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-read-char-arr.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-read-int-arr.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-write-char-arr.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-write-int-arr.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index a58904c06a8..b05b7097c00 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1447,11 +1447,11 @@ public:
print_dec (m_out_of_bounds_range.m_size_in_bytes,
   num_bytes_past_buf, UNSIGNED);
if (m_diag_arg)
- inform (rich_loc->get_loc (), "write is %s bytes past the end"
+ inform (rich_loc->get_loc (), "read is %s bytes past the end"
" of %qE", num_bytes_past_buf,
m_diag_arg);
else
- inform (rich_loc->get_loc (), "write is %s bytes past the end"
+ inform (rich_loc->get_loc (), "read is %s bytes past the end"
"of the region",
num_bytes_past_buf);
   }
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-read-char-arr.c 
b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-read-char-arr.c
new file mode 100644
index 000..61cbfc75c11
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-read-char-arr.c
@@ -0,0 +1,55 @@
+char arr[10]; /* { dg-message "capacity is 10 bytes" } */
+
+char int_arr_read_element_before_start_far(void)
+{
+  return arr[-100]; /* { dg-warning "buffer underread" "warning" } */
+  /* { dg-message "out-of-bounds read at byte -100 but 'arr' starts at byte 0" 
"final event" { target *-*-* } .-1 } */
+}
+
+char int_arr_read_element_before_start_near(void)
+{
+  return arr[-2]; /* { dg-warning "buffer underread" "warning" } */
+  /* { dg-message "out-of-bounds read at byte -2 but 'arr' starts at byte 0" 
"final event" { target *-*-* } .-1 } */
+}
+
+char int_arr_read_element_before_start_off_by_one(void)
+{
+  return arr[-1]; /* { dg-warning "buffer underread" "warning" } */
+  /* { dg-message "out-of-bounds read at byte -1 but 'arr' starts at byte 0" 
"final event" { target *-*-* } .-1 } */
+}
+
+char int_arr_read_element_at_start(void)
+{
+  return arr[0];
+}
+
+char int_arr_read_element_at_end(void)
+{
+  return arr[9];
+}
+
+char int_arr_read_element_after_end_off_by_one(void)
+{
+  return arr[10]; /* { dg-warning "buffer overread" "warning" } */
+  /* { dg-message "out-of-bounds read at byte 10 but 'arr' ends at byte 10" 
"final event" { target *-*-* } .-1 } */
+  /* { dg-message "read is 1 bytes past the end of 'arr'" "note" { target 
*-*-* } .-2 } */
+  // FIXME(PR 106626): "1 bytes"
+}
+
+char int_arr_read_element_after_end_near(void)
+{
+  return arr[11]; /* { dg-warning "buffer overread" "warning" } */
+  /* { dg-message "out-of-bounds read at byte 11 but 'arr' ends at byte 10" 
"final event" { target *-*-* } .-1 } */
+  /* { dg-message "read is 1 bytes past the end of 'arr'" "note" { target 
*-*-* } .-2 } */
+  // FIXME(PR 106626): is the note correct?
+  // FIXME(PR 106626): "1 bytes"
+}
+
+char int_arr_read_element_after_end_far(void)
+{
+  return arr[100]; /* { dg-warning "buffer overread" "warning" } */
+  /* { dg-message "out-of-bounds read at byte 100 but 'arr' ends at byte 10" 
"final event" { target *-*-* } .-1 } */
+  /* { dg-message "read is 1 bytes past the end of 'arr'" "note" { target 
*-*-* } .-2 } */
+  // FIXME(PR 106626): the note seems incorrect (size of access is 1 byte, but 
magnitude beyond boundary is 90)
+  // FIXME(PR 106626): "1 bytes"
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of

[committed] analyzer: better fix for -Wanalyzer-use-of-uninitialized-value [PR106573]

2022-08-15 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r13-2053-gca123e019bb92f.

gcc/analyzer/ChangeLog:
PR analyzer/106573
* region-model.cc (region_model::on_call_pre): Use check_call_args
when ensuring that we call get_arg_svalue on all args.  Remove
redundant call from handling for stdio builtins.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 7e7077696f7..a58904c06a8 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1768,8 +1768,7 @@ region_model::on_call_pre (const gcall *call, 
region_model_context *ctxt,
  duplicates if any of the handling below also looks up the svalues,
  but the deduplication code should deal with that.  */
   if (ctxt)
-for (unsigned arg_idx = 0; arg_idx < cd.num_args (); arg_idx++)
-  cd.get_arg_svalue (arg_idx);
+check_call_args (cd);
 
   /* Some of the cases below update the lhs of the call based on the
  return value, but not all.  Provide a default value, which may
@@ -1889,7 +1888,6 @@ region_model::on_call_pre (const gcall *call, 
region_model_context *ctxt,
/* These stdio builtins have external effects that are out
   of scope for the analyzer: we only want to model the effects
   on the return value.  */
-   check_call_args (cd);
break;
 
  case BUILT_IN_VA_START:
-- 
2.26.3



Re: [PATCH] analyzer: warn on the use of floating points in the size argument [PR106181]

2022-08-15 Thread David Malcolm via Gcc-patches
On Mon, 2022-08-15 at 14:35 +0200, Tim Lange wrote:
> This patch fixes the ICE reported in PR106181 and adds a new warning
> to
> the analyzer complaining about the use of floating point operands.

Thanks for the patch.

Various comments inline...

> 
> I decided to move the warning for floats inside the size argument out
> of
> the allocation size checker code and toward the allocation such that
> the
> warning only appears once.
> I'm not sure about the wording of the diagnostic, my current wording
> feels
> a bit bulky. 

Agreed, and the warning itself is probably setting a new record for
option length: -Wanalyzer-imprecise-floating-point-arithmetic is 45
characters.  I'm not without sin here: I think the current record is
-Wanalyzer-unsafe-call-within-signal-handler, which is 43.

How about:
 -Wanalyzer-imprecise-float-arithmetic
 -Wanalyzer-imprecise-fp-arithmetic
instead?  (ideas welcome)


> Here is how the diagnostics look like:
> 
> /path/to/main.c: In function ‘test_1’:
> /path/to/main.c:10:14: warning: use of floating point arithmetic
> inside the size argument might yield unexpected results

https://gcc.gnu.org/codingconventions.html#Spelling
says we should use "floating-point" rather than "floating point".

How about just this:

"warning: use of floating-point arithmetic here might yield unexpected
results"

here...

> [-Wanalyzer-imprecise-floating-point-arithmetic]
>    10 |   int *ptr = malloc (sizeof (int) * n); /* { dg-line test_1 }
> */
>   |  ^
>   ‘test_1’: event 1
>     |
>     |   10 |   int *ptr = malloc (sizeof (int) * n); /* { dg-line
> test_1 } */
>     |  |  ^
>     |  |  |
>     |  |  (1) operand ‘n’ is of type ‘float’
>     |
> /path/to/main.c:10:14: note: only use operands of a type that
> represents whole numbers inside the size argument

...and make the note say:

"only use operands of an integer type inside the size argument"

which tells the user that it's a size that we're complaining about.


> /path/to/main.c: In function ‘test_2’:
> /path/to/main.c:20:14: warning: use of floating point arithmetic
> inside the size argument might yield unexpected results [-Wanalyzer-
> imprecise-floating-point-arithmetic]
>    20 |   int *ptr = malloc (n * 3.1); /* { dg-line test_2 } */
>   |  ^~~~
>   ‘test_2’: event 1
>     |
>     |   20 |   int *ptr = malloc (n * 3.1); /* { dg-line test_2 } */
>     |  |  ^~~~
>     |  |  |
>     |  |  (1) operand ‘3.1001e+0’ is of
> type ‘double’
>     |
> /path/to/main.c:20:14: note: only use operands of a type that
> represents whole numbers inside the size argument
> 
> Also, another point to discuss is the event note in case the
> expression is
> wrapped in a variable, such as in test_3:
> /path/to/main.c:30:10: warning: use of floating point arithmetic
> inside the size argument might yield unexpected results [-Wanalyzer-
> imprecise-floating-point-arithmetic]
>    30 |   return malloc (size); /* { dg-line test_3 } */
>   |  ^
>   ‘test_3’: events 1-2
>     |
>     |   37 | void test_3 (float f)
>     |  |  ^~
>     |  |  |
>     |  |  (1) entry to ‘test_3’
>     |   38 | {
>     |   39 |   void *ptr = alloc_me (f); /* { dg-message "calling
> 'alloc_me' from 'test_3'" } */
>     |  |   
>     |  |   |
>     |  |   (2) calling ‘alloc_me’ from ‘test_3’
>     |
>     +--> ‘alloc_me’: events 3-4
>    |
>    |   28 | void *alloc_me (size_t size)
>    |  |   ^~~~
>    |  |   |
>    |  |   (3) entry to ‘alloc_me’
>    |   29 | {
>    |   30 |   return malloc (size); /* { dg-line test_3 } */
>    |  |  ~
>    |  |  |
>    |  |  (4) operand ‘f’ is of type ‘float’
>    |
> 
> I'm not sure if it is easily discoverable that event (4) does refer
> to
> 'size'. I thought about also printing get_representative_tree
> (capacity)
> in the event but that would clutter the event message if the argument
> does hold the full expression. I don't have any strong feelings about
> the
> decision here but if I had to decide I'd leave it as is (especially
> because the warning is probably quite unusual).

Yeah; get_representative_tree tries to get a tree, but trees don't give
us a good way of referring to a local var within a particular stack
frame within a path.  So leaving it as is is OK.

> The index of the argument would also be a possibility, but that would
> get
> tricky for calloc.

[...snip...]

> 
> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 61b58c575ff..bef15eae2c3 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -98,6 

Re: [PATCH] c: Implement C23 nullptr (N3042)

2022-08-15 Thread Joseph Myers
On Sat, 13 Aug 2022, Marek Polacek via Gcc-patches wrote:

> This patch also defines nullptr_t in .  I'm uncertain about
> the __STDC_VERSION__ version I should be checking.  Also, I'm not

We're using defined (__STDC_VERSION__) && __STDC_VERSION__ > 201710L until 
the final version for C23 is settled.

> defining __STDC_VERSION_STDDEF_H__ yet, because I don't know what value
> it should be defined to.  Do we know yet?

No, and Jens's comments on the editorial review before CD ballot include 
that lots of headers don't yet have such a macro definition but should 
have one, as well as needing consistency for the numbers.

We won't know the final values for these macros until much later, because 
the timescale depends on whether ISO decides to delay things at any point 
by coming up with a long list of editorial issues required to follow the 
JTC1 Directives as they did for C17 (objections to particular words 
appearing in non-normative text, etc.).

> -  { "nullptr",   RID_NULLPTR,D_CXXONLY | D_CXX11 | D_CXXWARN 
> },
> +  { "nullptr",   RID_NULLPTR,D_CXX11 | D_CXXWARN },

You need to use D_C2X (which doesn't yet exist).  In pre-C23 modes, 
nullptr needs to be a normal identifier that can be used in all contexts 
where identifiers can be used, not a keyword at all (and then you need a 
c11-nullptr*.c test to verify that use of it as an identifier works as 
expected).

> diff --git a/gcc/c/c-convert.cc b/gcc/c/c-convert.cc
> index 18083d59618..013fe6b2a53 100644
> --- a/gcc/c/c-convert.cc
> +++ b/gcc/c/c-convert.cc
> @@ -133,6 +133,14 @@ c_convert (tree type, tree expr, bool init_const)
>   (loc, type, c_objc_common_truthvalue_conversion (input_location, expr));
>  
>  case POINTER_TYPE:
> +  /* The type nullptr_t may be converted to a pointer type.  The result 
> is
> +  a null pointer value.  */
> +  if (NULLPTR_TYPE_P (TREE_TYPE (e)))
> + {
> +   ret = build_int_cst (type, 0);
> +   goto maybe_fold;
> + }

That looks like it would lose side-effects.  You need to preserve 
side-effects in an expression of nullptr_t type being converted to a 
pointer type, and need an execution testcase that verifies such 
side-effects are preserved.

Also, you need to make sure that (void *)nullptr is not treated as a null 
pointer constant, only a null pointer; build_int_cst (type, 0) would 
produce a null pointer constant when type is void *.  (void *)nullptr 
should be handled similarly to (void *)(void *)0, which isn't a null 
pointer constant either.

> @@ -133,6 +133,13 @@ null_pointer_constant_p (const_tree expr)
>/* This should really operate on c_expr structures, but they aren't
>   yet available everywhere required.  */
>tree type = TREE_TYPE (expr);
> +
> +  /* An integer constant expression with the value 0, such an expression
> + cast to type void*, or the predefined constant nullptr, are a null
> + pointer constant.  */
> +  if (NULLPTR_TYPE_P (type))
> +return true;

That looks wrong.  You need to distinguish null pointer constants of type 
nullptr_t (nullptr, possibly enclosed in parentheses, possibly the 
selected alternative from _Generic) from all other expressions of type 
nullptr_t (including (nullptr_t)nullptr, which isn't a null pointer 
constant any more than (void *)(void *)0).

Then, for each context where it matters whether a nullptr_t value is a 
null pointer constant, there need to be testcases that the two cases are 
properly distinguished.  This includes at least equality comparisons with 
a pointer that is not a null pointer constant (seem only to be allowed 
with nullptr, not with other nullptr_t expressions).  (I think for 
conditional expressions, conditionals between nullptr_t and an integer 
null pointer constant are always invalid, whether or not the nullptr_t is 
a null pointer constant, while conditionals between nullptr_t and a 
pointer are always valid.)

> +/* The type nullptr_t shall not be converted to any type other than bool or
> +   a pointer type.  No type other than nullptr_t shall be converted to 
> nullptr_t.  */

That's other than *void*, bool or a pointer type.  (That's a correct fix 
to the N3042 wording in N3047.  There are other problems in the 
integration of nullptr in N3047 that are only fixed in my subsequent fixes 
as part of the editorial review - and many issues with integration of 
other papers that haven't yet been fixed, I currently have 25 open merge 
requests resulting from editorial review.)  And of course conversions from 
nullptr_t to void should be tested.

> diff --git a/gcc/testsuite/gcc.dg/c2x-nullptr-4.c 
> b/gcc/testsuite/gcc.dg/c2x-nullptr-4.c
> new file mode 100644
> index 000..5b15e75d159
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/c2x-nullptr-4.c
> @@ -0,0 +1,10 @@
> +/* Test that we warn about `nullptr' pre-C2X.  */
> +/* { dg-do compile } */
> +/* { dg-options "-std=c17 -pedantic-errors" } */

This test is wrong - it's a normal identifier pr

[COMMITTED] PR tree-optimization/106621 - Check for undefined and varying first.

2022-08-15 Thread Andrew MacLeod via Gcc-patches
We treat POLY_INT_CSTs as VARYING, but the check in irange::set for them 
was occurring after we called irange::set_range.  This patch merely 
shuffles the order of the checks around such that we check for undefined, 
and then varying/poly-ints, before the other cases.  Performance impact is 
negligible.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Compilation 
reproduces the PR, and the patch passes the test on a cross compiler for 
aarch64.  I have no access to bootstrap on aarch64, so hopefully the 
testcase works as provided :)


pushed.

Andrew

commit 265cdd067afd56293137ecb3057c5ba28a7c9480
Author: Andrew MacLeod 
Date:   Mon Aug 15 10:16:23 2022 -0400

Check for undefined and varying first.

Rearrange order in irange:set to ensure all POLY_INTs map to varying.

PR tree-optimization/106621
gcc/
* value-range.cc (irange::set): Check for POLY_INT_CST early.

gcc/testsuite/
* gcc.dg/pr106621.c: New test.

diff --git a/gcc/testsuite/gcc.dg/pr106621.c b/gcc/testsuite/gcc.dg/pr106621.c
new file mode 100644
index 000..0465de4f14f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr106621.c
@@ -0,0 +1,30 @@
+/* { dg-do compile { target aarch64*-*-* } } */
+/* { dg-options "-mcpu=neoverse-v1 -O2 -fvect-cost-model=dynamic -fno-tree-scev-cprop" } */
+
+int m, n;
+
+void
+foo (unsigned int x, short int y)
+{
+  if (m)
+for (;;)
+  {
+++m;
+while (m < 1)
+  {
+n += m + x;
+++m;
+  }
+  }
+
+  for (;;)
+if (y)
+  {
+++x;
+if (x)
+  for (y = 0; y < 75; y += 2)
+{
+}
+  }
+}
+
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index a2273f540e8..d056f7356e1 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -716,25 +716,6 @@ irange::irange_set_anti_range (tree min, tree max)
 void
 irange::set (tree min, tree max, value_range_kind kind)
 {
-  if (kind != VR_UNDEFINED)
-{
-  if (TREE_OVERFLOW_P (min))
-	min = drop_tree_overflow (min);
-  if (TREE_OVERFLOW_P (max))
-	max = drop_tree_overflow (max);
-}
-
-  if (!legacy_mode_p ())
-{
-  if (kind == VR_RANGE)
-	irange_set (min, max);
-  else
-	{
-	  gcc_checking_assert (kind == VR_ANTI_RANGE);
-	  irange_set_anti_range (min, max);
-	}
-  return;
-}
   if (kind == VR_UNDEFINED)
 {
   irange::set_undefined ();
@@ -749,6 +730,22 @@ irange::set (tree min, tree max, value_range_kind kind)
   return;
 }
 
+  if (TREE_OVERFLOW_P (min))
+min = drop_tree_overflow (min);
+  if (TREE_OVERFLOW_P (max))
+max = drop_tree_overflow (max);
+
+  if (!legacy_mode_p ())
+{
+  if (kind == VR_RANGE)
+	irange_set (min, max);
+  else
+	{
+	  gcc_checking_assert (kind == VR_ANTI_RANGE);
+	  irange_set_anti_range (min, max);
+	}
+  return;
+}
   // Nothing to canonicalize for symbolic ranges.
   if (TREE_CODE (min) != INTEGER_CST
   || TREE_CODE (max) != INTEGER_CST)


Re: -Wformat-overflow handling for %b and %B directives in C2X standard

2022-08-15 Thread Frolov Daniil via Gcc-patches
Tue, Apr 12, 2022 at 00:56, Marek Polacek :

>
> On Thu, Apr 07, 2022 at 02:10:48AM +0500, Frolov Daniil wrote:
> > Hello! Thanks for your feedback. I've tried to take into account your
> > comments. New patch applied to the letter.
>
> Thanks.
>
> > The only thing I have not removed is the check_std_c2x () function. From my
> > point of view -Wformat-overflow shouldn't be thrown if the standard < C2X.
> > So it's protection for false triggering.
>
> Sorry but I still think that is the wrong behavior.  If you want to warn
> about C2X constructs in pre-C2X modes, use -Wpedantic.  But if you want
> to use %b/%B as an extension in older dialects, that's OK too, so I don't
> know why users would want -Wformat-overflow disabled in that case.  But
> perhaps other people disagree with me.
>
Hi! Sorry for the late reply.  If we want to look at it as an extension,
then I agree with you.
I removed this function in the new patch.
> > Sat, Apr 2, 2022 at 01:15, Marek Polacek :
> >
> > > On Sat, Apr 02, 2022 at 12:19:47AM +0500, Frolov Daniil via Gcc-patches
> > > wrote:
> > > > Hello, I've noticed that -Wformat-overflow doesn't handle %b and %B
> > > > directives in the sprintf function. I've added a relevant issue in
> > > bugzilla
> > > > (bug #105129).
> > > > I attach a patch with a possible solution to the letter.
> > >
> > > Thanks for the patch.  Support for C2X %b, %B formats is relatively new
> > > (Oct 2021) so it looks like gimple-ssa-sprintf.cc hasn't caught up.
> > >
> > > This is not a regression, so should probably wait till GCC 13.  Anyway...
> > >
> > > > From 2051344e9500651f6e94c44cbc7820715382b957 Mon Sep 17 00:00:00 2001
> > > > From: Frolov Daniil 
> > > > Date: Fri, 1 Apr 2022 00:47:03 +0500
> > > > Subject: [PATCH] Support %b, %B for -Wformat-overflow (sprintf, 
> > > > snprintf)
> > > >
> > > > testsuite: add tests to check -Wformat-overflow on %b.
> > > > Wformat-overflow1.c is compiled using -std=c2x so warning has to
> > > > be throwed
> > > >
> > > > Wformat-overflow2.c doesn't throw warnings cause c2x std isn't
> > > > used
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   * gimple-ssa-sprintf.cc
> > > > (check_std_c2x): New function
> > > >   (fmtresult::type_max_digits): add base == 2 handling
> > > >   (tree_digits): add handle for base == 2
> > > >   (format_integer): now handle %b and %B using base = 2
> > > >   (parse_directive): add cases to handle %b and %B directives
> > > >   (compute_format_length): add handling for base = 2
> > >
> > > The descriptions should start with a capital letter and end with a period,
> > > like "Handle base == 2."
> > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * gcc.dg/Wformat-overflow1.c: New test. (using -std=c2x)
> > > >   * gcc.dg/Wformat-overflow2.c: New test. (-std=c11 no warning)
> > >
> > > You can just say "New test."
> > >
> > > > ---
> > > >  gcc/gimple-ssa-sprintf.cc| 42 
> > > >  gcc/testsuite/gcc.dg/Wformat-overflow1.c | 28 
> > > >  gcc/testsuite/gcc.dg/Wformat-overflow2.c | 16 +
> > > >  3 files changed, 79 insertions(+), 7 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/Wformat-overflow1.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/Wformat-overflow2.c
> > > >
> > > > diff --git a/gcc/gimple-ssa-sprintf.cc b/gcc/gimple-ssa-sprintf.cc
> > > > index c93f12f90b5..7f68c2b6e51 100644
> > > > --- a/gcc/gimple-ssa-sprintf.cc
> > > > +++ b/gcc/gimple-ssa-sprintf.cc
> > > > @@ -107,6 +107,15 @@ namespace {
> > > >
> > > >  static int warn_level;
> > > >
> > > > +/* b_overflow_flag depends on the current standart when using gcc */
> > >
> > > "standard"
> > >
> > > /* Comments should be formatted like this.  */
> > >
> > > > +static bool b_overflow_flag;
> > > > +
> > > > +/* check is current standart version equals C2X*/
> > > > +static bool check_std_c2x ()
> > > > +{
> > > > +  return !strcmp (lang_hooks.name, "GNU C2X");
> > > > +}
> > >
> > > Is this really needed?  ISTM that this new checking shouldn't depend on
> > > -std=c2x.  If not using C2X, you only get a warning if -Wpedantic.  So
> > > I think you should remove b_overflow_flag.
> > >
> > > >  /* The minimum, maximum, likely, and unlikely maximum number of bytes
> > > > of output either a formatting function or an individual directive
> > > > can result in.  */
> > > > @@ -535,6 +544,8 @@ fmtresult::type_max_digits (tree type, int base)
> > > >unsigned prec = TYPE_PRECISION (type);
> > > >switch (base)
> > > >  {
> > > > +case 2:
> > > > +  return prec;
> > > >  case 8:
> > > >return (prec + 2) / 3;
> > > >  case 10:
> > > > @@ -857,11 +868,11 @@ tree_digits (tree x, int base, HOST_WIDE_INT prec,
> > > bool plus, bool prefix)
> > > >
> > > >/* Adjust a non-zero value for the base prefix, either hexadecimal,
> > > >   or, unless precision has resulted in a leading zero, also octal.
> > > */
> > > > -  if (pr

Re: [PATCH] Implement __builtin_issignaling

2022-08-15 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 15, 2022 at 06:14:22PM +0200, FX wrote:
> Thank you for taking this on (this patch and the Fortran ones), it is really 
> appreciated and will improve gfortran's IEEE conformance.
> 
> - "Implement __builtin_issignaling”: is approved for the tiny Fortran part
> - "libgfortran: Use __builtin_issignaling in libgfortran”: was approved by 
> Thomas

Thanks.

> - I will have to look at the next two patches (ieee_value and ieee_class) in 
> the next few days. I think we can make adjustments to the runtime library as 
> well.

I think we can, but we need to make it in an ABI compatible way until
libgfortran bumps soname.
That is why I haven't used _gfortrani_whatever as name etc.
> 
> Related: there are several tests in our testsuite that have { 
> dg-require-effective-target issignaling }
> This is checked by check_effective_target_issignaling in 
> gcc/testsuite/lib/target-supports.exp, and probes the support for the 
> issignaling macro in . I think it would make sense to adjust those 
> tests to run on a wider range of targets, using the new built-in.

We could replace it with some other effective target that would test if the
target supports sNaNs (or perhaps just NaNs might be enough).

Jakub



Re: [PATCH] Implement __builtin_issignaling

2022-08-15 Thread FX via Gcc-patches
Hi Jakub,

Thank you for taking this on (this patch and the Fortran ones), it is really 
appreciated and will improve gfortran's IEEE conformance.

- "Implement __builtin_issignaling”: is approved for the tiny Fortran part
- "libgfortran: Use __builtin_issignaling in libgfortran”: was approved by 
Thomas
- I will have to look at the next two patches (ieee_value and ieee_class) in 
the next few days. I think we can make adjustments to the runtime library as 
well.

Related: there are several tests in our testsuite that have { 
dg-require-effective-target issignaling }
This is checked by check_effective_target_issignaling in 
gcc/testsuite/lib/target-supports.exp, and probes the support for the 
issignaling macro in . I think it would make sense to adjust those 
tests to run on a wider range of targets, using the new built-in.

FX

Re: [PATCH] analyzer: fix for ICE in sm-fd.cc [PR106551]

2022-08-15 Thread David Malcolm via Gcc-patches
On Mon, 2022-08-15 at 14:02 +0530, Immad Mir wrote:
> This patch fixes the ICE caused by valid_to_unchecked_state
> in sm-fd.cc by exiting early if first argument of any "dup"
> functions is invalid.
> 
> gcc/analyzer/ChangeLog:
> PR analyzer/106551
> * sm-fd.cc (check_for_dup): exit early if first
> argument is invalid for all dup functions.
> 
> gcc/testsuite/ChangeLog:
> PR analyzer/106551
> * gcc.dg/analyzer/fd-dup-1.c: New testcase.
> 
> Signed-off-by: Immad Mir 

Thanks; looks good to me.

Dave



Re: [PATCH] Support threading of just the exit edge

2022-08-15 Thread Jeff Law via Gcc-patches




On 8/12/2022 10:03 AM, Aldy Hernandez wrote:

On Fri, Aug 12, 2022 at 2:01 PM Richard Biener  wrote:

This started with noticing we add ENTRY_BLOCK to our threads
just for the sake of simplifying the conditional at the end of
the first block in a function.  That's not really threading
anything but it ends up duplicating the entry block, and
re-writing the result instead of statically fold the jump.

Hmmm, but threading 2 blocks is not really threading at all??  Unless
I'm misunderstanding something, this was even documented in the
backwards threader:

[snip]
  That's not really a jump threading opportunity, but instead is
  simple cprop & simplification.  We could handle it here if we
  wanted by wiring up all the incoming edges.  If we run this
  early in IPA, that might be worth doing.   For now we just
  reject that case.  */
   if (m_path.length () <= 1)
   return false;
My recollection is that code was supposed to filter out the case where 
the threading path is just a definition block and a use block where the 
definition block dominated the use block.  For that case, threading 
isn't really needed: we can just use const/copy propagation to 
propagate the value from the def to the use, which should in turn allow 
the use (the conditional branch) to be simplified away -- all without 
the block copying and associated CFG updates.


What doesn't make sense to me today is how we know there's a 
dominator relationship between the two blocks?




Jeff


Re: [PATCH] s390: Use vpdi and verllg in vec_reve.

2022-08-15 Thread Andreas Krebbel via Gcc-patches
On 8/12/22 12:13, Robin Dapp wrote:
> Hi,
> 
> swapping the two elements of a V2DImode or V2DFmode vector can be done
> with vpdi instead of using the generic way of loading a permutation mask
> from the literal pool and vperm.
> 
> Analogous to the V2DI/V2DF case reversing the elements of a four-element
> vector can be done by first swapping the elements of the first
> doubleword as well the ones of the second one and subsequently rotate
> the doublewords by 32 bits.
> 
> Bootstrapped and regtested, no regressions.
> 
> Is it OK?
> 
> Regards
>  Robin
> 
> gcc/ChangeLog:
> 
>   PR target/100869
>   * config/s390/vector.md (@vpdi4_2): New pattern.
>   (rotl3_di): New pattern.
>   * config/s390/vx-builtins.md: Use vpdi and verll for reversing
>   elements.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/zvector/vec-reve-int-long.c: New test.

Ok. Thanks!

Andreas


Re: [PATCH] s390: Add z15 to s390_issue_rate.

2022-08-15 Thread Andreas Krebbel via Gcc-patches
On 8/12/22 12:02, Robin Dapp wrote:
> Hi,
> 
> this patch tries to be more explicit by mentioning z15 in s390_issue_rate.
> 
> No changes in testsuite, bootstrap or SPEC obviously.
> 
> Is it OK?
> 
> Regards
>  Robin
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.cc (s390_issue_rate): Add z15.
> ---
>  gcc/config/s390/s390.cc | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index ef38fbe68c84..528cd8c7f0f6 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -8582,6 +8582,7 @@ s390_issue_rate (void)
>  case PROCESSOR_2827_ZEC12:
>  case PROCESSOR_2964_Z13:
>  case PROCESSOR_3906_Z14:
> +case PROCESSOR_8561_Z15:
>  case PROCESSOR_3931_Z16:
>  default:
>return 1;

Ok. Thanks!

Andreas



Re: [PATCH] s390: Add -munroll-only-small-loops.

2022-08-15 Thread Andreas Krebbel via Gcc-patches
On 8/12/22 12:00, Robin Dapp wrote:
> Hi,
> 
> inspired by Power we also introduce -munroll-only-small-loops.  This
> implies activating -funroll-loops and -munroll-only-small-loops at -O2
> and above.
> 
> Bootstrapped and regtested.
> 
> This introduces one regression in gcc.dg/sms-compare-debug-1.c but
> currently dumps for sms are broken as well.  The difference is in the
> location of some INSN_DELETED notes so I would consider this a minor issue.
> 
> Is it OK?
> 
> Regards
>  Robin
> 
> gcc/ChangeLog:
> 
>   * common/config/s390/s390-common.cc: Enable -funroll-loops and
>   -munroll-only-small-loops for OPT_LEVELS_2_PLUS_SPEED_ONLY.
>   * config/s390/s390.cc (s390_loop_unroll_adjust): Do not unroll
>   loops larger than 12 instructions.
>   (s390_override_options_after_change): Set unroll options.
>   (s390_option_override_internal): Likewise.
>   * config/s390/s390.opt: Document munroll-only-small-loops.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/vector/vec-copysign.c: Do not unroll.
>   * gcc.target/s390/zvector/autovec-double-quiet-uneq.c: Dito.
>   * gcc.target/s390/zvector/autovec-double-signaling-ltgt.c: Dito.
>   * gcc.target/s390/zvector/autovec-float-quiet-uneq.c: Dito.
>   * gcc.target/s390/zvector/autovec-float-signaling-ltgt.c: Dito.

Ok. Thanks!

Andreas


Re: Rust frontend patches v1

2022-08-15 Thread Martin Liška
On 8/15/22 16:07, Manuel López-Ibáñez via Gcc-patches wrote:
> Dear Philip,
> 
> Another thing to pay attention to is the move to Sphinx for documentation:
> https://gcc.gnu.org/pipermail/gcc/2022-August/239233.html

Hi.

Which is something I can help you with.  I have a script that converts
texinfo documentation to Sphinx.

Cheers,
Martin


c++: Fix module line no testcase

2022-08-15 Thread Nathan Sidwell via Gcc-patches


Not all systems have the same injected headers, leading to line
location table differences that are immaterial to the test.  Fix the
regexp more robustly.

nathan

--
Nathan SidwellFrom af088b32def1c56538f0f3aaea16f013e9292d64 Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Mon, 15 Aug 2022 07:19:36 -0700
Subject: [PATCH] c++: Fix module line no testcase

Not all systems have the same injected headers, leading to line
location table differences that are immaterial to the test.  Fix the
regexp more robustly.

	gcc/testsuite/
	* g++.dg/modules/loc-prune-4.C: Adjust regexp
---
 gcc/testsuite/g++.dg/modules/loc-prune-4.C | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/g++.dg/modules/loc-prune-4.C b/gcc/testsuite/g++.dg/modules/loc-prune-4.C
index 765c378e51e..aa8f248b52b 100644
--- a/gcc/testsuite/g++.dg/modules/loc-prune-4.C
+++ b/gcc/testsuite/g++.dg/modules/loc-prune-4.C
@@ -18,5 +18,5 @@ int baz (int);
 
 // { dg-final { scan-lang-dump {Ordinary maps:2 locs:12288 range_bits:5} module } }
 // { dg-final { scan-lang-dump { 1 source file names\n Source file...=[^\n]*loc-prune-4.C\n} module } }
-// { dg-final { scan-lang-dump { Span:0 ordinary \[2.\+12288,\+4096\)->\[0,\+4096\)} module } }
-// { dg-final { scan-lang-dump { Span:1 ordinary \[2.\+40960,\+8192\)->\[4096,\+8192\)} module } }
+// { dg-final { scan-lang-dump { Span:0 ordinary \[[0-9]+\+12288,\+4096\)->\[0,\+4096\)} module } }
+// { dg-final { scan-lang-dump { Span:1 ordinary \[[0-9]+\+40960,\+8192\)->\[4096,\+8192\)} module } }
-- 
2.30.2



Re: Rust frontend patches v1

2022-08-15 Thread Manuel López-Ibáñez via Gcc-patches
Dear Philip,

Another thing to pay attention to is the move to Sphinx for documentation:
https://gcc.gnu.org/pipermail/gcc/2022-August/239233.html

Best,

Manuel.

On Wed, 10 Aug 2022 at 20:57, Philip Herron 
wrote:

> Hi everyone
>
> For my v2 of the patches, I've been spending a lot of time ensuring
> each patch is buildable.  It would be simpler if each patch did not
> have to be buildable on its own, because then I could split the
> front-end into more patches.  Does this make sense?  In theory,
> when everything goes well, does this still mean that we can merge in
> one commit, or should it follow a series of buildable patches? I've
> received feedback that it might be possible to ignore making each
> patch an independent chunk and just focus on splitting it up as small
> as possible even if they don't build.
>
> I hope this makes sense.
>
> Thanks
>
> --Phil
>
> On Thu, 28 Jul 2022 at 10:39, Philip Herron 
> wrote:
> >
> > Thanks, for confirming David. I think it was too big in the end. I was
> > trying to figure out how to actually split that up but it seems
> > reasonable that I can split up the front-end patches into patches for
> > each separate pass in the compiler seems like a reasonable approach
> > for now.
> >
> > --Phil
> >
> > On Wed, 27 Jul 2022 at 17:45, David Malcolm  wrote:
> > >
> > > On Wed, 2022-07-27 at 14:40 +0100, herron.philip--- via Gcc-patches
> > > wrote:
> > > > This is the initial version 1 patch set for the Rust front-end. There
> > > > are more changes that need to be extracted out for all the target
> > > > hooks we have implemented. The goal is to see if we are implementing
> > > > the target hooks information for x86 and arm. We have more patches
> > > > for the other targets I can add in here but they all follow the
> > > > pattern established here.
> > > >
> > > > Each patch is buildable on its own and rebased ontop of
> > > > 718cf8d0bd32689192200d2156722167fd21a647. As for ensuring we keep
> > > > attribution for all the patches we have received in the front-end
> > > > should we create a CONTRIBUTOR's file inside the front-end folder?
> > > >
> > > > Note thanks to Thomas Schwinge and Mark Wielaard, we are keeping a
> > > > branch up to date with our code on:
> > > >
> https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/devel/rust/master
> > > >  but this is not rebased ontop of gcc head.
> > > >
> > > > Let me know if I have sent these patches correctly or not, this is a
> > > > learning experience with git send-email.
> > > >
> > > > [PATCH Rust front-end v1 1/4] Add skeleton Rust front-end folder
> > > > [PATCH Rust front-end v1 2/4] Add Rust lang TargetHooks for i386 and
> > > > [PATCH Rust front-end v1 3/4] Add Rust target hooks to ARM
> > > > [PATCH Rust front-end v1 4/4] Add Rust front-end and associated
> > >
> > > FWIW it looks like patch 4 of the kit didn't make it (I didn't get a
> > > copy and I don't see it in the archives).
> > >
> > > Maybe it exceeded a size limit?  If so, maybe try splitting it up into
> > > more patches.
> > >
> > > Dave
> > >
>


Re: Where in C++ module streaming to handle a new bitfield added in "tree_decl_common"

2022-08-15 Thread Richard Biener via Gcc-patches
On Mon, Aug 15, 2022 at 3:29 PM Nathan Sidwell via Gcc-patches
 wrote:
>
> On 8/2/22 10:44, Qing Zhao wrote:
> > Hi, Nathan,
> >
> > I am adding a new bitfield “decl_not_flexarray” in “tree_decl_common”  
> > (gcc/tree-core.h) for the new gcc feature -fstrict-flex-arrays.
> >
> > 
> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > index ea9f281f1cc..458c6e6ceea 100644
> > --- a/gcc/tree-core.h
> > +++ b/gcc/tree-core.h
> > @@ -1813,7 +1813,10 @@ struct GTY(()) tree_decl_common {
> >   TYPE_WARN_IF_NOT_ALIGN.  */
> >unsigned int warn_if_not_align : 6;
> >
> > -  /* 14 bits unused.  */
> > +  /* In FIELD_DECL, this is DECL_NOT_FLEXARRAY.  */
> > +  unsigned int decl_not_flexarray : 1;
>
> Is it possible to invert the meaning here -- set the flag if it /IS/ a
> flexible array? negated flags can be confusing, and I see your patch
> sets it to '!is_flexible_array (...)' anyway?

The issue is that it's consumed by the middle-end but set by a single (or
two) frontends, and the conservative setting is having the bit not set.
That works nicely together with touching just the frontends that want
stricter behavior than the current one ...

> > +
> > +  /* 13 bits unused.  */
> >
> >/* UID for points-to sets, stable over copying from inlining.  */
> >unsigned int pt_uid;
> > 
> >
> > (Please refer to the following for details:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598556.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598965.html
>
>
>
> > )
> >
> > Richard mentioned the following:
> >
> > "I've not seen it so you are probably missing it - the bit has to be
> > streamed in tree-streamer-{in,out}.cc to be usable from LTO.  Possibly
> > C++ module streaming also needs to handle it.”
> >
> > I have figured out where to add the handling of the bit in
> > “tree-streamer-{in,out}.cc”; however, it’s quite difficult for me to
> > locate where I should add the handling of this new bit in the
> > C++ module streaming, so could you please help me with this?
> >
>
>
> add it in to trees_{in,out}::core_bools.  You could elide streaming for
> non-FIELD_DECL decls.
>
> Hope that helps.
>
> nathan
>
>
>
> > Thanks a lot for your help.
> >
> > Qing
>
>
> --
> Nathan Sidwell


Re: Where in C++ module streaming to handle a new bitfield added in "tree_decl_common"

2022-08-15 Thread Nathan Sidwell via Gcc-patches

On 8/2/22 10:44, Qing Zhao wrote:

Hi, Nathan,

I am adding a new bitfield “decl_not_flexarray” in “tree_decl_common”  
(gcc/tree-core.h) for the new gcc feature -fstrict-flex-arrays.


diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index ea9f281f1cc..458c6e6ceea 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1813,7 +1813,10 @@ struct GTY(()) tree_decl_common {
  TYPE_WARN_IF_NOT_ALIGN.  */
   unsigned int warn_if_not_align : 6;

-  /* 14 bits unused.  */
+  /* In FIELD_DECL, this is DECL_NOT_FLEXARRAY.  */
+  unsigned int decl_not_flexarray : 1;


Is it possible to invert the meaning here -- set the flag if it /IS/ a 
flexible array? negated flags can be confusing, and I see your patch 
sets it to '!is_flexible_array (...)' anyway?



+
+  /* 13 bits unused.  */

   /* UID for points-to sets, stable over copying from inlining.  */
   unsigned int pt_uid;


(Please refer to the following for details:

https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598556.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598965.html





)

Richard mentioned the following:

"I've not seen it so you are probably missing it - the bit has to be
streamed in tree-streamer-{in,out}.cc to be usable from LTO.  Possibly
C++ module streaming also needs to handle it.”

I have figured out where to add the handling of the bit in
“tree-streamer-{in,out}.cc”; however, it’s quite difficult for me to locate
where I should add the handling of this new bit in the C++ module streaming,
so could you please help me with this?




add it in to trees_{in,out}::core_bools.  You could elide streaming for 
non-FIELD_DECL decls.


Hope that helps.

nathan




Thanks a lot for your help.

Qing



--
Nathan Sidwell


Re: [PATCH] Implement __builtin_issignaling

2022-08-15 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 15, 2022 at 12:07:38PM +, Richard Biener wrote:
> Ah, I misread
> 
> +static rtx
> +expand_builtin_issignaling (tree exp, rtx target)
> +{
> +  if (!validate_arglist (exp, REAL_TYPE, VOID_TYPE))
> +return NULL_RTX;
> +
> +  tree arg = CALL_EXPR_ARG (exp, 0);
> +  scalar_float_mode fmode = SCALAR_FLOAT_TYPE_MODE (TREE_TYPE (arg));
> +  const struct real_format *fmt = REAL_MODE_FORMAT (fmode);
> +
> +  /* Expand the argument yielding a RTX expression. */
> +  rtx temp = expand_normal (arg);
> +
> +  /* If mode doesn't support NaN, always return 0.  */
> +  if (!HONOR_NANS (fmode))
> +{
> +  emit_move_insn (target, const0_rtx);
> +  return target;

I think I can expand on the comment why HONOR_NANS instead of HONOR_SNANS
and also add comment to the folding case.

> which doesn't use HONOR_SNANS but still HONOR_NANS and thus
> -ffinite-math-only.  You possibly want MODE_HAS_NANS instead here?

But I'm not sure we want this.  With -ffast-math/-ffinite-math-only etc.,
__builtin_isnan or __builtin_fpclassify for NaNs/Infs will just return 0,
so it would be strange if __builtin_issignaling didn't.
People usually only call __builtin_issignaling when they know they
have a NaN, so they want it guarded with
__builtin_isnan/__builtin_fpclassify or say <=> unordered.

> > That seems like a glibc bug/weird feature in the __MATH_TG macro
> > or _Generic.
> > When compiled with C++ it is rejected.
> 
> So what about __builtin_issignaling then?  Do we want to silently
> ignore errors there?

I think we should just restrict it to the scalar floating point types.
After all, other type-generic builtins that are or can be used similarly
do the same thing.

Jakub



[PATCH] analyzer: warn on the use of floating points in the size argument [PR106181]

2022-08-15 Thread Tim Lange
This patch fixes the ICE reported in PR106181 and adds a new warning to
the analyzer complaining about the use of floating point operands.

I decided to move the warning for floats inside the size argument out of
the allocation size checker code and toward the allocation such that the
warning only appears once.
I'm not sure about the wording of the diagnostic; my current wording feels
a bit bulky. Here is how the diagnostics look:

/path/to/main.c: In function ‘test_1’:
/path/to/main.c:10:14: warning: use of floating point arithmetic inside the 
size argument might yield unexpected results 
[-Wanalyzer-imprecise-floating-point-arithmetic]
   10 |   int *ptr = malloc (sizeof (int) * n); /* { dg-line test_1 } */
  |  ^
  ‘test_1’: event 1
|
|   10 |   int *ptr = malloc (sizeof (int) * n); /* { dg-line test_1 } */
|  |  ^
|  |  |
|  |  (1) operand ‘n’ is of type ‘float’
|
/path/to/main.c:10:14: note: only use operands of a type that represents whole 
numbers inside the size argument
/path/to/main.c: In function ‘test_2’:
/path/to/main.c:20:14: warning: use of floating point arithmetic inside the 
size argument might yield unexpected results 
[-Wanalyzer-imprecise-floating-point-arithmetic]
   20 |   int *ptr = malloc (n * 3.1); /* { dg-line test_2 } */
  |  ^~~~
  ‘test_2’: event 1
|
|   20 |   int *ptr = malloc (n * 3.1); /* { dg-line test_2 } */
|  |  ^~~~
|  |  |
|  |  (1) operand ‘3.1001e+0’ is of type 
‘double’
|
/path/to/main.c:20:14: note: only use operands of a type that represents whole 
numbers inside the size argument

Also, another point to discuss is the event note in case the expression is
wrapped in a variable, such as in test_3:
/path/to/main.c:30:10: warning: use of floating point arithmetic inside the 
size argument might yield unexpected results 
[-Wanalyzer-imprecise-floating-point-arithmetic]
   30 |   return malloc (size); /* { dg-line test_3 } */
  |  ^
  ‘test_3’: events 1-2
|
|   37 | void test_3 (float f)
|  |  ^~
|  |  |
|  |  (1) entry to ‘test_3’
|   38 | {
|   39 |   void *ptr = alloc_me (f); /* { dg-message "calling 'alloc_me' 
from 'test_3'" } */
|  |   
|  |   |
|  |   (2) calling ‘alloc_me’ from ‘test_3’
|
+--> ‘alloc_me’: events 3-4
   |
   |   28 | void *alloc_me (size_t size)
   |  |   ^~~~
   |  |   |
   |  |   (3) entry to ‘alloc_me’
   |   29 | {
   |   30 |   return malloc (size); /* { dg-line test_3 } */
   |  |  ~
   |  |  |
   |  |  (4) operand ‘f’ is of type ‘float’
   |

I'm not sure if it is easily discoverable that event (4) does refer to
'size'. I thought about also printing get_representative_tree (capacity)
in the event but that would clutter the event message if the argument
does hold the full expression. I don't have any strong feelings about the
decision here but if I had to decide I'd leave it as is (especially
because the warning is probably quite unusual).
The index of the argument would also be a possibility, but that would get
tricky for calloc.

Regtested on Linux x86_64, ran the analyzer & analyzer-torture tests with
the -m32 option enabled and had no false positives on coreutils, httpd,
openssh and curl.

2022-08-15  Tim Lange  

gcc/analyzer/ChangeLog:

PR analyzer/106181
* analyzer.opt: Add Wanalyzer-imprecise-floating-point-arithmetic.
* region-model-impl-calls.cc (region_model::impl_call_alloca):
Add call to region_model::check_region_capacity_for_floats.
(region_model::impl_call_calloc):
Add call to region_model::check_region_capacity_for_floats.
(region_model::impl_call_malloc):
Add call to region_model::check_region_capacity_for_floats.
* region-model.cc (is_any_cast_p): Formatting.
(region_model::check_region_size): Ensure precondition.
(class imprecise_floating_point_arithmetic): New abstract
diagnostic class for all floating point related warnings.
(class float_as_size_arg): Concrete diagnostic class to complain
about floating point operands inside the size argument.
(class contains_floating_point_visitor):
New visitor to find floating point operands inside svalues.
(region_model::check_region_capacity_for_floats):
New function.
* region-model.h (class region_model):
Add region_model::check_region_capacity_for_floats.

gcc/ChangeLog:

PR analyzer/106181
* doc/invoke.texi:
Add Wan

Re: [PATCH] tree-optimization/106593 - fix ICE with backward threading

2022-08-15 Thread Richard Biener via Gcc-patches
On Fri, 12 Aug 2022, Andrew MacLeod wrote:

> 
> On 8/12/22 07:31, Aldy Hernandez wrote:
> > On Fri, Aug 12, 2022 at 12:59 PM Richard Biener  wrote:
> >> With the last re-org I failed to make sure to not add SSA names
> >> nor supported by ranger into m_imports which then triggers an
> >> ICE in range_on_path_entry because range_of_expr returns false.  I've
> >> noticed that range_on_path_entry does mighty complicated things
> >> that don't make sense to me and the commentary might just be
> >> out of date.  For the sake of it I replaced it with range_on_entry
> >> and statistics show we thread _more_ jumps with that, so better
> >> not do magic there.
> > Hang on, hang on.  range_on_path_entry was written that way for a
> > reason.  Andrew and I had numerous discussions about this.  For that
> > matter, my first implementation did exactly what you're proposing, but
> > he had reservations about using range_on_entry, which IIRC he thought
> > should be removed from the (public) API because it had a tendency to
> > blow up lookups.
> >
> > Let's wait for Andrew to chime in on this.  If indeed the commentary
> > is out of date, I would much rather use range_on_entry like you
> > propose, but he and I have fought many times about this... over
> > various versions of the path solver :).
> 
> The original issue with range-on-entry is that one needed to be very careful
> with it.  If you ask for range-on-entry of something which is not dominated
> by the definition, then the cache-filling walk was getting filled all the way
> back to the top of the IL, and that was both a waste of time and memory, and
> in some pathological cases was outrageous.

I think this won't happen with the backthreader sanitizing of m_imports.

I have pushed the change given the comments made later.

Thanks,
Richard.

>  And it was happening more frequently than
> one imagines... even if accidentally.  I think the most frequent accidental
> misuse we saw was calling range on entry for a def within the block, or a PHI
> for the block.
> 
> It's a legitimate issue for used-before-defined cases, but there isn't much
> we can do about those anyway.
> 
> range_of_expr on any stmt within a block, when the definition comes from
> outside the block, causes ranger to trigger its internal range-on-entry "more
> safely", which is why it didn't need to be part of the API... but I admit it
> does cause some conniptions when, for instance, there is no stmt in the block.
> 
> That said, the improvements since then to the cache to be able to always use
> dominators, and selectively update the cache at strategic locations probably
> removes most issues with it. That plus we're more careful about timing things
> these days to make sure something horrid isn't introduced.  I also notice all
> my internal range_on_entry and _exit routines have evolved and are much
> cleaner than they once were.
> 
> So, now that we are sufficiently mature in this space, I think we can
> promote range_on_entry and range_on_exit to the full public API.  It does
> seem that there is some practical use for them.
> 
> Andrew
> 
> PS. It might even be worthwhile to add an assert to make sure it isn't
> being called on the def block, just to avoid that particular stupidity :-)
> I'll take care of doing this.
> 
> 
> 
> 
> 
> > For now I would return VARYING in range_on_path_entry if range_of_expr
> > returns false.  We shouldn't be ICEing when we can gracefully handle
> > things.  This gcc_unreachable was there to catch implementation issues
> > during development.
> >
> > I would keep your gimple_range_ssa_p check regardless.  No sense doing
> > extra work if we're absolutely sure we won't handle it.
> >
> > Aldy
> >
> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >>
> >> Will push if that succeeds.
> >>
> >>  PR tree-optimization/106593
> >>  * tree-ssa-threadbackward.cc (back_threader::find_paths):
> >>  If the imports from the conditional do not satisfy
> >>  gimple_range_ssa_p don't try to thread anything.
> >>  * gimple-range-path.cc (range_on_path_entry): Just
> >>  call range_on_entry.
> >> ---
> >>   gcc/gimple-range-path.cc   | 33 +
> >>   gcc/tree-ssa-threadbackward.cc |  6 +-
> >>   2 files changed, 6 insertions(+), 33 deletions(-)
> >>
> >> diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
> >> index b6148eb5bd7..a7d277c31b8 100644
> >> --- a/gcc/gimple-range-path.cc
> >> +++ b/gcc/gimple-range-path.cc
> >> @@ -153,38 +153,7 @@ path_range_query::range_on_path_entry (vrange &r, tree
> >> name)
> >>   {
> >> gcc_checking_assert (defined_outside_path (name));
> >> basic_block entry = entry_bb ();
> >> -
> >> -  // Prefer to use range_of_expr if we have a statement to look at,
> >> -  // since it has better caching than range_on_edge.
> >> -  gimple *last = last_stmt (entry);
> >> -  if (last)
> >> -{
> >> -  if (m_ranger->range_of_expr (

Re: [PATCH] c++: Implement P2327R1 - De-deprecating volatile compound operations

2022-08-15 Thread Marek Polacek via Gcc-patches
On Mon, Aug 15, 2022 at 12:31:10PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> From what I can see, this has been voted in as a DR and as it means
> we warn less often than before in -std={gnu,c}++2{0,3} modes or with
> -Wvolatile, I wonder if it shouldn't be backported to affected release
> branches as well.

I'd say so.
 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Can't approve but LGTM.

> 2022-08-15  Jakub Jelinek  
> 
>   * typeck.cc (cp_build_modify_expr): Implement
>   P2327R1 - De-deprecating volatile compound operations.  Don't warn
>   for |=, &= or ^= with volatile lhs.
>   * expr.cc (mark_use) : Adjust warning wording,
>   leave out simple.
> 
>   * g++.dg/cpp2a/volatile1.C: Adjust for de-deprecation of volatile
>   compound |=, &= and ^= operations.
>   * g++.dg/cpp2a/volatile3.C: Likewise.
>   * g++.dg/cpp2a/volatile5.C: Likewise.
> 
> --- gcc/cp/typeck.cc.jj   2022-06-17 17:36:19.689107831 +0200
> +++ gcc/cp/typeck.cc  2022-08-14 11:14:15.368316963 +0200
> @@ -9136,10 +9136,14 @@ cp_build_modify_expr (location_t loc, tr
>  
> /* An expression of the form E1 op= E2.  [expr.ass] says:
>"Such expressions are deprecated if E1 has volatile-qualified
> -  type."  We warn here rather than in cp_genericize_r because
> +  type and op is not one of the bitwise operators |, &, ^."
> +  We warn here rather than in cp_genericize_r because
>for compound assignments we are supposed to warn even if the
>assignment is a discarded-value expression.  */
> -   if (TREE_THIS_VOLATILE (lhs) || CP_TYPE_VOLATILE_P (lhstype))
> +   if (modifycode != BIT_AND_EXPR
> +   && modifycode != BIT_IOR_EXPR
> +   && modifycode != BIT_XOR_EXPR
> +   && (TREE_THIS_VOLATILE (lhs) || CP_TYPE_VOLATILE_P (lhstype)))
>   warning_at (loc, OPT_Wvolatile,
>   "compound assignment with %-qualified left "
>   "operand is deprecated");
> --- gcc/cp/expr.cc.jj 2022-06-27 11:18:02.268063761 +0200
> +++ gcc/cp/expr.cc2022-08-14 11:41:37.555649422 +0200
> @@ -220,7 +220,7 @@ mark_use (tree expr, bool rvalue_p, bool
>  case MODIFY_EXPR:
>   {
> tree lhs = TREE_OPERAND (expr, 0);
> -   /* [expr.ass] "A simple assignment whose left operand is of
> +   /* [expr.ass] "An assignment whose left operand is of
>a volatile-qualified type is deprecated unless the assignment
>is either a discarded-value expression or appears in an
>unevaluated context."  */
> @@ -230,7 +230,7 @@ mark_use (tree expr, bool rvalue_p, bool
> && !TREE_THIS_VOLATILE (expr))
>   {
> if (warning_at (location_of (expr), OPT_Wvolatile,
> -   "using value of simple assignment with "
> +   "using value of assignment with "
> "%-qualified left operand is "
> "deprecated"))
>   /* Make sure not to warn about this assignment again.  */
> --- gcc/testsuite/g++.dg/cpp2a/volatile1.C.jj 2020-07-28 15:39:10.013756159 
> +0200
> +++ gcc/testsuite/g++.dg/cpp2a/volatile1.C2022-08-14 11:46:42.721626890 
> +0200
> @@ -56,6 +56,9 @@ fn2 ()
>vi = i;
>vi = i = 42;
>i = vi = 42; // { dg-warning "assignment with .volatile.-qualified left 
> operand is deprecated" "" { target c++20 } }
> +  i = vi |= 42; // { dg-warning "using value of assignment with 
> .volatile.-qualified left operand is deprecated" "" { target c++20 } }
> +  i = vi &= 42; // { dg-warning "using value of assignment with 
> .volatile.-qualified left operand is deprecated" "" { target c++20 } }
> +  i = vi ^= 42; // { dg-warning "using value of assignment with 
> .volatile.-qualified left operand is deprecated" "" { target c++20 } }
>&(vi = i); // { dg-warning "assignment with .volatile.-qualified left 
> operand is deprecated" "" { target c++20 } }
>(vi = 42, 45);
>(i = vi = 42, 10); // { dg-warning "assignment with .volatile.-qualified 
> left operand is deprecated" "" { target c++20 } }
> @@ -74,8 +77,9 @@ fn2 ()
>vi += i; // { dg-warning "assignment with .volatile.-qualified left 
> operand is deprecated" "" { target c++20 } }
>vi -= i; // { dg-warning "assignment with .volatile.-qualified left 
> operand is deprecated" "" { target c++20 } }
>vi %= i; // { dg-warning "assignment with .volatile.-qualified left 
> operand is deprecated" "" { target c++20 } }
> -  vi ^= i; // { dg-warning "assignment with .volatile.-qualified left 
> operand is deprecated" "" { target c++20 } }
> -  vi |= i; // { dg-warning "assignment with .volatile.-qualified left 
> operand is deprecated" "" { target c++20 } }
> +  vi ^= i; // { dg-bogus "assignment with .volatile.-qualified left operand 
> is deprecated" }
> +  vi |= i; // { dg-bogus "assignment with .volatile.-qualified left operand 
> is 

Re: [PATCH] Implement __builtin_issignaling

2022-08-15 Thread Richard Biener via Gcc-patches
On Mon, 15 Aug 2022, Jakub Jelinek wrote:

> On Mon, Aug 15, 2022 at 11:24:14AM +, Richard Biener wrote:
> > Unlike the issignalling macro from glibc the builtin will return
> > false for sNaN arguments when -fno-signalling-nans is used (similar
> > to isinf, isnan, etc.).  I think this deserves mentioning in the
> > documentation (and I have my reservations about this long-time
> > behavior of FP classification builtins we have).
> 
> I have actually tried to make the builtin working even with
> -fno-signaling-nans (i.e. the default).
> That is why the folding is done only if the argument is REAL_CST
> or if !tree_expr_maybe_nan_p (arg).
> At one point I was doing the folding when
> tree_expr_signaling_nan_p (arg) (to true) or
> !tree_expr_maybe_signaling_nan_p (arg) (to false) and in that
> case indeed -fsignaling-nans was a requirement.
> -fsignaling-nans is used in the tests nevertheless because the
> tests really do care about sNaNs, so I've turned on the option
> that says they should be honored.

Ah, I misread

+static rtx
+expand_builtin_issignaling (tree exp, rtx target)
+{
+  if (!validate_arglist (exp, REAL_TYPE, VOID_TYPE))
+return NULL_RTX;
+
+  tree arg = CALL_EXPR_ARG (exp, 0);
+  scalar_float_mode fmode = SCALAR_FLOAT_TYPE_MODE (TREE_TYPE (arg));
+  const struct real_format *fmt = REAL_MODE_FORMAT (fmode);
+
+  /* Expand the argument yielding a RTX expression. */
+  rtx temp = expand_normal (arg);
+
+  /* If mode doesn't support NaN, always return 0.  */
+  if (!HONOR_NANS (fmode))
+{
+  emit_move_insn (target, const0_rtx);
+  return target;

which doesn't use HONOR_SNANS but still HONOR_NANS and thus
-ffinite-math-only.  You possibly want MODE_HAS_NANS instead here?

> > Generally it looks OK - what does it do to size optimized code?
> 
> The problem is that except for the glibc __issignaling{f,,l,f128}
> entrypoints, other C libraries don't implement it, so there is nothing to
> fallback to (unless we want to also implement it in libgcc.a).
> 
> For float/double, it is relatively short:
> movd%xmm0, %eax
> xorl$4194304, %eax
> andl$2147483647, %eax
> cmpl$2143289344, %eax
> seta%al
> movzbl  %al, %eax
> which e.g. for if (__builtin_issignaling (arg)) could be even simplified
> further by just doing ja or jna, resp.
> movabsq $9221120237041090560, %rdx
> movq%xmm0, %rax
> btcq$51, %rax
> btrq$63, %rax
> cmpq%rax, %rdx
> setb%al
> movzbl  %al, %eax
> For long double (especially Intel) / _Float128 it is larger (26 insns
> for XFmode, 15 for _Float128), sure.
> 
> > glibc 2.31 seems to silently accept
> > 
> > #include <math.h>
> > 
> > int foo(_Complex double x)
> > {
> >   return issignaling (x);
> > }
> 
> That seems like a glibc bug/weird feature in the __MATH_TG macro
> or _Generic.
> When compiled with C++ it is rejected.

So what about __builtin_issignaling then?  Do we want to silently
ignore errors there?

Richard.


Re: [PATCH] libgfortran: Use __builtin_issignaling in libgfortran

2022-08-15 Thread Thomas Koenig via Gcc-patches



Hi Jakub,


The following patch makes use of the new __builtin_issignaling,
so it no longer needs the fallback implementation and can use
the builtin even where glibc provides the macro.

Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux
and powerpc64le-linux, ok for trunk?


OK. Can you mention PR 105105 in the ChangeLog when you commit?

Thanks for the patch!

Best regards

Thomas


Re: [PATCH] Implement __builtin_issignaling

2022-08-15 Thread Andreas Schwab via Gcc-patches
On Aug 15 2022, Jakub Jelinek via Gcc-patches wrote:

> That seems like a glibc bug/weird feature in the __MATH_TG macro
> or _Generic.

__MATH_TG is only defined for real floating types, since all of the
type-generic macros in <math.h> only accept real floating types.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] Implement __builtin_issignaling

2022-08-15 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 15, 2022 at 11:24:14AM +, Richard Biener wrote:
> Unlike the issignalling macro from glibc the builtin will return
> false for sNaN arguments when -fno-signalling-nans is used (similar
> to isinf, isnan, etc.).  I think this deserves mentioning in the
> documentation (and I have my reservations about this long-time
> behavior of FP classification builtins we have).

I have actually tried to make the builtin working even with
-fno-signaling-nans (i.e. the default).
That is why the folding is done only if the argument is REAL_CST
or if !tree_expr_maybe_nan_p (arg).
At one point I was doing the folding when
tree_expr_signaling_nan_p (arg) (to true) or
!tree_expr_maybe_signaling_nan_p (arg) (to false) and in that
case indeed -fsignaling-nans was a requirement.
-fsignaling-nans is used in the tests nevertheless because the
tests really do care about sNaNs, so I've turned on the option
that says they should be honored.

> Generally it looks OK - what does it do to size optimized code?

The problem is that except for the glibc __issignaling{f,,l,f128}
entrypoints, other C libraries don't implement it, so there is nothing to
fallback to (unless we want to also implement it in libgcc.a).

For float/double, it is relatively short:
movd%xmm0, %eax
xorl$4194304, %eax
andl$2147483647, %eax
cmpl$2143289344, %eax
seta%al
movzbl  %al, %eax
which e.g. for if (__builtin_issignaling (arg)) could be even simplified
further by just doing ja or jna, resp.
movabsq $9221120237041090560, %rdx
movq%xmm0, %rax
btcq$51, %rax
btrq$63, %rax
cmpq%rax, %rdx
setb%al
movzbl  %al, %eax
For long double (especially Intel) / _Float128 it is larger (26 insns
for XFmode, 15 for _Float128), sure.

> glibc 2.31 seems to silently accept
> 
> #include <math.h>
> 
> int foo(_Complex double x)
> {
>   return issignaling (x);
> }

That seems like a glibc bug/weird feature in the __MATH_TG macro
or _Generic.
When compiled with C++ it is rejected.

Jakub



Re: [PATCH] ifcvt: Fix up noce_convert_multiple_sets [PR106590]

2022-08-15 Thread Richard Biener via Gcc-patches
On Mon, 15 Aug 2022, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled on x86_64-linux.
> The problem is in the noce_convert_multiple_sets optimization.
> We essentially have:
> if (g == 1)
>   {
> g = 1;
> f = 23;
>   }
> else
>   {
> g = 2;
> f = 20;
>   }
> and for each insn try to create a conditional move sequence.
> There is code to detect overlap with the regs used in the condition
> and the destinations, so we actually try to construct:
> tmp_g = g == 1 ? 1 : 2;
> f = g == 1 ? 23 : 20;
> g = tmp_g;
> which is fine.  But, we actually try to create two different
> conditional move sequences in each case, seq1 with the whole
> (eq (reg/v:HI 82 [ g ]) (const_int 1 [0x1]))
> condition and seq2 with cc_cmp
> (eq (reg:CCZ 17 flags) (const_int 0 [0]))
> to rely on the earlier present comparison.  In each case, we
> compare the rtx costs and choose the cheaper sequence (seq1 if both
> have the same cost).
> The problem is that with the skylake tuning,
> tmp_g = g == 1 ? 1 : 2;
> is actually expanded as
> tmp_g = (g == 1) + 1;
> in seq1 (which clobbers (reg 17 flags)) and as a cmov in seq2
> (which doesn't).  The tuning says both have the same cost, so we
> pick seq1.  Next we check sequences for
> f = g == 1 ? 23 : 20; and here the seq2 cmov is cheaper, but it
> uses (reg 17 flags) which has been clobbered earlier.
> 
> The following patch fixes that by detecting if we in the chosen
> sequence clobber some register mentioned in cc_cmp or rev_cc_cmp,
> and if yes, arranges for only seq1 (i.e. sequences that emit the
> comparison itself) to be used after that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2022-08-15  Jakub Jelinek  
> 
>   PR rtl-optimization/106590
>   * ifcvt.cc (check_for_cc_cmp_clobbers): New function.
>   (noce_convert_multiple_sets_1): If SEQ sets or clobbers any regs
>   mentioned in cc_cmp or rev_cc_cmp, don't consider seq2 for any
>   further conditional moves.
> 
>   * gcc.dg/torture/pr106590.c: New test.
> 
> --- gcc/ifcvt.cc.jj   2022-07-26 10:32:23.0 +0200
> +++ gcc/ifcvt.cc  2022-08-12 16:31:45.348151269 +0200
> @@ -3369,6 +3369,20 @@ noce_convert_multiple_sets (struct noce_
>return TRUE;
>  }
>  
> +/* Helper function for noce_convert_multiple_sets_1.  If store to
> +   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
> +
> +static void
> +check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
> +{
> +  rtx *p = (rtx *) p0;
> +  if (p[0] == NULL_RTX)
> +return;
> +  if (reg_overlap_mentioned_p (dest, p[0])
> +  || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
> +p[0] = NULL_RTX;
> +}
> +
>  /* This goes through all relevant insns of IF_INFO->then_bb and tries to
> create conditional moves.  In case a simple move sufficis the insn
> should be listed in NEED_NO_CMOV.  The rewired-src cases should be
> @@ -3519,7 +3533,7 @@ noce_convert_multiple_sets_1 (struct noc
>  
>as min/max and emit an insn, accordingly.  */
>unsigned cost1 = 0, cost2 = 0;
> -  rtx_insn *seq, *seq1, *seq2;
> +  rtx_insn *seq, *seq1, *seq2 = NULL;
>rtx temp_dest = NULL_RTX, temp_dest1 = NULL_RTX, temp_dest2 = NULL_RTX;
>bool read_comparison = false;
>  
> @@ -3531,9 +3545,10 @@ noce_convert_multiple_sets_1 (struct noc
>as well.  This allows the backend to emit a cmov directly without
>creating an additional compare for each.  If successful, costing
>is easier and this sequence is usually preferred.  */
> -  seq2 = try_emit_cmove_seq (if_info, temp, cond,
> -  new_val, old_val, need_cmov,
> -  &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
> +  if (cc_cmp)
> + seq2 = try_emit_cmove_seq (if_info, temp, cond,
> +new_val, old_val, need_cmov,
> +&cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
>  
>/* The backend might have created a sequence that uses the
>condition.  Check this.  */
> @@ -3588,6 +3603,24 @@ noce_convert_multiple_sets_1 (struct noc
> return FALSE;
>   }
>  
> +  if (cc_cmp)
> + {
> +   /* Check if SEQ can clobber registers mentioned in
> +  cc_cmp and/or rev_cc_cmp.  If yes, we need to use
> +  only seq1 from that point on.  */
> +   rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
> +   for (walk = seq; walk; walk = NEXT_INSN (walk))
> + {
> +   note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
> +   if (cc_cmp_pair[0] == NULL_RTX)
> + {
> +   cc_cmp = NULL_RTX;
> +   rev_cc_cmp = NULL_RTX;
> +   break;
> + }
> + }
> + }
> +
>/* End the sub sequence and emit to the main sequence.  */
>emit_insn (seq);
>  
> --- gcc/testsuite/gcc.dg/torture/pr106590.c.jj2

Re: [PATCH v6] LoongArch: add addr_global attribute

2022-08-15 Thread Xi Ruoyao via Gcc-patches
Can we settle on a final solution for this soon?  The merge window of
Linux 6.0 is now closed, and we have two Linux kernel releases that cannot
be built with the Binutils or GCC versions using the new relocation types.
This is just ugly...

On Fri, 2022-08-12 at 17:17 +0800, Xi Ruoyao via Gcc-patches wrote:
> v5 -> v6:
> 
> * still use "addr_global" as we don't have a better name.
> * add a test case with -mno-explicit-relocs.
> 
> -- >8 --
> 
> A linker script and/or a section attribute may locate a local object
> in some way unexpected by the code model, leading to a link failure.
> This happens when the Linux kernel loads a module with "local" per-CPU
> variables.
> 
> Add an attribute to explicitly mark an variable with the address
> unlimited by the code model so we would be able to work around such
> problems.
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch.cc (loongarch_attribute_table):
> New attribute table.
> (TARGET_ATTRIBUTE_TABLE): Define the target hook.
> (loongarch_handle_addr_global_attribute): New static function.
> (loongarch_classify_symbol): Return SYMBOL_GOT_DISP for
> SYMBOL_REF_DECL with addr_global attribute.
> (loongarch_use_anchors_for_symbol_p): New static function.
> (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Define the target hook.
> * doc/extend.texi (Variable Attributes): Document new
> LoongArch specific attribute.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/attr-addr_global-1.c: New test.
> * gcc.target/loongarch/attr-addr_global-2.c: New test.
> ---
>  gcc/config/loongarch/loongarch.cc | 63
> +++
>  gcc/doc/extend.texi   | 17 +
>  .../gcc.target/loongarch/attr-addr_global-1.c | 29 +
>  .../gcc.target/loongarch/attr-addr_global-2.c | 29 +
>  4 files changed, 138 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-
> addr_global-1.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/attr-
> addr_global-2.c
> 
> diff --git a/gcc/config/loongarch/loongarch.cc
> b/gcc/config/loongarch/loongarch.cc
> index 79687340dfd..978e66ed549 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -1643,6 +1643,15 @@ loongarch_classify_symbol (const_rtx x)
>    && !loongarch_symbol_binds_local_p (x))
>  return SYMBOL_GOT_DISP;
>  
> +  if (SYMBOL_REF_P (x))
> +    {
> +  tree decl = SYMBOL_REF_DECL (x);
> +  /* An addr_global symbol may be out of the +/- 2GiB range around
> +    the PC, so we have to use GOT.  */
> +  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
> +   return SYMBOL_GOT_DISP;
> +    }
> +
>    return SYMBOL_PCREL;
>  }
>  
> @@ -6068,6 +6077,54 @@ loongarch_starting_frame_offset (void)
>    return crtl->outgoing_args_size;
>  }
>  
> +static tree
> +loongarch_handle_addr_global_attribute (tree *node, tree name, tree, int,
> +   bool *no_add_attrs)
> +{
> +  tree decl = *node;
> +  if (TREE_CODE (decl) == VAR_DECL)
> +    {
> +  if (DECL_CONTEXT (decl)
> + && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL
> + && !TREE_STATIC (decl))
> +   {
> + error_at (DECL_SOURCE_LOCATION (decl),
> +   "%qE attribute cannot be specified for local "
> +   "variables", name);
> + *no_add_attrs = true;
> +   }
> +    }
> +  else
> +    {
> +  warning (OPT_Wattributes, "%qE attribute ignored", name);
> +  *no_add_attrs = true;
> +    }
> +  return NULL_TREE;
> +}
> +
> +static const struct attribute_spec loongarch_attribute_table[] =
> +{
> +  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
> +   affects_type_identity, handler, exclude } */
> +  { "addr_global", 0, 0, true, false, false, false,
> +    loongarch_handle_addr_global_attribute, NULL },
> +  /* The last attribute spec is set to be NULL.  */
> +  {}
> +};
> +
> +bool
> +loongarch_use_anchors_for_symbol_p (const_rtx symbol)
> +{
> +  tree decl = SYMBOL_REF_DECL (symbol);
> +
> +  /* An addr_global attribute indicates the linker may move the symbol away,
> + so the use of anchor may cause relocation overflow.  */
> +  if (decl && lookup_attribute ("addr_global", DECL_ATTRIBUTES (decl)))
> +    return false;
> +
> +  return default_use_anchors_for_symbol_p (symbol);
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> @@ -6256,6 +6313,12 @@ loongarch_starting_frame_offset (void)
>  #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
> +#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
>  
> +#undef  TARGET_ATTRIBUTE_TABLE
> +#define TARGET_ATTRIBUTE_TABLE loongarch_attribute_table
> +
> +#undef  TARGET_USE_ANCHORS_FOR_SYMBOL_P
> +#define TARGET_USE_ANCHORS_FOR_SYMBOL_P loongarch_use_anchors_f

Re: [PATCH] Implement __builtin_issignaling

2022-08-15 Thread Richard Biener via Gcc-patches
On Mon, 15 Aug 2022, Jakub Jelinek wrote:

> Hi!
> 
> The following patch implements a new builtin, __builtin_issignaling,
> which can be used to implement the ISO/IEC TS 18661-1 issignaling
> macro.
> 
> It is implemented as a type-generic function, so there is just one
> builtin, not many with various suffixes.
> This patch doesn't address PR56831 nor PR58416, but I think that compared
> to using the glibc issignaling macro it could make some cases better (as
> the builtin is always expanded inline and for SFmode/DFmode just
> reinterprets a memory or pseudo register as SImode/DImode, so it could
> avoid some raising of exceptions and turning of sNaNs into qNaNs before
> the builtin can analyze them).
> 
> For floating point modes that do not have NaNs it will return 0;
> otherwise I've tried to implement this for all the other supported
> real formats.
> It handles both the MIPS/PA floats, where an sNaN has the mantissa
> MSB set, and the rest, where an sNaN has it cleared, with the exception
> of formats that are known never to be in the MIPS/PA form.
> The MIPS/PA floats are handled using a test like
> (x & mask) == mask,
> the other usually as
> ((x ^ bit) & mask) > val
> where bit, mask and val are some constants.
> IBM double double is done by doing DFmode test on the most significant
> half, and Intel/Motorola extended (12 or 16 bytes) and IEEE quad are
> handled by extracting 32-bit/16-bit words or 64-bit parts from the
> value and testing those.
> On x86, XFmode is handled by a special optab so that even pseudo numbers
> are considered signaling, like in glibc and like the i386 specific testcase
> tests.
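[Editorial note: the "((x ^ bit) & mask) > val" test quoted above can be sketched in C for IEEE binary32 with the usual (non-MIPS/PA) NaN encoding. The constants below are worked out from the single-precision format for illustration, not copied from the patch.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch: signaling-NaN test for IEEE binary32 where a quiet NaN has the
   mantissa MSB (bit 22) set and a signaling NaN has it clear.  */
int
issignaling_float (float f)
{
  uint32_t x;
  memcpy (&x, &f, sizeof x);   /* reinterpret the bits, no FP traffic */
  x ^= 0x00400000;             /* bit: flip the quiet-NaN mantissa MSB */
  x &= 0x7fffffff;             /* mask: ignore the sign bit */
  /* After the flip, exactly the signaling NaNs compare above the
     quiet-NaN-with-zero-payload pattern.  */
  return x > 0x7fc00000;
}
```

Going through memory with memcpy mirrors the "reinterprets a memory or pseudo register" point: no floating-point operation touches the value, so an sNaN is not quieted before the integer test sees it.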
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux and
> powerpc64-linux (the last tested with -m32/-m64), ok for trunk?

Unlike the issignaling macro from glibc, the builtin will return
false for sNaN arguments when -fno-signaling-nans is used (similar
to isinf, isnan, etc.).  I think this deserves mentioning in the
documentation (and I have my reservations about this long-time
behavior of the FP classification builtins we have).

Generally it looks OK - what does it do to size-optimized code?

glibc 2.31 seems to silently accept

#include <math.h>

int foo(_Complex double x)
{
  return issignaling (x);
}

for vector double we get

t.c:5:23: error: incompatible type for argument 1 of ‘__issignalingf’
5 |   return issignaling (x);
  |   ^
  |   |
  |   v2df {aka __vector(2) double}
/usr/include/bits/mathcalls-helper-functions.h:42:1: note: expected 
‘float’ but argument is of type ‘v2df’ {aka ‘__vector(2) double’}
   42 | __MATHDECL_1 (int, __issignaling,, (_Mdouble_ __value))
  | ^

As far as I can see, your __builtin silently accepts all arguments
without diagnostics and eventually dispatches to 'issignaling',
which isn't in glibc; glibc instead seems to have __issignaling?

Richard.

> 2022-08-15  Jakub Jelinek  
> 
> gcc/
>   * builtins.def (BUILT_IN_ISSIGNALING): New built-in.
>   * builtins.cc (expand_builtin_issignaling): New function.
>   (expand_builtin_signbit): Don't overwrite target.
>   (expand_builtin): Handle BUILT_IN_ISSIGNALING.
>   (fold_builtin_classify): Likewise.
>   (fold_builtin_1): Likewise.
>   * optabs.def (issignaling_optab): New.
>   * fold-const-call.cc (fold_const_call_ss): Handle
>   BUILT_IN_ISSIGNALING.
>   * config/i386/i386.md (issignalingxf2): New expander.
>   * doc/extend.texi (__builtin_issignaling): Document.
>   * doc/md.texi (issignaling2): Likewise.
> gcc/c-family/
>   * c-common.cc (check_builtin_function_arguments): Handle
>   BUILT_IN_ISSIGNALING.
> gcc/c/
>   * c-typeck.cc (convert_arguments): Handle BUILT_IN_ISSIGNALING.
> gcc/fortran/
>   * f95-lang.cc (gfc_init_builtin_functions): Initialize
>   BUILT_IN_ISSIGNALING.
> gcc/testsuite/
>   * gcc.dg/torture/builtin-issignaling-1.c: New test.
>   * gcc.dg/torture/builtin-issignaling-2.c: New test.
>   * gcc.target/i386/builtin-issignaling-1.c: New test.
> 
> --- gcc/builtins.def.jj   2022-01-11 23:11:21.548301986 +0100
> +++ gcc/builtins.def  2022-08-11 12:15:14.200908656 +0200
> @@ -908,6 +908,7 @@ DEF_GCC_BUILTIN(BUILT_IN_ISLESS,
>  DEF_GCC_BUILTIN(BUILT_IN_ISLESSEQUAL, "islessequal", BT_FN_INT_VAR, 
> ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
>  DEF_GCC_BUILTIN(BUILT_IN_ISLESSGREATER, "islessgreater", 
> BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
>  DEF_GCC_BUILTIN(BUILT_IN_ISUNORDERED, "isunordered", BT_FN_INT_VAR, 
> ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
> +DEF_GCC_BUILTIN(BUILT_IN_ISSIGNALING, "issignaling", BT_FN_INT_VAR, 
> ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
>  DEF_LIB_BUILTIN(BUILT_IN_LABS, "labs", BT_FN_LONG_LONG, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_C99_BUILTIN(BUILT_IN_LLABS, "llabs", BT_FN_LONGLONG_LONGLONG, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_LONGJMP,

Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran

2022-08-15 Thread Chung-Lin Tang

On 2022/8/15 7:06 PM, Chung-Lin Tang wrote:


I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran 
interacts, but it's
finally working. Again tested without regressions. Preparing to commit to 
devel/omp/gcc-12, and seeking
approval for mainline when the requires patches are in.


Just realized that I don't have the new testcases added in this patch.
Will supplement them later :P

Thanks,
Chung-Lin


[PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran

2022-08-15 Thread Chung-Lin Tang

After the first libgfortran memory allocator preparation patch, this is the
actual patch that organizes unified_shared_memory allocation into libgfortran.

In the current OpenMP requires implementation, the requires_mask is collected
through offload LTO processing, and presented to libgomp when registering
offload images through GOMP_offload_register_ver() (called by the
mkoffload-generated constructor linked into the program binary).

This means that the only reliable place to access omp_requires_mask is in
GOMP_offload_register_ver.  However, since that is called through an ELF
constructor in the *main program*, it runs later than the
libgfortran/runtime/main.c:init() constructor, and because some libgfortran
init actions there already allocate memory, this can cause deallocation
errors later.

Another issue is that CUDA appears to register some cleanup actions using
atexit(), which forces libgomp to register gomp_target_fini() using atexit as
well (to properly run before the underlying CUDA state disappears).  The same
constraint applies to us here.

So to summarize we need to: (1) order libgfortran init actions after 
omp_requires_mask
processing is done, and (2) order libgfortran cleanup actions before 
gomp_target_fini,
to properly deallocate stuff without crashing.
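[Editorial note: the ordering constraint above rests on atexit() running its handlers in reverse (LIFO) order of registration, so whoever registers last runs first at exit. A minimal sketch, not GCC code:]

```c
#include <assert.h>
#include <stdlib.h>

/* atexit() handlers run LIFO: a handler registered later runs earlier
   at process exit.  This is why libgomp can arrange for its own
   cleanup to run before a previously registered atexit cleanup.  */

static int order[2];
static int n;

static void
registered_first (void)      /* registered first, therefore runs last */
{
  order[n++] = 1;
  /* Verify LIFO: the later-registered handler already ran.  */
  assert (n == 2 && order[0] == 2 && order[1] == 1);
}

static void
registered_second (void)     /* registered second, therefore runs first */
{
  order[n++] = 2;
}

void
register_cleanups (void)
{
  atexit (registered_first);
  atexit (registered_second);
}
```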

The above explanation is for why there's a little new set of definitions, as 
well as
callback registering functions exported from libgomp to libgfortran, basically 
to register
libgfortran init/fini actions into libgomp to run.

Inside GOMP_offload_register_ver, after omp_requires_mask processing is done, 
we call into
libgfortran through a new _gfortran_mem_allocators_init function to insert the 
omp_free/alloc/etc.
based allocators into the Fortran runtime, when 
GOMP_REQUIRES_UNIFIED_SHARED_MEMORY is set.

All symbol references between libgfortran and libgomp are declared as weak
symbols.  Tests of the weak symbols are also used to determine whether the
other library is present in the program.
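[Editorial note: that weak-symbol probing can be sketched as follows. The callback name matches the one in the patch, but this standalone sketch is not linked against libgomp, so on ELF targets the weak reference resolves to NULL:]

```c
#include <stddef.h>

/* A weak declaration lets the program link even when the providing
   library is absent; the symbol's address then resolves to NULL and
   can be tested at run time.  */
extern void GOMP_post_offload_register_callback (void (*fn) (void))
  __attribute__ ((weak));

int
libgomp_present (void)
{
  return GOMP_post_offload_register_callback != NULL;
}
```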

A final issue is the case of an OpenMP program that does NOT use offloading.
We cannot passively determine in libgomp/libgfortran whether offloading
exists; only the main program itself can, by checking whether the hidden
__OFFLOAD_TABLE__ symbol exists.

When we do init/fini libgomp callback registering for OpenMP programs, those
without offloading will not have those callbacks run (because no offload
image is ever loaded).  Therefore the solution here is a constructor added
into the crtoffloadend.o fragment that does a "null" call of
GOMP_offload_register_ver, solely to trigger the post-offload_register
callbacks when __OFFLOAD_TABLE__ is NULL.  (Because of this, the
crtoffloadend.o Makefile rule is adjusted to compile with PIC.)

I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran 
interacts, but it's
finally working. Again tested without regressions. Preparing to commit to 
devel/omp/gcc-12, and seeking
approval for mainline when the requires patches are in.

Thanks,
Chung-Lin

2022-08-15  Chung-Lin Tang  

libgcc/
* Makefile.in (crtoffloadend$(objext)): Add $(PICFLAG) to compile rule.
* offloadstuff.c (GOMP_offload_register_ver): Add declaration of weak
symbol.
(__OFFLOAD_TABLE__): Likewise.
(init_non_offload): New function.

libgfortran/

* gfortran.map (GFORTRAN_13): New namespace.
(_gfortran_mem_allocators_init): New name inside GFORTRAN_13.
* libgfortran.h (mem_allocators_init): New exported declaration.
* runtime/main.c (do_init): Rename from init, add run-once guard code.
(cleanup): Add run-once guard code.
(GOMP_post_offload_register_callback): Declare weak symbol.
(GOMP_pre_gomp_target_fini_callback): Likewise.
(init): New constructor to register offload callbacks, or call do_init
when not OpenMP.
* runtime/memory.c (gfortran_malloc): New pointer variable.
(gfortran_calloc): Likewise.
(gfortran_realloc): Likewise.
(gfortran_free): Likewise.
(mem_allocators_init): New function.
(xmalloc): Use gfortran_malloc.
(xmallocarray): Use gfortran_malloc.
(xcalloc): Use gfortran_calloc.
(xrealloc): Use gfortran_realloc.
(xfree): Use gfortran_free.

libgomp/

* libgomp.map (GOMP_5.1.2): New version namespace.
(GOMP_post_offload_register_callback): New name inside GOMP_5.1.2.
(GOMP_pre_gomp_target_fini_callback): Likewise.
(GOMP_DEFINE_CALLBACK_SET): Macro to define callback set.
(post_offload_register): Define callback set for after offload image
register.
(pre_gomp_target_fini): Define callback set for before gomp_target_fini
is called.
(libgfortran_malloc_usm): New function.
(libgfortran_calloc_usm): Likewise
(libgfortran_realloc_usm): Likewise
(libgfortran_free_usm): Likewise.
(_gfortran_mem_allocators_init): De

[PATCH, OpenMP, Fortran] requires unified_shared_memory 1/2: adjust libgfortran memory allocators

2022-08-15 Thread Chung-Lin Tang

Hi, this patch is to fix the case where 'requires unified_shared_memory' doesn't
work due to memory allocator mismatch. Currently this is only for OG12 
(devel/omp/gcc-12),
but will apply to mainline as well once those requires patches get in.

Basically, 'requires unified_shared_memory' enables the usm_transform pass,
which transforms some of the expanded Fortran intrinsic code that uses
__builtin_free() into 'omp_free (..., ompx_unified_shared_mem_alloc)'.

The intention is to make all dynamic memory allocation use the OpenMP
unified_shared_memory allocator, but there is a big gap in this, namely
libgfortran.  What happens in some tests is that libgfortran allocates memory
using normal malloc(), the usm_transform-generated code frees it using
omp_free(), and chaos ensues.

So the proper fix, we believe, is to make it possible to move the entire
libgfortran onto unified_shared_memory.

This first patch is a mostly mechanical change of all references to
malloc/free/calloc/realloc in libgfortran into the xmalloc/xfree/xcalloc/
xrealloc wrappers from libgfortran/runtime/memory.c, as well as of strdup
uses into a new internal xstrdup.

All of libgfortran is adjusted this way, except libgfortran/caf, which is an 
independent library
outside of libgfortran.so.
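[Editorial note: a hedged sketch of the wrapper style this centralizes on. libgfortran's real xmalloc reports a runtime error rather than a plain abort; the point shown here is that routing every allocation through one spot is what later allows the allocator to be swapped for omp_alloc/omp_free under USM.]

```c
#include <stdlib.h>
#include <string.h>

/* Simplified fail-fast allocation wrapper.  */
void *
xmalloc (size_t n)
{
  void *p = malloc (n ? n : 1);   /* malloc (0) may return NULL */
  if (p == NULL)
    abort ();                     /* out of memory: fatal */
  return p;
}

/* strdup replacement built on the wrapper, so its memory comes from the
   same (swappable) allocator as everything else.  */
char *
xstrdup (const char *s)
{
  size_t len = strlen (s) + 1;
  return (char *) memcpy (xmalloc (len), s, len);
}
```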

The second patch of this series will present a way to switch the references of 
allocators
in libgfortran/runtime/memory.c from the normal glibc malloc/free/etc. to 
omp_alloc/omp_free/etc.
when 'requires unified_shared_memory' is detected.

Tested on devel/omp/gcc-12. Plans is to commit there soon, but also seeking 
approval for mainline
once the requires stuff goes in.

Thanks,
Chung-Lin

2022-08-15  Chung-Lin Tang  

libgfortran/ChangeLog:

* m4/matmul_internal.m4: Adjust malloc/free to xmalloc/xfree.
* generated/matmul_c10.c: Regenerate.
* generated/matmul_c16.c: Likewise.
* generated/matmul_c17.c: Likewise.
* generated/matmul_c4.c: Likewise.
* generated/matmul_c8.c: Likewise.
* generated/matmul_i1.c: Likewise.
* generated/matmul_i16.c: Likewise.
* generated/matmul_i2.c: Likewise.
* generated/matmul_i4.c: Likewise.
* generated/matmul_i8.c: Likewise.
* generated/matmul_r10.c: Likewise.
* generated/matmul_r16.c: Likewise.
* generated/matmul_r17.c: Likewise.
* generated/matmul_r4.c: Likewise.
* generated/matmul_r8.c: Likewise.
* generated/matmulavx128_c10.c: Likewise.
* generated/matmulavx128_c16.c: Likewise.
* generated/matmulavx128_c17.c: Likewise.
* generated/matmulavx128_c4.c: Likewise.
* generated/matmulavx128_c8.c: Likewise.
* generated/matmulavx128_i1.c: Likewise.
* generated/matmulavx128_i16.c: Likewise.
* generated/matmulavx128_i2.c: Likewise.
* generated/matmulavx128_i4.c: Likewise.
* generated/matmulavx128_i8.c: Likewise.
* generated/matmulavx128_r10.c: Likewise.
* generated/matmulavx128_r16.c: Likewise.
* generated/matmulavx128_r17.c: Likewise.
* generated/matmulavx128_r4.c: Likewise.
* generated/matmulavx128_r8.c: Likewise.
* intrinsics/access.c (access_func): Adjust free to xfree.
* intrinsics/chdir.c (chdir_i4_sub): Likewise.
(chdir_i8_sub): Likewise.
* intrinsics/chmod.c (chmod_func): Likewise.
* intrinsics/date_and_time.c (secnds): Likewise.
* intrinsics/env.c (PREFIX(getenv)): Likewise.
(get_environment_variable_i4): Likewise.
* intrinsics/execute_command_line.c (execute_command_line): Likewise.
* intrinsics/getcwd.c (getcwd_i4_sub): Likewise.
* intrinsics/getlog.c (PREFIX(getlog)): Likewise.
* intrinsics/link.c (link_internal): Likewise.
* intrinsics/move_alloc.c (move_alloc): Likewise.
* intrinsics/perror.c (perror_sub): Likewise.
* intrinsics/random.c (constructor_random): Likewise.
* intrinsics/rename.c (rename_internal): Likewise.
* intrinsics/stat.c (stat_i4_sub_0): Likewise.
(stat_i8_sub_0): Likewise.
* intrinsics/symlnk.c (symlnk_internal): Likewise.
* intrinsics/system.c (system_sub): Likewise.
* intrinsics/unlink.c (unlink_i4_sub): Likewise.
* io/async.c (update_pdt): Likewise.
(async_io): Likewise.
(free_async_unit): Likewise.
(init_async_unit): Adjust calloc to xcalloc.
(enqueue_done_id): Likewise.
(enqueue_done): Likewise.
(enqueue_close): Likewise.
* io/async.h (MUTEX_DEBUG_ADD): Adjust malloc/free to xmalloc/xfree.
* io/close.c (st_close): Adjust strdup/free to xstrdup/xfree.
* io/fbuf.c (fbuf_destroy): Adjust free to xfree.
* io/format.c (free_format_hash_table): Likewise.
(save_parsed_format): Likewise.
(free_format): Likewise.
(free_format_data): Likewise.
* io/intrinsics.c (ttynam):

[PATCH] ifcvt: Fix up noce_convert_multiple_sets [PR106590]

2022-08-15 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled on x86_64-linux.
The problem is in the noce_convert_multiple_sets optimization.
We essentially have:
if (g == 1)
  {
g = 1;
f = 23;
  }
else
  {
g = 2;
f = 20;
  }
and for each insn try to create a conditional move sequence.
There is code to detect overlap with the regs used in the condition
and the destinations, so we actually try to construct:
tmp_g = g == 1 ? 1 : 2;
f = g == 1 ? 23 : 20;
g = tmp_g;
which is fine.  But, we actually try to create two different
conditional move sequences in each case, seq1 with the whole
(eq (reg/v:HI 82 [ g ]) (const_int 1 [0x1]))
condition and seq2 with cc_cmp
(eq (reg:CCZ 17 flags) (const_int 0 [0]))
to rely on the earlier present comparison.  In each case, we
compare the rtx costs and choose the cheaper sequence (seq1 if both
have the same cost).
The problem is that with the skylake tuning,
tmp_g = g == 1 ? 1 : 2;
is actually expanded as
tmp_g = (g == 1) + 1;
in seq1 (which clobbers (reg 17 flags)) and as a cmov in seq2
(which doesn't).  The tuning says both have the same cost, so we
pick seq1.  Next we check sequences for
f = g == 1 ? 23 : 20; and here the seq2 cmov is cheaper, but it
uses (reg 17 flags) which has been clobbered earlier.

The following patch fixes that by detecting if we in the chosen
sequence clobber some register mentioned in cc_cmp or rev_cc_cmp,
and if yes, arranges for only seq1 (i.e. sequences that emit the
comparison itself) to be used after that.
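For reference, a minimal C shape of the problematic if/else, reconstructed hypothetically from the description above (not the committed testcase); with the fix, if-conversion must still produce these values:

```c
/* Both branches write g and f; g also feeds the condition, which is why
   ifcvt needs a temporary for it, and why a flags-clobbering sequence
   for one set must not be followed by a flags-using cmov for the next.  */
int g, f;

void
foo (void)
{
  if (g == 1)
    {
      g = 1;
      f = 23;
    }
  else
    {
      g = 2;
      f = 20;
    }
}
```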

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-08-15  Jakub Jelinek  

PR rtl-optimization/106590
* ifcvt.cc (check_for_cc_cmp_clobbers): New function.
(noce_convert_multiple_sets_1): If SEQ sets or clobbers any regs
mentioned in cc_cmp or rev_cc_cmp, don't consider seq2 for any
further conditional moves.

* gcc.dg/torture/pr106590.c: New test.

--- gcc/ifcvt.cc.jj 2022-07-26 10:32:23.0 +0200
+++ gcc/ifcvt.cc	2022-08-12 16:31:45.348151269 +0200
@@ -3369,6 +3369,20 @@ noce_convert_multiple_sets (struct noce_
   return TRUE;
 }
 
+/* Helper function for noce_convert_multiple_sets_1.  If store to
+   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
+
+static void
+check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
+{
+  rtx *p = (rtx *) p0;
+  if (p[0] == NULL_RTX)
+return;
+  if (reg_overlap_mentioned_p (dest, p[0])
+  || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
+p[0] = NULL_RTX;
+}
+
 /* This goes through all relevant insns of IF_INFO->then_bb and tries to
create conditional moves.  In case a simple move sufficis the insn
should be listed in NEED_NO_CMOV.  The rewired-src cases should be
@@ -3519,7 +3533,7 @@ noce_convert_multiple_sets_1 (struct noc
 
 as min/max and emit an insn, accordingly.  */
   unsigned cost1 = 0, cost2 = 0;
-  rtx_insn *seq, *seq1, *seq2;
+  rtx_insn *seq, *seq1, *seq2 = NULL;
   rtx temp_dest = NULL_RTX, temp_dest1 = NULL_RTX, temp_dest2 = NULL_RTX;
   bool read_comparison = false;
 
@@ -3531,9 +3545,10 @@ noce_convert_multiple_sets_1 (struct noc
 as well.  This allows the backend to emit a cmov directly without
 creating an additional compare for each.  If successful, costing
 is easier and this sequence is usually preferred.  */
-  seq2 = try_emit_cmove_seq (if_info, temp, cond,
-new_val, old_val, need_cmov,
-&cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
+  if (cc_cmp)
+   seq2 = try_emit_cmove_seq (if_info, temp, cond,
+  new_val, old_val, need_cmov,
+  &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
 
   /* The backend might have created a sequence that uses the
 condition.  Check this.  */
@@ -3588,6 +3603,24 @@ noce_convert_multiple_sets_1 (struct noc
  return FALSE;
}
 
+  if (cc_cmp)
+   {
+ /* Check if SEQ can clobber registers mentioned in
+cc_cmp and/or rev_cc_cmp.  If yes, we need to use
+only seq1 from that point on.  */
+ rtx cc_cmp_pair[2] = { cc_cmp, rev_cc_cmp };
+ for (walk = seq; walk; walk = NEXT_INSN (walk))
+   {
+ note_stores (walk, check_for_cc_cmp_clobbers, cc_cmp_pair);
+ if (cc_cmp_pair[0] == NULL_RTX)
+   {
+ cc_cmp = NULL_RTX;
+ rev_cc_cmp = NULL_RTX;
+ break;
+   }
+   }
+   }
+
   /* End the sub sequence and emit to the main sequence.  */
   emit_insn (seq);
 
--- gcc/testsuite/gcc.dg/torture/pr106590.c.jj	2022-08-12 16:39:33.931965959 +0200
+++ gcc/testsuite/gcc.dg/torture/pr106590.c	2022-08-12 16:39:17.752179521 +0200
@@ -0,0 +1,75 @@
+/* PR rtl-optimization/106590 } */
+/* { dg-do run } */
+/* { dg-additional-options "-mtune=skylake" { target { i?8

[PATCH] c++: Implement P2327R1 - De-deprecating volatile compound operations

2022-08-15 Thread Jakub Jelinek via Gcc-patches
Hi!

From what I can see, this has been voted in as a DR and as it means
we warn less often than before in -std={gnu,c}++2{0,3} modes or with
-Wvolatile, I wonder if it shouldn't be backported to affected release
branches as well.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-08-15  Jakub Jelinek  

* typeck.cc (cp_build_modify_expr): Implement
P2327R1 - De-deprecating volatile compound operations.  Don't warn
for |=, &= or ^= with volatile lhs.
* expr.cc (mark_use) : Adjust warning wording,
leave out simple.

* g++.dg/cpp2a/volatile1.C: Adjust for de-deprecation of volatile
compound |=, &= and ^= operations.
* g++.dg/cpp2a/volatile3.C: Likewise.
* g++.dg/cpp2a/volatile5.C: Likewise.

--- gcc/cp/typeck.cc.jj 2022-06-17 17:36:19.689107831 +0200
+++ gcc/cp/typeck.cc	2022-08-14 11:14:15.368316963 +0200
@@ -9136,10 +9136,14 @@ cp_build_modify_expr (location_t loc, tr
 
  /* An expression of the form E1 op= E2.  [expr.ass] says:
 "Such expressions are deprecated if E1 has volatile-qualified
-type."  We warn here rather than in cp_genericize_r because
+type and op is not one of the bitwise operators |, &, ^."
+We warn here rather than in cp_genericize_r because
 for compound assignments we are supposed to warn even if the
 assignment is a discarded-value expression.  */
- if (TREE_THIS_VOLATILE (lhs) || CP_TYPE_VOLATILE_P (lhstype))
+ if (modifycode != BIT_AND_EXPR
+ && modifycode != BIT_IOR_EXPR
+ && modifycode != BIT_XOR_EXPR
+ && (TREE_THIS_VOLATILE (lhs) || CP_TYPE_VOLATILE_P (lhstype)))
warning_at (loc, OPT_Wvolatile,
"compound assignment with %<volatile%>-qualified left "
"operand is deprecated");
--- gcc/cp/expr.cc.jj   2022-06-27 11:18:02.268063761 +0200
+++ gcc/cp/expr.cc  2022-08-14 11:41:37.555649422 +0200
@@ -220,7 +220,7 @@ mark_use (tree expr, bool rvalue_p, bool
 case MODIFY_EXPR:
{
  tree lhs = TREE_OPERAND (expr, 0);
- /* [expr.ass] "A simple assignment whose left operand is of
+ /* [expr.ass] "An assignment whose left operand is of
 a volatile-qualified type is deprecated unless the assignment
 is either a discarded-value expression or appears in an
 unevaluated context."  */
@@ -230,7 +230,7 @@ mark_use (tree expr, bool rvalue_p, bool
  && !TREE_THIS_VOLATILE (expr))
{
  if (warning_at (location_of (expr), OPT_Wvolatile,
- "using value of simple assignment with "
+ "using value of assignment with "
  "%<volatile%>-qualified left operand is "
  "deprecated"))
/* Make sure not to warn about this assignment again.  */
--- gcc/testsuite/g++.dg/cpp2a/volatile1.C.jj	2020-07-28 15:39:10.013756159 +0200
+++ gcc/testsuite/g++.dg/cpp2a/volatile1.C	2022-08-14 11:46:42.721626890 +0200
@@ -56,6 +56,9 @@ fn2 ()
   vi = i;
   vi = i = 42;
   i = vi = 42; // { dg-warning "assignment with .volatile.-qualified left 
operand is deprecated" "" { target c++20 } }
+  i = vi |= 42; // { dg-warning "using value of assignment with 
.volatile.-qualified left operand is deprecated" "" { target c++20 } }
+  i = vi &= 42; // { dg-warning "using value of assignment with 
.volatile.-qualified left operand is deprecated" "" { target c++20 } }
+  i = vi ^= 42; // { dg-warning "using value of assignment with 
.volatile.-qualified left operand is deprecated" "" { target c++20 } }
   &(vi = i); // { dg-warning "assignment with .volatile.-qualified left 
operand is deprecated" "" { target c++20 } }
   (vi = 42, 45);
   (i = vi = 42, 10); // { dg-warning "assignment with .volatile.-qualified 
left operand is deprecated" "" { target c++20 } }
@@ -74,8 +77,9 @@ fn2 ()
   vi += i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
   vi -= i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
   vi %= i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
-  vi ^= i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
-  vi |= i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" { target c++20 } }
+  vi ^= i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
+  vi |= i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
+  vi &= i; // { dg-bogus "assignment with .volatile.-qualified left operand is 
deprecated" }
   vi /= i; // { dg-warning "assignment with .volatile.-qualified left operand 
is deprecated" "" 

[PATCH] fortran: Expand ieee_arithmetic module's ieee_value inline [PR106579]

2022-08-15 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch expands IEEE_VALUE function inline in the FE.

Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux
and powerpc64-linux, ok for trunk?

2022-08-15  Jakub Jelinek  

PR fortran/106579
* trans-intrinsic.cc: Include realmpfr.h.
(conv_intrinsic_ieee_value): New function.
(gfc_conv_ieee_arithmetic_function): Handle ieee_value.

--- gcc/fortran/trans-intrinsic.cc.jj   2022-08-12 18:51:28.095927643 +0200
+++ gcc/fortran/trans-intrinsic.cc  2022-08-13 13:24:37.446768877 +0200
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.
 #include "trans-array.h"
 #include "dependency.h"/* For CAF array alias analysis.  */
 #include "attribs.h"
+#include "realmpfr.h"
 
 /* Only for gfc_trans_assign and gfc_trans_pointer_assign.  */
 
@@ -10085,6 +10086,115 @@ conv_intrinsic_ieee_class (gfc_se *se, g
 }
 
 
+/* Generate code for IEEE_VALUE.  */
+
+static void
+conv_intrinsic_ieee_value (gfc_se *se, gfc_expr *expr)
+{
+  tree args[2], arg, ret, tmp;
+  stmtblock_t body;
+
+  /* Convert args, evaluate the second one only once.  */
+  conv_ieee_function_args (se, expr, args, 2);
+  arg = gfc_evaluate_now (args[1], &se->pre);
+
+  tree type = TREE_TYPE (arg);
+  gcc_assert (TREE_CODE (type) == RECORD_TYPE);
+  tree field = NULL_TREE;
+  for (tree f = TYPE_FIELDS (type); f != NULL_TREE; f = DECL_CHAIN (f))
+if (TREE_CODE (f) == FIELD_DECL)
+  {
+   gcc_assert (field == NULL_TREE);
+   field = f;
+  }
+  gcc_assert (field);
+  arg = fold_build3_loc (input_location, COMPONENT_REF, TREE_TYPE (field),
+arg, field, NULL_TREE);
+  arg = gfc_evaluate_now (arg, &se->pre);
+
+  type = gfc_typenode_for_spec (&expr->ts);
+  gcc_assert (TREE_CODE (type) == REAL_TYPE);
+  ret = gfc_create_var (type, NULL);
+
+  gfc_init_block (&body);
+
+  tree end_label = gfc_build_label_decl (NULL_TREE);
+  for (int c = IEEE_SIGNALING_NAN; c <= IEEE_POSITIVE_INF; ++c)
+{
+  tree label = gfc_build_label_decl (NULL_TREE);
+  tree low = build_int_cst (TREE_TYPE (arg), c);
+  tmp = build_case_label (low, low, label);
+  gfc_add_expr_to_block (&body, tmp);
+
+  REAL_VALUE_TYPE real;
+  int k;
+  switch (c)
+   {
+   case IEEE_SIGNALING_NAN:
+ real_nan (&real, "", 0, TYPE_MODE (type));
+ break;
+   case IEEE_QUIET_NAN:
+ real_nan (&real, "", 1, TYPE_MODE (type));
+ break;
+   case IEEE_NEGATIVE_INF:
+ real_inf (&real);
+ real = real_value_negate (&real);
+ break;
+   case IEEE_NEGATIVE_NORMAL:
+ real_from_integer (&real, TYPE_MODE (type), -42, SIGNED);
+ break;
+   case IEEE_NEGATIVE_DENORMAL:
+ k = gfc_validate_kind (BT_REAL, expr->ts.kind, false);
+ real_from_mpfr (&real, gfc_real_kinds[k].tiny,
+ type, GFC_RND_MODE);
+ real_arithmetic (&real, RDIV_EXPR, &real, &dconst2);
+ real = real_value_negate (&real);
+ break;
+   case IEEE_NEGATIVE_ZERO:
+ real_from_integer (&real, TYPE_MODE (type), 0, SIGNED);
+ real = real_value_negate (&real);
+ break;
+   case IEEE_POSITIVE_ZERO:
+ /* Make this also the default: label.  */
+ label = gfc_build_label_decl (NULL_TREE);
+ tmp = build_case_label (NULL_TREE, NULL_TREE, label);
+ gfc_add_expr_to_block (&body, tmp);
+ real_from_integer (&real, TYPE_MODE (type), 0, SIGNED);
+ break;
+   case IEEE_POSITIVE_DENORMAL:
+ k = gfc_validate_kind (BT_REAL, expr->ts.kind, false);
+ real_from_mpfr (&real, gfc_real_kinds[k].tiny,
+ type, GFC_RND_MODE);
+ real_arithmetic (&real, RDIV_EXPR, &real, &dconst2);
+ break;
+   case IEEE_POSITIVE_NORMAL:
+ real_from_integer (&real, TYPE_MODE (type), 42, SIGNED);
+ break;
+   case IEEE_POSITIVE_INF:
+ real_inf (&real);
+ break;
+   default:
+ gcc_unreachable ();
+   }
+
+  tree val = build_real (type, real);
+  gfc_add_modify (&body, ret, val);
+
+  tmp = build1_v (GOTO_EXPR, end_label);
+  gfc_add_expr_to_block (&body, tmp);
+}
+
+  tmp = gfc_finish_block (&body);
+  tmp = fold_build2_loc (input_location, SWITCH_EXPR, NULL_TREE, arg, tmp);
+  gfc_add_expr_to_block (&se->pre, tmp);
+
+  tmp = build1_v (LABEL_EXPR, end_label);
+  gfc_add_expr_to_block (&se->pre, tmp);
+
+  se->expr = ret;
+}
+
+
 /* Generate code for an intrinsic function from the IEEE_ARITHMETIC
module.  */
 
@@ -10117,6 +10227,8 @@ gfc_conv_ieee_arithmetic_function (gfc_s
 conv_intrinsic_ieee_logb_rint (se, expr, BUILT_IN_RINT);
   else if (startswith (name, "ieee_class_") && ISDIGIT (name[11]))
 conv_intrinsic_ieee_class (se, expr);
+  else if (startswith (name, "ieee_value_") && ISDIGIT (name[11]))
+conv_intrinsic_ieee_value (se, expr);
   else
 /* It is not among the functio

[PATCH] fortran: Expand ieee_arithmetic module's ieee_class inline [PR106579]

2022-08-15 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch expands IEEE_CLASS inline in the FE, using the
__builtin_fpclassify, __builtin_signbit and the new __builtin_issignaling
builtins.
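[Editorial note: a hedged C analogue of the classification IEEE_CLASS reduces to: fpclassify picks the category and signbit the sign. Distinguishing IEEE_SIGNALING_NAN from IEEE_QUIET_NAN additionally needs the new __builtin_issignaling, which is elided here so the sketch builds with pre-patch compilers; enum values mirror the patch.]

```c
#include <math.h>

enum {
  IEEE_OTHER_VALUE = 0, IEEE_SIGNALING_NAN, IEEE_QUIET_NAN,
  IEEE_NEGATIVE_INF, IEEE_NEGATIVE_NORMAL, IEEE_NEGATIVE_DENORMAL,
  IEEE_NEGATIVE_ZERO, IEEE_POSITIVE_ZERO, IEEE_POSITIVE_DENORMAL,
  IEEE_POSITIVE_NORMAL, IEEE_POSITIVE_INF
};

int
ieee_class_f (float x)
{
  int neg = signbit (x) != 0;
  switch (fpclassify (x))
    {
    case FP_NAN:       return IEEE_QUIET_NAN;  /* sNaN check elided */
    case FP_INFINITE:  return neg ? IEEE_NEGATIVE_INF : IEEE_POSITIVE_INF;
    case FP_NORMAL:    return neg ? IEEE_NEGATIVE_NORMAL : IEEE_POSITIVE_NORMAL;
    case FP_SUBNORMAL: return neg ? IEEE_NEGATIVE_DENORMAL : IEEE_POSITIVE_DENORMAL;
    case FP_ZERO:      return neg ? IEEE_NEGATIVE_ZERO : IEEE_POSITIVE_ZERO;
    default:           return IEEE_OTHER_VALUE;
    }
}
```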

Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux
and powerpc64-linux, ok for trunk?

2022-08-15  Jakub Jelinek  

PR fortran/106579
gcc/fortran/
* f95-lang.cc (gfc_init_builtin_functions): Initialize
BUILT_IN_FPCLASSIFY.
* libgfortran.h (IEEE_OTHER_VALUE, IEEE_SIGNALING_NAN,
IEEE_QUIET_NAN, IEEE_NEGATIVE_INF, IEEE_NEGATIVE_NORMAL,
IEEE_NEGATIVE_DENORMAL, IEEE_NEGATIVE_SUBNORMAL,
IEEE_NEGATIVE_ZERO, IEEE_POSITIVE_ZERO, IEEE_POSITIVE_DENORMAL,
IEEE_POSITIVE_SUBNORMAL, IEEE_POSITIVE_NORMAL, IEEE_POSITIVE_INF):
New enum.
* trans-intrinsic.cc (conv_intrinsic_ieee_class): New function.
(gfc_conv_ieee_arithmetic_function): Handle ieee_class.
libgfortran/
* ieee/ieee_helper.c (IEEE_OTHER_VALUE, IEEE_SIGNALING_NAN,
IEEE_QUIET_NAN, IEEE_NEGATIVE_INF, IEEE_NEGATIVE_NORMAL,
IEEE_NEGATIVE_DENORMAL, IEEE_NEGATIVE_SUBNORMAL,
IEEE_NEGATIVE_ZERO, IEEE_POSITIVE_ZERO, IEEE_POSITIVE_DENORMAL,
IEEE_POSITIVE_SUBNORMAL, IEEE_POSITIVE_NORMAL, IEEE_POSITIVE_INF):
Move to gcc/fortran/libgfortran.h.

--- gcc/fortran/f95-lang.cc.jj  2022-08-12 17:06:33.906598328 +0200
+++ gcc/fortran/f95-lang.cc 2022-08-12 18:39:47.727073699 +0200
@@ -1017,8 +1017,9 @@ gfc_init_builtin_functions (void)
  "__builtin_issignaling", ATTR_CONST_NOTHROW_LEAF_LIST);
   gfc_define_builtin ("__builtin_signbit", ftype, BUILT_IN_SIGNBIT,
  "__builtin_signbit", ATTR_CONST_NOTHROW_LEAF_LIST);
+  gfc_define_builtin ("__builtin_fpclassify", ftype, BUILT_IN_FPCLASSIFY,
+ "__builtin_fpclassify", ATTR_CONST_NOTHROW_LEAF_LIST);
 
-  ftype = build_function_type (integer_type_node, NULL_TREE);
   gfc_define_builtin ("__builtin_isless", ftype, BUILT_IN_ISLESS,
  "__builtin_isless", ATTR_CONST_NOTHROW_LEAF_LIST);
   gfc_define_builtin ("__builtin_islessequal", ftype, BUILT_IN_ISLESSEQUAL,
--- gcc/fortran/libgfortran.h.jj2022-05-31 11:33:51.550250610 +0200
+++ gcc/fortran/libgfortran.h   2022-08-12 17:22:33.210947170 +0200
@@ -187,3 +187,23 @@ typedef enum
   BT_ASSUMED, BT_UNION, BT_BOZ
 }
 bt;
+
+/* Enumeration of the possible floating-point types. These values
+   correspond to the hidden arguments of the IEEE_CLASS_TYPE
+   derived-type of IEEE_ARITHMETIC.  */
+
+enum {
+  IEEE_OTHER_VALUE = 0,
+  IEEE_SIGNALING_NAN,
+  IEEE_QUIET_NAN,
+  IEEE_NEGATIVE_INF,
+  IEEE_NEGATIVE_NORMAL,
+  IEEE_NEGATIVE_DENORMAL,
+  IEEE_NEGATIVE_SUBNORMAL = IEEE_NEGATIVE_DENORMAL,
+  IEEE_NEGATIVE_ZERO,
+  IEEE_POSITIVE_ZERO,
+  IEEE_POSITIVE_DENORMAL,
+  IEEE_POSITIVE_SUBNORMAL = IEEE_POSITIVE_DENORMAL,
+  IEEE_POSITIVE_NORMAL,
+  IEEE_POSITIVE_INF
+};
--- gcc/fortran/trans-intrinsic.cc.jj   2022-06-28 13:14:45.322799333 +0200
+++ gcc/fortran/trans-intrinsic.cc  2022-08-12 18:51:28.095927643 +0200
@@ -10013,6 +10013,78 @@ conv_intrinsic_ieee_copy_sign (gfc_se *
 }
 
 
+/* Generate code for IEEE_CLASS.  */
+
+static void
+conv_intrinsic_ieee_class (gfc_se *se, gfc_expr *expr)
+{
+  tree arg, c, t1, t2, t3, t4;
+
+  /* Convert arg, evaluate it only once.  */
+  conv_ieee_function_args (se, expr, &arg, 1);
+  arg = gfc_evaluate_now (arg, &se->pre);
+
+  c = build_call_expr_loc (input_location,
+  builtin_decl_explicit (BUILT_IN_FPCLASSIFY), 6,
+  build_int_cst (integer_type_node, IEEE_QUIET_NAN),
+  build_int_cst (integer_type_node,
+ IEEE_POSITIVE_INF),
+  build_int_cst (integer_type_node,
+ IEEE_POSITIVE_NORMAL),
+  build_int_cst (integer_type_node,
+ IEEE_POSITIVE_DENORMAL),
+  build_int_cst (integer_type_node,
+ IEEE_POSITIVE_ZERO),
+  arg);
+  c = gfc_evaluate_now (c, &se->pre);
+  t1 = fold_build2_loc (input_location, EQ_EXPR, logical_type_node,
+   c, build_int_cst (integer_type_node,
+ IEEE_QUIET_NAN));
+  t2 = build_call_expr_loc (input_location,
+   builtin_decl_explicit (BUILT_IN_ISSIGNALING), 1,
+   arg);
+  t2 = fold_build2_loc (input_location, NE_EXPR, logical_type_node,
+   t2, build_zero_cst (TREE_TYPE (t2)));
+  t1 = fold_build2_loc (input_location, TRUTH_AND_EXPR,
+   logical_type_node, t1, t2);
+  t3 = fold_build2_loc (input_location, GE_EXPR, logical_type_node,
+   c, build_int_cst (integer_type_node,
+ IEEE_POSITIVE_ZERO));
+  t4 = build_call_e

[PATCH] libgfortran: Use __builtin_issignaling in libgfortran

2022-08-15 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch makes use of the new __builtin_issignaling,
so it no longer needs the fallback implementation and can use
the builtin even where glibc provides the macro.

Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux
and powerpc64-linux, ok for trunk?

2022-08-15  Jakub Jelinek  

* ieee/ieee_helper.c: Don't include issignaling_fallback.h.
(CLASSMACRO): Use __builtin_issignaling instead of issignaling.
* ieee/issignaling_fallback.h: Removed.

--- libgfortran/ieee/ieee_helper.c.jj   2022-06-27 15:34:47.111928150 +0200
+++ libgfortran/ieee/ieee_helper.c  2022-08-12 13:21:00.922306862 +0200
@@ -26,13 +26,6 @@ see the files COPYING3 and COPYING.RUNTI
 #include "libgfortran.h"
 
 
-/* Check support for issignaling macro.  If not, we include our own
-   fallback implementation.  */
-#ifndef issignaling
-# include "issignaling_fallback.h"
-#endif
-
-
 /* Prototypes.  */
 
 extern int ieee_class_helper_4 (GFC_REAL_4 *);
@@ -94,7 +87,7 @@ enum {
  \
 if (res == IEEE_QUIET_NAN) \
 { \
-  if (issignaling (*value)) \
+  if (__builtin_issignaling (*value)) \
return IEEE_SIGNALING_NAN; \
   else \
return IEEE_QUIET_NAN; \
--- libgfortran/ieee/issignaling_fallback.h.jj  2022-06-28 13:14:45.332799201 +0200
+++ libgfortran/ieee/issignaling_fallback.h 2022-08-12 13:20:17.784877531 +0200
@@ -1,251 +0,0 @@
-/* Fallback implementation of issignaling macro.
-   Copyright (C) 2022 Free Software Foundation, Inc.
-   Contributed by Francois-Xavier Coudert 
-
-This file is part of the GNU Fortran runtime library (libgfortran).
-
-Libgfortran is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public
-License as published by the Free Software Foundation; either
-version 3 of the License, or (at your option) any later version.
-
-Libgfortran is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-Under Section 7 of GPL version 3, you are granted additional
-permissions described in the GCC Runtime Library Exception, version
-3.1, as published by the Free Software Foundation.
-
-You should have received a copy of the GNU General Public License and
-a copy of the GCC Runtime Library Exception along with this program;
-see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-.  */
-
-#include "libgfortran.h"
-
-/* This header provides an implementation of the type-generic issignaling macro.
-   Some points of note:
-
- - This header is only included if the issignaling macro is not defined.
- - All targets for which Fortran IEEE modules are supported currently have
-   the high-order bit of the NaN mantissa clear for signaling (and set
-   for quiet), as recommended by IEEE.
- - We use the __*_IS_IEC_60559__ macros to make sure we only deal with formats
-   we know. For other floating-point formats, we consider all NaNs as quiet.
-
- */
-
-typedef union
-{
-  float value;
-  uint32_t word;
-} ieee_float_shape_type;
-
-static inline int
-__issignalingf (float x)
-{
-#if __FLT_IS_IEC_60559__
-  uint32_t xi;
-  ieee_float_shape_type u;
-
-  u.value = x;
-  xi = u.word;
-
-  xi ^= 0x00400000;
-  return (xi & 0x7fffffff) > 0x7fc00000;
-#else
-  return 0;
-#endif
-}
-
-
-typedef union
-{
-  double value;
-  uint64_t word;
-} ieee_double_shape_type;
-
-static inline int
-__issignaling (double x)
-{
-#if __DBL_IS_IEC_60559__
-  ieee_double_shape_type u;
-  uint64_t xi;
-
-  u.value = x;
-  xi = u.word;
-
-  xi ^= UINT64_C (0x0008000000000000);
-  return (xi & UINT64_C (0x7fffffffffffffff)) > UINT64_C (0x7ff8000000000000);
-#else
-  return 0;
-#endif
-}
-
-
-#if __LDBL_DIG__ == __DBL_DIG__
-
-/* Long double is the same as double.  */
-static inline int
-__issignalingl (long double x)
-{
-  return __issignaling (x);
-}
-
-#elif (__LDBL_DIG__ == 18) && __LDBL_IS_IEC_60559__
-
-/* Long double is x86 extended type.  */
-
-typedef union
-{
-  long double value;
-  struct
-  {
-#if __FLOAT_WORD_ORDER__ == __ORDER_BIG_ENDIAN__
-int sign_exponent:16;
-unsigned int empty:16;
-uint32_t msw;
-uint32_t lsw;
-#elif __FLOAT_WORD_ORDER__ == __ORDER_LITTLE_ENDIAN__
-uint32_t lsw;
-uint32_t msw;
-int sign_exponent:16;
-unsigned int empty:16;
-#endif
-  } parts;
-} ieee_long_double_shape_type;
-
-static inline int
-__issignalingl (long double x)
-{
-  int ret;
-  uint32_t exi, hxi, lxi;
-  ieee_long_double_shape_type u;
-
-  u.value = x;
-  exi = u.parts.sign_exponent;
-  hxi = u.parts.msw;
-  lxi = u.parts.lsw;
-
-  /* Pseudo numbers on x86 are always signaling.  */
-  ret = (exi & 0x7fff) && ((hxi & 0x80000000) == 0);
-
-  hxi ^= 0x40000000;
-  hxi |= (lxi | -lxi) >> 31;
-  return ret || (((exi & 0x7fff) == 0x7fff) && (hxi > 0xc0000000));
-}
-
-#e

[PATCH] Implement __builtin_issignaling

2022-08-15 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch implements a new builtin, __builtin_issignaling,
which can be used to implement the ISO/IEC TS 18661-1  issignaling
macro.

It is implemented as type-generic function, so there is just one
builtin, not many with various suffixes.
This patch doesn't address PR56831 nor PR58416, but I think compared to
using glibc issignaling macro could make some cases better (as
the builtin is expanded always inline and for SFmode/DFmode just
reinterprets a memory or pseudo register as SImode/DImode, so could
avoid some raising of exception + turning sNaN into qNaN before the
builtin can analyze it).

For floating point modes that do not have NaNs it will return 0,
otherwise I've tried to implement this for all the other supported
real formats.
It handles both the MIPS/PA floats where a sNaN has the mantissa
MSB set and the rest where a sNaN has it cleared, with the exception
of formats which are known never to be in the MIPS/PA form.
The MIPS/PA floats are handled using a test like
(x & mask) == mask,
the other usually as
((x ^ bit) & mask) > val
where bit, mask and val are some constants.
IBM double double is done by doing DFmode test on the most significant
half, and Intel/Motorola extended (12 or 16 bytes) and IEEE quad are
handled by extracting 32-bit/16-bit words or 64-bit parts from the
value and testing those.
On x86, XFmode is handled by a special optab so that even pseudo numbers
are considered signaling, like in glibc and like the i386 specific testcase
tests.

Bootstrapped/regtested on x86_64-linux, i686-linux, powerpc64le-linux and
powerpc64-linux (the last tested with -m32/-m64), ok for trunk?

2022-08-15  Jakub Jelinek  

gcc/
* builtins.def (BUILT_IN_ISSIGNALING): New built-in.
* builtins.cc (expand_builtin_issignaling): New function.
(expand_builtin_signbit): Don't overwrite target.
(expand_builtin): Handle BUILT_IN_ISSIGNALING.
(fold_builtin_classify): Likewise.
(fold_builtin_1): Likewise.
* optabs.def (issignaling_optab): New.
* fold-const-call.cc (fold_const_call_ss): Handle
BUILT_IN_ISSIGNALING.
* config/i386/i386.md (issignalingxf2): New expander.
* doc/extend.texi (__builtin_issignaling): Document.
* doc/md.texi (issignaling2): Likewise.
gcc/c-family/
* c-common.cc (check_builtin_function_arguments): Handle
BUILT_IN_ISSIGNALING.
gcc/c/
* c-typeck.cc (convert_arguments): Handle BUILT_IN_ISSIGNALING.
gcc/fortran/
* f95-lang.cc (gfc_init_builtin_functions): Initialize
BUILT_IN_ISSIGNALING.
gcc/testsuite/
* gcc.dg/torture/builtin-issignaling-1.c: New test.
* gcc.dg/torture/builtin-issignaling-2.c: New test.
* gcc.target/i386/builtin-issignaling-1.c: New test.

--- gcc/builtins.def.jj 2022-01-11 23:11:21.548301986 +0100
+++ gcc/builtins.def2022-08-11 12:15:14.200908656 +0200
@@ -908,6 +908,7 @@ DEF_GCC_BUILTIN(BUILT_IN_ISLESS,
 DEF_GCC_BUILTIN(BUILT_IN_ISLESSEQUAL, "islessequal", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
 DEF_GCC_BUILTIN(BUILT_IN_ISLESSGREATER, "islessgreater", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
 DEF_GCC_BUILTIN(BUILT_IN_ISUNORDERED, "isunordered", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
+DEF_GCC_BUILTIN(BUILT_IN_ISSIGNALING, "issignaling", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
 DEF_LIB_BUILTIN(BUILT_IN_LABS, "labs", BT_FN_LONG_LONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_LLABS, "llabs", BT_FN_LONGLONG_LONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_LONGJMP, "longjmp", BT_FN_VOID_PTR_INT, ATTR_NORETURN_NOTHROW_LIST)
--- gcc/builtins.cc.jj  2022-07-26 10:32:23.250277352 +0200
+++ gcc/builtins.cc 2022-08-12 17:13:06.158423558 +0200
@@ -123,6 +123,7 @@ static rtx expand_builtin_fegetround (tr
 static rtx expand_builtin_feclear_feraise_except (tree, rtx, machine_mode,
  optab);
 static rtx expand_builtin_cexpi (tree, rtx);
+static rtx expand_builtin_issignaling (tree, rtx);
 static rtx expand_builtin_int_roundingfn (tree, rtx);
 static rtx expand_builtin_int_roundingfn_2 (tree, rtx);
 static rtx expand_builtin_next_arg (void);
@@ -2747,6 +2748,294 @@ build_call_nofold_loc (location_t loc, t
   return fn;
 }
 
+/* Expand the __builtin_issignaling builtin.  This needs to handle
+   all floating point formats that do support NaNs (for those that
+   don't it just sets target to 0).  */
+
+static rtx
+expand_builtin_issignaling (tree exp, rtx target)
+{
+  if (!validate_arglist (exp, REAL_TYPE, VOID_TYPE))
+return NULL_RTX;
+
+  tree arg = CALL_EXPR_ARG (exp, 0);
+  scalar_float_mode fmode = SCALAR_FLOAT_TYPE_MODE (TREE_TYPE (arg));
+  const struct real_format *fmt = REAL_MODE_FORMAT (fmode);
+
+  /* Expand the argument yielding a RTX expression. */
+  rtx temp = expand_normal (arg);
+
+  /* If mode d

Re: [PATCH] tree-optimization/106514 - revisit m_import compute in backward threading

2022-08-15 Thread Richard Biener via Gcc-patches
On Thu, 11 Aug 2022, Andrew MacLeod wrote:

> 
> On 8/10/22 06:46, Richard Biener wrote:
> >
> > I see the solver itself adds relations from edges on the path so
> > the cruical item here seems to be to add imports for the path
> > entry conditional, but those would likely be GORI imports for that
> > block?  Unfortunately that fails to add t[012], the GORI exports
> > seem to cover all that's needed but then exports might be too much
> 
> the GORI export list is the list of names which GORI thinks might be
> different on the outgoing edges from a block.
> 
> The GORI import list is the list of names which GORI thinks can affect those
> ranges from outside the block.  It doesnt try to look at individual PHI
> aguments, so IIRC it treats a PHI def as originating outside the block for
> import purposes.  This should be a subset of the export list.

Ah - so for the exit block of the path (the block with the conditional
we want to simplify in the threader) using the GORI exports list as
m_imports for the path query (the set of interesting names to compute
ranges for) would be correct then.  Currently 
path_range_query::compute_imports starts with

  // Start with the imports from the exit block...
  basic_block exit = path[0];
  gori_compute &gori = m_ranger->gori ();
  bitmap r_imports = gori.imports (exit);
  bitmap_copy (imports, r_imports);

so that should use gori.exports (exit) I think.  For blocks further up
the path we cannot easily re-use the GORI imports or exports since those
would concern only with names related to the exit conditional of those
blocks, not of the conditional of the exit block on the path as we are
interested in.  So the special computation in the range path query
and the threader seems OK.

> GORI info is only ever concerned with the basic block... all external
> influences are considered to be in the import list. And all GORI calculations
> utilize a range_of_expr query from a range_query to resolve the incoming value
> of an import in order to calculate an outgoing edge.
> 
> outgoing_edge_range_p () has some additional smarts in it which allows for
> recomputations. You may have noticed the depend1/depend2 fields in the
> range_def_chain structure.  These are quick and dirty caches which represent
> the first 2 ssa-names encountered on the stmt when it was processed.  so for
> PHIs, the first 2 ssa names encountered, and for simple range-op qualifying
> stmts, the 1 or 2 ssanames in the def stmt for an ssa-name.
> 
> If you ask for the range of a_32 on an outgoing edge, and it is not in the
> export list, GORI would not be able to provide a range. Outgoing_edge_range_p
> utilizes those 2 depend fields as a quick check to see if a recomputation may
> give us a better range.  if either depend1 or depend2 associated with a_32 IS in
> the export list, then it performs a recomputation of a_32 instead of a GORI
> calculation.  ie  from the example in my previous note:
> 
>   a_32 = f_16 + 10
> <...>
> bb88:
>   if (f_16 < 20)
> bb89:
>      b_8 = a_32 + 8
> 
> depend1 for a_32 will be f_16.
> during rangers evaluation of b_8 in bb89, it will ask for the range of a_32 on
> the edge 88->89.  a_32 is not an export, but it sees that depend1 (f_16) is in
> the export list, so it does a recalculation of a_32 on the edge, coming up
> with f_16 being [0, 19] and returning a_32 as [10, 29].
> 
> This is why you may see outgoing_edge_range_p coming up with things that GORI
> itself doesn't actually provide...

I see ;)

> your example:
> 
> void foo (int nest, int print_nest)
> {
>   _Bool t0 = nest != 0;
>   _Bool t1 = nest == print_nest;
>   _Bool t2 = t0 & t1;
>   if (t2)
>     __builtin_puts ("x");
>   nest++;
>   if (nest > 2)
>     __builtin_abort ();
>   if (print_nest == nest)
>     __builtin_puts ("y");
> }
> 
> 
> For if (nest > 2): nest is the only import and export in that block, but
> outgoing_edge_range_p would be able to recompute t1 and t0 on those edges as
> nest is used in defining them. Likewise, print_nest is used in defining t1, so
> it can also be recomputed. on the final edges.     This could be why adding
> some dependencies from outside the block sometimes gets a better result.
> 
> the recomputation information is not rolled into the GORI exports, as it is
> more dynamic.  we don't know which order this will be seen in, so we dont know
> what ssa_names in advance use 'nest' in their calculations.   we could add
> them as they are seen, but then the bitmaps could explode in size.. so leaving
> it in this dynamic quick check seemed more practical.
> 
> There is no mapping from a name to the other ssanames that depend on it in
> either..  ie, given 'nest', there is no mechanism to see that t0 and t1 use
> it.   I had considered building this list as well, but at the moment didnt
> have a lot of use for it and thought a use map might get large quickly.
> 
> I wonder if what might help is to loop thru all the interesting names that
> have been calculated, and adding 

Re: [PATCH] Tame path_range_query::compute_imports

2022-08-15 Thread Richard Biener via Gcc-patches
On Thu, 11 Aug 2022, Aldy Hernandez wrote:

> On Thu, Aug 11, 2022 at 3:59 PM Andrew MacLeod  wrote:
> >
> >
> > On 8/11/22 07:42, Richard Biener wrote:
> > > This avoids going BBs outside of the path when adding def chains
> > > to the set of imports.  It also syncs the code with
> > > range_def_chain::get_def_chain to not miss out on some imports
> > > this function would identify.
> > >
> > > Bootstrap / regtest pending on x86_64-unknown-linux-gnu.
> > >
> > > The question still stands on what the path_range_query::compute_ranges
> > > actually needs in its m_imports - at least I don't easily see how
> > > the range-folds will use the path range cache or be path sensitive
> > > at all.
> >
> > All the range folding code is in gimple_range_fold.{h,cc}, and its
> > driven by the mystical FUR_source classes.  fur_source stands for
> > Fold_Using_Range source, and its basically just an API class which all
> > the folding routines use to make queries. it is used by all the fold
> > routines to ask any questions about valueizing relations,  ssa name,
> > etc..   but abstracts the actual source of the information. Its the
> > distillation from previous incarnations where I use to pass an edge, a
> > stmt and other stuff to each routine that it might need, and decided to
> > abstract since it was very unwieldy.  The base class requires only a
> > range_query which is then used for all queries.
> 
> Note that not only is ranger and path_query a range_query, so is
> vr_values from legacy land.  It all shares the same API.  And the
> simplify_using_ranges class takes a range_query, so it can work with
> legacy or ranger, or even (untested) the path_query class.
> 
> >
> > Then I derive fur_stmt which is instantiated additionally with the stmt
> > you wish to fold at, and it will perform queries using that stmt as the
> > context source..   Any requests for ranges/relations/etc will occur as
> > if that stmt location is the source.  If folding a particular stmt, you
> > use that stmt as the fur_stmt source.  This is also how I do
> > recalculations..  when we see
> > bb4:
> >a_32 = f_16 + 10
> > <...>
> > bb88:
> >if (f_16 < 20)
> >   b_8 = a_32 + 8
> > and there is sufficient reason to think that a_32 would have a different
> > value, we can invoke a re-fold of a_32's definition stmt at the use
> > point in b_8..  using that stmt as the fur_source. Ranger will take into
> > account the range of f_16 being [0,19] at that spot, and recalculate
> > a_32 as [10,29].  Its expensive to do this at every use point, so we
> > only do it if we think there is a good reason at this point.
> >
> > The point is that the fur_source mechanism is how we provide a context,
> > and that class takes care of the details of what the source actually is.
> >
> > There are other fur_sources.. fur_edge allows all the same questions to
> > be answered, but using an edge as the source. Meaning we can calculate
> > an arbitrary stmt/expressions as if it occurs on an edge.
> >
> > There are also a couple of specialized fur_sources.. there is an
> > internal one in ranger which communicates some other information called
> > fur_depend which acts like range_of_stmt, but with additional
> > functionality to register dependencies in GORI as they are seen.
> 
> This is a really good explanation.  I think you should save it and
> included it in the documentation when you/we get around to writing it
> ;-).
> 
> >
> > Aldy overloads the fur_depend class (called jt_fur_source--  Im not sure
> > the origination of the name) to work with the values in the path_query
> > class.   You will note that the path_range_query class inherits from a
> > range_query, so it supports all the range_of_expr, range_of_stmt, and
> > range_on_edge aspect of rangers API.
> 
> The name comes from "jump thread" fur_source.  I should probably
> rename that to path_fur_source.  Note that even though the full
> range_query API is available in path_range_query, only range_of_expr
> and range_of_stmt are supported (or tested).  As I mention in the
> comment for the class:
> 
> // This class is a basic block path solver.  Given a set of BBs
> // indicating a path through the CFG, range_of_expr and range_of_stmt
> // will calculate the range of an SSA or STMT as if the BBs in the
> // path would have been executed in order.
> 
> So using range_on_edge would probably give unexpected results, using
> stuff in the cache as it would appear at the end of the path, or some
> such.  We could definitely harden this class and make it work solidly
> across the entire API, but we've had no uses so far for anything but
> range_of_expr and range_of_stmt-- and even those are only supported
> for a range as it would appear at the end of the path.  So if you call
> range_of_expr with a statement anywhere but the end of the path,
> you're asking for trouble.
> 
> >
> > I believe all attempts are first made to pick up the value from the path
> > cache, and failing that, a query is made 

Re: [PATCH] Support threading of just the exit edge

2022-08-15 Thread Richard Biener via Gcc-patches
On Fri, 12 Aug 2022, Aldy Hernandez wrote:

> On Fri, Aug 12, 2022 at 2:01 PM Richard Biener  wrote:
> >
> > This started with noticing we add ENTRY_BLOCK to our threads
> > just for the sake of simplifying the conditional at the end of
> > the first block in a function.  That's not really threading
> > anything but it ends up duplicating the entry block, and
> > re-writing the result instead of statically fold the jump.
> 
> Hmmm, but threading 2 blocks is not really threading at all??  Unless
> I'm misunderstanding something, this was even documented in the
> backwards threader:
> 
> [snip]
>  That's not really a jump threading opportunity, but instead is
>  simple cprop & simplification.  We could handle it here if we
>  wanted by wiring up all the incoming edges.  If we run this
>  early in IPA, that might be worth doing.   For now we just
>  reject that case.  */
>   if (m_path.length () <= 1)
>   return false;
> 
> Which you undoubtedly ran into because you're specifically eliding the check:
> 
> > - if (m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > - &irreducible)
> > + if ((m_path.length () == 1
> > +  || m_profit.profitable_path_p (m_path, m_name, taken_edge,
> > + &irreducible))

Correct.  But currently the threader just "cheats", picks up one more
block and then "threads" the case anyway, doing this simple simplification
in the most expensive way possible ...

> >
> > The following tries to handle those by recording simplifications
> > of the exit conditional as a thread of length one.  That requires
> > special-casing them in the backward copier since if we do not
> > have any block to copy but modify the jump in place and remove
> > not taken edges this confuses the hell out of remaining threads.
> >
> > So back_jt_path_registry::update_cfg now first marks all
> > edges we know are never taken and then prunes the threading
> > candidates when they include such edge.  Then it makes sure
> > to first perform unreachable edge removal (so we avoid
> > copying them when other thread paths contain the prevailing
> > edge) before continuing to apply the remaining threads.
> 
> This is all beyond my pay grade.  I'm not very well versed in the
> threader per se.  So if y'all think it's a good idea, by all means.
> Perhaps Jeff can chime in, or remembers the above comment?
> 
> >
> > In statistics you can see this avoids quite a bunch of useless
> > threads (I've investigated 3 random files from cc1files with
> > dropped stats in any of the thread passes).
> >
> > Still thinking about it it would be nice to avoid the work of
> > discovering those candidates we have to throw away later
> > which could eventually be done by having the backward threader
> > perform a RPO walk over the CFG, skipping edges that can be
> > statically determined as not being executed.  Below I'm
> > abusing the path range query to statically analyze the exit
> > branch but I assume there's a simpler way of folding this stmt
> > which could then better integrate with such a walk.
> 
> Unreachable paths can be queried with
> path_range_query::unreachable_path_p ().  Could you leverage this?
> The idea was that if we ever resolved any SSA name to UNDEFINED, the
> path itself was unreachable.

The situation is that we end up with paths where an intermediate
branch on the path can be simplified to false - but of course only
if we put all intermediate branch dependences into the list of
imports to consider.

I don't like it very much to use the "threading" code to perform
the simplification but I couldn't figure a cheap way to perform
the simplification without invoking a full EVRP?  That said,
the backwards threader simply does

  basic_block bb;
  FOR_EACH_BB_FN (bb, m_fun)
if (EDGE_COUNT (bb->succs) > 1)
  maybe_thread_block (bb);

  bool changed = m_registry.thread_through_all_blocks (true);

instead of that we should only consider edges that may be executable
by instead walking the CFG along such edges, simplifying BB exit
conditionals.  Unfortunately EVRP is now a C++ maze so I couldn't
find how to actually do such simplification, not knowing how
interacting with ranger influences the path query use either.
If you or Andrew has any suggestions on how to essentially
do a

  if (edge e = find_taken_edge (bb))
{
...
}

where find_taken_edge should be at least as powerful as using
the path query for a single bb then I'd be all ears.  As said,
I tried to find the code to cut&paste in EVRP but failed to ...

Thanks,
Richard.

> 
> Aldy
> 
> >
> > In any case it seems worth more conciously handling the
> > case of exit branches that simplify without path sensitive
> > information.
> >
> > Then the patch also restricts path discovery when we'd produce
> > threads we'll reject later during copying - the backward threader
> > copying cannot handle paths where the to duplicat

Re: [x86_64 PATCH] Support shifts and rotates by integer constants in TImode STV.

2022-08-15 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 15, 2022 at 10:29 AM Roger Sayle  wrote:
>
>
> Many thanks to Uros for reviewing/approving all of the previous pieces.
> This patch adds support for converting 128-bit TImode shifts and rotates
> to SSE equivalents using V1TImode during the TImode STV pass.
> Previously, only logical shifts by multiples of 8 were handled
> (from my patch earlier this month).
>
> As an example of the benefits, the following rotate by 32-bits:
>
> unsigned __int128 a, b;
> void rot32() { a = (b >> 32) | (b << 96); }
>
> when compiled on x86_64 with -O2 previously generated:
>
> movq    b(%rip), %rax
> movq    b+8(%rip), %rdx
> movq    %rax, %rcx
> shrdq   $32, %rdx, %rax
> shrdq   $32, %rcx, %rdx
> movq    %rax, a(%rip)
> movq    %rdx, a+8(%rip)
> ret
>
> with this patch, now generates:
>
> movdqa  b(%rip), %xmm0
> pshufd  $57, %xmm0, %xmm0
> movaps  %xmm0, a(%rip)
> ret
>
> [which uses a V4SI permutation for those that don't read SSE].
> This should help 128-bit cryptography codes that interleave XORs
> with rotations (but that don't use additions or subtractions).
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2022-08-15  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-features.cc
> (timode_scalar_chain::compute_convert_gain): Provide costs for
> shifts and rotates.  Provide gains for comparisons against 0/-1.

Please split out the compare part, it doesn't fit under "Support
shifts and rotates by integer constants in TImode STV." summary.

> (timode_scalar_chain::convert_insn): Handle ASHIFTRT, ROTATERT
> and ROTATE just like existing ASHIFT and LSHIFTRT cases.
> (timode_scalar_to_vector_candidate_p): Handle all shifts and
> rotates by integer constants between 0 and 127.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse4_1-stv-9.c: New test case.

OK for the patch without COMPARE stuff, the separate COMPARE patch is
pre-approved.

Thanks,
Uros.


Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]

2022-08-15 Thread Kewen.Lin via Gcc-patches
on 2022/7/11 11:42, Kewen.Lin wrote:
> on 2022/6/15 14:20, Kewen.Lin wrote:
>> Hi Honza,
>>
>> Thanks for the comments!  Some replies are inlined below.
>>
>> on 2022/6/14 19:37, Jan Hubicka wrote:
 Hi,

 Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
 if func->decl is not null but no cgraph node is available for it.
 As PR105818 shows, this could give an unexpected result.  For the
 case in PR105818, when parsing bar decl in function foo, the cfun
 is a function structure for foo, for which there is no cgraph
 node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
 the context is to optimize for size, the flag optimize_size is
 true.

 The patch is to make optimize_function_for_size_p to check
 optimize_size as what it does when func->decl is unavailable.

 One regression failure got exposed on aarch64-linux-gnu:

 PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
-DPREVENT_OPTIMIZATION  line 21 x == 10 - i

 The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
 used in function fold_range_test during c parsing, it uses
 optimize_function_for_speed_p which is equal to the inversion
 of optimize_function_for_size_p.  At that time cfun->decl is valid
 but no cgraph node for it, w/o this patch function
 optimize_function_for_speed_p returns true eventually, while it
 returns false with this patch.  Since the command line option -Os
 is specified, there is no reason to interpret it as "for speed".
 I think this failure is expected and adjust the test case
 accordingly.

 Is it ok for trunk?

 BR,
 Kewen
 -

PR target/105818

 gcc/ChangeLog:

* predict.cc (optimize_function_for_size_p): Check optimize_size when
func->decl is valid but its cgraph node is unavailable.

 gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr105818.c: New test.
* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
 ---
  gcc/predict.cc  | 2 +-
  gcc/testsuite/gcc.dg/guality/pr54693-2.c| 2 +-
  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +
  3 files changed, 11 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c

 diff --git a/gcc/predict.cc b/gcc/predict.cc
 index 5734e4c8516..6c60a973236 100644
 --- a/gcc/predict.cc
 +++ b/gcc/predict.cc
 @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
cgraph_node *n = cgraph_node::get (fun->decl);
if (n)
  return n->optimize_for_size_p ();
 -  return OPTIMIZE_SIZE_NO;
 +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>>
>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>>> probably better since one can change optimize_size with optimization
>>> attribute.
>>
>> Good point, agree!
>>
>>> However, I think that in most cases where we check for optimize_size
>>> early we are doing something wrong, since at that time there is no
>>> profile available.  Why exactly does PR105818 hit the flag change issue?
>>
>> For PR105818, the reason why the flag changes is this:
>>
>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
>> of rs6000_isa_flags_explicit, it's set as below:
>>
>> /* If we can shrink-wrap the TOC register save separately, then use
>>-msave-toc-indirect unless explicitly disabled.  */
>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>> && flag_shrink_wrap_separate
>> && optimize_function_for_speed_p (cfun))
>>   rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>
>> Initially, rs6000 initialize target_option_default_node with
>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
>> and optimize_size is true.
>>
>> Later, when the C parser handles function foo, it builds a target
>> option node as target_option_default_node in function
>> handle_optimize_attribute, and it does global option saving and
>> verifying there as well; at that time cfun is NULL, so no
>> issue is found.  Then function store_parm_decls allocates the
>> struct_function for foo, and cfun becomes the function struct
>> for foo.  When the C parser continues to handle the decl bar in
>> foo, function handle_optimize_attribute works as before and
>> tries to restore the target options at the end.  It calls
>> targetm.target_option.restore (rs6000_function_specific_restore),
>> which calls function rs6000_option_override_internal again.
>> At this time cfun is not NULL while there is no cgraph
>> node for its decl, so optimize_function_for_speed_p returns true
>> and the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
>> rs6000_isa_flags gets set unexpectedly, becoming inconsistent
>> with the one saved previously.
>>
>> IMHO, both contexts of global and function decl foo here hold
>> optimize_size, function optimize_function_for_spe

[PATCH] analyzer: fix for ICE in sm-fd.cc [PR106551]

2022-08-15 Thread Immad Mir via Gcc-patches
This patch fixes the ICE caused by valid_to_unchecked_state
in sm-fd.cc by exiting early if the first argument of any "dup"
function is invalid.

gcc/analyzer/ChangeLog:
PR analyzer/106551
* sm-fd.cc (check_for_dup): Exit early if the first
argument is invalid for all dup functions.

gcc/testsuite/ChangeLog:
PR analyzer/106551
* gcc.dg/analyzer/fd-dup-1.c: New testcase.

Signed-off-by: Immad Mir 
---
 gcc/analyzer/sm-fd.cc|  3 +--
 gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c | 11 +++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/sm-fd.cc b/gcc/analyzer/sm-fd.cc
index e02b86baad1..505d598f3f0 100644
--- a/gcc/analyzer/sm-fd.cc
+++ b/gcc/analyzer/sm-fd.cc
@@ -976,8 +976,7 @@ fd_state_machine::check_for_dup (sm_context *sm_ctxt, const 
supernode *node,
 {
   check_for_open_fd (sm_ctxt, node, stmt, call, callee_fndecl,
 DIRS_READ_WRITE);
-  if (kind == DUP_1)
-   return;
+  return;
 }
   switch (kind)
 {
diff --git a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c 
b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
index b971d31b1c7..b4f43e7f0ef 100644
--- a/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/fd-dup-1.c
@@ -245,4 +245,15 @@ test_22 (int flags)
 close (fd);
 }
 
+void do_something();
+void
+test_23 ()
+{
+int nullfd = -1;
+int fd = 1;
+if (dup2 (nullfd, fd) < 0) /* { dg-warning "'dup2' on possibly invalid 
file descriptor 'nullfd'" } */
+{
+do_something();
+}
+}
 
-- 
2.25.1



[x86_64 PATCH] Support shifts and rotates by integer constants in TImode STV.

2022-08-15 Thread Roger Sayle

Many thanks to Uros for reviewing/approving all of the previous pieces.
This patch adds support for converting 128-bit TImode shifts and rotates
to SSE equivalents using V1TImode during the TImode STV pass.
Previously, only logical shifts by multiples of 8 were handled
(from my patch earlier this month).

As an example of the benefits, the following rotate by 32-bits:

unsigned __int128 a, b;
void rot32() { a = (b >> 32) | (b << 96); }

when compiled on x86_64 with -O2 previously generated:

movq    b(%rip), %rax
movq    b+8(%rip), %rdx
movq    %rax, %rcx
shrdq   $32, %rdx, %rax
shrdq   $32, %rcx, %rdx
movq    %rax, a(%rip)
movq    %rdx, a+8(%rip)
ret

with this patch, now generates:

movdqa  b(%rip), %xmm0
pshufd  $57, %xmm0, %xmm0
movaps  %xmm0, a(%rip)
ret

[which uses a V4SI permutation for those who don't read SSE].
This should help 128-bit cryptography codes, that interleave XORs
with rotations (but that don't use additions or subtractions).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-08-15  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-features.cc
(timode_scalar_chain::compute_convert_gain): Provide costs for
shifts and rotates.  Provide gains for comparisons against 0/-1.
(timode_scalar_chain::convert_insn): Handle ASHIFTRT, ROTATERT
and ROTATE just like existing ASHIFT and LSHIFTRT cases.
(timode_scalar_to_vector_candidate_p): Handle all shifts and
rotates by integer constants between 0 and 127.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-stv-9.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index effc2f2..8ab65c8 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1209,6 +1209,8 @@ timode_scalar_chain::compute_convert_gain ()
   rtx def_set = single_set (insn);
   rtx src = SET_SRC (def_set);
   rtx dst = SET_DEST (def_set);
+  HOST_WIDE_INT op1val;
+  int scost, vcost;
   int igain = 0;
 
   switch (GET_CODE (src))
@@ -1245,9 +1247,157 @@ timode_scalar_chain::compute_convert_gain ()
 
case ASHIFT:
case LSHIFTRT:
- /* For logical shifts by constant multiples of 8. */
- igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (4)
- : COSTS_N_INSNS (1);
+ /* See ix86_expand_v1ti_shift.  */
+ op1val = XINT (src, 1);
+ if (optimize_insn_for_size_p ())
+   {
+ if (op1val == 64 || op1val == 65)
+   scost = COSTS_N_BYTES (5);
+ else if (op1val >= 66)
+   scost = COSTS_N_BYTES (6);
+ else if (op1val == 1)
+   scost = COSTS_N_BYTES (8);
+ else
+   scost = COSTS_N_BYTES (9);
+
+ if ((op1val & 7) == 0)
+   vcost = COSTS_N_BYTES (5);
+ else if (op1val > 64)
+   vcost = COSTS_N_BYTES (10);
+ else
+   vcost = TARGET_AVX ? COSTS_N_BYTES (19) : COSTS_N_BYTES (23);
+   }
+ else
+   {
+ scost = COSTS_N_INSNS (2);
+ if ((op1val & 7) == 0)
+   vcost = COSTS_N_INSNS (1);
+ else if (op1val > 64)
+   vcost = COSTS_N_INSNS (2);
+ else
+   vcost = TARGET_AVX ? COSTS_N_INSNS (4) : COSTS_N_INSNS (5);
+   }
+ igain = scost - vcost;
+ break;
+
+   case ASHIFTRT:
+ /* See ix86_expand_v1ti_ashiftrt.  */
+ op1val = XINT (src, 1);
+ if (optimize_insn_for_size_p ())
+   {
+ if (op1val == 64 || op1val == 127)
+   scost = COSTS_N_BYTES (7);
+ else if (op1val == 1)
+   scost = COSTS_N_BYTES (8);
+ else if (op1val == 65)
+   scost = COSTS_N_BYTES (10);
+ else if (op1val >= 66)
+   scost = COSTS_N_BYTES (11);
+ else
+   scost = COSTS_N_BYTES (9);
+
+ if (op1val == 127)
+   vcost = COSTS_N_BYTES (10);
+ else if (op1val == 64)
+   vcost = COSTS_N_BYTES (14);
+ else if (op1val == 96)
+   vcost = COSTS_N_BYTES (18);
+ else if (op1val >= 111)
+   vcost = COSTS_N_BYTES (15);
+  else if (TARGET_AVX2 && op1val == 32)
+   vcost = COSTS_N_BYTES (16);
+ else if (TARGET_SSE4_1 && op1val == 32)
+   vcost = COSTS_N_BYTES (20);
+ else if (op1val >= 96)
+   vcost = COSTS_N_BYTES (23);
+ else if ((op1val & 7) == 0)
+   vcost = COSTS_N_BYTES (28);
+  els

Re: [PATCH] vect: Don't allow vect_emulated_vector_p type in vectorizable_call [PR106322]

2022-08-15 Thread Richard Biener via Gcc-patches
On Mon, Aug 15, 2022 at 10:00 AM Kewen.Lin  wrote:
>
> Hi Richi,
>
> >>
> >> Yes, but you just missed the RC for 12.2 so please wait until after
> >> GCC 12.2 is released and the branch is open again.  The testcase looks
> >> mightily complicated so fallout there might well be possible as well ;)
> >> I suppose it wasn't possible to craft a simple C testcase after the
> >> analysis?
> >
> > Thanks for the hints!  Let me give it a try next week and get back to you 
> > then.
> >
>
> As you suggested, I constructed one C testcase, which has been verified
> on both i386 and ppc64 (it failed without the patch and passed with it).
>
> Is this attached patch ok for trunk?  And is it also ok for all release
> branches after a week or so (also after the frozen period)?

Yes.

Thanks,
Richard.

> BR,
> Kewen


Re: [RFC]rs6000: split complicated constant to memory

2022-08-15 Thread Richard Biener via Gcc-patches
On Mon, Aug 15, 2022 at 7:26 AM Jiufu Guo via Gcc-patches
 wrote:
>
> Hi,
>
> This patch tries to put a constant into the constant pool if building
> the constant requires 3 or more instructions.
>
> But there is a concern: I'm wondering if this patch is really profitable.
>
> Because, as I tested: 1. for a simple case, if the instructions cannot
> run in parallel, loading the constant from memory may be faster; but
> 2. if some instructions can run in parallel, loading the constant from
> memory is not a win compared with building the constant.  See the
> examples below.
>
> For f1.c and f3.c, 'loading' the constant would be acceptable in
> runtime terms; for f2.c and f4.c, 'loading' the constant is visibly
> slower.
>
> For real-world cases, both kinds of code sequences exist.
>
> So, I'm not sure if we need to push this patch.
>
> Run the functions below a lot of times (10) to check the runtime.
> f1.c:
> long foo (long *arg, long*, long *)
> {
>   *arg = 0x12345678;
> }
> asm building constant:
> lis 10,0x1234
> ori 10,10,0x5678
> sldi 10,10,32
> vs.  asm loading
> addis 10,2,.LC0@toc@ha
> ld 10,.LC0@toc@l(10)
> The runtimes of 'building' and 'loading' are similar: sometimes
> 'building' is faster, sometimes 'loading' is faster, and the
> difference is slight.

I wonder if it is possible to decide this during scheduling - chose the
variant that, when the result is needed, is cheaper?  Post-RA might
be a bit difficult (I see the load from memory needs the TOC, but then
when the TOC is not available we could just always emit the build form),
and pre-reload precision might not be good enough to make this worth
the experiment?

Of course the scheduler might lack on the technical side as well.

>
> f2.c
> long foo (long *arg, long *arg2, long *arg3)
> {
>   *arg = 0x12345678;
>   *arg2 = 0x79652347;
>   *arg3 = 0x46891237;
> }
> asm building constant:
> lis 7,0x1234
> lis 10,0x7965
> lis 9,0x4689
> ori 7,7,0x5678
> ori 10,10,0x2347
> ori 9,9,0x1237
> sldi 7,7,32
> sldi 10,10,32
> sldi 9,9,32
> vs. loading
> addis 7,2,.LC0@toc@ha
> addis 10,2,.LC1@toc@ha
> addis 9,2,.LC2@toc@ha
> ld 7,.LC0@toc@l(7)
> ld 10,.LC1@toc@l(10)
> ld 9,.LC2@toc@l(9)
> For this case, 'loading' is always slower than 'building' (>15%).
>
> f3.c
> long foo (long *arg, long *, long *)
> {
>   *arg = 384307168202282325;
> }
> lis 10,0x555
> ori 10,10,0x
> sldi 10,10,32
> oris 10,10,0x
> ori 10,10,0x
> For this case, 'building' (through 5 instructions) is slower, and
> 'loading' is faster by ~5%.
>
> f4.c
> long foo (long *arg, long *arg2, long *arg3)
> {
>   *arg = 384307168202282325;
>   *arg2 = -6148914691236517205;
>   *arg3 = 768614336404564651;
> }
> lis 7,0x555
> lis 10,0x
> lis 9,0xaaa
> ori 7,7,0x
> ori 10,10,0x
> ori 9,9,0x
> sldi 7,7,32
> sldi 10,10,32
> sldi 9,9,32
> oris 7,7,0x
> oris 10,10,0x
> oris 9,9,0x
> ori 7,7,0x
> ori 10,10,0xaaab
> ori 9,9,0xaaab
> For these cases, since the 'building' instructions run in parallel,
> 'loading' is slower: ~8%.  On p10, 'loading' (through 'pld') is also
> slower by >4%.
>
>
> BR,
> Jeff(Jiufu)
>
> ---
>  gcc/config/rs6000/rs6000.cc| 14 ++
>  gcc/testsuite/gcc.target/powerpc/pr63281.c | 11 +++
>  2 files changed, 25 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
>
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 4b727d2a500..3798e11bdbc 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10098,6 +10098,20 @@ rs6000_emit_set_const (rtx dest, rtx source)
>   c = ((c & 0x) ^ 0x8000) - 0x8000;
>   emit_move_insn (lo, GEN_INT (c));
> }
> +  else if (base_reg_operand (dest, mode)
> +  && num_insns_constant (source, mode) > 2)
> +   {
> + rtx sym = force_const_mem (mode, source);
> + if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
> + && use_toc_relative_ref (XEXP (sym, 0), mode))
> +   {
> + rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
> + sym = gen_const_mem (mode, toc);
> + set_mem_alias_set (sym, get_TOC_alias_set ());
> +   }
> +
> + emit_insn (gen_rtx_SET (dest, sym));
> +   }
>else
> rs6000_emit_set_long_const (dest, c);
>break;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c 
> b/gcc/testsuite/gcc.target/powerpc/pr63281.c
> new file mode 100644
> index 000..469a8f64400
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
> @@ -0,0 +1,11 @@
> +/* PR target/63281 */
> +/* { dg

PING^2 [PATCH v4] rs6000: Adjust mov optabs for opaque modes [PR103353]

2022-08-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597286.html

BR,
Kewen

> 
> on 2022/6/27 10:47, Kewen.Lin via Gcc-patches wrote:
>> Hi Segher!
>>
>> on 2022/6/25 00:49, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Fri, Jun 24, 2022 at 09:03:59AM +0800, Kewen.Lin wrote:
 on 2022/6/24 03:06, Segher Boessenkool wrote:
> On Wed, May 18, 2022 at 10:07:48PM +0800, Kewen.Lin wrote:
>> As PR103353 shows, we may want to continue to expand a MMA built-in
>> function like a normal function, even if we have already emitted
>> error messages about some missing required conditions.  As shown in
>> that PR, without one explicit mov optab on OOmode provided, it would
>> call emit_move_insn recursively.
>
> First off: lxvp is a VSX insn, not an MMA insn.  So please don't call it
> that -- this confusion is what presumably caused the problem here, so it
> would be good to root it out :-)

 I guess the "it" in "don't call it that" is for "MMA built-in function"?
 It comes from the current code:
>>>
>>> Your proposed commit message says "MMA built-in function".  It is not
>>> an MMA builtin, or rather, it should not be.
>>>
>> +  /* Opaque modes are only expected to be available when MMA is 
>> supported,
>
> Why do people expect that?  It is completely wrong.  The name "opaque"
> itself already says this is not just for MMA, but perhaps more
> importantly, it is a basic VSX insn, doesn't touch any MMA resources,
> and is useful in other contexts as well.

 ... The above statements are also based on current code, for now, the
 related things like built-in functions, mov optab, hard_regno_ok etc.
 for these two modes are guarded by TARGET_MMA.
>>>
>>> Opaque modes are a generic thing, not an rs6000 thing.  It is important
>>> not to conflate completely different things that just happened to
>>> coincide some months ago (but not anymore right now even!)
>>>
 I think I get your points here, you want to separate these opaque
 modes from MMA since the underlying lxvp/stxvp are not MMA specific,
 so those related things (bifs, mov optabs etc.) are not necessarily
 guarded under MMA.
>>>
>>> Yup.  This can take some time of course, but in the mean time we should
>>> stop pretending the status quo is correct.
>>>
> So this needs some bigger surgery.

 Yes, Peter may have more comments on this.
>>>
>>> Yes.  Can you do a patch that just fixes this PR103353, without adding
>>> more misleading comments?  :-)
>>>
>>
>> Many thanks for all the further explanation above!  The attached patch
>> updated the misleading comments as you pointed out and suggested, could
>> you help to have another look?
>>
>> BR,
>> Kewen


PING^4 [PATCH v3] rs6000: Fix the check of bif argument number [PR104482]

2022-08-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595208.html

BR,
Kewen

 Hi,

 As PR104482 shows, there is a regression in the handling of calls
 whose argument count is greater than that of the built-in function
 prototype.  The new bif support only catches the case where the
 argument count is less than that of the function prototype, but it
 misses the case where the argument count is greater.  That is
 because it uses "n != expected_args", where n is updated in

for (n = 0; !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
 fnargs = TREE_CHAIN (fnargs), n++)

 and so is restricted to be less than or equal to expected_args by
 the guard !VOID_TYPE_P (TREE_VALUE (fnargs)); hence the check is
 wrong.

 The fix is to use nargs instead, and also to move the checking hunk
 ahead to avoid useless further scanning when the counts mismatch.

 Bootstrapped and regtested on powerpc64-linux-gnu P8 and
 powerpc64le-linux-gnu P9 and P10.

 v3: Update test case with dg-excess-errors.

 v2: Add one test case and refine commit logs.
 https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593155.html

 v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591768.html

 Is it ok for trunk?

 BR,
 Kewen
 -
PR target/104482

 gcc/ChangeLog:

* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin): Fix
the equality check for argument number, and move this hunk ahead.

 gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr104482.c: New test.
 ---
  gcc/config/rs6000/rs6000-c.cc   | 60 ++---
  gcc/testsuite/gcc.target/powerpc/pr104482.c | 16 ++
  2 files changed, 46 insertions(+), 30 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr104482.c

 diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
 index 9c8cbd7a66e..61881f29230 100644
 --- a/gcc/config/rs6000/rs6000-c.cc
 +++ b/gcc/config/rs6000/rs6000-c.cc
 @@ -1756,6 +1756,36 @@ altivec_resolve_overloaded_builtin (location_t loc, 
 tree fndecl,
vec *arglist = static_cast *> 
 (passed_arglist);
unsigned int nargs = vec_safe_length (arglist);

 +  /* If the number of arguments did not match the prototype, return NULL
 + and the generic code will issue the appropriate error message.  Skip
 + this test for functions where we don't fully describe all the 
 possible
 + overload signatures in rs6000-overload.def (because they aren't 
 relevant
 + to the expansion here).  If we don't, we get confusing error 
 messages.  */
 +  /* As an example, for vec_splats we have:
 +
 +; There are no actual builtins for vec_splats.  There is special handling 
 for
 +; this in altivec_resolve_overloaded_builtin in rs6000-c.cc, where the 
 call
 +; is replaced by a constructor.  The single overload here causes
 +; __builtin_vec_splats to be registered with the front end so that can 
 happen.
 +[VEC_SPLATS, vec_splats, __builtin_vec_splats]
 +  vsi __builtin_vec_splats (vsi);
 +ABS_V4SI SPLATS_FAKERY
 +
 +So even though __builtin_vec_splats accepts all vector types, the
 +infrastructure cheats and just records one prototype.  We end up 
 getting
 +an error message that refers to this specific prototype even when we
 +are handling a different argument type.  That is completely confusing
 +to the user, so it's best to let these cases be handled individually
 +in the resolve_vec_splats, etc., helper functions.  */
 +
 +  if (expected_args != nargs
 +  && !(fcode == RS6000_OVLD_VEC_PROMOTE
 + || fcode == RS6000_OVLD_VEC_SPLATS
 + || fcode == RS6000_OVLD_VEC_EXTRACT
 + || fcode == RS6000_OVLD_VEC_INSERT
 + || fcode == RS6000_OVLD_VEC_STEP))
 +return NULL;
 +
for (n = 0;
 !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
 fnargs = TREE_CHAIN (fnargs), n++)
 @@ -1816,36 +1846,6 @@ altivec_resolve_overloaded_builtin (location_t loc, 
 tree fndecl,
types[n] = type;
  }

 -  /* If the number of arguments did not match the prototype, return NULL
 - and the generic code will issue the appropriate error message.  Skip
 - this test for functions where we don't fully describe all the 
 possible
 - overload signatures in rs6000-overload.def (because they aren't 
 relevant
 - to the expansion here).  If we don't, we get confusing error 
 messages.  */
 -  /* As an example, for vec_splats we have:
 -
 -; There are no actual builtins for vec_splats.  There is special handling 
 for
 -; this in a

PING^4 [PATCH] rs6000: Handle unresolved overloaded builtin [PR105485]

2022-08-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594699.html

BR,
Kewen

>>
>>> on 2022/5/13 13:29, Kewen.Lin via Gcc-patches wrote:
 Hi,

 PR105485 exposes that the new builtin function framework doesn't
 handle unresolved overloaded builtin functions well.  With the new
 builtin function support, we don't have builtin info for any
 overloaded rs6000_gen_builtins enum, since those are expected to be
 resolved to one specific instance.  So when function
 rs6000_gimple_fold_builtin faces an unresolved overloaded builtin,
 the access to the builtin info goes out of bounds and it ICEs.

 We should not try to fold an unresolved overloaded builtin there,
 and as in the previous support we should emit an error message
 during the expansion phase like "unresolved overload for builtin ...".

 Bootstrapped and regtested on powerpc64-linux-gnu P8 and
 powerpc64le-linux-gnu P9 and P10.

 Is it ok for trunk?

 BR,
 Kewen
 -
PR target/105485

 gcc/ChangeLog:

* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin): Add
the handling for unresolved overloaded builtin function.
(rs6000_expand_builtin): Likewise.

 gcc/testsuite/ChangeLog:

* g++.target/powerpc/pr105485.C: New test.

 ---
  gcc/config/rs6000/rs6000-builtin.cc | 13 +
  gcc/testsuite/g++.target/powerpc/pr105485.C |  9 +
  2 files changed, 22 insertions(+)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr105485.C

 diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
 b/gcc/config/rs6000/rs6000-builtin.cc
 index e925ba9fad9..e102305c90c 100644
 --- a/gcc/config/rs6000/rs6000-builtin.cc
 +++ b/gcc/config/rs6000/rs6000-builtin.cc
 @@ -1294,6 +1294,11 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
 *gsi)
enum tree_code bcode;
gimple *g;

 +  /* For an unresolved overloaded builtin, return early here since there
 + is no builtin info for it and we are unable to fold it.  */
 +  if (fn_code > RS6000_OVLD_NONE)
 +return false;
 +
size_t uns_fncode = (size_t) fn_code;
enum insn_code icode = rs6000_builtin_info[uns_fncode].icode;
const char *fn_name1 = rs6000_builtin_info[uns_fncode].bifname;
 @@ -3295,6 +3300,14 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* 
 subtarget */,
tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
enum rs6000_gen_builtins fcode
  = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
 +
 +  /* Emit error message if it's an unresolved overloaded builtin.  */
 +  if (fcode > RS6000_OVLD_NONE)
 +{
 +  error ("unresolved overload for builtin %qF", fndecl);
 +  return const0_rtx;
 +}
 +
size_t uns_fcode = (size_t)fcode;
enum insn_code icode = rs6000_builtin_info[uns_fcode].icode;

 diff --git a/gcc/testsuite/g++.target/powerpc/pr105485.C 
 b/gcc/testsuite/g++.target/powerpc/pr105485.C
 new file mode 100644
 index 000..a3b8290df8c
 --- /dev/null
 +++ b/gcc/testsuite/g++.target/powerpc/pr105485.C
 @@ -0,0 +1,9 @@
 +/* It's to verify no ICE here, ignore error/warning messages since
 +   they are not test points here.  */
 +/* { dg-excess-errors "pr105485" } */
 +
 +template  void __builtin_vec_vslv();
 +typedef  __attribute__((altivec(vector__))) char T;
 +T b (T c, T d) {
 +return __builtin_vec_vslv(c, d);
 +}


PING^1 [PATCH] rs6000: Suggest unroll factor for loop vectorization

2022-08-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598601.html

BR,
Kewen

on 2022/7/20 17:30, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> Commit r12-6679-g7ca1582ca60dc8 made the vectorizer accept an
> unroll factor to be applied to the vectorization factor when
> vectorizing the main loop; the factor is suggested by the target
> during costing.
> 
> This patch introduces function determine_suggested_unroll_factor
> for the rs6000 port, to make it able to suggest the unroll factor
> for a given loop being vectorized.  Referring to the aarch64 port
> and based on the analysis of SPEC2017 performance evaluation
> results, it mainly considers these aspects:
>   1) unroll option and pragma which can disable unrolling for the
>  given loop;
>   2) simple hardware resource model with issued non memory access
>  vector insn per cycle;
>   3) aggressive heuristics when iteration count is unknown:
>  - reduction case to break cross iteration dependency;
>  - emulated gather load;
>   4) estimated iteration count when iteration count is unknown;
> 
> With this patch, SPEC2017 performance evaluation results on
> Power8/9/10 are listed below (speedup pct.):
> 
>   * Power10
> - O2: all are neutral (excluding some noises);
> - Ofast: 510.parest_r +6.67%, the others are neutral
>  (use ... for the followings);
> - Ofast + unroll: 510.parest_r +5.91%, ...
> - Ofast + LTO + PGO: 510.parest_r +3.00%, ...
> - Ofast + cheap vect cost: 510.parest_r +6.23%, ...
> - Ofast + very-cheap vect cost: all are neutral;
> 
>   * Power9
> - Ofast: 510.parest_r +8.73%, 538.imagick_r +11.18%
>  (likely noise), 500.perlbench_r +1.84%, ...
> 
>   * Power8
> - Ofast: 510.parest_r +5.43%, ...;
> 
> This patch also introduces one documented parameter
> rs6000-vect-unroll-limit= similar to what aarch64 proposes,
> by evaluating on P8/P9/P10, the default value 4 is slightly
> better than the other choices like 2 and 8.
> 
> It also parameterizes two other values as undocumented
> parameters for future tweaking.  One parameter is
> rs6000-vect-unroll-issue, it's to simply model hardware
> resource for non memory access vector instructions to avoid
> excessive unrolling, initially I tried to use the value in
> the hook rs6000_issue_rate, but the evaluation showed it's
> bad, so I evaluated different values 2/4/6/8 on P8/P9/P10 at
> Ofast, the results showed the default value 4 is good enough
> on these different architectures.  For the record, choice 8
> could make 510.parest_r's gain become smaller or gone on
> P8/P9/P10; choice 6 could make 503.bwaves_r degrade by more
> than 1% on P8/P10; and choice 2 could make 538.imagick_r
> degrade by 3.8%.  The other parameter is
> rs6000-vect-unroll-reduc-threshold.  It's mainly inspired by
> 510.parest_r and tuned on it; evaluating the threshold with
> different values 0/1/2/3 showed that value 1 is the best
> choice.  For the record, choice 0 could make 525.x264_r
> degrade by 2% and 527.cam4_r degrade by 2.95% on P10,
> 548.exchange2_r degrade by 1.41% and 527.cam4_r degrade by
> 2.54% on P8; choice 2 and bigger values could make
> 510.parest_r's gain become smaller.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P7 and P8,
> and powerpc64le-linux-gnu P9.  Bootstrapped on
> powerpc64le-linux-gnu P10, but one failure was exposed during
> regression testing there, it's identified as one miss
> optimization and can be reproduced without this support,
> PR106365 was opened for further tracking.
> 
> Is it for trunk?
> 
> BR,
> Kewen
> --
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (class rs6000_cost_data): Add new members
>   m_nstores, m_reduc_factor, m_gather_load and member function
>   determine_suggested_unroll_factor.
>   (rs6000_cost_data::update_target_cost_per_stmt): Update for m_nstores,
>   m_reduc_factor and m_gather_load.
>   (rs6000_cost_data::determine_suggested_unroll_factor): New function.
>   (rs6000_cost_data::finish_cost): Use determine_suggested_unroll_factor.
>   * config/rs6000/rs6000.opt (rs6000-vect-unroll-limit): New parameter.
>   (rs6000-vect-unroll-issue): Likewise.
>   (rs6000-vect-unroll-reduc-threshold): Likewise.
>   * doc/invoke.texi (rs6000-vect-unroll-limit): Document new parameter.
> 
> ---
>  gcc/config/rs6000/rs6000.cc  | 125 ++-
>  gcc/config/rs6000/rs6000.opt |  18 +
>  gcc/doc/invoke.texi  |   7 ++
>  3 files changed, 147 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 3ff16b8ae04..d0f107d70a8 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -5208,16 +5208,23 @@ protected:
>   vect_cost_model_location, unsigned int);
>void density_test (loop_vec_info);
>void adjust_vect_cost_per_loop (loop_vec_info);
> +  unsigned int determine_suggested_unroll

PING^1 [PATCH v2] rs6000/test: Fix empty TU in some cases of effective targets [PR106345]

2022-08-15 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping this: 
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598748.html

BR,
Kewen

on 2022/7/25 14:26, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> As the failure of test case gcc.target/powerpc/pr92398.p9-.c in
> PR106345 shows, some test sources for some powerpc effective
> targets wrongly use an empty translation unit.  The test sources
> can be compiled with options like "-ansi -pedantic-errors", and then
> those effective target checks fail unexpectedly with error
> messages like:
> 
>   error: ISO C forbids an empty translation unit [-Wpedantic]
> 
> This patch fixes the empty TUs with one dummy function definition
> accordingly.
> 
> Besides fixing the failures on gcc.target/powerpc/pr92398.p9-.c,
> I can see it helps to bring back some testing coverage:
> 
> NA->PASS: gcc.target/powerpc/pr92398.p9+.c
> NA->PASS: gcc.target/powerpc/pr93453-1.c
> 
> Tested as before.
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598602.html
> v2: Use dummy function instead of dummy int as Segher suggested.
> 
> Segher, does this v2 look good to you?
> 
> BR,
> Kewen
> -
>   PR testsuite/106345
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/target-supports.exp (check_effective_target_powerpc_sqrt): Add
>   a function definition to avoid pedwarn about empty translation unit.
>   (check_effective_target_has_arch_pwr5): Likewise.
>   (check_effective_target_has_arch_pwr6): Likewise.
>   (check_effective_target_has_arch_pwr7): Likewise.
>   (check_effective_target_has_arch_pwr8): Likewise.
>   (check_effective_target_has_arch_pwr9): Likewise.
>   (check_effective_target_has_arch_pwr10): Likewise.
>   (check_effective_target_has_arch_ppc64): Likewise.
>   (check_effective_target_ppc_float128): Likewise.
>   (check_effective_target_ppc_float128_insns): Likewise.
>   (check_effective_target_powerpc_vsx): Likewise.
> ---
>  gcc/testsuite/lib/target-supports.exp | 33 +++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 4ed7b25b9a4..06484330178 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -6259,9 +6259,12 @@ proc check_effective_target_powerpc_sqrt { } {
>  }
> 
>  return [check_no_compiler_messages powerpc_sqrt object {
> + void test (void)
> + {
>   #ifndef _ARCH_PPCSQ
>   #error _ARCH_PPCSQ is not defined
>   #endif
> + }
>  } {}]
>  }
> 
> @@ -6369,71 +6372,92 @@ proc check_effective_target_powerpc_p9modulo_ok { } {
>  # as provided by the test.
>  proc check_effective_target_has_arch_pwr5 { } {
>   return [check_no_compiler_messages_nocache arch_pwr5 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR5
>   #error does not have power5 support.
>   #else
>   /* "has power5 support" */
>   #endif
> + }
>   } [current_compiler_flags]]
>  }
> 
>  proc check_effective_target_has_arch_pwr6 { } {
>   return [check_no_compiler_messages_nocache arch_pwr6 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR6
>   #error does not have power6 support.
>   #else
>   /* "has power6 support" */
>   #endif
> + }
>   } [current_compiler_flags]]
>  }
> 
>  proc check_effective_target_has_arch_pwr7 { } {
>   return [check_no_compiler_messages_nocache arch_pwr7 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR7
>   #error does not have power7 support.
>   #else
>   /* "has power7 support" */
>   #endif
> + }
>   } [current_compiler_flags]]
>  }
> 
>  proc check_effective_target_has_arch_pwr8 { } {
>   return [check_no_compiler_messages_nocache arch_pwr8 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR8
>   #error does not have power8 support.
>   #else
>   /* "has power8 support" */
>   #endif
> + }
>   } [current_compiler_flags]]
>  }
> 
>  proc check_effective_target_has_arch_pwr9 { } {
>   return [check_no_compiler_messages_nocache arch_pwr9 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR9
>   #error does not have power9 support.
>   #else
>   /* "has power9 support" */
>   #endif
> + }
>   } [current_compiler_flags]]
>  }
> 
>  proc check_effective_target_has_arch_pwr10 { } {
>   return [check_no_compiler_messages_nocache arch_pwr10 assembly {
> + void test (void)
> + {
>   #ifndef _ARCH_PWR10
>   #error does not have power10 support.

Re: [PATCH] vect: Don't allow vect_emulated_vector_p type in vectorizable_call [PR106322]

2022-08-15 Thread Kewen.Lin via Gcc-patches
Hi Richi,

>>
>> Yes, but you just missed the RC for 12.2 so please wait until after GCC 12.2
>> is released and the branch is open again.  The testcase looks mightily
>> complicated
>> so fallout there might be well possible as well ;)  I suppose it wasn't 
>> possible
>> to craft a simple C testcase after the analysis?
> 
> Thanks for the hints!  Let me give it a try next week and get back to you 
> then.
> 

As you suggested, I constructed a C testcase which has been verified on
both i386 and ppc64 (it failed w/o the patch and passed w/ it).

Is this attached patch ok for trunk?  And also ok for all release branches
after a week or so (also after the freeze)?

BR,
Kewen
From 8b63b3025d99a38cc0400ebc8d882cbcaf8a22cc Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Mon, 15 Aug 2022 01:30:48 -0500
Subject: [PATCH] vect: Don't allow vect_emulated_vector_p type in
 vectorizable_call [PR106322]

As PR106322 shows, in some cases for some vector type whose
TYPE_MODE is a scalar integral mode instead of a vector mode,
it's possible to obtain wrong target support information when
querying with the scalar integral mode.  For example, for the
test case in PR106322, on ppc64 32bit vectorizer gets vector
type "vector(2) short unsigned int" for scalar type "short
unsigned int", its mode is SImode instead of V2HImode.  The
target support querying checks umul_highpart optab with SImode
and considers it's supported, then vectorizer further generates
.MULH IFN call for that vector type.  Unfortunately it's wrong
to use SImode support for that vector type multiply highpart
here.

This patch is to teach vectorizable_call analysis not to allow
vect_emulated_vector_p type for both vectype_in and vectype_out
as Richi suggested.

PR tree-optimization/106322

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_call): Don't allow
vect_emulated_vector_p type for both vectype_in and vectype_out.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr106322.c: New test.
* gcc.target/powerpc/pr106322.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr106322.c| 51 +
 gcc/testsuite/gcc.target/powerpc/pr106322.c | 50 
 gcc/tree-vect-stmts.cc  |  8 
 3 files changed, 109 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr106322.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106322.c

diff --git a/gcc/testsuite/gcc.target/i386/pr106322.c b/gcc/testsuite/gcc.target/i386/pr106322.c
new file mode 100644
index 000..31333c5fdcc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106322.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O2 -mtune=generic -march=i686" } */
+
+/* As PR106322, verify this can execute well (not abort).  */
+
+#define N 64
+typedef unsigned short int uh;
+typedef unsigned int uw;
+uh a[N];
+uh b[N];
+uh c[N];
+uh e[N];
+
+__attribute__ ((noipa)) void
+foo ()
+{
+  for (int i = 0; i < N; i++)
+c[i] = ((uw) b[i] * (uw) a[i]) >> 16;
+}
+
+__attribute__ ((optimize ("-O0"))) void
+init ()
+{
+  for (int i = 0; i < N; i++)
+{
+  a[i] = (uh) (0x7ABC - 0x5 * i);
+  b[i] = (uh) (0xEAB + 0xF * i);
+  e[i] = ((uw) b[i] * (uw) a[i]) >> 16;
+}
+}
+
+__attribute__ ((optimize ("-O0"))) void
+check ()
+{
+  for (int i = 0; i < N; i++)
+{
+  if (c[i] != e[i])
+   __builtin_abort ();
+}
+}
+
+int
+main ()
+{
+  init ();
+  foo ();
+  check ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106322.c b/gcc/testsuite/gcc.target/powerpc/pr106322.c
new file mode 100644
index 000..c05072d3416
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106322.c
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mdejagnu-cpu=power4" } */
+
+/* As PR106322, verify this can execute well (not abort).  */
+
+#define N 64
+typedef unsigned short int uh;
+typedef unsigned int uw;
+uh a[N];
+uh b[N];
+uh c[N];
+uh e[N];
+
+__attribute__ ((noipa)) void
+foo ()
+{
+  for (int i = 0; i < N; i++)
+c[i] = ((uw) b[i] * (uw) a[i]) >> 16;
+}
+
+__attribute__ ((optimize ("-O0"))) void
+init ()
+{
+  for (int i = 0; i < N; i++)
+{
+  a[i] = (uh) (0x7ABC - 0x5 * i);
+  b[i] = (uh) (0xEAB + 0xF * i);
+  e[i] = ((uw) b[i] * (uw) a[i]) >> 16;
+}
+}
+
+__attribute__ ((optimize ("-O0"))) void
+check ()
+{
+  for (int i = 0; i < N; i++)
+{
+  if (c[i] != e[i])
+   __builtin_abort ();
+}
+}
+
+int
+main ()
+{
+  init ();
+  foo ();
+  check ();
+
+  return 0;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f582d238984..c9dab217f05 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3423,6 +3423,14 @@ vectorizable_call (vec_info *vinfo,
   return false;
 }
 
+  if (vect_emulated_vector_p (vectype_in) || vect_emulated_vector_p (vectype_out))
+  {
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPT

Re: [PATCH] PR tree-optimization/64992: (B << 2) != 0 is B when B is Boolean.

2022-08-15 Thread Richard Biener via Gcc-patches
On Sat, Aug 13, 2022 at 12:35 AM Roger Sayle  wrote:
>
> Hi Richard,
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: 08 August 2022 12:49
> > Subject: Re: [PATCH] PR tree-optimization/64992: (B << 2) != 0 is B when B
> > is Boolean.
> >
> > On Mon, Aug 8, 2022 at 11:06 AM Roger Sayle
> >  wrote:
> > >
> > > This patch resolves both PR tree-optimization/64992 and PR
> > > tree-optimization/98956 which are missed optimization enhancement
> > > request, for which Andrew Pinski already has a proposed solution
> > > (related to a fix for PR tree-optimization/98954).  Yesterday, I
> > > proposed an alternate improved patch for PR98954, which although
> > > superior in most respects, alas didn't address this case [which
> > > doesn't include a BIT_AND_EXPR], hence this follow-up fix.
> > >
> > > For many functions, F(B), of a (zero-one) Boolean value B, the
> > > expression F(B) != 0 can often be simplified to just B.  Hence "(B *
> > > 5) != 0" is B, "-B != 0" is B, "bswap(B) != 0" is B, "(B >>r 3) != 0"
> > > is B.  These are all currently optimized by GCC, with the strange
> > > exception of left shifts by a constant (possibly due to the
> > > undefined/implementation defined behaviour when the shift constant is
> > > larger than the first operand's precision).
> > > This patch adds support for this particular case, when the shift
> > > constant is valid.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32},
> > > with no new failures.  Ok for mainline?
> >
> > +/* (X << C) != 0 can be simplified to X, when X is zero_one_valued_p.  */
> > +(simplify
> > +  (ne (lshift zero_one_valued_p@0 INTEGER_CST@1) integer_zerop@2)
> > +  (if (tree_fits_shwi_p (@1)
> > +   && tree_to_shwi (@1) > 0
> > +   && tree_to_shwi (@1) < TYPE_PRECISION (TREE_TYPE (@0)))
> > +(convert @0)))
> >
> > while we deliberately do not fold int << 34 since the result is undefined 
> > there is
> > IMHO no reason to not fold the above for any (even non-constant) shift 
> > value.
> > We have guards with TYPE_OVERFLOW_SANITIZED in some cases but I think
> > that's not appropriate here, there's one flag_sanitize check, maybe there's 
> > a
> > special bit for SHIFT overflow we can use.  Why is (X << 0) != 0 excempt in 
> > the
> > condition?
>
> In this case, I think it makes more sense to err on the side of caution, and
> avoid changing the observable behaviour of programs, even in cases where
> the behaviour is officially undefined.  For many targets, (1<<x) != 0 is
> indeed always true for any value of x, but a counterexample is x86's SSE
> shifts, where shifts beyond the size of the vector result in zero.  With
> STV, this means that (1<<258) != 0 has a different value if performed as a
> scalar vs. performed as a vector.  Worse, one may end up with examples
> where, based upon optimization level, we see different results as shift
> operands become propagated constants in some paths but remain variable
> shifts in others.
>
> Hence my personal preference is "first, do no harm" and limit this
> transformation to the safe 0 <= X < MODE_PRECISION (mode).
> Then given we'd like to avoid negative shifts, and therefore need to
> test against zero, my second preference is "0 < X" over "0 <= X".
> If the RTL contains a shift by zero, something strange is already going on
> (these should be caught optimized elsewhere), and it's better to leave
> these issues visible in the RTL, than paper over any "latent" mistakes.
>
> I fully I agree that this optimization could be more aggressive, but that
> isn't required to resolve this PR, and resolving PR64992 only to open
> the door for follow-up "unexpected behavior" PRs isn't great progress.
>
> Thoughts?  Ok for mainline?

OK - can you add a comment reflecting the above?  An improvement
might be to allow non-constant operands but test sth like

 expr_in_range (@1, 1, TYPE_PRECISION (...))

we already have expr_not_equal_to, expr_in_range could be done
with

value_range vr;
if (get_global_range_query ()->range_of_expr (vr, @0)
&& vr.kind () == VR_RANGE)
  {
wide_int wmin0 = vr.lower_bound ();
wide_int wmax0 = vr.upper_bound ();
...

there's a bunch of range_of_expr uses, I didn't check closely but
some might fit a new expr_in_range utility.

That can be done as followup (if you like).  Btw, I wonder if
bit-CCP would have not simplified the shift-by-constant compared
to zero as well?  If so, does that behave the same with respect
to 0 or out of bound shifts?

Richard.

>
> > > 2022-08-08  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > PR tree-optimization/64992
> > > PR tree-optimization/98956
> > > * match.pd (ne (lshift @0 @1) 0): Simplify (X << C) != 0 to X
> > > when X is zero_one_valued_p and the shift constant C is valid.
> > > (eq (lshift @0 @1) 0): Likewise, simplify (X << C) == 0 to !X

Re: [PATCH take #2] PR tree-optimization/71343: Optimize (X<<C)&(Y<<C) to (X&Y)<<C

2022-08-15 Thread Richard Biener via Gcc-patches
On Fri, Aug 12, 2022 at 11:45 PM Roger Sayle  wrote:
>
>
> Hi Richard,
> Many thanks for the review and useful suggestions.  I (think I) agree that
> handling non-canonical forms in value_numbering makes more sense,
> so this revised patch is just the first (non-controversial) part of the 
> original
> submission, that incorporates your observation that it doesn't need to
> be limited to (valid) constant shifts, and can be generalized to any
> shift, without introducing undefined behaviour that didn't exist before.
>
> This revised patch has been tested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check, both with and without
> --target_board=unix{-m32} with no new failures.  Ok for mainline?

OK.

Thanks,
Richard.

>
> 2022-08-12  Roger Sayle  
> Richard Biener  
>
> gcc/ChangeLog
> PR tree-optimization/71343
> * match.pd (op (lshift @0 @1) (lshift @2 @1)): Optimize the
> expression (X<<C) op (Y<<C) to (X op Y)<<C for binary logical operators.
> (op (rshift @0 @1) (rshift @2 @1)): Likewise, simplify (X>>C)^(Y>>C)
> to (X^Y)>>C for binary logical operators, AND, IOR and XOR.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/71343
> * gcc.dg/pr71343-1.c: New test case.
>
>
> Thanks,
> Roger
> --
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: 08 August 2022 12:42
> > To: Roger Sayle 
> > Cc: GCC Patches 
> > Subject: Re: [PATCH] PR tree-optimization/71343: Optimize (X<<C)&(Y<<C) to
> > (X&Y)<<C
> >
> > On Mon, Aug 8, 2022 at 10:07 AM Roger Sayle
> >  wrote:
> > >
> > >
> > > This patch resolves PR tree-optimization/71343, a missed-optimization
> > > enhancement request where GCC fails to see that (a<<2)+(b<<2) == a*4+b*4.
> > > This requires two related (sets of) optimizations to be added to match.pd.
> > >
> > > The first is that (X<<C) op (Y<<C) can be simplified to (X op Y)<<C
> > > for many binary operators, including AND, IOR, XOR, and (if overflow
> > > isn't an issue) PLUS and MINUS.  Likewise, the right shifts (both
> > > logical and arithmetic) and bit-wise logical operators can be
> > > simplified in a similar fashion.  These all reduce the number of
> > > GIMPLE binary operations from 3 to 2, by combining/eliminating a shift
> > > operation.
> > >
> > > The second optimization reflects that the middle-end doesn't impose a
> > > canonical form on multiplications by powers of two, vs. left shifts,
> > > instead leaving these operations as specified by the programmer unless
> > > there's a good reason to change them.  Hence, GIMPLE code may contain
> > > the expressions "X * 8" and "X << 3" even though these represent the
> > > same value/computation.  The tweak to match.pd is that comparison
> > > operations whose operands are equivalent non-canonical expressions can
> > > be taught their equivalence.  Hence "(X * 8) == (X << 3)" will always
> > > evaluate to true, and "(X<<2) > 4*X" will always evaluate to false.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32},
> > > with no new failures.  Ok for mainline?
> >
> > +/* Shifts by constants distribute over several binary operations,
> > +   hence (X << C) + (Y << C) can be simplified to (X + Y) << C.  */
> > +(for op (plus minus)
> > +  (simplify
> > +(op (lshift:s @0 INTEGER_CST@1) (lshift:s @2 INTEGER_CST@1))
> > +(if (INTEGRAL_TYPE_P (type)
> > +&& TYPE_OVERFLOW_WRAPS (type)
> > +&& !TYPE_SATURATING (type)
> > +&& tree_fits_shwi_p (@1)
> > +&& tree_to_shwi (@1) > 0
> > +&& tree_to_shwi (@1) < TYPE_PRECISION (type))
> >
> > I do wonder why we need to restrict this to shifts by constants?
> > Any out-of-bound shift was already there, no?
> >
> > +/* Some tree expressions are intentionally non-canonical.
> > +   We handle the comparison of the equivalent forms here.  */
> > +(for cmp (eq le ge)
> > +  (simplify
> > +(cmp:c (lshift @0 INTEGER_CST@1) (mult @0 integer_pow2p@2))
> > +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > +&& tree_fits_shwi_p (@1)
> > +&& tree_to_shwi (@1) > 0
> > +&& tree_to_shwi (@1) < TYPE_PRECISION  (TREE_TYPE (@0))
> > +&& wi::to_wide (@1) == wi::exact_log2 (wi::to_wide (@2)))
> > +  { constant_boolean_node (true, type); })))
> > +
> > +(for cmp (ne lt gt)
> > +  (simplify
> > +(cmp:c (lshift @0 INTEGER_CST@1) (mult @0 integer_pow2p@2))
> > +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > +&& tree_fits_shwi_p (@1)
> > +&& tree_to_shwi (@1) > 0
> > +&& tree_to_shwi (@1) < TYPE_PRECISION  (TREE_TYPE (@0))
> > +&& wi::to_wide (@1) == wi::exact_log2 (wi::to_wide (@2)))
> > +  { constant_boolean_node (false, type); })))
> >
> > hmm.  I wonder if it makes more sense to handle this in value-numbering.
> > tree-ssa-sccvn.cc:visit_nary_op handles some cases that are not exactly
> > canonicalization issues but the shift vs mult could be handled there by just
> > performing the alternate lookup.  That would also enable CSE and by mean

Re: [x86 PATCH] PR target/106577: force_reg may clobber operands during split.

2022-08-15 Thread Richard Biener via Gcc-patches
On Fri, Aug 12, 2022 at 10:41 PM Roger Sayle  wrote:
>
>
> This patch fixes PR target/106577, which is a recent ICE-on-valid regression
> caused by my introduction of a *testti_doubleword pre-reload splitter in
> i386.md.  During the split pass before reload, this converts the virtual
> *testti_doubleword into an *andti3_doubleword and *cmpti_doubleword,
> checking that any immediate operand is a valid "x86_64_hilo_general_operand"
> and placing it into a TImode register using force_reg if it isn't.
>
> The unexpected behaviour (that caught me out) is that calling force_reg
> may occasionally clobber the contents of the global operands array, or
> more accurately recog_data.operand[0], which means that by the time
> split_XXX calls gen_split_YYY the replacement insn's operands have been
> corrupted.
>
> It's difficult to tell who (if anyone) is at fault.  The re-entrant
> stack trace (for the attached PR) looks like:
>
> gen_split_203 (*testti_doubleword) calls
> force_reg calls
> emit_move_insn calls
> emit_move_insn_1 calls
> gen_movti calls
> ix86_expand_move calls
> ix86_convert_const_wide_int_to_broadcast calls
> ix86_vector_duplicate_value calls
> recog_memoized calls
> recog.
>
> By far the simplest and possibly correct fix is rather than attempt
> to push and pop recog_data, to simply (in pre-reload splits) save a
> copy of any operands that will be needed after force_reg, and use
> these copies afterwards.  Many pre-reload splitters avoid this issue
> using "[(clobber (const_int 0))]" and so avoid gen_split_YYY functions,
> but in our case we still need to save a copy of operands[0] (even if we
> call emit_insn or expand_* ourselves), so we might as well continue to
> use the conveniently generated gen_split.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures. Ok for mainline?

While this obviously fixes the issue seen, I wonder whether there's
more of recog_data that might be used after control flow returns
to recog_memoized and thus the fix would be there, not in any
backend pattern triggering the issue like this?

The "easiest" fix would maybe be to add an in_recog flag and
simply return FAIL from recog when recursing.  Not sure what
the effect on this particular pattern would be though?

The better(?) fix might be to push/pop recog_data in 'recog', but
of course given that recog_data is currently a global, leakage
into intermediate code can still happen.

That said - does anybody know of similar fixes for this issue in other
backends' patterns?

Thanks,
Richard.

>
>
> 2022-08-12  Roger Sayle  
>
> gcc/ChangeLog
> PR target/106577
> * config/i386/i386.md (*testti_doubleword): Preserve a copy of
> operands[0], and move initialization of operands[2] later, as the
> call to force_reg may clobber the contents of the operands array.
>
> gcc/testsuite/ChangeLog
> PR target/106577
> * gcc.target/i386/pr106577.c: New test case.
>
>
> Thanks,
> Roger
> --
>