date:20221128


On Mon, 28 Nov 2022 20:49:00 PST (-0800), jeffreya...@gmail.com wrote:



On 11/28/22 19:56, Palmer Dabbelt wrote:

On Mon, 28 Nov 2022 17:46:16 PST (-0800), juzhe.zh...@rivai.ai wrote:

Yeah, I personally want to support RVV intrinsics in GCC13. As RVV
intrinsic is going to release soon next week.


OK, that's fine with me -- I was leaning that way, and I think Jeff only
had a weak opposition.  Are there any more changes required outside the
RISC-V backend?  Those would be the most controversial and are already
late, but if it's only backend stuff at this point then I'm OK taking
the risk for a bit longer.

Jeff?

It's not ideal, but I can live with the bits going into gcc-13 as long
as they don't bleed out of the RISC-V port.


Ya, that's kind of what happens every release though (and not just in 
GCC, it's that way for everything).  Maybe for gcc-14 we can commit to 
taking the stage1/stage3 split seriously in RISC-V land?


It's early enough that nobody should be surprised, and even if we don't 
need to do it as per the GCC rules we're going to go crazy if we keep 
letting things go until the last minute like this.  I think the only 
real fallout we've had so far was the B stuff in binutils, but we've 
been exceedingly close to broken releases way too many times and it's 
going to bite us at some point.

Re: [PATCH] RISC-V: Support the ins "rol" with immediate operand





On 11/28/22 18:53, Feng Wang wrote:

on 2022-11-28 23:39  Jeff Law wrote:



On 11/27/22 19:14, Feng Wang wrote:

From: wangfeng 

There is no immediate-operand form of the "rol" instruction in the B extension,
so the immediate operand has to be loaded into a register first.
But we can convert it to "rori" or "roriw" instead, which saves the
instruction that loads the immediate.

Please refer to the following use cases:
unsigned long foo2(unsigned long rs1)
{
   return (rs1 << 10) | (rs1 >> 54);
}

The compiler result is:
li  a1,10
rol a0,a0,a1

With this patch, a single instruction is generated:
rori a0,a0,54

gcc/ChangeLog:

   * config/riscv/bitmanip.md: Add immediate_operand support in rotl 
RTL pattern

gcc/testsuite/ChangeLog:

   * gcc.target/riscv/zbb-rol-ror-04.c: New test.
   * gcc.target/riscv/zbb-rol-ror-05.c: New test.


So this arrived after stage1 close and I'm not aware of an existing BZ
around this issue, so I'd tend to think this should wait for stage1 to
re-open in the spring.


From a technical standpoint, would it be better to handle this in a more
generic way?   ie, when converting from gimple into RTL, if we want to
generate a rotate left by immediate and don't have a suitable insn, then
change it to a rotate right by an adjusted immediate.    This could
probably be done in optabs.cc::expand_binop.
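
To illustrate the rewrite suggested above: a rotate left by n on a W-bit
value is the same as a rotate right by (W - n) mod W, which is why "rol" by
10 can become "rori" by 54 on a 64-bit register. The following is only a
minimal C sketch of that identity. The helper names (rotl64, rotr64,
rotl_to_rotr_count) are made up for illustration and are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; the count is reduced modulo 64 so that n == 0 is well defined. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Rotate right, with the same treatment of the count. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Hypothetical helper: the right-rotate count that is equivalent to a
   left rotate by n on a value that is `width` bits wide. */
static unsigned rotl_to_rotr_count(unsigned n, unsigned width)
{
    assert(width != 0);
    return (width - n % width) % width;
}
```

For the example in the patch, rotl_to_rotr_count(10, 64) yields 54, and
rotl64(x, 10) equals rotr64(x, 54) for any x.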


We might need similar code in combine.cc or simplify-rtx.cc since some
rotate cases (or exposure of the constant) may not show up until later
in the RTL pipeline.


Anyway, doing this in a more generic way seems like it's worth
investigating.


jeff


Hi jeff,

Thanks for your reply. Currently, the rotate count is checked when
converting from gimple into RTL: if the count is bigger than mode_size/2,
the rotate direction is reversed. I think the purpose of this is to handle
rotates quickly. I will think about your advice and try to modify it in
the expand pass.

Yea, in the past there were targets where the cost of a shift or rotate 
was proportional to the number of bits shifted.  So for a rotate in 
particular it was advantageous to reverse the rotation if the count was 
more than mode_size/2.


I suspect such processors are a lot less common now than in the past. 
But we can probably utilize some of that code to suit our needs.
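
As a rough illustration of the direction-reversal heuristic described above
(this is a sketch, not the actual expand code; the struct and function
names are hypothetical), a rotate whose count exceeds half the mode width
can be replaced by a rotate in the opposite direction with the complementary
count:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a rotate by a constant count. */
struct rotate_op {
    bool left;       /* true for rotate left, false for rotate right */
    unsigned count;  /* constant rotate count, 0 <= count < width */
};

/* If the count is more than half the width, flip the direction and use
   the complementary count, so the rotate distance is never above width/2. */
static struct rotate_op shorten_rotate(struct rotate_op op, unsigned width)
{
    assert(width != 0 && op.count < width);
    if (op.count > width / 2) {
        op.left = !op.left;
        op.count = width - op.count;
    }
    return op;
}
```

With a 64-bit mode, a left rotate by 54 becomes a right rotate by 10, while
a left rotate by 10 is left alone. So this heuristic on its own would not
produce the "rori a0,a0,54" form from the patch; a check for whether the
target has a suitable insn would presumably still be needed.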


Jeff

Re: [PATCH] RISC-V: Add attributes for VSETVL PASS





On 11/28/22 19:56, Palmer Dabbelt wrote:

On Mon, 28 Nov 2022 17:46:16 PST (-0800), juzhe.zh...@rivai.ai wrote:
Yeah, I personally want to support RVV intrinsics in GCC13. As RVV 
intrinsic is going to release soon next week.


OK, that's fine with me -- I was leaning that way, and I think Jeff only 
had a weak opposition.  Are there any more changes required outside the 
RISC-V backend?  Those would be the most controversial and are already 
late, but if it's only backend stuff at this point then I'm OK taking 
the risk for a bit longer.


Jeff?
It's not ideal, but I can live with the bits going into gcc-13 as long 
as they don't bleed out of the RISC-V port.


Jeff

Re: [PATCH V2] Use subscalar mode to move struct block for parameter

2022-11-28 Thread Jiufu Guo via Gcc-patches



Hi Jeff,

Thanks a lot for your comments!

Jeff Law  writes:

> On 11/22/22 19:58, Jiufu Guo wrote:
>> Hi Jeff,
>>
>> Thanks a lot for your comments!
>>
>> Jeff Law  writes:
>>
>>> On 11/20/22 20:07, Jiufu Guo wrote:
 Jiufu Guo  writes:

> Hi,
>
> As mentioned in the previous version patch:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
> The suboptimal code is generated for "assigning from parameter" or
> "assigning to return value".
> This patch enhances the assignment from parameters like the below
> cases:
> /case1.c
> typedef struct SA {double a[3];long l; } A;
> A ret_arg (A a) {return a;}
> void st_arg (A a, A *p) {*p = a;}
>
> case2.c
> typedef struct SA {double a[3];} A;
> A ret_arg (A a) {return a;}
> void st_arg (A a, A *p) {*p = a;}
>
> For this patch, bootstrap and regtest pass on ppc64{,le}
> and x86_64.
cut...
 + : word_mode;
 +  int mode_size = GET_MODE_SIZE (mode).to_constant ();
 +  int size = INTVAL (expr_size (from));
 +
 +  /* Whether and how the parameter uses a submode depends on the size and
 +   position of the parameter.  A heuristic number is used here.  */
 +  int hurstc_num = 8;
>>> Where did this come from and what does it mean?
>> Sorry for not making this clear. We know that an aggregate arg may be
>> passed partially or totally in registers, as in assign_parm_adjust_entry_rtl.
>> For example, if a parameter has 12 words and the target/ABI only
>> allows 8 GPRs for arguments, then the parameter could use at most 8 regs,
>> with the remaining part on the stack.
>
> Right, but the number of registers is target dependent, so I don't see
> how using "8" or any number of that matter is correct here.
I understand.  And even for the same struct type, how many registers
are used to pass a parameter also depends on the size of the parameter
and on how many leading parameters there are.
So, as you said, "8" or any other fixed number is not always accurate.
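
To make the dependence on leading parameters concrete, here is a small
sketch under a deliberately simplified model: arguments are allocated to
word-sized registers in order, and whatever does not fit goes to the stack.
Real ABIs have more rules (alignment, register classes, all-or-nothing
passing), and the names below are hypothetical.

```c
#include <assert.h>

/* How an aggregate argument is split in this simplified model. */
struct arg_split {
    unsigned in_regs;   /* words passed in registers */
    unsigned on_stack;  /* words passed on the stack */
};

/* words:     size of the aggregate in words
   regs_used: registers already taken by leading parameters
   max_regs:  argument registers the (hypothetical) ABI provides */
static struct arg_split split_arg(unsigned words, unsigned regs_used,
                                  unsigned max_regs)
{
    assert(regs_used <= max_regs);
    unsigned avail = max_regs - regs_used;
    struct arg_split s;
    s.in_regs = words < avail ? words : avail;
    s.on_stack = words - s.in_regs;
    return s;
}
```

With 8 argument registers, a 12-word aggregate as the first parameter gets
8 words in registers and 4 on the stack, but the same aggregate after three
one-word parameters gets only 5 words in registers and 7 on the stack.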

The enhancement in this patch just makes the "block move" friendlier
to the following optimizations (cse/dse/dce...) by moving through a
sub-mode.  So I just selected an arbitrary number that is not too
large and is tolerable.
I also thought about querying the targets for the maximum number of
registers used for param/ret passing, but that may not be very accurate either.

Any suggestions are welcome, and thanks!

>
>
>>
>>>
>>> Note that BLKmode subword values passed in registers can be either
>>> right or left justified.  I think you also need to worry about
>>> endianness here.
>> Since the subword is used to move block(read from source mem and then
>> store to destination mem with register mode), and this would keep to use
>> the same endianness on reg like move_block_from_reg. So, the patch does
>> not check the endianness.
>
> Hmm, I wasn't clear enough here, particularly in using the endianness term.
> Don't you need to query the ABI to ensure that you're not changing
> left vs right justification for a partially in register argument?
> On the PA we have:
>
> /* Specify padding for the last element of a block move between registers
>    and memory.
>
>    The 64-bit runtime specifies that objects need to be left justified
>    (i.e., the normal justification for a big endian target).  The 32-bit
>    runtime specifies right justification for objects smaller than 64 bits.
>    We use a DImode register in the parallel for 5 to 7 byte structures
>    so that there is only one element.  This allows the object to be
>    correctly padded.  */
> #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \
>   targetm.calls.function_arg_padding ((MODE), (TYPE))

Yes. We should be careful when storing registers to the stack
(assign_parms/assign_parm_setup_xx/block/reg), or when loading return values.

For this patch, only simple cases are handled, like "D.xxx = param_1",
where the source and destination of the assignment are both in memory, i.e.
the DECL_RTL (of D.xx/param_xx) is MEM_P/BLK.
And to avoid complications, this patch only handles the case where
"(size % mode_size) == 0".

If I have misunderstood anything, please point it out, thanks.
And thanks for the comments!
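
The idea of moving a memory-to-memory block through a register-sized mode,
restricted to sizes that are a multiple of the mode size as mentioned
above, can be sketched in plain C as follows. This is only an analogy for
what the RTL expansion would do, not the patch itself; move_by_words is a
hypothetical name and a 64-bit word stands in for the sub-mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `size` bytes from src to dst one 64-bit word at a time.
   Returns false (copying nothing) when the size is not a non-zero
   multiple of the word size, so the caller can fall back to a generic
   block move. */
static bool move_by_words(void *dst, const void *src, size_t size)
{
    const size_t word = sizeof(uint64_t);

    assert(dst != NULL && src != NULL);
    if (size == 0 || size % word != 0)
        return false;

    for (size_t off = 0; off < size; off += word) {
        uint64_t tmp;
        /* Load one word into a scalar temporary, then store it. */
        memcpy(&tmp, (const char *)src + off, word);
        memcpy((char *)dst + off, &tmp, word);
    }
    return true;
}
```

Because every piece goes through a scalar temporary, later passes see
individual loads and stores rather than one opaque block copy, which is the
property the patch description relies on for cse/dse/dce.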


BR,
Jeff (Jiufu)

>
>
> Jeff

Re: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

On Mon, 28 Nov 2022 19:07:24 PST (-0800), juzhe.zh...@rivai.ai wrote:

For RVV intrinsic support, there are no changes outside the RISC-V backend,
since we are not doing autovectorization support for now.

OK, I'm fine with that.  Sounds like Kito is too?

I will postpone autovectorization until GCC14 is open.

We can still review that stuff and keep it on a branch, if that's easier
on your end.

juzhe.zh...@rivai.ai

From: Palmer Dabbelt

Date: 2022-11-29 10:56
To: juzhe.zhong
CC: Kito Cheng; jeffreyalaw; gcc-patches
Subject: Re: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS
On Mon, 28 Nov 2022 17:46:16 PST (-0800), juzhe.zh...@rivai.ai wrote:
Yeah, I personally want to support RVV intrinsics in GCC13. 
As RVV intrinsic is going to release soon next week.

OK, that's fine with me -- I was leaning that way, and I think Jeff only 
had a weak opposition.  Are there any more changes required outside the 
RISC-V backend?  Those would be the most controversial and are already 
late, but if it's only backend stuff at this point then I'm OK taking 
the risk for a bit longer.

Jeff?

juzhe.zh...@rivai.ai

From: Kito Cheng

Date: 2022-11-29 09:38
To: Jeff Law
CC: 钟居哲; gcc-patches; palmer
Subject: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS
Actually, I am strongly support those stuff keep merge to trunk until February, 
my goal is intrinsic support for vector, but not including any vectorization 
like SLP or Loop vectorization, the most critical part is the vsetvli which is 
the mode switching, and its almost done.

Those part is kind of infrastructure for future development (vectorization), so 
I want intrinsic support could merge at GCC 13.

and we've included few intrinsic support now, stop there is kind of awkward.

Jeff Law via Gcc-patches  於 2022年11月29日 週二 07:54 寫道：

On 11/28/22 15:52, 钟居哲 wrote:

 >> I'm tempted to push this into the next stage1 given its arrival after
stage1 close, but if the wider RISC-V maintainers want to see it move
forward, I don't object strongly.

Ok, let's save these patches and merge them when GCC14 stage1 is open.
Would you mind telling me when will stage 1 be open?
Typically it's April.  As was noted elsewhere, feel free to keep 
submitting patches in this space and you can certainly create a branch 
where y'all can put patches to make it easier to collaborate and 
ultimately merge with the trunk once stage1 is open again.

 >> I'm curious about the model you're using.  Is it going to be something
similar to mode switching?  That's the first mental model that comes to
mind.  Essentially we determine the VL needed for every chunk of code,
then we do an LCM like algorithm to find the optimal placement points
for VL sets to minimize the number of VL sets across all the paths
through the CFG.  Never in a million years would I have expected we'd be
considering reusing that code.

Yes, I implemented VSETVL PASS with LCM algorithm and RTL_SSA framework.
Yea,  layering on top of RTL-SSA is probably better than the existing 
mode-switching which is LCM without SSA.

Actually, me && kito have spent a month on VSETVL PASS and we have
made a progress. We have tested it with a lot of testcases, turns out
our implementation of VSETVL PASS in GCC has much better codegen than the
VSETVL implemented in LLVM side in many different situations because of LCM.
I am working on cleaning up the codes and hopefully you will see it soon in
the next patch.
Good to hear.  I argued pretty loudly in the late 90s that LCM was the 
right framework for this problem.  We didn't have rtl-ssa, but we did 
have a pure RTL LCM module that Joern and Andrew were able to re-use to 
implement sh's mode switching.

I just never thought we'd see another processor where it'd be useful.

Jeff

Re: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

2022-11-28 Thread juzhe.zh...@rivai.ai

For RVV intrinsic support, there are no changes outside the RISC-V backend,
since we are not doing autovectorization support for now.

I will postpone autovectorization until GCC14 is open.


juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2022-11-29 10:56
To: juzhe.zhong
CC: Kito Cheng; jeffreyalaw; gcc-patches
Subject: Re: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS
On Mon, 28 Nov 2022 17:46:16 PST (-0800), juzhe.zh...@rivai.ai wrote:
> Yeah, I personally want to support RVV intrinsics in GCC13. 
> As RVV intrinsic is going to release soon next week.
 
OK, that's fine with me -- I was leaning that way, and I think Jeff only 
had a weak opposition.  Are there any more changes required outside the 
RISC-V backend?  Those would be the most controversial and are already 
late, but if it's only backend stuff at this point then I'm OK taking 
the risk for a bit longer.
 
Jeff?
 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Kito Cheng
> Date: 2022-11-29 09:38
> To: Jeff Law
> CC: 钟居哲; gcc-patches; palmer
> Subject: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS
> Actually, I am strongly support those stuff keep merge to trunk until 
> February, my goal is intrinsic support for vector, but not including any 
> vectorization like SLP or Loop vectorization, the most critical part is the 
> vsetvli which is the mode switching, and its almost done.
> 
> Those part is kind of infrastructure for future development (vectorization), 
> so I want intrinsic support could merge at GCC 13.
> 
> 
> and we've included few intrinsic support now, stop there is kind of awkward.
> 
> Jeff Law via Gcc-patches  於 2022年11月29日 週二 07:54 寫道：
> 
> 
> On 11/28/22 15:52, 钟居哲 wrote:
>>  >> I'm tempted to push this into the next stage1 given its arrival after
stage1 close, but if the wider RISC-V maintainers want to see it move
forward, I don't object strongly.
>> 
>> Ok, let's save these patches and merge them when GCC14 stage1 is open.
>> Would you mind telling me when will stage 1 be open?
> Typically it's April.  As was noted elsewhere, feel free to keep 
> submitting patches in this space and you can certainly create a branch 
> where y'all can put patches to make it easier to collaborate and 
> ultimately merge with the trunk once stage1 is open again.
> 
>> 
>>  >> I'm curious about the model you're using.  Is it going to be something
similar to mode switching?  That's the first mental model that comes to
mind.  Essentially we determine the VL needed for every chunk of code,
then we do an LCM like algorithm to find the optimal placement points
for VL sets to minimize the number of VL sets across all the paths
through the CFG.  Never in a million years would I have expected we'd be
considering reusing that code.
>> 
>> Yes, I implemented VSETVL PASS with LCM algorithm and RTL_SSA framework.
> Yea,  layering on top of RTL-SSA is probably better than the existing 
> mode-switching which is LCM without SSA.
> 
>> Actually, me && kito have spent a month on VSETVL PASS and we have
>> made a progress. We have tested it with a lot of testcases, turns out 
>> our implementation
>> of VSETVL PASS in GCC has much better codegen than the VSETVL implemented
>> in LLVM side in many different situations because of LCM. I am working 
>> on cleaning up the codes
>> and hopefully you will see it soon in the next patch.
> Good to hear.  I argued pretty loudly in the late 90s that LCM was the 
> right framework for this problem.  We didn't have rtl-ssa, but we did 
> have a pure RTL LCM module that Joern and Andrew were able to re-use to 
> implement sh's mode switching.
> 
> I just never thought we'd see another processor where it'd be useful.
> 
> Jeff

Re: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

On Mon, 28 Nov 2022 17:46:16 PST (-0800), juzhe.zh...@rivai.ai wrote:
Yeah, I personally want to support RVV intrinsics in GCC13. 
As RVV intrinsic is going to release soon next week.

OK, that's fine with me -- I was leaning that way, and I think Jeff only 
had a weak opposition.  Are there any more changes required outside the 
RISC-V backend?  Those would be the most controversial and are already 
late, but if it's only backend stuff at this point then I'm OK taking 
the risk for a bit longer.

Jeff?

juzhe.zh...@rivai.ai

From: Kito Cheng

Date: 2022-11-29 09:38
To: Jeff Law
CC: 钟居哲; gcc-patches; palmer
Subject: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS
Actually, I am strongly support those stuff keep merge to trunk until February, 
my goal is intrinsic support for vector, but not including any vectorization 
like SLP or Loop vectorization, the most critical part is the vsetvli which is 
the mode switching, and its almost done.

Those part is kind of infrastructure for future development (vectorization), so 
I want intrinsic support could merge at GCC 13.

and we've included few intrinsic support now, stop there is kind of awkward.

Jeff Law via Gcc-patches  於 2022年11月29日 週二 07:54 寫道：

On 11/28/22 15:52, 钟居哲 wrote:

 >> I'm tempted to push this into the next stage1 given its arrival after
stage1 close, but if the wider RISC-V maintainers want to see it move
forward, I don't object strongly.

Ok, let's save these patches and merge them when GCC14 stage1 is open.
Would you mind telling me when will stage 1 be open?
Typically it's April.  As was noted elsewhere, feel free to keep 
submitting patches in this space and you can certainly create a branch 
where y'all can put patches to make it easier to collaborate and 
ultimately merge with the trunk once stage1 is open again.

 >> I'm curious about the model you're using.  Is it going to be something
similar to mode switching?  That's the first mental model that comes to
mind.  Essentially we determine the VL needed for every chunk of code,
then we do an LCM like algorithm to find the optimal placement points
for VL sets to minimize the number of VL sets across all the paths
through the CFG.  Never in a million years would I have expected we'd be
considering reusing that code.

Yes, I implemented VSETVL PASS with LCM algorithm and RTL_SSA framework.
Yea,  layering on top of RTL-SSA is probably better than the existing 
mode-switching which is LCM without SSA.

Actually, me && kito have spent a month on VSETVL PASS and we have
made a progress. We have tested it with a lot of testcases, turns out
our implementation of VSETVL PASS in GCC has much better codegen than the
VSETVL implemented in LLVM side in many different situations because of LCM.
I am working on cleaning up the codes and hopefully you will see it soon in
the next patch.
Good to hear.  I argued pretty loudly in the late 90s that LCM was the 
right framework for this problem.  We didn't have rtl-ssa, but we did 
have a pure RTL LCM module that Joern and Andrew were able to re-use to 
implement sh's mode switching.

I just never thought we'd see another processor where it'd be useful.

Jeff

Re: [PATCH 0/2] Support HWASAN with Intel LAM

2022-11-28 Thread Hongtao Liu via Gcc-patches

On Mon, Nov 28, 2022 at 10:40 PM Martin Liška  wrote:
>
> On 11/11/22 02:26, liuhongt via Gcc-patches wrote:
> > 2 years ago, ARM folks added HWASAN[1] support to GCC[2] and introduced several
> > target hooks (many thanks for their work) so other backends can do similar
> > things if they have a similar feature.
> > Intel LAM (Linear Address Masking)[3, Chapter 14] supports a similar feature:
> > the upper bits of pointers can be used as metadata. LAM supports two modes:
> >    LAM_U48: bits 48-62 can be used as metadata
> >    LAM_U57: bits 57-62 can be used as metadata.
> >
> > These 2 patches mainly support those target hooks, but HWASAN is not really
> > enabled until the final decision for the LAM kernel interface, which may take
> > quite a long time. We have verified our patches with a "fake" interface
> > locally[4], and decided to push the backend patches to GCC13 to make other
> > HWASAN developers' work easier.
>
> Hello.
>
> A few random comments I noticed:
>
> 1) please document the new target -mlam in extend.texi
I will.
> 2) the description speaks about bits [48-62] or [57-62], can you explain why the
> patch contains:
>
The kernel will use bit 63 for special purposes. Here we want to
extract the tag by shifting the pointer right by 57 bits, so we need to
manually mask off bit 63.
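
A small sketch of why the mask is needed, assuming the LAM_U57 layout
described in the cover letter (tag in bits 57-62, bit 63 reserved for the
kernel). The macro and function names are hypothetical and are not the ones
used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define LAM_U57_TAG_SHIFT 57     /* lowest tag bit under LAM_U57 */
#define LAM_U57_TAG_MASK  0x3fu  /* six tag bits: 57..62 */

/* Shifting right by 57 leaves bits 57..63 in the low seven bits, so
   bit 63 has to be masked off to get only the six tag bits. */
static unsigned lam_u57_tag(uint64_t ptr)
{
    return (unsigned)(ptr >> LAM_U57_TAG_SHIFT) & LAM_U57_TAG_MASK;
}

/* Replace the tag field of ptr, leaving bit 63 and the address bits alone. */
static uint64_t lam_u57_set_tag(uint64_t ptr, unsigned tag)
{
    assert(tag <= LAM_U57_TAG_MASK);
    uint64_t field = (uint64_t)LAM_U57_TAG_MASK << LAM_U57_TAG_SHIFT;
    return (ptr & ~field) | ((uint64_t)tag << LAM_U57_TAG_SHIFT);
}
```

If bit 63 happens to be set, the plain shift of a pointer tagged with 0x2a
gives 0x6a, while the masked version still gives 0x2a.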
> +  /* Mask off bit63 when LAM_U57.  */
> +  if (ix86_lam_type == lam_u57)
> ?
>
> 3) Shouldn't the -mlam option emit a GNU_PROPERTY_X86_FEATURE_1_LAM_U57 or
> GNU_PROPERTY_X86_FEATURE_1_LAM_U48
> .gnu.property note?
>
> 4) Can you please explain Florian's comment here:
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13#note_1181396487
>
> Thanks,
> Martin
>
> >
> > [1] https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
> > [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557857.html
> > [3] 
> > https://www.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > [4] https://gitlab.com/x86-gcc/gcc/-/tree/users/intel/lam/master
> >
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > liuhongt (2):
> >Implement hwasan target_hook.
> >Enable hwasan for x86-64.
> >
> >   gcc/config/i386/i386-expand.cc  |  12 
> >   gcc/config/i386/i386-options.cc |   3 +
> >   gcc/config/i386/i386-opts.h |   6 ++
> >   gcc/config/i386/i386-protos.h   |   2 +
> >   gcc/config/i386/i386.cc | 123 
> >   gcc/config/i386/i386.opt|  16 +
> >   libsanitizer/configure.tgt  |   1 +
> >   7 files changed, 163 insertions(+)
> >
>


-- 
BR,
Hongtao

Re: [PATCH 0/2] Support HWASAN with Intel LAM

2022-11-28 Thread H.J. Lu via Gcc-patches

On Mon, Nov 28, 2022 at 6:40 AM Martin Liška  wrote:
>
> On 11/11/22 02:26, liuhongt via Gcc-patches wrote:
> > 2 years ago, ARM folks added HWASAN[1] support to GCC[2] and introduced several
> > target hooks (many thanks for their work) so other backends can do similar
> > things if they have a similar feature.
> > Intel LAM (Linear Address Masking)[3, Chapter 14] supports a similar feature:
> > the upper bits of pointers can be used as metadata. LAM supports two modes:
> >    LAM_U48: bits 48-62 can be used as metadata
> >    LAM_U57: bits 57-62 can be used as metadata.
> >
> > These 2 patches mainly support those target hooks, but HWASAN is not really
> > enabled until the final decision for the LAM kernel interface, which may take
> > quite a long time. We have verified our patches with a "fake" interface
> > locally[4], and decided to push the backend patches to GCC13 to make other
> > HWASAN developers' work easier.
>
> Hello.
>
> A few random comments I noticed:
>
> 1) please document the new target -mlam in extend.texi
> 2) the description speaks about bits [48-62] or [57-62], can you explain why the
> patch contains:
>
> +  /* Mask off bit63 when LAM_U57.  */
> +  if (ix86_lam_type == lam_u57)
> ?
>
> 3) Shouldn't the -mlam option emit a GNU_PROPERTY_X86_FEATURE_1_LAM_U57 or
> GNU_PROPERTY_X86_FEATURE_1_LAM_U48
> .gnu.property note?

Since there are no clear usages for these LAM bits, we can
leave them out for now.

> 4) Can you please explain Florian's comment here:
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13#note_1181396487
>
> Thanks,
> Martin
>
> >
> > [1] https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
> > [2] https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557857.html
> > [3] 
> > https://www.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> > [4] https://gitlab.com/x86-gcc/gcc/-/tree/users/intel/lam/master
> >
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > liuhongt (2):
> >Implement hwasan target_hook.
> >Enable hwasan for x86-64.
> >
> >   gcc/config/i386/i386-expand.cc  |  12 
> >   gcc/config/i386/i386-options.cc |   3 +
> >   gcc/config/i386/i386-opts.h |   6 ++
> >   gcc/config/i386/i386-protos.h   |   2 +
> >   gcc/config/i386/i386.cc | 123 
> >   gcc/config/i386/i386.opt|  16 +
> >   libsanitizer/configure.tgt  |   1 +
> >   7 files changed, 163 insertions(+)
> >
>


-- 
H.J.

Re: Re: [PATCH] RISC-V: Support the ins "rol" with immediate operand

2022-11-28 Thread Feng Wang

on 2022-11-28 23:39  Jeff Law wrote:
>
>
>On 11/27/22 19:14, Feng Wang wrote:
>> From: wangfeng 
>>
>> There is no immediate-operand form of the "rol" instruction in the B extension,
>> so the immediate operand has to be loaded into a register first.
>> But we can convert it to "rori" or "roriw" instead, which saves the
>> instruction that loads the immediate.
>>
>> Please refer to the following use cases:
>> unsigned long foo2(unsigned long rs1)
>> {
>>  return (rs1 << 10) | (rs1 >> 54);
>> }
>>
>> The compiler result is:
>> li   a1,10
>> rol  a0,a0,a1
>>
>> This patch will generate one ins
>> rori a0,a0,54
>>
>> gcc/ChangeLog:
>>
>>  * config/riscv/bitmanip.md: Add immediate_operand support in rotl 
>>RTL pattern
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/zbb-rol-ror-04.c: New test.
>>  * gcc.target/riscv/zbb-rol-ror-05.c: New test.
>
>So this arrived after stage1 close and I'm not aware of an existing BZ
>around this issue, so I'd tend to think this should wait for stage1 to
>re-open in the spring.
>
>
> From a technical standpoint, would it be better to handle this in a more
>generic way?   ie, when converting from gimple into RTL, if we want to
>generate a rotate left by immediate and don't have a suitable insn, then
>change it to a rotate right by an adjusted immediate.    This could
>probably be done in optabs.cc::expand_binop.
>
>
>We might need similar code in combine.cc or simplify-rtx.cc since some
>rotate cases (or exposure of the constant) may not show up until later
>in the RTL pipeline.
>
>
>Anyway, doing this in a more generic way seems like it's worth
>investigating.
>
>
>jeff
> 
Hi jeff,

Thanks for your reply. Currently, the rotate count is checked when
converting from gimple into RTL: if the count is bigger than mode_size/2,
the rotate direction is reversed. I think the purpose of this is to handle
rotates quickly. I will think about your advice and try to modify it in
the expand pass.

Wang Feng
Best regards

Re: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

2022-11-28 Thread juzhe.zh...@rivai.ai

Yeah, I personally want to support RVV intrinsics in GCC13,
as the RVV intrinsics are going to be released next week.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2022-11-29 09:38
To: Jeff Law
CC: 钟居哲; gcc-patches; palmer
Subject: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS
Actually, I am strongly support those stuff keep merge to trunk until February, 
my goal is intrinsic support for vector, but not including any vectorization 
like SLP or Loop vectorization, the most critical part is the vsetvli which is 
the mode switching, and its almost done.

Those part is kind of infrastructure for future development (vectorization), so 
I want intrinsic support could merge at GCC 13.


and we've included few intrinsic support now, stop there is kind of awkward.

Jeff Law via Gcc-patches  於 2022年11月29日 週二 07:54 寫道：


On 11/28/22 15:52, 钟居哲 wrote:
>  >> I'm tempted to push this into the next stage1 given its arrival after
>>>stage1 close, but if the wider RISC-V maintainers want to see it move
>>>forward, I don't object strongly.
> 
> Ok, let's save these patches and merge them when GCC14 stage1 is open.
> Would you mind telling me when will stage 1 be open?
Typically it's April.  As was noted elsewhere, feel free to keep 
submitting patches in this space and you can certainly create a branch 
where y'all can put patches to make it easier to collaborate and 
ultimately merge with the trunk once stage1 is open again.

> 
>  >> I'm curious about the model you're using.  Is it going to be something
>>>similar to mode switching?  That's the first mental model that comes to
>>>mind.  Essentially we determine the VL needed for every chunk of code,
>>>then we do an LCM like algorithm to find the optimal placement points
>>>for VL sets to minimize the number of VL sets across all the paths
>>>through the CFG.  Never in a million years would I have expected we'd be
>>>considering reusing that code.
> 
> Yes, I implemented VSETVL PASS with LCM algorithm and RTL_SSA framework.
Yea,  layering on top of RTL-SSA is probably better than the existing 
mode-switching which is LCM without SSA.

> Actually, me && kito have spent a month on VSETVL PASS and we have
> made a progress. We have tested it with a lot of testcases, turns out 
> our implementation
> of VSETVL PASS in GCC has much better codegen than the VSETVL implemented
> in LLVM side in many different situations because of LCM. I am working 
> on cleaning up the codes
> and hopefully you will see it soon in the next patch.
Good to hear.  I argued pretty loudly in the late 90s that LCM was the 
right framework for this problem.  We didn't have rtl-ssa, but we did 
have a pure RTL LCM module that Joern and Andrew were able to re-use to 
implement sh's mode switching.

I just never thought we'd see another processor where it'd be useful.

Jeff

Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

2022-11-28 Thread Kito Cheng via Gcc-patches

Actually, I strongly support continuing to merge this stuff to trunk until
February. My goal is intrinsic support for vector, not including any
vectorization like SLP or loop vectorization. The most critical part is the
vsetvli handling, which is the mode switching, and it's almost done.

Those parts are infrastructure for future development
(vectorization), so I would like intrinsic support to be merged in GCC 13.


Also, we've already included a few intrinsics, so stopping there would be kind of awkward.

Jeff Law via Gcc-patches  於 2022年11月29日 週二 07:54
寫道：

>
>
> On 11/28/22 15:52, 钟居哲 wrote:
> >  >> I'm tempted to push this into the next stage1 given its arrival after
> >>>stage1 close, but if the wider RISC-V maintainers want to see it move
> >>>forward, I don't object strongly.
> >
> > Ok, let's save these patches and merge them when GCC14 stage1 is open.
> > Would you mind telling me when will stage 1 be open?
> Typically it's April.  As was noted elsewhere, feel free to keep
> submitting patches in this space and you can certainly create a branch
> where y'all can put patches to make it easier to collaborate and
> ultimately merge with the trunk once stage1 is open again.
>
> >
> >  >> I'm curious about the model you're using.  Is it going to be
> something
> >>>similar to mode switching?  That's the first mental model that comes to
> >>>mind.  Essentially we determine the VL needed for every chunk of code,
> >>>then we do an LCM like algorithm to find the optimal placement points
> >>>for VL sets to minimize the number of VL sets across all the paths
> >>>through the CFG.  Never in a million years would I have expected we'd be
> >>>considering reusing that code.
> >
> > Yes, I implemented VSETVL PASS with LCM algorithm and RTL_SSA framework.
> Yea,  layering on top of RTL-SSA is probably better than the existing
> mode-switching which is LCM without SSA.
>
> > Actually, me && kito have spent a month on VSETVL PASS and we have
> > made a progress. We have tested it with a lot of testcases, turns out
> > our implementation
> > of VSETVL PASS in GCC has much better codegen than the VSETVL implemented
> > in LLVM side in many different situations because of LCM. I am working
> > on cleaning up the codes
> > and hopefully you will see it soon in the next patch.
> Good to hear.  I argued pretty loudly in the late 90s that LCM was the
> right framework for this problem.  We didn't have rtl-ssa, but we did
> have a pure RTL LCM module that Joern and Andrew were able to re-use to
> implement sh's mode switching.
>
> I just never thought we'd see another processor where it'd be useful.
>
> Jeff
>

[PATCH] RISC-V: Remove tail && mask policy operand for vmclr, vmset, vmld, vmst

2022-11-28 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Sorry for resending this patch; I found that I had missed committing a file.
1. vector.md: remove the tail and mask policy operands for mask mode
   operations, since they are not needed according to the RVV ISA.
2. riscv-v.cc: adapt emit_pred_op for the mask mode predicated mov, since all
   RVV modes (vector integer, vector float and vector bool modes) use the
   emit_pred_op function. For vector integer and vector float modes, the
   instructions are vle.v/vse.v, which need the tail and mask policy.
   For vector bool modes, however, the instructions are vlm/vsm, which do not
   need the tail and mask policy. So we add a condition that adds the tail and
   mask policy operands during expand only if the mode is not a vector bool
   mode.

This patch cleans up the code and makes it consistent with the RVV ISA.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_pred_op): Adapt for mask mode.
* config/riscv/vector.md: Remove tail && mask policy operand for mask
mode mov.

---
 gcc/config/riscv/riscv-v.cc | 3 ++-
 gcc/config/riscv/vector.md  | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index d54795694f1..4992ff2470c 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -136,7 +136,8 @@ emit_pred_op (unsigned icode, rtx dest, rtx src, 
machine_mode mask_mode)
   rtx vlmax = emit_vlmax_vsetvl (mode);
   e.add_input_operand (vlmax, Pmode);
 
-  e.add_policy_operand (TAIL_AGNOSTIC, MASK_AGNOSTIC);
+  if (GET_MODE_CLASS (mode) != MODE_VECTOR_BOOL)
+e.add_policy_operand (TAIL_AGNOSTIC, MASK_AGNOSTIC);
 
   e.expand ((enum insn_code) icode, MEM_P (dest) || MEM_P (src));
 }
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 3bb87232d3f..38da2f7f095 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -593,8 +593,6 @@
  (unspec:VB
[(match_operand:VB 1 "vector_mask_operand"   "Wc1, Wc1, Wc1, Wc1, 
Wc1")
 (match_operand 4 "vector_length_operand"" rK,  rK,  rK,  rK,  
rK")
-(match_operand 5 "const_int_operand""  i,   i,   i,   i,   
i")
-(match_operand 6 "const_int_operand""  i,   i,   i,   i,   
i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (match_operand:VB 3 "vector_move_operand"  "  m,  vr,  vr, Wc0, 
Wc1")
-- 
2.36.1
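
The condition described in item 2 of the commit message can be sketched outside of GCC as follows. This is only a standalone model, not the code in riscv-v.cc: ModeClass, build_pred_operands and the operand labels are hypothetical names chosen for illustration, and the operand order is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for GCC's mode classes; not the real machine_mode API.
enum class ModeClass { VectorInt, VectorFloat, VectorBool };

// Build the operand list for a predicated move as a list of labels.
// Assumption: operands are modelled as plain strings for illustration.
std::vector<std::string> build_pred_operands(ModeClass cls) {
    std::vector<std::string> ops = {"dest", "mask", "src", "vl"};
    // vle.v/vse.v take a tail and a mask policy; vlm/vsm do not,
    // so vector bool (mask) modes skip the two policy operands.
    if (cls != ModeClass::VectorBool) {
        ops.push_back("tail_policy");
        ops.push_back("mask_policy");
    }
    return ops;
}
```

With this model, a vector bool mode yields four operands and the integer and float modes yield six, which mirrors the two match_operand lines dropped from the mask mode pattern in vector.md.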

Re: Ping [PATCH] Change the behavior of predicate check failure on cbranchcc4 operand0 in prepare_cmp_insn

2022-11-28 Thread HAO CHEN GUI via Gcc-patches

Hi Richard,

On 2022/11/29 2:46, Richard Biener wrote:
> Anyhow - my question still stands - what's the fallback for the callers
> that do not check for failure?  How are we sure we're not running into
> these when relaxing the requirement that a MODE_CC prepare_cmp_insn
> must not fail?

I examined the code and found that the current callers should be fine with
prepare_cmp_insn returning NULL_RTX for MODE_CC processing. prepare_cmp_insn
is called by the following callers:

1 gen_cond_trap, which doesn't use MODE_CC
2 prepare_cmp_insn itself, where the call comes after the MODE_CC processing,
so it never hits MODE_CC
3 emit_cmp_and_jump_insns, which doesn't use MODE_CC
4 emit_conditional_move, which checks whether the output is null
5 emit_conditional_add, which checks whether the output is null

I'm not sure if I missed something. Looking forward to your advice.

Thanks a lot
Gui Haochen
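
The caller pattern in items 4 and 5 above (test the result before using it) can be sketched as below. This is a hedged, standalone model: Mode, Comparison, prepare_cmp and try_emit_conditional_move are hypothetical names, and a null pointer stands in for NULL_RTX. It does not reproduce the real optabs.cc code.

```cpp
// Hypothetical model of a "prepare" step that may fail for condition-code modes.
enum class Mode { Int, CC };

struct Comparison { int code; };

// Returns nullptr where the real function would hand back NULL_RTX.
const Comparison* prepare_cmp(Mode mode, bool operand_ok) {
    static const Comparison ok = {1};
    if (mode == Mode::CC && !operand_ok)
        return nullptr;  // predicate check failed: report it, do not abort
    return &ok;
}

// A caller in the style of items 4 and 5: it tests the result before use.
bool try_emit_conditional_move(Mode mode, bool operand_ok) {
    const Comparison* cmp = prepare_cmp(mode, operand_ok);
    if (cmp == nullptr)
        return false;    // caller gives up and lets another strategy run
    return cmp->code == 1;
}
```

A caller written this way is unaffected when the prepare step starts returning null instead of asserting. Callers that never pass a condition-code mode (items 1 and 3) never reach the failing branch at all.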

[PATCH v2 1/2] gcc: xtensa: allow dynamic configuration

2022-11-28 Thread Max Filippov via Gcc-patches

Import include/xtensa-dynconfig.h that defines XCHAL_* macros as fields
of a structure returned from the xtensa_get_config_v function call.
Define that structure and fill it with default parameter values
specified in the include/xtensa-config.h.
Define reusable function xtensa_load_config that tries to load
configuration and return an address of an exported object from it.
Define the function xtensa_get_config_v1 that uses xtensa_load_config
to get structure xtensa_config_v1, either dynamically configured or the
default.

Provide essential XCHAL_* configuration parameters as __XCHAL_* built-in
macros. This way it will be possible to use them in libgcc and libc
without need to patch libgcc or libc source for the specific xtensa core
configuration.

gcc/
* config.gcc (xtensa*-*-*): Add xtensa-dynconfig.o to extra_objs.
* config/xtensa/t-xtensa (TM_H): Add xtensa-dynconfig.h.
(xtensa-dynconfig.o): New rule.
* config/xtensa/xtensa-dynconfig.c: New file.
* config/xtensa/xtensa-protos.h (xtensa_get_config_strings): New
declaration.
* config/xtensa/xtensa.h (xtensa-config.h): Replace #include
with xtensa-dynconfig.h
(XCHAL_HAVE_MUL32_HIGH, XCHAL_HAVE_RELEASE_SYNC,
 XCHAL_HAVE_S32C1I, XCHAL_HAVE_THREADPTR,
 XCHAL_HAVE_FP_POSTINC): Drop definitions.
(TARGET_DIV32): Replace with __XCHAL_HAVE_DIV32.
(TARGET_CPU_CPP_BUILTINS): Add new 'builtin' variable and loop
through string array returned by the xtensa_get_config_strings
function call.

include/
* xtensa-dynconfig.h: New file.
---
 gcc/config.gcc   |   1 +
 gcc/config/xtensa/t-xtensa   |   8 +-
 gcc/config/xtensa/xtensa-dynconfig.c | 170 +++
 gcc/config/xtensa/xtensa-protos.h|   1 +
 gcc/config/xtensa/xtensa.h   |  22 +-
 include/xtensa-dynconfig.h   | 442 +++
 6 files changed, 626 insertions(+), 18 deletions(-)
 create mode 100644 gcc/config/xtensa/xtensa-dynconfig.c
 create mode 100644 include/xtensa-dynconfig.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index b5eda0460331..951902338205 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -561,6 +561,7 @@ tic6x-*-*)
;;
 xtensa*-*-*)
extra_options="${extra_options} fused-madd.opt"
+   extra_objs="xtensa-dynconfig.o"
;;
 esac
 
diff --git a/gcc/config/xtensa/t-xtensa b/gcc/config/xtensa/t-xtensa
index 6d43b370e5a8..4e5b7dec1bce 100644
--- a/gcc/config/xtensa/t-xtensa
+++ b/gcc/config/xtensa/t-xtensa
@@ -16,5 +16,11 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
-TM_H += $(srcdir)/../include/xtensa-config.h
+TM_H += $(srcdir)/../include/xtensa-config.h \
+   $(srcdir)/../include/xtensa-dynconfig.h
 $(out_object_file): gt-xtensa.h
+
+xtensa-dynconfig.o: $(srcdir)/config/xtensa/xtensa-dynconfig.c \
+  $(CONFIG_H) $(SYSTEM_H) $(srcdir)/../include/xtensa-dynconfig.h \
+  $(srcdir)/../include/xtensa-config.h
+   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $<
diff --git a/gcc/config/xtensa/xtensa-dynconfig.c 
b/gcc/config/xtensa/xtensa-dynconfig.c
new file mode 100644
index ..056204ae9463
--- /dev/null
+++ b/gcc/config/xtensa/xtensa-dynconfig.c
@@ -0,0 +1,170 @@
+/* Xtensa configuration settings loader.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#define XTENSA_CONFIG_DEFINITION
+#include "xtensa-config.h"
+#include "xtensa-dynconfig.h"
+
+#if defined (HAVE_DLFCN_H)
+#include 
+#elif defined (_WIN32)
+#include 
+#define ENABLE_PLUGIN
+#endif
+
+#if !defined (HAVE_DLFCN_H) && defined (_WIN32)
+
+#define RTLD_LAZY 0  /* Dummy value.  */
+
+static void *
+dlopen (const char *file, int mode ATTRIBUTE_UNUSED)
+{
+  return LoadLibrary (file);
+}
+
+static void *
+dlsym (void *handle, const char *name)
+{
+  return (void *) GetProcAddress ((HMODULE) handle, name);
+}
+
+static int ATTRIBUTE_UNUSED
+dlclose (void *handle)
+{
+  FreeLibrary ((HMODULE) handle);
+  return 0;
+}
+
+static const char *
+dlerror (void)
+{
+  return _("Unable to load DLL.");
+}
+
+#endif /* !defined (HAVE_DLFCN_H) && defined (_WIN32)  */
+
+#define CONFIG_ENV_NAME "XTENSA_GNU_CONFIG"
+
+const void

[PATCH v2 2/2] libgcc: xtensa: use built-in configuration

2022-11-28 Thread Max Filippov via Gcc-patches

Now that gcc provides __XCHAL_* definitions use them instead of XCHAL_*
definitions from the include/xtensa-config.h. That makes libgcc
dynamically configurable for the target xtensa core.
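
As a rough illustration of the remapping, the sketch below shows how code that tests an XCHAL_* macro keeps working once the value comes from a compiler-provided __XCHAL_* macro. The fallback definition of __XCHAL_HAVE_ABS and the abs_value function are assumptions added so the sketch builds with any compiler; they are not part of the patch.

```cpp
// Assumption: when the compiler does not predefine __XCHAL_HAVE_ABS
// (i.e. it is not an xtensa gcc with this series applied), use 0.
#ifndef __XCHAL_HAVE_ABS
#define __XCHAL_HAVE_ABS 0
#endif

// Same remapping style as xtensa-config-builtin.h.
#undef XCHAL_HAVE_ABS
#define XCHAL_HAVE_ABS __XCHAL_HAVE_ABS

// Hypothetical user of the macro: pick an implementation per configuration.
int abs_value(int x) {
#if XCHAL_HAVE_ABS
    return __builtin_abs(x);   // core configuration provides ABS
#else
    return x < 0 ? -x : x;     // generic fallback
#endif
}
```

The code under the #if never names the configuration header directly, so the same source builds for any core for which the compiler supplies the built-in macros.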

libgcc/
* config/xtensa/crti.S (xtensa-config.h): Replace #include with
xtensa-config-builtin.h.
* config/xtensa/crtn.S: Likewise.
* config/xtensa/lib1funcs.S: Likewise.
* config/xtensa/lib2funcs.S: Likewise.
* config/xtensa/xtensa-config-builtin.h: New File.
---
 libgcc/config/xtensa/crti.S  |   2 +-
 libgcc/config/xtensa/crtn.S  |   2 +-
 libgcc/config/xtensa/lib1funcs.S |   2 +-
 libgcc/config/xtensa/lib2funcs.S |   2 +-
 libgcc/config/xtensa/xtensa-config-builtin.h | 198 +++
 5 files changed, 202 insertions(+), 4 deletions(-)
 create mode 100644 libgcc/config/xtensa/xtensa-config-builtin.h

diff --git a/libgcc/config/xtensa/crti.S b/libgcc/config/xtensa/crti.S
index 3de7bc101f4d..2452a88a0351 100644
--- a/libgcc/config/xtensa/crti.S
+++ b/libgcc/config/xtensa/crti.S
@@ -24,7 +24,7 @@
 # .init sections.  Users may put any desired instructions in those
 # sections.
 
-#include "xtensa-config.h"
+#include "xtensa-config-builtin.h"
 
.section .init
.globl _init
diff --git a/libgcc/config/xtensa/crtn.S b/libgcc/config/xtensa/crtn.S
index 06b932edb14d..8520945fbd7c 100644
--- a/libgcc/config/xtensa/crtn.S
+++ b/libgcc/config/xtensa/crtn.S
@@ -25,7 +25,7 @@
 # fact return.  Users may put any desired instructions in those sections.
 # This file is the last thing linked into any executable.
 
-#include "xtensa-config.h"
+#include "xtensa-config-builtin.h"
 
.section .init
 #if XCHAL_HAVE_WINDOWED && !__XTENSA_CALL0_ABI__
diff --git a/libgcc/config/xtensa/lib1funcs.S b/libgcc/config/xtensa/lib1funcs.S
index 3932d206256f..e5a35aa7dcc8 100644
--- a/libgcc/config/xtensa/lib1funcs.S
+++ b/libgcc/config/xtensa/lib1funcs.S
@@ -23,7 +23,7 @@ a copy of the GCC Runtime Library Exception along with this 
program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
-#include "xtensa-config.h"
+#include "xtensa-config-builtin.h"
 
 /* Define macros for the ABS and ADDX* instructions to handle cases
where they are not included in the Xtensa processor configuration.  */
diff --git a/libgcc/config/xtensa/lib2funcs.S b/libgcc/config/xtensa/lib2funcs.S
index 681bac1be8cf..ef2a83251352 100644
--- a/libgcc/config/xtensa/lib2funcs.S
+++ b/libgcc/config/xtensa/lib2funcs.S
@@ -23,7 +23,7 @@ a copy of the GCC Runtime Library Exception along with this 
program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
-#include "xtensa-config.h"
+#include "xtensa-config-builtin.h"
 
 /* __xtensa_libgcc_window_spill: This function flushes out all but the
current register window.  This is used to set up the stack so that
diff --git a/libgcc/config/xtensa/xtensa-config-builtin.h 
b/libgcc/config/xtensa/xtensa-config-builtin.h
new file mode 100644
index ..36d4d9db330b
--- /dev/null
+++ b/libgcc/config/xtensa/xtensa-config-builtin.h
@@ -0,0 +1,198 @@
+/* Xtensa configuration settings.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef XTENSA_CONFIG_BUILTIN_H
+#define XTENSA_CONFIG_BUILTIN_H
+
+/* The macros defined here match those with the same names in the Xtensa
+   compile-time HAL (Hardware Abstraction Layer).  Please refer to the
+   Xtensa System Software Reference Manual for documentation of these
+   macros.  */
+
+#undef XCHAL_HAVE_BE
+#define XCHAL_HAVE_BE  __XCHAL_HAVE_BE
+
+#undef XCHAL_HAVE_DENSITY
+#define XCHAL_HAVE_DENSITY __XCHAL_HAVE_DENSITY
+
+#undef XCHAL_HAVE_CONST16
+#define XCHAL_HAVE_CONST16 __XCHAL_HAVE_CONST16
+
+#undef XCHAL_HAVE_ABS
+#define XCHAL_HAVE_ABS __XCHAL_HAVE_ABS
+
+#undef XCHAL_HAVE_ADDX
+#define XCHAL_HAVE_ADDX

[PATCH v2 0/2] gcc: xtensa: allow dynamic configuration

2022-11-28 Thread Max Filippov via Gcc-patches

Hello,

this series addresses the long-standing issue with xtensa configuration
support by adding a way to configure the toolchain for a specific xtensa
core at runtime using the xtensa-dynconfig [1] library as a plugin.
On a platform with shared library support, a single toolchain binary
becomes capable of building code for an arbitrary xtensa configuration.
At the same time it fully preserves the traditional way of configuring
the toolchain using the xtensa configuration overlay.

Currently the xtensa toolchain needs to be patched and rebuilt for every
new xtensa processor configuration. This has a number of downsides:
- toolchain builders need to change the toolchain source code, and
  because the xtensa configuration overlay is not a patch, this change is
  special; embedding it into the toolchain build process meets
  resistance.
- a toolchain built for one configuration is usually not usable for any
  other configuration. It's not possible for a distribution to provide
  a reusable prebuilt xtensa toolchain.

This series allows building the toolchain (including target libraries)
without modifying its source code. The built toolchain takes configuration
parameters from the shared object specified in the environment variable.
That shared object may be built by the xtensa-dynconfig project [1].

The same shared object is used for gcc, all binutils and for gdb.
The xtensa core specific information needed to build that shared object is
taken from the configuration overlay.

Both gcc and binutils-gdb get a new shared header file,
include/xtensa-dynconfig.h, that provides the definition of the configuration
data structure and initialization macros, redefines XCHAL_* macros to
access this structure and declares a function for loading the configuration
dynamically.
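
The loading scheme described above can be sketched roughly as follows. This is a simplified model, not the code from xtensa-dynconfig.c: DemoConfig, default_config, load_config, get_demo_config and the exported symbol name "demo_config" are hypothetical, and falling back silently on a load failure is an assumption (the real loader may report an error instead). Only getenv, dlopen and dlsym are real library calls, and XTENSA_GNU_CONFIG is the environment variable name used in the patch.

```cpp
#include <cstdlib>
#include <dlfcn.h>

// Hypothetical configuration record; the real structure has more fields.
struct DemoConfig {
    int have_be;
    int have_density;
};

// Built-in defaults, standing in for the values from xtensa-config.h.
static const DemoConfig default_config = {0, 1};

// Look up `symbol` in the shared object named by the environment variable
// `env_name`; return `fallback` if the variable is unset or loading fails.
const void* load_config(const char* env_name, const char* symbol,
                        const void* fallback) {
    const char* path = std::getenv(env_name);
    if (path == nullptr)
        return fallback;
    void* handle = dlopen(path, RTLD_LAZY);
    if (handle == nullptr)
        return fallback;
    // The handle is deliberately kept open: the exported data must stay mapped.
    void* object = dlsym(handle, symbol);
    return object != nullptr ? object : fallback;
}

const DemoConfig* get_demo_config() {
    return static_cast<const DemoConfig*>(
        load_config("XTENSA_GNU_CONFIG", "demo_config", &default_config));
}
```

With the variable unset, get_demo_config returns the built-in defaults, which corresponds to the traditional overlay-configured toolchain. On older C libraries the program may need to be linked with -ldl.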

This is not the first submission of this series, it was first
submitted in 2017 [2]. This version has improved configuration
versioning and GPL-compatibility check that was suggested in comments
for the v1.

[1] https://github.com/jcmvbkbc/xtensa-dynconfig
[2] https://gcc.gnu.org/pipermail/gcc-patches/2017-May/475109.html

Max Filippov (2):
  gcc: xtensa: allow dynamic configuration
  libgcc: xtensa: use built-in configuration

 gcc/config.gcc   |   1 +
 gcc/config/xtensa/t-xtensa   |   8 +-
 gcc/config/xtensa/xtensa-dynconfig.c | 170 +++
 gcc/config/xtensa/xtensa-protos.h|   1 +
 gcc/config/xtensa/xtensa.h   |  22 +-
 include/xtensa-dynconfig.h   | 442 +++
 libgcc/config/xtensa/crti.S  |   2 +-
 libgcc/config/xtensa/crtn.S  |   2 +-
 libgcc/config/xtensa/lib1funcs.S |   2 +-
 libgcc/config/xtensa/lib2funcs.S |   2 +-
 libgcc/config/xtensa/xtensa-config-builtin.h | 198 +
 11 files changed, 828 insertions(+), 22 deletions(-)
 create mode 100644 gcc/config/xtensa/xtensa-dynconfig.c
 create mode 100644 include/xtensa-dynconfig.h
 create mode 100644 libgcc/config/xtensa/xtensa-config-builtin.h

-- 
2.30.2

Re: [PATCH] rtl: add predicates for addition, subtraction & multiplication

On Sun, Nov 27, 2022 at 09:21:00AM -0500, David Malcolm via Gcc-patches wrote:
> We're currently in "stage 3" of GCC 13 development, which means that
> we're focusing on bug-fixing, rather than cleanups and feature work. 
> Though exceptions can be made for low-risk work, at the discretion of
> the release managers; I've taken the liberty of CCing them.

Such global changes are inconvenient for people who have touched any
of that code in their own patches.  If we really want to do that it
should be done early in stage 1 (when everything is broken for everyone
anyway), and should be agreed on beforehand, or really, should only be
done for obvious improvements.

This is not an obvious improvement.

> > All existings tests did pass.

I have never seen a single target where all existing tests passed.  What
we usually do is "same failures before and after the patch" :-)

> RTL is an aspect of the compiler that tends to have the most per-target
> differences, so it's especially important to be precise about which
> target(s) you built and tested on.

Not that that should matter at all for patches that do not actually
change anything, like this one should be: it should only change
notation.  That is in the nature of helper functions and helper macros.

> > Like I said, this is my first patch. 
> 
> We're sometimes not as welcoming to newcomers as we could be, so please
> bear with us.  Let me know if anything in this email is unclear.

x2 from me!

> As noted in another reply, there are lots of places in the code where
> the patch touches lines that it doesn't need to: generally formatting
> and whitespace changes.
> 
> We have over 30 years of source history which we sometimes need to look
> back on, and RTL is some of the oldest code in the compiler, so we want
> to minimize "churn" to keep tools like "git blame" useful.

Not to mention that many of those changes violated our coding style, or
even look like an automated formatter going haywire.  And, of course,
such changes should be separate patches, if done at all!

Segher

Re: [PATCH] rtl: add predicates for addition, subtraction & multiplication

Hi!

On Sat, Nov 26, 2022 at 09:16:13PM -0500, Charlie Sale via Gcc-patches wrote:
> This is my first contribution to GCC :) one of the beginner projects
> suggested on the website was to add and use RTL type predicates.

It says "See which ones are worth having a predicate for, and add them."

None of the operations should get a predicate, imnsho, only more
structural things.  Code using PLUS_P is way *less* readable than that
using GET_CODE directly!  It is good if important things are more
explicit, it is bad to have many names, etc.
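
For readers unfamiliar with the proposal, the sketch below shows what such an operation predicate amounts to: pure notation over the GET_CODE comparison. It uses a hypothetical miniature of RTL (rtx_code, rtx_def and the two helper functions are made up for illustration) and is not GCC's rtl.h.

```cpp
// Hypothetical miniature of RTL expressions; GCC's real rtx is far richer.
enum rtx_code { REG, PLUS, MINUS, MULT };
struct rtx_def { rtx_code code; };
typedef const rtx_def* rtx;

#define GET_CODE(X) ((X)->code)
// The proposed predicate is only shorthand for the comparison.
#define PLUS_P(X) (GET_CODE (X) == PLUS)

// Both spellings compile to the same test; only the notation differs.
bool is_plus_direct (rtx x) { return GET_CODE (x) == PLUS; }
bool is_plus_predicate (rtx x) { return PLUS_P (x); }
```

Since the two forms are equivalent, the question raised in this review is purely about readability, not behaviour.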

> + * rtl.h (PLUS_P): RTL addition predicate
> + (MINUS_P): RTL subtraction predicate
> + (MULT_P): RTL multiplication predicate

* rtl.h (PLUS_P): New.

> + * alias.cc: use RTL predicates

* alias.cc: Use new predicates.

Send the changelog as plain text btw, not as patch; if nothing else,
such patches will never apply cleanly :-)

> set_reg_known_value (regno, XEXP (note, 0));
> -   set_reg_known_equiv_p (regno,
> -  REG_NOTE_KIND (note) == 
> REG_EQUIV);
> +   set_reg_known_equiv_p (regno, REG_NOTE_KIND (note)
> +   == REG_EQUIV);

Don't reformat unrelated code.  And certainly not to something that
violates our coding standards :-)

> -&& (t = get_reg_known_value (REGNO (XEXP (src, 
> 0
> +&& (t = get_reg_known_value (
> +  REGNO (XEXP (src, 0

Wow, this is even worse.  Why would you do this at all?  I guess you
used some automatic formatting thing that sets maximum line length to
79?  It is 80, and it is a bad idea to reformat any random code.

> -&& (REG_P (XEXP (SET_SRC (pat), 1)))
> -&& GET_CODE (SET_SRC (pat)) == PLUS)
> +&& (REG_P (XEXP (SET_SRC (pat), 1))) && PLUS_P (SET_SRC (pat)))

You could have removed the superfluous extra parentheses here :-)

>   case SUBREG:
> if ((SUBREG_PROMOTED_VAR_P (x)
>  || (REG_P (SUBREG_REG (x)) && REG_POINTER (SUBREG_REG (x)))
> -|| (GET_CODE (SUBREG_REG (x)) == PLUS
> -&& REG_P (XEXP (SUBREG_REG (x), 0))
> +|| (PLUS_P (SUBREG_REG (x)) && REG_P (XEXP (SUBREG_REG (x), 0))
>  && REG_POINTER (XEXP (SUBREG_REG (x), 0))
>  && CONST_INT_P (XEXP (SUBREG_REG (x), 1

There was only one && per line here on purpose.  It makes the code much
more readable.

> -  if (GET_CODE (x) == PLUS
> -  && XEXP (x, 0) == stack_pointer_rtx
> +  if (PLUS_P (x) && XEXP (x, 0) == stack_pointer_rtx
>&& CONST_INT_P (XEXP (x, 1)))

Similar here (but it is so simple here that either is easy to read of
course).

> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3016,19 +3016,17 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
> *i1, rtx_insn *i0,
>/* See if any of the insns is a MULT operation.  Unless one is, we will
>   reject a combination that is, since it must be slower.  Be conservative
>   here.  */
> -  if (GET_CODE (i2src) == MULT
> -  || (i1 != 0 && GET_CODE (i1src) == MULT)
> -  || (i0 != 0 && GET_CODE (i0src) == MULT)
> -  || (GET_CODE (PATTERN (i3)) == SET
> -   && GET_CODE (SET_SRC (PATTERN (i3))) == MULT))
> +  if (MULT_P (i2src) || (i1 != 0 && MULT_P (i1src))
> +  || (i0 != 0 && MULT_P (i0src))
> +  || (GET_CODE (PATTERN (i3)) == SET && MULT_P (SET_SRC (PATTERN (i3)
>  have_mult = 1;

No.  All || align here.  Please leave it that way.

> -  /* If I3 has an inc, then give up if I1 or I2 uses the reg that is inc'd.
> - We used to do this EXCEPT in one case: I3 has a post-inc in an
> - output operand.  However, that exception can give rise to insns like
> - mov r3,(r3)+
> - which is a famous insn on the PDP-11 where the value of r3 used as the
> - source was model-dependent.  Avoid this sort of thing.  */
> +/* If I3 has an inc, then give up if I1 or I2 uses the reg that is inc'd.
> +   We used to do this EXCEPT in one case: I3 has a post-inc in an
> +   output operand.  However, that exception can give rise to insns like
> +   mov r3,(r3)+
> +   which is a famous insn on the PDP-11 where the value of r3 used as the
> +   source was model-dependent.  Avoid this sort of thing.  */

The indentation was correct, it now isn't anymore.  There is absolutely
no reason to touch this at all anyway.  NAK.

> -  if ((FIND_REG_INC_NOTE (i2, NULL_RTX) != 0
> -   && i2_is_used + added_sets_2 > 1)
> +  if ((FIND_REG_INC_NOTE (i2, NULL_RTX) != 0 && i2_is_used + added_sets_2 > 
> 1)

Do not touch random other code please.  If there is a reason to reformat
it (there isn't here!) do that as a separate patch.

>|| (i1 != 0 && FIND_REG_INC_NOTE (i1, NULL_RTX) != 0
> -   && (i1_is_used + added_sets_1 + (added_sets_2 && i1_feeds_i2_n)
> -

Re: [PATCH] c++: TYPENAME_TYPE lookup ignoring non-types [PR107773]

2022-11-28 Thread Patrick Palka via Gcc-patches

On Mon, 28 Nov 2022, Patrick Palka wrote:

> [temp.res.general]/3 says, in a note, "the usual qualified name lookup
> ([basic.lookup.qual]) applies even in the presence of typename".  Thus
> when resolving a TYPENAME_TYPE, it seems we shouldn't be looking past
> non-type members.
> 
> This patch fixes this by passing want_type=false instead of =true during
> the member lookup from make_typename_type.  An old nearby comment
> mentions that we want to continue to set want_type=true when resolving a
> nested typename type, but it appears that the nested case is handled by
> resolve_typename_type instead (which passes want_type=true appropriately).

Whoops, it seems this isn't true -- not all nested TYPENAME_TYPEs are
handled by resolve_typename_type, e.g. for T::b in

  struct a {
    struct b { typedef void get; };
    int b;
  };

  template<class T>
  void f() {
    typedef typename T::b::get type;
  }

  template void f<a>();

Passing want_type=false in make_typename_type causes us to incorrectly
reject the TYPENAME_TYPE for T::b here because qualified lookup now
finds the data member a::b instead of the nested class of the same name.
So it looks like we need a flag to control whether we're dealing with a
nested TYPENAME_TYPE or not and to pass want_type=true/false appropriately,
I'll poke more tomorrow.

> 
> In passing, use lookup_member instead of lookup_field so that we give a
> better diagnostic when a member function is found, and generalize the T
> format specifier to D in the diagnostic.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
>   PR c++/107773
> 
> gcc/cp/ChangeLog:
> 
>   * decl.cc (make_typename_type): Use lookup_member instead of
>   lookup_field.  Pass want_type=false instead of =true.  Use D
>   instead of T format specifier.
>   * search.cc (lookup_member): Document default argument.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/template/typename24.C: New test.
>   * g++.dg/template/typename25.C: New test.
> ---
>  gcc/cp/decl.cc |  7 +++
>  gcc/cp/search.cc   |  2 +-
>  gcc/testsuite/g++.dg/template/typename24.C | 16 
>  gcc/testsuite/g++.dg/template/typename25.C | 20 
>  4 files changed, 40 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/typename24.C
>  create mode 100644 gcc/testsuite/g++.dg/template/typename25.C
> 
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index 238e72f90da..673e10801a6 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -4303,9 +4303,8 @@ make_typename_type (tree context, tree name, enum 
> tag_types tag_type,
>   member of the current instantiation or a non-dependent base;
>   lookup will stop when we hit a dependent base.  */
>if (!dependent_scope_p (context))
> -/* We should only set WANT_TYPE when we're a nested typename type.
> -   Then we can give better diagnostics if we find a non-type.  */
> -t = lookup_field (context, name, 2, /*want_type=*/true);
> +t = lookup_member (context, name, /*protect=*/2, /*want_type=*/false,
> +complain);
>else
>  t = NULL_TREE;
>  
> @@ -4357,7 +4356,7 @@ make_typename_type (tree context, tree name, enum 
> tag_types tag_type,
>else
>   {
> if (complain & tf_error)
> - error ("%<typename %T::%D%> names %q#T, which is not a type",
> + error ("%<typename %T::%D%> names %q#D, which is not a type",
>  context, name, t);
> return error_mark_node;
>   }
> diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
> index 0dbb3be1ee7..e5848ebc620 100644
> --- a/gcc/cp/search.cc
> +++ b/gcc/cp/search.cc
> @@ -1109,7 +1109,7 @@ build_baselink (tree binfo, tree access_binfo, tree 
> functions, tree optype)
>  
>  tree
>  lookup_member (tree xbasetype, tree name, int protect, bool want_type,
> -tsubst_flags_t complain, access_failure_info *afi)
> +tsubst_flags_t complain, access_failure_info *afi /* = NULL */)
>  {
>tree rval, rval_binfo = NULL_TREE;
>tree type = NULL_TREE, basetype_path = NULL_TREE;
> diff --git a/gcc/testsuite/g++.dg/template/typename24.C 
> b/gcc/testsuite/g++.dg/template/typename24.C
> new file mode 100644
> index 000..4b1d5e5271b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/typename24.C
> @@ -0,0 +1,16 @@
> +// PR c++/107773
> +
> +struct a {
> +  typedef void get;
> +};
> +
> +struct b : a {
> +  int get(int i) const;
> +};
> +
> +template<class T>
> +void f() {
> +  typedef typename T::get type; // { dg-error "'int b::get\\(int\\) const', 
> which is not a type" }
> +}
> +
> +template void f<b>();
> diff --git a/gcc/testsuite/g++.dg/template/typename25.C 
> b/gcc/testsuite/g++.dg/template/typename25.C
> new file mode 100644
> index 000..4e6b764a97b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/typename25.C
> @@ -0,0 +1,20 @@
> +// Example 4 from [temp.res.general]/3.
> +
> +struct A {
> +

Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

On 11/28/22 15:52, 钟居哲 wrote:

> >> I'm tempted to push this into the next stage1 given its arrival after
> >> stage1 close, but if the wider RISC-V maintainers want to see it move
> >> forward, I don't object strongly.
>
> Ok, let's save these patches and merge them when GCC14 stage1 is open.
> Would you mind telling me when will stage 1 be open?

Typically it's April.  As was noted elsewhere, feel free to keep
submitting patches in this space and you can certainly create a branch
where y'all can put patches to make it easier to collaborate and
ultimately merge with the trunk once stage1 is open again.

> >> I'm curious about the model you're using.  Is it going to be something
> >> similar to mode switching?  That's the first mental model that comes to
> >> mind.  Essentially we determine the VL needed for every chunk of code,
> >> then we do an LCM like algorithm to find the optimal placement points
> >> for VL sets to minimize the number of VL sets across all the paths
> >> through the CFG.  Never in a million years would I have expected we'd be
> >> considering reusing that code.
>
> Yes, I implemented VSETVL PASS with LCM algorithm and RTL_SSA framework.

Yea, layering on top of RTL-SSA is probably better than the existing
mode-switching which is LCM without SSA.

> Actually, me && kito have spent a month on VSETVL PASS and we have
> made a progress. We have tested it with a lot of testcases, turns out
> our implementation
> of VSETVL PASS in GCC has much better codegen than the VSETVL implemented
> in LLVM side in many different situations because of LCM. I am working
> on cleaning up the codes
> and hopefully you will see it soon in the next patch.

Good to hear.  I argued pretty loudly in the late 90s that LCM was the
right framework for this problem.  We didn't have rtl-ssa, but we did
have a pure RTL LCM module that Joern and Andrew were able to re-use to
implement sh's mode switching.

I just never thought we'd see another processor where it'd be useful.

Jeff
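
To make the "determine the VL needed for every chunk of code" step a little more concrete, here is a minimal sketch of only the local part of the problem: within one straight-line block, a configuration set is needed at the start and wherever the demanded configuration changes. It is not the VSETVL pass and does not perform LCM across the CFG; VConfig, its fields and vsetvl_points are hypothetical names used for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the vector configuration an instruction demands.
struct VConfig {
    int sew;   // element width in bits
    int lmul;  // register group multiplier
    bool operator==(const VConfig& o) const {
        return sew == o.sew && lmul == o.lmul;
    }
};

// For a straight-line sequence, return the indices before which a
// configuration set must be emitted: the first instruction and every
// instruction whose demand differs from its predecessor's.
std::vector<std::size_t> vsetvl_points(const std::vector<VConfig>& demands) {
    std::vector<std::size_t> points;
    for (std::size_t i = 0; i < demands.size(); ++i) {
        if (i == 0 || !(demands[i] == demands[i - 1]))
            points.push_back(i);
    }
    return points;
}
```

The global placement discussed in this thread goes further: it uses the per-block demands as input to an LCM-style dataflow problem, so that a set needed on several paths can be hoisted or shared rather than repeated in every block.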

Re: [PATCH] c++: explicit specialization and trailing requirements [PR107864]


On 11/28/22 15:16, Patrick Palka wrote:

Here we're crashing when using an explicit specialization of a function
template with trailing requirements ultimately because decls_match
(called indirectly from register_specialization) returns false since the
template has trailing requirements whereas the specialization doesn't.

In r12-2230-gddd25bd1a7c8f4, we fixed a similar issue concerning
template requirements instead of trailing requirements.  We could just
extend this fix to ignore trailing requirement mismatches for
explicit specializations as well, but it seems cleaner to just propagate
constraints from the specialized template to the explicit specialization
so that decls_match will naturally return true.  And it looks like
determine_specialization already does this, albeit inconsistently (only
for non-template member functions of a class template as in
cpp2a/concepts-explicit-spec4.C).

So this patch makes determine_specialization consistently propagate
constraints from the specialized template to the specialization, which
obviates the function_requirements_equivalent_p special case added by
r12-2230.  In passing use add_outermost_template_args instead of open
coding it.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Also tested on range-v3 and cmcstl2.

PR c++/107864

gcc/cp/ChangeLog:

* decl.cc (function_requirements_equivalent_p): Don't check
DECL_TEMPLATE_SPECIALIZATION.
* pt.cc (determine_specialization): Propagate constraints when
specializing a function template too.  Simplify by using
add_outermost_template_args.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/explicit-spec1a.C: New test.
---
  gcc/cp/decl.cc|  4 +---
  gcc/cp/pt.cc  | 21 +--
  .../g++.dg/concepts/explicit-spec1a.C | 11 ++
  3 files changed, 22 insertions(+), 14 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/concepts/explicit-spec1a.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 544efdc9914..238e72f90da 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -956,9 +956,7 @@ static bool
  function_requirements_equivalent_p (tree newfn, tree oldfn)
  {
/* In the concepts TS, the combined constraints are compared.  */
-  if (cxx_dialect < cxx20
-  && (DECL_TEMPLATE_SPECIALIZATION (newfn)
- <= DECL_TEMPLATE_SPECIALIZATION (oldfn)))
+  if (cxx_dialect < cxx20)


This should probably check flag_concepts_ts instead of cxx_dialect.

OK with that change.


  {
tree ci1 = get_constraints (oldfn);
tree ci2 = get_constraints (newfn);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index fbf498ad16a..e677e9d1b38 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -2482,17 +2482,16 @@ determine_specialization (tree template_id,
  }
  
/* It was a specialization of a template.  */

-  targs = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (TREE_VALUE (templates)));
-  if (TMPL_ARGS_HAVE_MULTIPLE_LEVELS (targs))
-{
-  *targs_out = copy_node (targs);
-  SET_TMPL_ARGS_LEVEL (*targs_out,
-  TMPL_ARGS_DEPTH (*targs_out),
-  TREE_PURPOSE (templates));
-}
-  else
-*targs_out = TREE_PURPOSE (templates);
-  return TREE_VALUE (templates);
+  tree tmpl = TREE_VALUE (templates);
+  targs = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (tmpl));
+  targs = add_outermost_template_args (targs, TREE_PURPOSE (templates));
+  *targs_out = targs;
+
+  /* Propagate the template's constraints to the declaration.  */
+  if (tsk != tsk_template)
+set_constraints (decl, get_constraints (tmpl));
+
+  return tmpl;
  }
  
  /* Returns a chain of parameter types, exactly like the SPEC_TYPES,

diff --git a/gcc/testsuite/g++.dg/concepts/explicit-spec1a.C b/gcc/testsuite/g++.dg/concepts/explicit-spec1a.C
new file mode 100644
index 000..ec678740cb8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/explicit-spec1a.C
@@ -0,0 +1,11 @@
+// A version of explicit-spec1.C where the template g has trailing instead of
+// template requirements.
+// PR c++/107864
+// { dg-do compile { target concepts } }
+
+template<typename T> concept C = __is_class(T);
+struct Y { int n; } y;
+template<class T> void g(T) requires C<T> { }
+int called;
+template<> void g(Y) { called = 3; }
+int main() { g(y); }

Re: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS


On Mon, 28 Nov 2022 15:10:15 PST (-0800), juzhe.zh...@rivai.ai wrote:
Thanks. 


I think we can still continue the RVV feature review process in the github
branch that we have talked about. I will still send the patches that have
been reviewed to the GCC mailing list, but not merge them right now; we can
wait until stage1 is open.


That also works for me.  We can always stack them up on a vendor branch 
for a few months until things re-open.



Is that a good idea? I don't want RVV support in GCC to stop here, since
LLVM already has all of the RVV support and GCC has been far behind LLVM
on RVV for a long time.


Yes, please don't stop ;).  It's really important work!




juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt

Date: 2022-11-29 02:02
To: jeffreyalaw
CC: juzhe.zhong; gcc-patches; Kito Cheng
Subject: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS
On Mon, 28 Nov 2022 08:44:16 PST (-0800), jeffreya...@gmail.com wrote:


On 11/28/22 07:14, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * config/riscv/riscv-protos.h (enum vlmul_type): New enum.
 (get_vlmul): New function.
 (get_ratio): Ditto.
 * config/riscv/riscv-v.cc (struct mode_vtype_group): New struct.
 (ENTRY): Adapt for attributes.
 (enum vlmul_type): New enum.
 (get_vlmul): New function.
 (get_ratio): New function.
 * config/riscv/riscv-vector-switch.def (ENTRY): Adapt for attributes.
 * config/riscv/riscv.cc (ENTRY): Ditto.
 * config/riscv/vector.md (false,true): Add attributes.


I'm tempted to push this into the next stage1 given its arrival after
stage1 close, but if the wider RISC-V maintainers want to see it move
forward, I don't object strongly.
 
I'm also on the fence here: the RISC-V V implementation is a huge 
feature so it's a bit awkward to land it this late in the release, but 
on the flip side it's a very important feature.  It's complicated enough 
that whatever our first release is will probably be a mess, so I'd 
prefer to just get that pain out of the way sooner rather than later.  
There's no V hardware available now and nothing concretely announced so 
any users are probably going to be pretty advanced, but having at least 
the basics of V in there will allow us to kick the tires on the rest of 
the stack a lot more easily.
 
There's obviously risk to taking something this late in the process.  We 
don't have anything else that triggers the vectorizer, so I think it 
should be separable enough that risk is manageable.
 
Not sure if Kito wants to chime in, though.
 

I'm curious about the model you're using.  Is it going to be something
similar to mode switching?  That's the first mental model that comes to
mind.  Essentially we determine the VL needed for every chunk of code,
then we do an LCM like algorithm to find the optimal placement points
for VL sets to minimize the number of VL sets across all the paths
through the CFG.  Never in a million years would I have expected we'd be
considering reusing that code.


Jeff

Re: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

Thanks. 

I think we can still continue the RVV feature review process in the github
branch that we have talked about. I will still send the patches that have
been reviewed to the GCC mailing list, but not merge them right now; we can
wait until stage1 is open.

Is that a good idea? I don't want RVV support in GCC to stop here, since
LLVM already has all of the RVV support and GCC has been far behind LLVM
on RVV for a long time.

juzhe.zh...@rivai.ai

From: Palmer Dabbelt
Date: 2022-11-29 02:02
To: jeffreyalaw
CC: juzhe.zhong; gcc-patches; Kito Cheng
Subject: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS
On Mon, 28 Nov 2022 08:44:16 PST (-0800), jeffreya...@gmail.com wrote:
>
> On 11/28/22 07:14, juzhe.zh...@rivai.ai wrote:
>> From: Ju-Zhe Zhong 
>>
>> gcc/ChangeLog:
>>
>>  * config/riscv/riscv-protos.h (enum vlmul_type): New enum.
>>  (get_vlmul): New function.
>>  (get_ratio): Ditto.
>>  * config/riscv/riscv-v.cc (struct mode_vtype_group): New struct.
>>  (ENTRY): Adapt for attributes.
>>  (enum vlmul_type): New enum.
>>  (get_vlmul): New function.
>>  (get_ratio): New function.
>>  * config/riscv/riscv-vector-switch.def (ENTRY): Adapt for 
>> attributes.
>>  * config/riscv/riscv.cc (ENTRY): Ditto.
>>  * config/riscv/vector.md (false,true): Add attributes.
>
> I'm tempted to push this into the next stage1 given its arrival after
> stage1 close, but if the wider RISC-V maintainers want to see it move
> forward, I don't object strongly.

I'm also on the fence here: the RISC-V V implementation is a huge 
feature so it's a bit awkward to land it this late in the release, but 
on the flip side it's a very important feature.  It's complicated enough 
that whatever our first release is will probably be a mess, so I'd 
prefer to just get that pain out of the way sooner rather than later.  
There's no V hardware available now and nothing concretely announced so 
any users are probably going to be pretty advanced, but having at least 
the basics of V in there will allow us to kick the tires on the rest of 
the stack a lot more easily.

There's obviously risk to taking something this late in the process.  We 
don't have anything else that triggers the vectorizer, so I think it 
should be separable enough that risk is manageable.

Not sure if Kito wants to chime in, though.

> I'm curious about the model you're using.  Is it going to be something
> similar to mode switching?  That's the first mental model that comes to
> mind.  Essentially we determine the VL needed for every chunk of code,
> then we do an LCM like algorithm to find the optimal placement points
> for VL sets to minimize the number of VL sets across all the paths
> through the CFG.  Never in a million years would I have expected we'd be
> considering reusing that code.
>
>
> Jeff

Re: Ping [PATCH] Change the behavior of predicate check failure on cbranchcc4 operand0 in prepare_cmp_insn

On Mon, Nov 28, 2022 at 07:46:07PM +0100, Richard Biener wrote:
> Anyhow - my question still stands - what's the fallback for the callers
> that do not check for failure?  How are we sure we're not running into
> these when relaxing the requirement that a MODE_CC prepare_cmp_insn
> must not fail?

This will work the same as with any other define_expand?  If the caller
of gen_blablabla does not check for failure, you end up with a NULL_RTX
in the instruction stream, which will ICE sooner or later.  Not pretty,
sure, but at least it is a reliable ICE :-)

Segher

Re: Re: [PATCH] RISC-V: Add duplicate vector support.

OK.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2022-11-29 00:49
To: juzhe.zhong; gcc-patches
CC: kito.cheng
Subject: Re: [PATCH] RISC-V: Add duplicate vector support.
 
On 11/25/22 09:06, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
>  * config/riscv/constraints.md (Wdm): New constraint.
>  * config/riscv/predicates.md (direct_broadcast_operand): New 
> predicate.
>  * config/riscv/riscv-protos.h (RVV_VLMAX): New macro.
>  (emit_pred_op): Refine function.
>  * config/riscv/riscv-selftests.cc (run_const_vector_selftests): New 
> function.
>  (run_broadcast_selftests): Ditto.
>  (BROADCAST_TEST): New tests.
>  (riscv_run_selftests): More tests.
>  * config/riscv/riscv-v.cc (emit_pred_move): Refine function.
>  (emit_vlmax_vsetvl): Ditto.
>  (emit_pred_op): Ditto.
>  (expand_const_vector): New function.
>  (legitimize_move): Add constant vector support.
>  * config/riscv/riscv.cc (riscv_print_operand): New asm print rule 
> for const vector.
>  * config/riscv/riscv.h (X0_REGNUM): New macro.
>  * config/riscv/vector-iterators.md: New attribute.
>  * config/riscv/vector.md (vec_duplicate): New pattern.
>  (@pred_broadcast): New pattern.
>
> gcc/testsuite/ChangeLog:
>
>  * gcc.target/riscv/rvv/base/dup-1.c: New test.
>  * gcc.target/riscv/rvv/base/dup-2.c: New test.
 
I think this should wait for the next stage1 cycle.
 
jeff

Re: Re: [PATCH] RISC-V: Remove tail && mask policy operand for vmclr, vmset, vmld, vmst

Yes, it's a cleanup.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2022-11-29 00:48
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Remove tail && mask policy operand for vmclr, 
vmset, vmld, vmst
 
On 11/28/22 07:21, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
>
> Mask instructions don't need a policy operand, so remove it to make the
> patterns look reasonable.
> gcc/ChangeLog:
>
>  * config/riscv/vector.md: Remove TA && MA operands.
 
Does this fix a known bug or is it just a cleanup?   I think the latter, 
but I want to be sure.
 
 
 
Jeff

Re: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

>> I'm tempted to push this into the next stage1 given its arrival after
>> stage1 close, but if the wider RISC-V maintainers want to see it move
>> forward, I don't object strongly.

OK, let's save these patches and merge them when GCC 14 stage1 is open.
Would you mind telling me when stage1 will open?

>> I'm curious about the model you're using.  Is it going to be something
>> similar to mode switching?  That's the first mental model that comes to
>> mind.  Essentially we determine the VL needed for every chunk of code,
>> then we do an LCM like algorithm to find the optimal placement points
>> for VL sets to minimize the number of VL sets across all the paths
>> through the CFG.  Never in a million years would I have expected we'd be
>> considering reusing that code.

Yes, I implemented the VSETVL pass with the LCM algorithm and the RTL_SSA
framework. Actually, Kito and I have spent a month on the VSETVL pass and
we have made progress. We have tested it with a lot of testcases, and it
turns out that our implementation of the VSETVL pass in GCC produces much
better codegen than the VSETVL implementation on the LLVM side in many
different situations, because of LCM. I am working on cleaning up the code
and hopefully you will see it soon in the next patch.
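
To make the "VL needed per chunk, then an LCM-like placement" idea a bit more concrete, here is a small standalone sketch. It is not the pass from the patch series: `Block`, `place_config_sets`, and the non-negative integer configuration ids are invented for illustration. It also covers only the availability half of the problem (dropping a set when the same configuration is already in effect on every incoming path); a real LCM additionally moves sets to earlier points in the CFG.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model, not GCC code: a block needs at most one vector
// configuration (a non-negative integer id) and lists its predecessors.
struct Block {
  std::optional<int> need;
  std::vector<std::size_t> preds;
};

// Lattice values for "configuration in effect at a block boundary".
constexpr int kTop = -2;      // not computed yet (optimistic)
constexpr int kUnknown = -1;  // paths disagree, or nothing is known

static int meet(int a, int b) {
  if (a == kTop) return b;
  if (b == kTop) return a;
  return a == b ? a : kUnknown;
}

// Returns, per block, whether a configuration-setting instruction is
// still required at the start of that block.
std::vector<bool> place_config_sets(const std::vector<Block>& cfg) {
  std::vector<int> in(cfg.size(), kTop), out(cfg.size(), kTop);
  bool changed = true;
  while (changed) {  // iterate the forward dataflow to a fixed point
    changed = false;
    for (std::size_t b = 0; b < cfg.size(); ++b) {
      // Entry blocks start with no known configuration.
      int v = cfg[b].preds.empty() ? kUnknown : kTop;
      for (std::size_t p : cfg[b].preds) {
        assert(p < cfg.size());
        v = meet(v, out[p]);
      }
      // A block that needs a configuration leaves it in effect on exit.
      int o = cfg[b].need ? *cfg[b].need : v;
      if (v != in[b] || o != out[b]) {
        in[b] = v;
        out[b] = o;
        changed = true;
      }
    }
  }
  std::vector<bool> needs_set(cfg.size());
  for (std::size_t b = 0; b < cfg.size(); ++b)
    needs_set[b] = cfg[b].need && in[b] != *cfg[b].need;
  return needs_set;
}
```

On a diamond-shaped CFG where every block that uses vectors wants the same configuration, only the entry block keeps its set; once one arm asks for a different configuration, the join block needs its own set again.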

Thanks

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2022-11-29 00:44
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Add attributes for VSETVL PASS

On 11/28/22 07:14, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
>  * config/riscv/riscv-protos.h (enum vlmul_type): New enum.
>  (get_vlmul): New function.
>  (get_ratio): Ditto.
>  * config/riscv/riscv-v.cc (struct mode_vtype_group): New struct.
>  (ENTRY): Adapt for attributes.
>  (enum vlmul_type): New enum.
>  (get_vlmul): New function.
>  (get_ratio): New function.
>  * config/riscv/riscv-vector-switch.def (ENTRY): Adapt for attributes.
>  * config/riscv/riscv.cc (ENTRY): Ditto.
>  * config/riscv/vector.md (false,true): Add attributes.

I'm tempted to push this into the next stage1 given its arrival after 
stage1 close, but if the wider RISC-V maintainers want to see it move 
forward, I don't object strongly.

I'm curious about the model you're using.  Is it going to be something 
similar to mode switching?  That's the first mental model that comes to 
mind.  Essentially we determine the VL needed for every chunk of code, 
then we do an LCM like algorithm to find the optimal placement points 
for VL sets to minimize the number of VL sets across all the paths 
through the CFG.  Never in a million years would I have expected we'd be 
considering reusing that code.

Jeff

Re: Java front-end and library patches.

2022-11-28 Thread Joseph Myers

On Fri, 25 Nov 2022, Zopolis0 via Gcc-patches wrote:

> Firstly, to get feedback and reviews on the 56 already existing
> patches, even though most are just re-adding code or making idiomatic
> changes, so that when the final issue is solved everything has already
> been approved (hopefully) and the merge is good to go.

I think a lot more explanation is needed to get much useful feedback.

* Each patch should have its own explanation of what it is doing and why, 
in the message body (not in an attachment).  Just the commit summary line 
and ChangeLog entries aren't enough, we need the actual substantive commit 
message explaining the patch.

* An overall explanation is needed of what the patch series is doing and 
why.  Why is it now considered useful to add this front end back?  Which 
version is the basis of the one being added back - the version removed 
from GCC (that used ECJ for converting Java source to Java byte-code), or 
some other version?  How has the series been validated?  Would you propose 
to maintain the front end and libraries in future?  Would you re-open any 
bugs against the front end or libraries that were closed (as WONTFIX or 
otherwise) as a result of it being removed from the tree (maybe when it 
was removed, maybe later when the last release series with the front end 
ceased to be supported)?  And so on.

-- 
Joseph S. Myers
jos...@codesourcery.com

[PATCH] RISC-V: Fix up some wording in the mcpu/mtune comment

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_option_override): Fix comment
wording.
---
Just stumbled on this one looking at the output of that sed script.  The
script itself was fine, the original comment was to blame.
---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 05bdba5ab4d..26c11507895 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5978,7 +5978,7 @@ riscv_option_override (void)
 target_flags |= MASK_FDIV;
 
   /* Handle -mtune, use -mcpu if -mtune is not given, and use default -mtune
- if -mtune and -mcpu both not given.  */
+ if both -mtune and -mcpu are not given.  */
   cpu = riscv_parse_tune (riscv_tune_string ? riscv_tune_string :
  (riscv_cpu_string ? riscv_cpu_string :
   RISCV_TUNE_STRING_DEFAULT));
-- 
2.38.1

[PATCH] c++: TYPENAME_TYPE lookup ignoring non-types [PR107773]

2022-11-28 Thread Patrick Palka via Gcc-patches

[temp.res.general]/3 says, in a note, "the usual qualified name lookup
([basic.lookup.qual]) applies even in the presence of typename".  Thus
when resolving a TYPENAME_TYPE, it seems we shouldn't be looking past
non-type members.
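
As a rough illustration of the rule (not part of the patch), the lookup result can be observed without a hard error through a detection trait. The trait name `names_type_X` is made up here; the classes mirror the example in [temp.res.general]/3.

```cpp
#include <type_traits>

// A::X names both a struct and a data member; B::X names only a struct.
struct A {
  struct X { };
  int X;
};
struct B {
  struct X { };
};

// Hypothetical detection trait: true when "typename T::X" is well-formed
// during template argument substitution.
template<class T, class = void>
struct names_type_X : std::false_type { };
template<class T>
struct names_type_X<T, std::void_t<typename T::X>> : std::true_type { };

static_assert(names_type_X<B>::value, "B::X is a type");
static_assert(!names_type_X<int>::value, "int has no member X");
// names_type_X<A>::value is deliberately not asserted: with the usual
// qualified lookup the data member A::X is found, so the result should be
// false, but a compiler that looks past non-types here will report true.
```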

This patch fixes this by passing want_type=false instead of =true during
the member lookup from make_typename_type.  An old nearby comment
mentions that we want to continue to set want_type=true when resolving a
nested typename type, but it appears that the nested case is handled by
resolve_typename_type instead (which passes want_type=true appropriately).

In passing, use lookup_member instead of lookup_field so that we give a
better diagnostic when a member function is found, and generalize the T
format specifier to D in the diagnostic.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/107773

gcc/cp/ChangeLog:

* decl.cc (make_typename_type): Use lookup_member instead of
lookup_field.  Pass want_type=false instead of =true.  Use D
instead of T format specifier.
* search.cc (lookup_member): Document default argument.

gcc/testsuite/ChangeLog:

* g++.dg/template/typename24.C: New test.
* g++.dg/template/typename25.C: New test.
---
 gcc/cp/decl.cc |  7 +++
 gcc/cp/search.cc   |  2 +-
 gcc/testsuite/g++.dg/template/typename24.C | 16 
 gcc/testsuite/g++.dg/template/typename25.C | 20 
 4 files changed, 40 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/typename24.C
 create mode 100644 gcc/testsuite/g++.dg/template/typename25.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 238e72f90da..673e10801a6 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -4303,9 +4303,8 @@ make_typename_type (tree context, tree name, enum tag_types tag_type,
  member of the current instantiation or a non-dependent base;
  lookup will stop when we hit a dependent base.  */
   if (!dependent_scope_p (context))
-/* We should only set WANT_TYPE when we're a nested typename type.
-   Then we can give better diagnostics if we find a non-type.  */
-t = lookup_field (context, name, 2, /*want_type=*/true);
+t = lookup_member (context, name, /*protect=*/2, /*want_type=*/false,
+  complain);
   else
 t = NULL_TREE;
 
@@ -4357,7 +4356,7 @@ make_typename_type (tree context, tree name, enum tag_types tag_type,
   else
{
  if (complain & tf_error)
-   error ("%<typename %T::%D%> names %q#T, which is not a type",
+   error ("%<typename %T::%D%> names %q#D, which is not a type",
   context, name, t);
  return error_mark_node;
}
diff --git a/gcc/cp/search.cc b/gcc/cp/search.cc
index 0dbb3be1ee7..e5848ebc620 100644
--- a/gcc/cp/search.cc
+++ b/gcc/cp/search.cc
@@ -1109,7 +1109,7 @@ build_baselink (tree binfo, tree access_binfo, tree functions, tree optype)
 
 tree
 lookup_member (tree xbasetype, tree name, int protect, bool want_type,
-  tsubst_flags_t complain, access_failure_info *afi)
+  tsubst_flags_t complain, access_failure_info *afi /* = NULL */)
 {
   tree rval, rval_binfo = NULL_TREE;
   tree type = NULL_TREE, basetype_path = NULL_TREE;
diff --git a/gcc/testsuite/g++.dg/template/typename24.C b/gcc/testsuite/g++.dg/template/typename24.C
new file mode 100644
index 000..4b1d5e5271b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/typename24.C
@@ -0,0 +1,16 @@
+// PR c++/107773
+
+struct a {
+  typedef void get;
+};
+
+struct b : a {
+  int get(int i) const;
+};
+
+template<class T>
+void f() {
+  typedef typename T::get type; // { dg-error "'int b::get\\(int\\) const', which is not a type" }
+}
+
+template void f<b>();
diff --git a/gcc/testsuite/g++.dg/template/typename25.C b/gcc/testsuite/g++.dg/template/typename25.C
new file mode 100644
index 000..4e6b764a97b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/typename25.C
@@ -0,0 +1,20 @@
+// Example 4 from [temp.res.general]/3.
+
+struct A {
+  struct X { };
+  int X;
+};
+struct B {
+  struct X { };
+};
+template<class T> void f(T t) {
+  typename T::X x; // { dg-error "'int A::X', which is not a type" }
+}
+void foo() {
+  A a;
+  B b;
+  f(b); // OK, T::X refers to B::X
+  // { dg-bogus "" "" { target *-*-* } .-1 }
+  f(a); // error: T::X refers to the data member A::X not the struct A::X
+  // { dg-message "required from here" "" { target *-*-* } .-1 }
+}
-- 
2.39.0.rc0.33.g815c1e8202

Re: [committed] Fix up duplicated duplicated words in comments

2022-11-28 Thread Andrew Pinski via Gcc-patches

On Mon, Mar 7, 2022 at 6:06 AM Jakub Jelinek via Gcc-patches wrote:
>
> Hi!
>
> Like in r10-7215-g700d4cb08c88aec37c13e21e63dd61fd698baabc 2 years ago,
> I've run
> grep -v 'long long\|optab optab\|template template\|double double' 
> *.{[chS],cc} */*.{[chS],cc} *.def config/*/* 2>/dev/null | grep ' 
> \([a-zA-Z]\+\) \1 '

Some small changes to this shell command for next time:
grep -vi 'long long\|optab optab\|template template\|double double'
*.{[chS],cc} */*.{[chS],cc} *.def config/*/* *.pd 2>/dev/null | grep
-i ' \([a-zA-Z]\+\) \1 '

Adding *.pd and making both greps case insensitive would have found the
fix I committed at
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607425.html .
The case insensitive grep does increase the number of false positives,
though; I am not the best person for grep, and one such case is
"mode MODE", which shows up in function comments.
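
For anyone who would rather not fight with grep, the same check can be sketched in a few lines of C++. This is only an illustration, not the script used above: `duplicated_words` is a made-up helper, and unlike the `grep -v` filter it skips the allowed pair itself rather than the whole line.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the words that appear twice in a row on one
// line, separated only by whitespace.
std::vector<std::string> duplicated_words(const std::string& line,
                                          bool ignore_case) {
  // Repetitions that are legitimate in C and C++ sources.
  static const std::set<std::string> allowed = {"long", "optab", "template",
                                                "double"};
  // Only purely alphabetic tokens count as words, as in [a-zA-Z]\+.
  auto is_word = [](const std::string& s) {
    return !s.empty() && std::all_of(s.begin(), s.end(), [](unsigned char c) {
      return std::isalpha(c) != 0;
    });
  };
  // Lower-case the token when a case-insensitive comparison is requested.
  auto fold = [ignore_case](std::string s) {
    if (ignore_case)
      std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
      });
    return s;
  };
  std::vector<std::string> found;
  std::istringstream in(line);
  std::string prev, cur;
  while (in >> cur) {
    if (is_word(cur) && is_word(prev) && fold(cur) == fold(prev)
        && allowed.count(fold(cur)) == 0)
      found.push_back(prev);
    prev = cur;
  }
  return found;
}
```

With `ignore_case` set, a phrase such as "mode MODE" is reported as well, which is exactly the kind of false positive mentioned above.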

Thanks,
Andrew Pinski


> and for the cases that looked clearly wrong changed them, mostly by removing
> one of the duplicated words but in some cases with other changes.
>
> Committed to trunk as obvious.
>
> 2022-03-07  Jakub Jelinek  
>
> gcc/
> * tree-ssa-propagate.cc: Fix up duplicated word issue in a comment.
> * config/riscv/riscv.cc: Likewise.
> * config/darwin.h: Likewise.
> * config/i386/i386.cc: Likewise.
> * config/aarch64/thunderx3t110.md: Likewise.
> * config/aarch64/fractional-cost.h: Likewise.
> * config/vax/vax.cc: Likewise.
> * config/rs6000/pcrel-opt.md: Likewise.
> * config/rs6000/predicates.md: Likewise.
> * ctfc.h: Likewise.
> * tree-ssa-uninit.cc: Likewise.
> * value-relation.h: Likewise.
> * gimple-range-gori.cc: Likewise.
> * ipa-polymorphic-call.cc: Likewise.
> * pointer-query.cc: Likewise.
> * ipa-sra.cc: Likewise.
> * internal-fn.cc: Likewise.
> * varasm.cc: Likewise.
> * gimple-ssa-warn-access.cc: Likewise.
> gcc/analyzer/
> * store.cc: Fix up duplicated word issue in a comment.
> * analyzer.cc: Likewise.
> * engine.cc: Likewise.
> * sm-taint.cc: Likewise.
> gcc/c-family/
> * c-attribs.cc: Fix up duplicated word issue in a comment.
> gcc/cp/
> * cvt.cc: Fix up duplicated word issue in a comment.
> * pt.cc: Likewise.
> * module.cc: Likewise.
> * coroutines.cc: Likewise.
> gcc/fortran/
> * trans-expr.cc: Fix up duplicated word issue in a comment.
> * gfortran.h: Likewise.
> * scanner.cc: Likewise.
> gcc/jit/
> * libgccjit.h: Fix up duplicated word issue in a comment.
>
> --- gcc/tree-ssa-propagate.cc.jj2022-01-18 11:59:00.090974799 +0100
> +++ gcc/tree-ssa-propagate.cc   2022-03-07 14:33:28.033829512 +0100
> @@ -697,7 +697,7 @@ private:
>  gimple_stmt_iterator new_gsi);
>  };
>
> -/* Call post_new_stmt for each each new statement that has been added
> +/* Call post_new_stmt for each new statement that has been added
> to the current BB.  OLD_GSI is the statement iterator before the BB
> changes ocurred.  NEW_GSI is the iterator which may contain new
> statements.  */
> --- gcc/config/riscv/riscv.cc.jj2022-02-04 14:36:54.467612813 +0100
> +++ gcc/config/riscv/riscv.cc   2022-03-07 14:50:54.717372413 +0100
> @@ -4984,7 +4984,7 @@ riscv_option_override (void)
>  target_flags |= MASK_FDIV;
>
>/* Handle -mtune, use -mcpu if -mtune is not given, and use default -mtune
> - if -mtune and -mcpu both not not given.  */
> + if -mtune and -mcpu both not given.  */
>cpu = riscv_parse_tune (riscv_tune_string ? riscv_tune_string :
>   (riscv_cpu_string ? riscv_cpu_string :
>RISCV_TUNE_STRING_DEFAULT));
> --- gcc/config/darwin.h.jj  2022-01-18 11:58:59.078989257 +0100
> +++ gcc/config/darwin.h 2022-03-07 14:36:18.924463533 +0100
> @@ -340,7 +340,7 @@ extern GTY(()) int darwin_ms_struct;
>  " %:version-compare(>= 10.6 mmacosx-version-min= -no_compact_unwind) "
>
>  /* In Darwin linker specs we can put -lcrt0.o and ld will search the library
> -   path for crt0.o or -lcrtx.a and it will search for for libcrtx.a.  As for
> +   path for crt0.o or -lcrtx.a and it will search for libcrtx.a.  As for
> other ports, we can also put xxx.{o,a}%s and get the appropriate complete
> startfile absolute directory.  This latter point is important when we want
> to override ld's rule of .dylib being found ahead of .a and the user wants
> --- gcc/config/i386/i386.cc.jj  2022-03-04 09:35:58.674788325 +0100
> +++ gcc/config/i386/i386.cc 2022-03-07 14:50:08.093016106 +0100
> @@ -20334,7 +20334,7 @@ ix86_division_cost (const struct process
>
>  /* Return cost of shift in MODE.
> If CONSTANT_OP1 is true, the op1 value is known and set in OP1_VAL.
> -   AND_IN_OP1 specify in op1 is result of and and SHIFT_AND_TRUNCATE
> +

[pushed] c++: simple-requirement starting with 'typename' [PR101733]

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Usually a requirement starting with 'typename' is a type-requirement, but it
might be a simple-requirement such as a functional cast to a typename-type.

PR c++/101733

gcc/cp/ChangeLog:

* parser.cc (cp_parser_requirement): Parse tentatively for the
'typename' case.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-requires32.C: New test.
---
 gcc/cp/parser.cc | 15 ++-
 gcc/testsuite/g++.dg/cpp2a/concepts-requires32.C | 11 +++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires32.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 82459b7683a..a13fbe41309 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -30737,7 +30737,20 @@ cp_parser_requirement (cp_parser *parser)
   if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
 return cp_parser_compound_requirement (parser);
   else if (cp_lexer_next_token_is_keyword (parser->lexer, RID_TYPENAME))
-return cp_parser_type_requirement (parser);
+{
+  /* It's probably a type-requirement.  */
+  cp_parser_parse_tentatively (parser);
+  tree req = cp_parser_type_requirement (parser);
+  if (cp_parser_parse_definitely (parser))
+   return req;
+  /* No, maybe it's something like typename T::type(); */
+  cp_parser_parse_tentatively (parser);
+  req = cp_parser_simple_requirement (parser);
+  if (cp_parser_parse_definitely (parser))
+   return req;
+  /* Non-tentative for the error.  */
+  return cp_parser_type_requirement (parser);
+}
   else if (cp_lexer_next_token_is_keyword (parser->lexer, RID_REQUIRES))
 return cp_parser_nested_requirement (parser);
   else
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-requires32.C b/gcc/testsuite/g++.dg/cpp2a/concepts-requires32.C
new file mode 100644
index 000..117b8920787
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires32.C
@@ -0,0 +1,11 @@
+// PR c++/101733
+// { dg-do compile { target c++20 } }
+
+template<typename T>
+requires requires {
+typename T::type;
+(typename T::type()); // (1)
+T::type();// (2)
+typename T::type();   // (3)
+}
+void f(T) { }

base-commit: 47d81b1b89d615cea27307c713a4afe591e1cd2d
prerequisite-patch-id: 275d90e1bd8b940c1cca2840bc38dc4fafa0797b
-- 
2.31.1

[pushed] c++: be more strict about 'concept bool'

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Some clang folks mailed me asking about being less permissive about
'concept bool', so let's bump it up from pedwarn to permerror.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_decl_specifier_seq): Change 'concept bool'
diagnostic from pedwarn to permerror.
---
 gcc/cp/parser.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index aec625e2d9c..82459b7683a 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -15831,11 +15831,11 @@ cp_parser_decl_specifier_seq (cp_parser* parser,
 {
  cp_token *next = cp_lexer_peek_token (parser->lexer);
  if (next->keyword == RID_BOOL)
-   pedwarn (next->location, 0, "the %<bool%> keyword is not "
-"allowed in a C++20 concept definition");
+   permerror (next->location, "the %<bool%> keyword is not "
+  "allowed in a C++20 concept definition");
  else
-   pedwarn (token->location, 0, "C++20 concept definition syntax "
-"is %<concept <name> = <expr>%>");
+   error_at (token->location, "C++20 concept definition syntax "
+ "is %<concept <name> = <expr>%>");
 }
 
  /* In C++20 a concept definition is just 'concept name = expr;'

base-commit: 47d81b1b89d615cea27307c713a4afe591e1cd2d
-- 
2.31.1

Re: [PATCH] coroutines: Fix promotion of class members in co_await statements [PR99576]

2022-11-28 Thread Iain Sandoe

Hi Adrian,

Again thanks for working on this and getting it into a suitable state for 
posting.
It’s a good idea to CC relevant maintainer(s) on patches [see MAINTAINERS] and 
I’d welcome cc on any coroutines ones (it’s easy to miss a patch otherwise).

> On 28 Nov 2022, at 11:55, Adrian Perl via Gcc-patches wrote:

> please have a look at the patch below for a potential fix that addresses the 
> incorrect 'promotion'
> of members found in temporaries within co_await statements.
> 
> To summarize my post in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99576#c5, the recursive
> promotion of temporaries is too eager and also recurses into constructor 
> statements where it
> finds the initialization of members. This patch prevents the recursion into 
> constructors and
> skips the remaining subtree.
> 
> It fixes the issues in bug reports [PR99576, PR100611, PR101976, PR101367, 
> PR107288] which are all
> related to incorrect 'promotions' (and manifest in too many destructor calls).
> 
> I have added test applications based on examples in the PRs, which I have 
> only slightly
> annotated and refactored to allow automatic testing. They are all basically 
> quite similar.
> The main difference is how the temporaries are used in the co_await (and 
> co_yield) statements.
> 
> Bootstrapping and running the testsuite on x86_64 was successful. No 
> regression occurred.

This looks reasonable to me, as said in the PR.  I’d like to test a little wider 
with some larger
codebases, if you could bear with me for a few days.
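
For readers following along, the pruning described in the quoted summary can be sketched with a toy tree walk. Nothing below is GCC code: `Kind`, `Node`, `walk`, `add_child`, and `count_promotable_temporaries` are invented names, and the `descend` flag only imitates the role that `*dosub` plays in the patch.

```cpp
#include <memory>
#include <vector>

// Toy expression tree; the node kinds are stand-ins, not GCC tree codes.
enum class Kind { Call, Temporary, Constructor, Other };

struct Node {
  Kind kind;
  std::vector<std::unique_ptr<Node>> kids;
};

// Appends a child of the given kind and returns a reference to it.
Node& add_child(Node& parent, Kind kind) {
  parent.kids.push_back(std::make_unique<Node>(Node{kind, {}}));
  return *parent.kids.back();
}

// Pre-order walk.  The visitor may clear `descend` to prune everything
// below the current node.
template<class Visitor>
void walk(const Node& n, Visitor& visit) {
  bool descend = true;
  visit(n, descend);
  if (!descend)
    return;
  for (const auto& k : n.kids)
    walk(*k, visit);
}

// Counts temporaries that are candidates for promotion, ignoring anything
// nested inside a constructor node.
int count_promotable_temporaries(const Node& root) {
  int count = 0;
  auto visit = [&count](const Node& n, bool& descend) {
    if (n.kind == Kind::Constructor)
      descend = false;  // do not look at member initializers below here
    else if (n.kind == Kind::Temporary)
      ++count;
  };
  walk(root, visit);
  return count;
}
```

In this model a temporary directly under the call is counted, while one nested under a constructor node is skipped together with the rest of that subtree.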

I think this patch is small enough to accept without any copyright machinery, 
but (for reference and
in the hope that larger patches might be coming) .. at some stage, you’d need 
to do one of:

1. get an FSF copyright assignment
2. release your patch under the DCO.

See: “Legal Prerequisites” here https://gcc.gnu.org/contribute.html.

thanks again for the patch
Iain

> Please let me know if you need more information.
> 
>PR 100611
>PR 101367
>PR 101976
>PR 99576
> 
>gcc/cp/ChangeLog:
> 
>* coroutines.cc (find_interesting_subtree): Prevent recursion into 
> constructor
> 
>gcc/testsuite/ChangeLog:
> 
>* g++.dg/coroutines/pr100611.C: New test.
>* g++.dg/coroutines/pr101367.C: New test.
>* g++.dg/coroutines/pr101976.C: New test.
>* g++.dg/coroutines/pr99576_1.C: New test.
>* g++.dg/coroutines/pr99576_2.C: New test.
> 
> ---
> gcc/cp/coroutines.cc|   2 +
> gcc/testsuite/g++.dg/coroutines/pr100611.C  |  93 +++
> gcc/testsuite/g++.dg/coroutines/pr101367.C  |  78 +
> gcc/testsuite/g++.dg/coroutines/pr101976.C  |  76 
> gcc/testsuite/g++.dg/coroutines/pr99576_1.C | 123 
> gcc/testsuite/g++.dg/coroutines/pr99576_2.C |  71 +++
> 6 files changed, 443 insertions(+)
> create mode 100644 gcc/testsuite/g++.dg/coroutines/pr100611.C
> create mode 100644 gcc/testsuite/g++.dg/coroutines/pr101367.C
> create mode 100644 gcc/testsuite/g++.dg/coroutines/pr101976.C
> create mode 100644 gcc/testsuite/g++.dg/coroutines/pr99576_1.C
> create mode 100644 gcc/testsuite/g++.dg/coroutines/pr99576_2.C
> 
> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
> index 01a3e831ee5..a87ea7fe60a 100644
> --- a/gcc/cp/coroutines.cc
> +++ b/gcc/cp/coroutines.cc
> @@ -2684,6 +2684,8 @@ find_interesting_subtree (tree *expr_p, int *dosub, void *d)
> return expr;
>   }
> }
> +  else if (TREE_CODE(expr) == CONSTRUCTOR)
> +*dosub = 0; /* We don't need to consider this any further.  */
>   else if (tmp_target_expr_p (expr)
>  && !p->temps_used->contains (expr))
> {
> diff --git a/gcc/testsuite/g++.dg/coroutines/pr100611.C 
> b/gcc/testsuite/g++.dg/coroutines/pr100611.C
> new file mode 100644
> index 000..5fbcfa7e6ec
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/coroutines/pr100611.C
> @@ -0,0 +1,93 @@
> +/*
> +  Test that instances created in capture clauses within co_await statements 
> do not
> +  get 'promoted'. This would lead to the members destructor getting called 
> more
> +  than once.
> +
> +  Correct output should look like:
> +  Foo(23) 0xf042d8
> +  Foo(const& 23) 0xf042ec
> +  ~Foo(23) 0xf042ec
> +  After co_await
> +  ~Foo(23) 0xf042d8
> +*/
> +#include 
> +#include 
> +
> +static unsigned int struct_Foo_destructor_counter = 0;
> +static bool lambda_was_executed = false;
> +
> +class Task {
> +public:
> +  struct promise_type {
> +Task get_return_object() {
> +  return {std::coroutine_handle::from_promise(*this)};
> +}
> +
> +std::suspend_never initial_suspend() { return {}; }
> +std::suspend_always final_suspend() noexcept { return {}; }
> +void unhandled_exception() {}
> +void return_void() {}
> +  };
> +
> +  ~Task() {
> +if (handle_) {
> +  handle_.destroy();
> +}
> +  }
> +
> +  bool await_ready() { return false; }
>

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Adapt to_chars/from_chars symbols


On 28/11/22 19:35, Jonathan Wakely wrote:



On Mon, 28 Nov 2022 at 06:07, François Dumont via Libstdc++ wrote:


This patch fixes these tests:

20_util/to_chars/float128_c++23.cc
std/format/formatter/requirements.cc
std/format/functions/format.cc
std/format/functions/format_to_n.cc
std/format/functions/size.cc
std/format/functions/vformat_to.cc
std/format/string.cc

Note that the symbols used in <format> for __ibm128 and __ieee128 are
untested.


We don't need to do this for those symbols, the ALT128 config is 
incompatible with versioned namespace. If you're using the versioned 
namespace, you don't need backwards compatibility with the old long 
double ABI.





Here is the simplified patch then.

    libstdc++: [_GLIBCXX_INLINE_VERSION] Add to_chars/from_chars symbols export

    libstdc++-v3/ChangeLog

    * include/std/format [_GLIBCXX_INLINE_VERSION](to_chars): Adapt __asm
    symbol specifications.
    * config/abi/pre/gnu-versioned-namespace.ver: Add to_chars/from_chars
    symbols export.

OK to commit?

François
diff --git a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
index 06ccaa80a58..7fc81514808 100644
--- a/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
+++ b/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
@@ -142,6 +142,12 @@ GLIBCXX_8.0 {
 _ZN14__gnu_parallel9_Settings3getEv;
 _ZN14__gnu_parallel9_Settings3setERS0_;
 
+# to_chars/from_chars _Float128
+_ZNSt3__88to_charsEPcS0_DF128_;
+_ZNSt3__88to_charsEPcS0_DF128_NS_12chars_formatE;
+_ZNSt3__88to_charsEPcS0_DF128_NS_12chars_formatEi;
+_ZNSt3__810from_charsEPKcS1_RDF128_NS_12chars_formatE;
+
   local:
 *;
 };
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 23ffbdabed8..fb7a02cec57 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1288,15 +1288,27 @@ namespace __format
   // Make them available as std::__format::to_chars.
   to_chars_result
   to_chars(char*, char*, _Float128) noexcept
+#  if _GLIBCXX_INLINE_VERSION
+__asm("_ZNSt3__88to_charsEPcS0_DF128_");
+#  else
 __asm("_ZSt8to_charsPcS_DF128_");
+#  endif
 
   to_chars_result
   to_chars(char*, char*, _Float128, chars_format) noexcept
+#  if _GLIBCXX_INLINE_VERSION
+__asm("_ZNSt3__88to_charsEPcS0_DF128_NS_12chars_formatE");
+#  else
 __asm("_ZSt8to_charsPcS_DF128_St12chars_format");
+#  endif
 
   to_chars_result
   to_chars(char*, char*, _Float128, chars_format, int) noexcept
+#  if _GLIBCXX_INLINE_VERSION
+__asm("_ZNSt3__88to_charsEPcS0_DF128_NS_12chars_formatEi");
+#  else
 __asm("_ZSt8to_charsPcS_DF128_St12chars_formati");
+#  endif
 # endif
 #endif

[PATCH] c++: explicit specialization and trailing requirements [PR107864]

2022-11-28 Thread Patrick Palka via Gcc-patches

Here we're crashing when using an explicit specialization of a function
template with trailing requirements ultimately because decls_match
(called indirectly from register_specialization) returns false since the
template has trailing requirements whereas the specialization doesn't.

In r12-2230-gddd25bd1a7c8f4, we fixed a similar issue concerning
template requirements instead of trailing requirements.  We could just
extend this fix to ignore trailing requirement mismatches for
explicit specializations as well, but it seems cleaner to just propagate
constraints from the specialized template to the explicit specialization
so that decls_match will naturally return true.  And it looks like
determine_specialization already does this, albeit inconsistently (only
for non-template member functions of a class template as in
cpp2a/concepts-explicit-spec4.C).

So this patch makes determine_specialization consistently propagate
constraints from the specialized template to the specialization, which
obviates the function_requirements_equivalent_p special case added by
r12-2230.  In passing use add_outermost_template_args instead of open
coding it.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  Also tested on range-v3 and cmcstl2.

PR c++/107864

gcc/cp/ChangeLog:

* decl.cc (function_requirements_equivalent_p): Don't check
DECL_TEMPLATE_SPECIALIZATION.
* pt.cc (determine_specialization): Propagate constraints when
specializing a function template too.  Simplify by using
add_outermost_template_args.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/explicit-spec1a.C: New test.
---
 gcc/cp/decl.cc|  4 +---
 gcc/cp/pt.cc  | 21 +--
 .../g++.dg/concepts/explicit-spec1a.C | 11 ++
 3 files changed, 22 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/concepts/explicit-spec1a.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 544efdc9914..238e72f90da 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -956,9 +956,7 @@ static bool
 function_requirements_equivalent_p (tree newfn, tree oldfn)
 {
   /* In the concepts TS, the combined constraints are compared.  */
-  if (cxx_dialect < cxx20
-  && (DECL_TEMPLATE_SPECIALIZATION (newfn)
- <= DECL_TEMPLATE_SPECIALIZATION (oldfn)))
+  if (cxx_dialect < cxx20)
 {
   tree ci1 = get_constraints (oldfn);
   tree ci2 = get_constraints (newfn);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index fbf498ad16a..e677e9d1b38 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -2482,17 +2482,16 @@ determine_specialization (tree template_id,
 }
 
   /* It was a specialization of a template.  */
-  targs = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (TREE_VALUE (templates)));
-  if (TMPL_ARGS_HAVE_MULTIPLE_LEVELS (targs))
-{
-  *targs_out = copy_node (targs);
-  SET_TMPL_ARGS_LEVEL (*targs_out,
-  TMPL_ARGS_DEPTH (*targs_out),
-  TREE_PURPOSE (templates));
-}
-  else
-*targs_out = TREE_PURPOSE (templates);
-  return TREE_VALUE (templates);
+  tree tmpl = TREE_VALUE (templates);
+  targs = DECL_TI_ARGS (DECL_TEMPLATE_RESULT (tmpl));
+  targs = add_outermost_template_args (targs, TREE_PURPOSE (templates));
+  *targs_out = targs;
+
+  /* Propagate the template's constraints to the declaration.  */
+  if (tsk != tsk_template)
+set_constraints (decl, get_constraints (tmpl));
+
+  return tmpl;
 }
 
 /* Returns a chain of parameter types, exactly like the SPEC_TYPES,
diff --git a/gcc/testsuite/g++.dg/concepts/explicit-spec1a.C 
b/gcc/testsuite/g++.dg/concepts/explicit-spec1a.C
new file mode 100644
index 000..ec678740cb8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/explicit-spec1a.C
@@ -0,0 +1,11 @@
+// A version of explicit-spec1.C where the template g has trailing instead of
+// template requirements.
+// PR c++/107864
+// { dg-do compile { target concepts } }
+
+template concept C = __is_class(T);
+struct Y { int n; } y;
+template void g(T) requires C { }
+int called;
+template<> void g(Y) { called = 3; }
+int main() { g(y); }
-- 
2.39.0.rc0.33.g815c1e8202

[COMMITTED] Fix comment for (A / (1 << B)) -> (A >> B).

2022-11-28 Thread apinski--- via Gcc-patches

From: Andrew Pinski 

There was a small typo where "Also" appeared
twice. The second "also" should have been
"handle". This fixes that.

Committed as obvious after a build.

gcc/ChangeLog:

* match.pd ((A / (1 << B)) -> (A >> B).):
Fix comment.
---
 gcc/match.pd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 67a0a682f31..f8610e37011 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -354,7 +354,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
Only for unsigned A.  For signed A, this would not preserve rounding
toward zero.
For example: (-1 / ( 1 << B)) !=  -1 >> B.
-   Also also widening conversions, like:
+   Also handle widening conversions, like:
(A / (unsigned long long) (1U << B)) -> (A >> B)
or
(A / (unsigned long long) (1 << B)) -> (A >> B).
-- 
2.17.1
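
For illustration only (this is not part of the commit): a minimal C++ sketch of the cases the corrected comment describes. The helper names are made up for this example. It checks that division by (1 << B) matches a right shift for unsigned A, including the widening form, and that signed division of -1 rounds toward zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names, for illustration only (not from match.pd).

// Unsigned A: dividing by (1 << B) and shifting right by B agree.
constexpr bool unsigned_div_matches_shift(std::uint32_t a, unsigned b)
{
  return a / (std::uint32_t{1} << b) == (a >> b);
}

// Signed A: division truncates toward zero, so -1 / (1 << B) is 0 for
// B >= 1, whereas an arithmetic right shift of -1 would stay at -1.
constexpr int signed_div_pow2(int a, unsigned b)
{
  return a / (1 << b);
}

// Widening variant: the power of two is computed in a narrower unsigned
// type and then converted to the type of A.
constexpr bool widened_div_matches_shift(unsigned long long a, unsigned b)
{
  return a / static_cast<unsigned long long>(1U << b) == (a >> b);
}

static_assert(unsigned_div_matches_shift(0xdeadbeefu, 7), "unsigned case");
static_assert(signed_div_pow2(-1, 3) == 0, "signed case rounds toward zero");
static_assert(widened_div_matches_shift(0x123456789abcdefULL, 5),
              "widening case");
```

The static_asserts are evaluated at compile time, so the file compiling cleanly is the check; the functions can also be called from ordinary code.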

[PATCH] Fortran: intrinsic MERGE shall use all its arguments [PR107874]

2022-11-28 Thread Harald Anlauf via Gcc-patches

Dear all,

as reported, the Fortran standard requires all actual argument
expressions to be evaluated (e.g. F2018:15.5.3).

There were two cases for intrinsic MERGE where we failed to do so:

- non-constant mask; Steve provided the patch

- constant scalar mask; we need to be careful to simplify only if
  the argument on the "other" path is known to be constant so that
  it does not have side-effects and can be immediately removed.

The latter change needed a correction of a sub-test of testcase
merge_init_expr_2.f90, which should not have been simplified
the way the original author assumed.  I decided to modify the
test in such a way that simplification is valid and provides
the expected pattern.
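
To illustrate the evaluation rule outside Fortran, here is a rough C++ analogy (a sketch only; the names are hypothetical stand-ins for the functions in the new test, and C++ semantics are used merely as a model). Calling a function evaluates both arguments, while folding the call down to the selected arm evaluates only one, which is the side-effect loss the patch avoids.

```cpp
#include <cassert>

// Hypothetical stand-ins for the tstuff()/fstuff() functions in the new
// test; each one records that it ran.
struct Calls { int tsource = 0; int fsource = 0; };

inline bool tstuff(Calls& c) { ++c.tsource; return true; }
inline bool fstuff(Calls& c) { ++c.fsource; return false; }

// MERGE modeled as an ordinary function: both arguments are evaluated
// before the call, whatever the mask is.
inline bool merge(bool tsource, bool fsource, bool mask)
{
  return mask ? tsource : fsource;
}

// What the standard requires: every actual argument is evaluated.
inline Calls evaluate_all(bool mask)
{
  Calls c;
  (void) merge(tstuff(c), fstuff(c), mask);
  return c;
}

// What an over-eager simplification amounts to: only the selected arm
// is evaluated, so the other function's side effects are lost.
inline Calls evaluate_selected_only(bool mask)
{
  Calls c;
  (void) (mask ? tstuff(c) : fstuff(c));
  return c;
}
```

With a constant mask, dropping the unselected argument is only safe when that argument is itself constant, since then there is nothing to lose.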

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald
From 0f6058937c04a7af5e6dcfa173648149c24f08df Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 28 Nov 2022 20:43:02 +0100
Subject: [PATCH] Fortran: intrinsic MERGE shall use all its arguments
 [PR107874]

gcc/fortran/ChangeLog:

	PR fortran/107874
	* simplify.cc (gfc_simplify_merge): When simplifying MERGE with a
	constant scalar MASK, ensure that arguments TSOURCE and FSOURCE are
	either constant or will be evaluated.
	* trans-intrinsic.cc (gfc_conv_intrinsic_merge): Evaluate arguments
	before generating conditional expression.

gcc/testsuite/ChangeLog:

	PR fortran/107874
	* gfortran.dg/merge_init_expr_2.f90: Adjust code to the corrected
	simplification.
	* gfortran.dg/merge_1.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/simplify.cc   | 17 ++-
 gcc/fortran/trans-intrinsic.cc|  3 ++
 gcc/testsuite/gfortran.dg/merge_1.f90 | 49 +++
 .../gfortran.dg/merge_init_expr_2.f90 |  3 +-
 4 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/merge_1.f90

diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
index 9c2fea8c5f2..b6184181f26 100644
--- a/gcc/fortran/simplify.cc
+++ b/gcc/fortran/simplify.cc
@@ -4913,7 +4913,22 @@ gfc_simplify_merge (gfc_expr *tsource, gfc_expr *fsource, gfc_expr *mask)

   if (mask->expr_type == EXPR_CONSTANT)
 {
-  result = gfc_copy_expr (mask->value.logical ? tsource : fsource);
+  /* The standard requires evaluation of all function arguments.
+	 Simplify only when the other dropped argument (FSOURCE or TSOURCE)
+	 is a constant expression.  */
+  if (mask->value.logical)
+	{
+	  if (!gfc_is_constant_expr (fsource))
+	return NULL;
+	  result = gfc_copy_expr (tsource);
+	}
+  else
+	{
+	  if (!gfc_is_constant_expr (tsource))
+	return NULL;
+	  result = gfc_copy_expr (fsource);
+	}
+
   /* Parenthesis is needed to get lower bounds of 1.  */
   result = gfc_get_parentheses (result);
   gfc_simplify_expr (result, 1);
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index bb938026828..93426981bac 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -7557,6 +7557,9 @@ gfc_conv_intrinsic_merge (gfc_se * se, gfc_expr * expr)
    &se->pre);
   se->string_length = len;
 }
+  tsource = gfc_evaluate_now (tsource, &se->pre);
+  fsource = gfc_evaluate_now (fsource, &se->pre);
+  mask = gfc_evaluate_now (mask, &se->pre);
   type = TREE_TYPE (tsource);
   se->expr = fold_build3_loc (input_location, COND_EXPR, type, mask, tsource,
 			  fold_convert (type, fsource));
diff --git a/gcc/testsuite/gfortran.dg/merge_1.f90 b/gcc/testsuite/gfortran.dg/merge_1.f90
new file mode 100644
index 000..abbc2276b1c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/merge_1.f90
@@ -0,0 +1,49 @@
+! { dg-do run }
+! PR fortran/107874 - merge not using all its arguments
+! Contributed by John Harper
+
+program testmerge9
+  implicit none
+  integer :: i
+  logical :: x(2) = (/.true., .false./)
+  logical :: called(2)
+
+  ! At run-time all arguments shall be evaluated
+  do i = 1,2
+ called = .false.
+ print *, merge (tstuff(), fstuff(), x(i))
+ if (any (.not. called)) stop 1
+  end do
+
+  ! Compile-time simplification shall not drop non-constant args
+  called = .false.
+  print *, merge (tstuff(),fstuff(),.true.)
+  if (any (.not. called)) stop 2
+  called = .false.
+  print *, merge (tstuff(),fstuff(),.false.)
+  if (any (.not. called)) stop 3
+  called = .false.
+  print *, merge (tstuff(),.false.,.true.)
+  if (any (called .neqv. [.true.,.false.])) stop 4
+  called = .false.
+  print *, merge (tstuff(),.false.,.false.)
+  if (any (called .neqv. [.true.,.false.])) stop 5
+  called = .false.
+  print *, merge (.true.,fstuff(),.true.)
+  if (any (called .neqv. [.false.,.true.])) stop 6
+  called = .false.
+  print *, merge (.true.,fstuff(),.false.)
+  if (any (called .neqv. [.false.,.true.])) stop 7
+contains
+  logical function tstuff()
+print *,'tstuff'
+tstuff = .true.
+called(1) = .true.
+  end function tstuff
+
+  logical function fstuff()
+print *,'fstuff'
+fstuff = .false.

Re: [PATCH RESEND] riscv: improve the cost model for loading a 64bit constant in rv32.




On 11/10/22 07:37, Lin Sinan via Gcc-patches wrote:

The motivation of this patch is to correct the wrong estimation of the number 
of instructions needed for loading a 64bit constant in rv32 in the current cost 
model (riscv_integer_cost). According to the current implementation, if a 
constant requires more than 3 instructions (riscv_const_insn and 
riscv_legitimate_constant_p), then the constant will be put into the constant pool 
when expanding gimple to RTL (legitimate_constant_p hook and emit_move_insn). So 
the inaccurate cost model leads to suboptimal codegen in rv32, and this fix 
corrects the wrong estimation.

e.g. the current codegen for loading 0x839290001 in rv32

   lui a5,%hi(.LC0)
   lw  a0,%lo(.LC0)(a5)
   lw  a1,%lo(.LC0+4)(a5)
.LC0:
   .word   958988289
   .word   8

output after this patch

   li a0,958988288
   addi a0,a0,1
   li a1,8

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_build_integer): Handle the case of 
loading 64bit constant in rv32.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rv32-load-64bit-constant.c: New test.

Signed-off-by: Lin Sinan 
I fixed up the ChangeLog and some minor formatting issues in the new 
code in riscv_build_integer.  I also twiddled the test so that it 
iterates over the optimization levels properly while skipping -O0.


Attached is the patch I committed.

jeff

commit 940d5b56990fdf171f49517ae102673817b9c869
Author: Sinan 
Date:   Mon Nov 28 12:41:17 2022 -0700

riscv: improve cost model for loading 64bit constant in rv32

The motivation of this patch is to correct the wrong estimation of the 
number of instructions needed for loading a 64bit constant in rv32 in the 
current cost model (riscv_integer_cost). According to the current 
implementation, if a constant requires more than 3 
instructions (riscv_const_insn and riscv_legitimate_constant_p), then the 
constant will be put into the constant pool when expanding gimple to 
RTL (legitimate_constant_p hook and emit_move_insn). So the inaccurate cost 
model leads to suboptimal codegen in rv32, and this fix corrects the wrong 
estimation.

e.g. the current codegen for loading 0x839290001 in rv32

  lui a5,%hi(.LC0)
  lw  a0,%lo(.LC0)(a5)
  lw  a1,%lo(.LC0+4)(a5)
.LC0:
  .word   958988289
  .word   8

output after this patch

  li a0,958988288
  addi a0,a0,1
  li a1,8

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_build_integer): Improve some cases
of loading 64bit constants for rv32.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32-load-64bit-constant.c: New test.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ab02a81e152..05bdba5ab4d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -625,6 +625,30 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
}
 }
 
+  if (!TARGET_64BIT
+  && (value > INT32_MAX || value < INT32_MIN))
+{
+  unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
+  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
+  struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
+  struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
+  int hi_cost, lo_cost;
+
+  hi_cost = riscv_build_integer_1 (hicode, hival, mode);
+  if (hi_cost < cost)
+   {
+ lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
+ if (lo_cost + hi_cost < cost)
+   {
+ memcpy (codes, alt_codes,
+ lo_cost * sizeof (struct riscv_integer_op));
+ memcpy (codes + lo_cost, hicode,
+ hi_cost * sizeof (struct riscv_integer_op));
+ cost = lo_cost + hi_cost;
+   }
+   }
+}
+
   return cost;
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c 
b/gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c
new file mode 100644
index 000..954e1ddf1c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32im -mabi=ilp32" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+
+/* This test only applies to RV32. Some of 64bit constants in this test will be put
+into the constant pool in RV64, since RV64 might need one extra instruction to load
+64bit constant. */
+
+unsigned long long
+rv32_mov_64bit_int1 (void)
+{
+  return 0x739290001LL;
+}
+
+unsigned long long
+rv32_mov_64bit_int2 (void)
+{
+  return 0x839290001LL;
+}
+
+unsigned long long
+rv32_mov_64bit_int3 (void)
+{
+  return 0x392900013929LL;
+}
+
+unsigned long long
+rv32_mov_64bit_int4 (void)
+{
+  return 0x392900113929LL;
+}
+
+unsigned long long
+rv32_mov_64bit_int5 (void)
+{
+  return 0x14736def3929LL;
+}
+
+/* { dg-final { scan-assembler-not "lw\t"

Re: [PATCH v2] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU

2022-11-28 Thread Maciej W. Rozycki

On Mon, 28 Nov 2022, Jeff Law wrote:

> >   Given the false negatives how about getting a bit stricter and also
> > checking there's nothing following the XORI instruction, like here?
> > 
> >   It might be an overkill to have a check both for the sequence and for the
> > absence of ANDI or SEXT.W as well, but I'd rather have them both out of an
> > abundance of caution.
> Sure.  That works for me as well.  OK for the trunk.

 I have committed it then.  Thank you for your review.

> Interestingly enough Raphael and I are looking at a case where Roger's patch
> is causing poorer code generation.  Given what we're finding as we work
> through the other case, I won't be surprised if we find multiple cases where
> RISC-V is generating poorer code after that patch, even though it's a
> perfectly sensible patch.

 I think it would make sense to run a RISC-V performance evaluation with and 
without Roger's patch applied.  Sadly I am somewhat resource-constrained right now 
and won't be able to do that anytime soon, but hopefully there's enough 
RISC-V hardware available now for someone to pick it up.

  Maciej

Re: 回复：[PING] [PATCH RESEND] riscv: improve the cost model for loading a 64bit constant in rv32.

On Mon, 28 Nov 2022 11:15:01 PST (-0800), gcc-patches@gcc.gnu.org wrote:
>
>
> On 11/24/22 00:43, Sinan wrote:
>>> The motivation of this patch is to correct the wrong estimation
>>> of
 the number of instructions needed for loading a 64bit constant
 in
 rv32 in the current cost model (riscv_integer_cost). According to
 the current implementation, if a constant requires more than 3
 instructions (riscv_const_insn and riscv_legitimate_constant_p),
 then the constant will be put into constant pool when expanding
 gimple to rtl (legitimate_constant_p hook and emit_move_insn).
 So the inaccurate cost model leads to the suboptimal codegen
 in rv32 and the wrong estimation part could be corrected through
 this fix.

 e.g. the current codegen for loading 0x839290001 in rv32

    lui     a5,%hi(.LC0)
    lw      a0,%lo(.LC0)(a5)
    lw      a1,%lo(.LC0+4)(a5)
 .LC0:
    .word   958988289
    .word   8

 output after this patch

    li a0,958988288
    addi a0,a0,1
    li a1,8

 gcc/ChangeLog:

  * config/riscv/riscv.cc (riscv_build_integer): Handle the case of
 loading 64bit constant in rv32.

 gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rv32-load-64bit-constant.c: New test.

 Signed-off-by: Lin Sinan
 ---
   gcc/config/riscv/riscv.cc                     | 23 +++
   .../riscv/rv32-load-64bit-constant.c          | 38 +++
   2 files changed, 61 insertions(+)
   create mode 100644 gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c

 diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
 index 32f9ef9ade9..9dffabdc5e3 100644
 --- a/gcc/config/riscv/riscv.cc
 +++ b/gcc/config/riscv/riscv.cc
 @@ -618,6 +618,29 @@ riscv_build_integer (struct riscv_integer_op *codes,
 HOST_WIDE_INT value,
    }
       }

 +  if ((value > INT32_MAX || value < INT32_MIN) && !TARGET_64BIT)
>>>
>>> Nit.   It's common practice to have the TARGET test first in a series of
>>> tests.  It may also be advisable to break this into two lines.
>>> Something like this:
>>>
>>>
>>>   if (!TARGET_64BIT
>>>       && (value > INT32_MAX || value < INT32_MIN))
>>>
>>>
>>> That's the style most GCC folks are more accustomed to reading.
>>
>> Thanks for the tips and I will change it then.
>>
 +    {
 +      unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
 +      unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
 +      struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS],
 +       hicode[RISCV_MAX_INTEGER_OPS];
 +      int hi_cost, lo_cost;
 +
 +      hi_cost = riscv_build_integer_1 (hicode, hival, mode);
 +      if (hi_cost < cost)
 + {
 +   lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
 +   if (lo_cost + hi_cost < cost)
>>>
>>> Just so I'm sure.  "cost" here refers strictly to other synthesized
>>> forms? If so, then ISTM that we'd want to generate the new style when
>>> lo_cost + hi_cost < cost OR when lo_cost + hi_cost is less than loading
>>> the constant from memory -- which is almost certainly more than "3"
>>> since the sequence from memory will be at least 3 instructions, two of
>>> which will hit memory.
>>>
>>>
>>> Jeff
>>>
>>
>> Yes, almost right. The basic idea of this patch is to improve the cost
>> calculation for loading 64bit constant in rv32, instead of adding a new
>> way to load constant.
>>
>> gcc now loads 0x739290001LL in rv32gc with three instructions,
>>         li      a0,958988288
>>         addi    a0,a0,1
>>         li      a1,7
>> However, when it loads 0x839290001LL, the output assembly becomes
>>         lui     a5,%hi(.LC0)
>>         lw      a0,%lo(.LC0)(a5)
>>         lw      a1,%lo(.LC0+4)(a5)
>>     .LC0:
>>         .word   958988289
>>         .word   8
>> The cost calculation is inaccurate in such cases, since loading these
>> two constants should have no difference in rv32 (just change `li a1,7`
>> to `li a1,8` to load the hi part). This patch will take these cases
>> into

Re: 回复：[PING] [PATCH RESEND] riscv: improve the cost model for loading a 64bit constant in rv32.





On 11/24/22 00:43, Sinan wrote:

 The motivation of this patch is to correct the wrong estimation of
 the number of instructions needed for loading a 64bit constant in
 rv32 in the current cost model (riscv_integer_cost). According to
 the current implementation, if a constant requires more than 3
 instructions (riscv_const_insn and riscv_legitimate_constant_p),
 then the constant will be put into the constant pool when expanding
 gimple to RTL (legitimate_constant_p hook and emit_move_insn).
 So the inaccurate cost model leads to suboptimal codegen
 in rv32, and this fix corrects the wrong estimation.

 e.g. the current codegen for loading 0x839290001 in rv32

lui a5,%hi(.LC0)
lw  a0,%lo(.LC0)(a5)
lw  a1,%lo(.LC0+4)(a5)
 .LC0:
.word   958988289
.word   8

 output after this patch

li a0,958988288
addi a0,a0,1
li a1,8

 gcc/ChangeLog:

  * config/riscv/riscv.cc (riscv_build_integer): Handle the case of 
loading 64bit constant in rv32.

 gcc/testsuite/ChangeLog:

  * gcc.target/riscv/rv32-load-64bit-constant.c: New test.

 Signed-off-by: Lin Sinan 
 ---
   gcc/config/riscv/riscv.cc | 23 +++
   .../riscv/rv32-load-64bit-constant.c  | 38 +++
   2 files changed, 61 insertions(+)
   create mode 100644 gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c

 diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
 index 32f9ef9ade9..9dffabdc5e3 100644
 --- a/gcc/config/riscv/riscv.cc
 +++ b/gcc/config/riscv/riscv.cc
 @@ -618,6 +618,29 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
}
   }
  
 +  if ((value > INT32_MAX || value < INT32_MIN) && !TARGET_64BIT)


 Nit.   It's common practice to have the TARGET test first in a series of 
 tests.  It may also be advisable to break this into two lines.  
 Something like this:



   if (!TARGET_64BIT
       && (value > INT32_MAX || value < INT32_MIN))


 That's the style most GCC folks are more accustomed to reading.


Thanks for the tips and I will change it then.


 +{
 +  unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
 +  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
 +  struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS],
 +   hicode[RISCV_MAX_INTEGER_OPS];
 +  int hi_cost, lo_cost;
 +
 +  hi_cost = riscv_build_integer_1 (hicode, hival, mode);
 +  if (hi_cost < cost)
 + {
 +   lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
 +   if (lo_cost + hi_cost < cost)


 Just so I'm sure.  "cost" here refers strictly to other synthesized 
 forms? If so, then ISTM that we'd want to generate the new style when 
 lo_cost + hi_cost < cost OR when lo_cost + hi_cost is less than loading 
 the constant from memory -- which is almost certainly more than "3" 
 since the sequence from memory will be at least 3 instructions, two of 
 which will hit memory.



 Jeff



Yes, almost right. The basic idea of this patch is to improve the cost
calculation for loading 64bit constant in rv32, instead of adding a new
way to load constant.

gcc now loads 0x739290001LL in rv32gc with three instructions,
 li  a0,958988288
 addi    a0,a0,1
 li  a1,7
However, when it loads 0x839290001LL, the output assembly becomes
 lui a5,%hi(.LC0)
 lw  a0,%lo(.LC0)(a5)
 lw  a1,%lo(.LC0+4)(a5)
 .LC0:
 .word   958988289
 .word   8
The cost calculation is inaccurate in such cases, since loading these
two constants should have no difference in rv32 (just change `li a1,7`
to `li a1,8` to load the hi part). This patch will take these cases
into consideration.

I think I see better what's going on.  This really isn't about the 
constant pool costing.  It's about another way to break down the 
constant into components.


riscv_build_integer_1, for the cases we're looking at, breaks down the 
constant so that high + low will give the final result.  It costs the 
high and low parts separately, then sums their costs + 1 for the addition 
step.


Your patch adds another method that is specific to rv32 and takes 
advantage of register pairs.   You break the constant down into 32bit 
high and low chunks, where each chunk will go into a different 32 bit 
register.  You just then need to sum the cost of loading each chunk.
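
As a rough illustration of the register-pair view described above, here is a minimal C++ sketch. It is not the code in riscv.cc: the `Words` struct and `split64` helper are hypothetical and simply split a 64-bit constant into the two 32-bit words that an rv32 register pair would hold, using the constants discussed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the code in riscv.cc: split a 64-bit constant
// into the two 32-bit words that an rv32 register pair would hold.
struct Words { std::uint32_t lo; std::uint32_t hi; };

constexpr Words split64(std::uint64_t value)
{
  return Words{ static_cast<std::uint32_t>(value),
                static_cast<std::uint32_t>(value >> 32) };
}

// 0x839290001: low word is 958988289 (li 958988288 + addi 1), high word 8.
static_assert(split64(0x839290001ULL).lo == 958988289u, "low word");
static_assert(split64(0x839290001ULL).hi == 8u, "high word");
// 0x739290001 has the same low word; only the high word differs.
static_assert(split64(0x739290001ULL).lo == 958988289u, "same low word");
static_assert(split64(0x739290001ULL).hi == 7u, "high word");
```

Each word can then be costed on its own, and the two costs summed, with no extra instruction needed to combine them.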


For the constants in question, your new method will result in a smaller 
cost than the current method.   That's really the point of 
riscv_build_integer -- find the sequence and cost of creation.  We later 
use that information to determine if we should use that sequence or a 
constant pool.


Palmer raised an issue on the tests with a request to not include the 
arch/abi specification.  But I think you addressed that in a later 
comment.  Specifically for rv64 we end up with another instruction, 
which would cause some constants to be considered cheaper as

Re: Ping [PATCH] Change the behavior of predicate check failure on cbranchcc4 operand0 in prepare_cmp_insn

2022-11-28 Thread Richard Biener via Gcc-patches

On Mon, Nov 28, 2022 at 6:58 PM Segher Boessenkool
 wrote:
>
> On Mon, Nov 28, 2022 at 09:42:05AM +0100, Richard Biener wrote:
> > Since the function seems to be allowed to fail the patch looks
> > reasonable - still I wonder
> > what the "fallback" for a MODE_CC style compare-and-branch is?  There
> > are callers
> > of this function that do not seem to expect failure at least, some
> > suspiciously looking
> > like MODE_CC candidates.
>
> Hi!
>
> cbranchcc4 is *not* a compare-and-branch, like cbranch<mode>4 for other
> modes is.  Instead, it is a conditional branch.  I still think it is a
> bad idea to use this same pattern name for a completely different (and
> much more basic) concept; it just confuses many things and makes us need
> exceptions in most users of cbranch<mode>4 :-(
>
> cbranchcc4 does not do a comparison.  Instead, it uses the result of
> some previous comparison in some CC register (or anything else that set
> such a register).  We want to use a cbranchcc4 to reuse some earlier
> comparison here.  Which is great of course!  But, redoing the
> (potentially expensive) computation to prepare the CC for a more
> complicated condition is not a good idea.  Also, Power's conditional
> branch insns just branch on one of the 32 condition bits (either set or
> unset), not on a logical combination of multiple of those bits, as we
> need with LTGT, UNLT, UNGT, UNEQ, and LE and GE without fastmath.  So it
> is much cleaner (and causes fewer problems later on) if we only allow
> those codes we do support.
>
> Example of LTGT:
>   fcmpu 0,0,1   # compare f0 <=> f1 to cr0 (exactly one of
> # cr0.lt, cr0.gt, cr0.eq, cr0.un will be set)
>   cror 2,0,1# cr0.eq = cr0.lt | cr0.gt
>   beq 0 # branch if cr0.eq is set
>
> So, we want the cbranchcc4 here to just do that last insn, not the last
> two insns (or all three as any other cbranch<mode>4 is!)
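
For readers less familiar with the Power condition register, the LTGT example can be modeled in C++ as below. This is only a sketch under simplifying assumptions: `CrField`, `fcmpu` and `ltgt` are hypothetical names that mirror the mnemonics and bits in the example, not any GCC or hardware interface.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of one condition-register field; the names mirror
// the cr0.lt / cr0.gt / cr0.eq / cr0.un bits in the example above.
struct CrField { bool lt, gt, eq, un; };

// Model of a floating-point compare: exactly one bit ends up set.
inline CrField fcmpu(double a, double b)
{
  CrField cr{false, false, false, false};
  if (std::isnan(a) || std::isnan(b))
    cr.un = true;
  else if (a < b)
    cr.lt = true;
  else if (a > b)
    cr.gt = true;
  else
    cr.eq = true;
  return cr;
}

// LTGT is not a single bit: it has to be formed from two of them
// (the cror step) before a branch can test it.
inline bool ltgt(const CrField& cr) { return cr.lt || cr.gt; }
```

The point of the model is that the compare, the bit combination and the branch are three separate steps, and a conditional branch on an existing CC value corresponds to the last one only.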

Anyhow - my question still stands - what's the fallback for the callers
that do not check for failure?  How are we sure we're not running into
these when relaxing the requirement that a MODE_CC prepare_cmp_insn
must not fail?

Richard.

>
> Segher

Re: [PATCH] RISC-V: Note that __builtin_riscv_pause() implies Xgnuzihintpausestate


On Fri, 18 Nov 2022 09:01:08 PST (-0800), Palmer Dabbelt wrote:

On Thu, 17 Nov 2022 22:59:08 PST (-0800), Kito Cheng wrote:

Wait, what's Xgnuzihintpausestate???


I just made it up, it's defined right next to the name like those
profile extensions are.  I figured that's the most RISC-V way to define
something like this, but we could just drop it and run with the
definition -- IIRC we just stuck a comment in for Linux and QEMU, I
doubt anyone is actually going to implement the "doesn't touch PC"
version of pause.


Just checking up on this one.  I don't care a ton about the name, just 
that we document where we're intentionally violating the specs.





On Fri, Nov 18, 2022 at 12:30 PM Palmer Dabbelt  wrote:


gcc/ChangeLog:

* doc/extend.texi (__builtin_riscv_pause): Imply
Xgnuzihintpausestate.
---
 gcc/doc/extend.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b1dd39e64b8..26f14e61bc8 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21103,7 +21103,9 @@ Returns the value that is currently set in the 
@samp{tp} register.
 @end deftypefn

 @deftypefn {Built-in Function}  void __builtin_riscv_pause (void)
-Generates the @code{pause} (hint) machine instruction.
+Generates the @code{pause} (hint) machine instruction.  This implies the
+Xgnuzihintpausestate extension, which redefines the @code{pause} instruction to
+change architectural state.
 @end deftypefn

 @node RX Built-in Functions
--
2.38.1

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Adapt to_chars/from_chars symbols

I forgot to attach the patch, but as you have already given further feedback 
I'll clean up my patch first.



On 28/11/22 19:43, François Dumont wrote:

On 28/11/22 11:21, Jonathan Wakely wrote:



On Mon, 28 Nov 2022 at 10:10, Jonathan Wakely  wrote:



On Mon, 28 Nov 2022 at 06:07, François Dumont via Libstdc++
<libstd...@gcc.gnu.org> wrote:

This patch is fixing those tests:

20_util/to_chars/float128_c++23.cc
std/format/formatter/requirements.cc
std/format/functions/format.cc
std/format/functions/format_to_n.cc
std/format/functions/size.cc
std/format/functions/vformat_to.cc
std/format/string.cc

Note that symbols used in <format> for __ibm128 and __ieee128
are untested.

I even wonder if the normal mode ones are because I cannot
find the
symbols used in gnu.ver.


 libstdc++: [_GLIBCXX_INLINE_VERSION] Add
to_chars/from_chars
symbols export

 libstdc++-v3/ChangeLog

 * include/std/format
[_GLIBCXX_INLINE_VERSION](to_chars):
Adapt __asm symbol
 specifications.
 * config/abi/pre/gnu-versioned-namespace.ver: Add
to_chars/from_chars symbols
 export.

Ok to commit ?



Why are changes needed to the linker script?

Those functions should already match the general wildcard:

    # Names inside the 'extern' block are demangled names.
    extern "C++"
    {
      std::*;
      std::__8::*;
    };



No idea, my guess was that it has something to do with the __asm 
usages in <format> and with the comment:


  // These overloads exist in the library, but are not declared for C++20.
  // Make them available as std::__format::to_chars.

Maybe they exist in the library but are unused so not exported unless 
specified in the linker script ?





Instead of nine separate #if blocks, can we just do:

#if _GLIBCXX_INLINE_VERSION
# define _GLIBCXX_ALIAS(S) __asm("_ZNSt3__8" S)
#else
# define _GLIBCXX_ALIAS(S) __asm("_ZNSt" S)
#endif

 And then use:

  _GLIBCXX_ALIAS("8to_charsPcS_eSt12chars_format");

and finally:

#undef _GLIBCXX_ALIAS


I tried and as expected it's not working because the diff in the 
symbol is not limited to the '3__8' pattern. 'chars_format' is also 
defined in versioned namespace which might perhaps explain some 
mangling diff.


Here is an updated patch though, I had forgotten to replace a _DF128 
with a __ieee128 in the untested part of this patch.


If you prefer to take a closer look later I'll just re-submit my patch 
to move versioned namespace mode to cxx11 abi knowing that those tests 
are already FAIL.


François

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Adapt to_chars/from_chars symbols


On 28/11/22 11:21, Jonathan Wakely wrote:



On Mon, 28 Nov 2022 at 10:10, Jonathan Wakely  wrote:



On Mon, 28 Nov 2022 at 06:07, François Dumont via Libstdc++
<libstd...@gcc.gnu.org> wrote:

This patch is fixing those tests:

20_util/to_chars/float128_c++23.cc
std/format/formatter/requirements.cc
std/format/functions/format.cc
std/format/functions/format_to_n.cc
std/format/functions/size.cc
std/format/functions/vformat_to.cc
std/format/string.cc

Note that symbols used in <format> for __ibm128 and __ieee128
are untested.

I even wonder if the normal mode ones are because I cannot
find the
symbols used in gnu.ver.


 libstdc++: [_GLIBCXX_INLINE_VERSION] Add to_chars/from_chars
symbols export

 libstdc++-v3/ChangeLog

 * include/std/format
[_GLIBCXX_INLINE_VERSION](to_chars):
Adapt __asm symbol
 specifications.
 * config/abi/pre/gnu-versioned-namespace.ver: Add
to_chars/from_chars symbols
 export.

Ok to commit ?



Why are changes needed to the linker script?

Those functions should already match the general wildcard:

    # Names inside the 'extern' block are demangled names.
    extern "C++"
    {
      std::*;
      std::__8::*;
    };



No idea, my guess was that it has something to do with the __asm usages 
in <format> and with the comment:


  // These overloads exist in the library, but are not declared for C++20.
  // Make them available as std::__format::to_chars.

Maybe they exist in the library but are unused so not exported unless 
specified in the linker script ?





Instead of nine separate #if blocks, can we just do:

#if _GLIBCXX_INLINE_VERSION
# define _GLIBCXX_ALIAS(S) __asm("_ZNSt3__8" S)
#else
# define _GLIBCXX_ALIAS(S) __asm("_ZNSt" S)
#endif

 And then use:

  _GLIBCXX_ALIAS("8to_charsPcS_eSt12chars_format");

and finally:

#undef _GLIBCXX_ALIAS


I tried and as expected it's not working because the diff in the symbol 
is not limited to the '3__8' pattern. 'chars_format' is also defined in 
versioned namespace which might perhaps explain some mangling diff.


Here is an updated patch though, I had forgotten to replace a _DF128 
with a __ieee128 in the untested part of this patch.


If you prefer to take a closer look later I'll just re-submit my patch 
to move versioned namespace mode to cxx11 abi knowing that those tests 
are already FAIL.


François

[PATCH] tree-optimization/107896 - allow v2si to dimode unpacks

2022-11-28 Thread Richard Biener via Gcc-patches

The following avoids ICEing for V2SI -> DImode vec_unpacks_lo.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107896
* tree-vect-stmts.cc (supportable_widening_operation):
Handle non-vector mode intermediate mode.
---
 gcc/tree-vect-stmts.cc | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b35b986889d..5485da58b38 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12194,9 +12194,8 @@ supportable_widening_operation (vec_info *vinfo,
   if (VECTOR_BOOLEAN_TYPE_P (prev_type))
intermediate_type
  = vect_halve_mask_nunits (prev_type, intermediate_mode);
-  else
+  else if (VECTOR_MODE_P (intermediate_mode))
{
- gcc_assert (VECTOR_MODE_P (intermediate_mode));
  tree intermediate_element_type
= lang_hooks.types.type_for_mode (GET_MODE_INNER 
(intermediate_mode),
  TYPE_UNSIGNED (prev_type));
@@ -12204,6 +12203,10 @@ supportable_widening_operation (vec_info *vinfo,
= build_vector_type_for_mode (intermediate_element_type,
  intermediate_mode);
}
+  else
+   intermediate_type
+ = lang_hooks.types.type_for_mode (intermediate_mode,
+   TYPE_UNSIGNED (prev_type));
 
   if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
  && VECTOR_BOOLEAN_TYPE_P (prev_type)
-- 
2.35.3

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Adapt to_chars/from_chars symbols

On Mon, 28 Nov 2022 at 06:07, François Dumont via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> This patch is fixing those tests:
>
> 20_util/to_chars/float128_c++23.cc
> std/format/formatter/requirements.cc
> std/format/functions/format.cc
> std/format/functions/format_to_n.cc
> std/format/functions/size.cc
> std/format/functions/vformat_to.cc
> std/format/string.cc
>
> Note that symbols used in <format> for __ibm128 and __ieee128 are untested.
>

We don't need to do this for those symbols, the ALT128 config is
incompatible with versioned namespace. If you're using the versioned
namespace, you don't need backwards compatibility with the old long double
ABI.

Re: [PATCH v2] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU





On 11/28/22 10:44, Maciej W. Rozycki wrote:




Your patch is probably still useful.  I think Kito's only concern was to make
sure we don't have the ANDI instruction in addition to not having the SEXT
instruction.  So still approved for trunk, just update the testcases to make
sure we don't have the ANDI too.


  Given the false negatives how about getting a bit stricter and also
checking there's nothing following the XORI instruction, like here?

  It might be an overkill to have a check both for the sequence and for the
absence of ANDI or SEXT.W as well, but I'd rather have them both out of an
abundance of caution.

Sure.  That works for me as well.  OK for the trunk.

Interestingly enough Raphael and I are looking at a case where Roger's 
patch is causing poorer code generation.  Given what we're finding as we 
work through the other case, I won't be surprised if we find multiple 
cases where RISC-V is generating poorer code after that patch, even 
though it's a perfectly sensible patch.


jeff

Re: [PATCH] RISC-V: Add attributes for VSETVL PASS


On Mon, 28 Nov 2022 08:44:16 PST (-0800), jeffreya...@gmail.com wrote:


On 11/28/22 07:14, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * config/riscv/riscv-protos.h (enum vlmul_type): New enum.
 (get_vlmul): New function.
 (get_ratio): Ditto.
 * config/riscv/riscv-v.cc (struct mode_vtype_group): New struct.
 (ENTRY): Adapt for attributes.
 (enum vlmul_type): New enum.
 (get_vlmul): New function.
 (get_ratio): New function.
 * config/riscv/riscv-vector-switch.def (ENTRY): Adapt for attributes.
 * config/riscv/riscv.cc (ENTRY): Ditto.
 * config/riscv/vector.md (false,true): Add attributes.


I'm tempted to push this into the next stage1 given its arrival after
stage1 close, but if the wider RISC-V maintainers want to see it move
forward, I don't object strongly.


I'm also on the fence here: the RISC-V V implementation is a huge 
feature so it's a bit awkward to land it this late in the release, but 
on the flip side it's a very important feature.  It's complicated enough 
that whatever our first release is will probably be a mess, so I'd 
prefer to just get that pain out of the way sooner rather than later.  
There's no V hardware available now and nothing concretely announced so 
any users are probably going to be pretty advanced, but having at least 
the basics of V in there will allow us to kick the tires on the rest of 
the stack a lot more easily.


There's obviously risk to taking something this late in the process.  We 
don't have anything else that triggers the vectorizer, so I think it 
should be separable enough that risk is manageable.


Not sure if Kito wants to chime in, though.


I'm curious about the model you're using.  Is it going to be something
similar to mode switching?  That's the first mental model that comes to
mind.  Essentially we determine the VL needed for every chunk of code,
then we do an LCM like algorithm to find the optimal placement points
for VL sets to minimize the number of VL sets across all the paths
through the CFG.  Never in a million years would I have expected we'd be
considering reusing that code.


Jeff

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Adapt dg error messages


On 28/11/22 14:39, Jonathan Wakely wrote:



On Mon, 28 Nov 2022 at 10:08, Jonathan Wakely  wrote:



On Mon, 28 Nov 2022 at 10:06, Jonathan Wakely 
wrote:



On Mon, 28 Nov 2022 at 06:02, François Dumont via Libstdc++
<libstd...@gcc.gnu.org> wrote:

libstdc++: [_GLIBCXX_INLINE_VERSION] Adapt dg error messages

libstdc++-v3/ChangeLog

 * testsuite/20_util/bind/ref_neg.cc: Adapt
dg-prune-output message.
 * testsuite/20_util/function/cons/70692.cc: Adapt
dg-error message.

Ok to commit ?


OK, thanks.



Actually wait, can you test this instead?

--- a/libstdc++-v3/testsuite/lib/prune.exp
+++ b/libstdc++-v3/testsuite/lib/prune.exp
@@ -37,6 +37,8 @@ proc libstdc++-dg-prune { system text } {
      return "::unsupported::hosted C++ headers not supported"
    }

+    regsub -all "std::__8::" $text "std::" text
+
    # Ignore caret diagnostics. Unfortunately dejaGNU trims leading
    # spaces, so one cannot rely on them being present.
    regsub -all "(^|\n)\[^\n\]+\n *\\^\n" $text "\n" text

This should mean we can stop needing to make these changes to
every test, and just write the tests naturally.
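
A rough illustration of what the added regsub line does to the compiler output before it is matched, written in C++ rather than Tcl. The function name is hypothetical and this is only a sketch of the substitution, not the testsuite code:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replace every "std::__8::" with "std::" so that
// diagnostics from a versioned-namespace build look like normal-mode ones.
std::string strip_versioned_namespace(std::string text) {
  const std::string from = "std::__8::";
  const std::string to = "std::";
  for (std::size_t pos = text.find(from); pos != std::string::npos;
       pos = text.find(from, pos + to.size())) {
    text.replace(pos, from.size(), to);
  }
  return text;
}
```

Text that does not mention the versioned namespace passes through unchanged, which is why the same test expectations can then serve both modes.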


That only helps for dg-prune-output, but we still need to use (__8::)? for 
dg-error.


Please push your change to 20_util/function/cons/70692.cc but not the 
change to 20_util/bind/ref_neg.cc (the latter will get fixed after I 
push the prune.exp change).



Done as requested and I confirm that prune.exp enhancement fixed 
20_util/bind/ref_neg.cc.


Thanks

Re: Ping [PATCH] Change the behavior of predicate check failure on cbranchcc4 operand0 in prepare_cmp_insn

On Mon, Nov 28, 2022 at 09:42:05AM +0100, Richard Biener wrote:
> Since the function seems to be allowed to fail the patch looks
> reasonable - still I wonder
> what the "fallback" for a MODE_CC style compare-and-branch is?  There
> are callers
> of this function that do not seem to expect failure at least, some
> suspiciously looking
> like MODE_CC candidates.

Hi!

cbranchcc4 is *not* a compare-and-branch, like cbranch4 for other
modes is.  Instead, it is a conditional branch.  I still think it is a
bad idea to use this same pattern name for a completely different (and
much more basic) concept, it just confuses many things, makes us need
exceptions in most users of cbranch4 :-(

cbranchcc4 does not do a comparison.  Instead, it uses the result of
some previous comparison in some CC register (or anything else that set
such a register).  We want to use a cbranchcc4 to reuse some earlier
comparison here.  Which is great of course!  But, redoing the
(potentially expensive) computation to prepare the CC for a more
complicated condition is not a good idea.  Also, Power's conditional
branch insns just branch on one of the 32 condition bits (either set or
unset), not on a logical combination of multiple of those bits, as we
need with LTGT, UNLT, UNGT, UNEQ, and LE and GE without fastmath.  So it
is much cleaner (and causes fewer problems later on) if we only allow
those codes we do support.

Example of LTGT:
  fcmpu 0,0,1   # compare f0 <=> f1 to cr0 (exactly one of
                # cr0.lt, cr0.gt, cr0.eq, cr0.un will be set)
  cror 2,0,1    # cr0.eq = cr0.lt | cr0.gt
  beq 0         # branch if cr0.eq is set

So, we want the cbranchcc4 here to just do that last insn, not the last
two insns (or all three as any other cbranch4 is!)
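
To make the three-step LTGT sequence concrete, here is a small C++ model of it. The CrField struct and the function names are hypothetical and only mimic the instructions from the example; this is a sketch, not how GCC represents any of this:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of one 4-bit condition register field (cr0).
struct CrField {
  bool lt, gt, eq, un;
};

// Models "fcmpu 0,0,1": exactly one of the four bits ends up set.
CrField fcmpu(double a, double b) {
  if (std::isnan(a) || std::isnan(b)) return {false, false, false, true};
  if (a < b) return {true, false, false, false};
  if (a > b) return {false, true, false, false};
  return {false, false, true, false};
}

// Models "cror 2,0,1": cr0.eq = cr0.lt | cr0.gt.
CrField cror_eq_lt_gt(CrField cr) {
  cr.eq = cr.lt || cr.gt;
  return cr;
}

// Models "beq 0": the branch tests a single bit and nothing else.
bool beq_taken(const CrField& cr) { return cr.eq; }

// LTGT needs all three steps; the final branch alone only sees one bit.
bool ltgt(double a, double b) {
  return beq_taken(cror_eq_lt_gt(fcmpu(a, b)));
}
```

In this model beq_taken corresponds to the part the cbranchcc4 should cover, while the cror step is the extra CC preparation that has to be emitted separately.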

Segher

Re: [PATCH] Introduce -nolibstdc++ option


On 9/16/22 07:52, Jason Merrill wrote:

On 6/24/22 01:23, Alexandre Oliva via Gcc-patches wrote:

On Jun 23, 2022, Alexandre Oliva  wrote:


Here's the patch.  Regstrapped on x86_64-linux-gnu, also tested with a
cross to aarch64-rtems6.  Ok to install?



Introduce -nostdlib++ option


Uhh, I went ahead and installed this.  The earlier patch was approved if
nobody objected, and so, having overcome the objection to the option
spelling, it ended up in my "approved" patchset.

In case there are objections to it, please let me know, and I'll revert
it promptly, but I guess it makes little sense to revert it on the odd
chance that someone does.  Thanks for your understanding.


I'm getting failures from pure-virtual1.C with

xg++: error: unrecognized command-line option '-nostdlib++'

I guess that's because it isn't handled by the specs in the way nostdlib 
and nodefaultlibs are.  Maybe the solution is to set SKIPOPT in the driver?


Are you not seeing this problem?


I started seeing this again and decided to track it down more.  It seems 
to be dependent on specs, as explained in this commit message:


From 0e74112cf494c93f170739b87ecc89b2d5d97f92 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Sun, 27 Nov 2022 14:30:14 -0500
Subject: [PATCH] driver: fix validate_switches logic
To: gcc-patches@gcc.gnu.org

Under the old logic for validate_switches, once suffix or starred got set,
they stayed set for all later switches found in the spec.  So for e.g.

%{g*:%{%:debug-level-gt(0):

Once we see g*, starred is set.  Then we see %:, and it sees that as a
zero-length switch, which because starred is still set, matches any and all
command-line options.  So targets that use such a spec accept all options in
the driver, while ones that don't reject some, such as the recent
-nostdlib++.

This patch fixes the inconsistency, so all targets reject -nostdlib++.

gcc/ChangeLog:

	* gcc.cc (validate_switches): Reset suffix/starred on loop.
---
 gcc/gcc.cc | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index ca1c9e27a94..2278e2b6bb1 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -9299,12 +9299,15 @@ validate_switches (const char *start, bool user_spec, bool braced)
   const char *atom;
   size_t len;
   int i;
-  bool suffix = false;
-  bool starred = false;
+  bool suffix;
+  bool starred;
 
 #define SKIP_WHITE() do { while (*p == ' ' || *p == '\t') p++; } while (0)
 
 next_member:
+  suffix = false;
+  starred = false;
+
   SKIP_WHITE ();
 
   if (*p == '!')
-- 
2.31.1
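
The sticky-flag problem described in the commit message above can be sketched in isolation. The following C++ is a deliberately simplified model, not the code in gcc.cc: spec_accepts and its parameters are hypothetical, only the starred flag is modelled (suffix is left out), and options are written without the leading dash.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: each atom is a switch name with an optional trailing
// '*'.  With reset_each_atom == false the flag set by "g*" leaks into every
// later atom, which is the old behaviour; true models the fixed behaviour.
bool spec_accepts(const std::vector<std::string>& atoms,
                  const std::string& option, bool reset_each_atom) {
  bool starred = false;
  for (const std::string& raw : atoms) {
    if (reset_each_atom) starred = false;
    std::string name = raw;
    if (!name.empty() && name.back() == '*') {
      starred = true;
      name.pop_back();
    }
    // A starred atom matches by prefix; a plain atom must match exactly.
    const bool match = starred ? option.compare(0, name.size(), name) == 0
                               : option == name;
    if (match) return true;
  }
  return false;
}
```

With the atoms {"g*", ""} the empty atom stands in for the zero-length switch seen at "%:". Without the reset it is treated as an empty prefix and matches any option at all; with the reset it matches nothing.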

[PATCH v2] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU

2022-11-28 Thread Maciej W. Rozycki

We produce inefficient code for some synthesized SImode conditional set 
operations (i.e. ones that are not directly implemented in hardware) on 
RV64.  For example a piece of C code like this:

int
sleu (unsigned int x, unsigned int y)
{
  return x <= y;
}

gets compiled (at `-O2') to this:

sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
        xori    a0,a0,1         # 10    [c=4 l=4]  *xorsi3_internal/1
        andi    a0,a0,1         # 16    [c=4 l=4]  anddi3/1
        ret                     # 25    [c=0 l=4]  simple_return

or (at `-O1') to this:

sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
        xori    a0,a0,1         # 10    [c=4 l=4]  *xorsi3_internal/1
        sext.w  a0,a0           # 16    [c=4 l=4]  extendsidi2/0
        ret                     # 24    [c=0 l=4]  simple_return

This is because the middle end expands a SLEU operation missing from 
RISC-V hardware into a sequence of a SImode SGTU operation followed by 
an explicit SImode XORI operation with immediate 1.  And while the SGTU 
machine instruction (alias SLTU with the input operands swapped) gives a 
properly sign-extended 32-bit result which is valid both as a SImode or 
a DImode operand the middle end does not see that through a SImode XORI 
operation, because we tell the middle end that the RISC-V target (unlike 
MIPS) may hold values in DImode integer registers that are valid for 
SImode operations even if not properly sign-extended.

However the RISC-V psABI requires that 32-bit function arguments and 
results passed in 64-bit integer registers be properly sign-extended, so 
this is explicitly done at the conclusion of the function.

Fix this by making the backend use a sequence of a DImode SGTU operation 
followed by a SImode SEQZ operation instead.  The latter operation is 
known by the middle end to produce a properly sign-extended 32-bit 
result and therefore combine gets rid of the sign-extension operation 
that follows and actually folds it into the very same XORI machine 
operation resulting in:

sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_didi
        xori    a0,a0,1         # 16    [c=4 l=4]  xordi3/1
        ret                     # 25    [c=0 l=4]  simple_return

instead (although the SEQZ alias SLTIU against immediate 1 machine 
instruction would equally do and is actually retained at `-O0').  This 
is handled analogously for the remaining synthesized operations of this 
kind, i.e. `SLE', `SGEU', and `SGE'.
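
As a sanity check of the reasoning above, here is a small C++ sketch of the two ways of synthesizing SLEU from SGTU. The function names are made up for illustration, and the 64-bit return type stands in for the DImode register:

```cpp
#include <cassert>
#include <cstdint>

// Models SGTU (SLTU with the operands swapped): the result is 0 or 1.
std::int64_t sgtu(std::uint32_t x, std::uint32_t y) { return x > y ? 1 : 0; }

// Old expansion: XOR the 0/1 result with immediate 1.
std::int64_t sleu_via_xor(std::uint32_t x, std::uint32_t y) {
  return sgtu(x, y) ^ 1;
}

// New expansion: compare the 0/1 result against zero (SEQZ).
std::int64_t sleu_via_seqz(std::uint32_t x, std::uint32_t y) {
  return sgtu(x, y) == 0 ? 1 : 0;
}
```

Both forms yield only 0 or 1, so the upper 32 bits are already zero and a following sign extension or AND with 1 cannot change the value. The difference is only in what the middle end can prove about each form.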

gcc/
* config/riscv/riscv.cc (riscv_emit_int_order_test): Use EQ 0 
rather than XOR 1 for LE and LEU operations.

gcc/testsuite/
* gcc.target/riscv/sge.c: New test.
* gcc.target/riscv/sgeu.c: New test.
* gcc.target/riscv/sle.c: New test.
* gcc.target/riscv/sleu.c: New test.
---
On Mon, 28 Nov 2022, Jeff Law wrote:

> > > >I have noticed it went nowhere.  Can you please check what
> > > > compilation
> > > > options lead to this discrepancy so that we can have the fix included in
> > > > GCC 13?  I'd like to understand what's going on here.
> > > FWIW, I don't see the redundant sign extension with this testcase at -O2
> > > on
> > > the trunk.  Is it possible the patch has been made redundant over the last
> > > few
> > > months?
> >   Maybe at -O2, but the test cases continue to fail in my configuration for
> > other optimisation levels:
> > 
> > FAIL: gcc.target/riscv/sge.c   -O1   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sge.c  -Og -g   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sgeu.c   -O1   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sgeu.c  -Og -g   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sle.c   -O1   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sle.c  -Og -g   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sleu.c   -O1   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sleu.c  -Og -g   scan-assembler-not sext\\.w
> 
> I may have been running an rv32 toolchain...  So I'll start over and ensure
> that I'm running rv64 :-)
> 
> 
> With the trunk, I get code like Kito (AND with 0x1 mask)

 Right, I have examined assembly produced at -O2 and this is what happens 
here as well:

--- sleu-O1.s   2022-11-28 16:31:18.520538342 +
+++ sleu-O2.s   2022-11-28 16:30:27.054241372 +
@@ -10,7 +10,7 @@
 sleu:
        sgtu    a0,a0,a1
        xori    a0,a0,1
-       sext.w  a0,a0
+       andi    a0,a0,1
        ret
        .size   sleu, .-sleu
        .section        .note.GNU-stack,"",@progbits

following Kito's observations.  Which is why the tests incorrectly pass at 
some optimisation levels while code produced is still suboptimal and just 
trivially different.

> The key difference is Roger's patch:
> 
> commit c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f
> Author: Roger Sayle 
> Date:   Wed Aug 3 08:55:35 2022 +0100
> 
>     Some additional zero-extension related optimizations in simplify-rtx.
> 
>

Re: [PATCH] Improve profile handling in switch lowering.





On 1/26/22 04:11, Martin Liška wrote:

Hello.

Right now, switch lowering does not update basic_block::count values,
so they are left uninitialized. Moreover, I've updated probability scaling
for when a more complex expansion happens. There are still some situations
where the profile is a bit imprecise, but the patch significantly improves
the current situation.


Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

 PR tree-optimization/101301
 PR tree-optimization/103680

gcc/ChangeLog:

 * tree-switch-conversion.cc (bit_test_cluster::emit):
 Handle correctly remaining probability.
 (switch_decision_tree::try_switch_expansion): Fix BB's count
 where a cluster expansion happens.
 (switch_decision_tree::emit_cmp_and_jump_insns): Fill up also
 BB count.
 (switch_decision_tree::do_jump_if_equal): Likewise.
 (switch_decision_tree::emit_case_nodes): Handle special case
 for BT expansion which can also fallback to a default BB.
 * tree-switch-conversion.h (cluster::cluster): Add
 m_default_prob probability.

Funny you just ping'd this patch.  I've held it in my queue for months 
as I didn't see it get installed.


As far as I'm concerned, you know the switch conversion bits better than 
anyone.  If you think the patch significantly improves the profile 
handling for switch conversion, then I'd say go for it.  Particularly 
since it seems to fix at least two known bugs.



Jeff

Re: -Wformat-overflow handling for %b and %B directives in C2X standard

On 9/1/22 03:41, Даниил Александрович Фролов via Gcc-patches wrote:

Subject:
Re: -Wformat-overflow handling for %b and %B directives in C2X standard
From:
Даниил Александрович Фролов via Gcc-patches 
Date:
9/1/22, 03:41

To:
Marek Polacek 
CC:
"gcc-patches@gcc.gnu.org" 

 From eb9e8241d99145020ec5c050c918c1ad3abc2701 Mon Sep 17 00:00:00 2001
From: Frolov Daniil
Date: Thu, 1 Sep 2022 10:55:01 +0300
Subject: [PATCH] Support %b, %B for -Wformat-overflow (sprintf, snprintf)

gcc/ChangeLog:

* gimple-ssa-sprintf.cc (fmtresult::type_max_digits): Handle
base == 2.
(tree_digits): Likewise.
(format_integer): Likewise.
(parse_directive): Add cases for %b and %B directives.

gcc/testsuite/ChangeLog:

* gcc.dg/Wformat-overflow1.c: New test.
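
For readers unfamiliar with why base 2 needs its own handling: the worst-case output length of an integer directive depends on the base, and %b and %B give the longest output of all. A minimal C++ sketch of that bound follows; max_digits is a hypothetical helper written for this note, not the code in gimple-ssa-sprintf.cc:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of digits needed to print the largest
// unsigned value that fits in `bits` bits (1..64) in the given base.
unsigned max_digits(unsigned bits, unsigned base) {
  std::uint64_t max =
      bits >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
  unsigned digits = 0;
  do {
    ++digits;
    max /= base;
  } while (max != 0);
  return digits;
}
```

For base 2 the result is simply the bit width, so a 32-bit %b conversion can need 32 digits where %x needs at most 8.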

Thanks.  I've pushed this to the trunk.

Jeff

Re: [PATCH V2] rs6000: Support to build constants by li/lis+oris/xoris

On Mon, Nov 28, 2022 at 03:51:59PM +0800, Jiufu Guo wrote:
> Jiufu Guo via Gcc-patches  writes:
> > Segher Boessenkool  writes:
> >>> > +  else
> >>> > +   {
> >>> > + emit_move_insn (temp,
> >>> > + GEN_INT (((ud2 << 16) ^ 0x8000) - 
> >>> > 0x8000));
> >>> > + if (ud1 != 0)
> >>> > +   emit_move_insn (temp, gen_rtx_IOR (DImode, temp, GEN_INT 
> >>> > (ud1)));
> >>> > + emit_move_insn (dest,
> >>> > + gen_rtx_ZERO_EXTEND (DImode,
> >>> > +  gen_lowpart (SImode, 
> >>> > temp)));
> >>> > +   }
> >>
> >> Why this?  Please just write it in DImode, do not go via SImode?
> > > Thanks for catching this. Yes, gen_lowpart with DImode would be ok.
> > Oh, sorry. DImode cannot be used here.  The generated pattern with
> > DImode cannot be recognized.  Using SImode is to match 'rlwxx'.

There are patterns that accept DImode for rlwinm just fine.  Please use
  (and:DI (const_int 0x) (x:DI))
not the obfuscated
  (zero_extend:DI (subreg:SI (x:DI) LOWBYTE))
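
The two RTL forms compute the same value. A small C++ sketch of the equivalence, assuming the AND mask is the low 32 bits (0xffffffff); that constant is an assumption made for this illustration, since it is not legible in the text above:

```cpp
#include <cassert>
#include <cstdint>

// AND form, with the mask assumed to be the low 32 bits.
std::uint64_t zext_via_and(std::uint64_t x) { return x & 0xffffffffULL; }

// Zero-extend of the low 32-bit part, the form called obfuscated above.
std::uint64_t zext_via_lowpart(std::uint64_t x) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(x));
}
```

Since both agree for every input, the choice between them is only about which form is clearer and which one the machine description patterns recognize.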


Segher

[committed] libstdc++: Fix std::string_view for I32LP16 targets

Tested x86_64-linux, built on msp430-elf and h8300-elf. Pushed to trunk.

-- >8 --

For H8/300 with -msx -mn -mint32 the type of (_M_len - __pos) is int,
because int is wider than size_t so the operands are promoted.

libstdc++-v3/ChangeLog:

* include/std/string_view (basic_string_view::copy): Use explicit
template argument for call to std::min.
(basic_string_view::substr): Likewise.
---
 libstdc++-v3/include/std/string_view | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/string_view 
b/libstdc++-v3/include/std/string_view
index 2604af2e9aa..f42045dd6f1 100644
--- a/libstdc++-v3/include/std/string_view
+++ b/libstdc++-v3/include/std/string_view
@@ -315,7 +315,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
__glibcxx_requires_string_len(__str, __n);
__pos = std::__sv_check(size(), __pos, "basic_string_view::copy");
-   const size_type __rlen = std::min(__n, _M_len - __pos);
+   const size_type __rlen = std::min(__n, _M_len - __pos);
// _GLIBCXX_RESOLVE_LIB_DEFECTS
// 2777. basic_string_view::copy should use char_traits::copy
traits_type::copy(__str, data() + __pos, __rlen);
@@ -327,7 +327,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   substr(size_type __pos = 0, size_type __n = npos) const noexcept(false)
   {
__pos = std::__sv_check(size(), __pos, "basic_string_view::substr");
-   const size_type __rlen = std::min(__n, _M_len - __pos);
+   const size_type __rlen = std::min(__n, _M_len - __pos);
return basic_string_view{_M_str + __pos, __rlen};
   }
 
-- 
2.38.1
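
The promotion problem from the commit message can be reproduced on an ordinary host by using a 16-bit type in place of size_t. In this sketch size16 and remaining_length are hypothetical names, and the static_assert assumes a host where int is wider than 16 bits:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <type_traits>

using size16 = std::uint16_t;  // hypothetical stand-in for a 16-bit size_t

// Because int is wider than size16, both operands are promoted and the
// difference has type int rather than size16.
static_assert(std::is_same<decltype(size16{} - size16{}), int>::value,
              "operands are promoted to int");

size16 remaining_length(size16 n, size16 len, size16 pos) {
  // std::min(n, len - pos) would fail to compile here, because template
  // argument deduction sees size16 for one argument and int for the other.
  // Naming the template argument converts both to size16 instead.
  return std::min<size16>(n, len - pos);
}
```

On targets where size_t is at least as wide as int no promotion to int happens, so both arguments already have the same type and the deduction failure never shows up there.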

Re: [PATCH V2] Use subscalar mode to move struct block for parameter




On 11/22/22 19:58, Jiufu Guo wrote:

Hi Jeff,

Thanks a lot for your comments!

Jeff Law  writes:


On 11/20/22 20:07, Jiufu Guo wrote:

Jiufu Guo  writes:


Hi,

As mentioned in the previous version patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
The suboptimal code is generated for "assigning from parameter" or
"assigning to return value".
This patch enhances the assignment from parameters like the below
cases:
case1.c
typedef struct SA {double a[3];long l; } A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}

case2.c
typedef struct SA {double a[3];} A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}

For this patch, bootstrap and regtest pass on ppc64{,le}
and x86_64.
* Besides asking for help reviewing this patch, I would like to
consult comments about enhancing for "assigning to returns".

I updated the patch to fix the issue for returns.  This patch
adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used
by a return stmt.  This patch fixes the issue in the expand pass only,
so we will try to update the patch to avoid this flag.

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dd29c03..09b8ec64cea 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
   frame_phase = off ? align - off : 0;
 }
   +  /* Collect VARs on returns.  */
+  if (DECL_RESULT (current_function_decl))
+{
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+   if (greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src)))
+ {
+   tree val = gimple_return_retval (ret);
+   if (val && VAR_P (val))
+ DECL_USEDBY_RETURN_P (val) = 1;
+ }
+}
+
 /* Set TREE_USED on all variables in the local_decls.  */
 FOR_EACH_LOCAL_DECL (cfun, i, var)
   TREE_USED (var) = 1;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..20973649963 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool nontemporal)
 return;
   }
   +  if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
+   && TYPE_MODE (TREE_TYPE (from)) == BLKmode
+   && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
+  || REG_P (DECL_INCOMING_RTL (from))))
+  || (VAR_P (to) && DECL_USEDBY_RETURN_P (to)
+ && TYPE_MODE (TREE_TYPE (to)) == BLKmode
+ && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl)))
+  == PARALLEL))
+{
+  push_temp_slots ();
+  rtx par_ret;
+  machine_mode mode;
+  par_ret = TREE_CODE (from) == PARM_DECL
+ ? DECL_INCOMING_RTL (from)
+ : DECL_RTL (DECL_RESULT (current_function_decl));
+  mode = GET_CODE (par_ret) == PARALLEL
+  ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0))
+  : word_mode;
+  int mode_size = GET_MODE_SIZE (mode).to_constant ();
+  int size = INTVAL (expr_size (from));
+
+  /* If/How the parameter using submode, it depends on the size and
+position of the parameter.  Here using heuristic number.  */
+  int hurstc_num = 8;

Where did this come from and what does it mean?

Sorry for not making this clear. We know that an aggregate arg may be
in registers partially or totally, as in assign_parm_adjust_entry_rtl.
For example, if a parameter has 12 words and the target/ABI only
allows 8 GPRs for arguments, then the parameter can use 8 regs at most,
with the remaining part on the stack.


Right, but the number of registers is target dependent, so I don't see 
how using "8" or any number for that matter is correct here.







Note that BLKmode subword values passed in registers can be either
right or left justified.  I think you also need to worry about
endianness here.

The subword is used to move the block (read from source mem and then
store to destination mem in register mode), and this keeps using
the same endianness in the reg as move_block_from_reg does. So, the patch
does not check the endianness.


Hmm, I wasn't clear enough here, particularly using the endianness term.  
Don't you need to query the ABI to ensure that you're not changing left 
vs right justification for a partially in register argument?  On the 
PA we have:


/* Specify padding for the last element of a block move between registers
   and memory.

   The 64-bit runtime specifies that objects need to be left justified
   (i.e., the normal justification for a big endian target).  The 32-bit
   runtime specifies right justification for objects smaller than 64 bits.
   We use a DImode register in the parallel for 5 to 7 byte structures
   so that there is only one element.  This allows the object to be
   correctly padded.  */
#define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \
  targetm.calls.function_arg_padding ((MODE), (TYPE))


Jeff

[committed] libstdc++: Fix src/c++17/memory_resource for H8 targets [PR107801]

Tested x86_64-linux, built on msp430-elf and h8300-elf. Pushed to trunk.

-- >8 --

This fixes compilation failures for H8 multilibs. For the normal
multilib (ILP16L32?), the chunk struct does not have the expected size,
because uint32_t is type long and has alignment 4 (by default). This
forces sizeof(chunk) to be 12 instead of the expected 10. We can fix
that by using bitset::size_type instead of uint32_t, so that we only use
a 16-bit size when size_t and pointers are 16-bit types.

For the I32LP16 multilibs that use -mint32, int is wider than size_t
and so arithmetic expressions involving size_t promote to int. This
means we need some explicit casts back to size_t.
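
The promotion issue described above can be reproduced on an ordinary host
with a small sketch. Here small_size_t and initial_blocks_per_chunk are
hypothetical names, with a 16-bit type standing in for size_t on a target
where int is wider; this is an illustration, not the library code.

```cpp
#include <algorithm>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for size_t on a target where int is wider than size_t.
using small_size_t = std::uint16_t;

// Both operands are converted to int before the division, so the result
// type is int, not the narrow unsigned type.
static_assert(std::is_same<decltype(1024 / small_size_t{1}), int>::value,
              "division result is int after integer promotion");

// std::max deduces a single type from both arguments, so
// std::max(small_size_t(16), 1024 / block_size) would not compile here.
// Converting the quotient back to the narrow type first avoids that.
// block_size is assumed to be non-zero.
inline small_size_t initial_blocks_per_chunk(small_size_t block_size)
{
  small_size_t blocks = 1024 / block_size;
  return std::max(small_size_t(16), blocks);
}
```

Splitting the computation into two statements, as the patch does, makes both
std::max arguments have the same type without an explicit cast at the call.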

libstdc++-v3/ChangeLog:

PR libstdc++/107801
* src/c++17/memory_resource.cc (chunk::_M_bytes): Change type
from uint32_t to bitset::size_type. Adjust static assertion.
(__pool_resource::_Pool::replenish): Cast to size_t after
multiplication instead of before.
(__pool_resource::_M_alloc_pools): Ensure both arguments to
std::max have type size_t.
---
 libstdc++-v3/src/c++17/memory_resource.cc | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc 
b/libstdc++-v3/src/c++17/memory_resource.cc
index 0bd94dbc6a7..a1854c55bd0 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -505,7 +505,7 @@ namespace pmr
 }
 
 // Allocated size of chunk:
-uint32_t _M_bytes = 0;
+bitset::size_type _M_bytes = 0;
 // Start of allocated chunk:
 std::byte* _M_p = nullptr;
 
@@ -579,7 +579,7 @@ namespace pmr
   // For 16-bit pointers it's five pointers (10 bytes).
   // TODO pad 64-bit to 4*sizeof(void*) to avoid splitting across cache lines?
   static_assert(sizeof(chunk)
-  == sizeof(bitset::size_type) + sizeof(uint32_t) + 2 * sizeof(void*));
+  == 2 * sizeof(bitset::size_type) + 2 * sizeof(void*));
 
   // An oversized allocation that doesn't fit in a pool.
   struct big_block
@@ -734,7 +734,7 @@ namespace pmr
  _M_blocks_per_chunk = std::min({
  max_blocks,
  __opts.max_blocks_per_chunk,
- (size_t)_M_blocks_per_chunk * 2
+ size_t(_M_blocks_per_chunk * 2)
  });
}
 }
@@ -1057,7 +1057,8 @@ namespace pmr
// Decide on initial number of blocks per chunk.
// At least 16 blocks per chunk seems reasonable,
// more for smaller blocks:
-   size_t blocks_per_chunk = std::max(size_t(16), 1024 / block_size);
+   size_t blocks_per_chunk = 1024 / block_size;
+   blocks_per_chunk = std::max(size_t(16), blocks_per_chunk);
// But don't exceed the requested max_blocks_per_chunk:
blocks_per_chunk
  = std::min(blocks_per_chunk, _M_opts.max_blocks_per_chunk);
-- 
2.38.1

[committed] libstdc++: Fix _Hash_bytes for I16LP32 targets [PR107885]

Tested x86_64-linux, built on msp430-elf and h8300-elf. Pushed to trunk.

-- >8 --

For H8/300 size_t is 32 bits wide, but (unsigned char)buf[2] << 16
promotes to int which is only 16 bits wide. The shift is then undefined.
This fixes it by converting to size_t before shifting.
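
As a rough sketch of the idea (mix_tail is a hypothetical helper, and the
final multiplication step of the real function is left out), widening each
byte to size_t before shifting looks like this:

```cpp
#include <cstddef>

// Hypothetical helper: XOR up to three trailing bytes into HASH.
// Each byte is first converted to size_t, so the shift is performed in
// size_t rather than in a promoted int that may be only 16 bits wide.
inline std::size_t mix_tail(const char* buf, std::size_t len, std::size_t hash)
{
  std::size_t k;
  switch (len)
    {
    case 3:
      k = static_cast<unsigned char>(buf[2]);
      hash ^= k << 16;
      [[fallthrough]];
    case 2:
      k = static_cast<unsigned char>(buf[1]);
      hash ^= k << 8;
      [[fallthrough]];
    case 1:
      k = static_cast<unsigned char>(buf[0]);
      hash ^= k;
      break;
    default:
      break;
    }
  return hash;
}
```

The sketch assumes size_t is at least 32 bits wide, as it is for the
_Hash_bytes variant being patched.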

libstdc++-v3/ChangeLog:

PR libstdc++/107885
* libsupc++/hash_bytes.cc (_Hash_bytes): Convert to size_t
instead of implicit integer promotion to 16 bits.
---
 libstdc++-v3/libsupc++/hash_bytes.cc | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
b/libstdc++-v3/libsupc++/hash_bytes.cc
index ffdd04f7602..67e2dbb1a0f 100644
--- a/libstdc++-v3/libsupc++/hash_bytes.cc
+++ b/libstdc++-v3/libsupc++/hash_bytes.cc
@@ -90,17 +90,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
len -= 4;
   }
 
+size_t k;
 // Handle the last few bytes of the input array.
 switch(len)
   {
   case 3:
-   hash ^= static_cast<unsigned char>(buf[2]) << 16;
+   k = static_cast<unsigned char>(buf[2]);
+   hash ^= k << 16;
[[gnu::fallthrough]];
   case 2:
-   hash ^= static_cast<unsigned char>(buf[1]) << 8;
+   k = static_cast<unsigned char>(buf[1]);
+   hash ^= k << 8;
[[gnu::fallthrough]];
   case 1:
-   hash ^= static_cast<unsigned char>(buf[0]);
+   k = static_cast<unsigned char>(buf[0]);
+   hash ^= k;
hash *= m;
   };
 
-- 
2.38.1

Re: [PATCH 1/1] RISC-V: fix stack access before allocation.

On 11/27/22 22:28, Fei Gao wrote:

In current riscv stack frame allocation, 2 steps are used. The first step
allocates memories at least for callee saved GPRs and FPRs, and the second step
allocates the rest if stack size is greater than signed 12-bit range. But it's
observed in some cases, like gcc.target/riscv/stack_frame.c in my patch, callee
saved FPRs fail to be included in the first step allocation, so we get
generated instructions like this:

li t0,-16384
addi sp,sp,-48
addi t0,t0,752
...
fsw fs4,-4(sp) #issue here of accessing before allocation
...
add sp,sp,t0

"fsw fs4,-4(sp)" accesses the stack before it is allocated. Although "add
sp,sp,t0" later reserves the memory for fs4, there is a risk if an interrupt arrives between "fsw
fs4,-4(sp)" and "add sp,sp,t0": interrupt context saving can then corrupt the stack slot storing fs4,
so the correct value of fs4 cannot be restored later.

This patch fixes the issue above, adapts the testcases identified in regression tests,
and adds a new testcase for the change.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_first_stack_step):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr93304.c: Adapt testcase for the change, constrain
match to assembly instructions only.
* gcc.target/riscv/rvv/base/spill-11.c: Adapt testcase for the change.
* gcc.target/riscv/stack_frame.c: New test.

The key here is that the MIN_FIRST_STEP wasn't including the FP save
space. The stack layout diagram before riscv_stack_align, combined with
the info you provided made that pretty clear.

Good job tracking it down -- these can be tough to reproduce and thus
tough to debug/fix.

I made minor adjustments to the ChangeLog and committed your change to
the trunk.

Thanks,

Jeff

Re: [PATCH] gcc: Use ld -r when checking for HAVE_LD_RO_RW_SECTION_MIXING




On 11/28/22 06:59, Joakim Nohlgård wrote:

The check for HAVE_LD_RO_RW_SECTION_MIXING fails on targets where ld
does not support shared objects, even though the answer to the test
should be 'read-write'. One such target is riscv64-unknown-elf. Failing
this test results in a libgcc crtbegin.o which has a writable .eh_frame
section leading to the default linker scripts placing the .eh_frame
section in a writable memory segment, or a linker warning about writable
sections in a read-only segment when using ld scripts that place
.eh_frame unconditionally in ROM.

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Use ld -r in the check for HAVE_LD_RO_RW_SECTION_MIXING


I'm not sure that simply replacing -shared with -r is the right fix 
here.  ISTM that if the -shared test fails, then we can/should try the 
-r variant.    Am I missing something here?



jeff

Re: [PATCH] RISC-V: Add duplicate vector support.




On 11/25/22 09:06, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * config/riscv/constraints.md (Wdm): New constraint.
 * config/riscv/predicates.md (direct_broadcast_operand): New predicate.
 * config/riscv/riscv-protos.h (RVV_VLMAX): New macro.
 (emit_pred_op): Refine function.
 * config/riscv/riscv-selftests.cc (run_const_vector_selftests): New 
function.
 (run_broadcast_selftests): Ditto.
 (BROADCAST_TEST): New tests.
 (riscv_run_selftests): More tests.
 * config/riscv/riscv-v.cc (emit_pred_move): Refine function.
 (emit_vlmax_vsetvl): Ditto.
 (emit_pred_op): Ditto.
 (expand_const_vector): New function.
 (legitimize_move): Add constant vector support.
 * config/riscv/riscv.cc (riscv_print_operand): New asm print rule for 
const vector.
 * config/riscv/riscv.h (X0_REGNUM): New macro.
 * config/riscv/vector-iterators.md: New attribute.
 * config/riscv/vector.md (vec_duplicate): New pattern.
 (@pred_broadcast): New pattern.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/base/dup-1.c: New test.
 * gcc.target/riscv/rvv/base/dup-2.c: New test.


I think this should wait for the next stage1 cycle.

jeff

Re: [PATCH] RISC-V: Remove tail && mask policy operand for vmclr, vmset, vmld, vmst




On 11/28/22 07:21, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

Since mask instructions don't need a policy, remove it so that the patterns 
look reasonable.
gcc/ChangeLog:

 * config/riscv/vector.md: Remove TA && MA operands.


Does this fix a known bug or is it just a cleanup?   I think the latter, 
but I want to be sure.




Jeff

Re: [PATCH] RISC-V: Add attributes for VSETVL PASS




On 11/28/22 07:14, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

gcc/ChangeLog:

 * config/riscv/riscv-protos.h (enum vlmul_type): New enum.
 (get_vlmul): New function.
 (get_ratio): Ditto.
 * config/riscv/riscv-v.cc (struct mode_vtype_group): New struct.
 (ENTRY): Adapt for attributes.
 (enum vlmul_type): New enum.
 (get_vlmul): New function.
 (get_ratio): New function.
 * config/riscv/riscv-vector-switch.def (ENTRY): Adapt for attributes.
 * config/riscv/riscv.cc (ENTRY): Ditto.
 * config/riscv/vector.md (false,true): Add attributes.


I'm tempted to push this into the next stage1 given its arrival after 
stage1 close, but if the wider RISC-V maintainers want to see it move 
forward, I don't object strongly.



I'm curious about the model you're using.  Is it going to be something 
similar to mode switching?  That's the first mental model that comes to 
mind.  Essentially we determine the VL needed for every chunk of code, 
then we do an LCM like algorithm to find the optimal placement points 
for VL sets to minimize the number of VL sets across all the paths 
through the CFG.  Never in a million years would I have expected we'd be 
considering reusing that code.



Jeff

Re: [PING][PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU




On 11/28/22 08:38, Maciej W. Rozycki wrote:

On Mon, 28 Nov 2022, Jeff Law wrote:


LGTM, but with a nit: I don't get sext.w but get an andi like below, so
maybe we should also scan-assembler-not andi? Feel free to commit that
directly with that fix.

```asm
sleu:
 sgtu a0,a0,a1 # 9 [c=4 l=4]  *sgtu_disi
 xori a0,a0,1 # 10 [c=4 l=4]  *xorsi3_internal/1
 andi a0,a0,1 # 16 [c=4 l=4]  anddi3/1
 ret # 25 [c=0 l=4]  simple_return
```

   Interesting.  I can do that, but can you please share the compilation
options, given or defaulted (from `--with...' configuration options), this
happens with?

   I have noticed it went nowhere.  Can you please check what compilation
options lead to this discrepancy so that we can have the fix included in
GCC 13?  I'd like to understand what's going on here.

FWIW, I don't see the redundant sign extension with this testcase at -O2 on
the trunk.  Is it possible the patch has been made redundant over the last few
months?

  Maybe at -O2, but the test cases continue to fail in my configuration for
other optimisation levels:

FAIL: gcc.target/riscv/sge.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sge.c  -Og -g   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sgeu.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sgeu.c  -Og -g   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sle.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sle.c  -Og -g   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sleu.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sleu.c  -Og -g   scan-assembler-not sext\\.w


I may have been running an rv32 toolchain...  So I'll start over and 
ensure that I'm running rv64 :-)



With the trunk, I get code like Kito (AND with 0x1 mask)


The key difference is Roger's patch:

commit c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f
Author: Roger Sayle 
Date:   Wed Aug 3 08:55:35 2022 +0100

    Some additional zero-extension related optimizations in simplify-rtx.

    This patch implements some additional zero-extension and sign-extension
    related optimizations in simplify-rtx.cc.  The original motivation comes
    from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:

    Failed to match this instruction:
    (set (reg:DI 88 [ _1 ])
    (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))

[ ... ]

With that patch the sign extension is removed and instead we generate 
the AND with 0x1.


Old, from combine dump:

  Successfully matched this instruction:
  (set (reg/i:DI 10 a0)
! (sign_extend:DI (reg:SI 78)))


New, from combine dump:

  (set (reg/i:DI 10 a0)
! (and:DI (subreg:DI (reg:SI 78) 0)
! (const_int 1 [0x1])))

Note the date on Roger's patch, roughly the same time as yours. I 
suspect Kito had tested the trunk with Roger's patch.



Your patch is probably still useful.  I think Kito's only concern was to 
make sure we don't have the ANDI instruction in addition to not having 
the SEXT instruction.  So still approved for trunk, just update the 
testcases to make sure we don't have the ANDI too.



jeff

Re: [PATCH] tree-optimization/107672 - avoid vector mode type_for_mode call

2022-11-28 Thread Alex Coplan via Gcc-patches

Hi Richard,

On 25/11/2022 21:08, Richard Biener via Gcc-patches wrote:
> 
> 
> On Fri, 25 Nov 2022, Vaseeharan Vinayagamoorthy wrote:
> 
> > Hi,
> > 
> > I am seeing an internal compiler error, related to this patch:
> 
> Can you please open a bugzilla for this and attach preprocessed
> source so I can reproduce the ICE with a cc1 cross compiler?

I've raised a PR for this with a reduced testcase here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107896

Thanks,
Alex

> 
> Thanks,
> Richard.
> 
> > 
> > during GIMPLE pass: slp
> > 
> > options-save.cc: In function 'void cl_optimization_restore(gcc_options*,
> > gcc_options*, cl_optimization*)':
> > 
> > options-save.cc:1292:1: internal compiler error: in
> > supportable_widening_operation, at tree-vect-stmts.cc:12199
> > 
> > 
> > 
> >  1292 | cl_optimization_restore (struct gcc_options *opts, struct
> > gcc_options *opts_set,
> > 
> >  | ^~~
> > 
> > /.../src/gcc/gcc/profile-count.cc: In member function 'int
> > profile_count::to_cgraph_frequency(profile_count) const':
> > 
> > /.../src/gcc/gcc/profile-count.cc:308:1: note: parameter passing for 
> > argument
> > of type 'profile_count' changed in GCC 9.1
> > 
> >  308 | profile_count::to_cgraph_frequency (profile_count entry_bb_count)
> > const
> > 
> >  | ^
> > 
> > /.../src/gcc/gcc/profile-count.cc: In member function 'sreal
> > profile_count::to_sreal_scale(profile_count, bool*) const':
> > 
> > /.../src/gcc/gcc/profile-count.cc:326:1: note: parameter passing for 
> > argument
> > of type 'profile_count' changed in GCC 9.1
> > 
> >  326 | profile_count::to_sreal_scale (profile_count in, bool *known) const
> > 
> >  | ^
> > 
> > 0x2195bdd supportable_widening_operation(vec_info*, tree_code,
> > _stmt_vec_info*, tree_node*, tree_node*, tree_code*, tree_code*, int*,
> > vec*)
> > 
> >  /.../src/gcc/gcc/tree-vect-stmts.cc:12199
> > 
> > 0x2180493 vectorizable_conversion
> > 
> >  /.../src/gcc/gcc/tree-vect-stmts.cc:5064
> > 
> > 0x2192fdd vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*, _slp_tree*,
> > _slp_instance*, vec*)
> > 
> >  /.../src/gcc/gcc/tree-vect-stmts.cc:11256
> > 
> > 
> > 
> > /.../src/gcc/gcc/profile-count.cc: In member function 'profile_count
> > profile_count::combine_with_ipa_count(profile_count)':
> > 
> > /.../src/gcc/gcc/profile-count.cc:398:1: note: parameter passing for 
> > argument
> > of type 'profile_count' changed in GCC 9.1
> > 
> >  398 | profile_count::combine_with_ipa_count (profile_count ipa)
> > 
> >  | ^
> > 
> > 0x14f95d1 vect_slp_analyze_node_operations_1
> > 
> >  /.../src/gcc/gcc/tree-vect-slp.cc:5958
> > 
> > 0x14f9c19 vect_slp_analyze_node_operations
> > 
> >  /.../src/gcc/gcc/tree-vect-slp.cc:6147
> > 
> > 0x14f9b4d vect_slp_analyze_node_operations
> > 
> >  /.../src/gcc/gcc/tree-vect-slp.cc:6126
> > 
> > 0x14fa439 vect_slp_analyze_operations(vec_info*)
> > 
> >  /.../src/gcc/gcc/tree-vect-slp.cc:6387
> > 
> > 0x14fd423 vect_slp_analyze_bb_1
> > 
> >  /.../src/gcc/gcc/tree-vect-slp.cc:7372
> > 
> > 0x14fd599 vect_slp_region
> > 
> >  /.../src/gcc/gcc/tree-vect-slp.cc:7419
> > 
> > 0x14fe0d1 vect_slp_bbs
> > 
> >  /.../src/gcc/gcc/tree-vect-slp.cc:7610
> > 
> > 0x14fe46f vect_slp_function(function*)
> > 
> >  /.../src/gcc/gcc/tree-vect-slp.cc:7698
> > 
> > 0x151a109 execute
> > 
> >  /.../src/gcc/gcc/tree-vectorizer.cc:1532
> > 
> > Please submit a full bug report, with preprocessed source (by using
> > -freport-bug).
> > 
> > Please include the complete backtrace with any bug report.
> > 
> > See  for instructions.
> > 
> > Makefile:1146: recipe for target 'options-save.o' failed
> > 
> > make[3]: *** [options-save.o] Error 1
> > 
> > 
> > 
> > That happens when building the arm-none-linux-gnueabihf toolchain natively
> > with glibc bootstrap:
> > Build:arm-none-linux-gnueabihf
> > Host:arm-none-linux-gnueabihf
> > Target: arm-none-linux-gnueabihf
> > 
> > The compiler being used to build the toolchain is gcc 7.5.0.
> > 
> > 
> > Kind regards
> > Vasee
> > 
> > 
> > From: Gcc-patches  on
> > behalf of Richard Biener via Gcc-patches 
> > Sent: 22 November 2022 08:48
> > To: gcc-patches@gcc.gnu.org 
> > Subject: [PATCH] tree-optimization/107672 - avoid vector mode type_for_mode
> > call
> > The following avoids using type_for_mode on vector modes which might
> > not work for all frontends.  Instead we look for the inner mode
> > type and use build_vector_type_for_mode instead.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> > 
> >         PR tree-optimization/107672
> >         * tree-vect-stmts.cc (supportable_widening_operation): Avoid
> >         type_for_mode on vector modes.
> > ---
> >  gcc/tree-vect-stmts.cc | 12 +---
> >  1 file changed, 9 insertions(+), 3

Re: PING [PATCH v3] c++: Allow module name to be a single letter on Windows

2022-11-28 Thread Torbjorn SVENSSON via Gcc-patches





On 2022-11-28 12:21, Nathan Sidwell wrote:

On 11/25/22 14:03, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606528.html


ok, thanks!


Pushed.





Kind regards,
Torbjörn

On 2022-11-17 14:20, Torbjörn SVENSSON wrote:

v1 -> v2:
Paths without "C:" part can still be absolute if they start with / or
\ on Windows.

v2 -> v3:
Use alternative approach by having platform specific code in module.cc.

Truth table for the new expression:
c:\foo -> true
c:/foo -> true
/foo   -> true
\foo   -> true
c:foo  -> false
foo    -> false
./foo  -> true
.\foo  -> true
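
For readers who want to check the table, here is a standalone sketch of the
expression. The helper names (is_dir_separator, has_drive_spec,
looks_like_header_name) and the dos_based flag are hypothetical stand-ins for
the IS_DIR_SEPARATOR and HAS_DRIVE_SPEC macros and the
HAVE_DOS_BASED_FILE_SYSTEM conditional; it is not the code in module.cc.

```cpp
#include <cctype>

// Hypothetical stand-in: '/' always separates directories, '\\' only on
// DOS based file systems.
inline bool is_dir_separator(char c, bool dos_based)
{
  return c == '/' || (dos_based && c == '\\');
}

// Hypothetical stand-in: a drive specification is a letter followed by ':'.
inline bool has_drive_spec(const char* p)
{
  return std::isalpha(static_cast<unsigned char>(p[0])) && p[1] == ':';
}

// True when PTR clearly starts like a pathname (a header-name); anything
// else is treated as a named module such as "A:Foo".
inline bool looks_like_header_name(const char* ptr, bool dos_based)
{
  // ./FOO or /FOO: skip one leading '.' and require a separator next.
  if (is_dir_separator(ptr[ptr[0] == '.'], dos_based))
    return true;
  // A:/FOO: a drive letter counts only when a separator follows it.
  return dos_based && has_drive_spec(ptr) && is_dir_separator(ptr[2], dos_based);
}
```

With dos_based set to true this gives the results in the table above; in
particular "c:foo" and "A:Foo" are not treated as paths.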


Ok for trunk?

---

On Windows, the ':' character is special and when the module name is
a single character, like 'A', then the flatname would be (for
example) 'A:Foo'. On Windows, 'A:Foo' is treated as an absolute
path by the module loader and is likely not found.

Without this patch, the test case pr98944_c.C fails with:

In module imported at /src/gcc/testsuite/g++.dg/modules/pr98944_b.C:7:1,
of module A:Foo, imported at 
/src/gcc/testsuite/g++.dg/modules/pr98944_c.C:7:

A:Internals: error: header module expected, module 'A:Internals' found
A:Internals: error: failed to read compiled module: Bad file data
A:Internals: note: compiled module file is 'gcm.cache/A-Internals.gcm'
In module imported at /src/gcc/testsuite/g++.dg/modules/pr98944_c.C:7:8:
A:Foo: error: failed to read compiled module: Bad import dependency
A:Foo: note: compiled module file is 'gcm.cache/A-Foo.gcm'
A:Foo: fatal error: returning to the gate for a mechanical issue
compilation terminated.

gcc/cp/ChangeLog:

* module.cc: On Windows, 'A:Foo' is supposed to be a module
and not a path.

Tested on Windows with arm-none-eabi for Cortex-M3 in gcc-11 tree.

Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
  gcc/cp/module.cc | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 0e9af318ba4..fa41a86213f 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13960,7 +13960,15 @@ get_module (tree name, module_state *parent, bool partition)

  static module_state *
  get_module (const char *ptr)
  {
-  if (ptr[0] == '.' ? IS_DIR_SEPARATOR (ptr[1]) : IS_ABSOLUTE_PATH (ptr))
+  /* On DOS based file systems, there is an ambiguity with A:B which can be
+     interpreted as a module Module:Partition or Drive:PATH.  Interpret strings
+     which clearly starts as pathnames as header-names and everything else is
+     treated as a (possibly malformed) named moduled.  */
+  if (IS_DIR_SEPARATOR (ptr[ptr[0] == '.']) // ./FOO or /FOO
+#if HAVE_DOS_BASED_FILE_SYSTEM
+  || (HAS_DRIVE_SPEC (ptr) && IS_DIR_SEPARATOR (ptr[2])) // A:/FOO
+#endif
+  || false)
  /* A header name.  */
  return get_module (build_string (strlen (ptr), ptr));

Re: [PATCH 03/12] ipa-cp: Write transformation summaries of all functions

2022-11-28 Thread Martin Jambor

Hi,

On Wed, Nov 16 2022, Jan Hubicka wrote:
>> IPA-CP transformation summary streaming code currently won't stream
>> out transformations necessary for clones which are only necessary for
>> materialization of other clones (such as an IPA-CP clone which is then
>> cloned again by IPA-SRA).  However, a follow-up patch for better
>> reconciling IPA-SRA and IPA-CP modifications needs to have that
>> information at its disposal and so this one reworks the streaming to
>> write out all non-empty transformation summaries.
>> 
>> This should actually mean less streaming in typical case because
>> previously we streamed three zeros for all nodes in a partition with
>> no useful information associated with them.  Currently we don't stream
>> anything for those.
>> 
>> When reworking the streaming, I also simplified it a little and
>> converted the writing to nicer C++ vector iterations.
>> 
>> Bootstrapped and tested on x86_64-linux.  OK for master?
>> 
>> Thanks,
>> 
>> Martin
>> 
>> 
>> gcc/ChangeLog:
>> 
>> 2022-11-11  Martin Jambor  
>> 
>>  * ipa-prop.cc (useful_ipcp_transformation_info_p): New function.
>>  (write_ipcp_transformation_info): Added a parameter, simplified
>>  given that is known not to be NULL.
>>  (ipcp_write_transformation_summaries): Write out all useful
>>  transformation summaries.
>>  (read_ipcp_transformation_info): Simplify given that some info
>>  will be read.
>>  (read_replacements_section): Remove assert.
>> ---
>>  gcc/ipa-prop.cc | 151 +++-
>>  1 file changed, 71 insertions(+), 80 deletions(-)
>> 
>> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
>> index e6cf25591b3..cfd12a97b36 100644
>> --- a/gcc/ipa-prop.cc
>> +++ b/gcc/ipa-prop.cc
>> @@ -5279,80 +5279,72 @@ ipa_prop_read_jump_functions (void)
>>  }
>>  }
>>  
>> -void
>> -write_ipcp_transformation_info (output_block *ob, cgraph_node *node)
>> +/* Return true if the IPA-CP transformation summary TS is non-NULL and contains
>> +   useful info.  */
>> +static bool
>> +useful_ipcp_transformation_info_p (ipcp_transformation *ts)
>>  {
>> -  int node_ref;
>> -  unsigned int count = 0;
>> -  lto_symtab_encoder_t encoder;
>> +  if (!ts)
>> +return false;
>> +  if (!vec_safe_is_empty (ts->m_agg_values)
>> +  || !vec_safe_is_empty (ts->bits)
>> +  || !vec_safe_is_empty (ts->m_vr))
>> +return true;
>> +  return false;
>> +}
>
> This way we stream transformation info for everything in the boundary,
> even for functions that belong to other partitions and are called
> from the current partition.  Perhaps we want to check that the function has
> its body streamed?
>

Please see the updated patch below, which has survived bootstrap,
LTO-bootstrap and testing.

> Also, would it be possible to make a testcase? I think it should be possible
> to use -flto-partition=max and builtin_constant_p checks verifying that
> the intended transformation happens.
>

make -k check-gcc RUNTESTFLAGS="execute.exp=20001017-2.c"

is a testcase which ICEs if IPA-CP info is not streamed and the
subsequent patch is applied.

Thanks,

Martin




IPA-CP transformation summary streaming code currently won't stream
out transformations necessary for clones which are only necessary for
materialization of other clones (such as an IPA-CP clone which is then
cloned again by IPA-SRA).  However, a follow-up patch for better
reconciling IPA-SRA and IPA-CP modifications needs to have that
information at its disposal and so this one reworks the streaming to
write out all non-empty transformation summaries.

In order not to stream transformation summaries into partitions where
neither the node itself nor any of its clones is materialized, I had to make
sure that clones also get the encode_body flag in the encoder (so that it
could be tested) and therefore in turn lto_output understands it needs
to skip clones.

This should actually mean less streaming in typical case because
previously we streamed three zeros for all nodes in a partition with
no useful information associated with them.  Currently we don't stream
anything for those.

When reworking the streaming, I also simplified it a little and
converted the writing to nicer C++ vector iterations.

gcc/ChangeLog:

2022-11-25  Martin Jambor  

* ipa-prop.cc (useful_ipcp_transformation_info_p): New function.
(write_ipcp_transformation_info): Added a parameter, simplified
given that is known not to be NULL.
(ipcp_write_transformation_summaries): Write out all useful
transformation summaries.
(read_ipcp_transformation_info): Simplify given that some info
will be read.
(read_replacements_section): Remove assert.
* lto-cgraph.cc (add_node_to): Also set encode_body for clones.
* lto-streamer-out.cc (lto_output): Do not output virtual clones.
---
 gcc/ipa-prop.cc | 153 +++-
 gcc/lto-cgraph.cc   |   2 +-
 gcc/lto-streamer-out.cc |

Re: [PATCH] RISC-V: Support the ins "rol" with immediate operand




On 11/27/22 19:14, Feng Wang wrote:

From: wangfeng 

There is no immediate-operand form of the "rol" instruction according to the
B extension, so the immediate operand has to be loaded into a register first.
But we can convert it to "rori" or "roriw" instead, which saves
one immediate load instruction.

Please refer to the following use case:
unsigned long foo2(unsigned long rs1)
{
 return (rs1 << 10) | (rs1 >> 54);
}

The compiler result is:
li  a1,10
rol a0,a0,a1

With this patch, a single instruction is generated:
rori a0,a0,54

gcc/ChangeLog:

 * config/riscv/bitmanip.md: Add immediate_operand support in rotl RTL 
pattern

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zbb-rol-ror-04.c: New test.
 * gcc.target/riscv/zbb-rol-ror-05.c: New test.


So this arrived after stage1 close and I'm not aware of an existing BZ 
around this issue, so I'd tend to think this should wait for stage1 to 
re-open in the spring.



From a technical standpoint, would it be better to handle this in a more 
generic way?   ie, when converting from gimple into RTL, if we want to 
generate a rotate left by immediate and don't have a suitable insn, then 
change it to a rotate right by an adjusted immediate.    This could 
probably be done in optabs.cc::expand_binop.
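
The identity behind that rewrite can be checked with a small sketch for
64-bit values. The names rotl64, rotr64 and rol_imm_to_ror_imm are
hypothetical and only illustrate the arithmetic; this is not GCC code.

```cpp
#include <cstdint>

// Rotate left by N bits; N is reduced modulo 64 so no shift is out of range.
constexpr std::uint64_t rotl64(std::uint64_t x, unsigned n)
{
  n &= 63;
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

// Rotate right by N bits, with the same reduction of N.
constexpr std::uint64_t rotr64(std::uint64_t x, unsigned n)
{
  n &= 63;
  return n == 0 ? x : (x >> n) | (x << (64 - n));
}

// Hypothetical helper: the rotate-right amount equivalent to a rotate left
// by IMM on a 64-bit value.
constexpr unsigned rol_imm_to_ror_imm(unsigned imm)
{
  return (64 - (imm & 63)) & 63;
}

// The example from the patch description: rol by 10 equals ror by 54.
static_assert(rol_imm_to_ror_imm(10) == 54, "10 maps to 54");
static_assert(rotl64(0x0123456789abcdefULL, 10)
              == rotr64(0x0123456789abcdefULL, 54), "same rotation");
```

A 32-bit variant would use 32 and 31 in place of 64 and 63.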



We might need similar code in combine.cc or simplify-rtx.cc since some 
rotate cases (or exposure of the constant) may not show up until later 
in the RTL pipeline.



Anyway, doing this in a more generic way seems like it's worth 
investigating.



jeff

Re: [PING][PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU

2022-11-28 Thread Maciej W. Rozycki

On Mon, 28 Nov 2022, Jeff Law wrote:

> > > > LGTM, but with a nit, I don't get sext.w but get an andi like below, so
> > > > maybe we should also scan-assembler-not andi? feel free to commit that
> > > > directly with that fix
> > > > 
> > > > ```asm
> > > > sleu:
> > > > sgtu a0,a0,a1 # 9 [c=4 l=4]  *sgtu_disi
> > > > xori a0,a0,1 # 10 [c=4 l=4]  *xorsi3_internal/1
> > > > andi a0,a0,1 # 16 [c=4 l=4]  anddi3/1
> > > > ret # 25 [c=0 l=4]  simple_return
> > > > ```
> > >   Interesting.  I can do that, but can you please share the compilation
> > > options, given or defaulted (from `--with...' configuration options), this
> > > happens with?
> >   I have noticed it went nowhere.  Can you please check what compilation
> > options lead to this discrepancy so that we can have the fix included in
> > GCC 13?  I'd like to understand what's going on here.
> 
> FWIW, I don't see the redundant sign extension with this testcase at -O2 on
> the trunk.  Is it possible the patch has been made redundant over the last few
> months?

 Maybe at -O2, but the test cases continue to fail in my configuration for 
other optimisation levels:

FAIL: gcc.target/riscv/sge.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sge.c  -Og -g   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sgeu.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sgeu.c  -Og -g   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sle.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sle.c  -Og -g   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sleu.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sleu.c  -Og -g   scan-assembler-not sext\\.w

when applied on top of:

$ riscv64-linux-gnu-gcc --version
riscv64-linux-gnu-gcc (GCC) 13.0.0 20221128 (experimental)

Not anymore with the whole patch applied.

 Does it make sense to bisect the change that removed the pessimisation at 
-O2 to understand what is going on here?

 I think my change is worthwhile anyway: why rely on the optimiser to 
get things sorted when we can produce the best code in the backend right 
away in the first place?

  Maciej

Re: [committed] libstdc++: Make 16-bit std::subtract_with_carry_engine work [PR107466]

On Mon, 28 Nov 2022 at 15:20, Jonathan Wakely via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Tested x86_64-linux. Pushed to trunk.
>

I didn't notice until my mailer flagged it that this commit has a non-ASCII
character. Fixed by the attached patch, pushed to trunk.
commit c775e2b81fca39f366040d423e3e44f4abecf753
Author: Jonathan Wakely 
Date:   Mon Nov 28 15:21:52 2022

libstdc++: Replace non-ASCII character in comment

This has an unnecessary UTF-8 non-breaking space.

libstdc++-v3/ChangeLog:

* 
testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc:
Replace non-ASCII character.

diff --git 
a/libstdc++-v3/testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc

b/libstdc++-v3/testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc
index 21f246b8dc0..d91ee7448f6 100644
--- 
a/libstdc++-v3/testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc
+++ 
b/libstdc++-v3/testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc
@@ -3,7 +3,7 @@
 #include 

 // LWG 3809. Is std::subtract_with_carry_engine supposed to work?
-// PR 107466 - invalid -Wnarrowing error with std::subtract_with_carry_engine
+// PR 107466 - invalid -Wnarrowing error with std::subtract_with_carry_engine

 int main()
 {

[committed] libstdc++: Prune versioned namespace from testsuite output

Tested x86_64-linux. Pushed to trunk.

-- >8 --

This means we don't need to use "(__8::)?" in dg-prune-output
directives.

libstdc++-v3/ChangeLog:

* testsuite/20_util/is_complete_or_unbounded/memoization_neg.cc:
Simplify dg-prune-output pattern.
* testsuite/lib/prune.exp (libstdc++-dg-prune): Prune "::__8".
---
 .../20_util/is_complete_or_unbounded/memoization_neg.cc | 2 +-
 libstdc++-v3/testsuite/lib/prune.exp| 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git 
a/libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/memoization_neg.cc 
b/libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/memoization_neg.cc
index bc66c13feee..fc0b70b319c 100644
--- a/libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/memoization_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/is_complete_or_unbounded/memoization_neg.cc
@@ -1,6 +1,6 @@
 // { dg-do compile { target c++11 } }
 // { dg-prune-output "must be a complete" }
-// { dg-prune-output "'value' is not a member of 'std::(__8::)?is_move_cons" }
+// { dg-prune-output "'value' is not a member of 'std::is_move_cons" }
 // { dg-prune-output "invalid use of incomplete type" }
 
 // Copyright (C) 2019-2022 Free Software Foundation, Inc.
diff --git a/libstdc++-v3/testsuite/lib/prune.exp 
b/libstdc++-v3/testsuite/lib/prune.exp
index 6d0b77a8ccd..74842ae680c 100644
--- a/libstdc++-v3/testsuite/lib/prune.exp
+++ b/libstdc++-v3/testsuite/lib/prune.exp
@@ -37,6 +37,8 @@ proc libstdc++-dg-prune { system text } {
   return "::unsupported::hosted C++ headers not supported"
 }
 
+regsub -all "std::__8::" $text "std::" text
+
 # Ignore caret diagnostics. Unfortunately dejaGNU trims leading
 # spaces, so one cannot rely on them being present.
 regsub -all "(^|\n)\[^\n\]+\n *\\^\n" $text "\n" text
-- 
2.38.1

[committed] libstdc++: Make 16-bit std::subtract_with_carry_engine work [PR107466]

Tested x86_64-linux. Pushed to trunk.

-- >8 --

This implements the proposed resolution of LWG 3809, so that
std::subtract_with_carry_engine can be used with a 16-bit result_type.
Currently this produces a narrowing error when instantiating the
std::linear_congruential_engine to create the initial state. It also
truncates the default_seed constant when passing it as a result_type
argument.

Change the type of the constant to uint_least32_t and pass 0u when the
default_seed should be used.
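
To make the truncation concrete, here is a small standalone sketch (not the library code; effective_seed is a made-up name) of why the constant cannot be stored in a 16-bit result_type and how a 0u sentinel avoids passing it through that type:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the engine's default_seed after the patch: always 32 bits wide.
constexpr std::uint_least32_t default_seed = 19780503u;

// Converting the constant to a 16-bit type silently loses the high bits
// (19780503 mod 65536 == 54167), which is the truncation described above.
static_assert(static_cast<std::uint16_t>(default_seed) == 54167u,
              "default_seed does not fit in 16 bits");

// Hypothetical helper: the caller passes 0u to mean "use the default",
// and the selection is done in a 32-bit type so nothing is truncated.
template<typename UIntType>
constexpr std::uint_least32_t effective_seed(UIntType value = 0u)
{
  return value == 0u ? default_seed
                     : static_cast<std::uint_least32_t>(value);
}

void check_effective_seed()
{
  assert(effective_seed<std::uint16_t>() == 19780503u);
  assert(effective_seed<std::uint16_t>(101) == 101u);
}
```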

libstdc++-v3/ChangeLog:

PR libstdc++/107466
* include/bits/random.h (subtract_with_carry_engine): Use 32-bit
type for default seed. Use 0u as default argument for
subtract_with_carry_engine(result_type) constructor and
seed(result_type) member function.
* include/bits/random.tcc (subtract_with_carry_engine): Use
32-bit type for default seed and engine used for initial state.
* testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc:
New test.
---
 libstdc++-v3/include/bits/random.h|  6 ++---
 libstdc++-v3/include/bits/random.tcc  |  4 +--
 .../cons/lwg3809.cc   | 26 +++
 3 files changed, 31 insertions(+), 5 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc

diff --git a/libstdc++-v3/include/bits/random.h 
b/libstdc++-v3/include/bits/random.h
index 3b4e7d42bb5..523ef2d6565 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -721,9 +721,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static constexpr size_t  word_size= __w;
   static constexpr size_t  short_lag= __s;
   static constexpr size_t  long_lag = __r;
-  static constexpr result_type default_seed = 19780503u;
+  static constexpr uint_least32_t default_seed = 19780503u;
 
-  subtract_with_carry_engine() : subtract_with_carry_engine(default_seed)
+  subtract_with_carry_engine() : subtract_with_carry_engine(0u)
   { }
 
   /**
@@ -758,7 +758,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* set carry to 1, otherwise sets carry to 0.
*/
   void
-  seed(result_type __sd = default_seed);
+  seed(result_type __sd = 0u);
 
   /**
* @brief Seeds the initial state @f$x_0@f$ of the
diff --git a/libstdc++-v3/include/bits/random.tcc 
b/libstdc++-v3/include/bits/random.tcc
index cb1d3675783..7ec2b3f6c35 100644
--- a/libstdc++-v3/include/bits/random.tcc
+++ b/libstdc++-v3/include/bits/random.tcc
@@ -532,7 +532,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 subtract_with_carry_engine<_UIntType, __w, __s, __r>::long_lag;
 
   template
-constexpr _UIntType
+constexpr uint_least32_t
 subtract_with_carry_engine<_UIntType, __w, __s, __r>::default_seed;
 #endif
 
@@ -541,7 +541,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 subtract_with_carry_engine<_UIntType, __w, __s, __r>::
 seed(result_type __value)
 {
-  std::linear_congruential_engine
+  std::linear_congruential_engine
__lcg(__value == 0u ? default_seed : __value);
 
   const size_t __n = (__w + 31) / 32;
diff --git 
a/libstdc++-v3/testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc
 
b/libstdc++-v3/testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc
new file mode 100644
index 000..21f246b8dc0
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/26_numerics/random/subtract_with_carry_engine/cons/lwg3809.cc
@@ -0,0 +1,26 @@
+// { dg-do run { target c++11 } }
+#include 
+#include 
+
+// LWG 3809. Is std::subtract_with_carry_engine supposed to work?
+// PRÂ 107466 - invalid -Wnarrowing error with std::subtract_with_carry_engine
+
+int main()
+{
+  // It should be possible to construct this engine with a 16-bit result_type:
+  std::subtract_with_carry_engine s16;
+  std::subtract_with_carry_engine s32;
+  // It should have been seeded with the same sequence as the 32-bit version
+  // and produce random numbers in the same range, [0, 1<<12).
+  for (int i = 0; i < 10; ++i)
+VERIFY( s16() == s32() );
+  // The default seed should be usable without truncation to uint16_t:
+  s16.seed();
+  s32.seed();
+  for (int i = 0; i < 10; ++i)
+VERIFY( s16() == s32() );
+  s16.seed(101);
+  s32.seed(101);
+  for (int i = 0; i < 10; ++i)
+VERIFY( s16() == s32() );
+}
-- 
2.38.1

Re: [PING][PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU




On 11/25/22 07:07, Maciej W. Rozycki wrote:

Hi Kito,

On Fri, 12 Aug 2022, Maciej W. Rozycki wrote:


LGTM, but with a nit: I don't get sext.w but get an andi like below, so
maybe we should also scan-assembler-not andi? Feel free to commit that
directly with that fix.

```asm
sleu:
        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
        xori    a0,a0,1         # 10    [c=4 l=4]  *xorsi3_internal/1
        andi    a0,a0,1         # 16    [c=4 l=4]  anddi3/1
        ret                     # 25    [c=0 l=4]  simple_return
```

  Interesting.  I can do that, but can you please share the compilation
options, given or defaulted (from `--with...' configuration options), this
happens with?

  I have noticed it went nowhere.  Can you please check what compilation
options lead to this discrepancy so that we can have the fix included in
GCC 13?  I'd like to understand what's going on here.


FWIW, I don't see the redundant sign extension with this testcase at -O2 
on the trunk.  Is it possible the patch has been made redundant over the 
last few months?



Jeff

Re: [RFA] src-release.sh: Fix gdb source tarball build failure due to libsframe

2022-11-28 Thread Tom Tromey

> "Joel" == Joel Brobecker via Gdb-patches  
> writes:

Joel> ChangeLog:

Joel> * src-release.sh (GDB_SUPPORT_DIRS): Add libsframe.

Joel> Ok to apply to master?

Looks good to me.
I think we recently agreed that gdb and binutils maintainers can approve
patches like this... ?

thanks,
Tom

Re: [PATCH 0/2] Support HWASAN with Intel LAM

2022-11-28 Thread Martin Liška


On 11/11/22 02:26, liuhongt via Gcc-patches wrote:

   Two years ago, ARM folks added HWASAN[1] support to GCC[2] and introduced several
target hooks (many thanks for their work) so that other backends can do similar
things if they have a similar feature.
   Intel LAM (Linear Address Masking)[3, Chapter 14] supports a similar feature:
the upper bits of pointers can be used as metadata. LAM supports two modes:
   LAM_U48: bits 48-62 can be used as metadata.
   LAM_U57: bits 57-62 can be used as metadata.

These 2 patches mainly support those target hooks, but HWASAN is not really
enabled until the final decision on the LAM kernel interface, which may take
quite a long time. We have verified our patches locally with a "fake"
interface[4], and decided to push the backend patches to GCC 13 to make other
HWASAN developers' work easier.
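
For readers unfamiliar with pointer tagging, the two modes above can be sketched as plain bit manipulation. This is only an illustration of the bit ranges quoted in the cover letter; the names LamMode, set_tag, get_tag and strip_tag are made up here, and the handling of bit 63 (raised in the reply below) is deliberately not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; only the bit ranges come from the cover letter.
enum class LamMode { U48, U57 };

// Lowest metadata bit: 48 for LAM_U48, 57 for LAM_U57.
constexpr unsigned metadata_shift(LamMode mode)
{
  return mode == LamMode::U57 ? 57u : 48u;
}

// Bits 57-62 (6 bits) for LAM_U57, bits 48-62 (15 bits) for LAM_U48.
constexpr std::uint64_t metadata_mask(LamMode mode)
{
  return mode == LamMode::U57 ? (std::uint64_t{0x3F} << 57)
                              : (std::uint64_t{0x7FFF} << 48);
}

// Store a tag in the metadata bits, leaving the address bits untouched.
constexpr std::uint64_t set_tag(std::uint64_t addr, std::uint64_t tag, LamMode mode)
{
  return (addr & ~metadata_mask(mode))
         | ((tag << metadata_shift(mode)) & metadata_mask(mode));
}

// Read the tag back out of a tagged pointer value.
constexpr std::uint64_t get_tag(std::uint64_t addr, LamMode mode)
{
  return (addr & metadata_mask(mode)) >> metadata_shift(mode);
}

// Recover the untagged address.
constexpr std::uint64_t strip_tag(std::uint64_t addr, LamMode mode)
{
  return addr & ~metadata_mask(mode);
}

void check_lam_tagging()
{
  const std::uint64_t addr = 0x00007fff12345678ULL;
  const std::uint64_t tagged = set_tag(addr, 0x2A, LamMode::U57);
  assert(get_tag(tagged, LamMode::U57) == 0x2A);
  assert(strip_tag(tagged, LamMode::U57) == addr);
}
```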


Hello.

A few random comments I noticed:

1) please document the new target option -mlam in extend.texi
2) the description speaks about bits [48-62] or [57-62]; can you explain why the
patch contains:

+  /* Mask off bit63 when LAM_U57.  */
+  if (ix86_lam_type == lam_u57)
?

3) Shouldn't the -mlam option emit a GNU_PROPERTY_X86_FEATURE_1_LAM_U57 or
GNU_PROPERTY_X86_FEATURE_1_LAM_U48 .gnu.property note?

4) Can you please explain Florian's comment here:
https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13#note_1181396487

Thanks,
Martin



[1] https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557857.html
[3] 
https://www.intel.com/content/dam/develop/external/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
[4] https://gitlab.com/x86-gcc/gcc/-/tree/users/intel/lam/master


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

liuhongt (2):
   Implement hwasan target_hook.
   Enable hwasan for x86-64.

  gcc/config/i386/i386-expand.cc  |  12 
  gcc/config/i386/i386-options.cc |   3 +
  gcc/config/i386/i386-opts.h |   6 ++
  gcc/config/i386/i386-protos.h   |   2 +
  gcc/config/i386/i386.cc | 123 
  gcc/config/i386/i386.opt|  16 +
  libsanitizer/configure.tgt  |   1 +
  7 files changed, 163 insertions(+)

[PATCH] RISC-V: Remove tail && mask policy operand for vmclr, vmset, vmld, vmst

2022-11-28 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Since mask instructions don't need a policy, remove the policy operands to
make the patterns look reasonable.
gcc/ChangeLog:

* config/riscv/vector.md: Remove TA && MA operands.

---
 gcc/config/riscv/vector.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 3bb87232d3f..38da2f7f095 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -593,8 +593,6 @@
  (unspec:VB
[(match_operand:VB 1 "vector_mask_operand"   "Wc1, Wc1, Wc1, Wc1, 
Wc1")
 (match_operand 4 "vector_length_operand"" rK,  rK,  rK,  rK,  
rK")
-(match_operand 5 "const_int_operand""  i,   i,   i,   i,   
i")
-(match_operand 6 "const_int_operand""  i,   i,   i,   i,   
i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (match_operand:VB 3 "vector_move_operand"  "  m,  vr,  vr, Wc0, 
Wc1")
-- 
2.36.1

Re: [PATCH V2] rs6000: Support to build constants by li/lis+oris/xoris

On Mon, Nov 28, 2022 at 11:37:34AM +0800, Jiufu Guo wrote:
> Segher Boessenkool  writes:
> > On Fri, Nov 25, 2022 at 04:11:49PM +0800, Kewen.Lin wrote:
> >> on 2022/10/26 19:40, Jiufu Guo wrote:
> >> for "li/lis + oris/xoris", I interpreted it into four combinations:
> >> 
> >>li + oris, lis + oris, li + xoris, lis + xoris.
> >> 
> >> not sure just me interpreting like that, but the actual combinations
> >> which this patch adopts are:
> >> 
> >>li + oris, li + xoris, lis + xoris.
> >> 
> >> It's a bit off, but not a big deal, up to you to reword it or not.  :)
> >
> > The first two are obvious, but the last one is almost never a good idea,
> > there usually are better ways to do the same.  I cannot even think of
> > any case where this is best?  A lis;rl* is always preferred (it can
> > optimise better, be combined with other insns).
> I understand your point here.  The first two: 'li' for the lowest 16 bits,
> 'oris/xoris' for the next 16 bits.
> 
> While for 'lis + xoris', it may not be obvious, because both 'lis' and
> 'xoris' operate on 17-31bits.
> 'lis + xoris' is for case "32(1) || 1(0) || 15(x) || 16(0)". xoris is
> used to clear bit31.  This case seems hard to be supported by 'rlxx'.

Please put that in a separate patch?  First do a patch with just
lis;x?oris.  They are unrelated and different in almost every way.
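
For anyone following along, the combinations under discussion can be checked with a small emulation of the four instructions on a 64-bit register. This is only a sketch with made-up function names: li and lis are modeled as loading a sign-extended 16-bit immediate (shifted left by 16 for lis), and oris/xoris as OR/XOR with the immediate shifted left by 16.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 16-bit immediate to 64 bits without relying on narrowing casts.
constexpr std::uint64_t sext16(std::uint16_t imm)
{
  return (std::uint64_t{imm} ^ 0x8000u) - 0x8000u;
}

// li rD,SIMM: the register becomes the sign-extended immediate.
constexpr std::uint64_t li(std::uint16_t imm) { return sext16(imm); }

// lis rD,SIMM: the sign-extended immediate shifted left by 16.
constexpr std::uint64_t lis(std::uint16_t imm) { return sext16(imm) << 16; }

// oris / xoris: OR / XOR with the unsigned immediate shifted left by 16.
constexpr std::uint64_t oris(std::uint64_t r, std::uint16_t imm)
{
  return r | (std::uint64_t{imm} << 16);
}
constexpr std::uint64_t xoris(std::uint64_t r, std::uint16_t imm)
{
  return r ^ (std::uint64_t{imm} << 16);
}

// li + oris: positive value with bit 31 set and zero upper half.
static_assert(oris(li(0x1234), 0x8000) == 0x0000000080001234ULL, "li + oris");
// li + xoris: negative li result with bit 31 flipped to zero.
static_assert(xoris(li(0xFFFF), 0x8000) == 0xFFFFFFFF7FFFFFFFULL, "li + xoris");
// lis + xoris: "32(1) || 1(0) || 15(x) || 16(0)", with x = 0x0abc here.
static_assert(xoris(lis(0x8abc), 0x8000) == 0xFFFFFFFF0ABC0000ULL, "lis + xoris");

void check_two_insn_constants()
{
  assert(lis(0x8abc) == 0xFFFFFFFF8ABC0000ULL);
  assert(xoris(lis(0x8abc), 0x8000) == 0xFFFFFFFF0ABC0000ULL);
}
```

The last case is the one quoted above: lis sets the upper 32 bits through sign extension, and xoris then clears bit 31.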

> I happened to find this case when I analyzed what kind of constants can be
> built with two instructions.  I checked the possible combinations:
> "addi/addis" + "neg/ori/../xoris/rldX/rlwX/../sradi/extswsli" (those
> instructions which accept one register and one immediate).
> 
> I also drafted the patch to use "li/lis+rlxx" to build constant.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601276.html
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601277.html

Those seem to do many things in one patch as well :-(  It is very hard
to review such things, it takes many hours each to do properly.


Segher

[PATCH] RISC-V: Add attributes for VSETVL PASS

2022-11-28 Thread juzhe . zhong

From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum vlmul_type): New enum.
(get_vlmul): New function.
(get_ratio): Ditto.
* config/riscv/riscv-v.cc (struct mode_vtype_group): New struct.
(ENTRY): Adapt for attributes.
(enum vlmul_type): New enum.
(get_vlmul): New function.
(get_ratio): New function.
* config/riscv/riscv-vector-switch.def (ENTRY): Adapt for attributes.
* config/riscv/riscv.cc (ENTRY): Ditto.
* config/riscv/vector.md (false,true): Add attributes.
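
As a reading aid for the table entries in the diff below: the visible mask-mode rows, such as (LMUL_F8, 64), (LMUL_F4, 32), (LMUL_F2, 16) and (LMUL_1, 8), are consistent with the ratio being SEW divided by LMUL with SEW = 8. The sketch below works under that assumption; sew_lmul_ratio is a made-up helper, not part of the patch, and the enum simply repeats the values added to riscv-protos.h.

```cpp
#include <cassert>

// Same enumerator values as the vlmul_type enum added by this patch.
enum vlmul_type
{
  LMUL_1 = 0,
  LMUL_2 = 1,
  LMUL_4 = 2,
  LMUL_8 = 3,
  LMUL_RESERVED = 4,
  LMUL_F8 = 5,
  LMUL_F4 = 6,
  LMUL_F2 = 7,
};

// Hypothetical helper: SEW / LMUL for a given element width in bits.
// Fractional LMUL (F2, F4, F8) multiplies the ratio; the reserved
// encoding yields 0, matching the placeholder rows in the table.
constexpr unsigned sew_lmul_ratio(unsigned sew, vlmul_type vlmul)
{
  switch (vlmul)
    {
    case LMUL_1:  return sew;
    case LMUL_2:  return sew / 2;
    case LMUL_4:  return sew / 4;
    case LMUL_8:  return sew / 8;
    case LMUL_F2: return sew * 2;
    case LMUL_F4: return sew * 4;
    case LMUL_F8: return sew * 8;
    default:      return 0;
    }
}

void check_sew_lmul_ratio()
{
  assert(sew_lmul_ratio(8, LMUL_F8) == 64);
  assert(sew_lmul_ratio(8, LMUL_1) == 8);
}
```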

---
 gcc/config/riscv/riscv-protos.h  |  13 ++
 gcc/config/riscv/riscv-v.cc  |  41 +
 gcc/config/riscv/riscv-vector-switch.def |  97 ++--
 gcc/config/riscv/riscv.cc|   2 +-
 gcc/config/riscv/vector.md   | 185 +++
 5 files changed, 290 insertions(+), 48 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 27692ffb210..e17e003f8e2 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -120,6 +120,17 @@ extern void riscv_run_selftests (void);
 
 namespace riscv_vector {
 #define RVV_VLMAX gen_rtx_REG (Pmode, X0_REGNUM)
+enum vlmul_type
+{
+  LMUL_1 = 0,
+  LMUL_2 = 1,
+  LMUL_4 = 2,
+  LMUL_8 = 3,
+  LMUL_RESERVED = 4,
+  LMUL_F8 = 5,
+  LMUL_F4 = 6,
+  LMUL_F2 = 7,
+};
 /* Routines implemented in riscv-vector-builtins.cc.  */
 extern void init_builtins (void);
 extern const char *mangle_builtin_type (const_tree);
@@ -132,6 +143,8 @@ extern rtx expand_builtin (unsigned int, tree, rtx);
 extern bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 extern bool legitimize_move (rtx, rtx, machine_mode);
 extern void emit_pred_op (unsigned, rtx, rtx, machine_mode);
+extern enum vlmul_type get_vlmul (machine_mode);
+extern unsigned int get_ratio (machine_mode);
 enum tail_policy
 {
   TAIL_UNDISTURBED = 0,
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index fbd8bbfe254..d54795694f1 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -214,4 +214,45 @@ legitimize_move (rtx dest, rtx src, machine_mode mask_mode)
   return true;
 }
 
+/* VTYPE information for machine_mode.  */
+struct mode_vtype_group
+{
+  enum vlmul_type vlmul_for_min_vlen32[NUM_MACHINE_MODES];
+  uint8_t ratio_for_min_vlen32[NUM_MACHINE_MODES];
+  enum vlmul_type vlmul_for_min_vlen64[NUM_MACHINE_MODES];
+  uint8_t ratio_for_min_vlen64[NUM_MACHINE_MODES];
+  mode_vtype_group ()
+  {
+#define ENTRY(MODE, REQUIREMENT, VLMUL_FOR_MIN_VLEN32, RATIO_FOR_MIN_VLEN32,   
\
+ VLMUL_FOR_MIN_VLEN64, RATIO_FOR_MIN_VLEN64)  \
+  vlmul_for_min_vlen32[MODE##mode] = VLMUL_FOR_MIN_VLEN32; 
\
+  ratio_for_min_vlen32[MODE##mode] = RATIO_FOR_MIN_VLEN32; 
\
+  vlmul_for_min_vlen64[MODE##mode] = VLMUL_FOR_MIN_VLEN64; 
\
+  ratio_for_min_vlen64[MODE##mode] = RATIO_FOR_MIN_VLEN64;
+#include "riscv-vector-switch.def"
+  }
+};
+
+static mode_vtype_group mode_vtype_infos;
+
+/* Get vlmul field value by comparing LMUL with BYTES_PER_RISCV_VECTOR.  */
+enum vlmul_type
+get_vlmul (machine_mode mode)
+{
+  if (TARGET_MIN_VLEN == 32)
+return mode_vtype_infos.vlmul_for_min_vlen32[mode];
+  else
+return mode_vtype_infos.vlmul_for_min_vlen64[mode];
+}
+
+/* Get ratio according to machine mode.  */
+unsigned int
+get_ratio (machine_mode mode)
+{
+  if (TARGET_MIN_VLEN == 32)
+return mode_vtype_infos.ratio_for_min_vlen32[mode];
+  else
+return mode_vtype_infos.ratio_for_min_vlen64[mode];
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vector-switch.def 
b/gcc/config/riscv/riscv-vector-switch.def
index ee8ebd5f1cc..a51f45be487 100644
--- a/gcc/config/riscv/riscv-vector-switch.def
+++ b/gcc/config/riscv/riscv-vector-switch.def
@@ -80,7 +80,8 @@ TODO: FP16 vector needs support of 'zvfh', we don't support 
it yet.  */
 /* Return 'REQUIREMENT' for machine_mode 'MODE'.
For example: 'MODE' = VNx64BImode needs TARGET_MIN_VLEN > 32.  */
 #ifndef ENTRY
-#define ENTRY(MODE, REQUIREMENT)
+#define ENTRY(MODE, REQUIREMENT, VLMUL_FOR_MIN_VLEN32, RATIO_FOR_MIN_VLEN32,   
\
+ VLMUL_FOR_MIN_VLEN64, RATIO_FOR_MIN_VLEN64)
 #endif
 /* Flag of FP32 vector.  */
 #ifndef TARGET_VECTOR_FP32
@@ -94,66 +95,68 @@ TODO: FP16 vector needs support of 'zvfh', we don't support 
it yet.  */
 #endif
 
 /* Mask modes. Disable VNx64BImode when TARGET_MIN_VLEN == 32.  */
-ENTRY (VNx64BI, TARGET_MIN_VLEN > 32)
-ENTRY (VNx32BI, true)
-ENTRY (VNx16BI, true)
-ENTRY (VNx8BI, true)
-ENTRY (VNx4BI, true)
-ENTRY (VNx2BI, true)
-ENTRY (VNx1BI, true)
+ENTRY (VNx64BI, TARGET_MIN_VLEN > 32, LMUL_F8, 64, LMUL_RESERVED, 0)
+ENTRY (VNx32BI, true, LMUL_F4, 32, LMUL_RESERVED, 0)
+ENTRY (VNx16BI, true, LMUL_F2, 16, LMUL_RESERVED, 0)
+ENTRY (VNx8BI, true, LMUL_1, 8, LMUL_RESERVED, 0)
+ENTRY (VNx4BI, true,

[PATCH] c++: Fall back to global cpp spec if CPLUSPLUS_CPP_SPEC is not defined

2022-11-28 Thread Joakim Nohlgård

When CPLUSPLUS_CPP_SPEC is set to a string literal it is not possible to
modify it through external spec files by renaming the original cpp spec
and replacing it because the compiler cpp_spec will still point to the
original, renamed cpp spec. Not defining CPLUSPLUS_CPP_SPEC makes gcc.cc
fall back to using the same cpp spec as the C compiler when substituting
%C in spec strings.

gcc/ChangeLog:

* defaults.h (CPLUSPLUS_CPP_SPEC): Remove default definition.

Signed-off-by: Joakim Nohlgård 
---
 gcc/defaults.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index 376687d91b1..223460ef239 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -783,14 +783,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #endif
 #endif
 
-/* By default, the preprocessor should be invoked the same way in C++
-   as in C.  */
-#ifndef CPLUSPLUS_CPP_SPEC
-#ifdef CPP_SPEC
-#define CPLUSPLUS_CPP_SPEC CPP_SPEC
-#endif
-#endif
-
 #ifndef ACCUMULATE_OUTGOING_ARGS
 #define ACCUMULATE_OUTGOING_ARGS 0
 #endif
-- 
2.38.1

[PATCH] gcc: Use ld -r when checking for HAVE_LD_RO_RW_SECTION_MIXING

2022-11-28 Thread Joakim Nohlgård

The check for HAVE_LD_RO_RW_SECTION_MIXING fails on targets where ld
does not support shared objects, even though the answer to the test
should be 'read-write'. One such target is riscv64-unknown-elf. Failing
this test results in a libgcc crtbegin.o which has a writable .eh_frame
section leading to the default linker scripts placing the .eh_frame
section in a writable memory segment, or a linker warning about writable
sections in a read-only segment when using ld scripts that place
.eh_frame unconditionally in ROM.

gcc/ChangeLog:

* configure: Regenerate.
* configure.ac: Use ld -r in the check for HAVE_LD_RO_RW_SECTION_MIXING.

Signed-off-by: Joakim Nohlgård 
---
 gcc/configure.ac | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 7c55bff6cb0..4f36ed4aff3 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3153,9 +3153,9 @@ elif test x$gcc_cv_as != x -a x$gcc_cv_ld != x -a 
x$gcc_cv_objdump != x ; then
   if $gcc_cv_as -o conftest1.o conftest1.s > /dev/null 2>&1 \
  && $gcc_cv_as -o conftest2.o conftest2.s > /dev/null 2>&1 \
  && $gcc_cv_as -o conftest3.o conftest3.s > /dev/null 2>&1 \
- && $gcc_cv_ld -shared -o conftest1.so conftest1.o \
+ && $gcc_cv_ld -r -o conftest.o conftest1.o \
conftest2.o conftest3.o > /dev/null 2>&1; then
-gcc_cv_ld_ro_rw_mix=`$gcc_cv_objdump -h conftest1.so \
+gcc_cv_ld_ro_rw_mix=`$gcc_cv_objdump -h conftest.o \
 | sed -e '/myfoosect/!d' -e N`
 if echo "$gcc_cv_ld_ro_rw_mix" | grep CONTENTS > /dev/null; then
   if echo "$gcc_cv_ld_ro_rw_mix" | grep READONLY > /dev/null; then
-- 
2.38.1

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Adapt dg error messages

On Mon, 28 Nov 2022 at 10:08, Jonathan Wakely  wrote:

>
>
> On Mon, 28 Nov 2022 at 10:06, Jonathan Wakely  wrote:
>
>>
>>
>> On Mon, 28 Nov 2022 at 06:02, François Dumont via Libstdc++ <
>> libstd...@gcc.gnu.org> wrote:
>>
>>> libstdc++: [_GLIBCXX_INLINE_VERSION] Adapt dg error messages
>>>
>>> libstdc++-v3/ChangeLog
>>>
>>>  * testsuite/20_util/bind/ref_neg.cc: Adapt dg-prune-output
>>> message.
>>>  * testsuite/20_util/function/cons/70692.cc: Adapt dg-error
>>> message.
>>>
>>> Ok to commit ?
>>>
>>>
>> OK, thanks.
>>
>>
>>
> Actually wait, can you test this instead?
>
> --- a/libstdc++-v3/testsuite/lib/prune.exp
> +++ b/libstdc++-v3/testsuite/lib/prune.exp
> @@ -37,6 +37,8 @@ proc libstdc++-dg-prune { system text } {
>   return "::unsupported::hosted C++ headers not supported"
> }
>
> +regsub -all "std::__8::" $text "std::" text
> +
> # Ignore caret diagnostics. Unfortunately dejaGNU trims leading
> # spaces, so one cannot rely on them being present.
> regsub -all "(^|\n)\[^\n\]+\n *\\^\n" $text "\n" text
>
> This should mean we can stop needing to make these changes to every test,
> and just write the tests naturally.
>

That only helps for dg-prune-output but we still need to use (__8::)? for
dg-error.

Please push your change to 20_util/function/cons/70692.cc but not the
change to 20_util/bind/ref_neg.cc (the latter will get fixed after I push
the prune.exp change).

[PATCH] [x86] Fix unrecognizable insn due to illegal immediate_operand (const_int 255) of QImode.

2022-11-28 Thread liuhongt via Gcc-patches

For __builtin_ia32_vec_set_v16qi (a, -1, 2) with
!flag_signed_char, it's transformed to
__builtin_ia32_vec_set_v16qi (_4, 255, 2) in the gimple,
and expanded to (const_int 255) in the rtl. But immediate_operand
expects (const_int 255) to be sign-extended to
(const_int -1). The mismatch caused an unrecognizable insn error.

expand_expr_real_1 generates (const_int 255) without considering the target
mode. I guess it's on purpose, so I'll leave that alone and only change the
expander in the backend. After applying convert_modes to (const_int 255),
it's transformed to (const_int -1), which fixes the issue.
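
The 255 versus -1 mismatch comes down to sign-extending the low 8 bits of the value. A minimal sketch of that canonicalization follows; canonical_const_int is a made-up name for illustration only, not the function the backend calls.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep the low `bits` bits of `value` and sign-extend
// them, which is the form a constant is expected to have for a mode of
// that width (8 bits for QImode).  Requires 1 <= bits <= 64.
// Note: the final conversion to a signed type is only fully specified
// from C++20 on, but behaves as two's complement on common compilers.
std::int64_t canonical_const_int(std::int64_t value, unsigned bits)
{
  const std::uint64_t mask =
    bits >= 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << bits) - 1);
  const std::uint64_t low = static_cast<std::uint64_t>(value) & mask;
  const std::uint64_t sign = std::uint64_t{1} << (bits - 1);
  return static_cast<std::int64_t>((low ^ sign) - sign);
}

void check_canonical_const_int()
{
  assert(canonical_const_int(255, 8) == -1);   // the case from this PR
  assert(canonical_const_int(127, 8) == 127);  // already canonical
}
```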

Bootstrapped and regtested x86_64-pc-linux-gnu{-m32,}.
Ok for trunk(and backport to GCC-10/11/12 release branches)?

gcc/ChangeLog:

PR target/107863
* config/i386/i386-expand.cc (ix86_expand_vec_set_builtin):
Convert op1 to target mode whenever mode mismatch.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr107863.c: New test.
---
 gcc/config/i386/i386-expand.cc   | 2 +-
 gcc/testsuite/gcc.target/i386/pr107863.c | 8 
 2 files changed, 9 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107863.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 0373c3614a4..c639ee3a9f7 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -12475,7 +12475,7 @@ ix86_expand_vec_set_builtin (tree exp)
   op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
   elt = get_element_number (TREE_TYPE (arg0), arg2);
 
-  if (GET_MODE (op1) != mode1 && GET_MODE (op1) != VOIDmode)
+  if (GET_MODE (op1) != mode1)
 op1 = convert_modes (mode1, GET_MODE (op1), op1, true);
 
   op0 = force_reg (tmode, op0);
diff --git a/gcc/testsuite/gcc.target/i386/pr107863.c 
b/gcc/testsuite/gcc.target/i386/pr107863.c
new file mode 100644
index 000..99fd85d9765
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107863.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O" } */
+
+typedef char v16qi __attribute__((vector_size(16)));
+
+v16qi foo(v16qi a){
+  return __builtin_ia32_vec_set_v16qi (a, -1, 2);
+}
-- 
2.27.0

[PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2022-11-28 Thread Stam Markianos-Wright via Gcc-patches



On 11/15/22 15:51, Andre Vieira (lists) wrote:


On 11/11/2022 17:40, Stam Markianos-Wright via Gcc-patches wrote:

Hi all,

This is the 2/2 patch that contains the functional changes needed
for MVE Tail Predicated Low Overhead Loops.  See my previous email
for a general introduction of MVE LOLs.

This support is added through the already existing loop-doloop
mechanisms that are used for non-MVE dls/le looping.

Changes are:

1) Relax the loop-doloop mechanism in the mid-end to allow for
   decrement numbers other than -1 and for `count` to be an
   rtx containing the number of elements to be processed, rather
   than an expression for calculating the number of iterations.
2) Add a `allow_elementwise_doloop` target hook. This allows the
   target backend to manipulate the iteration count as it needs:
   in our case to change it from a pre-calculation of the number
   of iterations to the number of elements to be processed.
3) The doloop_end target-insn now has an additional parameter:
   the `count` (note: this is before it gets modified to just be
   the number of elements), so that the decrement value is
   extracted from that parameter.

And many things in the backend to implement the above optimisation:

4)  Appropriate changes to the define_expand of doloop_end and new
    patterns for dlstp and letp.
5) `arm_attempt_dlstp_transform`: (called from the define_expand of
    doloop_end) this function checks for the loop's suitability for
    dlstp/letp transformation and then implements it, if possible.
6) `arm_mve_get_loop_unique_vctp`: A function that loops through
    the loop contents and returns the vctp VPR-generating operation
    within the loop, if it is unique and there is exclusively one
    vctp within the loop.
7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map
   from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
   to check an insn to see if it requires the VPR or not.
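
To illustrate the difference between counting iterations and counting elements (points 1 and 2 above), here is a plain C++ model of the two loop shapes. It is only a sketch: the function names and the LoopResult struct are invented for this example, and the inner scalar loop stands in for one predicated vector operation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct LoopResult
{
  long sum;
  unsigned iterations;
};

// Classic doloop shape: the counter holds ceil(n / lanes) iterations and is
// decremented by 1; the tail predicate must be computed separately.
LoopResult iteration_counted (const std::vector<int> &data, std::size_t lanes)
{
  LoopResult r{0, 0};
  std::size_t base = 0;
  for (std::size_t count = (data.size () + lanes - 1) / lanes; count > 0; --count)
    {
      const std::size_t active = std::min (lanes, data.size () - base);
      for (std::size_t i = 0; i < active; ++i)
        r.sum += data[base + i];
      base += lanes;
      ++r.iterations;
    }
  return r;
}

// Element-wise shape: the counter holds the number of elements left and is
// decremented by the number of lanes, so the active lanes fall out of the
// counter itself.
LoopResult element_counted (const std::vector<int> &data, std::size_t lanes)
{
  LoopResult r{0, 0};
  std::size_t base = 0;
  for (long remaining = static_cast<long> (data.size ()); remaining > 0;
       remaining -= static_cast<long> (lanes))
    {
      const std::size_t active
        = std::min (lanes, static_cast<std::size_t> (remaining));
      for (std::size_t i = 0; i < active; ++i)
        r.sum += data[base + i];
      base += lanes;
      ++r.iterations;
    }
  return r;
}

void check_loop_shapes ()
{
  const std::vector<int> data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  const LoopResult a = iteration_counted (data, 4);
  const LoopResult b = element_counted (data, 4);
  assert (a.sum == 55 && b.sum == 55);
  assert (a.iterations == 3 && b.iterations == 3);
}
```

Both shapes run the same number of iterations and give the same result; the second one just carries the element count in the loop counter, which is what the relaxed doloop conditions have to accept.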

No regressions on arm-none-eabi with various targets and on
aarch64-none-elf. Thoughts on getting this into trunk?

Thank you,
Stam Markianos-Wright

gcc/ChangeLog:

    * config/aarch64/aarch64.md: Add extra doloop_end arg.
    * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New.
    * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New.
    (arm_mve_get_vctp_lanes): New.
    (arm_get_required_vpr_reg): New.
    (arm_mve_get_loop_unique_vctp): New.
    (arm_attempt_dlstp_transform): New.
    (arm_allow_elementwise_doloop): New.
    * config/arm/iterators.md:
    * config/arm/mve.md (*predicated_doloop_end_internal): New.
    (dlstp_insn): New.
    * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
    * config/arm/unspecs.md: New unspecs.
    * config/ia64/ia64.md: Add extra doloop_end arg.
    * config/pru/pru.md: Add extra doloop_end arg.
    * config/rs6000/rs6000.md: Add extra doloop_end arg.
    * config/s390/s390.md: Add extra doloop_end arg.
    * config/v850/v850.md: Add extra doloop_end arg.
    * doc/tm.texi: Document new hook.
    * doc/tm.texi.in: Likewise.
    * loop-doloop.cc (doloop_condition_get): Relax conditions.
    (doloop_optimize): Add support for elementwise LoLs.
    * target-insns.def (doloop_end): Add extra arg.
    * target.def (allow_elementwise_doloop): New hook.
    * targhooks.cc (default_allow_elementwise_doloop): New.
    * targhooks.h (default_allow_elementwise_doloop): New.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/lob.h: Update framework.
    * gcc.target/arm/lob1.c: Likewise.
    * gcc.target/arm/lob6.c: Likewise.
    * gcc.target/arm/dlstp-int16x8.c: New test.
    * gcc.target/arm/dlstp-int32x4.c: New test.
    * gcc.target/arm/dlstp-int64x2.c: New test.
    * gcc.target/arm/dlstp-int8x16.c: New test.


### Inline copy of patch ###

diff --git a/gcc/config/aarch64/aarch64.md 
b/gcc/config/aarch64/aarch64.md
index 
f2e3d905dbbeb2949f2947f5cfd68208c94c9272..7a6d24a80060b4a704a481ccd1a32d96e7b0f369 
100644

--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7366,7 +7366,8 @@
 ;; knows what to generate.
 (define_expand "doloop_end"
   [(use (match_operand 0 "" ""))  ; loop pseudo
-   (use (match_operand 1 "" ""))] ; label
+   (use (match_operand 1 "" ""))  ; label
+   (use (match_operand 2 "" ""))] ; decrement constant
   "optimize > 0 && flag_modulo_sched"
 {
   rtx s0;
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
550272facd12e60a49bf8a3b20f811cc13765b3a..7684620f0f4d161dd9e9ad2d70308021ec3d3d34 
100644

--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -63,7 +63,7 @@ extern void arm_decompose_di_binop (rtx, rtx, rtx 
*, rtx *, rtx *, rtx *);

 extern bool arm_q_bit_access (void);
 extern bool arm_ge_bits_access (void);
 extern bool arm_target_insn_ok_for_lob (rtx);
-
+extern rtx

[COMMITTED] ada: Adjust runtime library and User's Guide to PIE default on Linux

From: Eric Botcazou 

gcc/ada/

* libgnat/g-traceb.ads: Minor tweaks in the commentary.
(Executable_Load_Address): New function.
* doc/gnat_ugn/gnat_and_program_execution.rst (Non-Symbolic
Traceback): Adjust to PIE default on Linux.
(Symbolic Traceback): Likewise.
* doc/gnat_ugn/gnat_utility_programs.rst (gnatsymbolize): Likewise.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../gnat_ugn/gnat_and_program_execution.rst   | 46 ++---
 .../doc/gnat_ugn/gnat_utility_programs.rst| 30 +--
 gcc/ada/gnat_ugn.texi | 50 ---
 gcc/ada/libgnat/g-traceb.ads  | 36 +++--
 4 files changed, 98 insertions(+), 64 deletions(-)

diff --git a/gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst 
b/gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst
index c239c363eae..45ecea75416 100644
--- a/gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst
+++ b/gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst
@@ -859,16 +859,18 @@ bug occurs, and then be able to retrieve the sequence of 
calls with the same
 program compiled with debug information.
 
 However the ``addr2line`` tool does not work with Position-Independent Code
-(PIC), the historical example being Windows DLLs, which nowadays encompasses
-Position-Independent Executables (PIE) on recent Windows versions.
-
-In order to translate addresses into the source lines with Position-Independent
-Executables on recent Windows versions, in other words without using the switch
-:switch:`-no-pie` during linking, you need to use the ``gnatsymbolize`` tool
-with :switch:`--load` instead of the ``addr2line`` tool. The main difference
-is that you need to copy the Load Address output in the traceback ahead of the
-sequence of addresses. And the default mode of ``gnatsymbolize`` is equivalent
-to that of ``addr2line`` with the above switches, so none of them is needed::
+(PIC), the historical example being Linux dynamic libraries and Windows DLLs,
+which nowadays encompasse Position-Independent Executables (PIE) on recent
+Linux and Windows versions.
+
+In order to translate addresses the source lines with Position-Independent
+Executables on recent Linux and Windows versions, in other words without
+using the switch :switch:`-no-pie` during linking, you need to use the
+``gnatsymbolize`` tool with :switch:`--load` instead of the ``addr2line``
+tool. The main difference is that you need to copy the Load Address output
+in the traceback ahead of the sequence of addresses. And the default mode
+of ``gnatsymbolize`` is equivalent to that of ``addr2line`` with the above
+switches, so none of them is needed::
 
  $ gnatmake stb -g -bargs -E
  $ stb
@@ -879,7 +881,7 @@ to that of ``addr2line`` with the above switches, so none 
of them is needed::
  Call stack traceback locations:
  0x401373 0x40138b 0x40139c 0x401335 0x4011c4 0x4011f1 0x77e892a4
 
- $ gnatsymbolize --load stb 0x40 0x401373 0x40138b 0x40139c 0x401335
+ $ gnatsymbolize --load stb 0x40 0x401373 0x40138b 0x40139c 0x401335 \
 0x4011c4 0x4011f1 0x77e892a4
 
  0x00401373 Stb.P1 at stb.adb:5
@@ -957,12 +959,17 @@ addresses to strings:
   with Ada.Text_IO;
   with GNAT.Traceback;
   with GNAT.Debug_Utilities;
+  with System;
 
   procedure STB is
 
  use Ada;
+ use Ada.Text_IO;
  use GNAT;
  use GNAT.Traceback;
+ use System;
+
+ LA : constant Address := Executable_Load_Address;
 
  procedure P1 is
 TB  : Tracebacks_Array (1 .. 10);
@@ -972,14 +979,14 @@ addresses to strings:
  begin
 Call_Chain (TB, Len);
 
-Text_IO.Put ("In STB.P1 : ");
+Put ("In STB.P1 : ");
 
 for K in 1 .. Len loop
-   Text_IO.Put (Debug_Utilities.Image (TB (K)));
-   Text_IO.Put (' ');
+   Put (Debug_Utilities.Image_C (TB (K)));
+   Put (' ');
 end loop;
 
-Text_IO.New_Line;
+New_Line;
  end P1;
 
  procedure P2 is
@@ -988,6 +995,10 @@ addresses to strings:
  end P2;
 
   begin
+ if LA /= Null_Address then
+Put_Line ("Load address: " & Debug_Utilities.Image_C (LA));
+ end if;
+
  P2;
   end STB;
 
@@ -996,8 +1007,9 @@ addresses to strings:
  $ gnatmake stb -g
  $ stb
 
- In STB.P1 : 16#0040_F1E4# 16#0040_14F2# 16#0040_170B# 16#0040_171C#
- 16#0040_1461# 16#0040_11C4# 16#0040_11F1# 16#77E8_92A4#
+ Load address: 0x40
+ In STB.P1 : 0x40F1E4 0x4014F2 0x40170B 0x40171C 0x401461 0x4011C4 \
+   0x4011F1 0x77E892A4
 
 
 You can then get further information by invoking the ``addr2line`` tool or
diff --git a/gcc/ada/doc/gnat_ugn/gnat_utility_programs.rst 
b/gcc/ada/doc/gnat_ugn/gnat_utility_programs.rst
index

[COMMITTED] ada: doc/share/conf.py: Switch the HTML documentation to using the RTD theme

From: Joel Brobecker 

This commit adjusts the Sphinx configuration to use the "Read The Docs"
theme, which has the advantage of allowing the navigation bar
(containing among other things a search bar, and the TOC) to stay
fixed while scrolling the contents of the page being read. This is
particularly useful to allow access to those features while reading
a long page, for instance.

gcc/ada/

* doc/share/conf.py (extensions): Add 'sphinx_rtd_theme'.
(html_theme): Set to 'sphinx_rtd_theme'.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/doc/share/conf.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/doc/share/conf.py b/gcc/ada/doc/share/conf.py
index bb36bfa0c6a..9ab80e7759e 100644
--- a/gcc/ada/doc/share/conf.py
+++ b/gcc/ada/doc/share/conf.py
@@ -92,7 +92,7 @@ if doc_name == 'gnat_rm':
 exclude_patterns.append('share/gnat_project_manager.rst')
 print('ignoring share/gnat_project_manager.rst')
 
-extensions = []
+extensions = ['sphinx_rtd_theme']
 templates_path = ['_templates']
 source_suffix = '.rst'
 master_doc = doc_name
@@ -107,7 +107,7 @@ release = get_gnat_version()
 
 pygments_style = None
 tags.add(get_gnat_build_type())
-html_theme = 'sphinxdoc'
+html_theme = 'sphinx_rtd_theme'
 if os.path.isfile('adacore_transparent.png'):
 html_logo = 'adacore_transparent.png'
 if os.path.isfile('favicon.ico'):
-- 
2.34.1

[COMMITTED] ada: Add PIE support to backtraces on Linux

From: Eric Botcazou 

gcc/ada/

* adaint.c [Linux]: Include .
(__gnat_get_executable_load_address) [Linux]: Enable.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/adaint.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c
index 199dbe0e405..d2604ca9b77 100644
--- a/gcc/ada/adaint.c
+++ b/gcc/ada/adaint.c
@@ -3524,6 +3524,8 @@ __gnat_cpu_set (int cpu, size_t count ATTRIBUTE_UNUSED, 
cpu_set_t *set)
 
 #if defined (__APPLE__)
 #include 
+#elif defined (__linux__)
+#include 
 #endif
 
 const void *
@@ -3532,10 +3534,8 @@ __gnat_get_executable_load_address (void)
 #if defined (__APPLE__)
   return _dyld_get_image_header (0);
 
-#elif 0 && defined (__linux__)
-  /* Currently disabled as it needs at least -ldl.  */
+#elif defined (__linux__)
   struct link_map *map = _r_debug.r_map;
-
   return (const void *)map->l_addr;
 
 #elif defined (_WIN32)
-- 
2.34.1

[COMMITTED] ada: Annotate GNAT.Source_Info with an abstract state

From: Claire Dross 

So it can be used safely from SPARK code. The abstract state represents
the source code information that is accessed by the functions defined
in Source_Info. It is volatile as it is updated asynchronously when
moving in the code.

gcc/ada/

* libgnat/g-souinf.ads (Source_Code_Information): Add a new
volatile abstract state and add it in the global contract of all
functions defined in Source_Info.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/g-souinf.ads | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/libgnat/g-souinf.ads b/gcc/ada/libgnat/g-souinf.ads
index 700f5180c82..6b72a6497f1 100644
--- a/gcc/ada/libgnat/g-souinf.ads
+++ b/gcc/ada/libgnat/g-souinf.ads
@@ -36,7 +36,13 @@
 --  and logging purposes. For example, an exception handler can print out
 --  the name of the source file in which the exception is handled.
 
-package GNAT.Source_Info is
+package GNAT.Source_Info with
+   SPARK_Mode,
+   Abstract_State =>
+ (Source_Code_Information with
+ External => (Async_Writers, Async_Readers)),
+   Annotate => (GNATprove, Always_Return)
+is
pragma Preelaborate;
--  Note that this unit is Preelaborate, but not Pure, that's because the
--  functions here such as Line are clearly not pure functions, and normally
@@ -47,6 +53,8 @@ package GNAT.Source_Info is
--  intrinsics as not Pure, even in Pure units, so no problems arose.
 
function File return String with
+ Volatile_Function,
+ Global => Source_Code_Information,
  Import, Convention => Intrinsic;
--  Return the name of the current file, not including the path information.
--  The result is considered to be a static string constant.
@@ -57,6 +65,8 @@ package GNAT.Source_Info is
--  static expression.
 
function Source_Location return String with
+ Volatile_Function,
+ Global => Source_Code_Information,
  Import, Convention => Intrinsic;
--  Return a string literal of the form "name:line", where name is the
--  current source file name without path information, and line is the
@@ -66,6 +76,8 @@ package GNAT.Source_Info is
--  string constant.
 
function Enclosing_Entity return String with
+ Volatile_Function,
+ Global => Source_Code_Information,
  Import, Convention => Intrinsic;
--  Return the name of the current subprogram, package, task, entry or
--  protected subprogram. The string is in exactly the form used for the
@@ -80,15 +92,21 @@ package GNAT.Source_Info is
--  from within generic templates.
 
function Compilation_ISO_Date return String with
+ Volatile_Function,
+ Global => Source_Code_Information,
  Import, Convention => Intrinsic;
--  Returns date of compilation as a static string "-mm-dd".
 
function Compilation_Date return String with
+ Volatile_Function,
+ Global => Source_Code_Information,
  Import, Convention => Intrinsic;
--  Returns date of compilation as a static string "mmm dd ". This is
--  in local time form, and is exactly compatible with C macro __DATE__.
 
function Compilation_Time return String with
+ Volatile_Function,
+ Global => Source_Code_Information,
  Import, Convention => Intrinsic;
--  Returns GMT time of compilation as a static string "hh:mm:ss". This is
--  in local time form, and is exactly compatible with C macro __TIME__.
-- 
2.34.1

[COMMITTED] ada: Fix internal error on conversion as in/out actual with -gnatVa

From: Eric Botcazou 

The problem is that the regular expansion of the conversion around the
call to the subprogram is disabled by the expansion of the validity check
around the same call, as documented in Expand_Actuals:

  --  This case is given higher priority because the subsequent check
  --  for type conversion may add an extra copy of the variable and
  --  prevent proper value propagation back in the original object.

Now the two mechanisms need to cooperate in order for the code to compile.

gcc/ada/

* exp_ch6.adb (Expand_Actuals.Add_Call_By_Copy_Code): Deal with a
reference to a validation variable in the actual.
(Expand_Actuals.Add_Validation_Call_By_Copy_Code): Minor tweak.
(Expand_Actuals): Call Add_Validation_Call_By_Copy_Code directly
only if Add_Call_By_Copy_Code is not to be invoked.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch6.adb | 61 -
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
index 237a19d1327..0fe980c499a 100644
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -1639,6 +1639,27 @@ package body Exp_Ch6 is
 Crep  := False;
  end if;
 
+ --  If the actual denotes a variable which captures the value of an
+ --  object for validation purposes, we propagate the link with this
+ --  object to the new variable made from the actual just above.
+
+ if Ekind (Formal) /= E_In_Parameter
+   and then Is_Validation_Variable_Reference (Actual)
+ then
+declare
+   Ref : constant Node_Id := Unqual_Conv (Actual);
+
+begin
+   if Is_Entity_Name (Ref) then
+  Set_Validated_Object (Var, Validated_Object (Entity (Ref)));
+
+   else
+  pragma Assert (False);
+  null;
+   end if;
+end;
+ end if;
+
  --  Setup initialization for case of in out parameter, or an out
  --  parameter where the formal is an unconstrained array (in the
  --  latter case, we have to pass in an object with bounds).
@@ -1906,6 +1927,13 @@ package body Exp_Ch6 is
   Name   => Lhs,
   Expression => Expr));
end if;
+
+   --  Add a copy-back to reflect any potential changes in value
+   --  back into the original object, if any.
+
+   if Is_Validation_Variable_Reference (Lhs) then
+  Add_Validation_Call_By_Copy_Code (Lhs);
+   end if;
 end;
  end if;
   end Add_Call_By_Copy_Code;
@@ -2052,10 +2080,11 @@ package body Exp_Ch6 is
   --
 
   procedure Add_Validation_Call_By_Copy_Code (Act : Node_Id) is
+ Var : constant Node_Id := Unqual_Conv (Act);
+
  Expr: Node_Id;
  Obj : Node_Id;
  Obj_Typ : Entity_Id;
- Var : constant Node_Id := Unqual_Conv (Act);
  Var_Id  : Entity_Id;
 
   begin
@@ -2405,26 +2434,10 @@ package body Exp_Ch6 is
end if;
 end if;
 
---  The actual denotes a variable which captures the value of an
---  object for validation purposes. Add a copy-back to reflect any
---  potential changes in value back into the original object.
-
---Var : ... := Object;
---if not Var'Valid then  --  validity check
---Call (Var);--  modify var
---Object := Var; --  update Object
-
---  This case is given higher priority because the subsequent check
---  for type conversion may add an extra copy of the variable and
---  prevent proper value propagation back in the original object.
-
-if Is_Validation_Variable_Reference (Actual) then
-   Add_Validation_Call_By_Copy_Code (Actual);
-
 --  If argument is a type conversion for a type that is passed by
 --  copy, then we must pass the parameter by copy.
 
-elsif Nkind (Actual) = N_Type_Conversion
+if Nkind (Actual) = N_Type_Conversion
   and then
 (Is_Elementary_Type (E_Formal)
   or else Is_Bit_Packed_Array (Etype (Formal))
@@ -2508,6 +2521,18 @@ package body Exp_Ch6 is
   and then not In_Subrange_Of (E_Actual, E_Formal)))
 then
Add_Call_By_Copy_Code;
+
+--  The actual denotes a variable which captures the value of an
+--  object for validation purposes. Add a copy-back to reflect any
+--  potential changes in value back into the original object.
+
+--Var : ... := Object;
+--if not Var'Valid then  --  validity check
+--

[COMMITTED] ada: Implement change to SPARK RM rule on state refinement

From: Yannick Moy 

SPARK RM 7.1.4(4) no longer mandates that a package with abstract
states have a completing body, unless the package state is mentioned in
Part_Of specifications. Implement that change.

gcc/ada/

* sem_prag.adb (Check_Part_Of_Abstract_State): Add verification
related to use of Part_Of, so that constituents in private children
that refer to state in a sibling or parent unit force that unit to
have a body.
* sem_util.adb (Check_State_Refinements): Drop the requirement to
always have a package body for state refinement, when the package
state is mentioned in no Part_Of specification.
* sem_ch3.adb (Analyze_Declarations): Refresh SPARK refs in comment.
* sem_ch7.adb (Analyze_Package_Declaration): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch3.adb  |  3 ++-
 gcc/ada/sem_ch7.adb  | 10 +-
 gcc/ada/sem_prag.adb | 16 
 gcc/ada/sem_util.adb | 14 ++
 4 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index ce5a00b7fc8..61386e27feb 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -2942,7 +2942,8 @@ package body Sem_Ch3 is
  --  Verify that all abstract states found in any package declared in
  --  the input declarative list have proper refinements. The check is
  --  performed only when the context denotes a block, entry, package,
- --  protected, subprogram, or task body (SPARK RM 7.2.2(3)).
+ --  protected, subprogram, or task body (SPARK RM 7.1.4(4) and SPARK
+ --  RM 7.2.2(3)).
 
  Check_State_Refinements (Context);
 
diff --git a/gcc/ada/sem_ch7.adb b/gcc/ada/sem_ch7.adb
index 0fb9fe10ff6..284706981d6 100644
--- a/gcc/ada/sem_ch7.adb
+++ b/gcc/ada/sem_ch7.adb
@@ -1243,11 +1243,11 @@ package body Sem_Ch7 is
  Check_Completion;
 
  --  If the package spec does not require an explicit body, then all
- --  abstract states declared in nested packages cannot possibly get
- --  a proper refinement (SPARK RM 7.2.2(3)). This check is performed
- --  only when the compilation unit is the main unit to allow for
- --  modular SPARK analysis where packages do not necessarily have
- --  bodies.
+ --  abstract states declared in nested packages cannot possibly get a
+ --  proper refinement (SPARK RM 7.1.4(4) and SPARK RM 7.2.2(3)). This
+ --  check is performed only when the compilation unit is the main
+ --  unit to allow for modular SPARK analysis where packages do not
+ --  necessarily have bodies.
 
  if Is_Comp_Unit then
 Check_State_Refinements
diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 0a91518cff9..27bd879903e 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -63,6 +63,7 @@ with Sem;use Sem;
 with Sem_Aux;use Sem_Aux;
 with Sem_Ch3;use Sem_Ch3;
 with Sem_Ch6;use Sem_Ch6;
+with Sem_Ch7;use Sem_Ch7;
 with Sem_Ch8;use Sem_Ch8;
 with Sem_Ch12;   use Sem_Ch12;
 with Sem_Ch13;   use Sem_Ch13;
@@ -3567,6 +3568,21 @@ package body Sem_Prag is
 return;
  end if;
 
+ --  In the case of state in a (descendant of a private) child which
+ --  is Part_Of the state of another package, the package defining the
+ --  encapsulating abstract state should have a body, to ensure that it
+ --  has a state refinement (SPARK RM 7.1.4(4)).
+
+ if Enclosing_Comp_Unit_Node (Encap_Id) /=
+Enclosing_Comp_Unit_Node (Item_Id)
+   and then not Unit_Requires_Body (Scope (Encap_Id))
+ then
+SPARK_Msg_N
+  ("indicator Part_Of must denote abstract state of package "
+   & "with a body (SPARK RM 7.1.4(4))", Indic);
+return;
+ end if;
+
  --  At this point it is known that the Part_Of indicator is legal
 
  Legal := True;
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index f331b4b78ba..a13d9ebef5b 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -5450,12 +5450,18 @@ package body Sem_Util is
while Present (State_Elmt) loop
   State_Id := Node (State_Elmt);
 
-  --  Emit an error when a non-null state lacks any form of
-  --  refinement.
+  --  Emit an error when a non-null state lacks refinement,
+  --  but has Part_Of constituents or there is a package
+  --  body (SPARK RM 7.1.4(4)). Constituents in private
+  --  child packages, which are not known at this stage,
+  --  independently require the existence of a package body.
 
   if not Is_Null_State (State_Id)
-and then not Has_Null_Refinement

[PATCH] coroutines: Fix promotion of class members in co_await statements [PR99576]

2022-11-28 Thread Adrian Perl via Gcc-patches

Hi,

please have a look at the patch below for a potential fix that addresses the
incorrect 'promotion' of members found in temporaries within co_await
statements.

To summarize my post in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99576#c5,
the recursive promotion of temporaries is too eager and also recurses into
constructor statements, where it finds the initialization of members. This
patch prevents the recursion into constructors and skips the remaining
subtree.

It fixes the issues in bug reports [PR99576, PR100611, PR101976, PR101367,
PR107288], which are all related to incorrect 'promotions' (and manifest in
too many destructor calls).

I have added test applications based on examples in the PRs, which I have only
slightly annotated and refactored to allow automatic testing. They are all
basically quite similar. The main difference is how the temporaries are used
in the co_await (and co_yield) statements.

Bootstrapping and running the testsuite on x86_64 was successful. No
regressions occurred.

Please let me know if you need more information.

PR 100611
PR 101367
PR 101976
PR 99576

gcc/cp/ChangeLog:

* coroutines.cc (find_interesting_subtree): Prevent recursion into
constructor.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr100611.C: New test.
* g++.dg/coroutines/pr101367.C: New test.
* g++.dg/coroutines/pr101976.C: New test.
* g++.dg/coroutines/pr99576_1.C: New test.
* g++.dg/coroutines/pr99576_2.C: New test.

---
 gcc/cp/coroutines.cc|   2 +
 gcc/testsuite/g++.dg/coroutines/pr100611.C  |  93 +++
 gcc/testsuite/g++.dg/coroutines/pr101367.C  |  78 +
 gcc/testsuite/g++.dg/coroutines/pr101976.C  |  76 
 gcc/testsuite/g++.dg/coroutines/pr99576_1.C | 123 
 gcc/testsuite/g++.dg/coroutines/pr99576_2.C |  71 +++
 6 files changed, 443 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr100611.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr101367.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr101976.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr99576_1.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr99576_2.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 01a3e831ee5..a87ea7fe60a 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -2684,6 +2684,8 @@ find_interesting_subtree (tree *expr_p, int *dosub, void 
*d)
  return expr;
}
 }
+  else if (TREE_CODE(expr) == CONSTRUCTOR)
+*dosub = 0; /* We don't need to consider this any further.  */
   else if (tmp_target_expr_p (expr)
   && !p->temps_used->contains (expr))
 {
diff --git a/gcc/testsuite/g++.dg/coroutines/pr100611.C 
b/gcc/testsuite/g++.dg/coroutines/pr100611.C
new file mode 100644
index 000..5fbcfa7e6ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr100611.C
@@ -0,0 +1,93 @@
+/*
+  Test that instances created in capture clauses within co_await statements do 
not
+  get 'promoted'. This would lead to the members destructor getting called more
+  than once.
+
+  Correct output should look like:
+  Foo(23) 0xf042d8
+  Foo(const& 23) 0xf042ec
+  ~Foo(23) 0xf042ec
+  After co_await
+  ~Foo(23) 0xf042d8
+*/
+#include 
+#include 
+
+static unsigned int struct_Foo_destructor_counter = 0;
+static bool lambda_was_executed = false;
+
+class Task {
+public:
+  struct promise_type {
+Task get_return_object() {
+  return {std::coroutine_handle::from_promise(*this)};
+}
+
+std::suspend_never initial_suspend() { return {}; }
+std::suspend_always final_suspend() noexcept { return {}; }
+void unhandled_exception() {}
+void return_void() {}
+  };
+
+  ~Task() {
+if (handle_) {
+  handle_.destroy();
+}
+  }
+
+  bool await_ready() { return false; }
+  bool await_suspend(std::coroutine_handle<>) { return false; }
+  bool await_resume() { return false; }
+
+private:
+  Task(std::coroutine_handle handle) : handle_(handle) {}
+
+  std::coroutine_handle handle_;
+};
+
+class Foo {
+public:
+  Foo(int id) : id_(id) {
+std::cout << "Foo(" << id_ << ") " << (void*)this << std::endl;
+  }
+
+  Foo(Foo const& other) : id_(other.id_) {
+std::cout << "Foo(const& " << id_ << ") " << (void*)this << std::endl;
+  }
+
+  Foo(Foo&& other) : id_(other.id_) {
+std::cout << "Foo(&& " << id_ << ") " << (void*)this << std::endl;
+  }
+
+  ~Foo() {
+std::cout << "~Foo(" << id_ << ") " << (void*)this << std::endl;
+struct_Foo_destructor_counter++;
+
+if (struct_Foo_destructor_counter > 2){
+  std::cout << "Foo was destroyed more than two times!\n";
+  __builtin_abort();
+}
+}
+
+private:
+  int id_;
+};
+
+Task test() {
+  Foo foo(23);
+
+  co_await [foo]() -> Task { // A copy of foo is captured. This copy must not 
get 'promoted'.
+co_return;
+  }();
+
+

Re: [PATCH] Fortran: ICE with elemental and dummy argument with VALUE attribute [PR107819]

2022-11-28 Thread Mikael Morin


Le 27/11/2022 à 21:32, Harald Anlauf via Fortran a écrit :

Dear Fortranners,

in dependency checking of arguments of elemental procedures
we should treat dummy arguments with the value attribute as
implicitly having intent(in).  This is simple and obvious.

The PR by Gerhard provides a series of testcases that are
either valid (like the one in the attached patch), or
arguably non-conforming.  The issue is related to the
standard prescribing a temporary (in standardese language)
for the argument with the value attribute, while the
elemental attribute prescribes an application order.

Playing with other compiler brands, there seemed to be an
obvious discrepancy between NAG and Intel on the one side
and Intel on the other.  Steve Lionel attributed this to
non-conformance for the discussed case (see link in PR).

I therefore decided to only use a conforming testcase
for the testsuite, as this is sufficient to check for
the fix for the ICE.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


Yes.
Thanks.

Re: Extend fold_vec_perm to fold VEC_PERM_EXPR in VLA manner

2022-11-28 Thread Prathamesh Kulkarni via Gcc-patches

On Mon, 21 Nov 2022 at 14:37, Prathamesh Kulkarni
 wrote:
>
> On Fri, 4 Nov 2022 at 14:00, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 31 Oct 2022 at 15:27, Richard Sandiford
> >  wrote:
> > >
> > > Prathamesh Kulkarni  writes:
> > > > On Wed, 26 Oct 2022 at 21:07, Richard Sandiford
> > > >  wrote:
> > > >>
> > > >> Sorry for the slow response.  I wanted to find some time to think
> > > >> about this a bit more.
> > > >>
> > > >> Prathamesh Kulkarni  writes:
> > > >> > On Fri, 30 Sept 2022 at 21:38, Richard Sandiford
> > > >> >  wrote:
> > > >> >>
> > > >> >> Richard Sandiford via Gcc-patches  writes:
> > > >> >> > Prathamesh Kulkarni  writes:
> > > >> >> >> Sorry to ask a silly question but in which case shall we select 
> > > >> >> >> 2nd vector ?
> > > >> >> >> For num_poly_int_coeffs == 2,
> > > >> >> >> a1 /trunc n1 == (a1 + 0x) / (n1.coeffs[0] + n1.coeffs[1]*x)
> > > >> >> >> If a1/trunc n1 succeeds,
> > > >> >> >> 0 / n1.coeffs[1] == a1/n1.coeffs[0] == 0.
> > > >> >> >> So, a1 has to be < n1.coeffs[0] ?
> > > >> >> >
> > > >> >> > Remember that a1 is itself a poly_int.  It's not necessarily a 
> > > >> >> > constant.
> > > >> >> >
> > > >> >> > E.g. the TRN1 .D instruction maps to a VEC_PERM_EXPR with the 
> > > >> >> > selector:
> > > >> >> >
> > > >> >> >   { 0, 2 + 2x, 1, 4 + 2x, 2, 6 + 2x, ... }
> > > >> >>
> > > >> >> Sorry, should have been:
> > > >> >>
> > > >> >>   { 0, 2 + 2x, 2, 4 + 2x, 4, 6 + 2x, ... }
> > > >> > Hi Richard,
> > > >> > Thanks for the clarifications, and sorry for late reply.
> > > >> > I have attached POC patch that tries to implement the above approach.
> > > >> > Passes bootstrap+test on x86_64-linux-gnu and aarch64-linux-gnu for 
> > > >> > VLS vectors.
> > > >> >
> > > >> > For VLA vectors, I have only done limited testing so far.
> > > >> > It seems to pass couple of tests written in the patch for
> > > >> > nelts_per_pattern == 3,
> > > >> > and folds the following svld1rq test:
> > > >> > int32x4_t v = {1, 2, 3, 4};
> > > >> > return svld1rq_s32 (svptrue_b8 (), [0])
> > > >> > into:
> > > >> > return {1, 2, 3, 4, ...};
> > > >> > I will try to bootstrap+test it on SVE machine to test further for 
> > > >> > VLA folding.
> > > >> >
> > > >> > I have a couple of questions:
> > > >> > 1] When mask selects elements from same vector but from different 
> > > >> > patterns:
> > > >> > For eg:
> > > >> > arg0 = {1, 11, 2, 12, 3, 13, ...},
> > > >> > arg1 = {21, 31, 22, 32, 23, 33, ...},
> > > >> > mask = {0, 0, 0, 1, 0, 2, ... },
> > > >> > All have npatterns = 2, nelts_per_pattern = 3.
> > > >> >
> > > >> > With above mask,
> > > >> > Pattern {0, ...} selects arg0[0], ie {1, ...}
> > > > >> > Pattern {0, 1, 2, ...} selects arg0[0], arg0[1], arg0[2],
> > > > >> > ie {1, 11, 2, ...}
> > > > >> > While arg0[0] and arg0[2] belong to same pattern, arg0[1] belongs
> > > > >> > to different pattern in arg0.
> > > >> > The result is:
> > > >> > res = {1, 1, 1, 11, 1, 2, ...}
> > > >> > In this case, res's 2nd pattern {1, 11, 2, ...} is encoded with:
> > > > >> > a0 = 1, a1 = 11, S = -9.
> > > >> > Is that expected tho ? It seems to create a new encoding which
> > > >> > wasn't present in the input vector. For instance, the next elem in
> > > >> > sequence would be -7,
> > > >> > which is not present originally in arg0.
> > > >>
> > > >> Yeah, you're right, sorry.  Going back to:
> > > >>
> > > >> (2) The explicit encoding can be used to produce a sequence of N*Ex*Px
> > > > >> elements for any integer N.  This extended sequence can be
> > > > >> reencoded as having N*Px patterns, with Ex staying the same.
> > > >>
> > > >> I guess we need to pick an N for the selector such that each new
> > > >> selector pattern (each one out of the N*Px patterns) selects from
> > > >> the *same pattern* of the same data input.
> > > >>
> > > >> So if a particular pattern in the selector has a step S, and the data
> > > >> input it selects from has Pi patterns, N*S must be a multiple of Pi.
> > > >> N must be a multiple of least_common_multiple(S,Pi)/S.
> > > >>
> > > >> I think that means that the total number of patterns in the result
> > > >> (Pr from previous messages) can safely be:
> > > >>
> > > >>   Ps * least_common_multiple(
> > > >> least_common_multiple(S[1], P[input(1)]) / S[1],
> > > >> ...
> > > >> least_common_multiple(S[Ps], P[input(Ps)]) / S[Ps]
> > > >>   )
> > > >>
> > > >> where:
> > > >>
> > > >>   Ps = the number of patterns in the selector
> > > >>   S[I] = the step for selector pattern I (I being 1-based)
> > > > >>   input(I) = the data input selected by selector pattern I
> > > > >>              (I being 1-based)
> > > >>   P[I] = the number of patterns in data input I
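
The pattern-count formula quoted above can be sketched in C++ roughly as
follows.  This is only an illustration under stated assumptions, not code
from the patch: selector_pattern, lcm_u, gcd_u and result_npatterns are
hypothetical names, steps are taken as non-negative, and a step of 0 is
assumed to impose no constraint on N (the quoted formula does not cover
that case explicitly).

```cpp
#include <vector>

// Hypothetical description of one selector pattern: its step S[I] and the
// number of patterns P[input(I)] in the data input it selects from.
struct selector_pattern
{
  unsigned step;             // S[I], assumed non-negative here
  unsigned input_npatterns;  // P[input(I)], must be nonzero
};

// Greatest common divisor (Euclid's algorithm).
static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

// Least common multiple; both arguments must be nonzero.
static unsigned
lcm_u (unsigned a, unsigned b)
{
  return a / gcd_u (a, b) * b;
}

// Ps * lcm over all I of (lcm (S[I], P[input(I)]) / S[I]).
// Assumption: a pattern with step 0 always selects the same element, so it
// is skipped when computing the multiplier N.
unsigned
result_npatterns (const std::vector<selector_pattern> &sel)
{
  unsigned n = 1;
  for (const selector_pattern &p : sel)
    {
      if (p.step == 0)
        continue;
      n = lcm_u (n, lcm_u (p.step, p.input_npatterns) / p.step);
    }
  return static_cast<unsigned> (sel.size ()) * n;
}
```

For the earlier example (a selector with two patterns, steps 0 and 1, both
selecting from an input with two patterns) this gives 2 * 2 = 4 result
patterns, so the stepped selector patterns become {0, 2, 4, ...} and
{1, 3, 5, ...}, each staying within a single pattern of arg0.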
> > > >>
> > > >> That's getting quite complicated :-)  If we allow arbitrary P[...]
> > > >> and S[...] then it could also get large.  Perhaps we should finally
> > > >> give up on the general case and limit this to power-of-2 patterns and
> > > >> power-of-2 steps, so that

Re: PING [PATCH v3] c++: Allow module name to be a single letter on Windows

2022-11-28 Thread Nathan Sidwell via Gcc-patches


On 11/25/22 14:03, Torbjorn SVENSSON wrote:

Hi,

Ping, https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606528.html


ok, thanks!



Kind regards,
Torbjörn

On 2022-11-17 14:20, Torbjörn SVENSSON wrote:

v1 -> v2:
Paths without "C:" part can still be absolute if they start with / or
\ on Windows.

v2 -> v3:
Use alternative approach by having platform specific code in module.cc.

Truth table for the new expression:
c:\foo -> true
c:/foo -> true
/foo   -> true
\foo   -> true
c:foo  -> false
foo    -> false
./foo  -> true
.\foo  -> true
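
The truth table above can be reproduced with a small standalone C++ sketch.
It models a DOS-based host only, and the helpers is_dir_separator,
has_drive_spec and looks_like_header_name are hypothetical stand-ins written
for this illustration; the real macros (IS_DIR_SEPARATOR, HAS_DRIVE_SPEC)
come from libiberty and their definitions depend on the host.

```cpp
// Hypothetical stand-in for IS_DIR_SEPARATOR on a DOS-based host, where
// both '/' and '\\' separate directories.
constexpr bool
is_dir_separator (char c)
{
  return c == '/' || c == '\\';
}

// Hypothetical stand-in for HAS_DRIVE_SPEC: an ASCII letter followed by ':'.
constexpr bool
has_drive_spec (const char *p)
{
  return ((p[0] >= 'a' && p[0] <= 'z') || (p[0] >= 'A' && p[0] <= 'Z'))
         && p[1] == ':';
}

// True if NAME clearly starts as a pathname (and so would be treated as a
// header-name); false if it would be treated as a named module.
constexpr bool
looks_like_header_name (const char *name)
{
  // Look at index 1 when NAME starts with '.', otherwise index 0, which
  // accepts "./foo" and "/foo".  Failing that, accept a drive letter
  // immediately followed by a separator, as in "c:/foo".
  return is_dir_separator (name[name[0] == '.' ? 1 : 0])
         || (has_drive_spec (name) && is_dir_separator (name[2]));
}

// Compile-time checks mirroring the truth table.
static_assert (looks_like_header_name ("c:\\foo"), "c:\\foo is a path");
static_assert (looks_like_header_name ("c:/foo"), "c:/foo is a path");
static_assert (looks_like_header_name ("/foo"), "/foo is a path");
static_assert (looks_like_header_name ("\\foo"), "\\foo is a path");
static_assert (!looks_like_header_name ("c:foo"), "c:foo is a module");
static_assert (!looks_like_header_name ("foo"), "foo is a module");
static_assert (looks_like_header_name ("./foo"), "./foo is a path");
static_assert (looks_like_header_name (".\\foo"), ".\\foo is a path");
```

Under this model a flatname such as "A:Foo" has a drive-like prefix but no
separator after it, so it is classified as a module name rather than a path.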


Ok for trunk?

---

On Windows, the ':' character is special and when the module name is
a single character, like 'A', then the flatname would be (for
example) 'A:Foo'. On Windows, 'A:Foo' is treated as an absolute
path by the module loader and is likely not found.

Without this patch, the test case pr98944_c.C fails with:

In module imported at /src/gcc/testsuite/g++.dg/modules/pr98944_b.C:7:1,
of module A:Foo, imported at /src/gcc/testsuite/g++.dg/modules/pr98944_c.C:7:
A:Internals: error: header module expected, module 'A:Internals' found
A:Internals: error: failed to read compiled module: Bad file data
A:Internals: note: compiled module file is 'gcm.cache/A-Internals.gcm'
In module imported at /src/gcc/testsuite/g++.dg/modules/pr98944_c.C:7:8:
A:Foo: error: failed to read compiled module: Bad import dependency
A:Foo: note: compiled module file is 'gcm.cache/A-Foo.gcm'
A:Foo: fatal error: returning to the gate for a mechanical issue
compilation terminated.

gcc/cp/ChangeLog:

* module.cc: On Windows, 'A:Foo' is supposed to be a module
and not a path.

Tested on Windows with arm-none-eabi for Cortex-M3 in gcc-11 tree.

Co-Authored-By: Yvan ROUX 
Signed-off-by: Torbjörn SVENSSON 
---
  gcc/cp/module.cc | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 0e9af318ba4..fa41a86213f 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13960,7 +13960,15 @@ get_module (tree name, module_state *parent, bool 
partition)

  static module_state *
  get_module (const char *ptr)
  {
-  if (ptr[0] == '.' ? IS_DIR_SEPARATOR (ptr[1]) : IS_ABSOLUTE_PATH (ptr))
+  /* On DOS based file systems, there is an ambiguity with A:B which can be
+ interpreted as a module Module:Partition or Drive:PATH.  Interpret strings
+ which clearly starts as pathnames as header-names and everything else is
+ treated as a (possibly malformed) named moduled.  */
+  if (IS_DIR_SEPARATOR (ptr[ptr[0] == '.']) // ./FOO or /FOO
+#if HAVE_DOS_BASED_FILE_SYSTEM
+  || (HAS_DRIVE_SPEC (ptr) && IS_DIR_SEPARATOR (ptr[2])) // A:/FOO
+#endif
+  || false)
  /* A header name.  */
  return get_module (build_string (strlen (ptr), ptr));


--
Nathan Sidwell

Re: [PATCH][_GLIBCXX_INLINE_VERSION] Adapt to_chars/from_chars symbols