Re: [PATCH V3] Split loop for NE condition.

2021-06-21 Thread guojiufu via Gcc-patches

On 2021-06-21 16:51, Richard Biener wrote:

On Wed, 9 Jun 2021, guojiufu wrote:


On 2021-06-09 17:42, guojiufu via Gcc-patches wrote:
> On 2021-06-08 18:13, Richard Biener wrote:
>> On Fri, 4 Jun 2021, Jiufu Guo wrote:
>>
> cut...
>>> +  gcond *cond = as_a (last);
>>> +  enum tree_code code = gimple_cond_code (cond);
>>> +  if (!(code == NE_EXPR
>>> +  || (code == EQ_EXPR && (e->flags & EDGE_TRUE_VALUE
>>
>> The NE_EXPR check misses a corresponding && (e->flags & EDGE_FALSE_VALUE)
>> check.
>>
> Thanks, checking (e->flags & EDGE_FALSE_VALUE) as well would be safer.
>
>>> +  continue;
>>> +
>>> +  /* Check if bound is invariant.  */
>>> +  tree idx = gimple_cond_lhs (cond);
>>> +  tree bnd = gimple_cond_rhs (cond);
>>> +  if (expr_invariant_in_loop_p (loop, idx))
>>> +  std::swap (idx, bnd);
>>> +  else if (!expr_invariant_in_loop_p (loop, bnd))
>>> +  continue;
>>> +
>>> +  /* Only unsigned type conversion could cause wrap.  */
>>> +  tree type = TREE_TYPE (idx);
>>> +  if (!INTEGRAL_TYPE_P (type) || TREE_CODE (idx) != SSA_NAME
>>> +|| !TYPE_UNSIGNED (type))
>>> +  continue;
>>> +
>>> +  /* Avoid to split if bound is MAX/MIN val.  */
>>> +  tree bound_type = TREE_TYPE (bnd);
>>> +  if (TREE_CODE (bnd) == INTEGER_CST && INTEGRAL_TYPE_P (bound_type)
>>> +&& (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type))
>>> +|| tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type
>>> +  continue;
>>
>> Note you do not require 'bnd' to be constant and thus at runtime those
>> cases still need to be handled correctly.
> Yes, bnd is not required to be constant.  The above code filters out the
> case where bnd is a constant max/min value of the type.  So, the code could
> be updated as:
>   if (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type))
>   || tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type)))


Yes, and adjust the comment to "if bound is known to be MAX/MIN val."


>>
>>> +  /* Check if there is possible wrap.  */
>>> +  class tree_niter_desc niter;
>>> +  if (!number_of_iterations_exit (loop, e, , false, false))
> cut...
>>> +
>>> +  /* Change if (i != n) to LOOP1:if (i > n) and LOOP2:if (i < n) */
>>
>> It now occurs to me that we nowhere check the evolution of IDX
>> (split_at_bb_p uses simple_iv for this for example).  The transform
>> assumes that we will actually hit i == n and that i increments, but
>> while you check the control IV from number_of_iterations_exit
>> for NE_EXPR that does not guarantee a positive evolution.
>>
> If I have not answered your question correctly, please point it out:
> like simple_iv, number_of_iterations_exit calls simple_iv_with_niters,
> which checks the evolution.  In addition, number_of_iterations_exit calls
> number_of_iterations_cond, which checks no_overflow more accurately.  This
> is one reason I use this function.
>
> This transform assumes that the last iteration hits i == n.
> Otherwise, the loop may run forever, wrapping around again and again.
> To be safe: if the step is 1 or -1, this assumption holds.  I
> will add this check.


OK.
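The "last iteration hits i == n" assumption can be checked by brute force on a narrow type. The sketch below assumes an 8-bit unsigned index as a stand-in for `unsigned`, only so that every start/bound pair can be enumerated. The helpers `reaches_bound` and `count_nonterminating` are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: does `for (i = start; i != n; i += step)` on an 8-bit
// unsigned index ever reach i == n?  The index takes at most 256 distinct
// values, so if none of the first 256 values equals n, none ever will.
bool reaches_bound (std::uint8_t start, std::uint8_t n, int step)
{
  assert (step != 0);
  std::uint8_t i = start;
  for (int k = 0; k < 256; ++k)
    {
      if (i == n)
        return true;
      i = static_cast<std::uint8_t> (i + step);  // wraps modulo 256
    }
  return false;
}

// Count the (start, n) pairs for which the NE-controlled loop never exits.
int count_nonterminating (int step)
{
  int stuck = 0;
  for (int start = 0; start < 256; ++start)
    for (int n = 0; n < 256; ++n)
      if (!reaches_bound (static_cast<std::uint8_t> (start),
                          static_cast<std::uint8_t> (n), step))
        ++stuck;
  return stuck;
}
```

With a step of 1 or -1, `count_nonterminating` returns 0. With a step of 2 it returns 32768: every pair whose distance is odd wraps forever. This is why the step is restricted to 1/-1.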


> Thanks so much for pointing out I missed the negative step!
>
>> Your testcases do not include any negative step examples, but I guess
>> the conditions need to be swapped in this case?
>
> I will add cases and code to support step 1/-1.
>
>>
>> I think you also have to consider the order we split, say with
>>
>>   for (i = start; i != end; ++i)
>> {
>>   push (i);
>>   if (a[i] != b[i])
>> break;
>> }
>>
>> push (i) calls need to be in the same order for all cases of
>> start < end, start == end and start > end (and also cover
>> runtime testcases with end == 0 or end == UINT_MAX, likewise
>> for start).
> I have added tests for the above cases.  If I missed something, please point it out, thanks!
>
>>
>>> +  bool inv = expr_invariant_in_loop_p (loop, gimple_cond_lhs (gc));
>>> +  enum tree_code up_code = inv ? LT_EXPR : GT_EXPR;
>>> +  enum tree_code down_code = inv ? GT_EXPR : LT_EXPR;
> cut
>
> Thanks again for the very helpful review!
>
> BR,
> Jiufu Guo.
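The ordering concern in the push (i) example can be checked exhaustively on a small type. The sketch below makes these assumptions:

- An 8-bit unsigned index stands in for `unsigned`, so start > end is cheap to run.
- The split is written by hand from the patch comment: LOOP1 `i > n` and LOOP2 `i < n` for an increasing IV.
- This is not the code the pass emits, and all names are hypothetical.

The sketch compares the sequence of push (i) calls for every start/end pair, including 0 and the type's maximum, with and without an early break.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Original form: for (i = start; i != end; ++i) { push (i); if (a[i] != b[i]) break; }
std::vector<int> original_loop (std::uint8_t start, std::uint8_t end,
                                const std::uint8_t *a, const std::uint8_t *b)
{
  std::vector<int> pushed;
  for (std::uint8_t i = start; i != end; ++i)
    {
      pushed.push_back (i);
      if (a[i] != b[i])
        break;
    }
  return pushed;
}

// Hypothetical split form for an increasing IV.
std::vector<int> split_loop (std::uint8_t start, std::uint8_t end,
                             const std::uint8_t *a, const std::uint8_t *b)
{
  std::vector<int> pushed;
  std::uint8_t i = start;
  // LOOP1: i > end implies i != end; it exits once i wraps around to 0.
  for (; i > end; ++i)
    {
      pushed.push_back (i);
      if (a[i] != b[i])
        return pushed;
    }
  // LOOP2: i < end implies i != end; it exits once i reaches end.
  for (; i < end; ++i)
    {
      pushed.push_back (i);
      if (a[i] != b[i])
        return pushed;
    }
  return pushed;
}

// Compare both forms for all (start, end) pairs, with a mismatch planted at
// index STOP (no mismatch if STOP is negative).
bool split_matches_original (int stop)
{
  assert (stop < 256);
  std::uint8_t a[256], b[256];
  for (int k = 0; k < 256; ++k)
    a[k] = b[k] = static_cast<std::uint8_t> (k);
  if (stop >= 0)
    b[stop] ^= 1;
  for (int s = 0; s < 256; ++s)
    for (int e = 0; e < 256; ++e)
      {
        auto s8 = static_cast<std::uint8_t> (s);
        auto e8 = static_cast<std::uint8_t> (e);
        if (original_loop (s8, e8, a, b) != split_loop (s8, e8, a, b))
          return false;
      }
  return true;
}
```

For a decreasing IV the two conditions swap (LOOP1 `i < n`, LOOP2 `i > n`), in line with the remark above about negative steps.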

Here is the updated patch, thanks for your time!

diff --git a/gcc/testsuite/gcc.dg/loop-split1.c
b/gcc/testsuite/gcc.dg/loop-split1.c
new file mode 100644
index 000..dd2d03a7b96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-split1.c
@@ -0,0 +1,101 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+
+void
+foo (int *a, int *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+    a[l] = b[l] + 1;
+}
+void
+foo_1 (int *a, int *b, unsigned n)
+{
+  unsigned l = 0;
+  while (++l != n)
+    a[l] = b[l] + 1;
+}
+
+void
+foo1 (int *a, int *b, unsigned l, unsigned n)
+{
+  while (l++ != n)
+    a[l] = b[l] + 1;
+}
+
+/* No wrap.  */
+void
+foo1_1 (int *a, int *b, unsigned n)
+{
+  unsigned l = 0;
+  while (l++ != n)
+    a[l] = b[l] + 1;
+}
+
+unsigned
+foo2 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+    if 

Re: [PATCH] Add vect_recog_popcount_pattern to handle mismatch between the vectorized popcount IFN and scalar popcount builtin.

2021-06-21 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 22, 2021 at 10:43 AM Hongtao Liu  wrote:
>
> On Mon, Jun 21, 2021 at 6:05 PM Richard Biener
>  wrote:
> >
> > On Thu, Jun 17, 2021 at 8:29 AM liuhongt  wrote:

Re: [PATCH] Add vect_recog_popcount_pattern to handle mismatch between the vectorized popcount IFN and scalar popcount builtin.

2021-06-21 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 21, 2021 at 6:05 PM Richard Biener
 wrote:
>
> On Thu, Jun 17, 2021 at 8:29 AM liuhongt  wrote:
> >
> > The patch removes those promotions and demotions when the backend
> > supports a direct optab.
> >
> > For i386, it enables vectorization with vpopcntb/vpopcntw and optimizes
> > the vpopcntq case.
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/97770
> > * tree-vect-patterns.c (vect_recog_popcount_pattern):
> > New.
> > (vect_recog_func vect_vect_recog_func_ptrs): Add new pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/97770
> > * gcc.target/i386/avx512bitalg-pr97770-1.c: Remove xfail.
> > * gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Remove xfail.
> > ---
> >  .../gcc.target/i386/avx512bitalg-pr97770-1.c  |  27 +++--
> >  .../i386/avx512vpopcntdq-pr97770-1.c  |   9 +-
> >  gcc/tree-vect-patterns.c  | 110 ++
> >  3 files changed, 127 insertions(+), 19 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c 
> > b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
> > index c83a477045c..d1beec4cdb4 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
> > @@ -1,19 +1,18 @@
> >  /* PR target/97770 */
> >  /* { dg-do compile } */
> > -/* { dg-options "-O2 -mavx512bitalg -mavx512vl -mprefer-vector-width=512" } */
> > -/* Add xfail since no IFN for QI/HImode popcount */
> > -/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
> > -/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
> > -/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
> > -/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
> > -/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
> > -/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
> > +/* { dg-options "-O2 -march=icelake-server -mprefer-vector-width=512" } */
> > +/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1  } } */
> > +/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1  } } */
> > +/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1  } } */
> > +/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1  } } */
> > +/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1  } } */
> > +/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1  } } */
> >
> >  #include 
> >
> >  void
> >  __attribute__ ((noipa, optimize("-O3")))
> > -popcountb_128 (char * __restrict dest, char* src)
> > +popcountb_128 (unsigned char * __restrict dest, unsigned char* src)
> >  {
> >for (int i = 0; i != 16; i++)
> >  dest[i] = __builtin_popcount (src[i]);
> > @@ -21,7 +20,7 @@ popcountb_128 (char * __restrict dest, char* src)
> >
> >  void
> >  __attribute__ ((noipa, optimize("-O3")))
> > -popcountw_128 (short* __restrict dest, short* src)
> > +popcountw_128 (unsigned short* __restrict dest, unsigned short* src)
> >  {
> >for (int i = 0; i != 8; i++)
> >  dest[i] = __builtin_popcount (src[i]);
> > @@ -29,7 +28,7 @@ popcountw_128 (short* __restrict dest, short* src)
> >
> >  void
> >  __attribute__ ((noipa, optimize("-O3")))
> > -popcountb_256 (char * __restrict dest, char* src)
> > +popcountb_256 (unsigned char * __restrict dest, unsigned char* src)
> >  {
> >for (int i = 0; i != 32; i++)
> >  dest[i] = __builtin_popcount (src[i]);
> > @@ -37,7 +36,7 @@ popcountb_256 (char * __restrict dest, char* src)
> >
> >  void
> >  __attribute__ ((noipa, optimize("-O3")))
> > -popcountw_256 (short* __restrict dest, short* src)
> > +popcountw_256 (unsigned short* __restrict dest, unsigned short* src)
> >  {
> >for (int i = 0; i != 16; i++)
> >  dest[i] = __builtin_popcount (src[i]);
> > @@ -45,7 +44,7 @@ popcountw_256 (short* __restrict dest, short* src)
> >
> >  void
> >  __attribute__ ((noipa, optimize("-O3")))
> > -popcountb_512 (char * __restrict dest, char* src)
> > +popcountb_512 (unsigned char * __restrict dest, unsigned char* src)
> >  {
> >for (int i = 0; i != 64; i++)
> >  dest[i] = __builtin_popcount (src[i]);
> > @@ -53,7 +52,7 @@ popcountb_512 (char * __restrict dest, char* src)
> >
> >  void
> >  __attribute__ ((noipa, optimize("-O3")))
> > -popcountw_512 (short* __restrict dest, short* src)
> > +popcountw_512 (unsigned short* __restrict dest, unsigned short* src)
> >  {
> >for (int i = 0; i != 32; i++)
> >  dest[i] = __builtin_popcount (src[i]);
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c 
> > 
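The updated test switches `char`/`short` to their unsigned counterparts. A plausible reason, which is an inference and not stated in the patch, is how the scalar code is evaluated. `__builtin_popcount` takes an `unsigned int`, so each element is extended before counting, and the result is truncated when stored. Zero extension leaves the count unchanged, so the promotion/demotion pair can be dropped in favour of a byte-wide count such as `vpopcntb`. Sign extension of a negative element, by contrast, adds 24 set bits.

The sketch below checks this over all 8-bit values. It relies on GCC's `__builtin_popcount`, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reference 8-bit population count (no promotion involved).
int popcount8 (std::uint8_t x)
{
  int n = 0;
  for (; x != 0; x = static_cast<std::uint8_t> (x >> 1))
    n += x & 1;
  return n;
}

// What the scalar loop computes for unsigned char: zero-extend, count in
// 32 bits, truncate the result back to 8 bits.
std::uint8_t scalar_unsigned (std::uint8_t src)
{
  return static_cast<std::uint8_t> (__builtin_popcount (src));
}

// The same for signed char: the promotion is a sign extension.
std::uint8_t scalar_signed (signed char src)
{
  return static_cast<std::uint8_t> (__builtin_popcount (src));
}

// True if dropping the promotion/demotion never changes the result for
// unsigned 8-bit inputs.
bool unsigned_promotion_is_redundant ()
{
  for (int v = 0; v < 256; ++v)
    {
      auto x = static_cast<std::uint8_t> (v);
      if (scalar_unsigned (x) != popcount8 (x))
        return false;
    }
  return true;
}
```

For example, `scalar_signed (-1)` yields 32 while an 8-bit count of `0xFF` is 8. This is the kind of mismatch a direct byte-wide optab must not introduce.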

predcom: Refactor more by encapsulating global states

2021-06-21 Thread Kewen.Lin via Gcc-patches
Hi Richi and Martin,

>>
>> Thanks Richi!  One draft (not ready for review) is attached for further
>> discussion.  It follows the idea of RAII-style cleanup.  I noticed that
>> Martin suggested going a step further and turning tree_predictive_commoning_loop
>> and its callees into one class (Thanks Martin).  Since there are not many
>> C++-style worker functions of this kind yet, I want to double-check which
>> option you prefer.
>>
> 
> Such general cleanup is of course desired - Giuliano started some of it within
> GSoC two years ago in the attempt to thread the compilation process.  The
> cleanup then helps to get rid of global state which of course interferes here
> (and avoids unnecessary use of TLS vars).
> 
> So yes, encapsulating global state into a class and making accessors
> member functions is something that is desired (but a lot of mechanical
> work).
> 
> Thanks
> Richard.
> 
> I meant that not necessarily as something to include in this patch
> but as a suggestion for a future improvement.  If you'd like to
> tackle it at any point that would be great, of course.  In any
> event, thanks for double-checking!
>
> The attached patch looks good to me as well (more for the sake of
> style than anything else, declaring the class copy ctor and copy
> assignment = delete would make it clear it's not meant to be
> copied, although in this case it's unlikely to make a practical
> difference).
> 
> Martin.


Thanks for your explanation!  Sorry for the late response.
Since encapsulating global state into a class and making the accessors
member functions looks more complete, I gave up the RAII draft and
switched to this approach.

This patch encapsulates global state into a class, makes the
accessors member functions, removes some cleanup code that becomes
unnecessary as a consequence, and does some other cleanup with
RAII.
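The refactoring pattern described here moves former file-scope globals into a per-loop worker object. The old free functions become member functions, and copying is deleted as suggested in review. A minimal sketch follows. `loop_stub`, `chain_stub` and `worker_sketch` are hypothetical stand-ins and do not reflect the actual layout of `pcom_worker`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for GCC's loop and chain types.
struct loop_stub { int id; };
struct chain_stub { int length; };

class worker_sketch
{
public:
  explicit worker_sketch (loop_stub *loop) : m_loop (loop) { assert (loop); }

  // Non-copyable: the object owns per-loop state.
  worker_sketch (const worker_sketch &) = delete;
  worker_sketch &operator= (const worker_sketch &) = delete;

  // RAII: cleanup runs on every exit path instead of explicit calls.
  ~worker_sketch () { release_chains (); }

  void add_chain (int length) { m_chains.push_back (chain_stub{length}); }
  std::size_t num_chains () const { return m_chains.size (); }
  int loop_id () const { return m_loop->id; }
  int total_length () const
  {
    int sum = 0;
    for (const chain_stub &c : m_chains)
      sum += c.length;
    return sum;
  }

private:
  // In GCC the chains are heap objects freed explicitly; std::vector would
  // clean up by itself, so this only shows where that call moves to.
  void release_chains () { m_chains.clear (); }

  loop_stub *m_loop;                 // The loop being processed.
  std::vector<chain_stub> m_chains;  // Formerly global, now per worker.
};

static_assert (!std::is_copy_constructible<worker_sketch>::value,
               "the worker must not be copied");
```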

Bootstrapped/regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu, also
bootstrapped on ppc64le P9 with bootstrap-O3 config.

Is it ok for trunk?

BR,
Kewen
-
gcc/ChangeLog:

* tree-predcom.c (class pcom_worker): New class.
(release_chain): Renamed to...
(pcom_worker::release_chain): ...this.
(release_chains): Renamed to...
(pcom_worker::release_chains): ...this.
(aff_combination_dr_offset): Renamed to...
(pcom_worker::aff_combination_dr_offset): ...this.
(determine_offset): Renamed to...
(pcom_worker::determine_offset): ...this.
(class comp_ptrs): New class.
(split_data_refs_to_components): Renamed to...
(pcom_worker::split_data_refs_to_components): ...this,
and update with class comp_ptrs.
(suitable_component_p): Renamed to...
(pcom_worker::suitable_component_p): ...this.
(filter_suitable_components): Renamed to...
(pcom_worker::filter_suitable_components): ...this.
(valid_initializer_p): Renamed to...
(pcom_worker::valid_initializer_p): ...this.
(find_looparound_phi): Renamed to...
(pcom_worker::find_looparound_phi): ...this.
(add_looparound_copies): Renamed to...
(pcom_worker::add_looparound_copies): ...this.
(determine_roots_comp): Renamed to...
(pcom_worker::determine_roots_comp): ...this.
(determine_roots): Renamed to...
(pcom_worker::determine_roots): ...this.
(single_nonlooparound_use): Renamed to...
(pcom_worker::single_nonlooparound_use): ...this.
(remove_stmt): Renamed to...
(pcom_worker::remove_stmt): ...this.
(execute_pred_commoning_chain): Renamed to...
(pcom_worker::execute_pred_commoning_chain): ...this.
(execute_pred_commoning): Renamed to...
(pcom_worker::execute_pred_commoning): ...this.
(struct epcc_data): New member worker.
(execute_pred_commoning_cbck): Call execute_pred_commoning
with pcom_worker pointer.
(find_use_stmt): Renamed to...
(pcom_worker::find_use_stmt): ...this.
(find_associative_operation_root): Renamed to...
(pcom_worker::find_associative_operation_root): ...this.
(find_common_use_stmt): Renamed to...
(pcom_worker::find_common_use_stmt): ...this.
(combinable_refs_p): Renamed to...
(pcom_worker::combinable_refs_p): ...this.
(reassociate_to_the_same_stmt): Renamed to...
(pcom_worker::reassociate_to_the_same_stmt): ...this.
(stmt_combining_refs): Renamed to...
(pcom_worker::stmt_combining_refs): ...this.
(combine_chains): Renamed to...
(pcom_worker::combine_chains): ...this.
(try_combine_chains): Renamed to...
(pcom_worker::try_combine_chains): ...this.
(prepare_initializers_chain): Renamed to...
(pcom_worker::prepare_initializers_chain): ...this.
(prepare_initializers): Renamed to...
(pcom_worker::prepare_initializers): ...this.
(prepare_finalizers_chain): Renamed 

[r12-1702 Regression] FAIL: g++.target/i386/empty-class1.C -std=c++2a scan-rtl-dump-not expand "set" on Linux/x86_64

2021-06-21 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

7232f7c4c2d727431096a7ecfcf4ad4db71dcf2a is the first bad commit
commit 7232f7c4c2d727431096a7ecfcf4ad4db71dcf2a
Author: Jason Merrill 
Date:   Sun Jun 13 14:00:12 2021 -0400

expand: empty class return optimization [PR88529]

caused

FAIL: g++.target/i386/empty-class1.C  -std=c++14  scan-rtl-dump-not expand "set"
FAIL: g++.target/i386/empty-class1.C  -std=c++17  scan-rtl-dump-not expand "set"
FAIL: g++.target/i386/empty-class1.C  -std=c++2a  scan-rtl-dump-not expand "set"

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-1702/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=g++.target/i386/empty-class1.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=g++.target/i386/empty-class1.C 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me 
at skpgkp2 at gmail dot com)


[PATCH] docs: drop unbalanced parenthesis in rtl.texi

2021-06-21 Thread Sergei Trofimovich via Gcc-patches
From: Sergei Trofimovich 

gcc/ChangeLog:

* doc/rtl.texi: drop unbalanced parenthesis.
---
 gcc/doc/rtl.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 5af71137a87..e1e76a93a8b 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -144,7 +144,7 @@ Currently, @file{rtl.def} defines these classes:
 @item RTX_OBJ
 An RTX code that represents an actual object, such as a register
 (@code{REG}) or a memory location (@code{MEM}, @code{SYMBOL_REF}).
-@code{LO_SUM}) is also included; instead, @code{SUBREG} and
+@code{LO_SUM} is also included; instead, @code{SUBREG} and
 @code{STRICT_LOW_PART} are not in this class, but in class
 @code{RTX_EXTRA}.
 
-- 
2.32.0



Re: [PATCH] Modula-2 into the GCC tree on master

2021-06-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 21, 2021 at 11:36:48PM +0100, Gaius Mulley via Gcc-patches wrote:
> > : error: the file containing the definition module 
> > <80><98>M2RTS
> > <80><99> cannot be found
> > compiler exited with status 1
> > output is:
> > : error: the file containing the definition module 
> > <80><98>M2RTS
> > <80><99> cannot be found
> 
> ah yes, it would be good to make it autoconf locale utf-8

No, whether gcc is configured on a UTF-8 capable terminal or using a UTF-8
locale doesn't imply that it will actually be used in such a terminal
later on.
See e.g. gcc/intl.c (gcc_init_libintl) how it decides whether to use UTF-8
or normal quotes.
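The principle is that the quote style is chosen when the compiler runs, from the user's locale, and not when GCC is configured. A minimal sketch using POSIX `setlocale` and `nl_langinfo (CODESET)` is shown below. It does not reproduce gcc/intl.c, and `quotes_for_codeset` and `runtime_quotes` are hypothetical names. The exact codeset spelling reported (for example "UTF-8") depends on the C library, so the comparison is deliberately lenient.

```cpp
#include <cassert>
#include <clocale>
#include <cstring>
#include <langinfo.h>
#include <strings.h>

struct quote_pair { const char *open; const char *close; };

// Use the UTF-8 quotes U+2018/U+2019 only for a UTF-8 codeset,
// plain ASCII quotes otherwise.
quote_pair quotes_for_codeset (const char *codeset)
{
  if (codeset != nullptr
      && (strcasecmp (codeset, "UTF-8") == 0
          || strcasecmp (codeset, "utf8") == 0))
    return {"\xe2\x80\x98", "\xe2\x80\x99"};
  return {"'", "'"};
}

// Decide from the runtime environment (LANG, LC_ALL, LC_CTYPE).
quote_pair runtime_quotes ()
{
  std::setlocale (LC_CTYPE, "");
  return quotes_for_codeset (nl_langinfo (CODESET));
}
```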

Jakub



Re: [PATCH] Modula-2 into the GCC tree on master

2021-06-21 Thread Gaius Mulley via Gcc-patches
Segher Boessenkool  writes:

> On Sat, Jun 19, 2021 at 09:09:05AM -0500, Segher Boessenkool wrote:
>> powerpc64-linux now is building, and is running the testsuite.  My
>> powerpc64le-linux build used --enable-languages=all, but Ada fails to
>> build, so I'll redo that without Ada.
>
> For powerpc64le-linux I get
>
> === gm2 tests ===
>
>
> Running target unix
> FAIL: gm2/pim/fail/TestLong4.mod,  -g
> FAIL: gm2/pim/fail/TestLong4.mod,  -O
> FAIL: gm2/pim/fail/TestLong4.mod,  -O -g
> FAIL: gm2/pim/fail/TestLong4.mod,  -Os
> FAIL: gm2/pim/fail/TestLong4.mod,  -O3 -fomit-frame-pointer
> FAIL: gm2/pim/fail/TestLong4.mod,  -O3 -fomit-frame-pointer -finline-functions
> FAIL: gm2/pimlib/logitech/run/pass/realconv.mod execution,  -g
> FAIL: gm2/pimlib/logitech/run/pass/realconv.mod execution,  -O
> FAIL: gm2/pimlib/logitech/run/pass/realconv.mod execution,  -O -g
> FAIL: gm2/pimlib/logitech/run/pass/realconv.mod execution,  -Os
> FAIL: gm2/pimlib/logitech/run/pass/realconv.mod execution,  -O3 
> -fomit-frame-pointer
> FAIL: gm2/pimlib/logitech/run/pass/realconv.mod execution,  -O3 
> -fomit-frame-pointer -finline-functions
>
> === gm2 Summary ===
>
> # of expected passes11610
> # of unexpected failures12
>
> So that is excellent, only two failing tests :-)

yes indeed - I see TestLong4.mod fail on some of my x86_64 test
machines.

> For BE there is more:
>
> A whole bunch of testcases fail to build (both 32-bit and 64-bit).  I
> don't know yet.
>
> The realconv.mod testcase fails at all optimisation levels (also -O0).
>
> setarith*.mod and setrotate*.mod and setshift*.mod and simple*.mod fail
> to build.  Also cardrange*.mod and intrange*.mod and multint*.mod and
> realrange*.mod and subrange.mod and cardrange.mod and forcheck.mod.
> And the extended-opaque tests.  And more :-)
>
> : error: the file containing the definition module <80><98>M2RTS
> <80><99> cannot be found
> compiler exited with status 1
> output is:
> : error: the file containing the definition module <80><98>M2RTS
> <80><99> cannot be found

ah yes, it would be good to make it autoconf locale utf-8

> (That is UTF-8 quotation marks, and I do not use a UTF-8 locale there
> btw.  That is just a cosmetic problem of course.)
>
> Does this have to do with gm2tools?

Ah, I just examined simple2.mod and yes, absolutely [regarding the set test
failures]: gm2tools uses word-sized set types to implement the FIRST and
FOLLOW sets of its recursive descent parsers.  simple2.mod performs bit
exclusion on a word-sized set and fails :-) which is good news, as
this should be easy to debug.  Thanks for testing!
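For readers unfamiliar with these set types: a word-sized set is a bitmask, and "bit exclusion" (Modula-2 `EXCL`) clears one element's bit. The sketch below models a set of elements 0..31 in one 32-bit word. The mapping of element k to bit k is an assumption for illustration, and the helper names are hypothetical. It is not how gm2 lays sets out on any particular target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a word-sized SET OF [0..31]: element k is bit k.
using word_set = std::uint32_t;

word_set incl (word_set s, unsigned k)   // INCL (s, k)
{
  assert (k < 32);
  return s | (word_set{1} << k);
}

word_set excl (word_set s, unsigned k)   // EXCL (s, k)
{
  assert (k < 32);
  return s & ~(word_set{1} << k);
}

bool in_set (word_set s, unsigned k)     // k IN s
{
  assert (k < 32);
  return ((s >> k) & 1u) != 0;
}
```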




regards,
Gaius


[PATCH] correct handling of variable offset minus constant in -Warray-bounds (PR 100137)

2021-06-21 Thread Martin Sebor via Gcc-patches

-Warray-bounds relies on logic similar to that of -Wstringop-overflow
et al., but uses its own algorithm, with its own bugs such as PR 100137.
The attached patch takes the first step toward unifying the logic
between the warnings.  It changes a subset of -Warray-bounds to call
compute_objsize() to detect out-of-bounds indices.  Besides fixing
the bug, this also simplifies the code and improves the consistency
between the informational messages printed by both classes of
warnings.

The changes to the test suite are extensive mainly because of
the different format of the diagnostics resulting from slightly
tighter bounds of offsets computed by the new algorithm, and in
smaller part because the change lets -Warray-bounds diagnose some
problems it previously missed due to the limitations of its own
solution.

The false positive reported in PR 100137 is a 10/11/12 regression
but this change is too intrusive to backport.  I have a smaller
and more targeted patch I plan to backport in its stead.

Tested on x86_64-linux.

Martin
Correct handling of variable offset minus constant in -Warray-bounds [PR100137]

Resolves:
PR tree-optimization/100137 - -Warray-bounds false positive on varying offset plus negative
PR tree-optimization/99121 - ICE in -Warray-bounds on a multidimensional
PR tree-optimization/97027 - missing warning on buffer overflow storing a larger scalar into a smaller array

gcc/ChangeLog:

	* builtins.c (access_ref::access_ref): Also set offmax.
	(access_ref::offset_in_range): Define new function.
	(access_ref::add_offset): Set offmax.
	(access_ref::inform_access): Handle access_none.
	(handle_mem_ref): Clear ostype.
	(compute_objsize_r): Handle ASSERT_EXPR.
	* builtins.h (struct access_ref): Add offmax member.
	* gimple-array-bounds.cc (array_bounds_checker::check_mem_ref): Use
	compute_objsize() and simplify.
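The new `offmax` member records the most negative and most positive offsets reached so far, so an excursion outside the object is remembered even after later additions bring the offset back in range. The model below is simplified:

- `std::int64_t` replaces `offset_int`.
- The clamping that `add_offset` performs for zero-based (`base0`) objects is ignored.
- The size check is reduced to the `base0` branch of `offset_in_range`.

Only the update rule mirrors the hunk in this patch. I have not verified that GCC's full `add_offset` behaves identically, and `access_ref_model` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

struct access_ref_model
{
  std::int64_t offrng[2] = {0, 0};  // Current offset range.
  std::int64_t offmax[2] = {0, 0};  // Extremes reached so far.
  std::int64_t size = 0;            // Object size (sizrng[1] in the patch).

  void add_offset (std::int64_t min, std::int64_t max)
  {
    assert (min <= max);
    offrng[0] += min;
    offrng[1] += max;
    // Same rule as the patch's "minimum and maximum computed so far" block.
    if (offrng[1] < 0 && offrng[1] < offmax[0])
      offmax[0] = offrng[1];
    if (offrng[0] > 0 && offrng[0] > offmax[1])
      offmax[1] = offrng[0];
  }

  // Simplified base0 branch of offset_in_range.
  bool offsets_in_bounds () const
  {
    return offmax[0] >= 0 && offmax[1] <= size;
  }
};
```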

gcc/testsuite/ChangeLog:

	* c-c++-common/Warray-bounds-3.c: Remove xfail.
	* c-c++-common/Warray-bounds-4.c: Add an expected warning.
	* g++.dg/warn/Warray-bounds-10.C: Adjust text of expected messages.
	* g++.dg/warn/Warray-bounds-11.C: Same.
	* g++.dg/warn/Warray-bounds-12.C: Same.
	* g++.dg/warn/Warray-bounds-13.C: Same.
	* g++.dg/warn/Warray-bounds-17.C: Same.
	* g++.dg/warn/Warray-bounds-20.C: Same.
	* gcc.dg/Warray-bounds-29.c: Same.
	* gcc.dg/Warray-bounds-30.c: Add xfail.
	* gcc.dg/Warray-bounds-31.c: Adjust text of expected messages.
	* gcc.dg/Warray-bounds-32.c: Same.
	* gcc.dg/Warray-bounds-52.c: Same.
	* gcc.dg/Warray-bounds-53.c: Same.
	* gcc.dg/Warray-bounds-58.c: Remove xfail.
	* gcc.dg/Warray-bounds-63.c: Adjust text of expected messages.
	* gcc.dg/Warray-bounds-66.c: Same.
	* gcc.dg/Warray-bounds-69.c: Same.
	* gcc.dg/Wstringop-overflow-34.c: Same.
	* gcc.dg/Wstringop-overflow-47.c: Same.
	* gcc.dg/Wstringop-overflow-61.c: Same.
	* gcc.dg/Warray-bounds-71.c: New test.
	* gcc.dg/Warray-bounds-72.c: New test.
	* gcc.dg/Warray-bounds-73.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 855ad1eb6bb..f39a7fd93e7 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -206,6 +206,7 @@ access_ref::access_ref (tree bound /* = NULL_TREE */,
 {
   /* Set to valid.  */
   offrng[0] = offrng[1] = 0;
+  offmax[0] = offmax[1] = 0;
   /* Invalidate.   */
   sizrng[0] = sizrng[1] = -1;
 
@@ -457,6 +458,21 @@ access_ref::size_remaining (offset_int *pmin /* = NULL */) const
   return sizrng[1] - or0;
 }
 
+/* Return true if the offset and object size are in range for SIZE.  */
+
+bool
+access_ref::offset_in_range (const offset_int ) const
+{
+  if (size_remaining () < size)
+    return false;
+
+  if (base0)
+    return offmax[0] >= 0 && offmax[1] <= sizrng[1];
+
+  offset_int maxoff = wi::to_offset (TYPE_MAX_VALUE (ptrdiff_type_node));
+  return offmax[0] > -maxoff && offmax[1] < maxoff;
+}
+
 /* Add the range [MIN, MAX] to the offset range.  For known objects (with
zero-based offsets) at least one of whose offset's bounds is in range,
constrain the other (or both) to the bounds of the object (i.e., zero
@@ -493,6 +509,8 @@ void access_ref::add_offset (const offset_int , const offset_int )
   if (max >= 0)
 	{
 	  offrng[0] = 0;
+	  if (offmax[0] > 0)
+	    offmax[0] = 0;
 	  return;
 	}
 
@@ -509,6 +527,12 @@ void access_ref::add_offset (const offset_int , const offset_int )
 	offrng[0] = 0;
 }
 
+  /* Set the minimum and maximum computed so far. */
+  if (offrng[1] < 0 && offrng[1] < offmax[0])
+    offmax[0] = offrng[1];
+  if (offrng[0] > 0 && offrng[0] > offmax[1])
+    offmax[1] = offrng[0];
+
   if (!base0)
 return;
 
@@ -4575,23 +4599,46 @@ access_ref::inform_access (access_mode mode) const
   return;
 }
 
+  if (mode == access_read_only)
+{
+  if (allocfn == NULL_TREE)
+	{
+	  if (*offstr)
+	inform (loc, "at offset %s into source object %qE of size %s",
+		offstr, ref, sizestr);
+	  else
+	inform (loc, "source object %qE of size %s", ref, sizestr);
+
+	  return;
+	}
+
+  if (*offstr)
+	inform (loc,
+		"at offset %s 

[PATCH] IBM Z: Define NO_PROFILE_COUNTERS

2021-06-21 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



s390 glibc does not need counters in the .data section, since it stores
edge hits in its own data structure.  Therefore counters only waste
space and confuse diffing tools (e.g. kpatch), so don't generate them.
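The argument is that when the profiling runtime keeps its own table of call arcs, the per-function counter word that the compiler used to emit adds nothing. That counter word is the one the removed `larl` loaded from the `LP` label into %r1. The sketch below models such a runtime-owned table keyed by (caller, callee) address. It is a hypothetical illustration, not glibc's actual mcount data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical runtime-owned arc table: the profiler hook records
// (from_pc, self_pc) pairs itself, so no compiler-emitted counter is needed.
class arc_table
{
public:
  void record (std::uintptr_t from_pc, std::uintptr_t self_pc)
  {
    assert (self_pc != 0);
    ++m_counts[{from_pc, self_pc}];
  }

  unsigned long count (std::uintptr_t from_pc, std::uintptr_t self_pc) const
  {
    auto it = m_counts.find ({from_pc, self_pc});
    return it == m_counts.end () ? 0 : it->second;
  }

  std::size_t num_arcs () const { return m_counts.size (); }

private:
  std::map<std::pair<std::uintptr_t, std::uintptr_t>, unsigned long> m_counts;
};
```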

gcc/ChangeLog:

* config/s390/s390.c (s390_function_profiler): Ignore labelno
parameter.
* config/s390/s390.h (NO_PROFILE_COUNTERS): Define.
---
 gcc/config/s390/s390.c | 14 ++
 gcc/config/s390/s390.h |  2 ++
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..96c9a9db53b 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -13110,17 +13110,13 @@ output_asm_nops (const char *user, int hw)
 }
 }
 
-/* Output assembler code to FILE to increment profiler label # LABELNO
-   for profiling a function entry.  */
+/* Output assembler code to FILE to call a profiler hook.  */
 
 void
-s390_function_profiler (FILE *file, int labelno)
+s390_function_profiler (FILE *file, int /* labelno */)
 {
   rtx op[8];
 
-  char label[128];
-  ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno);
-
   fprintf (file, "# function profiler \n");
 
   op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM);
@@ -13128,10 +13124,6 @@ s390_function_profiler (FILE *file, int labelno)
   op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG));
   op[7] = GEN_INT (UNITS_PER_LONG);
 
-  op[2] = gen_rtx_REG (Pmode, 1);
-  op[3] = gen_rtx_SYMBOL_REF (Pmode, label);
-  SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL;
-
   op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
   if (flag_pic)
 {
@@ -13162,7 +13154,6 @@ s390_function_profiler (FILE *file, int labelno)
  output_asm_insn ("stg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
  output_asm_insn ("brasl\t%0,%4", op);
  output_asm_insn ("lg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
@@ -13179,7 +13170,6 @@ s390_function_profiler (FILE *file, int labelno)
  output_asm_insn ("st\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
  output_asm_insn ("brasl\t%0,%4", op);
  output_asm_insn ("l\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 3b876160420..fb16a455a03 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -787,6 +787,8 @@ CUMULATIVE_ARGS;
 
 #define PROFILE_BEFORE_PROLOGUE 1
 
+#define NO_PROFILE_COUNTERS 1
+
 
 /* Trampolines for nested functions.  */
 
-- 
2.31.1



[PING][PATCH 10/13] v2 Use new per-location warning APIs in the middle end

2021-06-21 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571981.html

Looking for a review of the middle end changes to replace the uses
of TREE_NO_WARNING and gimple_{get,set}_no_warning with the new
warning group APIs.  Most of the changes are of the mechanical
search-and-replace kind; just a handful do ever-so-slightly more than
that, but none of those should have any observable effect.

On 6/4/21 3:43 PM, Martin Sebor wrote:

The attached patch introduces declarations of the new
suppress_warning(), warning_suppressed_p(), and copy_warning() APIs,
and replaces the uses of TREE_NO_WARNING in the middle end with them.





[PING][PATCH 9/13] v2 Use new per-location warning APIs in LTO

2021-06-21 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571980.html

Looking for a review of the LTO changes to switch TREE_NO_WARNING to
the suppress_warning() API.

On 6/4/21 3:43 PM, Martin Sebor wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the LTO
front end with the new suppress_warning() API.  It adds a couple of
FIXMEs that I plan to take care of in a follow up.




[PING][PATCH 7/13] v2 Use new per-location warning APIs in the FORTRAN front end

2021-06-21 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571978.html

Looking for an approval of the 99% mechanical changes to switch
the FORTRAN front end from TREE_NO_WARNING to the new suppress_warning()
API.  There's only one place in this patch where a specific warning is
being suppressed: -Wuninitialized.  All other calls suppress all
warnings as before, so I don't expect the patch to have any visible
changes.

On 6/4/21 3:42 PM, Martin Sebor wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the FORTRAN
front end with the new suppress_warning() API.




[PING][PATCH 6/13] v2 Use new per-location warning APIs in the C++ front end

2021-06-21 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571977.html

Looking for a review of the C++ front end changes to switch to the new
suppress_warning() API.

On 6/4/21 3:42 PM, Martin Sebor wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the C++
front end with the new suppress_warning(), warning_suppressed_p(),
and copy_warning() APIs.




[PING][PATCH 4/13] v2 Use new per-location warning APIs in C family code

2021-06-21 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571975.html

I'm looking for a review of the mostly mechanical shared subset of
the C and C++ front end changes to the new suppress_warning() API.

On 6/4/21 3:42 PM, Martin Sebor wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the shared
C family front end with the new suppress_warning(),
warning_suppressed_p(), and copy_warning() APIs.




[PING][PATCH 3/13] v2 Use new per-location warning APIs in C front end

2021-06-21 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571974.html

I'm looking for a review of the mostly mechanical C front end changes
to the new suppress_warning() API.

On 6/4/21 3:41 PM, Martin Sebor wrote:

The attached patch replaces the uses of TREE_NO_WARNING in the C front
end with the new suppress_warning(), warning_suppressed_p(), and
copy_warning() APIs.




[PING][PATCH 1/13] v2 [PATCH 1/13] Add support for per-location warning groups (PR 74765)

2021-06-21 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571973.html

Looking for a review of v2 of the diagnostic infrastructure bits.

On 6/4/21 3:41 PM, Martin Sebor wrote:

The attached patch introduces the suppress_warning(),
warning_suppressed(), and copy_no_warning() APIs without making
use of them in the rest of GCC.  They are in three files:

   diagnostic-spec.{h,c}: Location-centric overloads.
   warning-control.cc: Tree- and gimple*-centric overloads.

The location-centric overloads are suitable to use from the diagnostic
subsystem.  The rest can be used from the front ends and the middle end.
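A toy model may help picture what "per-location" suppression means: the sketch below keeps a small table from a location to a bitmask of suppressed warning groups, with suppress, query, and copy operations analogous in spirit to the suppress/query/copy APIs described above. Everything in it (the toy_* names, the TOY_OPT_* groups, the fixed-size table) is hypothetical and is not how diagnostic-spec.{h,c} is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical warning groups, one bit each.  */
#define TOY_OPT_UNINITIALIZED 0x1u
#define TOY_OPT_OVERFLOW      0x2u
#define TOY_OPT_ALL           (~0u)

typedef unsigned toy_location;   /* stand-in for a source location */

#define TOY_MAX_ENTRIES 64

/* Locations that have at least one group suppressed.  */
static struct
{
  toy_location loc;
  unsigned mask;
} toy_table[TOY_MAX_ENTRIES];
static size_t toy_count;

/* Find the mask for LOC; add an empty entry if CREATE is true.
   Returns NULL if LOC is absent (and not created) or the table is full.  */
static unsigned *
toy_lookup (toy_location loc, bool create)
{
  for (size_t i = 0; i < toy_count; i++)
    if (toy_table[i].loc == loc)
      return &toy_table[i].mask;
  if (!create || toy_count == TOY_MAX_ENTRIES)
    return NULL;
  toy_table[toy_count].loc = loc;
  toy_table[toy_count].mask = 0;
  return &toy_table[toy_count++].mask;
}

/* Suppress the groups in OPT at LOC, or re-enable them if SUPP is false.  */
static void
toy_suppress_warning_at (toy_location loc, unsigned opt, bool supp)
{
  assert (opt != 0);   /* callers must name at least one group */
  unsigned *mask = toy_lookup (loc, supp);
  if (mask == NULL)
    return;
  if (supp)
    *mask |= opt;
  else
    *mask &= ~opt;
}

/* True if any group in OPT is suppressed at LOC.  */
static bool
toy_warning_suppressed_at (toy_location loc, unsigned opt)
{
  unsigned *mask = toy_lookup (loc, false);
  return mask != NULL && (*mask & opt) != 0;
}

/* Give TO the same suppression state as FROM.  */
static void
toy_copy_warning (toy_location to, toy_location from)
{
  unsigned *src = toy_lookup (from, false);
  unsigned bits = src != NULL ? *src : 0;
  unsigned *dst = toy_lookup (to, bits != 0);
  if (dst != NULL)
    *dst = bits;
}
```

Keying the state on the location rather than on a single per-node bit is what lets one warning group be silenced without silencing all the others.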




Re: [patch v2] Fortran: fix sm computation in CFI_allocate [PR93524]

2021-06-21 Thread Sandra Loosemore

On 6/21/21 5:42 AM, Tobias Burnus wrote:

On 21.06.21 08:05, Sandra Loosemore wrote:

I ran into this bug in CFI_allocate while testing something else and 
then realized there was already a PR open for it.  It seems like an 
easy fix, and I've used Tobias's test case from the issue more or less 
verbatim.


There were some other bugs added on to this issue but I think they 
have all been fixed already except for this one.


OK to check in?

OK – but see some comments below.


Revised patch attached.  How's this one?

-Sandra
commit 323fda07729fa0b0f2d1f8b4269db874280ac318
Author: Sandra Loosemore 
Date:   Mon Jun 21 13:25:55 2021 -0700

Fortran: fix sm computation in CFI_allocate [PR93524]

This patch fixes a bug in setting the step multiplier field in the
C descriptor for array dimensions > 2.

2021-06-21  Sandra Loosemore  
Tobias Burnus  

libgfortran/
	PR fortran/93524
	* runtime/ISO_Fortran_binding.c (CFI_allocate): Fix
	sm computation.

gcc/testsuite/
	PR fortran/93524
	* gfortran.dg/pr93524.c: New.
	* gfortran.dg/pr93524.f90: New.

diff --git a/gcc/testsuite/gfortran.dg/pr93524.c b/gcc/testsuite/gfortran.dg/pr93524.c
new file mode 100644
index 000..24e5e09
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr93524.c
@@ -0,0 +1,33 @@
+/* Test the fix for PR93524, in which CFI_allocate was computing
+   sm incorrectly for dimensions > 2.  */
+
+#include <stddef.h>  // For size_t
+#include "../../../libgfortran/ISO_Fortran_binding.h"
+
+void my_fortran_sub_1 (CFI_cdesc_t *dv); 
+void my_fortran_sub_2 (CFI_cdesc_t *dv); 
+
+int main ()
+{
+  CFI_CDESC_T (3) a;
+  CFI_cdesc_t *dv = (CFI_cdesc_t *) &a;
+  // dv, base_addr, attribute,type, elem_len, rank, extents
+  CFI_establish (dv, NULL, CFI_attribute_allocatable, CFI_type_float, 0, 3, NULL); 
+
+  if (dv->base_addr != NULL)
+    return 1;  // shall not be allocated
+
+  CFI_index_t lower_bounds[] = {-10, 0, 3}; 
+  CFI_index_t upper_bounds[] = {10, 5, 10}; 
+  size_t elem_len = 0;  // only needed for strings
+  if (CFI_SUCCESS != CFI_allocate (dv, lower_bounds, upper_bounds, elem_len))
+    return 2;
+
+  if (!CFI_is_contiguous (dv))
+    return 2;  // allocatables shall be contiguous, unless a strided section is used
+
+  my_fortran_sub_1 (dv);
+  my_fortran_sub_2 (dv);
+  CFI_deallocate (dv);
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/pr93524.f90 b/gcc/testsuite/gfortran.dg/pr93524.f90
new file mode 100644
index 000..0cebc8f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr93524.f90
@@ -0,0 +1,17 @@
+! { dg-additional-sources pr93524.c }
+! { dg-do run }
+!
+! Test the fix for PR93524.  The main program is in pr93524.c.
+
+subroutine my_fortran_sub_1 (A) bind(C)
+  real :: A(:, :, :)
+  if (any (lbound(A) /= 1)) stop 1
+  if (any (ubound(A) /= [21,6,8])) stop 2
+  if (.not. is_contiguous (A)) stop 3
+end
+subroutine my_fortran_sub_2 (A) bind(C)
+  real, ALLOCATABLE :: A(:, :, :)
+  if (any (lbound(A) /= [-10,0,3])) stop 1
+  if (any (ubound(A) /= [10,5,10])) stop 2
+  if (.not. is_contiguous (A)) stop 3
+end subroutine my_fortran_sub_2
diff --git a/libgfortran/runtime/ISO_Fortran_binding.c b/libgfortran/runtime/ISO_Fortran_binding.c
index 20833ad..0978832 100644
--- a/libgfortran/runtime/ISO_Fortran_binding.c
+++ b/libgfortran/runtime/ISO_Fortran_binding.c
@@ -254,10 +254,7 @@ CFI_allocate (CFI_cdesc_t *dv, const CFI_index_t lower_bounds[],
 	{
 	  dv->dim[i].lower_bound = lower_bounds[i];
 	  dv->dim[i].extent = upper_bounds[i] - dv->dim[i].lower_bound + 1;
-	  if (i == 0)
-	dv->dim[i].sm = dv->elem_len;
-	  else
-	dv->dim[i].sm = dv->elem_len * dv->dim[i - 1].extent;
+	  dv->dim[i].sm = dv->elem_len * arr_len;
 	  arr_len *= dv->dim[i].extent;
 }
 }
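For reference, the fixed rule can be exercised on its own: in a contiguous column-major array, the byte stride sm of dimension i is elem_len times the product of all lower extents, which is what the running product arr_len tracks in the patched loop. The sketch below uses a simplified, hypothetical struct toy_dim instead of the real CFI_dim_t, and keeps the old formula for comparison, using the extents from the test case (21, 6, 8) with 4-byte floats.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for one CFI_dim_t entry; only the fields used here.  */
struct toy_dim
{
  ptrdiff_t extent;
  ptrdiff_t sm;   /* byte distance between consecutive elements */
};

/* Patched rule: sm[i] = elem_len * extent[0] * ... * extent[i-1].  */
static void
compute_sm (struct toy_dim *dim, int rank, size_t elem_len)
{
  assert (rank >= 0);
  ptrdiff_t arr_len = 1;   /* product of the extents seen so far */
  for (int i = 0; i < rank; i++)
    {
      dim[i].sm = (ptrdiff_t) elem_len * arr_len;
      arr_len *= dim[i].extent;
    }
}

/* Pre-patch rule: only multiplies by the immediately preceding extent,
   so it goes wrong from the third dimension on.  */
static void
compute_sm_old (struct toy_dim *dim, int rank, size_t elem_len)
{
  assert (rank >= 0);
  for (int i = 0; i < rank; i++)
    dim[i].sm = (i == 0) ? (ptrdiff_t) elem_len
                         : (ptrdiff_t) elem_len * dim[i - 1].extent;
}
```

With extents 21, 6, 8 and elem_len 4, the fixed rule gives sm = 4, 84, 504, while the old one gives 24 for the third dimension instead of 504.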


Re: [Patch, fortran V2] PR fortran/93308/93963/94327/94331/97046 problems raised by descriptor handling

2021-06-21 Thread Tobias Burnus

Hi José,

On 21.06.21 19:52, José Rui Faustino de Sousa wrote:

On 21/06/21 16:46, Tobias Burnus wrote:

Well, as said: directly into the compiler where currently the call to
libgomp is.


(should be libgfortran)

I meant converting the operation done
by the libgfortran/runtime/ISO_Fortran_binding.c functions
* cfi_desc_to_gfc_desc and
* gfc_desc_to_cfi_desc

into tree code, generated in place by the current callers
* gfor_fndecl_gfc_to_cfi (in trans-decl.c)
* gfc_conv_gfc_desc_to_cfi_desc (in trans-expr.c)

And then effectively retiring those functions (except for
old code which still calls them).

 * * *

However, that's independent from the patch you had submitted
and which is fine except for the two tiny nits.

Tobias
-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


[PATCH] ipa-sra: Fix thinko when overriding safe_to_import_accesses (PR 101066)

2021-06-21 Thread Martin Jambor
Hi,

The "new" IPA-SRA has a more difficult job than the previous
not-truly-IPA version when identifying situations in which a parameter
passed by reference can be passed into a third function and only then
converted to one passed by value (and possibly "split" at the same
time).

In order to allow this, two conditions must be fulfilled.  First, the
call to the third function must happen before any modifications of
memory, because such a modification could change the value passed by
reference.
Second, in order to make sure we do not introduce new (invalid)
dereferences, the call must postdominate the entry BB.

The second condition is actually not necessary if the caller function
is also certain to dereference the pointer, but the first one must
still hold.  Unfortunately, the code making this overriding decision
also happens to trigger when the first condition is not fulfilled.
This is fixed in the following patch.
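Why the first condition matters can be shown with a hypothetical example (this is not the pr101066.c test case, whose contents are not reproduced here). When the call follows a store and the pointers alias, moving the load of *p ahead of the store, which is roughly what passing it by value amounts to, observes a stale value. The predicate at the end is a hypothetical restatement of the fixed check, not code from ipa-sra.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Callee that only reads through its by-reference parameter, so it is
   a candidate for having P turned into a by-value int.  */
static int
callee (const int *p)
{
  return *p + 1;
}

/* The call comes after a store.  When P and Q alias, CALLEE must see
   the value stored through Q.  */
static int
caller (int *p, int *q)
{
  *q = 41;
  return callee (p);
}

/* Roughly what CALLER becomes if *P is loaded before the store and
   passed by value: the store through Q is missed.  */
static int
caller_wrongly_split (int *p, int *q)
{
  int val = *p;    /* load performed before the store */
  *q = 41;
  return val + 1;  /* CALLEE's body with a value parameter */
}

/* Hypothetical predicate mirroring the fixed condition: without
   safe_to_import_accesses, the override is only allowed when the call
   precedes every store and all callee accesses are present.  */
static bool
may_split_across_edge (bool safe_to_import_accesses, bool before_any_store,
                       bool all_callee_accesses_present)
{
  if (safe_to_import_accesses)
    return true;
  return before_any_store && all_callee_accesses_present;
}
```

Called with the same address for both pointers, caller() returns 42 while caller_wrongly_split() returns 1.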

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux, OK for trunk
and the gcc-11 branch?  On gcc-10, I might just remove the override
altogether; the case might not be important enough to justify changing
the LTO format.

Thanks,

Martin



gcc/ChangeLog:

2021-06-16  Martin Jambor  

PR ipa/101066
* ipa-sra.c (class isra_call_summary): New member
m_before_any_store, initialize it in the constructor.
(isra_call_summary::dump): Dump the new field.
(ipa_sra_call_summaries::duplicate): Copy it.
(process_scan_results): Set it.
(isra_write_edge_summary): Stream it.
(isra_read_edge_summary): Likewise.
(param_splitting_across_edge): Only override
safe_to_import_accesses if m_before_any_store is set.

gcc/testsuite/ChangeLog:

2021-06-16  Martin Jambor  

PR ipa/101066
* gcc.dg/ipa/pr101066.c: New test.
---
 gcc/ipa-sra.c                       | 15 +++++++++++++--
 gcc/testsuite/gcc.dg/ipa/pr101066.c | 20 ++++++++++++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr101066.c

diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
index 3272daf56e4..965e246d788 100644
--- a/gcc/ipa-sra.c
+++ b/gcc/ipa-sra.c
@@ -343,7 +343,7 @@ class isra_call_summary
 public:
   isra_call_summary ()
 : m_arg_flow (), m_return_ignored (false), m_return_returned (false),
-  m_bit_aligned_arg (false)
+  m_bit_aligned_arg (false), m_before_any_store (false)
   {}
 
   void init_inputs (unsigned arg_count);
@@ -362,6 +362,10 @@ public:
 
   /* Set when any of the call arguments are not byte-aligned.  */
   unsigned m_bit_aligned_arg : 1;
+
+  /* Set to true if the call happened before any (other) store to memory in the
+ caller.  */
+  unsigned m_before_any_store : 1;
 };
 
 /* Class to manage function summaries.  */
@@ -491,6 +495,8 @@ isra_call_summary::dump (FILE *f)
 fprintf (f, "return value ignored\n");
   if (m_return_returned)
 fprintf (f, "return value used only to compute caller return value\n");
+  if (m_before_any_store)
+fprintf (f, "happens before any store to memory\n");
   for (unsigned i = 0; i < m_arg_flow.length (); i++)
 {
   fprintf (f, "Parameter %u:\n", i);
@@ -535,6 +541,7 @@ ipa_sra_call_summaries::duplicate (cgraph_edge *, cgraph_edge *,
   new_sum->m_return_ignored = old_sum->m_return_ignored;
   new_sum->m_return_returned = old_sum->m_return_returned;
   new_sum->m_bit_aligned_arg = old_sum->m_bit_aligned_arg;
+  new_sum->m_before_any_store = old_sum->m_before_any_store;
 }
 
 
@@ -2374,6 +2381,7 @@ process_scan_results (cgraph_node *node, struct function *fun,
unsigned count = gimple_call_num_args (call_stmt);
isra_call_summary *csum = call_sums->get_create (cs);
csum->init_inputs (count);
+   csum->m_before_any_store = uses_memory_as_obtained;
for (unsigned argidx = 0; argidx < count; argidx++)
  {
if (!csum->m_arg_flow[argidx].pointer_pass_through)
@@ -2546,6 +2554,7 @@ isra_write_edge_summary (output_block *ob, cgraph_edge *e)
  bp_pack_value (&bp, csum->m_return_ignored, 1);
  bp_pack_value (&bp, csum->m_return_returned, 1);
  bp_pack_value (&bp, csum->m_bit_aligned_arg, 1);
+  bp_pack_value (&bp, csum->m_before_any_store, 1);
  streamer_write_bitpack (&bp);
 }
 
@@ -2664,6 +2673,7 @@ isra_read_edge_summary (struct lto_input_block *ib, cgraph_edge *cs)
  csum->m_return_ignored = bp_unpack_value (&bp, 1);
  csum->m_return_returned = bp_unpack_value (&bp, 1);
  csum->m_bit_aligned_arg = bp_unpack_value (&bp, 1);
+  csum->m_before_any_store = bp_unpack_value (&bp, 1);
 }
 
 /* Read intraprocedural analysis information about NODE and all of its outgoing
@@ -3420,7 +3430,8 @@ param_splitting_across_edge (cgraph_edge *cs)
}
  else if (!ipf->safe_to_import_accesses)
{
- if (!all_callee_accesses_present_p (param_desc, arg_desc))
+ if (!csum->m_before_any_store
+ || !all_callee_accesses_present_p (param_desc, arg_desc))
  

RE: [PATCH][pushed] gcov: update documentation entry about string format

2021-06-21 Thread Eugene Rozenfeld via Gcc-patches
Thank you for updating the documentation, Martin.

The following line can now be removed:

padding: | char:0 | char:0 char:0 | char:0 char:0 char:0

Eugene

-----Original Message-----
From: Gcc-patches  On Behalf Of Martin Liška
Sent: Thursday, June 17, 2021 2:40 AM
To: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] [PATCH][pushed] gcov: update documentation entry about string format

gcc/ChangeLog:

* gcov-io.h: Update documentation entry about string format.
---
  gcc/gcov-io.h | 7 +++----
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index f7584eb9679..ff92afe63df 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -42,15 +42,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  
 Numbers are recorded in the 32 bit unsigned binary form of the
 endianness of the machine generating the file. 64 bit numbers are
-   stored as two 32 bit numbers, the low part first.  Strings are
-   padded with 1 to 4 NUL bytes, to bring the length up to a multiple
-   of 4. The number of 4 bytes is stored, followed by the padded
+   stored as two 32 bit numbers, the low part first.
+   The number of bytes is stored, followed by the
 string. Zero length and NULL strings are simply stored as a length
 of zero (they have no trailing NUL or padding).
  
int32:  byte3 byte2 byte1 byte0 | byte0 byte1 byte2 byte3
int64:  int32:low int32:high
-   string: int32:0 | int32:length char* char:0 padding
+   string: int32:0 | int32:length char* char:0
padding: | char:0 | char:0 char:0 | char:0 char:0 char:0
item: int32 | int64 | string
  
--
2.32.0
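To make the updated layout concrete, here is a minimal sketch that serializes a string into a byte buffer as the revised comment describes: a length word in host byte order, then the characters and a trailing NUL, with no padding to a multiple of 4. The comment does not spell out whether the stored length counts the trailing NUL; this sketch assumes it does. put_u32() and encode_string() are hypothetical helpers, not the gcov-io.c API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store V at BUF + OFF in the host's byte order, as gcov-io.h
   describes for all numbers.  Returns the new offset.  */
static size_t
put_u32 (unsigned char *buf, size_t off, uint32_t v)
{
  memcpy (buf + off, &v, sizeof v);
  return off + sizeof v;
}

/* Encode S as: int32 length, then the characters and a trailing NUL,
   with no padding.  NULL and "" are stored as a bare length of zero.
   ASSUMPTION: the length counts the trailing NUL.  Returns the number
   of bytes written.  */
static size_t
encode_string (unsigned char *buf, const char *s)
{
  assert (buf != NULL);
  if (s == NULL || *s == '\0')
    return put_u32 (buf, 0, 0);
  size_t len = strlen (s) + 1;
  size_t off = put_u32 (buf, 0, (uint32_t) len);
  memcpy (buf + off, s, len);
  return off + len;
}
```

Under these assumptions "ab" takes 7 bytes (4 for the length, 3 for the characters and NUL), where the old padded layout would have needed 8.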



Re: [Patch, fortran V2] PR fortran/93308/93963/94327/94331/97046 problems raised by descriptor handling

2021-06-21 Thread José Rui Faustino de Sousa via Gcc-patches

Hi Tobias,

On 21/06/21 16:46, Tobias Burnus wrote:

Well, as said: directly into the compiler where currently the call to
libgomp is.

>

I don't think I understand where you mean. You don't mean the includes in
"f95-lang.c", do you?


Best regards,
José Rui




[committed] libstdc++: Improve std::lock algorithm

2021-06-21 Thread Jonathan Wakely via Gcc-patches
The current std::lock algorithm is the one called "persistent" in Howard
Hinnant's https://howardhinnant.github.io/dining_philosophers.html post.
While it tends to perform acceptably fast, it wastes a lot of CPU cycles
by continuously locking and unlocking the uncontended mutexes.
Effectively, it's a spin lock with no back-off.

This replaces it with the one Howard calls "smart and polite". It's
smart, because when a Mi.try_lock() call fails because mutex Mi is
contended, the algorithm reorders the mutexes until Mi is first, then
calls Mi.lock(), to block until Mi is no longer contended.  It's
polite because it uses std::this_thread::yield() between the failed
Mi.try_lock() call and the Mi.lock() call. (In reality it uses
__gthread_yield() directly, because using this_thread::yield() would
require shuffling code around to avoid a circular dependency.)

This version of the algorithm is inspired by some hints from Howard, so
that it has strictly bounded stack usage. As the comment in the code
says:

// This function can recurse up to N levels deep, for N = 1+sizeof...(L1).
// On each recursion the lockables are rotated left one position,
// e.g. depth 0: l0, l1, l2; depth 1: l1, l2, l0; depth 2: l2, l0, l1.
// When a call to l_i.try_lock() fails it recurses/returns to depth=i
// so that l_i is the first argument, and then blocks until l_i is locked.

The 'i' parameter is the desired permutation of the lockables, and the
'depth' parameter is the depth in the call stack of the current
instantiation of the function template. If i == depth then the function
calls l0.lock() and then l1.try_lock()... for each lockable in the
parameter pack l1.  If i > depth then the function rotates the lockables
to the left one place, and calls itself again to go one level deeper.
Finally, if i < depth then the function returns to a shallower depth,
equivalent to a right rotate of the lockables.  When a call to
try_lock() fails, i is set to the index of the contended lockable, so
that the next call to l0.lock() will use the contended lockable as l0.
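The "smart and polite" strategy can be sketched in plain C with POSIX mutexes. This illustrates the idea under stated assumptions and is not the committed libstdc++ code: instead of rotating a parameter pack through recursive calls, it rotates an index over an array; it uses sched_yield() where libstdc++ calls __gthread_yield(); and lock_all()/unlock_all() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Lock all N mutexes in M.  Block on M[FIRST], then try the others in
   rotated order.  If one is contended, release everything, yield
   ("polite"), and next time block on the contended one ("smart").  */
static void
lock_all (pthread_mutex_t **m, size_t n)
{
  size_t first = 0;   /* index of the mutex we block on */
  assert (n > 0);
  for (;;)
    {
      pthread_mutex_lock (m[first]);
      size_t failed = n;   /* n means "every try_lock succeeded" */
      size_t k;
      for (k = 1; k < n; k++)
        {
          size_t j = (first + k) % n;
          if (pthread_mutex_trylock (m[j]) != 0)
            {
              failed = j;
              break;
            }
        }
      if (failed == n)
        return;   /* all N mutexes are now held */
      /* Release the ones locked before the failed try_lock.  */
      for (size_t u = 1; u < k; u++)
        pthread_mutex_unlock (m[(first + u) % n]);
      pthread_mutex_unlock (m[first]);
      sched_yield ();   /* give the owner a chance to release it */
      first = failed;   /* block on the contended mutex next */
    }
}

static void
unlock_all (pthread_mutex_t **m, size_t n)
{
  for (size_t i = 0; i < n; i++)
    pthread_mutex_unlock (m[i]);
}
```

Releasing everything before blocking on the contended mutex is what keeps the scheme deadlock-free: a thread never waits on one mutex of the set while holding another.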

This commit also replaces the std::try_lock implementation details. The
new code is identical in behaviour, but uses a pair of constrained
function templates. This avoids instantiating a class template, and is a
little simpler to call where used in std::__detail::__lock_impl and
std::try_lock.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/mutex (__try_to_lock): Move to __detail namespace.
(struct __try_lock_impl): Replace with ...
(__detail::__try_lock_impl(tuple&)): New
function templates to implement std::try_lock.
(try_lock): Use new __try_lock_impl.
(__detail::__lock_impl(int, int&, L0&, L1&...)): New function
template to implement std::lock.
(lock): Use __lock_impl.

Tested powerpc64le-linux. Committed to trunk.

commit 6cf0040fff78a665db31a6a8dee60b12eef2e590
Author: Jonathan Wakely 
Date:   Mon Jun 21 13:35:18 2021

libstdc++: Improve std::lock algorithm


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Qing Zhao via Gcc-patches


> On Jun 21, 2021, at 11:18 AM, Kees Cook  wrote:
> 
> On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
>> So, if “pattern value” is “0x”, then it’s a valid canonical 
>> virtual memory address.  However, for most OS, “0x” should 
>> be not in user space.
>> 
>> My question is, is “0xF” good for pointer? Or 
>> “0x” better?
> 
> I think 0xFF repeating is fine for this version. Everything else is a
> "nice to have" for the pattern-init, IMO. :)

Okay, thank you!

Qing
> 
> -- 
> Kees Cook



[PATCH] testsuite: add -fwrapv for 950704-1.c

2021-06-21 Thread Xi Ruoyao via Gcc-patches
This test relies on signed overflow wrapping around.  Without -fwrapv
it is known to fail on mips (and possibly on other targets as well).

gcc/testsuite/

* gcc.c-torture/execute/950704-1.c: Add -fwrapv to avoid
  undefined behavior.
---
 gcc/testsuite/gcc.c-torture/execute/950704-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.c-torture/execute/950704-1.c b/gcc/testsuite/gcc.c-torture/execute/950704-1.c
index f11aff8cabc..67fe0885e5a 100644
--- a/gcc/testsuite/gcc.c-torture/execute/950704-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/950704-1.c
@@ -1,3 +1,4 @@
+/* { dg-additional-options "-fwrapv" } */
 int errflag;
 
 long long
-- 
2.32.0
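As a general illustration (not the contents of 950704-1.c): in ISO C, LLONG_MAX + 1 is undefined behavior, and -fwrapv makes GCC define signed overflow as two's-complement wrap-around. Code that needs wrap-around without relying on the flag can use unsigned arithmetic, or GCC's __builtin_add_overflow, which stores the wrapped result even when it reports an overflow. The function names below are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Unsigned addition wraps by definition.  Converting the out-of-range
   result back to long long is implementation-defined; GCC defines it
   as reduction modulo 2^N.  */
static long long
wrapping_add (long long a, long long b)
{
  return (long long) ((unsigned long long) a + (unsigned long long) b);
}

/* Same result via the GCC/Clang builtin; the overflow flag is ignored
   because only the wrapped value is wanted here.  */
static long long
wrapping_add_builtin (long long a, long long b)
{
  long long r;
  (void) __builtin_add_overflow (a, b, &r);
  return r;
}
```

Both return LLONG_MIN for LLONG_MAX + 1 when compiled with GCC, with or without -fwrapv.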





Re: [Patch, fortran V2] PR fortran/93308/93963/94327/94331/97046 problems raised by descriptor handling

2021-06-21 Thread Tobias Burnus

Hi José,

On 21.06.21 17:51, José Rui Faustino de Sousa via Fortran wrote:

On 21/06/21 13:46, Tobias Burnus wrote:


(in principle, I'd like to have the libgfortran function moved to the
compiler proper to avoid some issues, but that's admittedly a task
independent of your work.)

cfi_desc_to_gfc_desc and gfc_desc_to_cfi_desc from ISO_Fortran_binding.c,
right?

Yes.


So, I could look further into that. Where would you like them placed?

Well, as said: directly into the compiler where currently the call to
libgomp is.

LGTM – except for one minor nit.


Found a second tiny nit:

+  if (GFC_DESCRIPTOR_DATA (d))
+for (n = 0; n < GFC_DESCRIPTOR_RANK (d); n++)
+  {
+   CFI_index_t lb = 1;
+
+   if (s->attribute != CFI_attribute_other)

There is trailing whitespace in the otherwise empty line.


In trans-expr.c's gfc_conv_gfc_desc_to_cfi_desc:

/* Transfer values back to gfc descriptor.  */
+  if (cfi_attribute != 2
+  && !fsym->attr.value
+  && fsym->attr.intent != INTENT_IN)

Can you add after the '2' the string '  /* CFI_attribute_other. */'
to make the number less magic.


Yes... I had the same idea... :-) But all those constants are defined
in "ISO_Fortran_binding.h"... And moving all those definitions would
be a major change... So I left it as it was...


Well, I am currently only asking to add a comment after the "2;".

Fixing those two nits (removing the trailing whitespace and adding a
comment) should be trivial.

* * *

However, in the long run, I think we should put it into either a
separate file, which is included into ISO_Fortran_binding.h and the
proper compiler (and installed alongside ISO_Fortran_binding.h) - or
just in libgfortran.h and adding some check/(static)assert that it
matches the value in ISO_Fortran_binding.h.

Or, possibly, we could also include ISO_Fortran_binding.h when building
the compiler itself, possibly adding some '#ifdef' code to disable parts
we do not want when we #include it.

(We already have '#include "libgfortran.h"' in gcc/fortran/gfortran.h.)
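The static-assert idea can be sketched as follows, assuming ISO_Fortran_binding.h defines CFI_attribute_pointer, CFI_attribute_allocatable and CFI_attribute_other as 0, 1 and 2 (the last matching the magic '2' discussed above). The three macros are stand-ins so the snippet compiles on its own; a real build would include the header instead. GFC_CFI_ATTRIBUTE_OTHER and needs_copy_back() are hypothetical names, and the bool parameters stand in for fsym->attr.value and the INTENT_IN test.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the values in libgfortran's ISO_Fortran_binding.h.  */
#define CFI_attribute_pointer     0
#define CFI_attribute_allocatable 1
#define CFI_attribute_other       2

/* Hypothetical compiler-side copy of the constant, e.g. in libgfortran.h.  */
enum { GFC_CFI_ATTRIBUTE_OTHER = 2 };

/* Fails at compile time if the two definitions ever drift apart.  */
_Static_assert (GFC_CFI_ATTRIBUTE_OTHER == CFI_attribute_other,
                "compiler and ISO_Fortran_binding.h disagree");

/* The copy-back condition from gfc_conv_gfc_desc_to_cfi_desc, with the
   magic number replaced by the named constant.  */
static bool
needs_copy_back (int cfi_attribute, bool is_value, bool is_intent_in)
{
  return cfi_attribute != GFC_CFI_ATTRIBUTE_OTHER
         && !is_value
         && !is_intent_in;
}
```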

Tobias



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Kees Cook via Gcc-patches
On Mon, Jun 21, 2021 at 03:39:45PM +, Qing Zhao wrote:
> So, if “pattern value” is “0x”, then it’s a valid canonical 
> virtual memory address.  However, for most OS, “0x” should be 
> not in user space.
> 
> My question is, is “0xF” good for pointer? Or 
> “0x” better?

I think 0xFF repeating is fine for this version. Everything else is a
"nice to have" for the pattern-init, IMO. :)

-- 
Kees Cook


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Qing Zhao via Gcc-patches


> On Jun 21, 2021, at 10:35 AM, Richard Biener  wrote:
>>> I think we can drop -fauto-var-init=pattern and just go with block
>>> initializing which will cover padding as well which means we can
>>> stay with the odd -ftrivial-auto-var-init name used by CLANG and
>>> add no additional options.
>> 
>> Yes, this is a good idea. 
>> 
>> block initializing will cover all paddings automatically. 
>> 
>> Shall we do block initializing for both “zero initialization” and
>> “pattern initialization”?
>> 
>> Currently, for zero initialization, I used the following:
>> 
> +case AUTO_INIT_ZERO:
> +  init = build_zero_cst (TREE_TYPE (var));
> +  expand_assignment (var, init, false);
> +  break;
>> 
>> Looks like that the current “expand_assignment” does not initialize
>> paddings with zeroes. 
>> Shall I also use “memset” for “zero initialization”?
> 
> I'd say so, yes. 

Okay.

One more question for the current “expand_builtin_memset”:

Does the current implementation of “expand_builtin_memset” automatically handle
short-length memsets optimally?

i.e., do I need to specially handle char, short, or other types that
fit in a register?


>>> 
>>> There's no "safe" pattern besides all-zero for all "undefined" uses
>>> (note that uses do not necessarily use declared types).  Which is why
>>> recommending pattern init is somewhat misguided.  There's maybe 
>>> some useful pattern that more readily produces crashes, those that
>>> produce a FP sNaN for all of the float types.
>> 
>> So, pattern value as 0xFF might be better than 0xAA since 0x
>> will be a NaN value for floating type?
> 
> I think for debugging NaNs are quite nice, yes. 

For floating point, 0x is good. 
But for pointer type, is it good? (See my other email to Kees).

>> 
>> Not sure whether it’s necessary to expose this to user.
>> 
>> One question that is important to the implementation is:
>> 
>> Shall we use “byte-repeated” or “word-repeated” pattern?
>> Is “word-repeated” pattern better than “byte-repeated” pattern?
>> 
>> For implementation, “byte-repeated” pattern will make the whole
>> implementation much simpler since both “zero initialization” 
>> and “pattern initialization” can be implemented with “memset” with
>> different “value”.  
>> 
>> So, if “word-repeated” pattern will not have too much more benefit, I
>> will prefer “byte-repeated” pattern.
>> 
>> Let me know your comments here.
> 
> I have no strong opinion and prefer byte repetition for simplicity. But I 
> would document this as implementation detail that can change. 

Okay, if we finally decide to go with byte repetition, I will document this as
an implementation detail that can be changed later.
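A source-level model of byte-repeated initialization, assuming IEEE 754 floating point and pointers the same size as uintptr_t: a single memset over the whole object covers padding as well as members, zero and pattern init differ only in the byte value, and a 0xFF byte makes every float and double a NaN. init_bytes() and struct sample are hypothetical; this is not GCC's expansion code.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* On common ABIs this struct has padding after C (and possibly after F).  */
struct sample
{
  char c;
  double d;
  float f;
  void *p;
};

/* Byte-repeated initialization: every byte of OBJ, padding included,
   is set to BYTE.  BYTE is 0 for zero init, e.g. 0xFF for pattern init.  */
static void
init_bytes (void *obj, size_t size, unsigned char byte)
{
  assert (obj != NULL);
  memset (obj, byte, size);
}
```

With 0xFF, every double and float reads back as a NaN (all-ones exponent, non-zero mantissa), and every pointer as all-ones bits.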

Qing
> 
> Richard. 
> 
>>> 
 
 
 As said, for example glibc allocator hardening with MALLOC_PERTURB_
 uses simple byte-init.
 
 What’s the pattern glibc used?
>>> 
>>> The value of the MALLOC_PERTURB_ environment truncated to a byte.
>> 
>> Okay.
>> 
>> thanks.
>> 
>> Qing
>>> 
>>> Richard.
>>> 
> 



Re: [Patch, fortran V2] PR fortran/93308/93963/94327/94331/97046 problems raised by descriptor handling

2021-06-21 Thread José Rui Faustino de Sousa via Gcc-patches

On 21/06/21 13:46, Tobias Burnus wrote:

Hi José,

(in principle, I'd like to have the libgfortran function moved to the
compiler proper to avoid some issues, but that's admittedly a task
independent of your work.)



cfi_desc_to_gfc_desc and gfc_desc_to_cfi_desc from ISO_Fortran_binding.c, right?

Since fixing:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100917

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100910

would very likely require passing an additional "kind" parameter (as
would future descriptor unification), that would be a good idea.


I had a patch to do this, passing the kind value, but AFAIR there were 
issues with kind values for C_PTR and C_FUNPTR (and I didn't want to 
mess with the ABI also in one go)... But I might have fixed that 
somewhere else afterwards...


So, I could look further into that. Where would you like them placed?

LGTM – except for one minor nit. In trans-expr.c's gfc_conv_gfc_desc_to_cfi_desc:


    /* Transfer values back to gfc descriptor.  */
+  if (cfi_attribute != 2
+  && !fsym->attr.value
+  && fsym->attr.intent != INTENT_IN)

Can you add after the '2' the string '  /* CFI_attribute_other.  */'
to make the number less magic.



Yes... I had the same idea... :-) But all those constants are defined in 
"ISO_Fortran_binding.h"... And moving all those definitions would be a 
major change... So I left it as it was...


What do you suggest I do?

Best regards,
José Rui





Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Qing Zhao via Gcc-patches
Hi, Kees,

On Jun 18, 2021, at 6:47 PM, Kees Cook <keesc...@chromium.org> wrote:

On Wed, Jun 16, 2021 at 07:39:02PM +, Qing Zhao wrote:
So, the major question now is:

Is one single repeatable pattern enough for pattern initialization for all
different types of auto variables?

If YES, then the implementation of pattern initialization will be much easier
and simpler, as you pointed out, and will save me a lot of pain implementing this part.
If NO, then we have to keep the current complicated implementation, since it
gives us the flexibility to assign different patterns to different types.

Honestly, I don’t have a good justification on this question myself.

The only references I have so far are the current behavior of the CLANG and
Microsoft compilers.

For your reference,
. CLANG uses different patterns for INTEGER  (0x) and FLOAT 
(0x) and 32-bit pointer (0x00AA)
https://reviews.llvm.org/D54604
. Microsoft uses different patterns for INTEGERS (0xE2), FLOAT (1.0)
https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/

My understanding from CLANG’s comment is that these patterns are more likely to crash
the program for the given type, and therefore make potential bugs easier to
catch.

Right, this is the justification for the different patterns. I am
fine with a static value for the first version of this functionality,
as long as it's a non-canonical virtual memory address when evaluated
as a pointer (so that the pattern can't be made to aim at a legitimate
fixed allocatable address in memory).

Just searched online, 
(https://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details)

===
Canonical form addresses run from 0 through 7FFF', and from 
8000' through ', for a total of 256 TB of usable 
virtual address space.
===
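The canonical-form rule quoted above can be checked mechanically: with 48-bit virtual addresses, bits 63 through 47 must all be copies of bit 47. A minimal C sketch, assuming 4-level paging (48-bit addresses; 5-level paging widens the rule to 57 bits); is_canonical_48 is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if ADDR is a canonical x86-64 address under 48-bit virtual
   addressing, i.e. bits 63..47 are all zero or all one.  */
static bool
is_canonical_48 (uint64_t addr)
{
  uint64_t upper = addr >> 47;   /* The 17 bits 63..47.  */
  return upper == 0 || upper == 0x1ffff;
}
```

Under this check the byte 0xAA repeated to 64 bits is non-canonical, while all-ones is canonical but lies in the upper half of the address space, which most OSes reserve for the kernel.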

So, if the “pattern value” is “0x”, then it’s a valid canonical
virtual memory address.  However, on most OSes, “0x” should
not be in user space.

My question is: is “0xF” good for pointers, or is
“0x” better?

Thanks.
Qing


Don’t know why Microsoft chose the pattern like this.

So, for GCC, what should we do about pattern initialization? Shall we choose
one single repeatable pattern for all types, as you suggested,
or choose different patterns for different types, as CLANG and the Microsoft
compiler do?

Kees, do you have any comment on this?

How does the Linux kernel use CLANG's -ftrivial-auto-var-init=pattern feature?

It's just used as-is from the compiler, and recommended for "debug
builds".

--
Kees Cook



Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Richard Biener
On June 21, 2021 5:11:30 PM GMT+02:00, Qing Zhao  wrote:
>HI, Richard,
>
>> On Jun 21, 2021, at 2:53 AM, Richard Biener 
>wrote:
>> 
>>> 
>>> 
>>> This is for the compatibility with CLANG. -:).
>(https://reviews.llvm.org/D54604)
>> 
>> I don't care about functional 1:1 "compatibility" with CLANG.
>
>Okay.  -:)
>
>> 
>>> 1. Pattern initialization
>>> 
>>>  This is the recommended initialization approach. Pattern
>initialization's
>> 
>> But elsewhere you said pattern initialization is only for debugging,
>> not production …
>
>Yes. Pattern initialization is only for debugging purpose during
>development phase.
>
>> 
>>> Use a pattern that fits them all.  I mean memory allocation
>hardening
>>> fills allocated storage with a repeated (byte) pattern and people
>are
>>> happy with that.  It also makes it easy to spot uninitialized
>storage
>>> from a debugger.  So please, do not over-design this, it really
>doesn't
>>> make any sense and the common case you are inevitably chasing here
>>> would already be fine with a random repeated pattern.
>>> 
>>> So, My question is:
>>> 
>>> If we want to pattern initialize with the single repeated pattern
>for all types, which one is better to use:  “0x”
>>> or “0x”, or the pattern that glibc currently uses?
>What’s that pattern?
>> 
>> It's set by the user.
>
>Yes, it looks like glibc uses a byte-repeated pattern that is set by
>the user through an environment variable.
>
>> 
>>> Will  “0x” in a floating type auto variable crash the
>program?
>>> Will “0x” in a pointer type auto variable crash the program?
>(Might crash?)
>>> 
>>> 
>>> (thus also my suggestion to split out
>>> padding handling - now we can also split out pattern init handling,
>>> maybe somebody else feels like reviewing and approving this, who
>knows).
>>> 
>>> I am okay with further splitting the pattern initialization part into a
>separate patch. Then we will
>>> have 4 independent patches in total:
>>> 
>>> 1. -fauto-var-init=zero and all the handling in other passes for the
>newly added call to .DEFERRED_INIT.
>>> 2. Add -fauto-var-init=pattern
>>> 3. Add -fauto-var-init-padding
>>> 4. Add -ftrivial-auto-var-init for CLANG compatibility.
>>> 
>>> Are the above the correct understanding?
>> 
>> I think we can drop -fauto-var-init=pattern and just go with block
>> initializing which will cover padding as well which means we can
>> stay with the odd -ftrivial-auto-var-init name used by CLANG and
>> add no additional options.
>
>Yes, this is a good idea. 
>
>block initializing will cover all paddings automatically. 
>
>Shall we do block initializing for both “zero initialization” and
>“pattern initialization”?
>
>Currently, for zero initialization, I used the following:
>
>>>> +case AUTO_INIT_ZERO:
>>>> +  init = build_zero_cst (TREE_TYPE (var));
>>>> +  expand_assignment (var, init, false);
>>>> +  break;
>
>It looks like the current “expand_assignment” does not initialize
>padding with zeroes.
>Shall I also use “memset” for “zero initialization”?

I'd say so, yes. 
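The padding point can be seen in plain C: member-wise initialization leaves padding bytes unspecified, whereas a block fill with memset sets every byte of the object, padding included. A minimal sketch; struct padded is a hypothetical type assumed to have padding between its members on common ABIs, and fill_pattern and all_bytes_are are illustrative helpers, not GCC code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: on typical ABIs there are padding bytes after C
   because I needs stricter alignment.  */
struct padded
{
  char c;
  int i;
};

/* Block-initialize SIZE bytes at OBJ with the repeated byte BYTE, as a
   memset-based lowering would; padding is covered automatically.  */
static void
fill_pattern (void *obj, size_t size, unsigned char byte)
{
  memset (obj, byte, size);
}

/* Return nonzero if every byte of OBJ, padding included, equals BYTE.  */
static int
all_bytes_are (const void *obj, size_t size, unsigned char byte)
{
  const unsigned char *p = obj;
  for (size_t k = 0; k < size; k++)
    if (p[k] != byte)
      return 0;
  return 1;
}
```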

>> 
>>> As said, block-initializing with a repeated pattern is OK and I can
>see
>>> that being useful.  Trying to produce "nicer" values for floats,
>bools
>>> and pointers on 32bit platforms is IMHO not going to fix anything
>and
>>> introduce as many problems as it will "fix".
>>> 
>>> Yes, I agree. If we can find a good repeated pattern for pattern
>>> initialization of all types, that will be much easier and simpler to
>>> implement, and I am happy to do that.  (Honestly, the part of the
>>> implementation that took me most of the time is pattern initialization,
>>> and I am still not very comfortable with this part of the code myself.  -:)
>> 
>> There's no "safe" pattern besides all-zero for all "undefined" uses
>> (note that uses do not necessarily use declared types).  Which is why
>> recommending pattern init is somewhat misguided.  There's maybe 
>> some useful pattern that more readily produces crashes, those that
>> produce a FP sNaN for all of the float types.
>
>So, a pattern value of 0xFF might be better than 0xAA, since 0x
>will be a NaN value for floating-point types?

I think for debugging NaNs are quite nice, yes. 
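To make the 0xFF versus 0xAA comparison concrete, assuming IEEE 754 binary32/binary64 float and double: filling either with 0xFF bytes gives an all-ones exponent and a non-zero mantissa, i.e. a NaN. On most targets it is a quiet NaN rather than an sNaN, because the top mantissa bit is also set. Filling with 0xAA bytes gives an ordinary small negative number. A minimal C sketch; float_from_byte and double_from_byte are hypothetical helpers.

```c
#include <math.h>
#include <string.h>

/* Return the float whose every byte equals BYTE.  */
static float
float_from_byte (unsigned char byte)
{
  float f;
  memset (&f, byte, sizeof f);
  return f;
}

/* Return the double whose every byte equals BYTE.  */
static double
double_from_byte (unsigned char byte)
{
  double d;
  memset (&d, byte, sizeof d);
  return d;
}
```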

>> 
>>> And if you block-initialize stuff you then automagically cover
>padding.
>>> I call this a win-win, no?
>>> 
>>> Yes, this will also initialize padding with the pattern (not with zeroes,
>>> as CLANG does).
>>> Should we be compatible with CLANG on this?
>> 
>> No, why?
>
>Okay.
>
>>> in my example code (untested) you then still need
>>> 
>>>  expand_assignment (var, ctor, false);
>>> 
>>> it would be the easiest way to try pattern init with a pattern
>that's
>>> bigger than a byte (otherwise of course the memset path is optimal).
>>> 
>>> If the pattern that is used to initialize all types is
>>> byte-repeatable, for example, 0xA or 0xF, then
>>> we can use memset to initialize all types. However, the potential
>>> problem is that if we later decide
>>> to change to another pattern 

Re: [[PATCH V9] 3/7] CTF/BTF debug formats

2021-06-21 Thread Jose E. Marchesi via Gcc-patches


> OK otherwise.  I think I OKed 1/7 last time and thus this should now have
> all parts OKed by me besides the BPF backend changes.
>
> Please leave others a day or two to comment (and obviously the BPF
> maintainer to ack his part).

The BPF parts are OK. (Speaking as the BPF maintainer.)

> Thanks for your patience,

No, thanks to you for taking the effort and the pain to review this big
and fast-moving patch series.  It is MUCH appreciated by the whole
Oracle team.


Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid

2021-06-21 Thread José Rui Faustino de Sousa via Gcc-patches

Hi Tobias,

On 21/06/21 12:37, Tobias Burnus wrote:

Thus: Do you have a list of patches pending review?

>

https://gcc.gnu.org/pipermail/fortran/2021-April/055924.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055933.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056168.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056167.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056163.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056162.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056155.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056154.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056152.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056159.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055982.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055949.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055946.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html

https://gcc.gnu.org/pipermail/fortran/2021-June/056169.html

https://gcc.gnu.org/pipermail/fortran/2021-April/055921.html

I am not 100% sure this is all of them but it should be most.


Secondly, I assume
you can commit or do you have commit issues?



Up to now there were no problems.

Best regards,
José Rui


Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-21 Thread Qing Zhao via Gcc-patches
HI, Richard,

> On Jun 21, 2021, at 2:53 AM, Richard Biener  wrote:
> 
>> 
>> 
>> This is for the compatibility with CLANG. -:). 
>> (https://reviews.llvm.org/D54604)
> 
> I don't care about functional 1:1 "compatibility" with CLANG.

Okay.  -:)

> 
>> 1. Pattern initialization
>> 
>>  This is the recommended initialization approach. Pattern initialization's
> 
> But elsewhere you said pattern initialization is only for debugging,
> not production …

Yes. Pattern initialization is only for debugging purposes during the development 
phase.

> 
>> Use a pattern that fits them all.  I mean memory allocation hardening
>> fills allocated storage with a repeated (byte) pattern and people are
>> happy with that.  It also makes it easy to spot uninitialized storage
>> from a debugger.  So please, do not over-design this, it really doesn't
>> make any sense and the common case you are inevitably chasing here
>> would already be fine with a random repeated pattern.
>> 
>> So, My question is:
>> 
>> If we want to pattern initialize with the single repeated pattern for all 
>> types, which one is better to use:  “0x”
>> or “0x”, or the pattern that glibc currently uses? What’s that 
>> pattern?
> 
> It's set by the user.

Yes, it looks like glibc uses a byte-repeated pattern that is set by the user 
through an environment variable.

> 
>> Will  “0x” in a floating type auto variable crash the program?
>> Will “0x” in a pointer type auto variable crash the program? (Might 
>> crash?)
>> 
>> 
>> (thus also my suggestion to split out
>> padding handling - now we can also split out pattern init handling,
>> maybe somebody else feels like reviewing and approving this, who knows).
>> 
>> I am okay with further splitting the pattern initialization part into a separate 
>> patch. Then we will
>> have 4 independent patches in total:
>> 
>> 1. -fauto-var-init=zero and all the handling in other passes for the newly 
>> added call to .DEFERRED_INIT.
>> 2. Add -fauto-var-init=pattern
>> 3. Add -fauto-var-init-padding
>> 4. Add -ftrivial-auto-var-init for CLANG compatibility.
>> 
>> Are the above the correct understanding?
> 
> I think we can drop -fauto-var-init=pattern and just go with block
> initializing which will cover padding as well which means we can
> stay with the odd -ftrivial-auto-var-init name used by CLANG and
> add no additional options.

Yes, this is a good idea. 

block initializing will cover all paddings automatically. 

Shall we do block initializing for both “zero initialization” and “pattern 
initialization”?

Currently, for zero initialization, I used the following:

>>> +case AUTO_INIT_ZERO:
>>> +  init = build_zero_cst (TREE_TYPE (var));
>>> +  expand_assignment (var, init, false);
>>> +  break;

It looks like the current “expand_assignment” does not initialize padding 
with zeroes. 
Shall I also use “memset” for “zero initialization”?

> 
>> As said, block-initializing with a repeated pattern is OK and I can see
>> that being useful.  Trying to produce "nicer" values for floats, bools
>> and pointers on 32bit platforms is IMHO not going to fix anything and
>> introduce as many problems as it will "fix".
>> 
>> Yes, I agree. If we can find a good repeated pattern for pattern
>> initialization of all types, that will be much easier and simpler to
>> implement, and I am happy to do that.  (Honestly, the part of the
>> implementation that took me most of the time is pattern initialization,
>> and I am still not very comfortable with this part of the code myself.  -:)
> 
> There's no "safe" pattern besides all-zero for all "undefined" uses
> (note that uses do not necessarily use declared types).  Which is why
> recommending pattern init is somewhat misguided.  There's maybe 
> some useful pattern that more readily produces crashes, those that
> produce a FP sNaN for all of the float types.

So, a pattern value of 0xFF might be better than 0xAA, since 0x will be a 
NaN value for floating-point types?

> 
>> And if you block-initialize stuff you then automagically cover padding.
>> I call this a win-win, no?
>> 
>> Yes, this will also initialize padding with the pattern (not with zeroes,
>> as CLANG does).
>> Should we be compatible with CLANG on this?
> 
> No, why?

Okay.

>> in my example code (untested) you then still need
>> 
>>  expand_assignment (var, ctor, false);
>> 
>> it would be the easiest way to try pattern init with a pattern that's
>> bigger than a byte (otherwise of course the memset path is optimal).
>> 
>> If the pattern that is used to initialize all types is byte-repeatable, for
>> example, 0xA or 0xF, then we can use memset to initialize all types. However,
>> the potential problem is that if we later decide to change to another pattern
>> that might not be byte-repeatable, the memset implementation would no longer
>> be appropriate.
>> 
>> Is it possible that we might change the pattern later?
> 
> The pattern should be documented as an implementation detail 
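On the question of whether a memset-based lowering survives a later change of pattern: a wider pattern can be produced by memset only if it is byte-repeatable, i.e. every byte of the word is the same. A minimal C sketch of that test, assuming a 64-bit pattern word; byte_repeatable_64 is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if all eight bytes of PATTERN are equal, so that
   memset (p, PATTERN & 0xff, n) reproduces it exactly.  */
static bool
byte_repeatable_64 (uint64_t pattern)
{
  uint64_t byte = pattern & 0xff;
  /* Replicate the low byte into every byte position and compare.  */
  return pattern == byte * 0x0101010101010101ull;
}
```

When this returns false, the expand_assignment path with a constructor discussed above would be needed instead of memset.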

[PATCH]middle-end[RFC] slp: new implementation of complex numbers

2021-06-21 Thread Tamar Christina via Gcc-patches
Hi Richi,

This patch is still very much incomplete and I do know that it is missing things,
but it's complete enough that examples work and it allows me to show
what I'm working towards.

Note that this approach will remove a lot of code in tree-vect-slp-patterns, but
to keep the diff readable I've left that code in and just commented out the calls or
removed them where needed.

The patch rewrites the complex numbers detection by splitting the detection of
structure from dataflow analysis.  In principle the biggest difference between
this and the previous implementation is that instead of trying to detect valid
complex operations it *makes* an operation a valid complex operation.

To do this, each operation gets a dual optab which matches the same structure but
has no dataflow requirement.

For example, in this patch I added four: ADDSUB, SUBADD, MUL_ADDSUB and MUL_SUBADD.

There is then a mapping between these and their variants with the dataflow:

* ADDSUB -> COMPLEX_ADD_ROT270
* SUBADD -> COMPLEX_ADD_ROT90
* MUL_ADDSUB -> COMPLEX_MUL_CONJ
* MUL_SUBADD -> COMPLEX_MUL

with the intention that when we detect the structure of an operation we query
the backend for both optabs.

This should result in one of four states:

 * not supported: Move on.
 * Supports ADDSUB only: Rewrite using ADDSUB, set type to 'cannot_transform'
 * Supports COMPLEX_ADD_ROT270 only: Rewrite using ADDSUB, set type to 
'must_transform'
 * Supports both: Rewrite using ADDSUB, set type to 'can_transform'

The idea behind `can_transform` is to check the cost of the inverse
permute needed to use the complex operation and, if this is very expensive,
stick to ADDSUB.  This requires the target to be able to cost the operations
reasonably accurately.
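The mapping and the four outcomes above can be summarised as a small decision function. This is only a sketch: the enum and function names are hypothetical, not identifiers from the patch, and the two booleans stand for the result of querying the backend for the structural optab (e.g. ADDSUB) and for its complex counterpart (e.g. COMPLEX_ADD_ROT270).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the structural optabs and the states above.  */
enum structural_op { OP_ADDSUB, OP_SUBADD, OP_MUL_ADDSUB, OP_MUL_SUBADD };

enum xform_state
{
  XFORM_NONE,    /* Neither optab supported: move on.  */
  XFORM_CANNOT,  /* Structural optab only: 'cannot_transform'.  */
  XFORM_MUST,    /* Complex optab only: 'must_transform'.  */
  XFORM_CAN      /* Both: 'can_transform', decided later by costing.  */
};

/* The structural-to-complex mapping listed in the mail.  */
static const char *
complex_counterpart (enum structural_op op)
{
  switch (op)
    {
    case OP_ADDSUB:     return "COMPLEX_ADD_ROT270";
    case OP_SUBADD:     return "COMPLEX_ADD_ROT90";
    case OP_MUL_ADDSUB: return "COMPLEX_MUL_CONJ";
    case OP_MUL_SUBADD: return "COMPLEX_MUL";
    }
  return NULL;
}

/* HAS_STRUCTURAL / HAS_COMPLEX model the two backend optab queries.  */
static enum xform_state
classify_xform (bool has_structural, bool has_complex)
{
  if (has_structural && has_complex)
    return XFORM_CAN;
  if (has_complex)
    return XFORM_MUST;
  if (has_structural)
    return XFORM_CANNOT;
  return XFORM_NONE;
}
```

In the ADD dump that follows, the target lacks ADDSUB but supports COMPLEX_ADD_ROT270, which corresponds to classify_xform (false, true), the 'must_transform' case.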

So for ADD this looks like

 === vect_match_slp_patterns ===
 Analyzing SLP tree 0x494e970 for patterns
 Found ADDSUB pattern in SLP tree
 Target does not support ADDSUB for vector type vector(4) float
 Found COMPLEX_ADD_ROT270 pattern in SLP tree
 Target supports COMPLEX_ADD_ROT270 vectorization with mode vector(4) float
Pattern matched SLP tree
node 0x494e970 (max_nunits=4, refcnt=1)
op template: REALPART_EXPR <*_10> = _23;
  stmt 0 REALPART_EXPR <*_10> = _23;
  stmt 1 IMAGPART_EXPR <*_10> = _22;
  children 0x494ea00
node 0x494ea00 (max_nunits=4, refcnt=1)
op template: slp_patt_39 = .ADDSUB (_23, _23);
  stmt 0 _23 = _6 + _13;
  stmt 1 _22 = _12 - _8;
  children 0x494eb20 0x494ebb0
node 0x494eb20 (max_nunits=4, refcnt=1)
op template: _13 = REALPART_EXPR <*_3>;
  stmt 0 _13 = REALPART_EXPR <*_3>;
  stmt 1 _12 = IMAGPART_EXPR <*_3>;
node 0x494ebb0 (max_nunits=4, refcnt=1)
op: VEC_PERM_EXPR
  { }
  lane permutation { 0[1] 0[0] }
  children 0x494ec40
node 0x494ec40 (max_nunits=1, refcnt=2)
op template: _8 = REALPART_EXPR <*_5>;
  stmt 0 _8 = REALPART_EXPR <*_5>;
  stmt 1 _6 = IMAGPART_EXPR <*_5>;
  load permutation { 0 1 }

and later during optimize_slp we get

Tranforming SLP expression from ADDSUB to COMPLEX_ADD_ROT270
processing node 0x494ebb0
simplifying permute node 0x494ebb0
Optimized SLP instances:
node 0x494e970 (max_nunits=4, refcnt=1)
op template: REALPART_EXPR <*_10> = _23;
   stmt 0 REALPART_EXPR <*_10> = _23;
   stmt 1 IMAGPART_EXPR <*_10> = _22;
   children 0x494ea00
node 0x494ea00 (max_nunits=4, refcnt=1)
op template: slp_patt_39 = .COMPLEX_ADD_ROT270 (_23, _23);
   stmt 0 _23 = _6 + _13;
   stmt 1 _22 = _12 - _8;
   children 0x494eb20 0x494ebb0
node 0x494eb20 (max_nunits=4, refcnt=1)
op template: _13 = REALPART_EXPR <*_3>;
   stmt 0 _13 = REALPART_EXPR <*_3>;
   stmt 1 _12 = IMAGPART_EXPR <*_3>;
node 0x494ebb0 (max_nunits=4, refcnt=1)
op: VEC_PERM_EXPR
   { }
   lane permutation { 0[0] 0[1] }
   children 0x494ec40
node 0x494ec40 (max_nunits=1, refcnt=2)
op template: _8 = REALPART_EXPR <*_5>;
   stmt 0 _8 = REALPART_EXPR <*_5>;
   stmt 1 _6 = IMAGPART_EXPR <*_5>;

Now I still have to elide the VEC_PERM_EXPR here but that's easy.
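For reference, the two scalar statements in the dumps above, `_23 = _6 + _13` and `_22 = _12 - _8`, are what adding `b` rotated by 270 degrees (multiplied by -i) to `a` looks like, assuming `*_3` is `a` and `*_5` is `b`; the source of this ADD example is not shown, so that reading is an assumption. A minimal C check of the identity using <complex.h>; add_rot270 is a hypothetical helper.

```c
#include <complex.h>

/* Lane-wise form matching the SLP statements, assuming a = *_3 and
   b = *_5: real = a.re + b.im (_13 + _6), imag = a.im - b.re (_12 - _8).  */
static double complex
add_rot270 (double complex a, double complex b)
{
  double re = creal (a) + cimag (b);
  double im = cimag (a) - creal (b);
  return re + im * I;
}
```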

This works for ADD, but it doesn't work well when there's a complicated sequence
of loads.  So for example

#include <complex.h>

#define N 200
void g (double complex a[restrict N], double complex b[restrict N],
        double complex c[restrict N])
{
  for (int i = 0; i < N; i++)
    {
      c[i] = a[i] * b[i];
    }
}

will result in an SLP tree where each operand of the multiply does not get to
see all the original vectors:

Final SLP tree for instance 0x5678a30:
node 0x55965a0 (max_nunits=4, refcnt=2)
op template: REALPART_EXPR <*_7> = _25;
  stmt 0 REALPART_EXPR <*_7> = _25;
  stmt 1 IMAGPART_EXPR <*_7> = _26;
  children 0x5596630
node 0x5596630 (max_nunits=4, refcnt=2)
op: VEC_PERM_EXPR
  stmt 0 _25 = _17 - _22;
  stmt 1 _26 = _23 + _24;
  lane permutation { 0[0] 1[1] }
  children 0x5596a20 0x5596ab0
node 0x5596a20 (max_nunits=1, refcnt=1)
op template: _25 = _17 - _22;
  { }
  children 0x55966c0 0x5596870
node 0x55966c0 (max_nunits=4, refcnt=3)
op template: _17 = _10 * _19;
  stmt 0 _17 = _10 * _19;
  stmt 1 _23 = _10 * _18;
  children 0x5596750 0x55967e0
node 0x5596750 

Re: [ping PATCH v2 2/2] rs6000: Add test for _mm_minpos_epu16

2021-06-21 Thread Paul A. Clarke via Gcc-patches
Gentle ping.

I now realize I forgot to include a blurb about "what changed in v2".

v2:
- Rewrote input computation into a fixed set of useful input sets:
  - All equal.
  - All but one equal.
  - A couple equal, but not in positions that are endian-identical.
  - Minimum first.
  - Minimum last.
  - Values which are large enough that they would be minimums if
    treated as signed, but not if treated as unsigned.
- Convert "dimension of array" computation to DIM macro.
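The last input class relies on _mm_minpos_epu16 comparing its 16-bit lanes as unsigned values: 0xfff4 is among the largest candidates when read as unsigned, but would be a minimum if read as signed. A minimal C illustration; less_u16 and less_s16 are hypothetical helpers, and the conversion to int16_t is implementation-defined before C23 (GCC defines it as reduction modulo 2^16).

```c
#include <stdint.h>

/* Return 1 if A < B when both are read as unsigned 16-bit values.  */
static int
less_u16 (uint16_t a, uint16_t b)
{
  return a < b;
}

/* Return 1 if A < B when both bit patterns are read as signed 16-bit
   values (two's complement).  */
static int
less_s16 (uint16_t a, uint16_t b)
{
  return (int16_t) a < (int16_t) b;
}
```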

On Tue, Jun 08, 2021 at 02:11:55PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Copy the test for _mm_minpos_epu16 from
> gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
> a few adjustments:
> 
> - Adjust the dejagnu directives for powerpc platform.
> - Make the data not be monotonically increasing,
>   such that some of the returned values are not
>   always the first value (index 0).
> - Create a list of input data testing various scenarios
>   including more than one minimum value and different
>   orders and indices of the minimum value.
> - Fix a masking issue where the index was being truncated
>   to 2 bits instead of 3 bits, which wasn't found because
>   all of the returned indices were 0 with the original
>   generated data.
> - Support big-endian.
> 
> 2021-06-08  Paul A. Clarke  
> 
> gcc/testsuite/ChangeLog:
> * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
> gcc/testsuite/gcc.target/i386, make more robust.
> ---
>  .../gcc.target/powerpc/sse4_1-phminposuw.c| 68 +++
>  1 file changed, 68 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c 
> b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
> new file mode 100644
> index ..3bb5a2dfe4f5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
> @@ -0,0 +1,68 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +/* { dg-require-effective-target p8vector_hw } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include 
> +
> +#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
> +
> +static void
> +TEST (void)
> +{
> +  union
> +{
> +  __m128i x;
> +  unsigned short s[8];
> +} src[] =
> +{
> +  { .s = { 0x, 0x, 0x, 0x, 0x, 0x, 0x, 
> 0x } },
> +  { .s = { 0x, 0x, 0x, 0x, 0x, 0x, 0x, 
> 0x } },
> +  { .s = { 0x, 0x, 0x, 0x, 0x, 0x, 0x, 
> 0x } },
> +  { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 
> 0x0008 } },
> +  { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 
> 0x0001 } },
> +  { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 
> 0xfff3 } }
> +};
> +  unsigned short minVal[DIM (src)];
> +  int minInd[DIM (src)];
> +  unsigned short minValScalar, minIndScalar;
> +  int i, j;
> +  union
> +{
> +  int i;
> +  unsigned short s[2];
> +} res;
> +
> +  for (i = 0; i < DIM (src); i++)
> +{
> +  res.i = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x));
> +  minVal[i] = res.s[0];
> +  minInd[i] = res.s[1] & 0b111;
> +}
> +
> +  for (i = 0; i < DIM (src); i++)
> +{
> +  minValScalar = src[i].s[0];
> +  minIndScalar = 0;
> +
> +  for (j = 1; j < 8; j++)
> + if (minValScalar > src[i].s[j])
> +   {
> + minValScalar = src[i].s[j];
> + minIndScalar = j;
> +   }
> +
> +  if (minValScalar != minVal[i] || minIndScalar != minInd[i])
> + abort ();
> +}
> +}
> -- 
> 2.27.0
> 


Re: [ping PATCH v2 1/2] rs6000: Add support for _mm_minpos_epu16

2021-06-21 Thread Paul A. Clarke via Gcc-patches
Gentle ping.

I now realize I forgot to include a blurb about "what changed in v2".

v2:
- Slight formatting changes based on Segher's review (simplified
  condition, single line).

PC

On Tue, Jun 08, 2021 at 02:11:54PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Add a naive implementation of the subject x86 intrinsic to
> ease porting.
> 
> 2021-06-08  Paul A. Clarke  
> 
> gcc/ChangeLog:
> * config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
> ---
>  gcc/config/rs6000/smmintrin.h | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index bdf6eb365d88..b7de38763f2b 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -116,4 +116,29 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i 
> __mask)
>return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
>  }
>  
> +/* Return horizontal packed word minimum and its index in bits [15:0]
> +   and bits [18:16] respectively.  */
> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> +_mm_minpos_epu16 (__m128i __A)
> +{
> +  union __u
> +{
> +  __m128i __m;
> +  __v8hu __uh;
> +};
> +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> +  unsigned short __ridx = 0;
> +  unsigned short __rmin = __u.__uh[__ridx];
> +  for (unsigned long __i = __ridx + 1; __i < 8; __i++)
> +{
> +  if (__u.__uh[__i] < __rmin)
> +{
> +  __rmin = __u.__uh[__i];
> +  __ridx = __i;
> +}
> +}
> +  __r.__uh[0] = __rmin;
> +  __r.__uh[1] = __ridx;
> +  return __r.__m;
> +}
>  #endif
> -- 
> 2.27.0
> 
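A scalar reference model of the packed result described in the comment above (minimum in bits [15:0], index in bits [18:16], lowest index on ties), written against a plain array rather than __m128i. The name minpos_epu16_ref is hypothetical and not part of the patch. Extracting the index needs a 3-bit mask (0x7), which is the masking fix described in part 2/2.

```c
#include <stdint.h>

/* Return min | (index << 16) for the eight unsigned 16-bit lanes in V,
   taking the lowest index when several lanes hold the minimum.  */
static uint32_t
minpos_epu16_ref (const uint16_t v[8])
{
  uint16_t min = v[0];
  uint32_t idx = 0;
  for (uint32_t i = 1; i < 8; i++)
    if (v[i] < min)
      {
	min = v[i];
	idx = i;
      }
  return (uint32_t) min | (idx << 16);
}
```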


Re: [[PATCH V9] 4/7] CTF/BTF testsuites

2021-06-21 Thread Richard Biener via Gcc-patches
On Mon, May 31, 2021 at 7:20 PM Jose E. Marchesi via Gcc-patches
 wrote:
>
> This commit adds a new testsuite for the CTF debug format.

OK if the rest is approved - while I'm not too familiar with dejagnu I think
we can deal with fallout after the fact.

Richard.

> 2021-05-14  Indu Bhagat  
> David Faust  
>
> gcc/testsuite/
>
> * lib/gcc-dg.exp (gcc-dg-frontend-supports-ctf): New procedure.
> (gcc-dg-debug-runtest): Add -gctf support.
> * gcc.dg/debug/btf/btf-1.c: New test.
> * gcc.dg/debug/btf/btf-2.c: Likewise.
> * gcc.dg/debug/btf/btf-anonymous-struct-1.c: Likewise.
> * gcc.dg/debug/btf/btf-anonymous-union-1.c: Likewise.
> * gcc.dg/debug/btf/btf-array-1.c: Likewise.
> * gcc.dg/debug/btf/btf-bitfields-1.c: Likewise.
> * gcc.dg/debug/btf/btf-bitfields-2.c: Likewise.
> * gcc.dg/debug/btf/btf-bitfields-3.c: Likewise.
> * gcc.dg/debug/btf/btf-cvr-quals-1.c: Likewise.
> * gcc.dg/debug/btf/btf-enum-1.c: Likewise.
> * gcc.dg/debug/btf/btf-forward-1.c: Likewise.
> * gcc.dg/debug/btf/btf-function-1.c: Likewise.
> * gcc.dg/debug/btf/btf-function-2.c: Likewise.
> * gcc.dg/debug/btf/btf-int-1.c: Likewise.
> * gcc.dg/debug/btf/btf-pointers-1.c: Likewise.
> * gcc.dg/debug/btf/btf-struct-1.c: Likewise.
> * gcc.dg/debug/btf/btf-typedef-1.c: Likewise.
> * gcc.dg/debug/btf/btf-union-1.c: Likewise.
> * gcc.dg/debug/btf/btf-variables-1.c: Likewise.
> * gcc.dg/debug/btf/btf.exp: Likewise.
> * gcc.dg/debug/ctf/ctf-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-anonymous-struct-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-anonymous-union-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-array-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-array-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-array-3.c: Likewise.
> * gcc.dg/debug/ctf/ctf-array-4.c: Likewise.
> * gcc.dg/debug/ctf/ctf-attr-mode-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-attr-used-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-bitfields-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-bitfields-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-bitfields-3.c: Likewise.
> * gcc.dg/debug/ctf/ctf-bitfields-4.c: Likewise.
> * gcc.dg/debug/ctf/ctf-complex-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-cvr-quals-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-cvr-quals-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-cvr-quals-3.c: Likewise.
> * gcc.dg/debug/ctf/ctf-cvr-quals-4.c: Likewise.
> * gcc.dg/debug/ctf/ctf-enum-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-enum-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-file-scope-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-float-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-forward-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-forward-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-func-index-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-function-pointers-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-function-pointers-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-function-pointers-3.c: Likewise.
> * gcc.dg/debug/ctf/ctf-functions-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-int-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-objt-index-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-pointers-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-pointers-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-preamble-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-skip-types-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-skip-types-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-skip-types-3.c: Likewise.
> * gcc.dg/debug/ctf/ctf-skip-types-4.c: Likewise.
> * gcc.dg/debug/ctf/ctf-skip-types-5.c: Likewise.
> * gcc.dg/debug/ctf/ctf-skip-types-6.c: Likewise.
> * gcc.dg/debug/ctf/ctf-str-table-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-struct-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-struct-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-struct-array-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-struct-pointer-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-struct-pointer-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-typedef-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-typedef-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-typedef-3.c: Likewise.
> * gcc.dg/debug/ctf/ctf-typedef-struct-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-typedef-struct-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf-typedef-struct-3.c: Likewise.
> * gcc.dg/debug/ctf/ctf-union-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-variables-1.c: Likewise.
> * gcc.dg/debug/ctf/ctf-variables-2.c: Likewise.
> * gcc.dg/debug/ctf/ctf.exp: Likewise.
> ---
>  gcc/testsuite/gcc.dg/debug/btf/btf-1.c|  6 ++
>  gcc/testsuite/gcc.dg/debug/btf/btf-2.c| 10 +++
>  

Re: [[PATCH V9] 0/7] Support for the CTF and BTF debug formats

2021-06-21 Thread Richard Biener via Gcc-patches
On Mon, May 31, 2021 at 7:16 PM Jose E. Marchesi via Gcc-patches
 wrote:
>
> [Changes from V8:
> - Rebased to today's master.
> - Adapted to use the new write-symbols infrastructure recently
>   applied upstream.
> - Little change in libiberty to copy .BTF sections over when
>   LTOing.]
>
> Hi people!
>
> Last year we submitted a first patch series introducing support for
> the CTF debugging format in GCC [1].  We got a lot of feedback that
> prompted us to change the approach used to generate the debug info,
> and this patch series is the result of that.
>
> This series also adds support for the BTF debug format, which is needed
> by the BPF backend (more on this below.)
>
> This implementation works, but there are several points that need
> discussion and agreement with the upstream community, as they impact
> the way debugging options work.  We are also proposing a way to add
> additional debugging formats in the future.  See below for more
> details.
>
> Finally, a patch makes the BPF GCC backend use the DWARF debug
> hooks in order to make -gbtf available to it.
>
> [1] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-05/msg01297.html
>
> About CTF
> =========
>
> CTF is a debugging format designed in order to express C types in a
> very compact way.  The key is compactness and simplicity.  For more
> information see:
>
> - CTF specification
>   http://www.esperi.org.uk/~oranix/ctf/ctf-spec.pdf
>
> - Compact C-Type support in the GNU toolchain (talk + slides)
>   https://linuxplumbersconf.org/event/4/contributions/396/
>
> - On type de-duplication in CTF (talk + slides)
>   https://linuxplumbersconf.org/event/7/contributions/725/
>
> About BTF
> =========
>
> BTF is a debugging format, similar to CTF, that is used in the Linux
> kernel as the debugging format for BPF programs.  From the kernel
> documentation:
>
> "BTF (BPF Type Format) is the metadata format which encodes the debug
>  info related to BPF program/map. The name BTF was used initially to
>  describe data types. The BTF was later extended to include function
>  info for defined subroutines, and line info for source/line
>  information."
>
> Supporting BTF in GCC is important because compiled BPF programs
> (which GCC supports as a target) require the type information in order
> to be loaded and run in diverse kernel versions.  This mechanism is
> known as CO-RE (compile-once, run-everywhere) and is described in the
> "Update of the BPF support in the GNU Toolchain" talk mentioned below.
>
> The BTF is documented in the Linux kernel documentation tree:
> - linux/Documentation/bpf/btf.rst
>
> CTF in the GNU Toolchain
> ========================
>
> During the last year we have been working on adding support for CTF to
> several components of the GNU toolchain:
>
> - binutils support is already upstream.  It supports linking objects
>   with CTF information with full type de-duplication.
>
> - GDB support is to be sent upstream very shortly.  It makes the
>   debugger capable of using the CTF information whenever available.
>   This is useful in cases where DWARF has been stripped out but CTF is
>   kept.
>
> - GCC support is being discussed and submitted in this series.
>
> Overview of the Implementation
> ==============================
>
>   dwarf2out.c
>
> The enabled debug formats are hooked in dwarf2out_early_finish.
>
>   dwarf2int.h
>
> Internal interface that exports a few functions and data types
> defined in dwarf2out.c.
>
>   dwarf2ctf.c
>
> Code that transforms the internal GCC DWARF DIEs into CTF container
> structures.  This file uses the dwarf2int.h interface.
>
>   ctfc.c
>   ctfc.h
>
> These two files implement the "CTF container", which is shared
> among CTF and BTF, due to the many similarities between both
> formats.
>
>   ctfout.c
>
> Code that emits assembler with the .ctf section data, from the CTF
> container.
>
>   btfout.c
>
> Code that emits assembler with the .BTF section data, from the CTF
> container.
>
> From debug hooks to debug formats
> =================================
>
> Our first attempt at adding CTF to GCC used the obvious approach of
> adding a new set of debug hooks as defined in gcc/debug.h.
>
> During our first interaction with the upstream community we were told
> to _not_ use debug hooks, because these are to be obsoleted at some
> point.  It was suggested that we instead hook our handlers (which
> processed type TREE nodes producing CTF types from them) somewhere
> else.  So we did.
>
> However at the time we were also facing the need to support BTF, which
> is another type-related debug format needed by the BPF GCC backend.
> Hooking here and there doesn't sound like such a good idea when it
> comes to supporting several debug formats.
>
> Therefore we thought about how to make GCC support diverse debugging
> formats in a better way.  This led to a proposal we tried to discuss
> at the GNU Tools Track in LPC2020:
>
> - Update of the BPF support in the GNU 

Re: [[PATCH V9] 7/7] libiberty: copy over .BTF section when using LTO

2021-06-21 Thread Richard Biener via Gcc-patches
On Mon, May 31, 2021 at 7:19 PM Jose E. Marchesi via Gcc-patches
 wrote:
>
> libiberty/ChangeLog:

OK.  You can apply this separately.

I still believe LTO support is kind-of broken, but well ;)

Richard.

> * simple-object.c (handle_lto_debug_sections): Copy over .BTF section.
> ---
>  libiberty/simple-object.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/libiberty/simple-object.c b/libiberty/simple-object.c
> index 909995dd166..facbf94fd09 100644
> --- a/libiberty/simple-object.c
> +++ b/libiberty/simple-object.c
> @@ -307,6 +307,9 @@ handle_lto_debug_sections (const char *name, int rename)
>/* Copy over .ctf section under the same name if present.  */
>else if (strcmp (name, ".ctf") == 0)
>  return strcpy (newname, name);
> +  /* Copy over .BTF section under the same name if present.  */
> +  else if (strcmp (name, ".BTF") == 0)
> +return strcpy (newname, name);
>free (newname);
>return NULL;
>  }
> --
> 2.25.0.2.g232378479e
>
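The patch adds a second same-name branch next to the existing `.ctf` one. As more formats are kept, a table-driven check may be one way to keep the list in one place. The sketch below is hypothetical and is not libiberty's code. It returns a heap copy of the section name when the section should be kept under its own name, and `NULL` otherwise. The caller frees the copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the ".ctf"/".BTF" branches of
   handle_lto_debug_sections.  Matching is case-sensitive, as with
   strcmp in the patch, so ".btf" is not kept.  */
static char *
keep_section_under_same_name (const char *name)
{
  static const char *const kept[] = { ".ctf", ".BTF" };

  for (size_t i = 0; i < sizeof kept / sizeof kept[0]; i++)
    if (strcmp (name, kept[i]) == 0)
      {
        char *copy = malloc (strlen (name) + 1);
        if (copy)
          strcpy (copy, name);
        return copy;
      }
  return NULL;
}
```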


Re: [[PATCH V9] 6/7] Enable BTF generation in the BPF backend

2021-06-21 Thread Richard Biener via Gcc-patches
On Mon, May 31, 2021 at 7:18 PM Jose E. Marchesi via Gcc-patches
 wrote:
>
> This patch changes the BPF GCC backend in order to use the DWARF debug
> hooks and therefore enables the user to generate BTF debugging
> information with -gbtf.  Generating BTF is crucial when compiling BPF
> programs, since the CO-RE (compile-once, run-everywhere) mechanism
> used by the kernel BPF loader relies on it.
>
> Note that since in eBPF it is not possible to unwind frames due to the
> restrictive nature of the target architecture, we are disabling the
> generation of CFA in this target.

You want to CC the BPF maintainer here.  Note that IIRC RTX_FRAME_RELATED_P
also prevents code-motion for some insns, so I'm not sure removing
that is 100% safe.

> 2021-05-14  David Faust 
>
> * config/bpf/bpf.c (bpf_expand_prologue): Do not mark insns as
> frame related.
> (bpf_expand_epilogue): Likewise.
> * config/bpf/bpf.h (DWARF2_FRAME_INFO): Define to 0.
> Do not define DBX_DEBUGGING_INFO.
> ---
>  gcc/config/bpf/bpf.c |  4 
>  gcc/config/bpf/bpf.h | 12 ++--
>  2 files changed, 2 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.c b/gcc/config/bpf/bpf.c
> index 126d4a2798d..e635f9edb40 100644
> --- a/gcc/config/bpf/bpf.c
> +++ b/gcc/config/bpf/bpf.c
> @@ -349,7 +349,6 @@ bpf_expand_prologue (void)
>   hard_frame_pointer_rtx,
>   fp_offset - 8));
>   insn = emit_move_insn (mem, gen_rtx_REG (DImode, regno));
> - RTX_FRAME_RELATED_P (insn) = 1;
>   fp_offset -= 8;
> }
> }
> @@ -364,7 +363,6 @@ bpf_expand_prologue (void)
>  {
>insn = emit_move_insn (stack_pointer_rtx,
>  hard_frame_pointer_rtx);
> -  RTX_FRAME_RELATED_P (insn) = 1;
>
>if (size > 0)
> {
> @@ -372,7 +370,6 @@ bpf_expand_prologue (void)
>  gen_rtx_PLUS (Pmode,
>stack_pointer_rtx,
>GEN_INT (-size;
> - RTX_FRAME_RELATED_P (insn) = 1;
> }
>  }
>  }
> @@ -412,7 +409,6 @@ bpf_expand_epilogue (void)
>   hard_frame_pointer_rtx,
>   fp_offset - 8));
>   insn = emit_move_insn (gen_rtx_REG (DImode, regno), mem);
> - RTX_FRAME_RELATED_P (insn) = 1;
>   fp_offset -= 8;
> }
> }
> diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h
> index 4c5b19e262b..55beecbcb36 100644
> --- a/gcc/config/bpf/bpf.h
> +++ b/gcc/config/bpf/bpf.h
> @@ -235,17 +235,9 @@ enum reg_class
>
>  / Debugging Info /
>
> -/* We cannot support DWARF2 because of the limitations of eBPF.  */
> +/* In eBPF it is not possible to unwind frames. Disable CFA.  */
>
> -/* elfos.h insists in using DWARF.  Undo that here.  */
> -#ifdef DWARF2_DEBUGGING_INFO
> -# undef DWARF2_DEBUGGING_INFO
> -#endif
> -#ifdef PREFERRED_DEBUGGING_TYPE
> -# undef PREFERRED_DEBUGGING_TYPE
> -#endif
> -
> -#define DBX_DEBUGGING_INFO
> +#define DWARF2_FRAME_INFO 0
>
>  / Stack Layout and Calling Conventions.  */
>
> --
> 2.25.0.2.g232378479e
>


Re: [[PATCH V9] 5/7] CTF/BTF documentation

2021-06-21 Thread Richard Biener via Gcc-patches
On Mon, May 31, 2021 at 7:17 PM Jose E. Marchesi via Gcc-patches
 wrote:
>
> This commit documents the new command line options introduced by the
> CTF and BTF debug formats.

Can you amend the cover text for 'Debugging Options' to mention that
there are debug formats that can co-exist with others (like DWARF with
CTF) but unless stated explicitly the -g option specifies the
main debug info format to be used?  And for -gctf and friends, document
that it can work as an alternate format (but also as the main one, in case no other is specified?).

Thanks,
Richard.

> 2021-05-14  Indu Bhagat  
>
> * doc/invoke.texi: Document the CTF and BTF debug info options.
> ---
>  gcc/doc/invoke.texi | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 5cd4e2d993c..25dd50738de 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -466,6 +466,7 @@ Objective-C and Objective-C++ Dialects}.
>  @item Debugging Options
>  @xref{Debugging Options,,Options for Debugging Your Program}.
>  @gccoptlist{-g  -g@var{level}  -gdwarf  -gdwarf-@var{version} @gol
> +-gbtf -gctf  -gctf@var{level} @gol
>  -ggdb  -grecord-gcc-switches  -gno-record-gcc-switches @gol
>  -gstabs  -gstabs+  -gstrict-dwarf  -gno-strict-dwarf @gol
>  -gas-loc-support  -gno-as-loc-support @gol
> @@ -9696,6 +9697,25 @@ other DWARF-related options such as
>  @option{-fno-dwarf2-cfi-asm}) retain a reference to DWARF Version 2
>  in their names, but apply to all currently-supported versions of DWARF.
>
> +@item -gbtf
> +@opindex gbtf
> +Request BTF debug information.
> +
> +@item -gctf
> +@itemx -gctf@var{level}
> +@opindex gctf
> +Request CTF debug information and use @var{level} to specify how much CTF debug
> +information should be produced.  If @option{-gctf} is specified without a value for
> +@var{level}, the default level of CTF debug information is 2.
> +
> +Level 0 produces no CTF debug information at all.  Thus, @option{-gctf0} negates @option{-gctf}.
> +
> +Level 1 produces CTF information for tracebacks only.  This includes callsite
> +information, but does not include type information.
> +
> +Level 2 produces type information for entities (functions, data objects etc.)
> +at file-scope or global-scope only.
> +
>  @item -gstabs
>  @opindex gstabs
>  Produce debugging information in stabs format (if that is supported),
> --
> 2.25.0.2.g232378479e
>
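The level semantics documented above are: a bare `-gctf` means level 2, `-gctfN` selects level N, and level 0 turns CTF off. They can be restated as a small function. This is a hypothetical sketch of the documented behaviour only, not GCC's option-handling machinery. It assumes levels 0 to 2 are the only valid values.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical: map "-gctf" / "-gctfN" to a CTF level as documented.
   Returns -1 for non -gctf options or out-of-range levels.  */
static int
ctf_level_from_option (const char *opt)
{
  const char *prefix = "-gctf";
  size_t len = strlen (prefix);

  if (strncmp (opt, prefix, len) != 0)
    return -1;

  const char *level = opt + len;
  if (*level == '\0')
    return 2;                   /* Default level when none is given.  */
  if (!isdigit ((unsigned char) level[0]) || level[1] != '\0')
    return -1;

  int n = level[0] - '0';
  return n <= 2 ? n : -1;       /* 0 = none, 1 = tracebacks, 2 = types.  */
}
```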


Re: [[PATCH V9] 2/7] dejagnu: modularize gcc-dg-debug-runtest a bit

2021-06-21 Thread Richard Biener via Gcc-patches
On Mon, May 31, 2021 at 7:15 PM Jose E. Marchesi via Gcc-patches
 wrote:
>
> Move some functionality into a procedure of its own. This is only so that when
> the patch for ctf comes along, the gcc-dg-debug-runtest procedure looks a bit
> more uniform.

OK (you can apply this separately).

Richard.

> gcc/testsuite/ChangeLog:
>
> * lib/gcc-dg.exp (gcc-dg-target-supports-debug-format): New procedure.
> ---
>  gcc/testsuite/lib/gcc-dg.exp | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> index fce0989cd9c..c7722ba07da 100644
> --- a/gcc/testsuite/lib/gcc-dg.exp
> +++ b/gcc/testsuite/lib/gcc-dg.exp
> @@ -621,18 +621,27 @@ proc gcc-dg-runtest { testcases flags default-extra-flags } {
>  }
>  }
>
> -proc gcc-dg-debug-runtest { target_compile trivial opt_opts testcases } {
> +# Check if the target system supports the debug format
> +proc gcc-dg-target-supports-debug-format { target_compile trivial type } {
>  global srcdir subdir
>
> +set comp_output [$target_compile \
> +   "$srcdir/$subdir/$trivial" "trivial.S" assembly \
> +   "additional_flags=$type"]
> +if { ! [string match "*: target system does not support the * debug format*" \
> +   $comp_output] } {
> +   remove-build-file "trivial.S"
> +   return 1
> +}
> +return 0
> +}
> +
> +proc gcc-dg-debug-runtest { target_compile trivial opt_opts testcases } {
>  if ![info exists DEBUG_TORTURE_OPTIONS] {
> set DEBUG_TORTURE_OPTIONS ""
> foreach type {-gdwarf-2 -gstabs -gstabs+ -gxcoff -gxcoff+} {
> -   set comp_output [$target_compile \
> -   "$srcdir/$subdir/$trivial" "trivial.S" assembly \
> -   "additional_flags=$type"]
> -   if { ! [string match "*: target system does not support the * debug format*" \
> -   $comp_output] } {
> -   remove-build-file "trivial.S"
> +   if [expr [gcc-dg-target-supports-debug-format \
> + $target_compile $trivial $type]] {
> foreach level {1 "" 3} {
> if { ($type == "-gdwarf-2") && ($level != "") } {
> lappend DEBUG_TORTURE_OPTIONS [list "${type}" "-g${level}"]
> --
> 2.25.0.2.g232378479e
>


Re: [Patch, fortran V2] PR fortran/93308/93963/94327/94331/97046 problems raised by descriptor handling

2021-06-21 Thread Tobias Burnus

Hi José,

(in principle, I'd like to have the libgfortran function moved to the
compiler proper to avoid some issues, but that's admittedly a task
independent of your work.)

On 15.06.21 01:09, José Rui Faustino de Sousa via Fortran wrote:

Update to a proposed patch to:
Bug 93308 - bind(c) subroutine changes lower bound of array argument
in caller
Bug 93963 - Select rank mishandling allocatable and pointer arguments
with bind(c)
Bug 94327 - Bind(c) argument attributes are incorrectly set
Bug 94331 - Bind(C) corrupts array descriptors
Bug 97046 - Bad interaction between lbound/ubound, allocatable arrays
and bind(C) subroutine with dimension(..) parameter
...
Patch tested only on x86_64-pc-linux-gnu.
Fix attribute handling, which reflected a prior intermediate version of
the Fortran standard.


LGTM – except for one minor nit. In trans-expr.c's 
gfc_conv_gfc_desc_to_cfi_desc:

   /* Transfer values back to gfc descriptor.  */
+  if (cfi_attribute != 2
+  && !fsym->attr.value
+  && fsym->attr.intent != INTENT_IN)

Can you add after the '2' the string '  /* CFI_attribute_other.  */'
to make the number less magic.
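The nit can be taken one step further by naming all three attribute codes. The sketch below restates the copy-back condition quoted above as a standalone predicate. It assumes the values 0, 1 and 2 that gfortran's `ISO_Fortran_binding.h` uses. The standard names these macros, but as far as is known it does not fix their values. The intent enum is a hypothetical stand-in for gfortran's internal `INTENT_*` codes.

```c
#include <stdbool.h>

/* Values as used by gfortran's ISO_Fortran_binding.h (assumption).  */
#define CFI_attribute_pointer     0
#define CFI_attribute_allocatable 1
#define CFI_attribute_other       2

/* Hypothetical intent encoding, for this sketch only.  */
enum sketch_intent
{
  SKETCH_INTENT_UNKNOWN,
  SKETCH_INTENT_IN,
  SKETCH_INTENT_OUT,
  SKETCH_INTENT_INOUT
};

/* Mirror of the quoted condition: transfer values back to the gfc
   descriptor only for pointer/allocatable dummies that are neither
   VALUE nor INTENT(IN).  */
static bool
must_copy_back (int cfi_attribute, bool is_value, enum sketch_intent intent)
{
  return cfi_attribute != CFI_attribute_other
         && !is_value
         && intent != SKETCH_INTENT_IN;
}
```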

Thanks,

Tobias




CFI descriptors, in most cases, should not be copied out as they can
corrupt the Fortran descriptor. Bounds will vary and the original
Fortran bounds are definitively lost on conversion.
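Here is a concrete version of the bounds point. For a non-pointer, non-allocatable dummy, Fortran 2018 gives the C descriptor zero lower bounds, so only the extent survives the conversion. The one-dimensional structs below are hypothetical reductions of the real descriptors. They only show that a naive copy-out cannot recover the caller's lower bound.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced one-dimensional descriptors.  */
struct gfc_dim_sketch { ptrdiff_t lbound, ubound; };
struct cfi_dim_sketch { ptrdiff_t lower_bound, extent; };

/* CFI_attribute_other: lower bound is zero, only the extent is kept.  */
static struct cfi_dim_sketch
to_cfi (struct gfc_dim_sketch d)
{
  struct cfi_dim_sketch c = { 0, d.ubound - d.lbound + 1 };
  return c;
}

/* Naive copy-out that rebuilds Fortran bounds from the C descriptor.  */
static struct gfc_dim_sketch
naive_copy_out (struct cfi_dim_sketch c)
{
  struct gfc_dim_sketch d = { c.lower_bound, c.lower_bound + c.extent - 1 };
  return d;
}
```

An array declared `a(5:14)` comes back as `0:9`. The extent is intact, but the caller's lower bound of 5 is gone.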

Thank you very much.

Best regards,
José Rui

Fortran: Fix attributes and bounds in ISO_Fortran_binding.

gcc/fortran/ChangeLog:

PR fortran/93308
PR fortran/93963
PR fortran/94327
PR fortran/94331
PR fortran/97046
* trans-decl.c (convert_CFI_desc): Only copy out the descriptor
if necessary.
* trans-expr.c (gfc_conv_gfc_desc_to_cfi_desc): Update attribute
handling, which reflected a previous intermediate version of the
standard. Only copy out the descriptor if necessary.

libgfortran/ChangeLog:

PR fortran/93308
PR fortran/93963
PR fortran/94327
PR fortran/94331
PR fortran/97046
* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc): Add code
to verify the descriptor. Correct bounds calculation.
(gfc_desc_to_cfi_desc): Add code to verify the descriptor.

gcc/testsuite/ChangeLog:

PR fortran/93308
PR fortran/93963
PR fortran/94327
PR fortran/94331
PR fortran/97046
* gfortran.dg/ISO_Fortran_binding_1.f90: Add pointer attribute;
this test is still erroneous but now compiles.
* gfortran.dg/bind_c_array_params_2.f90: Update regex to match
code changes.
* gfortran.dg/PR93308.f90: New test.
* gfortran.dg/PR93963.f90: New test.
* gfortran.dg/PR94327.c: New test.
* gfortran.dg/PR94327.f90: New test.
* gfortran.dg/PR94331.c: New test.
* gfortran.dg/PR94331.f90: New test.
* gfortran.dg/PR97046.f90: New test.

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf


[PATCH] mips: add MSA vec_cmp and vec_cmpu expand pattern [PR101132]

2021-06-21 Thread Xi Ruoyao via Gcc-patches
Since GCC 11 the middle-end emits vec_cmp and vec_cmpu, causing an
ICE on MIPS with MSA enabled.  Add the patterns to handle them.

Bootstrapped and regression tested on mips64el-linux-gnu.
Ok for trunk?

gcc/

* config/mips/mips-protos.h (mips_expand_vec_cmp_expr): Declare.
* config/mips/mips.c (mips_expand_vec_cmp_expr): New function.
* config/mips/mips-msa.md (vec_cmp): New
  expander.
  (vec_cmpu): New expander.
---
 gcc/config/mips/mips-msa.md   | 22 ++
 gcc/config/mips/mips-protos.h |  1 +
 gcc/config/mips/mips.c| 11 +++
 3 files changed, 34 insertions(+)

diff --git a/gcc/config/mips/mips-msa.md b/gcc/config/mips/mips-msa.md
index 3ecf2bde19f..3a67f25be56 100644
--- a/gcc/config/mips/mips-msa.md
+++ b/gcc/config/mips/mips-msa.md
@@ -435,6 +435,28 @@
   DONE;
 })
 
+(define_expand "vec_cmp"
+  [(match_operand: 0 "register_operand")
+   (match_operator 1 ""
+ [(match_operand:MSA 2 "register_operand")
+  (match_operand:MSA 3 "register_operand")])]
+  "ISA_HAS_MSA"
+{
+  mips_expand_vec_cmp_expr (operands);
+  DONE;
+})
+
+(define_expand "vec_cmpu"
+  [(match_operand: 0 "register_operand")
+   (match_operator 1 ""
+ [(match_operand:IMSA 2 "register_operand")
+  (match_operand:IMSA 3 "register_operand")])]
+  "ISA_HAS_MSA"
+{
+  mips_expand_vec_cmp_expr (operands);
+  DONE;
+})
+
 (define_insn "msa_insert_"
   [(set (match_operand:MSA 0 "register_operand" "=f,f")
(vec_merge:MSA
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 2cf4ed50292..a685f7f7dd5 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -385,6 +385,7 @@ extern mulsidi3_gen_fn mips_mulsidi3_gen_fn (enum rtx_code);
 
 extern void mips_register_frame_header_opt (void);
 extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
+extern void mips_expand_vec_cmp_expr (rtx *);
 
 /* Routines implemented in mips-d.c  */
 extern void mips_d_target_versions (void);
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 00a8eef96aa..8f043399a8e 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -22321,6 +22321,17 @@ mips_expand_msa_cmp (rtx dest, enum rtx_code cond, rtx op0, rtx op1)
 }
 }
 
+void
+mips_expand_vec_cmp_expr (rtx *operands)
+{
+  rtx cond = operands[1];
+  rtx op0 = operands[2];
+  rtx op1 = operands[3];
+  rtx res = operands[0];
+
+  mips_expand_msa_cmp (res, GET_CODE (cond), op0, op1);
+}
+
 /* Expand VEC_COND_EXPR, where:
MODE is mode of the result
VIMODE equivalent integer mode
-- 
2.32.0
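The vec_cmp optab computes its result element-wise, with all-ones for true lanes and all-zeros for false lanes. The scalar model below shows these semantics over four 8-bit lanes. It also shows why integer modes need a separate unsigned variant: the same bit pattern 0x80 compares differently as signed and as unsigned. The model is illustrative only and is not how the MSA expansion is implemented.

```c
#include <stdint.h>

/* Signed "greater than" per lane: 0xff if true, 0x00 if false.  */
static void
cmp_gt_s8 (uint8_t res[4], const int8_t a[4], const int8_t b[4])
{
  for (int i = 0; i < 4; i++)
    res[i] = a[i] > b[i] ? 0xff : 0x00;
}

/* Unsigned variant, the vec_cmpu counterpart for integer modes.  */
static void
cmp_gt_u8 (uint8_t res[4], const uint8_t a[4], const uint8_t b[4])
{
  for (int i = 0; i < 4; i++)
    res[i] = a[i] > b[i] ? 0xff : 0x00;
}
```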





Re: [PATCH] libstdc++: Sync __cpp_lib_ranges macro defined in ranges_cmp.h

2021-06-21 Thread Jonathan Wakely via Gcc-patches
On Mon, 21 Jun 2021 at 14:37, Patrick Palka via Libstdc++
 wrote:
>
> r12-1606 bumped the value of __cpp_lib_ranges defined in ,
> but this macro is also defined in , so it needs to
> be updated there too.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_cmp.h (__cpp_lib_ranges): Adjust value.

OK.


[PATCH] libstdc++: Sync __cpp_lib_ranges macro defined in ranges_cmp.h

2021-06-21 Thread Patrick Palka via Gcc-patches
r12-1606 bumped the value of __cpp_lib_ranges defined in ,
but this macro is also defined in , so it needs to
be updated there too.

libstdc++-v3/ChangeLog:

* include/bits/ranges_cmp.h (__cpp_lib_ranges): Adjust value.
---
 libstdc++-v3/include/bits/ranges_cmp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/ranges_cmp.h b/libstdc++-v3/include/bits/ranges_cmp.h
index f859a33b2c1..1d7da30dddf 100644
--- a/libstdc++-v3/include/bits/ranges_cmp.h
+++ b/libstdc++-v3/include/bits/ranges_cmp.h
@@ -57,7 +57,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #ifdef __cpp_lib_concepts
 // Define this here, included by all the headers that need to define it.
-#define __cpp_lib_ranges 201911L
+#define __cpp_lib_ranges 202106L
 
 namespace ranges
 {
-- 
2.32.0.93.g670b81a890



Re: [Patch, v2] contrib/mklog.py: Improve PR handling (was: Re: git gcc-commit-mklog doesn't extract PR number to ChangeLog)

2021-06-21 Thread Tobias Burnus

Now committed as r12-1700-gedf0c3ffb59d75c11e05bc722432dc554e275c72 / as
attached.

(We had some follow-up discussion on IRC.)

On 21.06.21 14:53, Martin Liška wrote:

+# PR number in the file name
+fname = os.path.basename(file.path)


This is dead code.


+ fname = os.path.splitext(fname)[0]
+m = pr_filename_regex.search(fname)

(What was meant was the 'splitext' line: it is dead code because the
re.search matches all digits after 'pr' and then stops, ignoring the
rest, including the file extension. I first thought the comment referred
to the basename line, which confused me.)
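The regex captures only the digit run after 'pr', so a trailing extension such as `.f90` never reaches the captured group. The sketch below checks this with a rough POSIX ERE translation of `pr_filename_regex`. The translation is an assumption and is not mklog.py's code. It uses `[^[:alnum:]]` as a stand-in for Python's `[\W_]`, and the function name is hypothetical.

```c
#include <regex.h>
#include <string.h>

/* Copy the first PR number found in FNAME into OUT; return 1 on
   success, 0 if there is none (or OUT is too small).  */
static int
pr_number_from_filename (const char *fname, char *out, size_t outlen)
{
  regex_t re;
  regmatch_t m[3];
  int found = 0;

  if (regcomp (&re, "(^|[^[:alnum:]])[Pp][Rr]([0-9]{4,})", REG_EXTENDED) != 0)
    return 0;
  if (regexec (&re, fname, 3, m, 0) == 0)
    {
      size_t len = (size_t) (m[2].rm_eo - m[2].rm_so);
      if (len < outlen)
        {
          memcpy (out, fname + m[2].rm_so, len);
          out[len] = '\0';
          found = 1;
        }
    }
  regfree (&re);
  return found;
}
```

For "PR93308.f90" this yields "93308", and the extension is never part of the match. "pr12.c" has fewer than four digits and yields nothing.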

+ parser.add_argument('-b', '--pr-numbers', action='append',
+help='Add the specified PRs (comma separated)')


Do we really want to support '-b 1 -b 2' and also -b '1,2' formats?
Seems quite complicated to me.

[...]
I would start with -b 1,2,3,4 syntax. It will likely be easier for git
alias integration.


Done so.
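The single-argument comma syntax needs only a simple split. The hypothetical C sketch below handles plain numbers only, as in "-b 1,2,3,4". It does not handle the 'PR component/number' strings used together with -p, and it is not how mklog.py parses its arguments.

```c
#include <stdlib.h>
#include <string.h>

/* Split ARG on commas into at most MAX numbers stored in OUT.
   Empty fields are skipped (strtok collapses adjacent commas).
   Returns the count stored, or -1 on a non-numeric field.  */
static int
parse_pr_list (const char *arg, long *out, int max)
{
  char *copy = malloc (strlen (arg) + 1);
  if (!copy)
    return -1;
  strcpy (copy, arg);

  int n = 0;
  for (char *tok = strtok (copy, ","); tok && n < max;
       tok = strtok (NULL, ","))
    {
      char *end;
      long v = strtol (tok, &end, 10);
      if (end == tok || *end != '\0')
        {
          free (copy);
          return -1;
        }
      out[n++] = v;
    }
  free (copy);
  return n;
}
```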

I note that argparse permits '-d dir1 -d dir2 -b bug1 -b bug2', which
then keeps only dir2 as the directory and bug2 as the bug, without
printing an error (or warning) for ignoring dir1 and bug1.

Tobias

commit edf0c3ffb59d75c11e05bc722432dc554e275c72
Author: Tobias Burnus 
Date:   Mon Jun 21 15:17:22 2021 +0200

contrib/mklog.py: Improve PR handling

Co-authored-by: Martin Sebor 

contrib/ChangeLog:

* mklog.py (bugzilla_url): Fetch also component.
(pr_filename_regex): New.
(get_pr_titles): Update PR string with correct format and component.
(generate_changelog): Take additional PRs; extract PR from the
filename.
(__main__): Add -b/--pr-numbers argument.
* test_mklog.py (EXPECTED4): Update to expect a PR for the new file.
---
 contrib/mklog.py  | 38 +-
 contrib/test_mklog.py |  3 +++
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 1f59055e723..0b434f67971 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -42,6 +42,7 @@ pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PPR [a-z+-]+\/[0-9]+)')
 prnum_regex = re.compile(r'PR (?P[a-z+-]+)/(?P[0-9]+)')
 dr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PDR [0-9]+)')
 dg_regex = re.compile(r'{\s+dg-(error|warning)')
+pr_filename_regex = re.compile(r'(^|[\W_])[Pp][Rr](?P\d{4,})')
 identifier_regex = re.compile(r'^([a-zA-Z0-9_#].*)')
 comment_regex = re.compile(r'^\/\*')
 struct_regex = re.compile(r'^(class|struct|union|enum)\s+'
@@ -52,7 +53,7 @@ fn_regex = re.compile(r'([a-zA-Z_][^()\s]*)\s*\([^*]')
 template_and_param_regex = re.compile(r'<[^<>]*>')
 md_def_regex = re.compile(r'\(define.*\s+"(.*)"')
 bugzilla_url = 'https://gcc.gnu.org/bugzilla/rest.cgi/bug?id=%s;' \
-   'include_fields=summary'
+   'include_fields=summary,component'
 
 function_extensions = {'.c', '.cpp', '.C', '.cc', '.h', '.inc', '.def', '.md'}
 
@@ -118,20 +119,23 @@ def sort_changelog_files(changed_file):
 
 
 def get_pr_titles(prs):
-output = ''
-for pr in prs:
+output = []
+for idx, pr in enumerate(prs):
 pr_id = pr.split('/')[-1]
 r = requests.get(bugzilla_url % pr_id)
 bugs = r.json()['bugs']
 if len(bugs) == 1:
-output += '%s - %s\n' % (pr, bugs[0]['summary'])
-print(output)
+prs[idx] = 'PR %s/%s' % (bugs[0]['component'], pr_id)
+out = '%s - %s\n' % (prs[idx], bugs[0]['summary'])
+if out not in output:
+output.append(out)
 if output:
-output += '\n'
-return output
+output.append('')
+return '\n'.join(output)
 
 
-def generate_changelog(data, no_functions=False, fill_pr_titles=False):
+def generate_changelog(data, no_functions=False, fill_pr_titles=False,
+   additional_prs=None):
 changelogs = {}
 changelog_list = []
 prs = []
@@ -139,6 +143,8 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 diff = PatchSet(data)
 global firstpr
 
+if additional_prs:
+prs = [pr for pr in additional_prs if pr not in prs]
 for file in diff:
 # skip files that can't be parsed
 if file.path == '/dev/null':
@@ -154,21 +160,32 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 # Only search first ten lines as later lines may
 # contains commented code which a note that it
 # has not been tested due to a certain PR or DR.
+this_file_prs = []
 for line in list(file)[0][0:10]:
 m = pr_regex.search(line.value)
 if m:
 pr = m.group('pr')
 if pr not in prs:
 

Re: [PATCH] tree-optimization/101120 - fix compile-time issue with SLP groups

2021-06-21 Thread Richard Biener via Gcc-patches
On Fri, Jun 18, 2021 at 4:24 PM Richard Biener
 wrote:
>
> On Fri, Jun 18, 2021 at 2:23 PM Richard Biener  wrote:
> >
> > This places two hacks to avoid an old compile-time issue when
> > vectorizing large permuted SLP groups with gaps where we end up
> > emitting loads and IV adjustments for the gap as well and those
> > have quite a high cost until they are eventually cleaned up.
> >
> > The first hack is to fold the auto-inc style IV updates early
> > in the vectorizer rather than in the next forwprop pass which
> > shortens the SSA use-def chains of the used IV.
> >
> > The second hack is to remove the unused loads after we've picked
> > all that we possibly use.
> >
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> >
> > I wonder if this is too gross (and I have to check the one or two
> > bug duplicates), but it should be at least easy to backport ...
>
> Was apparently too simple - the following passes bootstrap and
> regtest.

I've pushed this now after thinking about better solutions.

Richard.

>
> Richard.


Re: [PATCH] tree-optimization/101014 - Remove poor value computations.

2021-06-21 Thread Andrew MacLeod via Gcc-patches

On 6/19/21 3:51 AM, Richard Biener wrote:

On June 18, 2021 11:46:08 PM GMT+02:00, Andrew MacLeod  
wrote:

I am pleased to say that this patch kills the poor value computations
in the ranger's cache.

It's been a bit of a thorn, and is mostly a hack that was applied early
on to enable getting some opportunities that were hard to get
otherwise.

The more consistent propagation we now do combined with other changes
means I can kill this wart on trunk. It even results in a 1% speedup,
and should resolve some of the excessive compile-time issues caused by
undesirable iteration, including 101014 (for good, I hope :-)).

I tried turning off the poor_value computations on the GCC11 branch, and
we may want to consider doing it there too.  In my testsuite, we miss a
total of 3 cases out of 4700 new ones identified by ranger.  For
stability, I'd suggest we turn off poor_value computations there as
well.  This patch rips out all the code, but for GCC11 I'd just change
push_poor_value to always return false, thus never registering any
values; less churn that way.  I'll run some tests and post that
separately if you think we should go ahead with it.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Nice. I think we should indeed consider mostly syncing the algorithmic changes 
with GCC 11 to make maintenance easier, at least up to 11.2. Now, please leave 
such changes some time to bake on trunk before backporting.

Thanks,
Richard.

For sure.  I'm accumulating the GCC 11 patches and will hold them for a
bit yet.


Andrew



Re: [Patch, v2] contrib/mklog.py: Improve PR handling (was: Re: git gcc-commit-mklog doesn't extract PR number to ChangeLog)

2021-06-21 Thread Martin Liška

On 6/21/21 10:37 AM, Tobias Burnus wrote:

On 21.06.21 10:09, Martin Liška wrote:


$ pytest test_mklog.py
FAILED test_mklog.py::TestMklog::test_sorting - AssertionError: assert
'\n\tPR 50209...New test.\n\n' == 'gcc/ChangeLo...New test.\n\n'

Aha, missed that there is indeed a testsuite - nice!

$ flake8 mklog.py
mklog.py:187:23: Q000 Remove bad quotes

I have now filed:
https://bugs.launchpad.net/ubuntu/+source/python-pytest-flake8/+bug/1933075


+    # PR number in the file name
+    fname = os.path.basename(file.path)


This is dead code.


+ fname = os.path.splitext(fname)[0]
+    m = pr_filename_regex.search(fname)

It does not look like dead code to me.


Hello.

The code is weird as os.path.basename returns:

In [5]: os.path.basename('/tmp/a/b/c.txt')
Out[5]: 'c.txt'

Why do you need the os.path.splitext(fname) call?



+ parser.add_argument('-b', '--pr-numbers', action='append',
+    help='Add the specified PRs (comma separated)')


Do we really want to support '-b 1 -b 2' and also -b '1,2' formats?
Seems quite complicated to me.


I don't have a strong opinion. I started with '-b 123,245', believing
that the syntax is fine. But then I realized that without '-p'
specifying multiple '-b' looks better by having multiple '-b' if 'PR
/'  (needed for -p as the string is then taken as is). Thus,
I ended up supporting either variant.


I would start with -b 1,2,3,4 syntax. It will likely be easier for git alias
integration.

Martin



But I also happily drop the ',' support.

Change: One quote change, one test_mklog update.

Tobias





[PATCH] RISC-V: Add tune info for T-HEAD C906.

2021-06-21 Thread Jojo R via Gcc-patches
gcc/
* config/riscv/riscv.c (thead_c906_tune_info): New.
(riscv_tune_info_table): Add thead-c906 entry.
---
 gcc/config/riscv/riscv.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 1baa2990ee27..576960bb37cb 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -300,6 +300,19 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   true,/* slow_unaligned_access */
 };
 
+/* Costs to use when optimizing for T-HEAD c906.  */
+static const struct riscv_tune_param thead_c906_tune_info = {
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_add */
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
+  {COSTS_N_INSNS (20), COSTS_N_INSNS (20)}, /* fp_div */
+  {COSTS_N_INSNS (4), COSTS_N_INSNS (4)}, /* int_mul */
+  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
+  1,/* issue_rate */
+  3,/* branch_cost */
+  5,/* memory_cost */
+  false,/* slow_unaligned_access */
+};
+
 /* Costs to use when optimizing for size.  */
 static const struct riscv_tune_param optimize_size_tune_info = {
   {COSTS_N_INSNS (1), COSTS_N_INSNS (1)},  /* fp_add */
@@ -348,6 +361,7 @@ static const struct riscv_tune_info riscv_tune_info_table[] = {
   { "sifive-3-series", generic, &rocket_tune_info },
   { "sifive-5-series", generic, &rocket_tune_info },
   { "sifive-7-series", sifive_7, &sifive_7_tune_info },
+  { "thead-c906", generic, &thead_c906_tune_info },
   { "size", generic, &optimize_size_tune_info },
 };
 
-- 
2.24.3 (Apple Git-128)
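The cost entries above are in units of `COSTS_N_INSNS`, which GCC's rtl.h defines as `((N) * 4)`. The hypothetical struct below mirrors the table layout so the numbers can be checked. It assumes that each two-element pair is indexed by operand width: single vs double precision for FP, and 32- vs 64-bit for integers. This is a sketch, not riscv.c's definition.

```c
#include <stdbool.h>

#define COSTS_N_INSNS(N) ((N) * 4)   /* as in GCC's rtl.h */

/* Hypothetical mirror of the tune-parameter layout.  */
struct tune_sketch
{
  unsigned short fp_add[2], fp_mul[2], fp_div[2];
  unsigned short int_mul[2], int_div[2];
  unsigned short issue_rate, branch_cost, memory_cost;
  bool slow_unaligned_access;
};

static const struct tune_sketch c906_sketch = {
  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},   /* fp_add */
  {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},   /* fp_mul */
  {COSTS_N_INSNS (20), COSTS_N_INSNS (20)}, /* fp_div */
  {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},   /* int_mul */
  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},   /* int_div */
  1, 3, 5, false
};

/* FP add cost; a nonzero IS_DOUBLE selects the second entry (assumed).  */
static unsigned
fp_add_cost (const struct tune_sketch *t, int is_double)
{
  return t->fp_add[is_double != 0];
}
```

With these values, a double-precision add costs 20 units (5 insns) and a single-precision add costs 16 units (4 insns).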



Re: [Patch, fortran V3] PR fortran/100683 - Array initialization refuses valid

2021-06-21 Thread Tobias Burnus

Hi José,

On 17.06.21 21:34, José Rui Faustino de Sousa via Gcc-patches wrote:

Update to a proposed patch to:
PR100683 - Array initialization refuses valid
due to more errors being found...

Patch tested only on x86_64-pc-linux-gnu.

LGTM – sorry for the very belated review.

Add call to simplify expression before parsing

Nit: I think you mean resolving/processing/expanding/checking – as
gfc_resolve_expr comes after the actual parsing.

*and* check *appropriately* if the expression is still an array after
simplification.


 * * *

I have to admit that I got a bit lost with your patches. Are there still
outstanding patches? I also recall approving a patch quite some time ago
which was then not committed for a long time. (I have not checked
whether it was committed by now.)

Thus: Do you have a list of patches pending review? Secondly, I assume
you can commit yourself, or do you have commit issues?

Tobias


Fortran: Fix bogus error

gcc/fortran/ChangeLog:

PR fortran/100683
* resolve.c (gfc_resolve_expr): Add call to gfc_simplify_expr.

gcc/testsuite/ChangeLog:

PR fortran/100683
* gfortran.dg/pr87993.f90: Increase test coverage.
* gfortran.dg/PR100683.f90: New test.



Re: [RFC] New idea to split loop based on no-wrap conditions

2021-06-21 Thread Richard Biener
On Mon, 21 Jun 2021, guojiufu wrote:

> On 2021-06-21 14:19, guojiufu via Gcc-patches wrote:
> > On 2021-06-09 19:18, guojiufu wrote:
> >> On 2021-06-09 17:42, guojiufu via Gcc-patches wrote:
> >>> On 2021-06-08 18:13, Richard Biener wrote:
>  On Fri, 4 Jun 2021, Jiufu Guo wrote:
>  
> >>> cut...
> > cut...
> >> 
> 
> Besides the method in the previous mails, 
> I’m thinking of another way to split loops:
> 
> foo (int *a, int *b, unsigned k, unsigned n)
> {   
>  while (++k != n)
>    a[k] = b[k] + 1;   
> } 
> 
> We may split it into:
> if (k < n)
> {
>   while (++k < n)  //loop1
>    a[k] = b[k] + 1;   
> }
> else
> {
>  while (++k != n) //loop2
>    a[k] = b[k] + 1;  
> }
> 
> In most cases loop1 would be taken, so the overhead of this method is only
> checking “if (k < n)”, which would be smaller than that of the previous method.
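The two loop shapes from the mail can be written out as plain functions and compared. The function names are ours. The versioning is valid because, for unsigned k < n, k climbs by one and reaches n exactly without wrapping, so "++k != n" and "++k < n" stop at the same iteration. When k >= n, the original loop would wrap around UINT_MAX, which is why loop2 keeps "!=". The test below therefore only exercises the k < n path.

```c
/* Original form from the mail.  */
static void
foo_orig (int *a, const int *b, unsigned k, unsigned n)
{
  while (++k != n)
    a[k] = b[k] + 1;
}

/* Versioned form: loop1 when no wrap is possible, loop2 otherwise.  */
static void
foo_split (int *a, const int *b, unsigned k, unsigned n)
{
  if (k < n)
    {
      while (++k < n)   /* loop1: k reaches n exactly, no wrap.  */
        a[k] = b[k] + 1;
    }
  else
    {
      while (++k != n)  /* loop2: original semantics, may wrap.  */
        a[k] = b[k] + 1;
    }
}
```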

That would be your original approach of versioning the loop.  I think
I suggested that, for this, scalar evolution and dataref analysis should
be enhanced to build up conditions under which IV evolutions are
affine (non-wrapping) and the versioning code in actual transforms
should then do the appropriate versioning (like the vectorizer already
does for niter analysis ->assumptions for example).

Richard.

> And this method would be easier to extend to nested loops like:
>  unsigned int l_n = 0;
>  unsigned int l_m = 0;
>  unsigned int l_k = 0;
>  for (l_n = 0; l_n != n; l_n++)
>    for (l_k = 0; l_k != k; l_k++)
>  for (l_m = 0; l_m != m; l_m++)
>  xxx;
> 
> Do you think this method is more valuable to implement? 
> Below is a quick patch.  This patch does not support nested loops yet.
> 
> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> index 3a09bbc39e5..c9d161565e4 100644
> --- a/gcc/tree-ssa-loop-split.c
> +++ b/gcc/tree-ssa-loop-split.c
> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "cfghooks.h"
>  #include "gimple-fold.h"
>  #include "gimplify-me.h"
> +#include "tree-ssa-loop-ivopts.h"
> 
>  /* This file implements two kinds of loop splitting.
> 
> @@ -1593,6 +1594,468 @@ split_loop_on_cond (struct loop *loop)
>return do_split;
>  }
> 
> +/* Filter out type conversions on IDX.
> +   Store the shortest type during conversion to SMALL_TYPE.
> +   Store the longest type during conversion to LARGE_TYPE.  */
> +
> +static gimple *
> +filter_conversions (class loop *loop, tree idx, tree *small_type = NULL,
> + tree *large_type = NULL)
> +{
> +  gcc_assert (TREE_CODE (idx) == SSA_NAME);
> +  gimple *stmt = SSA_NAME_DEF_STMT (idx);
> +  while (is_gimple_assign (stmt)
> +  && flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> +{
> +  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
> + {
> +   idx = gimple_assign_rhs1 (stmt);
> +   if (small_type)
> + {
> +   tree type = TREE_TYPE (idx);
> +   if (TYPE_PRECISION (*small_type) > TYPE_PRECISION (type)
> +   || (TYPE_PRECISION (*small_type) == TYPE_PRECISION (type)
> +   && TYPE_UNSIGNED (*small_type) && !TYPE_UNSIGNED (type)))
> + *small_type = type;
> + }
> +   if (large_type)
> + {
> +   tree type = TREE_TYPE (idx);
> +   if (TYPE_PRECISION (*large_type) < TYPE_PRECISION (type)
> +   || (TYPE_PRECISION (*large_type) == TYPE_PRECISION (type)
> +   && !TYPE_UNSIGNED (*large_type) && TYPE_UNSIGNED (type)))
> + *large_type = type;
> + }
> + }
> +  else
> + break;
> +
> +  if (TREE_CODE (idx) != SSA_NAME)
> + break;
> +  stmt = SSA_NAME_DEF_STMT (idx);
> +}
> +  return stmt;
> +}
> +
> +/* Collection of loop index related elements.  */
> +struct idx_elements
> +{
> +  gcond *gc;
> +  gphi *phi;
> +  gimple *inc_stmt;
> +  tree idx;
> +  tree bnd;
> +  tree step;
> +  tree large_type;
> +  tree small_type;
> +  bool cmp_on_next;
> +};
> +
> +/*  Analyze and get the idx related elements: bnd,
> +phi, increase stmt from exit edge E, etc.
> +
> +i = phi (b, n)
> +...
> +n0 = ik + 1
> +n1 = (type)n0
> +...
> +if (i != bnd) or if (n != bnd)
> +...
> +n = ()nl
> +
> +   IDX is the i' or n'.  */
> +
> +bool
> +analyze_idx_elements (class loop *loop, edge e, idx_elements )
> +{
> +  /* Avoid complicated edge.  */
> +  if (e->flags & EDGE_FAKE)
> +return false;
> +  if (e->src != loop->header && e->src != single_pred (loop->latch))
> +return false;
> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->src))
> +return false;
> +
> +  /* Check gcond.  */
> +  gimple *last = last_stmt (e->src);
> +  if (!last || gimple_code (last) != GIMPLE_COND)
> +return false;
> +
> +  /* Get idx and bnd from gcond. */
> +  gcond *gc = as_a <gcond *> (last);
> +  tree bnd = gimple_cond_rhs (gc);
> +  tree idx = gimple_cond_lhs (gc);
> +  if 

Re: [Ping^2, Patch, Fortran] PR100337 Should be able to pass non-present optional arguments to CO_BROADCAST

2021-06-21 Thread Tobias Burnus

Any reason that you did not put it under
  gfortran.dg/coarray/
such that it is also run with -fcoarray=lib (-lcaf_single)?
I know that the issue only exists for single, but it also makes
sense to check that libcaf_single works.

In that sense, I wonder whether also the other CO_* should be
checked in the testsuite as they are handled differently in
libcaf_... (but identical with -fcoarray=single).

Except for those two nits, it LGTM. Thanks!

Tobias

PS: The function is used by
case GFC_ISYM_CO_BROADCAST:
case GFC_ISYM_CO_MIN:
case GFC_ISYM_CO_MAX:
case GFC_ISYM_CO_REDUCE:
case GFC_ISYM_CO_SUM:
and, with -fcoarray=single, errmsg is not touched
as stat is (unconditionally) 0 (success).


On 19.06.21 13:23, Andre Vehreschild via Fortran wrote:

PING!

On Fri, 4 Jun 2021 18:05:18 +0200
Andre Vehreschild  wrote:


Ping!

On Fri, 21 May 2021 15:33:11 +0200
Andre Vehreschild  wrote:


Hi,

the attached patch fixes an issue when calling CO_BROADCAST in
-fcoarray=single mode, where the optional but non-present (in the calling
scope) stat variable was assigned to before checking whether it was
present.
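In C terms, the bug has the following shape. This is a minimal sketch, assuming (as gfortran does for scalar OPTIONAL dummies passed by reference) that a non-present STAT= argument arrives as a null pointer; co_broadcast_single is a hypothetical function, not the code gfortran actually generates:

```c
#include <stddef.h>

/* Hypothetical model of the -fcoarray=single path: with a single image
   the data is already in place, so the only work is reporting success
   through STAT=, which is legal only if the argument is present.  */
void
co_broadcast_single (void *a, int source_image, int *stat)
{
  (void) a;
  (void) source_image;
  if (stat != NULL)     /* the reported bug stored to *stat without this */
    *stat = 0;
}
```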

Regtests fine on x86-64-linux/f33. Ok for trunk?

Regards,
Andre




--
Andre Vehreschild * Email: vehre ad gmx dot de

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


Re: [PATCH] PR fortran/100950 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-06-21 Thread Tobias Burnus

Hi Harald,

sorry for being way behind my review duties :-(

On 10.06.21 20:52, Harald Anlauf via Fortran wrote:

+static bool
+substring_has_constant_len (gfc_expr *e)
+{
+  ptrdiff_t istart, iend;
+  size_t length;
+  bool equal_length = false;
+
+  if (e->ts.type != BT_CHARACTER
+  || !e->ref
+  || e->ref->type != REF_SUBSTRING


Is there a reason why you do not handle:

type t
  character(len=5) :: str1
  character(len=:), allocatable :: str2
end type
type(t) :: x

allocate(x%str2, source="abd")
if (len (x%str1(1:1)) /= 1) ...
if (len (x%str2(1:2)) /= 2) ...
etc.

Namely: search for the last ref (last_ref = expr->ref->next->next ...?)
and then check that last_ref?

  * * *

Slightly unrelated: I think the following does not violate
F2018's R916 / C923 – but is rejected, namely:
  R916  type-param-inquiry  is  designator % type-param-name
the latter is 'len' or 'kind' for intrinsic types. And:
  R901  designator is ...
   or substring
But

character(len=5) :: str
print *, str(1:3)%len
end

fails with

2 | print *, str(1:3)%len
  |  1
Error: Syntax error in PRINT statement at (1)


Assuming you don't want to handle it, can you open a new PR?
Thanks!

 * * *

That's insofar related to your patch as last_ref would
then be the last ref before ref->next == NULL or
before ref->next->type == REF_INQUIRY


+  istart = gfc_mpz_get_hwi (e->ref->u.ss.start->value.integer);
+  iend = gfc_mpz_get_hwi (e->ref->u.ss.end->value.integer);
+  length = gfc_mpz_get_hwi (e->ref->u.ss.length->length->value.integer);
+
+  if (istart <= iend)
+{
+  if (istart < 1)
+ {
+   gfc_error ("Substring start index (%ld) at %L below 1",
+  (long) istart, &e->ref->u.ss.start->where);


As mentioned by Bernhard, you could use HOST_WIDE_INT_PRINT_DEC.

(It probably only matters on Windows, where long is 32 bits like int,
and then only for strings longer than INT_MAX.)
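A standalone C sketch of the same idea, assuming int64_t and the standard PRId64 macro from <inttypes.h> as stand-ins for GCC's HOST_WIDE_INT and HOST_WIDE_INT_PRINT_DEC. The function check_substring is hypothetical: it only mirrors the Fortran substring rules (zero length when start > end, otherwise start >= 1 and end <= len) and is not the gfortran code:

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Returns true and stores the substring length in *SUBLEN when ISTART:IEND
   is valid for a string of length LEN; otherwise writes a diagnostic into
   MSG and returns false.  */
bool
check_substring (int64_t istart, int64_t iend, int64_t len,
                 int64_t *sublen, char *msg, size_t msgsize)
{
  if (istart > iend)
    {
      /* Zero-length substring: the bounds themselves are not checked.  */
      *sublen = 0;
      return true;
    }
  if (istart < 1)
    {
      /* No (long) cast, so the value is not truncated on LLP64 hosts.  */
      snprintf (msg, msgsize, "Substring start index (%" PRId64 ") below 1",
                istart);
      return false;
    }
  if (iend > len)
    {
      snprintf (msg, msgsize,
                "Substring end index (%" PRId64 ") exceeds string length (%"
                PRId64 ")", iend, len);
      return false;
    }
  *sublen = iend - istart + 1;
  return true;
}
```

With a (long) cast instead, an index such as 3000000001 would print as a negative number on 64-bit Windows.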

Thanks,

Tobias



Re: [x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-21 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 21, 2021 at 12:28 PM Jakub Jelinek  wrote:
>
> On Mon, Jun 21, 2021 at 12:14:09PM +0200, Richard Biener wrote:
> > > But we could do what I've done in
> > > r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
> > > - have int ix86_last_zero_store_uid;
> > > set to INSN_UID of the last store emitted by the peephole2s and
> > > then check that INSN_UID against the var.
> >
> > Hmm, or have reg_nonzero_bits_for_peephole2 () and maintain
> > that somehow ... (conservatively drop it when a SET is seen).
>
> Maintaining something in peephole2 wouldn't be that easy because
> of peephole2's rolling window, plus it would need to be done
> in the generic code even when nothing but a single target in a specific case
> needs that.
>
> The following seems to work.
>
> 2021-06-21  Jakub Jelinek  
>
> PR target/11877
> * config/i386/i386-protos.h (ix86_last_zero_store_uid): Declare.
> * config/i386/i386-expand.c (ix86_last_zero_store_uid): New variable.
> * config/i386/i386.c (ix86_expand_prologue): Clear it.
> * config/i386/i386.md (peephole2s for 1/2/4 stores of const0_rtx):
> Remove "" from match_operand.  Emit new insns using emit_move_insn and
> set ix86_last_zero_store_uid to INSN_UID of the last store.
> Add peephole2s for 1/2/4 stores of const0_rtx following previous
> successful peep2s.

LGTM.

Thanks,
Uros.
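To see the pattern these peephole2s target, a small C input such as the following can be compiled with "gcc -Os -S" on x86-64. It is a hypothetical example (struct six_longs and clear_six are made-up names). Per the ChangeLog, a group of four zero stores can be rewritten to go through a cleared scratch register, and the new follow-up peephole2s are meant to let a following group of stores reuse it. Whether the stores actually end up grouped this way depends on the GCC version and on other passes such as store merging:

```c
/* Six adjacent zero stores; long is 64 bits on LP64 x86-64.  */
struct six_longs
{
  long a, b, c, d, e, f;
};

void
clear_six (struct six_longs *p)
{
  p->a = 0;
  p->b = 0;
  p->c = 0;
  p->d = 0;
  p->e = 0;
  p->f = 0;
}
```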

>
> --- gcc/config/i386/i386-protos.h.jj  2021-06-21 11:59:16.769693735 +0200
> +++ gcc/config/i386/i386-protos.h   2021-06-21 12:01:47.875691930 +0200
> @@ -111,6 +111,7 @@ extern bool ix86_use_lea_for_mov (rtx_in
>  extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
>  extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode);
>  extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]);
> +extern int ix86_last_zero_store_uid;
>  extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
>  extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
>  extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
> --- gcc/config/i386/i386-expand.c.jj  2021-06-21 09:39:21.604488082 +0200
> +++ gcc/config/i386/i386-expand.c   2021-06-21 12:21:33.017977951 +0200
> @@ -1316,6 +1316,9 @@ find_nearest_reg_def (rtx_insn *insn, in
>return false;
>  }
>
> +/* INSN_UID of the last insn emitted by zero store peephole2s.  */
> +int ix86_last_zero_store_uid;
> +
>  /* Split lea instructions into a sequence of instructions
> which are executed on ALU to avoid AGU stalls.
> It is assumed that it is allowed to clobber flags register
> --- gcc/config/i386/i386.c.jj   2021-06-21 09:39:21.622487840 +0200
> +++ gcc/config/i386/i386.c  2021-06-21 12:06:54.049634337 +0200
> @@ -8196,6 +8196,7 @@ ix86_expand_prologue (void)
>bool save_stub_call_needed;
>rtx static_chain = NULL_RTX;
>
> +  ix86_last_zero_store_uid = 0;
>if (ix86_function_naked (current_function_decl))
>  {
>if (flag_stack_usage_info)
> --- gcc/config/i386/i386.md.jj  2021-06-21 09:42:04.086303699 +0200
> +++ gcc/config/i386/i386.md 2021-06-21 12:14:10.411847549 +0200
> @@ -19360,37 +19360,96 @@ (define_peephole2
>  ;; When optimizing for size, zeroing memory should use a register.
>  (define_peephole2
>[(match_scratch:SWI48 0 "r")
> -   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
> -   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))
> -   (set (match_operand:SWI48 3 "memory_operand" "") (const_int 0))
> -   (set (match_operand:SWI48 4 "memory_operand" "") (const_int 0))]
> +   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
> +   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))
> +   (set (match_operand:SWI48 3 "memory_operand") (const_int 0))
> +   (set (match_operand:SWI48 4 "memory_operand") (const_int 0))]
>"optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
> -  [(set (match_dup 1) (match_dup 0))
> -   (set (match_dup 2) (match_dup 0))
> -   (set (match_dup 3) (match_dup 0))
> -   (set (match_dup 4) (match_dup 0))]
> +  [(const_int 0)]
>  {
>ix86_expand_clear (operands[0]);
> +  emit_move_insn (operands[1], operands[0]);
> +  emit_move_insn (operands[2], operands[0]);
> +  emit_move_insn (operands[3], operands[0]);
> +  ix86_last_zero_store_uid
> += INSN_UID (emit_move_insn (operands[4], operands[0]));
> +  DONE;
>  })
>
>  (define_peephole2
>[(match_scratch:SWI48 0 "r")
> -   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
> -   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))]
> +   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
> +   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))]
>"optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
> -  [(set (match_dup 1) (match_dup 0))
> -   (set (match_dup 2) (match_dup 0))]
> +  [(const_int 0)]
>  {
>ix86_expand_clear (operands[0]);
> +  

Re: [patch] Fortran: fix sm computation in CFI_allocate [PR93524]

2021-06-21 Thread Tobias Burnus

On 21.06.21 08:05, Sandra Loosemore wrote:


I ran into this bug in CFI_allocate while testing something else and
then realized there was already a PR open for it.  It seems like an
easy fix, and I've used Tobias's test case from the issue more or less
verbatim.

There were some other bugs added on to this issue but I think they
have all been fixed already except for this one.

OK to check in?

OK – but see some comments below.


 libgfortran/
  PR fortran/93524
  * runtime/ISO_Fortran_binding.c (CFI_allocate): Fix
  sm computation.

 gcc/testsuite/
  PR fortran/93524
  * gfortran.dg/pr93524.c, gfortran.dg/pr93524.f90: New.

It is new to me that we use this syntax. I think you want to have one
line per file, each starting with "*".

+++ b/gcc/testsuite/gfortran.dg/pr93524.c
@@ -0,0 +1,33 @@
+/* Test the fix for PR93524, in which CFI_allocate was computing
+   sm incorrectly for dimensions > 2.  */
+
+#include   // For size_t
+#include <ISO_Fortran_binding.h>


I keep making this mistake myself: The last line works if you
use the installed compiler for testing; if you run the testsuite
via the build directory, it will either fail or take the wrong
version of the file (the one under /usr/include). Solution: Use

#include "../../../libgfortran/ISO_Fortran_binding.h"

as we do in the other tests which use that file.


+++ b/gcc/testsuite/gfortran.dg/pr93524.f90
...
+! Test the fix for PR93524.  The main program is in pr93524.c.
+
+subroutine my_fortran_sub_1 (A) bind(C)
+  real :: A(:, :,:)
+  print *, 'Lower bounds: ', lbound(A) ! Lower bounds: 1 1 1
+  print *, 'Upper bounds: ', ubound(A) ! Upper bounds: 21 6 8
+end
+subroutine my_fortran_sub_2 (A) bind(C)
+  real, ALLOCATABLE :: A(:, :,:)
+  print *, 'Lower bounds: ', lbound(A)
+  print *, 'Upper bounds: ', ubound(A)


I think the 'print' should be replaced (or commented + augmented) by 'if
(any (lbound(A) /= 1)) stop 1'; 'if (any (ubound(A) /= [21,6,8])) stop 2'
etc.

Actually, it probably does not work for the second function due to
PR92189 (lbounds are wrong). If so, you could use 'if (any (shape(A) /=
[21,6,8])) stop 4' instead.

Can you also add 'if (.not. is_contiguous (A)) stop 3' to both
functions? That issue was mentioned in the PR and is probably fixed by
your change.
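For reference, the strides CFI_allocate should produce for a freshly allocated array are easy to state: sm of dimension 0 is elem_len, and each further sm is the previous one times the previous extent. A minimal C sketch, assuming plain ptrdiff_t arrays instead of the CFI_cdesc_t descriptor from ISO_Fortran_binding.h; contiguous_strides and strides_are_contiguous are hypothetical helpers, the latter only in the spirit of the is_contiguous check suggested above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte strides of a contiguous, column-major array of RANK dimensions.  */
void
contiguous_strides (size_t elem_len, int rank, const ptrdiff_t extent[],
                    ptrdiff_t sm[])
{
  ptrdiff_t stride = (ptrdiff_t) elem_len;
  for (int i = 0; i < rank; i++)
    {
      sm[i] = stride;
      stride *= extent[i];
    }
}

/* True when every stride equals elem_len times the preceding extents.  */
bool
strides_are_contiguous (size_t elem_len, int rank, const ptrdiff_t extent[],
                        const ptrdiff_t sm[])
{
  ptrdiff_t expected = (ptrdiff_t) elem_len;
  for (int i = 0; i < rank; i++)
    {
      if (sm[i] != expected)
        return false;
      expected *= extent[i];
    }
  return true;
}
```

For the 21 x 6 x 8 default-real array in the test (elem_len 4), this gives sm = 4, 84, 504.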

Otherwise, it looks fine :-)

Thanks for the patch.

Tobias



Re: [PATCH][AVX512] Optimize vpexpand* to mask mov when mask have all ones in it's lower part (including 0 and -1).

2021-06-21 Thread Hongtao Liu via Gcc-patches
This is the patch I'm going to push to the trunk.

On Wed, May 12, 2021 at 3:28 PM Hongtao Liu  wrote:
>
> ping
>
> On Fri, Apr 30, 2021 at 12:49 PM Hongtao Liu  wrote:
> >
> > Hi:
> >   For v{,p}expand*, when the mask is 0, -1, or has all one bits in its
> > lower part, it can be optimized to a simple mov or a mask mov.
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}  and
> > x86_64-linux-gnu{m32\ -march=cascadelake,-m64\ -march=cascadelake},
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-builtin.def (BDESC): Adjust builtin name.
> > * config/i386/sse.md (_expand_mask): Rename to ..
> > (expand_mask): this ..
> > (*expand_mask): New pre_reload splitter to transform
> > v{,p}expand* to vmov* when mask is zero, all ones, or has
> > all ones in its lower part, otherwise still generate v{,p}expand*.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx512bw-pr100267-1.c: New test.
> > * gcc.target/i386/avx512bw-pr100267-b-2.c: New test.
> > * gcc.target/i386/avx512bw-pr100267-d-2.c: New test.
> > * gcc.target/i386/avx512bw-pr100267-q-2.c: New test.
> > * gcc.target/i386/avx512bw-pr100267-w-2.c: New test.
> > * gcc.target/i386/avx512f-pr100267-1.c: New test.
> > * gcc.target/i386/avx512f-pr100267-pd-2.c: New test.
> > * gcc.target/i386/avx512f-pr100267-ps-2.c: New test.
> > * gcc.target/i386/avx512vl-pr100267-1.c: New test.
> > * gcc.target/i386/avx512vl-pr100267-pd-2.c: New test.
> > * gcc.target/i386/avx512vl-pr100267-ps-2.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-1.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-b-2.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-d-2.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-q-2.c: New test.
> > * gcc.target/i386/avx512vlbw-pr100267-w-2.c: New test.
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao
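A scalar C model of why the transformation is valid, for an 8-lane vector such as __m512d. The helpers expand_model, mask_mov_model and mask_is_low_ones are hypothetical and only model lane semantics, not the actual patterns in sse.md. When the set mask bits form a contiguous run starting at bit 0 (a value of the form 2^k - 1, which includes 0 and all ones), lane i always consumes source element i, so the expand behaves exactly like a masked move:

```c
#include <stdbool.h>
#include <stdint.h>

/* Expand: lane i takes the next unconsumed element of SRC when mask bit i
   is set and keeps OLD[i] otherwise.  */
void
expand_model (double dst[8], const double old[8], const double src[8],
              uint8_t mask)
{
  int j = 0;
  for (int i = 0; i < 8; i++)
    dst[i] = ((mask >> i) & 1) ? src[j++] : old[i];
}

/* Masked move: lane i takes SRC[i] when mask bit i is set.  */
void
mask_mov_model (double dst[8], const double old[8], const double src[8],
                uint8_t mask)
{
  for (int i = 0; i < 8; i++)
    dst[i] = ((mask >> i) & 1) ? src[i] : old[i];
}

/* True when MASK is 2^k - 1 for some k in [0, 8]: adding 1 carries through
   the low run of ones, so no set bit survives the AND.  */
bool
mask_is_low_ones (uint8_t mask)
{
  return (mask & (uint8_t) (mask + 1)) == 0;
}
```

A mask such as 0x02 shows the difference: the expand puts SRC[0] into lane 1, while the masked move puts SRC[1] there.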
From 17e8b8e85da9d3a2bcacc108615a307ae04d67f3 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Thu, 29 Apr 2021 18:27:09 +0800
Subject: [PATCH 2/2] [i386] Optimize vpexpand* to mask mov when mask have all
 ones in it's lower part (including 0 and -1).

gcc/ChangeLog:

	* config/i386/i386-builtin.def (BDESC): Adjust builtin name.
	* config/i386/sse.md (_expand_mask): Rename to ..
	(expand_mask): this ..
	(*expand_mask): New pre_reload splitter to transform
	v{,p}expand* to vmov* when mask is zero, all ones, or has all
	ones in its lower part, otherwise still generate
	v{,p}expand*.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-pr100267-1.c: New test.
	* gcc.target/i386/avx512bw-pr100267-b-2.c: New test.
	* gcc.target/i386/avx512bw-pr100267-d-2.c: New test.
	* gcc.target/i386/avx512bw-pr100267-q-2.c: New test.
	* gcc.target/i386/avx512bw-pr100267-w-2.c: New test.
	* gcc.target/i386/avx512f-pr100267-1.c: New test.
	* gcc.target/i386/avx512f-pr100267-pd-2.c: New test.
	* gcc.target/i386/avx512f-pr100267-ps-2.c: New test.
	* gcc.target/i386/avx512vl-pr100267-1.c: New test.
	* gcc.target/i386/avx512vl-pr100267-pd-2.c: New test.
	* gcc.target/i386/avx512vl-pr100267-ps-2.c: New test.
	* gcc.target/i386/avx512vlbw-pr100267-1.c: New test.
	* gcc.target/i386/avx512vlbw-pr100267-b-2.c: New test.
	* gcc.target/i386/avx512vlbw-pr100267-d-2.c: New test.
	* gcc.target/i386/avx512vlbw-pr100267-q-2.c: New test.
	* gcc.target/i386/avx512vlbw-pr100267-w-2.c: New test.
---
 gcc/config/i386/i386-builtin.def  |  48 +++
 gcc/config/i386/sse.md|  69 +-
 .../gcc.target/i386/avx512bw-pr100267-1.c |  38 ++
 .../gcc.target/i386/avx512bw-pr100267-b-2.c   |  74 +++
 .../gcc.target/i386/avx512bw-pr100267-d-2.c   |  74 +++
 .../gcc.target/i386/avx512bw-pr100267-q-2.c   |  74 +++
 .../gcc.target/i386/avx512bw-pr100267-w-2.c   |  74 +++
 .../gcc.target/i386/avx512f-pr100267-1.c  |  66 ++
 .../gcc.target/i386/avx512f-pr100267-pd-2.c   |  76 +++
 .../gcc.target/i386/avx512f-pr100267-ps-2.c   |  72 +++
 .../gcc.target/i386/avx512vl-pr100267-1.c | 122 ++
 .../gcc.target/i386/avx512vl-pr100267-pd-2.c  |  15 +++
 .../gcc.target/i386/avx512vl-pr100267-ps-2.c  |  15 +++
 .../gcc.target/i386/avx512vlbw-pr100267-1.c   |  66 ++
 .../gcc.target/i386/avx512vlbw-pr100267-b-2.c |  16 +++
 .../gcc.target/i386/avx512vlbw-pr100267-d-2.c |  15 +++
 .../gcc.target/i386/avx512vlbw-pr100267-q-2.c |  15 +++
 .../gcc.target/i386/avx512vlbw-pr100267-w-2.c |  16 +++
 18 files changed, 920 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-b-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-d-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr100267-q-2.c
 create mode 100644 

Re: [PATCH] Modula-2 into the GCC tree on master

2021-06-21 Thread Gaius Mulley via Gcc-patches
Segher Boessenkool  writes:

> Gaius, could you look through the two patches I did to get the build to
> work, see if those are correct or if something better needs to be done?
>
> 
> $(subdir) is an absolute path for me, so ../$(subdir) cannot work.

Hi Segher,

ah, after more testing: the patch will fail with a relative srcdir, and
the configure has to be run in the m2 or m2/gm2-libs build directories.
I've pushed some changes onto the gm2 repo at savannah to fix this,
which are included below.  Also included are the getopt renaming
changes you suggested.

regards,
Gaius


diff --git a/gcc-versionno/gcc/m2/ChangeLog b/gcc-versionno/gcc/m2/ChangeLog
index eeac1930..e523f307 100644
--- a/gcc-versionno/gcc/m2/ChangeLog
+++ b/gcc-versionno/gcc/m2/ChangeLog
@@ -1,4 +1,23 @@
-2021-06-19  Matthias Klose 
+2021-06-21   Gaius Mulley 
+
+   * tools-src/calcpath:  (New file).
+   * Make-lang.in:  (m2/gm2-libs/gm2-libs-host.h) use calcpath
+   to determine the srcdir of the new subdirectory.
+   (m2/gm2config.h) use calcpath
+   to determine the srcdir of the new subdirectory.
+   Fixes an error and based on a patch reported by Segher Boessenkool
+   .
+   * Make-lang.in:  (m2/gm2-libs/gm2-libs-host.h) Restore tabs.
+   * Make-lang.in:  (m2/gm2config.h) Restore tabs.
+   * Make-lang.in:  Replaced getopt.c by cgetopt.c.
+   * gm2-libs/getopt.def:  Renamed gm2-libs/cgetopt.def.
+   * gm2-libs-ch/getopt.c:  Renamed gm2-libs-ch/cgetopt.c.
+   Replaced getopt_ by cgetopt_.
+   Fixes an error reported by Segher Boessenkool
+   .
+   * tools-src/calcpath:  (Corrected header comment).
+
+2021-06-19   Matthias Klose 
 
* Make-lang.in:  introduce parallel linking.
* Make-lang.in (m2.serial): New target.
diff --git a/gcc-versionno/gcc/m2/Make-lang.in b/gcc-versionno/gcc/m2/Make-lang.in
index 58e5312e..298da26f 100644
--- a/gcc-versionno/gcc/m2/Make-lang.in
+++ b/gcc-versionno/gcc/m2/Make-lang.in
@@ -1179,9 +1179,11 @@ m2/gm2-libs-iso/%.o: $(srcdir)/m2/gm2-libs-iso/%.mod
 
 m2/gm2-libs/gm2-libs-host.h:
echo "Configuring to build libraries using native compiler" ; \
+NEW_SRCDIR=`${srcdir}/m2/tools-src/calcpath ../../ ${srcdir} m2/gm2-libs` ; \
+export NEW_SRCDIR ; \
 cd m2/gm2-libs ; \
-$(SHELL) -c '../../$(srcdir)/m2/gm2-libs/config-host \
-   --srcdir=../../$(srcdir)/m2/gm2-libs \
+$(SHELL) -c '$${NEW_SRCDIR}/config-host \
+   --srcdir=$${NEW_SRCDIR} \
--target=$(target) \
--program-suffix=$(exeext)'
 
@@ -1189,15 +1191,21 @@ m2/gm2-libs/gm2-libs-host.h:
 # cross compiler and the ../Makefile.in above appends this to INTERNAL_CFLAGS.
 
 m2/gm2config.h:
+   NEW_SRCDIR=`${srcdir}/m2/tools-src/calcpath ../ ${srcdir} m2` ; \
+export NEW_SRCDIR ; \
cd m2 ; \
if echo $(INTERNAL_CFLAGS) | grep \\-DCROSS_DIRECTORY_STRUCTURE; then \
-AR=`echo $(AR_FOR_TARGET) | sed -e "s/^ //"` ; \
+AR=$$(echo $(AR_FOR_TARGET) | sed -e "s/^ //") ; \
 export AR ; \
-RANLIB=`echo $(RANLIB_FOR_TARGET) | sed -e "s/^ //"` ; \
+RANLIB=$$(echo $(RANLIB_FOR_TARGET) | sed -e "s/^ //") ; \
 export RANLIB ; \
-$(SHELL) -c '../$(srcdir)/m2/configure --srcdir=../$(srcdir)/m2 --target=$(target) --program-suffix=$(exeext) --includedir=$(SYSTEM_HEADER_DIR) --libdir=$(libdir) --libexecdir=$(libexecdir)' ; \
+$(SHELL) -c '$${NEW_SRCDIR}/configure --srcdir=$${NEW_SRCDIR} \
+--target=$(target) --program-suffix=$(exeext) \
+--includedir=$(SYSTEM_HEADER_DIR) --libdir=$(libdir) \
+--libexecdir=$(libexecdir)' ; \
 else \
-$(SHELL) -c '../$(srcdir)/m2/configure --srcdir=../$(srcdir)/m2 --target=$(target) --program-suffix=$(exeext)' ; \
+$(SHELL) -c '$${NEW_SRCDIR}/configure --srcdir=$${NEW_SRCDIR} \
+--target=$(target) --program-suffix=$(exeext)' ; \
 fi
 
 $(objdir)/m2/gm2-libs-min/SYSTEM.def: $(GM2_PROG_DEP)
diff --git a/gcc-versionno/gcc/m2/gm2-libs-ch/getopt.c b/gcc-versionno/gcc/m2/gm2-libs-ch/cgetopt.c
similarity index 67%
rename from gcc-versionno/gcc/m2/gm2-libs-ch/getopt.c
rename to gcc-versionno/gcc/m2/gm2-libs-ch/cgetopt.c
index 1f483a72..205c0487 100644
--- a/gcc-versionno/gcc/m2/gm2-libs-ch/getopt.c
+++ b/gcc-versionno/gcc/m2/gm2-libs-ch/cgetopt.c
@@ -28,21 +28,21 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #include "system.h"
 #include "ansi-decl.h"
 
-char *getopt_optarg;
-int getopt_optind;
-int getopt_opterr;
-int getopt_optopt;
+char *cgetopt_optarg;
+int cgetopt_optind;
+int cgetopt_opterr;
+int cgetopt_optopt;
 
 
 char
-getopt_getopt (int argc, char *argv[], char *optstring)
+cgetopt_getopt (int argc, char *argv[], char 

Re: [PATCH][AVX512] Fix ICE for vpexpand*.

2021-06-21 Thread Hongtao Liu via Gcc-patches
This is the patch I'm going to push to the trunk.

On Wed, May 12, 2021 at 3:29 PM Hongtao Liu  wrote:
>
> ping
>
> On Fri, Apr 30, 2021 at 12:42 PM Hongtao Liu  wrote:
> >
> > Hi:
> >   This patch fixes an ICE which was introduced by my
> > r11-5696-g35c4c67e6c534ef3d6ba7a7752ab7e0fbc91755b.
> >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> >   Ok for trunk and backport to GCC11?
> >
> >   gcc/ChangeLog
> >
> > PR target/100310
> > * config/i386/i386-expand.c
> > (ix86_expand_special_args_builtin): Keep constm1_operand only
> > if it satisfies insn's operand predicate.
> >
> > gcc/testsuite/ChangeLog
> >
> > PR target/100310
> > * gcc.target/i386/pr100310.c: New test.
> >
> > Add test
> > ---
> >  gcc/config/i386/i386-expand.c|  5 +++--
> >  gcc/testsuite/gcc.target/i386/pr100310.c | 12 
> >  2 files changed, 15 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr100310.c
> >
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index 516440eb5c1..b2bb2b1e3a1 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -10862,11 +10862,12 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
> >
> >   op = fixup_modeless_constant (op, mode);
> >
> > - /* NB: 3-operands load implied it's a mask load,
> > + /* NB: 3-operands load implied it's a mask load or v{p}expand*,
> >  and that mask operand shoud be at the end.
> >  Keep all-ones mask which would be simplified by the expander.  */
> >   if (nargs == 3 && i == 2 && klass == load
> > - && constm1_operand (op, mode))
> > + && constm1_operand (op, mode)
> > + && insn_p->operand[i].predicate (op, mode))
> > ;
> >   else if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
> > op = copy_to_mode_reg (mode, op);
> > diff --git a/gcc/testsuite/gcc.target/i386/pr100310.c b/gcc/testsuite/gcc.target/i386/pr100310.c
> > new file mode 100644
> > index 000..54ace18531b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr100310.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512f -O2" } */
> > +#include <immintrin.h>
> > +
> > +double *p;
> > +volatile __m512d x;
> > +volatile __mmask8 m;
> > +
> > +void foo()
> > +{
> > +  x = _mm512_mask_expandloadu_pd (x, 255, p);
> > +}
> > --
> > 2.18.1
> >
> >
> > --
> > BR,
> > Hongtao
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao
From 274325ebab87bd56484a6a55cfeb358dc5189263 Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Wed, 28 Apr 2021 14:52:59 +0800
Subject: [PATCH 1/2] [i386] Fix ICE for vpexpand*.

gcc/ChangeLog

	PR target/100310
	* config/i386/i386-expand.c
	(ix86_expand_special_args_builtin): Keep constm1_operand only
	if it satisfies insn's operand predicate.

gcc/testsuite/ChangeLog

	PR target/100310
	* gcc.target/i386/pr100310.c: New test.

Add test
---
 gcc/config/i386/i386-expand.c|  5 +++--
 gcc/testsuite/gcc.target/i386/pr100310.c | 12 
 2 files changed, 15 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr100310.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 8f4e4e4d884..cc2eaeed8df 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -10969,11 +10969,12 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
 
 	  op = fixup_modeless_constant (op, mode);
 
-	  /* NB: 3-operands load implied it's a mask load,
+	  /* NB: 3-operands load implied it's a mask load or v{p}expand*,
 	 and that mask operand shoud be at the end.
 	 Keep all-ones mask which would be simplified by the expander.  */
 	  if (nargs == 3 && i == 2 && klass == load
-	  && constm1_operand (op, mode))
+	  && constm1_operand (op, mode)
+	  && insn_p->operand[i].predicate (op, mode))
 	;
 	  else if (GET_MODE (op) == mode || GET_MODE (op) == VOIDmode)
 	op = copy_to_mode_reg (mode, op);
diff --git a/gcc/testsuite/gcc.target/i386/pr100310.c b/gcc/testsuite/gcc.target/i386/pr100310.c
new file mode 100644
index 000..54ace18531b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100310.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+#include <immintrin.h>
+
+double *p;
+volatile __m512d x;
+volatile __mmask8 m;
+
+void foo()
+{
+  x = _mm512_mask_expandloadu_pd (x, 255, p);
+}
-- 
2.18.1



Re: [PATCH 5/5] Fortran: Re-enable 128-bit integers for AMD GCN

2021-06-21 Thread Tobias Burnus

On 18.06.21 16:20, Julian Brown wrote:

This patch reverts the part of Tobias's patch for PR target/96306 that
disables 128-bit integer support for AMD GCN.

OK for mainline (assuming the previous patches are in first)?


Well, as the only reason for that patch was to avoid tons of FAILs/ICEs due
to incomplete TI support, I think it is fine (obvious) that this band-aid
can be removed now that complete or mostly complete TI mode
(int128_t/integer kind=16) is available.
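For readers less familiar with TI mode: on 64-bit targets it is the 128-bit integer mode behind GCC's __int128 type in C, which is what gfortran's INTEGER(KIND=16) maps to. A small C sketch of what that support provides, assuming a compiler and target that accept unsigned __int128 (a GCC extension); mul_64x64 is a hypothetical helper:

```c
#include <stdint.h>

/* Full 128-bit product of two 64-bit values, returned as high and low
   64-bit halves.  */
void
mul_64x64 (uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo)
{
  unsigned __int128 p = (unsigned __int128) x * y;
  *hi = (uint64_t) (p >> 64);
  *lo = (uint64_t) p;
}
```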

Besides, as remarked on IRC, as this is target specific, I think you
can also approve it yourself as GCN maintainer.

But for completeness: OK from my/Fortran's side.
And thanks for the patches!

Tobias


2021-06-18  Julian Brown  

libgfortran/
  PR target/96306
  * configure.ac: Remove stanza that removes KIND=16 integers for AMD GCN.
  * configure: Regenerate.
---
  libgfortran/configure| 22 --
  libgfortran/configure.ac |  4 
  2 files changed, 4 insertions(+), 22 deletions(-)

diff --git a/libgfortran/configure b/libgfortran/configure
index f3634389cf8..886216f69d4 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -6017,7 +6017,7 @@ case "$host" in
  case "$enable_cet" in
auto)
  # Check if target supports multi-byte NOPs
- # and if assembler supports CET insn.
+ # and if compiler and assembler support CET insn.
  cet_save_CFLAGS="$CFLAGS"
  CFLAGS="$CFLAGS -fcf-protection"
  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
@@ -6216,10 +6216,6 @@ fi
  LIBGOMP_CHECKED_INT_KINDS="1 2 4 8 16"
  LIBGOMP_CHECKED_REAL_KINDS="4 8 10 16"

-if test "x${target_cpu}" = xamdgcn; then
-  # amdgcn only has limited support for __int128.
-  LIBGOMP_CHECKED_INT_KINDS="1 2 4 8"
-fi



@@ -12731,7 +12727,7 @@ else
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
-#line 12744 "configure"
+#line 12730 "configure"
  #include "confdefs.h"

  #if HAVE_DLFCN_H
@@ -12837,7 +12833,7 @@ else
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
-#line 12850 "configure"
+#line 12836 "configure"
  #include "confdefs.h"

  #if HAVE_DLFCN_H
@@ -15532,16 +15528,6 @@ freebsd* | dragonfly*)
esac
;;

-gnu*)
-  version_type=linux
-  need_lib_prefix=no
-  need_version=no
-  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}'
-  soname_spec='${libname}${release}${shared_ext}$major'
-  shlibpath_var=LD_LIBRARY_PATH
-  hardcode_into_libs=yes
-  ;;
-
  haiku*)
version_type=linux
need_lib_prefix=no
@@ -15663,7 +15649,7 @@ linux*oldld* | linux*aout* | linux*coff*)
  # project, but have not yet been accepted: they are GCC-local changes
  # for the time being.  (See
  # https://lists.gnu.org/archive/html/libtool-patches/2018-05/msg0.html)
-linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
+linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu* | uclinuxfdpiceabi)
version_type=linux
need_lib_prefix=no
need_version=no
diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
index 8961e314d82..523eb24bca1 100644
--- a/libgfortran/configure.ac
+++ b/libgfortran/configure.ac
@@ -222,10 +222,6 @@ AM_CONDITIONAL(LIBGFOR_MINIMAL, [test "x${target_cpu}" = xnvptx])
  LIBGOMP_CHECKED_INT_KINDS="1 2 4 8 16"
  LIBGOMP_CHECKED_REAL_KINDS="4 8 10 16"

-if test "x${target_cpu}" = xamdgcn; then
-  # amdgcn only has limited support for __int128.
-  LIBGOMP_CHECKED_INT_KINDS="1 2 4 8"
-fi
  AC_SUBST(LIBGOMP_CHECKED_INT_KINDS)
  AC_SUBST(LIBGOMP_CHECKED_REAL_KINDS)




[Ada] Add Ada.Strings.Text_Buffers and replace uses of Ada.Strings.Text_Output

2021-06-21 Thread Pierre-Marie de Rodat
GNAT's initial implementation of Ada 2022's Image attributes for
non-scalar types referenced (in user-visible ways) a GNAT-defined
package Ada.Strings.Text_Output and child units thereof. The Ada RM
specifies that a similar-but-different package,
Ada.Strings.Text_Buffers, is to be provided (RM A.4.12) and used in the
definition of the Put_Image attribute (RM 4.10). Step 1 is to provide a
spec and implementation for this new package and its language-defined
child units. Step 2 is to modify the compiler and the run-time library
to use this new package instead of the old one for all image-related
matters. Having done this, we want to get rid of the old Text_Output
package. It has one other client, the Gen_IL package and its child
units.  Using such bleeding-edge I/O units is problematic during the
bootstrap process. So step 3 is modifying Gen_IL to use only
Ada.Streams.Stream_IO instead. Step 4 will finally delete the
Text_Output package. Steps 1-3 are included in this commit; step 4 is
not.
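For readers new to the package, a minimal sketch of a user-defined Put_Image (RM 4.10) written against Ada.Strings.Text_Buffers (RM A.4.12), the interface the compiler is being moved to. It assumes an Ada 2022 compiler (for example GNAT with -gnat2022); the names Points, Point and Put_Point are illustrative only.

```ada
with Ada.Text_IO;
with Ada.Strings.Text_Buffers;

procedure Put_Image_Demo is

   package Points is
      type Point is record
         X, Y : Integer;
      end record
        with Put_Image => Put_Point;

      --  Profile required by RM 4.10 for a Put_Image procedure
      procedure Put_Point
        (Buffer : in out Ada.Strings.Text_Buffers.Root_Buffer_Type'Class;
         Arg    : Point);
   end Points;

   package body Points is
      procedure Put_Point
        (Buffer : in out Ada.Strings.Text_Buffers.Root_Buffer_Type'Class;
         Arg    : Point) is
      begin
         --  Write a custom image into the buffer supplied by 'Image
         Buffer.Put
           ("<" & Integer'Image (Arg.X) & "," & Integer'Image (Arg.Y) & " >");
      end Put_Point;
   end Points;

   P : constant Points.Point := (X => 1, Y => 2);

begin
   --  'Image on a composite object is routed through Put_Point
   Ada.Text_IO.Put_Line (P'Image);
end Put_Image_Demo;
```

The buffer parameter is class-wide, so the same Put_Point works whatever concrete buffer type the compiler or run time passes in.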

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* Make-generated.in (GEN_IL_FLAGS): Keep only GNAT flags.
(ada/stamp-gen_il): Remove dependencies on libgnat/ sources.  Do not
copy libgnat/ sources locally and tidy up.
* Makefile.rtl: Include object files for new Text_Buffer units
in the GNATRTL_NONTASKING_OBJS list.
* exp_put_image.ads, exp_put_image.adb: Update Rtsfind calls to
match new specs. For example, calls to RE_Sink are replaced with
calls to RE_Root_Buffer_Type. Update comments and change
subprogram names accordingly (e.g., Preload_Sink is changed to
Preload_Root_Buffer_Type).
* impunit.adb: Add 6 new predefined units (Text_Buffers and 5
child units thereof).
* rtsfind.ads, rtsfind.adb: Add interfaces for accessing the
Ada.Strings.Text_Buffers package and declarations
therein (including the Unbounded child unit). Do not (yet)
delete interfaces for accessing the old Text_Output package.
* sem_attr.adb (Check_Put_Image_Attribute): Replace RE_Sink uses
with RE_Root_Buffer_Type and update comments accordingly.
* sem_ch10.adb (Analyze_Compilation_Unit): Update call to
reflect name change of callee (that is, the former Preload_Sink
is now Preload_Root_Buffer_Type).
* sem_ch13.adb (Has_Good_Profile): Replace RE_Sink use with
RE_Root_Buffer_Type.
(Build_Spec): Update comment describing a parameter type.
* gen_il.ads: Remove clauses for the old Text_Output package and
add them for Ada.Streams.Stream_IO.
(Sink): Declare.
(Create_File): Likewise.
(Increase_Indent): Likewise.
(Decrease_Indent): Likewise.
(Put): Likewise.
(LF): Likewise.
* gen_il.adb: Add clauses for Ada.Streams.Stream_IO.
(Create_File): New procedure.
(Increase_Indent): Likewise.
(Decrease_Indent): Likewise.
(Put): New procedures.
* gen_il-gen.adb: Add clauses for Ada.Text_IO.  Replace
Sink'Class with Sink throughout.  Use string concatenation and
LF marker instead of formatted strings and "\n" marker.  Update
Indent/Outdent calls to use new Increase_Indent/Decrease_Indent
names.
(Put_Membership_Query_Decl): Remove.
* gen_il-internals.ads: Replace Sink'Class with Sink throughout.
(Ptypes): Remove.
(Pfields): Likewise.
* gen_il-internals.adb: Remove clauses for GNAT.OS_Lib and
Ada.Strings.Text_Buffers.Files.  Replace Sink'Class with Sink
throughout.  Use string concatenation and LF marker instead of
formatted strings and "\n" marker.
(Stdout): Remove.
(Ptypes): Likewise.
(Pfields): Likewise.
* libgnarl/s-putaim.ads: Modify context clause, update
declaration of subtype Sink to refer to
Text_Buffers.Root_Buffer_Type instead of the old
Text_Output.Sink type.
* libgnarl/s-putaim.adb: Modify context clause and add use
clause to refer to Text_Buffers package.
* libgnat/a-cbdlli.ads, libgnat/a-cbdlli.adb,
libgnat/a-cbhama.ads, libgnat/a-cbhama.adb,
libgnat/a-cbhase.ads, libgnat/a-cbhase.adb,
libgnat/a-cbmutr.ads, libgnat/a-cbmutr.adb,
libgnat/a-cborma.ads, libgnat/a-cborma.adb,
libgnat/a-cborse.ads, libgnat/a-cborse.adb,
libgnat/a-cdlili.ads, libgnat/a-cdlili.adb,
libgnat/a-cidlli.ads, libgnat/a-cidlli.adb,
libgnat/a-cihama.ads, libgnat/a-cihama.adb,
libgnat/a-cihase.ads, libgnat/a-cihase.adb,
libgnat/a-cimutr.ads, libgnat/a-cimutr.adb,
libgnat/a-ciorma.ads, libgnat/a-ciorma.adb,
libgnat/a-ciormu.ads, libgnat/a-ciormu.adb,
libgnat/a-ciorse.ads, libgnat/a-ciorse.adb,
libgnat/a-coboho.ads, libgnat/a-coboho.adb,
libgnat/a-cobove.ads, libgnat/a-cobove.adb,
 

[Ada] Implement fixed-lower-bound consistency checks for qualified_expressions

2021-06-21 Thread Pierre-Marie de Rodat
This change implements a missing check for qualified_expressions where
the qualifying subtype is an unconstrained array subtype that specifies
fixed lower bounds for one or more of its index ranges.
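As an illustration, a sketch assuming a GNAT compiler that contains this change, with GNAT extensions enabled (fixed lower bounds are a GNAT extension; pragma Extensions_Allowed is used here, -gnatX is the switch equivalent). Qualifying an array whose lower bound differs from the fixed one should fail the new check and raise Constraint_Error. All names are hypothetical.

```ada
pragma Extensions_Allowed (On);  --  fixed lower bounds are a GNAT extension

with Ada.Text_IO; use Ada.Text_IO;

procedure FLB_Qualified_Demo is
   --  Unconstrained subtype whose lower bound is fixed at 1
   subtype String_1 is String (1 .. <>);

   --  An array object whose lower bound is 5, not 1
   S : constant String (5 .. 7) := "abc";
begin
   declare
      --  The new check compares S'First (5) with the fixed
      --  lower bound of String_1 (1)
      T : constant String_1 := String_1'(S);
   begin
      Put_Line ("check passed, T'First =" & Integer'Image (T'First));
   end;
exception
   when Constraint_Error =>
      Put_Line ("Constraint_Error: S'First is not 1");
end FLB_Qualified_Demo;
```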

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* checks.adb (Selected_Range_Checks): In the case of a
qualified_expression where the qualifying subtype is an
unconstrained array subtype with fixed lower bounds for some of
its indexes, generate tests to check that those bounds are equal
to the corresponding lower bounds of the qualified array object.

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -11106,6 +11106,56 @@ package body Checks is
end;
 end if;
 
+ --  If the context is a qualified_expression where the subtype is
+ --  an unconstrained array subtype with fixed-lower-bound indexes,
+ --  then consistency checks must be done between the lower bounds
+ --  of any such indexes and the corresponding lower bounds of the
+ --  qualified array object.
+
+ elsif Is_Fixed_Lower_Bound_Array_Subtype (T_Typ)
+   and then Nkind (Parent (Expr)) = N_Qualified_Expression
+   and then not Do_Access
+ then
+declare
+   Ndims : constant Pos := Number_Dimensions (T_Typ);
+
+   Qual_Index : Node_Id;
+   Expr_Index : Node_Id;
+
+begin
+   Expr_Actual := Get_Referenced_Object (Expr);
+   Exptyp  := Get_Actual_Subtype (Expr_Actual);
+
+   Qual_Index := First_Index (T_Typ);
+   Expr_Index := First_Index (Exptyp);
+
+   for Indx in 1 .. Ndims loop
+  if Nkind (Expr_Index) /= N_Raise_Constraint_Error then
+
+ --  If this index of the qualifying array subtype has
+ --  a fixed lower bound, then apply a check that the
+ --  corresponding lower bound of the array expression
+ --  is equal to it.
+
+ if Is_Fixed_Lower_Bound_Index_Subtype (Etype (Qual_Index))
+ then
+Evolve_Or_Else
+  (Cond,
+   Make_Op_Ne (Loc,
+ Left_Opnd   =>
+   Get_E_First_Or_Last
+ (Loc, Exptyp, Indx, Name_First),
+ Right_Opnd  =>
+   New_Copy_Tree
+ (Type_Low_Bound (Etype (Qual_Index)))));
+ end if;
+
+ Next (Qual_Index);
+ Next (Expr_Index);
+  end if;
+   end loop;
+end;
+
  else
 --  For a conversion to an unconstrained array type, generate an
 --  Action to check that the bounds of the source value are within




[Ada] Optimization of System.Value_N

2021-06-21 Thread Pierre-Marie de Rodat
Add Inline and Pure_Function aspects. With these changes, and additional
changes in gigi, we should be able to eliminate duplicate calls to
Value_Enumeration_Pos.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-valuen.ads (Value_Enumeration,
Valid_Enumeration_Value): Inline.
(Value_Enumeration_Pos): Add Pure_Function.

diff --git a/gcc/ada/libgnat/s-valuen.ads b/gcc/ada/libgnat/s-valuen.ads
--- a/gcc/ada/libgnat/s-valuen.ads
+++ b/gcc/ada/libgnat/s-valuen.ads
@@ -48,7 +48,7 @@ package System.Value_N is
   Hash: Hash_Function_Ptr;
   Num : Natural;
   Str : String)
-  return Natural;
+  return Natural with Inline;
--  Used to compute Enum'Value (Str) where Enum is some enumeration type
--  other than those defined in package Standard. Names is a string with
--  a lower bound of 1 containing the characters of all the enumeration
@@ -73,7 +73,7 @@ package System.Value_N is
   Hash: Hash_Function_Ptr;
   Num : Natural;
   Str : String)
-  return Boolean;
+  return Boolean with Inline;
--  Returns True if Str is a valid Image of some enumeration literal, False
--  otherwise. That is, returns False if and only if Value_Enumeration would
--  raise Constraint_Error. The parameters have the same meaning as for
@@ -87,7 +87,7 @@ package System.Value_N is
   Hash: Hash_Function_Ptr;
   Num : Natural;
   Str : String)
-  return Integer;
+  return Integer with Pure_Function;
--  Same as Value_Enumeration, except returns Invalid if Value_Enumeration
--  would raise Constraint_Error.
 




[Ada] INOX: prototype "when" constructs

2021-06-21 Thread Pierre-Marie de Rodat
This patch implements experimental features under the -gnatX flag for
"return ... when", "raise ... when", and "goto ... when" constructs.

Tested on x86_64-pc-linux-gnu, committed on trunk
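A sketch of the three experimental forms, assuming a GNAT build from this period with extensions enabled (pragma Extensions_Allowed or -gnatX). Because these are experimental, later releases may change or withdraw them. Safe_Div, Require_Natural and Done are hypothetical names.

```ada
pragma Extensions_Allowed (On);  --  "when" forms need GNAT extensions

with Ada.Text_IO; use Ada.Text_IO;

procedure When_Demo is

   function Safe_Div (A, B : Integer) return Integer is
   begin
      return 0 when B = 0;   --  same as: if B = 0 then return 0; end if;
      return A / B;
   end Safe_Div;

   procedure Require_Natural (X : Integer) is
   begin
      raise Constraint_Error when X < 0;   --  conditional raise
   end Require_Natural;

   Count : Natural := 0;

begin
   Put_Line (Integer'Image (Safe_Div (10, 0)));
   Put_Line (Integer'Image (Safe_Div (10, 3)));

   Require_Natural (5);

   for I in 1 .. 10 loop
      goto Done when I = 4;   --  conditional jump out of the loop
      Count := Count + 1;
   end loop;

   <<Done>>
   Put_Line ("iterations before exit:" & Natural'Image (Count));
end When_Demo;
```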

gcc/ada/

* doc/gnat_rm/implementation_defined_pragmas.rst: Document new
feature under pragma Extensions_Allowed.
* gnat_rm.texi: Regenerate.
* errout.adb, errout.ads (Error_Msg_GNAT_Extension): Created to
issue errors when parsing extension-only constructs.
* exp_ch11.adb, exp_ch11.ads (Expand_N_Raise_When_Statement):
Created to expand raise ... when constructs.
* exp_ch5.adb, exp_ch5.ads (Expand_N_Goto_When_Statement):
Created to expand goto ... when constructs.
* exp_ch6.adb, exp_ch6.ads (Expand_N_Return_When_Statement):
Created to expand return ... when constructs.
* expander.adb (Expand): Add case entries for "when" constructs.
* gen_il-gen-gen_nodes.adb, gen_il-types.ads: Add entries for
"when" constructs.
* par-ch11.adb (P_Raise_Statement): Add processing for raise ...
when.
* par-ch5.adb (Missing_Semicolon_On_Exit): Renamed to
Missing_Semicolon_On_When and moved to par-util.adb.
* par-ch6.adb (Get_Return_Kind): Renamed from Is_Simple and
processing added for "return ... when" return kind.
(Is_Simple): Renamed to Get_Return_Kind.
(P_Return_Statement): Add case for return ... when variant of
return statement.
* par-util.adb, par.adb (Missing_Semicolon_On_When): Added to
centralize parsing of "when" keywords in the context of "when"
constructs.
* sem.adb (Analyze): Add case for "when" constructs.
* sem_ch11.adb, sem_ch11.ads (Analyze_Raise_When_Statement):
Created to analyze raise ... when constructs.
* sem_ch5.adb, sem_ch5.ads (Analyzed_Goto_When_Statement):
Created to analyze goto ... when constructs.
* sem_ch6.adb, sem_ch6.ads (Analyze_Return_When_Statement):
Created to analyze return ... when constructs.
* sprint.adb (Sprint_Node_Actual): Add entries for new "when"
nodes.

patch.diff.gz
Description: application/gzip


[Ada] Improve efficiency of small slice assignments of packed arrays

2021-06-21 Thread Pierre-Marie de Rodat
If slices fit in 32 bits and their bounds are known at compile time, use
a more efficient method for slice assignment. (32 is the usual case; the
actual limit is Val_Bits, which is 32 on most targets.) We might want to
change 32 to 64 here.
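The kind of source this targets is a slice assignment between bit-packed arrays with static bounds. A minimal sketch in standard Ada; whether a given compiler actually expands it into a Fast_Copy_Bitfield call depends on the conditions above, so treat that as an assumption rather than a guarantee.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Packed_Slice_Demo is
   --  A bit-packed array: each component occupies one bit
   type Bit_Array is array (0 .. 63) of Boolean with Pack;

   Src : Bit_Array := (others => False);
   Dst : Bit_Array := (others => False);
begin
   Src (3 .. 10) := (others => True);

   --  Static bounds and an 8-bit slice: a candidate for the fast path
   Dst (20 .. 27) := Src (3 .. 10);

   for I in Dst'Range loop
      Put (if Dst (I) then '1' else '0');
   end loop;
   New_Line;
end Packed_Slice_Demo;
```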

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* rtsfind.ads, libgnat/s-bitfie.ads, libgnat/s-bituti.adb,
libgnat/s-bituti.ads (Fast_Copy_Bitfield): New run-time library
function to copy bit fields faster than Copy_Bitfield. Cannot be
called with zero-size bit fields.  Remove obsolete ??? comments
from s-bituti.adb; we already do "avoid calling this if
Forwards_OK is False".
* exp_ch5.adb (Expand_Assign_Array_Loop_Or_Bitfield,
Expand_Assign_Array_Bitfield_Fast): Generate calls to
Fast_Copy_Bitfield when appropriate.
* sem_util.adb, sem_util.ads (Get_Index_Bounds): Two new
functions for getting the index bounds. These are more
convenient than the procedure of the same name, because they can
be used to initialize constants.

diff --git a/gcc/ada/exp_ch5.adb b/gcc/ada/exp_ch5.adb
--- a/gcc/ada/exp_ch5.adb
+++ b/gcc/ada/exp_ch5.adb
@@ -64,6 +64,7 @@ with Snames; use Snames;
 with Stand;  use Stand;
 with Stringt; use Stringt;
 with Tbuild; use Tbuild;
+with Ttypes; use Ttypes;
 with Uintp;  use Uintp;
 with Validsw; use Validsw;
 
@@ -127,8 +128,16 @@ package body Exp_Ch5 is
   R_Type : Entity_Id;
   Rev: Boolean) return Node_Id;
--  Alternative to Expand_Assign_Array_Loop for packed bitfields. Generates
-   --  a call to the System.Bitfields.Copy_Bitfield, which is more efficient
-   --  than copying component-by-component.
+   --  a call to System.Bitfields.Copy_Bitfield, which is more efficient than
+   --  copying component-by-component.
+
+   function Expand_Assign_Array_Bitfield_Fast
+ (N  : Node_Id;
+  Larray : Entity_Id;
+  Rarray : Entity_Id) return Node_Id;
+   --  Alternative to Expand_Assign_Array_Bitfield. Generates a call to
+   --  System.Bitfields.Fast_Copy_Bitfield, which is more efficient than
+   --  Copy_Bitfield, but only works in restricted situations.
 
function Expand_Assign_Array_Loop_Or_Bitfield
  (N  : Node_Id;
@@ -138,8 +147,8 @@ package body Exp_Ch5 is
   R_Type : Entity_Id;
   Ndim   : Pos;
   Rev: Boolean) return Node_Id;
-   --  Calls either Expand_Assign_Array_Loop or Expand_Assign_Array_Bitfield as
-   --  appropriate.
+   --  Calls either Expand_Assign_Array_Loop, Expand_Assign_Array_Bitfield, or
+   --  Expand_Assign_Array_Bitfield_Fast as appropriate.
 
procedure Expand_Assign_Record (N : Node_Id);
--  N is an assignment of an untagged record value. This routine handles
@@ -1440,6 +1449,84 @@ package body Exp_Ch5 is
   R_Addr, R_Bit, L_Addr, L_Bit, Size));
end Expand_Assign_Array_Bitfield;
 
+   ---
+   -- Expand_Assign_Array_Bitfield_Fast --
+   ---
+
+   function Expand_Assign_Array_Bitfield_Fast
+ (N  : Node_Id;
+  Larray : Entity_Id;
+  Rarray : Entity_Id) return Node_Id
+   is
+  pragma Assert (not Change_Of_Representation (N));
+  --  This won't work, for example, to copy a packed array to an unpacked
+  --  array.
+
+  --  For L (A .. B) := R (C .. D), we generate:
+  --
+  -- L := Fast_Copy_Bitfield (R, , L, ,
+  --  L (A .. B)'Length * L'Component_Size);
+  --
+  --  with L and R suitably uncheckedly converted to/from Val_2.
+  --  The offsets are from the start of L and R.
+
+  Loc  : constant Source_Ptr := Sloc (N);
+
+  L_Val : constant Node_Id :=
+Unchecked_Convert_To (RTE (RE_Val_2), Larray);
+  R_Val : constant Node_Id :=
+Unchecked_Convert_To (RTE (RE_Val_2), Rarray);
+  --  Converted values of left- and right-hand sides
+
+  C_Size : constant Uint := Component_Size (Etype (Larray));
+  pragma Assert (C_Size >= 1);
+  pragma Assert (C_Size = Component_Size (Etype (Rarray)));
+
+  Larray_Bounds : constant Range_Values :=
+Get_Index_Bounds (First_Index (Etype (Larray)));
+  L_Bounds : constant Range_Values :=
+(if Nkind (Name (N)) = N_Slice
+ then Get_Index_Bounds (Discrete_Range (Name (N)))
+ else Larray_Bounds);
+  --  If the left-hand side is A (L..H), Larray_Bounds is A'Range, and
+  --  L_Bounds is L..H. If it's not a slice, we treat it like a slice
+  --  starting at A'First.
+
+  L_Bit : constant Node_Id :=
+Make_Integer_Literal (Loc, (L_Bounds.L - Larray_Bounds.L) * C_Size);
+
+  Rarray_Bounds : constant Range_Values :=
+Get_Index_Bounds (First_Index (Etype (Rarray)));
+  R_Bounds : constant Range_Values :=
+(if Nkind (Expression (N)) = N_Slice
+ then Get_Index_Bounds 

[Ada] Implementation of AI12-205: defaults for formal types

2021-06-21 Thread Pierre-Marie de Rodat
This patch implements the new language feature described in AI12-0205
that introduces defaults for generic formal types. These defaults are
subtype marks that must denote a type that is usable as an actual in any
subsequent instantiation of the enclosing generic unit. The legality
rules are similar but not identical to the ones used for actuals in an
instantiation.
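A minimal sketch of the new syntax ("or use subtype_mark" after a formal type definition), assuming a compiler with AI12-0205 support in Ada 2022 mode (for example GNAT with -gnat2022). The generic Twice and its instances are hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Formal_Default_Demo is

   generic
      --  AI12-0205: Integer is used when no actual is supplied
      type Item is range <> or use Integer;
   function Twice (X : Item) return Item;

   function Twice (X : Item) return Item is
   begin
      return X + X;
   end Twice;

   type Small is range 0 .. 100;

   function Twice_Default is new Twice;          --  Item => Integer
   function Twice_Small   is new Twice (Small);  --  explicit actual

begin
   Put_Line (Integer'Image (Twice_Default (21)));
   Put_Line (Small'Image (Twice_Small (7)));
end Formal_Default_Demo;
```

The default must itself be a legal actual for the formal, which is why a signed integer type is used for a "range <>" formal here.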

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gen_il-fields.ads: Add Default_Subtype_Mark to enumeration
type for fields.
* gen_il-gen-gen_nodes.adb: Add call to create new field for
Formal_Type_Declaration node.
* par-ch12.adb (P_Formal_Type_Declaration): in Ada_2022 mode,
recognize new syntax for default: "or use subtype_mark".
(P_Formal_Type_Definition): Ditto for the case of a formal
incomplete type.
* sinfo.ads: Add field Default_Subtype_Mark to
N_Formal_Type_Declaration.
* sem_ch12.adb (Validate_Formal_Type_Default): New procedure, to
apply legality rules to default subtypes in formal type
declarations. Some legality rules apply to all defaults, such as
the requirement that the default for a formal type that depends
on previous formal entities must itself be a previously declared
formal of the same unit. Other checks are kind-specific.
(Analyze_Associations): Use specified default if there is no
actual provided for a formal type in an instance.
(Analyze_Formal_Type_Declaration): Call
Validate_Formal_Type_Default when default subtype is present.

diff --git a/gcc/ada/gen_il-fields.ads b/gcc/ada/gen_il-fields.ads
--- a/gcc/ada/gen_il-fields.ads
+++ b/gcc/ada/gen_il-fields.ads
@@ -136,6 +136,7 @@ package Gen_IL.Fields is
   Default_Expression,
   Default_Storage_Pool,
   Default_Name,
+  Default_Subtype_Mark,
   Defining_Identifier,
   Defining_Unit_Name,
   Delay_Alternative,


diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -610,7 +610,8 @@ begin -- Gen_IL.Gen.Gen_Nodes
(Sy (Defining_Identifier, Node_Id),
 Sy (Formal_Type_Definition, Node_Id),
 Sy (Discriminant_Specifications, List_Id, Default_No_List),
-Sy (Unknown_Discriminants_Present, Flag)));
+Sy (Unknown_Discriminants_Present, Flag),
+Sy (Default_Subtype_Mark, Node_Id)));
 
Cc (N_Full_Type_Declaration, N_Declaration,
(Sy (Defining_Identifier, Node_Id),


diff --git a/gcc/ada/par-ch12.adb b/gcc/ada/par-ch12.adb
--- a/gcc/ada/par-ch12.adb
+++ b/gcc/ada/par-ch12.adb
@@ -559,6 +559,20 @@ package body Ch12 is
 
   if Def_Node /= Error then
  Set_Formal_Type_Definition (Decl_Node, Def_Node);
+
+ if Token = Tok_Or then
+Error_Msg_Ada_2022_Feature
+  ("default for formal type", Sloc (Decl_Node));
+Scan;   --  Past OR
+
+if Token /= Tok_Use then
+   Error_Msg_SC ("missing USE for default subtype");
+else
+   Scan;   -- Past USE
+   Set_Default_Subtype_Mark (Decl_Node, P_Name);
+end if;
+ end if;
+
  P_Aspect_Specifications (Decl_Node);
 
   else
@@ -727,11 +741,18 @@ package body Ch12 is
return Error;
 end if;
 
+ when Tok_Or =>
+--  Ada_2022: incomplete type with default
+return
+ New_Node (N_Formal_Incomplete_Type_Definition, Token_Ptr);
+
  when Tok_Private =>
 return P_Formal_Private_Type_Definition;
 
  when Tok_Tagged =>
-if Next_Token_Is (Tok_Semicolon) then
+if Next_Token_Is (Tok_Semicolon)
+  or else Next_Token_Is (Tok_Or)
+then
Typedef_Node :=
  New_Node (N_Formal_Incomplete_Type_Definition, Token_Ptr);
Set_Tagged_Present (Typedef_Node);


diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -888,6 +888,17 @@ package body Sem_Ch12 is
--  Verify that an attribute that appears as the default for a formal
--  subprogram is a function or procedure with the correct profile.
 
+   procedure Validate_Formal_Type_Default (Decl : Node_Id);
+   --  Ada_2022 AI12-205: if a default subtype_mark is present, verify
+   --  that it is the name of a type in the same class as the formal.
+   --  The treatment parallels what is done in Instantiate_Type but differs
+   --  in a few ways so that this machinery cannot be reused as is: on one
+   --  hand there are no visibility issues for a default, because it is
+   --  analyzed in the same context as the formal type definition; on the
+   --  other hand the check needs to take into acount the use of a previous
+   --  formal type in the current formal type definition 

[Ada] Adjust new fast bit-field copy path to big-endian platforms

2021-06-21 Thread Pierre-Marie de Rodat
The issue is that the unchecked conversion of small bit-packed arrays
to modular types is not done in memory order, whereas this order is
expected by the System.Bitfield_Utils unit.
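The fix amounts to a pair of shifts around the conversion. A small model using Interfaces.Unsigned_64, assuming a hypothetical 12-bit packed array in a 64-bit container (standing in for Standard_Long_Long_Integer_Size): shifting left by 64 - 12 moves the zero padding to the low end, so the significant bits come first in big-endian memory order; shifting right by the same amount undoes it. This only models the arithmetic, not the compiler's expansion.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Padding_Demo is
   --  Hypothetical sizes for the model
   Container_Size : constant := 64;
   Array_Size     : constant := 12;
   Shift          : constant Natural := Container_Size - Array_Size;

   --  The value conversion right-justifies the 12 significant bits
   Converted : constant Unsigned_64 := 16#ABC#;

   --  Move the padding to the low end (cf. Make_Op_Shift_Left in the patch)
   To_Memory_Order : constant Unsigned_64 := Shift_Left (Converted, Shift);

   --  The inverse shift recovers the right-justified value
   Back : constant Unsigned_64 := Shift_Right (To_Memory_Order, Shift);
begin
   Put_Line (Unsigned_64'Image (To_Memory_Order));
   Put_Line (Boolean'Image (Back = Converted));
end Padding_Demo;
```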

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch5.adb (Expand_Assign_Array_Bitfield_Fast): If big-endian
ordering is in effect for the operands and they are small,
adjust the unchecked conversions done around them.

diff --git a/gcc/ada/exp_ch5.adb b/gcc/ada/exp_ch5.adb
--- a/gcc/ada/exp_ch5.adb
+++ b/gcc/ada/exp_ch5.adb
@@ -1472,18 +1472,30 @@ package body Exp_Ch5 is
 
   Loc  : constant Source_Ptr := Sloc (N);
 
+  L_Typ : constant Entity_Id := Etype (Larray);
+  R_Typ : constant Entity_Id := Etype (Rarray);
+  --  The original type of the arrays
+
   L_Val : constant Node_Id :=
 Unchecked_Convert_To (RTE (RE_Val_2), Larray);
   R_Val : constant Node_Id :=
 Unchecked_Convert_To (RTE (RE_Val_2), Rarray);
   --  Converted values of left- and right-hand sides
 
-  C_Size : constant Uint := Component_Size (Etype (Larray));
+  L_Small : constant Boolean :=
+Known_Static_RM_Size (L_Typ)
+  and then RM_Size (L_Typ) < Standard_Long_Long_Integer_Size;
+  R_Small : constant Boolean :=
+Known_Static_RM_Size (R_Typ)
+  and then RM_Size (R_Typ) < Standard_Long_Long_Integer_Size;
+  --  Whether the above unchecked conversions need to be padded with zeros
+
+  C_Size : constant Uint := Component_Size (L_Typ);
   pragma Assert (C_Size >= 1);
-  pragma Assert (C_Size = Component_Size (Etype (Rarray)));
+  pragma Assert (C_Size = Component_Size (R_Typ));
 
   Larray_Bounds : constant Range_Values :=
-Get_Index_Bounds (First_Index (Etype (Larray)));
+Get_Index_Bounds (First_Index (L_Typ));
   L_Bounds : constant Range_Values :=
 (if Nkind (Name (N)) = N_Slice
  then Get_Index_Bounds (Discrete_Range (Name (N)))
@@ -1496,7 +1508,7 @@ package body Exp_Ch5 is
 Make_Integer_Literal (Loc, (L_Bounds.L - Larray_Bounds.L) * C_Size);
 
   Rarray_Bounds : constant Range_Values :=
-Get_Index_Bounds (First_Index (Etype (Rarray)));
+Get_Index_Bounds (First_Index (R_Typ));
   R_Bounds : constant Range_Values :=
 (if Nkind (Expression (N)) = N_Slice
  then Get_Index_Bounds (Discrete_Range (Expression (N)))
@@ -1516,15 +1528,56 @@ package body Exp_Ch5 is
   Duplicate_Subexpr (Larray, True),
 Attribute_Name => Name_Component_Size));
 
-  Call : constant Node_Id := Make_Function_Call (Loc,
+  L_Arg, R_Arg, Call : Node_Id;
+
+   begin
+  --  The semantics of unchecked conversion between bit-packed arrays that
+  --  are implemented as modular types and modular types is precisely that
+  --  of unchecked conversion between modular types. Therefore, if it needs
+  --  to be padded with zeros, the padding must be moved to the correct end
+  --  for memory order because System.Bitfield_Utils works in memory order.
+
+  if L_Small
+and then (Bytes_Big_Endian xor Reverse_Storage_Order (L_Typ))
+  then
+ L_Arg := Make_Op_Shift_Left (Loc,
+   Left_Opnd  => L_Val,
+   Right_Opnd => Make_Integer_Literal (Loc,
+   Standard_Long_Long_Integer_Size - RM_Size (L_Typ)));
+  else
+ L_Arg := L_Val;
+  end if;
+
+  if R_Small
+and then (Bytes_Big_Endian xor Reverse_Storage_Order (R_Typ))
+  then
+ R_Arg := Make_Op_Shift_Left (Loc,
+   Left_Opnd  => R_Val,
+   Right_Opnd => Make_Integer_Literal (Loc,
+   Standard_Long_Long_Integer_Size - RM_Size (R_Typ)));
+  else
+ R_Arg := R_Val;
+  end if;
+
+  Call := Make_Function_Call (Loc,
 Name => New_Occurrence_Of (RTE (RE_Fast_Copy_Bitfield), Loc),
 Parameter_Associations => New_List (
-  R_Val, R_Bit, L_Val, L_Bit, Size));
+  R_Arg, R_Bit, L_Arg, L_Bit, Size));
+
+  --  Conversely, the final unchecked conversion must take significant bits
+
+  if L_Small
+and then (Bytes_Big_Endian xor Reverse_Storage_Order (L_Typ))
+  then
+ Call := Make_Op_Shift_Right (Loc,
+   Left_Opnd  => Call,
+   Right_Opnd => Make_Integer_Literal (Loc,
+   Standard_Long_Long_Integer_Size - RM_Size (L_Typ)));
+  end if;
 
-   begin
   return Make_Assignment_Statement (Loc,
 Name => Duplicate_Subexpr (Larray, True),
-Expression => Unchecked_Convert_To (Etype (Larray), Call));
+Expression => Unchecked_Convert_To (L_Typ, Call));
end Expand_Assign_Array_Bitfield_Fast;
 
--




[Ada] Add Return_Statement field

2021-06-21 Thread Pierre-Marie de Rodat
Used by GNAT LLVM to handle E_Constant and E_Variable with
Is_Return_Object.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* einfo.ads (Return_Statement): Add documentation.
* exp_ch6.adb (Expand_N_Extended_Return_Statement): Set it.
* gen_il-fields.ads: Add it.
* gen_il-gen-gen_entities.adb: Add it.

diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads
--- a/gcc/ada/einfo.ads
+++ b/gcc/ada/einfo.ads
@@ -4206,6 +4206,11 @@ package Einfo is
 --   entities (for convenience in setting it), but is only tested
 --   for the function case.
 
+--Return_Statement
+--   Defined in E_Variable. Set when Is_Return_Object is set, in which
+--   case it points to the N_Simple_Return_Statement made from the
+--   extended return statement.
+
 --Returns_By_Ref
 --   Defined in subprogram type entities and functions. Set if a function
 --   (or an access-to-function type) returns a result by reference, either


diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -6033,6 +6033,7 @@ package body Exp_Ch6 is
   --  Set the flag to prevent infinite recursion
 
   Set_Comes_From_Extended_Return_Statement (Return_Stmt);
+  Set_Return_Statement (Ret_Obj_Id, Return_Stmt);
 
   Rewrite (N, Result);
 


diff --git a/gcc/ada/gen_il-fields.ads b/gcc/ada/gen_il-fields.ads
--- a/gcc/ada/gen_il-fields.ads
+++ b/gcc/ada/gen_il-fields.ads
@@ -874,6 +874,7 @@ package Gen_IL.Fields is
   Requires_Overriding,
   Return_Applies_To,
   Return_Present,
+  Return_Statement,
   Returns_By_Ref,
   Reverse_Bit_Order,
   Reverse_Storage_Order,


diff --git a/gcc/ada/gen_il-gen-gen_entities.adb b/gcc/ada/gen_il-gen-gen_entities.adb
--- a/gcc/ada/gen_il-gen-gen_entities.adb
+++ b/gcc/ada/gen_il-gen-gen_entities.adb
@@ -350,6 +350,7 @@ begin -- Gen_IL.Gen.Gen_Entities
 Sm (Prival_Link, Node_Id),
 Sm (Related_Expression, Node_Id),
 Sm (Related_Type, Node_Id),
+Sm (Return_Statement, Node_Id),
 Sm (Size_Check_Code, Node_Id),
 Sm (SPARK_Pragma, Node_Id),
 Sm (SPARK_Pragma_Inherited, Flag),
@@ -421,6 +422,7 @@ begin -- Gen_IL.Gen.Gen_Entities
 Sm (Prival_Link, Node_Id),
 Sm (Related_Expression, Node_Id),
 Sm (Related_Type, Node_Id),
+Sm (Return_Statement, Node_Id),
 Sm (Shared_Var_Procs_Instance, Node_Id),
 Sm (Size_Check_Code, Node_Id),
 Sm (SPARK_Pragma, Node_Id),




[Ada] Implement 'Valid_Value attribute

2021-06-21 Thread Pierre-Marie de Rodat
Implement the 'Valid_Value attribute for enumeration types. 'Valid_Value
is not currently supported for types in Standard, because they do not
have image/value tables. There are also no 'Valid_Wide_Image or
'Valid_Wide_Wide_Image attributes yet.
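A minimal usage sketch, assuming a GNAT release that includes this attribute: T'Valid_Value (S) is True exactly when T'Value (S) would not raise Constraint_Error, so input can be validated without an exception handler. Color and Parse are hypothetical names.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Valid_Value_Demo is
   type Color is (Red, Green, Blue);  --  user-defined, not in Standard

   --  Test with 'Valid_Value first, so 'Value is only called on good input
   procedure Parse (S : String) is
   begin
      if Color'Valid_Value (S) then
         Put_Line ("parsed: " & Color'Image (Color'Value (S)));
      else
         Put_Line ("rejected: """ & S & """");
      end if;
   end Parse;
begin
   Parse ("Green");
   Parse ("purple");
   Parse ("");  --  null string: rejected rather than raising
end Valid_Value_Demo;
```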

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-valuen.ads, libgnat/s-valuen.adb
(Value_Enumeration_Pos): New function to compute the 'Pos of the
enumeration literal for a given String.  Return a special value
instead of raising an exception on invalid input. Called by both
Valid_Enumeration_Image and Value_Enumeration.
(Valid_Enumeration_Image): Return a Boolean indicating whether
the String is a valid Image for the given enumeration type.
(Value_Enumeration): Implement in terms of
Value_Enumeration_Pos.
* libgnat/s-vaenu8.ads, libgnat/s-vaen16.ads,
libgnat/s-vaen32.ads: Rename Valid_Enumeration_Image from the
instances.
* libgnat/s-valuti.ads: Correct documentation (it was not true
for the null string).
* libgnat/s-valuti.adb (Normalize_String): Do not raise
Constraint_Error for the null string, nor strings containing
nothing but blanks, so that Valid_Enumeration_Image can return
False in these cases, rather than raising an exception.
* rtsfind.ads (RE_Value_Enumeration_8, RE_Value_Enumeration_16,
RE_Value_Enumeration_32): New functions.
(RTE_Available): Improve comment (E doesn't have to be a
subprogram, although that's the usual case).
* sem_attr.adb (Valid_Value): Semantic analysis for new
attribute.
* exp_attr.adb: Call Expand_Valid_Value_Attribute for new
attribute.
* exp_imgv.ads, exp_imgv.adb (Expand_Valid_Value_Attribute): New
procedure to expand Valid_Value into a call to
Valid_Enumeration_Image_NN.
(Expand_Value_Attribute): Misc code cleanups.  Remove two ???
mark comments. RTE_Available won't work here.  For one thing,
RTE_Available (X) shouldn't be called until the compiler has
decided to make use of X (see comments on RTE_Available), and in
this case we're trying to AVOID calling something.
* snames.ads-tmpl: New attribute name.
* doc/gnat_rm/implementation_defined_attributes.rst: Document
new attribute.
* gnat_rm.texi: Regenerate.

patch.diff.gz
Description: application/gzip


[Ada] Make -gnatU and -gnatw.d the default

2021-06-21 Thread Pierre-Marie de Rodat
-gnatU prepends `error:` to error messages; enabling it by default makes
error messages more consistent with warnings. -gnatw.d tags messages
with the flag that caused them.

As users might need to switch back to the previous behavior, the
`-gnatd_U` flag is introduced to do exactly that.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* debug.adb: Document -gnatd_U as taken.
* err_vars.ads (Warning_Doc_Switch): Set to True.
* errout.ads (Errout): Update documentation.
* gnat1drv.adb (Adjust_Global_Switches): React to -gnatd_U.
* hostparm.ads (Tag_Errors): Set to True.
* opt.ads (Unique_Error_Tag): Document -gnatd_U.

diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb
--- a/gcc/ada/debug.adb
+++ b/gcc/ada/debug.adb
@@ -186,7 +186,7 @@ package body Debug is
--  d_R
--  d_S
--  d_T  Output trace information on invocation path recording
-   --  d_U
+   --  d_U  Disable prepending messages with "error:".
--  d_V  Enable verifications on the expanded tree
--  d_W
--  d_X
@@ -1017,6 +1017,9 @@ package body Debug is
--  d_T  The compiler outputs trace information to standard output whenever
--   an invocation path is recorded.
 
+   --  d_U  Disable prepending 'error:' to error messages. This used to be the
+   --   default and can be seen as the opposite of -gnatU.
+
--  d_V  Enable verification of the expanded code before calling the backend
--   and generate error messages on each inconsistency found.
 


diff --git a/gcc/ada/err_vars.ads b/gcc/ada/err_vars.ads
--- a/gcc/ada/err_vars.ads
+++ b/gcc/ada/err_vars.ads
@@ -89,7 +89,7 @@ package Err_Vars is
--  Source_Reference line, then this is initialized to No_Source_File,
--  to force an initial reference to the real source file name.
 
-   Warning_Doc_Switch : Boolean := False;
+   Warning_Doc_Switch : Boolean := True;
--  If this is set True, then the ??/?x?/?x? sequences in error messages
--  are active (see errout.ads for details). If this switch is False, then
--  these sequences are ignored (i.e. simply equivalent to a single ?). The


diff --git a/gcc/ada/errout.ads b/gcc/ada/errout.ads
--- a/gcc/ada/errout.ads
+++ b/gcc/ada/errout.ads
@@ -519,7 +519,7 @@ package Errout is
--  The prefixes error and warning are supplied automatically (depending
--  on the use of the ? insertion character), and the call to the error
--  message routine supplies the text. The "error: " prefix is omitted
-   --  in brief error message formats.
+   --  if -gnatd_U is among the options given to gnat.
 
--  Reserved Ada keywords in the message are in the default keyword case
--  (determined from the given source program), surrounded by quotation


diff --git a/gcc/ada/gnat1drv.adb b/gcc/ada/gnat1drv.adb
--- a/gcc/ada/gnat1drv.adb
+++ b/gcc/ada/gnat1drv.adb
@@ -153,6 +153,12 @@ procedure Gnat1drv is
 
   Map_Pragma_Name (From => Name_Gnat_Annotate, To => Name_Annotate);
 
+  --  -gnatd_U disables prepending error messages with "error:"
+
+  if Debug_Flag_Underscore_UU then
+ Unique_Error_Tag := False;
+  end if;
+
   --  -gnatd.M enables Relaxed_RM_Semantics
 
   if Debug_Flag_Dot_MM then


diff --git a/gcc/ada/hostparm.ads b/gcc/ada/hostparm.ads
--- a/gcc/ada/hostparm.ads
+++ b/gcc/ada/hostparm.ads
@@ -56,9 +56,10 @@ package Hostparm is
--  of file names in the library, must be at least Max_Line_Length, but
--  can be larger.
 
-   Tag_Errors : constant Boolean := False;
+   Tag_Errors : constant Boolean := True;
--  If set to true, then brief form error messages will be prefaced by
-   --  the string "error:". Used as default for Opt.Unique_Error_Tag.
+   --  the string "error:". Used as default for Opt.Unique_Error_Tag. Disabled
+   --  by gnatd_U.
 
Exclude_Missing_Objects : constant Boolean := True;
--  If set to true, gnatbind will exclude from consideration all


diff --git a/gcc/ada/opt.ads b/gcc/ada/opt.ads
--- a/gcc/ada/opt.ads
+++ b/gcc/ada/opt.ads
@@ -1651,7 +1651,8 @@ package Opt is
Unique_Error_Tag : Boolean := Tag_Errors;
--  GNAT
--  Indicates if error messages are to be prefixed by the string error:
-   --  Initialized from Tag_Errors, can be forced on with the -gnatU switch.
+   --  Initialized from Tag_Errors, can be forced on with the -gnatU switch and
+   --  disabled with -gnatd_U.
 
Unnest_Subprogram_Mode : Boolean := False;
--  If true, activates the circuitry for unnesting subprograms (see the spec




[Ada] Zero-size slices

2021-06-21 Thread Pierre-Marie de Rodat
Fix a bug in slices where a zero-sized slice causes an invalid read,
as detected by valgrind.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-bituti.ads (Small_Size): Do not include 0 in this
type.
* libgnat/s-bituti.adb (Copy_Bitfield): Do nothing for 0-bit
bitfields.

diff --git a/gcc/ada/libgnat/s-bituti.adb b/gcc/ada/libgnat/s-bituti.adb
--- a/gcc/ada/libgnat/s-bituti.adb
+++ b/gcc/ada/libgnat/s-bituti.adb
@@ -402,11 +402,22 @@ package body System.Bitfield_Utils is
  pragma Assert (Al_Src_Address mod Val'Alignment = 0);
  pragma Assert (Al_Dest_Address mod Val'Alignment = 0);
   begin
+ --  Optimized small case
+
  if Size in Small_Size then
 Copy_Small_Bitfield
   (Al_Src_Address, Al_Src_Offset,
Al_Dest_Address, Al_Dest_Offset,
Size);
+
+ --  Do nothing for zero size. This is necessary to avoid doing invalid
+ --  reads, which are detected by valgrind.
+
+ elsif Size = 0 then
+null;
+
+ --  Large case
+
  else
 Copy_Large_Bitfield
   (Al_Src_Address, Al_Src_Offset,


diff --git a/gcc/ada/libgnat/s-bituti.ads b/gcc/ada/libgnat/s-bituti.ads
--- a/gcc/ada/libgnat/s-bituti.ads
+++ b/gcc/ada/libgnat/s-bituti.ads
@@ -98,9 +98,9 @@ package System.Bitfield_Utils is
   pragma Assert (Val_Array'Component_Size = Val'Size);
 
   subtype Bit_Size is Natural; -- Size in bits of a bit field
-  subtype Small_Size is Bit_Size range 0 .. Val'Size;
+  subtype Small_Size is Bit_Size range 1 .. Val'Size;
   --  Size of a small one
-  subtype Bit_Offset is Small_Size range 0 .. Val'Size - 1;
+  subtype Bit_Offset is Small_Size'Base range 0 .. Val'Size - 1;
   --  Starting offset
   subtype Bit_Offset_In_Byte is Bit_Offset range 0 .. Storage_Unit - 1;
 

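The dispatch in Copy_Bitfield can be illustrated with a minimal C sketch. Everything here is an assumption and not GNAT's Ada code: the names, a 32-bit Val word, little-endian word layout, and a bit-by-bit large case. The key point matches the patch. The small case loads whole words from both buffers, so a zero-size request must be caught before it gets there. Otherwise a zero-size copy from an empty or one-past-the-end buffer would read invalid memory.

```c
#include <stdint.h>

#define VAL_BITS 32u /* assumed width of the Val word */

/* Little-endian load/store of one Val word (byte order is an assumption). */
static uint32_t load_word(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_word(uint8_t *p, uint32_t w)
{
    p[0] = (uint8_t)w;         p[1] = (uint8_t)(w >> 8);
    p[2] = (uint8_t)(w >> 16); p[3] = (uint8_t)(w >> 24);
}

/* Small case: always reads a whole word from SRC and DST. */
static void copy_small_bitfield(const uint8_t *src, unsigned src_off,
                                uint8_t *dst, unsigned dst_off, unsigned size)
{
    uint32_t mask = (uint32_t)((UINT64_C(1) << size) - 1); /* size in 1..32 */
    uint32_t bits = (load_word(src) >> src_off) & mask;
    uint32_t dw = load_word(dst);
    store_word(dst, (dw & ~(mask << dst_off)) | (bits << dst_off));
}

/* Large case (sketch): copy one bit at a time, same bit numbering. */
static void copy_large_bitfield(const uint8_t *src, unsigned src_off,
                                uint8_t *dst, unsigned dst_off, unsigned size)
{
    for (unsigned i = 0; i < size; i++) {
        unsigned s = src_off + i, d = dst_off + i;
        unsigned bit = (src[s / 8] >> (s % 8)) & 1u;
        dst[d / 8] = (uint8_t)((dst[d / 8] & ~(1u << (d % 8))) | (bit << (d % 8)));
    }
}

static void copy_bitfield(const uint8_t *src, unsigned src_off,
                          uint8_t *dst, unsigned dst_off, unsigned size)
{
    if (size >= 1 && src_off + size <= VAL_BITS && dst_off + size <= VAL_BITS)
        copy_small_bitfield(src, src_off, dst, dst_off, size); /* optimized small case */
    else if (size == 0)
        ;                                                      /* zero size: read nothing */
    else
        copy_large_bitfield(src, src_off, dst, dst_off, size); /* general case */
}
```

The sketch restricts the small case to fields that fit in one word. This keeps it short; it does not claim anything about how Copy_Small_Bitfield handles fields that straddle words.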



[Ada] Fix invalid JSON real numbers generated with -gnatRj

2021-06-21 Thread Pierre-Marie de Rodat
The -gnatR output contains information about fixed-point types declared
in the program and it comprises real numbers, which are displayed using
a custom format specific to the compiler, which is not always compatible
with the JSON data interchange format.
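JSON is strict about number syntax (RFC 8259, section 6): an optional minus sign, then an integer part with no leading zeros, an optional fraction with at least one digit, and an optional exponent. A small C checker makes the constraint concrete. This is an illustrative sketch; UR_Write's exact output is not shown in this message, so the rejected inputs below are generic non-JSON numerals, not claims about what the compiler printed.

```c
#include <stdbool.h>

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

/* True if S is exactly one JSON number per RFC 8259, section 6:
   [ "-" ] ( "0" / [1-9] *DIGIT ) [ "." 1*DIGIT ] [ ("e"/"E") [ "+"/"-" ] 1*DIGIT ] */
static bool is_json_number(const char *s)
{
    if (*s == '-')
        s++;
    if (*s == '0') {
        s++;                       /* a leading zero must stand alone */
    } else if (*s >= '1' && *s <= '9') {
        while (is_digit(*s))
            s++;
    } else {
        return false;              /* missing integer part */
    }
    if (*s == '.') {               /* optional fraction needs >= 1 digit */
        s++;
        if (!is_digit(*s))
            return false;
        while (is_digit(*s))
            s++;
    }
    if (*s == 'e' || *s == 'E') {  /* optional exponent needs >= 1 digit */
        s++;
        if (*s == '+' || *s == '-')
            s++;
        if (!is_digit(*s))
            return false;
        while (is_digit(*s))
            s++;
    }
    return *s == '\0';             /* nothing may follow the number */
}
```

Ada-style based literals such as 16#FF#, rationals such as 1/3, and forms like 5. or .5 are all rejected.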

The change also fixes an off-by-one bug in Decimal_Exponent_Lo and also
tweaks Decimal_Exponent_Hi for the sake of consistency.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* urealp.ads (UR_Write_To_JSON): Declare.
* urealp.adb (Decimal_Exponent_Hi): Treat numbers in base 10
specially and rewrite handling of numbers in other bases.
(Decimal_Exponent_Lo): Likewise.
(Normalize): Minor tweak.
(UR_Write_To_JSON): New wrapper procedure around UR_Write.
* repinfo.adb (List_Type_Info): When the output is to JSON, call
UR_Write_To_JSON instead of UR_Write.

diff --git a/gcc/ada/repinfo.adb b/gcc/ada/repinfo.adb
--- a/gcc/ada/repinfo.adb
+++ b/gcc/ada/repinfo.adb
@@ -2030,7 +2030,7 @@ package body Repinfo is
  if List_Representation_Info_To_JSON then
 Write_Line (",");
 Write_Str ("  ""Small"": ");
-UR_Write (Small_Value (Ent));
+UR_Write_To_JSON (Small_Value (Ent));
  else
 Write_Str ("for ");
 List_Name (Ent);
@@ -2052,9 +2052,9 @@ package body Repinfo is
if List_Representation_Info_To_JSON then
   Write_Line (",");
   Write_Str ("  ""Range"": [ ");
-  UR_Write (Realval (Low_Bound (R)));
+  UR_Write_To_JSON (Realval (Low_Bound (R)));
   Write_Str (", ");
-  UR_Write (Realval (High_Bound (R)));
+  UR_Write_To_JSON (Realval (High_Bound (R)));
   Write_Str (" ]");
else
   Write_Str ("for ");


diff --git a/gcc/ada/urealp.adb b/gcc/ada/urealp.adb
--- a/gcc/ada/urealp.adb
+++ b/gcc/ada/urealp.adb
@@ -174,16 +174,30 @@ package body Urealp is
  return UI_Decimal_Digits_Hi (Val.Num) -
 UI_Decimal_Digits_Lo (Val.Den);
 
-  --  For based numbers, just subtract the decimal exponent from the
-  --  high estimate of the number of digits in the numerator and add
-  --  one to accommodate possible round off errors for non-decimal
-  --  bases. For example:
+  --  For based numbers, get the maximum number of digits in the numerator
+  --  minus one and the either exact or floor value of the decimal exponent
+  --  of the denominator, and subtract. For example:
 
-  -- 1_500_000 / 10**4 = 1.50E-2
+  --  321 / 10**3 = 3.21E-1
+  --  435 / 5**7  = 5.57E-3
 
-  else -- Val.Rbase /= 0
- return UI_Decimal_Digits_Hi (Val.Num) -
-Equivalent_Decimal_Exponent (Val) + 1;
+  else
+ declare
+E : Int;
+
+ begin
+if Val.Rbase = 10 then
+   E := UI_To_Int (Val.Den);
+
+else
+   E := Equivalent_Decimal_Exponent (Val);
+   if E < 0 then
+  E := E - 1;
+   end if;
+end if;
+
+return UI_Decimal_Digits_Hi (Val.Num) - 1 - E;
+ end;
   end if;
end Decimal_Exponent_Hi;
 
@@ -213,16 +227,30 @@ package body Urealp is
  return UI_Decimal_Digits_Lo (Val.Num) -
 UI_Decimal_Digits_Hi (Val.Den) - 1;
 
-  --  For based numbers, just subtract the decimal exponent from the
-  --  low estimate of the number of digits in the numerator and subtract
-  --  one to accommodate possible round off errors for non-decimal
-  --  bases. For example:
+  --  For based numbers, get the minimum number of digits in the numerator
+  --  minus one and the either exact or ceil value of the decimal exponent
+  --  of the denominator, and subtract. For example:
 
-  -- 1_500_000 / 10**4 = 1.50E-2
+  --  321 / 10**3 = 3.21E-1
+  --  435 / 5**7  = 5.57E-3
 
-  else -- Val.Rbase /= 0
- return UI_Decimal_Digits_Lo (Val.Num) -
-Equivalent_Decimal_Exponent (Val) - 1;
+  else
+ declare
+E : Int;
+
+ begin
+if Val.Rbase = 10 then
+   E := UI_To_Int (Val.Den);
+
+else
+   E := Equivalent_Decimal_Exponent (Val);
+   if E > 0 then
+  E := E + 1;
+   end if;
+end if;
+
+return UI_Decimal_Digits_Lo (Val.Num) - 1 - E;
+ end;
   end if;
end Decimal_Exponent_Lo;
 
@@ -374,7 +402,7 @@ package body Urealp is
   Tmp : Uint;
   Num : Uint;
   Den : Uint;
-  M   : constant Uintp.Save_Mark := Uintp.Mark;
+  M   : constant Uintp.Save_Mark := Mark;
 
begin
   --  Start by setting J to the greatest of the absolute values of the
@@ -1486,6 +1514,80 @@ package body Urealp is

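The bounds computed by the new Decimal_Exponent_Hi and Decimal_Exponent_Lo can be justified with a short derivation. The sketch writes the value as $v = N / b^{d}$, where $N$ has $k$ decimal digits and $e = \lfloor \log_{10} v \rfloor$ is the decimal exponent. It assumes Equivalent_Decimal_Exponent approximates $d \log_{10} b$ by truncation toward zero.

```latex
\begin{aligned}
10^{k-1} \le N < 10^{k}
  &\;\Longrightarrow\; k-1-d\log_{10} b \;\le\; \log_{10} v \;<\; k-d\log_{10} b \\
  &\;\Longrightarrow\; k-1-\lceil d\log_{10} b\rceil \;\le\; e \;\le\; k-1-\lfloor d\log_{10} b\rfloor \\[4pt]
\tfrac{321}{10^{3}}:&\quad k=3,\; d\log_{10} b = 3 \;\Rightarrow\; e = -1 \quad (3.21\times 10^{-1}) \\
\tfrac{435}{5^{7}}:&\quad k=3,\; 7\log_{10} 5 \approx 4.89 \;\Rightarrow\; -3 \le e \le -2 \quad (5.57\times 10^{-3},\; e=-3)
\end{aligned}
```

Under that truncation assumption, the code's adjustments line up with the formula. The Hi branch needs a floor, and truncation is already a floor for positive values, hence E - 1 only when E < 0. The Lo branch needs a ceiling, and truncation is already a ceiling for negative values, hence E + 1 only when E > 0. This is conservative when $d\log_{10} b$ happens to be an integer.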
[Ada] Fix unbalanced parens in documentation of Address clauses

2021-06-21 Thread Pierre-Marie de Rodat
Typo in description of handling of Address clauses by GNAT; spotted
while implementing support for overlays in GNATprove.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* doc/gnat_rm/representation_clauses_and_pragmas.rst (Address
Clauses): Fix unbalanced parens.
* gnat_rm.texi: Regenerate.

diff --git a/gcc/ada/doc/gnat_rm/representation_clauses_and_pragmas.rst b/gcc/ada/doc/gnat_rm/representation_clauses_and_pragmas.rst
--- a/gcc/ada/doc/gnat_rm/representation_clauses_and_pragmas.rst
+++ b/gcc/ada/doc/gnat_rm/representation_clauses_and_pragmas.rst
@@ -1738,7 +1738,7 @@ of the use of this pragma. This may cause an overlay to have this
 unintended clobbering effect. The compiler avoids this for scalar
 types, but not for composite objects (where in general the effect
 of ``Initialize_Scalars`` is part of the initialization routine
-for the composite object:
+for the composite object):
 
 ::
 


diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -19946,7 +19946,7 @@ of the use of this pragma. This may cause an overlay to have this
 unintended clobbering effect. The compiler avoids this for scalar
 types, but not for composite objects (where in general the effect
 of @code{Initialize_Scalars} is part of the initialization routine
-for the composite object:
+for the composite object):
 
 @example
 pragma Initialize_Scalars;




[Ada] Fix detection of overlapping actuals with renamings

2021-06-21 Thread Pierre-Marie de Rodat
Simplify detection of renamings within actuals that denote the same
object. This code only needs to consider object renamings and should not
care about renamings of subprograms, packages or exceptions.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Is_Object_Renaming): Rename from Is_Renaming;
simplify; adapt callers.

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -7262,8 +7262,8 @@ package body Sem_Util is
-
 
function Denotes_Same_Object (A1, A2 : Node_Id) return Boolean is
-  function Is_Renaming (N : Node_Id) return Boolean;
-  --  Return true if N names a renaming entity
+  function Is_Object_Renaming (N : Node_Id) return Boolean;
+  --  Return true if N names an object renaming entity
 
   function Is_Valid_Renaming (N : Node_Id) return Boolean;
   --  For renamings, return False if the prefix of any dereference within
@@ -7271,35 +7271,16 @@ package body Sem_Util is
   --  renamed object_name contains references to variables or calls on
   --  nonstatic functions; otherwise return True (RM 6.4.1(6.10/3))
 
-  -
-  -- Is_Renaming --
-  -
+  
+  -- Is_Object_Renaming --
+  
 
-  function Is_Renaming (N : Node_Id) return Boolean is
+  function Is_Object_Renaming (N : Node_Id) return Boolean is
   begin
- if not Is_Entity_Name (N) then
-return False;
- end if;
-
- case Ekind (Entity (N)) is
-when E_Variable | E_Constant =>
-   return Present (Renamed_Object (Entity (N)));
-
-when E_Exception
-   | E_Function
-   | E_Generic_Function
-   | E_Generic_Package
-   | E_Generic_Procedure
-   | E_Operator
-   | E_Package
-   | E_Procedure
-=>
-   return Present (Renamed_Entity (Entity (N)));
-
-when others =>
-   return False;
- end case;
-  end Is_Renaming;
+ return Is_Entity_Name (N)
+   and then Ekind (Entity (N)) in E_Variable | E_Constant
+   and then Present (Renamed_Object (Entity (N)));
+  end Is_Object_Renaming;
 
   ---
   -- Is_Valid_Renaming --
@@ -7307,7 +7288,7 @@ package body Sem_Util is
 
   function Is_Valid_Renaming (N : Node_Id) return Boolean is
   begin
- if Is_Renaming (N)
+ if Is_Object_Renaming (N)
and then not Is_Valid_Renaming (Renamed_Entity (Entity (N)))
  then
 return False;
@@ -7494,12 +7475,12 @@ package body Sem_Util is
   --  no references to variables nor calls on nonstatic functions (RM
   --  6.4.1(6.11/3)).
 
-  elsif Is_Renaming (A1)
+  elsif Is_Object_Renaming (A1)
 and then Is_Valid_Renaming (A1)
   then
  return Denotes_Same_Object (Renamed_Entity (Entity (A1)), A2);
 
-  elsif Is_Renaming (A2)
+  elsif Is_Object_Renaming (A2)
 and then Is_Valid_Renaming (A2)
   then
  return Denotes_Same_Object (A1, Renamed_Entity (Entity (A2)));

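The narrowed check can be pictured with a toy entity model in C. This is a hypothetical sketch: the enum, struct and function names are illustrative, and the Is_Valid_Renaming test from RM 6.4.1 is omitted. Only variables and constants that rename an object are followed. A renamed procedure is never treated as "the same object" as the procedure it renames.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified entity kinds. */
enum entity_kind { K_VARIABLE, K_CONSTANT, K_PROCEDURE, K_PACKAGE };

struct entity {
    enum entity_kind kind;
    const struct entity *renamed; /* renamed entity, or NULL */
};

/* True only for variables/constants that rename another object. */
static bool is_object_renaming(const struct entity *e)
{
    return (e->kind == K_VARIABLE || e->kind == K_CONSTANT)
           && e->renamed != NULL;
}

/* Follow object renamings on either side; ignore all other renamings. */
static bool denotes_same_object(const struct entity *a, const struct entity *b)
{
    if (a == b)
        return true;
    if (is_object_renaming(a))
        return denotes_same_object(a->renamed, b);
    if (is_object_renaming(b))
        return denotes_same_object(a, b->renamed);
    return false;
}
```

Chains of renamings resolve naturally through the recursion, for example a constant renaming a renaming of X still denotes X.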



[Ada] Simplify detection of statically overlapping slices

2021-06-21 Thread Pierre-Marie de Rodat
Statically matching slices in actual parameters are now detected in the
Denotes_Same_Object routine by directly examining the slice indexes and
not by a dubious recursive call.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Denotes_Same_Object): Simplify handling of
slices.

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -7462,9 +7462,12 @@ package body Sem_Util is
--  Check whether bounds are statically identical. There is no
--  attempt to detect partial overlap of slices.
 
-   return Denotes_Same_Object (Lo1, Lo2)
-and then
-  Denotes_Same_Object (Hi1, Hi2);
+   return Is_OK_Static_Expression (Lo1)
+ and then Is_OK_Static_Expression (Lo2)
+ and then Is_OK_Static_Expression (Hi1)
+ and then Is_OK_Static_Expression (Hi2)
+ and then Expr_Value (Lo1) = Expr_Value (Lo2)
+ and then Expr_Value (Hi1) = Expr_Value (Hi2);
 end;
  end if;
 
@@ -7485,20 +7488,6 @@ package body Sem_Util is
   then
  return Denotes_Same_Object (A1, Renamed_Entity (Entity (A2)));
 
-  --  In the recursion, integer literals appear as slice bounds
-
-  elsif Nkind (A1) = N_Integer_Literal
-and then Nkind (A2) = N_Integer_Literal
-  then
- return Intval (A1) = Intval (A2);
-
-  --  Likewise for character literals
-
-  elsif Nkind (A1) = N_Character_Literal
-and then Nkind (A2) = N_Character_Literal
-  then
- return Char_Literal_Value (A1) = Char_Literal_Value (A2);
-
   else
  return False;
   end if;

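The new rule for slice bounds reduces to a small predicate, sketched in C below. struct bound and statically_identical_slices are hypothetical names, and a static value stands in for Expr_Value. Two slices match only when all four bounds are static and the low and high values are equal. Partial overlap is not detected, and non-static bounds never match, even when they are written identically.

```c
#include <stdbool.h>

/* Hypothetical model of a slice bound: a static value, or unknown. */
struct bound {
    bool is_static;
    long value; /* meaningful only when is_static */
};

/* All four bounds must be static, and lows and highs must be equal. */
static bool statically_identical_slices(struct bound lo1, struct bound hi1,
                                        struct bound lo2, struct bound hi2)
{
    return lo1.is_static && lo2.is_static && hi1.is_static && hi2.is_static
           && lo1.value == lo2.value && hi1.value == hi2.value;
}
```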



[Ada] Disable wrong computation of offsets within multidimensional arrays

2021-06-21 Thread Pierre-Marie de Rodat
Routine Indexed_Component_Bit_Offset is meant to return the first bit
position of an array component, but it only examined the first index
expression and necessarily produced wrong results for multidimensional
arrays.

Since this routine is only used for warnings, it is safe to simply
disable this wrong code and behave as if the offsets within a
multidimensional array were not known at compile time.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Indexed_Component_Bit_Offset): Return an unknown
offset for components within multidimensional arrays; remove
redundant parens.

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -14827,6 +14827,12 @@ package body Sem_Util is
  return No_Uint;
   end if;
 
+  --  Do not attempt to compute offsets within multi-dimensional arrays
+
+  if Present (Next_Index (Ind)) then
+ return No_Uint;
+  end if;
+
   if Nkind (Ind) = N_Subtype_Indication then
  Ind := Constraint (Ind);
 
@@ -14843,7 +14849,7 @@ package body Sem_Util is
 
   --  Return the scaled offset
 
-  return Off * (Expr_Value (Exp) - Expr_Value (Low_Bound ((Ind;
+  return Off * (Expr_Value (Exp) - Expr_Value (Low_Bound (Ind)));
end Indexed_Component_Bit_Offset;
 
-

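A C sketch shows why scaling only the first index is wrong for a two-dimensional array. The names are hypothetical, -1 stands in for No_Uint, and row-major layout is assumed for the comparison. For an array with bounds (1..3, 1..4) of 8-bit components, element (2, 3) lies at bit 48, but the first-index-only formula gives 8.

```c
#define UNKNOWN_OFFSET (-1L) /* stands in for No_Uint */

/* Patched behaviour: one-dimensional arrays only, else "unknown". */
static long indexed_component_bit_offset(long comp_bits, int ndims,
                                         long i, long lo)
{
    if (ndims != 1)
        return UNKNOWN_OFFSET;
    return comp_bits * (i - lo);
}

/* What a 2-D answer would really need (row-major layout assumed). */
static long row_major_bit_offset(long comp_bits, long i, long lo_i,
                                 long j, long lo_j, long len_j)
{
    return comp_bits * ((i - lo_i) * len_j + (j - lo_j));
}
```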



[Ada] Compiler crash on sliding of fixed-lower-bound object in Loop_Invariant

2021-06-21 Thread Pierre-Marie de Rodat
When a sliding conversion is expanded during preanalysis of certain
assertion pragmas (such as Loop_Invariant), to convert an object to
an array subtype with a fixed lower bound, a Val attribute created as
part of the upper bound expression of the conversion's subtype is not
expanded later when the pragma argument is reanalyzed as part of the
Check pragma that replaces the assertion pragma. This can lead to a
crash in gigi. This is fixed by not expanding sliding conversions during
preanalysis (when Expander_Active is False).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_util.adb (Expand_Sliding_Conversion): Only perform
expansion when Expander_Active is True. Add a comment about this
and refine existing comment regarding string literals.

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -5343,10 +5343,15 @@ package body Exp_Util is
   All_FLBs_Match : Boolean := True;
 
begin
-  --  Sliding should never be needed for string literals, because they have
-  --  their bounds set according to the applicable index constraint.
-
-  if Nkind (N) /= N_String_Literal then
+  --  This procedure is called during semantic analysis, and we only expand
+  --  a sliding conversion when Expander_Active, to avoid doing it during
+  --  preanalysis (which can lead to problems with the target subtype not
+  --  getting properly expanded during later full analysis). Also, sliding
+  --  should never be needed for string literals, because their bounds are
+  --  determined directly based on the fixed lower bound of Arr_Typ and
+  --  their length.
+
+  if Expander_Active and then Nkind (N) /= N_String_Literal then
  Constraints := New_List;
 
  Act_Subt  := Get_Actual_Subtype (N);

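The sliding itself and the new guard can be modelled in C. This is a hypothetical sketch: struct bounds, slide_to_flb and should_expand_sliding are illustrative names, and plain integer bounds stand in for the Ada index expressions. The real expansion builds a subtype whose upper bound may involve a Val attribute. Sliding keeps the length and moves the range so that it starts at the fixed lower bound.

```c
#include <stdbool.h>

struct bounds { long lo, hi; };

/* Expand only outside preanalysis, and never for string literals. */
static bool should_expand_sliding(bool expander_active, bool is_string_literal)
{
    return expander_active && !is_string_literal;
}

/* Slide LO..HI so that it starts at FLB, keeping the same length. */
static struct bounds slide_to_flb(struct bounds b, long flb)
{
    struct bounds r = { flb, flb + (b.hi - b.lo) };
    return r;
}
```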



[Ada] Skip overlay checks on protected components with expansion disabled

2021-06-21 Thread Pierre-Marie de Rodat
Routine Find_Overlaid_Entity collects entire objects from prefixes of
attribute Address in overlay specifications. The alignment of those
entire objects is then examined in Validate_Address_Clauses.

However, Find_Overlaid_Entity wrongly collects protected components (and
discriminants of concurrent units), even though they do not represent
entire objects and don't have alignment specified, which causes crashes.

This is only a problem when expansion is disabled, e.g. in GNATprove
mode or when switch -gnatc is used. When expansion is enabled,
references to protected components are rewritten into references to
renamings of components of the implicit concurrent type record.

Since this only affects warnings and not legality checks, it is harmless
to ignore such objects in non-standard compilation modes.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Find_Overlaid_Entity): Ignore references to
components and discriminants.

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -9437,6 +9437,18 @@ package body Sem_Util is
 
  if Is_Entity_Name (Expr) then
 Ent := Entity (Expr);
+
+--  If expansion is disabled, then we might see an entity of a
+--  protected component or of a discriminant of a concurrent unit.
+--  Ignore such entities, because further warnings for overlays
+--  expect this routine to only collect entities of entire objects.
+
+if Ekind (Ent) in E_Component | E_Discriminant then
+   pragma Assert
+ (not Expander_Active
+  and then Is_Concurrent_Type (Scope (Ent)));
+   Ent := Empty;
+end if;
 return;
 
  --  Check for components

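The filtering added to Find_Overlaid_Entity amounts to returning "no entity" for components and discriminants. The C sketch below models this, with NULL playing the role of Empty. The enum and function names are hypothetical. In the real code, the assertion that this only happens with expansion disabled, inside a concurrent type, is part of the patch.

```c
#include <stddef.h>

enum ekind { EK_VARIABLE, EK_CONSTANT, EK_COMPONENT, EK_DISCRIMINANT };

struct entity { enum ekind kind; const char *name; };

/* Return the entity whose alignment the overlay warnings should check,
   or NULL (Empty) for components/discriminants, which are not entire objects. */
static const struct entity *overlaid_entity(const struct entity *ent)
{
    if (ent->kind == EK_COMPONENT || ent->kind == EK_DISCRIMINANT)
        return NULL;
    return ent;
}
```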



[Ada] Compile s-mmap on aarch64-linux

2021-06-21 Thread Pierre-Marie de Rodat
The rules for building s-mmap were missing from aarch64-linux for no
apparent reason. The macros for this package include other packages
that were also missing. There was no example where s-mmap was added by
itself, so the standard macros were added.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* Makefile.rtl (aarch64-linux) [LIBGNAT_TARGET_PAIRS]: Add
$(TRASYM_DWARF_UNIX_PAIRS).
[EXTRA_GNAT_RTL_NONTASKING_OBJS]: Add $(TRASYM_DWARF_UNIX_OBJS).

diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2456,6 +2456,7 @@ ifeq ($(strip $(filter-out aarch64% linux%,$(target_cpu) $(target_os))),)
   s-inmaop.adb

Re: [x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 21, 2021 at 12:14:09PM +0200, Richard Biener wrote:
> > But we could do what I've done in
> > r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
> > - have int ix86_last_zero_store_uid;
> > set to INSN_UID of the last store emitted by the peephole2s and
> > then check that INSN_UID against the var.
> 
> Hmm, or have reg_nonzero_bits_for_peephole2 () and maintain
> that somehow ... (conservatively drop it when a SET is seen).

Maintaining something in peephole2 wouldn't be that easy because
of peephole2's rolling window, plus it would need to be done
in the generic code even when nothing but a single target in a specific case
needs that.

The following seems to work.

2021-06-21  Jakub Jelinek  

PR target/11877
* config/i386/i386-protos.h (ix86_last_zero_store_uid): Declare.
* config/i386/i386-expand.c (ix86_last_zero_store_uid): New variable.
* config/i386/i386.c (ix86_expand_prologue): Clear it.
* config/i386/i386.md (peephole2s for 1/2/4 stores of const0_rtx):
Remove "" from match_operand.  Emit new insns using emit_move_insn and
set ix86_last_zero_store_uid to INSN_UID of the last store.
Add peephole2s for 1/2/4 stores of const0_rtx following previous
successful peep2s.

--- gcc/config/i386/i386-protos.h.jj	2021-06-21 11:59:16.769693735 +0200
+++ gcc/config/i386/i386-protos.h   2021-06-21 12:01:47.875691930 +0200
@@ -111,6 +111,7 @@ extern bool ix86_use_lea_for_mov (rtx_in
 extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
 extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode);
 extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]);
+extern int ix86_last_zero_store_uid;
 extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
--- gcc/config/i386/i386-expand.c.jj	2021-06-21 09:39:21.604488082 +0200
+++ gcc/config/i386/i386-expand.c   2021-06-21 12:21:33.017977951 +0200
@@ -1316,6 +1316,9 @@ find_nearest_reg_def (rtx_insn *insn, in
   return false;
 }
 
+/* INSN_UID of the last insn emitted by zero store peephole2s.  */
+int ix86_last_zero_store_uid;
+
 /* Split lea instructions into a sequence of instructions
which are executed on ALU to avoid AGU stalls.
It is assumed that it is allowed to clobber flags register
--- gcc/config/i386/i386.c.jj   2021-06-21 09:39:21.622487840 +0200
+++ gcc/config/i386/i386.c  2021-06-21 12:06:54.049634337 +0200
@@ -8196,6 +8196,7 @@ ix86_expand_prologue (void)
   bool save_stub_call_needed;
   rtx static_chain = NULL_RTX;
 
+  ix86_last_zero_store_uid = 0;
   if (ix86_function_naked (current_function_decl))
 {
   if (flag_stack_usage_info)
--- gcc/config/i386/i386.md.jj  2021-06-21 09:42:04.086303699 +0200
+++ gcc/config/i386/i386.md 2021-06-21 12:14:10.411847549 +0200
@@ -19360,37 +19360,96 @@ (define_peephole2
 ;; When optimizing for size, zeroing memory should use a register.
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 3 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 4 "memory_operand" "") (const_int 0))]
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 3 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 4 "memory_operand") (const_int 0))]
   "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
-  [(set (match_dup 1) (match_dup 0))
-   (set (match_dup 2) (match_dup 0))
-   (set (match_dup 3) (match_dup 0))
-   (set (match_dup 4) (match_dup 0))]
+  [(const_int 0)]
 {
   ix86_expand_clear (operands[0]);
+  emit_move_insn (operands[1], operands[0]);
+  emit_move_insn (operands[2], operands[0]);
+  emit_move_insn (operands[3], operands[0]);
+  ix86_last_zero_store_uid
+= INSN_UID (emit_move_insn (operands[4], operands[0]));
+  DONE;
 })
 
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))]
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))]
   "optimize_insn_for_size_p () && peep2_regno_dead_p (0, FLAGS_REG)"
-  [(set (match_dup 1) (match_dup 0))
-   (set (match_dup 2) (match_dup 0))]
+  [(const_int 0)]
 {
   ix86_expand_clear (operands[0]);
+  emit_move_insn (operands[1], operands[0]);
+  ix86_last_zero_store_uid
+= INSN_UID (emit_move_insn (operands[2], operands[0]));
+  DONE;
 })
 
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))]
+   (set 

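The bookkeeping behind ix86_last_zero_store_uid replaces a backwards walk over earlier insns with an O(1) comparison. In the C model below, integers stand in for INSN_UIDs. The function names are hypothetical wrappers around what the patch does inline. As in the patch, the variable is reset at the start of each function, and the sketch assumes 0 never names a real insn.

```c
#include <stdbool.h>

/* Stand-in for ix86_last_zero_store_uid. */
static int last_zero_store_uid;

/* Reset per function (the patch does this in ix86_expand_prologue). */
static void start_function(void) { last_zero_store_uid = 0; }

/* Called by a zero-store peephole after emitting its last store. */
static void record_zero_store(int uid) { last_zero_store_uid = uid; }

/* O(1) replacement for walking back over earlier insns. */
static bool prev_insn_was_zero_store(int prev_uid)
{
    return prev_uid != 0 && prev_uid == last_zero_store_uid;
}
```

A follow-up peephole can then reuse the already-zeroed scratch register only when the insn just before it is the one recorded.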
Re: [x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-21 Thread Richard Biener via Gcc-patches
On Mon, Jun 21, 2021 at 11:59 AM Jakub Jelinek  wrote:
>
> On Mon, Jun 21, 2021 at 11:19:12AM +0200, Richard Biener wrote:
> > > --- gcc/config/i386/i386.c.jj   2021-06-21 09:39:21.622487840 +0200
> > > +++ gcc/config/i386/i386.c  2021-06-21 10:21:12.389794740 +0200
> > > @@ -15186,6 +15186,33 @@ ix86_lea_for_add_ok (rtx_insn *insn, rtx
> > >return ix86_lea_outperforms (insn, regno0, regno1, regno2, 0, false);
> > >  }
> > >
> > > +/* Return true if insns before FIRST_INSN (which is of the form
> > > +   (set (memory) (zero_operand)) are all also either in the
> > > +   same form, or (set (zero_operand) (const_int 0)).  */
> > > +
> > > +bool
> > > +ix86_zero_stores_peep2_p (rtx_insn *first_insn, rtx zero_operand)
> > > +{
> > > +  rtx_insn *insn = first_insn;
> > > +  for (int count = 0; count < 512; count++)
> >
> > Can't the peephole add a note (reg_equal?) that the
> > SET_SRC of the previously matched store is zero?
>
> I think REG_EQUAL is not valid, the documentation says that it should
> be used on SET of a REG, which is not the case here - we have a MEM.
>
> > That would avoid the need to walk here.
>
> But we could do what I've done in
> r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
> - have int ix86_last_zero_store_uid;
> set to INSN_UID of the last store emitted by the peephole2s and
> then check that INSN_UID against the var.

Hmm, or have reg_nonzero_bits_for_peephole2 () and maintain
that somehow ... (conservatively drop it when a SET is seen).

> Jakub
>


Re: [PATCH] Add vect_recog_popcount_pattern to handle mismatch between the vectorized popcount IFN and scalar popcount builtin.

2021-06-21 Thread Richard Biener via Gcc-patches
On Thu, Jun 17, 2021 at 8:29 AM liuhongt  wrote:
>
> The patch remove those pro- and demotions when backend support direct
> optab.
>
> For i386: it enables vectorization for vpopcntb/vpopcntw and optimizes
> the vpopcntq case.
>
> gcc/ChangeLog:
>
> PR tree-optimization/97770
> * tree-vect-patterns.c (vect_recog_popcount_pattern):
> New.
> (vect_recog_func vect_vect_recog_func_ptrs): Add new pattern.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/97770
> * gcc.target/i386/avx512bitalg-pr97770-1.c: Remove xfail.
> * gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Remove xfail.
> ---
>  .../gcc.target/i386/avx512bitalg-pr97770-1.c  |  27 +++--
>  .../i386/avx512vpopcntdq-pr97770-1.c  |   9 +-
>  gcc/tree-vect-patterns.c  | 110 ++
>  3 files changed, 127 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
> index c83a477045c..d1beec4cdb4 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-pr97770-1.c
> @@ -1,19 +1,18 @@
>  /* PR target/97770 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mavx512bitalg -mavx512vl -mprefer-vector-width=512" } */
> -/* Add xfail since no IFN for QI/HImode popcount */
> -/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
> -/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1 {xfail *-*-*} } } */
> +/* { dg-options "-O2 -march=icelake-server -mprefer-vector-width=512" } */
> +/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*xmm" 1  } } */
> +/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*xmm" 1  } } */
> +/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*ymm" 1  } } */
> +/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*ymm" 1  } } */
> +/* { dg-final { scan-assembler-times "vpopcntb\[ \\t\]+\[^\\n\\r\]*zmm" 1  } } */
> +/* { dg-final { scan-assembler-times "vpopcntw\[ \\t\]+\[^\\n\\r\]*zmm" 1  } } */
>
>  #include 
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountb_128 (char * __restrict dest, char* src)
> +popcountb_128 (unsigned char * __restrict dest, unsigned char* src)
>  {
>for (int i = 0; i != 16; i++)
>  dest[i] = __builtin_popcount (src[i]);
> @@ -21,7 +20,7 @@ popcountb_128 (char * __restrict dest, char* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountw_128 (short* __restrict dest, short* src)
> +popcountw_128 (unsigned short* __restrict dest, unsigned short* src)
>  {
>for (int i = 0; i != 8; i++)
>  dest[i] = __builtin_popcount (src[i]);
> @@ -29,7 +28,7 @@ popcountw_128 (short* __restrict dest, short* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountb_256 (char * __restrict dest, char* src)
> +popcountb_256 (unsigned char * __restrict dest, unsigned char* src)
>  {
>for (int i = 0; i != 32; i++)
>  dest[i] = __builtin_popcount (src[i]);
> @@ -37,7 +36,7 @@ popcountb_256 (char * __restrict dest, char* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountw_256 (short* __restrict dest, short* src)
> +popcountw_256 (unsigned short* __restrict dest, unsigned short* src)
>  {
>for (int i = 0; i != 16; i++)
>  dest[i] = __builtin_popcount (src[i]);
> @@ -45,7 +44,7 @@ popcountw_256 (short* __restrict dest, short* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountb_512 (char * __restrict dest, char* src)
> +popcountb_512 (unsigned char * __restrict dest, unsigned char* src)
>  {
>for (int i = 0; i != 64; i++)
>  dest[i] = __builtin_popcount (src[i]);
> @@ -53,7 +52,7 @@ popcountb_512 (char * __restrict dest, char* src)
>
>  void
>  __attribute__ ((noipa, optimize("-O3")))
> -popcountw_512 (short* __restrict dest, short* src)
> +popcountw_512 (unsigned short* __restrict dest, unsigned short* src)
>  {
>for (int i = 0; i != 32; i++)
>  dest[i] = __builtin_popcount (src[i]);
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
> index 63bb00d9b4a..dedd2e4c3d6 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-pr97770-1.c
> @@ -1,13 +1,12 @@
>  /* PR target/97770 */
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -mavx512vpopcntdq 

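The testcase change from char/short to unsigned char/unsigned short matters because __builtin_popcount takes an unsigned int. An unsigned byte is zero-extended, so its popcount equals the popcount of the narrow value. That is what lets a byte-wide vector popcount replace the promote/popcount/demote sequence. A signed byte is sign-extended instead, so (signed char)-128 counts 25 set bits rather than 1. The reference helper below is a hypothetical illustration, not part of the patch.

```c
#include <stdint.h>

/* Scalar loop shape from the testcase, on unsigned bytes. */
static void popcount_bytes(uint8_t *restrict dest, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dest[i] = (uint8_t)__builtin_popcount(src[i]); /* src[i] zero-extends */
}

/* Portable reference: count the bits of the 8-bit value itself. */
static int popcount8(uint8_t v)
{
    int count = 0;
    for (; v != 0; v &= (uint8_t)(v - 1)) /* clear the lowest set bit */
        count++;
    return count;
}
```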
Re: [x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 21, 2021 at 11:19:12AM +0200, Richard Biener wrote:
> > --- gcc/config/i386/i386.c.jj   2021-06-21 09:39:21.622487840 +0200
> > +++ gcc/config/i386/i386.c  2021-06-21 10:21:12.389794740 +0200
> > @@ -15186,6 +15186,33 @@ ix86_lea_for_add_ok (rtx_insn *insn, rtx
> >return ix86_lea_outperforms (insn, regno0, regno1, regno2, 0, false);
> >  }
> >
> > +/* Return true if insns before FIRST_INSN (which is of the form
> > +   (set (memory) (zero_operand)) are all also either in the
> > +   same form, or (set (zero_operand) (const_int 0)).  */
> > +
> > +bool
> > +ix86_zero_stores_peep2_p (rtx_insn *first_insn, rtx zero_operand)
> > +{
> > +  rtx_insn *insn = first_insn;
> > +  for (int count = 0; count < 512; count++)
> 
> Can't the peephole add a note (reg_equal?) that the
> SET_SRC of the previously matched store is zero?

I think REG_EQUAL is not valid, the documentation says that it should
be used on SET of a REG, which is not the case here - we have a MEM.

> That would avoid the need to walk here.

But we could do what I've done in
r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70
- have int ix86_last_zero_store_uid;
set to INSN_UID of the last store emitted by the peephole2s and
then check that INSN_UID against the var.

Jakub



Re: [PATCH 5/7] Allow match-and-simplified phiopt to run in early phiopt

2021-06-21 Thread Richard Biener via Gcc-patches
On Sat, Jun 19, 2021 at 11:44 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> To move a few things more to match-and-simplify from phiopt,
> we need to allow match_simplify_replacement to run in early
> phiopt.  To do this, we need to mark some match patterns
> if they can be done in early phiopt or not.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no
> regressions.

I wonder if it would be easier to change

  result = gimple_simplify (COND_EXPR, type,
cond,
arg0, arg1,
, NULL);

to

  gimple_match_op op (gimple_match_cond::UNCOND,
  COND_EXPR, type, arg0, arg1);
  if (op.resimplify (early_p ? NULL : , NULL))
{
   /* Early we only want MIN/MAX, etc.  */
   if (early_p
  && (!op.code.is_tree_code ()
 || (tree_code)op.code != ...))
 return false;
  result = maybe_push_res_to_seq (op, );
  if (!result)
 ...

thus avoid complex transforms by passing a NULL seq.  'op'
will be the piecewise result which you can check before
eventually re-materializing it as GIMPLE stmt via
maybe_push_res_to_seq.

Note I didn't really look at the ultimate results of allowing
all phi-opts early, but the testsuite fallout was spectacular:
(too) simple jump-threading tests fail since the code is no longer
jumpy by the time we arrive at the first DOM/VRP ...

Richard.

> gcc/ChangeLog:
>
> * generic-match-head.c (phiopt_earlymode): New function.
> * gimple-match-head.c (phiopt_earlymode): New function.
> * match.pd (A ? CST0 : CST1): Disable for early phiopt.
> (x >= 0 ? ~y : y): Likewise.
> (x >= 0 ? y : ~y): Likewise.
> * tree-pass.h (PROP_gimple_lomp_dev): Increment bit by one.
> (PROP_rtl_split_insns): Likewise.
> (PROP_phioptearly): New define.
> * tree-ssa-phiopt.c (tree_ssa_phiopt_worker): Set and unset
> PROP_phioptearly on curr_properties if early.
> ---
>  gcc/generic-match-head.c |  7 +
>  gcc/gimple-match-head.c  |  7 +
>  gcc/match.pd | 76 ++--
>  gcc/tree-pass.h  |  5 ++--
>  gcc/tree-ssa-phiopt.c|  8 +++--
>  5 files changed, 63 insertions(+), 40 deletions(-)
>
> diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c
> index f426208..90ebf84 100644
> --- a/gcc/generic-match-head.c
> +++ b/gcc/generic-match-head.c
> @@ -91,6 +91,13 @@ optimize_vectors_before_lowering_p ()
>return true;
>  }
>
> +/* Return true if phiopt is in early mode. */
> +static inline bool
> +phiopt_earlymode ()
> +{
> +  return false;
> +}
> +
>  /* Return true if successive divisions can be optimized.
> Defer to GIMPLE opts.  */
>
> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
> index 7112c11..1eafbb7 100644
> --- a/gcc/gimple-match-head.c
> +++ b/gcc/gimple-match-head.c
> @@ -1159,6 +1159,13 @@ canonicalize_math_after_vectorization_p ()
>return !cfun || (cfun->curr_properties & PROP_gimple_lvec) != 0;
>  }
>
> +/* Return true if phiopt is in early mode. */
> +static inline bool
> +phiopt_earlymode ()
> +{
> +  return !cfun || (cfun->curr_properties & PROP_phioptearly) != 0;
> +}
> +
>  /* Return true if we can still perform transformations that may introduce
> vector operations that are not supported by the target. Vector lowering
> normally handles those, but after that pass, it becomes unsafe.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 39fb57e..f38baf2 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3728,39 +3728,40 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  #if GIMPLE
>  (simplify
>   (cond @0 INTEGER_CST@1 INTEGER_CST@2)
> - (switch
> -  (if (integer_zerop (@2))
> -   (switch
> -/* a ? 1 : 0 -> a if 0 and 1 are integral types. */
> -(if (integer_onep (@1))
> - (convert (convert:boolean_type_node @0)))
> -/* a ? -1 : 0 -> -a. */
> -(if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@1))
> - (negate (convert (convert:boolean_type_node @0
> -/* a ? powerof2cst : 0 -> a << (log2(powerof2cst)) */
> -(if (INTEGRAL_TYPE_P (type) && integer_pow2p (@1))
> - (with {
> -   tree shift = build_int_cst (integer_type_node, tree_log2 (@1));
> -  }
> -  (lshift (convert (convert:boolean_type_node @0)) { shift; })
> -  (if (integer_zerop (@1))
> -   (with {
> -  tree booltrue = constant_boolean_node (true, boolean_type_node);
> -}
> + (if (!phiopt_earlymode ())
> +  (switch
> +   (if (integer_zerop (@2))
>  (switch
> - /* a ? 0 : 1 -> !a. */
> - (if (integer_onep (@2))
> -  (convert (bit_xor (convert:boolean_type_node @0) { booltrue; } )))
> - /* a ? -1 : 0 -> -(!a). */
> - (if (INTEGRAL_TYPE_P (type) && integer_all_onesp (@2))
> -  (negate (convert (bit_xor (convert:boolean_type_node @0) { booltrue; } 
> 
> - /* a ? powerof2cst : 0 
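Per its ChangeLog, the patch gates match.pd patterns on a property bit that tree_ssa_phiopt_worker sets only while running early, and that phiopt_earlymode () queries. A minimal C sketch of that set/query/clear protocol follows. `struct function_state`, the bit values and `run_phiopt` are illustrative assumptions, not GCC's definitions (the real PROP_* bits live in tree-pass.h).

```c
#include <stdbool.h>

/* Illustrative bit values only; not the tree-pass.h constants.  */
#define PROP_OTHER_EXAMPLE (1u << 0)
#define PROP_PHIOPTEARLY_EXAMPLE (1u << 1)

/* Hypothetical stand-in for cfun.  */
struct function_state
{
  unsigned curr_properties;
};

/* Query in the style of phiopt_earlymode ().  */
static bool
phiopt_early_p (const struct function_state *fn)
{
  return (fn->curr_properties & PROP_PHIOPTEARLY_EXAMPLE) != 0;
}

/* Run BODY with the early bit set only while an early phiopt is active,
   then clear it again so that later passes never observe it.  */
static bool
run_phiopt (struct function_state *fn, bool early,
            bool (*body) (const struct function_state *))
{
  if (early)
    fn->curr_properties |= PROP_PHIOPTEARLY_EXAMPLE;
  bool result = body (fn);
  if (early)
    fn->curr_properties &= ~PROP_PHIOPTEARLY_EXAMPLE;
  return result;
}
```

Clearing the bit on the way out is what keeps unrelated properties, and the late phiopt runs, unaffected.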

Re: [PATCH 6/7] Lower for loops before lowering cond in genmatch

2021-06-21 Thread Richard Biener via Gcc-patches
On Sat, Jun 19, 2021 at 11:43 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> While converting some of fold_cond_expr_with_comparison
> to match, I found that I wanted to use "for cnd (cond vec_cond)",
> but that did not cause the lowering of cond to happen.
> The reason was that the for loop was lowered after
> cond.  So swapping the order is the correct fix, but it also
> means we need to copy for_subst_vec in lower_cond.
>
> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.  Can you please put a comment before the for lowering that says
why it's performed before cond lowering and also why it is safe
(for substitution delay does not happen for cond/vec_cond).
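A toy model of why the order matters: cond lowering only recognizes concrete `cond`/`vec_cond` operators, so it has to run after the `for` iterator has been expanded into them. The types and functions below are hypothetical and much simpler than genmatch's `lower_for`/`lower_cond`; they only track whether cond lowering saw each resulting pattern.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PATTERNS 8

/* Toy pattern: the outer operator is either concrete ("cond",
   "vec_cond", ...) or the unexpanded iterator "for cnd".  */
struct pattern
{
  const char *op;
  bool cond_lowered;
};

/* Expand "for cnd (cond vec_cond)" into one pattern per operator.  */
static int
lower_for (const struct pattern *in, int n, struct pattern *out)
{
  int m = 0;
  for (int i = 0; i < n; i++)
    if (strcmp (in[i].op, "for cnd") == 0)
      {
        out[m++] = (struct pattern) { "cond", in[i].cond_lowered };
        out[m++] = (struct pattern) { "vec_cond", in[i].cond_lowered };
      }
    else
      out[m++] = in[i];
  return m;
}

/* Acts only on concrete cond/vec_cond operators; an unexpanded
   iterator is invisible to it.  */
static void
lower_cond (struct pattern *p, int n)
{
  for (int i = 0; i < n; i++)
    if (strcmp (p[i].op, "cond") == 0 || strcmp (p[i].op, "vec_cond") == 0)
      p[i].cond_lowered = true;
}

/* Lower one "for cnd" pattern in the given order and count how many of
   the resulting patterns went through cond lowering.  */
static int
cond_lowered_count (bool for_first)
{
  struct pattern src[1] = { { "for cnd", false } };
  struct pattern out[MAX_PATTERNS];
  int n;
  if (for_first)
    {
      n = lower_for (src, 1, out);
      lower_cond (out, n);
    }
  else
    {
      lower_cond (src, 1);
      n = lower_for (src, 1, out);
    }
  int count = 0;
  for (int i = 0; i < n; i++)
    count += out[i].cond_lowered;
  return count;
}
```

With the old order no expanded pattern is cond-lowered; with for lowering first, both are.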

Thanks,
Richard.

> gcc/ChangeLog:
>
> * genmatch.c (lower_cond): Copy for_subst_vec
> for the simplify also.
> (lower): Swap the order for lower_for and lower_cond.
> ---
>  gcc/genmatch.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/genmatch.c b/gcc/genmatch.c
> index 4d47672..3aee3dd 100644
> --- a/gcc/genmatch.c
> +++ b/gcc/genmatch.c
> @@ -1306,6 +1306,7 @@ lower_cond (simplify *s, vec& simplifiers)
>  {
>simplify *ns = new simplify (s->kind, s->id, matchers[i], s->result,
>s->for_vec, s->capture_ids);
> +  ns->for_subst_vec.safe_splice (s->for_subst_vec);
>simplifiers.safe_push (ns);
>  }
>  }
> @@ -1543,24 +1544,23 @@ static void
>  lower (vec& simplifiers, bool gimple)
>  {
>auto_vec out_simplifiers;
> -  for (unsigned i = 0; i < simplifiers.length (); ++i)
> -lower_opt (simplifiers[i], out_simplifiers);
> +  for (auto s: simplifiers)
> +lower_opt (s, out_simplifiers);
>
>simplifiers.truncate (0);
> -  for (unsigned i = 0; i < out_simplifiers.length (); ++i)
> -lower_commutative (out_simplifiers[i], simplifiers);
> +  for (auto s: out_simplifiers)
> +lower_commutative (s, simplifiers);
>
>out_simplifiers.truncate (0);
> -  if (gimple)
> -for (unsigned i = 0; i < simplifiers.length (); ++i)
> -  lower_cond (simplifiers[i], out_simplifiers);
> -  else
> -out_simplifiers.safe_splice (simplifiers);
> -
> +  for (auto s: simplifiers)
> +lower_for (s, out_simplifiers);
>
>simplifiers.truncate (0);
> -  for (unsigned i = 0; i < out_simplifiers.length (); ++i)
> -lower_for (out_simplifiers[i], simplifiers);
> +  if (gimple)
> +for (auto s: out_simplifiers)
> +  lower_cond (s, simplifiers);
> +  else
> +simplifiers.safe_splice (out_simplifiers);
>  }
>
>
> --
> 1.8.3.1
>


rs6000: Fix typos in float128 ISA3.1 support

2021-06-21 Thread Kewen.Lin via Gcc-patches
Hi,

Recently, building GCC on Power with an assembler that doesn't
have Power10 support fails when building libgcc, with an error
message like:

Error: invalid switch -mpower10
Error: unrecognized option -mpower10
make[2]: *** [...gcc/gcc-base/libgcc/shared-object.mk:14: float128-p10.o] Error 1

Checking the culprit commit r12-1340 shows that the failure is caused by some typos.

What the proposed patch does:
  - fix the test target typo libgcc_cv_powerpc_3_1_float128_hw
    (written wrongly as libgcc_cv_powerpc_float128_hw, so the ISA 3.1
    parts were built as soon as ISA 3.0 support was detected).
  - fix the test case used for the libgcc_cv_powerpc_3_1_float128_hw check.
  - fix the test option used for the libgcc_cv_powerpc_3_1_float128_hw check.
  - remove the ISA 3.1 related contents from t-float128-hw.
  - add a new macro FLOAT128_HW_INSNS_ISA3_1 to separate the ISA 3.1
    content from the ISA 3.0 part in the ifunc support.

For the last two points, I think this is by design: when the assembler
only supports Power9 insns but not Power10 insns, we are unable to
build the Power10 hardware-supported functions, so we shouldn't
generate the related ifunc code that relies on Power10 insns.
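The design point (build the runtime dispatch only when the hardware variant can be assembled) can be sketched in plain C. This is a simplified assumption: `EXAMPLE_HW_INSNS_ISA3_1`, `convert_sw`, `convert_hw` and `cpu_has_isa3_1` are hypothetical stand-ins, the resolver is an ordinary function rather than an `ifunc` resolver, and it falls back to the software routine, whereas the patch omits the ISA 3.1 resolvers and ifunc declarations entirely.

```c
#include <stdbool.h>

typedef int (*convert_fn) (int);

/* Software fallback; always available.  Hypothetical stand-in.  */
static int
convert_sw (int x)
{
  return 2 * x;
}

#ifdef EXAMPLE_HW_INSNS_ISA3_1
/* Compiled only when the assembler accepts the newer instructions.  */
static int
convert_hw (int x)
{
  return x + x;
}

/* Hypothetical runtime check; libgcc uses __builtin_cpu_supports.  */
static bool
cpu_has_isa3_1 (void)
{
  return false;
}
#endif

/* A runtime choice exists only when the hardware variant was built;
   otherwise only the software version can be returned.  */
static convert_fn
convert_resolve (void)
{
#ifdef EXAMPLE_HW_INSNS_ISA3_1
  return cpu_has_isa3_1 () ? convert_hw : convert_sw;
#else
  return convert_sw;
#endif
}
```

Without the macro, nothing that needs the newer assembler is ever compiled, which is what keeps the build working with an older `as`.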

With this patch, the build succeeds even with an assembler that lacks
Power10 support.

Bootstrapped/regtested on:
  - powerpc64le-linux-gnu P10
  - powerpc64le-linux-gnu P9 (with and without a Power10-capable assembler)
  - powerpc64-linux-gnu P8 (with and without a Power10-capable assembler)

BTW, there was some noise in the regression testing due to newer
binutils versions, but after some checking it was identified as
unrelated.

Is it ok for trunk?

BR,
Kewen
-
libgcc/ChangeLog:

* configure: Regenerate.
* configure.ac (test for libgcc_cv_powerpc_3_1_float128_hw): Fix
typos among name, CFLAGS and test case.
* config/rs6000/t-float128-hw (fp128_3_1_hw_funcs,
fp128_3_1_hw_src, fp128_3_1_hw_static_obj, fp128_3_1_hw_shared_obj,
fp128_3_1_hw_obj): Remove variables for ISA 3.1 support.
* config/rs6000/t-float128-p10-hw (FLOAT128_HW_INSNS): Append
macro FLOAT128_HW_INSNS_ISA3_1 for ISA 3.1 support.
(FP128_3_1_CFLAGS_HW): Fix option typo.
* config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): Guarded
with FLOAT128_HW_INSNS_ISA3_1.
(__floattikf_resolve): Likewise.
(__floatuntikf_resolve): Likewise.
(__fixkfti_resolve): Likewise.
(__fixunskfti_resolve): Likewise.
(__floattikf): Likewise.
(__floatuntikf): Likewise.
(__fixkfti): Likewise.
(__fixunskfti): Likewise.
 libgcc/config/rs6000/float128-ifunc.c  |  9 -
 libgcc/config/rs6000/t-float128-hw | 16 
 libgcc/config/rs6000/t-float128-p10-hw |  4 ++--
 libgcc/configure   | 18 +-
 libgcc/configure.ac| 14 +++---
 5 files changed, 26 insertions(+), 35 deletions(-)

diff --git a/libgcc/config/rs6000/float128-ifunc.c b/libgcc/config/rs6000/float128-ifunc.c
index 57545dd7edb..ef7f731bf0b 100644
--- a/libgcc/config/rs6000/float128-ifunc.c
+++ b/libgcc/config/rs6000/float128-ifunc.c
@@ -46,7 +46,9 @@
 #endif
 
 #define SW_OR_HW(SW, HW) (__builtin_cpu_supports ("ieee128") ? HW : SW)
+#ifdef FLOAT128_HW_INSNS_ISA3_1
 #define SW_OR_HW_ISA3_1(SW, HW) (__builtin_cpu_supports ("arch_3_1") ? HW : SW)
+#endif
 
 /* Resolvers.  */
 static __typeof__ (__addkf3_sw) *
@@ -97,6 +99,7 @@ __floatdikf_resolve (void)
   return SW_OR_HW (__floatdikf_sw, __floatdikf_hw);
 }
 
+#ifdef FLOAT128_HW_INSNS_ISA3_1
 static __typeof__ (__floattikf_sw) *
 __floattikf_resolve (void)
 {
@@ -108,6 +111,7 @@ __floatuntikf_resolve (void)
 {
   return SW_OR_HW_ISA3_1 (__floatuntikf_sw, __floatuntikf_hw);
 }
+#endif
 
 static __typeof__ (__floatunsikf_sw) *
 __floatunsikf_resolve (void)
@@ -121,7 +125,7 @@ __floatundikf_resolve (void)
   return SW_OR_HW (__floatundikf_sw, __floatundikf_hw);
 }
 
-
+#ifdef FLOAT128_HW_INSNS_ISA3_1
 static __typeof__ (__fixkfti_sw) *
 __fixkfti_resolve (void)
 {
@@ -133,6 +137,7 @@ __fixunskfti_resolve (void)
 {
   return SW_OR_HW_ISA3_1 (__fixunskfti_sw, __fixunskfti_hw);
 }
+#endif
 
 static __typeof__ (__fixkfsi_sw) *
 __fixkfsi_resolve (void)
@@ -323,6 +328,7 @@ TFtype __floatsikf (SItype_ppc)
 TFtype __floatdikf (DItype_ppc)
   __attribute__ ((__ifunc__ ("__floatdikf_resolve")));
 
+#ifdef FLOAT128_HW_INSNS_ISA3_1
 TFtype __floattikf (TItype_ppc)
   __attribute__ ((__ifunc__ ("__floattikf_resolve")));
 
@@ -334,6 +340,7 @@ TItype_ppc __fixkfti (TFtype)
 
 UTItype_ppc __fixunskfti (TFtype)
   __attribute__ ((__ifunc__ ("__fixunskfti_resolve")));
+#endif
 
 TFtype __floatunsikf (USItype_ppc)
   __attribute__ ((__ifunc__ ("__floatunsikf_resolve")));
diff --git a/libgcc/config/rs6000/t-float128-hw b/libgcc/config/rs6000/t-float128-hw
index c0827366cc4..d64ca4dd694 100644
--- a/libgcc/config/rs6000/t-float128-hw
+++ b/libgcc/config/rs6000/t-float128-hw
@@ -13,13 +13,6 @@ fp128_hw_static_obj  = $(addsuffix 

Re: [x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-21 Thread Richard Biener via Gcc-patches
On Mon, Jun 21, 2021 at 10:37 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Mon, Jun 21, 2021 at 09:18:28AM +0200, Uros Bizjak via Gcc-patches wrote:
> > > 2021-06-20  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > PR target/11877
> > > * config/i386/i386.md: New define_peephole2s to shrink writing
> > > 1, 2 or 4 consecutive zeros to memory when optimizing for size.
> > >
> > > gcc/testsuite/ChangeLog
> > > PR target/11877
> > > * gcc.target/i386/pr11877.c: New test case.
> >
> > OK.
>
> It unfortunately doesn't extend well to larger memory clearing.
> Consider e.g.
> void
> foo (int *p)
> {
>   p[0] = 0;
>   p[7] = 0;
>   p[23] = 0;
>   p[41] = 0;
>   p[48] = 0;
>   p[59] = 0;
>   p[69] = 0;
>   p[78] = 0;
>   p[83] = 0;
>   p[89] = 0;
>   p[98] = 0;
>   p[121] = 0;
>   p[132] = 0;
>   p[143] = 0;
>   p[154] = 0;
> }
> where with the patch we emit:
>         xorl    %eax, %eax
>         xorl    %edx, %edx
>         xorl    %ecx, %ecx
>         xorl    %esi, %esi
>         xorl    %r8d, %r8d
>         movl    %eax, (%rdi)
>         movl    %eax, 28(%rdi)
>         movl    %eax, 92(%rdi)
>         movl    %eax, 164(%rdi)
>         movl    %edx, 192(%rdi)
>         movl    %edx, 236(%rdi)
>         movl    %edx, 276(%rdi)
>         movl    %edx, 312(%rdi)
>         movl    %ecx, 332(%rdi)
>         movl    %ecx, 356(%rdi)
>         movl    %ecx, 392(%rdi)
>         movl    %ecx, 484(%rdi)
>         movl    %esi, 528(%rdi)
>         movl    %esi, 572(%rdi)
>         movl    %r8d, 616(%rdi)
> Here is an incremental (so far untested) patch that emits:
>         xorl    %eax, %eax
>         movl    %eax, (%rdi)
>         movl    %eax, 28(%rdi)
>         movl    %eax, 92(%rdi)
>         movl    %eax, 164(%rdi)
>         movl    %eax, 192(%rdi)
>         movl    %eax, 236(%rdi)
>         movl    %eax, 276(%rdi)
>         movl    %eax, 312(%rdi)
>         movl    %eax, 332(%rdi)
>         movl    %eax, 356(%rdi)
>         movl    %eax, 392(%rdi)
>         movl    %eax, 484(%rdi)
>         movl    %eax, 528(%rdi)
>         movl    %eax, 572(%rdi)
>         movl    %eax, 616(%rdi)
> instead:
>
> 2021-06-21  Jakub Jelinek  
>
> PR target/11877
> * config/i386/i386-protos.h (ix86_zero_stores_peep2_p): Declare.
> * config/i386/i386.c (ix86_zero_stores_peep2_p): New function.
> * config/i386/i386.md (peephole2s for 1/2/4 stores of const0_rtx):
> Remove "" from match_operand.  Add peephole2s for 1/2/4 stores of
> const0_rtx following previous successful peep2s.
>
> --- gcc/config/i386/i386-protos.h.jj2021-06-07 09:24:57.696690116 +0200
> +++ gcc/config/i386/i386-protos.h   2021-06-21 10:21:05.428887980 +0200
> @@ -111,6 +111,7 @@ extern bool ix86_use_lea_for_mov (rtx_in
>  extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
>  extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode);
>  extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]);
> +extern bool ix86_zero_stores_peep2_p (rtx_insn *, rtx);
>  extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
>  extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
>  extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
> --- gcc/config/i386/i386.c.jj   2021-06-21 09:39:21.622487840 +0200
> +++ gcc/config/i386/i386.c  2021-06-21 10:21:12.389794740 +0200
> @@ -15186,6 +15186,33 @@ ix86_lea_for_add_ok (rtx_insn *insn, rtx
>return ix86_lea_outperforms (insn, regno0, regno1, regno2, 0, false);
>  }
>
> +/* Return true if insns before FIRST_INSN (which is of the form
> +   (set (memory) (zero_operand)) are all also either in the
> +   same form, or (set (zero_operand) (const_int 0)).  */
> +
> +bool
> +ix86_zero_stores_peep2_p (rtx_insn *first_insn, rtx zero_operand)
> +{
> +  rtx_insn *insn = first_insn;
> +  for (int count = 0; count < 512; count++)

Can't the peephole add a note (reg_equal?) that the
SET_SRC of the previously matched store is zero?

That would avoid the need to walk here.

> +{
> +  insn = prev_nonnote_nondebug_insn_bb (insn);
> +  if (!insn)
> +   return false;
> +  rtx set = single_set (insn);
> +  if (!set)
> +   return false;
> +  if (SET_SRC (set) == const0_rtx
> + && rtx_equal_p (SET_DEST (set), zero_operand))
> +   return true;
> +  if (set != PATTERN (insn)
> + || !rtx_equal_p (SET_SRC (set), zero_operand)
> + || !memory_operand (SET_DEST (set), VOIDmode))
> +   return false;
> +}
> +  return false;
> +}
> +
>  /* Return true if destination reg of SET_BODY is shift count of
> USE_BODY.  */
>
> --- gcc/config/i386/i386.md.jj  2021-06-21 09:42:04.086303699 +0200
> +++ gcc/config/i386/i386.md 2021-06-21 10:21:31.932532964 +0200
> @@ -19360,10 +19360,10 @@ (define_peephole2
>  ;; When optimizing for size, zeroing memory should use a register.
>  (define_peephole2
>

[PATCH] tree-optimization/101121 - avoid infinite SLP build

2021-06-21 Thread Richard Biener
The following plugs another hole where we cache a failed SLP build
attempt with an all-success 'matches'.  It also adds checking that
we don't do that.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
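The invariant the new checking enforces can be illustrated with a small C sketch: a cached failure must mark at least one lane as not matching. Presumably callers use the mismatching lanes to decide how to split or retry, and an all-success `matches` gives them nothing to act on, which is how a build can repeat forever. `struct failed_build`, `has_mismatch` and `cache_failure` are hypothetical names, not the tree-vect-slp.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LANES 16

/* Hypothetical cache entry for a failed SLP build attempt.  */
struct failed_build
{
  int nlanes;
  bool matches[MAX_LANES];
};

/* True if at least one lane is marked as not matching.  */
static bool
has_mismatch (const bool *matches, int nlanes)
{
  for (int i = 0; i < nlanes; i++)
    if (!matches[i])
      return true;
  return false;
}

/* Record a failure.  An all-success MATCHES would give the caller no
   lane to split at, so assert against caching one.  */
static void
cache_failure (struct failed_build *entry, const bool *matches, int nlanes)
{
  assert (nlanes > 0 && nlanes <= MAX_LANES);
  assert (has_mismatch (matches, nlanes));
  entry->nlanes = nlanes;
  memcpy (entry->matches, matches, (size_t) nlanes * sizeof (bool));
}
```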

2021-06-21  Richard Biener  

PR tree-optimization/101121
* tree-vect-slp.c (vect_build_slp_tree_2): Do not fail fatally
when we just lack a stmt with the desired op when doing permutation.
(vect_build_slp_tree): When caching a failed SLP build attempt
assert that at least one lane is marked as not matching.

* gfortran.dg/pr101121.f: New testcase.
---
 gcc/testsuite/gfortran.dg/pr101121.f | 203 +++
 gcc/tree-vect-slp.c  |  18 ++-
 2 files changed, 218 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr101121.f

diff --git a/gcc/testsuite/gfortran.dg/pr101121.f b/gcc/testsuite/gfortran.dg/pr101121.f
new file mode 100644
index 000..b623ac10794
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr101121.f
@@ -0,0 +1,203 @@
+! { dg-do compile }
+! { dg-options "-Ofast -std=legacy" }
+! { dg-additional-options "-march=haswell" { target x86_64-*-* i?86-*-* } }
+  COMMON /JMSG80/ PI4,PIF,P120,R12,P340,R34,FCS(4,3),E34MAX,
+ 7IJSAME,KLSAME,IKSMJL
+  DIMENSION P1(3),FQ(0:5),F1(0:4),F2(0:4),WS(8),WP(8)
+  DIMENSION VEA(12),VES(9),WES(6)
+  DIMENSION T(0:20),U(0:20)
+  DIMENSION T3R(6,3,3,3),T9B(0:20,3,3,3)
+  DIMENSION F5X(0:12,3,3,3),F6X(0: 6,3,3,3,3)
+  DIMENSION A31(0:3,0:3),C31(2,0:3),A32(0:3,0:3),C32(2,0:3)
+  DIMENSION A41(0:3,0:3),C41(2,0:3),A42(0:3,0:3),C42(2,0:3)
+  DIMENSION A33(16),FIJ00(2),A43(16),FI0K0(2)
+  DIMENSION SEJJK0(  3),A54(16,  3),C54(2,  3)
+  DIMENSION A56(0:22,3,0:3),C56(2,0:3)
+  DIMENSION A60(0:3,0:3),C60(2,0:3),A61(0:3,0:3),C61(2,0:3)
+  DIMENSION A62(16),FI00L(2),A63(16),F0J0L(2)
+  DIMENSION A64(0:3,0:3),C64(2,  3),A65(0:3,0:3),C65(2,  3)
+  DIMENSION A69(0:3,  3),C69(2,0:3),A70(0:3,  3),C70(2,0:3)
+  DIMENSION A71(18,  3),C71(2,  3)
+  DIMENSION A72(18,  3),C72(2,  3)
+  DIMENSION A73(18,0:3),C73(2,0:3)
+  DIMENSION SE0LKL(  3),A75(16,3),C75(2,0:3)
+  DIMENSION SE0JLL(  3),A76(16,3),C76(2,0:3)
+  DIMENSION A77(0:25,3,0:3),C77(2,0:3),A78(0:31,3,0:3),C78(2,0:3)
+  DIMENSION A79(0:31,3,0:3),C79(2,0:3)
+  DIMENSION A80(0: 2,2),A81(0:24,3),A82(0:31,2),A83(0:22,2)
+  DIMENSION A84(0:13,2),A85(0:13,2),A86(0: 6)
+  DIMENSION S4(0:14),Q4(0:4),FIJKL(2)
+  IF(XVA.LT.CUG) THEN
+  ENDIF
+ F1(M)= FQ0*TMP
+ F2(M)= FQ0*TMP
+  XX1=-X12*X43
+  IF(JI.EQ.1) THEN
+DO 255 J=1,3
+  255CONTINUE
+ DO 268 K=1,3
+SEJJK00= E0+E(2,2,K,0)+E(3,3,K,0)
+A54( 5,K)= A540
+  268CONTINUE
+  297   F5X(3+M,I,I,I)=-R3(M,I,I,I)
+DO 299 J=1,3
+ F5X(3+M,I,I,J)=-R3(M,J,I,I)
+  299CONTINUE
+ DO 300 L=0,M56
+DO 300 M=1,3
+  300A56(N,M,L)= ZER
+   A60(2,L)= A600+P34(I,3)*E(I,0,0,L)
+   A61(0,L)= A610+D1I *E(L,0,0,I)
+   A61(1,L)= A610+P12(I,3)*E(L,0,0,I)
+ SEL00L= E(1,0,0,1)+E(2,0,0,2)+E(3,0,0,3)
+   IF(I.NE.J) THEN
+  K=6-I-J
+  F6X(0,J,I,I,I)= ZER
+  F6X(0,I,J,I,I)= ZER
+  F6X(0,I,I,J,I)= ZER
+  F6X(0,I,I,I,J)= ZER
+ F6X(M,I,I,K,J)= R2(M,K,J)
+   ENDIF
+  391   A82( M,N)= ZER
+  392   A83( M,N)= ZER
+   A84(M,N)= ZER
+   A85(M,N)= ZER
+  397A86( M)= ZER
+ DO 399 K=1,3
+DO 399 J=1,3
+  DO 398 M=1,6
+ T9B(M+ 2,I,J,K)= T3R0
+ T9B(M+ 8,I,J,K)= T1R(M,I,J,K)
+ T9B(M+14,I,J,K)= T3R0
+  398 CONTINUE
+  399CONTINUE
+  417A77( M,3,K)= A770+F5X0*GEIJKL
+  445A81( M,3) = A81( M,3)+T( M)*TMP
+ IF(K.EQ.L)A81( 5,3)=A81( 5,3)+TMP
+ IF(I.EQ.J) THEN
+DO 447 M=6,11
+  447   A81( M,3) = A81( M,3)+T( M)*GEIJKL
+ ENDIF
+  ENDIF
+  IF(LK.EQ.1) THEN
+ IF(JTYPE.NE.4) THEN
+DO 510 J=0,3
+   A31(3,J)= A310+ A310*Y02
+   A32(3,J)= A320+ A320*Y02
+  510   CONTINUE
+A33( 6)=-AEIJ00*Y1Y+T01
+A33( 7)= A330-0*Y01+T01
+A33( 8)= A330- A330*Y01
+A33(15)= A330+0*Y02
+A33(16)= A330+ A330*Y02
+ ENDIF
+A84(12,N)= A84( 7,N)+ A84( 8,N)*Y02
+A84(13,N)= A84( 9,N)
+ A85(10,2)= A85(10,2)- A85(10,1)+ A850
+ A85(11,2)= A85(11,2)- A85(11,1)+ A850
+ A85(12,2)= A85(12,2)- A85(12,1)+ A850
+ A85(13,2)= 

Re: [PATCH V3] Split loop for NE condition.

2021-06-21 Thread Richard Biener
On Wed, 9 Jun 2021, guojiufu wrote:

> On 2021-06-09 17:42, guojiufu via Gcc-patches wrote:
> > On 2021-06-08 18:13, Richard Biener wrote:
> >> On Fri, 4 Jun 2021, Jiufu Guo wrote:
> >> 
> > cut...
> >>> +  gcond *cond = as_a (last);
> >>> +  enum tree_code code = gimple_cond_code (cond);
> >>> +  if (!(code == NE_EXPR
> >>> + || (code == EQ_EXPR && (e->flags & EDGE_TRUE_VALUE
> >> 
> >> The NE_EXPR check misses a corresponding && (e->flags & EDGE_FALSE_VALUE)
> >> check.
> >> 
> > Thanks, check (e->flags & EDGE_FALSE_VALUE) would be safer.
> > 
> >>> + continue;
> >>> +
> >>> +  /* Check if bound is invarant.  */
> >>> +  tree idx = gimple_cond_lhs (cond);
> >>> +  tree bnd = gimple_cond_rhs (cond);
> >>> +  if (expr_invariant_in_loop_p (loop, idx))
> >>> + std::swap (idx, bnd);
> >>> +  else if (!expr_invariant_in_loop_p (loop, bnd))
> >>> + continue;
> >>> +
> >>> +  /* Only unsigned type conversion could cause wrap.  */
> >>> +  tree type = TREE_TYPE (idx);
> >>> +  if (!INTEGRAL_TYPE_P (type) || TREE_CODE (idx) != SSA_NAME
> >>> +   || !TYPE_UNSIGNED (type))
> >>> + continue;
> >>> +
> >>> +  /* Avoid to split if bound is MAX/MIN val.  */
> >>> +  tree bound_type = TREE_TYPE (bnd);
> >>> +  if (TREE_CODE (bnd) == INTEGER_CST && INTEGRAL_TYPE_P (bound_type)
> >>> +   && (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type))
> >>> +   || tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type
> >>> + continue;
> >> 
> >> Note you do not require 'bnd' to be constant and thus at runtime those
> >> cases still need to be handled correctly.
> > Yes, bnd is not required to be constant.  The above code filters out the
> > case where bnd is the constant max/min value of the type.  So, the code
> > could be updated as:
> >   if (tree_int_cst_equal (bnd, TYPE_MAX_VALUE (bound_type))
> >   || tree_int_cst_equal (bnd, TYPE_MIN_VALUE (bound_type)))

Yes, and the comment adjusted to "if bound is known to be MAX/MIN val."

> >> 
> >>> +  /* Check if there is possible wrap.  */
> >>> +  class tree_niter_desc niter;
> >>> +  if (!number_of_iterations_exit (loop, e, , false, false))
> > cut...
> >>> +
> >>> +  /* Change if (i != n) to LOOP1:if (i > n) and LOOP2:if (i < n) */
> >> 
> >> It now occurs to me that we nowhere check the evolution of IDX
> >> (split_at_bb_p uses simple_iv for this for example).  The transform
> >> assumes that we will actually hit i == n and that i increments, but
> >> while you check the control IV from number_of_iterations_exit
> >> for NE_EXPR that does not guarantee a positive evolution.
> >> 
> > If I have not answered your question correctly, please point it out:
> > number_of_iterations_exit, like simple_iv, invokes
> > simple_iv_with_niters, which checks the evolution, and
> > number_of_iterations_exit also relies on number_of_iterations_cond,
> > which checks no_overflow more accurately; this is one reason I use
> > this function.
> >
> > This transform assumes that the last iteration hits i == n.
> > Otherwise, the loop may run forever, wrapping around again and again.
> > To be safe, I will add a check that the step is 1 or -1, for which
> > this assumption holds.

OK.
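To make the step ±1 requirement and the ordering concern from the review concrete, here is a small exhaustive model using `uint8_t`, so all 256×256 (start, end) pairs, including end == 0 and end == UINT8_MAX, can be checked. It is a sketch of the transform as described in this thread (one loop while i is on the far side of the bound and must wrap, one while it approaches the bound, with the comparisons swapped for a negative step), not the patch's code; `push`, `run_original`, `run_split` and `split_matches_original` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Values passed to push (), in order; 256 slots suffice for uint8_t.  */
struct trace
{
  uint8_t vals[256];
  int len;
};

static void
push (struct trace *t, uint8_t v)
{
  assert (t->len < 256);
  t->vals[t->len++] = v;
}

/* Original loop: for (i = start; i != end; i += step) push (i);
   STEP is +1 or -1; uint8_t arithmetic wraps modulo 256.  */
static void
run_original (struct trace *t, uint8_t start, uint8_t end, int step)
{
  t->len = 0;
  for (uint8_t i = start; i != end; i = (uint8_t) (i + step))
    push (t, i);
}

/* Split form: the first loop runs while I is on the far side of END and
   must wrap around; the second runs while I moves towards END.  */
static void
run_split (struct trace *t, uint8_t start, uint8_t end, int step)
{
  uint8_t i = start;
  t->len = 0;
  if (step > 0)
    {
      while (i > end)
        push (t, i++);
      while (i < end)
        push (t, i++);
    }
  else
    {
      while (i < end)
        push (t, i--);
      while (i > end)
        push (t, i--);
    }
}

/* Compare both forms over every (start, end) pair.  */
static bool
split_matches_original (int step)
{
  struct trace a, b;
  for (int s = 0; s < 256; s++)
    for (int e = 0; e < 256; e++)
      {
        run_original (&a, (uint8_t) s, (uint8_t) e, step);
        run_split (&b, (uint8_t) s, (uint8_t) e, step);
        if (a.len != b.len
            || memcmp (a.vals, b.vals, (size_t) a.len) != 0)
          return false;
      }
  return true;
}
```

Since both forms visit the same values in the same order, an early `break` in the body, as in the push/compare example, exits at the same point in either form.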

> > Thanks so much for pointing out that I missed the negative step!
> > 
> >> Your testcases do not include any negative step examples, but I guess
> >> the conditions need to be swapped in this case?
> > 
> > I will add test cases and code to support steps of 1 and -1.
> > 
> >> 
> >> I think you also have to consider the order we split, say with
> >> 
> >>   for (i = start; i != end; ++i)
> >> {
> >>   push (i);
> >>   if (a[i] != b[i])
> >> break;
> >> }
> >> 
> >> push (i) calls need to be in the same order for all cases of
> >> start < end, start == end and start > end (and also cover
> >> runtime testcases with end == 0 or end == UINT_MAX, likewise
> >> for start).
> > I added tests for the above cases.  If I missed something, please point it out, thanks!
> > 
> >> 
> >>> +  bool inv = expr_invariant_in_loop_p (loop, gimple_cond_lhs (gc));
> >>> +  enum tree_code up_code = inv ? LT_EXPR : GT_EXPR;
> >>> +  enum tree_code down_code = inv ? GT_EXPR : LT_EXPR;
> > cut
> > 
> > Thanks again for the very helpful review!
> > 
> > BR,
> > Jiufu Guo.
> 
> Here is the updated patch, thanks for your time!
> 
> diff --git a/gcc/testsuite/gcc.dg/loop-split1.c b/gcc/testsuite/gcc.dg/loop-split1.c
> new file mode 100644
> index 000..dd2d03a7b96
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/loop-split1.c
> @@ -0,0 +1,101 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
> +
> +void
> +foo (int *a, int *b, unsigned l, unsigned n)
> +{
> +  while (++l != n)
> +    a[l] = b[l] + 1;
> +}
> +void
> +foo_1 (int *a, int *b, unsigned n)
> +{
> +  unsigned l = 0;
> +  while (++l != n)
> +    a[l] = b[l] + 1;
> +}
> +
> +void
> +foo1 (int *a, int *b, unsigned l, unsigned n)
> +{
> +  while (l++ != n)
> +a[l] = 

RE: [ARM] PR97906 - Missed lowering abs(a) >= abs(b) to vacge

2021-06-21 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: 21 June 2021 09:33
> To: Kyrylo Tkachov 
> Cc: gcc Patches 
> Subject: Re: [ARM] PR97906 - Missed lowering abs(a) >= abs(b) to vacge
> 
> On Wed, 16 Jun 2021 at 15:49, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 14 Jun 2021 at 16:15, Kyrylo Tkachov
> > wrote:
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Prathamesh Kulkarni 
> > > > Sent: 14 June 2021 08:58
> > > > To: gcc Patches ; Kyrylo Tkachov
> > > > 
> > > > Subject: Re: [ARM] PR97906 - Missed lowering abs(a) >= abs(b) to vacge
> > > >
> > > > On Mon, 7 Jun 2021 at 12:46, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Tue, 1 Jun 2021 at 16:03, Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > > As mentioned in PR, for following test-case:
> > > > > >
> > > > > > #include 
> > > > > >
> > > > > > uint32x2_t f1(float32x2_t a, float32x2_t b)
> > > > > > {
> > > > > >   return vabs_f32 (a) >= vabs_f32 (b);
> > > > > > }
> > > > > >
> > > > > > uint32x2_t f2(float32x2_t a, float32x2_t b)
> > > > > > {
> > > > > >   return (uint32x2_t) __builtin_neon_vcagev2sf (a, b);
> > > > > > }
> > > > > >
> > > > > > We generate vacge for f2, but with -ffast-math, we generate the
> > > > > > following for f1:
> > > > > > f1:
> > > > > >         vabs.f32        d1, d1
> > > > > >         vabs.f32        d0, d0
> > > > > >         vcge.f32        d0, d0, d1
> > > > > >         bx      lr
> > > > > >
> > > > > > This happens because the middle-end inverts the comparison to b <= a,
> > > > > > as the .optimized dump shows:
> > > > > >  _8 = __builtin_neon_vabsv2sf (a_4(D));
> > > > > >   _7 = __builtin_neon_vabsv2sf (b_5(D));
> > > > > >   _1 = _7 <= _8;
> > > > > >   _2 = VIEW_CONVERT_EXPR(_1);
> > > > > >   _6 = VIEW_CONVERT_EXPR(_2);
> > > > > >   return _6;
> > > > > >
> > > > > > and combine fails to match the following pattern:
> > > > > > (set (reg:V2SI 121)
> > > > > >     (neg:V2SI (le:V2SI (abs:V2SF (reg:V2SF 123))
> > > > > >                        (abs:V2SF (reg:V2SF 122)))))
> > > > > >
> > > > > > because the neon_vca pattern uses the GTGE code iterator.
> > > > > > The attached patch adjusts the neon_vca patterns to use GLTE instead,
> > > > > > similar to neon_vca_fp16insn, and removes the NEON_VACMP iterator.
> > > > > > Code-gen with patch:
> > > > > > f1:
> > > > > > vacle.f32   d0, d1, d0
> > > > > > bx  lr
> > > > > >
> > > > > > Bootstrapped + tested on arm-linux-gnueabihf and cross-tested on
> > > > > > arm*-*-*.
> > > > > > OK to commit ?
> > >
> > > Is that inversion guaranteed to happen (is it a canonicalization rule)?
> > I think it follows this canonicalization rule from
> > tree_swap_operands_p:
> >   /* It is preferable to swap two SSA_NAME to ensure a canonical form
> >  for commutative and comparison operators.  Ensuring a canonical
> >  form allows the optimizers to find additional redundancies without
> >  having to explicitly check for both orderings.  */
> >   if (TREE_CODE (arg0) == SSA_NAME
> >   && TREE_CODE (arg1) == SSA_NAME
> >   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
> > return 1;
> >
> > For the above test-case, it's ccp1 that inverts the comparison.
> > The input to ccp1 pass is:
> >   _12 = __builtin_neon_vabsv2sf (a_6(D));
> >   _14 = _12;
> >   _1 = _14;
> >   _11 = __builtin_neon_vabsv2sf (b_8(D));
> >   _16 = _11;
> >   _2 = _16;
> >   _3 = _1 >= _2;
> >   _4 = VEC_COND_EXPR <_3, { -1, -1 }, { 0, 0 }>;
> >   _10 = VIEW_CONVERT_EXPR(_4);
> >   return _10;
> >
> > _3 = _1 >= _2 is folded into:
> > _3 = _12 >= _11
> >
> > Since _12 has a higher SSA version than _11, it is canonicalized to:
> > _3 = _11 <= _12.
> >
> Hi Kyrill,
> Is it OK to push given the above canonicalization?

Hi Prathamesh,

Yes, that's okay, thanks for checking.
Kyrill
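A miniature of the SSA_NAME rule quoted in this thread, in C: operands are identified only by SSA version, and swapping them requires mirroring the comparison code (the job GCC's swap_tree_comparison does). The types and names here are hypothetical; the checks reproduce the `_12 >= _11` to `_11 <= _12` rewrite seen in the ccp1 dump.

```c
/* Hypothetical miniature of a GIMPLE comparison "_LHS CODE _RHS", with
   operands identified only by their SSA version numbers.  */
enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };

struct cmp
{
  enum cmp_code code;
  unsigned lhs, rhs;
};

/* Comparison code to use once the operands are exchanged:
   a >= b is b <= a, a < b is b > a, and EQ/NE are symmetric.  */
static enum cmp_code
swap_cmp (enum cmp_code c)
{
  switch (c)
    {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    default: return c;
    }
}

/* Swap when the first operand has the higher version, so that later
   passes only ever see one ordering.  */
static struct cmp
canonicalize (struct cmp c)
{
  if (c.lhs > c.rhs)
    {
      unsigned tmp = c.lhs;
      c.lhs = c.rhs;
      c.rhs = tmp;
      c.code = swap_cmp (c.code);
    }
  return c;
}
```

This is why the backend sees an LE comparison of the two abs values, and why the vca patterns need to match LE/LT as well as GE/GT.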

> 
> Thanks,
> Prathamesh
> > Thanks,
> > Prathamesh
> > > If so, ok.
> > > Thanks,
> > > Kyrill
> > >
> > >
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh


Re: [Patch, v2] contrib/mklog.py: Improve PR handling (was: Re: git gcc-commit-mklog doesn't extract PR number to ChangeLog)

2021-06-21 Thread Tobias Burnus

On 21.06.21 10:09, Martin Liška wrote:


$ pytest test_mklog.py
FAILED test_mklog.py::TestMklog::test_sorting - AssertionError: assert
'\n\tPR 50209...New test.\n\n' == 'gcc/ChangeLo...New test.\n\n'

Aha, missed that there is indeed a testsuite - nice!

$ flake8 mklog.py
mklog.py:187:23: Q000 Remove bad quotes

I have now filed:
https://bugs.launchpad.net/ubuntu/+source/python-pytest-flake8/+bug/1933075


+# PR number in the file name
+fname = os.path.basename(file.path)


This is dead code.


+ fname = os.path.splitext(fname)[0]
+m = pr_filename_regex.search(fname)

It does not look like dead code to me.
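For readers comparing with `pr_filename_regex` in the patch, the filename rule can be spelled out in C. The tool itself is Python; this is only an illustrative restatement that assumes ASCII names, and `pr_from_filename` is a hypothetical helper. `pr` (any case) must start the name or follow a non-alphanumeric character, which is what `(^|[\W_])` allows, and must be followed by at least four digits.

```c
#include <ctype.h>
#include <stdlib.h>

/* Return the PR number embedded in FNAME (extension already stripped),
   or -1 if there is none.  */
static long
pr_from_filename (const char *fname)
{
  for (size_t i = 0; fname[i] != '\0'; i++)
    {
      /* "pr" must start the name or follow a non-alphanumeric char.  */
      if (i > 0 && isalnum ((unsigned char) fname[i - 1]))
        continue;
      if (tolower ((unsigned char) fname[i]) != 'p'
          || tolower ((unsigned char) fname[i + 1]) != 'r')
        continue;
      /* Require at least four digits right after "pr".  */
      size_t ndigits = 0;
      while (isdigit ((unsigned char) fname[i + 2 + ndigits]))
        ndigits++;
      if (ndigits >= 4)
        return strtol (fname + i + 2, NULL, 10);
    }
  return -1;
}
```

As with `re.search`, a candidate that fails (too few digits, or `pr` glued to a preceding letter) does not stop the scan; later positions are still tried.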

+ parser.add_argument('-b', '--pr-numbers', action='append',
+help='Add the specified PRs (comma separated)')


Do we really want to support both the '-b 1 -b 2' and the -b '1,2' formats?
It seems quite complicated to me.


I don't have a strong opinion. I started with '-b 123,245', believing
that the syntax is fine. But then I realized that, without '-p',
specifying multiple '-b' options looks better, especially with 'PR /'
(needed for -p, as the string is then taken as is). Thus,
I ended up supporting both variants.

But I am also happy to drop the ',' support.

Changes: one quote fix, one test_mklog update.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
contrib/mklog.py: Improve PR handling

Co-authored-by: Martin Sebor 

contrib/ChangeLog:

	* mklog.py (bugzilla_url): Fetch also component.
	(pr_filename_regex): New.
	(get_pr_titles): Update PR string with correct format and component.
	(generate_changelog): Take additional PRs; extract PR from the
	filename.
	(__main__): Add -b/--pr-numbers argument.
	* test_mklog.py (EXPECTED4): Update to expect a PR for the new file.

 contrib/mklog.py  | 41 -
 contrib/test_mklog.py |  3 +++
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 1f59055e723..e49d14d0859 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -42,6 +42,7 @@ pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PPR [a-z+-]+\/[0-9]+)')
 prnum_regex = re.compile(r'PR (?P[a-z+-]+)/(?P[0-9]+)')
 dr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PDR [0-9]+)')
 dg_regex = re.compile(r'{\s+dg-(error|warning)')
+pr_filename_regex = re.compile(r'(^|[\W_])[Pp][Rr](?P\d{4,})')
 identifier_regex = re.compile(r'^([a-zA-Z0-9_#].*)')
 comment_regex = re.compile(r'^\/\*')
 struct_regex = re.compile(r'^(class|struct|union|enum)\s+'
@@ -52,7 +53,7 @@ fn_regex = re.compile(r'([a-zA-Z_][^()\s]*)\s*\([^*]')
 template_and_param_regex = re.compile(r'<[^<>]*>')
 md_def_regex = re.compile(r'\(define.*\s+"(.*)"')
 bugzilla_url = 'https://gcc.gnu.org/bugzilla/rest.cgi/bug?id=%s;' \
-   'include_fields=summary'
+   'include_fields=summary,component'
 
 function_extensions = {'.c', '.cpp', '.C', '.cc', '.h', '.inc', '.def', '.md'}
 
@@ -118,20 +119,23 @@ def sort_changelog_files(changed_file):
 
 
 def get_pr_titles(prs):
-output = ''
-for pr in prs:
+output = []
+for idx, pr in enumerate(prs):
 pr_id = pr.split('/')[-1]
 r = requests.get(bugzilla_url % pr_id)
 bugs = r.json()['bugs']
 if len(bugs) == 1:
-output += '%s - %s\n' % (pr, bugs[0]['summary'])
-print(output)
+prs[idx] = 'PR %s/%s' % (bugs[0]['component'], pr_id)
+out = '%s - %s\n' % (prs[idx], bugs[0]['summary'])
+if out not in output:
+output.append(out)
 if output:
-output += '\n'
-return output
+output.append('')
+return '\n'.join(output)
 
 
-def generate_changelog(data, no_functions=False, fill_pr_titles=False):
+def generate_changelog(data, no_functions=False, fill_pr_titles=False,
+   additional_prs=None):
 changelogs = {}
 changelog_list = []
 prs = []
@@ -139,6 +143,8 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 diff = PatchSet(data)
 global firstpr
 
+if additional_prs:
+prs = [pr for pr in additional_prs if pr not in prs]
 for file in diff:
 # skip files that can't be parsed
 if file.path == '/dev/null':
@@ -154,21 +160,33 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 # Only search first ten lines as later lines may
 # contains commented code which a note that it
 # has not been tested due to a certain PR or DR.
+this_file_prs = []
 for line in list(file)[0][0:10]:
 m = pr_regex.search(line.value)
 if m:
 pr = m.group('pr')
 if pr not in prs:
 prs.append(pr)
+

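The filename-based PR extraction added to generate_changelog above can be tried on its own. This is a standalone sketch, not the patch itself. It assumes that the named group stripped from pr_filename_regex in this archive is `pr`, since the patch reads it back with m.group('pr'). pr_from_filename is a hypothetical helper name, and every path except pr11877.c is made up.

```python
import os
import re

# Regex from the patch, with the stripped named group assumed to be 'pr'.
pr_filename_regex = re.compile(r'(^|[\W_])[Pp][Rr](?P<pr>\d{4,})')


def pr_from_filename(path):
    """Return 'PR NNNN' if the file's base name embeds a PR number, else None."""
    fname = os.path.splitext(os.path.basename(path))[0]
    m = pr_filename_regex.search(fname)
    if m:
        return 'PR ' + m.group('pr')
    return None


for path in ['gcc/testsuite/gcc.target/i386/pr11877.c',  # -> PR 11877
             'gcc/testsuite/gcc.dg/foo_PR12345-2.c',     # -> PR 12345
             'gcc/testsuite/gcc.dg/spr12345.c',          # -> None ('s' before 'pr')
             'gcc/testsuite/gcc.dg/pr12.c']:             # -> None (fewer than 4 digits)
    print(path, '->', pr_from_filename(path))
```

The `(^|[\W_])` prefix requires "pr" to start the name or to follow a non-word character or underscore. The `\d{4,}` part ignores short numbers that are unlikely to be Bugzilla IDs.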
Re: [x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Jun 21, 2021 at 09:18:28AM +0200, Uros Bizjak via Gcc-patches wrote:
> > 2021-06-20  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/11877
> > * config/i386/i386.md: New define_peephole2s to shrink writing
> > 1, 2 or 4 consecutive zeros to memory when optimizing for size.
> >
> > gcc/testsuite/ChangeLog
> > PR target/11877
> > * gcc.target/i386/pr11877.c: New test case.
> 
> OK.

It unfortunately doesn't extend well to larger memory clearing.
Consider e.g.
void
foo (int *p)
{
  p[0] = 0;
  p[7] = 0;
  p[23] = 0;
  p[41] = 0;
  p[48] = 0;
  p[59] = 0;
  p[69] = 0;
  p[78] = 0;
  p[83] = 0;
  p[89] = 0;
  p[98] = 0;
  p[121] = 0;
  p[132] = 0;
  p[143] = 0;
  p[154] = 0;
}
where with the patch we emit:
xorl	%eax, %eax
xorl	%edx, %edx
xorl	%ecx, %ecx
xorl	%esi, %esi
xorl	%r8d, %r8d
movl	%eax, (%rdi)
movl	%eax, 28(%rdi)
movl	%eax, 92(%rdi)
movl	%eax, 164(%rdi)
movl	%edx, 192(%rdi)
movl	%edx, 236(%rdi)
movl	%edx, 276(%rdi)
movl	%edx, 312(%rdi)
movl	%ecx, 332(%rdi)
movl	%ecx, 356(%rdi)
movl	%ecx, 392(%rdi)
movl	%ecx, 484(%rdi)
movl	%esi, 528(%rdi)
movl	%esi, 572(%rdi)
movl	%r8d, 616(%rdi)
Here is an incremental (so far untested) patch that emits:
xorl	%eax, %eax
movl	%eax, (%rdi)
movl	%eax, 28(%rdi)
movl	%eax, 92(%rdi)
movl	%eax, 164(%rdi)
movl	%eax, 192(%rdi)
movl	%eax, 236(%rdi)
movl	%eax, 276(%rdi)
movl	%eax, 312(%rdi)
movl	%eax, 332(%rdi)
movl	%eax, 356(%rdi)
movl	%eax, 392(%rdi)
movl	%eax, 484(%rdi)
movl	%eax, 528(%rdi)
movl	%eax, 572(%rdi)
movl	%eax, 616(%rdi)
instead:

2021-06-21  Jakub Jelinek  

PR target/11877
* config/i386/i386-protos.h (ix86_zero_stores_peep2_p): Declare.
* config/i386/i386.c (ix86_zero_stores_peep2_p): New function.
* config/i386/i386.md (peephole2s for 1/2/4 stores of const0_rtx):
Remove "" from match_operand.  Add peephole2s for 1/2/4 stores of
const0_rtx following previous successful peep2s.

--- gcc/config/i386/i386-protos.h.jj2021-06-07 09:24:57.696690116 +0200
+++ gcc/config/i386/i386-protos.h   2021-06-21 10:21:05.428887980 +0200
@@ -111,6 +111,7 @@ extern bool ix86_use_lea_for_mov (rtx_in
 extern bool ix86_avoid_lea_for_addr (rtx_insn *, rtx[]);
 extern void ix86_split_lea_for_addr (rtx_insn *, rtx[], machine_mode);
 extern bool ix86_lea_for_add_ok (rtx_insn *, rtx[]);
+extern bool ix86_zero_stores_peep2_p (rtx_insn *, rtx);
 extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
 extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
 extern bool ix86_agi_dependent (rtx_insn *set_insn, rtx_insn *use_insn);
--- gcc/config/i386/i386.c.jj   2021-06-21 09:39:21.622487840 +0200
+++ gcc/config/i386/i386.c  2021-06-21 10:21:12.389794740 +0200
@@ -15186,6 +15186,33 @@ ix86_lea_for_add_ok (rtx_insn *insn, rtx
   return ix86_lea_outperforms (insn, regno0, regno1, regno2, 0, false);
 }
 
+/* Return true if insns before FIRST_INSN (which is of the form
+   (set (memory) (zero_operand)) are all also either in the
+   same form, or (set (zero_operand) (const_int 0)).  */
+
+bool
+ix86_zero_stores_peep2_p (rtx_insn *first_insn, rtx zero_operand)
+{
+  rtx_insn *insn = first_insn;
+  for (int count = 0; count < 512; count++)
+{
+  insn = prev_nonnote_nondebug_insn_bb (insn);
+  if (!insn) 
+   return false;
+  rtx set = single_set (insn);
+  if (!set)
+   return false;
+  if (SET_SRC (set) == const0_rtx
+ && rtx_equal_p (SET_DEST (set), zero_operand))
+   return true;
+  if (set != PATTERN (insn)
+ || !rtx_equal_p (SET_SRC (set), zero_operand)
+ || !memory_operand (SET_DEST (set), VOIDmode))
+   return false;
+}
+  return false;
+}
+
 /* Return true if destination reg of SET_BODY is shift count of
USE_BODY.  */
 
--- gcc/config/i386/i386.md.jj  2021-06-21 09:42:04.086303699 +0200
+++ gcc/config/i386/i386.md 2021-06-21 10:21:31.932532964 +0200
@@ -19360,10 +19360,10 @@ (define_peephole2
 ;; When optimizing for size, zeroing memory should use a register.
 (define_peephole2
   [(match_scratch:SWI48 0 "r")
-   (set (match_operand:SWI48 1 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 2 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 3 "memory_operand" "") (const_int 0))
-   (set (match_operand:SWI48 4 "memory_operand" "") (const_int 0))]
+   (set (match_operand:SWI48 1 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 2 "memory_operand") (const_int 0))
+   (set (match_operand:SWI48 3 "memory_operand") (const_int 0))
+   

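The backward scan in ix86_zero_stores_peep2_p can be shown with a toy Python model. This is not GCC code. Insns are (dest, src) pairs, names that start with 'mem' stand for memory operands, and the integer 0 plays the role of const0_rtx. The model ignores the single_set/PATTERN distinction and all other RTL details, and zero_stores_peep2_p and BLOCK are hypothetical names.

```python
# Toy model only: each insn is a (dest, src) pair standing for one SET.
# Destinations whose names start with 'mem' are memory operands, other
# names are registers, and the integer 0 stands for (const_int 0).

def zero_stores_peep2_p(insns, first, zero_reg, limit=512):
    """Mimic the backward scan of ix86_zero_stores_peep2_p.

    Walk back from insns[first] over at most `limit` insns.  Succeed on
    reaching `zero_reg = 0`; fail on anything other than a store of
    `zero_reg` to memory, or on running off the start of the block.
    """
    i = first
    for _ in range(limit):
        i -= 1
        if i < 0:
            return False                 # no earlier insn in the block
        dest, src = insns[i]
        if src == 0 and dest == zero_reg:
            return True                  # the insn that zeroed zero_reg
        if src != zero_reg or not dest.startswith('mem'):
            return False                 # an unrelated insn intervenes
    return False                         # scan limit reached


# The first group from the example above, followed by one more zero store.
BLOCK = [
    ('eax', 0),         # xorl %eax, %eax
    ('mem0', 'eax'),    # movl %eax, (%rdi)
    ('mem28', 'eax'),   # movl %eax, 28(%rdi)
    ('mem92', 'eax'),   # movl %eax, 92(%rdi)
    ('mem164', 'eax'),  # movl %eax, 164(%rdi)
    ('mem192', 0),      # next store of zero to memory
]
print(zero_stores_peep2_p(BLOCK, 5, 'eax'))  # True: %eax can be reused
```

Because the scan stops at the first unrelated insn, the register is reused only across a contiguous run of zero stores that follows the original zeroing insn.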
Re: [ARM] PR66791: Replace calls to builtin in vmul_n (a, b) intrinsics with __a * __b

2021-06-21 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 14 Jun 2021 at 13:27, Prathamesh Kulkarni
 wrote:
>
> On Mon, 7 Jun 2021 at 12:45, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 31 May 2021 at 16:01, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 31 May 2021 at 15:22, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Wed, 26 May 2021 at 14:07, Marc Glisse  wrote:
> > > > >
> > > > > On Wed, 26 May 2021, Prathamesh Kulkarni via Gcc-patches wrote:
> > > > >
> > > > > > The attached patch replaces calls to builtins in vmul_n* (a, b) with
> > > > > > __a * __b.
> > > > >
> > > > > I am not familiar with neon, but are __a and __b unsigned here?
> > > > > Otherwise, is vmul_n already undefined in case of overflow?
> > > > Hi Marc,
> > > > Sorry for the late reply. For vmul_n_s*, I think they are signed
> > > > (intx_t).
> > > Oops, I meant intx_t.
> > > > I am not sure how the intrinsic should behave in case of signed
> > > > overflow, but I am assuming it's OK since vmul_s* intrinsics leave it
> > > > undefined too.
> > > > Kyrill, is it OK to leave vmul_s* and vmul_n_s* undefined in case of
> > > > overflow?
> > The attached version fixes one fallout I missed earlier.
> > Is this OK to commit ?
> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html
ping * 2 https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572037.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > --
> > > > > Marc Glisse


Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-06-21 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 14 Jun 2021 at 13:31, Prathamesh Kulkarni
 wrote:
>
> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>  wrote:
> >
> > On Fri, 4 Jun 2021 at 13:15, Christophe Lyon  
> > wrote:
> > >
> > > On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > > As mentioned in PR, for the following test-case:
> > > >
> > > > #include <arm_neon.h>
> > > >
> > > > bfloat16x4_t f1 (bfloat16_t a)
> > > > {
> > > >   return vdup_n_bf16 (a);
> > > > }
> > > >
> > > > bfloat16x4_t f2 (bfloat16_t a)
> > > > {
> > > >   return (bfloat16x4_t) {a, a, a, a};
> > > > }
> > > >
> > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-abi=softfp
> > > > -march=armv8.2-a+bf16+fp16 results in f2 not being vectorized:
> > > >
> > > > f1:
> > > > vdup.16 d16, r0
> > > > vmov	r0, r1, d16  @ v4bf
> > > > bx  lr
> > > >
> > > > f2:
> > > > mov r3, r0  @ __bf16
> > > > adr r1, .L4
> > > > ldrd	r0, [r1]
> > > > mov r2, r3  @ __bf16
> > > > mov ip, r3  @ __bf16
> > > > bfi r1, r2, #0, #16
> > > > bfi r0, ip, #0, #16
> > > > bfi r1, r3, #16, #16
> > > > bfi r0, r2, #16, #16
> > > > bx  lr
> > > >
> > > > This seems to happen because the vec_init pattern in neon.md uses the
> > > > VDQ mode iterator, which doesn't include V4BF. In the attached patch, I
> > > > changed the mode iterator to VDQX, which seems to work for the
> > > > test-case, and the compiler now generates:
> > > >
> > > > f2:
> > > > vdup.16 d16, r0
> > > > vmov	r0, r1, d16  @ v4bf
> > > > bx  lr
> > > >
> > > > However, the pattern is also gated on TARGET_HAVE_MVE, and I am not
> > > > sure whether VDQ or VDQX is the correct mode iterator for MVE, since
> > > > MVE has only 128-bit vectors?
> > > >
> > >
> > > I think patterns common to both Neon and MVE should be moved to
> > > vec-common.md; I don't know why such patterns were left in neon.md.
> > Since we end up calling neon_expand_vector_init for both NEON and MVE,
> > I am not sure whether we should separate the pattern.
> > Would it make sense to FAIL if the mode size isn't 16 bytes for MVE, as
> > in the attached patch, so that neon_expand_vector_init is called only
> > for 128-bit vectors?
> > Although hard-coding 16 in the pattern doesn't seem a good idea to me either.
> ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572342.html
> (attaching patch as text).
ping https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572648.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > That being said, I suggest you look at other similar patterns in
> > > vec-common.md, most of which are gated on
> > > ARM_HAVE__ARITH
> > > and possibly beware of issues with iwmmxt :-)
> > >
> > > Christophe
> > >
> > > > Thanks,
> > > > Prathamesh


Re: [ARM] PR97906 - Missed lowering abs(a) >= abs(b) to vacge

2021-06-21 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 16 Jun 2021 at 15:49, Prathamesh Kulkarni
 wrote:
>
> On Mon, 14 Jun 2021 at 16:15, Kyrylo Tkachov  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Prathamesh Kulkarni 
> > > Sent: 14 June 2021 08:58
> > > To: gcc Patches ; Kyrylo Tkachov
> > > 
> > > Subject: Re: [ARM] PR97906 - Missed lowering abs(a) >= abs(b) to vacge
> > >
> > > On Mon, 7 Jun 2021 at 12:46, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Tue, 1 Jun 2021 at 16:03, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > > As mentioned in PR, for following test-case:
> > > > >
> > > > > #include <arm_neon.h>
> > > > >
> > > > > uint32x2_t f1(float32x2_t a, float32x2_t b)
> > > > > {
> > > > >   return vabs_f32 (a) >= vabs_f32 (b);
> > > > > }
> > > > >
> > > > > uint32x2_t f2(float32x2_t a, float32x2_t b)
> > > > > {
> > > > >   return (uint32x2_t) __builtin_neon_vcagev2sf (a, b);
> > > > > }
> > > > >
> > > > > We generate vacge for f2, but with -ffast-math we generate the following for f1:
> > > > > f1:
> > > > > vabs.f32	d1, d1
> > > > > vabs.f32	d0, d0
> > > > > vcge.f32	d0, d0, d1
> > > > > bx  lr
> > > > >
> > > > > This happens because the middle-end inverts the comparison to b <= a,
> > > > > .optimized dump:
> > > > >  _8 = __builtin_neon_vabsv2sf (a_4(D));
> > > > >   _7 = __builtin_neon_vabsv2sf (b_5(D));
> > > > >   _1 = _7 <= _8;
> > > > >   _2 = VIEW_CONVERT_EXPR(_1);
> > > > >   _6 = VIEW_CONVERT_EXPR(_2);
> > > > >   return _6;
> > > > >
> > > > > and combine fails to match the following pattern:
> > > > > (set (reg:V2SI 121)
> > > > > (neg:V2SI (le:V2SI (abs:V2SF (reg:V2SF 123))
> > > > > (abs:V2SF (reg:V2SF 122)
> > > > >
> > > > > because the neon_vca pattern uses the GTGE code iterator.
> > > > > The attached patch adjusts the neon_vca patterns to use GLTE instead,
> > > > > similar to neon_vca_fp16insn, and removes the NEON_VACMP iterator.
> > > > > Code-gen with patch:
> > > > > f1:
> > > > > vacle.f32   d0, d1, d0
> > > > > bx  lr
> > > > >
> > > > > Bootstrapped + tested on arm-linux-gnueabihf and cross-tested on arm*-*-*.
> > > > > OK to commit ?
> >
> > Is that inversion guaranteed to happen (is it a canonicalization rule)?
> I think it follows the following rule for canonicalization from
> tree_swap_operands_p:
>   /* It is preferable to swap two SSA_NAME to ensure a canonical form
>  for commutative and comparison operators.  Ensuring a canonical
>  form allows the optimizers to find additional redundancies without
>  having to explicitly check for both orderings.  */
>   if (TREE_CODE (arg0) == SSA_NAME
>   && TREE_CODE (arg1) == SSA_NAME
>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
> return 1;
>
> For the above test-case, it's ccp1 that inverts the comparison.
> The input to ccp1 pass is:
>   _12 = __builtin_neon_vabsv2sf (a_6(D));
>   _14 = _12;
>   _1 = _14;
>   _11 = __builtin_neon_vabsv2sf (b_8(D));
>   _16 = _11;
>   _2 = _16;
>   _3 = _1 >= _2;
>   _4 = VEC_COND_EXPR <_3, { -1, -1 }, { 0, 0 }>;
>   _10 = VIEW_CONVERT_EXPR(_4);
>   return _10;
>
> _3 = _1 >= _2 is folded into:
> _3 = _12 >= _11
>
> Since _12 has a higher SSA version than _11, it is canonicalized to:
> _3 = _11 <= _12.
>
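The swap described above can be reproduced with a toy Python model of the quoted SSA_NAME rule. This is not GCC code: canonicalize and the tuple encoding are hypothetical, and only the case where both operands are SSA_NAMEs is covered. Swapping the operands also mirrors the comparison code, which is why _12 >= _11 becomes _11 <= _12 and the pattern has to match LE rather than GE.

```python
# Swapping the operands of a comparison requires mirroring its code.
SWAPPED = {'<': '>', '<=': '>=', '>': '<', '>=': '<=', '==': '==', '!=': '!='}


def canonicalize(cmp):
    """Toy version of the quoted SSA_NAME rule for two SSA_NAME operands.

    cmp is (code, lhs_version, rhs_version); ('>=', 12, 11) means _12 >= _11.
    """
    code, lhs, rhs = cmp
    if lhs > rhs:                      # higher SSA version first -> swap
        return (SWAPPED[code], rhs, lhs)
    return cmp


print(canonicalize(('>=', 12, 11)))    # ('<=', 11, 12), i.e. _11 <= _12
```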
Hi Kyrill,
Is it OK to push given the above canonicalization ?

Thanks,
Prathamesh
> Thanks,
> Prathamesh
> > If so, ok.
> > Thanks,
> > Kyrill
> >
> >
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Prathamesh


RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-06-21 Thread Tamar Christina via Gcc-patches
Ping

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Monday, June 14, 2021 1:06 PM
> To: Richard Sandiford 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Biener
> 
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Hi Richard,
> 
> I've attached a new version of the patch with the changes.
> I have also added 7 new tests in the testsuite to check the cases you
> mentioned.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * optabs.def (usdot_prod_optab): New.
>   * doc/md.texi: Document it and clarify other dot prod optabs.
>   * optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
>   * optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
>   * optabs.c (expand_widen_pattern_expr): Likewise.
>   * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>   * tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
>   * tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
>   optab subtype.
>   (vect_widened_op_tree): Optionally ignore
>   mismatch types.
>   (vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 00caf3844ccf8ea289d581839766502d51b9e8d7..1356afb7f903f17c198103562b5cd145ecb9966f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
>
>  @cindex @code{sdot_prod@var{m}} instruction pattern
>  @item @samp{sdot_prod@var{m}}
> +
> +Compute the sum of the products of two signed elements.
> +Operand 1 and operand 2 are of the same mode. Their product, which is
> +of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the
> +following signs
> +
> +@smallexample
> +sdot ==
> +   res = sign-ext (a) * sign-ext (b) + c @dots{} @end smallexample
> +
>  @cindex @code{udot_prod@var{m}} instruction pattern
> -@itemx @samp{udot_prod@var{m}}
> -Compute the sum of the products of two signed/unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> -wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
> -wider than the mode of the product. The result is placed in operand 0, which
> -is of the same mode as operand 3.
> +@item @samp{udot_prod@var{m}}
> +
> +Compute the sum of the products of two unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their product, which is
> +of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the
> +following signs
> +
> +@smallexample
> +udot ==
> +   res = zero-ext (a) * zero-ext (b) + c @dots{} @end smallexample
> +
> +
> +
> +@cindex @code{usdot_prod@var{m}} instruction pattern
> +@item @samp{usdot_prod@var{m}}
> +Compute the sum of the products of elements of different signs.
> +Operand 1 must be unsigned and operand 2 signed. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following signs
> +
> +@smallexample
> +usdot ==
> +   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
> +@dots{}
> +@end smallexample
> 
>  @cindex @code{ssad@var{m}} instruction pattern
>  @item @samp{ssad@var{m}}
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b314830e6b564b37abb 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -29,7 +29,8 @@ enum optab_subtype
>  {
>optab_default,
>optab_scalar,
> -  optab_vector
> +  optab_vector,
> +  optab_vector_mixed_sign
>  };
> 
>  /* Return the optab used for computing the given operation on the type given by
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994bc5311e9c010bb 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code, const_tree type,
>        return TYPE_UNSIGNED (type) ? usum_widen_optab : ssum_widen_optab;
> 
>  case DOT_PROD_EXPR:
> -  return TYPE_UNSIGNED (type) ? 

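The three md.texi entries in the dot-product patch differ only in how the narrow elements are widened before they are multiplied. Below is a scalar Python model of that description, not the optab itself. It uses Python's unbounded integers and a single accumulator in place of fixed-width vector lanes, and for usdot it follows the prose (operand 1 unsigned, operand 2 signed). ext, dot_prod, A and B are hypothetical names.

```python
def ext(x, bits, signed):
    """Widen the low `bits` bits of x by sign- or zero-extension."""
    x &= (1 << bits) - 1
    if signed and x >= 1 << (bits - 1):
        x -= 1 << bits
    return x


def dot_prod(a, b, c, bits, a_signed, b_signed):
    """c + sum(a[i] * b[i]), each element widened before multiplying."""
    return c + sum(ext(x, bits, a_signed) * ext(y, bits, b_signed)
                   for x, y in zip(a, b))


A = [0xFF, 0x02]   # two 8-bit elements of operand 1
B = [0xFF, 0x03]   # two 8-bit elements of operand 2
print(dot_prod(A, B, 0, 8, True, True))    # sdot:  (-1)*(-1) + 2*3 = 7
print(dot_prod(A, B, 0, 8, False, False))  # udot:  255*255 + 2*3 = 65031
print(dot_prod(A, B, 0, 8, False, True))   # usdot: 255*(-1) + 2*3 = -249
```

The same bit patterns produce three different results, and this is why a mixed-sign multiply cannot be mapped onto sdot_prod or udot_prod.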
Re: [Patch, v2] contrib/mklog.py: Improve PR handling (was: Re: git gcc-commit-mklog doesn't extract PR number to ChangeLog)

2021-06-21 Thread Martin Liška

Hi.

I see the following warnings:

$ pytest test_mklog.py
FAILED test_mklog.py::TestMklog::test_sorting - AssertionError: assert '\n\tPR 50209...New test.\n\n' == 'gcc/ChangeLo...New test.\n\n'

$ flake8 mklog.py
mklog.py:187:23: Q000 Remove bad quotes

and ...


contrib/mklog.py: Improve PR handling

Co-authored-by: Martin Sebor 

contrib/ChangeLog:

* mklog.py (bugzilla_url): Fetch also component.
(pr_filename_regex): New.
(get_pr_titles): Update PR string with correct format and component.
(generate_changelog): Take additional PRs; extract PR from the
filename.
(__main__): Add -b/--pr-numbers argument.

 contrib/mklog.py | 41 -
 1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 1f59055e723..bba6c1a0e1a 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -42,6 +42,7 @@ pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?P<pr>PR [a-z+-]+\/[0-9]+)')
 prnum_regex = re.compile(r'PR (?P[a-z+-]+)/(?P[0-9]+)')
 dr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?P<dr>DR [0-9]+)')
 dg_regex = re.compile(r'{\s+dg-(error|warning)')
+pr_filename_regex = re.compile(r'(^|[\W_])[Pp][Rr](?P<pr>\d{4,})')
 identifier_regex = re.compile(r'^([a-zA-Z0-9_#].*)')
 comment_regex = re.compile(r'^\/\*')
 struct_regex = re.compile(r'^(class|struct|union|enum)\s+'
@@ -52,7 +53,7 @@ fn_regex = re.compile(r'([a-zA-Z_][^()\s]*)\s*\([^*]')
 template_and_param_regex = re.compile(r'<[^<>]*>')
 md_def_regex = re.compile(r'\(define.*\s+"(.*)"')
 bugzilla_url = 'https://gcc.gnu.org/bugzilla/rest.cgi/bug?id=%s;' \
-   'include_fields=summary'
+   'include_fields=summary,component'
 
 function_extensions = {'.c', '.cpp', '.C', '.cc', '.h', '.inc', '.def', '.md'}
 
@@ -118,20 +119,23 @@ def sort_changelog_files(changed_file):
 
 
 def get_pr_titles(prs):
-output = ''
-for pr in prs:
+output = []
+for idx, pr in enumerate(prs):
 pr_id = pr.split('/')[-1]
 r = requests.get(bugzilla_url % pr_id)
 bugs = r.json()['bugs']
 if len(bugs) == 1:
-output += '%s - %s\n' % (pr, bugs[0]['summary'])
-print(output)
+prs[idx] = 'PR %s/%s' % (bugs[0]['component'], pr_id)
+out = '%s - %s\n' % (prs[idx], bugs[0]['summary'])
+if out not in output:
+output.append(out)
 if output:
-output += '\n'
-return output
+output.append('')
+return '\n'.join(output)
 
 
-def generate_changelog(data, no_functions=False, fill_pr_titles=False):
+def generate_changelog(data, no_functions=False, fill_pr_titles=False,
+   additional_prs=None):
 changelogs = {}
 changelog_list = []
 prs = []
@@ -139,6 +143,8 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 diff = PatchSet(data)
 global firstpr
 
+if additional_prs:
+prs = [pr for pr in additional_prs if pr not in prs]
 for file in diff:
 # skip files that can't be parsed
 if file.path == '/dev/null':
@@ -154,21 +160,33 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 # Only search first ten lines as later lines may
 # contains commented code which a note that it
 # has not been tested due to a certain PR or DR.
+this_file_prs = []
 for line in list(file)[0][0:10]:
 m = pr_regex.search(line.value)
 if m:
 pr = m.group('pr')
 if pr not in prs:
 prs.append(pr)
+this_file_prs.append(pr.split('/')[-1])
 else:
 m = dr_regex.search(line.value)
 if m:
 dr = m.group('dr')
 if dr not in prs:
 prs.append(dr)
+this_file_prs.append(dr.split('/')[-1])
 elif dg_regex.search(line.value):
 # Found dg-warning/dg-error line
 break
+# PR number in the file name
+fname = os.path.basename(file.path)


This is dead code.


+fname = os.path.splitext(fname)[0]
+m = pr_filename_regex.search(fname)
+if m:
+pr = m.group('pr')
+pr2 = "PR " + pr


Bad quotes here.


+if pr not in this_file_prs and pr2 not in prs:
+prs.append(pr2)
 
 if prs:
 firstpr = prs[0]
@@ -286,6 +304,8 @@ if __name__ == '__main__':
 parser = argparse.ArgumentParser(description=help_message)
 parser.add_argument('input', nargs='?',
 help='Patch file (or missing, read standard input)')
+parser.add_argument('-b', '--pr-numbers', action='append',
+help='Add the 

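The patched get_pr_titles rewrites each PR string into 'PR component/number' form and drops duplicate title lines. The sketch below reproduces that logic, with the Bugzilla REST request replaced by a lookup function so that it runs offline. The archive stripped the indentation, so the nesting here is reconstructed from the diff. FAKE_BUGS is made-up data, and the lookup parameter is an assumption of this sketch.

```python
def get_pr_titles(prs, lookup):
    """Logic of the patched get_pr_titles, with the Bugzilla call replaced
    by lookup(pr_id), which returns a dict with 'component' and 'summary'
    keys, or None if the bug is not found."""
    output = []
    for idx, pr in enumerate(prs):
        pr_id = pr.split('/')[-1]
        bug = lookup(pr_id)
        if bug is not None:
            # Rewrite the caller's PR string with the component Bugzilla reports.
            prs[idx] = 'PR %s/%s' % (bug['component'], pr_id)
            out = '%s - %s\n' % (prs[idx], bug['summary'])
            if out not in output:      # drop duplicate title lines
                output.append(out)
    if output:
        output.append('')
    return '\n'.join(output)


# Made-up Bugzilla data, for illustration only.
FAKE_BUGS = {'12345': {'component': 'c++', 'summary': 'example summary'}}

prs = ['PR c/12345', 'PR c++/12345']
result = get_pr_titles(prs, FAKE_BUGS.get)
print(repr(result))   # 'PR c++/12345 - example summary\n\n'
print(prs)            # ['PR c++/12345', 'PR c++/12345']
```

The function modifies prs in place, so the corrected component is also visible to the caller, which later takes firstpr from the same list.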
Re: [PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-21 Thread Hongtao Liu via Gcc-patches
On Mon, Jun 21, 2021 at 3:28 PM Uros Bizjak via Gcc-patches
 wrote:
>
> On Mon, Jun 21, 2021 at 6:56 AM liuhongt  wrote:
> >
> > AVX512 supports bitwise operations on mask registers, but the
> > throughput of those instructions is much lower than that of the
> > corresponding GPR versions, so we additionally disparage the mask
> > register alternative slightly for bitwise operations in LRA.
>
> This was the reason for UNSPEC tagged instructions with mask
> registers, used mainly for builtins.
>
> Also, care should be taken if we want mask registers to be used under
> GPR pressure, or it is better to spill GPR registers. In the past,
> DImode values caused a lot of problems with MMX registers on x86-64,
> but we were able to hand-tune the allocator in the way you propose.
>
> Let's try the proposed approach to see what happens.
>
> > Also, when the allocno cost of GENERAL_REGS is the same as that of
> > MASK_REGS, allocate MASK_REGS first, since they have already been disparaged.
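The tie-break described above can be shown with a toy model. This is not IRA/LRA code. Among the cheapest candidates, the first register in the allocation order wins. Moving the mask registers ahead of the GPRs in x86_order_regs_for_local_alloc therefore lets them win ties, while a GPR still wins whenever it is cheaper. The register names, the costs and pick_hard_reg are hypothetical.

```python
# Hypothetical register names; the real order is built by
# x86_order_regs_for_local_alloc.
GPRS = ['ax', 'dx', 'cx', 'si', 'di', 'r8', 'r9']
MASKS = ['k%d' % i for i in range(8)]


def pick_hard_reg(costs, alloc_order):
    """Return the cheapest register in costs, breaking ties by alloc_order.

    costs maps register name -> cost (lower is better).
    """
    candidates = [r for r in alloc_order if r in costs]
    best = min(costs[r] for r in candidates)
    return next(r for r in candidates if costs[r] == best)


COSTS_EQUAL = {r: 4 for r in GPRS + MASKS}
print(pick_hard_reg(COSTS_EQUAL, MASKS + GPRS))              # k0 (new order)
print(pick_hard_reg(COSTS_EQUAL, GPRS + MASKS))              # ax (old order)
print(pick_hard_reg(dict(COSTS_EQUAL, ax=2), MASKS + GPRS))  # ax: cheaper GPR
```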
> >
> > gcc/ChangeLog:
> >
> > PR target/101142
> > * config/i386/i386.md: (*anddi_1): Disparage slightly the mask
> > register alternative.
> > (*and_1): Ditto.
> > (*andqi_1): Ditto.
> > (*andn_1): Ditto.
> > (*_1): Ditto.
> > (*qi_1): Ditto.
> > (*one_cmpl2_1): Ditto.
> > (*one_cmplsi2_1_zext): Ditto.
> > (*one_cmplqi2_1): Ditto.
> > * config/i386/i386.c (x86_order_regs_for_local_alloc): Change
> > the order of mask registers to be before general registers.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/101142
> > * gcc.target/i386/spill_to_mask-1.c: Adjust testcase.
> > * gcc.target/i386/spill_to_mask-2.c: Adjust testcase.
> > * gcc.target/i386/spill_to_mask-3.c: Adjust testcase.
> > * gcc.target/i386/spill_to_mask-4.c: Adjust testcase.
>
> OK with a comment addition, see inline.
>
> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386.c|  8 +-
> >  gcc/config/i386/i386.md   | 20 ++---
> >  .../gcc.target/i386/spill_to_mask-1.c | 89 +--
> >  .../gcc.target/i386/spill_to_mask-2.c | 11 ++-
> >  .../gcc.target/i386/spill_to_mask-3.c | 11 ++-
> >  .../gcc.target/i386/spill_to_mask-4.c | 11 ++-
> >  6 files changed, 91 insertions(+), 59 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index a61255857ff..a651853ca3b 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -20463,6 +20463,10 @@ x86_order_regs_for_local_alloc (void)
> > int pos = 0;
> > int i;
> >
> > +   /* Mask register.  */
> > +   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> > + reg_alloc_order [pos++] = i;
>
> Please add a comment why mask registers should come first.
Thanks for the review; this is the patch I'm checking in.
>
> > /* First allocate the local general purpose registers.  */
> > for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> >   if (GENERAL_REGNO_P (i) && call_used_or_fixed_reg_p (i))
> > @@ -20489,10 +20493,6 @@ x86_order_regs_for_local_alloc (void)
> > for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
> >   reg_alloc_order [pos++] = i;
> >
> > -   /* Mask register.  */
> > -   for (i = FIRST_MASK_REG; i <= LAST_MASK_REG; i++)
> > - reg_alloc_order [pos++] = i;
> > -
> > /* x87 registers.  */
> > if (TARGET_SSE_MATH)
> >   for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++)
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 6e4abf32e7c..3eef56b27d7 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -9138,7 +9138,7 @@ (define_insn_and_split "*anddi3_doubleword"
> >  })
> >
> >  (define_insn "*anddi_1"
> > -  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,k")
> > +  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,?k")
> > (and:DI
> >  (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm,k")
> >  (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,m,L,k")))
> > @@ -9226,7 +9226,7 @@ (define_insn "*andsi_1_zext"
> > (set_attr "mode" "SI")])
> >
> >  (define_insn "*and_1"
> > -  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,Ya,k")
> > +  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,Ya,?k")
> > (and:SWI24 (match_operand:SWI24 1 "nonimmediate_operand" 
> > "%0,0,qm,k")
> >(match_operand:SWI24 2 "" 
> > "r,m,L,k")))
> > (clobber (reg:CC FLAGS_REG))]
> > @@ -9255,7 +9255,7 @@ (define_insn "*and_1"
> > (set_attr "mode" ",,SI,")])
> >
> >  (define_insn "*andqi_1"
> > -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,k")
> > +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k")
> > (and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
> > (match_operand:QI 2 
