Re: [PATCH] c-family: Improve MEM_REF printing for diagnostics [PR98597]

2021-01-13 Thread Richard Biener
On Wed, 13 Jan 2021, Jakub Jelinek wrote:

> Hi!
> 
> The following patch doesn't actually fix the print_mem_ref bugs, I've kept
> it for now as broken as it was, but at least improves the cases where
> we can unambiguously map back MEM[ + off] into some particular
> reference (e.g. something.foo[1].bar etc.).
> In the distant past I think we were folding such MEM_REFs back to
> COMPONENT_REFs and ARRAY_REFs, but we've stopped doing that.

Yeah, because it has different semantics - *((int *)t + 3)
accesses an int object while t.u.b accesses a 't' object from the TBAA
perspective.

>  But for
> diagnostics that is what the user actually wants to see IMHO.
> So on the attached testcase, instead of printing what is in left column
> it prints what is in right column:
> ((int*)t) + 3    t.u.b
> ((int*)t) + 6    t.u.e.i
> ((int*)t) + 8    t.v
> s + 1            s[1]

so while that's "nice" in general, for TBAA diagnostics it might actually
be misleading.

I wonder whether we absolutely need to print a C expression here.
We could print, instead of *((int *)t + 3), "access to a memory
object of type 'int' at offset 12 bytes from 't'", thus explaining it
in plain English.

That said, *((int *)t + 3) is exactly what the access is,
semantically.  There's the case of a mismatch of the access type
and the TBAA type which we cannot write down in C terms but maybe
we want to have a builtin for this?  __builtin_access (ptr, lvalue-type,
tbaa-type)?
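
To make the two renderings concrete, here is a minimal sketch.  The struct
layout is hypothetical (the uninit-40.c test case is not shown in this
thread), chosen only so that int offsets 3, 6 and 8 land on u.b, u.e.i and
v.  describe_access is an illustrative helper, not a GCC function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical layout: there is no padding between the ints, so u.b,
   u.e.i and v sit 3, 6 and 8 ints from the start of the object.  */
struct E { int i; int j; };
struct U { int a[3]; int b; int c; int d; struct E e; };
struct T { struct U u; int v; };

/* Format the plain-English wording suggested above.  Returns what
   snprintf returns, i.e. the length the full message needs.  */
static int
describe_access (char *buf, size_t size, const char *type,
                 size_t offset, const char *base)
{
  assert (buf != NULL && type != NULL && base != NULL);
  return snprintf (buf, size,
                   "access to a memory object of type '%s' at offset "
                   "%zu bytes from '%s'", type, offset, base);
}
```

With a 4-byte int, offsetof (struct T, u.b) is the 12 bytes quoted in the
message above, which is the same location as *((int *)&t + 3).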

> Of course, print_mem_ref needs to be also fixed to avoid printing the
> nonsense it is printing right now, t is a structure type, so it can't be
> cast to int* in C and in C++ only using some user operator, and
> the result of what it printed is a pointer, while the uninitialized reads
> are int.
> 
> I was hoping Martin would fix that, but given his comment in the PR I think
> I'll fix it myself tomorrow.
> 
> Anyway, this patch is useful on its own.  Bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?

In the light of Martin's patch this is probably reasonable but still
the general direction is wrong (which is why I didn't approve Martin's
original patch).  I'm also somewhat disappointed we're breaking this
so late in the cycle.

c_fold_indirect_ref_for_warn doesn't look like it is especially
careful about error recovery issues (error_mark_node in random
places of the trees).  Maybe that never happens.

Thanks,
Richard.

> 2021-01-13  Jakub Jelinek  
> 
>   PR tree-optimization/98597
>   * c-pretty-print.c (c_fold_indirect_ref_for_warn): New function.
>   (print_mem_ref): Call it.
> 
>   * gcc.dg/uninit-40.c: New test.
> 
> --- gcc/c-family/c-pretty-print.c.jj  2021-01-13 08:02:09.425498954 +0100
> +++ gcc/c-family/c-pretty-print.c 2021-01-13 15:02:57.860329998 +0100
> @@ -1809,6 +1809,113 @@ pp_c_call_argument_list (c_pretty_printe
>pp_c_right_paren (pp);
>  }
>  
> +/* Try to fold *(type *) into op.fld.fld2[1] if possible.
> +   Only used for printing expressions.  Should punt if ambiguous
> +   (e.g. in unions).  */
> +
> +static tree
> +c_fold_indirect_ref_for_warn (location_t loc, tree type, tree op,
> +   offset_int )
> +{
> +  tree optype = TREE_TYPE (op);
> +  if (off == 0)
> +{
> +  if (lang_hooks.types_compatible_p (optype, type))
> + return op;
> +  /* *(foo *) => __real__ complexfoo */
> +  else if (TREE_CODE (optype) == COMPLEX_TYPE
> +&& lang_hooks.types_compatible_p (type, TREE_TYPE (optype)))
> + return build1_loc (loc, REALPART_EXPR, type, op);
> +}
> +  /* ((foo*))[1] => __imag__ complexfoo */
> +  else if (TREE_CODE (optype) == COMPLEX_TYPE
> +&& lang_hooks.types_compatible_p (type, TREE_TYPE (optype))
> +&& tree_to_uhwi (TYPE_SIZE_UNIT (type)) == off)
> +{
> +  off = 0;
> +  return build1_loc (loc, IMAGPART_EXPR, type, op);
> +}
> +  /* ((foo *))[x] => fooarray[x] */
> +  if (TREE_CODE (optype) == ARRAY_TYPE
> +  && TYPE_SIZE_UNIT (TREE_TYPE (optype))
> +  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (optype))) == INTEGER_CST
> +  && !integer_zerop (TYPE_SIZE_UNIT (TREE_TYPE (optype
> +{
> +  tree type_domain = TYPE_DOMAIN (optype);
> +  tree min_val = size_zero_node;
> +  if (type_domain && TYPE_MIN_VALUE (type_domain))
> + min_val = TYPE_MIN_VALUE (type_domain);
> +  offset_int el_sz = wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (optype)));
> +  offset_int idx = off / el_sz;
> +  offset_int rem = off % el_sz;
> +  if (TREE_CODE (min_val) == INTEGER_CST)
> + {
> +   tree index
> + = wide_int_to_tree (sizetype, idx + wi::to_offset (min_val));
> +   op = build4_loc (loc, ARRAY_REF, TREE_TYPE (optype), op, index,
> +NULL_TREE, NULL_TREE);
> +   off = rem;
> +   if (tree ret = c_fold_indirect_ref_for_warn (loc, type, op, off))
> + return ret;
> +   return op;
> + }
> +}
> +  /* ((foo 
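
The array case in the quoted patch splits a byte offset into an element
index and a remainder that is then folded further.  A minimal standalone
sketch of that arithmetic follows; split_array_offset is a hypothetical
name, and plain int64_t stands in for GCC's offset_int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split byte offset OFF into an array of EL_SZ-byte elements into an
   element index (biased by the array's lower bound MIN_VAL) and the
   byte offset that remains inside that element.  */
static void
split_array_offset (int64_t off, int64_t el_sz, int64_t min_val,
                    int64_t *idx, int64_t *rem)
{
  assert (el_sz > 0 && idx != NULL && rem != NULL);
  *idx = off / el_sz + min_val;
  *rem = off % el_sz;
}
```

A nonzero remainder means the access lands inside an element, which is
why the patch recurses on the ARRAY_REF with the remaining offset.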

Re: [PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-01-13 Thread Richard Biener via Gcc-patches
On Wed, Jan 13, 2021 at 11:04 PM David Malcolm via Gcc-patches
 wrote:
>
> gimple.h has this comment for gimple's uid field:
>
>   /* UID of this statement.  This is used by passes that want to
>  assign IDs to statements.  It must be assigned and used by each
>  pass.  By default it should be assumed to contain garbage.  */
>   unsigned uid;
>
> and gimple_set_uid has:
>
>Please note that this UID property is supposed to be undefined at
>pass boundaries.  This means that a given pass should not assume it
>contains any useful value when the pass starts and thus can set it
>to any value it sees fit.
>
> which suggests that any pass can use the uid field as an arbitrary
> scratch space.
>
> PR analyzer/98599 reports a case where this error occurs in LTO mode:
>   fatal error: Cgraph edge statement index out of range
> on certain inputs with -fanalyzer.
>
> The error occurs in the LTRANS phase after -fanalyzer runs in the
> WPA phase.  The analyzer pass writes to the uid fields of all stmts.
>
> The error occurs when LTRANS is streaming callgraph edges back in.
> If I'm reading things correctly, the LTO format uses stmt uids to
> associate call stmts with callgraph edges between WPA and LTRANS.
> For example, in lto-cgraph.c, lto_output_edge writes out the
> gimple_uid, and input_edge reads it back in.
>
> Hence IPA passes that touch the uids in WPA need to restore them,
> or the stream-in at LTRANS will fail.
>
> Is it intended that the LTO machinery relies on the value of the uid
> field being preserved during WPA (or, at least, needs to be saved and
> restored by passes that touch it)?

I believe this is solely at the cgraph stream-out to stream-in boundary, but
this may be a blurred area since while we materialize the whole cgraph
at once the function bodies are streamed in on demand.

Honza can probably clarify things.

Note LTO uses this exactly because of this comment to avoid allocating
extra memory for an 'index' but it could of course leave gimple_uid alone
at some extra expense (eventually paid for in generic cgraph data structures
and thus for not only the streaming time).

> On the assumption that this is the case, this patch updates the comments
> in gimple.h referring to passes being able to set uid to any value to
> note the caveat for IPA passes, and it updates the analyzer to save
> and restore the UIDs, fixing the error.
>
> Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
> OK for master?

The analyzer bits are OK, let's see how Honza can clarify the situation.

Thanks,
Richard.
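
For readers following along, the save-and-restore idea can be sketched in
plain C.  This is only an illustration under stated assumptions: struct
stmt is a stand-in for gimple, and the function names merely mirror those
in the ChangeLog below; the real code is a C++ class in supergraph.cc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a gimple statement; only the uid matters.  */
struct stmt { unsigned uid; };

/* Remembers each statement's original uid so it can be put back.  */
struct saved_uids
{
  struct stmt **stmts;
  unsigned *old_uids;
  size_t count;
  size_t capacity;
};

/* Record STMT's current uid, then overwrite it with NEW_UID.
   Returns 0 on success, -1 if memory runs out.  */
static int
make_uid_unique (struct saved_uids *s, struct stmt *stmt, unsigned new_uid)
{
  assert (s != NULL && stmt != NULL);
  if (s->count == s->capacity)
    {
      size_t cap = s->capacity ? 2 * s->capacity : 8;
      struct stmt **stmts = realloc (s->stmts, cap * sizeof *stmts);
      if (stmts == NULL)
        return -1;
      s->stmts = stmts;
      unsigned *uids = realloc (s->old_uids, cap * sizeof *uids);
      if (uids == NULL)
        return -1;
      s->old_uids = uids;
      s->capacity = cap;
    }
  s->stmts[s->count] = stmt;
  s->old_uids[s->count] = stmt->uid;
  s->count++;
  stmt->uid = new_uid;
  return 0;
}

/* Put every recorded uid back and release the bookkeeping.  Walking in
   reverse makes the earliest recorded value win if a statement was
   recorded more than once.  */
static void
restore_uids (struct saved_uids *s)
{
  assert (s != NULL);
  for (size_t i = s->count; i-- > 0; )
    s->stmts[i]->uid = s->old_uids[i];
  free (s->stmts);
  free (s->old_uids);
  s->stmts = NULL;
  s->old_uids = NULL;
  s->count = s->capacity = 0;
}
```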

> gcc/analyzer/ChangeLog:
> PR analyzer/98599
> * supergraph.cc (saved_uids::make_uid_unique): New.
> (saved_uids::restore_uids): New.
> (supergraph::supergraph): Replace assignments to stmt->uid with
> calls to m_stmt_uids.make_uid_unique.
> (supergraph::~supergraph): New.
> * supergraph.h (class saved_uids): New.
> (supergraph::~supergraph): New decl.
> (supergraph::m_stmt_uids): New field.
>
> gcc/ChangeLog:
> PR analyzer/98599
> * doc/gimple.texi: Document that UIDs must not change during IPA
> passes.
> * gimple.h (gimple::uid): Likewise.
> (gimple_set_uid): Likewise.
>
> gcc/testsuite/ChangeLog:
> PR analyzer/98599
> * gcc.dg/analyzer/pr98599-a.c: New test.
> * gcc.dg/analyzer/pr98599-b.c: New test.
> ---
>  gcc/analyzer/supergraph.cc| 53 +--
>  gcc/analyzer/supergraph.h | 15 +++
>  gcc/doc/gimple.texi   |  6 +++
>  gcc/gimple.h  | 13 +-
>  gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 
>  gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
>  6 files changed, 90 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
>  create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c
>
> diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
> index 419f6424f76..40acfbd16a8 100644
> --- a/gcc/analyzer/supergraph.cc
> +++ b/gcc/analyzer/supergraph.cc
> @@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple *stmt)
>return edge;
>  }
>
> +/* class saved_uids.
> +
> +   In order to ensure consistent results without relying on the ordering
> +   of pointer values we assign a uid to each gimple stmt, globally unique
> +   across all functions.
> +
> +   Normally, the stmt uids are a scratch space that each pass can freely
> +   assign its own values to.  However, in the case of LTO, the uids are
> +   used to associate call stmts with callgraph edges between the WPA phase
> +   (where the analyzer runs in LTO mode) and the LTRANS phase; if the
> +   analyzer changes them in the WPA phase, it leads to errors when
> +   streaming the code back in at LTRANS.
> +
> +   Hence this class has responsibility for tracking the original uids
> +   and restoring them once the pass is 

Re: [PATCH, Fortran] PR fortran/93524 - ISO_Fortran_binding signed char arrays

2021-01-13 Thread Jerry DeLisle via Gcc-patches
Looks good and I have tested it. I will commit after one more double check
for regressions with the test case in the testsuite.

I have been following Harris on this one for several days and tested the
patch before it was submitted here.

Thanks for the patch, Harris, much appreciated.

Regards,

Jerry

On 1/13/21 8:02 AM, Harris Snyder wrote:

On Wed, Jan 13, 2021 at 1:34 AM Harris Snyder  wrote:

Hi everyone,

Sorry, my previous email erroneously referred to unsigned chars / uint8_t,
when I in fact meant signed chars / int8_t. The actual patch works, but the
test case files have ‘uint’ in the file names. I’ll provide a corrected patch
tomorrow to fix the file names.

Harris

Corrected patch below.

OK for master? I don't have write access so I will need someone else
to commit this for me if possible.

Thanks,
Harris Snyder


Fixes a bug in ISO_Fortran_binding.c whereby signed char or int8_t
arrays would cause crashes unless an element size is specified.
Related to PR fortran/93524.
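
The effect of the fix can be modelled with a toy version of the
element-length choice.  Everything below is hypothetical: the enum and
effective_elem_len are not part of ISO_Fortran_binding.h, they only mirror
the shape of the condition changed in CFI_establish.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type classes standing in for the CFI_type_* codes.  */
enum elem_kind { KIND_CHAR, KIND_STRUCT, KIND_OTHER, KIND_INT8, KIND_INT32 };

/* Character, struct and "other" types take their element length from
   the caller; every other type has a length implied by the type, so a
   caller-supplied 0 must be ignored for it.  */
static size_t
effective_elem_len (enum elem_kind kind, size_t caller_elem_len)
{
  switch (kind)
    {
    case KIND_CHAR:
    case KIND_STRUCT:
    case KIND_OTHER:
      return caller_elem_len;
    case KIND_INT8:
      return 1;
    case KIND_INT32:
      return 4;
    }
  assert (!"unknown element kind");
  return 0;
}
```

Before the fix, signed char took the caller-supplied branch, so passing 0
(as the driver in this patch does) left the descriptor with a zero element
length.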

libgfortran/ChangeLog:

 * runtime/ISO_Fortran_binding.c (CFI_establish): Fix signed char arrays.

gcc/testsuite/ChangeLog:

 * gfortran.dg/iso_fortran_binding_int8_array.f90: New test.
 * gfortran.dg/iso_fortran_binding_int8_array_driver.c: New test.

diff --git a/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array.f90
b/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array.f90
new file mode 100644
index 000..31fdf863e0a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array.f90
@@ -0,0 +1,11 @@
+! { dg-do run }
+! { dg-additional-sources iso_fortran_binding_int8_array_driver.c }
+
+module m
+   use iso_c_binding
+contains
+   subroutine fsub( x ) bind(C, name="fsub")
+  integer(c_int8_t), intent(inout) :: x(:)
+  x = x+1
+   end subroutine
+end module
diff --git a/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array_driver.c
b/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array_driver.c
new file mode 100644
index 000..84ed127d525
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array_driver.c
@@ -0,0 +1,25 @@
+#include 
+#include 
+#include 
+#include "ISO_Fortran_binding.h"
+
+extern void fsub(CFI_cdesc_t *);
+
+int main(void)
+{
+   int8_t x[] = {1,2,3,4};
+   int N = sizeof(x)/sizeof(x[0]);
+
+   CFI_CDESC_T(1) dat;
+   CFI_index_t ext[1];
+   ext[0] = (CFI_index_t)N;
+   int rc = CFI_establish((CFI_cdesc_t *), , CFI_attribute_other,
+  CFI_type_int8_t, 0, (CFI_rank_t)1, ext);
+   printf("CFI_establish call returned: %d\n", rc);
+
+   fsub((CFI_cdesc_t *) );
+
+   for (int i=0; ibase_addr = base_addr;

if (type == CFI_type_char || type == CFI_type_ucs4_char ||
-  type == CFI_type_signed_char || type == CFI_type_struct ||
-  type == CFI_type_other)
+  type == CFI_type_struct || type == CFI_type_other)
  dv->elem_len = elem_len;
else
  {




PING^7 [PATCH 1/4] unroll: Add middle-end unroll factor estimation

2021-01-13 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping^7 for:

https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html

BR,
Kewen

on 2020/12/17 10:58 AM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> Gentle ping^6 for:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html
> 
> BR,
> Kewen
> 
> on 2020/11/19 1:50 PM, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> Gentle ping^5 for:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html
>>
>> BR,
>> Kewen
>>
>> on 2020/11/2 5:13 PM, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> Gentle ping^4 this:
>>>
>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html
>>>
>>> BR,
>>> Kewen
>>>
>>> on 2020/10/13 3:06 PM, Kewen.Lin via Gcc-patches wrote:
>>>> Hi,
>>>>
>>>> Gentle ping this:
>>>>
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html
>>>>
>>>> BR,
>>>> Kewen
>>>>
>>>> on 2020/9/15 3:44 PM, Kewen.Lin via Gcc-patches wrote:
>>>>> Hi,
>>>>>
>>>>> Gentle ping this:
>>>>>
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html
>>>>>
>>>>> BR,
>>>>> Kewen
>>>>>
>>>>> on 2020/8/31 1:49 PM, Kewen.Lin via Gcc-patches wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'd like to gentle ping this since IVOPTs part is already to land.
>>>>>>
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/546698.html
>>>>>>
>>>>>> BR,
>>>>>> Kewen
>>>>>>
>>>>>> on 2020/5/28 8:19 PM, Kewen.Lin via Gcc-patches wrote:
>>>>>>>
>>>>>>> gcc/ChangeLog
>>>>>>>
>>>>>>> 2020-MM-DD  Kewen Lin  
>>>>>>>
>>>>>>> * cfgloop.h (struct loop): New field estimated_unroll.
>>>>>>> * tree-ssa-loop-manip.c (decide_unroll_const_iter): New function.
>>>>>>> (decide_unroll_runtime_iter): Likewise.
>>>>>>> (decide_unroll_stupid): Likewise.
>>>>>>> (estimate_unroll_factor): Likewise.
>>>>>>> * tree-ssa-loop-manip.h (estimate_unroll_factor): New declaration.
>>>>>>> * tree-ssa-loop.c (tree_average_num_loop_insns): New function.
>>>>>>> * tree-ssa-loop.h (tree_average_num_loop_insns): New declaration.
>>>>>>>


PING^1 [PATCH] rs6000: Use rldimi for vec init instead of shift + ior

2021-01-13 Thread Kewen.Lin via Gcc-patches
Hi,

I'd like to gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562407.html


BR,
Kewen

on 2020/12/22 4:08 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> This patch is to make unsigned int vector init go with
> rldimi to merge two integers instead of shift and ior.
> 
> I tried to use nonzero_bits in the md file to make it more
> general, but testing shows it isn't doable.  The
> reason is that some passes would replace some pseudos
> with other pseudos and do the recog again, but at that
> time the nonzero_bits could get rough information and
> cause the recog to fail unexpectedly.
> 
> btw, the test case would rely on the combine patch[1].
> 
> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
> 
> BR,
> Kewen
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561413.html
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.md (*rotl3_insert_3): Renamed to...
>   (rotl3_insert_3): ...this.
>   * config/rs6000/vsx.md (vsx_init_v4si): Use gen_rotldi3_insert_3
>   for integer merging.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/vec-init-10.c: New test.
> 
> -
> 


PING^1 [PATCH/RFC] combine: Tweak the condition of last_set invalidation

2021-01-13 Thread Kewen.Lin via Gcc-patches
Hi,

I'd like to gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562015.html


BR,
Kewen

on 2020/12/16 4:49 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> While investigating the unsigned int vec_init issue on Power,
> I found something that seems worth enhancing in how the
> combine pass invalidates last_set (sets last_set_invalid nonzero).
> 
> Currently we have the check:
> 
>   if (!insn
> || (value && rsp->last_set_table_tick >= label_tick_ebb_start))
>   rsp->last_set_invalid = 1; 
> 
> which means if we want to record some value for some reg and
> this reg got referred to before in a valid scope, we invalidate the
> set of reg (last_set_invalid to 1).  This avoids finding the wrong
> set for one reg reference, such as the case like:
> 
>... op regX  // this regX could find wrong last_set below
>regX = ...   // if we think this set is valid
>... op regX
> 
> But because of retry's existence, the last_set_table_tick could
> be set by some later reference insns, but we see it's set due
> to retry on the set (for that reg) insn again, such as:
> 
>insn 1
>insn 2
> 
>regX = ... --> (a)
>... op regX--> (b)
>
>insn 3
> 
>// assume all in the same BB.
> 
> Assuming we combine 1, 2 -> 3 successfully and replace them with two
> (3 insns -> 2 insns), retrying from insn1 or insn2 again:
> it will scan insn (a) again, the below condition holds for regX:
> 
>   (value && rsp->last_set_table_tick >= label_tick_ebb_start)
> 
> it will mark this set as invalid set.  But actually the
> last_set_table_tick here is set by insn (b) before retrying, so it
> should be safe to be taken as valid set.
> 
> This proposal is to check whether the last_set_table safely happens
> after the current set, make the set still valid if so.
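
One possible reading of the proposal, as a standalone sketch.  The struct,
the function name and the exact comparison are assumptions made for
illustration (the patch body is not quoted here), and the sketch assumes
the set and the reference are in the same basic block so that their luids
can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down version of combine's per-register state.  */
struct reg_stat_sketch
{
  int last_set_table_tick;  /* tick of the last recorded reference */
  int last_set_table_luid;  /* luid of the insn that recorded it */
};

/* Decide whether a set at luid SET_LUID must be marked invalid.  */
static bool
last_set_is_invalid (const struct reg_stat_sketch *rsp, bool have_insn,
                     bool have_value, int set_luid,
                     int label_tick_ebb_start)
{
  assert (rsp != NULL);
  if (!have_insn)
    return true;
  if (!have_value || rsp->last_set_table_tick < label_tick_ebb_start)
    return false;
  /* The register was referenced in the current EBB.  A reference that
     comes strictly after the set cannot pick up a wrong last_set, so
     only a reference at or before the set invalidates it.  */
  return rsp->last_set_table_luid <= set_luid;
}
```

In the retry scenario above, the tick was recorded by insn (b), whose luid
is larger than that of insn (a), so the set at (a) stays valid.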
> 
> Bootstrapped/regtested on powerpc64le-linux-gnu (P9),
> aarch64-linux-gnu and x86_64-pc-linux-gnu.
> 
> Full SPEC2017 building shows this patch gets more successful combines,
> from 1902208 to 1902243 (trivial though).
> 
> Any comments are highly appreciated!
> 
> BR,
> Kewen
> -
> 
> gcc/ChangeLog:
> 
>   * combine.c (struct reg_stat_type): New member
>   last_set_table_luid.
>   (update_table_tick): Add one argument for insn luid and
>   set last_set_table_luid with it.
>   (record_value_for_reg): Adjust the condition to set
>   last_set_invalid nonzero.
> 


BR,
Kewen


PING^1 [PATCH] combine: zeroing cost for new copies

2021-01-13 Thread Kewen.Lin via Gcc-patches
Hi,

I'd like to gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561413.html


BR,
Kewen

on 2020/12/9 5:49 PM, Kewen.Lin via Gcc-patches wrote:
> Hi,
> 
> This patch is to treat those new pseudo-to-pseudo copies
> after hard-reg-to-pseudo-copy as zero costs.  The
> justification is that these new copies are closely after
> the corresponding hard-reg-to-pseudo-copy insns, register
> allocation should be able to coalesce them and get them
> eliminated.
> 
> Now these copies follow the normal costing scheme, the
> below case dump shows the unexpected combination:
> 
> ``` dump
> 
> Trying 3, 2 -> 13:
> 3: r119:DI=r132:DI
>   REG_DEAD r132:DI
> 2: r118:DI=r131:DI
>   REG_DEAD r131:DI
>13: r128:DI=r118:DI&0x|r119:DI<<0x20
>   REG_DEAD r119:DI
>   REG_DEAD r118:DI
> 
> Failed to match this instruction:
> (set (reg:DI 128)
> (ior:DI (ashift:DI (reg:DI 132)
> (const_int 32 [0x20]))
> (reg:DI 131)))
> Successfully matched this instruction:
> (set (reg/v:DI 119 [ f2 ])
> (ashift:DI (reg:DI 132)
> (const_int 32 [0x20])))
> Successfully matched this instruction:
> (set (reg:DI 128)
> (ior:DI (reg/v:DI 119 [ f2 ])
> (reg:DI 131)))
> allowing combination of insns 2, 3 and 13
> original costs 4 + 4 + 4 = 12
> replacement costs 4 + 4 = 8
> deferring deletion of insn with uid = 2.
> modifying insn i2 3: r119:DI=r132:DI<<0x20
>   REG_DEAD r132:DI
> deferring rescan insn with uid = 3.
> modifying insn i3    13: r128:DI=r119:DI|r131:DI
>   REG_DEAD r131:DI
>   REG_DEAD r119:DI
> deferring rescan insn with uid = 13.
> 
> ``` end dump
> 
> The original insn 13 can work well as rotldi3_insert_3,
> so the combination with shift/or isn't better, but the
> costing doesn't reflect that.
> 
> With this patch, we get below instead:
> 
> rejecting combination of insns 2, 3 and 13
> original costs 0 + 0 + 4 = 4
> replacement costs 4 + 4 = 8
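
The arithmetic behind the two decisions can be sketched as follows.
combination_allowed is a hypothetical helper that only compares summed
costs; the real combine_validate_cost handles more cases (unknown costs,
for instance).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sum the original insn costs, counting insns flagged as new
   pseudo-to-pseudo copies as free, and allow the combination only if
   the replacement is not more expensive.  */
static bool
combination_allowed (const int *orig_costs, const bool *is_new_copy,
                     size_t n_orig, const int *repl_costs, size_t n_repl)
{
  assert (orig_costs != NULL && is_new_copy != NULL && repl_costs != NULL);
  int old_cost = 0, new_cost = 0;
  for (size_t i = 0; i < n_orig; i++)
    if (!is_new_copy[i])
      old_cost += orig_costs[i];
  for (size_t i = 0; i < n_repl; i++)
    new_cost += repl_costs[i];
  return new_cost <= old_cost;
}
```

With costs 4 + 4 + 4 against 4 + 4 the combination is allowed; once insns
2 and 3 count as zero the comparison becomes 4 against 8 and it is
rejected, matching the two dumps.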
> 
> 
> Bootstrapped/regtested on powerpc64le-linux-gnu P9.
> 
> Is it reasonable?  Any comments are highly appreciated!
> 
> BR,
> Kewen
> --
> gcc/ChangeLog:
> 
>   * combine.c (new_copies): New static global variable declare/init.
>   (combine_validate_cost): Consider zero costs from new_copies.
>   (combine_instructions): Set zero cost for insns in new_copies.
>   (make_more_copies): Record new pseudo-to-pseudo copies to new_copies.
>   (rest_of_handle_combine): Call bitmap alloc/free for new_copies.
> 



[PATCH] lra: clear lra_insn_recog_data after simplifying a mem subreg

2021-01-13 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I ran into this problem when writing new patterns for s390.  I'm not
100% sure this fix is correct, but it resolves my issue and survives
bootstrap and regtest on x86_64-redhat-linux, ppc64le-redhat-linux and
s390x-redhat-linux.  Could you please take a look?

Best regards,
Ilya




Suppose we have:

(insn (set (reg:FPRX2 70) (subreg:FPRX2 (reg/v:TF 63) 0)))

where operand_loc[0] points to r70 and operand_loc[1] points to r63.
If r63 is spilled, remove_pseudos() will change this insn to:

  (insn (set (reg:FPRX2 70)
             (subreg:FPRX2 (mem/c:TF (plus:DI (reg:DI %fp)
                                              (const_int 144))) 0)))

This is fine so far: rtx pointed to by operand_loc[1] has been changed
from (reg) to (mem), but its slot is still under (subreg).  However,
alter_subreg() will simplify this insn to:

  (insn (set (reg:FPRX2 70)
             (mem/c:FPRX2 (plus:DI (reg:DI %fp) (const_int 144)))))

The (subreg) is gone, and therefore operand_loc[1] is no longer valid.
This will prevent process_insn_for_elimination() from updating the spill
slot offset, causing miscompilation: different instructions will refer
to the same spill slot using different offsets.

Fix by clearing all the cached data, and not just used_insn_alternative.
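
The stale-pointer problem can be reproduced with a toy tree.  All names
below are hypothetical simplifications: rtx_node stands in for rtx, and
update_recog_data only recomputes two operand slots, whereas the real
lra_update_insn_recog_data rebuilds all cached data for the insn.

```c
#include <assert.h>
#include <stddef.h>

enum code { REG, MEM, SUBREG, SET };

/* Toy expression node with up to two operands.  */
struct rtx_node { enum code code; struct rtx_node *op[2]; };

/* Cached addresses of the slots that hold the insn's operands.  */
struct insn_cache { struct rtx_node **operand_loc[2]; };

/* Recompute the operand slots; the source operand lives under a
   subreg if there is one.  */
static void
update_recog_data (struct insn_cache *c, struct rtx_node *set)
{
  assert (c != NULL && set != NULL && set->code == SET);
  c->operand_loc[0] = &set->op[0];
  struct rtx_node **src = &set->op[1];
  if ((*src)->code == SUBREG)
    src = &(*src)->op[0];
  c->operand_loc[1] = src;
}

/* Replace (subreg (mem)) by the mem itself, as alter_subreg does.  */
static void
simplify_mem_subreg (struct rtx_node *set)
{
  assert (set != NULL && set->code == SET);
  struct rtx_node *src = set->op[1];
  if (src->code == SUBREG && src->op[0]->code == MEM)
    set->op[1] = src->op[0];
}
```

After simplify_mem_subreg the cached operand_loc[1] still points into the
detached subreg node; only a fresh update_recog_data makes it point at the
slot that is actually part of the insn.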

gcc/ChangeLog:

2021-01-13  Ilya Leoshkevich  

* lra-spills.c (remove_pseudos): Call lra_update_insn_recog_data()
after calling alter_subreg() on a (mem).
---
 gcc/lra-spills.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index 26f56b2df02..01bd82574e7 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -431,7 +431,7 @@ remove_pseudos (rtx *loc, rtx_insn *insn)
  alter_subreg (loc, false);
  if (GET_CODE (*loc) == MEM)
{
- lra_get_insn_recog_data (insn)->used_insn_alternative = -1;
+ lra_update_insn_recog_data (insn);
  if (lra_dump_file != NULL)
fprintf (lra_dump_file,
 "Memory subreg was simplified in insn #%u\n",
-- 
2.26.2



[PATCH v2 5/5] or1k: Fixup exception header data encodings

2021-01-13 Thread Stafford Horne via Gcc-patches
While running glibc tests several *-textrel tests failed showing that
relocations remained against read only sections.  It turned out this was
related to exception headers data encoding being wrong.

By default pointer encoding will always use the DW_EH_PE_absptr format.

This patch uses format DW_EH_PE_pcrel and DW_EH_PE_sdata4.  Optionally
DW_EH_PE_indirect is included for global symbols.  This eliminates the
relocations.
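
As an illustration of what the new macro evaluates to, here is a small
standalone sketch.  The DW_EH_PE_* values are the standard DWARF
exception-header encodings; preferred_eh_data_format is a hypothetical
function that mirrors the macro body.

```c
#include <assert.h>

/* DWARF exception-header pointer encodings.  */
enum
{
  DW_EH_PE_absptr = 0x00,
  DW_EH_PE_sdata4 = 0x0b,
  DW_EH_PE_pcrel = 0x10,
  DW_EH_PE_indirect = 0x80
};

/* Same expression as the macro body, with GLOBAL as a 0/1 flag.  */
static int
preferred_eh_data_format (int global)
{
  assert (global == 0 || global == 1);
  return (global ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | DW_EH_PE_sdata4;
}
```

Local symbols are encoded as 0x1b (PC-relative signed 4-byte) and global
symbols as 0x9b (the same, reached indirectly), instead of the absolute
pointers that required the relocations.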

gcc/ChangeLog:

* config/or1k/or1k.h (ASM_PREFERRED_EH_DATA_FORMAT): New macro.
---
 gcc/config/or1k/or1k.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h
index b686f1bd159..fe01ab81ead 100644
--- a/gcc/config/or1k/or1k.h
+++ b/gcc/config/or1k/or1k.h
@@ -408,4 +408,8 @@ do {\
 ((N) < 4 ? HW_TO_GCC_REGNO (25) + (N) : INVALID_REGNUM)
 #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, EH_RETURN_REGNUM)
 
+/* Select a format to encode pointers in exception handling data.  */
+#define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \
+  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | DW_EH_PE_sdata4)
+
 #endif /* GCC_OR1K_H */
-- 
2.26.2



[PATCH v2 4/5] or1k: Add note to indicate execstack

2021-01-13 Thread Stafford Horne via Gcc-patches
Define TARGET_ASM_FILE_END as file_end_indicate_exec_stack to allow
generation of the ".note.GNU-stack" section note.  This allows binutils
to properly set PT_GNU_STACK in the program header.

This fixes a glibc execstack testsuite test failure found while working
on the OpenRISC glibc port.

gcc/ChangeLog:

* config/or1k/linux.h (TARGET_ASM_FILE_END): Define macro.
---
 gcc/config/or1k/linux.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/or1k/linux.h b/gcc/config/or1k/linux.h
index 74fbe082103..196f3f3c8f0 100644
--- a/gcc/config/or1k/linux.h
+++ b/gcc/config/or1k/linux.h
@@ -42,4 +42,6 @@
  %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}} \
%{static-pie:-Bstatic -pie --no-dynamic-linker -z text}"
 
+#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
+
 #endif /* GCC_OR1K_LINUX_H */
-- 
2.26.2



[PATCH v2 3/5] or1k: Support for softfloat to emulate hw exceptions

2021-01-13 Thread Stafford Horne via Gcc-patches
This allows the openrisc softfloat implementation to set exceptions.
This also sets the correct tininess after rounding value to be
consistent with hardware and simulator implementations.
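
The bit manipulation involved can be tried out in isolation.  The sketch
below reuses the constant names and values from the patch, written with
parentheses and unsigned constants; rounding_mode and raise_exceptions are
hypothetical helpers that work on a plain variable rather than the FPCSR
register.

```c
#include <assert.h>

#define FP_RND_NEAREST  (0x0u << 1)
#define FP_RND_ZERO     (0x1u << 1)
#define FP_RND_PINF     (0x2u << 1)
#define FP_RND_MINF     (0x3u << 1)
#define FP_RND_MASK     (0x3u << 1)

#define FP_EX_OVERFLOW  (1u << 3)
#define FP_EX_UNDERFLOW (1u << 4)
#define FP_EX_INEXACT   (1u << 8)
#define FP_EX_INVALID   (1u << 9)
#define FP_EX_DIVZERO   (1u << 11)
#define FP_EX_ALL \
  (FP_EX_INVALID | FP_EX_DIVZERO | FP_EX_OVERFLOW | FP_EX_UNDERFLOW \
   | FP_EX_INEXACT)

/* Extract the rounding-mode field, as FP_ROUNDMODE does.  */
static unsigned
rounding_mode (unsigned fpcsr)
{
  return fpcsr & FP_RND_MASK;
}

/* Replace the exception flags with FEX and keep all other bits, as the
   body of FP_HANDLE_EXCEPTIONS does before writing the register.  */
static unsigned
raise_exceptions (unsigned fpcsr, unsigned fex)
{
  assert ((fex & ~FP_EX_ALL) == 0);
  fpcsr &= ~FP_EX_ALL;
  fpcsr |= fex;
  return fpcsr;
}
```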

libgcc/ChangeLog:

* config/or1k/sfp-machine.h (FP_RND_NEAREST, FP_RND_ZERO,
FP_RND_PINF, FP_RND_MINF, FP_RND_MASK, FP_EX_OVERFLOW,
FP_EX_UNDERFLOW, FP_EX_INEXACT, FP_EX_INVALID, FP_EX_DIVZERO,
FP_EX_ALL): New constant macros.
(_FP_DECL_EX, FP_ROUNDMODE, FP_INIT_ROUNDMODE,
FP_HANDLE_EXCEPTIONS): New macros.
(_FP_TININESS_AFTER_ROUNDING): Change to 1.
---
 libgcc/config/or1k/sfp-machine.h | 41 +++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/or1k/sfp-machine.h b/libgcc/config/or1k/sfp-machine.h
index 5da9e84990d..eebe5b0578e 100644
--- a/libgcc/config/or1k/sfp-machine.h
+++ b/libgcc/config/or1k/sfp-machine.h
@@ -41,12 +41,51 @@
 R##_c = FP_CLS_NAN;\
   } while (0)
 
+/* Handle getting and setting rounding mode for soft fp operations.  */
+
+#define FP_RND_NEAREST (0x0 << 1)
+#define FP_RND_ZERO(0x1 << 1)
+#define FP_RND_PINF(0x2 << 1)
+#define FP_RND_MINF(0x3 << 1)
+#define FP_RND_MASK(0x3 << 1)
+
+#define FP_EX_OVERFLOW 1 << 3
+#define FP_EX_UNDERFLOW1 << 4
+#define FP_EX_INEXACT  1 << 8
+#define FP_EX_INVALID  1 << 9
+#define FP_EX_DIVZERO  1 << 11
+#define FP_EX_ALL \
+   (FP_EX_INVALID | FP_EX_DIVZERO | FP_EX_OVERFLOW | FP_EX_UNDERFLOW \
+| FP_EX_INEXACT)
+
+#define _FP_DECL_EX \
+  unsigned int _fpcsr __attribute__ ((unused)) = FP_RND_NEAREST
+
+#define FP_ROUNDMODE (_fpcsr & FP_RND_MASK)
+
+#ifdef __or1k_hard_float__
+#define FP_INIT_ROUNDMODE  \
+do {   \
+  __asm__ volatile ("l.mfspr %0,r0,20" : "=r" (_fpcsr));   \
+} while (0)
+
+#define FP_HANDLE_EXCEPTIONS   \
+do {   \
+  if (__builtin_expect (_fex, 0))  \
+{  \
+  _fpcsr &= ~FP_EX_ALL;\
+  _fpcsr |= _fex;  \
+  __asm__ volatile ("l.mtspr r0,%0,20" : : "r" (_fpcsr));  \
+}  \
+} while (0)
+#endif
+
 #define__LITTLE_ENDIAN 1234
 #define__BIG_ENDIAN4321
 
 #define __BYTE_ORDER __BIG_ENDIAN
 
-#define _FP_TININESS_AFTER_ROUNDING 0
+#define _FP_TININESS_AFTER_ROUNDING 1
 
 /* Define ALIASNAME as a strong alias for NAME.  */
 # define strong_alias(name, aliasname) _strong_alias(name, aliasname)
-- 
2.26.2



[PATCH v2 2/5] or1k: Add builtin define to detect hard float

2021-01-13 Thread Stafford Horne via Gcc-patches
This is used in libgcc and now glibc to detect when hardware floating
point operations are supported by the target.

gcc/ChangeLog:

* config/or1k/or1k.h (TARGET_CPU_CPP_BUILTINS): Add builtin
  define for __or1k_hard_float__.
---
 gcc/config/or1k/or1k.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h
index dc579e4a388..b686f1bd159 100644
--- a/gcc/config/or1k/or1k.h
+++ b/gcc/config/or1k/or1k.h
@@ -30,6 +30,8 @@
   builtin_define ("__or1k__"); \
   if (TARGET_CMOV) \
builtin_define ("__or1k_cmov__");   \
+  if (TARGET_HARD_FLOAT)   \
+   builtin_define ("__or1k_hard_float__"); \
   builtin_assert ("cpu=or1k"); \
   builtin_assert ("machine=or1k"); \
 }  \
-- 
2.26.2



[PATCH v2 1/5] or1k: Implement profile hook calling _mcount

2021-01-13 Thread Stafford Horne via Gcc-patches
Define this so that it no longer aborts, a problem found while running
tests in the glibc test suite.

We implement this with a call to _mcount with no arguments.  The required
return addresses will be pulled from the stack.  Passing the LR (r9) as
an argument had problems as sometimes r9 is clobbered by the GOT logic
in the prologue before the call to _mcount.

gcc/ChangeLog:

* config/or1k/or1k.h (NO_PROFILE_COUNTERS): Define as 1.
(PROFILE_HOOK): Define to call _mcount.
(FUNCTION_PROFILER): Change from abort to no-op.
---
 gcc/config/or1k/or1k.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h
index ab1c4bbd2a7..dc579e4a388 100644
--- a/gcc/config/or1k/or1k.h
+++ b/gcc/config/or1k/or1k.h
@@ -379,8 +379,19 @@ do {\
 /* Always pass the SYMBOL_REF for direct calls to the expanders.  */
 #define NO_FUNCTION_CSE 1
 
-/* Profiling */
-#define FUNCTION_PROFILER(FILE,LABELNO) (abort (), 0)
+#define NO_PROFILE_COUNTERS 1
+
+/* Emit rtl for profiling.  Output assembler code to call "_mcount" for
+   profiling a function entry.  */
+#define PROFILE_HOOK(LABEL)\
+  {\
+rtx fun;   \
+fun = gen_rtx_SYMBOL_REF (Pmode, "_mcount");   \
+emit_library_call (fun, LCT_NORMAL, VOIDmode); \
+  }
+
+/* All the work is done in PROFILE_HOOK, but this is still required.  */
+#define FUNCTION_PROFILER(STREAM, LABELNO) do { } while (0)
 
 /* Dwarf 2 Support */
 #define DWARF2_DEBUGGING_INFO 1
-- 
2.26.2



[RESEND PATCH 0/5] OpenRISC GCC Fixes for Glibc Support

2021-01-13 Thread Stafford Horne via Gcc-patches
Hello,

Changes since v1:
 - Rebase

This is just a resend of v1 with no changes from when I sent it last year.  I
hadn't committed it because I had not completed all testing in glibc.  Now that
I have done that and it all seems to work I will commit it.

I am currently working on the glibc port for OpenRISC.  This is a series of
patches that fix issues and add features that were missing in GCC causing glibc
testsuite failures.

Pretty much all of these changes are just adding macros.

These changes have been tested via the glibc test suite.

-Stafford

Stafford Horne (5):
  or1k: Implement profile hook calling _mcount
  or1k: Add builtin define to detect hard float
  or1k: Support for softfloat to emulate hw exceptions
  or1k: Add note to indicate execstack
  or1k: Fixup exception header data encodings

 gcc/config/or1k/linux.h  |  2 ++
 gcc/config/or1k/or1k.h   | 21 ++--
 libgcc/config/or1k/sfp-machine.h | 41 +++-
 3 files changed, 61 insertions(+), 3 deletions(-)

-- 
2.26.2



[PATCH] [og10] openacc: Adjust loop lowering for AMD GCN

2021-01-13 Thread Julian Brown
This patch adjusts OpenACC loop lowering in the AMD GCN target compiler
in such a way that the autovectorizer can vectorize the "vector" dimension
of those loops in more cases.

Rather than generating "SIMT" code that executes a scalar instruction
stream for each lane of a vector in lockstep, for GCN we model the GPU
like a typical CPU, with separate instructions to operate on scalar and
vector data. That means that unlike other offload targets, we rely on
the autovectorizer to handle the innermost OpenACC parallelism level,
which is "vector".

Because of this, the OpenACC builtin functions to return the current
vector lane and the vector width return 0 and 1 respectively, despite
the native vector width being 64 elements wide.

This allows generated code to work with our chosen compilation model,
but the way loops are lowered in omp-offload.c:oacc_xform_loop does not
understand the discrepancy between logical (OpenACC) and physical vector
sizes correctly. That means that if a loop is partitioned over e.g. the
worker AND vector dimensions, we actually lower with unit vector size --
meaning that if we then autovectorize, we end up trying to vectorize
over the "worker" dimension rather than the vector one! Then, because
the number of workers is not fixed at compile time, that means the
autovectorizer has a hard time analysing the loop and thus vectorization
often fails entirely.

We can fix this by deducing the true vector width in oacc_xform_loop,
and using that when we are on a "non-SIMT" offload target. We can then
rearrange how loops are lowered in that function so that the loop form
fed to the autovectorizer is more amenable to vectorization -- namely,
the innermost step is set to process each loop iteration sequentially.
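
As a rough illustration only (this is not the code the patch generates;
the real lowering works on GIMPLE in oacc_xform_loop), the two iteration
orders can be modelled on the host as below.  NUM_WORKERS, VF,
strided_order and contiguous_order are names made up for this sketch, and
the fixed constants are an assumption: in reality the worker count is not
known at compile time.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model; the constants are illustrative only.
constexpr int NUM_WORKERS = 4;  // a runtime value in a real offload region
constexpr int VF = 8;           // stand-in for the vectorization factor

// Old shape: with a logical vector size of 1, consecutive iterations go
// to different workers, so each worker walks the range with a stride.
std::vector<int>
strided_order (int worker, int n)
{
  assert (worker >= 0 && worker < NUM_WORKERS);
  std::vector<int> its;
  for (int i = worker; i < n; i += NUM_WORKERS)
    its.push_back (i);
  return its;
}

// New shape: each worker takes blocks of VF contiguous iterations, and
// the innermost loop steps through them sequentially.
std::vector<int>
contiguous_order (int worker, int n)
{
  assert (worker >= 0 && worker < NUM_WORKERS);
  std::vector<int> its;
  for (int base = worker * VF; base < n; base += NUM_WORKERS * VF)
    for (int j = 0; j < VF && base + j < n; j++)
      its.push_back (base + j);
  return its;
}
```

In this model the inner loop of contiguous_order has unit stride and a
trip count bounded by VF, which is the kind of loop shape a vectorizer
tends to handle well; the strided form instead steps by the worker count.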

For some benchmarks, allowing vectorization to succeed leads to quite
impressive performance improvements -- I've observed between 2.5x and
40x on one machine/GPU combination.

The low-level builtins available to user code (__builtin_goacc_parlevel_id
and __builtin_goacc_parlevel_size) continue to return 0/1 respectively
for the vector dimension for AMD GCN, even if their containing loop is
vectorized -- that's a quirk that we might possibly want to address at
some later date.

Only non-"chunking" loops are handled at present. "Chunking" loops are
still lowered as before.

Tested with offloading to AMD GCN. I will apply to the og10 branch
shortly.

Julian

2021-01-13  Julian Brown  

gcc/
* omp-offload.c (oacc_thread_numbers): Add VF_BY_VECTORIZER parameter.
Add overloaded wrapper for previous arguments & behaviour.
(oacc_xform_loop): Lower vector loops to iterate a multiple of
omp_max_vf times over contiguous steps on non-SIMT targets.

libgomp/testsuite/
* libgomp.oacc-c-c++-common/loop-gwv-1.c: Adjust for loop lowering
changes.
* libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
* libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
* libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
* libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
---
 gcc/omp-offload.c | 160 ++
 .../libgomp.oacc-c-c++-common/loop-gwv-1.c|  15 +-
 .../loop-red-gwv-1.c  |  17 +-
 .../libgomp.oacc-c-c++-common/loop-red-wv-1.c |  16 ++
 .../libgomp.oacc-c-c++-common/loop-wv-1.c |  16 ++
 .../libgomp.oacc-c-c++-common/routine-gwv-1.c |  17 +-
 .../libgomp.oacc-c-c++-common/routine-wv-1.c  |  16 ++
 7 files changed, 214 insertions(+), 43 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index f5ce34d3bdd8..bb3bfd130ee4 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -470,11 +470,13 @@ oacc_dim_call (bool pos, int dim, gimple_seq *seq)
 }
 
 /* Find the number of threads (POS = false), or thread number (POS =
-   true) for an OpenACC region partitioned as MASK.  Setup code
+   true) for an OpenACC region partitioned as MASK.  If VF_BY_VECTORIZER is
+   true, use that as the vectorization factor for the auto-vectorized
+   dimension size, instead of calling the builtin function.  Setup code
required for the calculation is added to SEQ.  */
 
 static tree
-oacc_thread_numbers (bool pos, int mask, gimple_seq *seq)
+oacc_thread_numbers (bool pos, int mask, tree vf_by_vectorizer, gimple_seq *seq)
 {
   tree res = pos ? NULL_TREE : build_int_cst (unsigned_type_node, 1);
   unsigned ix;
@@ -487,13 +489,15 @@ oacc_thread_numbers (bool pos, int mask, gimple_seq *seq)
  {
/* We had an outer index, so scale that by the size of
   this dimension.  */
-   tree n = oacc_dim_call (false, ix, seq);
+   tree n = (ix == GOMP_DIM_VECTOR && vf_by_vectorizer)
+? vf_by_vectorizer : oacc_dim_call (false, ix, seq);
res = fold_build2 (MULT_EXPR, integer_type_node, res, n);
  }
if (pos)
  {

[PATCH] [og10] vect: Add target hook to prefer gather/scatter instructions

2021-01-13 Thread Julian Brown
For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.

The attached patch adds a target hook to override the "single_element_p"
heuristic in that function, and activates it for GCN. This allows much
better code to be generated for affected loops.

Tested with offloading to AMD GCN. I will apply to the og10 branch
shortly.

Julian

2021-01-13  Julian Brown  

gcc/
* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
documentation hook.
* doc/tm.texi: Regenerate.
* target.def (prefer_gather_scatter): Add target hook under vectorizer.
* tree-vect-stmts.c (get_group_load_store_type): Optionally prefer
gather/scatter instructions to scalar/elementwise fallback.
* config/gcn/gcn.c (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
hook.
---
 gcc/config/gcn/gcn.c  | 2 ++
 gcc/doc/tm.texi   | 5 +
 gcc/doc/tm.texi.in| 2 ++
 gcc/target.def| 8 
 gcc/tree-vect-stmts.c | 9 +++--
 5 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index ee9f00558305..ea88b5e91244 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -6501,6 +6501,8 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_vector_alignment_reachable
 #undef  TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P gcn_vector_mode_supported_p
+#undef  TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+#define TARGET_VECTORIZE_PREFER_GATHER_SCATTER true
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 581b7b51eeb0..bd0b2eea477a 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6122,6 +6122,11 @@ The default is @code{NULL_TREE} which means to not vectorize scatter
 stores.
 @end deftypefn
 
+@deftypevr {Target Hook} bool TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+This hook is set to TRUE if gather loads or scatter stores are cheaper on
+this target than a sequence of elementwise loads or stores.
+@end deftypevr
+
 @deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
 This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
 fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index afa19d4ac63c..c0883e5da82c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4195,6 +4195,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_SCATTER
 
+@hook TARGET_VECTORIZE_PREFER_GATHER_SCATTER
+
 @hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
 
 @hook TARGET_SIMD_CLONE_ADJUST
diff --git a/gcc/target.def b/gcc/target.def
index 00421f3a6acd..0b34ab5c3d52 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2027,6 +2027,14 @@ all zeros.  GCC can then try to branch around the instruction instead.",
  (unsigned ifn),
  default_empty_mask_is_expensive)
 
+/* Prefer gather/scatter loads/stores to e.g. elementwise accesses if
+   we cannot use a contiguous access.  */
+DEFHOOKPOD
+(prefer_gather_scatter,
+ "This hook is set to TRUE if gather loads or scatter stores are cheaper on\n\
+this target than a sequence of elementwise loads or stores.",
+ bool, false)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9ace345fc5e2..e117d3d16afc 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2444,9 +2444,14 @@ get_group_load_store_type (stmt_vec_info stmt_info, tree vectype, bool slp,
 it probably isn't a win to use separate strided accesses based
 on nearby locations.  Or, even if it's a win over scalar code,
 it might not be a win over vectorizing at a lower VF, if that
-allows us to use contiguous accesses.  */
+allows us to use contiguous accesses.
+
+On some targets (e.g. AMD GCN), always use gather/scatter accesses
+here since those are the only types of vector loads/stores available,
+and the fallback case of using elementwise accesses is very
+inefficient.  */
   if (*memory_access_type == VMAT_ELEMENTWISE
- && single_element_p
+ && (targetm.vectorize.prefer_gather_scatter || single_element_p)
  && loop_vinfo
  && vect_use_strided_gather_scatters_p 

[PATCH] c++: ICE when mangling operator name [PR98545]

2021-01-13 Thread Marek Polacek via Gcc-patches
r11-6301 added some asserts in mangle.c, and now we trip over one of
them.  In particular, it's the one asserting that we didn't get
IDENTIFIER_ANY_OP_P when mangling an expression with a dependent name.

As this testcase shows, it's possible to get that, so turn the assert
into an if and write "on".  That changes the mangling in the following
way:

With this patch:

$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTonclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h(a, double, a)

G++10:
$ c++filt _ZN1i1hIJ1adS1_EEEDTcldtdefpTclspcvT__EEEDpS2_
decltype (((*this).(operator()))((a)(), (double)(), (a)())) i::h(a, double, a)

clang++/icc:
$ c++filt _ZN1i1hIJ1adS1_EEEDTclonclspcvT__EEEDpS2_
decltype ((operator())((a)(), (double)(), (a)())) i::h(a, double, a)

I'm not sure why we differ in the "(*this)." part, but at least the
suffix "onclspcvT__EEEDpS2_" is the same for all three compilers.  So
I hope the following fix makes sense.
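
To make the change concrete, here is a toy model of the new behaviour.  It
is only a sketch: mangle_dependent_name and its parameters are
hypothetical, with is_operator standing in for IDENTIFIER_ANY_OP_P and
encoded standing in for whatever write_unqualified_id would emit.

```cpp
#include <cassert>
#include <string>

// Toy model only: mangle a dependent name that appears in an expression.
// Both parameters are hypothetical stand-ins, see the text above.
std::string
mangle_dependent_name (const std::string &encoded, bool is_operator)
{
  assert (!encoded.empty ());
  std::string out;
  if (is_operator)
    out += "on";  // operator names in expressions get the "on" prefix
  out += encoded;
  return out;
}
```

With "cl" (the encoding of operator()) this yields "oncl", which is the
only place where the patched mangling above differs from the G++ 10 one.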

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/98545
* mangle.c (write_expression): When the expression is a dependent name
and an operator name, write "on" before writing its name.

gcc/testsuite/ChangeLog:

PR c++/98545
* g++.dg/abi/mangle76.C: New test.
---
 gcc/cp/mangle.c |  3 ++-
 gcc/testsuite/g++.dg/abi/mangle76.C | 39 +
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/mangle76.C

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 11eb8962d28..bb3c4b76d33 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3349,7 +3349,8 @@ write_expression (tree expr)
   else if (dependent_name (expr))
 {
   tree name = dependent_name (expr);
-  gcc_assert (!IDENTIFIER_ANY_OP_P (name));
+  if (IDENTIFIER_ANY_OP_P (name))
+   write_string ("on");
   write_unqualified_id (name);
 }
   else
diff --git a/gcc/testsuite/g++.dg/abi/mangle76.C b/gcc/testsuite/g++.dg/abi/mangle76.C
new file mode 100644
index 000..0c2964cbecb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/mangle76.C
@@ -0,0 +1,39 @@
+// PR c++/98545
+// { dg-do compile { target c++11 } }
+
+class a {
+public:
+  a();
+  template  a(b);
+};
+template  using c = a;
+class f {
+protected:
+  template  void operator()(d, double, e);
+};
+class i : f {
+public:
+  template 
+  [[gnu::used]] auto h(g...) -> decltype(operator()(g()...)) {}
+// { dg-final { scan-assembler "_ZN1i1hIJ1adS1_EEEDTcldtdefpTonclspcvT__EEEDpS2_" } }
+};
+template  class C {
+public:
+  template  C(j);
+  i k() const;
+  int operator()() {
+int l = 10;
+c<> m, n;
+operator()(m, l, n);
+return 0;
+  }
+  int operator()(c<> &, c<> const &, c<> const &) const;
+  template  void k(d m, double gamma, e o) const {
+k().h(m, gamma, o);
+  }
+};
+template  int C::operator()(c<> &, c<> const &, c<> const &) const {
+  [&](c<> m, double gamma, c<> o) { k(m, gamma, o); };
+  return 0;
+}
+c<> p = C(p)();

base-commit: 796ead19f85372e59217c9888db688a2fe11b54f
-- 
2.29.2



[PATCH v5] rs6000, vector integer multiply/divide/modulo instructions

2021-01-13 Thread Carl Love via Gcc-patches
Will:

I have addressed the various typos you mentioned in the messages to the
maintainers.

Per your comment I have also tested the updated patch on Power 8 BE.

The patch was compiled and tested on:

   powerpc64-unknown-linux-gnu (Power 8 BE)
   powerpc64le-unknown-linux-gnu (Power 9 LE)
   powerpc64le-unknown-linux-gnu (Power 10 LE)

I have fixed the change log entries.

I have fixed the formatting/white space issues you mentioned.

With regard to the comment:

> Presumably it is safe (no side effects) when adding V4SI and V2DI here,
> with respect to other current users of 'bits'.
> Is it worth adding the
> other modes while we are here? (V1TI, V8HI, V16QI ).

I did not add the additional modes.  I don't see any reason it would
hurt but feel it is best to only add them when they are needed.

 Carl 

---

Will:

I have addressed your comments with regard to the Change Log entries.

The extra define vec_div was removed.

Added the missing entries for DIVU_V2DI  DIVS_V2DI in rs6000-call.c.

The extra MULLD_V2DI case statement entry was removed.

Added comment in rs6000.md about size for vector types per discussion
with Pat.

  Carl


GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are vec_mulh(),
vec_dive() and vec_mod(), for signed and unsigned integers and long
long integers.  The existing support for the vec_div() and vec_mul()
builtins emulates the vector operations with multiple scalar
instructions.  This patch adds support for these builtins using the new
vector instructions for Power 10.
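
For readers unfamiliar with the operations, the following is a scalar
reference model of what I understand vec_mulh and vec_mod to compute on
four signed 32-bit elements.  It is a sketch, not the implementation:
v4si, ref_mulh and ref_mod are names invented here, std::array merely
stands in for a vector type, and vec_dive is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using v4si = std::array<int32_t, 4>;  // stand-in for vector signed int

// High 32 bits of the 64-bit product, element by element.  Right-shifting
// a negative value is arithmetic on the usual compilers (and guaranteed
// from C++20 on).
v4si
ref_mulh (const v4si &a, const v4si &b)
{
  v4si r{};
  for (std::size_t i = 0; i < r.size (); i++)
    r[i] = static_cast<int32_t> ((static_cast<int64_t> (a[i]) * b[i]) >> 32);
  return r;
}

// Element-wise remainder.  A zero divisor (and INT32_MIN % -1) is
// undefined in C++, so this model simply refuses a zero divisor.
v4si
ref_mod (const v4si &a, const v4si &b)
{
  v4si r{};
  for (std::size_t i = 0; i < r.size (); i++)
    {
      assert (b[i] != 0);
      r[i] = a[i] % b[i];
    }
  return r;
}
```

A model like this is mainly useful for computing expected values by hand
when writing a runnable test.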

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)
  powerpc64le-unknown-linux-gnu (Power 10 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love

---

2021-01-12  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
defines.
* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
Add builtin define.
(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
VSX_BUILTIN_VEC_DIVE, P10_BUILTIN_VEC_MOD, P10_BUILTIN_VEC_MULH):
New overloaded definitions.
(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
P10V_BUILTIN_MULHU_V4SI]: Add case
statement for builtins.
* config/rs6000/rs6000.md (bits): Add new attribute sizes V4SI, V2DI.
* config/rs6000/vsx.md (VIlong): Moved from config/rs6000/altivec.md.
(UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
(vsx_mul_v2di): Add if TARGET_POWER10 statement.
(vsx_udiv_v2di): Add if TARGET_POWER10 statement.
(dives_, diveu_, div3, uvdiv3,
mods_, modu_, mulhs_, mulhu_, mulv2di3):
Add define_insn, mode is VIlong.
* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   4 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  21 +
 gcc/config/rs6000/rs6000-call.c   |  53 +++
 gcc/config/rs6000/rs6000.md   |   3 +-
 gcc/config/rs6000/vsx.md  | 211 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 398 ++
 8 files changed, 759 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 06f0d4d9f14..961621a0841 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
+#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
+#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
+
 /* VSX 

Re: [PATCH] c++: Failure to lookup using-decl name [PR98231]

2021-01-13 Thread Nathan Sidwell

On 1/13/21 1:38 PM, Marek Polacek wrote:

> In r11-4690 we removed the call to finish_nonmember_using_decl in
> tsubst_expr/DECL_EXPR in the USING_DECL block.  This was done not
> to perform name lookup twice for a non-dependent using-decl, which
> sounds sensible.
>
> However, finish_nonmember_using_decl also pushes the decl's bindings
> which we still have to do so that we can find the USING_DECL's name
> later.  In this case, we've got a USING_DECL N::operator<<  that we are
> tsubstituting.  We already looked it up while parsing the template
> "foo", and lookup_using_decl stashed the OVERLOAD it found into
> USING_DECL_DECLS.  Now we just have to update the IDENTIFIER_BINDING of
> the identifier for operator<< with the overload the name is bound to.
>
> I didn't want to export push_local_binding so I've introduced a new
> wrapper.

seems reasonable

> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


ok



--
Nathan Sidwell


[PATCH] gimple UIDs, LTO and -fanalyzer [PR98599]

2021-01-13 Thread David Malcolm via Gcc-patches
gimple.h has this comment for gimple's uid field:

  /* UID of this statement.  This is used by passes that want to
 assign IDs to statements.  It must be assigned and used by each
 pass.  By default it should be assumed to contain garbage.  */
  unsigned uid;

and gimple_set_uid has:

   Please note that this UID property is supposed to be undefined at
   pass boundaries.  This means that a given pass should not assume it
   contains any useful value when the pass starts and thus can set it
   to any value it sees fit.

which suggests that any pass can use the uid field as an arbitrary
scratch space.

PR analyzer/98599 reports a case where this error occurs in LTO mode:
  fatal error: Cgraph edge statement index out of range
on certain inputs with -fanalyzer.

The error occurs in the LTRANS phase after -fanalyzer runs in the
WPA phase.  The analyzer pass writes to the uid fields of all stmts.

The error occurs when LTRANS is streaming callgraph edges back in.
If I'm reading things correctly, the LTO format uses stmt uids to
associate call stmts with callgraph edges between WPA and LTRANS.
For example, in lto-cgraph.c, lto_output_edge writes out the
gimple_uid, and input_edge reads it back in.

Hence IPA passes that touch the uids in WPA need to restore them,
or the stream-in at LTRANS will fail.

Is it intended that the LTO machinery relies on the value of the uid
field being preserved during WPA (or, at least, needs to be saved and
restored by passes that touch it)?

On the assumption that this is the case, this patch updates the comments
in gimple.h referring to passes being able to set uid to any value to
note the caveat for IPA passes, and it updates the analyzer to save
and restore the UIDs, fixing the error.
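
The save/restore idea can be shown with a simplified stand-alone
analogue.  This is a sketch under stated assumptions, not the patch
itself: stmt stands in for gimple, uid_saver is a hypothetical name, and
here the restore happens in the saver's own destructor, whereas the patch
does it from the supergraph dtor.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct stmt { unsigned uid; };  // hypothetical stand-in for gimple

class uid_saver
{
public:
  // Put the original uids back when the saver goes away.
  ~uid_saver () { restore (); }

  // Give S a uid unique within this saver, remembering the old one.
  void make_unique (stmt *s)
  {
    assert (s != nullptr);
    unsigned next = static_cast<unsigned> (m_saved.size ());
    m_saved.emplace_back (s, s->uid);
    s->uid = next;
  }

  // Restore every recorded stmt to its original uid.
  void restore () const
  {
    for (const auto &p : m_saved)
      p.first->uid = p.second;
  }

private:
  std::vector<std::pair<stmt *, unsigned>> m_saved;
};
```

While the saver is alive the stmts carry dense, unique ids; once it is
destroyed they are back to whatever the earlier phase left in them.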

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
OK for master?

gcc/analyzer/ChangeLog:
PR analyzer/98599
* supergraph.cc (saved_uids::make_uid_unique): New.
(saved_uids::restore_uids): New.
(supergraph::supergraph): Replace assignments to stmt->uid with
calls to m_stmt_uids.make_uid_unique.
(supergraph::~supergraph): New.
* supergraph.h (class saved_uids): New.
(supergraph::~supergraph): New decl.
(supergraph::m_stmt_uids): New field.

gcc/ChangeLog:
PR analyzer/98599
* doc/gimple.texi: Document that UIDs must not change during IPA
passes.
* gimple.h (gimple::uid): Likewise.
(gimple_set_uid): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/98599
* gcc.dg/analyzer/pr98599-a.c: New test.
* gcc.dg/analyzer/pr98599-b.c: New test.
---
 gcc/analyzer/supergraph.cc| 53 +--
 gcc/analyzer/supergraph.h | 15 +++
 gcc/doc/gimple.texi   |  6 +++
 gcc/gimple.h  | 13 +-
 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c |  8 
 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c |  1 +
 6 files changed, 90 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-a.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr98599-b.c

diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc
index 419f6424f76..40acfbd16a8 100644
--- a/gcc/analyzer/supergraph.cc
+++ b/gcc/analyzer/supergraph.cc
@@ -87,6 +87,46 @@ supergraph_call_edge (function *fun, gimple *stmt)
   return edge;
 }
 
+/* class saved_uids.
+
+   In order to ensure consistent results without relying on the ordering
+   of pointer values we assign a uid to each gimple stmt, globally unique
+   across all functions.
+
+   Normally, the stmt uids are a scratch space that each pass can freely
+   assign its own values to.  However, in the case of LTO, the uids are
+   used to associate call stmts with callgraph edges between the WPA phase
+   (where the analyzer runs in LTO mode) and the LTRANS phase; if the
+   analyzer changes them in the WPA phase, it leads to errors when
+   streaming the code back in at LTRANS.
+
+   Hence this class has responsibility for tracking the original uids
+   and restoring them once the pass is complete (in the supergraph dtor).  */
+
+/* Give STMT a globally unique uid, storing its original uid so it can
+   later be restored.  */
+
+void
+saved_uids::make_uid_unique (gimple *stmt)
+{
+  unsigned next_uid = m_old_stmt_uids.length ();
+  unsigned old_stmt_uid = stmt->uid;
+  stmt->uid = next_uid;
+  m_old_stmt_uids.safe_push
+    (std::pair<gimple *, unsigned> (stmt, old_stmt_uid));
+}
+
+/* Restore the saved uids of all stmts.  */
+
+void
+saved_uids::restore_uids () const
+{
+  unsigned i;
+  std::pair<gimple *, unsigned> *pair;
+  FOR_EACH_VEC_ELT (m_old_stmt_uids, i, pair)
+pair->first->uid = pair->second;
+}
+
 /* supergraph's ctor.  Walk the callgraph, building supernodes for each
CFG basic block, splitting the basic blocks at callsites.  Join
together the supernodes with interprocedural and intraprocedural
@@ -101,8 +141,6 @@ supergraph::supergraph 

Re: use sigjmp_buf for analyzer sigsetjmp tests

2021-01-13 Thread David Malcolm via Gcc-patches
On Fri, 2020-12-25 at 03:21 -0300, Alexandre Oliva wrote:
> The sigsetjmp analyzer tests use jmp_buf in sigsetjmp and siglongjmp
> calls.  Not every system that supports sigsetjmp uses the same data
> structure for setjmp and sigsetjmp, which results in type mismatches.
> 
> This patch changes the tests to use sigjmp_buf, which is the
> POSIX-specific type for use with sigsetjmp and siglongjmp.
> 
> Regstrapped on x86_64-linux-gnu, also tested on arm-vxworks7r2.
> Ok to install?
> 

Sorry about the mistake.
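
For reference, the intended pairing looks roughly like the following.
This is a minimal sketch and not taken from the analyzer tests; env,
bail_out and run_once are names invented here.

```cpp
#include <setjmp.h>

// Note the type: sigjmp_buf goes with sigsetjmp/siglongjmp, while
// jmp_buf goes with setjmp/longjmp.  The two need not be the same type.
static sigjmp_buf env;

static void
bail_out ()
{
  siglongjmp (env, 1);  // resumes at the sigsetjmp call, which returns 1
}

int
run_once ()
{
  // A nonzero second argument asks for the signal mask to be saved too.
  if (sigsetjmp (env, 1) == 0)
    {
      bail_out ();
      return 0;  // not reached
    }
  return 1;  // reached via siglongjmp
}
```

On systems where the two buffer types differ, passing a jmp_buf here is
the type mismatch the patch is avoiding.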

The patch looks good to me; I think you can install this under the
 "obvious" rule.

Thanks
Dave



Re: declare getpass in analyzer/sensitive-1.c test

2021-01-13 Thread David Malcolm via Gcc-patches
On Fri, 2020-12-25 at 03:21 -0300, Alexandre Oliva wrote:
> The getpass function is not available on all systems; and not
> necessarily declared in unistd.h, as expected by the sensitive-1
> analyzer test.
> 
> Since this is a compile-only test, it doesn't really matter if the
> function is defined in the system libraries.  All we need is a
> declaration, to avoid warnings from calling an undeclared function.
> This patch adds the declaration, in a way that is most unlikely to
> conflict with any existing declaration.
> 
> Regstrapped on x86_64-linux-gnu, also tested on arm-vxworks7r2.
> Ok to install?

The patch looks good to me.  Technically I'm not a reviewer for the
analyzer, but I think you can go ahead under the "obvious" rule.

Thanks
Dave




[PATCH] PR fortran/98661 - valgrind issues with error recovery

2021-01-13 Thread Harald Anlauf via Gcc-patches
Dear all,

the former Fortran testcase charlen_03.f90, which some time ago used to
ICE, could still display issues during error recovery.  As Dominique
pointed out, this required either an instrumented compiler, or valgrind.

The issue turned out to not have anything to do with CHARACTER, but
with an invalid attempt to resolve an invalid array specification.

Regtested on x86_64-pc-linux-gnu, and checked for the testcase with valgrind.

OK for master?

Thanks,
Harald


PR fortran/98661 - valgrind issues with error recovery

During error recovery after an invalid derived type specification it was
possible to try to resolve an invalid array specification.  We now skip
this if the component has the ALLOCATABLE or POINTER attribute and the
shape is not deferred.

gcc/fortran/ChangeLog:

PR fortran/98661
* resolve.c (resolve_component): Derived type components with
ALLOCATABLE or POINTER attribute shall have a deferred shape.

gcc/testsuite/ChangeLog:

PR fortran/98661
* gfortran.dg/pr98661.f90: New test.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 3929ddff849..448a2362e95 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -14723,6 +14735,10 @@ resolve_component (gfc_component *c, gfc_symbol *sym)
 && sym != c->ts.u.derived)
 add_dt_to_dt_list (c->ts.u.derived);

+  if (c->as && c->as->type != AS_DEFERRED
+  && (c->attr.pointer || c->attr.allocatable))
+return false;
+
   if (!gfc_resolve_array_spec (c->as,
!(c->attr.pointer || c->attr.proc_pointer
  || c->attr.allocatable)))
diff --git a/gcc/testsuite/gfortran.dg/pr98661.f90 b/gcc/testsuite/gfortran.dg/pr98661.f90
new file mode 100644
index 000..40ddff05d43
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr98661.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! PR fortran/98661 - valgrind issues with error recovery
+!
+! Test issues related to former testcase charlen_03.f90
+program p
+  implicit none
+  type t
+ character(:), pointer :: c(n) ! { dg-error "must have a deferred shape" }
+ real, allocatable :: x(n) ! { dg-error "must have a deferred shape" }
+  end type
+end
+
+subroutine s
+! no 'implicit none'
+  type u
+ character(:), pointer :: c(n) ! { dg-error "must have a deferred shape" }
+ real, allocatable :: x(n) ! { dg-error "must have a deferred shape" }
+  end type
+end


[PATCH] c-family: Improve MEM_REF printing for diagnostics [PR98597]

2021-01-13 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch doesn't actually fix the print_mem_ref bugs; I've kept
it for now as broken as it was.  It does at least improve the cases where
we can unambiguously map back MEM[ + off] into some particular
reference (e.g. something.foo[1].bar etc.).
In the distant past I think we were folding such MEM_REFs back to
COMPONENT_REFs and ARRAY_REFs, but we've stopped doing that.  But for
diagnostics that is what the user actually wants to see IMHO.
So on the attached testcase, instead of printing what is in the left column
it prints what is in the right column:
((int*)t) + 3   t.u.b
((int*)t) + 6   t.u.e.i
((int*)t) + 8   t.v
s + 1   s[1]
Of course, print_mem_ref needs to be also fixed to avoid printing the
nonsense it is printing right now, t is a structure type, so it can't be
cast to int* in C and in C++ only using some user operator, and
the result of what it printed is a pointer, while the uninitialized reads
are int.
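
The record walk can be pictured with a small stand-alone model.  This is
only a sketch of the idea, under loud assumptions: field, describe and
example_layout are hypothetical, the layout below is invented (it is not
the one in uninit-40.c) and assumes 4-byte ints, and unions and arrays
are not modelled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one field: name, byte position within its
// parent, size in bytes, and any nested fields.
struct field
{
  std::string name;
  std::size_t pos;
  std::size_t size;
  std::vector<field> sub;
};

// Map byte offset OFF to a dotted field path, descending into the field
// that contains it.  Returns "" if no field covers OFF.
std::string
describe (const std::vector<field> &fields, std::size_t off)
{
  for (const field &f : fields)
    {
      assert (f.size > 0);
      if (f.pos <= off && off < f.pos + f.size)
	{
	  std::string rest = describe (f.sub, off - f.pos);
	  return rest.empty () ? f.name : f.name + "." + rest;
	}
    }
  return "";
}

// Invented layout, chosen so that offsets 12, 24 and 32 land on u.b,
// u.e.i and v respectively.
std::vector<field>
example_layout ()
{
  return {
    {"p", 0, 4, {}},
    {"q", 4, 4, {}},
    {"u", 8, 20, {{"a", 0, 4, {}},
		  {"b", 4, 4, {}},
		  {"c", 8, 4, {}},
		  {"d", 12, 4, {}},
		  {"e", 16, 4, {{"i", 0, 4, {}}}}}},
    {"w", 28, 4, {}},
    {"v", 32, 4, {}},
  };
}
```

With that layout, an int access at byte offset 12 (that is, index 3 from
the start) comes out as "u.b", so the diagnostic could say t.u.b.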

I was hoping Martin would fix that, but given his comment in the PR I think
I'll fix it myself tomorrow.

Anyway, this patch is useful on its own.  Bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2021-01-13  Jakub Jelinek  

PR tree-optimization/98597
* c-pretty-print.c (c_fold_indirect_ref_for_warn): New function.
(print_mem_ref): Call it.

* gcc.dg/uninit-40.c: New test.

--- gcc/c-family/c-pretty-print.c.jj   2021-01-13 08:02:09.425498954 +0100
+++ gcc/c-family/c-pretty-print.c   2021-01-13 15:02:57.860329998 +0100
@@ -1809,6 +1809,113 @@ pp_c_call_argument_list (c_pretty_printe
   pp_c_right_paren (pp);
 }
 
+/* Try to fold *(type *) into op.fld.fld2[1] if possible.
+   Only used for printing expressions.  Should punt if ambiguous
+   (e.g. in unions).  */
+
+static tree
+c_fold_indirect_ref_for_warn (location_t loc, tree type, tree op,
+ offset_int &off)
+{
+  tree optype = TREE_TYPE (op);
+  if (off == 0)
+{
+  if (lang_hooks.types_compatible_p (optype, type))
+   return op;
+  /* *(foo *) => __real__ complexfoo */
+  else if (TREE_CODE (optype) == COMPLEX_TYPE
+  && lang_hooks.types_compatible_p (type, TREE_TYPE (optype)))
+   return build1_loc (loc, REALPART_EXPR, type, op);
+}
+  /* ((foo*))[1] => __imag__ complexfoo */
+  else if (TREE_CODE (optype) == COMPLEX_TYPE
+  && lang_hooks.types_compatible_p (type, TREE_TYPE (optype))
+  && tree_to_uhwi (TYPE_SIZE_UNIT (type)) == off)
+{
+  off = 0;
+  return build1_loc (loc, IMAGPART_EXPR, type, op);
+}
+  /* ((foo *))[x] => fooarray[x] */
+  if (TREE_CODE (optype) == ARRAY_TYPE
+  && TYPE_SIZE_UNIT (TREE_TYPE (optype))
+  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (optype))) == INTEGER_CST
+  && !integer_zerop (TYPE_SIZE_UNIT (TREE_TYPE (optype))))
+{
+  tree type_domain = TYPE_DOMAIN (optype);
+  tree min_val = size_zero_node;
+  if (type_domain && TYPE_MIN_VALUE (type_domain))
+   min_val = TYPE_MIN_VALUE (type_domain);
+  offset_int el_sz = wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (optype)));
+  offset_int idx = off / el_sz;
+  offset_int rem = off % el_sz;
+  if (TREE_CODE (min_val) == INTEGER_CST)
+   {
+ tree index
+   = wide_int_to_tree (sizetype, idx + wi::to_offset (min_val));
+ op = build4_loc (loc, ARRAY_REF, TREE_TYPE (optype), op, index,
+  NULL_TREE, NULL_TREE);
+ off = rem;
+ if (tree ret = c_fold_indirect_ref_for_warn (loc, type, op, off))
+   return ret;
+ return op;
+   }
+}
+  /* ((foo *)_with_foo_field)[x] => COMPONENT_REF */
+  else if (TREE_CODE (optype) == RECORD_TYPE)
+{
+  for (tree field = TYPE_FIELDS (optype);
+  field; field = DECL_CHAIN (field))
+   if (TREE_CODE (field) == FIELD_DECL
+   && TREE_TYPE (field) != error_mark_node
+   && TYPE_SIZE_UNIT (TREE_TYPE (field))
+   && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (field))) == INTEGER_CST)
+ {
+   tree pos = byte_position (field);
+   if (TREE_CODE (pos) != INTEGER_CST)
+ continue;
+   offset_int upos = wi::to_offset (pos);
+   offset_int el_sz
+ = wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (field)));
+   if (upos <= off && off < upos + el_sz)
+ {
+   tree cop = build3_loc (loc, COMPONENT_REF, TREE_TYPE (field),
+  op, field, NULL_TREE);
+   off = off - upos;
+   if (tree ret = c_fold_indirect_ref_for_warn (loc, type, cop,
+off))
+ return ret;
+   return cop;
+ }
+ }
+}
+  /* Similarly for unions, but in this case try to be very conservative,
+ only match if some field has type compatible with type and it is the
+ only such 

Re: [Ada,FYI] revamp ada.numerics.aux

2021-01-13 Thread Alexandre Oliva
On Jan 13, 2021, Sebastian Huber  wrote:

> This is probably not RTEMS-specific. I guess it is a general riscv issue.

You're right, but the riscv*-linux-gnu configuration already has the
fix.

>> Would you like to take a stab at it yourself, or should I?
> I can try to figure out a patch, but I have to guess how it works.

What's needed is adding

  LIBGNAT_TARGET_PAIRS += \
  a-nallfl.ads
https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


[PATCH] c++: Crash when deducing template arguments [PR98659]

2021-01-13 Thread Marek Polacek via Gcc-patches
maybe_instantiate_noexcept doesn't expect to see error_mark_node, so
the new callsite I introduced in r11-6476 needs to be properly guarded.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/98659
* pt.c (resolve_overloaded_unification): Don't call
maybe_instantiate_noexcept with error_mark_node.

gcc/testsuite/ChangeLog:

PR c++/98659
* g++.dg/template/deduce8.C: New test.
---
 gcc/cp/pt.c |  2 +-
 gcc/testsuite/g++.dg/template/deduce8.C | 21 +
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/deduce8.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 100c35f053c..83ecb0a2c3a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -22382,7 +22382,7 @@ resolve_overloaded_unification (tree tparms,
  --function_depth;
}
 
- if (flag_noexcept_type)
+ if (flag_noexcept_type && fn != error_mark_node)
maybe_instantiate_noexcept (fn, tf_none);
 
  elem = TREE_TYPE (fn);
diff --git a/gcc/testsuite/g++.dg/template/deduce8.C b/gcc/testsuite/g++.dg/template/deduce8.C
new file mode 100644
index 000..430be426689
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/deduce8.C
@@ -0,0 +1,21 @@
+// PR c++/98659
+// { dg-do compile }
+
+template  struct enable_if;
+struct function {
+  template  void operator=(_F);
+};
+struct map {
+  function operator[](int);
+};
+enum { E };
+template  void foo ();
+template 
+typename enable_if::type foo ();
+
+void
+bar ()
+{
+  map m;
+  m[E] = foo;
+}

base-commit: 7d7ef413ef1b696dec2710ae0acc058bdc832686
-- 
2.29.2



[PATCH] c++: Failure to lookup using-decl name [PR98231]

2021-01-13 Thread Marek Polacek via Gcc-patches
In r11-4690 we removed the call to finish_nonmember_using_decl in
tsubst_expr/DECL_EXPR in the USING_DECL block.  This was done not
to perform name lookup twice for a non-dependent using-decl, which
sounds sensible.

However, finish_nonmember_using_decl also pushes the decl's bindings
which we still have to do so that we can find the USING_DECL's name
later.  In this case, we've got a USING_DECL N::operator<<  that we are
tsubstituting.  We already looked it up while parsing the template
"foo", and lookup_using_decl stashed the OVERLOAD it found into
USING_DECL_DECLS.  Now we just have to update the IDENTIFIER_BINDING of
the identifier for operator<< with the overload the name is bound to.

I didn't want to export push_local_binding so I've introduced a new
wrapper.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

PR c++/98231
* name-lookup.c (push_using_decl_bindings): New.
* name-lookup.h (push_using_decl_bindings): Declare.
* pt.c (tsubst_expr): Call push_using_decl_bindings.

gcc/testsuite/ChangeLog:

PR c++/98231
* g++.dg/lookup/using63.C: New test.
---
 gcc/cp/name-lookup.c  | 10 ++
 gcc/cp/name-lookup.h  |  1 +
 gcc/cp/pt.c   |  3 +++
 gcc/testsuite/g++.dg/lookup/using63.C | 17 +
 4 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/lookup/using63.C

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 5078a0706b9..b4b6c0b81b5 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -9279,4 +9279,14 @@ push_operator_bindings ()
}
 }
 
+/* Wrapper around push_local_binding to push the bindings for
+   a non-member USING_DECL DECL that was found during template parsing.  */
+
+void
+push_using_decl_bindings (tree decl)
+{
+  push_local_binding (DECL_NAME (decl), USING_DECL_DECLS (decl),
+ /*using*/true);
+}
+
 #include "gt-cp-name-lookup.h"
diff --git a/gcc/cp/name-lookup.h b/gcc/cp/name-lookup.h
index 7172079b274..bac3fa71fc9 100644
--- a/gcc/cp/name-lookup.h
+++ b/gcc/cp/name-lookup.h
@@ -478,6 +478,7 @@ extern void push_to_top_level (void);
 extern void pop_from_top_level (void);
 extern void maybe_save_operator_binding (tree);
 extern void push_operator_bindings (void);
+extern void push_using_decl_bindings (tree);
 extern void discard_operator_bindings (tree);
 
 /* Lower level interface for modules. */
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 100c35f053c..c27ef6d9fe0 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -18133,6 +18133,9 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
tree scope = USING_DECL_SCOPE (decl);
gcc_checking_assert (scope
 == tsubst (scope, args, complain, in_decl));
+   /* We still need to push the bindings so that we can look up
+  this name later.  */
+   push_using_decl_bindings (decl);
  }
else if (is_capture_proxy (decl)
 && !DECL_TEMPLATE_INSTANTIATION (current_function_decl))
diff --git a/gcc/testsuite/g++.dg/lookup/using63.C b/gcc/testsuite/g++.dg/lookup/using63.C
new file mode 100644
index 000..fd4bf26f1ad
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/using63.C
@@ -0,0 +1,17 @@
+// PR c++/98231
+// { dg-do compile }
+
+template  struct basic_ostream {};
+namespace N {
+  template 
+  void operator<<(basic_ostream, T);
+}
+basic_ostream os;
+
+template void
+foo (T value)
+{
+  using N::operator<<;
+  os << value;
+}
+void bar() { foo (1); }

base-commit: 285fa338b06b804e72997c4d876ecf08a9c083af
-- 
2.29.2



Re: use -mfpu=neon for arm/simd/vmmla_1.c

2021-01-13 Thread Alexandre Oliva
On Jan  5, 2021, Alexandre Oliva  wrote:

> So this patch adds -mfpu=auto to the test, overriding any implicit
> flags with the fpu implied by the arch.

>   * gcc.target/arm/simd/vmmla_1.c: Pass -mfpu=auto.

Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562798.html

(The subject is outdated; it is -mfpu=auto rather than =neon)

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: declare getpass in analyzer/sensitive-1.c test

2021-01-13 Thread Alexandre Oliva
Hello, David,

Long time no see!  I hope you've had a good time during the holidays.

I'd appreciate your review of this and a couple of other analyzer
testsuite patches we've submitted in the past few weeks.

>   * gcc.dg/analyzer/sensitive-1.c: Declare getpass.

https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562493.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562494.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562266.html

Thanks in advance,



Re: Add dg-require-wchars to libstdc++ testsuite

2021-01-13 Thread Alexandre Oliva
On Dec 28, 2020, François Dumont  wrote:

> On 22/12/20 10:12 pm, Alexandre Oliva wrote:
>> Some tests use structures from the libstdc++ that are present only if
>> the target has a wchar.h header.  However, those tests do not check
>> that the target supports those constructs before executing the tests.

> Looks like those tests should be in some sub-folder containing
> 'wchar_t' to be considered as UNSUP.

> Maybe Jonathan will prefer them to be moved even if your approach
> seems more convenient to me.

I'd be glad to make such changes, but I'd appreciate stronger guidance
as to the preferences and the way to go before doing so.  Jonathan,
would you please share your wisdom WRT this patch and the other
wchar_t-related libstdc++ testsuite one?

https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562435.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562438.html

>> The function dg-require-wchars checks that "_GLIBCXX_USE_WCHAR_T" is
>> defined by the configure of the libstdc++.  If it is not the case, the
>> test is not executed.

> This check_v3_target_wchars looks like a good candidate to leverage
> on: v3_check_preprocessor_condition.

Nice, thanks for the tip!  I was not aware of this proc.



Re: [Ada,FYI] revamp ada.numerics.aux

2021-01-13 Thread Sebastian Huber

On 13/01/2021 17:45, Alexandre Oliva wrote:

> Hello, Sebastian,
>
> On Jan 13, 2021, Sebastian Huber  wrote:
>
>> I have a similar issue on riscv-rtems*:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98595
>> Can this be fixed also with a patch to gcc/ada/Makefile.rtl?
>
> Yeah, that's definitely the way to go.
>
> I'm afraid the fix will be a little more involved than others because
> there doesn't seem to be any Makefile.rtl section specific for
> riscv-rtems, but that's just a matter of creating one.

This is probably not RTEMS-specific. I guess it is a general riscv issue.

> Would you like to take a stab at it yourself, or should I?

I can try to figure out a patch, but I have to guess how it works.

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Court of registration: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/



[PATCH] c++: Fix ICE with non-constant satisfaction [PR98644]

2021-01-13 Thread Patrick Palka via Gcc-patches
In the below testcase, the expression of the atomic constraint after
substitution is (int *) NON_LVALUE_EXPR <1> != 0B which is not a C++
constant expression, but its TREE_CONSTANT flag is set (from build2),
so satisfy_atom fails to notice that it's non-constant (and we end
up tripping over the assert in satisfaction_value).

Since TREE_CONSTANT doesn't necessarily correspond to C++ constantness,
this patch makes satisfy_atom instead check is_rvalue_constant_expression.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/10?

gcc/cp/ChangeLog:

PR c++/98644
* constraint.cc (satisfy_atom): Check is_rvalue_constant_expression
instead of TREE_CONSTANT.

gcc/testsuite/ChangeLog:

PR c++/98644
* g++.dg/cpp2a/concepts-pr98644.C: New test.
---
 gcc/cp/constraint.cc  | 2 +-
 gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C | 7 +++
 2 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 9049d087859..f99a25dc8a4 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2969,7 +2969,7 @@ satisfy_atom (tree t, tree args, sat_info info)
 {
   result = maybe_constant_value (result, NULL_TREE,
 /*manifestly_const_eval=*/true);
-  if (!TREE_CONSTANT (result))
+  if (!is_rvalue_constant_expression (result))
result = error_mark_node;
 }
   result = satisfaction_value (result);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C b/gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C
new file mode 100644
index 000..6772f72a3ce
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr98644.C
@@ -0,0 +1,7 @@
+// PR c++/98644
+// { dg-do compile { target c++20 } }
+
+template concept Signed = bool(T(1)); // { dg-error "reinterpret_cast" }
+static_assert(Signed); // { dg-error "non-constant" }
+
+constexpr bool B = requires { requires bool((char *)1); }; // { dg-error "reinterpret_cast" }
-- 
2.30.0



Re: calibrate intervals to avoid zero in futures poll test

2021-01-13 Thread Alexandre Oliva
On Jan  5, 2021, Alexandre Oliva  wrote:

> We get occasional failures of 30_threads/future/members/poll.cc
> on some platforms whose high resolution clock doesn't have such a high
> resolution; wait_for_0 ends up as 0, and then some asserts fail as
> intervals measured as longer than zero are tested for less than
> several times zero.

> This patch adds some calibration in the iteration count to set a
> measurable base time interval with some additional margin.

> Regstrapped on x86_64-linux-gnu, and also tested on
> x-arm-wrs-vxworks7r2.  Ok to install?

Ping?

https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562796.html

> for  libstdc++-v3/ChangeLog

>   * testsuite/30_threads/future/members/poll.cc: Calibrate
>   iteration count.
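
The calibration idea described above can be sketched roughly as follows.
This is only an illustration in Python, not the code from poll.cc (which is
C++); the names calibrate_iterations and CoarseClock are made up for this
example, and the fake clock merely stands in for a target whose "high
resolution" clock advances in coarse steps.

```python
import time


def calibrate_iterations(work, min_interval, clock=time.perf_counter,
                         limit=1 << 30):
    """Double the iteration count until one batch of work() calls takes
    at least min_interval as measured by the given clock."""
    iterations = 1
    while iterations <= limit:
        start = clock()
        for _ in range(iterations):
            work()
        elapsed = clock() - start
        if elapsed >= min_interval:
            # The base interval is now measurable (non-zero).
            return iterations, elapsed
        iterations *= 2
    raise RuntimeError("clock is too coarse to calibrate against")


class CoarseClock:
    """Hypothetical low-resolution clock: advances one unit per 10 ticks."""

    def __init__(self):
        self.ticks = 0

    def tick(self):
        self.ticks += 1

    def now(self):
        return float(self.ticks // 10)


fake = CoarseClock()
result = calibrate_iterations(fake.tick, 2.0, clock=fake.now)
```

With a coarse clock the first few batches measure as zero, which is exactly
the situation that made the asserts fail; the loop keeps growing the batch
until the measured interval is comfortably above zero.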



Re: Add conditional include of vxWorks.h for kernel mode

2021-01-13 Thread Alexandre Oliva
On Dec 29, 2020, Mike Stump  wrote:

> On Dec 22, 2020, at 1:14 PM, Alexandre Oliva  wrote:
>> 
>> In kernel mode, an application must include vxWorks.h before any other
>> system header; this patch adds exactly that to the tests that were
>> failing due to a missing declaration that was found in vxWorks.h.

> I'm inclined to rather have a -include vxWorks.h method where you
> figure out when this should be done and add it to the command line
> flags as necessary, that, or have the gcc includes mechanism
> automatically include the file itself when those conditions are
> present.  Although, yet more possibilities exist, like knowing what
> from that file is necessary, and builtinizing that content so that the
> tests pass anyway.

> Thoughts?

I've looked into some alternatives, but in the end realized that this
patch was only needed for legacy versions of the target system, so we
might as well keep it internal and eventually phase it out.  Patch
withdrawn.  Thanks,



Re: [Ada,FYI] revamp ada.numerics.aux

2021-01-13 Thread Alexandre Oliva
Hello, Sebastian,

On Jan 13, 2021, Sebastian Huber  wrote:

> I have a similar issue on riscv-rtems*:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98595

> Can this be fixed also with a patch to gcc/ada/Makefile.rtl?

Yeah, that's definitely the way to go.

I'm afraid the fix will be a little more involved than others because
there doesn't seem to be any Makefile.rtl section specific for
riscv-rtems, but that's just a matter of creating one.

Would you like to take a stab at it yourself, or should I?



[PATCH][pushed] gcc-changelog: Support multiline parentheses wrapping

2021-01-13 Thread Martin Liška

Hello.

This patch adds support for ChangeLog entries like the following:

gcc/ChangeLog:

* config/i386/i386.md (*fix_trunc_i387_1, *add3_eq,
*add3_ne, *add3_eq_0, *add3_ne_0, *add3_eq,
*fist2__1, *3_1, *di3_doubleword):
Use ix86_pre_reload_split instead of can_create_pseudo_p in condition.
* config/i386/sse.md
(*fix_trunc_i387_1, *add3_eq,
*add3_ne, *add3_eq_0, *add3_ne_0, *add3_eq,
*fist2__1): This should also work.

I checked the last 1000 git commits and found 3 valid violations.
The generated ChangeLog entries are identical.
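
The core idea, tracking how many parentheses are still open across the lines
of one ChangeLog entry, can be sketched as below.  This is a simplified
standalone illustration, not the code in git_commit.py; the function name
check_parentheses and its return value are made up for this example.

```python
def check_parentheses(lines):
    """Scan the lines of one ChangeLog entry.

    Returns (bad_lines, still_open): the lines containing a ')' that has
    no matching '(', and the number of '(' left unclosed at the end.
    """
    opened = 0
    bad_lines = []
    for line in lines:
        for c in line:
            if c == '(':
                opened += 1
            elif c == ')':
                if opened == 0:
                    # Closing parenthesis without a matching opener.
                    bad_lines.append(line)
                else:
                    opened -= 1
    return bad_lines, opened
```

While the open count is non-zero, a following line that starts with '*' is a
continuation of the function list rather than a new file entry, which is why
the "one space should follow asterisk" check can be skipped for it.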

Martin

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Support wrapping of functions
in parentheses that can take multiple lines.
* gcc-changelog/test_email.py: Add tests for it.
* gcc-changelog/test_patches.txt: Add 2 patches.
---
 contrib/gcc-changelog/git_commit.py| 32 -
 contrib/gcc-changelog/test_email.py|  8 
 contrib/gcc-changelog/test_patches.txt | 62 ++
 3 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index 59f478670d7..e9dae0a838d 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -214,6 +214,7 @@ class ChangeLogEntry:
 self.lines = []
 self.files = []
 self.file_patterns = []
+self.opened_parentheses = 0
 
 def parse_file_names(self):

 # Whether the content currently processed is between a star prefix the
@@ -223,8 +224,14 @@ class ChangeLogEntry:
 for line in self.lines:
 # If this line matches the star prefix, start the location
 # processing on the information that follows the star.
+# Note that we need to skip macro names that can be in form of:
+#
+# * config/i386/i386.md (*fix_trunc_i387_1,
+# *add3_ne, *add3_eq_0, *add3_ne_0,
+# *fist2__1, *3_1):
+#
 m = star_prefix_regex.match(line)
-if m:
+if m and len(m.group('spaces')) == 1:
 in_location = True
 line = m.group('content')
 
@@ -328,6 +335,7 @@ class GitCommit:

 self.parse_changelog()
 self.parse_file_names()
 self.check_for_empty_description()
+self.check_for_broken_parentheses()
 self.deduce_changelog_locations()
 self.check_file_patterns()
 if not self.errors:
@@ -496,7 +504,8 @@ class GitCommit:
 else:
 m = star_prefix_regex.match(line)
 if m:
-if len(m.group('spaces')) != 1:
+if (len(m.group('spaces')) != 1 and
+last_entry.opened_parentheses == 0):
 msg = 'one space should follow asterisk'
 self.errors.append(Error(msg, line))
 else:
@@ -508,6 +517,7 @@ class GitCommit:
 msg = f'empty group "{needle}" found'
 self.errors.append(Error(msg, line))
 last_entry.lines.append(line)
+self.process_parentheses(last_entry, line)
 else:
 if last_entry.is_empty:
 msg = 'first line should start with a tab, ' \
@@ -515,6 +525,18 @@ class GitCommit:
 self.errors.append(Error(msg, line))
 else:
 last_entry.lines.append(line)
+self.process_parentheses(last_entry, line)
+
+def process_parentheses(self, last_entry, line):
+for c in line:
+if c == '(':
+last_entry.opened_parentheses += 1
+elif c == ')':
+if last_entry.opened_parentheses == 0:
+msg = 'bad wrapping of parenthesis'
+self.errors.append(Error(msg, line))
+else:
+last_entry.opened_parentheses -= 1
 
 def parse_file_names(self):

 for entry in self.changelog_entries:
@@ -538,6 +560,12 @@ class GitCommit:
 msg = 'missing description of a change'
 self.errors.append(Error(msg, line))
 
+def check_for_broken_parentheses(self):

+for entry in self.changelog_entries:
+if entry.opened_parentheses != 0:
+msg = 'bad parentheses wrapping'
+self.errors.append(Error(msg, entry.lines[0]))
+
 def get_file_changelog_location(self, changelog_file):
 for file in self.info.modified_files:
 if file[0] == changelog_file:
diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index 

[PATCH, Fortran] PR fortran/93524 - ISO_Fortran_binding signed char arrays

2021-01-13 Thread Harris Snyder
On Wed, Jan 13, 2021 at 1:34 AM Harris Snyder  wrote:
>
> Hi everyone,
>
> Sorry, my previous email erroneously referred to unsigned chars / uint8_t,
> when I in fact meant signed chars / int8_t. The actual patch works, but the
> test case files have ‘uint’ in the file names. I’ll provide a corrected patch
> tomorrow to fix the file names.
>
> Harris

Corrected patch below.

OK for master? I don't have write access so I will need someone else
to commit this for me if possible.

Thanks,
Harris Snyder


Fixes a bug in ISO_Fortran_binding.c whereby signed char or int8_t
arrays would cause crashes unless an element size is specified.
Related to PR fortran/93524.

libgfortran/ChangeLog:

* runtime/ISO_Fortran_binding.c (CFI_establish): fixed signed char arrays.

gcc/testsuite/ChangeLog:

* gfortran.dg/iso_fortran_binding_int8_array.f90: New test.
* gfortran.dg/iso_fortran_binding_int8_array_driver.c: New test.

diff --git a/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array.f90 b/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array.f90
new file mode 100644
index 000..31fdf863e0a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array.f90
@@ -0,0 +1,11 @@
+! { dg-do run }
+! { dg-additional-sources iso_fortran_binding_int8_array_driver.c }
+
+module m
+   use iso_c_binding
+contains
+   subroutine fsub( x ) bind(C, name="fsub")
+  integer(c_int8_t), intent(inout) :: x(:)
+  x = x+1
+   end subroutine
+end module
diff --git a/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array_driver.c b/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array_driver.c
new file mode 100644
index 000..84ed127d525
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/iso_fortran_binding_int8_array_driver.c
@@ -0,0 +1,25 @@
+#include 
+#include 
+#include 
+#include "ISO_Fortran_binding.h"
+
+extern void fsub(CFI_cdesc_t *);
+
+int main(void)
+{
+   int8_t x[] = {1,2,3,4};
+   int N = sizeof(x)/sizeof(x[0]);
+
+   CFI_CDESC_T(1) dat;
+   CFI_index_t ext[1];
+   ext[0] = (CFI_index_t)N;
+   int rc = CFI_establish((CFI_cdesc_t *), , CFI_attribute_other,
+  CFI_type_int8_t, 0, (CFI_rank_t)1, ext);
+   printf("CFI_establish call returned: %d\n", rc);
+
+   fsub((CFI_cdesc_t *) );
+
+   for (int i=0; ibase_addr = base_addr;

   if (type == CFI_type_char || type == CFI_type_ucs4_char ||
-  type == CFI_type_signed_char || type == CFI_type_struct ||
-  type == CFI_type_other)
+  type == CFI_type_struct || type == CFI_type_other)
 dv->elem_len = elem_len;
   else
 {


Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-13 Thread Richard Biener
On Wed, 13 Jan 2021, Qing Zhao wrote:

> 
> 
> > On Jan 13, 2021, at 9:10 AM, Richard Biener  wrote:
> > 
> > On Wed, 13 Jan 2021, Qing Zhao wrote:
> > 
> >> 
> >> 
> >>> On Jan 13, 2021, at 1:39 AM, Richard Biener  wrote:
> >>> 
> >>> On Tue, 12 Jan 2021, Qing Zhao wrote:
> >>> 
> >>>> Hi,
> >>>>
> >>>> Just check in to see whether you have any comments and suggestions on
> >>>> this:
> >>>>
> >>>> FYI, I have been continue with Approach D implementation since last week:
> >>>>
> >>>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the
> >>>> .DEFFERED_INIT during expand to
> >>>> real initialization. Adjusting uninitialized pass with the new refs with
> >>>> “.DEFFERED_INIT”.
> >>>>
> >>>> For the remaining work of Approach D:
> >>>>
> >>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >>>> ** complete the implementation of uninitialized warnings maintenance
> >>>> work for D.
> >>>>
> >>>> I have completed the uninitialized warnings maintenance work for D.
> >>>> And finished partial of the -ftrivial-auto-var-init=pattern
> >>>> implementation.
> >>>>
> >>>> The following are remaining work of Approach D:
> >>>>
> >>>>  ** -ftrivial-auto-var-init=pattern for VLA;
> >>>>  **add a new attribute for variable:
> >>>> __attribute((uninitialized)
> >>>> the marked variable is uninitialized intentionaly for performance
> >>>> purpose.
> >>>>  ** adding complete testing cases;
> >>>>
> >>>>
> >>>> Please let me know if you have any objection on my current decision on
> >>>> implementing approach D.
> >>> 
> >>> Did you do any analysis on how stack usage and code size are changed 
> >>> with approach D?
> >> 
> >> I did the code size change comparison (I will provide the data in another 
> >> email). And with this data, D works better than A in general. (This is 
> >> surprise to me actually).
> >> 
> >> But not the stack usage.  Not sure how to collect the stack usage data, 
> >> do you have any suggestion on this?
> > 
> > There is -fstack-usage you could use, then of course watching
> > the stack segment at runtime.
> 
> I can do this for CPU2017 to collect the stack usage data and report back.
> 
> >  I'm mostly concerned about
> > stack-limited "processes" such as the linux kernel which I think
> > is a primary target of your work.
> 
> I don’t have any experience on building linux kernel. 
> Do we have to collect data for linux kernel at this time? Is CPU2017 data not 
> enough?

Well, it depends on the desired target.  The linux kernel has an
8kb hard stack limit for kernel threads on x86_64 (IIRC).  You
don't have to do anything, it was just a suggestion.  For normal
programs, stack usage is probably the least important problem.

Richard.

> Qing
> > 
> > Richard.
> > 
> >> 
> >>> How does compile-time behave (we could gobble up
> >>> lots of .DEFERRED_INIT calls I guess)?
> >> I can collect this data too and report it later.
> >> 
> >> Thanks.
> >> 
> >> Qing
> >>> 
> >>> Richard.
> >>> 
> >>>> Thanks a lot for your help.
> >>>>
> >>>> Qing
> >>>>
> >>>>
> >>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches
> >>>>>  wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> This is an update for our previous discussion.
> >>>>>
> >>>>> 1. I implemented the following two different implementations in the
> >>>>> latest upstream gcc:
> >>>>>
> >>>>> A. Adding real initialization during gimplification, not maintain the
> >>>>> uninitialized warnings.
> >>>>>
> >>>>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the
> >>>>> .DEFFERED_INIT during expand to
> >>>>> real initialization. Adjusting uninitialized pass with the new refs
> >>>>> with “.DEFFERED_INIT”.
> >>>>>
> >>>>> Note, in this initial implementation,
> >>>>> ** I ONLY implement -ftrivial-auto-var-init=zero, the
> >>>>> implementation of -ftrivial-auto-var-init=pattern
> >>>>>    is not done yet.  Therefore, the performance data is only
> >>>>> about -ftrivial-auto-var-init=zero.
> >>>>>
> >>>>> ** I added an temporary  option
> >>>>> -fauto-var-init-approach=A|B|C|D  to choose implementation A or D for
> >>>>>    runtime performance study.
> >>>>> ** I didn’t finish the uninitialized warnings maintenance work
> >>>>> for D. (That might take more time than I expected).
> >>>>>
> >>>>> 2. I collected runtime data for CPU2017 on a x86 machine with this new
> >>>>> gcc for the following 3 cases:
> >>>>>
> >>>>> no: default. (-g -O2 -march=native )
> >>>>> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
> >>>>> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
> >>>>>
> >>>>> And then compute the slowdown data for both A and D as following:
> >>>>>
> >>>>> benchmarks         A / no    D / no
> >>>>>
> >>>>> 500.perlbench_r     1.25%     1.25%
> >>>>> 502.gcc_r           0.68%     1.80%
> >>>>> 505.mcf_r           0.68%     0.14%
> >>>>> 520.omnetpp_r       4.83%     4.68%
> 

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-13 Thread Qing Zhao via Gcc-patches



> On Jan 13, 2021, at 9:10 AM, Richard Biener  wrote:
> 
> On Wed, 13 Jan 2021, Qing Zhao wrote:
> 
>> 
>> 
>>> On Jan 13, 2021, at 1:39 AM, Richard Biener  wrote:
>>> 
>>> On Tue, 12 Jan 2021, Qing Zhao wrote:
>>> 
>>>> Hi,
>>>>
>>>> Just check in to see whether you have any comments and suggestions on this:
>>>>
>>>> FYI, I have been continue with Approach D implementation since last week:
>>>>
>>>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the
>>>> .DEFFERED_INIT during expand to
>>>> real initialization. Adjusting uninitialized pass with the new refs with
>>>> “.DEFFERED_INIT”.
>>>>
>>>> For the remaining work of Approach D:
>>>>
>>>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>>>> ** complete the implementation of uninitialized warnings maintenance work
>>>> for D.
>>>>
>>>> I have completed the uninitialized warnings maintenance work for D.
>>>> And finished partial of the -ftrivial-auto-var-init=pattern
>>>> implementation.
>>>>
>>>> The following are remaining work of Approach D:
>>>>
>>>>  ** -ftrivial-auto-var-init=pattern for VLA;
>>>>  **add a new attribute for variable:
>>>> __attribute((uninitialized)
>>>> the marked variable is uninitialized intentionaly for performance purpose.
>>>>  ** adding complete testing cases;
>>>>
>>>>
>>>> Please let me know if you have any objection on my current decision on
>>>> implementing approach D.
>>> 
>>> Did you do any analysis on how stack usage and code size are changed 
>>> with approach D?
>> 
>> I did the code size change comparison (I will provide the data in another 
>> email). And with this data, D works better than A in general. (This is 
>> surprise to me actually).
>> 
>> But not the stack usage.  Not sure how to collect the stack usage data, 
>> do you have any suggestion on this?
> 
> There is -fstack-usage you could use, then of course watching
> the stack segment at runtime.

I can do this for CPU2017 to collect the stack usage data and report back.
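
As a rough sketch of how the -fstack-usage output could be summarized: the
snippet below assumes each line of a .su file holds three tab-separated
fields (function identifier, frame size in bytes, qualifiers such as
"static" or "dynamic,bounded"), which is my reading of the documented
format.  The helper names are hypothetical, and the sample lines in the
usage note are invented, not measured data.

```python
def parse_stack_usage(lines):
    """Parse lines in the assumed .su layout into (name, bytes, qualifiers)."""
    records = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip anything that does not look like "name<TAB>size<TAB>qualifiers".
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        records.append((fields[0], int(fields[1]), fields[2]))
    return records


def largest_frames(records, top=3):
    """Return the `top` records with the biggest frame size, largest first."""
    return sorted(records, key=lambda r: r[1], reverse=True)[:top]
```

Feeding it something like ["a.c:1:5:main\t32\tstatic\n",
"a.c:9:6:big\t4112\tdynamic,bounded\n"] and comparing the largest frames
between the "no", A and D builds would give a first impression of how the
approaches differ in stack usage.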

>  I'm mostly concerned about
> stack-limited "processes" such as the linux kernel which I think
> is a primary target of your work.

I don’t have any experience building the linux kernel.
Do we have to collect data for the linux kernel at this time? Is the CPU2017
data not enough?

Qing
> 
> Richard.
> 
>> 
>>> How does compile-time behave (we could gobble up
>>> lots of .DEFERRED_INIT calls I guess)?
>> I can collect this data too and report it later.
>> 
>> Thanks.
>> 
>> Qing
>>> 
>>> Richard.
>>> 
>>>> Thanks a lot for your help.
>>>>
>>>> Qing
>>>>
>>>>
>>>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches
>>>>>  wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> This is an update for our previous discussion.
>>>>>
>>>>> 1. I implemented the following two different implementations in the
>>>>> latest upstream gcc:
>>>>>
>>>>> A. Adding real initialization during gimplification, not maintain the
>>>>> uninitialized warnings.
>>>>>
>>>>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the
>>>>> .DEFFERED_INIT during expand to
>>>>> real initialization. Adjusting uninitialized pass with the new refs with
>>>>> “.DEFFERED_INIT”.
>>>>>
>>>>> Note, in this initial implementation,
>>>>>   ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of
>>>>> -ftrivial-auto-var-init=pattern
>>>>>  is not done yet.  Therefore, the performance data is only about
>>>>> -ftrivial-auto-var-init=zero.
>>>>>
>>>>>   ** I added an temporary  option -fauto-var-init-approach=A|B|C|D  to
>>>>> choose implementation A or D for
>>>>>  runtime performance study.
>>>>>   ** I didn’t finish the uninitialized warnings maintenance work for D.
>>>>> (That might take more time than I expected).
>>>>>
>>>>> 2. I collected runtime data for CPU2017 on a x86 machine with this new
>>>>> gcc for the following 3 cases:
>>>>>
>>>>> no: default. (-g -O2 -march=native )
>>>>> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A
>>>>> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D
>>>>>
>>>>> And then compute the slowdown data for both A and D as following:
>>>>>
>>>>> benchmarks         A / no    D / no
>>>>>
>>>>> 500.perlbench_r     1.25%     1.25%
>>>>> 502.gcc_r           0.68%     1.80%
>>>>> 505.mcf_r           0.68%     0.14%
>>>>> 520.omnetpp_r       4.83%     4.68%
>>>>> 523.xalancbmk_r     0.18%     1.96%
>>>>> 525.x264_r          1.55%     2.07%
>>>>> 531.deepsjeng_r    11.57%    11.85%
>>>>> 541.leela_r         0.64%     0.80%
>>>>> 557.xz_r           -0.41%    -0.41%
>>>>>
>>>>> 507.cactuBSSN_r     0.44%     0.44%
>>>>> 508.namd_r          0.34%     0.34%
>>>>> 510.parest_r        0.17%     0.25%
>>>>> 511.povray_r       56.57%    57.27%
>>>>> 519.lbm_r           0.00%     0.00%
>>>>> 521.wrf_r          -0.28%    -0.37%
>>>>> 526.blender_r      16.96%    17.71%
>>>>> 527.cam4_r          0.70%     0.53%
>>>>> 538.imagick_r       2.40%

Re: [PATCH, OpenMP 5.0] More implementation of the requires directive

2021-01-13 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 13, 2021 at 11:07:44PM +0800, Chung-Lin Tang wrote:
> 2021-01-13  Chung-Lin Tang  

...
Looks mostly ok, with some nits.

>   * parse.c ("tree.h"): Add include.
>   ("omp-general.h"): Likewise.

I think the usual way is to write:
* parse.c: Include "tree.h" and "omp-general.h".
(gfc_parse_file): Add code to merge omp_requires to omp_requires_mask.

Something I miss in the patch is that for the device API calls (I'd bother
only with direct calls) we should probably set OMP_REQUIRES_TARGET_USED
too.  Perhaps do that during gimplification: if flag_openmp is set and
gimplify_call_expr sees an fndecl whose DECL_NAME is non-NULL and starts
with "omp_", look at its DECL_ASSEMBLER_NAME and compare that to a
selected list of device APIs.

Also, would it be possible to diagnose .gnu.gomp_requires mismatches also
at link time through the linker plugin/mkoffload?
At least if we have LTO offloading bytecode in and the plugin is doing
something...

> +  if (flag_openmp && (omp_requires_mask & OMP_REQUIRES_TARGET_USED) != 0)
> + {
> +   const char *requires_section = ".gnu.gomp_requires";
> +   tree maskvar = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +  get_identifier (".gomp_requires_mask"),
> +  unsigned_type_node);
> +   SET_DECL_ALIGN (maskvar, TYPE_ALIGN (unsigned_type_node));
> +   TREE_STATIC (maskvar) = 1;
> +   DECL_INITIAL (maskvar)
> + = build_int_cst (unsigned_type_node,
> +  ((unsigned int) omp_requires_mask
> +   & (OMP_REQUIRES_UNIFIED_ADDRESS
> +  | OMP_REQUIRES_UNIFIED_SHARED_MEMORY
> +  | OMP_REQUIRES_REVERSE_OFFLOAD)));
> +   set_decl_section_name (maskvar, requires_section);

This probably needs to sorry if the target doesn't support named sections.
We probably don't support LTO in that case either though.

Also, the diagnostic of the mismatches on the library side should print
details, say that libfoobar is #pragma omp requires unified_shared_memory
while libbar is not.

Jakub



Re: [PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2021-01-13 Thread Chung-Lin Tang

Ping x2.

Hi Jakub, I would like this part of OpenMP 5.0 to be considered for GCC 11.

Thanks,
Chung-Lin

On 2020/12/14 6:32 PM, Chung-Lin Tang wrote:

Ping.

On 2020/12/4 10:15 PM, Chung-Lin Tang wrote:

Hi Jakub,
this is a new version of the structure element mapping patch for OpenMP 5.0
requirement changes.

This one uses the approach you've outlined in your concept patch [1], basically
to use more special REFCOUNT_* values to mark them, and link following structure
element splay_tree_keys back to the first key's refcount.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557622.html

Implementation notes of the attached patch:

(1) This patch solves the 5.0 requirements of "not already
incremented/decremented because of the effect of a map clause on the
construct" by pulling in libgomp/hashtab.h and using htab_t as a pointer set.
A "htab_t *refcount_set" is added in map/unmap routines to track the
processing status of the uintptr_t* addresses of refcount fields in
splay_tree_keys.

    * Currently this patch is using the same htab_create/htab_free routines
      as in task.c.  I toyed with creating a 'htab_alloca' macro (allocating a
      fixed size htab) to speed things up further, but decided to play it safer
      for the current patch.

(2) Because pointers to refcounts are used as the basis, and structure
element siblings all share the same refcount, uniform increment/decrement
without repeating is also naturally achieved.

(3) Because of the need to remove whole structure element sibling sequences
out of context, it appears we need to mark the first/last of such a sequence.
You'll see that the special REFCOUNT_* values have been expanded a bit more
than in your concept patch (at some point we should think about no longer
abusing it and adding a proper flags word).

(4) The new increment/decrement routines combine most of the new refcount_set
lookup code with the refcount adjusting. For the decrement routine, "copy" and
"removal" are now separate return values, since for structure element
sequences, even when signalling "removal" you may still need to finish the
"copy" work of following target_var_descs.

(5) There are some re-organizing changes to oacc-parallel.c and oacc-mem.c,
but most of the code that matters is in target.c.

(6) New testcases have been added to reflect the cases discussed on the
omp-lang list.
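
The refcount_set idea from notes (1) and (2) can be illustrated with a small
toy model. This is only a sketch in Python, not libgomp code: Refcount,
MapEntry and increment_once are hypothetical names, and a Python set keyed on
object identity stands in for the htab_t pointer set described above.

```python
class Refcount:
    """Shared counter; stands in for the uintptr_t refcount field."""
    def __init__(self):
        self.value = 0


class MapEntry:
    """Toy stand-in for a splay_tree_key; siblings share one Refcount."""
    def __init__(self, name, refcount):
        self.name = name
        self.refcount = refcount


def increment_once(entry, refcount_set):
    """Increment entry's refcount unless it was already adjusted
    by an earlier map clause on the same construct."""
    key = id(entry.refcount)      # object identity plays the role of the pointer
    if key in refcount_set:
        return False              # already incremented on this construct
    refcount_set.add(key)
    entry.refcount.value += 1
    return True


# Three structure elements mapped as siblings share a single refcount.
shared = Refcount()
siblings = [MapEntry(n, shared) for n in ("s.a", "s.b", "s.c")]
other = MapEntry("x", Refcount())

refcount_set = set()              # one set per construct
results = [increment_once(e, refcount_set) for e in siblings + [other]]
```

Under these assumptions the shared refcount is bumped once for the whole
sibling sequence, no matter how many of its elements appear in map clauses.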

This patch has been tested for libgomp with no regressions on x86_64-linux
with nvptx offloading. Since I submitted the first "v1" patch long ago, can
this be considered committable now, after approval?

Thanks,
Chung-Lin

2020-12-04  Chung-Lin Tang  

 libgomp/
 * hashtab.h (htab_clear): New function with initialization code
 factored out from...
 (htab_create): ...here, adjust to use htab_clear function.

 * libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
 special refcount values, add comments.
 (REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
 (REFCOUNT_LINK): Likewise.
 (REFCOUNT_STRUCTELEM): New special refcount range for structure
 element siblings.
 (REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
 sibling maps.
 (REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
 (REFCOUNT_STRUCTELEM_FLAG_LAST):  Flag to indicate last sibling.
 (REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
 (REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
 (struct splay_tree_key_s): Add structelem_refcount and
 structelem_refcount_ptr fields into a union with dynamic_refcount.
 Add comments.
 (gomp_map_vars): Delete declaration.
 (gomp_map_vars_async): Likewise.
 (gomp_unmap_vars): Likewise.
 (gomp_unmap_vars_async): Likewise.
 (goacc_map_vars): New declaration.
 (goacc_unmap_vars): Likewise.

 * oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
 (goacc_enter_datum): Likewise.
 (goacc_enter_data_internal): Likewise.
 * oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
 and goacc_unmap_vars.
 (GOACC_data_start): Adjust to use goacc_map_vars.
 (GOACC_data_end): Adjust to use goacc_unmap_vars.

 * target.c (hash_entry_type): New typedef.
 (htab_alloc): New function hook for hashtab.h.
 (htab_free): Likewise.
 (htab_hash): Likewise.
 (htab_eq): Likewise.
 (hashtab.h): Add file include.
 (gomp_increment_refcount): New function.
 (gomp_decrement_refcount): Likewise.
 (gomp_map_vars_existing): Add refcount_set parameter, adjust to use
 gomp_increment_refcount.
 (gomp_map_fields_existing): Add refcount_set parameter, adjust calls
 to gomp_map_vars_existing.

 (gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
 variable to guard OpenMP specific paths, adjust calls to
 gomp_map_vars_existing, add structure element sibling splay_tree_key
 sequence creation code, adjust Fortran map case to 

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-13 Thread Richard Biener
On Wed, 13 Jan 2021, Qing Zhao wrote:

> 
> 
> > On Jan 13, 2021, at 1:39 AM, Richard Biener  wrote:
> > 
> > On Tue, 12 Jan 2021, Qing Zhao wrote:
> > 
> >> Hi, 
> >> 
> >> Just check in to see whether you have any comments and suggestions on this:
> >> 
> >> FYI, I have been continuing with the Approach D implementation since last week:
> >> 
> >> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the 
> >> .DEFFERED_INIT during expand to
> >> real initialization. Adjusting uninitialized pass with the new refs with 
> >> “.DEFFERED_INIT”.
> >> 
> >> For the remaining work of Approach D:
> >> 
> >> ** complete the implementation of -ftrivial-auto-var-init=pattern;
> >> ** complete the implementation of uninitialized warnings maintenance work 
> >> for D. 
> >> 
> >> I have completed the uninitialized warnings maintenance work for D,
> >> and finished part of the -ftrivial-auto-var-init=pattern 
> >> implementation. 
> >> 
> >> The following are remaining work of Approach D:
> >> 
> >>   ** -ftrivial-auto-var-init=pattern for VLA;
> >>   ** add a new attribute for variable:
> >>      __attribute__((uninitialized))
> >>      the marked variable is intentionally left uninitialized for performance purposes.
> >>   ** adding complete testing cases;
> >> 
> >> 
> >> Please let me know if you have any objection on my current decision on 
> >> implementing approach D. 
> > 
> > Did you do any analysis on how stack usage and code size are changed 
> > with approach D?
> 
> I did the code size change comparison (I will provide the data in another 
> email). With this data, D works better than A in general. (This is actually 
> a surprise to me.)
> 
> I have not measured the stack usage yet, though.  I am not sure how to 
> collect the stack usage data; do you have any suggestions?

There is -fstack-usage, which you could use, and then of course you can
watch the stack segment at runtime.  I'm mostly concerned about
stack-limited "processes" such as the Linux kernel, which I think
is a primary target of your work.
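
For comparing the two approaches, the per-function .su files written by
-fstack-usage could be post-processed with a few lines of script. The sketch
below assumes the usual three tab-separated fields per line (function
identifier, size in bytes, qualifier); parse_su and largest_frames are
hypothetical helper names and the sample data is made up.

```python
def parse_su(text):
    """Parse -fstack-usage output into (function, bytes, qualifier) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.rstrip().split("\t")
        if len(fields) != 3:
            continue              # skip anything not in the assumed layout
        name, size, qualifier = fields
        entries.append((name, int(size), qualifier))
    return entries


def largest_frames(entries, count=3):
    """Return the `count` entries with the largest stack usage."""
    return sorted(entries, key=lambda e: e[1], reverse=True)[:count]


# Made-up sample in the assumed .su layout.
sample = (
    "foo.c:3:5:small\t16\tstatic\n"
    "foo.c:9:6:big_buffer\t4112\tstatic\n"
    "foo.c:20:6:uses_vla\t48\tdynamic\n"
)
top = largest_frames(parse_su(sample), count=2)
```

Running this over builds with and without -ftrivial-auto-var-init would show
which functions' frames grow, if any.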

Richard.

> 
> > How does compile-time behave (we could gobble up
> > lots of .DEFERRED_INIT calls I guess)?
> I can collect this data too and report it later.
> 
> Thanks.
> 
> Qing
> > 
> > Richard.
> > 
> >> Thanks a lot for your help.
> >> 
> >> Qing
> >> 
> >> 
> >>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches 
> >>>  wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> This is an update for our previous discussion. 
> >>> 
> >>> 1. I implemented the following two different implementations in the 
> >>> latest upstream gcc:
> >>> 
> >>> A. Adding real initialization during gimplification, not maintain the 
> >>> uninitialized warnings.
> >>> 
> >>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the 
> >>> .DEFFERED_INIT during expand to
> >>> real initialization. Adjusting uninitialized pass with the new refs with 
> >>> “.DEFFERED_INIT”.
> >>> 
> >>> Note, in this initial implementation,
> >>>   ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of 
> >>> -ftrivial-auto-var-init=pattern 
> >>>  is not done yet.  Therefore, the performance data is only about 
> >>> -ftrivial-auto-var-init=zero. 
> >>> 
> >>>   ** I added a temporary option -fauto-var-init-approach=A|B|C|D to 
> >>> choose implementation A or D for 
> >>>  runtime performance study.
> >>>   ** I didn’t finish the uninitialized warnings maintenance work for D. 
> >>> (That might take more time than I expected). 
> >>> 
> >>> 2. I collected runtime data for CPU2017 on an x86 machine with this new 
> >>> gcc for the following 3 cases:
> >>> 
> >>> no: default. (-g -O2 -march=native )
> >>> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
> >>> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
> >>> 
> >>> And then computed the slowdown data for both A and D as follows:
> >>> 
> >>> benchmarks        A / no    D / no
> >>> 
> >>> 500.perlbench_r    1.25%     1.25%
> >>> 502.gcc_r          0.68%     1.80%
> >>> 505.mcf_r          0.68%     0.14%
> >>> 520.omnetpp_r      4.83%     4.68%
> >>> 523.xalancbmk_r    0.18%     1.96%
> >>> 525.x264_r         1.55%     2.07%
> >>> 531.deepsjeng_r   11.57%    11.85%
> >>> 541.leela_r        0.64%     0.80%
> >>> 557.xz_r          -0.41%    -0.41%
> >>> 
> >>> 507.cactuBSSN_r    0.44%     0.44%
> >>> 508.namd_r         0.34%     0.34%
> >>> 510.parest_r       0.17%     0.25%
> >>> 511.povray_r      56.57%    57.27%
> >>> 519.lbm_r          0.00%     0.00%
> >>> 521.wrf_r         -0.28%    -0.37%
> >>> 526.blender_r     16.96%    17.71%
> >>> 527.cam4_r         0.70%     0.53%
> >>> 538.imagick_r      2.40%     2.40%
> >>> 544.nab_r          0.00%    -0.65%
> >>> 
> >>> avg                5.17%     5.37%
> >>> 
> >>> From the above data, we can see that in general, the runtime performance 
> >>> slowdowns for 
> >>> implementations A and D are similar for individual benchmarks.
> >>> 
> >>> There are several 

[PATCH, OpenMP 5.0] More implementation of the requires directive

2021-01-13 Thread Chung-Lin Tang

Hi Jakub,
this patch provides more implementation of the requires directive, basically:

(1) The collection of the reverse_offload, unified_address, and
unified_shared_memory clauses into a .gnu.gomp_requires section

(2) libgomp checking of consistency across the entire .gnu.gomp_requires
section, and querying into the offload plugin to see if the offload target
supports the required features (as of now, the setting is that none of those
features are supported by any of the plugins).

We currently emit errors, but do not fatally exit the program if those
requirements are not met. We're still unsure whether completely blocking
program execution is the right thing for the user. This can be discussed later.
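
As a rough illustration of the check described in (2), here is a toy model in
Python rather than the actual libgomp C code. The bit values, the names
check_requires and describe, and the error wording are all assumptions made
for this sketch; the real GOMP_REQUIRES_* values are defined in
include/gomp-constants.h by the patch and may differ.

```python
# Hypothetical bit values, chosen only for this sketch.
UNIFIED_ADDRESS = 0x1
UNIFIED_SHARED_MEMORY = 0x2
REVERSE_OFFLOAD = 0x4

NAMES = {
    UNIFIED_ADDRESS: "unified_address",
    UNIFIED_SHARED_MEMORY: "unified_shared_memory",
    REVERSE_OFFLOAD: "reverse_offload",
}


def describe(mask):
    """Render a mask as a readable clause list."""
    names = [name for bit, name in NAMES.items() if mask & bit]
    return ", ".join(names) if names else "(none)"


def check_requires(masks, supported):
    """Return a list of error strings; an empty list means all checks passed.

    masks: one integer per entry collected in the requires section.
    supported: mask of features the offload plugin reports as supported.
    """
    errors = []
    if not masks:
        return errors
    first = masks[0]
    for index, mask in enumerate(masks[1:], start=1):
        if mask != first:
            errors.append(
                f"entry {index} requires {describe(mask)} "
                f"but entry 0 requires {describe(first)}"
            )
    missing = first & ~supported
    if missing:
        errors.append(f"device does not support {describe(missing)}")
    return errors


# One unit uses unified_shared_memory, another does not; no plugin support.
errors = check_requires([UNIFIED_SHARED_MEMORY, 0], supported=0)
```

Returning the messages instead of aborting mirrors the current behaviour of
reporting errors without terminating the program.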

Is this okay for trunk after stage1 re-opens?

Thanks,
Chung-Lin

2021-01-13  Chung-Lin Tang  

gcc/c/
* c-parser.c (c_parser_declaration_or_fndef): Set
OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
"omp declare target" attribute.
(c_parser_omp_target_data): Set OMP_REQUIRES_TARGET_USED in
omp_requires_mask.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_requires): Adjust to only mention "not implemented yet"
for OMP_REQUIRES_DYNAMIC_ALLOCATORS.

gcc/cp/
* parser.c (cp_parser_simple_declaration): Set
OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
"omp declare target" attribute.
(cp_parser_omp_target_data): Set OMP_REQUIRES_TARGET_USED in
omp_requires_mask.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
(cp_parser_omp_requires): Adjust to only mention "not implemented yet"
for OMP_REQUIRES_DYNAMIC_ALLOCATORS.

gcc/fortran/
* openmp.c (gfc_check_omp_requires): Fix REVERSE_OFFLOAD typo.
(gfc_match_omp_requires): Adjust to only mention "not implemented yet"
for OMP_REQUIRES_DYNAMIC_ALLOCATORS.
* parse.c ("tree.h"): Add include.
("omp-general.h"): Likewise.
(gfc_parse_file): Add code to merge omp_requires to omp_requires_mask.

gcc/
* omp-offload.c (omp_finish_file): Add code to create OpenMP requires
mask variable in .gnu.gomp_requires section if needed.

gcc/testsuite/
* c-c++-common/gomp/requires-4.c: Remove prune of "not supported yet".
* gcc/testsuite/gfortran.dg/gomp/requires-4.f90: Fix REVERSE_OFFLOAD typo.
* gcc/testsuite/gfortran.dg/gomp/requires-8.f90: Likewise.

include/
* gomp-constants.h (GOMP_REQUIRES_UNIFIED_ADDRESS): New symbol.
(GOMP_REQUIRES_UNIFIED_SHARED_MEMORY): Likewise.
(GOMP_REQUIRES_REVERSE_OFFLOAD): Likewise.

libgcc/
* offloadstuff.c (__requires_mask_table): New symbol to mark start of
.gnu.gomp_requires section.
(__requires_mask_table_end): New symbol to mark end of
.gnu.gomp_requires section.

libgomp/
* libgomp-plugin.h (GOMP_OFFLOAD_supported_features): New declaration.
* libgomp.h (struct gomp_device_descr): New 'supported_features_func'
plugin hook field.
* oacc-host.c (host_supported_features): New host hook function.
(host_dispatch): Initialize 'supported_features_func' host hook.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_supported_features): New function.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_supported_features): Likewise.
* target.c (): Add include of standard header.
(gomp_requires_mask): New static variable.
(__requires_mask_table): New declaration.
(__requires_mask_table_end): Likewise.
(gomp_load_plugin_for_device): Add loading of 'supported_features' hook.
(gomp_target_init): Add code to summarize .gnu.gomp_requires section
mask values, emit error if inconsistency found.

* testsuite/libgomp.c-c++-common/requires-1.c: New test.
* testsuite/libgomp.c-c++-common/requires-1-aux.c: New file linked with
above test.
* testsuite/libgomp.c-c++-common/requires-2.c: New test.
* testsuite/libgomp.c-c++-common/requires-2-aux.c: New file linked with
above test.

liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_supported_features):
New function.
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index c77d9fccdc2..e685b26746e 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -2475,6 +2475,12 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
  break;
}
 
+  if (flag_openmp
+ && lookup_attribute ("omp declare target",
+  DECL_ATTRIBUTES (current_function_decl)))
+   omp_requires_mask
+ = (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_TARGET_USED);
+
   if (DECL_DECLARED_INLINE_P (current_function_decl))
 tv = 

Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

2021-01-13 Thread Qing Zhao via Gcc-patches



> On Jan 13, 2021, at 1:39 AM, Richard Biener  wrote:
> 
> On Tue, 12 Jan 2021, Qing Zhao wrote:
> 
>> Hi, 
>> 
>> Just check in to see whether you have any comments and suggestions on this:
>> 
>> FYI, I have been continuing with the Approach D implementation since last week:
>> 
>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the 
>> .DEFFERED_INIT during expand to
>> real initialization. Adjusting uninitialized pass with the new refs with 
>> “.DEFFERED_INIT”.
>> 
>> For the remaining work of Approach D:
>> 
>> ** complete the implementation of -ftrivial-auto-var-init=pattern;
>> ** complete the implementation of uninitialized warnings maintenance work 
>> for D. 
>> 
>> I have completed the uninitialized warnings maintenance work for D,
>> and finished part of the -ftrivial-auto-var-init=pattern implementation. 
>> 
>> The following are remaining work of Approach D:
>> 
>>   ** -ftrivial-auto-var-init=pattern for VLA;
>>   ** add a new attribute for variable:
>>      __attribute__((uninitialized))
>>      the marked variable is intentionally left uninitialized for performance purposes.
>>   ** adding complete testing cases;
>> 
>> 
>> Please let me know if you have any objection on my current decision on 
>> implementing approach D. 
> 
> Did you do any analysis on how stack usage and code size are changed 
> with approach D?

I did the code size change comparison (I will provide the data in another 
email). With this data, D works better than A in general. (This is actually 
a surprise to me.)

I have not measured the stack usage yet, though.  I am not sure how to collect 
the stack usage data; do you have any suggestions?


> How does compile-time behave (we could gobble up
> lots of .DEFERRED_INIT calls I guess)?
I can collect this data too and report it later.

Thanks.

Qing
> 
> Richard.
> 
>> Thanks a lot for your help.
>> 
>> Qing
>> 
>> 
>>> On Jan 5, 2021, at 1:05 PM, Qing Zhao via Gcc-patches 
>>>  wrote:
>>> 
>>> Hi,
>>> 
>>> This is an update for our previous discussion. 
>>> 
>>> 1. I implemented the following two different implementations in the latest 
>>> upstream gcc:
>>> 
>>> A. Adding real initialization during gimplification, not maintain the 
>>> uninitialized warnings.
>>> 
>>> D. Adding  calls to .DEFFERED_INIT during gimplification, expand the 
>>> .DEFFERED_INIT during expand to
>>> real initialization. Adjusting uninitialized pass with the new refs with 
>>> “.DEFFERED_INIT”.
>>> 
>>> Note, in this initial implementation,
>>> ** I ONLY implement -ftrivial-auto-var-init=zero, the implementation of 
>>> -ftrivial-auto-var-init=pattern 
>>>is not done yet.  Therefore, the performance data is only about 
>>> -ftrivial-auto-var-init=zero. 
>>> 
>>> ** I added a temporary option -fauto-var-init-approach=A|B|C|D to 
>>> choose implementation A or D for 
>>>runtime performance study.
>>> ** I didn’t finish the uninitialized warnings maintenance work for D. 
>>> (That might take more time than I expected). 
>>> 
>>> 2. I collected runtime data for CPU2017 on an x86 machine with this new gcc 
>>> for the following 3 cases:
>>> 
>>> no: default. (-g -O2 -march=native )
>>> A:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=A 
>>> D:  default +  -ftrivial-auto-var-init=zero -fauto-var-init-approach=D 
>>> 
>>> And then computed the slowdown data for both A and D as follows:
>>> 
>>> benchmarks        A / no    D / no
>>> 
>>> 500.perlbench_r    1.25%     1.25%
>>> 502.gcc_r          0.68%     1.80%
>>> 505.mcf_r          0.68%     0.14%
>>> 520.omnetpp_r      4.83%     4.68%
>>> 523.xalancbmk_r    0.18%     1.96%
>>> 525.x264_r         1.55%     2.07%
>>> 531.deepsjeng_r   11.57%    11.85%
>>> 541.leela_r        0.64%     0.80%
>>> 557.xz_r          -0.41%    -0.41%
>>> 
>>> 507.cactuBSSN_r    0.44%     0.44%
>>> 508.namd_r         0.34%     0.34%
>>> 510.parest_r       0.17%     0.25%
>>> 511.povray_r      56.57%    57.27%
>>> 519.lbm_r          0.00%     0.00%
>>> 521.wrf_r         -0.28%    -0.37%
>>> 526.blender_r     16.96%    17.71%
>>> 527.cam4_r         0.70%     0.53%
>>> 538.imagick_r      2.40%     2.40%
>>> 544.nab_r          0.00%    -0.65%
>>> 
>>> avg                5.17%     5.37%
>>> 
>>> From the above data, we can see that in general, the runtime performance 
>>> slowdowns for 
>>> implementations A and D are similar for individual benchmarks.
>>> 
>>> There are several benchmarks that have significant slowdown with the newly 
>>> added initialization for both
>>> A and D, for example, 511.povray_r, 526.blender_r, and 531.deepsjeng_r. I 
>>> will try to study a little bit
>>> more what kind of new initializations introduced such slowdown. 
>>> 
>>> From the current study so far, I think that approach D should be good 
>>> enough for our final implementation. 
>>> So, I will try to finish approach D with the following remaining work
>>> 
>>> ** complete the implementation of 

[PATCH v2] Add line debug info for virtual thunks [PR97937]

2021-01-13 Thread Bernd Edlinger
Hi,

this is a new improved version of my patch.
The previous patch had two defects:
It failed with -ffunction-sections.  Although
the line info was emitted, that was not working
since the debug_ranges did not contain the
thunk.
And secondly it failed to address the case of
functions without any source line information.

The new patch addresses both cases of DECL_IGNORED_P
functions:

In the case of virtual thunks we emit the line
number from the declaration.
Unlike the previous version, this patch
also explicitly adds the virtual thunk to the
debug_ranges and debug_aranges.  If that is not
done, the debugger does not recognize the line
table for these functions.

But if that location info is unavailable,
the function is explicitly removed from the
debug_ranges and debug_aranges.  That has
the same effect as a theoretical .noloc assembler
directive.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
From feffb6731523e3a77566c2a5f541c6b90e1ffb19 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger 
Date: Tue, 12 Jan 2021 16:27:53 +0100
Subject: [PATCH] Add line debug info for virtual thunks

There is no debug info when the DECL_IGNORED_P flag
is set.  But sometimes we have the line info of the
function decl, as in the case of virtual thunks.
So instead of no line info at all, we emit at least
the location of the function decl.
On the other hand, there are DECL_IGNORED_P functions
which do not have any source line info at all.
Remove those from the debug_range info, to make it
clear for the debugger that the line info for these
functions is invalid.  This has the effect that the
debugger will not step into the function without
debug info.

2021-01-13  Bernd Edlinger  

	PR ipa/97937
	* debug.h (gcc_debug_hooks): Add set_ignored_loc function pointer.
	* dwarf2out.h (dw_fde_node::ignored_debug): New data item.
	* dbxout.c (dbx_debug_hooks, xcoff_debug_hooks): Add dummy
	set_ignored_loc callbacks.
	* debug.c (do_nothing_debug_hooks): Likewise.
	* vmsdbgout.c (vmsdbg_debug_hooks): Likewise.
	* dwarf2out.c (text_section_used, cold_text_section_used): Remove.
	(in_text_section_p, last_text_label, last_cold_label,
	switch_text_ranges, switch_cold_ranges): New data items.
	(dwarf2out_note_section_used): Remove.
	(dwarf2out_begin_prologue): Set fde->ignored_debug and
	in_text_section_p.
	(mark_ignored_debug_section): New helper function.
	(dwarf2out_end_epilogue, dwarf2out_switch_text_section): Call
	mark_ignored_debug_section.
	(dwarf2_debug_hooks): Use dwarf2out_set_ignored_loc.
	(dwarf2_lineno_debug_hooks): Use dummy for set_ignored_loc.
	(size_of_aranges): Adjust formula for multi-part text ranges size.
	(output_aranges): Output multi-part text ranges.
	(dwarf2out_set_ignored_loc): New callback function.
	(dwarf2out_finish): Output multi-part text ranges.
	(dwarf2out_c_finalize): Clear new data items.
	* final.c (final_start_function_1): Call set_ignored_loc callback.
	(final_scan_insn_1): Likewise.
	* ggc-page.c (gt_ggc_mx): New helper function.
	* stringpool.c (gt_pch_nx): Likewise.
---
 gcc/dbxout.c |   2 +
 gcc/debug.c  |   1 +
 gcc/debug.h  |   4 +
 gcc/dwarf2out.c  | 244 +++
 gcc/dwarf2out.h  |   2 +
 gcc/final.c  |   8 ++
 gcc/ggc-page.c   |   6 ++
 gcc/stringpool.c |   6 ++
 gcc/vmsdbgout.c  |   1 +
 9 files changed, 224 insertions(+), 50 deletions(-)

diff --git a/gcc/dbxout.c b/gcc/dbxout.c
index 70b635c..d20527b 100644
--- a/gcc/dbxout.c
+++ b/gcc/dbxout.c
@@ -362,6 +362,7 @@ const struct gcc_debug_hooks dbx_debug_hooks =
   dbxout_end_block,
   debug_true_const_tree,	 /* ignore_block */
   dbxout_source_line,		 /* source_line */
+  debug_nothing_int_int_charstar,	 /* set_ignored_loc */
   dbxout_begin_prologue,	 /* begin_prologue */
   debug_nothing_int_charstar,	 /* end_prologue */
   debug_nothing_int_charstar,	 /* begin_epilogue */
@@ -409,6 +410,7 @@ const struct gcc_debug_hooks xcoff_debug_hooks =
   xcoffout_end_block,
   debug_true_const_tree,	 /* ignore_block */
   xcoffout_source_line,
+  debug_nothing_int_int_charstar,	 /* set_ignored_loc */
   xcoffout_begin_prologue,	 /* begin_prologue */
   debug_nothing_int_charstar,	 /* end_prologue */
   debug_nothing_int_charstar,	 /* begin_epilogue */
diff --git a/gcc/debug.c b/gcc/debug.c
index 0a7fcfa..39add0d 100644
--- a/gcc/debug.c
+++ b/gcc/debug.c
@@ -36,6 +36,7 @@ const struct gcc_debug_hooks do_nothing_debug_hooks =
   debug_nothing_int_int,	 /* end_block */
   debug_true_const_tree,	 /* ignore_block */
   debug_nothing_int_int_charstar_int_bool, /* source_line */
+  debug_nothing_int_int_charstar,	 /* set_ignored_loc */
   debug_nothing_int_int_charstar,	 /* begin_prologue */
   debug_nothing_int_charstar,	 /* end_prologue */
   debug_nothing_int_charstar,	 /* begin_epilogue */
diff --git a/gcc/debug.h b/gcc/debug.h
index 

Re: [PATCH] Add pytest for a GCOV test-case

2021-01-13 Thread Rainer Orth
Hi David,

>>   On top of all this, I wonder why you insist on a particular Python
>>   version here: I tried your single testcase and it PASSes just as well
>>   with Python 2.7!?  One reason I'm asking is that Solaris 11.3 bundles
>>   both Python 2.7 and 3.4, but (unlike Linux and Solaris 11.4) doesn't
>>   have /usr/bin/python3, just python (which is 2.7), python2.7, and
>>   python3.4.  Not that it matters too much, but you should be aware of
>>   the issue.
>
> In particular, given the differences between Python 2 and Python 3 I
> think it's a good idea to be explicit about the versions of Python that
> we expect i.e. are such tests coded to the common subset of Python 2
> and 3, or just to Python 3?  Given that they're intended to be
> optional, I suggest just Python 3.  (I would like to be able to add
> other Python-based scripts to the DejaGnu suite e.g. to verify JSON
> output, hence I'd prefer to not have to maintain Python 2
> compatibility)

understood.  The whole Python 2 vs. 3 situation is a mess,
unfortunately, but I expect that all systems we deal with that have
Python at all will have some version of Python 3 around, and moving
forward Python 2 will be phased out over time given the lack of upstream
support.  Dealing with both at the current time seems a waste of time.
The only possible challenge may be to determine a python3 executable,
given that this exact name seems not to be ubiquitous.  Maybe there's
an autoconf macro that does the necessary work instead of having to do
this in the testsuite directly?
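
To make the lookup problem concrete, here is one way such a probe could look.
It is only a sketch under assumptions: find_python3 and is_python3_with_pytest
are hypothetical helpers, the candidate names passed in are just examples, and
a real solution would more likely live in configure or the DejaGnu support
code than in a standalone script.

```python
import shutil
import subprocess


def is_python3_with_pytest(path):
    """True if `path` is a Python 3 interpreter that can import pytest."""
    probe = "import sys, pytest; sys.exit(0 if sys.version_info[0] == 3 else 1)"
    try:
        result = subprocess.run([path, "-c", probe],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except OSError:
        return False
    return result.returncode == 0


def find_python3(candidates, which=shutil.which, accept=is_python3_with_pytest):
    """Return the first candidate on PATH that `accept` approves, else None."""
    for name in candidates:
        path = which(name)
        if path is not None and accept(path):
            return path
    return None
```

A caller might try find_python3(["python3", "python3.4", "python"]) so that a
system without /usr/bin/python3 but with a versioned interpreter is still
handled; the interpreter that is found is then the one used to run pytest.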

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Add pytest for a GCOV test-case

2021-01-13 Thread David Malcolm via Gcc-patches
On Wed, 2021-01-13 at 14:38 +0100, Rainer Orth wrote:
> Hi Martin,
> 
> > On 1/6/21 12:36 AM, Jeff Law wrote:
> > > unresolved "could not find python interpreter $testcase" in
> > > run-gcov-pytest if you find the right magic in the output of your
> > > spawn.
> > 
> > Achieved that with the updated patch.
> > 
> > Ready for master?
> 
> unfortunately, your patch has a large number of problems:
> 
> * On targets where run-gcov-pytest decides that pytest isn't available
>   (incorrectly in some cases), mail-report.log is cluttered with
> 
> UNRESOLVED: could not find Python interpreter and (or) pytest module for pr98273.C
> 
>   I fear you've been misled by David and Jeff here: UNRESOLVED isn't
>   appropriate for cases like this.  Please read the DejaGnu manual for
>   the semantics of the various test outcomes.  If anything (we often
>   just silently skip testcases that cannot be run on some target), use
>   UNSUPPORTED instead.

This may be lack of sleep on my part, but looking back at the emails, I
said that this case should be "UNSUPPORTED", but it looks like Martin
misread it as "UNRESOLVED", and I didn't correct him.  Sorry.

> * Besides, the test outcomes are not generic message facilities but are
>   supposed to follow a common format:
> 
>   :  []
> 
>   with  the pathname to the test relative to (in this case)
>   gcc/testsuite.  In this case, this might be something like
> 
>   UNSUPPORTED: g++.dg/gcov/pr98273.C run-gcov-pytest
> 
>   Currently, you don't have the pathname in run-gcov-pytest, though.
> 
> * If we now have an (even optional) dependency on python/pytest, this
>   (with the exact versions and use) needs to be documented in
>   install.texi.

Agreed.

[...snip various other issues...]

> 
>   On top of all this, I wonder why you insist on a particular Python
>   version here: I tried your single testcase and it PASSes just as well
>   with Python 2.7!?  One reason I'm asking is that Solaris 11.3 bundles
>   both Python 2.7 and 3.4, but (unlike Linux and Solaris 11.4) doesn't
>   have /usr/bin/python3, just python (which is 2.7), python2.7, and
>   python3.4.  Not that it matters too much, but you should be aware of
>   the issue.

In particular, given the differences between Python 2 and Python 3 I
think it's a good idea to be explicit about the versions of Python that
we expect i.e. are such tests coded to the common subset of Python 2
and 3, or just to Python 3?  Given that they're intended to be
optional, I suggest just Python 3.  (I would like to be able to add
other Python-based scripts to the DejaGnu suite e.g. to verify JSON
output, hence I'd prefer to not have to maintain Python 2
compatibility)

[...snip...]

Dave



Re: [PATCH] Add pytest for a GCOV test-case

2021-01-13 Thread Rainer Orth
Hi Martin,

> On 1/6/21 12:36 AM, Jeff Law wrote:
>> unresolved "could not find python interpreter $testcase" in
>> run-gcov-pytest if you find the right magic in the output of your spawn.
>
> Achieved that with the updated patch.
>
> Ready for master?

unfortunately, your patch has a large number of problems:

* On targets where run-gcov-pytest decides that pytest isn't available
  (incorrectly in some cases), mail-report.log is cluttered with

UNRESOLVED: could not find Python interpreter and (or) pytest module for pr98273.C

  I fear you've been misled by David and Jeff here: UNRESOLVED isn't
  appropriate for cases like this.  Please read the DejaGnu manual for
  the semantics of the various test outcomes.  If anything (we often
  just silently skip testcases that cannot be run on some target), use
  UNSUPPORTED instead.

* Besides, the test outcomes are not generic message facilities but are
  supposed to follow a common format:

  :  []

  with  the pathname to the test relative to (in this case)
  gcc/testsuite.  In this case, this might be something like

  UNSUPPORTED: g++.dg/gcov/pr98273.C run-gcov-pytest

  Currently, you don't have the pathname in run-gcov-pytest, though.

* If we now have an (even optional) dependency on python/pytest, this
  (with the exact versions and use) needs to be documented in
  install.texi.

* Speaking of documenting, the new run-gcov-pytest needs to be
  documented in sourcebuild.texi.

* On to the implementation: your test for the presence of pytest is
  wrong:

set result [remote_exec host "pytest -m pytest --version"]

  has nothing to do with what you actually use later: on all of Fedora
  29, Ubuntu 20.04, and Solaris 11.4 (with a caveat) pytest is Python
  2.7 based, but you don't check that.  It is well possible that pytest
  for 2.7 is installed, but pytest for Python 3.x isn't.

  Besides, while Solaris 11.4 does bundle pytest, they don't deliver
  pytest, but only py.test due to a conflict with a different pytest from
  logilab-common, cf. https://github.com/pytest-dev/pytest/issues/1833.

  This is immaterial, however, since what you actually run is

spawn -noecho python3 -m pytest --color=no -rA -s --tb=no $srcdir/$subdir/$pytest_script

  So you should just run python3 -m pytest --version instead to check
  for the presence of the version you're going to use.

  Btw., there's a mess with pytest on Fedora 29: running the above gives

[...]
pluggy.PluginValidationError: Plugin 'benchmark' could not be loaded: (pytest 3.6.4 (/usr/lib/python3.7/site-packages), Requirement.parse('pytest>=3.8'))!

  Seems the packagers have broken things there.

  On top of all this, I wonder why you insist on a particular Python
  version here: I tried your single testcase and it PASSes just as well
  with Python 2.7!?  One reason I'm asking is that Solaris 11.3 bundles
  both Python 2.7 and 3.4, but (unlike Linux and Solaris 11.4) doesn't
  have /usr/bin/python3, just python (which is 2.7), python2.7, and
  python3.4.  Not that it matters too much, but you should be aware of
  the issue.

  When running the test on Solaris 11.4 (with the bundled pytest 4.4.0),
  I get

= test session starts ==
platform sunos5 -- Python 3.7.9, pytest-4.4.0, py-1.8.0, pluggy-0.9.0
rootdir: /vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/gcov
collected 2 items  

../../../../../../../../../vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/gcov/test-pr98273.py
 ..

=== 2 passed in 0.04 seconds ===

while 4.6.9 on Linux gives

= test session starts ==
platform linux -- Python 3.8.2, pytest-4.6.9, py-1.8.1, pluggy-0.13.0
rootdir: /vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/gcov
collected 2 items  

../../../../../../../../../../vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/gcov/test-pr98273.py
 ..

=== short test summary info 
PASSED ../../../../../../../../../../vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/gcov/test-pr98273.py::test_basics
PASSED ../../../../../../../../../../vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/gcov/test-pr98273.py::test_lines
=== 2 passed in 0.17 seconds ===

  Obviously pytest -rA was introduced only after 4.4.0 and the 'A' is
  silently ignored.  Fortunately, I can just use -rap instead which
  works with both versions.

  After this has been processed by gcov.exp, g++.sum contains

PASS: ../../../../../../../../../../vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/gcov/test-pr98273.py::test_basic
PASS: ../../../../../../../../../../vol/gcc/src/hg/master/local/gcc/testsuite/g++.dg/gcov/test-pr98273.py::test_line

  which is again completely wrong in light of 

[PATCH][pushed] mklog: support define_insn_and_split format

2021-01-13 Thread Martin Liška

contrib/ChangeLog:

* mklog.py: Parse also define_insn_and_split and similar
directives in .md files.
* test_mklog.py: Test.
---
 contrib/mklog.py  | 12 +++-
 contrib/test_mklog.py | 42 ++
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index e696f5d0388..bf51e56337e 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -49,10 +49,11 @@ macro_regex = re.compile(r'#\s*(define|undef)\s+([a-zA-Z0-9_]+)')
 super_macro_regex = re.compile(r'^DEF[A-Z0-9_]+\s*\(([a-zA-Z0-9_]+)')
 fn_regex = re.compile(r'([a-zA-Z_][^()\s]*)\s*\([^*]')
 template_and_param_regex = re.compile(r'<[^<>]*>')
+md_def_regex = re.compile(r'\(define.*\s+"(.*)"')
 bugzilla_url = 'https://gcc.gnu.org/bugzilla/rest.cgi/bug?id=%s;' \
'include_fields=summary'
 
-function_extensions = {'.c', '.cpp', '.C', '.cc', '.h', '.inc', '.def'}
+function_extensions = {'.c', '.cpp', '.C', '.cc', '.h', '.inc', '.def', '.md'}
 
 help_message = """\

 Generate ChangeLog template for PATCH.
@@ -200,6 +201,15 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False):
 for line in hunk:
 m = identifier_regex.match(line.value)
 if line.is_added or line.is_removed:
+# special-case definition in .md files
+m2 = md_def_regex.match(line.value)
+if extension == '.md' and m2:
+fn = m2.group(1)
+if fn not in functions:
+functions.append(fn)
+last_fn = None
+success = True
+
 if not line.value.strip():
 continue
 modified_visited = True
diff --git a/contrib/test_mklog.py b/contrib/test_mklog.py
index 344b7a2c771..7e95ec1a2ab 100755
--- a/contrib/test_mklog.py
+++ b/contrib/test_mklog.py
@@ -399,6 +399,44 @@ gcc/ChangeLog:
 
 '''
 
+PATCH9 = '''\
+diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
+index 2a260c1cfbd..7f03fc491c3 100644
+--- a/gcc/config/i386/sse.md
++++ b/gcc/config/i386/sse.md
+@@ -17611,6 +17611,23 @@ (define_insn "avx2_v16qiv16hi2"
+(set_attr "prefix" "maybe_evex")
+(set_attr "mode" "OI")])
+
++(define_insn_and_split "*avx2_zero_extendv16qiv16hi2_1"
++  [(set (match_operand:V32QI 0 "register_operand" "=v")
++  (vec_select:V32QI
++(vec_concat:V64QI
++  (match_operand:V32QI 1 "nonimmediate_operand" "vm")
++  (match_operand:V32QI 2 "const0_operand" "C"))
++(match_parallel 3 "pmovzx_parallel"
++  [(match_operand 4 "const_int_operand" "n")])))]
++  "TARGET_AVX2"
++  "#"
++  "&& reload_completed"
++  [(set (match_dup 0) (zero_extend:V16HI (match_dup 1)))]
++{
++  operands[0] = lowpart_subreg (V16HImode, operands[0], V32QImode);
++  operands[1] = lowpart_subreg (V16QImode, operands[1], V32QImode);
++})
++
+ (define_expand "v16qiv16hi2"
+   [(set (match_operand:V16HI 0 "register_operand")
+   (any_extend:V16HI
+'''
+
+EXPECTED9 = '''\
+gcc/ChangeLog:
+
+   * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1):
+
+'''
+
 class TestMklog(unittest.TestCase):
 def test_macro_definition(self):
 changelog = generate_changelog(PATCH1)
@@ -437,3 +475,7 @@ class TestMklog(unittest.TestCase):
 def test_renaming(self):
 changelog = generate_changelog(PATCH8)
 assert changelog == EXPECTED8
+
+def test_define_macro_parsing(self):
+changelog = generate_changelog(PATCH9)
+assert changelog == EXPECTED9
--
2.29.2
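
To see what the new pattern picks up, here is a small standalone check
of the md_def_regex from the patch against lines taken from the PATCH9
test input.  The helper name `md_definition_name` is made up for this
illustration; the real script applies the regex inside
generate_changelog.

```python
import re

# Same pattern as the md_def_regex added to contrib/mklog.py above.
md_def_regex = re.compile(r'\(define.*\s+"(.*)"')


def md_definition_name(line):
    """Return the quoted name of a line starting with (define..., else None."""
    m = md_def_regex.match(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    # → *avx2_zero_extendv16qiv16hi2_1
    print(md_definition_name('(define_insn_and_split "*avx2_zero_extendv16qiv16hi2_1"'))
    # → v16qiv16hi2
    print(md_definition_name('(define_expand "v16qiv16hi2"'))
    # → None, because match() anchors at the start of the line
    print(md_definition_name('  [(set (match_operand:V32QI 0 "register_operand" "=v")'))
```

Because match() is used rather than search(), operand lines inside a
pattern body are not mistaken for definitions.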



c++: Fix cp_build_function_call_vec [PR 98626]

2021-01-13 Thread Nathan Sidwell

I misunderstood the cp_build_function_call_vec API, thinking a NULL
vector was an acceptable way of passing no arguments.  You need to
pass a vector of no elements.

PR c++/98626
gcc/cp/
* module.cc (module_add_import_initializers):  Pass a
zero-element argument vector.


--
Nathan Sidwell
diff --git c/gcc/cp/module.cc w/gcc/cp/module.cc
index d2093916c9e..1fd0bcfe3eb 100644
--- c/gcc/cp/module.cc
+++ w/gcc/cp/module.cc
@@ -18977,8 +18977,8 @@ module_add_import_initializers ()
   if (modules)
 {
   tree fntype = build_function_type (void_type_node, void_list_node);
-  vec *args = NULL;
-  
+  releasing_vec args;  // There are no args
+
   for (unsigned ix = modules->length (); --ix;)
 	{
 	  module_state *import = (*modules)[ix];


[PATCH] tree-optimization/92645 - avoid harmful early BIT_FIELD_REF canonicalization

2021-01-13 Thread Richard Biener
This avoids canonicalizing BIT_FIELD_REF  (a, , 0) to
(T1)a on integer typed a.  This confuses the vectorizer SLP matching.

With this delayed to after vector lowering the testcase in PR92645
from Skia is now finally optimized to reasonable assembly.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2021-01-13  Richard Biener  

PR tree-optimization/92645
* match.pd (BIT_FIELD_REF to conversion): Delay canonicalization
until after vector lowering.

* gcc.target/i386/pr92645-7.c: New testcase.
---
 gcc/match.pd  |  2 ++
 gcc/testsuite/gcc.target/i386/pr92645-7.c | 24 +++
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92645-7.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c286a540c4e..60c383da13b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6075,6 +6075,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   /* Low-parts can be reduced to integral conversions.
  ???  The following doesn't work for PDP endian.  */
   || (BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN
+  /* But only do this after vectorization.  */
+  && canonicalize_math_after_vectorization_p ()
   /* Don't even think about BITS_BIG_ENDIAN.  */
   && TYPE_PRECISION (TREE_TYPE (@0)) % BITS_PER_UNIT == 0
   && TYPE_PRECISION (type) % BITS_PER_UNIT == 0
diff --git a/gcc/testsuite/gcc.target/i386/pr92645-7.c b/gcc/testsuite/gcc.target/i386/pr92645-7.c
new file mode 100644
index 000..e4c04c2a82a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92645-7.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -msse2" } */
+
+typedef long v2di __attribute__((vector_size(16)));
+typedef int v4si __attribute__((vector_size(16)));
+
+void bar (v4si *p, __int128_t *q)
+{
+  union { __int128_t a; v4si b; } u;
+  u.a = *q;
+  (*p)[0] = u.b[0];
+  (*p)[1] = u.b[2];
+  (*p)[2] = u.b[1];
+  (*p)[3] = u.b[3];
+}
+
+/* The function should end up with sth like
+ [v]pshufd $216, (%esi), %xmm0
+ [v]movdqa %xmm0, (%edi)
+ ret
+   recognized by SLP vectorization involving an existing "vector".  */
+/* { dg-final { scan-assembler-not "punpck" } } */
+/* { dg-final { scan-assembler-times "pshufd" 1 } } */
-- 
2.26.2
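
As a side note on the expected assembly in the testcase comment, the
immediate 216 is simply the lane permutation {0, 2, 1, 3} written in
pshufd's two-bits-per-lane encoding.  The sketch below models that
encoding only; the function names are made up for the illustration and
it says nothing about how the vectorizer arrives at the shuffle.

```python
def pshufd_imm(selectors):
    """Encode four source-lane indices (each 0..3) as a pshufd imm8.

    Destination lane i takes source lane selectors[i]; the index for
    lane i occupies bits 2*i+1 .. 2*i of the immediate.
    """
    if len(selectors) != 4 or any(not 0 <= s <= 3 for s in selectors):
        raise ValueError("need four lane indices in the range 0..3")
    imm = 0
    for i, s in enumerate(selectors):
        imm |= s << (2 * i)
    return imm


def pshufd(src, imm):
    """Apply the shuffle described by imm to a list of four lanes."""
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]


if __name__ == "__main__":
    # The testcase stores u.b[0], u.b[2], u.b[1], u.b[3] in that order.
    imm = pshufd_imm([0, 2, 1, 3])
    print(imm)                                 # → 216
    print(pshufd(["b0", "b1", "b2", "b3"], imm))  # → ['b0', 'b2', 'b1', 'b3']
```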


[pushed] aarch64: Add support for unpacked SVE MLS and MSB

2021-01-13 Thread Richard Sandiford via Gcc-patches
This patch extends the MLS/MSB patterns to support unpacked
integer vectors.  The type suffix could be either the element
size or the container size, but using the element size should
be more efficient.
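
A rough sketch of why either suffix gives the same result for the
element: the low bits of acc - a * b depend only on the low bits of the
inputs, so computing at the container width and truncating agrees with
computing at the element width.  This models a single lane with plain
modular arithmetic and assumes 16-bit elements held in 32-bit
containers purely as an example; predication, register layout and the
efficiency question are not modelled.

```python
def mls(acc, a, b, bits):
    """One lane of multiply-subtract, acc - a * b, wrapped to `bits` bits."""
    return (acc - a * b) & ((1 << bits) - 1)


def same_low_bits(acc, a, b, elem_bits=16, container_bits=32):
    """Compare MLS done at container width against MLS at element width."""
    elem_mask = (1 << elem_bits) - 1
    wide = mls(acc, a, b, container_bits) & elem_mask
    narrow = mls(acc & elem_mask, a & elem_mask, b & elem_mask, elem_bits)
    return wide == narrow


if __name__ == "__main__":
    # Upper container bits hold arbitrary values here; the element
    # result is unaffected by them.
    samples = [(0x1234ABCD, 0xFFFF0003, 0x00010005),
               (0, 0xFFFF, 0xFFFF),
               (0xDEADBEEF, 0x8000, 2)]
    print(all(same_low_bits(*s) for s in samples))
```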

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.

Richard


gcc/
* config/aarch64/aarch64-sve.md (fnma4): Extend from SVE_FULL_I
to SVE_I.
(@aarch64_pred_fnma, cond_fnma, *cond_fnma_2)
(*cond_fnma_4, *cond_fnma_any): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/mls_2.c: New test.
* g++.target/aarch64/sve/cond_mls_1.C: Likewise.
* g++.target/aarch64/sve/cond_mls_2.C: Likewise.
* g++.target/aarch64/sve/cond_mls_3.C: Likewise.
* g++.target/aarch64/sve/cond_mls_4.C: Likewise.
* g++.target/aarch64/sve/cond_mls_5.C: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md | 88 +--
 .../g++.target/aarch64/sve/cond_mls_1.C   | 33 +++
 .../g++.target/aarch64/sve/cond_mls_2.C   | 33 +++
 .../g++.target/aarch64/sve/cond_mls_3.C   | 33 +++
 .../g++.target/aarch64/sve/cond_mls_4.C   | 36 
 .../g++.target/aarch64/sve/cond_mls_5.C   | 33 +++
 gcc/testsuite/gcc.target/aarch64/sve/mls_2.c  | 34 +++
 7 files changed, 246 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mls_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mls_2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mls_3.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mls_4.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mls_5.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mls_2.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index ac8a9b4b167..da15bd87885 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -6695,14 +6695,14 @@ (define_insn_and_rewrite "*cond_fma_any"
 
 ;; Unpredicated integer subtraction of product.
 (define_expand "fnma4"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand")
-   (minus:SVE_FULL_I
- (match_operand:SVE_FULL_I 3 "register_operand")
- (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand")
+   (minus:SVE_I
+ (match_operand:SVE_I 3 "register_operand")
+ (unspec:SVE_I
[(match_dup 4)
-(mult:SVE_FULL_I
-  (match_operand:SVE_FULL_I 1 "register_operand")
-  (match_operand:SVE_FULL_I 2 "general_operand"))]
+(mult:SVE_I
+  (match_operand:SVE_I 1 "register_operand")
+  (match_operand:SVE_I 2 "general_operand"))]
UNSPEC_PRED_X)))]
   "TARGET_SVE"
   {
@@ -6714,14 +6714,14 @@ (define_expand "fnma4"
 
 ;; Predicated integer subtraction of product.
 (define_insn "@aarch64_pred_fnma"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, w, ?")
-   (minus:SVE_FULL_I
- (match_operand:SVE_FULL_I 4 "register_operand" "w, 0, w")
- (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?")
+   (minus:SVE_I
+ (match_operand:SVE_I 4 "register_operand" "w, 0, w")
+ (unspec:SVE_I
[(match_operand: 1 "register_operand" "Upl, Upl, Upl")
-(mult:SVE_FULL_I
-  (match_operand:SVE_FULL_I 2 "register_operand" "%0, w, w")
-  (match_operand:SVE_FULL_I 3 "register_operand" "w, w, w"))]
+(mult:SVE_I
+  (match_operand:SVE_I 2 "register_operand" "%0, w, w")
+  (match_operand:SVE_I 3 "register_operand" "w, w, w"))]
UNSPEC_PRED_X)))]
   "TARGET_SVE"
   "@
@@ -6733,15 +6733,15 @@ (define_insn "@aarch64_pred_fnma"
 
 ;; Predicated integer subtraction of product with merging.
 (define_expand "cond_fnma"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand")
-   (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand")
+   (unspec:SVE_I
[(match_operand: 1 "register_operand")
-(minus:SVE_FULL_I
-  (match_operand:SVE_FULL_I 4 "register_operand")
-  (mult:SVE_FULL_I
-(match_operand:SVE_FULL_I 2 "register_operand")
-(match_operand:SVE_FULL_I 3 "general_operand")))
-(match_operand:SVE_FULL_I 5 "aarch64_simd_reg_or_zero")]
+(minus:SVE_I
+  (match_operand:SVE_I 4 "register_operand")
+  (mult:SVE_I
+(match_operand:SVE_I 2 "register_operand")
+(match_operand:SVE_I 3 "general_operand")))
+(match_operand:SVE_I 5 "aarch64_simd_reg_or_zero")]
UNSPEC_SEL))]
   "TARGET_SVE"
   {
@@ -6756,14 +6756,14 @@ (define_expand "cond_fnma"
 
 ;; Predicated integer subtraction of product, merging with the first input.
 (define_insn "*cond_fnma_2"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, ?")
-   (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" 

[pushed] aarch64: Add support for unpacked SVE MLA and MAD

2021-01-13 Thread Richard Sandiford via Gcc-patches
This patch extends the MLA/MAD patterns to support unpacked
integer vectors.  The type suffix could be either the element
size or the container size, but using the element size should
be more efficient.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.

Richard


gcc/
* config/aarch64/aarch64-sve.md (fma4): Extend from SVE_FULL_I
to SVE_I.
(@aarch64_pred_fma, cond_fma, *cond_fma_2)
(*cond_fma_4, *cond_fma_any): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/mla_2.c: New test.
* g++.target/aarch64/sve/cond_mla_1.C: Likewise.
* g++.target/aarch64/sve/cond_mla_2.C: Likewise.
* g++.target/aarch64/sve/cond_mla_3.C: Likewise.
* g++.target/aarch64/sve/cond_mla_4.C: Likewise.
* g++.target/aarch64/sve/cond_mla_5.C: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md | 88 +--
 .../g++.target/aarch64/sve/cond_mla_1.C   | 33 +++
 .../g++.target/aarch64/sve/cond_mla_2.C   | 33 +++
 .../g++.target/aarch64/sve/cond_mla_3.C   | 33 +++
 .../g++.target/aarch64/sve/cond_mla_4.C   | 36 
 .../g++.target/aarch64/sve/cond_mla_5.C   | 33 +++
 gcc/testsuite/gcc.target/aarch64/sve/mla_2.c  | 34 +++
 7 files changed, 246 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mla_1.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mla_2.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mla_3.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mla_4.C
 create mode 100644 gcc/testsuite/g++.target/aarch64/sve/cond_mla_5.C
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/mla_2.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index a6f8450f951..ac8a9b4b167 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -6554,15 +6554,15 @@ (define_insn "*3_ptest"
 
 ;; Unpredicated integer addition of product.
 (define_expand "fma4"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand")
-   (plus:SVE_FULL_I
- (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand")
+   (plus:SVE_I
+ (unspec:SVE_I
[(match_dup 4)
-(mult:SVE_FULL_I
-  (match_operand:SVE_FULL_I 1 "register_operand")
-  (match_operand:SVE_FULL_I 2 "nonmemory_operand"))]
+(mult:SVE_I
+  (match_operand:SVE_I 1 "register_operand")
+  (match_operand:SVE_I 2 "nonmemory_operand"))]
UNSPEC_PRED_X)
- (match_operand:SVE_FULL_I 3 "register_operand")))]
+ (match_operand:SVE_I 3 "register_operand")))]
   "TARGET_SVE"
   {
 if (aarch64_prepare_sve_int_fma (operands, PLUS))
@@ -6573,15 +6573,15 @@ (define_expand "fma4"
 
 ;; Predicated integer addition of product.
 (define_insn "@aarch64_pred_fma"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, w, ?")
-   (plus:SVE_FULL_I
- (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?")
+   (plus:SVE_I
+ (unspec:SVE_I
[(match_operand: 1 "register_operand" "Upl, Upl, Upl")
-(mult:SVE_FULL_I
-  (match_operand:SVE_FULL_I 2 "register_operand" "%0, w, w")
-  (match_operand:SVE_FULL_I 3 "register_operand" "w, w, w"))]
+(mult:SVE_I
+  (match_operand:SVE_I 2 "register_operand" "%0, w, w")
+  (match_operand:SVE_I 3 "register_operand" "w, w, w"))]
UNSPEC_PRED_X)
- (match_operand:SVE_FULL_I 4 "register_operand" "w, 0, w")))]
+ (match_operand:SVE_I 4 "register_operand" "w, 0, w")))]
   "TARGET_SVE"
   "@
mad\t%0., %1/m, %3., %4.
@@ -6592,15 +6592,15 @@ (define_insn "@aarch64_pred_fma"
 
 ;; Predicated integer addition of product with merging.
 (define_expand "cond_fma"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand")
-   (unspec:SVE_FULL_I
+  [(set (match_operand:SVE_I 0 "register_operand")
+   (unspec:SVE_I
  [(match_operand: 1 "register_operand")
-  (plus:SVE_FULL_I
-(mult:SVE_FULL_I
-  (match_operand:SVE_FULL_I 2 "register_operand")
-  (match_operand:SVE_FULL_I 3 "general_operand"))
-(match_operand:SVE_FULL_I 4 "register_operand"))
-  (match_operand:SVE_FULL_I 5 "aarch64_simd_reg_or_zero")]
+  (plus:SVE_I
+(mult:SVE_I
+  (match_operand:SVE_I 2 "register_operand")
+  (match_operand:SVE_I 3 "general_operand"))
+(match_operand:SVE_I 4 "register_operand"))
+  (match_operand:SVE_I 5 "aarch64_simd_reg_or_zero")]
  UNSPEC_SEL))]
   "TARGET_SVE"
   {
@@ -6615,14 +6615,14 @@ (define_expand "cond_fma"
 
 ;; Predicated integer addition of product, merging with the first input.
 (define_insn "*cond_fma_2"
-  [(set (match_operand:SVE_FULL_I 0 "register_operand" "=w, 

Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-13 Thread Rainer Orth
Hi Clement,

>> OTOH, I wondered if it wouldn't be better to start from the GNU code
>> which is certainly known to work, rather than the DragonflyBSD one which
>> may well have bitrotten since most of the BSDs moved to LLVM.  Then
>> again, it may not: Gerald tests on FreeBSD regularly.  Perhaps a 3-way
>> merge of gnu and *bsd -> ieee_1003.1-2008 is in order?

not in the sense of actually combining the code bases, I'd say, but
certainly comparing all three.  gnu is guaranteed to be better
maintained/kept up to date.

> The GNU model is really different: it implements catalogues, uses
> nl_langinfo_l with GNU-specific defines instead of localeconv_l, and a few
> other things like this.

True: we can only take parts that are in POSIX.1-2008/XPG7, obviously.
But checking for differences is still in order, I believe.

> I'll check if some parts might be interesting. But BSD seems closer to what
> we actually want.

Right, except for the fear that the code has partially bitrotten.  It's
up to Jonathan, of course, to decide if we're better off keeping gnu,
dragonfly (better renamed to bsd to match actual use) and
ieee-1003.1-2008 separate or update/rename the dragonfly code to work on
both the BSDs and POSIX.1-2008 systems.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-13 Thread CHIGOT, CLEMENT via Gcc-patches
> > However, is the C++ ABI not standard ? I don't have this test failing on
> > AIX, AFAIK. Strange.
>
> Not really: the ABI tests rely on symbol versioning, which is only
> available on Solaris and GNU ELF targets.
Maybe it's normal then.

> OTOH, I wondered if it wouldn't be better to start from the GNU code
> which is certainly known to work, rather than the DragonflyBSD one which
> may well have bitrotten since most of the BSDs moved to LLVM.  Then
> again, it may not: Gerald tests on FreeBSD regularly.  Perhaps a 3-way
> merge of gnu and *bsd -> ieee_1003.1-2008 is in order?
The GNU model is really different: it implements catalogues, uses
nl_langinfo_l with GNU-specific defines instead of localeconv_l, and a few
other things like this.
I'll check if some parts might be interesting. But BSD seems closer to what
we actually want.



Re: [PATCH] match.pd, v2: Fold (~X | C) ^ D into (X | C) ^ (~D ^ C) if (~D ^ C) can be simplified [PR96691]

2021-01-13 Thread Richard Biener
On Wed, 13 Jan 2021, Jakub Jelinek wrote:

> On Wed, Jan 13, 2021 at 08:35:01AM +0100, Richard Biener wrote:
> > I guess you could use
> > 
> >   (bit_xor (bit_ior @0 @1) (bit_xor! (bit_not! @2) @1))
> 
> I wasn't aware of !.
> I had to put it under #if GIMPLE because ! doesn't seem to be implemented
> for GENERIC, but otherwise it works fine.

Ah, true.

> Though, shouldn't I add then :c on bit_xor and bit_ior/bit_and then?
> The rationale for not adding them before was that constants are
> canonicalized to go last, but if we simplify other things, they might not be
> last anymore...

Yeah, that's true.

> Ok for trunk if it passes another bootstrap/regtest?

OK with the missing :c added.

Thanks,
Richard.

> 2021-01-13  Jakub Jelinek  
> 
>   PR tree-optimization/96691
>   * match.pd ((~X | C) ^ D -> (X | C) ^ (~D ^ C),
>   (~X & C) ^ D -> (X & C) ^ (D ^ C)): New simplifications if
>   (~D ^ C) or (D ^ C) can be simplified.
> 
>   * gcc.dg/tree-ssa/pr96691.c: New test.
> 
> --- gcc/match.pd.jj   2021-01-13 08:01:58.197627154 +0100
> +++ gcc/match.pd  2021-01-13 10:25:32.327321918 +0100
> @@ -947,8 +947,18 @@ (define_operator_list COND_TERNARY
>   (bit_ior:c (bit_xor:c@3 @0 @1) (bit_xor:c (bit_xor:c @1 @2) @0))
>   (bit_ior @3 @2))
>  
> -/* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0.  */
>  #if GIMPLE
> +/* (~X | C) ^ D -> (X | C) ^ (~D ^ C) if (~D ^ C) can be simplified.  */
> +(simplify
> + (bit_xor (bit_ior:s (bit_not @0) @1) @2)
> +  (bit_xor (bit_ior @0 @1) (bit_xor! (bit_not! @2) @1)))
> +
> +/* (~X & C) ^ D -> (X & C) ^ (D ^ C) if (D ^ C) can be simplified.  */
> +(simplify
> + (bit_xor (bit_and:s (bit_not @0) @1) @2)
> +  (bit_xor (bit_and @0 @1) (bit_xor! @2 @1)))
> +
> +/* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0.  */
>  (simplify
>   (bit_and (bit_not SSA_NAME@0) INTEGER_CST@1)
>   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> --- gcc/testsuite/gcc.dg/tree-ssa/pr96691.c.jj   2021-01-13 10:16:45.828331557 +0100
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr96691.c   2021-01-13 10:16:45.828331557 +0100
> @@ -0,0 +1,21 @@
> +/* PR tree-optimization/96691 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times " \\\| 123;" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\\& 123;" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\\^ -315;" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " \\\^ 314;" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " \\\^ 321;" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " = ~" "optimized" } } */
> +
> +int
> +foo (int x)
> +{
> +  return (~x | 123) ^ 321;
> +}
> +
> +int
> +bar (int x)
> +{
> +  return (~x & 123) ^ 321;
> +}
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
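
For readers who want to convince themselves of the two rewrites in the
quoted patch, a brute-force check over small values is enough, since
both identities hold bit by bit.  The sketch relies on Python integers
behaving like infinitely sign-extended two's complement values; the
function names are made up for the illustration.

```python
def or_form_holds(x, c, d):
    """(~X | C) ^ D == (X | C) ^ (~D ^ C)"""
    return ((~x | c) ^ d) == ((x | c) ^ (~d ^ c))


def and_form_holds(x, c, d):
    """(~X & C) ^ D == (X & C) ^ (D ^ C)"""
    return ((~x & c) ^ d) == ((x & c) ^ (d ^ c))


if __name__ == "__main__":
    values = range(-16, 16)
    print(all(or_form_holds(x, c, d) and and_form_holds(x, c, d)
              for x in values for c in values for d in values))  # → True

    # Constants from foo and bar in the testcase: C = 123, D = 321.
    print(~321 ^ 123)  # → -315, the constant scanned for in foo
    print(321 ^ 123)   # → 314, the constant scanned for in bar
```

The folded constants match the " \^ -315;" and " \^ 314;" patterns the
testcase scans for.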


Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-13 Thread Rainer Orth
Hi Clement,

>> This alone makes the patch unacceptable in its present form: breaking
>> the libstdc++ ABI is a non-starter.  However, I suspect this can be
>> avoided somehow.
> Thanks for having tried and I agree the patch isn't ready at all. I've just
> wanted to see how behave on other systems and it seems that it might
> not work as is.

even if it did, this is certainly not stage3 material.  However,
submitting the patch early gives us enough time to work out the issues
in time for GCC 12.

> However, is the C++ ABI not standard ? I don't have this test failing on
> AIX, AFAIK. Strange.

Not really: the ABI tests rely on symbol versioning, which is only
available on Solaris and GNU ELF targets.

>> +FAIL: 22_locale/classification/isblank.cc execution test
>>
>> /vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/classification/isblank.cc:38:
>> void test02(): Assertion 'std::isblank(L' ', std::locale::classic())'
>> failed.
>>
>> It turns out that this is caused by ieee_1003.1-2008/ctype_members.cc
>> using __bitmapsize = 11 in a couple of places, unlike the generic
>> version which uses 15 to accommodate variations in  character
>> classifications.
> 11 is the correct value on AIX and maybe also on DragonFly/FreeBSD,
> based on the value in the DragonFly model. However, big endian vs little
> endian needs to be handled. I haven't fixed that part in the current
> patch I guess. (when bitmapsize = 15, there is no difference, endianness
> doesn't matter). However, it might be simpler to set bitmapsize = 15
> for everyone, instead of having some defines for that. Maybe.

I guess so, because this matches what the generic version does:
obviously there are targets that need it depending in their 
constants, and Solaris may be just one of them.

OTOH, I wondered if it wouldn't be better to start from the GNU code
which is certainly known to work, rather than the DragonflyBSD one which
may well have bitrotten since most of the BSDs moved to LLVM.  Then
again, it may not: Gerald tests on FreeBSD regularly.  Perhaps a 3-way
merge of gnu and *bsd -> ieee_1003.1-2008 is in order?

>> However, there are many more which I haven't even started to
>> investigate.  I suspect there's one (or a few) reasons immediately
>> obvious to someone familiar with the code.
> Thanks for the list. I'll check if there is some common with AIX. But
> wait until I push the new patch to start studying them.
> It'll have the correct #ifdef based on configure and I'll try it on Linux.
> Thus, it should be far better.

That has been my plan: you're obviously way more familiar with this code
than I am.

Thanks for doing all this work.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-13 Thread CHIGOT, CLEMENT via Gcc-patches
Hi Rainer,

> This alone makes the patch unacceptable in its present form: breaking
> the libstdc++ ABI is a non-starter.  However, I suspect this can be
> avoided somehow.
Thanks for trying it; I agree the patch isn't ready at all. I just
wanted to see how it behaves on other systems, and it seems that it might
not work as is.
However, isn't the C++ ABI standard? I don't see this test failing on
AIX, AFAIK. Strange.


> +FAIL: 22_locale/classification/isblank.cc execution test
>
> /vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/classification/isblank.cc:38:
>  void test02(): Assertion 'std::isblank(L' ', std::locale::classic())' failed.
>
> It turns out that this is caused by ieee_1003.1-2008/ctype_members.cc
> using __bitmapsize = 11 in a couple of places, unlike the generic
> version which uses 15 to accommodate variations in  character
> classifications.
11 is the correct value on AIX and maybe also on DragonFly/FreeBSD,
based on the value in the DragonFly model. However, big endian vs little
endian needs to be handled. I haven't fixed that part in the current
patch, I guess. (When bitmapsize = 15, there is no difference; endianness
doesn't matter). However, it might be simpler to set bitmapsize = 15
for everyone, instead of having some defines for that. Maybe.

> However, there are many more which I haven't even started to
> investigate.  I suspect there's one (or a few) reasons immediately
> obvious to someone familiar with the code.
Thanks for the list. I'll check whether some are in common with AIX. But
please wait until I push the new patch before you start studying them.
It'll have the correct #ifdef based on configure and I'll try it on Linux.
Thus, it should be far better.


Thanks,
Clément


Re: [committed][nvptx] Set -misa=sm_35 by default

2021-01-13 Thread Thomas Schwinge
Hi Tom!

On 2020-10-09T13:56:09+0200, Tom de Vries  wrote:
> The nvptx-as assembler verifies the ptx code using ptxas, if there's any
> in the PATH.
>

After quite some digression to first add a testsuite to nvptx-tools (see
 or just
), which
I found advisable generally, and then given the kinds of changes we're
now doing :-) -- I've now prepared nvptx-as code changes as discussed in
 "nvptx-as
should not assume a default sm version".  (Currently testing.)

> The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas of the
> latest cuda release (11.1) no longer supports sm_30.
>
> Consequently we cannot build gcc against that release (although we should
> still be able to build without any cuda release).
>
> Fix this by setting -misa=sm_35 by default.
>
> Tested check-gcc on nvptx.
>
> Tested libgomp on x86_64-linux with nvptx accelerator.
>
> Both built against cuda 9.1.
>
> Committed to trunk.

ACK.

What is your opinion about backporting that (plus Tobias' documentation
update, plus corresponding web 'changes.html' updates?) to release
branches, so that nvptx offloading users may use GCC 10/9/8 with CUDA
11.0+?

I don't think losing sm_30 support is a major concern: support for sm_35
has been introduced with PTX ISA 3.1, CUDA 5.0, driver r302.


Further:

> [nvptx] Set -misa=sm_35 by default

>   PR target/97348
>   * config/nvptx/nvptx.h (ASM_SPEC): Also pass -m to nvptx-as if
>   default is used.
>   * config/nvptx/nvptx.opt (misa): Init with PTX_ISA_SM35.

> --- a/gcc/config/nvptx/nvptx.h
> +++ b/gcc/config/nvptx/nvptx.h

> -#define ASM_SPEC "%{misa=*:-m %*}"
> +/* Default needs to be in sync with default for misa in nvptx.opt.
> +   We add a default here to work around a hard-coded sm_30 default in
> +   nvptx-as.  */
> +#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"

> --- a/gcc/config/nvptx/nvptx.opt
> +++ b/gcc/config/nvptx/nvptx.opt

> +; Default needs to be in sync with default in ASM_SPEC in nvptx.h.
>  misa=
> -Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM30)
> +Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM35)
>  Specify the version of the ptx ISA to use.

As I'd suggested in
 "nvptx-as
should not assume a default sm version", I'd then push the attached
"[nvptx] Let nvptx-as figure out the target architecture [PR97348]" to
GCC master branch, OK?  (Currently testing.)

That one I wouldn't backport to GCC release branches, so that we don't
require users to update nvptx-tools for these builds.
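
For reference, the effect of the ASM_SPEC default quoted above
("%{misa=*:-m %*; :-m sm_35}") can be modelled roughly as follows: an
explicit -misa= value is forwarded to nvptx-as as "-m VALUE", and
otherwise "-m sm_35" is passed.  This is only a sketch of the spec's
intent; the function name `asm_args` is made up, and the GCC driver's
real handling of repeated or overridden options is not modelled.

```python
def asm_args(driver_args, default="sm_35"):
    """Approximate the expansion of "%{misa=*:-m %*; :-m sm_35}".

    Each -misa=VALUE in driver_args contributes "-m VALUE"; if there is
    no such option, the default is passed instead.
    """
    prefix = "-misa="
    out = []
    for arg in driver_args:
        if arg.startswith(prefix):
            out += ["-m", arg[len(prefix):]]
    return out if out else ["-m", default]


if __name__ == "__main__":
    print(asm_args(["-O2", "-misa=sm_30"]))  # → ['-m', 'sm_30']
    print(asm_args(["-O2"]))                 # → ['-m', 'sm_35']
```

With the ASM_SPEC definition removed, as in the attached patch, the
second case would pass no -m at all and leave the choice to nvptx-as.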


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
>From eac0d3458f38cd5bb4c930b2887a547b64b046ef Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 13 Jan 2021 09:04:47 +0100
Subject: [PATCH] [nvptx] Let nvptx-as figure out the target architecture
 [PR97348]

... now that it has been enhanced to do so.

This is a follow-up to PR97348 commit 383400a6078d75bbfa1216c9af2c37f7e88740c9
"[nvptx] Set -misa=sm_35 by default".

	gcc/
	PR target/97348
	* config/nvptx/nvptx.h (ASM_SPEC): Don't set.
	* config/nvptx/nvptx.opt (misa): Adjust comment.
---
 gcc/config/nvptx/nvptx.h   | 5 -
 gcc/config/nvptx/nvptx.opt | 1 -
 2 files changed, 6 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 2451703e77f..1a61e6207f6 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -29,11 +29,6 @@
 
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
-/* Default needs to be in sync with default for misa in nvptx.opt.
-   We add a default here to work around a hard-coded sm_30 default in
-   nvptx-as.  */
-#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
-
 #define TARGET_CPU_CPP_BUILTINS()		\
   do		\
 {		\
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 51363e4e276..cf7f9022663 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -61,7 +61,6 @@ Enum(ptx_isa) String(sm_30) Value(PTX_ISA_SM30)
 EnumValue
 Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35)
 
-; Default needs to be in sync with default in ASM_SPEC in nvptx.h.
 misa=
 Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM35)
 Specify the version of the ptx ISA to use.
-- 
2.17.1



Re: [PATCH] libstdc++: implement locale support for AIX

2021-01-13 Thread Rainer Orth
Hi David,

>> I agree that #ifdef's are not the correct approach, but, if you enable
>> the fallbacks for Solaris, does everything then work?  Are those
>> fallbacks portable and we solely need a better mechanism to enable
>> them on platforms that require them?
>
> it mostly compiles, with two caveats:
>
> * c_locale.h needs to include  for declarations of strtod and
>   friends.
>
> * Solaris  only declares the int_p_cs_precedes etc. members of
>   struct lconv for C99+, but not for C++11+, as it should.  I'll file a
>   bug for that, but for now one can work around the issue by defining
>   _LCONV_C99 before including  in monetary_members.cc.
>
> With those changes, I can at least build libstdc++ with
> --enable-clocale=ieee_1003.1-2008.  Bootstrap still running, though.

while that allowed the compilation to succeed, test results are not
good:

+FAIL: libstdc++-abi/abi_check

# of added symbols:          173
# of missing symbols:        72
# of undesignated symbols:   0
# of incompatible symbols:   144

This alone makes the patch unacceptable in its present form: breaking
the libstdc++ ABI is a non-starter.  However, I suspect this can be
avoided somehow.

+FAIL: 22_locale/classification/isblank.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/classification/isblank.cc:38:
 void test02(): Assertion 'std::isblank(L' ', std::locale::classic())' failed.

It turns out that this is caused by ieee_1003.1-2008/ctype_members.cc
using __bitmapsize = 11 in a couple of places, unlike the generic
version which uses 15 to accommodate variations in  character
classifications.

Making this change lets a few tests PASS:

@@ -3354 +3354 @@
-FAIL: 22_locale/classification/isblank.cc execution test
+PASS: 22_locale/classification/isblank.cc execution test
@@ -3621 +3621 @@
-FAIL: 22_locale/ctype/is/wchar_t/1.cc execution test
+PASS: 22_locale/ctype/is/wchar_t/1.cc execution test
@@ -3655 +3655 @@
-FAIL: 22_locale/ctype/scan/wchar_t/1.cc execution test
+PASS: 22_locale/ctype/scan/wchar_t/1.cc execution test
@@ -3657 +3657 @@
-FAIL: 22_locale/ctype/scan/wchar_t/wrapped_env.cc execution test
+PASS: 22_locale/ctype/scan/wchar_t/wrapped_env.cc execution test
@@ -10864 +10864 @@
-FAIL: 28_regex/algorithms/regex_match/ecma/wchar_t/hex.cc execution test
+PASS: 28_regex/algorithms/regex_match/ecma/wchar_t/hex.cc execution test
@@ -11070 +11070 @@
-FAIL: 28_regex/traits/wchar_t/isctype.cc execution test
+PASS: 28_regex/traits/wchar_t/isctype.cc execution test

However, there are many more which I haven't even started to
investigate.  I suspect there are one or a few causes that would be
immediately obvious to someone familiar with the code.

+FAIL: 22_locale/codecvt/in/wchar_t/3.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/codecvt/in/wchar_t/3.cc:118:
 void test03(): Assertion '!int_traits::compare(i_arr, i_lit, size)' failed.

+FAIL: 22_locale/codecvt/max_length/wchar_t/4.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/codecvt/max_length/wchar_t/4.cc:41:
 void test04(): Assertion 'k == 6' failed.

+FAIL: 22_locale/codecvt/out/wchar_t/3.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/codecvt/out/wchar_t/3.cc:113:
 void test03(): Assertion 'r2 == codecvt_base::ok' failed.

+FAIL: 22_locale/collate/compare/wchar_t/3.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/collate/compare/wchar_t/3.cc:61:
 void test03(): Assertion 'i == -1' failed.

+FAIL: 22_locale/collate/transform/wchar_t/3.cc execution test

terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::append

+FAIL: 22_locale/ctype/is/wchar_t/1.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/ctype/is/wchar_t/1.cc:71:
 void test01(): Assertion 'gctype.is(std::ctype_base::punct, c40)' failed.

+FAIL: 22_locale/ctype/is/wchar_t/wrapped_env.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/ctype/is/wchar_t/1.cc:71:
 void test01(): Assertion 'gctype.is(std::ctype_base::punct, c40)' failed.

+FAIL: 22_locale/ctype/scan/wchar_t/1.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/ctype/scan/wchar_t/1.cc:69:
 void test01(): Assertion 'gctype.scan_is((std::ctype_base::xdigit), (ca), (ca) 
+ traits_type::length(ca)) == (ca)' failed.

+FAIL: 22_locale/ctype/scan/wchar_t/wrapped_env.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/ctype/scan/wchar_t/1.cc:69:
 void test01(): Assertion 'gctype.scan_is((std::ctype_base::xdigit), (ca), (ca) 
+ traits_type::length(ca)) == (ca)' failed.

+FAIL: 22_locale/ctype/widen/wchar_t/2.cc execution test

/vol/gcc/src/hg/master/solaris/libstdc++-v3/testsuite/22_locale/ctype/widen/wchar_t/2.cc:39:
 void test02(): Assertion 'wc == static_cast(0xff)' failed.

+FAIL: 22_locale/locale/cons/29217.cc execution test


[pushed] aarch64: Tighten condition on sve/sel* tests

2021-01-13 Thread Richard Sandiford via Gcc-patches
Noticed while testing on a different machine that the sve/sel_*.c
tests require .variant_pcs support but don't test for it.
.variant_pcs post-dates SVE so there shouldn't be a need to test
for both.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/testsuite/
* gcc.target/aarch64/sve/sel_1.c: Require aarch64_variant_pcs.
* gcc.target/aarch64/sve/sel_2.c: Likewise.
* gcc.target/aarch64/sve/sel_3.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/sve/sel_1.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/sel_2.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/sel_3.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/sel_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/sel_1.c
index 9c581c52fde..65208ddbdf1 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/sel_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/sel_1.c
@@ -1,4 +1,4 @@
-/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-do assemble { target aarch64_variant_pcs } } */
 /* { dg-options "-O2 -msve-vector-bits=256 --save-temps" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/sel_2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/sel_2.c
index 60aaa878534..8087073b662 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/sel_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/sel_2.c
@@ -1,4 +1,4 @@
-/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-do assemble { target aarch64_variant_pcs } } */
 /* { dg-options "-O2 -msve-vector-bits=256 --save-temps" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/sel_3.c 
b/gcc/testsuite/gcc.target/aarch64/sve/sel_3.c
index 36ec15b7da6..68f9d97ea72 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/sel_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/sel_3.c
@@ -1,4 +1,4 @@
-/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-do assemble { target aarch64_variant_pcs } } */
 /* { dg-options "-O2 -msve-vector-bits=256 --save-temps" } */
 /* { dg-final { check-function-bodies "**" "" } } */
 


[PATCH] tree-optimization/92645 - improve SLP with existing vectors

2021-01-13 Thread Richard Biener
This improves SLP discovery in the face of existing vectors allowing
punning of the vector shape (or even punning from an integer type).
For punning from integer types this does not yet handle lane zero
extraction being represented as conversion rather than BIT_FIELD_REF.

On x86 this for example improves the added testcases from

foo:
.LFB0:
.cfi_startproc
movdqa  (%rsi), %xmm0
movdqa  %xmm0, %xmm2
movdqa  %xmm0, %xmm1
punpckhdq   %xmm0, %xmm2
movdqa  %xmm2, %xmm3
pshufd  $85, %xmm0, %xmm2
pshufd  $255, %xmm0, %xmm0
punpckldq   %xmm0, %xmm2
movdqa  %xmm1, %xmm0
punpckldq   %xmm3, %xmm0
punpcklqdq  %xmm2, %xmm0
movaps  %xmm0, (%rdi)
ret

and

bar:
.LFB1:
.cfi_startproc
movq    (%rsi), %rax
movq    8(%rsi), %rdx
sarq    $32, %rax
movd    %edx, %xmm3
movq    %rax, %xmm0
movq    %rdx, %rax
sarq    $32, %rax
movdqa  %xmm0, %xmm1
punpckldq   %xmm3, %xmm0
movd    %eax, %xmm2
punpckldq   %xmm2, %xmm1
punpcklqdq  %xmm1, %xmm0
movaps  %xmm0, (%rdi)
ret

to just

foo:
.LFB0:
.cfi_startproc
pshufd  $216, (%esi), %xmm0
movaps  %xmm0, (%edi)
ret

and

bar:
.LFB1:
.cfi_startproc
pshufd  $217, (%esi), %xmm0
movaps  %xmm0, (%edi)
ret
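
As a reading aid for the two pshufd immediates above, here is a minimal
sketch that decodes them into lane selectors.  The helper names are made up
for this note and are not part of the patch; the only fact relied on is the
usual pshufd encoding, where destination lane i takes the source lane named
by bits 2*i+1..2*i of the immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a pshufd immediate: destination lane I takes the source lane
   selected by bits [2*I+1 : 2*I] of IMM.  */
static void
pshufd_lanes (uint8_t imm, int lanes[4])
{
  for (int i = 0; i < 4; i++)
    lanes[i] = (imm >> (2 * i)) & 3;
}

/* Apply the shuffle to a four-lane source.  Hypothetical helper; SRC and
   DST must not alias because lanes are written one at a time.  */
static void
pshufd_apply (uint8_t imm, const int32_t src[4], int32_t dst[4])
{
  int lanes[4];

  assert (src != dst);
  pshufd_lanes (imm, lanes);
  for (int i = 0; i < 4; i++)
    dst[i] = src[lanes[i]];
}
```

Decoding $216 gives lanes {0, 2, 1, 3} and $217 gives {1, 2, 1, 3}, which
are the element orders that foo and bar in the new testcase store into *p.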

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

2021-01-13  Richard Biener  

PR tree-optimization/92645
* tree-vect-slp.c (vect_build_slp_tree_1): Relax supported
BIT_FIELD_REF argument.
(vect_build_slp_tree_2): Record the desired vector type
on the external vector def.
(vectorizable_slp_permutation): Handle required punning
of existing vector defs.

* gcc.target/i386/pr92645-6.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr92645-6.c | 34 +++
 gcc/tree-vect-slp.c   | 31 +++--
 2 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr92645-6.c

diff --git a/gcc/testsuite/gcc.target/i386/pr92645-6.c 
b/gcc/testsuite/gcc.target/i386/pr92645-6.c
new file mode 100644
index 000..c5c5f8f8df2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr92645-6.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -msse2" } */
+
+typedef long v2di __attribute__((vector_size(16)));
+typedef int v4si __attribute__((vector_size(16)));
+
+void foo (v4si *p, v2di *q)
+{
+  union { v2di a; v4si b; } u;
+  u.a = *q;
+  (*p)[0] = u.b[0];
+  (*p)[1] = u.b[2];
+  (*p)[2] = u.b[1];
+  (*p)[3] = u.b[3];
+}
+
+void bar (v4si *p, __int128_t *q)
+{
+  union { __int128_t a; v4si b; } u;
+  u.a = *q;
+  (*p)[0] = u.b[1];
+  (*p)[1] = u.b[2];
+  (*p)[2] = u.b[1];
+  (*p)[3] = u.b[3];
+}
+
+/* Both functions should end up with sth like
+ [v]pshufd $val, (%esi), %xmm0
+ [v]movdqa %xmm0, (%edi)
+ ret
+   recognized by SLP vectorization involving an existing "vector".  */
+/* { dg-final { scan-assembler-not "punpck" } } */
+/* { dg-final { scan-assembler-times "pshufd" 2 } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 8670d5455b9..c2a3d46c6e7 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1109,7 +1109,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  tree vec = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
  if (!is_a  (vinfo)
  || TREE_CODE (vec) != SSA_NAME
- || !types_compatible_p (vectype, TREE_TYPE (vec)))
+ || !operand_equal_p (TYPE_SIZE (vectype),
+  TYPE_SIZE (TREE_TYPE (vec))))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -1721,7 +1722,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  lperm.safe_push (std::make_pair (0, (unsigned)lane));
}
   slp_tree vnode = vect_create_new_slp_node (vNULL);
-  SLP_TREE_VECTYPE (vnode) = TREE_TYPE (vec);
+  /* ???  We record vectype here but we hide eventually necessary
+punning and instead rely on code generation to materialize
+VIEW_CONVERT_EXPRs as necessary.  We instead should make
+this explicit somehow.  */
+  SLP_TREE_VECTYPE (vnode) = vectype;
   SLP_TREE_VEC_DEFS (vnode).safe_push (vec);
   /* We are always building a permutation node even if it is an identity
 permute to shield the rest of the vectorizer from the odd node
@@ -6114,6 +6119,18 @@ vectorizable_slp_permutation (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
  slp_tree first_node = SLP_TREE_CHILDREN (node)[first_vec.first];
  tree first_def
= vect_get_slp_vect_def 

[pushed] rtl-ssa: Fix reversed comparisons in accesses.h comment

2021-01-13 Thread Richard Sandiford via Gcc-patches
Noticed while looking at something else that the comment above
def_lookup got the description of the comparisons the wrong way
round.

Tested on aarch64-linux-gnu and pushed as obvious.

Richard


gcc/
* rtl-ssa/accesses.h (def_lookup): Fix order of comparison results.
---
 gcc/rtl-ssa/accesses.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
index a47f9d997ec..09ae583f993 100644
--- a/gcc/rtl-ssa/accesses.h
+++ b/gcc/rtl-ssa/accesses.h
@@ -965,13 +965,13 @@ public:
 //   clobber_group that spans P.  MUX then contains this definition
 //   or clobber_group.
 //
-// - Otherwise, COMPARISON is less than 0 if we found the definition
+// - Otherwise, COMPARISON is greater than 0 if we found the definition
 //   that precedes P or the group of clobbers that precedes P.  MUX then
 //   contains this definition or clobber_group.
 //
-// - Otherwise, COMPARISON is greater than zero and we found the
-//   definition that follows P, or the group of clobbers that follows P.
-//   MUX then contains this definition or clobber_group.
+// - Otherwise, COMPARISON is less than zero and we found the definition
+//   that follows P, or the group of clobbers that follows P.  MUX then
+//   contains this definition or clobber_group.
 class def_lookup
 {
 public:


Re: [PATCH] Hurd: Enable ifunc by default

2021-01-13 Thread Thomas Schwinge
Hi!

Thanks (and sorry for the delay), pushed "Hurd: Enable ifunc by default"
to master branch in commit e9cb89b936f831a02318d45fc4ddb06f7be55ae4, and
cherry-picked into releases/gcc-10 branch in commit
92b131491c22eb4e4b663d226e9d97f1fd693063, releases/gcc-9 branch in commit
0313ce139f4ca3c96db9dc82125ec9e4a167a224, releases/gcc-8 branch in commit
975b0fa0f43e84bed3cb1b2b593132bc219f962c, see attached.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
>From e9cb89b936f831a02318d45fc4ddb06f7be55ae4 Mon Sep 17 00:00:00 2001
From: Samuel Thibault 
Date: Sun, 8 Nov 2020 23:52:51 +0100
Subject: [PATCH] Hurd: Enable ifunc by default

The binutils bugs seem to have been fixed.

	gcc/
	* config.gcc [$target == *-*-gnu*]: Enable
	'default_gnu_indirect_function'.
---
 gcc/config.gcc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 4bec543fa76..9fb57e96121 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3598,7 +3598,9 @@ esac
 case ${target} in
 *-*-linux*android*|*-*-linux*uclibc*|*-*-linux*musl*)
 ;;
-*-*-linux*)
+*-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu)
+;;
+*-*-linux* | *-*-gnu*)
 	case ${target} in
 	aarch64*-* | arm*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-*)
 		default_gnu_indirect_function=yes
-- 
2.17.1

>From 92b131491c22eb4e4b663d226e9d97f1fd693063 Mon Sep 17 00:00:00 2001
From: Samuel Thibault 
Date: Sun, 8 Nov 2020 23:52:51 +0100
Subject: [PATCH] Hurd: Enable ifunc by default

The binutils bugs seem to have been fixed.

	gcc/
	* config.gcc [$target == *-*-gnu*]: Enable
	'default_gnu_indirect_function'.

(cherry picked from commit e9cb89b936f831a02318d45fc4ddb06f7be55ae4)
---
 gcc/config.gcc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 67bce508a1d..cb3e3238e91 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3542,7 +3542,9 @@ esac
 case ${target} in
 *-*-linux*android*|*-*-linux*uclibc*|*-*-linux*musl*)
 ;;
-*-*-linux*)
+*-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu)
+;;
+*-*-linux* | *-*-gnu*)
 	case ${target} in
 	aarch64*-* | arm*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-*)
 		default_gnu_indirect_function=yes
-- 
2.17.1

>From 0313ce139f4ca3c96db9dc82125ec9e4a167a224 Mon Sep 17 00:00:00 2001
From: Samuel Thibault 
Date: Sun, 8 Nov 2020 23:52:51 +0100
Subject: [PATCH] Hurd: Enable ifunc by default

The binutils bugs seem to have been fixed.

	gcc/
	* config.gcc [$target == *-*-gnu*]: Enable
	'default_gnu_indirect_function'.

(cherry picked from commit e9cb89b936f831a02318d45fc4ddb06f7be55ae4)
---
 gcc/config.gcc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 477aba7e0f6..82f80d4c748 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3283,7 +3283,9 @@ esac
 case ${target} in
 *-*-linux*android*|*-*-linux*uclibc*|*-*-linux*musl*)
 ;;
-*-*-linux*)
+*-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu)
+;;
+*-*-linux* | *-*-gnu*)
 	case ${target} in
 	aarch64*-* | arm*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-*)
 		default_gnu_indirect_function=yes
-- 
2.17.1

>From 975b0fa0f43e84bed3cb1b2b593132bc219f962c Mon Sep 17 00:00:00 2001
From: Samuel Thibault 
Date: Sun, 8 Nov 2020 23:52:51 +0100
Subject: [PATCH] Hurd: Enable ifunc by default

The binutils bugs seem to have been fixed.

	gcc/
	* config.gcc [$target == *-*-gnu*]: Enable
	'default_gnu_indirect_function'.

(cherry picked from commit e9cb89b936f831a02318d45fc4ddb06f7be55ae4)
---
 gcc/config.gcc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 61bf317ea11..af9d1221da8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3174,7 +3174,9 @@ esac
 case ${target} in
 *-*-linux*android*|*-*-linux*uclibc*|*-*-linux*musl*)
 ;;
-*-*-linux*)
+*-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu)
+;;
+*-*-linux* | *-*-gnu*)
 	case ${target} in
 	aarch64*-* | arm*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-*)
 		default_gnu_indirect_function=yes
-- 
2.17.1



Re: [PATCHv2] hurd: libgcc unwinding over signal trampolines with SIGINFO

2021-01-13 Thread Thomas Schwinge
Hi!

On 2020-12-21T15:36:30+0100, Samuel Thibault  wrote:
> When the application sets SA_SIGINFO, the signal trampoline parameters
> are different to follow POSIX.

Thanks (and sorry for the delay), pushed "hurd: libgcc unwinding over
signal trampolines with SIGINFO" to master branch in commit
2b356e689c334ca4765a9ffd95a76cf715447200, and cherry-picked into
releases/gcc-10 branch in commit
2c4d3e6db8583c83f4252bb0c78b85f174420a90, releases/gcc-9 branch in commit
23b1bb7b22aa23a1f096448c319856cbeb720082, releases/gcc-8 branch in commit
c6c6d1d33533a3107bec0bf6f149761b4084d8b4, see attached.
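
For readers skimming the attached patch: the legacy/siginfo distinction is
made purely from the value of the second trampoline parameter.  A minimal
sketch of that range test follows; the function name is invented for this
note, and the bounds are simply the ones that appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the range test in the patch: a value in
   [-16, 4096) is taken to be a legacy sigcode, anything else is assumed
   to be a siginfo_t pointer.  */
static bool
looks_like_legacy_sigcode (long second_arg)
{
  return second_arg >= -16 && second_arg < 4096;
}

/* Usage: small codes are legacy, the first value past the window is not.  */
static void
example_classification (void)
{
  assert (looks_like_legacy_sigcode (0));
  assert (!looks_like_legacy_sigcode (4096));
}
```

The heuristic relies on no valid siginfo_t ever living at such a low (or
slightly negative) address, which is an assumption of the patch rather than
something checked here.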


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
>From 2b356e689c334ca4765a9ffd95a76cf715447200 Mon Sep 17 00:00:00 2001
From: Samuel Thibault 
Date: Mon, 21 Dec 2020 15:36:30 +0100
Subject: [PATCH] hurd: libgcc unwinding over signal trampolines with SIGINFO

When the application sets SA_SIGINFO, the signal trampoline parameters
are different to follow POSIX.

	libgcc/
	* config/i386/gnu-unwind.h (x86_gnu_fallback_frame_state): Add the
	posix siginfo case to struct handler_args. Detect between legacy
	and siginfo from the second parameter, which is a small sigcode in
	the legacy case, and a pointer in the siginfo case.
---
 libgcc/config/i386/gnu-unwind.h | 60 ++---
 1 file changed, 47 insertions(+), 13 deletions(-)

diff --git a/libgcc/config/i386/gnu-unwind.h b/libgcc/config/i386/gnu-unwind.h
index d69d07c1c80..0632348d4cd 100644
--- a/libgcc/config/i386/gnu-unwind.h
+++ b/libgcc/config/i386/gnu-unwind.h
@@ -38,10 +38,21 @@ x86_gnu_fallback_frame_state
 {
   struct handler_args {
 int signo;
-int sigcode;
-struct sigcontext *scp;
+union
+  {
+	struct
+	  {
+	long int sigcode;
+	struct sigcontext *scp;
+	  } legacy;
+	struct
+	  {
+	siginfo_t *siginfop;
+	ucontext_t *uctxp;
+	  } posix;
+  };
   } *handler_args;
-  struct sigcontext *scp;
+  long int sigcode;
   unsigned long usp;
 
 /*
@@ -75,29 +86,52 @@ x86_gnu_fallback_frame_state
 return _URC_END_OF_STACK;
 
   handler_args = context->cfa;
-  scp = handler_args->scp;
-  usp = scp->sc_uesp;
+  sigcode = handler_args->legacy.sigcode;
+  if (sigcode >= -16 && sigcode < 4096)
+{
+  /* This cannot be a SIGINFO pointer, assume legacy.  */
+  struct sigcontext *scp = handler_args->legacy.scp;
+  usp = scp->sc_uesp;
+
+  fs->regs.reg[0].loc.offset = (unsigned long)&scp->sc_eax - usp;
+  fs->regs.reg[1].loc.offset = (unsigned long)&scp->sc_ecx - usp;
+  fs->regs.reg[2].loc.offset = (unsigned long)&scp->sc_edx - usp;
+  fs->regs.reg[3].loc.offset = (unsigned long)&scp->sc_ebx - usp;
+  fs->regs.reg[5].loc.offset = (unsigned long)&scp->sc_ebp - usp;
+  fs->regs.reg[6].loc.offset = (unsigned long)&scp->sc_esi - usp;
+  fs->regs.reg[7].loc.offset = (unsigned long)&scp->sc_edi - usp;
+  fs->regs.reg[8].loc.offset = (unsigned long)&scp->sc_eip - usp;
+}
+  else
+{
+  /* This is not a valid sigcode, assume SIGINFO.  */
+  ucontext_t *uctxp = handler_args->posix.uctxp;
+  gregset_t *gregset = &uctxp->uc_mcontext.gregs;
+  usp = (*gregset)[REG_UESP];
+
+  fs->regs.reg[0].loc.offset = (unsigned long)&(*gregset)[REG_EAX] - usp;
+  fs->regs.reg[1].loc.offset = (unsigned long)&(*gregset)[REG_ECX] - usp;
+  fs->regs.reg[2].loc.offset = (unsigned long)&(*gregset)[REG_EDX] - usp;
+  fs->regs.reg[3].loc.offset = (unsigned long)&(*gregset)[REG_EBX] - usp;
+  fs->regs.reg[5].loc.offset = (unsigned long)&(*gregset)[REG_EBP] - usp;
+  fs->regs.reg[6].loc.offset = (unsigned long)&(*gregset)[REG_ESI] - usp;
+  fs->regs.reg[7].loc.offset = (unsigned long)&(*gregset)[REG_EDI] - usp;
+  fs->regs.reg[8].loc.offset = (unsigned long)&(*gregset)[REG_EIP] - usp;
+}
 
   fs->regs.cfa_how = CFA_REG_OFFSET;
   fs->regs.cfa_reg = 4;
   fs->regs.cfa_offset = usp - (unsigned long) context->cfa;
 
   fs->regs.reg[0].how = REG_SAVED_OFFSET;
-  fs->regs.reg[0].loc.offset = (unsigned long)&scp->sc_eax - usp;
   fs->regs.reg[1].how = REG_SAVED_OFFSET;
-  fs->regs.reg[1].loc.offset = (unsigned long)&scp->sc_ecx - usp;
   fs->regs.reg[2].how = REG_SAVED_OFFSET;
-  fs->regs.reg[2].loc.offset = (unsigned long)&scp->sc_edx - usp;
   fs->regs.reg[3].how = REG_SAVED_OFFSET;
-  fs->regs.reg[3].loc.offset = (unsigned long)&scp->sc_ebx - usp;
   fs->regs.reg[5].how = REG_SAVED_OFFSET;
-  fs->regs.reg[5].loc.offset = (unsigned long)&scp->sc_ebp - usp;
   fs->regs.reg[6].how = REG_SAVED_OFFSET;
-  fs->regs.reg[6].loc.offset = (unsigned long)&scp->sc_esi - usp;
   fs->regs.reg[7].how = REG_SAVED_OFFSET;
-  fs->regs.reg[7].loc.offset = (unsigned long)&scp->sc_edi - usp;
   fs->regs.reg[8].how = REG_SAVED_OFFSET;
-  fs->regs.reg[8].loc.offset = (unsigned long)&scp->sc_eip - usp;
+
   fs->retaddr_column = 8;
   fs->signal_frame = 1;
 
-- 
2.17.1

>From 

[PATCH][pushed] gcc-changelog: Allow modifications to old ChangeLogs without entry

2021-01-13 Thread Martin Liška

This allows modifications to ChangeLog* files without the need to create
a ChangeLog entry for them.
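
The filename rule this introduces is small enough to restate.  Below is a
rough C rendering of it, purely for illustration: the real check is the
Python method is_changelog_filename in git_commit.py, and the manual
basename step here is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if PATH names a ChangeLog file.  With ALLOW_SUFFIX, names
   such as ChangeLog-2020 are accepted as well.  */
static bool
is_changelog_filename (const char *path, bool allow_suffix)
{
  /* Equivalent of taking the basename: skip past the last '/'.  */
  const char *slash = strrchr (path, '/');
  const char *base = slash ? slash + 1 : path;

  if (strcmp (base, "ChangeLog") == 0)
    return true;
  return allow_suffix
	 && strncmp (base, "ChangeLog", strlen ("ChangeLog")) == 0;
}

/* Usage: the suffix form is only accepted when explicitly allowed.  */
static void
example_changelog_names (void)
{
  assert (is_changelog_filename ("gcc/ChangeLog", false));
  assert (!is_changelog_filename ("gcc/ChangeLog-2020", false));
  assert (is_changelog_filename ("gcc/ChangeLog-2020", true));
}
```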

Pushed.
Martin

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Allow modifications of older
ChangeLog (or specific) files without need to make a ChangeLog
entry.
* gcc-changelog/test_email.py: Test it.
* gcc-changelog/test_patches.txt: Add new patch.
---
 contrib/gcc-changelog/git_commit.py| 12 +---
 contrib/gcc-changelog/test_email.py|  4 
 contrib/gcc-changelog/test_patches.txt | 20 
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index ee1973371be..59f478670d7 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -308,7 +308,7 @@ class GitCommit:
 self.info = self.commit_to_info_hook(self.revert_commit)
 
 project_files = [f for f in self.info.modified_files

- if self.is_changelog_filename(f[0])
+ if self.is_changelog_filename(f[0], allow_suffix=True)
  or f[0] in misc_files]
 ignored_files = [f for f in self.info.modified_files
  if self.in_ignored_location(f[0])]
@@ -343,8 +343,14 @@ class GitCommit:
 return [x[0] for x in self.info.modified_files if x[1] == 'A']
 
 @classmethod

-def is_changelog_filename(cls, path):
-return path.endswith('/ChangeLog') or path == 'ChangeLog'
+def is_changelog_filename(cls, path, allow_suffix=False):
+basename = os.path.basename(path)
+if basename == 'ChangeLog':
+return True
+elif allow_suffix and basename.startswith('ChangeLog'):
+return True
+else:
+return False
 
 @classmethod

 def find_changelog_location(cls, name):
diff --git a/contrib/gcc-changelog/test_email.py 
b/contrib/gcc-changelog/test_email.py
index 5db56caef9e..532ed6a7983 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -404,3 +404,7 @@ class TestGccChangelog(unittest.TestCase):
 email = self.from_patch_glob('0001-Add-horse2.patch')
 assert not email.errors
 assert email.changelog_entries[0].files == ['koníček.txt']
+
+def test_modification_of_old_changelog(self):
+email = self.from_patch_glob('0001-fix-old-ChangeLog.patch')
+assert not email.errors
diff --git a/contrib/gcc-changelog/test_patches.txt 
b/contrib/gcc-changelog/test_patches.txt
index ffd13682d5c..6b75e488903 100644
--- a/contrib/gcc-changelog/test_patches.txt
+++ b/contrib/gcc-changelog/test_patches.txt
@@ -3398,4 +3398,24 @@ index 000..56c67f58752
 --
 2.29.2
 
+=== 0001-fix-old-ChangeLog.patch ===

+From fd498465b2801203089616be9a0e3c1f4fc065a0 Mon Sep 17 00:00:00 2001
+From: Martin Liska 
+Date: Wed, 13 Jan 2021 11:45:37 +0100
+Subject: [PATCH] Fix a changelog.
+
+---
+ gcc/ChangeLog-2020 | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/gcc/ChangeLog-2020 b/gcc/ChangeLog-2020
+index 6553720acad..2c170ef014a 100644
+--- a/gcc/ChangeLog-2020
++++ b/gcc/ChangeLog-2020
+@@ -1 +1,2 @@
+
++
+
+--
+2.29.2
 
--

2.29.2



[PATCH] tree-optimization/98640 - fix bogus sign-extension with VN

2021-01-13 Thread Richard Biener
VN tried to express a sign extension from int to long of
a truncated quantity with a plain conversion but that loses the
truncation.  Since there's no single operand doing truncate plus
sign extend (there was a proposed SEXT_EXPR to do that at some
point mapping to RTL sign_extract) don't bother to appropriately
model this with two ops (which the VN insert machinery doesn't
handle and which is unlikely to CSE fully).
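
To make the failure mode concrete, here is a small standalone sketch using
the constant from the new testcase.  The function names are invented for
this note, and the "wide" variant is only my reading of what reusing the
64-bit sum amounts to, not a dump of what VN produced.

```c
#include <assert.h>
#include <stdint.h>

/* What the source asks for: truncate to 32 bits, add 1, sign-extend.
   The addition is done in uint32_t to avoid signed overflow; converting
   the result to int32_t is implementation-defined for values above
   INT32_MAX (GCC wraps).  */
static int64_t
add_then_sign_extend (uint64_t x)
{
  int32_t sum = (int32_t) ((uint32_t) x + 1u);
  return (int64_t) sum;
}

/* What dropping the truncation amounts to: the plain 64-bit sum.  */
static uint64_t
wide_add (uint64_t x)
{
  return x + 1;
}

/* Usage with var_0 from pr98640.c: the two results differ.  */
static void
example_pr98640 (void)
{
  uint64_t var_0 = 18128133247277979402ULL;

  assert (add_then_sign_extend (var_0) == 888395531);
  assert (wide_add (var_0) == 18128133247277979403ULL);
}
```

The low 32 bits of var_0 are 888395530, so the expected value is 888395531,
which is what the testcase checks for.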

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-01-13  Richard Biener  

PR tree-optimization/98640
* tree-ssa-sccvn.c (visit_nary_op): Do not try to
handle plus or minus from a truncated operand to be
sign-extended.

* gcc.dg/torture/pr98640.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr98640.c | 22 ++
 gcc/tree-ssa-sccvn.c   | 15 +--
 2 files changed, 31 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr98640.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr98640.c 
b/gcc/testsuite/gcc.dg/torture/pr98640.c
new file mode 100644
index 000..b187781d614
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr98640.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+/* { dg-require-effective-target stdint_types } */
+
+#include <stdint.h>
+
+uint64_t var_0 = 18128133247277979402ULL;
+int64_t var_14 = 6557021550272328915LL;
+uint64_t var_83 = 10966786425750692026ULL;
+
+void test()
+{
+  var_14 = var_0 + (_Bool)7;
+  var_83 = 1 + (int)var_0; // 1 + 888395530
+}
+
+int main()
+{
+  test();
+  if (var_83 != 888395531)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 0ba846f0be2..588f1b82478 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -4681,7 +4681,7 @@ visit_copy (tree lhs, tree rhs)
is the same.  */
 
 static tree
-valueized_wider_op (tree wide_type, tree op)
+valueized_wider_op (tree wide_type, tree op, bool allow_truncate)
 {
   if (TREE_CODE (op) == SSA_NAME)
 op = vn_valueize (op);
@@ -4695,7 +4695,7 @@ valueized_wider_op (tree wide_type, tree op)
 return tem;
 
   /* Or the op is truncated from some existing value.  */
-  if (TREE_CODE (op) == SSA_NAME)
+  if (allow_truncate && TREE_CODE (op) == SSA_NAME)
 {
   gimple *def = SSA_NAME_DEF_STMT (op);
   if (is_gimple_assign (def)
@@ -4760,12 +4760,15 @@ visit_nary_op (tree lhs, gassign *stmt)
  || gimple_assign_rhs_code (def) == MULT_EXPR))
{
  tree ops[3] = {};
+ /* When requiring a sign-extension we cannot model a
+previous truncation with a single op so don't bother.  */
+ bool allow_truncate = TYPE_UNSIGNED (TREE_TYPE (rhs1));
  /* Either we have the op widened available.  */
- ops[0] = valueized_wider_op (type,
-  gimple_assign_rhs1 (def));
+ ops[0] = valueized_wider_op (type, gimple_assign_rhs1 (def),
+  allow_truncate);
  if (ops[0])
-   ops[1] = valueized_wider_op (type,
-gimple_assign_rhs2 (def));
+   ops[1] = valueized_wider_op (type, gimple_assign_rhs2 (def),
+allow_truncate);
  if (ops[0] && ops[1])
{
  ops[0] = vn_nary_op_lookup_pieces
-- 
2.26.2


Re: [PATCH] i386, expand: Optimize also 256-bit and 512-bit permutatations as vpmovzx if possible [PR95905]

2021-01-13 Thread Richard Biener
On Wed, 13 Jan 2021, Jakub Jelinek wrote:

> On Wed, Jan 13, 2021 at 08:26:49AM +0100, Richard Biener wrote:
> > +  if (op1 && op0 != op1)
> > +op1 = force_reg (vmode, op1);
> > 
> > code (presumably to handle RTX sharing here)?
> 
> That could be actually simplified, incrementally e.g. to:
>if (op0)
>  {
>rtx nop0 = force_reg (vmode, op0);
>if (op0 == op1)
>  op1 = nop0;
>op0 = nop0;
>  }
> -  if (op1 && op0 != op1)
> +  if (op1)
>  op1 = force_reg (vmode, op1);
> 
> (because the outer force_reg in force_reg (vmode, force_reg (vmode, X))
> just returns its argument).

I see.  Thanks for clarifying in the earlier mail - the non-x86
parts of the patch are OK.

Thanks,
Richard.


[PATCH] match.pd, v2: Fold (~X | C) ^ D into (X | C) ^ (~D ^ C) if (~D ^ C) can be simplified [PR96691]

2021-01-13 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 13, 2021 at 08:35:01AM +0100, Richard Biener wrote:
> I guess you could use
> 
>   (bit_xor (bit_ior @0 @1) (bit_xor! (bit_not! @2) @1))

I wasn't aware of !.
I had to put it under #if GIMPLE because ! doesn't seem to be implemented
for GENERIC, but otherwise it works fine.

Though, shouldn't I then add :c on bit_xor and bit_ior/bit_and?
The rationale for not adding them before was that constants are
canonicalized to go last, but if we simplify other things, they might not be
last anymore...

Ok for trunk if it passes another bootstrap/regtest?

2021-01-13  Jakub Jelinek  

PR tree-optimization/96691
* match.pd ((~X | C) ^ D -> (X | C) ^ (~D ^ C),
(~X & C) ^ D -> (X & C) ^ (D ^ C)): New simplifications if
(~D ^ C) or (D ^ C) can be simplified.

* gcc.dg/tree-ssa/pr96691.c: New test.

--- gcc/match.pd.jj 2021-01-13 08:01:58.197627154 +0100
+++ gcc/match.pd2021-01-13 10:25:32.327321918 +0100
@@ -947,8 +947,18 @@ (define_operator_list COND_TERNARY
  (bit_ior:c (bit_xor:c@3 @0 @1) (bit_xor:c (bit_xor:c @1 @2) @0))
  (bit_ior @3 @2))
 
-/* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0.  */
 #if GIMPLE
+/* (~X | C) ^ D -> (X | C) ^ (~D ^ C) if (~D ^ C) can be simplified.  */
+(simplify
+ (bit_xor (bit_ior:s (bit_not @0) @1) @2)
+  (bit_xor (bit_ior @0 @1) (bit_xor! (bit_not! @2) @1)))
+
+/* (~X & C) ^ D -> (X & C) ^ (D ^ C) if (D ^ C) can be simplified.  */
+(simplify
+ (bit_xor (bit_and:s (bit_not @0) @1) @2)
+  (bit_xor (bit_and @0 @1) (bit_xor! @2 @1)))
+
+/* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0.  */
 (simplify
  (bit_and (bit_not SSA_NAME@0) INTEGER_CST@1)
  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
--- gcc/testsuite/gcc.dg/tree-ssa/pr96691.c.jj  2021-01-13 10:16:45.828331557 
+0100
+++ gcc/testsuite/gcc.dg/tree-ssa/pr96691.c 2021-01-13 10:16:45.828331557 
+0100
@@ -0,0 +1,21 @@
+/* PR tree-optimization/96691 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times " \\\| 123;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\& 123;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\^ -315;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\\^ 314;" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \\\^ 321;" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " = ~" "optimized" } } */
+
+int
+foo (int x)
+{
+  return (~x | 123) ^ 321;
+}
+
+int
+bar (int x)
+{
+  return (~x & 123) ^ 321;
+}


Jakub
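
For anyone wanting to convince themselves of the two identities outside the
compiler, a minimal brute-force sketch follows.  The function names are
made up for this note; the constants 123 and 321 are the ones from the new
test.

```c
#include <assert.h>

/* (~X | C) ^ D and its folded form (X | C) ^ (~D ^ C).  */
static int or_orig (int x, int c, int d) { return (~x | c) ^ d; }
static int or_folded (int x, int c, int d) { return (x | c) ^ (~d ^ c); }

/* (~X & C) ^ D and its folded form (X & C) ^ (D ^ C).  */
static int and_orig (int x, int c, int d) { return (~x & c) ^ d; }
static int and_folded (int x, int c, int d) { return (x & c) ^ (d ^ c); }

/* Check the folded constants the test scans for, then compare both forms
   over a range of X values.  */
static void
check_identities (void)
{
  assert ((~321 ^ 123) == -315);
  assert ((321 ^ 123) == 314);
  for (int x = -1000; x <= 1000; x++)
    {
      assert (or_orig (x, 123, 321) == or_folded (x, 123, 321));
      assert (and_orig (x, 123, 321) == and_folded (x, 123, 321));
    }
}
```

Bit by bit: where C is 1 the ior form yields ~D on both sides, and where C
is 0 it yields ~X ^ D on both sides; the and form works out the same way
with D and ~X ^ D swapped between the two cases.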



Re: [PATCH] i386, expand: Optimize also 256-bit and 512-bit permutatations as vpmovzx if possible [PR95905]

2021-01-13 Thread Uros Bizjak via Gcc-patches
On Wed, Jan 13, 2021 at 8:13 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch implements what I've talked about, i.e. to no longer
> force operands of vec_perm_const into registers in the generic code, but let
> each of the (currently 8) targets force it into registers individually,
> giving the targets better control on if it does that and when and allowing
> them to do something special with some particular operands.
> And then defines the define_insn_and_split for the 256-bit and 512-bit
> permutations into vpmovzx* (only the bw, wd and dq cases, in theory we could
> add define_insn_and_split patterns also for the bd, bq and wq).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-01-13  Jakub Jelinek  
>
> PR target/95905
> * optabs.c (expand_vec_perm_const): Don't force v0 and v1 into
> registers before calling targetm.vectorize.vec_perm_const, only after
> that.
> * config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle
> two argument permutation when one operand is zero vector and only
> after that force operands into registers.
> * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1,
> *avx512bw_zero_extendv32qiv32hi2_1, *avx512f_zero_extendv16hiv16si2_1,
> *avx2_zero_extendv8hiv8si2_1, *avx512f_zero_extendv8siv8di2_1,
> *avx2_zero_extendv4siv4di2_1): New define_insn_and_split patterns.
> * config/mips/mips.c (mips_vectorize_vec_perm_const): Force operands
> into registers.
> * config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise.
> * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise.
> * config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise.
> * config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const): 
> Likewise.
> * config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const): Likewise.
> * config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise.  Use 
> std::swap.
>
> * gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of
> scan-assembler.  Add tests with zero vector as first __builtin_shuffle
> operand.
> * gcc.target/i386/pr95905-3.c: New test.
> * gcc.target/i386/pr95905-4.c: New test.

LGTM for x86 part.

Thanks,
Uros.
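
For readers following the thread, here is a minimal sketch in plain C (not GCC internals) of why a two-operand permutation with a zero vector can be matched as a zero extension, and why flipping the nelt bit of each selector element compensates for swapping the operands. The names perm2, swap_selector, zext_bw and interleave_is_zext, the 8-element vector length and the little-endian byte layout are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NELT 8  /* illustrative vector length, in byte elements */

/* Two-operand permutation: a selector value below NELT picks from
   op0, a value of NELT or more picks from op1.  */
static void
perm2 (uint8_t *dst, const uint8_t *op0, const uint8_t *op1,
       const unsigned *sel)
{
  for (unsigned i = 0; i < NELT; ++i)
    dst[i] = sel[i] < NELT ? op0[sel[i]] : op1[sel[i] - NELT];
}

/* Swapping the two operands is compensated by flipping the NELT bit
   of every selector element (NELT is a power of two here).  */
static void
swap_selector (unsigned *sel)
{
  for (unsigned i = 0; i < NELT; ++i)
    sel[i] ^= NELT;
}

/* Zero-extend the low NELT/2 bytes of SRC to 16-bit elements, stored
   in little-endian byte order.  */
static void
zext_bw (uint8_t *dst, const uint8_t *src)
{
  for (unsigned i = 0; i < NELT / 2; ++i)
    {
      dst[2 * i] = src[i];
      dst[2 * i + 1] = 0;
    }
}

/* Return 1 if interleaving X with a zero vector yields the same bytes
   as the zero extension, with the zero vector as either operand.  */
static int
interleave_is_zext (const uint8_t *x)
{
  static const uint8_t zero[NELT] = { 0 };
  unsigned sel[NELT] = { 0, 8, 1, 9, 2, 10, 3, 11 };
  uint8_t a[NELT], b[NELT], want[NELT];

  zext_bw (want, x);
  perm2 (a, x, zero, sel);      /* zero vector as second operand */
  swap_selector (sel);
  perm2 (b, zero, x, sel);      /* zero vector as first operand */
  return memcmp (a, want, NELT) == 0 && memcmp (b, want, NELT) == 0;
}
```

Calling interleave_is_zext on any 8-byte input should return 1. The real patch works on RTL and on wider vectors, so this only models the selector arithmetic.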

>
> --- gcc/optabs.c.jj 2021-01-04 10:25:38.632236100 +0100
> +++ gcc/optabs.c 2021-01-12 14:46:44.719557815 +0100
> @@ -6070,11 +6070,8 @@ expand_vec_perm_const (machine_mode mode
>
>if (targetm.vectorize.vec_perm_const != NULL)
>  {
> -  v0 = force_reg (mode, v0);
>if (single_arg_p)
> v1 = v0;
> -  else
> -   v1 = force_reg (mode, v1);
>
>if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices))
> return target;
> @@ -6095,6 +6092,11 @@ expand_vec_perm_const (machine_mode mode
> return gen_lowpart (mode, target_qi);
>  }
>
> +  v0 = force_reg (mode, v0);
> +  if (single_arg_p)
> +v1 = v0;
> +  v1 = force_reg (mode, v1);
> +
>/* Otherwise expand as a fully variable permuation.  */
>
>/* The optabs are only defined for selectors with the same width
> --- gcc/config/i386/i386-expand.c.jj 2021-01-12 11:01:51.189386077 +0100
> +++ gcc/config/i386/i386-expand.c   2021-01-12 15:43:55.673095807 +0100
> @@ -19929,6 +19929,33 @@ ix86_vectorize_vec_perm_const (machine_m
>
>two_args = canonicalize_perm ();
>
> +  /* If one of the operands is a zero vector, try to match pmovzx.  */
> +  if (two_args && (d.op0 == CONST0_RTX (vmode) || d.op1 == CONST0_RTX (vmode)))
> +{
> +  struct expand_vec_perm_d dzero = d;
> +  if (d.op0 == CONST0_RTX (vmode))
> +   {
> + d.op1 = dzero.op1 = force_reg (vmode, d.op1);
> + std::swap (dzero.op0, dzero.op1);
> + for (i = 0; i < nelt; ++i)
> +   dzero.perm[i] ^= nelt;
> +   }
> +  else
> +   d.op0 = dzero.op0 = force_reg (vmode, d.op0);
> +
> +  if (expand_vselect_vconcat (dzero.target, dzero.op0, dzero.op1,
> + dzero.perm, nelt, dzero.testing_p))
> +   return true;
> +}
> +
> +  /* Force operands into registers.  */
> +  rtx nop0 = force_reg (vmode, d.op0);
> +  if (d.op0 == d.op1)
> +d.op1 = nop0;
> +  d.op0 = nop0;
> +  if (d.op0 != d.op1)
> +d.op1 = force_reg (vmode, d.op1);
> +
>if (ix86_expand_vec_perm_const_1 ())
>  return true;
>
> --- gcc/config/i386/sse.md.jj   2021-01-12 14:30:32.688546846 +0100
> +++ gcc/config/i386/sse.md  2021-01-12 15:40:29.018402527 +0100
> @@ -17611,6 +17611,23 @@ (define_insn "avx2_v16qiv16hi2 (set_attr "prefix" "maybe_evex")
> (set_attr "mode" "OI")])
>
> +(define_insn_and_split "*avx2_zero_extendv16qiv16hi2_1"
> +  [(set (match_operand:V32QI 0 "register_operand" "=v")
> +   (vec_select:V32QI
> + (vec_concat:V64QI
> +   (match_operand:V32QI 1 "nonimmediate_operand" "vm")
> +   

Re: [PATCH] i386, expand: Optimize also 256-bit and 512-bit permutatations as vpmovzx if possible [PR95905]

2021-01-13 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 13, 2021 at 08:26:49AM +0100, Richard Biener wrote:
> +  if (op1 && op0 != op1)
> +op1 = force_reg (vmode, op1);
> 
> code (presumably to handle RTX sharing here)?

That could be actually simplified, incrementally e.g. to:
   if (op0)
 {
   rtx nop0 = force_reg (vmode, op0);
   if (op0 == op1)
 op1 = nop0;
   op0 = nop0;
 }
-  if (op1 && op0 != op1)
+  if (op1)
 op1 = force_reg (vmode, op1);

(because the outer force_reg in force_reg (vmode, force_reg (vmode, X))
just returns its argument).

Jakub
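
The simplification above can be sketched with a toy model in C. This is not GCC's force_reg; struct toy_rtx, toy_force_reg, force_operands and the fixed-size pool are hypothetical stand-ins that keep only the one property the argument relies on: forcing something that is already a register returns it unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an rtx: either a register or something else.  */
struct toy_rtx { int is_reg; int value; };

static struct toy_rtx pool[16];
static int pool_used;

/* Model of force_reg: a register is returned as is, anything else
   is copied into a fresh register.  */
static struct toy_rtx *
toy_force_reg (struct toy_rtx *x)
{
  if (x->is_reg)
    return x;
  assert (pool_used < 16);
  struct toy_rtx *r = &pool[pool_used++];
  r->is_reg = 1;
  r->value = x->value;
  return r;
}

/* The simplified form from the mail: op1 no longer needs the
   op0 != op1 check, because forcing an operand that is already a
   register is a no-op.  */
static void
force_operands (struct toy_rtx **op0, struct toy_rtx **op1)
{
  if (*op0)
    {
      struct toy_rtx *nop0 = toy_force_reg (*op0);
      if (*op0 == *op1)
        *op1 = nop0;            /* keep shared operands shared */
      *op0 = nop0;
    }
  if (*op1)
    *op1 = toy_force_reg (*op1);
}
```

With both operands pointing at the same non-register object, force_operands allocates a single register and both operands end up pointing at it, which is the sharing the original check was there to preserve.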



Re: [PATCH] i386: Add define_insn_and_split patterns for btrl [PR96938]

2021-01-13 Thread Uros Bizjak via Gcc-patches
On Wed, Jan 13, 2021 at 8:18 AM Jakub Jelinek  wrote:
>
> Hi!
>
> In the following testcase we only optimize f2 and f7 to btrl, although we
> should optimize that way all of the functions.  The problem is the type
> demotion/narrowing (which is performed solely during the generic folding and
> not later), without it we see the AND performed in SImode and match it as
> btrl, but with it while the shifts are still performed in SImode, the
> AND is already done in QImode or HImode low part of the shift.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-01-13  Jakub Jelinek  
>
> PR target/96938
> * config/i386/i386.md (*btr_1, *btr_2): New
> define_insn_and_split patterns.
> (splitter after *btr_2): New splitter.
>
> * gcc.target/i386/pr96938.c: New test.

OK with a small fix below.

Thanks,
Uros.
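
As a rough illustration of what the new patterns match, the following C sketch checks two facts the description relies on: rotating the constant -2 left by n gives a mask with only bit n clear, and doing the AND in 8 bits gives the same low byte as doing it in 32 bits and truncating. The helper names rotl32, btr_mask, clear_bit_narrow, clear_bit_wide and forms_agree are made up for this example and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by N; only the low 5 bits of N count.  */
static uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Mask with only bit N clear, written the way the RTL does it:
   rotate (const_int -2) by N.  */
static uint32_t
btr_mask (unsigned n)
{
  return rotl32 ((uint32_t) -2, n);
}

/* Clear bit O of a byte with the AND done in 8 bits ...  */
static uint8_t
clear_bit_narrow (uint8_t f, unsigned o)
{
  return f & (uint8_t) btr_mask (o);
}

/* ... and with the AND done in 32 bits, truncating afterwards.  */
static uint8_t
clear_bit_wide (uint8_t f, unsigned o)
{
  return (uint8_t) ((uint32_t) f & btr_mask (o));
}

/* Return 1 if both forms agree for every byte value and every bit
   position 0..31, and the rotate really equals ~(1 << o).  */
static int
forms_agree (void)
{
  for (unsigned o = 0; o < 32; ++o)
    {
      if (btr_mask (o) != (uint32_t) ~(UINT32_C (1) << o))
        return 0;
      for (unsigned f = 0; f < 256; ++f)
        if (clear_bit_narrow ((uint8_t) f, o)
            != clear_bit_wide ((uint8_t) f, o))
          return 0;
    }
  return 1;
}
```

This only shows that the narrow and wide forms compute the same value; whether the compiler actually emits btrl still depends on the patterns in the patch.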

>
> --- gcc/config/i386/i386.md.jj  2021-01-07 17:18:39.653487482 +0100
> +++ gcc/config/i386/i386.md 2021-01-12 19:01:37.286603961 +0100
> @@ -12419,6 +12419,70 @@ (define_insn_and_split "*btr_mask_
>  (match_dup 3)))
>(clobber (reg:CC FLAGS_REG))])])
>
> +(define_insn_and_split "*btr_1"
> +  [(set (match_operand:SWI12 0 "register_operand")
> +   (and:SWI12
> + (subreg:SWI12
> +   (rotate:SI (const_int -2)
> +  (match_operand:QI 2 "register_operand")) 0)
> + (match_operand:SWI12 1 "nonimmediate_operand")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_USE_BT && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(parallel
> + [(set (match_dup 0)
> +  (and:SI (rotate:SI (const_int -2) (match_dup 2))
> +  (match_dup 1)))
> +  (clobber (reg:CC FLAGS_REG))])]
> +{
> +  operands[0] = lowpart_subreg (SImode, operands[0], mode);
> +  if (MEM_P (operands[1]))
> +operands[1] = force_reg (mode, operands[1]);
> +  operands[1] = lowpart_subreg (SImode, operands[1], mode);
> +})
> +
> +(define_insn_and_split "*btr_2"
> +  [(set (zero_extract:HI
> + (match_operand:SWI12 0 "nonimmediate_operand")
> + (const_int 1)
> + (zero_extend:SI (match_operand:QI 1 "register_operand")))
> +   (const_int 0))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_USE_BT && ix86_pre_reload_split ()"
> +  "#"
> +  "&& MEM_P (operands[0])"
> +  [(set (match_dup 2) (match_dup 0))
> +   (parallel
> + [(set (match_dup 3) (and:SI (rotate:SI (const_int -2) (match_dup 1))

Please move (and ...) to a separate line.

> +(match_dup 4)))
> +  (clobber (reg:CC FLAGS_REG))])
> +   (set (match_dup 0) (match_dup 5))]
> +{
> +  operands[2] = gen_reg_rtx (mode);
> +  operands[5] = gen_reg_rtx (mode);
> +  operands[3] = lowpart_subreg (SImode, operands[5], mode);
> +  operands[4] = lowpart_subreg (SImode, operands[2], mode);
> +})
> +
> +(define_split
> +  [(set (zero_extract:HI
> + (match_operand:SWI12 0 "register_operand")
> + (const_int 1)
> + (zero_extend:SI (match_operand:QI 1 "register_operand")))
> +   (const_int 0))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_USE_BT && ix86_pre_reload_split ()"
> +  [(parallel
> + [(set (match_dup 0)
> +  (and:SI (rotate:SI (const_int -2) (match_dup 1))
> +  (match_dup 2)))
> +  (clobber (reg:CC FLAGS_REG))])]
> +{
> +  operands[2] = lowpart_subreg (SImode, operands[0], mode);
> +  operands[0] = lowpart_subreg (SImode, operands[0], mode);
> +})
> +
>  ;; These instructions are never faster than the corresponding
>  ;; and/ior/xor operations when using immediate operand, so with
>  ;; 32-bit there's no point.  But in 64-bit, we can't hold the
> --- gcc/testsuite/gcc.target/i386/pr96938.c.jj  2021-01-12 19:12:48.285023954 +0100
> +++ gcc/testsuite/gcc.target/i386/pr96938.c 2021-01-12 19:12:33.209194271 +0100
> @@ -0,0 +1,66 @@
> +/* PR target/96938 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -masm=att" } */
> +/* { dg-final { scan-assembler-times "\tbtrl\t" 10 } } */
> +
> +void
> +f1 (unsigned char *f, int o, unsigned char v)
> +{
> +  *f = (*f & ~(1 << o)) | (v << o);
> +}
> +
> +void
> +f2 (unsigned char *f, int o, unsigned char v)
> +{
> +  int t = *f & ~(1 << o);
> +  *f = t | (v << o);
> +}
> +
> +void
> +f3 (unsigned char *f, int o, unsigned char v)
> +{
> +  *f &= ~(1 << o);
> +}
> +
> +void
> +f4 (unsigned char *f, int o, unsigned char v)
> +{
> +  *f = (*f & ~(1 << (o & 31))) | v;
> +}
> +
> +void
> +f5 (unsigned char *f, int o, unsigned char v)
> +{
> +  *f = (*f & ~(1 << (o & 31))) | (v << (o & 31));
> +}
> +
> +void
> +f6 (unsigned short *f, int o, unsigned short v)
> +{
> +  *f = (*f & ~(1 << o)) | (v << o);
> +}
> +
> +void
> +f7 (unsigned short *f, int o, unsigned short v)
> +{
> +  int t = *f & ~(1 << o);
> +  *f = t | (v << o);
> +}
> +
> +void
> +f8 (unsigned short *f, int o, unsigned short v)
> +{
> +  *f &= ~(1 << o);
> +}
> +
> +void
> +f9 (unsigned short *f, int o, unsigned 

Re: [PATCH] if-to-switch: fix also virtual phis

2021-01-13 Thread Richard Biener via Gcc-patches
On Wed, Jan 13, 2021 at 9:30 AM Martin Liška  wrote:
>
> On 1/12/21 4:14 PM, Richard Biener wrote:
> > On Tue, Jan 12, 2021 at 3:50 PM Martin Liška  wrote:
> >>
> >> Hello.
> >>
> >> As seen in the PR, we need to fix also virtual PHIs, otherwise
> >> TODO_cfg will skip edges for a missing PHI argument.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > OK - doesn't this mean you can remove the
> >
> >mark_virtual_operands_for_renaming (fun);
> >
> > call and thus TODO_update_ssa?  Btw, the pass seems
> > to unconditionally schedule TODO_cleanup_cfg - it would
> > be nice to only do that (return TODO_cleanup_cfg from
> > pass_if_to_switch::execute) if it did any transform.
>
> Yes, applying all the suggestions works fine.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

Thanks,
Richard.
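
The suggestion to schedule the CFG cleanup only when the pass changed something can be sketched as follows. This is a toy model, assuming the pass manager combines the value returned from execute with the static todo_flags_finish; the TOY_ flag names and values and the function names are hypothetical and are not GCC's definitions.

```c
#include <assert.h>

/* Hypothetical flag bits, standing in for the real TODO_* flags.  */
enum
{
  TOY_TODO_CLEANUP_CFG = 1 << 0,
  TOY_TODO_UPDATE_SSA = 1 << 1
};

/* Work that is always scheduled after the pass.  */
static const unsigned toy_todo_flags_finish = TOY_TODO_UPDATE_SSA;

/* Model of the execute hook: ask for a CFG cleanup only when at
   least one candidate was transformed.  */
static unsigned
toy_execute (int n_candidates)
{
  if (n_candidates > 0)
    return TOY_TODO_CLEANUP_CFG;
  return 0;
}

/* Model of the pass manager: the static flags and the returned
   flags are OR-ed together.  */
static unsigned
toy_todo_after_pass (int n_candidates)
{
  return toy_todo_flags_finish | toy_execute (n_candidates);
}
```

With no candidates only the SSA update bit is set; with one or more, the cleanup bit is added as well.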

> Thanks,
> Martin
>
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >>  PR tree-optimization/98455
> >>  * gimple-if-to-switch.cc (condition_info::record_phi_mapping):
> >>  Record also virtual PHIs.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  PR tree-optimization/98455
> >>  * gcc.dg/tree-ssa/pr98455.c: New test.
> >> ---
> >>gcc/gimple-if-to-switch.cc  |  7 ++-
> >>gcc/testsuite/gcc.dg/tree-ssa/pr98455.c | 19 +++
> >>2 files changed, 21 insertions(+), 5 deletions(-)
> >>create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr98455.c
> >>
> >> diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc
> >> index 560753d0311..96213d86c28 100644
> >> --- a/gcc/gimple-if-to-switch.cc
> >> +++ b/gcc/gimple-if-to-switch.cc
> >> @@ -91,11 +91,8 @@ condition_info::record_phi_mapping (edge e, mapping_vec *vec)
> >>   gsi_next ())
> >>{
> >>  gphi *phi = gsi.phi ();
> >> -  if (!virtual_operand_p (gimple_phi_result (phi)))
> >> -   {
> >> - tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> >> - vec->safe_push (std::make_pair (phi, arg));
> >> -   }
> >> +  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> >> +  vec->safe_push (std::make_pair (phi, arg));
> >>}
> >>}
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c
> >> new file mode 100644
> >> index 000..24e249f6fcb
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c
> >> @@ -0,0 +1,19 @@
> >> +/* PR tree-optimization/98455 */
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O1 -fno-tree-dce --param case-values-threshold=1" } */
> >> +
> >> +void
> >> +n4 (int io, int vb)
> >> +{
> >> +  double uc[2] = { 1.0, 2.0, };
> >> +
> >> +  if (io == 0)
> >> +uc[0] = 0.0;
> >> +
> >> +  for (;;)
> >> +if (io == 0)
> >> +  if (vb == 0)
> >> +uc[0] = uc[1];
> >> +  else if (vb == 1)
> >> +uc[1] = 0.0;
> >> +}
> >> --
> >> 2.29.2
> >>
>


Re: [PATCH] if-to-switch: fix also virtual phis

2021-01-13 Thread Martin Liška

On 1/12/21 4:14 PM, Richard Biener wrote:
> On Tue, Jan 12, 2021 at 3:50 PM Martin Liška  wrote:
>>
>> Hello.
>>
>> As seen in the PR, we need to fix also virtual PHIs, otherwise
>> TODO_cfg will skip edges for a missing PHI argument.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>
> OK - doesn't this mean you can remove the
>
>    mark_virtual_operands_for_renaming (fun);
>
> call and thus TODO_update_ssa?  Btw, the pass seems
> to unconditionally schedule TODO_cleanup_cfg - it would
> be nice to only do that (return TODO_cleanup_cfg from
> pass_if_to_switch::execute) if it did any transform.

Yes, applying all the suggestions works fine.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

>
> Thanks,
> Richard.
>
>> Thanks,
>> Martin


From 69d356dea9a202b945288f07b1a9dcc88ea98fd6 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 12 Jan 2021 13:40:44 +0100
Subject: [PATCH] if-to-switch: fix also virtual phis

gcc/ChangeLog:

	PR tree-optimization/98455
	* gimple-if-to-switch.cc (condition_info::record_phi_mapping):
	Record also virtual PHIs.
	(pass_if_to_switch::execute): Return TODO_cleanup_cfg only
	conditionally.

gcc/testsuite/ChangeLog:

	PR tree-optimization/98455
	* gcc.dg/tree-ssa/pr98455.c: New test.
---
 gcc/gimple-if-to-switch.cc  | 11 ---
 gcc/testsuite/gcc.dg/tree-ssa/pr98455.c | 19 +++
 2 files changed, 23 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr98455.c

diff --git a/gcc/gimple-if-to-switch.cc b/gcc/gimple-if-to-switch.cc
index 560753d0311..1712fc4c8b3 100644
--- a/gcc/gimple-if-to-switch.cc
+++ b/gcc/gimple-if-to-switch.cc
@@ -91,11 +91,8 @@ condition_info::record_phi_mapping (edge e, mapping_vec *vec)
gsi_next ())
 {
   gphi *phi = gsi.phi ();
-  if (!virtual_operand_p (gimple_phi_result (phi)))
-	{
-	  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
-	  vec->safe_push (std::make_pair (phi, arg));
-	}
+  tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+  vec->safe_push (std::make_pair (phi, arg));
 }
 }
 
@@ -470,7 +467,7 @@ const pass_data pass_data_if_to_switch =
   0, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_cleanup_cfg | TODO_update_ssa /* todo_flags_finish */
+  TODO_update_ssa /* todo_flags_finish */
 };
 
 class pass_if_to_switch : public gimple_opt_pass
@@ -575,7 +572,7 @@ pass_if_to_switch::execute (function *fun)
   if (!all_candidates.is_empty ())
 {
   free_dominance_info (CDI_DOMINATORS);
-  mark_virtual_operands_for_renaming (fun);
+  return TODO_cleanup_cfg;
 }
 
   return 0;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c
new file mode 100644
index 000..24e249f6fcb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr98455.c
@@ -0,0 +1,19 @@
+/* PR tree-optimization/98455 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-dce --param case-values-threshold=1" } */
+
+void
+n4 (int io, int vb)
+{
+  double uc[2] = { 1.0, 2.0, };
+
+  if (io == 0)
+uc[0] = 0.0;
+
+  for (;;)
+if (io == 0)
+  if (vb == 0)
+uc[0] = uc[1];
+  else if 

[PATCH][pushed] ipa: remove a dead code

2021-01-13 Thread Martin Liška

Hello.

As mentioned in the PR, the code is dead because changed is set to true just before it.
The corresponding newline is properly printed at line 1204.

I'm going to push it as obvious.
Martin

gcc/ChangeLog:

PR ipa/98652
* cgraphunit.c (analyze_functions): Remove dead code.
---
 gcc/cgraphunit.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index cf64e56ab95..b401f0817a3 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1191,8 +1191,6 @@ analyze_functions (bool first_time)
  changed = true;
  if (symtab->dump_file)
fprintf (symtab->dump_file, " %s", node->dump_asm_name ());
- if (!changed && symtab->dump_file)
-   fprintf (symtab->dump_file, "\n");
}
  if (node == first_analyzed
  || node == first_analyzed_var)
--
2.29.2
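
A small stand-alone C sketch of why the removed condition could never fire: the flag is stored as true immediately before it is tested for false. The function name count_newlines_printed and its parameters are invented for illustration and only mimic the shape of the loop in analyze_functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Count how often the "!changed && dumping" branch is taken when
   changed is unconditionally set to true right before the test.  */
static int
count_newlines_printed (int n_nodes, bool dumping)
{
  int newlines = 0;
  for (int i = 0; i < n_nodes; ++i)
    {
      bool changed = false;
      changed = true;                   /* unconditional store ...  */
      if (!changed && dumping)          /* ... so this is never true */
        ++newlines;
    }
  return newlines;
}
```

For any number of nodes, with dumping on or off, the function returns 0, so dropping the test does not change behaviour.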