Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-08-29 Thread Uros Bizjak
On Fri, Aug 30, 2019 at 2:08 AM Hongtao Liu  wrote:
>
> On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak  wrote:
> >
> > 2019-08-28  Uroš Bizjak  
> >
> > * config/i386/i386.c (ix86_register_move_cost): Do not
> > limit the cost of moves to/from XMM register to minimum 8.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Actually committed as r274994 with the wrong ChangeLog.
> >
> > Uros.
>
> There is an 11% regression in 548.exchange_r of SPEC2017.
>
> Reason for the regression:
> For 548.exchange_r, a lot of movements between gpr and xmm are
> generated as expected,
> and it reduced clockticks by 3%.

This is OK, and expected from the patch.

> However, perhaps because too many xmm registers are used,
> a frequency reduction issue is triggered (average frequency reduced by 13%).
> So in total it takes more time.

This is a secondary effect that is currently not modelled by the compiler.

However, I expected that SSE <-> int moves in x86-tune-costs.h will
have to be retuned. Up to now, both directions were limited to a minimum
of 8, so any value lower than 8 was ignored. The minimum was set to
work around a certain limitation in reload, and it is not needed anymore.

You can simply set the values of SSE <-> int moves to 8 (which is an
arbitrary value!) to restore the previous behaviour, but I think that
a more precise cost value should be determined, probably a different
one for each direction. But until register pressure effects are
modelled, any artificially higher value will represent a workaround
and not the true reg-reg move cost.
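
To make the effect of the old clamp concrete, here is a minimal sketch in C.
It is not the actual ix86_register_move_cost code; the helper names and the
sample values are assumptions made purely for illustration.

```c
/* Illustrative only: hypothetical helpers, not GCC's
   ix86_register_move_cost.  */

/* Old behaviour: an SSE <-> int move cost taken from the tuning table
   was never allowed to drop below 8.  */
static int
old_sse_int_move_cost (int table_cost)
{
  return table_cost < 8 ? 8 : table_cost;
}

/* New behaviour: the tuning table value is used as-is.  */
static int
new_sse_int_move_cost (int table_cost)
{
  return table_cost;
}
```

With a table value of 6, the old helper yields 8 while the new one yields 6,
so table entries below 8 only start to influence code generation now.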

Uros.


[PATCH] use fallback location for warning (PR 91599)

2019-08-29 Thread Martin Sebor

warning_at() calls with the %G directive rely on the gimple statement
for both their location and the inlining context.  When the statement
is not associated with a location, the warning doesn't point at any
line even if the location argument passed to the call was valid.
The attached patch changes the percent_G_percent handler to fall back
on the provided location in that case, and the recently added warning
for char assignments to pass to the function a fallback location if
the statement doesn't have one.
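
The fallback rule described above can be sketched in isolation as follows.
This is only an illustration: loc_t and UNKNOWN_LOC are stand-ins I made up
for GCC's location_t and UNKNOWN_LOCATION, and the helper name is
hypothetical.

```c
/* Illustrative only: loc_t and UNKNOWN_LOC stand in for GCC's
   location_t and UNKNOWN_LOCATION; they are not the real definitions.  */
typedef unsigned int loc_t;
#define UNKNOWN_LOC 0u

/* Prefer the statement's own location; use the caller-provided one
   only when the statement has none.  */
static loc_t
location_or_fallback (loc_t stmt_loc, loc_t fallback_loc)
{
  return stmt_loc != UNKNOWN_LOC ? stmt_loc : fallback_loc;
}
```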

Tested on x86_64-linux.

Martin
PR middle-end/91599 - GCC does not say where warning is happening

gcc/ChangeLog:

	PR middle-end/91599
	* tree-ssa-strlen.c (handle_store): Use a fallback location if
	the statement doesn't have one.
	* gimple-pretty-print.c (percent_G_format): Same.

gcc/testsuite/ChangeLog:

	PR middle-end/91599
	* gcc.dg/Wstringop-overflow-16.c: New test.

Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c	(revision 275047)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -4036,7 +4036,12 @@ handle_store (gimple_stmt_iterator *gsi)
 	  if (tree dstsize = compute_objsize (lhs, 1, &decl))
 	if (compare_tree_int (dstsize, lenrange[2]) < 0)
 	  {
+		/* Fall back on the LHS location if the statement
+		   doesn't have one.  */
 		location_t loc = gimple_nonartificial_location (stmt);
+		if (loc == UNKNOWN_LOCATION)
+		  loc = tree_nonartificial_location (lhs);
+		loc = expansion_point_location_if_in_system_header (loc);
 		if (warning_n (loc, OPT_Wstringop_overflow_,
 			   lenrange[2],
 			   "%Gwriting %u byte into a region of size %E",
Index: gcc/gimple-pretty-print.c
===
--- gcc/gimple-pretty-print.c	(revision 275047)
+++ gcc/gimple-pretty-print.c	(working copy)
@@ -3034,8 +3034,12 @@ percent_G_format (text_info *text)
 {
   gimple *stmt = va_arg (*text->args_ptr, gimple*);
 
+  /* Fall back on the rich location if the statement doesn't have one.  */
+  location_t loc = gimple_location (stmt);
+  if (loc == UNKNOWN_LOCATION)
+    loc = text->m_richloc->get_loc ();
   tree block = gimple_block (stmt);
-  percent_K_format (text, gimple_location (stmt), block);
+  percent_K_format (text, loc, block);
 }
 
 #if __GNUC__ >= 10
Index: gcc/testsuite/gcc.dg/Wstringop-overflow-16.c
===
--- gcc/testsuite/gcc.dg/Wstringop-overflow-16.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/Wstringop-overflow-16.c	(working copy)
@@ -0,0 +1,21 @@
+/* PR middle-end/91599 - GCC does not say where warning is happening
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+struct charseq {
+  unsigned char bytes[0]; // { dg-message "object declared here" }
+};
+
+struct locale_ctype_t {
+  struct charseq *mboutdigits[10];
+};
+
+void ctype_finish (struct locale_ctype_t *ctype)
+{
+  long unsigned int cnt;
+  for (cnt = 0; cnt < 20; ++cnt) {
+    static struct charseq replace[2];
+    replace[0].bytes[1] = '\0';   // { dg-warning "\\\[-Wstringop-overflow" }
+    ctype->mboutdigits[cnt] = &replace[0];
+  }
+}


Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-08-29 Thread Hongtao Liu
On Fri, Aug 30, 2019 at 8:10 AM Hongtao Liu  wrote:
>
> On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak  wrote:
> >
> > 2019-08-28  Uroš Bizjak  
> >
> > * config/i386/i386.c (ix86_register_move_cost): Do not
> > limit the cost of moves to/from XMM register to minimum 8.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Actually committed as r274994 with the wrong ChangeLog.
> >
> > Uros.
>
> There is an 11% regression in 548.exchange_r of SPEC2017.
>
> Reason for the regression:
> For 548.exchange_r, a lot of movements between gpr and xmm are
> generated as expected,
> and it reduced clockticks by 3%.
> However, perhaps because too many xmm registers are used,
> a frequency reduction issue is triggered (average frequency reduced by 13%).
> So in total it takes more time.
>
>
>
> --
> BR,
> Hongtao

Tested on skylake workstation.

-- 
BR,
Hongtao


Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-08-29 Thread Hongtao Liu
On Fri, Aug 30, 2019 at 2:09 AM Uros Bizjak  wrote:
>
> 2019-08-28  Uroš Bizjak  
>
> * config/i386/i386.c (ix86_register_move_cost): Do not
> limit the cost of moves to/from XMM register to minimum 8.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Actually committed as r274994 with the wrong ChangeLog.
>
> Uros.

There is an 11% regression in 548.exchange_r of SPEC2017.

Reason for the regression:
For 548.exchange_r, a lot of movements between gpr and xmm are
generated as expected,
and it reduced clockticks by 3%.
However, perhaps because too many xmm registers are used,
a frequency reduction issue is triggered (average frequency reduced by 13%).
So in total it takes more time.



-- 
BR,
Hongtao


Re: [PATCH] Couple of debug dump improvements to scheduler (no code-gen changes)

2019-08-29 Thread Jeff Law
On 8/29/19 9:44 AM, Maxim Kuvyrkov wrote:
> Hi,
> 
> The first patch adds ranking statistics for autoprefetcher heuristic.
> 
> The second one makes it easier to diff scheduler debug dumps by adding more 
> context lines for diff at clock increments.
> 
> OK to commit?
OK for both.
jeff


Re: [PATCH, V3, #3 of 10], Add prefixed RTL insn attribute

2019-08-29 Thread Segher Boessenkool
Hi Mike,

On Mon, Aug 26, 2019 at 04:31:02PM -0400, Michael Meissner wrote:
>   (rs6000_asm_output_opcode): New function for prifixed memory.

Typo.  Just say "New." or "New function." please.

> --- gcc/config/rs6000/rs6000.c(revision 274871)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -13827,23 +13827,23 @@ addr_mask_to_trad_insn (machine_mode mod
>   early RTL stages before register allocation has been done.  */
>if ((addr_mask & flags) == RELOAD_REG_MULTIPLE)
>  {
> -  machine_mode inner = word_mode;
> +  machine_mode mode2 = mode;

So what is "mode2" for?  A meaningful name and/or some comments would help.

> +   if ((reg_addr[E_DFmode].default_addr_mask & RELOAD_REG_OFFSET) != 0)
> + mode = DFmode;

(Don't use E_ if you do not need it -- i.e. most of the time).

> +/* Helper function to take a REG and a MODE and turn it into the traditional
> +   instruction format (D/DS/DQ) used for offset memory.  */

Is this the form of the preferred insn to do this?  Or the minimum required
to do it at all?  Something else?

> +  /* If it isn't a register, use the defaults.  */
> +  if (!REG_P (reg) && !SUBREG_P (reg))
> +addr_mask = reg_addr[mode].default_addr_mask;
> +
> +  else
> +{
> +  unsigned int r = reg_or_subregno (reg);

This ICEs if it is a subreg of something else than a reg.

You can just start with

  if (SUBREG_P (reg))
reg = SUBREG_REG (reg);

  if (REG_P (reg))
   ... etc.

> +/* Whether a load instruction is a prefixed instruction.  This is called from
> +   the prefixed attribute processing.  */
> +
> +bool
> +prefixed_load_p (rtx_insn *insn)
> +{
> +  /* Validate the insn to make sure it is a normal load insn.  */
> +  extract_insn_cached (insn);
> +  if (recog_data.n_operands < 2)
> +return false;

Why don't you handle this the same way "indexed" and "update" are already
handled?  That is *easy* and it *works*, it trivially verifiably works.
It also doesn't care whether something is a load or a store.  You hardcode
the few exceptions (okay, twenty or whatever update insns -- but all are
similar, so that is easy), and everything else just works.

The way you code it you just hope to exclude all of the exceptions,
instead of handling them directly.

> +void
> +rs6000_asm_output_opcode (FILE *stream)
> +{
> +  if (next_insn_prefixed_p)
> +fputc ('p', stream);
> +
> +  return;
> +}

You can just write fprintf fwiw, the compiler can optimise it for you
just fine.

> +#define ASM_OUTPUT_OPCODE(STREAM, OPCODE)\
> +  do \
> +{ \
> + if (TARGET_PREFIXED_ADDR) \
> +   rs6000_asm_output_opcode (STREAM);\
> +} \
> +  while (0)

(Indentation of the "if" is weird?)

> +;; Whether an insn is a prefixed insn, and an initial 'p' should be printed
> +;; before the instruction.  A prefixed instruction has a prefix instruction

Whether it is a prefixed insn, period.

> +;; word that extends the immediate value of the instructions from 12-16 bits to
> +;; 34 bits.  The macro ASM_OUTPUT_OPCODE emits a leading 'p' for prefixed
> +;; insns.  The default "length" attribute will also be adjusted by default to
> +;; be 12 bytes.

Don't say all the effects here, say that where you make it happen?

> +;; Length in bytes of instructions that use prefixed addressing and length in
> +;; bytes of instructions that does not use prefixed addressing.  This allows
> +;; both lengths to be defined as constants, and the length attribute can pick
> +;; the size as appropriate.
> +(define_attr "prefixed_length" "" (const_int 12))
> +(define_attr "non_prefixed_length" "" (const_int 4))

Do you mean a define_insn can override either to something else?  Then
say that, please?

> +;; Length of the instruction (in bytes).  Prefixed insns are 8 bytes, but the
> +;; assembler might issue need to issue a NOP so that the prefixed instruction
> +;; does not cross a cache boundary, which makes them possibly 12 bytes.

s/issue //

> @@ -9883,8 +9926,8 @@ (define_insn "pcrel_local_addr"
>[(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
>   (match_operand:DI 1 "pcrel_local_address"))]
>"TARGET_PCREL"
> -  "pla %0,%a1"
> -  [(set_attr "length" "12")])
> +  "la %0,%a1"
> +  [(set_attr "prefixed" "yes")])

And just like this you can set the few insns that do not have operands 0
and 1 as source and dest to "no", exactly like is already done for "update"
and "indexed".


Segher


[PATCH] correct MEM_REF bounds checking of arrays (PR 91584)

2019-08-29 Thread Martin Sebor

The -Warray-bounds enhancement I added to GCC 9 causes false
positives in languages like Fortran whose first array element
is at a non-zero index.  The attached patch has the function
responsible for the warning normalize the array bounds to
always start at zero to avoid these false positives.
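
As a rough illustration of the normalization, here is a small C sketch.
Plain integers stand in for GCC's offset_int, the struct and helper names
are hypothetical, the half-open bounds check is a simplification of what
check_mem_ref does, and the one-based 1..256 domain is an assumption about
how a Fortran character(256) object might be represented.

```c
/* Illustrative only: plain integers stand in for GCC's offset_int.  */
struct byte_bounds
{
  long lo;	/* First valid byte offset.  */
  long hi;	/* One past the last valid byte offset.  */
};

/* Before the patch: bounds scaled directly from the type's domain.  */
static struct byte_bounds
bounds_from_domain (long dom_min, long dom_max, long eltsize)
{
  struct byte_bounds b = { dom_min * eltsize, (dom_max + 1) * eltsize };
  return b;
}

/* After the patch: bounds normalized to start at zero.  */
static struct byte_bounds
bounds_normalized (long dom_min, long dom_max, long eltsize)
{
  struct byte_bounds b = { 0, (dom_max - dom_min + 1) * eltsize };
  return b;
}

/* Simplified check: is byte offset OFF inside [lo, hi)?  */
static int
offset_in_bounds (struct byte_bounds b, long off)
{
  return off >= b.lo && off < b.hi;
}
```

For a domain of 1..256 with one-byte elements, the unnormalized bounds
exclude byte offset 0, which is the kind of false positive described above,
while the normalized bounds accept it.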

Tested on x86_64-linux.

Martin
PR middle-end/91584 - Bogus warning from -Warray-bounds during string assignment

gcc/ChangeLog:

	PR middle-end/91584
	* tree-vrp.c (vrp_prop::check_mem_ref): Normalize type domain bounds
	before using them to validate MEM_REF offset.

gcc/testsuite/ChangeLog:
	* gfortran.dg/char_array_constructor_4.f90: New test.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c	(revision 275047)
+++ gcc/tree-vrp.c	(working copy)
@@ -4703,31 +4703,23 @@ vrp_prop::check_mem_ref (location_t location, tree
   || RECORD_OR_UNION_TYPE_P (reftype))
 return false;
 
+  arrbounds[0] = 0;
+
   offset_int eltsize;
   if (TREE_CODE (reftype) == ARRAY_TYPE)
 {
   eltsize = wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (reftype)));
-
   if (tree dom = TYPE_DOMAIN (reftype))
 	{
 	  tree bnds[] = { TYPE_MIN_VALUE (dom), TYPE_MAX_VALUE (dom) };
-	  if (array_at_struct_end_p (arg)
-	  || !bnds[0] || !bnds[1])
-	{
-	  arrbounds[0] = 0;
-	  arrbounds[1] = wi::lrshift (maxobjsize, wi::floor_log2 (eltsize));
-	}
+	  if (array_at_struct_end_p (arg) || !bnds[0] || !bnds[1])
+	arrbounds[1] = wi::lrshift (maxobjsize, wi::floor_log2 (eltsize));
 	  else
-	{
-	  arrbounds[0] = wi::to_offset (bnds[0]) * eltsize;
-	  arrbounds[1] = (wi::to_offset (bnds[1]) + 1) * eltsize;
-	}
+	arrbounds[1] = (wi::to_offset (bnds[1]) - wi::to_offset (bnds[0])
+			+ 1) * eltsize;
 	}
   else
-	{
-	  arrbounds[0] = 0;
-	  arrbounds[1] = wi::lrshift (maxobjsize, wi::floor_log2 (eltsize));
-	}
+	arrbounds[1] = wi::lrshift (maxobjsize, wi::floor_log2 (eltsize));
 
   if (TREE_CODE (ref) == MEM_REF)
 	{
@@ -4742,7 +4734,6 @@ vrp_prop::check_mem_ref (location_t location, tree
   else
 {
   eltsize = 1;
-  arrbounds[0] = 0;
   arrbounds[1] = wi::to_offset (TYPE_SIZE_UNIT (reftype));
 }
 
Index: gcc/testsuite/gfortran.dg/char_array_constructor_4.f90
===
--- gcc/testsuite/gfortran.dg/char_array_constructor_4.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/char_array_constructor_4.f90	(working copy)
@@ -0,0 +1,13 @@
+! PR 30319 - Bogus warning from -Warray-bounds during string assignment
+! { dg-do compile }
+! { dg-options "-O2 -Warray-bounds" }
+
+program test_bounds
+
+  character(256) :: foo
+
+  foo = '1234' ! { dg-bogus "\\\[-Warray-bounds" }
+
+  print *, foo
+
+end program test_bounds


Re: [PATCH V3, #1 of 10], Add basic pc-relative support

2019-08-29 Thread Segher Boessenkool
On Wed, Aug 28, 2019 at 05:26:55PM -0400, Michael Meissner wrote:
> On Wed, Aug 28, 2019 at 12:14:58PM -0500, Segher Boessenkool wrote:
> > > +/* Enumeration giving the type of traditional addressing that would be used to
> > > +   decide whether an instruction uses prefixed memory or not.  If the
> > > +   traditional instruction uses the DS instruction format, and the bottom 2
> > > +   bits of the offset are not 0, the traditional instruction cannot be used,
> > > +   but a prefixed instruction can be used.  */
> > 
> > "Traditional" is a bad word for documentation.  What you mean is what was
> > supported before.  Before you know it "new" will be old as well.
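
The rule in the quoted comment can be sketched in C as below. This is not
code from the patch: the helper names are hypothetical, and the sketch
assumes a DS-form displacement is a signed 16-bit value whose bottom two
bits must be zero, and that a prefixed instruction accepts a signed 34-bit
displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if V fits in a signed field of BITS bits.  */
static bool
fits_signed (int64_t v, unsigned bits)
{
  int64_t limit = (int64_t) 1 << (bits - 1);
  return v >= -limit && v < limit;
}

/* Assumed DS-form rule: 16-bit signed offset, bottom 2 bits zero.  */
static bool
ds_form_offset_ok (int64_t offset)
{
  return fits_signed (offset, 16) && (offset & 3) == 0;
}

/* A prefixed insn is needed when DS-form cannot encode the offset
   but the (assumed) 34-bit prefixed displacement can.  */
static bool
needs_prefixed_insn (int64_t offset)
{
  return !ds_form_offset_ok (offset) && fits_signed (offset, 34);
}
```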
> 
> Yeah, yeah, yeah.  I recall in Amsterdam there is the "Oude Kerk" (old church)
> built in the 1200's and the "De Nieuwe Kerk" in Amsterdam (built in the 1500's)
> and thinking then of the problems of calling something "new" and "old".

:-)

> > Can you fix this struct / arrays / whatever, instead of adding more to it?

> > And these "address masks" are bitmaps of random flags, one for each
> > "register class" (which is not related to the core GCC concept of "register
> > class", and the bits are called "RELOAD_REG_*" although this isn't for
> > reload at all?
> 
> Actually no, they were created explicitly for the secondary reload handler when
> I wrote this interface to add VSX support.

This is not just for reload anymore, so please don't name it that.  Renaming
things isn't hard, this isn't a public API or anything :-)

> > > +   if ((addr_mask & quad_flags) == RELOAD_REG_OFFSET
> > > +   && ((rc == RELOAD_REG_GPR && msize >= 8 && TARGET_POWERPC64)
> > > +   || (rc == RELOAD_REG_VMX)))
> > > + addr_mask |= RELOAD_REG_DS_OFFSET;
> > > +
> > > reg_addr[m].addr_mask[rc] = addr_mask;
> > > -   any_addr_mask |= addr_mask;
> > > +   any_addr_mask |= (addr_mask & ~RELOAD_REG_AND_M16);
> > 
> > Why do you need this last line?  Why was that flag set at all?  What does
> > "any mask" mean if it is not?
> 
> The flag is set to say this register class allows the funky (reg + reg) & -16
> addressing used with the original Altivec instructions.

No, I understand that, but why was it set in some individual mask if you
need to clear it in the "any" mask?

> > > @@ -10770,11 +10855,10 @@ rs6000_secondary_reload_memory (rtx addr
> > >& ~RELOAD_REG_AND_M16);
> > >  
> > >/* If the register allocator hasn't made up its mind yet on the register
> > > - class to use, settle on defaults to use.  */
> > > + class to use, use the default address mask bits.  */
> > >else if (rclass == NO_REGS)
> > 
> > And this *does* mean register class.
> 
> No, in the context of the code, it means reload register class.

rclass is a register class.  NO_REGS is a register class.  "rc" isn't.

> The whole
> point is to reduce all of the normal register classes just to the 3 hardware
> register types.

Yes, so don't call it register class.  Don't use the same word for two
different things, esp. when one is used all over the place already.

> > I think this would all be much simpler with just a few lines of code instead
> > of all these tables, fwiw.

That's the core of most of this.  All this precomputation is indirection
that makes things really hard to understand.

And a lot of the more problematic code is the *older* code.  If you improve
that first -- *first*, that is what the earlier patches in a series are
for -- then this all will be much much easier to read and understand and
review and comment on and accept.

> > > +;; Load up a pc-relative address.  Print_operand_address will append a @pcrel
> > > +;; to the symbol or label.
> > > +(define_insn "pcrel_local_addr"
> > 
> > This isn't used anywhere?  Not by name, that is?
> 
> Yes it is used in rs6000_emit_move.

Not in this patch though?

> > > +  [(set (match_operand:DI 0 "gpc_reg_operand" "=b*r")
> > > + (match_operand:DI 1 "pcrel_local_address"))]
> > > +  "TARGET_PCREL"
> > > +  "pla %0,%a1"
> > > +  [(set_attr "length" "12")])
> > 
> > I wonder if that whole "b*r" thing is useful at all these days, btw.
> 
> Yep.

You mean it is useful?  Or you question it too?

> > This patch changes a whole bunch of things.  You probably can split it
> > into smaller, self-contained pieces.
> 
> Not really, but I will try.  However, then of course you have the issue that a
> particular patch creates a function that isn't used for a few patches, and you
> have to look at several patches all at once.

No, not if you divide things properly.  You *never* need to introduce more
than one thing at once, if they all are unused!

Multiple concepts in one patch is a LOT of work to review.  It is MUCH
less work to review 50 focused patches than to review just 5 doing the
same, even if those 50 make up twice as many lines of patch total.


Segher


Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment

2019-08-29 Thread Bernd Edlinger
On 8/29/19 11:08 AM, Christophe Lyon wrote:
> On Thu, 29 Aug 2019 at 10:58, Kyrill Tkachov
>  wrote:
>>
>> Hi Bernd,
>>
>> On 8/28/19 10:36 PM, Bernd Edlinger wrote:
>>> On 8/28/19 2:07 PM, Christophe Lyon wrote:
>>>> Hi,
>>>>
>>>> This patch causes an ICE when building libgcc's unwind-arm.o
>>>> when configuring GCC:
>>>> --target  arm-none-linux-gnueabihf --with-mode thumb --with-cpu
>>>> cortex-a15 --with-fpu neon-vfpv4:
>>>>
>>>> The build works for the same target, but --with-mode arm --with-cpu
>>>> cortex a9 --with-fpu vfp
>>>>
>>>> In file included from
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
>>>> In function 'get_eit_entry':
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
>>>> warning: cast discards 'const' qualifier from pointer target type
>>>> [-Wcast-qual]
>>>>   245 |   ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
>>>>       | ^
>>>> during RTL pass: expand
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
>>>> In function 'unwind_phase2_forced':
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
>>>> internal compiler error: in gen_movdi, at config/arm/arm.md:5235
>>>>   319 |   saved_vrs.core = entry_vrs->core;
>>>>       |   ~~~^
>>>> 0x126530f gen_movdi(rtx_def*, rtx_def*)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
>>>> 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
>>>> 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
>>>> 0x897083 emit_move_insn(rtx_def*, rtx_def*)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
>>>> 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
>>>> 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
>>>> 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
>>>> 0x89ba1e emit_block_move_via_cpymem
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
>>>> 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
>>>> block_op_methods, unsigned int, long, unsigned long, unsigned long,
>>>> unsigned long, bool, bool*)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
>>>> 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
>>>> 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
>>>> 0x88c1f9 store_field
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
>>>> 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
>>>> 0x761964 expand_gimple_stmt_1
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
>>>> 0x761964 expand_gimple_stmt
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
>>>> 0x768583 expand_gimple_basic_block
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
>>>> 0x76abc6 execute
>>>>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538
>>>>
>>>> Christophe

>>> Okay, sorry for the breakage.
>>>
>>> What is happening in gen_cpymem_ldrd_strd is of course against the rules:
>>>
>>> It uses emit_move_insn on only 4-byte aligned DI-mode memory operands.
>>>
>>> I have a patch for this, which is able to fix the libgcc build on a cross,
>>> but have no possibility to bootstrap the affected target.
>>>
>>> Could you please help?
>>
>> Well it's good that the sanitisation is catching the bugs!
>>

Yes, more than expected, though ;)

>> Bootstrapping this patch I get another assert with the backtrace:
> 
> Thanks for the additional testing, Kyrill!
> 
> FWIW, my original report was with a failure to just build GCC for
> cortex-a15. I later got the reports of testing cross-toolchains, and
> saw other problems on cortex-a9 for instance.
> But I guess, you have noticed them with yo

Re: C++ PATCH for P1152R4: Deprecating some uses of volatile (PR c++/91361)

2019-08-29 Thread Martin Sebor

On 8/28/19 5:56 PM, Marek Polacek wrote:

--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -3516,6 +3516,19 @@ result in a call to @code{terminate}.
  Disable the warning about the case when a conversion function converts an
  object to the same type, to a base class of that type, or to void; such
  a conversion function will never be called.
+
+@item -Wvolatile @r{(C++ and Objective-C++ only)}
+@opindex Wvolatile
+@opindex Wno-volatile
+Warn about deprecated uses of the volatile qualifier.  This includes postfix
+and prefix @code{++} and @code{--} expressions of volatile-qualified types,
+using simple assignments where the left operand is a volatile-qualified
+non-class type for their value, compound assignments where the left operand
+is a volatile-qualified non-class type, volatile-qualified function return
+type, volatile-qualified parameter type, and structured bindings of a
+volatile-qualified type.  This usage was deprecated in C++20.


Just a minor thing: Since the text uses volatile as a keyword
(as opposed to an adjective) it should be @code{volatile},
analogously to how it's quoted in warnings.

Martin


Re: Go patch committed: Provide index information on bounds check failure

2019-08-29 Thread Andreas Schwab
On Aug 28 2019, Ian Lance Taylor  wrote:

> This patch to the Go frontend and libgo changes the panic message
> reported for an out of bounds index or slice operation to include the
> invalid values.

This breaks aarch64/-mabi=ilp32.

aarch64-suse-linux/ilp32/libgo/archive/tar/check-testlog:

/usr/aarch64-suse-linux/bin/ld: _gotest_.o: in function `archive..z2ftar.Reader.next':
/opt/gcc/gcc-20190829/Build/aarch64-suse-linux/ilp32/libgo/gotest1086/test/reader.go:72: undefined reference to `runtime.goPanicExtendSliceAlen'

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] Adding _Dependent_ptr type qualifier in C part 1/3

2019-08-29 Thread Joseph Myers
On Fri, 30 Aug 2019, Akshat Garg wrote:

> > The first question for any new thing that is syntactically a qualifier is:
> > is it intended generally to be counted as a qualifier where the standard
> > refers to qualified type, the unqualified version of a type, etc.?  Or is
> > it, like _Atomic, a qualifier only syntactically and generally excluded
> > from references to qualifiers?
> >
> Can you help me understand why _Atomic is excluded from the
> standard's references to qualifiers? I referred to the C standard draft N2310 (
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf) but I couldn't
> understand how it is excluded. I want to know what properties a
> qualifier should have to be in the standard's list of qualifiers.

In the case of _Atomic, it can affect the size and alignment of the type 
to which it is applied, which means it can't be considered a qualifier in 
the normal semantic sense.

In general you need to consider questions such as: is it safe to convert a 
pointer to unqualified type to a pointer to the corresponding 
_Dependent_ptr-qualified type?  Are conditional expressions between 
pointers whose target types differ in presence or absence of 
_Dependent_ptr safe?  If in general, in such places where the standard 
refers to qualifiers, the existing logic there is also appropriate for 
_Dependent_ptr, that suggests it should be a qualifier semantically.  If 
the standard logic often seems inappropriate for _Dependent_ptr, that 
indicates it's not a qualifier in standard terms.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Adding _Dependent_ptr type qualifier in C part 1/3

2019-08-29 Thread Akshat Garg
Hi Joseph,
Many thanks for giving us your feedback.

On Tue, Aug 20, 2019 at 3:57 AM Joseph Myers wrote:

> On Tue, 30 Jul 2019, Martin Sebor wrote:
>
> > On 7/30/19 1:13 AM, Akshat Garg wrote:
> > > Hi,
> > > This patch includes C front-end code for a type qualifier _Dependent_ptr.
> >
> > Just some very high-level comments/questions.  I only followed
> > the _Dependent_ptr discussion from a distance and I'm likely
> > missing some context so the first thing I looked for in this
> > patch is documentation of the new qualifier.  Unless it's
>
> The first question for any new thing that is syntactically a qualifier is:
> is it intended generally to be counted as a qualifier where the standard
> refers to qualified type, the unqualified version of a type, etc.?  Or is
> it, like _Atomic, a qualifier only syntactically and generally excluded
> from references to qualifiers?
>
Can you help me understand why _Atomic is excluded from the
standard's references to qualifiers? I referred to the C standard draft N2310 (
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf) but I couldn't
understand how it is excluded. I want to know what properties a
qualifier should have to be in the standard's list of qualifiers.

Thanks,
Akshat

>
> For the _Atomic implementation I had to go through all the references to
> qualifiers or TYPE_MAIN_VARIANT in the front end and consider in each case
> whether it handled _Atomic correctly, given that _Atomic is not counted as
> a qualifier in the standard (so the unqualified version of const _Atomic
> int is _Atomic int not int, and so can't be derived simply by using
> TYPE_MAIN_VARIANT, for example).  Some cases didn't need changing because
> the handling (e.g. diagnostic for different types) was still appropriate
> for _Atomic even though not formally a qualifier, but plenty did need
> changing and associated tests added.
>
> Such a check of front end code is probably unavoidable (before a change is
> ready for trunk, not necessarily for an initial rough RFC patch) for any
> new qualifier, whether it counts as a qualifier in standard terms or not
> (and the patch reviewer will need to do their own check of references to
> qualifiers or TYPE_MAIN_VARIANT that didn't get changed by the patch), but
> the answer to that question helps indicate whether the default is to
> expect code to need changing for the new qualifier or not.
>
> > you point to it?  (In that case, or if a proposal is planned,
> > the feature should probably either only be available with
> > -std=c2x and -std=gnu2x or a pedantic warning should be issued
>
> There should not be any -std=c2x (flag_isoc2x) conditionals simply based
> on "a proposal is planned".  flag_isoc2x conditionals (pedwarn_c11 calls,
> etc.) should be for cases where a feature is *accepted and committed into
> the C2x branch of Jens's git repository for the C standard*, not for
> something that might be proposed, or is proposed, but doesn't yet have
> specific text integrated into the text of the standard.
>
> If something is simply proposed *and we've concluded it's a good feature
> to have as an extension in any case* then you have a normal
> pedwarn-if-pedantic (no condition on standard version) as for any GNU
> extension (and flag_isoc2x conditions / changes to use pedwarn_c11 instead
> can be added later if the extension is added to the standard).
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
>


Re: Go patch committed: Provide index information on bounds check failure

2019-08-29 Thread Rainer Orth
Hi Ian,

> This patch to the Go frontend and libgo changes the panic message
> reported for an out of bounds index or slice operation to include the
> invalid values.  This makes it easier for the user to see what the
> problem is.  This implements https://golang.org/cl/161477 in the
> gofrontend, for https://golang.org/issue/30116.  Bootstrapped and ran
> Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
>
> Unfortunately, GMail has once again blocked the patch attachment.  So
> if you want to see the patch, see
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=274998 .

this patch broke sparc-sun-solaris2.11 bootstrap: in gotools I get
several link failures like this:

Undefined                           first referenced
 symbol                                 in file
runtime.goPanicExtendSliceAcap      ../sparc-sun-solaris2.11/libgo/.libs/libgo.so
runtime.goPanicExtendSliceAlen      ../sparc-sun-solaris2.11/libgo/.libs/libgo.so
runtime.goPanicExtendIndex          ../sparc-sun-solaris2.11/libgo/.libs/libgo.so
runtime.goPanicExtendIndexU         ../sparc-sun-solaris2.11/libgo/.libs/libgo.so
runtime.goPanicExtendSliceB         ../sparc-sun-solaris2.11/libgo/.libs/libgo.so
runtime.goPanicExtendSliceAcapU     ../sparc-sun-solaris2.11/libgo/.libs/libgo.so
runtime.goPanicExtendSliceBU        ../sparc-sun-solaris2.11/libgo/libgotool.a(buildid.o)
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:829: buildid] Error 1

The attached patch fixes this and allows the links to succeed; tests
still to be run.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


diff --git a/libgo/go/runtime/panic32.go b/libgo/go/runtime/panic32.go
--- a/libgo/go/runtime/panic32.go
+++ b/libgo/go/runtime/panic32.go
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build 386 amd64p32 arm mips mipsle m68k nios2 sh shbe
+// +build 386 amd64p32 arm mips mipsle m68k nios2 sh shbe sparc
 
 package runtime
 


[PATCH] Fix unused malloc return value warning

2019-08-29 Thread François Dumont

Hi

    I am getting this warning:

/home/fdt/dev/gcc/git/libstdc++-v3/testsuite/util/testsuite_performance.h:170:
warning: ignoring return value of 'void* malloc(size_t)' declared
with attribute 'warn_unused_result' [-Wunused-result]

  170 |   malloc(0); // Needed for some implementations.

    OK to fix it with the attached patch?

    It seems trivial, but I wonder whether I should instead keep the
pointer returned by malloc and free it properly.

    Or maybe just remove the malloc call, because there is no clear comment
explaining why it is needed and I haven't found much in the SVN audit trail.


    * testsuite/util/testsuite_performance.h
    (resource_counter::start): Ignore unused malloc(0) result.

François

diff --git a/libstdc++-v3/testsuite/util/testsuite_performance.h b/libstdc++-v3/testsuite/util/testsuite_performance.h
index 556c78159be..8abc77cf31a 100644
--- a/libstdc++-v3/testsuite/util/testsuite_performance.h
+++ b/libstdc++-v3/testsuite/util/testsuite_performance.h
@@ -167,7 +167,7 @@ namespace __gnu_test
 {
   if (getrusage(who, &rusage_begin) != 0 )
 	memset(&rusage_begin, 0, sizeof(rusage_begin));
-  malloc(0); // Needed for some implementations.
+  void* p __attribute__((unused)) = malloc(0); // Needed for some implementations.
   allocation_begin = mallinfo();
 }
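
The alternative raised in the message, keeping the pointer returned by malloc and freeing it, could look roughly like the sketch below. This is not the posted patch: warm_up_allocator is a hypothetical stand-in for the relevant part of resource_counter::start, and it assumes the only purpose of the call is to touch the allocator once before mallinfo() is sampled.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the allocator "warm-up" step.  The result
   of malloc is used, so -Wunused-result has nothing to complain about,
   and nothing is leaked.  Returns 0 once the step is done.  */
int
warm_up_allocator (void)
{
  void *p = malloc (0);   /* May yield NULL or a unique pointer.  */
  free (p);               /* free (NULL) is a no-op, so this is safe.  */
  return 0;
}
```

Whether the immediate free changes what the original malloc(0) was meant to achieve is exactly the open question in the message, so this is only one possible reading.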
 



[PATCH, i386]: Tighten inline_secondary_memory_needed to reject moves between (SSE,mask) and non-general regs

2019-08-29 Thread Uros Bizjak
2019-08-29  Uroš Bizjak  

* config/i386/i386.c (inline_secondary_memory_needed): Return true
for moves between SSE and non-general registers and between
mask and non-general registers.
(ix86_register_move_cost): Remove stale comment.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d2d84eb11663..1c9c719f22a3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18306,32 +18306,36 @@ inline_secondary_memory_needed (machine_mode mode, 
reg_class_t class1,
   if (FLOAT_CLASS_P (class1) != FLOAT_CLASS_P (class2))
 return true;
 
-  /* Between mask and general, we have moves no larger than word size.  */
-  if ((MASK_CLASS_P (class1) != MASK_CLASS_P (class2))
-  && (GET_MODE_SIZE (mode) > UNITS_PER_WORD))
-  return true;
-
   /* ??? This is a lie.  We do have moves between mmx/general, and for
  mmx/sse2.  But by saying we need secondary memory we discourage the
  register allocator from using the mmx registers unless needed.  */
   if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2))
 return true;
 
+  /* Between mask and general, we have moves no larger than word size.  */
+  if (MASK_CLASS_P (class1) != MASK_CLASS_P (class2))
+{
+  if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
+ || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+   return true;
+}
+
   if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2))
 {
   /* SSE1 doesn't have any direct moves from other classes.  */
   if (!TARGET_SSE2)
return true;
 
+  /* Between SSE and general, we have moves no larger than word size.  */
+  if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
+ || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+   return true;
+
   /* If the target says that inter-unit moves are more expensive
 than moving through memory, then don't generate them.  */
   if ((SSE_CLASS_P (class1) && !TARGET_INTER_UNIT_MOVES_FROM_VEC)
  || (SSE_CLASS_P (class2) && !TARGET_INTER_UNIT_MOVES_TO_VEC))
return true;
-
-  /* Between SSE and general, we have moves no larger than word size.  */
-  if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
-   return true;
 }
 
   return false;
@@ -18608,15 +18612,7 @@ ix86_register_move_cost (machine_mode mode, 
reg_class_t class1_i,
   if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2))
 gcc_unreachable ();
 
-  /* Moves between SSE and integer units are expensive.  */
   if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2))
-
-/* ??? By keeping returned value relatively high, we limit the number
-   of moves between integer and SSE registers for all targets.
-   Additionally, high value prevents problem with x86_modes_tieable_p(),
-   where integer modes in SSE registers are not tieable
-   because of missing QImode and HImode moves to, from or between
-   MMX/SSE registers.  */
 return (SSE_CLASS_P (class1)
? ix86_cost->hard_register.sse_to_integer
: ix86_cost->hard_register.integer_to_sse);
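
As a reading aid, the mask and SSE checks added by this patch can be modeled in a few lines of C. This is a deliberately simplified sketch, not the GCC implementation: the enum reg_kind and the function needs_secondary_memory are hypothetical, real register classes are sets rather than single kinds, and the x87, MMX, TARGET_SSE2 and inter-unit tuning checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, coarse classification of a register class.  */
enum reg_kind { KIND_INTEGER, KIND_SSE, KIND_MASK, KIND_OTHER };

/* Return true if a SIZE-byte move between kinds A and B would have to
   go through memory.  WORD_SIZE plays the role of UNITS_PER_WORD.  */
bool
needs_secondary_memory (enum reg_kind a, enum reg_kind b,
                        int size, int word_size)
{
  bool either_integer = a == KIND_INTEGER || b == KIND_INTEGER;

  assert (size > 0 && word_size > 0);

  /* Mask <-> non-mask: direct moves exist only to or from general
     registers, and only up to word size.  */
  if ((a == KIND_MASK) != (b == KIND_MASK)
      && (!either_integer || size > word_size))
    return true;

  /* SSE <-> non-SSE: same rule.  */
  if ((a == KIND_SSE) != (b == KIND_SSE)
      && (!either_integer || size > word_size))
    return true;

  return false;
}
```

In this model an SSE to mask move always needs memory, while an SSE to general move needs it only when the mode is wider than a word.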


Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29 Thread Wilco Dijkstra
Hi Alexander,
 
> So essentially the main issue is not a hardware peculiarity, but rather the
> bad schedule being totally wrong (it could only make sense if loads had
> 1-cycle latency, which they do not).

The scheduling is only bad because the specific intrinsics used are mapped
onto asm statements, so they are ignored by the scheduler and modelled
with zero latencies.

> I think this highlights how implementing this autoprefetch heuristic via the
> dfa_lookahead_guard interface looks questionable in the first place, but the
> patch itself makes sense to me.

Yes, I'm still not sure what this autoprefetch heuristic is trying to
accomplish...  We could try disabling it and see whether it actually helps.

Wilco

[PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

2019-08-29 Thread Uros Bizjak
2019-08-28  Uroš Bizjak  

* config/i386/i386.c (ix86_register_move_cost): Do not
limit the cost of moves to/from XMM register to minimum 8.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Actually committed as r274994 with the wrong ChangeLog.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 49ab50ea41bf..11c75be113e0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18601,9 +18601,9 @@ ix86_register_move_cost (machine_mode mode, reg_class_t 
class1_i,
where integer modes in SSE registers are not tieable
because of missing QImode and HImode moves to, from or between
MMX/SSE registers.  */
-return MAX (8, SSE_CLASS_P (class1)
-   ? ix86_cost->hard_register.sse_to_integer
-   : ix86_cost->hard_register.integer_to_sse);
+return (SSE_CLASS_P (class1)
+   ? ix86_cost->hard_register.sse_to_integer
+   : ix86_cost->hard_register.integer_to_sse);
 
   if (MAYBE_FLOAT_CLASS_P (class1))
 return ix86_cost->hard_register.fp_move;
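
The effect of dropping the MAX (8, ...) clamp can be illustrated with a small model. The two functions below are hypothetical stand-ins for the SSE <-> integer branch before and after the patch; tuned_cost plays the role of ix86_cost->hard_register.sse_to_integer or integer_to_sse, and the values used are made up for illustration.

```c
#include <assert.h>

/* Before the patch: any tuned cost below 8 is silently raised to 8.  */
int
old_sse_int_move_cost (int tuned_cost)
{
  assert (tuned_cost >= 0);
  return tuned_cost > 8 ? tuned_cost : 8;   /* MAX (8, tuned_cost) */
}

/* After the patch: the value from the cost table is used as is.  */
int
new_sse_int_move_cost (int tuned_cost)
{
  assert (tuned_cost >= 0);
  return tuned_cost;
}
```

With a table entry of 2, the old function returns 8 and the new one returns 2. Entries above 8 behave the same in both, which is why setting the table values to 8 restores the previous behaviour.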


C++ PATCH for c++/91129 - wrong error with binary op in template argument

2019-08-29 Thread Marek Polacek
We reject this test with errors like

nontype1.C:22:14: error: taking address of rvalue [-fpermissive]
   22 |   A{}> a2;
  |  ^~~

that happens because for converting "C{}" to int we generate
"C::operator int (&TARGET_EXPR )".  The second
template argument is a binary operator, so while parsing the template
arguments in a function template foo we end up in cp_build_binary_op.

cp_build_binary_op calls, for certain binary ops, fold_non_dependent_expr.
Since  these
calls are no longer tf_none.  fold_non_dependent_expr, when in a template,
will instantiate, which gives the "taking address of rvalue" error.

In this particular case the fix seems to be using fold_for_warn instead,
which in a template is fold_non_dependent_expr with tf_none; all the
fold calls I'm changing in this patch are used for diagnostic purposes
only, and it fixes all the bogus errors.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-08-29  Marek Polacek  

PR c++/91129 - wrong error with binary op in template argument.
* typeck.c (warn_for_null_address): Use fold_for_warn instead of
fold_non_dependent_expr.
(cp_build_binary_op): Likewise.

* g++.dg/cpp1y/nontype1.C: New test.

diff --git gcc/cp/typeck.c gcc/cp/typeck.c
index c09bb309142..31414453524 100644
--- gcc/cp/typeck.c
+++ gcc/cp/typeck.c
@@ -4305,7 +4305,7 @@ warn_for_null_address (location_t location, tree op, 
tsubst_flags_t complain)
   || TREE_NO_WARNING (op))
 return;
 
-  tree cop = fold_non_dependent_expr (op, complain);
+  tree cop = fold_for_warn (op);
 
   if (TREE_CODE (cop) == ADDR_EXPR
   && decl_with_nonnull_addr_p (TREE_OPERAND (cop, 0))
@@ -4628,9 +4628,8 @@ cp_build_binary_op (const op_location_t &location,
  || code1 == COMPLEX_TYPE || code1 == VECTOR_TYPE))
{
  enum tree_code tcode0 = code0, tcode1 = code1;
- tree cop1 = fold_non_dependent_expr (op1, complain);
  doing_div_or_mod = true;
- warn_for_div_by_zero (location, cop1);
+ warn_for_div_by_zero (location, fold_for_warn (op1));
 
  if (tcode0 == COMPLEX_TYPE || tcode0 == VECTOR_TYPE)
tcode0 = TREE_CODE (TREE_TYPE (TREE_TYPE (op0)));
@@ -4669,11 +4668,8 @@ cp_build_binary_op (const op_location_t &location,
 
 case TRUNC_MOD_EXPR:
 case FLOOR_MOD_EXPR:
-  {
-   tree cop1 = fold_non_dependent_expr (op1, complain);
-   doing_div_or_mod = true;
-   warn_for_div_by_zero (location, cop1);
-  }
+  doing_div_or_mod = true;
+  warn_for_div_by_zero (location, fold_for_warn (op1));
 
   if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
  && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
@@ -4766,7 +4762,7 @@ cp_build_binary_op (const op_location_t &location,
}
   else if (code0 == INTEGER_TYPE && code1 == INTEGER_TYPE)
{
- tree const_op1 = fold_non_dependent_expr (op1, complain);
+ tree const_op1 = fold_for_warn (op1);
  if (TREE_CODE (const_op1) != INTEGER_CST)
const_op1 = op1;
  result_type = type0;
@@ -4812,10 +4808,10 @@ cp_build_binary_op (const op_location_t &location,
}
   else if (code0 == INTEGER_TYPE && code1 == INTEGER_TYPE)
{
- tree const_op0 = fold_non_dependent_expr (op0, complain);
+ tree const_op0 = fold_for_warn (op0);
  if (TREE_CODE (const_op0) != INTEGER_CST)
const_op0 = op0;
- tree const_op1 = fold_non_dependent_expr (op1, complain);
+ tree const_op1 = fold_for_warn (op1);
  if (TREE_CODE (const_op1) != INTEGER_CST)
const_op1 = op1;
  result_type = type0;
diff --git gcc/testsuite/g++.dg/cpp1y/nontype1.C 
gcc/testsuite/g++.dg/cpp1y/nontype1.C
new file mode 100644
index 000..a37e996a3ff
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp1y/nontype1.C
@@ -0,0 +1,42 @@
+// PR c++/91129 - wrong error with binary op in template argument.
+// { dg-do compile { target c++14 } }
+
+template
+struct C
+{
+  constexpr operator T() const { return v; }
+  constexpr auto operator()() const { return v; }
+};
+
+template
+struct A
+{
+};
+
+template
+void foo ()
+{
+  A{}> a0;
+  A{}> a1;
+  A{}> a2;
+  A{}> a3;
+  A{}> a4;
+  A{}> a5;
+  A{}> a6;
+  A{}> a7;
+  A{}> a8;
+  A{}> a9;
+  A{}> a10;
+  A> C{})> a11;
+  A{}> a12;
+  A{}> a13;
+  A{}> a14;
+  A{}> a15;
+  A{}> a16;
+  A{}> a17;
+}
+
+int main()
+{
+  foo<10>();
+}


Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29 Thread Alexander Monakov
On Thu, 29 Aug 2019, Maxim Kuvyrkov wrote:

> >> r1 = [rb + 0]
> >> 
> >> r2 = [rb + 8]
> >> 
> >> r3 = [rb + 16]
> >> 
> >> 
> >> which, apparently, cortex-a53 autoprefetcher doesn't recognize.  This
> >> schedule happens because r2= load gets lower priority than the
> >> "irrelevant"  due to the above patch.
> >> 
> >> If we think about it, the fact that "r1 = [rb + 0]" can be scheduled
> >> means that true dependencies of all similar base+offset loads are
> >> resolved.  Therefore, for autoprefetcher-friendly schedule we should
> >> prioritize memory reads before "irrelevant" instructions.
> > 
> > But isn't there also max number of load issues in a fetch window to 
> > consider? 
> > So interleaving arithmetic with loads might be profitable. 
> 
> It appears that cores with autoprefetcher hardware prefer loads and stores
> bundled together, not interspersed with other instructions to occupy the rest
> of CPU units.

Let me point out that the motivating example has a bigger effect in play:

(1) r1 = [rb + 0]
(2) 
(3) r2 = [rb + 8]
(4) 
(5) r3 = [rb + 16]
(6) 

here Cortex-A53, being an in-order core, cannot issue the load at (3) until
after the load at (1) has completed, because the use at (2) depends on it.
The good schedule allows the three loads to issue in a pipelined fashion.

So essentially the main issue is not a hardware peculiarity, but rather the
bad schedule being totally wrong (it could only make sense if loads had 1-cycle
latency, which they do not).

I think this highlights how implementing this autoprefetch heuristic via the
dfa_lookahead_guard interface looks questionable in the first place, but the
patch itself makes sense to me.

Alexander
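
The in-order argument above can be checked with a toy model. The sketch below assumes a single-issue, in-order machine and an arbitrary load latency; it is not a model of the real Cortex-A53 pipeline, and struct insn, schedule_length and the two helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>

struct insn
{
  int producer;   /* Index of the insn whose result is consumed, or -1.  */
  int latency;    /* Cycles from issue until the result is available.  */
};

/* Cycle at which the last result is available.  An insn issues no
   earlier than one cycle after its predecessor (in-order, single
   issue) and no earlier than the cycle its input becomes ready.  */
int
schedule_length (const struct insn *insns, int n)
{
  int ready[16];
  int issue = -1;
  int finish = 0;

  assert (n >= 0 && n <= 16);
  for (int i = 0; i < n; i++)
    {
      int earliest = issue + 1;
      int p = insns[i].producer;

      assert (p < i);
      if (p >= 0 && ready[p] > earliest)
        earliest = ready[p];
      issue = earliest;
      ready[i] = issue + insns[i].latency;
      if (ready[i] > finish)
        finish = ready[i];
    }
  return finish;
}

/* load, use, load, use, load, use  */
int
interleaved_length (int load_latency)
{
  const struct insn seq[] = {
    { -1, load_latency }, { 0, 1 },
    { -1, load_latency }, { 2, 1 },
    { -1, load_latency }, { 4, 1 },
  };
  return schedule_length (seq, 6);
}

/* load, load, load, use, use, use  */
int
grouped_length (int load_latency)
{
  const struct insn seq[] = {
    { -1, load_latency }, { -1, load_latency }, { -1, load_latency },
    { 0, 1 }, { 1, 1 }, { 2, 1 },
  };
  return schedule_length (seq, 6);
}
```

With an assumed 3-cycle load latency the interleaved order takes 12 cycles in this model and the grouped order takes 6. With 1-cycle loads both take 6, which matches the remark that the interleaved schedule only makes sense if loads have 1-cycle latency.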


Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29 Thread Wilco Dijkstra
Hi Maxim,
 
> It appears that cores with autoprefetcher hardware prefer loads and stores
> bundled together, not interspersed with other instructions to occupy the
> rest of CPU units.

I don't believe it is as simple as that - modern cores have multiple
prefetchers but won't prefer bundling loads and stores in large blocks.
That would result in terrible performance due to dispatch and issue stalls.
Also the increased register pressure could cause extra spilling.  If we
group loads and stores, we'd definitely need to limit them to say 4 or so
at most, and then interleave ALU operations.

> Autoprefetching heuristic is enabled only for cores that support it, and
> isn't active by default.

It's enabled on most cores, including the default (generic).  So we do have
to be careful that this doesn't regress any other benchmarks or do worse on
modern cores.
 
 Cheers,
 Wilco
  
 

[PATCH][ARM] Add logical DImode expanders

2019-08-29 Thread Wilco Dijkstra
We currently use default mid-end expanders for logical DImode operations.
These split operations without first splitting off complex immediates or
memory operands.  The resulting expansions are non-optimal and allow for
fewer LDRD/STRD opportunities.  So add back explicit expanders which ensure
memory operands and immediates are handled more efficiently.
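
The idea behind the explicit expanders, performing a DImode logical operation as two independent SImode operations on the low and high halves, can be shown in plain C. This is only an arithmetic illustration under that reading of the patch, not the RTL the expanders emit; and_di_by_halves is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Compute a 64-bit AND as two 32-bit ANDs on the low and high words.  */
uint64_t
and_di_by_halves (uint64_t a, uint64_t b)
{
  uint32_t lo = (uint32_t) a & (uint32_t) b;
  uint32_t hi = (uint32_t) (a >> 32) & (uint32_t) (b >> 32);

  return ((uint64_t) hi << 32) | lo;
}
```

When the constant operand has an all-ones high word, the high half reduces to a plain copy, so only the low half needs a real AND. That is the kind of case where splitting before choosing immediates can help.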

Bootstrap OK on armhf, regress passes.

ChangeLog:
2019-08-29  Wilco Dijkstra  

* config/arm/arm.md (anddi3): Expand explicitly.
(iordi3): Likewise.
(xordi3): Likewise.
(one_cmpldi2): Likewise.
* config/arm/arm.c (const_ok_for_dimode_op): Return true if one
of the constant parts is simple.
* config/arm/predicates.md (arm_anddi_operand): Add predicate.
(arm_iordi_operand): Add predicate.
(arm_xordi_operand): Add predicate.

--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
fb57880fe0568be96a04aee1b7d230e77121e3f5..1fec00baa2a5e510ef2c02d9766432cc7cd0a17b
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -4273,8 +4273,8 @@ const_ok_for_dimode_op (HOST_WIDE_INT i, enum rtx_code 
code)
 case AND:
 case IOR:
 case XOR:
-  return (const_ok_for_op (hi_val, code) || hi_val == 0x)
-  && (const_ok_for_op (lo_val, code) || lo_val == 0x);
+  return const_ok_for_op (hi_val, code) || hi_val == 0x
+|| const_ok_for_op (lo_val, code) || lo_val == 0x;
 case PLUS:
   return arm_not_operand (hi, SImode) && arm_add_operand (lo, SImode);
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
ed49c4beda138633a84b58fe345cf5ba99103ab7..738d42fd164f117f1dec1108a824d984ccd70d09
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2176,6 +2176,89 @@ (define_expand "divdf3"
   "")
 
 
+; Expand logical operations.  The mid-end expander does not split off memory
+; operands or complex immediates, which leads to fewer LDRD/STRD instructions.
+; So an explicit expander is needed to generate better code.
+
+(define_expand "anddi3"
+  [(set (match_operand:DI 0 "s_register_operand")
+   (and:DI (match_operand:DI 1 "s_register_operand")
+   (match_operand:DI 2 "arm_anddi_operand")))]
+  "TARGET_32BIT"
+  {
+  rtx low  = simplify_gen_binary (AND, SImode,
+ gen_lowpart (SImode, operands[1]),
+ gen_lowpart (SImode, operands[2]));
+  rtx high = simplify_gen_binary (AND, SImode,
+ gen_highpart (SImode, operands[1]),
+ gen_highpart_mode (SImode, DImode,
+operands[2]));
+
+  emit_insn (gen_rtx_SET (gen_lowpart (SImode, operands[0]), low));
+  emit_insn (gen_rtx_SET (gen_highpart (SImode, operands[0]), high));
+  DONE;
+  }
+)
+
+(define_expand "iordi3"
+  [(set (match_operand:DI 0 "s_register_operand")
+   (ior:DI (match_operand:DI 1 "s_register_operand")
+   (match_operand:DI 2 "arm_iordi_operand")))]
+  "TARGET_32BIT"
+  {
+  rtx low  = simplify_gen_binary (IOR, SImode,
+ gen_lowpart (SImode, operands[1]),
+ gen_lowpart (SImode, operands[2]));
+  rtx high = simplify_gen_binary (IOR, SImode,
+ gen_highpart (SImode, operands[1]),
+ gen_highpart_mode (SImode, DImode,
+operands[2]));
+
+  emit_insn (gen_rtx_SET (gen_lowpart (SImode, operands[0]), low));
+  emit_insn (gen_rtx_SET (gen_highpart (SImode, operands[0]), high));
+  DONE;
+  }
+)
+
+(define_expand "xordi3"
+  [(set (match_operand:DI 0 "s_register_operand")
+   (xor:DI (match_operand:DI 1 "s_register_operand")
+   (match_operand:DI 2 "arm_xordi_operand")))]
+  "TARGET_32BIT"
+  {
+   rtx low  = simplify_gen_binary (XOR, SImode,
+   gen_lowpart (SImode, operands[1]),
+   gen_lowpart (SImode, operands[2]));
+   rtx high = simplify_gen_binary (XOR, SImode,
+   gen_highpart (SImode, operands[1]),
+   gen_highpart_mode (SImode, DImode,
+  operands[2]));
+
+   emit_insn (gen_rtx_SET (gen_lowpart (SImode, operands[0]), low));
+   emit_insn (gen_rtx_SET (gen_highpart (SImode, operands[0]), high));
+   DONE;
+  }
+)
+
+(define_expand "one_cmpldi2"
+  [(set (match_operand:DI 0 "s_register_operand")
+   (not:DI (match_operand:DI 1 "s_register_operand")))]
+  "TARGET_32BIT"
+  {
+  rtx low  = simplify_gen_unary (NOT, SImode,
+gen_lowpart (SImode, operands[1]),
+  

Re: [PATCH], Fix V1TI in Altivec regs on old systems

2019-08-29 Thread Segher Boessenkool
Hi!

On Tue, Aug 20, 2019 at 02:00:31PM -0400, Michael Meissner wrote:
> I noticed on power5 that the V1TImode mode is allowed in Altivec registers,
> even though power5 doesn't have Altivec registers.
>
> While it doesn't seem to affect anything (I couldn't create a test case that
> failed), it is a small nit that should be fixed.  The test for TARGET_VADDUQM
> matches a test earlier in the function where VSX registers are checked.

Yeah, but does that test make any sense?

Why p8 (or later) only?  Why vector only?  (Well that one is clear, and
what this patch is about).  Why -mpowerpc64 only?  Because it has __int128
maybe?  But then you should test for *that* (and why is it important?)

Where would V1TI go if not in vector regs?  Just in memory?

And, what happens on 970?  p5 doesn't have vector registers, but 970 does.

> --- gcc/config/rs6000/rs6000.c(revision 274635)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -1874,7 +1874,7 @@ rs6000_hard_regno_mode_ok_uncached (int
>/* AltiVec only in AldyVec registers.  */
>if (ALTIVEC_REGNO_P (regno))
>  return (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
> - || mode == V1TImode);
> + || (TARGET_VADDUQM && mode == V1TImode));


Segher


Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29 Thread Maxim Kuvyrkov
> On Aug 29, 2019, at 7:29 PM, Richard Biener  
> wrote:
> 
> On August 29, 2019 5:40:47 PM GMT+02:00, Maxim Kuvyrkov 
>  wrote:
>> Hi,
>> 
>> This patch tweaks autoprefetcher heuristic in scheduler to better group
>> memory loads and stores together.
>> 
>> From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:
>> 
>> There are two separate changes, both related to instruction scheduler,
>> that cause the regression.  The first change in r253235 is responsible
>> for 70% of the regression.
>> ===
>>   haifa-sched: fix autopref_rank_for_schedule qsort comparator
>> 
>> * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
>>   first, always call autopref_rank_data otherwise.
>> 
>> 
>> 
>> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253235
>> 138bc75d-0d04-0410-961f-82ee72b054a4
>> ===
>> 
>> After this change instead of
>> r1 = [rb + 0]
>> r2 = [rb + 8]
>> r3 = [rb + 16]
>> r4 = 
>> r5 = 
>> r6 = 
>> 
>> we get
>> r1 = [rb + 0]
>> 
>> r2 = [rb + 8]
>> 
>> r3 = [rb + 16]
>> 
>> 
>> which, apparently, cortex-a53 autoprefetcher doesn't recognize.  This
>> schedule happens because r2= load gets lower priority than the
>> "irrelevant"  due to the above patch.
>> 
>> If we think about it, the fact that "r1 = [rb + 0]" can be scheduled
>> means that true dependencies of all similar base+offset loads are
>> resolved.  Therefore, for autoprefetcher-friendly schedule we should
>> prioritize memory reads before "irrelevant" instructions.
> 
> But isn't there also max number of load issues in a fetch window to consider? 
> So interleaving arithmetic with loads might be profitable. 

It appears that cores with autoprefetcher hardware prefer loads and stores 
bundled together, not interspersed with other instructions to occupy the rest 
of CPU units.

The autoprefetching heuristic is enabled only for cores that support it, and
isn't active by default.

> 
>> On the other hand, following similar logic, we want to delay memory
>> stores as much as possible to start scheduling them only after all
>> potential producers are scheduled.  I.e., for autoprefetcher-friendly
>> schedule we should prioritize "irrelevant" instructions before memory
>> writes.
>> 
>> Obvious patch to implement the above is attached.  It brings 70% of
>> regressed performance on this testcase back.
>> 
>> OK to commit?
>> 
>> Regards,
>> 
>> --
>> Maxim Kuvyrkov
>> www.linaro.org



Re: [PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29 Thread Richard Biener
On August 29, 2019 5:40:47 PM GMT+02:00, Maxim Kuvyrkov 
 wrote:
>Hi,
>
>This patch tweaks autoprefetcher heuristic in scheduler to better group
>memory loads and stores together.
>
>From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:
>
>There are two separate changes, both related to instruction scheduler,
>that cause the regression.  The first change in r253235 is responsible
>for 70% of the regression.
>===
>haifa-sched: fix autopref_rank_for_schedule qsort comparator
>
> * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
>first, always call autopref_rank_data otherwise.
>
>
>
>git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253235
>138bc75d-0d04-0410-961f-82ee72b054a4
>===
>
>After this change instead of
>r1 = [rb + 0]
>r2 = [rb + 8]
>r3 = [rb + 16]
>r4 = 
>r5 = 
>r6 = 
>
>we get
>r1 = [rb + 0]
>
>r2 = [rb + 8]
>
>r3 = [rb + 16]
>
>
>which, apparently, cortex-a53 autoprefetcher doesn't recognize.  This
>schedule happens because r2= load gets lower priority than the
>"irrelevant"  due to the above patch.
>
>If we think about it, the fact that "r1 = [rb + 0]" can be scheduled
>means that true dependencies of all similar base+offset loads are
>resolved.  Therefore, for autoprefetcher-friendly schedule we should
>prioritize memory reads before "irrelevant" instructions.

But isn't there also max number of load issues in a fetch window to consider? 
So interleaving arithmetic with loads might be profitable. 

>On the other hand, following similar logic, we want to delay memory
>stores as much as possible to start scheduling them only after all
>potential producers are scheduled.  I.e., for autoprefetcher-friendly
>schedule we should prioritize "irrelevant" instructions before memory
>writes.
>
>Obvious patch to implement the above is attached.  It brings 70% of
>regressed performance on this testcase back.
>
>OK to commit?
>
>Regards,
>
>--
>Maxim Kuvyrkov
>www.linaro.org



Re: [PATCH] Generalized predicate/condition for parameter reference in IPA (PR ipa/91088)

2019-08-29 Thread Martin Jambor
Hi,

On Fri, Jul 12 2019, Feng Xue OS wrote:
> Current IPA-cp only generates cost-evaluating predicate for conditional
> statement like "if (param cmp const_val)", it is too simple and
> conservative. This patch generalizes the process to handle the form as
> T(param), a mathematical transformation on the function parameter, in
> which the parameter occurs once, and other operands are constant value.

thanks for working on this.  I cannot approve this, but I have had a
brief look and have the following comments:
>
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3d92250b520..0110446e09e 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,25 @@
> +2019-07-12  Feng Xue  
> +
> + PR ipa/91088
> + * ipa-predicat.h (struct expr_eval_op): New struct.
> + (expr_eval_ops): New typedef.
> + (struct condition): Add param_ops member.
> + (add_condition): Add param_ops parameter.
> + * ipa-predicat.c (expr_eval_ops_equal_p): New function.
> + (predicate::add_clause): Add param_ops comparison.
> + (dump_condition): Add debug dump for param_ops.
> + (remap_after_inlining): Add param_ops argument to call to
> + add_condition.
> + (add_condition): Add parameter param_ops.
> + * ipa-fnsummary.c (evaluate_conditions_for_known_args): Fold
> + parameter expressions using param_ops.
> + (decompose_param_expr):  New function.
> + (set_cond_stmt_execution_predicate): Use call to decompose_param_expr
> + to replace call to unmodified_parm_or_parm_agg_item.
> + (set_switch_stmt_execution_predicate): Likewise.
> + (inline_read_section): Read param_ops from summary stream.
> + (ipa_fn_summary_write): Write param_ops to summary stream.
> +

(It's a bad idea to make ChangeLog entries part of the patch; it won't
apply for anyone, not even for you nowadays.)


>  2019-07-11  Sunil K Pandey  
>  
>   PR target/90980
> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
> index 09986211a1d..faf8bd39090 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -301,9 +301,9 @@ set_hint_predicate (predicate **p, predicate 
> new_predicate)
>  }
>  
>  
> -/* Compute what conditions may or may not hold given invormation about
> +/* Compute what conditions may or may not hold given information about
> parameters.  RET_CLAUSE returns truths that may hold in a specialized 
> copy,
> -   whie RET_NONSPEC_CLAUSE returns truths that may hold in an nonspecialized
> +   while RET_NONSPEC_CLAUSE returns truths that may hold in an nonspecialized
> copy when called in a given context.  It is a bitmask of conditions. Bit
> 0 means that condition is known to be false, while bit 1 means that 
> condition
> may or may not be true.  These differs - for example NOT_INLINED condition
> @@ -336,6 +336,8 @@ evaluate_conditions_for_known_args (struct cgraph_node 
> *node,
>  {
>tree val;
>tree res;
> +  int j;
> +  struct expr_eval_op *op;
>  
>/* We allow call stmt to have fewer arguments than the callee function
>   (especially for K&R style programs).  So bound check here (we assume
> @@ -399,7 +401,18 @@ evaluate_conditions_for_known_args (struct cgraph_node 
> *node,
> continue;
>   }
>  
> -  val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (c->val), val);
> +  for (j = 0; vec_safe_iterate (c->param_ops, j, &op); j++)
> + {
> +   if (!op->val)
> + val = fold_unary (op->code, op->type, val);
> +   else if (op->val_is_rhs)
> + val = fold_binary_to_constant (op->code, op->type, val, op->val);
> +   else
> + val = fold_binary_to_constant (op->code, op->type, op->val, val);
> +   if (!val)
> + break;
> + }
> +
>res = val
>   ? fold_binary_to_constant (c->code, boolean_type_node, val, c->val)
>   : NULL;
> @@ -1177,6 +1190,105 @@ eliminated_by_inlining_prob (ipa_func_body_info *fbi, 
> gimple *stmt)
>  }
>  }
>  
> +/* Flatten a tree expression on parameter into a set of sequential 
> operations.
> +   we only handle expression that is a mathematical transformation on the
> +   parameter, and in the expression, parameter occurs only once, and other
> +   operands are IPA invariant.  */

I understand describing these things is difficult, but "flatten" is a
strange way to describe what the function does.  What about something
like the following?

Analyze EXPR if it represents a series of simple operations performed on
a function parameter and return true if so.  FBI, STMT, INDEX_P, SIZE_P
and AGGPOS have the same meaning as in
unmodified_parm_or_parm_agg_item.  Operations on the parameter are
recorded to PARAM_OPS_P if it is not NULL.
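
To make the "series of simple operations" concrete, here is a small stand-alone model of replaying recorded operations on a known parameter value, in the spirit of the param_ops loop quoted above. It is a sketch only: struct eval_op, apply_ops and example_transform are hypothetical, values are plain long rather than trees, and nothing here uses GCC's folding routines.

```c
#include <assert.h>
#include <stdbool.h>

enum op_code { OP_NEG, OP_ADD, OP_SUB, OP_MUL };

struct eval_op
{
  enum op_code code;
  bool val_is_rhs;   /* True: param OP val.  False: val OP param.  */
  long val;          /* Constant operand; ignored for OP_NEG.  */
};

/* Apply the N recorded operations to PARAM in order.  */
long
apply_ops (long param, const struct eval_op *ops, int n)
{
  long v = param;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      long lhs = ops[i].val_is_rhs ? v : ops[i].val;
      long rhs = ops[i].val_is_rhs ? ops[i].val : v;

      switch (ops[i].code)
        {
        case OP_NEG: v = -v; break;
        case OP_ADD: v = lhs + rhs; break;
        case OP_SUB: v = lhs - rhs; break;
        case OP_MUL: v = lhs * rhs; break;
        }
    }
  return v;
}

/* T(param) = 10 - param * 2, recorded as two operations.  */
long
example_transform (long param)
{
  const struct eval_op ops[] = {
    { OP_MUL, true, 2 },     /* param * 2  */
    { OP_SUB, false, 10 },   /* 10 - (previous result)  */
  };
  return apply_ops (param, ops, 2);
}
```

For a condition such as "if (10 - param * 2 > 0)", a known param of 3 gives 4 here, so the condition would be known true in a specialized copy; the val_is_rhs flag is what keeps the non-commutative subtraction the right way round.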


> +
> +static bool
> +decompose_param_expr (struct ipa_func_body_info *fbi,
> +   gimple *stmt, tree expr,
> +   int *index_p, HOST_WIDE_INT *size_p,
> +   struct agg_position_info *aggpos,
> +   e

[PATCH] Couple of debug dump improvements to scheduler (no code-gen changes)

2019-08-29 Thread Maxim Kuvyrkov
Hi,

The first patch adds ranking statistics for autoprefetcher heuristic.

The second one makes it easier to diff scheduler debug dumps by adding more 
context lines for diff at clock increments.

OK to commit?

--
Maxim Kuvyrkov
www.linaro.org




0002-Add-missing-entry-for-rank_for_schedule-stats.patch
Description: Binary data


0003-Improve-diff-ability-of-scheduler-logs.patch
Description: Binary data


[PR91598] Improve autoprefetcher heuristic in haifa-sched.c

2019-08-29 Thread Maxim Kuvyrkov
Hi,

This patch tweaks autoprefetcher heuristic in scheduler to better group memory 
loads and stores together.

From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:

There are two separate changes, both related to instruction scheduler, that 
cause the regression.  The first change in r253235 is responsible for 70% of 
the regression.
===
haifa-sched: fix autopref_rank_for_schedule qsort comparator

* haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' 
insns
first, always call autopref_rank_data otherwise.



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253235 
138bc75d-0d04-0410-961f-82ee72b054a4
===

After this change instead of
r1 = [rb + 0]
r2 = [rb + 8]
r3 = [rb + 16]
r4 = 
r5 = 
r6 = 

we get
r1 = [rb + 0]

r2 = [rb + 8]

r3 = [rb + 16]


which, apparently, cortex-a53 autoprefetcher doesn't recognize.  This schedule 
happens because r2= load gets lower priority than the "irrelevant"  due to the above patch.

If we think about it, the fact that "r1 = [rb + 0]" can be scheduled means that 
true dependencies of all similar base+offset loads are resolved.  Therefore, 
for autoprefetcher-friendly schedule we should prioritize memory reads before 
"irrelevant" instructions.

On the other hand, following similar logic, we want to delay memory stores as 
much as possible to start scheduling them only after all potential producers 
are scheduled.  I.e., for autoprefetcher-friendly schedule we should prioritize 
"irrelevant" instructions before memory writes.

Obvious patch to implement the above is attached.  It brings 70% of regressed 
performance on this testcase back.

OK to commit?

Regards,

--
Maxim Kuvyrkov
www.linaro.org




0001-Improve-autoprefetcher-heuristic-partly-fix-regressi.patch
Description: Binary data
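
The ordering argued for in this message (memory reads first, then "irrelevant" instructions, then memory writes) can be sketched as a comparator. This is a hypothetical model, not the code in haifa-sched.c: struct ready_insn and the function names are invented, "sorts earlier" here simply means "should be scheduled first", and ordering memory accesses of the same kind by offset is an extra assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* The enumerator order encodes the desired priority.  */
enum insn_kind { INSN_LOAD, INSN_OTHER, INSN_STORE };

struct ready_insn
{
  enum insn_kind kind;
  int offset;   /* Base+offset displacement; unused for INSN_OTHER.  */
};

/* qsort comparator: loads before "irrelevant" insns before stores;
   memory accesses of the same kind are ordered by increasing offset.  */
int
compare_for_autopref (const void *pa, const void *pb)
{
  const struct ready_insn *a = pa;
  const struct ready_insn *b = pb;

  if (a->kind != b->kind)
    return (int) a->kind - (int) b->kind;
  if (a->kind == INSN_OTHER)
    return 0;
  return (a->offset > b->offset) - (a->offset < b->offset);
}

void
sort_ready_list (struct ready_insn *list, size_t n)
{
  assert (list != NULL || n == 0);
  qsort (list, n, sizeof *list, compare_for_autopref);
}
```

Sorting a list that mixes the three loads from the example with arithmetic instructions puts the loads at offsets 0, 8 and 16 first, which is the grouped schedule shown above.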


Re: [ARM/FDPIC v5 00/21] FDPIC ABI for ARM

2019-08-29 Thread Christophe Lyon

On 29/08/2019 15:57, Christophe Lyon wrote:

Hi,

On 15/05/2019 14:39, Christophe Lyon wrote:

Hello,

This patch series implements the GCC contribution of the FDPIC ABI for
ARM targets.

This ABI makes it possible to run Linux on MMU-less ARM cores and supports
shared libraries to reduce the memory footprint.

Without MMU, text and data segments relative distances are different
from one process to another, hence the need for a dedicated FDPIC
register holding the start address of the data segment. One of the
side effects is that function pointers require two words to be
represented: the address of the code, and the data segment start
address. These two words are designated as "Function Descriptor",
hence the "FD PIC" name.
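
As an illustration of the two-word function descriptor, the sketch below models it in portable C.  All names are hypothetical; the explicit data_base argument stands in for what the real ABI passes implicitly in the FDPIC register, so this is an analogy rather than the actual calling sequence.

```c
#include <assert.h>

/* Code entry point; the data-segment base is passed explicitly here,
   whereas FDPIC would place it in the FDPIC register before the call.  */
typedef int (*code_fn) (void *data_base, int arg);

/* Hypothetical model of a function descriptor: code address first,
   data segment start address second.  */
struct func_desc
{
  code_fn entry;
  void *data_base;
};

/* Per-process data reached through the data-segment base.  */
struct module_data
{
  int bias;
};

/* Shared code: identical for every process, data found via data_base.  */
static int
add_bias (void *data_base, int arg)
{
  struct module_data *d = data_base;
  return d->bias + arg;
}

/* Calling through a descriptor: set up the data base, then jump.  */
static int
call_via_descriptor (const struct func_desc *fd, int arg)
{
  return fd->entry (fd->data_base, arg);
}
```

Two descriptors that share the same entry but carry different data_base values behave like the same shared code running in two processes, which is why a bare code address is not enough to represent a function pointer.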

On ARM, the FDPIC register is r9 [1], and the target name is
arm-uclinuxfdpiceabi. Note that arm-uclinux exists, but uses another
ABI and the BFLAT file format; it does not support code sharing.
The -mfdpic option is enabled by default, and -mno-fdpic should be
used to build the Linux kernel.

This work was developed some time ago by STMicroelectronics, and was
presented during Linaro Connect SFO15 (September 2015). You can watch
the discussion and read the slides [2].
This presentation was related to the toolchain published on github [3],
which is based on binutils-2.22, gcc-4.7, uclibc-0.9.33.2, gdb-7.5.1
and qemu-2.3.0, and for which pre-built binaries are available [3].

The ABI itself is described in detail in [1].

Our Linux kernel patches have been updated and committed by Nicolas
Pitre (Linaro) in July 2017. They are required so that the loader is
able to handle this new file type. Indeed, the ELF files are tagged
with ELFOSABI_ARM_FDPIC. This new tag has been allocated by ARM, as
well as the new relocations involved.

The binutils, QEMU and uclibc-ng patch series were merged a few
months ago. [4][5][6]

This series provides support for architectures that support ARM and/or
Thumb-2 and has been tested on arm-linux-gnueabi without regression,
as well as arm-uclinuxfdpiceabi, using QEMU. arm-uclinuxfdpiceabi has
a few more failures than arm-linux-gnueabi, but is quite functional.

I have also booted an STM32 board (stm32f469) which uses a cortex-m4
with linux-4.20.17 and successfully ran several tools.

Are the GCC patches OK for inclusion in master?


I have addressed the comments I received on v5, and I am going to post updated 
versions of the patches that needed changes as follow-ups in this thread. I 
hope this will help reviewers as I will provide answers and updated patches 
next to their comments. After that, I will rebase the whole series and send it 
as v6 if that helps (several testsuite patches have already been approved 
as-is, but committing them now would change the patch numbering, thus possibly 
confusing reviewers).

However, note that several patches in the series haven't received feedback yet, 
so this is a ping for them :-)
[ARM/FDPIC v5 06/21] [ARM] FDPIC: Add support for c++ exceptions
[ARM/FDPIC v5 10/21] [ARM] FDPIC: Implement TLS support.
[ARM/FDPIC v5 11/21] [ARM] FDPIC: Add support to unwind FDPIC signal frame
[ARM/FDPIC v5 12/21] [ARM] FDPIC: Restore r9 after we call __aeabi_read_tp
[ARM/FDPIC v5 13/21] [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture



I forgot to mention that I found a problem in libitm's sjlj.S, which warrants 
this additional patch.

Christophe



Thanks,

Christophe


Changes between v4 and v5:
- rebased on top of recent gcc-10 master (April 26th, 2019)
- fixed handling of stack-protector combined patterns in FDPIC mode

Changes between v3 and v4:

- improved documentation (patch 1)
- emit an error message (sorry) if the target architecture does not
   support arm nor thumb-2 modes (patch 4)
- handle Richard's comments on patch 4 (comments, unspec)
- added .align directive (patch 5)
- fixed use of kernel helpers (__kernel_cmpxchg, __kernel_dmb) (patch 6)
- code factorization in patch 7
- typos/internal function name in patch 8
- improved patch 12
- dropped patch 16
- patch 20 introduces arm_arch*_thumb_ok effective targets to help
   skip some tests
- I tested patch 2 on xtensa-buildroot-uclinux-uclibc, it adds many
   new tests, but a few regressions
   (https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00713.html)
- I compiled and executed several LTP tests to exercise pthreads and signals
- I wrote and executed a simple testcase to change the interaction
   with __kernel_cmpxchg (ie. call the kernel helper rather than use an
   implementation in libgcc as requested by Richard)

Changes between v2 and v3:
- added doc entry for -mfdpic new option
- took Kyrill's comments into account (use "Armv7" instead of "7",
   code factorization, use preprocessor instead of hard-coding "r9",
   remove leftover code for thumb1 support, fixed comments)
- rebase over recent trunk
- patches with changes: 1, 2 (commit message), 3 (rebase), 4, 6, 7, 9,
   14 (rebase), 19 (rebase)

Changes between v1 and v2:
- fix GNU coding style
- exit with an error 

Re: [PATCH v2 0/9] S/390: Use signaling FP comparison instructions

2019-08-29 Thread Ilya Leoshkevich
> Am 22.08.2019 um 15:45 schrieb Ilya Leoshkevich :
> 
> Bootstrap and regtest running on x86_64-redhat-linux and
> s390x-redhat-linux.
> 
> This patch series adds signaling FP comparison support (both scalar and
> vector) to s390 backend.

I'm running into a problem on ppc64 with this patch, and it would be
great if someone could help me figure out the best way to resolve it.

The vector36.C test is failing because the gimplifier produces the following

  _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
  _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;

from

  VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
  { -1, -1, -1, -1 } ,
  { 0, 0, 0, 0 } >

Since the comparison tree code is now hidden behind a temporary, my code
does not have anything to pass to the backend.  The reason for creating
a temporary is that the comparison can trap, and so the following check
in gimplify_expr fails:

  if (gimple_seq_empty_p (internal_post) && (*gimple_test_f) (*expr_p))
goto out;

gimple_test_f is is_gimple_condexpr, and it eventually calls
operation_could_trap_p (GT).
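
For readers unfamiliar with why GT is treated as trapping, here is a simplified model of the decision, written from IEEE 754 semantics rather than copied from GCC.  The enum and function names are hypothetical, and the two flags stand for "NaNs are honored" and "signaling NaNs are honored" in the current floating-point mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical comparison codes; not GCC's tree codes.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Simplified model: could a floating-point comparison raise an
   invalid-operation exception?  */
static bool
fp_compare_could_trap (enum cmp_code code, bool honor_nans, bool honor_snans)
{
  switch (code)
    {
    case CMP_LT:
    case CMP_LE:
    case CMP_GT:
    case CMP_GE:
      /* Ordered relational comparisons signal on any NaN operand.  */
      return honor_nans;
    case CMP_EQ:
    case CMP_NE:
      /* Equality comparisons signal only on signaling NaNs.  */
      return honor_snans;
    }
  return false;
}
```

Under this model a vector "greater than" is trapping whenever NaNs are honored, which matches the observation that the comparison gets pulled out into its own statement.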

My current solution is to simply state that the backend does not support
SSA_NAME in vector comparisons; however, I don't like it, since it may
cause performance regressions due to having to fall back to scalar
comparisons.

I was thinking about two other possible solutions:

1. Change the gimplifier to allow trapping vector comparisons.  That's
   a bit complicated, because tree_could_throw_p checks not only for
   floating point traps, but also e.g. for array index out of bounds
   traps.  So I would have to create a tree_could_throw_p version which
   disregards specific kinds of traps.

2. Change expand_vector_condition to follow SSA_NAME_DEF_STMT and use
   its tree_code instead of SSA_NAME.  The potential problem I see with
   this is that there appears to be no guarantee that _5 will be inlined
   into _6 at a later point.  So if we say that we don't need to fall
   back to scalar comparisons based on availability of vector >
   instruction and inlining does not happen, then what will actually
   be required is vector selection (vsel on S/390), which might not be
   available in general case.

What would be a better way to proceed here?



Re: [ARM/FDPIC v5 12/21] [ARM] FDPIC: Restore r9 after we call __aeabi_read_tp

2019-08-29 Thread Kyrill Tkachov

Hi Christophe,

On 5/15/19 1:39 PM, Christophe Lyon wrote:

We call __aeabi_read_tp() to get the thread pointer. Since this is a
function call, we have to restore the FDPIC register afterwards.

2019-XX-XX  Christophe Lyon  
    Mickaël Guêné 

    gcc/
    * config/arm/arm.c (arm_load_tp): Add FDPIC support.
    * config/arm/arm.md (load_tp_soft_fdpic): New pattern.
    (load_tp_soft): Disable in FDPIC mode.

Change-Id: I1f6dfaee6260ecb453270f4971b3c5124317a186

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5fc7a20..26f29c7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8732,7 +8732,25 @@ arm_load_tp (rtx target)

   rtx tmp;

-  emit_insn (gen_load_tp_soft ());
+  if (TARGET_FDPIC)
+   {
+ rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+ rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+ rtx initial_fdpic_reg = get_hard_reg_initial_val (Pmode, 
FDPIC_REGNUM);

+
+ emit_insn (gen_load_tp_soft_fdpic ());
+
+ /* Restore r9.  */
+ XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,
+   gen_rtvec (2, fdpic_reg,
+ initial_fdpic_reg),
+ UNSPEC_PIC_RESTORE);
+ XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, initial_fdpic_reg);
+ XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, fdpic_reg);
+ emit_insn (par);
+   }
+  else
+   emit_insn (gen_load_tp_soft ());

   tmp = gen_rtx_REG (SImode, R0_REGNUM);
   emit_move_insn (target, tmp);
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 9036255..0edcb1d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11759,12 +11759,25 @@
 )

 ;; Doesn't clobber R1-R3.  Must use r0 for the first operand.
+(define_insn "load_tp_soft_fdpic"
+  [(set (reg:SI 0) (unspec:SI [(const_int 0)] UNSPEC_TLS))
+   (clobber (reg:SI 9))


Use FDPIC_REGNUM here (does it need to be declared at the top of arm.md 
for it to work?)


Otherwise this is ok.

Thanks,

Kyrill




+   (clobber (reg:SI LR_REGNUM))
+   (clobber (reg:SI IP_REGNUM))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SOFT_TP && TARGET_FDPIC"
+  "bl\\t__aeabi_read_tp\\t@ load_tp_soft"
+  [(set_attr "conds" "clob")
+   (set_attr "type" "branch")]
+)
+
+;; Doesn't clobber R1-R3.  Must use r0 for the first operand.
 (define_insn "load_tp_soft"
   [(set (reg:SI 0) (unspec:SI [(const_int 0)] UNSPEC_TLS))
    (clobber (reg:SI LR_REGNUM))
    (clobber (reg:SI IP_REGNUM))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_SOFT_TP"
+  "TARGET_SOFT_TP && !TARGET_FDPIC"
   "bl\\t__aeabi_read_tp\\t@ load_tp_soft"
   [(set_attr "conds" "clob")
    (set_attr "type" "branch")]
--
2.6.3



Re: [ARM/FDPIC v5 12/21] [ARM] FDPIC: Restore r9 after we call __aeabi_read_tp

2019-08-29 Thread Christophe Lyon

Here is an updated version that makes use of the helper 
gen_restore_pic_register_after_call

Christophe


On 15/05/2019 14:39, Christophe Lyon wrote:

We call __aeabi_read_tp() to get the thread pointer. Since this is a
function call, we have to restore the FDPIC register afterwards.

2019-XX-XX  Christophe Lyon  
Mickaël Guêné 

gcc/
* config/arm/arm.c (arm_load_tp): Add FDPIC support.
* config/arm/arm.md (load_tp_soft_fdpic): New pattern.
(load_tp_soft): Disable in FDPIC mode.

Change-Id: I1f6dfaee6260ecb453270f4971b3c5124317a186

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5fc7a20..26f29c7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8732,7 +8732,25 @@ arm_load_tp (rtx target)
  
rtx tmp;
  
-  emit_insn (gen_load_tp_soft ());

+  if (TARGET_FDPIC)
+   {
+ rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+ rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+ rtx initial_fdpic_reg = get_hard_reg_initial_val (Pmode, 
FDPIC_REGNUM);
+
+ emit_insn (gen_load_tp_soft_fdpic ());
+
+ /* Restore r9.  */
+ XVECEXP (par, 0, 0) = gen_rtx_UNSPEC (VOIDmode,
+   gen_rtvec (2, fdpic_reg,
+  initial_fdpic_reg),
+   UNSPEC_PIC_RESTORE);
+ XVECEXP (par, 0, 1) = gen_rtx_USE (VOIDmode, initial_fdpic_reg);
+ XVECEXP (par, 0, 2) = gen_rtx_CLOBBER (VOIDmode, fdpic_reg);
+ emit_insn (par);
+   }
+  else
+   emit_insn (gen_load_tp_soft ());
  
tmp = gen_rtx_REG (SImode, R0_REGNUM);

emit_move_insn (target, tmp);
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 9036255..0edcb1d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11759,12 +11759,25 @@
  )
  
  ;; Doesn't clobber R1-R3.  Must use r0 for the first operand.

+(define_insn "load_tp_soft_fdpic"
+  [(set (reg:SI 0) (unspec:SI [(const_int 0)] UNSPEC_TLS))
+   (clobber (reg:SI 9))
+   (clobber (reg:SI LR_REGNUM))
+   (clobber (reg:SI IP_REGNUM))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SOFT_TP && TARGET_FDPIC"
+  "bl\\t__aeabi_read_tp\\t@ load_tp_soft"
+  [(set_attr "conds" "clob")
+   (set_attr "type" "branch")]
+)
+
+;; Doesn't clobber R1-R3.  Must use r0 for the first operand.
  (define_insn "load_tp_soft"
[(set (reg:SI 0) (unspec:SI [(const_int 0)] UNSPEC_TLS))
 (clobber (reg:SI LR_REGNUM))
 (clobber (reg:SI IP_REGNUM))
 (clobber (reg:CC CC_REGNUM))]
-  "TARGET_SOFT_TP"
+  "TARGET_SOFT_TP && !TARGET_FDPIC"
"bl\\t__aeabi_read_tp\\t@ load_tp_soft"
[(set_attr "conds" "clob")
 (set_attr "type" "branch")]



>From b27af6ffc5423679167b5862764d259598b3bf29 Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Thu, 8 Feb 2018 14:51:07 +0100
Subject: [ARM/FDPIC v6 12/24] [ARM] FDPIC: Restore r9 after we call
 __aeabi_read_tp
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We call __aeabi_read_tp() to get the thread pointer. Since this is a
function call, we have to restore the FDPIC register afterwards.

2019-XX-XX  Christophe Lyon  
	Mickaël Guêné 

	gcc/
	* config/arm/arm.c (arm_load_tp): Add FDPIC support.
	* config/arm/arm.md (load_tp_soft_fdpic): New pattern.
	(load_tp_soft): Disable in FDPIC mode.

Change-Id: I0811cc7c5df8f44dd8b8b1f4caf54c7d3609c414

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 43fe467..9501e8d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8729,7 +8729,18 @@ arm_load_tp (rtx target)
 
   rtx tmp;
 
-  emit_insn (gen_load_tp_soft ());
+  if (TARGET_FDPIC)
+	{
+	  rtx fdpic_reg = gen_rtx_REG (Pmode, FDPIC_REGNUM);
+	  rtx initial_fdpic_reg = get_hard_reg_initial_val (Pmode, FDPIC_REGNUM);
+
+	  emit_insn (gen_load_tp_soft_fdpic ());
+
+	  /* Restore r9.  */
+	  emit_insn (gen_restore_pic_register_after_call(fdpic_reg, initial_fdpic_reg));
+	}
+  else
+	emit_insn (gen_load_tp_soft ());
 
   tmp = gen_rtx_REG (SImode, R0_REGNUM);
   emit_move_insn (target, tmp);
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 328d32d..ea015ed 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11700,12 +11700,25 @@
 )
 
 ;; Doesn't clobber R1-R3.  Must use r0 for the first operand.
+(define_insn "load_tp_soft_fdpic"
+  [(set (reg:SI 0) (unspec:SI [(const_int 0)] UNSPEC_TLS))
+   (clobber (reg:SI 9))
+   (clobber (reg:SI LR_REGNUM))
+   (clobber (reg:SI IP_REGNUM))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SOFT_TP && TARGET_FDPIC"
+  "bl\\t__aeabi_read_tp\\t@ load_tp_soft"
+  [(set_attr "conds" "clob")
+   (set_attr "type" "branch")]
+)
+
+;; Doesn't clobber R1-R3.  Must use r0 for the first operand.
 (define_insn "load_tp_soft"
   [(set (reg:SI 0) (unspec:SI [(const_int 0)] UNSPEC_TLS))
(clobber (reg:SI LR_REGNUM))
(clo

Re: [ARM/FDPIC v5 09/21] [ARM] FDPIC: Add support for taking address of nested function

2019-08-29 Thread Christophe Lyon

On 31/07/2019 16:44, Christophe Lyon wrote:

On Tue, 16 Jul 2019 at 14:42, Kyrill Tkachov
 wrote:



On 7/16/19 12:18 PM, Kyrill Tkachov wrote:

Hi Christophe

On 5/15/19 1:39 PM, Christophe Lyon wrote:

In FDPIC mode, the trampoline generated to support pointers to nested
functions looks like:

.word trampoline address
.word trampoline GOT address
ldr   r12, [pc, #8]
ldr   r9, [pc, #8]
ldr   pc, [pc, #8]
.word static chain value
.word GOT address
.word function's address

because in FDPIC function pointers are actually pointers to function
descriptors, we have to actually generate a function descriptor for
the trampoline.

2019-XX-XX  Christophe Lyon 
 Mickaël Guêné 

 gcc/
 * config/arm/arm.c (arm_asm_trampoline_template): Add FDPIC
 support.
 (arm_trampoline_init): Likewise.
 (arm_trampoline_init): Likewise.
 * config/arm/arm.h (TRAMPOLINE_SIZE): Likewise.

Change-Id: Idc4d5f629ae4f8d79bdf9623517481d524a0c144

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 40e3f3b..99d13bf 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3976,13 +3976,50 @@ arm_warn_func_return (tree decl)
 .word static chain value
 .word function's address
 XXX FIXME: When the trampoline returns, r8 will be clobbered.  */
+/* In FDPIC mode, the trampoline looks like:
+  .word trampoline address
+  .word trampoline GOT address
+  ldr   r12, [pc, #8] ; #4 for Thumb2
+  ldr   r9,  [pc, #8] ; #4 for Thumb2
+  ldr   pc,  [pc, #8] ; #4 for Thumb2
+  .word static chain value
+  .word GOT address
+  .word function's address
+*/



I think this comment is not right for Thumb2.

These load instructions have 32-bit encodings, even in Thumb2 (they use
high registers).


Andre and Wilco pointed out to me offline that the offset should be #4
for Arm mode.

The Arm ARM at E1.2.3 says:

PC, the program counter

* When executing an A32 instruction, PC reads as the address of the
current instruction plus 8.

* When executing a T32 instruction, PC reads as the address of the
current instruction plus 4.



Yes, it looks like the code is right, and the comment is wrong:
- offset 8 for thumb2 mode
- offset 4 for arm mode
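
The two offsets can be checked with a little arithmetic.  The sketch below assumes the trampoline layout from the patch (two descriptor words at byte offsets 0 and 4, three 32-bit loads at 8, 12 and 16, three data words at 20, 24 and 28) and the PC read-ahead quoted from the Arm ARM above; the function names are made up for illustration.

```c
#include <assert.h>

/* Bytes by which the PC reads ahead of the current instruction:
   8 in A32 state, 4 in T32 state.  */
static unsigned
pc_read_ahead (int thumb)
{
  return thumb ? 4 : 8;
}

/* Immediate needed by "ldr rX, [pc, #imm]" placed at byte offset
   INSN_OFF to reach the word at byte offset DATA_OFF.  In T32 state
   the base used by a PC-relative load is the PC rounded down to a
   multiple of 4.  */
static int
ldr_pc_imm (unsigned insn_off, unsigned data_off, int thumb)
{
  unsigned pc = insn_off + pc_read_ahead (thumb);

  if (thumb)
    pc &= ~3u;
  return (int) data_off - (int) pc;
}
```

For each of the three loads, ldr_pc_imm (8 + 4 * i, 20 + 4 * i, 0) gives 4 and ldr_pc_imm (8 + 4 * i, 20 + 4 * i, 1) gives 8, which agrees with the "TARGET_THUMB2 ? 8 : 4" expression in the code.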


Here is the updated version


Thanks,

Christophe


Thanks,

Kyrill




Also, please merge this comment with the one above (no separate /**/)



  static void
  arm_asm_trampoline_template (FILE *f)
  {
fprintf (f, "\t.syntax unified\n");

-  if (TARGET_ARM)
+  if (TARGET_FDPIC)
+{
+  /* The first two words are a function descriptor pointing to the
+trampoline code just below.  */
+  if (TARGET_ARM)
+   fprintf (f, "\t.arm\n");
+  else if (TARGET_THUMB2)
+   fprintf (f, "\t.thumb\n");
+  else
+   /* Only ARM and Thumb-2 are supported.  */
+   gcc_unreachable ();
+
+  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+  /* Trampoline code which sets the static chain register but also
+PIC register before jumping into real code. */
+  asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n",
+  STATIC_CHAIN_REGNUM, PC_REGNUM,
+  TARGET_THUMB2 ? 8 : 4);
+  asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n",
+  PIC_OFFSET_TABLE_REGNUM, PC_REGNUM,
+  TARGET_THUMB2 ? 8 : 4);
+  asm_fprintf (f, "\tldr\t%r, [%r, #%d]\n",
+  PC_REGNUM, PC_REGNUM,
+  TARGET_THUMB2 ? 8 : 4);



As above, I think the offset should be 8 for both Arm and Thumb2.

Thanks,

Kyrill



+  assemble_aligned_integer (UNITS_PER_WORD, const0_rtx);
+}
+  else if (TARGET_ARM)
  {
fprintf (f, "\t.arm\n");
asm_fprintf (f, "\tldr\t%r, [%r, #0]\n", STATIC_CHAIN_REGNUM,
PC_REGNUM);
@@ -4023,12 +4060,40 @@ arm_trampoline_init (rtx m_tramp, tree fndecl,
rtx chain_value)
emit_block_move (m_tramp, assemble_trampoline_template (),
 GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);

-  mem = adjust_address (m_tramp, SImode, TARGET_32BIT ? 8 : 12);
-  emit_move_insn (mem, chain_value);
+  if (TARGET_FDPIC)
+{
+  rtx funcdesc = XEXP (DECL_RTL (fndecl), 0);
+  rtx fnaddr = gen_rtx_MEM (Pmode, funcdesc);
+  rtx gotaddr = gen_rtx_MEM (Pmode, plus_constant (Pmode,
funcdesc, 4));
+  /* The function start address is at offset 8, but in Thumb mode
+we want bit 0 set to 1 to indicate Thumb-ness, hence 9
+below.  */
+  rtx trampoline_code_start
+   = plus_constant (Pmode, XEXP (m_tramp, 0), TARGET_THUMB2 ? 9

: 8);

+
+  /* Write initial funcdesc which points to the trampoline.  */
+  mem = adjust_address (m_tramp, SImode, 0);
+  emit_move_insn (mem, trampoline_code_sta

Re: [ARM/FDPIC v5 05/21] [ARM] FDPIC: Fix __do_global_dtors_aux and frame_dummy generation

2019-08-29 Thread Christophe Lyon

On 12/07/2019 08:06, Richard Sandiford wrote:

Christophe Lyon  writes:

In FDPIC, we need to make sure __do_global_dtors_aux and frame_dummy
are referenced by their address, not by pointers to the function
descriptors.

2019-XX-XX  Christophe Lyon  
Mickaël Guêné 

* libgcc/crtstuff.c: Add support for FDPIC.

Change-Id: I0bc4b1232fbf3c69068fb23a1b9cafc895d141b1

diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
index 4927a9f..159b461 100644
--- a/libgcc/crtstuff.c
+++ b/libgcc/crtstuff.c
@@ -429,9 +429,18 @@ __do_global_dtors_aux (void)
  #ifdef FINI_SECTION_ASM_OP
  CRT_CALL_STATIC_FUNCTION (FINI_SECTION_ASM_OP, __do_global_dtors_aux)
  #elif defined (FINI_ARRAY_SECTION_ASM_OP)
+#if defined(__FDPIC__)
+__asm__(
+"   .section .fini_array\n"
+"   .align 2\n"
+"   .word __do_global_dtors_aux\n"
+);
+asm (TEXT_SECTION_ASM_OP);
+#else /* defined(__FDPIC__) */
  static func_ptr __do_global_dtors_aux_fini_array_entry[]
__attribute__ ((__used__, section(".fini_array"), aligned(sizeof(func_ptr))))
= { __do_global_dtors_aux };
+#endif /* defined(__FDPIC__) */
  #else /* !FINI_SECTION_ASM_OP && !FINI_ARRAY_SECTION_ASM_OP */
  static void __attribute__((used))
  __do_global_dtors_aux_1 (void)


It'd be good to avoid hard-coding the pointer size.  Would it work to do:

__asm__("\t.equ\t__do_global_dtors_aux_alias, __do_global_dtors_aux\n");
extern char __do_global_dtors_aux_alias;
static void *__do_global_dtors_aux_fini_array_entry[]
__attribute__ ((__used__, section(".fini_array"), aligned(sizeof(void *))))
= { &__do_global_dtors_aux_alias };

?  Similarly for the init_array.


OK, done.


AFAICT this and 02/21 are the only patches that aren't Arm-specific,
is that right?

Thanks,
Richard
.



>From ea0eee1ddeddef92277ae68eac4af28994c2902c Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Thu, 8 Feb 2018 11:12:52 +0100
Subject: [ARM/FDPIC v6 05/24] [ARM] FDPIC: Fix __do_global_dtors_aux and
 frame_dummy generation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In FDPIC, we need to make sure __do_global_dtors_aux and frame_dummy
are referenced by their address, not by pointers to the function
descriptors.

2019-XX-XX  Christophe Lyon  
	Mickaël Guêné 

	libgcc/
	* libgcc/crtstuff.c: Add support for FDPIC.

Change-Id: I0bc4b1232fbf3c69068fb23a1b9cafc895d141b1

diff --git a/libgcc/crtstuff.c b/libgcc/crtstuff.c
index 4927a9f..6659039 100644
--- a/libgcc/crtstuff.c
+++ b/libgcc/crtstuff.c
@@ -429,9 +429,17 @@ __do_global_dtors_aux (void)
 #ifdef FINI_SECTION_ASM_OP
 CRT_CALL_STATIC_FUNCTION (FINI_SECTION_ASM_OP, __do_global_dtors_aux)
 #elif defined (FINI_ARRAY_SECTION_ASM_OP)
+#if defined(__FDPIC__)
+__asm__("\t.equ\t__do_global_dtors_aux_alias, __do_global_dtors_aux\n");
+extern char __do_global_dtors_aux_alias;
+static void *__do_global_dtors_aux_fini_array_entry[]
+__attribute__ ((__used__, section(".fini_array"), aligned(sizeof(void *))))
+ = { &__do_global_dtors_aux_alias };
+#else /* defined(__FDPIC__) */
 static func_ptr __do_global_dtors_aux_fini_array_entry[]
   __attribute__ ((__used__, section(".fini_array"), aligned(sizeof(func_ptr))))
   = { __do_global_dtors_aux };
+#endif /* defined(__FDPIC__) */
 #else /* !FINI_SECTION_ASM_OP && !FINI_ARRAY_SECTION_ASM_OP */
 static void __attribute__((used))
 __do_global_dtors_aux_1 (void)
@@ -473,9 +481,17 @@ frame_dummy (void)
 #ifdef __LIBGCC_INIT_SECTION_ASM_OP__
 CRT_CALL_STATIC_FUNCTION (__LIBGCC_INIT_SECTION_ASM_OP__, frame_dummy)
 #else /* defined(__LIBGCC_INIT_SECTION_ASM_OP__) */
+#if defined(__FDPIC__)
+__asm__("\t.equ\t__frame_dummy_alias, frame_dummy\n");
+extern char __frame_dummy_alias;
+static void *__frame_dummy_init_array_entry[]
+__attribute__ ((__used__, section(".init_array"), aligned(sizeof(void *))))
+ = { &__frame_dummy_alias };
+#else /* defined(__FDPIC__) */
 static func_ptr __frame_dummy_init_array_entry[]
   __attribute__ ((__used__, section(".init_array"), aligned(sizeof(func_ptr))))
   = { frame_dummy };
+#endif /* defined(__FDPIC__) */
 #endif /* !defined(__LIBGCC_INIT_SECTION_ASM_OP__) */
 #endif /* USE_EH_FRAME_REGISTRY || USE_TM_CLONE_REGISTRY */
 
-- 
2.6.3



Re: [ARM/FDPIC v5 04/21] [ARM] FDPIC: Add support for FDPIC for arm architecture

2019-08-29 Thread Christophe Lyon

On 16/07/2019 13:58, Richard Sandiford wrote:

Christophe Lyon  writes:

The FDPIC register is hard-coded to r9, as defined in the ABI.

We have to disable tailcall optimizations if we don't know if the
target function is in the same module. If not, we have to set r9 to
the value associated with the target module.
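
The tailcall restriction boils down to a two-condition predicate.  The sketch below restates it with plain booleans; the function name and parameters are hypothetical stand-ins for "we have a decl for the callee" and "the callee binds locally", and it only mirrors the FDPIC-specific part of the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical restatement of the FDPIC sibcall rule: a tail call is
   allowed only when the callee is known and is in the same module, so
   that the FDPIC register already holds the right value.  */
static bool
fdpic_sibcall_ok (bool have_decl, bool binds_local)
{
  /* No decl: the target could be in another module.  */
  if (!have_decl)
    return false;

  /* Going through the PLT would change the FDPIC register.  */
  if (!binds_local)
    return false;

  return true;
}
```

Only the (true, true) combination permits the optimization; an indirect call or a call that may resolve to another module keeps the normal call sequence.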

When generating a symbol address, we have to take into account whether
it is a pointer to data or to a function, because different
relocations are needed.

2019-XX-XX  Christophe Lyon  
Mickaël Guêné 

* config/arm/arm-c.c (__FDPIC__): Define new pre-processor macro
in FDPIC mode.
* config/arm/arm-protos.h (arm_load_function_descriptor): Declare
new function.
* config/arm/arm.c (arm_option_override): Define pic register to
FDPIC_REGNUM.
(arm_function_ok_for_sibcall): Disable sibcall optimization if we
have no decl or go through PLT.
(arm_load_pic_register): Handle TARGET_FDPIC.
(arm_is_segment_info_known): New function.
(arm_pic_static_addr): Add support for FDPIC.
(arm_load_function_descriptor): New function.
(arm_assemble_integer): Add support for FDPIC.
* config/arm/arm.h (PIC_OFFSET_TABLE_REG_CALL_CLOBBERED):
Define. (FDPIC_REGNUM): New define.
* config/arm/arm.md (call): Add support for FDPIC.
(call_value): Likewise.
(*restore_pic_register_after_call): New pattern.
(untyped_call): Disable if FDPIC.
(untyped_return): Likewise.
* config/arm/unspecs.md (UNSPEC_PIC_RESTORE): New.

Change-Id: I8fb1a6b85ace672184013568c5d28fbda2f7fda4

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 6e256ee..34695fa 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -203,6 +203,8 @@ arm_cpu_builtins (struct cpp_reader* pfile)
builtin_define ("__ARM_EABI__");
  }
  
+  def_or_undef_macro (pfile, "__FDPIC__", TARGET_FDPIC);

+
def_or_undef_macro (pfile, "__ARM_ARCH_EXT_IDIV__", TARGET_IDIV);
def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);
  
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h

index 485bc68..272968a 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -139,6 +139,7 @@ extern int arm_max_const_double_inline_cost (void);
  extern int arm_const_double_inline_cost (rtx);
  extern bool arm_const_double_by_parts (rtx);
  extern bool arm_const_double_by_immediates (rtx);
+extern rtx arm_load_function_descriptor (rtx funcdesc);
  extern void arm_emit_call_insn (rtx, rtx, bool);
  bool detect_cmse_nonsecure_call (tree);
  extern const char *output_call (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 45abcd8..d9397b5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3485,6 +3485,15 @@ arm_option_override (void)
if (flag_pic && TARGET_VXWORKS_RTP)
  arm_pic_register = 9;
  
+  /* If in FDPIC mode then force arm_pic_register to be r9.  */

+  if (TARGET_FDPIC)
+{
+  arm_pic_register = FDPIC_REGNUM;
+  if (! TARGET_ARM && ! TARGET_THUMB2)
+   sorry ("FDPIC mode is supported on architecture versions that "
+  "support ARM or Thumb-2 only.");
+}
+
if (arm_pic_register_string != NULL)
  {
int pic_register = decode_reg_name (arm_pic_register_string);


Isn't this equivalent to rejecting Thumb-1?  I think that would be
clearer in both the condition and the error message.


Right, fixed.


How does this interact with arm_pic_data_is_text_relative?  Are both
values supported?

It doesn't interact well... it only works with the default value.
Otherwise, there are compiler crashes.




@@ -7295,6 +7304,21 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
if (cfun->machine->sibcall_blocked)
  return false;
  
+  if (TARGET_FDPIC)

+{
+  /* In FDPIC, never tailcall something for which we have no decl:
+the target function could be in a different module, requiring
+a different FDPIC register value.  */
+  if (decl == NULL)
+   return false;
+
+  /* Don't tailcall if we go through the PLT since the FDPIC
+register is then corrupted and we don't restore it after
+static function calls.  */
+  if (!targetm.binds_local_p (decl))
+   return false;
+}
+
/* Never tailcall something if we are generating code for Thumb-1.  */
if (TARGET_THUMB1)
  return false;
@@ -7711,7 +7735,9 @@ arm_load_pic_register (unsigned long saved_regs 
ATTRIBUTE_UNUSED, rtx pic_reg)
  {
rtx l1, labelno, pic_tmp, pic_rtx;
  
-  if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)

+  if (crtl->uses_pic_offset_table == 0
+  || TARGET_SINGLE_PIC_BASE
+  || TARGET_FDPIC)
  return;
  
gcc_assert (flag_pic);

@@ -7780,28 +7806,142 @@ arm_load_pic_register (unsigned long saved_regs 
ATTRIBUTE_UNUSED, rtx pic_reg)
emit_use (pic_reg);
  }
  
+/* Try to determine w

Re: [ARM/FDPIC v5 13/21] [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture

2019-08-29 Thread Kyrill Tkachov

Hi Christophe,

On 5/15/19 1:39 PM, Christophe Lyon wrote:

Without this, when we are unwinding across a signal frame we can jump
to an even address which leads to an exception.

This is needed in __gnu_personality_sigframe_fdpic() when restoring the
PC from the signal frame since the PC saved by the kernel has the LSB
bit set to zero.

2019-XX-XX  Christophe Lyon  
    Mickaël Guêné 

    libgcc/
    * config/arm/unwind-arm.c (_Unwind_VRS_Set): Handle v7m
    architecture.

Change-Id: Ie84de548226bcf1751e19a09e8f091fb3013ccea

diff --git a/libgcc/config/arm/unwind-arm.c 
b/libgcc/config/arm/unwind-arm.c

index 9ba73e7..ba47150 100644
--- a/libgcc/config/arm/unwind-arm.c
+++ b/libgcc/config/arm/unwind-arm.c
@@ -199,6 +199,11 @@ _Unwind_VRS_Result _Unwind_VRS_Set 
(_Unwind_Context *context,

 return _UVRSR_FAILED;

   vrs->core.r[regno] = *(_uw *) valuep;
+#if defined(__ARM_ARCH_7M__)
+  /* Force LSB bit since we always run thumb code.  */
+  if (regno == 15)
+   vrs->core.r[regno] |= 1;
+#endif


Hmm, this looks quite specific. There are other architectures that are 
thumb-only too (6-M, 7E-M etc).


Would checking for __thumb__ be better?
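
To show what is being discussed independently of the preprocessor condition, here is a small sketch of the operation itself.  The function name and the thumb_only flag are hypothetical; in the patch the decision is made at compile time with an #if, and which macro to test is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>

#define PC_REGNO 15

/* Value to store into core register REGNO.  When the code always runs
   in Thumb state, the PC must keep bit 0 set so that a later branch to
   it stays in Thumb state.  */
static uint32_t
vrs_core_value (int regno, uint32_t value, int thumb_only)
{
  if (thumb_only && regno == PC_REGNO)
    value |= 1;
  return value;
}
```

With thumb_only set, a PC value of 0x8000 becomes 0x8001 while every other register is stored unchanged; with it clear, nothing is modified.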

Thanks,

Kyrill



   return _UVRSR_OK;

 case _UVRSC_VFP:
--
2.6.3



[PATCH V6 05/11] bpf: new GCC port

2019-08-29 Thread Jose E. Marchesi


This patch adds a port for the Linux kernel eBPF architecture to GCC.

ChangeLog:

  * configure.ac: Support for bpf-*-* targets.
  * configure: Regenerate.

contrib/ChangeLog:

  * config-list.mk (LIST): Disable go in bpf-*-* targets.

gcc/ChangeLog:

  * config.gcc: Support for bpf-*-* targets.
  * common/config/bpf/bpf-common.c: New file.
  * config/bpf/t-bpf: Likewise.
  * config/bpf/predicates.md: Likewise.
  * config/bpf/constraints.md: Likewise.
  * config/bpf/bpf.opt: Likewise.
  * config/bpf/bpf.md: Likewise.
  * config/bpf/bpf.h: Likewise.
  * config/bpf/bpf.c: Likewise.
  * config/bpf/bpf-protos.h: Likewise.
  * config/bpf/bpf-opts.h: Likewise.
  * config/bpf/bpf-helpers.h: Likewise.
  * config/bpf/bpf-helpers.def: Likewise.
---
 ChangeLog  |   5 +
 configure  |  54 ++-
 configure.ac   |  54 ++-
 contrib/ChangeLog  |   4 +
 contrib/config-list.mk |   3 +-
 gcc/ChangeLog  |  16 +
 gcc/common/config/bpf/bpf-common.c |  55 +++
 gcc/config.gcc |   9 +
 gcc/config/bpf/bpf-helpers.def | 194 
 gcc/config/bpf/bpf-helpers.h   | 327 +
 gcc/config/bpf/bpf-opts.h  |  56 +++
 gcc/config/bpf/bpf-protos.h|  33 ++
 gcc/config/bpf/bpf.c   | 948 +
 gcc/config/bpf/bpf.h   | 539 +
 gcc/config/bpf/bpf.md  | 497 +++
 gcc/config/bpf/bpf.opt | 123 +
 gcc/config/bpf/constraints.md  |  32 ++
 gcc/config/bpf/predicates.md   |  64 +++
 gcc/config/bpf/t-bpf   |   0
 19 files changed, 3010 insertions(+), 3 deletions(-)
 create mode 100644 gcc/common/config/bpf/bpf-common.c
 create mode 100644 gcc/config/bpf/bpf-helpers.def
 create mode 100644 gcc/config/bpf/bpf-helpers.h
 create mode 100644 gcc/config/bpf/bpf-opts.h
 create mode 100644 gcc/config/bpf/bpf-protos.h
 create mode 100644 gcc/config/bpf/bpf.c
 create mode 100644 gcc/config/bpf/bpf.h
 create mode 100644 gcc/config/bpf/bpf.md
 create mode 100644 gcc/config/bpf/bpf.opt
 create mode 100644 gcc/config/bpf/constraints.md
 create mode 100644 gcc/config/bpf/predicates.md
 create mode 100644 gcc/config/bpf/t-bpf

diff --git a/configure.ac b/configure.ac
index 1fe97c001cc..b8ce2ad20b9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -638,6 +638,9 @@ case "${target}" in
 # No hosted I/O support.
 noconfigdirs="$noconfigdirs target-libssp"
 ;;
+  bpf-*-*)
+noconfigdirs="$noconfigdirs target-libssp"
+;;
   powerpc-*-aix* | rs6000-*-aix*)
 noconfigdirs="$noconfigdirs target-libssp"
 ;;
@@ -672,12 +675,43 @@ if test "${ENABLE_LIBSTDCXX}" = "default" ; then
 avr-*-*)
   noconfigdirs="$noconfigdirs target-libstdc++-v3"
   ;;
+bpf-*-*)
+  noconfigdirs="$noconfigdirs target-libstdc++-v3"
+  ;;
 ft32-*-*)
   noconfigdirs="$noconfigdirs target-libstdc++-v3"
   ;;
   esac
 fi
 
+# Disable C++ on systems where it is known to not work.
+# For testing, you can override this with --enable-languages=c++.
+case ,${enable_languages}, in
+  *,c++,*)
+;;
+  *)
+  case "${target}" in
+bpf-*-*)
+  unsupported_languages="$unsupported_languages c++"
+  ;;
+  esac
+  ;;
+esac
+
+# Disable Objc on systems where it is known to not work.
+# For testing, you can override this with --enable-languages=objc.
+case ,${enable_languages}, in
+  *,objc,*)
+;;
+  *)
+  case "${target}" in
+bpf-*-*)
+  unsupported_languages="$unsupported_languages objc"
+  ;;
+  esac
+  ;;
+esac
+
 # Disable D on systems where it is known to not work.
 # For testing, you can override this with --enable-languages=d.
 case ,${enable_languages}, in
@@ -687,6 +721,9 @@ case ,${enable_languages}, in
 case "${target}" in
   *-*-darwin*)
unsupported_languages="$unsupported_languages d"
+;;
+  bpf-*-*)
+   unsupported_languages="$unsupported_languages d"
;;
 esac
 ;;
@@ -715,6 +752,9 @@ case "${target}" in
 # See .
 unsupported_languages="$unsupported_languages fortran"
 ;;
+  bpf-*-*)
+unsupported_languages="$unsupported_languages fortran"
+;;
 esac
 
 # Disable libffi for some systems.
@@ -761,6 +801,9 @@ case "${target}" in
   arm*-*-symbianelf*)
 noconfigdirs="$noconfigdirs target-libffi"
 ;;
+  bpf-*-*)
+noconfigdirs="$noconfigdirs target-libffi"
+;;
   cris-*-* | crisv32-*-*)
 case "${target}" in
   *-*-linux*)
@@ -807,7 +850,7 @@ esac
 # Disable the go frontend on systems where it is known to not work. Please keep
 # this in sync with contrib/config-list.mk.
 case "${target}" in
-*-*-darwin* | *-*-cygwin* | *-*-mingw*)
+*-*-darwin* | *-*-cygwin* | *-*-mingw* | bpf-* )
 unsupported_languages="$unsupported_languages go"
 

[PATCH][GCC] Complex division improvements in GCC

2019-08-29 Thread Elen Kalda
Hi all,

Advice and help needed! 

This patch makes changes to the complex division in GCC. The algorithm
used is the same as in https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01629.html -
same problems, same improvement in robustness, same loss in accuracy.

Since Baudin adds another underflow check, two more basic blocks get added 
during the cplxlower1 pass. 
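For readers who do not have the algorithm at hand, here is a rough C sketch of the division as it can be read off the GIMPLE dump in this mail. The names struct cplx, abs_d, div_internal and baudin_div are made up for illustration and are not the ones used in the patch; the sketch also ignores NaN and infinity handling.

```c
/* Quotient of (a + b*i) / (c + d*i).  */
struct cplx { double re, im; };

/* Local absolute value, so the sketch needs no math library.  */
static double
abs_d (double x)
{
  return x < 0.0 ? -x : x;
}

/* Handles the case |c| >= |d|.  The ratio r = d / c may underflow to
   zero; the else branch is the extra underflow check, which reorders
   the operations so that the small quotient is not lost.  */
static struct cplx
div_internal (double a, double b, double c, double d)
{
  double r = d / c;
  double t = 1.0 / (c + d * r);
  struct cplx q;

  if (r != 0.0)
    {
      q.re = (a + b * r) * t;
      q.im = (b - a * r) * t;
    }
  else
    {
      q.re = (a + d * (b / c)) * t;
      q.im = (b - d * (a / c)) * t;
    }
  return q;
}

struct cplx
baudin_div (double a, double b, double c, double d)
{
  struct cplx q;

  if (abs_d (c) >= abs_d (d))
    return div_internal (a, b, c, d);

  /* Otherwise swap the roles of the real and imaginary parts and
     negate the imaginary part of the result.  */
  q = div_internal (b, a, d, c);
  q.im = -q.im;
  return q;
}
```

The single reciprocal t replaces one of the divisions in Smith's method, which is where the speedup comes from, and the r != 0.0 test accounts for the two additional basic blocks.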

No problems with bootstrap on aarch64-none-linux-gnu. Unsurprisingly, there
are regressions in gcc/testsuite/gcc.dg/torture/builtin-math-7.c. As in the 
patch linked above, the regressions in that test are due to the loss in 
accuracy.

To evaluate the performance, the same test which generates 360 000 000 random 
numbers was used. Doing one less division results in a nice 11.32% 
improvement in performance:

        | CPU time
--------+-----------
smiths  | 7 290 996
b1div   | 6 465 590

This implementation works (in the sense that it produces the expected result),
but it could be made cleaner and more efficient. As an example, the cplxlower1
pass assigns one variable to another variable, which seems redundant:

[...]

  [local count: 1063004407]:
  # i_19 = PHI <0(2), i_15(7)>
  _9 = REALPART_EXPR ;
  _7 = IMAGPART_EXPR ;
  _1 = COMPLEX_EXPR <_9, _7>;
  _18 = REALPART_EXPR ;
  _17 = IMAGPART_EXPR ;
  _2 = COMPLEX_EXPR <_18, _17>;
  _16 = ABS_EXPR <_18>;
  _21 = ABS_EXPR <_17>;
  _22 = _16 > _21;
  if (_22 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 531502204]:
  _23 = _17 / _18;
  _24 = _17 * _23;
  _25 = _18 + _24;
  _26 = 1.0e+0 / _25;
  _27 = _23 == 0.0;
  if (_27 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 265751102]:
  _28 = _7 / _18;
  _29 = _17 * _28;
  _30 = _9 + _29;
  _31 = _26 * _30;
  _32 = _9 / _18;
  _33 = _17 * _32;
  _34 = _7 - _33;
  _35 = _26 * _34;
  _83 = _31; <--- could these extra assignments be avoided?
  _84 = _35; <---|
  goto ; [100.00%]

   [local count: 265751102]:
  _36 = _7 * _23;
  _37 = _9 + _36;
  _38 = _26 * _37;
  _39 = _9 * _23;
  _40 = _7 - _39;
  _41 = _26 * _40;
  _81 = _38;
  _82 = _41;

   [local count: 531502204]:
  # _71 = PHI <_83(12), _81(13)>
  # _72 = PHI <_84(12), _82(13)>
  _85 = _71;
  _86 = _72;
  goto ; [100.00%]

   [local count: 531502204]:
  _42 = _18 / _17;
  _43 = _18 * _42;
  _44 = _17 + _43;
  _45 = 1.0e+0 / _44;
  _46 = _42 == 0.0;
  if (_46 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 265751102]:
  _47 = _9 / _17;
  _48 = _18 * _47;
  _49 = _7 + _48;
  _50 = _45 * _49;
  _51 = _7 / _17;
  _52 = _18 * _51;
  _53 = _9 - _52;
  _54 = _45 * _53;
  _77 = _50;
  _78 = _54;
  goto ; [100.00%]

   [local count: 265751102]:
  _55 = _9 * _42;
  _56 = _7 + _55;
  _57 = _45 * _56;
  _58 = _7 * _42;
  _59 = _9 - _58;
  _60 = _45 * _59;
  _75 = _57;
  _76 = _60;

   [local count: 531502204]:
  # _73 = PHI <_77(15), _75(16)>
  # _74 = PHI <_78(15), _76(16)>
  _61 = -_74;
  _79 = _73;
  _80 = _61;

[...]

Best wishes,
Elen

gcc/ChangeLog:

2019-08-29  Elen Kalda  

* fold-const.c (fold_negate_const): Make the fold_negate_const function 
non-static
(const_binop): Implement Baudin's algorithm for complex division
* fold-const.h (fold_negate_const): Add a fold_negate_const function 
declaration
* tree-complex.c (complex_div_internal_wide): New function to aid with the 
wide complex division
(expand_complex_div_wide): Implement Baudin's algorithm for complex 
division
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 54c850a3ee1f5db7c20fc8ab07ea504d634b55b8..71c1631881b693f973fa9ef94154abb02064e1c1 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -194,6 +194,7 @@ extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (combined_fn);
 extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL);
 extern wide_int tree_nonzero_bits (const_tree);
+extern tree fold_negate_const (tree arg0, tree type);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
POINTER_PLUS_EXPR.  Use location LOC for this conversion.  */
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 0bd68b5e2d484d6f3be52b1d38be5a9f41637355..e4ea9046fbf010861b726dd742fd3834b35e80ec 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -133,7 +133,7 @@ static tree fold_binary_op_with_conditional_arg (location_t,
 		 enum tree_code, tree,
 		 tree, tree,
 		 tree, tree, int);
-static tree fold_negate_const (tree, tree);
+tree fold_negate_const (tree, tree);
 static tree fold_not_const (const_tree, tree);
 static tree fold_relational_const (enum tree_code, tree, tree, tree);
 static tree fold_convert_const (enum tree_code, tree, tree);
@@ -1387,7 +1387,9 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
   tree i1 = TREE_IMAGPART (arg1);
   tree r2 = TREE_REALPART (arg2);
   tree i2 = TREE_IMAGPART (arg2);
+
   tree real, imag;
+  imag = real = NULL_TREE;
 
   switch 

[PATCH V6 02/11] opt-functions.awk: fix comparison of limit, begin and end

2019-08-29 Thread Jose E. Marchesi
The function integer_range_info makes sure that, if provided, the
initial value falls within the specified range.  However, it is necessary
to convert the values to a numerical context before comparing, to make
sure awk uses arithmetical order and not lexicographical order.
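To illustrate the difference between the two orderings, here is a small C analogy (not awk, and not part of the patch). The helper names in_range_as_strings and in_range_as_numbers are invented for this example; awk's own rules for deciding when an operand counts as a number are more subtle than this.

```c
#include <stdlib.h>
#include <string.h>

/* Range check using lexicographical order, i.e. comparing the
   values as strings.  */
static int
in_range_as_strings (const char *init, const char *start, const char *end)
{
  return strcmp (init, start) >= 0 && strcmp (init, end) <= 0;
}

/* Range check using arithmetical order: convert to numbers first,
   which is what adding 0 achieves in awk.  */
static int
in_range_as_numbers (const char *init, const char *start, const char *end)
{
  long ival = strtol (init, NULL, 10);

  return ival >= strtol (start, NULL, 10) && ival <= strtol (end, NULL, 10);
}
```

With an initial value of "9" and a range of "1" to "10", the string version rejects the value because "9" sorts after "10", while the numeric version accepts it.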

gcc/ChangeLog:

* opt-functions.awk (integer_range_info): Make sure values are in
numeric context before operating with them.
---
 gcc/ChangeLog | 5 +
 gcc/opt-functions.awk | 7 ---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/opt-functions.awk b/gcc/opt-functions.awk
index 1190e6d6b66..c1da80c648c 100644
--- a/gcc/opt-functions.awk
+++ b/gcc/opt-functions.awk
@@ -346,9 +346,10 @@ function search_var_name(name, opt_numbers, opts, flags, 
n_opts)
 function integer_range_info(range_option, init, option)
 {
 if (range_option != "") {
-   start = nth_arg(0, range_option);
-   end = nth_arg(1, range_option);
-   if (init != "" && init != "-1" && (init < start || init > end))
+   ival = init + 0;
+   start = nth_arg(0, range_option) + 0;
+   end = nth_arg(1, range_option) + 0;
+   if (init != "" && init != "-1" && (ival < start || ival > end))
  print "#error initial value " init " of '" option "' must be in range 
[" start "," end "]"
return start ", " end
 }
-- 
2.11.0



[PATCH V6 00/11] eBPF support for GCC

2019-08-29 Thread Jose E. Marchesi
[Differences from V5:
. Use TARGET_BIG_ENDIAN instead of TARGET_LITTLE_ENDIAN, and make sure
  ASM_SPEC always passes an endianness selector argument to the
  assembler.
. De-obfuscate the usage of arg.type_size_in_bytes and
  arg.aggregate_type_p.
. Increase the cumulative argument unconditionally in
  bpf_function_arg_advance.
. Simplify conditional in bpf_print_operand_address.
. Remove predicates reg_or_memory_operand and mov_dst_operand, because
  they became the same as nonimmediate_operand.
. The kernel helper get_current_task() doesn't accept arguments.
  Adjust the helper definition accordingly, and the corresponding
  test.]

Hi people!

This patch series introduces a port of GCC to eBPF, which is a virtual
machine that resides in the Linux kernel.  Initially intended for
user-level packet capture and filtering, eBPF is nowadays generalized
to serve as a general-purpose infrastructure also for non-networking
purposes.

The binutils support is already upstream.  See
https://sourceware.org/ml/binutils/2019-05/msg00306.html.

eBPF architecture and ABI
=========================
   
Documentation for eBPF can be found in the Linux kernel source tree,
file Documentation/networking/filter.txt.  It covers the instruction
set, the way the interpreter works and the many restrictions imposed
by the kernel verifier.
   
As for the ABI, at this moment compiled eBPF doesn't have very well
established conventions.  The details on what is expected to be in an
ELF file containing eBPF are determined, in practice, by what the llvm
BPF backend generates and what is expected by the two existing
kernel loaders: bpf_load.c and libbpf.

We hope that the addition of this port to the GNU toolchain will help
to mature this domain.

Overview of the patch series
============================
   
The first few patches are preparatory:

. The first patch updates config.guess and config.sub from the
  'config' upstream project, in order to recognize bpf-*-* triplets.

. The second patch fixes an integrity check in opt-functions.awk.

. The third patch annotates many tests in the gcc.c-torture/compile
  testsuite with their requirements in terms of stack frame size,
  using the existing dg-require-stack-size machinery.

. The fourth patch introduces a new effective target flag called
  indirect_call, and annotates the tests in gcc.c-torture/compile
  accordingly.
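As an illustration of the stack-size annotation, a test that needs a large frame could look roughly like the sketch below. The function name sum_big_buffer and the 2048 value are hypothetical and not taken from any test touched by this series.

```c
/* { dg-do compile } */
/* { dg-require-stack-size "2048" } */

/* The local buffer needs about 2048 bytes of stack, so this test
   cannot be built for a target whose frames are limited to 512
   bytes.  The directive above lets the testsuite skip it there.  */
int
sum_big_buffer (void)
{
  volatile char buf[2048];
  int i, sum = 0;

  for (i = 0; i < 2048; i++)
    buf[i] = (char) (i & 1);
  for (i = 0; i < 2048; i++)
    sum += buf[i];
  return sum;
}
```

Declaring the requirement in the test itself means a target with a small stack does not have to be listed by name in every affected file.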

The rest of the patches are BPF specific:

The fifth patch adds the new GCC port proper.  Machine description,
implementation of target hooks and macros, command-line options and
the like.

The sixth patch adds a libgcc port for eBPF.  At the moment, it is
minimal and it basically addresses the limitations imposed by the
target, by excluding a few functions in libgcc2 (all of them related
to TImodes) whose default implementations exceed the eBPF stack limit.

The seventh, eighth and ninth patches deal with testing the new
port. The gcc.target testsuite is extended with eBPF-specific tests,
covering the backend-specific built-in functions and diagnostics.  The
check-effective-target functions are made aware of eBPF targets. Many
tests in the gcc.c-torture/compile testsuite are annotated to be
skipped in bpf-*-* targets, since they violate some restriction
imposed by the hardware (such as surpassing the stack limit.)  The
resulting testsuite doesn't have unexpected failures, and is currently
the principal way to check for regressions in the port.  Likewise,
many tests in the gcc.dg testsuite are annotated to be skipped in
bpf-*-* targets.

The tenth patch adds documentation updates to the GCC manual,
including information on the new command line options and compiler
built-ins.

Finally, the last patch adds myself as the maintainer of the BPF port.
I personally commit to evolve and maintain the port for as long as
necessary, and to find a suitable replacement in case I have to step
down for whatever reason.

Some notes on the port
======================

As a compilation target, eBPF is rather peculiar.  This is mainly due
to the quite hard restrictions imposed by the kernel verifier, and
also due to the security-driven design of the architecture itself.

To list a few examples:

. The stack is disjoint, and each stack frame corresponding to a
  function activation is isolated: it is not possible for a callee to
  access the stack frame of the caller, nor for a caller to access the
  stack frame of its callees.  The frame pointer register is
  read-only.

. Therefore it is not possible to pass arguments in the stack.

. Argument passing is restricted to 5 arguments.

. Each stack frame is limited to 512 bytes by default.

. The instruction set doesn't support indirect jumps.

. The instruction set doesn't support indirect calls.

. The architecture doesn't provide an explicit stack pointer.
  Instead, the eBPF "hardware" (in this case the kernel verifier)
  examines the compiled program and, by looking at the way the stack
  is accessed, estimates the size of the 

[PATCH V6 06/11] bpf: new libgcc port

2019-08-29 Thread Jose E. Marchesi
This patch adds an eBPF port to libgcc.

As of today, compiled eBPF programs do not support a single-entry
point schema.  Instead, a BPF "executable" is a relocatable ELF object
file containing multiple entry points, in certain named sections.

Also, the BPF loaders in the kernel do not execute .init/.fini
constructors/destructors.  Therefore, this patch provides empty crti.S
and crtn.S files.

libgcc/ChangeLog:

* config.host: Set cpu_type for bpf-*-* targets.
* config/bpf/t-bpf: New file.
* config/bpf/crtn.S: Likewise.
* config/bpf/crti.S: Likewise.
---
 libgcc/ChangeLog |  7 +++
 libgcc/config.host   |  7 +++
 libgcc/config/bpf/crti.S |  0
 libgcc/config/bpf/crtn.S |  0
 libgcc/config/bpf/t-bpf  | 23 +++
 5 files changed, 37 insertions(+)
 create mode 100644 libgcc/config/bpf/crti.S
 create mode 100644 libgcc/config/bpf/crtn.S
 create mode 100644 libgcc/config/bpf/t-bpf

diff --git a/libgcc/config.host b/libgcc/config.host
index 503ebb6be20..2e9fbc35482 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -107,6 +107,9 @@ avr-*-*)
 bfin*-*)
cpu_type=bfin
;;
+bpf-*-*)
+cpu_type=bpf
+;;
 cr16-*-*)
;;
 crisv32-*-*)
@@ -526,6 +529,10 @@ bfin*-*)
tmake_file="$tmake_file bfin/t-bfin t-fdpbit"
extra_parts="crtbegin.o crtend.o crti.o crtn.o"
 ;;
+bpf-*-*)
+tmake_file="$tmake_file ${cpu_type}/t-${cpu_type}"
+extra_parts="crti.o crtn.o"
+   ;;
 cr16-*-elf)
tmake_file="${tmake_file} cr16/t-cr16 cr16/t-crtlibid t-fdpbit"
extra_parts="$extra_parts crti.o crtn.o crtlibid.o"
diff --git a/libgcc/config/bpf/crti.S b/libgcc/config/bpf/crti.S
new file mode 100644
index 000..e69de29bb2d
diff --git a/libgcc/config/bpf/crtn.S b/libgcc/config/bpf/crtn.S
new file mode 100644
index 000..e69de29bb2d
diff --git a/libgcc/config/bpf/t-bpf b/libgcc/config/bpf/t-bpf
new file mode 100644
index 000..88129a78f61
--- /dev/null
+++ b/libgcc/config/bpf/t-bpf
@@ -0,0 +1,23 @@
+LIB2ADDEH = 
+
+crti.o: $(srcdir)/config/bpf/crti.S
+   $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $<
+
+crtn.o: $(srcdir)/config/bpf/crtn.S
+   $(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $<
+
+# Some of the functions defined in libgcc2 exceed the eBPF stack
+# limit, or other restrictions imposed by this peculiar target.
+# Therefore we have to exclude them here.
+#
+# Patterns in bpf.md must guarantee that no calls to the excluded
+# functions are ever generated, and compiler tests should make sure
+# this holds.
+#
+# Note that the modes in the function names below are misleading: di
+# means TImode.
+LIB2FUNCS_EXCLUDE = _mulvdi3 _divdi3 _moddi3 _divmoddi4 _udivdi3 _umoddi3 \
+_udivmoddi4
+
+# Prevent building "advanced" stuff (for example, gcov support).
+INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
-- 
2.11.0



[PATCH V6 09/11] bpf: adjust GCC testsuite to eBPF limitations

2019-08-29 Thread Jose E. Marchesi
This patch causes many tests in gcc.dg and gcc.c-torture to be skipped
on bpf-*-* targets.  This is due to the many limitations that eBPF
imposes on what would otherwise be perfectly valid C code: no support
for more than 5 arguments to function calls, no support for indirect
jumps, a very limited range for direct jumps, etc.
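A skipped test might look roughly like the following sketch. The functions add6 and call_add6 are hypothetical and the wording of the skip message is only an example; the actual annotations are in the diff.

```c
/* { dg-do compile } */
/* { dg-skip-if "too many arguments in function call" { bpf-*-* } } */

/* Valid C, but the call passes six arguments, one more than eBPF
   allows.  */
int
add6 (int a, int b, int c, int d, int e, int f)
{
  return a + b + c + d + e + f;
}

int
call_add6 (void)
{
  return add6 (1, 2, 3, 4, 5, 6);
}
```

On any other target the directive has no effect and the test runs as before.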

Hopefully some of these restrictions will be relaxed in the future.
Also, as semantics associated with object linking get developed in
eBPF, it may be possible at some point to provide a set of standard
run-time libraries for eBPF programs.

gcc/testsuite/ChangeLog:

* gcc.dg/builtins-config.h: eBPF doesn't support C99 standard
functions.
* gcc.c-torture/compile/20101217-1.c: Add a function prototype for
printf.
* gcc.c-torture/compile/2211-1.c: Skip if target bpf-*-*.
* gcc.c-torture/compile/poor.c: Likewise.
* gcc.c-torture/compile/pr25311.c: Likewise.
* gcc.c-torture/compile/pr39928-1.c: Likewise.
* gcc.c-torture/compile/pr70061.c: Likewise.
* gcc.c-torture/compile/920501-7.c: Likewise.
* gcc.c-torture/compile/2403-1.c: Likewise.
* gcc.c-torture/compile/20001226-1.c: Likewise.
* gcc.c-torture/compile/20030903-1.c: Likewise.
* gcc.c-torture/compile/20031125-1.c: Likewise.
* gcc.c-torture/compile/20040101-1.c: Likewise.
* gcc.c-torture/compile/20040317-2.c: Likewise.
* gcc.c-torture/compile/20040726-1.c: Likewise.
* gcc.c-torture/compile/20051216-1.c: Likewise.
* gcc.c-torture/compile/900313-1.c: Likewise.
* gcc.c-torture/compile/920625-1.c: Likewise.
* gcc.c-torture/compile/930421-1.c: Likewise.
* gcc.c-torture/compile/930623-1.c: Likewise.
* gcc.c-torture/compile/961004-1.c: Likewise.
* gcc.c-torture/compile/980504-1.c: Likewise.
* gcc.c-torture/compile/980816-1.c: Likewise.
* gcc.c-torture/compile/990625-1.c: Likewise.
* gcc.c-torture/compile/DFcmp.c: Likewise.
* gcc.c-torture/compile/HIcmp.c: Likewise.
* gcc.c-torture/compile/HIset.c: Likewise.
* gcc.c-torture/compile/QIcmp.c: Likewise.
* gcc.c-torture/compile/QIset.c: Likewise.
* gcc.c-torture/compile/SFset.c: Likewise.
* gcc.c-torture/compile/SIcmp.c: Likewise.
* gcc.c-torture/compile/SIset.c: Likewise.
* gcc.c-torture/compile/UHIcmp.c: Likewise.
* gcc.c-torture/compile/UQIcmp.c: Likewise.
* gcc.c-torture/compile/USIcmp.c: Likewise.
* gcc.c-torture/compile/consec.c: Likewise.
* gcc.c-torture/compile/limits-fndefn.c: Likewise.
* gcc.c-torture/compile/lll.c: Likewise.
* gcc.c-torture/compile/parms.c: Likewise.
* gcc.c-torture/compile/pass.c: Likewise.
* gcc.c-torture/compile/pp.c: Likewise.
* gcc.c-torture/compile/pr32399.c: Likewise.
* gcc.c-torture/compile/pr34091.c: Likewise.
* gcc.c-torture/compile/pr34688.c: Likewise.
* gcc.c-torture/compile/pr37258.c: Likewise.
* gcc.c-torture/compile/pr37327.c: Likewise.
* gcc.c-torture/compile/pr37381.c: Likewise.
* gcc.c-torture/compile/pr37669-2.c: Likewise.
* gcc.c-torture/compile/pr37669.c: Likewise.
* gcc.c-torture/compile/pr37742-3.c: Likewise.
* gcc.c-torture/compile/pr44063.c: Likewise.
* gcc.c-torture/compile/pr48596.c: Likewise.
* gcc.c-torture/compile/pr51856.c: Likewise.
* gcc.c-torture/compile/pr54428.c: Likewise.
* gcc.c-torture/compile/pr54713-1.c: Likewise.
* gcc.c-torture/compile/pr54713-2.c: Likewise.
* gcc.c-torture/compile/pr54713-3.c: Likewise.
* gcc.c-torture/compile/pr55921.c: Likewise.
* gcc.c-torture/compile/pr70240.c: Likewise.
* gcc.c-torture/compile/pr70355.c: Likewise.
* gcc.c-torture/compile/pr82052.c: Likewise.
* gcc.c-torture/compile/pr83487.c: Likewise.
* gcc.c-torture/compile/pr86122.c: Likewise.
* gcc.c-torture/compile/pret-arg.c: Likewise.
* gcc.c-torture/compile/regs-arg-size.c: Likewise.
* gcc.c-torture/compile/structret.c: Likewise.
* gcc.c-torture/compile/uuarg.c: Likewise.
* gcc.dg/20001009-1.c: Likewise.
* gcc.dg/20020418-1.c: Likewise.
* gcc.dg/20020426-2.c: Likewise.
* gcc.dg/20020430-1.c: Likewise.
* gcc.dg/20040306-1.c: Likewise.
* gcc.dg/20040622-2.c: Likewise.
* gcc.dg/20050603-2.c: Likewise.
* gcc.dg/20050629-1.c: Likewise.
* gcc.dg/20061026.c: Likewise.
* gcc.dg/Warray-bounds-3.c: Likewise.
* gcc.dg/Warray-bounds-30.c: Likewise.
* gcc.dg/Wframe-larger-than-2.c: Likewise.
* gcc.dg/Wframe-larger-than.c: Likewise.
* gcc.dg/Wrestrict-11.c: Likewise.
* gcc.c-torture/compile/2804-1.c: Likewise.
---
 gcc/testsuite/ChangeLog| 87 ++
 gcc/test

[PATCH V6 01/11] Update config.sub and config.guess.

2019-08-29 Thread Jose E. Marchesi
* config.sub: Import upstream version 2019-06-30.
* config.guess: Import upstream version 2019-07-24.
---
 ChangeLog|   5 ++
 config.guess | 264 +++
 config.sub   |  50 +--
 3 files changed, 240 insertions(+), 79 deletions(-)

diff --git a/config.guess b/config.guess
index 8e2a58b864f..97ad0733304 100755
--- a/config.guess
+++ b/config.guess
@@ -2,7 +2,7 @@
 # Attempt to guess a canonical system name.
 #   Copyright 1992-2019 Free Software Foundation, Inc.
 
-timestamp='2019-01-03'
+timestamp='2019-07-24'
 
 # This file is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
@@ -262,6 +262,9 @@ case 
"$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in
 *:SolidBSD:*:*)
echo "$UNAME_MACHINE"-unknown-solidbsd"$UNAME_RELEASE"
exit ;;
+*:OS108:*:*)
+   echo "$UNAME_MACHINE"-unknown-os108_"$UNAME_RELEASE"
+   exit ;;
 macppc:MirBSD:*:*)
echo powerpc-unknown-mirbsd"$UNAME_RELEASE"
exit ;;
@@ -275,8 +278,8 @@ case 
"$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in
echo "$UNAME_MACHINE"-unknown-redox
exit ;;
 mips:OSF1:*.*)
-echo mips-dec-osf1
-exit ;;
+   echo mips-dec-osf1
+   exit ;;
 alpha:OSF1:*:*)
case $UNAME_RELEASE in
*4.0)
@@ -385,20 +388,7 @@ case 
"$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in
echo sparc-hal-solaris2"`echo "$UNAME_RELEASE"|sed -e 's/[^.]*//'`"
exit ;;
 sun4*:SunOS:5.*:* | tadpole*:SunOS:5.*:*)
-   set_cc_for_build
-   SUN_ARCH=sparc
-   # If there is a compiler, see if it is configured for 64-bit objects.
-   # Note that the Sun cc does not turn __LP64__ into 1 like gcc does.
-   # This test works for both compilers.
-   if [ "$CC_FOR_BUILD" != no_compiler_found ]; then
-   if (echo '#ifdef __sparcv9'; echo IS_64BIT_ARCH; echo '#endif') | \
-   (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \
-   grep IS_64BIT_ARCH >/dev/null
-   then
-   SUN_ARCH=sparcv9
-   fi
-   fi
-   echo "$SUN_ARCH"-sun-solaris2"`echo "$UNAME_RELEASE"|sed -e 
's/[^.]*//'`"
+   echo sparc-sun-solaris2"`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'`"
exit ;;
 i86pc:AuroraUX:5.*:* | i86xen:AuroraUX:5.*:*)
echo i386-pc-auroraux"$UNAME_RELEASE"
@@ -998,22 +988,50 @@ EOF
exit ;;
 mips:Linux:*:* | mips64:Linux:*:*)
set_cc_for_build
+   IS_GLIBC=0
+   test x"${LIBC}" = xgnu && IS_GLIBC=1
sed 's/^//' << EOF > "$dummy.c"
#undef CPU
-   #undef ${UNAME_MACHINE}
-   #undef ${UNAME_MACHINE}el
+   #undef mips
+   #undef mipsel
+   #undef mips64
+   #undef mips64el
+   #if ${IS_GLIBC} && defined(_ABI64)
+   LIBCABI=gnuabi64
+   #else
+   #if ${IS_GLIBC} && defined(_ABIN32)
+   LIBCABI=gnuabin32
+   #else
+   LIBCABI=${LIBC}
+   #endif
+   #endif
+
+   #if ${IS_GLIBC} && defined(__mips64) && defined(__mips_isa_rev) && 
__mips_isa_rev>=6
+   CPU=mipsisa64r6
+   #else
+   #if ${IS_GLIBC} && !defined(__mips64) && defined(__mips_isa_rev) && 
__mips_isa_rev>=6
+   CPU=mipsisa32r6
+   #else
+   #if defined(__mips64)
+   CPU=mips64
+   #else
+   CPU=mips
+   #endif
+   #endif
+   #endif
+
#if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL) || 
defined(MIPSEL)
-   CPU=${UNAME_MACHINE}el
+   MIPS_ENDIAN=el
#else
#if defined(__MIPSEB__) || defined(__MIPSEB) || defined(_MIPSEB) || 
defined(MIPSEB)
-   CPU=${UNAME_MACHINE}
+   MIPS_ENDIAN=
#else
-   CPU=
+   MIPS_ENDIAN=
#endif
#endif
 EOF
-   eval "`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^CPU'`"
-   test "x$CPU" != x && { echo "$CPU-unknown-linux-$LIBC"; exit; }
+   eval "`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep 
'^CPU\|^MIPS_ENDIAN\|^LIBCABI'`"
+   test "x$CPU" != x && { echo 
"$CPU${MIPS_ENDIAN}-unknown-linux-$LIBCABI"; exit; }
;;
 mips64el:Linux:*:*)
echo "$UNAME_MACHINE"-unknown-linux-"$LIBC"
@@ -1126,7 +1144,7 @@ EOF
*Pentium)UNAME_MACHINE=i586 ;;
*Pent*|*Celeron) UNAME_MACHINE=i686 ;;
esac
-   echo 
"$UNAME_MACHINE-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}{$UNAME_VERSION}"
+   echo 
"$UNAME_MACHINE-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}${UNAME_VERSION}"
exit ;;
 i*86:*:3.2:*)
if test -f /usr/options/cb.name; then
@@ -1310,38 +1328,39 @@ EOF
echo "$UNAME_MACHINE"-apple-rhapsody"$UNAME_RELEASE"
exit ;;
 *:Darwin:*:*)
-   UNAME_PROCESSOR=`uname -p` || UNAME_PROCESSOR=unknown
-   set_cc_for_build
-   if test "$UNAME_PROCESSOR" = unknown ; then
-   

[PATCH V6 11/11] bpf: add myself as the maintainer for the eBPF port

2019-08-29 Thread Jose E. Marchesi
ChangeLog:

* MAINTAINERS: Add myself as the maintainer of the eBPF port.
Remove myself from Write After Approval section.
---
 ChangeLog   | 5 +
 MAINTAINERS | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5d8402949bc..5d69d696c2c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -57,6 +57,7 @@ arm port  Ramana Radhakrishnan

 arm port   Kyrylo Tkachov  
 avr port   Denis Chertykov 
 bfin port  Jie Zhang   
+bpf port   Jose E. Marchesi
 c6x port   Bernd Schmidt   
 cris port  Hans-Peter Nilsson  
 c-sky port Xianmiao Qu 
@@ -497,7 +498,6 @@ Luis Machado

 Ziga Mahkovec  
 Matthew Malcomson  
 Mikhail Maltsev
-Jose E. Marchesi   
 Patrick Marlier

 Simon Martin   
 Alejandro Martinez 

-- 
2.11.0



[PATCH V6 10/11] bpf: manual updates for eBPF

2019-08-29 Thread Jose E. Marchesi
gcc/ChangeLog:

* doc/invoke.texi (Option Summary): Cover eBPF.
(eBPF Options): New section.
* doc/extend.texi (BPF Built-in Functions): Likewise.
(BPF Kernel Helpers): Likewise.
---
 gcc/ChangeLog   |   7 +++
 gcc/doc/extend.texi | 171 
 gcc/doc/invoke.texi |  37 
 3 files changed, 215 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 4aea4d31761..e821cafff1e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -13604,6 +13604,8 @@ instructions, but allow the compiler to schedule those 
calls.
 * ARM ARMv8-M Security Extensions::
 * AVR Built-in Functions::
 * Blackfin Built-in Functions::
+* BPF Built-in Functions::
+* BPF Kernel Helpers::
 * FR-V Built-in Functions::
 * MIPS DSP Built-in Functions::
 * MIPS Paired-Single Support::
@@ -14601,6 +14603,175 @@ void __builtin_bfin_csync (void)
 void __builtin_bfin_ssync (void)
 @end smallexample
 
+@node BPF Built-in Functions
+@subsection BPF Built-in Functions
+
+The following built-in functions are available for eBPF targets.
+
+@deftypefn {Built-in Function} unsigned long long __builtin_bpf_load_byte 
(unsigned long long @var{offset})
+Load a byte from the @code{struct sk_buff} packet data pointed by the register 
@code{%r6} and return it.
+@end deftypefn
+
+@deftypefn {Built-in Function} unsigned long long __builtin_bpf_load_half 
(unsigned long long @var{offset})
+Load 16-bits from the @code{struct sk_buff} packet data pointed by the 
register @code{%r6} and return it.
+@end deftypefn
+
+@deftypefn {Built-in Function} unsigned long long __builtin_bpf_load_word 
(unsigned long long @var{offset})
+Load 32-bits from the @code{struct sk_buff} packet data pointed by the 
register @code{%r6} and return it.
+@end deftypefn
+
+@node BPF Kernel Helpers
+@subsection BPF Kernel Helpers
+
+These built-in functions are available for calling kernel helpers, and
+they are available depending on the kernel version selected as the
+CPU.
+
+Rather than using the built-ins directly, it is preferred for programs
+to include @file{bpf-helpers.h} and use the wrappers defined there.
+
+For a full description of what the helpers do, the arguments they
+take, and the returned value, see the
+@file{linux/include/uapi/linux/bpf.h} in a Linux source tree.
+
+@smallexample
+void *__builtin_bpf_helper_map_lookup_elem (void *map, void *key)
+int   __builtin_bpf_helper_map_update_elem (void *map, void *key,
+void *value,
+unsigned long long flags)
+int   __builtin_bpf_helper_map_delete_elem (void *map, const void *key)
+int   __builtin_bpf_helper_map_push_elem (void *map, const void *value,
+  unsigned long long flags)
+int   __builtin_bpf_helper_map_pop_elem (void *map, void *value)
+int   __builtin_bpf_helper_map_peek_elem (void *map, void *value)
+int __builtin_bpf_helper_clone_redirect (void *skb,
+ unsigned int ifindex,
+ unsigned long long flags)
+int __builtin_bpf_helper_skb_get_tunnel_key (void *ctx, void *key, int size, 
int flags)
+int __builtin_bpf_helper_skb_set_tunnel_key (void *ctx, void *key, int size, 
int flags)
+int __builtin_bpf_helper_skb_get_tunnel_opt (void *ctx, void *md, int size)
+int __builtin_bpf_helper_skb_set_tunnel_opt (void *ctx, void *md, int size)
+int __builtin_bpf_helper_skb_get_xfrm_state (void *ctx, int index, void *state,
+int size, int flags)
+static unsigned long long __builtin_bpf_helper_skb_cgroup_id (void *ctx)
+static unsigned long long __builtin_bpf_helper_skb_ancestor_cgroup_id
+ (void *ctx, int level)
+int __builtin_bpf_helper_skb_vlan_push (void *ctx, __be16 vlan_proto, __u16 
vlan_tci)
+int __builtin_bpf_helper_skb_vlan_pop (void *ctx)
+int __builtin_bpf_helper_skb_ecn_set_ce (void *ctx)
+
+int __builtin_bpf_helper_skb_load_bytes (void *ctx, int off, void *to, int len)
+int __builtin_bpf_helper_skb_load_bytes_relative (void *ctx, int off, void 
*to, int len, __u32 start_header)
+int __builtin_bpf_helper_skb_store_bytes (void *ctx, int off, void *from, int 
len, int flags)
+int __builtin_bpf_helper_skb_under_cgroup (void *ctx, void *map, int index)
+int __builtin_bpf_helper_skb_change_head (void *, int len, int flags)
+int __builtin_bpf_helper_skb_pull_data (void *, int len)
+int __builtin_bpf_helper_skb_change_proto (void *ctx, __be16 proto, __u64 
flags)
+int __builtin_bpf_helper_skb_change_type (void *ctx, __u32 type)
+int __builtin_bpf_helper_skb_change_tail (void *ctx, __u32 len, __u64 flags)
+int __builtin_bpf_helper_skb_adjust_room (void *ctx, __s32 len_diff, __u32 
mode,
+ unsigned long long flags)
+@end smallexample
+
+Other helpers:
+
+@smallexample
+int __builtin_bpf_helper_probe_rea

[PATCH V6 08/11] bpf: make target-supports.exp aware of eBPF

2019-08-29 Thread Jose E. Marchesi
This patch makes several effective-target checks in
target-supports.exp aware of eBPF targets.
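For context, tests consume these checks through dg-require-effective-target. The sketch below shows a hypothetical test (the function pick is invented here) that depends on the indirect_jumps check, so it would no longer be run for bpf-*-* once that check returns 0 there.

```c
/* { dg-do compile } */
/* { dg-require-effective-target indirect_jumps } */

/* Uses a computed goto (a GNU C extension), which needs indirect
   jump support in the target.  */
int
pick (int which)
{
  static void *targets[] = { &&first, &&second };

  goto *targets[which & 1];
first:
  return 10;
second:
  return 20;
}
```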

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_trampolines): Adapt to eBPF.
(check_effective_target_stack_size): Likewise.
(dg-effective-target-value): Likewise.
(check_effective_target_indirect_jumps): Likewise.
(check_effective_target_nonlocal_goto): Likewise.
(check_effective_target_global_constructor): Likewise.
(check_effective_target_return_address): Likewise.
---
 gcc/testsuite/ChangeLog   |  9 +
 gcc/testsuite/lib/target-supports.exp | 18 +++---
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f457a46a02b..ce08a2f8421 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -526,7 +526,8 @@ proc check_effective_target_trampolines { } {
 || [istarget nvptx-*-*]
 || [istarget hppa2.0w-hp-hpux11.23]
 || [istarget hppa64-hp-hpux11.23]
-|| [istarget pru-*-*] } {
+|| [istarget pru-*-*]
+|| [istarget bpf-*-*] } {
return 0;
 }
 return 1
@@ -781,7 +782,7 @@ proc add_options_for_tls { flags } {
 # Return 1 if indirect jumps are supported, 0 otherwise.
 
 proc check_effective_target_indirect_jumps {} {
-if { [istarget nvptx-*-*] } {
+if { [istarget nvptx-*-*] || [istarget bpf-*-*] } {
return 0
 }
 return 1
@@ -790,7 +791,7 @@ proc check_effective_target_indirect_jumps {} {
 # Return 1 if nonlocal goto is supported, 0 otherwise.
 
 proc check_effective_target_nonlocal_goto {} {
-if { [istarget nvptx-*-*] } {
+if { [istarget nvptx-*-*] || [istarget bpf-*-*] } {
return 0
 }
 return 1
@@ -799,10 +800,9 @@ proc check_effective_target_nonlocal_goto {} {
 # Return 1 if global constructors are supported, 0 otherwise.
 
 proc check_effective_target_global_constructor {} {
-if { [istarget nvptx-*-*] } {
-   return 0
-}
-if { [istarget amdgcn-*-*] } {
+if { [istarget nvptx-*-*]
+|| [istarget amdgcn-*-*]
+|| [istarget bpf-*-*] } {
return 0
 }
 return 1
@@ -825,6 +825,10 @@ proc check_effective_target_return_address {} {
 if { [istarget nvptx-*-*] } {
return 0
 }
+# No notion of return address in eBPF.
+if { [istarget bpf-*-*] } {
+   return 0
+}
 # It could be supported on amdgcn, but isn't yet.
 if { [istarget amdgcn*-*-*] } {
return 0
-- 
2.11.0



[PATCH V6 07/11] bpf: gcc.target eBPF testsuite

2019-08-29 Thread Jose E. Marchesi
This patch adds a new testsuite to gcc.target, with eBPF specific
tests.

Tests are included for:
- Target specific diagnostics.
- All built-in functions.

testsuite/ChangeLog:

* gcc.target/bpf/bpf.exp: New file.
* gcc.target/bpf/builtin-load.c: Likewise.
* gcc.target/bpf/constant-calls.c: Likewise.
* gcc.target/bpf/diag-funargs.c: Likewise.
* gcc.target/bpf/diag-indcalls.c: Likewise.
* gcc.target/bpf/helper-bind.c: Likewise.
* gcc.target/bpf/helper-bpf-redirect.c: Likewise.
* gcc.target/bpf/helper-clone-redirect.c: Likewise.
* gcc.target/bpf/helper-csum-diff.c: Likewise.
* gcc.target/bpf/helper-csum-update.c: Likewise.
* gcc.target/bpf/helper-current-task-under-cgroup.c: Likewise.
* gcc.target/bpf/helper-fib-lookup.c: Likewise.
* gcc.target/bpf/helper-get-cgroup-classid.c: Likewise.
* gcc.target/bpf/helper-get-current-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-get-current-comm.c: Likewise.
* gcc.target/bpf/helper-get-current-pid-tgid.c: Likewise.
* gcc.target/bpf/helper-get-current-task.c: Likewise.
* gcc.target/bpf/helper-get-current-uid-gid.c: Likewise.
* gcc.target/bpf/helper-get-hash-recalc.c: Likewise.
* gcc.target/bpf/helper-get-listener-sock.c: Likewise.
* gcc.target/bpf/helper-get-local-storage.c: Likewise.
* gcc.target/bpf/helper-get-numa-node-id.c: Likewise.
* gcc.target/bpf/helper-get-prandom-u32.c: Likewise.
* gcc.target/bpf/helper-get-route-realm.c: Likewise.
* gcc.target/bpf/helper-get-smp-processor-id.c: Likewise.
* gcc.target/bpf/helper-get-socket-cookie.c: Likewise.
* gcc.target/bpf/helper-get-socket-uid.c: Likewise.
* gcc.target/bpf/helper-getsockopt.c: Likewise.
* gcc.target/bpf/helper-get-stack.c: Likewise.
* gcc.target/bpf/helper-get-stackid.c: Likewise.
* gcc.target/bpf/helper-ktime-get-ns.c: Likewise.
* gcc.target/bpf/helper-l3-csum-replace.c: Likewise.
* gcc.target/bpf/helper-l4-csum-replace.c: Likewise.
* gcc.target/bpf/helper-lwt-push-encap.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-action.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-adjust-srh.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-store-bytes.c: Likewise.
* gcc.target/bpf/helper-map-delete-elem.c: Likewise.
* gcc.target/bpf/helper-map-lookup-elem.c: Likewise.
* gcc.target/bpf/helper-map-peek-elem.c: Likewise.
* gcc.target/bpf/helper-map-pop-elem.c: Likewise.
* gcc.target/bpf/helper-map-push-elem.c: Likewise.
* gcc.target/bpf/helper-map-update-elem.c: Likewise.
* gcc.target/bpf/helper-msg-apply-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-cork-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-pop-data.c: Likewise.
* gcc.target/bpf/helper-msg-pull-data.c: Likewise.
* gcc.target/bpf/helper-msg-push-data.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-hash.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-map.c: Likewise.
* gcc.target/bpf/helper-override-return.c: Likewise.
* gcc.target/bpf/helper-perf-event-output.c: Likewise.
* gcc.target/bpf/helper-perf-event-read.c: Likewise.
* gcc.target/bpf/helper-perf-event-read-value.c: Likewise.
* gcc.target/bpf/helper-perf-prog-read-value.c: Likewise.
* gcc.target/bpf/helper-probe-read.c: Likewise.
* gcc.target/bpf/helper-probe-read-str.c: Likewise.
* gcc.target/bpf/helper-probe-write-user.c: Likewise.
* gcc.target/bpf/helper-rc-keydown.c: Likewise.
* gcc.target/bpf/helper-rc-pointer-rel.c: Likewise.
* gcc.target/bpf/helper-rc-repeat.c: Likewise.
* gcc.target/bpf/helper-redirect-map.c: Likewise.
* gcc.target/bpf/helper-set-hash.c: Likewise.
* gcc.target/bpf/helper-set-hash-invalid.c: Likewise.
* gcc.target/bpf/helper-setsockopt.c: Likewise.
* gcc.target/bpf/helper-skb-adjust-room.c: Likewise.
* gcc.target/bpf/helper-skb-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-skb-change-head.c: Likewise.
* gcc.target/bpf/helper-skb-change-proto.c: Likewise.
* gcc.target/bpf/helper-skb-change-tail.c: Likewise.
* gcc.target/bpf/helper-skb-change-type.c: Likewise.
* gcc.target/bpf/helper-skb-ecn-set-ce.c: Likewise.
* gcc.target/bpf/helper-skb-get-tunnel-key.c: Likewise.
* gcc.target/bpf/helper-skb-get-tunnel-opt.c: Likewise.
* gcc.target/bpf/helper-skb-get-xfrm-state.c: Likewise.
* gcc.target/bpf/helper-skb-load-bytes.c: Likewise.
* gcc.target/bpf/helper-skb-load-bytes-relative.c: Likewise.
* gcc.target/bpf/helper-skb-pull-data.c: Likewise.
* gcc.target/bpf/helper-skb-set-tunnel-key.c: Likewise.
* gcc.target/bpf/helper-skb-set-tunnel-opt.c: Likewise.
*

[PATCH V6 04/11] testsuite: new require effective target indirect_calls

2019-08-29 Thread Jose E. Marchesi
This patch adds a new dg_require_effective_target procedure to the
testsuite infrastructure: indirect_calls.  This new function tells
whether a target supports calls to non-constant call targets.

This patch also annotates the tests in the gcc.c-torture testsuite that
require support for indirect calls.
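
A minimal sketch of what such an annotated test could look like follows.
It is a hypothetical file, not one of the tests touched by this patch; the
function names add_one, twice, apply and apply_selected are invented for
illustration.  Only the directive on the first line comes from the patch
description.

```c
/* { dg-require-effective-target indirect_calls } */
#include <assert.h>

static int add_one (int x) { return x + 1; }
static int twice (int x) { return 2 * x; }

/* Calls through FN, whose value is not known at this call site.  */
int apply (int (*fn) (int), int value)
{
  assert (fn != 0);
  return fn (value);
}

/* Picks the callee at run time, so the call target is non-constant.  */
int apply_selected (int use_twice, int value)
{
  return apply (use_twice ? twice : add_one, value);
}
```

A target whose check_effective_target_indirect_calls returns 0 would skip
a file like this rather than report a failure.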

gcc/ChangeLog:

* doc/sourcebuild.texi (Effective-Target Keywords): Document
indirect_calls.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_indirect_calls):
New proc.
* gcc.c-torture/compile/20010102-1.c: Annotate with
dg-require-effective-target indirect_calls.
* gcc.c-torture/compile/20010107-1.c: Likewise.
* gcc.c-torture/compile/20011109-1.c: Likewise.
* gcc.c-torture/compile/20011218-1.c: Likewise.
* gcc.c-torture/compile/20011229-1.c: Likewise.
* gcc.c-torture/compile/20020129-1.c: Likewise.
* gcc.c-torture/compile/20020320-1.c: Likewise.
* gcc.c-torture/compile/20020706-1.c: Likewise.
* gcc.c-torture/compile/20020706-2.c: Likewise.
* gcc.c-torture/compile/20021205-1.c: Likewise.
* gcc.c-torture/compile/20030921-1.c: Likewise.
* gcc.c-torture/compile/20031023-1.c: Likewise.
* gcc.c-torture/compile/20031023-2.c: Likewise.
* gcc.c-torture/compile/20031023-3.c: Likewise.
* gcc.c-torture/compile/20031023-4.c: Likewise.
* gcc.c-torture/compile/20040614-1.c: Likewise.
* gcc.c-torture/compile/20040909-1.c: Likewise.
* gcc.c-torture/compile/20050122-1.c: Likewise.
* gcc.c-torture/compile/20050202-1.c: Likewise.
* gcc.c-torture/compile/20060208-1.c: Likewise.
* gcc.c-torture/compile/20081108-1.c: Likewise.
* gcc.c-torture/compile/20150327.c: Likewise.
* gcc.c-torture/compile/920428-2.c: Likewise.
* gcc.c-torture/compile/920928-5.c: Likewise.
* gcc.c-torture/compile/930117-1.c: Likewise.
* gcc.c-torture/compile/930607-1.c: Likewise.
* gcc.c-torture/compile/991213-2.c: Likewise.
* gcc.c-torture/compile/callind.c: Likewise.
* gcc.c-torture/compile/calls-void.c: Likewise.
* gcc.c-torture/compile/calls.c: Likewise.
* gcc.c-torture/compile/pr21840.c: Likewise.
* gcc.c-torture/compile/pr32139.c: Likewise.
* gcc.c-torture/compile/pr35607.c: Likewise.
* gcc.c-torture/compile/pr37433-1.c: Likewise.
* gcc.c-torture/compile/pr37433.c: Likewise.
* gcc.c-torture/compile/pr39941.c: Likewise.
* gcc.c-torture/compile/pr40080.c: Likewise.
* gcc.c-torture/compile/pr43635.c: Likewise.
* gcc.c-torture/compile/pr43791.c: Likewise.
* gcc.c-torture/compile/pr43845.c: Likewise.
* gcc.c-torture/compile/pr44043.c: Likewise.
* gcc.c-torture/compile/pr51694.c: Likewise.
* gcc.c-torture/compile/pr77754-2.c: Likewise.
* gcc.c-torture/compile/pr77754-3.c: Likewise.
* gcc.c-torture/compile/pr77754-4.c: Likewise.
* gcc.c-torture/compile/pr89663-2.c: Likewise.
* gcc.c-torture/compile/pta-1.c: Likewise.
* gcc.c-torture/compile/stack-check-1.c: Likewise.
* gcc.dg/Walloc-size-larger-than-18.c: Likewise.
---
 gcc/ChangeLog  |  5 ++
 gcc/doc/sourcebuild.texi   |  4 ++
 gcc/testsuite/ChangeLog| 55 ++
 gcc/testsuite/gcc.c-torture/compile/20010102-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20010107-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20011109-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20011218-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20011229-1.c   |  3 ++
 gcc/testsuite/gcc.c-torture/compile/20020129-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20020320-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20020706-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20020706-2.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20021205-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20030921-1.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/20031023-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20031023-2.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20031023-3.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20031023-4.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20040614-1.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/20040909-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20050122-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20050202-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20060208-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20081108-1.c   |  2 +
 gcc/testsuite/gcc.c-torture/compile/20150327.c |  2 +
 gcc/testsuite/gcc.c-torture/compile/920428-2.c |  2 +
 gcc/testsuite/gcc.c-torture/compile/920928-5.c |  3 ++
 gcc/testsuite/gcc.c-torture/compile/930117-1.c |  2 +
 gcc/testsuite/gcc.c-torture/compile/930607

[PATCH V6 03/11] testsuite: annotate c-torture/compile tests with dg-require-stack-size

2019-08-29 Thread Jose E. Marchesi
This patch annotates tests that make use of a significant amount of
stack space.  Embedded and other restricted targets may have problems
compiling and running these tests.  Note that the annotations are in
many cases not exact.
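
For illustration, a hedged sketch of the annotation pattern, modelled on
the first hunk of the diff below: the directive states roughly how much
stack the test needs, here the size of the single local buffer.  The
function copy_to_stack_buffer is a made-up stand-in, not code from any of
the annotated tests.

```c
/* { dg-require-stack-size "1024" } */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies SRC into a 1 KiB buffer on the stack and
   returns the length of the (possibly truncated) copy.  */
size_t copy_to_stack_buffer (const char *src)
{
  char temp[1024];

  assert (src != 0);
  strncpy (temp, src, sizeof temp - 1);
  temp[sizeof temp - 1] = '\0';
  return strlen (temp);
}
```

The value in the directive only needs to be a reasonable lower bound on
the stack the test uses, which is why the annotations need not be exact.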

testsuite/ChangeLog:

* gcc.c-torture/compile/2609-1.c: Annotate with
dg-require-stack-size.
* gcc/testsuite/gcc.c-torture/compile/2804-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/20020304-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/20020604-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/20021015-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/20050303-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/20060421-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/20071207-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/20080903-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/20121027-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/20151204.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/920501-12.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/920501-4.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/920723-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/921202-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/931003-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/931004-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/950719-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/951222-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/990517-1.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/bcopy.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/pr23929.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/pr25310.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/pr34458.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/pr39937.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/pr41181.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/pr41634.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/pr43415.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/pr43417.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/pr44788.c: Likewise.
* gcc/testsuite/gcc.c-torture/compile/sound.c: Likewise.
---
 gcc/testsuite/ChangeLog  | 35 
 gcc/testsuite/gcc.c-torture/compile/2609-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/2804-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20020304-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/20020604-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20021015-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20050303-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20060421-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/20071207-1.c |  1 +
 gcc/testsuite/gcc.c-torture/compile/20080903-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/20121027-1.c |  2 ++
 gcc/testsuite/gcc.c-torture/compile/20151204.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/920501-12.c  |  1 +
 gcc/testsuite/gcc.c-torture/compile/920501-4.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/920723-1.c   |  1 +
 gcc/testsuite/gcc.c-torture/compile/921202-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/931003-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/931004-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/950719-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/951222-1.c   |  2 ++
 gcc/testsuite/gcc.c-torture/compile/990517-1.c   |  3 ++
 gcc/testsuite/gcc.c-torture/compile/bcopy.c  |  1 +
 gcc/testsuite/gcc.c-torture/compile/pr23929.c|  1 +
 gcc/testsuite/gcc.c-torture/compile/pr25310.c|  1 +
 gcc/testsuite/gcc.c-torture/compile/pr34458.c|  1 +
 gcc/testsuite/gcc.c-torture/compile/pr39937.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr41181.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr41634.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr43415.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr43417.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/pr44788.c|  2 ++
 gcc/testsuite/gcc.c-torture/compile/sound.c  |  1 +
 32 files changed, 84 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/compile/2609-1.c 
b/gcc/testsuite/gcc.c-torture/compile/2609-1.c
index f03aa35a7ac..e41701cc6d9 100644
--- a/gcc/testsuite/gcc.c-torture/compile/2609-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/2609-1.c
@@ -1,3 +1,5 @@
+/* { dg-require-stack-size "1024" } */
+
 int main ()
 {
   char temp[1024] = "tempfile";
diff --git a/gcc/testsuite/gcc.c-torture/compile/2804-1.c 
b/gcc/testsuite/gcc.c-torture/compile/2804-1.c
index 35464c212d2..550669b53a3 100644
--- a/gcc/testsuite/gcc.c-torture/compile/2804-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/2804-1.c
@@ -6,6 +6,7 @@
 /* { dg-skip-if "Not enough 64-bit registers" { pdp11-*-* 

Backports to 8.4

2019-08-29 Thread Jakub Jelinek
Hi!

I've backported following 12 commits from trunk to 8.4,
bootstrapped/regtested on x86_64-linux and i686-linux and committed
to gcc-8-branch.

Jakub
2019-08-29  Jakub Jelinek  

Backported from mainline
2019-04-19  Jakub Jelinek  

PR middle-end/90139
* tree-outof-ssa.c (get_temp_reg): If reg_mode is BLKmode, return
assign_temp instead of gen_reg_rtx.

* gcc.c-torture/compile/pr90139.c: New test.

--- gcc/tree-outof-ssa.c(revision 270456)
+++ gcc/tree-outof-ssa.c(revision 270457)
@@ -653,6 +653,8 @@ get_temp_reg (tree name)
   tree type = TREE_TYPE (name);
   int unsignedp;
   machine_mode reg_mode = promote_ssa_mode (name, &unsignedp);
+  if (reg_mode == BLKmode)
+return assign_temp (type, 0, 0);
   rtx x = gen_reg_rtx (reg_mode);
   if (POINTER_TYPE_P (type))
 mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
--- gcc/testsuite/gcc.c-torture/compile/pr90139.c   (nonexistent)
+++ gcc/testsuite/gcc.c-torture/compile/pr90139.c   (revision 270457)
@@ -0,0 +1,20 @@
+/* PR middle-end/90139 */
+
+typedef float __attribute__((vector_size (sizeof (float)))) V;
+void bar (int, V *);
+int l;
+
+void
+foo (void)
+{
+  V n, b, o;
+  while (1)
+switch (l)
+  {
+  case 0:
+   o = n;
+   n = b;
+   b = o;
+   bar (1, &o);
+  }
+}
2019-08-29  Jakub Jelinek  

Backported from mainline
2019-04-26  Jakub Jelinek  

PR debug/90197
* c-tree.h (c_finish_loop): Add 2 further location_t arguments.
* c-parser.c (c_parser_while_statement): Adjust c_finish_loop caller.
(c_parser_do_statement): Likewise.
(c_parser_for_statement): Likewise.  Formatting fixes.
* c-typeck.c (c_finish_loop): Add COND_LOCUS and INCR_LOCUS arguments,
emit DEBUG_BEGIN_STMTs if needed.

--- gcc/c/c-parser.c(revision 271347)
+++ gcc/c/c-parser.c(revision 271348)
@@ -6001,7 +6001,8 @@ c_parser_while_statement (c_parser *pars
   location_t loc_after_labels;
   bool open_brace = c_parser_next_token_is (parser, CPP_OPEN_BRACE);
   body = c_parser_c99_block_statement (parser, if_p, &loc_after_labels);
-  c_finish_loop (loc, cond, NULL, body, c_break_label, c_cont_label, true);
+  c_finish_loop (loc, loc, cond, UNKNOWN_LOCATION, NULL, body,
+c_break_label, c_cont_label, true);
   add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
   c_parser_maybe_reclassify_token (parser);
 
@@ -6046,6 +6047,7 @@ c_parser_do_statement (c_parser *parser,
   c_break_label = save_break;
   new_cont = c_cont_label;
   c_cont_label = save_cont;
+  location_t cond_loc = c_parser_peek_token (parser)->location;
   cond = c_parser_paren_condition (parser);
   if (ivdep && cond != error_mark_node)
 cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
@@ -6059,7 +6061,8 @@ c_parser_do_statement (c_parser *parser,
   build_int_cst (integer_type_node, unroll));
   if (!c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
 c_parser_skip_to_end_of_block_or_statement (parser);
-  c_finish_loop (loc, cond, NULL, body, new_break, new_cont, false);
+  c_finish_loop (loc, cond_loc, cond, UNKNOWN_LOCATION, NULL, body,
+new_break, new_cont, false);
   add_stmt (c_end_compound_stmt (loc, block, flag_isoc99));
 }
 
@@ -6132,7 +6135,9 @@ c_parser_for_statement (c_parser *parser
   /* Silence the bogus uninitialized warning.  */
   tree collection_expression = NULL;
   location_t loc = c_parser_peek_token (parser)->location;
-  location_t for_loc = c_parser_peek_token (parser)->location;
+  location_t for_loc = loc;
+  location_t cond_loc = UNKNOWN_LOCATION;
+  location_t incr_loc = UNKNOWN_LOCATION;
   bool is_foreach_statement = false;
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_FOR));
   token_indent_info for_tinfo
@@ -6166,7 +6171,8 @@ c_parser_for_statement (c_parser *parser
  c_parser_consume_token (parser);
  is_foreach_statement = true;
  if (check_for_loop_decls (for_loc, true) == NULL_TREE)
-   c_parser_error (parser, "multiple iterating variables in fast 
enumeration");
+   c_parser_error (parser, "multiple iterating variables in "
+   "fast enumeration");
}
  else
check_for_loop_decls (for_loc, flag_isoc99);
@@ -6196,7 +6202,8 @@ c_parser_for_statement (c_parser *parser
  c_parser_consume_token (parser);
  is_foreach_statement = true;
  if (check_for_loop_decls (for_loc, true) == NULL_TREE)
-   c_parser_error (parser, "multiple iterating variables in 
fast enumeration");
+   c_parser_error (parser, "multiple iterating variables in "
+   "fast enumeration");
}
  else
check_for_loop_decls (for_loc, flag_isoc99);
@@ -6218,1

Re: [ARM/FDPIC v5 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided

2019-08-29 Thread Christophe Lyon

On 16/07/2019 12:34, Richard Sandiford wrote:

Christophe Lyon  writes:

On 22/05/2019 10:45, Christophe Lyon wrote:

On Wed, 22 May 2019 at 10:39, Szabolcs Nagy  wrote:


On 21/05/2019 16:28, Christophe Lyon wrote:

--- a/gcc/config/arm/linux-eabi.h
+++ b/gcc/config/arm/linux-eabi.h
@@ -89,7 +89,7 @@
   #define MUSL_DYNAMIC_LINKER_E "%{mbig-endian:eb}"
   #endif
   #define MUSL_DYNAMIC_LINKER \
-  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E "%{mfloat-abi=hard:hf}.so.1"
+  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E
"%{mfloat-abi=hard:hf}%{mfdpic:-fdpic}.so.1"


the line break seems wrong (either needs \ or no newline)


Sorry, that's a mailer artifact.


--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -45,7 +45,7 @@ case "${target}" in
  ;;
 sparc*-*-solaris2.11*)
  ;;
-  arm*-*-uclinuxfdpiceabi)
+  arm*-*-fdpiceabi)


should be *fdpiceabi instead of *-fdpiceabi i think.


Indeed, thanks.


FWIW, here is the updated patch:
- handles musl -fdpic suffix
- disables sanitizers for arm*-*-fdpiceabi
- does not handle -static in a special way, so using -static produces binaries 
that request the non-existing /usr/lib/ld.so.1, thus effectively making -static 
broken/unsupported (this does lead to a few more FAIL in the testsuite)

The plan is to work on -static-pie later, as discussed.


Could you make -static without -mno-fdpic an error via a %e spec,
so that the failure mode is a bit more user-friendly?

I realise this isn't your preferred option, sorry.



As discussed later, I didn't because I couldn't find a way
to catch linker (-Wl,XXX) options in the specs, and I prefer
to keep the possibility to generate a "static" binary using
"-static -Wl,-dynamic-linker XXX".

However, I've also added a new patch to the series to disable tests that
involve -static, attached here.



diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index e1bacf4..6c25a1a 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -55,6 +55,8 @@
  #define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*"\
"|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}"
  
+#define TARGET_FDPIC_ASM_SPEC  ""


Formatting nit: should be a single space before ""


OK


+
  #define BE8_LINK_SPEC \
"%{!r:%{!mbe32:%:be8_linkopt(%{mlittle-endian:little}"\
" %{mbig-endian:big}" \
@@ -64,7 +66,7 @@
  /* Tell the assembler to build BPABI binaries.  */
  #undef  SUBTARGET_EXTRA_ASM_SPEC
  #define SUBTARGET_EXTRA_ASM_SPEC \
-  "%{mabi=apcs-gnu|mabi=atpcs:-meabi=gnu;:-meabi=5}" TARGET_FIX_V4BX_SPEC
+  "%{mabi=apcs-gnu|mabi=atpcs:-meabi=gnu;:-meabi=5}" TARGET_FIX_V4BX_SPEC 
TARGET_FDPIC_ASM_SPEC


Long line.


OK


diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
index 66ec0ea..d7cc923 100644
--- a/gcc/config/arm/linux-eabi.h
+++ b/gcc/config/arm/linux-eabi.h
@@ -89,7 +89,7 @@
  #define MUSL_DYNAMIC_LINKER_E "%{mbig-endian:eb}"
  #endif
  #define MUSL_DYNAMIC_LINKER \
-  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E "%{mfloat-abi=hard:hf}.so.1"
+  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E 
"%{mfloat-abi=hard:hf}%{mfdpic:-fdpic}.so.1"
  
  /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to

 use the GNU/Linux version, not the generic BPABI version.  */


Rich, could you confirm that this is (going to be?) the correct name?


This was confirmed by Rich.


diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
index 66ec0ea..d7cc923 100644
--- a/gcc/config/arm/linux-eabi.h
+++ b/gcc/config/arm/linux-eabi.h
@@ -89,7 +89,7 @@
  #define MUSL_DYNAMIC_LINKER_E "%{mbig-endian:eb}"
  #endif
  #define MUSL_DYNAMIC_LINKER \
-  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E "%{mfloat-abi=hard:hf}.so.1"
+  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E 
"%{mfloat-abi=hard:hf}%{mfdpic:-fdpic}.so.1"
  
  /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to

 use the GNU/Linux version, not the generic BPABI version.  */
@@ -101,11 +101,14 @@
  #undef  ASAN_CC1_SPEC
  #define ASAN_CC1_SPEC "%{%:sanitize(address):-funwind-tables}"
  
+#define FDPIC_CC1_SPEC ""

+
  #undef  CC1_SPEC
  #define CC1_SPEC  \
-  LINUX_OR_ANDROID_CC (GNU_USER_TARGET_CC1_SPEC " " ASAN_CC1_SPEC,   \
+  LINUX_OR_ANDROID_CC (GNU_USER_TARGET_CC1_SPEC " " ASAN_CC1_SPEC " "  \
+  FDPIC_CC1_SPEC,  \
   GNU_USER_TARGET_CC1_SPEC " " ASAN_CC1_SPEC " "   \
-  ANDROID_CC1_SPEC)
+  ANDROID_CC1_SPEC "" FDPIC_CC1_SPEC)
  
  #define CC1PLUS_SPEC \

LINUX_OR_ANDROID_CC ("", ANDROID_CC1PLUS_SPEC)


Does it make sense to add FDPIC_CC1_SPEC to the Android version?


No, now fixed.


diff --git a/gcc/config/arm/uclinuxfdpiceabi.h 
b/gcc/config/arm/uclinuxfdpiceabi.h
new file mode 100644
index 000..3180bcd
--- /dev/null
+

Re: [PATCH] Setup predicate for switch default case in IPA (PR ipa/91089)

2019-08-29 Thread Martin Jambor
Hi,

On Fri, Jul 12 2019, Feng Xue OS wrote:
> IPA does not construct an executability predicate for the default case of a
> switch statement, so the execution cost of the default case is not properly
> evaluated in IPA-cp.  This might prevent function cloning for a function
> containing a switch statement when a certain non-default case is proved to
> be executed after constant propagation.
>
> This patch deduces a predicate for the default case if it turns out to be a
> relatively simple one.  For example, we can try to merge case ranges, and use
> comparisons on the range bounds together with range analysis information to
> simplify the predicate.
>

I have read through the patch and it looks OK to me but I cannot approve
it, you have to ping Honza for that.  Since you decided to use the value
range info, it would be nice if you could also add a testcase where it
plays a role.  Also, please don't post changelog entries as a part of
the patch, it basically guarantees it will not apply for anybody, not
even for you when you update your trunk.
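
To make the idea in the description concrete, here is a small hypothetical
sketch (not a testcase from the patch, and not a claim about how
ipa-fnsummary.c represents predicates internally).  The cases 1..4 are
contiguous, so they merge into the single range [1, 4], and the default
case is reached exactly when x lies outside that range.  All function
names are invented for the example.

```c
#include <assert.h>

/* Restates the predicate under which the default case below is reached:
   the cases 1..4 merge into the range [1, 4].  */
int default_case_reached (int x)
{
  return x < 1 || x > 4;
}

/* Stand-in for an expensive default path (hypothetical).  */
static int slow_path (int x)
{
  int acc = 0;
  for (int i = 0; i < 100; i++)
    acc += (x ^ i) & 1;
  return acc;
}

int dispatch (int x)
{
  switch (x)
    {
    case 1:
    case 2:
    case 3:
    case 4:
      return x * 2;
    default:
      /* Only reached when the merged-range predicate holds.  */
      assert (default_case_reached (x));
      return slow_path (x);
    }
}
```

If a caller is known to pass only a constant such as 2, the predicate is
false there and the cost of slow_path need not be charged to that context.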

Thanks for working on this,

Martin


> Feng
>
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3d92250b520..4de2f568990 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2019-07-12  Feng Xue  
> +
> + PR ipa/91089
> + * ipa-fnsummary.c (set_switch_stmt_execution_predicate): Add predicate
> + for switch default case using range analysis information.
> + * params.def (PARAM_IPA_MAX_SWITCH_PREDICATE_BOUNDS): New.
> +
>  2019-07-11  Sunil K Pandey  
>  


Re: [ARM/FDPIC v5 02/21] [ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts

2019-08-29 Thread Christophe Lyon

On 12/07/2019 08:49, Richard Sandiford wrote:

Christophe Lyon  writes:

The new arm-uclinuxfdpiceabi target behaves pretty much like
arm-linux-gnueabi. In order to enable the same set of features, we
have to update several configure scripts that generally match targets
like *-*-linux*: in most places, we add *-uclinux* where there is
already *-linux*, or uclinux* when there is already linux*.

In gcc/config.gcc and libgcc/config.host we use *-*-uclinuxfdpiceabi
because there is already a different behaviour for *-*uclinux* target.

In libtool.m4, we use uclinuxfdpiceabi in cases where ELF shared
libraries support is required, as uclinux does not guarantee that.

2019-XX-XX  Christophe Lyon  

config/
* futex.m4: Handle *-uclinux*.
* tls.m4 (GCC_CHECK_TLS): Likewise.

gcc/
* config.gcc: Handle *-*-uclinuxfdpiceabi.

libatomic/
* configure.tgt: Handle arm*-*-uclinux*.
* configure: Regenerate.

libgcc/
* config.host: Handle *-*-uclinuxfdpiceabi.

libitm/
* configure.tgt: Handle *-*-uclinux*.
* configure: Regenerate.

libstdc++-v3/
* acinclude.m4: Handle uclinux*.
* configure: Regenerate.
* configure.host: Handle uclinux*

* libtool.m4: Handle uclinux*.


Has the libtool.m4 patch been submitted to upstream libtool?
I think this is supposed to be handled by submitting there first
and then cherry-picking into gcc, so that the change isn't lost
by a future import.


I added a comment to libtool.m4 about this.


[...]

diff --git a/config/tls.m4 b/config/tls.m4
index 1a5fc59..a487aa4 100644
--- a/config/tls.m4
+++ b/config/tls.m4
@@ -76,7 +76,7 @@ AC_DEFUN([GCC_CHECK_TLS], [
  dnl Shared library options may depend on the host; this check
  dnl is only known to be needed for GNU/Linux.
  case $host in
-   *-*-linux*)
+   *-*-linux* | -*-uclinux*)
  LDFLAGS="-shared -Wl,--no-undefined $LDFLAGS"
  ;;
  esac


Is this right for all uclinux targets?

I don't think so, now restricted to -*-uclinuxfdpic*




diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 84258d8..cb0fdc5 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4


It'd probably be worth splitting out the libstdc++-v3 bits and
submitting them separately, cc:ing libstd...@gcc.gnu.org.  But...


I've now split the patch into two parts (both attached here)



@@ -1404,7 +1404,7 @@ AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME], [
  ac_has_nanosleep=yes
  ac_has_sched_yield=yes
  ;;
-  gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu)
+  gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu | uclinux*)
  AC_MSG_CHECKING([for at least GNU libc 2.17])
  AC_TRY_COMPILE(
[#include ],


Is this the right thing to do?  It seems odd to be testing the glibc
version for uclibc.

Do you want to support multiple possible settings of
ac_has_clock_monotonic and ac_has_clock_realtime?  Or could you just
hard-code the values, given particular baseline assumptions about the
version of uclibc etc.?  Hard-coding would then make


@@ -1526,7 +1526,7 @@ AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME], [
  
if test x"$ac_has_clock_monotonic" != x"yes"; then

  case ${target_os} in
-  linux*)
+  linux* | uclinux*)
AC_MSG_CHECKING([for clock_gettime syscall])
AC_TRY_COMPILE(
  [#include 


...this redundant.


Right, now fixed.


@@ -2415,7 +2415,7 @@ AC_DEFUN([GLIBCXX_ENABLE_CLOCALE], [
# Default to "generic".
if test $enable_clocale_flag = auto; then
  case ${target_os} in
-  linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu)
+  linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu | uclinux*)
enable_clocale_flag=gnu
;;
darwin*)


This too seems to be choosing a glibc setting for a uclibc target.

Indeed.




@@ -2661,7 +2661,7 @@ AC_DEFUN([GLIBCXX_ENABLE_ALLOCATOR], [
# Default to "new".
if test $enable_libstdcxx_allocator_flag = auto; then
  case ${target_os} in
-  linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu)
+  linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu | uclinux*)
enable_libstdcxx_allocator_flag=new
;;
*)


The full case is:

   # Probe for host-specific support if no specific model is specified.
   # Default to "new".
   if test $enable_libstdcxx_allocator_flag = auto; then
 case ${target_os} in
   linux* | gnu* | kfreebsd*-gnu | knetbsd*-gnu)
enable_libstdcxx_allocator_flag=new
;;
   *)
enable_libstdcxx_allocator_flag=new
;;
 esac
   fi

which looks a bit redundant :-)


Right :-)

Thanks,

Christophe



Thanks,
Richard
.



>From 81c84839b8f004b7b52317850f27f58e05bec6ad Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Fri, 4 May 2018 15:11:35 +
Subject: [ARM/FDPIC v6 02/24] [ARM] FDPIC: Handle arm*-*-uclinuxfdpic

Re: [ARM/FDPIC v5 01/21] [ARM] FDPIC: Add -mfdpic option support

2019-08-29 Thread Christophe Lyon

On 16/07/2019 12:11, Richard Sandiford wrote:

[This isn't really something that should be reviewed under global
reviewership, but if it's either that or nothing, I'll do it anyway...]

Christophe Lyon  writes:

2019-XX-XX  Christophe Lyon  
Mickaël Guêné  

gcc/
* config/arm/arm.opt: Add -mfdpic option.
* doc/invoke.texi: Add documentation for -mfdpic.

Change-Id: I0eabd1d11c9406fd4a43c4333689ebebbfcc4fe8

diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 9067d49..2ed3bd5 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -306,3 +306,7 @@ Cost to assume for a branch insn.
  mgeneral-regs-only
  Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
  Generate code which uses the core registers only (r0-r14).
+
+mfdpic
+Target Report Mask(FDPIC)
+Enable Function Descriptor PIC mode.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 29585cf..805d7cc 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -703,7 +703,8 @@ Objective-C and Objective-C++ Dialects}.
  -mrestrict-it @gol
  -mverbose-cost-dump @gol
  -mpure-code @gol
--mcmse}
+-mcmse @gol
+-mfdpic}
  
  @emph{AVR Options}

  @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args @gol
@@ -17912,6 +17913,23 @@ MOVT instruction.
  Generate secure code as per the "ARMv8-M Security Extensions: Requirements on
  Development Tools Engineering Specification", which can be found on
  
@url{http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/ECM0359818_armv8m_security_extensions_reqs_on_dev_tools_1_0.pdf}.
+
+@item -mfdpic
+@itemx -mno-fdpic
+@opindex mfdpic
+@opindex mno-fdpic
+Select the FDPIC ABI, which uses function descriptors to represent


Maybe "64-bit function descriptors"?  Just a suggestion, might not be useful.

OK with that change, thanks.
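
As a purely conceptual sketch of what "function descriptors" means here,
assuming the usual FDPIC arrangement in which a descriptor pairs a code
address with the data (GOT) base of the module that owns the code: on a
32-bit target that is two 32-bit words, hence the "64-bit" in the
suggestion above.  The struct name, the field names, and passing the data
base as an explicit argument are all assumptions made for illustration;
in real FDPIC code the compiler and the ABI handle this, not the program.

```c
#include <assert.h>
#include <stddef.h>

/* Conceptual model of a function descriptor: a code address plus the
   data base that the code expects to be in place when it runs.  */
struct fdpic_descriptor_model
{
  int (*entry) (const int *data_base, int arg);
  const int *data_base;
};

/* Stand-in for a module's private data (hypothetical).  */
static const int module_data[] = { 40 };

/* A function that reaches its module's data through the base it is given.  */
static int read_global_plus (const int *data_base, int arg)
{
  return data_base[0] + arg;
}

/* A "pointer to function" in this model is a pointer to a descriptor.  */
const struct fdpic_descriptor_model demo_descriptor =
  { read_global_plus, module_data };

/* Calling through the descriptor installs the data base, then jumps.  */
int call_via_descriptor (const struct fdpic_descriptor_model *fd, int arg)
{
  assert (fd != NULL && fd->entry != NULL);
  return fd->entry (fd->data_base, arg);
}
```

The point of the pairing is that code and data can be relocated
independently, since every call carries the callee's own data base.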


OK, here is a new version, where I added a few words to explain that -static
is not supported.

Thanks,
Christophe



Richard


+pointers to functions.  When the compiler is configured for
+@code{arm-*-uclinuxfdpiceabi} targets, this option is on by default
+and implies @option{-fPIE} if none of the PIC/PIE-related options is
+provided.  On other targets, it only enables the FDPIC-specific code
+generation features, and the user should explicitly provide the
+PIC/PIE-related options as needed.
+
+The opposite @option{-mno-fdpic} option is useful (and required) to
+build the Linux kernel using the same (@code{arm-*-uclinuxfdpiceabi})
+toolchain as the one used to build the userland programs.
+
  @end table
  
  @node AVR Options

.



From c936684e2b77ff5716bd8b67c617dcad088c72e0 Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Thu, 8 Feb 2018 10:44:32 +0100
Subject: [ARM/FDPIC v6 01/24] [ARM] FDPIC: Add -mfdpic option support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

2019-XX-XX  Christophe Lyon  
	Mickaël Guêné  

	gcc/
	* config/arm/arm.opt: Add -mfdpic option.
	* doc/invoke.texi: Add documentation for -mfdpic.

Change-Id: I05b98d6ae87c2b3fc04dd7fba415c730accdf33e

diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 9067d49..2ed3bd5 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -306,3 +306,7 @@ Cost to assume for a branch insn.
 mgeneral-regs-only
 Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
 Generate code which uses the core registers only (r0-r14).
+
+mfdpic
+Target Report Mask(FDPIC)
+Enable Function Descriptor PIC mode.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 29585cf..b77fa06 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -703,7 +703,8 @@ Objective-C and Objective-C++ Dialects}.
 -mrestrict-it @gol
 -mverbose-cost-dump @gol
 -mpure-code @gol
--mcmse}
+-mcmse @gol
+-mfdpic}
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args @gol
@@ -17912,6 +17913,27 @@ MOVT instruction.
 Generate secure code as per the "ARMv8-M Security Extensions: Requirements on
 Development Tools Engineering Specification", which can be found on
 @url{http://infocenter.arm.com/help/topic/com.arm.doc.ecm0359818/ECM0359818_armv8m_security_extensions_reqs_on_dev_tools_1_0.pdf}.
+
+@item -mfdpic
+@itemx -mno-fdpic
+@opindex mfdpic
+@opindex mno-fdpic
+Select the FDPIC ABI, which uses 64-bit function descriptors to
+represent pointers to functions.  When the compiler is configured for
+@code{arm-*-uclinuxfdpiceabi} targets, this option is on by default
+and implies @option{-fPIE} if none of the PIC/PIE-related options is
+provided.  On other targets, it only enables the FDPIC-specific code
+generation features, and the user should explicitly provide the
+PIC/PIE-related options as needed.
+
+Note that static linking is not supported because it would still
+involve the dynamic linker when the program self-relocates.  If such
+behaviour is acceptable, use -static and -Wl,-dynamic-linker options.
+
+The opposite @option{-mno-fdpic} option is useful (and required) to
+build

Re: [PATCH] correct an ILP32/LP64 bug in sprintf warning (PR 91567)

2019-08-29 Thread Rainer Orth
Hi Martin,

> The recent sprintf+strlen integration doesn't handle unbounded
> string lengths entirely correctly for ILP32 targets and causes
> -Wformat-overflow false positives in some common cases, including
> during GCC bootstrap targeting such systems.  The attached patch
> fixes that mistake.  (I think this code could be cleaned up and
> simplified some more but in the interest of unblocking the ILP32
> bootstrap and Glibc builds I haven't taken the time to do that.)
> The patch also adjusts down the maximum strlen result set by EVRP
> to PTRDIFF_MAX - 2, to match what the strlen pass does.
>
> The strlen maximum would ideally be computed in terms of
> max_object_size() (for which there would ideally be a --param
> setting), and checked the same way to avoid off-by-one mistakes
> between subsystems and their clients.  I have not made this change
> here but added a FIXME comment mentioning it.  I plan to add such
> a parameter and use it in max_object_size() in a future change.
>
> Testing with an ILP32 compiler also ran into the known limitation
> of the strlen pass being unable to determine the length of array
> members of local aggregates (PR 83543) initialized using
> the braced-list syntax.  gcc.dg/tree-ssa/builtin-snprintf-6.c
> fails a few cases as a result.  I've xfailed the assertions
> for targets other than x86_64 where it isn't an issue.

there's still something wrong with that testcase:

* It XPASSes on i386-pc-solaris2.11 with -m64:

XPASS: gcc.dg/tree-ssa/builtin-snprintf-6.c scan-tree-dump-times optimized "Function test_assign_aggregate" 1

  It's almost always wrong to only handle x86_64; you need to consider
  i?86 && lp64, too.

* Even after your patch, there are several gcc-testresults postings
  showing the test to XPASS on x86_64-pc-linux-gnu.  I haven't looked
  closer yet.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[preprocessor] Include stacking

2019-08-29 Thread Nathan Sidwell
This patch refactors pieces of the include stacking routines.  Again 
from the modules branch, so these happen to provide the right slots to 
make that work, but I think they're good in their own right.


_cpp_find_file excitingly hid an if as the loop-invariant conditional of 
a for.  Refactored that, so mostly just changing indentation.
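
For readers who have not looked at files.c, the shape of that first
change can be sketched in isolation.  This is only an illustration: the
struct and function names below are made up and are not libcpp's, and
the loop body is reduced to a plain string match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an include chain; not libcpp's real types.  */
struct dir
{
  const char *name;
  struct dir *next;
};

/* Old shape: the loop-invariant !fake test sits in the for condition,
   so the "if" is easy to miss when reading the code.  */
static const struct dir *
find_old (const struct dir *d, const char *want, bool fake)
{
  for (; !fake ;)
    {
      if (strcmp (d->name, want) == 0)
        return d;
      d = d->next;
      if (d == NULL)
        break;
    }
  return NULL;
}

/* New shape: the invariant test is an explicit if around an
   unconditional loop.  Behaviour is unchanged.  */
static const struct dir *
find_new (const struct dir *d, const char *want, bool fake)
{
  if (!fake)
    for (;;)
      {
        if (strcmp (d->name, want) == 0)
          return d;
        d = d->next;
        if (d == NULL)
          break;
      }
  return NULL;
}
```

Both functions return the same result for every input; only the
indentation and the visibility of the guard differ.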


_should_stack_file was confusing, and I broke it apart into two 
predicates, then called from _stack_file.


Finally I extended the include_type enum to add IT_MAIN, for the main 
file, which means _cpp_stack_include just passes the include type down 
to _cpp_stack_file.


Applying to trunk.

nathan
--
Nathan Sidwell
2019-08-29  Nathan Sidwell  

	* internal.h (enum include_type): Add IT_MAIN, IT_DIRECTIVE_HWM,
	IT_HEADER_HWM.
	(_cpp_stack_file): Take include_type, not a bool.
	* files.c (_cpp_find_file): Refactor to not hide an if inside a
	for conditional.
	(should_stack_file): Break apart to ...
	(is_known_idempotent_file, has_unique_contents): ... these.
	(_cpp_stack_file): Replace IMPORT boolean with include_type enum.
	Refactor to use new predicates.  Do linemap compensation here ...
	(_cpp_stack_include): ... not here.
	* init.c (cpp_read_main_file): Pass IT_MAIN to _cpp_stack_file.

Index: files.c
===
--- files.c	(revision 275032)
+++ files.c	(working copy)
@@ -177,6 +177,4 @@ static bool read_file_guts (cpp_reader *
 static bool read_file (cpp_reader *pfile, _cpp_file *file,
 		   location_t loc);
-static bool should_stack_file (cpp_reader *, _cpp_file *file, bool import,
-			   location_t loc);
 static struct cpp_dir *search_path_head (cpp_reader *, const char *fname,
  int angle_brackets, enum include_type);
@@ -537,77 +535,84 @@ _cpp_find_file (cpp_reader *pfile, const
 	   && pfile->buffer->file->implicit_preinclude));
 
-  /* Try each path in the include chain.  */
-  for (; !fake ;)
-{
-  if (find_file_in_dir (pfile, file, &invalid_pch, loc))
-	break;
+  if (!fake)
+/* Try each path in the include chain.  */
+for (;;)
+  {
+	if (find_file_in_dir (pfile, file, &invalid_pch, loc))
+	  break;
 
-  file->dir = file->dir->next;
-  if (file->dir == NULL)
-	{
-	  if (search_path_exhausted (pfile, fname, file))
-	{
-	  /* Although this file must not go in the cache, because
-		 the file found might depend on things (like the current file)
-		 that aren't represented in the cache, it still has to go in
-		 the list of all files so that #import works.  */
-	  file->next_file = pfile->all_files;
-	  pfile->all_files = file;
-	  if (*hash_slot == NULL)
-		{
-		  /* If *hash_slot is NULL, the above htab_find_slot_with_hash
-		 call just created the slot, but we aren't going to store
-		 there anything, so need to remove the newly created entry.
-		 htab_clear_slot requires that it is non-NULL, so store
-		 there some non-NULL pointer, htab_clear_slot will
-		 overwrite it immediately.  */
-		  *hash_slot = file;
-		  htab_clear_slot (pfile->file_hash, hash_slot);
-		}
-	  return file;
-	}
+	file->dir = file->dir->next;
+	if (file->dir == NULL)
+	  {
+	if (search_path_exhausted (pfile, fname, file))
+	  {
+		/* Although this file must not go in the cache,
+		   because the file found might depend on things (like
+		   the current file) that aren't represented in the
+		   cache, it still has to go in the list of all files
+		   so that #import works.  */
+		file->next_file = pfile->all_files;
+		pfile->all_files = file;
+		if (*hash_slot == NULL)
+		  {
+		/* If *hash_slot is NULL, the above
+		   htab_find_slot_with_hash call just created the
+		   slot, but we aren't going to store there
+		   anything, so need to remove the newly created
+		   entry.  htab_clear_slot requires that it is
+		   non-NULL, so store there some non-NULL pointer,
+		   htab_clear_slot will overwrite it
+		   immediately.  */
+		*hash_slot = file;
+		htab_clear_slot (pfile->file_hash, hash_slot);
+		  }
+		return file;
+	  }
 
-	  if (invalid_pch)
-	{
-	  cpp_error (pfile, CPP_DL_ERROR,
-	   "one or more PCH files were found, but they were invalid");
-	  if (!cpp_get_options (pfile)->warn_invalid_pch)
+	if (invalid_pch)
+	  {
 		cpp_error (pfile, CPP_DL_ERROR,
-			   "use -Winvalid-pch for more information");
-	}
-	  if (implicit_preinclude)
-	{
-	  free ((char *) file->name);
-	  free (file);
-	  if (*hash_slot == NULL)
-		{
-		  /* See comment on the above htab_clear_slot call.  */
-		  *hash_slot = file;
-		  htab_clear_slot (pfile->file_hash, hash_slot);
-		}
-	  return NULL;
-	}
-	  else
-	open_file_failed (pfile, file, angle_brackets, loc);
-	  break;
-	}
+			   "one or more PCH files were found,"
+			   " but they were invalid");
+		if (!cpp_get_options (pfile)->warn_invalid_pch)
+		  cpp_error (pfile, CPP_DL_ERROR,
+			 "us

Re: [ARM/FDPIC v5 00/21] FDPIC ABI for ARM

2019-08-29 Thread Christophe Lyon

Hi,

On 15/05/2019 14:39, Christophe Lyon wrote:

Hello,

This patch series implements the GCC contribution of the FDPIC ABI for
ARM targets.

This ABI makes it possible to run Linux on MMU-less ARM cores and supports
shared libraries to reduce the memory footprint.

Without an MMU, the relative distance between the text and data segments
differs from one process to another, hence the need for a dedicated FDPIC
register holding the start address of the data segment. One of the
side effects is that function pointers require two words to be
represented: the address of the code, and the data segment start
address. These two words are designated as "Function Descriptor",
hence the "FD PIC" name.
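
As a rough host-side model of the two-word descriptor described above
(a sketch only, not the ABI's actual layout: the names func_desc,
fdpic_reg and call_via_descriptor are hypothetical, and the fields here
are host-sized pointers rather than two 32-bit words):

```c
#include <stddef.h>

typedef int (*code_t) (void);

/* Hypothetical model of a function descriptor: code address plus the
   start of the data segment the code must run against.  */
struct func_desc
{
  code_t entry;
  const int *data_base;
};

/* Stands in for the FDPIC register (r9 on ARM) in this model.  */
static const int *fdpic_reg;

/* Shared "text": reaches its data only through the FDPIC register.  */
static int
read_first_datum (void)
{
  return fdpic_reg[0];
}

/* Calling through a descriptor: install the data segment base, then
   jump to the code.  Saving and restoring the register around the call
   is a simplification made for this model.  */
static int
call_via_descriptor (const struct func_desc *fd)
{
  const int *saved = fdpic_reg;
  fdpic_reg = fd->data_base;
  int result = fd->entry ();
  fdpic_reg = saved;
  return result;
}
```

Two descriptors with the same entry but different data_base values
behave like the same shared code running in two processes, each with
its own data segment.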

On ARM, the FDPIC register is r9 [1], and the target name is
arm-uclinuxfdpiceabi. Note that arm-uclinux exists, but uses another
ABI and the BFLAT file format; it does not support code sharing.
The -mfdpic option is enabled by default, and -mno-fdpic should be
used to build the Linux kernel.

This work was developed some time ago by STMicroelectronics, and was
presented during Linaro Connect SFO15 (September 2015). You can watch
the discussion and read the slides [2].
This presentation was related to the toolchain published on github [3],
which is based on binutils-2.22, gcc-4.7, uclibc-0.9.33.2, gdb-7.5.1
and qemu-2.3.0, and for which pre-built binaries are available [3].

The ABI itself is described in detail in [1].

Our Linux kernel patches have been updated and committed by Nicolas
Pitre (Linaro) in July 2017. They are required so that the loader is
able to handle this new file type. Indeed, the ELF files are tagged
with ELFOSABI_ARM_FDPIC. This new tag has been allocated by ARM, as
well as the new relocations involved.

The binutils, QEMU and uclibc-ng patch series were merged a few
months ago. [4][5][6]

This series provides support for architectures that support ARM and/or
Thumb-2 and has been tested on arm-linux-gnueabi without regression,
as well as arm-uclinuxfdpiceabi, using QEMU. arm-uclinuxfdpiceabi has
a few more failures than arm-linux-gnueabi, but is quite functional.

I have also booted an STM32 board (stm32f469) which uses a cortex-m4
with linux-4.20.17 and successfully ran several tools.

Are the GCC patches OK for inclusion in master?


I have addressed the comments I received on v5, and I am going to post updated 
versions of the patches that needed changes as follow-ups in this thread. I 
hope this will help reviewers as I will provide answers and updated patches 
next to their comments. After that, I will rebase the whole series and send it 
as v6 if that helps (several testsuite patches have already been approved 
as-is, but committing them now would change the patch numbering, thus possibly 
confusing reviewers).

However, note that several patches in the series haven't received feedback yet, 
so this is a ping for them :-)
[ARM/FDPIC v5 06/21] [ARM] FDPIC: Add support for c++ exceptions
[ARM/FDPIC v5 10/21] [ARM] FDPIC: Implement TLS support.
[ARM/FDPIC v5 11/21] [ARM] FDPIC: Add support to unwind FDPIC signal frame
[ARM/FDPIC v5 12/21] [ARM] FDPIC: Restore r9 after we call __aeabi_read_tp
[ARM/FDPIC v5 13/21] [ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture

Thanks,

Christophe


Changes between v4 and v5:
- rebased on top of recent gcc-10 master (April 26th, 2019)
- fixed handling of stack-protector combined patterns in FDPIC mode

Changes between v3 and v4:

- improved documentation (patch 1)
- emit an error message (sorry) if the target architecture does not
   support arm nor thumb-2 modes (patch 4)
- handle Richard's comments on patch 4 (comments, unspec)
- added .align directive (patch 5)
- fixed use of kernel helpers (__kernel_cmpxchg, __kernel_dmb) (patch 6)
- code factorization in patch 7
- typos/internal function name in patch 8
- improved patch 12
- dropped patch 16
- patch 20 introduces arm_arch*_thumb_ok effective targets to help
   skip some tests
- I tested patch 2 on xtensa-buildroot-uclinux-uclibc, it adds many
   new tests, but a few regressions
   (https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00713.html)
- I compiled and executed several LTP tests to exercise pthreads and signals
- I wrote and executed a simple testcase to change the interaction
   with __kernel_cmpxchg (ie. call the kernel helper rather than use an
   implementation in libgcc as requested by Richard)

Changes between v2 and v3:
- added doc entry for -mfdpic new option
- took Kyrill's comments into account (use "Armv7" instead of "7",
   code factorization, use preprocessor instead of hard-coding "r9",
   remove leftover code for thumb1 support, fixed comments)
- rebase over recent trunk
- patches with changes: 1, 2 (commit message), 3 (rebase), 4, 6, 7, 9,
   14 (rebase), 19 (rebase)

Changes between v1 and v2:
- fix GNU coding style
- exit with an error for pre-Armv7
- use ACLE __ARM_ARCH and remove dead code for pre-Armv4
- remove unsupported attempts of pre-Armv7/thumb1 support
- add instructions in comm

Re: [PR 91579] Avoid creating redundant PHI nodes in tail-call pass

2019-08-29 Thread Martin Jambor
Hi,

On Thu, Aug 29 2019, Richard Biener wrote:
> On Thu, Aug 29, 2019 at 11:04 AM Martin Jambor  wrote:
>>
>> Hi,
>>
>> when turning a tail-recursive call into a loop, the tail-call pass
>> creates a phi node for each gimple_reg function parameter that has any
>> use at all, even when the value passed to the original call is the same
>> as the received one, when it is the parameter's default definition.
>> This results in a redundant phi node in which one argument is the
>> default definition and all others are the LHS of the same phi node.  See
>> the Bugzilla entry for an example.  These phi nodes in turn confuse
>> ipa-prop.c which cannot skip them and may not create a pass-through jump
>> function when it should.
>>
>> Fixed by the following patch which just adds a bitmap to remember where
>> there are non-default-defs passed to a tail-recursive call and then
>> creates phi nodes only for such parameters.  It has passed bootstrap and
>> testing on x86_64-linux.
>>
>> OK for trunk?
>
> OK.  Eventually arg_needs_copy_p can be elided completely
> and pre-computed into the bitmap so we can just check
> the positional?  And rename the bitmap to arg_need_copy itself.

Like this?  Bootstrapped and tested on x86_64-linux.

Thanks,

Martin
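
To illustrate the idea at source level (a hand-written sketch, not
output of the pass; the function names are made up): in the function
below STEP is passed to the recursive call unchanged, so only N and ACC
need a loop-carried copy when the recursion is turned into a loop.

```c
/* Tail-recursive form.  STEP is forwarded as-is, which corresponds to
   the call argument being the parameter's default definition.
   STEP must be positive for the recursion to terminate.  */
static long
sum_down_rec (long n, long step, long acc)
{
  if (n <= 0)
    return acc;
  return sum_down_rec (n - step, step, acc + n);
}

/* Loop form as one would write it by hand.  Only the parameters whose
   value changes across the "call" (N and ACC) are updated; STEP needs
   no copy, which is the case the bitmap is meant to record.  */
static long
sum_down_loop (long n, long step, long acc)
{
  for (;;)
    {
      if (n <= 0)
        return acc;
      long next_n = n - step;
      long next_acc = acc + n;
      n = next_n;
      acc = next_acc;
    }
}
```

In terms of the patch, the bits for N and ACC would be set in the
bitmap and the bit for STEP would stay clear.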


2019-08-29  Martin Jambor  

tree-optimization/91579
* tree-tailcall.c (tailr_arg_needs_copy): New variable.
(find_tail_calls): Allocate tailr_arg_needs_copy and set its bits as
appropriate.
(arg_needs_copy_p): Removed.
(eliminate_tail_call): Test tailr_arg_needs_copy instead of calling
arg_needs_copy_p.
(tree_optimize_tail_calls_1): Likewise.  Free tailr_arg_needs_copy.

testsuite/
* gcc.dg/tree-ssa/pr91579.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr91579.c | 22 
 gcc/tree-tailcall.c | 48 +
 2 files changed, 47 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr91579.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c b/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c
new file mode 100644
index 000..ee752be1a85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailr1" } */
+
+typedef long unsigned int size_t;
+typedef int (*compare_t)(const void *, const void *);
+
+int partition (void *base, size_t nmemb, size_t size, compare_t cmp);
+
+void
+my_qsort (void *base, size_t nmemb, size_t size, compare_t cmp)
+{
+  int pt;
+  if (nmemb > 1)
+{
+  pt = partition (base, nmemb, size, cmp);
+  my_qsort (base, pt + 1, size, cmp);
+  my_qsort ((void*)((char*) base + (pt + 1) * size),
+   nmemb - pt - 1, size, cmp);
+}
+}
+
+/* { dg-final { scan-tree-dump-not "cmp\[^\r\n\]*PHI" "tailr1" } } */
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index a4b563efd73..4824a5e650f 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -126,6 +126,11 @@ struct tailcall
accumulator.  */
 static tree m_acc, a_acc;
 
+/* Bitmap with a bit for each function parameter which is set to true if we
+   have to copy the parameter for conversion of tail-recursive calls.  */
+
+static bitmap tailr_arg_needs_copy;
+
 static bool optimize_tail_call (struct tailcall *, bool);
 static void eliminate_tail_call (struct tailcall *);
 
@@ -727,6 +732,18 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
  gimple_stmt_iterator mgsi = gsi_for_stmt (stmt);
  gsi_move_before (&mgsi, &gsi);
}
+  if (!tailr_arg_needs_copy)
+   tailr_arg_needs_copy = BITMAP_ALLOC (NULL);
+  for (param = DECL_ARGUMENTS (current_function_decl), idx = 0;
+  param;
+  param = DECL_CHAIN (param), idx++)
+   {
+ tree ddef, arg = gimple_call_arg (call, idx);
+ if (is_gimple_reg (param)
+ && (ddef = ssa_default_def (cfun, param))
+ && (arg != ddef))
+   bitmap_set_bit (tailr_arg_needs_copy, idx);
+   }
 }
 
   nw = XNEW (struct tailcall);
@@ -905,25 +922,6 @@ decrease_profile (basic_block bb, profile_count count)
 }
 }
 
-/* Returns true if argument PARAM of the tail recursive call needs to be copied
-   when the call is eliminated.  */
-
-static bool
-arg_needs_copy_p (tree param)
-{
-  tree def;
-
-  if (!is_gimple_reg (param))
-return false;
-
-  /* Parameters that are only defined but never used need not be copied.  */
-  def = ssa_default_def (cfun, param);
-  if (!def)
-return false;
-
-  return true;
-}
-
 /* Eliminates tail call described by T.  TMP_VARS is a list of
temporary variables used to copy the function arguments.  */
 
@@ -1005,7 +1003,7 @@ eliminate_tail_call (struct tailcall *t)
param;
param = DECL_CHAIN (param), idx++)
 {
-  if (!arg_needs_copy_p (param))
+  if (!bitmap_bit_p (tailr_arg_needs_copy, idx))
continue;
 
   arg = gimple_call_arg

[PATCH][AArch64] Vectorize MULH(R)S patterns with SVE2 instructions

2019-08-29 Thread Yuliang Wang
This patch allows for more efficient SVE2 vectorization of Multiply High with 
Round and Scale (MULHRS) patterns.

The example snippet:

uint16_t a[N], b[N], c[N];

void foo_round (void)
{
for (int i = 0; i < N; i++)
a[i] = ((((int32_t)b[i] * (int32_t)c[i]) >> 14) + 1) >> 1;
}

... previously vectorized to:

foo_round:
...
ptrue   p0.s
whilelo p1.h, wzr, w2
ld1h    {z2.h}, p1/z, [x4, x0, lsl #1]
ld1h    {z0.h}, p1/z, [x3, x0, lsl #1]
uunpklo z3.s, z2.h  //
uunpklo z1.s, z0.h  //
uunpkhi z2.s, z2.h  //
uunpkhi z0.s, z0.h  //
mul     z1.s, p0/m, z1.s, z3.s  //
mul     z0.s, p0/m, z0.s, z2.s  //
asr     z1.s, z1.s, #14 //
asr     z0.s, z0.s, #14 //
add     z1.s, z1.s, #1  //
add     z0.s, z0.s, #1  //
asr     z1.s, z1.s, #1  //
asr     z0.s, z0.s, #1  //
uzp1    z0.h, z1.h, z0.h  //
st1h    {z0.h}, p1, [x1, x0, lsl #1]
inch    x0
whilelo p1.h, w0, w2
b.ne    28
ret

... and now vectorizes to:

foo_round:
...
whilelo p0.h, wzr, w2
nop
ld1h    {z1.h}, p0/z, [x4, x0, lsl #1]
ld1h    {z2.h}, p0/z, [x3, x0, lsl #1]
umullb  z0.s, z1.h, z2.h  //
umullt  z1.s, z1.h, z2.h  //
rshrnb  z0.h, z0.s, #15 //
rshrnt  z0.h, z1.s, #15 //
st1h    {z0.h}, p0, [x1, x0, lsl #1]
inch    x0
whilelo p0.h, w0, w2
b.ne    28
ret
nop

Also supported are:

* Non-rounding cases

The equivalent example snippet:

void foo_trunc (void)
{
for (int i = 0; i < N; i++)
a[i] = ((int32_t)b[i] * (int32_t)c[i]) >> 15;
}

... vectorizes with SHRNT/SHRNB

* 32-bit and 8-bit input/output types

* Signed output types

SMULLT/SMULLB are generated instead

SQRDMULH was considered as a potential single-instruction optimization but 
saturates the intermediate value instead of truncating.
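
For reference, here is a scalar sketch of why the rounding case can use
a single rounding shift by 15.  It assumes the unsigned variant, with
the product widened to uint32_t so that no intermediate value can
overflow; the function names are made up for this note.

```c
#include <stdint.h>

/* The source-level pattern: shift by 14, add 1, shift by 1.  */
static uint16_t
mulhrs_two_step (uint16_t b, uint16_t c)
{
  uint32_t prod = (uint32_t) b * (uint32_t) c;
  return (uint16_t) (((prod >> 14) + 1) >> 1);
}

/* The same value as one rounding shift right by 15: add half of the
   final unit (1 << 14), shift, then truncate to 16 bits.  This is the
   arithmetic a rounding narrowing shift is assumed to perform.  */
static uint16_t
mulhrs_rounding_shift (uint16_t b, uint16_t c)
{
  uint32_t prod = (uint32_t) b * (uint32_t) c;
  return (uint16_t) ((prod + (UINT32_C (1) << 14)) >> 15);
}
```

Writing prod as q * 2^14 + r with 0 <= r < 2^14, both forms reduce to
floor ((q + 1) / 2), so they agree for every input.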

Best Regards,
Yuliang Wang


ChangeLog:

2019-08-22  Yuliang Wang  

* config/aarch64/aarch64-sve2.md: support for SVE2 instructions 
[S/U]MULL[T/B] + [R]SHRN[T/B] and MULHRS pattern variants
* config/aarch64/iterators.md: iterators and attributes for above
* internal-fn.def: internal functions for MULH[R]S patterns
* optabs.def: optabs definitions for above and sign variants
* tree-vect-patterns.c (vect_recog_multhi_pattern): pattern recognition 
function for MULHRS
* gcc.target/aarch64/sve2/mulhrs_1.c: new test for all variants


rb11655.patch
Description: rb11655.patch


[PATCH] PR libstdc++/91067 add more missing exports for directory iterators

2019-08-29 Thread Jonathan Wakely

PR libstdc++/91067
* acinclude.m4 (libtool_VERSION): Bump to 6:28:0.
* configure: Regenerate.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.28): Add new version. Export
missing symbols.
* testsuite/27_io/filesystem/iterators/91067.cc: Test move
constructors.
* testsuite/util/testsuite_abi.cc: Add new symbol version.

As mentioned yesterday, we need to add some more exports for
std::filesystem directory iterators. As discussed in PR 91067 Clang
inlines the move constructor and optimises it to a tail call to the C2
move constructor of __shared_ptr, which wasn't exported.

Tested x86_64-linux, i686-linux, powerpc64-linux. Committing to trunk
and (later today) gcc-9-branch.


commit c6c577f0fd4e807f071a2a7d519e109733e3c704
Author: Jonathan Wakely 
Date:   Tue Aug 27 15:20:10 2019 +0100

PR libstdc++/91067 add more missing exports for directory iterators

PR libstdc++/91067
* acinclude.m4 (libtool_VERSION): Bump to 6:28:0.
* configure: Regenerate.
* config/abi/pre/gnu.ver (GLIBCXX_3.4.28): Add new version. Export
missing symbols.
* testsuite/27_io/filesystem/iterators/91067.cc: Test move
constructors.
* testsuite/util/testsuite_abi.cc: Add new symbol version.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 24145fdf1ce..9485b1fecee 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -3832,7 +3832,7 @@ changequote([,])dnl
 fi
 
 # For libtool versioning info, format is CURRENT:REVISION:AGE
-libtool_VERSION=6:27:0
+libtool_VERSION=6:28:0
 
 # Everything parsed; figure out what files and settings to use.
 case $enable_symvers in
diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index ff4b74cb971..07a00036827 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2268,11 +2268,11 @@ GLIBCXX_3.4.26 {
 
 GLIBCXX_3.4.27 {
 
-# __shared_ptr<_Dir>::__shared_ptr()
+# __shared_ptr<_Dir>::__shared_ptr() (base object ctor)
 
_ZNSt12__shared_ptrINSt10filesystem4_DirELN9__gnu_cxx12_Lock_policyE[012]EEC2Ev;
 
_ZNSt12__shared_ptrINSt10filesystem7__cxx114_DirELN9__gnu_cxx12_Lock_policyE[012]EEC2Ev;
 
-# __shared_ptr::__shared_ptr()
+# __shared_ptr::__shared_ptr() 
(base object ctor)
 
_ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEC2Ev;
 
_ZNSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEC2Ev;
 
@@ -2282,6 +2282,18 @@ GLIBCXX_3.4.27 {
 
 } GLIBCXX_3.4.26;
 
+GLIBCXX_3.4.28 {
+
+# __shared_ptr<_Dir>::__shared_ptr(__shared_ptr&&) (base object ctor)
+
_ZNSt12__shared_ptrINSt10filesystem4_DirELN9__gnu_cxx12_Lock_policyE[012]EEC2EOS4_;
+
_ZNSt12__shared_ptrINSt10filesystem7__cxx114_DirELN9__gnu_cxx12_Lock_policyE[012]EEC2EOS5_;
+
+# 
__shared_ptr::__shared_ptr(__shared_ptr&&)
 (base object ctor)
+
_ZNSt12__shared_ptrINSt10filesystem28recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEC2EOS5_;
+
_ZNSt12__shared_ptrINSt10filesystem7__cxx1128recursive_directory_iterator10_Dir_stackELN9__gnu_cxx12_Lock_policyE[012]EEC2EOS6_;
+
+} GLIBCXX_3.4.27;
+
 # Symbols in the support library (libsupc++) have their own tag.
 CXXABI_1.3 {
 
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/91067.cc b/libstdc++-v3/testsuite/27_io/filesystem/iterators/91067.cc
index 54172d9f20b..39fbc7b5d96 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/iterators/91067.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/91067.cc
@@ -37,9 +37,25 @@ test02()
   d = std::move(d);
 }
 
+void
+test03()
+{
+  std::filesystem::directory_iterator d;
+  auto d2 = std::move(d);
+}
+
+void
+test04()
+{
+  std::filesystem::recursive_directory_iterator d;
+  auto d2 = std::move(d);
+}
+
 int
 main()
 {
   test01();
   test02();
+  test03();
+  test04();
 }
diff --git a/libstdc++-v3/testsuite/util/testsuite_abi.cc b/libstdc++-v3/testsuite/util/testsuite_abi.cc
index 1277972049f..dfce3741521 100644
--- a/libstdc++-v3/testsuite/util/testsuite_abi.cc
+++ b/libstdc++-v3/testsuite/util/testsuite_abi.cc
@@ -208,6 +208,7 @@ check_version(symbol& test, bool added)
   known_versions.push_back("GLIBCXX_3.4.25");
   known_versions.push_back("GLIBCXX_3.4.26");
   known_versions.push_back("GLIBCXX_3.4.27");
+  known_versions.push_back("GLIBCXX_3.4.28");
   known_versions.push_back("CXXABI_1.3");
   known_versions.push_back("CXXABI_LDBL_1.3");
   known_versions.push_back("CXXABI_1.3.1");
@@ -239,7 +240,7 @@ check_version(symbol& test, bool added)
test.version_status = symbol::incompatible;
 
   // Check that added symbols are added in the latest pre-release version.
-  bool latestp = (test.version_name == "GLIBCXX_3.4.27"
+  bool 

Re: [PATCH/RFC] Simplify wrapped RTL op

2019-08-29 Thread Segher Boessenkool
Hi Robin,

On Thu, Aug 29, 2019 at 11:08:11AM +0200, Robin Dapp wrote:
> >> PR37451.  Not clear what target that regressed on, btw.
> > 
> > And PR55190 and PR67288 and probably more.
> 
> Thanks for finding those.  So the hope is to get this fixed or rather
> move towards a fix with the patch series that's currently reviewed which
> injects some doloop knowledge into ivopts?

The long-term plan is to do more and more of the loop optimisation earlier,
at Gimple level, and to certainly do almost no analysis on the RTL, as done
currently.  But this is a longer term still somewhat vague plan.

Right now ivopts decides if a loop can use doloop, and costs stuff based
on that.  Next steps will be to communicate "please use / do not use doloop
here" down to RTL.  Perhaps the doloop expansion should move closer to the
expand pass as well.

And it's not just doloop -- the unrolling should be done earlier, too.

How much of this will ever work out, and how much of it will make GCC 10,
hrm I need a new crystal ball :-)

> As said before, I was thinking of storing niter + 1 somewhere and use
> this instead of doing the + 1 later when it cannot be simplified
> anymore. But if we expect a larger rewrite anyway, it's probably not
> worthwhile to pursue that now.

This sounds like a pretty simple short-term solution.  If it works, I'm
all for it :-)


Segher


Re: [PATCH] Fix bootstrap miscompare by STV (PR91580)

2019-08-29 Thread Richard Biener
On Thu, 29 Aug 2019, Richard Biener wrote:

> On Thu, 29 Aug 2019, Jakub Jelinek wrote:
> 
> > On Thu, Aug 29, 2019 at 12:04:53PM +0200, Richard Biener wrote:
> > > + else
> > 
> > Perhaps use
> > else if (MAY_HAVE_DEBUG_BIND_INSNS)
> > instead so that you don't walk it once again if there can't be DEBUG_INSNs?
> 
> Sure - will do as followup to unbreak bootstrap w/o re-testing this.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-08-29  Richard Biener  

* config/i386/i386-features.c (general_scalar_chain::convert_insn):
Guard debug work with MAY_HAVE_DEBUG_BIND_INSNS.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 275030)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -893,7 +893,7 @@ general_scalar_chain::convert_insn (rtx_
if (use)
  convert_reg (insn, DF_REF_REG (ref),
   *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
-   else
+   else if (MAY_HAVE_DEBUG_BIND_INSNS)
  {
/* If we generated a scalar copy we can leave debug-insns
   as-is, if not, we have to adjust them.  */


Re: [Preprocessor] small cleanups

2019-08-29 Thread Nathan Sidwell

On 8/29/19 3:57 AM, Bernhard Reutner-Fischer wrote:

On Wed, 28 Aug 2019 14:42:57 -0400
Nathan Sidwell  wrote:


/* If opened with #import or contains #pragma once.  */
-  bool once_only;
+  bool once_only : 1;


I'm curious why you have them as bool and not unsigned?


because they're true/false values and this is C++.
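
A small sketch of the practical difference, written in C with
<stdbool.h> rather than C++ (the conversion rules shown are the same
for this case; the struct and function names are made up):

```c
#include <stdbool.h>

struct flags_bool
{
  bool once_only : 1;
};

struct flags_uint
{
  unsigned once_only : 1;
};

/* Storing into a bool bit-field converts to bool first, so any
   nonzero value becomes 1.  */
static int
store_bool (int v)
{
  struct flags_bool f;
  f.once_only = v;
  return f.once_only;
}

/* Storing into a one-bit unsigned field keeps only the low bit, so an
   even nonzero value such as 2 silently becomes 0.  */
static int
store_uint (int v)
{
  struct flags_uint f;
  f.once_only = v;
  return f.once_only;
}
```

So a bool bit-field keeps the true/false meaning even when the value
stored is not exactly 0 or 1.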

nathan

--
Nathan Sidwell


Re: [PATCH] Fix bootstrap miscompare by STV (PR91580)

2019-08-29 Thread Richard Biener
On Thu, 29 Aug 2019, Uros Bizjak wrote:

> On Thu, Aug 29, 2019 at 12:04 PM Richard Biener  wrote:
> >
> >
> > The following fixes the bootstrap-debug miscompare caused by STV
> > where we ended up with chain-to-scalar copies just because of
> > debug uses.  Instead we have to avoid that, eventually substituting
> > into debug uses or resetting debug stmts when there are reaching
> > defs from both inside and outside of the chain (since we rename
> > all in-chain defs).
> >
> > Bootstrapped on i686-linux-gnu (with a setup previously
> > reproducing the miscompare).  Bootstrapped on x86_64-unknown-linux-gnu,
> > testing in progress.
> >
> > OK for trunk?
> >
> > Thanks,
> > Richard.
> >
> > 2019-08-29  Richard Biener  
> >
> > PR bootstrap/91580
> > * config/i386/i386-features.c (general_scalar_chain::convert_insn):
> > Do not emit scalar copies for debug-insns, instead replace
> > their uses with the reg copy used in the chain or reset them
> > if there is a reaching definition outside of the chain as well.
> 
> OK.

r275030.

> Let's fix the breakage, and maybe later we could look into merging
> with TImode debug fixups (which looks similar to the functionality in
> this patch).

I don't think that's easily possible since the TImode chains still
work in the way of having all defs of a pseudo in a chain, so
code generation and replacement is different.  Rather TImode
chains could be handled by the generic machinery by only making
loads, stores and reg-reg copies candidates.

Richard.

> Thanks,
> Uros.
> 
> > Index: gcc/config/i386/i386-features.c
> > ===
> > --- gcc/config/i386/i386-features.c (revision 274991)
> > +++ gcc/config/i386/i386-features.c (working copy)
> > @@ -880,18 +880,52 @@ general_scalar_chain::convert_op (rtx *o
> >  void
> >  general_scalar_chain::convert_insn (rtx_insn *insn)
> >  {
> > -  /* Generate copies for out-of-chain uses of defs.  */
> > +  /* Generate copies for out-of-chain uses of defs and adjust debug uses.  */
> >for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
> >  if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
> >{
> > df_link *use;
> > for (use = DF_REF_CHAIN (ref); use; use = use->next)
> > - if (DF_REF_REG_MEM_P (use->ref)
> > - || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
> > + if (NONDEBUG_INSN_P (DF_REF_INSN (use->ref))
> > + && (DF_REF_REG_MEM_P (use->ref)
> > + || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref))))
> > break;
> > if (use)
> >   convert_reg (insn, DF_REF_REG (ref),
> >*defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
> > +   else
> > + {
> > +   /* If we generated a scalar copy we can leave debug-insns
> > +  as-is, if not, we have to adjust them.  */
> > +   auto_vec to_reset_debug_insns;
> > +   for (use = DF_REF_CHAIN (ref); use; use = use->next)
> > + if (DEBUG_INSN_P (DF_REF_INSN (use->ref)))
> > +   {
> > + rtx_insn *debug_insn = DF_REF_INSN (use->ref);
> > + /* If there's a reaching definition outside of the
> > +chain we have to reset.  */
> > + df_link *def;
> > + for (def = DF_REF_CHAIN (use->ref); def; def = def->next)
> > +   if (!bitmap_bit_p (insns, DF_REF_INSN_UID (def->ref)))
> > + break;
> > + if (def)
> > +   to_reset_debug_insns.safe_push (debug_insn);
> > + else
> > +   {
> > + *DF_REF_REAL_LOC (use->ref)
> > +   = *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]);
> > + df_insn_rescan (debug_insn);
> > +   }
> > +   }
> > +   /* Have to do the reset outside of the DF_CHAIN walk to not
> > +  disrupt it.  */
> > +   while (!to_reset_debug_insns.is_empty ())
> > + {
> > +   rtx_insn *debug_insn = to_reset_debug_insns.pop ();
> > +   INSN_VAR_LOCATION_LOC (debug_insn) = gen_rtx_UNKNOWN_VAR_LOC ();
> > +   df_insn_rescan_debug_internal (debug_insn);
> > + }
> > + }
> >}
> >
> >/* Replace uses in this insn with the defs we use in the chain.  */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 247165 (AG München)

Re: [PATCH] Fix bootstrap miscompare by STV (PR91580)

2019-08-29 Thread Richard Biener
On Thu, 29 Aug 2019, Jakub Jelinek wrote:

> On Thu, Aug 29, 2019 at 12:04:53PM +0200, Richard Biener wrote:
> > +   else
> 
> Perhaps use
>   else if (MAY_HAVE_DEBUG_BIND_INSNS)
> instead so that you don't walk it once again if there can't be DEBUG_INSNs?

Sure - will do as followup to unbreak bootstrap w/o re-testing this.

Thanks,
Richard.


Re: [PATCH] Update ABI baselines for x86 and powerpc GNU targets

2019-08-29 Thread Jonathan Wakely

On 29/08/19 12:17 +0200, Jakub Jelinek wrote:
> On Wed, Aug 28, 2019 at 10:14:35PM +0100, Jonathan Wakely wrote:
> >   * config/abi/post/i386-linux-gnu/baseline_symbols.txt: Update.
> >   * config/abi/post/i486-linux-gnu/baseline_symbols.txt: Update.
> >   * config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Update.
> >   * config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt: Update.
> >   * config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Update.
> >   * config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt: Update.
> >   * config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Update.
> >
> > I need to bump the library version again, to fix PR 91067 (again).
> >
> > After doing that, the make check-abi test fails, because the
> > GLIBCXX_3.4.27 symbols aren't in the baseline files yet. This updates
> > the baselines to match what's in gcc-9.2 and trunk already. I'll
> > commit this to trunk and gcc-9-branch.
>
> Here is the same thing for aarch64 and s390x (from Fedora libstdc++.so.6
> binaries).  Ok for trunk/9.3?

Yes, thanks!




Re: [PATCH] Update ABI baselines for x86 and powerpc GNU targets

2019-08-29 Thread Jakub Jelinek
On Wed, Aug 28, 2019 at 10:14:35PM +0100, Jonathan Wakely wrote:
>   * config/abi/post/i386-linux-gnu/baseline_symbols.txt: Update.
>   * config/abi/post/i486-linux-gnu/baseline_symbols.txt: Update.
>   * config/abi/post/powerpc-linux-gnu/baseline_symbols.txt: Update.
>   * config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt: Update.
>   * config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Update.
>   * config/abi/post/x86_64-linux-gnu/32/baseline_symbols.txt: Update.
>   * config/abi/post/x86_64-linux-gnu/baseline_symbols.txt: Update.
> 
> I need to bump the library version again, to fix PR 91067 (again).
> 
> After doing that, the make check-abi test fails, because the
> GLIBCXX_3.4.27 symbols aren't in the baseline files yet. This updates
> the baselines to match what's in gcc-9.2 and trunk already. I'll
> commit this to trunk and gcc-9-branch.

Here is the same thing for aarch64 and s390x (from Fedora libstdc++.so.6
binaries).  Ok for trunk/9.3?

2019-08-29  Jakub Jelinek  

Re: [PATCH] Fix bootstrap miscompare by STV (PR91580)

2019-08-29 Thread Jakub Jelinek
On Thu, Aug 29, 2019 at 12:04:53PM +0200, Richard Biener wrote:
> + else

Perhaps use
else if (MAY_HAVE_DEBUG_BIND_INSNS)
instead so that you don't walk it once again if there can't be DEBUG_INSNs?

Jakub


Re: [PATCH] Fix bootstrap miscompare by STV (PR91580)

2019-08-29 Thread Uros Bizjak
On Thu, Aug 29, 2019 at 12:04 PM Richard Biener  wrote:
>
>
> The following fixes the bootstrap-debug miscompare caused by STV
> where we ended up with chain-to-scalar copies just because of
> debug uses.  Instead we have to avoid that, eventually substituting
> into debug uses or resetting debug stmts when there are reaching
> defs from both inside and outside of the chain (since we rename
> all in-chain defs).
>
> Bootstrapped on i686-linux-gnu (with a setup previously
> reproducing the miscompare).  Bootstrapped on x86_64-unknown-linux-gnu,
> testing in progress.
>
> OK for trunk?
>
> Thanks,
> Richard.
>
> 2019-08-29  Richard Biener  
>
> PR bootstrap/91580
> * config/i386/i386-features.c (general_scalar_chain::convert_insn):
> Do not emit scalar copies for debug-insns, instead replace
> their uses with the reg copy used in the chain or reset them
> if there is a reaching definition outside of the chain as well.

OK.

Let's fix the breakage, and maybe later we could look into merging
with TImode debug fixups (which looks similar to the functionality in
this patch).

Thanks,
Uros.

> Index: gcc/config/i386/i386-features.c
> ===
> --- gcc/config/i386/i386-features.c (revision 274991)
> +++ gcc/config/i386/i386-features.c (working copy)
> @@ -880,18 +880,52 @@ general_scalar_chain::convert_op (rtx *o
>  void
>  general_scalar_chain::convert_insn (rtx_insn *insn)
>  {
> -  /* Generate copies for out-of-chain uses of defs.  */
> +  /* Generate copies for out-of-chain uses of defs and adjust debug uses.  */
>for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
>  if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
>{
> df_link *use;
> for (use = DF_REF_CHAIN (ref); use; use = use->next)
> - if (DF_REF_REG_MEM_P (use->ref)
> - || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
> + if (NONDEBUG_INSN_P (DF_REF_INSN (use->ref))
> + && (DF_REF_REG_MEM_P (use->ref)
> + || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref))))
> break;
> if (use)
>   convert_reg (insn, DF_REF_REG (ref),
>*defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
> +   else
> + {
> +   /* If we generated a scalar copy we can leave debug-insns
> +  as-is, if not, we have to adjust them.  */
> +   auto_vec to_reset_debug_insns;
> +   for (use = DF_REF_CHAIN (ref); use; use = use->next)
> + if (DEBUG_INSN_P (DF_REF_INSN (use->ref)))
> +   {
> + rtx_insn *debug_insn = DF_REF_INSN (use->ref);
> + /* If there's a reaching definition outside of the
> +chain we have to reset.  */
> + df_link *def;
> + for (def = DF_REF_CHAIN (use->ref); def; def = def->next)
> +   if (!bitmap_bit_p (insns, DF_REF_INSN_UID (def->ref)))
> + break;
> + if (def)
> +   to_reset_debug_insns.safe_push (debug_insn);
> + else
> +   {
> + *DF_REF_REAL_LOC (use->ref)
> +   = *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]);
> + df_insn_rescan (debug_insn);
> +   }
> +   }
> +   /* Have to do the reset outside of the DF_CHAIN walk to not
> +  disrupt it.  */
> +   while (!to_reset_debug_insns.is_empty ())
> + {
> +   rtx_insn *debug_insn = to_reset_debug_insns.pop ();
> +   INSN_VAR_LOCATION_LOC (debug_insn) = gen_rtx_UNKNOWN_VAR_LOC ();
> +   df_insn_rescan_debug_internal (debug_insn);
> + }
> + }
>}
>
>/* Replace uses in this insn with the defs we use in the chain.  */


[PATCH] Fix bootstrap miscompare by STV (PR91580)

2019-08-29 Thread Richard Biener


The following fixes the bootstrap-debug miscompare caused by STV
where we ended up with chain-to-scalar copies just because of
debug uses.  Instead we have to avoid that, eventually substituting
into debug uses or resetting debug stmts when there are reaching
defs from both inside and outside of the chain (since we rename
all in-chain defs).

Bootstrapped on i686-linux-gnu (with a setup previously
reproducing the miscompare).  Bootstrapped on x86_64-unknown-linux-gnu,
testing in progress.

OK for trunk?

Thanks,
Richard.

2019-08-29  Richard Biener  

PR bootstrap/91580
* config/i386/i386-features.c (general_scalar_chain::convert_insn):
Do not emit scalar copies for debug-insns, instead replace
their uses with the reg copy used in the chain or reset them
if there is a reaching definition outside of the chain as well.

Index: gcc/config/i386/i386-features.c
===
--- gcc/config/i386/i386-features.c (revision 274991)
+++ gcc/config/i386/i386-features.c (working copy)
@@ -880,18 +880,52 @@ general_scalar_chain::convert_op (rtx *o
 void
 general_scalar_chain::convert_insn (rtx_insn *insn)
 {
-  /* Generate copies for out-of-chain uses of defs.  */
+  /* Generate copies for out-of-chain uses of defs and adjust debug uses.  */
   for (df_ref ref = DF_INSN_DEFS (insn); ref; ref = DF_REF_NEXT_LOC (ref))
 if (bitmap_bit_p (defs_conv, DF_REF_REGNO (ref)))
   {
df_link *use;
for (use = DF_REF_CHAIN (ref); use; use = use->next)
- if (DF_REF_REG_MEM_P (use->ref)
- || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref)))
+ if (NONDEBUG_INSN_P (DF_REF_INSN (use->ref))
+ && (DF_REF_REG_MEM_P (use->ref)
+ || !bitmap_bit_p (insns, DF_REF_INSN_UID (use->ref))))
break;
if (use)
  convert_reg (insn, DF_REF_REG (ref),
   *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]));
+   else
+ {
+   /* If we generated a scalar copy we can leave debug-insns
+  as-is, if not, we have to adjust them.  */
+   auto_vec to_reset_debug_insns;
+   for (use = DF_REF_CHAIN (ref); use; use = use->next)
+ if (DEBUG_INSN_P (DF_REF_INSN (use->ref)))
+   {
+ rtx_insn *debug_insn = DF_REF_INSN (use->ref);
+ /* If there's a reaching definition outside of the
+chain we have to reset.  */
+ df_link *def;
+ for (def = DF_REF_CHAIN (use->ref); def; def = def->next)
+   if (!bitmap_bit_p (insns, DF_REF_INSN_UID (def->ref)))
+ break;
+ if (def)
+   to_reset_debug_insns.safe_push (debug_insn);
+ else
+   {
+ *DF_REF_REAL_LOC (use->ref)
+   = *defs_map.get (regno_reg_rtx [DF_REF_REGNO (ref)]);
+ df_insn_rescan (debug_insn);
+   }
+   }
+   /* Have to do the reset outside of the DF_CHAIN walk to not
+  disrupt it.  */
+   while (!to_reset_debug_insns.is_empty ())
+ {
+   rtx_insn *debug_insn = to_reset_debug_insns.pop ();
+   INSN_VAR_LOCATION_LOC (debug_insn) = gen_rtx_UNKNOWN_VAR_LOC ();
+   df_insn_rescan_debug_internal (debug_insn);
+ }
+ }
   }
 
   /* Replace uses in this insn with the defs we use in the chain.  */
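
The per-debug-use decision described above can be sketched outside of GCC.
This is only a simplified model under stated assumptions, not the code in
i386-features.c: `struct debug_use`, `classify_debug_use` and the `in_chain`
array are hypothetical stand-ins for the DF def-use chains and the `insns`
bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one debug use and the insn UIDs of the
   definitions reaching it (the DF_REF_CHAIN walk in the patch).  */
struct debug_use
{
  const int *reaching_def_uids;
  size_t n_defs;
};

enum debug_action { SUBSTITUTE_CHAIN_REG, RESET_DEBUG_INSN };

/* IN_CHAIN[uid] is true when insn UID belongs to the converted chain;
   it plays the role of bitmap_bit_p (insns, uid).  */
static enum debug_action
classify_debug_use (const struct debug_use *use, const bool *in_chain,
                    size_t n_insns)
{
  for (size_t i = 0; i < use->n_defs; i++)
    {
      int uid = use->reaching_def_uids[i];
      assert (uid >= 0 && (size_t) uid < n_insns);
      /* A reaching definition outside the chain still writes the old
         register, so no single location describes the value: reset.  */
      if (!in_chain[uid])
        return RESET_DEBUG_INSN;
    }
  /* Every reaching definition was renamed, so the debug use can refer
     to the register copy used inside the chain.  */
  return SUBSTITUTE_CHAIN_REG;
}
```

In the patch itself the resets are queued and applied after the chain walk,
so that the walk is not disturbed; the model above only captures the choice
between the two outcomes.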


Re: [PATCH, i386]: Fix secondary_reload_needed (was: Re: [PATCH, i386]: Improve STV conversion of shifts)

2019-08-29 Thread Uros Bizjak
And the patch...

On Thu, Aug 29, 2019 at 12:00 PM Uros Bizjak  wrote:
>
> As usual with costing changes, the patch exposes a latent problem. The
> patched compiler tries to generate a non-existent DImode move from a mask
> register to an XMM register, and ICEs during reload [1]. The attached patch
> tightens the secondary_reload_needed condition and fixes the issue.
>
> I'm bootstrapping and regression testing the patch, and will post a
> formal submission later today.
>
> [1] https://gcc.gnu.org/ml/gcc-regression/2019-08/msg00537.html
>
> Uros.
>
> On Thu, Aug 29, 2019 at 9:53 AM Uros Bizjak  wrote:
> >
> > On Wed, Aug 28, 2019 at 5:12 PM Uros Bizjak  wrote:
> > >
> > > Attached patch improves costing for STV shifts and corrects reject
> > > condition for out of range shift count operands.
> > >
> > > 2019-08-28  Uroš Bizjak  
> > >
> > > * config/i386/i386-features.c
> > > (general_scalar_chain::compute_convert_gain):
> > > Correct cost for double-word shifts.
> > > (general_scalar_to_vector_candidate_p): Reject count operands
> > > greater or equal to mode bitsize.
> > >
> > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> > >
> > > Committed to mainline SVN.
> >
> > Ouch... I mixed up patches and actually committed the patch that
> > removes maximum from cost of sse<->int moves.
> >
> > I can leave the patch for a day, so we can see the effects of the cost
> > change, and if the patch creates problems, I'll revert it.
> >
> > Sorry for the mixup,
> > Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d2d84eb11663..1c9c719f22a3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18306,32 +18306,36 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
   if (FLOAT_CLASS_P (class1) != FLOAT_CLASS_P (class2))
 return true;
 
-  /* Between mask and general, we have moves no larger than word size.  */
-  if ((MASK_CLASS_P (class1) != MASK_CLASS_P (class2))
-  && (GET_MODE_SIZE (mode) > UNITS_PER_WORD))
-  return true;
-
   /* ??? This is a lie.  We do have moves between mmx/general, and for
  mmx/sse2.  But by saying we need secondary memory we discourage the
  register allocator from using the mmx registers unless needed.  */
   if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2))
 return true;
 
+  /* Between mask and general, we have moves no larger than word size.  */
+  if (MASK_CLASS_P (class1) != MASK_CLASS_P (class2))
+{
+  if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
+ || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+   return true;
+}
+
   if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2))
 {
   /* SSE1 doesn't have any direct moves from other classes.  */
   if (!TARGET_SSE2)
return true;
 
+  /* Between SSE and general, we have moves no larger than word size.  */
+  if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
+ || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+   return true;
+
   /* If the target says that inter-unit moves are more expensive
 than moving through memory, then don't generate them.  */
   if ((SSE_CLASS_P (class1) && !TARGET_INTER_UNIT_MOVES_FROM_VEC)
  || (SSE_CLASS_P (class2) && !TARGET_INTER_UNIT_MOVES_TO_VEC))
return true;
-
-  /* Between SSE and general, we have moves no larger than word size.  */
-  if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
-   return true;
 }
 
   return false;
@@ -18608,15 +18612,7 @@ ix86_register_move_cost (machine_mode mode, reg_class_t class1_i,
   if (MMX_CLASS_P (class1) != MMX_CLASS_P (class2))
 gcc_unreachable ();
 
-  /* Moves between SSE and integer units are expensive.  */
   if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2))
-
-/* ??? By keeping returned value relatively high, we limit the number
-   of moves between integer and SSE registers for all targets.
-   Additionally, high value prevents problem with x86_modes_tieable_p(),
-   where integer modes in SSE registers are not tieable
-   because of missing QImode and HImode moves to, from or between
-   MMX/SSE registers.  */
 return (SSE_CLASS_P (class1)
? ix86_cost->hard_register.sse_to_integer
: ix86_cost->hard_register.integer_to_sse);
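
The tightened condition in the first hunk can be modelled in isolation. The
sketch below is an assumption-laden simplification, not the GCC hook: it
knows only three register classes, ignores the x87 and MMX checks, and the
names `reg_class_kind`, `target_model` and `secondary_memory_needed_model`
are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified register classes; the real hook works on reg_class_t.  */
enum reg_class_kind { RC_GENERAL, RC_SSE, RC_MASK };

/* Hypothetical summary of the target properties the checks rely on.  */
struct target_model
{
  unsigned units_per_word;
  bool sse2;
  bool inter_unit_moves_to_vec;
  bool inter_unit_moves_from_vec;
};

static bool
secondary_memory_needed_model (const struct target_model *t,
                               unsigned mode_size,
                               enum reg_class_kind class1,
                               enum reg_class_kind class2)
{
  bool integer_involved = class1 == RC_GENERAL || class2 == RC_GENERAL;
  assert (t->units_per_word > 0);

  /* Mask registers move directly only to or from general registers,
     and only for modes no larger than a word.  */
  if ((class1 == RC_MASK) != (class2 == RC_MASK)
      && (!integer_involved || mode_size > t->units_per_word))
    return true;

  if ((class1 == RC_SSE) != (class2 == RC_SSE))
    {
      /* SSE1 has no direct moves from other classes.  */
      if (!t->sse2)
        return true;
      /* The same restriction now applies between SSE and general.  */
      if (!integer_involved || mode_size > t->units_per_word)
        return true;
      /* Respect tuning that prefers going through memory.  */
      if ((class1 == RC_SSE && !t->inter_unit_moves_from_vec)
          || (class2 == RC_SSE && !t->inter_unit_moves_to_vec))
        return true;
    }

  return false;
}
```

With a 64-bit model (8-byte words, SSE2, inter-unit moves enabled), an
8-byte mask to SSE move is reported as needing secondary memory, while the
same move between general and SSE registers is not.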


[PATCH, i386]: Fix secondary_reload_needed (was: Re: [PATCH, i386]: Improve STV conversion of shifts)

2019-08-29 Thread Uros Bizjak
As usual with costing changes, the patch exposes a latent problem. The
patched compiler tries to generate a non-existent DImode move from a mask
register to an XMM register, and ICEs during reload [1]. The attached patch
tightens the secondary_reload_needed condition and fixes the issue.

I'm bootstrapping and regression testing the patch, and will post a
formal submission later today.

[1] https://gcc.gnu.org/ml/gcc-regression/2019-08/msg00537.html

Uros.

On Thu, Aug 29, 2019 at 9:53 AM Uros Bizjak  wrote:
>
> On Wed, Aug 28, 2019 at 5:12 PM Uros Bizjak  wrote:
> >
> > Attached patch improves costing for STV shifts and corrects reject
> > condition for out of range shift count operands.
> >
> > 2019-08-28  Uroš Bizjak  
> >
> > * config/i386/i386-features.c
> > (general_scalar_chain::compute_convert_gain):
> > Correct cost for double-word shifts.
> > (general_scalar_to_vector_candidate_p): Reject count operands
> > greater or equal to mode bitsize.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Committed to mainline SVN.
>
> Ouch... I mixed up patches and actually committed the patch that
> removes maximum from cost of sse<->int moves.
>
> I can leave the patch for a day, so we can see the effects of the cost
> change, and if the patch creates problems, I'll revert it.
>
> Sorry for the mixup,
> Uros.
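
For context, the effect of the accidentally committed cost change can be
illustrated with a tiny model. It assumes, as stated elsewhere in the
thread, that the old code clamped the SSE <-> integer move cost to a minimum
of 8; `struct move_costs`, `old_move_cost` and `new_move_cost` are
hypothetical names and the numbers used with them are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of a per-CPU cost table.  */
struct move_costs
{
  int sse_to_integer;
  int integer_to_sse;
};

static int
tuned_cost (const struct move_costs *c, bool from_sse)
{
  int cost = from_sse ? c->sse_to_integer : c->integer_to_sse;
  assert (cost >= 0);
  return cost;
}

/* Old behaviour: any tuned value below 8 was silently ignored.  */
static int
old_move_cost (const struct move_costs *c, bool from_sse)
{
  int cost = tuned_cost (c, from_sse);
  return cost > 8 ? cost : 8;
}

/* New behaviour: the tuned value is used as-is.  */
static int
new_move_cost (const struct move_costs *c, bool from_sse)
{
  return tuned_cost (c, from_sse);
}
```

With a table holding, say, 2 and 6, the old function returns 8 for both
directions while the new one returns 2 and 6, which is why the register
allocator starts preferring inter-unit moves.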


Re: [PR 91579] Avoid creating redundant PHI nodes in tail-call pass

2019-08-29 Thread Richard Biener
On Thu, Aug 29, 2019 at 11:04 AM Martin Jambor  wrote:
>
> Hi,
>
> when turning a tail-recursive call into a loop, the tail-call pass
> creates a phi node for each gimple_reg function parameter that has any
> use at all, even when the value passed to the original call is the same
> as the received one, when it is the parameter's default definition.
> This results in a redundant phi node in which one argument is the
> default definition and all others are the LHS of the same phi node.  See
> the Bugzilla entry for an example.  These phi nodes in turn confuse
> ipa-prop.c, which cannot skip them and may fail to create a pass-through
> jump function when it should.
>
> Fixed by the following patch which just adds a bitmap to remember where
> there are non-default-defs passed to a tail-recursive call and then
> creates phi nodes only for such parameters.  It has passed bootstrap and
> testing on x86_64-linux.
>
> OK for trunk?

OK.  Perhaps arg_needs_copy_p can be elided completely, with its result
pre-computed into the bitmap so that we only need to check the parameter
position?  The bitmap could then be renamed to arg_need_copy itself.

Thanks,
Richard.
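
A rough standalone model of the bitmap idea follows. It is a sketch under
simplifying assumptions, not the tree-tailcall.c code: parameters are
reduced to positions, "is the default definition" is reduced to a boolean
per argument, and `record_tail_call` and `param_needs_phi` are hypothetical
names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PARAMS 32

/* Record one tail-recursive call in *NON_DEFAULT_ARGS.
   ARG_IS_INCOMING[i] is true when the i-th actual argument is simply the
   incoming value of parameter i (its default definition).  */
static void
record_tail_call (uint32_t *non_default_args, const bool *arg_is_incoming,
                  unsigned n_params)
{
  assert (n_params <= MAX_PARAMS);
  for (unsigned idx = 0; idx < n_params; idx++)
    if (!arg_is_incoming[idx])
      *non_default_args |= UINT32_C (1) << idx;
}

/* A parameter needs a phi node in the new loop header only if some
   tail-recursive call passes a different value for it.  */
static bool
param_needs_phi (uint32_t non_default_args, unsigned idx)
{
  assert (idx < MAX_PARAMS);
  return ((non_default_args >> idx) & 1u) != 0;
}
```

For the my_qsort testcase, the tail call changes `base` and `nmemb` but
passes `size` and `cmp` through unchanged, so only the first two positions
end up set and no phi node is needed for `cmp`.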

>
> Martin
>
>
> 2019-08-28  Martin Jambor  
>
> tree-optimization/91579
> * tree-tailcall.c (tailr_non_ddef_args): New variable.
> (find_tail_calls): Allocate tailr_non_ddef_args and set its bits as
> appropriate.
> (arg_needs_copy_p): New parameter idx.  Also check
> tailr_non_ddef_args.
> (tree_optimize_tail_calls_1): Free tailr_non_ddef_args.
>
> testsuite/
> * gcc.dg/tree-ssa/pr91579.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr91579.c | 22 ++
>  gcc/tree-tailcall.c | 38 +++--
>  2 files changed, 52 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr91579.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c b/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c
> new file mode 100644
> index 000..ee752be1a85
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-tailr1" } */
> +
> +typedef long unsigned int size_t;
> +typedef int (*compare_t)(const void *, const void *);
> +
> +int partition (void *base, size_t nmemb, size_t size, compare_t cmp);
> +
> +void
> +my_qsort (void *base, size_t nmemb, size_t size, compare_t cmp)
> +{
> +  int pt;
> +  if (nmemb > 1)
> +{
> +  pt = partition (base, nmemb, size, cmp);
> +  my_qsort (base, pt + 1, size, cmp);
> +  my_qsort ((void*)((char*) base + (pt + 1) * size),
> +   nmemb - pt - 1, size, cmp);
> +}
> +}
> +
> +/* { dg-final { scan-tree-dump-not "cmp\[^\r\n\]*PHI" "tailr1" } } */
> diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
> index a4b563efd73..23d60f492da 100644
> --- a/gcc/tree-tailcall.c
> +++ b/gcc/tree-tailcall.c
> @@ -126,6 +126,13 @@ struct tailcall
> accumulator.  */
>  static tree m_acc, a_acc;
>
> +/* Bitmap with a bit for each function parameter which is set to true if in a
> +   tail-recursion we pass to the actual argument something else than the
> +   default definition of the corresponding formal parameter.  It has no meaning
> +   for non-gimple-register parameters.  */
> +
> +static bitmap tailr_non_ddef_args;
> +
>  static bool optimize_tail_call (struct tailcall *, bool);
>  static void eliminate_tail_call (struct tailcall *);
>
> @@ -727,6 +734,17 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
>   gimple_stmt_iterator mgsi = gsi_for_stmt (stmt);
>   gsi_move_before (&mgsi, &gsi);
> }
> +  if (!tailr_non_ddef_args)
> +   tailr_non_ddef_args = BITMAP_ALLOC (NULL);
> +  for (param = DECL_ARGUMENTS (current_function_decl), idx = 0;
> +  param;
> +  param = DECL_CHAIN (param), idx++)
> +   {
> + tree arg = gimple_call_arg (call, idx);
> + if (is_gimple_reg (param)
> + && (arg != ssa_default_def (cfun, param)))
> +   bitmap_set_bit (tailr_non_ddef_args, idx);
> +   }
>  }
>
>nw = XNEW (struct tailcall);
> @@ -905,11 +923,11 @@ decrease_profile (basic_block bb, profile_count count)
>  }
>  }
>
> -/* Returns true if argument PARAM of the tail recursive call needs to be copied
> -   when the call is eliminated.  */
> +/* Returns true if PARAM, which is the IDX-th argument of the tail recursively
> +   called function, needs to be copied when the call is eliminated.  */
>
>  static bool
> -arg_needs_copy_p (tree param)
> +arg_needs_copy_p (tree param, unsigned idx)
>  {
>tree def;
>
> @@ -918,7 +936,7 @@ arg_needs_copy_p (tree param)
>
>/* Parameters that are only defined but never used need not be copied.  */
>def = ssa_default_def (cfun, param);
> -  if (!def)
> +  if (!def || !bitmap_bit_p (tailr_non_ddef_args, idx))
>  return false;
>
>return true;
> @@ -1005,7 +1023,7 @@ eliminate_tail

Re: [PATCH V5 05/11] bpf: new GCC port

2019-08-29 Thread Jose E. Marchesi


> +/* Return true if an argument at the position indicated by CUM should
> +   be passed by reference.  If the hook returns true, a copy of that
> +   argument is made in memory and a pointer to the argument is passed
> +   instead of the argument itself.  */
> +
> +static bool
> +bpf_pass_by_reference (cumulative_args_t cum ATTRIBUTE_UNUSED,
> +const function_arg_info &arg)
> +{
> +  poly_int64 mode_size = GET_MODE_SIZE (arg.mode);
> +  unsigned num_bytes
> += (arg.type ? arg.type_size_in_bytes() : mode_size);

Sorry, I meant replace the whole expression with arg.type_size_in_bytes ():

static bool
bpf_pass_by_reference (cumulative_args_t cum ATTRIBUTE_UNUSED,
   const function_arg_info &arg)
{
  unsigned num_bytes = arg.type_size_in_bytes ();

I found it weird, but didn't look at the implementation of the arg
methods... too bad I didn't :)

> +
> +  /* Pass aggregates and values bigger than 5 words by reference.
> + Everything else is passed by copy.  */
> +  return ((arg.type && arg.aggregate_type_p ())
> +   || (num_bytes > 8*5));

Here too:

  return (arg.aggregate_type_p ()
  || (num_bytes > 8*5));

(although it now fits easily on one line).

> +/* Update the summarizer variable pointed by CA to advance past an
> +   argument in the argument list.  */
> +
> +static void
> +bpf_function_arg_advance (cumulative_args_t ca,
> +   const function_arg_info &arg)
> +{
> +  CUMULATIVE_ARGS *cum = get_cumulative_args (ca);
> +  poly_int64 mode_size = GET_MODE_SIZE (arg.mode);
> +  unsigned num_bytes = (arg.type
> + ? arg.type_size_in_bytes () : mode_size);

Same here.

> +  unsigned num_words
> += CEIL (num_bytes, UNITS_PER_WORD);

Nit: line break seems unnecessary.

> +
> +  if (*cum <= 5 && *cum + num_words > 5)
> +error ("too many function arguments for eBPF");
> +  else
> +*cum += num_words;

For this to work as intended (i.e. only report an error once), I think
you need to do "*cum += num_words;" even when reporting the error.

Right.
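
To make the "only report an error once" point concrete, here is a small
standalone model of the argument counter. It is a sketch with assumed
names (`arg_state`, `arg_advance_model`), an assumed 8-byte word, and the
5-word limit taken from the quoted code; the `advance_on_error` flag exists
only to compare the two behaviours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ARG_WORDS 5
#define WORD_BYTES 8

/* Hypothetical stand-in for CUMULATIVE_ARGS plus an error counter.  */
struct arg_state
{
  unsigned words_used;
  unsigned errors;
};

static void
arg_advance_model (struct arg_state *s, unsigned num_bytes,
                   bool advance_on_error)
{
  assert (s != NULL);
  /* Round the size up to whole words, like CEIL in the quoted code.  */
  unsigned num_words = (num_bytes + WORD_BYTES - 1) / WORD_BYTES;

  if (s->words_used <= MAX_ARG_WORDS
      && s->words_used + num_words > MAX_ARG_WORDS)
    {
      s->errors++;
      /* Without advancing, the counter stays at or below the limit and
         every later argument trips the same check again.  */
      if (!advance_on_error)
        return;
    }
  s->words_used += num_words;
}
```

Feeding seven one-word arguments through the model gives one error when
the counter advances past the limit, and two errors when it does not.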

> +/* Return the appropriate instruction to CALL to a function.  TARGET
> +   is a RTX denoting the address of the called function.

an RTX

> + if (GET_CODE (op0) == REG
> + && (GET_CODE (op1) == LABEL_REF
> + || GET_CODE (op1) == SYMBOL_REF
> + || GET_CODE (op1) == CONST_INT
> + || GET_CODE (op1) == CONST))
> +   {
> + fprintf (file, "[%s+", reg_names[REGNO (op0)]);
> + output_addr_const (file, op1);
> + fputs ("]", file);
> +   }
> + else
> +   fatal_insn ("invalid address in operand", addr);

Do you support all those codes on the rhs of a PLUS?  I thought it was
just CONST_INT.
   
It is just CONST_INT.  I removed a cpp constant and it had the
previously legal values.  Fixing.
 
> +(define_predicate "reg_or_memory_operand"
> +  (ior (match_operand 0 "register_operand")
> +   (match_operand 0 "memory_operand")))
> +
> +(define_predicate "mov_dst_operand"
> +  (ior (match_operand 0 "register_operand")
> +   (match_operand 0 "memory_operand")))

These are both equivalent to the standard nonimmediate_operand.
It'd be better to use that instead if possible.

Ok.

Looks good otherwise.

Richard


Re: [PATCH] Improve -mavx -mno-avx2 32-byte vector permutations (PR target/91560)

2019-08-29 Thread Uros Bizjak
On Thu, Aug 29, 2019 at 10:41 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch improves especially V8SFmode permutations for
> AVX (non-AVX2) ISA, where we punted way too often, even when we can handle
> it.
> On the
> typedef float __v8sf __attribute__((vector_size (32)));
> typedef double __v4df __attribute__((vector_size (32)));
> typedef int __v8si __attribute__((vector_size (32)));
> typedef long long __v4di __attribute__((vector_size (32)));
> #ifdef __clang__
> #define S(x, y, t, ...) __builtin_shufflevector (x, y, __VA_ARGS__)
> #else
> #define S(x, y, t, ...) __builtin_shuffle (x, y, (t) { __VA_ARGS__ })
> #endif
>
> __v8sf f1 (__v8sf x, __v8sf y) { return S (x, y, __v8si, 0, 8, 9, 10, 11, 12, 13, 14 ); }
> __v8sf f2 (__v8sf x, __v8sf y) { return S (x, y, __v8si, 0, 1, 8, 9, 10, 11, 12, 13 ); }
> testcase we used to emit terrible code (8 BIT_FIELD_REFs + composition
> back), while LLVM emits:
>   vpermilps $144, %xmm1, %xmm2 # xmm2 = xmm1[0,0,1,2]
>   vextractf128 $1, %ymm1, %xmm3
>   vblendps $8, %xmm1, %xmm3, %xmm1 # xmm1 = xmm3[0,1,2],xmm1[3]
>   vpermilps $147, %xmm1, %xmm1 # xmm1 = xmm1[3,0,1,2]
>   vinsertf128 $1, %xmm1, %ymm2, %ymm1
>   vblendps $1, %ymm0, %ymm1, %ymm0 # ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
> and
>   vextractf128 $1, %ymm1, %xmm2
>   vshufpd $1, %xmm2, %xmm1, %xmm2 # xmm2 = xmm1[1],xmm2[0]
>   vmovddup %xmm1, %xmm1 # xmm1 = xmm1[0,0]
>   vinsertf128 $1, %xmm2, %ymm1, %ymm1
>   vblendps $3, %ymm0, %ymm1, %ymm0 # ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]
> With the patch we emit:
>   vpermilps $144, %ymm1, %ymm2
>   vpermilps .LC0(%rip), %ymm1, %ymm1
>   vblendps $238, %ymm2, %ymm0, %ymm0
>   vperm2f128 $1, %ymm1, %ymm1, %ymm1
>   vblendps $16, %ymm1, %ymm0, %ymm0
> and
>   vshufps $68, %ymm1, %ymm0, %ymm0
>   vpermilps .LC1(%rip), %ymm1, %ymm1
>   vperm2f128 $1, %ymm1, %ymm1, %ymm1
>   vblendps $48, %ymm1, %ymm0, %ymm0
> so one insn each shorter than what LLVM emits.
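
As a reading aid for the testcase, the two-input shuffle can be described
by a scalar reference model. This is a sketch of the selection semantics
only (index below 8 picks from the first vector, 8 and above from the
second), not of any of the instruction sequences; `shuffle2_model` is a
hypothetical name.

```c
#include <assert.h>

#define NELT 8

/* Scalar reference for a two-input, 8-element constant permutation:
   SEL[i] indexes into the concatenation of X and Y.  */
static void
shuffle2_model (const float *x, const float *y, const int *sel, float *out)
{
  for (int i = 0; i < NELT; i++)
    {
      int idx = sel[i];
      assert (idx >= 0 && idx < 2 * NELT);
      out[i] = idx < NELT ? x[idx] : y[idx - NELT];
    }
}
```

With the f1 selector { 0, 8, 9, 10, 11, 12, 13, 14 } the result is x[0]
followed by y[0] through y[6], which is why every sequence above ends in a
blend that keeps only the low element(s) of the first operand.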
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-08-29  Jakub Jelinek  
>
> PR target/91560
> * config/i386/i386-expand.c (expand_vec_perm_movs,
> expand_vec_perm_blend, expand_vec_perm_vpermil,
> expand_vec_perm_pshufb, expand_vec_perm_1,
> expand_vec_perm_pshuflw_pshufhw, expand_vec_perm_palignr,
> expand_vec_perm_interleave2, expand_vec_perm_vpermq_perm_1,
> expand_vec_perm_vperm2f128, expand_vec_perm_interleave3,
> expand_vec_perm_vperm2f128_vblend, expand_vec_perm_2vperm2f128_vshuf,
> expand_vec_perm_even_odd, expand_vec_perm_broadcast): Adjust function
> comments - replace ix86_expand_vec_perm_builtin_1 with
> ix86_expand_vec_perm_const_1.
> (expand_vec_perm2_vperm2f128_vblend): New function.
> (ix86_expand_vec_perm_const_1): New forward declaration.  Call
> expand_vec_perm2_vperm2f128_vblend as last resort.
> (canonicalize_perm): Formatting fix.
>
> * gcc.dg/torture/vshuf-8.inc: Add two further permutations.

LGTM, but actually your area ;)

Thanks,
Uros.

> --- gcc/config/i386/i386-expand.c.jj2019-08-27 12:26:25.383089132 +0200
> +++ gcc/config/i386/i386-expand.c   2019-08-28 15:22:43.911004586 +0200
> @@ -16372,7 +16372,7 @@ expand_vselect_vconcat (rtx target, rtx
>return ok;
>  }
>
> -/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
> +/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
> using movss or movsd.  */
>  static bool
>  expand_vec_perm_movs (struct expand_vec_perm_d *d)
> @@ -16408,7 +16408,7 @@ expand_vec_perm_movs (struct expand_vec_
>return true;
>  }
>
> -/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
> +/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
> in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
>
>  static bool
> @@ -16633,7 +16633,7 @@ expand_vec_perm_blend (struct expand_vec
>return true;
>  }
>
> -/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
> +/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
> in terms of the variable form of vpermilps.
>
> Note that we will have already failed the immediate input vpermilps,
> @@ -16709,7 +16709,7 @@ valid_perm_using_mode_p (machine_mode vm
>return true;
>  }
>
> -/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
> +/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
> in terms of pshufb, vpperm, vpermq, vpermd, vpermps or vperm2i128.  */
>
>  static bool
> @@ -17026,7 +17026,7 @@ ix86_expand_vec_one_operand_perm_avx512
>
>  static bool expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool);
>
> -/* A subroutine of ix86_expand_v

Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment

2019-08-29 Thread Christophe Lyon
On Thu, 29 Aug 2019 at 10:58, Kyrill Tkachov
 wrote:
>
> Hi Bernd,
>
> On 8/28/19 10:36 PM, Bernd Edlinger wrote:
> > On 8/28/19 2:07 PM, Christophe Lyon wrote:
> >> Hi,
> >>
> >> This patch causes an ICE when building libgcc's unwind-arm.o
> >> when configuring GCC:
> >> --target  arm-none-linux-gnueabihf --with-mode thumb --with-cpu
> >> cortex-a15 --with-fpu neon-vfpv4:
> >>
> >> The build works for the same target, but --with-mode arm --with-cpu
> >> cortex a9 --with-fpu vfp
> >>
> >> In file included from
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
> >> In function 'get_eit_entry':
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
> >> warning: cast discards 'const' qualifier from pointer target type
> >> [-Wcast-qual]
> >>245 |   ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
> >>| ^
> >> during RTL pass: expand
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
> >> In function 'unwind_phase2_forced':
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
> >> internal compiler error: in gen_movdi, at config/arm/arm.md:5235
> >>319 |   saved_vrs.core = entry_vrs->core;
> >>|   ~~~^
> >> 0x126530f gen_movdi(rtx_def*, rtx_def*)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
> >> 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
> >> 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
> >> 0x897083 emit_move_insn(rtx_def*, rtx_def*)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
> >> 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
> >> 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
> >> 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
> >> 0x89ba1e emit_block_move_via_cpymem
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
> >> 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
> >> block_op_methods, unsigned int, long, unsigned long, unsigned long,
> >> unsigned long, bool, bool*)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
> >> 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
> >> 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
> >> 0x88c1f9 store_field
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
> >> 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
> >> 0x761964 expand_gimple_stmt_1
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
> >> 0x761964 expand_gimple_stmt
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
> >> 0x768583 expand_gimple_basic_block
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
> >> 0x76abc6 execute
> >>  
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538
> >>
> >> Christophe
> >>
> > Okay, sorry for the breakage.
> >
> > What is happening in gen_cpymem_ldrd_strd is of course against the rules:
> >
> > It uses emit_move_insn on only 4-byte aligned DI-mode memory operands.
> >
> > I have a patch for this, which is able to fix the libgcc build on a cross, 
> > but have no
> > possibility to bootstrap the affected target.
> >
> > Could you please help?
>
> Well it's good that the sanitisation is catching the bugs!
>
> Bootstrapping this patch I get another assert with the backtrace:

Thanks for the additional testing, Kyrill!

FWIW, my original report was with a failure to just build GCC for
cortex-a15. I later got the reports of testing cross-toolchains, and
saw other problems on cortex-a9 for instance.
But I guess, you have noticed them with your bootstrap?
on arm-linux-gnueabi
gcc.target/arm/aapcs/align4.c (internal compiler error)
gcc.target/arm/a

Re: [PATCH/RFC] Simplify wrapped RTL op

2019-08-29 Thread Robin Dapp
>> PR37451.  Not clear what target that regressed on, btw.
> 
> And PR55190 and PR67288 and probably more.

Thanks for finding those.  So the hope is to get this fixed or rather
move towards a fix with the patch series that's currently reviewed which
injects some doloop knowledge into ivopts?

As said before, I was thinking of storing niter + 1 somewhere and use
this instead of doing the + 1 later when it cannot be simplified
anymore. But if we expect a larger rewrite anyway, it's probably not
worthwhile to pursue that now.

Regards
 Robin



[PR 91579] Avoid creating redundant PHI nodes in tail-call pass

2019-08-29 Thread Martin Jambor
Hi,

when turning a tail-recursive call into a loop, the tail-call pass
creates a phi node for each gimple_reg function parameter that has any
use at all, even when the value passed to the original call is the same
as the received one, i.e. when it is the parameter's default definition.
This results in a redundant phi node in which one argument is the
default definition and all others are the LHS of the same phi node.  See
the Bugzilla entry for an example.  These phi nodes in turn confuse
ipa-prop.c, which cannot skip them and may not create a pass-through jump
function when it should.

Fixed by the following patch which just adds a bitmap to remember where
there are non-default-defs passed to a tail-recursive call and then
creates phi nodes only for such parameters.  It has passed bootstrap and
testing on x86_64-linux.
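
To illustrate the idea, here is a minimal sketch of the bookkeeping in plain C.
It is a toy model and not GCC code: SSA names are represented as plain
integer ids, and the names non_ddef_args, record_tail_call and
param_needs_phi are made up for this example (non_ddef_args stands in for
tailr_non_ddef_args).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not GCC code.  At most this many parameters are tracked.  */
enum { MAX_PARAMS = 32 };

/* Bit IDX is set when some tail-recursive call passes something other than
   the default definition of parameter IDX.  */
unsigned non_ddef_args;

/* Record one tail-recursive call.  DEFAULT_DEFS[i] is the id of parameter
   i's default definition, CALL_ARGS[i] the id of the value actually passed.  */
void
record_tail_call (const int *default_defs, const int *call_args, size_t n)
{
  for (size_t i = 0; i < n && i < MAX_PARAMS; i++)
    if (call_args[i] != default_defs[i])
      non_ddef_args |= 1u << i;
}

/* A parameter needs a phi node only if some recorded call changed it.  */
bool
param_needs_phi (size_t idx)
{
  return idx < MAX_PARAMS && ((non_ddef_args >> idx) & 1u) != 0;
}
```

For a call shaped like the tail call in the my_qsort testcase, where base and
nmemb change but size and cmp are passed through unchanged, only the first two
parameters would end up with their bit set.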

OK for trunk?

Martin


2019-08-28  Martin Jambor  

tree-optimization/91579
* tree-tailcall.c (tailr_non_ddef_args): New variable.
(find_tail_calls): Allocate tailr_non_ddef_args and set its bits as
appropriate.
(arg_needs_copy_p): New parameter idx.  Also check
tailr_non_ddef_args.
(tree_optimize_tail_calls_1): Free tailr_non_ddef_args.

testsuite/
* gcc.dg/tree-ssa/pr91579.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr91579.c | 22 ++
 gcc/tree-tailcall.c | 38 +++--
 2 files changed, 52 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr91579.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c
new file mode 100644
index 000..ee752be1a85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr91579.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailr1" } */
+
+typedef long unsigned int size_t;
+typedef int (*compare_t)(const void *, const void *);
+
+int partition (void *base, size_t nmemb, size_t size, compare_t cmp);
+
+void
+my_qsort (void *base, size_t nmemb, size_t size, compare_t cmp)
+{
+  int pt;
+  if (nmemb > 1)
+{
+  pt = partition (base, nmemb, size, cmp);
+  my_qsort (base, pt + 1, size, cmp);
+  my_qsort ((void*)((char*) base + (pt + 1) * size),
+   nmemb - pt - 1, size, cmp);
+}
+}
+
+/* { dg-final { scan-tree-dump-not "cmp\[^\r\n\]*PHI" "tailr1" } } */
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index a4b563efd73..23d60f492da 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -126,6 +126,13 @@ struct tailcall
accumulator.  */
 static tree m_acc, a_acc;
 
+/* Bitmap with a bit for each function parameter which is set to true if in a
+   tail-recursion we pass to the actual argument something else than the
+   default definition of the corresponding formal parameter.  It has no meaning
+   for non-gimple-register parameters.  */
+
+static bitmap tailr_non_ddef_args;
+
 static bool optimize_tail_call (struct tailcall *, bool);
 static void eliminate_tail_call (struct tailcall *);
 
@@ -727,6 +734,17 @@ find_tail_calls (basic_block bb, struct tailcall **ret)
  gimple_stmt_iterator mgsi = gsi_for_stmt (stmt);
  gsi_move_before (&mgsi, &gsi);
}
+  if (!tailr_non_ddef_args)
+   tailr_non_ddef_args = BITMAP_ALLOC (NULL);
+  for (param = DECL_ARGUMENTS (current_function_decl), idx = 0;
+  param;
+  param = DECL_CHAIN (param), idx++)
+   {
+ tree arg = gimple_call_arg (call, idx);
+ if (is_gimple_reg (param)
+ && (arg != ssa_default_def (cfun, param)))
+   bitmap_set_bit (tailr_non_ddef_args, idx);
+   }
 }
 
   nw = XNEW (struct tailcall);
@@ -905,11 +923,11 @@ decrease_profile (basic_block bb, profile_count count)
 }
 }
 
-/* Returns true if argument PARAM of the tail recursive call needs to be copied
-   when the call is eliminated.  */
+/* Returns true if PARAM, which is the IDX-th argument of the tail recursively
+   called function, needs to be copied when the call is eliminated.  */
 
 static bool
-arg_needs_copy_p (tree param)
+arg_needs_copy_p (tree param, unsigned idx)
 {
   tree def;
 
@@ -918,7 +936,7 @@ arg_needs_copy_p (tree param)
 
   /* Parameters that are only defined but never used need not be copied.  */
   def = ssa_default_def (cfun, param);
-  if (!def)
+  if (!def || !bitmap_bit_p (tailr_non_ddef_args, idx))
 return false;
 
   return true;
@@ -1005,7 +1023,7 @@ eliminate_tail_call (struct tailcall *t)
param;
param = DECL_CHAIN (param), idx++)
 {
-  if (!arg_needs_copy_p (param))
+  if (!arg_needs_copy_p (param, idx))
continue;
 
   arg = gimple_call_arg (stmt, idx);
@@ -1139,10 +1157,11 @@ tree_optimize_tail_calls_1 (bool opt_tailcalls)
  split_edge (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
 
  /* Copy the args if needed.  */
- for (param = DECL_ARGUMENTS (current_function_decl);
+ 

Re: [PATCH] Fix up switchconv for strict enum types (PR tree-optimization/91351)

2019-08-29 Thread Richard Biener
On Thu, 29 Aug 2019, Jakub Jelinek wrote:

> Hi!
> 
> switchconv uses unsigned_type_for to get an unsigned type to perform
> computations in, which is fine if you just do a comparison in that type or
> similar, but not when actually constructing range-like checks or doing
> further arithmetic on the type.
> The reassoc and fold-const range test optimizations have a range_check_type
> function for that; this patch makes use of that function, which among other
> things uses an INTEGER_TYPE instead of enums/booleans and verifies the
> INTEGER_TYPE minimum/maximum to make sure it properly wraps around.
> One issue is that this function can return NULL on some weird integral
> types (perhaps Ada special integers), so we need to punt in that case.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2019-08-29  Jakub Jelinek  
> 
>   PR tree-optimization/91351
>   * tree-cfg.c (generate_range_test): Use range_check_type instead of
>   unsigned_type_for.
>   * tree-cfgcleanup.c (convert_single_case_switch): Punt if
>   range_check_type returns NULL.
>   * tree-switch-conversion.c (switch_conversion::build_one_array):
>   Use range_check_type instead of unsigned_type_for, don't perform
>   linear opt if it returns NULL.
>   (bit_test_cluster::find_bit_tests): Formatting fix.
>   (bit_test_cluster::emit): Use range_check_type instead of
>   unsigned_type_for.
>   (switch_decision_tree::try_switch_expansion): Punt if range_check_type
>   returns NULL.
> 
>   * g++.dg/opt/pr91351.C: New test.
> 
> --- gcc/tree-cfg.c.jj 2019-07-29 12:56:38.971248016 +0200
> +++ gcc/tree-cfg.c2019-08-28 19:37:50.843262628 +0200
> @@ -9221,7 +9221,7 @@ generate_range_test (basic_block bb, tre
>tree *lhs, tree *rhs)
>  {
>tree type = TREE_TYPE (index);
> -  tree utype = unsigned_type_for (type);
> +  tree utype = range_check_type (type);
>  
>low = fold_convert (utype, low);
>high = fold_convert (utype, high);
> --- gcc/tree-cfgcleanup.c.jj  2019-04-24 10:10:22.816535073 +0200
> +++ gcc/tree-cfgcleanup.c 2019-08-28 19:39:40.032680646 +0200
> @@ -101,6 +101,8 @@ convert_single_case_switch (gswitch *swt
>if (high)
>  {
>tree lhs, rhs;
> +  if (range_check_type (TREE_TYPE (index)) == NULL_TREE)
> + return false;
>generate_range_test (bb, index, low, high, &lhs, &rhs);
>cond = gimple_build_cond (LE_EXPR, lhs, rhs, NULL_TREE, NULL_TREE);
>  }
> --- gcc/tree-switch-conversion.c.jj   2019-07-10 15:53:01.148520370 +0200
> +++ gcc/tree-switch-conversion.c  2019-08-28 19:45:18.062783134 +0200
> @@ -605,7 +605,9 @@ switch_conversion::build_one_array (int
>vec<constructor_elt, va_gc> *constructor = m_constructors[num];
>wide_int coeff_a, coeff_b;
>bool linear_p = contains_linear_function_p (constructor, &coeff_a, 
> &coeff_b);
> -  if (linear_p)
> +  tree type;
> +  if (linear_p
> +  && (type = range_check_type (TREE_TYPE ((*constructor)[0].value))))
>  {
>if (dump_file && coeff_a.to_uhwi () > 0)
>   fprintf (dump_file, "Linear transformation with A = %" PRId64
> @@ -613,13 +615,12 @@ switch_conversion::build_one_array (int
>coeff_b.to_shwi ());
>  
>/* We must use type of constructor values.  */
> -  tree t = unsigned_type_for (TREE_TYPE ((*constructor)[0].value));
>gimple_seq seq = NULL;
> -  tree tmp = gimple_convert (&seq, t, m_index_expr);
> -  tree tmp2 = gimple_build (&seq, MULT_EXPR, t,
> - wide_int_to_tree (t, coeff_a), tmp);
> -  tree tmp3 = gimple_build (&seq, PLUS_EXPR, t, tmp2,
> - wide_int_to_tree (t, coeff_b));
> +  tree tmp = gimple_convert (&seq, type, m_index_expr);
> +  tree tmp2 = gimple_build (&seq, MULT_EXPR, type,
> + wide_int_to_tree (type, coeff_a), tmp);
> +  tree tmp3 = gimple_build (&seq, PLUS_EXPR, type, tmp2,
> + wide_int_to_tree (type, coeff_b));
>tree tmp4 = gimple_convert (&seq, TREE_TYPE (name), tmp3);
>gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
>load = gimple_build_assign (name, tmp4);
> @@ -1351,7 +1352,7 @@ bit_test_cluster::find_bit_tests (vec<cl
>  entire));
>   }
>else
> - for (int i = end - 1; i >=  start; i--)
> + for (int i = end - 1; i >= start; i--)
> output.safe_push (clusters[i]);
>  
>end = start;
> @@ -1484,7 +1485,7 @@ bit_test_cluster::emit (tree index_expr,
>unsigned int i, j, k;
>unsigned int count;
>  
> -  tree unsigned_index_type = unsigned_type_for (index_type);
> +  tree unsigned_index_type = range_check_type (index_type);
>  
>gimple_stmt_iterator gsi;
>gassign *shift_stmt;
> @@ -1794,7 +1795,8 @@ switch_decision_tree::try_switch_expansi
>tree index_type = TREE_TYPE (index_expr);
>basic_block bb = gimple_bb (m_

Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment

2019-08-29 Thread Kyrill Tkachov

Hi Bernd,

On 8/28/19 10:36 PM, Bernd Edlinger wrote:

On 8/28/19 2:07 PM, Christophe Lyon wrote:

Hi,

This patch causes an ICE when building libgcc's unwind-arm.o
when configuring GCC:
--target  arm-none-linux-gnueabihf --with-mode thumb --with-cpu
cortex-a15 --with-fpu neon-vfpv4:

The build works for the same target, but --with-mode arm --with-cpu
cortex a9 --with-fpu vfp

In file included from
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
In function 'get_eit_entry':
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
warning: cast discards 'const' qualifier from pointer target type
[-Wcast-qual]
   245 |   ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
   | ^
during RTL pass: expand
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
In function 'unwind_phase2_forced':
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
internal compiler error: in gen_movdi, at config/arm/arm.md:5235
   319 |   saved_vrs.core = entry_vrs->core;
   |   ~~~^
0x126530f gen_movdi(rtx_def*, rtx_def*)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
0x897083 emit_move_insn(rtx_def*, rtx_def*)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
0x89ba1e emit_block_move_via_cpymem
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
block_op_methods, unsigned int, long, unsigned long, unsigned long,
unsigned long, bool, bool*)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
0x88c1f9 store_field
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
0x761964 expand_gimple_stmt_1
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
0x761964 expand_gimple_stmt
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
0x768583 expand_gimple_basic_block
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
0x76abc6 execute
 
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538

Christophe


Okay, sorry for the breakage.

What is happening in gen_cpymem_ldrd_strd is of course against the rules:

It uses emit_move_insn on only 4-byte aligned DI-mode memory operands.
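
As a rough illustration of the rule being violated, here is a toy model in
plain C (not GCC code; is_aligned and copy8 are names invented for this
sketch): an 8-byte move is only used when both addresses are 8-byte aligned,
otherwise the copy is split into two 4-byte moves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model, not GCC code: true if ADDR is a multiple of ALIGN bytes.
   ALIGN must be a power of two.  */
bool
is_aligned (uintptr_t addr, unsigned align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr & (uintptr_t) (align - 1)) == 0;
}

/* Copy 8 bytes from SRC to DST and return the number of moves used:
   one 8-byte move when both sides are 8-byte aligned, otherwise two
   4-byte moves (standing in for a pair of SImode moves).  */
int
copy8 (void *dst, const void *src)
{
  if (is_aligned ((uintptr_t) dst, 8) && is_aligned ((uintptr_t) src, 8))
    {
      memcpy (dst, src, 8);
      return 1;
    }
  memcpy (dst, src, 4);
  memcpy ((char *) dst + 4, (const char *) src + 4, 4);
  return 2;
}
```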

I have a patch for this, which is able to fix the libgcc build on a cross, but 
have no
possibility to bootstrap the affected target.

Could you please help?


Well it's good that the sanitisation is catching the bugs!

Bootstrapping this patch I get another assert with the backtrace:

$BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h: 
In function '(static initializers for 
$SRC/libstdc++-v3/libsupc++/eh_alloc.cc)':
$BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:129:5: 
internal compiler error: in gen_movv8qi, at config/arm/vec-common.md:29

  129 | {
  | ^
0x14155cb gen_movv8qi(rtx_def*, rtx_def*)
    $SRC/gcc/config/arm/vec-common.md:29
0x96bb89 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
    $SRC/gcc/recog.h:318
0x94bc95 emit_move_insn_1(rtx_def*, rtx_def*)
    $SRC/gcc/expr.c:3694
0x94c05b emit_move_insn(rtx_def*, rtx_def*)
    $SRC/gcc/expr.c:3790
0x10d5ee5 arm_block_set_aligned_vect
    $SRC/gcc/config/arm/arm.c:30204
0x10d6b37 arm_block_set_vect
    $SRC/gcc/config/arm/arm.c:30428
0x10d6caf arm_gen_setmem(rtx_def**)
    $SRC/gcc/config/arm/arm.c:30458
0x14

Re: [PATCH V5 05/11] bpf: new GCC port

2019-08-29 Thread Richard Sandiford
jema...@gnu.org (Jose E. Marchesi) writes:
> +/* Return true if an argument at the position indicated by CUM should
> +   be passed by reference.  If the hook returns true, a copy of that
> +   argument is made in memory and a pointer to the argument is passed
> +   instead of the argument itself.  */
> +
> +static bool
> +bpf_pass_by_reference (cumulative_args_t cum ATTRIBUTE_UNUSED,
> +const function_arg_info &arg)
> +{
> +  poly_int64 mode_size = GET_MODE_SIZE (arg.mode);
> +  unsigned num_bytes
> += (arg.type ? arg.type_size_in_bytes() : mode_size);

Sorry, I meant replace the whole expression with arg.type_size_in_bytes ():

static bool
bpf_pass_by_reference (cumulative_args_t cum ATTRIBUTE_UNUSED,
   const function_arg_info &arg)
{
  unsigned num_bytes = arg.type_size_in_bytes ();

> +
> +  /* Pass aggregates and values bigger than 5 words by reference.
> + Everything else is passed by copy.  */
> +  return ((arg.type && arg.aggregate_type_p ())
> +   || (num_bytes > 8*5));

Here too:

  return (arg.aggregate_type_p ()
  || (num_bytes > 8*5));

(although it now fits easily on one line).

> +/* Update the summarizer variable pointed by CA to advance past an
> +   argument in the argument list.  */
> +
> +static void
> +bpf_function_arg_advance (cumulative_args_t ca,
> +   const function_arg_info &arg)
> +{
> +  CUMULATIVE_ARGS *cum = get_cumulative_args (ca);
> +  poly_int64 mode_size = GET_MODE_SIZE (arg.mode);
> +  unsigned num_bytes = (arg.type
> + ? arg.type_size_in_bytes () : mode_size);

Same here.

> +  unsigned num_words
> += CEIL (num_bytes, UNITS_PER_WORD);

Nit: line break seems unnecessary.

> +
> +  if (*cum <= 5 && *cum + num_words > 5)
> +error ("too many function arguments for eBPF");
> +  else
> +*cum += num_words;

For this to work as intended (i.e. only report an error once), I think
you need to do "*cum += num_words;" even when reporting the error.
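
A small stand-alone model of the difference, assuming one word per argument
(count_errors and its always_advance flag are invented for this sketch, and
the error call is replaced by a counter):

```c
#include <stdbool.h>

/* Limit of five argument words, as in the check quoted above.  */
enum { MAX_ARG_WORDS = 5 };

/* Toy model, not the port's code: walk NARGS one-word arguments and count
   how often the "too many arguments" condition fires.  With ALWAYS_ADVANCE
   false the counter is left alone on overflow (as in the patch), so every
   later argument trips the check again.  With ALWAYS_ADVANCE true the
   counter moves past the limit and the check fires only once.  */
unsigned
count_errors (bool always_advance, unsigned nargs)
{
  unsigned cum = 0;
  unsigned errors = 0;

  for (unsigned i = 0; i < nargs; i++)
    {
      const unsigned num_words = 1;
      bool overflow = cum <= MAX_ARG_WORDS && cum + num_words > MAX_ARG_WORDS;

      if (overflow)
        errors++;
      if (!overflow || always_advance)
        cum += num_words;
    }
  return errors;
}
```

With eight arguments the first variant counts three errors (arguments six,
seven and eight), the second exactly one.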

> +/* Return the appropriate instruction to CALL to a function.  TARGET
> +   is a RTX denoting the address of the called function.

an RTX

> + if (GET_CODE (op0) == REG
> + && (GET_CODE (op1) == LABEL_REF
> + || GET_CODE (op1) == SYMBOL_REF
> + || GET_CODE (op1) == CONST_INT
> + || GET_CODE (op1) == CONST))
> +   {
> + fprintf (file, "[%s+", reg_names[REGNO (op0)]);
> + output_addr_const (file, op1);
> + fputs ("]", file);
> +   }
> + else
> +   fatal_insn ("invalid address in operand", addr);

Do you support all those codes on the rhs of a PLUS?  I thought it was
just CONST_INT.

> +(define_predicate "reg_or_memory_operand"
> +  (ior (match_operand 0 "register_operand")
> +   (match_operand 0 "memory_operand")))
> +
> +(define_predicate "mov_dst_operand"
> +  (ior (match_operand 0 "register_operand")
> +   (match_operand 0 "memory_operand")))

These are both equivalent to the standard nonimmediate_operand.
It'd be better to use that instead if possible.

Looks good otherwise.

Thanks,
Richard


[PATCH] Fix up switchconv for strict enum types (PR tree-optimization/91351)

2019-08-29 Thread Jakub Jelinek
Hi!

switchconv uses unsigned_type_for to get an unsigned type to perform
computations in, which is fine if you just do a comparison in that type or
similar, but not when actually constructing range-like checks or doing
further arithmetic on the type.
The reassoc and fold-const range test optimizations have a range_check_type
function for that; this patch makes use of that function, which among other
things uses an INTEGER_TYPE instead of enums/booleans and verifies the
INTEGER_TYPE minimum/maximum to make sure it properly wraps around.
One issue is that this function can return NULL on some weird integral
types (perhaps Ada special integers), so we need to punt in that case.
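
For readers less familiar with range-like checks, the following plain C
sketch shows the usual trick of folding low <= x && x <= high into a single
unsigned comparison.  It is only an illustration (in_range is a made-up
name, not a GCC function); the point is that the subtraction must wrap around
modulo 2^precision, which is the property the checked type has to guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not GCC code: test LOW <= X && X <= HIGH with one unsigned
   comparison.  Requires LOW <= HIGH.  Values below LOW wrap around to
   large unsigned numbers and therefore fail the comparison.  */
bool
in_range (int x, int low, int high)
{
  assert (low <= high);
  unsigned shifted = (unsigned) x - (unsigned) low;
  unsigned span = (unsigned) high - (unsigned) low;
  return shifted <= span;
}
```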

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-08-29  Jakub Jelinek  

PR tree-optimization/91351
* tree-cfg.c (generate_range_test): Use range_check_type instead of
unsigned_type_for.
* tree-cfgcleanup.c (convert_single_case_switch): Punt if
range_check_type returns NULL.
* tree-switch-conversion.c (switch_conversion::build_one_array):
Use range_check_type instead of unsigned_type_for, don't perform
linear opt if it returns NULL.
(bit_test_cluster::find_bit_tests): Formatting fix.
(bit_test_cluster::emit): Use range_check_type instead of
unsigned_type_for.
(switch_decision_tree::try_switch_expansion): Punt if range_check_type
returns NULL.

* g++.dg/opt/pr91351.C: New test.

--- gcc/tree-cfg.c.jj   2019-07-29 12:56:38.971248016 +0200
+++ gcc/tree-cfg.c  2019-08-28 19:37:50.843262628 +0200
@@ -9221,7 +9221,7 @@ generate_range_test (basic_block bb, tre
 tree *lhs, tree *rhs)
 {
   tree type = TREE_TYPE (index);
-  tree utype = unsigned_type_for (type);
+  tree utype = range_check_type (type);
 
   low = fold_convert (utype, low);
   high = fold_convert (utype, high);
--- gcc/tree-cfgcleanup.c.jj2019-04-24 10:10:22.816535073 +0200
+++ gcc/tree-cfgcleanup.c   2019-08-28 19:39:40.032680646 +0200
@@ -101,6 +101,8 @@ convert_single_case_switch (gswitch *swt
   if (high)
 {
   tree lhs, rhs;
+  if (range_check_type (TREE_TYPE (index)) == NULL_TREE)
+   return false;
   generate_range_test (bb, index, low, high, &lhs, &rhs);
   cond = gimple_build_cond (LE_EXPR, lhs, rhs, NULL_TREE, NULL_TREE);
 }
--- gcc/tree-switch-conversion.c.jj 2019-07-10 15:53:01.148520370 +0200
+++ gcc/tree-switch-conversion.c2019-08-28 19:45:18.062783134 +0200
@@ -605,7 +605,9 @@ switch_conversion::build_one_array (int
   vec<constructor_elt, va_gc> *constructor = m_constructors[num];
   wide_int coeff_a, coeff_b;
   bool linear_p = contains_linear_function_p (constructor, &coeff_a, &coeff_b);
-  if (linear_p)
+  tree type;
+  if (linear_p
+  && (type = range_check_type (TREE_TYPE ((*constructor)[0].value))))
 {
   if (dump_file && coeff_a.to_uhwi () > 0)
fprintf (dump_file, "Linear transformation with A = %" PRId64
@@ -613,13 +615,12 @@ switch_conversion::build_one_array (int
 coeff_b.to_shwi ());
 
   /* We must use type of constructor values.  */
-  tree t = unsigned_type_for (TREE_TYPE ((*constructor)[0].value));
   gimple_seq seq = NULL;
-  tree tmp = gimple_convert (&seq, t, m_index_expr);
-  tree tmp2 = gimple_build (&seq, MULT_EXPR, t,
-   wide_int_to_tree (t, coeff_a), tmp);
-  tree tmp3 = gimple_build (&seq, PLUS_EXPR, t, tmp2,
-   wide_int_to_tree (t, coeff_b));
+  tree tmp = gimple_convert (&seq, type, m_index_expr);
+  tree tmp2 = gimple_build (&seq, MULT_EXPR, type,
+   wide_int_to_tree (type, coeff_a), tmp);
+  tree tmp3 = gimple_build (&seq, PLUS_EXPR, type, tmp2,
+   wide_int_to_tree (type, coeff_b));
   tree tmp4 = gimple_convert (&seq, TREE_TYPE (name), tmp3);
   gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
   load = gimple_build_assign (name, tmp4);
@@ -1351,7 +1352,7 @@ bit_test_cluster::find_bit_tests (vec<cl
  entire));
 }
    else
-   for (int i = end - 1; i >=  start; i--)
+   for (int i = end - 1; i >= start; i--)
  output.safe_push (clusters[i]);
 
   end = start;
@@ -1484,7 +1485,7 @@ bit_test_cluster::emit (tree index_expr,
   unsigned int i, j, k;
   unsigned int count;
 
-  tree unsigned_index_type = unsigned_type_for (index_type);
+  tree unsigned_index_type = range_check_type (index_type);
 
   gimple_stmt_iterator gsi;
   gassign *shift_stmt;
@@ -1794,7 +1795,8 @@ switch_decision_tree::try_switch_expansi
   tree index_type = TREE_TYPE (index_expr);
   basic_block bb = gimple_bb (m_switch);
 
-  if (gimple_switch_num_labels (m_switch) == 1)
+  if (gimple_switch_num_labels (m_switch) == 1
+  || range_check_type (index_type) == NULL_TREE)
 return false;
 
   /* Find the default case target label.  */
--- gcc/testsuite/g++.dg/opt/pr91351.C.jj   2019-08-28 19:53:50.946352281 
+0200
+++ gcc/t

[PATCH] Improve -mavx -mno-avx2 32-byte vector permutations (PR target/91560)

2019-08-29 Thread Jakub Jelinek
Hi!

The following patch improves especially V8SFmode permutations for
AVX (non-AVX2) ISA, where we punted way too often, even when we can handle
it.
On the
typedef float __v8sf __attribute__((vector_size (32)));
typedef double __v4df __attribute__((vector_size (32)));
typedef int __v8si __attribute__((vector_size (32)));
typedef long long __v4di __attribute__((vector_size (32)));
#ifdef __clang__
#define S(x, y, t, ...) __builtin_shufflevector (x, y, __VA_ARGS__)
#else
#define S(x, y, t, ...) __builtin_shuffle (x, y, (t) { __VA_ARGS__ })
#endif

__v8sf f1 (__v8sf x, __v8sf y) { return S (x, y, __v8si, 0, 8, 9, 10, 11, 12, 13, 14 ); }
__v8sf f2 (__v8sf x, __v8sf y) { return S (x, y, __v8si, 0, 1, 8, 9, 10, 11, 12, 13 ); }
testcase we used to emit terrible code (8 BIT_FIELD_REFs + composition
back), while LLVM emits:
        vpermilps       $144, %xmm1, %xmm2 # xmm2 = xmm1[0,0,1,2]
        vextractf128    $1, %ymm1, %xmm3
        vblendps        $8, %xmm1, %xmm3, %xmm1 # xmm1 = xmm3[0,1,2],xmm1[3]
        vpermilps       $147, %xmm1, %xmm1 # xmm1 = xmm1[3,0,1,2]
        vinsertf128     $1, %xmm1, %ymm2, %ymm1
        vblendps        $1, %ymm0, %ymm1, %ymm0 # ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
and
        vextractf128    $1, %ymm1, %xmm2
        vshufpd         $1, %xmm2, %xmm1, %xmm2 # xmm2 = xmm1[1],xmm2[0]
        vmovddup        %xmm1, %xmm1 # xmm1 = xmm1[0,0]
        vinsertf128     $1, %xmm2, %ymm1, %ymm1
        vblendps        $3, %ymm0, %ymm1, %ymm0 # ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]
With the patch we emit:
        vpermilps       $144, %ymm1, %ymm2
        vpermilps       .LC0(%rip), %ymm1, %ymm1
        vblendps        $238, %ymm2, %ymm0, %ymm0
        vperm2f128      $1, %ymm1, %ymm1, %ymm1
        vblendps        $16, %ymm1, %ymm0, %ymm0
and
        vshufps         $68, %ymm1, %ymm0, %ymm0
        vpermilps       .LC1(%rip), %ymm1, %ymm1
        vperm2f128      $1, %ymm1, %ymm1, %ymm1
        vblendps        $48, %ymm1, %ymm0, %ymm0
so one insn each shorter than what LLVM emits.
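
For reference, what the two permutations compute can be written as a scalar
model in plain C.  This is only a sketch for checking results by hand;
shuffle2 is a name invented here and is not part of the patch or the
testsuite.  It follows the two-operand __builtin_shuffle convention: a mask
index below 8 selects from x, an index from 8 to 15 selects from y, and
indices are taken modulo 16.

```c
#include <stddef.h>

/* Number of elements in one V8SF vector.  */
enum { NELT = 8 };

/* Scalar reference model of a two-operand shuffle on 8-element float
   vectors.  OUT must not overlap X or Y.  */
void
shuffle2 (float *out, const float *x, const float *y, const int *mask)
{
  for (size_t i = 0; i < NELT; i++)
    {
      unsigned sel = (unsigned) mask[i] % (2 * NELT);
      out[i] = sel < NELT ? x[sel] : y[sel - NELT];
    }
}
```

With the f1 mask { 0, 8, 9, 10, 11, 12, 13, 14 } the result is x[0] followed
by y[0] through y[6]; with the f2 mask it is x[0], x[1] followed by y[0]
through y[5].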

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-08-29  Jakub Jelinek  

PR target/91560
* config/i386/i386-expand.c (expand_vec_perm_movs,
expand_vec_perm_blend, expand_vec_perm_vpermil,
expand_vec_perm_pshufb, expand_vec_perm_1,
expand_vec_perm_pshuflw_pshufhw, expand_vec_perm_palignr,
expand_vec_perm_interleave2, expand_vec_perm_vpermq_perm_1,
expand_vec_perm_vperm2f128, expand_vec_perm_interleave3,
expand_vec_perm_vperm2f128_vblend, expand_vec_perm_2vperm2f128_vshuf,
expand_vec_perm_even_odd, expand_vec_perm_broadcast): Adjust function
comments - replace ix86_expand_vec_perm_builtin_1 with
ix86_expand_vec_perm_const_1.
(expand_vec_perm2_vperm2f128_vblend): New function.
(ix86_expand_vec_perm_const_1): New forward declaration.  Call
expand_vec_perm2_vperm2f128_vblend as last resort.
(canonicalize_perm): Formatting fix.

* gcc.dg/torture/vshuf-8.inc: Add two further permutations.

--- gcc/config/i386/i386-expand.c.jj2019-08-27 12:26:25.383089132 +0200
+++ gcc/config/i386/i386-expand.c   2019-08-28 15:22:43.911004586 +0200
@@ -16372,7 +16372,7 @@ expand_vselect_vconcat (rtx target, rtx
   return ok;
 }
 
-/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
using movss or movsd.  */
 static bool
 expand_vec_perm_movs (struct expand_vec_perm_d *d)
@@ -16408,7 +16408,7 @@ expand_vec_perm_movs (struct expand_vec_
   return true;
 }
 
-/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
 static bool
@@ -16633,7 +16633,7 @@ expand_vec_perm_blend (struct expand_vec
   return true;
 }
 
-/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
in terms of the variable form of vpermilps.
 
Note that we will have already failed the immediate input vpermilps,
@@ -16709,7 +16709,7 @@ valid_perm_using_mode_p (machine_mode vm
   return true;
 }
 
-/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+/* A subroutine of ix86_expand_vec_perm_const_1.  Try to implement D
in terms of pshufb, vpperm, vpermq, vpermd, vpermps or vperm2i128.  */
 
 static bool
@@ -17026,7 +17026,7 @@ ix86_expand_vec_one_operand_perm_avx512
 
 static bool expand_vec_perm_palignr (struct expand_vec_perm_d *d, bool);
 
-/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
+/* A subroutine of ix86_expand_vec_perm_const_1.  Try to instantiate D
in a single instruction.  */
 
 static bool
@@ -17216,7 +17216,7 @@ expand_vec_perm_1 (struct expand_vec_per
   return false;
 }
 
-/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to

Re: [PATCH] Fix PR91568

2019-08-29 Thread Richard Biener
On Wed, 28 Aug 2019, Richard Sandiford wrote:

> Richard Biener  writes:
> > When making the SLP tree a graph I messed up and in some cases we can
> > fail to propagate a higher max_nunits upwards when re-using a subtree.
> > For the testcase, but only on the gcc-9-branch, this causes us to
> > miss a higher value of 4 completely since it comes from a
> > single widening stmt which we remember but throw out of the window
> > due to an early mismatch fixable by commutating.
> >
> > I hope TYPE_VECTOR_SUBPARTS are all ordered, at least aarch64-sve.exp
> > is clean as it gets with a cc1 cross (it doesn't seem to find
> > gcc/include/stdint.h so has more compile-time errors than necessary...).
> >
> > Richard, is that an OK-ish assumption?
> 
> Yeah, it's probably a reasonable assumption for types and sizes
> currently chosen by the target vectorisation hooks (although not
> more generally).  But the existing max_nunits code goes through
> vect_update_max_nunits, with the idea being to keep the assumption
> in a single place.  So IMO it'd be better to split that into:
> 
> static inline void
> vect_update_max_nunits (poly_uint64 *max_nunits, poly_uint64 nunits)
> {
>   /* All unit counts have the form current_vector_size * X for some
>  rational X, so two unit sizes must have a common multiple.
>  Everything is a multiple of the initial value of 1.  */
>   *max_nunits = force_common_multiple (*max_nunits, nunits);
> }
> 
> static inline void
> vect_update_max_nunits (poly_uint64 *max_nunits, tree vectype)
> {
>   vect_update_max_nunits (max_nunits, TYPE_VECTOR_SUBPARTS (vectype));
> }
> 
> LGTM otherwise.  The estimated_poly_value could end up being a bit
> confusing when reading dumps, but I'll take that as a sign that you
> don't want to add poly_* printers ;-)

:P

The following is what I have applied.  Bootstrapped and tested
on x86_64-unknown-linux-gnu on trunk and branch.
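
To make the propagation easier to follow, here is a simplified model in plain
C.  It is a sketch under assumptions, not the vectorizer code: unit counts
are plain unsigned integers instead of poly_uint64, the common multiple is
taken as the least common multiple, and toy_slp_node, update_max_nunits and
reuse_subtree are names made up for the example.

```c
#include <assert.h>

/* Toy stand-in for an SLP node: only the two fields touched by the fix.  */
struct toy_slp_node
{
  unsigned max_nunits;
  unsigned refcnt;
};

/* Greatest common divisor of two nonzero values.  */
unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Merge NUNITS into *MAX_NUNITS by taking a common multiple of both.
   Everything is a multiple of the initial value of 1.  */
void
update_max_nunits (unsigned *max_nunits, unsigned nunits)
{
  assert (*max_nunits != 0 && nunits != 0);
  *max_nunits = *max_nunits / gcd_u (*max_nunits, nunits) * nunits;
}

/* Re-using an already built subtree: besides bumping the reference count,
   the value recorded in the subtree is merged into the caller's running
   maximum, which is the step that was missing before the fix.  */
void
reuse_subtree (struct toy_slp_node *leader, unsigned *max_nunits)
{
  leader->refcnt++;
  update_max_nunits (max_nunits, leader->max_nunits);
}
```

If a cached subtree recorded 4 and the caller had only seen 2 so far,
re-using the subtree raises the caller's value to 4 instead of leaving it at 2.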

Richard.

2019-08-29  Richard Biener  

PR tree-optimization/91568
* tree-vectorizer.h (_slp_tree::max_nunits): Add.
(vect_update_max_nunits): Add overload for poly_uint64.
* tree-vect-slp.c (vect_create_new_slp_node): Initialize it.
(vect_build_slp_tree): Record max_nunits into the subtree
and merge it upwards.
(vect_print_slp_tree): Print max_nunits.

* gfortran.dg/pr91568.f: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 274983)
+++ gcc/tree-vect-slp.c (working copy)
@@ -129,6 +129,7 @@ vect_create_new_slp_node (vec<stmt_vec_i
   node->refcnt = 1;
+  node->max_nunits = 1;
 
   unsigned i;
   FOR_EACH_VEC_ELT (scalar_stmts, i, stmt_info)
@@ -1051,15 +1052,24 @@ vect_build_slp_tree (vec_info *vinfo,
dump_printf_loc (MSG_NOTE, vect_location, "re-using %sSLP tree %p\n",
 *leader ? "" : "failed ", *leader);
   if (*leader)
-   (*leader)->refcnt++;
+   {
+ (*leader)->refcnt++;
+ vect_update_max_nunits (max_nunits, (*leader)->max_nunits);
+   }
   return *leader;
 }
-  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size, max_nunits,
+  poly_uint64 this_max_nunits = 1;
+  slp_tree res = vect_build_slp_tree_2 (vinfo, stmts, group_size,
+   &this_max_nunits,
matches, npermutes, tree_size,
max_tree_size, bst_map);
-  /* Keep a reference for the bst_map use.  */
   if (res)
-res->refcnt++;
+{
+  res->max_nunits = this_max_nunits;
+  vect_update_max_nunits (max_nunits, this_max_nunits);
+  /* Keep a reference for the bst_map use.  */
+  res->refcnt++;
+}
   bst_map->put (stmts.copy (), res);
   return res;
 }
@@ -1463,9 +1473,10 @@ vect_print_slp_tree (dump_flags_t dump_k
 
   dump_metadata_t metadata (dump_kind, loc.get_impl_location ());
   dump_user_location_t user_loc = loc.get_user_location ();
-  dump_printf_loc (metadata, user_loc, "node%s %p\n",
+  dump_printf_loc (metadata, user_loc, "node%s %p (max_nunits=%u)\n",
   SLP_TREE_DEF_TYPE (node) != vect_internal_def
-  ? " (external)" : "", node);
+  ? " (external)" : "", node,
+  estimated_poly_value (node->max_nunits));
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
 dump_printf_loc (metadata, user_loc, "\tstmt %d %G", i, stmt_info->stmt);
   if (SLP_TREE_CHILDREN (node).is_empty ())
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h   (revision 274983)
+++ gcc/tree-vectorizer.h   (working copy)
@@ -132,6 +132,9 @@ struct _slp_tree {
   unsigned int vec_stmts_size;
   /* Reference count in the SLP graph.  */
   unsigned int refcnt;
+  /* The maximum number of vector elements for the subtree rooted
+ at this node.  */
+  poly_uint64 max_nunits;
   /* Whet

Re: [Preprocessor] small cleanups

2019-08-29 Thread Bernhard Reutner-Fischer
On Wed, 28 Aug 2019 14:42:57 -0400
Nathan Sidwell  wrote:

>/* If opened with #import or contains #pragma once.  */
> -  bool once_only;
> +  bool once_only : 1;

I'm curious why you have them as bool and not unsigned?

thanks,
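
As background to the question, one observable difference between the two declarations is how a stored value is converted. The sketch below is only an illustration and makes no claim about the reason for the choice in the patch; FlagsBool, FlagsUnsigned and the two helper functions are hypothetical names. A bool bit-field converts any nonzero value to true, while a one-bit unsigned bit-field keeps only the low bit.

```cpp
#include <cassert>

// Hypothetical structs mirroring the two possible declarations.
struct FlagsBool {
  bool once_only : 1;
};

struct FlagsUnsigned {
  unsigned once_only : 1;
};

// Store an int into the bool bit-field: the int is first converted to bool,
// so any nonzero value reads back as true.
bool store_in_bool(int value) {
  FlagsBool f{};
  f.once_only = value;
  return f.once_only;
}

// Store an int into the one-bit unsigned bit-field: the value is reduced
// modulo 2, so only the lowest bit survives.
unsigned store_in_unsigned(int value) {
  FlagsUnsigned f{};
  f.once_only = value;
  return f.once_only;
}
```

With these helpers, storing 2 reads back as true from the bool field and as 0 from the unsigned field.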


Re: [PATCH, i386]: Improve STV conversion of shifts

2019-08-29 Thread Uros Bizjak
On Wed, Aug 28, 2019 at 5:12 PM Uros Bizjak  wrote:
>
> The attached patch improves costing for STV shifts and corrects the
> reject condition for out-of-range shift count operands.
>
> 2019-08-28  Uroš Bizjak  
>
> * config/i386/i386-features.c
> (general_scalar_chain::compute_convert_gain):
> Correct cost for double-word shifts.
> (general_scalar_to_vector_candidate_p): Reject count operands
> greater or equal to mode bitsize.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Committed to mainline SVN.

Ouch... I mixed up patches and actually committed the patch that
removes the minimum of 8 from the cost of SSE<->int moves.

I can leave the patch for a day, so we can see the effects of the cost
change, and if the patch creates problems, I'll revert it.

Sorry for the mixup,
Uros.


Re: [PATCH] Use cxx_printable_name for __PRETTY_FUNCTION__ in cp_fname_init.

2019-08-29 Thread Martin Liška
On 8/28/19 10:19 PM, Jason Merrill wrote:
> On 8/28/19 12:29 PM, Martin Liška wrote:
>> The patch restores behavior before r265711 where we used
>> cxx_printable_name for __PRETTY_FUNCTION__.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
>> Thanks,
>> Martin
>>
>> gcc/c-family/ChangeLog:
>>
>> 2019-08-27  Martin Liska  
>>
>>  PR c++/91155
>>  * c-common.c (fname_as_string): Use cxx_printable_name for
>>  __PRETTY_FUNCTION__ same as was used before r265711.
> 
>> -  if (name)
>> -    free (CONST_CAST (char *, name));
> 
> This creates a memory leak for the fname_as_string case.
> 
> Jason
> 

Sure, fixed in the updated patch.

Ready for trunk?
Thanks,
Martin
From 528350107f256b101040bb1074006b812c052e15 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 27 Aug 2019 13:16:08 +0200
Subject: [PATCH] Use cxx_printable_name for __PRETTY_FUNCTION__ in
 cp_fname_init.

gcc/c-family/ChangeLog:

2019-08-27  Martin Liska  

	PR c++/91155
	* c-common.c (fname_as_string): Use cxx_printable_name for
	__PRETTY_FUNCTION__ same as was used before r265711.

gcc/testsuite/ChangeLog:

2019-08-27  Martin Liska  

	PR c++/91155
	* g++.dg/torture/pr91155.C: New test.
---
 gcc/cp/decl.c                          | 20 +++++++++++++++++---
 gcc/testsuite/g++.dg/torture/pr91155.C | 18 ++++++++++++++++++
 2 files changed, 35 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr91155.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 2aef330455f..f72f6f2dac8 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4511,13 +4511,27 @@ cp_fname_init (const char* name, tree *type_p)
 static tree
 cp_make_fname_decl (location_t loc, tree id, int type_dep)
 {
-  const char *const name = (type_dep && in_template_function ()
-			? NULL : fname_as_string (type_dep));
+  const char * name = NULL;
+  bool release_name = false;
+  if (!(type_dep && in_template_function ()))
+{
+  if (current_function_decl == NULL_TREE)
+	name = "top level";
+  else if (type_dep == 1) /* __PRETTY_FUNCTION__ */
+	name = cxx_printable_name (current_function_decl, 2);
+  else if (type_dep == 0) /* __FUNCTION__ */
+	{
+	  name = fname_as_string (type_dep);
+	  release_name = true;
+	}
+  else
+	gcc_unreachable ();
+}
   tree type;
   tree init = cp_fname_init (name, &type);
   tree decl = build_decl (loc, VAR_DECL, id, type);
 
-  if (name)
+  if (release_name)
 free (CONST_CAST (char *, name));
 
   /* As we're using pushdecl_with_scope, we must set the context.  */
diff --git a/gcc/testsuite/g++.dg/torture/pr91155.C b/gcc/testsuite/g++.dg/torture/pr91155.C
new file mode 100644
index 00000000000..04e4f7ab41b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr91155.C
@@ -0,0 +1,18 @@
+/* PR c++/91155.  */
+
+template< char C > struct dummy {};
+
+template< typename T > const char *test()
+{
+  __builtin_printf ("test: %s\n", __PRETTY_FUNCTION__);
+  return __PRETTY_FUNCTION__;
+}
+
+int main()
+{
+if (__builtin_strcmp ("const char* test() [with T = dummy<\'\\000\'>]", test< dummy< '\0' > > ()) != 0)
+{};//  __builtin_abort ();
+if (__builtin_strcmp ("const char* test() [with T = dummy<\'\\\'\'>]", test< dummy< '\'' > > ()) != 0)
+{};//  __builtin_abort ();
+return 0;
+}
-- 
2.22.1
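
The ownership handling that the updated patch introduces with release_name can be sketched in isolation as follows. This is a minimal illustration, not GCC code: make_owned_name and get_borrowed_name are hypothetical stand-ins for a function whose result the caller must free (the fname_as_string path) and one whose result stays owned by the callee (assumed here to be the case for the cxx_printable_name path), and frees_performed exists only so the behaviour can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical: returns a heap copy that the caller must free.
char *make_owned_name() {
  const char *src = "owned";
  char *copy = static_cast<char *>(std::malloc(std::strlen(src) + 1));
  if (copy)
    std::strcpy(copy, src);
  return copy;
}

// Hypothetical: returns a pointer into storage the callee keeps; the caller
// must not free it.
const char *get_borrowed_name() {
  static const char buffer[] = "borrowed";
  return buffer;
}

int frees_performed = 0;  // instrumentation for the example only

// kind == 1 takes the borrowed path, kind == 0 the owned path; the flag
// records whether this function is responsible for freeing the string.
std::size_t use_name(int kind) {
  const char *name = nullptr;
  bool release_name = false;
  if (kind == 1) {
    name = get_borrowed_name();
  } else if (kind == 0) {
    name = make_owned_name();
    release_name = true;
  }

  std::size_t len = name ? std::strlen(name) : 0;

  // Free only what this function owns: freeing unconditionally would hand
  // borrowed storage to free, and never freeing would leak the owned copy.
  if (release_name) {
    std::free(const_cast<char *>(name));
    ++frees_performed;
  }
  return len;
}
```

Tracking ownership with a separate flag, rather than testing the pointer for null, is what lets both kinds of string share one variable in this sketch.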