Re: [PATCH 1/6] Modify gcc driver for parallel compilation

2020-08-24 Thread Richard Biener via Gcc-patches
On Mon, Aug 24, 2020 at 8:06 PM Giuliano Belinassi
 wrote:
>
> Hi, Richi.
>
> On 08/24, Richard Biener wrote:
> > On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
> >  wrote:
> > >
> > > Update the driver for parallel compilation. This process work as
> > > follows:
> > >
> > > When calling gcc, the driver will check if the flag
> > > "-fparallel-jobs" was provided by the user. If yes, then we will
> > > check what is the desired output, and if it can be parallelized.
> > > There are the following cases, which is described:
> > >
> > > 1. -S or -E was provided: We can't run in parallel, as the output
> > >can not be easily merged together into one file.
> > >
> > > 2. -c was provided: When cc1* forks into multiple processes, it
> > >must tell the driver where it stored its generated assembler files.
> > >Therefore we pass a hidden "-fsplit-outputs=filename" to the compiler,
> > >and we check if "filename" was created by it. If yes, we open it,
> > >call assembler for each generated asm file
> > >(this file must not be empty), and link them together with
> > >partial linking to a single .o file. This process is done for each
> > >object file in the argument list.
> > >
> > > 3. -c was not provided, and the final product will be an binary: Here
> > >we proceed exactly as 2., but we avoid doing the partial
> > >linking, feeding the generated object files directly into the final 
> > > link.
> > >
> > > For that to work, we had to heavily modify how the "execute" function
> > > works, extracting common code which is used multiple times, and
> > > also detecting when the command is a call to a compiler or an
> > > assembler, as can be seen in append_split_outputs.
> > >
> > > Finally, we added some tests which reflects all cases found when
> > > bootstrapping the compiler, so development of further features to the
> > > driver get faster for now on.
> >
> > Few comments inline, Joseph may want to comment on the overall
> > structure as driver maintainer (CCed).
> >
> > I know I asked for the changes on the branch to be squashed but
> > the diff below is quite unreadable with the ChangeLog not helping
> > the overall refactoring much.  Is it possible to do some of the
> > factoring/refactoring without any functionality change to make the
> > actual diff easier to follow?
>
> Well, the refactoring is necessary, otherwise I would need to copy and
> paste a really huge amount of code.
>
> What I can do (and sounds reasonable to me) is to break this patch into
> two parts; one with just refactoring changes, and the other adding the
> parallelism engine.
>
> >
> > Thanks,
> > Richard.
> >
> > > gcc/ChangeLog
> > > 2020-08-20  Giuliano Belinassi  
> > >
> > > * common.opt (fsplit-outputs): New flag.
> > > (fparallel-jobs): New flag.
> > > * gcc.c (extra_arg_storer): New class.
> > > (have_S): New variable.
> > > (struct command): Move from execute.
> > > (is_compiler): New function.
> > > (is_assembler): New function.
> > > (get_number_of_args): New function.
> > > (get_file_by_lines): New function.
> > > (identify_asm_file): New function.
> > > (struct infile): New attribute temp_additional_asm.
> > > (current_infile): New variable.
> > > (get_path_to_ld): New function.
> > > (has_hidden_E): New function.
> > > (sort_asm_files): New function.
> > > (append_split_outputs): New function.
> > > (print_command): New function.
> > > (print_commands): New function.
> > > (print_argbuf): New function.
> > > (handle_verbose): Extracted from execute.
> > > (append_valgrind): Same as above.
> > > (async_launch_commands): Same as above.
> > > (await_commands_to_finish): Same as above.
> > > (split_commands): Same as above.
> > > (parse_argbuf): Same as above.
> > > (execute): Refator.
> > > (fsplit_arg): New function.
> > > (alloc_infile): Initialize infiles with 0.
> > > (process_command): Remember when -S was passed.
> > > (do_spec_on_infiles): Remember current infile being processed.
> > > (maybe_run_linker): Replace object files when -o is a executable.
> > > (finalize): Deinitialize temp_object_files.
> > >
> > > gcc/testsuite/ChangeLog:
> > > 20-08-2020  Giuliano Belinassi  
> > >
> > > * driver/driver.exp: New test.
> > > * driver/a.c: New file.
> > > * driver/b.c: New file.
> > > * driver/empty.c: New file.
> > > * driver/foo.c: New file.
> > > ---
> > >  gcc/common.opt  |4 +
> > >  gcc/gcc.c   | 1219 ---
> > >  gcc/testsuite/driver/a.c|6 +
> > >  gcc/testsuite/driver/b.c|6 +
> > >  gcc/testsuite/driver/driver.exp |   80 ++
> > >  gcc/testsuite/driver/empty.c|0
> > >  gcc/testsuite/driver/

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 24, 2020 at 10:43 PM Qing Zhao  wrote:
>
>
>
> > On Aug 24, 2020, at 3:20 PM, Segher Boessenkool 
> >  wrote:
> >
> > Hi!
> >
> > On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
> >>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool 
> >>>  wrote:
> >>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> > On Aug 19, 2020, at 5:57 PM, Segher Boessenkool 
> >  wrote:
> > Numbers on how expensive this is (for what arch, in code size and in
> > execution time) would be useful.  If it is so expensive that no one will
> > use it, it helps security at most none at all :-(
> >>>
> >>> Without numbers on this, no one can determine if it is a good tradeoff
> >>> for them.  And we (the GCC people) cannot know if it will be useful for
> >>> enough users that it will be worth the effort for us.  Which is why I
> >>> keep hammering on this point.
> >> I can collect some run-time overhead data on this, do you have a 
> >> recommendation on what test suite I can use
> >> For this testing? (Is CPU2017 good enough)?
> >
> > I would use something more real-life, not 12 small pieces of code.
>
> Then, what kind of real-life benchmark you are suggesting?
>
> >
> >>> (The other side of the coin is how much this helps prevent exploitation;
> >>> numbers on that would be good to see, too.)
> >>
> >> This can be well showed from the paper:
> >>
> >> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented 
> >> Programming Attacks"
> >>
> >> https://urldefense.com/v3/__https://ieeexplore.ieee.org/document/8445132__;!!GqivPVa7Brio!JbdLvo54xB3ORTeZqpy_PwZsL9drNLaKjbg14bTKMOwxt8LWnjZ8gJWlqtlrFKPh$
> >>   
> >>  >>  >
> >>
> >> Please take a look at this paper.
> >
> > As I told you before, that isn't open information, I cannot reply to
> > any of that.
>
> A little confused here, what’s you mean by “open information”? Is the 
> information in a published paper not open information?

No, because it is behind a paywall.

Uros.


[PATCH] Fortran : ICE for division by zero in declaration PR95882

2020-08-24 Thread Mark Eggleston

Second attempt, this time with the correct attachment.

OK to commit and backport?

[PATCH] Fortran  : ICE for division by zero in declaration PR95882

A length expression containing a divide by zero in a character
declaration will result in an ICE if the constant is anymore
complicated that a contant divided by a constant.

The cause was that char_len_param_value can return MATCH_YES
even if a divide by zero was seen.  Prior to returning check
whether a divide by zero was seen and if so set it to MATCH_ERROR.

2020-08-24  Mark Eggleston 

gcc/fortran

    PR fortran/95882
    * decl.c (char_len_param_value): Check gfc_seen_div0 and
    it is set return MATCH_ERROR.

2020-08-24  Mark Eggleston 

gcc/testsuite/

    PR fortran/95882
    * gfortran.dg/pr95882_1.f90: New test.
    * gfortran.dg/pr95882_2.f90: New test.
    * gfortran.dg/pr95882_3.f90: New test.
    * gfortran.dg/pr95882_4.f90: New test.
    * gfortran.dg/pr95882_5.f90: New test.

--
https://www.codethink.co.uk/privacy.html

>From e97963ec8edc58217d2ff225c58256ebd61c8e7c Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Fri, 21 Aug 2020 06:39:30 +0100
Subject: [PATCH] Fortran  : ICE for division by zero in declaration PR95882

A length expression containing a divide by zero in a character
declaration will result in an ICE if the constant is anymore
complicated that a contant divided by a constant.

The cause was that char_len_param_value can return MATCH_YES
even if a divide by zero was seen.  Prior to returning check
whether a divide by zero was seen and if so set it to MATCH_ERROR.

2020-08-24  Mark Eggleston  

gcc/fortran

	PR fortran/95882
	* decl.c (char_len_param_value): Check gfc_seen_div0 and
	it is set return MATCH_ERROR.

2020-08-24  Mark Eggleston  

gcc/testsuite/

	PR fortran/95882
	* gfortran.dg/pr95882_1.f90: New test.
	* gfortran.dg/pr95882_2.f90: New test.
	* gfortran.dg/pr95882_3.f90: New test.
	* gfortran.dg/pr95882_4.f90: New test.
	* gfortran.dg/pr95882_5.f90: New test.
---
 gcc/fortran/decl.c  | 3 +++
 gcc/testsuite/gfortran.dg/pr95882_1.f90 | 8 
 gcc/testsuite/gfortran.dg/pr95882_2.f90 | 6 ++
 gcc/testsuite/gfortran.dg/pr95882_3.f90 | 6 ++
 gcc/testsuite/gfortran.dg/pr95882_4.f90 | 7 +++
 gcc/testsuite/gfortran.dg/pr95882_5.f90 | 6 ++
 6 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr95882_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr95882_2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr95882_3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr95882_4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/pr95882_5.f90

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index d854b2a0307..c612b492f3e 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -1146,6 +1146,9 @@ char_len_param_value (gfc_expr **expr, bool *deferred)
   gfc_free_expr (e);
 }
 
+  if (gfc_seen_div0)
+m = MATCH_ERROR;
+
   return m;
 
 syntax:
diff --git a/gcc/testsuite/gfortran.dg/pr95882_1.f90 b/gcc/testsuite/gfortran.dg/pr95882_1.f90
new file mode 100644
index 000..c254bddf494
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95882_1.f90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+
+module m
+   type t
+  character(((0)/0)) :: c  ! { dg-error "Division by zero" }
+   end type
+end
+
diff --git a/gcc/testsuite/gfortran.dg/pr95882_2.f90 b/gcc/testsuite/gfortran.dg/pr95882_2.f90
new file mode 100644
index 000..d308f0c3181
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95882_2.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+
+module m
+   character(0/(0)) :: c = '123456789'  ! { dg-error "Division by zero" }
+end
+
diff --git a/gcc/testsuite/gfortran.dg/pr95882_3.f90 b/gcc/testsuite/gfortran.dg/pr95882_3.f90
new file mode 100644
index 000..bd849135480
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95882_3.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+
+subroutine s(c)
+   character(((0)/0)) :: c  ! { dg-error "Division by zero" }
+end
+
diff --git a/gcc/testsuite/gfortran.dg/pr95882_4.f90 b/gcc/testsuite/gfortran.dg/pr95882_4.f90
new file mode 100644
index 000..52892d32b8b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95882_4.f90
@@ -0,0 +1,7 @@
+! { dg-do compile }
+
+program p
+   character(((0)/0)) :: c  ! { dg-error "Division by zero" }
+   common /x/ c
+end
+
diff --git a/gcc/testsuite/gfortran.dg/pr95882_5.f90 b/gcc/testsuite/gfortran.dg/pr95882_5.f90
new file mode 100644
index 000..dcdf5304052
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95882_5.f90
@@ -0,0 +1,6 @@
+! { dg-do compile }
+
+program p
+   character(0/(0)) :: c = '123456789'  ! { dg-error "Division by zero" }
+   common c
+end
-- 
2.11.0



Re: [PATCH] Fortran : ICE for division by zero in declaration PR95882

2020-08-24 Thread Mark Eggleston



On 25/08/2020 07:13, Mark Eggleston wrote:


On 24/08/2020 17:42, Thomas Koenig wrote:

Hi Mark,


OK to commit and backport?


The test cases mentioned in the ChangeLog are not in the
patch, instead there is the test case for PR 96624.

Could you correct that?

Whoops, yes I'll fix that.

It is actually the wrong attachment, I'll try again.


Best regards

Thomas


--
https://www.codethink.co.uk/privacy.html



Re: [PATCH] Fortran : ICE for division by zero in declaration PR95882

2020-08-24 Thread Mark Eggleston



On 24/08/2020 17:42, Thomas Koenig wrote:

Hi Mark,


OK to commit and backport?


The test cases mentioned in the ChangeLog are not in the
patch, instead there is the test case for PR 96624.

Could you correct that?

Whoops, yes I'll fix that.


Best regards

Thomas


--
https://www.codethink.co.uk/privacy.html



Re: [PATCH] improve validation of attribute arguments (PR c/78666)

2020-08-24 Thread Aldy Hernandez via Gcc-patches

Attached is another revision of this patch with the new helper
function you suggested.  Is this version okay to commit?


I don't actually have the ability to grant check-in in this area.  I 
just figured that implementing some of these suggestions might make it 
more palatable to reviewers who do.


The patch is much better, but IMO it is a bit hard to read and not 
generic enough, but I defer to the relevant maintainers.


Aldy



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Alexandre Oliva
On Aug 24, 2020, Richard Biener  wrote:

> since the option is quite elaborate on what (sub-)set of regs is
> supposed to be cleared I'm not sure an implementation not involving
> any target hook is possible?

I don't think this follows.  Machine-independent code has a pretty good
notion of what registers are call-saved or call-clobbered, which ones
could be changed in this regard for function-specific calling
conventions, which ones may be used by a function to hold its return
value, which ones are used within a function...

It *should* be possible to introduce this in machine-independent code,
emitting insns to set registers to zero and regarding them as holding
values to be returned from the function.  Machine-specific code could
use more efficient insns to get the same result, but I can't see good
reason to not have a generic fallback implementation with at least a
best-effort attempt to offer the desired feature.


Now, this is for the regular return path.  Is zeroing registers in
exception-propagation paths not relevant?

I thought it is, and I think we could have generic code that identifies
the registers that ought to be zeroed, issues CFI notes to get them
zeroed in the exception path, and requests a target hook to emit the
insns to zero them in the regular return path.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH 0/3] Power10 PCREL_OPT support

2020-08-24 Thread Bill Schmidt via Gcc-patches

On 8/24/20 11:01 PM, Michael Meissner wrote:

On Sat, Aug 22, 2020 at 07:05:51PM -0500, Bill Schmidt wrote:

What is necessary in order to allow this optimization to occur
earlier is to make this hidden dependency explicit.  When the
relocation is inserted, we have to change the "pld" instruction to
have a specific clobber of (in this case) r5, which represents what
will happen if the linker makes the substitution.

I agree that it's too fragile to force this to be the last pass, so
I think if Mike can look into introducing a clobber of the hard
register when performing the optimization, that would at least allow
us to move this anywhere after reload.

I don't immediately see a solution that works prior to register
allocation because we basically are representing two potential
starting points of a live range, only one of which will survive in
the final code.  That is too ugly a problem to hand to the register
allocator.

As I said in a private message, I have the appropriate clobbers and such
already.


Great, thanks!  I had failed to understand this was there now -- excellent.

I started cobbling up some code to use the du-chains to simplify things 
a little (but only a little).  Dealing with intervening loads and stores 
still requires a little mess.  I'll send you something in the next day 
or two, I hope.


Bill



Here is the program I used in my previous reply to Segher:

extern int a, b, c;

int sum (void)
{
  return a + b + c;
}


Here is the RTL before the PCREL_OPT pass from sched2:

;; Load the address of a into r8
(insn:TI 5 13 6 2 (set (reg/f:DI 8 8 [123])
(symbol_ref:DI ("a") [flags 0xc0]  )) 
"foo02.c":5:12 722 {*pcrel_extern_addr}
 (expr_list:REG_EQUIV (symbol_ref:DI ("a") [flags 0xc0]  )
(nil)))

;; Load the address of b into r10
(insn 6 5 10 2 (set (reg/f:DI 10 10 [124])
(symbol_ref:DI ("b") [flags 0xc0]  )) 
"foo02.c":5:12 722 {*pcrel_extern_addr}
 (expr_list:REG_EQUIV (symbol_ref:DI ("b") [flags 0xc0]  )
(nil)))

;; Load the address of c into r9
(insn 10 6 7 2 (set (reg/f:DI 9 9 [128])
(symbol_ref:DI ("c") [flags 0xc0]  )) 
"foo02.c":5:16 722 {*pcrel_extern_addr}
 (expr_list:REG_EQUIV (symbol_ref:DI ("c") [flags 0xc0]  )
(nil)))

;; Load a's value into r3, using r8 as the base register
(insn:TI 7 10 8 2 (set (reg:DI 3 3)
(zero_extend:DI (mem/c:SI (reg/f:DI 8 8 [123]) [1 a+0 S4 A32]))) 
"foo02.c":5:12 16 {zero_extendsidi2}
 (expr_list:REG_DEAD (reg/f:DI 8 8 [123])
(nil)))

;; Load b's value into r10, using r10 as the base register
(insn 8 7 11 2 (set (reg:DI 10 10)
(zero_extend:DI (mem/c:SI (reg/f:DI 10 10 [124]) [1 b+0 S4 A32]))) 
"foo02.c":5:12 16 {zero_extendsidi2}
 (nil))

;; Load c's value into r9, using r9 as the base register
(insn 11 8 9 2 (set (reg:DI 9 9)
(zero_extend:DI (mem/c:SI (reg/f:DI 9 9 [128]) [1 c+0 S4 A32]))) 
"foo02.c":5:16 16 {zero_extendsidi2}
 (nil))

;; Add a+b
(insn:TI 9 11 12 2 (set (reg:SI 3 3 [125])
(plus:SI (reg:SI 3 3 [orig:126 a ] [126])
(reg:SI 10 10 [orig:127 b ] [127]))) "foo02.c":5:12 65 
{*addsi3}
 (expr_list:REG_DEAD (reg:SI 10 10 [orig:127 b ] [127])
(nil)))

;; Add (a+b)+c
(insn:TI 12 9 18 2 (set (reg:SI 3 3 [122])
(plus:SI (reg:SI 3 3 [125])
(reg:SI 9 9 [orig:129 c ] [129]))) "foo02.c":5:16 65 
{*addsi3}
 (expr_list:REG_DEAD (reg:SI 9 9 [orig:129 c ] [129])
(nil)))

;; Sign extend
(insn:TI 18 12 19 2 (set (reg/i:DI 3 3)
(sign_extend:DI (reg:SI 3 3 [122]))) "foo02.c":6:1 31 
{extendsidi2}
 (nil))

;; Return
(insn 19 18 29 2 (use (reg/i:DI 3 3)) "foo02.c":6:1 -1
 (nil))
(note 29 19 25 2 NOTE_INSN_EPILOGUE_BEG)
(jump_insn 25 29 26 2 (simple_return) "foo02.c":6:1 866 {simple_return}
 (nil)
 -> simple_return)


And here is the RTL after the PCREL_OPT:

;; Load of address a into r8, a will be loaded into r3
(insn:TI 5 13 6 2 (parallel [
(set (reg/f:DI 8 8 [123])
(unspec:DI [
(symbol_ref:DI ("a") [flags 0xc0]  )
(const_int 1 [0x1])
] UNSPEC_PCREL_OPT_LD_ADDR))
(set (reg:DI 3 3)
(unspec:DI [
(const_int 0 [0])
] UNSPEC_PCREL_OPT_LD_ADDR))
]) "foo02.c":5:12 2198 {pcrel_opt_ld_addr}
 (expr_list:REG_EQUIV (symbol_ref:DI ("a") [flags 0xc0]  )
  

Re: [PATCH 0/3] Power10 PCREL_OPT support

2020-08-24 Thread Michael Meissner via Gcc-patches
I forgot to mention that comparing the three tests of placement of the
PCREL_OPT pass:

Having the pass after sched2 generated the same number of PCREL_OPT relocations
as having the pass immediately after reload.

But having the pass just before sched2 resulted in some more PCREL_OPT
relocations.  Mostly the number of instructions was the same, but a few
benchmarks had a few more 4-byte instructions generated.  It looks like those
new PCREL_OPT relocations replaced a few places where we loaded the address,
but did not do a PCREL_OPT optimization.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH 1/5] Don't enable -gvariable-location-views by default for DWARF5.

2020-08-24 Thread Alexandre Oliva
On Aug 24, 2020, Jakub Jelinek  wrote:

> On Mon, Aug 24, 2020 at 02:56:54PM +0200, Mark Wielaard wrote:
>> DWARF5 makes it possible to read loclists tables without consulting
>> the debuginfo tree by introducing a table header. Adding location views
>> breaks this (at least for binutils and elfutils). So don't enable
>> variable-location-views by default if DWARF5 or higher is selected.

> This should be discussed with Alex, CCed.
> I'd say elfutils/binutils should just show .debug_loclists independent of
> .debug_info if there are no loc views and use .debug_info otherwise.

I've suggested before that it made sense to me to start emitting
locviews when there were concrete plans to implement support for them in
debug info consumers.

Without such plans, it would make more sense to just disable them
altogether.

Now, if there are any such plans, it is disabling them for the default
debug format that doesn't make much sense to me; it would seem to make
more sense to adopt and promote the proposed extension, implemented in
=incompat5 in GCC, but it would need some implementation work in
consumers to at least ignore the extension.


Red Hat has had me involved in these efforts for over a decade, but I'm
not aware of its plans any more, and I don't know of anyone else driving
the implementation of locviews in consumers, so, given the little I
know, this patch seems like a timid step in a reasonable direction:
outputting locviews is no use if there are no consumers for it, more so
when they actively disturb existing standard-compliant consumers.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH 0/3] Power10 PCREL_OPT support

2020-08-24 Thread Michael Meissner via Gcc-patches
On Sat, Aug 22, 2020 at 07:05:51PM -0500, Bill Schmidt wrote:
> What is necessary in order to allow this optimization to occur
> earlier is to make this hidden dependency explicit.  When the
> relocation is inserted, we have to change the "pld" instruction to
> have a specific clobber of (in this case) r5, which represents what
> will happen if the linker makes the substitution.
> 
> I agree that it's too fragile to force this to be the last pass, so
> I think if Mike can look into introducing a clobber of the hard
> register when performing the optimization, that would at least allow
> us to move this anywhere after reload.
> 
> I don't immediately see a solution that works prior to register
> allocation because we basically are representing two potential
> starting points of a live range, only one of which will survive in
> the final code.  That is too ugly a problem to hand to the register
> allocator.

As I said in a private message, I have the appropriate clobbers and such
already.

Here is the program I used in my previous reply to Segher:

extern int a, b, c;

int sum (void)
{
  return a + b + c;
}


Here is the RTL before the PCREL_OPT pass from sched2:

;; Load the address of a into r8
(insn:TI 5 13 6 2 (set (reg/f:DI 8 8 [123])
(symbol_ref:DI ("a") [flags 0xc0]  )) "foo02.c":5:12 722 {*pcrel_extern_addr}
 (expr_list:REG_EQUIV (symbol_ref:DI ("a") [flags 0xc0]  )
(nil)))

;; Load the address of b into r10
(insn 6 5 10 2 (set (reg/f:DI 10 10 [124])
(symbol_ref:DI ("b") [flags 0xc0]  )) "foo02.c":5:12 722 {*pcrel_extern_addr}
 (expr_list:REG_EQUIV (symbol_ref:DI ("b") [flags 0xc0]  )
(nil)))

;; Load the address of c into r9
(insn 10 6 7 2 (set (reg/f:DI 9 9 [128])
(symbol_ref:DI ("c") [flags 0xc0]  )) "foo02.c":5:16 722 {*pcrel_extern_addr}
 (expr_list:REG_EQUIV (symbol_ref:DI ("c") [flags 0xc0]  )
(nil)))

;; Load a's value into r3, using r8 as the base register
(insn:TI 7 10 8 2 (set (reg:DI 3 3)
(zero_extend:DI (mem/c:SI (reg/f:DI 8 8 [123]) [1 a+0 S4 
A32]))) "foo02.c":5:12 16 {zero_extendsidi2}
 (expr_list:REG_DEAD (reg/f:DI 8 8 [123])
(nil)))

;; Load b's value into r10, using r10 as the base register
(insn 8 7 11 2 (set (reg:DI 10 10)
(zero_extend:DI (mem/c:SI (reg/f:DI 10 10 [124]) [1 b+0 S4 
A32]))) "foo02.c":5:12 16 {zero_extendsidi2}
 (nil))

;; Load c's value into r9, using r9 as the base register
(insn 11 8 9 2 (set (reg:DI 9 9)
(zero_extend:DI (mem/c:SI (reg/f:DI 9 9 [128]) [1 c+0 S4 
A32]))) "foo02.c":5:16 16 {zero_extendsidi2}
 (nil))

;; Add a+b
(insn:TI 9 11 12 2 (set (reg:SI 3 3 [125])
(plus:SI (reg:SI 3 3 [orig:126 a ] [126])
(reg:SI 10 10 [orig:127 b ] [127]))) "foo02.c":5:12 65 
{*addsi3}
 (expr_list:REG_DEAD (reg:SI 10 10 [orig:127 b ] [127])
(nil)))

;; Add (a+b)+c
(insn:TI 12 9 18 2 (set (reg:SI 3 3 [122])
(plus:SI (reg:SI 3 3 [125])
(reg:SI 9 9 [orig:129 c ] [129]))) "foo02.c":5:16 65 
{*addsi3}
 (expr_list:REG_DEAD (reg:SI 9 9 [orig:129 c ] [129])
(nil)))

;; Sign extend
(insn:TI 18 12 19 2 (set (reg/i:DI 3 3)
(sign_extend:DI (reg:SI 3 3 [122]))) "foo02.c":6:1 31 
{extendsidi2}
 (nil))

;; Return
(insn 19 18 29 2 (use (reg/i:DI 3 3)) "foo02.c":6:1 -1
 (nil))
(note 29 19 25 2 NOTE_INSN_EPILOGUE_BEG)
(jump_insn 25 29 26 2 (simple_return) "foo02.c":6:1 866 {simple_return}
 (nil)
 -> simple_return)


And here is the RTL after the PCREL_OPT:

;; Load of address a into r8, a will be loaded into r3
(insn:TI 5 13 6 2 (parallel [
(set (reg/f:DI 8 8 [123])
(unspec:DI [
(symbol_ref:DI ("a") [flags 0xc0]  )
(const_int 1 [0x1])
] UNSPEC_PCREL_OPT_LD_ADDR))
(set (reg:DI 3 3)
(unspec:DI [
(const_int 0 [0])
] UNSPEC_PCREL_OPT_LD_ADDR))
]) "foo02.c":5:12 2198 {pcrel_opt_ld_addr}
 (expr_list:REG_EQUIV (symbol_ref:DI ("a") [flags 0xc0]  )
(nil)))

;; Load of address b into r10, which will be the same register b's 
value is loaded into
(insn 6 5 10 2 (set (reg/f:DI 10 10 [124])
(unspec:DI [
(symbol_ref:DI ("b") [flags 0xc0]  )
(const_int 2 [0x2])
] UNSPEC_PCREL_OPT_LD_ADDR_SA

Re: [PATCH 0/3] Power10 PCREL_OPT support

2020-08-24 Thread Michael Meissner via Gcc-patches
On Thu, Aug 20, 2020 at 06:33:29PM -0500, Segher Boessenkool wrote:
> > These patches allow the load of the address to not be physically adjacent to
> > the actual load or store, which should allow for better code.
> 
> Why is that?  That is not what it does anyway?  /confused

It does allow that.  Perhaps I'm not being clear.

Consider this example:

extern int a, b, c;

int sum (void)
{
  return a + b + c;
}

With my patches it generates:

sum:
pld 8,a@got@pcrel
.Lpcrel1:

pld 10,b@got@pcrel
.Lpcrel2:

pld 9,c@got@pcrel
.Lpcrel3:

.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
lwz 3,0(8)

.reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
lwz 10,0(10)

.reloc .Lpcrel3-8,R_PPC64_PCREL_OPT,.-(.Lpcrel3-8)
lwz 9,0(9)

add 3,3,10
add 3,3,9
extsw 3,3
blr


Thus it separates the load of the 3 external addresses from the actual LWZ used
to load the values.

For example, in a recent Spec 2017 build for power10, over all of the 
benchmarks:

Total PCREL_OPTs found for load/store:  41,440
Times the PLD/PLA was separated from the load/store:17,893
Times the PLD/PLA was adjacent to the load/store:   23,547

Number of PCREL_OPT loads:  38,657
Number of PCREL_OPT loads separated from the PLD:   15,768
Number of PCREL_OPT loads adjancent to the PLD: 22,889

Number of PCREL_OPT stores:  2,783
Number of PCREL_OPT stores separated from the PLD:   2,125
Number of PCREL_OPT stores adjancent to the PLD:   658

Where it wins is if the external variable is in a shared library.  There the
PLD is in fact a load, and having some separation from the dependent load/store
helps.


> > In order to do this, the pass that converts the load address and load/store
> > must occur late in the compilation cycle.
> 
> That does not follow afaics.
> 
> > In particular, the second scheduler
> > pass will duplicate and optimize some of the references and it will produce 
> > an
> > invalid program.  In the past, Segher has said that we should be able to 
> > move
> > it earlier.
> 
> I said that you shouldn't require this to be the very last pass.  There
> is no reason for that, and that will not scale (what if a second pass
> shows up that also requires this!)

The patches I submitted don't require it to be the 'last' pass.  In fact, I put
it after sched2 because earlier versions of the patch could not be moved
earlier.  There are 11 passes after sched2 before final.

However, it turns out that in the last spin of the patches, I added the
necessary clobbers and such, so it can now go any where after register
allocation.  I built 3 versions of the compiler:

The first version had the pass after sched2 (version in patches);
The second version had the pass before sched2; (and)
The third version had the pass immediately after reload.

I built Spec 2017 with the two compilers.  Unlike before, there were no linker
failures.  I also wrote a perl script to verify that each PCREL_OPT relocation
only targeted one PLD/PLA with one load or store.

> It also makes it impossible to do normal late optimisations on code
> produced here (optimisations like peephole, cprop_hardreg, dce).

Now, it can do those optimizations.

> I also said that you should use the DF framework, not parse all RTL by
> hand and getting it all wrong, as *everyone* does: this stuff is hard.

Bill has said he would look into helping convert it to DF format.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-24 Thread xiezhiheng
> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Friday, August 21, 2020 5:02 PM
> To: xiezhiheng 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3

Cut...
 
> Looks like the saturating intrinsics might need a bit more thought.
> Would you mind submitting the patch with just the other parts?
> Those were uncontroversial and it would be a shame to hold them
> up over this.

Okay, I reorganized the existing patch and finished the first half of the 
intrinsics
except saturating intrinsics and load intrinsics.

Bootstrapped and tested on aarch64 Linux platform.

For load intrinsics, I have one problem when I set FLAG_READ_MEMORY for them,
some test cases like
gcc.target/aarch64/advsimd-intrinsics/vld2_lane_p8_indices_1.c
  #include 

  /* { dg-do compile } */
  /* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */

  poly8x8x2_t
  f_vld2_lane_p8 (poly8_t * p, poly8x8x2_t v)
  {
poly8x8x2_t res;
/* { dg-error "lane 8 out of range 0 - 7" "" { target *-*-* } 0 } */
res = vld2_lane_p8 (p, v, 8);
/* { dg-error "lane -1 out of range 0 - 7" "" { target *-*-* } 0 } */
res = vld2_lane_p8 (p, v, -1);
return res;
  }
would fail in regression.  Because the first statement
  res = vld2_lane_p8 (p, v, 8);
would be eliminated as dead code in gimple phase but the error message is
generated in expand pass.  So I am going to replace the second statement
  res = vld2_lane_p8 (p, v, -1);
with
  res = vld2_lane_p8 (p, res, -1);
or do you have any other suggestions?

And for test case gcc.target/aarch64/arg-type-diagnostics-1.c, I return the 
result
to prevent the statement
  result = vrsra_n_s32 (arg1, arg2, a);
from being eliminated by treated as dead code.

Thanks,
Xie Zhiheng


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7a71b4367d4..217344d7d1f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-08-25  Zhiheng Xie  
+
+   * config/aarch64/aarch64-simd-builtins.def: Add proper FLAGS
+   for intrinsic functions.
+

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index b9562e67883..e10bcc9b28a 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2020-08-25  Zhiheng Xie  
+
+   * gcc.target/aarch64/arg-type-diagnostics-1.c: Return result
+   to prevent statement from being eliminated.
+


pr94442-v1.patch
Description: pr94442-v1.patch


[PATCH] libstdc++: Add more C++20 additions to

2020-08-24 Thread Patrick Palka via Gcc-patches
This patch adds the C++20 calendar types and their methods as defined in
[time.cal] (modulo the parsing/printing support).  This patch also
implements [time.hms] and [time.12], and a few more bits of
[time.clock].  The remaining C++20 additions to  from P0355 and
P1466 depend on [time.zone] and , so they will come later, as
will more optimized versions of some of the calendar algorithms.

The non-member operator overloads for the calendar types are defined as
namespace-scope functions in the standard, but here we instead define
each such operator overload as a hidden friend of the appropriate class.
This simplifies the implementation somewhat and lets us reap the
benefits of hidden friends for these routines.

The bulk of this work is based on a patch from Ed Smith-Rowland, which can
be found at the Git branch users/redi/heads/calendar.

Regression tested on x86_64-pc-linux-gnu, and also tested against the
testsuite for date.h of Howard Hinnant's 'date' library, i.e. the tests
at https://github.com/HowardHinnant/date/tree/master/test/date_test
(though some minor modifications to the tests are first needed to
account for the differences between the library API and the standard).

Co-authored-by: Ed Smith-Rowland <3dw...@verizon.net>
Co-authored-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/chrono (time_point::operator++)
(time_point::operator--): Define.
(utc_clock, tai_clock, gps_clock): Forward declare.
(utc_time, utc_seconds, tai_time, tai_seconds, gps_time)
(gps_seconds): Define.
(is_clock, is_clock, is_clock)
(is_clock_v, is_clock_v)
(is_clock_v): Define these specializations.
(leap_second_info): Define.
(day, month, year, weekday, weekday_indexed)
(weekday_last, month_day, month_day_last, month_weekday)
(month_weekday_last, year_month, year_month_day)
(year_month_day_last, year_month_weekday, year_month_weekday_last):
Declare and later define.
(last_spec, last, __detail::__days_per_month)
(__detail::__days_per_month, __detail::__last_day): Define.
(January, February, March, April, May, June, July, August)
(September, October, November, December, Sunday, Monday, Tuesday)
(Wednesday, Thursday, Friday, Saturday): Define.
(weekday::operator[]): Define out-of-line.
(year_month_day::_S_from_days, year_month_day::M_days_since_epoch):
Likewise.
(year_month_day::year_month_day, year_month_day::ok): Likewise.
(__detail::__pow10, hh_mm_ss): Define.
(literals::chrono_literals::operator""d)
(literals::chrono_literals::operator""y): Define.
(is_am, is_pm, make12, make24): Define.
* testsuite/20_util/time_point/4.cc: New test.
* testsuite/std/time/day/1.cc: New test.
* testsuite/std/time/hh_mm_ss/1.cc: New test.
* testsuite/std/time/is_am/1.cc: New test.
* testsuite/std/time/is_pm/1.cc: New test.
* testsuite/std/time/make12/1.cc: New test.
* testsuite/std/time/make24/1.cc: New test.
* testsuite/std/time/month/1.cc: New test.
* testsuite/std/time/month_day/1.cc: New test.
* testsuite/std/time/month_day_last/1.cc: New test.
* testsuite/std/time/month_weekday/1.cc: New test.
* testsuite/std/time/month_weekday_last/1.cc: New test.
* testsuite/std/time/weekday/1.cc: New test.
* testsuite/std/time/weekday_indexed/1.cc: New test.
* testsuite/std/time/weekday_last/1.cc: New test.
* testsuite/std/time/year/1.cc: New test.
* testsuite/std/time/year_month/1.cc: New test.
* testsuite/std/time/year_month_day/1.cc: New test.
* testsuite/std/time/year_month_day_last/1.cc: New test.
* testsuite/std/time/year_month_weekday/1.cc: New test.
* testsuite/std/time/year_month_weekday_last/1.cc: New test.
---
 libstdc++-v3/include/std/chrono   | 1907 +
 .../testsuite/20_util/time_point/4.cc |   42 +
 libstdc++-v3/testsuite/std/time/day/1.cc  |   67 +
 libstdc++-v3/testsuite/std/time/hh_mm_ss/1.cc |   63 +
 libstdc++-v3/testsuite/std/time/is_am/1.cc|   35 +
 libstdc++-v3/testsuite/std/time/is_pm/1.cc|   35 +
 libstdc++-v3/testsuite/std/time/make12/1.cc   |   36 +
 libstdc++-v3/testsuite/std/time/make24/1.cc   |   41 +
 libstdc++-v3/testsuite/std/time/month/1.cc|   75 +
 .../testsuite/std/time/month_day/1.cc |   73 +
 .../testsuite/std/time/month_day_last/1.cc|   65 +
 .../testsuite/std/time/month_weekday/1.cc |   48 +
 .../std/time/month_weekday_last/1.cc  |   48 +
 libstdc++-v3/testsuite/std/time/weekday/1.cc  |  102 +
 .../testsuite/std/time/weekday_indexed/1.cc   |   53 +
 .../testsuite/std/time/weekday_last/1.cc  |   48 +
 libstdc++-v3/testsuite/std/time/year/1.cc |   85 +
 .../testsuite/std/time/year_month/1.cc|   86 +
 .../testsuite/std/time/year_month_

Re: [PATCH v2] c++: Implement P1009: Array size deduction in new-expressions.

2020-08-24 Thread Jason Merrill via Gcc-patches

On 8/24/20 5:44 PM, Marek Polacek wrote:

On Fri, Aug 21, 2020 at 05:04:43PM -0400, Jason Merrill wrote:

On 8/20/20 4:22 PM, Marek Polacek wrote:

@@ -3917,6 +3926,47 @@ build_new (location_t loc, vec **placement, 
tree type,
 return error_mark_node;
   }
+  /* P1009: Array size deduction in new-expressions.  */
+  if (TREE_CODE (type) == ARRAY_TYPE
+  && !TYPE_DOMAIN (type)
+  && *init)
+{
+  /* This means we have 'new T[]()'.  */
+  if ((*init)->is_empty ())
+   {
+ tree ctor = build_constructor (init_list_type_node, NULL);
+ CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
+ vec_safe_push (*init, ctor);
+   }
+  tree &elt = (**init)[0];
+  /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
+  if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
+   {
+ /* Handle new char[]("foo").  */
+ if (vec_safe_length (*init) == 1
+ && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
+ && TREE_CODE (tree_strip_any_location_wrapper (elt))
+== STRING_CST)
+   /* Leave it alone: the string should not be wrapped in {}.  */;
+ else
+   {
+ /* Create a CONSTRUCTOR from the vector INIT.  */
+ tree list = build_tree_list_vec (*init);
+ tree ctor = build_constructor_from_list (init_list_type_node, 
list);
+ CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
+ CONSTRUCTOR_IS_PAREN_INIT (ctor) = true;


It seems wasteful to build a TREE_LIST only to throw it away; let's add a
function to build a CONSTRUCTOR directly from a vec*.  And use it in
build_new_method_call_1 as well.


Good point.  Done.


@@ -9082,14 +9084,17 @@ cp_parser_direct_new_declarator (cp_parser* parser)
 cp_parser_require (parser, CPP_OPEN_SQUARE, RT_OPEN_SQUARE);
 token = cp_lexer_peek_token (parser->lexer);
-  expression = cp_parser_expression (parser);
+  if (token->type == CPP_CLOSE_SQUARE)
+   expression = NULL_TREE;


Only the first bound is optional; we need to require the expression for
subsequent bounds.


Fixed (though the diagnostic we gave was pretty good).  New test added.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements C++20 P1009, allowing code like

   new double[]{1,2,3}; // array bound will be deduced

Since this proposal makes the initialization rules more consistent, it is
applied to all previous versions of C++ (thus, effectively, all the way back
to C++11).

My patch is based on Jason's patch that handled the basic case.  I've
extended it to work with ()-init and also the string literal case.
Further testing revealed that to handle stuff like

   new int[]{t...};

in a template, we have to consider such a NEW_EXPR type-dependent.
Obviously, we first have to expand the pack to be able to deduce the
number of elements in the array.

Curiously, while implementing this proposal, I noticed that we fail
to accept

   new char[4]{"abc"};

so I've assigned 77841 to self.  I think the fix will depend on the
build_new_1 hunk in this patch.

The new tree.c function build_constructor_from_vec helps us morph
a vector into a CONSTRUCTOR more efficiently.

gcc/cp/ChangeLog:

PR c++/93529
* call.c (build_new_method_call_1): Use build_constructor_from_vec
instead of build_tree_list_vec + build_constructor_from_list.
* init.c (build_new_1): Handle new char[]{"foo"}.  Use
build_constructor_from_vec instead of build_tree_list_vec +
build_constructor_from_list.
(build_new): Deduce the array size in new-expression if not
present.  Handle ()-init.  Handle initializing an array from
a string literal.
* parser.c (cp_parser_new_type_id): Leave [] alone.
(cp_parser_direct_new_declarator): Allow [].
* pt.c (type_dependent_expression_p): In a NEW_EXPR, consider
array types whose dimension has to be deduced type-dependent.

gcc/ChangeLog:

PR c++/93529
* tree.c (build_constructor_from_vec): New.
* tree.h (build_constructor_from_vec): Declare.

gcc/testsuite/ChangeLog:

PR c++/93529
* g++.dg/cpp0x/sfinae4.C: Adjust expected result after P1009.
* g++.dg/cpp2a/new-array1.C: New test.
* g++.dg/cpp2a/new-array2.C: New test.
* g++.dg/cpp2a/new-array3.C: New test.
* g++.dg/cpp2a/new-array4.C: New test.

Co-authored-by: Jason Merrill 
---
  gcc/cp/call.c   |  4 +-
  gcc/cp/init.c   | 55 +--
  gcc/cp/parser.c | 13 +++--
  gcc/cp/pt.c |  4 ++
  gcc/testsuite/g++.dg/cpp0x/sfinae4.C|  8 ++-
  gcc/testsuite/g++.dg/cpp2a/new-array1.C | 70 +
  gcc/testsuite/g++.dg/cpp2a/new-array2.C | 22 
  gcc/testsuite/g++.dg/cpp2a/new-array3.C | 17 ++
  gcc/testsuite/g++.dg/

[PATCH] fix a typo in rtl.def

2020-08-24 Thread Wei Wentao
Hi,

This patch fix a typo in rtl.def.

Regards!

Weiwt

---
 gcc/rtl.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/rtl.def b/gcc/rtl.def
index 9754333eafb..7ec94a95105 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -380,7 +380,7 @@ DEF_RTL_EXPR(PC, "pc", "", RTX_OBJ)
 
 /* A register.  The "operand" is the register number, accessed with
the REGNO macro.  If this number is less than FIRST_PSEUDO_REGISTER
-   than a hardware register is being referred to.  The second operand
+   then a hardware register is being referred to.  The second operand
points to a reg_attrs structure.
This rtx needs to have as many (or more) fields as a MEM, since we
can change REG rtx's into MEMs during reload.  */
-- 
2.18.1





Re: [PATCH] c++: Fix up ptr.~PTR () handling [PR96721]

2020-08-24 Thread Jason Merrill via Gcc-patches

On 8/24/20 5:34 PM, Jakub Jelinek wrote:

Hi!

The following testcase is miscompiled, because build_trivial_dtor_call
handles the case when instance is a pointer by adding a clobber to what
the pointer points to (which is desirable e.g. for delete) rather than the
pointer itself.  That is I think always desirable behavior for references,
but for pointers for the pseudo dtor case it is not.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?


OK.


2020-08-24  Jakub Jelinek  

PR c++/96721
* cp-tree.h (build_trivial_dtor_call): Add bool argument defaulted
to false.
* call.c (build_trivial_dtor_call): Add NO_PTR_DEREF argument.  If
instance is a pointer and NO_PTR_DEREF is true, clobber the pointer
rather than what it points to.
* semantics.c (finish_call_expr): Call build_trivial_dtor_call with
true as NO_PTR_DEREF.

* g++.dg/opt/flifetime-dse8.C: New test.

--- gcc/cp/cp-tree.h.jj 2020-08-24 10:00:01.383257572 +0200
+++ gcc/cp/cp-tree.h2020-08-24 12:28:02.698410896 +0200
@@ -6248,7 +6248,7 @@ extern bool null_member_pointer_value_p
  extern bool sufficient_parms_p(const_tree);
  extern tree type_decays_to(tree);
  extern tree extract_call_expr (tree);
-extern tree build_trivial_dtor_call(tree);
+extern tree build_trivial_dtor_call(tree, bool = false);
  extern tree build_user_type_conversion(tree, tree, int,
 tsubst_flags_t);
  extern tree build_new_function_call   (tree, vec **,
--- gcc/cp/call.c.jj2020-08-18 07:50:18.564922335 +0200
+++ gcc/cp/call.c   2020-08-24 12:30:55.211961352 +0200
@@ -8430,10 +8430,12 @@ conv_binds_ref_to_prvalue (conversion *c
  }
  
  /* Call the trivial destructor for INSTANCE, which can be either an lvalue of

-   class type or a pointer to class type.  */
+   class type or a pointer to class type.  If NO_PTR_DEREF is true and
+   INSTANCE has pointer type, clobber the pointer rather than what it points
+   to.  */
  
  tree

-build_trivial_dtor_call (tree instance)
+build_trivial_dtor_call (tree instance, bool no_ptr_deref)
  {
gcc_assert (!is_dummy_object (instance));
  
@@ -8443,7 +8445,8 @@ build_trivial_dtor_call (tree instance)

return fold_convert (void_type_node, instance);
  }
  
-  if (INDIRECT_TYPE_P (TREE_TYPE (instance)))

+  if (INDIRECT_TYPE_P (TREE_TYPE (instance))
+  && (!no_ptr_deref || TYPE_REF_P (TREE_TYPE (instance
  {
if (VOID_TYPE_P (TREE_TYPE (TREE_TYPE (instance
goto no_clobber;
--- gcc/cp/semantics.c.jj   2020-08-18 07:50:18.653921086 +0200
+++ gcc/cp/semantics.c  2020-08-24 12:31:06.564800148 +0200
@@ -2713,7 +2713,7 @@ finish_call_expr (tree fn, vec



Re: [PATCH] Check calls before loop unrolling

2020-08-24 Thread Jiufu Guo via Gcc-patches

On 2020-08-24 19:16, Jan Hubicka wrote:

On Thu, Aug 20, 2020 at 6:35 AM guojiufu via Gcc-patches
 wrote:
>
> Hi,
>
> This patch is checking the _average_ number of calls which is the
> summary of call numbers multiply the possibility of the call maybe
> executed.  The _average_ number could be a fraction, to keep the
> precision, the param is the threshold number multiply 1.
>
Can you try mimicking what try_unroll_loop_completely on GIMPLE does
instead?  IIRC the main motivation to not unroll calls is the spilling 
code
around it which we cannot estimate very well.  And that spilling 
happens
irrespective of whether the call is in a hot or cold path so I'm not 
sure

it makes sense to use the "average" number of calls here.


In try_unroll_loop_completely, it is checking the calls in the hot path:
num_non_pure_calls_on_hot_path), and avoid unrolling if there is.
This is one reason for here to use "average".


As long as I remember, we excluded calls simply becuase it is/was an
expensive intruction so it was an indication that the loop overhead is
small compared to the overhead of loop body.


Thanks, Honza and Richard!


Honza


Re: [PATCH] Fix libstdc++ testsuite to handle VxWorks gthreads implementation

2020-08-24 Thread Alexandre Oliva
On Aug 24, 2020, Jonathan Wakely  wrote:

>>> OK for trunk, thanks.

>> Given the approval and the lack of significant changes, I'll put this in
>> unless there are objections in the next 48 hours.  Thanks for the review!

> Thanks. There's a new FAIL due to a bad merge.

Erhm...  Weird, I don't recall any manual adjustments to
packaged_task/cons/alloc.cc.  Indeed, the changes look exactly the same
that Corentin had proposed before.  Maybe the constraints were already
there, but hte unintended effects were not noticeable before the change
to the default?

Anyway, thanks for fixing the failure.  That said, I confess I don't get
how this test will (or should) ever be run in C++14 mode, being given an
explicit -std=gnu++11.  Is this really the right way to go about it?

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH v3] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-24 Thread Kito Cheng via Gcc-patches
Hi Maciej:

Thanks for your patch, I tried to compile all target libraries with
-O0 and got link error as you describe, it's really confusing about
the error message, __pthread_mutex_lock and __pthread_mutex_unlock are
undefined, it turns out the root cause is the __umoddi3 called
_Unwind_Resume and pull-in lots of functions, so really appreciate you
digging out this issue.

After figure out what happen and reproduce what you see, I spend times
to dig out the reason why -fexceptions -fnon-call-exceptions is set
for div and mod routines, it was introduce at many years ago[1] and no
much comment on why doing this, but I guess it's because some ISAs
might generate exception/trap when divide-by-0, and then the execution
environment will convert that to an exception, so those files/routines
need to compile with such flags.

However RISC-V never causes a trap when divide-by-0, so I believe
-fexceptions -fnon-call-exceptions could be removed safely for RISC-V,
but I am not sure it's safe for all other targets or not.

I would suggest you set LIB2_DIVMOD_EXCEPTION_FLAGS to empty or
'-fasynchronous-unwind-tables' for RISC-V port and do not change the
default one in libgcc/Makefile.in.

[1] 
https://github.com/gcc-mirror/gcc/commit/fc6aa0a98a0c7d10d39dd9d1485d0996b84b1860#diff-f98ad72e54bdf47d90acbfafa467d802R132


On Fri, Aug 21, 2020 at 2:46 AM Maciej W. Rozycki via Gcc-patches
 wrote:
>
> Complement commit b932f770f70d ("x86_64 frame unwind info"), SVN r46374,
> , and replace
> `-fexceptions -fnon-call-exceptions' with `-fasynchronous-unwind-tables'
> in LIB2_DIVMOD_FUNCS compilation flags so as to provide unwind tables
> for the affected functions while not pulling the unwinder proper, which
> is not required here.
>
> Beyond saving program space it fixes a RISC-V glibc build error due to
> unsatisfied `malloc' and `free' references from the unwinder causing
> link errors with `ld.so' where libgcc has been built at -O0.
>
> gcc/
> * testsuite/gcc.target/arm/div64-unwinding.c: Rename to...
> * testsuite/gcc.dg/div64-unwinding.c: ... this.
>
> libgcc/
> * Makefile.in [!LIB2_DIVMOD_EXCEPTION_FLAGS]
> (LIB2_DIVMOD_EXCEPTION_FLAGS): Replace `-fexceptions
> -fnon-call-exceptions' with `-fasynchronous-unwind-tables'.
> ---
> Hi,
>
>  No change from v2 except for the removal of the ARM parts; hence no need
> to retest.  OK to apply?
>
>   Maciej
>
> Changes from v2:
>
> - Removal of the ARM overrides removed.
>
> Changes from v1:
>
> - ChangeLog entries added.
> ---
>  gcc/testsuite/gcc.dg/div64-unwinding.c |   25 
> +
>  gcc/testsuite/gcc.target/arm/div64-unwinding.c |   25 
> -
>  libgcc/Makefile.in |2 +-
>  3 files changed, 26 insertions(+), 26 deletions(-)
>
> gcc-libgcc-divmod-asynchronous-unwind-tables.diff
> Index: gcc/gcc/testsuite/gcc.dg/div64-unwinding.c
> ===
> --- /dev/null
> +++ gcc/gcc/testsuite/gcc.dg/div64-unwinding.c
> @@ -0,0 +1,25 @@
> +/* Performing a 64-bit division should not pull in the unwinder.  */
> +
> +/* { dg-do run { target { { ! *-*-linux* } && { ! *-*-uclinux* } } } } */
> +/* { dg-skip-if "load causes weak symbol resolution" { vxworks_kernel } } */
> +/* { dg-options "-O0" } */
> +
> +#include 
> +
> +long long
> +foo (long long c, long long d)
> +{
> +  return c/d;
> +}
> +
> +long long x = 0;
> +long long y = 1;
> +
> +extern int (*_Unwind_RaiseException) (void *) __attribute__((weak));
> +
> +int main(void)
> +{
> +  if (&_Unwind_RaiseException != NULL)
> +abort ();;
> +  return foo (x, y);
> +}
> Index: gcc/gcc/testsuite/gcc.target/arm/div64-unwinding.c
> ===
> --- gcc.orig/gcc/testsuite/gcc.target/arm/div64-unwinding.c
> +++ /dev/null
> @@ -1,25 +0,0 @@
> -/* Performing a 64-bit division should not pull in the unwinder.  */
> -
> -/* { dg-do run { target { { ! *-*-linux* } && { ! *-*-uclinux* } } } } */
> -/* { dg-skip-if "load causes weak symbol resolution" { vxworks_kernel } } */
> -/* { dg-options "-O0" } */
> -
> -#include 
> -
> -long long
> -foo (long long c, long long d)
> -{
> -  return c/d;
> -}
> -
> -long long x = 0;
> -long long y = 1;
> -
> -extern int (*_Unwind_RaiseException) (void *) __attribute__((weak));
> -
> -int main(void)
> -{
> -  if (&_Unwind_RaiseException != NULL)
> -abort ();;
> -  return foo (x, y);
> -}
> Index: gcc/libgcc/Makefile.in
> ===
> --- gcc.orig/libgcc/Makefile.in
> +++ gcc/libgcc/Makefile.in
> @@ -533,7 +533,7 @@ endif
>  ifeq ($(LIB2_DIVMOD_EXCEPTION_FLAGS),)
>  # Provide default flags for compiling divmod functions, if they haven't been
>  # set already by a target-specific Makefile fragment.
> -LIB2_DIVMOD_EXCEPTION_FLAGS := -fexceptions -fnon-ca

Re: [PATCH] middle-end: Simplify (sign_extend:HI (truncate:QI (ashiftrt:HI X 8)))

2020-08-24 Thread Jeff Law via Gcc-patches
On Sun, 2020-07-19 at 10:42 +0100, Roger Sayle wrote:
> The combination of several my recent nvptx patches has revealed an
> interesting RTL optimization opportunity.  This patch to simplify-rtx.c
> simplifies (sign_extend:HI (truncate:QI (?shiftrt:HI x 8))) to just
> (ashiftrt:HI x 8), as the inner shift already sets the high bits
> appropriately.  The equivalent zero_extend variant appears to already
> be implemented in simplify_unary_operation_1.
> 
> During the compilation of one of the tests in the test suite, we
> manage the generate the redundant sequence of instructions:
> 
> (insn 17 16 18 3 (set (reg:HI 35)
> (ashiftrt:HI (reg:HI 34 [ arg ])
> (const_int 8 [0x8]))) "v2si-cvt.c":14:8 94 {ashrhi3}
>  (expr_list:REG_DEAD (reg:HI 34 [ arg ])
> (nil)))
> (insn 18 17 19 3 (set (reg:QI 36)
> (truncate:QI (reg:HI 35))) "v2si-cvt.c":14:8 28 {trunchiqi2}
>  (expr_list:REG_DEAD (reg:HI 35)
> (nil)))
> (insn 19 18 20 3 (set (reg:HI 37)
> (sign_extend:HI (reg:QI 36))) "v2si-cvt.c":14:6 22 {extendqihi2}
>  (expr_list:REG_DEAD (reg:QI 36)
> (nil)))
I can't recall the target, but I've seen similar sequences as well.  Thanks for
digging into it. 

Jeff



Re: [PING][PATCH] tilepro: Update generator file to define IN_TARGET_CODE in target file.

2020-08-24 Thread Jeff Law via Gcc-patches
On Mon, 2020-08-17 at 09:58 +0200, Iain Buclaw via Gcc-patches wrote:
> Ping.
> 
> On 31/05/2020 12:48, Iain Buclaw wrote:
> > Hi,
> > 
> > The target files tilegx/mul-tables.c and tilepri/mul-tables.c were
> > updated in SVN r255743, but the generator file that produces them
> > wasn't, so it was reverting this change during builds.
> > 
> > Only tested by running make all-gcc for all tile*-*-* targets present in
> > config-list.mk.
> > 
> > OK?
> > 
> > Regards
> > Iain
> > 
> > ---
> > gcc/ChangeLog:
> > 
> > * config/tilepro/gen-mul-tables.cc (main): Define IN_TARGET_CODE to 1
> > in the target file.
OK
jeff
> > 



Re: [PING][PATCH 6/6] contrib: Add OPT-enable-obsolete to tile*-*-*

2020-08-24 Thread Jeff Law via Gcc-patches
On Mon, 2020-08-17 at 09:59 +0200, Iain Buclaw via Gcc-patches wrote:
> Ping.
> 
> On 31/05/2020 12:20, Iain Buclaw wrote:
> > The tile*-*-* targets were marked as obsolete in SVN r259724.
> > 
> > OK?
> > 
> > Regards
> > Iain
> > 
> > ---
> > contrib/ChangeLog:
> > 
> > * config-list.mk (LIST): Add OPT-enable-obsolete to tilegx-linux-gnu,
> > tilegxbe-linux-gnu, and tilepro-linux-gnu.
OK
jeff
> > 



Re: [PATCH] middle-end: Simplify popcount/parity of bswap/rotate.

2020-08-24 Thread Jeff Law via Gcc-patches
On Fri, 2020-08-21 at 17:52 +0100, Roger Sayle wrote:
> This simple patch to match.pd optimizes away bit permutation
> operations, specifically bswap and rotate, in calls to popcount and
> parity.  Although this patch has been developed and tested on LP64,
> it relies on there being no truncations or extensions to "marry up"
> the appropriate PARITY, PARITYL and PARITYLL forms with either BSWAP32
> or BSWAP64, assuming this transformation won't fire if the integral
> types have different sizes.
> 
> The following patch has been tested on x86_64-pc-linux-gnu with
> "make bootstrap" and "make -k check" with no new failures.
> Ok for mainline?
> 
> 2020-08-21  Roger Sayle  
> 
> gcc/ChangeLog
>   * gcc/match.pd (popcount(bswapN(x)) -> popcount(x),
>   popcount(rotate(x)) -> popcount(x), parity(bswapN(x)) -> parity(x),
>   parity(rotate(x)) -> parity(x)): New simplifications.
> 
> gcc/testsuite/ChangeLog
>   * gcc.dg/fold-popcount-6.c: New test.
>   * gcc.dg/fold-popcount-7.c: New test.
>   * gcc.dg/fold-parity-6.c: New test.
>   * gcc.dg/fold-parity-7.c: New test.
My worry here would be GCC's habit of not type checking arguments to builtins 
all
that well.  It's come up several times recently and while I think some of those
holes have been closed, I don't have much confidence we're doing a good job
there.

So with that in mind, this is fine if you verify that the type of the argument 
is
the same as the resultant type.  I think you've got access to both within the
match.pd framework.

jeff
> 



Re: [PATCH] middle-end: PR tree-optimization/21137: STRIP_NOPS avoids missed optimization.

2020-08-24 Thread Jeff Law via Gcc-patches
On Fri, 2020-08-21 at 17:55 +0100, Roger Sayle wrote:
> PR tree-optimization/21137 is now an old enhancement request pointing out
> that an optimization I added back in 2006, to optimize "((x>>31)&64) != 0"
> as "x < 0", doesn't fire in the presence of unanticipated type conversions.
> The fix is to call STRIP_NOPS at the appropriate point.
> 
> I'd considered moving this transformation to match.pd, but it's a lot of
> complex logic that (I suspect) would be just as ugly in match.pd as it is
> in fold-const.c.
> 
> This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
> and "make -k check" with no new failures.
> Ok for mainline?
> 
> 2020-08-21  Roger Sayle  
> 
> gcc/ChangeLog
>   PR tree-optimization/21137
>   * gcc/fold-const.c (fold_binary_loc) [NE_EXPR/EQ_EXPR]: Call
>   STRIP_NOPS when checking whether to simplify ((x>>C1)&C2) != 0.
> 
> gcc/testsuite/ChangeLog
>   PR tree-optimization/21137
>   * gcc.dg/pr21137.c: New test.
OK
jeff
> 



Re: [PATCH] sra: Bail out when encountering accesses with negative offsets (PR 96730)

2020-08-24 Thread Jeff Law via Gcc-patches
On Mon, 2020-08-24 at 17:07 +0200, Martin Jambor wrote:
> Hi,
> 
> I must admit I was quite surprised to see that SRA does not disqualify
> an aggregate from any transformations when it encounters an offset for
> which get_ref_base_and_extent returns a negative offset.  It may not
> matter too much because I sure hope such programs always have
> undefined behavior (SRA candidates are local variables on stack) but
> it is probably better not to perform weird transformations on them as
> build ref model with the new build_reconstructed_reference function
> currently happily do for negative offsets (they just copy the existing
> expression which is then used as the expression of a "propagated"
> access) and of course the compiler must not ICE (as it currently does
> because the SRA forest verifier does not like the expression).
> 
> Fixed with the following patch which also passed bootstrap and testing
> on an x86_64-linux.  OK for master and later on for the gcc-10 branch?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2020-08-24  Martin Jambor  
> 
>   PR tree-optimization/96730
>   * tree-sra.c (create_access): Disqualify any aggregate with negative
>   offset access.
>   (build_ref_for_model): Add assert that offset is non-negative.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-08-24  Martin Jambor  
> 
>   PR tree-optimization/96730
>   * gcc.dg/tree-ssa/pr96730.c: New test.
OK
jeff
> 



Re: [PATCH] match.pd: Simplify copysign (x, -x) to -x [PR96715]

2020-08-24 Thread Jeff Law via Gcc-patches
On Mon, 2020-08-24 at 23:39 +0200, Jakub Jelinek via Gcc-patches wrote:
> Hi!
> 
> The following patch implements an optimization suggested in the PR,
> copysign(x,-x) can be optimized into -x (even without -ffast-math,
> should work fine even for signed zeros and infinities or nans).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2020-08-24  Jakub Jelinek  
> 
>   PR tree-optimization/96715
>   * match.pd (copysign(x,-x) -> -x): New simplification.
> 
>   * gcc.dg/tree-ssa/copy-sign-3.c: New test.
OK
jeff
> 



Re: [PATCH] MIPS: Fix __builtin_longjmp (PR 64242)

2020-08-24 Thread Jeff Law via Gcc-patches
On Sun, 2020-07-12 at 14:27 +0800, Paul Hua wrote:
> From 589dbe8a1c2397bfafefa4e84abe5ec6e6798928 Mon Sep 17 00:00:00 2001
> From: Andrew Pinski 
> Date: Wed, 12 Feb 2020 11:42:57 +
> Subject: [PATCH] MIPS: Fix __builtin_longjmp (PR 64242)
> 
> The problem here is mips has its own builtin_longjmp
> pattern and it was not fixed when expand_builtin_longjmp
> was fixed.  We need to read the new fp and gp before
> restoring the stack as the buffer might be a local
> variable.
> 
> Change-Id: I3416568e260e6bde3ad5cc470fb4f2ecfa207f05
> Signed-off-by: Andrew Pinski 
> 
> This patch from Andrew, I bootstrapped and tested on mips64el-linux-gnu.
> 
> OK for master ?
> 
> gcc/ChangeLog:
> 
> PR middle-end/64242
> * config/mips/mips.md (builtin_longjmp): Restore the frame pointer
>and stack pointer and gp.
OK
jeff



Re: std:vec for classes with constructor? (Was: Re: [patch] multi-range implementation for value_range (irange))

2020-08-24 Thread Andrew MacLeod via Gcc-patches

On 8/24/20 5:53 PM, Jeff Law wrote:

On Wed, 2020-08-05 at 17:41 +0200, Aldy Hernandez wrote:

On 8/5/20 5:09 PM, Martin Jambor wrote:


On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:

[...]


* ipa-cp changes from vec to std::vec.

We are using std::vec to ensure constructors are run, which they aren't
in our internal vec<> implementation.  Although we usually steer away
from using std::vec because of interactions with our GC system,
ipcp_param_lattices is only live within the pass and allocated with calloc.


Ummm... I did not object but I will save the URL of this message in the
archive so that I can waive it in front of anyone complaining why I
don't use our internal vec's in IPA data structures.

But it actually raises a broader question: was this supposed to be an
exception, allowed only not to complicate the irange patch further, or
will this be generally accepted thing to do when someone wants to have a
vector of constructed items?

I don't want to start a precedent without further discussion, so let's
assume this was an exception.

I'd characterize it as an exception too.  When I looked at the patch for Aldy, I
called out this code as well and asked him to justify it and convince me it was
safe, which he did.  I'd to the same for anything else adding use of a standard
C++ container to hold a GC-able object.

I'd certainly like to get to a place where we could be using more standard C++
library components.  But it'll take time to get there and consensus within the
project.

jeff
I think there were adjustments made to gcc's vec, and the usage of 
std::vector is no longer present..


I also think it was not a long term perfect solution, but adequate for 
our needs at the moment.


Personally, I have no issue with std::vector when approriate.. too bad 
we can make it adequate for all our usage  :-P  That is a much bigger 
issue I am afraid.


Andrew




Re: std:vec for classes with constructor? (Was: Re: [patch] multi-range implementation for value_range (irange))

2020-08-24 Thread Jeff Law via Gcc-patches
On Wed, 2020-08-05 at 17:41 +0200, Aldy Hernandez wrote:
> On 8/5/20 5:09 PM, Martin Jambor wrote:
> 
> > On Fri, Jul 31 2020, Aldy Hernandez via Gcc-patches wrote:
> > 
> > [...]
> > 
> > > * ipa-cp changes from vec to std::vec.
> > > 
> > > We are using std::vec to ensure constructors are run, which they aren't
> > > in our internal vec<> implementation.  Although we usually steer away
> > > from using std::vec because of interactions with our GC system,
> > > ipcp_param_lattices is only live within the pass and allocated with 
> > > calloc.
> > > 
> > 
> > Ummm... I did not object but I will save the URL of this message in the
> > archive so that I can waive it in front of anyone complaining why I
> > don't use our internal vec's in IPA data structures.
> > 
> > But it actually raises a broader question: was this supposed to be an
> > exception, allowed only not to complicate the irange patch further, or
> > will this be generally accepted thing to do when someone wants to have a
> > vector of constructed items?
> 
> I don't want to start a precedent without further discussion, so let's 
> assume this was an exception.
I'd characterize it as an exception too.  When I looked at the patch for Aldy, I
called out this code as well and asked him to justify it and convince me it was
safe, which he did.  I'd to the same for anything else adding use of a standard
C++ container to hold a GC-able object.

I'd certainly like to get to a place where we could be using more standard C++
library components.  But it'll take time to get there and consensus within the
project.

jeff
> 



Re: [PATCH v2] c++: Implement P1009: Array size deduction in new-expressions.

2020-08-24 Thread Marek Polacek via Gcc-patches
On Fri, Aug 21, 2020 at 05:04:43PM -0400, Jason Merrill wrote:
> On 8/20/20 4:22 PM, Marek Polacek wrote:
> > @@ -3917,6 +3926,47 @@ build_new (location_t loc, vec 
> > **placement, tree type,
> > return error_mark_node;
> >   }
> > +  /* P1009: Array size deduction in new-expressions.  */
> > +  if (TREE_CODE (type) == ARRAY_TYPE
> > +  && !TYPE_DOMAIN (type)
> > +  && *init)
> > +{
> > +  /* This means we have 'new T[]()'.  */
> > +  if ((*init)->is_empty ())
> > +   {
> > + tree ctor = build_constructor (init_list_type_node, NULL);
> > + CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
> > + vec_safe_push (*init, ctor);
> > +   }
> > +  tree &elt = (**init)[0];
> > +  /* The C++20 'new T[](e_0, ..., e_k)' case allowed by P0960.  */
> > +  if (!DIRECT_LIST_INIT_P (elt) && cxx_dialect >= cxx20)
> > +   {
> > + /* Handle new char[]("foo").  */
> > + if (vec_safe_length (*init) == 1
> > + && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type)))
> > + && TREE_CODE (tree_strip_any_location_wrapper (elt))
> > +== STRING_CST)
> > +   /* Leave it alone: the string should not be wrapped in {}.  */;
> > + else
> > +   {
> > + /* Create a CONSTRUCTOR from the vector INIT.  */
> > + tree list = build_tree_list_vec (*init);
> > + tree ctor = build_constructor_from_list (init_list_type_node, 
> > list);
> > + CONSTRUCTOR_IS_DIRECT_INIT (ctor) = true;
> > + CONSTRUCTOR_IS_PAREN_INIT (ctor) = true;
> 
> It seems wasteful to build a TREE_LIST only to throw it away; let's add a
> function to build a CONSTRUCTOR directly from a vec*.  And use it in
> build_new_method_call_1 as well.

Good point.  Done.

> > @@ -9082,14 +9084,17 @@ cp_parser_direct_new_declarator (cp_parser* parser)
> > cp_parser_require (parser, CPP_OPEN_SQUARE, RT_OPEN_SQUARE);
> > token = cp_lexer_peek_token (parser->lexer);
> > -  expression = cp_parser_expression (parser);
> > +  if (token->type == CPP_CLOSE_SQUARE)
> > +   expression = NULL_TREE;
> 
> Only the first bound is optional; we need to require the expression for
> subsequent bounds.

Fixed (though the diagnostic we gave was pretty good).  New test added.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
This patch implements C++20 P1009, allowing code like

  new double[]{1,2,3}; // array bound will be deduced

Since this proposal makes the initialization rules more consistent, it is
applied to all previous versions of C++ (thus, effectively, all the way back
to C++11).

My patch is based on Jason's patch that handled the basic case.  I've
extended it to work with ()-init and also the string literal case.
Further testing revealed that to handle stuff like

  new int[]{t...};

in a template, we have to consider such a NEW_EXPR type-dependent.
Obviously, we first have to expand the pack to be able to deduce the
number of elements in the array.

Curiously, while implementing this proposal, I noticed that we fail
to accept

  new char[4]{"abc"};

so I've assigned 77841 to self.  I think the fix will depend on the
build_new_1 hunk in this patch.

The new tree.c function build_constructor_from_vec helps us morph
a vector into a CONSTRUCTOR more efficiently.

gcc/cp/ChangeLog:

PR c++/93529
* call.c (build_new_method_call_1): Use build_constructor_from_vec
instead of build_tree_list_vec + build_constructor_from_list.
* init.c (build_new_1): Handle new char[]{"foo"}.  Use
build_constructor_from_vec instead of build_tree_list_vec +
build_constructor_from_list.
(build_new): Deduce the array size in new-expression if not
present.  Handle ()-init.  Handle initializing an array from
a string literal.
* parser.c (cp_parser_new_type_id): Leave [] alone.
(cp_parser_direct_new_declarator): Allow [].
* pt.c (type_dependent_expression_p): In a NEW_EXPR, consider
array types whose dimension has to be deduced type-dependent.

gcc/ChangeLog:

PR c++/93529
* tree.c (build_constructor_from_vec): New.
* tree.h (build_constructor_from_vec): Declare.

gcc/testsuite/ChangeLog:

PR c++/93529
* g++.dg/cpp0x/sfinae4.C: Adjust expected result after P1009.
* g++.dg/cpp2a/new-array1.C: New test.
* g++.dg/cpp2a/new-array2.C: New test.
* g++.dg/cpp2a/new-array3.C: New test.
* g++.dg/cpp2a/new-array4.C: New test.

Co-authored-by: Jason Merrill 
---
 gcc/cp/call.c   |  4 +-
 gcc/cp/init.c   | 55 +--
 gcc/cp/parser.c | 13 +++--
 gcc/cp/pt.c |  4 ++
 gcc/testsuite/g++.dg/cpp0x/sfinae4.C|  8 ++-
 gcc/testsuite/g++.dg/cpp2a/new-array1.C | 70 +
 gcc/testsuite/g++.dg/cpp2a/new-array2.C | 22 
 gcc/testsuite/g++.dg/cpp2a/new-array3.C | 17

[PATCH] match.pd: Simplify copysign (x, -x) to -x [PR96715]

2020-08-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch implements an optimization suggested in the PR,
copysign(x,-x) can be optimized into -x (even without -ffast-math,
should work fine even for signed zeros and infinities or nans).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-08-24  Jakub Jelinek  

PR tree-optimization/96715
* match.pd (copysign(x,-x) -> -x): New simplification.

* gcc.dg/tree-ssa/copy-sign-3.c: New test.

--- gcc/match.pd.jj 2020-08-24 10:00:01.0 +0200
+++ gcc/match.pd2020-08-24 15:53:56.001381109 +0200
@@ -5293,6 +5293,11 @@ (define_operator_list COND_TERNARY
  (COPYSIGN_ALL @0 @0)
  @0)
 
+(simplify
+ /* copysign(x,-x) -> -x.  */
+ (COPYSIGN_ALL @0 (negate@1 @0))
+ @1)
+
 (simplify
  /* copysign(x,y) -> fabs(x) if y is nonnegative.  */
  (COPYSIGN_ALL @0 tree_expr_nonnegative_p@1)
--- gcc/testsuite/gcc.dg/tree-ssa/copy-sign-3.c.jj  2020-08-24 
16:09:11.543453434 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/copy-sign-3.c 2020-08-24 16:08:31.360020186 
+0200
@@ -0,0 +1,23 @@
+/* PR tree-optimization/96715 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-not "= __builtin_copysign" "optimized" } } */
+/* { dg-final { scan-tree-dump-times " = -x_\[0-9]*\\(D\\)" 3 "optimized" } } 
*/
+
+float
+foo (float x)
+{
+  return __builtin_copysignf (x, -x);
+}
+
+double
+bar (double x)
+{
+  return __builtin_copysign (x, -x);
+}
+
+long double
+baz (long double x)
+{
+  return __builtin_copysignl (x, -x);
+}

Jakub



RE: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-24 Thread Carl Love via Gcc-patches
Segher:

On Wed, 2020-08-19 at 15:16 -0500, Segher Boessenkool wrote:
> On Wed, Aug 19, 2020 at 02:19:12PM -0500, Peter Bergner wrote:
> > On 8/14/20 7:42 PM, Segher Boessenkool wrote:
> > > I think your current code is fine; I hadn't considered Bill's
> > > upcoming
> > > rewrite.  It is more important to make that go smoother than to
> > > fix some
> > > aesthetics right now.
> > > 
> > > Please check if the names for the builtins match the (future)
> > > documentation, and then, approved for trunk.  Thank you!
> > 
> > This is also a bug in GCC 10, so this should be backported too.
> > 
> > Segher, is this ok for Carl to backport to GCC 10 after it has sat
> > on
> > trunk for a few days?
> 
> Yes.  Thanks guys.

I attempted to do the backport however the patch doesn't even come
close to applying.  The names XVCVBF16SPN and XVCVSPBF16 are the only
two builtin names that exist in GCC 10.  The other issue is there is no
Power 10 builtin macro definitions in GCC 10.  So basically, I can fix
the issue with XVCVBF16SPN and XVCVSPBF16 to be restricted to Power 10
by adding the needed Power 10 macro definition.  

This is a whole new patch so I figure it needs to be reviewed to make
sure we want to make this change to GCC 10.  I did run the regression
tests again using a Power 9 machine to verify it complies and there are
no regression test failures.  

Please let me know if this is OK for the GCC 10 tree.  Thanks.

  Carl Love


[PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Back port to GCC 
10.

gcc/ChangeLog

2020-08-24  Carl Love  
* config/rs6000/rs6000-builtin.def: (BU_P10V_VSX_1) New builtin macro 
expansion.
(XVCVBF16SPN, XVCVSPBF16): Replace macro expansion BU_VSX_1 with 
BU_P10V_VSX_1.
* config/rs6000/rs6000-call.c: (VSX_BUILTIN_XVCVSPBF16, 
VSX_BUILTIN_XVCVBF16SPN):
Replace with P10V_BUILTIN_XVCVSPBF16, P10V_BUILTIN_XVCVBF16SPN 
respectively.
---
 gcc/config/rs6000/rs6000-builtin.def | 14 --
 gcc/config/rs6000/rs6000-call.c  |  4 ++--
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 88f78cb0a15..512b7629a46 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1014,6 +1014,16 @@
 | RS6000_BTC_BINARY),  \
CODE_FOR_ ## ICODE) /* ICODE */
 
+/* For builtins for power10 vector instructions that are encoded as altivec
+   instructions, use __builtin_altivec_ as the builtin name.  */
+
+#define BU_P10V_VSX_1(ENUM, NAME, ATTR, ICODE)\
+  RS6000_BUILTIN_1 (P10_BUILTIN_ ## ENUM,  /* ENUM */  \
+   "__builtin_vsx_" NAME,  /* NAME */  \
+   RS6000_BTM_P10, /* MASK */  \
+   (RS6000_BTC_ ## ATTR/* ATTR */  \
+   | RS6000_BTC_UNARY),\
+   CODE_FOR_ ## ICODE) /* ICODE */
 #endif
 
 
@@ -2698,8 +2708,8 @@ BU_SPECIAL_X (RS6000_BUILTIN_CFSTRING, 
"__builtin_cfstring", RS6000_BTM_ALWAYS,
  RS6000_BTC_MISC)
 
 /* POWER10 MMA builtins.  */
-BU_VSX_1 (XVCVBF16SPN, "xvcvbf16spn",  MISC, vsx_xvcvbf16spn)
-BU_VSX_1 (XVCVSPBF16,  "xvcvspbf16",   MISC, vsx_xvcvspbf16)
+BU_P10V_VSX_1 (XVCVBF16SPN,"xvcvbf16spn",  MISC, vsx_xvcvbf16spn)
+BU_P10V_VSX_1 (XVCVSPBF16, "xvcvspbf16",   MISC, vsx_xvcvspbf16)
 
 BU_MMA_1 (XXMFACC, "xxmfacc",  QUAD, mma_xxmfacc)
 BU_MMA_1 (XXMTACC, "xxmtacc",  QUAD, mma_xxmtacc)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 2cf78dfa5fe..fc1671e1bb2 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -13383,8 +13383,8 @@ builtin_function_type (machine_mode mode_ret, 
machine_mode mode_arg0,
 case P8V_BUILTIN_VGBBD:
 case MISC_BUILTIN_CDTBCD:
 case MISC_BUILTIN_CBCDTD:
-case VSX_BUILTIN_XVCVSPBF16:
-case VSX_BUILTIN_XVCVBF16SPN:
+case P10V_BUILTIN_XVCVSPBF16:
+case P10V_BUILTIN_XVCVBF16SPN:
   h.uns_p[0] = 1;
   h.uns_p[1] = 1;
   break;
-- 
2.17.1




Re: [PATCH 2/5] Make sure that static data member constexpr isn't optimized away in test.

2020-08-24 Thread Jason Merrill via Gcc-patches

On 8/24/20 1:40 PM, Jakub Jelinek wrote:

On Mon, Aug 24, 2020 at 02:56:55PM +0200, Mark Wielaard wrote:

In DWARF5 class variables (static data members) are represented with a
DW_TAG_variable instead of a DW_TAG_member. Make sure the variable isn't
optimized away in the constexpr-var-1.C testcase so we can still match (2)
const_expr in the the assembly output.

Note that the same issue causes some failures in the gdb testsuite
for static data members when we enable DWARF5 by default:
https://sourceware.org/bugzilla/show_bug.cgi?id=26525
---
  gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
index 19062e29fd59..c6ad3f645379 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
@@ -7,3 +7,4 @@ struct S
  {
static constexpr int b = 6;
  } s;
+const int &c = s.b;


This looks incorrect to me, that is a workaround for a real GCC bug.

Shouldn't we instead do something like (untested) following patch?
I mean, for DWARF < 5 the static data members were using DW_TAG_member,
which has been always marked by the function, so IMHO we should also always
mark the DW_TAG_variables at the class scope that replaced those.


The earlier behavior seems like an accident, that happened because we 
always need to emit information about non-static data members.  I don't 
think we should take it as guidance.


In this case one reason we don't emit debug info is because (before 
C++17) there's no definition of 'b'.  After C++17 the in-class 
declaration of 'b' is a definition, but we don't have to give it a 
symbol, so there's still nothing for the debug info to describe.


This issue doesn't seem specific to class members; it also affects 
namespace-scope C++17 inline variables:


inline const int var = 42;
int main() { return var; }

Compiling this testcase with -g doesn't emit any debug info for 'var' 
even though it's used.


Should we assume that if a variable with DW_AT_const_value is TREE_USED, 
we need to write out debug info for it?


Jason



2020-08-24  Jakub Jelinek  

* dwarf2out.c (prune_unused_types_walk): Mark DW_TAG_variable DIEs
at class scope for DWARF5+.

--- gcc/dwarf2out.c.jj  2020-07-28 15:39:09.883757946 +0200
+++ gcc/dwarf2out.c 2020-08-24 19:33:16.503961786 +0200
@@ -29392,6 +29392,13 @@ prune_unused_types_walk (dw_die_ref die)
  if (die->die_perennial_p)
break;
  
+	  /* For static data members, the declaration in the class is supposed

+to have DW_TAG_member tag in DWARF{3,4} but DW_TAG_variable in
+DWARF5.  DW_TAG_member will be marked, so mark even such
+DW_TAG_variables in DWARF5.  */
+ if (dwarf_version >= 5 && class_scope_p (die->die_parent))
+   break;
+
  /* premark_used_variables marks external variables --- don't mark
 them here.  But function-local externals are always considered
 used.  */


Jakub





[PATCH] gimple: Ignore *0 = {CLOBBER} in path isolation [PR96722]

2020-08-24 Thread Jakub Jelinek via Gcc-patches
Hi!

Clobbers of MEM_REF with NULL address are just fancy nops, something we just
ignore and don't emit any code for it (ditto for other clobbers), they just
mark end of life on something, so we shouldn't infer from those that there
is some UB.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-08-24  Jakub Jelinek  

PR tree-optimization/96722
* gimple.c (infer_nonnull_range): Formatting fix.
(infer_nonnull_range_by_dereference): Return false for clobber stmts.

* g++.dg/opt/pr96722.C: New test.

--- gcc/gimple.c.jj 2020-08-03 22:54:51.417531677 +0200
+++ gcc/gimple.c2020-08-24 13:23:22.082312349 +0200
@@ -2917,8 +2917,8 @@ check_loadstore (gimple *, tree op, tree
 bool
 infer_nonnull_range (gimple *stmt, tree op)
 {
-  return infer_nonnull_range_by_dereference (stmt, op)
-|| infer_nonnull_range_by_attribute (stmt, op);
+  return (infer_nonnull_range_by_dereference (stmt, op)
+ || infer_nonnull_range_by_attribute (stmt, op));
 }
 
 /* Return true if OP can be inferred to be non-NULL after STMT
@@ -2930,7 +2930,8 @@ infer_nonnull_range_by_dereference (gimp
  non-NULL if -fdelete-null-pointer-checks is enabled.  */
   if (!flag_delete_null_pointer_checks
   || !POINTER_TYPE_P (TREE_TYPE (op))
-  || gimple_code (stmt) == GIMPLE_ASM)
+  || gimple_code (stmt) == GIMPLE_ASM
+  || gimple_clobber_p (stmt))
 return false;
 
   if (walk_stmt_load_store_ops (stmt, (void *)op,
--- gcc/testsuite/g++.dg/opt/pr96722.C.jj   2020-08-24 13:24:45.357132323 
+0200
+++ gcc/testsuite/g++.dg/opt/pr96722.C  2020-08-24 13:25:06.224836626 +0200
@@ -0,0 +1,20 @@
+// PR tree-optimization/96722
+// { dg-do run }
+// { dg-options "-O2" }
+
+struct S { int s; ~S () {} };
+
+void
+foo (S *a)
+{
+  if (a)
+return;
+  a->~S ();
+}
+
+int
+main ()
+{
+  S s;
+  foo (&s);
+}

Jakub



[PATCH] c++: Fix up ptr.~PTR () handling [PR96721]

2020-08-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled, because build_trivial_dtor_call
handles the case when instance is a pointer by adding a clobber to what
the pointer points to (which is desirable e.g. for delete) rather than the
pointer itself.  That is I think always desirable behavior for references,
but for pointers for the pseudo dtor case it is not.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2020-08-24  Jakub Jelinek  

PR c++/96721
* cp-tree.h (build_trivial_dtor_call): Add bool argument defaulted
to false.
* call.c (build_trivial_dtor_call): Add NO_PTR_DEREF argument.  If
instance is a pointer and NO_PTR_DEREF is true, clobber the pointer
rather than what it points to.
* semantics.c (finish_call_expr): Call build_trivial_dtor_call with
true as NO_PTR_DEREF.

* g++.dg/opt/flifetime-dse8.C: New test.

--- gcc/cp/cp-tree.h.jj 2020-08-24 10:00:01.383257572 +0200
+++ gcc/cp/cp-tree.h2020-08-24 12:28:02.698410896 +0200
@@ -6248,7 +6248,7 @@ extern bool null_member_pointer_value_p
 extern bool sufficient_parms_p (const_tree);
 extern tree type_decays_to (tree);
 extern tree extract_call_expr  (tree);
-extern tree build_trivial_dtor_call(tree);
+extern tree build_trivial_dtor_call(tree, bool = false);
 extern tree build_user_type_conversion (tree, tree, int,
 tsubst_flags_t);
 extern tree build_new_function_call(tree, vec **,
--- gcc/cp/call.c.jj2020-08-18 07:50:18.564922335 +0200
+++ gcc/cp/call.c   2020-08-24 12:30:55.211961352 +0200
@@ -8430,10 +8430,12 @@ conv_binds_ref_to_prvalue (conversion *c
 }
 
 /* Call the trivial destructor for INSTANCE, which can be either an lvalue of
-   class type or a pointer to class type.  */
+   class type or a pointer to class type.  If NO_PTR_DEREF is true and
+   INSTANCE has pointer type, clobber the pointer rather than what it points
+   to.  */
 
 tree
-build_trivial_dtor_call (tree instance)
+build_trivial_dtor_call (tree instance, bool no_ptr_deref)
 {
   gcc_assert (!is_dummy_object (instance));
 
@@ -8443,7 +8445,8 @@ build_trivial_dtor_call (tree instance)
   return fold_convert (void_type_node, instance);
 }
 
-  if (INDIRECT_TYPE_P (TREE_TYPE (instance)))
+  if (INDIRECT_TYPE_P (TREE_TYPE (instance))
+  && (!no_ptr_deref || TYPE_REF_P (TREE_TYPE (instance
 {
   if (VOID_TYPE_P (TREE_TYPE (TREE_TYPE (instance
goto no_clobber;
--- gcc/cp/semantics.c.jj   2020-08-18 07:50:18.653921086 +0200
+++ gcc/cp/semantics.c  2020-08-24 12:31:06.564800148 +0200
@@ -2713,7 +2713,7 @@ finish_call_expr (tree fn, vec

[PATCH] strlen: Fix handle_builtin_string_cmp [PR96758]

2020-08-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled, because handle_builtin_string_cmp
sees a strncmp call with constant last argument 4, where one of the strings
has an upper bound of 5 bytes (due to it being an array of that size) and
the other has a known string length of 1 and the result is used only in
equality comparison.
It is folded into __builtin_strncmp_eq (str1, str2, 4), which is
incorrect, because that means reading 4 bytes from both strings and
comparing that.  When one of the strings has known strlen of 1, we want to
compare just 2 bytes, not 4, as strncmp shouldn't compare any bytes beyond
the null.
So, the last argument to __builtin_strncmp_eq should be the minimum of the
provided strncmp last argument and the known string length + 1 (assuming
the other string has only a known upper bound due to array size).

Besides that, I've noticed the code has been written with the intent to also
support the case where we know exact string length of both strings (but not
the string content, so we can't compute it at compile time).  In that case,
both cstlen1 and cstlen2 are non-negative and both arysiz1 and arysiz2 are
negative.  We wouldn't optimize that, cmpsiz would be either the strncmp
last argument, or for strcmp the first string length, but varsiz would be
-1 and thus cmpsiz would be never < varsiz.  The patch fixes it by using the
correct length, in that case using the minimum of the two and for strncmp
also the last argument.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
For backporting, I think we should keep the
-  if (cmpsiz < varsiz && used_only_for_zero_equality (lhs))
+  if ((varsiz < 0 || cmpsiz < varsiz) && used_only_for_zero_equality (lhs))
change outso that we don't introduce a new optimization and just fix the
bug.

2020-08-24  Jakub Jelinek  

PR tree-optimization/96758
* tree-ssa-strlen.c (handle_builtin_string_cmp): If both cstlen1
and cstlen2 are set, set cmpsiz to their minimum, otherwise use the
one that is set.  If bound is used and smaller than cmpsiz, set cmpsiz
to bound.  If both cstlen1 and cstlen2 are set, perform the 
optimization.

* gcc.dg/strcmpopt_12.c: New test.

--- gcc/tree-ssa-strlen.c.jj2020-07-28 15:39:10.078755265 +0200
+++ gcc/tree-ssa-strlen.c   2020-08-24 11:31:05.457832003 +0200
@@ -4485,11 +4485,19 @@ handle_builtin_string_cmp (gimple_stmt_i
 ++cstlen2;
 
   /* The exact number of characters to compare.  */
-  HOST_WIDE_INT cmpsiz = bound < 0 ? cstlen1 < 0 ? cstlen2 : cstlen1 : bound;
+  HOST_WIDE_INT cmpsiz;
+  if (cstlen1 >= 0 && cstlen2 >= 0)
+cmpsiz = MIN (cstlen1, cstlen2);
+  else if (cstlen1 >= 0)
+cmpsiz = cstlen1;
+  else
+cmpsiz = cstlen2;
+  if (bound >= 0)
+cmpsiz = MIN (cmpsiz, bound);
   /* The size of the array in which the unknown string is stored.  */
   HOST_WIDE_INT varsiz = arysiz1 < 0 ? arysiz2 : arysiz1;
 
-  if (cmpsiz < varsiz && used_only_for_zero_equality (lhs))
+  if ((varsiz < 0 || cmpsiz < varsiz) && used_only_for_zero_equality (lhs))
 {
   /* If the known length is less than the size of the other array
 and the strcmp result is only used to test equality to zero,
--- gcc/testsuite/gcc.dg/strcmpopt_12.c.jj  2020-08-24 11:36:21.380357549 
+0200
+++ gcc/testsuite/gcc.dg/strcmpopt_12.c 2020-08-24 11:36:35.405158915 +0200
@@ -0,0 +1,17 @@
+/* PR tree-optimization/96758 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int v = 1;
+
+int
+main ()
+{
+  const char *s = v ? "a" : "b";
+  char x[5];
+  char y[5] = "a\0a";
+  __builtin_memcpy (x, y, sizeof (y));
+  if (__builtin_strncmp (x, s, 4) != 0)
+__builtin_abort ();
+  return 0;
+}

Jakub



Re: [PATCH] Adjust tree-ssa-dom.c for irange API.

2020-08-24 Thread Jeff Law via Gcc-patches
On Tue, 2020-08-04 at 13:39 +0200, Aldy Hernandez wrote:
> This patch removes all uses of VR_ANTI_RANGE in DOM.  It required
> minor surgery in the switch handling code.
> 
> In doing so, I was able to abstract all the code handling the cases
> with ranges into its own function.  Interestingly, there is an exact
> copy of this function in VRP, so I was able to use that there too.
> 
> I also saw that most of simplify_stmt_for_jump_threading() is
> duplicated in VRP/DOM, but I left that alone.  The amount of
> duplicated code in this space is mind boggling.
> 
> OK?
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-dom.c (simplify_stmt_for_jump_threading): Abstract code out 
> to...
>   * tree-vrp.c (find_case_label_range): ...here.  Rewrite for to use 
> irange
>   API.
>   (simplify_stmt_for_jump_threading): Call find_case_label_range instead 
> of
>   duplicating the code in simplify_stmt_for_jump_threading.
>   * tree-vrp.h (find_case_label_range): New prototype.
OK
jeff
> 



[PATCH] PR fortran/96711 - ICE on NINT() Function

2020-08-24 Thread Harald Anlauf
When rounding a real to integer(16) an ICE happened due to an unhandled
case in build_round_expr.  The attached patch add a special case for this.

I had to change a fold_convert to a convert (which seems to be frontend
specific stuff), otherwise I would get errors from the generated GIMPLE.
Does anybody know if this could produce problems elsewhere?
I'm open to other / better solutions.

Anyway, the patch does regtest cleanly on x86_64-pc-linux-gnu.

OK for master?

Thanks,
Harald


PR fortran/96711 - ICE with NINT() for integer(16) result

When rounding a real to the nearest integer, temporarily convert the real
argument to a longer real kind when the result is of type/kind integer(16).

gcc/fortran/ChangeLog:

* trans-intrinsic.c (build_round_expr): Use temporary with
appropriate kind for conversion before rounding to nearest
integer when the result precision is 128 bits.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr96711.f90: New test.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 2483f016d8e..32fe9886c57 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -395,11 +395,24 @@ build_round_expr (tree arg, tree restype)
 fn = builtin_decl_for_precision (BUILT_IN_LROUND, argprec);
   else if (resprec <= LONG_LONG_TYPE_SIZE)
 fn = builtin_decl_for_precision (BUILT_IN_LLROUND, argprec);
+  else if (resprec >= argprec && resprec == 128)
+{
+  /* Search for a real kind suitable as temporary for conversion.  */
+  int kind = -1;
+  for (int i = 0; kind < 0 && gfc_real_kinds[i].kind != 0; i++)
+	if (gfc_real_kinds[i].mode_precision >= resprec)
+	  kind = gfc_real_kinds[i].kind;
+  if (kind < 0)
+	gfc_internal_error ("Could not find real kind with at least %d bits",
+			resprec);
+  arg = fold_convert (gfc_float128_type_node, arg);
+  fn = gfc_builtin_decl_for_float_kind (BUILT_IN_ROUND, kind);
+}
   else
 gcc_unreachable ();

-  return fold_convert (restype, build_call_expr_loc (input_location,
-		 fn, 1, arg));
+  return convert (restype, build_call_expr_loc (input_location,
+		fn, 1, arg));
 }


diff --git a/gcc/testsuite/gfortran.dg/pr96711.f90 b/gcc/testsuite/gfortran.dg/pr96711.f90
new file mode 100644
index 000..c6534713b52
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr96711.f90
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! { dg-require-effective-target fortran_integer_16 }
+! { dg-require-effective-target fortran_real_16 }
+!
+! PR fortran/96711 - ICE on NINT() Function
+
+program p
+  implicit none
+  integer(16) :: m
+  real(8) :: x
+  real(16):: y
+  x = 2 / epsilon (x)
+  y = 2 / epsilon (y)
+  m = nint (x, kind(m))
+  print *, m
+  m = nint (y, kind(m))
+  print *, m
+end program


Re: [PATCH 2/5] Make sure that static data member constexpr isn't optimized away in test.

2020-08-24 Thread Tom Tromey
>> This looks incorrect to me, that is a workaround for a real GCC bug.

Mark> I was discussing this after the BoF with Tom Tromey (CCed) and he also
Mark> thought gdb could/should actually support the DWARF5 representation,
Mark> but because the DW_TAG_variable was removed because the static data
Mark> member wasn't referenced in the gdb testcases.

I updated the gdb bug with some findings:

https://sourceware.org/bugzilla/show_bug.cgi?id=26525

Mark> This looks really good, and it makes all the FAILs in the gdb bug
Mark> report PASS (when build with -gdwarf-5 as default).

I thought this might be the case; thank you for trying it out.

Tom


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Qing Zhao via Gcc-patches



> On Aug 24, 2020, at 3:26 PM, Segher Boessenkool  
> wrote:
> 
> On Mon, Aug 24, 2020 at 01:48:02PM -0500, Qing Zhao wrote:
>> 
>> 
>>> On Aug 24, 2020, at 12:59 PM, Segher Boessenkool 
>>>  wrote:
>>> 
>>> [ Please quote correctly.  I fixed this up a bit. ]
>>> 
>>> On Mon, Aug 24, 2020 at 02:47:22PM +, Rodriguez Bahena, Victor wrote:
> The call-clobbered regs are the only ones you *can* touch.  That does
> not mean you should clear them all (it doesn't help much at all in some
> cases).  Only the backend knows.
 
 I think that for ROP mitigation purpose, we only need to clear the 
 call-used (i.e, call-clobbered) registers that are used in the current 
 routine and
 can pass parameters.
>>> 
>>> Which is more than you *can* do as well (consider return value registers
>>> for example; there are more cases, in general; only the backend code can
>>> know what is safe to do).
>> 
>> Yes, So, we agreed to move the code generation implementation part into 
>> backend.
>> 
>> In Middle-end, we will only compute the hard register set based on call abi 
>> information and data flow information, also handle the command line option.
> 
> You cannot in general figure out what registers you can clobber without
> asking the backend.  You can figure out some that you *cannot* clobber,
> but that isn't very useful.
> 
> Do you want to do this before or after the epilogue code is generated?

static rtx_insn *
make_epilogue_seq (void)
{
  if (!targetm.have_epilogue ())
return NULL;

  start_sequence ();
  emit_note (NOTE_INSN_EPILOGUE_BEG);

 + gen_call_used_regs_seq (); // this is the place to 
emit the zeroing insn sequence

  rtx_insn *seq = targetm.gen_epilogue ();
…
}

Any comment on this?

thanks.

Qing




> 
> 
> Segher



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Qing Zhao via Gcc-patches



> On Aug 24, 2020, at 3:20 PM, Segher Boessenkool  
> wrote:
> 
> Hi!
> 
> On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
>>> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool 
>>>  wrote:
>>> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool 
>  wrote:
> Numbers on how expensive this is (for what arch, in code size and in
> execution time) would be useful.  If it is so expensive that no one will
> use it, it helps security at most none at all :-(
>>> 
>>> Without numbers on this, no one can determine if it is a good tradeoff
>>> for them.  And we (the GCC people) cannot know if it will be useful for
>>> enough users that it will be worth the effort for us.  Which is why I
>>> keep hammering on this point.
>> I can collect some run-time overhead data on this, do you have a 
>> recommendation on what test suite I can use
>> For this testing? (Is CPU2017 good enough)?
> 
> I would use something more real-life, not 12 small pieces of code.

Then, what kind of real-life benchmark you are suggesting? 

> 
>>> (The other side of the coin is how much this helps prevent exploitation;
>>> numbers on that would be good to see, too.)
>> 
>> This can be well showed from the paper:
>> 
>> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming 
>> Attacks"
>> 
>> https://urldefense.com/v3/__https://ieeexplore.ieee.org/document/8445132__;!!GqivPVa7Brio!JbdLvo54xB3ORTeZqpy_PwZsL9drNLaKjbg14bTKMOwxt8LWnjZ8gJWlqtlrFKPh$
>>   
>> >  >
>> 
>> Please take a look at this paper. 
> 
> As I told you before, that isn't open information, I cannot reply to
> any of that.

A little confused here, what’s you mean by “open information”? Is the 
information in a published paper not open information?

Qing
> 
> 
> Segher



Re: [PATCH 3/5] Add DWARF5 variants of assembly scan tests that use DW_FORM_implicit_const

2020-08-24 Thread Mark Wielaard
On Mon, Aug 24, 2020 at 07:44:27PM +0200, Jakub Jelinek wrote:
> On Mon, Aug 24, 2020 at 02:56:56PM +0200, Mark Wielaard wrote:
> > Some DWARF tests scan the assembly output looking for constant values.
> > When using DWARF5 those constants might use DW_FORM_implicit_const,
> > which are output (in the comments) after the attribute instead of
> > before. To make sure these tests work introduce a -gdwarf-5 variant
> > of these tests and explicitly use -gdwarf-2 for the original.
> 
> I just wonder if we want to use -gdwarf-2 rather than -gdwarf-4 in the
> original, -gdwarf-5 has been the default for a couple of years and thus
> that is what those testshave been compiled with.

I used -gdwarf-2 because I thought that was still the default for some
arches/platforms. And they pass with -gdwarf-2.

> Also not sure about the -dwarf5 suffixes, couldn't we say just use
> pr41445-{7,8}.c, inline-var-2.C or inline3.c (or whatever next number
> with the same prefix is still unused)?

Sure, if that is a better naming scheme I'll rename them.

Cheers,

Mark



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Segher Boessenkool
On Mon, Aug 24, 2020 at 01:48:02PM -0500, Qing Zhao wrote:
> 
> 
> > On Aug 24, 2020, at 12:59 PM, Segher Boessenkool 
> >  wrote:
> > 
> > [ Please quote correctly.  I fixed this up a bit. ]
> > 
> > On Mon, Aug 24, 2020 at 02:47:22PM +, Rodriguez Bahena, Victor wrote:
> >>> The call-clobbered regs are the only ones you *can* touch.  That does
> >>> not mean you should clear them all (it doesn't help much at all in some
> >>> cases).  Only the backend knows.
> >> 
> >> I think that for ROP mitigation purpose, we only need to clear the 
> >> call-used (i.e, call-clobbered) registers that are used in the current 
> >> routine and
> >> can pass parameters.
> > 
> > Which is more than you *can* do as well (consider return value registers
> > for example; there are more cases, in general; only the backend code can
> > know what is safe to do).
> 
> Yes, So, we agreed to move the code generation implementation part into 
> backend.
> 
> In Middle-end, we will only compute the hard register set based on call abi 
> information and data flow information, also handle the command line option.

You cannot in general figure out what registers you can clobber without
asking the backend.  You can figure out some that you *cannot* clobber,
but that isn't very useful.

Do you want to do this before or after the epilogue code is generated?


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Segher Boessenkool
Hi!

On Mon, Aug 24, 2020 at 01:02:03PM -0500, Qing Zhao wrote:
> > On Aug 24, 2020, at 12:49 PM, Segher Boessenkool 
> >  wrote:
> > On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> >>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool 
> >>>  wrote:
> >>> Numbers on how expensive this is (for what arch, in code size and in
> >>> execution time) would be useful.  If it is so expensive that no one will
> >>> use it, it helps security at most none at all :-(
> > 
> > Without numbers on this, no one can determine if it is a good tradeoff
> > for them.  And we (the GCC people) cannot know if it will be useful for
> > enough users that it will be worth the effort for us.  Which is why I
> > keep hammering on this point.
> I can collect some run-time overhead data on this, do you have a 
> recommendation on what test suite I can use
> For this testing? (Is CPU2017 good enough)?

I would use something more real-life, not 12 small pieces of code.

> > (The other side of the coin is how much this helps prevent exploitation;
> > numbers on that would be good to see, too.)
> 
> This can be well showed from the paper:
> 
> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming 
> Attacks"
> 
> https://ieeexplore.ieee.org/document/8445132 
> 
> 
> Please take a look at this paper. 

As I told you before, that isn't open information, I cannot reply to
any of that.


Segher


Re: [PATCH 2/5] Make sure that static data member constexpr isn't optimized away in test.

2020-08-24 Thread Mark Wielaard
Hi Jakub,

On Mon, Aug 24, 2020 at 07:40:51PM +0200, Jakub Jelinek wrote:
> On Mon, Aug 24, 2020 at 02:56:55PM +0200, Mark Wielaard wrote:
> > In DWARF5 class variables (static data members) are represented with a
> > DW_TAG_variable instead of a DW_TAG_member. Make sure the variable isn't
> > optimized away in the constexpr-var-1.C testcase so we can still match (2)
> > const_expr in the the assembly output.
> > 
> > Note that the same issue causes some failures in the gdb testsuite
> > for static data members when we enable DWARF5 by default:
> > https://sourceware.org/bugzilla/show_bug.cgi?id=26525
> > ---
> >  gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C 
> > b/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
> > index 19062e29fd59..c6ad3f645379 100644
> > --- a/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
> > +++ b/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
> > @@ -7,3 +7,4 @@ struct S
> >  {
> >static constexpr int b = 6;
> >  } s;
> > +const int &c = s.b;
> 
> This looks incorrect to me, that is a workaround for a real GCC bug.

I was discussing this after the BoF with Tom Tromey (CCed) and he also
thought gdb could/should actually support the DWARF5 representation,
but because the DW_TAG_variable was removed because the static data
member wasn't referenced in the gdb testcases.

> Shouldn't we instead do something like (untested) following patch?
> I mean, for DWARF < 5 the static data members were using DW_TAG_member,
> which has been always marked by the function, so IMHO we should also always
> mark the DW_TAG_variables at the class scope that replaced those.
> 
> 2020-08-24  Jakub Jelinek  
> 
>   * dwarf2out.c (prune_unused_types_walk): Mark DW_TAG_variable DIEs
>   at class scope for DWARF5+.

This looks really good, and it makes all the FAILs in the gdb bug
report PASS (when build with -gdwarf-5 as default).

> --- gcc/dwarf2out.c.jj2020-07-28 15:39:09.883757946 +0200
> +++ gcc/dwarf2out.c   2020-08-24 19:33:16.503961786 +0200
> @@ -29392,6 +29392,13 @@ prune_unused_types_walk (dw_die_ref die)
> if (die->die_perennial_p)
>   break;
>  
> +   /* For static data members, the declaration in the class is supposed
> +  to have DW_TAG_member tag in DWARF{3,4} but DW_TAG_variable in
> +  DWARF5.  DW_TAG_member will be marked, so mark even such
> +  DW_TAG_variables in DWARF5.  */
> +   if (dwarf_version >= 5 && class_scope_p (die->die_parent))
> + break;
> +
> /* premark_used_variables marks external variables --- don't mark
>them here.  But function-local externals are always considered
>used.  */

Thanks,

Mark


Re: [PATCH 1/5] Don't enable -gvariable-location-views by default for DWARF5.

2020-08-24 Thread Mark Wielaard
Hi,

On Mon, Aug 24, 2020 at 07:38:10PM +0200, Jakub Jelinek wrote:
> On Mon, Aug 24, 2020 at 02:56:54PM +0200, Mark Wielaard wrote:
> > DWARF5 makes it possible to read loclists tables without consulting
> > the debuginfo tree by introducing a table header. Adding location views
> > breaks this (at least for binutils and elfutils). So don't enable
> > variable-location-views by default if DWARF5 or higher is selected.
> 
> This should be discussed with Alex, CCed.
> I'd say elfutils/binutils should just show .debug_loclists independent of
> .debug_info if there are no loc views and use .debug_info otherwise.

Yes, sorry, should have CCed Alex. I do agree this is less than
ideal. It was really just to make sure the elfutils testsuite was
clean.

But the issue is real. As is, when enabling DWARF5, it is impossible
for consumers to parse the .debug_loclists independently. The only way
to know whether (and where) the locviews are is by finding and then
parsing the corresponding CU DIE tree.

There is actually an alternative representation enabled by
-gvariable-location-views=incompat5 that might help, but as the name
implies it isn't standard DWARF5, and I believe neither elfutils nor
binutils parses it.

I am not sure what a good default is in case we enable -gdwarf-5.

Cheers,

Mark


libbacktrace patch committed: Add support for Mach-O 64-bit FAT files

2020-08-24 Thread Ian Lance Taylor via Gcc-patches
This libbacktrace patch adds support for Mach-O 64-bit FAT files.
Bootstrapped and tested on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian

libbacktrace/:
* macho.c (MACH_O_MH_MAGIC_FAT_64): Define.
(MACH_O_MH_CIGAM_FAT_64): Define.
(struct macho_fat_arch_64): Define.
(macho_add_fat): Add and use is_64 parameter.
(macho_add): Recognize 64-bit fat files.
diff --git a/libbacktrace/macho.c b/libbacktrace/macho.c
index 3aea70cdbbe..bd737226ca6 100644
--- a/libbacktrace/macho.c
+++ b/libbacktrace/macho.c
@@ -75,7 +75,7 @@ struct macho_header_64
 
 struct macho_header_fat
 {
-  uint32_t magic;  /* Magic number (MACH_O_MH_MAGIC_FAT) */
+  uint32_t magic;  /* Magic number (MACH_O_MH_(MAGIC|CIGAM)_FAT(_64)?) */
   uint32_t nfat_arch;   /* Number of components */
 };
 
@@ -85,6 +85,8 @@ struct macho_header_fat
 #define MACH_O_MH_MAGIC_64 0xfeedfacf
 #define MACH_O_MH_MAGIC_FAT0xcafebabe
 #define MACH_O_MH_CIGAM_FAT0xbebafeca
+#define MACH_O_MH_MAGIC_FAT_64 0xcafebabf
+#define MACH_O_MH_CIGAM_FAT_64 0xbfbafeca
 
 /* Value for the header filetype field.  */
 
@@ -105,6 +107,20 @@ struct macho_fat_arch
   uint32_t align;  /* Alignment of this entry */
 };
 
+/* A component of a 64-bit fat file.  This is used if the magic field
+   is MAGIC_FAT_64.  This is only used when some file size or file
+   offset is too large to represent in the 32-bit format.  */
+
+struct macho_fat_arch_64
+{
+  uint32_t cputype;/* CPU type */
+  uint32_t cpusubtype; /* CPU subtype */
+  uint64_t offset; /* File offset of this entry */
+  uint64_t size;   /* Size of this entry */
+  uint32_t align;  /* Alignment of this entry */
+  uint32_t reserved;   /* Reserved */
+};
+
 /* Values for the fat_arch cputype field (and the header cputype
field).  */
 
@@ -740,14 +756,14 @@ static int
 macho_add_fat (struct backtrace_state *state, const char *filename,
   int descriptor, int swapped, off_t offset,
   const unsigned char *match_uuid, uintptr_t base_address,
-  int skip_symtab, uint32_t nfat_arch,
+  int skip_symtab, uint32_t nfat_arch, int is_64,
   backtrace_error_callback error_callback, void *data,
   fileline *fileline_fn, int *found_sym)
 {
   int arch_view_valid;
   unsigned int cputype;
+  size_t arch_size;
   struct backtrace_view arch_view;
-  size_t archoffset;
   unsigned int i;
 
   arch_view_valid = 0;
@@ -765,21 +781,39 @@ macho_add_fat (struct backtrace_state *state, const char 
*filename,
   goto fail;
 #endif
 
+  if (is_64)
+arch_size = sizeof (struct macho_fat_arch_64);
+  else
+arch_size = sizeof (struct macho_fat_arch);
+
   if (!backtrace_get_view (state, descriptor, offset,
-  nfat_arch * sizeof (struct macho_fat_arch),
+  nfat_arch * arch_size,
   error_callback, data, &arch_view))
 goto fail;
 
-  archoffset = 0;
   for (i = 0; i < nfat_arch; ++i)
 {
-  struct macho_fat_arch fat_arch;
+  struct macho_fat_arch_64 fat_arch;
   uint32_t fcputype;
 
-  memcpy (&fat_arch,
- ((const char *) arch_view.data
-  + i * sizeof (struct macho_fat_arch)),
- sizeof fat_arch);
+  if (is_64)
+   memcpy (&fat_arch,
+   (const char *) arch_view.data + i * arch_size,
+   arch_size);
+  else
+   {
+ struct macho_fat_arch fat_arch_32;
+
+ memcpy (&fat_arch_32,
+ (const char *) arch_view.data + i * arch_size,
+ arch_size);
+ fat_arch.cputype = fat_arch_32.cputype;
+ fat_arch.cpusubtype = fat_arch_32.cpusubtype;
+ fat_arch.offset = (uint64_t) fat_arch_32.offset;
+ fat_arch.size = (uint64_t) fat_arch_32.size;
+ fat_arch.align = fat_arch_32.align;
+ fat_arch.reserved = 0;
+   }
 
   fcputype = fat_arch.cputype;
   if (swapped)
@@ -787,19 +821,17 @@ macho_add_fat (struct backtrace_state *state, const char 
*filename,
 
   if (fcputype == cputype)
{
- uint32_t foffset;
+ uint64_t foffset;
 
  /* FIXME: What about cpusubtype?  */
  foffset = fat_arch.offset;
  if (swapped)
-   foffset = __builtin_bswap32 (foffset);
+   foffset = __builtin_bswap64 (foffset);
  backtrace_release_view (state, &arch_view, error_callback, data);
  return macho_add (state, filename, descriptor, foffset, match_uuid,
base_address, skip_symtab, error_callback, data,
fileline_fn, found_sym);
}
-
-  archoffset += sizeof (struct macho_fat_arch);
 }
 
   error_callback (data, "could not find executable in fat file", 0);
@@ -980,6 +1012,7 @@ macho_add (struct backtrace_state *state, const char 
*filename, int descriptor,
   hdroffset = offset + sizeof (struct mach

Re: [PATCH] x86: Use target("baseline-isas-only") in

2020-08-24 Thread H.J. Lu via Gcc-patches
On Mon, Aug 24, 2020 at 12:25 PM Uros Bizjak  wrote:
>
> On Mon, Aug 24, 2020 at 6:17 PM H.J. Lu  wrote:
> >
> > On Mon, Aug 24, 2020 at 7:55 AM Uros Bizjak  wrote:
> > >
> > > On Mon, Aug 24, 2020 at 3:23 PM H.J. Lu  wrote:
> > >
> > > > > Speaking of pragmas, these should be added outside cpuid.h, like:
> > > > >
> > > > > #pragma GCC push_options
> > > > > #pragma GCC target("general-regs-only")
> > > > >
> > > > > #include 
> > > > >
> > > > > void cpuid_check ()
> > > > > ...
> > > > >
> > > > > #pragma GCC pop_options
> > > > >
> > > > > >footnote
> > > > >
> > > > > Nowadays, -march=native is mostly used outside generic target
> > > > > compilations, so for relevant avx512 targets, we still generate spills
> > > > > to mask regs. In future, we can review the setting of the tuning flag
> > > > > for a generic target in the same way as with SSE2 inter-reg moves.
> > > > >
> > > >
> > > > Florian raised an issue that we need to limit  to the basic 
> > > > ISAs.
> > > >  should be handled similarly to other intrinsic header files.
> > > > That is  should use
> > > >
> > > > #pragma GCC push_options
> > > > #ifdef __x86_64__
> > > > #pragma GCC target("arch=x86-64")
> > > > #else
> > > > #pragma GCC target("arch=i386")
> > > > ...
> > > > #pragma GCC pop_options
> > > >
> > > > Here is a patch.  OK for master?
> > >
> > > -ENOPATCH
> > >
> > > However, how will this affect inlining? Every single function in
> > > cpuid.h is defined as static __inline, and due to target flags
> > > mismatch, it won't be inlined anymore. These inline functions are used
> > > in some bit testing functions, and to keep them inlined, these should
> > > also use the same options to avoid non-basic ISAs. This is the reason
> > > cpuid.h should be #included after pragma, together with bit testing
> > > functions, as shown above.
> > >
> >
> > How about target("baseline-isas-only")? All CPUID functions are
> > inlined.
>
> No, I don't think this is a good idea. Now consider the situation that
> caller functions are compiled with e.g. -mgeneral-regs-only. Due to
> #pragmas, CPUID functions are compiled with a superset ISAs, so they
> again won't be inlined. ISAs of caller functions and CPUID should
> match, the best way is to include  after the #pragma. And
> IMO, general-regs-only target #pragma is an excellent setting for
> both: cpuid.h and caller bit testing functions.
>
> So, if we care about inlining, decorating cpuid.h with target pragmas
> is a bad idea.

This can be done with #pragma in .

-- 
H.J.


Re: [PATCH] x86: Use target("baseline-isas-only") in

2020-08-24 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 24, 2020 at 6:17 PM H.J. Lu  wrote:
>
> On Mon, Aug 24, 2020 at 7:55 AM Uros Bizjak  wrote:
> >
> > On Mon, Aug 24, 2020 at 3:23 PM H.J. Lu  wrote:
> >
> > > > Speaking of pragmas, these should be added outside cpuid.h, like:
> > > >
> > > > #pragma GCC push_options
> > > > #pragma GCC target("general-regs-only")
> > > >
> > > > #include 
> > > >
> > > > void cpuid_check ()
> > > > ...
> > > >
> > > > #pragma GCC pop_options
> > > >
> > > > >footnote
> > > >
> > > > Nowadays, -march=native is mostly used outside generic target
> > > > compilations, so for relevant avx512 targets, we still generate spills
> > > > to mask regs. In future, we can review the setting of the tuning flag
> > > > for a generic target in the same way as with SSE2 inter-reg moves.
> > > >
> > >
> > > Florian raised an issue that we need to limit  to the basic ISAs.
> > >  should be handled similarly to other intrinsic header files.
> > > That is  should use
> > >
> > > #pragma GCC push_options
> > > #ifdef __x86_64__
> > > #pragma GCC target("arch=x86-64")
> > > #else
> > > #pragma GCC target("arch=i386")
> > > ...
> > > #pragma GCC pop_options
> > >
> > > Here is a patch.  OK for master?
> >
> > -ENOPATCH
> >
> > However, how will this affect inlining? Every single function in
> > cpuid.h is defined as static __inline, and due to target flags
> > mismatch, it won't be inlined anymore. These inline functions are used
> > in some bit testing functions, and to keep them inlined, these should
> > also use the same options to avoid non-basic ISAs. This is the reason
> > cpuid.h should be #included after pragma, together with bit testing
> > functions, as shown above.
> >
>
> How about target("baseline-isas-only")? All CPUID functions are
> inlined.

No, I don't think this is a good idea. Now consider the situation that
caller functions are compiled with e.g. -mgeneral-regs-only. Due to
#pragmas, CPUID functions are compiled with a superset ISAs, so they
again won't be inlined. ISAs of caller functions and CPUID should
match, the best way is to include  after the #pragma. And
IMO, general-regs-only target #pragma is an excellent setting for
both: cpuid.h and caller bit testing functions.

So, if we care about inlining, decorating cpuid.h with target pragmas
is a bad idea.

Uros.


Re: [PATCH v5] dse: Remove partial load after full store for high part access[PR71309]

2020-08-24 Thread Jeff Law via Gcc-patches
On Thu, 2020-08-06 at 06:23 +0100, Richard Sandiford wrote:
> luoxhu  writes:
> > Hi Richard,
> > 
> > On 2020/8/3 22:01, Richard Sandiford wrote:
> > > > /* Try a wider mode if truncating the store mode to NEW_MODE
> > > >  requires a real instruction.  */
> > > > if (maybe_lt (GET_MODE_SIZE (new_mode), GET_MODE_SIZE 
> > > > (store_mode))
> > > > @@ -1779,6 +1780,25 @@ find_shift_sequence (poly_int64 access_size,
> > > >   && !targetm.modes_tieable_p (new_mode, store_mode))
> > > > continue;
> > > >   
> > > > +  if (multiple_p (GET_MODE_BITSIZE (new_mode), shift)
> > > 
> > > Like I mentioned before, this should be the other way around:
> > > 
> > >  multiple_p (shift, GET_MODE_BITSIZE (new_mode))
> > > 
> > > i.e. for the subreg to be valid, the shift amount must be a multiple
> > > of the outer mode size in bits.
> > > 
> > > OK with that change, thanks, and sorry for the multiple review cycles.
> > 
> > Just had another question is do we want to backport this patch to gcc-10
> > and gcc-9 branches? :)
> 
> One for the RMs, but it looks like the PR was reported against GCC 7 and
> isn't a regression.  Given that it's also “just” a missed optimisation,
> I'm guessing it should be trunk only.
Not a release manager, but I agree, let's keep this on the trunk.

jeff
> 



[pushed] c++: Emit as-base 'tor symbols for final class. [PR95428]

2020-08-24 Thread Jason Merrill via Gcc-patches
For PR70462 I stopped emitting the as-base constructor and destructor
variants for final classes, because they can never be called.  Except that
it turns out that clang calls base variants from complete variants, even for
classes with virtual bases, and in some cases inlines them such that the
calls to the base variant are exposed.  So we need to continue to emit the
as-base symbols, even though they're unreachable by G++-compiled code.

Tested x86_64-pc-linux-gnu, applying to trunk and 10.

gcc/cp/ChangeLog:

PR c++/95428
* optimize.c (populate_clone_array): Revert PR70462 change.
(maybe_clone_body): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/other/final8.C: Adjust expected output.
---
 gcc/cp/optimize.c   | 10 ++
 gcc/testsuite/g++.dg/other/final8.C |  8 ++--
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/optimize.c b/gcc/cp/optimize.c
index abdcd7fa19f..00621d636bf 100644
--- a/gcc/cp/optimize.c
+++ b/gcc/cp/optimize.c
@@ -244,19 +244,13 @@ populate_clone_array (tree fn, tree *fns)
   fns[1] = NULL_TREE;
   fns[2] = NULL_TREE;
 
-  tree ctx = DECL_CONTEXT (fn);
-
   FOR_EACH_CLONE (clone, fn)
 if (DECL_NAME (clone) == complete_dtor_identifier
|| DECL_NAME (clone) == complete_ctor_identifier)
   fns[1] = clone;
 else if (DECL_NAME (clone) == base_dtor_identifier
 || DECL_NAME (clone) == base_ctor_identifier)
-  {
-   /* We don't need to define the base variants for a final class.  */
-   if (!CLASSTYPE_FINAL (ctx))
- fns[0] = clone;
-  }
+  fns[0] = clone;
 else if (DECL_NAME (clone) == deleting_dtor_identifier)
   fns[2] = clone;
 else
@@ -481,7 +475,7 @@ maybe_clone_body (tree fn)
 
   /* Remember if we can't have multiple clones for some reason.  We need to
  check this before we remap local static initializers in clone_body.  */
-  if (!tree_versionable_function_p (fn) && fns[0] && fns[1])
+  if (!tree_versionable_function_p (fn))
 need_alias = true;
 
   /* We know that any clones immediately follow FN in the TYPE_FIELDS
diff --git a/gcc/testsuite/g++.dg/other/final8.C 
b/gcc/testsuite/g++.dg/other/final8.C
index f90f94e9ea0..67c87112353 100644
--- a/gcc/testsuite/g++.dg/other/final8.C
+++ b/gcc/testsuite/g++.dg/other/final8.C
@@ -1,6 +1,10 @@
+// PR c++/70462
+// PR c++/95428
 // { dg-do compile { target c++11 } }
-// { dg-final { scan-assembler-not "_ZN1BC2Ev" } }
-// { dg-final { scan-assembler-not "_ZN1BD2Ev" } }
+// { dg-final { scan-assembler "_ZN1BC1Ev" } }
+// { dg-final { scan-assembler "_ZN1BC2Ev" } }
+// { dg-final { scan-assembler "_ZN1BD2Ev" } }
+// { dg-final { scan-assembler "_ZN1BD1Ev" } }
 
 struct A { int i; A(); virtual ~A() = 0; };
 struct B final: public virtual A { int j; B(); ~B(); };

base-commit: 259d072067997ab8f55afcf735c91b6740fd0425
-- 
2.18.1



[PING #5][PATCH] separate reading past the end from -Wstringop-overflow

2020-08-24 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548786.html

On 8/10/20 10:48 AM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548786.html

On 7/26/20 11:42 AM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548786.html

On 7/13/20 6:05 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548786.html

On 7/7/20 12:56 PM, Martin Sebor wrote:

Ping.  Despite its size, there isn't much new in the patch, it
pretty much just splits an existing warning into two, one for
buffer overflow and another for "overread."

https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548786.html

On 6/23/20 8:05 PM, Martin Sebor wrote:

-Wstringop-overflow is issued for both writing and reading past
the end, even though the latter problem is often viewed as being
lower severity than the former, or at least sufficiently different
to triage separately.  In CWE, for example, each of the two kinds
of problems has its own classification (CWE-787: Out-of-bounds
Write, and CWE-125: Out-of-bounds Read).

To make this easier with GCC, the attached patch introduces
a new option called -Wstringop-overread and splits the current
indiscriminate warning into the respective two categories.  Other
than the new option the improvements also exposed a few instances
of reading past the end that GCC doesn't detect that the new code
does thanks to more consistent checking.

As usual, I tested the patch by building Binutils/GDB, Glibc, and
the Linux kernel with no unexpected (or any, in fact) instances
of the two warnings.

The work involved more churn that I expected, and maintaining
the expected precedence of warnings (missing terminating nul,
excessive bounds, etc.) turned out to be more delicate and time
consuming than I would have liked.  I think the result is cleaner
code, but I'm quite sure it can still stand to be made better.
That's also my plan for the next step when I'd like to move this
checking out of builtins.c and calls.c and into a file of its own.
Ultimately, Jeff and have would like to integrate it into the path
isolation pass.

Martin

PS There's a FIXME comment in expand_builtin_memory_chk as
a reminder of a future change.  It currently has no bearing
on anything.












Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Qing Zhao via Gcc-patches



> On Aug 24, 2020, at 12:59 PM, Segher Boessenkool  
> wrote:
> 
> [ Please quote correctly.  I fixed this up a bit. ]
> 
> On Mon, Aug 24, 2020 at 02:47:22PM +, Rodriguez Bahena, Victor wrote:
>>> The call-clobbered regs are the only ones you *can* touch.  That does
>>> not mean you should clear them all (it doesn't help much at all in some
>>> cases).  Only the backend knows.
>> 
>> I think that for ROP mitigation purpose, we only need to clear the call-used 
>> (i.e, call-clobbered) registers that are used in the current routine and
>> can pass parameters.
> 
> Which is more than you *can* do as well (consider return value registers
> for example; there are more cases, in general; only the backend code can
> know what is safe to do).

Yes, So, we agreed to move the code generation implementation part into backend.

In Middle-end, we will only compute the hard register set based on call abi 
information and data flow information, also handle the command line option.

Qing
> 
> 
> Segher



Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.

2020-08-24 Thread Giuliano Belinassi via Gcc-patches
Ho, Josh.

On 08/24, Josh Triplett wrote:
> On Sat, Aug 22, 2020 at 06:04:48PM -0300, Giuliano Belinassi wrote:
> > Hi, Josh
> > 
> > On 08/21, Josh Triplett wrote:
> > > On Thu, Aug 20, 2020 at 07:00:13PM -0300, Giuliano Belinassi wrote:
> > > > This patch series add a new flag "-fparallel-jobs=" to control if the
> > > > compiler should try to compile the current file in parallel.
> > > [...]
> > > > Bootstrapped and Regtested on Linux x86_64.
> > > > 
> > > > Giuliano Belinassi (6):
> > > >   Modify gcc driver for parallel compilation
> > > >   Implement a new partitioner for parallel compilation
> > > >   Implement fork-based parallelism engine
> > > >   Add `+' for Jobserver Integration
> > > >   Add invoke documentation
> > > >   New tests for parallel compilation feature
> > > 
> > > Very nice!
> > 
> > Thank you for your interest in this :)
> > 
> > > 
> > > I'm interested in testing this on a highly parallel system. What
> > > baseline do these patches apply to?  They don't seem to apply to GCC
> > > trunk.
> > 
> > Hummm, this was supposed to work on trunk out of the box. However,
> > there is a high probability that I messed up something while rebasing.
> > I will post a version 2 of it when I get more comments and when I fix
> > the Makefile issue that Joseph pointed out in other e-mail.
> > 
> > If you want to test it on a high parallel system, I think it will be
> > cool to see how it behaves also when --param=promote-statics=1, as it
> > increases parallelism opportunity. :)
> 
> I plan to try several variations, including that.
> 
> I'd like to see how it affects the performance of Linux kernel builds.

Well, I expect little to no impact on that.  I ran an experiment back
on 2018 looking for parallelism bottleneck in Kernel, and what I found
was that the developers did a good job on balancing the file sizes.

This was run on a machine with 4x AMD Opteron CPUs, (64 cores in total)
https://www.ime.usp.br/~belinass/64cores-kernel-experiment.svg

As you can see from this image, the jobs ends almost at the same time.

> 
> > > Also, I tried to bootstrap the current tip of the devel/autopar_devel
> > > branch, but ended up with compiler segfaults that all look like this:
> > > ../../gcc/zlib/compress.c:86:1: internal compiler error: Segmentation 
> > > fault
> > >86 | }
> > >   | ^
> > 
> > Well, there was once a bug in this branch when compiling with -flto that
> > caused the assembler output file not to be properly initialized early
> > enough, resulting in LTO LGEN stage writing into a invalid FILE pointer.
> > I fixed this during rebasing but I forgot to push to the autopar_devel
> > branch. In any case, I just pushed the recent changes to autopar_devel
> > which fix this issue.
> 
> That might explain the problem; I had tried to build gcc with the
> bootstrap-lto configuration.
> 
> > In any case, -fparallel-jobs= should NOT be used together with -flto.
> > Although I used part of the LTO engine for development of this feature,
> > they are meant for distinct things. I guess I should give a warning
> > about that in next version :)
> 
> Interesting. Is that something that could change in the future? I'd like
> to be able to get some parallelism when creating the object files, and
> then more parallelism when doing the final LTO link.

Well, if by "final LTO link" you mean LTO's Whole Program Analysis,
that is a quite challenging task to parallelize :)

As for the "creating object files", you mean the LTO LGEN, I think
it is not possible for now because -- as far as I understeand --, LTO
object files are just containers for a intermediate language and
does not support partial linking.

However, I would not expect LGEN bottlenecking compilation of any
project. Most compilation time is spent in optimization, that is
IPA and Intra-Procedural.

> 
> > Also, I just tested bootstrap with
> > 
> > ../gcc/configure --disable-multilib --enable-languages=c,c++
> >
> > on x86_64 linux and it is working.
> 
> I'd used --enable-multilib, and --enable-languages=c,c++,lto . Would
> that be expected to work?

Yes. If it doesn't, that is a bug :)

> 
> Thanks,
> Josh


Re: [PATCH] sra: Bail out when encountering accesses with negative offsets (PR 96730)

2020-08-24 Thread Richard Biener
On August 24, 2020 5:07:56 PM GMT+02:00, Martin Jambor  wrote:
>Hi,
>
>I must admit I was quite surprised to see that SRA does not disqualify
>an aggregate from any transformations when it encounters an offset for
>which get_ref_base_and_extent returns a negative offset.  It may not
>matter too much because I sure hope such programs always have
>undefined behavior (SRA candidates are local variables on stack) but
>it is probably better not to perform weird transformations on them as
>build ref model with the new build_reconstructed_reference function
>currently happily do for negative offsets (they just copy the existing
>expression which is then used as the expression of a "propagated"
>access) and of course the compiler must not ICE (as it currently does
>because the SRA forest verifier does not like the expression).
>
>Fixed with the following patch which also passed bootstrap and testing
>on an x86_64-linux.  OK for master and later on for the gcc-10 branch?

OK. 

Richard. 

>Thanks,
>
>Martin
>
>
>gcc/ChangeLog:
>
>2020-08-24  Martin Jambor  
>
>   PR tree-optimization/96730
>   * tree-sra.c (create_access): Disqualify any aggregate with negative
>   offset access.
>   (build_ref_for_model): Add assert that offset is non-negative.
>
>gcc/testsuite/ChangeLog:
>
>2020-08-24  Martin Jambor  
>
>   PR tree-optimization/96730
>   * gcc.dg/tree-ssa/pr96730.c: New test.
>---
> gcc/testsuite/gcc.dg/tree-ssa/pr96730.c | 13 +
> gcc/tree-sra.c  |  6 ++
> 2 files changed, 19 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr96730.c
>
>diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr96730.c
>b/gcc/testsuite/gcc.dg/tree-ssa/pr96730.c
>new file mode 100644
>index 000..39a06846529
>--- /dev/null
>+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr96730.c
>@@ -0,0 +1,13 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O1" } */
>+
>+struct a {
>+  int b;
>+  int c;
>+} d() {
>+  struct a e[9];
>+  int f = 3362953455;
>+  e[f] = e[6];
>+  e[6].c = 1;
>+}
>+int main() {}
>diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>index fcba7fbdd31..754f41302fc 100644
>--- a/gcc/tree-sra.c
>+++ b/gcc/tree-sra.c
>@@ -931,6 +931,11 @@ create_access (tree expr, gimple *stmt, bool
>write)
> }
>   if (size == 0)
> return NULL;
>+  if (offset < 0)
>+{
>+  disqualify_candidate (base, "Encountered a negative offset
>access.");
>+  return NULL;
>+}
>   if (size < 0)
> {
>   disqualify_candidate (base, "Encountered an unconstrained access.");
>@@ -1667,6 +1672,7 @@ build_ref_for_model (location_t loc, tree base,
>HOST_WIDE_INT offset,
>struct access *model, gimple_stmt_iterator *gsi,
>bool insert_after)
> {
>+  gcc_assert (offset >= 0);
>   if (TREE_CODE (model->expr) == COMPONENT_REF
>   && DECL_BIT_FIELD (TREE_OPERAND (model->expr, 1)))
> {



[committed] doc: Switch valgrind.com to https

2020-08-24 Thread Gerald Pfeifer
Pushed.

Jonathan, I found two more references to valgrind.com without https (which 
lead to a redirect) at http://gcc.gnu.org/onlinedocs/libstdc++/manual/debug.html
  http://valgrind.org/docs/manual/drd-manual.html
  http://valgrind.org/docs/manual/hg-manual.html

Would you mind having a look, perhaps?  That would be lovely, thanks.

Gerald

gcc/ChangeLog:
* doc/install.texi (Configuration): Switch valgrind.com to https.
---
 gcc/doc/install.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index e1ca876d729..5330bf3bb29 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1887,7 +1887,7 @@ checking with extra checks that might affect code 
generation and should
 therefore not differ between stage1 and later stages in bootstrap.
 
 The @samp{valgrind} check requires the external @command{valgrind} simulator,
-available from @uref{http://valgrind.org/}.  The @samp{rtl} checks are
+available from @uref{https://valgrind.org}.  The @samp{rtl} checks are
 expensive and the @samp{df}, @samp{gcac} and @samp{valgrind} checks are very
 expensive.
 
-- 
2.28.0


[committed] wwwdocs: Switch www.validlab.com to https

2020-08-24 Thread Gerald Pfeifer
Pushed. Gerald.

---
 htdocs/bugs/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/bugs/index.html b/htdocs/bugs/index.html
index a6631d8a..88fba1b9 100644
--- a/htdocs/bugs/index.html
+++ b/htdocs/bugs/index.html
@@ -300,7 +300,7 @@ to round to the nearest representable number.
 
 This is not a bug in the compiler, but an inherent limitation of
 the floating point types. Please study
-http://www.validlab.com/goldberg/paper.ps";>this paper
+https://www.validlab.com/goldberg/paper.ps";>this paper
 for more information.
 
 
-- 
2.28.0


Re: [PATCH] cmpelim: recognize extra clobbers in insns

2020-08-24 Thread Jeff Law via Gcc-patches
On Thu, 2020-08-06 at 12:42 +, Pip Cet via Gcc-patches wrote:
> I'm working on the AVR cc0 -> CCmode conversion (bug#92729). One
> problem is that the cmpelim pass is currently very strict in requiring
> insns of the form
> 
> (parallel [(set (reg:SI) (op:SI ... ...))
>(clobber (reg:CC REG_CC))])
> 
> when in fact AVR's insns often have the form
> 
> (parallel [(set (reg:SI) (op:SI ... ...))
>(clobber (scratch:QI))
>(clobber (reg:CC REG_CC))])
> 
> The attached patch relaxes checks in the cmpelim code to recognize
> such insns, and makes it attempt to recognize
> 
> (parallel [(set (reg:CC REG_CC) (compare:CC ... ...))
>(set (reg:SI (op:SI ... ...)))
>(clobber (scratch:QI))])
> 
> as a new insn for that example. This appears to work.
> 
> I've bootstrapped and run the test suite with the patch, without differences.
So it looks like Richard has given you some feedback and you've got some further
work to do.  I just wanted to chime in and say thanks for tackling this.

The H8 has similiar problems -- I've fixed them in a different way, but 
improving
cmp-elim is still a good thing.
Jeff



Re: [PATCH 1/6] Modify gcc driver for parallel compilation

2020-08-24 Thread Giuliano Belinassi via Gcc-patches
Hi, Richi.

On 08/24, Richard Biener wrote:
> On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
>  wrote:
> >
> > Update the driver for parallel compilation. This process work as
> > follows:
> >
> > When calling gcc, the driver will check if the flag
> > "-fparallel-jobs" was provided by the user. If yes, then we will
> > check what is the desired output, and if it can be parallelized.
> > There are the following cases, which is described:
> >
> > 1. -S or -E was provided: We can't run in parallel, as the output
> >can not be easily merged together into one file.
> >
> > 2. -c was provided: When cc1* forks into multiple processes, it
> >must tell the driver where it stored its generated assembler files.
> >Therefore we pass a hidden "-fsplit-outputs=filename" to the compiler,
> >and we check if "filename" was created by it. If yes, we open it,
> >call assembler for each generated asm file
> >(this file must not be empty), and link them together with
> >partial linking to a single .o file. This process is done for each
> >object file in the argument list.
> >
> > 3. -c was not provided, and the final product will be an binary: Here
> >we proceed exactly as 2., but we avoid doing the partial
> >linking, feeding the generated object files directly into the final link.
> >
> > For that to work, we had to heavily modify how the "execute" function
> > works, extracting common code which is used multiple times, and
> > also detecting when the command is a call to a compiler or an
> > assembler, as can be seen in append_split_outputs.
> >
> > Finally, we added some tests which reflects all cases found when
> > bootstrapping the compiler, so development of further features to the
> > driver get faster for now on.
> 
> Few comments inline, Joseph may want to comment on the overall
> structure as driver maintainer (CCed).
> 
> I know I asked for the changes on the branch to be squashed but
> the diff below is quite unreadable with the ChangeLog not helping
> the overall refactoring much.  Is it possible to do some of the
> factoring/refactoring without any functionality change to make the
> actual diff easier to follow?

Well, the refactoring is necessary, otherwise I would need to copy and
paste a really huge amount of code.

What I can do (and sounds reasonable to me) is to break this patch into
two parts; one with just refactoring changes, and the other adding the
parallelism engine.

> 
> Thanks,
> Richard.
> 
> > gcc/ChangeLog
> > 2020-08-20  Giuliano Belinassi  
> >
> > * common.opt (fsplit-outputs): New flag.
> > (fparallel-jobs): New flag.
> > * gcc.c (extra_arg_storer): New class.
> > (have_S): New variable.
> > (struct command): Move from execute.
> > (is_compiler): New function.
> > (is_assembler): New function.
> > (get_number_of_args): New function.
> > (get_file_by_lines): New function.
> > (identify_asm_file): New function.
> > (struct infile): New attribute temp_additional_asm.
> > (current_infile): New variable.
> > (get_path_to_ld): New function.
> > (has_hidden_E): New function.
> > (sort_asm_files): New function.
> > (append_split_outputs): New function.
> > (print_command): New function.
> > (print_commands): New function.
> > (print_argbuf): New function.
> > (handle_verbose): Extracted from execute.
> > (append_valgrind): Same as above.
> > (async_launch_commands): Same as above.
> > (await_commands_to_finish): Same as above.
> > (split_commands): Same as above.
> > (parse_argbuf): Same as above.
> > (execute): Refator.
> > (fsplit_arg): New function.
> > (alloc_infile): Initialize infiles with 0.
> > (process_command): Remember when -S was passed.
> > (do_spec_on_infiles): Remember current infile being processed.
> > (maybe_run_linker): Replace object files when -o is a executable.
> > (finalize): Deinitialize temp_object_files.
> >
> > gcc/testsuite/ChangeLog:
> > 20-08-2020  Giuliano Belinassi  
> >
> > * driver/driver.exp: New test.
> > * driver/a.c: New file.
> > * driver/b.c: New file.
> > * driver/empty.c: New file.
> > * driver/foo.c: New file.
> > ---
> >  gcc/common.opt  |4 +
> >  gcc/gcc.c   | 1219 ---
> >  gcc/testsuite/driver/a.c|6 +
> >  gcc/testsuite/driver/b.c|6 +
> >  gcc/testsuite/driver/driver.exp |   80 ++
> >  gcc/testsuite/driver/empty.c|0
> >  gcc/testsuite/driver/foo.c  |7 +
> >  7 files changed, 1049 insertions(+), 273 deletions(-)
> >  create mode 100644 gcc/testsuite/driver/a.c
> >  create mode 100644 gcc/testsuite/driver/b.c
> >  create mode 100644 gcc/testsuite/driver/driver.exp
> >  create mode 100644 gcc/testsuite/driver/empt

Re: reorg.c (fill_slots_from_thread): Improve for TARGET_FLAGS_REGNUM targets

2020-08-24 Thread Jeff Law via Gcc-patches
On Fri, 2020-08-14 at 07:27 +0200, Hans-Peter Nilsson via Gcc-patches wrote:
> Originally I thought to bootstrap this patch on MIPS and SPARC
> since they're both delayed-branch-slot targets but I
> reconsidered, as neither is a TARGET_FLAGS_REGNUM target.  It
> seems only visium and CRIS has this feature set, and I see no
> trace of visium in neither newlib nor the simulator next to
> glibc.  So, I just tested cris-elf.
[ Sorry, been focused on non-upstream stuff for the last month+, so yea,
  some stuff is falling through the cracks. ]

Ibuild newlib on visium daily.  I also run the gcc testsuite, but using dummy
simulator.  So we're really just testing that the code generator doesn't fault
and the various dg-scan style tests.

http://3.14.90.209:8080/job/visium-elf/


> 
> This handles TARGET_FLAGS_REGNUM clobbering insns as delay-slot
> fillers using a method similar to that in commit 33c2207d3fda,
> where care was taken for fill_simple_delay_slots to allow such
> insns when scanning for delay-slot fillers *backwards* (before
> the insn).
> 
> A TARGET_FLAGS_REGNUM target is typically a former cc0 target.
> For cc0 targets, insns don't mention clobbering cc0, so the
> clobbers are mentioned in the "resources" only as a special
> entity and only for compare-insns and branches, where the cc0
> value matters.
> 
> In contrast, with TARGET_FLAGS_REGNUM, most insns clobber it and
> the register liveness detection in reorg.c / resource.c treats
> that as a blocker (for other insns mentioning it, i.e. most)
> when looking for delay-slot-filling candidates.  This means that
> when comparing core and performance for a delay-slot cc0 target
> before and after the de-cc0 conversion, the inability to fill a
> delay slot after conversion manifests as a regression.  This was
> one such case, for CRIS, with random_bitstring in
> gcc.c-torture/execute/arith-rand-ll.c as well as the target
> libgcc division function.
I wouldn't be surprised if H8 would have tripped over this as well.  Converting
it to REG_CC is still in-progress and a failure to fill a delay slot would 
likely
have been missed in my performance testing to-date (delay slot filling isn't
terribly important on the H8 thankfully).


> 
> After this, all known performance regressions compared to cc0
> are fixed.
> 
> Ok to commit?
> 
> gcc:
>   PR target/93372
>   * reorg.c (fill_slots_from_thread): Allow trial insns that clobber
>   TARGET_FLAGS_REGNUM as delay-slot fillers.
> 
> gcc/testsuite:
>   PR target/93372
>   * gcc.target/cris/pr93372-47.c: New test.
It looks reasonable to me.

jeff
> 



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Qing Zhao via Gcc-patches



> On Aug 24, 2020, at 12:49 PM, Segher Boessenkool  
> wrote:
> 
> On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
>>> On Aug 19, 2020, at 5:57 PM, Segher Boessenkool 
>>>  wrote:
>>> Numbers on how expensive this is (for what arch, in code size and in
>>> execution time) would be useful.  If it is so expensive that no one will
>>> use it, it helps security at most none at all :-(
> 
> Without numbers on this, no one can determine if it is a good tradeoff
> for them.  And we (the GCC people) cannot know if it will be useful for
> enough users that it will be worth the effort for us.  Which is why I
> keep hammering on this point.
I can collect some run-time overhead data on this, do you have a recommendation 
on what test suite I can use
For this testing? (Is CPU2017 good enough)?

> 
> (The other side of the coin is how much this helps prevent exploitation;
> numbers on that would be good to see, too.)

This can be well showed from the paper:

"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming 
Attacks"

https://ieeexplore.ieee.org/document/8445132 


Please take a look at this paper. 

> 
   So, from both run-time performance and code-size aspects, setting the
 registers to zero is a better approach. 
>>> 
>>> From a security perspective, this isn't clear though.  But that is a lot
>>> of extra research ;-)
>> 
>> There has been quite some discussion on this topic at
>> 
>> https://urldefense.com/v3/__https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html__;!!GqivPVa7Brio!PFjWvu3miQeS8XQehbw1moYxXTbbRvu9MTbjQxtxad_YQQGSdZg97Dl8-c2w5Y32$
>>   
>> >  >
>> 
>> From those old discussion, we can see that zero value should be good enough 
>> for the security purpose (though it’s not perfect).
> 
> And there has been zero proof or even any arguments from the security
> angle, only "anything other than 0 is too expensive", which isn't
> obviously true either (it isn't even cheaper than other small numbers,
> on many archs).
> 
> A large fraction of function arguments is zero in valid executions, so
> zeroing them out to try to prevent exploitation attempts might not help
> so much.

Please take a look at the paper:
"Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming 
Attacks"

https://ieeexplore.ieee.org/document/8445132 


From the study, zeroing out the registers mitigate the ROP very well.

thanks.

Qing



> 
> 
> Segher



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Segher Boessenkool
[ Please quote correctly.  I fixed this up a bit. ]

On Mon, Aug 24, 2020 at 02:47:22PM +, Rodriguez Bahena, Victor wrote:
> > The call-clobbered regs are the only ones you *can* touch.  That does
> > not mean you should clear them all (it doesn't help much at all in some
> > cases).  Only the backend knows.
> 
> I think that for ROP mitigation purpose, we only need to clear the call-used 
> (i.e, call-clobbered) registers that are used in the current routine and
> can pass parameters.

Which is more than you *can* do as well (consider return value registers
for example; there are more cases, in general; only the backend code can
know what is safe to do).


Segher


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Segher Boessenkool
On Wed, Aug 19, 2020 at 06:27:45PM -0500, Qing Zhao wrote:
> > On Aug 19, 2020, at 5:57 PM, Segher Boessenkool 
> >  wrote:
> > Numbers on how expensive this is (for what arch, in code size and in
> > execution time) would be useful.  If it is so expensive that no one will
> > use it, it helps security at most none at all :-(

Without numbers on this, no one can determine if it is a good tradeoff
for them.  And we (the GCC people) cannot know if it will be useful for
enough users that it will be worth the effort for us.  Which is why I
keep hammering on this point.

(The other side of the coin is how much this helps prevent exploitation;
numbers on that would be good to see, too.)

> >>So, from both run-time performance and code-size aspects, setting the
> >> registers to zero is a better approach. 
> > 
> > From a security perspective, this isn't clear though.  But that is a lot
> > of extra research ;-)
> 
> There has been quite some discussion on this topic at
> 
> https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html 
> 
> 
> From those old discussion, we can see that zero value should be good enough 
> for the security purpose (though it’s not perfect).

And there has been zero proof or even any arguments from the security
angle, only "anything other than 0 is too expensive", which isn't
obviously true either (it isn't even cheaper than other small numbers,
on many archs).

A large fraction of function arguments is zero in valid executions, so
zeroing them out to try to prevent exploitation attempts might not help
so much.


Segher


Re: [PATCH] gimple-fold: Don't optimize wierdo floating point value reads [PR95450]

2020-08-24 Thread Jeff Law via Gcc-patches
On Mon, 2020-08-24 at 12:51 +0200, Richard Biener wrote:
> On Wed, 12 Aug 2020, Jakub Jelinek wrote:
> 
> > On Wed, Aug 12, 2020 at 04:30:35PM +0200, Richard Biener wrote:
> > > Not a final review but if we care for this kind of normalization at all
> > > the we should do so consistently, thus for both encode and interpret and
> > > for all modes.  For FP we could also check if we'd consider the values
> > > equal rather than whether we would en/decode them to the same bit pattern
> > > (which might or might not be what an actual ISA gpr<->fpr reg move would
> > > do)
> > 
> > I think the verification that what we encode can be interpreted back
> > woiuld be only an internal consistency check (so perhaps for ENABLE_CHECKING
> > if flag_checking only, but if both directions perform it, then we need
> > to avoid mutual recursion).
> > While for the other direction (interpretation), at least for the broken by
> > design long doubles we just know we can't represent in GCC all valid values.
> > The other floating point formats are just theoretical case, perhaps we would
> > canonicalize something to a value that wouldn't trigger invalid exception
> > when without canonicalization it would trigger it at runtime, so let's just
> > ignore those.
> > 
> > Adjusted (so far untested) patch to do it in native_interpret_real instead
> > and limit it to the MODE_COMPOSITE_P cases, for which e.g.
> > fold-const.c/simplify-rtx.c punts in several other places too because we 
> > just
> > know we can't represent everything.
> > 
> > E.g.
> >   /* Don't constant fold this floating point operation if the
> >  result may dependent upon the run-time rounding mode and
> >  flag_rounding_math is set, or if GCC's software emulation
> >  is unable to accurately represent the result.  */
> >   if ((flag_rounding_math
> >|| (MODE_COMPOSITE_P (mode) && !flag_unsafe_math_optimizations))
> >   && (inexact || !real_identical (&result, &value)))
> > return NULL_TREE;
> > Or perhaps guard it with MODE_COMPOSITE_P (mode) && 
> > !flag_unsafe_math_optimizations
> > too, thus break what gnulib / m4 does with -ffast-math, but not normally?
> 
> OK.
> 
> Richard.
> 
> > 2020-08-12  Jakub Jelinek  
> > 
> > PR target/95450
> > * fold-const.c (native_interpret_real): For MODE_COMPOSITE_P modes
> > punt if the to be returned REAL_CST does not encode to the bitwise
> > same representation.
Jakub, can you pull this into F33/F34.  It's affecting a few packages (assuming
this is meant to fix the gnulib FP testsuite failures on ppc64le).

Jeff



Re: [PATCH 3/5] Add DWARF5 variants of assembly scan tests that use DW_FORM_implicit_const

2020-08-24 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 24, 2020 at 02:56:56PM +0200, Mark Wielaard wrote:
> Some DWARF tests scan the assembly output looking for constant values.
> When using DWARF5 those constants might use DW_FORM_implicit_const,
> which are output (in the comments) after the attribute instead of
> before. To make sure these tests work introduce a -gdwarf-5 variant
> of these tests and explicitly use -gdwarf-2 for the original.

I just wonder if we want to use -gdwarf-2 rather than -gdwarf-4 in the
original, -gdwarf-5 has been the default for a couple of years and thus
that is what those testshave been compiled with.
Also not sure about the -dwarf5 suffixes, couldn't we say just use
pr41445-{7,8}.c, inline-var-2.C or inline3.c (or whatever next number
with the same prefix is still unused)?

> ---
>  .../dwarf2/{inline-var-1.C => inline-var-1-dwarf5.C}  | 6 --
>  gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C  | 2 +-
>  .../gcc.dg/debug/dwarf2/{inline2.c => inline2-dwarf5.c}   | 7 ---
>  gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c   | 4 +++-
>  .../debug/dwarf2/{pr41445-5.c => pr41445-5-dwarf5.c}  | 8 
>  gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c | 2 +-
>  .../debug/dwarf2/{pr41445-6.c => pr41445-6-dwarf5.c}  | 8 
>  gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c | 2 +-
>  8 files changed, 22 insertions(+), 17 deletions(-)
>  copy gcc/testsuite/g++.dg/debug/dwarf2/{inline-var-1.C => 
> inline-var-1-dwarf5.C} (76%)
>  copy gcc/testsuite/gcc.dg/debug/dwarf2/{inline2.c => inline2-dwarf5.c} (88%)
>  copy gcc/testsuite/gcc.dg/debug/dwarf2/{pr41445-5.c => pr41445-5-dwarf5.c} 
> (73%)
>  copy gcc/testsuite/gcc.dg/debug/dwarf2/{pr41445-6.c => pr41445-6-dwarf5.c} 
> (68%)
> 
> diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C 
> b/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1-dwarf5.C
> similarity index 76%
> copy from gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
> copy to gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1-dwarf5.C
> index 3b1c913edfca..52ed5b6912fd 100644
> --- a/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
> +++ b/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1-dwarf5.C
> @@ -1,7 +1,9 @@
> +// DWARF5 variant of inline-var-1.C
>  // { dg-do compile { target c++17 } }
> -// { dg-options "-O -g -dA -gno-strict-dwarf 
> -fno-eliminate-unused-debug-symbols" }
> +// { dg-options "-O -gdwarf-5 -dA -gno-strict-dwarf 
> -fno-eliminate-unused-debug-symbols" }
>  // { dg-require-weak "" }
> -// { dg-final { scan-assembler-times "0x3\[^\n\r]* DW_AT_inline" 6 { xfail 
> *-*-aix* } } }
> +// { dg-final { scan-assembler-times " DW_AT_inline \\(0x3\\)" 2 { xfail 
> *-*-aix* } } }
> +// { dg-final { scan-assembler-times "0x3\[^\n\r]* DW_AT_inline" 4 { xfail 
> *-*-aix* } } }
>  // { dg-final { scan-assembler-times "0x1\[^\n\r]* DW_AT_inline" 2 { xfail 
> *-*-aix* } } }
>  // { dg-final { scan-assembler-times " DW_AT_declaration" 6 { xfail *-*-aix* 
> } } }
>  // { dg-final { scan-assembler-times " DW_AT_specification" 6 { xfail 
> *-*-aix* } } }
> diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C 
> b/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
> index 3b1c913edfca..9a88e28cbe0f 100644
> --- a/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
> +++ b/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
> @@ -1,5 +1,5 @@
>  // { dg-do compile { target c++17 } }
> -// { dg-options "-O -g -dA -gno-strict-dwarf 
> -fno-eliminate-unused-debug-symbols" }
> +// { dg-options "-O -gdwarf-2 -dA -gno-strict-dwarf 
> -fno-eliminate-unused-debug-symbols" }
>  // { dg-require-weak "" }
>  // { dg-final { scan-assembler-times "0x3\[^\n\r]* DW_AT_inline" 6 { xfail 
> *-*-aix* } } }
>  // { dg-final { scan-assembler-times "0x1\[^\n\r]* DW_AT_inline" 2 { xfail 
> *-*-aix* } } }
> diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c 
> b/gcc/testsuite/gcc.dg/debug/dwarf2/inline2-dwarf5.c
> similarity index 88%
> copy from gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
> copy to gcc/testsuite/gcc.dg/debug/dwarf2/inline2-dwarf5.c
> index 7e019a6c06a0..03013f11bca8 100644
> --- a/gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
> +++ b/gcc/testsuite/gcc.dg/debug/dwarf2/inline2-dwarf5.c
> @@ -1,4 +1,4 @@
> -/* Contributed by Dodji Seketeli 
> +/* DWARF5 variant of inline2.
> Origin: PR debug/37801
>  
>Abstract instances (DW_TAG_subroutines having the DW_AT_inline attribute)
> @@ -14,7 +14,8 @@
>properly nested DW_TAG_inlined_subroutine DIEs for third, second and first.
>  */
>  
> -/* { dg-options "-O -g3 -gdwarf -dA -fgnu89-inline" } */
> +/* Explicitly use dwarf-5 which uses DW_FORM_implicit_const.  */
> +/* { dg-options "-O -g3 -gdwarf-5 -dA -fgnu89-inline" } */
>  /* { dg-do compile } */
>  
>  /* There are 6 inlined subroutines:
> @@ -32,7 +33,7 @@
>  /* There are 3 DW_AT_inline attributes: one per abstract inline instance.
> The value of the attribute must be 0x3, meaning the function was
> actually inlined.  */
> -/* { dg-fin

Re: [PATCH 2/5] Make sure that static data member constexpr isn't optimized away in test.

2020-08-24 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 24, 2020 at 02:56:55PM +0200, Mark Wielaard wrote:
> In DWARF5 class variables (static data members) are represented with a
> DW_TAG_variable instead of a DW_TAG_member. Make sure the variable isn't
> optimized away in the constexpr-var-1.C testcase so we can still match (2)
> const_expr in the the assembly output.
> 
> Note that the same issue causes some failures in the gdb testsuite
> for static data members when we enable DWARF5 by default:
> https://sourceware.org/bugzilla/show_bug.cgi?id=26525
> ---
>  gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C 
> b/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
> index 19062e29fd59..c6ad3f645379 100644
> --- a/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
> +++ b/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
> @@ -7,3 +7,4 @@ struct S
>  {
>static constexpr int b = 6;
>  } s;
> +const int &c = s.b;

This looks incorrect to me, that is a workaround for a real GCC bug.

Shouldn't we instead do something like (untested) following patch?
I mean, for DWARF < 5 the static data members were using DW_TAG_member,
which has been always marked by the function, so IMHO we should also always
mark the DW_TAG_variables at the class scope that replaced those.

2020-08-24  Jakub Jelinek  

* dwarf2out.c (prune_unused_types_walk): Mark DW_TAG_variable DIEs
at class scope for DWARF5+.

--- gcc/dwarf2out.c.jj  2020-07-28 15:39:09.883757946 +0200
+++ gcc/dwarf2out.c 2020-08-24 19:33:16.503961786 +0200
@@ -29392,6 +29392,13 @@ prune_unused_types_walk (dw_die_ref die)
  if (die->die_perennial_p)
break;
 
+ /* For static data members, the declaration in the class is supposed
+to have DW_TAG_member tag in DWARF{3,4} but DW_TAG_variable in
+DWARF5.  DW_TAG_member will be marked, so mark even such
+DW_TAG_variables in DWARF5.  */
+ if (dwarf_version >= 5 && class_scope_p (die->die_parent))
+   break;
+
  /* premark_used_variables marks external variables --- don't mark
 them here.  But function-local externals are always considered
 used.  */


Jakub



Re: [PATCH 1/5] Don't enable -gvariable-location-views by default for DWARF5.

2020-08-24 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 24, 2020 at 02:56:54PM +0200, Mark Wielaard wrote:
> DWARF5 makes it possible to read loclists tables without consulting
> the debuginfo tree by introducing a table header. Adding location views
> breaks this (at least for binutils and elfutils). So don't enable
> variable-location-views by default if DWARF5 or higher is selected.

This should be discussed with Alex, CCed.
I'd say elfutils/binutils should just show .debug_loclists independent of
.debug_info if there are no loc views and use .debug_info otherwise.

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 70dc1ab73a12..e5dddc236d7d 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -9301,9 +9301,9 @@ can be consumed by debug information consumers that are 
> not aware of
>  these augmentations, but they won't derive any benefit from them either.
>  
>  This is enabled by default when outputting DWARF 2 debug information at
> -the normal level, as long as there is assembler support,
> -@option{-fvar-tracking-assignments} is enabled and
> -@option{-gstrict-dwarf} is not.  When assembler support is not
> +the normal level for DWARF versions lower than 5, as long as there
> +is assembler support, @option{-fvar-tracking-assignments} is enabled
> +and @option{-gstrict-dwarf} is not.  When assembler support is not
>  available, this may still be enabled, but it will force GCC to output
>  internal line number tables, and if
>  @option{-ginternal-reset-location-views} is not enabled, that will most
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 07457d08c3aa..34218c6b3349 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1672,6 +1672,7 @@ process_options (void)
>  && (write_symbols == DWARF2_DEBUG
>  || write_symbols == VMS_AND_DWARF2_DEBUG)
>  && !dwarf_strict
> +&& dwarf_version < 5
>  && dwarf2out_as_loc_support
>  && dwarf2out_as_locview_support);
>  }
> -- 
> 2.18.4

Jakub



Re: [PATCH] improve validation of attribute arguments (PR c/78666)

2020-08-24 Thread Martin Sebor via Gcc-patches

On 8/24/20 4:59 AM, Aldy Hernandez wrote:



On 8/21/20 1:37 AM, Martin Sebor wrote:

On 8/20/20 3:00 PM, Aldy Hernandez wrote:



Regardless, here are some random comments.


Thanks for the careful review!


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 37214831538..bc4f409e346 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -720,6 +725,124 @@ positional_argument (const_tree fntype, 
const_tree atname, tree pos,

   return pos;
 }

+/* Given a pair of NODEs for arbitrary DECLs or TYPEs, validate one or
+   two integral or string attribute arguments NEWARGS to be applied to
+   NODE[0] for the absence of conflicts with the same attribute 
arguments
+   already applied to NODE[1]. Issue a warning for conflicts and 
return

+   false.  Otherwise, when no conflicts are found, return true.  */
+
+static bool
+validate_attr_args (tree node[2], tree name, tree newargs[2])


I think you're doing too much work in one function.  Also, I *really* 
dislike sending pairs of objects in arrays, especially when they're 
called something so abstract as "node" and "newargs".


Would it be possible to make this function only validate one single 
argument and call it twice?  Or do we gain something by having it do 
two things at once?


I agree about the name "node."  The argument comes from the attribute
handlers: they all take something called a node as their first argument.
It's an array of three elements:
   [0] the current declaration or type
   [1] the previous declaration or type or null
   [2] the current declaration if [0] is a type


Ah, the rest of the functions are taking a node pointer.  Your patch 
threw me off because you use a node[2] instead of a node pointer like 
the rest of the functions.  Perhaps you should keep to the current style 
and pass a node *.


It takes tree node[3] and the -Warray-parameter option (being
reviewed) uses the bound to check for out-of-bounds accesses, both
callers and the callee itself.  (C, but not C++, has special syntax
for this: tree node[static 3].)




validate_attr_args() is called with the same node as the handlers
and uses both node[0] and node[1] to recursively validate the first
one against itself and then against the second.  It could be changed
to take two arguments instead of an array (the new "node" and
the original "node," perhaps even under some better name).  That
would make it different from every handler but maybe that wouldn't
be a problem.

The newargs argument is also an array, with the second element
being optional.  Both elements are used and validated against
the attribute arguments on the same declaration first and again
on the previous one.  The array could be split up into two
distinct arguments, say newarg1 and newarg2, or passed in as
std::pair.  I'm not sure I see much of a difference
between the approaches.


It looks like node[] carries all the information for the current 
attribute and arguments, as well the same information for the previous 
attribute.  Could your validate function just take:


validate_attr_args (tree *node, tree name)

That way you can save passing a pair of arguments, plus you can save 
accumulating said arguments in the handle_* functions.


Or is there something I'm missing here that makes this unfeasible?


If the function didn't the newargs array it would have to extract
the argument(s) from node, duplicating the work already done in
the callers.  I.e., figuring out how many arguments the attribute
expects (one or two, depending on the specific attribute), and for
handle_alloc_size_attribute, calling positional_argument (or at
a minimum default_conversion) to convert it to the expected value.
So it's feasible but doesn't seem like a good design.



  /* Extract the same attribute from the previous declaration or 
type.  */

  tree prevattr = NULL_TREE;
  if (DECL_P (node[1]))
    {
  prevattr = DECL_ATTRIBUTES (node[1]);
  if (!prevattr)
{
  tree type = TREE_TYPE (node[1]);
  prevattr = TYPE_ATTRIBUTES (type);
}
    }
  else if (TYPE_P (node[1]))
    prevattr = TYPE_ATTRIBUTES (node[1]);


If all this section does is extract the attribute from a decl, it would 
look cleaner if you abstracted out this functionality into its own 
function.  I'm a big fan of one function to do one thing.



  const char* const namestr = IDENTIFIER_POINTER (name);
  prevattr = lookup_attribute (namestr, prevattr);
  if (!prevattr)
    return true;


Perhaps a better name would be attribute_name_str?


Thanks for the suggestion but I think NAMESTR is good enough: it
should be clear enough from the function argument NAME that it
refers to the string representation of the NAME.  There also is
already a pre-existing use of NAMESTR elsewhere and so a precedent
for this choice.


  /* Extract one or both attribute arguments.  */
  tree prevargs[2];
  prevargs[0] = TREE_VALUE (TREE_VALUE (prevattr));
  prevargs[1] = TREE_CHAIN (TREE_VALUE (prevattr));
  if (prevargs[1])
    p

Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.

2020-08-24 Thread Josh Triplett
On Sat, Aug 22, 2020 at 06:04:48PM -0300, Giuliano Belinassi wrote:
> Hi, Josh
> 
> On 08/21, Josh Triplett wrote:
> > On Thu, Aug 20, 2020 at 07:00:13PM -0300, Giuliano Belinassi wrote:
> > > This patch series add a new flag "-fparallel-jobs=" to control if the
> > > compiler should try to compile the current file in parallel.
> > [...]
> > > Bootstrapped and Regtested on Linux x86_64.
> > > 
> > > Giuliano Belinassi (6):
> > >   Modify gcc driver for parallel compilation
> > >   Implement a new partitioner for parallel compilation
> > >   Implement fork-based parallelism engine
> > >   Add `+' for Jobserver Integration
> > >   Add invoke documentation
> > >   New tests for parallel compilation feature
> > 
> > Very nice!
> 
> Thank you for your interest in this :)
> 
> > 
> > I'm interested in testing this on a highly parallel system. What
> > baseline do these patches apply to?  They don't seem to apply to GCC
> > trunk.
> 
> Hummm, this was supposed to work on trunk out of the box. However,
> there is a high probability that I messed up something while rebasing.
> I will post a version 2 of it when I get more comments and when I fix
> the Makefile issue that Joseph pointed out in other e-mail.
> 
> If you want to test it on a high parallel system, I think it will be
> cool to see how it behaves also when --param=promote-statics=1, as it
> increases parallelism opportunity. :)

I plan to try several variations, including that.

I'd like to see how it affects the performance of Linux kernel builds.

> > Also, I tried to bootstrap the current tip of the devel/autopar_devel
> > branch, but ended up with compiler segfaults that all look like this:
> > ../../gcc/zlib/compress.c:86:1: internal compiler error: Segmentation fault
> >86 | }
> >   | ^
> 
> Well, there was once a bug in this branch when compiling with -flto that
> caused the assembler output file not to be properly initialized early
> enough, resulting in LTO LGEN stage writing into a invalid FILE pointer.
> I fixed this during rebasing but I forgot to push to the autopar_devel
> branch. In any case, I just pushed the recent changes to autopar_devel
> which fix this issue.

That might explain the problem; I had tried to build gcc with the
bootstrap-lto configuration.

> In any case, -fparallel-jobs= should NOT be used together with -flto.
> Although I used part of the LTO engine for development of this feature,
> they are meant for distinct things. I guess I should give a warning
> about that in next version :)

Interesting. Is that something that could change in the future? I'd like
to be able to get some parallelism when creating the object files, and
then more parallelism when doing the final LTO link.

> Also, I just tested bootstrap with
> 
> ../gcc/configure --disable-multilib --enable-languages=c,c++
>
> on x86_64 linux and it is working.

I'd used --enable-multilib, and --enable-languages=c,c++,lto . Would
that be expected to work?

Thanks,
Josh


Re: [PATCH] Fortran : ICE for division by zero in declaration PR95882

2020-08-24 Thread Thomas Koenig via Gcc-patches

Hi Mark,


OK to commit and backport?


The test cases mentioned in the ChangeLog are not in the
patch, instead there is the test case for PR 96624.

Could you correct that?

Best regards

Thomas


[PATCH] x86: Use target("baseline-isas-only") in

2020-08-24 Thread H.J. Lu via Gcc-patches
On Mon, Aug 24, 2020 at 7:55 AM Uros Bizjak  wrote:
>
> On Mon, Aug 24, 2020 at 3:23 PM H.J. Lu  wrote:
>
> > > Speaking of pragmas, these should be added outside cpuid.h, like:
> > >
> > > #pragma GCC push_options
> > > #pragma GCC target("general-regs-only")
> > >
> > > #include 
> > >
> > > void cpuid_check ()
> > > ...
> > >
> > > #pragma GCC pop_options
> > >
> > > >footnote
> > >
> > > Nowadays, -march=native is mostly used outside generic target
> > > compilations, so for relevant avx512 targets, we still generate spills
> > > to mask regs. In future, we can review the setting of the tuning flag
> > > for a generic target in the same way as with SSE2 inter-reg moves.
> > >
> >
> > Florian raised an issue that we need to limit  to the basic ISAs.
> >  should be handled similarly to other intrinsic header files.
> > That is  should use
> >
> > #pragma GCC push_options
> > #ifdef __x86_64__
> > #pragma GCC target("arch=x86-64")
> > #else
> > #pragma GCC target("arch=i386")
> > ...
> > #pragma GCC pop_options
> >
> > Here is a patch.  OK for master?
>
> -ENOPATCH
>
> However, how will this affect inlining? Every single function in
> cpuid.h is defined as static __inline, and due to target flags
> mismatch, it won't be inlined anymore. These inline functions are used
> in some bit testing functions, and to keep them inlined, these should
> also use the same options to avoid non-basic ISAs. This is the reason
> cpuid.h should be #included after pragma, together with bit testing
> functions, as shown above.
>

How about target("baseline-isas-only")? All CPUID functions are
inlined.


-- 
H.J.
From 82efbfdc58e6bdcf11a9e09018db6bbc690f77b1 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 21 Aug 2020 09:42:49 -0700
Subject: [PATCH] x86: Use target("baseline-isas-only") in 

CPUID check should be done only with baseline ISAs, which include FXSR,
MMX, SSE and SSE2 in 64-bit mode.

gcc/

	PR target/96744
	* common/config/i386/i386-common.c (ix86_handle_option): Support
	-mbaseline-isas-only.
	* config/i386/cpuid.h: Add #pragma GCC target("baseline-isas-only").
	* config/i386/i386-options.c (ix86_valid_target_attribute_inner_p):
	Handle baseline-isas-only.
	* config/i386/i386.opt: Add -mbaseline-isas-only.
	* doc/extend.texi: Document target("baseline-isas-only") function
	attribute.
	* doc/invoke.texi: Document -mbaseline-isas-only.

gcc/testsuite/

	PR target/96744
	* gcc.target/i386/avx512-check.h: Add #pragma GCC
	target("baseline-isas-only") for CPUID check.
	* gcc.target/i386/pr96744-10.c: New test.
---
 gcc/common/config/i386/i386-common.c | 20 +++
 gcc/config/i386/cpuid.h  | 13 ++
 gcc/config/i386/i386-options.c   |  7 -
 gcc/config/i386/i386.opt |  6 -
 gcc/doc/extend.texi  |  4 +++
 gcc/doc/invoke.texi  |  5 
 gcc/testsuite/gcc.target/i386/avx512-check.h |  5 
 gcc/testsuite/gcc.target/i386/pr96744-10.c   | 27 
 8 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr96744-10.c

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index bb14305ad7b..2c0b7f3fe0f 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -338,6 +338,26 @@ ix86_handle_option (struct gcc_options *opts,
 	gcc_unreachable ();
   return true;
 
+case OPT_mbaseline_isas_only:
+  if (value)
+	{
+	  /* Only enable baseline ISAs.  */
+	  if ((opts->x_ix86_isa_flags & OPTION_MASK_ISA_64BIT))
+	opts->x_ix86_isa_flags = (OPTION_MASK_ISA_64BIT
+  | OPTION_MASK_ISA_FXSR
+  | OPTION_MASK_ISA_MMX
+  | OPTION_MASK_ISA_SSE
+  | OPTION_MASK_ISA_SSE2);
+	  else
+	opts->x_ix86_isa_flags = 0;
+	  opts->x_ix86_isa_flags2 = 0;
+	  opts->x_ix86_isa_flags_explicit = -1;
+	  opts->x_ix86_isa_flags2_explicit = -1;
+	}
+  else
+	gcc_unreachable ();
+  return true;
+
 case OPT_mmmx:
   if (value)
 	{
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index bca61d620db..dd2ef8f9b30 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -24,6 +24,17 @@
 #ifndef _CPUID_H_INCLUDED
 #define _CPUID_H_INCLUDED
 
+#pragma GCC push_options
+#if __GNUC__ >= 11
+#pragma GCC target("baseline-isas-only")
+#else
+#ifdef __x86_64__
+#pragma GCC target("arch=x86-64")
+#else
+#pragma GCC target("arch=i386")
+#endif
+#endif
+
 /* %eax */
 #define bit_AVX512BF16	(1 << 5)
 
@@ -324,4 +335,6 @@ __cpuidex (int __cpuid_info[4], int __leaf, int __subleaf)
 		 __cpuid_info[2], __cpuid_info[3]);
 }
 
+#pragma GCC pop_options
+
 #endif /* _CPUID_H_INCLUDED */
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index e0fc68c27bf..4a09c1c93ee 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -1072,6 +1072,10 @@ ix86_valid_target_attribute_inner_p 

Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-24 Thread Jeff Law via Gcc-patches
On Fri, 2020-08-21 at 13:53 +0100, Roger Sayle wrote:
> This patch improves the code generated on PA-RISC for DImode
> (double word) left shifts by small constants (1-31).  This target
> has a very cool shd instruction that can be recognized by combine
> for simple shifts, but relying on combine is fragile for more
> complicated functions.  This patch tweaks pa.md's ashldi3 expander,
> to form the optimal two instruction shd/zdep sequence at RTL
> expansion time.
> 
> As an example of the benefits of this approach, the simple function
> 
> unsigned long long u9(unsigned long long x) { return x*9; }
> 
> currently generates 9 instructions
> 
> u9: copy %r25,%r28
> copy %r26,%r29
> extru %r26,2,3,%r21
> zdep %r25,28,29,%r19
> zdep %r26,28,29,%r20
> or %r21,%r19,%r19
> add %r29,%r20,%r29
> addc %r28,%r19,%r28
> bv,n %r0(%r2)
> 
> and with this patch now requires only 7:
> 
> u9: copy %r25,%r28
> copy %r26,%r29
> shd %r26,%r25,29,%r19
> zdep %r26,28,29,%r20
> add %r29,%r20,%r29
> addc %r28,%r19,%r28
> bv,n %r0(%r2)
> 
> 
> This improvement is a first step towards getting synth_mult to
> behave sanely on hppa (PR middle-end/87256).
> 
> Unfortunately, it's been a long while since I've had access to a
> hppa system, so apart from building a cross-compiler and looking at
> the assembler it generates, this patch is completely untested.
> I was wondering whether Dave or Jeff (or someone else with access
> to real hardware) might "spin" this patch for me?
So while I have an R390 down in the basement, it overheats and I don't think 
I've
turned it on in a few years.

What I do for testing is exploit QEMU user mode emulation.  I have a little root
filesystem with HPPA binaries -- there's enough bits in that filesystem to build
binutils, gcc, glibc & the linux kernel as well as run the testsuite.

The root filesystems are here:

https://github.com/JeffreyALaw/chroots

The Jenkins tester spins the PA once week.  It failed its last run due to
regressions in the analyzer.  I didn't realize it was running the static 
analyzer
for the qemu emulated targets, which I turned off yesterday and re-started the 
PA
build.

http://3.14.90.209:8080/job/hppa-linux-gnu/



Jeff
> 



Re: [PATCH] arm: Fix -mpure-code support/-mslow-flash-data for armv8-m.base [PR94538]

2020-08-24 Thread Christophe Lyon via Gcc-patches
On Mon, 24 Aug 2020 at 11:09, Christophe Lyon
 wrote:
>
> On Sat, 22 Aug 2020 at 00:44, Ramana Radhakrishnan
>  wrote:
> >
> > On Wed, Aug 19, 2020 at 10:32 AM Christophe Lyon via Gcc-patches
> >  wrote:
> > >
> > > armv8-m.base (cortex-m23) has the movt instruction, so we need to
> > > disable the define_split to generate a constant in this case,
> > > otherwise we get incorrect insn constraints as described in PR94538.
> > >
> > > We also need to fix the pure-code alternative for thumb1_movsi_insn
> > > because the assembler complains with instructions like
> > > movs r0, #:upper8_15:1234
> > > (Internal error in md_apply_fix)
> > > We now generate movs r0, 4 instead.
> > >
> > > 2020-08-19  Christophe Lyon  
> > > gcc/ChangeLog:
> > >
> > > * config/arm/thumb1.md: Disable set-constant splitter when
> > > TARGET_HAVE_MOVT.
> > > (thumb1_movsi_insn): Fix -mpure-code
> > > alternative.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/arm/pure-code/pr94538-1.c: New test.
> > > * gcc.target/arm/pure-code/pr94538-2.c: New test.
> >
>
> Hi Ramana,
>
> >  I take it that this fixes the ICE rather than addressing the code
> > generation / performance bits that Wilco was referring to ? Really the
> > other code quality / performance issues listed in the PR should really
> > be broken down into separate PRs while we take this as a fix for
> > fixing the ICE,
> >
> > Under that assumption OK.
> >
>
> Yes, that's correct, this patch just fixes the ICE, as it is cleaner
> to handle perf issues in a different patch.
>

The patch applies cleanly to gcc-9 and gcc-10: OK to backport there?

(I tested that the offending testcase passes on the branches with the
backport, and fails without).

Thanks,

Christophe

> I'll open a new PR to track the perf issue.
>
> Thanks
>
>
> > Ramana
> >
> > > ---
> > >  gcc/config/arm/thumb1.md   | 66 
> > > ++
> > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c | 13 +
> > >  gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c | 12 
> > >  3 files changed, 79 insertions(+), 12 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pr94538-2.c
> > >
> > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > > index 0ff8190..f0129db 100644
> > > --- a/gcc/config/arm/thumb1.md
> > > +++ b/gcc/config/arm/thumb1.md
> > > @@ -70,6 +70,7 @@ (define_split
> > >"TARGET_THUMB1
> > > && arm_disable_literal_pool
> > > && GET_CODE (operands[1]) == CONST_INT
> > > +   && !TARGET_HAVE_MOVT
> > > && !satisfies_constraint_I (operands[1])"
> > >[(clobber (const_int 0))]
> > >"
> > > @@ -696,18 +697,59 @@ (define_insn "*thumb1_movsi_insn"
> > >"TARGET_THUMB1
> > > && (   register_operand (operands[0], SImode)
> > > || register_operand (operands[1], SImode))"
> > > -  "@
> > > -   movs%0, %1
> > > -   movs%0, %1
> > > -   movw%0, %1
> > > -   #
> > > -   #
> > > -   ldmia\\t%1, {%0}
> > > -   stmia\\t%0, {%1}
> > > -   movs\\t%0, #:upper8_15:%1; lsls\\t%0, #8; adds\\t%0, #:upper0_7:%1; 
> > > lsls\\t%0, #8; adds\\t%0, #:lower8_15:%1; lsls\\t%0, #8; adds\\t%0, 
> > > #:lower0_7:%1
> > > -   ldr\\t%0, %1
> > > -   str\\t%1, %0
> > > -   mov\\t%0, %1"
> > > +{
> > > +  switch (which_alternative)
> > > +{
> > > +  default:
> > > +  case 0: return "movs\t%0, %1";
> > > +  case 1: return "movs\t%0, %1";
> > > +  case 2: return "movw\t%0, %1";
> > > +  case 3: return "#";
> > > +  case 4: return "#";
> > > +  case 5: return "ldmia\t%1, {%0}";
> > > +  case 6: return "stmia\t%0, {%1}";
> > > +  case 7:
> > > +  /* pure-code alternative: build the constant byte by byte,
> > > +instead of loading it from a constant pool.  */
> > > +   {
> > > + int i;
> > > + HOST_WIDE_INT op1 = INTVAL (operands[1]);
> > > + bool mov_done_p = false;
> > > + rtx ops[2];
> > > + ops[0] = operands[0];
> > > +
> > > + /* Emit upper 3 bytes if needed.  */
> > > + for (i = 0; i < 3; i++)
> > > +   {
> > > +  int byte = (op1 >> (8 * (3 - i))) & 0xff;
> > > +
> > > + if (byte)
> > > +   {
> > > + ops[1] = GEN_INT (byte);
> > > + if (mov_done_p)
> > > +   output_asm_insn ("adds\t%0, %1", ops);
> > > + else
> > > +   output_asm_insn ("movs\t%0, %1", ops);
> > > + mov_done_p = true;
> > > +   }
> > > +
> > > + if (mov_done_p)
> > > +   output_asm_insn ("lsls\t%0, #8", ops);
> > > +   }
> > > +
> > > + /* Emit lower byte if needed.  */
> > > + ops[1] = GEN_INT (op1 & 0xff);
> > > + if (!mov_done_p)
> > > +   output_asm_insn ("movs\t%0

[committed] libstdc++: Add deduction guide for std::ranges::join_view [LWG 3474]

2020-08-24 Thread Jonathan Wakely via Gcc-patches
This implements the proposed resolution for LWG 3474.

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view): Add deduction guide (LWG 3474).
* testsuite/std/ranges/adaptors/join_lwg3474.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

commit ef275d1f2083f8a1fa1b59a3cd07fd3e8431023e
Author: Jonathan Wakely 
Date:   Mon Aug 24 16:18:32 2020

libstdc++: Add deduction guide for std::ranges::join_view [LWG 3474]

This implements the proposed resolution for LWG 3474.

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view): Add deduction guide (LWG 3474).
* testsuite/std/ranges/adaptors/join_lwg3474.cc: New test.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 22184006c08..9d22b138082 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -2693,6 +2693,11 @@ namespace views
   template
 explicit join_view(_Range&&) -> join_view>;
 
+// _GLIBCXX_RESOLVE_LIB_DEFECTS
+// 3474. Nesting join_views is broken because of CTAD
+  template
+explicit join_view(join_view<_View>) -> join_view>;
+
   namespace views
   {
 inline constexpr __adaptor::_RangeAdaptorClosure join
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/join_lwg3474.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/join_lwg3474.cc
new file mode 100644
index 000..516aaba7070
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/join_lwg3474.cc
@@ -0,0 +1,37 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+#include 
+#include 
+
+void
+test01()
+{
+  // LWG 3474. Nesting join_views is broken because of CTAD
+  std::vector>> nested_vectors = {
+{{1, 2, 3}, {4, 5}, {6}},
+{{7},   {8, 9}, {10, 11, 12}},
+{{13}}
+  };
+  auto joined = nested_vectors | std::views::join | std::views::join;
+
+  using V = decltype(joined);
+  static_assert( std::same_as, int> );
+}


[committed] libstdc++: Fix std::indirectly_readable ambiguity [LWG 3446]

2020-08-24 Thread Jonathan Wakely via Gcc-patches
This implements the proposed resolution of LWG 3446. I'm also adding
another new constrained specialization which isn't proposed by 3446, to
resolve the ambiguity when a type has both value_type and element_type
but denoting different types.

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (indirectly_readable): Add
partial specializations to resolve ambiguities (LWG 3446).
* testsuite/24_iterators/associated_types/readable.traits.cc:
Check types with both value_type and element_type.

Tested powerpc64le-linux. Committed to trunk.

commit 186aa6304570e15065f31482e9c27326a3a6679f
Author: Jonathan Wakely 
Date:   Mon Aug 24 16:18:31 2020

libstdc++: Fix std::indirectly_readable ambiguity [LWG 3446]

This implements the proposed resolution of LWG 3446. I'm also adding
another new constrained specialization which isn't proposed by 3446, to
resolve the ambiguity when a type has both value_type and element_type
but denoting different types.

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (indirectly_readable): Add
partial specializations to resolve ambiguities (LWG 3446).
* testsuite/24_iterators/associated_types/readable.traits.cc:
Check types with both value_type and element_type.

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index bd6660c5f22..a568f2ab825 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -220,6 +220,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template requires is_object_v<_Tp>
   struct __cond_value_type<_Tp>
   { using value_type = remove_cv_t<_Tp>; };
+
+template
+  concept __has_member_value_type
+   = requires { typename _Tp::value_type; };
+
+template
+  concept __has_member_element_type
+   = requires { typename _Tp::element_type; };
+
   } // namespace __detail
 
   template struct indirectly_readable_traits { };
@@ -238,16 +247,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : indirectly_readable_traits<_Iter>
 { };
 
-  template requires requires { typename _Tp::value_type; }
+  template<__detail::__has_member_value_type _Tp>
 struct indirectly_readable_traits<_Tp>
 : __detail::__cond_value_type
 { };
 
-  template requires requires { typename _Tp::element_type; }
+  template<__detail::__has_member_element_type _Tp>
 struct indirectly_readable_traits<_Tp>
 : __detail::__cond_value_type
 { };
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3446. indirectly_readable_traits ambiguity for types with both [...]
+  template<__detail::__has_member_value_type _Tp>
+requires __detail::__has_member_element_type<_Tp>
+&& same_as,
+  remove_cv_t>
+struct indirectly_readable_traits<_Tp>
+: __detail::__cond_value_type
+{ };
+
+  // LWG 3446 doesn't add this, but it's needed for the case where
+  // value_type and element_type are both present, but not the same type.
+  template<__detail::__has_member_value_type _Tp>
+requires __detail::__has_member_element_type<_Tp>
+struct indirectly_readable_traits<_Tp>
+{ };
+
   namespace __detail
   {
 template
diff --git 
a/libstdc++-v3/testsuite/24_iterators/associated_types/readable.traits.cc 
b/libstdc++-v3/testsuite/24_iterators/associated_types/readable.traits.cc
index b503b0cdc1e..3c9c3927153 100644
--- a/libstdc++-v3/testsuite/24_iterators/associated_types/readable.traits.cc
+++ b/libstdc++-v3/testsuite/24_iterators/associated_types/readable.traits.cc
@@ -141,3 +141,29 @@ struct J
 // iterator_traits matches constrained specialization in the library,
 // so use its value_type.
 static_assert( check_alias );
+
+struct I2
+{
+  using element_type = int;
+};
+// iterator_traits is not specialized, and no standard specialization
+// matches, so use indirectly_readable_traits.
+static_assert( check_alias::value_type> );
+
+// LWG 3446
+struct I3
+{
+  using value_type = long;
+  using element_type = const long;
+};
+// iterator_traits is not specialized, and no standard specialization
+// matches, so use indirectly_readable_traits.
+static_assert( check_alias::value_type> );
+
+// Correction to LWG 3446
+struct I4
+{
+  using value_type = int;
+  using element_type = long;
+};
+static_assert( ! has_alias );


Re: [PATCH] libstdc++: Fix iota_view::size() to avoid overflow

2020-08-24 Thread Jonathan Wakely via Gcc-patches

On 20/08/20 19:54 +0100, Jonathan Wakely wrote:

This avoids overfow that occurs when negating the most negative value of
an integral type.

Also prevent returning signed int when the values have lower rank and
promote to int.

libstdc++-v3/ChangeLog:

* include/std/ranges (ranges::iota_view::size()): Perform all
calculations in the right unsigned types.
* testsuite/std/ranges/iota/size.cc: New test.

Tested powerpc64-linux. Not yet pushed.

Does anybody see any reason to stick with exactly what C++20 requires,
despite its bugs?


I've pushed this now.



commit 6e7e3f742bf22139716712908efa83c74c734ed2
Author: Jonathan Wakely 
Date:   Thu Aug 20 19:44:43 2020

   libstdc++: Fix iota_view::size() to avoid overflow

   This avoids overfow that occurs when negating the most negative value of
   an integral type.

   Also prevent returning signed int when the values have lower rank and
   promote to int.

   libstdc++-v3/ChangeLog:

   * include/std/ranges (ranges::iota_view::size()): Perform all
   calculations in the right unsigned types.
   * testsuite/std/ranges/iota/size.cc: New test.

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index b8023e67c9f..22184006c08 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -889,12 +889,13 @@ namespace ranges
  {
using __detail::__is_integer_like;
using __detail::__to_unsigned_like;
-   if constexpr (__is_integer_like<_Winc> && __is_integer_like<_Bound>)
- return (_M_value < 0)
-   ? ((_M_bound < 0)
-   ? __to_unsigned_like(-_M_value) - __to_unsigned_like(-_M_bound)
-   : __to_unsigned_like(_M_bound) + __to_unsigned_like(-_M_value))
-   : __to_unsigned_like(_M_bound) - __to_unsigned_like(_M_value);
+   if constexpr (integral<_Winc> && integral<_Bound>)
+ {
+   using _Up = make_unsigned_t;
+   return _Up(_M_bound) - _Up(_M_value);
+ }
+   else if constexpr (__is_integer_like<_Winc>)
+ return __to_unsigned_like(_M_bound) - __to_unsigned_like(_M_value);
else
  return __to_unsigned_like(_M_bound - _M_value);
  }
diff --git a/libstdc++-v3/testsuite/std/ranges/iota/size.cc 
b/libstdc++-v3/testsuite/std/ranges/iota/size.cc
new file mode 100644
index 000..2a9d3870c5d
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/iota/size.cc
@@ -0,0 +1,110 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=c++2a" }
+// { dg-do compile { target c++2a } }
+
+#include 
+#include 
+
+template
+constexpr bool
+equal(T t, U u) requires std::same_as
+{
+  return t == u;
+}
+
+template>
+void
+test_integer_iota()
+{
+  using std::numeric_limits;
+
+  using V = std::ranges::iota_view;
+  static_assert( std::ranges::sized_range );
+
+  constexpr V zero(0, 0);
+  static_assert( equal(zero.size(), (S)0) );
+
+  constexpr V min(numeric_limits::min(),
+ numeric_limits::min());
+  static_assert( equal(min.size(), (S)0) );
+
+  constexpr V max(numeric_limits::max(),
+ numeric_limits::max());
+  static_assert( equal(max.size(), (S)0) );
+
+  constexpr V minmax(numeric_limits::min(),
+numeric_limits::max());
+  if constexpr (sizeof(W) < sizeof(S))
+  {
+using S2 = std::make_unsigned_t;
+static_assert( equal(minmax.size(), (S)numeric_limits::max()) );
+  }
+  else
+static_assert( equal(minmax.size(), numeric_limits::max()) );
+
+  constexpr V pospos(20, 22);
+  static_assert( equal(pospos.size(), (S)2) );
+
+  if constexpr (std::numeric_limits::is_signed)
+  {
+constexpr V negneg(-20, -2);
+static_assert( equal(negneg.size(), (S)18) );
+
+constexpr V negpos(-20, 22);
+static_assert( equal(negpos.size(), (S)42) );
+  }
+}
+
+void
+test01()
+{
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+  test_integer_iota();
+
+#ifdef __SIZEOF_INT128__
+  // When the target supports __int128 it can be used in iota_view
+  // even in strict mode where !integral<__int128>.
+  //

[committed] libstdc++: Make variant_npos conversions explicit [PR 96766]

2020-08-24 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

PR libstdc++/96766
* include/std/variant (_Variant_storage): Replace implicit
conversions from size_t to __index_type with explicit casts.

This fixes a silly Clang UBsan warning from its silly
unsigned-integer-overflow sanitizer which complains about perfectly
valid code doing exactly what it's intended to. How silly.

Tested powerpc64le-linux, and with Clang. Committed to trunk.

I'll backport this to the branches, because people use clang's silly
sanitizer with the branches too.

commit 074436cf8cdd2a9ce75cadd36deb8301f00e55b9
Author: Jonathan Wakely 
Date:   Mon Aug 24 16:10:07 2020

libstdc++: Make variant_npos conversions explicit [PR 96766]

libstdc++-v3/ChangeLog:

PR libstdc++/96766
* include/std/variant (_Variant_storage): Replace implicit
conversions from size_t to __index_type with explicit casts.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index eb3d6779205..dd8847cf829 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -385,12 +385,16 @@ namespace __variant
   template
 struct _Variant_storage
 {
-  constexpr _Variant_storage() : _M_index(variant_npos) { }
+  constexpr
+  _Variant_storage()
+  : _M_index(static_cast<__index_type>(variant_npos))
+  { }
 
   template
-   constexpr _Variant_storage(in_place_index_t<_Np>, _Args&&... __args)
+   constexpr
+   _Variant_storage(in_place_index_t<_Np>, _Args&&... __args)
: _M_u(in_place_index<_Np>, std::forward<_Args>(__args)...),
-   _M_index(_Np)
+   _M_index{_Np}
{ }
 
   void _M_reset()
@@ -403,7 +407,7 @@ namespace __variant
std::_Destroy(std::__addressof(__this_mem));
  }, __variant_cast<_Types...>(*this));
 
-   _M_index = variant_npos;
+   _M_index = static_cast<__index_type>(variant_npos);
   }
 
   ~_Variant_storage()
@@ -432,16 +436,20 @@ namespace __variant
   template
 struct _Variant_storage
 {
-  constexpr _Variant_storage() : _M_index(variant_npos) { }
+  constexpr
+  _Variant_storage()
+  : _M_index(static_cast<__index_type>(variant_npos))
+  { }
 
   template
-   constexpr _Variant_storage(in_place_index_t<_Np>, _Args&&... __args)
+   constexpr
+   _Variant_storage(in_place_index_t<_Np>, _Args&&... __args)
: _M_u(in_place_index<_Np>, std::forward<_Args>(__args)...),
-   _M_index(_Np)
+   _M_index{_Np}
{ }
 
   void _M_reset() noexcept
-  { _M_index = variant_npos; }
+  { _M_index = static_cast<__index_type>(variant_npos); }
 
   void*
   _M_storage() const noexcept
@@ -455,7 +463,7 @@ namespace __variant
   {
if constexpr (__variant::__never_valueless<_Types...>())
  return true;
-   return this->_M_index != __index_type(variant_npos);
+   return this->_M_index != static_cast<__index_type>(variant_npos);
   }
 
   _Variadic_union<_Types...> _M_u;


Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.

2020-08-24 Thread Giuliano Belinassi via Gcc-patches
Hi, Richi.

On 08/24, Richard Biener wrote:
> On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
>  wrote:
> >
> > This patch series add a new flag "-fparallel-jobs=" to control if the
> > compiler should try to compile the current file in parallel.
> >
> > There are three modes which is supported by now:
> >
> > 1. -fparallel-jobs=: Try to compile the file using a maximum of N
> > jobs.
> >
> > 2. -fparallel-jobs=jobserver: Check if there is a running GNU Make
> > Jobserver. If positive, communicate with it in order to launch jobs,
> > but alert the user if the jobserver was not found, since it requires
> > modifications in the project Makefile.
> >
> > 3. -fparallel-jobs=auto: Same as 2., but quietly fall back to a maximum
> > of 2 jobs if the jobserver was not found.
> >
> > The parallelization works by using a modified LTO engine, as no IR is
> > dumped into the disk, and a new partitioner is employed to find
> > symbols which must be partitioned together.
> >
> > In order to implement the parallelism feature, we:
> >
> > 1. The driver will pass a hidden -fsplit-outputs= to cc1*.
> >
> > 2. After IPA, cc1* will search for symbols in which must be partitioned
> > together.  If the user allows GCC to automatically promote symbols to
> > globals through "--param=promote-statics=1" for a better parallel
> > compilation performance, it will also be done.  However, if it decides
> > that partitioning is a bad idea, it will continue with a default serial
> > compilation, and the additional  will not be created.  It will
> > avoid compiling in parallel if and only if:
> >
> >   * File size exceeds the minimum file size specified by LTO default
> >   --param=lto-min-partition.
> 
> less than the minimum size I suppose.

True. :)

> 
> >   * The partitioner is unable to find any point of partitioning in the
> >   file.
> 
> It might make sense to increase the minimum partition size and also
> check the partitioning result against unreasonable bias (one very
> large and one very small partition).

I am working on this right now :)

> 
> > 3. cc1* will fork itself; one fork for each partition. Each child
> > process will apply its partition mask generated by the partitioner
> > and write a new assembler name file to  pointed by the driver.
> 
> For the first partition there's no fork (but the main process is used) and
> the main output file will be used, correct?

No. Forking is only disabled only in fallback mode in this version, but
this is certainly possible to implement.

> 
> > 4. The driver will open each file and partially link them together into
> > a single .o file, if -c was requested, else into a binary.  -S and -E
> > is unsupported for now and probably will remain so.
> 
> That also applies to -save-temps mode I assume which makes
> debugging issues a bit tricky and involves manual invocation
> of the cc1 command to have the file with the output filenames preserved.

This is currently unsupported, but it seems to be a interesting feature
to have.  I could use the input file name as a baseline for the
temporary files and dump them in the current working directory instead
of asking for a temporary file to libiberty.

> 
> >
> > Speedups ranged from 0.95x to 1.9x on a Quad-Core Intel Core-i7 8565U
> > when testing with two files in GCC, as stated in the following table.
> > The test was the result of a single execution with a previous warm up
> > execution. The compiled GCC had checking enabled, and therefore release
> > version might have better timings in both sequential and parallel, but the
> > speedup may remain the same.
> >
> > ||| Without Static | With Static |   Max   |
> > | File   | Sequential |Promotion   |  Promotion  | Speedup |
> > ||||---|
> > | gimple-match.c | 60s|   63s  | 34s |   1.7x  |
> > | insn-emit.c| 37s|   19s  | 20s |   1.9x  |
> >
> > Notice that we have a slowdown in some cases when it is enabled, that
> > is why the parallelism feature is enabled with a flag for now.
> 
> One reason why promote-statics is not enabled by default is that
> it creates new hidden symbols (LTO does so as well) which might
> be undesirable.  If deemed OK in general we could enable it by
> default.  Note that originally I wanted to have -fparallel-jobs=auto
> be enabled by default which should not end up with visible changes
> like this(?)

It not only creates hidden symbols, but it also changes the original
symbol name to avoid clashses with other object files. It could be
very nice to avoid doing this at all.

There was once an idea (I don't remember if from Richi or Honza) to
avoid using partial linking, but instead concatenate the generated
assembler files into a single assembler file and assemble it.  This
would remove the requirement of symbol promotion, as they would be
in the same assembler file, but I am not sure if this would work ou

Re: [PATCH] Fix libstdc++ testsuite to handle VxWorks gthreads implementation

2020-08-24 Thread Jonathan Wakely via Gcc-patches

On 21/08/20 09:37 -0300, Alexandre Oliva wrote:

On Dec 20, 2019, Jonathan Wakely  wrote:


On 10/12/19 15:58 +0100, Corentin Gay wrote:

This patch was tested on x86_64-linux and is part of our nightly testing
on all platforms, including VxWorks.



Was it tested on AIX?



I think dg-require-gthreads will prevent the tests running for the
single-threaded multilib on AIX, so it will work OK. But there's a
chance it will need fixing. Let's wait and see (I'm currently unable
to build GCC on AIX).


Sorry it took us so long to get back on this.  I've just refreshed the
patch at :

- resolving some trivial conflicts within 30_threads/shared_mutex,

- updating some renamed test file names within 30_threads/this_thread, and

- dropping the obviated change to 30_threads/this_thread/yield.cc

and gave it a spin on gcc111 in the cfarm.

There weren't any changes to the libstdc++ results, according to
test_summary, not even in the unsupported (or any other) test counts.


OK for trunk, thanks.


Given the approval and the lack of significant changes, I'll put this in
unless there are objections in the next 48 hours.  Thanks for the review!



Thanks. There's a new FAIL due to a bad merge.

That test needs to be restricted to C++11 and C++14, which I did
recently on master. However, since the default became -std=gnu++17
that means it stopped running on master. Since your patch it runs
again, but now fails.

This fixes it to use an explicit -std=gnu++11 and not run for C++17.

Pushed to master.


commit ac4e9090fce653ba7a43ea5333efdd9bbe5c71a3
Author: Jonathan Wakely 
Date:   Mon Aug 24 16:06:25 2020

libstdc++: Fix 30_threads/packaged_task/cons/alloc.cc regression

libstdc++-v3/ChangeLog:

* testsuite/30_threads/packaged_task/cons/alloc.cc: Run for
C++11 and skip for C++17 or later.

diff --git a/libstdc++-v3/testsuite/30_threads/packaged_task/cons/alloc.cc b/libstdc++-v3/testsuite/30_threads/packaged_task/cons/alloc.cc
index dbe477ad1bf..d45637d4f83 100644
--- a/libstdc++-v3/testsuite/30_threads/packaged_task/cons/alloc.cc
+++ b/libstdc++-v3/testsuite/30_threads/packaged_task/cons/alloc.cc
@@ -1,6 +1,6 @@
-// { dg-do run }
+// { dg-options "-std=gnu++11" }
+// { dg-do run { target { c++11_only || c++14_only } } }
 // { dg-additional-options "-pthread" { target pthread } }
-// { dg-require-effective-target c++11 }
 // { dg-require-gthreads "" }
 
 // Copyright (C) 2010-2020 Free Software Foundation, Inc.


[PATCH] sra: Bail out when encountering accesses with negative offsets (PR 96730)

2020-08-24 Thread Martin Jambor
Hi,

I must admit I was quite surprised to see that SRA does not disqualify
an aggregate from any transformations when it encounters an offset for
which get_ref_base_and_extent returns a negative offset.  It may not
matter too much because I sure hope such programs always have
undefined behavior (SRA candidates are local variables on stack) but
it is probably better not to perform weird transformations on them as
build ref model with the new build_reconstructed_reference function
currently happily do for negative offsets (they just copy the existing
expression which is then used as the expression of a "propagated"
access) and of course the compiler must not ICE (as it currently does
because the SRA forest verifier does not like the expression).

Fixed with the following patch which also passed bootstrap and testing
on an x86_64-linux.  OK for master and later on for the gcc-10 branch?

Thanks,

Martin


gcc/ChangeLog:

2020-08-24  Martin Jambor  

PR tree-optimization/96730
* tree-sra.c (create_access): Disqualify any aggregate with negative
offset access.
(build_ref_for_model): Add assert that offset is non-negative.

gcc/testsuite/ChangeLog:

2020-08-24  Martin Jambor  

PR tree-optimization/96730
* gcc.dg/tree-ssa/pr96730.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr96730.c | 13 +
 gcc/tree-sra.c  |  6 ++
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr96730.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr96730.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr96730.c
new file mode 100644
index 000..39a06846529
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr96730.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+struct a {
+  int b;
+  int c;
+} d() {
+  struct a e[9];
+  int f = 3362953455;
+  e[f] = e[6];
+  e[6].c = 1;
+}
+int main() {}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index fcba7fbdd31..754f41302fc 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -931,6 +931,11 @@ create_access (tree expr, gimple *stmt, bool write)
 }
   if (size == 0)
 return NULL;
+  if (offset < 0)
+{
+  disqualify_candidate (base, "Encountered a negative offset access.");
+  return NULL;
+}
   if (size < 0)
 {
   disqualify_candidate (base, "Encountered an unconstrained access.");
@@ -1667,6 +1672,7 @@ build_ref_for_model (location_t loc, tree base, 
HOST_WIDE_INT offset,
 struct access *model, gimple_stmt_iterator *gsi,
 bool insert_after)
 {
+  gcc_assert (offset >= 0);
   if (TREE_CODE (model->expr) == COMPONENT_REF
   && DECL_BIT_FIELD (TREE_OPERAND (model->expr, 1)))
 {
-- 
2.28.0



Re: [PATCH] Fix ICE.

2020-08-24 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 24, 2020 at 3:03 PM Hongtao Liu  wrote:
>
> Hi:
>   This patch is to fix a typo in my last patch [1].
>   [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551982.html
>
>   Bootstrap is ok, gcc regression test hosted on CLX for i386/x86-64
> backend is ok.
>   Ok for trunk?
>
> gcc/ChangeLog:
> PR target/96755
> * config/i386/sse.md: Refine typo in define_split for knot
> instruction.

Please describe the change in more detail, for example:

"Correct the mode of the NOT operand to SImode."

>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/pr96755.c: New test.

OK with the above ChangeLog improvement.

Uros.


Re: [PATCH] x86: Use -march=x86-64/-march=i386 in

2020-08-24 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 24, 2020 at 3:23 PM H.J. Lu  wrote:

> > Speaking of pragmas, these should be added outside cpuid.h, like:
> >
> > #pragma GCC push_options
> > #pragma GCC target("general-regs-only")
> >
> > #include 
> >
> > void cpuid_check ()
> > ...
> >
> > #pragma GCC pop_options
> >
> > >footnote
> >
> > Nowadays, -march=native is mostly used outside generic target
> > compilations, so for relevant avx512 targets, we still generate spills
> > to mask regs. In future, we can review the setting of the tuning flag
> > for a generic target in the same way as with SSE2 inter-reg moves.
> >
>
> Florian raised an issue that we need to limit  to the basic ISAs.
>  should be handled similarly to other intrinsic header files.
> That is  should use
>
> #pragma GCC push_options
> #ifdef __x86_64__
> #pragma GCC target("arch=x86-64")
> #else
> #pragma GCC target("arch=i386")
> ...
> #pragma GCC pop_options
>
> Here is a patch.  OK for master?

-ENOPATCH

However, how will this affect inlining? Every single function in
cpuid.h is defined as static __inline, and due to target flags
mismatch, it won't be inlined anymore. These inline functions are used
in some bit testing functions, and to keep them inlined, these should
also use the same options to avoid non-basic ISAs. This is the reason
cpuid.h should be #included after pragma, together with bit testing
functions, as shown above.

Uros.


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Qing Zhao via Gcc-patches



> On Aug 24, 2020, at 5:50 AM, Richard Biener  wrote:
> 
> On Tue, 11 Aug 2020, Qing Zhao wrote:
> 
>> Hi, Alexandre,
>> 
>> CC’ing Richard for his comments on this.
>> 
>> 
>>> On Aug 10, 2020, at 9:39 PM, Alexandre Oliva  wrote:
 I think that moving how to zeroing the registers part to each target
 will be a better solution since each target has
 Better idea on how to use the most efficient insns to do the work.
>>> 
>>> It's certainly good to allow machine-specific optimized code sequences,
>>> but it would certainly be desirable to have a machine-independent
>>> fallback.  It doesn't seem exceedingly hard to loop over the registers
>>> and emit a (set (reg:M N) (const_int 0)) for each one that is to be
>>> zeroed out.
>> 
>> The current implementation already includes such machine-independent code, 
>> it should be very easy to add this.
>> 
>> Richard, what’s your opinion on this?
>> Do we need a machine-independent implementation to zeroing the registers for 
>> the default when the target does not provide a optimized
>> Implementation?
> 
> Well, at least silently doing nothing when the option is used would be 
> bad.  So at least a diagnostic would be required.

Yes, this is the current behavior in the current implementation. 
>  Note since the
> option is quite elaborate on what (sub-)set of regs is supposed to be
> cleared I'm not sure an implementation not involving any target hook
> is possible?

Agreed.

Thanks 

Qing
> 
> Richard.
> 
>> Thanks.
>> 
>> Qing
>> 
>>> 
>>> 
>> 
>> 
> 
> -- 
> Richard Biener mailto:rguent...@suse.de>>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Rodriguez Bahena, Victor via Gcc-patches


From: Qing Zhao 
Date: Wednesday, August 19, 2020 at 6:28 PM
To: Segher Boessenkool , "Rodriguez Bahena, Victor" 

Cc: Richard Biener , Jeff Law , 
Uros Bizjak , "H. J. Lu" , Jakub 
Jelinek , GCC Patches , Kees Cook 

Subject: Re: PING [Patch][Middle-end]Add 
-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]




On Aug 19, 2020, at 5:57 PM, Segher Boessenkool 
mailto:seg...@kernel.crashing.org>> wrote:

Hi!

On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:

So, cleaning the scratch registers that are used to pass parameters at return 
instructions should
effectively mitigate ROP attack.

But that is *very* expensive, in general.  Instead of doing just a
return instruction (which effectively costs 0 cycles, and is just one
insn), you now have to zero all call-clobbered register at every return
(typically many returns per function, and you are talking 10+ registers
even if only considering the simple integer registers).

Yes, the run-time overhead and also the code-size overhead are major concerns. 
We should minimize the overhead
as much as we can during implementation. However, such overhead cannot be 
completely avoided for the security purpose.

In order to reduce the overhead for the ROP mitigation, I added 3 new values 
for -fzero-call-used-regs=used-arg-grp|used-arg|arg

For “used-arg-grp”, we only zero the integer registers that are used in the 
routine and can pass parameters; this should provide ROP mitigation
with the minimum overhead.

For “used-arg”, in addition to “used-arg-grp”, the other registers (for 
example, FP registers) that can pass parameters will be zeroed. But I am not
very sure whether this option is really needed in practical.

For “arg”, in addition to “used-arg”, all registers that pass parameters will 
be zeroed. Same as “used-arg”, I am not very sure whether we need this option
Or not.


Numbers on how expensive this is (for what arch, in code size and in
execution time) would be useful.  If it is so expensive that no one will
use it, it helps security at most none at all :-(

CLEAR Linux project has been using a similar patch since GCC 8, the option it 
used is an equivalent to -fzero-call-used-regs=used-gpr.

-fzero-call-used-regs=used-arg-gpr in this new proposal will have smaller 
overhead than the one currently being used in CLEAR Linux.

Victor, do you have any data on the overhead of the option that currently is 
used by CLEAR project?


This is a quick list of packages compiled with similar flag as you mention

https://gist.github.com/bryteise/f3469f318e82c626d20a83f557d897a2

The spec files can be located at:

https://github.com/clearlinux-pkgs

I don’t have any data on the overhead, the patch as you mention was implemented 
since GCC8 (2018) . The distro has been measure by community since then. I was 
looking for any major drop detected by community after this patches but I was 
not able to find it.

Maybe it will be worth to ask in the Clear Linux community project mailing list

Regards

Victor Rodriguez


Q1. Which registers should be set to zeros at the return of the function?
A. the caller-saved, i.e, call-used, or call-clobbered registers.
  For ROP mitigation purpose, only the call-used registers that pass
parameters need to be zeroed.
  For register erasure purpose, all the call-used registers might need to
be zeroed. we can provide multiple levels to user for controling the runtime
overhead.

The call-clobbered regs are the only ones you *can* touch.  That does
not mean you should clear them all (it doesn't help much at all in some
cases).  Only the backend knows.

I think that for ROP mitigation purpose, we only need to clear the call-used 
(i.e, call-clobbered) registers that are used in the current routine and
can pass parameters.

But for preventing information leak from callee registers, we might need to 
clear all the call-used registers at return.





   So, from both run-time performance and code-size aspects, setting the
registers to zero is a better approach.

From a security perspective, this isn't clear though.  But that is a lot
of extra research ;-)

There has been quite some discussion on this topic at

https://lists.llvm.org/pipermail/cfe-dev/2020-April/065221.html

From those old discussion, we can see that zero value should be good enough for 
the security purpose (though it’s not perfect).

Qing



I saw the same discussion on latest ELC/OSSNA conference this year by LLVM 
community. The flag is getting a lot of attraction

Regards

Victor Rodriguez



Segher




Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-24 Thread Rodriguez Bahena, Victor via Gcc-patches


-Original Message-
From: Segher Boessenkool 
Date: Wednesday, August 19, 2020 at 5:58 PM
To: Qing Zhao 
Cc: Richard Biener , Jeff Law , 
Uros Bizjak , "H. J. Lu" , Jakub 
Jelinek , GCC Patches , Kees Cook 
, "Rodriguez Bahena, Victor" 

Subject: Re: PING [Patch][Middle-end]Add 
-fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Hi!

On Wed, Aug 19, 2020 at 03:05:36PM -0500, Qing Zhao wrote:
> So, cleaning the scratch registers that are used to pass parameters at 
return instructions should 
> effectively mitigate ROP attack. 

But that is *very* expensive, in general.  Instead of doing just a
return instruction (which effectively costs 0 cycles, and is just one
insn), you now have to zero all call-clobbered register at every return
(typically many returns per function, and you are talking 10+ registers
even if only considering the simple integer registers).

Numbers on how expensive this is (for what arch, in code size and in
execution time) would be useful.  If it is so expensive that no one will
use it, it helps security at most none at all :-(

It is used in some operating systems and packages such as 

https://github.com/clearlinux-pkgs/gettext/blob/master/gettext.spec#L138

export CFLAGS="$CFLAGS -O3 -ffat-lto-objects -flto=4 -fstack-protector-strong 
-mzero-caller-saved-regs=used "

There is no record that this flag creates a considerable penalty in execution 
time.

> Q1. Which registers should be set to zeros at the return of the function?
> A. the caller-saved, i.e, call-used, or call-clobbered registers.
>For ROP mitigation purpose, only the call-used registers that pass
> parameters need to be zeroed. 
>For register erasure purpose, all the call-used registers might need to
> be zeroed. we can provide multiple levels to user for controling the 
runtime
> overhead. 

The call-clobbered regs are the only ones you *can* touch.  That does
not mean you should clear them all (it doesn't help much at all in some
cases).  Only the backend knows.

> So, from both run-time performance and code-size aspects, setting the
> registers to zero is a better approach. 

From a security perspective, this isn't clear though.  But that is a lot
of extra research ;-)

The paper from IEEE provide a clear example on how to use mzero-caller

I think the patch has a solid background and there are multiple projects that 
highlight the importance of cleaning as technique to prevent security issues in 
ROP attacks

Regards

Victor Rodriguez



Segher



c++: overload dumper

2020-08-24 Thread Nathan Sidwell

I frequently need to look at overload sets, and debug_node spews more
information than is useful, most of the time.  Here's a dumper for
overloads, that just tells you their full name and where they came from.

gcc/cp
* ptree.c (debug_overload): New.

pushed
--
Nathan Sidwell
diff --git c/gcc/cp/ptree.c w/gcc/cp/ptree.c
index dfc244fdceb..11833e3b4da 100644
--- c/gcc/cp/ptree.c
+++ w/gcc/cp/ptree.c
@@ -321,3 +321,19 @@ debug_tree (cp_expr node)
 {
   debug_tree (node.get_value());
 }
+
+DEBUG_FUNCTION void
+debug_overload (tree node)
+{
+  FILE *file = stdout;
+
+  for (lkp_iterator iter (node); iter; ++iter)
+{
+  tree decl = *iter;
+  auto xloc = expand_location (DECL_SOURCE_LOCATION (decl));
+  auto fullname = decl_as_string (decl, 0);
+
+  fprintf (file, "%p: %s:%d:%d \"%s\"\n", (void *)decl,
+	   xloc.file, xloc.line, xloc.column, fullname);
+}
+}


[PATCH] x86: Use -march=x86-64/-march=i386 in

2020-08-24 Thread H.J. Lu via Gcc-patches
On Sun, Aug 23, 2020 at 9:03 AM Uros Bizjak  wrote:
>
> On Sun, Aug 23, 2020 at 5:23 PM H.J. Lu  wrote:
> >
> > On Sun, Aug 23, 2020 at 10:18:28AM +0200, Uros Bizjak wrote:
> > > On Sat, Aug 22, 2020 at 9:09 PM H.J. Lu  wrote:
> > >
> > > > > > Compile CPUID check with "-mno-sse -mfpmath=387" to disable SSE, 
> > > > > > AVX and
> > > > > > AVX512 during CPUID check to avoid vector and mask register 
> > > > > > operations.
> > > > >
> > > > > -mgeneral-regs-only ?
> > > > >
> > > >
> > > > Here is a patch to add target("general-regs-only") function
> > > > attribute and use it for CPUID check.   OK for master if there
> > > > are no regressions?
> > >
> > > Please test it first, then ask for an approval.
> > >
> > > Please submit the general-regs-only part as an independent patch. (I
> > > think this is the option linux should use for compilation).
> > >
> > > OTOH, wrapping CPUID check in a target attribute is a bad idea. We
> > > should disable spills to mask registers for generic targets by either
> > > raising costs of moves between general and mask registers and/or (as
> > > suggested earlier) introducing TARGET_SPILL_TO_MASK_REGS tuning and
> > > use it in secondary_memory_needed to prevent inter register unit
> > > spills.
> > >
> > > So, compiling with -mavx512bw would NOT enable spills by default,
> > > where compiling with -march=skylake-avx512 (or using equivalent
> > > -mtune) would. This is IMO the least surprising approach, and would
> > > avoid changing sources (as you now have to do for several testcases).
> >
> > We have 2 orthogonal issues here:
> >
> > 1. When mask register spill should be enabled.
> > 2. CPUID check should be done with general registers only.
> >
> > As shown in GCC testcases, CPUID check may be done with arbitrary ISAs
> > or -march/-mtune options enabled.  We should either
> >
> > 1. Enable only general registers for CPUID check.  Or
> > 2. Issue an error for CPUID check if non-general registers are used.
>
> We should follow the same approach as with SSE2, where DI/SImode
> spills to XMM registers were effectively disabled for a generic
> target. So, unless the tuning target is also specified, spills to mask
> registers should not be generated. It was my oversight to approve the
> patch that enables spills for a generic target, and without the tuning
> flag, the patch will be reverted.
>
> Now, we have -mgeneral-regs-only functionality in place, so if a
> package wants to enable spills, the correct -mtune (ro -march that
> implies -mtune) should be used, and it is expected that the detection
> code is amended with general-regs-only pragmas.
>
> 
> Speaking of pragmas, these should be added outside cpuid.h, like:
>
> #pragma GCC push_options
> #pragma GCC target("general-regs-only")
>
> #include 
>
> void cpuid_check ()
> ...
>
> #pragma GCC pop_options
>
> >footnote
>
> Nowadays, -march=native is mostly used outside generic target
> compilations, so for relevant avx512 targets, we still generate spills
> to mask regs. In future, we can review the setting of the tuning flag
> for a generic target in the same way as with SSE2 inter-reg moves.
>

Florian raised an issue that we need to limit  to the basic ISAs.
 should be handled similarly to other intrinsic header files.
That is  should use

#pragma GCC push_options
#ifdef __x86_64__
#pragma GCC target("arch=x86-64")
#else
#pragma GCC target("arch=i386")
...
#pragma GCC pop_options

Here is a patch.  OK for master?

Thanks.

-- 
H.J.


Re: [PATCH 4/6] Add `+' for Jobserver Integration

2020-08-24 Thread Richard Biener via Gcc-patches
On Fri, Aug 21, 2020 at 12:34 AM Joseph Myers  wrote:
>
> On Thu, 20 Aug 2020, Giuliano Belinassi via Gcc-patches wrote:
>
> >  libbacktrace/Makefile.in |   2 +-
> >  zlib/Makefile.in |  64 ++--
>
> These directories use makefiles generated by automake.  Rather than
> modifying the generated files, you need to modify the sources (whether
> that's Makefile.am, or code in automake itself - if in automake itself, we
> should wait for an actual new automake release before updating the version
> used in GCC).

I also wonder whether for actual production use in GCC we should concentrate
on the known bottle-necks and just amend the rule(s) in gcc/Makefile.in for now?

Richard.

> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH 1/6] Modify gcc driver for parallel compilation

2020-08-24 Thread Richard Biener via Gcc-patches
On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
 wrote:
>
> Update the driver for parallel compilation. This process work as
> follows:
>
> When calling gcc, the driver will check if the flag
> "-fparallel-jobs" was provided by the user. If yes, then we will
> check what is the desired output, and if it can be parallelized.
> There are the following cases, which is described:
>
> 1. -S or -E was provided: We can't run in parallel, as the output
>can not be easily merged together into one file.
>
> 2. -c was provided: When cc1* forks into multiple processes, it
>must tell the driver where it stored its generated assembler files.
>Therefore we pass a hidden "-fsplit-outputs=filename" to the compiler,
>and we check if "filename" was created by it. If yes, we open it,
>call assembler for each generated asm file
>(this file must not be empty), and link them together with
>partial linking to a single .o file. This process is done for each
>object file in the argument list.
>
> 3. -c was not provided, and the final product will be an binary: Here
>we proceed exactly as 2., but we avoid doing the partial
>linking, feeding the generated object files directly into the final link.
>
> For that to work, we had to heavily modify how the "execute" function
> works, extracting common code which is used multiple times, and
> also detecting when the command is a call to a compiler or an
> assembler, as can be seen in append_split_outputs.
>
> Finally, we added some tests which reflects all cases found when
> bootstrapping the compiler, so development of further features to the
> driver get faster for now on.

Few comments inline, Joseph may want to comment on the overall
structure as driver maintainer (CCed).

I know I asked for the changes on the branch to be squashed but
the diff below is quite unreadable with the ChangeLog not helping
the overall refactoring much.  Is it possible to do some of the
factoring/refactoring without any functionality change to make the
actual diff easier to follow?

Thanks,
Richard.

> gcc/ChangeLog
> 2020-08-20  Giuliano Belinassi  
>
> * common.opt (fsplit-outputs): New flag.
> (fparallel-jobs): New flag.
> * gcc.c (extra_arg_storer): New class.
> (have_S): New variable.
> (struct command): Move from execute.
> (is_compiler): New function.
> (is_assembler): New function.
> (get_number_of_args): New function.
> (get_file_by_lines): New function.
> (identify_asm_file): New function.
> (struct infile): New attribute temp_additional_asm.
> (current_infile): New variable.
> (get_path_to_ld): New function.
> (has_hidden_E): New function.
> (sort_asm_files): New function.
> (append_split_outputs): New function.
> (print_command): New function.
> (print_commands): New function.
> (print_argbuf): New function.
> (handle_verbose): Extracted from execute.
> (append_valgrind): Same as above.
> (async_launch_commands): Same as above.
> (await_commands_to_finish): Same as above.
> (split_commands): Same as above.
> (parse_argbuf): Same as above.
> (execute): Refator.
> (fsplit_arg): New function.
> (alloc_infile): Initialize infiles with 0.
> (process_command): Remember when -S was passed.
> (do_spec_on_infiles): Remember current infile being processed.
> (maybe_run_linker): Replace object files when -o is a executable.
> (finalize): Deinitialize temp_object_files.
>
> gcc/testsuite/ChangeLog:
> 20-08-2020  Giuliano Belinassi  
>
> * driver/driver.exp: New test.
> * driver/a.c: New file.
> * driver/b.c: New file.
> * driver/empty.c: New file.
> * driver/foo.c: New file.
> ---
>  gcc/common.opt  |4 +
>  gcc/gcc.c   | 1219 ---
>  gcc/testsuite/driver/a.c|6 +
>  gcc/testsuite/driver/b.c|6 +
>  gcc/testsuite/driver/driver.exp |   80 ++
>  gcc/testsuite/driver/empty.c|0
>  gcc/testsuite/driver/foo.c  |7 +
>  7 files changed, 1049 insertions(+), 273 deletions(-)
>  create mode 100644 gcc/testsuite/driver/a.c
>  create mode 100644 gcc/testsuite/driver/b.c
>  create mode 100644 gcc/testsuite/driver/driver.exp
>  create mode 100644 gcc/testsuite/driver/empty.c
>  create mode 100644 gcc/testsuite/driver/foo.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4b08e91859f..4aa3ad8c95b 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3465,4 +3465,8 @@ fipa-ra
>  Common Report Var(flag_ipa_ra) Optimization
>  Use caller save register across calls if possible.
>
> +fsplit-outputs=
> +Common Joined Var(split_outputs)
> +-fsplit-outputs=  Filename in which current Compilation Unit will 
> be split to.
> +
>  ; This comment is to ensure we retain the blank line abov

[PATCH] Fix ICE.

2020-08-24 Thread Hongtao Liu via Gcc-patches
Hi:
  This patch is to fix a typo in my last patch [1].
  [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551982.html

  Bootstrap is ok, gcc regression test hosted on CLX for i386/x86-64
backend is ok.
  Ok for trunk?

gcc/ChangeLog:
PR target/96755
* config/i386/sse.md: Refine typo in define_split for knot
instruction.

gcc/testsuite/ChangeLog:
* gcc.target/i386/pr96755.c: New test.

-- 
BR,
Hongtao


0001-Refine-typo-to-fix-ICE.patch
Description: Binary data


[PATCH 5/5] Add --gdwarf-5 to ASM_SPEC

2020-08-24 Thread Mark Wielaard
This is needed to get DWARF version 5 .debug_line tables.
It is also obviously wrong. It needs a check for whether as supports
--gdwarf- for all versions we support and it should only
be added when generating DWARF debug information for the specific
DWARF version we are generating.

It also needs some fixes to binutils, to make sure the line table is
generated correctly:
https://sourceware.org/pipermail/binutils/2020-August/112685.html
And to make sure it can read the generated line table itself:
https://sourceware.org/pipermail/binutils/2020-August/112966.html
---
 gcc/gcc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 10bc9881aed3..98b10e7cd154 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -1882,6 +1882,11 @@ init_spec (void)
   }
 #endif
 
+  static const char dv[] = "--gdwarf-5 ";
+  obstack_grow (&obstack, dv, sizeof (dv) - 1);
+  obstack_grow0 (&obstack, asm_spec, strlen (asm_spec));
+  asm_spec = XOBFINISH (&obstack, const char *);
+
 #if defined LINK_EH_SPEC || defined LINK_BUILDID_SPEC || \
 defined LINKER_HASH_STYLE
 # ifdef LINK_BUILDID_SPEC
-- 
2.18.4



[PATCH 4/5] Default to DWARF5

2020-08-24 Thread Mark Wielaard
---
 gcc/common.opt  | 2 +-
 gcc/doc/invoke.texi | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 513125f0c00b..05e4e0638ecb 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3144,7 +3144,7 @@ Common Driver JoinedOrMissing Negative(gdwarf-)
 Generate debug information in default version of DWARF format.
 
 gdwarf-
-Common Driver Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
+Common Driver Joined UInteger Var(dwarf_version) Init(5) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format.
 
 ggdb
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e5dddc236d7d..511fcd25 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9042,13 +9042,14 @@ possible.
 @opindex gdwarf
 Produce debugging information in DWARF format (if that is supported).
 The value of @var{version} may be either 2, 3, 4 or 5; the default version
-for most targets is 4.  DWARF Version 5 is only experimental.
+for most targets is 5 (with the exception of vxworks and darwin which
+default to version 2).
 
 Note that with DWARF Version 2, some ports require and always
 use some non-conflicting DWARF 3 extensions in the unwind tables.
 
 Version 4 may require GDB 7.0 and @option{-fvar-tracking-assignments}
-for maximum benefit.
+for maximum benefit. Version 5 requires GDB 8.0 or higher.
 
 GCC no longer supports DWARF Version 1, which is substantially
 different than Version 2 and later.  For historical reasons, some
-- 
2.18.4



[PATCH 2/5] Make sure that static data member constexpr isn't optimized away in test.

2020-08-24 Thread Mark Wielaard
In DWARF5 class variables (static data members) are represented with a
DW_TAG_variable instead of a DW_TAG_member. Make sure the variable isn't
optimized away in the constexpr-var-1.C testcase so we can still match (2)
const_expr in the the assembly output.

Note that the same issue causes some failures in the gdb testsuite
for static data members when we enable DWARF5 by default:
https://sourceware.org/bugzilla/show_bug.cgi?id=26525
---
 gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
index 19062e29fd59..c6ad3f645379 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/constexpr-var-1.C
@@ -7,3 +7,4 @@ struct S
 {
   static constexpr int b = 6;
 } s;
+const int &c = s.b;
-- 
2.18.4



[PATCH 3/5] Add DWARF5 variants of assembly scan tests that use DW_FORM_implicit_const

2020-08-24 Thread Mark Wielaard
Some DWARF tests scan the assembly output looking for constant values.
When using DWARF5 those constants might use DW_FORM_implicit_const,
which are output (in the comments) after the attribute instead of
before. To make sure these tests work introduce a -gdwarf-5 variant
of these tests and explicitly use -gdwarf-2 for the original.
---
 .../dwarf2/{inline-var-1.C => inline-var-1-dwarf5.C}  | 6 --
 gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C  | 2 +-
 .../gcc.dg/debug/dwarf2/{inline2.c => inline2-dwarf5.c}   | 7 ---
 gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c   | 4 +++-
 .../debug/dwarf2/{pr41445-5.c => pr41445-5-dwarf5.c}  | 8 
 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-5.c | 2 +-
 .../debug/dwarf2/{pr41445-6.c => pr41445-6-dwarf5.c}  | 8 
 gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-6.c | 2 +-
 8 files changed, 22 insertions(+), 17 deletions(-)
 copy gcc/testsuite/g++.dg/debug/dwarf2/{inline-var-1.C => 
inline-var-1-dwarf5.C} (76%)
 copy gcc/testsuite/gcc.dg/debug/dwarf2/{inline2.c => inline2-dwarf5.c} (88%)
 copy gcc/testsuite/gcc.dg/debug/dwarf2/{pr41445-5.c => pr41445-5-dwarf5.c} 
(73%)
 copy gcc/testsuite/gcc.dg/debug/dwarf2/{pr41445-6.c => pr41445-6-dwarf5.c} 
(68%)

diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1-dwarf5.C
similarity index 76%
copy from gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
copy to gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1-dwarf5.C
index 3b1c913edfca..52ed5b6912fd 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1-dwarf5.C
@@ -1,7 +1,9 @@
+// DWARF5 variant of inline-var-1.C
 // { dg-do compile { target c++17 } }
-// { dg-options "-O -g -dA -gno-strict-dwarf 
-fno-eliminate-unused-debug-symbols" }
+// { dg-options "-O -gdwarf-5 -dA -gno-strict-dwarf 
-fno-eliminate-unused-debug-symbols" }
 // { dg-require-weak "" }
-// { dg-final { scan-assembler-times "0x3\[^\n\r]* DW_AT_inline" 6 { xfail 
*-*-aix* } } }
+// { dg-final { scan-assembler-times " DW_AT_inline \\(0x3\\)" 2 { xfail 
*-*-aix* } } }
+// { dg-final { scan-assembler-times "0x3\[^\n\r]* DW_AT_inline" 4 { xfail 
*-*-aix* } } }
 // { dg-final { scan-assembler-times "0x1\[^\n\r]* DW_AT_inline" 2 { xfail 
*-*-aix* } } }
 // { dg-final { scan-assembler-times " DW_AT_declaration" 6 { xfail *-*-aix* } 
} }
 // { dg-final { scan-assembler-times " DW_AT_specification" 6 { xfail *-*-aix* 
} } }
diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C 
b/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
index 3b1c913edfca..9a88e28cbe0f 100644
--- a/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
+++ b/gcc/testsuite/g++.dg/debug/dwarf2/inline-var-1.C
@@ -1,5 +1,5 @@
 // { dg-do compile { target c++17 } }
-// { dg-options "-O -g -dA -gno-strict-dwarf 
-fno-eliminate-unused-debug-symbols" }
+// { dg-options "-O -gdwarf-2 -dA -gno-strict-dwarf 
-fno-eliminate-unused-debug-symbols" }
 // { dg-require-weak "" }
 // { dg-final { scan-assembler-times "0x3\[^\n\r]* DW_AT_inline" 6 { xfail 
*-*-aix* } } }
 // { dg-final { scan-assembler-times "0x1\[^\n\r]* DW_AT_inline" 2 { xfail 
*-*-aix* } } }
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/inline2-dwarf5.c
similarity index 88%
copy from gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
copy to gcc/testsuite/gcc.dg/debug/dwarf2/inline2-dwarf5.c
index 7e019a6c06a0..03013f11bca8 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/inline2-dwarf5.c
@@ -1,4 +1,4 @@
-/* Contributed by Dodji Seketeli 
+/* DWARF5 variant of inline2.
Origin: PR debug/37801
 
   Abstract instances (DW_TAG_subroutines having the DW_AT_inline attribute)
@@ -14,7 +14,8 @@
   properly nested DW_TAG_inlined_subroutine DIEs for third, second and first.
 */
 
-/* { dg-options "-O -g3 -gdwarf -dA -fgnu89-inline" } */
+/* Explicitly use dwarf-5 which uses DW_FORM_implicit_const.  */
+/* { dg-options "-O -g3 -gdwarf-5 -dA -fgnu89-inline" } */
 /* { dg-do compile } */
 
 /* There are 6 inlined subroutines:
@@ -32,7 +33,7 @@
 /* There are 3 DW_AT_inline attributes: one per abstract inline instance.
The value of the attribute must be 0x3, meaning the function was
actually inlined.  */
-/* { dg-final { scan-assembler-times  "(?:byte|data1)\[^\n\]*0x3\[^\n\]* 
DW_AT_inline" 3 } } */
+/* { dg-final { scan-assembler-times  " DW_AT_inline \\(0x3\\)" 3 } } */
 
 volatile int *a;
 
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c 
b/gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
index 7e019a6c06a0..9c36450ae2de 100644
--- a/gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
@@ -14,7 +14,9 @@
   properly nested DW_TAG_inlined_subroutine DIEs for third, second and first.
 */
 
-/* { dg-options "-O -g3 -gdwarf -dA -fgnu89-inline" } */
+/* Explicitly use dwarf-2 because dwa

[PATCH 1/5] Don't enable -gvariable-location-views by default for DWARF5.

2020-08-24 Thread Mark Wielaard
DWARF5 makes it possible to read loclists tables without consulting
the debuginfo tree by introducing a table header. Adding location views
breaks this (at least for binutils and elfutils). So don't enable
variable-location-views by default if DWARF5 or higher is selected.
---
 gcc/doc/invoke.texi | 6 +++---
 gcc/toplev.c| 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 70dc1ab73a12..e5dddc236d7d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9301,9 +9301,9 @@ can be consumed by debug information consumers that are 
not aware of
 these augmentations, but they won't derive any benefit from them either.
 
 This is enabled by default when outputting DWARF 2 debug information at
-the normal level, as long as there is assembler support,
-@option{-fvar-tracking-assignments} is enabled and
-@option{-gstrict-dwarf} is not.  When assembler support is not
+the normal level for DWARF versions lower than 5, as long as there
+is assembler support, @option{-fvar-tracking-assignments} is enabled
+and @option{-gstrict-dwarf} is not.  When assembler support is not
 available, this may still be enabled, but it will force GCC to output
 internal line number tables, and if
 @option{-ginternal-reset-location-views} is not enabled, that will most
diff --git a/gcc/toplev.c b/gcc/toplev.c
index 07457d08c3aa..34218c6b3349 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1672,6 +1672,7 @@ process_options (void)
   && (write_symbols == DWARF2_DEBUG
   || write_symbols == VMS_AND_DWARF2_DEBUG)
   && !dwarf_strict
+  && dwarf_version < 5
   && dwarf2out_as_loc_support
   && dwarf2out_as_locview_support);
 }
-- 
2.18.4



BoF DWARF5 patches

2020-08-24 Thread Mark Wielaard
Hi,

This afternoon there will be a BoF at the virtual Cauldron about
DWARF5 and beyond. https://linuxplumbersconf.org/event/7/contributions/746/

Here are some patches for GCC that I would like to discuss.
I'll reply to them after the BoF with any comments people made.

Cheers,

Mark




Re: [PATCH 0/6] Parallelize Intra-Procedural Optimizations using the LTO Engine.

2020-08-24 Thread Richard Biener via Gcc-patches
On Fri, Aug 21, 2020 at 12:00 AM Giuliano Belinassi
 wrote:
>
> This patch series add a new flag "-fparallel-jobs=" to control if the
> compiler should try to compile the current file in parallel.
>
> There are three modes which is supported by now:
>
> 1. -fparallel-jobs=: Try to compile the file using a maximum of N
> jobs.
>
> 2. -fparallel-jobs=jobserver: Check if there is a running GNU Make
> Jobserver. If positive, communicate with it in order to launch jobs,
> but alert the user if the jobserver was not found, since it requires
> modifications in the project Makefile.
>
> 3. -fparallel-jobs=auto: Same as 2., but quietly fall back to a maximum
> of 2 jobs if the jobserver was not found.
>
> The parallelization works by using a modified LTO engine, as no IR is
> dumped into the disk, and a new partitioner is employed to find
> symbols which must be partitioned together.
>
> In order to implement the parallelism feature, we:
>
> 1. The driver will pass a hidden -fsplit-outputs= to cc1*.
>
> 2. After IPA, cc1* will search for symbols in which must be partitioned
> together.  If the user allows GCC to automatically promote symbols to
> globals through "--param=promote-statics=1" for a better parallel
> compilation performance, it will also be done.  However, if it decides
> that partitioning is a bad idea, it will continue with a default serial
> compilation, and the additional  will not be created.  It will
> avoid compiling in parallel if and only if:
>
>   * File size exceeds the minimum file size specified by LTO default
>   --param=lto-min-partition.

less than the minimum size I suppose.

>   * The partitioner is unable to find any point of partitioning in the
>   file.

It might make sense to increase the minimum partition size and also
check the partitioning result against unreasonable bias (one very
large and one very small partition).

> 3. cc1* will fork itself; one fork for each partition. Each child
> process will apply its partition mask generated by the partitioner
> and write a new assembler name file to  pointed by the driver.

For the first partition there's no fork (but the main process is used) and
the main output file will be used, correct?

> 4. The driver will open each file and partially link them together into
> a single .o file, if -c was requested, else into a binary.  -S and -E
> is unsupported for now and probably will remain so.

That also applies to -save-temps mode I assume which makes
debugging issues a bit tricky and involves manual invocation
of the cc1 command to have the file with the output filenames preserved.

>
> Speedups ranged from 0.95x to 1.9x on a Quad-Core Intel Core-i7 8565U
> when testing with two files in GCC, as stated in the following table.
> The test was the result of a single execution with a previous warm up
> execution. The compiled GCC had checking enabled, and therefore release
> version might have better timings in both sequential and parallel, but the
> speedup may remain the same.
>
> ||| Without Static | With Static |   Max   |
> | File   | Sequential |Promotion   |  Promotion  | Speedup |
> ||||---|
> | gimple-match.c | 60s|   63s  | 34s |   1.7x  |
> | insn-emit.c| 37s|   19s  | 20s |   1.9x  |
>
> Notice that we have a slowdown in some cases when it is enabled, that
> is why the parallelism feature is enabled with a flag for now.

One reason why promote-statics is not enabled by default is that
it creates new hidden symbols (LTO does so as well) which might
be undesirable.  If deemed OK in general we could enable it by
default.  Note that originally I wanted to have -fparallel-jobs=auto
be enabled by default which should not end up with visible changes
like this(?)

> Bootstrapped and Regtested on Linux x86_64.
>
> Giuliano Belinassi (6):
>   Modify gcc driver for parallel compilation
>   Implement a new partitioner for parallel compilation
>   Implement fork-based parallelism engine
>   Add `+' for Jobserver Integration
>   Add invoke documentation
>   New tests for parallel compilation feature
>
>  gcc/Makefile.in   |6 +-
>  gcc/cgraph.c  |   16 +
>  gcc/cgraph.h  |   13 +
>  gcc/cgraphunit.c  |  198 ++-
>  gcc/common.opt|4 +
>  gcc/doc/invoke.texi   |   32 +-
>  gcc/gcc.c | 1219 +
>  gcc/ipa-fnsummary.c   |2 +-
>  gcc/ipa-icf.c |3 +-
>  gcc/ipa-visibility.c  |3 +-
>  gcc/ipa.c |4 +-
>  gcc/jobserver.cc  |  168 +++
>  gcc/jobserver.h   |   33 +
>  gcc/lto-cgraph.c 

Re: [PATCH] libstdc++: mark variables as possibly unused

2020-08-24 Thread Jonathan Wakely via Gcc-patches

On 24/08/20 13:26 +0200, Krystian Kuźniarek via Libstdc++ wrote:

Hi,

First of all, sorry, I must have sent it as quoted-printable so spaces and
tabs are preserved.

A description of the problem/bug and how your patch addresses it:
I've got a small patch for -Wunused-variable and -Wunused-parameter in
system headers. These are needed either for:
1) __glibcxx_assert() that quickly goes away during an early compilation
step leaving the warning
2) code that conditionally is removed by preprocessor leaving the warning

Testcases:
N/A, it's only a warning.

ChangeLog:
Sorry, contrib/mklog.py didn't quite work for me.
For some reason after instruction in line 129: "diff = PatchSet(data)" my
"diff" variable is always empty.

Bootstrapping and testing:
Tested that manually by recompling GCC, unfolding all headers with
`#include ` and compiling what's been included by it.

The patch itself:

diff --git a/libstdc++-v3/include/bits/atomic_base.h
b/libstdc++-v3/include/bits/atomic_base.h
index 015acef83c4..1fbb2d78d28 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -231,7 +231,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_GLIBCXX_ALWAYS_INLINE void
clear(memory_order __m = memory_order_seq_cst) noexcept
{
-  memory_order __b = __m & __memory_order_mask;
+  memory_order __b __attribute__ ((__unused__)) =


Thanks for the patch.

Please put the line break before the '=' character i.e.

  memory_order __b __attribute__ ((__unused__))
= __m & __memory_order_mask;

Other than that, the patch looks ok, and simple enough to not need a
copyright assignment for GCC.

A changelog entry and confirmation the testsuite still runs would be
preferred though.



Re: [PATCH] libstdc++: remove an ignored qualifier on function return type

2020-08-24 Thread Jonathan Wakely via Gcc-patches

On 24/08/20 13:26 +0200, Krystian Kuźniarek via Libstdc++ wrote:

Hi,

First of all, sorry, I must have sent it as quoted-printable so spaces and
tabs are preserved.

A description of the problem/bug and how your patch addresses it:
I've got a small patch for -Wignored-qualifiers in system headers.

Testcases:
N/A, it's only a warning.

ChangeLog:
Sorry, contrib/mklog.py didn't quite work for me.
For some reason after instruction in line 129: "diff = PatchSet(data)" my
"diff" variable is always empty.


So then you need to produce a changelog entry by hand.


Bootstrapping and testing:
Tested that manually by recompling GCC, unfolding all headers with
`#include ` and compiling what's been included by it.


That doesn't test this header at all.

What about the libstdc++ testsuite?


The patch itself:

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index eb3d6779205..e8fcb57a401 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -808,7 +808,7 @@ namespace __variant
{ using element_type = _Tp; };

  template 
-   struct __untag_result
+   struct __untag_result


I don't remember exactly why I put it there, but I seem to recall it
was necessary.



: false_type
{ using element_type = void(*)(_Args...); };



Best regards,
Krystian





  1   2   >