Re: [PATCH] RISC-V: Canonicalize --with-arch

2020-12-04 Thread Kito Cheng via Gcc-patches
Committed, thanks :)

On Thu, Dec 3, 2020 at 8:51 AM Jim Wilson  wrote:
>
> On Tue, Dec 1, 2020 at 12:13 AM Kito Cheng  wrote:
>>
>>  - We would like to canonicalize the arch string for --with-arch for
>>easier multilib handling, so split the canonicalization part into a
>>standalone script to share the logic.
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/multilib-generator (arch_canonicalize): Move
>> code to arch-canonicalize, and call that script to canonicalize arch
>> string.
>> (canonical_order): Move code to arch-canonicalize.
>> (LONG_EXT_PREFIXES): Ditto.
>> (IMPLIED_EXT): Ditto.
>> * config/riscv/arch-canonicalize: New.
>> * config.gcc (riscv*-*-*): Canonicalize --with-arch.
>
>
> Looks OK to me.
>
> Jim
>
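To make the idea of canonicalizing --with-arch concrete, here is a minimal C++ sketch. It is not the arch-canonicalize script from the patch: the function name `canonicalize_arch`, the extension order string and the implication table are assumptions for illustration. Multi-letter extensions (z*, s*, h*, x*) are ignored, and the Zicsr/Zifencei parts of G are omitted.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical sketch of --with-arch canonicalization (not the actual
// arch-canonicalize script).  Only single-letter extensions are handled.
std::string
canonicalize_arch (const std::string &arch)
{
  // Assumed canonical order of single-letter extensions after the base ISA.
  static const std::string order = "mafdqlcbjtpvn";
  // Assumed implications: D requires F, Q requires D.
  static const std::map<char, std::string> implied = { { 'd', "f" },
                                                       { 'q', "d" } };

  std::string xlen = arch.substr (0, 4);  // "rv32" or "rv64"
  char base = arch.at (4);                // 'i', 'e' or 'g'
  std::set<char> exts (arch.begin () + 5, arch.end ());

  // G is shorthand for IMAFD (Zicsr/Zifencei are ignored here).
  if (base == 'g')
    {
      base = 'i';
      exts.insert ({ 'm', 'a', 'f', 'd' });
    }

  // Add implied extensions until a fixed point is reached.
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (char e : std::set<char> (exts))
        {
          auto it = implied.find (e);
          if (it == implied.end ())
            continue;
          for (char dep : it->second)
            changed |= exts.insert (dep).second;
        }
    }

  // Emit extensions in canonical order; letters not in 'order' are dropped.
  std::string result = xlen + base;
  for (char c : order)
    if (exts.count (c))
      result += c;
  return result;
}
```

Under these assumptions `rv64gc` becomes `rv64imafdc` and `rv32idc` becomes `rv32ifdc`, so different spellings of the same ISA map to one multilib name. A real implementation would diagnose unknown letters rather than silently dropping them.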


Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-12-04 Thread Martin Liška

On 12/3/20 2:12 PM, Richard Biener wrote:

Can we somehow preserve this by making the helper produce separate
strings for the 'GNU C17 11.0 ...' part and the options passed part?  So the
-fverbose-asm and -Q consumer can continue to nicely print the option part?


Yep, good idea. I've just done that and I have:

1) -Q output now:

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
options passed: -dumpbase-ext .c -mtune=generic -march=x86-64 -g -O2

and

GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
options passed: ./xgcc -B. -Q -v /home/marxin/Programming/testcases/a.c -c -O2 -fverbose-asm -frecord-gcc-switches-format=driver -g -S

2) For -fverbose-asm:

# GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
# options passed: -dumpbase-ext .c -mtune=generic -march=x86-64 -g -O2

and

# GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
# options passed: -dumpbase-ext .c -mtune=generic -march=x86-64 -g -O2 -frecord-gcc-switches-file=/tmp/ccm3kL7d.cmdline

3) DWARF producer:

   DW_AT_producer: (indirect string, offset: 0x97): GNU C17 11.0.0 20201204 (experimental) -dumpbase-ext .c -mtune=generic -march=x86-64 -g -O2

and

   DW_AT_producer: (indirect string, offset: 0x27): GNU C17 11.0.0 20201204 (experimental) ./xgcc -B. -Q -v /home/marxin/Programming/testcases/a.c -c -O2 -fverbose-asm -frecord-gcc-switches-format=driver -g
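Richard's suggestion amounts to building the options part on its own and prepending the compiler identification only where the full producer string is needed. A minimal C++ sketch of that split follows; `decoded_option`, `command_line_string` and `producer_string` are hypothetical names, not the functions in the patch (whose `gen_command_line_string` returns a heap-allocated `char *`).

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for cl_decoded_option; only the spelling of each
// switch as given on the command line is kept.
struct decoded_option
{
  std::string orig_text;
};

// Options part only: what the "options passed:" lines of -Q and
// -fverbose-asm print.
std::string
command_line_string (const std::vector<decoded_option> &options)
{
  std::string out;
  for (const decoded_option &opt : options)
    {
      if (!out.empty ())
        out += ' ';
      out += opt.orig_text;
    }
  return out;
}

// Full producer: compiler identification followed by the options, as
// stored in DW_AT_producer.
std::string
producer_string (const std::string &ident,
                 const std::vector<decoded_option> &options)
{
  std::string switches = command_line_string (options);
  return switches.empty () ? ident : ident + " " + switches;
}
```

Keeping the two pieces separate lets each consumer choose its own prefix ("options passed:", "# options passed:" or the "GNU C17 ..." identification) without re-parsing a combined string.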

Thoughts?
Thanks,
Martin
From 23de1dbbd81f662e9eb4a67385b0a2d0773569be Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 23 Nov 2020 13:40:04 +0100
Subject: [PATCH 1/2] Refactor -frecord-gcc-switches.

gcc/ChangeLog:

	* doc/tm.texi: Change argument of the record_gcc_switches
	hook and remove SWITCH_TYPE_* enum values.
	* dwarf2out.c (gen_producer_string): Move to opts.c and remove
	handling of the dwarf_record_gcc_switches option.
	(dwarf2out_early_finish): Use moved gen_producer_string
	function.
	* opts.c (gen_producer_string): New.
	* opts.h (gen_producer_string): New.
	* target.def: Change type of record_gcc_switches.
	* target.h (enum print_switch_type): Remove.
	(elf_record_gcc_switches): Change first argument.
	* toplev.c (MAX_LINE): Remove.
	(print_to_asm_out_file):  Likewise.
	(print_to_stderr): Likewise.
	(print_single_switch): Likewise.
	(print_switch_values): Likewise.
	(init_asm_output): Use new gen_producer_string function.
	(process_options): Likewise.
	* varasm.c (elf_record_gcc_switches): Just save the string argument
	to the ELF container.
---
 gcc/doc/tm.texi |  38 +--
 gcc/dwarf2out.c | 118 +++-
 gcc/opts.c      | 118
 gcc/opts.h      |   6 ++
 gcc/target.def  |  38 +--
 gcc/target.h    |  14 +---
 gcc/toplev.c    | 176 +---
 gcc/varasm.c    |  48 +++--
 8 files changed, 165 insertions(+), 391 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f5077655716..d9b855c13ac 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -8090,43 +8090,9 @@ need to override this if your target has special flags that might be
 set via @code{__attribute__}.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_ASM_RECORD_GCC_SWITCHES (print_switch_type @var{type}, const char *@var{text})
+@deftypefn {Target Hook} void TARGET_ASM_RECORD_GCC_SWITCHES (const char *@var{})
 Provides the target with the ability to record the gcc command line
-switches that have been passed to the compiler, and options that are
-enabled.  The @var{type} argument specifies what is being recorded.
-It can take the following values:
-
-@table @gcctabopt
-@item SWITCH_TYPE_PASSED
-@var{text} is a command line switch that has been set by the user.
-
-@item SWITCH_TYPE_ENABLED
-@var{text} is an option which has been enabled.  This might be as a
-direct result of a command line switch, or because it is enabled by
-default or because it has been enabled as a side effect of a different
-command line switch.  For example, the @option{-O2} switch enables
-various different individual optimization passes.
-
-@item SWITCH_TYPE_DESCRIPTIVE
-@var{text} is either NULL or some descriptive text which should be
-ignored.  If @var{text} is NULL then it is being used to warn the
-target hook that either recording is starting or ending.  The first
-time @var{type} is SWITCH_TYPE_DESCRIPTIVE and @var{text} is NULL, the
-warning is for start up and the second time the warning is for
-wind down.  This feature is to allow the target hook to make any
-necessary preparations before it starts to record switches and to
-perform any necessary tidying up after it has finished recording
-switches.
-
-@item SWITCH_TYPE_LINE_START
-This option can be ignored by this target hook.
-
-@item  SWITCH_TYPE_LINE_END
-This option can be ignored by this target hook.
-@end table
-
-The hook's return value must be zero.  Other return values may be
-supported in the fut

Re: [patch] Fix checking failure in IPA-SRA

2020-12-04 Thread Richard Biener via Gcc-patches
On Thu, Dec 3, 2020 at 8:13 PM Eric Botcazou  wrote:
>
> Hi,
>
> this is a regression present on the mainline and 10 branch: on the one hand,
> IPA-SRA does *not* disqualify accesses with zero size but, on the other hand,
> it checks that accesses present in the tree have a (strictly) positive size,
> thus trivially yielding an ICE, for example on the attached Ada testcase.
>
> The attached fix relaxes the check, OK for mainline and 10 branch?

OK.

Thanks,
Richard.

>
> 2020-12-03  Eric Botcazou  
>
> * ipa-sra.c (verify_access_tree_1): Relax assertion on the size.
>
>
> 2020-12-03  Eric Botcazou  
>
> * gnat.dg/opt91.ad[sb]: New test.
> * gnat.dg/opt91_pkg.ad[sb]: New helper.
>
> --
> Eric Botcazou
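Eric's description implies an invariant that a verifier can check. The C++ sketch below models it with a hypothetical `sra_access` struct and `verify_access_tree` function; it is not the ipa-sra.c code, whose exact assertion is not shown here. Sizes only need to be non-negative, so a zero-sized access that was never disqualified no longer trips the check.

```cpp
#include <vector>

// Hypothetical, simplified model of an IPA-SRA access tree node.
struct sra_access
{
  long offset;                       // in bits
  long size;                         // in bits; zero-sized accesses may occur
  std::vector<sra_access> children;  // accesses nested inside this one
};

// Returns true if every access has a non-negative size and every child lies
// within its parent.  A strict "size > 0" test would reject the zero-sized
// accesses that are not disqualified earlier.
bool
verify_access_tree (const sra_access &a)
{
  if (a.size < 0)  // relaxed: zero is accepted
    return false;
  for (const sra_access &c : a.children)
    {
      if (c.offset < a.offset || c.offset + c.size > a.offset + a.size)
        return false;
      if (!verify_access_tree (c))
        return false;
    }
  return true;
}
```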


[PATCH][pushed] Document missing params.

2020-12-04 Thread Martin Liška

Pushed to master.

Martin

contrib/ChangeLog:

* check-params-in-docs.py: use flake8 and add some
tweaks to ignore aarch64 params.

gcc/ChangeLog:

* doc/invoke.texi: Add missing params.
---
 contrib/check-params-in-docs.py | 12 +-
 gcc/doc/invoke.texi             | 40 +++--
 2 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
index dfbfa3d0067..440549f5fd8 100755
--- a/contrib/check-params-in-docs.py
+++ b/contrib/check-params-in-docs.py
@@ -23,6 +23,7 @@
 #
 
 import argparse
+import sys
 from itertools import dropwhile, takewhile
 
 
@@ -42,7 +43,7 @@ parser.add_argument('params_output')
 
 args = parser.parse_args()
 
-ignored = set(['logical-op-non-short-circuit'])
+ignored = {'logical-op-non-short-circuit'}
 params = {}
 
 for line in open(args.params_output).readlines():
@@ -58,15 +59,21 @@ texi = list(texi)[1:]
 
 token = '@item '
 texi = [x[len(token):] for x in texi if x.startswith(token)]
+# skip digits
+texi = [x for x in texi if not x[0].isdigit()]
+# skip aarch64 params
+texi = [x for x in texi if not x.startswith('aarch64')]
 sorted_texi = sorted(texi)
 
 texi_set = set(texi) - ignored
 params_set = set(params.keys()) - ignored
 
+success = True
 extra = texi_set - params_set
 if len(extra):
     print('Extra:')
     print(extra)
+    success = False
 
 missing = params_set - texi_set
 if len(missing):
@@ -75,6 +82,9 @@ if len(missing):
         print('@item ' + m)
         print(params[m])
         print()
+    success = False
 
 if texi != sorted_texi:
     print('WARNING: not sorted alphabetically!')
+
+sys.exit(0 if success else 1)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 615eae9a1c5..38c4d6a865a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13041,6 +13041,9 @@ growth limit is needed to avoid exponential explosion of code size.  Thus for
 smaller units, the size is increased to @option{--param large-unit-insns}
 before applying @option{--param inline-unit-growth}.
 
+@item lazy-modules
+Maximum number of concurrently open C++ module files when lazy loading.
+
 @item inline-unit-growth
 Specifies maximal overall growth of the compilation unit caused by inlining.
 For example, parameter value 20 limits unit growth to 1.2 times the original
@@ -13052,6 +13055,9 @@ Specifies maximal overall growth of the compilation unit caused by
 interprocedural constant propagation.  For example, parameter value 10 limits
 unit growth to 1.1 times the original size.
 
+@item ipa-cp-large-unit-insns
+The size of translation unit that IPA-CP pass considers large.
+
 @item large-stack-frame
 The limit specifying large stack frames.  While inlining the algorithm is trying
 to not grow past this limit too much.
@@ -13106,19 +13112,19 @@ Deeper chains are still handled by late inlining.
 Probability (in percent) that C++ inline function with comdat visibility
 are shared across multiple compilation units.
 
-@item ipa-modref-max-bases
-@item ipa-modref-max-refs
-@item ipa-modref-max-accesses
+@item modref-max-bases
+@item modref-max-refs
+@item modref-max-accesses
 Specifies the maximal number of base pointers, referneces and accesses stored
 for a single function by mod/ref analysis.
 
-@item ipa-modref-max-tests
+@item modref-max-tests
 Specifies the maxmal number of tests alias oracle can perform to disambiguate
 memory locations using the mod/ref information.  This parameter ought to be
-bigger than @option{--param ipa-modref-max-bases} and @option{--param
-ipa-modref-max-refs}.
+bigger than @option{--param modref-max-bases} and @option{--param
+modref-max-refs}.
 
-@item ipa-modref-max-depth
+@item modref-max-depth
 Specifies the maximum depth of DFS walk used by modref escape analysis.
 Setting to 0 disables the analysis completely.
 
@@ -13966,6 +13972,12 @@ If the size of a local variable in bytes is smaller or equal to this
 number, directly poison (or unpoison) shadow memory instead of using
 run-time callbacks.
 
+@item tsan-distinguish-volatile
+Emit special instrumentation for accesses to volatiles.
+
+@item tsan-instrument-func-entry-exit
+Emit instrumentation calls to __tsan_func_entry() and __tsan_func_exit().
+
 @item max-fsm-thread-path-insns
 Maximum number of instructions to copy when duplicating blocks on a
 finite state automaton jump thread path.
@@ -14005,6 +14017,9 @@ we may be able to devirtualize speculatively.
 The maximum number of assertions to add along the default edge of a switch
 statement during VRP.
 
+@item evrp-mode
+Specifies the mode Early VRP should operate in.
+
 @item unroll-jam-min-percent
 The minimum percentage of memory references that must be optimized
 away for the unroll-and-jam transformation to be considered profitable.
@@ -14169,15 +14184,26 @@ Maximum number of VALUEs handled during a single find_base_term call.
 The maximum number of exploded nodes per program poin

Re: [PATCH RFA] vec: Simplify use with C++11 range-based 'for'.

2020-12-04 Thread Richard Biener via Gcc-patches
On Thu, Dec 3, 2020 at 10:46 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 12/3/20 10:53 AM, Jason Merrill via Gcc-patches wrote:
> > It looks cleaner if we can use a vec* directly as a range for the C++11
> > range-based 'for' loop, without needing to indirect from it, and also works
> > with null pointers.
> >
> > The change in cp_parser_late_parsing_default_args is an example of how this
> > can be used to simplify many loops over vec*.
> >
> > I deliberately didn't format the new overloads for etags since they are
> > trivial, but am open to changing that.
> >
> > Tested x86_64-pc-linux-gnu.  Is this OK for trunk now, or should I hold it for
> > stage 1?
> >
> > gcc/ChangeLog:
> >
> >   * vec.h (begin, end): Add overloads for vec*.
> >   * tree.c (build_constructor_from_vec): Remove *.
> >
> > gcc/cp/ChangeLog:
> >
> >   * decl2.c (clear_consteval_vfns): Remove *.
> >   * pt.c (do_auto_deduction): Remove *.
> >   * parser.c (cp_parser_late_parsing_default_args): Change loop
> >   to use range 'for'.
> I'd go forward with it now, it's simple enough and simplifies the code
> we end up writing...

Btw, I was disappointed to find that range-for cannot express
iterating from element 2 or reverse iteration.  This means that when we
try to adopt range-for we'll keep a messy mix of iteration styles, since
range-for cannot express all (or even most) iterations in our code base.

:/

Richard.

>
> jeff
>
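A minimal sketch of why free `begin`/`end` overloads taking a pointer let `for (x : v)` work on a `vec<T> *`, including a null one. `tiny_vec` and `sum` are hypothetical stand-ins; GCC's real `vec` template and the exact overloads added to vec.h may differ. Because the range expression has pointer type, range-for finds `begin`/`end` by argument-dependent lookup, which considers the namespace of the pointed-to class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GCC's vec<T>; not the real vec.h.
template <typename T>
struct tiny_vec
{
  std::vector<T> elts;
  T *address () { return elts.data (); }
  std::size_t length () const { return elts.size (); }
};

// Free begin/end taking a *pointer* to the container.  A null pointer
// yields an empty range, so the loop body simply never runs.
template <typename T>
T *begin (tiny_vec<T> *v) { return v ? v->address () : nullptr; }

template <typename T>
T *end (tiny_vec<T> *v) { return v ? v->address () + v->length () : nullptr; }

int
sum (tiny_vec<int> *v)
{
  int total = 0;
  for (int x : v)  // no "*v" needed, and v may be null
    total += x;
  return total;
}
```

Without the pointer overloads the loop would have to be written as `for (int x : *v)`, which dereferences a null `v`.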


Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Richard Biener via Gcc-patches
On Thu, Dec 3, 2020 at 11:13 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 12/3/20 8:29 AM, Kumar, Venkataramanan via Gcc-patches wrote:
> > [AMD Public Use]
> >
> >
> > Hi Maintainers,
> >
> > PFA, the patch that enables support for the next generation AMD Zen3 CPU via -march=znver3.
> > This is a very basic enablement patch. As of now the cost, tuning and scheduler changes are kept the same as znver2.
> > Further changes to the cost and tunings will be done later.
> >
> > Ok for trunk ?
> >
> > Regards,
> > Venkat.
> >
> >
> > X86_64-Enable-support-for-next-generation-AMD-Znver3.patch
> >
> > From ef7bd7d02e98d86ff32fa0dad6bc1d0802bd32aa Mon Sep 17 00:00:00 2001
> > From: Venkataramanan Kumar 
> > Date: Thu, 3 Dec 2020 17:32:53 +0530
> > Subject: [PATCH] X86_64: Enable support for next generation AMD Zen3 CPU.
> >
> > 2020-12-03  Venkataramanan Kumar  
> >   Sharavan Kumar  
> >
> > gcc/ChangeLog:
> >
> >   * common/config/i386/cpuinfo.h (get_amd_cpu): Recognize znver3.
> >   * common/config/i386/i386-common.c (processor_names): Add
> >   znver3.
> >   (processor_alias_table): Add znver3 and AMDFAM19H entry.
> >   * common/config/i386/i386-cpuinfo.h (processor_types): Add
> >   AMDFAM19H.
> >   (processor_subtypes): Add AMDFAM19H_ZNVER3.
> >   * config.gcc (i[34567]86-*-linux* | ...): Likewise.
> >   * config/i386/driver-i386.c (host_detect_local_cpu): Let
> >   -march=native recognize znver3 processors.
> >   * config/i386/i386-c.c (ix86_target_macros_internal): Add
> >   znver3.
> >   * config/i386/i386-options.c (m_znver3): New definition.
> >   (m_ZNVER): Include m_znver3.
> >   (processor_cost_table): Add znver3.
> >   * config/i386/i386.c (ix86_reassociation_width): Likewise.
> >   * config/i386/i386.h (TARGET_znver3): New definition.
> >   (enum processor_type): Add PROCESSOR_ZNVER3.
> >   * config/i386/i386.md (define_attr "cpu"): Add znver3.
> >   * config/i386/x86-tune-sched.c (ix86_issue_rate): Likewise.
> >   (ix86_adjust_cost): Likewise.
> >   * config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS):
> >   Likewise.
> >   * config/i386/znver1.md: Add new reservations for znver3.
> >   * doc/extend.texi: Add details about znver3.
> >   * doc/invoke.texi: Likewise.
> Normally I would consider this inappropriate for stage3, but AFAICT the
> risk profile of this patch should be small.  Ultimately it's up to Uros
> and I'll support whatever decision he makes.

Per our rules, changes to targets are allowed at any point at the discretion
of the target maintainers.  Heck, we even accept _new_ targets during stage3/4!

So it's clearly appropriate at this stage but of course target maintainers
need to ack changes in their area.

Richard.

>
> Jeff
>


Re: How to traverse all the local variables that declared in the current routine?

2020-12-04 Thread Richard Biener via Gcc-patches
On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
> >> Another issue is, in order to check whether an auto-variable has an
> >> initializer, I plan to add a new bit in “decl_common” as:
> >>   /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
> >>   unsigned decl_is_initialized :1;
> >>
> >> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
> >> #define DECL_IS_INITIALIZED(NODE) \
> >>   (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
> >>
> >> Set this bit when setting DECL_INITIAL for the variables in the FE, then keep
> >> it even though DECL_INITIAL might be NULLed.
> >
> > For locals it would be more reliable to set this flag during gimplification.
> >
> >> Do you have any comment and suggestions?
> >
> > As said above - do you want to cover registers as well as locals?  I'd do
> > the actual zeroing during RTL expansion instead since otherwise you
> > have to figure out yourself whether a local is actually used (see
> > expand_stack_vars)
> >
> > Note that optimization will already have made use of the "uninitialized" state
> > of locals, so depending on what the actual goal is here, "late" may be too late.
>
> Haven't thought about this much, so it might be a daft idea, but would a
> compromise be to use a const internal function:
>
>   X1 = .DEFERRED_INIT (X0, INIT)
>
> where the X0 argument is an uninitialised value and the INIT argument
> describes the initialisation pattern?  So for a decl we'd have:
>
>   X = .DEFERRED_INIT (X, INIT)
>
> and for an SSA name we'd have:
>
>   X_2 = .DEFERRED_INIT (X_1(D), INIT)
>
> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>
> * Having the X0 argument would keep the uninitialised use of the
>   variable around for the later warning passes.
>
> * Using a const function should still allow the UB to be deleted as dead
>   if X1 isn't needed.
>
> * Having a function in the way should stop passes from taking advantage
>   of direct uninitialised uses for optimisation.
>
> This means we won't be able to optimise based on the actual init
> value at the gimple level, but that seems like a fair trade-off.
> AIUI this is really a security feature or anti-UB hardening feature
> (in the sense that users are more likely to see predictable behaviour
> “in the field” even if the program has UB).

The question is whether it's in line with people's expectations that
explicitly zero-initialized code behaves differently from
implicitly zero-initialized code with respect to optimization
and secondary side-effects (late diagnostics, latent bugs, etc.).

Introducing a new concept like .DEFERRED_INIT is much more
heavy-weight than an explicit zero initializer.

As for optimization, I fear you'll get a load of redundant zero-init
actually emitted if you can only rely on RTL DSE/DCE to remove it.

Btw, I don't think there's any reason to cling to clang's semantics
for a particular switch.  We'll never be able to emulate 1:1 behavior
and our -Wuninit behavior is probably vastly different already.

Richard.

> Thanks,
> Richard


Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Uros Bizjak via Gcc-patches
On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
 wrote:
>
> [AMD Public Use]
>
>
>
>
> Hi Maintainers,
>
>
>
> PFA, the patch that enables support for the next generation AMD Zen3 CPU via -march=znver3.
>
> This is a very basic enablement patch. As of now the cost, tuning and scheduler changes are kept the same as znver2.
>
> Further changes to the cost and tunings will be done later.
>
>
>
> Ok for trunk ?

Please also add a new target to multiversioning and corresponding
testcases. For an example of how this is done nowadays, please see a
submission for a different target at [1].

BTW: It looks like the multiversioning testcases lack AMD targets. Can you
please add a testcase similar to testsuite/g++.target/i386/mv16.C and
also add AMD targets to testsuite/gcc.target/i386/funcspec-56.inc.
(this can be done in a follow-up patch).

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html

Uros.
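Uros asks for znver3 to be wired into function multiversioning. As a hedged sketch of what such a testcase exercises (in the spirit of g++.target/i386/mv16.C, not its actual contents), C++ overloads carrying target attributes let the compiler emit a resolver that selects one body at run time. It assumes an x86 GNU/Linux host with ifunc support and a compiler that already accepts `arch=znver3`; `impl_id` is a hypothetical name.

```cpp
// Function multiversioning: the compiler emits a resolver that picks one
// body based on the running CPU.  Non-x86 hosts get a single plain version.
#if defined(__x86_64__) || defined(__i386__)
__attribute__ ((target ("default")))
int impl_id () { return 0; }

__attribute__ ((target ("arch=znver2")))
int impl_id () { return 2; }

__attribute__ ((target ("arch=znver3")))
int impl_id () { return 3; }
#else
int impl_id () { return 0; }
#endif
```

On a Zen3 machine the call is expected to reach the `arch=znver3` body; elsewhere the default (or znver2) body runs, which is why only the set of possible results is checked.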


Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-12-04 Thread Richard Biener via Gcc-patches
On Fri, Dec 4, 2020 at 9:08 AM Martin Liška  wrote:
>
> On 12/3/20 2:12 PM, Richard Biener wrote:
> > Can we somehow preserve this by making the helper produce separate
> > strings for the 'GNU C17 11.0 ...' part and the options passed part?  So the
> > -fverbose-asm and -Q consumer can continue to nicely print the option part?
>
> Yep, good idea. I've just done that and I have:
>
> 1) -Q output now:
>
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> options passed: -dumpbase-ext .c -mtune=generic -march=x86-64 -g -O2
>
> and
>
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> options passed: ./xgcc -B. -Q -v /home/marxin/Programming/testcases/a.c -c -O2 -fverbose-asm -frecord-gcc-switches-format=driver -g -S
>
> 2) For -fverbose-asm:
>
> # GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> # options passed: -dumpbase-ext .c -mtune=generic -march=x86-64 -g -O2
>
> and
>
> # GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> # options passed: -dumpbase-ext .c -mtune=generic -march=x86-64 -g -O2 -frecord-gcc-switches-file=/tmp/ccm3kL7d.cmdline
>
> 3) DWARF producer:
>
> DW_AT_producer: (indirect string, offset: 0x97): GNU C17 11.0.0 20201204 (experimental) -dumpbase-ext .c -mtune=generic -march=x86-64 -g -O2
>
> and
>
> DW_AT_producer: (indirect string, offset: 0x27): GNU C17 11.0.0 20201204 (experimental) ./xgcc -B. -Q -v /home/marxin/Programming/testcases/a.c -c -O2 -fverbose-asm -frecord-gcc-switches-format=driver -g
>
> Thoughts?

+/* Return a heap allocated producer with command line options.  */
+
+char *gen_command_line_string (cl_decoded_option *options,

char * goes to a separate line

+  unsigned int options_count)
+{

Otherwise 0001- looks good to me.  As said I'd like to see opinions
from others on the
driver / backend communication for 0002.

Thanks,
Richard.

> Thanks,
> Martin


Re: [PATCH] Hashtable PR96088

2020-12-04 Thread François Dumont via Gcc-patches
Following the submission of the heterogeneous lookup in unordered containers,
I rebased this patch on top of it.


Apart from reducing its size thanks to some code reuse, the heterogeneous
lookup had no impact on this one. This is because when I cannot find out
whether the conversion from the inserted element type to the hash functor's
argument type can throw, I pass the element as-is, as if the hash functor
were transparent.


    libstdc++: Limit allocation on iterator insertion in Hashtable [PR 96088]

    Detect Hash functor argument type to find out if it is different from the
    container key_type and if a temporary instance needs to be generated to
    invoke the functor from the iterator value_type key part. If this temporary
    generation can throw, a key_type instance is generated at Hashtable level
    and used to call the functors and, if necessary, moved to the storage.

    libstdc++-v3/ChangeLog:

    PR libstdc++/96088
    * include/bits/hashtable_policy.h (_Select2nd): New.
    (_NodeBuilder<>): New.
    (_ReuseOrAllocNode<>::operator()): Use variadic template args.
    (_AllocNode<>::operator()): Likewise.
    (_Hash_code_base<>::_M_hash_code): Add _Kt template parameter.
    (_Hashtable_base<>::_M_equals): Add _Kt template parameter.
    * include/bits/hashtable.h
    (_Hashtable<>::__node_builder_t): New.
    (_Hashtable<>::_M_find_before_node): Add _Kt template parameter.
    (_Hashtable<>::_M_find_node): Likewise.
    (_Hashtable<>::_Hash_arg_t): New.
    (_Hashtable<>::_S_forward_key): New.
    (_Hashtable<>::_M_insert_unique<>(_Kt&&, _Arg&&, const _NodeGenerator&)):
    New.
    (_Hashtable<>::_M_insert): Use latter.
    * testsuite/23_containers/unordered_map/96088.cc: New test.
    * testsuite/23_containers/unordered_multimap/96088.cc: New test.
    * testsuite/23_containers/unordered_multiset/96088.cc: New test.
    * testsuite/23_containers/unordered_set/96088.cc: New test.
    * testsuite/util/replacement_memory_operators.h
    (counter::_M_increment): New.
    (counter::_M_decrement): New.
    (counter::reset()): New.

Note that I plan to change counter type name to something more 
meaningful but only when back to stage 1.
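The situation PR 96088 targets can be observed by counting calls to a replacement global operator new, in the spirit of the counter in testsuite/util/replacement_memory_operators.h. The sketch below is hypothetical (`g_allocs` and `allocs_for_duplicate_insert` are illustrative names): it re-inserts an existing key from a range of `std::pair<const char *, int>`, whose key type differs from the map's `std::string` key_type. The number of allocations it reports depends on the libstdc++ version; the patch aims to reduce it, so no particular value is asserted.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Counts calls to the replaceable global operator new.
static std::size_t g_allocs = 0;

void *
operator new (std::size_t n)
{
  ++g_allocs;
  if (void *p = std::malloc (n ? n : 1))
    return p;
  throw std::bad_alloc ();
}

void operator delete (void *p) noexcept { std::free (p); }
void operator delete (void *p, std::size_t) noexcept { std::free (p); }

// Re-inserts a key that is already in the map from a range whose key type
// (const char *) differs from key_type (std::string).  Nothing new ends up
// in the map, so ideally nothing needs to be allocated.
std::size_t
allocs_for_duplicate_insert (std::unordered_map<std::string, int> &m)
{
  const std::vector<std::pair<const char *, int>> src = { { "k", 2 } };
  std::size_t before = g_allocs;
  m.insert (src.begin (), src.end ());
  return g_allocs - before;
}
```

Because the key already exists, `insert` leaves the stored value untouched; only the bookkeeping cost of discovering that differs between library versions.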


François

On 24/10/20 4:25 pm, François Dumont wrote:

Hi

    Just a rebase of this patch.

François

On 17/10/20 6:21 pm, François Dumont wrote:

I eventually would like to propose the following resolution.

For multi-key containers I kept the same resolution: build the node
first and compute the hash code from the node key.


For unique-key ones I changed the behavior when I can't find out the hash
functor argument type. I use the iterator key type instead and
just hope that the user's functors are prepared for it.


For now I am using the functor argument_type, which is deprecated. I just
hope that by the day we remove it we will have a compiler built-in to
get any functor argument type given an input type.


    libstdc++: Limit allocation on iterator insertion in Hashtable [PR 96088]

    Detect Hash functor argument type to find out if it is different from the
    container key_type and if a temporary instance needs to be generated to
    invoke the functor from the iterator value_type key part. If this temporary
    generation can throw, a key_type instance is generated at Hashtable level
    and used to call the functors and, if needed, moved to the storage.

    libstdc++-v3/ChangeLog:

    PR libstdc++/96088
    * include/bits/hashtable_policy.h (_Select2nd): New.
    (_NodeBuilder<>): New.
    (_ReuseOrAllocNode<>::operator()): Use variadic template args.
    (_AllocNode<>::operator()): Likewise.
    (_Hash_code_base<>::_M_hash_code): Add _KType template parameter.
    (_Hashtable_base<>::_M_equals): Add _KType template parameter.
    * include/bits/hashtable.h
    (_Hashtable<>::__node_builder_t): New.
    (_Hashtable<>::_M_find_before_node): Add _KType template parameter.
    (_Hashtable<>::_M_find_node): Likewise.
    (_Hashtable<>::_Hash_arg_t): New.
    (_Hashtable<>::_S_forward_key): New.
    (_Hashtable<>::_M_insert_unique<>(_KType&&, _Arg&&, const _NodeGenerator&)):
    New.
    (_Hashtable<>::_M_insert): Use latter.
    * testsuite/23_containers/unordered_map/96088.cc: New test.
    * testsuite/23_containers/unordered_multimap/96088.cc: New test.
    * testsuite/23_containers/unordered_multiset/96088.cc: New test.
    * testsuite/23_containers/unordered_set/96088.cc: New test.
    * testsuite/util/replacement_memory_operators.h
    (counter::_M_increment): New.
    (counter::_M_decrement): New.
    (counter::reset()): New.

Tested under Linux x86_64.

Ok to commit ?

François

On 01/09/20 2:36 pm,

Re: [patch] Fix PR middle-end/98099

2020-12-04 Thread Christophe Lyon via Gcc-patches
On Thu, 3 Dec 2020 at 13:33, Richard Biener via Gcc-patches
 wrote:
>
> On Thu, Dec 3, 2020 at 11:49 AM Eric Botcazou  wrote:
> >
> > Hi,
> >
> > this replaces the ICE by a sorry message for the use of reverse scalar storage
> > order with a 128-bit decimal floating-point type on 32-bit platforms.
> >
> > Tested on x86-64/Linux, OK for the mainline?
>
> OK.
>
> Richard.
>
> >
> > 2020-12-03  Eric Botcazou  
> >
> > * expmed.c (flip_storage_order): In the case of a non-integer mode,
> > sorry out if the integer mode to be used instead is not supported.
> >
> >
> > 2020-12-03  Eric Botcazou  
> >
> > * gcc.dg/pr98099.c: New test.
> >

I think you need to add an effective-target check, because the new test
fails on aarch64/arm:
FAIL: gcc.dg/pr98099.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/pr98099.c:7:12: error: decimal floating-point not supported for this target
/gcc/testsuite/gcc.dg/pr98099.c:9:1: error: decimal floating-point not supported for this target


Christophe

> > --
> > Eric Botcazou


Re: [patch] Fix PR middle-end/98099

2020-12-04 Thread Eric Botcazou
> I think you need to add an effective-target check, because the new test
> fails on aarch64/arm:

Done.

-- 
Eric Botcazou




[PATCH][pushed] testsuite: use param for if-to-switch tests

2020-12-04 Thread Martin Liška

gcc/testsuite/ChangeLog:

PR testsuite/98123
* gcc.dg/tree-ssa/if-to-switch-4.c: Add param to make the test
stable on all architectures.
* gcc.dg/tree-ssa/if-to-switch-6.c: Likewise.
* gcc.dg/tree-ssa/if-to-switch-8.c: Likewise.
---
 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-4.c | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-4.c b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-4.c
index 6a035883457..e6dd4beb6bc 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-iftoswitch-optimized" } */
+/* { dg-options "-O2 -fdump-tree-iftoswitch-optimized --param case-values-threshold=5" } */
 
 int global;
 int foo ();
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c
index 464b1fbd124..b1640673eae 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-iftoswitch-optimized" } */
+/* { dg-options "-O2 -fdump-tree-iftoswitch-optimized --param case-values-threshold=5" } */
 
 int global;
 int foo ();
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c
index f43ce7daf78..f4d06fed2b6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/if-to-switch-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-iftoswitch-optimized" } */
+/* { dg-options "-O2 -fdump-tree-iftoswitch-optimized --param case-values-threshold=5" } */
 
 int global;
 int global1;
--
2.29.2
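For context, the if-to-switch pass turns a chain of equality tests on one variable into a switch. The pair of functions below is a hypothetical C++ illustration of the two equivalent forms, not the contents of the if-to-switch-*.c tests. Whether the pass judges the conversion worthwhile depends on how the resulting switch would be expanded, which involves `--param case-values-threshold` and its target-dependent default; presumably that is why pinning the param makes the tests stable across architectures.

```cpp
// Source shape the pass looks for: one variable compared against several
// constants in an if / else-if chain.
int
classify_if (int c)
{
  if (c == 1 || c == 2)
    return 10;
  else if (c == 5)
    return 20;
  else if (c == 7 || c == 8 || c == 9)
    return 30;
  return 0;
}

// The equivalent switch; how it is expanded (jump table, bit test or
// compare tree) is a separate, target-dependent decision.
int
classify_switch (int c)
{
  switch (c)
    {
    case 1:
    case 2:
      return 10;
    case 5:
      return 20;
    case 7:
    case 8:
    case 9:
      return 30;
    default:
      return 0;
    }
}
```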



[PATCH 0/8 v4] LTO Dead Field Elimination and Field Reordering

2020-12-04 Thread Erick Ochoa

Hello,

I'm sharing the most recent version of dead-field elimination. In this 
patchset the following issues have been addressed:


* CamelCase -> snake_case
* STL -> GCC specific data structures
* Fixed the commit messages (the last two commits will be squashed in 
future patchset so the commit messages are not really relevant.)


The only criticism that I have not addressed is the use of my own 
visitors for trees and gimple instructions. My position is still that 
the current visitor (for trees) is not enough to address our needs. The 
visitor implemented here has pre and post traversal hooks. Furthermore, 
there is a visitor exclusively for tree expressions and another for tree 
types, which can allow users to focus on a specific tree traversal.
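The pre- and post-traversal hooks mentioned above can be illustrated with a toy C++ sketch. The `toy_node`, `tree_visitor` and `trace_visitor` names are hypothetical and the tree is only a stand-in for GCC type trees; the point is that the base class owns the traversal while a derived visitor overrides just the hooks it needs.

```cpp
#include <string>
#include <vector>

// Toy tree node standing in for a GCC type tree.
struct toy_node
{
  std::string name;
  std::vector<toy_node> children;
};

// Visitor with separate pre- and post-order hooks; the traversal itself
// lives in walk () and is shared by all derived visitors.
class tree_visitor
{
public:
  virtual ~tree_visitor () = default;

  void walk (const toy_node &n)
  {
    pre (n);
    for (const toy_node &c : n.children)
      walk (c);
    post (n);
  }

protected:
  virtual void pre (const toy_node &) {}
  virtual void post (const toy_node &) {}
};

// Example visitor: records the order in which the hooks fire.
class trace_visitor : public tree_visitor
{
public:
  std::string trace;

protected:
  void pre (const toy_node &n) override { trace += "<" + n.name; }
  void post (const toy_node &n) override { trace += ">" + n.name; }
};
```

For a node `s` with children `a` and `b`, the trace reads `<s<a>a<b>b>s`: each node's pre hook runs before its children are visited and its post hook after.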


There is one single STL use and that is std::string which is used when 
serializing tree types. I would love to keep this since prepending
and appending std::strings is so easy and it is really only used when 
dumping debug information. (Well, technically two uses of STL, but I 
have seen std::pair being used elsewhere in the repo, so I believe it is 
ok?)


The last two commits will be squashed across the other commits (i.e., no 
patch will ever include references to STL beyond std::string and 
std::pair) and some small issues will be fixed (i.e., some names will 
change and I will remove the use of auto).


I will be running tests this weekend to make sure nothing broke, but 
initial (fast) testing seems to indicate that everything is working 
correctly.


So, why am I sharing this now? I am hoping to provide a follow up to 
those interested in this transformation and I am hoping to receive more 
feedback in order to make this pass something which the community 
values. Please let me know your comments, questions, concerns, etc.


-Erick


[PATCH 1/8 v4] Dead-field warning in structs at LTO-time

2020-12-04 Thread Erick Ochoa



This commit includes the following components:

  Type-based escape analysis to determine which structs can be modified at
  link time.
  Field access analysis to determine which fields are never read.

The type-based escape analysis provides a list of types that are not
visible outside of the current linking unit (unlike, e.g., the parameter
types of external functions).

The field access analysis scans non-escaping structs for fields that
are never read in the linking unit and thus can be removed.
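
As a rough illustration of how the two analyses combine, the following sketch uses a hypothetical, string-keyed model instead of GCC trees; `field_access`, `access_kind` and `find_dead_fields` are illustrative names only. A field is reported as dead if its record does not escape and the field is never read; write-only fields count as dead, consistent with the "never read" criterion above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: records and fields are identified by name.
enum class access_kind { read, write };

struct field_access
{
  std::string record;
  std::string field;
  access_kind kind;
};

using record_map = std::map<std::string, std::vector<std::string>>;
using dead_map = std::map<std::string, std::set<std::string>>;

// For every record that does not escape, report the fields that are never
// read.  Writes alone do not keep a field alive.
dead_map
find_dead_fields (const record_map &records,
                  const std::set<std::string> &escaping,
                  const std::vector<field_access> &accesses)
{
  dead_map read_fields;
  for (const field_access &a : accesses)
    if (a.kind == access_kind::read)
      read_fields[a.record].insert (a.field);

  dead_map dead;
  for (const auto &r : records)
    {
      if (escaping.count (r.first))
        continue;  // layout is visible outside the linking unit
      const std::set<std::string> &reads = read_fields[r.first];
      for (const std::string &f : r.second)
        if (!reads.count (f))
          dead[r.first].insert (f);
    }
  return dead;
}
```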

2020-11-04  Erick Ochoa  

* Makefile.in: Add file to list of new sources.
* common.opt: Add new flags.
* ipa-type-escape-analysis.c: New file.
---
 gcc/Makefile.in|1 +
 gcc/common.opt |8 +
 gcc/ipa-type-escape-analysis.c | 3428 
 gcc/ipa-type-escape-analysis.h | 1152 +++
 gcc/passes.def |1 +
 gcc/timevar.def|1 +
 gcc/tree-pass.h|2 +
 7 files changed, 4593 insertions(+)
 create mode 100644 gcc/ipa-type-escape-analysis.c
 create mode 100644 gcc/ipa-type-escape-analysis.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 978a08f7b04..8b18c9217a2 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1415,6 +1415,7 @@ OBJS = \
incpath.o \
init-regs.o \
internal-fn.o \
+   ipa-type-escape-analysis.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index d4cbb2f86a5..85351738a29 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3460,4 +3460,12 @@ fipa-ra
 Common Report Var(flag_ipa_ra) Optimization
 Use caller save register across calls if possible.
 
+fipa-type-escape-analysis
+Common Report Var(flag_ipa_type_escape_analysis) Optimization
+This flag is only used for debugging the type escape analysis
+
+Wdfa
+Common Var(warn_dfa) Init(1) Warning
+Warn about dead fields at link time.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/ipa-type-escape-analysis.c b/gcc/ipa-type-escape-analysis.c
new file mode 100644
index 000..32c8bf997fb
--- /dev/null
+++ b/gcc/ipa-type-escape-analysis.c
@@ -0,0 +1,3428 @@
+/* IPA Type Escape Analysis and Dead Field Elimination
+   Copyright (C) 2019-2020 Free Software Foundation, Inc.
+
+  Contributed by Erick Ochoa 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/* Interprocedural dead field analysis (IPA-DFA)
+
+   The goal of this analysis is to
+
+   1) discover RECORD_TYPEs which do not escape the current linking unit.
+
+   2) discover fields in RECORD_TYPEs that are never read.
+
+   3) merge the results from 1 and 2 to determine which fields are not needed.
+
+   The algorithm basically consists of the following stages:
+
+   1) Partition all TYPE_P trees into two sets: those trees which reach a
+   tree of RECORD_TYPE and those which do not.
+
+   2.a) Analyze callsites to determine if arguments and return types are
+   escaping.
+   2.b) Analyze casts to determine if it would be safe to mark a field as dead.
+   2.c) Analyze constructors and static initialization and mark the affected
+   TYPE_P trees as unable to be modified.
+   2.d) Analyze whether FIELD_DECLs are accessed via pointer arithmetic and
+   mark the preceding FIELD_DECLs as unable to be modified.
+   2.e) Analyze whether the address of a FIELD_DECL is taken and mark the whole
+   RECORD_TYPE as unable to be modified.
+   2.f) Propagate this information to nested TYPE_P trees.
+   2.g) Propagate this information across different TYPE_P trees that represent
+   equivalent TYPE_P types.
+
+   3.a) Analyze FIELD_DECLs to determine whether they are read,
+   written or neither.
+   3.b) Unify this information across different RECORD_TYPE trees that
+   represent equivalent types.
+   3.c) Determine which FIELD_DECLs can be deleted.
+
+   4) Calculate the intersection of non-escaping RECORD_TYPEs with RECORD_TYPEs
+   that have a field that can be deleted.
+
+   First stage - Determining if a TYPE_P points to a RECORD_TYPE
+   =
+
+   This stage is computed through the *Collector classes.  Those are
+   TypeCollector, ExprCollector and GimpleTypeCollector which walk up and down
+   types, expressions, and gimple respectively and propagate information about
+   TYPE_P trees and maintain information on the type partitions.
+
+   Second stage - De

[PATCH 2/8 v4] Add Dead Field Elimination

2020-12-04 Thread Erick Ochoa



Using the Dead Field Analysis, Dead Field Elimination
automatically transforms gimple to eliminate fields that
are never read.
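
Removing a field changes the offset of every field after it and can shrink the record, which is why the transformed types have to be laid out again. The sketch below assumes a simplified C-like layout (each field at the next suitably aligned offset, total size rounded up to the largest alignment); `field`, `struct_size` and `remove_dead_fields` are hypothetical helpers, not part of the patch, which presumably leaves relayout to GCC's own type layout code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical field description; align is assumed to be a power of two.
struct field
{
  std::string name;
  std::size_t size;
  std::size_t align;
};

static std::size_t
round_up (std::size_t n, std::size_t align)
{
  return (n + align - 1) / align * align;
}

// Simplified C-like layout: place each field at the next aligned offset,
// then round the total up to the largest field alignment.
std::size_t
struct_size (const std::vector<field> &fields)
{
  std::size_t offset = 0, max_align = 1;
  for (const field &f : fields)
    {
      offset = round_up (offset, f.align) + f.size;
      if (f.align > max_align)
        max_align = f.align;
    }
  return round_up (offset, max_align);
}

// Return the record without the fields named in DEAD.
std::vector<field>
remove_dead_fields (const std::vector<field> &fields,
                    const std::vector<std::string> &dead)
{
  std::vector<field> kept;
  for (const field &f : fields)
    {
      bool is_dead = false;
      for (const std::string &d : dead)
        if (d == f.name)
          is_dead = true;
      if (!is_dead)
        kept.push_back (f);
    }
  return kept;
}
```

Under these assumptions, a record holding an `int` (4/4), a `double` (8/8) and an `int` occupies 24 bytes, and only 8 once the `double` is removed.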

2020-11-04  Erick Ochoa  

* Makefile.in: Add file to list of sources.
* ipa-dfe.c: New.
* ipa-dfe.h: Same.
* ipa-type-escape-analysis.h: Export code used in dfe.
* ipa-type-escape-analysis.c: Call transformation.
---
 gcc/Makefile.in|1 +
 gcc/ipa-dfe.c  | 1284 
 gcc/ipa-dfe.h  |  247 ++
 gcc/ipa-type-escape-analysis.c |   24 +-
 gcc/ipa-type-escape-analysis.h |   14 +-
 5 files changed, 1557 insertions(+), 13 deletions(-)
 create mode 100644 gcc/ipa-dfe.c
 create mode 100644 gcc/ipa-dfe.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8b18c9217a2..8ef6047870b 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
init-regs.o \
internal-fn.o \
ipa-type-escape-analysis.o \
+   ipa-dfe.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
new file mode 100644
index 000..7c5e81bd6ac
--- /dev/null
+++ b/gcc/ipa-dfe.c
@@ -0,0 +1,1284 @@
+/* IPA Type Escape Analysis and Dead Field Elimination
+   Copyright (C) 2019-2020 Free Software Foundation, Inc.
+
+  Contributed by Erick Ochoa 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/* Interprocedural dead field elimination (IPA-DFE)
+
+   The goal of this transformation is to
+
+   1) Create new types to replace RECORD_TYPEs which hold dead fields.
+   2) Substitute new RECORD_TYPEs for instances of old RECORD_TYPEs.
+   3) Substitute new FIELD_DECLs for instances of old FIELD_DECLs.
+   4) Fix some instances of pointer arithmetic.
+   5) Relayout where needed.
+
+   First stage - DFA
+   =
+
+   Use DFA to compute the set of FIELD_DECLs which can be deleted.
+
+   Second stage - Reconstruct Types
+   
+
+   This stage is done by two families of classes, the SpecificTypeCollector
+   and the TypeReconstructor.
+
+   The SpecificTypeCollector collects all TYPE_P trees which point to
+   RECORD_TYPE trees returned by DFA.  The TypeReconstructor will create
+   new RECORD_TYPE trees and new TYPE_P trees replacing the old RECORD_TYPE
+   trees with the new RECORD_TYPE trees.
+
+   Third stage - Substitute Types and Relayout
+   ===
+
+   This stage is handled by ExprRewriter and GimpleRewriter.
+   Some pointer arithmetic is fixed here to take into account the eliminated
+   FIELD_DECLs.
+ */
+
+#include "config.h"
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple-expr.h"
+#include "predict.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "fold-const.h"
+#include "gimple-fold.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "ipa-prop.h"
+#include "tree-pretty-print.h"
+#include "tree-inline.h"
+#include "ipa-fnsummary.h"
+#include "ipa-utils.h"
+#include "tree-ssa-ccp.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "basic-block.h" //needed for gimple.h
+#include "function.h"//needed for gimple.h
+#include "gimple.h"
+#include "stor-layout.h"
+#include "cfg.h" // needed for gimple-iterator.h
+#include "gimple-iterator.h"
+#include "gimplify.h"  //unshare_expr
+#include "value-range.h"   // make_ssa_name dependency
+#include "tree-ssanames.h" // make_ssa_name
+#include "ssa.h"
+#include "tree-into-ssa.h"
+#include "gimple-ssa.h" // update_stmt
+#include "tree.h"
+#include "gimple-expr.h"
+#include "predict.h"
+#include "alloc-pool.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "diagnostic.h"
+#include "fold-const.h"
+#include "gimple-fold.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "ipa-prop.h"
+#include "tree-pretty-print.h"
+#include "tree-inline.h"
+#include "ipa-fnsummary.h"
+#include "ipa-utils.h"
+#include "tree-ssa-ccp.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "tree-ssa-alias.h"
+#include "tree-ssanames.h"
+#include "gimple.h"
+#include "cfg.h"
+#include "gimple-iterator.h"
+#include "gimple-ssa.h"
+#include "gimple-pretty-print.h"
+
+#include "ipa-type-escape-analysis.h"
+#include

[PATCH 3/8 v4] Add Field Reordering

2020-12-04 Thread Erick Ochoa



Field reordering of structs at link-time
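
The ordering criterion used by ipa-field-reorder.c is not visible in this excerpt. As an illustration only, the sketch below assumes one common heuristic, sorting fields by decreasing alignment to reduce padding, and measures the effect with a simplified C-like layout. `field`, `struct_size` and `reorder_fields` are hypothetical names, not code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical field description; align is assumed to be a power of two.
struct field
{
  std::string name;
  std::size_t size;
  std::size_t align;
};

static std::size_t
round_up (std::size_t n, std::size_t align)
{
  return (n + align - 1) / align * align;
}

// Simplified C-like layout size (aligned offsets, size rounded up to the
// largest alignment).
std::size_t
struct_size (const std::vector<field> &fields)
{
  std::size_t offset = 0, max_align = 1;
  for (const field &f : fields)
    {
      offset = round_up (offset, f.align) + f.size;
      max_align = std::max (max_align, f.align);
    }
  return round_up (offset, max_align);
}

// Assumed heuristic: decreasing alignment; stable so that fields with
// equal alignment keep their source order.
std::vector<field>
reorder_fields (std::vector<field> fields)
{
  std::stable_sort (fields.begin (), fields.end (),
                    [] (const field &a, const field &b)
                    { return a.align > b.align; });
  return fields;
}
```

With `char a; int b; char c;` the padded layout takes 12 bytes; placing `b` first brings it down to 8.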

2020-11-04  Erick Ochoa  

* Makefile.in: Add new file to list of sources.
* common.opt: Add new flag for field reordering.
* passes.def: Add new pass.
* tree-pass.h: Same.
* ipa-field-reorder.c: New file.
* ipa-type-escape-analysis.c: Export common functions.
* ipa-type-escape-analysis.h: Same.
---
 gcc/Makefile.in|   1 +
 gcc/common.opt |   4 +
 gcc/ipa-dfe.c  |  86 -
 gcc/ipa-dfe.h  |  26 +-
 gcc/ipa-field-reorder.c| 622 +
 gcc/ipa-type-escape-analysis.c |  44 ++-
 gcc/ipa-type-escape-analysis.h |  12 +-
 gcc/passes.def |   1 +
 gcc/tree-pass.h|   2 +
 9 files changed, 749 insertions(+), 49 deletions(-)
 create mode 100644 gcc/ipa-field-reorder.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8ef6047870b..2184bd0fc3d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1417,6 +1417,7 @@ OBJS = \
internal-fn.o \
ipa-type-escape-analysis.o \
ipa-dfe.o \
+   ipa-field-reorder.o \
ipa-cp.o \
ipa-sra.o \
ipa-devirt.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 85351738a29..7885d0f5c0c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3468,4 +3468,8 @@ Wdfa
 Common Var(warn_dfa) Init(1) Warning
 Warn about dead fields at link time.
 
+fipa-field-reorder
+Common Report Var(flag_ipa_field_reorder) Optimization
+Reorder fields.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
index 7c5e81bd6ac..7ab718c3628 100644
--- a/gcc/ipa-dfe.c
+++ b/gcc/ipa-dfe.c
@@ -185,7 +185,7 @@ get_types_replacement (record_field_offset_map_t record_field_offset_map,
 {
   type_stringifier stringifier;
 
-  type_reconstructor reconstructor (record_field_offset_map);
+  type_reconstructor reconstructor (record_field_offset_map, "reorg");
   for (std::set::const_iterator i = to_modify.begin (),
e = to_modify.end ();
i != e; ++i)
@@ -245,9 +245,9 @@ get_types_replacement (record_field_offset_map_t record_field_offset_map,
  */
 void
 substitute_types_in_program (reorg_record_map_t map,
-reorg_field_map_t field_map)
+reorg_field_map_t field_map, bool _delete)
 {
-  gimple_type_rewriter rewriter (map, field_map);
+  gimple_type_rewriter rewriter (map, field_map, _delete);
   rewriter.walk ();
   rewriter._rewrite_function_decl ();
 }
@@ -361,8 +361,11 @@ type_reconstructor::set_is_not_modified_yet (tree t)
 return;
tree type = _reorg_map[tt];
-  const bool is_modified
+  bool is_modified
 = strstr (type_stringifier::get_type_identifier (type).c_str (), ".reorg");
+  is_modified
+|= (bool) strstr (type_stringifier::get_type_identifier (type).c_str (),
+ ".reorder");
   if (!is_modified)
 return;
 
@@ -408,14 +411,20 @@ type_reconstructor::is_memoized (tree t)
   return already_changed;
 }
 
-static tree
-get_new_identifier (tree type)
+const char *
+type_reconstructor::get_new_suffix ()
+{
+  return _suffix;
+}
+
+tree
+get_new_identifier (tree type, const char *suffix)
 {
   const char *identifier = type_stringifier::get_type_identifier (type).c_str ();
-  const bool is_new_type = strstr (identifier, "reorg");
+  const bool is_new_type = strstr (identifier, suffix);
   gcc_assert (!is_new_type);
   char *new_name;
-  asprintf (&new_name, "%s.reorg", identifier);
+  asprintf (&new_name, "%s.%s", identifier, suffix);
   return get_identifier (new_name);
 }
 
@@ -471,7 +480,9 @@ type_reconstructor::_walk_ARRAY_TYPE_post (tree t)
   TREE_TYPE (copy) = build_variant_type_copy (TREE_TYPE (copy));
   copy = is_modified ? build_distinct_type_copy (copy) : copy;
   TREE_TYPE (copy) = is_modified ? _reorg_map[TREE_TYPE (t)] : TREE_TYPE (copy);
-  TYPE_NAME (copy) = is_modified ? get_new_identifier (copy) : TYPE_NAME (copy);
+  TYPE_NAME (copy) = is_modified
+  ? get_new_identifier (copy, this->get_new_suffix ())
+  : TYPE_NAME (copy);
   // This is useful so that we go again through type layout
   TYPE_SIZE (copy) = is_modified ? NULL : TYPE_SIZE (copy);
   tree domain = TYPE_DOMAIN (t);
@@ -524,7 +535,9 @@ type_reconstructor::_walk_POINTER_TYPE_post (tree t)
copy = is_modified ? build_variant_type_copy (copy) : copy;
   TREE_TYPE (copy) = is_modified ? _reorg_map[TREE_TYPE (t)] : TREE_TYPE (copy);
-  TYPE_NAME (copy) = is_modified ? get_new_identifier (copy) : TYPE_NAME (copy);
+  TYPE_NAME (copy) = is_modified
+  ? get_new_identifier (copy, this->get_new_suffix ())
+  : TYPE_NAME (copy);
   TYPE_CACHED_VALUES_P (copy) = false;
tree _t = tree_to_tree (t);
@@ -619,7 +632,8 @@ type_reconstructor::_walk_RECORD_TYPE_post (tree t)
   tree main = TYPE_MAIN_VARIANT (t);
  

[PATCH 4/8 v4] Add documentation for dead field elimination

2020-12-04 Thread Erick Ochoa



2020-11-04  Erick Ochoa  

* Makefile.in: Add file to documentation sources.
* doc/dfe.texi: New section.
* doc/gccint.texi: Include new section.
---
 gcc/Makefile.in |   3 +-
 gcc/doc/dfe.texi| 187 
 gcc/doc/gccint.texi |   2 +
 3 files changed, 191 insertions(+), 1 deletion(-)
 create mode 100644 gcc/doc/dfe.texi

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2184bd0fc3d..7e4c442416d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3275,7 +3275,8 @@ TEXI_GCCINT_FILES = gccint.texi gcc-common.texi gcc-vers.texi		\
 gnu.texi gpl_v3.texi fdl.texi contrib.texi languages.texi  \
 sourcebuild.texi gty.texi libgcc.texi cfg.texi tree-ssa.texi   \
 loop.texi generic.texi gimple.texi plugins.texi optinfo.texi   \
-match-and-simplify.texi analyzer.texi ux.texi poly-int.texi
+match-and-simplify.texi analyzer.texi ux.texi poly-int.texi\
+dfe.texi
 
 TEXI_GCCINSTALL_FILES = install.texi install-old.texi fdl.texi		\
 gcc-common.texi gcc-vers.texi
diff --git a/gcc/doc/dfe.texi b/gcc/doc/dfe.texi
new file mode 100644
index 000..e8d01d817d3
--- /dev/null
+++ b/gcc/doc/dfe.texi
@@ -0,0 +1,187 @@
+@c Copyright (C) 2001 Free Software Foundation, Inc.
+@c This is part of the GCC manual.
+@c For copying conditions, see the file gcc.texi.
+
+@node Dead Field Elimination
+@chapter Dead Field Elimination
+
+@node Dead Field Elimination Internals
+@section Dead Field Elimination Internals
+
+@subsection Introduction
+
+Dead field elimination is a compiler transformation that removes fields
+from structs. There are several challenges to removing fields from
+structs at link time but, depending on the workload of the compiled
+program and the architecture where the program runs, dead field
+elimination might be a worthwhile transformation to apply. Generally
+speaking, when the bottleneck of an application is the memory
+bandwidth of the host system and the memory traffic is dominated by
+structs which can be reduced in size, then that combination of workload,
+program and architecture can benefit from applying dead field
+elimination. The benefits come from removing unnecessary fields from
+structures and thus reducing the memory and cache footprint of each
+structure.
+
+While challenges exist to fully automate a dead field elimination
+transformation, similar and more powerful optimizations have been
+implemented in the past. Chakrabarti et al. [0] implemented struct
+peeling, splitting of structures into hot and cold parts, and field
+reordering. Golovanevsky et al. [1] also describe efforts to implement
+data layout optimizations at link time. Unlike the work of Chakrabarti
+and Golovanevsky, this text only covers dead field elimination. This
+does not mean that the implementation cannot be expanded to perform
+other link-time layout optimizations; it just means that dead field
+elimination is the only transformation implemented at the time of this
+writing.
+
+[0] Chakrabarti, Gautam, Fred Chow, and L. PathScale. "Structure layout
+optimizations in the Open64 compiler: Design, implementation and
+measurements." Open64 Workshop at the International Symposium on Code
+Generation and Optimization. 2008.
+
+[1] Golovanevsky, Olga, and Ayal Zaks. "Struct-reorg: current status
+and future perspectives." Proceedings of the GCC Developers’ Summit.
+2007.
+
+@subsection Overview
+
+The dead field implementation is structured in the following way:
+
+@itemize @bullet
+@item
+Collect all types which can refer to a @code{RECORD_TYPE}. This means
+that if we have a pointer to a record, we also collect this pointer; the
+same applies to arrays and unions.
+@item
+Mark types as escaping. More on this in the following section.
+@item
+Find fields which can be deleted. (Iterate over all gimple code and
+find which fields are read.)
+@item
+Create new types with removed fields (and reference these types in
+pointers, arrays, etc.)
+@item
+Modify gimple to include these types.
+@end itemize
+
+
+Most of this code relies on the visitor pattern. Types, expressions, and
+gimple statements are visited using this pattern. The base classes,
+@code{type_walker}, @code{expr_walker} and @code{gimple_walker}, are
+implemented in @file{ipa-type-escape-analysis.c}. There are assertions
+in place that fire when a type, expression, or gimple statement is
+encountered that was not seen during the testing of this
+transformation. This facilitates fuzzing of the transformation.
+
+@subsubsection Implementation Details: Is a global variable escaping?
+
+How does the analysis determine whether a global variable is visible to
+code outside the current linking unit? In the file
+@file{gimple-escaper.c} we have a simple function called
+@code{is_variable_escaping} which checks whether a variable is visible
+to code outside the current linking unit by looking at the
+@code{varpool_node}’s @code{externally_visible} field.
+
+
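
Once a global is known to be externally visible, its type escapes, and so does every type reachable from it (stage 2.f of the analysis described in the ipa-type-escape-analysis.c header propagates this to nested TYPE_P trees). The sketch below models that propagation with a worklist over a hypothetical string-keyed type graph; `type_graph`, `global_var` and `escaping_types` are illustrative names, and the `externally_visible` member merely stands in for the varpool flag mentioned above.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a type maps to the types nested in it (field types
// of a record, the pointee of a pointer, the element of an array).
using type_graph = std::map<std::string, std::vector<std::string>>;

struct global_var
{
  std::string type;
  bool externally_visible;  // stands in for varpool_node's flag
};

// Seed with the types of externally visible globals, then mark every
// nested type as escaping too.  The visited set also handles recursive
// types such as a struct that contains a pointer to itself.
std::set<std::string>
escaping_types (const type_graph &nested,
                const std::vector<global_var> &globals)
{
  std::set<std::string> escaping;
  std::vector<std::string> worklist;
  for (const global_var &g : globals)
    if (g.externally_visible)
      worklist.push_back (g.type);

  while (!worklist.empty ())
    {
      std::string t = worklist.back ();
      worklist.pop_back ();
      if (!escaping.insert (t).second)
        continue;  // already processed
      auto it = nested.find (t);
      if (it != nested.end ())
        for (const std::string &n : it->second)
          worklist.push_back (n);
    }
  return escaping;
}
```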

[PATCH 5/8 v4] Abort if Gimple from C++ or Fortran sources is found.

2020-12-04 Thread Erick Ochoa



2020-11-04  Erick Ochoa  

* ipa-field-reorder.c: Add flag to exit transformation.
* ipa-type-escape-analysis.c: Same.
---
 gcc/ipa-field-reorder.c|  3 +-
 gcc/ipa-type-escape-analysis.c | 54 --
 gcc/ipa-type-escape-analysis.h |  2 ++
 3 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/gcc/ipa-field-reorder.c b/gcc/ipa-field-reorder.c
index e1094efe934..633a5a7cedc 100644
--- a/gcc/ipa-field-reorder.c
+++ b/gcc/ipa-field-reorder.c
@@ -587,6 +587,7 @@ lto_fr_execute ()
 {
   log ("here in field reordering \n");
   // Analysis.
+  detected_incompatible_syntax = false;
   tpartitions_t escaping_nonescaping_sets
 = partition_types_into_escaping_nonescaping ();
   record_field_map_t record_field_map = find_fields_accessed ();
@@ -594,7 +595,7 @@ lto_fr_execute ()
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
record_field_map, 0);
 
-  if (record_field_offset_map.empty ())
+  if (detected_incompatible_syntax || record_field_offset_map.empty ())
 return 0;
// Prepare for transformation.
diff --git a/gcc/ipa-type-escape-analysis.c b/gcc/ipa-type-escape-analysis.c
index f142b6e51ca..970b74630dd 100644
--- a/gcc/ipa-type-escape-analysis.c
+++ b/gcc/ipa-type-escape-analysis.c
@@ -171,6 +171,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-type-escape-analysis.h"
 #include "ipa-dfe.h"
 
+#define ABORT_IF_NOT_C true
+
+bool detected_incompatible_syntax = false;
+
 // Main function that drives dfe.
 static unsigned int
 lto_dfe_execute ();
@@ -262,13 +266,15 @@ lto_dead_field_elimination ()
 if (cnode->inlined_to) continue;
 cnode->get_body();
   }
+
+  detected_incompatible_syntax = false;
   tpartitions_t escaping_nonescaping_sets
 = partition_types_into_escaping_nonescaping ();
   record_field_map_t record_field_map = find_fields_accessed ();
   record_field_offset_map_t record_field_offset_map
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
record_field_map, OPT_Wdfa);
-  if (record_field_offset_map.empty ())
+  if (detected_incompatible_syntax || record_field_offset_map.empty ())
 return;
  // Prepare for transformation.
@@ -588,6 +594,7 @@ type_walker::_walk (tree type)
   // Improve, verify that having a type is an invariant.
   // I think there was a specific example which didn't
   // allow for it
+  if (detected_incompatible_syntax) return;
   if (!type)
 return;
 
@@ -641,9 +648,9 @@ type_walker::_walk (tree type)
 case POINTER_TYPE:
   this->walk_POINTER_TYPE (type);
   break;
-case REFERENCE_TYPE:
-  this->walk_REFERENCE_TYPE (type);
-  break;
+//case REFERENCE_TYPE:
+//  this->walk_REFERENCE_TYPE (type);
+//  break;
 case ARRAY_TYPE:
   this->walk_ARRAY_TYPE (type);
   break;
@@ -653,18 +660,24 @@ type_walker::_walk (tree type)
 case FUNCTION_TYPE:
   this->walk_FUNCTION_TYPE (type);
   break;
-case METHOD_TYPE:
-  this->walk_METHOD_TYPE (type);
-  break;
+//case METHOD_TYPE:
+  //this->walk_METHOD_TYPE (type);
+  //break;
 // Since we are dealing only with C at the moment,
 // we don't care about QUAL_UNION_TYPE nor LANG_TYPEs
 // So fail early.
+case REFERENCE_TYPE:
+case METHOD_TYPE:
 case QUAL_UNION_TYPE:
 case LANG_TYPE:
 default:
   {
log ("missing %s\n", get_tree_code_name (code));
+#ifdef ABORT_IF_NOT_C
+   detected_incompatible_syntax = true;
+#else
gcc_unreachable ();
+#endif
   }
   break;
 }
@@ -847,6 +860,7 @@ type_walker::_walk_arg (tree t)
 void
 expr_walker::walk (tree e)
 {
+  if (detected_incompatible_syntax) return;
   _walk_pre (e);
   _walk (e);
   _walk_post (e);
@@ -931,7 +945,11 @@ expr_walker::_walk (tree e)
 default:
   {
log ("missing %s\n", get_tree_code_name (code));
+#ifdef ABORT_IF_NOT_C
+   detected_incompatible_syntax = true;
+#else
gcc_unreachable ();
+#endif
   }
   break;
 }
@@ -1164,6 +1182,7 @@ gimple_walker::walk ()
   cgraph_node *node = NULL;
   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
 {
+  if (detected_incompatible_syntax) return;
   node->get_untransformed_body ();
   tree decl = node->decl;
   gcc_assert (decl);
@@ -1410,7 +1429,11 @@ gimple_walker::_walk_gimple (gimple *stmt)
   // Break if something is unexpected.
   const char *name = gimple_code_name[code];
   log ("gimple code name %s\n", name);
+#ifdef ABORT_IF_NOT_C
+  detected_incompatible_syntax = true;
+#else
   gcc_unreachable ();
+#endif
 }
 
 void
@@ -2960,6 +2983,8 @@ type_stringifier::stringify (tree t)
 return std::string ("");
   _stringification.clear ();
   gcc_assert (t);
+  if (detected_incompatible_syntax)
+return std::string ("");
   walk (t);
   return _stringification;
 }
@@ -3150,14 +3175,19 @@ type_stringifier::_walk_ar
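
This patch replaces `gcc_unreachable ()` with a sticky flag. Walkers set `detected_incompatible_syntax` when they meet a construct they do not support (such as `REFERENCE_TYPE` or `METHOD_TYPE`) and return early once it is set; the pass drivers reset the flag before the analysis and skip the transformation if it was raised. A stand-alone sketch of that control flow follows, with hypothetical names (`code`, `walk_one`, `run_pass`) in place of GCC's tree codes and walkers.

```cpp
#include <cassert>
#include <vector>

// Sticky flag in the style of detected_incompatible_syntax.
static bool detected_incompatible = false;

// Hypothetical tree codes; the last two play the role of the unsupported
// REFERENCE_TYPE and METHOD_TYPE cases.
enum class code { record_type, pointer_type, reference_type, method_type };

static void
walk_one (code c)
{
  if (detected_incompatible)
    return;  // stop doing work once we know we will bail out
  switch (c)
    {
    case code::record_type:
    case code::pointer_type:
      break;  // supported: a real walker would recurse here
    default:
      detected_incompatible = true;  // instead of gcc_unreachable ()
      break;
    }
}

// Driver: reset the flag, run the analysis, and report whether the
// transformation would run.
bool
run_pass (const std::vector<code> &program)
{
  detected_incompatible = false;
  for (code c : program)
    walk_one (c);
  return !detected_incompatible;
}
```

Resetting the flag at the start of each driver matters because both ipa-field-reorder.c and ipa-type-escape-analysis.c share the same global. Note also that `#ifdef ABORT_IF_NOT_C` only tests whether the macro is defined, so changing its definition to `false` would not restore the `gcc_unreachable ()` path; removing the `#define` would.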

[PATCH 6/8 v4] Add heuristic to take into account void* pattern.

2020-12-04 Thread Erick Ochoa



We add a heuristic in order to be able to transform functions which
take void* arguments as a way to be generic over the argument type. An
example of this is qsort. The heuristic works by first inspecting
leaves in the call graph. If a leaf only contains references to a
single RECORD_TYPE, then we color that node in the call graph
as "casts are safe in this function and it does not call externally
visible functions". We propagate this property up the call graph
until a fixed point is reached. This will later be changed to
use ipa-modref.
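
The sketch below models this propagation as a least fixed point over a hypothetical call graph. Every function starts out unsafe and becomes safe once its own body references at most one record type and all of its callees are known and safe. Calls to functions outside the graph stand in for externally visible functions and keep the caller unsafe. `fn_node` and `color_safe_functions` are illustrative names, not the patch's `get_whitelisted_nodes`, and the per-function check is reduced to a count of referenced record types.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call-graph node.
struct fn_node
{
  std::vector<std::string> callees;  // names; unknown names are external
  int record_types_referenced;       // distinct RECORD_TYPEs in the body
};

// Least fixed point: a function flips from unsafe to safe once its body
// qualifies and all of its callees are safe; colors never flip back.
std::map<std::string, bool>
color_safe_functions (const std::map<std::string, fn_node> &cg)
{
  std::map<std::string, std::set<std::string>> callers;
  for (const auto &n : cg)
    for (const std::string &c : n.second.callees)
      callers[c].insert (n.first);

  std::map<std::string, bool> safe;
  std::deque<std::string> worklist;
  for (const auto &n : cg)
    {
      safe[n.first] = false;
      worklist.push_back (n.first);
    }

  while (!worklist.empty ())
    {
      std::string f = worklist.front ();
      worklist.pop_front ();
      if (safe[f])
        continue;  // already colored safe
      const fn_node &node = cg.at (f);
      bool ok = node.record_types_referenced <= 1;
      for (const std::string &c : node.callees)
        {
          auto it = safe.find (c);
          if (it == safe.end () || !it->second)
            ok = false;  // external, or not (yet) safe
        }
      if (ok)
        {
          safe[f] = true;
          for (const std::string &caller : callers[f])
            worklist.push_back (caller);  // a caller may now qualify
        }
    }
  return safe;
}
```

One consequence of starting from "unsafe" is that mutually recursive functions never become safe, since each waits on the other; that is a conservative outcome for a transformation.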

2020-11-04  Erick Ochoa  

* ipa-type-escape-analysis.c: Add new heuristic.
* ipa-field-reorder.c: Use heuristic.
* ipa-type-escape-analysis.h: Change signatures.
---
 gcc/ipa-field-reorder.c|   3 +-
 gcc/ipa-type-escape-analysis.c | 193 +++--
 gcc/ipa-type-escape-analysis.h |  78 +++--
 3 files changed, 259 insertions(+), 15 deletions(-)

diff --git a/gcc/ipa-field-reorder.c b/gcc/ipa-field-reorder.c
index 633a5a7cedc..70d26d71324 100644
--- a/gcc/ipa-field-reorder.c
+++ b/gcc/ipa-field-reorder.c
@@ -588,8 +588,9 @@ lto_fr_execute ()
   log ("here in field reordering \n");
   // Analysis.
   detected_incompatible_syntax = false;
+  std::map whitelisted = get_whitelisted_nodes();
   tpartitions_t escaping_nonescaping_sets
-= partition_types_into_escaping_nonescaping ();
+= partition_types_into_escaping_nonescaping (whitelisted);
   record_field_map_t record_field_map = find_fields_accessed ();
   record_field_offset_map_t record_field_offset_map
 = obtain_nonescaping_unaccessed_fields (escaping_nonescaping_sets,
diff --git a/gcc/ipa-type-escape-analysis.c b/gcc/ipa-type-escape-analysis.c
index 970b74630dd..48d8dc2bcd8 100644
--- a/gcc/ipa-type-escape-analysis.c
+++ b/gcc/ipa-type-escape-analysis.c
@@ -104,6 +104,7 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #include 
 #include 
+#include 
 
 #include "config.h"
 #include "system.h"
@@ -249,6 +250,99 @@ lto_dfe_execute ()
   return 0;
 }
 
+/* Heuristic to determine if casting is allowed in a function.
+ * This heuristic attempts to allow casting in functions which follow the
+ * pattern where a struct pointer or array pointer is cast to void* or
+ * char*.  The heuristic works as follows:
+ *
+ * There is a simple per-function analysis that determines whether there
+ * is more than 1 type of struct referenced in the body of the method.
+ * If there is more than 1 type of struct referenced in the body,
+ * then casts involving the structures referenced within the body
+ * are not allowed.  However, if there's only one type of struct referenced
+ * in the body of the function, casting is allowed in the function itself.
+ * The logic behind this is that if the code follows good programming
+ * practices, the only way the memory should be accessed is via a singular
+ * type.  There is also another requisite to this per-function analysis, and
+ * that is that the function can only call colored functions or functions
+ * which are available in the linking unit.
+ *
+ * Using this per-function analysis, we then start coloring leaf nodes in the
+ * call graph as ``safe'' or ``unsafe''.  The color is propagated to the
+ * callers of the functions until a fixed point is reached.
+ */
+std::map
+get_whitelisted_nodes ()
+{
+  cgraph_node *node = NULL;
+  std::set nodes;
+  std::set leaf_nodes;
+  std::set leaf_nodes_decl;
+  FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
+  {
+node->get_untransformed_body ();
+nodes.insert(node);
+if (node->callees) continue;
+
+leaf_nodes.insert (node);
+leaf_nodes_decl.insert (node->decl);
+  }
+
+  std::queue worklist;
+  for (std::set::iterator i = leaf_nodes.begin (),
+e = leaf_nodes.end (); i != e; ++i)
+  {
+if (dump_file) fprintf (dump_file, "is a leaf node %s\n", (*i)->name ());
+worklist.push (*i);
+  }
+
+  for (std::set::iterator i = nodes.begin (),
+e = nodes.end (); i != e; ++i)
+  {
+worklist.push (*i);
+  }
+
+  std::map map;
+  while (!worklist.empty ())
+  {
+
+if (detected_incompatible_syntax) return map;
+cgraph_node *i = worklist.front ();
+worklist.pop ();
+if (dump_file) fprintf (dump_file, "analyzing %s %p\n", i->name (), (void*)i);
+gimple_white_lister whitelister;
+whitelister._walk_cnode (i);
+bool no_external = whitelister.does_not_call_external_functions (i, map);
+bool before_in_map = map.find (i->decl) != map.end ();
+// Read the previous color before overwriting it; reading it after the
+// update made the comparisons below always false.
+bool previous_value = before_in_map && map[i->decl];
+bool place_callers_in_worklist = !before_in_map;
+map[i->decl] = no_external;
+place_callers_in_worklist |= previous_value != no_external;
+if (previous_value != no_external)
+{
+   // This ensures we are having a total order
+   // from no_external -> !no_external
+   gcc_assert (!previous_value);
+   

[PATCH 7/8 v4] Add tests

2020-12-04 Thread Erick Ochoa



---
 gcc/common.opt|  4 ++
 gcc/ipa-type-escape-analysis.c| 11 +
 .../ipa/ipa-access-counter-00-simple-read-0.c | 22 ++
 .../ipa-access-counter-01-simple-write-0.c| 22 ++
 .../ipa-access-counter-02-pointer-read-0.c| 22 ++
 .../ipa-access-counter-03-pointer-write-0.c   | 22 ++
 .../ipa/ipa-access-counter-04-gimple-cond-0.c | 24 ++
 .../ipa/ipa-ea-00-collect-global-record-0.c   | 20 +
 ...a-01-collect-global-pointers-to-record-0.c | 23 ++
 ...a-ea-02-collect-global-array-to-record-0.c | 23 ++
 .../ipa/ipa-ea-03-collect-nested-record-0.c   | 26 +++
 .../ipa/ipa-ea-04-collect-parameters-0.c  | 40 +
 .../gcc.dg/ipa/ipa-ea-05-global-escapes-0.c   | 37 
 .../ipa/ipa-ea-06-global-type-escapes-0.c | 42 ++
 .../ipa/ipa-ea-08-parameter-escapes-0.c   | 37 
 .../ipa/ipa-ea-10-return-type-escapes-0.c | 42 ++
 .../gcc.dg/ipa/ipa-ea-11-cast-to-void-ptr-0.c | 44 +++
 .../gcc.dg/ipa/ipa-ea-12-cast-to-void-ptr-0.c | 38 
 .../gcc.dg/ipa/ipa-ea-13-calling-printf-0.c   | 27 
 .../gcc.dg/ipa/ipa-ea-14-volatile-0.c | 20 +
 gcc/testsuite/gcc.dg/ipa/ipa-ea-15-union-0.c  | 25 +++
 .../gcc.dg/ipa/ipa-ea-16-parameter-cast-0.c   | 32 ++
 gcc/testsuite/gcc.dg/ipa/ipa-ea-17-malloc-0.c | 23 ++
 .../ipa/ipa-structreorg-03-new-type-0.c   | 21 +
 ...pa-structreorg-04-heterogeneous-struct-0.c | 21 +
 .../ipa/ipa-structreorg-04-layout-compile-0.c | 21 +
 .../ipa/ipa-structreorg-05-field-reads-0.c| 20 +
 .../ipa/ipa-structreorg-05-nested-struct-0.c  | 30 +
 .../ipa-structreorg-05-rewrite-local-decl-0.c | 23 ++
 .../ipa/ipa-structreorg-06-field-writes-0.c   | 24 ++
 .../ipa/ipa-structreorg-06-pointer-struct-0.c | 31 +
 .../ipa-structreorg-07-delete-first-field-0.c | 23 ++
 ...pa-structreorg-08-modify-double-struct-0.c | 33 ++
 .../ipa-structreorg-09-modify-int-struct-0.c  | 22 ++
 .../ipa/ipa-structreorg-1-prints-structs-0.c  | 19 
 .../gcc.dg/ipa/ipa-structreorg-10-array-0.c   | 23 ++
 ...ipa-structreorg-11-rewrites-minus-expr-0.c | 22 ++
 .../ipa-structreorg-12-delete-last-field-0.c  | 23 ++
 .../ipa-structreorg-13-modify-size-four-0.c   | 25 +++
 .../ipa-structreorg-14-rewrite-plus-expr-0.c  | 25 +++
 .../ipa-structreorg-15-rewrite-mult-expr-0.c  | 25 +++
 ...structreorg-16-rewrite-field-reads-ptr-0.c | 24 ++
 ...structreorg-17-rewrite-field-write-ptr-0.c | 23 ++
 .../ipa-structreorg-18-field-writes-deref-0.c | 26 +++
 ...pa-structreorg-19-middle-pointer-equal-0.c | 26 +++
 .../gcc.dg/ipa/ipa-structreorg-2-modifies-0.c | 19 
 .../ipa/ipa-structreorg-20-array-offset-0.c   | 24 ++
 ...structreorg-22-rewrites-addr-expr-read-0.c | 23 ++
 .../ipa/ipa-structreorg-23-array-cast-0.c | 31 +
 .../ipa/ipa-structreorg-25-array-cast-0.c | 31 +
 .../ipa/ipa-structreorg-26-array-cast-0.c | 24 ++
 .../ipa/ipa-structreorg-27-array-cast-0.c | 21 +
 .../ipa-structreorg-29-heterogeneous-struct.c | 22 ++
 ...ipa-structreorg-30-heterogenous-struct-0.c | 27 
 ...ipa-structreorg-31-heterogenous-struct-0.c | 30 +
 .../ipa/ipa-structreorg-33-nested-struct-0.c  | 39 
 ...ructreorg-33-pointer-indirection-level-0.c | 26 +++
 .../ipa/ipa-structreorg-34-array-cast-0.c | 26 +++
 .../ipa/ipa-structreorg-36-arguments-0.c  | 42 ++
 .../ipa/ipa-structreorg-37-arguments-0.c  | 43 ++
 .../ipa/ipa-structreorg-38-return-values-0.c  | 39 
 .../gcc.dg/ipa/ipa-structreorg-39-typedef-0.c | 21 +
 .../gcc.dg/ipa/ipa-structreorg-40-typedef-0.c | 22 ++
 .../gcc.dg/ipa/ipa-structreorg-41-deref-0.c   | 30 +
 .../gcc.dg/ipa/ipa-structreorg-42-mem-ref-0.c | 32 ++
 .../gcc.dg/ipa/ipa-structreorg-43-args-0.c| 39 
 .../gcc.dg/ipa/ipa-structreorg-44-cond-0.c| 16 +++
 .../gcc.dg/ipa/ipa-structreorg-45-phis-0.c| 16 +++
 .../gcc.dg/ipa/ipa-structreorg-46-static-0.c  | 23 ++
 .../ipa/ipa-structreorg-47-constructor-0.c| 27 
 .../ipa/ipa-structreorg-48-function-ptr-0.c   | 44 +++
 .../ipa-structreorg-50-field-write-delete-0.c | 24 ++
 .../gcc.dg/ipa/ipa-structreorg-51-creduce-0.c |  6 +++
 .../gcc.dg/ipa/ipa-structreorg-52-creduce-1.c | 13 ++
 .../gcc.dg/ipa/ipa-structreorg-53-csmith-2.c  |  6 +++
 .../gcc.dg/ipa/ipa-structreorg-54-csmith-3.c  | 24 ++
 .../gcc.dg/ipa/ipa-structreorg-55-csmith-4.c  |  9 
 .../gcc.dg/ipa/ipa-structreorg-56-csmith-5.c  | 25 ++

[PATCH 8/8 v4] The Great STL migration

2020-12-04 Thread Erick Ochoa



---
 gcc/ipa-dfe.c  | 262 +++---
 gcc/ipa-dfe.h  | 108 +++---
 gcc/ipa-field-reorder.c| 134 +++
 gcc/ipa-type-escape-analysis.c | 636 -
 gcc/ipa-type-escape-analysis.h | 160 -
 5 files changed, 643 insertions(+), 657 deletions(-)

diff --git a/gcc/ipa-dfe.c b/gcc/ipa-dfe.c
index 7ab718c3628..98427e8e423 100644
--- a/gcc/ipa-dfe.c
+++ b/gcc/ipa-dfe.c
@@ -129,31 +129,31 @@ along with GCC; see the file COPYING3.  If not see
  * Find all non_escaping types which point to RECORD_TYPEs in
  * record_field_offset_map.
  */
-std::set
-get_all_types_pointing_to (record_field_offset_map_t record_field_offset_map,
-  tpartitions_t casting)
+void
+get_all_types_pointing_to (record_field_offset_map4_t &record_field_offset_map2,
+  tpartitions2_t casting,
+  hash_set &to_modify2)
 {
-  const tset_t &non_escaping = casting.non_escaping;
+  tset2_t &non_escaping = casting.non_escaping;
 -  std::set specific_types;
   type_stringifier stringifier;
+  hash_set specific_types2;
// Here we are just placing the types of interest in a set.
-  for (std::map::const_iterator i
-   = record_field_offset_map.begin (),
-   e = record_field_offset_map.end ();
+  for (hash_map::iterator i
+   = record_field_offset_map2.begin (),
+   e = record_field_offset_map2.end ();
i != e; ++i)
 {
-  tree record = i->first;
-  std::string name = stringifier.stringify (record);
-  specific_types.insert (record);
+  tree record = (*i).first;
+  specific_types2.add (record);
 }
 -  specific_type_collector specifier (specific_types);
+  specific_type_collector specifier (&specific_types2);
   // SpecificTypeCollector will collect all types which point to the types in
   // the set.
-  for (std::set::const_iterator i = non_escaping.begin (),
+  for (auto i = non_escaping.begin (),
e = non_escaping.end ();
i != e; ++i)
 {
@@ -162,8 +162,11 @@ get_all_types_pointing_to (record_field_offset_map_t record_field_offset_map,
 }
// These are all the types which need modifications.
-  std::set to_modify = specifier.get_set ();
-  return to_modify;
+  hash_set to_modify = specifier.get_set2 ();
+  for (hash_set::iterator i = to_modify.begin(), e = to_modify.end(); i != e; ++i)
+  {
+to_modify2.add (*i);
+  }
 }
 /* record_field_offset_map holds information on which FIELD_DECLs might be
@@ -180,13 +183,13 @@ get_all_types_pointing_to (record_field_offset_map_t record_field_offset_map,
  * The second maps old FIELD_DECLs trees to the new FIELD_DECLs.
  */
 reorg_maps_t
-get_types_replacement (record_field_offset_map_t record_field_offset_map,
-  std::set to_modify)
+get_types_replacement (record_field_offset_map4_t &record_field_offset_map2,
+		   hash_set &to_modify, reorg_record_map2_t &map2, reorg_field_map2_t &field_map2)
 {
   type_stringifier stringifier;
 -  type_reconstructor reconstructor (record_field_offset_map, "reorg");
-  for (std::set::const_iterator i = to_modify.begin (),
+  type_reconstructor reconstructor (record_field_offset_map2, "reorg", map2, field_map2);
+  for (hash_set::iterator i = to_modify.begin (),
e = to_modify.end ();
i != e; ++i)
 {
@@ -194,7 +197,7 @@ get_types_replacement (record_field_offset_map_t record_field_offset_map,
   reconstructor.walk (TYPE_MAIN_VARIANT (record));
 }
 -  for (std::set::const_iterator i = to_modify.begin (),
+  for (hash_set::iterator i = to_modify.begin (),
e = to_modify.end ();
i != e; ++i)
 {
@@ -202,20 +205,17 @@ get_types_replacement (record_field_offset_map_t record_field_offset_map,
   reconstructor.walk (record);
 }
 -  reorg_record_map_t map = reconstructor.get_map ();
-  reorg_field_map_t field_map = reconstructor.get_field_map ();
-
   // Here, we are just making sure that we are not doing anything too crazy.
   // Also, we found some types for which TYPE_CACHED_VALUES_P is not being
   // rewritten.  This is probably indicative of a bug in TypeReconstructor.
-  for (std::map::const_iterator i = map.begin (),
- e = map.end ();
+  for (hash_map::iterator i = map2.begin (),
+ e = map2.end ();
i != e; ++i)
 {
-  tree o_record = i->first;
+  tree o_record = (*i).first;
   std::string o_name = stringifier.stringify (o_record);
   log ("original: %s\n", o_name.c_str ());
-  tree r_record = i->second;
+  tree r_record = (*i).second;
   std::string r_name
= r_record ? stringifier.stringify (r_record) : std::string ("");
   log ("modified: %s\n", r_name.c_str ());
@@ -227,16 +227,17 @@ get_types_

[PATCH] debug: Fix another vector DECL_MODE ICE [PR98100]

2020-12-04 Thread Jakub Jelinek via Gcc-patches
Hi!

The PR88587 fix changes DECL_MODE of vars with vector type during
inlining/cloning when the vars are copied, so that their DECL_MODE matches
their TYPE_MODE in the new function.  Unfortunately, the following testcase
still ICEs: the var isn't really used in the new function, so it isn't
copied but becomes just a nonlocalized var.  We can't adjust its DECL_MODE,
because it appears in multiple functions and needs a different mode in each
of them.  The following patch changes the DEBUG_INSN creation to use
TYPE_MODE instead of DECL_MODE for vars with vector types.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-12-03  Jakub Jelinek  

PR target/98100
* cfgexpand.c (expand_gimple_basic_block): For vars with
vector type, use TYPE_MODE rather than DECL_MODE.

* gcc.target/i386/pr98100.c: New test.

--- gcc/cfgexpand.c.jj  2020-11-26 01:14:47.443082924 +0100
+++ gcc/cfgexpand.c 2020-12-03 14:25:07.772537435 +0100
@@ -5919,7 +5919,7 @@ expand_gimple_basic_block (basic_block b
  && !target_for_debug_bind (var))
goto delink_debug_stmt;
 
- if (DECL_P (var))
+ if (DECL_P (var) && !VECTOR_TYPE_P (TREE_TYPE (var)))
mode = DECL_MODE (var);
  else
mode = TYPE_MODE (TREE_TYPE (var));
@@ -5936,7 +5936,10 @@ expand_gimple_basic_block (basic_block b
 
  value = gimple_debug_source_bind_get_value (stmt);
 
- mode = DECL_MODE (var);
+ if (!VECTOR_TYPE_P (TREE_TYPE (var)))
+   mode = DECL_MODE (var);
+ else
+   mode = TYPE_MODE (TREE_TYPE (var));
 
  val = gen_rtx_VAR_LOCATION (mode, var, (rtx)value,
			      VAR_INIT_STATUS_UNINITIALIZED);
--- gcc/testsuite/gcc.target/i386/pr98100.c.jj  2020-12-03 14:31:46.795106677 +0100
+++ gcc/testsuite/gcc.target/i386/pr98100.c  2020-12-03 14:31:16.862439052 +0100
@@ -0,0 +1,9 @@
+/* PR target/98100 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-avx -fvar-tracking-assignments -g0" } */
+
+__attribute__((target_clones("default","avx2"))) void
+foo ()
+{
+  __attribute__((__vector_size__(8 * sizeof(int)))) int b = {};
+}

Jakub



[PATCH] testsuite: Fix various scan-assembler-symbol-section issues

2020-12-04 Thread Rainer Orth
I recently started looking into scan-assembler-symbol-section since all
tests using it were FAILing on Solaris/SPARC.  Unfortunately, the more I
looked the more issues I found, both with the implementation and the
interface.  This patch addresses some of those, but there are quite a
number of open questions.

* The first issue was that on Solaris/SPARC, section names are
  double-quoted, both with as and gas:

.section".text"

  When using as, the section flag and type syntax is completely
  different from other ELF targets:

.section"my_named_section",#alloc,#execinstr,#progbits

  This patch fixes this by stripping double quotes from section names.

* However, this didn't work initially (only the leading quote was
  stripped), which is due to David's recent AIX patch: with the
  introduction of the new capturing group to handle both .section (ELF)
  and .csect (XCOFF), $full_section_directive would never be empty on
  ELF and Mach-O targets, so the extraction of the section name didn't
  work any longer.  This had also broken the Darwin tests completely.

* Wondering why this hadn't been captured by the framework tests led me
  to

skipping test framework tests, CHECK_TEST_FRAMEWORK is not defined

  so this issue was all too easy to miss.  I'll get back to the
  framework tests later: there's a whole lot of problems there even if
  run.

* With working double quote stripping, all but one of the tests PASSed
  on Solaris/SPARC, the exception being:

FAIL: gcc.dg/20021029-1.c scan-assembler-symbol-section symbol ar (found 
__sparc_get_pc_thunk.l7) has section ^.(const|rodata)|[RO] (found 
.text.__sparc_get_pc_thunk.l7%__sparc_get_pc_thunk.l7)

  This is due to the symbol name (ar) not being anchored in the test and
  unexpectedly matching __sparc_get_pc_thunk.l7.  Easily fixed, but it
  left me wondering about the interface: currently, every user of
  scan-assembler-symbol-section has to deal with anchoring and handling
  USER_LABEL_PREFIX herself.  It seems to me that this would be better
  handled in the framework instead.  However, in the case at hand the
  actual symbol name is "ar.0", and I wonder if this can change in the
  future, i.e. we might need the generality of regexps for the symbol
  name here.

* Next, I ran the tests on Darwin 11 and found two failing tests:

FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_a\$ 
(symbol not found) has section .data
FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_b\$ 
(symbol not found) has section .data

  The darwin-sections.c failures are due to Iain's recent "Darwin : Begin
  rework of zero-fill sections." patch, which emits

.globl _a
.zerofill __DATA,__common,_a,1,0

  This is already scanned for, so the two scans above can just go.

  The other failing test is

FAIL: g++.dg/gomp/tls-5.C  -std=c++14  scan-assembler-symbol-section symbol 
^_?_ZGR2ir_\$ (symbol not found) has section ^.tdata|[TL]
FAIL: g++.dg/gomp/tls-5.C  -std=c++14  scan-assembler-symbol-section symbol 
^_?ir\$ (symbol not found) has section ^.tbss|[TL]

  Other scans are guarded by target tls_native, and indeed the assembler
  output has

___emutls_v._ZGR2ir_:
___emutls_t._ZGR2ir_:

___emutls_v.ir:

  Unfortunately scan-assembler-symbol-section doesn't support selectors
  yet; this patch implements them, both for the benefit of this test and
  for symmetry.

With those changes, test results are clean now on sparc-sun-solaris2.11,
i386-pc-solaris2.11, i386-apple-darwin11.4.2, and
powerpc-ibm-aix7.2.4.0.

On AIX 7.2, there are changes like

-PASS: g++.dg/gomp/tls-5.C  -std=c++2a  scan-assembler-symbol-section symbol 
^_?ir$ (found ir) has section ^\\.tbss|\\[TL\\] (found _tls5.tls_[TL],4)
+PASS: g++.dg/gomp/tls-5.C  -std=c++2a  scan-assembler-symbol-section symbol 
^_?ir$ (found ir) has section ^\\.tbss|\\[TL\\] (found _tls5.tls_[TL])

i.e. the ",4" after (?) the section name is now stripped.  I believe
this is benign: David?

Besides, I've documented scan-assembler-symbol-section and
scan-symbol-section in sourcebuild.texi as should have been done from
the beginning.

That's not the end of things, unfortunately:

* I find my selector handling code in dg-scan-symbol-section
  particularly ugly since it has to deal with the different argument
  indices in

  scan-assembler-symbol-section func section [selector]
  scan-symbol-section filename func section [selector]

  which made me wonder about the need for scan-symbol-section in
  general.  Right now, it's only used in the framework tests, and no
  other scan-* function (besides scan-file and scan-file-not, of course)
  has an explicit filename argument.

  Matthew, did you have any use in mind outside of the framework tests
  that justifies keeping it?

  Trying to get rid of it in the framework tests opened a can of worms,
  unfortunately: running

  gcc -S test.S -o test.s

  (equivalent to what the testsuite does) yields

# 1 "test.S"
# 1 

[PATCH] fold-const: Don't use build_constructor for non-aggregate types in native_encode_initializer [PR93121]

2020-12-04 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is rejected because, when trying to encode a zeroing
CONSTRUCTOR, the code was using build_constructor to build initializers for
the elements, but when recursing, the function handles CONSTRUCTOR only for
aggregate types.

The following patch fixes that by using build_zero_cst instead for
non-aggregates.  Another option would be to add handling of CONSTRUCTOR for
non-aggregates in native_encode_initializer.  Or we can do both; I guess
the middle-end generally doesn't like CONSTRUCTORs for scalar variables, but
I am not 100% sure the FE doesn't produce those sometimes.

Ok for trunk if it passes bootstrap/regtest?
So far it passed
make check-c++-all RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=bit-cast*'

2020-12-04  Jakub Jelinek  

PR libstd++/93121
* fold-const.c (native_encode_initializer): Use build_constructor
only for aggregate types, otherwise use build_zero_cst.

* g++.dg/cpp2a/bit-cast6.C: New test.

--- gcc/fold-const.c.jj 2020-12-03 15:37:25.795342398 +0100
+++ gcc/fold-const.c  2020-12-04 11:25:54.949421799 +0100
@@ -8104,11 +8104,16 @@ native_encode_initializer (tree init, un
{
  if (valueinit == -1)
{
- tree zero = build_constructor (TREE_TYPE (type), NULL);
+ tree zero;
+ if (AGGREGATE_TYPE_P (TREE_TYPE (type)))
+   zero = build_constructor (TREE_TYPE (type), NULL);
+ else
+   zero = build_zero_cst (TREE_TYPE (type));
  r = native_encode_initializer (zero, ptr + curpos,
 fieldsize, 0,
 mask + curpos);
- ggc_free (zero);
+ if (TREE_CODE (zero) == CONSTRUCTOR)
+   ggc_free (zero);
  if (!r)
return 0;
  valueinit = curpos;
@@ -8255,8 +8260,13 @@ native_encode_initializer (tree init, un
{
  cnt--;
  field = fld;
- val = build_constructor (TREE_TYPE (fld), NULL);
- to_free = val;
+ if (AGGREGATE_TYPE_P (TREE_TYPE (fld)))
+   {
+ val = build_constructor (TREE_TYPE (fld), NULL);
+ to_free = val;
+   }
+ else
+   val = build_zero_cst (TREE_TYPE (fld));
}
}
 
--- gcc/testsuite/g++.dg/cpp2a/bit-cast6.C.jj  2020-12-04 11:36:12.963456560 +0100
+++ gcc/testsuite/g++.dg/cpp2a/bit-cast6.C  2020-12-04 11:35:59.227611364 +0100
@@ -0,0 +1,31 @@
+// PR libstd++/93121
+// { dg-do compile { target c++20 } }
+
+namespace std
+{
+enum class byte : unsigned char {};
+template <typename To, typename From>
+constexpr To
+bit_cast (const From &from)
+{
+  return __builtin_bit_cast (To, from);
+}
+}
+
+struct S { unsigned short s[2]; };
+constexpr std::byte from1[sizeof (S)]{};
+constexpr auto to1 = std::bit_cast<S>(from1);
+constexpr unsigned char from2[sizeof (S)]{};
+constexpr auto to2 = std::bit_cast<S>(from2);
+
+constexpr bool
+cmp (const S &s1, const S &s2)
+{
+  for (int i = 0; i < sizeof (s1.s) / sizeof (s1.s[0]); i++)
+if (s1.s[i] != s2.s[i])
+  return false;
+  return true;
+}
+
+static_assert (cmp (to1, S{}));
+static_assert (cmp (to2, S{}));

Jakub



Re: [PATCH] debug: Fix another vector DECL_MODE ICE [PR98100]

2020-12-04 Thread Richard Biener
On Fri, 4 Dec 2020, Jakub Jelinek wrote:

> Hi!
> 
> The PR88587 fix changes DECL_MODE of vars with vector type during 
> inlining/cloning
> when the vars are copied, so that their DECL_MODE matches their TYPE_MODE in
> the new function.  Unfortunately, the following testcase still ICEs, the var
> isn't really used in the new function and so it isn't copied, but becomes
> just a nonlocalized var.  So we can't adjust its DECL_MODE because it
> appears in multiple functions and needs different modes in between them.
> The following patch changes the DEBUG_INSN creation to use TYPE_MODE instead
> of DECL_MODE for vars with vector types.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2020-12-03  Jakub Jelinek  
> 
>   PR target/98100
>   * cfgexpand.c (expand_gimple_basic_block): For vars with
>   vector type, use TYPE_MODE rather than DECL_MODE.
> 
>   * gcc.target/i386/pr98100.c: New test.
> 
> --- gcc/cfgexpand.c.jj  2020-11-26 01:14:47.443082924 +0100
> +++ gcc/cfgexpand.c   2020-12-03 14:25:07.772537435 +0100
> @@ -5919,7 +5919,7 @@ expand_gimple_basic_block (basic_block b
> && !target_for_debug_bind (var))
>   goto delink_debug_stmt;
>  
> -   if (DECL_P (var))
> +   if (DECL_P (var) && !VECTOR_TYPE_P (TREE_TYPE (var)))
>   mode = DECL_MODE (var);
> else
>   mode = TYPE_MODE (TREE_TYPE (var));
> @@ -5936,7 +5936,10 @@ expand_gimple_basic_block (basic_block b
>  
> value = gimple_debug_source_bind_get_value (stmt);
>  
> -   mode = DECL_MODE (var);
> +   if (!VECTOR_TYPE_P (TREE_TYPE (var)))
> + mode = DECL_MODE (var);
> +   else
> + mode = TYPE_MODE (TREE_TYPE (var));
>  
> val = gen_rtx_VAR_LOCATION (mode, var, (rtx)value,
>			      VAR_INIT_STATUS_UNINITIALIZED);
> --- gcc/testsuite/gcc.target/i386/pr98100.c.jj  2020-12-03 14:31:46.795106677 +0100
> +++ gcc/testsuite/gcc.target/i386/pr98100.c  2020-12-03 14:31:16.862439052 +0100
> @@ -0,0 +1,9 @@
> +/* PR target/98100 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-avx -fvar-tracking-assignments -g0" } */
> +
> +__attribute__((target_clones("default","avx2"))) void
> +foo ()
> +{
> +  __attribute__((__vector_size__(8 * sizeof(int)))) int b = {};
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] fold-const: Don't use build_constructor for non-aggregate types in native_encode_initializer [PR93121]

2020-12-04 Thread Richard Biener
On Fri, 4 Dec 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is rejected, because when trying to encode a zeroing
> CONSTRUCTOR, the code was using build_constructor to build initializers for
> the elements but when recursing the function handles CONSTRUCTOR only for
> aggregate types.
> 
> The following patch fixes that by using build_zero_cst instead for
> non-aggregates.  Another option would be add handling CONSTRUCTOR for
> non-aggregates in native_encode_initializer.  Or we can do both, I guess
> the middle-end generally doesn't like CONSTRUCTORs for scalar variables, but
> am not 100% sure if the FE doesn't produce those sometimes.
> 
> Ok for trunk if it passes bootstrap/regtest?

You can use build_zero_cst unconditionally; it already defaults to
building a CTOR for aggregates.  OK with that change.

Richard.

> So far it passed
> make check-c++-all RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=bit-cast*'
> 
> 2020-12-04  Jakub Jelinek  
> 
>   PR libstd++/93121
>   * fold-const.c (native_encode_initializer): Use build_constructor
>   only for aggregate types, otherwise use build_zero_cst.
> 
>   * g++.dg/cpp2a/bit-cast6.C: New test.
> 
> --- gcc/fold-const.c.jj   2020-12-03 15:37:25.795342398 +0100
> +++ gcc/fold-const.c  2020-12-04 11:25:54.949421799 +0100
> @@ -8104,11 +8104,16 @@ native_encode_initializer (tree init, un
>   {
> if (valueinit == -1)
>   {
> -   tree zero = build_constructor (TREE_TYPE (type), NULL);
> +   tree zero;
> +   if (AGGREGATE_TYPE_P (TREE_TYPE (type)))
> + zero = build_constructor (TREE_TYPE (type), NULL);
> +   else
> + zero = build_zero_cst (TREE_TYPE (type));
> r = native_encode_initializer (zero, ptr + curpos,
>fieldsize, 0,
>mask + curpos);
> -   ggc_free (zero);
> +   if (TREE_CODE (zero) == CONSTRUCTOR)
> + ggc_free (zero);
> if (!r)
>   return 0;
> valueinit = curpos;
> @@ -8255,8 +8260,13 @@ native_encode_initializer (tree init, un
>   {
> cnt--;
> field = fld;
> -   val = build_constructor (TREE_TYPE (fld), NULL);
> -   to_free = val;
> +   if (AGGREGATE_TYPE_P (TREE_TYPE (fld)))
> + {
> +   val = build_constructor (TREE_TYPE (fld), NULL);
> +   to_free = val;
> + }
> +   else
> + val = build_zero_cst (TREE_TYPE (fld));
>   }
>   }
>  
> --- gcc/testsuite/g++.dg/cpp2a/bit-cast6.C.jj  2020-12-04 11:36:12.963456560 +0100
> +++ gcc/testsuite/g++.dg/cpp2a/bit-cast6.C  2020-12-04 11:35:59.227611364 +0100
> @@ -0,0 +1,31 @@
> +// PR libstd++/93121
> +// { dg-do compile { target c++20 } }
> +
> +namespace std
> +{
> +enum class byte : unsigned char {};
> +template <typename To, typename From>
> +constexpr To
> +bit_cast (const From &from)
> +{
> +  return __builtin_bit_cast (To, from);
> +}
> +}
> +
> +struct S { unsigned short s[2]; };
> +constexpr std::byte from1[sizeof (S)]{};
> +constexpr auto to1 = std::bit_cast<S>(from1);
> +constexpr unsigned char from2[sizeof (S)]{};
> +constexpr auto to2 = std::bit_cast<S>(from2);
> +
> +constexpr bool
> +cmp (const S &s1, const S &s2)
> +{
> +  for (int i = 0; i < sizeof (s1.s) / sizeof (s1.s[0]); i++)
> +if (s1.s[i] != s2.s[i])
> +  return false;
> +  return true;
> +}
> +
> +static_assert (cmp (to1, S{}));
> +static_assert (cmp (to2, S{}));
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PR66791][ARM] Replace __builtin_neon_vcreate* for vcreate intrinsics

2020-12-04 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 3 Dec 2020 at 16:50, Kyrylo Tkachov  wrote:
>
> Hi Prathamesh,
>
> > -Original Message-
> > From: Prathamesh Kulkarni 
> > Sent: 03 December 2020 10:50
> > To: gcc Patches ; Kyrylo Tkachov
> > 
> > Subject: [PR66791][ARM] Replace __builtin_neon_vcreate* for vcreate
> > intrinsics
> >
> > Hi,
> > This patch replaces calls to __builtin_neon_vcreate* builtins for
> > vcreate intrinsics in arm_neon.h.
> > Cross-tested on arm*-*-*.
> > OK to commit ?
>
> Just remembered for this and the previous patch...
> Do we need to remove the builtins from being created in the backend if they 
> are now unused?
Hi Kyrill,
Indeed, I will resend patch(es) with builtins removed (if they're not
used in other places).

Thanks,
Prathamesh
>
> Thanks,
> Kyrill
>
> >
> > Thanks,
> > Prathamesh


Re: [PATCH] testsuite: Fix various scan-assembler-symbol-section issues

2020-12-04 Thread Iain Sandoe

Hi Rainer,

thanks for looking at this, I was trying to see how to fix the failing Darwin
tests last week, and concluded that the absence of target selectors/xfail
meant skipping some tests - this is a much better solution.

Rainer Orth  wrote:


I recently started looking into scan-assembler-symbol-section since all
tests using it were FAILing on Solaris/SPARC.  Unfortunately, the more I
looked the more issues I found, both with the implementation and the
interface.  This patch addresses some of those, but there are quite a
number of open questions.

* The first issue was that on Solaris/SPARC, section names are
 double-quoted, both with as and gas:

   .section".text"

 When using as, the section flag and type syntax is completely
 different from other ELF targets:

   .section"my_named_section",#alloc,#execinstr,#progbits

 This patch fixes this by stripping double quotes from section names.

* However, this didn't work initially (only the leading quote was
 stripped), which is due to David's recent AIX patch: with the
 introduction of the new capturing group to handle both .section (ELF)
 and .csect (XCOFF), $full_section_directive would never be empty on
 ELF and Mach-O targets, so the extraction of the section name didn't
 work any longer.  This had also broken the Darwin tests completely.

* Wondering why this hadn't been captured by the framework tests led me
 to

skipping test framework tests, CHECK_TEST_FRAMEWORK is not defined

 so this issue was all too easy to miss.  I'll get back to the
 framework tests later: there's a whole lot of problems there even if
 run.

* With working double quote stripping, all but one of the tests PASSed
 on Solaris/SPARC, the exception being:

FAIL: gcc.dg/20021029-1.c scan-assembler-symbol-section symbol ar (found  
__sparc_get_pc_thunk.l7) has section ^.(const|rodata)|[RO]  
(found .text.__sparc_get_pc_thunk.l7%__sparc_get_pc_thunk.l7)


 This is due to the symbol name (ar) not being anchored in the test and
 unexpectedly matching __sparc_get_pc_thunk.l7.  Easily fixed, but it
 left me wondering about the interface: currently, every user of
 scan-assembler-symbol-section has to deal with anchoring and handling
 USER_LABEL_PREFIX herself.  It seems to me that this were better
 handled in the framework instead.  However, in the case at hand the
 actual symbol name is "ar.0" and I wonder if this can change in the
 future, i.e. we might need the generality of regexps for the symbol
 name here.

* Next, I ran the tests on Darwin 11 and found two failing tests:

FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_a\$  
(symbol not found) has section .data
FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_b\$  
(symbol not found) has section .data


 is due to Iain's recent "Darwin : Begin rework of zero-fill sections."
 patch which emits

   .globl _a
   .zerofill __DATA,__common,_a,1,0

 This is already scanned for, so the two scans above can just go.

 The other failing test is

FAIL: g++.dg/gomp/tls-5.C  -std=c++14  scan-assembler-symbol-section  
symbol ^_?_ZGR2ir_\$ (symbol not found) has section  
^.tdata|[TL]
FAIL: g++.dg/gomp/tls-5.C  -std=c++14  scan-assembler-symbol-section  
symbol ^_?ir\$ (symbol not found) has section ^.tbss|[TL]


 Other scans are guarded by target tls_native, and indeed the assembler
 output has

___emutls_v._ZGR2ir_:
___emutls_t._ZGR2ir_:

___emutls_v.ir:


I had half a mind to test for those symbols for emulated TLS (since they
indicate the moral equivalent of placing the data in the special sections),
but this wasn't possible absent the selector / xfail.


 Unfortunately scan-assembler-symbol-section doesn't support selects
 yet, which this test implements both for the benefit of this test and
 for symmetry.


… so now we ought to be able to make the test meaningful on emulated TLS;
otherwise, it's just consuming CPU, and we might as well have:

+// { dg-require-effective-target tls_native }

at the top…


With those changes, test results are clean now on sparc-sun-solaris2.11,
i386-pc-solaris2.11, i386-apple-darwin11.4.2, and
powerpc-ibm-aix7.2.4.0.

On AIX 7.2, there are changes like

-PASS: g++.dg/gomp/tls-5.C  -std=c++2a  scan-assembler-symbol-section  
symbol ^_?ir$ (found ir) has section ^\\.tbss|\\[TL\\] (found  
_tls5.tls_[TL],4)
+PASS: g++.dg/gomp/tls-5.C  -std=c++2a  scan-assembler-symbol-section  
symbol ^_?ir$ (found ir) has section ^\\.tbss|\\[TL\\] (found  
_tls5.tls_[TL])


i.e. the ",4" after (?) the section name is now stripped.  I believe
this is benign: David?

Besides, I've documented scan-assembler-symbol-section and
scan-symbol-section in sourcebuild.texi as should have been done from
the beginning.

That's not the end of things, unfortunately:

* I find my selector handling code in dg-scan-symbol-section
 particularly ugly since it has to deal with the different argument
 indices in

 scan-

[PATCH] tree-optimization/98137 - enhance split_constant_offset range handling

2020-12-04 Thread Richard Biener
split_constant_offset currently gives up on range information when
looking through conversions with possibly wrapping operations if the
downstream analysis does not yield an SSA name.  That's overly
conservative, and we have a nice helper, determine_value_range, that can
deal with arbitrary expressions.  Use that.  This helps data reference
group analysis, so the testcase is fully SLP vectorized, making use of
the whole-function "BB" vectorization capabilities we now have.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

2020-12-04  Richard Biener  

PR tree-optimization/98137
* tree-data-ref.c (split_constant_offset_1): Use
determine_value_range instead of get_range_info to handle
arbitrary expressions.

* gcc.dg/vect/bb-slp-pr98137.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr98137.c | 27 ++
 gcc/tree-data-ref.c| 24 +++
 2 files changed, 41 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr98137.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr98137.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr98137.c
new file mode 100644
index 000..af43a1347ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr98137.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target vect_double } */
+
+void
+gemm (const double* __restrict__ A, const double* __restrict__ B,
+  double* __restrict__ C)
+{
+  unsigned int l_m = 0;
+  unsigned int l_n = 0;
+  unsigned int l_k = 0;
+
+  for ( l_n = 0; l_n < 9; l_n++ ) {
+/* Use -O3 so this loop is unrolled completely early.  */
+for ( l_m = 0; l_m < 10; l_m++ ) { C[(l_n*10)+l_m] = 0.0; }
+for ( l_k = 0; l_k < 17; l_k++ ) {
+  /* Use -O3 so this loop is unrolled completely early.  */
+  for ( l_m = 0; l_m < 10; l_m++ ) {
+C[(l_n*10)+l_m] += A[(l_k*20)+l_m] * B[(l_n*20)+l_k];
+  }
+}
+  }
+}
+
+/* Exact scannig is difficult but we expect all loads and stores
+   and computations to be vectorized.  */
+/* { dg-final { scan-tree-dump "optimized: basic block" "slp1" } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 3bf460cccfd..e8308ce8250 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -763,18 +763,22 @@ split_constant_offset_1 (tree type, tree op0, enum tree_code code, tree op1,
tree tmp_var, tmp_off;
split_constant_offset (op0, &tmp_var, &tmp_off, cache, limit);
 
-   /* See whether we have an SSA_NAME whose range is known
-  to be [A, B].  */
-   if (TREE_CODE (tmp_var) != SSA_NAME)
- return false;
+   /* See whether we have an known range [A, B] for tmp_var.  */
wide_int var_min, var_max;
-   value_range_kind vr_type = get_range_info (tmp_var, &var_min,
-  &var_max);
-   wide_int var_nonzero = get_nonzero_bits (tmp_var);
signop sgn = TYPE_SIGN (itype);
-   if (intersect_range_with_nonzero_bits (vr_type, &var_min,
-  &var_max, var_nonzero,
-  sgn) != VR_RANGE)
+   if (TREE_CODE (tmp_var) == SSA_NAME)
+ {
+   value_range_kind vr_type
+ = get_range_info (tmp_var, &var_min, &var_max);
+   wide_int var_nonzero = get_nonzero_bits (tmp_var);
+   if (intersect_range_with_nonzero_bits (vr_type, &var_min,
+  &var_max,
+  var_nonzero,
+  sgn) != VR_RANGE)
+ return false;
+ }
+   else if (determine_value_range (tmp_var, &var_min, &var_max)
+!= VR_RANGE)
  return false;
 
/* See whether the range of OP0 (i.e. TMP_VAR + TMP_OFF)
-- 
2.26.2


Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Jan Hubicka
> [AMD Official Use Only - Internal Distribution Only]
> 
> Hi Maintainers,
> 
> PFA, the patch that enables support for the next generation AMD Zen3 CPU via 
> -march=znver3.
> This is a very basic enablement patch. As of now the cost, tuning and 
> scheduler changes are kept same as znver2.
> Further changes to the cost and tunings will be done later.

Hello,
the changes to x86-tune.def and x86-tune-sched.c seem fine to me.
There is one patch at
https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545415.html
I did not see a significant difference on SPEC, but do we want to change
the default?

Honza
> 
> Ok for trunk ?
> 
> Regards,
> Venkat.




Re: [PATCH 1/2] Switch to a new section if the SECTION_RETAIN bit doesn't match

2020-12-04 Thread Jozef Lawrynowicz
Hi H.J.,

On Thu, Dec 03, 2020 at 04:06:51PM -0800, H.J. Lu via Gcc-patches wrote:
> When definitions marked with used attribute and unmarked definitions are
> placed in the same section, switch to a new section if the SECTION_RETAIN
> bit doesn't match.

GAS doesn't create separate sections for
  .section .data.foo,"aw"
  .section .data.foo,"awR"
A single .data.foo section, with SHF_GNU_RETAIN set, is output to the
object file.

This is because the addition of SHF_GNU_RETAIN support to the "used"
attribute was not supposed to be modifying the layout of sections in the
object file.

Now we are placing "used" decls in a new, unique section (when they
don't already have a unique section), I suppose it is fine to have two
separate sections for retained/non-retained .data.foo in the object
file.

It does feel like a bit of a kludge that we end up with
  [ 4] .data.foo PROGBITS  40 06 00  WA  0   0  1
  [ 5] .data.foo PROGBITS  46 06 00 WAR  0   0  1
but it is beneficial to the user, since linker garbage collection
will remove more unused parts of their program.

Your commit "2be7ea19b3 Add SEC_ASSEMBLER_SHF_MASK" on x86-binutils
implements this, it just needs some fix ups to apply cleanly.

Note that when .section is used without any section attributes
specified, then the non-SHF_GNU_RETAIN section is switched to.
I would expect it to switch to the most recently switched to section.

After applying your above SEC_ASSEMBLER_SHF_MASK commit, the following
assembly code:

  .section .data.foo,"aw"
  .word 0
  .section .data.foo
  .word 0
  .section .data.foo,"awR"
  .word 0
  .section .data.foo
  .word 0

results in the following section layout
  sh_size
  [ 4] .data.foo PROGBITS  40 06 00  WA  0   0  1
  [ 5] .data.foo PROGBITS  46 02 00 WAR  0   0  1

As you can see, the final ".section .data.foo" instance has been
associated with the non-SHF_GNU_RETAIN section.
To align with this GCC patch, the final ".section .data.foo" directive
should in fact switch to the most recently switched to section, which
would be the SHF_GNU_RETAIN instance of .data.foo.
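To make the expected behavior concrete, here is a toy model, not GAS source, in which sections are keyed by name plus the R flag and a bare `.section NAME` reselects the most recently used section of that name. The names (`directive_section`, `directive_word`) are hypothetical, and `.word` is assumed to emit 2 bytes, which matches the sh_size values shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of an assembler's section table (hypothetical, not GAS).
   Sections are identified by name plus the retain ('R') flag.  */
struct sec
{
  const char *name;
  bool retain;
  unsigned size;       /* Bytes emitted so far.  */
  unsigned last_used;  /* Tick of the most recent switch to this section.  */
};

#define MAX_SECS 16
static struct sec secs[MAX_SECS];
static size_t nsecs;
static struct sec *current;
static unsigned tick;

static struct sec *
find_or_create (const char *name, bool retain)
{
  for (size_t i = 0; i < nsecs; i++)
    if (strcmp (secs[i].name, name) == 0 && secs[i].retain == retain)
      return &secs[i];
  assert (nsecs < MAX_SECS);
  secs[nsecs] = (struct sec) { name, retain, 0, 0 };
  return &secs[nsecs++];
}

/* Model ".section NAME,FLAGS"; FLAGS == NULL models a bare ".section NAME",
   which here reselects the most recently used section called NAME.  */
static void
directive_section (const char *name, const char *flags)
{
  struct sec *s = NULL;
  if (flags == NULL)
    {
      for (size_t i = 0; i < nsecs; i++)
        if (strcmp (secs[i].name, name) == 0
            && (s == NULL || secs[i].last_used > s->last_used))
          s = &secs[i];
      if (s == NULL)
        s = find_or_create (name, false);
    }
  else
    s = find_or_create (name, strchr (flags, 'R') != NULL);
  s->last_used = ++tick;
  current = s;
}

/* Model ".word 0", assumed to emit 2 bytes.  */
static void
directive_word (void)
{
  assert (current != NULL);
  current->size += 2;
}
```

Replaying the eight directives from the example through this model leaves 4 bytes in each .data.foo section, instead of the 6/2 split reported above.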

I've included some other comments inline with the patch

Thanks,
Jozef

> 
> gcc/
> 
>   PR other/98121
>   * output.h (switch_to_section): Add a tree argument, default to
>   nullptr.
>   * varasm.c (get_section): If the SECTION_RETAIN bit doesn't match,
>   return and switch to a new section later.
>   (assemble_start_function): Pass decl to switch_to_section.
>   (assemble_variable): Likewise.
>   (switch_to_section): If the SECTION_RETAIN bit doesn't match,
>   switch to a new section.
> 
> gcc/testsuite/
> 
>   PR other/98121
>   * c-c++-common/attr-used-5.c: New test.
>   * c-c++-common/attr-used-6.c: Likewise.
>   * c-c++-common/attr-used-7.c: Likewise.
>   * c-c++-common/attr-used-8.c: Likewise.
> ---
>  gcc/output.h |  2 +-
>  gcc/testsuite/c-c++-common/attr-used-5.c | 26 ++
>  gcc/testsuite/c-c++-common/attr-used-6.c | 26 ++
>  gcc/testsuite/c-c++-common/attr-used-7.c |  8 +++
>  gcc/testsuite/c-c++-common/attr-used-8.c |  8 +++
>  gcc/varasm.c | 28 
>  6 files changed, 93 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/attr-used-5.c
>  create mode 100644 gcc/testsuite/c-c++-common/attr-used-6.c
>  create mode 100644 gcc/testsuite/c-c++-common/attr-used-7.c
>  create mode 100644 gcc/testsuite/c-c++-common/attr-used-8.c
> 
> diff --git a/gcc/output.h b/gcc/output.h
> index fa8ace1f394..1f9af46da1d 100644
> --- a/gcc/output.h
> +++ b/gcc/output.h
> @@ -548,7 +548,7 @@ extern void switch_to_other_text_partition (void);
>  extern section *get_cdtor_priority_section (int, bool);
>  
>  extern bool unlikely_text_section_p (section *);
> -extern void switch_to_section (section *);
> +extern void switch_to_section (section *, tree = nullptr);
>  extern void output_section_asm_op (const void *);
>  
>  extern void record_tm_clone_pair (tree, tree);
> diff --git a/gcc/testsuite/c-c++-common/attr-used-5.c b/gcc/testsuite/c-c++-common/attr-used-5.c
> new file mode 100644
> index 000..9fc0d3834e9
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/attr-used-5.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -O2" } */
> +
> +struct dtv_slotinfo_list
> +{
> +  struct dtv_slotinfo_list *next;
> +};
> +
> +extern struct dtv_slotinfo_list *list;
> +
> +static int __attribute__ ((section ("__libc_freeres_fn")))
> +free_slotinfo (struct dtv_slotinfo_list **elemp)
> +{
> +  if (!free_slotinfo (&(*elemp)->next))
> +return 0;
> +  return 1;
> +}
> +
> +__attribute__ ((used, section ("__libc_freeres_fn")))
> +static void free_mem (void)
> +{
> +  free_slotinfo (&list);
> +}
> +
> +/* { d

Re: [PATCH 0/2] Switch to a new section if the SECTION_RETAIN bit doesn't match

2020-12-04 Thread Jozef Lawrynowicz
On Thu, Dec 03, 2020 at 04:06:50PM -0800, H.J. Lu via Gcc-patches wrote:
> When SECTION_RETAIN is used, definitions marked with used attribute and
> unmarked definitions are placed in the same section.  Instead of issuing
> an error:
> 
> [hjl@gnu-cfl-2 gcc]$ /usr/gcc-11.0.0-x32/bin/gcc -S c.c -fdiagnostics-plain-output
> c.c:2:49: error: ‘foo1’ causes a section type conflict with ‘foo2’
> c.c:1:54: note: ‘foo2’ was declared here
> [hjl@gnu-cfl-2 gcc]$
> 
> the first patch switches to a new section if the SECTION_RETAIN bit
> doesn't match.  The second optional patch issues a warning:
> 
> [hjl@gnu-cfl-2 gcc]$ ./xgcc -B./ -S c.c -fdiagnostics-plain-output
> c.c:2:49: warning: ‘foo1’ without ‘used’ attribute is placed in a section with ‘foo2’ with ‘used’ attribute [-Wattributes]
> [hjl@gnu-cfl-2 gcc]$

I think the warning is useful, since we are modifying the structure of
the object file where the user may not expect it. It ensures they review
which declarations have "used" applied so they don't unexpectedly lose
parts of their program they wanted to keep by putting them in a
section that was marked "used" elsewhere.

> 
> H.J. Lu (2):
>   Switch to a new section if the SECTION_RETAIN bit doesn't match
>   Warn used and not used symbols in the same section

We should probably use a new PR to associate with these patches, rather
than PR/98121.

Your changes here address the issue exposed by glibc code, whilst 98121
was for the broader issue of whether "used" should apply SHF_GNU_RETAIN.

Let me know if you agree, and I'll create a new GCC PR for the specific
issue that was exposed by glibc. We should then mark 98121 as
resolved/rejected, since we are not changing whether "used" applies
SHF_GNU_RETAIN.

Thanks,
Jozef
> 
>  gcc/output.h |  2 +-
>  gcc/testsuite/c-c++-common/attr-used-5.c | 27 +++
>  gcc/testsuite/c-c++-common/attr-used-6.c | 27 +++
>  gcc/testsuite/c-c++-common/attr-used-7.c |  9 +
>  gcc/testsuite/c-c++-common/attr-used-8.c |  9 +
>  gcc/varasm.c | 43 +---
>  6 files changed, 112 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/attr-used-5.c
>  create mode 100644 gcc/testsuite/c-c++-common/attr-used-6.c
>  create mode 100644 gcc/testsuite/c-c++-common/attr-used-7.c
>  create mode 100644 gcc/testsuite/c-c++-common/attr-used-8.c
> 
> -- 
> 2.28.0
> 


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Honza,

> -Original Message-
> From: Jan Hubicka 
> Sent: Friday, December 4, 2020 5:25 PM
> To: Kumar, Venkataramanan 
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak 
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen3 CPU
> 
> > [AMD Official Use Only - Internal Distribution Only]
> >
> > Hi Maintainers,
> >
> > PFA, the patch that enables support for the next generation AMD Zen3
> CPU via -march=znver3.
> > This is a very basic enablement patch. As of now the cost, tuning and
> scheduler changes are kept same as znver2.
> > Further changes to the cost and tunings will be done later.
> 
> Hello,
> the changes to x86-tune.def and x86-tune-sched.c seem fine to me.
> There is one patch on
> https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545415.html
> I did not see a significant difference on SPEC, but do we want to change the
> default?

You mean the tune "X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL" has no impact on
SPEC and other benchmarks?
I have not done performance experiments with this tune on Zen machines.   

Let me check on this and get back to you. 

> 
> Honza
> >
> > Ok for trunk ?
> >
> > Regards,
> > Venkat.
> 

Regards,
Venkat.


Re: [PATCH] testsuite: Fix various scan-assembler-symbol-section issues

2020-12-04 Thread David Edelsohn via Gcc-patches
On Fri, Dec 4, 2020 at 5:35 AM Rainer Orth  
wrote:

> On AIX 7.2, there are changes like
>
> -PASS: g++.dg/gomp/tls-5.C  -std=c++2a  scan-assembler-symbol-section symbol ^_?ir$ (found ir) has section ^\\.tbss|\\[TL\\] (found _tls5.tls_[TL],4)
> +PASS: g++.dg/gomp/tls-5.C  -std=c++2a  scan-assembler-symbol-section symbol ^_?ir$ (found ir) has section ^\\.tbss|\\[TL\\] (found _tls5.tls_[TL])
>
> i.e. the ",4" after (?) the section name is now stripped.  I believe
> this is benign: David?

The ",4" is the symbol alignment.  It is not necessary for the purpose
of the tests.
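The scan-assembler machinery itself is written in Tcl for DejaGnu; purely to illustrate the normalization being discussed, this hypothetical C helper drops a trailing `,<digits>` operand such as the alignment in `_tls5.tls_[TL],4` and leaves every other string unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy IN to OUT (at most OUTLEN bytes including the
   NUL), dropping a trailing ",<digits>" operand such as the alignment in
   "_tls5.tls_[TL],4".  Anything else is copied unchanged.  */
static void
strip_alignment_suffix (const char *in, char *out, size_t outlen)
{
  assert (in != NULL && out != NULL && outlen > 0);
  size_t len = strlen (in);
  const char *comma = strrchr (in, ',');
  if (comma != NULL && comma[1] != '\0')
    {
      const char *p = comma + 1;
      while (isdigit ((unsigned char) *p))
        p++;
      if (*p == '\0')
        len = (size_t) (comma - in);   /* Only digits follow the comma.  */
    }
  if (len >= outlen)
    len = outlen - 1;                  /* Truncate rather than overflow.  */
  memcpy (out, in, len);
  out[len] = '\0';
}
```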

Thanks for looking further into this problem.  As I mentioned in my
earlier reply to the patch itself, I believe that this new feature and
infrastructure change should have been tested and fixed on
non-Linux/ELF/x86 architectures, not left as an exercise for the
maintainers of other targets.  A patch that introduces regressions in
the testsuite should be fixed or reverted and should be the
responsibility of the author -- whether the change is to the compiler
or to the testsuite.

Thanks, David


Re: [PATCH 0/2] Switch to a new section if the SECTION_RETAIN bit doesn't match

2020-12-04 Thread H.J. Lu via Gcc-patches
On Fri, Dec 4, 2020 at 4:17 AM Jozef Lawrynowicz
 wrote:
>
> On Thu, Dec 03, 2020 at 04:06:50PM -0800, H.J. Lu via Gcc-patches wrote:
> > When SECTION_RETAIN is used, definitions marked with used attribute and
> > unmarked definitions are placed in the same section.  Instead of issuing
> > an error:
> >
> > [hjl@gnu-cfl-2 gcc]$ /usr/gcc-11.0.0-x32/bin/gcc -S c.c -fdiagnostics-plain-output
> > c.c:2:49: error: ‘foo1’ causes a section type conflict with ‘foo2’
> > c.c:1:54: note: ‘foo2’ was declared here
> > [hjl@gnu-cfl-2 gcc]$
> >
> > the first patch switches to a new section if the SECTION_RETAIN bit
> > doesn't match.  The second optional patch issues a warning:
> >
> > [hjl@gnu-cfl-2 gcc]$ ./xgcc -B./ -S c.c -fdiagnostics-plain-output
> > c.c:2:49: warning: ‘foo1’ without ‘used’ attribute is placed in a section with ‘foo2’ with ‘used’ attribute [-Wattributes]
> > [hjl@gnu-cfl-2 gcc]$
>
> I think the warning is useful, since we are modifying the structure of
> the object file where the user may not expect it. It ensures they review
> which declarations have "used" applied so they don't unexpectedly lose
> parts of their program they wanted to keep by putting them in a
> section that was marked "used" elsewhere.

I agree.

> >
> > H.J. Lu (2):
> >   Switch to a new section if the SECTION_RETAIN bit doesn't match
> >   Warn used and not used symbols in the same section
>
> We should probably use a new PR to associate with these patches, rather
> than PR/98121.
>
> Your changes here address the issue exposed by glibc code, whilst 98121
> was for the broader issue of whether "used" should apply SHF_GNU_RETAIN.
>
> Let me know if you agree, and I'll create a new GCC PR for the specific

Please do.

> issue that was exposed by glibc. We should then mark 98121 as
> resolved/rejected, since we are not changing whether "used" applies
> SHF_GNU_RETAIN.
>

Thanks.

-- 
H.J.


c++: Module API declarations

2020-12-04 Thread Nathan Sidwell
Here are the declarations for module.cc.  I'll fill these in with
nop-stubs when adding the remaining pieces of the modules
infrastructure, finally replacing the contents of module.cc with the
real thing when victory is within reach.


This provides the inline predicates about module state, and declares
the functions to be provided.

gcc/cp/
* cp-tree.h: Add various inline module state predicates, and
declare the API that will be provided by module.cc.

pushing to trunk
--
Nathan Sidwell
diff --git c/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index 081ede24e96..722379445cb 100644
--- c/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -6884,6 +6884,103 @@ extern bool ctor_omit_inherited_parms		(tree);
 extern tree locate_ctor(tree);
 extern tree implicitly_declare_fn   (special_function_kind, tree,
 		 bool, tree, tree);
+/* In module.cc  */
+class module_state; /* Forward declare.  */
+inline bool modules_p () { return flag_modules != 0; }
+
+/* The kind of module or part thereof that we're in.  */
+enum module_kind_bits
+{
+  MK_MODULE = 1 << 0, /* This TU is a module.  */
+  MK_GLOBAL = 1 << 1, /* Entities are in the global module.  */
+  MK_INTERFACE = 1 << 2,  /* This TU is an interface.  */
+  MK_PARTITION = 1 << 3,  /* This TU is a partition.  */
+  MK_EXPORTING = 1 << 4,  /* We are in an export region.  */
+};
+
+/* We do lots of bit-manipulation, so an unsigned is easier.  */
+extern unsigned module_kind;
+
+/*  MK_MODULE & MK_GLOBAL have the following combined meanings:
+ MODULE GLOBAL
+   0	  0	not a module
+   0	  1	GMF of named module (we've not yet seen module-decl)
+   1	  0	purview of named module
+   1	  1	header unit.   */
+
+inline bool module_purview_p ()
+{ return module_kind & MK_MODULE; }
+inline bool global_purview_p ()
+{ return module_kind & MK_GLOBAL; }
+
+inline bool not_module_p ()
+{ return (module_kind & (MK_MODULE | MK_GLOBAL)) == 0; }
+inline bool named_module_p ()
+{ /* This is a named module if exactly one of MODULE and GLOBAL is
+ set.  */
+  /* The divides are constant shifts!  */
+  return ((module_kind / MK_MODULE) ^ (module_kind / MK_GLOBAL)) & 1;
+}
+inline bool header_module_p ()
+{ return (module_kind & (MK_MODULE | MK_GLOBAL)) == (MK_MODULE | MK_GLOBAL); }
+inline bool named_module_purview_p ()
+{ return (module_kind & (MK_MODULE | MK_GLOBAL)) == MK_MODULE; }
+inline bool module_interface_p ()
+{ return module_kind & MK_INTERFACE; }
+inline bool module_partition_p ()
+{ return module_kind & MK_PARTITION; }
+inline bool module_has_cmi_p ()
+{ return module_kind & (MK_INTERFACE | MK_PARTITION); }
+
+/* We're currently exporting declarations.  */
+inline bool module_exporting_p ()
+{ return module_kind & MK_EXPORTING; }
+
+extern module_state *get_module (tree name, module_state *parent = NULL,
+ bool partition = false);
+extern bool module_may_redeclare (tree decl);
+
+extern int module_initializer_kind ();
+extern void module_add_import_initializers ();
+
+/* Where the namespace-scope decl was originally declared.  */
+extern void set_originating_module (tree, bool friend_p = false);
+extern tree get_originating_module_decl (tree) ATTRIBUTE_PURE;
+extern int get_originating_module (tree, bool for_mangle = false) ATTRIBUTE_PURE;
+extern unsigned get_importing_module (tree, bool = false) ATTRIBUTE_PURE;
+
+/* Where current instance of the decl got declared/defined/instantiated.  */
+extern void set_instantiating_module (tree);
+extern void set_defining_module (tree);
+extern void maybe_attach_decl (tree ctx, tree decl);
+
+extern void mangle_module (int m, bool include_partition);
+extern void mangle_module_fini ();
+extern void lazy_load_binding (unsigned mod, tree ns, tree id,
+			   binding_slot *bslot);
+extern void lazy_load_specializations (tree tmpl);
+extern void lazy_load_members (tree decl);
+extern bool lazy_specializations_p (unsigned, bool, bool);
+extern module_state *preprocess_module (module_state *, location_t,
+	bool in_purview, 
+	bool is_import, bool export_p,
+	cpp_reader *reader);
+extern void preprocessed_module (cpp_reader *reader);
+extern void import_module (module_state *, location_t, bool export_p,
+			   tree attr, cpp_reader *);
+extern void declare_module (module_state *, location_t, bool export_p,
+			tree attr, cpp_reader *);
+extern void init_modules (cpp_reader *);
+extern void fini_modules ();
+extern void maybe_check_all_macros (cpp_reader *);
+extern void finish_module_processing (cpp_reader *);
+extern char const *module_name (unsigned, bool header_ok);
+extern bitmap get_import_bitmap ();
+extern bitmap module_visible_instantiation_path (bitmap *);
+extern void module_begin_main_file (cpp_reader *, line_maps *,
+const line_map_ordinary *);
+extern void module_preprocess_options (cpp_reader *);
+extern bool handle_module_option (unsigned opt, const char *arg, int value);
 
 /* In optimize.c */
 extern bool maybe_clone_body			(tree);
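
As a sanity check on the MODULE/GLOBAL table and the XOR trick in named_module_p, here is a standalone C transcription of a few of the predicates. It is hypothetical: it takes the kind as a parameter instead of reading the global module_kind, and the tu_state enum and classify helper are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical standalone copy of the bits from the cp-tree.h hunk.  */
enum module_kind_bits
{
  MK_MODULE = 1 << 0,
  MK_GLOBAL = 1 << 1,
  MK_INTERFACE = 1 << 2,
  MK_PARTITION = 1 << 3,
  MK_EXPORTING = 1 << 4
};

static inline bool not_module_p (unsigned k)
{ return (k & (MK_MODULE | MK_GLOBAL)) == 0; }

/* Exactly one of MODULE and GLOBAL: dividing by a power-of-two constant
   is a shift, so this computes (bit 0) XOR (bit 1).  */
static inline bool named_module_p (unsigned k)
{ return ((k / MK_MODULE) ^ (k / MK_GLOBAL)) & 1; }

static inline bool header_module_p (unsigned k)
{ return (k & (MK_MODULE | MK_GLOBAL)) == (MK_MODULE | MK_GLOBAL); }

static inline bool named_module_purview_p (unsigned k)
{ return (k & (MK_MODULE | MK_GLOBAL)) == MK_MODULE; }

/* The four rows of the MODULE/GLOBAL table.  */
enum tu_state { TU_NOT_MODULE, TU_GMF, TU_PURVIEW, TU_HEADER_UNIT };

static enum tu_state
classify (unsigned k)
{
  /* Exactly one of these three predicates holds for any K.  */
  assert (not_module_p (k) + named_module_p (k) + header_module_p (k) == 1);
  if (not_module_p (k))
    return TU_NOT_MODULE;
  if (header_module_p (k))
    return TU_HEADER_UNIT;
  return named_module_purview_p (k) ? TU_PURVIEW : TU_GMF;
}
```

classify (MK_GLOBAL) yields TU_GMF and classify (MK_MODULE | MK_GLOBAL) yields TU_HEADER_UNIT, matching the table; the other bits (interface, partition, exporting) do not disturb the XOR.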


Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-12-04 Thread Martin Liška

On 12/4/20 10:03 AM, Richard Biener wrote:

> Otherwise 0001- looks good to me.


Pushed that to master.


> As said I'd like to see opinions from others on the
> driver / backend communication for 0002.


To be honest, we moved back to the original implementation which used
a temporary file. There hasn't been any opinion for the last 8 months :(

Martin


Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-12-04 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 04, 2020 at 02:30:45PM +0100, Martin Liška wrote:
> On 12/4/20 10:03 AM, Richard Biener wrote:
> > Otherwise 0001- looks good to me.
> 
> Pushed that to master.
> 
> > As said I'd like to see opinions
> > from others on the
> > driver / backend communication for 0002.
> 
> To be honest, we moved back to the original implementation which used
> a temporary file. There hasn't been any opinion for the last 8 months :(

Using an environment variable rather than a temporary file must be faster.

Jakub



Re: [PATCH] RISC-V: Canonicalize --with-arch

2020-12-04 Thread Matthias Klose
On 12/4/20 9:07 AM, Kito Cheng via Gcc-patches wrote:
> Committed, thanks :)
> 
> On Thu, Dec 3, 2020 at 8:51 AM Jim Wilson  wrote:
>>
>> On Tue, Dec 1, 2020 at 12:13 AM Kito Cheng  wrote:
>>>
>>>  - We would like to canonicalize the arch string for --with-arch for
>>>easier handling multilib, so split canonicalization part to a stand
>>>along script to shared the logic.
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/multilib-generator (arch_canonicalize): Move
>>> code to arch-canonicalize, and call that script to canonicalize arch
>>> string.
>>> (canonical_order): Move code to arch-canonicalize.
>>> (LONG_EXT_PREFIXES): Ditto.
>>> (IMPLIED_EXT): Ditto.
>>> * config/riscv/arch-canonicalize: New.
>>> * config.gcc (riscv*-*-*): Canonicalize --with-arch.
>>
>>
>> Looks OK to me.

that breaks the bootstrap if python is not available. The python command might
not be available, so please check for python3, python, or python2.

And it adds an unconditional build dependency on python for building the riscv
targets.

Matthias


Re: [PATCH] implement pre-c++20 contracts

2020-12-04 Thread Jeff Chapman via Gcc-patches
> OK, I'll start with -alt then, thanks.

Andrew is exactly correct, contracts-jac-alt is still the current branch
we're focusing our upstreaming efforts on.

It's trailing upstream master by a fair bit at this point. I'll get a merge
pushed shortly.

Please let me know if there's anything I can do to help the review along!


On Thu, Dec 3, 2020 at 12:41 PM Jason Merrill  wrote:

> On 12/3/20 12:07 PM, Andrew Sutton wrote:
> >
> >  > Attached is a new squashed revision of the patch sans ChangeLogs. The
> >  > current work is now being done on github:
> >  > https://github.com/lock3/gcc/tree/contracts-jac-alt
> > 
> >
> > I'm starting to review this now, sorry for the delay. Is this still the
> > branch you want me to consider for GCC 11?  I notice that the -constexpr
> > and -mangled-config branches are newer.
> >
> >
> > I think so. Jeff can answer more authoritatively. I know we had one set
> > of changes to the design (how contracts work) aimed at improving the
> > debugging experience for violated contracts. I'm not sure if that's in
> > the jac-alt branch though.
> >
> > The -constexpr branch checks for trivially satisfied contracts (e.g.,
> > [[assert: true]]) and issues warnings. It also preemptively checks
> > preconditions against constant function arguments. It's probably worth
> > reviewing that separately.
> >
> > I'm not sure the -mangled-config branch is worth considering for merging
> > at this point. It's trying to solve a problem that might not be worth
> > solving.
>
> OK, I'll start with -alt then, thanks.
>
> > Out of curiosity, are you concerned that future versions of contracts
> > might have considerably different syntax or configurability? I'd hope it
> > wouldn't, but who knows where SG21 is going :)
>
> Not particularly; I figure that most of the implementation would be
> unaffected.
>
> Jason
>
>


Re: [PATCH] RISC-V: Canonicalize --with-arch

2020-12-04 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 04, 2020 at 02:38:54PM +0100, Matthias Klose wrote:
> On 12/4/20 9:07 AM, Kito Cheng via Gcc-patches wrote:
> > Committed, thanks :)
> > 
> > On Thu, Dec 3, 2020 at 8:51 AM Jim Wilson  wrote:
> >>
> >> On Tue, Dec 1, 2020 at 12:13 AM Kito Cheng  wrote:
> >>>
> >>>  - We would like to canonicalize the arch string for --with-arch for
> >>>easier handling multilib, so split canonicalization part to a stand
> >>>along script to shared the logic.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> * config/riscv/multilib-generator (arch_canonicalize): Move
> >>> code to arch-canonicalize, and call that script to canonicalize arch
> >>> string.
> >>> (canonical_order): Move code to arch-canonicalize.
> >>> (LONG_EXT_PREFIXES): Ditto.
> >>> (IMPLIED_EXT): Ditto.
> >>> * config/riscv/arch-canonicalize: New.
> >>> * config.gcc (riscv*-*-*): Canonicalize --with-arch.
> >>
> >>
> >> Looks OK to me.
> 
> that breaks the bootstrap if python is not available. The python command might
> not be available, so please check for python3, python, or python2.
> 
> And it adds an unconditional build dependency on python for building the riscv
> targets.

Yeah, doing it in awk or shell might be better.  We do use python for
various things, but generally try not to require it for build and make
check, e.g. some contrib/ scripts used during build and make check have
shell variants etc.

Jakub



Re: [PATCH v2] Add --ld-path= to specify an arbitrary executable as the linker

2020-12-04 Thread Martin Liška

PING

May I please ping the patch? It has been waiting here for a review
for quite some time.

Thanks,
Martin

On 7/23/20 12:17 PM, Martin Liška wrote:

On 7/21/20 6:07 AM, Fangrui Song wrote:

If the value does not contain any path component separator (e.g. a
slash), the linker will be searched for using COMPILER_PATH followed by
PATH. Otherwise, it is either an absolute path or a path relative to the
current working directory.

--ld-path= complements and overrides -fuse-ld={bfd,gold,lld}. If in the
future, we want to make different linker option decisions, we can let
-fuse-ld= represent the linker flavor and --ld-path= the linker path.
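The lookup rule described above can be sketched roughly as follows. This is a hypothetical, POSIX-only illustration, not the collect2.c code: it reads the COMPILER_PATH and PATH environment variables directly (collect2 builds its own search lists), treats only '/' as a separator, skips empty PATH entries, and checks executability with access (X_OK).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Search each directory in the colon-separated DIRS for an executable
   NAME.  On success, write the full path into OUT and return true.  */
static bool
search_dirs (const char *dirs, const char *name, char *out, size_t outlen)
{
  if (dirs == NULL)
    return false;
  for (const char *p = dirs;;)
    {
      const char *end = strchr (p, ':');
      size_t dlen = end ? (size_t) (end - p) : strlen (p);
      if (dlen > 0)   /* Simplification: empty entries are skipped.  */
        {
          int n = snprintf (out, outlen, "%.*s/%s", (int) dlen, p, name);
          if (n > 0 && (size_t) n < outlen && access (out, X_OK) == 0)
            return true;
        }
      if (end == NULL)
        return false;
      p = end + 1;
    }
}

/* Resolve a --ld-path= VALUE: a bare name is searched for in
   COMPILER_PATH, then PATH; anything containing '/' is taken as an
   absolute path or a path relative to the current directory.  */
static bool
resolve_ld_path (const char *value, char *out, size_t outlen)
{
  assert (value != NULL && out != NULL);
  if (strchr (value, '/') == NULL)
    return search_dirs (getenv ("COMPILER_PATH"), value, out, outlen)
           || search_dirs (getenv ("PATH"), value, out, outlen);
  if (access (value, X_OK) != 0)
    return false;
  int n = snprintf (out, outlen, "%s", value);
  return n >= 0 && (size_t) n < outlen;
}
```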


Hello.

I have just few nits:

=== ERROR type #3: trailing operator (1 error(s)) ===
gcc/collect2.c:1155:14:    ld_file_name =



PR driver/93645
* common.opt (--ld-path=): Add --ld-path=
* opts.c (common_handle_option): Handle OPT__ld_path_.
* gcc.c (driver_handle_option): Likewise.
* collect2.c (main): Likewise.
* doc/invoke.texi: Document --ld-path=.

---
Changes in v2:
* Renamed -fld-path= to --ld-path= (clang 12.0.0 new option).
   The option does not affect code generation and is not a language feature,
   so -f* is not suitable. Additionally, clang has other similar --*-path=
   options, e.g. --cuda-path=.
---
  gcc/collect2.c  | 63 +++--
  gcc/common.opt  |  4 +++
  gcc/doc/invoke.texi |  9 +++
  gcc/gcc.c   |  2 +-
  gcc/opts.c  |  1 +
  5 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/gcc/collect2.c b/gcc/collect2.c
index f8a5ce45994..caa1b96ab52 100644
--- a/gcc/collect2.c
+++ b/gcc/collect2.c
@@ -844,6 +844,7 @@ main (int argc, char **argv)
    const char **ld1;
    bool use_plugin = false;
    bool use_collect_ld = false;
+  const char *ld_path = NULL;
    /* The kinds of symbols we will have to consider when scanning the
   outcome of a first pass link.  This is ALL to start with, then might
@@ -961,12 +962,21 @@ main (int argc, char **argv)
  if (selected_linker == USE_DEFAULT_LD)
    selected_linker = USE_PLUGIN_LD;
    }
-    else if (strcmp (argv[i], "-fuse-ld=bfd") == 0)
-  selected_linker = USE_BFD_LD;
-    else if (strcmp (argv[i], "-fuse-ld=gold") == 0)
-  selected_linker = USE_GOLD_LD;
-    else if (strcmp (argv[i], "-fuse-ld=lld") == 0)
-  selected_linker = USE_LLD_LD;
+    else if (strncmp (argv[i], "-fuse-ld=", 9) == 0
+ && selected_linker != USE_LD_MAX)
+  {
+    if (strcmp (argv[i] + 9, "bfd") == 0)
+  selected_linker = USE_BFD_LD;
+    else if (strcmp (argv[i] + 9, "gold") == 0)
+  selected_linker = USE_GOLD_LD;
+    else if (strcmp (argv[i] + 9, "lld") == 0)
+  selected_linker = USE_LLD_LD;
+  }
+    else if (strncmp (argv[i], "--ld-path=", 10) == 0)
+  {
+    ld_path = argv[i] + 10;
+    selected_linker = USE_LD_MAX;
+  }
  else if (strncmp (argv[i], "-o", 2) == 0)
    {
  /* Parse the output filename if it's given so that we can make
@@ -1117,14 +1127,34 @@ main (int argc, char **argv)
    ld_file_name = find_a_file (&cpath, collect_ld_suffix, X_OK);
    use_collect_ld = ld_file_name != 0;
  }
-  /* Search the compiler directories for `ld'.  We have protection against
- recursive calls in find_a_file.  */
-  if (ld_file_name == 0)
-    ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], X_OK);
-  /* Search the ordinary system bin directories
- for `ld' (if native linking) or `TARGET-ld' (if cross).  */
-  if (ld_file_name == 0)
+    ld_file_name = find_a_file (&path, full_ld_suffixes[selected_linker], X_OK);
+  if (selected_linker == USE_LD_MAX)
+    {
+  /* If --ld-path= does not contain a path component separator, search for
+ the command using cpath, then using path.  Otherwise find the linker
+ relative to the current working directory.  */
+  if (lbasename (ld_path) == ld_path)
+    {
+  ld_file_name = find_a_file (&cpath, ld_path, X_OK);
+  if (ld_file_name == 0)
+    ld_file_name = find_a_file (&path, ld_path, X_OK);
+    }
+  else if (file_exists (ld_path))
+    {


^^^ these braces are not needed.


+  ld_file_name = ld_path;
+    }
+    }
+  else
+    {
+  /* Search the compiler directories for `ld'.  We have protection against
+ recursive calls in find_a_file.  */
+  if (ld_file_name == 0)


I would prefer '== NULL'.


+    ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], X_OK);
+  /* Search the ordinary system bin directories
+ for `ld' (if native linking) or `TARGET-ld' (if cross).  */
+  if (ld_file_name == 0)
+    ld_file_name =
+  find_a_file (&path, full_ld_suffixes[selected_linker], X_OK);
+    }
  #ifdef REAL_NM_FILE_NAME
    nm_file_name = find_a_file (&path, REAL_NM_FILE_NAME, X_OK);
@@ -1461,6 +1491,11 @@ main (int argc, char **argv)
    ld2--;
  #endif
  }
+  else if

Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-12-04 Thread Richard Biener via Gcc-patches
On Fri, Dec 4, 2020 at 2:35 PM Jakub Jelinek  wrote:
>
> On Fri, Dec 04, 2020 at 02:30:45PM +0100, Martin Liška wrote:
> > On 12/4/20 10:03 AM, Richard Biener wrote:
> > > Otherwise 0001- looks good to me.
> >
> > Pushed that to master.
> >
> > > As said I'd like to see opinions
> > > from others on the
> > > driver / backend communication for 0002.
> >
> > To be honest, we moved back to the original implementation which used
> > a temporary file. There hasn't been any opinion for last 8 months :(
>
> Using environment variable rather than a temporary file must be faster.

But it's difficult to preserve behavior with -save-temps and execing cc1
as printed which is why I don't like it too much, even if it is "faster".
The environment might also be of limited size, and since the patch includes
everything (-I... -L...), and always does so (a defect of the patch), I fear
we might run over its size limits, at least on some host OSs.

Richard.

> Jakub
>
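On POSIX hosts the size concern can be checked at run time: sysconf (_SC_ARG_MAX) reports the limit on the combined size of the argument list and environment passed to exec. The helper below is a hypothetical sketch, not part of either proposal; it ignores the envp pointer overhead and any per-string cap (some kernels, Linux for example, also limit the length of a single argument or environment string).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical check: would one more environment entry "NAME=VALUE"
   (VALUE_LEN bytes of value) still fit under the host's combined
   argument + environment limit, given that USED bytes are already taken
   by other arguments and environment strings?  */
static bool
env_entry_fits (const char *name, size_t value_len, size_t used)
{
  assert (name != NULL);
  long arg_max = sysconf (_SC_ARG_MAX);
  if (arg_max <= 0)
    return true;          /* Indeterminate: nothing to check against.  */
  size_t limit = (size_t) arg_max;
  if (value_len > limit)
    return false;         /* Also avoids overflow in the sum below.  */
  size_t entry = strlen (name) + 1 + value_len + 1;   /* "NAME=" VALUE NUL */
  return used <= limit && entry <= limit - used;
}
```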


[PATCH, v2, OpenMP 5.0, libgomp] Structure element mapping for OpenMP 5.0

2020-12-04 Thread Chung-Lin Tang

Hi Jakub,
this is a new version of the structure element mapping patch for the OpenMP 5.0
requirement changes.

This one uses the approach you've outlined in your concept patch [1], basically to
use more special REFCOUNT_* values to mark them, and link the following structure
element splay_tree_keys back to the first key's refcount.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557622.html

Implementation notes of the attached patch:

(1) This patch solves the 5.0 requirements of "not already incremented/decremented
because of the effect of a map clause on the construct" by pulling in libgomp/hashtab.h
and using htab_t as a pointer set. A "htab_t *refcount_set" parameter is added to the
map/unmap routines to track the processing status of the uintptr_t* addresses of
refcount fields in splay_tree_keys.

   * Currently this patch uses the same htab_create/htab_free routines as in task.c.
     I toyed with creating an 'htab_alloca' macro (allocating a fixed-size htab) to
     speed things up further, but decided to play it safer for the current patch.

(2) Because of the use of pointer-to-refcounts as the basis, and because structure
element siblings all share the same refcount, uniform increment/decrement without
repeating is
also naturally achieved.
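
A stripped-down illustration of points (1) and (2): remembering which refcount addresses have already been adjusted makes a second map clause, or a structure-element sibling that shares the same refcount, a no-op for the rest of the construct. This is a hypothetical sketch with made-up names and a linear array standing in for libgomp's htab_t; it is not the target.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pointer set of refcount addresses seen in one construct.  */
struct refcount_set
{
  uintptr_t **items;
  size_t len, cap;
};

/* Return true if RC was newly added, false if it was already present.  */
static bool
refcount_set_insert (struct refcount_set *s, uintptr_t *rc)
{
  for (size_t i = 0; i < s->len; i++)
    if (s->items[i] == rc)
      return false;
  if (s->len == s->cap)
    {
      size_t ncap = s->cap ? 2 * s->cap : 8;
      uintptr_t **n = realloc (s->items, ncap * sizeof *n);
      if (n == NULL)
        abort ();
      s->items = n;
      s->cap = ncap;
    }
  s->items[s->len++] = rc;
  return true;
}

/* Increment *RC at most once per construct, however many map clauses
   (or structure-element siblings sharing RC) refer to it.  */
static void
increment_refcount_once (struct refcount_set *s, uintptr_t *rc)
{
  assert (rc != NULL);
  if (refcount_set_insert (s, rc))
    (*rc)++;
}

static void
refcount_set_free (struct refcount_set *s)
{
  free (s->items);
  s->items = NULL;
  s->len = s->cap = 0;
}
```

A decrement path would use the same set in the same way, so a refcount shared by several siblings drops by one per construct rather than once per sibling.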

(3) Because of the need to remove whole structure element sibling sequences out of
context, it appears we need to mark the first/last of such a sequence. You'll see that
the special REFCOUNT_* values have been expanded a bit more than in your concept patch
(at some point we should think about no longer abusing it and adding a proper flags word).

(4) The new increment/decrement routines combine most of the new refcount_set lookup
code with the refcount adjusting. For the decrement routine, "copy" and "removal" are
now separate return values, since for structure element sequences, even when signalling
"removal" you may still need to finish the "copy" work of the following target_var_descs.

(5) There are some re-organizing changes to oacc-parallel.c and oacc-mem.c, but most
of the code that matters is in target.c.

(6) New testcases have been added to reflect the cases discussed on the omp-lang list.

This patch has been tested for libgomp with no regressions on x86_64-linux with
nvptx offloading. Since I submitted the first "v1" patch long ago, is this okay to be
considered committable now, after approval?

Thanks,
Chung-Lin

2020-12-04  Chung-Lin Tang  

libgomp/
* hashtab.h (htab_clear): New function with initialization code
factored out from...
(htab_create): ...here, adjust to use htab_clear function.

* libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
special refcount values, add comments.
(REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
(REFCOUNT_LINK): Likewise.
(REFCOUNT_STRUCTELEM): New special refcount range for structure
element siblings.
(REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
sibling maps.
(REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
(REFCOUNT_STRUCTELEM_FLAG_LAST):  Flag to indicate last sibling.
(REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
(REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
(struct splay_tree_key_s): Add structelem_refcount and
structelem_refcount_ptr fields into a union with dynamic_refcount.
Add comments.
(gomp_map_vars): Delete declaration.
(gomp_map_vars_async): Likewise.
(gomp_unmap_vars): Likewise.
(gomp_unmap_vars_async): Likewise.
(goacc_map_vars): New declaration.
(goacc_unmap_vars): Likewise.

* oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
(goacc_enter_datum): Likewise.
(goacc_enter_data_internal): Likewise.
* oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
and goacc_unmap_vars.
(GOACC_data_start): Adjust to use goacc_map_vars.
(GOACC_data_end): Adjust to use goacc_unmap_vars.

* target.c (hash_entry_type): New typedef.
(htab_alloc): New function hook for hashtab.h.
(htab_free): Likewise.
(htab_hash): Likewise.
(htab_eq): Likewise.
(hashtab.h): Add file include.
(gomp_increment_refcount): New function.
(gomp_decrement_refcount): Likewise.
(gomp_map_vars_existing): Add refcount_set parameter, adjust to use
gomp_increment_refcount.
(gomp_map_fields_existing): Add refcount_set parameter, adjust calls
to gomp_map_vars_existing.

(gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
variable to guard OpenMP specific paths, adjust calls to
gomp_map_vars_existing, add structure element sibling splay_tree_key
sequence creation code, adjust Fortran map case to avoid increment
under OpenMP.
(gomp_map_vars): Adju

Re: [PATCH] RISC-V: Canonicalize --with-arch

2020-12-04 Thread Matthias Klose
On 12/4/20 2:38 PM, Matthias Klose wrote:
> On 12/4/20 9:07 AM, Kito Cheng via Gcc-patches wrote:
>> Committed, thanks :)
>>
>> On Thu, Dec 3, 2020 at 8:51 AM Jim Wilson  wrote:
>>>
>>> On Tue, Dec 1, 2020 at 12:13 AM Kito Cheng  wrote:

>>>>  - We would like to canonicalize the arch string for --with-arch for
>>>>    easier handling multilib, so split canonicalization part to a stand
>>>>    along script to shared the logic.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * config/riscv/multilib-generator (arch_canonicalize): Move
>>>> code to arch-canonicalize, and call that script to canonicalize arch
>>>> string.
>>>> (canonical_order): Move code to arch-canonicalize.
>>>> (LONG_EXT_PREFIXES): Ditto.
>>>> (IMPLIED_EXT): Ditto.
>>>> * config/riscv/arch-canonicalize: New.
>>>> * config.gcc (riscv*-*-*): Canonicalize --with-arch.
>>>
>>>
>>> Looks OK to me.
> 
> that breaks the bootstrap if python is not available. The python command might
> not be available, so please check for python3, python, or python2.

same for config/riscv/arch-canonicalize


Re: [PATCH 0/2] Switch to a new section if the SECTION_RETAIN bit doesn't match

2020-12-04 Thread Jozef Lawrynowicz
On Fri, Dec 04, 2020 at 05:16:38AM -0800, H.J. Lu via Gcc-patches wrote:
> On Fri, Dec 4, 2020 at 4:17 AM Jozef Lawrynowicz
>  wrote:
> >
> > On Thu, Dec 03, 2020 at 04:06:50PM -0800, H.J. Lu via Gcc-patches wrote:
> > > When SECTION_RETAIN is used, definitions marked with used attribute and
> > > unmarked definitions are placed in the same section.  Instead of issue
> > > an error:
> > >
> > > [hjl@gnu-cfl-2 gcc]$ /usr/gcc-11.0.0-x32/bin/gcc -S c.c -fdiagnostics-plain-output
> > > c.c:2:49: error: ‘foo1’ causes a section type conflict with ‘foo2’
> > > c.c:1:54: note: ‘foo2’ was declared here
> > > [hjl@gnu-cfl-2 gcc]$
> > >
> > > the first patch switches to a new section if the SECTION_RETAIN bit
> > > doesn't match.  The second optional patch issues a warning:
> > >
> > > [hjl@gnu-cfl-2 gcc]$ ./xgcc -B./ -S c.c -fdiagnostics-plain-output
> > > c.c:2:49: warning: ‘foo1’ without ‘used’ attribute is placed in a section with ‘foo2’ with ‘used’ attribute [-Wattributes]
> > > [hjl@gnu-cfl-2 gcc]$
> >
> > I think the warning is useful, since we are modifying the structure of
> > the object file where the user may not expect it. It ensures they review
> > which declarations have "used" applied so they don't unexpectedly lose
> > parts of their program they wanted to keep by putting them in a
> > section that was marked "used" elsewhere.
> 
> I agree.
> 
> > >
> > > H.J. Lu (2):
> > >   Switch to a new section if the SECTION_RETAIN bit doesn't match
> > >   Warn used and not used symbols in the same section
> >
> > We should probably use a new PR to associate with these patches, rather
> > than PR/98121.
> >
> > Your changes here address the issue exposed by glibc code, whilst 98121
> > was for the broader issue of whether "used" should apply SHF_GNU_RETAIN.
> >
> > Let me know if you agree, and I'll create a new GCC PR for the specific
> 
> Please do.

Filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98146.

Thanks,
Jozef


[PATCH] v3: doc/implement-c.texi: About same-as-scalar-type volatile aggregate accesses, PR94600

2020-12-04 Thread Hans-Peter Nilsson via Gcc-patches
> From: Martin Sebor via Gcc-patches 
> Date: Fri, 4 Dec 2020 01:49:51 +0100

> On 12/3/20 12:14 PM, Hans-Peter Nilsson via Gcc-patches wrote:
> > Belatedly, here's an updated version, using Martin Sebor's
> > suggested wording from
> > "https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549580.html";.
> > I added two commas, hopefully helpfully.  Albeit ok'd by Richard
> > Biener in
> > "https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549922.html";,
> > better have this reviewed properly, including markup (none added).
> > 
> > Ok for trunk (gcc-11) and gcc-10?
> 
> Thanks for taking my suggestion!

You're welcome!

> These are just formatting nits but I would only further suggest
> to enclose the name S (since it names a type) and the second
> volatile in an @code{} directive (since it's a keyword).
> (The volatile in volatile access is not one so it shouldn't
> be formatted that way.)

Here we go, and now with all the right email-addresses.
Also, I inspected info and pdf output.  Yes, the last S ends up
on a line by its own in the pdf.  I didn't think it was worth
fixing by e.g. messing with the word order.

BTW, "make -j 4 info pdf" from the top-level doesn't work;
something is messed up in dependencies.  From a non-j "make info
pdf" it looks like libgcc wanted to compile stuff (no "all-gcc"
was done).

---
We say very little about reads and writes to aggregate /
compound objects, just scalar objects (i.e. assignments don't
cause reads).  Let's say something safe about aggregate
objects, but only for those that are the same size as a scalar
type.

There's a similar-sounding section (Volatiles) in extend.texi,
but this seems a more appropriate place, since it specifies the
behavior of a standard qualifier.

Ok for trunk (gcc-11) and gcc-10?

gcc:

2020-12-04  Hans-Peter Nilsson  
Martin Sebor  

PR middle-end/94600
* doc/implement-c.texi (Qualifiers implementation): Add blurb
about access to the whole of a volatile aggregate object, only for
same-size as a scalar object.
---
 gcc/doc/implement-c.texi | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/doc/implement-c.texi b/gcc/doc/implement-c.texi
index 692297b69c4..d7433ba5213 100644
--- a/gcc/doc/implement-c.texi
+++ b/gcc/doc/implement-c.texi
@@ -576,6 +576,11 @@ are of scalar types, the expression is interpreted by GCC as a read of
 the volatile object; in the other cases, the expression is only evaluated
 for its side effects.
 
+When an object of an aggregate type, with the same size and alignment as a
+scalar type @code{S}, is the subject of a volatile access by an assignment
+expression or an atomic function, the access to it is performed as if the
+object's declared type were @code{volatile S}.
+
 @end itemize
 
 @node Declarators implementation
-- 
2.11.0

brgds, H-P
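For illustration, here is the kind of code the new paragraph is meant to cover: a hypothetical bit-field register layout whose size and alignment are assumed to match `uint32_t` on common ABIs (the assumption is checked with static assertions). Under the proposed wording, the whole-object assignments below would be performed as if the object were declared `volatile uint32_t`; the names are made up, and the generated assembly is the place to confirm the access width on a given target.

```c
#include <stdint.h>

/* Hypothetical device-register layout: an aggregate intended to have the
   same size and alignment as the scalar type uint32_t.  */
struct ctrl
{
  uint32_t enable : 1;
  uint32_t mode   : 3;
  uint32_t rest   : 28;
};

/* The documented rule only applies when size and alignment match.  */
_Static_assert (sizeof (struct ctrl) == sizeof (uint32_t), "size differs");
_Static_assert (_Alignof (struct ctrl) == _Alignof (uint32_t),
                "alignment differs");

/* Whole-object volatile store via an assignment expression: per the new
   text, accessed as if the object's declared type were volatile uint32_t.  */
static void
write_ctrl (volatile struct ctrl *reg, struct ctrl value)
{
  *reg = value;
}

/* Whole-object volatile read, also inside an assignment expression.  */
static struct ctrl
read_ctrl (volatile struct ctrl *reg)
{
  struct ctrl value;
  value = *reg;
  return value;
}
```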


Re: [PATCH] Remove misleading debug line entries

2020-12-04 Thread Bernd Edlinger
On 12/3/20 9:30 AM, Richard Biener wrote:
> On Wed, 2 Dec 2020, Bernd Edlinger wrote:
> 
>> On 12/2/20 8:50 AM, Richard Biener wrote:
>>> On Tue, 1 Dec 2020, Bernd Edlinger wrote:
>>>
>>>> Hi!
>>>>
>>>>
>>>> This removes gimple_debug stmts without block info after a
>>>> NULL INLINE_ENTRY.
>>>>
>>>> The line numbers from these stmts are from the inline function,
>>>> but since the inline function is completely optimized away,
>>>> there will be no DW_TAG_inlined_subroutine so the debugger has
>>>> no callstack available at this point, and therefore those
>>>> line table entries are not helpful to the user.
>>>>
>>>> 2020-11-20  Bernd Edlinger
>>>>
>>>> * cfgexpand.c (expand_gimple_basic_block): Remove debug_begin_stmts
>>>> following a removed debug_inline_entry.
>>>>
>>>>
>>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>>> Is it OK for trunk?
>>>
>>> So are those visited by clear_unused_block_pointer?  If so wouldn't
>>> it be more appropriate to remove those there, when we elide the
>>> inlined block scope?
>>>
>>
>> That's what I thought initially, but that is only true for 99% of the 
>> inline statements.  However 1% of the inline_entries without block info,
>> and debug_begin_stmts without block info, that have line numbers from
>> an inline header, do actually originate here:
>>
>> copy_debug_stmt (gdebug *stmt, copy_body_data *id)
>> {
>>   tree t, *n;
>>   struct walk_stmt_info wi;
>>
>>   if (tree block = gimple_block (stmt))
>> {
>>   n = id->decl_map->get (block);
>>   gimple_set_block (stmt, n ? *n : id->block);
>> }
>>
>> because id->block is NULL, and decl_map does not have
>> an entry.
>>
>> So I tracked it down why that happens.
>>
>> I think remap_gimple_stmt should just drop those nonbind markers
>> on the floor when the call statement has no block information.
>>
>> Once that is fixed, the special handling of inline entries without
>> block info can as well be moved from remap_gimple_stmt to
>> clear_unused_block_pointer.
>>
>> What do you think of this (not yet fully tested) patch?
>>
>> Is it OK when bootstrap and reg-testing passes?
> 
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index d9814bd..e87c653 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -1819,7 +1819,8 @@ remap_gimple_stmt (gimple *stmt, copy_body_data *id)
>   /* If the inlined function has too many debug markers,
>  don't copy them.  */
>   if (id->src_cfun->debug_marker_count
> - > param_max_debug_marker_count)
> + > param_max_debug_marker_count
> + || !id->block)
> return stmts;
> 
> Isn't this overly pessimistic in throwing away all debug markers
> of an inline rather than just those which are associated with
> the outermost scope (that's mapped to NULL if !id->block)?  Can
> we instead remap the block here (move it from copy_debug_stmt)
> and elide the copy only when it maps to NULL?
> 

Yes, indeed, I missed the fact that this is also called from
tree_function_versioning.  id->block is always NULL in that case.
But since this should be a 1:1 copy, missing block info should not
get any worse than it already is.  Fortunately it is possible to distinguish
that from the actual inlining by looking at id->call_stmt.


> 
>   gdebug *copy = as_a  (gimple_copy (stmt));
> diff --git a/gcc/tree-ssa-live.c b/gcc/tree-ssa-live.c
> index 21a9ee4..ca119c6 100644
> --- a/gcc/tree-ssa-live.c
> +++ b/gcc/tree-ssa-live.c
> @@ -623,13 +623,25 @@ clear_unused_block_pointer (void)
>{
> unsigned i;
> tree b;
> -   gimple *stmt = gsi_stmt (gsi);
> +   gimple *stmt;
>  
> +  next:
> +   stmt = gsi_stmt (gsi);
> if (!is_gimple_debug (stmt) && !gimple_clobber_p (stmt))
>   continue;
> b = gimple_block (stmt);
> if (b && !TREE_USED (b))
> - gimple_set_block (stmt, NULL);
> + {
> +   if (gimple_debug_nonbind_marker_p (stmt)
> +   && BLOCK_ABSTRACT_ORIGIN (b))
> 
> why only for inlined BLOCKs?  Did you want to restrict it further
> to inlined_function_outer_scope_p?
> 

Yes.
I had assumed that check would be sufficient, but as you said,
I have to walk the block structure until I find an
inlined_function_outer_scope_p block.

I don't know if there is a chance that any of the debug lines will
get block info assigned in the end if id->block == NULL, but I think
it does not hurt to remove the debug statements in copy_debug_stmt.
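The walk described above (climbing the BLOCK tree until the outer scope of an inlined function is reached) can be modelled outside GCC as below. `struct block`, its fields and `find_inlined_outer_scope` are hypothetical stand-ins; in GCC the corresponding roles are played by `BLOCK_SUPERCONTEXT` and `inlined_function_outer_scope_p`. This sketches only the traversal, not the final patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified BLOCK: only what the walk needs.  */
struct block
{
  struct block *supercontext;  /* enclosing scope, NULL at the top */
  bool inlined_outer_scope;    /* stands in for inlined_function_outer_scope_p */
};

/* Return the nearest enclosing block (B itself included) that is the outer
   scope of an inlined function, or NULL if B is not inside any inline.  */
static const struct block *
find_inlined_outer_scope (const struct block *b)
{
  for (; b != NULL; b = b->supercontext)
    if (b->inlined_outer_scope)
      return b;
  return NULL;
}
```

Because the walk stops at the innermost inline instance, nested inlines are told apart, which is relevant to the question below about markers that belong to another inline instance.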

> I guess I don't understand the debug situation fully - I guess it is
> about jumping to locations in inlines where the call stack does
> not show we are in the actual inlined function?  But IIRC at least
> unused BLOCK removal never elides the actual 
> inlined_function_outer_scope_p which would leave the inlining case
> you spotted.  But there we should zap all markers that belong to
> the inlined function but not those which belong to another inline
> instance?  So we want to

Re: H8 cc0 conversion

2020-12-04 Thread Maciej W. Rozycki
On Wed, 25 Nov 2020, Hans-Peter Nilsson wrote:

> Current cc0 head-count is down to avr, cr16, h8300, vax, with
> two of them recently having patches posted, alas not a lot of
> ports left to try this advice.

 Hmm, the VAX port surely did not qualify for an innovative approach 
anyway (though I still made it a bit different by how I (ab)used 
subst iterators and chose to ignore both rtx's in SELECT_CC_MODE; arguably 
that was the only sustainable choice), as it has been too bitrotten to 
experiment with without a major cleanup first, and by the time 
I started the effort no time was left for it.  Otherwise you would simply 
not know whether any phenomenon observed is due to the change being made or 
to unrelated breakage.

 As usual with software, however, nothing has been cast in stone (even 
things made to be as stable as ABIs do change from time to time), so with 
the conversion out of the way any remaining cleanup can be made and then 
we can try removing the splits in favour of clobbers exposed pre-reload 
and see what happens.  If that turns out feasible, then other ports of 
this kind may follow.

 You may want to have your observations posted in the wiki however.

  Maciej


V2 [PATCH 2/2] Warn used and not used symbols in section with the same name

2020-12-04 Thread H.J. Lu via Gcc-patches
When SECTION_RETAIN is used, issue a warning when a symbol without used
attribute and a symbol with used attribute are placed in the section with
the same name, like

int __attribute__((used,section(".data.foo"))) foo2 = 2;
int __attribute__((section(".data.foo"))) foo1 = 1;

since the assembler will put them in different sections with the same section
name.
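The warning condition can be modelled in isolation: two definitions trigger it when their section names are equal but only one of them carries `used`. `struct decl_info`, `retain_conflict_p` and `format_retain_warning` are hypothetical; in the patch the check lives in `switch_to_section` and is driven by the SECTION_RETAIN flag rather than a per-declaration boolean. The message text mirrors the one the updated tests expect.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of a definition with a section attribute.  */
struct decl_info
{
  const char *name;
  const char *section;
  bool used;
};

/* True if A and B would end up in different sections with the same name
   because exactly one of them has the used attribute.  */
static bool
retain_conflict_p (const struct decl_info *a, const struct decl_info *b)
{
  return strcmp (a->section, b->section) == 0 && a->used != b->used;
}

/* Format a diagnostic in the style of the patch's -Wattributes warning.  */
static int
format_retain_warning (char *buf, size_t size,
                       const struct decl_info *a, const struct decl_info *b)
{
  const struct decl_info *unused = a->used ? b : a;
  const struct decl_info *used = a->used ? a : b;
  return snprintf (buf, size,
                   "'%s' without 'used' attribute and '%s' with 'used' "
                   "attribute are placed in a section with the same name",
                   unused->name, used->name);
}
```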

gcc/

PR target/98146
* varasm.c (switch_to_section): Warn when a symbol without used
attribute and a symbol with used attribute are placed in the
section with the same name.

gcc/testsuite/

PR target/98146
* c-c++-common/attr-used-5.c: Updated.
* c-c++-common/attr-used-6.c: Likewise.
* c-c++-common/attr-used-7.c: Likewise.
* c-c++-common/attr-used-8.c: Likewise.
* c-c++-common/attr-used-9.c: Likewise.
---
 gcc/testsuite/c-c++-common/attr-used-5.c |  1 +
 gcc/testsuite/c-c++-common/attr-used-6.c |  1 +
 gcc/testsuite/c-c++-common/attr-used-7.c |  1 +
 gcc/testsuite/c-c++-common/attr-used-8.c |  1 +
 gcc/testsuite/c-c++-common/attr-used-9.c | 28 
 gcc/varasm.c | 22 ---
 6 files changed, 51 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-9.c

diff --git a/gcc/testsuite/c-c++-common/attr-used-5.c b/gcc/testsuite/c-c++-common/attr-used-5.c
index 9fc0d3834e9..38ca8ef83f3 100644
--- a/gcc/testsuite/c-c++-common/attr-used-5.c
+++ b/gcc/testsuite/c-c++-common/attr-used-5.c
@@ -10,6 +10,7 @@ extern struct dtv_slotinfo_list *list;
 
 static int __attribute__ ((section ("__libc_freeres_fn")))
 free_slotinfo (struct dtv_slotinfo_list **elemp)
+/* { dg-warning "'.*' without 'used' attribute and '.*' with 'used' attribute are placed in a section with the same name" "" { target *-*-* } .-1 } */
 {
   if (!free_slotinfo (&(*elemp)->next))
     return 0;
diff --git a/gcc/testsuite/c-c++-common/attr-used-6.c b/gcc/testsuite/c-c++-common/attr-used-6.c
index 0cb82ade5a9..a4800f6d0f1 100644
--- a/gcc/testsuite/c-c++-common/attr-used-6.c
+++ b/gcc/testsuite/c-c++-common/attr-used-6.c
@@ -18,6 +18,7 @@ free_slotinfo (struct dtv_slotinfo_list **elemp)
 
 __attribute__ ((section ("__libc_freeres_fn")))
 void free_mem (void)
+/* { dg-warning "'.*' without 'used' attribute and '.*' with 'used' attribute are placed in a section with the same name" "" { target *-*-* } .-1 } */
 {
   free_slotinfo (&list);
 }
diff --git a/gcc/testsuite/c-c++-common/attr-used-7.c b/gcc/testsuite/c-c++-common/attr-used-7.c
index fba2706ffc1..39923cdde33 100644
--- a/gcc/testsuite/c-c++-common/attr-used-7.c
+++ b/gcc/testsuite/c-c++-common/attr-used-7.c
@@ -3,6 +3,7 @@
 
 int __attribute__((used,section(".data.foo"))) foo2 = 2;
 int __attribute__((section(".data.foo"))) foo1 = 1;
+/* { dg-warning "'.*' without 'used' attribute and '.*' with 'used' attribute are placed in a section with the same name" "" { target *-*-* } .-1 } */
 
 /* { dg-final { scan-assembler ".data.foo,\"aw\"" { target R_flag_in_section } } } */
 /* { dg-final { scan-assembler ".data.foo,\"awR\"" { target R_flag_in_section } } } */
diff --git a/gcc/testsuite/c-c++-common/attr-used-8.c b/gcc/testsuite/c-c++-common/attr-used-8.c
index 4da4aabe573..032cdd20901 100644
--- a/gcc/testsuite/c-c++-common/attr-used-8.c
+++ b/gcc/testsuite/c-c++-common/attr-used-8.c
@@ -2,6 +2,7 @@
 /* { dg-options "-Wall -O2" } */
 
 int __attribute__((section(".data.foo"))) foo1 = 1;
+/* { dg-warning "'.*' without 'used' attribute and '.*' with 'used' attribute are placed in a section with the same name" "" { target *-*-* } .-1 } */
 int __attribute__((used,section(".data.foo"))) foo2 = 2;
 
 /* { dg-final { scan-assembler ".data.foo,\"aw\"" { target R_flag_in_section } } } */
diff --git a/gcc/testsuite/c-c++-common/attr-used-9.c b/gcc/testsuite/c-c++-common/attr-used-9.c
new file mode 100644
index 000..502c768c813
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-used-9.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall -O2" } */
+
+struct dtv_slotinfo_list
+{
+  struct dtv_slotinfo_list *next;
+};
+
+extern struct dtv_slotinfo_list *list;
+
+static int __attribute__ ((used, section ("__libc_freeres_fn")))
+free_slotinfo (struct dtv_slotinfo_list **elemp)
+{
+  if (!free_slotinfo (&(*elemp)->next))
+    return 0;
+  return 1;
+}
+
+__attribute__ ((section ("__libc_freeres_fn")))
+static void free_mem (void)
+/* { dg-warning "defined but not used" "" { target *-*-* } .-1 } */
+{
+  free_slotinfo (&list);
+}
+
+/* { dg-final { scan-assembler-not "__libc_freeres_fn,\"ax\"" } }*/
+/* { dg-final { scan-assembler-not "__libc_freeres_fn\n" } } */
+/* { dg-final { scan-assembler "__libc_freeres_fn,\"axR\"" { target R_flag_in_section } } } */
diff --git a/gcc/varasm.c b/gcc/varasm.c
index c85d39813ec..025e0fb32fe 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -7731,11 +7731,27 @@ switch_to_section (section *new_section, tree decl)
{

V2 [PATCH 0/2] Switch to a new section if the SECTION_RETAIN bit doesn't match

2020-12-04 Thread H.J. Lu via Gcc-patches
When SECTION_RETAIN is used, definitions marked with used attribute and
unmarked definitions are placed in a section with the same name.  Instead
of issuing an error:

[hjl@gnu-cfl-2 gcc]$ /usr/gcc-11.0.0-x32/bin/gcc -S c.c -fdiagnostics-plain-output
c.c:2:49: error: ‘foo1’ causes a section type conflict with ‘foo2’
c.c:1:54: note: ‘foo2’ was declared here
[hjl@gnu-cfl-2 gcc]$

the first patch switches to a new section if the SECTION_RETAIN bit
doesn't match.  The second optional patch issues a warning:

[hjl@gnu-cfl-2 gcc]$ ./xgcc -B./ -S c.c
c.c:2:49: warning: ‘foo1’ without ‘used’ attribute and ‘foo2’ with ‘used’ attribute are placed in a section with the same name [-Wattributes]
2 | const int __attribute__((section(".data.foo"))) foo1 = 1;
  | ^~~~
c.c:1:54: note: ‘foo2’ was declared here
1 | const int __attribute__((used,section(".data.foo"))) foo2 = 2;
  |
[hjl@gnu-cfl-2 gcc]$

H.J. Lu (2):
  Switch to a new section if the SECTION_RETAIN bit doesn't match
  Warn used and not used symbols in section with the same name

 gcc/output.h |  2 +-
 gcc/testsuite/c-c++-common/attr-used-5.c | 27 ++
 gcc/testsuite/c-c++-common/attr-used-6.c | 27 ++
 gcc/testsuite/c-c++-common/attr-used-7.c |  9 +
 gcc/testsuite/c-c++-common/attr-used-8.c |  9 +
 gcc/testsuite/c-c++-common/attr-used-9.c | 28 +++
 gcc/varasm.c | 45 +---
 7 files changed, 142 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-5.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-6.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-7.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-8.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-9.c

-- 
2.28.0



V2 [PATCH 1/2] Switch to a new section if the SECTION_RETAIN bit doesn't match

2020-12-04 Thread H.J. Lu via Gcc-patches
When definitions marked with used attribute and unmarked definitions are
placed in the section with the same name, switch to a new section if the
SECTION_RETAIN bit doesn't match.
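A toy model of the intended assembler output: when consecutive definitions use the same section name but disagree on the retain bit, a fresh `.section` directive with different flags is emitted instead of a section type conflict being reported. The flag bits, `struct emitter` and `switch_to_section_model` are hypothetical; the `"aw"`/`"awR"` flag strings mirror what the new tests scan for, and GCC's real directive may carry further operands.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits; GCC's real SECTION_* values differ.  */
enum { SEC_WRITE = 1u, SEC_CODE = 2u, SEC_RETAIN = 4u };

/* Tracks the section currently being emitted into.  */
struct emitter
{
  char name[64];
  unsigned flags;
  bool active;
};

/* Build an ELF flag string such as "aw", "ax" or "awR".  */
static void
section_flag_string (unsigned flags, char buf[8])
{
  char *p = buf;
  *p++ = 'a';
  if (flags & SEC_WRITE)
    *p++ = 'w';
  if (flags & SEC_CODE)
    *p++ = 'x';
  if (flags & SEC_RETAIN)
    *p++ = 'R';
  *p = '\0';
}

/* Append a .section directive to OUT (of size N) unless we are already in
   a section with this name and the same retain bit.  Returns true if a
   directive was emitted.  Mismatches in other flags are not modelled.  */
static bool
switch_to_section_model (struct emitter *e, const char *name, unsigned flags,
                         char *out, size_t n)
{
  if (e->active && strcmp (e->name, name) == 0
      && ((e->flags ^ flags) & SEC_RETAIN) == 0)
    return false;

  char fl[8];
  section_flag_string (flags, fl);
  size_t len = strlen (out);
  snprintf (out + len, n - len, "\t.section\t%s,\"%s\"\n", name, fl);
  snprintf (e->name, sizeof e->name, "%s", name);
  e->flags = flags;
  e->active = true;
  return true;
}
```

Feeding it the attr-used-7.c order (a `used` foo2 followed by a plain foo1, both in `.data.foo`) yields one `"awR"` and one `"aw"` directive for the same name.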

gcc/

PR target/98146
* output.h (switch_to_section): Add a tree argument, default to
nullptr.
* varasm.c (get_section): If the SECTION_RETAIN bit doesn't match,
return and switch to a new section later.
(assemble_start_function): Pass decl to switch_to_section.
(assemble_variable): Likewise.
(switch_to_section): If the SECTION_RETAIN bit doesn't match,
switch to a new section.

gcc/testsuite/

PR target/98146
* c-c++-common/attr-used-5.c: New test.
* c-c++-common/attr-used-6.c: Likewise.
* c-c++-common/attr-used-7.c: Likewise.
* c-c++-common/attr-used-8.c: Likewise.
---
 gcc/output.h |  2 +-
 gcc/testsuite/c-c++-common/attr-used-5.c | 26 +
 gcc/testsuite/c-c++-common/attr-used-6.c | 26 +
 gcc/testsuite/c-c++-common/attr-used-7.c |  8 +++
 gcc/testsuite/c-c++-common/attr-used-8.c |  8 +++
 gcc/varasm.c | 29 
 6 files changed, 94 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-5.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-6.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-7.c
 create mode 100644 gcc/testsuite/c-c++-common/attr-used-8.c

diff --git a/gcc/output.h b/gcc/output.h
index fa8ace1f394..1f9af46da1d 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -548,7 +548,7 @@ extern void switch_to_other_text_partition (void);
 extern section *get_cdtor_priority_section (int, bool);
 
 extern bool unlikely_text_section_p (section *);
-extern void switch_to_section (section *);
+extern void switch_to_section (section *, tree = nullptr);
 extern void output_section_asm_op (const void *);
 
 extern void record_tm_clone_pair (tree, tree);
diff --git a/gcc/testsuite/c-c++-common/attr-used-5.c b/gcc/testsuite/c-c++-common/attr-used-5.c
new file mode 100644
index 000..9fc0d3834e9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-used-5.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall -O2" } */
+
+struct dtv_slotinfo_list
+{
+  struct dtv_slotinfo_list *next;
+};
+
+extern struct dtv_slotinfo_list *list;
+
+static int __attribute__ ((section ("__libc_freeres_fn")))
+free_slotinfo (struct dtv_slotinfo_list **elemp)
+{
+  if (!free_slotinfo (&(*elemp)->next))
+    return 0;
+  return 1;
+}
+
+__attribute__ ((used, section ("__libc_freeres_fn")))
+static void free_mem (void)
+{
+  free_slotinfo (&list);
+}
+
+/* { dg-final { scan-assembler "__libc_freeres_fn,\"ax\"" { target R_flag_in_section } } } */
+/* { dg-final { scan-assembler "__libc_freeres_fn,\"axR\"" { target R_flag_in_section } } } */
diff --git a/gcc/testsuite/c-c++-common/attr-used-6.c b/gcc/testsuite/c-c++-common/attr-used-6.c
new file mode 100644
index 000..0cb82ade5a9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-used-6.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall -O2" } */
+
+struct dtv_slotinfo_list
+{
+  struct dtv_slotinfo_list *next;
+};
+
+extern struct dtv_slotinfo_list *list;
+
+static int __attribute__ ((used, section ("__libc_freeres_fn")))
+free_slotinfo (struct dtv_slotinfo_list **elemp)
+{
+  if (!free_slotinfo (&(*elemp)->next))
+    return 0;
+  return 1;
+}
+
+__attribute__ ((section ("__libc_freeres_fn")))
+void free_mem (void)
+{
+  free_slotinfo (&list);
+}
+
+/* { dg-final { scan-assembler "__libc_freeres_fn,\"ax\"" { target R_flag_in_section } } } */
+/* { dg-final { scan-assembler "__libc_freeres_fn,\"axR\"" { target R_flag_in_section } } } */
diff --git a/gcc/testsuite/c-c++-common/attr-used-7.c b/gcc/testsuite/c-c++-common/attr-used-7.c
new file mode 100644
index 000..fba2706ffc1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-used-7.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall -O2" } */
+
+int __attribute__((used,section(".data.foo"))) foo2 = 2;
+int __attribute__((section(".data.foo"))) foo1 = 1;
+
+/* { dg-final { scan-assembler ".data.foo,\"aw\"" { target R_flag_in_section } } } */
+/* { dg-final { scan-assembler ".data.foo,\"awR\"" { target R_flag_in_section } } } */
diff --git a/gcc/testsuite/c-c++-common/attr-used-8.c b/gcc/testsuite/c-c++-common/attr-used-8.c
new file mode 100644
index 000..4da4aabe573
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/attr-used-8.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall -O2" } */
+
+int __attribute__((section(".data.foo"))) foo1 = 1;
+int __attribute__((used,section(".data.foo"))) foo2 = 2;
+
+/* { dg-final { scan-assembler ".data.foo,\"aw\"" { target R_flag_in_section } } } */
+/* { dg-final { scan-assembler ".data.foo,\"awR\"" { target R_flag_in_section } } } */
diff --git a/

Re: How to traverse all the local variables that declared in the current routine?

2020-12-04 Thread Qing Zhao via Gcc-patches



> On Dec 4, 2020, at 2:50 AM, Richard Biener  wrote:
> 
> On Thu, Dec 3, 2020 at 6:33 PM Richard Sandiford
> mailto:richard.sandif...@arm.com>> wrote:
>> 
>> Richard Biener via Gcc-patches  writes:
>>> On Tue, Nov 24, 2020 at 4:47 PM Qing Zhao  wrote:
>>>> Another issue is, in order to check whether an auto-variable has
>>>> initializer, I plan to add a new bit in “decl_common” as:
>>>>  /* In a VAR_DECL, this is DECL_IS_INITIALIZED.  */
>>>>  unsigned decl_is_initialized :1;
>>>>
>>>> /* IN VAR_DECL, set when the decl is initialized at the declaration.  */
>>>> #define DECL_IS_INITIALIZED(NODE) \
>>>>  (DECL_COMMON_CHECK (NODE)->decl_common.decl_is_initialized)
>>>>
>>>> set this bit when setting DECL_INITIAL for the variables in FE. then keep it
>>>> even though DECL_INITIAL might be NULLed.
>>> 
>>> For locals it would be more reliable to set this flag during gimplification.
>>> 
>>>> Do you have any comment and suggestions?
>>> 
>>> As said above - do you want to cover registers as well as locals?  I'd do
>>> the actual zeroing during RTL expansion instead since otherwise you
>>> have to figure youself whether a local is actually used (see 
>>> expand_stack_vars)
>>> 
>>> Note that optimization will already made have use of "uninitialized" state
>>> of locals so depending on what the actual goal is here "late" may be too 
>>> late.
>> 
>> Haven't thought about this much, so it might be a daft idea, but would a
>> compromise be to use a const internal function:
>> 
>>  X1 = .DEFERRED_INIT (X0, INIT)
>> 
>> where the X0 argument is an uninitialised value and the INIT argument
>> describes the initialisation pattern?  So for a decl we'd have:
>> 
>>  X = .DEFERRED_INIT (X, INIT)
>> 
>> and for an SSA name we'd have:
>> 
>>  X_2 = .DEFERRED_INIT (X_1(D), INIT)
>> 
>> with all other uses of X_1(D) being replaced by X_2.  The idea is that:
>> 
>> * Having the X0 argument would keep the uninitialised use of the
>>  variable around for the later warning passes.
>> 
>> * Using a const function should still allow the UB to be deleted as dead
>>  if X1 isn't needed.
>> 
>> * Having a function in the way should stop passes from taking advantage
>>  of direct uninitialised uses for optimisation.
>> 
>> This means we won't be able to optimise based on the actual init
>> value at the gimple level, but that seems like a fair trade-off.
>> AIUI this is really a security feature or anti-UB hardening feature
>> (in the sense that users are more likely to see predictable behaviour
>> “in the field” even if the program has UB).
> 
> The question is whether it's in line of peoples expectation that
> explicitely zero-initialized code behaves differently from
> implicitely zero-initialized code with respect to optimization
> and secondary side-effects (late diagnostics, latent bugs, etc.).
> 
> Introducing a new concept like .DEFERRED_INIT is much more
> heavy-weight than an explicit zero initializer.

What exactly do you mean by “heavy-weight”? More difficult to implement, much
more run-time overhead, or both? Or something else?

The major benefit of the “.DEFERRED_INIT” approach is that it lets us keep the
current -Wuninitialized analysis untouched while also passing the
“uninitialized” info from the source code level down to “pass_expand”.

If we want to keep the current -Wuninitialized analysis untouched, this is a
quite reasonable approach.

However, if it’s not required to keep the current -Wuninitialized analysis
untouched, adding a zero-initializer directly during gimplification should
be much easier and simpler, and should also have smaller run-time overhead.

> 
> As for optimization I fear you'll get a load of redundant zero-init
> actually emitted if you can just rely on RTL DSE/DCE to remove it.

Runtime overhead for -fauto-init=zero is an important consideration for the
whole feature: we should minimize the runtime overhead of zero initialization,
since it will be used in production builds.
We can do some run-time performance evaluation when we have an implementation
ready.

> 
> Btw, I don't think theres any reason to cling onto clangs semantics
> for a particular switch.  We'll never be able to emulate 1:1 behavior
> and our -Wuninit behavior is probably wastly different already.

From my study so far, yes, the current behavior of -Wuninitialized in Clang
and GCC is not exactly the same.

For example, for the following small testing case:
void blah(int);

int foo_2 (int n, int l, int m, int r)
{
  int v;

  if ( (n > 10) && (m != 100)  && (r < 20) )
    v = r;

  if (l > 100)
    if ( (n <= 8) &&  (m < 102)  && (r < 19) )
      blah(v); /* { dg-warning "uninitialized" "real warning" } */

  return 0;
}

GCC is able to report a maybe-uninitialized warning, but Clang is not.
It looks like GCC’s uninitialized analysis relies on more analysis and
optimization information than Clang’s.
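For comparison, below is the same foo_2 with the zero initializer written out in the source, which is roughly what the simplest form of the proposed -fauto-init=zero (inserting a zero initializer during gimplification) would amount to. With `v` initialized at its declaration there is no longer an uninitialized use for GCC to report, which is the -Wuninitialized trade-off discussed above. The `blah` definition, `last_arg` and the name `foo_2_zero_init` are hypothetical, added only to make the sketch runnable.

```c
/* Hypothetical blah() that records its argument so the effect is visible.  */
static int last_arg = -1;

static void
blah (int x)
{
  last_arg = x;
}

/* foo_2 with the implicit zero initialization spelled out.  */
int
foo_2_zero_init (int n, int l, int m, int r)
{
  int v = 0;  /* conceptually inserted by zero auto-initialization */

  if ((n > 10) && (m != 100) && (r < 20))
    v = r;

  if (l > 100)
    if ((n <= 8) && (m < 102) && (r < 19))
      blah (v);  /* no uninitialized use left to warn about */

  return 0;
}
```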

I'm really curious how Clang implements its uninitialized analysis.

Qing



> 
> Richard.
> 
>> Thanks,
>> Richard



Re: [AArch64] Add --with-tune configure flag

2020-12-04 Thread Pop, Sebastian via Gcc-patches
On 11/19/20, 10:52 AM, "Richard Earnshaw (lists)" wrote:
> Having the same option have a completely different meaning would be even
> worse than not having the option at all.  So no, that's a non-starter.

The attached patch 0001 removes --with-{cpu,arch,tune}-32.
Bootstrap and regression testing pass on aarch64-linux.
Ok to commit to trunk and active branches?

I would like to ping the two patches from Wilco Dijkstra that fix issues with
the configure --with-mtune flag:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553865.html
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553866.html

Please see patches 0002 and 0003 attached, rebased on today's trunk and
tested with bootstrap and regression testing.
Ok to commit to trunk and all active branches?

Thanks,
Sebastian



0001-AArch64-disable-with-cpu-arch-tune-32.patch


0002-AArch64-Cleanup-CPU-option-processing-code.patch


0003-AArch64-Add-support-for-with-tune.patch


c++: Revert dependent-array changes [PR 98116]

2020-12-04 Thread Nathan Sidwell


The changes reverted here are exposing an existing problem with alias
template comparisons.  The typename_type changes are also incomplete,
possibly for similar reasons.  It seems safer to revert them, fix the
underlying issue and then move forwards.

The testcase is adjusted to check the specialization table more
robustly, and ICEs with and without the c++ changes.

PR c++/98116
Revert:
62fb1b9e0da c++: Fix array type dependency [PR 98107]
07589ca2b2c c++: typename_type structural comparison
329ae1d7751 c++: Extend build_array_type API
gcc/cp/
* cp-tree.h (comparing_typenames): Delete.
(cplus_build_array_type): Remove default parm.
* pt.c (comparing_typenames): Delete.
(spec_hasher::equal): Don't increment it.
* tree.c (set_array_type_canon): Remove dep parm.
(build_cplus_array_type): Remove dep parm changes.
(cp_build_qualified_type_real): Remove dependent array type
changes.
(strip_typedefs): Likewise.
* typeck.c (structural_comptypes): Revert comparing_typename
changes.
gcc/testsuite/
* g++.dg/template/pr98116.C: Enable robust checking.

pushing to trunk


--
Nathan Sidwell
diff --git i/gcc/cp/cp-tree.h w/gcc/cp/cp-tree.h
index c7f8371c665..00901fe42d4 100644
--- i/gcc/cp/cp-tree.h
+++ w/gcc/cp/cp-tree.h
@@ -5422,10 +5422,6 @@ extern int function_depth;
in structrual_comptypes.  */
 extern int comparing_specializations;
 
-/* Nonzero if we are inside eq_specializations, which affects
-   resolving of typenames in structural_comptypes.  */
-extern int comparing_typenames;
-
 /* In parser.c.  */
 
 /* Nonzero if we are parsing an unevaluated operand: an operand to
@@ -7563,7 +7559,7 @@ extern bool is_local_temp			(tree);
 extern tree build_aggr_init_expr		(tree, tree);
 extern tree get_target_expr			(tree);
 extern tree get_target_expr_sfinae		(tree, tsubst_flags_t);
-extern tree build_cplus_array_type		(tree, tree, int is_dep = -1);
+extern tree build_cplus_array_type		(tree, tree);
 extern tree build_array_of_n_type		(tree, int);
 extern bool array_of_runtime_bound_p		(tree);
 extern bool vla_type_p(tree);
diff --git i/gcc/cp/pt.c w/gcc/cp/pt.c
index 08931823d57..9e8113d51a3 100644
--- i/gcc/cp/pt.c
+++ w/gcc/cp/pt.c
@@ -1704,19 +1704,16 @@ register_specialization (tree spec, tree tmpl, tree args, bool is_friend,
   return spec;
 }
 
-/* Restricts tree and type comparisons.  */
-int comparing_specializations;
-int comparing_typenames;
-
 /* Returns true iff two spec_entry nodes are equivalent.  */
 
+int comparing_specializations;
+
 bool
 spec_hasher::equal (spec_entry *e1, spec_entry *e2)
 {
   int equal;
 
   ++comparing_specializations;
-  ++comparing_typenames;
   equal = (e1->tmpl == e2->tmpl
 	   && comp_template_args (e1->args, e2->args));
   if (equal && flag_concepts
@@ -1732,7 +1729,6 @@ spec_hasher::equal (spec_entry *e1, spec_entry *e2)
   equal = equivalent_constraints (c1, c2);
 }
   --comparing_specializations;
-  --comparing_typenames;
 
   return equal;
 }
diff --git i/gcc/cp/tree.c w/gcc/cp/tree.c
index d9fa505041f..4e6bf9abba6 100644
--- i/gcc/cp/tree.c
+++ w/gcc/cp/tree.c
@@ -998,7 +998,7 @@ build_min_array_type (tree elt_type, tree index_type)
build_cplus_array_type.  */
 
 static void
-set_array_type_canon (tree t, tree elt_type, tree index_type, bool dep)
+set_array_type_canon (tree t, tree elt_type, tree index_type)
 {
   /* Set the canonical type for this new node.  */
   if (TYPE_STRUCTURAL_EQUALITY_P (elt_type)
@@ -1009,33 +1009,30 @@ set_array_type_canon (tree t, tree elt_type, tree index_type, bool dep)
 TYPE_CANONICAL (t)
   = build_cplus_array_type (TYPE_CANONICAL (elt_type),
 index_type
-? TYPE_CANONICAL (index_type) : index_type,
-dep);
+? TYPE_CANONICAL (index_type) : index_type);
   else
 TYPE_CANONICAL (t) = t;
 }
 
 /* Like build_array_type, but handle special C++ semantics: an array of a
variant element type is a variant of the array of the main variant of
-   the element type.  IS_DEPENDENT is -ve if we should determine the
-   dependency.  Otherwise its bool value indicates dependency.  */
+   the element type.  */
 
 tree
-build_cplus_array_type (tree elt_type, tree index_type, int dependent)
+build_cplus_array_type (tree elt_type, tree index_type)
 {
   tree t;
 
   if (elt_type == error_mark_node || index_type == error_mark_node)
 return error_mark_node;
 
-  if (dependent < 0)
-dependent = (uses_template_parms (elt_type)
-		 || (index_type && uses_template_parms (index_type)));
+  bool dependent = (uses_template_parms (elt_type)
+		|| (index_type && uses_template_parms (index_type)));
 
   if (elt_type != TYPE_MAIN_VARIANT (elt_type))
 /* Start with an array of the TYPE_MAIN_VARIANT.  */
 t = build_cplus_array_type (TYPE_MAIN_VARIANT (elt_type),
-index_type, dependent);
+index_type);
   else if (dependent)
 {
   /* Sinc

[PATCH] gimple: Return fnspec only for replaceable new/delete operators called from new/delete [PR98130]

2020-12-04 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, we shouldn't treat non-replaceable operator
new/delete (e.g. with the placement new) as replaceable ones.

There is some pending discussion that perhaps operator delete called from
delete if not replaceable should return some other fnspec, but can we handle
that incrementally, fix this wrong-code and then deal with a missed
optimization?  I really don't know what exactly should be returned.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-12-04  Jakub Jelinek  

PR c++/98130
* gimple.c (gimple_call_fnspec): Only return ".co " for replaceable
operator delete or ".mC" for replaceable operator new called from
new/delete.

* g++.dg/opt/pr98130.C: New test.

--- gcc/gimple.c.jj 2020-11-26 01:14:47.528081989 +0100
+++ gcc/gimple.c    2020-12-04 13:31:10.885766239 +0100
@@ -1514,11 +1514,12 @@ gimple_call_fnspec (const gcall *stmt)
  such operator, then we can treat it as free.  */
   if (fndecl
   && DECL_IS_OPERATOR_DELETE_P (fndecl)
+  && DECL_IS_REPLACEABLE_OPERATOR (fndecl)
   && gimple_call_from_new_or_delete (stmt))
 return ".co ";
   /* Similarly operator new can be treated as malloc.  */
   if (fndecl
-  && DECL_IS_OPERATOR_NEW_P (fndecl)
+  && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)
   && gimple_call_from_new_or_delete (stmt))
 return "mC";
   return "";
--- gcc/testsuite/g++.dg/opt/pr98130.C.jj   2020-12-04 12:30:11.510988404 +0100
+++ gcc/testsuite/g++.dg/opt/pr98130.C  2020-12-04 12:33:05.663028984 +0100
@@ -0,0 +1,25 @@
+// PR c++/98130
+// { dg-do run { target c++11 } }
+// { dg-options "-O2" }
+
+#include <new>
+
+typedef int *T;
+
+static unsigned char storage[sizeof (T)] alignas (T);
+static T *p = (T *) storage;
+
+static inline __attribute__((__always_inline__)) void
+foo (T value)
+{
+  new (p) T(value);
+}
+
+int
+main ()
+{
+  int a;
+  foo (&a);
+  if (!*p)
+    __builtin_abort ();
+}

Jakub



[PATCH] c++: Fix constexpr access to union member through pointer-to-member [PR98122]

2020-12-04 Thread Jakub Jelinek via Gcc-patches
Hi!

We currently incorrectly reject the first testcase, because
cxx_fold_indirect_ref_1 doesn't attempt to handle UNION_TYPEs.
As the second testcase shows, it isn't that easy, because I believe we need
to take into account the active member and prefer that active member over
other members, because if we pick a non-active one, we might reject valid
programs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-12-04  Jakub Jelinek  

PR c++/98122
* constexpr.c (cxx_fold_indirect_ref_1): Add ctx argument, pass it
through to recursive call.  Handle UNION_TYPE.
(cxx_fold_indirect_ref): Add ctx argument, pass it to recursive calls
and cxx_fold_indirect_ref_1.
(cxx_eval_indirect_ref): Adjust cxx_fold_indirect_ref calls.

* g++.dg/cpp1y/constexpr-98122.C: New test.
* g++.dg/cpp2a/constexpr-98122.C: New test.

--- gcc/cp/constexpr.c.jj   2020-12-03 15:43:00.491620290 +0100
+++ gcc/cp/constexpr.c  2020-12-04 14:10:50.649944649 +0100
@@ -4614,8 +4614,8 @@ same_type_ignoring_tlq_and_bounds_p (tre
 /* Helper function for cxx_fold_indirect_ref_1, called recursively.  */
 
 static tree
-cxx_fold_indirect_ref_1 (location_t loc, tree type, tree op,
-unsigned HOST_WIDE_INT off, bool *empty_base)
+cxx_fold_indirect_ref_1 (const constexpr_ctx *ctx, location_t loc, tree type,
+tree op, unsigned HOST_WIDE_INT off, bool *empty_base)
 {
   tree optype = TREE_TYPE (op);
   unsigned HOST_WIDE_INT const_nunits;
@@ -4674,13 +4674,41 @@ cxx_fold_indirect_ref_1 (location_t loc,
  tree index = size_int (idx + tree_to_uhwi (min_val));
  op = build4_loc (loc, ARRAY_REF, TREE_TYPE (optype), op, index,
   NULL_TREE, NULL_TREE);
- return cxx_fold_indirect_ref_1 (loc, type, op, rem,
+ return cxx_fold_indirect_ref_1 (ctx, loc, type, op, rem,
  empty_base);
}
 }
   /* ((foo *)&struct_with_foo_field)[x] => COMPONENT_REF */
-  else if (TREE_CODE (optype) == RECORD_TYPE)
+  else if (TREE_CODE (optype) == RECORD_TYPE
+  || TREE_CODE (optype) == UNION_TYPE)
 {
+  if (TREE_CODE (optype) == UNION_TYPE)
+   {
+ /* For unions prefer the currently active member.  */
+ constexpr_ctx new_ctx = *ctx;
+ new_ctx.quiet = true;
+ bool non_constant_p = false, overflow_p = false;
+ tree ctor = cxx_eval_constant_expression (&new_ctx, op, false,
+   &non_constant_p,
+   &overflow_p);
+ if (TREE_CODE (ctor) == CONSTRUCTOR
+ && CONSTRUCTOR_NELTS (ctor) == 1
+ && CONSTRUCTOR_ELT (ctor, 0)->index
+ && TREE_CODE (CONSTRUCTOR_ELT (ctor, 0)->index) == FIELD_DECL)
+   {
+ tree field = CONSTRUCTOR_ELT (ctor, 0)->index;
+ unsigned HOST_WIDE_INT el_sz
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (field)));
+ if (off < el_sz)
+   {
+ tree cop = build3 (COMPONENT_REF, TREE_TYPE (field),
+op, field, NULL_TREE);
+ if (tree ret = cxx_fold_indirect_ref_1 (ctx, loc, type, cop,
+ off, empty_base))
+   return ret;
+   }
+   }
+   }
   for (tree field = TYPE_FIELDS (optype);
   field; field = DECL_CHAIN (field))
if (TREE_CODE (field) == FIELD_DECL
@@ -4691,13 +4719,13 @@ cxx_fold_indirect_ref_1 (location_t loc,
if (!tree_fits_uhwi_p (pos))
  continue;
unsigned HOST_WIDE_INT upos = tree_to_uhwi (pos);
-   unsigned el_sz
+   unsigned HOST_WIDE_INT el_sz
  = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (field)));
if (upos <= off && off < upos + el_sz)
  {
tree cop = build3 (COMPONENT_REF, TREE_TYPE (field),
   op, field, NULL_TREE);
-   if (tree ret = cxx_fold_indirect_ref_1 (loc, type, cop,
+   if (tree ret = cxx_fold_indirect_ref_1 (ctx, loc, type, cop,
off - upos,
empty_base))
  return ret;
@@ -4718,7 +4746,8 @@ cxx_fold_indirect_ref_1 (location_t loc,
with TBAA in fold_indirect_ref_1.  */
 
 static tree
-cxx_fold_indirect_ref (location_t loc, tree type, tree op0, bool *empty_base)
+cxx_fold_indirect_ref (const constexpr_ctx *ctx, location_t loc, tree type,
+  tree op0, bool *empty_base)
 {
   tree sub = op0;
   tree subtype;
@@ -4756,7 +4785,7 @@ cxx_fold_indirect_ref (location_t loc, t
return op;
}
   else
-   return cxx_fold_indirect_ref_1 (loc, type, op, 0, empty_

[PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, we can combine ~(1 << x) into -2 r<< x, but we give
up in the ~(1 << (x & 31)) cases, as *3_mask* don't allow
immediate operand 1 and find_split_point prefers to split (x & 31) instead
of the constant.

With these combine splitters we help combine decide how to split those
insns.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-12-04  Jakub Jelinek  

PR target/96226
* config/i386/i386.md (splitter after *3_mask,
splitter after *3_mask_1): New combine splitters.

* gcc.target/i386/pr96226.c: New test.

--- gcc/config/i386/i386.md.jj  2020-12-02 11:20:24.729487245 +0100
+++ gcc/config/i386/i386.md 2020-12-04 15:39:17.481148449 +0100
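@@ -11975,6 +11975,23 @@ (define_insn_and_split "*(clobber (reg:CC FLAGS_REG))])]
   "operands[2] = gen_lowpart (QImode, operands[2]);")
 
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+        (any_rotate:SWI48
+          (match_operand:SWI48 1 "const_int_operand")
+          (subreg:QI
+            (and:SI
+              (match_operand:SI 2 "register_operand")
+              (match_operand:SI 3 "const_int_operand")) 0)))]
+ "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode) - 1))
+   == GET_MODE_BITSIZE (mode) - 1"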
+ [(set (match_dup 4) (match_dup 1))
+  (set (match_dup 0)
+   (any_rotate:SWI48 (match_dup 4)
+(subreg:QI
+  (and:SI (match_dup 2) (match_dup 3)) 0)))]
+ "operands[4] = gen_reg_rtx (mode);")
+
 (define_insn_and_split "*3_mask_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand")
(any_rotate:SWI48
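@@ -11995,6 +12012,21 @@ (define_insn_and_split "*  (match_dup 2)))
   (clobber (reg:CC FLAGS_REG))])])
 
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+        (any_rotate:SWI48
+          (match_operand:SWI48 1 "const_int_operand")
+          (and:QI
+            (match_operand:QI 2 "register_operand")
+            (match_operand:QI 3 "const_int_operand"))))]
+ "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode) - 1))
+  == GET_MODE_BITSIZE (mode) - 1"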
+ [(set (match_dup 4) (match_dup 1))
+  (set (match_dup 0)
+   (any_rotate:SWI48 (match_dup 4)
+(and:QI (match_dup 2) (match_dup 3))))]
+ "operands[4] = gen_reg_rtx (mode);")
+
 ;; Implement rotation using two double-precision
 ;; shift instructions and a scratch register.
 
--- gcc/testsuite/gcc.target/i386/pr96226.c.jj  2020-12-04 15:45:35.437890237 +0100
+++ gcc/testsuite/gcc.target/i386/pr96226.c 2020-12-04 15:46:09.408507488 +0100
@@ -0,0 +1,16 @@
+/* PR target/96226 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "\troll\t" 4 } } */
+/* { dg-final { scan-assembler-times "\trolq\t" 4 { target { ! ia32 } } } } */
+
+int f1 (int x) { return ~(1U << (x & 0x1f)); }
+int f2 (int x) { return ~(1U << x); }
+int f3 (unsigned char *x) { return ~(1U << (x[0] & 0x1f)); }
+int f4 (unsigned char *x) { return ~(1U << x[0]); }
+#ifdef __x86_64__
+long int f5 (int x) { return ~(1ULL << (x & 0x3f)); }
+long int f6 (int x) { return ~(1ULL << x); }
+long int f7 (unsigned char *x) { return ~(1ULL << (x[0] & 0x3f)); }
+long int f8 (unsigned char *x) { return ~(1ULL << x[0]); }
+#endif

Jakub
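The identity the new splitters rely on can be checked in plain C. ~(1 << n) has every bit set except bit n, and so does -2 (0xfffffffe) rotated left by n, because the rotate moves the single zero bit from position 0 to position n. Below is a minimal sketch assuming 32-bit operands; the helper names rotl32, clear_bit_mask, clear_bit_via_rotate and check_identity are hypothetical and are not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit value left.  Only the low five
   bits of n are used, as with the x86 "roll" instruction; the
   (-n & 31) form avoids an undefined shift by 32 when n is 0.  */
uint32_t rotl32 (uint32_t v, unsigned n)
{
  n &= 31;
  return (v << n) | (v >> (-n & 31));
}

/* The source-level expression, as in f1 of pr96226.c:
   clear bit (x & 31) of an all-ones word.  */
uint32_t clear_bit_mask (unsigned x)
{
  return ~(UINT32_C (1) << (x & 31));
}

/* The rotate form: -2 r<< x.  0xfffffffe has only bit 0 clear,
   and rotating it left by x moves that zero to bit (x & 31).  */
uint32_t clear_bit_via_rotate (unsigned x)
{
  return rotl32 (UINT32_C (0xfffffffe), x);
}

/* Assert that both forms agree for every x in [0, limit) and
   return the number of values checked.  */
unsigned check_identity (unsigned limit)
{
  for (unsigned x = 0; x < limit; x++)
    assert (clear_bit_mask (x) == clear_bit_via_rotate (x));
  return limit;
}
```

For example, clear_bit_mask (5) and clear_bit_via_rotate (5) both yield 0xffffffdf.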



Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 4, 2020 at 6:32 PM Jakub Jelinek  wrote:
>
> Hi!
>
> As mentioned in the PR, we can combine ~(1 << x) into -2 r<< x, but we give
> up in the ~(1 << (x & 31)) cases, as *3_mask* don't allow
> immediate operand 1 and find_split_point prefers to split (x & 31) instead
> of the constant.
>
> With these combine splitters we help combine decide how to split those
> insns.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2020-12-04  Jakub Jelinek  
>
> PR target/96226
> * config/i386/i386.md (splitter after *3_mask,
> splitter after *3_mask_1): New combine splitters.
>
> * gcc.target/i386/pr96226.c: New test.
>
> --- gcc/config/i386/i386.md.jj  2020-12-02 11:20:24.729487245 +0100
> +++ gcc/config/i386/i386.md 2020-12-04 15:39:17.481148449 +0100
> @@ -11975,6 +11975,23 @@ (define_insn_and_split "*(clobber (reg:CC FLAGS_REG))])]
>"operands[2] = gen_lowpart (QImode, operands[2]);")
>
> +(define_split
> +  [(set (match_operand:SWI48 0 "register_operand")
> +   (any_rotate:SWI48
> + (match_operand:SWI48 1 "const_int_operand")
> + (subreg:QI
> +   (and:SI
> + (match_operand:SI 2 "register_operand")
> + (match_operand:SI 3 "const_int_operand")) 0)))]
> + "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode) - 1))
> +   == GET_MODE_BITSIZE (mode) - 1"
> + [(set (match_dup 4) (match_dup 1))
> +  (set (match_dup 0)
> +   (any_rotate:SWI48 (match_dup 4)
> +(subreg:QI
> +  (and:SI (match_dup 2) (match_dup 3)) 0)))]

Don't we need

   (clobber (reg:CC FLAGS_REG))]

here? (or is this one of the combine splitter peculiarities?)

Uros.

> + "operands[4] = gen_reg_rtx (mode);")
> +
>  (define_insn_and_split "*3_mask_1"
>[(set (match_operand:SWI48 0 "nonimmediate_operand")
> (any_rotate:SWI48
> @@ -11995,6 +12012,21 @@ (define_insn_and_split "*  (match_dup 2)))
>(clobber (reg:CC FLAGS_REG))])])
>
> +(define_split
> +  [(set (match_operand:SWI48 0 "register_operand")
> +   (any_rotate:SWI48
> + (match_operand:SWI48 1 "const_int_operand")
> + (and:QI
> +   (match_operand:QI 2 "register_operand")
> +   (match_operand:QI 3 "const_int_operand"))))]
> + "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode) - 1))
> +  == GET_MODE_BITSIZE (mode) - 1"
> + [(set (match_dup 4) (match_dup 1))
> +  (set (match_dup 0)
> +   (any_rotate:SWI48 (match_dup 4)
> +(and:QI (match_dup 2) (match_dup 3))))]
> + "operands[4] = gen_reg_rtx (mode);")
> +
>  ;; Implement rotation using two double-precision
>  ;; shift instructions and a scratch register.
>
> --- gcc/testsuite/gcc.target/i386/pr96226.c.jj  2020-12-04 15:45:35.437890237 +0100
> +++ gcc/testsuite/gcc.target/i386/pr96226.c 2020-12-04 15:46:09.408507488 +0100
> @@ -0,0 +1,16 @@
> +/* PR target/96226 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times "\troll\t" 4 } } */
> +/* { dg-final { scan-assembler-times "\trolq\t" 4 { target { ! ia32 } } } } */
> +
> +int f1 (int x) { return ~(1U << (x & 0x1f)); }
> +int f2 (int x) { return ~(1U << x); }
> +int f3 (unsigned char *x) { return ~(1U << (x[0] & 0x1f)); }
> +int f4 (unsigned char *x) { return ~(1U << x[0]); }
> +#ifdef __x86_64__
> +long int f5 (int x) { return ~(1ULL << (x & 0x3f)); }
> +long int f6 (int x) { return ~(1ULL << x); }
> +long int f7 (unsigned char *x) { return ~(1ULL << (x[0] & 0x3f)); }
> +long int f8 (unsigned char *x) { return ~(1ULL << x[0]); }
> +#endif
>
> Jakub
>


Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 04, 2020 at 06:37:02PM +0100, Uros Bizjak wrote:
> > + "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode) - 1))
> > +   == GET_MODE_BITSIZE (mode) - 1"
> > + [(set (match_dup 4) (match_dup 1))
> > +  (set (match_dup 0)
> > +   (any_rotate:SWI48 (match_dup 4)
> > +(subreg:QI
> > +  (and:SI (match_dup 2) (match_dup 3)) 0)))]
> 
> Don't we need
> 
>(clobber (reg:CC FLAGS_REG))]
> 
> here? (or is this one of the combine splitter peculiarities?)

I was trying that first, but it didn't work.  Without the
clobber it actually works right, we don't have the rotate insn with the
masking and no clobber, so in the end combiner does add the clobber there
(or would fail if the clobber couldn't be added).

Jakub



Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 4, 2020 at 6:41 PM Jakub Jelinek  wrote:
>
> On Fri, Dec 04, 2020 at 06:37:02PM +0100, Uros Bizjak wrote:
> > > + "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode) - 1))
> > > +   == GET_MODE_BITSIZE (mode) - 1"
> > > + [(set (match_dup 4) (match_dup 1))
> > > +  (set (match_dup 0)
> > > +   (any_rotate:SWI48 (match_dup 4)
> > > +(subreg:QI
> > > +  (and:SI (match_dup 2) (match_dup 3)) 0)))]
> >
> > Don't we need
> >
> >(clobber (reg:CC FLAGS_REG))]
> >
> > here? (or is this one of the combine splitter peculiarities?)
>
> I was trying that first, but it didn't work.  Without the
> clobber it actually works right, we don't have the rotate insn with the
> masking and no clobber, so in the end combiner does add the clobber there
> (or would fail if the clobber couldn't be added).

I was not aware of that detail ...

The patch is OK.

Thanks,
Uros.


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Kumar, Venkataramanan via Gcc-patches
Hi Uros

> -Original Message-
> From: Uros Bizjak 
> Sent: Friday, December 4, 2020 2:30 PM
> To: Kumar, Venkataramanan 
> Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> 
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen3 CPU
> 
> On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
>  wrote:
> >
> >
> >
> >
> > Hi Maintainers,
> >
> >
> >
> > PFA, the patch that enables support for the next generation AMD Zen3
> CPU via -march=znver3.
> >
> > This is a very basic enablement patch. As of now the cost, tuning and
> scheduler changes are kept the same as znver2.
> >
> > Further changes to the cost and tunings will be done later.
> >
> >
> >
> > Ok for trunk ?
> 
> Please also add a new target to multiversioning and corresponding
> testcases. As an example, how this is done nowadays, please see a
> submission for a different target at [1].
> 
> BTW: It looks like multiversioning testcases lack AMD targets. Can you
> please add a testcase similar to testsuite/g++.target/i386/mv16.C and also
> add AMD targets to testsuite/gcc.target/i386/funcspec-56.inc.
> (this can be done in a follow-up patch).
> 
> [1]
> https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
> 

Please find attached the version 2 patch.

I have made additional changes as suggested by you. 
1.  Added the AMD Zen targets to funcspec-56.inc file in the tests.
2.  To cover multiversioning, added a new test with some set of AMD targets
detected by builtin_cpus, similar to mv16.C.

Is it OK for trunk?

Regards,
Venkat.

> Uros.


X86_64-Enable-support-for-next-generation-AMD-Znver3-V2.patch
Description: X86_64-Enable-support-for-next-generation-AMD-Znver3-V2.patch


Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 4, 2020 at 6:42 PM Uros Bizjak  wrote:
>
> On Fri, Dec 4, 2020 at 6:41 PM Jakub Jelinek  wrote:
> >
> > On Fri, Dec 04, 2020 at 06:37:02PM +0100, Uros Bizjak wrote:
> > > > + "(INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode) - 1))
> > > > +   == GET_MODE_BITSIZE (mode) - 1"
> > > > + [(set (match_dup 4) (match_dup 1))
> > > > +  (set (match_dup 0)
> > > > +   (any_rotate:SWI48 (match_dup 4)
> > > > +(subreg:QI
> > > > +  (and:SI (match_dup 2) (match_dup 3)) 0)))]
> > >
> > > Don't we need
> > >
> > >(clobber (reg:CC FLAGS_REG))]
> > >
> > > here? (or is this one of the combine splitter peculiarities?)
> >
> > I was trying that first, but it didn't work.  Without the
> > clobber it actually works right, we don't have the rotate insn with the
> > masking and no clobber, so in the end combiner does add the clobber there
> > (or would fail if the clobber couldn't be added).
>
> I was not aware of that detail ...

That said, IMO, it would be better to rewrite the other _mask and _mask_1
patterns that remove useless masking as combine splitters.
Unfortunately, the combine splitter expects exactly two output
instructions for some reason, but these patterns split to one
instruction. Perhaps it is possible to relax this limitation of
combine splitters and also allow one output instruction.

Uros.
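Why the masking is useless here: the splitters only fire when the AND constant keeps every bit the rotate count actually uses, which for a 32-bit mode means (cst & 31) == 31. A rotate that reads only the low five bits of its count then sees the same count with or without the AND, since (count & cst) & 31 == count & 31. Below is a minimal C model of that reasoning, assuming x86-style count masking; the function names rotl32_hw, mask_is_redundant and check_mask are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit rotate that, like x86 "roll",
   only uses the low five bits of its count.  */
uint32_t rotl32_hw (uint32_t v, uint32_t count)
{
  unsigned n = count & 31;
  return (v << n) | (v >> (-n & 31));
}

/* Mirrors the splitter condition for a 32-bit mode: the AND is
   redundant when the mask keeps all five low bits.  */
int mask_is_redundant (uint32_t mask)
{
  return (mask & 31) == 31;
}

/* Assert that rotating by (count & mask) equals rotating by count
   for every count in [0, limit); return the number checked.  */
unsigned check_mask (uint32_t v, uint32_t mask, unsigned limit)
{
  assert (mask_is_redundant (mask));
  for (uint32_t count = 0; count < limit; count++)
    assert (rotl32_hw (v, count & mask) == rotl32_hw (v, count));
  return limit;
}
```

A mask such as 15 fails the check, and there the AND does matter: rotating 1 by 16 & 15 leaves it unchanged, while rotating it by 16 gives 0x10000.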


Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 04, 2020 at 06:53:49PM +0100, Uros Bizjak wrote:
> > > I was trying that first, but it didn't work.  Without the
> > > clobber it actually works right, we don't have the rotate insn with the
> > > masking and no clobber, so in the end combiner does add the clobber there
> > > (or would fail if the clobber couldn't be added).
> >
> > I was not aware of that detail ...
> 
> That said, IMO, it would be better to rewrite the other _mask and _mask_1
> patterns that remove useless masking as combine splitters.
> Unfortunately, the combine splitter expects exactly two output
> instructions for some reason, but these patterns split to one
> instruction. Perhaps it is possible to relax this limitation of
> combine splitters and also allow one output instruction.

I've already checked it in.  Guess I can try to change the combine splitters
(can it wait till Monday?) so that they remove the masking when splitting
the insn into two, so that the pre-reload splitters aren't involved.

To turn those pre-reload define_insn_and_splits into combine splitters, I'm
afraid we'd indeed need changes to the combiner, so that would need to be
discussed with Segher first.

Jakub



Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 4, 2020 at 6:50 PM Kumar, Venkataramanan
 wrote:
>
> Hi Uros
>
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Friday, December 4, 2020 2:30 PM
> > To: Kumar, Venkataramanan 
> > Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> > 
> > Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> > Zen3 CPU
> >
> > On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
> >  wrote:
> > >
> > >
> > >
> > >
> > > Hi Maintainers,
> > >
> > >
> > >
> > > PFA, the patch that enables support for the next generation AMD Zen3
> > CPU via -march=znver3.
> > >
> > > This is a very basic enablement patch. As of now the cost, tuning and
> > scheduler changes are kept the same as znver2.
> > >
> > > Further changes to the cost and tunings will be done later.
> > >
> > >
> > >
> > > Ok for trunk ?
> >
> > Please also add a new target to multiversioning and corresponding
> > testcases. As an example, how this is done nowadays, please see a
> > submission for a different target at [1].
> >
> > BTW: It looks like multiversioning testcases lack AMD targets. Can you
> > please add a testcase similar to testsuite/g++.target/i386/mv16.C and also
> > add AMD targets to testsuite/gcc.target/i386/funcspec-56.inc.
> > (this can be done in a follow-up patch).
> >
> > [1]
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
> >
>
> Please find attached the version 2 patch.
>
> I have made additional changes as suggested by you.
> 1.  Added the AMD Zen targets to funcspec-56.inc file in the tests.
> 2.  To cover multiversioning, added a new test with some set of AMD targets
> detected by builtin_cpus, similar to mv16.C.
>
> Is it OK for trunk?

LGTM (I didn't review scheduling changes in detail).

Uros.


Re: [PATCH] gimple: Return fnspec only for replaceable new/delete operators called from new/delete [PR98130]

2020-12-04 Thread Richard Biener
On December 4, 2020 6:06:20 PM GMT+01:00, Jakub Jelinek wrote:
>Hi!
>
>As mentioned in the PR, we shouldn't treat non-replaceable operator
>new/delete (e.g. with the placement new) as replaceable ones.
>
>There is some pending discussion that perhaps operator delete called from
>delete if not replaceable should return some other fnspec, but can we handle
>that incrementally, fix this wrong-code and then deal with a missed
>optimization?  I really don't know what exactly should be returned.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok. 

Richard. 

>2020-12-04  Jakub Jelinek  
>
>   PR c++/98130
>   * gimple.c (gimple_call_fnspec): Only return ".co " for replaceable
>   operator delete or ".mC" for replaceable operator new called from
>   new/delete.
>
>   * g++.dg/opt/pr98130.C: New test.
>
>--- gcc/gimple.c.jj 2020-11-26 01:14:47.528081989 +0100
>+++ gcc/gimple.c   2020-12-04 13:31:10.885766239 +0100
>@@ -1514,11 +1514,12 @@ gimple_call_fnspec (const gcall *stmt)
>  such operator, then we can treat it as free.  */
>   if (fndecl
>   && DECL_IS_OPERATOR_DELETE_P (fndecl)
>+  && DECL_IS_REPLACEABLE_OPERATOR (fndecl)
>   && gimple_call_from_new_or_delete (stmt))
> return ".co ";
>   /* Similarly operator new can be treated as malloc.  */
>   if (fndecl
>-  && DECL_IS_OPERATOR_NEW_P (fndecl)
>+  && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl)
>   && gimple_call_from_new_or_delete (stmt))
> return "mC";
>   return "";
>--- gcc/testsuite/g++.dg/opt/pr98130.C.jj  2020-12-04 12:30:11.510988404 +0100
>+++ gcc/testsuite/g++.dg/opt/pr98130.C 2020-12-04 12:33:05.663028984 +0100
>@@ -0,0 +1,25 @@
>+// PR c++/98130
>+// { dg-do run { target c++11 } }
>+// { dg-options "-O2" }
>+
>+#include <new>
>+
>+typedef int *T;
>+
>+static unsigned char storage[sizeof (T)] alignas (T);
>+static T *p = (T *) storage;
>+
>+static inline __attribute__((__always_inline__)) void
>+foo (T value)
>+{
>+  new (p) T(value);
>+}
>+
>+int
>+main ()
>+{
>+  int a;
>+  foo (&a);
>+  if (!*p)
>+    __builtin_abort ();
>+}
>
>   Jakub



Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 4, 2020 at 6:57 PM Jakub Jelinek  wrote:
>
> On Fri, Dec 04, 2020 at 06:53:49PM +0100, Uros Bizjak wrote:
> > > > I was trying that first, but it didn't work.  Without the
> > > > clobber it actually works right, we don't have the rotate insn with the
> > > > masking and no clobber, so in the end combiner does add the clobber 
> > > > there
> > > > (or would fail if the clobber couldn't be added).
> > >
> > > I was not aware of that detail ...
> >
> > That said, IMO, it would be better to rewrite the other _mask and _mask_1
> > patterns that remove useless masking as combine splitters.
> > Unfortunately, the combine splitter expects exactly two output
> > instructions for some reason, but these patterns split to one
> > instruction. Perhaps it is possible to relax this limitation of
> > combine splitters and also allow one output instruction.
>
> I've already checked it in.  Guess I can try to change the combine splitters
> (can it wait till Monday?) so that they remove the masking when splitting
> the insn into two, so that the pre-reload splitters aren't involved.

No, I didn't want to burden you with the additional task - the patch
is OK as it is. I was just thinking out loud, as I remembered that
changing bt patterns to combine splitter regressed one testcase. IIRC
combination of two insns blocked better combination of three insns, or
something like that.

> To turn those pre-reload define_insn_and_splits into combine splitters, I'm
> afraid we'd indeed need changes to the combiner, so that would need to be
> discussed with Segher first.

Yes, that is the long-term plan. Segher CC'd.

Uros.


RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Kumar, Venkataramanan via Gcc-patches
Hi Uros,

> -Original Message-
> From: Uros Bizjak 
> Sent: Friday, December 4, 2020 11:31 PM
> To: Kumar, Venkataramanan 
> Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> 
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> Zen3 CPU
> 
> On Fri, Dec 4, 2020 at 6:50 PM Kumar, Venkataramanan
>  wrote:
> >
> > Hi Uros
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Friday, December 4, 2020 2:30 PM
> > > To: Kumar, Venkataramanan 
> > > Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> > > 
> > > Subject: Re: [PATCH] [X86_64]: Enable support for next generation
> > > AMD
> > > Zen3 CPU
> > >
> > > On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
> > >  wrote:
> > > >
> > > >
> > > >
> > > >
> > > > Hi Maintainers,
> > > >
> > > >
> > > >
> > > > PFA, the patch that enables support for the next generation AMD
> > > > Zen3
> > > CPU via -march=znver3.
> > > >
> > > > This is a very basic enablement patch. As of now the cost, tuning
> > > > and
> > > scheduler changes are kept the same as znver2.
> > > >
> > > > Further changes to the cost and tunings will be done later.
> > > >
> > > >
> > > >
> > > > Ok for trunk ?
> > >
> > > Please also add a new target to multiversioning and corresponding
> > > testcases. As an example, how this is done nowadays, please see a
> > > submission for a different target at [1].
> > >
> > > BTW: It looks like multiversioning testcases lack AMD targets. Can
> > > you please add a testcase similar to
> > > testsuite/g++.target/i386/mv16.C and also add AMD targets to
> testsuite/gcc.target/i386/funcspec-56.inc.
> > > (this can be done in a follow-up patch).
> > >
> > > [1]
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
> > >
> >
> > Please find attached the version 2 patch.
> >
> > I have made additional changes as suggested by you.
> > 1.  Added the AMD Zen targets to funcspec-56.inc file in the tests.
> > 2.  To cover multiversioning, added a new test with a set of AMD
> > targets detected by builtin_cpus, similar to mv16.C.
> >
> > Is it ok for trunk?
> 
> LGTM (I didn't review scheduling changes in detail).

Thank you for reviewing the patch.
I will wait for a day or two; if I don’t get further comments, I will commit
the patch.

Regards,
Venkat.

> 
> Uros.


Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Jakub Jelinek via Gcc-patches
On Fri, Dec 04, 2020 at 07:06:45PM +0100, Uros Bizjak wrote:
> No, I didn't want to burden you with the additional task - the patch
> is OK as it is. I was just thinking out loud, as I remembered that
> changing bt patterns to combine splitter regressed one testcase. IIRC
> combination of two insns blocked better combination of three insns, or
> something like that.

Here is the patch to simplify the newly added combine splitters: when
we split into 2 insns anyway, there is no reason to split into the masking
define_insn_and_split we'd be splitting shortly after.
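Why dropping the masking is safe: rotating a 32-bit value by n gives the same result as rotating it by n & 31, because rotation is periodic in the bit width, and the x86 rotate instructions mask the count themselves. The C++ sketch below is not part of the patch and its function names are made up; it checks that identity against a one-bit-at-a-time reference rotate. It uses -2 as the rotated value because ~(1 << n) equals -2 rotated left by n.

```cpp
#include <cassert>
#include <cstdint>

// Reference rotate: rotate left one bit at a time, n times,
// without masking n at all.
std::uint32_t rotl_ref(std::uint32_t x, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    x = (x << 1) | (x >> 31);
  return x;
}

// Rotate whose count is explicitly masked with 31, mirroring the
// (and ... cst) operand the old splitter output still carried.
std::uint32_t rotl_masked(std::uint32_t x, unsigned n) {
  n &= 31;
  // (32 - n) & 31 avoids an undefined shift by 32 when n == 0.
  return (x << n) | (x >> ((32 - n) & 31));
}

// True if masking the count never changes the result for counts in [0, limit).
bool masking_is_redundant(std::uint32_t x, unsigned limit) {
  for (unsigned n = 0; n < limit; ++n)
    if (rotl_ref(x, n) != rotl_masked(x, n))
      return false;
  return true;
}
```

For example, masking_is_redundant(0xfffffffeu, 100) holds, which is the property that lets the splitter emit the plain rotate.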

Passes the new testcase, ok if it passes bootstrap/regtest on x86_64-linux
and i686-linux?

2020-12-04  Jakub Jelinek  

PR target/96226
* config/i386/i386.md (splitter after *3_mask,
splitter after *3_mask_1): Drop the masking from
the patterns to split into.

--- gcc/config/i386/i386.md.jj  2020-12-04 18:44:23.494052861 +0100
+++ gcc/config/i386/i386.md 2020-12-04 19:00:22.192626807 +0100
@@ -11988,8 +11988,7 @@ (define_split
  [(set (match_dup 4) (match_dup 1))
   (set (match_dup 0)
(any_rotate:SWI48 (match_dup 4)
-(subreg:QI
-  (and:SI (match_dup 2) (match_dup 3)) 0)))]
+(subreg:QI (match_dup 2) 0)))]
  "operands[4] = gen_reg_rtx (mode);")
 
 (define_insn_and_split "*3_mask_1"
@@ -12023,8 +12022,7 @@ (define_split
   == GET_MODE_BITSIZE (mode) - 1"
  [(set (match_dup 4) (match_dup 1))
   (set (match_dup 0)
-   (any_rotate:SWI48 (match_dup 4)
-(and:QI (match_dup 2) (match_dup 3))))]
+   (any_rotate:SWI48 (match_dup 4) (match_dup 2)))]
  "operands[4] = gen_reg_rtx (mode);")
 
 ;; Implement rotation using two double-precision


Jakub



Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Segher Boessenkool
Hi!

On Fri, Dec 04, 2020 at 07:06:45PM +0100, Uros Bizjak wrote:
> On Fri, Dec 4, 2020 at 6:57 PM Jakub Jelinek  wrote:
> >
> > On Fri, Dec 04, 2020 at 06:53:49PM +0100, Uros Bizjak wrote:
> > > > > I was trying that first, but it didn't work.  Without the
> > > > > clobber it actually works right, we don't have the rotate insn with the
> > > > > masking and no clobber, so in the end combiner does add the clobber there
> > > > > (or would fail if the clobber couldn't be added).
> > > >
> > > > I was not aware of that detail ...
> > >
> > > That said, IMO, it would be better to rewrite other _mask and _mask_1
> > > patterns that remove useless masking to combine splitter.
> > > Unfortunately, the combine splitter expects exactly two output
> > > instructions for some reason, but these patterns split to one
> > > instruction. Perhaps it is possible to relax this limitation of
> > > combine splitters and also allow one output instruction.
> >
> > I've already checked it in.  Guess I can try to change the combine splitters
> > (can it wait till Monday?) so that they remove the masking when splitting
> > the insn into two, so that the pre-reload splitters aren't involved.
> 
> No, I didn't want to burden you with the additional task - the patch
> is OK as it is. I was just thinking out loud, as I remembered that
> changing bt patterns to combine splitter regressed one testcase. IIRC
> combination of two insns blocked better combination of three insns, or
> something like that.
> 
> > To turn those pre-reload define_insn_and_splits I'm afraid we'd indeed
> > need combiner's changes, so that would need to be discussed with Segher
> > first.
> 
> Yes, that is the long-term plan. Segher CC'd.

A splitter can *already* split to only one insn.


Segher


Re: [RFC PATCH v1 1/1] PPC64: Implement POWER Architecture Vector Function ABI.

2020-12-04 Thread GT via Gcc-patches
‐‐‐ Original Message ‐‐‐
On Thursday, August 20, 2020 1:48 PM, Segher Boessenkool 
 wrote:

> On Thu, Aug 20, 2020 at 04:19:36PM +, GT wrote:
>
> > > Great! Please repost with what I already pointed out fixed, that
> > > explanation added, and working links to the documentation?
> >
> > Are you ok with the titles of the patch and this document?
> > https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncabi.html
>
> It is very misleading. You can undo some of the damage in the first
> lines of the commit message, but you can also just fix the title itself,
> so that anyone can see what this is about even before reading the
> message (which is what a mail subject is for!)
>
> Segher

I have:

1. Changed the title of this document.
2. Removed all references within the document describing itself as an ABI.
3. Added a new introductory paragraph that should hopefully make the doc's
purpose clearer.

Use the new link below. There has been a name change in the link as well.

https://sourceware.org/glibc/wiki/HomePage?action=AttachFile&do=view&target=powerarchvectfuncspec.html

Bert.


Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 4, 2020 at 7:09 PM Jakub Jelinek  wrote:
>
> On Fri, Dec 04, 2020 at 07:06:45PM +0100, Uros Bizjak wrote:
> > No, I didn't want to burden you with the additional task - the patch
> > is OK as it is. I was just thinking out loud, as I remembered that
> > changing bt patterns to combine splitter regressed one testcase. IIRC
> > combination of two insns blocked better combination of three insns, or
> > something like that.
>
> Here is the patch to simplify the newly added combine splitters: when
> we split into 2 insns anyway, there is no reason to split into the masking
> define_insn_and_split we'd be splitting shortly after.
>
> Passes the new testcase, ok if it passes bootstrap/regtest on x86_64-linux
> and i686-linux?
>
> 2020-12-04  Jakub Jelinek  
>
> PR target/96226
> * config/i386/i386.md (splitter after *3_mask,
> splitter after *3_mask_1): Drop the masking from
> the patterns to split into.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2020-12-04 18:44:23.494052861 +0100
> +++ gcc/config/i386/i386.md 2020-12-04 19:00:22.192626807 +0100
> @@ -11988,8 +11988,7 @@ (define_split
>   [(set (match_dup 4) (match_dup 1))
>(set (match_dup 0)
> (any_rotate:SWI48 (match_dup 4)
> -(subreg:QI
> -  (and:SI (match_dup 2) (match_dup 3)) 0)))]
> +(subreg:QI (match_dup 2) 0)))]
>   "operands[4] = gen_reg_rtx (mode);")
>
>  (define_insn_and_split "*3_mask_1"
> @@ -12023,8 +12022,7 @@ (define_split
>== GET_MODE_BITSIZE (mode) - 1"
>   [(set (match_dup 4) (match_dup 1))
>(set (match_dup 0)
> -   (any_rotate:SWI48 (match_dup 4)
> -(and:QI (match_dup 2) (match_dup 3))))]
> +   (any_rotate:SWI48 (match_dup 4) (match_dup 2)))]
>   "operands[4] = gen_reg_rtx (mode);")
>
>  ;; Implement rotation using two double-precision
>
>
> Jakub
>


Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Uros Bizjak via Gcc-patches
On Fri, Dec 4, 2020 at 7:26 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Fri, Dec 04, 2020 at 07:06:45PM +0100, Uros Bizjak wrote:
> > On Fri, Dec 4, 2020 at 6:57 PM Jakub Jelinek  wrote:
> > >
> > > On Fri, Dec 04, 2020 at 06:53:49PM +0100, Uros Bizjak wrote:
> > > > > > I was trying that first, but it didn't work.  Without the
> > > > > > clobber it actually works right, we don't have the rotate insn with the
> > > > > > masking and no clobber, so in the end combiner does add the clobber there
> > > > > > (or would fail if the clobber couldn't be added).
> > > > >
> > > > > I was not aware of that detail ...
> > > >
> > > > That said, IMO, it would be better to rewrite other _mask and _mask_1
> > > > patterns that remove useless masking to combine splitter.
> > > > Unfortunately, the combine splitter expects exactly two output
> > > > instructions for some reason, but these patterns split to one
> > > > instruction. Perhaps it is possible to relax this limitation of
> > > > combine splitters and also allow one output instruction.
> > >
> > > I've already checked it in.  Guess I can try to change the combine splitters
> > > (can it wait till Monday?) so that they remove the masking when splitting
> > > the insn into two, so that the pre-reload splitters aren't involved.
> >
> > No, I didn't want to burden you with the additional task - the patch
> > is OK as it is. I was just thinking out loud, as I remembered that
> > changing bt patterns to combine splitter regressed one testcase. IIRC
> > combination of two insns blocked better combination of three insns, or
> > something like that.
> >
> > > To turn those pre-reload define_insn_and_splits I'm afraid we'd indeed
> > > need combiner's changes, so that would need to be discussed with Segher
> > > first.
> >
> > Yes, that is the long-term plan. Segher CC'd.
>
> A splitter can *already* split to only one insn.

Oh... brown paper bag time... I really don't know where and when I
picked up that info, since the docs indeed say:

--q--
When the combiner phase tries to split an insn pattern, it is always
the case that the pattern is _not_ matched by any 'define_insn'.  The
combiner pass first tries to split a single 'set' expression and then
the same 'set' expression inside a 'parallel', but followed by a
'clobber' of a pseudo-reg to use as a scratch register.  In these cases,
the combiner expects exactly one or two new insn patterns to be
generated.  It will verify that these patterns match some 'define_insn'
definitions, so you need not do this test in the 'define_split' (of
course, there is no point in writing a 'define_split' that will never
produce insns that match).
--/q--

Enough compilers for today, I'd say.

Uros.


Re: [PATCH] tree-optimization/98137 - enhance split_constant_offset range handling

2020-12-04 Thread Jeff Law via Gcc-patches



On 12/4/20 4:45 AM, Richard Biener wrote:
> split_constant_offset currently gives up looking at ranges when
> dealing with possibly wrapping operations for looking through
> conversions when the downstream analysis does not yield a SSA name.
> That's overly conservative and we have a nice helper that can
> deal with arbitrary expressions.  Use that.  This helps data
> reference group analysis so the testcase is fully SLP vectorized,
> making use of the whole-function "BB" vectorization capabilities
> we now have.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK for trunk?
>
> Thanks,
> Richard.
>
> 2020-12-04  Richard Biener  
>
>   PR tree-optimization/98137
>   * tree-data-ref.c (split_constant_offset_1): Use
>   determine_value_range instead of get_range_info to handle
>   arbitrary expressions.
>
>   * gcc.dg/vect/bb-slp-pr98137.c: New testcase.
LGTM.  Just a testsuite comment nit:

> +/* Exact scannig is difficult but we expect all loads and stores
s/scannig/scanning/




Re: [PATCH] i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

2020-12-04 Thread Segher Boessenkool
On Fri, Dec 04, 2020 at 07:32:43PM +0100, Uros Bizjak wrote:
> On Fri, Dec 4, 2020 at 7:26 PM Segher Boessenkool
>  wrote:
> > A splitter can *already* split to only one insn.
> 
> Oh... brown paper bag time... I really don't know where and when I
> picked up that info, since the docs indeed say:

At some point in the past it had to be always exactly two insns:

commit d340408c13f21efcbf7b012cfa7ccd3653b31281
Author: Richard Henderson 
Date:   Mon Sep 18 11:08:19 2000 -0700

* combine.c (try_combine): Allow split to create a single insn.

That is a while ago though ;-)

But, the doc was fixed with

commit 8cb0906b0fa9c07095db1ec7fb22eaeecf5075af
Author: Segher Boessenkool 
Date:   Wed Nov 6 01:06:23 2019 +0100

doc: Insn splitting by combine

(Many other places in the documentation still suggest splitters always
create multiple insns...  That is the common case of course!)

> When the combiner phase tries to split an insn pattern, it is always
> the case that the pattern is _not_ matched by any 'define_insn'.  The
> combiner pass first tries to split a single 'set' expression and then
> the same 'set' expression inside a 'parallel', but followed by a
> 'clobber' of a pseudo-reg to use as a scratch register.  In these cases,
> the combiner expects exactly one or two new insn patterns to be

(Before the 2019 commit, it said "exactly two".)

> generated.  It will verify that these patterns match some 'define_insn'
> definitions, so you need not do this test in the 'define_split' (of
> course, there is no point in writing a 'define_split' that will never
> produce insns that match).
> --/q--
> 
> Enough compilers for today, I'd say.

Enjoy your weekend!


Segher


Re: [PATCH] v3: doc/implement-c.texi: About same-as-scalar-type volatile aggregate accesses, PR94600

2020-12-04 Thread Jeff Law via Gcc-patches



On 12/4/20 7:51 AM, Hans-Peter Nilsson via Gcc-patches wrote:
>> From: Martin Sebor via Gcc-patches 
>> Date: Fri, 4 Dec 2020 01:49:51 +0100
>> On 12/3/20 12:14 PM, Hans-Peter Nilsson via Gcc-patches wrote:
>>> Belatedly, here's an updated version, using Martin Sebor's
>>> suggested wording from
>>> "https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549580.html".
>>> I added two commas, hopefully helpfully.  Albeit ok'd by Richard
>>> Biener in
>>> "https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549922.html",
>>> better have this reviewed properly, including markup (none added).
>>>
>>> Ok for trunk (gcc-11) and gcc-10?
>> Thanks for taking my suggestion!
> You're welcome!
>
>> These are just formatting nits but I would only further suggest
>> to enclose the name S (since it names a type) and the second
>> volatile in an @code{} directive (since it's a keyword).
>> (The volatile in volatile access is not one so it shouldn't
>> be formatted that way.)
> Here we go, and now with all the right email-addresses.
> Also, I inspected info and pdf output.  Yes, the last S ends up
> on a line by itself in the pdf.  I didn't think it was worth
> fixing by e.g. messing with the word order.
>
> BTW, "make -j 4 info pdf" from the top-level doesn't work;
> something is messed up in dependencies.  From a non-j "make info
> pdf" it looks like libgcc wanted to compile stuff (no "all-gcc"
> was done).
>
> ---
> We say very little about reads and writes to aggregate /
> compound objects, just scalar objects (i.e. assignments don't
> cause reads).  Let's say something safe about aggregate
> objects, but only for those that are the same size as a scalar
> type.
>
> There's an equal-sounding section (Volatiles) in extend.texi,
> but this seems a more appropriate place, as specifying the
> behavior of a standard qualifier.
>
> Ok for trunk (gcc-11) and gcc-10?
>
> gcc:
>
> 2020-12-04  Hans-Peter Nilsson  
>   Martin Sebor  
>
>   PR middle-end/94600
>   * doc/implement-c.texi (Qualifiers implementation): Add blurb
>   about access to the whole of a volatile aggregate object, only for
>   same-size as a scalar object.
OK
jeff



Re: [committed] Fix mcore multilib specification

2020-12-04 Thread Jeff Law via Gcc-patches



On 12/2/20 6:06 PM, Jim Wilson wrote:
> On Tue, Dec 1, 2020 at 3:24 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
>
> Kito's recent change to multilib handling seems to have exposed a
> latent mcore bug.
>
> The mcore 210 does not support little endian.  Yet we try to build an
> mcore-210 little-endian multilib.
>
> I don't know why this wasn't failing before, but clearly it's not
> supposed to work.  This patch adjusts the multilib set to not generate
> that particular library configuration.  The net result is mcore's
> libgcc builds again, as does newlib.
>
>
> You have two default args in MULTILIB_DEFAULTS and that was never
> supported by the print_multilib_info function.  It only handled one
> default arg correctly.  It must have been working by accident before
> Kito fixed it.  But since we know what the underlying problem is, we
> can check for targets with more than one option in MULTILIB_DEFAULTS. 
> Looks like targets that might be affected are csky, m32r, mcore, mips,
> nds32, riscv, rs6000/sysv*, and sh.  We know that riscv works
> correctly as we checked that.  And you just fixed mcore.  We should
> probably check the others.  They may or may not be OK with Kito's patch.
m32r, mcore and nds32 are at least building.  The others my tester builds
as linux configurations, I think with --disable-multilib, so I don't
have any data on them.

jeff



Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Joseph Myers
On Fri, 4 Dec 2020, Richard Biener via Gcc-patches wrote:

> Per rule changes to targets are allowed at any point per discretion of target
> maintainers.  Heck, we even accept _new_ targets during stage3/4!

For architectures that are neither primary nor secondary targets, that's 
definitely the case (the other side being that if the maintainer keeps 
putting major changes in and as a result the back-end is unstable at the 
time of branching, the branch won't be delayed for that).

For primary and secondary architectures, more care is needed to consider 
the risk of a change (but basic enabling for a new processor as in this 
patch is certainly on the safer side).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)

2020-12-04 Thread Jeff Law via Gcc-patches



On 11/24/20 11:39 AM, Martin Sebor wrote:
> On 11/24/20 10:44 AM, Andrew MacLeod wrote:
>> On 11/24/20 12:42 PM, Andrew MacLeod wrote:
>>> On 11/23/20 4:38 PM, Martin Sebor wrote:
 On 11/21/20 6:26 AM, Andrew MacLeod wrote:
> On 11/21/20 12:07 AM, Jeff Law wrote:
>>
>> On 11/9/20 9:00 AM, Martin Sebor wrote:
>>> Ping:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554000.html
>>>
>>>
>>> Jeff, I don't expect to have the cycles to reimplement this patch
>>> using the Ranger APIs before stage 1 closes.  I'm open to giving
>>> it a try in stage 3 if it's still in scope for GCC 11. Otherwise,
>>> is this patch okay to commit?
>> So all we're going to get from the ranger is ranges of operands,
>> right?
>> Meaning that we still need to either roll our own evaluator
>> (eval_size_vflow) or overload range_for_stmt with our own, which
>> likely looks like eval_size_vflow anyway, right?
>>
>> My hope was to avoid the roll our own evaluator, but that doesn't
>> look like it's in the cards in the reasonably near future.
>
> Is there a PR open showing what exactly you are looking for?
> I'm using open PRs to track enhancement requests, and they will
> all feed back into the development roadmap  I am working on.

 Not that I know of.  The background is upthread, in particular in
 Aldy's response here:
 https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554242.html

 I like the suggestion and if/when I have the time I'd like to give
 it a try.  Until then, I think the patch is useful on its own so
 I'll go with it for now.

 Longer term, I do hope we can revisit the idea of computing either
 mathematically correct ranges alongside those required by the language
 semantics, or tracking signed overflow or unsigned wraparound. E.g.,
 in:

   void* f (int n)
   {
     if (n < INT_MAX / 3)
   n = INT_MAX / 3;

     n *= sizeof (int);
     // n is [(INT_MAX / 3) * 4, INF] mathematically
     // undefined due to overflow in C
     // but [INT_MIN, INT_MAX] according to VRP
>>>
>>> but sizeof returns a size_t, which is unsigned, thus the
>>> multiply is promoted to an unsigned multiply, which means there is
>>> lots of wrapping, and I don't see how you can conclude those ranges?
>>> [INT_MIN, INT_MAX] are all possible outcomes based on the code that
>>> is generated.
>>>
>>> If I change that to
>>>   n *= (int) sizeof (int) to keep it as signed arithmetic, I see:
>>>
>>>
>>> Folding statement: n_4 = n_1 * 4;
>>> EVRP:hybrid: RVRP found singleton 2147483647
>>> Queued stmt for removal.  Folds to: 2147483647
>>> evrp visiting stmt _7 = malloc (n_4);
>>>
>>> extract_range_from_stmt visiting:
>>> _7 = foo (n_4);
>>> Folding statement: _7 = foo (n_4);
>>> EVRP:hybrid: RVRP found singleton 2147483647
>>> Folded into: _7 = malloc (2147483647);
>>>
>>> So I'm not sure what exactly you want to do?  We are calculating
>>> what the program can produce?
>>>
>>> Why do we care about alternative calculations?
>>>
>> Or rather, why do we want to do this?
>
> When computing the sizes of things, programmers commonly forget
> to consider unsigned wrapping (or signed overflow).  We simply
> assume it can't happen and that (for instance) N * sizeof (X)
> is necessarily big enough for N elements of type X.  (Grepping
> any code base for the pattern '\* sizeof' and looking for code
> that tests that the result doesn't wrap is revealing.)
>
> When overflow or wrapping does happen (typically because of poor
> precondition checking) it often leads to bugs when we end up
> allocating less space than we need and use.  A simple example
> to help illustrate what I mean:
>
>   void* g (int *a, int n)
>   {
>     a = realloc (a, n * sizeof (int) + 32);
>     for (int i = n; i != n + 32; ++i)
>   a[i] = f ();
>   }
>
> In ILP32, if (n > INT_MAX / 4 - 32) holds, n * sizeof(int) will
> wrap around zero.  The realloc call will end up allocating less
> space than expected, and the loop will write past the end of
> the allocated block.
>
> (The bug above can only be detected if we know n's range.
> I left that part out.)
>
> Historically, bugs caused by integer overflow and wrapping have
> been among the most serious security weaknesses.  Detecting these
> mistakes will help prevent some of these.
>
> The problem is that according to C/C++, nothing in the function
> above is undefined except for the buffer overflow in the loop,
> and the buffer overflow only happens because of the well-defined
> integer wrapping.  To detect the wrapping, we either need to do
> the computation in as-if infinite math and compare the final result
> to the result we get under C's truncating rules, or we need to set
> and propagate the "wraparound" bit throughout the computation.
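At the source level, one way code like g() can guard against this wraparound is to compute the size with the GCC/Clang integer overflow builtins, which evaluate the operation as if in infinite precision and return true when the exact result does not fit the destination type; that is essentially the first of the two approaches described. This is only an illustrative sketch, not what the proposed warning does, and alloc_size is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Compute n * elt_size + extra, as in the realloc call in g(), but
// report wraparound instead of silently producing a too-small size.
// Returns false (leaving *out untouched) if either step wraps.
bool alloc_size(std::size_t n, std::size_t elt_size, std::size_t extra,
                std::size_t *out) {
  std::size_t bytes;
  if (__builtin_mul_overflow(n, elt_size, &bytes))
    return false;  // n * elt_size wrapped around zero
  if (__builtin_add_overflow(bytes, extra, &bytes))
    return false;  // adding extra wrapped around zero
  *out = bytes;
  return true;
}
```

A caller would then fail the allocation (or diagnose) when alloc_size returns false, rather than passing a wrapped size to realloc.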
Just to add a bit to Martin's note.  Yes, an overflow of the size passed
to

[PATCH,rs6000] Combine patterns for p10 load-cmpi fusion

2020-12-04 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey 

This patch adds the first batch of patterns to support p10 fusion. These
will allow combine to create a single insn for a pair of instructions
that power10 can fuse and execute. These particular ones have the
requirement that only cr0 can be used when fusing a load with a compare
immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
to put that requirement in, and if it doesn't work out later the splitter
can get used.
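The immediate and register restrictions described above can be written out as a small predicate. This is a hypothetical C++ helper for illustration only, not code from the patch; the signed case matches what the name of the new const_m1_to_1_operand predicate suggests, and cr_field is an assumed parameter standing for the condition-register field the compare writes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: can a load followed by a compare-immediate against imm
// be fused?  Signed compares (such as cmpdi) allow -1, 0 or 1, unsigned
// compares allow 0 or 1, and the compare result must go to cr0.
bool ld_cmpi_fusable(std::int64_t imm, bool is_signed, int cr_field) {
  if (cr_field != 0)
    return false;  // only cr0 is allowed for the fused pair
  if (is_signed)
    return imm >= -1 && imm <= 1;
  return imm == 0 || imm == 1;
}
```

When the combined insn ends up using a different CR field, the define_insn_and_split falls back to splitting it into the separate load and compare, as the description says.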

The patterns are generated by a script genfusion.pl and live in new file
fusion.md. This script will be expanded to generate more patterns for
fusion.

This also adds option -mpower10-fusion which defaults on for power10 and
will gate all these fusion patterns. In addition I have added an
undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
that just controls the load+compare-immediate patterns. I have made
these default on for power10, but they are not disallowed for earlier
processors because the code is still valid. This allows us to test the
correctness of fusion code generation by turning it on explicitly.

If bootstrap/regtest is clean, ok for trunk?

Thanks!

   Aaron

gcc/ChangeLog:

* config/rs6000/genfusion.pl: New file, script to generate
define_insn_and_split patterns so combine can arrange fused
instructions next to each other.
* config/rs6000/fusion.md: New file, generated fused instruction
patterns for combine.
* config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
(non_update_memory_operand): New predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
POWERPC_MASKS.
* config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
prototype.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
if target is power10.  (rs6000_opt_masks): Allow -mpower10-fusion
in function attributes.  (address_is_non_pfx_d_or_x): New function.
* config/rs6000/rs6000.h: Add MASK_P10_FUSION.
* config/rs6000/rs6000.md: Include fusion.md.
* config/rs6000/rs6000.opt: Add -mpower10-fusion
and -mpower10-fusion-ld-cmpi.
* config/rs6000/t-rs6000: Add dependencies involving fusion.md.
---
 gcc/config/rs6000/fusion.md   | 357 ++
 gcc/config/rs6000/genfusion.pl| 144 
 gcc/config/rs6000/predicates.md   |  14 ++
 gcc/config/rs6000/rs6000-cpus.def |   6 +-
 gcc/config/rs6000/rs6000-protos.h |   2 +
 gcc/config/rs6000/rs6000.c|  51 +
 gcc/config/rs6000/rs6000.h|   1 +
 gcc/config/rs6000/rs6000.md   |   1 +
 gcc/config/rs6000/rs6000.opt  |   8 +
 gcc/config/rs6000/t-rs6000|   6 +-
 10 files changed, 588 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/rs6000/fusion.md
 create mode 100755 gcc/config/rs6000/genfusion.pl

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
new file mode 100644
index 000..a4d3a6ae7f3
--- /dev/null
+++ b/gcc/config/rs6000/fusion.md
@@ -0,0 +1,357 @@
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
+  [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+(compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+   (clobber (match_scratch:DI 0 "=r"))]
+  "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+  "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+  "&& reload_completed
+   && (cc_reg_not_cr0_operand (operands[2], CCmode)
+   || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+  [(set (match_dup 0) (match_dup 1))
+   (set (match_dup 2)
+(compare:CC (match_dup 0)
+   (match_dup 3)))]
+  ""
+  [(set_attr "type" "load")
+   (set_attr "cost" "8")
+   (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generate

Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Jan Hubicka
> On Fri, Dec 4, 2020 at 6:50 PM Kumar, Venkataramanan
>  wrote:
> >
> > Hi Uros
> >
> > > -Original Message-
> > > From: Uros Bizjak 
> > > Sent: Friday, December 4, 2020 2:30 PM
> > > To: Kumar, Venkataramanan 
> > > Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> > > 
> > > Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD
> > > Zen3 CPU
> > >
> > > On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
> > >  wrote:
> > > >
> > > >
> > > >
> > > >
> > > > Hi Maintainers,
> > > >
> > > >
> > > >
> > > > PFA, the patch that enables support for the next generation AMD Zen3
> > > > CPU via -march=znver3.
> > > >
> > > > This is a very basic enablement patch. As of now the cost, tuning and
> > > > scheduler changes are kept the same as znver2.
> > > >
> > > > Further changes to the cost and tunings will be done later.
> > > >
> > > >
> > > >
> > > > Ok for trunk ?
> > >
> > > Please also add a new target to multiversioning and corresponding
> > > testcases. As an example, how this is done nowadays, please see a
> > > submission for a different target at [1].
> > >
> > > BTW: It looks like multiversioning testcases lack AMD targets. Can you
> > > please add a testcase similar to testsuite/g++.target/i386/mv16.C and also
> > > add AMD targets to testsuite/gcc.target/i386/funcspec-56.inc.
> > > (this can be done in a follow-up patch).
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
> > >
> >
> > Please find attached the version 2 patch.
> >
> > I have made additional changes as suggested by you.
> > 1.  Added the AMD Zen targets to funcspec-56.inc file in the tests.
> > 2.  To cover multiversioning, added a new test with a set of AMD
> > targets detected by builtin_cpus, similar to mv16.C.
> >
> > Is it ok for trunk?
> 
> LGTM (I didn't review scheduling changes in detail).

I checked the scheduling changes and they are OK, so the patch is OK
overall.

Even with respect to Jason's point on possibly regressing a primary target
(breaking -march=native on a zen3 machine counts as a regression), the
risks here are low. There is nothing really controversial in the patch.

It would be nice to set up regular benchmarking on a zen3 machine, like
we do for zen1/2.
Honza
> 
> Uros.


Re: [PATCH RFA] vec: Simplify use with C++11 range-based 'for'.

2020-12-04 Thread Jason Merrill via Gcc-patches

On 12/4/20 3:39 AM, Richard Biener wrote:

On Thu, Dec 3, 2020 at 10:46 PM Jeff Law via Gcc-patches
 wrote:




On 12/3/20 10:53 AM, Jason Merrill via Gcc-patches wrote:

It looks cleaner if we can use a vec* directly as a range for the C++11
range-based 'for' loop, without needing to indirect from it, and also works
with null pointers.

The change in cp_parser_late_parsing_default_args is an example of how this
can be used to simplify many loops over vec*.
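A minimal sketch of how begin/end overloads taking a pointer let a container pointer be used directly in a range-based for, including a null pointer. demo::vec is a hypothetical stand-in, not GCC's vec.h, and the real overloads may differ in detail. Because the range expression has pointer type, the loop looks up begin and end only by argument-dependent lookup, which finds the overloads in the namespace of the pointed-to class.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {

// Hypothetical stand-in for a vector type: fixed storage plus a length.
template <typename T>
struct vec {
  T data[8];
  std::size_t len;
};

// Overloads on the pointer type; a null pointer yields an empty range.
template <typename T>
T *begin(vec<T> *v) { return v ? v->data : nullptr; }

template <typename T>
T *end(vec<T> *v) { return v ? v->data + v->len : nullptr; }

}  // namespace demo

// Sums the elements; no null check is needed at the call site.
int sum(demo::vec<int> *v) {
  int total = 0;
  for (int x : v)
    total += x;
  return total;
}
```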

I deliberately didn't format the new overloads for etags since they are
trivial, but am open to changing that.

Tested x86_64-pc-linux-gnu.  Is this OK for trunk now, or should I hold it for
stage 1?

gcc/ChangeLog:

   * vec.h (begin, end): Add overloads for vec*.
   * tree.c (build_constructor_from_vec): Remove *.

gcc/cp/ChangeLog:

   * decl2.c (clear_consteval_vfns): Remove *.
   * pt.c (do_auto_deduction): Remove *.
   * parser.c (cp_parser_late_parsing_default_args): Change loop
   to use range 'for'.

I'd go forward with it now, it's simple enough and simplifies the code
we end up writing...


Btw, I was disappointed with range-for, seeing that you cannot express
iterating from element 2 or iterating in reverse.  This means that when we
try to adopt range-for we'll keep a messy mix of iteration styles, since
range-for cannot express all (or even most) iterations in our code base.


In C++20, for other ranges you use a range adaptor; to start from 
element 2 you'd use https://en.cppreference.com/w/cpp/ranges/drop_view


We could add similar adaptors to GCC to use them with range-for in C++11 
code, possibly drawing on the range-v3 library (which doesn't require 
concepts support).


Jason
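Richard's two cases (starting from element 2, iterating in reverse) can be handled in C++11 with small adaptor objects along the lines Jason suggests. The following is a hypothetical sketch: `iter_range`, `drop` and `reversed` are illustrative names, not existing GCC helpers and not the range-v3 API, loosely modeled on C++20's `std::views::drop`. It only accepts lvalue ranges so that the adapted container outlives the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// A pair of iterators usable as the range of a range-based for.
template <typename Iter>
class iter_range
{
public:
  iter_range (Iter b, Iter e) : m_begin (b), m_end (e) {}
  Iter begin () const { return m_begin; }
  Iter end () const { return m_end; }
private:
  Iter m_begin, m_end;
};

// Skip the first N elements of R (all of them if R is shorter).
// Takes an lvalue so the underlying container outlives the loop.
template <typename Range>
auto drop (Range &r, std::size_t n) -> iter_range<decltype (std::begin (r))>
{
  auto b = std::begin (r);
  auto e = std::end (r);
  for (std::size_t i = 0; i < n && b != e; ++i)
    ++b;
  return iter_range<decltype (b)> (b, e);
}

// Visit the elements of R from last to first.
template <typename Range>
auto reversed (Range &r)
  -> iter_range<std::reverse_iterator<decltype (std::begin (r))>>
{
  typedef std::reverse_iterator<decltype (std::begin (r))> rev_iter;
  return iter_range<rev_iter> (rev_iter (std::end (r)),
                               rev_iter (std::begin (r)));
}
```

For example, `for (int x : drop (v, 2))` visits the elements from index 2 onwards, and `for (int x : reversed (v))` walks `v` backwards.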



Re: [PATCH] c++: Fix constexpr access to union member through pointer-to-member [PR98122]

2020-12-04 Thread Jason Merrill via Gcc-patches

On 12/4/20 12:27 PM, Jakub Jelinek wrote:

Hi!

We currently incorrectly reject the first testcase, because
cxx_fold_indirect_ref_1 doesn't attempt to handle UNION_TYPEs.
As the second testcase shows, it isn't that easy, because I believe we need
to take into account the active member and prefer that active member over
other members, because if we pick a non-active one, we might reject valid
programs.
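An illustrative sketch of the kind of code involved, not the testcases attached to the patch: reading a union member through a pointer-to-member inside a constant expression. Here `a` is the active member, so the read is a valid constant expression. The sketch assumes C++14 or later (relaxed constexpr).

```cpp
// Illustrative only: not the patch's testcases.  Needs C++14 or later.
union U { int a; char b; };

constexpr int read_via_ptm ()
{
  U u { 42 };            // Aggregate initialization activates U::a.
  int U::*pm = &U::a;    // Pointer to the active member.
  return u.*pm;          // OK in a constant expression: U::a is active.
}

static_assert (read_via_ptm () == 42, "read of the active member");

// If two members share the same offset and type (say a union with
// members int a and int b, and b active), a folder that simply took the
// first member matching the offset would see the inactive a and reject a
// valid read of b.  Hence the patch tries the active member first.
```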

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2020-12-04  Jakub Jelinek  

PR c++/98122
* constexpr.c (cxx_fold_indirect_ref_1): Add ctx argument, pass it
through to recursive call.  Handle UNION_TYPE.
(cxx_fold_indirect_ref): Add ctx argument, pass it to recursive calls
and cxx_fold_indirect_ref_1.
(cxx_eval_indirect_ref): Adjust cxx_fold_indirect_ref calls.

* g++.dg/cpp1y/constexpr-98122.C: New test.
* g++.dg/cpp2a/constexpr-98122.C: New test.


+  if (TREE_CODE (optype) == UNION_TYPE)
+   {
+ /* For unions prefer the currently active member.  */
+ constexpr_ctx new_ctx = *ctx;
+ new_ctx.quiet = true;
+ bool non_constant_p = false, overflow_p = false;
+ tree ctor = cxx_eval_constant_expression (&new_ctx, op, false,
+   &non_constant_p,
+   &overflow_p);
+ if (TREE_CODE (ctor) == CONSTRUCTOR
+ && CONSTRUCTOR_NELTS (ctor) == 1
+ && CONSTRUCTOR_ELT (ctor, 0)->index
+ && TREE_CODE (CONSTRUCTOR_ELT (ctor, 0)->index) == FIELD_DECL)
+   {
+ tree field = CONSTRUCTOR_ELT (ctor, 0)->index;


I wonder about factoring the above out into a cxx_union_active_member 
function.  OK either way.



+ unsigned HOST_WIDE_INT el_sz
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (field)));
+ if (off < el_sz)
+   {
+ tree cop = build3 (COMPONENT_REF, TREE_TYPE (field),
+op, field, NULL_TREE);
+ if (tree ret = cxx_fold_indirect_ref_1 (ctx, loc, type, cop,
+ off, empty_base))
+   return ret;
+   }
+   }
+   }
for (tree field = TYPE_FIELDS (optype);
   field; field = DECL_CHAIN (field))
if (TREE_CODE (field) == FIELD_DECL
@@ -4691,13 +4719,13 @@ cxx_fold_indirect_ref_1 (location_t loc,
if (!tree_fits_uhwi_p (pos))
  continue;
unsigned HOST_WIDE_INT upos = tree_to_uhwi (pos);
-   unsigned el_sz
+   unsigned HOST_WIDE_INT el_sz
  = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (field)));
if (upos <= off && off < upos + el_sz)
  {
tree cop = build3 (COMPONENT_REF, TREE_TYPE (field),
   op, field, NULL_TREE);
-   if (tree ret = cxx_fold_indirect_ref_1 (loc, type, cop,
+   if (tree ret = cxx_fold_indirect_ref_1 (ctx, loc, type, cop,
off - upos,
empty_base))
  return ret;
@@ -4718,7 +4746,8 @@ cxx_fold_indirect_ref_1 (location_t loc,
 with TBAA in fold_indirect_ref_1.  */
  
  static tree
-cxx_fold_indirect_ref (location_t loc, tree type, tree op0, bool *empty_base)
+cxx_fold_indirect_ref (const constexpr_ctx *ctx, location_t loc, tree type,
+  tree op0, bool *empty_base)
  {
tree sub = op0;
tree subtype;
@@ -4756,7 +4785,7 @@ cxx_fold_indirect_ref (location_t loc, t
return op;
}
else
-   return cxx_fold_indirect_ref_1 (loc, type, op, 0, empty_base);
+   return cxx_fold_indirect_ref_1 (ctx, loc, type, op, 0, empty_base);
  }
else if (TREE_CODE (sub) == POINTER_PLUS_EXPR
   && tree_fits_uhwi_p (TREE_OPERAND (sub, 1)))
@@ -4766,7 +4795,7 @@ cxx_fold_indirect_ref (location_t loc, t
  
STRIP_NOPS (op00);

if (TREE_CODE (op00) == ADDR_EXPR)
-   return cxx_fold_indirect_ref_1 (loc, type, TREE_OPERAND (op00, 0),
+   return cxx_fold_indirect_ref_1 (ctx, loc, type, TREE_OPERAND (op00, 0),
tree_to_uhwi (op01), empty_base);
  }
/* *(foo *)fooarrptr => (*fooarrptr)[0] */
@@ -4776,7 +4805,7 @@ cxx_fold_indirect_ref (location_t loc, t
tree type_domain;
tree min_val = size_zero_node;
tree newsub
-   = cxx_fold_indirect_ref (loc, TREE_TYPE (subtype), sub, NULL);
+   = cxx_fold_indirect_ref (ctx, loc, TREE_TYPE (subtype), sub, NULL);
if (newsub)
sub = newsub;
else
@@ -4811,8 +4840,8 @@ cxx_eval_indirect_ref (const constexpr_c
  }
  
/* First try to simplify it directly.  */
-  tree r = cxx_fold_indirect_ref (EXPR_LOCATION (t), TREE_TYPE 
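The field-selection logic in the hunks above can be modeled, very loosely, as picking the field whose byte range covers a given offset, trying the union's active member first and then falling back to a scan of all fields. The sketch below is a hypothetical model (`field_info` and `pick_field` are illustrative names, not GCC internals). It uses `std::size_t` for sizes, in the same spirit as the patch widening `el_sz` from `unsigned` to `unsigned HOST_WIDE_INT`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one field of an aggregate.
struct field_info
{
  std::string name;
  std::size_t pos;   // Byte offset of the field.
  std::size_t size;  // Size of the field in bytes.
};

// Return the index of the field covering byte offset OFF, or -1.
// ACTIVE is the index of the union's active member, or -1 if unknown.
int pick_field (const std::vector<field_info> &fields, std::size_t off,
                int active)
{
  // Prefer the active member, mirroring the UNION_TYPE special case.
  if (active >= 0 && static_cast<std::size_t> (active) < fields.size ())
    {
      const field_info &f = fields[active];
      if (f.pos <= off && off < f.pos + f.size)
        return active;
    }
  // Otherwise fall back to the first field whose byte range covers OFF.
  for (std::size_t i = 0; i < fields.size (); ++i)
    if (fields[i].pos <= off && off < fields[i].pos + fields[i].size)
      return static_cast<int> (i);
  return -1;
}
```

For a union, every field starts at offset 0, so several fields can match the same offset; the active-member check is what disambiguates them.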

Re: [PATCH] c++: Distinguish unsatisfaction vs errors during satisfaction [PR97093]

2020-12-04 Thread Patrick Palka via Gcc-patches
On Thu, 3 Dec 2020, Jason Merrill wrote:

> On 12/3/20 9:24 AM, Patrick Palka wrote:
> > During satisfaction, the flag info.noisy() controls three things:
> > whether to diagnose fatal errors (such as the satisfaction value of an
> > atom being non-bool); whether to diagnose unsatisfaction; and whether to
> > bypass the satisfaction cache.
> > 
> > This flag turns out to be too coarse however, for sometimes we need to
> > diagnose fatal errors but not unsatisfaction, in particular when replaying
> > an erroneous satisfaction result from constraint_satisfaction_value,
> > evaluate_concept_check and tsubst_nested_requirement.
> > 
> > And we sometimes need to bypass the satisfaction cache but not diagnose
> > unsatisfaction, in particular when evaluating the branches of a
> > disjunction when info.noisy() is true.  Currently, satisfy_disjunction
> > first quietly evaluates each branch, but doing so causes satisfy_atom
> > to insert re-normalized atoms into the satisfaction cache when
> > diagnosing unsatisfaction of the overall constraint.  This is ultimately
> > the source of PR97093.
> > 
> > To that end, this patch adds the info.diagnose_unsatisfaction_p() flag
> > which refines the info.noisy() flag.  During satisfaction info.noisy()
> > now controls whether to diagnose fatal errors, and
> > info.diagnose_unsatisfaction_p() controls whether to additionally
> > diagnose unsatisfaction.  This enables us to address the above two
> > issues straightforwardly.
> 
> > This flag refinement also allows us to fold the diagnose_foo_requirement
> > routines into the corresponding tsubst_foo_requirement ones.  Here, the
> > flags take on slightly different meanings: info.noisy() controls whether
> > to diagnose invalid types and expressions inside the requires-expression,
> > and info.diagnose_unsatisfaction_p() controls whether to diagnose the
> > overall unsatisfaction of the requires-expression.
> 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, and also tested on
> > cmcstl2 and range-v3.  Does this look OK for trunk?
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/97093
> > * constraint.cc (struct sat_info): Define.
> > (tsubst_valid_expression_requirement): Take a sat_info instead
> > of subst_info.  Perform the substitution quietly first.  Fold in
> > error-replaying code from diagnose_valid_expression.
> > (tsubst_simple_requirement): Take a sat_info instead of
> > subst_info.
> > (tsubst_type_requirement_1): New.  Fold in error-replaying code
> > from diagnose_valid_type.
> > (tsubst_type_requirement): Use it. Take a sat_info instead of
> > subst_info.
> > (tsubst_compound_requirement): Likewise.  Fold in
> > error-replaying code from diagnose_compound_requirement.
> > (tsubst_nested_requirement): Take a sat_info instead of
> > subst_info.  Perform the substitution quietly first.  Fold in
> > error-replaying code from diagnose_nested_requirement.
> > (tsubst_requirement): Take a sat_info instead of subst_info.
> > (tsubst_requirement_body): Likewise.
> > (tsubst_requires_expr): Split into two versions, one that takes
> > a sat_info argument and another that takes a complain and
> > in_decl argument.  Remove outdated documentation.  Document the
> > effects of the sat_info argument.
> > (tsubst_parameter_mapping): Take a sat_info instead of a
> > subst_info.
> > (satisfy_conjunction): Likewise.
> > (satisfy_disjunction): Likewise.  Evaluate each branch with
> > unsatisfaction diagnostics disabled rather than completely
> > quietly, and short circuit when an erroneous branch is
> > encountered.
> > (satisfy_atom): Take a sat_info instead of a subst_info.  Fix a
> > comment.  Use diagnose_unsatisfaction_p() instead of noisy() to
> > guard replaying of satisfaction failure.  Always check
> > constantness quietly first and consistently return
> > error_mark_node when the value is non-constant.
> > (satisfy_constraint_r): Document the effects of the sat_info
> > argument.  Take a sat_info instead of a subst_info.
> > (satisfy_constraint): Take a sat_info instead of a subst_info.
> > (satisfy_associated_constraints): Likewise.
> > (satisfy_constraint_expression): Likewise.
> > (satisfy_declaration_constraints): Likewise.
> > (constraint_satisfaction_value): Likewise.  Adjust.  XXX
> > (constraints_satisfied_p): Adjust.
> > (evaluate_concept_check): Adjust.
> > (diagnose_trait_expr): Make static.  Take a template args vector
> > instead of a parameter mapping.
> > (diagnose_atomic_constraint): Take a sat_info instead of a
> > subst_info.  Adjust call to diagnose_trait_expr.  Call
> > tsubst_requires_expr instead of diagnose_requires_expr.
> > (diagnose_constraints): Adjust calls to
> > constraint_satisfaction_value.
> > (diagnose_valid_expression): Remove.
> > (diagnose_valid_type): Likewise.
> > (diagnos

[PATCH 2/2] c++: Normalize nested-requirements twice at parse time [PR97093]

2020-12-04 Thread Patrick Palka via Gcc-patches
The re-normalization performed from diagnose_nested_requirement doesn't
always work because we may have already lost the necessary template
context that determines the set of in-scope template parameters used by
the nested-requirement.  This leads to normalization producing atoms
that have incomplete/bogus parameter mappings, which breaks satisfaction.

To fix this, we could just use the previously normalized form that we
computed at parse time, but this normal form lacks the diagnostic
information that leads to good error messages.

Instead, this patch makes finish_nested_requirement normalize twice at
parse time -- once without diagnostic information and once with -- so
that routines can use the "regular" normal form when performing
satisfaction quietly and the "diagnostic" normal form when performing
satisfaction noisily.  Moreover, this patch makes tsubst_nested_requirement
always first perform satisfaction quietly so that the satisfaction cache
can get consistently utilized.
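The quiet-first, replay-noisily flow described above can be sketched as follows. This is a hypothetical model, not GCC code: a "normal form" is reduced to an evaluation function plus an optional diagnostic note, and `finish_nested_req` / `tsubst_nested_req` only loosely mirror finish_nested_requirement and tsubst_nested_requirement.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class result { satisfied, unsatisfied, error };

// Hypothetical stand-in for a normalized constraint: an evaluation
// function plus optional diagnostic detail (empty in the regular form).
struct normal_form
{
  result (*eval) (int arg);
  std::string note;
};

struct nested_req
{
  normal_form norm;       // Regular form: quiet satisfaction (cacheable).
  normal_form diag_norm;  // Diagnostic form: only used to replay errors.
};

// Build both forms up front, while the needed context is still available.
nested_req finish_nested_req (result (*eval) (int), const std::string &note)
{
  nested_req req = { { eval, "" }, { eval, note } };
  return req;
}

// Evaluate FORM; when SINK is non-null (noisy), report an erroneous
// result using whatever detail FORM carries.
result satisfy (const normal_form &form, int arg,
                std::vector<std::string> *sink)
{
  result r = form.eval (arg);
  if (r == result::error && sink)
    sink->push_back (form.note.empty () ? std::string ("error") : form.note);
  return r;
}

// Always satisfy quietly first; only an erroneous result is replayed,
// noisily and with the diagnostic form.
bool tsubst_nested_req (const nested_req &req, int arg,
                        std::vector<std::string> &diags)
{
  result r = satisfy (req.norm, arg, nullptr);
  if (r == result::error)
    satisfy (req.diag_norm, arg, &diags);
  return r == result::satisfied;
}

// Example constraint: erroneous for negative input, unsatisfied for 0.
result check_positive (int n)
{
  if (n < 0)
    return result::error;
  return n > 0 ? result::satisfied : result::unsatisfied;
}
```

A plain unsatisfied result produces no output here; only an erroneous one triggers the noisy replay with the richer form.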

Finally, this patch also adds more stringent checking to
parameter_mapping_equivalent_p that would have caught the underlying bug
sooner (and deterministically).

gcc/cp/ChangeLog:

PR c++/97093
* constraint.cc (parameter_mapping_equivalent_p): Add more
stringent checking.  Clarify comment.
(tsubst_nested_requirement): Always perform satisfaction
quietly first.  If that yields an erroneous result, emit a
context message and replay satisfaction noisily with the
diagnostic normal form.
(finish_nested_requirement): Normalize the constraint-expression
twice, once with diagnostic information and once without.  Store
them in a TREE_LIST within the TREE_TYPE.
(diagnose_nested_requirement): When replaying satisfaction, use
the diagnostic normal form instead of renormalizing on the spot.

gcc/testsuite/ChangeLog:

PR c++/97093
* g++.dg/cpp2a/concepts-requires22.C: New test.
---
 gcc/cp/constraint.cc  | 41 ---
 .../g++.dg/cpp2a/concepts-requires22.C| 18 
 2 files changed, 44 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires22.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 2be1a841535..c6d4d8e7e64 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -612,7 +612,8 @@ build_parameter_mapping (tree expr, tree args, tree decl)
   return map;
 }
 
-/* True if the parameter mappings of two atomic constraints are equivalent.  */
+/* True if the parameter mappings of two atomic constraints formed
+   from the same expression are equivalent.  */
 
 static bool
 parameter_mapping_equivalent_p (tree t1, tree t2)
@@ -621,6 +622,7 @@ parameter_mapping_equivalent_p (tree t1, tree t2)
   tree map2 = ATOMIC_CONSTR_MAP (t2);
   while (map1 && map2)
 {
+  gcc_checking_assert (TREE_VALUE (map1) == TREE_VALUE (map2));
   tree arg1 = TREE_PURPOSE (map1);
   tree arg2 = TREE_PURPOSE (map2);
   if (!template_args_equal (arg1, arg2))
@@ -628,6 +630,7 @@ parameter_mapping_equivalent_p (tree t1, tree t2)
   map1 = TREE_CHAIN (map1);
   map2 = TREE_CHAIN (map2);
 }
+  gcc_checking_assert (!map1 && !map2);
   return true;
 }
 
@@ -2085,14 +2088,16 @@ tsubst_compound_requirement (tree t, tree args, subst_info info)
 static tree
 tsubst_nested_requirement (tree t, tree args, subst_info info)
 {
-  /* Ensure that we're in an evaluation context prior to satisfaction.  */
-  tree norm = TREE_TYPE (t);
-  tree result = satisfy_constraint (norm, args,
-   sat_info (info.complain, info.in_decl));
-  if (result == error_mark_node && info.quiet ())
+  /* Perform satisfaction quietly with the regular normal form.  */
+  sat_info quiet (tf_none, info.in_decl);
+  tree norm = TREE_VALUE (TREE_TYPE (t));
+  tree diag_norm = TREE_PURPOSE (TREE_TYPE (t));
+  tree result = satisfy_constraint (norm, args, quiet);
+  if (result == error_mark_node)
 {
+  /* Replay the error using the diagnostic normal form.  */
   sat_info noisy (tf_warning_or_error, info.in_decl);
-  satisfy_constraint (norm, args, noisy);
+  satisfy_constraint (diag_norm, args, noisy);
 }
   if (result != boolean_true_node)
 return error_mark_node;
@@ -3139,10 +3144,15 @@ finish_compound_requirement (location_t loc, tree expr, tree type, bool noexcept
 tree
 finish_nested_requirement (location_t loc, tree expr)
 {
-  tree norm = normalize_constraint_expression (expr, false);
+  /* We need to normalize the constraints now, at parse time, while
+ we have the necessary template context.  We normalize twice,
+ once without diagnostic information and once with, which we'll
+ later use during quiet and noisy satisfaction respectively.  */
+  tree norm = normalize_constraint_expression (expr, /*diag=*/false);
+  tree diag_norm = normalize_constraint_expression (expr, /*diag=*/true);
 
- 

[PATCH 1/2 v2] c++: Distinguish unsatisfaction vs errors during satisfaction [PR97093]

2020-12-04 Thread Patrick Palka via Gcc-patches
During satisfaction, the flag info.noisy() controls three things:
whether to diagnose ill-formed satisfaction (such as the satisfaction
value of an atom being non-bool or non-constant); whether to diagnose
unsatisfaction; and whether to bypass the satisfaction cache.

The flag turns out to be too coarse however, because in some cases we
want to diagnose ill-formed satisfaction (and bypass the satisfaction
cache) but not diagnose unsatisfaction, for instance when replaying an
erroneous satisfaction result from constraint_satisfaction_value,
evaluate_concept_check and tsubst_nested_requirement.

And when noisily evaluating a disjunction, we want to first evaluate its
branches noisily (bypassing the satisfaction cache) but suppress
unsatisfaction diagnostics.  We currently work around this by instead
first evaluating each branch quietly, but that means the recursive calls
to satisfy_atom will use the satisfaction cache.

To fix this, this patch adds the info.diagnose_unsatisfaction_p() flag,
which refines the info.noisy() flag as part of a new sat_info class that
derives from subst_info.  During satisfaction, info.noisy() now controls
whether to diagnose ill-formed satisfaction, and
info.diagnose_unsatisfaction_p() controls whether to additionally
diagnose unsatisfaction.  This enables us to address the above two
issues straightforwardly.
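A simplified model of the flag refinement and of the new satisfy_disjunction behavior. It assumes a toy `tsubst_flags` enum and a three-valued `result` in place of GCC trees; these names are illustrative, and the real sat_info also carries an `in_decl`. Here `diagnose_unsatisfaction_p()` implies `noisy()`, and a disjunction evaluates its branches with unsatisfaction diagnostics off, stops early on a satisfied or erroneous branch, and explains the failure of both branches only when asked to.

```cpp
#include <cassert>

// Simplified model; names mirror the patch but this is not GCC's code.
enum tsubst_flags { tf_none = 0, tf_error = 1 };

struct subst_info
{
  explicit subst_info (int cmp) : complain (cmp) {}
  bool quiet () const { return !(complain & tf_error); }
  bool noisy () const { return !quiet (); }
  int complain;
};

struct sat_info : subst_info
{
  explicit sat_info (int cmp, bool diag_unsat = false)
    : subst_info (cmp), diagnose_unsatisfaction (diag_unsat)
  {
    // Explaining unsatisfaction only makes sense when noisy.
    assert (!diagnose_unsatisfaction || noisy ());
  }
  bool diagnose_unsatisfaction_p () const { return diagnose_unsatisfaction; }
  bool diagnose_unsatisfaction;
};

enum class result { satisfied, unsatisfied, error };
typedef result (*branch_fn) (const sat_info &);

result satisfy_disjunction (branch_fn lhs, branch_fn rhs,
                            const sat_info &info)
{
  // Evaluate each branch with the same noisiness (so ill-formed
  // satisfaction is still diagnosed) but without explaining unsatisfaction.
  sat_info sub (info.complain, /*diag_unsat=*/false);
  result l = lhs (sub);
  if (l != result::unsatisfied)   // Satisfied or erroneous: stop early.
    return l;
  result r = rhs (sub);
  if (r != result::unsatisfied)
    return r;
  // Both branches are unsatisfied: explain each one if requested.
  if (info.diagnose_unsatisfaction_p ())
    {
      lhs (info);
      rhs (info);
    }
  return result::unsatisfied;
}

// Toy branches; always_unsat counts how often it is asked to explain itself.
int explanations = 0;
result always_sat (const sat_info &) { return result::satisfied; }
result ill_formed (const sat_info &) { return result::error; }
result always_unsat (const sat_info &info)
{
  if (info.diagnose_unsatisfaction_p ())
    ++explanations;
  return result::unsatisfied;
}
```

Because the branches are evaluated with the caller's noisiness rather than fully quietly, a model like this never needs to fall back to a cached quiet result for them.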

Incidentally, the change to satisfy_disjunction suppresses the ICE in
the PR97093 testcase because we no longer insert atoms into the
satisfaction cache that have been incorrectly re-normalized in
diagnose_nested_requirement (after losing the necessary template
context).  But the underlying re-normalization issue remains, and will
be fixed in a subsequent patch.

gcc/cp/ChangeLog:

PR c++/97093
* constraint.cc (struct sat_info): Define.
(tsubst_nested_requirement): Pass a sat_info object to
satisfy_constraint.
(satisfy_constraint_r): Take a sat_info argument instead of
subst_info.
(satisfy_conjunction): Likewise.
(satisfy_disjunction): Likewise.  Instead of first evaluating
each branch quietly, evaluate each branch only with
unsatisfaction diagnostics disabled.  Exit early if evaluation
of a branch returns error_mark_node.
(satisfy_atom): Take a sat_info argument instead of subst_info.
Fix a comment.  Check diagnose_unsatisfaction_p() instead of
noisy() before replaying a substitution failure.
(satisfy_constraint): Take a sat_info argument instead of
subst_info.
(satisfy_associated_constraints): Likewise.
(satisfy_constraint_expression): Likewise.
(satisfy_declaration_constraints): Likewise.
(constraint_satisfaction_value): Likewise and adjust
accordingly.  Fix formatting.
(constraints_satisfied_p): Pass a sat_info object to
constraint_satisfaction_value.
(evaluate_concept_check): Pass a sat_info object to
satisfy_constraint_expression.
(diagnose_nested_requirement): Likewise.
(diagnose_constraints): Pass an appropriate sat_info object to
constraint_satisfaction_value.

gcc/testsuite/ChangeLog:

PR c++/97093
* g++.dg/concepts/pr94252.C: Verify we no longer issue a
spurious satisfaction failure note when diagnosing ill-formed
satisfaction.
* g++.dg/cpp2a/concepts-requires18.C: No longer expect a
spurious satisfaction failure diagnostic when immediately
evaluating the nested-requirement subst of a
requires-expression that appears outside of a template.
* g++.dg/cpp2a/concepts-requires21.C: Verify we no longer issue
a spurious satisfaction failure note when immediately evaluating
a nested-requirement of a requires-expression that appears
outside of a template.
* g++.dg/cpp2a/concepts-nonbool3.C: New test.
* g++.dg/cpp2a/concepts-pr97093.C: New test.
---
 gcc/cp/constraint.cc  | 149 +++---
 gcc/testsuite/g++.dg/concepts/pr94252.C   |   1 +
 .../g++.dg/cpp2a/concepts-nonbool3.C  |   5 +
 .../g++.dg/cpp2a/concepts-requires18.C|   2 +-
 .../g++.dg/cpp2a/concepts-requires21.C|   1 +
 5 files changed, 104 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nonbool3.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 7f02aa0a215..2be1a841535 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -98,7 +98,35 @@ struct subst_info
   tree in_decl;
 };
 
-static tree satisfy_constraint (tree, tree, subst_info);
+/* Provides additional context for satisfaction.
+
+   The flag noisy() controls whether to diagnose ill-formed satisfaction,
+   such as the satisfaction value of an atom being non-bool or non-constant.
+
+   The flag diagnose_unsatisfaction_p(), which implies noisy(), controls
+   whether to explain why a constraint is not sati

Re: [committed] Fix non-unique testnames

2020-12-04 Thread Mike Stump via Gcc-patches
On Nov 30, 2020, at 8:00 AM, Jeff Law via Gcc-patches  
wrote:
> 
> This patch fixes a handful of tests with non-unique names which confuse
> the living hell out of compare_tests, particularly if one of two tests
> [x]fail while the other is [x]pass which compare_tests will flag as a
> regression each and every run.

Thanks.  The other way to fix the issue is to fix the tools so that they never 
fail.  :-)

Re: [committed] Fix non-unique testnames

2020-12-04 Thread Jeff Law via Gcc-patches



On 12/4/20 2:55 PM, Mike Stump wrote:
> On Nov 30, 2020, at 8:00 AM, Jeff Law via Gcc-patches 
>  wrote:
>> This patch fixes a handful of tests with non-unique names which confuse
>> the living hell out of compare_tests, particularly if one of two tests
>> [x]fail while the other is [x]pass which compare_tests will flag as a
>> regression each and every run.
> Thanks.  The other way to fix the issue is to fix the tools so that they 
> never fail.  :-)
Yes, but either way tests should be unique.

jeff
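The following small C++ checker is hypothetical and is not compare_tests itself. It illustrates why duplicate names trip up a comparison keyed on the test name: it scans DejaGnu-style summary lines of the form `KIND: name` and reports names that occur more than once, such as one XFAIL and one PASS sharing a name.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Result kinds that start a DejaGnu summary line, e.g. "PASS: name".
static bool is_result_kind (const std::string &kind)
{
  static const char *const kinds[] = { "PASS", "FAIL", "XPASS", "XFAIL",
                                       "UNSUPPORTED", "UNRESOLVED",
                                       "UNTESTED" };
  for (const char *k : kinds)
    if (kind == k)
      return true;
  return false;
}

// Return each test name that appears on more than one result line,
// in the order in which the second occurrence is seen.
std::vector<std::string> duplicate_names (const std::string &summary)
{
  std::map<std::string, int> seen;
  std::vector<std::string> dups;
  std::istringstream in (summary);
  std::string line;
  while (std::getline (in, line))
    {
      std::string::size_type colon = line.find (": ");
      if (colon == std::string::npos
          || !is_result_kind (line.substr (0, colon)))
        continue;
      std::string name = line.substr (colon + 2);
      if (++seen[name] == 2)
        dups.push_back (name);
    }
  return dups;
}
```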



Re: [PATCH 1/2 v2] c++: Distinguish unsatisfaction vs errors during satisfaction [PR97093]

2020-12-04 Thread Jason Merrill via Gcc-patches

On 12/4/20 4:33 PM, Patrick Palka wrote:

I've convinced myself to do away with the whole diagnose_requires_expr /
tsubst_requires_expr consolidation, since that part is just a pure
refactoring change and the added overloadedness of the flags is not
ideal.  This simplifies the patch considerably.


Does dropping that reduce the overloadedness?  I'm always in favor of 
reducing code duplication.  But it certainly makes sense for that to 
happen in a separate patch.



During satisfaction, the flag info.noisy() controls three things:
whether to diagnose ill-formed satisfaction (such as the satisfaction
value of an atom being non-bool or non-constant); whether to diagnose
unsatisfaction; and whether to bypass the satisfaction cache.

The flag turns out to be too coarse however, because in some cases we
want to diagnose ill-formed satisfaction (and bypass the satisfaction
cache) but not diagnose unsatisfaction, for instance when replaying an
erroneous satisfaction result from constraint_satisfaction_value,
evaluate_concept_check and tsubst_nested_requirement.

And when noisily evaluating a disjunction, we want to first evaluate its
branches noisily (bypassing the satisfaction cache) but suppress
unsatisfaction diagnostics.  We currently work around this by instead
first evaluating each branch quietly, but that means the recursive calls
to satisfy_atom will use the satisfaction cache.

To fix this, this patch adds the info.diagnose_unsatisfaction_p() flag,
which refines the info.noisy() flag as part of a new sat_info class that
derives from subst_info.  During satisfaction, info.noisy() now controls
whether to diagnose ill-formed satisfaction, and
info.diagnose_unsatisfaction_p() controls whether to additionally
diagnose unsatisfaction.  This enables us to address the above two
issues straightforwardly.

Incidentally, the change to satisfy_disjunction suppresses the ICE in
the PR97093 testcase because we no longer insert atoms into the
satisfaction cache that have been incorrectly re-normalized in
diagnose_nested_requirement (after losing the necessary template
context).  But the underlying re-normalization issue remains, and will
be fixed in a subsequent patch.

gcc/cp/ChangeLog:

PR c++/97093
* constraint.cc (struct sat_info): Define.
(tsubst_nested_requirement): Pass a sat_info object to
satisfy_constraint.
(satisfy_constraint_r): Take a sat_info argument instead of
subst_info.
(satisfy_conjunction): Likewise.
(satisfy_disjunction): Likewise.  Instead of first evaluating
each branch quietly, evaluate each branch only with
unsatisfaction diagnostics disabled.  Exit early if evaluation
of a branch returns error_mark_node.
(satisfy_atom): Take a sat_info argument instead of subst_info.
Fix a comment.  Check diagnose_unsatisfaction_p() instead of
noisy() before replaying a substitution failure.
(satisfy_constraint): Take a sat_info argument instead of
subst_info.
(satisfy_associated_constraints): Likewise.
(satisfy_constraint_expression): Likewise.
(satisfy_declaration_constraints): Likewise.
(constraint_satisfaction_value): Likewise and adjust
accordingly.  Fix formatting.
(constraints_satisfied_p): Pass a sat_info object to
constraint_satisfaction_value.
(evaluate_concept_check): Pass a sat_info object to
satisfy_constraint_expression.
(diagnose_nested_requirement): Likewise.
(diagnose_constraints): Pass an appropriate sat_info object to
constraint_satisfaction_value.

gcc/testsuite/ChangeLog:

PR c++/97093
* g++.dg/concepts/pr94252.C: Verify we no longer issue a
spurious satisfaction failure note when diagnosing ill-formed
satisfaction.
* g++.dg/cpp2a/concepts-requires18.C: No longer expect a
spurious satisfaction failure diagnostic when immediately
evaluating the nested-requirement subst of a
requires-expression that appears outside of a template.
* g++.dg/cpp2a/concepts-requires21.C: Verify we no longer issue
a spurious satisfaction failure note when immediately evaluating
a nested-requirement of a requires-expression that appears
outside of a template.
* g++.dg/cpp2a/concepts-nonbool3.C: New test.
* g++.dg/cpp2a/concepts-pr97093.C: New test.
---
  gcc/cp/constraint.cc  | 149 +++---
  gcc/testsuite/g++.dg/concepts/pr94252.C   |   1 +
  .../g++.dg/cpp2a/concepts-nonbool3.C  |   5 +
  .../g++.dg/cpp2a/concepts-requires18.C|   2 +-
  .../g++.dg/cpp2a/concepts-requires21.C|   1 +
  5 files changed, 104 insertions(+), 54 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nonbool3.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 7f02aa0a215..2be1a841535 100644
--- a/gcc/cp/constraint.

Re: [PATCH 2/2] c++: Normalize nested-requirements twice at parse time [PR97093]

2020-12-04 Thread Jason Merrill via Gcc-patches

On 12/4/20 4:33 PM, Patrick Palka wrote:

The re-normalization performed from diagnose_nested_requirement doesn't
always work because we may have already lost the necessary template
context that determines the set of in-scope template parameters used by
the nested-requirement.  This leads to normalization producing atoms
that have incomplete/bogus parameter mappings, which breaks satisfaction.

To fix this, we could just use the previously normalized form that we
computed at parse time, but this normal form lacks the diagnostic
information that leads to good error messages.

Instead, this patch makes finish_nested_requirement normalize twice at
parse time -- once without diagnostic information and once with -- so
that routines can use the "regular" normal form when performing
satisfaction quietly and the "diagnostic" normal form when performing
satisfaction noisily.  Moreover, this patch makes tsubst_nested_requirement
always first perform satisfaction quietly so that the satisfaction cache
can get consistently utilized.


I wonder about building the "regular" form from the "diagnostic" form to 
avoid doing full normalization twice, but that would be a significantly 
bigger patch.  OK.



Finally, this patch also adds more stringent checking to
parameter_mapping_equivalent_p that would have caught the underlying bug
sooner (and deterministically).

gcc/cp/ChangeLog:

PR c++/97093
* constraint.cc (parameter_mapping_equivalent_p): Add more
stringent checking.  Clarify comment.
(tsubst_nested_requirement): Always perform satisfaction
quietly first.  If that yields an erroneous result, emit a
context message and replay satisfaction noisily with the
diagnostic normal form.
(finish_nested_requirement): Normalize the constraint-expression
twice, once with diagnostic information and once without.  Store
them in a TREE_LIST within the TREE_TYPE.
(diagnose_nested_requirement): When replaying satisfaction, use
the diagnostic normal form instead of renormalizing on the spot.

gcc/testsuite/ChangeLog:

PR c++/97093
* g++.dg/cpp2a/concepts-requires22.C: New test.
---
  gcc/cp/constraint.cc  | 41 ---
  .../g++.dg/cpp2a/concepts-requires22.C| 18 
  2 files changed, 44 insertions(+), 15 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-requires22.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 2be1a841535..c6d4d8e7e64 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -612,7 +612,8 @@ build_parameter_mapping (tree expr, tree args, tree decl)
return map;
  }
  
-/* True if the parameter mappings of two atomic constraints are equivalent.  */
+/* True if the parameter mappings of two atomic constraints formed
+   from the same expression are equivalent.  */
  
  static bool
  parameter_mapping_equivalent_p (tree t1, tree t2)
@@ -621,6 +622,7 @@ parameter_mapping_equivalent_p (tree t1, tree t2)
tree map2 = ATOMIC_CONSTR_MAP (t2);
while (map1 && map2)
  {
+  gcc_checking_assert (TREE_VALUE (map1) == TREE_VALUE (map2));
tree arg1 = TREE_PURPOSE (map1);
tree arg2 = TREE_PURPOSE (map2);
if (!template_args_equal (arg1, arg2))
@@ -628,6 +630,7 @@ parameter_mapping_equivalent_p (tree t1, tree t2)
map1 = TREE_CHAIN (map1);
map2 = TREE_CHAIN (map2);
  }
+  gcc_checking_assert (!map1 && !map2);
return true;
  }
  
@@ -2085,14 +2088,16 @@ tsubst_compound_requirement (tree t, tree args, subst_info info)
  static tree
  tsubst_nested_requirement (tree t, tree args, subst_info info)
  {
-  /* Ensure that we're in an evaluation context prior to satisfaction.  */
-  tree norm = TREE_TYPE (t);
-  tree result = satisfy_constraint (norm, args,
-   sat_info (info.complain, info.in_decl));
-  if (result == error_mark_node && info.quiet ())
+  /* Perform satisfaction quietly with the regular normal form.  */
+  sat_info quiet (tf_none, info.in_decl);
+  tree norm = TREE_VALUE (TREE_TYPE (t));
+  tree diag_norm = TREE_PURPOSE (TREE_TYPE (t));
+  tree result = satisfy_constraint (norm, args, quiet);
+  if (result == error_mark_node)
  {
+  /* Replay the error using the diagnostic normal form.  */
sat_info noisy (tf_warning_or_error, info.in_decl);
-  satisfy_constraint (norm, args, noisy);
+  satisfy_constraint (diag_norm, args, noisy);
  }
if (result != boolean_true_node)
  return error_mark_node;
@@ -3139,10 +3144,15 @@ finish_compound_requirement (location_t loc, tree expr, tree type, bool noexcept
  tree
  finish_nested_requirement (location_t loc, tree expr)
  {
-  tree norm = normalize_constraint_expression (expr, false);
+  /* We need to normalize the constraints now, at parse time, while
+ we have the necessary template context.  We normalize twice,
+ once without diagnostic information

[pushed] c++: Fix deduction from auto template parameter [PR93083]

2020-12-04 Thread Jason Merrill via Gcc-patches
The check in do_class_deduction to handle passing one class placeholder
template parm as an argument for itself needed to be extended to also handle
equivalent parms from other templates.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/93083
* pt.c (convert_template_argument): Handle equivalent placeholders.
(do_class_deduction): Look through EXPR_PACK_EXPANSION, too.

gcc/testsuite/ChangeLog:

PR c++/93083
* g++.dg/cpp2a/nontype-class40.C: New test.
---
 gcc/cp/pt.c  | 12 +--
 gcc/testsuite/g++.dg/cpp2a/nontype-class40.C | 79 
 2 files changed, 86 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class40.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index e991a323de8..2d3ab92dfd1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8266,7 +8266,7 @@ convert_template_argument (tree parm,
 
   /* When determining whether an argument pack expansion is a template,
  look at the pattern.  */
-  if (TREE_CODE (arg) == TYPE_PACK_EXPANSION)
+  if (PACK_EXPANSION_P (arg))
 arg = PACK_EXPANSION_PATTERN (arg);
 
   /* Deal with an injected-class-name used as a template template arg.  */
@@ -29013,6 +29013,12 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
   if (DECL_TEMPLATE_TEMPLATE_PARM_P (tmpl))
 return ptype;
 
+  /* Initializing one placeholder from another.  */
+  if (init && TREE_CODE (init) == TEMPLATE_PARM_INDEX
+  && is_auto (TREE_TYPE (init))
+  && CLASS_PLACEHOLDER_TEMPLATE (TREE_TYPE (init)) == tmpl)
+return cp_build_qualified_type (TREE_TYPE (init), cp_type_quals (ptype));
+
   /* Look through alias templates that just rename another template.  */
   tmpl = get_underlying_template (tmpl);
   if (!ctad_template_p (tmpl))
@@ -29029,10 +29035,6 @@ do_class_deduction (tree ptype, tree tmpl, tree init,
 "with %<-std=c++20%> or %<-std=gnu++20%>");
 }
 
-  if (init && TREE_TYPE (init) == ptype)
-/* Using the template parm as its own argument.  */
-return ptype;
-
   tree type = TREE_TYPE (tmpl);
 
   bool try_list_ctor = false;
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class40.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class40.C
new file mode 100644
index 000..d19354491ff
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class40.C
@@ -0,0 +1,79 @@
+// PR c++/93083
+// { dg-do compile { target c++20 } }
+
+template
+struct FixedString
+{
+char buf[N + 1]{};
+constexpr FixedString(char const* s) {
+for (unsigned i = 0; i != N; ++i) buf[i] = s[i];
+}
+
+auto operator<=>(const FixedString&) const = default;
+constexpr operator char const*() const { return buf; }
+constexpr static unsigned size() noexcept { return N; }
+};
+
+template FixedString(char const (&)[N]) -> FixedString;
+
+template 
+struct name_list
+{
+template 
+using add_name = name_list<
+names...,
+FixedString{ name }
+>;
+};
+
+
+int main()
+{
+using names =
+name_list<>
+::add_name<"Zaphod Beeblebrox">;
+
+}
+
+// 
+
+template  struct literal {
+  constexpr literal(const char (&input)[N]) noexcept { }
+  constexpr literal(const literal &) noexcept { }
+};
+
+template  struct field { };
+
+template  struct field { };
+
+// 
+
+template 
+struct use_as_nttp {};
+
+template 
+struct has_nttp {};
+
+template 
+using has_nttp_2 = has_nttp;
+
+// 
+
+using size_t = decltype(sizeof(0));
+
+template 
+struct string_literal
+{
+  constexpr string_literal(const char*) {}
+  string_literal(string_literal const&) = default;
+};
+template 
+string_literal(const char (&)[N]) -> string_literal;
+
+template 
+struct type_string { };
+
+template 
+void foo() {
+  type_string{};
+}

base-commit: df933e307b1950ce12472660dcac1765b8eb431d
-- 
2.27.0



libgo patch committed: Update type descriptor name in fieldtrack C code

2020-12-04 Thread Ian Lance Taylor via Gcc-patches
This libgo patch updates the type descriptor name in the fieldtrack C
support code.  We were using the old name, but nothing noticed because
it is a weak reference that is permitted to be nil, so that it works
with code that does not use the field tracking library.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
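Why a stale name goes unnoticed can be seen in a small sketch of the same declaration pattern. This assumes an ELF target such as GNU/Linux, where an undefined weak symbol resolves to address 0 at link time instead of causing an error; the symbol name below is hypothetical and deliberately defined nowhere.

```cpp
// Weak reference with an explicit assembler name, the same shape as the
// map_string_bool declaration in go-fieldtrack.c.  The name is hypothetical.
extern const char hypothetical_type_descriptor[]
  __asm__("hypothetical_old_descriptor_name") __attribute__((weak));

// Returns whether the weak reference was resolved.  A misspelled or outdated
// name does not fail to link; it silently yields a null address.
bool descriptor_available() {
  return hypothetical_type_descriptor != nullptr;
}
```

Code that treats a null descriptor as "feature not in use" therefore keeps working, which is why the old name was never caught.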

Ian
4ae5e581336d9e113b61cf7d014d49bf0cd037f3
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index cd1a3961a06..019aafdde9a 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-b3a0b068f7fa2d65ba781271b2c0479d103b7d7b
+342e5f0b349553a69d7c99a18162ae2a1e6e5775
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/runtime/go-fieldtrack.c b/libgo/runtime/go-fieldtrack.c
index 22f091be3f4..80be27ca5e3 100644
--- a/libgo/runtime/go-fieldtrack.c
+++ b/libgo/runtime/go-fieldtrack.c
@@ -31,7 +31,7 @@ extern void *mapassign (const struct maptype *, void *hmap, const void *key)
 // The type descriptor for map[string] bool.  */
 extern const char map_string_bool[] __attribute__ ((weak));
 extern const char map_string_bool[]
-  __asm__ (GOSYM_PREFIX "type..map.6string.7bool");
+  __asm__ (GOSYM_PREFIX "type..map_6string_7bool");
 
 void runtime_Fieldtrack (void *) __asm__ (GOSYM_PREFIX "runtime.Fieldtrack");
 


Re: [PATCH] Hurd: Enable ifunc by default

2020-12-04 Thread Samuel Thibault via Gcc-patches
Ping?

Samuel Thibault, on Sun 08 Nov 2020 23:52:51 +0100, wrote:
> The binutils bugs seem to have been fixed.
> 
> 2020-11-08  Samuel Thibault  
> 
>   gcc/
>   * config.gcc: Enable default_gnu_indirect_function in *-*-gnu*
>   target (but not *-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu).
> ---
>  gcc/config.gcc | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index b42ebc4e5be..a347c2cec7c 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -3538,7 +3538,9 @@ esac
>  case ${target} in
>  *-*-linux*android*|*-*-linux*uclibc*|*-*-linux*musl*)
>  ;;
> -*-*-linux*)
> +*-*-kfreebsd*-gnu | *-*-kopensolaris*-gnu)
> +;;
> +*-*-linux* | *-*-gnu*)
>   case ${target} in
>   aarch64*-* | arm*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-*)
>   default_gnu_indirect_function=yes
> -- 
> 2.20.1


Re: Merge from trunk to gccgo branch

2020-12-04 Thread Ian Lance Taylor via Gcc-patches
I've now merged trunk revision
918a5b84a2c51dc9d011d39461cc276e6558069d to the gccgo branch.

Ian


Re: [PATCH] c++: ICE with -fsanitize=vptr and constexpr dynamic_cast [PR98103]

2020-12-04 Thread Marek Polacek via Gcc-patches
On Wed, Dec 02, 2020 at 09:01:48PM -0500, Jason Merrill wrote:
> On 12/2/20 6:18 PM, Marek Polacek wrote:
> > -fsanitize=vptr initializes all vtable pointers to null so that it can
> > catch invalid calls; see cp_ubsan_maybe_initialize_vtbl_ptrs.  That
> > means that evaluating a vtable reference can produce a null pointer
> > in this mode, so cxx_eval_dynamic_cast_fn should check that.
> 
> Yes, but we shouldn't accept it silently; sanitize is supposed to flag
> undefined behavior, not allow it.  If we see a null vptr, we should complain
> and set *non_constant_p.

True, I shouldn't have left it for the run-time diagnostic.  How's this, then?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
-fsanitize=vptr initializes all vtable pointers to null so that it can
catch invalid calls; see cp_ubsan_maybe_initialize_vtbl_ptrs.  That
means that evaluating a vtable reference can produce a null pointer
in this mode, so cxx_eval_dynamic_cast_fn should check that and give
an error.
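The [class.cdtor] rule that cxx_eval_dynamic_cast_fn models (while a constructor runs, the object is treated as a most derived object of the constructor's class) can also be observed at run time, without -fsanitize. This is a minimal sketch reusing the V/A/B/D shape of the new test; the member and function names are illustrative only.

```cpp
struct V { virtual ~V() = default; };
struct A : V { };

struct B : V {
  bool a_visible_during_ctor;
  B() {
    // While B's constructor runs, the dynamic type is B, so the sibling
    // A subobject of an enclosing D is not reachable through dynamic_cast.
    V *self = this;
    a_visible_during_ctor = dynamic_cast<A *>(self) != nullptr;
  }
};

// Same hierarchy as the test: D derives from both B and A.
struct D : B, A { };

bool a_visible_during_b_ctor() {
  D d;
  return d.a_visible_during_ctor;
}

bool a_visible_after_construction() {
  D d;
  B *b = &d;
  // Once D is complete, the cross-cast from B to A succeeds.
  return dynamic_cast<A *>(b) != nullptr;
}
```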

gcc/cp/ChangeLog:

PR c++/98103
* constexpr.c (cxx_eval_dynamic_cast_fn): If evaluating the vtable
yields a null pointer, give an error and return.  Use objtype.

gcc/testsuite/ChangeLog:

PR c++/98103
* g++.dg/ubsan/vptr-18.C: New test.
---
 gcc/cp/constexpr.c   | 11 ++-
 gcc/testsuite/g++.dg/ubsan/vptr-18.C | 25 +
 2 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/ubsan/vptr-18.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index e0d358027c9..c413313fbe1 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1998,11 +1998,20 @@ cxx_eval_dynamic_cast_fn (const constexpr_ctx *ctx, tree call,
  to the object under construction or destruction, this object is
  considered to be a most derived object that has the type of the
  constructor or destructor's class.  */
-  tree vtable = build_vfield_ref (obj, TREE_TYPE (obj));
+  tree vtable = build_vfield_ref (obj, objtype);
   vtable = cxx_eval_constant_expression (ctx, vtable, /*lval*/false,
 non_constant_p, overflow_p);
   if (*non_constant_p)
 return call;
+  /* With -fsanitize=vptr, we initialize all vtable pointers to null,
+ so it's possible that we got a null pointer now.  */
+  if (integer_zerop (vtable))
+{
+  if (!ctx->quiet)
+   error_at (loc, "virtual table pointer is used uninitialized");
+  *non_constant_p = true;
+  return integer_zero_node;
+}
   /* VTABLE will be &_ZTV1A + 16 or similar, get _ZTV1A.  */
   vtable = extract_obj_from_addr_offset (vtable);
   const tree mdtype = DECL_CONTEXT (vtable);
diff --git a/gcc/testsuite/g++.dg/ubsan/vptr-18.C b/gcc/testsuite/g++.dg/ubsan/vptr-18.C
new file mode 100644
index 000..cd2ca0a9fb6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ubsan/vptr-18.C
@@ -0,0 +1,25 @@
+// PR c++/98103
+// { dg-do compile { target c++20 } }
+// { dg-additional-options "-fsanitize=vptr -fno-sanitize-recover=vptr" }
+// Modified constexpr-dynamic17.C.
+
+struct V {
+  virtual void f();
+};
+
+struct A : V { };
+
+struct B : V {
+  constexpr B(V*, A*);
+};
+
+struct D : B, A {
+  constexpr D() : B((A*)this, this) { }
+};
+
+constexpr B::B(V* v, A* a)
+{
+  dynamic_cast(a); // { dg-error "uninitialized" }
+}
+
+constexpr D d;

base-commit: df933e307b1950ce12472660dcac1765b8eb431d
-- 
2.28.0



[PATCH] c-family: Fix hang with -Wsequence-point [PR98126]

2020-12-04 Thread Marek Polacek via Gcc-patches
verify_sequence_points uses verify_tree to recursively walk the
subexpressions of an expression, and while recursing, it also
keeps lists of expressions found after/before a sequence point.
For a large expression, the list can grow significantly, and
merge_tlist is quadratic: for a list of length n it iterates
n(n - 1) times and calls candidate_equal_p each time, which
can recurse further.  warn_for_collisions also has to go
through the whole list.  With a large-enough expression, the
compilation can easily get stuck here for 24 hours.

This patch is a simple kludge: if we see that the expression is
overly complex, don't even try.
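The early exit is a budgeted pre-walk: count nodes and stop as soon as the count passes a limit (verify_sequence_points_limit is 1024 in the patch), so the pre-check itself costs at most O(limit). A minimal sketch of that idea on a hypothetical Node type, not GCC's tree; the real verify_tree_lim_r also skips type subtrees, which is omitted here. The helper builds an && chain similar in spirit to the 1000-operand expression in Wsequence-point-4.C.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical expression node: just a list of operands.
struct Node {
  std::vector<std::unique_ptr<Node>> kids;
};

// Pre-order walk that gives up once more than `limit` nodes have been seen,
// like verify_tree_lim_r returning non-null to stop walk_tree.
static bool exceeds_budget(const Node *n, std::size_t &seen, std::size_t limit) {
  if (++seen > limit)
    return true;
  for (const auto &kid : n->kids)
    if (exceeds_budget(kid.get(), seen, limit))
      return true;
  return false;
}

// Cheap check to run before any quadratic analysis.
bool too_complex(const Node &root, std::size_t limit = 1024) {
  std::size_t seen = 0;
  return exceeds_budget(&root, seen, limit);
}

// Builds a left-deep "x && x && ... && x" tree: for operands >= 1 it has
// 2 * operands - 1 nodes.
std::unique_ptr<Node> make_and_chain(std::size_t operands) {
  auto expr = std::make_unique<Node>();
  for (std::size_t i = 1; i < operands; ++i) {
    auto op = std::make_unique<Node>();
    op->kids.push_back(std::move(expr));
    op->kids.push_back(std::make_unique<Node>());
    expr = std::move(op);
  }
  return expr;
}
```

With the default limit, a 512-operand chain (1023 nodes) is still analyzed, while a 513-operand chain (1025 nodes) is skipped.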

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/c-family/ChangeLog:

PR c++/98126
* c-common.c (verify_tree_lim_r): New function.
(verify_sequence_points): Use it.  Use nullptr instead of 0.

gcc/testsuite/ChangeLog:

PR c++/98126
* g++.dg/warn/Wsequence-point-4.C: New test.
---
 gcc/c-family/c-common.c   | 32 +--
 gcc/testsuite/g++.dg/warn/Wsequence-point-4.C | 53 +++
 2 files changed, 80 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wsequence-point-4.C

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index dda23520b96..0b348aec77b 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -2056,23 +2056,45 @@ verify_tree (tree x, struct tlist **pbefore_sp, struct tlist **pno_sp,
 }
 }
 
+static constexpr size_t verify_sequence_points_limit = 1024;
+
+/* Called from verify_sequence_points via walk_tree.  */
+
+static tree
+verify_tree_lim_r (tree *tp, int *walk_subtrees, void *data)
+{
+  if (++*((size_t *) data) > verify_sequence_points_limit)
+return integer_zero_node;
+
+  if (TYPE_P (*tp))
+*walk_subtrees = 0;
+
+  return NULL_TREE;
+}
+
 /* Try to warn for undefined behavior in EXPR due to missing sequence
points.  */
 
 void
 verify_sequence_points (tree expr)
 {
-  struct tlist *before_sp = 0, *after_sp = 0;
+  tlist *before_sp = nullptr, *after_sp = nullptr;
+
+  /* verify_tree is highly recursive, and merge_tlist is O(n^2),
+ so we return early if the expression is too big.  */
+  size_t n = 0;
+  if (walk_tree (&expr, verify_tree_lim_r, &n, nullptr))
+return;
 
-  warned_ids = 0;
-  save_expr_cache = 0;
-  if (tlist_firstobj == 0)
+  warned_ids = nullptr;
+  save_expr_cache = nullptr;
+  if (!tlist_firstobj)
 {
   gcc_obstack_init (&tlist_obstack);
   tlist_firstobj = (char *) obstack_alloc (&tlist_obstack, 0);
 }
 
-  verify_tree (expr, &before_sp, &after_sp, 0);
+  verify_tree (expr, &before_sp, &after_sp, NULL_TREE);
   warn_for_collisions (after_sp);
   obstack_free (&tlist_obstack, tlist_firstobj);
 }
diff --git a/gcc/testsuite/g++.dg/warn/Wsequence-point-4.C b/gcc/testsuite/g++.dg/warn/Wsequence-point-4.C
new file mode 100644
index 000..1382ab5a934
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wsequence-point-4.C
@@ -0,0 +1,53 @@
+// PR c++/98126
+// { dg-do compile }
+// { dg-options "-Wsequence-point" }
+// Make sure we don't hang when verify_tree processes a large expression.
+
+struct T { bool operator==(const T &ot) const; };
+
+#define CMP(M, N, L) t[100 * M + 10 * N + L] == ot.t[100 * M + 10 * N + L] &&
+
+#define CMP1(M, N) \
+  CMP(M, N, 0) \
+  CMP(M, N, 1) \
+  CMP(M, N, 2) \
+  CMP(M, N, 3) \
+  CMP(M, N, 4) \
+  CMP(M, N, 5) \
+  CMP(M, N, 6) \
+  CMP(M, N, 7) \
+  CMP(M, N, 8) \
+  CMP(M, N, 9)
+
+#define CMP2(M) \
+  CMP1(M, 0) \
+  CMP1(M, 1) \
+  CMP1(M, 2) \
+  CMP1(M, 3) \
+  CMP1(M, 4) \
+  CMP1(M, 5) \
+  CMP1(M, 6) \
+  CMP1(M, 7) \
+  CMP1(M, 8) \
+  CMP1(M, 9)
+
+#define GENERATE_CMPS \
+  CMP2(0) \
+  CMP2(1) \
+  CMP2(2) \
+  CMP2(3) \
+  CMP2(4) \
+  CMP2(5) \
+  CMP2(6) \
+  CMP2(7) \
+  CMP2(8) \
+  CMP2(9)
+
+struct C {
+  bool operator==(const C &ot) const {
+return
+  GENERATE_CMPS
+  true;
+  }
+  T t[999];
+};

base-commit: df933e307b1950ce12472660dcac1765b8eb431d
-- 
2.28.0



Re: [PATCH v2] c++: ICE with switch and scoped enum bit-fields [PR98043]

2020-12-04 Thread Marek Polacek via Gcc-patches
On Wed, Dec 02, 2020 at 09:50:33PM -0500, Jason Merrill wrote:
> On 12/2/20 6:18 PM, Marek Polacek wrote:
> > In this testcase we are crashing trying to gimplify a switch, because
> > the types of the switch condition and case constants have different
> > TYPE_PRECISIONs.
> > 
> > This started with my r5-3726 fix: SWITCH_STMT_TYPE is supposed to be the
> > original type of the switch condition before any conversions, so in the
> > C++ FE we need to use unlowered_expr_type to get the unlowered type of
> > enum bit-fields.
> > 
> > Normally, the switch type is subject to integral promotions, but here
> > we have a scoped enum type and those don't promote:
> > 
> >enum class B { A };
> >struct C { B c : 8; };
> > 
> >switch (x.c) // type B
> >  case B::A: // type int, will be converted to B
> > 
> > Here TREE_TYPE is "signed char" but SWITCH_STMT_TYPE is "B".  When
> > gimplifying this in gimplify_switch_expr, the index type is "B" and
> > we convert all the case values to "B" in preprocess_case_label_vec,
> > but SWITCH_COND is of type "signed char": gimple_switch_index should
> > be the (possibly promoted) type, not the original type, so we gimplify
> > the "x.c" SWITCH_COND to a SSA_NAME of type "signed char".  And then
> > we crash because the precision of the index type doesn't match the
> > precision of the case value type.
> > 
> > I think it makes sense to do the following; at the end of pop_switch
> > we've already issued the switch warnings, and since scoped enums don't
> > promote, it should be okay to use the type of SWITCH_STMT_COND.  The
> > r5-3726 change was about giving warnings for enum bit-fields anyway.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?
> > 
> > gcc/cp/ChangeLog:
> > 
> > PR c++/98043
> > * decl.c (pop_switch): If SWITCH_STMT_TYPE is a scoped enum type,
> > set it to the type of SWITCH_STMT_COND.
> 
> It might make sense to do this in cp_genericize_r instead, but here is fine.

Right.  In the end I chose pop_switch due to the other SWITCH_STMT_* handling,
so that it's in the same function.

> > --- a/gcc/cp/decl.c
> > +++ b/gcc/cp/decl.c
> > @@ -3711,6 +3711,17 @@ pop_switch (void)
> >   SWITCH_STMT_ALL_CASES_P (cs->switch_stmt) = 1;
> > if (!cs->break_stmt_seen_p)
> >   SWITCH_STMT_NO_BREAK_P (cs->switch_stmt) = 1;
> > +  /* Now that we're done with the switch warnings, set the switch type
> > + to the type of the condition if the index type was of scoped enum type.
> > + (Such types don't participate in the integer promotions.)  We do this
> > + because of bit-fields whose declared type is a scoped enum type:
> > + gimplification will use the lowered index type, but convert the
> > + case values to SWITCH_STMT_TYPE, which would have been the declared type
> > + and verify_gimple_switch doesn't accept that.  */
> > +  if (SWITCH_STMT_TYPE (cs->switch_stmt)
> > +  && SCOPED_ENUM_P (SWITCH_STMT_TYPE (cs->switch_stmt)))
> > +SWITCH_STMT_TYPE (cs->switch_stmt)
> > +  = TREE_TYPE (SWITCH_STMT_COND (cs->switch_stmt));
> 
> What would be the impact of doing this for all
> is_bitfield_expr_with_lowered_type conditions, rather than all scoped enum
> conditions?

The impact is the same: for ordinary bit-fields and unscoped enum bit-fields,
cond will already have been promoted here, so e.g. "(int) a.b", and
is_bitfield_expr_with_lowered_type will return NULL_TREE.  And for scoped
enum bit-fields we will do the same thing as in v1.  I think the SCOPED_ENUM_P
check is cheaper than is_bitfield_expr_with_lowered_type but I'm fine with
either version.  Thanks,

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

-- >8 --
In this testcase we are crashing trying to gimplify a switch, because
the types of the switch condition and case constants have different
TYPE_PRECISIONs.

This started with my r5-3726 fix: SWITCH_STMT_TYPE is supposed to be the
original type of the switch condition before any conversions, so in the
C++ FE we need to use unlowered_expr_type to get the unlowered type of
enum bit-fields.

Normally, the switch type is subject to integral promotions, but here
we have a scoped enum type and those don't promote:

  enum class B { A };
  struct C { B c : 8; };

  switch (x.c) // type B
case B::A: // type int, will be converted to B

Here TREE_TYPE is "signed char" but SWITCH_STMT_TYPE is "B".  When
gimplifying this in gimplify_switch_expr, the index type is "B" and
we convert all the case values to "B" in preprocess_case_label_vec,
but SWITCH_COND is of type "signed char": gimple_switch_index should
be the (possibly promoted) type, not the original type, so we gimplify
the "x.c" SWITCH_COND to a SSA_NAME of type "signed char".  And then
we crash because the precision of the index type doesn't match the
precision of the case value type.
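The fragment above, completed into a compilable function. The second enumerator, the default label and the classify name are assumptions added for illustration. Before this fix, GCC could crash while gimplifying a switch of this shape; with the fix it compiles normally.

```cpp
#include <type_traits>

enum class B { A, Other };

// An 8-bit bit-field whose declared type is a scoped enum, as in the PR.
struct C { B c : 8; };

// Scoped enums are not subject to the integral promotions, so the switch
// condition keeps type B and the case labels are converted to B.
int classify(C x) {
  switch (x.c) {
  case B::A:
    return 1;
  default:
    return 0;
  }
}

// The declared type of the member is B, even though the lowered
// bit-field type seen by the middle end is a narrower integer type.
static_assert(std::is_same_v<decltype(C::c), B>);
```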

I think it makes sense to do the following; at the end of pop_switch
we've already issued the switch warnings, and since

RE: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU

2020-12-04 Thread Kumar, Venkataramanan via Gcc-patches
[AMD Public Use]

Hi Honza,

> -Original Message-
> From: Jan Hubicka 
> Sent: Saturday, December 5, 2020 1:06 AM
> To: Uros Bizjak 
> Cc: Kumar, Venkataramanan ; gcc-patc...@gcc.gnu.org
> Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU
> 
> [CAUTION: External Email]
> 
> > On Fri, Dec 4, 2020 at 6:50 PM Kumar, Venkataramanan
> >  wrote:
> > >
> > > Hi Uros
> > >
> > > > -Original Message-
> > > > From: Uros Bizjak 
> > > > Sent: Friday, December 4, 2020 2:30 PM
> > > > To: Kumar, Venkataramanan 
> > > > Cc: gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz)
> > > > Subject: Re: [PATCH] [X86_64]: Enable support for next generation AMD Zen3 CPU
> > > >
> > > > On Thu, Dec 3, 2020 at 4:29 PM Kumar, Venkataramanan
> > > >  wrote:
> > > > >
> > > > > Hi Maintainers,
> > > > >
> > > > > PFA, the patch that enables support for the next generation AMD Zen3 CPU via -march=znver3.
> > > > >
> > > > > This is a very basic enablement patch. As of now, the cost, tuning and scheduler settings are kept the same as znver2.
> > > > >
> > > > > Further changes to the cost and tunings will be done later.
> > > > >
> > > > > OK for trunk?
> > > >
> > > > Please also add a new target to multiversioning and corresponding
> > > > testcases. For an example of how this is done nowadays, please see a
> > > > submission for a different target at [1].
> > > >
> > > > BTW: It looks like the multiversioning testcases lack AMD targets. Can
> > > > you please add a testcase similar to
> > > > testsuite/g++.target/i386/mv16.C and also add AMD targets to
> > > > testsuite/gcc.target/i386/funcspec-56.inc?
> > > > (This can be done in a follow-up patch.)
> > > >
> > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549699.html
> > > >
> > >
> > > Please find attached the version 2 patch.
> > >
> > > I have made additional changes as suggested by you.
> > > 1.  Added the AMD Zen targets to funcspec-56.inc file in the tests.
> > > 2.  To cover multiversioning, added a new test, similar to mv16.C, with a set of AMD targets detected by builtin_cpus.
> > >
> > > Is it OK for trunk?
> >
> > LGTM (I didn't review scheduling changes in detail).
> 
> I checked the scheduling changes and they are OK. So the patch is OK
> overall.
> 
> Even with respect to Jason's point on possibly regressing a primary target
> (breaking -march=native on a zen3 machine counts as a regression), the risks
> here are low. There is nothing really controversial in the patch.
> 
> It would be nice to set up regular benchmarking on a zen3 machine, like
> we do for zen1/2.
> Honza

Thank you for reviewing the patch.  I pushed the patch to the gcc trunk.

Ref: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3e2ae3ee285a57455d5a23bd352a68c289130186

> >
> > Uros.

Regards,
Venkat.