Re: [PATCH] musl: use correct long double abi by default

2019-11-17 Thread Andreas Krebbel
On 15.11.19 19:58, Szabolcs Nagy wrote:
> On 15/11/2019 18:15, Segher Boessenkool wrote:
>> On Fri, Nov 15, 2019 at 06:03:25PM +, Szabolcs Nagy wrote:
>>> i'm fine with that, it means the --with-long-double-128
>>> switch does not work on *-musl,
>>
>> I thought that is what you wanted to do.  If you just want to change the
>> default, do that *before* this block, not *in* it?
> 
> ok, i'll just hard code the abi:
> 
> 
> On powerpc and s390x the musl ABI requires 64 bit and 128 bit long
> double respectively, so set the long double abi accordingly instead
> of requiring correct use of --with-long-double-128.
> 
> gcc/ChangeLog:
> 
> 2019-11-15  Szabolcs Nagy  
> 
>   * configure.ac (gcc_cv_target_ldbl128): Set for *-musl* targets.
>   * configure: Regenerate.
> 

S/390 part is ok. Thanks!

Andreas





Re: [PATCH] s390: add musl support

2019-11-17 Thread Andreas Krebbel
On 15.11.19 18:23, Szabolcs Nagy wrote:
> Add the musl dynamic linker names.
> 
> Build tested on s390x-linux-musl and s390x-linux-gnu.
> 
> gcc/ChangeLog:
> 
> 2019-11-15  Szabolcs Nagy  
> 
>   * config/s390/linux.h (MUSL_DYNAMIC_LINKER32): Define.
>   (MUSL_DYNAMIC_LINKER64): Define.
> 

Ok. Thanks!

Andreas



GCC 10.0 Status Report (2019-11-18), Stage 3 in effect now

2019-11-17 Thread Richard Biener


Status
======

Stage 1 ended, GCC trunk is open for general bugfixing, stage 3.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P16
P2              201   -   3
P3              129   + 113
P4              151   +  10
P5               23   -   2
--------        ---   -----------------------
Total P1-P3     342   + 160
Total           516   + 168


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2019-10/msg00143.html


Re: [PATCH] Split X86_TUNE_AVX128_OPTIMAL into X86_TUNE_AVX256_SPLIT_REGS and X86_TUNE_AVX128_OPTIMAL

2019-11-17 Thread Hongtao Liu
On Sat, Nov 16, 2019 at 7:27 AM Jeff Law  wrote:
>
> On 11/14/19 5:21 AM, Richard Biener wrote:
> > On Tue, Nov 12, 2019 at 11:35 AM Hongtao Liu  wrote:
> >>
> >> Hi:
> >>   As mentioned in https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00832.html
> >>> So yes, it's poorly named.  A preparatory patch to clean this up
> >>> (and maybe split it into TARGET_AVX256_SPLIT_REGS and 
> >>> TARGET_AVX128_OPTIMAL)
> >>> would be nice.
> >>
> >>   Bootstrap and regression test for i386 backend is ok.
> >>   Ok for trunk?
> >
> > It looks OK to me, please give x86 maintainers a day to comment, otherwise OK
> I think this is fine to go in now.  Uros largely leaves the AVX bits to others.
Committed, thanks.
>
> jeff
>


-- 
BR,
Hongtao


[PATCH v2] Add `--with-install-sysroot=' configuration option

2019-11-17 Thread Maciej W. Rozycki
Provide means, in the form of a `--with-install-sysroot=' configuration 
option, to override the default installation directory for target 
libraries, otherwise known as $toolexeclibdir.  This is so that it is 
possible to get newly-built libraries, particularly the shared ones, 
installed in a common sysroot, so that they can be readily used by the 
target system as their host libraries, possibly over NFS, without a need 
to manually copy them over from the currently hardcoded location they 
would otherwise be installed in.

The name of the configuration option is chosen so as to give it a 
meaning, rather than referring to the obscure $toolexeclibdir.  Arguments 
are interpreted as with the `--with-sysroot=' option.  The default is 
the current value of $toolexeclibdir, so in the absence of the option 
from the invocation of the `configure' script the installation location 
for target libraries remains unchanged from the current arrangement.  
In the presence of the `--enable-version-specific-runtime-libs' option 
and for configurations building native GCC the option is ignored.

config/
* install-sysroot.m4: New file.

gcc/
* doc/install.texi (Cross-Compiler-Specific Options): Document 
`--with-install-sysroot' option.

libada/
* Makefile.in (configure_deps): Add `install-sysroot.m4'.
* configure.ac: Handle `--with-install-sysroot='.
* configure: Regenerate.

libatomic/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.

libffi/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* include/Makefile.in: Regenerate.
* man/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.

libgcc/
* Makefile.in (configure_deps): Add `install-sysroot.m4'.
* configure.ac: Handle `--with-install-sysroot='.
* configure: Regenerate.

libgfortran/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libgo/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.

libgomp/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.

libhsail-rt/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libitm/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* testsuite/Makefile.in: Regenerate.

libobjc/
* Makefile.in (aclocal_deps): Add `install-sysroot.m4'.
* aclocal.m4: Include `install-sysroot.m4'.
* configure.ac: Handle `--with-install-sysroot='.
* configure: Regenerate.

liboffloadmic/
* plugin/configure.ac: Handle `--with-install-sysroot='.
* plugin/Makefile.in: Regenerate.
* plugin/aclocal.m4: Regenerate.
* plugin/configure: Regenerate.
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libphobos/
* m4/druntime.m4: Handle `--with-install-sysroot='.
* m4/Makefile.in: Regenerate.
* libdruntime/Makefile.in: Regenerate.
* src/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libquadmath/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libsanitizer/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.
* asan/Makefile.in: Regenerate.
* interception/Makefile.in: Regenerate.
* libbacktrace/Makefile.in: Regenerate.
* lsan/Makefile.in: Regenerate.
* sanitizer_common/Makefile.in: Regenerate.
* tsan/Makefile.in: Regenerate.
* ubsan/Makefile.in: Regenerate.

libssp/
* configure.ac: Handle `--with-install-sysroot='.
* Makefile.in: Regenerate.
* aclocal.m4: Regenerate.
* configure: Regenerate.

libs

Re: Add a new combine pass

2019-11-17 Thread Andrew Pinski
On Sun, Nov 17, 2019 at 3:35 PM Richard Sandiford
 wrote:
>
> (It's 23:35 local time, so it's still just about stage 1. :-))
>
> While working on SVE, I've noticed several cases in which we fail
> to combine instructions because the combined form would need to be
> placed earlier in the instruction stream than the last of the
> instructions being combined.  This includes one very important
> case in the handling of the first fault register (FFR).
>
> Combine currently requires the combined instruction to live at the same
> location as i3.  I thought about trying to relax that restriction, but it
> would be difficult to do with the current pass structure while keeping
> everything linear-ish time.
>
> So this patch instead goes for an option that has been talked about
> several times over the years: writing a new combine pass that just
> does instruction combination, and not all the other optimisations
> that have been bolted onto combine over time.  E.g. it deliberately
> doesn't do things like nonzero-bits tracking, since that really ought
> to be a separate, more global, optimisation.
>
> This is still far from being a realistic replacement for even
> the combine parts of the current combine pass.  E.g.:
>
> - it only handles combinations that can be built up from individual
>   two-instruction combinations.
>
> - it doesn't allow new hard register clobbers to be added.
>
> - it doesn't have the special treatment of CC operations.
>
> - etc.
>
> But we have to start somewhere.
>
> On a more positive note, the pass handles things that the current
> combine pass doesn't:
>
> - the main motivating feature mentioned above: it works out where
>   the combined instruction could validly live and moves it there
>   if necessary.  If there is a range of valid places, it tries
>   to pick the best one based on register pressure (although only
>   with a simple heuristic for now).
>
> - once it has combined two instructions, it can try combining the
>   result with both later and earlier code, i.e. it can combine
>   in both directions.
>
> - it tries using REG_EQUAL notes for the final instruction.
>
> - it can parallelise two independent instructions that both read from
>   the same register or both read from memory.
>
> This last feature is useful for generating more load-pair combinations
> on AArch64.  In some cases it can also produce more store-pair combinations,
> but only for consecutive stores.  However, since the pass currently does
> this in a very greedy, peephole way, it only allows load/store-pair
> combinations if the first memory access has a higher alignment than
> the second, i.e. if we can be sure that the combined access is naturally
> aligned.  This should help it to make better decisions than the post-RA
> peephole pass in some cases while not being too aggressive.
>
> The pass is supposed to be linear time without debug insns.
> It only tries a constant number C of combinations per instruction
> and its bookkeeping updates are constant-time.  Once it has combined two
> instructions, it'll try up to C combinations on the result, but this can
> be counted against the instruction that was deleted by the combination
> and so effectively just doubles the constant.  (Note that C depends
> on MAX_RECOG_OPERANDS and the new NUM_RANGE_USERS constant.)
>
> Unfortunately, debug updates via propagate_for_debug are more expensive.
> This could probably be fixed if the pass did more to track debug insns
> itself, but using propagate_for_debug matches combine's behaviour.
>
> The patch adds two instances of the new pass: one before combine and
> one after it.  By default both are disabled, but this can be changed
> using the new 3-bit run-combine param, where:
>
> - bit 0 selects the new pre-combine pass
> - bit 1 selects the main combine pass
> - bit 2 selects the new post-combine pass
>
> The idea is that run-combine=3 can be used to see which combinations
> are missed by the new pass, while run-combine=6 (which I hope to be
> the production setting for AArch64 at -O2+) just uses the new pass
> to mop up cases that normal combine misses.  Maybe in some distant
> future, the pass will be good enough for run-combine=[14] to be a
> realistic option.
>
> I ended up having to add yet another validate_simplify_* routine,
> this time to do the equivalent of:
>
>    newx = simplify_replace_rtx (*loc, old_rtx, new_rtx);
>    validate_change (insn, loc, newx, 1);
>
> but in a more memory-efficient way.  validate_replace_rtx isn't suitable
> because it deliberately only tries simplifications in limited cases:
>
>   /* Do changes needed to keep rtx consistent.  Don't do any other
>      simplifications, as it is not our job.  */
>
> And validate_simplify_insn isn't useful for this case because it works
> on patterns that have already had changes made to them and expects
> those patterns to be valid rtxes.  simplify-replace operations instead
> need to simplify as they go, when the original modes are still to ha

[committed, obvious] libgomp: Regenerate `testsuite/Makefile.in' for GCC_HEADER_STDINT removal

2019-11-17 Thread Maciej W. Rozycki
Commit r276389 ("configure.ac: Remove GCC_HEADER_STDINT(gstdint.h)") has 
not regenerated `testsuite/Makefile.in'.  Fix it.

libgomp/
* testsuite/Makefile.in: Regenerate.
---
 libgomp/testsuite/Makefile.in |1 -
 1 file changed, 1 deletion(-)

gcc-libgomp-am-regenerate.diff
Index: gcc/libgomp/testsuite/Makefile.in
===================================================================
--- gcc.orig/libgomp/testsuite/Makefile.in
+++ gcc/libgomp/testsuite/Makefile.in
@@ -99,7 +99,6 @@ am__aclocal_m4_deps = $(top_srcdir)/../c
$(top_srcdir)/../config/lthostflags.m4 \
$(top_srcdir)/../config/multi.m4 \
$(top_srcdir)/../config/override.m4 \
-   $(top_srcdir)/../config/stdint.m4 \
$(top_srcdir)/../config/tls.m4 $(top_srcdir)/../ltoptions.m4 \
$(top_srcdir)/../ltsugar.m4 $(top_srcdir)/../ltversion.m4 \
$(top_srcdir)/../lt~obsolete.m4 $(top_srcdir)/acinclude.m4 \


[committed, obvious] libgfortran: Regenerate `Makefile.in' for `runstatedir' removal

2019-11-17 Thread Maciej W. Rozycki
A change made with r271340 ("libfortran/90038: Use posix_spawn instead 
of fork") accidentally brought the obsolete `runstatedir' setting back 
in.  Fix it.

libgfortran/
* Makefile.in: Regenerate.
---
 libgfortran/Makefile.in |1 -
 1 file changed, 1 deletion(-)

gcc-libgfortran-am-regenerate.diff
Index: gcc/libgfortran/Makefile.in
===================================================================
--- gcc.orig/libgfortran/Makefile.in
+++ gcc/libgfortran/Makefile.in
@@ -694,7 +694,6 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
-runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@


Add a new combine pass

2019-11-17 Thread Richard Sandiford
(It's 23:35 local time, so it's still just about stage 1. :-))

While working on SVE, I've noticed several cases in which we fail
to combine instructions because the combined form would need to be
placed earlier in the instruction stream than the last of the
instructions being combined.  This includes one very important
case in the handling of the first fault register (FFR).

Combine currently requires the combined instruction to live at the same
location as i3.  I thought about trying to relax that restriction, but it
would be difficult to do with the current pass structure while keeping
everything linear-ish time.

So this patch instead goes for an option that has been talked about
several times over the years: writing a new combine pass that just
does instruction combination, and not all the other optimisations
that have been bolted onto combine over time.  E.g. it deliberately
doesn't do things like nonzero-bits tracking, since that really ought
to be a separate, more global, optimisation.

This is still far from being a realistic replacement for even
the combine parts of the current combine pass.  E.g.:

- it only handles combinations that can be built up from individual
  two-instruction combinations.

- it doesn't allow new hard register clobbers to be added.

- it doesn't have the special treatment of CC operations.

- etc.

But we have to start somewhere.

On a more positive note, the pass handles things that the current
combine pass doesn't:

- the main motivating feature mentioned above: it works out where
  the combined instruction could validly live and moves it there
  if necessary.  If there is a range of valid places, it tries
  to pick the best one based on register pressure (although only
  with a simple heuristic for now).

- once it has combined two instructions, it can try combining the
  result with both later and earlier code, i.e. it can combine
  in both directions.

- it tries using REG_EQUAL notes for the final instruction.

- it can parallelise two independent instructions that both read from
  the same register or both read from memory.

This last feature is useful for generating more load-pair combinations
on AArch64.  In some cases it can also produce more store-pair combinations,
but only for consecutive stores.  However, since the pass currently does
this in a very greedy, peephole way, it only allows load/store-pair
combinations if the first memory access has a higher alignment than
the second, i.e. if we can be sure that the combined access is naturally
aligned.  This should help it to make better decisions than the post-RA
peephole pass in some cases while not being too aggressive.

The pass is supposed to be linear time without debug insns.
It only tries a constant number C of combinations per instruction
and its bookkeeping updates are constant-time.  Once it has combined two
instructions, it'll try up to C combinations on the result, but this can
be counted against the instruction that was deleted by the combination
and so effectively just doubles the constant.  (Note that C depends
on MAX_RECOG_OPERANDS and the new NUM_RANGE_USERS constant.)

Unfortunately, debug updates via propagate_for_debug are more expensive.
This could probably be fixed if the pass did more to track debug insns
itself, but using propagate_for_debug matches combine's behaviour.

The patch adds two instances of the new pass: one before combine and
one after it.  By default both are disabled, but this can be changed
using the new 3-bit run-combine param, where:

- bit 0 selects the new pre-combine pass
- bit 1 selects the main combine pass
- bit 2 selects the new post-combine pass

The idea is that run-combine=3 can be used to see which combinations
are missed by the new pass, while run-combine=6 (which I hope to be
the production setting for AArch64 at -O2+) just uses the new pass
to mop up cases that normal combine misses.  Maybe in some distant
future, the pass will be good enough for run-combine=[14] to be a
realistic option.
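
As a rough illustration of the bit assignments above, here is a
standalone sketch of how the param value could be decoded.  It is not
code from the patch: the struct, the constants and the function name are
all hypothetical, and only the meaning of the three bits is taken from
the description.

```cpp
#include <cassert>

// Hypothetical names; only the bit layout comes from the description above.
constexpr unsigned kPreCombineBit = 1u << 0;   // new pass, before combine
constexpr unsigned kMainCombineBit = 1u << 1;  // existing combine pass
constexpr unsigned kPostCombineBit = 1u << 2;  // new pass, after combine

struct CombinePasses {
  bool pre_combine;
  bool main_combine;
  bool post_combine;
};

// Decode a 3-bit run-combine value into the set of passes it selects.
inline CombinePasses decode_run_combine(unsigned value) {
  assert(value < 8 && "run-combine is a 3-bit mask");
  return CombinePasses{(value & kPreCombineBit) != 0,
                       (value & kMainCombineBit) != 0,
                       (value & kPostCombineBit) != 0};
}
```

With this decoding, 2 selects only the main combine pass (the default
behaviour), 3 selects pre-combine plus main combine, 6 selects main
combine plus post-combine, and 1 or 4 select one instance of the new
pass on its own.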

I ended up having to add yet another validate_simplify_* routine,
this time to do the equivalent of:

   newx = simplify_replace_rtx (*loc, old_rtx, new_rtx);
   validate_change (insn, loc, newx, 1);

but in a more memory-efficient way.  validate_replace_rtx isn't suitable
because it deliberately only tries simplifications in limited cases:

  /* Do changes needed to keep rtx consistent.  Don't do any other
     simplifications, as it is not our job.  */

And validate_simplify_insn isn't useful for this case because it works
on patterns that have already had changes made to them and expects
those patterns to be valid rtxes.  simplify-replace operations instead
need to simplify as they go, when the original modes are still to hand.

As far as compile-time goes, I tried compiling optabs.ii at -O2
with an --enable-checking=release compiler:

run-combine=2 (normal combine):  100.0% (baseline)
run-combine=4 (new pass only) 98.0%
run-combine=6 (both passes)  100.3

Re: [PATCH v3] PR85678: Change default to -fno-common

2019-11-17 Thread Jeff Law
On 11/5/19 10:17 AM, Wilco Dijkstra wrote:
> Hi Richard,
> 
>> Please investigate those - C++ has -fno-common already so it might be a mix
>> of C/C++ required here.  Note that secondary files can use dg-options
>> with the same behavior as dg-additional-options (they append to
>> dg-lto-options), so here in _1.c add { dg-options "-fcommon" }
> 
> The odr-6 test mixes C and C++ using .C/.c extensions. But like you suggest,
> dg-options works on the 2nd file, and with that odr-6 and pr88077 tests pass
> without needing changes in lto.exp. I needed to change one of the types since
> default object alignment is different between -fcommon and -fno-common and
> that causes linker failures when linking objects built with different -fcommon
> settings. I also checked regress on x64, there was one minor failure because
> of the alignment change, which is easily fixed.
> 
> So here is v3:
> 
> [PATCH v3] PR85678: Change default to -fno-common
> 
> GCC currently defaults to -fcommon.  As discussed in the PR, this is an
> ancient C feature which is not conforming with the latest C standards.  On
> many targets this means global variable accesses have a codesize and
> performance penalty.  This applies to C code only, C++ code is not affected
> by -fcommon.  It is about time to change the default.
> 
> Passes bootstrap and regress on AArch64 and x64. OK for commit?
> 
> ChangeLog
> 2019-11-05  Wilco Dijkstra  
> 
>   PR85678
>   * common.opt (fcommon): Change init to 1.
> 
> doc/
>   * invoke.texi (-fcommon): Update documentation.
> 
> testsuite/
>   * g++.dg/lto/odr-6_1.c: Add -fcommon.
>   * gcc.dg/alias-15.c: Likewise.
>   * gcc.dg/fdata-sections-1.c: Likewise.  
>   * gcc.dg/ipa/pr77653.c: Likewise.
>   * gcc.dg/lto/20090729_0.c: Likewise.
>   * gcc.dg/lto/20111207-1_0.c: Likewise.
>   * gcc.dg/lto/c-compatible-types-1_0.c: Likewise.
>   * gcc.dg/lto/pr55525_0.c: Likewise.
>   * gcc.dg/lto/pr88077_0.c: Use long to avoid alignment warning.
>   * gcc.dg/lto/pr88077_1.c: Add -fcommon.
>   * gcc.target/aarch64/sve/peel_ind_1.c: Allow ANCHOR0.
>   * gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
>   * gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
>   * gcc.target/i386/volatile-bitfields-2.c: Allow movl or movq.
I'd say let's go for it now.  That gives us plenty of time to work
through any problems.  I think it deserves a mention in the release notes.

jeff



Re: [PATCH] naming GCC's profile data section

2019-11-17 Thread Jeff Law
On 10/24/19 11:58 AM, David Taylor wrote:
> Our application is embedded.  And in addition to cold boot (reload
> everything; start over from scratch), we support warm boot.  As part of
> supporting warm boot, read-write data that needs to be initialized, is
> initialized by code.  And we ensure at link time that the traditional
> initialized read-write data sections (.data, .ldata, .sdata) are empty.
> 
> This presents a problem when attempting to use GCC based profiling as it
> creates read-write data in the aforementioned data sections.
> 
> This patch adds a new command line option that allows you to specify the
> name of the section where GCC puts the instrumentation data.
> 
> If the new option (-fprofile-data-section) is not specified, GCC behaves
> as before.
> 
> What's missing?  Testsuite changes.  I haven't yet figured out how to do
> automated testing of this.  To test it, I built our software, several
> thousand files, and then did an 'objdump --headers', verified that
> sections .data / .ldata / .sdata were either absent or empty, and that
> the instrumentation section had the name that I specified.
> 
> We have a copyright assignment on file from before EMC was acquired by
> Dell.  Our company lawyers assure me that it survived the acquisition
> and is still valid.
> 
> I'm sending this from GNU/Linux rather than from Windows (to avoid
> having the patch mangled), so I'm not sure what the headers will show
> for my return address.  If you wish to email me, I can be reached at
> dtaylor at emc dot com or David dot Taylor at dell dot com.  Or... you
> can just send to the gcc-patches list as I'll be reading it.
> 
> Enough verbiage, here's the ChangeLog entry and the patch...
> 
> 2019-10-23  David Taylor  
> 
>   * common.opt (fprofile-data-section): New command line switch.
>   * coverage.c (build_var): Add support for -fprofile-data-section.
>   (coverage_obj_finish): Ditto.
>   * toplev.c (process_options): Issue warning if
>   -fprofile-data-section is specified when it is not supported.
>   * doc/invoke.texi (Option Summary): List -fprofile-data-section.
>   (Instrumentation Options): Document -fprofile-data-section.
I don't see anything in here particularly worrisome.  WRT testing look
at gcc.dg/pr25376.c, that shows an example of how to tell dejagnu that
you require named sections for the test and how to scan the assembler.
It merely scans for the name of the section, but you can scan for just
about anything you want.

Other tests in that directory should give you a clue how to pass your
argument to the compiler.

> Index: gcc/toplev.c
> ===================================================================
> --- gcc/toplev.c  (revision 277133)
> +++ gcc/toplev.c  (working copy)
> @@ -1665,6 +1665,12 @@
> "%<-fdata-sections%> not supported for this target");
> flag_data_sections = 0;
>   }
> +  if (profile_data_section_name)
> + {
> +   warning_at (UNKNOWN_LOCATION, 0,
> +   "-fprofile-data-section= not supported for this target");
> +   profile_data_section_name = NULL;
I suspect you need some markup in the warning string.  Something like
the %<-fprofile-data-section=%> just like you see for flag_data_sections
immediately above the code you're adding.

jeff



[committed] Fix linux-atomic.c build on hppa-linux

2019-11-17 Thread John David Anglin
This patch fixes the build of linux-atomic.c.

Pointer operands are changed to volatile void * and value operands are
changed to unsigned.  The release functions are changed to use the kernel
cmpxchg support.

Tested on hppa-unknown-linux-gnu.  Committed to all active branches.

Dave
-- 

2019-11-17  John David Anglin  

* config/pa/linux-atomic.c (__kernel_cmpxchg): Change argument 1 to
volatile void *.  Remove trap check.
(__kernel_cmpxchg2): Likewise.
(FETCH_AND_OP_2): Adjust operand types.
(OP_AND_FETCH_2): Likewise.
(FETCH_AND_OP_WORD): Likewise.
(OP_AND_FETCH_WORD): Likewise.
(COMPARE_AND_SWAP_2): Likewise.
(__sync_val_compare_and_swap_4): Likewise.
(__sync_bool_compare_and_swap_4): Likewise.
(SYNC_LOCK_TEST_AND_SET_2): Likewise.
(__sync_lock_test_and_set_4): Likewise.
(SYNC_LOCK_RELEASE_1): Likewise.  Use __kernel_cmpxchg2 for release.
(__sync_lock_release_4): Adjust operand types.  Use __kernel_cmpxchg
for release.
(__sync_lock_release_8): Remove.
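
For readers unfamiliar with the pattern, the FETCH_AND_OP_2 helpers in
the diff below load the current value, compute the new one, and retry a
compare-and-exchange until it succeeds.  The following is only a
portable sketch of that retry loop, assuming std::atomic in place of the
kernel LWS helper; fetch_and_nand and its self-check are hypothetical
names and are not part of linux-atomic.c.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical analogue of a fetch-and-nand helper: returns the value seen
// before the update and stores ~(old & value) atomically.
template <typename T>
T fetch_and_nand(std::atomic<T>& object, T value) {
  T observed = object.load(std::memory_order_relaxed);
  T desired;
  do {
    // Recompute from the freshest observed value on every attempt.
    desired = static_cast<T>(~(observed & value));
    // On failure compare_exchange_weak refreshes `observed`, so the loop
    // retries with the value another thread just wrote.
  } while (!object.compare_exchange_weak(observed, desired));
  return observed;
}

// Small single-threaded usage check with an unsigned operand type.
inline void fetch_and_nand_selfcheck() {
  std::atomic<unsigned short> word{0x00ff};
  unsigned short previous = fetch_and_nand<unsigned short>(word, 0x0f0f);
  assert(previous == 0x00ff);
  assert(word.load() == 0xfff0);
}
```

The sketch uses an unsigned operand type, in line with the operand type
change described above.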

Index: config/pa/linux-atomic.c
===================================================================
--- config/pa/linux-atomic.c(revision 278361)
+++ config/pa/linux-atomic.c(working copy)
@@ -41,7 +41,7 @@

 /* Kernel helper for compare-and-exchange a 32-bit value.  */
 static inline long
-__kernel_cmpxchg (int *mem, int oldval, int newval)
+__kernel_cmpxchg (volatile void *mem, int oldval, int newval)
 {
   register unsigned long lws_mem asm("r26") = (unsigned long) (mem);
   register int lws_old asm("r25") = oldval;
@@ -54,20 +54,18 @@
: "i" (LWS_CAS), "r" (lws_mem), "r" (lws_old), "r" (lws_new)
: "r1", "r20", "r22", "r23", "r29", "r31", "memory"
   );
-  if (__builtin_expect (lws_errno == -EFAULT || lws_errno == -ENOSYS, 0))
-__builtin_trap ();

   /* If the kernel LWS call succeeded (lws_errno == 0), lws_ret contains
  the old value from memory.  If this value is equal to OLDVAL, the
  new value was written to memory.  If not, return -EBUSY.  */
   if (!lws_errno && lws_ret != oldval)
-lws_errno = -EBUSY;
+return -EBUSY;

   return lws_errno;
 }

 static inline long
-__kernel_cmpxchg2 (void *mem, const void *oldval, const void *newval,
+__kernel_cmpxchg2 (volatile void *mem, const void *oldval, const void *newval,
   int val_size)
 {
   register unsigned long lws_mem asm("r26") = (unsigned long) (mem);
@@ -88,9 +86,6 @@
   if (__builtin_expect (lws_ret == 0, 1))
 return 0;

-  if (__builtin_expect (lws_errno == -EFAULT || lws_errno == -ENOSYS, 0))
-__builtin_trap ();
-
   /* If the kernel LWS call fails with no error, return -EBUSY */
   if (__builtin_expect (!lws_errno, 0))
 return -EBUSY;
@@ -108,13 +103,13 @@

 #define FETCH_AND_OP_2(OP, PFX_OP, INF_OP, TYPE, WIDTH, INDEX) \
   TYPE HIDDEN  \
-  __sync_fetch_and_##OP##_##WIDTH (TYPE *ptr, TYPE val)  \
+  __sync_fetch_and_##OP##_##WIDTH (volatile void *ptr, TYPE val)   \
   {\
 TYPE tmp, newval;  \
 long failure;  \
\
 do {   \
-  tmp = __atomic_load_n (ptr, __ATOMIC_RELAXED);   \
+  tmp = __atomic_load_n ((volatile TYPE *)ptr, __ATOMIC_RELAXED);  \
   newval = PFX_OP (tmp INF_OP val);  \
   failure = __kernel_cmpxchg2 (ptr, &tmp, &newval, INDEX); \
 } while (failure != 0);\
@@ -122,36 +117,36 @@
 return tmp;  \
   }

-FETCH_AND_OP_2 (add,   , +, long long, 8, 3)
-FETCH_AND_OP_2 (sub,   , -, long long, 8, 3)
-FETCH_AND_OP_2 (or,, |, long long, 8, 3)
-FETCH_AND_OP_2 (and,   , &, long long, 8, 3)
-FETCH_AND_OP_2 (xor,   , ^, long long, 8, 3)
-FETCH_AND_OP_2 (nand, ~, &, long long, 8, 3)
+FETCH_AND_OP_2 (add,   , +, long long unsigned int, 8, 3)
+FETCH_AND_OP_2 (sub,   , -, long long unsigned int, 8, 3)
+FETCH_AND_OP_2 (or,, |, long long unsigned int, 8, 3)
+FETCH_AND_OP_2 (and,   , &, long long unsigned int, 8, 3)
+FETCH_AND_OP_2 (xor,   , ^, long long unsigned int, 8, 3)
+FETCH_AND_OP_2 (nand, ~, &, long long unsigned int, 8, 3)

-FETCH_AND_OP_2 (add,   , +, short, 2, 1)
-FETCH_AND_OP_2 (sub,   , -, short, 2, 1)
-FETCH_AND_OP_2 (or,, |, short, 2, 1)
-FETCH_AND_OP_2 (and,   , &, short, 2, 1)
-FETCH_AND_OP_2 (xor,   , ^, short, 2, 1)
-FETCH_AND_OP_2 (nand, ~, &, short, 2, 1)
+FETCH_AND_OP_2 (add,   , +, short unsigned int, 2, 1)
+FETCH_AND_OP_2 (sub,   , -, short unsigned int, 2, 1)
+FETCH_AND_OP_2 (or,

Re: [PATCH][Hashtable 5/6] Remove H1/H2 template parameters

2019-11-17 Thread Ville Voutilainen
On Sun, 17 Nov 2019 at 23:15, François Dumont  wrote:
>
> H1 used to be a reference to the user Hash, now _Hashtable and unordered
> types agree on the same Hash type which is more intuitive.
>
> I also chose to no longer support a stateful ranged hash functor. We
> only use _Mod_range_hashing and _Mask_range_hashing.
>
> Thanks to this simplification _M_bucket_index can also be simplified.

Do we know whether there are existing users that this breaks? Also, is
this ABI-compatible for our unordered containers?


[PATCH][Hashtable 6/6] PR 68303 small size optimization

2019-11-17 Thread François Dumont

This is an implementation of PR 68303.

I try to use this idea as much as possible to avoid computation of hash 
codes.


Note that tests are not showing any gain. I guess hash computation must 
be quite bad to get a benefit from it. So I am only activating it when 
hash code is not cached and/or when computation is not fast.
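
To make the idea concrete, here is a small standalone sketch of a lookup
that skips hashing for small containers.  It is not the patch itself:
small_size_find, CountingHash and g_hash_calls are hypothetical names,
and the sketch works on std::unordered_set from the outside rather than
inside _Hashtable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical instrumentation: counts how often the hash functor runs.
std::size_t g_hash_calls = 0;

struct CountingHash {
  std::size_t operator()(const std::string& s) const {
    ++g_hash_calls;
    return std::hash<std::string>{}(s);
  }
};

// Hypothetical helper: for sets of at most Threshold elements, walk the
// elements and compare keys directly; otherwise use the hashed lookup.
template <std::size_t Threshold, typename Set>
typename Set::const_iterator
small_size_find(const Set& set, const typename Set::key_type& key) {
  if (set.size() <= Threshold) {
    typename Set::key_equal eq = set.key_eq();
    for (auto it = set.begin(); it != set.end(); ++it)
      if (eq(key, *it))
        return it;
    return set.end();
  }
  return set.find(key);
}

// Usage check: below the threshold no hash code is computed at all.
inline void small_size_find_selfcheck() {
  std::unordered_set<std::string, CountingHash> names{"combine", "musl",
                                                      "sysroot"};
  g_hash_calls = 0;
  assert(small_size_find<20>(names, "musl") != names.end());
  assert(small_size_find<20>(names, "hppa") == names.end());
  assert(g_hash_calls == 0);
}
```

The trade-off is the one noted above: the linear scan costs up to size()
key comparisons, so it only pays off when computing the hash is
expensive compared with comparing keys.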


    PR libstdc++/68303
    * include/bits/hashtable_policy.h
    (_Small_size_threshold<_Hash>): New.
    (_Hashtable_traits<>): Add _Small_size_threshold std::size_t template
    parameter, default to 0.
    (_Hashtable_traits<>::__small_size_threshold): New.
    (_Hash_code_base<>::_M_hash_code(const __node_type*)): New.
    (_Equal_helper<>::_S_node_equals): New.
    * include/bits/hashtable.h:
    (__small_size_threshold_default<>): New template alias.
    (_Hashtable<>::find): Add linear lookup when size is lower or equal to
    _Small_size_threshold.
    (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Add linear
    lookup when size is lower or equal to _Small_size_threshold.
    (_Hashtable<>::_M_insert<>(_Arg&&, const _NodeGenerator&, true_type,
    size_type)): Likewise.
    (_Hashtable<>::_M_compute_hash_code(const_iterator, const key_type&)):
    New.
    (_Hashtable<>::_M_emplace<_Args>(false_type, _Args&&...)): Use latter.
    (_Hashtable<>::_M_insert(const_iterator, _Arg&&, const _NodeGenerator&,
    false_type)): Likewise.
    (_Hashtable<>::_M_find_before_node(const key_type&)): New.
    (_Hashtable<>::_M_erase(true_type, const key_type&)): Use latter if size
    is lower or equal to _Small_size_threshold.
    (_Hashtable<>::_M_erase(false_type, const key_type&)): Likewise.
    * include/bits/unordered_map.h (__umap_traits): Adapt using small size
    threshold set to 20.
    (__ummap_traits): Likewise.
    * include/bits/unordered_set.h (__uset_traits, __umset_traits):
    Likewise.

    * src/c++11/hashtable_c++0x.cc: Add  include.

Tested under Linux x86_64.

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 9dadc62e328..460f25affe4 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -48,6 +48,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		   // Mandatory to have erase not throwing.
 		   __is_nothrow_invocable>>;
 
+  template
+using __small_size_threshold_default
+  = typename conditional<__cache,
+		// No small size optimization if hash code is cached...
+		integral_constant,
+		_Small_size_threshold<_Hash>>::type;
   /**
*  Primary class template _Hashtable.
*
@@ -743,6 +749,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __node_base*
   _M_find_before_node(size_type, const key_type&, __hash_code) const;
 
+  __node_base*
+  _M_find_before_node(const key_type&);
+
   __node_type*
   _M_find_node(size_type __bkt, const key_type& __key,
 		   __hash_code __c) const
@@ -766,6 +775,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __node_base*
   _M_get_previous_node(size_type __bkt, __node_base* __n);
 
+  pair
+  _M_compute_hash_code(const_iterator __hint, const key_type& __k) const;
+
   // Insert node __n with hash code __code, in bucket __bkt if no
   // rehash (assumes no element with same key already present).
   // Takes ownership of __n if insertion succeeds, throws otherwise.
@@ -1490,6 +1502,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 find(const key_type& __k)
 -> iterator
 {
+  if (size() <= __traits_type::__small_size_threshold::value)
+	{
+	  for (auto __it = begin(); __it != end(); ++__it)
+	if (this->_M_key_equals(__k, __it._M_cur))
+	  return __it;
+	  return end();
+	}
+
   __hash_code __code = this->_M_hash_code(__k);
   std::size_t __bkt = _M_bucket_index(__code);
   return iterator(_M_find_node(__bkt, __k, __code));
@@ -1504,6 +1524,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 find(const key_type& __k) const
 -> const_iterator
 {
+  if (size() <= __traits_type::__small_size_threshold::value)
+	{
+	  for (auto __it = begin(); __it != end(); ++__it)
+	if (this->_M_key_equals(__k, __it._M_cur))
+	  return __it;
+	  return end();
+	}
+
   __hash_code __code = this->_M_hash_code(__k);
   std::size_t __bkt = _M_bucket_index(__code);
   return const_iterator(_M_find_node(__bkt, __k, __code));
@@ -1619,6 +1647,34 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return nullptr;
 }
 
+  // Find the node before the one whose key compares equal to k.
+  // Return nullptr if no node is found.
+  template
+auto
+_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
+	   _Hash, _RehashPolicy, _Traits>::
+_M_find_before_node(const key_type& __k)
+-> __node_base*
+{
+  __node_base* __prev_p = &_M_before_begin;
+  if (!__prev_p->_M_nxt)
+	return nullptr;
+
+  for (__node_type* __p = static_cast<__node_type*>(__prev_p->_M_nxt);
+	   __p != nullptr;
+	   __p = __p->_M_next())
+	{
+	  if (this->_M_key_

[PATCH][Hashtable 5/6] Remove H1/H2 template parameters

2019-11-17 Thread François Dumont
H1 used to be a reference to the user Hash; now _Hashtable and the unordered
types agree on the same Hash type, which is more intuitive.


I also chose to no longer support a stateful ranged hash functor. We
only use _Mod_range_hashing and _Mask_range_hashing.


Thanks to this simplification _M_bucket_index can also be simplified.
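
As a rough illustration of why the key argument can go away, here is a
minimal sketch assuming two stateless range-hashing policies. The names
ModRangeHashing, MaskRangeHashing and bucket_index are stand-ins invented for
this example; they are not the libstdc++ declarations.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-ins for the two stateless range-hashing policies.
struct ModRangeHashing
{
  std::size_t
  operator()(std::size_t code, std::size_t bucket_count) const noexcept
  { return code % bucket_count; }
};

struct MaskRangeHashing
{
  // Only meaningful when bucket_count is a power of two.
  std::size_t
  operator()(std::size_t code, std::size_t bucket_count) const noexcept
  { return code & (bucket_count - 1); }
};

// With a stateless range hash the bucket index depends only on the hash
// code and the bucket count, so the key does not have to be passed along.
template<typename RangedHash>
  std::size_t
  bucket_index(std::size_t code, std::size_t bucket_count) noexcept
  {
    assert(bucket_count != 0);
    return RangedHash{}(code, bucket_count);
  }
```

A user-supplied ranged hash taking (key, N) would need the key at every call
site, which is what the removed three-argument overload had to cater for.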

    * include/bits/hashtable_policy.h (_Hashtable<>): Remove _H1 and _H2
    template parameters.
    (_Hashtable_base<>): Likewise.
    (_Default_ranged_hash): Remove.
    (_Prime_rehash_policy::__ranged_hash): New.
    (_Power2_rehash_policy::__ranged_hash): New.
    (_Map_base<>): Remove _H1 and _H2 template parameters.
    (_Insert_base<>): Likewise.
    (_Insert<>): Likewise.
    (_Rehash_base<>): Likewise.
    (_Local_iterator_base<>): Remove _H1 and _H2 template parameters and add
    _RangedHash.
    (_Hash_code_base<>): Likewise.
    (_Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2, _Hash,
    __hash_not_cached_t>): Remove.
    (_Hash_code_base<>::_M_bucket_index(const _Key&, __hash_code, size_t)):
    Replace by...
    (_Hash_code_base<>::_M_bucket_index(__hash_code, size_t)): ...this.
    (_Local_iterator<>): Remove _H1 and _H2 template parameters.
    (_Local_const_iterator<>): Likewise.
    (_Equality<>): Likewise.
    * include/bits/hashtable.h (_Hashtable<>): Remove _H1 and _H2 template
    parameters.
    * include/bits/node_handle.h: Adapt.
    * include/bits/unordered_map.h: Adapt.
    * include/bits/unordered_set.h: Adapt.
    * testsuite/23_containers/unordered_set/hash_policy/26132.cc: Adapt.
    * testsuite/23_containers/unordered_set/hash_policy/71181.cc: Adapt.
    * testsuite/23_containers/unordered_set/hash_policy/load_factor.cc:
    Adapt.
    * testsuite/23_containers/unordered_set/hash_policy/rehash.cc: Adapt.
    * testsuite/23_containers/unordered_set/insert/hash_policy.cc: Adapt.
    * testsuite/23_containers/unordered_set/max_load_factor/robustness.cc:
    Adapt.
    * testsuite/performance/23_containers/insert/54075.cc: Adapt.
    * testsuite/performance/23_containers/insert_erase/41975.cc: Adapt.
    * testsuite/performance/23_containers/insert_erase/
    unordered_small_size.cc: Adapt.

Tested under Linux x86_64.

François


diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index ad07a36eb83..d09c851e8a4 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -69,31 +69,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  and returns a bool-like value that is true if the two objects
*  are considered equal.
*
-   *  @tparam _H1  The hash function. A unary function object with
+   *  @tparam _Hash  The hash function. A unary function object with
*  argument type _Key and result type size_t. Return values should
*  be distributed over the entire range [0, numeric_limits:::max()].
*
-   *  @tparam _H2  The range-hashing function (in the terminology of
-   *  Tavori and Dreizin).  A binary function object whose argument
-   *  types and result type are all size_t.  Given arguments r and N,
-   *  the return value is in the range [0, N).
-   *
-   *  @tparam _Hash  The ranged hash function (Tavori and Dreizin). A
-   *  binary function whose argument types are _Key and size_t and
-   *  whose result type is size_t.  Given arguments k and N, the
-   *  return value is in the range [0, N).  Default: hash(k, N) =
-   *  h2(h1(k), N).  If _Hash is anything other than the default, _H1
-   *  and _H2 are ignored.
-   *
-   *  @tparam _RehashPolicy  Policy class with three members, all of
-   *  which govern the bucket count. _M_next_bkt(n) returns a bucket
-   *  count no smaller than n.  _M_bkt_for_elements(n) returns a
-   *  bucket count appropriate for an element count of n.
-   *  _M_need_rehash(n_bkt, n_elt, n_ins) determines whether, if the
-   *  current bucket count is n_bkt and the current element count is
-   *  n_elt, we need to increase the bucket count.  If so, returns
-   *  make_pair(true, n), where n is the new bucket count.  If not,
-   *  returns make_pair(false, )
+   *  @tparam _RehashPolicy  Policy class with three members, all of which
+   *  govern the bucket count. _M_next_bkt(n) returns a bucket count no smaller
+   *  than n. _M_bkt_for_elements(n) returns a bucket count appropriate for an
+   *  element count of n. _M_need_rehash(n_bkt, n_elt, n_ins) determines
+   *  whether, if the current bucket count is n_bkt and the current element
+   *  count is n_elt, we need to increase the bucket count for n_ins insertions.
+   *  If so, returns make_pair(true, n), where n is the new bucket count. If
+   *  not, returns make_pair(false, )
*
*  @tparam _Traits  Compile-time class with three boolean
*  std::integral_constant members:  __cache_hash_code, __constant_iterators,
@@ -168,19 +155,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   template
+	   typename _Hash, typename _RehashPolicy, typename _Traits>
 class _Hash

[PATCH][Hashtable 4/6] Clean local_iterator implementation

2019-11-17 Thread François Dumont
Simplify the local_iterator implementation. This makes local_iterator and
iterator comparable, which is used in debug containers.
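
The effect of moving the comparison operators into a shared base can be
sketched as below. All type names here are hypothetical simplifications, not
the libstdc++ classes; the only idea taken from the patch is that both
iterator kinds derive from one base whose operator== and operator!= are
hidden friends.

```cpp
#include <cassert>
#include <cstddef>

struct Node { int value; Node* next; };

// Base holding the current node; comparisons are hidden friends, found
// by argument-dependent lookup for any type deriving from this base.
struct NodeIteratorBase
{
  Node* cur = nullptr;

  NodeIteratorBase() = default;
  explicit NodeIteratorBase(Node* p) noexcept : cur(p) { }

  void incr() noexcept { assert(cur != nullptr); cur = cur->next; }

  friend bool
  operator==(const NodeIteratorBase& x, const NodeIteratorBase& y) noexcept
  { return x.cur == y.cur; }

  friend bool
  operator!=(const NodeIteratorBase& x, const NodeIteratorBase& y) noexcept
  { return x.cur != y.cur; }
};

// Iterates over the whole list.
struct Iterator : NodeIteratorBase
{
  using NodeIteratorBase::NodeIteratorBase;
};

// Iterates over one bucket; carries extra state but the same node pointer.
struct LocalIterator : NodeIteratorBase
{
  std::size_t bucket = 0;

  LocalIterator(Node* p, std::size_t b) noexcept
  : NodeIteratorBase(p), bucket(b) { }
};
```

Because both arguments convert to the base by reference, an Iterator and a
LocalIterator pointing at the same node compare equal without any dedicated
mixed-type overload.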


    * include/bits/hashtable_policy.h (_Node_iterator_base()): New.
    (operator==(const _Node_iterator_base&, const _Node_iterator_base&)):
    Make hidden friend.
    (operator!=(const _Node_iterator_base&, const _Node_iterator_base&)):
    Make hidden friend.
    (_Local_iterator_base<>): Inherits _Node_iterator_base.
    (_Local_iterator_base<>::_M_cur): Remove.
    (_Local_iterator_base<>::_M_curr()): Remove.
    (operator==(const _Local_iterator_base&, const _Local_iterator_base&)):
    Remove.
    (operator!=(const _Local_iterator_base&, const _Local_iterator_base&)):
    Remove.
    * include/debug/unordered_map (unordered_map<>::_M_invalidate): Adapt.
    (unordered_multimap<>::_M_invalidate): Adapt.
    * include/debug/unordered_set (unordered_set<>::_M_invalidate): Adapt.
    (unordered_multiset<>::_M_invalidate): Adapt.

Tested under Linux x86_64.

François

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index f330f7f811b..5cc943b3d22 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -301,27 +301,24 @@ namespace __detail
 
   __node_type*  _M_cur;
 
+  _Node_iterator_base() = default;
   _Node_iterator_base(__node_type* __p) noexcept
   : _M_cur(__p) { }
 
   void
   _M_incr() noexcept
   { _M_cur = _M_cur->_M_next(); }
-};
 
-  template
-inline bool
-operator==(const _Node_iterator_base<_Value, _Cache_hash_code>& __x,
-	   const _Node_iterator_base<_Value, _Cache_hash_code >& __y)
-noexcept
-{ return __x._M_cur == __y._M_cur; }
+  friend bool
+  operator==(const _Node_iterator_base& __x, const _Node_iterator_base& __y)
+  noexcept
+  { return __x._M_cur == __y._M_cur; }
 
-  template
-inline bool
-operator!=(const _Node_iterator_base<_Value, _Cache_hash_code>& __x,
-	   const _Node_iterator_base<_Value, _Cache_hash_code>& __y)
-noexcept
-{ return __x._M_cur != __y._M_cur; }
+  friend bool
+  operator!=(const _Node_iterator_base& __x, const _Node_iterator_base& __y)
+  noexcept
+  { return __x._M_cur != __y._M_cur; }
+};
 
   /// Node iterators, used to iterate through all the hashtable.
   template::type;
 
   _Node_iterator() noexcept
-  : __base_type(0) { }
+  : __base_type(nullptr) { }
 
   explicit
   _Node_iterator(__node_type* __p) noexcept
@@ -394,7 +391,7 @@ namespace __detail
   typedef const _Value&reference;
 
   _Node_const_iterator() noexcept
-  : __base_type(0) { }
+  : __base_type(nullptr) { }
 
   explicit
   _Node_const_iterator(__node_type* __p) noexcept
@@ -1426,9 +1423,11 @@ namespace __detail
 struct _Local_iterator_base<_Key, _Value, _ExtractKey,
 _H1, _H2, _Hash, __hash_cached_t>
 : private _Hashtable_ebo_helper<0, _H2>
+, _Node_iterator_base<_Value, __hash_cached_t>
 {
 protected:
   using __base_type = _Hashtable_ebo_helper<0, _H2>;
+  using __base_node_iter = _Node_iterator_base<_Value, __hash_cached_t>;
   using __hash_code_base = _Hash_code_base<_Key, _Value, _ExtractKey,
 	   _H1, _H2, _Hash,
 	   __hash_cached_t>;
@@ -1437,31 +1436,27 @@ namespace __detail
   _Local_iterator_base(const __hash_code_base& __base,
 			   _Hash_node<_Value, __hash_cached_t>* __p,
 			   std::size_t __bkt, std::size_t __bkt_count)
-  : __base_type(__base._M_h2()),
-	_M_cur(__p), _M_bucket(__bkt), _M_bucket_count(__bkt_count) { }
+  : __base_type(__base._M_h2()), __base_node_iter(__p)
+  , _M_bucket(__bkt), _M_bucket_count(__bkt_count) { }
 
   void
   _M_incr()
   {
-	_M_cur = _M_cur->_M_next();
-	if (_M_cur)
+	__base_node_iter::_M_incr();
+	if (this->_M_cur)
 	  {
 	std::size_t __bkt
-	  = __base_type::_M_get()(_M_cur->_M_hash_code,
-	   _M_bucket_count);
+	  = __base_type::_M_get()(this->_M_cur->_M_hash_code,
+  _M_bucket_count);
 	if (__bkt != _M_bucket)
-	  _M_cur = nullptr;
+	  this->_M_cur = nullptr;
 	  }
   }
 
-  _Hash_node<_Value, __hash_cached_t>*  _M_cur;
   std::size_t _M_bucket;
   std::size_t _M_bucket_count;
 
 public:
-  const void*
-  _M_curr() const { return _M_cur; }  // for equality ops
-
   std::size_t
   _M_get_bucket() const { return _M_bucket; }  // for debug mode
 };
@@ -1510,18 +1505,20 @@ namespace __detail
 struct _Local_iterator_base<_Key, _Value, _ExtractKey,
 _H1, _H2, _Hash, __hash_not_cached_t>
 : __hash_code_for_local_iter<_Key, _Value, _ExtractKey, _H1, _H2, _Hash>
+, _Node_iterator_base<_Value, __hash_not_cached_t>
 {
 protected:
   using __hash_code_base = _Hash_code_base<_Key, _Value, _ExtractKey,
 	   _H1, _H2, _Hash,
 	   __hash_not_cached_t>;
+  usi

Re: LRA: handle memory constraints that accept more than "m"

2019-11-17 Thread Jeff Law
On 11/8/19 2:03 AM, Richard Sandiford wrote:
> LRA allows address constraints that are more relaxed than "p":
> 
>   /* Target hooks sometimes don't treat extra-constraint addresses as
>  legitimate address_operands, so handle them specially.  */
>   if (insn_extra_address_constraint (cn)
>   && satisfies_address_constraint_p (&ad, cn))
> return change_p;
> 
> For SVE it's useful to allow the same thing for memory constraints.
> The particular use case is LD1RQ, which is an SVE instruction that
> addresses Advanced SIMD vector modes and that accepts some addresses
> that normal Advanced SIMD moves don't.
> 
> Normally we require every memory to satisfy at least "m", which is
> defined to be a memory "with any kind of address that the machine
> supports in general".  However, LD1RQ is very much special-purpose:
> it doesn't really have any relation to normal operations on these
> modes.  Adding its addressing modes to "m" would lead to bad Advanced
> SIMD optimisation decisions in passes like ivopts.  LD1RQ therefore
> has a memory constraint that accepts things "m" doesn't.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-11-08  Richard Sandiford  
> 
> gcc/
>   * lra-constraints.c (valid_address_p): Take the operand and a
>   constraint as argument.  If the operand is a MEM and the constraint
>   is a memory constraint, check whether the eliminated form of the
>   MEM already satisfies the constraint.
>   (process_address_1): Update calls accordingly.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/sve/acle/asm/ld1rq_f16.c: Remove XFAIL.
>   * gcc.target/aarch64/sve/acle/asm/ld1rq_f32.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/ld1rq_f64.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/ld1rq_s16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/ld1rq_s32.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/ld1rq_s64.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/ld1rq_u16.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/ld1rq_u32.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/ld1rq_u64.c: Likewise.
OK.  Obviously I'll be on the lookout for any fallout on other targets.

jeff



Re: [PATCH 4/4] MSP430: Deprecate -minrt option

2019-11-17 Thread Jeff Law
On 11/7/19 2:41 PM, Jozef Lawrynowicz wrote:
> Support for the MSP430 -minrt option has been removed from Newlib, since all
> the associated behaviour is now dynamic. Initialization code run before main
> is only included when needed.
> 
> This patch removes the final traces of -minrt from GCC.
> 
> -minrt used to modify the linking process in the following ways:
> * Removing .init and .fini sections, by using a reduced crt0 and excluding
>   crtn.
> * Removing crtbegin and crtend (thereby not using crtstuff.c at all).
>   + This meant that even if the program had constructors for global or
> static objects which must run before main, it would blindly remove them.
> 
> These causes of code bloat have been addressed by:
> * switching to .{init,fini}_array instead of using .{init,fini} sections
>   "Lean" code to run through constructors before main is only included if
>   .init_array has contents.
> * removing bloat (frame_dummy, *tm_clones*, *do_global_dtors*) from the
>   crtstuff.c with the changes in the previous patches
> 
> Here are some examples of the total size of different "barebones" C programs
> to show that the size previously achieved by -minrt is now matched by default:
> 
> program                 |old (with -minrt)  |new (without -minrt)
> -----------------------------------------------------------------
> Empty main              |20                 |20
> Looping main            |14                 |14
> Looping main with data  |94                 |94
> Looping main with bss   |56                 |56
> 
> 
> 0004-MSP430-Remove-minrt-option.patch
> 
> From 6e561b45c118540f06d5828ec386d2dd79c13b62 Mon Sep 17 00:00:00 2001
> From: Jozef Lawrynowicz 
> Date: Wed, 6 Nov 2019 18:12:45 +
> Subject: [PATCH 4/4] MSP430: Remove -minrt option
> 
> gcc/ChangeLog:
> 
> 2019-11-07  Jozef Lawrynowicz  
> 
>   * config/msp430/msp430.h (STARTFILE_SPEC): Remove -minrt rules.
>   Use "if, then, else" syntax for specs.
>   (ENDFILE_SPEC): Likewise.
>   * config/msp430/msp430.opt: Mark -minrt as deprecated.
>   * doc/invoke.texi: Remove -minrt documentation.
This is fine.  I'll leave it to you whether to install now or wait for
resolution on the other changes in this space.

jeff



[PATCH][Hashtable 3/6] Fix noexcept qualifications

2019-11-17 Thread François Dumont
This patch adds noexcept qualifications to the allocator-aware constructors
and fixes the one on the default constructor.
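
The general pattern, delegating to a tag-dispatched constructor and deriving
the noexcept specification from it, can be sketched as follows. Table is a
hypothetical toy container of ints written for this example, assuming C++17
for allocator_traits::is_always_equal; it is not the _Hashtable code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical toy container of ints (illustration only; assumes C++17).
template<typename Alloc = std::allocator<int>>
  class Table
  {
    using traits = std::allocator_traits<Alloc>;

    Alloc alloc_;
    int* data_ = nullptr;
    std::size_t size_ = 0;

    void steal(Table& other) noexcept
    {
      data_ = other.data_;
      size_ = other.size_;
      other.data_ = nullptr;
      other.size_ = 0;
    }

    // Allocators always compare equal: take the storage, cannot throw.
    Table(Table&& other, Alloc&& a, std::true_type) noexcept
    : alloc_(std::move(a))
    { steal(other); }

    // Allocators may differ: storage may need reallocating, can throw.
    Table(Table&& other, Alloc&& a, std::false_type)
    : alloc_(std::move(a))
    {
      if (alloc_ == other.alloc_)
        steal(other);
      else if (other.size_ != 0)
        {
          data_ = traits::allocate(alloc_, other.size_);
          size_ = other.size_;
          std::copy(other.data_, other.data_ + size_, data_);
        }
    }

  public:
    explicit Table(std::size_t n, const Alloc& a = Alloc())
    : alloc_(a), data_(n ? traits::allocate(alloc_, n) : nullptr), size_(n)
    { std::fill(data_, data_ + size_, 0); }

    // Plain move: always the "equal allocators" path.
    Table(Table&& other)
      noexcept(noexcept(Table(std::declval<Table&&>(),
                              std::declval<Alloc&&>(), std::true_type{})))
    : Table(std::move(other), std::move(other.alloc_), std::true_type{})
    { }

    // Allocator-extended move: noexcept only if the chosen path is.
    Table(Table&& other, const Alloc& a)
      noexcept(noexcept(Table(std::declval<Table&&>(),
                              std::declval<Alloc&&>(),
                              typename traits::is_always_equal{})))
    : Table(std::move(other), Alloc(a),
            typename traits::is_always_equal{})
    { }

    ~Table()
    { if (data_) traits::deallocate(alloc_, data_, size_); }

    std::size_t size() const noexcept { return size_; }

    int& operator[](std::size_t i)
    { assert(i < size_); return data_[i]; }
  };
```

With std::allocator, is_always_equal is true_type, so both move constructors
of this toy report noexcept. A stateful allocator would select the
false_type overload and the allocator-extended constructor would no longer be
noexcept.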


    * include/bits/hashtable.h
    (_Hashtable(_Hashtable&& __ht, __node_alloc_type&& __a, true_type)):
    Add noexcept qualification.
    (_Hashtable(_Hashtable&&)): Fix noexcept qualification.
    (_Hashtable(_Hashtable&&, const allocator_type&)): Add noexcept
    qualification.
    * include/bits/unordered_map.h
    (unordered_map(unordered_map&&, const allocator_type&)): Add noexcept
    qualification.
    (unordered_multimap(unordered_multimap&&, const allocator_type&)): Add
    noexcept qualification.
    * include/bits/unordered_set.h
    (unordered_set(unordered_set&&, const allocator_type&)): Add noexcept
    qualification.
    (unordered_multiset(unordered_multiset&&, const allocator_type&)): Add
    noexcept qualification.
    * testsuite/23_containers/unordered_map/allocator/default_init.cc:
    New.
    * testsuite/23_containers/unordered_map/cons/
    noexcept_default_construct.cc: New.
    * testsuite/23_containers/unordered_map/cons/
    noexcept_move_construct.cc: New.
    * testsuite/23_containers/unordered_map/modifiers/move_assign.cc:
    New.
    * testsuite/23_containers/unordered_multimap/cons/
    noexcept_default_construct.cc: New.
    * testsuite/23_containers/unordered_multimap/cons/
    noexcept_move_construct.cc: New.
    * testsuite/23_containers/unordered_multiset/cons/
    noexcept_default_construct.cc: New.
    * testsuite/23_containers/unordered_multiset/cons/
    noexcept_move_construct.cc: New.
    * testsuite/23_containers/unordered_set/allocator/default_init.cc: New.
    * testsuite/23_containers/unordered_set/cons/
    noexcept_default_construct.cc: New.
    * testsuite/23_containers/unordered_set/cons/
    noexcept_move_construct.cc: New.

Tested under Linux x86_64.

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index 5f785d4904d..ad07a36eb83 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -463,6 +463,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__hashtable_alloc(__node_alloc_type(__a))
   { }
 
+  _Hashtable(_Hashtable&& __ht, __node_alloc_type&& __a, true_type)
+	noexcept(std::is_nothrow_copy_constructible<_H1>::value &&
+		 std::is_nothrow_copy_constructible<_Equal>::value)
+  : __hashtable_base(__ht),
+	__map_base(__ht),
+	__rehash_base(__ht),
+	__hashtable_alloc(std::move(__a)),
+	_M_buckets(__ht._M_buckets),
+	_M_bucket_count(__ht._M_bucket_count),
+	_M_before_begin(__ht._M_before_begin._M_nxt),
+	_M_element_count(__ht._M_element_count),
+	_M_rehash_policy(__ht._M_rehash_policy)
+  {
+	// Update, if necessary, buckets if __ht is using its single bucket.
+	if (__ht._M_uses_single_bucket())
+	  {
+	_M_buckets = &_M_single_bucket;
+	_M_single_bucket = __ht._M_single_bucket;
+	  }
+
+	// Fix bucket containing the _M_before_begin pointer that can't be
+	// moved.
+	_M_update_bbegin();
+
+	__ht._M_reset();
+  }
+
+  _Hashtable(_Hashtable&&, __node_alloc_type&&, false_type);
+
   template
 	_Hashtable(_InputIterator __first, _InputIterator __last,
 		   size_type __bkt_count_hint,
@@ -489,11 +518,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   _Hashtable(const _Hashtable&);
 
-  _Hashtable(_Hashtable&&) noexcept;
+  _Hashtable(_Hashtable&& __ht)
+	noexcept( noexcept(
+	  _Hashtable(std::declval<_Hashtable&&>(),
+	std::declval<__node_alloc_type&&>(), std::declval())) )
+  : _Hashtable(std::move(__ht), std::move(__ht._M_node_allocator()),
+		   true_type{})
+  { }
 
   _Hashtable(const _Hashtable&, const allocator_type&);
 
-  _Hashtable(_Hashtable&&, const allocator_type&);
+  _Hashtable(_Hashtable&& __ht, const allocator_type& __a)
+	noexcept( noexcept(
+	  _Hashtable(std::declval<_Hashtable&&>(),
+	std::declval<__node_alloc_type&&>(),
+	std::declval())) )
+  : _Hashtable(std::move(__ht), __node_alloc_type(__a),
+		   typename __node_alloc_traits::is_always_equal{})
+  { }
 
   // Use delegating constructors.
   template
@@ -1368,36 +1410,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_assign(__ht, __alloc_node_gen);
 }
 
-  template
-_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
-	   _H1, _H2, _Hash, _RehashPolicy, _Traits>::
-_Hashtable(_Hashtable&& __ht) noexcept
-: __hashtable_base(__ht),
-  __map_base(__ht),
-  __rehash_base(__ht),
-  __hashtable_alloc(std::move(__ht._M_base_alloc())),
-  _M_buckets(__ht._M_buckets),
-  _M_bucket_count(__ht._M_bucket_count),
-  _M_before_begin(__ht._M_before_begin._M_nxt),
-  _M_element_count(__ht._M_element_count),
-  _M_rehash_policy(__ht._M_rehash_policy)
-{
-  // Update, if necessary, buckets if __ht is using its single bucket.
-  if (__ht._M_uses_single_bucket())
-	{
-	  _M_buckets = &_M_single_bucket;
-	  _M_single_bucket = __ht._M_single_bucket;

[PATCH][Hashtable 2/6] Avoid over-sizing container

2019-11-17 Thread François Dumont
This patch avoids over-sizing the container by instead considering the
bucket count hint or a potential reservation.


It concerns only the non-multi containers.
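
A minimal sketch of the sizing decisions, under simplifying assumptions: the
helper names are invented for this example, and the rounding to a prime or
power-of-two bucket count that a real rehash policy performs is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: buckets needed to hold n elements at the given
// maximum load factor (no rounding to primes, unlike a real policy).
inline std::size_t
buckets_for_elements(std::size_t n, double max_load_factor = 1.0)
{
  assert(max_load_factor > 0.0);
  return static_cast<std::size_t>(std::ceil(n / max_load_factor));
}

// Range construction. With unique keys the range length is only an upper
// bound on the final size, so only the user hint is used and the table
// grows on demand. With multi keys every element is inserted, so sizing
// for the whole range up front cannot over-size.
inline std::size_t
initial_buckets(std::size_t hint, std::size_t range_length, bool unique_keys)
{
  if (unique_keys)
    return hint;
  return std::max(hint, buckets_for_elements(range_length));
}

// Assignment from n_new elements: grow if needed, but never shrink, so
// that an earlier reservation made by the user is kept.
inline std::size_t
buckets_after_assign(std::size_t current_buckets, std::size_t n_new,
                     double max_load_factor = 1.0)
{
  return std::max(current_buckets,
                  buckets_for_elements(n_new, max_load_factor));
}
```

For example, a range of 1000 values that are all equal would make a
unique-key container allocate for 1000 elements if the range length were
trusted, although it ends up holding a single element.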

    * include/bits/hashtable.h
    (_Hashtable<>(_InputIterator, _InputIterator, size_t, const _H1&,
    const _H2&, const _Hash&, const _Equal&, const _ExtractKey&,
    const allocator_type&, __unique_keys_t)): New.
    (_Hashtable<>(_InputIterator, _InputIterator, size_t, const _H1&,
    const _H2&, const _Hash&, const _Equal&, const _ExtractKey&,
    const allocator_type&, __multi_keys_t)): New.
    (_Hashtable<>(_InputIterator, _InputIterator, size_t, const _H1&,
    const _H2&, const _Hash&, const _Equal&, const _ExtractKey&,
    const allocator_type&)): Delegate to the latter two.
    (operator=(initializer_list)): Rehash if too small.
    (_M_insert(_Arg&&, const _NodeGenerator&, __unique_keys_t)): Remove
    size_t len parameter.
    * include/bits/hashtable_policy.h (_Insert_base<>::_M_insert_range):
    Do not try to get input range distance.
    * testsuite/23_containers/unordered_set/cons/bucket_hint.cc: New.
    * testsuite/23_containers/unordered_set/modifiers/insert.cc: New.

Tested under Linux x86_64.

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index a685c20376f..5f785d4904d 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -463,17 +463,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__hashtable_alloc(__node_alloc_type(__a))
   { }
 
-public:
-  // Constructor, destructor, assignment, swap
-  _Hashtable() = default;
-  _Hashtable(size_type __bkt_count_hint,
+  template
+	_Hashtable(_InputIterator __first, _InputIterator __last,
+		   size_type __bkt_count_hint,
 		   const _H1&, const _H2&, const _Hash&,
 		   const _Equal&, const _ExtractKey&,
-		 const allocator_type&);
+		   const allocator_type&,
+		   __unique_keys_t);
 
   template
 	_Hashtable(_InputIterator __first, _InputIterator __last,
 		   size_type __bkt_count_hint,
+		   const _H1&, const _H2&, const _Hash&,
+		   const _Equal&, const _ExtractKey&,
+		   const allocator_type&,
+		   __multi_keys_t);
+
+public:
+  // Constructor, destructor, assignment, swap
+  _Hashtable() = default;
+  _Hashtable(size_type __bkt_count_hint,
 		 const _H1&, const _H2&, const _Hash&,
 		 const _Equal&, const _ExtractKey&,
 		 const allocator_type&);
@@ -487,6 +496,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Hashtable(_Hashtable&&, const allocator_type&);
 
   // Use delegating constructors.
+  template
+	_Hashtable(_InputIterator __first, _InputIterator __last,
+		   size_type __bkt_count_hint,
+		   const _H1& __h1, const _H2& __h2, const _Hash& __h,
+		   const _Equal& __eq, const _ExtractKey& __exk,
+		   const allocator_type& __a)
+	: _Hashtable(__first, __last, __bkt_count_hint,
+		 __h1, __h2, __h, __eq, __exk, __a, __unique_keys{})
+	{ }
+
   explicit
   _Hashtable(const allocator_type& __a)
   : __hashtable_alloc(__node_alloc_type(__a))
@@ -543,6 +562,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__reuse_or_alloc_node_gen_t __roan(_M_begin(), *this);
 	_M_before_begin._M_nxt = nullptr;
 	clear();
+
+	// We consider that all elements of __l are going to be inserted.
+	auto __l_bkt_count = _M_rehash_policy._M_bkt_for_elements(__l.size());
+
+	// Do not shrink to keep potential user reservation.
+	if (_M_bucket_count < __l_bkt_count)
+	  rehash(__l_bkt_count);
+
 	this->_M_insert_range(__l.begin(), __l.end(), __roan, __unique_keys{});
 	return *this;
   }
@@ -763,7 +790,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 	std::pair
-	_M_insert(_Arg&&, const _NodeGenerator&, __unique_keys_t, size_type = 1);
+	_M_insert(_Arg&&, const _NodeGenerator&, __unique_keys_t);
 
   template
 	iterator
@@ -1062,7 +1089,25 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		 size_type __bkt_count_hint,
 		 const _H1& __h1, const _H2& __h2, const _Hash& __h,
 		 const _Equal& __eq, const _ExtractKey& __exk,
-		 const allocator_type& __a)
+		 const allocator_type& __a, __unique_keys_t)
+  : _Hashtable(__bkt_count_hint, __h1, __h2, __h, __eq, __exk, __a)
+  {
+	for (; __f != __l; ++__f)
+	  this->insert(*__f);
+  }
+
+  template
+template
+  _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
+		 _H1, _H2, _Hash, _RehashPolicy, _Traits>::
+  _Hashtable(_InputIterator __f, _InputIterator __l,
+		 size_type __bkt_count_hint,
+		 const _H1& __h1, const _H2& __h2, const _Hash& __h,
+		 const _Equal& __eq, const _ExtractKey& __exk,
+		 const allocator_type& __a, __multi_keys_t)
   : _Hashtable(__h1, __h2, __h, __eq, __exk, __a)
   {
 	auto __nb_elems = __detail::__distance_fw(__f, __l);
@@ -1830,7 +1875,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 		 _H1, _H2, _Hash, _RehashPolicy, _Traits>::
   _M_insert(_Arg&& __v, const _NodeGenerator& __node_gen,
-

[PATCH][Hashtable 1/6] Code simplification/optimization

2019-11-17 Thread François Dumont

This patch simplifies a number of implementations.

It avoids computing hash codes as much as possible. This is
especially true for the erase implementation in case of multi keys.
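
One of the simplifications is the _M_update_bbegin helper. Why such a fix-up
is needed after a move or swap can be sketched with a toy table, assuming a
layout where each bucket stores the node preceding its first node. The struct
and member names below are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct NodeBase { NodeBase* next = nullptr; };

struct Node : NodeBase
{
  std::size_t code;  // cached hash code, to keep the toy simple
  explicit Node(std::size_t c) : code(c) { }
};

// Toy table: all nodes form one singly linked list, and each bucket
// stores the node *before* its first node.  For the bucket of the very
// first node, that predecessor is the sentinel embedded in the table.
struct Table
{
  NodeBase before_begin;
  std::vector<NodeBase*> buckets;

  explicit Table(std::size_t n) : buckets(n, nullptr)
  { assert(n != 0); }

  Node* begin_node() const
  { return static_cast<Node*>(before_begin.next); }

  std::size_t bucket_index(const Node* n) const
  { return n->code % buckets.size(); }

  // Re-point the first node's bucket at *this* table's sentinel.
  void update_bbegin()
  {
    if (begin_node())
      buckets[bucket_index(begin_node())] = &before_begin;
  }

  void update_bbegin(Node* n)
  {
    before_begin.next = n;
    update_bbegin();
  }

  // Take over the buckets and nodes of other.  The bucket array moves,
  // but other's sentinel does not, so one bucket entry must be fixed.
  void take_from(Table& other)
  {
    buckets = std::move(other.buckets);
    update_bbegin(other.begin_node());
    other.before_begin.next = nullptr;
    other.buckets.assign(1, nullptr);
  }
};
```

Every operation that transfers the node list between two table objects needs
the same fix-up, which is why factoring it into one helper removes several
copies of the same three lines.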



    * include/bits/hashtable_policy.h (_Map_base<>::at): Use
    _Hashtable<>::find.
    (_Hashtable_base<>::_Equal_hash_code<>::_S_node_equals): New.
    (_Hashtable_base<>::_M_node_equals): New, use latter.
    * include/bits/hashtable.h (_Hashtable<>::_M_update_bbegin): New.
    (_Hashtable<>::_M_assign): Use latter.
    (_Hashtable<>::_M_move_assign): Likewise.
    (_Hashtable<>(_Hashtable<>&&)): Likewise.
    (_Hashtable<>(_Hashtable<>&&, const allocator_type&)): Likewise.
    (_Hashtable<>::swap): Likewise.
    (_Hashtable<>::find): Build iterator directly from _M_find_node result.
    (_Hashtable<>::count): Use _Hashtable<>::find.
    (_Hashtable<>::equal_range): Likewise.
    (_Hashtable<>::_M_erase(false_type, const key_type&)): Use
    _M_node_equals.

Tested under Linux x86_64.

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index ef71c090f3b..b8cfdde2f31 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -378,6 +378,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // numerous checks in the code to avoid 0 modulus.
   __bucket_type		_M_single_bucket	= nullptr;
 
+  void
+  _M_update_bbegin()
+  {
+	if (_M_begin())
+	  _M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+  }
+
+  void
+  _M_update_bbegin(__node_type* __n)
+  {
+	_M_before_begin._M_nxt = __n;
+	_M_update_bbegin();
+  }
+
   bool
   _M_uses_single_bucket(__bucket_type* __bkts) const
   { return __builtin_expect(__bkts == &_M_single_bucket, false); }
@@ -674,7 +688,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   std::pair
   equal_range(const key_type& __k) const;
 
-protected:
+private:
   // Bucket index computation helpers.
   size_type
   _M_bucket_index(__node_type* __n) const noexcept
@@ -1196,8 +1210,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  = __node_gen(__fwd_value(std::forward<_Ht>(__ht),
    __ht_n->_M_v()));
 	this->_M_copy_code(__this_n, __ht_n);
-	_M_before_begin._M_nxt = __this_n;
-	_M_buckets[_M_bucket_index(__this_n)] = &_M_before_begin;
+	_M_update_bbegin(__this_n);
 
 	// Then deal with other nodes.
 	__node_base* __prev_n = __this_n;
@@ -1259,15 +1272,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  _M_buckets = &_M_single_bucket;
 	  _M_single_bucket = __ht._M_single_bucket;
 	}
+
   _M_bucket_count = __ht._M_bucket_count;
   _M_before_begin._M_nxt = __ht._M_before_begin._M_nxt;
   _M_element_count = __ht._M_element_count;
   std::__alloc_on_move(this->_M_node_allocator(), __ht._M_node_allocator());
 
-  // Fix buckets containing the _M_before_begin pointers that can't be
-  // moved.
-  if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+  // Fix bucket containing the _M_before_begin pointer that can't be moved.
+  _M_update_bbegin();
   __ht._M_reset();
 }
 
@@ -1335,10 +1347,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  _M_single_bucket = __ht._M_single_bucket;
 	}
 
-  // Update, if necessary, bucket pointing to before begin that hasn't
-  // moved.
-  if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+  // Fix bucket containing the _M_before_begin pointer that can't be moved.
+  _M_update_bbegin();
 
   __ht._M_reset();
 }
@@ -1389,11 +1399,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  else
 	_M_buckets = __ht._M_buckets;
 
-	  _M_before_begin._M_nxt = __ht._M_before_begin._M_nxt;
-	  // Update, if necessary, bucket pointing to before begin that hasn't
+	  // Fix bucket containing the _M_before_begin pointer that can't be
 	  // moved.
-	  if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
+	  _M_update_bbegin(__ht._M_begin());
+
 	  __ht._M_reset();
 	}
   else
@@ -1457,14 +1466,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   std::swap(_M_element_count, __x._M_element_count);
   std::swap(_M_single_bucket, __x._M_single_bucket);
 
-  // Fix buckets containing the _M_before_begin pointers that can't be
-  // swapped.
-  if (_M_begin())
-	_M_buckets[_M_bucket_index(_M_begin())] = &_M_before_begin;
-
-  if (__x._M_begin())
-	__x._M_buckets[__x._M_bucket_index(__x._M_begin())]
-	  = &__x._M_before_begin;
+  // Fix bucket containing the _M_before_begin pointer that can't be swapped.
+  _M_update_bbegin();
+  __x._M_update_bbegin();
 }
 
   template_M_hash_code(__k);
   std::size_t __bkt = _M_bucket_index(__k, __code);
-  __node_type* __p = _M_find_node(__bkt, __k, __code);
-  return __p ? iterator(__p) : end();
+  return iterator(_M_find_node(__bkt, __k, __code));
 }
 
   template_M_hash_code(__k);
   std::size_t __bkt

[PATCH][Hashtable 0/6] Code review

2019-11-17 Thread François Dumont

This is the beginning of a patch series for _Hashtable.

This initial patch clarifies the code. I was tired of seeing true/false or
true_type/false_type without knowing what was true/false.


I also made the code more consistent by choosing to specialize methods
through usage of __unique_keys_t/__multi_keys_t rather than calling them
_M_[multi]_XXX.
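
The readability gain can be sketched like this. The aliases and the Bag class
are hypothetical; in particular, modelling the tags as aliases of
std::true_type/std::false_type is an assumption of this example, and the
patch may define them differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Illustrative tag names saying what "true" and "false" stand for.
// Assumption: modelled as aliases of the standard boolean constants.
using unique_keys_t = std::true_type;
using multi_keys_t = std::false_type;

template<typename UniqueKeys>
  class Bag
  {
    std::vector<int> items_;

    // Selected when keys must be unique: reject duplicates.
    bool insert(int v, unique_keys_t)
    {
      if (std::find(items_.begin(), items_.end(), v) != items_.end())
        return false;
      items_.push_back(v);
      return true;
    }

    // Selected when equivalent keys are allowed: always insert.
    bool insert(int v, multi_keys_t)
    {
      items_.push_back(v);
      return true;
    }

  public:
    // One name for both behaviours; the tag picks the overload.
    bool insert(int v)
    { return insert(v, UniqueKeys{}); }

    std::size_t size() const noexcept
    { return items_.size(); }
  };
```

A signature such as insert(int, multi_keys_t) documents itself, whereas
insert(int, false_type) forces the reader to look up which property is false.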



    * include/bits/hashtable_policy.h (__detail::__unique_keys_t): New.
    (__detail::__multi_keys_t): New.
    (__detail::__constant_iterators_t): New.
    (__detail::__mutable_iterators_t): New.
    (__detail::__hash_cached_t): New.
    (__detail::__hash_not_cached_t): New.
    (_Hash_node<>): Change _Cache_hash_code template parameter from bool to
    typename. Adapt partial specializations.
    (_Node_iterator_base<>): Likewise.
    (operator==(const _Node_iterator_base<>&,
    const _Node_iterator_base<>&)): Adapt.
    (operator!=(const _Node_iterator_base<>&,
    const _Node_iterator_base<>&)): Adapt.
    (_Node_iterator<>): Change __constant_iterators and __cache template
    parameters from bool to typename.
    (_Node_const_iterator<>): Likewise.
    (_Map_base<>): Change _Unique_keys template parameter from bool to
    typename. Adapt partial specializations.
    (_Insert<>): Change _Constant_iterators template parameter from bool to
    typename. Adapt partial specializations.
    (_Local_iterator_base<>): Change __cache_hash_code template parameter
    from bool to typename. Adapt partial specialization.
    (_Hash_code_base<>): Likewise.
    (operator==(const _Local_iterator_base<>&,
    const _Local_iterator_base<>&)): Adapt.
    (operator!=(const _Local_iterator_base<>&,
    const _Local_iterator_base<>&)):
    Adapt.
    (_Local_iterator<>): Change __constant_iterators and __cache template
    parameters from bool to typename.
    (_Local_const_iterator<>): Likewise.
    (_Hashtable_base<>): Adapt.
    (_Equal_hash_code<>): Adapt.
    (_Equality<>): Adapt.
    * include/bits/hashtable.h (_Hashtable<>): Replace occurrences of
    true_type/false_type by respectively __unique_keys_t/__multi_keys_t.
    (_M_insert_unique_node(const key_type&, size_t, __hash_code,
    __node_type*, size_t)): Replace by...
    (_M_insert_node(__unique_keys_t, size_t, __hash_code, __node_type*,
    size_t)): ...this.
    (_M_insert_multi_node(__node_type*, const key_type&, __hash_code,
    __node_type*)): Replace by...
    (_M_insert_node(__multi_keys_t, __node_type*, __hash_code,
    __node_type*)): ...this.
    (_M_reinsert_node(node_type&&)): Replace by...
    (_M_reinsert_node(node_type&&, __unique_keys_t)): ...this.
    (_M_reinsert_node(const_iterator, node_type&&, __unique_keys_t)): New,
    forward to latter.
    (_M_reinsert_node_multi(const_iterator, node_type&&)): Replace by...
    (_M_reinsert_node(const_iterator, node_type&&, __multi_keys_t)):
    ...this.
    (_M_reinsert_node(node_type&&, __multi_keys_t)): New, forward to
    latter.
    (_M_reinsert_node(node_type&&)): New, use latters.
    (_M_reinsert_node(const_iterator, node_type&&)): Likewise.
    (_M_merge_unique(_Compatible_Hashtable&)): Replace by...
    (_M_merge(__unique_keys_t, _Compatible_Hashtable&)): ...this.
    (_M_merge_multi(_Compatible_Hashtable&)): Replace by...
    (_M_merge(__multi_keys_t, _Compatible_Hashtable&)): ...this.
    (_M_merge(_Compatible_Hashtable&)): New, use latters.
    * include/bits/unordered_map.h
    (unordered_map<>::insert(const_iterator, node_type&&)): Adapt.
    (unordered_map<>::merge(unordered_map<>&)): Adapt.
    (unordered_map<>::merge(unordered_multimap<>&)): Adapt.
    (unordered_multimap<>::insert(node_type&&)): Adapt.
    (unordered_multimap<>::insert(const_iterator, node_type&&)): Adapt.
    (unordered_multimap<>::merge(unordered_multimap<>&)): Adapt.
    (unordered_multimap<>::merge(unordered_map<>&)): Adapt.
    * include/bits/unordered_set.h
    (unordered_set<>::insert(const_iterator, node_type&&)): Adapt.
    (unordered_set<>::merge(unordered_set<>&)): Adapt.
    (unordered_set<>::merge(unordered_multiset<>&)): Adapt.
    (unordered_multiset<>::insert(node_type&&)): Adapt.
    (unordered_multiset<>::insert(const_iterator, node_type&&)): Adapt.
    (unordered_multiset<>::merge(unordered_multiset<>&)): Adapt.
    (unordered_multiset<>::merge(unordered_set<>&)): Adapt.

Tested under Linux x86_64.

François

diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index c2b2219d471..ef71c090f3b 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -184,7 +184,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   private __detail::_Hashtable_alloc<
 	__alloc_rebind<_Alloc,
 		   __detail::_Hash_node<_Value,
-	_Traits::__hash_cached::value>>>
+	typename _Traits::__hash_cached>>>
 {
   static_assert(is_same<typename remove_cv<_Value>::type, _Value>::value,
 	  "unordered container must have a non-const, non-volatile value_type");
@@ -195,7 +195,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   using __traits_type = _Tra

Re: [PATCH] musl: Don't use gthr weak refs in libgcc PR91737

2019-11-17 Thread Rich Felker
On Sun, Nov 17, 2019 at 11:31:02AM -0700, Jeff Law wrote:
> On 11/15/19 3:00 AM, Szabolcs Nagy wrote:
> > The gthr weak reference based single thread detection is unsafe with
> > static linking and in case of dynamic linking it's ineffective on musl
> > since pthread symbols are defined in libc.so.
> > 
> > Ideally this should be fixed for all targets, since glibc plans to move
> > libpthread.so into libc.so too and users want to static link to pthread
> > without --whole-archive: PR87189.
> > 
> > For now we have to explicitly opt out from the broken behaviour in the
> > config machinery of each target lib and libgcc was previously missed.
> > 
> > libgcc/ChangeLog:
> > 
> > 2019-11-15  Szabolcs Nagy  
> > 
> > * config.host: Add t-gthr-noweak on *-*-musl*.
> > * config/t-gthr-noweak: New file.
> > 
> Given the patch is constrained to musl, it's obviously OK.
> 
> WRT the bigger question, even if glibc gets those bits moved into
> libc.so it's likely going to be a long time before the split between
> libc and libpthreads disappears from the wild :(

The right thing for GCC to do on the glibc side is just having the
affected target libs depend on libpthread until the symbols are moved.
With the current invalid use of weak refs, static linking of
multithreaded programs is completely broken on glibc, and distros are
resorting to hacks of using ld -r or similar to move all of
libpthread.a into a monolithic object file to work around it.

In any case, I'll be happy to have it fixed just for musl now, so we
can drop these patches on our side and have upstream GCC working
pretty much out of the box.

Rich
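
For readers unfamiliar with the weak-reference trick discussed in this thread, the following is a minimal sketch of the detection pattern, assuming a GCC-compatible toolchain on an ELF target. The symbol hypothetical_thread_entry and the function threads_look_active are made-up names for illustration; libgcc's gthr headers use their own declarations for real pthread functions.

```cpp
#include <cassert>

// Hypothetical symbol that nothing in this program defines. It stands in
// for a pthread entry point in a program linked without a threads library.
extern "C" int hypothetical_thread_entry(void) __attribute__((weak));

// Weak-reference style probe: an undefined weak symbol resolves to a null
// address, which the caller then reads as "no threads library present".
bool threads_look_active()
{
  return &hypothetical_thread_entry != nullptr;
}
```

The probe only reports whether one particular symbol was linked in. A weak reference does not pull a member out of a static archive, so with static linking the answer can be "absent" even in a program that does create threads, which is the unsafe case described above. On musl the pthread symbols live in libc.so, so the probe would always report "present" and the single-thread shortcut never triggers.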


Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-17 Thread Mikael Pettersson
On Sun, Nov 17, 2019 at 5:57 PM Andreas Schwab  wrote:
>
> On Nov 17 2019, Mikael Pettersson wrote:
>
> > /tmp/ccJA1qws.s:4828: Error: operands mismatch -- statement `seq %a1' 
> > ignored
> > /tmp/ccJA1qws.s:7344: Error: operands mismatch -- statement `seq %a1' 
> > ignored
>
> That should fix it:
>
> diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
> index 0cf063aaf84..3efcaad33a4 100644
> --- a/gcc/config/m68k/m68k.md
> +++ b/gcc/config/m68k/m68k.md
> @@ -698,7 +698,7 @@
>  })
>
>  (define_insn "cstore_bftst_insn"
> -  [(set (match_operand:QI 0 "register_operand")
> +  [(set (match_operand:QI 0 "register_operand" "=d")
> (match_operator:QI 1 "ordered_comparison_operator"
>  [(zero_extract:SI (match_operand:BTST 2 "" 
> "")
>(match_operand:SI 3 "const_int_operand" "n")
>
> Andreas.

This fixed the problem, thanks.

/Mikael


Re: [Patch, Fortran] dec comparisons - for review

2019-11-17 Thread Thomas Koenig

Hi Steve,


> On Fri, Nov 15, 2019 at 10:40:56AM +, Mark Eggleston wrote:
>> This patch allows comparison of numeric values with Holleriths. This
>> feature is not guarded by a compiler option as it is preferred that
>> extra options should be avoided; this seems reasonable as current
>> Hollerith support does not have such an option.
>
> IMHO.
>
> Has the comparison of a numeric value and a hollerith
> ever been allowed in a Fortran standard?  If the answer
> is 'No', then you should (1) put the comparison behind
> an option, and (2) have gfortran issue an error without
> the option.  If this is a DEC extension, then put it
> behind -fdec.  If this misfeature is not a DEC extension
> but allows some ancient piece of code to compile, then
> put it behind -std=legacy.


I concur.

Additionally, please put in a test case which confirms that
an error is indeed emitted without that particular option.

Regards

Thomas



Re: [PATCH 1/4] MSP430: Disable TM clone registry by default

2019-11-17 Thread Jeff Law
On 11/7/19 2:34 PM, Jozef Lawrynowicz wrote:
> Given that MSP430 is a resource constrained, embedded target disabling
> transactional memory by default is a good idea to save on code size in
> the runtime library.
> 
> It can still be enabled by passing --enable-tm-clone-registry (although as far
> as I understand the feature is fundamentally incompatible with MSP430 given
> reliance on libitm, lack of thread support without an OS and the memory
> limitations of the device.
> 
I'm not a huge fan of making the default configurations behave
differently.  But I can also see how something like TM in particular
isn't of much interest in the embedded space (hell, it's having trouble
getting real traction in the server space as well).

Maybe a reasonable path forward is to add the configury bits, keep TM
on by default and create a different msp target which disables this stuff?

Jeff

ps.  I thought libitm would fallback to a full software solution and the
hardware requirements were really just enabling fast-paths.



Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-17 Thread Jeff Law
On 11/12/19 1:16 AM, Richard Biener wrote:
> On Tue, Nov 12, 2019 at 9:15 AM Richard Biener
>  wrote:
>>
>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law  wrote:
>>>
>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
 On 11/6/19 2:06 PM, Martin Sebor wrote:
> On 11/6/19 1:39 PM, Jeff Law wrote:
>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>> On 11/6/19 11:55 AM, Jeff Law wrote:
 On 11/6/19 11:00 AM, Martin Sebor wrote:
> The -Wstringop-overflow warnings for single-byte and multi-byte
> stores mention the amount of data being stored and the amount of
> space remaining in the destination, such as:
>
> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>
>   123 |   *p = 0;
>       |   ~~~^~~
> note: destination object declared here
>    45 |   char b[N];
>       |        ^
>
> A warning like this can take some time to analyze.  First, the size
> of the destination isn't mentioned and may not be easy to tell from
> the sources.  In the note above, when N's value is the result of
> some non-trivial computation, chasing it down may be a small project
> in and of itself.  Second, it's also not clear why the region size
> is zero.  It could be because the offset is exactly N, or because
> it's negative, or because it's in some range greater than N.
>
> Mentioning both the size of the destination object and the offset
> makes the existing messages clearer, and will become essential when
> GCC starts diagnosing overflow into allocated buffers (as my
> follow-on patch does).
>
> The attached patch enhances -Wstringop-overflow to do this by
> letting compute_objsize return the offset to its caller, doing
> something similar in get_stridx, and adding a new function to
> the strlen pass to issue this enhanced warning (eventually, I'd
> like the function to replace the -Wstringop-overflow handler in
> builtins.c).  With the change, the note above might read something
> like:
>
> note: at offset 11 to object ‘b’ with size 8 declared here
>    45 |   char b[N];
>       |        ^
>
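
The three ways a region can end up with size zero (offset exactly N, negative offset, offset past N) can be sketched as below. This is only an illustration of the arithmetic behind the message, assuming a known object size and a single known offset; remaining_space and store_overflows are hypothetical helpers, not functions from the patch or from GCC.

```cpp
#include <cassert>
#include <cstddef>

// Bytes still available in an object of `size` bytes when writing at
// `offset`. The result is zero when the offset is negative, exactly at
// the end, or past the end of the object.
std::size_t remaining_space(std::size_t size, long offset)
{
  if (offset < 0 || static_cast<std::size_t>(offset) >= size)
    return 0;
  return size - static_cast<std::size_t>(offset);
}

// True when storing `nbytes` at `offset` would run past the object.
bool store_overflows(std::size_t size, long offset, std::size_t nbytes)
{
  return nbytes > remaining_space(size, offset);
}
```

For the note quoted above (offset 11 into an object of size 8), remaining_space(8, 11) is zero, so a 4-byte store overflows. Reporting the offset and the size lets the reader tell which of the three cases applies.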
> Tested on x86_64-linux.
>
> Martin
>
> gcc-store-offset.diff
>
> gcc/ChangeLog:
>
>  * builtins.c (compute_objsize): Add an argument and set it to
> offset
>  into destination.
>  * builtins.h (compute_objsize): Add an argument.
>  * tree-object-size.c (addr_object_size): Add an argument and
> set it
>  to offset into destination.
>  (compute_builtin_object_size): Same.
>  * tree-object-size.h (compute_builtin_object_size): Add an
> argument.
>  * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
> set it
>  to offset into destination.
>  (maybe_warn_overflow): New function.
>  (handle_store): Call maybe_warn_overflow to issue warnings.
>
> gcc/testsuite/ChangeLog:
>
>  * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
> messages.
>  * g++.dg/warn/Wstringop-overflow-3.C: Same.
>  * gcc.dg/Wstringop-overflow-17.c: Same.
>

> Index: gcc/tree-ssa-strlen.c
> ===
> --- gcc/tree-ssa-strlen.c(revision 277886)
> +++ gcc/tree-ssa-strlen.c(working copy)
> @@ -189,6 +189,52 @@ struct laststmt_struct
>static int get_stridx_plus_constant (strinfo *, unsigned
> HOST_WIDE_INT, tree);
>static void handle_builtin_stxncpy (built_in_function,
> gimple_stmt_iterator *);
>+/* Sets MINMAX to either the constant value or the range VAL
> is in
> +   and returns true on success.  */
> +
> +static bool
> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
> NULL)
> +{
> +  if (tree_fits_uhwi_p (val))
> +{
> +  minmax[0] = minmax[1] = wi::to_wide (val);
> +  return true;
> +}
> +
> +  if (TREE_CODE (val) != SSA_NAME)
> +return false;
> +
> +  if (rvals)
> +{
> +  gimple *def = SSA_NAME_DEF_STMT (val);
> +  if (gimple_assign_single_p (def)
> +  && gimple_assign_rhs_code (def) == INTEGER_CST)
> +{
> +  /* get_value_range returns [0, N] for constant
> assignments.  */
> +  val = gimple_assign_rhs1 (def);
> +  minmax[0] = minmax[1] = wi::to_wide (val);
> +  ret

Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-17 Thread Jeff Law
On 11/12/19 12:55 PM, Martin Sebor wrote:
> On 11/12/19 10:54 AM, Jeff Law wrote:
>> On 11/12/19 1:15 AM, Richard Biener wrote:
>>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law  wrote:

 On 11/6/19 3:34 PM, Martin Sebor wrote:
> On 11/6/19 2:06 PM, Martin Sebor wrote:
>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
 On 11/6/19 11:55 AM, Jeff Law wrote:
> On 11/6/19 11:00 AM, Martin Sebor wrote:
>> The -Wstringop-overflow warnings for single-byte and multi-byte
>> stores mention the amount of data being stored and the amount of
>> space remaining in the destination, such as:
>>
>> warning: writing 4 bytes into a region of size 0
>> [-Wstringop-overflow=]
>>
>>  123 |   *p = 0;
>>  |   ~~~^~~
>> note: destination object declared here
>>   45 |   char b[N];
>>  |    ^
>>
>> A warning like this can take some time to analyze.  First, the
>> size
>> of the destination isn't mentioned and may not be easy to tell
>> from
>> the sources.  In the note above, when N's value is the result of
>> some non-trivial computation, chasing it down may be a small
>> project
>> in and of itself.  Second, it's also not clear why the region
>> size
>> is zero.  It could be because the offset is exactly N, or because
>> it's negative, or because it's in some range greater than N.
>>
>> Mentioning both the size of the destination object and the offset
>> makes the existing messages clearer, and will become essential
>> when
>> GCC starts diagnosing overflow into allocated buffers (as my
>> follow-on patch does).
>>
>> The attached patch enhances -Wstringop-overflow to do this by
>> letting compute_objsize return the offset to its caller, doing
>> something similar in get_stridx, and adding a new function to
>> the strlen pass to issue this enhanced warning (eventually, I'd
>> like the function to replace the -Wstringop-overflow handler in
>> builtins.c).  With the change, the note above might read
>> something
>> like:
>>
>> note: at offset 11 to object ‘b’ with size 8 declared here
>>   45 |   char b[N];
>>  |    ^
>>
>> Tested on x86_64-linux.
>>
>> Martin
>>
>> gcc-store-offset.diff
>>
>> gcc/ChangeLog:
>>
>>   * builtins.c (compute_objsize): Add an argument and set
>> it to
>> offset
>>   into destination.
>>   * builtins.h (compute_objsize): Add an argument.
>>   * tree-object-size.c (addr_object_size): Add an argument
>> and
>> set it
>>   to offset into destination.
>>   (compute_builtin_object_size): Same.
>>   * tree-object-size.h (compute_builtin_object_size): Add an
>> argument.
>>   * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>> set it
>>   to offset into destination.
>>   (maybe_warn_overflow): New function.
>>   (handle_store): Call maybe_warn_overflow to issue warnings.
>>
>> gcc/testsuite/ChangeLog:
>>
>>   * c-c++-common/Wstringop-overflow-2.c: Adjust text of
>> expected
>> messages.
>>   * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>   * gcc.dg/Wstringop-overflow-17.c: Same.
>>
>
>> Index: gcc/tree-ssa-strlen.c
>> ===
>>
>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>> +++ gcc/tree-ssa-strlen.c    (working copy)
>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>     static int get_stridx_plus_constant (strinfo *, unsigned
>> HOST_WIDE_INT, tree);
>>     static void handle_builtin_stxncpy (built_in_function,
>> gimple_stmt_iterator *);
>>     +/* Sets MINMAX to either the constant value or the range VAL
>> is in
>> +   and returns true on success.  */
>> +
>> +static bool
>> +get_range (tree val, wide_int minmax[2], const vr_values
>> *rvals =
>> NULL)
>> +{
>> +  if (tree_fits_uhwi_p (val))
>> +    {
>> +  minmax[0] = minmax[1] = wi::to_wide (val);
>> +  return true;
>> +    }
>> +
>> +  if (TREE_CODE (val) != SSA_NAME)
>> +    return false;
>> +
>> +  if (rvals)
>> +    {
>> +  gimple *def = SSA_NAME_DEF_STMT (val);
>> +  if (gimple_assign_single_p (def)
>> +  && gimple_

Re: [PATCH] include size and offset in -Wstringop-overflow

2019-11-17 Thread Jeff Law
On 11/13/19 7:34 AM, Richard Biener wrote:
> On Tue, Nov 12, 2019 at 6:55 PM Jeff Law  wrote:
>>
>> On 11/12/19 1:15 AM, Richard Biener wrote:
>>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law  wrote:

 On 11/6/19 3:34 PM, Martin Sebor wrote:
> On 11/6/19 2:06 PM, Martin Sebor wrote:
>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
 On 11/6/19 11:55 AM, Jeff Law wrote:
> On 11/6/19 11:00 AM, Martin Sebor wrote:
>> The -Wstringop-overflow warnings for single-byte and multi-byte
>> stores mention the amount of data being stored and the amount of
>> space remaining in the destination, such as:
>>
>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>
>>   123 |   *p = 0;
>>       |   ~~~^~~
>> note: destination object declared here
>>    45 |   char b[N];
>>       |        ^
>>
>> A warning like this can take some time to analyze.  First, the size
>> of the destination isn't mentioned and may not be easy to tell from
>> the sources.  In the note above, when N's value is the result of
>> some non-trivial computation, chasing it down may be a small project
>> in and of itself.  Second, it's also not clear why the region size
>> is zero.  It could be because the offset is exactly N, or because
>> it's negative, or because it's in some range greater than N.
>>
>> Mentioning both the size of the destination object and the offset
>> makes the existing messages clearer, and will become essential when
>> GCC starts diagnosing overflow into allocated buffers (as my
>> follow-on patch does).
>>
>> The attached patch enhances -Wstringop-overflow to do this by
>> letting compute_objsize return the offset to its caller, doing
>> something similar in get_stridx, and adding a new function to
>> the strlen pass to issue this enhanced warning (eventually, I'd
>> like the function to replace the -Wstringop-overflow handler in
>> builtins.c).  With the change, the note above might read something
>> like:
>>
>> note: at offset 11 to object ‘b’ with size 8 declared here
>>    45 |   char b[N];
>>       |        ^
>>
>> Tested on x86_64-linux.
>>
>> Martin
>>
>> gcc-store-offset.diff
>>
>> gcc/ChangeLog:
>>
>>  * builtins.c (compute_objsize): Add an argument and set it to
>> offset
>>  into destination.
>>  * builtins.h (compute_objsize): Add an argument.
>>  * tree-object-size.c (addr_object_size): Add an argument and
>> set it
>>  to offset into destination.
>>  (compute_builtin_object_size): Same.
>>  * tree-object-size.h (compute_builtin_object_size): Add an
>> argument.
>>  * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>> set it
>>  to offset into destination.
>>  (maybe_warn_overflow): New function.
>>  (handle_store): Call maybe_warn_overflow to issue warnings.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>> messages.
>>  * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>  * gcc.dg/Wstringop-overflow-17.c: Same.
>>
>
>> Index: gcc/tree-ssa-strlen.c
>> ===
>> --- gcc/tree-ssa-strlen.c(revision 277886)
>> +++ gcc/tree-ssa-strlen.c(working copy)
>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>static int get_stridx_plus_constant (strinfo *, unsigned
>> HOST_WIDE_INT, tree);
>>static void handle_builtin_stxncpy (built_in_function,
>> gimple_stmt_iterator *);
>>+/* Sets MINMAX to either the constant value or the range VAL
>> is in
>> +   and returns true on success.  */
>> +
>> +static bool
>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>> NULL)
>> +{
>> +  if (tree_fits_uhwi_p (val))
>> +{
>> +  minmax[0] = minmax[1] = wi::to_wide (val);
>> +  return true;
>> +}
>> +
>> +  if (TREE_CODE (val) != SSA_NAME)
>> +return false;
>> +
>> +  if (rvals)
>> +{
>> +  gimple *def = SSA_NAME_DEF_STMT (val);
>> +  if (gimple_assign_single_p (def)
>> +  && gimple_assign_rhs_code (def) == INTEGER_CST)
>> +{
>> +  /* get_value_range returns [0, N] for constant
>>>

Re: Set inline-insns-single-O2 to 70

2019-11-17 Thread Jeff Law
On 11/14/19 5:38 AM, Jan Hubicka wrote:
> Hi,
> this patch bumps inline-insns-single-O2 from 30 to 70.  I originally reduced
> it from 120 to 50 when forking the -O2 and -O3 parameters which has
> quite significant code size benefits.
> 
> This parameter controls how large functions user declared inline are inlined
> (sadly we really can't inline all).
> 
> However, while this transform is mostly SPEC neutral, it has turned out to
> cause performance regressions for tramp3d, botan and some of the Firefox
> benchmarks with LTO.
> 
> I re-measured everything with 30, 50, 70 and 90 values as seen here:
> 
> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ead4ea7bb1b1b531c2d8ba72fc5c1f1b14ddc454%2Ced81e91c55436bb949fab8556c138488b598af9e%2C44f7fe6bc09fc2365c4ec9ec7aea2863593d87fc%2C6f4da220ebfa0c3a3db02109dcb371da27516a3b%2C7aa+fd3ce8a81b45e040dae74e38a6849c65883ba
> 
> Ignore the benzen results since it run only part of the tests.
> Also ignore everything which is not with -Ofast. It is noise.
> 
> The off noise observations are:
> 
>  - 6% regression for Povray at 50, 70 and 90.  This is a somewhat
>    independent problem which I will treat separately
>  - 4% improvement for gcc for 90
>  - 4% improvement for xalancbmk for 90
>  - 2% improvement for parest for 70, 90
>  - 12% improvement for deepsjeng for 50, 70, 90
> 
> Most sensitive code size wise is xalanc, about 15% growth for 50+
> 
> To see code size one needs to click "Display all ELF stats", set minimum
> threshold to 0.001 and click generate. Once page is fully loaded add 
> total.*text to Filter.
> 
> The overall outcome is growth
> 
>                    50     70     90
>  spec 2006       0.51%  0.89%  1.12%
>  spec 2006 LTO   0.34%  0.60%  0.79%
>  spec 2017       2.06%  2.48%  2.57%
> 
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ead4ea7bb1b1b531c2d8ba72fc5c1f1b14ddc454%2Ced81e91c55436bb949fab8556c138488b598af9e%2C44f7fe6bc09fc2365c4ec9ec7aea2863593d87fc%2C6f4da220ebfa0c3a3db02109dcb371da27516a3b%2C7aa+fd3ce8a81b45e040dae74e38a6849c65883ba
> 
> Short story  
>  - many of the botan tests like bumping limits up, about 1/3 of them all
>    the way to 90 (there were no improvements for 120).
>  - nbench likes an increase to 50 and more
>  - polyhedron ttf2 likes 50 and more
>  - tramp3d likes 90 
> 
> I also run Firefox LTO benchmarks:
> 
> 30 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=90a908c19de521482cad5ff864f8f67fec6dbc75
> 50 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=7efe0bfd2f5acb55b1bcf0ba4a162e59b1a3be99
> 70 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a7cf9728e4a952eff5190abeb72a4a95571d95d
> 90 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=5157552ce80419ed5bd0594668a53a13a04786d2
> 
> In all cases I used --param inline-unit-growth=12000 since this limit
> otherwise blocks the inliner before it gets to the function sizes in
> question.  Code size is as follows:
> 
> libxul.so size:
> 
> 30 103798151
> 50 108490103 (+4%)
> 70 114372911 (+10%)
> 90 116104639 (+11%)
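
As a quick check of the percentages in the list above, the growth relative to the 30 baseline can be recomputed from the quoted libxul.so sizes. The helper growth_percent is a hypothetical name used only for this worked example.

```cpp
#include <cassert>

// Percentage growth of `size` relative to `baseline`.
double growth_percent(double baseline, double size)
{
  return (size - baseline) / baseline * 100.0;
}
```

With the baseline of 103798151 bytes this gives roughly 4.5% for 50, 10.2% for 70 and 11.9% for 90, so the figures in the mail appear to be rounded down.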
> 
> Compares:
> 30 to 50: 
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=5157552ce80419ed5bd0594668a53a13a04786d2&newProject=try&newRevision=7efe0bfd2f5acb55b1bcf0ba4a162e59b1a3be99&framework=1
>  (this shows almost nothing)
> 30 to 70: 
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=90a908c19de521482cad5ff864f8f67fec6dbc75&newProject=try&newRevision=6a7cf9728e4a952eff5190abeb72a4a95571d95d&framework=1
>  (here is a 14% improvement for the dromaeo benchmark and 5% in overall
>  responsiveness; there is a regression in tsvgx/tresize which can be
>  tracked down to quite low-level hand-optimized code in the SKIA graphics
>  rendering library which does not define ALWAY_INLINE to always_inline
>  for GCC (only for clang))
> 30 to 90: 
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=90a908c19de521482cad5ff864f8f67fec6dbc75&newProject=try&newRevision=5157552ce80419ed5bd0594668a53a13a04786d2&framework=1
>  (generally similar to previous one)
> 
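
The kind of macro mentioned for the graphics library can be sketched as follows. This is an assumption-laden illustration, not Skia's actual code: SKETCH_ALWAYS_INLINE, blend_channel and blend_opaque are hypothetical names. It shows a force-inline macro that maps to the always_inline attribute on both GCC and clang, so hand-optimized helpers do not depend on the inliner's size limits.

```cpp
#include <cassert>

// Map the macro to always_inline on GCC-compatible compilers (clang also
// defines __GNUC__); fall back to plain inline elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#  define SKETCH_ALWAYS_INLINE inline
#endif

// Integer blend of two 0..255 channel values with a 0..255 alpha.
SKETCH_ALWAYS_INLINE int blend_channel(int src, int dst, int alpha)
{
  return (src * alpha + dst * (255 - alpha)) / 255;
}

// Caller that relies on blend_channel being inlined into it.
int blend_opaque(int src, int dst)
{
  return blend_channel(src, dst, 255);
}
```

A function marked this way is inlined regardless of limits such as the one being tuned here, which is why code that only defines the macro for clang becomes sensitive to GCC's inline parameters.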
> So most improvements show up with 70, and 50 seems not to be enough to get
> performance for Firefox.  We still lose on tramp3d and some of botan, but I
> think this is generally -O3/-Ofast type of code so I hope it is acceptable.
> 
> The SPEC code sizes are not very realistic, since a lot of codebases are
> Fortran or old C which do not use the inline keyword at all.  On the other hand
> Firefox sizes are not realistic either (in other direction) since I disabled
> the inline-unit-growth parameter.
> 
> I hope that once Martin get Tumbleweed builds with GCC 10 branch working, we
> can verify how much this makes difference in larger scale.
> 
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
> 
>   * params.

[Patch, fortran] PR83118 - [8/9/10 Regression] Bad intrinsic assignment of class(*) array component of derived type

2019-11-17 Thread Paul Richard Thomas
This is a somewhat delayed patch to fix issues with the original
patch, as flagged up by Rainer in comment #12, Rainer in comment #14
and Eric in comment #15. The fix for these problems was posted in
April in comment #17. It was thoroughly tested but remained
uncommitted because my attention was elsewhere.

I have added the fix to Damian's failing test posted at
https://gcc.gnu.org/ml/fortran/2019-11/msg00061.html ? and referenced
by Tobias in comment #23.

The submitted testcase leaks memory as in PR38319, which I will return
to as I work my way through my assigned PRs. I have returned to this
latter PR on several occasions and have thus far not managed to find a
fix for the problem, which is primarily due to various issues with
allocatable component derived type constructor.

For the main part, the patch relies on ensuring vtables are available
and forcing all assignments to unlimited polymorphic entities to use
the vtable _copy.

Regtests on FC30/x86_64 - OK to commit?

Paul

2019-11-17  Paul Thomas  

PR fortran/83118
* resolve.c (resolve_ordinary_assign): Generate a vtable if
necessary for scalar non-polymorphic rhs's to unlimited lhs's.
* trans-array.c (structure_alloc_comps): Delete trailing white
spaces.
(gfc_alloc_allocatable_for_assignment): Use earlier evaluation
of 'cond_null'. If unlimited poly initialize 'size1' to zero
and jump to 'no_shape_tests'. Force reallocation of unlimited
polymorphic lhs's. For allocation to unlimited polymorphic lhs
from a class rhs, use the vtable size.
* trans-expr.c (gfc_conv_procedure_call): Ensure the vtable is
present for passing a non-class actual to an unlimited formal.
(gfc_trans_assignment_1): Simplify some of the logic with
'realloc_flag'.
(realloc_flag): Set 'vptr_copy' for all array assignments to
unlimited polymorphic lhs.

2019-11-17  Paul Thomas  

PR fortran/83118
* gfortran.dg/unlimited_polymorphic_31.f03: New test.
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c	(revision 278354)
--- gcc/fortran/resolve.c	(working copy)
*** resolve_ordinary_assign (gfc_code *code,
*** 10868,10874 
  
/* Make sure there is a vtable and, in particular, a _copy for the
   rhs type.  */
!   if (UNLIMITED_POLY (lhs) && lhs->rank && rhs->ts.type != BT_CLASS)
  gfc_find_vtab (&rhs->ts);
  
bool caf_convert_to_send = flag_coarray == GFC_FCOARRAY_LIB
--- 10868,10874 
  
/* Make sure there is a vtable and, in particular, a _copy for the
   rhs type.  */
!   if (UNLIMITED_POLY (lhs) && rhs->ts.type != BT_CLASS)
  gfc_find_vtab (&rhs->ts);
  
bool caf_convert_to_send = flag_coarray == GFC_FCOARRAY_LIB
Index: gcc/fortran/trans-array.c
===
*** gcc/fortran/trans-array.c	(revision 278354)
--- gcc/fortran/trans-array.c	(working copy)
*** structure_alloc_comps (gfc_symbol * der_
*** 8822,8828 
  
  	  cdesc = gfc_create_var (cdesc, "cdesc");
  	  DECL_ARTIFICIAL (cdesc) = 1;
!   
  	  gfc_add_modify (&tmpblock, gfc_conv_descriptor_dtype (cdesc),
  	  		  gfc_get_dtype_rank_type (1, tmp));
  	  gfc_conv_descriptor_lbound_set (&tmpblock, cdesc,
--- 8822,8828 
  
  	  cdesc = gfc_create_var (cdesc, "cdesc");
  	  DECL_ARTIFICIAL (cdesc) = 1;
! 
  	  gfc_add_modify (&tmpblock, gfc_conv_descriptor_dtype (cdesc),
  	  		  gfc_get_dtype_rank_type (1, tmp));
  	  gfc_conv_descriptor_lbound_set (&tmpblock, cdesc,
*** structure_alloc_comps (gfc_symbol * der_
*** 8833,8839 
  	  gfc_index_one_node);
  	  gfc_conv_descriptor_ubound_set (&tmpblock, cdesc,
  	  gfc_index_zero_node, ubound);
!   
  	  if (attr->dimension)
  	comp = gfc_conv_descriptor_data_get (comp);
  	  else
--- 8833,8839 
  	  gfc_index_one_node);
  	  gfc_conv_descriptor_ubound_set (&tmpblock, cdesc,
  	  gfc_index_zero_node, ubound);
! 
  	  if (attr->dimension)
  	comp = gfc_conv_descriptor_data_get (comp);
  	  else
*** gfc_alloc_allocatable_for_assignment (gf
*** 10184,10198 
  			 rss->info->string_length);
cond_null = fold_build2_loc (input_location, TRUTH_OR_EXPR,
     logical_type_node, tmp, cond_null);
  }
else
  cond_null= gfc_evaluate_now (cond_null, &fblock);
  
-   tmp = build3_v (COND_EXPR, cond_null,
- 		  build1_v (GOTO_EXPR, jump_label1),
- 		  build_empty_stmt (input_location));
-   gfc_add_expr_to_block (&fblock, tmp);
- 
/* Get arrayspec if expr is a full array.  */
if (expr2 && expr2->expr_type == EXPR_FUNCTION
  	&& expr2->value.function.isym
--- 10184,10194 
  			 rss->info->string_length);
cond_null = fold_build2_loc (input_location, TRUTH_OR_EXPR,
     logical_type_node, tmp, cond_null);
+   cond_null= gfc_evaluate_now (cond_null, &fblock);
  }
else
  cond_null= gfc_evaluate_now (cond_null, &f

Re: [mid-end][__RTL] Clean state despite unspecified __RTL startwith passes

2019-11-17 Thread Jeff Law
On 11/14/19 11:22 AM, Matthew Malcomson wrote:
> Hi there,
> 
> When compiling an __RTL function that has an unspecified "startwith"
> pass we currently don't run the cleanup pass, this means that we ICE on
> the next function (if it's a basic function).
> I asked about this on the GCC mailing list a while ago and Richard mentioned
> it might be a good idea to clear bad state so it doesn't leak to other
> functions.
> https://gcc.gnu.org/ml/gcc/2019-02/msg00106.html
> 
> This change ensures that the clean_state pass is run even if the
> startwith pass is unspecified.
> 
> We also ensure the name of the startwith pass is always freed correctly.
> 
> As an example, before this change the following code would ICE when compiling
> the function `foo_a`.
> 
> When compiled with
> ./aarch64-none-linux-gnu-gcc -O0 -S unspecified-pass-error.c -o test.s
> 
> ```
> int __RTL () badfoo ()
> {
> (function "badfoo"
>   (insn-chain
> (block 2
>   (edge-from entry (flags "FALLTHRU"))
>   (cnote 3 [bb 2] NOTE_INSN_BASIC_BLOCK)
>   (cinsn 101 (set (reg:DI x19) (reg:DI x0)))
>   (cinsn 10 (use (reg/i:SI x19)))
>   (edge-to exit (flags "FALLTHRU"))
> ) ;; block 2
>   ) ;; insn-chain
> ) ;; function "foo2"
> }
> 
> int
> foo_a ()
> {
>   return 200;
> }
> ```
> 
> Now it silently ignores the __RTL function and successfully compiles foo_a.
> 
> regtest done on aarch64
> regtest done on x86_64
> 
> OK for trunk?
> 
> gcc/ChangeLog:
> 
> 2019-11-14  Matthew Malcomson  
> 
>   * run-rtl-passes.c (run_rtl_passes): Accept and handle empty
>   "initial_pass_name" argument -- by running "*clean_state" pass.
>   Also free the "initial_pass_name" when done.
> 
> gcc/c/ChangeLog:
> 
> 2019-11-14  Matthew Malcomson  
> 
>   * c-parser.c (c_parser_parse_rtl_body): Always call
>   run_rtl_passes, even if startwith pass is not provided.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-11-14  Matthew Malcomson  
> 
>   * gcc.dg/rtl/aarch64/unspecified-pass-error.c: New test.
OK
jeff



Re: [PATCH] musl: Don't use gthr weak refs in libgcc PR91737

2019-11-17 Thread Jeff Law
On 11/15/19 3:00 AM, Szabolcs Nagy wrote:
> The gthr weak reference based single thread detection is unsafe with
> static linking and in case of dynamic linking it's ineffective on musl
> since pthread symbols are defined in libc.so.
> 
> Ideally this should be fixed for all targets, since glibc plans to move
> libpthread.so into libc.so too and users want to static link to pthread
> without --whole-archive: PR87189.
> 
> For now we have to explicitly opt out from the broken behaviour in the
> config machinery of each target lib and libgcc was previously missed.
> 
> libgcc/ChangeLog:
> 
> 2019-11-15  Szabolcs Nagy  
> 
>   * config.host: Add t-gthr-noweak on *-*-musl*.
>   * config/t-gthr-noweak: New file.
> 
Given the patch is constrained to musl, it's obviously OK.

WRT the bigger question, even if glibc gets those bits moved into
libc.so it's likely going to be a long time before the split between
libc and libpthreads disappears from the wild :(

jeff
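
For readers unfamiliar with the idiom discussed in this thread, here is a minimal sketch of weak-reference based thread detection. It is not the libgcc gthr code: the symbol hypothetical_thread_entry is an invented placeholder standing in for a pthread entry point, and the helper threads_appear_active is likewise hypothetical.

```cpp
// Weak declaration (a GCC/Clang extension): if nothing in the link defines
// this symbol, its address resolves to null instead of producing an
// undefined-reference error.
// NOTE: hypothetical_thread_entry is an invented placeholder, not a real API.
extern "C" int hypothetical_thread_entry(void) __attribute__((weak));

// Hypothetical helper: guess whether threading support was linked in by
// testing whether the weak reference was resolved.
inline bool threads_appear_active() {
  return &hypothetical_thread_entry != nullptr;
}
```

This shows both problems named in the quoted message. With static linking, the archive member that defines the symbol may never be pulled in, so the check can report "no threads" in a program that does use them. When the symbols live in libc.so, as on musl, the reference always resolves and the check no longer distinguishes anything.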



Re: [Patch, Fortran] dec comparisons - for review

2019-11-17 Thread Jeff Law
On 11/15/19 3:40 AM, Mark Eggleston wrote:
> This patch allows comparison of numeric values with Holleriths. This
> feature is not guarded by a compiler option as it is preferred that
> extra options should be avoided, this seems reasonable as current Hollerith
> support does not have such an option.
> 
> In addition it also allows comparison of character values with
> Holleriths, this is guarded by a compiler option, I've used the existing
> -fdec-char-conversions because the Hollerith is converted to character
> for the comparison to be made.
> 
> These legacy features are supported by xlf, Sun and flang compilers.
> 
> I know the deadline for new features is close. Does the process have to be
> completed before the deadline or can it overlap? If not I'll withdraw it
> until stage 1 is re-opened.
The general guidance is that a patch should be posted prior to stage1 close;
some iteration on the patch is certainly allowed.  If the patch requires
major rework then we deal with those on a case-by-case basis.  I think
Jason has once characterized it as "no new concepts" after stage1 close.

We also usually give some port, language and runtime maintainers a bit
more leeway.

You've clearly met the submission deadline, so it's in the hands of the
Fortran maintainers to decide if this should go forward.

jeff



Re: [PATCH 1/4] Preliminary m68k patches

2019-11-17 Thread Jeff Law
On 11/14/19 5:23 AM, Bernd Schmidt wrote:
> On 11/13/19 9:03 PM, Jeff Law wrote:
>> OK.  I'd actually recommend this go ahead and get installed.  My tester
>> will bootstrap it overnight.
> 
> Alright, let me know how that turns out. What kind of machine do you
> have for that?
Sorry I should have been clearer.  I expected you to commit the first
patch and my tester would have picked it up automatically.

Regardless, I put the first patch into my magic testing directory, so
it'll get bootstrapped later today.

jeff



Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-17 Thread Jeff Law
On 11/17/19 9:57 AM, Andreas Schwab wrote:
> On Nov 17 2019, Mikael Pettersson wrote:
> 
>> /tmp/ccJA1qws.s:4828: Error: operands mismatch -- statement `seq %a1' ignored
>> /tmp/ccJA1qws.s:7344: Error: operands mismatch -- statement `seq %a1' ignored
> 
> That should fix it:
> 
> diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
> index 0cf063aaf84..3efcaad33a4 100644
> --- a/gcc/config/m68k/m68k.md
> +++ b/gcc/config/m68k/m68k.md
> @@ -698,7 +698,7 @@
>  })
>  
>  (define_insn "cstore_bftst_insn"
> -  [(set (match_operand:QI 0 "register_operand")
> +  [(set (match_operand:QI 0 "register_operand" "=d")
>   (match_operator:QI 1 "ordered_comparison_operator"
>[(zero_extract:SI (match_operand:BTST 2 "" 
> "")
>  (match_operand:SI 3 "const_int_operand" "n")
OK.

Bernd, presumably you can add this minor bugfix on top of your kit when
you install it?

Jeff

ps.  And to answer a question of Bernd's from a prior message.  I'm not
bootstrapping on real m68k hardware.  I'm using qemu user space
emulation + a native m68k chroot environment.   I've also used Aranym in
the past with good success.  I prefer the former because it's the same
core technology as other targets *and* since I'm just using user mode
emulation I can exploit whatever SMP resources are on the host.

jeff



Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-17 Thread Jeff Law
On 11/13/19 6:23 AM, Bernd Schmidt wrote:
> Once more with patch.
> 
> 
> Bernd
> 
> 
> m68k-2.diff
> 
> PR target/91851
> * config/m68k/m68k-protos.h (output_dbcc_and_branch): Adjust
> declaration.
> (m68k_init_cc): New declaration.
> (m68k_output_compare_di, m68k_output_compare_si,
> m68k_output_compare_hi, m68k_output_compare_qi,
> m68k_output_compare_fp, m68k_output_btst, m68k_output_bftst,
> m68k_find_flags_value, m68k_output_scc, m68k_output_scc_float,
> m68k_output_branch_integer, m68k_output_branch_integer_rev.
> m68k_output_branch_float, m68k_output_branch_float_rev):
> Likewise.
> (valid_dbcc_comparison_p_2, flags_in_68881,
> output_btst): Remove declaration.
> * config/m68k/m68k.c (INCLUDE_STRING): Define.
> (TARGET_ASM_FINAL_POSTSCAN_INSN): Define.
> (valid_dbcc_comparison_p_2, flags_in_68881): Delete functions.
> (flags_compare_op0, flags_compare_op1, flags_operand1,
> flags_operand2, flags_valid): New static variables.
> (m68k_find_flags_value, m68k_init_cc): New functions.
> (handle_flags_for_move, m68k_asm_final_postscan_insn,
> remember_compare_flags): New static functions.
> (output_dbcc_and_branch): New argument CODE.  Use it, and add
> PLUS and MINUS to the possible codes.  All callers changed.
> (m68k_output_btst): Renamed from output_btst.  Remove OPERANDS
> and INSN arguments, add CODE arg.  Return the comparison code
> to use.  All callers changed.  Use CODE instead of
> next_insn_tests_no_inequality, and replace cc_status management
> with changing the return code.
> (m68k_rtx_costs): Instead of testing for COMPARE, test for
> RTX_COMPARE or RTX_COMM_COMPARE.
> (output_move_simode, output_move_qimode): Call
> handle_flags_for_move.
> (notice_update_cc): Delete function.
> (m68k_output_bftst, m68k_output_compare_di, 
> m68k_output_compare_si,
> m68k_output_compare_hi, m68k_output_compare_qi,
> m68k_output_compare_fp, m68k_output_branch_integer,
> m68k_output_branch_integer_rev, m68k_output_scc,
> m68k_output_branch_float, m68k_output_branch_float_rev,
> m68k_output_scc_float): New functions.
> (output_andsi3, output_iorsi3, output_xorsi3): Call CC_STATUS_INIT
> once at the start, and set flags_valid and flags_operand1 if the
> flags are usable.
> * config/m68k/m68k.h (CC_IN_68881, NOTICE_UPDATE_CC,
> CC_OVERFLOW_UNUSABLE, CC_NO_CARRY, OUTPUT_JUMP): Remove
> definitions.
> (CC_STATUS_INIT): Define.
> * config/m68k/m68k.md (flags_valid): New define_attr.
> (tstdi, tstsi_internal_68020_cf, tstsi_internal, tsthi_internal,
> tstqi_internal, tst_68881, tst_cf, cmpdi_internal,
> cmpdi, unnamed cmpsi/cmphi/cmpqi patterns, cmpsi_cf,
> cmp_68881, cmp_cf, unnamed btst patterns,
> tst_bftst_reg, tst_bftst_reg, unnamed scc patterns, scc,
> sls, sordered_1, sunordered_1, suneq_1, sunge_1, sungt_1,
> sunle_1, sunlt_1, sltgt_1, fsogt_1, fsoge_1, fsolt_1, fsole_1,
> bge0_di, blt0_di, beq, bne, bgt, bgtu, blt, bltu, bge, bgeu,
> ble, bleu, bordered, bunordered, buneq, bunge, bungt, bunle,
> bunlt, bltgt, beq_rev, bne_rev, bgt_rev, bgtu_rev,
> blt_rev, bltu_rev, bge_rev, bgeu_rev, ble_rev, bleu_rev,
> bordered_rev, bunordered_rev, buneq_rev, bunge_rv, bungt_rev,
> bunle_rev, bunlt_rev, bltgt_rev, ctrapdi4, ctrapsi4, ctraphi4,
> ctrapqi4, conditional_trap): Delete patterns.
> (cbranchdi4_insn): New pattern.
> (cbranchdi4): Don't generate cc0 patterns.  When testing LT or GE,
> test high part only.  When testing EQ or NE, generate beq0_di
> and bne0_di patterns directly.
> (cstoredi4): When testing LT or GE, test high part only.
> (both sets of cbranch4, cstore4): Don't generate cc0
> patterns.
> (scc0_constraints, cmp1_constraints, cmp2_constraints,
> scc0_cf_constraints, cmp1_cf_constraints, cmp2_cf_constraints,
> cmp2_cf_predicate): New define_mode_attrs.
> (cbranch4_insn, cbranch4_insn_rev,
> cbranch4_insn_cf, cbranch4_insn_cf_rev,
> cstore4_insn, cstore4_insn_cf for integer modes)
> New patterns.
> (cbranch4_insn_68881, cbranch4_insn_rev_68881):
> (cbranch4_insn_cf, cbranch4_insn_rev_cf,
> cstore4_insn_68881, cstore4_insn_cf for FP):
> New patterns.
> (cbra

Re: Add optabs for accelerating RAW and WAR alias checks

2019-11-17 Thread Jeff Law
On 11/16/19 8:39 AM, Richard Sandiford wrote:
> This patch adds optabs that check whether a read followed by a write
> or a write followed by a read can be divided into interleaved byte
> accesses without changing the dependencies between the bytes.
> This is one of the uses of the SVE2 WHILERW and WHILEWR instructions.
> (The instructions can also be used to limit the VF at runtime,
> but that's future work.)
> 
> This applies on top of:
> 
>   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00787.html
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-11-16  Richard Sandiford  
> 
> gcc/
>   * doc/sourcebuild.texi (vect_check_ptrs): Document.
>   * optabs.def (check_raw_ptrs_optab, check_war_ptrs_optab): New optabs.
>   * doc/md.texi: Document them.
>   * internal-fn.def (IFN_CHECK_RAW_PTRS, IFN_CHECK_WAR_PTRS): New
>   internal functions.
>   * internal-fn.h (internal_check_ptrs_fn_supported_p): Declare.
>   * internal-fn.c (check_ptrs_direct): New macro.
>   (expand_check_ptrs_optab_fn): Likewise.
>   (direct_check_ptrs_optab_supported_p): Likewise.
>   (internal_check_ptrs_fn_supported_p): New function.
>   * tree-data-ref.c: Include internal-fn.h.
>   (create_ifn_alias_checks): New function.
>   (create_intersect_range_checks): Use it.
>   * config/aarch64/iterators.md (SVE2_WHILE_PTR): New int iterator.
>   (optab, cmp_op): Handle it.
>   (raw_war, unspec): New int attributes.
>   * config/aarch64/aarch64.md (UNSPEC_WHILERW, UNSPEC_WHILE_WR): New
>   constants.
>   * config/aarch64/predicates.md (aarch64_bytes_per_sve_vector_operand):
>   New predicate.
>   * config/aarch64/aarch64-sve2.md (check__ptrs): New
>   expander.
>   (@aarch64_sve2_while_ptest): New
>   pattern.
> 
> gcc/testsuite/
>   * lib/target-supports.exp (check_effective_target_vect_check_ptrs):
>   New procedure.
>   * gcc.dg/vect/vect-alias-check-14.c: Expect IFN_CHECK_WAR to be
>   used, if available.
>   * gcc.dg/vect/vect-alias-check-15.c: Likewise.
>   * gcc.dg/vect/vect-alias-check-16.c: Likewise IFN_CHECK_RAW.
>   * gcc.target/aarch64/sve2/whilerw_1.c: New test.
>   * gcc.target/aarch64/sve2/whilewr_1.c: Likewise.
>   * gcc.target/aarch64/sve2/whilewr_2.c: Likewise.
>
OK
jeff



Re: Handle VIEW_CONVERT_EXPR for variable-length vectors

2019-11-17 Thread Jeff Law
On 11/16/19 6:38 AM, Richard Sandiford wrote:
> This patch handles VIEW_CONVERT_EXPRs of variable-length VECTOR_CSTs
> by adding tree-level versions of native_decode_vector_rtx and
> simplify_const_vector_subreg.  It uses the same code for fixed-length
> vectors, both to get more coverage and because operating directly on
> the compressed encoding should be more efficient for longer vectors
> with a regular pattern.
> 
> The structure and comments are very similar between the tree and
> rtx routines.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-11-15  Richard Sandiford  
> 
> gcc/
>   * fold-const.c (native_encode_vector): Turn into a wrapper function,
>   splitting the main code out into...
>   (native_encode_vector_part): ...this new function.
>   (native_decode_vector_tree): New function.
>   (fold_view_convert_vector_encoding): Likewise.
>   (fold_view_convert_expr): Use it for converting VECTOR_CSTs
>   to VECTOR_TYPEs.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/sve/acle/general/temporaries_1.c: New test.
OK
jeff
> 



Re: Two RTL CC tweaks for SVE pmore/plast conditions

2019-11-17 Thread Jeff Law
On 11/16/19 6:42 AM, Richard Sandiford wrote:
> SVE has two composite conditions:
> 
>   pmore == at least one bit set && last bit clear
>   plast == no bits set || last bit set
> 
> So in general we generate them from:
> 
>   A: CC = test bits
>   B: reg1 = first condition
>   C: CC = test bits
>   D: reg2 = second condition
>   E: result = (reg1 op reg2)   where op is || or &&
> 
> To fold all this into a single test, we need to be able to remove
> the redundant C (the cse.c patch) and then fold B, D and E down to
> a single condition (the simplify-rtx.c patch).
> 
> The underlying conditions are unsigned, so the simplify-rtx.c part needs
> to support both unsigned comparisons and AND.  However, to avoid opening
> the can of worms that is ANDing FP comparisons for unordered inputs,
> I've restricted the new AND handling to cases in which NaNs can be
> ignored.  I think this is still a strict extension of what we have now,
> it just doesn't go as far as it could.  Going further would need an
> entirely different set of testcases so I think would make more sense
> as separate work.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-11-16  Richard Sandiford  
> 
> gcc/
>   * cse.c (cse_insn): Delete no-op register moves too.
>   * simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons.
>   Take a second comparison to control the value for NE.
>   (mask_to_comparison): Handle unsigned comparisons.
>   (simplify_logical_relational_operation): Likewise.  Update call
>   to comparison_to_mask.  Handle AND if !HONOR_NANs.
>   (simplify_binary_operation_1): Call the above for AND too.
> 
> gcc/testsuite/
>   * gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test.
OK
jeff
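
The two composite conditions quoted above can be modelled in a few lines. This is only an illustrative sketch, not code from the patch: a predicate result is assumed to be a plain bit mask, and PredTest, test_bits, pmore and plast are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a predicate result is a bit mask of `nbits` lanes
// (1 <= nbits <= 32); the "last" lane is the highest-numbered one.
struct PredTest {
  bool any;   // at least one bit set
  bool last;  // last bit set
};

// One "CC = test bits" step: compute both flags from the mask at once.
inline PredTest test_bits(std::uint32_t mask, unsigned nbits) {
  assert(nbits >= 1 && nbits <= 32);
  const std::uint32_t valid =
      nbits == 32 ? 0xffffffffu : ((1u << nbits) - 1u);
  mask &= valid;
  return PredTest{mask != 0, ((mask >> (nbits - 1)) & 1u) != 0};
}

// pmore == at least one bit set && last bit clear
inline bool pmore(PredTest t) { return t.any && !t.last; }

// plast == no bits set || last bit set
inline bool plast(PredTest t) { return !t.any || t.last; }
```

Both conditions read the same pair of flags, so a single test of the bits is enough to evaluate either one. By De Morgan's law plast is exactly the negation of pmore.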



Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-17 Thread Andreas Schwab
On Nov 17 2019, Mikael Pettersson wrote:

> /tmp/ccJA1qws.s:4828: Error: operands mismatch -- statement `seq %a1' ignored
> /tmp/ccJA1qws.s:7344: Error: operands mismatch -- statement `seq %a1' ignored

That should fix it:

diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index 0cf063aaf84..3efcaad33a4 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -698,7 +698,7 @@
 })
 
 (define_insn "cstore_bftst_insn"
-  [(set (match_operand:QI 0 "register_operand")
+  [(set (match_operand:QI 0 "register_operand" "=d")
(match_operator:QI 1 "ordered_comparison_operator"
 [(zero_extract:SI (match_operand:BTST 2 "" 
"")
   (match_operand:SI 3 "const_int_operand" "n")

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [C++ coroutines 2/6] Define builtins and internal functions.

2019-11-17 Thread Jeff Law
On 11/17/19 3:24 AM, Iain Sandoe wrote:
> 
> This part of the patch series provides the builtin functions
> used by the standard library code and the internal functions
> used to implement lowering of the coroutine state machine.
> 
> gcc/ChangeLog:
> 
> 2019-11-17  Iain Sandoe  
> 
>   * builtin-types.def (BT_FN_BOOL_PTR): New.
>   (BT_FN_PTR_PTR_SIZE_BOOL): New.
>   * builtins.def (DEF_COROUTINE_BUILTIN): New.
>   * coroutine-builtins.def: New file.
>   * internal-fn.c (expand_CO_FRAME): New.
>   (expand_CO_YIELD): New.
>   (expand_CO_SUSPN): New.
>   (expand_CO_ACTOR): New.
>   * internal-fn.def (CO_ACTOR): New.
>   (CO_YIELD): New.
>   (CO_SUSPN): New.
>   (CO_FRAME): New.
This is OK as would be any minor adjustments you may ultimately need due
to other feedback on the kit.

jeff



[committed] Fix complex-6 for rx target

2019-11-17 Thread Jeff Law

Jakub's recent changes to fix 92449 regresses complex-6 on the rx port
because it defaults to !HONOR_NANs which compromises this test.

ISTM the best thing to do is just avoid the two dump scans for the rx
target.  I haven't seen another port trip over this so I didn't create
an effective-target test.

Installing on the trunk momentarily.

Jeff
* gcc.dg/complex-6.c: Do not run dump scan tests for rx target.

diff --git a/gcc/testsuite/gcc.dg/complex-6.c b/gcc/testsuite/gcc.dg/complex-6.c
index e70322bf6f3..a7eae1e2513 100644
--- a/gcc/testsuite/gcc.dg/complex-6.c
+++ b/gcc/testsuite/gcc.dg/complex-6.c
@@ -9,5 +9,5 @@ foo (__complex float a, __complex float b)
   return a * b;
 }
 
-/* { dg-final { scan-tree-dump-times "unord" 1 "cplxlower1" } } */
-/* { dg-final { scan-tree-dump-times "__mulsc3" 1 "cplxlower1" } } */
+/* { dg-final { scan-tree-dump-times "unord" 1 "cplxlower1" { target { ! rx*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "__mulsc3" 1 "cplxlower1" { target { ! rx*-*-* } } } } */


Re: [C++ coroutines 5/6] Standard library header.

2019-11-17 Thread Jonathan Wakely

On 17/11/19 10:27 +, Iain Sandoe wrote:

This provides the interfaces mandated by the standard and implements
the interaction with the coroutine frame by means of inline use of
builtins expanded at compile-time.  There should be a 1:1 correspondence
with the standard sections which are cross-referenced.

There is no runtime content.

At this stage we have the content in an inline namespace "n4835" for
the current CD.

libstdc++-v3/ChangeLog:

2019-11-17  Iain Sandoe  

* include/Makefile.am: Add coroutine to the experimental set.
* include/Makefile.in: Regnerated.


"Regnerated" typo.


* include/experimental/coroutine: New file.
---
libstdc++-v3/include/Makefile.am|   1 +
libstdc++-v3/include/Makefile.in|   1 +
libstdc++-v3/include/experimental/coroutine | 268 
3 files changed, 270 insertions(+)
create mode 100644 libstdc++-v3/include/experimental/coroutine

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 49fd413..4ffe209 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -708,6 +708,7 @@ experimental_headers = \
${experimental_srcdir}/array \
${experimental_srcdir}/buffer \
${experimental_srcdir}/chrono \
+   ${experimental_srcdir}/coroutine \


The experimental dir is (currently) only used for TS headers. All
C++20 support is currently experimental, so adding  where
 and  have been added would be OK.

But I'm not really clear if this is an implementation of the TS or the
C++20 feature.  If it's a hybrid, putting it in
 is fine.

When the final  header is added it will need to be in
libsupc++ so that it's included for freestanding builds (and at that
point it won't be able to use , but that will be
OK as the final header will be C++20-only and can rely on 
unconditionally, which is also freestanding).


${experimental_srcdir}/deque \
${experimental_srcdir}/executor \
${experimental_srcdir}/forward_list \




diff --git a/libstdc++-v3/include/experimental/coroutine 
b/libstdc++-v3/include/experimental/coroutine
new file mode 100644
index 000..d903352
--- /dev/null
+++ b/libstdc++-v3/include/experimental/coroutine
@@ -0,0 +1,268 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file experimental/coroutine
+ *  This is an experimental C++ Library header against the C++20 CD n4835.
+ *  @ingroup coroutine-ts


The coroutine-ts doc group should be defined somewhere.


+ */
+
+#ifndef _GLIBCXX_EXPERIMENTAL_COROUTINE
+#define _GLIBCXX_EXPERIMENTAL_COROUTINE 1
+
+#pragma GCC system_header
+
+// It is very likely that earlier versions would work, but they are untested.
+#if __cplusplus >= 201402L
+
+#include 
+
+#if __cplusplus > 201703L && __cpp_impl_three_way_comparison >= 201907L
+#  include 
+#  define THE_SPACESHIP_HAS_LANDED 1


This is in trunk now, although not supported by Clang, and not
supported by GCC pre-C++20, so the fallback is OK.

The macro name should be a reserved name though, e.g.
_THE_SPACESHIP_HAS_LANDED


+#else
+#  include 
+#  define THE_SPACESHIP_HAS_LANDED 0
+#endif
+
+namespace std _GLIBCXX_VISIBILITY (default)
+{
+  _GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+#if __cpp_coroutines
+
+  namespace experimental {
+  inline namespace coroutines_n4835 {


This should be a reserved name too, e.g. __coroutines_n4835.


+
+  // [coroutine.traits]
+  // [coroutine.traits.primary]
+  // 17.12.2 coroutine traits
+  template  struct coroutine_traits
+  {
+using promise_type = typename _R::promise_type;
+  };
+
+  // 17.12.3 Class template coroutine_handle
+  // [coroutine.handle]
+  template  struct coroutine_handle;
+
+  template <> struct coroutine_handle
+  {
+  public:
+// 17.12.3.1, construct/reset
+constexpr coroutine_handle () noexcept : __fr_ptr (0) {}


The libstdc++ naming convention is _M_xxx for non-static members (both
data members and member functions) and

Re: [PATCH 0/4] Eliminate cc0 from m68k

2019-11-17 Thread Mikael Pettersson
On Wed, 13 Nov 2019 14:04:59 +0100, Bernd Schmidt
 wrote:
> This is a set of patches to convert m68k so that it no longer uses cc0.

Thank you for doing this.  I attempted a native bootstrap of
gcc-10-20191110 (r278028) plus the five patches posted so far on
m68k-linux (aranym), but it failed in stage 2:

/mnt/scratch/objdir10/./prev-gcc/xg++
-B/mnt/scratch/objdir10/./prev-gcc/
-B/mnt/scratch/install10/m68k-unknown-linux-gnu/bin/ -nostdinc++
-B/mnt/scratch/objdir10/prev-m68k-unknown-linux-gnu/libstdc++-v3/src/.libs
-B/mnt/scratch/objdir10/prev-m68k-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
 
-I/mnt/scratch/objdir10/prev-m68k-unknown-linux-gnu/libstdc++-v3/include/m68k-unknown-linux-gnu
 -I/mnt/scratch/objdir10/prev-m68k-unknown-linux-gnu/libstdc++-v3/include
 -I/mnt/scratch/gcc-10-20191110/libstdc++-v3/libsupc++
-L/mnt/scratch/objdir10/prev-m68k-unknown-linux-gnu/libstdc++-v3/src/.libs
-L/mnt/scratch/objdir10/prev-m68k-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs
-fno-PIE -c   -g -O2 -fno-checking -gtoggle -DIN_GCC
-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall
-Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-error=format-diag
-Wmissing-format-attribute -Woverloaded-virtual -pedantic
-Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror
-DHAVE_CONFIG_H -I. -I. -I/mnt/scratch/gcc-10-20191110/gcc
-I/mnt/scratch/gcc-10-20191110/gcc/.
-I/mnt/scratch/gcc-10-20191110/gcc/../include
-I/mnt/scratch/gcc-10-20191110/gcc/../libcpp/include
-I/mnt/scratch/gcc-10-20191110/gcc/../libdecnumber
-I/mnt/scratch/gcc-10-20191110/gcc/../libdecnumber/dpd
-I../libdecnumber -I/mnt/scratch/gcc-10-20191110/gcc/../libbacktrace
-o dbxout.o -MT dbxout.o -MMD -MP -MF ./.deps/dbxout.TPo
/mnt/scratch/gcc-10-20191110/gcc/dbxout.c
/tmp/ccJA1qws.s: Assembler messages:
/tmp/ccJA1qws.s:4828: Error: operands mismatch -- statement `seq %a1' ignored
/tmp/ccJA1qws.s:7344: Error: operands mismatch -- statement `seq %a1' ignored
Makefile:1118: recipe for target 'dbxout.o' failed
make[3]: *** [dbxout.o] Error 1
make[3]: Leaving directory '/mnt/scratch/objdir10/gcc'
Makefile:4740: recipe for target 'all-stage2-gcc' failed
make[2]: *** [all-stage2-gcc] Error 2
make[2]: Leaving directory '/mnt/scratch/objdir10'
Makefile:20204: recipe for target 'stage2-bubble' failed
make[1]: *** [stage2-bubble] Error 2
make[1]: Leaving directory '/mnt/scratch/objdir10'
Makefile:20399: recipe for target 'bootstrap' failed
make: *** [bootstrap] Error 2

An Scc instruction cannot have an address register as destination operand.

I don't have a reduced test case, but the error can be reproduced by
building a cross gcc to m68k-linux and then using that to build a
native gcc for m68k-linux.

/Mikael


Re: [C++ coroutines 1/6] Common code and base definitions.

2019-11-17 Thread Jeff Law
On 11/17/19 3:24 AM, Iain Sandoe wrote:
> This part of the patch series provides the gating flag, the keywords,
> cpp defines etc.
> 
> gcc/ChangeLog:
> 
> 2019-11-17  Iain Sandoe  
> 
>   * doc/invoke.texi: Document the fcoroutines command line
>   switch.
> 
> gcc/c-family/ChangeLog:
> 
> 2019-11-17  Iain Sandoe  
> 
>   * c-common.c (co_await, co_yield, co_return): New.
>   * c-common.h (RID_CO_AWAIT, RID_CO_YIELD,
>   RID_CO_RETURN): New enumeration values.
>   (D_CXX_COROUTINES): Bit to identify coroutines are active.
>   (D_CXX_COROUTINES_FLAGS): Guard for coroutine keywords.
>   * c-cppbuiltin.c (__cpp_coroutines): New cpp define.
>   * c.opt (fcoroutines): New command-line switch.
> 
> gcc/cp/ChangeLog:
> 
> 2019-11-17  Iain Sandoe  
> 
>   * cp-tree.h (lang_decl-fn): coroutine_p, new bit.
>   * lex.c (init_reswords): Enable keywords when the coroutine flag
>   is set,
>   * operators.def (co_await): New operator.
Looks quite reasonable to me.  If you need minor twiddling due to
reviewer feedback elsewhere those are pre-approved as well.

jeff



[C++ coroutines 5/6] Standard library header.

2019-11-17 Thread Iain Sandoe
This provides the interfaces mandated by the standard and implements
the interaction with the coroutine frame by means of inline use of
builtins expanded at compile-time.  There should be a 1:1 correspondence
with the standard sections which are cross-referenced.

There is no runtime content.

At this stage we have the content in an inline namespace "n4835" for
the current CD.

libstdc++-v3/ChangeLog:

2019-11-17  Iain Sandoe  

* include/Makefile.am: Add coroutine to the experimental set.
* include/Makefile.in: Regnerated.
* include/experimental/coroutine: New file.
---
 libstdc++-v3/include/Makefile.am|   1 +
 libstdc++-v3/include/Makefile.in|   1 +
 libstdc++-v3/include/experimental/coroutine | 268 
 3 files changed, 270 insertions(+)
 create mode 100644 libstdc++-v3/include/experimental/coroutine

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 49fd413..4ffe209 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -708,6 +708,7 @@ experimental_headers = \
${experimental_srcdir}/array \
${experimental_srcdir}/buffer \
${experimental_srcdir}/chrono \
+   ${experimental_srcdir}/coroutine \
${experimental_srcdir}/deque \
${experimental_srcdir}/executor \
${experimental_srcdir}/forward_list \
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index acc4fe5..fdb7d3d 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1052,6 +1052,7 @@ experimental_headers = \
${experimental_srcdir}/array \
${experimental_srcdir}/buffer \
${experimental_srcdir}/chrono \
+   ${experimental_srcdir}/coroutine \
${experimental_srcdir}/deque \
${experimental_srcdir}/executor \
${experimental_srcdir}/forward_list \
diff --git a/libstdc++-v3/include/experimental/coroutine 
b/libstdc++-v3/include/experimental/coroutine
new file mode 100644
index 000..d903352
--- /dev/null
+++ b/libstdc++-v3/include/experimental/coroutine
@@ -0,0 +1,268 @@
+//  -*- C++ -*-
+
+// Copyright (C) 2019 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// .
+
+/** @file experimental/coroutine
+ *  This is an experimental C++ Library header against the C++20 CD n4835.
+ *  @ingroup coroutine-ts
+ */
+
+#ifndef _GLIBCXX_EXPERIMENTAL_COROUTINE
+#define _GLIBCXX_EXPERIMENTAL_COROUTINE 1
+
+#pragma GCC system_header
+
+// It is very likely that earlier versions would work, but they are untested.
+#if __cplusplus >= 201402L
+
+#include 
+
+#if __cplusplus > 201703L && __cpp_impl_three_way_comparison >= 201907L
+#  include 
+#  define THE_SPACESHIP_HAS_LANDED 1
+#else
+#  include 
+#  define THE_SPACESHIP_HAS_LANDED 0
+#endif
+
+namespace std _GLIBCXX_VISIBILITY (default)
+{
+  _GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+#if __cpp_coroutines
+
+  namespace experimental {
+  inline namespace coroutines_n4835 {
+
+  // [coroutine.traits]
+  // [coroutine.traits.primary]
+  // 17.12.2 coroutine traits
+  template  struct coroutine_traits
+  {
+using promise_type = typename _R::promise_type;
+  };
+
+  // 17.12.3 Class template coroutine_handle
+  // [coroutine.handle]
+  template  struct coroutine_handle;
+
+  template <> struct coroutine_handle
+  {
+  public:
+// 17.12.3.1, construct/reset
+constexpr coroutine_handle () noexcept : __fr_ptr (0) {}
+constexpr coroutine_handle (decltype (nullptr) __h) noexcept
+  : __fr_ptr (__h)
+{}
+coroutine_handle &operator= (decltype (nullptr)) noexcept
+{
+  __fr_ptr = nullptr;
+  return *this;
+}
+
+  public:
+// 17.12.3.2, export/import
+constexpr void *address () const noexcept { return __fr_ptr; }
+constexpr static coroutine_handle from_address (void *__a) noexcept
+{
+  coroutine_handle __self;
+  __self.__fr_ptr = __a;
+  return __self;
+}
+
+  public:
+// 17.12.3.3, observers

[C++ coroutines 4/6] Middle end expanders and transforms.

2019-11-17 Thread Iain Sandoe


As described in the covering note, the main part of this is the
expansion of the library support builtins; these are simple boolean
or numerical substitutions.

The functionality of implementing an exit from scope without cleanup
is performed here by lowering an IFN to a gimple goto.

The final part is the expansion of the coroutine IFNs that describe the
state machine connections to the dispatchers.

   In the front end we construct a single actor function that contains
   the coroutine state machine.

   The actor function has three entry conditions:
1. from the ramp, resume point 0 - to initial-suspend.
2. when resume () is executed (resume point N).
3. from the destroy () shim when that is executed.

   The actor function begins with two dispatchers; one for resume and
   one for destroy (where the initial entry from the ramp is a special-
   case of resume point 0).

   Each suspend point and each dispatch entry is marked with an IFN such
   that we can connect the relevant dispatchers to their target labels.

   So, if we have:

   CO_YIELD (NUM, FINAL, RES_LAB, DEST_LAB, FRAME_PTR)

   This is await point NUM, and is the final await if FINAL is non-zero.
   The resume point is RES_LAB, and the destroy point is DEST_LAB.

   We expect to find a CO_ACTOR (NUM) in the resume dispatcher and a
   CO_ACTOR (NUM+1) in the destroy dispatcher.

   Initially, the intent of keeping the resume and destroy paths together
   is that the conditionals controlling them are identical, and thus there
   would be duplication of any optimisation of those paths if the split
   were earlier.

   Subsequent inlining of the actor (and DCE) is then able to extract the
   resume and destroy paths as separate functions if that is found
   profitable by the optimisers.

   Once we have remade the connections to their correct positions, we elide
   the labels that the front end inserted.
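
As an illustration only (this is not code from the patch), the
two-dispatcher arrangement can be mimicked by hand in plain C++.  The
sketch assumes a hypothetical `Frame` holding a suspend index and a
destroy flag, and uses ordinary labels where the real implementation
uses the CO_ACTOR / CO_YIELD IFNs.  All names are invented for the
example, and the numbering is simplified compared with the NUM / NUM+1
scheme described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical frame: only the fields this sketch needs.
struct Frame {
  int index = 0;            // current suspend point; 0 is the entry from the ramp
  bool destroying = false;  // set by the destroy shim before it calls the actor
  std::string log;          // records the path taken, for illustration only
};

// A single actor that begins with two dispatchers, one per entry mode.
void actor(Frame *f) {
  if (f->destroying) {
    // Destroy dispatcher: jump to the destroy label of the current point.
    switch (f->index) {
      case 1: goto destroy_1;
      case 2: goto destroy_2;
      default: goto cleanup;
    }
  }
  // Resume dispatcher: jump to the resume label of the current point.
  switch (f->index) {
    case 0: goto resume_0;
    case 1: goto resume_1;
    case 2: goto resume_2;
    default: return;
  }

resume_0:
  f->log += "start;";
  f->index = 1;  // suspend at await point 1
  return;
resume_1:
  f->log += "after-await-1;";
  f->index = 2;  // suspend at await point 2
  return;
resume_2:
  f->log += "after-await-2;";
  f->index = 3;  // body finished
  return;

destroy_2:
  f->log += "cleanup-2;";
destroy_1:  // reached directly, or by falling through from destroy_2
  f->log += "cleanup-1;";
cleanup:
  f->log += "frame-released;";
  f->index = -1;
}

// Entry used by resume ().
void resume(Frame *f) {
  assert(!f->destroying);
  actor(f);
}

// Entry used by destroy (): select the destroy dispatcher, then call the actor.
void destroy(Frame *f) {
  f->destroying = true;
  actor(f);
}
```

Calling resume () twice and then destroy () on a fresh `Frame` walks the
resume path for points 0 and 1 and then the destroy path for point 2.
Because both paths share one function, the optimisers see the common
control flow once.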

gcc/ChangeLog:

2019-11-17  Iain Sandoe  

* Makefile.in: Add coroutine-passes.o.
* coroutine-passes.cc: New file.
* passes.def: Add pass_coroutine_lower_builtins,
pass_coroutine_early_expand_ifns and
pass_coroutine_finalize_frame.
* tree-pass.h (make_pass_coroutine_lower_builtins): New.
(make_pass_coroutine_early_expand_ifns): New.
(make_pass_coroutine_finalize_frame): New.
---
 gcc/Makefile.in |   1 +
 gcc/coroutine-passes.cc | 621 
 gcc/passes.def  |   3 +
 gcc/tree-pass.h |   3 +
 4 files changed, 628 insertions(+)
 create mode 100644 gcc/coroutine-passes.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ac21401..fc7226a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1266,6 +1266,7 @@ OBJS = \
compare-elim.o \
context.o \
convert.o \
+   coroutine-passes.o \
coverage.o \
cppbuiltin.o \
cppdefault.o \
diff --git a/gcc/coroutine-passes.cc b/gcc/coroutine-passes.cc
new file mode 100644
index 000..33e1d38
--- /dev/null
+++ b/gcc/coroutine-passes.cc
@@ -0,0 +1,621 @@
+/* coroutine expansion and optimisation passes.
+
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+ Contributed by Iain Sandoe  under contract to Facebook.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "cgraph.h"
+#include "pretty-print.h"
+#include "diagnostic-core.h"
+#include "fold-const.h"
+#include "internal-fn.h"
+#include "langhooks.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimplify-me.h"
+#include "gimple-walk.h"
+#include "gimple-fold.h"
+#include "tree-cfg.h"
+#include "tree-into-ssa.h"
+#include "tree-ssa-propagate.h"
+#include "gimple-pretty-print.h"
+#include "cfghooks.h"
+
+/* Here we:
+   * lower the internal function that implements an exit from scope.
+   * expand the builtins that are used to implement the library
+ interfaces to the coroutine frame.  */
+
+static tree
+lower_coro_builtin (gimple_stmt_iterator *gsi, bool *handled_ops_p,
+   struct walk_stmt_info *wi ATTRIBUTE_UNUSED)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+
+  *handled_ops_p = !gimple_has_substatements (stmt);
+  if (gimple_code (stmt) != GIMPL

[C++ coroutines 3/6] Front end parsing and transforms.

2019-11-17 Thread Iain Sandoe


As described in the covering note, there are two parts to this.

1. Parsing, template instantiation and diagnostics for the standard-
   mandated class entries.

  The user authors a function that becomes a coroutine (lazily) by
  making use of any of the co_await, co_yield or co_return keywords.

  Unlike a regular function, where the activation record is placed on the
  stack, and is destroyed on function exit, a coroutine has some state that
  persists between calls - the coroutine frame (analogous to a stack frame).

  We transform the user's function into three pieces:
  1. A so-called ramp function, that establishes the coroutine frame and
 begins execution of the coroutine.
  2. An actor function that contains the state machine corresponding to the
 user's suspend/resume structure.
  3. A stub function that calls the actor function in 'destroy' mode.

  The actor function is executed:
   * from "resume point 0" by the ramp.
   * from resume point N ( > 0 ) for handle.resume() calls.
   * from the destroy stub for destroy point N for handle.destroy() calls.

  The functions in this file carry out the necessary analysis of, and
  transforms to, the AST to perform this.

  The C++ coroutine design makes use of some helper functions that are
  authored in a so-called "promise" class provided by the user.

  At parse time (or post substitution) the type of the coroutine promise
  will be determined.  At that point, we can look up the required promise
  class methods and issue diagnostics if they are missing or incorrect.  To
  avoid repeating these actions at code-gen time, we make use of temporary
  'proxy' variables for the coroutine handle and the promise - which will
  eventually be instantiated in the coroutine frame.

  Each of the keywords will expand to a code sequence (although co_yield is
  just syntactic sugar for a co_await).

  We defer the analysis and transformation until template expansion is
  complete so that we have complete types at that time.

2. AST analysis and transformation which performs the code-gen for the
   outlined state machine.

   The entry point here is morph_fn_to_coro () which is called from
   finish_function () when we have completed any template expansion.

   This is preceded by helper functions that implement the phases below.

   The process proceeds in four phases.

   A Initial framing.
 The user's function body is wrapped in the initial and final suspend
 points and we begin building the coroutine frame.
 We build empty decls for the actor and destroyer functions at this
 time too.
 When exceptions are enabled, the user's function body will also be
 wrapped in a try-catch block with the catch invoking the promise
 class 'unhandled_exception' method.

   B Analysis.
 The user's function body is analysed to determine the suspend points,
 if any, and to capture local variables that might persist across such
 suspensions.  In most cases, it is not necessary to capture compiler
 temporaries, since the tree-lowering nests the suspensions correctly.
 However, in the case of a captured reference, there is a lifetime
 extension to the end of the full expression - which can mean across a
 suspend point in which case it must be promoted to a frame variable.

 At the conclusion of analysis, we have a conservative frame layout and
 maps of the local variables to their frame entry points.

   C Build the ramp function.
 Carry out the allocation for the coroutine frame (NOTE: the actual size
 computation is deferred until late in the middle end to allow for future
 optimisations that will be allowed to elide unused frame entries).
 We build the return object.

   D Build and expand the actor and destroyer function bodies.
 The destroyer is a trivial shim that sets a bit to indicate that the
 destroy dispatcher should be used and then calls into the actor.

 The actor function is the implementation of the user's state machine.
 The current suspend point is noted in an index.
 Each suspend point is encoded as a pair of internal functions, one in
 the relevant dispatcher, and one representing the suspend point.

 During this process, the user's local variables and the proxies for the
 self-handle and the promise class instance are re-written to their
 coroutine frame equivalents.

 The complete bodies for the ramp, actor and destroy function are passed
 back to finish_function for folding and gimplification.
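
To make the phases above concrete, here is a hand-written approximation
of the ramp, the actor and the destroyer shim for a user function of
the form `for (int i = 0; i < limit; ++i) co_yield i;`.  It is a sketch
under stated assumptions, not the output of the front end: the names
`CounterFrame`, `counter_ramp`, `counter_actor` and `counter_destroy`
are hypothetical, the frame layout is invented, and the promise is
reduced to a single `current` field.

```cpp
#include <cassert>

// Hypothetical coroutine frame built during analysis.
struct CounterFrame {
  int index;         // current suspend point
  bool destroy_bit;  // set by the destroyer shim
  int limit;         // copy of the function parameter
  int i;             // user's local, promoted because it lives across suspends
  int current;       // stand-in for the value held by the promise
  bool done;         // true once the final suspend has been reached
};

// The actor: the state machine for the user's function body.
void counter_actor(CounterFrame *f) {
  if (f->destroy_bit) {
    delete f;  // destroy path: release the frame
    return;
  }
  switch (f->index) {
    case 0:  // resume point 0, entered from the ramp: run to initial suspend
      f->i = 0;
      f->index = 1;
      return;
    case 1:  // loop body: each pass ends at a co_yield suspend point
      if (f->i < f->limit) {
        f->current = f->i++;
        return;
      }
      f->done = true;  // final suspend
      f->index = 2;
      return;
    default:
      return;
  }
}

// The destroyer: a trivial shim that sets a bit and calls the actor.
void counter_destroy(CounterFrame *f) {
  f->destroy_bit = true;
  counter_actor(f);
}

// Hypothetical return object handed back to the caller by the ramp.
struct Counter {
  CounterFrame *frame;
  bool next() { counter_actor(frame); return !frame->done; }
  int value() const { return frame->current; }
};

// The ramp: allocate the frame, copy the parameter, build the return
// object and run the actor up to the initial suspend.
Counter counter_ramp(int limit) {
  assert(limit >= 0);
  CounterFrame *f = new CounterFrame{0, false, limit, 0, 0, false};
  Counter ret{f};
  counter_actor(f);
  return ret;
}
```

A caller would loop on `next ()`, read `value ()`, and finally call
`counter_destroy` on the frame.  In the real implementation the frame
size is not known this early, which is why the allocation size is
finalised in the middle end.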

gcc/cp/ChangeLog:

2019-11-17  Iain Sandoe  

* Make-lang.in: Add coroutines.o.
* call.c (add_builtin_candidates): Handle CO_AWAIT_EXPR.
(op_error): Likewise.
(build_new_op_1): Likewise.
* constexpr.c (potential_constant_expression_1): Handle
CO_AWAIT_EXPR, CO_YIELD_EXPR.
* coroutines.cc: New file.
* cp-objcp-common.c (cp_common_init_ts): Add CO_AWAIT_EXPR,
CO_YIELD_EXPR,

[C++ coroutines 2/6] Define builtins and internal functions.

2019-11-17 Thread Iain Sandoe


This part of the patch series provides the builtin functions
used by the standard library code and the internal functions
used to implement lowering of the coroutine state machine.
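
For readers unfamiliar with the .def mechanism, the following is a
minimal, self-contained sketch of the pattern: one list of entries is
expanded several times with different definitions of the DEF macro.
It only mirrors the "__builtin_coro_" name prefixing done by
DEF_COROUTINE_BUILTIN in the patch; the list macro, the enumerators
and the three entry names are chosen for illustration, and the real
macro also takes TYPE and ATTRS arguments that are omitted here.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the contents of a .def file.
#define DEMO_CORO_BUILTINS(DEF)   \
  DEF (DEMO_CORO_RESUME, "resume")   \
  DEF (DEMO_CORO_DESTROY, "destroy") \
  DEF (DEMO_CORO_DONE, "done")

// First expansion: build the enumeration.
enum demo_builtin {
#define DEMO_DEF(ENUM, NAME) ENUM,
  DEMO_CORO_BUILTINS (DEMO_DEF)
#undef DEMO_DEF
  DEMO_BUILTIN_COUNT
};

// Second expansion: build the table of user-visible names, adding the
// common prefix by string literal concatenation.
static const char *const demo_builtin_names[] = {
#define DEMO_DEF(ENUM, NAME) "__builtin_coro_" NAME,
  DEMO_CORO_BUILTINS (DEMO_DEF)
#undef DEMO_DEF
};

// Look a builtin up by its full name; returns -1 when it is unknown.
int find_demo_builtin(const char *name) {
  assert(name != nullptr);
  for (int i = 0; i < DEMO_BUILTIN_COUNT; ++i)
    if (std::strcmp(demo_builtin_names[i], name) == 0)
      return i;
  return -1;
}
```

Keeping the list in one place means the enumeration and the name table
cannot drift apart.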

gcc/ChangeLog:

2019-11-17  Iain Sandoe  

* builtin-types.def (BT_FN_BOOL_PTR): New.
(BT_FN_PTR_PTR_SIZE_BOOL): New.
* builtins.def (DEF_COROUTINE_BUILTIN): New.
* coroutine-builtins.def: New file.
* internal-fn.c (expand_CO_FRAME): New.
(expand_CO_YIELD): New.
(expand_CO_SUSPN): New.
(expand_CO_ACTOR): New.
* internal-fn.def (CO_ACTOR): New.
(CO_YIELD): New.
(CO_SUSPN): New.
(CO_FRAME): New.
---
 gcc/builtin-types.def  |  3 +++
 gcc/builtins.def   |  9 
 gcc/coroutine-builtins.def | 52 ++
 gcc/internal-fn.c  | 26 +++
 gcc/internal-fn.def|  6 ++
 5 files changed, 96 insertions(+)
 create mode 100644 gcc/coroutine-builtins.def

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index e5c9e06..6b4875e 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -297,6 +297,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_UINT32_UINT32, BT_UINT32, BT_UINT32)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT64_UINT64, BT_UINT64, BT_UINT64)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT64_FLOAT, BT_UINT64, BT_FLOAT)
 DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT)
+DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_PTR, BT_BOOL, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_CONST_PTR_CONST_PTR, BT_CONST_PTR, BT_CONST_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT32, BT_UINT16, BT_UINT32)
@@ -625,6 +626,8 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_UINT32_UINT32_PTR,
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_SIZE_SIZE_PTR, BT_VOID, BT_SIZE, BT_SIZE,
 BT_PTR)
 DEF_FUNCTION_TYPE_3 (BT_FN_UINT_UINT_PTR_PTR, BT_UINT, BT_UINT, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_SIZE_BOOL,
+BT_PTR, BT_PTR, BT_SIZE, BT_BOOL)
 
 DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR,
 BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index d8233f5..5ad9608 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -189,6 +189,12 @@ along with GCC; see the file COPYING3.  If not see
   DEF_BUILTIN (ENUM, NAME, BUILT_IN_NORMAL, BT_LAST, BT_LAST, false, false, \
   false, ATTR_LAST, false, false)
 
+/* Builtins used in implementing coroutine support. */
+#undef DEF_COROUTINE_BUILTIN
+#define DEF_COROUTINE_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
+  DEF_BUILTIN (ENUM, "__builtin_coro_" NAME, BUILT_IN_NORMAL, TYPE, TYPE, \
+  true, true, true, ATTRS, true, flag_coroutines)
+
 /* Builtin used by the implementation of OpenACC and OpenMP.  Few of these are
actually implemented in the compiler; most are in libgomp.  */
 /* These builtins also need to be enabled in offloading compilers invoked from
@@ -1064,6 +1070,9 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, ATTR_NOTHROW_LEAF_LIST)
 /* Sanitizer builtins. */
 #include "sanitizer.def"
 
+/* Coroutine builtins.  */
+#include "coroutine-builtins.def"
+
 /* Do not expose the BRIG builtins by default gcc-wide, but only privately in
the BRIG FE as long as there are no references for them in the middle end
or any of the upstream backends.  */
diff --git a/gcc/coroutine-builtins.def b/gcc/coroutine-builtins.def
new file mode 100644
index 000..2f611e9
--- /dev/null
+++ b/gcc/coroutine-builtins.def
@@ -0,0 +1,52 @@
+/* This file contains the definitions and documentation for the
+   coroutines builtins used in GCC.
+
+   Copyright (C) 2018-2019 Free Software Foundation, Inc.
+
+ Contributed by Iain Sandoe  under contract to Facebook.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Before including this file, you should define a macro:
+
+ DEF_BUILTIN_STUB(ENUM, NAME)
+ DEF_COROUTINE_BUILTIN (ENUM, NAME, TYPE, ATTRS)
+
+   See builtins.def for details.
+   The builtins are used by library implementations of C++
+   coroutines.  */
+
+/* This has to come before all the coroutine builtins.  */
+DEF_BUILTIN_STUB (BEGIN_COROUTINE_BUILTINS, (const char *) 0)
+
+/* These are the builtins that are externally-visible and used by the
+   standard library implementation of t

[C++ coroutines 1/6] Common code and base definitions.

2019-11-17 Thread Iain Sandoe
This part of the patch series provides the gating flag, the keywords,
cpp defines etc.
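
As a rough sketch of how the gating works (an illustration, not the
code in lex.c): each reserved word carries a set of "disable" bits, and
a mask computed from the active language and flags decides whether the
word is treated as a keyword.  The 0x4000 value is the one given to
D_CXX_COROUTINES in this patch; the other constant, the table and the
function names are hypothetical.  The last function shows how user
code could test the new cpp define.

```cpp
#include <cassert>
#include <cstring>

const unsigned DEMO_CXXONLY = 0x0002;         // illustrative value
const unsigned DEMO_CXX_COROUTINES = 0x4000;  // value used by the patch
const unsigned DEMO_CXX_COROUTINES_FLAGS = DEMO_CXXONLY | DEMO_CXX_COROUTINES;

// Hypothetical, much reduced reserved-word table.
struct demo_resword {
  const char *word;
  unsigned disable;  // the word is disabled if any of these bits is in the mask
};

const demo_resword demo_reswords[] = {
  {"return", 0},
  {"co_await", DEMO_CXX_COROUTINES_FLAGS},
  {"co_yield", DEMO_CXX_COROUTINES_FLAGS},
  {"co_return", DEMO_CXX_COROUTINES_FLAGS},
};

// Decide whether ID is a keyword for the given language and flag state.
bool demo_is_keyword(const char *id, bool is_cxx, bool flag_coroutines) {
  assert(id != nullptr);
  unsigned mask = 0;
  if (!is_cxx)
    mask |= DEMO_CXXONLY;
  if (!flag_coroutines)
    mask |= DEMO_CXX_COROUTINES;
  for (const demo_resword &r : demo_reswords)
    if (std::strcmp(r.word, id) == 0)
      return (r.disable & mask) == 0;
  return false;
}

// Value of the cpp define added by this patch, or 0 when it is not set.
long demo_coroutines_level() {
#if defined(__cpp_coroutines)
  return __cpp_coroutines;
#else
  return 0L;
#endif
}
```

With this scheme, `co_await` stays an ordinary identifier unless both
the C++ front end and the coroutine flag are active.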

gcc/ChangeLog:

2019-11-17  Iain Sandoe  

* doc/invoke.texi: Document the fcoroutines command line
switch.

gcc/c-family/ChangeLog:

2019-11-17  Iain Sandoe  

* c-common.c (co_await, co_yield, co_return): New.
* c-common.h (RID_CO_AWAIT, RID_CO_YIELD,
RID_CO_RETURN): New enumeration values.
(D_CXX_COROUTINES): Bit to identify coroutines are active.
(D_CXX_COROUTINES_FLAGS): Guard for coroutine keywords.
* c-cppbuiltin.c (__cpp_coroutines): New cpp define.
* c.opt (fcoroutines): New command-line switch.

gcc/cp/ChangeLog:

2019-11-17  Iain Sandoe  

* cp-tree.h (lang_decl-fn): coroutine_p, new bit.
* lex.c (init_reswords): Enable keywords when the coroutine flag
is set.
* operators.def (co_await): New operator.
---
 gcc/c-family/c-common.c |  5 +
 gcc/c-family/c-common.h |  5 +
 gcc/c-family/c-cppbuiltin.c |  2 ++
 gcc/c-family/c.opt  |  4 
 gcc/cp/cp-tree.h| 17 -
 gcc/cp/lex.c|  2 ++
 gcc/cp/operators.def|  1 +
 gcc/doc/invoke.texi |  4 
 8 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4881199..8be92a6 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -536,6 +536,11 @@ const struct c_common_resword c_common_reswords[] =
   { "concept", RID_CONCEPT,D_CXX_CONCEPTS_FLAGS | D_CXXWARN },
   { "requires",RID_REQUIRES,   D_CXX_CONCEPTS_FLAGS | D_CXXWARN },
 
+  /* Coroutines-related keywords */
+  { "co_await",RID_CO_AWAIT,   D_CXX_COROUTINES_FLAGS | D_CXXWARN },
+  { "co_yield",RID_CO_YIELD,   D_CXX_COROUTINES_FLAGS | D_CXXWARN },
+  { "co_return",   RID_CO_RETURN,  D_CXX_COROUTINES_FLAGS | D_CXXWARN },
+
   /* These Objective-C keywords are recognized only immediately after
  an '@'.  */
   { "compatibility_alias", RID_AT_ALIAS,   D_OBJC },
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 80a8c9f..6ec0910 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -189,6 +189,9 @@ enum rid
   /* C++ concepts */
   RID_CONCEPT, RID_REQUIRES,
 
+  /* C++ coroutines */
+  RID_CO_AWAIT, RID_CO_YIELD, RID_CO_RETURN,
+
   /* C++ transactional memory.  */
   RID_ATOMIC_NOEXCEPT, RID_ATOMIC_CANCEL, RID_SYNCHRONIZED,
 
@@ -433,9 +436,11 @@ extern machine_mode c_default_pointer_mode;
 #define D_TRANSMEM 0X0800  /* C++ transactional memory TS.  */
 #define D_CXX_CHAR8_T  0X1000  /* In C++, only with -fchar8_t.  */
 #define D_CXX20 0x2000  /* In C++, C++20 only.  */
+#define D_CXX_COROUTINES 0x4000  /* In C++, only with coroutines.  */
 
 #define D_CXX_CONCEPTS_FLAGS D_CXXONLY | D_CXX_CONCEPTS
 #define D_CXX_CHAR8_T_FLAGS D_CXXONLY | D_CXX_CHAR8_T
+#define D_CXX_COROUTINES_FLAGS (D_CXXONLY | D_CXX_COROUTINES)
 
 /* The reserved keyword table.  */
 extern const struct c_common_resword c_common_reswords[];
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index cf3d437..6299d47 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1000,6 +1000,8 @@ c_cpp_builtins (cpp_reader *pfile)
   else
 cpp_define (pfile, "__cpp_concepts=201507L");
 }
+  if (flag_coroutines)
+   cpp_define (pfile, "__cpp_coroutines=201902L"); /* n4835, C++20 CD */
   if (flag_tm)
/* Use a value smaller than the 201505 specified in
   the TS, since we don't yet support atomic_cancel.  */
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 914a2f0..62bf4f1 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1469,6 +1469,10 @@ fconstexpr-ops-limit=
 C++ ObjC++ Joined RejectNegative Host_Wide_Int Var(constexpr_ops_limit) Init(33554432)
 -fconstexpr-ops-limit= Specify maximum number of constexpr operations during a single constexpr evaluation.
 
+fcoroutines
+C++ LTO Var(flag_coroutines)
+Enable C++ coroutines (experimental).
+
 fdebug-cpp
 C ObjC C++ ObjC++
 Emit debug annotations during preprocessing.
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index adc021b..6fb99d8 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2696,7 +2696,9 @@ struct GTY(()) lang_decl_fn {
   unsigned has_dependent_explicit_spec_p : 1;
   unsigned immediate_fn_p : 1;
   unsigned maybe_deleted : 1;
-  unsigned spare : 10;
+  unsigned coroutine_p : 1;
+
+  unsigned spare : 9;
 
   /* 32-bits padding on 64-bit host.  */
 
@@ -4982,6 +4984,13 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
 #define QUALIFIED_NAME_IS_TEMPLATE(NODE) \
   (TREE_LANG_FLAG_1 (SCOPE_REF_CHECK (NODE)))
 
+/* [coroutines]
+*/
+
+/* True if NODE is a co-routine FUNCTION_DECL.  */
+#define DECL_COROUTINE_P(NODE) \
+  (LANG_DECL_FN_CHECK (DECL_COMMON_CHECK (NODE))->coroutine_p)

[C++ coroutines 0/6] Implement C++ coroutines.

2019-11-17 Thread Iain Sandoe


This patch series is an initial implementation of a coroutine feature,
expected to be standardised in C++20.

Standardisation status (and potential impact on this implementation):
----------------------------------------------------------------------

The facility was accepted into the working draft for C++20 by WG21 in
February 2019.  During two following WG21 meetings, design and national
body comments have been reviewed, with no significant change resulting.

Mature implementations (several years) of this exist in MSVC, clang and
EDG with some experience using the clang one in production - so that the
underlying principles are thought to be sound.

At this stage, the remaining potential for change comes from two areas of
national body comments that were not resolved during the last WG21 meeting:
(a) handling of the situation where aligned allocation is available.
(b) handling of the situation where a user wants coroutines, but does not
want exceptions (e.g. a GPU).

It is not expected that the resolution to either of these will produce any
major change.

The current GCC implementation is against n4835 [1].

ABI
---

The various compiler developers have discussed a minimal ABI to allow one
implementation to call coroutines compiled by another; this amounts to:

1. The layout of a public portion of the coroutine frame.
2. A number of compiler builtins that the standard library might use.

The eventual home for the ABI is not decided yet; I will put a draft onto
the wiki this week.

The ABI currently has no target-specific content (a given psABI might elect
to mandate alignment, but the common ABI does not do this).

There is no need to add any new mangling, since the components of this are
regular functions with manipulation of the coroutine via a type-erased handle.
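
A minimal sketch of what "type-erased handle" means in practice
follows.  The layout shown (a frame that starts with pointers to its
resume and destroy functions) is an assumption made for the example,
since the public portion of the frame is not spelled out in this note;
`FramePrefix`, `ErasedHandle` and `MyFrame` are hypothetical names.

```cpp
#include <cassert>

// Assumed public prefix of a coroutine frame: two function pointers.
struct FramePrefix {
  void (*resume)(FramePrefix *);
  void (*destroy)(FramePrefix *);
};

// A handle that only stores an untyped address, in the spirit of the
// address () / from_address () pair of coroutine_handle.
struct ErasedHandle {
  void *ptr;

  void *address() const { return ptr; }

  static ErasedHandle from_address(void *a) {
    ErasedHandle h;
    h.ptr = a;
    return h;
  }

  // Both operations go through the prefix, so the caller needs no
  // knowledge of the concrete frame type.
  void resume() const {
    assert(ptr != nullptr);
    FramePrefix *f = static_cast<FramePrefix *>(ptr);
    f->resume(f);
  }

  void destroy() const {
    assert(ptr != nullptr);
    FramePrefix *f = static_cast<FramePrefix *>(ptr);
    f->destroy(f);
  }
};

// One concrete frame; the prefix is its first member, so a pointer to
// the frame and a pointer to the prefix share the same address.
struct MyFrame {
  FramePrefix prefix;
  int resumed;
  bool destroyed;
};

void my_resume(FramePrefix *p) { reinterpret_cast<MyFrame *>(p)->resumed++; }
void my_destroy(FramePrefix *p) { reinterpret_cast<MyFrame *>(p)->destroyed = true; }
```

Because the handle only ever calls through ordinary function pointers,
code built by one compiler can drive a frame built by another, provided
both agree on the prefix.  That is also why no new mangling is needed.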

Standard Library impact
-----------------------

The current implementations require addition of only a single header to
the standard library (no change to the runtime).  This header is part of
the patch series.

GCC Implementation outline
--------------------------

The standard's design for coroutines does not decorate the definition of
a coroutine in any way, so that a function is only known to be a coroutine
when one of the keywords (co_await, co_yield, co_return) is encountered.

This means that we cannot special-case such functions from the outset, but
must process them differently when they are finalised - which we do from
"finish_function ()".

At a high level, this design of coroutine produces four pieces from the
original user's function:

  1. A coroutine state frame (taking the logical place of the activation
 record for a regular function).  One item stored in that state is the
 index of the current suspend point.
  2. A "ramp" function
 This is what the user calls to construct the coroutine frame and start
 the coroutine execution.  This will return some object representing the
 coroutine's eventual return value (or means to continue it when it is
 suspended).
  3. A "resume" function.
 This is what gets called when the coroutine is resumed after being suspended.
  4. A "destroy" function.
 This is what gets called when the coroutine state should be destroyed
 and its memory returned.

The standard's coroutines involve cooperation of the user's authored function
with a provided "promise" class, which includes mandatory methods for
handling the state transitions and providing output values.  Most realistic
coroutines will also have one or more 'awaiter' classes that implement the
user's actions for each suspend point.  As we parse (or during template
expansion) the types of the promise and awaiter classes become known, and can
then be verified against the signatures expected by the standard.
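
To show the kind of signatures involved, here is a hand-written sketch
of the protocol that a co_await on an awaiter follows, and of co_yield
expressed as a co_await on the result of the promise's yield_value ().
The member names await_ready, await_suspend, await_resume and
yield_value come from the C++20 draft; everything else (`Handle`,
`SuspendIf`, `DemoPromise`, `begin_await`, `finish_await`,
`yield_step`) is hypothetical and only approximates the expansion the
compiler would produce.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the coroutine handle passed to await_suspend.
struct Handle {
  int id;
};

// A hypothetical awaiter: suspends on request and records who is waiting.
struct SuspendIf {
  bool suspend;
  int result;
  std::vector<int> *waiters;

  bool await_ready() const { return !suspend; }
  void await_suspend(Handle h) { waiters->push_back(h.id); }
  int await_resume() const { return result; }
};

enum class Step { Suspended, Completed };

// First half of 'co_await a': either continue at once, or hand the
// handle to the awaiter and suspend.
template <typename Awaiter>
Step begin_await(Awaiter &a, Handle self) {
  if (a.await_ready())
    return Step::Completed;
  a.await_suspend(self);
  return Step::Suspended;
}

// Second half, run on resumption (or at once if no suspend happened):
// the value of the co_await expression.
template <typename Awaiter>
auto finish_await(Awaiter &a) -> decltype(a.await_resume()) {
  return a.await_resume();
}

// A hypothetical promise whose yield_value () returns an awaiter.
struct DemoPromise {
  int current;
  std::vector<int> *waiters;

  SuspendIf yield_value(int v) {
    current = v;
    return SuspendIf{true, v, waiters};
  }
};

// 'co_yield v' treated as 'co_await promise.yield_value (v)'.
Step yield_step(DemoPromise &p, int v, Handle self) {
  assert(p.waiters != nullptr);
  SuspendIf a = p.yield_value(v);
  return begin_await(a, self);
}
```

A class that lacks one of these members, or declares it with an
unsuitable signature, is the kind of error that can be diagnosed as
soon as the promise and awaiter types are known.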

Once the function is parsed (and templates expanded) we are able to make the
transformation into the four pieces noted above.

The implementation here takes the approach of a series of AST transforms.
The state machine suspend points are encoded in three internal functions
(one of which represents an exit from scope without cleanups).  These three 
IFNs are lowered early in the middle end, such that the majority of GCC's
optimisers can be run on the resulting output.

As a design choice, we have carried out the outlining of the user's function
in the front end, and taken advantage of the existing middle end's abilities
to inline and DCE where that is profitable.

Since the state machine is actually common to both resumer and destroyer
functions, we make only a single function "actor" that contains both the
resume and destroy paths.  The destroy function is represented by a small
stub that sets a value to signal the use of the destroy path and calls the
actor.  The idea is that optimisation of the state machine need only be done
once - and then the resume and destroy paths can be identified allowing the
middle end's inline and DCE machinery to optimise as profitable as noted above.

The middle end components for this implementation are:
 1. Lower the