RE: [PATCH PR95459] aarch64: ICE in aarch64_short_vector_p, at config/aarch64/aarch64.c:16803

2020-06-02 Thread Yangfei (Felix)
Hi Richard,

> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Wednesday, June 3, 2020 1:19 AM
> To: Yangfei (Felix) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR95459] aarch64: ICE in aarch64_short_vector_p, at
> config/aarch64/aarch64.c:16803
> 
> "Yangfei (Felix)"  writes:
> > Hi,
> >
> > Please review this trivial patch fixing an ICE in 
> > aarch64_short_vector_p.
> > Bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95459
> >
> > In aarch64_short_vector_p, we are simply checking whether a type (and
> a mode)
> > is a 64/128-bit short vector or not.  This should not be affected by the
> value
> > of TARGET_SVE.  Simply leave later code to report an error if SVE is
> disabled.
> > Bootstrapped and tested on aarch64-linux-gnu.  OK?
> 
> OK, thanks.  Pushed to master.

Thanks for reviewing and installing the patch : - )

Felix


Re: [PATCH 5/7 v3] vect: Support vector load/store with length in vectorizer

2020-06-02 Thread Kewen.Lin via Gcc-patches
Hi Richard,

Thanks a lot for your great comments!

on 2020/6/2 下午7:50, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>
>> on 2020/5/29 下午4:32, Richard Sandiford wrote:
>>> "Kewen.Lin"  writes:
 on 2020/5/27 下午6:02, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>

snip ...

> 
> It would be easier to review, and perhaps easier to bisect,
> if some of the mechanical changes were split out.  E.g.:
> 
> - Rename can_fully_mask_p to can_use_partial_vectors_p.
> 
> - Rename fully_masked_p to using_partial_vectors_p.
> 
> - Rename things related to rgroup_masks.  I think “rgroup_controls”
>   or “rgroup_guards” might be more descriptive than “rgroup_objs”.
> 
> These should be fairly mechanical changes and can happen ahead of
> the main series.  It'll then be easier to see what's different
> for masks and lengths, separately from the more mechanical stuff.
> 

Good suggestion.  My fault, I should have done it before. 
Will split it into some NFC patches.

> As far as:
> 
> +  union
> +  {
> +/* The type of mask to use, based on the highest nS recorded above.  */
> +tree mask_type;
> +/* Any vector type to use these lengths.  */
> +tree vec_type;
> +  };
> 
> goes, some parts of the code seem to use mask_type for lengths too,
> which I'm a bit nervous about.  I think we should either be consistent
> about which union field we use (always mask_type for masks, always
> vec_type for lengths) or we should just rename mask_type to something
> more generic.  Just "type" might be good enough with a suitable comment.

Will fix it.

> 
>>  {
>>tree compare_type = LOOP_VINFO_MASK_COMPARE_TYPE (loop_vinfo);
>>tree iv_type = LOOP_VINFO_MASK_IV_TYPE (loop_vinfo);
>> -  tree mask_type = rgm->mask_type;
>> -  unsigned int nscalars_per_iter = rgm->max_nscalars_per_iter;
>> -  poly_uint64 nscalars_per_mask = TYPE_VECTOR_SUBPARTS (mask_type);
>> +
>> +  bool vect_for_masking = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
>> +  if (!vect_for_masking)
>> +{
>> +  /* Obtain target supported length type.  */
>> +  scalar_int_mode len_mode = targetm.vectorize.length_mode;
>> +  unsigned int len_prec = GET_MODE_PRECISION (len_mode);
>> +  compare_type = build_nonstandard_integer_type (len_prec, true);
>> +  /* Simply set iv_type as same as compare_type.  */
>> +  iv_type = compare_type;
> 
> This might not be the best time to bring this up :-) but it seems
> odd to be asking the target for the induction variable type here.
> I got the impression that the hook was returning DImode, whereas
> the PowerPC instructions only looked at the low 8 bits of the length.
> If so, forcing a naturally 32-bit IV to DImode would insert extra
> sign/zero extensions, even though the input to the length intrinsics
> would have been happy with the 32-bit IV.
> 

Good point, I'll check it with some cases.  As Segher pointed out, the 8
bits in bits 0-7 (the top, abnormal I admit), these vector with length
instructions are guarded in 64 bits only.  IIUC the extra sign/zero
extensions would exist in pre-header with current setting?  At that time
I thought the iv with less precision than length mode had to be converted
later for length's need, it looks good to use length mode simply.

> I think it would make sense to ask the target for its minimum
> precision P (which would be 8 bits if the above is correct).
> The starting point would then be the maximum of:
> 
> - this P
> - the IV's natural precision
> - the precision needed to hold:
> the maximum number of scalar iterations multiplied by the scale factor
> (to convert scalar counts to bytes)
> 
> If the IV might wrap at that precision without producing all-zero lengths,
> it would be worth doubling the precision to avoid the wrapping issue,
> provided that we don't go beyond BITS_PER_WORD.
> 
Thanks! Will think/test more on this part.

>> +  tree obj_type = rgo->mask_type;
>> +  /* Here, take nscalars_per_iter as nbytes_per_iter for length.  */
>> +  unsigned int nscalars_per_iter = rgo->max_nscalars_per_iter;
> 
> I think whether we count scalars or count bytes is really a separate
> decision that shouldn't be tied directly to using lengths.  Length-based
> loads and stores on other arches might want to count scalars too.
> I'm not saying you should add support for that (it wouldn't be tested),
> but I think we should avoid structuring the code to make it harder to
> add in future.
> 

It makes sense, will update it.

> So I think nscalars_per_iter should always count scalars and anything
> length-based should be separate.  Would it make sense to store the
> length scale factor as a separate field?  I.e. using the terms
> above the rgroup_masks comment, the length IV step is:
> 
>factor * nS * VF == factor * nV * nL
> 

Yeah, factor*nS becomes what we wanted for length-based in bytes, factor
* nL would be the vector size.

> That way, applying the factor becomes separate from lengths vs. masks.
> 

Re: [stage1][PATCH] Make TOPN counter dynamically allocated.

2020-06-02 Thread Martin Liška

On 6/2/20 3:19 PM, Martin Liška wrote:

I'm suggesting to pre-allocate 16 gcov_kvp in the gcov run-time library.
Please take a look at the attached patch. I also added a test-case that
stresses that. I've just finished LTO PGO bootstrap of the GCC.

Ready for master?


There's V2 of the patch:

- guard __atomic_fetch_add in GCOV_SUPPORTS_ATOMIC

Martin
>From 0b8d980f31de2cba3f01ba8f3884f124c343dffe Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 2 Jun 2020 13:31:48 +0200
Subject: [PATCH] libgcov: support overloaded malloc

gcc/ChangeLog:

	* gcov-io.h (GCOV_PREALLOCATED_KVP): New.
	* gcov-tool.c: Add __gcov_kvp_pool
	and __gcov_kvp_pool_index variables.

libgcc/ChangeLog:

	* libgcov-driver.c: Add __gcov_kvp_pool
	and __gcov_kvp_pool_index variables.
	* libgcov.h (allocate_gcov_kvp): New.
	(gcov_topn_add_value): Use it.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-prof/indir-call-prof-malloc.c: New test.
---
 gcc/gcov-io.h |  3 ++
 .../gcc.dg/tree-prof/indir-call-prof-malloc.c | 49 +++
 libgcc/libgcov-driver.c   |  6 +++
 libgcc/libgcov.h  | 40 ++-
 4 files changed, 96 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 940eb7d8561..4dba01c78ce 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -292,6 +292,9 @@ GCOV_COUNTERS
 /* Maximum number of tracked TOP N value profiles.  */
 #define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32
 
+/* Number of pre-allocated gcov_kvp structures.  */
+#define GCOV_PREALLOCATED_KVP 16
+
 /* Convert a counter index to a tag.  */
 #define GCOV_TAG_FOR_COUNTER(COUNT)\
 	(GCOV_TAG_COUNTER_BASE + ((gcov_unsigned_t)(COUNT) << 17))
diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
new file mode 100644
index 000..454e224c95f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
@@ -0,0 +1,49 @@
+/* { dg-options "-O2 -ldl" } */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+
+int global;
+int global2;
+
+void report1 (size_t size)
+{
+  global++;
+}
+
+void report2 (size_t size)
+{
+  global2++;
+}
+
+typedef void (*tp) (size_t);
+static tp fns[] = {report1, report2};
+
+void* malloc(size_t size)
+{
+  static void* (*real_malloc)(size_t) = NULL;
+  if (!real_malloc)
+  real_malloc = dlsym(RTLD_NEXT, "malloc");
+
+  void *p = real_malloc (size);
+  fns[size % 2] (size);
+  // fprintf(stderr, "malloc(%d) = %p\n", size, p);
+  return p;
+}
+
+void *calloc (size_t n, size_t size)
+{
+  void *ptr = malloc (n * size);
+  __builtin_memset (ptr, 0, n * size);
+  return ptr;
+}
+
+void *ptr;
+
+int main()
+{
+  ptr = malloc (16);
+  return 0;
+}
diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index 8348d9f33ec..f2383e9d198 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -576,6 +576,12 @@ struct gcov_root __gcov_root;
 struct gcov_master __gcov_master = 
   {GCOV_VERSION, 0};
 
+/* Pool of pre-allocated gcov_kvp strutures.  */
+struct gcov_kvp __gcov_kvp_pool[GCOV_PREALLOCATED_KVP];
+
+/* Index to first free gcov_kvp in the pool.  */
+unsigned __gcov_kvp_pool_index;
+
 void
 __gcov_exit (void)
 {
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 1456100815d..392b6113101 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -249,6 +249,8 @@ struct indirect_call_tuple
   
 /* Exactly one of these will be active in the process.  */
 extern struct gcov_master __gcov_master;
+extern struct gcov_kvp __gcov_kvp_pool[GCOV_PREALLOCATED_KVP];
+extern unsigned __gcov_kvp_pool_index;
 
 /* Dump a set of gcov objects.  */
 extern void __gcov_dump_one (struct gcov_root *) ATTRIBUTE_HIDDEN;
@@ -419,6 +421,41 @@ gcov_counter_set_if_null (gcov_type *counter, struct gcov_kvp *node,
 }
 }
 
+/* Allocate gcov_kvp from heap.  If we are recursively called, then allocate
+   it from a list of pre-allocated pool.  */
+
+static inline struct gcov_kvp *
+allocate_gcov_kvp (void)
+{
+  struct gcov_kvp *new_node = NULL;
+  static volatile unsigned in_recursion ATTRIBUTE_UNUSED = 0;
+
+#if !defined(IN_GCOV_TOOL) && !defined(L_gcov_merge_topn)
+  if (__builtin_expect (in_recursion, 0))
+{
+  unsigned index;
+#if GCOV_SUPPORTS_ATOMIC
+  index
+	= __atomic_fetch_add (&__gcov_kvp_pool_index, 1, __ATOMIC_RELAXED);
+#else
+  index = __gcov_kvp_pool_index++;
+#endif
+  if (index < GCOV_PREALLOCATED_KVP)
+	new_node = &__gcov_kvp_pool[index];
+  else
+	gcc_unreachable ();
+}
+  else
+#endif
+{
+  in_recursion = 1;
+  new_node = (struct gcov_kvp *)xcalloc (1, sizeof (struct gcov_kvp));
+  in_recursion = 0;
+}
+
+  return new_node;
+}
+
 /* Add key value pair VALUE:COUNT to a top N COUNTERS.  When INCREMENT_TOTAL
is true, add COUNT to total of the TOP counter.  If USE_ATOMIC is true,
do it in atomic 

[PATCH] gcov: Fix build on AIX

2020-06-02 Thread Martin Liška

We must guard used atomic builtins with GCOV_SUPPORTS_ATOMIC.
The patch is tested on AIX and I'm going to push it.

libgcc/ChangeLog:

PR gcov-profile/95480
* libgcov-profiler.c (GCOV_SUPPORTS_ATOMIC): Move to...
* libgcov.h (GCOV_SUPPORTS_ATOMIC): ...here.
(gcov_counter_add): Use GCOV_SUPPORTS_ATOMIC guard.
(gcov_counter_set_if_null): Likewise.
---
 libgcc/libgcov-profiler.c | 11 ---
 libgcc/libgcov.h  | 22 --
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/libgcc/libgcov-profiler.c b/libgcc/libgcov-profiler.c
index 7b171382a07..45ab93c9776 100644
--- a/libgcc/libgcov-profiler.c
+++ b/libgcc/libgcov-profiler.c
@@ -26,17 +26,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #include "libgcov.h"
 #if !defined(inhibit_libc)
 
-/* Detect whether target can support atomic update of profilers.  */

-#if __SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
-#define GCOV_SUPPORTS_ATOMIC 1
-#else
-#if __SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
-#define GCOV_SUPPORTS_ATOMIC 1
-#else
-#define GCOV_SUPPORTS_ATOMIC 0
-#endif
-#endif
-
 #ifdef L_gcov_interval_profiler
 /* If VALUE is in interval , then increases the
corresponding counter in COUNTERS.  If the VALUE is above or below
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 7c037a97878..1456100815d 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -85,6 +85,19 @@ typedef unsigned gcov_type_unsigned __attribute__ ((mode 
(QI)));
 #define GCOV_LOCKED 0
 #endif
 
+#ifndef GCOV_SUPPORTS_ATOMIC

+/* Detect whether target can support atomic update of profilers.  */
+#if __SIZEOF_LONG_LONG__ == 4 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#if __SIZEOF_LONG_LONG__ == 8 && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
+#define GCOV_SUPPORTS_ATOMIC 1
+#else
+#define GCOV_SUPPORTS_ATOMIC 0
+#endif
+#endif
+#endif
+
 /* In libgcov we need these functions to be extern, so prefix them with
__gcov.  In libgcov they must also be hidden so that the instance in
the executable is not also used in a DSO.  */
@@ -377,11 +390,14 @@ gcov_get_counter_target (void)
if USE_ATOMIC is true.  */
 
 static inline void

-gcov_counter_add (gcov_type *counter, gcov_type value, int use_atomic)
+gcov_counter_add (gcov_type *counter, gcov_type value,
+ int use_atomic ATTRIBUTE_UNUSED)
 {
+#if GCOV_SUPPORTS_ATOMIC
   if (use_atomic)
 __atomic_fetch_add (counter, value, __ATOMIC_RELAXED);
   else
+#endif
 *counter += value;
 }
 
@@ -390,11 +406,13 @@ gcov_counter_add (gcov_type *counter, gcov_type value, int use_atomic)
 
 static inline int

 gcov_counter_set_if_null (gcov_type *counter, struct gcov_kvp *node,
- int use_atomic)
+ int use_atomic ATTRIBUTE_UNUSED)
 {
+#if GCOV_SUPPORTS_ATOMIC
   if (use_atomic)
 return !__sync_val_compare_and_swap (counter, NULL, (intptr_t)node);
   else
+#endif
 {
   *counter = (intptr_t)node;
   return 1;
--
2.26.2



Re: [stage1][PATCH] Make TOPN counter dynamically allocated.

2020-06-02 Thread Martin Liška

On 6/3/20 3:07 AM, Gerald Pfeifer wrote:

On Tue, 2 Jun 2020, Martin Liška wrote:

Ready for master?


Before that, my nightly tester on i386-unknown-freebsd11 just ran into
the following:

   /scratch/tmp/gerald/GCC-HEAD/gcc/../libgcc/libgcov.h:396:51: error:
   cannot initialize a parameter of type 'gcov_type' (aka 'long long')
   with an rvalue of type 'nullptr_t'
 return !__sync_val_compare_and_swap (counter, NULL, (intptr_t)node);
   ^~~~
   /usr/include/sys/_null.h:35:14: note: expanded from macro 'NULL'
   #define NULLnullptr
   ^~~


Hello.

Sorry for the breakage. Can you please paste full build output for the 
problematic
.o file?

I bet it's a C file compilation, where we should use:

__sync_val_compare_and_swap (counter, 0, (intptr_t)node);

Can you please test it?



That seems to have come in via 871e5ada6d53d5eb495cc9f323983f347487c1b2

   Author: Martin Liska 
   Date:   Fri Jan 31 13:10:14 2020 +0100

 Make TOPN counter dynamically allocated.
   
   :

 * libgcov.h (gcov_counter_add): New.
 (gcov_counter_set_if_null): Likewise.
 (gcov_topn_add_value): New.
   
which added the gcov_counter_set_if_null function.


(The data is quite misleading and appears to be when you first
committed this locally?  Does git have a good way to show when
something was pushed?)


I would recommend:

$ git show --format=fuller 871e5ada6d53d5eb495cc9f323983f347487c1b2
...
commit 871e5ada6d53d5eb495cc9f323983f347487c1b2
Author: Martin Liska 
AuthorDate: Fri Jan 31 13:10:14 2020 +0100
Commit: Martin Liska 
CommitDate: Tue Jun 2 12:11:02 2020 +0200

Martin



Gerald





Re: [PATCH] Do not copy NULL string with memcpy.

2020-06-02 Thread Martin Liška

On 6/3/20 5:58 AM, Alexandre Oliva wrote:

Please let me know if you'd prefer me to take this PR over.


Yes, please take a look.

Martin


PING Re: testsuite: clarify scan-dump file globbing behavior

2020-06-02 Thread Frederik Harwath
Frederik Harwath  writes:

ping :-)

> Frederik Harwath  writes:
>
> Hi Rainer, hi Mike,
> ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545803.html
>
> Best regards,
> Frederik
>
>> Hi Thomas,
>>
>> Thomas Schwinge  writes:
>>
>>> I can't formally approve testsuite patches, but did a review anyway:
>>
>> Thanks for the review!
>>
>>> On 2020-05-15T12:31:54+0200, Frederik Harwath  
>>> wrote:
>>
 The dump
 scanning procedures are changed to make the test unresolved
 if globbing matches more than one file.
>>>
>>> (The code changes look good, but I have not tested that specific aspect.)
>>
>> We do not have automated tests for the testsuite commands :-), but I
>> have of course tested this manually.
>>
>>> As I said, not an approval, and minor comments (see below), but still:
>>>
>>> Reviewed-by: Thomas Schwinge 
>>>
>>> Do we have to similarly also audit/alter other testsuite infrastructure
>>> files, anything that uses '[glob [...]]'?  (..., and then generalize
>>> 'glob-dump-file' into 'glob-one-file', or similar.)  That can be done
>>> incrementally, as far as I'm concerned.
>>
>> I also think it would make sense to adapt similar test commands as well.
>>
>>> May also make this more useful/explicit:
>>>
>>> This is useful if, for example, if a pass has several static
>>> instances [correct terminology?], and depending on torture testing
>>> command-line flags, a different instance executes and produces a dump
>>> file, and so in the test case you can use a generic [put example
>>> here] to scan the varying dump files names.
>>>
>>> (Or similar.)
>>
>> I have moved the explanation below the description of the individual
>> commands and added an example. See the attached revised patch.
>>
>> Best regards,
>> Frederik
>>
>> From 2a17749d6dbcac690d698323240438722d6119ef Mon Sep 17 00:00:00 2001
>> From: Frederik Harwath 
>> Date: Fri, 15 May 2020 10:35:48 +0200
>> Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior
>>
>> The test commands for scanning optimization dump files
>> perform globbing on the argument that specifies the suffix
>> of the dump files to be scanned.  This behavior is currently
>> undocumented.  Furthermore, the current implementation of
>> "scan-dump" and similar procedures yields an error whenever
>> the file name globbing matches more than one file (due to an
>> attempt to call "open" on multiple files) while a failure to
>> match any file results in an unresolved test.
>>
>> This commit documents the globbing behavior.  The dump
>> scanning procedures are changed to make the test unresolved
>> if globbing matches more than one file.
>>
>> gcc/ChangeLog:
>>
>> 2020-05-19  Frederik Harwath  
>>
>>  * doc/sourcebuild.texi: Describe globbing of the
>>  dump file scanning commands "suffix" argument.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2020-05-19  Frederik Harwath  
>>
>>  * lib/scandump.exp (glob-dump-file): New proc.
>>  (scan-dump): Use glob-dump-file for file name expansion.
>>  (scan-dump-times): Likewise.
>>  (scan-dump-dem): Likewise.
>>  (scan-dump-dem-not): Likewise.
>>
>> Reviewed-by: Thomas Schwinge 
>> ---
>>  gcc/doc/sourcebuild.texi   | 13 
>>  gcc/testsuite/lib/scandump.exp | 54 +++---
>>  2 files changed, 56 insertions(+), 11 deletions(-)
>>
>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>> index 240d6e4b08e..9df4b06d460 100644
>> --- a/gcc/doc/sourcebuild.texi
>> +++ b/gcc/doc/sourcebuild.texi
>> @@ -2911,6 +2911,19 @@ Passes if @var{regex} does not match demangled text 
>> in the dump file with
>>  suffix @var{suffix}.
>>  @end table
>>
>> +The @var{suffix} argument which describes the dump file to be scanned
>> +may contain a glob pattern that must expand to exactly one file
>> +name. This is useful if, e.g., different pass instances are executed
>> +depending on torture testing command-line flags, producing dump files
>> +whose names differ only in their pass instance number suffix.  For
>> +example, to scan instances 1, 2, 3 of a tree pass ``mypass'' for
>> +occurrences of the string ``code has been optimized'', use:
>> +@smallexample
>> +/* @{ dg-options "-fdump-tree-mypass" @} */
>> +/* @{ dg-final @{ scan-tree-dump "code has been optimized" "mypass\[1-3\]" 
>> @} @} */
>> +@end smallexample
>> +
>> +
>>  @subsubsection Check for output files
>>
>>  @table @code
>> diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp
>> index d6ba350acc8..f3a991b590a 100644
>> --- a/gcc/testsuite/lib/scandump.exp
>> +++ b/gcc/testsuite/lib/scandump.exp
>> @@ -39,6 +39,34 @@ proc dump-base { args } {
>>  return $dumpbase
>>  }
>>
>> +# Expand dump file name pattern to exactly one file.
>> +# Return a single dump file name or an empty string
>> +# if the pattern matches no file or more than one file.
>> +#
>> +# Argument 0 is the testcase name
>> +# Argument 1 is the dump file glob pattern
>> +proc glob-du

[PATCH V2] PING^ correct COUNT and PROB for unrolled loop

2020-06-02 Thread Jiufu Guo via Gcc-patches
Jiufu Guo via Gcc-patches  writes:

Hi,

I would like to reping this, hope to get approval for this patch.
https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html

BR,
Jiufu Guo

> Jiufu Guo  writes:
>
> Hi,
>
> I'd like to ping this patch for trunk on stage 1.
>
> This patch could fix the issue on incorrect COUNT/FREQUENCES of loop
> unrolled blocks, and also could help the improve the cold/hot issue of
> the unrolled loops.
>
> patch is also at
> https://gcc.gnu.org/legacy-ml/gcc-patches/2020-02/msg00927.html
>
> Thanks,
> Jiufu
>
>> Jiufu Guo  writes:
>>
>> Hi!
>>
>> I'd like to ping following patch. As near end of gcc10 stage 4, it seems
>> I would ask approval for GCC11 trunk.
>>
>> Thanks,
>> Jiufu Guo
>>
>>> Hi Honza and all,
>>>
>>> I updated the patch a little as below. Bootstrap and regtest are ok
>>> on powerpc64le.
>>>
>>> Is OK for trunk?
>>>
>>> Thanks for comments.
>>> Jiufu
>>>
>>> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
>>> index 727e951..ded0046 100644
>>> --- a/gcc/cfgloopmanip.c
>>> +++ b/gcc/cfgloopmanip.c
>>> @@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "gimplify-me.h"
>>>  #include "tree-ssa-loop-manip.h"
>>>  #include "dumpfile.h"
>>> +#include "cfgrtl.h"
>>>  
>>>  static void copy_loops_to (class loop **, int,
>>>class loop *);
>>> @@ -1258,14 +1259,30 @@ duplicate_loop_to_header_edge (class loop *loop, 
>>> edge e,
>>>   /* If original loop is executed COUNT_IN times, the unrolled
>>>  loop will account SCALE_MAIN_DEN times.  */
>>>   scale_main = count_in.probability_in (scale_main_den);
>>> +
>>> + /* If we are guessing at the number of iterations and count_in
>>> +becomes unrealistically small, reset probability.  */
>>> + if (!(count_in.reliable_p () || loop->any_estimate))
>>> +   {
>>> + profile_count new_count_in = count_in.apply_probability 
>>> (scale_main);
>>> + profile_count preheader_count = loop_preheader_edge (loop)->count 
>>> ();
>>> + if (new_count_in.apply_scale (1, 10) < preheader_count)
>>> +   scale_main = profile_probability::likely ();
>>> +   }
>>> +
>>>   scale_act = scale_main * prob_pass_main;
>>> }
>>>else
>>> {
>>> + profile_count new_loop_count;
>>>   profile_count preheader_count = e->count ();
>>> - for (i = 0; i < ndupl; i++)
>>> -   scale_main = scale_main * scale_step[i];
>>>   scale_act = preheader_count.probability_in (count_in);
>>> + /* Compute final preheader count after peeling NDUPL copies.  */
>>> + for (i = 0; i < ndupl; i++)
>>> +   preheader_count = preheader_count.apply_probability (scale_step[i]);
>>> + /* Subtract out exit(s) from peeled copies.  */
>>> + new_loop_count = count_in - (e->count () - preheader_count);
>>> + scale_main = new_loop_count.probability_in (count_in);
>>> }
>>>  }
>>>  
>>> @@ -1381,6 +1398,38 @@ duplicate_loop_to_header_edge (class loop *loop, 
>>> edge e,
>>>   scale_bbs_frequencies (new_bbs, n, scale_act);
>>>   scale_act = scale_act * scale_step[j];
>>> }
>>> +
>>> +  /* Need to update PROB of exit edge and corresponding COUNT.  */
>>> +  if (orig && is_latch && (!bitmap_bit_p (wont_exit, j + 1))
>>> + && bbs_to_scale)
>>> +   {
>>> + edge new_exit = new_spec_edges[SE_ORIG];
>>> + profile_count new_count_in = new_exit->src->count;
>>> + profile_count preheader_count = loop_preheader_edge (loop)->count ();
>>> + edge e;
>>> + edge_iterator ei;
>>> +
>>> + FOR_EACH_EDGE (e, ei, new_exit->src->succs)
>>> +   if (e != new_exit)
>>> + break;
>>> +
>>> + gcc_assert (e && e != new_exit);
>>> +
>>> + new_exit->probability = preheader_count.probability_in (new_count_in);
>>> + e->probability = new_exit->probability.invert ();
>>> +
>>> + profile_count new_latch_count
>>> +   = new_exit->src->count.apply_probability (e->probability);
>>> + profile_count old_latch_count = e->dest->count;
>>> +
>>> + EXECUTE_IF_SET_IN_BITMAP (bbs_to_scale, 0, i, bi)
>>> +   scale_bbs_frequencies_profile_count (new_bbs + i, 1,
>>> +new_latch_count,
>>> +old_latch_count);
>>> +
>>> + if (current_ir_type () != IR_GIMPLE)
>>> +   update_br_prob_note (e->src);
>>> +   }
>>>  }
>>>free (new_bbs);
>>>free (orig_loops);
>>> diff --git a/gcc/testsuite/gcc.dg/pr68212.c b/gcc/testsuite/gcc.dg/pr68212.c
>>> new file mode 100644
>>> index 000..f3b7c22
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/pr68212.c
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fno-tree-vectorize -funroll-loops --param 
>>> max-unroll-times=4 -fdump-rtl-alignments" } */
>>> +
>>> +void foo(long int *a, long int *b, long int n)
>>> +{
>>> +  long int i;
>>> +
>>> +  for (i = 0; i < n; i++)
>>> +a[i] = *b;
>>> +}

Re: [PATCH] Do not copy NULL string with memcpy.

2020-06-02 Thread Alexandre Oliva
Hello, Martin,

On Jun  2, 2020, Martin Liška  wrote:

> The problem happens when we generate temp file
> for .res file. Tested locally with the problematic
> options.

Thanks for looking into this.

> Ready for master?

Erhm, no, I don't think that's correct.

With local analysis, the length computation just before has only
allocated space for basename_length.  If you copy outbase_length, you
might be writing past the end of the allocated area.


I guess basename_length is 0, otherwise you'd see a crash without the
assert you added in the PR.

With some global analysis (and running the testcase), it appears to me
that when input_basename is NULL (e.g., when processing linker specs),
so is outbase, so your proposed change appears to be trading one 0-byte
memcpy from NULL for another.

I'd rather make it:

  if (!outbase_length)
{
  if (basename_length)
memcpy (tmp + dumpdir_length, input_basename, basename_length);
}
  else
memcpy (tmp + dumpdir_length, outbase, outbase_length);

or maybe:

  if (outbase_length)
memcpy (tmp + dumpdir_length, outbase, outbase_length);
  else if (basename_length)
memcpy (tmp + dumpdir_length, input_basename, basename_length);


Please let me know if you'd prefer me to take this PR over.

-- 
Alexandre Oliva, freedom fighterhe/himhttps://FSFLA.org/blogs/lxo/
Free Software Evangelist  Stallman was right, but he's left :(
GNU Toolchain Engineer   Live long and free, and prosper ethically


Re: [PATCH 0/4] IVOPTs consider step cost for different forms when unrolling

2020-06-02 Thread Kewen.Lin via Gcc-patches
Hi Richi,

on 2020/6/2 下午7:38, Richard Biener wrote:
> On Thu, 28 May 2020, Kewen.Lin wrote:
> 
>> Hi,
>>
>> This is one repost and you can refer to the original series 
>> via https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538360.html.
>>
>> As we discussed in the thread
>> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
>> Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html,
>> I'm working to teach IVOPTs to consider D-form group access during unrolling.
>> The difference on D-form and other forms during unrolling is we can put the
>> stride into displacement field to avoid additional step increment. eg:
>>
>> With X-form (uf step increment):
>>   ...
>>   LD A = baseA, X
>>   LD B = baseB, X
>>   ST C = baseC, X
>>   X = X + stride
>>   LD A = baseA, X
>>   LD B = baseB, X
>>   ST C = baseC, X
>>   X = X + stride
>>   LD A = baseA, X
>>   LD B = baseB, X
>>   ST C = baseC, X
>>   X = X + stride
>>   ...
>>
>> With D-form (one step increment for each base):
>>   ...
>>   LD A = baseA, OFF
>>   LD B = baseB, OFF
>>   ST C = baseC, OFF
>>   LD A = baseA, OFF+stride
>>   LD B = baseB, OFF+stride
>>   ST C = baseC, OFF+stride
>>   LD A = baseA, OFF+2*stride
>>   LD B = baseB, OFF+2*stride
>>   ST C = baseC, OFF+2*stride
>>   ...
>>   baseA += stride * uf
>>   baseB += stride * uf
>>   baseC += stride * uf
>>
>> Imagining that if the loop get unrolled by 8 times, then 3 step updates with
>> D-form vs. 8 step updates with X-form. Here we only need to check stride
>> meet D-form field requirement, since if OFF doesn't meet, we can construct
>> baseA' with baseA + OFF.
> 
> I'd just mention there are other targets that have the choice between
> the above forms.  Since IVOPTs itself does not perform the unrolling
> the IL it produces is the same, correct?
> 
Yes.  Before this patch, IVOPTs doesn't consider the unrolling impacts,
it only models things based on what it sees.  We can assume it thinks
later RTL unrolling won't perform.

With this patch, since the IV choice probably changes, the IL can probably
change.  The typical difference with this patch is:

  vect__1.7_15 = MEM[symbol: x, index: ivtmp.19_22, offset: 0B];
vs.
  vect__1.7_15 = MEM[base: _29, offset: 0B];

BR,
Kewen

> Richard.
> 


Re: [PATCH 6/7] PowerPC tests: Add PC-relative tests.

2020-06-02 Thread Michael Meissner via Gcc-patches
On Mon, Jun 01, 2020 at 05:45:34PM -0500, will schmidt wrote:
> Similar/same comment as was made in Apr.I recommend something like 
> 
> "Test whether pc-relative prefixed instructions
> are generated for the _Decimal64 type." 

Ok, I missed that comment in April.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling

2020-06-02 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2020/6/2 下午3:14, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>
>> Thanks for the comments!
>>
>> on 2020/6/2 上午1:59, Richard Sandiford wrote:
>>> Could you go into more detail about this choice of cost calculation?
>>> It looks like we first calculate per-group flags, which are true only if
>>> the unrolled offsets are valid for all uses in the group.  Then we create
>>> per-candidate flags when associating candidates with groups.
>>>
>>
>> Sure.  It checks every address type IV group to determine whether this
>> group is valid to use reg offset addressing mode.  Here we only need to
>> check the first one and the last one, since the intermediates should 
>> have been handled by split_address_groups.  With unrolling the
>> displacement of the address can be offset-ed by (UF-1)*step, check the
>> address with this max offset whether still valid.  If the check finds
>> it's valid to use reg offset mode for the whole group, we flag this
>> group.  Later, when we create IV candidate for address group flagged,
>> we flag the candidate further.  This flag is mainly for iv cand
>> costing, we don't need to scale up iv cand's step cost for this kind
>> of candidate.
> 
> But AIUI, this is calculating whether the uses in their original form
> support all unrolled offsets.  For ivopts, I think the question is really
> whether the uses support all unrolled offsets when based on a given IV
> candidate (which might not be the original IV).
> 

Good point!  Indeed, the patch only flags the IV cands derived from the
address group flagged with reg_offset_p, it has the possibility that
we miss some other candidates with same basic but different offset
which can satisfy addr_offset_valid_p.

How about to update the current approach to: instead of flag the derived
iv cand, when we determine the cost for iv cand, we check whether this
iv cand has the same basic object as the one of any reg_offset_p vuse[0]
(both should be stripped their offsets), then further check the offset
can satisfy addr_offset_valid_p, if all checks expected, update the step
cost without uf consideration.

I would expect this kind of address based iv cand will mainly be used
for address type group and compare type group, for the address type 
group, it can be only applied for those with same basic object, in most
cases they are put in the same address group.  If it's used for generic
much, the step cost tweaking might not be fixed at the beginning but
varies according to the iv set members.  It looks too much for the
existing framework.

> E.g. there might be another IV candidate at a constant offset
> from the original one, and the offsets might all be in range
> for that offset too.
> 
>> Imagining this loop is being unrolled, all the statements will be
>> duplicated by UF.  For the cost modeling against iv group, it's
>> scaling up the cost by UF (here I simply excluded the compare_type
>> since in most cases it for loop ending check).  For the cost modeling
>> against iv candidate, it's to focus on step costs, for an iv candidate
>> we flagged before, it's taken as one time step cost, for the others,
>> it's scaling up the step cost since the unrolling make step 
>> calculation become UF times.
>>
>> This cost modeling is trying to simulate cost change after the
>> unrolling, scaling up the costs accordingly.  There are somethings
>> to be improved like distinguish the loop ending compare or else,
>> whether need to tweak the other costs somehow since the scaling up
>> probably cause existing cost framework imbalance, but during
>> benchmarking I didn't find these matter, so take it as simple as 
>> possible for now.
>>
>>
>>> Instead, couldn't we take this into account in get_address_cost,
>>> which calculates the cost of an address use for a given candidate?
>>> E.g. after the main if-else at the start of the function,
>>> perhaps it would make sense to add the worst-case offset to
>>> the address in “parts”, check whether that too is a valid address,
>>> and if not, increase var_cost by the cost of one add instruction.
>>>
>>
>> IIUC, what you suggest is to tweak the iv group cost, if we find
>> one address group is valid for reg offset mode, we price more on
>> the pairs between this group and other non address-based iv cands.
>> The question is how do we decide this add-on cost.  For the test
>> case I was working on initially, adding one cost (of add) doesn't
>> work, the normal iv still wined.  We can price it more like two
>> but what's the justification on this value, by heuristics?
> 
> Yeah, I was thinking of adding one instance of add_cost.  If that
> doesn't work, it'd be interesting to know why in more detail.
> 

The case is like:

for (i = 0; i < SIZE; i++)
  y[i] = a * x[i] + z[i];

It has three array access in the loop body, after vectorization,
it looks like

  vect__1.7_15 = MEM  [(double *)vectp_x.5_20];
  vect__3.8_13 = vect__1.7_15 * vect_cst__14;
  vect__4.11_19 = ME

[PATCH] Fix zero-masking for vcvtps2ph when dest operand is memory.

2020-06-02 Thread Hongtao Liu via Gcc-patches
Hi:
  When dest is memory, zero-masking is not valid, only merging-masking
is available,

  Bootstrap is ok, regression test on i386/x86-64 backend is ok.

gcc/ChangeLog:
* gcc/config/i386/sse.md (*vcvtps2ph_store):
Refine from *vcvtps2ph_store.
(vcvtps2ph256): Refine constraint from vm to v.
(avx512f_vcvtps2ph512): Ditto.
(*vcvtps2ph256): New define_insn.
(*avx512f_vcvtps2ph512): Ditto.
* gcc/config/i386/subst.md (merge_mask): New define_subst.
(merge_mask_name): New define_subst_attr.
(merge_mask_operand3): Ditto.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-vcvtps2ph-pr95254.c: New test.
* gcc.target/i386/avx512vl-vcvtps2ph-pr95254.c: Ditto.
-- 
BR,
Hongtao


0001-Fix-zero-masking-for-vcvtps2ph-when-dest-operand-is-.patch
Description: Binary data


Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-06-02 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Tue, Jun 2, 2020 at 4:10 AM Jiufu Guo  wrote:
>>
>> Jiufu Guo  writes:
>>
>> Hi,
>>
>> I updated the patch just a little accordinlgy.  Thanks!
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 4464049fc1f..570e2aa53c8 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -2856,6 +2856,10 @@ funroll-all-loops
>>  Common Report Var(flag_unroll_all_loops) Optimization
>>  Perform loop unrolling for all loops.
>>
>> +funroll-completely-grow-size
>> +Common Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
>> +; Internal undocumented flag, allow size growth during complete unrolling
>> +
>>  ; Nonzero means that loop optimizer may assume that the induction variables
>>  ; that control loops do not overflow and that the loops with nontrivial
>>  ; exit condition are not infinite
>> diff --git a/gcc/toplev.c b/gcc/toplev.c
>> index 96316fbd23b..8d52358efdd 100644
>> --- a/gcc/toplev.c
>> +++ b/gcc/toplev.c
>> @@ -1474,6 +1474,10 @@ process_options (void)
>>if (flag_unroll_all_loops)
>>  flag_unroll_loops = 1;
>>
>> +  /* Allow cunroll to grow size accordingly.  */
>> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
>> +flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
>> +
>
> (*)
>
>>/* web and rename-registers help when run after loop unrolling.  */
>>if (flag_web == AUTODETECT_VALUE)
>>  flag_web = flag_unroll_loops;
>> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
>> index 8ab6ab3330c..298ab215530 100644
>> --- a/gcc/tree-ssa-loop-ivcanon.c
>> +++ b/gcc/tree-ssa-loop-ivcanon.c
>> @@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
>>   re-peeling the same loop multiple times.  */
>>if (flag_peel_loops)
>>  peeled_loops = BITMAP_ALLOC (NULL);
>> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
>> -  || flag_peel_loops
>> -  || optimize >= 3, true);
>> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size,
>
> With -O3 -fno-peel-loops this does not have the same effect anymore.
> So above (*) you'd need to check || optimize >= 3 as well.

Oh, yes.  Thanks for your kindly review!

BR,
Jiufu

>
>> +  true);
>>if (peeled_loops)
>>  {
>>BITMAP_FREE (peeled_loops);
>>
>> BR,
>> Jiufu
>>
>> > Richard Biener  writes:
>> >
>>
>> >>> >> From: Jiufu Guo 
>> >>> >>
>> >>> >> Currently GIMPLE complete unroller(cunroll) is checking
>> >>> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
>> >>> >> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
>> >>> >> To have more freedom to control cunroll and RTL unroller, this patch
>> >>> >> introduces flag_cunroll_grow_size.  With this patch, we can control
>> >>> >> cunroll and RTL unroller indepently.
>> >>> >>
>> >>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
>> >>> >> GCC10 after week?
>> >>> >>
>> >>> >>
>> >>> >> +funroll-completely-grow-size
>> >>> >> +Var(flag_cunroll_grow_size) Init(2)
>> >>> >> +; Control cunroll to allow size growth during complete unrolling
>> >>> >> +
>> >>> >
>> >>
>> >> It won't work without adjusting the awk scripts.  So go with
>> >>
>> >> funroll-completely-grow-size
>> >> Undocumented Optimization Var(flag_cunroll_grow_size)
>> >> EnabledBy(funroll-loops || fpeel-loops)
>> >> ; ...
>> >>
>> > EnabledBy(funroll-loops || fpeel-loops) does not works as we expected:
>> > "-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.
>> >
>> > Through "EnabledBy", a flag can be turned, and also can be turned off by
>> > the "EnabledBy option", only if the flag is not specifed through commond
>> > line.
>> >
>> >> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
>> >> an option not supposed to be set by users?
>> >>
>> >
>> > global_options_set.x_flagxxx can be used to check if option is set by
>> > user.  But it does not work well here neither, because we also care of
>> > if the flag is override by OPTION_OPTIMIZATION_TABLE or
>> > OPTION_OVERRIDE.
>> >
>> > AUTODETECT_VALUE(value is 2) is used for some flags like flag_web,
>> > flag_rename_registers, flag_var_tracking, flag_tree_cselim...
>> > And this way could be used to check if the flag is effective(on/off)
>> > either explicit set by command line or implicit set through
>> > OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
>> > So, I use it here.
>>
>>


Re: [stage1][PATCH] Make TOPN counter dynamically allocated.

2020-06-02 Thread Gerald Pfeifer
On Tue, 2 Jun 2020, Martin Liška wrote:
> Ready for master?

Before that, my nightly tester on i386-unknown-freebsd11 just ran into 
the following:

  /scratch/tmp/gerald/GCC-HEAD/gcc/../libgcc/libgcov.h:396:51: error: 
  cannot initialize a parameter of type 'gcov_type' (aka 'long long') 
  with an rvalue of type 'nullptr_t'
return !__sync_val_compare_and_swap (counter, NULL, (intptr_t)node);
  ^~~~
  /usr/include/sys/_null.h:35:14: note: expanded from macro 'NULL'
  #define NULLnullptr
  ^~~

That seems to have come in via 871e5ada6d53d5eb495cc9f323983f347487c1b2

  Author: Martin Liska 
  Date:   Fri Jan 31 13:10:14 2020 +0100

Make TOPN counter dynamically allocated.
  
  :
* libgcov.h (gcov_counter_add): New.
(gcov_counter_set_if_null): Likewise.
(gcov_topn_add_value): New.
  
which added the gcov_counter_set_if_null function.

(The data is quite misleading and appears to be when you first 
committed this locally?  Does git have a good way to show when 
something was pushed?)

Gerald


[PATCH] avoid false positives due to compute_objsize (PR 95353)

2020-06-02 Thread Martin Sebor via Gcc-patches

The compute_objsize() function started out as a thin wrapper around
compute_builtin_object_size(), but over time developed its own
features to compensate for the other function's limitations (such
as its inability to work with ranges).  The interaction of these
features and the limitations has started to become increasingly
problematic as the former function is used in more contexts.

A complete "fix" for all the problems (as well as some other
limitations) that I'm working on will be more extensive and won't
be appropriate for backports.  Until then, the attached patch
cleans up the extensions compute_objsize() has accumulated over
the years to avoid a class of false positives.

To make the warnings issued based on the results of the function
easier to understand and fix, the patch also adds an informative
message to many instances of -Wstringop-overflow to point to
the object to which the warning refers.  This is especially
helpful when the object is referenced by a series of pointer
operations.

Tested by boostrapping on x86_64-linux and building Binutils/GDB,
Glibc, and the Linux kernel with no new warnings.

Besides applying it to trunk I'm looking to backport the fix to
GCC 10.

Martin
PR middle-end/95353 - spurious -Wstringop-overflow writing to a trailing array plus offset
PR middle-end/92939 - missing -Wstringop-overflow on negative index from the end of array

gcc/ChangeLog:

	PR middle-end/95353
	PR middle-end/92939
	* builtins.c (inform_access): New function.
	(check_access): Call it.  Add argument.
	(addr_decl_size): Remove.
	(get_range): New function.
	(compute_objsize): New overload.  Only use compute_builtin_object_size
	with raw memory function.
	(check_memop_access): Pass new argument to compute_objsize and
	check_access.
	(expand_builtin_memchr, expand_builtin_strcat): Same.
	(expand_builtin_strcpy, expand_builtin_stpcpy_1): Same.
	(expand_builtin_stpncpy, check_strncat_sizes): Same.
	(expand_builtin_strncat, expand_builtin_strncpy): Same.
	(expand_builtin_memcmp): Same.
	* builtins.h (check_nul_terminated_array): Declare extern.
	(check_access): Add argument.
	(struct access_ref, struct access_data): New structs.
	* gimple-ssa-warn-restrict.c (clamp_offset): New helper.
	(builtin_access::overlap): Call it.
	* tree-object-size.c ((decl_init_size): Declare extern.
	(addr_object_size): Correct offset computation.
	* tree-object-size.h (decl_init_size): Declare.
	* tree-ssa-strlen.c (handle_integral_assign): Remove a call
	to maybe_warn_overflow when assigning to an SSA_NAME.

gcc/testsuite/ChangeLog:

	PR middle-end/95353
	PR middle-end/92939
	* c-c++-common/Wstringop-truncation.c: Remove an xfail.
	* gcc.dg/Warray-bounds-46.c: Remove a bogus warning.
	* gcc.dg/Wrestrict-9.c: Disable -Wstringop-overflow.
	* gcc.dg/Wstringop-overflow-12.c: Remove xfails.
	* gcc.dg/Wstringop-overflow-28.c: Same.
	* gcc.dg/builtin-stringop-chk-4.c: Same.
	* gcc.dg/builtin-stringop-chk-5.c: Same.
	* gcc.dg/builtin-stringop-chk-8.c: Same.
	* gcc.dg/strlenopt-74.c: Avoid buffer overflow.
	* gcc.dg/Wstringop-overflow-33.c: New test.
	* gcc.dg/Wstringop-overflow-34.c: New test.
	* gcc.dg/Wstringop-overflow-35.c: New test.
	* gcc.dg/Wstringop-overflow-36.c: New test.
	* gcc.dg/Wstringop-overflow-37.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 53bae599d3e..ae2ed974e57 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3310,6 +3310,130 @@ determine_block_size (tree len, rtx len_rtx,
 			  GET_MODE_MASK (GET_MODE (len_rtx)));
 }
 
+/* Issue an inform message describing the target of an access REF.
+   WRITE is set for a write access and clear for a read access.  */
+
+static void
+inform_access (const access_ref &ref, bool write)
+{
+  if (!ref.ref)
+return;
+
+  /* Convert offset range and avoid including a zero range since it isn't
+ necessarily meaningful.  */
+  long long minoff = 0, maxoff = 0;
+  if (wi::fits_shwi_p (ref.offrng[0])
+  && wi::fits_shwi_p (ref.offrng[1]))
+{
+  minoff = ref.offrng[0].to_shwi ();
+  maxoff = ref.offrng[1].to_shwi ();
+}
+
+  /* Convert size range and always include it since all sizes are
+ meaningful. */
+  unsigned long long minsize = 0, maxsize = 0;
+  if (wi::fits_shwi_p (ref.sizrng[0])
+  && wi::fits_shwi_p (ref.sizrng[1]))
+{
+  minsize = ref.sizrng[0].to_shwi ();
+  maxsize = ref.sizrng[1].to_shwi ();
+}
+
+  char sizestr[80];
+  location_t loc;
+  tree allocfn = NULL_TREE;
+  if (TREE_CODE (ref.ref) == SSA_NAME)
+{
+  gimple *stmt = SSA_NAME_DEF_STMT (ref.ref);
+  gcc_assert (is_gimple_call (stmt));
+  loc = gimple_location (stmt);
+  allocfn = gimple_call_fndecl (stmt);
+  if (!allocfn)
+	/* Handle calls through pointers to functions.  */
+	allocfn = gimple_call_fn (stmt);
+
+  /* SIZRNG doesn't necessarily have the same range as the allocation
+	 size determined by gimple_call_alloc_size ().  */
+
+  if (minsize == maxsize)
+	sprintf (sizestr, "%llu", minsize);
+ 

Re: [PATCH] rs6000: Use REAL_TYPE to copy when block move array in structure[PR65421]

2020-06-02 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 04:56:49PM -0400, David Edelsohn wrote:
> > > +  if (TREE_CODE (type) == RECORD_TYPE
> > > +  && rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type, 
> > > NULL,
> > > + NULL))
> > > +{
> > > +  tree field_type = TREE_TYPE (first_field (type));
> > > +  if (field_type && TREE_CODE (field_type) == ARRAY_TYPE
> > > +   && TREE_CODE (TREE_TYPE (field_type)) == REAL_TYPE)
> > > + elt_mode = TYPE_MODE (TREE_TYPE (field_type));
> > > +}
> >
> > Homogeneous aggregates only exist in the ELFv2 ABI, while the problem
> > here is the SP float things.  You also noticed (elsewhere) that if the
> > struct contains (say) SI, SF, SI, SF, then this does not help.
> >
> > Is there some better condition this could use, and maybe an expansion
> > that works in more cases as well?
> >
> > And, it would be lovely if generic code could expand to something better
> > already (not expand to a block move at all, certainly not for something
> > as tiny as this).
> 
> And please don't refer to homogeneous aggregates outside of ELFv2 ABI
> code because that will miss an optimization or generate incorrect code
> other PowerPC OSes and ABIs, such as AIX.

Yes, rs6000_discover_homogeneous_aggregate always returns false if some
other ABI is in use, which means for this particular code that the
problem isn't solved for those other ABIs at all.


Segher


[committed] jit: fix __builtin_unreachable [PR 95426]

2020-06-02 Thread David Malcolm via Gcc-patches
PR jit/95426 reports a crash deep inside "expand" when using
__builtin_unreachable via gcc_jit_context_get_builtin_function,
due to BLOCK_FOR_INSN being erroneously used on a barrier within
rtl_verify_bb_pointers.

The root cause turns out to be that I didn't implement
LANG_HOOKS_COMMON_ATTRIBUTE_TABLE and LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE
for the jit "frontend".  When building a decl for the builtin, the
libgccjit frontend generates a chain of attributes names, but when
this is passed to decl_attributes and the attributes are looked up by
namespace and name within lookup_scoped_attribute_spec, attributes_table
is empty.  Hence no attributes were being associated with the fndecl,
and so ECF_NORETURN was not set on the gimple_call (along with various
other flags missing on the decl, etc), and so the call is treated as
not terminating its BB, and so the CFG rapidly diverges from the
equivalent created by the C frontend.

This patch fixes things by implementing these langhooks, copying the
minimal attribute-handling code from LTO.  I stepped through the
creation of the fndecl and verified that with this fix it has the same
attributes as the equivalent created by the C frontend.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to master as r11-839-g44564c4c811f4751daf363ca019a9f9bed702f4f.

gcc/jit/ChangeLog:
PR jit/95426
* dummy-frontend.c: Include "options.h", "stringpool.h", and
"attribs.h".
(ATTR_EXCL): New, copied from lto/lto-lang.c.
(attr_noreturn_exclusions): Likewise.
(attr_returns_twice_exclusions): Likewise.
(attr_const_pure_exclusions): Likewise.
(jit_attribute_table): Likewise, copied from lto_attribute_table.
(jit_format_attribute_table): Likewise, copied from
lto_format_attribute_table.
(handle_noreturn_attribute): New, copied from lto/lto-lang.c.
(handle_leaf_attribute): Likewise.
(handle_const_attribute): Likewise.
(handle_malloc_attribute): Likewise.
(handle_pure_attribute): Likewise.
(handle_novops_attribute): Likewise.
(get_nonnull_operand): Likewise.
(handle_nonnull_attribute): Likewise.
(handle_nothrow_attribute): Likewise.
(handle_sentinel_attribute): Likewise.
(handle_type_generic_attribute): Likewise.
(handle_transaction_pure_attribute): Likewise.
(handle_returns_twice_attribute): Likewise.
(handle_patchable_function_entry_attribute): Likewise.
(ignore_attribute): Likewise.
(handle_format_attribute): Likewise.
(handle_format_arg_attribute): Likewise.
(handle_fnspec_attribute): Likewise.
(LANG_HOOKS_COMMON_ATTRIBUTE_TABLE): Define.
(LANG_HOOKS_FORMAT_ATTRIBUTE_TABLE): Define.

gcc/testsuite/ChangeLog:
PR jit/95426
* jit.dg/all-non-failing-tests.h: Add note about...
* jit.dg/test-builtin-unreachable.c: New test.
---
 gcc/jit/dummy-frontend.c  | 460 +-
 gcc/testsuite/jit.dg/all-non-failing-tests.h  |   3 +
 .../jit.dg/test-builtin-unreachable.c |  49 ++
 3 files changed, 511 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/jit.dg/test-builtin-unreachable.c

diff --git a/gcc/jit/dummy-frontend.c b/gcc/jit/dummy-frontend.c
index 6c7b7992a4d..aa64a6eb7a0 100644
--- a/gcc/jit/dummy-frontend.c
+++ b/gcc/jit/dummy-frontend.c
@@ -26,10 +26,462 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "langhooks-def.h"
 #include "diagnostic.h"
-
+#include "options.h"
+#include "stringpool.h"
+#include "attribs.h"
 
 #include 
 
+/* Attribute handling.  */
+
+static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
+static tree handle_leaf_attribute (tree *, tree, tree, int, bool *);
+static tree handle_const_attribute (tree *, tree, tree, int, bool *);
+static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
+static tree handle_pure_attribute (tree *, tree, tree, int, bool *);
+static tree handle_novops_attribute (tree *, tree, tree, int, bool *);
+static tree handle_nonnull_attribute (tree *, tree, tree, int, bool *);
+static tree handle_nothrow_attribute (tree *, tree, tree, int, bool *);
+static tree handle_sentinel_attribute (tree *, tree, tree, int, bool *);
+static tree handle_type_generic_attribute (tree *, tree, tree, int, bool *);
+static tree handle_transaction_pure_attribute (tree *, tree, tree, int, bool 
*);
+static tree handle_returns_twice_attribute (tree *, tree, tree, int, bool *);
+static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
+  int, bool *);
+static tree ignore_attribute (tree *, tree, tree, int, bool *);
+
+static tree handle_format_attribute (tree *, tree, tree, int, bool *);
+static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
+static tree handle_format_arg_attribute (tree *, tr

Re: [PATCH] Add pattern for pointer-diff on addresses with same base/offset (PR 94234)

2020-06-02 Thread Marc Glisse

On Mon, 1 Jun 2020, Feng Xue OS via Gcc-patches wrote:


* match.pd ((PTR + A) - (PTR + B)) -> (ptrdiff_t)(A - B): New
simplification.


Not new, modified.


* ((PTR_A + O) - (PTR_B + O)) -> (PTR_A - PTR_B): New simplification.


O might not be the best choice because of how close it looks to 0.


-   (simplify
-(pointer_diff (pointer_plus @@0 @1) (pointer_plus @0 @2))
-/* The second argument of pointer_plus must be interpreted as signed, and
-   thus sign-extended if necessary.  */
-(with { tree stype = signed_type_for (TREE_TYPE (@1)); }
- /* Use view_convert instead of convert here, as POINTER_PLUS_EXPR
-   second arg is unsigned even when we need to consider it as signed,
-   we don't want to diagnose overflow here.  */
- (minus (convert (view_convert:stype @1))
-   (convert (view_convert:stype @2)))
+  (simplify
+   (pointer_diff (pointer_plus@3 @0 @1) (pointer_plus @0 @2))
+(if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3)))
+  (convert (minus @1 @2


What don't you like about the existing transformation? You are replacing a 
transformation that always folds by one that folds only in some cases, and 
looses the information that some overflows cannot happen. That looks like 
it is making things worse from an optimization point of view. Do you 
consider the transformation as unsafe with -fsanitize=pointer-overflow 
(does that correspond to the case where TYPE_OVERFLOW_UNDEFINED is true 
for a pointer type?)?


Ah, looking at the PR, you decided to perform the operation as unsigned 
because that has fewer NOP conversions, which, in that particular testcase 
where the offsets are originally unsigned, means we simplify better. But I 
would expect it to regress other testcases (in particular if the offsets 
were originally signed). Also, changing the second argument of 
pointer_plus to be signed, as is supposed to eventually happen, would 
break your testcase again.


At the very least we want to keep a comment next to the transformation 
explaining the situation.


If there are platforms where the second argument of pointer_plus is a 
smaller type than the result of pointer_diff (can this happen? I keep 
forgetting all the weird things some platforms do), this version may do an 
unsafe zero-extension.


I think I would rather try to extend the transformation for A*C-B*C.


+  (simplify
+   (pointer_diff (pointer_plus@3 @0 @2) (pointer_plus @1 @2))
+(if (TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@3))
+&& !integer_zerop (@2))
+ (pointer_diff @0 @1)


Why would it be a problem if @2 is 0? We should have already simplified 
(pointer_plus @i integer_zerop) to just @i before reaching this, but it 
doesn't make the transformation wrong.


Same remark as above for TYPE_OVERFLOW_UNDEFINED, is that for the sake of 
-fsanitize=pointer-overflow?


--
Marc Glisse


[PATCH] c++: more constrained nested partial specialization

2020-06-02 Thread Patrick Palka via Gcc-patches
When checking that a constrained partial specialization is more
constrained than the primary template, we pass only the innermost level
of generic template arguments to strictly_subsumes.  This leads to us
doing a nonsensical substitution from normalize_concept_check if the
full set of template arguments has multiple levels, and it causes
strictly_subsumes to sometimes erroneously return false as in the
testcase below.

Passes 'make check-c++' and also tested by building the testsuites of
cmcstl2 and range-v3.  Does this look OK to commit to mainline and to
the 10 branch after a full bootstrap and regtest?

(We shouldn't need to do any substitution from strictly_subsumes here,
since processing_template_decl would always be non-zero and so
substituting in the complete set of generic template arguments should
always be a no-op I think.  I can propose this as a subsequent patch for
mainline.)

gcc/cp/ChangeLog:

* pt.c (process_partial_specialization): Pass the full set of
generic template arguments to strictly_subsumes.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-partial-spec8.C: New test.
---
 gcc/cp/pt.c|  2 +-
 .../g++.dg/cpp2a/concepts-partial-spec8.C  | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec8.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index df92f5584cf..d7f61289621 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -5062,7 +5062,7 @@ process_partial_specialization (tree decl)
   if (comp_template_args (inner_args, INNERMOST_TEMPLATE_ARGS (main_args))
   && (!flag_concepts
  || !strictly_subsumes (current_template_constraints (),
-inner_args, maintmpl)))
+main_args, maintmpl)))
 {
   if (!flag_concepts)
 error ("partial specialization %q+D does not specialize "
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec8.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec8.C
new file mode 100644
index 000..873cf44e407
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec8.C
@@ -0,0 +1,14 @@
+// { dg-do compile { target c++20 } }
+
+template
+concept equal = M == N;
+
+template
+struct traits
+{
+  template requires equal
+struct foo {};
+
+  template requires equal && (M >= 0) // { dg-bogus "not more 
constrained" }
+struct foo {};
+};
-- 
2.27.0.rc1.5.gae92ac8ae3



[PATCH] rs6000: identify lfs prefixed case PR95347

2020-06-02 Thread Aaron Sawdey via Gcc-patches
The same problem also arises for plfs where prefixed_load_p()
doesn't recognize it so we get just lfs in the asm output
with a @pcrel address.

OK for trunk if regstrap on ppc64le passes?

Thanks,
   Aaron


PR target/95347
* config/rs6000/rs6000.c (is_stfs_insn): Rename to
is_lfs_stfs_insn and make it recognize lfs as well.
(prefixed_store_p): Use is_lfs_stfs_insn().
(prefixed_load_p): Use is_lfs_stfs_insn() to recognize lfs.
---
 gcc/config/rs6000/rs6000.c | 41 +++---
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ba9069ecc3b..8ed8ae04e7a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -24980,14 +24980,18 @@ address_to_insn_form (rtx addr,
   return INSN_FORM_BAD;
 }
 
-/* Helper function to see if we're potentially looking at stfs.
+/* Helper function to see if we're potentially looking at lfs/stfs.
- PARALLEL containing a SET and a CLOBBER
-   - SET is from UNSPEC_SI_FROM_SF to MEM:SI
-   - CLOBBER is a V4SF
+   - stfs:
+- SET is from UNSPEC_SI_FROM_SF to MEM:SI
+- CLOBBER is a V4SF
+   - lfs:
+- SET is from UNSPEC_SF_FROM_SI to REG:SF
+- CLOBBER is a DI
  */
 
 static bool
-is_stfs_insn (rtx_insn *insn)
+is_lfs_stfs_insn (rtx_insn *insn)
 {
   rtx pattern = PATTERN (insn);
   if (GET_CODE (pattern) != PARALLEL)
@@ -25013,16 +25017,22 @@ is_stfs_insn (rtx_insn *insn)
   rtx src = SET_SRC (set);
   rtx scratch = SET_DEST (clobber);
 
-  if (GET_CODE (src) != UNSPEC || XINT (src, 1) != UNSPEC_SI_FROM_SF)
+  if (GET_CODE (src) != UNSPEC)
 return false;
 
-  if (GET_CODE (dest) != MEM || GET_MODE (dest) != SImode)
-return false;
+  /* stfs case.  */
+  if (XINT (src, 1) == UNSPEC_SI_FROM_SF
+  && GET_CODE (dest) == MEM && GET_MODE (dest) == SImode
+  && GET_CODE (scratch) == SCRATCH && GET_MODE (scratch) == V4SFmode)
+return true;
 
-  if (GET_CODE (scratch) != SCRATCH || GET_MODE (scratch) != V4SFmode)
-return false;
+  /* lfs case.  */
+  if (XINT (src, 1) == UNSPEC_SF_FROM_SI
+  && GET_CODE (dest) == REG && GET_MODE (dest) == SFmode
+  && GET_CODE (scratch) == SCRATCH && GET_MODE (scratch) == DImode)
+return true;
 
-  return true;
+  return false;
 }
 
 /* Helper function to take a REG and a MODE and turn it into the non-prefixed
@@ -25135,7 +25145,14 @@ prefixed_load_p (rtx_insn *insn)
   else
 non_prefixed = reg_to_non_prefixed (reg, mem_mode);
 
-  return address_is_prefixed (XEXP (mem, 0), mem_mode, non_prefixed);
+  fprintf(stderr,"prefixed_load_p regmode %s memmode %s non_prefixed %d\n",
+ GET_MODE_NAME(reg_mode), GET_MODE_NAME(mem_mode), non_prefixed);
+  debug_rtx(insn);
+
+  if (non_prefixed == NON_PREFIXED_X && is_lfs_stfs_insn (insn))
+return address_is_prefixed (XEXP (mem, 0), mem_mode, NON_PREFIXED_DEFAULT);
+  else
+return address_is_prefixed (XEXP (mem, 0), mem_mode, non_prefixed);
 }
 
 /* Whether a store instruction is a prefixed instruction.  This is called from
@@ -25170,7 +25187,7 @@ prefixed_store_p (rtx_insn *insn)
   /* Need to make sure we aren't looking at a stfs which doesn't look
  like the other things reg_to_non_prefixed/address_is_prefixed
  looks for.  */
-  if (non_prefixed == NON_PREFIXED_X && is_stfs_insn (insn))
+  if (non_prefixed == NON_PREFIXED_X && is_lfs_stfs_insn (insn))
 return address_is_prefixed (addr, mem_mode, NON_PREFIXED_DEFAULT);
   else
 return address_is_prefixed (addr, mem_mode, non_prefixed);
-- 
2.17.1



Re: [PATCH] rs6000: Use REAL_TYPE to copy when block move array in structure[PR65421]

2020-06-02 Thread David Edelsohn via Gcc-patches
On Tue, Jun 2, 2020 at 4:32 PM Segher Boessenkool
 wrote:
>
> Hi Xiong Hu,
>
> On Tue, Jun 02, 2020 at 04:41:50AM -0500, Xionghu Luo wrote:
> > Double array in structure as function arguments or return value is accessed
> > by BLKmode, they are stored to stack and load from stack with redundant
> > conversion from DF->DI->DF.  This patch checks the homogeneous type and
> > use the actual element type to do block move to by pass the conversions.
>
> > @@ -2733,6 +2734,7 @@ expand_block_move (rtx operands[], bool might_overlap)
> >rtx loads[MAX_MOVE_REG];
> >rtx stores[MAX_MOVE_REG];
> >int num_reg = 0;
> > +  machine_mode elt_mode = DImode;
> >
> >/* If this is not a fixed size move, just call memcpy */
> >if (! constp)
> > @@ -2750,6 +2752,17 @@ expand_block_move (rtx operands[], bool 
> > might_overlap)
> >if (bytes > rs6000_block_move_inline_limit)
> >  return 0;
> >
> > +  tree type = TREE_TYPE (MEM_EXPR (orig_dest));
>
> Declare elt_mode here as well?
>
> > +  if (TREE_CODE (type) == RECORD_TYPE
> > +  && rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type, 
> > NULL,
> > + NULL))
> > +{
> > +  tree field_type = TREE_TYPE (first_field (type));
> > +  if (field_type && TREE_CODE (field_type) == ARRAY_TYPE
> > +   && TREE_CODE (TREE_TYPE (field_type)) == REAL_TYPE)
> > + elt_mode = TYPE_MODE (TREE_TYPE (field_type));
> > +}
>
> Homogeneous aggregates only exist in the ELFv2 ABI, while the problem
> here is the SP float things.  You also noticed (elsewhere) that if the
> struct contains (say) SI, SF, SI, SF, then this does not help.
>
> Is there some better condition this could use, and maybe an expansion
> that works in more cases as well?
>
> And, it would be lovely if generic code could expand to something better
> already (not expand to a block move at all, certainly not for something
> as tiny as this).

And please don't refer to homogeneous aggregates outside of ELFv2 ABI
code because that will miss an optimization or generate incorrect code
other PowerPC OSes and ABIs, such as AIX.

Thanks, David


Re: [PATCH] rs6000: Use REAL_TYPE to copy when block move array in structure[PR65421]

2020-06-02 Thread Segher Boessenkool
Hi Xiong Hu,

On Tue, Jun 02, 2020 at 04:41:50AM -0500, Xionghu Luo wrote:
> Double array in structure as function arguments or return value is accessed
> by BLKmode, they are stored to stack and load from stack with redundant
> conversion from DF->DI->DF.  This patch checks the homogeneous type and
> use the actual element type to do block move to by pass the conversions.

> @@ -2733,6 +2734,7 @@ expand_block_move (rtx operands[], bool might_overlap)
>rtx loads[MAX_MOVE_REG];
>rtx stores[MAX_MOVE_REG];
>int num_reg = 0;
> +  machine_mode elt_mode = DImode;
>  
>/* If this is not a fixed size move, just call memcpy */
>if (! constp)
> @@ -2750,6 +2752,17 @@ expand_block_move (rtx operands[], bool might_overlap)
>if (bytes > rs6000_block_move_inline_limit)
>  return 0;
>  
> +  tree type = TREE_TYPE (MEM_EXPR (orig_dest));

Declare elt_mode here as well?

> +  if (TREE_CODE (type) == RECORD_TYPE
> +  && rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type, NULL,
> + NULL))
> +{
> +  tree field_type = TREE_TYPE (first_field (type));
> +  if (field_type && TREE_CODE (field_type) == ARRAY_TYPE
> +   && TREE_CODE (TREE_TYPE (field_type)) == REAL_TYPE)
> + elt_mode = TYPE_MODE (TREE_TYPE (field_type));
> +}

Homogeneous aggregates only exist in the ELFv2 ABI, while the problem
here is the SP float things.  You also noticed (elsewhere) that if the
struct contains (say) SI, SF, SI, SF, then this does not help.

Is there some better condition this could use, and maybe an expansion
that works in more cases as well?

And, it would be lovely if generic code could expand to something better
already (not expand to a block move at all, certainly not for something
as tiny as this).


Segher


Re: [PATCH] Extend std::copy/std::copy_n char* overload to deque iterator

2020-06-02 Thread Jonathan Wakely via Gcc-patches

On 24/05/20 15:43 +0200, François Dumont via Libstdc++ wrote:

Now tested in C++98 mode, there was indeed a small problem.

I even wonder if I shouldn't have extend the std::copy overload to any 
call with deque iterator as the output so that it is transform into an 
output to pointer.


Ok to commit ?





--- a/libstdc++-v3/include/debug/safe_iterator.tcc
+++ b/libstdc++-v3/include/debug/safe_iterator.tcc
@@ -234,6 +234,12 @@ namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION

+  template
+_Ite
+__niter_base(const ::__gnu_debug::_Safe_iterator<_Ite, _Seq,
+std::random_access_iterator_tag>& __it)
+{ return __it.base(); }
+


I was going to ask if there's a reason this uses "_Ite" and not
"_Iter", but I see we already have several uses of that.

Thy all seem to be introduced by you though :-)

We definitely have a lot more "_Iter" uses though:

$ git grep -w _Ite -- include/ | wc -l
46
$ git grep -w _Iter -- include/ | wc -l
835

When I see "Ite" it looks to me as though it should rhyme with "sight"
or "white", rather than "Iter" which rhymes with "bitter" (and so is
the start of "iterator").

Leave it as _Ite for now, I might go through and change every _Ite to
_Iter later.


OK for master, thanks.




[PATCH] c++: constrained nested partial specialization [PR92103]

2020-06-02 Thread Patrick Palka via Gcc-patches
When determining the most specialized partial specialization of a
primary template that is nested inside a class template, we first
tsubst the outer template arguments into the TEMPLATE_DECL of each
partial specialization, and then check for satisfaction of the new
TEMPLATE_DECL's constraints.

But tsubst_template_decl does not currently guarantee that constraints
from the original DECL_TEMPLATE_RESULT get reattached to the new
DECL_TEMPLATE_RESULT.  In the testcase below, this leads to the
constraints_satisfied_p check in most_specialized_partial_spec to
trivially return true for each of the partial specializations.

I'm not sure if such a guarantee would be desirable, but in any case we
can just check constraints_satisfied_p on the original TEMPLATE_DECL
instead of on the tsubsted TEMPLATE_DECL here, which is what this patch
does (alongside some reorganizing).

Passes 'make check-c++' and also tested by building the testsuites of
cmcstl2 and range-v3.  Does this look OK to commit to mainline and to
the 10 branch after a full bootstrap and regtest?

gcc/cp/ChangeLog:

PR c++/92103
* pt.c (most_specialized_partial_spec): Reorganize the loop over
DECL_TEMPLATE_SPECIALIZATIONS.  Check constraints_satisfied_p on
the original template declaration, not on the substituted one.

gcc/testsuite/ChangeLog:

PR c++/92103
* g++.dg/cpp2a/concepts-partial-spec7.C: New test.
---
 gcc/cp/pt.c   | 19 
 .../g++.dg/cpp2a/concepts-partial-spec7.C | 22 +++
 2 files changed, 32 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 907ca879c73..df92f5584cf 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -24482,21 +24482,22 @@ most_specialized_partial_spec (tree target, 
tsubst_flags_t complain)
 
   for (t = DECL_TEMPLATE_SPECIALIZATIONS (main_tmpl); t; t = TREE_CHAIN (t))
 {
-  tree spec_args;
-  tree spec_tmpl = TREE_VALUE (t);
+  const tree ospec_tmpl = TREE_VALUE (t);
 
+  tree spec_tmpl;
   if (outer_args)
{
  /* Substitute in the template args from the enclosing class.  */
  ++processing_template_decl;
- spec_tmpl = tsubst (spec_tmpl, outer_args, tf_none, NULL_TREE);
+ spec_tmpl = tsubst (ospec_tmpl, outer_args, tf_none, NULL_TREE);
  --processing_template_decl;
+ if (spec_tmpl == error_mark_node)
+   return error_mark_node;
}
+  else
+   spec_tmpl = ospec_tmpl;
 
-  if (spec_tmpl == error_mark_node)
-   return error_mark_node;
-
-  spec_args = get_partial_spec_bindings (tmpl, spec_tmpl, args);
+  tree spec_args = get_partial_spec_bindings (tmpl, spec_tmpl, args);
   if (spec_args)
{
  if (outer_args)
@@ -24505,9 +24506,9 @@ most_specialized_partial_spec (tree target, 
tsubst_flags_t complain)
   /* Keep the candidate only if the constraints are satisfied,
  or if we're not compiling with concepts.  */
   if (!flag_concepts
-  || constraints_satisfied_p (spec_tmpl, spec_args))
+ || constraints_satisfied_p (ospec_tmpl, spec_args))
 {
-  list = tree_cons (spec_args, TREE_VALUE (t), list);
+ list = tree_cons (spec_args, ospec_tmpl, list);
   TREE_TYPE (list) = TREE_TYPE (t);
 }
}
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C
new file mode 100644
index 000..5b3afce3bc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec7.C
@@ -0,0 +1,22 @@
+// PR c++/92103
+// { dg-do compile { target c++20 } }
+
+template
+struct traits
+{
+  template
+struct equal_to
+{ static constexpr bool value = false; };
+
+  template requires (M == N)
+struct equal_to
+{ static constexpr bool value = true; };
+
+  template requires (M < 0) || (N < 0)
+struct equal_to
+{ };
+};
+
+static_assert(traits<0>::equal_to<0>::value);
+static_assert(!traits<0>::equal_to<1>::value);
+static_assert(traits<-1>::equal_to<0>::value); // { dg-error "not a member" }
-- 
2.27.0.rc1.5.gae92ac8ae3



[pushed] c++: *this capture in const member fn [PR95193].

2020-06-02 Thread Jason Merrill via Gcc-patches
Here, the capture proxy for *this is const, but its DECL_VALUE_EXPR is not.
Don't ICE on this; it's a reasonable difference, since in C++ an rvalue of
scalar type does not have cv-qualifiers.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

PR c++/95193
* pt.c (tsubst_decl): Relax assert.

gcc/testsuite/ChangeLog:

PR c++/95193
* g++.dg/cpp1z/lambda-this7.C: New test.
---
 gcc/cp/pt.c   |  3 ++-
 gcc/testsuite/g++.dg/cpp1z/lambda-this7.C | 11 +++
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/lambda-this7.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 907ca879c73..9c03c5a5bbd 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -14633,7 +14633,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 && DECL_BIT_FIELD_TYPE (TREE_OPERAND (ve, 1)) == type)
  type = TREE_TYPE (ve);
else
- gcc_checking_assert (TREE_TYPE (ve) == type);
+ gcc_checking_assert (TYPE_MAIN_VARIANT (TREE_TYPE (ve))
+  == TYPE_MAIN_VARIANT (type));
SET_DECL_VALUE_EXPR (r, ve);
  }
if (CP_DECL_THREAD_LOCAL_P (r)
diff --git a/gcc/testsuite/g++.dg/cpp1z/lambda-this7.C 
b/gcc/testsuite/g++.dg/cpp1z/lambda-this7.C
new file mode 100644
index 000..8137061e161
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/lambda-this7.C
@@ -0,0 +1,11 @@
+// PR c++/95193
+// { dg-do compile { target c++17 } }
+
+struct X {
+  void foo() const {
+auto GL1 = [*this](auto a) {
+};
+
+GL1("abc");
+  }
+};

base-commit: 4013baf99c38f7bca06a51f8301e8fb195ccfa33
-- 
2.18.1



[PATCH] FAT library support for libatomic, libgfortran, libgomp, libstdc++

2020-06-02 Thread David Edelsohn via Gcc-patches
[I'll start by repeating what I wrote about a similar libgcc change to
provide background and context.]

When AIX added 64 bit support, it implemented what Apple MacOS Darwin
calls "FAT" libraries for its equivalent functionality -- both 32 bit
and 64 bit objects (or shared objects) are co-located in the same
archive.  GCC on AIX historically has followed the GCC multilib
directory hierarchy approach with separate directories and archives
for each multilib.

We now are working to support GCC on AIX in 64 bit mode.  To retain
the directory hierarchy, it is beneficial to shift (or at least
initially augment) the GCC multilib mechanism with AIX-style "FAT"
libraries.

It is beneficial for the "FAT" libraries to be created consistently
for GCC in both 32 bit mode and 64 bit mode.  When all of the
libraries are converted, the multilib rules will look for 32 bit and
64 bit multilibs in the top-level library.  All target multilibs need
to be enabled at the same time, but the build can start the creation
of the "FAT" libraries without utilizing them.

The goal is to place both 32 bit and 64 bit objects and shared objects
in archives at the top-level, not multilib subdirectories.  The
multilibs are built in subdirectories, but must be combined during the
last parts of the target library build process.  Because of the way
that GCC bootstrap works, the libraries must be combined during the
multiple stages of GCC bootstrap, not solely when installed in the
final destination, so the libraries have to be correct at the end of
each target library build stage, not solely an install recipe.

After trying various options, the best solution seems to be the
inclusion of target-specific Makefile fragments. Directly adding the
rules to the Makefiles conflicts with Automake conditionals syntax.
And Makefile fragments are easily extendible to other targets, such as
Darwin. It avoids regenerating Makefile.in or configure for each
change.

The rhythm of the fragment is the same for each library, but each
library is slightly different: package name versus directory name,
sub-directory, location of version info. I didn't see the benefit of
trying to rationalize all of the GCC target libraries as part of this
effort.

One global issue: I copied empty.mk into each library because Make
needs to include something by default. Alternatively, I can move
empty.mk to the top-level of the source tree and include a common copy
everywhere.

Bootstrapped on powerpc-ibm-aix7.2.0.0 and powerpc64le-unknown-gnu-linux.

Is this okay with each maintainer for the respective target libraries?

Thanks, David

libgomp/
* Makefile.am (tmake_file): Build and install AIX-style FAT libraries.
* Makefile.in: Regenerate
* configure.ac (tmake_file): Substitute.
* configure: Regenerate.
* configure.tgt (powerpc-ibm-aix*): Define tmake_file.
* config/t-aix: New file.
* empty.mk: New file.

libstdc++-v3/
* Makefile.am (tmake_file): Build and install AIX-style FAT libraries.
* Makefile.in: Regenerate.
* configure.ac (tmake_file): Substitute.
* configure: Regenerate.
* configure.host (aix*): Define tmake_file.
* config/os/aix/t-aix: New file.
* empty.mk: New file.

libatomic/
* Makefile.am (tmake_file): Build and install AIX-style FAT libraries.
* Makefile.in: Regenerate.
* configure.ac (tmake_file): Substitute.
* configure: Regenerate.
* configure.tgt (powerpc-ibm-aix*): Define tmake_file.
* config/t-aix: New file.
* empty.mk: New file.

libgfortran/
* Makefile.am (tmake_file): Build and install AIX-style FAT libraries.
* Makefile.in: Regenerate.
* configure.ac (tmake_file): Substitute.
* configure: Regenerate.
* configure.host: Add system configury stanza. Define tmake_file.
* config/t-aix: New file.
* empty.mk: New file.
From cf96afed42fcda9debb1351e86b511316ee4d781 Mon Sep 17 00:00:00 2001
From: David Edelsohn 
Date: Fri, 15 May 2020 17:46:08 -0400
Subject: [PATCH] FAT target library support

libgomp/
* Makefile.am (tmake_file): Build and install AIX-style FAT libraries.
* Makefile.in: Regenerate
* configure.ac (tmake_file): Substitute.
* configure: Regenerate.
* configure.tgt (powerpc-ibm-aix*): Define tmake_file.
* config/t-aix: New file.
* empty.mk: New file.

libstdc++-v3/
* Makefile.am (tmake_file): Build and install AIX-style FAT libraries.
* Makefile.in: Regenerate.
* configure.ac (tmake_file): Substitute.
* configure: Regenerate.
* configure.host (aix*): Define tmake_file.
* config/os/aix/t-aix: New file.
* empty.mk: New file.

libatomic/
* Makefile.am (tmake_file): Build and install AIX-style FAT libraries.
* Makefile.in: Regenerate.
* configure.ac (tmake_file): Substitute.
* configure: Regener

[PATCH] RISC-V: Make __divdi3 handle div by zero same as hardware.

2020-06-02 Thread Jim Wilson
The ISA manual specifies that divide by zero always returns -1 as the result.
We were failing to do that when the dividend was negative.

Tested with cross toolchain builds for riscv32-elf and riscv64-linux.  There
were no regressions.

Committed.

Jim

libgcc/
* config/riscv/div.S (__divdi3): For negative arguments, change bgez
to bgtz.
---
 libgcc/config/riscv/div.S | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/riscv/div.S b/libgcc/config/riscv/div.S
index 151f8e273ac..17234324c1e 100644
--- a/libgcc/config/riscv/div.S
+++ b/libgcc/config/riscv/div.S
@@ -107,10 +107,12 @@ FUNC_END (__umoddi3)
   /* Handle negative arguments to __divdi3.  */
 .L10:
   neg   a0, a0
-  bgez  a1, .L12  /* Compute __udivdi3(-a0, a1), then negate the result.  
*/
+  /* Zero is handled as a negative so that the result will not be inverted.  */
+  bgtz  a1, .L12 /* Compute __udivdi3(-a0, a1), then negate the result.  */
+
   neg   a1, a1
-  j __udivdi3 /* Compute __udivdi3(-a0, -a1).  */
-.L11: /* Compute __udivdi3(a0, -a1), then negate the result.  
*/
+  j __udivdi3/* Compute __udivdi3(-a0, -a1).  */
+.L11:/* Compute __udivdi3(a0, -a1), then negate the result.  */
   neg   a1, a1
 .L12:
   move  t0, ra
-- 
2.17.1



Re: [PATCH] RISC-V: align RISC-V software division with hardware specification in case of division by zero

2020-06-02 Thread Jim Wilson
On Fri, May 29, 2020 at 1:53 AM MOSER Virginie via Gcc-patches
 wrote:
> The assembly code in libgcc/config/riscv/div.S does not handle the division 
> by zero as specified in the riscv-spec v2.2 chapter 6.2 in case of signed 
> division:

This looks OK.  There are some administrative comments to make here.

FSF policy requires copyright assignments.  I don't see one for you or
your employer.  If you are serious about contributing to GNU projects.
you need to get a copyright assignment.  This patch is small enough
that I can accept it, but I may not be able to accept the next one.
If you don't get a copyright assignment, you might consider filing bug
reports instead and let someone with an assignment write the patch.

The patch is mime encoded which is inconvenient.  The patch also got
corrupted somehow.  I had to edit it before I could get it to apply.
Or maybe the patch was for an older gcc version?  git format-patch and
git send-email is a good way to create and send a patch.  Or you can
file a bug report and add the patch as an attachment there.

ChangeLog entries should describe what change was made.  Not why it
was made.  The why should go into the git commit or comments in the
code.  I rewrote the ChangeLog entry.

We prefer lines that are less than 80 characters.  I moved the comment
to a separate line and also fixed some other nearby lines.  Comments
end with a period and two spaces.

I fixed up the patch, tested it, and committed it.  I will send the
patch I committed.

Jim


[FYI] [Ada] remove last traces of -auxbase

2020-06-02 Thread Alexandre Oliva
Remove occurrences of auxbase that remained in comments.

Regstrapped on x86_64-linux-gnu.  Pre-approved by Arno.


for  gcc/ada/ChangeLog

* lib.ads (Compilation_Switches): Remove -auxbase from
comments.
* switch.ads (Is_Internal_GCC_Switch): Likewise.
---
 ada/lib.ads|2 +-
 ada/switch.ads |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git gcc/ada/lib.ads gcc/ada/lib.ads
index 8376d7a..47b6867 100644
--- gcc/ada/lib.ads
+++ gcc/ada/lib.ads
@@ -998,7 +998,7 @@ private
--  The following table records the compilation switches used to compile
--  the main unit. The table includes only switches. It excludes -o
--  switches as well as artifacts of the gcc/gnat1 interface such as
-   --  -quiet, -dumpbase, or -auxbase.
+   --  -quiet, or -dumpbase.
 
--  This table is set as part of the compiler argument scanning in
--  Back_End. It can also be reset in -gnatc mode from the data in an
diff --git gcc/ada/switch.ads gcc/ada/switch.ads
index aa8b283..7fdfb52 100644
--- gcc/ada/switch.ads
+++ gcc/ada/switch.ads
@@ -77,7 +77,7 @@ package Switch is
 
function Is_Internal_GCC_Switch (Switch_Chars : String) return Boolean;
--  Returns True iff Switch_Chars represents an internal GCC switch to be
-   --  followed by a single argument, such as -dumpbase, --param or -auxbase.
+   --  followed by a single argument, such as -dumpbase, or --param.
--  Even though passed by the "gcc" driver, these need not be stored in ALI
--  files and may safely be ignored by non GCC back-ends.
 

-- 
Alexandre Oliva, freedom fighterhe/himhttps://FSFLA.org/blogs/lxo/
Free Software Evangelist  Stallman was right, but he's left :(
GNU Toolchain Engineer   Live long and free, and prosper ethically


Re: [PATCH] Prefer simple case changes in spelling suggestions

2020-06-02 Thread David Malcolm via Gcc-patches
On Sat, 2020-05-30 at 18:51 +, Pip Cet wrote:
> On Sat, May 30, 2020 at 5:06 PM David Malcolm 
> wrote:
> > On Sat, 2020-05-30 at 13:40 +, Pip Cet via Gcc-patches wrote:
> > > I think we should just omit the triangle inequality test from the
> > > self-test, as in the attached patch.
> > 
> > I like the idea,
> 
> Thanks!
> 
> > but can you update the comment so that it explains why
> > it's not a metric? (better to capture that in a comment in the
> > source
> > rather than just in the mailing list archives).
> 
> How's this?

Thanks; looks good to me.  Hopefully this doesn't clash with Tom's
patch.


Dave



Re: [PATCH] Prefer simple case changes in spelling suggestions

2020-06-02 Thread David Malcolm via Gcc-patches
On Mon, 2020-06-01 at 14:11 -0600, Tom Tromey wrote:
> > Did the full DejaGnu testsuite get run?  There are a lot of tests
> > in it
> > that make use of this code.
> 
> I did "make check" and only saw some XFAILs.
> 
> Here's v2 of the patch, which I think addresses your comments.  I did
> not add a new test of get_edit_distance, because as I mentioned
> earlier,
> an existing test already does what you asked for.
> 
> Tom
> 
> commit e897a99dada8d3935343ebf7b14ad7ec36515b3d
> Author: Tom Tromey 
> Date:   Fri May 29 10:46:57 2020 -0600
> 
> Prefer simple case changes in spelling suggestions
> 
> I got this error message when editing gcc and recompiling:
> 
> ../../gcc/gcc/ada/gcc-interface/decl.c:7714:39: error:
> ‘DWARF_GNAT_ENCODINGS_all’ was not declared in this scope; did you
> mean ‘DWARF_GNAT_ENCODINGS_GDB’?
>  7714 | = debug_info && gnat_encodings ==
> DWARF_GNAT_ENCODINGS_all;
>   |   ^~~
> ~
>   |   DWARF_GNAT_ENCODING
> S_GDB
> 
> This suggestion could be improved -- what happened here is that I
> failed to upper-case the word, and DWARF_GNAT_ENCODINGS_ALL was
> the
> correct spelling.
> 
> This patch changes gcc's spell checker to prefer simple case
> changes
> when possible.
> 
> I tested this using the self-tests.  A new self-test is also
> included.
> 
> gcc/ChangeLog:
> 
> * spellcheck.c (CASE_COST): New define.
> (BASE_COST): New define.
> (get_edit_distance): Recognize case changes.
> (get_edit_distance_cutoff): Update.
> (test_edit_distances): Update.
> (get_old_cutoff): Update.
> (test_find_closest_string): Add case sensitivity test.

Thanks; looks good to me.

Dave
 




Re: [PATCH PR95459] aarch64: ICE in aarch64_short_vector_p, at config/aarch64/aarch64.c:16803

2020-06-02 Thread Richard Sandiford
"Yangfei (Felix)"  writes:
> Hi,
>
> Please review this trivial patch fixing an ICE in aarch64_short_vector_p.
> Bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95459 
>
> In aarch64_short_vector_p, we are simply checking whether a type (and a 
> mode)
> is a 64/128-bit short vector or not.  This should not be affected by the 
> value
> of TARGET_SVE.  Simply leave later code to report an error if SVE is 
> disabled.
> Bootstrapped and tested on aarch64-linux-gnu.  OK?

OK, thanks.  Pushed to master.

Richard

>
> gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2020-06-02  Felix Yang  
> +
> +   PR target/95459
> +   * config/aarch64/aarch64.c (aarch64_short_vector_p):
> +   Leave later code to report an error if SVE is disabled.
>
> gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2020-06-02  Felix Yang  
> +
> +   PR target/95459
> +   * gcc.target/aarch64/mgeneral-regs_6.c: New test.
>
> Thanks,
> Felix


[committed] libstdc++: Make debug containers prefer copy ctor to base ctor (PR 90102)

2020-06-02 Thread Jonathan Wakely via Gcc-patches
When given a type which can convert to any container-like type, the
C(const C&) copy constructor and C(const C::_Base&) converting
constructor are ambiguous. This change replaces the converting
constructor's parameter with a reference_wrapper-like type so that
calling that constructor requires an additional user-defined conversion.
This gives it a lower rank than the copy constructor, avoiding the
ambiguity.

While testing this change I discovered that __gnu_debug::forward_list
doesn't have a convering constructor from the std::forward_list base, so
this adds it.

We should probably consider whether the converting constructors should
be 'explicit' but I'm not changing that now.

libstdc++-v3/ChangeLog:

PR libstdc++/90102
* include/debug/deque (deque(const _Base&)): Replace parameter
with a struct that wraps a const _Base&.
* include/debug/forward_list (forward_list(_Base_ref)): New
constructor.
* include/debug/list (list(const _Base&)): Replace parameter
with a struct that wraps a const _Base&.
* include/debug/map.h (map(const _Base&)): Likewise.
* include/debug/multimap.h (multimap(const _Base&)): Likewise.
* include/debug/multiset.h (multiset(const _Base&)): Likewise.
* include/debug/set.h (set(const _Base&)): Likewise.
* include/debug/unordered_map (unordered_map(const _Base&))
(unordered_multimap(const _Base&)): Likewise.
* include/debug/unordered_set (unordered_set(const _Base&))
(unordered_multiset(const _Base&)): Likewise.
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
Adjust dg-error line number.
* include/debug/vector (vector(const _Base&)): Likewise.
* testsuite/23_containers/deque/debug/90102.cc: New test.
* testsuite/23_containers/forward_list/debug/90102.cc: New test.
* testsuite/23_containers/list/debug/90102.cc: New test.
* testsuite/23_containers/map/debug/90102.cc: New test.
* testsuite/23_containers/multimap/debug/90102.cc: New test.
* testsuite/23_containers/multiset/debug/90102.cc: New test.
* testsuite/23_containers/set/debug/90102.cc: New test.
* testsuite/23_containers/unordered_map/debug/90102.cc: New test.
* testsuite/23_containers/unordered_multimap/debug/90102.cc: New test.
* testsuite/23_containers/unordered_multiset/debug/90102.cc: New test.
* testsuite/23_containers/unordered_set/debug/90102.cc: New test.
* testsuite/23_containers/vector/debug/90102.cc: New test.

Tested powerpc64le-linux, committed to master.


commit eca833b81289438ec5ae3ed4c77ffb49cfb65f34
Author: Jonathan Wakely 
Date:   Tue Jun 2 18:13:08 2020 +0100

libstdc++: Make debug containers prefer copy ctor to base ctor (PR 90102)

When given a type which can convert to any container-like type, the
C(const C&) copy constructor and C(const C::_Base&) converting
constructor are ambiguous. This change replaces the converting
constructor's parameter with a reference_wrapper-like type so that
calling that constructor requires an additional user-defined conversion.
This gives it a lower rank than the copy constructor, avoiding the
ambiguity.

While testing this change I discovered that __gnu_debug::forward_list
doesn't have a convering constructor from the std::forward_list base, so
this adds it.

We should probably consider whether the converting constructors should
be 'explicit' but I'm not changing that now.

libstdc++-v3/ChangeLog:

PR libstdc++/90102
* include/debug/deque (deque(const _Base&)): Replace parameter
with a struct that wraps a const _Base&.
* include/debug/forward_list (forward_list(_Base_ref)): New
constructor.
* include/debug/list (list(const _Base&)): Replace parameter
with a struct that wraps a const _Base&.
* include/debug/map.h (map(const _Base&)): Likewise.
* include/debug/multimap.h (multimap(const _Base&)): Likewise.
* include/debug/multiset.h (multiset(const _Base&)): Likewise.
* include/debug/set.h (set(const _Base&)): Likewise.
* include/debug/unordered_map (unordered_map(const _Base&))
(unordered_multimap(const _Base&)): Likewise.
* include/debug/unordered_set (unordered_set(const _Base&))
(unordered_multiset(const _Base&)): Likewise.
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
Adjust dg-error line number.
* include/debug/vector (vector(const _Base&)): Likewise.
* testsuite/23_containers/deque/debug/90102.cc: New test.
* testsuite/23_containers/forward_list/debug/90102.cc: New test.
* testsuite/23_containers/list/debug/90102.cc: New test.
* testsuite/23_containers/map/debug/90102.c

Re: [patch] Make memory copy functions scalar storage order barriers

2020-06-02 Thread Eric Botcazou
[Starting with tne VN issue]

> For the VN case the issue is interpreting a read via memcpy (non
> reverse-storage) as a reverse-storage one when matching up with a hashtable
> entry from a regular reverse-storage order store, correct?

It's a read from a scalar (hence native order) combined with a store to an 
aggregate type with reverse order:

   u = 305419896;
  __builtin_memcpy (&tempb, &u, 4);
  _1 = tempb.val;

> But likewise whether the pointer destination had the attribute doesn't
> correctly qualify this - instead I think we are already properly matching up
> the reverse storage property?

No, we do not do direct matching for the storage order because you may not 
access the same memory location with different scalar storage orders, it's the 
fundamental invariant, so it's redundant with e.g. the offset for matching.

You need to byte swap if you want to propagate 305419896 to _1 above.  But 
it's an easy case: suppose that tempb is made up of 2 scalars with reverse 
storage order, then it's more complicated.  So memcpy where one of the types 
is aggregate with reverse storage order needs to be considered a storage order 
barrier, exactly like VIEW_CONVERT_EXPR:

case VIEW_CONVERT_EXPR:
  temp.off = 0;
  temp.reverse = storage_order_barrier_p (ref);
  break;

/* Return true if T is a storage order barrier, i.e. a VIEW_CONVERT_EXPR
   that can modify the storage order of objects.  Note that, even if the
   TYPE_REVERSE_STORAGE_ORDER flag is set on both the inner type and the
   outer type, a VIEW_CONVERT_EXPR can modify the storage order because
   it can change the partition of the aggregate object into scalars.  */

static inline bool
storage_order_barrier_p (const_tree t)
{
  if (TREE_CODE (t) != VIEW_CONVERT_EXPR)
return false;

  if (AGGREGATE_TYPE_P (TREE_TYPE (t))
  && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (t)))
return true;

  tree op = TREE_OPERAND (t, 0);

  if (AGGREGATE_TYPE_P (TREE_TYPE (op))
  && TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (op)))
return true;

  return false;
}

hence the similar conditions, but I can add some more commentary.

> Hum.  How does the pointer type matter at all for memcpy()?
> Isn't the
> issue just that we may not use the TYPE_REVERSE_STORAGE_ORDER
> qualified type in the following case:
> 
>   gimple *new_stmt;
>   if (is_gimple_reg_type (TREE_TYPE (srcvar)))
> {
>   tree tem = fold_const_aggregate_ref (srcvar);
>   if (tem)
> srcvar = tem;
>   if (! is_gimple_min_invariant (srcvar))
> {
>   new_stmt = gimple_build_assign (NULL_TREE, srcvar);
>   srcvar = create_tmp_reg_or_ssa_name (TREE_TYPE (srcvar),
>new_stmt);
> 
> ?

Same rationale as above: if you want to rewrite the copy in the presence of an 
aggregate type with reverse storage order, you may need to split the operation 
into the appropriate pieces; one large access may be incorrect.

-- 
Eric Botcazoudiff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 4e3de95d2d2..2dca9776b38 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -741,15 +741,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 }
   else
 {
-  tree srctype, desttype;
+  /* We cannot (easily) change the type of the copy if it is a storage
+	 order barrier, i.e. is equivalent to a VIEW_CONVERT_EXPR that can
+	 modify the storage order of objects (see storage_order_barrier_p).  */
+  tree srctype
+	= POINTER_TYPE_P (TREE_TYPE (src))
+	  ? TREE_TYPE (TREE_TYPE (src)) : NULL_TREE;
+  tree desttype
+	= POINTER_TYPE_P (TREE_TYPE (dest))
+	  ? TREE_TYPE (TREE_TYPE (dest)) : NULL_TREE;
   unsigned int src_align, dest_align;
-  tree off0;
-  const char *tmp_str;
   unsigned HOST_WIDE_INT tmp_len;
+  const char *tmp_str;
 
   /* Build accesses at offset zero with a ref-all character type.  */
-  off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
-			 ptr_mode, true), 0);
+  tree off0
+	= build_int_cst (build_pointer_type_for_mode (char_type_node,
+		  ptr_mode, true), 0);
 
   /* If we can perform the copy efficiently with first doing all loads
  and then all stores inline it that way.  Currently efficiently
@@ -767,7 +775,13 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	 hack can be removed.  */
 	  && !c_strlen (src, 1)
 	  && !((tmp_str = c_getstr (src, &tmp_len)) != NULL
-	   && memchr (tmp_str, 0, tmp_len) == NULL))
+	   && memchr (tmp_str, 0, tmp_len) == NULL)
+	  && !(srctype
+	   && AGGREGATE_TYPE_P (srctype)
+	   && TYPE_REVERSE_STORAGE_ORDER (srctype))
+	  && !(desttype
+	   && AGGREGATE_TYPE_P (desttype)
+	   && TYPE_REVERSE_STORAGE_ORDER (desttype)))
 	{
 	  unsigned ilen = tree_to_uhwi (len);
 	  if (pow2p_hwi (ilen))
@@ -947,8 +961,13 @@ gimple_fold_builtin_memory_op (gimple_stmt_it

Re: [PATCH 5/7 v3] vect: Support vector load/store with length in vectorizer

2020-06-02 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 12:50:25PM +0100, Richard Sandiford wrote:
> This might not be the best time to bring this up :-) but it seems
> odd to be asking the target for the induction variable type here.
> I got the impression that the hook was returning DImode, whereas
> the PowerPC instructions only looked at the low 8 bits of the length.

It's the top(!) 8 bits of the register actually (the other 56 bits are
"do not care" bits).

> If so, forcing a naturally 32-bit IV to DImode would insert extra
> sign/zero extensions, even though the input to the length intrinsics
> would have been happy with the 32-bit IV.

It's a shift left always.  All bits beyond the 8 drop out.


Thanks for the great reviews Richard, much appreciated!


Segher


[PATCH][GCC 8] aarch64: Add initial support for -mcpu=zeus

2020-06-02 Thread Kyrylo Tkachov
Hi all,

This patch adds support for the Arm Zeus CPU.
Bootstrapped and tested on aarch64-none-linux-gnu.

Committing to the GCC 8 branch.
Thanks,
Kyrill

gcc/

2020-06-02  Kyrylo Tkachov  

* config/aarch64/aarch64-cores.def (zeus): Define.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document zeus -mcpu option.


aarch64-8-zeus.patch
Description: aarch64-8-zeus.patch


[PATCH][GCC 9] aarch64: Add initial support for -mcpu=zeus

2020-06-02 Thread Kyrylo Tkachov
Hi all,

This patch adds support for the Arm Zeus CPU.
Bootstrapped and tested on aarch64-none-linux-gnu.

Committing to the GCC 9 branch.
Thanks,
Kyrill

2020-06-02  Kyrylo Tkachov  

* config/aarch64/aarch64-cores.def (zeus): Define.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document zeus -mcpu option.


aarch64-9-zeus.patch
Description: aarch64-9-zeus.patch


Re: [PATCH] rs6000: Use REAL_TYPE to copy when block move array in structure[PR65421]

2020-06-02 Thread Segher Boessenkool
On Tue, Jun 02, 2020 at 12:52:31PM +0200, Richard Biener wrote:
> On Tue, Jun 2, 2020 at 11:43 AM Xionghu Luo via Gcc-patches
>  wrote:
> > Double array in structure as function arguments or return value is accessed
> > by BLKmode, they are stored to stack and load from stack with redundant
> > conversion from DF->DI->DF.  This patch checks the homogeneous type and
> > use the actual element type to do block move to by pass the conversions.
> 
> Is it correct to do this when the actual data in the place is DImode?
> We generally
> avoid using any floating point modes here because the DImode data could
> for example correspond to a signalling NaN or a non-canonical NaN.

On PowerPC, load and store insns do not ever trap for FP reasons.  Also,
all possible IEEE binary FP bit patterns have a defined meaning, and on
PowerPC a NaN is never converted to a default NaN.  You can load a DP
(or SP or QP) float and store it again, and get the same bit pattern out
always, and no traps.

It isn't necessarily very fast on all cores though, subnormals can be
very slow on some cores, as the prime (maybe only?) example -- but even
those are not a problem on any existing core AFAIR, not for DP at least.

Things are different if more is done then just moving the data, there
are issues with using SP data in DP insns for example (and OTOH, on
older cores using DP data in SP insns isn't actually supported).

Maybe I am missing something...  I'll look at the patch in detail ;-)


Segher


[PATCH] aarch64: Add initial support for -mcpu=zeus

2020-06-02 Thread Kyrylo Tkachov
Hi all,

This patch adds support for the Arm Zeus CPU.
Bootstrapped and tested on aarch64-none-linux-gnu.

Committing to trunk (and to the GCC 10 branch).
Thanks,
Kyrill

gcc/

2020-06-02  Kyrylo Tkachov  

* config/aarch64/aarch64-cores.def (zeus): Define.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document zeus -mcpu option.


aarch64-zeus.patch
Description: aarch64-zeus.patch


Re: [PATCH] Port libgccjit to Windows.

2020-06-02 Thread JonY via Gcc-patches
On 5/28/20 8:46 PM, David Malcolm via Gcc-patches wrote:
> On Thu, 2020-05-28 at 16:51 -0300, Nicolas Bértolo wrote:
>>> I'm going to have to trust your Windows expertise here; the tempdir
>>> code looks convoluted to me, but perhaps that's the only way to do
>> it.
>>> (Microsoft's docs for "SECURITY_ATTRIBUTES" suggest to me that if
>>> lpSecurityDescriptor is NULL, then the directory gets a default
>>> security descriptor, and that this may mean it's only readable by
>> the
>>> user represented by the access token of the process [1], which
>> might
>>> suggest a simplification - but I'm very hazy on how the security
>> model
>>> in Windows works)
>>
>> I tested this and it gives write access to the "Authenticated Users"
>> group. 
> 
> Aha - sounds like that would be a problem.  Thanks for clarifying.
> 
>> The
>> way I did it gives access only to the user that owns the libgccjit
>> process. I
>> have to admit that it is a lot of code and it is hard to understand
>> unless you
>> know the security model of Windows well. I don't know it well, I
>> wrote this
>> keeping the documentation close and experimenting.
> 
> Thanks.
> 
>>> I was able to successfully bootstrap and regression test with your
>>> patch on x86_64-pc-linux-gnu.  I also verified that the result of
>> "make
>>> install" was not affected for my configuration.
>>
>> Great.
>>
>>> I've pushed your patch to master as
>>> c83027f32d9cca84959c7d6a1e519a0129731501.
>>>
>>> Thanks again for the patch
>>> Dave
>>
>> Thanks to you for all the good feedback.
>>
>> Nico.
> 

Hello,

A bit of a late review, some minor points:

1. Using .so on Windows for DLLs is fine.
2. The DLL name on Windows should use LIBGCCJIT_SONAME rather than
LIBGCCJIT_LINKER_NAME, so applications would load libgccjit.so.0 instead
of libgccjit.so directly. The linker command output needs to be
LIBGCCJIT_SONAME.
3. Ideally I would prefer to .cc too, though I see other C++ files also
written as .c.

Resend in case reply to list did not work,



signature.asc
Description: OpenPGP digital signature


Simplify streaming of tree references

2020-06-02 Thread Jan Hubicka
Hi,
this patch makes tree referneces to be stream by one integer rather then
by pair of tag and index.  This makes function streams about 3% smaller
and reduces number of ulebs we stream (that still shows high in
profiles).

lto-bootstrapped/regtested x86_64-linux, comitted.

Honza

* lto-streamer-in.c (stream_read_tree_ref): Simplify streaming of
references.
* lto-streamer-out.c (stream_write_tree_ref): Likewise.
diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
index 5eaba7d16d4..15bfb614163 100644
--- a/gcc/lto-streamer-in.c
+++ b/gcc/lto-streamer-in.c
@@ -1501,20 +1501,22 @@ lto_input_scc (class lto_input_block *ib, class data_in 
*data_in,
 tree
 stream_read_tree_ref (lto_input_block *ib, data_in *data_in)
 {
-  unsigned ix = streamer_read_uhwi (ib);
-  tree ret;
+  int ix = streamer_read_hwi (ib);
   if (!ix)
 return NULL_TREE;
-  else if (ix < LTO_NUM_TAGS)
-ret = lto_input_tree_ref (ib, data_in, cfun, (LTO_tags)ix);
+  if (ix > 0)
+return streamer_tree_cache_get_tree (data_in->reader_cache, ix - 1);
+
+  ix = -ix - 1;
+  int id = ix & 1;
+  ix /= 2;
+
+  tree ret;
+  if (!id)
+ret = (*data_in->file_data->current_decl_state
+  ->streams[LTO_DECL_STREAM])[ix];
   else
-ret = streamer_tree_cache_get_tree (data_in->reader_cache,
-   ix - LTO_NUM_TAGS);
-  if (ret && streamer_debugging)
-{
-  enum tree_code c = (enum tree_code)streamer_read_uhwi (ib);
-  gcc_assert (c == TREE_CODE (ret));
-}
+ret = (*SSANAMES (cfun))[ix];
   return ret;
 }
 
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index dfc4603d7ae..f71d3f81662 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -400,15 +400,19 @@ stream_write_tree_ref (struct output_block *ob, tree t)
   unsigned int ix;
   bool existed_p = streamer_tree_cache_lookup (ob->writer_cache, t, &ix);
   if (existed_p)
-   streamer_write_uhwi (ob, ix + LTO_NUM_TAGS);
+   streamer_write_hwi (ob, ix + 1);
   else
{
  enum LTO_tags tag;
  unsigned ix;
+ int id = 0;
 
  lto_indexable_tree_ref (ob, t, &tag, &ix);
- streamer_write_uhwi (ob, tag);
- streamer_write_uhwi (ob, ix);
+ if (tag == LTO_ssa_name_ref)
+   id = 1;
+ else
+   gcc_checking_assert (tag == LTO_global_stream_ref);
+ streamer_write_hwi (ob, -(int)(ix * 2 + id + 1));
}
   if (streamer_debugging)
streamer_write_uhwi (ob, TREE_CODE (t));


Re: [stage1][PATCH] Lower VEC_COND_EXPR into internal functions.

2020-06-02 Thread Martin Liška

On 6/2/20 1:09 PM, Richard Biener wrote:

So please be constructive.  Like, provide a testcase that ICEs
with the FAILs replaced by gcc_unreachable ().  Martin, may I suggest
to do this replacement and bootstrap/test?  I think it would be nice
to have testsuite coverage for the FAILs, and maybe we have that
already.


Hello.

There's the suggested patch that survives bootstrap on ppc64le-linux-gnu
and passes test-suite.

Martin
>From 22db04d058c9bbd140041e7aa2caf1613767095a Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 2 Jun 2020 15:29:37 +0200
Subject: [PATCH] rs6000: replace FAIL with gcc_unreachable.

gcc/ChangeLog:

	* config/rs6000/vector.md: Replace FAIL with gcc_unreachable
	in all vcond* patterns.
---
 gcc/config/rs6000/vector.md | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 662521e74fe..796345c80d3 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -354,7 +354,7 @@ (define_expand "vcond"
 operands[3], operands[4], operands[5]))
 DONE;
   else
-FAIL;
+gcc_unreachable ();
 })
 
 (define_expand "vcond"
@@ -371,7 +371,7 @@ (define_expand "vcond"
 operands[3], operands[4], operands[5]))
 DONE;
   else
-FAIL;
+gcc_unreachable ();
 })
 
 (define_expand "vcondv4sfv4si"
@@ -389,7 +389,7 @@ (define_expand "vcondv4sfv4si"
 operands[3], operands[4], operands[5]))
 DONE;
   else
-FAIL;
+gcc_unreachable ();
 })
 
 (define_expand "vcondv4siv4sf"
@@ -407,7 +407,7 @@ (define_expand "vcondv4siv4sf"
 operands[3], operands[4], operands[5]))
 DONE;
   else
-FAIL;
+gcc_unreachable ();
 })
 
 (define_expand "vcondv2dfv2di"
@@ -425,7 +425,7 @@ (define_expand "vcondv2dfv2di"
 operands[3], operands[4], operands[5]))
 DONE;
   else
-FAIL;
+gcc_unreachable ();
 })
 
 (define_expand "vcondv2div2df"
@@ -443,7 +443,7 @@ (define_expand "vcondv2div2df"
 operands[3], operands[4], operands[5]))
 DONE;
   else
-FAIL;
+gcc_unreachable ();
 })
 
 (define_expand "vcondu"
@@ -460,7 +460,7 @@ (define_expand "vcondu"
 operands[3], operands[4], operands[5]))
 DONE;
   else
-FAIL;
+gcc_unreachable ();
 })
 
 (define_expand "vconduv4sfv4si"
@@ -478,7 +478,7 @@ (define_expand "vconduv4sfv4si"
 operands[3], operands[4], operands[5]))
 DONE;
   else
-FAIL;
+gcc_unreachable ();
 })
 
 (define_expand "vconduv2dfv2di"
@@ -496,7 +496,7 @@ (define_expand "vconduv2dfv2di"
 operands[3], operands[4], operands[5]))
 DONE;
   else
-FAIL;
+gcc_unreachable ();
 })
 
 ;; To support vector condition vectorization, define vcond_mask and vec_cmp.
-- 
2.26.2



[COMMITTED] amdgcn: Remove -mlocal-symbol-id option

2020-06-02 Thread Andrew Stubbs
This patch removes the vestiges of the GCN-specific -mlocal-symbol-id 
option. Previously, this was part of a horrible workaround for a bug in 
the Radeon Open Compute ELF loader. The bug has been fixed a while now, 
and the name mangling has not been present in the compiler for a while, 
so the option that controlled it can go away now.


This clean up also fixes a lot of failures in the PCH testsuite that 
have been occurring since Alexandre's dump file overhaul.


Andrew
amdgcn: Remove -mlocal-symbol-id option

This patch removes the obsolete -mlocal-symbol-id option.  This was used to
control mangling of local symbol names in order to work around a ROCm runtime
bug, but that has not been needed in some time, and the mangling was removed
already.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (CC1_SPEC): Delete.
	* config/gcn/gcn.opt (-mlocal-symbol-id): Delete.
	* config/gcn/mkoffload.c (main): Don't use -mlocal-symbol-id.

gcc/testsuite/ChangeLog:

	* gcc.dg/intermod-1.c: Don't use -mlocal-symbol-id.

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index d6523cf1247..2eaf4149f4c 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -79,16 +79,6 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 #define ASM_SPEC  "-triple=amdgcn--amdhsa -mattr=-code-object-v3 "  \
 		  "%:last_arg(%{march=*:-mcpu=%*}) " \
 		  "-filetype=obj"
-/* Add -mlocal-symbol-id= unless the user (or mkoffload)
-   passes the option explicitly on the command line.  The option also causes
-   several dump-matching tests to fail in the testsuite, so the option is not
-   added when or tree dump/compare-debug options used in the testsuite are
-   present.
-   This has the potential for surprise, but a user can still use an explicit
-   -mlocal-symbol-id= option manually together with -fdump-tree or
-   -fcompare-debug options.  */
-#define CC1_SPEC "%{!mlocal-symbol-id=*:%{!fdump-tree-*:"	\
-		 "%{!fdump-ipa-*:%{!fcompare-debug*:-mlocal-symbol-id=%b"
 #define LINK_SPEC "--pie"
 #define LIB_SPEC  "-lc"
 
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 04c73d64630..e1b9942ebed 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -70,9 +70,6 @@ mstack-size=
 Target Report RejectNegative Joined UInteger Var(stack_size_opt) Init(-1)
 -mstack-size=	Set the private segment size per wave-front, in bytes.
 
-mlocal-symbol-id=
-Target RejectNegative Report JoinedOrMissing Var(local_symbol_id) Init(0)
-
 Wopenacc-dims
 Target Var(warn_openacc_dims) Warning
 Warn about invalid OpenACC dimensions.
diff --git a/gcc/config/gcn/mkoffload.c b/gcc/config/gcn/mkoffload.c
index 723da108b65..4a99d70e312 100644
--- a/gcc/config/gcn/mkoffload.c
+++ b/gcc/config/gcn/mkoffload.c
@@ -524,7 +524,7 @@ main (int argc, char **argv)
   FILE *in = stdin;
   FILE *out = stdout;
   FILE *cfile = stdout;
-  const char *outname = 0, *offloadsrc = 0;
+  const char *outname = 0;
 
   progname = "mkoffload";
   diagnostic_initialize (global_dc, 0);
@@ -653,18 +653,11 @@ main (int argc, char **argv)
   if (!strcmp (argv[ix], "-o") && ix + 1 != argc)
 	outname = argv[++ix];
   else
-	{
-	  obstack_ptr_grow (&cc_argv_obstack, argv[ix]);
-
-	  if (argv[ix][0] != '-')
-	offloadsrc = argv[ix];
-	}
+	obstack_ptr_grow (&cc_argv_obstack, argv[ix]);
 }
 
   obstack_ptr_grow (&cc_argv_obstack, "-o");
   obstack_ptr_grow (&cc_argv_obstack, gcn_s1_name);
-  obstack_ptr_grow (&cc_argv_obstack,
-		concat ("-mlocal-symbol-id=", offloadsrc, NULL));
   obstack_ptr_grow (&cc_argv_obstack, NULL);
   const char **cc_argv = XOBFINISH (&cc_argv_obstack, const char **);
 
diff --git a/gcc/testsuite/gcc.dg/intermod-1.c b/gcc/testsuite/gcc.dg/intermod-1.c
index 44a8ce071b5..9f8d19deb6a 100644
--- a/gcc/testsuite/gcc.dg/intermod-1.c
+++ b/gcc/testsuite/gcc.dg/intermod-1.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-mlocal-symbol-id=" { target amdgcn-*-* } } */
 /* { dg-final { scan-assembler-not {foo[1-9]\.[0-9]} } } */
 
 /* Check that we don't get .0 suffixes on static variables when not using


Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Martin Liška

On 6/2/20 4:14 PM, Jonathan Wakely wrote:

On Tue, 2 Jun 2020 at 14:56, Jonathan Wakely  wrote:


On Tue, 2 Jun 2020 at 14:16, Martin Liška  wrote:


On 6/2/20 1:48 PM, Martin Liška wrote:

I tend to this approach. Let me prepare a patch candidate for it.


There's a patch for it. Can you please Jonathan take a look?


Looks great, thanks!

+if name not in wildcard_prefixes:
+msg = 'unsupported wilcard prefix'

Typo "wilcard"

+if pattern not in used_patterns:
+self.errors.append(Error('a file pattern not used in a patch',
+ pattern))

If the script printed this error I don't think I'd know what it was
complaining about. How about "pattern doesn't match any changed files"
?

I find the existing error 'file not changed in a patch' to be
suboptimal too. We're checking revisions (or commits), not patches.


Along those lines, here's an incomplete patch (tests aren't updated
yet, no commit log, your latest change isn't merged ) to revise the
error messages. I've tried to make them more consistent (e.g change
'file not changed in a patch' to 'unchanged file mentioned in a
ChangeLog' which mirrors the error for the opposite condition,
'changed file not mentioned in a ChangeLog').


I welcome this.



I've also moved line numbers to the start of the line (like GCC's own
diagnostics) and not used the "line" argument of the Error constructor
for things that aren't line numbers. I've aimed to be consistent, so
that line numbers come at the start, pathnames and patterns come at
the end (and there's a space before them).


Well, the Error ctor argument 'line' is bit misleading. It's a string and
not an integer:

  File "/home/marxin/Programming/gcc/contrib/gcc-changelog/git_commit.py", line 
177, in __repr__
return '%d: %s' % (self.line, self.message)
TypeError: %d format: a number is required, not str

and it represents a line from a git commit message. I use it in order to provide
line which is affected by an error.



I also suggest changing 'additional author must prepend with tab and 4
spaces' to 'additional author must be indented with one tab and four
spaces'.> 
The intent is to improve the user experience, since for many people

who are new to git *any* error can cause confusion, so extra,
gcc-specific errors that we issue should aim to be easy to understand.


I like your wordings.

Martin



Do you like the direction?





Re: [PATCH] Extend std::copy/std::copy_n char* overload to deque iterator

2020-06-02 Thread François Dumont via Gcc-patches

Hi

Any feedback regarding this patch ?

François


On 26/05/20 1:45 pm, François Dumont wrote:

On 24/05/20 3:43 pm, François Dumont wrote:

Now tested in C++98 mode, there was indeed a small problem.

I even wonder if I shouldn't have extend the std::copy overload to 
any call with deque iterator as the output so that it is transform 
into an output to pointer.



Ignore this remark, I had a look and we already have an overload that 
does that. It is limited to RA iterators and is just fine.





Ok to commit ?

François

On 23/05/20 6:37 pm, Jonathan Wakely wrote:

On 22/05/20 22:57 +0200, François Dumont via Libstdc++ wrote:

On 21/05/20 2:17 pm, Jonathan Wakely wrote:


Why is the optimization not done for C++03 mode?

I did it this way because the new std::copy overload rely on 
std::copy_n implementation details which is a C++11 algo.




It looks like the uses of 'auto' can be reaplced easily, and
__enable_if_t<> can be replaced with 
__gnu_cxx::__enable_if<>::__type.


But yes, we can indeed provide those implementation details in 
pre-C++11.


This is what I've done in this new version.

Tested under Linux x86_64 in default c++ mode.

I tried to use CXXFLAGS=-std=c++03 but it doesn't seem to work even 
if I do see the option in build logs. I remember you adivised a 
different approach, can you tell me again ?


See the documentation:
https://gcc.gnu.org/onlinedocs/libstdc++/manual/test.html#test.run.permutations 












Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Jonathan Wakely via Gcc-patches
On Tue, 2 Jun 2020 at 14:56, Jonathan Wakely  wrote:
>
> On Tue, 2 Jun 2020 at 14:16, Martin Liška  wrote:
> >
> > On 6/2/20 1:48 PM, Martin Liška wrote:
> > > I tend to this approach. Let me prepare a patch candidate for it.
> >
> > There's a patch for it. Can you please Jonathan take a look?
>
> Looks great, thanks!
>
> +if name not in wildcard_prefixes:
> +msg = 'unsupported wilcard prefix'
>
> Typo "wilcard"
>
> +if pattern not in used_patterns:
> +self.errors.append(Error('a file pattern not used in a 
> patch',
> + pattern))
>
> If the script printed this error I don't think I'd know what it was
> complaining about. How about "pattern doesn't match any changed files"
> ?
>
> I find the existing error 'file not changed in a patch' to be
> suboptimal too. We're checking revisions (or commits), not patches.

Along those lines, here's an incomplete patch (tests aren't updated
yet, no commit log, your latest change isn't merged ) to revise the
error messages. I've tried to make them more consistent (e.g change
'file not changed in a patch' to 'unchanged file mentioned in a
ChangeLog' which mirrors the error for the opposite condition,
'changed file not mentioned in a ChangeLog').

I've also moved line numbers to the start of the line (like GCC's own
diagnostics) and not used the "line" argument of the Error constructor
for things that aren't line numbers. I've aimed to be consistent, so
that line numbers come at the start, pathnames and patterns come at
the end (and there's a space before them).

I also suggest changing 'additional author must prepend with tab and 4
spaces' to 'additional author must be indented with one tab and four
spaces'.

The intent is to improve the user experience, since for many people
who are new to git *any* error can cause confusion, so extra,
gcc-specific errors that we issue should aim to be easy to understand.

Do you like the direction?
diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index 069900d7cbf..453f0f61803 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -173,10 +173,9 @@ class Error:
 self.line = line
 
 def __repr__(self):
-s = self.message
 if self.line:
-s += ':"%s"' % self.line
-return s
+return '%d: %s' % (self.line, self.message)
+return self.message
 
 
 class ChangeLogEntry:
@@ -376,8 +375,8 @@ class GitCommit:
 elif additional_author_regex.match(line):
 m = additional_author_regex.match(line)
 if len(m.group('spaces')) != 4:
-msg = 'additional author must prepend with tab ' \
-  'and 4 spaces'
+msg = 'additional author must be indented with '\
+  ' one tab and four spaces'
 self.errors.append(Error(msg, line))
 else:
 author_tuple = (m.group('name'), None)
@@ -437,15 +436,14 @@ class GitCommit:
 m = star_prefix_regex.match(line)
 if m:
 if len(m.group('spaces')) != 1:
-err = Error('one space should follow asterisk',
-line)
-self.errors.append(err)
+msg = 'one space should follow asterisk'
+self.errors.append(Error(msg, line))
 else:
 last_entry.lines.append(line)
 else:
 if last_entry.is_empty:
 msg = 'first line should start with a tab, ' \
-  'asterisk and space'
+  'an asterisk and a space'
 self.errors.append(Error(msg, line))
 else:
 last_entry.lines.append(line)
@@ -526,8 +524,8 @@ class GitCommit:
 used_patterns = set()
 for entry in self.changelog_entries:
 if not entry.files:
-msg = 'ChangeLog must contain at least one file entry'
-self.errors.append(Error(msg, entry.folder))
+msg = 'no files mentioned for ChangeLog in directory: "%s"'
+self.errors.append(Error(msg % entry.folder))
 assert not entry.folder.endswith('/')
 for file in entry.files:
 if not self.is_changelog_filename(file):
@@ -539,7 +537,8 @@ class GitCommit:
 if not self.is_changelog_filename(x[0])]
 changed_files = set(cand)
 for file in sorted(mentioned_files - changed_files):
-self.errors.append(Error('file not changed in a patch', file))
+msg =

Re: Broken build

2020-06-02 Thread Hans-Peter Nilsson via Gcc-patches
> From: Alexandre Oliva 
> Date: Tue, 2 Jun 2020 13:29:03 +0200

> Hello, Anthony, H-P,
> 
> On May 27, 2020, Anthony Green  wrote:
> 
> > Hans-Peter Nilsson via Gcc-patches  writes:
> >> And here's an improper bug report.
> >> 
> >> One of the commits between cfdff3eeb90..5c8344e7289 caused every
> >> single *linked* test to fail for cris-elf, like:
> 
> > I can confirm that the moxie-elf test cases don't link either.
> 
> I believe the problem reported by Hans-Peter was fixed in the follow up
> I put in the other day,

FAOD: confirmed.  (And thanks for fixing it promptly.)

brgds, H-P


Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Martin Liška

On 6/2/20 3:56 PM, Jonathan Wakely wrote:

On Tue, 2 Jun 2020 at 14:16, Martin Liška  wrote:


On 6/2/20 1:48 PM, Martin Liška wrote:

I tend to this approach. Let me prepare a patch candidate for it.


There's a patch for it. Can you please Jonathan take a look?


Looks great, thanks!

+if name not in wildcard_prefixes:
+msg = 'unsupported wilcard prefix'

Typo "wilcard"

+if pattern not in used_patterns:
+self.errors.append(Error('a file pattern not used in a patch',
+ pattern))

If the script printed this error I don't think I'd know what it was
complaining about. How about "pattern doesn't match any changed files"
?


Yes, thank you for the corrections. I've just pushed that to master.

Martin



I find the existing error 'file not changed in a patch' to be
suboptimal too. We're checking revisions (or commits), not patches.





[GCC][PATCH][ARM]: Correct the grouping of operands in MVE vector scatter store intrinsics (PR94735).

2020-06-02 Thread Srinath Parvathaneni
Hello,

The operands in RTL patterns of MVE vector scatter store intrinsics are wrongly 
grouped, because of which few
vector loads and stores instructions are wrongly getting optimized out with -O2.

A new predicate "mve_scatter_memory" is defined in this patch, this predicate 
returns TRUE on
matching: (mem(reg)) for MVE scatter store intrinsics.
This patch fixes the issue by adding define_expand pattern with 
"mve_scatter_memory" predicate and calls the
corresponding define_insn by passing register_operand as first argument. This 
register_operand is extracted
from the operand with "mve_scatter_memory" predicate in define_expand pattern.

Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more 
details.
[1] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2020-06-02  Srinath Parvathaneni

PR target/94735
* config/arm//predicates.md (mve_scatter_memory): Define to
match (mem (reg)) for scatter store memory.
* config/arm/mve.md (mve_vstrbq_scatter_offset_): Modify
define_insn to define_expand.
(mve_vstrbq_scatter_offset_p_): Likewise.
(mve_vstrhq_scatter_offset_): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_): Likewise.
(mve_vstrhq_scatter_shifted_offset_): Likewise.
(mve_vstrdq_scatter_offset_p_v2di): Likewise.
(mve_vstrdq_scatter_offset_v2di): Likewise.
(mve_vstrdq_scatter_shifted_offset_p_v2di): Likewise.
(mve_vstrdq_scatter_shifted_offset_v2di): Likewise.
(mve_vstrhq_scatter_offset_fv8hf): Likewise.
(mve_vstrhq_scatter_offset_p_fv8hf): Likewise.
(mve_vstrhq_scatter_shifted_offset_fv8hf): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf): Likewise.
(mve_vstrwq_scatter_offset_fv4sf): Likewise.
(mve_vstrwq_scatter_offset_p_fv4sf): Likewise.
(mve_vstrwq_scatter_offset_p_v4si): Likewise.
(mve_vstrwq_scatter_offset_v4si): Likewise.
(mve_vstrwq_scatter_shifted_offset_fv4sf): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_v4si): Likewise.
(mve_vstrwq_scatter_shifted_offset_v4si): Likewise.
(mve_vstrbq_scatter_offset__insn): Define insn for scatter
stores.
(mve_vstrbq_scatter_offset_p__insn): Likewise.
(mve_vstrhq_scatter_offset__insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p__insn): Likewise.
(mve_vstrhq_scatter_shifted_offset__insn): Likewise.
(mve_vstrdq_scatter_offset_p_v2di_insn): Likewise.
(mve_vstrdq_scatter_offset_v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_p_v2di_insn): Likewise.
(mve_vstrdq_scatter_shifted_offset_v2di_insn): Likewise.
(mve_vstrhq_scatter_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_offset_p_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_fv8hf_insn): Likewise.
(mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn): Likewise.
(mve_vstrwq_scatter_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_offset_p_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_offset_p_v4si_insn): Likewise.
(mve_vstrwq_scatter_offset_v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_p_v4si_insn): Likewise.
(mve_vstrwq_scatter_shifted_offset_v4si_insn): Likewise.

gcc/testsuite/ChangeLog:

2020-06-02  Srinath Parvathanenisrinath.parvathan...@arm.com

PR target/94735
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_base.c: New test.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_base_p.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_offset.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_offset_p.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_shifted_offset.c:
Likewise.
* gcc.target/arm/mve/intrinsics/mve_vstore_scatter_shifted_offset_p.c:
Likewise.


### Attachment also inlined for ease of reply###


diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 
986fbfe2abae5f1e91e65f1ff5c84709c43c4617..3a57901bd5bcd770832d59dc77cd92b6d9b5ecb4
 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -8102,22 +8102,29 @@
 ;;
 ;; [vstrbq_scatter_offset_s vstrbq_scatter_offset_u]
 ;;
-(define_insn "mve_vstrbq_scatter_offset_"
-  [(set (match_operand: 0 "memory_operand" "=Us")
-   (unspec:
-   [(match_operand:MVE_2 1 "s_register_operand" "w")
-(match_operand:MVE_2 2 "s_register_operand" "w")]
-VSTRBSOQ))
-  ]
+(define_expand "mve_vstrbq_scatter_offse

[PATCH] recog: Use parameter packs for operator()

2020-06-02 Thread Richard Sandiford
This patch uses parameter packs to define insn_gen_fn::operator().
I guess in some ways it's C++-ification for its own sake, but it does
make things simpler and removes the current artificial limit of 16
arguments.

Note that the call is still strongly typed: all arguments have to have
implicit conversions to rtx.  Error messages for bad arguments look
reasonable.

I'm sure there are more elegant ways of getting the function type,
but this version at least fits on one line, so I didn't try too
hard to find an alternative.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2020-06-02  Richard Sandiford  

gcc/
* recog.h (insn_gen_fn::f0, insn_gen_fn::f1, insn_gen_fn::f2)
(insn_gen_fn::f3, insn_gen_fn::f4, insn_gen_fn::f5, insn_gen_fn::f6)
(insn_gen_fn::f7, insn_gen_fn::f8, insn_gen_fn::f9, insn_gen_fn::f10)
(insn_gen_fn::f11, insn_gen_fn::f12, insn_gen_fn::f13)
(insn_gen_fn::f14, insn_gen_fn::f15, insn_gen_fn::f16): Delete.
(insn_gen_fn::operator()): Replace overloaded definitions with
a parameter-pack version.
---
 gcc/recog.h | 40 +---
 1 file changed, 5 insertions(+), 35 deletions(-)

diff --git a/gcc/recog.h b/gcc/recog.h
index 17c09fdba3b..0a71a02c4a9 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -292,43 +292,13 @@ typedef const char * (*insn_output_fn) (rtx *, rtx_insn 
*);
 
 struct insn_gen_fn
 {
-  typedef rtx_insn * (*f0) (void);
-  typedef rtx_insn * (*f1) (rtx);
-  typedef rtx_insn * (*f2) (rtx, rtx);
-  typedef rtx_insn * (*f3) (rtx, rtx, rtx);
-  typedef rtx_insn * (*f4) (rtx, rtx, rtx, rtx);
-  typedef rtx_insn * (*f5) (rtx, rtx, rtx, rtx, rtx);
-  typedef rtx_insn * (*f6) (rtx, rtx, rtx, rtx, rtx, rtx);
-  typedef rtx_insn * (*f7) (rtx, rtx, rtx, rtx, rtx, rtx, rtx);
-  typedef rtx_insn * (*f8) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
-  typedef rtx_insn * (*f9) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
-  typedef rtx_insn * (*f10) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
-  typedef rtx_insn * (*f11) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, 
rtx);
-  typedef rtx_insn * (*f12) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, 
rtx, rtx);
-  typedef rtx_insn * (*f13) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, 
rtx, rtx, rtx);
-  typedef rtx_insn * (*f14) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, 
rtx, rtx, rtx, rtx);
-  typedef rtx_insn * (*f15) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, 
rtx, rtx, rtx, rtx, rtx);
-  typedef rtx_insn * (*f16) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx, 
rtx, rtx, rtx, rtx, rtx, rtx);
-
   typedef void (*stored_funcptr) (void);
 
-  rtx_insn * operator () (void) const { return ((f0)func) (); }
-  rtx_insn * operator () (rtx a0) const { return ((f1)func) (a0); }
-  rtx_insn * operator () (rtx a0, rtx a1) const { return ((f2)func) (a0, a1); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2) const { return ((f3)func) 
(a0, a1, a2); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3) const { return 
((f4)func) (a0, a1, a2, a3); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4) const { 
return ((f5)func) (a0, a1, a2, a3, a4); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5) 
const { return ((f6)func) (a0, a1, a2, a3, a4, a5); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx 
a6) const { return ((f7)func) (a0, a1, a2, a3, a4, a5, a6); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx 
a6, rtx a7) const { return ((f8)func) (a0, a1, a2, a3, a4, a5, a6, a7); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx 
a6, rtx a7, rtx a8) const { return ((f9)func) (a0, a1, a2, a3, a4, a5, a6, a7, 
a8); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx 
a6, rtx a7, rtx a8, rtx a9) const { return ((f10)func) (a0, a1, a2, a3, a4, a5, 
a6, a7, a8, a9); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx 
a6, rtx a7, rtx a8, rtx a9, rtx a10) const { return ((f11)func) (a0, a1, a2, 
a3, a4, a5, a6, a7, a8, a9, a10); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx 
a6, rtx a7, rtx a8, rtx a9, rtx a10, rtx a11) const { return ((f12)func) (a0, 
a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx 
a6, rtx a7, rtx a8, rtx a9, rtx a10, rtx a11, rtx a12) const { return 
((f13)func) (a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx 
a6, rtx a7, rtx a8, rtx a9, rtx a10, rtx a11, rtx a12, rtx a13) const { return 
((f14)func) (a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13); }
-  rtx_insn * operator () (rtx a0, rtx a1, rtx a2, rtx a3, rtx a4, rtx a5, rtx 
a6, rtx a7, rtx a8, rtx a9, rtx a10, rtx a11, rtx a12,

Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Jonathan Wakely via Gcc-patches
On Tue, 2 Jun 2020 at 14:16, Martin Liška  wrote:
>
> On 6/2/20 1:48 PM, Martin Liška wrote:
> > I tend to this approach. Let me prepare a patch candidate for it.
>
> There's a patch for it. Can you please Jonathan take a look?

Looks great, thanks!

+if name not in wildcard_prefixes:
+msg = 'unsupported wilcard prefix'

Typo "wilcard"

+if pattern not in used_patterns:
+self.errors.append(Error('a file pattern not used in a patch',
+ pattern))

If the script printed this error I don't think I'd know what it was
complaining about. How about "pattern doesn't match any changed files"
?

I find the existing error 'file not changed in a patch' to be
suboptimal too. We're checking revisions (or commits), not patches.


Re: [stage1][PATCH] Make TOPN counter dynamically allocated.

2020-06-02 Thread Martin Liška

On 6/2/20 10:27 AM, Jan Hubicka wrote:

The patch looks good (and is OK for mainline).  I am bit concerned about
two things.


Hello.

Thank you for the review!


  1) I think we should add support to pre-allocate memory pool so we hide
  the problem with instrumenting malloc (I think with big enough memory
  pool the Firefox malloc issue should disappear since we populate the
  counters used to profile malloc before we run out of it).
  I think this can be done by adding comdat symbol so one can specify
  size of the pool at compile time.


I'm suggesting to pre-allocate 16 gcov_kvp in the gcov run-time library.
Please take a look at the attached patch. I also added a test-case that
stresses that. I've just finished LTO PGO bootstrap of the GCC.

Ready for master?


  2) This seems like bit too big hammer for profiling values for division
  etc.  We will see how perfromance will go - in worst case I guess we
  could put back the original counter TOPN counter and use it for those
  cases.


We'll see about this and we can possibly reduce number of tracked values
for this kind of profiling.

Thanks,
Martin
>From 12e0fe1be6490a90d45c9a20a98c096ba6c03e07 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 2 Jun 2020 13:31:48 +0200
Subject: [PATCH] libgcov: support overloaded malloc

gcc/ChangeLog:

	* gcov-io.h (GCOV_PREALLOCATED_KVP): New.
	* gcov-tool.c: Add __gcov_kvp_pool
	and __gcov_kvp_pool_index variables.

libgcc/ChangeLog:

	* libgcov-driver.c: Add __gcov_kvp_pool
	and __gcov_kvp_pool_index variables.
	* libgcov.h (allocate_gcov_kvp): New.
	(gcov_topn_add_value): Use it.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-prof/indir-call-prof-malloc.c: New test.
---
 gcc/gcov-io.h |  3 ++
 .../gcc.dg/tree-prof/indir-call-prof-malloc.c | 49 +++
 libgcc/libgcov-driver.c   |  6 +++
 libgcc/libgcov.h  | 35 -
 4 files changed, 91 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c

diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 940eb7d8561..4dba01c78ce 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -292,6 +292,9 @@ GCOV_COUNTERS
 /* Maximum number of tracked TOP N value profiles.  */
 #define GCOV_TOPN_MAXIMUM_TRACKED_VALUES 32
 
+/* Number of pre-allocated gcov_kvp structures.  */
+#define GCOV_PREALLOCATED_KVP 16
+
 /* Convert a counter index to a tag.  */
 #define GCOV_TAG_FOR_COUNTER(COUNT)\
 	(GCOV_TAG_COUNTER_BASE + ((gcov_unsigned_t)(COUNT) << 17))
diff --git a/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
new file mode 100644
index 000..454e224c95f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-prof/indir-call-prof-malloc.c
@@ -0,0 +1,49 @@
+/* { dg-options "-O2 -ldl" } */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+
+int global;
+int global2;
+
+void report1 (size_t size)
+{
+  global++;
+}
+
+void report2 (size_t size)
+{
+  global2++;
+}
+
+typedef void (*tp) (size_t);
+static tp fns[] = {report1, report2};
+
+void* malloc(size_t size)
+{
+  static void* (*real_malloc)(size_t) = NULL;
+  if (!real_malloc)
+  real_malloc = dlsym(RTLD_NEXT, "malloc");
+
+  void *p = real_malloc (size);
+  fns[size % 2] (size);
+  // fprintf(stderr, "malloc(%d) = %p\n", size, p);
+  return p;
+}
+
+void *calloc (size_t n, size_t size)
+{
+  void *ptr = malloc (n * size);
+  __builtin_memset (ptr, 0, n * size);
+  return ptr;
+}
+
+void *ptr;
+
+int main()
+{
+  ptr = malloc (16);
+  return 0;
+}
diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index 8348d9f33ec..f2383e9d198 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -576,6 +576,12 @@ struct gcov_root __gcov_root;
 struct gcov_master __gcov_master = 
   {GCOV_VERSION, 0};
 
+/* Pool of pre-allocated gcov_kvp strutures.  */
+struct gcov_kvp __gcov_kvp_pool[GCOV_PREALLOCATED_KVP];
+
+/* Index to first free gcov_kvp in the pool.  */
+unsigned __gcov_kvp_pool_index;
+
 void
 __gcov_exit (void)
 {
diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index 7c037a97878..0c3f46e9a04 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -236,6 +236,8 @@ struct indirect_call_tuple
   
 /* Exactly one of these will be active in the process.  */
 extern struct gcov_master __gcov_master;
+extern struct gcov_kvp __gcov_kvp_pool[GCOV_PREALLOCATED_KVP];
+extern unsigned __gcov_kvp_pool_index;
 
 /* Dump a set of gcov objects.  */
 extern void __gcov_dump_one (struct gcov_root *) ATTRIBUTE_HIDDEN;
@@ -401,6 +403,36 @@ gcov_counter_set_if_null (gcov_type *counter, struct gcov_kvp *node,
 }
 }
 
+/* Allocate gcov_kvp from heap.  If we are recursively called, then allocate
+   it from a list of pre-allocated pool.  */
+
+static inline struct gcov_kvp *
+allocate_gcov_kvp (void)
+{
+  struct gcov_kvp *new_node = NULL;
+  static volatile unsigned in_recursion ATTRIBUTE_UNUSED = 0;
+
+#if !de

Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Martin Liška

On 6/2/20 1:48 PM, Martin Liška wrote:

I tend to this approach. Let me prepare a patch candidate for it.


There's a patch for it. Can you please Jonathan take a look?

Thanks,
Martin
>From 4d2cf31b6deb03c9ddc8062b9a45d2511e4a58bb Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 2 Jun 2020 15:13:22 +0200
Subject: [PATCH] gcc-changelog: support patterns

contrib/ChangeLog:

	* gcc-changelog/git_commit.py: Support foo/bar/*: patterns in
	wildcard_prefixes locations.
	* gcc-changelog/test_email.py: Test it.
	* gcc-changelog/test_patches.txt: Add 3 new patches.
---
 contrib/gcc-changelog/git_commit.py| 52 +---
 contrib/gcc-changelog/test_email.py| 12 
 contrib/gcc-changelog/test_patches.txt | 86 ++
 3 files changed, 142 insertions(+), 8 deletions(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index e2ef6c61ed0..37a3ef3fa92 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -136,6 +136,11 @@ ignored_prefixes = [
 'libsanitizer/',
 ]
 
+wildcard_prefixes = [
+'gcc/testsuite/',
+'libstdc++-v3/doc/html/'
+]
+
 misc_files = [
 'gcc/DATESTAMP',
 'gcc/BASE-VER',
@@ -182,11 +187,10 @@ class ChangeLogEntry:
 self.initial_prs = list(prs)
 self.prs = list(prs)
 self.lines = []
+self.files = []
+self.file_patterns = []
 
-@property
-def files(self):
-files = []
-
+def parse_file_names(self):
 # Whether the content currently processed is between a star prefix the
 # end of the file list: a colon or an open paren.
 in_location = False
@@ -215,8 +219,10 @@ class ChangeLogEntry:
 for file in line.split(','):
 file = file.strip()
 if file:
-files.append(file)
-return files
+if file.endswith('*'):
+self.file_patterns.append(file[:-1])
+else:
+self.files.append(file)
 
 @property
 def datetime(self):
@@ -275,8 +281,10 @@ class GitCommit:
 self.parse_lines(all_are_ignored)
 if self.changes:
 self.parse_changelog()
+self.parse_file_names()
 self.check_for_empty_description()
 self.deduce_changelog_locations()
+self.check_file_patterns()
 if not self.errors:
 self.check_mentioned_files()
 self.check_for_correct_changelog()
@@ -442,6 +450,18 @@ class GitCommit:
 else:
 last_entry.lines.append(line)
 
+def parse_file_names(self):
+for entry in self.changelog_entries:
+entry.parse_file_names()
+
+def check_file_patterns(self):
+for entry in self.changelog_entries:
+for pattern in entry.file_patterns:
+name = os.path.join(entry.folder, pattern)
+if name not in wildcard_prefixes:
+msg = 'unsupported wilcard prefix'
+self.errors.append(Error(msg, name))
+
 def check_for_empty_description(self):
 for entry in self.changelog_entries:
 for i, line in enumerate(entry.lines):
@@ -502,6 +522,8 @@ class GitCommit:
 assert folder_count == len(self.changelog_entries)
 
 mentioned_files = set()
+mentioned_patterns = []
+used_patterns = set()
 for entry in self.changelog_entries:
 if not entry.files:
 msg = 'ChangeLog must contain at least one file entry'
@@ -510,6 +532,8 @@ class GitCommit:
 for file in entry.files:
 if not self.is_changelog_filename(file):
 mentioned_files.add(os.path.join(entry.folder, file))
+for pattern in entry.file_patterns:
+mentioned_patterns.append(os.path.join(entry.folder, pattern))
 
 cand = [x[0] for x in self.modified_files
 if not self.is_changelog_filename(x[0])]
@@ -543,9 +567,21 @@ class GitCommit:
 assert file.startswith(entry.folder)
 file = file[len(entry.folder):].lstrip('/')
 entry.lines.append('\t* %s: New file.' % file)
+entry.files.append(file)
 else:
-msg = 'changed file not mentioned in a ChangeLog'
-self.errors.append(Error(msg, file))
+used_pattern = [p for p in mentioned_patterns
+if file.startswith(p)]
+used_pattern = used_pattern[0] if used_pattern else None
+if used_pattern:
+used_patterns.add(used_pattern)
+else:
+msg = 'changed file not mentioned in a ChangeLog'
+

Re: [PATCH 5/5] Adds ipa-initcall-cp pass.

2020-06-02 Thread Richard Biener via Gcc-patches
On Fri, May 29, 2020 at 8:52 PM Erick Ochoa
 wrote:
>
>
>
> This pass is a variant of constant propagation where global
> primitive constants with a single write are propagated to multiple
> read statements.

Just a few small comments while skimming through the code

> ChangeLog:
>
> 2020-05-20  Erick Ochoa 
>
> gcc/Makefile.in: Adds new pass
> gcc/passes.def: Same
> gcc/tree-pass.h: Same
> gcc/common.opt: Same
> gcc/cgraph.h: Adds new field to cgraph_node
> gcc/cgraph.c: Same
> gcc/ipa-cp.c: May skip ipa-cp for a function
> if initcall-cp has constants to propagate
> in that same function
> gcc/ipa-initcall-cp.c: New file
> ---
>   gcc/Makefile.in   |1 +
>   gcc/cgraph.h  |5 +-
>   gcc/common.opt|8 +
>   gcc/ipa-cp.c  |2 +-
>   gcc/ipa-initcall-cp.c | 1199 +
>   gcc/passes.def|1 +
>   gcc/tree-pass.h   |1 +
>   7 files changed, 1215 insertions(+), 2 deletions(-)
>   create mode 100644 gcc/ipa-initcall-cp.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 543b477ff18..b94fb633d78 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1401,6 +1401,7 @@ OBJS = \
> internal-fn.o \
> ipa-cp.o \
> ipa-sra.o \
> +   ipa-initcall-cp.o \
> ipa-devirt.o \
> ipa-fnsummary.o \
> ipa-polymorphic-call.o \
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 5ddeb65269b..532b4671609 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -937,7 +937,7 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node :
> public symtab_node
> split_part (false), indirect_call_target (false), local (false),
> versionable (false), can_change_signature (false),
> redefined_extern_inline (false), tm_may_enter_irr (false),
> -  ipcp_clone (false), m_uid (uid), m_summary_id (-1)
> +  ipcp_clone (false), skip_ipa_cp (false), m_uid (uid),
> m_summary_id (-1)
> {}
>
> /* Remove the node from cgraph and all inline clones inlined into it.
> @@ -1539,6 +1539,9 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node
> : public symtab_node
> unsigned tm_may_enter_irr : 1;
> /* True if this was a clone created by ipa-cp.  */
> unsigned ipcp_clone : 1;
> +  /* True if ipa-initcall-cp and therefore we need to skip ipa-cp for
> cgraph
> +   * node.  */
> +  unsigned skip_ipa_cp : 1;
>
>   private:
> /* Unique id of the node.  */
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d33383b523c..b2d8d1b0958 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3408,4 +3408,12 @@ fipa-ra
>   Common Report Var(flag_ipa_ra) Optimization
>   Use caller save register across calls if possible.
>
> +fipa-initcall-cp-dryrun
> +Common Report Var(flag_initcall_cp_dryrun)
> +Run analysis for propagating constants across initialization functions.
> +
> +fipa-initcall-cp
> +Common Report Var(flag_initcall_cp) Optimization
> +Run transformation for propagation constants across initialization
> functions.
> +
>   ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index c64e9104a94..1036cb0684e 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -1188,7 +1188,7 @@ initialize_node_lattices (struct cgraph_node *node)
>
> gcc_checking_assert (node->has_gimple_body_p ());
>
> -  if (!ipa_get_param_count (info))
> +  if (!ipa_get_param_count (info) || node->skip_ipa_cp)
>   disable = true;
> else if (node->local)
>   {
> diff --git a/gcc/ipa-initcall-cp.c b/gcc/ipa-initcall-cp.c
> new file mode 100644
> index 000..02f70b7a908
> --- /dev/null
> +++ b/gcc/ipa-initcall-cp.c
> @@ -0,0 +1,1199 @@
> +/* Initcall constant propagation pass.
> +   Copyright (C) 2019-2020 Free Software Foundation, Inc.
> +
> +   Contributed by Erick Ochoa 
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "tree-eh.h"
> +#include "gimple.h"
> +#include "gimple-expr.h"
> +#include "gimple-iterator.h"
> +#include "predict.h"
> +#include "alloc-pool.h"
> +#include "tree-pass.h"
> +#include "cgraph.h"
> +#include "diagnostic.h"
> +#include "fold-const.h"
> +#include "gimple-fold.h"
> +

Re: [patch 0/5] ipa-initcall-cp

2020-06-02 Thread Richard Biener via Gcc-patches
On Fri, May 29, 2020 at 8:44 PM Erick Ochoa
 wrote:
>
> Hello everyone,
>
> I wanted to highlight this ticket on bugzilla [0]. It is a missed
> optimization that I worked on. It involves propagating constants across
> function calls at link-time. I am relatively new to GCC and this would
> be my first significant contribution. I have developed a prototype of
> the solution that seems to work well enough to compile and run CPU2017
> intrate benchmarks correctly. I am now in the process of running the
> full suite. Feedback would be appreciated.
>
> Here's an overview of how it works:
>
> ipa-initcall-cp collects all variables with static lifetime that contain
> only a single constant write. Then, for each read of the variable, we
> determine (statically) if the write occurs before read. In order to do
> this, we use both the dominators graph and the static call graph. If the
> write occurs before all reads, then we can safely substitute the read
> with the constant being written to the variable. ipa-initcall-cp works
> across function calls so the read and write do not need to occur on the
> same function.

I've had a quick look and stumbled across the following issues that
make adoption highly unlikely:
 - the pass is a simple IPA pass
 - it needs to disable IPA CP for some reason
 - it looks quite heavy-weight for the simple transform it does
   (targeting bechmarks)

Now, the specific transform should be done as true IPA pass and
I think most of the requirements are already met by the IPA reference
pass.  IPA reference local analysis would need to be changed to
record whether stores to a variable dominate loads and all calls
local to a function [alternatively record a bounded vector of called
functions before such store].  At WPA time this information can be
propagated and at transform time the value replacement can happen.

> There are some issues which still need to be addressed, particularly,
> ipa-initcall-cp is at the moment just a prototype and as such it will
> stream in functions and modify them during WPA. I would like to fix
> this, however I am not sure how to properly use clones when modifying
> the function body.

I'm not sure why you are cloning functions at all here - you do not seem
to consider the case where only a subset of the program flow dominates
a store to the respective variable?

Richard.

> I have tested this against applied my patch to GCC version 10.1. If you
> have any questions, comments, let me know.
>
> [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92538


Re: [PATCH 2/5] Add function tree_code_in_cst.

2020-06-02 Thread Erick Ochoa




On 02/06/2020 14:29, Richard Biener wrote:

On Sat, May 30, 2020 at 12:18 AM Jan Hubicka  wrote:




---
  gcc/tree.h | 11 +++
  1 file changed, 11 insertions(+)

diff --git a/gcc/tree.h b/gcc/tree.h
index bd0c51b2a18..86a4542f58b 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -6156,6 +6156,17 @@ int_bit_position (const_tree field)
 + wi::to_offset (DECL_FIELD_BIT_OFFSET (field))).to_shwi ();
  }

+/* Determine if tree code is a constant */
+inline bool
+tree_code_is_cst (tree op)
+{
+  int code = TREE_CODE (op);
+  if (code == INTEGER_CST || code == REAL_CST || code == COMPLEX_CST
+  || code == VECTOR_CST)
+return true;
+  return false;


We have is_gimple_ip_invariant which I think should suit your purpose -
it return true if tree is a constant, it also accepts things like
addresses of (global) variables, functions and labels.


And otherwise there's CONSTANT_CLASS_P ().

Richard.


Thanks, I'll check these suggestions and remove unnecessary code!




+}
+
  /* Return true if it makes sense to consider alias set for a type T.  */

  inline bool
--
2.18.1



Re: [PATCH 2/5] Add function tree_code_in_cst.

2020-06-02 Thread Richard Biener via Gcc-patches
On Sat, May 30, 2020 at 12:18 AM Jan Hubicka  wrote:
>
> >
> > ---
> >  gcc/tree.h | 11 +++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index bd0c51b2a18..86a4542f58b 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -6156,6 +6156,17 @@ int_bit_position (const_tree field)
> > + wi::to_offset (DECL_FIELD_BIT_OFFSET (field))).to_shwi ();
> >  }
> >
> > +/* Determine if tree code is a constant */
> > +inline bool
> > +tree_code_is_cst (tree op)
> > +{
> > +  int code = TREE_CODE (op);
> > +  if (code == INTEGER_CST || code == REAL_CST || code == COMPLEX_CST
> > +  || code == VECTOR_CST)
> > +return true;
> > +  return false;
>
> We have is_gimple_ip_invariant which I think should suit your purpose -
> it return true if tree is a constant, it also accepts things like
> addresses of (global) variables, functions and labels.

And otherwise there's CONSTANT_CLASS_P ().

Richard.

> > +}
> > +
> >  /* Return true if it makes sense to consider alias set for a type T.  */
> >
> >  inline bool
> > --
> > 2.18.1
> >


Re: drop -aux{dir,base}, revamp -dump{dir,base}

2020-06-02 Thread Richard Biener
On Tue, 2 Jun 2020, Alexandre Oliva wrote:

> On May 27, 2020, Alexandre Oliva  wrote:
> 
> > - The prepending of -Wl, to file names in ldflags et al was done in a
> > way that introduced empty arguments when consecutive blanks appeared
> > in these board configuration knobs.  Skip the empty strings between
> > consecutive blanks to avoid this problem.
> 
> I thought I'd dealt with nearly all of the fallout with the patch above,
> but I was still seeing unexpected outputs.exp errors in gcc-testresults@
> 
> Anthony Green, whose results showed a problem, was kind enough to share
> his board config file with me, and looking into it I realized I'd used
> ldscripts instead of ldscript as the board configuration knob name.
> Oops.
> 
> I have not tested the patch below yet.  I'm starting testing it on
> various configurations that, despite some of them being crosses to
> embedded targets, did *not* have ldscript or ldscripts in the board
> config files, i.e., they did not experience the problem.
> 
> I'd appreciate if someone who uses the ldscript knob in their board
> config file would confirm that this patch fixes the problems for them.
> I'll check the patch in under the obviously correct rule as soon as I
> get confirmation that it brings some progress.
> 
> Thanks in advance,

OK.

Richard.

> 
> spell ldscript correctly in outputs.exp et al
> 
> From: Alexandre Oliva 
> 
> In my recent changes to outputs.exp and gcc-defs.exp, I misreferenced
> dejagnu board property ldscript, singular, as ldscripts, plural.
> 
> This probably didn't have much impact on gcc-defs.exp: the code there
> would just prefix with -Wl, any options that amounted to file names,
> and linker scripts probably wouldn't be named without a -T or even an
> -Wl,-T, prefix.
> 
> The visible effects were in outputs.exp, that also intended to add the
> ldscript, if present, to the set of options to be passed to the
> compiler driver for linking.  Using the wrong option name, that
> wouldn't work.
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.misc-tests/outputs.exp: Spell ldscript correctly.
>   * lib/gcc-defs.exp (gcc_adjust_linker_flags): Likewise.
> ---
>  testsuite/gcc.misc-tests/outputs.exp |2 +-
>  testsuite/lib/gcc-defs.exp   |4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git gcc/testsuite/gcc.misc-tests/outputs.exp 
> gcc/testsuite/gcc.misc-tests/outputs.exp
> index c3c6c2d..06a32db 100644
> --- gcc/testsuite/gcc.misc-tests/outputs.exp
> +++ gcc/testsuite/gcc.misc-tests/outputs.exp
> @@ -48,7 +48,7 @@ set skip_lto ![check_effective_target_lto]
>  # We do not compile to an executable, because that requires naming an output.
>  set link_options ""
>  set dest [target_info name]
> -foreach i { ldflags libs ldscripts } {
> +foreach i { ldflags libs ldscript } {
>  if {[board_info $dest exists $i]} {
>   set skip ""
>   foreach opt [split [board_info $dest $i]] {
> diff --git gcc/testsuite/lib/gcc-defs.exp gcc/testsuite/lib/gcc-defs.exp
> index d591cb3..87eeb7d 100644
> --- gcc/testsuite/lib/gcc-defs.exp
> +++ gcc/testsuite/lib/gcc-defs.exp
> @@ -287,7 +287,7 @@ proc dg-additional-files { args } {
>  
>  set gcc_adjusted_linker_flags 0
>  
> -# Add -Wl, before any file names in ldflags, libs, and ldscripts, so
> +# Add -Wl, before any file names in ldflags, libs, and ldscript, so
>  # that default object files or libraries do not change the names of
>  # gcc auxiliary outputs.
>  
> @@ -300,7 +300,7 @@ proc gcc_adjust_linker_flags {} {
>  
>  if {![is_remote host]} {
>   set dest [target_info name]
> - foreach i { ldflags libs ldscripts } {
> + foreach i { ldflags libs ldscript } {
>   if {[board_info $dest exists $i]} {
>   set opts [board_info $dest $i]
>   set nopts {}
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


[PATCH][OBVIOUS] libgcov: replace malloc and calloc.

2020-06-02 Thread Martin Liška

The calloc was in the original tested version of the patch
and I made accidental last minute change.

Installed to master as obvious.

libgcc/ChangeLog:

* libgcov.h (gcov_topn_add_value): Use xcalloc instead
of xmalloc.
---
 libgcc/libgcov.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/libgcov.h b/libgcc/libgcov.h
index bc38f4dd804..7c037a97878 100644
--- a/libgcc/libgcov.h
+++ b/libgcc/libgcov.h
@@ -443,7 +443,7 @@ gcov_topn_add_value (gcov_type *counters, gcov_type value, 
gcov_type count,
   else
 {
   struct gcov_kvp *new_node
-   = (struct gcov_kvp *)xmalloc (sizeof (struct gcov_kvp));
+   = (struct gcov_kvp *)xcalloc (1, sizeof (struct gcov_kvp));
   new_node->value = value;
   new_node->count = count;
 
--

2.26.2



Re: [PATCH 2/4] ipa-sra: Introduce a mini-DCE to tree-inline.c (PR 93385)

2020-06-02 Thread Richard Biener
On Thu, 28 May 2020, Martin Jambor wrote:

> PR 93385 reveals that if the user explicitely disables DCE, IPA-SRA
> can leave behind statements which are useless because their results
> are eventually not used but can have problematic side effects,
> especially since their inputs are now bogus that useless parameters
> were removed.
> 
> This patch fixes the problem by doing a similar def-use walk when
> materializing clones, marking which statements should not be copied
> and which SSA_NAMEs will be removed by call redirections and now need
> to be replaced with anything valid.  Default-definition SSA_NAMEs of
> parameters which are removed and all SSA_NAMEs derived from them (in a
> phi node or a simple assignment statement) are then remapped to
> error_mark_node - a sure way to spot it if any is left in place.
> 
> There is one exception to the above rule, if such SSA_NAMEs appear as
> an argument of a call, they need to be removed by call redirection and
> not as part of clone materialization.  So to have something valid
> there until that time, this patch pulls out dummy declarations out of
> thin air.  If you do not like that, see patch number 4 in the series,
> which changes this, but probably in a controversial way.
> 
> This patch only resets debug statements using the removed SSA_NAMEs.
> The first follow-up patch adjusts debug statements in the current
> function to still try to make the removed values available in debugger
> in the current function and the subsequent one also in other functions
> where they are passed.
> 
> gcc/ChangeLog:
> 
> 2020-05-14  Martin Jambor  
> 
>   PR ipa/93385
>   * ipa-param-manipulation.h (class ipa_param_body_adjustments): New
>   members m_dead_stmts, m_dead_ssas, mark_dead_statements and
>   get_removed_call_arg_placeholder.
>   * ipa-param-manipulation.c (phi_arg_will_live_p): New function.
>   (ipa_param_body_adjustments::mark_dead_statements): New method.
>   (ipa_param_body_adjustments::common_initialization): Call it.
>   (ipa_param_body_adjustments::ipa_param_body_adjustments): Initialize
>   new mwmbers.
>   (ipa_param_body_adjustments::get_removed_call_arg_placeholder): New.
>   (ipa_param_body_adjustments::modify_call_stmt): Replace dead SSAs
>   with dummy decls.
>   * tree-inline.c (remap_gimple_stmt): Do not copy dead statements,
>   reset dead debug statements.
>   (copy_phis_for_bb): Do not copy dead PHI nodes.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-05-14  Martin Jambor  
> 
>   PR ipa/93385
>   * gcc.dg/ipa/pr93385.c: New test.
>   * gcc.dg/ipa/ipa-sra-23.c: Likewise.
> ---
>  gcc/ipa-param-manipulation.c  | 142 --
>  gcc/ipa-param-manipulation.h  |   8 ++
>  gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c |  24 +
>  gcc/testsuite/gcc.dg/ipa/pr93385.c|  27 +
>  gcc/tree-inline.c |  18 +++-
>  5 files changed, 205 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-23.c
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr93385.c
> 
> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
> index 978916057f0..1f47f3a4268 100644
> --- a/gcc/ipa-param-manipulation.c
> +++ b/gcc/ipa-param-manipulation.c
> @@ -953,6 +953,99 @@ ipa_param_body_adjustments::carry_over_param (tree t)
>return new_parm;
>  }
>  
> +/* Return true if BLOCKS_TO_COPY is NULL or if PHI has an argument ARG in
> +   position that corresponds to an edge that is coming from a block that has
> +   the corresponding bit set in BLOCKS_TO_COPY.  */
> +
> +static bool
> +phi_arg_will_live_p (gphi *phi, bitmap blocks_to_copy, tree arg)
> +{
> +  bool arg_will_survive = false;
> +  if (!blocks_to_copy)
> +arg_will_survive = true;
> +  else
> +for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
> +  if (gimple_phi_arg_def (phi, i) == arg
> +   && bitmap_bit_p (blocks_to_copy,
> +gimple_phi_arg_edge (phi, i)->src->index))
> + {
> +   arg_will_survive = true;
> +   break;
> + }
> +  return arg_will_survive;
> +}
> +
> +/* Populate m_dead_stmts given that DEAD_PARAM is going to be removed without
> +   any replacement or splitting.  */
> +
> +void
> +ipa_param_body_adjustments::mark_dead_statements (tree dead_param)
> +{
> +  if (!is_gimple_reg (dead_param))

Hmm, so non-registers are not a problem?  I guess IPA SRA simply
ensures there are no actual uses (but call arguments) in that case?
Please clearly document this function matches IPA SRA capabilities.

> +return;
> +  tree parm_ddef = ssa_default_def (m_id->src_cfun, dead_param);
> +  if (!parm_ddef || has_zero_uses (parm_ddef))
> +return;
> +
> +  auto_vec stack;
> +  m_dead_ssas.add (parm_ddef);
> +  stack.safe_push (parm_ddef);
> +  while (!stack.is_empty ())
> +{
> +  tree t = stack.pop ();
> +
> +  imm_use_iterator imm_iter;
> +  gimple *stmt;
> +
> +  insert_d

Re: drop -aux{dir,base}, revamp -dump{dir,base}

2020-06-02 Thread Alexandre Oliva
On May 27, 2020, Alexandre Oliva  wrote:

> - The prepending of -Wl, to file names in ldflags et al was done in a
> way that introduced empty arguments when consecutive blanks appeared
> in these board configuration knobs.  Skip the empty strings between
> consecutive blanks to avoid this problem.

I thought I'd dealt with nearly all of the fallout with the patch above,
but I was still seeing unexpected outputs.exp errors in gcc-testresults@

Anthony Green, whose results showed a problem, was kind enough to share
his board config file with me, and looking into it I realized I'd used
ldscripts instead of ldscript as the board configuration knob name.
Oops.

I have not tested the patch below yet.  I'm starting testing it on
various configurations that, despite some of them being crosses to
embedded targets, did *not* have ldscript or ldscripts in the board
config files, i.e., they did not experience the problem.

I'd appreciate if someone who uses the ldscript knob in their board
config file would confirm that this patch fixes the problems for them.
I'll check the patch in under the obviously correct rule as soon as I
get confirmation that it brings some progress.

Thanks in advance,


spell ldscript correctly in outputs.exp et al

From: Alexandre Oliva 

In my recent changes to outputs.exp and gcc-defs.exp, I misreferenced
dejagnu board property ldscript, singular, as ldscripts, plural.

This probably didn't have much impact on gcc-defs.exp: the code there
would just prefix with -Wl, any options that amounted to file names,
and linker scripts probably wouldn't be named without a -T or even an
-Wl,-T, prefix.

The visible effects were in outputs.exp, that also intended to add the
ldscript, if present, to the set of options to be passed to the
compiler driver for linking.  Using the wrong option name, that
wouldn't work.


for  gcc/testsuite/ChangeLog

* gcc.misc-tests/outputs.exp: Spell ldscript correctly.
* lib/gcc-defs.exp (gcc_adjust_linker_flags): Likewise.
---
 testsuite/gcc.misc-tests/outputs.exp |2 +-
 testsuite/lib/gcc-defs.exp   |4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git gcc/testsuite/gcc.misc-tests/outputs.exp 
gcc/testsuite/gcc.misc-tests/outputs.exp
index c3c6c2d..06a32db 100644
--- gcc/testsuite/gcc.misc-tests/outputs.exp
+++ gcc/testsuite/gcc.misc-tests/outputs.exp
@@ -48,7 +48,7 @@ set skip_lto ![check_effective_target_lto]
 # We do not compile to an executable, because that requires naming an output.
 set link_options ""
 set dest [target_info name]
-foreach i { ldflags libs ldscripts } {
+foreach i { ldflags libs ldscript } {
 if {[board_info $dest exists $i]} {
set skip ""
foreach opt [split [board_info $dest $i]] {
diff --git gcc/testsuite/lib/gcc-defs.exp gcc/testsuite/lib/gcc-defs.exp
index d591cb3..87eeb7d 100644
--- gcc/testsuite/lib/gcc-defs.exp
+++ gcc/testsuite/lib/gcc-defs.exp
@@ -287,7 +287,7 @@ proc dg-additional-files { args } {
 
 set gcc_adjusted_linker_flags 0
 
-# Add -Wl, before any file names in ldflags, libs, and ldscripts, so
+# Add -Wl, before any file names in ldflags, libs, and ldscript, so
 # that default object files or libraries do not change the names of
 # gcc auxiliary outputs.
 
@@ -300,7 +300,7 @@ proc gcc_adjust_linker_flags {} {
 
 if {![is_remote host]} {
set dest [target_info name]
-   foreach i { ldflags libs ldscripts } {
+   foreach i { ldflags libs ldscript } {
if {[board_info $dest exists $i]} {
set opts [board_info $dest $i]
set nopts {}



-- 
Alexandre Oliva, freedom fighterhe/himhttps://FSFLA.org/blogs/lxo/
Free Software Evangelist  Stallman was right, but he's left :(
GNU Toolchain Engineer   Live long and free, and prosper ethically


Re: [PATCH 5/7 v3] vect: Support vector load/store with length in vectorizer

2020-06-02 Thread Richard Sandiford
"Kewen.Lin"  writes:
> Hi Richard,
>
> on 2020/5/29 下午4:32, Richard Sandiford wrote:
>> "Kewen.Lin"  writes:
>>> on 2020/5/27 下午6:02, Richard Sandiford wrote:
 "Kewen.Lin"  writes:
> Hi Richard,
>
>
> Snip ...
>
>>>
>>> Thanks a lot for your detailed explanation!  This proposal looks good
>>> based on the current implementation of both masking and length.  I may
>>> think too much, but I had a bit concern as below when some targets have
>>> both masking and length supports in future, such as ppc adds masking
>>> support like SVE.
>>>
>>> I assumed that you meant each vectorizable_* routine should record the
>>> objs for any available partial vectorisation approaches.  If one target
>>> supports both, we would have both recorded but decide not to do partial
>>> vectorisation finally since both have records.  The target can disable
>>> length like through optab to resolve it, but there is one possibility
>>> that the masking support can be imperfect initially since ISA support
>>> could be gradual, it further leads some vectorizable_* check or final
>>> verification to fail for masking, and length approach may work here but
>>> it gets disabled.  We can miss to use partial vectorisation here.
>>>
>>> The other assumption is that each vectorizable_* routine record the 
>>> first available partial vectorisation approach, let's assume masking
>>> takes preference, then it's fine to record just one here even if one
>>> target supports both approaches, but we still have the possiblity to
>>> miss the partial vectorisation chance as some check/verify fail with
>>> masking but fine with length.
>>>
>>> Does this concern make sense?
>> 
>> There's nothing to stop us using masks and lengths in the same loop
>> in future if we need to.  It would “just” be a case of setting up both
>> the masks and the lengths in vect_set_loop_condition.  But the point is
>> that doing that would be extra code, and there's no point writing that
>> extra code until it's needed.
>> 
>> If some future arch does support both mask-based and length-based
>> approaches, I think that's even less reason to make a binary choice
>> between them.  How we prioritise the length and mask approaches when
>> both are available is something that we'll have to decide at the time.
>> 
>> If your concern is that the arch might support masked operations
>> without wanting them to be used for loop control, we could test for
>> that case by checking whether while_ult_optab is implemented.
>> 
>> Thanks,
>> Richard
>> 
>
> Thanks for your further expalanation, as you pointed out, my concern
> is just one case of mixing mask-based and length-based.  I didn't
> realize it and thought we still used one approach for one loop at the
> time, but it's senseless.
>
> The v3 patch attached to use can_partial_vect_p.  In the regression
> testing with explicit vect-with-length-scope setting, I saw several
> reduction failures, updated vectorizable_condition to set
> can_partial_vect_p to false for !EXTRACT_LAST_REDUCTION under your
> guidance to ensure it either records sth. or clearing
> can_partial_vect_p.
>
> Bootstrapped/regtested on powerpc64le-linux-gnu P9 and no remarkable
> failures found even with explicit vect-with-length-scope settings.
>
> But I met one regression failure on aarch64-linux-gnu as below:
>
> PASS->FAIL: gcc.target/aarch64/sve/reduc_8.c -march=armv8.2-a+sve  
> scan-assembler-not \\tcmpeq\\tp[0-9]+\\.s,
>
> It's caused by vectorizable_condition's change, without the change,
> it can use fully-masking for the outer loop.  The reduction_type is
> TREE_CODE_REDUCTION here, so can_partial_vect_p gets cleared.
>
> From the optimized dumping, the previous IRs look fine.  It's doing
> reduction for inner loop, but we are checking partial vectorisation
> for the outer loop.  I'm not sure whether to adjust the current
> guard is reasonable for this case.  Could you help to give some
> insights?  Thanks in advance!
>
> BR,
> Kewen
> --
> gcc/ChangeLog

It would be easier to review, and perhaps easier to bisect,
if some of the mechanical changes were split out.  E.g.:

- Rename can_fully_mask_p to can_use_partial_vectors_p.

- Rename fully_masked_p to using_partial_vectors_p.

- Rename things related to rgroup_masks.  I think “rgroup_controls”
  or “rgroup_guards” might be more descriptive than “rgroup_objs”.

These should be fairly mechanical changes and can happen ahead of
the main series.  It'll then be easier to see what's different
for masks and lengths, separately from the more mechanical stuff.

As far as:

+  union
+  {
+/* The type of mask to use, based on the highest nS recorded above.  */
+tree mask_type;
+/* Any vector type to use these lengths.  */
+tree vec_type;
+  };

goes, some parts of the code seem to use mask_type for lengths too,
which I'm a bit nervous about.  I think we should either be consistent
about which union field we use (always mask_type for masks, always
vec_type for lengths) or we should just

Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Martin Liška

On 6/2/20 1:22 PM, Jonathan Wakely via Gcc-patches wrote:

On Tue, 2 Jun 2020 at 12:09, Jonathan Wakely  wrote:


On Tue, 2 Jun 2020 at 12:05, Jonathan Wakely  wrote:


On Tue, 2 Jun 2020 at 11:56, Gerald Pfeifer  wrote:


On Mon, 1 Jun 2020, Jonathan Wakely via Gcc-patches wrote:

The libstdc++ manual is written in Docbook XML, but we commit both the
XML and generated HTML pages to Git. Sometimes a small XML file can
result in dozens of mechanical changes to the generated HTML files,
which we record in the ChangeLog as:

 * doc/html/*: Regenerated.

With the new checks we need to name every generated file individually.

If we add that directory to the ignored_prefixes list, we won't need
to name them. But then the doc/html/* entry will give an error, and
changes to the HTML files can be committed without any ChangeLog
entry. Should we just stop mentioning the HTML in the ChangeLog?

We could do something like the attached patch, but it seems overkill
for this one special case.


The change makes sense, but indeed it feels like a very specialized
case in a general script.


Yes, that was my thought too.


On the other hand, the script is just meant to enforce our policies,
not dictate them. But on the gripping hand, if the policy can't be
checked simply, maybe it's a bad policy.


Similar to "doc/html/*" I've sometimes used "testsuite/*" for changes
that affect huge numbers of files in the libstdc++ testsuite, e.g.
commit r7-2817-52066eae5d3dd6b7c0a1b843469582dbdbb941eb did:

  2911 files changed, 3072 insertions(+), 4512 deletions(-)


Nice example.



I don't want to list thousands of files at once. So maybe a general
approach for allowing wildcards in specific directories makes sense.


I tend to this approach. Let me prepare a patch candidate for it.



What will we do on January 1 2021 when Jakub updates the copyright
years in every file in the tree, turn off the hook temporarily?



We should probably add early bail out for git commits with
'Bump copyright year'. I'll help Jakub with that.

Martin


Re: [PATCH 1/4] ipa-sra: Do not remove statements necessary because of non-call EH (PR 95113)

2020-06-02 Thread Richard Biener
On Thu, 28 May 2020, Martin Jambor wrote:

> PR 95113 revealed that when reasoning about which parameters are dead,
> IPA-SRA does not perform the same check related to non-call exceptions
> as tree DCE.  It most certainly should and so this patch moves the
> condition used in tree-ssa-dce.c into a separate predicate (in
> tree-eh.c) and uses it from both places.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
> 2020-05-27  Martin Jambor  
> 
>   PR ipa/95113
>   * gcc/tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Move non-call
>   exceptions check to...
>   * gcc/tree-eh.c (stmt_unremovable_because_of_non_call_eh_p): ...this
>   new function.
>   * gcc/tree-eh.h (stmt_unremovable_because_of_non_call_eh_p): Declare it.
>   * gcc/ipa-sra.c (isra_track_scalar_value_uses): Use it.  New parameter
>   fun.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-05-27  Martin Jambor  
> 
>   PR ipa/95113
>   * gcc.dg/ipa/pr95113.c: New test.
> ---
>  gcc/ipa-sra.c  | 28 +
>  gcc/testsuite/gcc.dg/ipa/pr95113.c | 33 ++
>  gcc/tree-eh.c  | 10 +
>  gcc/tree-eh.h  |  1 +
>  gcc/tree-ssa-dce.c |  4 +---
>  5 files changed, 60 insertions(+), 16 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr95113.c
> 
> diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
> index 7c922e40a4e..c81e8869e7a 100644
> --- a/gcc/ipa-sra.c
> +++ b/gcc/ipa-sra.c
> @@ -795,17 +795,17 @@ get_single_param_flow_source (const isra_param_flow 
> *param_flow)
>  }
>  
>  /* Inspect all uses of NAME and simple arithmetic calculations involving NAME
> -   in NODE and return a negative number if any of them is used for something
> -   else than either an actual call argument, simple arithmetic operation or
> -   debug statement.  If there are no such uses, return the number of actual
> -   arguments that this parameter eventually feeds to (or zero if there is 
> none).
> -   For any such parameter, mark PARM_NUM as one of its sources.  ANALYZED is 
> a
> -   bitmap that tracks which SSA names we have already started
> -   investigating.  */
> +   in FUN represented with NODE and return a negative number if any of them 
> is
> +   used for something else than either an actual call argument, simple
> +   arithmetic operation or debug statement.  If there are no such uses, 
> return
> +   the number of actual arguments that this parameter eventually feeds to (or
> +   zero if there is none).  For any such parameter, mark PARM_NUM as one of 
> its
> +   sources.  ANALYZED is a bitmap that tracks which SSA names we have already
> +   started investigating.  */
>  
>  static int
> -isra_track_scalar_value_uses (cgraph_node *node, tree name, int parm_num,
> -   bitmap analyzed)
> +isra_track_scalar_value_uses (function *fun, cgraph_node *node, tree name,
> +   int parm_num, bitmap analyzed)
>  {
>int res = 0;
>imm_use_iterator imm_iter;
> @@ -859,8 +859,9 @@ isra_track_scalar_value_uses (cgraph_node *node, tree 
> name, int parm_num,
>   }
> res += all_uses;
>   }
> -  else if ((is_gimple_assign (stmt) && !gimple_has_volatile_ops (stmt))
> -|| gimple_code (stmt) == GIMPLE_PHI)
> +  else if (!stmt_unremovable_because_of_non_call_eh_p (fun, stmt)
> +&& ((is_gimple_assign (stmt) && !gimple_has_volatile_ops (stmt))
> +|| gimple_code (stmt) == GIMPLE_PHI))
>   {
> tree lhs;
> if (gimple_code (stmt) == GIMPLE_PHI)
> @@ -876,7 +877,7 @@ isra_track_scalar_value_uses (cgraph_node *node, tree 
> name, int parm_num,
> gcc_assert (!gimple_vdef (stmt));
> if (bitmap_set_bit (analyzed, SSA_NAME_VERSION (lhs)))
>   {
> -   int tmp = isra_track_scalar_value_uses (node, lhs, parm_num,
> +   int tmp = isra_track_scalar_value_uses (fun, node, lhs, parm_num,
> analyzed);
> if (tmp < 0)
>   {
> @@ -927,7 +928,8 @@ isra_track_scalar_param_local_uses (function *fun, 
> cgraph_node *node, tree parm,
>  return true;
>  
>bitmap analyzed = BITMAP_ALLOC (NULL);
> -  int call_uses = isra_track_scalar_value_uses (node, name, parm_num, 
> analyzed);
> +  int call_uses = isra_track_scalar_value_uses (fun, node, name, parm_num,
> + analyzed);
>BITMAP_FREE (analyzed);
>if (call_uses < 0)
>  return true;
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr95113.c 
> b/gcc/testsuite/gcc.dg/ipa/pr95113.c
> new file mode 100644
> index 000..a8f8c901ebe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr95113.c
> @@ -0,0 +1,33 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fexceptions -fnon-call-exceptions" } */
> +/* { dg-require-effective-target exceptions } */
> +
> +int a, b;
> +
> +static inline long int
> +fo

Re: [PATCH 0/4] IVOPTs consider step cost for different forms when unrolling

2020-06-02 Thread Richard Biener
On Thu, 28 May 2020, Kewen.Lin wrote:

> Hi,
> 
> This is one repost and you can refer to the original series 
> via https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538360.html.
> 
> As we discussed in the thread
> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
> Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html,
> I'm working to teach IVOPTs to consider D-form group access during unrolling.
> The difference on D-form and other forms during unrolling is we can put the
> stride into displacement field to avoid additional step increment. eg:
> 
> With X-form (uf step increment):
>   ...
>   LD A = baseA, X
>   LD B = baseB, X
>   ST C = baseC, X
>   X = X + stride
>   LD A = baseA, X
>   LD B = baseB, X
>   ST C = baseC, X
>   X = X + stride
>   LD A = baseA, X
>   LD B = baseB, X
>   ST C = baseC, X
>   X = X + stride
>   ...
> 
> With D-form (one step increment for each base):
>   ...
>   LD A = baseA, OFF
>   LD B = baseB, OFF
>   ST C = baseC, OFF
>   LD A = baseA, OFF+stride
>   LD B = baseB, OFF+stride
>   ST C = baseC, OFF+stride
>   LD A = baseA, OFF+2*stride
>   LD B = baseB, OFF+2*stride
>   ST C = baseC, OFF+2*stride
>   ...
>   baseA += stride * uf
>   baseB += stride * uf
>   baseC += stride * uf
> 
> Imagining that if the loop get unrolled by 8 times, then 3 step updates with
> D-form vs. 8 step updates with X-form. Here we only need to check stride
> meet D-form field requirement, since if OFF doesn't meet, we can construct
> baseA' with baseA + OFF.

I'd just mention there are other targets that have the choice between
the above forms.  Since IVOPTs itself does not perform the unrolling
the IL it produces is the same, correct?

Richard.

> This patch set consists four parts:
>  
>   [PATCH 1/4] unroll: Add middle-end unroll factor estimation
> 
>  Add unroll factor estimation in middle-end. It mainly refers to current
>  RTL unroll factor determination in function decide_unrolling and its
>  sub calls.  As Richi suggested, we probably can force unroll factor
>  with this and avoid duplicate unroll factor calculation, but I think it
>  need more benchmarking work and should be handled separately.
> 
>   [PATCH 2/4] param: Introduce one param to control unroll factor 
> 
>  As Richard and Segher's suggestion, I used addr_offset_valid_p for the
>  addressing mode, rather than one target hook.  As Richard's suggestion,  
>
>  it introduces one parameter to control this IVOPTs consideration and
>  further tweaking [3/4] on top of unroll factor estimation [1/4].
>  
>   [PATCH 3/4] ivopts: Consider cost_step on different forms during unrolling
> 
>  Teach IVOPTs to mark the IV cand as reg_offset_p which is derived from
>  one address IV type group where the whole group is valid to use 
> reg_offset
>  mode.  Then scaling up the IV cand step cost by (uf - 1) for no
>  reg_offset_p IV cands, here the uf is one estimated unroll factor [1/4].
>  
>   [PATCH 4/4] rs6000: P9 D-form test cases
> 
>  Add some test cases, mainly copied from Kelvin's patch.  This is approved
>  by Segher if the whole series is fine.
> 
> 
> Many thanks to Richard and Segher on previous version reviews.
> 
> Bootstrapped and regress tested on powerpc64le-linux-gnu.
> 
> Any comments are highly appreciated!  Thanks in advance!
> 
> 
> BR,
> Kewen
> 
> ---
> 
>  gcc/cfgloop.h  |   3 ++
>  gcc/config/i386/i386-options.c |   6 +++
>  gcc/config/s390/s390.c |   6 +++
>  gcc/doc/invoke.texi|   9 +
>  gcc/params.opt |   4 ++
>  gcc/tree-ssa-loop-ivopts.c | 100 
> ++-
>  gcc/tree-ssa-loop-manip.c  | 253 
> ++
>  gcc/tree-ssa-loop-manip.h  |   3 +-
>  gcc/tree-ssa-loop.c|  33 
>  gcc/tree-ssa-loop.h|   2 +
>  10 files changed, 416 insertions(+), 3 deletions(-)
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: Broken build

2020-06-02 Thread Alexandre Oliva
Hello, Anthony, H-P,

On May 27, 2020, Anthony Green  wrote:

> Hans-Peter Nilsson via Gcc-patches  writes:
>> And here's an improper bug report.
>> 
>> One of the commits between cfdff3eeb90..5c8344e7289 caused every
>> single *linked* test to fail for cris-elf, like:

> I can confirm that the moxie-elf test cases don't link either.

I believe the problem reported by Hans-Peter was fixed in the follow up
I put in the other day, but I happened to stumble across more recent
test results you've posted, and it looks like you're still running into
problems.


> It looks like setting ldscript in the board description file doesn't
> work.  In my case that means "-Tsim.ld" isn't being passed through and
> we can't link anymore.  Here's moxie-sim.exp:

> https://github.com/moxielogic/moxie-test-gcc/blob/7c707e187f101922e3ef7f6e23dbbd1890f9e8dd/moxie-sim.exp#L42

Looking more closely, I realized I used ldscripts instead of ldscript in
the recent changes.  That would explain at least the outputs.exp
failures.  Would you please let me know whether the patch below brings
you back to the state before the revamp patch went in?

Failing that, could you possibly share a gcc.log with me?

Thanks in advance,


diff --git a/gcc/testsuite/gcc.misc-tests/outputs.exp 
b/gcc/testsuite/gcc.misc-tests/outputs.exp
index c3c6c2d..06a32db 100644
--- a/gcc/testsuite/gcc.misc-tests/outputs.exp
+++ b/gcc/testsuite/gcc.misc-tests/outputs.exp
@@ -48,7 +48,7 @@ set skip_lto ![check_effective_target_lto]
 # We do not compile to an executable, because that requires naming an output.
 set link_options ""
 set dest [target_info name]
-foreach i { ldflags libs ldscripts } {
+foreach i { ldflags libs ldscript } {
 if {[board_info $dest exists $i]} {
set skip ""
foreach opt [split [board_info $dest $i]] {
diff --git a/gcc/testsuite/lib/gcc-defs.exp b/gcc/testsuite/lib/gcc-defs.exp
index d591cb3..87eeb7d 100644
--- a/gcc/testsuite/lib/gcc-defs.exp
+++ b/gcc/testsuite/lib/gcc-defs.exp
@@ -287,7 +287,7 @@ proc dg-additional-files { args } {
 
 set gcc_adjusted_linker_flags 0
 
-# Add -Wl, before any file names in ldflags, libs, and ldscripts, so
+# Add -Wl, before any file names in ldflags, libs, and ldscript, so
 # that default object files or libraries do not change the names of
 # gcc auxiliary outputs.
 
@@ -300,7 +300,7 @@ proc gcc_adjust_linker_flags {} {
 
 if {![is_remote host]} {
set dest [target_info name]
-   foreach i { ldflags libs ldscripts } {
+   foreach i { ldflags libs ldscript } {
if {[board_info $dest exists $i]} {
set opts [board_info $dest $i]
set nopts {}


-- 
Alexandre Oliva, freedom fighterhe/himhttps://FSFLA.org/blogs/lxo/
Free Software Evangelist  Stallman was right, but he's left :(
GNU Toolchain Engineer   Live long and free, and prosper ethically


RE: [PATCH] extend cselim to check non-trapping for more references (PR tree-optimizaton/89430)

2020-06-02 Thread Richard Biener
On Fri, 29 May 2020, Hao Liu OS wrote:

> Hi Richard,
> 
> Thanks for your comments. It's a good idea to simplify the code and remove 
> get_inner_reference. I've updated the patch accordingly. I also simplified 
> the code to ignore other loads, which can not help to check if a store can be 
> trapped. 
> 
> About tests:
> 1.  All previously XFAIL tests (gcc.dg/tree-ssa/pr89430-*) for pr89430 
> are passed with this patch. 
> 2.  ssa-pre-17.c is failed as cselim optimizes the conditional store, so 
> "-fno-tree-cselim" is added. That case is added as a new test case for 
> pr89430.
> 3.  Other test cases (including the regression test for pr94734) in gcc 
> testsuit are not affected by this patch, according to gcc "make check".
> 4.  Some other benchmarks are also tested for correctness and 
> performance. The performance regression mentioned in pr89430 can be fixed. 
>  
> Review, please.

Looks mostly good now, some more comments inline below

> gcc/ChangeLog:
> 
> PR tree-optimization/89430
> * tree-ssa-phiopt.c (cond_store_replacement): Extend non-trap checking to
> support ARRAY_REFs and COMPONENT_REFs.  Support a special case: if there 
> is
> a dominating load of local variable without address escape, a store is not
> trapped (as local stack is always writable).  Other loads are ignored for
> simplicity, as they don't help to check if a store can be trapped.
> 
> gcc/testsuite/ChangeLog:
> 
> PR tree-optimization/89430
> * gcc.dg/tree-ssa/pr89430-1.c: Remove xfail.
> * gcc.dg/tree-ssa/pr89430-2.c: Remove xfail.
> * gcc.dg/tree-ssa/pr89430-5.c: Remove xfail.
> * gcc.dg/tree-ssa/pr89430-6.c: Remove xfail.
> * gcc.dg/tree-ssa/pr89430-7-comp-ref.c: New test.
> * gcc.dg/tree-ssa/ssa-pre-17.c: Add -fno-tree-cselim.
> 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr89430-1.c |   2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr89430-2.c |   2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr89430-5.c |   2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr89430-6.c |   2 +-
>  .../gcc.dg/tree-ssa/pr89430-7-comp-ref.c  |  17 +++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-pre-17.c|   2 +-
>  gcc/tree-ssa-phiopt.c | 140 +-
>  7 files changed, 91 insertions(+), 76 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr89430-7-comp-ref.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-1.c
> index ce242ba569b..8ee1850ac63 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-1.c
> @@ -9,4 +9,4 @@ unsigned test(unsigned k, unsigned b) {
>  return a[0]+a[1];
>  }
>  
> -/* { dg-final { scan-tree-dump "Conditional store replacement" "cselim" { 
> xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump "Conditional store replacement" "cselim" } } 
> */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-2.c
> index 90ae36bfce2..9b96875ac7a 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-2.c
> @@ -11,4 +11,4 @@ unsigned test(unsigned k, unsigned b) {
>  return a[0]+a[1];
>  }
>  
> -/* { dg-final { scan-tree-dump "Conditional store replacement" "cselim" { 
> xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump "Conditional store replacement" "cselim" } } 
> */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-5.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-5.c
> index c633cbe947d..b2d04119381 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-5.c
> @@ -13,4 +13,4 @@ int test(int b, int k) {
>  return a.data[0] + a.data[1];
>  }
>  
> -/* { dg-final { scan-tree-dump "Conditional store replacement" "cselim" { 
> xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump "Conditional store replacement" "cselim" } } 
> */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-6.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-6.c
> index 7cad563128d..8d3c4f7cc6a 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-6.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-6.c
> @@ -16,4 +16,4 @@ int test(int b, int k) {
>  return a.data[0].x + a.data[1].x;
>  }
>  
> -/* { dg-final { scan-tree-dump "Conditional store replacement" "cselim" { 
> xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump "Conditional store replacement" "cselim" } } 
> */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr89430-7-comp-ref.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-7-comp-ref.c
> new file mode 100644
> index 000..c35a2afc70b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr89430-7-comp-ref.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-cselim-details" } */
> +
> +typedef union {
> +  int i;
> +  float f;
> +} U;
> +
> +int foo(U *u, int b, int i)
> +{
> +  u->i = 0;
> +  if (b)
> +u->i = i;
> +  return u->i;
> +}
> 

Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Jonathan Wakely via Gcc-patches
On Tue, 2 Jun 2020 at 12:09, Jonathan Wakely  wrote:
>
> On Tue, 2 Jun 2020 at 12:05, Jonathan Wakely  wrote:
> >
> > On Tue, 2 Jun 2020 at 11:56, Gerald Pfeifer  wrote:
> > >
> > > On Mon, 1 Jun 2020, Jonathan Wakely via Gcc-patches wrote:
> > > > The libstdc++ manual is written in Docbook XML, but we commit both the
> > > > XML and generated HTML pages to Git. Sometimes a small XML file can
> > > > result in dozens of mechanical changes to the generated HTML files,
> > > > which we record in the ChangeLog as:
> > > >
> > > > * doc/html/*: Regenerated.
> > > >
> > > > With the new checks we need to name every generated file individually.
> > > >
> > > > If we add that directory to the ignored_prefixes list, we won't need
> > > > to name them. But then the doc/html/* entry will give an error, and
> > > > changes to the HTML files can be committed without any ChangeLog
> > > > entry. Should we just stop mentioning the HTML in the ChangeLog?
> > > >
> > > > We could do something like the attached patch, but it seems overkill
> > > > for this one special case.
> > >
> > > The change makes sense, but indeed it feels like a very specialized
> > > case in a general script.
> >
> > Yes, that was my thought too.
>
> On the other hand, the script is just meant to enforce our policies,
> not dictate them. But on the gripping hand, if the policy can't be
> checked simply, maybe it's a bad policy.

Similar to "doc/html/*" I've sometimes used "testsuite/*" for changes
that affect huge numbers of files in the libstdc++ testsuite, e.g.
commit r7-2817-52066eae5d3dd6b7c0a1b843469582dbdbb941eb did:

 2911 files changed, 3072 insertions(+), 4512 deletions(-)

I don't want to list thousands of files at once. So maybe a general
approach for allowing wildcards in specific directories makes sense.

What will we do on January 1 2021 when Jakub updates the copyright
years in every file in the tree, turn off the hook temporarily?


Fix PR middle-end/95395

2020-06-02 Thread Eric Botcazou
The usual stupid confusion between bits and bytes...  The tree-pretty-print.c 
hunk is unrelated and has been approved by Richard elsewhere.

Tested on SPARC64/Linux, applied on the mainline as obvious.


2020-06-02 Eric Botcazou  

PR middle-end/95395
* optabs.c (expand_unop): Fix bits/bytes confusion in latest change.

* tree-pretty-print.c (dump_generic_node) : Print quals.

-- 
Eric Botcazoudiff --git a/gcc/optabs.c b/gcc/optabs.c
index 7a4ec1ec01c..6d0b76c13ba 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -2892,7 +2892,7 @@ expand_unop (machine_mode mode, optab unoptab, rtx op0, rtx target,
 	  /* We do not provide a 128-bit bswap in libgcc so force the use of
 	 a double bswap for 64-bit targets.  */
 	  if (GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD
-	  && (UNITS_PER_WORD == 64
+	  && (UNITS_PER_WORD == 8
 		  || optab_handler (unoptab, word_mode) != CODE_FOR_nothing))
 	{
 	  temp = expand_doubleword_bswap (mode, op0, target);
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index f04fd65091a..7d581214022 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1899,8 +1899,16 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 
 case ARRAY_TYPE:
   {
+	unsigned int quals = TYPE_QUALS (node);
 	tree tmp;
 
+	if (quals & TYPE_QUAL_ATOMIC)
+	  pp_string (pp, "atomic ");
+	if (quals & TYPE_QUAL_CONST)
+	  pp_string (pp, "const ");
+	if (quals & TYPE_QUAL_VOLATILE)
+	  pp_string (pp, "volatile ");
+
 	/* Print the innermost component type.  */
 	for (tmp = TREE_TYPE (node); TREE_CODE (tmp) == ARRAY_TYPE;
 	 tmp = TREE_TYPE (tmp))


Re: [PATCH PR95254] aarch64: gcc generate inefficient code with fixed sve vector length

2020-06-02 Thread Richard Sandiford
"Yangfei (Felix)"  writes:
> Hi,
>
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Monday, June 1, 2020 4:47 PM
>> To: Yangfei (Felix) 
>> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak ; Jakub
>> Jelinek ; Hongtao Liu ; H.J. Lu
>> 
>> Subject: Re: [PATCH PR95254] aarch64: gcc generate inefficient code with
>> fixed sve vector length
>
> Snip...
>  
>> Sounds good.  Maybe at this point the x_inner and y_inner code is getting
>> complicated enough to put into a lambda too:
>> 
>>   x_inner = ... (x);
>>   y_inner = ... (y);
>> 
>> Just a suggestion though.
>
> Yes, that's a good suggestion.  I see the code becomes more cleaner with 
> another lambda.
>  
>> Yeah, looks good.
>> 
>> Formatting nit though: multi-line conditions should be wrapped in (...),
>> i.e.:
>> 
>> return (...
>> && ...
>> && ...);
>> 
>
> Done.  v6 patch is based on trunk 20200601.
> Bootstrapped and tested on aarch64-linux-gnu. 
> Also bootstrapped on x86-64-linux-gnu with --enable-multilib (for building 
> -m32 x86 libgcc).
> Regresssion test on x86-64-linux-gnu looks good except for the following 
> failures which has been confirmed by x86 devs: 
>
>> FAIL: gcc.target/i386/avx512f-vcvtps2ph-2.c (test for excess errors)
>> UNRESOLVED: gcc.target/i386/avx512f-vcvtps2ph-2.c compilation failed to 
>> produce executable
> 154803c154803

Looks good.  (I know I said that last time too :-))  I've also tested
it on arm-linux-gnueabihf and powerpc64le-linux-gnu without problems.

As before, I'll hold off applying until the AVX512 problem is fixed.

Thanks,
Richard


Re: [PATCH 1/2] Introduce flag_cunroll_grow_size for cunroll

2020-06-02 Thread Richard Biener via Gcc-patches
On Tue, Jun 2, 2020 at 4:10 AM Jiufu Guo  wrote:
>
> Jiufu Guo  writes:
>
> Hi,
>
> I updated the patch just a little accordinlgy.  Thanks!
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 4464049fc1f..570e2aa53c8 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2856,6 +2856,10 @@ funroll-all-loops
>  Common Report Var(flag_unroll_all_loops) Optimization
>  Perform loop unrolling for all loops.
>
> +funroll-completely-grow-size
> +Common Undocumented Var(flag_cunroll_grow_size) Init(2) Optimization
> +; Internal undocumented flag, allow size growth during complete unrolling
> +
>  ; Nonzero means that loop optimizer may assume that the induction variables
>  ; that control loops do not overflow and that the loops with nontrivial
>  ; exit condition are not infinite
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 96316fbd23b..8d52358efdd 100644
> --- a/gcc/toplev.c
> +++ b/gcc/toplev.c
> @@ -1474,6 +1474,10 @@ process_options (void)
>if (flag_unroll_all_loops)
>  flag_unroll_loops = 1;
>
> +  /* Allow cunroll to grow size accordingly.  */
> +  if (flag_cunroll_grow_size == AUTODETECT_VALUE)
> +flag_cunroll_grow_size = flag_unroll_loops || flag_peel_loops;
> +

(*)

>/* web and rename-registers help when run after loop unrolling.  */
>if (flag_web == AUTODETECT_VALUE)
>  flag_web = flag_unroll_loops;
> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
> index 8ab6ab3330c..298ab215530 100644
> --- a/gcc/tree-ssa-loop-ivcanon.c
> +++ b/gcc/tree-ssa-loop-ivcanon.c
> @@ -1603,9 +1603,8 @@ pass_complete_unroll::execute (function *fun)
>   re-peeling the same loop multiple times.  */
>if (flag_peel_loops)
>  peeled_loops = BITMAP_ALLOC (NULL);
> -  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
> -  || flag_peel_loops
> -  || optimize >= 3, true);
> +  unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size,

With -O3 -fno-peel-loops this does not have the same effect anymore.
So above (*) you'd need to check || optimize >= 3 as well.

> +  true);
>if (peeled_loops)
>  {
>BITMAP_FREE (peeled_loops);
>
> BR,
> Jiufu
>
> > Richard Biener  writes:
> >
>
> >>> >> From: Jiufu Guo 
> >>> >>
> >>> >> Currently GIMPLE complete unroller(cunroll) is checking
> >>> >> flag_unroll_loops and flag_peel_loops to see if allow size growth.
> >>> >> Beside affects curnoll, flag_unroll_loops also controls RTL unroler.
> >>> >> To have more freedom to control cunroll and RTL unroller, this patch
> >>> >> introduces flag_cunroll_grow_size.  With this patch, we can control
> >>> >> cunroll and RTL unroller indepently.
> >>> >>
> >>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? And backport to
> >>> >> GCC10 after week?
> >>> >>
> >>> >>
> >>> >> +funroll-completely-grow-size
> >>> >> +Var(flag_cunroll_grow_size) Init(2)
> >>> >> +; Control cunroll to allow size growth during complete unrolling
> >>> >> +
> >>> >
> >>
> >> It won't work without adjusting the awk scripts.  So go with
> >>
> >> funroll-completely-grow-size
> >> Undocumented Optimization Var(flag_cunroll_grow_size)
> >> EnabledBy(funroll-loops || fpeel-loops)
> >> ; ...
> >>
> > EnabledBy(funroll-loops || fpeel-loops) does not works as we expected:
> > "-funroll-loops -fno-peel-loops" turns off flag_cunroll_grow_size.
> >
> > Through "EnabledBy", a flag can be turned, and also can be turned off by
> > the "EnabledBy option", only if the flag is not specifed through commond
> > line.
> >
> >> and enable it at O3+.  AUTODETECT_VALUE doesn't make sense for
> >> an option not supposed to be set by users?
> >>
> >
> > global_options_set.x_flagxxx can be used to check if option is set by
> > user.  But it does not work well here neither, because we also care of
> > if the flag is override by OPTION_OPTIMIZATION_TABLE or
> > OPTION_OVERRIDE.
> >
> > AUTODETECT_VALUE(value is 2) is used for some flags like flag_web,
> > flag_rename_registers, flag_var_tracking, flag_tree_cselim...
> > And this way could be used to check if the flag is effective(on/off)
> > either explicit set by command line or implicit set through
> > OPTION_OVERRIDE or OPTION_OPTIMIZATION_TABLE.
> > So, I use it here.
>
>


Re: [stage1][PATCH] Lower VEC_COND_EXPR into internal functions.

2020-06-02 Thread Richard Biener via Gcc-patches
On Sat, May 30, 2020 at 3:08 PM Segher Boessenkool
 wrote:
>
> Hi!
>
> On Sat, May 30, 2020 at 08:15:55AM +0100, Richard Sandiford wrote:
> > Segher Boessenkool  writes:
> > >> Sure.  But the point is that FAILing isn't “explicitly allowed” for 
> > >> vcond*.
> > >> In fact it's the opposite.
>
> I disagree btw, and no one else has noticed for 16 years either.
>
> In general, almost all patterns can FAIL, and those that can not are
> simply because no one wrote fallback code.  Which means that all
> targets that need a fallback need to implement the same thing for
> themselves, which is just a waste and causes extra errors.
>
> So, "cannot FAIL" should be a temporary thing, and should change to
> "can FAIL" as soon as someone implements that, and never be changed
> back -- and it should be how almost everything is in the first place
> (and it still is, thankfully).
>
> > > It has FAILed on rs6000 since 2004.
> >
> > But that just means that the powerpc bug has been there since 2004,
> > assuming these FAILs can actually trigger in practice.  At that time,
> > the corresponding expand code was:
>
> I, and I think most other people, thought it was allowed to FAIL (and
> I still do).
>
> > rtx
> > expand_vec_cond_expr (tree vec_cond_expr, rtx target)
>
> [ snip ]
>
> So this was buggy.
>
> > i.e. no fallbacks, and no checking whether the expansion even
> > succeeded.  Since FAIL just causes the generator to return null,
> > and since emit_insn is a no-op for null insns, the effect for
> > FAILs was to emit no instructions and return an uninitialised
> > target register.
> >
> > The silent use of an uninitialised register was changed in 2011
> > to an ICE, via the introduction of expand_insn.
>
> Yeah, I ran into some of that in 2015, at least then not all of that
> was fixed.  That was some very basic insn I think, that really should
> never fail, a simple branch or something...  Was surprising though, a
> good reminder to always check return values :-)
>
> > The fact that we've had no code to handle the FAILs for 15+ years
> > without apparent problems makes it even more likely that the FAILs
> > never happen in practice.
>
> AltiVec can do a lot less than VSX (and VSX on p7 can do less than on
> p8, and that can do less than p9, etc.), so I am pretty certain it
> could fail for some cases.  Only up to not so very long ago these
> patterns were mainly (or only?) used via builtins, and the code for
> those handles all those cases already.
>
> > If you think the FAILs do trigger in practice, please provide an example.
>
> As I said before, that is completely beside the point.
>
> vcond is allowed to FAIL.  No pattern that can FAIL should ever be
> changed to not allow that anymore.  This would make no sense at all.

Fact is if the FAIL happens we ICE _currently_.

The patterns may not FAIL since otherwise the vectorizer has no way
to check whether the backend can code-generate and fail vectorization
if not.  FAIL is a _very_ unspecific fail and I guess the middle-end would
need to cope with a pattern (considered as may-FAIL) doing

  if (random() == 5)
FAIL;

specifically not FAIL when invoked during vectorization but FAIL when
re-invoked during RTL expansion just because some internal state
at the point of RTL expansion cannot be simulated at vectorization time.

Now the real issue here is of course that if vcond expansion may
FAIL then of course, based on your reasoning, vec_cmp expansion
may so as well.  In that case there's _no_ way to code generate
either of the two.  Well, spill, do elementwise compare (are cmp*
allowed to FAIL?), store, load would do the trick - but that defeats
the attempt to cost during vectorization.

So please be constructive.  Like, provide a testcase that ICEs
with the FAILs replaced by gcc_unreachable ().  Martin, may I suggest
to do this replacement and bootstrap/test?  I think it would be nice
to have testsuite coverage for the FAILs, and maybe we have that
already.

To get out of the deadlock here I'll approve a patch variant that
uses a separate internal function (thus without the static non-FAIL
checking) that preserves current behavior - thus ICE if the pattern
FAILs.  This change is then not a regression.  (please re-post for
appropriate approval)

Unless we come to a consensus in this discussion which seems
to dance around the latent vectorizer <-> rs6000 interface issue.

CCing David as other rs6000 maintainer (the subject isn't
specifically pointing to rs6000 so not sure if you followed the
discussion).

Thanks,
Richard.

>
> Segher


Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Jonathan Wakely via Gcc-patches
On Tue, 2 Jun 2020 at 12:05, Jonathan Wakely  wrote:
>
> On Tue, 2 Jun 2020 at 11:56, Gerald Pfeifer  wrote:
> >
> > On Mon, 1 Jun 2020, Jonathan Wakely via Gcc-patches wrote:
> > > The libstdc++ manual is written in Docbook XML, but we commit both the
> > > XML and generated HTML pages to Git. Sometimes a small XML file can
> > > result in dozens of mechanical changes to the generated HTML files,
> > > which we record in the ChangeLog as:
> > >
> > > * doc/html/*: Regenerated.
> > >
> > > With the new checks we need to name every generated file individually.
> > >
> > > If we add that directory to the ignored_prefixes list, we won't need
> > > to name them. But then the doc/html/* entry will give an error, and
> > > changes to the HTML files can be committed without any ChangeLog
> > > entry. Should we just stop mentioning the HTML in the ChangeLog?
> > >
> > > We could do something like the attached patch, but it seems overkill
> > > for this one special case.
> >
> > The change makes sense, but indeed it feels like a very specialized
> > case in a general script.
>
> Yes, that was my thought too.

On the other hand, the script is just meant to enforce our policies,
not dictate them. But on the gripping hand, if the policy can't be
checked simply, maybe it's a bad policy.


Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Jonathan Wakely via Gcc-patches
On Tue, 2 Jun 2020 at 07:44, Martin Liška  wrote:
>
> On 6/1/20 7:24 PM, Jonathan Wakely wrote:
> > On Mon, 25 May 2020 at 23:50, Jakub Jelinek via Gcc  
> > wrote:
> >>
> >> Hi!
> >>
> >> I've turned the strict mode of Martin Liška's hook changes,
> >> which means that from now on no commits to the trunk or release branches
> >> should be changing any ChangeLog files together with the other files,
> >> ChangeLog entry should be solely in the commit message.
> >> The DATESTAMP bumping script will be updating the ChangeLog files for you.
> >> If somebody makes a mistake in that, please wait 24 hours (at least until
> >> after 00:16 UTC after your commit) so that the script will create the
> >> ChangeLog entries, and afterwards it can be fixed by adjusting the 
> >> ChangeLog
> >> files.  But you can only touch the ChangeLog files in that case (and
> >> shouldn't write a ChangeLog entry for that in the commit message).
> >>
> >> If anything goes wrong, please let me, other RMs and Martin Liška know.
> >
> > The libstdc++ manual is written in Docbook XML, but we commit both the
> > XML and generated HTML pages to Git. Sometimes a small XML file can
> > result in dozens of mechanical changes to the generated HTML files,
> > which we record in the ChangeLog as:
> >
> >  * doc/html/*: Regenerated.
> >
> > With the new checks we need to name every generated file individually.
> >
> > If we add that directory to the ignored_prefixes list, we won't need
> > to name them. But then the doc/html/* entry will give an error, and
> > changes to the HTML files can be committed without any ChangeLog
> > entry. Should we just stop mentioning the HTML in the ChangeLog?
> >
> > We could do something like the attached patch, but it seems overkill
> > for this one special case.
> >
>
> The patch is fine to me.
> Can you please a pytest for the situation: 
> contrib/gcc-changelog/test_email.py ?

Tests now added (and passing) for the positive case and both negative cases.
commit 21f171e1f8af7e222a87523d7957174f985a1dd5
Author: Jonathan Wakely 
Date:   Mon Jun 1 18:33:48 2020 +0100

gcc-changelog: Allow "doc/html/*" in libstdc++ changelog

contrib/ChangeLog:

* gcc-changelog/git_commit.py (check_mentioned_files): Allow
wildcard path for generated files in libstdc++-v3/doc/html.
* gcc-changelog/test_email.py (test_libstdcxx_html_regenerated):
New test.
* gcc-changelog/test_patches.txt: Add patches.

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index b8c7f718c50..d723d890f31 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -501,6 +501,7 @@ class GitCommit:
 assert folder_count == len(self.changelog_entries)
 
 mentioned_files = set()
+libstdcxx_html_regenerated = False
 for entry in self.changelog_entries:
 if not entry.files:
 msg = 'ChangeLog must contain at least one file entry'
@@ -508,16 +509,33 @@ class GitCommit:
 assert not entry.folder.endswith('/')
 for file in entry.files:
 if not self.is_changelog_filename(file):
-mentioned_files.add(os.path.join(entry.folder, file))
+file = os.path.join(entry.folder, file)
+if file == 'libstdc++-v3/doc/html/*':
+libstdcxx_html_regenerated = True
+else:
+mentioned_files.add(file)
 
 cand = [x[0] for x in self.modified_files
 if not self.is_changelog_filename(x[0])]
 changed_files = set(cand)
+if libstdcxx_html_regenerated:
+libstdcxx_html_regenerated = False
+for c in changed_files:
+if c.startswith('libstdc++-v3/doc/html/'):
+libstdcxx_html_regenerated = True
+break
+if not libstdcxx_html_regenerated:
+self.errors.append(Error('No libstdc++ HTML changes found'))
+
 for file in sorted(mentioned_files - changed_files):
 self.errors.append(Error('file not changed in a patch', file))
 for file in sorted(changed_files - mentioned_files):
 if not self.in_ignored_location(file):
-if file in self.new_files:
+if file.startswith('libstdc++-v3/doc/html/'):
+if not libstdcxx_html_regenerated:
+msg = 'libstdc++ HTML changes not in ChangeLog'
+self.errors.append(Error(msg, file))
+elif file in self.new_files:
 changelog_location = self.get_changelog_by_path(file)
 # Python2: we cannot use next(filter(...))
 entries = filter(lambda x: x.folder == changelog_location,
diff --git a/contrib/gcc-changelog/test_email.py 
b/contrib/gcc-changelog/test_email.py
in

Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Jonathan Wakely via Gcc-patches
On Tue, 2 Jun 2020 at 11:56, Gerald Pfeifer  wrote:
>
> On Mon, 1 Jun 2020, Jonathan Wakely via Gcc-patches wrote:
> > The libstdc++ manual is written in Docbook XML, but we commit both the
> > XML and generated HTML pages to Git. Sometimes a small XML file can
> > result in dozens of mechanical changes to the generated HTML files,
> > which we record in the ChangeLog as:
> >
> > * doc/html/*: Regenerated.
> >
> > With the new checks we need to name every generated file individually.
> >
> > If we add that directory to the ignored_prefixes list, we won't need
> > to name them. But then the doc/html/* entry will give an error, and
> > changes to the HTML files can be committed without any ChangeLog
> > entry. Should we just stop mentioning the HTML in the ChangeLog?
> >
> > We could do something like the attached patch, but it seems overkill
> > for this one special case.
>
> The change makes sense, but indeed it feels like a very specialized
> case in a general script.

Yes, that was my thought too.

> Thinking out of the box (and admittedly with a dose of igorance, which
> means I am likely missing something): Is not keeping the libstdc++/doc
> HTML in Git a viable option?  Only creating that HTML as part of releases
> and maybe snapshots?

It gets sync'd to https://gcc.gnu.org/onlinedocs/libstdc++ nightly. We
could generate it nightly, but we'd need all the docbook stylesheets
etc. on sourceware. Or we could just generate it for snapshots (which
would still need the docbook stuff on the server) and only sync the
onlinedocs weekly from the snapshot.


Re: [PATCH, FORTRAN] matchexp.c: Fix possible false positive of uninitialized variable usage e

2020-06-02 Thread Stefan Schulze Frielinghaus via Gcc-patches
Ping

On Tue, May 19, 2020 at 08:33:24AM +0200, Stefan Schulze Frielinghaus via 
Gcc-patches wrote:
> While bootstrapping GCC on S/390 the following warning is raised:
> 
> gcc/fortran/matchexp.c: In function 'match match_level_5(gfc_expr**)':
> gcc/fortran/matchexp.c:401:18: error: 'e' may be used uninitialized in this 
> function [-Werror=maybe-uninitialized]
>   401 |gfc_free_expr (e);
>   |~~^~~
> /home/stefansf/devel/gcc/src/gcc/fortran/matchexp.c:353:19: note: 'e' was 
> declared here
>   353 |   gfc_expr *all, *e, *total;
>   |   ^
> 
> In function match_add_operand local variable e gets initialised via
> function match_ext_mult_operand if it returns MATCH_YES.  In case of the
> remaining two return values MATCH_NO and MATCH_ERROR the current
> function call is terminated before e may be used.  Thus this
> warning/error seems to be a false positive analogous to variable e in
> function match_mult_operand.  Interestingly this warning may be silenced
> by initialising variable e of either function match_level_5 or
> match_add_operand.  Since both variables are unrelated and correspond to
> different functions, this endorses the suspicion of a false positive.
> 
> Silenced the warning by initializing variable e of function
> match_add_operand.
> 
> Bootstrapped and regtested on S/390.  Ok for master?
> 
> gcc/fortran/ChangeLog:
> 
> 2020-05-04  Stefan Schulze Frielinghaus  
> 
>   * matchexp.c (match_add_operand): Initialize local variable e.
> ---
>  gcc/fortran/matchexp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/fortran/matchexp.c b/gcc/fortran/matchexp.c
> index 6479582d75b..25e675f3d45 100644
> --- a/gcc/fortran/matchexp.c
> +++ b/gcc/fortran/matchexp.c
> @@ -348,7 +348,9 @@ match_ext_mult_operand (gfc_expr **result)
>  static match
>  match_add_operand (gfc_expr **result)
>  {
> -  gfc_expr *all, *e, *total;
> +  /* Workaround -Wmaybe-uninitialized false positive by initializing variable
> + e.  */
> +  gfc_expr *all, *e = NULL, *total;
>locus where, old_loc;
>match m;
>gfc_intrinsic_op i;
> -- 
> 2.25.3
> 


Re: [IMPORTANT] ChangeLog related changes

2020-06-02 Thread Gerald Pfeifer
On Mon, 1 Jun 2020, Jonathan Wakely via Gcc-patches wrote:
> The libstdc++ manual is written in Docbook XML, but we commit both the
> XML and generated HTML pages to Git. Sometimes a small XML file can
> result in dozens of mechanical changes to the generated HTML files,
> which we record in the ChangeLog as:
> 
> * doc/html/*: Regenerated.
> 
> With the new checks we need to name every generated file individually.
> 
> If we add that directory to the ignored_prefixes list, we won't need
> to name them. But then the doc/html/* entry will give an error, and
> changes to the HTML files can be committed without any ChangeLog
> entry. Should we just stop mentioning the HTML in the ChangeLog?
> 
> We could do something like the attached patch, but it seems overkill
> for this one special case.

The change makes sense, but indeed it feels like a very specialized
case in a general script.

Thinking out of the box (and admittedly with a dose of igorance, which
means I am likely missing something): Is not keeping the libstdc++/doc 
HTML in Git a viable option?  Only creating that HTML as part of releases 
and maybe snapshots?

Gerald


[committed] d: Set up input_location before lowering Dsymbol AST

2020-06-02 Thread Iain Buclaw via Gcc-patches
Hi,

Like a similarly named function in the visitor class for statements,
this ensures that the current input_location is set to the correct
source file location of the decl.

It is likely that there are a number of cases where declarations have
ended up with no location without this.

Bootstrapped and regression tested on x86_64-linux-gnu, and committed to
mainline.

Regards
Iain


gcc/d/ChangeLog:

* decl.cc (DeclVisitor::build_dsymbol): New function.
(DeclVisitor::visit (TupleDeclaration *)): Use build_dsymbol to
traverse AST instead of accept.
(DeclVisitor::visit (AttribDeclaration *)): Likewise.
(DeclVisitor::visit (Nspace *)): Likewise.
(DeclVisitor::visit (TemplateDeclaration *)): Likewise.
(DeclVisitor::visit (TemplateInstance *)): Likewise.
(DeclVisitor::visit (TemplateMixin *)): Likewise.
(DeclVisitor::visit (StructDeclaration *)): Likewise.
(DeclVisitor::visit (ClassDeclaration *)): Likewise.
(DeclVisitor::visit (InterfaceDeclaration *)): Likewise.
(DeclVisitor::visit (VarDeclaration *)): Likewise.
(build_decl_tree): Likewise.
---
 gcc/d/decl.cc | 67 +--
 1 file changed, 28 insertions(+), 39 deletions(-)

diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index 845fc5cf877..7dab5c7e88c 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -131,6 +131,17 @@ public:
 this->in_version_unittest_ = false;
   }
 
+  /* Helper for generating code for the dsymbol AST class D.
+ Sets up the location of the symbol before lowering.  */
+
+  void build_dsymbol (Dsymbol *d)
+  {
+location_t saved_location = input_location;
+input_location = make_location_t (d->loc);
+d->accept (this);
+input_location = saved_location;
+  }
+
   /* This should be overridden by each declaration class.  */
 
   void visit (Dsymbol *)
@@ -212,7 +223,7 @@ public:
  {
Declaration *d = ((DsymbolExp *) o)->s->isDeclaration ();
if (d)
- d->accept (this);
+ this->build_dsymbol (d);
  }
   }
   }
@@ -227,10 +238,7 @@ public:
   return;
 
 for (size_t i = 0; i < ds->dim; i++)
-  {
-   Dsymbol *s = (*ds)[i];
-   s->accept (this);
-  }
+  this->build_dsymbol ((*ds)[i]);
   }
 
   /* Pragmas are a way to pass special information to the compiler and to add
@@ -278,10 +286,7 @@ public:
   return;
 
 for (size_t i = 0; i < d->members->dim; i++)
-  {
-   Dsymbol *s = (*d->members)[i];
-   s->accept (this);
-  }
+  this->build_dsymbol ((*d->members)[i]);
   }
 
   /* Templates are D's approach to generic programming.  They have no members
@@ -315,7 +320,7 @@ public:
 /* Return type is instantiated from this template declaration, walk over
all members of the instance.  */
 if (ti && ti->tempdecl == d)
-  ti->accept (this);
+  this->build_dsymbol (ti);
   }
 
   /* Walk over all members in the instantiated template.  */
@@ -329,10 +334,7 @@ public:
   return;
 
 for (size_t i = 0; i < d->members->dim; i++)
-  {
-   Dsymbol *s = (*d->members)[i];
-   s->accept (this);
-  }
+  this->build_dsymbol ((*d->members)[i]);
   }
 
   /* Walk over all members in the mixin template scope.  */
@@ -343,10 +345,7 @@ public:
   return;
 
 for (size_t i = 0; i < d->members->dim; i++)
-  {
-   Dsymbol *s = (*d->members)[i];
-   s->accept (this);
-  }
+  this->build_dsymbol ((*d->members)[i]);
   }
 
   /* Write out compiler generated TypeInfo, initializer and functions for the
@@ -391,24 +390,20 @@ public:
 
 d_finish_decl (d->sinit);
 
-/* Put out the members.  */
+/* Put out the members.  There might be static constructors in the members
+   list, and they cannot be put in separate object files.  */
 for (size_t i = 0; i < d->members->dim; i++)
-  {
-   Dsymbol *member = (*d->members)[i];
-   /* There might be static ctors in the members, and they cannot
-  be put in separate object files.  */
-   member->accept (this);
-  }
+  this->build_dsymbol ((*d->members)[i]);
 
 /* Put out xopEquals, xopCmp and xopHash.  */
 if (d->xeq && d->xeq != d->xerreq)
-  d->xeq->accept (this);
+  this->build_dsymbol (d->xeq);
 
 if (d->xcmp && d->xcmp != d->xerrcmp)
-  d->xcmp->accept (this);
+  this->build_dsymbol (d->xcmp);
 
 if (d->xhash)
-  d->xhash->accept (this);
+  this->build_dsymbol (d->xhash);
 
 d->semanticRun = PASSobj;
   }
@@ -503,10 +498,7 @@ public:
 
 /* Put out the members.  */
 for (size_t i = 0; i < d->members->dim; i++)
-  {
-   Dsymbol *member = (*d->members)[i];
-   member->accept (this);
-  }
+  this->build_dsymbol ((*d->members)[i]);
 
 /* If something goes wrong during final semantic pass, don't bother with
the rest as we may have incomplete info.  */
@@ -582

Re: [PATCH] rs6000: Use REAL_TYPE to copy when block move array in structure[PR65421]

2020-06-02 Thread Richard Biener via Gcc-patches
On Tue, Jun 2, 2020 at 11:43 AM Xionghu Luo via Gcc-patches
 wrote:
>
> Double array in structure as function arguments or return value is accessed
> by BLKmode, they are stored to stack and load from stack with redundant
> conversion from DF->DI->DF.  This patch checks the homogeneous type and
> use the actual element type to do block move to by pass the conversions.

Is it correct to do this when the actual data in the place is DImode?
We generally
avoid using any floating point modes here because the DImode data could
for example correspond to a signalling NaN or a non-canonical NaN.

What makes a case with, say, struct { double a; long b; } different?

Richard.

> gcc/ChangeLog:
>
> 2020-06-02  Xionghu Luo  
>
> PR target/65421
> * config/rs6000/rs6000-string.c (expand_block_move): Use
> elt_mode to copy when homogeneous REAL_TYPE.
>
> gcc/testsuite/ChangeLog:
>
> 2020-06-02  Xionghu Luo  
>
> PR target/65421
> * gcc.target/powerpc/pr65421.c: New test.
> ---
>  gcc/config/rs6000/rs6000-string.c  | 15 ++-
>  gcc/testsuite/gcc.target/powerpc/pr65421.c | 17 +
>  2 files changed, 31 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
>
> diff --git a/gcc/config/rs6000/rs6000-string.c 
> b/gcc/config/rs6000/rs6000-string.c
> index fe7177f10fd..ea217840d88 100644
> --- a/gcc/config/rs6000/rs6000-string.c
> +++ b/gcc/config/rs6000/rs6000-string.c
> @@ -37,6 +37,7 @@
>  #include "target.h"
>  #include "profile-count.h"
>  #include "predict.h"
> +#include "rs6000-internal.h"
>
>  /* Expand a block clear operation, and return 1 if successful.  Return 0
> if we should let the compiler generate normal code.
> @@ -2733,6 +2734,7 @@ expand_block_move (rtx operands[], bool might_overlap)
>rtx loads[MAX_MOVE_REG];
>rtx stores[MAX_MOVE_REG];
>int num_reg = 0;
> +  machine_mode elt_mode = DImode;
>
>/* If this is not a fixed size move, just call memcpy */
>if (! constp)
> @@ -2750,6 +2752,17 @@ expand_block_move (rtx operands[], bool might_overlap)
>if (bytes > rs6000_block_move_inline_limit)
>  return 0;
>
> +  tree type = TREE_TYPE (MEM_EXPR (orig_dest));
> +  if (TREE_CODE (type) == RECORD_TYPE
> +  && rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type, NULL,
> +   NULL))
> +{
> +  tree field_type = TREE_TYPE (first_field (type));
> +  if (field_type && TREE_CODE (field_type) == ARRAY_TYPE
> + && TREE_CODE (TREE_TYPE (field_type)) == REAL_TYPE)
> +   elt_mode = TYPE_MODE (TREE_TYPE (field_type));
> +}
> +
>for (offset = 0; bytes > 0; offset += move_bytes, bytes -= move_bytes)
>  {
>union {
> @@ -2771,7 +2784,7 @@ expand_block_move (rtx operands[], bool might_overlap)
>&& (align >= 64 || !STRICT_ALIGNMENT))
> {
>   move_bytes = 8;
> - mode = DImode;
> + mode = elt_mode;
>   gen_func.mov = gen_movdi;
>   if (offset == 0 && align < 64)
> {
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c 
> b/gcc/testsuite/gcc.target/powerpc/pr65421.c
> new file mode 100644
> index 000..ec8f4824de5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +typedef struct
> +{
> +  double a[4];
> +} A;
> +
> +A
> +foo (const A *a)
> +{
> +  return *a;
> +}
> +
> +/* { dg-final { scan-assembler-not   {\mld\M}} } */
> +/* { dg-final { scan-assembler-not   {\mstd\M}   } } */
> +/* { dg-final { scan-assembler-times {\mlfd\M}  4 } } */
> --
> 2.21.0.777.g83232e3864
>


Re: [PATCH] contrib: Improve error text for overlong ChangeLog lines

2020-06-02 Thread Jonathan Wakely via Gcc-patches

On 02/06/20 11:18 +0100, Jonathan Wakely wrote:

On 02/06/20 11:37 +0200, Martin Liška wrote:

On 6/2/20 11:23 AM, Jonathan Wakely wrote:

On 02/06/20 10:22 +0100, Jonathan Wakely wrote:

This error is wrong, the line is what exceeds LINE_LIMIT characters, the
limit doesn't exceed itself.

contrib/ChangeLog:

* gcc-changelog/git_commit.py (GitCommit.parse_changelog): Fix
* grammar.

OK for master?




commit ba4ddb1023844202f84a32275e451f9c7a0bb7b7
Author: Jonathan Wakely 
Date:   Tue Jun 2 10:19:42 2020 +0100

   contrib: Improve error text for overlong ChangeLog lines


I used "contrib:" as the component tag here. We should probably decide
if that's right, or if it should be "git_commit" or "gcc-changelog"
(both have been used for recent commits).



I'm not much used to the components. Btw. do we have it documented somewhere?
I would use gcc-changelog.


Done, thanks.


And this fixes the test that now fails (sorry, I didn't know about the
tests). Committed to master.


commit 54a0eb7fa5f504bf26be03060b779f756d998be5
Author: Jonathan Wakely 
Date:   Tue Jun 2 11:41:28 2020 +0100

gcc-changelog: Update test to match new error text

contrib/ChangeLog:

* gcc-changelog/test_email.py (TestGccChangelog.test_long_lines):
Update expected error message.

diff --git a/contrib/gcc-changelog/test_email.py b/contrib/gcc-changelog/test_email.py
index 158eb651367..2465669786e 100755
--- a/contrib/gcc-changelog/test_email.py
+++ b/contrib/gcc-changelog/test_email.py
@@ -176,7 +176,7 @@ class TestGccChangelog(unittest.TestCase):
 def test_long_lines(self):
 email = self.get_git_email('long-lines.patch')
 assert len(email.errors) == 1
-assert email.errors[0].message == 'line limit exceeds 100 characters'
+assert email.errors[0].message == 'line exceeds 100 character limit'
 
 def test_new_files(self):
 email = self.from_patch_glob('0030')


Re: [PATCH] S/390: Emit vector alignment hints for z13

2020-06-02 Thread Stefan Schulze Frielinghaus via Gcc-patches
On Fri, May 29, 2020 at 01:57:30PM +0200, Andreas Krebbel wrote:
> On 28.05.20 20:24, Stefan Schulze Frielinghaus wrote:
> > Vector alignment hints are fully supported since z14.  On z13 alignment
> > hints have no effect, however, instructions with alignment hints are
> > still legal.  Thus, emit alignment hints also for z13 targets so that if
> > the binary is actually run on a z14 or later it benefits from such
> > hints.
> 
> More precisely the alignment hints don't have any effect before z15, are 
> described already in the
> z14 architecture but actually are also accepted by z13 hardware.

True, I removed the confusing comment and pushed.

Thanks, Stefan

> 
> > 
> > Note, this requires gas including commit f687f5f563 of the binutils
> > repository.
> > 
> > gcc/ChangeLog:
> > 
> > * config/s390/s390.c (print_operand): Emit vector alignment
> > hints for z13.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/vector/align-1.c: Change target architecture
> > to z13.
> > * gcc.target/s390/vector/align-2.c: Change target architecture
> > to z13.
> 
> Ok. Thanks!
> 
> Andreas
> 
> > ---
> >  gcc/config/s390/s390.c | 7 ++-
> >  gcc/testsuite/gcc.target/s390/vector/align-1.c | 2 +-
> >  gcc/testsuite/gcc.target/s390/vector/align-2.c | 2 +-
> >  3 files changed, 8 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> > index 4de3129f88e..b5fd5a2f3ed 100644
> > --- a/gcc/config/s390/s390.c
> > +++ b/gcc/config/s390/s390.c
> > @@ -7854,7 +7854,12 @@ print_operand (FILE *file, rtx x, int code)
> >  {
> >  case 'A':
> >  #ifdef HAVE_AS_VECTOR_LOADSTORE_ALIGNMENT_HINTS
> > -  if (TARGET_Z14 && MEM_P (x))
> > +  /* Vector alignment hints are fully supported since z14.  On z13
> > +alignment hints have no effect, however, instructions with alignment
> > +hints are still legal.  Thus, emit alignment hints also for z13
> > +targets so that if the binary is actually run on a z14 or later it
> > +benefits from such hints.  */
> > +  if (TARGET_Z13 && MEM_P (x))
> > {
> >   if (MEM_ALIGN (x) >= 128)
> > fprintf (file, ",4");
> > diff --git a/gcc/testsuite/gcc.target/s390/vector/align-1.c 
> > b/gcc/testsuite/gcc.target/s390/vector/align-1.c
> > index ccad22a..6997af2ddcd 100644
> > --- a/gcc/testsuite/gcc.target/s390/vector/align-1.c
> > +++ b/gcc/testsuite/gcc.target/s390/vector/align-1.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O3 -mzarch -march=z14" } */
> > +/* { dg-options "-O3 -mzarch -march=z13" } */
> >  
> >  /* The user alignment ends up in DECL_ALIGN of the VAR_DECL and is
> > currently ignored if it is smaller than the alignment of the type.
> > diff --git a/gcc/testsuite/gcc.target/s390/vector/align-2.c 
> > b/gcc/testsuite/gcc.target/s390/vector/align-2.c
> > index e4e2fba6a58..00e09d3eadb 100644
> > --- a/gcc/testsuite/gcc.target/s390/vector/align-2.c
> > +++ b/gcc/testsuite/gcc.target/s390/vector/align-2.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O3 -mzarch -march=z14" } */
> > +/* { dg-options "-O3 -mzarch -march=z13" } */
> >  
> >  /* The user alignment ends up in TYPE_ALIGN of the type of the
> > VAR_DECL.  */
> > 
> 


Re: [PATCH] contrib: Improve error text for overlong ChangeLog lines

2020-06-02 Thread Jonathan Wakely via Gcc-patches

On 02/06/20 11:37 +0200, Martin Liška wrote:

On 6/2/20 11:23 AM, Jonathan Wakely wrote:

On 02/06/20 10:22 +0100, Jonathan Wakely wrote:

This error is wrong, the line is what exceeds LINE_LIMIT characters, the
limit doesn't exceed itself.

contrib/ChangeLog:

* gcc-changelog/git_commit.py (GitCommit.parse_changelog): Fix
* grammar.

OK for master?




commit ba4ddb1023844202f84a32275e451f9c7a0bb7b7
Author: Jonathan Wakely 
Date:   Tue Jun 2 10:19:42 2020 +0100

   contrib: Improve error text for overlong ChangeLog lines


I used "contrib:" as the component tag here. We should probably decide
if that's right, or if it should be "git_commit" or "gcc-changelog"
(both have been used for recent commits).



I'm not much used to the components. Btw. do we have it documented somewhere?
I would use gcc-changelog.


Done, thanks.



Re: openmp: Add basic library allocator support

2020-06-02 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 02, 2020 at 11:26:37AM +0200, Sebastian Huber wrote:
> with this patch I get the following error for target arm-rtems6:
> 
> ../../../gnu-mirror-gcc-86b14bb/libgomp/allocator.c: In function 'omp_free':
> ../../../gnu-mirror-gcc-86b14bb/libgomp/allocator.c:351:42: error: 'struct
> omp_mem_header' has no member named 'new_size'
>   351 |    allocator_data->used_pool_size -= data->new_size;
>       |                                 
>          ^~

Oops, sorry, fixed thusly, tested also with #undef HAVE_SYNC_BUILTINS early
in the file, committed to trunk.

2020-06-02  Jakub Jelinek  

* allocator.c (omp_free): Fix up build if HAVE_SYNC_BUILTINS is not
defined.

--- libgomp/allocator.c
+++ libgomp/allocator.c
@@ -348,7 +348,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator)
  MEMMODEL_RELAXED);
 #else
  gomp_mutex_lock (&allocator_data->lock);
- allocator_data->used_pool_size -= data->new_size;
+ allocator_data->used_pool_size -= data->size;
  gomp_mutex_unlock (&allocator_data->lock);
 #endif
}


Jakub



[PATCH] rs6000: Use REAL_TYPE to copy when block move array in structure[PR65421]

2020-06-02 Thread Xionghu Luo via Gcc-patches
Double array in structure as function arguments or return value is accessed
by BLKmode, they are stored to stack and load from stack with redundant
conversion from DF->DI->DF.  This patch checks the homogeneous type and
use the actual element type to do block move to by pass the conversions.

gcc/ChangeLog:

2020-06-02  Xionghu Luo  

PR target/65421
* config/rs6000/rs6000-string.c (expand_block_move): Use
elt_mode to copy when homogeneous REAL_TYPE.

gcc/testsuite/ChangeLog:

2020-06-02  Xionghu Luo  

PR target/65421
* gcc.target/powerpc/pr65421.c: New test.
---
 gcc/config/rs6000/rs6000-string.c  | 15 ++-
 gcc/testsuite/gcc.target/powerpc/pr65421.c | 17 +
 2 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c

diff --git a/gcc/config/rs6000/rs6000-string.c 
b/gcc/config/rs6000/rs6000-string.c
index fe7177f10fd..ea217840d88 100644
--- a/gcc/config/rs6000/rs6000-string.c
+++ b/gcc/config/rs6000/rs6000-string.c
@@ -37,6 +37,7 @@
 #include "target.h"
 #include "profile-count.h"
 #include "predict.h"
+#include "rs6000-internal.h"
 
 /* Expand a block clear operation, and return 1 if successful.  Return 0
if we should let the compiler generate normal code.
@@ -2733,6 +2734,7 @@ expand_block_move (rtx operands[], bool might_overlap)
   rtx loads[MAX_MOVE_REG];
   rtx stores[MAX_MOVE_REG];
   int num_reg = 0;
+  machine_mode elt_mode = DImode;
 
   /* If this is not a fixed size move, just call memcpy */
   if (! constp)
@@ -2750,6 +2752,17 @@ expand_block_move (rtx operands[], bool might_overlap)
   if (bytes > rs6000_block_move_inline_limit)
 return 0;
 
+  tree type = TREE_TYPE (MEM_EXPR (orig_dest));
+  if (TREE_CODE (type) == RECORD_TYPE
+  && rs6000_discover_homogeneous_aggregate (TYPE_MODE (type), type, NULL,
+   NULL))
+{
+  tree field_type = TREE_TYPE (first_field (type));
+  if (field_type && TREE_CODE (field_type) == ARRAY_TYPE
+ && TREE_CODE (TREE_TYPE (field_type)) == REAL_TYPE)
+   elt_mode = TYPE_MODE (TREE_TYPE (field_type));
+}
+
   for (offset = 0; bytes > 0; offset += move_bytes, bytes -= move_bytes)
 {
   union {
@@ -2771,7 +2784,7 @@ expand_block_move (rtx operands[], bool might_overlap)
   && (align >= 64 || !STRICT_ALIGNMENT))
{
  move_bytes = 8;
- mode = DImode;
+ mode = elt_mode;
  gen_func.mov = gen_movdi;
  if (offset == 0 && align < 64)
{
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c 
b/gcc/testsuite/gcc.target/powerpc/pr65421.c
new file mode 100644
index 000..ec8f4824de5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+typedef struct
+{
+  double a[4];
+} A;
+
+A
+foo (const A *a)
+{
+  return *a;
+}
+
+/* { dg-final { scan-assembler-not   {\mld\M}} } */
+/* { dg-final { scan-assembler-not   {\mstd\M}   } } */
+/* { dg-final { scan-assembler-times {\mlfd\M}  4 } } */
-- 
2.21.0.777.g83232e3864



Re: [PATCH] contrib: Improve error text for overlong ChangeLog lines

2020-06-02 Thread Martin Liška

On 6/2/20 11:23 AM, Jonathan Wakely wrote:

On 02/06/20 10:22 +0100, Jonathan Wakely wrote:

This error is wrong, the line is what exceeds LINE_LIMIT characters, the
limit doesn't exceed itself.

contrib/ChangeLog:

* gcc-changelog/git_commit.py (GitCommit.parse_changelog): Fix
* grammar.

OK for master?




commit ba4ddb1023844202f84a32275e451f9c7a0bb7b7
Author: Jonathan Wakely 
Date:   Tue Jun 2 10:19:42 2020 +0100

   contrib: Improve error text for overlong ChangeLog lines


I used "contrib:" as the component tag here. We should probably decide
if that's right, or if it should be "git_commit" or "gcc-changelog"
(both have been used for recent commits).



I'm not much used to the components. Btw. do we have it documented somewhere?
I would use gcc-changelog.

Martin



Re: [PATCH] contrib: Improve error text for overlong ChangeLog lines

2020-06-02 Thread Martin Liška

On 6/2/20 11:22 AM, Jonathan Wakely wrote:

OK for master?


You're right.
Please install it.

Martin


Re: [PATCH] Optimize and+or+sub into xor+not (PR94882)

2020-06-02 Thread Richard Biener via Gcc-patches
On Fri, May 29, 2020 at 5:56 AM Naveen Hurugalawadi via Gcc-patches
 wrote:
>
> Hi,
>
> Please find attached the patch that addresses PR94882.
>
> Bootstrapped and regression tested on x86_64-pc-linux-gnu.

Is the pattern correct for saturating arithmetic?  Some related
patterns test !TYPE_SATURATING.   I guess that the
pattern is still correct for

(unsigned)((x & y) - (x | y)) + -1u

which would be one more variant to test.  That is, the need to
spell out three different cases (do they all appear for signed
integer?) hints at lack of canonicalization (for the fear of
undefined overflow I guess).  Also raises the question
whether the pattern should be guarded with !TYPE_OVERFLOW_SANITIZED
and/or !TYPE_OVERFLOW_TRAPS

+ (minus (bit_and @0 @1) (plus:c (bit_ior @0 @1) integer_onep))

the plus does not required :c (constants are canonicalized last)

+ (bit_not (bit_xor @0 @1)))

can you add some cases with unsigned types and types
smaller than int (thus with integral promotion applied)?

Thanks,
Richard.

> Thanks,
> Naveen
>
> match.pd: (x & y) - (x | y) - 1 -> ~(x ^ y) simplification [PR94882]
>
> 2029-05-04  Naveen H S  
>
> PR tree-optimization/94882
>
> * match.pd (x & y) - (x | y) - 1 -> ~(x ^ y): New simplification.
>
> * gcc.dg/tree-ssa/pr94882.c: New test.
> * gcc.dg/tree-ssa/pr94882-1.c: New test.


Re: openmp: Add basic library allocator support

2020-06-02 Thread Sebastian Huber

Hello,

On 19/05/2020 10:24, Jakub Jelinek via Gcc-patches wrote:

+  gomp_mutex_lock (&allocator_data->lock);
+  if (__builtin_add_overflow (allocator_data->used_pool_size, new_size,
+ &used_pool_size)
+ || used_pool_size > allocator_data->pool_size)
+   {
+ gomp_mutex_unlock (&allocator_data->lock);
+ goto fail;
+   }
+  allocator_data->used_pool_size = used_pool_size;
+  gomp_mutex_unlock (&allocator_data->lock);
+#endif
+  ptr = malloc (new_size);
+  if (ptr == NULL)
+   {
+#ifdef HAVE_SYNC_BUILTINS
+ __atomic_add_fetch (&allocator_data->used_pool_size, -new_size,
+ MEMMODEL_RELAXED);
+#else
+ gomp_mutex_lock (&allocator_data->lock);
+ allocator_data->used_pool_size -= new_size;
+ gomp_mutex_unlock (&allocator_data->lock);
+#endif


with this patch I get the following error for target arm-rtems6:

../../../gnu-mirror-gcc-86b14bb/libgomp/allocator.c: In function 'omp_free':
../../../gnu-mirror-gcc-86b14bb/libgomp/allocator.c:351:42: error: 
'struct omp_mem_header' has no member named 'new_size'

  351 |    allocator_data->used_pool_size -= data->new_size;
  |  ^~



Re: [PATCH] contrib: Improve error text for overlong ChangeLog lines

2020-06-02 Thread Jonathan Wakely via Gcc-patches

On 02/06/20 10:22 +0100, Jonathan Wakely wrote:

This error is wrong, the line is what exceeds LINE_LIMIT characters, the
limit doesn't exceed itself.

contrib/ChangeLog:

* gcc-changelog/git_commit.py (GitCommit.parse_changelog): Fix
* grammar.

OK for master?




commit ba4ddb1023844202f84a32275e451f9c7a0bb7b7
Author: Jonathan Wakely 
Date:   Tue Jun 2 10:19:42 2020 +0100

   contrib: Improve error text for overlong ChangeLog lines


I used "contrib:" as the component tag here. We should probably decide
if that's right, or if it should be "git_commit" or "gcc-changelog"
(both have been used for recent commits).




[PATCH] contrib: Improve error text for overlong ChangeLog lines

2020-06-02 Thread Jonathan Wakely via Gcc-patches
This error is wrong, the line is what exceeds LINE_LIMIT characters, the
limit doesn't exceed itself.

contrib/ChangeLog:

* gcc-changelog/git_commit.py (GitCommit.parse_changelog): Fix
* grammar.

OK for master?

commit ba4ddb1023844202f84a32275e451f9c7a0bb7b7
Author: Jonathan Wakely 
Date:   Tue Jun 2 10:19:42 2020 +0100

contrib: Improve error text for overlong ChangeLog lines

This error is wrong, the line is what exceeds LINE_LIMIT characters, the
limit doesn't exceed itself.

contrib/ChangeLog:

* gcc-changelog/git_commit.py (GitCommit.parse_changelog): Fix
* grammar.

diff --git a/contrib/gcc-changelog/git_commit.py 
b/contrib/gcc-changelog/git_commit.py
index 4f82b58f64b..4f9c37cf0de 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -346,7 +346,7 @@ class GitCommit:
 if line != line.rstrip():
 self.errors.append(Error('trailing whitespace', line))
 if len(line.replace('\t', ' ' * TAB_WIDTH)) > LINE_LIMIT:
-self.errors.append(Error('line limit exceeds %d characters'
+self.errors.append(Error('line exceeds %d character limit'
  % LINE_LIMIT, line))
 m = changelog_regex.match(line)
 if m:


Re: [PATCH] contrib: Make prepare-commit-msg hook safe for older branches

2020-06-02 Thread Martin Liška

On 6/2/20 11:16 AM, Jonathan Wakely wrote:

OK for master?


I like the patch.

Martin


[PATCH] contrib: Make prepare-commit-msg hook safe for older branches

2020-06-02 Thread Jonathan Wakely via Gcc-patches
If a user installs this script as .git/hooks/prepare-commit-msg and then
works on an old branch which doesn't have the mklog.py script, trying to
commit will fail with an error like:

environment: /.../gcc/contrib/mklog.py: No such file or directory

This makes it exit cleanly so it's possible to commit.

contrib/ChangeLog:

* prepare-commit-msg: Do nothing if the mklog.py script isn't
present.

OK for master?


commit b4645b28abbed1c55e330071266f9e31a7816c28
Author: Jonathan Wakely 
Date:   Tue Jun 2 10:10:19 2020 +0100

contrib: Make prepare-commit-msg hook safe for older branches

If a user installs this script as .git/hooks/prepare-commit-msg and then
works on an old branch which doesn't have the mklog.py script, trying to
commit will fail with an error like:

environment: /.../gcc/contrib/mklog.py: No such file or directory

This makes it exit cleanly so it's possible to commit.

contrib/ChangeLog:

* prepare-commit-msg: Do nothing if the mklog.py script isn't
present.

diff --git a/contrib/prepare-commit-msg b/contrib/prepare-commit-msg
index fd59bfbcf88..24f0783aae2 100755
--- a/contrib/prepare-commit-msg
+++ b/contrib/prepare-commit-msg
@@ -23,6 +23,9 @@ COMMIT_MSG_FILE=$1
 COMMIT_SOURCE=$2
 SHA1=$3
 
+# We might be on a branch before the file was added.
+if ! [ -x contrib/mklog.py ]; then exit 0; fi
+
 # Can't do anything if $COMMIT_MSG_FILE isn't a file.
 if ! [ -f "$COMMIT_MSG_FILE" ]; then exit 0; fi
 


Re: [stage1][PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2020-06-02 Thread Martin Liška

PING^2

On 5/15/20 11:58 AM, Martin Liška wrote:

We're in stage1: PING^1

On 4/3/20 8:15 PM, Egeyar Bagcioglu wrote:



On 3/18/20 10:05 AM, Martin Liška wrote:

On 3/17/20 7:43 PM, Egeyar Bagcioglu wrote:

Hi Martin,

I like the patch. It definitely serves our purposes at Oracle and provides 
another way to do what my previous patches did as well.

1) It keeps the backwards compatibility regarding -frecord-gcc-switches; 
therefore, removes my related doubts about your previous patch.

2) It still makes use of -frecord-gcc-switches. The new option is only to 
control the format. This addresses some previous objections to having a new 
option doing something similar. Now the new option controls the behaviour of 
the existing one and that behaviour can be further extended.

3) It uses an environment variable as Jakub suggested.

The patch looks good and I confirm that it works for our purposes.


Hello.

Thank you for the support.



Having said that, I have to ask for recognition in this patch for my and my 
company's contributions. Can you please keep my name and my work email in the 
changelog and in the commit message?


Sure, sorry I forgot.


Hi Martin,

I noticed that some comments in the patch were still referring to 
--record-gcc-command-line, the option I suggested earlier. I updated those 
comments to mention -frecord-gcc-switches-format instead and also added my name 
to the patch as you agreed above. I attached the updated patch. We are starting 
to use this patch in the specific domain where we need its functionality.

Regards
Egeyar




Martin



Thanks
Egeyar



On 3/17/20 2:53 PM, Martin Liška wrote:

Hi.

I'm sending enhanced patch that makes the following changes:
- a new option -frecord-gcc-switches-format is added; the option
  selects format (processed, driver) for all options that record
  GCC command line
- Dwarf gen_produce_string is now used in -fverbose-asm
- The .s file is affected in the following way:

BEFORE:

# GNU C17 (SUSE Linux) version 9.2.1 20200128 [revision 
83f65674e78d97d27537361de1a9d74067ff228d] (x86_64-suse-linux)
#    compiled by GNU C version 9.2.1 20200128 [revision 
83f65674e78d97d27537361de1a9d74067ff228d], GMP version 6.2.0, MPFR version 
4.0.2, MPC version 1.1.0, isl version isl-0.22.1-GMP

# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed:  -fpreprocessed test.i -march=znver1 -mmmx -mno-3dnow
# -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -msha
# -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mno-sgx
# -mbmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1
# -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mrdseed -mprfchw
# -madx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd
# -mno-avx512pf -mno-prefetchwt1 -mclflushopt -mxsavec -mxsaves
# -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi
# -mno-avx5124fmaps -mno-avx5124vnniw -mno-clwb -mmwaitx -mclzero -mno-pku
# -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni
# -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-movdiri -mno-movdir64b
# -mno-waitpkg -mno-cldemote -mno-ptwrite --param l1-cache-size=32
# --param l1-cache-line-size=64 --param l2-cache-size=512 -mtune=znver1
# -grecord-gcc-switches -g -fverbose-asm -frecord-gcc-switches
# options enabled:  -faggressive-loop-optimizations -fassume-phsa
# -fasynchronous-unwind-tables -fauto-inc-dec -fcommon
# -fdelete-null-pointer-checks -fdwarf2-cfi-asm -fearly-inlining
# -feliminate-unused-debug-types -ffp-int-builtin-inexact -ffunction-cse
# -fgcse-lm -fgnu-runtime -fgnu-unique -fident -finline-atomics
# -fipa-stack-alignment -fira-hoist-pressure -fira-share-save-slots
# -fira-share-spill-slots -fivopts -fkeep-static-consts
# -fleading-underscore -flifetime-dse -flto-odr-type-merging -fmath-errno
# -fmerge-debug-strings -fpeephole -fplt -fprefetch-loop-arrays
# -frecord-gcc-switches -freg-struct-return -fsched-critical-path-heuristic
# -fsched-dep-count-heuristic -fsched-group-heuristic -fsched-interblock
# -fsched-last-insn-heuristic -fsched-rank-heuristic -fsched-spec
# -fsched-spec-insn-heuristic -fsched-stalled-insns-dep -fschedule-fusion
# -fsemantic-interposition -fshow-column -fshrink-wrap-separate
# -fsigned-zeros -fsplit-ivs-in-unroller -fssa-backprop -fstdarg-opt
# -fstrict-volatile-bitfields -fsync-libcalls -ftrapping-math -ftree-cselim
# -ftree-forwprop -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon
# -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop
# -ftree-reassoc -ftree-scev-cprop -funit-at-a-time -funwind-tables
# -fverbose-asm -fzero-initialized-in-bss -m128bit-long-double -m64 -m80387
# -mabm -madx -maes -malign-stringops -mavx -mavx2
# -mavx256-split-unaligned-store -mbmi -mbmi2 -mclflushopt -mclzero -mcx16
# -mf16c -mfancy-math-387 -mfma -mfp-ret-in-387 -mfsgsbase -mfxsr -mglibc
# -mieee-fp -mlong-double-80 -mlzcnt -mmmx -mmovbe -mmwaitx -mpclmul
# -mp

Re: [PATCH] contrib: Add gdc.test to list of ignored prefixes

2020-06-02 Thread Martin Liška

OK?


Yes.


[PATCH] Do not copy NULL string with memcpy.

2020-06-02 Thread Martin Liška

The problem happens when we generate temp file
for .res file. Tested locally with the problematic
options.

Ready for master?
Thanks,
Martin

gcc/ChangeLog:

PR driver/95456
* gcc.c (do_spec_1): Append to tempfile only when
input_basename != NULL.
---
 gcc/gcc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index e2362175f40..6dea22c0669 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -6031,7 +6031,7 @@ do_spec_1 (const char *spec, int inswitch, const char 
*soft_matched_part)
tmp = (char *) alloca (temp_filename_length);
if (dumpdir_length)
  memcpy (tmp, dumpdir, dumpdir_length);
-   if (!outbase_length)
+   if (!outbase_length && input_basename != NULL)
  memcpy (tmp + dumpdir_length, input_basename,
  basename_length);
else
--
2.26.2



[PATCH 5/7 v3] vect: Support vector load/store with length in vectorizer

2020-06-02 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2020/5/29 下午4:32, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> on 2020/5/27 下午6:02, Richard Sandiford wrote:
>>> "Kewen.Lin"  writes:
 Hi Richard,


Snip ...

>>
>> Thanks a lot for your detailed explanation!  This proposal looks good
>> based on the current implementation of both masking and length.  I may
>> think too much, but I had a bit concern as below when some targets have
>> both masking and length supports in future, such as ppc adds masking
>> support like SVE.
>>
>> I assumed that you meant each vectorizable_* routine should record the
>> objs for any available partial vectorisation approaches.  If one target
>> supports both, we would have both recorded but decide not to do partial
>> vectorisation finally since both have records.  The target can disable
>> length like through optab to resolve it, but there is one possibility
>> that the masking support can be imperfect initially since ISA support
>> could be gradual, it further leads some vectorizable_* check or final
>> verification to fail for masking, and length approach may work here but
>> it gets disabled.  We can miss to use partial vectorisation here.
>>
>> The other assumption is that each vectorizable_* routine record the 
>> first available partial vectorisation approach, let's assume masking
>> takes preference, then it's fine to record just one here even if one
>> target supports both approaches, but we still have the possiblity to
>> miss the partial vectorisation chance as some check/verify fail with
>> masking but fine with length.
>>
>> Does this concern make sense?
> 
> There's nothing to stop us using masks and lengths in the same loop
> in future if we need to.  It would “just” be a case of setting up both
> the masks and the lengths in vect_set_loop_condition.  But the point is
> that doing that would be extra code, and there's no point writing that
> extra code until it's needed.
> 
> If some future arch does support both mask-based and length-based
> approaches, I think that's even less reason to make a binary choice
> between them.  How we prioritise the length and mask approaches when
> both are available is something that we'll have to decide at the time.
> 
> If your concern is that the arch might support masked operations
> without wanting them to be used for loop control, we could test for
> that case by checking whether while_ult_optab is implemented.
> 
> Thanks,
> Richard
> 

Thanks for your further expalanation, as you pointed out, my concern
is just one case of mixing mask-based and length-based.  I didn't
realize it and thought we still used one approach for one loop at the
time, but it's senseless.

The v3 patch attached to use can_partial_vect_p.  In the regression
testing with explicit vect-with-length-scope setting, I saw several
reduction failures, updated vectorizable_condition to set
can_partial_vect_p to false for !EXTRACT_LAST_REDUCTION under your
guidance to ensure it either records sth. or clearing
can_partial_vect_p.

Bootstrapped/regtested on powerpc64le-linux-gnu P9 and no remarkable
failures found even with explicit vect-with-length-scope settings.

But I met one regression failure on aarch64-linux-gnu as below:

PASS->FAIL: gcc.target/aarch64/sve/reduc_8.c -march=armv8.2-a+sve  
scan-assembler-not \\tcmpeq\\tp[0-9]+\\.s,

It's caused by vectorizable_condition's change, without the change,
it can use fully-masking for the outer loop.  The reduction_type is
TREE_CODE_REDUCTION here, so can_partial_vect_p gets cleared.

>From the optimized dumping, the previous IRs look fine.  It's doing
reduction for inner loop, but we are checking partial vectorisation
for the outer loop.  I'm not sure whether to adjust the current
guard is reasonable for this case.  Could you help to give some
insights?  Thanks in advance!

BR,
Kewen
--
gcc/ChangeLog

2020-MM-DD  Kewen Lin  

* doc/invoke.texi (vect-with-length-scope): Document new option.
* params.opt (vect-with-length-scope): New.
* tree-vect-loop-manip.c (vect_set_loop_mask): Renamed to ...
(vect_set_loop_mask_or_len): ... this.  Update variable names
accordingly.
(vect_maybe_permute_loop_masks): Replace rgroup_masks with rgroup_objs.
(vect_set_loop_masks_directly): Renamed to ...
(vect_set_loop_objs_directly): ... this.  Extend the support to cover
vector with length, call vect_gen_len for length, replace rgroup_masks
with rgroup_objs, replace vect_set_loop_mask with
vect_set_loop_mask_or_len.
(vect_set_loop_condition_masked): Renamed to ...
(vect_set_loop_condition_partial): ... this.  Extend the support to
cover length-based partial vectorization, replace rgroup_masks with
rgroup_objs, replace vect_iv_limit_for_full_masking with
vect_iv_limit_for_partial_vect.
(vect_set_loop_condition_unmasked): Renamed to ...
(vect_set_loop_condition_normal): ... this.
(vect_set_loop_co

[Ada] Get rid of more references to Universal_Integer in expanded code

2020-06-02 Thread Pierre-Marie de Rodat
This further tweaks the expanded code generated by the front-end, so as
to avoid having references to Universal_Integer reaching the code
generator, either directly or indirectly through attributes returning
Universal_Integer. There is also a minor tweak to the a-sequio.adb unit
of the runtime to the same effect.

The reason is that Universal_Integer must be a type as large as the
largest supported integer type and, therefore, can be much larger than
what is really needed here.

No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

2020-06-02  Eric Botcazou  

gcc/ada/

* exp_aggr.adb (Build_Array_Aggr_Code): Set the type of the PAT
on the zero used to clear the array.
* exp_attr.adb (Expand_N_Attribute_Reference)
: In the CW case, directly convert from the
alignment's type to the target type if the parent is an
unchecked conversion.
* sem_res.adb (Set_String_Literal_Subtype): In the dynamic case,
use the general expression for the upper bound only when needed.
Set the base type of the index as the type of the low bound.
(Simplify_Type_Conversion): Do an intermediate conversion to the
root type of the target type if the operand is an integer
literal.
* tbuild.adb (Convert_To): Get rid of an intermediate conversion
to Universal_Integer if the inner expression has integer tyoe.
* libgnat/a-sequio.adb (Byte_Swap): Make use of an equivalent
static expression in the case statement.--- gcc/ada/exp_aggr.adb
+++ gcc/ada/exp_aggr.adb
@@ -2043,12 +2043,15 @@ package body Exp_Aggr is
 and then Is_Bit_Packed_Array (Typ)
 and then Is_Modular_Integer_Type (Packed_Array_Impl_Type (Typ))
   then
- Append_To (New_Code,
-   Make_Assignment_Statement (Loc,
- Name   => New_Copy_Tree (Into),
- Expression =>
-   Unchecked_Convert_To (Typ,
- Make_Integer_Literal (Loc, Uint_0;
+ declare
+Zero : constant Node_Id := Make_Integer_Literal (Loc, Uint_0);
+ begin
+Analyze_And_Resolve (Zero, Packed_Array_Impl_Type (Typ));
+Append_To (New_Code,
+  Make_Assignment_Statement (Loc,
+Name   => New_Copy_Tree (Into),
+Expression => Unchecked_Convert_To (Typ, Zero)));
+ end;
   end if;
 
   --  If the component type contains tasks, we need to build a Master

--- gcc/ada/exp_attr.adb
+++ gcc/ada/exp_attr.adb
@@ -2459,12 +2459,20 @@ package body Exp_Attr is
 
 New_Node := Build_Get_Alignment (Loc, New_Node);
 
+--  Case where the context is an unchecked conversion to a specific
+--  integer type. We directly convert from the alignment's type.
+
+if Nkind (Parent (N)) = N_Unchecked_Type_Conversion then
+   Rewrite (N, New_Node);
+   Analyze_And_Resolve (N);
+   return;
+
 --  Case where the context is a specific integer type with which
 --  the original attribute was compatible. But the alignment has a
 --  specific type in a-tags.ads (Standard.Natural) so, in order to
 --  preserve type compatibility, we must convert explicitly.
 
-if Typ /= Standard_Natural then
+elsif Typ /= Standard_Natural then
New_Node := Convert_To (Typ, New_Node);
 end if;
 

--- gcc/ada/libgnat/a-sequio.adb
+++ gcc/ada/libgnat/a-sequio.adb
@@ -73,7 +73,7 @@ package body Ada.Sequential_IO is
procedure Byte_Swap (Siz : in out size_t) is
   use System.Byte_Swapping;
begin
-  case Siz'Size is
+  case size_t'Size is
  when 32 => Siz := size_t (Bswap_32 (U32 (Siz)));
  when 64 => Siz := size_t (Bswap_64 (U64 (Siz)));
  when others => raise Program_Error;

--- gcc/ada/sem_res.adb
+++ gcc/ada/sem_res.adb
@@ -266,7 +266,8 @@ package body Sem_Res is
procedure Simplify_Type_Conversion (N : Node_Id);
--  Called after N has been resolved and evaluated, but before range checks
--  have been applied. Currently simplifies a combination of floating-point
-   --  to integer conversion and Rounding or Truncation attribute.
+   --  to integer conversion and Rounding or Truncation attribute, and also the
+   --  conversion of an integer literal to a dynamic integer type.
 
function Unique_Fixed_Point_Type (N : Node_Id) return Entity_Id;
--  A universal_fixed expression in an universal context is unambiguous if
@@ -12477,37 +12478,51 @@ package body Sem_Res is
 
   --  If the lower bound is not static we create a range for the string
   --  literal, using the index type and the known length of the literal.
-  --  The index type is not necessarily Positive, so the upper bound is
-  --  computed as T'Val (T'Pos (Low_Bound) + L - 1).
+  --  If the length is 1, then the up

[Ada] Get rid of more references to Universal_Integer in expanded code

2020-06-02 Thread Pierre-Marie de Rodat
This further tweaks the expanded code generated by the front-end, so as
to avoid having references to Universal_Integer reaching the code
generator, either directly or indirectly through attributes returning
Universal_Integer.

The reason is that Universal_Integer must be a type as large as the
largest supported integer type and, therefore, can be much larger than
what is really needed here.

No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

2020-06-02  Eric Botcazou  

gcc/ada/

* exp_aggr.adb (Others_Check): In the positional case, use the
general expression for the comparison only when needed.
* exp_attr.adb (Expand_Fpt_Attribute;): Use a simple conversion
to the target type instead of an unchecked conversion to the
base type to do the range check, as in the other cases.
(Expand_N_Attribute_Reference) : Do the
Max operation in the type of the storage size variable, and use
Convert_To as in the other cases.
* tbuild.adb (Convert_To): Do not get rid of an intermediate
conversion to Universal_Integer here...
* sem_res.adb  (Simplify_Type_Conversion): ...but here instead.--- gcc/ada/exp_aggr.adb
+++ gcc/ada/exp_aggr.adb
@@ -5853,26 +5853,51 @@ package body Exp_Aggr is
  --   raise Constraint_Error;
  --end if;
 
+ --  in the general case, but the following simpler test:
+
+ --[constraint_error when
+ --  Aggr_Lo + (Nb_Elements - 1) > Aggr_Hi];
+
+ --  instead if the index type is a signed integer.
+
  elsif Nb_Elements > Uint_0 then
-Cond :=
-  Make_Op_Gt (Loc,
-Left_Opnd  =>
-  Make_Op_Add (Loc,
-Left_Opnd  =>
-  Make_Attribute_Reference (Loc,
-Prefix => New_Occurrence_Of (Ind_Typ, Loc),
-Attribute_Name => Name_Pos,
-Expressions=>
-  New_List
-(Duplicate_Subexpr_Move_Checks (Aggr_Lo))),
-Right_Opnd => Make_Integer_Literal (Loc, Nb_Elements - 1)),
+if Nb_Elements = Uint_1 then
+   Cond :=
+ Make_Op_Gt (Loc,
+   Left_Opnd  => Duplicate_Subexpr_Move_Checks (Aggr_Lo),
+   Right_Opnd => Duplicate_Subexpr_Move_Checks (Aggr_Hi));
+
+elsif Is_Signed_Integer_Type (Ind_Typ) then
+   Cond :=
+ Make_Op_Gt (Loc,
+   Left_Opnd  =>
+ Make_Op_Add (Loc,
+   Left_Opnd  => Duplicate_Subexpr_Move_Checks (Aggr_Lo),
+   Right_Opnd =>
+ Make_Integer_Literal (Loc, Nb_Elements - 1)),
+   Right_Opnd => Duplicate_Subexpr_Move_Checks (Aggr_Hi));
 
-Right_Opnd =>
-  Make_Attribute_Reference (Loc,
-Prefix => New_Occurrence_Of (Ind_Typ, Loc),
-Attribute_Name => Name_Pos,
-Expressions=> New_List (
-  Duplicate_Subexpr_Move_Checks (Aggr_Hi;
+else
+   Cond :=
+ Make_Op_Gt (Loc,
+   Left_Opnd  =>
+ Make_Op_Add (Loc,
+   Left_Opnd  =>
+ Make_Attribute_Reference (Loc,
+   Prefix => New_Occurrence_Of (Ind_Typ, Loc),
+   Attribute_Name => Name_Pos,
+   Expressions=>
+ New_List
+   (Duplicate_Subexpr_Move_Checks (Aggr_Lo))),
+   Right_Opnd => Make_Integer_Literal (Loc, Nb_Elements - 1)),
+
+   Right_Opnd =>
+ Make_Attribute_Reference (Loc,
+   Prefix => New_Occurrence_Of (Ind_Typ, Loc),
+   Attribute_Name => Name_Pos,
+   Expressions=> New_List (
+ Duplicate_Subexpr_Move_Checks (Aggr_Hi;
+end if;
 
  --  If we are dealing with an aggregate containing an others choice
  --  and discrete choices we generate the following test:

--- gcc/ada/exp_attr.adb
+++ gcc/ada/exp_attr.adb
@@ -1096,12 +1096,10 @@ package body Exp_Attr is
   Selector_Name => Make_Identifier (Loc, Nam));
 
   --  The generated call is given the provided set of parameters, and then
-  --  wrapped in a conversion which converts the result to the target type
-  --  We use the base type as the target because a range check may be
-  --  required.
+  --  wrapped in a conversion which converts the result to the target type.
 
   Rewrite (N,
-Unchecked_Convert_To (Base_Type (Etype (N)),
+Convert_To (Typ,
   Make_Function_Call 

[Ada] Remove ASIS_Mode

2020-06-02 Thread Pierre-Marie de Rodat
In order to simplify the long term maintenance of the GNAT frontend,
support for generating trees for ASIS is removed from trunk. ASIS is
being phased out at this stage in favor of Libadalang.

Tested on x86_64-pc-linux-gnu, committed on trunk

2020-06-02  Arnaud Charlet  

gcc/ada/

* atree.ads, checks.adb, contracts.adb, debug.adb, einfo.ads,
exp_ch3.adb, exp_util.adb, expander.ads, expander.adb,
frontend.adb, gnat1drv.adb, itypes.adb, lib.ads, namet.ads,
opt.adb, opt.ads, par-prag.adb, repinfo.ads, sem_aggr.adb,
sem_aux.ads, sem_case.ads, sem_ch10.adb, sem_ch12.adb,
sem_ch13.adb, sem_ch3.adb, sem_ch4.adb, sem_ch6.adb,
sem_dim.adb, sem_elab.adb, sem_prag.adb, sem_prag.ads,
sem_res.adb, sem_type.adb, sem_util.adb, sinfo.ads, stand.ads,
tree_io.ads: Remove references to ASIS_Mode.

patch.diff.gz
Description: application/gzip


  1   2   >