Re: Improve DOM's ability to derive equivalences when traversing edges

2017-08-30 Thread Jeff Law
On 08/29/2017 09:42 AM, Christophe Lyon wrote:
> On 29 August 2017 at 17:28, Jeff Law  wrote:
>> On 08/29/2017 03:13 AM, Christophe Lyon wrote:
>>> Hi Jeff,
>> [ ... ]

 commit a370df2c52074abbb044d1921a0c7df235758050
 Author: law 
 Date:   Tue Aug 29 05:03:36 2017 +

 * tree-ssa-dom.c (edge_info::record_simple_equiv): Call
 derive_equivalences.
 (derive_equivalences_from_bit_ior, 
 record_temporary_equivalences):
 Code moved into
 (edge_info::derive_equivalences): New private member function

 * gcc.dg/torture/pr57214.c: Fix type of loop counter.
 * gcc.dg/tree-ssa/ssa-sink-16.c: Disable DOM.
 * gcc.dg/tree-ssa/ssa-dom-thread-11.c: New test.
 * gcc.dg/tree-ssa/ssa-dom-thread-12.c: New test.
 * gcc.dg/tree-ssa/ssa-dom-thread-13.c: New test.
 * gcc.dg/tree-ssa/ssa-dom-thread-14.c: New test.
 * gcc.dg/tree-ssa/ssa-dom-thread-15.c: New test.
 * gcc.dg/tree-ssa/ssa-dom-thread-16.c: New test.
 * gcc.dg/tree-ssa/ssa-dom-thread-17.c: New test.

 git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251397 
 138bc75d-0d04-0410-961f-82ee72b054a4

>>>
>>> 3 of the new tests fail on arm-none-linux-gnueabihf
>>> --with-cpu=cortex-a15 --with-fpu=vfpv3-d16-fp16
>>>
>>> FAIL:gcc.dg/tree-ssa/ssa-dom-thread-11.c scan-tree-dump-times dom2
>>> "Threaded" 1
>>> FAIL:gcc.dg/tree-ssa/ssa-dom-thread-14.c scan-tree-dump-times dom2
>>> "Threaded" 1
>>> FAIL:gcc.dg/tree-ssa/ssa-dom-thread-16.c scan-tree-dump-times dom2
>>> "Threaded" 1
>>>
>>> they do pass when configuring for cpu cortex-a9/a15 and fpu 
>>> neon-fp16/neon-vfpv4
>>>
>>> I do not have the dumps since it's automated testing; let me know if
>>> you need me to
>>> reproduce it manually and extract the dumps.
>> Strange.  I can't reproduce this.
>>
>> /home/law/gcc-testing/gcc2/configure --target=arm-none-linux-gnueabihf
>> --with-cpu=cortex-a15 --with-fpu=vfpv3-d16-fp16
>>
> 
> Sorry, it was a typo: I meant cortex-a5.
Ah.  I see it now.

Isn't this really an indicator that the ARM test in
check_effective_target_logical_op_short_circuit is wrong?

AFAICT that test just looks to see if a particular preprocessor symbol
is defined.

WOuldn't it be better to check the output of -mprint-tune-info which
should make the test DTRT in all situations?

jeff


Re: [TESTSUITE]Use strncpy instead of strcpy in testsuite/gcc.dg/memcmp-1.c

2017-08-30 Thread Mike Stump
On Aug 30, 2017, at 8:31 AM, Renlin Li  wrote:
> memcpy is better than strncpy in this case.
> Here is the updated patch.

Ok.


Re: C++ PATCH for c++/80767, unnecessary instantiation of generic lambda

2017-08-30 Thread Jason Merrill
On Tue, Aug 29, 2017 at 3:12 PM, Jason Merrill  wrote:
> In this testcase, when we try to call the object of 'overloader' type,
> we consider the conversion function from the first lambda to void
> (*)(a) and build up a surrogate call function for it.  We consider how
> to convert from overloader to that type, which means also looking at
> the template conversion function from the second lambda, which
> involves instantiating the operator() to determine the return type,
> which is ill-formed.
>
> But the standard seems to say that we shouldn't bother to consider how
> to do that conversion unless we actually choose the surrogate call
> function for that type, so that's what this patch implemenets.

We also need to adjust joust to handle a null candidate.
commit f57bf1708cc3824fd0047ecd883ecb28341bcdb8
Author: Jason Merrill 
Date:   Wed Aug 30 17:30:02 2017 -0400

PR c++/82030 - ICE inheriting from multiple lambdas

PR c++/80767
* call.c (compare_ics): Handle null candidate.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index c446057..9e4a5c1 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -9561,7 +9561,9 @@ compare_ics (conversion *ics1, conversion *ics2)
return 0;
   else if (t1->kind == ck_user)
{
- if (t1->cand->fn != t2->cand->fn)
+ tree f1 = t1->cand ? t1->cand->fn : t1->type;
+ tree f2 = t2->cand ? t2->cand->fn : t2->type;
+ if (f1 != f2)
return 0;
}
   else
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv12.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv12.C
new file mode 100644
index 000..16adee6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv12.C
@@ -0,0 +1,18 @@
+// PR c++/80767
+// { dg-do compile { target c++11 } }
+
+template  struct A { using type = U; };
+template  struct B : B::type, B::type {
+  using type = B;
+  using B::type::operator();
+};
+template  struct B { using type = F; };
+struct {
+  template ::type...>::type>
+  Overload operator()(F...){}
+} a;
+int main() {
+  auto f = a([](int) {}, [](float) {});
+  f({});
+}


Re: [C++ PATCH] Make taking the address of an overloaded function a non-deduced context

2017-08-30 Thread Jason Merrill
On Wed, Aug 30, 2017 at 1:08 PM, Ville Voutilainen
 wrote:
> On 30 August 2017 at 19:07, Jason Merrill  wrote:
>> Please also remove the error.  OK with that change.
>
> Here's a new and much improved version as discussed on IRC. Tested on
> Linux-PPC64.
> Ok for trunk?
>
> 2017-08-30  Ville Voutilainen  
>
> Make taking the address of an overloaded function a non-deduced context
>
> cp/
>
> * pt.c (unify_overload_resolution_failure): Remove.
> (unify_one_argument): Adjust.

Please add a comment before unify_success along the lines of "If there
isn't a unique match, this is a non-deduced context, so we still
succeed."  OK with that change.

Jason


Re: [PATCH] Fix PR81987 (SLSR dominance issue)

2017-08-30 Thread Richard Biener
On August 30, 2017 7:22:45 PM GMT+02:00, Bill Schmidt 
 wrote:
>Hi,
>
>https://gcc.gnu.org/PR81987 identifies an SSA verification error
>following
>SLSR.  The problem arises when SLSR places an initialization expression
>at
>a point not dominated by the definition of an SSA name it uses.  When
>there
>are multiple conditional candidates for replacement, the initialization
>expression must dominate all of these candidates and their phi
>dependencies,
>but the nearest common dominator for these may actually be above the
>stride
>definition in some cases.
>
>In such cases a single initialization point is not possible.  With
>sufficient
>analysis, it would be possible to find multiple initialization points
>that
>would satisfy availability of the stride at the cost of larger code. 
>This
>is too complex for a bug fix, though.  This patch instead refuses to
>replace
>candidates where a single legal initialization point isn't possible. 
>We
>ensure this by setting the cost for the increment associated with this
>initialization to effectively infinite.
>
>Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. 
>Is
>this okay for trunk, and backport to all supported releases after a
>period
>of burn-in?

Yes. 

Thanks, 
Richard. 

>Thanks,
>Bill
>
>
>[gcc]
>
>2017-08-30  Bill Schmidt  
>
>   PR tree-optimization/81987
>   * gimple-ssa-strength-reduction.c (insert_initializers): Don't
>   insert an initializer in a location not dominated by the stride
>   definition.
>
>[gcc/testsuite]
>
>2017-08-30  Bill Schmidt  
>
>   PR tree-optimization/81987
>   * g++.dg/torture/pr81987.C: New file.
>
>
>Index: gcc/gimple-ssa-strength-reduction.c
>===
>--- gcc/gimple-ssa-strength-reduction.c(revision 251369)
>+++ gcc/gimple-ssa-strength-reduction.c(working copy)
>@@ -3340,6 +3340,23 @@ insert_initializers (slsr_cand_t c)
>that block, the earliest one will be returned in WHERE.  */
>   bb = nearest_common_dominator_for_cands (c, incr, );
> 
>+  /* If the NCD is not dominated by the block containing the
>+   definition of the stride, we can't legally insert a
>+   single initializer.  Mark the increment as unprofitable
>+   so we don't make any replacements.  FIXME: Multiple
>+   initializers could be placed with more analysis.  */
>+  gimple *stride_def = SSA_NAME_DEF_STMT (c->stride);
>+  basic_block stride_bb = gimple_bb (stride_def);
>+
>+  if (stride_bb && !dominated_by_p (CDI_DOMINATORS, bb,
>stride_bb))
>+  {
>+if (dump_file && (dump_flags & TDF_DETAILS))
>+  fprintf (dump_file,
>+   "Initializer #%d cannot be legally placed\n", i);
>+incr_vec[i].cost = COST_INFINITE;
>+continue;
>+  }
>+
>   /* If the nominal stride has a different type than the recorded
>stride type, build a cast from the nominal stride to that type.  */
>   if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
>Index: gcc/testsuite/g++.dg/torture/pr81987.C
>===
>--- gcc/testsuite/g++.dg/torture/pr81987.C (nonexistent)
>+++ gcc/testsuite/g++.dg/torture/pr81987.C (working copy)
>@@ -0,0 +1,61 @@
>+extern short var_1;
>+extern const short var_3;
>+extern unsigned long int var_9;
>+extern short var_13;
>+extern const unsigned long int var_15;
>+extern const unsigned long int var_37;
>+extern unsigned long int var_40;
>+extern long long int var_47;
>+extern short var_48;
>+extern const short var_54;
>+extern long long int var_79;
>+extern long long int var_81;
>+extern long long int var_94;
>+extern long long int var_95;
>+extern long long int var_701;
>+extern unsigned long int var_786;
>+extern short var_788;
>+extern long long int var_844;
>+
>+struct struct_1 {
>+  short member_1_2 : 15;
>+  static long long int member_1_3;
>+};
>+
>+extern struct_1 struct_obj_6;
>+extern struct_1 struct_obj_8;
>+
>+void foo() {
>+  int a = var_3 <= 602154393864UL;
>+  if (var_81 ? 0 : var_3 && var_9)
>+;
>+  else {
>+var_94 = 0;
>+if (var_3 && var_48 || var_13) {
>+  if (var_48)
>+  var_95 = 0;
>+  short b((236446151776511UL + var_3) * (2 ? var_13 : 0) ||
>var_1);
>+  struct_obj_8.member_1_2 = b;
>+  if (var_15) {
>+  if (var_81)
>+if (var_47)
>+  ;
>+else if (var_40)
>+  var_701 = 0;
>+  } else {
>+  if (var_40)
>+var_79 = 0;
>+  if (var_54) {
>+if (var_37)
>+  var_786 = 0;
>+else
>+  var_788 = 0;
>+  struct_obj_6.member_1_3 =
>+(236446151776511UL + var_3) * (2 ? var_13 : 0);
>+  }
>+  }
>+  if ((236446151776511UL + var_3) * (2 ? var_13 : 0))
>+  var_844 = 0;
>+}
>+  }
>+}



Re: Add a partial_subreg_p predicate

2017-08-30 Thread Jeff Law
On 08/30/2017 09:41 AM, Richard Sandiford wrote:
> 
> Even then I don't think we could ever have SET_SRC and SET_DEST being
> different.  The extension has to be explicit in the source.
Yea, you should still have the same mode for the SET_SRC and SET_DEST.
It's just that the operands underneath may have different modes.

ie

(set (reg:SI) (zero_extend:SI (reg:HI))

So the SET_SRC/SET_DEST have the same modes, but the operands presented
to the allocators and reloaders have different modes.

The only reason I remember this being an "interesting" case is because
in the old allocator (pre-IRA) we were not good at tying objects of
different sizes.

Jeff


Re: Add support to trace comparison instructions and switch statements

2017-08-30 Thread Dmitry Vyukov via gcc-patches
On Sat, Aug 5, 2017 at 11:53 AM, 吴潍浠(此彼)  wrote:
> Hi all
> Is it worth adding my codes to gcc ? Are there some steps I need to do ?
> Could somebody tell me the progress ?


FYI, we've mailed a Linux kernel change that uses this instrumentation:
https://groups.google.com/forum/#!topic/syzkaller/r0ARNVV-Bhg
Another reason to have this in gcc.

Can somebody from gcc maintainers take a look at this? Jakub?

Thanks


> Maybe there should be a project like libfuzzer to solve bugs in program.
>
> Wish Wu
> --
> From:Wish Wu 
> Time:2017 Jul 21 (Fri) 13:38
> To:gcc ; gcc-patches ; Jeff Law 
> 
> Cc:wishwu007 
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> Hi Jeff
>
> I have signed the copyright assignment, and used the name 'Wish Wu' .
> Should I send you a copy of my assignment ?
>
> The attachment is my new patch with small changes.
> Codes are checked by ./contrib/check_GNU_style.sh, except some special files.
>
> With
>
> --
> From:Jeff Law 
> Time:2017 Jul 14 (Fri) 15:37
> To:Wish Wu ; gcc ; gcc-patches 
> 
> Cc:wishwu007 
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On 07/10/2017 06:07 AM, 吴潍浠(此彼) wrote:
>> Hi
>>
>> I write some codes to make gcc support comparison-guided fuzzing.
>> It is very like 
>> http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow .
>> With -fsanitize-coverage=trace-cmp the compiler will insert extra 
>> instrumentation around comparison instructions and switch statements.
>> I think it is useful for fuzzing.  :D
>>
>> Patch is below, I may supply test cases later.
> Before anyone can really look at this code you'll need to get a
> copyright assignment on file with the FSF.
>
> See:
> https://gcc.gnu.org/contribute.html
>
> If you've already done this, please let me know and I'll confirm with
> the FSF copyright clerk.
>
> Jeff


Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-08-30 Thread Richard Sandiford
"Jon Beniston"  writes:
> Hi Richard,
>
>> I think the issue is that with TARGET_VECTOR_MODE_SUPPORTED_P false 
>> for V1SI you'll get a SImode vector type and the
>>
>>  if (VECTOR_BOOLEAN_TYPE_P (type_in)
>>  || VECTOR_MODE_P (TYPE_MODE (type_in)))
>>
>>check fails.  Try changing that to || VECTOR_TYPE_P (type_in).
>>The else path should be hopefully dead ...
>>
>>> It looks to me like the other call sites of optab_for_tree_code which 
>>> are passing optab_default are mostly places where GCC is sure it is 
>>> not a shift operation.
>>> Given TYPE_IN is returned from get_vectype_for_scalar_type, is it 
>>> safe to simply pass optab_vector in vect_pattern_recog_1?
>>
>>I guess so.  Can you also try the above?
>
> Changing VECTOR_MODE_P check into VECTOR_TYPE_P check also works for me, I
> verified the following patch could vectorize my dot product case and there
> is no regression on gcc tests on my port.
>
> Meanwhile x86-64 bootstraps OK and no regression on gcc/g++ test.
>
>  Does this look OK?
>
>
> gcc/
> 2017-08-30  Jon Beniston  
> Richard Biener  
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index cfdb72c..5ebeac2 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func *recog_func,
>loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>   
>if (VECTOR_BOOLEAN_TYPE_P (type_in)
> -  || VECTOR_MODE_P (TYPE_MODE (type_in)))
> +  || VECTOR_TYPE_P (type_in))
>  {
>/* No need to check target support (already checked by the pattern
>   recognition function).  */

It looks like this makes the VECTOR_BOOLEAN_TYPE_P (type_in) check redundant.

Thanks,
Richard


Re: All carchive gotools tests fail

2017-08-30 Thread Ian Lance Taylor
On Thu, Aug 24, 2017 at 3:19 AM, Uros Bizjak  wrote:
> Hello!
>
> Following carchive gotools tests fail in gccgo testsuite:
>
> FAIL: TestCgoCallbackGC
> FAIL: TestInstall
> FAIL: TestEarlySignalHandler
> FAIL: TestSignalForwarding
> FAIL: TestSignalForwardingExternal
> FAIL: TestOsSignal
> FAIL: TestSigaltstack
> FAIL: TestPIE
>
> The problem can be seen from the gotools log dump:
>
> === RUN   TestInstall
> FAIL: TestInstall
> carchive_test.go:160: [gcc -fPIC -m64 -pthread
> -fmessage-length=0
> -fdebug-prefix-map=/tmp/go-build828417309=/tmp/go-build
> -gno-record-gcc-switches -funwind-tables -I pkg/gccgo_linux_amd64_fPIC
> -o ./testp1 main.c main_unix.c pkg/gccgo_linux_amd64_fPIC/liblibgo.a
> -lgo]
> carchive_test.go:162:
> pkg/gccgo_linux_amd64_fPIC/liblibgo.a(a.out.a.o): In function
> `__go_init_main':
> /tmp/go-build/libgo/_obj/_cgo_gotypes.go:3: undefined
> reference to `runtime.registerGCRoots'
> pkg/gccgo_linux_amd64_fPIC/liblibgo.a(a.out.a.o): In
> function `__go_init_main':
>
> /home/uros/gcc-build/gotools/carchive-test-dir/misc/cgo/testcarchive/src/libgo/libgo.go:18:
> undefined reference to `runtime.writeBarrier'
>
> /home/uros/gcc-build/gotools/carchive-test-dir/misc/cgo/testcarchive/src/libgo/libgo.go:18:
> undefined reference to `runtime.writebarrierptr'
> pkg/gccgo_linux_amd64_fPIC/liblibgo.a(a.out.a.o): In
> function `gostart':
>
> /home/uros/gcc-build/x86_64-pc-linux-gnu/libgo/../../../git/gcc/libgo/runtime/go-libmain.c:111:
> undefined reference to `runtime.schedinit'
>
> /home/uros/gcc-build/x86_64-pc-linux-gnu/libgo/../../../git/gcc/libgo/runtime/go-libmain.c:112:
> undefined reference to `runtime.main'
> collect2: error: ld returned 1 exit status
> carchive_test.go:163: exit status 1
>
> The compiler in carchive_test.go:160 is called with the following command:
>
> gcc -fPIC -m64 -pthread -fmessage-length=0
> -fdebug-prefix-map=/tmp/go-build828417309=/tmp/go-build
> -gno-record-gcc-switches -funwind-tables -I pkg/gccgo_linux_amd64_fPIC
> -o ./testp1 main.c main_unix.c pkg/gccgo_linux_amd64_fPIC/liblibgo.a
> -lgo
>
> Please note that with -lgo, the default system-wide libgo shared
> library is linked in. However, the test expects newly compiled library
> from the libgo build directory. Manually adding correct -L path to the
> above command builds functional executable.
>
>  (The system-wide library in the above example is installed from the
> current gcc-7 branch).

Thanks.  I have committed this patch which should fix the problem.
Bootstrapped and tested on x86_64-pc-linux-gnu.

Ian


2017-08-30  Ian Lance Taylor  

* configure.ac: Substitute GOC_FOR_TARGET and GCC_FOR_TARGET.
* Makefile.am (MOSTLYCLEANFILES): Add check-gcc.
(check-gccgo): Create via a temporary file.
(check-gcc): New target.
(CHECK_ENV): Set CC.
(ECHO_ENV): Report CC.
(check-go-tool): Depend on check-gcc.
(check-runtime, check-cgo-test, check-carchive-test): Likewise.
* configure, Makefile.in: Rebuild.
Index: configure.ac
===
--- configure.ac(revision 251533)
+++ configure.ac(working copy)
@@ -46,6 +46,11 @@ AC_PROG_INSTALL
 AC_PROG_CC
 AC_PROG_GO
 
+# These should be defined by the top-level configure.
+# Copy them into Makefile.
+AC_SUBST(GOC_FOR_TARGET)
+AC_SUBST(GCC_FOR_TARGET)
+
 AM_CONDITIONAL(NATIVE, test "$host_alias" = "$target_alias")
 
 dnl Test for -lsocket and -lnsl.  Copied from libjava/configure.ac.
Index: Makefile.am
===
--- Makefile.am (revision 251533)
+++ Makefile.am (working copy)
@@ -109,7 +109,8 @@ s-zdefaultcc: Makefile
 
 MOSTLYCLEANFILES = \
zdefaultcc.go s-zdefaultcc \
-   check-gccgo gotools.head *-testlog gotools.sum gotools.log *.sent
+   check-gccgo check-gcc gotools.head *-testlog gotools.sum gotools.log \
+   *.sent
 
 mostlyclean-local:
rm -rf check-go-dir check-runtime-dir cgo-test-dir carchive-test-dir
@@ -155,11 +156,22 @@ check-head:
 # check-gccgo is a little shell script that executes gccgo with the
 # options to pick up the newly built libgo.
 check-gccgo: Makefile
-   rm -f $@
-   echo "#!/bin/sh" > $@
+   rm -f $@ $@.tmp
+   echo "#!/bin/sh" > $@.tmp
abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \
-   echo "$(GOCOMPILE)" '"$$@"' "-I $${abs_libgodir} -L $${abs_libgodir} -L 
$${abs_libgodir}/.libs" >> $@
-   chmod +x $@
+   echo "$(GOCOMPILE)" '"$$@"' "-I $${abs_libgodir} -L $${abs_libgodir} -L 
$${abs_libgodir}/.libs" >> $@.tmp
+   chmod +x $@.tmp
+   mv -f $@.tmp $@
+
+# check-gcc is a little shell script that executes the newly built gcc
+# with the options to pick up the newly built libgo.
+check-gcc: Makefile
+   rm -f $@ $@.tmp
+   echo "#!/bin/sh" > $@.tmp
+   abs_libgodir=`cd $(libgodir) && $(PWD_COMMAND)`; \
+   

Re: [PATCH], PR target/82015, add PowerPC warning for unpack_vector_int128 with illegal 2nd argument

2017-08-30 Thread Michael Meissner
Bill Seurer pointed out to me that when I checked in the PR 82015 patches, I
had changed the error message, but I didn't update the test.  I checked this
patch in as obvious:

2017-08-30  Michael Meissner  

PR target/82015
* gcc.target/powerpc/pr82015.c: Fix up error message.

Index: gcc/testsuite/gcc.target/powerpc/pr82015.c
===
--- gcc/testsuite/gcc.target/powerpc/pr82015.c  (revision 251538)
+++ gcc/testsuite/gcc.target/powerpc/pr82015.c  (working copy)
@@ -5,10 +5,10 @@
 
 unsigned long foo_11(vector __int128_t *p)
 {
-  return __builtin_unpack_vector_int128(*p, 11); /* { dg-error "argument 2 
must be 0 or 1" } */
+  return __builtin_unpack_vector_int128(*p, 11); /* { dg-error "argument 2 
must be a 1-bit unsigned literal" } */
 }
 
 unsigned long foo_n(vector __int128_t *p, unsigned long n)
 {
-  return __builtin_unpack_vector_int128(*p, n);/* { dg-error "argument 
2 must be 0 or 1" } */
+  return __builtin_unpack_vector_int128(*p, n);/* { dg-error "argument 
2 must be a 1-bit unsigned literal" } */
 }

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: AArch64 patch pings

2017-08-30 Thread Richard Sandiford
James Greenhalgh  writes:
> On Wed, Aug 30, 2017 at 04:34:40PM +0100, Richard Sandiford wrote:
>> Ping for a few AArch64 patches:
>> 
>> [AArch64] Remove use of wider vector modes
>> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01249.html
>>
>> [AArch64] Rename cmp_result iterator
>> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01250.html
>>
>> [AArch64] Tighten address register subreg checks
>> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01252.html
>
> I've reviewed these three, which all looked OK to me.
>
>> [AArch64] Tweak aarch64_classify_address interface
>> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01251.html
>
> I've reviewed this one with a comment on avoiding the duplication
> of the new enum values across the back end.

Thanks.

>> [61/77] Use scalar_int_mode in the AArch64 port
>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00701.html
>> 
>> [75/77] Use scalar_mode in the AArch64 port
>> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00715.html
>
> Do you have a link to a short primer on what these changes do? I don't really
> have any objections to them if that's the direction the rest of the compiler
> is going in, but I also didn't really understand what was happening here.

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00639.html describes
the overall rationale for the changes.  When SVE is added, AArch64
will be the one backend for which the size of a normal machine_mode
is a variable rather than a constant.  But scalar modes will continue
to be constant-sized, so when dealing with them, it's better to use
the new scalar mode classes where possible.

Thanks,
Richard


Re: [Patch][aarch64] Use IFUNCs to enable LSE instructions in libatomic on aarch64

2017-08-30 Thread Steve Ellcey
On Tue, 2017-08-29 at 12:25 +0100, Szabolcs Nagy wrote:
> 
> it's a general bug that most ifunc users (e.g. in binutils
> tests) declare the ifunc resolvers incorrectly, the correct
> prototype is the way the dynamic linker invokes the resolver
> in sysdeps/aarch64/dl-irel.h:
> 
> static inline ElfW(Addr)
> __attribute ((always_inline))
> elf_ifunc_invoke (ElfW(Addr) addr)
> {
>   return ((ElfW(Addr) (*) (unsigned long int)) (addr))
> (GLRO(dl_hwcap));
> }
> 
> (that argument should be uint64_t for ilp32, but that's a
> separate issue)
> 
> in glibc the hwcap is not used, because it has accesses to
> cached dispatch info, but in libatomic using the hwcap
> argument is the right way.

OK, I changed my patch to use the argument and it did work but
since the type of the argument should be uint64_t instead of 'unsigned
long int' for aarch64 I think I will send a patch to libc-alpha
first to fix this before sending an updated libatomic patch.  That way
I won't have to update GCC when glibc changes the type.  I see that
different platforms use different types for the resolver argument so I
think I will have to update the libatomic configure.tgt script as well
to set the resolver arg type when I resbumit the gcc libatomic patch so
that we can have the correct prototype in libatomic_i.h.

Steve Ellcey
sell...@cavium.com


[PATCH] Fix PR81987 (SLSR dominance issue)

2017-08-30 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/PR81987 identifies an SSA verification error following
SLSR.  The problem arises when SLSR places an initialization expression at
a point not dominated by the definition of an SSA name it uses.  When there
are multiple conditional candidates for replacement, the initialization
expression must dominate all of these candidates and their phi dependencies,
but the nearest common dominator for these may actually be above the stride
definition in some cases.

In such cases a single initialization point is not possible.  With sufficient
analysis, it would be possible to find multiple initialization points that
would satisfy availability of the stride at the cost of larger code.  This
is too complex for a bug fix, though.  This patch instead refuses to replace
candidates where a single legal initialization point isn't possible.  We
ensure this by setting the cost for the increment associated with this
initialization to effectively infinite.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is
this okay for trunk, and backport to all supported releases after a period
of burn-in?

Thanks,
Bill


[gcc]

2017-08-30  Bill Schmidt  

PR tree-optimization/81987
* gimple-ssa-strength-reduction.c (insert_initializers): Don't
insert an initializer in a location not dominated by the stride
definition.

[gcc/testsuite]

2017-08-30  Bill Schmidt  

PR tree-optimization/81987
* g++.dg/torture/pr81987.C: New file.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 251369)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -3340,6 +3340,23 @@ insert_initializers (slsr_cand_t c)
 that block, the earliest one will be returned in WHERE.  */
   bb = nearest_common_dominator_for_cands (c, incr, );
 
+  /* If the NCD is not dominated by the block containing the
+definition of the stride, we can't legally insert a
+single initializer.  Mark the increment as unprofitable
+so we don't make any replacements.  FIXME: Multiple
+initializers could be placed with more analysis.  */
+  gimple *stride_def = SSA_NAME_DEF_STMT (c->stride);
+  basic_block stride_bb = gimple_bb (stride_def);
+
+  if (stride_bb && !dominated_by_p (CDI_DOMINATORS, bb, stride_bb))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Initializer #%d cannot be legally placed\n", i);
+ incr_vec[i].cost = COST_INFINITE;
+ continue;
+   }
+
   /* If the nominal stride has a different type than the recorded
 stride type, build a cast from the nominal stride to that type.  */
   if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
Index: gcc/testsuite/g++.dg/torture/pr81987.C
===
--- gcc/testsuite/g++.dg/torture/pr81987.C  (nonexistent)
+++ gcc/testsuite/g++.dg/torture/pr81987.C  (working copy)
@@ -0,0 +1,61 @@
+extern short var_1;
+extern const short var_3;
+extern unsigned long int var_9;
+extern short var_13;
+extern const unsigned long int var_15;
+extern const unsigned long int var_37;
+extern unsigned long int var_40;
+extern long long int var_47;
+extern short var_48;
+extern const short var_54;
+extern long long int var_79;
+extern long long int var_81;
+extern long long int var_94;
+extern long long int var_95;
+extern long long int var_701;
+extern unsigned long int var_786;
+extern short var_788;
+extern long long int var_844;
+
+struct struct_1 {
+  short member_1_2 : 15;
+  static long long int member_1_3;
+};
+
+extern struct_1 struct_obj_6;
+extern struct_1 struct_obj_8;
+
+void foo() {
+  int a = var_3 <= 602154393864UL;
+  if (var_81 ? 0 : var_3 && var_9)
+;
+  else {
+var_94 = 0;
+if (var_3 && var_48 || var_13) {
+  if (var_48)
+   var_95 = 0;
+  short b((236446151776511UL + var_3) * (2 ? var_13 : 0) || var_1);
+  struct_obj_8.member_1_2 = b;
+  if (var_15) {
+   if (var_81)
+ if (var_47)
+   ;
+ else if (var_40)
+   var_701 = 0;
+  } else {
+   if (var_40)
+ var_79 = 0;
+   if (var_54) {
+ if (var_37)
+   var_786 = 0;
+ else
+   var_788 = 0;
+   struct_obj_6.member_1_3 =
+ (236446151776511UL + var_3) * (2 ? var_13 : 0);
+   }
+  }
+  if ((236446151776511UL + var_3) * (2 ? var_13 : 0))
+   var_844 = 0;
+}
+  }
+}



Re: AArch64 patch pings

2017-08-30 Thread James Greenhalgh

On Wed, Aug 30, 2017 at 04:34:40PM +0100, Richard Sandiford wrote:
> Ping for a few AArch64 patches:
> 
> [AArch64] Remove use of wider vector modes
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01249.html
>
> [AArch64] Rename cmp_result iterator
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01250.html
>
> [AArch64] Tighten address register subreg checks
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01252.html

I've reviewed these three, which all looked OK to me.

> [AArch64] Tweak aarch64_classify_address interface
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01251.html

I've reviewed this one with a comment on avoiding the duplication
of the new enum values across the back end.

> [61/77] Use scalar_int_mode in the AArch64 port
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00701.html
> 
> [75/77] Use scalar_mode in the AArch64 port
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00715.html

Do you have a link to a short primer on what these changes do? I don't really
have any objections to them if that's the direction the rest of the compiler
is going in, but I also didn't really understand what was happening here.

Thanks,
James



Re: [AArch64] Tighten address register subreg checks

2017-08-30 Thread James Greenhalgh
On Tue, Aug 22, 2017 at 10:25:02AM +0100, Richard Sandiford wrote:
> Previously we allowed subregs of non-GPR modes to be base and index
> registers in non-strict mode.  In practice such subregs will always
> require a reload, so we get better code by disallowing them.

Makes sense.

> Tested on aarch64-linux-gnu.  OK to install?

OK.

Thanks,
James

> 2017-08-22  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * config/aarch64/aarch64.c (aarch64_base_register_rtx_p): Only allow
>   subregs whose inner modes can be stored in GPRs.
>   (aarch64_classify_index): Likewise.
> 


Re: [C++ PATCH] Make taking the address of an overloaded function a non-deduced context

2017-08-30 Thread Ville Voutilainen
On 30 August 2017 at 19:07, Jason Merrill  wrote:
> Please also remove the error.  OK with that change.


Here's a new and much improved version as discussed on IRC. Tested on
Linux-PPC64.
Ok for trunk?

2017-08-30  Ville Voutilainen  

Make taking the address of an overloaded function a non-deduced context

cp/

* pt.c (unify_overload_resolution_failure): Remove.
(unify_one_argument): Adjust.

testsuite/

* g++.dg/overload/template6.C: New.
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 564ffb0..5cfe292 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6363,16 +6363,6 @@ unify_template_argument_mismatch (bool explain_p, tree 
parm, tree arg)
   return unify_invalid (explain_p);
 }
 
-static int
-unify_overload_resolution_failure (bool explain_p, tree arg)
-{
-  if (explain_p)
-inform (input_location,
-   "  could not resolve address from overloaded function %qE",
-   arg);
-  return unify_invalid (explain_p);
-}
-
 /* Attempt to convert the non-type template parameter EXPR to the
indicated TYPE.  If the conversion is successful, return the
converted value.  If the conversion is unsuccessful, return
@@ -19090,12 +19080,10 @@ unify_one_argument (tree tparms, tree targs, tree 
parm, tree arg,
 templates and at most one of a set of
 overloaded functions provides a unique
 match.  */
-
- if (resolve_overloaded_unification
- (tparms, targs, parm, arg, strict,
-  arg_strict, explain_p))
-   return unify_success (explain_p);
- return unify_overload_resolution_failure (explain_p, arg);
+ resolve_overloaded_unification (tparms, targs, parm,
+ arg, strict,
+ arg_strict, explain_p);
+ return unify_success (explain_p);
}
 
  arg_expr = arg;
diff --git a/gcc/testsuite/g++.dg/overload/template6.C 
b/gcc/testsuite/g++.dg/overload/template6.C
new file mode 100644
index 000..f2650aa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/template6.C
@@ -0,0 +1,47 @@
+// { dg-do compile { target c++11 } }
+
+template 
+struct is_function {
+  static constexpr bool value = false;
+};
+
+template 
+struct is_function
+{
+  static constexpr bool value = true;
+};
+
+template struct enable_if {};
+
+template struct enable_if 
+{
+  typedef T type;
+};
+
+template 
+struct remove_pointer
+{
+  typedef T type;
+};
+
+template 
+struct remove_pointer
+{
+  typedef T type;
+};
+
+void f(int) {}
+void f(double) {}
+
+template 
+struct X
+{
+  template ::type>::value,
+  bool>::type = false> X(U&&) {}
+};
+
+int main() {
+  X x0(f);
+}


Re: [AArch64] Rename cmp_result iterator

2017-08-30 Thread James Greenhalgh
On Tue, Aug 22, 2017 at 10:22:37AM +0100, Richard Sandiford wrote:
> The comparison results provided by the V_cmp_result/v_cmp_result
> attribute were simply the corresponding integer vector.  We'd also
> like to have easy access to the integer vector for SVE, but using
> "cmp_result" would be confusing because SVE comparisons return
> predicates instead of vectors.  This patch therefore renames the
> attributes to the more general V_INT_EQUIV/v_int_equiv instead.
> 
> As to the capitalisation: there are already many iterators that use
> all lowercase vs. all uppercase names to distinguish all lowercase
> vs. all uppercase expansions (e.g. fcvt_target and FCVT_TARGET).
> It's also the convention used for the built-in mode/MODE/code/CODE/etc.
> attributes.  IMO those names are easier to read at a glance, rather than
> relying on a single letter's difference.
> 
> Tested on aarch64-linux-gnu.  OK to install?

OK. This name far outlived its descriptive use (as the output type for FCGE
etc.), and this sort of rename is the right thing to do.

Thanks,
James

> 
> Richard
> 
> 
> 2017-08-22  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * config/aarch64/iterators.md (V_cmp_result): Rename to...
>   (V_INT_EQUIV): ...this.
>   (v_cmp_result): Rename to...
>   (v_int_equiv): ...this.
>   * config/aarch64/aarch64.md (xorsign3): Update accordingly.
>   * config/aarch64/aarch64-simd.md (xorsign3): Likewise.
>   (copysign3): Likewise.
>   (aarch64_simd_bsl_internal): Likewise.
>   (aarch64_simd_bsl): Likewise.
>   (vec_cmp): Likewise.
>   (vcond): Likewise.
>   (vcond): Likewise.
>   (vcondu): Likewise.
>   (aarch64_cm): Likewise.
>   (aarch64_cmtst): Likewise.
>   (aarch64_fac): Likewise.
>   (vec_perm_const): Likewise.
>   (vcond_mask_): Rename to...
>   (vcond_mask_): ...this.
>   (vec_cmp): Rename to...
>   (vec_cmp): ...this.


Re: [AArch64] Tweak aarch64_classify_address interface

2017-08-30 Thread James Greenhalgh
On Tue, Aug 22, 2017 at 10:23:47AM +0100, Richard Sandiford wrote:
> Previously aarch64_classify_address used an rtx code to distinguish
> LDP/STP addresses from normal addresses; the code was PARALLEL
> to select LDP/STP and anything else to select normal addresses.
> This patch replaces that parameter with a dedicated enum.
> 
> The SVE port will add another enum value that didn't map naturally
> to an rtx code.
> 
> Tested on aarch64-linux-gnu.  OK to install?

I can't say I really like this new interface, I'd prefer two wrappers
aarch64_legitimate_address_p , aarch64_legitimate_ldp_address_p (or similar)
around your new interface, and for most code to simply call the wrapper.
Or an overloaded call that filled in ADDR_QUERY_M automatically, to save
that spreading through the backend.

Thanks,
James

> 
> 
> 2017-08-22  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * config/aarch64/aarch64-protos.h (aarch64_addr_query_type): New enum.
>   (aarch64_legitimate_address_p): Use it instead of an rtx code.
>   * config/aarch64/aarch64.c (aarch64_classify_address): Likewise.
>   (aarch64_legitimate_address_p): Likewise.
>   (aarch64_address_valid_for_prefetch_p): Update calls accordingly.
>   (aarch64_legitimate_address_hook_p): Likewise.
>   (aarch64_print_operand_address): Likewise.
>   (aarch64_address_cost): Likewise.
>   * config/aarch64/aarch64-simd.md (mov): Likewise.
>   * config/aarch64/constraints.md (Uad): Likewise.
>   * config/aarch64/predicates.md (aarch64_mem_pair_operand): Likewise.


Re: [AArch64] Remove use of wider vector modes

2017-08-30 Thread James Greenhalgh
On Tue, Aug 22, 2017 at 10:20:46AM +0100, Richard Sandiford wrote:
> The AArch64 port defined x2, x3 and x4 vector modes that were only used
> in the rtl for the AdvSIMD LD{2,3,4} patterns.  It seems unlikely that
> this rtl would have led to any valid simplifications, since the values
> involved were unspecs that had a different number of operands from the
> non-dreg versions.  (The dreg UNSPEC_LD2 had a single operand, while
> the qreg one had two operands.)
> 
> As it happened, the patterns led to invalid simplifications on big-
> endian targets due to a mix-up in the operand order, see Tamar's fix
> in r240271.
> 
> This patch therefore replaces the rtl patterns with dedicated unspecs.
> This allows the x2, x3 and x4 modes to be removed, avoiding a clash
> with 256-bit and 512-bit SVE.
> 
> Tested on aarch64-linux-gnu.  Also tested by comparing the before and after
> assembly for the testsuite at -O2 -ftree-vectorize on aarch64-linux-gnu
> and aarch64_be-linux-gnu; there were no differences.  OK to install?

This is OK. I think we were being a bit too optimistic with the design
of these RTL patterns, though clearly at least some part of this was getting
used in rare corner cases, as we had a wrong code-bug. I think what these
would give you is a simplification if you used a 64-bit ld2/3/4 and then
extracted a lane which GCC knew to be zero. This happened by mistake with
the backwards lane indexes for big-endian, causing us to think the DImode
subreg we took was constant zero.

In reality, there is not likely to be a common case that needs
optimisation here, so I don't mind losing this.

OK.

Thanks,
James

> 2017-08-22  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * config/aarch64/aarch64-modes.def: Remove 32-, 48- and 64-byte
>   vector modes.
>   * config/aarch64/iterators.md (VRL2, VRL3, VRL4): Delete.
>   * config/aarch64/aarch64.md (UNSPEC_LD2_DREG, UNSPEC_LD3_DREG)
>   (UNSPEC_LD4_DREG): New unspecs.
>   * config/aarch64/aarch64-simd.md (aarch64_ld2_dreg_le)
>   (aarch64_ld2_dreg_be): Replace with...
>   (aarch64_ld2_dreg): ...this pattern and use the new DREG
>   unspec.
>   (aarch64_ld3_dreg_le)
>   (aarch64_ld3_dreg_be): Replace with...
>   (aarch64_ld3_dreg): ...this pattern and use the new DREG
>   unspec.
>   (aarch64_ld4_dreg_le)
>   (aarch64_ld4_dreg_be): Replace with...
>   (aarch64_ld4_dreg): ...this pattern and use the new DREG
>   unspec.
> 
> Index: gcc/config/aarch64/aarch64-modes.def
> ===
> --- gcc/config/aarch64/aarch64-modes.def  2017-02-23 19:54:24.0 
> +
> +++ gcc/config/aarch64/aarch64-modes.def  2017-08-22 10:11:04.724056356 
> +0100
> @@ -44,15 +44,5 @@ INT_MODE (OI, 32);
>  INT_MODE (CI, 48);
>  INT_MODE (XI, 64);
>  
> -/* Vector modes for register lists.  */
> -VECTOR_MODES (INT, 32);  /* V32QI V16HI V8SI V4DI.  */
> -VECTOR_MODES (FLOAT, 32);/* V8SF V4DF.  */
> -
> -VECTOR_MODES (INT, 48);  /* V32QI V16HI V8SI V4DI.  */
> -VECTOR_MODES (FLOAT, 48);/* V8SF V4DF.  */
> -
> -VECTOR_MODES (INT, 64);  /* V32QI V16HI V8SI V4DI.  */
> -VECTOR_MODES (FLOAT, 64);/* V8SF V4DF.  */
> -
>  /* Quad float: 128-bit floating mode for long doubles.  */
>  FLOAT_MODE (TF, 16, ieee_quad_format);
> Index: gcc/config/aarch64/iterators.md
> ===
> --- gcc/config/aarch64/iterators.md   2017-08-03 10:40:55.896279339 +0100
> +++ gcc/config/aarch64/iterators.md   2017-08-22 10:11:04.727125997 +0100
> @@ -711,21 +711,6 @@ (define_mode_attr Vendreg [(OI "T") (CI
>  ;; ld..._lane and st..._lane operations.
>  (define_mode_attr nregs [(OI "2") (CI "3") (XI "4")])
>  
> -(define_mode_attr VRL2 [(V8QI "V32QI") (V4HI "V16HI")
> - (V4HF "V16HF")
> - (V2SI "V8SI")  (V2SF "V8SF")
> - (DI   "V4DI")  (DF   "V4DF")])
> -
> -(define_mode_attr VRL3 [(V8QI "V48QI") (V4HI "V24HI")
> - (V4HF "V24HF")
> - (V2SI "V12SI")  (V2SF "V12SF")
> - (DI   "V6DI")  (DF   "V6DF")])
> -
> -(define_mode_attr VRL4 [(V8QI "V64QI") (V4HI "V32HI")
> - (V4HF "V32HF")
> - (V2SI "V16SI")  (V2SF "V16SF")
> - (DI   "V8DI")  (DF   "V8DF")])
> -
>  ;; Mode for atomic operation suffixes
>  (define_mode_attr atomic_sfx
>[(QI "b") (HI "h") (SI "") (DI "")])
> Index: gcc/config/aarch64/aarch64.md
> ===
> --- gcc/config/aarch64/aarch64.md 2017-08-16 08:50:34.060622654 +0100
> +++ gcc/config/aarch64/aarch64.md 2017-08-22 10:11:04.726102783 +0100
> @@ -98,10 +98,13 @@ (define_c_enum "unspec" [
>  

Re: [PATCH] Factor out division by squares and remove division around comparisons (2/2)

2017-08-30 Thread Jackson Woodruff

Hi all,

I've attached a new version of the patch in response to a few of Wilco's 
comments in person.


The end product of the pass is still the same, but I have fixed several 
bugs.


Now tested independently of the other patches.

On 08/15/2017 03:07 PM, Richard Biener wrote:

On Thu, Aug 10, 2017 at 4:10 PM, Jackson Woodruff
 wrote:

Hi all,

The patch implements the some of the division optimizations discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 .

We now reassociate (as discussed in the bug report):

 x / (y * y) -> x  * (1 / y) * (1 / y)

If it is reasonable to do so. This is done with
-funsafe-math-optimizations.

Bootstrapped and regtested with part (1/2). OK for trunk?


I believe your enhancement shows the inherent weakness of
CSE of reciprocals in that it works from the defs.  It will
handle x / (y * y) but not x / (y * y * y).

I think a rewrite of this mini-pass is warranted.


I suspect that there might be more to gain by of handling the case of
x / (y * z) rather than the case of x / (y**n), but I agree that this 
pass could do more.




Richard.


Jackson

gcc/

2017-08-03  Jackson Woodruff  

 PR 71026/tree-optimization
 * tree-ssa-math-opts (is_division_by_square,
 is_square_of, insert_sqaure_reciprocals): New.
 (insert_reciprocals): Change to insert reciprocals
 before a division by a square.
 (execute_cse_reciprocals_1): Change to consider
 division by a square.


gcc/testsuite

2017-08-03  Jackson Woodruff  

 PR 71026/tree-optimization
 * gcc.dg/associate_division_1.c: New.



Thanks,

Jackson.

Updated ChangeLog:

gcc/

2017-08-30  Jackson Woodruff  

PR 71026/tree-optimization
* tree-ssa-math-opts (is_division_by_square, is_square_of): New.
(insert_reciprocals): Change to insert reciprocals
before a division by a square and to insert the square
of a reciprocal.
(execute_cse_reciprocals_1): Change to consider
division by a square.
(register_division_in): Add importance parameter.

gcc/testsuite

2017-08-30  Jackson Woodruff  

PR 71026/tree-optimization
* gcc.dg/extract_recip_3.c: New.
* gcc.dg/extract_recip_4.c: New.
* gfortran.dg/extract_recip_1.f: New.
diff --git a/gcc/testsuite/gcc.dg/extract_recip_3.c 
b/gcc/testsuite/gcc.dg/extract_recip_3.c
new file mode 100644
index 
..0ea3fdf5cca06a0806a55185ef8b0a4b0507
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/extract_recip_3.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-optimized" } */
+
+float
+extract_square (float x, float y, float *a, float *b)
+{
+  *a = 3 / (y * y);
+  *b = 5 / (y * y);
+
+  return x / (y * y);
+}
+
+/* Don't expect the 'powmult' (calculation of y * y)
+   to be deleted until a later pass, so look for one
+   more multiplication than strictly necessary.  */
+float
+extract_recip (float *w, float x, float y, float z)
+{
+  *w = 7 / y;
+
+  return x / (y * y) + z / y;
+}
+
+/* 3 For the pointers to a, b and w, 4 multiplications in 'extract_square',
+   5 multiplications in 'extract_recip' expected.  */
+/* { dg-final { scan-tree-dump-times " \\* " 12 "optimized" } } */
+
+/* 1 division in 'extract_square', 1 division in 'extract_recip'. */
+/* { dg-final { scan-tree-dump-times " / " 2 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/extract_recip_4.c 
b/gcc/testsuite/gcc.dg/extract_recip_4.c
new file mode 100644
index 
..5a31d40ed910cdd1914cc1e82358493be428946a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/extract_recip_4.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -fdump-tree-optimized" } */
+
+/* Don't expect any of these divisions to be extracted.  */
+double f (double x, int p)
+{
+  if (p > 0)
+{
+  return 1.0/(x * x);
+}
+
+  if (p > -1)
+{
+  return x * x * x;
+}
+  return  1.0 /(x);
+}
+
+/* Expect a reciprocal to be extracted here.  */
+double g (double x, double y)
+{
+  double k = x / (y * y);
+
+  if (y * y == 2.0)
+return k + 1 / y;
+  else
+return k - 1 / y;
+}
+
+/* Expect 2 divisions in 'f' and 1 in 'g'.  */
+/* { dg-final { scan-tree-dump-times " / " 3 "optimized" } } */
+/* Expect 3 multiplications in 'f' and 3 in 'g'.  */
+/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */
diff --git a/gcc/testsuite/gfortran.dg/extract_recip_1.f 
b/gcc/testsuite/gfortran.dg/extract_recip_1.f
new file mode 100644
index 
..ecf05189773b6c2f46222857fd88fd010bfdf348
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/extract_recip_1.f
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! { dg-options "-Ofast -fdump-tree-optimized" }
+
+  SUBROUTINE F(N,X,Y,Z,A,B)
+  DIMENSION 

Re: [PATCH GCC]A simple implementation of loop interchange

2017-08-30 Thread Bin.Cheng
On Wed, Aug 30, 2017 at 3:19 PM, Richard Biener
 wrote:
> On Wed, Aug 30, 2017 at 3:18 PM, Bin Cheng  wrote:
>> Hi,
>> This patch implements a simple loop interchange pass in GCC, as described by 
>> its comments:
>> +/* This pass performs loop interchange: for example, the loop nest
>> +
>> +   for (int j = 0; j < N; j++)
>> + for (int k = 0; k < N; k++)
>> +   for (int i = 0; i < N; i++)
>> +c[i][j] = c[i][j] + a[i][k]*b[k][j];
>> +
>> +   is transformed to
>> +
>> +   for (int i = 0; i < N; i++)
>> + for (int j = 0; j < N; j++)
>> +   for (int k = 0; k < N; k++)
>> +c[i][j] = c[i][j] + a[i][k]*b[k][j];
>> +
>> +   This pass implements loop interchange in the following steps:
>> +
>> + 1) Find perfect loop nest for each innermost loop and compute data
>> +   dependence relations for it.  For above example, loop nest is
>> +   .
>> + 2) From innermost to outermost loop, this pass tries to interchange
>> +   each loop pair.  For above case, it firstly tries to interchange
>> +    and loop nest becomes .
>> +   Then it tries to interchange  and loop nest becomes
>> +   .  The overall effect is to move innermost
>> +   loop to the outermost position.  For loop pair 
>> +   to be interchanged, we:
>> + 3) Check if data dependence relations are valid for loop interchange.
>> + 4) Check if both loops can be interchanged in terms of transformation.
>> + 5) Check if interchanging the two loops is profitable.
>> + 6) Interchange the two loops by mapping induction variables.
>> +
>> +   This pass also handles reductions in loop nest.  So far we only support
>> +   simple reduction of inner loop and double reduction of the loop nest.  */
>>
>> Actually, this pass only does loop shift which outermosting inner loop to 
>> outer, rather
>> than permutation.  Also as a traditional loop optimizer, it only works for 
>> perfect loop
>> nest.  I put it just after loop distribution thus ideally loop 
>> split/distribution could
>> create perfect nest for it.  Unfortunately, we don't get any perfect nest 
>> from distribution
>> for now because it only works for innermost loop.  For example, the 
>> motivation case in
>> spec2k6/bwaves is not handled on this pass alone.  I have a patch extending 
>> distribution
>> for (innermost) loop nest and with that patch bwaves case can be handled.
>> Another point is I deliberately make both the cost model and code 
>> transformation (very)
>> conservative.  We can support more cases, or more transformations with great 
>> care when
>> it is for sure known beneficial.  IMHO, we already hit over-baked issues 
>> quite often and
>> don't want to introduce more.
>> As for code generation, this patch has an issue that invariant code in outer 
>> loop could
>> be moved to inner loop.  For the moment, we rely on the last lim pass to 
>> handle all INV
>> generated during interchange.  In the future, we may need to avoid that in 
>> interchange
>> itself, or another lim pass just like the one after graphite optimizations.
>>
>> Boostrap and test on x86_64 and AArch64.  Various benchmarks built and run 
>> successfully.
>> Note this pass is disabled in patch, while the code is exercised by 
>> bootstrap/building
>> programs with it enabled by default.  Any comments?
>
Thanks for quick review.
> +/* The same as above, but this one is only used for interchanging not
> +   innermost loops.  */
> +#define OUTER_STRIDE_RATIO (2)
>
> please make all these knobs --params.
>
> +/* Enum type for loop reduction variable.  */
> +enum reduction_type
> +{
> +  UNKNOWN_RTYPE = 0,
> +  SIMPLE_RTYPE,
> +  DOUBLE_RTYPE
> +};
>
> seeing this we should have some generic data structure / analysis for
> reduction detection.  This adds a third user (autopar and vectorizer
> are the others).  Just an idea.
>
> +/* Return true if E is abnormal edge.  */
> +
> +static inline bool
> +abnormal_edge (edge e)
> +{
> +  return (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP));
> +}
>
> bad name/comment for what it does.
>
> ... jumping to end of file / start of pass
>
> +  /* Get the outer loop.  */
> +  loop = superloop_at_depth (loop, loop_depth (loop) - 1);
>
> loop_outer (loop)?
>
> +  /* Only support rectangle loop nest, i.e, inner loop's niters doesn't
> +depends on outer loop's IV.  */
> +  if (chrec_contains_symbols_defined_in_loop (niters, loop->num))
> +   break;
>
> but you don't check for a three-nest whether niters depends on outer outer
> loop's IV that way.  Either the check is superfluous here or incomplete.
It is checked for multi-nest case in can_interchange_loops.  I will
move the check to this function so that we can save compilation time.
>
> +  /* Check if start_loop forms a perfect loop nest.  */
> +  

Re: [C++ PATCH] Make taking the address of an overloaded function a non-deduced context

2017-08-30 Thread Jason Merrill
On Tue, Aug 29, 2017 at 6:19 PM, Ville Voutilainen
 wrote:
> 2017-08-29  Ville Voutilainen  
>
> Make taking the address of an overloaded function a non-deduced context
>
> cp/
>
> * pt.c (unify_overload_resolution_failure): Return unify_success
> instead of unify_invalid.

Please also remove the error.  OK with that change.

Jason


Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access

2017-08-30 Thread James Greenhalgh
Hi Tamar,

I think the AArch64 parts of this patch can be substantially simplified.

On Mon, Jul 03, 2017 at 07:17:51AM +0100, Tamar Christina wrote:
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> ab1bdc0233afe7a3c41501cb724a5c4c719123b8..1cf6c4b891571745f740d7dbd03754bbba9264a6
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13466,7 +13466,6 @@ aarch64_copy_one_block_and_progress_pointers (rtx 
> *src, rtx *dst,
>  
>  /* Expand movmem, as if from a __builtin_memcpy.  Return true if
> we succeed, otherwise return false.  */
> -
>  bool
>  aarch64_expand_movmem (rtx *operands)
>  {
> @@ -13498,16 +13497,55 @@ aarch64_expand_movmem (rtx *operands)
>base = copy_to_mode_reg (Pmode, XEXP (src, 0));
>src = adjust_automodify_address (src, VOIDmode, base, 0);
>  
> +  /* Optimize routines for MEM to REG copies.
> + This particular function only handles copying to two
> + destination types: 1) a regular register and 2) the stack.
> + When writing to a regular register we may end up writting too much in 
> cases
> + where the register already contains a live value or when no data 
> padding is
> + happening so we disallow it.  */
> +  if (n < 8 && !REG_P (XEXP (operands[0], 0)))

I don't understand how this comment lines up with the condition on this
code path. On the one hand you say you permit regular registers, but on the
other hand you say you disallow regular registers because you may overwrite.


> +   {
> + machine_mode mov_mode, dest_mode
> +   = smallest_mode_for_size (n * BITS_PER_UNIT, MODE_INT);
> + rtx result = gen_reg_rtx (dest_mode);
> + emit_insn (gen_move_insn (result, GEN_INT (0)));
> +
> + unsigned int shift_cnt = 0;

Any reason not to just initialise the shift_cnt in place here?

> + for (; n > shift_cnt; shift_cnt += GET_MODE_SIZE (mov_mode))


 for (unsigned int shift_cnt = 0;
  shift_cnt <=  n;
  shift_cnt += GET_MODE_SIZE (mov_mode))

Having the loop counter first in the comparison is personal preference.

mov_mode looks uninitialised, but I guess by the time you check the loop
condition you have actually initialized it.

> +   {
> +  int nearest = 0;

This isn't required, we could do the loop below with one 

> +  /* Find the mode to use.  */
> +  for (unsigned max = 1; max <= (n - shift_cnt); max *= 2)
> +   nearest = max;

Wrong formatting.

So, when this loop terminates for the first time (shift_cnt == 0) nearest
is the first power of two greater than n.

> +
> +   mov_mode = smallest_mode_for_size (nearest * BITS_PER_UNIT, MODE_INT);

That means this is always a mode of at least the right size, which in turn
means that we can't execute round the loop again (GET_MODE_SIZE (mov_mode)
will always be greater than n). So, we can be sure this loop only executes
once, and we can fold mov_mode and dest_mode to be equal.

> +   rtx reg = gen_reg_rtx (mov_mode);
> +
> +   src = adjust_address (src, mov_mode, 0);
> +   emit_insn (gen_move_insn (reg, src));
> +   src = aarch64_progress_pointer (src);
> +
> +   reg = gen_rtx_ZERO_EXTEND (dest_mode, reg);
> +   reg = gen_rtx_ASHIFT (dest_mode, reg,
> + GEN_INT (shift_cnt * BITS_PER_UNIT));
> +   result = gen_rtx_IOR (dest_mode, reg, result);
> +   }
> +
> +dst = adjust_address (dst, dest_mode, 0);
> +emit_insn (gen_move_insn (dst, result));
> +return true;
> +  }
> +
>/* Simple cases.  Copy 0-3 bytes, as (if applicable) a 2-byte, then a
>   1-byte chunk.  */
>if (n < 4)
>  {
>if (n >= 2)
> - {
> -   aarch64_copy_one_block_and_progress_pointers (, , HImode);
> -   n -= 2;
> - }
>  
> +  {
> + aarch64_copy_one_block_and_progress_pointers (, , HImode);
> + n -= 2;
> +  }

These formatting changes leave a blank newline between if (n >= 2) and the
remainder of this block. Why?

>if (n == 1)
>   aarch64_copy_one_block_and_progress_pointers (, , QImode);
>  

Thanks,
James


RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-08-30 Thread Richard Biener
On August 30, 2017 5:11:24 PM GMT+02:00, Jon Beniston  wrote:
>Hi Richard,
>
>> I think the issue is that with TARGET_VECTOR_MODE_SUPPORTED_P false 
>> for V1SI you'll get a SImode vector type and the
>>
>>  if (VECTOR_BOOLEAN_TYPE_P (type_in)
>>  || VECTOR_MODE_P (TYPE_MODE (type_in)))
>>
>>check fails.  Try changing that to || VECTOR_TYPE_P (type_in).
>>The else path should be hopefully dead ...
>>
>>> It looks to me like the other call sites of optab_for_tree_code
>which 
>>> are passing optab_default are mostly places where GCC is sure it is 
>>> not a shift operation.
>>> Given TYPE_IN is returned from get_vectype_for_scalar_type, is it 
>>> safe to simply pass optab_vector in vect_pattern_recog_1?
>>
>>I guess so.  Can you also try the above?
>
>Changing VECTOR_MODE_P check into VECTOR_TYPE_P check also works for
>me, I
>verified the following patch could vectorize my dot product case and
>there
>is no regression on gcc tests on my port.
>
>Meanwhile x86-64 bootstraps OK and no regression on gcc/g++ test.
>
> Does this look OK?

OK. 

Thanks, 
Richard. 

>
>gcc/
>2017-08-30  Jon Beniston  
>Richard Biener  
>
>diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>index cfdb72c..5ebeac2 100644
>--- a/gcc/tree-vect-patterns.c
>+++ b/gcc/tree-vect-patterns.c
>@@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func
>*recog_func,
>   loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>  
>   if (VECTOR_BOOLEAN_TYPE_P (type_in)
>-  || VECTOR_MODE_P (TYPE_MODE (type_in)))
>+  || VECTOR_TYPE_P (type_in))
> {
> /* No need to check target support (already checked by the pattern
>  recognition function).  */
>diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>index 013fb1f..fc62efb 100644
>--- a/gcc/tree-vect-stmts.c
>+++ b/gcc/tree-vect-stmts.c
>@@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
>scalar_type, unsigned size)
>   else
> simd_mode = mode_for_vector (inner_mode, size / nbytes);
>   nunits = GET_MODE_SIZE (simd_mode) / nbytes;
>-  if (nunits <= 1)
>+  /* NOTE: nunits == 1 is allowed to support single element vector
>types.
>*/
>+  if (nunits < 1)
> return NULL_TREE;
> 
>   vectype = build_vector_type (scalar_type, nunits);



Re: Add a partial_subreg_p predicate

2017-08-30 Thread Richard Sandiford
Jeff Law  writes:
> On 08/21/2017 07:34 AM, Richard Sandiford wrote:
>> This patch adds a partial_subreg_p predicate to go alongside
>> paradoxical_subreg_p.
>> 
>> The first two changes to cse_insn preserve the current behaviour,
>> but the condition seems strange.  Shouldn't we be able to continue
>> to cse if the inner modes of the two subregs have the same size?
> I would think so.  It could well be a simple oversight.  We're talking
> about very old code -- this stuff is virtually unchanged in 25 years.
>
> This specific chunk of code is something I would like to look more
> deeply into its utility -- most of what it's doing should be handled by
> DOM these days.  It'd be an interesting exercise to see if this code is
> important at all anymore.

OK.  When I get some spare machine time, I might try putting a
gcc_unreachable () there and doing the usual multi-target testing,
just to see if or when it fires.

> If you wanted to handle that case, I wouldn't object as a follow-up if
> you can come up with a testcase that shows when its useful.
>
>
>> The patch also preserves the existing condition in
>> simplify_operand_subreg, but perhaps it should be using
>> a df_read_modify_subreg_p-style check instead.  We don't
>> need to reload the inner value first if the inner value
>> is no bigger than a word, for example.
> Not sure on this one.  We could try to change it as a follow-up though.
> Vlad may have a better sense of how this might interact with other parts
> of LRA than I would

I'll give this one a go too.

>> The use in curr_insn_transform also seemed strange.  Surely the
>> modes of the SET_DEST and SET_SRC will be the same, given that
>> this code isn't meant for constants?
> Is it possible this is for the zero/sign extension support?

Even then I don't think we could ever have SET_SRC and SET_DEST being
different.  The extension has to be explicit in the source.

>> Like the paradoxical_subreg_p patch, this one replaces some tests that
>> were based on GET_MODE_SIZE rather than GET_MODE_PRECISION.  In each
>> case the change should be a no-op or an improvement.
>> 
>> Doing this in regcprop.c prevents some replacements of the 82-bit RFmode
>> with the 80-bit XFmode on ia64.  I don't understand the target details
>> here particularly well, but from the way the values are described in
>> ia64-modes.def, it isn't valid to assume that an XFmode can carry an
>> RFmode payload.  A comparison of the testsuite assembly output for one
>> target per CPU showed no other differences.
>> 
>> Some of the places changed here are tracking the widest access mode
>> found for a register.  The series tries to standardise on:
>> 
>>   if (partial_subreg_p (widest_seen, new_mode))
>> widest_seen = new_mode;
>> 
>> rather than:
>> 
>>   if (paradoxical_subreg_p (new_mode, widest_seen))
>> widest_seen = new_mode;
>> 
>> Either would have been OK.
>> 
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu in addition to the above.
>> OK to install?
>> 
>> Richard
>> 
>> 
>> 2017-08-21  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>> 
>> gcc/
>>  * rtl.h (partial_subreg_p): New function.
>>  * caller-save.c (save_call_clobbered_regs): Use it.
>>  * calls.c (expand_call): Likewise.
>>  * combine.c (combinable_i3pat): Likewise.
>>  (simplify_set): Likewise.
>>  (make_extraction): Likewise.
>>  (make_compound_operation_int): Likewise.
>>  (gen_lowpart_or_truncate): Likewise.
>>  (force_to_mode): Likewise.
>>  (make_field_assignment): Likewise.
>>  (reg_truncated_to_mode): Likewise.
>>  (record_truncated_value): Likewise.
>>  (move_deaths): Likewise.
>>  * cse.c (record_jump_cond): Likewise.
>>  (cse_insn): Likewise.
>>  * cselib.c (cselib_lookup_1): Likewise.
>>  * df-scan.c (df_read_modify_subreg_p): Likewise.
>>  * expmed.c (extract_bit_field_using_extv): Likewise.
>>  * function.c (assign_parm_setup_reg): Likewise.
>>  * ifcvt.c (noce_convert_multiple_sets): Likewise.
>>  * ira-build.c (create_insn_allocnos): Likewise.
>>  * lra-coalesce.c (merge_pseudos): Likewise.
>>  * lra-constraints.c (match_reload): Likewise.
>>  (simplify_operand_subreg): Likewise.
>>  (curr_insn_transform): Likewise.
>>  * lra-lives.c (process_bb_lives): Likewise.
>>  * lra.c (new_insn_reg): Likewise.
>>  (lra_substitute_pseudo): Likewise.
>>  * regcprop.c (mode_change_ok): Likewise.
>>  (maybe_mode_change): Likewise.
>>  (copyprop_hardreg_forward_1): Likewise.
>>  * reload.c (push_reload): Likewise.
>>  (find_reloads): Likewise.
>>  (find_reloads_subreg_address): Likewise.
>>  * reload1.c (alter_reg): Likewise.
>>  (eliminate_regs_1): Likewise.
>>  * simplify-rtx.c (simplify_unary_operation_1): Likewise.
> Any thought on whether or not a same size subreg predicate would 

AArch64 patch pings

2017-08-30 Thread Richard Sandiford
Ping for a few AArch64 patches:

[61/77] Use scalar_int_mode in the AArch64 port
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00701.html

[75/77] Use scalar_mode in the AArch64 port
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00715.html

[AArch64] Remove use of wider vector modes
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01249.html

[AArch64] Rename cmp_result iterator
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01250.html

[AArch64] Tweak aarch64_classify_address interface
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01251.html

[AArch64] Tighten address register subreg checks
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01252.html

Thanks,
Richard


Re: [TESTSUITE]Use strncpy instead of strcpy in testsuite/gcc.dg/memcmp-1.c

2017-08-30 Thread Renlin Li

Hi Aaron,

On 30/08/17 15:37, Aaron Sawdey wrote:

On Wed, 2017-08-30 at 10:16 +0100, Renlin Li wrote:

Hi,



Hi,
   Renlin you are correct that it shouldn't be using strcpy because the
string may not be null terminated. However I would suggest we use
memcpy instead of strncpy. The reason is that cases where there is a
null char in the middle of the string test whether the strncmp is
properly ignoring what comes after. So how about using this:

  memcpy(a,str1,SZ);\
  memcpy(b,str2,SZ);\

as in the test_memcmp_ part of the macro?


You are right.

For strncpy, if there is a null-character before size, the destination will be padded with 
zeros.


memcpy is better than strncpy in this case.
Here is the updated patch.

Regards,
Renlin
diff --git a/gcc/testsuite/gcc.dg/memcmp-1.c b/gcc/testsuite/gcc.dg/memcmp-1.c
index 828a0ca..d258354 100644
--- a/gcc/testsuite/gcc.dg/memcmp-1.c
+++ b/gcc/testsuite/gcc.dg/memcmp-1.c
@@ -110,8 +110,8 @@ static void test_strncmp_ ## SZ ## _ ## ALIGN (const char *str1, const char *str
 	{\
 	  a = three+i*ALIGN+j*(4096-2*i*ALIGN);\
 	  b = four+i*ALIGN+j*(4096-2*i*ALIGN);\
-	  strcpy(a,str1);		\
-	  strcpy(b,str2);		\
+	  memcpy(a,str1,SZ);		\
+	  memcpy(b,str2,SZ);		\
 	  r = strncmp(a,b,SZ);		\
 	  if ( r < 0 && !(expect < 0) ) abort();			\
 	  if ( r > 0 && !(expect > 0) )	abort();			\


RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-08-30 Thread Jon Beniston
Hi Richard,

> I think the issue is that with TARGET_VECTOR_MODE_SUPPORTED_P false 
> for V1SI you'll get a SImode vector type and the
>
>  if (VECTOR_BOOLEAN_TYPE_P (type_in)
>  || VECTOR_MODE_P (TYPE_MODE (type_in)))
>
>check fails.  Try changing that to || VECTOR_TYPE_P (type_in).
>The else path should be hopefully dead ...
>
>> It looks to me like the other call sites of optab_for_tree_code which 
>> are passing optab_default are mostly places where GCC is sure it is 
>> not a shift operation.
>> Given TYPE_IN is returned from get_vectype_for_scalar_type, is it 
>> safe to simply pass optab_vector in vect_pattern_recog_1?
>
>I guess so.  Can you also try the above?

Changing VECTOR_MODE_P check into VECTOR_TYPE_P check also works for me, I
verified the following patch could vectorize my dot product case and there
is no regression on gcc tests on my port.

Meanwhile x86-64 bootstraps OK and no regression on gcc/g++ test.

 Does this look OK?


gcc/
2017-08-30  Jon Beniston  
Richard Biener  

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index cfdb72c..5ebeac2 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func *recog_func,
   loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
  
   if (VECTOR_BOOLEAN_TYPE_P (type_in)
-  || VECTOR_MODE_P (TYPE_MODE (type_in)))
+  || VECTOR_TYPE_P (type_in))
 {
   /* No need to check target support (already checked by the pattern
  recognition function).  */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 013fb1f..fc62efb 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
scalar_type, unsigned size)
   else
 simd_mode = mode_for_vector (inner_mode, size / nbytes);
   nunits = GET_MODE_SIZE (simd_mode) / nbytes;
-  if (nunits <= 1)
+  /* NOTE: nunits == 1 is allowed to support single element vector types.
*/
+  if (nunits < 1)
 return NULL_TREE;
 
   vectype = build_vector_type (scalar_type, nunits);




[PATCH] rs6000_expand_binop_builtin

2017-08-30 Thread David Edelsohn
The patch to convert rs6000.c:rs6000_expand_binop_builtin to switch
statement broke bootstrap because some of the CODE_FOR_XXX labels are
not guaranteed to be defined or have unique values in all
configurations.

Restores bootstrap on AIX.

Committed.

Thanks, David


* config/rs6000/rs6000.c (rs6000_expand_binop_builtin): Revert
back to if statements, including unpack.

Index: rs6000.c
===
--- rs6000.c(revision 251533)
+++ rs6000.c(working copy)
@@ -14000,17 +14000,14 @@
   if (arg0 == error_mark_node || arg1 == error_mark_node)
 return const0_rtx;

-  switch (icode)
+  if (icode == CODE_FOR_altivec_vcfux
+  || icode == CODE_FOR_altivec_vcfsx
+  || icode == CODE_FOR_altivec_vctsxs
+  || icode == CODE_FOR_altivec_vctuxs
+  || icode == CODE_FOR_altivec_vspltb
+  || icode == CODE_FOR_altivec_vsplth
+  || icode == CODE_FOR_altivec_vspltw)
 {
-default:
-  break;
-case CODE_FOR_altivec_vcfux:
-case CODE_FOR_altivec_vcfsx:
-case CODE_FOR_altivec_vctsxs:
-case CODE_FOR_altivec_vctuxs:
-case CODE_FOR_altivec_vspltb:
-case CODE_FOR_altivec_vsplth:
-case CODE_FOR_altivec_vspltw:
   /* Only allow 5-bit unsigned literals.  */
   STRIP_NOPS (arg1);
   if (TREE_CODE (arg1) != INTEGER_CST
@@ -14019,15 +14016,16 @@
  error ("argument 2 must be a 5-bit unsigned literal");
  return CONST0_RTX (tmode);
}
-  break;
-case CODE_FOR_dfptstsfi_eq_dd:
-case CODE_FOR_dfptstsfi_lt_dd:
-case CODE_FOR_dfptstsfi_gt_dd:
-case CODE_FOR_dfptstsfi_unordered_dd:
-case CODE_FOR_dfptstsfi_eq_td:
-case CODE_FOR_dfptstsfi_lt_td:
-case CODE_FOR_dfptstsfi_gt_td:
-case CODE_FOR_dfptstsfi_unordered_td:
+}
+  else if (icode == CODE_FOR_dfptstsfi_eq_dd
+  || icode == CODE_FOR_dfptstsfi_lt_dd
+  || icode == CODE_FOR_dfptstsfi_gt_dd
+  || icode == CODE_FOR_dfptstsfi_unordered_dd
+  || icode == CODE_FOR_dfptstsfi_eq_td
+  || icode == CODE_FOR_dfptstsfi_lt_td
+  || icode == CODE_FOR_dfptstsfi_gt_td
+  || icode == CODE_FOR_dfptstsfi_unordered_td)
+{
   /* Only allow 6-bit unsigned literals.  */
   STRIP_NOPS (arg0);
   if (TREE_CODE (arg0) != INTEGER_CST
@@ -14036,12 +14034,13 @@
  error ("argument 1 must be a 6-bit unsigned literal");
  return CONST0_RTX (tmode);
}
-  break;
-case CODE_FOR_xststdcqp:
-case CODE_FOR_xststdcdp:
-case CODE_FOR_xststdcsp:
-case CODE_FOR_xvtstdcdp:
-case CODE_FOR_xvtstdcsp:
+}
+  else if (icode == CODE_FOR_xststdcqp
+  || icode == CODE_FOR_xststdcdp
+  || icode == CODE_FOR_xststdcsp
+  || icode == CODE_FOR_xvtstdcdp
+  || icode == CODE_FOR_xvtstdcsp)
+{
   /* Only allow 7-bit unsigned literals. */
   STRIP_NOPS (arg1);
   if (TREE_CODE (arg1) != INTEGER_CST
@@ -14050,12 +14049,13 @@
  error ("argument 2 must be a 7-bit unsigned literal");
  return CONST0_RTX (tmode);
}
-  break;
-case CODE_FOR_unpackv1ti:
-case CODE_FOR_unpackkf:
-case CODE_FOR_unpacktf:
-case CODE_FOR_unpackif:
-case CODE_FOR_unpacktd:
+}
+  else if (icode == CODE_FOR_unpackv1ti
+  || icode == CODE_FOR_unpackkf
+  || icode == CODE_FOR_unpacktf
+  || icode == CODE_FOR_unpackif
+  || icode == CODE_FOR_unpacktd)
+{
   /* Only allow 1-bit unsigned literals. */
   STRIP_NOPS (arg1);
   if (TREE_CODE (arg1) != INTEGER_CST
@@ -14064,7 +14064,6 @@
  error ("argument 2 must be a 1-bit unsigned literal");
  return CONST0_RTX (tmode);
}
-  break;
 }

   if (target == 0


Re: [TESTSUITE]Use strncpy instead of strcpy in testsuite/gcc.dg/memcmp-1.c

2017-08-30 Thread Aaron Sawdey
On Wed, 2017-08-30 at 10:16 +0100, Renlin Li wrote:
> Hi,
> 
> In test_driver_memcmp function, I found buf1 and buf2 is not properly
> terminated with null character.
> 
> In lib_strncmp, strcpy will be called with buf1 and buf2.
> The normal implementation of strcpy function has a loop to copy
> character from source
> to destination one by one until a null character is encountered.
> 
> If the string is not properly terminated, this will cause the strcpy
> read/write
> memory beyond the boundary.
> 
> Here I changed the strcpy into strncpy to constraint the function to
> visit
> legal memory only.

Hi,
  Renlin you are correct that it shouldn't be using strcpy because the
string may not be null terminated. However I would suggest we use
memcpy instead of strncpy. The reason is that cases where there is a
null char in the middle of the string test whether the strncmp is
properly ignoring what comes after. So how about using this:

  memcpy(a,str1,SZ);\
  memcpy(b,str2,SZ);\

as in the test_memcmp_ part of the macro?

  Aaron

> 
> Test Okay without any problem. Okay to commit?
> 
> Regard,
> Renlin
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-08-30  Renlin Li  
> 
>   * gcc.dg/memcmp-1.c (test_strncmp): Use strncpy instead of
> strcpy.
-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access

2017-08-30 Thread Tamar Christina
Sure, I'll split mid-end parts off and add a test.

Thanks,
Tamar

From: H.J. Lu 
Sent: Wednesday, August 30, 2017 2:16:12 PM
To: Tamar Christina
Cc: Richard Sandiford; GCC Patches; nd; James Greenhalgh; Richard Earnshaw; 
Marcus Shawcroft
Subject: Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned 
memory access

On Wed, Aug 30, 2017 at 2:21 AM, Tamar Christina
 wrote:
> Hi,
>
> The test I used was testsuite/gcc.c-torture/compile/structs.c

Results look very nice on x86-64.  Can we add a testcase to scan the
compiler output
to verify this optimization?

Thanks.

> Thanks,
> Tamar
> 
> From: H.J. Lu 
> Sent: Friday, August 25, 2017 8:10:22 PM
> To: Tamar Christina
> Cc: Richard Sandiford; GCC Patches; nd; James Greenhalgh; Richard Earnshaw; 
> Marcus Shawcroft
> Subject: Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned 
> memory access
>
> On Sun, Jul 2, 2017 at 11:17 PM, Tamar Christina
>  wrote:
>> Hi All,
>>
>> Sorry I just realized I never actually sent the updated patch...
>>
>> So here it is :)
>>
>
> Are there any testcases?
>
> --
> H.J.



--
H.J.


Re: [PATCH GCC]A simple implementation of loop interchange

2017-08-30 Thread Richard Biener
On Wed, Aug 30, 2017 at 3:18 PM, Bin Cheng  wrote:
> Hi,
> This patch implements a simple loop interchange pass in GCC, as described by 
> its comments:
> +/* This pass performs loop interchange: for example, the loop nest
> +
> +   for (int j = 0; j < N; j++)
> + for (int k = 0; k < N; k++)
> +   for (int i = 0; i < N; i++)
> +c[i][j] = c[i][j] + a[i][k]*b[k][j];
> +
> +   is transformed to
> +
> +   for (int i = 0; i < N; i++)
> + for (int j = 0; j < N; j++)
> +   for (int k = 0; k < N; k++)
> +c[i][j] = c[i][j] + a[i][k]*b[k][j];
> +
> +   This pass implements loop interchange in the following steps:
> +
> + 1) Find perfect loop nest for each innermost loop and compute data
> +   dependence relations for it.  For above example, loop nest is
> +   .
> + 2) From innermost to outermost loop, this pass tries to interchange
> +   each loop pair.  For above case, it firstly tries to interchange
> +    and loop nest becomes .
> +   Then it tries to interchange  and loop nest becomes
> +   .  The overall effect is to move innermost
> +   loop to the outermost position.  For loop pair 
> +   to be interchanged, we:
> + 3) Check if data dependence relations are valid for loop interchange.
> + 4) Check if both loops can be interchanged in terms of transformation.
> + 5) Check if interchanging the two loops is profitable.
> + 6) Interchange the two loops by mapping induction variables.
> +
> +   This pass also handles reductions in loop nest.  So far we only support
> +   simple reduction of inner loop and double reduction of the loop nest.  */
>
> Actually, this pass only does loop shift which outermosting inner loop to 
> outer, rather
> than permutation.  Also as a traditional loop optimizer, it only works for 
> perfect loop
> nest.  I put it just after loop distribution thus ideally loop 
> split/distribution could
> create perfect nest for it.  Unfortunately, we don't get any perfect nest 
> from distribution
> for now because it only works for innermost loop.  For example, the 
> motivation case in
> spec2k6/bwaves is not handled on this pass alone.  I have a patch extending 
> distribution
> for (innermost) loop nest and with that patch bwaves case can be handled.
> Another point is I deliberately make both the cost model and code 
> transformation (very)
> conservative.  We can support more cases, or more transformations with great 
> care when
> it is for sure known beneficial.  IMHO, we already hit over-baked issues 
> quite often and
> don't want to introduce more.
> As for code generation, this patch has an issue that invariant code in outer 
> loop could
> be moved to inner loop.  For the moment, we rely on the last lim pass to 
> handle all INV
> generated during interchange.  In the future, we may need to avoid that in 
> interchange
> itself, or another lim pass just like the one after graphite optimizations.
>
> Boostrap and test on x86_64 and AArch64.  Various benchmarks built and run 
> successfully.
> Note this pass is disabled in patch, while the code is exercised by 
> bootstrap/building
> programs with it enabled by default.  Any comments?

+/* The same as above, but this one is only used for interchanging not
+   innermost loops.  */
+#define OUTER_STRIDE_RATIO (2)

please make all these knobs --params.

+/* Enum type for loop reduction variable.  */
+enum reduction_type
+{
+  UNKNOWN_RTYPE = 0,
+  SIMPLE_RTYPE,
+  DOUBLE_RTYPE
+};

seeing this we should have some generic data structure / analysis for
reduction detection.  This adds a third user (autopar and vectorizer
are the others).  Just an idea.

+/* Return true if E is abnormal edge.  */
+
+static inline bool
+abnormal_edge (edge e)
+{
+  return (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP));
+}

bad name/comment for what it does.

... jumping to end of file / start of pass

+  /* Get the outer loop.  */
+  loop = superloop_at_depth (loop, loop_depth (loop) - 1);

loop_outer (loop)?

+  /* Only support rectangle loop nest, i.e, inner loop's niters doesn't
+depends on outer loop's IV.  */
+  if (chrec_contains_symbols_defined_in_loop (niters, loop->num))
+   break;

but you don't check for a three-nest whether niters depends on outer outer
loop's IV that way.  Either the check is superfluous here or incomplete.

+  /* Check if start_loop forms a perfect loop nest.  */
+  loop_nest->create (3);
+  do {
+datarefs->create (20);
+ddrs->create (20);
+loop_nest->truncate (0);
+if (compute_data_dependences_for_loop (start_loop, false, loop_nest,
+  datarefs, ddrs))
+  {
+   unsigned i;
+   struct data_dependence_relation *ddr;
+
+   for (i = 0; ddrs->iterate (i, ); ++i)
+ {
+   if 

Re: [PATCH,AIX] Cleanup in libiberty for AIX.

2017-08-30 Thread Ian Lance Taylor
On Tue, Aug 29, 2017 at 8:10 AM, REIX, Tony  wrote:
> Description:
>  * This patch does some cleanup in libiberty for AIX.
>
> Tests:
>  * AIX: Build: SUCCESS
>- build made by means of gmake in trunk.
>- patch generated by:
> cd gcc-svn-trunk/
> svn diff libiberty/ChangeLog libiberty/simple-object-xcoff.c
>
> ChangeLog:
>+   * simple-object-xcoff.c (simple_object_xcoff_find_sections):
>+   Improve .go_export csect handling.  Don't make assumptions
>+   on containing section or number of auxiliary entries.

Thanks.  Committed.

Ian


RE: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-08-30 Thread Richard Biener
On Tue, 29 Aug 2017, Tamar Christina wrote:

> 
> 
> > -Original Message-
> > From: Richard Biener [mailto:rguent...@suse.de]
> > Sent: 24 August 2017 10:16
> > To: Tamar Christina
> > Cc: Joseph Myers; Christophe Lyon; Markus Trippelsdorf; Jeff Law; GCC
> > Patches; Wilco Dijkstra; Michael Meissner; nd
> > Subject: RE: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like
> > numbers in GIMPLE.
> > 
> > On Wed, 23 Aug 2017, Tamar Christina wrote:
> > 
> > > Hi Richard,
> > >
> > > Thanks for the feedback,
> > >
> > > >
> > > > I think you should split up your work a bit.  The pieces from
> > > > fold_builtin_* you remove and move to lowering that require no
> > > > control flow like __builtin_isnan (x) -> x UNORD x should move to
> > > > match.pd patterns (aka foldings that will then be applied both on
> > GENERIC and GIMPLE).
> > > >
> > >
> > > But emitting them as an UNORD wouldn't solve the performance or
> > > correctness issues The patch is trying to address. For correctness,
> > > for instance an UNORD would still signal When -fsignaling-nans is used
> > (PR/66462).
> > 
> > Well, then address the correctness issues separately.  I think lumping
> > everything into a single patch just makes your live harder ;)
> 
> But the patch was made to fix the correctness issues. Which unless I'm 
> mistaken can only be solved using integer operations. To give a short 
> history on the patch:
> 
> My original patch had this in builtins.c where the current code is and 
> had none of this problem. The intention of this patch was just address 
> the correctness issues with isnan etc.
> 
> Since we had to do this using integer operations we figured we'd also 
> replace the operations with faster ones so the code was tweaked in order 
> to e.g. make the common paths of fpclassify exit earlier. (I had 
> provided benchmark results to show that the integer operations are 
> faster than the floating point ones, also on x86).
> 
> It was then suggested by upstream review that I do this as a gimple lowering
> phase. I pushed back against this but ultimately lost and submitted a new 
> version
> that moved from builtins.c to gimple-low.c, which was why late expansion for 
> C++
> broke, because the expansion is now done very early.
> 
> The rational for this move was that it would give the rest of the pipeline a 
> chance
> to optimize the code further.
> 
> After a few revisions this was ultimately accepted (the v3 version), but was 
> reverted
> because it broke IBM's long double format due to the special case code for it 
> violating SSA.
> 
> That was trivially fixed (once I finally had access to a PowerPC hardware 
> months later) and I changed the
> patch to leave the builtin as a function call if at lowering time it wasn't 
> known to be a builtin.
> This then "fixes" the C++ testcase, in that the test doesn't fail anymore, 
> but it doesn't generate
> the same code as before.
> Though the test is quite contrived and I don't know if actual user code often 
> has this.
> 
> While checking to see if this behavior is OK, I was suggested to do it 
> as a gimple-fold instead. I did, but then turns out it won't work for 
> non-constants as I can't generate new control flow from a fold (which 
> makes sense in retrospect but I didn't know this before).
> 
> This is why the patch is more complex than what it was before. The 
> original version only emitted simple tree.

The correctness issue is only about -fsignaling-nans, correct?  So a
simple fix is to guard the existing builtins.c code with
!flag_singalling_nans (and emit a library call for -fsignalling-nans).

> > 
> > I was saying that if I write isunordered (x, y) and use -fno-signaling-nans 
> > (the
> > default) then I would expect the same code to be generated as if I say isnan
> > (x) || isnan (y).  And I'm not sure if isnan (x) || isnan (y) is more 
> > efficient if
> > both are implemented with integer ops compared to using an unordered FP
> > compare.
> > 
> 
> But isnan (x) || isnan (y) can be rewritten using a match.pd rule to 
> isunordered (x, y)
> If that should really generate the same code. So I don’t think it's an issue 
> for this patch.

If isnan is lowered to integer ops than we'll have a hard time recognizing
it.

> > A canonical high-level representation on GIMPLE is equally important as
> > maybe getting some optimization on a lowered "optimized" form.
> > 
> > And a good canonical representation is short (x UNORD x or x UNORD y
> > counts as short here).
> > 
> > > > As fallback the expanders should simply emit a call (as done for
> > > > isinf when not folded for example).
> > >
> > > Yes the patch currently already does this.
> > 
> > Ah, I probably looked at an old version then.
> > 
> > > My question had more to do if
> > > the late expansion when you alias one of these functions in C++ was
> > > really an issue, since It used to (sometimes, only when you got the
> > > type of the argument correct) expand before whereas now it'll 

[PATCH GCC]A simple implementation of loop interchange

2017-08-30 Thread Bin Cheng
Hi,
This patch implements a simple loop interchange pass in GCC, as described by 
its comments:
+/* This pass performs loop interchange: for example, the loop nest
+
+   for (int j = 0; j < N; j++)
+ for (int k = 0; k < N; k++)
+   for (int i = 0; i < N; i++)
+c[i][j] = c[i][j] + a[i][k]*b[k][j];
+
+   is transformed to
+
+   for (int i = 0; i < N; i++)
+ for (int j = 0; j < N; j++)
+   for (int k = 0; k < N; k++)
+c[i][j] = c[i][j] + a[i][k]*b[k][j];
+
+   This pass implements loop interchange in the following steps:
+
+ 1) Find perfect loop nest for each innermost loop and compute data
+   dependence relations for it.  For above example, loop nest is
+   .
+ 2) From innermost to outermost loop, this pass tries to interchange
+   each loop pair.  For above case, it firstly tries to interchange
+    and loop nest becomes .
+   Then it tries to interchange  and loop nest becomes
+   .  The overall effect is to move innermost
+   loop to the outermost position.  For loop pair 
+   to be interchanged, we:
+ 3) Check if data dependence relations are valid for loop interchange.
+ 4) Check if both loops can be interchanged in terms of transformation.
+ 5) Check if interchanging the two loops is profitable.
+ 6) Interchange the two loops by mapping induction variables.
+
+   This pass also handles reductions in loop nest.  So far we only support
+   simple reduction of inner loop and double reduction of the loop nest.  */

Actually, this pass only does loop shift which outermosting inner loop to 
outer, rather
than permutation.  Also as a traditional loop optimizer, it only works for 
perfect loop
nest.  I put it just after loop distribution thus ideally loop 
split/distribution could
create perfect nest for it.  Unfortunately, we don't get any perfect nest from 
distribution
for now because it only works for innermost loop.  For example, the motivation 
case in
spec2k6/bwaves is not handled on this pass alone.  I have a patch extending 
distribution
for (innermost) loop nest and with that patch bwaves case can be handled.
Another point is I deliberately make both the cost model and code 
transformation (very)
conservative.  We can support more cases, or more transformations with great 
care when
it is for sure known beneficial.  IMHO, we already hit over-baked issues quite 
often and
don't want to introduce more.
As for code generation, this patch has an issue that invariant code in outer 
loop could
be moved to inner loop.  For the moment, we rely on the last lim pass to handle 
all INV
generated during interchange.  In the future, we may need to avoid that in 
interchange
itself, or another lim pass just like the one after graphite optimizations.

Boostrap and test on x86_64 and AArch64.  Various benchmarks built and run 
successfully.
Note this pass is disabled in patch, while the code is exercised by 
bootstrap/building
programs with it enabled by default.  Any comments?

Thanks,
bin
2017-08-29  Bin Cheng  

* Makefile.in (tree-ssa-loop-interchange.o): New object file.
* common.opt (ftree-loop-interchange): New option.
* doc/invoke.texi (-ftree-loop-interchange): Document new option.
* passes.def (pass_linterchange): New pass.
* timevar.def (TV_LINTERCHANGE): New time var.
* tree-pass.h (make_pass_linterchange): New declaration.
* tree-ssa-loop-interchange.cc: New file.
* tree-ssa-loop-ivcanon.c (create_canonical_iv): Change to external.
* tree-ssa-loop-ivopts.h (create_canonical_iv): New declaration.

gcc/testsuite
2017-08-29  Bin Cheng  

* gcc.dg/tree-ssa/loop-interchange-1.c: New test.
* gcc.dg/tree-ssa/loop-interchange-2.c: New test.
* gcc.dg/tree-ssa/loop-interchange-3.c: New test.
* gcc.dg/tree-ssa/loop-interchange-4.c: New test.
* gcc.dg/tree-ssa/loop-interchange-5.c: New test.
* gcc.dg/tree-ssa/loop-interchange-6.c: New test.
* gcc.dg/tree-ssa/loop-interchange-7.c: New test.
* gcc.dg/tree-ssa/loop-interchange-8.c: New test.
* gcc.dg/tree-ssa/loop-interchange-9.c: New test.
* gcc.dg/tree-ssa/loop-interchange-10.c: New test.diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 0bde7ac..5002598 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1522,6 +1522,7 @@ OBJS = \
tree-ssa-live.o \
tree-ssa-loop-ch.o \
tree-ssa-loop-im.o \
+   tree-ssa-loop-interchange.o \
tree-ssa-loop-ivcanon.o \
tree-ssa-loop-ivopts.o \
tree-ssa-loop-manip.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 1331008..e7efa09 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2524,6 +2524,10 @@ ftree-loop-distribute-patterns
 Common Report 

Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access

2017-08-30 Thread H.J. Lu
On Wed, Aug 30, 2017 at 2:21 AM, Tamar Christina
 wrote:
> Hi,
>
> The test I used was testsuite/gcc.c-torture/compile/structs.c

Results look very nice on x86-64.  Can we add a testcase to scan the
compiler output
to verify this optimization?

Thanks.

> Thanks,
> Tamar
> 
> From: H.J. Lu 
> Sent: Friday, August 25, 2017 8:10:22 PM
> To: Tamar Christina
> Cc: Richard Sandiford; GCC Patches; nd; James Greenhalgh; Richard Earnshaw; 
> Marcus Shawcroft
> Subject: Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned 
> memory access
>
> On Sun, Jul 2, 2017 at 11:17 PM, Tamar Christina
>  wrote:
>> Hi All,
>>
>> Sorry I just realized I never actually sent the updated patch...
>>
>> So here it is :)
>>
>
> Are there any testcases?
>
> --
> H.J.



-- 
H.J.


Re: [PATCH] Expand switch statements with a single (or none) non-default case.

2017-08-30 Thread Richard Biener
On Wed, Aug 30, 2017 at 2:32 PM, Martin Liška  wrote:
> On 08/30/2017 02:28 PM, Richard Biener wrote:
>> On Wed, Aug 30, 2017 at 1:13 PM, Martin Liška  wrote:
>>> Hi.
>>>
>>> Simple transformation of switch statements where degenerated switch can be 
>>> interpreted
>>> as gimple condition (or removed if having any non-default case). I 
>>> originally though
>>> that we don't have switch statements without non-default cases, but PR82032 
>>> shows we
>>> can see it.
>>>
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>
>> While I guess this case is ok to handle here it would be nice if CFG cleanup
>> would do the same.  I suppose find_taken_edge somehow doesn't work for
>> this case even after my last enhancement?  Or is CFG cleanup for some reason
>> not run?
>
> Do you mean both with # of non-default edges equal to 0 and 1?
> Let me take a look.

First and foremost 0.  The case of 1 non-default and a default would
need extra code.

Richard.

> Martin
>
>>
>> Richard.
>>
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2017-08-25  Martin Liska  
>>>
>>> PR tree-optimization/82032
>>> * tree-switch-conversion.c (generate_high_low_equality): New
>>> function.
>>> (expand_degenerated_switch): Likewise.
>>> (process_switch): Call expand_degenerated_switch.
>>> (try_switch_expansion): Likewise.
>>> (emit_case_nodes): Use generate_high_low_equality.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2017-08-25  Martin Liska  
>>>
>>> PR tree-optimization/82032
>>> * gcc.dg/tree-ssa/pr68198.c: Update jump threading expectations.
>>> * gcc.dg/tree-ssa/switch-expansion.c: New test.
>>> * gcc.dg/tree-ssa/vrp34.c: Update scanned pattern.
>>> * g++.dg/other/pr82032.C: New test.
>>> ---
>>>  gcc/testsuite/g++.dg/other/pr82032.C |  36 +++
>>>  gcc/testsuite/gcc.dg/tree-ssa/pr68198.c  |   6 +-
>>>  gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c |  14 +++
>>>  gcc/testsuite/gcc.dg/tree-ssa/vrp34.c|   5 +-
>>>  gcc/tree-switch-conversion.c | 123 
>>> ++-
>>>  5 files changed, 152 insertions(+), 32 deletions(-)
>>>  create mode 100644 gcc/testsuite/g++.dg/other/pr82032.C
>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c
>>>
>>>
>


Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-08-30 Thread Richard Biener
On Wed, Aug 30, 2017 at 11:46 AM, Jackson Woodruff
 wrote:
> On 08/29/2017 01:13 PM, Richard Biener wrote:
>>
>> On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff
>>  wrote:
>>>
>>> Hi all,
>>>
>>> Apologies again to those CC'ed, who (again) received this twice.
>>>
>>> Joseph: Yes you are correct. I misread the original thread, now fixed.
>>>
>>> Richard: I've moved the optimizations out of fold-const.c. One has been
>>> replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C)
>>> I've
>>> deleted as it only introduced a new optimization when running with the
>>> flags
>>> '-O0 -funsafe-math-optimizations'.
>>
>>
>> Hmm, how did you verify that, that it only adds sth with -O0
>> -funsafe-math-optimizations?
>
>
> By checking with various flags, although not exhaustively. I looked for
> reasons for the behavior in match.pd (explained below).
>
> I have also since discovered that the combinations of
> '-funsafe-math-optimizations -frounding-math' (at all levels) and
> '-fno-recriprocal-math -funsafe-math-optimizations' mean this pattern adds
> something.
>
>> Is that because in GIMPLE the reassoc pass should do this transform?
>
> It's because the pattern that changes (X / C) -> X * (1 / C) is gated with
> O1:
>
>   (for cst (REAL_CST COMPLEX_CST VECTOR_CST)
>(simplify
> (rdiv @0 cst@1)
> ->(if (optimize)
> -> (if (flag_reciprocal_math
>   && !real_zerop (@1))
>   (with
>{ tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), @1);
> }
>(if (tem)
> (mult @0 { tem; } )))
>   (if (cst != COMPLEX_CST)
>(with { tree inverse = exact_inverse (type, @1); }
> (if (inverse)
>  (mult @0 { inverse; } 
>
>
> I've flagged the two lines that are particularly relevant to this.

So this means we go x / (C * y) -> (x / C) / y -> (x * (1/C)) / y
why's that in any way preferable?  I suppose this is again to enable
the recip pass to detect / y (as opposed to / (C * y))?  What's the
reason to believe that / y is more "frequent"?

> Removing this pattern, as I would expect, means that the divisions in the
> above optimization (and the one further down) are not removed.
>
> So then there is the question of edge cases. This pattern is (ignoring the
> second case) going to fail when const_binop returns null. Looking through
> that function says that it will fail (for reals) when:
>
> - Either argument is null (not the case)
>
> - The operation is not in the list (PLUS_EXPR,
>   MINUS_EXPR, MULT_EXPR, RDIV_EXPR, MIN_EXPR, MAX_EXPR)
>   (again not the case)
>
> - We honor Signalling NaNs and one of the operands is a sNaN.
>
> - The operation is a division, and the second argument is zero
>   and dividing by 0.0 raises an exception.
>
> - The result is infinity and neither of the operands were infinity
>   and flag_trapping_math is set.
>
> - The result isn't exact and flag_rounding_math is set.
>
>
> For (x / ( y * C) -> (x / C) / y), I will add some gating for each of these
> so that the pattern is never executed if one of these would be the case.

Why not transform this directly to (x * (1/C)) / y then (and only then)?
That makes it obvious not two divisions prevail.

That said, I'm questioning this canonicalization.  I can come up with a
testcase where it makes things worse:

 tem = x / (y * C);
 tem2 = z / (y * C);

should generate

 rdivtmp = 1 / (y*C);
 tem = x *rdivtmp;
 tem2= z * rdivtmp;

instead of

 rdivtmp = 1/y;
 tem = x * 1/C * rdivtmp;
 tem2 = z * 1/C * rdivtmp;

> The additional cases where this isn't converted to a multiplication by the
> reciprocal appear to be when -freciprocal-math is used, but we don't have
> -fno-rounding-math, or funsafe-math-optimizations.
>
> For the removal of (x / C +- y / C -> (x +- y) / C), there is a trade-off of
> whether it is worth having an extra pattern for these edge cases. On this
> I'm not sure.

Well, at least leave it in fold-const.c if you can't move it.

>>
>> You added
>>
>> +/* Simplify x / (- y) to -x / y.  */
>> +(simplify
>> + (rdiv @0 (negate @1))
>> + (rdiv (negate @0) @1))
>
>
> Sorry, this was unclear, the changes to fold const should be split up from
> the additions to match.pd.
>
> This was one of the changes discussed before. The idea is to canonicalize
> this so that y can be extracted in the cse_reciprocals pass.

As said elsewhere I think cse_reciprocals needs a rewrite to not rely on
arbitrary canonicalization to do its work but consider association
opportunities globally with a focus on CSE.

>
>>
>> for
>>
>>/* (-A) / (-B) -> A / B  */
>>if (TREE_CODE (arg0) == NEGATE_EXPR && negate_expr_p (arg1))
>>  return fold_build2_loc (loc, RDIV_EXPR, type,
>>  TREE_OPERAND (arg0, 0),
>>  negate_expr (arg1));
>>if (TREE_CODE (arg1) == NEGATE_EXPR && negate_expr_p (arg0))
>>  return fold_build2_loc (loc, 

Re: [PATCH] Expand switch statements with a single (or none) non-default case.

2017-08-30 Thread Martin Liška
On 08/30/2017 02:28 PM, Richard Biener wrote:
> On Wed, Aug 30, 2017 at 1:13 PM, Martin Liška  wrote:
>> Hi.
>>
>> Simple transformation of switch statements where degenerated switch can be 
>> interpreted
>> as gimple condition (or removed if having any non-default case). I 
>> originally though
>> that we don't have switch statements without non-default cases, but PR82032 
>> shows we
>> can see it.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> While I guess this case is ok to handle here it would be nice if CFG cleanup
> would do the same.  I suppose find_taken_edge somehow doesn't work for
> this case even after my last enhancement?  Or is CFG cleanup for some reason
> not run?

Do you mean both with # of non-default edges equal to 0 and 1?
Let me take a look.

Martin

> 
> Richard.
> 
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2017-08-25  Martin Liska  
>>
>> PR tree-optimization/82032
>> * tree-switch-conversion.c (generate_high_low_equality): New
>> function.
>> (expand_degenerated_switch): Likewise.
>> (process_switch): Call expand_degenerated_switch.
>> (try_switch_expansion): Likewise.
>> (emit_case_nodes): Use generate_high_low_equality.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-08-25  Martin Liska  
>>
>> PR tree-optimization/82032
>> * gcc.dg/tree-ssa/pr68198.c: Update jump threading expectations.
>> * gcc.dg/tree-ssa/switch-expansion.c: New test.
>> * gcc.dg/tree-ssa/vrp34.c: Update scanned pattern.
>> * g++.dg/other/pr82032.C: New test.
>> ---
>>  gcc/testsuite/g++.dg/other/pr82032.C |  36 +++
>>  gcc/testsuite/gcc.dg/tree-ssa/pr68198.c  |   6 +-
>>  gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c |  14 +++
>>  gcc/testsuite/gcc.dg/tree-ssa/vrp34.c|   5 +-
>>  gcc/tree-switch-conversion.c | 123 
>> ++-
>>  5 files changed, 152 insertions(+), 32 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.dg/other/pr82032.C
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c
>>
>>



Re: [PATCH] Fix IPA ICF with ASM statements (PR inline-asm/82001).

2017-08-30 Thread Richard Biener
On Wed, Aug 30, 2017 at 12:17 PM, Martin Liška  wrote:
> On 08/30/2017 11:18 AM, Richard Biener wrote:
>> On Wed, Aug 30, 2017 at 11:12 AM, Martin Liška  wrote:
>>> Hi.
>>>
>>> Following patch compares also constraints of input and output operands of 
>>> ASM statements.
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>
>> It's now no longer "compare_tree_list_operand" but compare_asm_constraint
>> where there's no need to walk TREE_CHAIN either.
>
> Yep, it desires a refactoring.
>
> Is it fine for trunk?

Ok.

Richard.

> Martin
>
>>
>> So can you refactor this a bit?
>>
>>> Martin
>>>
>>> gcc/ChangeLog:
>>>
>>> 2017-08-28  Martin Liska  
>>>
>>> PR inline-asm/82001
>>> * ipa-icf-gimple.c (func_checker::compare_tree_list_operand):
>>> Compare TREE_PURPOSE of asm inputs and outputs.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2017-08-28  Martin Liska  
>>>
>>> PR inline-asm/82001
>>> * gcc.dg/ipa/pr82001.c: New test.
>>> ---
>>>  gcc/ipa-icf-gimple.c   | 10 ++
>>>  gcc/testsuite/gcc.dg/ipa/pr82001.c | 21 +
>>>  2 files changed, 31 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr82001.c
>>>
>>>
>


Re: [66/77] Use scalar_mode for constant integers

2017-08-30 Thread Richard Sandiford
Jeff Law  writes:
> On 07/13/2017 03:02 AM, Richard Sandiford wrote:
>> This patch treats the mode associated with an integer constant as a
>> scalar_mode.  We can't use the more natural-sounding scalar_int_mode
>> because we also use (const_int 0) for bounds-checking modes.  (It might
>> be worth adding a bounds-specific code instead, but that's for another
>> day.)
> Is that the only reason why we can't use scalar_int_mode here -- the
> bounds checking stuff?  What if it were to just magically disappear?

:-)  I *think* that was the only case, but it's possible that once
we hit it, we didn't look much further.  

[...]

>> We didn't try to make these functions take scalar_mode arguments
>> because in many cases that would be too invasive at this stage.
>> Maybe it would become feasible in future.  Also, the long-term
>> direction should probably be to add modes to constant integers
>> rather than have then as VOIDmode odd-ones-out.  That would remove
>> the need for rtx_mode_t and thus remove the question whether they
>> should use scalar_int_mode, scalar_mode or machine_mode.
> THe lack of a mode on CONST_INTs is a long standing wart.  It's been
> eons since we really thought about how to fix it.I'd have to dig
> real deep to remember why we've let the wart stand so long

When I last looked at the history, I got the impression it was just
lack of time.  I have a vague plan for how we could transition to
integers with modes, but then I've had the same plan for a while now
and no realistic chance of getting time to do it.

Thanks,
Richard


Re: [PATCH] Expand switch statements with a single (or none) non-default case.

2017-08-30 Thread Richard Biener
On Wed, Aug 30, 2017 at 1:13 PM, Martin Liška  wrote:
> Hi.
>
> Simple transformation of switch statements where degenerated switch can be 
> interpreted
> as gimple condition (or removed if having any non-default case). I originally 
> though
> that we don't have switch statements without non-default cases, but PR82032 
> shows we
> can see it.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

While I guess this case is ok to handle here it would be nice if CFG cleanup
would do the same.  I suppose find_taken_edge somehow doesn't work for
this case even after my last enhancement?  Or is CFG cleanup for some reason
not run?

Richard.

> Martin
>
> gcc/ChangeLog:
>
> 2017-08-25  Martin Liska  
>
> PR tree-optimization/82032
> * tree-switch-conversion.c (generate_high_low_equality): New
> function.
> (expand_degenerated_switch): Likewise.
> (process_switch): Call expand_degenerated_switch.
> (try_switch_expansion): Likewise.
> (emit_case_nodes): Use generate_high_low_equality.
>
> gcc/testsuite/ChangeLog:
>
> 2017-08-25  Martin Liska  
>
> PR tree-optimization/82032
> * gcc.dg/tree-ssa/pr68198.c: Update jump threading expectations.
> * gcc.dg/tree-ssa/switch-expansion.c: New test.
> * gcc.dg/tree-ssa/vrp34.c: Update scanned pattern.
> * g++.dg/other/pr82032.C: New test.
> ---
>  gcc/testsuite/g++.dg/other/pr82032.C |  36 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr68198.c  |   6 +-
>  gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c |  14 +++
>  gcc/testsuite/gcc.dg/tree-ssa/vrp34.c|   5 +-
>  gcc/tree-switch-conversion.c | 123 
> ++-
>  5 files changed, 152 insertions(+), 32 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/other/pr82032.C
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c
>
>


Re: [C++ PATCH] Make taking the address of an overloaded function a non-deduced context

2017-08-30 Thread Ville Voutilainen
On 30 August 2017 at 01:25, Ville Voutilainen
 wrote:
> On 30 August 2017 at 01:19, Ville Voutilainen
>  wrote:
>> 2017-08-29  Ville Voutilainen  
>>
>> Make taking the address of an overloaded function a non-deduced context
>>
>> cp/
>>
>> * pt.c (unify_overload_resolution_failure): Return unify_success
>> instead of unify_invalid.
>>
>> testsuite/
>>
>> * g++.dg/overload/template6.C: New.
>
> Ah yes, tested on Linux-PPC64. I see some libstdc++ (runtime) failures
> from the testsuite, but as far as I can
> see they are there even before this patch.

..and were caused by an unclean libstdc++ build on my end. The patch
passes the full testsuite with
no regressions.


[PATCH] Expand switch statements with a single (or none) non-default case.

2017-08-30 Thread Martin Liška
Hi.

Simple transformation of switch statements where degenerated switch can be 
interpreted
as gimple condition (or removed if having any non-default case). I originally 
though
that we don't have switch statements without non-default cases, but PR82032 
shows we
can see it.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2017-08-25  Martin Liska  

PR tree-optimization/82032
* tree-switch-conversion.c (generate_high_low_equality): New
function.
(expand_degenerated_switch): Likewise.
(process_switch): Call expand_degenerated_switch.
(try_switch_expansion): Likewise.
(emit_case_nodes): Use generate_high_low_equality.

gcc/testsuite/ChangeLog:

2017-08-25  Martin Liska  

PR tree-optimization/82032
* gcc.dg/tree-ssa/pr68198.c: Update jump threading expectations.
* gcc.dg/tree-ssa/switch-expansion.c: New test.
* gcc.dg/tree-ssa/vrp34.c: Update scanned pattern.
* g++.dg/other/pr82032.C: New test.
---
 gcc/testsuite/g++.dg/other/pr82032.C |  36 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr68198.c  |   6 +-
 gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c |  14 +++
 gcc/testsuite/gcc.dg/tree-ssa/vrp34.c|   5 +-
 gcc/tree-switch-conversion.c | 123 ++-
 5 files changed, 152 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/other/pr82032.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c


diff --git a/gcc/testsuite/g++.dg/other/pr82032.C b/gcc/testsuite/g++.dg/other/pr82032.C
new file mode 100644
index 000..607bf85c8e1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/other/pr82032.C
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -Wno-return-type" } */
+
+template  class b
+{
+public:
+  typename a::aa operator[] (typename a::c) { }
+};
+class d
+{
+public:
+  typedef long c;
+  typedef int aa;
+};
+struct e
+{
+  int af[4];
+  int ag;
+};
+b f;
+bool
+g (e )
+{
+  for (int h; h; ++h)
+switch (f[h])
+  {
+  case 'x':
+  case 'a':
+	i.af[h] = 3;
+	break;
+  default:
+	return false;
+  }
+
+  return true;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c b/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
index 16097d7e2bc..59d562e156c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr68198.c
@@ -37,7 +37,5 @@ c_finish_omp_clauses (tree clauses)
 }
 }
 
-/* There are 3 FSM jump threading opportunities, two of which will
-  get filtered out.  */
-/* { dg-final { scan-tree-dump-times "Registering FSM" 1 "thread1"} } */
-/* { dg-final { scan-tree-dump-times "FSM Thread through multiway branch without threading a multiway branch" 2 "thread1"} } */
+/* There are 3 FSM jump threading opportunities.  */
+/* { dg-final { scan-tree-dump-times "Registering FSM" 3 "thread1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c b/gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c
new file mode 100644
index 000..306253f3f9c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/switch-expansion.c
@@ -0,0 +1,14 @@
+/* { dg-options "-O2 -fdump-tree-switchconv" } */
+
+int check(int param)
+{
+  switch (param) 
+{
+case -2:
+  return 1;
+default:
+  return 0;
+}
+}
+
+/* { dg-final { scan-tree-dump "Expanding GIMPLE switch as condition" "switchconv" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp34.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp34.c
index 142e56c1641..d2a36a706f2 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp34.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp34.c
@@ -15,5 +15,6 @@ foo (int a)
 }
 }
 
-/* Both ifs should be optimized.  */
-/* { dg-final { scan-tree-dump-times "if \\\(" 0 "vrp1" } } */
+/* Both ifs should be optimized (and switch statement will be the only if
+   in the function).  */
+/* { dg-final { scan-tree-dump-times "if \\\(" 1 "vrp1" } } */
diff --git a/gcc/tree-switch-conversion.c b/gcc/tree-switch-conversion.c
index 5a827a6f1f9..b1cc203aba2 100644
--- a/gcc/tree-switch-conversion.c
+++ b/gcc/tree-switch-conversion.c
@@ -1486,6 +1486,88 @@ gen_inbound_check (gswitch *swtch, struct switch_conv_info *info)
 }
 }
 
+/* Generate GIMPLE IL to basic block BB which compares whether INDEX
+   value is within range LOW ... HIGH.  We create a LHS and RHS that
+   can be then compared in order to hit the interval or not.  */
+
+static void
+generate_high_low_equality (basic_block bb, tree index, tree low, tree high,
+			tree *lhs, tree *rhs)
+{
+  tree type = TREE_TYPE (index);
+  tree utype = unsigned_type_for (type);
+
+  low = fold_convert (type, low);
+  high = fold_convert (type, high);
+
+  tree tmp = make_ssa_name (type);
+  gassign *sub1
+= gimple_build_assign (tmp, MINUS_EXPR, index, low);
+
+  *lhs = make_ssa_name (utype);
+  gassign *a = gimple_build_assign 

Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.

2017-08-30 Thread Martin Liška
On 08/30/2017 10:33 AM, Rainer Orth wrote:
> Hi Martin,
> 
>> Yes, sorry for the missing entries. Fixed and installed as r251412.
>> Hopefully fall out will be small.
> 
> there's
> 
> UNRESOLVED: gcc.dg/tree-ssa/vrp104.c scan-tree-dump-times switchlower 
> "switch" 1
> 
> everywhere, it seems.  gcc.log has
> 
> gcc.dg/tree-ssa/vrp104.c: dump file does not exist
> 
> For one, the testcase needs -fdump-tree-switchlower now, and on
> i386-pc-solaris2.11 I find 2 instances of switch in the dump...
> 
> Please fix.
> 
>   Rainer
> 

Sorry for the breakage. Fixed by installing attach patch, which is obvious.

Martin
>From 60fd6e2bd2927acf33f7aab2c178cbfcd268d314 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 30 Aug 2017 13:06:12 +0200
Subject: [PATCH] Fix test-case vrp104.c.

gcc/testsuite/ChangeLog:

2017-08-30  Martin Liska  

	* gcc.dg/tree-ssa/vrp104.c: Change dump file name.
---
 gcc/testsuite/gcc.dg/tree-ssa/vrp104.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp104.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp104.c
index aa3b00a1204..0a952267b29 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp104.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp104.c
@@ -1,6 +1,8 @@
 /* PR tree-optimization/18046  */
-/* { dg-options "-O2 -fdump-tree-optimized" }  */
-/* { dg-final { scan-tree-dump-times "switch" 1 "switchlower" } }  */
+/* { dg-options "-O2 -fdump-tree-switchlower" }  */
+/* We scan for 2 switches as the dump file reports a transformation,
+   IL really contains just a single.  */
+/* { dg-final { scan-tree-dump-times "switch" 2 "switchlower" } }  */
 
 void foo (void);
 void bar (void);
-- 
2.14.1



RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-08-30 Thread Richard Biener
On Wed, 30 Aug 2017, Jon Beniston wrote:

> Hi Richard,
> 
> >> Ah, that's what I first tried, and mistakenly sent the patch against
> that.
> >> 
> >> That did help the initial problem, but then I needed to add a lot of 
> >> support for the V1SI type in the backend (which just duplicated SI 
> >> patterns) and there are a few places where GCC seems to have sanity 
> >> checks against vectors that are only one element wide. A dot product 
> >> producing a scalar result seems natural.
> >
> >Where do you need V1SI types?  The vectorizer should perform a SI extract 
> >from V1SI here.  You need to elaborate a bit here.
> 
> After adding single element vector type support in the middle-end, at the
> tree level, V1SI vector types would be the result type for the dot product
> if the input operands were V2HI.
> 
> During RTL expansion, if V1SImode is accepted in the
> TARGET_VECTOR_MODE_SUPPORTED_P hook, then V1SImode will be used after RTL
> expansion and V1SImode patterns are needed, although they are actually
> duplicates of SImode patterns.  I have now removed V1SImode from
> TARGET_VECTOR_MODE_SUPPORTED_P, and they are turned into SI mode so there is
> no need for the V1SI patterns now.
> 
> >> The vectorizer doesn't really support vector->scalar ops so V1SI 
> >> feels more natural here.  That is, your patch looks a bit ugly -- 
> >> there's nothing wrong with V1SI vector types.
> 
> I agree "there's nothing wrong with V1SI vector types" and the original
> patch was trying to let vector reduction operation accept V1SI vector types.
> 
> However, if the only change was "<= 1" into "< 1" in
> get_vectype_for_scalar_type_and_size, then it will cause a GCC assertion
> failure in optab_for_tree_node for some shift operations. It seems all such
> failures come from:
> 
> vect_pattern_recog_1:4185
>   optab = optab_for_tree_code (code, type_in, optab_default);
> 
> The SUB_TYPE passed to optab_for_tree_code is "optab_default", however it is
> possible that vect_pattern_recog_1 will generate gimple containing shift
> operations, for example vect_recog_divmod_pattern will generate RSHIFT_EXPR,
> while shift operation requires sub_type to be not optab_default?

I think the issue is that with TARGET_VECTOR_MODE_SUPPORTED_P false
for V1SI you'll get a SImode vector type and the

  if (VECTOR_BOOLEAN_TYPE_P (type_in)
  || VECTOR_MODE_P (TYPE_MODE (type_in)))

check fails.  Try changing that to || VECTOR_TYPE_P (type_in).
The else path should be hopefully dead ...

With your TARGET_VECTOR_MODE_SUPPORTED_P setting you are essentially
making the vectorizer mix what was desired as "generic" vectorization
and regular vectorization...  if it works, fine ;)

>   gcc/optabs-tree.h:
> 
>   /* An extra flag to control optab_for_tree_code's behavior.  This is
> needed to
>  distinguish between machines with a vector shift that takes a scalar
> for the
>  shift amount vs. machines that take a vector for the shift amount.  */
> 
>   enum optab_subtype
>   {
> optab_default,
> optab_scalar,
> optab_vector
>   };
> 
> It looks to me like the other call sites of optab_for_tree_code which are
> passing optab_default are mostly places where GCC is sure it is not a shift
> operation.
> Given TYPE_IN is returned from get_vectype_for_scalar_type, is it safe to
> simply pass optab_vector in vect_pattern_recog_1?

I guess so.  Can you also try the above?

> I have verified the following middle-end fix also works for my port, it
> passed also x86-64 bootstrap and there is no regression in gcc/g++
> regression.
> 
> How is this new patch?
> 
> gcc/
> 2017-08-30  Jon Beniston  
> 
> * tree-vect-patterns.c (vect_pattern_recog_1): Pass optab_vector
> when
> calling optab_for_tree_code.
> * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Allow
> single
> element vector types.
> 
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index cfdb72c..4b39cb6 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -4182,7 +4182,7 @@ vect_pattern_recog_1 (vect_recog_func *recog_func,
>   code = CALL_EXPR;
> }
> 
> -  optab = optab_for_tree_code (code, type_in, optab_default);
> +  optab = optab_for_tree_code (code, type_in, optab_vector);
>vec_mode = TYPE_MODE (type_in);
>if (!optab
>|| (icode = optab_handler (optab, vec_mode)) == CODE_FOR_nothing
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 013fb1f..fc62efb 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
> scalar_type, unsigned size)
>else
>  simd_mode = mode_for_vector (inner_mode, size / nbytes);
>nunits = GET_MODE_SIZE (simd_mode) / nbytes;
> -  if (nunits <= 1)
> +  /* NOTE: nunits == 1 is allowed to support single element vector types.
> */
> +  if (nunits < 1)
>  return NULL_TREE;
> 
>

Re: [RFC] [PATCH] Introduce configure flag --with-stage1-cflags.

2017-08-30 Thread Martin Liška
On 08/28/2017 02:24 PM, Richard Biener wrote:
> On Fri, Aug 25, 2017 at 9:51 PM, Jeff Law  wrote:
>> On 07/31/2017 01:47 AM, Martin Liška wrote:
>>> I would like to ping this. Input from other people will be appreciated ;)
>> I think the thing to keep in mind here is that IIUC this only affects
>> things when we've configured using the --with-stage1-cflags option.
>>
>> So questions about is -O1 more stable than -O2, should we restrict -O2
>> to newer compilers, etc are really more about the defaults we set.
>>
>> My understanding is the patch is just adding the capability and does not
>> change the default.  Assuming that's the case, then I'm comfortable
>> acking the raw infrastructure.
> 
> OTOH you can simply set STAGE1_CFLAGS so the value of this
> as a configure option is somewhat questionable.

Yes, STAGE1_CFLAGS is a working solution. I'm not planning to install the patch.

Martin

> 
>> jeff



RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-08-30 Thread Jon Beniston
Hi Richard,

>> Ah, that's what I first tried, and mistakenly sent the patch against
that.
>> 
>> That did help the initial problem, but then I needed to add a lot of 
>> support for the V1SI type in the backend (which just duplicated SI 
>> patterns) and there are a few places where GCC seems to have sanity 
>> checks against vectors that are only one element wide. A dot product 
>> producing a scalar result seems natural.
>
>Where do you need V1SI types?  The vectorizer should perform a SI extract 
>from V1SI here.  You need to elaborate a bit here.

After adding single element vector type support in the middle-end, at the
tree level, V1SI vector types would be the result type for the dot product
if the input operands were V2HI.

During RTL expansion, if V1SImode is accepted in the
TARGET_VECTOR_MODE_SUPPORTED_P hook, then V1SImode will be used after RTL
expansion and V1SImode patterns are needed, although they are actually
duplicates of SImode patterns.  I have now removed V1SImode from
TARGET_VECTOR_MODE_SUPPORTED_P, and they are turned into SI mode so there is
no need for the V1SI patterns now.

>> The vectorizer doesn't really support vector->scalar ops so V1SI 
>> feels more natural here.  That is, your patch looks a bit ugly -- 
>> there's nothing wrong with V1SI vector types.

I agree "there's nothing wrong with V1SI vector types" and the original
patch was trying to let vector reduction operation accept V1SI vector types.

However, if the only change was "<= 1" into "< 1" in
get_vectype_for_scalar_type_and_size, then it will cause a GCC assertion
failure in optab_for_tree_node for some shift operations. It seems all such
failures come from:

vect_pattern_recog_1:4185
  optab = optab_for_tree_code (code, type_in, optab_default);

The SUB_TYPE passed to optab_for_tree_code is "optab_default", however it is
possible that vect_pattern_recog_1 will generate gimple containing shift
operations, for example vect_recog_divmod_pattern will generate RSHIFT_EXPR,
while shift operation requires sub_type to be not optab_default?

  gcc/optabs-tree.h:

  /* An extra flag to control optab_for_tree_code's behavior.  This is
needed to
 distinguish between machines with a vector shift that takes a scalar
for the
 shift amount vs. machines that take a vector for the shift amount.  */

  enum optab_subtype
  {
optab_default,
optab_scalar,
optab_vector
  };

It looks to me like the other call sites of optab_for_tree_code which are
passing optab_default are mostly places where GCC is sure it is not a shift
operation.
Given TYPE_IN is returned from get_vectype_for_scalar_type, is it safe to
simply pass optab_vector in vect_pattern_recog_1?

I have verified the following middle-end fix also works for my port, it
passed also x86-64 bootstrap and there is no regression in gcc/g++
regression.

How is this new patch?

gcc/
2017-08-30  Jon Beniston  

* tree-vect-patterns.c (vect_pattern_recog_1): Pass optab_vector
when
calling optab_for_tree_code.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Allow
single
element vector types.

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index cfdb72c..4b39cb6 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -4182,7 +4182,7 @@ vect_pattern_recog_1 (vect_recog_func *recog_func,
  code = CALL_EXPR;
}

-  optab = optab_for_tree_code (code, type_in, optab_default);
+  optab = optab_for_tree_code (code, type_in, optab_vector);
   vec_mode = TYPE_MODE (type_in);
   if (!optab
   || (icode = optab_handler (optab, vec_mode)) == CODE_FOR_nothing
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 013fb1f..fc62efb 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
scalar_type, unsigned size)
   else
 simd_mode = mode_for_vector (inner_mode, size / nbytes);
   nunits = GET_MODE_SIZE (simd_mode) / nbytes;
-  if (nunits <= 1)
+  /* NOTE: nunits == 1 is allowed to support single element vector types.
*/
+  if (nunits < 1)
 return NULL_TREE;

   vectype = build_vector_type (scalar_type, nunits);




Re: [PATCH] Fix file find utils and add unit tests (PR driver/81829).

2017-08-30 Thread Martin Liška
On 08/25/2017 06:41 PM, Martin Sebor wrote:
> On 08/18/2017 04:17 AM, Martin Liška wrote:
>> On 08/15/2017 02:45 PM, Martin Liška wrote:
>>> Hi.
>>>
>>> As shown in the PR, remove_prefix function is written wrongly. It does not 
>>> distinguish
>>> in between a node in linked list and pprefix->plist. So I decide to rewrite 
>>> it.
>>> Apart from that, I identified discrepancy in between do_add_prefix and 
>>> prefix_from_string
>>> where the later one appends always DIR_SEPARATOR (if missing). So I also 
>>> the former function.
>>> And I'm adding unit tests for all functions in file-find.c.
> 
> I know only very little about this API but from a quick glance at
> the change it looks to me like it introduces an implicit assumption
> that prefix points to a non-empty string.  If that is in fact one
> of the function's preconditions I would suggest to a) assert it
> before relying on it, and b) document it.  Otherwise, if the prefix
> is allowed to be empty then the code below is undefined in that
> case.
> 
> Martin
> 
> @@ -126,11 +127,22 @@ do_add_prefix (struct path_prefix *pprefix, const char 
> *prefix, bool first)
>/* Keep track of the longest prefix.  */
> 
>len = strlen (prefix);
> +  bool append_separator = !IS_DIR_SEPARATOR (prefix[len - 1]);
> +  if (append_separator)
> +len++;
> +
>if (len > pprefix->max_le

Thanks Martin.

I'm tending to simplify the functionality, let's see what will be discussed in 
the PR.

Martin


[PATCH] Fix last dwarf2out change with pubnames

2017-08-30 Thread Richard Biener

Noticed when trying to do sth on Darwin.  Lightly tested, committed
as obvious.

Richard.

2017-08-30  Richard Biener  

* dwarf2out.c (dwarf2out_finish): Remove setting AT_pubnames.
(dwarf2out_early_finish): Move setting of AT_pubnames from
early debug output to early finish.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 251449)
+++ gcc/dwarf2out.c (working copy)
@@ -29965,13 +29965,6 @@ dwarf2out_finish (const char *)
   *slot = ctnode;
 }
 
-  /* The AT_pubnames attribute needs to go in all skeleton dies, including
- both the main_cu and all skeleton TUs.  Making this call unconditional
- would end up either adding a second copy of the AT_pubnames attribute, or
- requiring a special case in add_top_level_skeleton_die_attrs.  */
-  if (!dwarf_split_debug_info)
-add_AT_pubnames (comp_unit_die ());
-
   if (dwarf_split_debug_info)
 {
   int mark;
@@ -30521,6 +30514,13 @@ dwarf2out_early_finish (const char *file
   for (limbo_die_node *node = limbo_die_list; node; node = node->next)
 note_variable_value (node->die);
 
+  /* The AT_pubnames attribute needs to go in all skeleton dies, including
+ both the main_cu and all skeleton TUs.  Making this call unconditional
+ would end up either adding a second copy of the AT_pubnames attribute, or
+ requiring a special case in add_top_level_skeleton_die_attrs.  */
+  if (!dwarf_split_debug_info)
+add_AT_pubnames (comp_unit_die ());
+
   /* The early debug phase is now finished.  */
   early_dwarf_finished = true;
 
@@ -30581,13 +30581,6 @@ dwarf2out_early_finish (const char *file
   *slot = ctnode;
 }
 
-  /* The AT_pubnames attribute needs to go in all skeleton dies, including
- both the main_cu and all skeleton TUs.  Making this call unconditional
- would end up either adding a second copy of the AT_pubnames attribute, or
- requiring a special case in add_top_level_skeleton_die_attrs.  */
-  if (!dwarf_split_debug_info)
-add_AT_pubnames (comp_unit_die ());
-
   /* Stick a unique symbol to the main debuginfo section.  */
   compute_comp_unit_symbol (comp_unit_die ());
 


Re: [PATCH] Fix IPA ICF with ASM statements (PR inline-asm/82001).

2017-08-30 Thread Martin Liška
On 08/30/2017 11:18 AM, Richard Biener wrote:
> On Wed, Aug 30, 2017 at 11:12 AM, Martin Liška  wrote:
>> Hi.
>>
>> Following patch compares also constraints of input and output operands of 
>> ASM statements.
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> It's now no longer "compare_tree_list_operand" but compare_asm_constraint
> where there's no need to walk TREE_CHAIN either.

Yep, it desires a refactoring.

Is it fine for trunk?

Martin

> 
> So can you refactor this a bit?
> 
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2017-08-28  Martin Liska  
>>
>> PR inline-asm/82001
>> * ipa-icf-gimple.c (func_checker::compare_tree_list_operand):
>> Compare TREE_PURPOSE of asm inputs and outputs.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-08-28  Martin Liska  
>>
>> PR inline-asm/82001
>> * gcc.dg/ipa/pr82001.c: New test.
>> ---
>>  gcc/ipa-icf-gimple.c   | 10 ++
>>  gcc/testsuite/gcc.dg/ipa/pr82001.c | 21 +
>>  2 files changed, 31 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr82001.c
>>
>>

>From dbbbf807a5721cb14da95fddbfe9afc32cd60ce7 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 28 Aug 2017 13:57:07 +0200
Subject: [PATCH] Fix IPA ICF with ASM statements (PR inline-asm/82001).

gcc/ChangeLog:

2017-08-28  Martin Liska  

	PR inline-asm/82001
	* ipa-icf-gimple.c (func_checker::compare_tree_list_operand):
	Rename to ...
	(func_checker::compare_asm_inputs_outputs): ... this function.
	(func_checker::compare_gimple_asm): Use the function to compare
	also ASM constrains.
	* ipa-icf-gimple.h: Rename the function.

gcc/testsuite/ChangeLog:

2017-08-28  Martin Liska  

	PR inline-asm/82001
	* gcc.dg/ipa/pr82001.c: New test.
---
 gcc/ipa-icf-gimple.c   | 19 +--
 gcc/ipa-icf-gimple.h   |  6 +++---
 gcc/testsuite/gcc.dg/ipa/pr82001.c | 21 +
 3 files changed, 37 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr82001.c

diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index f44a995f580..b40dd8653b4 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -543,11 +543,8 @@ func_checker::compare_operand (tree t1, tree t2)
 }
 }
 
-/* Compares two tree list operands T1 and T2 and returns true if these
-   two trees are semantically equivalent.  */
-
 bool
-func_checker::compare_tree_list_operand (tree t1, tree t2)
+func_checker::compare_asm_inputs_outputs (tree t1, tree t2)
 {
   gcc_assert (TREE_CODE (t1) == TREE_LIST);
   gcc_assert (TREE_CODE (t2) == TREE_LIST);
@@ -560,6 +557,16 @@ func_checker::compare_tree_list_operand (tree t1, tree t2)
   if (!compare_operand (TREE_VALUE (t1), TREE_VALUE (t2)))
 	return return_false ();
 
+  tree p1 = TREE_PURPOSE (t1);
+  tree p2 = TREE_PURPOSE (t2);
+
+  gcc_assert (TREE_CODE (p1) == TREE_LIST);
+  gcc_assert (TREE_CODE (p2) == TREE_LIST);
+
+  if (strcmp (TREE_STRING_POINTER (TREE_VALUE (p1)),
+		  TREE_STRING_POINTER (TREE_VALUE (p2))) != 0)
+	return return_false ();
+
   t2 = TREE_CHAIN (t2);
 }
 
@@ -1008,7 +1015,7 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
   tree input1 = gimple_asm_input_op (g1, i);
   tree input2 = gimple_asm_input_op (g2, i);
 
-  if (!compare_tree_list_operand (input1, input2))
+  if (!compare_asm_inputs_outputs (input1, input2))
 	return return_false_with_msg ("ASM input is different");
 }
 
@@ -1017,7 +1024,7 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
   tree output1 = gimple_asm_output_op (g1, i);
   tree output2 = gimple_asm_output_op (g2, i);
 
-  if (!compare_tree_list_operand (output1, output2))
+  if (!compare_asm_inputs_outputs (output1, output2))
 	return return_false_with_msg ("ASM output is different");
 }
 
diff --git a/gcc/ipa-icf-gimple.h b/gcc/ipa-icf-gimple.h
index da904b5897e..7e69024165f 100644
--- a/gcc/ipa-icf-gimple.h
+++ b/gcc/ipa-icf-gimple.h
@@ -215,9 +215,9 @@ public:
  is returned.  */
   bool compare_operand (tree t1, tree t2);
 
-  /* Compares two tree list operands T1 and T2 and returns true if these
- two trees are semantically equivalent.  */
-  bool compare_tree_list_operand (tree t1, tree t2);
+  /* Compares GIMPLE ASM inputs (or outputs) where we iterate tree chain
+ and compare both TREE_PURPOSEs and TREE_VALUEs.  */
+  bool compare_asm_inputs_outputs (tree t1, tree t2);
 
   /* Verifies that trees T1 and T2, representing function declarations
  are equivalent from perspective of ICF.  */
diff --git a/gcc/testsuite/gcc.dg/ipa/pr82001.c b/gcc/testsuite/gcc.dg/ipa/pr82001.c
new file mode 100644
index 000..05e32b10ef5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr82001.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target 

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-08-30 Thread Jackson Woodruff

On 08/29/2017 01:13 PM, Richard Biener wrote:

On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff
 wrote:

Hi all,

Apologies again to those CC'ed, who (again) received this twice.

Joseph: Yes you are correct. I misread the original thread, now fixed.

Richard: I've moved the optimizations out of fold-const.c. One has been
replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C) I've
deleted as it only introduced a new optimization when running with the flags
'-O0 -funsafe-math-optimizations'.


Hmm, how did you verify that, that it only adds sth with -O0
-funsafe-math-optimizations?


By checking with various flags, although not exhaustively. I looked for 
reasons for the behavior in match.pd (explained below).


I have also since discovered that the combinations of 
'-funsafe-math-optimizations -frounding-math' (at all levels) and 
'-fno-recriprocal-math -funsafe-math-optimizations' mean this pattern 
adds something.



Is that because in GIMPLE the reassoc pass should do this transform?
It's because the pattern that changes (X / C) -> X * (1 / C) is gated 
with O1:


  (for cst (REAL_CST COMPLEX_CST VECTOR_CST)
   (simplify
(rdiv @0 cst@1)
->(if (optimize)
-> (if (flag_reciprocal_math
  && !real_zerop (@1))
  (with
   { tree tem = const_binop (RDIV_EXPR, type, build_one_cst (type), 
@1); }

   (if (tem)
(mult @0 { tem; } )))
  (if (cst != COMPLEX_CST)
   (with { tree inverse = exact_inverse (type, @1); }
(if (inverse)
 (mult @0 { inverse; } 


I've flagged the two lines that are particularly relevant to this.

Removing this pattern, as I would expect, means that the divisions in 
the above optimization (and the one further down) are not removed.


So then there is the question of edge cases. This pattern is (ignoring 
the second case) going to fail when const_binop returns null. Looking 
through that function says that it will fail (for reals) when:


- Either argument is null (not the case)

- The operation is not in the list (PLUS_EXPR,
  MINUS_EXPR, MULT_EXPR, RDIV_EXPR, MIN_EXPR, MAX_EXPR)
  (again not the case)

- We honor Signalling NaNs and one of the operands is a sNaN.

- The operation is a division, and the second argument is zero  
  and dividing by 0.0 raises an exception.

- The result is infinity and neither of the operands were infinity
  and flag_trapping_math is set.

- The result isn't exact and flag_rounding_math is set.


For (x / ( y * C) -> (x / C) / y), I will add some gating for each of 
these so that the pattern is never executed if one of these would be the 
case.


The additional cases where this isn't converted to a multiplication by 
the reciprocal appear to be when -freciprocal-math is used, but we don't 
have -fno-rounding-math, or funsafe-math-optimizations.


For the removal of (x / C +- y / C -> (x +- y) / C), there is a 
trade-off of whether it is worth having an extra pattern for these edge 
cases. On this I'm not sure.




You added

+/* Simplify x / (- y) to -x / y.  */
+(simplify
+ (rdiv @0 (negate @1))
+ (rdiv (negate @0) @1))


Sorry, this was unclear, the changes to fold const should be split up 
from the additions to match.pd.


This was one of the changes discussed before. The idea is to 
canonicalize this so that y can be extracted in the cse_reciprocals pass.





for

   /* (-A) / (-B) -> A / B  */
   if (TREE_CODE (arg0) == NEGATE_EXPR && negate_expr_p (arg1))
 return fold_build2_loc (loc, RDIV_EXPR, type,
 TREE_OPERAND (arg0, 0),
 negate_expr (arg1));
   if (TREE_CODE (arg1) == NEGATE_EXPR && negate_expr_p (arg0))
 return fold_build2_loc (loc, RDIV_EXPR, type,
 negate_expr (arg0),
 TREE_OPERAND (arg1, 0));

presumably?  This isn't equivalent.  It's more like

(simplify
   (rdiv (negate_expr_p @0) (negate @1))
   (rdiv (negate @0) @1))
(simplify
   (rdiv (negate @0) (negate_expr_p @1))
   (rdiv @0 (negate @1)))

and you should remove the corresponding fold-const.c code.

Please do these changes independently to aid bisecting in case of issues.


I'll resubmit two different patches when I can get them separated. It 
should also make it easier to review when it is clearer what belongs where.




  (if (flag_reciprocal_math)
- /* Convert (A/B)/C to A/(B*C)  */
+ /* Convert (A/B)/C to A/(B*C) where neither B nor C are constant.  */
   (simplify
(rdiv (rdiv:s @0 @1) @2)
-   (rdiv @0 (mult @1 @2)))
+   (if (TREE_CODE (@1) != REAL_CST && TREE_CODE (@1) != REAL_CST)
+(rdiv @0 (mult @1 @2

why?  I guess to avoid ping-poning with


Yes, although there is a mistake there. It should be:

(if (TREE_CODE (@1) != REAL_CST && TREE_CODE (@2) != REAL_CST)

I'll fix that up.



+ /* Simplify x / (C * y) to (x / C) / y where C is a constant.  */
+ (simplify
+  (rdiv @0
+   (mult @1 REAL_CST@2))
+  (rdiv (rdiv @0 

[RFC] Make 4-stage PGO bootstrap really working

2017-08-30 Thread Martin Liška
Hi.

This is follow up which I've just noticed. Main problem we have is that
an instrumented compiler w/ -fprofile-generate (built in $OBJDIR/gcc subfolder)
will generate all *.gcda files in a same dir as *.o files. That's problematic
because we then have *.gcda files spread in 'profile' subfolder (because 
profile'
compiler builds libgcc) and 'train' subfolder. Eventually in 'feedback' stage
we don't load any *.gcda files :/

Well I really hope we need to set -fprofile-generate=$folder to a $folder. 
There comes
second problem: all *.gcda files are created as $folder/$aux_base_name.gcda 
which makes
it useless as we multiple same file names:

$ find . -name expr.c
./libcpp/expr.c
./gcc/expr.c

Thus I suggest patch #0001 that appends full path of current work dir. Patch 
#0002 sets
a folder for PGO bootstrap. So far so good with a small exception: 
conftest.gcda files
that trigger -Wcoverage-mismatch. Can we remove these before a stage? Do we do 
a similar
thing somewhere?

Thoughts?
Thanks,
Martin
>From f00c6bd99b98df7c650bb3fcbbb983671c99caef Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 28 Aug 2017 11:58:47 +0200
Subject: [PATCH 2/2] Hack Makefile.tpl

---
 Makefile.in  | 5 +++--
 Makefile.tpl | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index 78db0982ba2..16b76906ad0 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -529,13 +529,14 @@ STAGE1_CONFIGURE_FLAGS = --disable-intermodule $(STAGE1_CHECKING) \
 	  --disable-coverage --enable-languages="$(STAGE1_LANGUAGES)" \
 	  --disable-build-format-warnings
 
-STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate
+profile_folder=`${PWD_COMMAND}`/gcov-profiles/
+STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate=$(profile_folder)
 STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEtrain_CFLAGS = $(STAGE3_CFLAGS)
 STAGEtrain_TFLAGS = $(STAGE3_TFLAGS)
 
-STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use
+STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use=$(profile_folder) -fdump-ipa-profile
 STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
 
 STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
diff --git a/Makefile.tpl b/Makefile.tpl
index 5fcd7e358d9..129175a579c 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -452,13 +452,14 @@ STAGE1_CONFIGURE_FLAGS = --disable-intermodule $(STAGE1_CHECKING) \
 	  --disable-coverage --enable-languages="$(STAGE1_LANGUAGES)" \
 	  --disable-build-format-warnings
 
-STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate
+profile_folder=`${PWD_COMMAND}`/gcov-profiles/
+STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate=$(profile_folder)
 STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEtrain_CFLAGS = $(STAGE3_CFLAGS)
 STAGEtrain_TFLAGS = $(STAGE3_TFLAGS)
 
-STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use
+STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use=$(profile_folder) -fdump-ipa-profile
 STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
 
 STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
-- 
2.14.1

>From 654ca05a0e1e0261a4477283ca2dd8678f62f1e7 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 16 Aug 2017 10:22:57 +0200
Subject: [PATCH 1/2] Append PWD to path when using
 -fprofile-generate=/some/path.

---
 gcc/coverage.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/gcc/coverage.c b/gcc/coverage.c
index ed469107e3e..5780e19bbc8 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -1220,8 +1220,24 @@ coverage_init (const char *filename)
 g->get_passes ()->get_pass_profile ()->static_pass_number;
   g->get_dumps ()->dump_start (profile_pass_num, NULL);
 
-  if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
-profile_data_prefix = getpwd ();
+  if (!IS_ABSOLUTE_PATH (filename))
+{
+  if (profile_data_prefix)
+	{
+	  const char *pwd = getpwd ();
+	  unsigned l1 = strlen (profile_data_prefix);
+	  unsigned l2 = strlen (pwd);
+
+	  char *b = XNEWVEC (char, l1 + l2 + 2);
+	  memcpy (b, profile_data_prefix, l1);
+	  b[l1] = '/';
+	  memcpy (b + l1 + 1, pwd, l2);
+	  b[l1 + l2 + 1] = '\0';
+	  profile_data_prefix = b;
+	}
+  else
+	profile_data_prefix = getpwd ();
+}
 
   if (profile_data_prefix)
 prefix_len = strlen (profile_data_prefix);
-- 
2.14.1



Re: [PATCH] [MSP430] Pass -mcode/data-region to the linker and assembler

2017-08-30 Thread Nick Clifton
Hi Jozef,

> The changes made in a series of binutils patches
> (https://sourceware.org/ml/binutils/2017-08/msg00274.html)
> to ld and gas require the -mcode/data-region options to be propagated
> from gcc.
> 
> The attached patch adds that functionality.

Approved and applied.

Cheers
  Nick




Re: [PATCH] Fix PR82011, early LTO debug fallout

2017-08-30 Thread Richard Biener
On Tue, 29 Aug 2017, Richard Biener wrote:

> On Tue, 29 Aug 2017, Richard Biener wrote:
> 
> > 
> > The following avoids adding DW_AT_inline attributes twice on which
> > dsymutil complains.  The duplicate attribute is caused by stray
> > code I left in (I guess I hoped nothing ends up DECL_ABSTRACT_P ...).
> > Thus the following patch removes it -- DW_AT_inline is solely set
> > by dwarf2out_abstract_function now.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.  The
> > gdb testsuite shows no regression.
> > 
> > Will commit once testing finished.
> > 
> > Note the assert is deliberately restricted to DW_AT_inline for now
> > given enabling it unconditionally fires left and right ... :/
> > (sth to fix, but only as followups)
> 
> The following seems to cure it as far as preliminary testing goes.
> 
> Full bootstrap & regtest currently running on x86_64-unknown-linux-gnu.

Minor issue with the dwarf2out_early_global_decl hunk - we need to
re-process declaration DIEs which can end up generated by processing
member functions.

Bootstrapped and tested on x86_64-unknown-linux-gnu, no gdb testsuite
regressions, applied to trunk.

Richard.

2017-08-30  Richard Biener  

* dwarf2out.c (add_dwarf_attr): Check we don't add duplicate
attributes.
(gen_subprogram_die): Add DW_AT_object_pointer only early.
(dwarf2out_early_global_decl): Only generate a DIE for the
abstract origin if it doesn't already exist or is a declaration DIE.
(resolve_addr): Do not add the linkage name twice when
generating a stub DIE for the DW_TAG_GNU_call_site target.

* g++.dg/pr78112-2.C: Do not expect duplicate DW_AT_object_pointer.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 251409)
+++ gcc/dwarf2out.c (working copy)
@@ -4129,7 +4129,7 @@ add_dwarf_attr (dw_die_ref die, dw_attr_
   dw_attr_node *a;
   unsigned ix;
   FOR_EACH_VEC_SAFE_ELT (die->die_attr, ix, a)
-   gcc_assert (a->dw_attr != attr->dw_attr || a->dw_attr != DW_AT_inline);
+   gcc_assert (a->dw_attr != attr->dw_attr);
 }
 
   vec_safe_reserve (die->die_attr, 1);
@@ -22334,7 +22334,8 @@ gen_subprogram_die (tree decl, dw_die_re
{
  dw_die_ref parm_die = gen_decl_die (parm, NULL, NULL, subr_die);
 
- if (parm == DECL_ARGUMENTS (decl)
+ if (early_dwarf
+ && parm == DECL_ARGUMENTS (decl)
  && TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE
  && parm_die
  && (dwarf_version >= 3 || !dwarf_strict))
@@ -25479,10 +25480,16 @@ dwarf2out_early_global_decl (tree decl)
 with C++ constructor clones for example and makes
 dwarf2out_abstract_function happy which requires the early
 DIE of the abstract instance to be present.  */
- if (DECL_ABSTRACT_ORIGIN (decl))
+ tree origin = DECL_ABSTRACT_ORIGIN (decl);
+ dw_die_ref origin_die;
+ if (origin != NULL
+ /* Do not emit the DIE multiple times but make sure to
+process it fully here in case we just saw a declaration.  */
+ && ((origin_die = lookup_decl_die (origin)) == NULL
+ || is_declaration_die (origin_die)))
{
- current_function_decl = DECL_ABSTRACT_ORIGIN (decl);
- dwarf2out_decl (DECL_ABSTRACT_ORIGIN (decl));
+ current_function_decl = origin;
+ dwarf2out_decl (origin);
}
 
  current_function_decl = decl;
@@ -29047,7 +29054,7 @@ resolve_addr (dw_die_ref die)
add_AT_flag (tdie, DW_AT_external, 1);
add_AT_flag (tdie, DW_AT_declaration, 1);
add_linkage_attr (tdie, tdecl);
-   add_name_and_src_coords_attributes (tdie, tdecl);
+   add_name_and_src_coords_attributes (tdie, tdecl, true);
equate_decl_number_to_die (tdecl, tdie);
  }
  }
Index: gcc/testsuite/g++.dg/pr78112-2.C
===
--- gcc/testsuite/g++.dg/pr78112-2.C(revision 251408)
+++ gcc/testsuite/g++.dg/pr78112-2.C(working copy)
@@ -2,7 +2,7 @@
 /* { dg-skip-if "No dwarf debug support" { hppa*-*-hpux* } } */
 /* { dg-options "-g -dA -gdwarf-4 -std=gnu++11" } */
 /* { dg-options "-g -dA -std=gnu++11 -gdwarf-4" } */
-/* { dg-final { scan-assembler-times DW_AT_object_pointer 18 } } */
+/* { dg-final { scan-assembler-times DW_AT_object_pointer 12 } } */
 
 void run (int *int_p, void(*func)(int *)) { func (int_p); }
 namespace foo {


Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned memory access

2017-08-30 Thread Tamar Christina
Hi,

The test I used was testsuite/gcc.c-torture/compile/structs.c

Thanks,
Tamar

From: H.J. Lu 
Sent: Friday, August 25, 2017 8:10:22 PM
To: Tamar Christina
Cc: Richard Sandiford; GCC Patches; nd; James Greenhalgh; Richard Earnshaw; 
Marcus Shawcroft
Subject: Re: [PATCH][GCC][AARCH64]Bad code-gen for structure/block/unaligned 
memory access

On Sun, Jul 2, 2017 at 11:17 PM, Tamar Christina
 wrote:
> Hi All,
>
> Sorry I just realized I never actually sent the updated patch...
>
> So here it is :)
>

Are there any testcases?

--
H.J.


Re: [PATCH] Fix IPA ICF with ASM statements (PR inline-asm/82001).

2017-08-30 Thread Richard Biener
On Wed, Aug 30, 2017 at 11:12 AM, Martin Liška  wrote:
> Hi.
>
> Following patch compares also constraints of input and output operands of ASM 
> statements.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

It's now no longer "compare_tree_list_operand" but compare_asm_constraint
where there's no need to walk TREE_CHAIN either.

So can you refactor this a bit?

> Martin
>
> gcc/ChangeLog:
>
> 2017-08-28  Martin Liska  
>
> PR inline-asm/82001
> * ipa-icf-gimple.c (func_checker::compare_tree_list_operand):
> Compare TREE_PURPOSE of asm inputs and outputs.
>
> gcc/testsuite/ChangeLog:
>
> 2017-08-28  Martin Liska  
>
> PR inline-asm/82001
> * gcc.dg/ipa/pr82001.c: New test.
> ---
>  gcc/ipa-icf-gimple.c   | 10 ++
>  gcc/testsuite/gcc.dg/ipa/pr82001.c | 21 +
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr82001.c
>
>


[TESTSUITE]Use strncpy instead of strcpy in testsuite/gcc.dg/memcmp-1.c

2017-08-30 Thread Renlin Li

Hi,

In test_driver_memcmp function, I found buf1 and buf2 is not properly
terminated with null character.

In lib_strncmp, strcpy will be called with buf1 and buf2.
The normal implementation of strcpy function has a loop to copy character from 
source
to destination one by one until a null character is encountered.

If the string is not properly terminated, this will cause the strcpy read/write
memory beyond the boundary.

Here I changed the strcpy into strncpy to constraint the function to visit
legal memory only.

Test Okay without any problem. Okay to commit?

Regard,
Renlin


gcc/testsuite/ChangeLog:

2017-08-30  Renlin Li  

* gcc.dg/memcmp-1.c (test_strncmp): Use strncpy instead of strcpy.
diff --git a/gcc/testsuite/gcc.dg/memcmp-1.c b/gcc/testsuite/gcc.dg/memcmp-1.c
index 828a0ca..d258354 100644
--- a/gcc/testsuite/gcc.dg/memcmp-1.c
+++ b/gcc/testsuite/gcc.dg/memcmp-1.c
@@ -110,8 +110,8 @@ static void test_strncmp_ ## SZ ## _ ## ALIGN (const char *str1, const char *str
 	{\
 	  a = three+i*ALIGN+j*(4096-2*i*ALIGN);\
 	  b = four+i*ALIGN+j*(4096-2*i*ALIGN);\
-	  strcpy(a,str1);		\
-	  strcpy(b,str2);		\
+	  strncpy(a,str1,SZ);		\
+	  strncpy(b,str2,SZ);		\
 	  r = strncmp(a,b,SZ);		\
 	  if ( r < 0 && !(expect < 0) ) abort();			\
 	  if ( r > 0 && !(expect > 0) )	abort();			\


[PATCH] Fix IPA ICF with ASM statements (PR inline-asm/82001).

2017-08-30 Thread Martin Liška
Hi.

Following patch compares also constraints of input and output operands of ASM 
statements.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2017-08-28  Martin Liska  

PR inline-asm/82001
* ipa-icf-gimple.c (func_checker::compare_tree_list_operand):
Compare TREE_PURPOSE of asm inputs and outputs.

gcc/testsuite/ChangeLog:

2017-08-28  Martin Liska  

PR inline-asm/82001
* gcc.dg/ipa/pr82001.c: New test.
---
 gcc/ipa-icf-gimple.c   | 10 ++
 gcc/testsuite/gcc.dg/ipa/pr82001.c | 21 +
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr82001.c


diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index f44a995f580..c08a43c1cdc 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -560,6 +560,16 @@ func_checker::compare_tree_list_operand (tree t1, tree t2)
   if (!compare_operand (TREE_VALUE (t1), TREE_VALUE (t2)))
 	return return_false ();
 
+  tree p1 = TREE_PURPOSE (t1);
+  tree p2 = TREE_PURPOSE (t2);
+
+  gcc_assert (TREE_CODE (p1) == TREE_LIST);
+  gcc_assert (TREE_CODE (p2) == TREE_LIST);
+
+  if (strcmp (TREE_STRING_POINTER (TREE_VALUE (p1)),
+		  TREE_STRING_POINTER (TREE_VALUE (p2))) != 0)
+	return return_false ();
+
   t2 = TREE_CHAIN (t2);
 }
 
diff --git a/gcc/testsuite/gcc.dg/ipa/pr82001.c b/gcc/testsuite/gcc.dg/ipa/pr82001.c
new file mode 100644
index 000..05e32b10ef5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr82001.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fdump-ipa-icf-details"  } */
+
+int
+mullo (int a, int b)
+{
+  asm("mul %%edx   # %%1 was %1"
+  : "+"
+	"a"(a),
+	"+d"(b));
+  return a;
+}
+
+int
+mulhi (int a, int b)
+{
+  asm("mul %%edx   # %%1 was %1" : "+d"(a), "+a"(b));
+  return a;
+}
+
+/* { dg-final { scan-ipa-dump "Equal symbols: 0" "icf"  } } */



Re: [PATCH] [MSP430] [PR80993] Prevent lto removing interrupt handlers

2017-08-30 Thread Richard Biener
On Tue, Aug 29, 2017 at 5:47 PM, DJ Delorie  wrote:
>
> Richard Biener  writes:
>> Humm... don't you have to register interrupt handlers somehow?
>
> MSP430 uses an "if they're present, they're registered" approach, so
> it's driven by the user tagging functions as interrupts - the linker
> notices that they're present and links them into the interrupt table for
> you.  So, adding "used" is consistent with this method, as the interrupt
> attribute *is* the registration.

Ah, thanks for the clarification.


Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.

2017-08-30 Thread Rainer Orth
Hi Martin,

> Yes, sorry for the missing entries. Fixed and installed as r251412.
> Hopefully fall out will be small.

there's

UNRESOLVED: gcc.dg/tree-ssa/vrp104.c scan-tree-dump-times switchlower "switch" 1

everywhere, it seems.  gcc.log has

gcc.dg/tree-ssa/vrp104.c: dump file does not exist

For one, the testcase needs -fdump-tree-switchlower now, and on
i386-pc-solaris2.11 I find 2 instances of switch in the dump...

Please fix.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[rs6000] int->machine_mode in rs6000-c.c

2017-08-30 Thread Richard Sandiford
Tested on powerpc64le-linux-gnu and installed as obvious.


2017-08-30  Richard Sandiford  

gcc/
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Use machine_mode rather than int for arg1_mode.

Index: gcc/config/rs6000/rs6000-c.c
===
--- gcc/config/rs6000/rs6000-c.c2017-08-30 09:17:07.677166653 +0100
+++ gcc/config/rs6000/rs6000-c.c2017-08-30 09:21:31.139741443 +0100
@@ -6729,7 +6729,7 @@ altivec_resolve_overloaded_builtin (loca
 else if (fcode == P9V_BUILTIN_VEC_VSIEDP)
   {
int overloaded_code;
-   int arg1_mode = TYPE_MODE (types[0]);
+   machine_mode arg1_mode = TYPE_MODE (types[0]);
 
if (nargs != 2)
  {


Re: [PATCH] Fix bug in simplify_ternary_operation

2017-08-30 Thread Tom de Vries

On 08/28/2017 08:26 PM, Tom de Vries wrote:

Hi,

I think I found a bug in r17465:
...

   * cse.c (simplify_ternary_operation): Handle more IF_THEN_ELSE
   simplifications.

diff --git a/gcc/cse.c b/gcc/cse.c
index e001597..3c27387 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4713,6 +4713,17 @@ simplify_ternary_operation (code, mode, 
op0_mode, op0, op1, op2)


Note: the parameters of simplify_ternary_operation have the following 
meaning:

...
/* Simplify CODE, an operation with result mode MODE and three operands,
OP0, OP1, and OP2.  OP0_MODE was the mode of OP0 before it became
a constant.  Return 0 if no simplifications is possible.  */

rtx
simplify_ternary_operation (code, mode, op0_mode, op0, op1, op2)
  enum rtx_code code;
  enum machine_mode mode, op0_mode;
  rtx op0, op1, op2;
...


  && rtx_equal_p (XEXP (op0, 1), op1)
  && rtx_equal_p (XEXP (op0, 0), op2))
return op2;
+  else if (! side_effects_p (op0))
+   {
+ rtx temp;
+ temp = simplify_relational_operation (GET_CODE (op0), op0_mode,
+   XEXP (op0, 0), XEXP 
(op0, 1));


We're handling code == IF_THEN_ELSE here, so op0 is the condition, op1 
is the 'then expr' and op2 is the 'else expr'.


The parameters of simplify_relational_operation have the following meaning:
...
/* Like simplify_binary_operation except used for relational operators.
MODE is the mode of the operands, not that of the result.  If MODE
is VOIDmode, both operands must also be VOIDmode and we compare the
operands in "infinite precision".

If no simplification is possible, this function returns zero.
Otherwise, it returns either const_true_rtx or const0_rtx.  */

rtx
simplify_relational_operation (code, mode, op0, op1)
  enum rtx_code code;
  enum machine_mode mode;
  rtx op0, op1;
...

The problem in the patch is that we use op0_mode argument for the mode 
parameter. The mode parameter of simplify_relational_operation needs to 
be the mode of the operands of the condition, while op0_mode is the mode 
of the condition.


Patch below fixes this on current trunk.

[ I found this by running into an ICE in 
gcc.c-torture/compile/pr28776-2.c for gcn target. I haven't been able to 
reproduce this with an upstream branch yet. ]


Filed as PR82020 - "ICE in decompose at rtl.h:2126" ( 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82020 ).




OK for trunk if bootstrap and reg-test for x86_64 succeeds?


bootstrap and reg-test for x86_64 done, no issues found.

Thanks,
- Tom

[ reposting patch with ChangeLog entry ]
Fix comparison mode in simplify_ternary_operation

2017-08-29  Tom de Vries  

	PR rtl-optimization/82020
	* simplify-rtx.c (simplify_ternary_operation): Fix comparison mode of
	IF_THEN_ELSE condition.

---
 gcc/simplify-rtx.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 0133d43..fbf979b 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5567,8 +5567,6 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 	  XEXP (op0, 0), XEXP (op0, 1));
 	}
 
-	  if (cmp_mode == VOIDmode)
-	cmp_mode = op0_mode;
 	  temp = simplify_relational_operation (GET_CODE (op0), op0_mode,
 			  			cmp_mode, XEXP (op0, 0),
 		XEXP (op0, 1));