[PATCH] Improve code generation of v += (c == 0) etc. on x86 (PR target/92140)

2019-10-18 Thread Jakub Jelinek
Hi!

As mentioned in the PR, x == 0 can be equivalently tested as x < 1U.
The latter form has the advantage that it sets the carry flag, so it is a
win whenever the result is consumed by an instruction that can use the
carry flag directly.
The following patch adds a couple of (pre-reload only) define_insn_and_split
patterns to handle the most common cases.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
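
For readers who want to check the arithmetic behind the equivalence, here is a
minimal C sketch. The function names are made up for illustration and are not
part of the patch; the variable cf only models what the carry flag would hold
after comparing c against 1, it is not how the compiler represents it.

```c
#include <stdint.h>

/* Zero-flag form: v += (c == 0).  */
uint32_t
add_eq_zf (uint32_t v, uint32_t c)
{
  return v + (c == 0);
}

/* Carry-flag form: the unsigned test c < 1U is true exactly when
   c == 0, so cf models the carry left by comparing c with 1.  */
uint32_t
add_eq_cf (uint32_t v, uint32_t c)
{
  uint32_t cf = c < 1U;
  return v + cf;
}

/* c != 0 is the negated carry, i.e. + (1 - CF).  */
uint32_t
add_ne_cf (uint32_t v, uint32_t c)
{
  uint32_t cf = c < 1U;
  return v + (1U - cf);
}

/* With an extra constant k:
   v + (c != 0) + k == v - CF - ~k  (modulo 2^32),
   because -~k == k + 1 in unsigned arithmetic.  */
uint32_t
add_ne_const_cf (uint32_t v, uint32_t c, uint32_t k)
{
  uint32_t cf = c < 1U;
  return v - cf - ~k;
}
```

The last function shows why a subtract-with-borrow form needs the constant
complemented: subtracting ~k adds k + 1, and the borrow takes the 1 back when
c is zero.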

2019-10-18  Jakub Jelinek  
Uroš Bizjak  

PR target/92140
* config/i386/predicates.md (int_nonimmediate_operand): New special
predicate.
* config/i386/i386.md (*add3_eq, *add3_ne,
*add3_eq_0, *add3_ne_0, *sub3_eq, *sub3_ne,
*sub3_eq_1, *sub3_eq_0, *sub3_ne_0): New
define_insn_and_split patterns.

* gcc.target/i386/pr92140.c: New test.
* gcc.c-torture/execute/pr92140.c: New test.

--- gcc/config/i386/predicates.md.jj	2019-10-07 13:09:06.486261815 +0200
+++ gcc/config/i386/predicates.md   2019-10-18 15:47:50.781855838 +0200
@@ -100,6 +100,15 @@ (define_special_predicate "ext_register_
(match_test "GET_MODE (op) == SImode")
(match_test "GET_MODE (op) == HImode"
 
+;; Match a DI, SI, HI or QImode nonimmediate_operand.
+(define_special_predicate "int_nonimmediate_operand"
+  (and (match_operand 0 "nonimmediate_operand")
+   (ior (and (match_test "TARGET_64BIT")
+(match_test "GET_MODE (op) == DImode"))
+   (match_test "GET_MODE (op) == SImode")
+   (match_test "GET_MODE (op) == HImode")
+   (match_test "GET_MODE (op) == QImode"))))
+
 ;; Match register operands, but include memory operands for TARGET_SSE_MATH.
 (define_predicate "register_ssemem_operand"
   (if_then_else
--- gcc/config/i386/i386.md.jj  2019-09-20 12:25:48.0 +0200
+++ gcc/config/i386/i386.md 2019-10-18 15:52:22.697717013 +0200
@@ -6843,6 +6843,228 @@ (define_insn "*addsi3_zext_cc_overflow_2
   [(set_attr "type" "alu")
(set_attr "mode" "SI")])
 
+;; x == 0 with zero flag test can be done also as x < 1U with carry flag
+;; test, where the latter is preferable if we have some carry consuming
+;; instruction.
+;; For x != 0, we need to use x < 1U with negation of carry, i.e.
+;; + (1 - CF).
+(define_insn_and_split "*add3_eq"
+  [(set (match_operand:SWI 0 "nonimmediate_operand")
+   (plus:SWI
+ (plus:SWI
+   (eq:SWI (match_operand 3 "int_nonimmediate_operand") (const_int 0))
+   (match_operand:SWI 1 "nonimmediate_operand"))
+ (match_operand:SWI 2 "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (PLUS, mode, operands)
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 3) (const_int 1)))
+   (parallel [(set (match_dup 0)
+  (plus:SWI
+(plus:SWI (ltu:SWI (reg:CC FLAGS_REG) (const_int 0))
+  (match_dup 1))
+(match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_insn_and_split "*add3_ne"
+  [(set (match_operand:SWI 0 "nonimmediate_operand")
+   (plus:SWI
+ (plus:SWI
+   (ne:SWI (match_operand 3 "int_nonimmediate_operand") (const_int 0))
+   (match_operand:SWI 1 "nonimmediate_operand"))
+ (match_operand:SWI 2 "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "CONST_INT_P (operands[2])
+   && (mode != DImode
+   || INTVAL (operands[2]) != HOST_WIDE_INT_C (-0x8000))
+   && ix86_binary_operator_ok (PLUS, mode, operands)
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 3) (const_int 1)))
+   (parallel [(set (match_dup 0)
+  (minus:SWI
+(minus:SWI (match_dup 1)
+   (ltu:SWI (reg:CC FLAGS_REG) (const_int 0)))
+(match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[2] = gen_int_mode (~INTVAL (operands[2]),
+ mode == DImode ? SImode : mode);
+})
+
+(define_insn_and_split "*add3_eq_0"
+  [(set (match_operand:SWI 0 "nonimmediate_operand")
+   (plus:SWI
+ (eq:SWI (match_operand 2 "int_nonimmediate_operand") (const_int 0))
+ (match_operand:SWI 1 "")))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_unary_operator_ok (PLUS, mode, operands)
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:CC FLAGS_REG)
+   (compare:CC (match_dup 2) (const_int 1)))
+   (parallel [(set (match_dup 0)
+  (plus:SWI (ltu:SWI (reg:CC FLAGS_REG) (const_int 0))
+(match_dup 1)))
+ (clobber (reg:CC FLAGS_REG))])]
+{
+  if (!nonimmediate_operand (operands[1], mode))
+operands[1] = force_reg (mode, operands[1]);
+})
+
+(define_insn_and_split "*add3_ne_0"
+  [(set (match_operand:SWI 0 "nonimmediate_operand")
+   (plus:SWI
+ (ne:SWI (match_operand 2 "int_nonimmediate_operand") 

[Bug middle-end/86575] [7/8 Regression] -Wimplicit-fallthrough affects code generation

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86575

--- Comment #8 from Eric Gallager  ---
(In reply to Michael Matz from comment #7)
> As I stated, it's only fixed in trunk, so it's still a regression in 7 and 8,
> as marked in the summary.

But you also said you weren't planning on backporting:

(In reply to Michael Matz from comment #5)
> Fixed in trunk.  Not planning backporting, it's not a very important problem.

So, if you're not going to backport to 7 or 8, then, does this bug really need
to stay open?

[Bug libfortran/91593] Implicit enum conversions in libgfortran/io/transfer.c

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91593

--- Comment #8 from Eric Gallager  ---
(In reply to Jerry DeLisle from comment #7)
> Author: jvdelisle
> Date: Wed Oct  2 02:35:14 2019
> New Revision: 276439
> 
> URL: https://gcc.gnu.org/viewcvs?rev=276439&root=gcc&view=rev
> Log:
> 2019-10-01  Jerry DeLisle  
> 
>   PR libfortran/91593
>   * io/read.c (read_decimal): Cast constant to size_t to turn off
>   a bogus warning.
>   * io/write.c (btoa_big): Use memset in lieu of setting the null
>   byte in a string buffer to turn off a bogus warning.
> 
> Modified:
> trunk/libgfortran/ChangeLog
> trunk/libgfortran/io/read.c
> trunk/libgfortran/io/write.c

Did this fix it?

[Bug c/60591] Report enum conversions as part of Wconversion

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60591

--- Comment #5 from Eric Gallager  ---
(In reply to Eric Gallager from comment #2)
> There are several other bugs open like this one, such as bug 78736

This is fixed now. It's probably still worth checking some of the other bugs
under its "See Also" section though.

[Bug c++/52763] Warning if compare between enum and non-enum type

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52763

Eric Gallager  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-10-19
     Ever confirmed|0                           |1

--- Comment #11 from Eric Gallager  ---
(In reply to Martin Sebor from comment #8)
> Clang warns when an enum object is compared to a constant that's out of the
> most restricted range of the enum's type.  The warning is in -Wall.  It
> doesn't warn when the object is compared to a constant that doesn't
> correspond to any of the type's enumerators.  I can see that being useful to
> some (carefully written) projects but suspect it could be quite noisy for
> many others.
> 
> $ cat t.C && clang++ -S -Wall -Wextra t.C
> enum E { NONE = 0, ONE = 1, TWO = 2 };
> 
> bool f (E e)
> {
>   return e == 3;   // no warning here
> }
> 
> bool g (E e)
> {
>   return e == 4;
> }
> 
> 
> t.C:10:12: warning: comparison of constant 4 with expression of type 'E' is
>   always false [-Wtautological-constant-out-of-range-compare]
>   return e == 4;
>          ~ ^  ~
> 1 warning generated.

I combined this testcase with the testcase in the original report and can
confirm that there is still no warning even after the addition of
-Wenum-conversion in bug 78736 (when compiling with either the C or C++
frontends):

$ cat 52763.cc
#ifdef __cplusplus
# include 
#else
# include 
#endif /* __cplusplus */

typedef enum {
NONE = 0, ONE = 1, TWO = 2
} tEnumType;

bool f(tEnumType e)
{
return (e == 3);   // no warning here
}

bool g(tEnumType e)
{
return (e == 4);
}

int main(void)
{
tEnumType var1 = TWO;
//Warn here, because we compare an enum to a non-enum type (1)
//should be 'if (var1 == ONE)'
if (var1 == 1)
return f(NONE);
else
return g(ONE);
}
$ /usr/local/bin/g++ -c -S -Wall -Wextra -Wconversion -pedantic 52763.cc
$ /usr/local/bin/gcc -c -S -Wall -Wextra -Wconversion -pedantic
-Wenum-conversion -Wc++-compat -x c 52763.cc
$

(no output)

[Bug c/78736] enum warnings in GCC (request for -Wenum-conversion to be added)

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78736

Eric Gallager  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #18 from Eric Gallager  ---
(In reply to Jonny Grant from comment #17)
> Hello Joseph
> 
> This was the test case I created. There isn't any warning output when 'a_t'
> is converted to 'int'. Or even if it was converted to an 'unsigned int'
> 
> https://gcc.gnu.org/ml/gcc-help/2019-07/msg00014.html
> 
> 
> //-O2 -Wall -Wextra -Wconversion -Werror
> 
> #include 
> typedef enum
> {
> a = -1
> } a_t;
> 
> a_t f()
> {
> return a;
> }
> 
> int main()
> {
> int b = f();
> return b;
> }

While it's true that g++ prints no warnings for that testcase, I think that's
material for a separate bug. The original bug here was just to add
-Wenum-conversion for the C front-end, which has now been done, so I'm closing
this. Feel free to open new bugs for any missed cases.

[Bug c++/87403] [Meta-bug] Issues that suggest a new warning

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87403
Bug 87403 depends on bug 78736, which changed state.

Bug 78736 Summary: enum warnings in GCC (request for -Wenum-conversion to be 
added)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78736

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

[Bug c/7654] warn if an enum is being assigned a non enum value

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=7654

Eric Gallager  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egallager at gcc dot gnu.org

--- Comment #12 from Eric Gallager  ---
(In reply to Martin Sebor from comment #11)
> I'll confirm this ancient request.
> 
> Bug 78736 asks for something similar, and I'm working on enhancing the
> solution there even further (to diagnose assigning constants that don't have
> a corresponding enumerator in the destination type).  I'll add that, on the
> following slightly modified test case, Clang issues the warnings below:
> 
> $ cat t.C && clang -S -Wall -Wextra -Weverything -xc t.C
> void f (int i)
> {
>   enum e1 { e1a, e1b };
>   enum e1 e1v;
>   enum e2 { e2a, e2b };
>   enum e2 e2v;
> 
>   e1v = 1;   // no warning
>   e1v = 3;   // warning
>   e1v = e1a; // ok
>   e2v = e1v; // warning
>   i = e1v;   // ok I guess
>   e2v = i;   // warning
> }
> t.C:9:9: warning: integer constant not in range of enumerated type 'enum e1'
>   [-Wassign-enum]
>   e1v = 3;   // warning
>         ^
> t.C:11:9: warning: implicit conversion from enumeration type 'enum e1' to
>   different enumeration type 'enum e2' [-Wenum-conversion]
>   e2v = e1v; // warning
>       ~ ^~~
> t.C:13:9: warning: implicit conversion changes signedness: 'int' to 'enum e2'
>   [-Wsign-conversion]
>   e2v = i;   // warning
>       ~ ^
> t.C:1:6: warning: no previous prototype for function 'f'
> [-Wmissing-prototypes]
> void f (int i)
>      ^
> 4 warnings

gcc now prints:

$ /usr/local/bin/gcc -c -S -Wall -Wextra -Wconversion -Wsign-conversion
-Wmissing-prototypes -pedantic -xc 7654.c
7654.c:1:6: warning: no previous prototype for 'f' [-Wmissing-prototypes]
    1 | void f (int i)
      |      ^
7654.c: In function 'f':
7654.c:11:6: warning: implicit conversion from 'enum e1' to 'enum e2'
[-Wenum-conversion]
   11 |  e2v = e1v; // warning
      |      ^
7654.c:6:10: warning: variable 'e2v' set but not used
[-Wunused-but-set-variable]
    6 |  enum e2 e2v;
      |          ^~~
$

so, gcc has -Wenum-conversion now, but it is still missing warnings from
-Wassign-enum and -Wsign-conversion on this testcase.

[Bug objc/77404] Add Wobjc-root-class

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77404

Eric Gallager  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mikestump at comcast dot net

--- Comment #5 from Eric Gallager  ---
cc-ing other objc maintainer

[Bug target/82240] i386.md & -Wlogical-op in build

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82240

Eric Gallager  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-10-19
     Ever confirmed|0                           |1

--- Comment #12 from Eric Gallager  ---
(In reply to Eric Gallager from comment #10)
> *** Bug 83863 has been marked as a duplicate of this bug. ***

Taking the dup as confirmation

Re: Improving GCC's line table information to help GDB

2019-10-18 Thread Alexandre Oliva
On Oct 16, 2019, Luis Machado  wrote:

> It seems, from reading the blog post about SFN's, that it was meant to
> help with debugging optimized binaries.

Indeed.  Getting rid of the dummy jumps would be one kind of
optimization, and then SFN might help avoid some of the loss of
location info in some cases.  However, SFN doesn't kick in at -O0
because the dummy jumps and all other artifacts of unoptimized code are
retained anyway, so SFN wouldn't have a chance to do any of the good
it's meant to do there.

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!        FSF VP & FSF Latin America board member
GNU Toolchain Engineer                        Free Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[Bug bootstrap/86518] Strengthen bootstrap comparison by not enabling warnings at stage3

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86518

--- Comment #12 from Eric Gallager  ---
(In reply to Jonathan Wakely from comment #11)
> (In reply to Eric Gallager from comment #10)
> > If this is becoming the meta-bug for all warnings that affect codegen, then
> > I'd like to add bug 61579 (-Wwrite-strings) as at least semi-related... (not
> > sure if toggling it causes miscompares though)
> 
> That one is *supposed* to affect codegen (so maybe it should be a -f option
> instead).

hm, I could've sworn there was already an -f option with a similar name, but
after checking the manual I can't seem to find it after all, so... I guess
making it an -f option could work.

Re: [ C++ ] [ PATCH ] [ RFC ] p1301 - [[nodiscard("should have a reason")]]

2019-10-18 Thread Jason Merrill

On 10/18/19 1:54 AM, JeanHeyd Meneide wrote:

... And I am very tired and forgot to attach the patch. Again. Sorry...!

On Fri, Oct 18, 2019 at 1:54 AM JeanHeyd Meneide
 wrote:


Dear Jason,

On Thu, Oct 17, 2019 at 3:51 PM Jason Merrill  wrote:

  > FAIL: g++.dg/cpp0x/gen-attrs-67.C  -std=c++11  (test for errors, line 8)
  > FAIL: g++.dg/cpp1z/feat-cxx1z.C  -std=gnu++17 (test for excess errors)
  > FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11 (test for excess errors)
  > FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11  (test for warnings, line 12)
  > FAIL: g++.dg/cpp1z/nodiscard4.C  -std=c++11  (test for warnings, line 13)
  > FAIL: g++.dg/cpp2a/feat-cxx2a.C   (test for excess errors)


  Sorry about that! I implemented a bit of a better warning to
cover gen-attrs-67, and bumped the feature test macro value checks in
the feat tests. The rest should be fine now too.

  Let me know if anything else seems off!



+ "%qE attribute%<'%>s argument list is empty",


Using %<'%> results in "attribute'''s" in the output, not what we want 
here; that's only for when you are trying to give a diagnostic about an 
apostrophe in the source.  You can use %' for an apostrophe, but I think 
we might as well drop the 's entirely.  Here's what I'm committing.


Thanks a lot for the patch!
commit 073503af93e3409553fc32107cbcd316ddabc7c8
Author: JeanHeyd Meneide 
Date:   Fri Oct 18 01:54:47 2019 -0400

Implement C++20 P1301 [[nodiscard("should have a reason")]].

2019-10-17  JeanHeyd Meneide  

gcc/
* escaped_string.h (escaped_string): New header.
* tree.c (escaped_string): Remove escaped_string class.

gcc/c-family
* c-lex.c (c_common_has_attribute): Update nodiscard value.

gcc/cp/
* tree.c (handle_nodiscard_attribute): Add C++2a nodiscard
string message.
(std_attribute_table): Increase nodiscard argument handling
max_length from 0 to 1.
* parser.c (cp_parser_check_std_attribute): Add requirement
that nodiscard only be seen once in attribute-list.
(cp_parser_std_attribute): Check that empty parenthesis lists are
not specified for attributes that have max_length > 0 (e.g.
[[attr()]]).
* cvt.c (maybe_warn_nodiscard): Add nodiscard message to
output, if applicable.
(convert_to_void): Allow constructors to be nodiscard-able (P1771).

gcc/testsuite/g++.dg/cpp0x
* gen-attrs-67.C: Test new error message for empty-parenthesis-list.

gcc/testsuite/g++.dg/cpp2a
* nodiscard-construct.C: New test.
* nodiscard-once.C: New test.
* nodiscard-reason-nonstring.C: New test.
* nodiscard-reason-only-one.C: New test.
* nodiscard-reason.C: New test.

Reviewed-by: Jason Merrill 

diff --git a/gcc/escaped_string.h b/gcc/escaped_string.h
new file mode 100644
index 00000000000..b83e1281f27
--- /dev/null
+++ b/gcc/escaped_string.h
@@ -0,0 +1,43 @@
+/* Shared escaped string class.
+   Copyright (C) 1999-2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_ESCAPED_STRING_H
+#define GCC_ESCAPED_STRING_H
+
+#include <cstdlib>
+
+/* A class to handle converting a string that might contain
+   control characters (e.g. newline, form-feed, etc.) into one
+   which contains escape sequences instead.  */
+
+class escaped_string
+{
+ public:
+  escaped_string () { m_owned = false; m_str = NULL; };
+  ~escaped_string () { if (m_owned) free (m_str); }
+  operator const char *() const { return m_str; }
+  void escape (const char *);
+ private:
+  escaped_string(const escaped_string&) {}
+  escaped_string& operator=(const escaped_string&) { return *this; }
+  char *m_str;
+  bool  m_owned;
+};
+
+#endif /* ! GCC_ESCAPED_STRING_H */
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e3c602fbb8d..fb05b5f8af0 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -353,13 +353,14 @@ c_common_has_attribute (cpp_reader *pfile)
 	  else if (is_attribute_p ("deprecated", attr_name))
 		result = 201309;
 	  else if (is_attribute_p ("maybe_unused", attr_name)
-		   || is_attribute_p ("nodiscard", attr_name)
 		   || is_attribute_p ("fallthrough", attr_name))
 		result = 201603;
 	  

Re: RFA [PATCH] * lock-and-run.sh: Check for process existence rather than timeout.

2019-10-18 Thread Alexandre Oliva
Hello, Jason,

On Oct 14, 2019, Jason Merrill  wrote:

> Alex, you had a lot of helpful comments when I first wrote this, any thoughts
> on this revision?

I think the check of the pid file could be made slightly simpler and
cheaper if we created it using:

   echo $$ > $lockdir/pidT && mv $lockdir/pidT $lockdir/pid

instead of

> +touch $lockdir/$$



> + pid="`(cd $lockdir; echo *)`"

The ""s are implicit in a shell assignment, though there really
shouldn't be more than one PID-named file in the dir.  With the change
suggested above, this would become

pid=`cat $lockdir/pid 2>/dev/null`

There's a slight possibility of hitting this right between the creation
of the dir and the creation of the pid file, thus the 2>/dev/null.

> + if ps "$pid" >/dev/null; then

could be tested with much lower overhead:

if test -z "$pid" || kill -0 $pid ; then

though it might make sense to have a different test and error message
for the case of the absent pid file.

We might also wish to use different lock-breaking logic for that case,
too, e.g. checking that the timestamp of the dir didn't change by
comparing `ls -ld $lockdir` with what we got 30 seconds before.  If it
changed or the output is now empty, we just lost the race again.

It's unlikely that the dir would remain unchanged for several seconds
without the pid file, so if we get the same timestamp after 30 seconds,
it's likely that something went wrong with the lock holder, though it's
not impossible to imagine a scenario in which the lock program that just
succeeded in creating the dir got stopped (^Z) or killed-9 just before
creating the PID file.


Even then, maybe breaking the lock is not such a great idea in
general...

Though mkdir is an operation that forces a synchronization, reading a
file without a filesystem lock isn't.  The rename alleviates that a bit,
but it's not entirely unreasonable for an attempt to read the file to
cache the absence of the file and not notice a creation shortly
afterward.  This would be a lot more of an issue in case of contention
for the lock between different clients of a shared networked filesystem,
though we might imagine highly parallel systems to eventually hit such
issues as well.

But just the possibility of contention across a shared network
filesystem would give me pause, out of realizing that checking for a
local process with the same PID would do no good.  And then, kill -0
would not succeed if the lock holder was started by a different user,
unlike ps.
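
To make the caveat about kill -0 concrete, here is a minimal C sketch of the
same existence check using the POSIX kill() call with signal 0. The function
name process_exists is hypothetical. The sketch treats EPERM as "the process
exists but belongs to someone else", which is the case where a plain
kill -0 test in the script would report failure. Like the shell version, it
can only see local processes, so it does nothing for a lock shared over a
networked filesystem.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if a local process with this PID exists, 0 if it does not,
   and -1 if the answer cannot be determined.  */
int
process_exists (pid_t pid)
{
  /* kill() gives PIDs <= 0 a process-group meaning, so reject them.  */
  if (pid <= 0)
    return -1;

  /* Signal 0 performs the permission and existence checks only;
     no signal is delivered.  */
  if (kill (pid, 0) == 0)
    return 1;

  /* The process exists, but we may not signal it (different user).  */
  if (errno == EPERM)
    return 1;

  /* No such process.  */
  if (errno == ESRCH)
    return 0;

  return -1;
}
```

A caller would read the PID from the lock directory, pass it to
process_exists, and only consider the lock stale when the result is 0.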

What if we printed an error message suggesting the command to clean up,
and then errored out, instead of retrying forever or breaking the lock
and proceeding?  Several programs that rely on lock files (git and svn
come to mind) seem to be taking such an approach these days, presumably
because of all the difficulties in automating the double-checking in all
potential scenarios.

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!        FSF VP & FSF Latin America board member
GNU Toolchain Engineer                        Free Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #42 from Eric Gallager  ---
(In reply to Rich Felker from comment #41)
> > Josef Wolf mentioned that he ran into this on the gcc-help mailing list 
> > here: https://gcc.gnu.org/ml/gcc-help/2019-10/msg00079.html
> 
> I don't think that's an instance of this issue.

Well ok, maybe not THAT message specifically; see the rest of the thread
though.

> It's normal/expected that __builtin_foo compiles to a call to foo in the
> absence of factors that lead to it being optimized to something simpler.
> The idiom of using __builtin_foo to get the compiler to emit an optimized
> implementation of foo for you, to serve as the public definition of foo, is
> simply not valid. That's kinda a shame because it would be nice to be able to
> do it for lots of math library functions, but of course in order for this to be
> able to work gcc would have to promise it can generate code for the operation
> for all targets, which is unlikely to be reasonable.
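
A minimal sketch of the idiom under discussion, assuming a hosted C
environment. The invalid form is kept inside a comment on purpose, since it
may compile to a function that calls itself. The name copy_bytes is
hypothetical and is not something proposed in this PR.

```c
#include <stddef.h>

/* The idiom described as invalid above would look like this:

     void *memcpy (void *d, const void *s, size_t n)
     {
       return __builtin_memcpy (d, s, n);
     }

   Without something that lets the compiler expand the builtin inline,
   the body may become a plain call to memcpy, i.e. to itself.  */

/* A hand-written byte copy under a name of its own.  */
void *
copy_bytes (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  while (n--)
    *d++ = *s++;
  return dst;
}
```

As the summary of this PR suggests, the compiler may still recognise such a
loop and replace it with a call to memcpy. With a distinct function name that
call goes to the library's memcpy and does not recurse into the function
being defined.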

[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread olegendo at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

Oleg Endo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|sh*-*-*                     |

--- Comment #12 from Oleg Endo  ---
(In reply to Rich Felker from comment #8)

> This issue report is specific to target sh*-*-* 

I put that in the target field of the PR initially, but I also wrote in the
description above that this looks like a target independent issue.  So removing
the target field now, to be clear.

I don't think anybody should have to deal with that on any target in the
backend.  It should be handled by earlier stages of compilation, if at all.


The prerequisite for this optimization is that the integer variable type can be
represented by the floating point type of the comparison exactly.  If that's
the case, there should be no need to touch anything in fenv.

If the floating point constant can't be represented exactly as specified by the
code (e.g. comment #3 -- 16777217 becomes 16777216), would that still require
setting the inexact flag in fenv?  (I don't think so)
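
A small C sketch of the prerequisite, assuming float is IEEE binary32 (24
significand bits, FLT_MANT_DIG == 24). The function names are made up for
illustration. The last two functions show the kind of rewrite the PR title
asks for, on a 16-bit integer where every value converts exactly.

```c
#include <float.h>
#include <stdint.h>

/* Every value of an integer type with value_bits value bits converts
   to float exactly when value_bits does not exceed the significand.  */
int
all_values_exact_in_float (int value_bits)
{
  return value_bits <= FLT_MANT_DIG;
}

/* Check one value.  The comparison is done in int64_t so that a float
   rounded up past INT32_MAX does not overflow on the way back.  */
int
round_trips_through_float (int32_t i)
{
  return (int64_t) (float) i == (int64_t) i;
}

/* Comparison done in floating point ...  */
int
lt_via_float (int16_t s)
{
  return (float) s < 3.5f;
}

/* ... and the integer-only equivalent: s < 3.5 holds exactly when
   s <= 3, i.e. s < 4.  */
int
lt_via_int (int16_t s)
{
  return s < 4;
}
```

Under that assumption 16777216 (2^24) survives the round trip, while 16777217
does not, so a 32-bit int does not meet the prerequisite for float but a
16-bit one does.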

Re: [committed] correct strcmp() == 0 result for unknown strings (PR 92157)

2019-10-18 Thread Jeff Law
On 10/18/19 4:27 PM, Martin Sebor wrote:
> The optimization to fold (strcmp() == 0) results involving
> arrays/strings of unequal size/length has a bug where it is
> unprepared for the compute_string_length() function to return
> an invalid length as an indication that the length is unknown.
> This leads to some strings that are unequal being considered
> equal.
> 
> The attached patch corrects this handling.  I have committed
> it in r277194 with Jeff's okay.
And just for the record, this was caught by the Fedora tester I spoke
about at Cauldron.  I'm spinning the entire distro against our weekly
snapshots again.

Most of the issues have been package level problems (missing #includes,
narrowing conversions, fortran argument passing, assumptions about the
inliner, etc).   Those are largely under control, leading to...

My focus now is on chasing down the codegen issues and I've contacted a
few folks already with problems in their code.  There'll certainly be
more over time.  I'm starting with the easier to understand problems in
the hopes the tough ones (python & perl interpreters) get fixed along
the way.

The nastiest problem so far is the improved tail call opts from Jakub in
May.  AFAICT they're doing the right thing in the caller which is a good
indication that something is going wrong in the indirect sibling callee
(ugh).

jeff


gcc-8-20191018 is now available

2019-10-18 Thread gccadmin
Snapshot gcc-8-20191018 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/8-20191018/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-8-branch 
revision 277194

You'll find:

 gcc-8-20191018.tar.xz                Complete GCC

  SHA256=d283e08654366645ed19848fb746aec8f6b683c7c4fb772f9a28640972fe8d2b
  SHA1=cc638a14de801ed074037be84eda7f5fd84505bf

Diffs from 8-20191011 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug tree-optimization/92157] [10 Regression] incorrect strcmp() == 0 result for unknown strings

2019-10-18 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92157

Martin Sebor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |10.0

--- Comment #3 from Martin Sebor  ---
Fixed via r277194.

[Bug tree-optimization/83819] [meta-bug] missing strlen optimizations

2019-10-18 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83819
Bug 83819 depends on bug 92157, which changed state.

Bug 92157 Summary: [10 Regression] incorrect strcmp() == 0 result for unknown 
strings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92157

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

[committed] correct strcmp() == 0 result for unknown strings (PR 92157)

2019-10-18 Thread Martin Sebor

The optimization to fold (strcmp() == 0) results involving
arrays/strings of unequal size/length has a bug where it is
unprepared for the compute_string_length() function to return
an invalid length as an indication that the length is unknown.
This leads to some strings that are unequal being considered
equal.

The attached patch corrects this handling.  I have committed
it in r277194 with Jeff's okay.
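
To illustrate the failure mode in isolation, here is a minimal C sketch. It
assumes, as the ChangeLog puts it, that an unknown length is reported as a
negative value; the names below are hypothetical and do not appear in
tree-ssa-strlen.c, and long stands in for HOST_WIDE_INT.

```c
/* Hypothetical sentinel for "length unknown".  */
#define UNKNOWN_LEN (-1L)

/* Buggy pattern: adding 1 for the terminating NUL before checking the
   sentinel turns -1 into 0, which looks like a valid size.  */
long
size_with_nul_buggy (long len)
{
  return len + 1;
}

/* Corrected pattern: keep a negative result negative and only account
   for the NUL when the length is actually known.  */
long
size_with_nul_fixed (long len)
{
  return len < 0 ? len : len + 1;
}

/* A later "is this known?" test of the kind the folding relies on.  */
int
size_is_known (long size)
{
  return size >= 0;
}
```

With the buggy helper an unknown string passes the size_is_known test with
size 0, which is how two unequal strings can end up being treated as equal.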

Martin

PR tree-optimization/92157 - incorrect strcmp() == 0 result for unknown strings

gcc/testsuite/ChangeLog:

	PR tree-optimization/92157
	* gcc.dg/strlenopt-69.c: Disable test failing due to PR 92155.
	* gcc.dg/strlenopt-87.c: New test.

gcc/ChangeLog:

	PR tree-optimization/92157
	* tree-ssa-strlen.c (handle_builtin_string_cmp): Be prepared for
	compute_string_length to return a negative result.


Index: gcc/testsuite/gcc.dg/strlenopt-69.c
===================================================================
--- gcc/testsuite/gcc.dg/strlenopt-69.c	(revision 277156)
+++ gcc/testsuite/gcc.dg/strlenopt-69.c	(working copy)
@@ -66,11 +66,14 @@ void test_empty_string (void)
   b4[2] = '\0';
   A (0 == strcmp (&a4[2], &b4[2]));
 
+#if 0
+  /* The following isn't handled yet due to PR 92155.  */
   clobber (a4, b4);
 
   memset (a4, 0, sizeof a4);
   memset (b4, 0, sizeof b4);
   A (0 == strcmp (a4, b4));
+#endif
 }
 
 /* Verify that comparison of dynamically created strings with unknown
Index: gcc/testsuite/gcc.dg/strlenopt-87.c
===================================================================
--- gcc/testsuite/gcc.dg/strlenopt-87.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/strlenopt-87.c	(working copy)
@@ -0,0 +1,105 @@
+/* PR tree-optimization/92157 - incorrect strcmp() == 0 result for unknown
+   strings​
+   { dg-do run }
+   { dg-options "-O2 -Wall" } */
+
+#include "strlenopt.h"
+
+
+char a2[2], a3[3];
+
+
+static inline __attribute__ ((always_inline)) int
+verify_not_equal (const char *s, const char *t, int x)
+{
+  int n = x < 0 ? strlen (s) : 0 < x ? strlen (t) : strlen (s) + strlen (t);
+
+  if (strcmp (t, s) == 0)
+    abort ();
+
+  return n;
+}
+
+__attribute__ ((noipa)) int test_a2_s (const char *s)
+{
+  return verify_not_equal (a2, s, 0);
+}
+
+__attribute__ ((noipa)) int test_a2_a3 (void)
+{
+  return verify_not_equal (a2, a3, 0);
+}
+
+__attribute__ ((noipa)) int test_a3_a2 (void)
+{
+  return verify_not_equal (a3, a2, 0);
+}
+
+__attribute__ ((noipa)) int test_s_a2 (const char *s)
+{
+  return verify_not_equal (s, a2, 0);
+}
+
+
+__attribute__ ((noipa)) int test_a2_s_1 (const char *s)
+{
+  return verify_not_equal (a2, s, -1);
+}
+
+__attribute__ ((noipa)) int test_a2_a3_1 (void)
+{
+  return verify_not_equal (a2, a3, -1);
+}
+
+__attribute__ ((noipa)) int test_a3_a2_1 (void)
+{
+  return verify_not_equal (a3, a2, -1);
+}
+
+__attribute__ ((noipa)) int test_s_a2_1 (const char *s)
+{
+  return verify_not_equal (s, a2, -1);
+}
+
+
+__attribute__ ((noipa)) int test_a2_s_2 (const char *s)
+{
+  return verify_not_equal (a2, s, +1);
+}
+
+__attribute__ ((noipa)) int test_a2_a3_2 (void)
+{
+  return verify_not_equal (a2, a3, +1);
+}
+
+__attribute__ ((noipa)) int test_a3_a2_2 (void)
+{
+  return verify_not_equal (a3, a2, +1);
+}
+
+__attribute__ ((noipa)) int test_s_a2_2 (const char *s)
+{
+  return verify_not_equal (s, a2, +1);
+}
+
+int main (void)
+{
+  a2[0] = '1';
+  a3[0] = '1';
+  a3[0] = '2';
+
+  test_a2_s ("");
+  test_a2_a3 ();
+  test_a3_a2 ();
+  test_s_a2 ("");
+
+  test_a2_s_1 ("");
+  test_a2_a3_1 ();
+  test_a3_a2_1 ();
+  test_s_a2_1 ("");
+
+  test_a2_s_2 ("");
+  test_a2_a3_2 ();
+  test_a3_a2_2 ();
+  test_s_a2_2 ("");
+}
Index: gcc/tree-ssa-strlen.c
===================================================================
--- gcc/tree-ssa-strlen.c	(revision 277156)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -3842,7 +3842,7 @@ handle_builtin_string_cmp (gimple_stmt_iterator *g
   HOST_WIDE_INT arysiz1 = -1, arysiz2 = -1;
 
   if (idx1)
-    cstlen1 = compute_string_length (idx1) + 1;
+    cstlen1 = compute_string_length (idx1);
   else
     arysiz1 = determine_min_objsize (arg1);
 
@@ -3853,7 +3853,7 @@ handle_builtin_string_cmp (gimple_stmt_iterator *g
 
   /* Repeat for the second argument.  */
   if (idx2)
-cstlen2 = compute_string_length (idx2) + 1;
+cstlen2 = compute_string_length (idx2);
   else
 arysiz2 = determine_min_objsize (arg2);
 
@@ -3860,6 +3860,14 @@ handle_builtin_string_cmp (gimple_stmt_iterator *g
   if (cstlen2 < 0 && arysiz2 < 0)
 return false;
 
+  if (cstlen1 < 0 && cstlen2 < 0)
+return false;
+
+  if (cstlen1 >= 0)
+++cstlen1;
+  if (cstlen2 >= 0)
+++cstlen2;
+
   /* The exact number of characters to compare.  */
   HOST_WIDE_INT cmpsiz = bound < 0 ? cstlen1 < 0 ? cstlen2 : cstlen1 : bound;
   /* The size of the array in which the unknown string is stored.  */


[Bug tree-optimization/92155] strlen(a) not folded after memset(a, 0, sizeof a)

2019-10-18 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92155

--- Comment #4 from Martin Sebor  ---
Author: msebor
Date: Fri Oct 18 22:26:39 2019
New Revision: 277194

URL: https://gcc.gnu.org/viewcvs?rev=277194&root=gcc&view=rev
Log:
PR tree-optimization/92157 - incorrect strcmp() == 0 result for unknown strings

gcc/testsuite/ChangeLog:

PR tree-optimization/92157
* gcc.dg/strlenopt-69.c: Disable test failing due to PR 92155.
* gcc.dg/strlenopt-87.c: New test.

gcc/ChangeLog:

PR tree-optimization/92157
* tree-ssa-strlen.c (handle_builtin_string_cmp): Be prepared for
compute_string_length to return a negative result.

Added:
trunk/gcc/testsuite/gcc.dg/strlenopt-87.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/strlenopt-69.c
trunk/gcc/tree-ssa-strlen.c

[Bug tree-optimization/92157] [10 Regression] incorrect strcmp() == 0 result for unknown strings

2019-10-18 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92157

--- Comment #2 from Martin Sebor  ---
Author: msebor
Date: Fri Oct 18 22:26:39 2019
New Revision: 277194

URL: https://gcc.gnu.org/viewcvs?rev=277194&root=gcc&view=rev
Log:
PR tree-optimization/92157 - incorrect strcmp() == 0 result for unknown strings

gcc/testsuite/ChangeLog:

PR tree-optimization/92157
* gcc.dg/strlenopt-69.c: Disable test failing due to PR 92155.
* gcc.dg/strlenopt-87.c: New test.

gcc/ChangeLog:

PR tree-optimization/92157
* tree-ssa-strlen.c (handle_builtin_string_cmp): Be prepared for
compute_string_length to return a negative result.

Added:
trunk/gcc/testsuite/gcc.dg/strlenopt-87.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/strlenopt-69.c
trunk/gcc/tree-ssa-strlen.c

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #25 from Jakub Jelinek  ---
The define_insn part of a define_insn_and_split needs constraints if it is
meant to match during or after reload; these patterns are just written with
the assumption that they are split before reload.  At least that is my
understanding of them.

[testsuite] Add test for PR91532

2019-10-18 Thread Prathamesh Kulkarni
Hi Richard,
Sorry for not adding the test in PR91532 fix.
Is the attached patch OK to commit ?

Thanks,
Prathamesh
2019-10-18  Prathamesh Kulkarni  

PR tree-optimization/91532
testsuite/
* gcc.target/aarch64/sve/fmla_2.c: Add dg-scan check for deleted store.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
index 5c04bcdb3f5..bebb073d1f8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
@@ -1,4 +1,4 @@
-/* { dg-options "-O3" } */
+/* { dg-options "-O3 -fdump-tree-ifcvt-details" } */
 
 #include 
 
@@ -15,5 +15,6 @@ f (double *restrict a, double *restrict b, double *restrict c,
 }
 }
 
+/* { dg-final { scan-tree-dump-times "Deleted dead store" 1 "ifcvt" } } */
 /* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, 
z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
 /* { dg-final { scan-assembler-not {\tfmad\t} } } */


[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #24 from Segher Boessenkool  ---
A dumb question I'm sure, but I don't see it: if the rest of your
define_insn doesn't need constraints, why would the match_scratch
need some?  (A define_split never uses constraints).

[Bug tree-optimization/92155] strlen(a) not folded after memset(a, 0, sizeof a)

2019-10-18 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92155

--- Comment #3 from Martin Sebor  ---
Actually, the memcpy is transformed to MEM_REF and the strlen pass knows how to
deal with a subset of those (small powers of 2).  What it doesn't know how to
do yet is deal with other sizes like in the test case below:

  extern char a5[5];

  int g (void)
  {
    __builtin_memcpy (a5, (char[sizeof a5]){ }, sizeof a5);
    return __builtin_strlen (a5);
  }

which results in

  D.1933 = {};
  MEM  [(char * {ref-all})] = MEM 
[(char * {ref-all})];
  _1 = __builtin_strlen ();

and that's not folded yet either.  Handling it is among the outstanding
optimizations mentioned in pr92128.

[Bug tree-optimization/92155] strlen(a) not folded after memset(a, 0, sizeof a)

2019-10-18 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92155

--- Comment #2 from Martin Sebor  ---
The inequality (__builtin_strlen (a4) != 0) is folded into (a4[0] != 0) very
early on during Gimplification so the strlen pass never sees it.

What the strlen pass should be able to do is fold strlen(a4) below to zero:

  int g (void)
  {
    __builtin_memset (a4, 0, sizeof a4);
    return __builtin_strlen (a4);   // not folded
  }

But handle_builtin_memset doesn't have the logic.  The memcpy handler does,
so when the above is changed to the below it works:

  int g (void)
  {
    __builtin_memcpy (a4, (char[sizeof a4]){ }, sizeof a4);
    return __builtin_strlen (a4);   // folded to zero
  }
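
For reference, both forms store the same bytes, so a correct fold has to
produce zero in either case.  A minimal sketch of that equivalence follows;
the function names are made up for illustration, and `{ 0 }` is used instead
of the empty initializer above only so that pre-C23 compilers accept it:

```c
#include <string.h>

char a4[4];

/* Zero a4 with memset, then measure it.  */
size_t len_after_memset (void)
{
  memset (a4, 0, sizeof a4);
  return strlen (a4);
}

/* Zero a4 by copying from a zero-initialized compound literal.  */
size_t len_after_memcpy (void)
{
  memcpy (a4, (char[sizeof a4]){ 0 }, sizeof a4);
  return strlen (a4);
}
```

Whether the compiler folds either call at compile time is a separate
question; the sketch only shows what the folded value must be.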

[Bug tree-optimization/92155] strlen(a) not folded after memset(a, 0, sizeof a)

2019-10-18 Thread prathamesh3492 at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92155

prathamesh3492 at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |prathamesh3492 at gcc dot gnu.org

--- Comment #1 from prathamesh3492 at gcc dot gnu.org ---
Hi Martin,
Just wondering whether the 3rd arg really needs to be sizeof?
IIUC memset (a, 0, n) for any valid n should result in strlen (a) equal to 0?

Btw, it seems the comparison is folded to 0 in the following case:

extern char a4[4];

void g ()
{
  __builtin_memset (a4, 0, sizeof a4);
  if (__builtin_strlen (a4) != 0)
    __builtin_abort ();
}

The .optimized dump shows only the call to memset.
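
A small sketch of the point about n, assuming n is at least 1 and no larger
than the array (the buffer and function name are hypothetical, not taken
from any test case):

```c
#include <string.h>

char buf4[4];

/* Return 1 if memset (buf4, 0, n) leaves strlen (buf4) == 0 for every
   n in [1, sizeof buf4], starting each time from a non-empty string.  */
int prefix_memset_empties_string (void)
{
  for (size_t n = 1; n <= sizeof buf4; n++)
    {
      memcpy (buf4, "abc", sizeof buf4);   /* 3 chars + NUL = 4 bytes */
      memset (buf4, 0, n);
      if (strlen (buf4) != 0)
        return 0;
    }
  return 1;
}
```

Only the first byte matters for strlen, so any nonzero n gives the same
result as sizeof.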

Thanks,
Prathamesh

[Bug tree-optimization/92157] [10 Regression] incorrect strcmp() == 0 result for unknown strings

2019-10-18 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92157

Martin Sebor  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Keywords|                            |wrong-code
   Last reconfirmed|                            |2019-10-18
           Assignee|unassigned at gcc dot gnu.org  |msebor at gcc dot gnu.org
             Blocks|                            |83819
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92155
     Ever confirmed|0                           |1
            Summary|incorrect strcmp() == 0     |[10 Regression] incorrect
                   |result for unknown strings  |strcmp() == 0 result for
                   |                            |unknown strings

--- Comment #1 from Martin Sebor  ---
The test for the optimization, strlenopt-69.c, actually passes because of the
bug, and also because of the missing memset optimization reported in pr92155. 
The test_empty_string() function calls memset to zero out two arrays, a and b,
and expects strcmp(a, b) == 0 to be folded to true.  It is folded, but not
because the strlen pass knows the strings are the same (the missing
optimization), but because of this bug.
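
The shape of the problem, as far as the committed change to
handle_builtin_string_cmp shows, is a "length unknown" sentinel that gets
incremented before it is tested.  The sketch below is not GCC code: the
helper standing in for compute_string_length and its `known` flag are
hypothetical, and only the before/after arithmetic mirrors the patch.

```c
#include <string.h>

/* Hypothetical stand-in for compute_string_length: returns -1 when the
   length is not known.  */
static long length_or_unknown (const char *s, int known)
{
  return known ? (long) strlen (s) : -1;
}

/* Before the fix: the +1 for the terminating NUL turns the -1 sentinel
   into 0, which later "< 0" checks no longer recognize as unknown.  */
long size_with_nul_buggy (const char *s, int known)
{
  return length_or_unknown (s, known) + 1;
}

/* After the fix: add the terminator only once the length is known.  */
long size_with_nul_fixed (const char *s, int known)
{
  long len = length_or_unknown (s, known);
  if (len >= 0)
    ++len;
  return len;
}
```

With an unknown string the buggy variant reports a size of 0, which reads
as an empty string rather than as "no information".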


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83819
[Bug 83819] [meta-bug] missing strlen optimizations

[Bug tree-optimization/92157] New: incorrect strcmp() == 0 result for unknown strings

2019-10-18 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92157

Bug ID: 92157
   Summary: incorrect strcmp() == 0 result for unknown strings
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msebor at gcc dot gnu.org
  Target Milestone: ---

The strcmp optimization checked in r276773 doesn't handle the following case
correctly, causing regressions in the arj package.

$ cat a.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout a.c
char a[3];

__attribute__ ((noipa)) int f (const char *s)
{
  int n = __builtin_strlen (s);

  if (__builtin_strcmp (s, a) == 0)
    __builtin_abort ();

  return n;
}

int main (void)
{
  a[0] = '1';
  f ("");
}

;; Function f (f, funcdef_no=0, decl_uid=1931, cgraph_uid=1, symbol_order=1)

__attribute__((noipa, noinline, noclone, no_icf))
f (const char * s)
{
   [count: 0]:
  __builtin_abort ();

}



;; Function main (main, funcdef_no=1, decl_uid=1935, cgraph_uid=2,
symbol_order=2) (executed once)

main ()
{
   [local count: 1073741824]:
  a[0] = 49;
  f ("");
  return 0;

}

[PATCH 14/29] [arm] Early split simple DImode equality comparisons

2019-10-18 Thread Richard Earnshaw

This is the first step of early splitting all the DImode comparison
operations.  We start by factoring the DImode handling out of
arm_gen_compare_reg into its own function.

Simple DImode equality comparisons (such as equality with zero, or
equality with a constant that is zero in one of the two word values
that it comprises) can be done using a single subtract followed by an
ORRS instruction.  This avoids the need for conditional execution.

For example, (r0 != 5) can be written as

SUB     Rt, R0, #5
ORRS    Rt, Rt, R1

The ORRS is now expanded using an SImode pattern that already exists
in the MD file and this gives the register allocator more freedom to
select registers (consecutive pairs are no longer required).
Furthermore, we can then delete the arm_cmpdi_zero pattern as it is
no longer required.  We use SUB for the value adjustment as this has a
generally more flexible range of immediates than XOR and what's more
has the opportunity to be relaxed in thumb2 to a 16-bit SUBS
instruction.
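
The arithmetic behind the SUB/ORRS pair can be sketched in C.  This is an
illustration only, assuming the low word lives in R0 and the high word in R1
as in the example above; the function name is made up:

```c
#include <stdint.h>

/* Test x != c for a 64-bit x and a constant c whose high word is zero.
   (lo - c) is zero exactly when the low words match, so OR-ing it with
   the high word gives zero exactly when x == c.  */
int di_ne_low_const (uint64_t x, uint32_t c)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  uint32_t t = lo - c;          /* SUB  Rt, R0, #c  */
  return (t | hi) != 0;         /* ORRS Rt, Rt, R1; Z clear means "not equal"  */
}
```

The same idea applies with the roles of the words swapped when it is the
low word of the constant that is zero.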

* config/arm/arm.c (arm_select_cc_mode): For DImode equality tests
return CC_Zmode if comparing against a constant where one word is
zero.
(arm_gen_compare_reg): Split DImode handling to ...
(arm_gen_dicompare_reg): ... here.  Handle equality comparisons
against simple constants.
* config/arm/arm.md (arm_cmpdi_zero): Delete pattern.
---
 gcc/config/arm/arm.c  | 87 +--
 gcc/config/arm/arm.md | 11 --
 2 files changed, 68 insertions(+), 30 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e33b6b14d28..64367b42332 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15350,8 +15350,14 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
 	case EQ:
 	case NE:
 	  /* A DImode comparison against zero can be implemented by
-	 or'ing the two halves together.  */
-	  if (y == const0_rtx)
+	 or'ing the two halves together.  We can also handle
+	 immediates where one word of that value is zero by
+	 subtracting the non-zero word from the corresponding word
+	 in the other register and then ORRing it with the other
+	 word.  */
+	  if (CONST_INT_P (y)
+	  && ((UINTVAL (y) & 0xffffffff) == 0
+		  || (UINTVAL (y) >> 32) == 0))
 	return CC_Zmode;
 
 	  /* We can do an equality test in three Thumb instructions.  */
@@ -15393,37 +15399,64 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   return CCmode;
 }
 
-/* X and Y are two things to compare using CODE.  Emit the compare insn and
-   return the rtx for register 0 in the proper mode.  FP means this is a
-   floating point compare: I don't think that it is needed on the arm.  */
-rtx
-arm_gen_compare_reg (enum rtx_code code, rtx x, rtx y, rtx scratch)
+/* X and Y are two (DImode) things to compare for the condition CODE.  Emit
+   the sequence of instructions needed to generate a suitable condition
+   code register.  Return the CC register result.  */
+static rtx
+arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx scratch)
 {
-  machine_mode mode;
-  rtx cc_reg;
-  int dimode_comparison = GET_MODE (x) == DImode || GET_MODE (y) == DImode;
+  /* We don't currently handle DImode in thumb1, but rely on libgcc.  */
+  gcc_assert (TARGET_32BIT);
 
   /* We might have X as a constant, Y as a register because of the predicates
  used for cmpdi.  If so, force X to a register here.  */
-  if (dimode_comparison && !REG_P (x))
+  if (!REG_P (x))
 x = force_reg (DImode, x);
 
-  mode = SELECT_CC_MODE (code, x, y);
-  cc_reg = gen_rtx_REG (mode, CC_REGNUM);
+  machine_mode mode = SELECT_CC_MODE (code, x, y);
+  rtx cc_reg = gen_rtx_REG (mode, CC_REGNUM);
 
-  if (dimode_comparison
-  && mode != CC_CZmode)
+  if (mode != CC_CZmode)
 {
   rtx clobber, set;
 
   /* To compare two non-zero values for equality, XOR them and
 	 then compare against zero.  Not used for ARM mode; there
 	 CC_CZmode is cheaper.  */
-  if (mode == CC_Zmode && y != const0_rtx)
+  if (mode == CC_Zmode)
 	{
-	  gcc_assert (!reload_completed);
-	  x = expand_binop (DImode, xor_optab, x, y, NULL_RTX, 0, OPTAB_WIDEN);
-	  y = const0_rtx;
+	  mode = CC_NOOVmode;
+	  PUT_MODE (cc_reg, mode);
+	  if (y != const0_rtx)
+	{
+	  gcc_assert (CONST_INT_P (y));
+	  rtx xlo, xhi, ylo, yhi;
+	  arm_decompose_di_binop (x, y, &xlo, &xhi, &ylo, &yhi);
+	  if (!scratch)
+		scratch = gen_reg_rtx (SImode);
+	  if (ylo == const0_rtx)
+		{
+		  yhi = GEN_INT (-INTVAL(yhi));
+		  if (!arm_add_operand (yhi, SImode))
+		yhi = force_reg (SImode, yhi);
+		  emit_insn (gen_addsi3 (scratch, xhi, yhi));
+		  y = xlo;
+		}
+	  else
+		{
+		  gcc_assert (yhi == const0_rtx);
+		  ylo = GEN_INT (-INTVAL(ylo));
+		  if (!arm_add_operand (ylo, SImode))
+		ylo = force_reg (SImode, ylo);
+		  emit_insn (gen_addsi3 (scratch, xlo, ylo));
+		  y = xhi;
+		}
+	  x = gen_rtx_IOR (SImode, scratch, y);
+	  y = 

[PATCH 29/29] [arm] Fix testsuite nit when compiling for thumb2

2019-10-18 Thread Richard Earnshaw

In thumb2 we now generate a NEGS instruction rather than RSBS, so this
test needs updating.
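
For context, the two-instruction sequence the test expects computes the
64-bit negation of a zero-extended 32-bit value word by word.  A sketch in
C, with a made-up function name and the result returned as an unsigned
64-bit pattern to avoid implementation-defined conversions:

```c
#include <stdint.h>

/* Negate a zero-extended 32-bit value using two 32-bit operations.  */
uint64_t neg_zext_two_words (uint32_t x)
{
  /* rsbs r0, r0, #0 (or negs r0, r0): low word is 0 - x; a borrow
     occurs whenever x is nonzero.  */
  uint32_t lo = 0u - x;
  /* sbc r1, r1, r1: r1 - r1 - borrow, i.e. all ones if there was a
     borrow, otherwise zero.  */
  uint32_t hi = (x != 0) ? 0xffffffffu : 0u;
  return ((uint64_t) hi << 32) | lo;
}
```

Both RSBS with a zero immediate and NEGS produce the same low word and
flags, which is why the test can accept either mnemonic.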

* gcc.target/arm/negdi-3.c: Update expected output to allow NEGS.
---
 gcc/testsuite/gcc.target/arm/negdi-3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/negdi-3.c b/gcc/testsuite/gcc.target/arm/negdi-3.c
index 76ddf49fc0d..1520e9c65df 100644
--- a/gcc/testsuite/gcc.target/arm/negdi-3.c
+++ b/gcc/testsuite/gcc.target/arm/negdi-3.c
@@ -8,10 +8,10 @@ signed long long negdi_zero_extendsidi (unsigned int x)
 }
 /*
 Expected output:
-rsbs	r0, r0, #0
+rsbs	r0, r0, #0 (arm) | negs	r0, r0 (thumb2)
 sbc r1, r1, r1
 */
-/* { dg-final { scan-assembler-times "rsb" 1 } } */
+/* { dg-final { scan-assembler-times "rsbs|negs" 1 } } */
 /* { dg-final { scan-assembler-times "sbc" 1 } } */
 /* { dg-final { scan-assembler-times "mov" 0 } } */
 /* { dg-final { scan-assembler-times "rsc" 0 } } */


[PATCH 07/29] [arm] Remove redundant DImode subtract patterns

2019-10-18 Thread Richard Earnshaw

Now that we early split DImode subtracts, the patterns to emit the
original and to match zero-extend with subtraction or negation are
no longer useful.

* config/arm/arm.md (arm_subdi3): Delete insn.
(zextendsidi_negsi, negdi_extendsidi): Delete insn_and_split.
---
 gcc/config/arm/arm.md | 102 --
 1 file changed, 102 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 99d931525f8..f597a277c17 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1161,18 +1161,6 @@ (define_expand "subdi3"
   "
 )
 
-(define_insn "*arm_subdi3"
-  [(set (match_operand:DI 0 "arm_general_register_operand" "=&r,&r,&r")
-	(minus:DI (match_operand:DI 1 "arm_general_register_operand" "0,r,0")
-		  (match_operand:DI 2 "arm_general_register_operand" "r,0,0")))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "subs\\t%Q0, %Q1, %Q2\;sbc\\t%R0, %R1, %R2"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "subsi3"
   [(set (match_operand:SI   0 "s_register_operand")
 	(minus:SI (match_operand:SI 1 "reg_or_int_operand")
@@ -3866,96 +3854,6 @@ (define_expand "negdf2"
   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE"
   "")
 
-(define_insn_and_split "*zextendsidi_negsi"
-  [(set (match_operand:DI 0 "s_register_operand" "=r")
-(zero_extend:DI (neg:SI (match_operand:SI 1 "s_register_operand" "r"))))]
-   "TARGET_32BIT"
-   "#"
-   ""
-   [(set (match_dup 2)
- (neg:SI (match_dup 1)))
-(set (match_dup 3)
- (const_int 0))]
-   {
-  operands[2] = gen_lowpart (SImode, operands[0]);
-  operands[3] = gen_highpart (SImode, operands[0]);
-   }
- [(set_attr "length" "8")
-  (set_attr "type" "multiple")]
-)
-
-;; Negate an extended 32-bit value.
-(define_insn_and_split "*negdi_extendsidi"
-  [(set (match_operand:DI 0 "s_register_operand" "=l,r")
-	(neg:DI (sign_extend:DI
-		 (match_operand:SI 1 "s_register_operand" "l,r"))))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-  {
-rtx low = gen_lowpart (SImode, operands[0]);
-rtx high = gen_highpart (SImode, operands[0]);
-
-if (reg_overlap_mentioned_p (low, operands[1]))
-  {
-	/* Input overlaps the low word of the output.  Use:
-		asr	Rhi, Rin, #31
-		rsbs	Rlo, Rin, #0
-		rsc	Rhi, Rhi, #0 (thumb2: sbc Rhi, Rhi, Rhi, lsl #1).  */
-	rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
-
-	emit_insn (gen_rtx_SET (high,
-gen_rtx_ASHIFTRT (SImode, operands[1],
-		  GEN_INT (31))));
-
-	emit_insn (gen_subsi3_compare (low, const0_rtx, operands[1]));
-	if (TARGET_ARM)
-	  emit_insn (gen_rtx_SET (high,
-  gen_rtx_MINUS (SImode,
-		 gen_rtx_MINUS (SImode,
-const0_rtx,
-high),
-		 gen_rtx_LTU (SImode,
-			  cc_reg,
-			  const0_rtx))));
-	else
-	  {
-	rtx two_x = gen_rtx_ASHIFT (SImode, high, GEN_INT (1));
-	emit_insn (gen_rtx_SET (high,
-gen_rtx_MINUS (SImode,
-		   gen_rtx_MINUS (SImode,
-  high,
-  two_x),
-		   gen_rtx_LTU (SImode,
-cc_reg,
-const0_rtx))));
-	  }
-  }
-else
-  {
-	/* No overlap, or overlap on high word.  Use:
-		rsb	Rlo, Rin, #0
-		bic	Rhi, Rlo, Rin
-		asr	Rhi, Rhi, #31
-	   Flags not needed for this sequence.  */
-	emit_insn (gen_rtx_SET (low, gen_rtx_NEG (SImode, operands[1])));
-	emit_insn (gen_rtx_SET (high,
-gen_rtx_AND (SImode,
-	 gen_rtx_NOT (SImode, operands[1]),
-	 low)));
-	emit_insn (gen_rtx_SET (high,
-gen_rtx_ASHIFTRT (SImode, high,
-		  GEN_INT (31))));
-  }
-DONE;
-  }
-  [(set_attr "length" "12")
-   (set_attr "arch" "t2,*")
-   (set_attr "type" "multiple")]
-)
-
 ;; abssi2 doesn't really clobber the condition codes if a different register
 ;; is being set.  To keep things simple, assume during rtl manipulations that
 ;; it does, but tell the final scan operator the truth.  Similarly for


[PATCH 21/29] [arm] Improve code generation for addvsi4.

2019-10-18 Thread Richard Earnshaw

Similar to the improvements for uaddvsi4, this patch improves the code
generation for addvsi4 to handle immediates and to add alternatives
that better target thumb2.  To do this we separate out the expansion
of addvsi4 from that of addvdi4 and then add an additional pattern
to handle constants.  Also, while doing this I've fixed the incorrect
usage of NE instead of COMPARE in the generated RTL.
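
The RTL for the overflow check compares the sum of the sign-extended
operands with the sign-extended narrow sum.  The same test can be written
in C as a rough illustration; the function names are invented here, the
narrowing cast relies on GCC's modulo behaviour for out-of-range
conversions, and __builtin_add_overflow is the GCC/Clang built-in that
typically reaches the addv expanders:

```c
#include <stdint.h>

/* Signed overflow occurred iff the exact 64-bit sum differs from the
   wrapped 32-bit sum after sign extension.  */
int add_overflows_widened (int32_t a, int32_t b)
{
  int64_t wide = (int64_t) a + (int64_t) b;
  /* Wrapping add done in unsigned arithmetic; the conversion back to
     int32_t is implementation-defined (modulo 2^32 with GCC).  */
  int32_t narrow = (int32_t) ((uint32_t) a + (uint32_t) b);
  return wide != (int64_t) narrow;
}

/* The same check via the compiler built-in.  */
int add_overflows_builtin (int32_t a, int32_t b)
{
  int32_t result;
  return __builtin_add_overflow (a, b, &result);
}
```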

* config/arm/arm.md (addv4): Delete.
(addvsi4): New pattern.  Handle immediate values that the architecture
supports.
(addvdi4): New pattern.
(addsi3_compareV): Rename to ...
(addsi3_compareV_reg): ... this.  Add constraints for thumb2 variants
and use COMPARE rather than NE.
(addsi3_compareV_imm): New pattern.
* config/arm/arm.c (arm_select_cc_mode): Return CC_Vmode for
a signed-overflow check.
---
 gcc/config/arm/arm.c  |  8 ++
 gcc/config/arm/arm.md | 63 ---
 2 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index eebbdc3d9c2..638c82df25f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15411,6 +15411,14 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
 	  || arm_borrow_operation (y, DImode)))
 return CC_Bmode;
 
+  if (GET_MODE (x) == DImode
+  && (op == EQ || op == NE)
+  && GET_CODE (x) == PLUS
+  && GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+  && GET_CODE (y) == SIGN_EXTEND
+  && GET_CODE (XEXP (y, 0)) == PLUS)
+return CC_Vmode;
+
   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_CC)
 return GET_MODE (x);
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 9f0e43571fd..b5214c79c35 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -488,14 +488,30 @@ (define_expand "adddi3"
   "
 )
 
-(define_expand "addv4"
-  [(match_operand:SIDI 0 "register_operand")
-   (match_operand:SIDI 1 "register_operand")
-   (match_operand:SIDI 2 "register_operand")
+(define_expand "addvsi4"
+  [(match_operand:SI 0 "s_register_operand")
+   (match_operand:SI 1 "s_register_operand")
+   (match_operand:SI 2 "arm_add_operand")
(match_operand 3 "")]
   "TARGET_32BIT"
 {
-  emit_insn (gen_add3_compareV (operands[0], operands[1], operands[2]));
+  if (CONST_INT_P (operands[2]))
+emit_insn (gen_addsi3_compareV_imm (operands[0], operands[1], operands[2]));
+  else
+emit_insn (gen_addsi3_compareV_reg (operands[0], operands[1], operands[2]));
+  arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
+
+  DONE;
+})
+
+(define_expand "addvdi4"
+  [(match_operand:DI 0 "register_operand")
+   (match_operand:DI 1 "register_operand")
+   (match_operand:DI 2 "register_operand")
+   (match_operand 3 "")]
+  "TARGET_32BIT"
+{
+  emit_insn (gen_adddi3_compareV (operands[0], operands[1], operands[2]));
   arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
 
   DONE;
@@ -770,21 +786,48 @@ (define_insn "adddi3_compareV"
(set_attr "type" "multiple")]
 )
 
-(define_insn "addsi3_compareV"
+(define_insn "addsi3_compareV_reg"
   [(set (reg:CC_V CC_REGNUM)
-	(ne:CC_V
+	(compare:CC_V
 	  (plus:DI
-	(sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
-	(sign_extend:DI (match_operand:SI 2 "register_operand" "r")))
+	(sign_extend:DI (match_operand:SI 1 "register_operand" "%l,0,r"))
+	(sign_extend:DI (match_operand:SI 2 "register_operand" "l,r,r")))
 	  (sign_extend:DI (plus:SI (match_dup 1) (match_dup 2)))))
-   (set (match_operand:SI 0 "register_operand" "=r")
+   (set (match_operand:SI 0 "register_operand" "=l,r,r")
 	(plus:SI (match_dup 1) (match_dup 2)))]
   "TARGET_32BIT"
   "adds%?\\t%0, %1, %2"
   [(set_attr "conds" "set")
+   (set_attr "arch" "t2,t2,*")
+   (set_attr "length" "2,2,4")
(set_attr "type" "alus_sreg")]
 )
 
+(define_insn "addsi3_compareV_imm"
+  [(set (reg:CC_V CC_REGNUM)
+	(compare:CC_V
+	  (plus:DI
+	(sign_extend:DI
+	 (match_operand:SI 1 "register_operand" "l,0,l,0,r,r"))
+	(match_operand 2 "arm_addimm_operand" "Pd,Py,Px,Pw,I,L"))
+	  (sign_extend:DI (plus:SI (match_dup 1) (match_dup 2)))))
+   (set (match_operand:SI 0 "register_operand" "=l,l,l,l,r,r")
+	(plus:SI (match_dup 1) (match_dup 2)))]
+  "TARGET_32BIT
+   && INTVAL (operands[2]) == ARM_SIGN_EXTEND (INTVAL (operands[2]))"
+  "@
+   adds%?\\t%0, %1, %2
+   adds%?\\t%0, %0, %2
+   subs%?\\t%0, %1, #%n2
+   subs%?\\t%0, %0, #%n2
+   adds%?\\t%0, %1, %2
+   subs%?\\t%0, %1, #%n2"
+  [(set_attr "conds" "set")
+   (set_attr "arch" "t2,t2,t2,t2,*,*")
+   (set_attr "length" "2,2,2,2,4,4")
+   (set_attr "type" "alus_imm")]
+)
+
 (define_insn "addsi3_compare0"
   [(set (reg:CC_NOOV CC_REGNUM)
 	(compare:CC_NOOV


[PATCH 06/29] [arm] Early split subdi3

2019-10-18 Thread Richard Earnshaw

This patch adds early splitting of subdi3 so that the individual
operations can be seen by the optimizers, particularly combine.  This
should allow us to do at least as good a job as previously, but with
far fewer patterns in the machine description.

This is just the initial patch to add the early splitting.  The
cleanups will follow later.

A special trick is used to handle the 'reverse subtract and compare'
where a register is subtracted from a constant.  The natural
comparison

(COMPARE (const) (reg))

is not canonical in this case and combine will never correctly
generate it (it tries to swap the order of the operands).  To handle this
we write the comparison as

(COMPARE (NOT (reg)) (~const)),

which has the same result for EQ, NE, LTU, LEU, GTU and GEU, which are
all the cases we are really interested in here.
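
The reason this works is that bitwise NOT reverses unsigned order
(~x equals UINT32_MAX - x), so swapping the operands and complementing both
leaves every unsigned relation intact.  A small C check of that claim, with
a made-up function name:

```c
#include <stdint.h>

/* Return 1 if comparing (c, r) and comparing (~r, ~c) agree for
   EQ, NE, LTU, LEU, GTU and GEU.  */
int rsb_compare_forms_agree (uint32_t c, uint32_t r)
{
  uint32_t nr = ~r;
  uint32_t nc = ~c;
  return (c == r) == (nr == nc)
	 && (c != r) == (nr != nc)
	 && (c <  r) == (nr <  nc)
	 && (c <= r) == (nr <= nc)
	 && (c >  r) == (nr >  nc)
	 && (c >= r) == (nr >= nc);
}
```

Signed relations are not covered by this sketch, matching the list of
condition codes given above.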

Finally, we delete the negdi2 pattern.  The generic expanders will use
our new subdi3 expander if this pattern is missing and that can handle
the negate case just fine.

* config/arm/arm-modes.def (CC_RSB): New CC mode.
* config/arm/predicates.md (arm_borrow_operation): Handle CC_RSBmode.
* config/arm/arm.c (arm_select_cc_mode): Detect when we should
return CC_RSBmode.
(maybe_get_arm_condition_code): Handle CC_RSBmode.
* config/arm/arm.md (subsi3_carryin): Make this pattern available to
expand.
(subdi3): Rewrite to early-expand the sub-operations.
(rsb_im_compare): New pattern.
(negdi2): Delete.
(negdi2_insn): Delete.
(arm_negsi2): Correct type attribute to alu_imm.
(negsi2_0compare): New insn pattern.
(negsi2_carryin): New insn pattern.
---
 gcc/config/arm/arm-modes.def |   4 +
 gcc/config/arm/arm.c |  23 ++
 gcc/config/arm/arm.md| 141 ---
 gcc/config/arm/predicates.md |   2 +-
 4 files changed, 141 insertions(+), 29 deletions(-)

diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index 8f131c369b5..4fa7f1b43e5 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -36,6 +36,9 @@ ADJUST_FLOAT_FORMAT (HF, ((arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
CC_Nmode should be used if only the N (sign) flag is set correctly
CC_CZmode should be used if only the C and Z flags are correct
(used for DImode unsigned comparisons).
+   CC_RSBmode should be used where the comparison is set by an RSB immediate,
+ or NEG instruction.  The form of the comparison for (const - reg) will
+ be (COMPARE (not (reg)) (~const)).
CC_NCVmode should be used if only the N, C, and V flags are correct
(used for DImode signed comparisons).
CCmode should be used otherwise.  */
@@ -45,6 +48,7 @@ CC_MODE (CC_Z);
 CC_MODE (CC_CZ);
 CC_MODE (CC_NCV);
 CC_MODE (CC_SWP);
+CC_MODE (CC_RSB);
 CC_MODE (CCFP);
 CC_MODE (CCFPE);
 CC_MODE (CC_DNE);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index db18651346f..9a779e24cac 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15214,6 +15214,17 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
 	  || (TARGET_32BIT && GET_CODE (x) == ZERO_EXTRACT)))
 return CC_NOOVmode;
 
+  /* An unsigned comparison of ~reg with a const is really a special
+ canonicalization of compare (~const, reg), which is a reverse
+ subtract operation.  We may not get here if CONST is 0, but that
+ doesn't matter because ~0 isn't a valid immediate for RSB.  */
+  if (GET_MODE (x) == SImode
+  && GET_CODE (x) == NOT
+  && CONST_INT_P (y)
+  && (op == EQ || op == NE
+	  || op == LTU || op == LEU || op == GEU || op == GTU))
+return CC_RSBmode;
+
   if (GET_MODE (x) == QImode && (op == EQ || op == NE))
 return CC_Zmode;
 
@@ -23629,6 +23640,18 @@ maybe_get_arm_condition_code (rtx comparison)
 	default: return ARM_NV;
 	}
 
+case E_CC_RSBmode:
+  switch (comp_code)
+	{
+	case NE: return ARM_NE;
+	case EQ: return ARM_EQ;
+	case GEU: return ARM_CS;
+	case GTU: return ARM_HI;
+	case LEU: return ARM_LS;
+	case LTU: return ARM_CC;
+	default: return ARM_NV;
+	}
+
 case E_CCmode:
   switch (comp_code)
 	{
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index fbe154a9873..99d931525f8 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -989,7 +989,7 @@ (define_insn "subsi3_compare1"
(set_attr "type" "alus_sreg")]
 )
 
-(define_insn "*subsi3_carryin"
+(define_insn "subsi3_carryin"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r,r")
 	(minus:SI (minus:SI (match_operand:SI 1 "reg_or_int_operand" "r,I,Pz")
 			(match_operand:SI 2 "s_register_operand" "r,r,r"))
@@ -1094,12 +1094,72 @@ (define_expand "adddf3"
 (define_expand "subdi3"
  [(parallel
[(set (match_operand:DI0 "s_register_operand")
-	  (minus:DI (match_operand:DI 1 "s_register_operand")
+	  (minus:DI (match_operand:DI 1 "reg_or_int_operand")
 		(match_operand:DI 2 "s_register_operand")))
   

[PATCH 19/29] [arm] Handle immediate values in uaddvsi4

2019-10-18 Thread Richard Earnshaw

The uaddv patterns in the arm back-end do not currently handle immediates
during expansion.  This patch adds this support for uaddvsi4.  It's really
a stepping-stone towards early expansion of uaddvdi4, but it is complete and
a useful change in its own right.

Whilst making this change I also observed that we really had two patterns
that did exactly the same thing, but with slightly different properties;
consequently I've cleaned up all of the add-and-compare patterns to bring
some consistency.
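
For readers less familiar with the CC_C patterns: addsi3_compare_op1
compares the sum against the first operand, and the expander branches on
LTU.  In C terms the carry test looks roughly like this (function names are
illustrative only; __builtin_add_overflow is the GCC/Clang built-in):

```c
#include <stdint.h>

/* Unsigned add carries iff the wrapped sum is below the first operand.  */
int uadd_carries_compare (uint32_t a, uint32_t b)
{
  uint32_t sum = (uint32_t) (a + b);   /* wraps modulo 2^32 */
  return sum < a;                      /* LTU on (a + b) versus a */
}

/* The same check via the compiler built-in.  */
int uadd_carries_builtin (uint32_t a, uint32_t b)
{
  uint32_t sum;
  return __builtin_add_overflow (a, b, &sum);
}
```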

* config/arm/arm.md (adddi3): Call gen_addsi3_compare_op1.
* (uaddv4): Delete expansion pattern.
(uaddvsi4): New pattern.
(uaddvdi4): Likewise.
(addsi3_compareC): Delete pattern, change callers to use
addsi3_compare_op1.
(addsi3_compare_op1): No-longer anonymous.  Clean up constraints to
reduce the number of alternatives and re-work type attribute handling.
(addsi3_compare_op2): Clean up constraints to reduce the number of
alternatives and re-work type attribute handling.
(compare_addsi2_op0): Likewise.
(compare_addsi2_op1): Likewise.
---
 gcc/config/arm/arm.md | 118 ++
 1 file changed, 62 insertions(+), 56 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index eaadfd64128..4ea6f4b226c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -470,7 +470,7 @@ (define_expand "adddi3"
 	  if (!arm_not_operand (hi_op2, SImode))
 	hi_op2 = force_reg (SImode, hi_op2);
 
-	  emit_insn (gen_addsi3_compareC (lo_dest, lo_op1, lo_op2));
+	  emit_insn (gen_addsi3_compare_op1 (lo_dest, lo_op1, lo_op2));
 	  rtx carry = gen_rtx_LTU (SImode, gen_rtx_REG (CC_Cmode, CC_REGNUM),
    const0_rtx);
 	  if (hi_op2 == const0_rtx)
@@ -501,14 +501,27 @@ (define_expand "addv4"
   DONE;
 })
 
-(define_expand "uaddv4"
-  [(match_operand:SIDI 0 "register_operand")
-   (match_operand:SIDI 1 "register_operand")
-   (match_operand:SIDI 2 "register_operand")
+(define_expand "uaddvsi4"
+  [(match_operand:SI 0 "s_register_operand")
+   (match_operand:SI 1 "s_register_operand")
+   (match_operand:SI 2 "arm_add_operand")
(match_operand 3 "")]
   "TARGET_32BIT"
 {
-  emit_insn (gen_add3_compareC (operands[0], operands[1], operands[2]));
+  emit_insn (gen_addsi3_compare_op1 (operands[0], operands[1], operands[2]));
+  arm_gen_unlikely_cbranch (LTU, CC_Cmode, operands[3]);
+
+  DONE;
+})
+
+(define_expand "uaddvdi4"
+  [(match_operand:DI 0 "s_register_operand")
+   (match_operand:DI 1 "s_register_operand")
+   (match_operand:DI 2 "s_register_operand")
+   (match_operand 3 "")]
+  "TARGET_32BIT"
+{
+  emit_insn (gen_adddi3_compareC (operands[0], operands[1], operands[2]));
   arm_gen_unlikely_cbranch (LTU, CC_Cmode, operands[3]);
 
   DONE;
@@ -639,19 +652,6 @@ (define_insn "adddi3_compareC"
(set_attr "type" "multiple")]
 )
 
-(define_insn "addsi3_compareC"
-   [(set (reg:CC_C CC_REGNUM)
-	 (compare:CC_C (plus:SI (match_operand:SI 1 "register_operand" "r")
-(match_operand:SI 2 "register_operand" "r"))
-		   (match_dup 1)))
-(set (match_operand:SI 0 "register_operand" "=r")
-	 (plus:SI (match_dup 1) (match_dup 2)))]
-  "TARGET_32BIT"
-  "adds%?\\t%0, %1, %2"
-  [(set_attr "conds" "set")
-   (set_attr "type" "alus_sreg")]
-)
-
 (define_insn "addsi3_compare0"
   [(set (reg:CC_NOOV CC_REGNUM)
 	(compare:CC_NOOV
@@ -770,13 +770,13 @@ (define_peephole2
 ;; the operands, and we know that the use of the condition code is
 ;; either GEU or LTU, so we can use the carry flag from the addition
 ;; instead of doing the compare a second time.
-(define_insn "*addsi3_compare_op1"
+(define_insn "addsi3_compare_op1"
   [(set (reg:CC_C CC_REGNUM)
 	(compare:CC_C
-	 (plus:SI (match_operand:SI 1 "s_register_operand" "l,0,l,0,r,r,r")
-		  (match_operand:SI 2 "arm_add_operand" "lPd,Py,lPx,Pw,I,L,r"))
+	 (plus:SI (match_operand:SI 1 "s_register_operand" "l,0,l,0,rk,rk")
+		  (match_operand:SI 2 "arm_add_operand" "lPd,Py,lPx,Pw,rkI,L"))
 	 (match_dup 1)))
-   (set (match_operand:SI 0 "s_register_operand" "=l,l,l,l,r,r,r")
+   (set (match_operand:SI 0 "s_register_operand" "=l,l,l,l,rk,rk")
 	(plus:SI (match_dup 1) (match_dup 2)))]
   "TARGET_32BIT"
   "@
@@ -785,22 +785,23 @@ (define_insn "*addsi3_compare_op1"
subs%?\\t%0, %1, #%n2
subs%?\\t%0, %0, #%n2
adds%?\\t%0, %1, %2
-   subs%?\\t%0, %1, #%n2
-   adds%?\\t%0, %1, %2"
+   subs%?\\t%0, %1, #%n2"
   [(set_attr "conds" "set")
-   (set_attr "arch" "t2,t2,t2,t2,*,*,*")
-   (set_attr "length" "2,2,2,2,4,4,4")
-   (set_attr "type"
-"alus_sreg,alus_imm,alus_sreg,alus_imm,alus_imm,alus_imm,alus_sreg")]
+   (set_attr "arch" "t2,t2,t2,t2,*,*")
+   (set_attr "length" "2,2,2,2,4,4")
+   (set (attr "type")
+	(if_then_else (match_operand 2 "const_int_operand")
+		  (const_string "alu_imm")
+		  (const_string "alu_sreg")))]
 )
 
 (define_insn "*addsi3_compare_op2"
   [(set (reg:CC_C CC_REGNUM)
 	(compare:CC_C
-	 (plus:SI 

[PATCH 18/29] [arm] Cleanup dead code - old support for DImode comparisons

2019-10-18 Thread Richard Earnshaw

Now that all the major patterns for DImode have been converted to
early expansion, we can safely clean up some dead code for the old way
of handling DImode.

* config/arm/arm-modes.def (CC_NCV, CC_CZ): Delete CC modes.
* config/arm/arm.c (arm_select_cc_mode): Remove old selection code
for DImode operands.
(arm_gen_dicompare_reg): Remove unreachable expansion code.
(maybe_get_arm_condition_code): Remove support for CC_CZmode and
CC_NCVmode.
* config/arm/arm.md (arm_cmpdi_insn): Delete.
(arm_cmpdi_unsigned): Delete.
---
 gcc/config/arm/arm-modes.def |   5 --
 gcc/config/arm/arm.c | 147 +--
 gcc/config/arm/arm.md|  45 ---
 3 files changed, 1 insertion(+), 196 deletions(-)

diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index 65cddf68cdb..f0eb8415b93 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -36,12 +36,9 @@ ADJUST_FLOAT_FORMAT (HF, ((arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
CC_Nmode should be used if only the N (sign) flag is set correctly
CC_NVmode should be used if only the N and V bits are set correctly,
  (used for signed comparisons when the carry is propagated in).
-   CC_CZmode should be used if only the C and Z flags are correct
-   (used for DImode unsigned comparisons).
CC_RSBmode should be used where the comparison is set by an RSB immediate,
  or NEG instruction.  The form of the comparison for (const - reg) will
  be (COMPARE (not (reg)) (~const)).
-   CC_NCVmode should be used if only the N, C, and V flags are correct
CC_Bmode should be used if only the C flag is correct after a subtract
  (eg after an unsigned borrow with carry-in propagation).
(used for DImode signed comparisons).
@@ -49,8 +46,6 @@ ADJUST_FLOAT_FORMAT (HF, ((arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
 
 CC_MODE (CC_NOOV);
 CC_MODE (CC_Z);
-CC_MODE (CC_CZ);
-CC_MODE (CC_NCV);
 CC_MODE (CC_NV);
 CC_MODE (CC_SWP);
 CC_MODE (CC_RSB);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 299dce638c2..6da2a368d9f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15403,56 +15403,6 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
 	  || arm_borrow_operation (y, DImode)))
 return CC_Bmode;
 
-  if (GET_MODE (x) == DImode || GET_MODE (y) == DImode)
-{
-  switch (op)
-	{
-	case EQ:
-	case NE:
-	  /* A DImode comparison against zero can be implemented by
-	 or'ing the two halves together.  We can also handle
-	 immediates where one word of that value is zero by
-	 subtracting the non-zero word from the corresponding word
-	 in the other register and then ORRing it with the other
-	 word.  */
-	  if (CONST_INT_P (y)
-	  && ((UINTVAL (y) & 0x) == 0
-		  || (UINTVAL (y) >> 32) == 0))
-	return CC_Zmode;
-
-	  /* We can do an equality test in three Thumb instructions.  */
-	  if (!TARGET_32BIT)
-	return CC_Zmode;
-
-	  /* FALLTHROUGH */
-
-	case LTU:
-	case LEU:
-	case GTU:
-	case GEU:
-	  /* DImode unsigned comparisons can be implemented by cmp +
-	 cmpeq without a scratch register.  Not worth doing in
-	 Thumb-2.  */
-	  if (TARGET_32BIT)
-	return CC_CZmode;
-
-	  /* FALLTHROUGH */
-
-	case LT:
-	case LE:
-	case GT:
-	case GE:
-	  /* DImode signed and unsigned comparisons can be implemented
-	 by cmp + sbcs with a scratch register, but that does not
-	 set the Z flag - we must reverse GT/LE/GTU/LEU.  */
-	  gcc_assert (op != EQ && op != NE);
-	  return CC_NCVmode;
-
-	default:
-	  gcc_unreachable ();
-	}
-}
-
   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_CC)
 return GET_MODE (x);
 
@@ -15673,81 +15623,8 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx scratch)
   }
 
 default:
-  break;
-}
-
-  /* We might have X as a constant, Y as a register because of the predicates
- used for cmpdi.  If so, force X to a register here.  */
-  if (!REG_P (x))
-x = force_reg (DImode, x);
-
-  mode = SELECT_CC_MODE (code, x, y);
-  cc_reg = gen_rtx_REG (mode, CC_REGNUM);
-
-  if (mode != CC_CZmode)
-{
-  rtx clobber, set;
-
-  /* To compare two non-zero values for equality, XOR them and
-	 then compare against zero.  Not used for ARM mode; there
-	 CC_CZmode is cheaper.  */
-  if (mode == CC_Zmode)
-	{
-	  mode = CC_NOOVmode;
-	  PUT_MODE (cc_reg, mode);
-	  if (y != const0_rtx)
-	{
-	  gcc_assert (CONST_INT_P (y));
-	  rtx xlo, xhi, ylo, yhi;
-	  arm_decompose_di_binop (x, y, &xlo, &xhi, &ylo, &yhi);
-	  if (!scratch)
-		scratch = gen_reg_rtx (SImode);
-	  if (ylo == const0_rtx)
-		{
-		  yhi = gen_int_mode (-INTVAL (yhi), SImode);
-		  if (!arm_add_operand (yhi, SImode))
-		yhi = force_reg (SImode, yhi);
-		  emit_insn (gen_addsi3 (scratch, xhi, yhi));
-		  y = xlo;
-		}
-	  else
-		{
-		  gcc_assert (yhi == const0_rtx);
-		  ylo = gen_int_mode (-INTVAL 

[PATCH 22/29] [arm] Allow the summation result of signed add-with-overflow to be discarded.

2019-10-18 Thread Richard Earnshaw

This patch matches the signed add-with-overflow patterns when the
summation itself is dropped.  In this case we can use CMN (or CMP with
some immediates).  There are a small number of constants in Thumb2
where this can result in less dense code (as we lack 16-bit CMN with
immediate patterns).  To handle this we use peepholes to try these
alternatives when either a scratch is available (0 <= i <= 7) or the
original register is dead (0 <= i <= 255).  We don't use a scratch in
the pattern itself: if those conditions are not satisfied, the 32-bit
form is preferable to forcing a reload.
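
As a rough illustration (not part of the patch), the condition that the
CC_V patterns describe can be modelled in plain C.  The function name
add_overflows is hypothetical; the sketch only mirrors the RTL shape used
below, where the widened sum is compared with the sign-extended 32-bit sum
and the sum itself is thrown away.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the CC_V test: widen both operands, add them, and
   compare with the sign-extended wrapping 32-bit sum.  The two differ
   exactly when the 32-bit signed addition overflows.  */
bool
add_overflows (int32_t a, int32_t b)
{
  int64_t wide = (int64_t) a + (int64_t) b;
  /* Wrap in unsigned arithmetic to avoid undefined behaviour.  The
     conversion back to int32_t is implementation-defined; GCC wraps.  */
  int32_t narrow = (int32_t) ((uint32_t) a + (uint32_t) b);
  return wide != (int64_t) narrow;
}
```

Only the flag result is returned here, which is the situation where a
compare-style instruction is enough and no destination register is needed.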

* config/arm/arm.md (addsi3_compareV_reg_nosum): New insn.
(addsi3_compareV_imm_nosum): New insn.  Also add peephole2 patterns
to transform this back into the summation version when that leads
to smaller code.
---
 gcc/config/arm/arm.md | 78 +++
 1 file changed, 78 insertions(+)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index b5214c79c35..be002f77382 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -803,6 +803,21 @@ (define_insn "addsi3_compareV_reg"
(set_attr "type" "alus_sreg")]
 )
 
+(define_insn "*addsi3_compareV_reg_nosum"
+  [(set (reg:CC_V CC_REGNUM)
+	(compare:CC_V
+	  (plus:DI
+	(sign_extend:DI (match_operand:SI 0 "register_operand" "%l,r"))
+	(sign_extend:DI (match_operand:SI 1 "register_operand" "l,r")))
+	  (sign_extend:DI (plus:SI (match_dup 0) (match_dup 1)))))]
+  "TARGET_32BIT"
+  "cmn%?\\t%0, %1"
+  [(set_attr "conds" "set")
+   (set_attr "arch" "t2,*")
+   (set_attr "length" "2,4")
+   (set_attr "type" "alus_sreg")]
+)
+
 (define_insn "addsi3_compareV_imm"
   [(set (reg:CC_V CC_REGNUM)
 	(compare:CC_V
@@ -828,6 +843,69 @@ (define_insn "addsi3_compareV_imm"
(set_attr "type" "alus_imm")]
 )
 
+(define_insn "addsi3_compareV_imm_nosum"
+  [(set (reg:CC_V CC_REGNUM)
+	(compare:CC_V
+	  (plus:DI
+	(sign_extend:DI
+	 (match_operand:SI 0 "register_operand" "l,r,r"))
+	(match_operand 1 "arm_addimm_operand" "Pw,I,L"))
+	  (sign_extend:DI (plus:SI (match_dup 0) (match_dup 1)))))]
+  "TARGET_32BIT
+   && INTVAL (operands[1]) == ARM_SIGN_EXTEND (INTVAL (operands[1]))"
+  "@
+   cmp%?\\t%0, #%n1
+   cmn%?\\t%0, %1
+   cmp%?\\t%0, #%n1"
+  [(set_attr "conds" "set")
+   (set_attr "arch" "t2,*,*")
+   (set_attr "length" "2,4,4")
+   (set_attr "type" "alus_imm")]
+)
+
+;; We can handle more constants efficiently if we can clobber either a scratch
+;; or the other source operand.  We deliberately leave this late as in
+;; high register pressure situations it's not worth forcing any reloads.
+(define_peephole2
+  [(match_scratch:SI 2 "l")
+   (set (reg:CC_V CC_REGNUM)
+	(compare:CC_V
+	  (plus:DI
+	(sign_extend:DI
+	 (match_operand:SI 0 "low_register_operand"))
+	(match_operand 1 "const_int_operand"))
+	  (sign_extend:DI (plus:SI (match_dup 0) (match_dup 1)))))]
+  "TARGET_THUMB2
+   && satisfies_constraint_Pd (operands[1])"
+  [(parallel[
+(set (reg:CC_V CC_REGNUM)
+	 (compare:CC_V
+	  (plus:DI (sign_extend:DI (match_dup 0))
+		   (sign_extend:DI (match_dup 1)))
+	  (sign_extend:DI (plus:SI (match_dup 0) (match_dup 1)))))
+(set (match_dup 2) (plus:SI (match_dup 0) (match_dup 1)))])]
+)
+
+(define_peephole2
+  [(set (reg:CC_V CC_REGNUM)
+	(compare:CC_V
+	  (plus:DI
+	(sign_extend:DI
+	 (match_operand:SI 0 "low_register_operand"))
+	(match_operand 1 "const_int_operand"))
+	  (sign_extend:DI (plus:SI (match_dup 0) (match_dup 1)))))]
+  "TARGET_THUMB2
+   && dead_or_set_p (peep2_next_insn (0), operands[0])
+   && satisfies_constraint_Py (operands[1])"
+  [(parallel[
+(set (reg:CC_V CC_REGNUM)
+	 (compare:CC_V
+	  (plus:DI (sign_extend:DI (match_dup 0))
+		   (sign_extend:DI (match_dup 1)))
+	  (sign_extend:DI (plus:SI (match_dup 0) (match_dup 1)))))
+(set (match_dup 0) (plus:SI (match_dup 0) (match_dup 1)))])]
+)
+
 (define_insn "addsi3_compare0"
   [(set (reg:CC_NOOV CC_REGNUM)
 	(compare:CC_NOOV


[PATCH 20/29] [arm] Early expansion of uaddvdi4.

2019-10-18 Thread Richard Earnshaw

This code borrows heavily from the uaddvti4 expansion for aarch64 since
the principles are similar.  Firstly, if one of the low words of
the expansion is 0, we can simply copy the other low word to the
destination and use uaddvsi4 for the upper word.  If that doesn't work
we have to handle three possible cases for the upper word (the lower
word is simply an add-with-carry operation as for adddi3): zero in the
upper word, some other constant and a register (each has a different
canonicalization).  We use CC_ADCmode (a new CC mode variant) to
describe the cases as the introduction of the carry means we can
no longer use the normal overflow trick of comparing the sum against
one of the operands.
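
The decomposition described above can be sketched in C as follows.  This is
only a model under my own assumptions: the function name
uaddv64_by_halves is hypothetical, and the widened 64-bit add in the second
branch stands in for whatever comparison the new CC mode actually encodes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: 64-bit unsigned add with overflow, built from 32-bit
   halves.  Returns true if the 64-bit addition overflows.  */
bool
uaddv64_by_halves (uint64_t a, uint64_t b, uint64_t *sum)
{
  uint32_t lo_a = (uint32_t) a, hi_a = (uint32_t) (a >> 32);
  uint32_t lo_b = (uint32_t) b, hi_b = (uint32_t) (b >> 32);
  uint32_t lo, hi;
  bool carry;

  if (lo_b == 0)
    {
      /* Low word is a plain copy; the high word is an ordinary 32-bit
	 add, so comparing the sum against an operand detects overflow.  */
      lo = lo_a;
      hi = hi_a + hi_b;
      carry = hi < hi_a;
    }
  else
    {
      lo = lo_a + lo_b;
      bool carry_in = lo < lo_a;
      /* With a carry-in the compare-against-operand trick breaks down
	 (hi_b == 0xffffffff plus carry-in gives hi == hi_a even though the
	 add overflowed), so look at the widened sum instead.  */
      uint64_t wide = (uint64_t) hi_a + hi_b + carry_in;
      hi = (uint32_t) wide;
      carry = (wide >> 32) != 0;
    }

  *sum = ((uint64_t) hi << 32) | lo;
  return carry;
}
```

The second branch is the reason a separate CC mode is needed: the carry out
of an ADC cannot be recovered by comparing the result with one input.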

* config/arm/arm-modes.def (CC_ADC): New CC mode.
* config/arm/arm.c (arm_select_cc_mode): Detect selection of
CC_ADCmode.
(maybe_get_arm_condition_code): Handle CC_ADCmode.
* config/arm/arm.md (uaddvdi4): Early expansion of unsigned addition
with overflow.
(addsi3_cin_cout_reg, addsi3_cin_cout_imm, addsi3_cin_cout_0): New
expand patterns.
(addsi3_cin_cout_reg_insn, addsi3_cin_cout_0_insn): New insn patterns
(addsi3_cin_cout_imm_insn): Likewise.
(adddi3_compareC): Delete insn.
* config/arm/predicates.md (arm_carry_operation): Handle CC_ADCmode.
---
 gcc/config/arm/arm-modes.def |   4 +
 gcc/config/arm/arm.c |  16 
 gcc/config/arm/arm.md| 171 +++
 gcc/config/arm/predicates.md |   2 +-
 4 files changed, 173 insertions(+), 20 deletions(-)

diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index f0eb8415b93..a6b520df32d 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -42,6 +42,9 @@ ADJUST_FLOAT_FORMAT (HF, ((arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
CC_Bmode should be used if only the C flag is correct after a subtract
  (eg after an unsigned borrow with carry-in propagation).
(used for DImode signed comparisons).
+   CC_ADCmode is used when the carry is formed from the output of ADC for an
+ addition.  In this case we cannot use the trick of comparing the sum
+ against one of the other operands.
CCmode should be used otherwise.  */
 
 CC_MODE (CC_NOOV);
@@ -65,6 +68,7 @@ CC_MODE (CC_C);
 CC_MODE (CC_B);
 CC_MODE (CC_N);
 CC_MODE (CC_V);
+CC_MODE (CC_ADC);
 
 /* Vector modes.  */
 VECTOR_MODES (INT, 4);/*V4QI V2HI */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6da2a368d9f..eebbdc3d9c2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15387,6 +15387,14 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   && (rtx_equal_p (XEXP (x, 0), y) || rtx_equal_p (XEXP (x, 1), y)))
 return CC_Cmode;
 
+  if (GET_MODE (x) == DImode
+  && GET_CODE (x) == PLUS
+  && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+  && CONST_INT_P (y)
+  && UINTVAL (y) == 0x8
+  && (op == GEU || op == LTU))
+return CC_ADCmode;
+
   if (GET_MODE (x) == DImode
   && (op == GE || op == LT)
   && GET_CODE (x) == SIGN_EXTEND
@@ -23952,6 +23960,14 @@ maybe_get_arm_condition_code (rtx comparison)
 	default: return ARM_NV;
 	}
 
+case E_CC_ADCmode:
+  switch (comp_code)
+	{
+	case GEU: return ARM_CS;
+	case LTU: return ARM_CC;
+	default: return ARM_NV;
+	}
+
 case E_CCmode:
 case E_CC_RSBmode:
   switch (comp_code)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4ea6f4b226c..9f0e43571fd 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -517,16 +517,165 @@ (define_expand "uaddvsi4"
 (define_expand "uaddvdi4"
   [(match_operand:DI 0 "s_register_operand")
(match_operand:DI 1 "s_register_operand")
-   (match_operand:DI 2 "s_register_operand")
+   (match_operand:DI 2 "reg_or_int_operand")
(match_operand 3 "")]
   "TARGET_32BIT"
 {
-  emit_insn (gen_adddi3_compareC (operands[0], operands[1], operands[2]));
-  arm_gen_unlikely_cbranch (LTU, CC_Cmode, operands[3]);
+  rtx lo_result, hi_result;
+  rtx lo_op1, hi_op1, lo_op2, hi_op2;
+  arm_decompose_di_binop (operands[1], operands[2], &lo_op1, &hi_op1,
+			  &lo_op2, &hi_op2);
+  lo_result = gen_lowpart (SImode, operands[0]);
+  hi_result = gen_highpart (SImode, operands[0]);
+
+  if (lo_op2 == const0_rtx)
+{
+  emit_move_insn (lo_result, lo_op1);
+  if (!arm_add_operand (hi_op2, SImode))
+	hi_op2 = force_reg (SImode, hi_op2);
+
+  gen_uaddvsi4 (hi_result, hi_op1, hi_op2, operands[3]);
+}
+  else
+{
+  if (!arm_add_operand (lo_op2, SImode))
+	lo_op2 = force_reg (SImode, lo_op2);
+  if (!arm_not_operand (hi_op2, SImode))
+	hi_op2 = force_reg (SImode, hi_op2);
+
+  emit_insn (gen_addsi3_compare_op1 (lo_result, lo_op1, lo_op2));
+
+  if (hi_op2 == const0_rtx)
+emit_insn (gen_addsi3_cin_cout_0 (hi_result, hi_op1));
+  else if (CONST_INT_P (hi_op2))
+emit_insn (gen_addsi3_cin_cout_imm 

[PATCH 28/29] [arm] Improvements to negvsi4 and negvdi4.

2019-10-18 Thread Richard Earnshaw

The generic expansion code for negv does not try the subv patterns,
but instead emits a sub and a compare separately.  Fortunately, the
patterns can make use of the new subv operations, so just call those.
We can also rewrite this using an iterator to simplify things further.
Finally, we can now make negvdi4 work on Thumb2 as well as Arm.
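
A small C model of the idea, for illustration only: negation with overflow
is just subtraction from zero with overflow.  The names sub_overflow_model
and neg_overflow_model are hypothetical and do not correspond to anything
in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a signed 32-bit subtract-with-overflow: compute
   the exact result in 64 bits and compare with the wrapped 32-bit one.  */
bool
sub_overflow_model (int32_t a, int32_t b, int32_t *res)
{
  int64_t wide = (int64_t) a - (int64_t) b;
  /* Conversion to int32_t is implementation-defined; GCC wraps.  */
  *res = (int32_t) (uint32_t) wide;
  return wide != (int64_t) *res;
}

/* Negate-with-overflow expressed as 0 - x, mirroring how the new expander
   simply calls the subtract-with-overflow expansion.  */
bool
neg_overflow_model (int32_t x, int32_t *res)
{
  return sub_overflow_model (0, x, res);
}
```

The only input that overflows is INT32_MIN, since its negation is not
representable in 32 bits.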

* config/arm/arm.md (negv3): New expansion rule.
(negvsi3, negvdi3): Delete.
(negdi2_compare): Delete.
---
 gcc/config/arm/arm.md | 41 +
 1 file changed, 5 insertions(+), 36 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5a8175ff8b0..7ef0c16580d 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4581,48 +4581,17 @@ (define_insn "udivsi3"
 
 ;; Unary arithmetic insns
 
-(define_expand "negvsi3"
-  [(match_operand:SI 0 "register_operand")
-   (match_operand:SI 1 "register_operand")
+(define_expand "negv3"
+  [(match_operand:SIDI 0 "s_register_operand")
+   (match_operand:SIDI 1 "s_register_operand")
(match_operand 2 "")]
   "TARGET_32BIT"
 {
-  emit_insn (gen_subsi3_compare (operands[0], const0_rtx, operands[1]));
-  arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
-
-  DONE;
-})
-
-(define_expand "negvdi3"
-  [(match_operand:DI 0 "s_register_operand")
-   (match_operand:DI 1 "s_register_operand")
-   (match_operand 2 "")]
-  "TARGET_ARM"
-{
-  emit_insn (gen_negdi2_compare (operands[0], operands[1]));
-  arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[2]);
-
+  emit_insn (gen_subv4 (operands[0], const0_rtx, operands[1],
+			  operands[2]));
   DONE;
 })
 
-
-(define_insn "negdi2_compare"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
-	  (const_int 0)
-	  (match_operand:DI 1 "register_operand" "r,r")))
-   (set (match_operand:DI 0 "register_operand" "=&r,&r")
-	(minus:DI (const_int 0) (match_dup 1)))]
-  "TARGET_ARM"
-  "@
-   rsbs\\t%Q0, %Q1, #0;rscs\\t%R0, %R1, #0
-   rsbs\\t%Q0, %Q1, #0;sbcs\\t%R0, %R1, %R1, lsl #1"
-  [(set_attr "conds" "set")
-   (set_attr "arch" "a,t2")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "negsi2"
   [(set (match_operand:SI 0 "s_register_operand")
 	(neg:SI (match_operand:SI 1 "s_register_operand")))]


[PATCH 04/29] [arm] Rewrite addsi3_carryin_shift_ in canonical form

2019-10-18 Thread Richard Earnshaw

The add-with-carry operation which involves a shift doesn't match at present
because it isn't matching the canonical form generated by combine.  Fixing
this is simply a matter of re-ordering the operands.

* config/arm/arm.md (addsi3_carryin_shift_): Reorder operands
to match canonical form.
---
 gcc/config/arm/arm.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4a7a64e6613..9754a761faf 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -913,8 +913,8 @@ (define_insn "*addsi3_carryin_shift_"
 		  (match_operator:SI 2 "shift_operator"
 		[(match_operand:SI 3 "s_register_operand" "r")
 		 (match_operand:SI 4 "reg_or_int_operand" "rM")])
-		  (match_operand:SI 1 "s_register_operand" "r"))
-		 (LTUGEU:SI (reg: CC_REGNUM) (const_int 0]
+		  (LTUGEU:SI (reg: CC_REGNUM) (const_int 0)))
+		 (match_operand:SI 1 "s_register_operand" "r")))]
   "TARGET_32BIT"
   "adc%?\\t%0, %1, %3%S2"
   [(set_attr "conds" "use")


[PATCH 15/29] [arm] Improve handling of DImode comparisons against constants.

2019-10-18 Thread Richard Earnshaw

In almost all cases it is better to handle inequality comparisons against
constants by transforming comparisons of the form (reg GT const) into
(reg GE (const+1)), and similarly LE into LT.  However, there are many
cases that we could handle but currently fail to because we force the
constant into a register too early in the pattern expansion.  To permit
this we need to defer forcing the constant into a register until after
we've had the chance to do the transform; in some cases that may even mean
that we no longer need to force the constant into a register at all.  For
example, on Arm, the case:

_Bool f8 (unsigned long long a) { return a > 0x; }

previously compiled to

mov r3, #0
cmp r1, r3
mvn r2, #0
cmpeq   r0, r2
movhi   r0, #1
movls   r0, #0
bx  lr

But now compiles to

cmp r1, #1
cmpeq   r0, #0
movcs   r0, #1
movcc   r0, #0
bx  lr

Which, although not yet completely optimal, is certainly better than
before.
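
The underlying identity can be checked with a short C sketch.  The function
names are hypothetical, and the constant 0x00ffffff mentioned in the note
below is one I picked for illustration, not the constant from the example
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Direct form of the unsigned comparison.  */
bool
gtu_direct (uint64_t a, uint64_t c)
{
  return a > c;
}

/* Adjusted form: a > c is the same as a >= c + 1.  The caller must ensure
   c != UINT64_MAX, otherwise c + 1 wraps to zero and the identity fails.
   The patch guards the same way by checking against the maximum value.  */
bool
gtu_via_geu (uint64_t a, uint64_t c)
{
  return a >= c + 1;
}
```

For c = 0x00ffffff the adjusted constant is 0x01000000, which fits an Arm
modified immediate while the original does not; this is the kind of case
where adjusting the comparison avoids loading a constant into a register.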

* config/arm/arm.md (cbranchdi4): Accept reg_or_int_operand for
operand 2.
(cstoredi4): Similarly, but for operand 3.
* config/arm/arm.c (arm_canonicalize_comparison): Allow canonicalization
of unsigned compares with a constant on Arm.  Prefer using const+1 and
adjusting the comparison over swapping the operands whenever the
original constant was not valid.
(arm_gen_dicompare_reg): If Y is not a valid operand, force it to a
register here.
(arm_validize_comparison): Do not force invalid DImode operands to
registers here.
---
 gcc/config/arm/arm.c  | 37 +++--
 gcc/config/arm/arm.md |  4 ++--
 2 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 64367b42332..ddfe4335169 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5372,15 +5372,16 @@ arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
 
   maxval = (HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (mode) - 1)) - 1;
 
-  /* For DImode, we have GE/LT/GEU/LTU comparisons.  In ARM mode
- we can also use cmp/cmpeq for GTU/LEU.  GT/LE must be either
- reversed or (for constant OP1) adjusted to GE/LT.  Similarly
- for GTU/LEU in Thumb mode.  */
+  /* For DImode, we have GE/LT/GEU/LTU comparisons (with cmp/sbc).  In
+ ARM mode we can also use cmp/cmpeq for GTU/LEU.  GT/LE must be
+ either reversed or (for constant OP1) adjusted to GE/LT.
+ Similarly for GTU/LEU in Thumb mode.  */
   if (mode == DImode)
 {
 
   if (*code == GT || *code == LE
-	  || (!TARGET_ARM && (*code == GTU || *code == LEU)))
+	  || ((!TARGET_ARM || CONST_INT_P (*op1))
+	  && (*code == GTU || *code == LEU)))
 	{
 	  /* Missing comparison.  First try to use an available
 	 comparison.  */
@@ -5392,23 +5393,27 @@ arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
 		case GT:
 		case LE:
 		  if (i != maxval
-		  && arm_const_double_by_immediates (GEN_INT (i + 1)))
+		  && (!arm_const_double_by_immediates (*op1)
+			  || arm_const_double_by_immediates (GEN_INT (i + 1)))))
 		{
 		  *op1 = GEN_INT (i + 1);
 		  *code = *code == GT ? GE : LT;
 		  return;
 		}
 		  break;
+
 		case GTU:
 		case LEU:
 		  if (i != ~((unsigned HOST_WIDE_INT) 0)
-		  && arm_const_double_by_immediates (GEN_INT (i + 1)))
+		  && (!arm_const_double_by_immediates (*op1)
+			  || arm_const_double_by_immediates (GEN_INT (i + 1)))))
 		{
 		  *op1 = GEN_INT (i + 1);
 		  *code = *code == GTU ? GEU : LTU;
 		  return;
 		}
 		  break;
+
 		default:
 		  gcc_unreachable ();
 		}
@@ -15436,7 +15441,7 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx scratch)
 		scratch = gen_reg_rtx (SImode);
 	  if (ylo == const0_rtx)
 		{
-		  yhi = GEN_INT (-INTVAL(yhi));
+		  yhi = gen_int_mode (-INTVAL (yhi), SImode);
 		  if (!arm_add_operand (yhi, SImode))
 		yhi = force_reg (SImode, yhi);
 		  emit_insn (gen_addsi3 (scratch, xhi, yhi));
@@ -15445,7 +15450,7 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx scratch)
 	  else
 		{
 		  gcc_assert (yhi == const0_rtx);
-		  ylo = GEN_INT (-INTVAL(ylo));
+		  ylo = gen_int_mode (-INTVAL (ylo), SImode);
 		  if (!arm_add_operand (ylo, SImode))
 		ylo = force_reg (SImode, ylo);
 		  emit_insn (gen_addsi3 (scratch, xlo, ylo));
@@ -15458,6 +15463,8 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx scratch)
 	x = gen_rtx_IOR (SImode, gen_lowpart (SImode, x),
 			 gen_highpart (SImode, x));
 	}
+  else if (!cmpdi_operand (y, mode))
+	y = force_reg (DImode, y);
 
   /* A scratch register is required.  */
   if (reload_completed)
@@ -15470,7 +15477,12 @@ arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx scratch)
   emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set, clobber)));
 }
   else
-

[PATCH 26/29] [arm] Improve constant handling for subvsi4.

2019-10-18 Thread Richard Earnshaw

This patch addresses constant handling in subvsi4.  Either operand may
be a constant.  If the second input (operands[2]) is a constant, then
we can canonicalize this into an addition form, provided we take care
of the INT_MIN case.  In that case the negation has to handle the fact
that -INT_MIN is still INT_MIN and we need to ensure that a subtract
operation is performed rather than an addition.  The remaining cases
are largely duals of the usubvsi4 expansion.

This patch also fixes a technical correctness bug in the old
expansion, where we did not really describe the test for overflow in
the RTL.  We seem to have got away with that, however...
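
To make the INT_MIN special case concrete, here is a hedged C model.  The
names addv_model and subv_const_model are hypothetical; the sketch only
shows why x - c can be rewritten as x + (-c) for every constant except
INT_MIN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of signed 32-bit add-with-overflow.  */
bool
addv_model (int32_t a, int32_t b, int32_t *res)
{
  int64_t wide = (int64_t) a + (int64_t) b;
  /* Conversion to int32_t is implementation-defined; GCC wraps.  */
  *res = (int32_t) (uint32_t) wide;
  return wide != (int64_t) *res;
}

/* x - c with overflow detection, where c plays the role of the constant
   second operand.  */
bool
subv_const_model (int32_t x, int32_t c, int32_t *res)
{
  if (c == INT32_MIN)
    {
      /* -INT32_MIN is not representable, so negating and adding would
	 test the wrong thing.  x - INT32_MIN equals x + 2^31, which
	 overflows exactly when x >= 0.  */
      *res = (int32_t) ((uint32_t) x - (uint32_t) c);
      return x >= 0;
    }
  /* Every other constant can be negated safely.  */
  return addv_model (x, -c, res);
}
```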

* config/arm/arm.md (subv4): Delete.
(subvdi4): New expander pattern.
(subvsi4): Likewise.  Handle some immediate values.
(subvsi3_intmin): New insn pattern.
(subvsi3): Likewise.
(subvsi3_imm1): Likewise.
* config/arm/arm.c (arm_select_cc_mode): Also allow minus for CC_V
idioms.
---
 gcc/config/arm/arm.c  |  5 ++-
 gcc/config/arm/arm.md | 96 ---
 2 files changed, 94 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c9abbb0f91d..d5ffd2133a9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15413,11 +15413,12 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
 
   if (GET_MODE (x) == DImode
   && (op == EQ || op == NE)
-  && GET_CODE (x) == PLUS
+  && (GET_CODE (x) == PLUS
+	  || GET_CODE (x) == MINUS)
   && (GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
 	  || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND)
   && GET_CODE (y) == SIGN_EXTEND
-  && GET_CODE (XEXP (y, 0)) == PLUS)
+  && GET_CODE (XEXP (y, 0)) == GET_CODE (x))
 return CC_Vmode;
 
   if (GET_MODE_CLASS (GET_MODE (x)) == MODE_CC)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 92f1823cdfa..05b735cfccd 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -957,6 +957,22 @@ (define_insn "*addsi3_compareV_reg_nosum"
(set_attr "type" "alus_sreg")]
 )
 
+(define_insn "subvsi3_intmin"
+  [(set (reg:CC_V CC_REGNUM)
+	(compare:CC_V
+	  (plus:DI
+	(sign_extend:DI
+	 (match_operand:SI 1 "register_operand" "r"))
+	(const_int 2147483648))
+	  (sign_extend:DI (plus:SI (match_dup 1) (const_int -2147483648)))))
+   (set (match_operand:SI 0 "register_operand" "=r")
+	(plus:SI (match_dup 1) (const_int -2147483648)))]
+  "TARGET_32BIT"
+  "subs%?\\t%0, %1, #-2147483648"
+  [(set_attr "conds" "set")
+   (set_attr "type" "alus_imm")]
+)
+
 (define_insn "addsi3_compareV_imm"
   [(set (reg:CC_V CC_REGNUM)
 	(compare:CC_V
@@ -1339,14 +1355,52 @@ (define_insn "*addsi3_carryin_clobercc"
 (set_attr "type" "adcs_reg")]
 )
 
-(define_expand "subv4"
-  [(match_operand:SIDI 0 "register_operand")
-   (match_operand:SIDI 1 "register_operand")
-   (match_operand:SIDI 2 "register_operand")
+(define_expand "subvsi4"
+  [(match_operand:SI 0 "s_register_operand")
+   (match_operand:SI 1 "arm_rhs_operand")
+   (match_operand:SI 2 "arm_add_operand")
+   (match_operand 3 "")]
+  "TARGET_32BIT"
+{
+  if (CONST_INT_P (operands[1]) && CONST_INT_P (operands[2]))
+{
+  /* If both operands are constants we can decide the result statically.  */
+  wi::overflow_type overflow;
+  wide_int val = wi::sub (rtx_mode_t (operands[1], SImode),
+			  rtx_mode_t (operands[2], SImode),
+			  SIGNED, &overflow);
+  emit_move_insn (operands[0], GEN_INT (val.to_shwi ()));
+  if (overflow != wi::OVF_NONE)
+	emit_jump_insn (gen_jump (operands[3]));
+  DONE;
+}
+  else if (CONST_INT_P (operands[2]))
+{
+  operands[2] = GEN_INT (-INTVAL (operands[2]));
+  /* Special case for INT_MIN.  */
+  if (INTVAL (operands[2]) == 0x8000)
+	emit_insn (gen_subvsi3_intmin (operands[0], operands[1]));
+  else
+	emit_insn (gen_addsi3_compareV_imm (operands[0], operands[1],
+	  operands[2]));
+}
+  else if (CONST_INT_P (operands[1]))
+emit_insn (gen_subvsi3_imm1 (operands[0], operands[1], operands[2]));
+  else
+emit_insn (gen_subvsi3 (operands[0], operands[1], operands[2]));
+
+  arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
+  DONE;
+})
+
+(define_expand "subvdi4"
+  [(match_operand:DI 0 "s_register_operand")
+   (match_operand:DI 1 "s_register_operand")
+   (match_operand:DI 2 "s_register_operand")
(match_operand 3 "")]
   "TARGET_32BIT"
 {
-  emit_insn (gen_sub3_compare1 (operands[0], operands[1], operands[2]));
+  emit_insn (gen_subdi3_compare1 (operands[0], operands[1], operands[2]));
   arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
 
   DONE;
@@ -1496,6 +1550,38 @@ (define_insn "subsi3_compare1"
(set_attr "type" "alus_sreg")]
 )
 
+(define_insn "subvsi3"
+  [(set (reg:CC_V CC_REGNUM)
+	(compare:CC_V
+	 (minus:DI
+	  (sign_extend:DI (match_operand:SI 1 "s_register_operand" "l,r"))
+	  (sign_extend:DI (match_operand:SI 2 "s_register_operand" "l,r")))
+	 

[PATCH 05/29] [arm] fix constraints on addsi3_carryin_alt2

2019-10-18 Thread Richard Earnshaw

addsi3_carryin_alt2 has a stricter constraint than the predicate
when adding a constant.  This leads to sub-optimal code in some
circumstances.

* config/arm/arm.md (addsi3_carryin_alt2): Use arm_not_operand for
operand 2.
---
 gcc/config/arm/arm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 9754a761faf..fbe154a9873 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -893,7 +893,7 @@ (define_insn "*addsi3_carryin_alt2_"
   [(set (match_operand:SI 0 "s_register_operand" "=l,r,r")
 (plus:SI (plus:SI (LTUGEU:SI (reg: CC_REGNUM) (const_int 0))
   (match_operand:SI 1 "s_register_operand" "%l,r,r"))
- (match_operand:SI 2 "arm_rhs_operand" "l,rI,K")))]
+ (match_operand:SI 2 "arm_not_operand" "l,rI,K")))]
   "TARGET_32BIT"
   "@
adc%?\\t%0, %1, %2


[PATCH 17/29] [arm] Handle some constant comparisons using rsbs+rscs

2019-10-18 Thread Richard Earnshaw

In a small number of cases it is preferable to handle comparisons with
constants using the sequence

	RSBS	tmp, Xlo, constlo
	RSCS	tmp, Xhi, consthi

which allows us to handle a small number of LE/GT/LEU/GTU cases when
changing the code to use LT/GE/LTU/GEU would make the constant more
expensive.  Sadly, we cannot do this on Thumb, since we need RSC, so we
now always use the incremented constant in that case since normally that
still works out cheaper than forcing the entire constant into a register.

Further investigation has also shown that the canonicalization of a
reverse subtract and compare is valid for signed as well as unsigned values,
so we relax the restriction on selecting CC_RSBmode to allow all types
of compare.
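
For the unsigned case, the reverse-subtract sequence can be modelled in C
roughly as below.  This is a sketch under my own assumptions: the function
name leu_by_rsbs_rscs is hypothetical, and it relies on the Arm convention
that the carry flag after a subtraction means "no borrow".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of testing x <= c (unsigned, 64-bit) by computing
   c - x one 32-bit word at a time and looking only at the final carry.  */
bool
leu_by_rsbs_rscs (uint64_t x, uint64_t c)
{
  uint32_t xlo = (uint32_t) x, xhi = (uint32_t) (x >> 32);
  uint32_t clo = (uint32_t) c, chi = (uint32_t) (c >> 32);

  /* RSBS: constlo - Xlo.  Carry is set when no borrow occurs.  */
  bool carry = clo >= xlo;

  /* RSCS: consthi - Xhi - (borrow from the low word).  Again the carry
     out is set when no borrow occurs.  */
  uint64_t needed = (uint64_t) xhi + (carry ? 0 : 1);
  carry = (uint64_t) chi >= needed;

  /* Carry set after the full reverse subtract means c >= x.  */
  return carry;
}
```

Because the operands are swapped relative to an ordinary compare, the
original constant can be used unchanged instead of constant + 1.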

* config/arm/arm.c (arm_const_double_prefer_rsbs_rsc): New function.
(arm_canonicalize_comparison): For GT/LE/GTU/LEU, use the constant
unchanged only if that will be cheaper.
(arm_select_cc_mode): Recognize a swapped comparison that will
be regenerated using RSBS or RSCS.  Relax restriction on selecting
CC_RSBmode.
(arm_gen_dicompare_reg): Handle LE/GT/LEU/GTU comparisons against
a constant.
(arm_gen_compare_reg): Handle compare (CONST, X) when the mode
is CC_RSBmode.
(maybe_get_arm_condition_code): CC_RSBmode now returns the same codes
as CCmode.
* config/arm/arm.md (rsb_imm_compare_scratch): New pattern.
(rscsi3_out_scratch): New pattern.
---
 gcc/config/arm/arm.c  | 153 +-
 gcc/config/arm/arm.md |  27 
 2 files changed, 134 insertions(+), 46 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 99c8bd79d30..299dce638c2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -5355,6 +5355,21 @@ arm_gen_constant (enum rtx_code code, machine_mode mode, rtx cond,
   return insns;
 }
 
+/* Return TRUE if op is a constant where both the low and top words are
+   suitable for RSB/RSC instructions.  This is never true for Thumb, since
+   we do not have RSC in that case.  */
+static bool
+arm_const_double_prefer_rsbs_rsc (rtx op)
+{
+  /* Thumb lacks RSC, so we never prefer that sequence.  */
+  if (TARGET_THUMB || !CONST_INT_P (op))
+return false;
+  HOST_WIDE_INT hi, lo;
+  lo = UINTVAL (op) & 0xULL;
+  hi = UINTVAL (op) >> 32;
+  return const_ok_for_arm (lo) && const_ok_for_arm (hi);
+}
+
 /* Canonicalize a comparison so that we are more likely to recognize it.
This can be done for a few constant compares, where we can make the
immediate value easier to load.  */
@@ -5380,8 +5395,7 @@ arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
 {
 
   if (*code == GT || *code == LE
-	  || ((!TARGET_ARM || CONST_INT_P (*op1))
-	  && (*code == GTU || *code == LEU)))
+	  || *code == GTU || *code == LEU)
 	{
 	  /* Missing comparison.  First try to use an available
 	 comparison.  */
@@ -5392,10 +5406,13 @@ arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
 		{
 		case GT:
 		case LE:
-		  if (i != maxval
-		  && (!arm_const_double_by_immediates (*op1)
-			  || arm_const_double_by_immediates (GEN_INT (i + 1)))))
+		  if (i != maxval)
 		{
+		  /* Try to convert to GE/LT, unless that would be more
+			 expensive.  */
+		  if (!arm_const_double_by_immediates (GEN_INT (i + 1))
+			  && arm_const_double_prefer_rsbs_rsc (*op1))
+			return;
 		  *op1 = GEN_INT (i + 1);
 		  *code = *code == GT ? GE : LT;
 		  return;
@@ -5404,10 +5421,13 @@ arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
 
 		case GTU:
 		case LEU:
-		  if (i != ~((unsigned HOST_WIDE_INT) 0)
-		  && (!arm_const_double_by_immediates (*op1)
-			  || arm_const_double_by_immediates (GEN_INT (i + 1))))
+		  if (i != ~((unsigned HOST_WIDE_INT) 0))
 		{
+		  /* Try to convert to GEU/LTU, unless that would
+			 be more expensive.  */
+		  if (!arm_const_double_by_immediates (GEN_INT (i + 1))
+			  && arm_const_double_prefer_rsbs_rsc (*op1))
+			return;
 		  *op1 = GEN_INT (i + 1);
 		  *code = *code == GTU ? GEU : LTU;
 		  return;
@@ -5419,7 +5439,6 @@ arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
 		}
 	}
 
-	  /* If that did not work, reverse the condition.  */
 	  if (!op0_preserve_value)
 	{
 	  std::swap (*op0, *op1);
@@ -15251,6 +15270,28 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
 	  || GET_CODE (x) == ROTATERT))
 return CC_SWPmode;
 
+  /* A widened compare of the sum of a value plus a carry against a
+ constant.  This is a representation of RSC.  We want to swap the
+ result of the comparison at output.  Not valid if the Z bit is
+ needed.  */
+  if (GET_MODE (x) == DImode
+  && GET_CODE (x) == PLUS
+  && arm_borrow_operation (XEXP (x, 1), DImode)
+  && CONST_INT_P (y)
+  && ((GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+	   && (op == LE || op 

[PATCH 08/29] [arm] Introduce arm_carry_operation

2019-10-18 Thread Richard Earnshaw

An earlier patch introduced arm_borrow_operation, this one introduces
the carry variant, which is the same except that the logic of the
carry-setting is inverted.  Having done this we can now match more
cases where the carry flag is propagated from comparisons with
different modes without having to define even more patterns.  A few
small changes to the expand patterns are required to directly create
the carry representation.
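
As a rough illustration of what the carry-in patterns compute, the sketch below models a 64-bit addition as two 32-bit steps: a low-word add that produces a carry, then a high-word add that consumes it. This is the shape the adddi3 expander emits with gen_addsi3_compareC followed by gen_addsi3_carryin. The function names adds32, adc32 and add64_by_halves are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ADDS: 32-bit add that also reports the carry-out (C flag).  */
static uint32_t
adds32 (uint32_t a, uint32_t b, unsigned *carry_out)
{
  uint32_t sum = a + b;
  *carry_out = sum < a;	/* Unsigned wrap-around means a carry occurred.  */
  return sum;
}

/* Model of ADC: 32-bit add that consumes a carry-in of 0 or 1.  */
static uint32_t
adc32 (uint32_t a, uint32_t b, unsigned carry_in)
{
  assert (carry_in <= 1);
  return a + b + carry_in;
}

/* 64-bit addition built from the two 32-bit steps above.  */
static uint64_t
add64_by_halves (uint64_t x, uint64_t y)
{
  unsigned carry;
  uint32_t lo = adds32 ((uint32_t) x, (uint32_t) y, &carry);
  uint32_t hi = adc32 ((uint32_t) (x >> 32), (uint32_t) (y >> 32), carry);
  return ((uint64_t) hi << 32) | lo;
}
```

A borrow is the same flag read with the opposite sense, which is why the carry predicate can mirror the borrow predicate with the carry-setting logic inverted.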

The iterator LTUGEU is no longer needed and has been removed, as has the
code attribute 'cnb'.

Finally, we fix a long-standing bug which was probably inert before:
in Thumb2 a shift with ADC can only be by an immediate amount;
register-specified shifts are not permitted.

* config/arm/predicates.md (arm_carry_operation): New special
predicate.
* config/arm/iterators.md (LTUGEU): Delete iterator.
(cnb): Delete code attribute.
(optab): Delete ltu and geu elements.
* config/arm/arm.md (addsi3_carryin): Renamed from
addsi3_carryin_.  Remove iterator and use arm_carry_operand.
(add0si3_carryin): Similarly, but from add0si3_carryin_.
(addsi3_carryin_alt2): Similarly, but from addsi3_carryin_alt2_.
(addsi3_carryin_clobercc): Similarly.
(addsi3_carryin_shift): Similarly.  Do not allow register shifts in
Thumb2 state.
---
 gcc/config/arm/arm.md| 36 
 gcc/config/arm/iterators.md  | 11 +--
 gcc/config/arm/predicates.md | 21 +
 3 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f597a277c17..f53dbc27207 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -471,10 +471,12 @@ (define_expand "adddi3"
 	hi_op2 = force_reg (SImode, hi_op2);
 
 	  emit_insn (gen_addsi3_compareC (lo_dest, lo_op1, lo_op2));
+	  rtx carry = gen_rtx_LTU (SImode, gen_rtx_REG (CC_Cmode, CC_REGNUM),
+   const0_rtx);
 	  if (hi_op2 == const0_rtx)
-	emit_insn (gen_add0si3_carryin_ltu (hi_dest, hi_op1));
+	emit_insn (gen_add0si3_carryin (hi_dest, hi_op1, carry));
 	  else
-	emit_insn (gen_addsi3_carryin_ltu (hi_dest, hi_op1, hi_op2));
+	emit_insn (gen_addsi3_carryin (hi_dest, hi_op1, hi_op2, carry));
 	}
 
   if (lo_result != lo_dest)
@@ -858,11 +860,11 @@ (define_insn "*compare_addsi2_op1"
(set_attr "type" "alus_imm,alus_sreg,alus_imm,alus_imm,alus_sreg")]
  )
 
-(define_insn "addsi3_carryin_"
+(define_insn "addsi3_carryin"
   [(set (match_operand:SI 0 "s_register_operand" "=l,r,r")
 (plus:SI (plus:SI (match_operand:SI 1 "s_register_operand" "%l,r,r")
   (match_operand:SI 2 "arm_not_operand" "0,rI,K"))
- (LTUGEU:SI (reg: CC_REGNUM) (const_int 0]
+ (match_operand:SI 3 "arm_carry_operation" "")))]
   "TARGET_32BIT"
   "@
adc%?\\t%0, %1, %2
@@ -877,9 +879,9 @@ (define_insn "addsi3_carryin_"
 )
 
 ;; Canonicalization of the above when the immediate is zero.
-(define_insn "add0si3_carryin_"
+(define_insn "add0si3_carryin"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
-	(plus:SI (LTUGEU:SI (reg: CC_REGNUM) (const_int 0))
+	(plus:SI (match_operand:SI 2 "arm_carry_operation" "")
 		 (match_operand:SI 1 "arm_not_operand" "r")))]
   "TARGET_32BIT"
   "adc%?\\t%0, %1, #0"
@@ -889,9 +891,9 @@ (define_insn "add0si3_carryin_"
(set_attr "type" "adc_imm")]
 )
 
-(define_insn "*addsi3_carryin_alt2_"
+(define_insn "*addsi3_carryin_alt2"
   [(set (match_operand:SI 0 "s_register_operand" "=l,r,r")
-(plus:SI (plus:SI (LTUGEU:SI (reg: CC_REGNUM) (const_int 0))
+(plus:SI (plus:SI (match_operand:SI 3 "arm_carry_operation" "")
   (match_operand:SI 1 "s_register_operand" "%l,r,r"))
  (match_operand:SI 2 "arm_not_operand" "l,rI,K")))]
   "TARGET_32BIT"
@@ -907,28 +909,30 @@ (define_insn "*addsi3_carryin_alt2_"
(set_attr "type" "adc_reg,adc_reg,adc_imm")]
 )
 
-(define_insn "*addsi3_carryin_shift_"
-  [(set (match_operand:SI 0 "s_register_operand" "=r")
+(define_insn "*addsi3_carryin_shift"
+  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
 	(plus:SI (plus:SI
 		  (match_operator:SI 2 "shift_operator"
-		[(match_operand:SI 3 "s_register_operand" "r")
-		 (match_operand:SI 4 "reg_or_int_operand" "rM")])
-		  (LTUGEU:SI (reg: CC_REGNUM) (const_int 0)))
-		 (match_operand:SI 1 "s_register_operand" "r")))]
+		[(match_operand:SI 3 "s_register_operand" "r,r")
+		 (match_operand:SI 4 "shift_amount_operand" "M,r")])
+		  (match_operand:SI 5 "arm_carry_operation" ""))
+		 (match_operand:SI 1 "s_register_operand" "r,r")))]
   "TARGET_32BIT"
   "adc%?\\t%0, %1, %3%S2"
   [(set_attr "conds" "use")
+   (set_attr "arch" "32,a")
+   (set_attr "shift" "3")
(set_attr "predicable" "yes")
(set (attr "type") (if_then_else (match_operand 4 "const_int_operand" "")
 		  (const_string "alu_shift_imm")
 		  (const_string 

[PATCH 09/29] [arm] Correctly cost addition with a carry-in

2019-10-18 Thread Richard Earnshaw

The cost routine for Arm and Thumb2 was not recognising the idioms that
describe addition with carry.  As a result the instructions appeared
more expensive than they really are, which can occasionally lead to
poor choices by combine.  Recognising all the possible variants is
a little trickier than normal because the expressions can become complex
enough that there is no single canonical form.
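
To make the "several possible shapes" point concrete, here is a toy model of stripping a carry operand from an addition so that only the real operand is costed. It uses a hypothetical three-node expression type, not GCC's rtx, and toy_strip_carry only imitates the intent of the helper added by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature expression tree; not GCC's rtx.  */
enum toy_code { TOY_REG, TOY_CARRY, TOY_PLUS };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op[2];	/* Only used for TOY_PLUS.  */
};

static int
toy_is_carry (const struct toy_rtx *x)
{
  return x->code == TOY_CARRY;
}

/* If one operand of PLUS is the carry, return the other operand.
   If neither is a carry, return PLUS unchanged.  */
static const struct toy_rtx *
toy_strip_carry (const struct toy_rtx *plus)
{
  assert (plus->code == TOY_PLUS);
  if (toy_is_carry (plus->op[0]))
    return plus->op[1];
  if (toy_is_carry (plus->op[1]))
    return plus->op[0];
  return plus;
}
```

The carry can appear as either operand of the inner addition, so both positions have to be checked before the remaining operand is costed.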

* config/arm/arm.c (strip_carry_operation): New function.
(arm_rtx_costs_internal, case PLUS): Handle addition with carry-in
for SImode.
---
 gcc/config/arm/arm.c | 76 +---
 1 file changed, 65 insertions(+), 11 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9a779e24cac..dfbd5cde5eb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9504,6 +9504,20 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
 }
 }
 
+/* Helper function for arm_rtx_costs.  If one operand of the OP, a
+   PLUS, adds the carry flag, then return the other operand.  If
+   neither is a carry, return OP unchanged.  */
+static rtx
+strip_carry_operation (rtx op)
+{
+  gcc_assert (GET_CODE (op) == PLUS);
+  if (arm_carry_operation (XEXP (op, 0), GET_MODE (op)))
+return XEXP (op, 1);
+  else if (arm_carry_operation (XEXP (op, 1), GET_MODE (op)))
+return XEXP (op, 0);
+  return op;
+}
+
 /* Helper function for arm_rtx_costs.  If the operand is a valid shift
operand, then return the operand that is being shifted.  If the shift
is not by a constant, then set SHIFT_REG to point to the operand.
@@ -10253,8 +10267,41 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	  return true;
 	}
 
+	  rtx op0 = XEXP (x, 0);
+	  rtx op1 = XEXP (x, 1);
+
+	  /* Handle a side effect of adding in the carry to an addition.  */
+	  if (GET_CODE (op0) == PLUS
+	  && arm_carry_operation (op1, mode))
+	{
+	  op1 = XEXP (op0, 1);
+	  op0 = XEXP (op0, 0);
+	}
+	  else if (GET_CODE (op1) == PLUS
+		   && arm_carry_operation (op0, mode))
+	{
+	  op0 = XEXP (op1, 0);
+	  op1 = XEXP (op1, 1);
+	}
+	  else if (GET_CODE (op0) == PLUS)
+	{
+	  op0 = strip_carry_operation (op0);
+	  if (swap_commutative_operands_p (op0, op1))
+		std::swap (op0, op1);
+	}
+
+	  if (arm_carry_operation (op0, mode))
+	{
+	  /* Adding the carry to a register is a canonicalization of
+		 adding 0 to the register plus the carry.  */
+	  if (speed_p)
+		*cost += extra_cost->alu.arith;
+	  *cost += rtx_cost (op1, mode, PLUS, 1, speed_p);
+	  return true;
+	}
+
 	  shift_reg = NULL;
-	  shift_op = shifter_op_p (XEXP (x, 0), &shift_reg);
+	  shift_op = shifter_op_p (op0, &shift_reg);
 	  if (shift_op != NULL)
 	{
 	  if (shift_reg)
@@ -10267,12 +10314,13 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 		*cost += extra_cost->alu.arith_shift;
 
 	  *cost += (rtx_cost (shift_op, mode, ASHIFT, 0, speed_p)
-			+ rtx_cost (XEXP (x, 1), mode, PLUS, 1, speed_p));
+			+ rtx_cost (op1, mode, PLUS, 1, speed_p));
 	  return true;
 	}
-	  if (GET_CODE (XEXP (x, 0)) == MULT)
+
+	  if (GET_CODE (op0) == MULT)
 	{
-	  rtx mul_op = XEXP (x, 0);
+	  rtx mul_op = op0;
 
 	  if (TARGET_DSP_MULTIPLY
 		  && ((GET_CODE (XEXP (mul_op, 0)) == SIGN_EXTEND
@@ -10296,7 +10344,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
   SIGN_EXTEND, 0, speed_p)
 			+ rtx_cost (XEXP (XEXP (mul_op, 1), 0), mode,
 	SIGN_EXTEND, 0, speed_p)
-			+ rtx_cost (XEXP (x, 1), mode, PLUS, 1, speed_p));
+			+ rtx_cost (op1, mode, PLUS, 1, speed_p));
 		  return true;
 		}
 
@@ -10304,24 +10352,30 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 		*cost += extra_cost->mult[0].add;
 	  *cost += (rtx_cost (XEXP (mul_op, 0), mode, MULT, 0, speed_p)
 			+ rtx_cost (XEXP (mul_op, 1), mode, MULT, 1, speed_p)
-			+ rtx_cost (XEXP (x, 1), mode, PLUS, 1, speed_p));
+			+ rtx_cost (op1, mode, PLUS, 1, speed_p));
 	  return true;
 	}
-	  if (CONST_INT_P (XEXP (x, 1)))
+
+	  if (CONST_INT_P (op1))
 	{
 	  int insns = arm_gen_constant (PLUS, SImode, NULL_RTX,
-	INTVAL (XEXP (x, 1)), NULL_RTX,
+	INTVAL (op1), NULL_RTX,
 	NULL_RTX, 1, 0);
 	  *cost = COSTS_N_INSNS (insns);
 	  if (speed_p)
 		*cost += insns * extra_cost->alu.arith;
-	  *cost += rtx_cost (XEXP (x, 0), mode, PLUS, 0, speed_p);
+	  *cost += rtx_cost (op0, mode, PLUS, 0, speed_p);
 	  return true;
 	}
-	  else if (speed_p)
+
+	  if (speed_p)
 	*cost += extra_cost->alu.arith;
 
-	  return false;
+	  /* Don't recurse here because we want to test the operands
+	 without any carry operation.  */
+	  *cost += rtx_cost (op0, mode, PLUS, 0, speed_p);
+	  *cost += rtx_cost (op1, mode, PLUS, 1, speed_p);
+	  return true;
 	}
 
   if 

[PATCH 11/29] [arm] Reduce cost of insns that are simple reg-reg moves.

2019-10-18 Thread Richard Earnshaw

Consider this sequence during combine:

Trying 18, 7 -> 22:
   18: r118:SI=r122:SI
  REG_DEAD r122:SI
7: r114:SI=0x1-r118:SI-ltu(cc:CC_RSB,0)
  REG_DEAD r118:SI
  REG_DEAD cc:CC_RSB
   22: r1:SI=r114:SI
  REG_DEAD r114:SI
Failed to match this instruction:
(set (reg:SI 1 r1 [+4 ])
(minus:SI (geu:SI (reg:CC_RSB 100 cc)
(const_int 0 [0]))
(reg:SI 122)))
Successfully matched this instruction:
(set (reg:SI 114)
(geu:SI (reg:CC_RSB 100 cc)
(const_int 0 [0])))
Successfully matched this instruction:
(set (reg:SI 1 r1 [+4 ])
(minus:SI (reg:SI 114)
(reg:SI 122)))
allowing combination of insns 18, 7 and 22
original costs 4 + 4 + 4 = 12
replacement costs 8 + 4 = 12

The costs are all correct, but we really don't want this combination
to take place.  The original costs contain an insn that is a simple
move of one pseudo register to another and it is extremely likely that
register allocation will eliminate this insn entirely.  On the other
hand, the resulting sequence really does expand into a sequence that
costs 12 (ie 3 insns).

We don't want to prevent combine from eliminating such moves, as this
can expose more combine opportunities, but we shouldn't rate them as
profitable in themselves.  We can do this by adjusting the costs
slightly so that the benefit of eliminating such a simple insn is
reduced.

We only do this before register allocation; after allocation we give
such insns their full cost.
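
The arithmetic behind this can be sketched with a small model. It assumes GCC's convention that one instruction costs 4 units (COSTS_N_INSNS (1) == 4) and that combine rejects a replacement whose total cost exceeds the original; toy_insn_cost and toy_combination_allowed are hypothetical stand-ins, not the functions in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* GCC expresses costs in quarter-insn units: one insn costs 4.  */
#define TOY_COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical stand-in for the per-insn cost described above: a
   simple reg-reg move is rated at half an insn before register
   allocation, and an unknown (zero) pattern cost is rated as 2 insns.  */
static int
toy_insn_cost (bool is_reg_reg_move, bool before_regalloc, int pattern_cost)
{
  if (is_reg_reg_move && before_regalloc)
    return 2;
  return pattern_cost ? pattern_cost : TOY_COSTS_N_INSNS (2);
}

/* Assumed acceptance rule: keep a replacement only if it is not more
   expensive than what it replaces.  */
static bool
toy_combination_allowed (int original_cost, int replacement_cost)
{
  return replacement_cost <= original_cost;
}
```

With the move rated at 2, the original sequence in the example costs 2 + 4 + 4 = 10 against a replacement cost of 12, so the combination is no longer considered profitable.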

* config/arm/arm.c (arm_insn_cost): New function.
(TARGET_INSN_COST): Override default definition.
---
 gcc/config/arm/arm.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b91b52f6d51..e33b6b14d28 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -181,6 +181,7 @@ static bool arm_have_conditional_execution (void);
 static bool arm_cannot_force_const_mem (machine_mode, rtx);
 static bool arm_legitimate_constant_p (machine_mode, rtx);
 static bool arm_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static int arm_insn_cost (rtx_insn *, bool);
 static int arm_address_cost (rtx, machine_mode, addr_space_t, bool);
 static int arm_register_move_cost (machine_mode, reg_class_t, reg_class_t);
 static int arm_memory_move_cost (machine_mode, reg_class_t, bool);
@@ -510,6 +511,8 @@ static const struct attribute_spec arm_attribute_table[] =
 #define TARGET_RTX_COSTS arm_rtx_costs
 #undef  TARGET_ADDRESS_COST
 #define TARGET_ADDRESS_COST arm_address_cost
+#undef TARGET_INSN_COST
+#define TARGET_INSN_COST arm_insn_cost
 
 #undef TARGET_SHIFT_TRUNCATION_MASK
 #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask
@@ -11486,6 +11489,24 @@ arm_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
   return result;
 }
 
+static int
+arm_insn_cost (rtx_insn *insn, bool speed)
+{
+  int cost;
+
+  /* Don't cost a simple reg-reg move at a full insn cost: such moves
+ will likely disappear during register allocation.  */
+  if (!reload_completed
+  && GET_CODE (PATTERN (insn)) == SET
+  && REG_P (SET_DEST (PATTERN (insn)))
+  && REG_P (SET_SRC (PATTERN (insn))))
+return 2;
+  cost = pattern_cost (PATTERN (insn), speed);
+  /* If the cost is zero, then it's likely a complex insn.  We don't want the
+ cost of these to be less than something we know about.  */
+  return cost ? cost : COSTS_N_INSNS (2);
+}
+
 /* All address computations that can be done are free, but rtx cost returns
the same for practically all of them.  So we weight the different types
of address here in the order (most pref first):


[PATCH 13/29] [arm] Add alternative canonicalizations for subtract-with-carry + shift

2019-10-18 Thread Richard Earnshaw

This patch adds a couple of alternative canonicalizations to allow
combine to match a subtract-with-carry operation when one of the operands
is shifted first.  The most common case of this is when combining a
sign-extend of one operand with a long-long value during subtraction.
The RSC variant is only enabled for Arm, the SBC variant for any 32-bit
compilation.
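
The common case mentioned above, subtracting a sign-extended 32-bit value from a 64-bit one, can be modelled as follows. The high word of the sign-extended operand is just the low word shifted arithmetically right by 31, which is what lets the high-part subtraction become a single SBC with a shifted operand. The function name is hypothetical and this is only a sketch of the arithmetic, not of the generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Compute X - (int64_t) Y using only 32-bit operations.  */
static uint64_t
sub64_sext32_by_halves (uint64_t x, int32_t y)
{
  uint32_t x_lo = (uint32_t) x;
  uint32_t x_hi = (uint32_t) (x >> 32);
  uint32_t y_lo = (uint32_t) y;
  /* High word of the sign-extended Y; on Arm this is "y, asr #31".  */
  uint32_t y_hi = y < 0 ? UINT32_MAX : 0;

  uint32_t lo = x_lo - y_lo;		/* SUBS */
  unsigned borrow = x_lo < y_lo;	/* Borrow out of the low word.  */
  uint32_t hi = x_hi - y_hi - borrow;	/* SBC with shifted operand.  */
  return ((uint64_t) hi << 32) | lo;
}
```

For example, subtracting -3 from 5 gives 8: the low word wraps and borrows, and the borrow cancels against the all-ones high word of the sign-extended operand.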

* config/arm/arm.md (subsi3_carryin_shift_alt): New pattern.
(rsbsi3_carryin_shift_alt): Likewise.
---
 gcc/config/arm/arm.md | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 74f417fbe4b..613f50ae5f0 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1048,6 +1048,23 @@ (define_insn "*subsi3_carryin_shift"
 (const_string "alu_shift_reg")))]
 )
 
+(define_insn "*subsi3_carryin_shift_alt"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(minus:SI (minus:SI
+		   (match_operand:SI 1 "s_register_operand" "r")
+		   (match_operand:SI 5 "arm_borrow_operation" ""))
+		  (match_operator:SI 2 "shift_operator"
+		   [(match_operand:SI 3 "s_register_operand" "r")
+		(match_operand:SI 4 "reg_or_int_operand" "rM")])))]
+  "TARGET_32BIT"
+  "sbc%?\\t%0, %1, %3%S2"
+  [(set_attr "conds" "use")
+   (set_attr "predicable" "yes")
+   (set (attr "type") (if_then_else (match_operand 4 "const_int_operand" "")
+(const_string "alu_shift_imm")
+(const_string "alu_shift_reg")))]
+)
+
 (define_insn "*rsbsi3_carryin_shift"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(minus:SI (minus:SI
@@ -1065,6 +1082,23 @@ (define_insn "*rsbsi3_carryin_shift"
 		  (const_string "alu_shift_reg")))]
 )
 
+(define_insn "*rsbsi3_carryin_shift_alt"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(minus:SI (minus:SI
+		   (match_operator:SI 2 "shift_operator"
+		[(match_operand:SI 3 "s_register_operand" "r")
+		 (match_operand:SI 4 "reg_or_int_operand" "rM")])
+		(match_operand:SI 5 "arm_borrow_operation" ""))
+		  (match_operand:SI 1 "s_register_operand" "r")))]
+  "TARGET_ARM"
+  "rsc%?\\t%0, %1, %3%S2"
+  [(set_attr "conds" "use")
+   (set_attr "predicable" "yes")
+   (set (attr "type") (if_then_else (match_operand 4 "const_int_operand" "")
+		  (const_string "alu_shift_imm")
+		  (const_string "alu_shift_reg")))]
+)
+
 ; transform ((x << y) - 1) to ~(~(x-1) << y)  Where X is a constant.
 (define_split
   [(set (match_operand:SI 0 "s_register_operand" "")


[PATCH 27/29] [arm] Early expansion of subvdi4

2019-10-18 Thread Richard Earnshaw

This patch adds early expansion of subvdi4.  The expansion sequence
is broadly based on the expansion of usubvdi4.

* config/arm/arm.md (subvdi4): Decompose calculation into 32-bit
operations.
(subdi3_compare1): Delete pattern.
(subvsi3_borrow): New insn pattern.
(subvsi3_borrow_imm): Likewise.
---
 gcc/config/arm/arm.md | 131 --
 1 file changed, 114 insertions(+), 17 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 05b735cfccd..5a8175ff8b0 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1395,12 +1395,79 @@ (define_expand "subvsi4"
 
 (define_expand "subvdi4"
   [(match_operand:DI 0 "s_register_operand")
-   (match_operand:DI 1 "s_register_operand")
-   (match_operand:DI 2 "s_register_operand")
+   (match_operand:DI 1 "reg_or_int_operand")
+   (match_operand:DI 2 "reg_or_int_operand")
(match_operand 3 "")]
   "TARGET_32BIT"
 {
-  emit_insn (gen_subdi3_compare1 (operands[0], operands[1], operands[2]));
+  rtx lo_result, hi_result;
+  rtx lo_op1, hi_op1, lo_op2, hi_op2;
+  lo_result = gen_lowpart (SImode, operands[0]);
+  hi_result = gen_highpart (SImode, operands[0]);
+  machine_mode mode = CCmode;
+
+  if (CONST_INT_P (operands[1]) && CONST_INT_P (operands[2]))
+{
+  /* If both operands are constants we can decide the result statically.  */
+  wi::overflow_type overflow;
+  wide_int val = wi::sub (rtx_mode_t (operands[1], DImode),
+			  rtx_mode_t (operands[2], DImode),
+			  SIGNED, &overflow);
+  emit_move_insn (operands[0], GEN_INT (val.to_shwi ()));
+  if (overflow != wi::OVF_NONE)
+	emit_jump_insn (gen_jump (operands[3]));
+  DONE;
+}
+  else if (CONST_INT_P (operands[1]))
+{
+  arm_decompose_di_binop (operands[2], operands[1], &lo_op2, &hi_op2,
+			  &lo_op1, &hi_op1);
+  if (const_ok_for_arm (INTVAL (lo_op1)))
+	{
+	  emit_insn (gen_rsb_imm_compare (lo_result, lo_op1, lo_op2,
+	  GEN_INT (~UINTVAL (lo_op1))));
+	  /* We could potentially use RSC here in Arm state, but not
+	 in Thumb, so it's probably not worth the effort of handling
+	 this.  */
+	  hi_op1 = force_reg (SImode, hi_op1);
+	  mode = CC_RSBmode;
+	  goto highpart;
+	}
+  operands[1] = force_reg (DImode, operands[1]);
+}
+
+  arm_decompose_di_binop (operands[1], operands[2], &lo_op1, &hi_op1,
+			  &lo_op2, &hi_op2);
+  if (lo_op2 == const0_rtx)
+{
+  emit_move_insn (lo_result, lo_op1);
+  if (!arm_add_operand (hi_op2, SImode))
+hi_op2 = force_reg (SImode, hi_op2);
+  emit_insn (gen_subvsi4 (hi_result, hi_op1, hi_op2, operands[3]));
+  DONE;
+}
+
+  if (CONST_INT_P (lo_op2) && !arm_addimm_operand (lo_op2, SImode))
+lo_op2 = force_reg (SImode, lo_op2);
+  if (CONST_INT_P (lo_op2))
+emit_insn (gen_cmpsi2_addneg (lo_result, lo_op1, lo_op2,
+  GEN_INT (-INTVAL (lo_op2))));
+  else
+emit_insn (gen_subsi3_compare1 (lo_result, lo_op1, lo_op2));
+
+ highpart:
+  if (!arm_not_operand (hi_op2, SImode))
+hi_op2 = force_reg (SImode, hi_op2);
+  rtx ccreg = gen_rtx_REG (mode, CC_REGNUM);
+  if (CONST_INT_P (hi_op2))
+emit_insn (gen_subvsi3_borrow_imm (hi_result, hi_op1, hi_op2,
+   gen_rtx_LTU (SImode, ccreg, const0_rtx),
+   gen_rtx_LTU (DImode, ccreg,
+		const0_rtx)));
+  else
+emit_insn (gen_subvsi3_borrow (hi_result, hi_op1, hi_op2,
+   gen_rtx_LTU (SImode, ccreg, const0_rtx),
+   gen_rtx_LTU (DImode, ccreg, const0_rtx)));
   arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
 
   DONE;
@@ -1523,20 +1590,6 @@ (define_expand "usubvdi4"
   DONE;
 })
 
-(define_insn "subdi3_compare1"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
-	  (match_operand:DI 1 "s_register_operand" "r")
-	  (match_operand:DI 2 "s_register_operand" "r")))
-   (set (match_operand:DI 0 "s_register_operand" "=")
-	(minus:DI (match_dup 1) (match_dup 2)))]
-  "TARGET_32BIT"
-  "subs\\t%Q0, %Q1, %Q2;sbcs\\t%R0, %R1, %R2"
-  [(set_attr "conds" "set")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_insn "subsi3_compare1"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC
@@ -2016,6 +2069,50 @@ (define_insn "usubvsi3_borrow_imm"
(set_attr "type" "alus_imm")]
 )
 
+(define_insn "subvsi3_borrow"
+  [(set (reg:CC_V CC_REGNUM)
+	(compare:CC_V
+	 (minus:DI
+	  (minus:DI
+	   (sign_extend:DI (match_operand:SI 1 "s_register_operand" "0,r"))
+	   (sign_extend:DI (match_operand:SI 2 "s_register_operand" "l,r")))
+	  (match_operand:DI 4 "arm_borrow_operation" ""))
+	 (sign_extend:DI
+	  (minus:SI (minus:SI (match_dup 1) (match_dup 2))
+		(match_operand:SI 3 "arm_borrow_operation" "")))))
+   (set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(minus:SI (minus:SI (match_dup 1) (match_dup 2))
+		  (match_dup 3)))]
+  "TARGET_32BIT"
+  "sbcs%?\\t%0, %1, %2"
+  [(set_attr "conds" "set")
+   (set_attr "arch" "t2,*")
+   (set_attr "length" "2,4")]
+)
+
+(define_insn "subvsi3_borrow_imm"
+  [(set (reg:CC_V 

[PATCH 23/29] [arm] Early split addvdi4

2019-10-18 Thread Richard Earnshaw

This patch adds early splitting for addvdi4; it's very similar to the
uaddvdi4 splitter, but the details are just different enough in
places, especially for the patterns that match the splitting, where we
have to compare against the non-widened version to detect if overflow
occurred.
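
The "compare against the non-widened version" idea can be sketched in plain C. The high-word sum is computed twice: once in 64 bits from the sign-extended inputs plus the carry, and once in 32 bits. Signed overflow occurred exactly when the two disagree after sign-extending the 32-bit result. The names sext32 and addv64_by_halves are hypothetical, and the final conversion back to int64_t assumes the usual two's-complement behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend a 32-bit pattern to 64 bits without relying on
   implementation-defined narrowing conversions.  */
static int64_t
sext32 (uint32_t v)
{
  return (int64_t) v - ((v & 0x80000000u) ? ((int64_t) 1 << 32) : 0);
}

/* Signed 64-bit add by halves; returns true on signed overflow.  */
static bool
addv64_by_halves (int64_t x, int64_t y, int64_t *result)
{
  uint32_t x_lo = (uint32_t) (uint64_t) x;
  uint32_t y_lo = (uint32_t) (uint64_t) y;
  uint32_t x_hi = (uint32_t) ((uint64_t) x >> 32);
  uint32_t y_hi = (uint32_t) ((uint64_t) y >> 32);

  uint32_t lo = x_lo + y_lo;		/* ADDS */
  unsigned carry = lo < x_lo;		/* Carry out of the low word.  */
  uint32_t hi = x_hi + y_hi + carry;	/* ADCS */

  /* Widened sum of the sign-extended high words plus the carry.  */
  int64_t wide = sext32 (x_hi) + sext32 (y_hi) + carry;
  /* Overflow iff the sign-extended 32-bit result differs from it.  */
  bool overflow = wide != sext32 (hi);

  /* Assumes two's-complement wrap-around on conversion.  */
  *result = (int64_t) (((uint64_t) hi << 32) | lo);
  return overflow;
}
```

Only the high-word step needs the overflow check; the low-word step contributes nothing but its carry.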

I've also added a testcase to the testsuite for a couple of constants
that caught me out during the development of this patch.  They're
probably arm-specific values, but the test is generic enough that I've
included it for all targets.

[gcc]
* config/arm/arm.c (arm_select_cc_mode): Allow either the first
or second operand of the PLUS inside a DImode equality test to be
sign-extend when selecting CC_Vmode.
* config/arm/arm.md (addvdi4): Early-split the operation into SImode
instructions.
(addsi3_cin_vout_reg, addsi3_cin_vout_imm, addsi3_cin_vout_0): New
expand patterns.
(addsi3_cin_vout_reg_insn, addsi3_cin_vout_imm_insn): New patterns.
(addsi3_cin_vout_0): Likewise.
(adddi3_compareV): Delete.

[gcc/testsuite]
* gcc.dg/builtin-arith-overflow-3.c: New test.
---
 gcc/config/arm/arm.c  |   3 +-
 gcc/config/arm/arm.md | 181 --
 .../gcc.dg/builtin-arith-overflow-3.c |  41 
 3 files changed, 203 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-arith-overflow-3.c

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 638c82df25f..c9abbb0f91d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15414,7 +15414,8 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   if (GET_MODE (x) == DImode
   && (op == EQ || op == NE)
   && GET_CODE (x) == PLUS
-  && GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+  && (GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
+	  || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND)
   && GET_CODE (y) == SIGN_EXTEND
   && GET_CODE (XEXP (y, 0)) == PLUS)
 return CC_Vmode;
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index be002f77382..e9e0ca925d2 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -505,18 +505,173 @@ (define_expand "addvsi4"
 })
 
 (define_expand "addvdi4"
-  [(match_operand:DI 0 "register_operand")
-   (match_operand:DI 1 "register_operand")
-   (match_operand:DI 2 "register_operand")
+  [(match_operand:DI 0 "s_register_operand")
+   (match_operand:DI 1 "s_register_operand")
+   (match_operand:DI 2 "reg_or_int_operand")
(match_operand 3 "")]
   "TARGET_32BIT"
 {
-  emit_insn (gen_adddi3_compareV (operands[0], operands[1], operands[2]));
-  arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
+  rtx lo_result, hi_result;
+  rtx lo_op1, hi_op1, lo_op2, hi_op2;
+  arm_decompose_di_binop (operands[1], operands[2], &lo_op1, &hi_op1,
+			  &lo_op2, &hi_op2);
+  lo_result = gen_lowpart (SImode, operands[0]);
+  hi_result = gen_highpart (SImode, operands[0]);
+
+  if (lo_op2 == const0_rtx)
+{
+  emit_move_insn (lo_result, lo_op1);
+  if (!arm_add_operand (hi_op2, SImode))
+	hi_op2 = force_reg (SImode, hi_op2);
+
+  emit_insn (gen_addvsi4 (hi_result, hi_op1, hi_op2, operands[3]));
+}
+  else
+{
+  if (!arm_add_operand (lo_op2, SImode))
+	lo_op2 = force_reg (SImode, lo_op2);
+  if (!arm_not_operand (hi_op2, SImode))
+	hi_op2 = force_reg (SImode, hi_op2);
+
+  emit_insn (gen_addsi3_compare_op1 (lo_result, lo_op1, lo_op2));
+
+  if (hi_op2 == const0_rtx)
+emit_insn (gen_addsi3_cin_vout_0 (hi_result, hi_op1));
+  else if (CONST_INT_P (hi_op2))
+emit_insn (gen_addsi3_cin_vout_imm (hi_result, hi_op1, hi_op2));
+  else
+emit_insn (gen_addsi3_cin_vout_reg (hi_result, hi_op1, hi_op2));
+
+  arm_gen_unlikely_cbranch (NE, CC_Vmode, operands[3]);
+}
 
   DONE;
 })
 
+(define_expand "addsi3_cin_vout_reg"
+  [(parallel
+[(set (match_dup 3)
+	  (compare:CC_V
+	   (plus:DI
+	(plus:DI (match_dup 4)
+		 (sign_extend:DI (match_operand:SI 1 "s_register_operand")))
+	(sign_extend:DI (match_operand:SI 2 "s_register_operand")))
+	   (sign_extend:DI (plus:SI (plus:SI (match_dup 5) (match_dup 1))
+(match_dup 2)))))
+ (set (match_operand:SI 0 "s_register_operand")
+	  (plus:SI (plus:SI (match_dup 5) (match_dup 1))
+		   (match_dup 2)))])]
+  "TARGET_32BIT"
+  {
+operands[3] = gen_rtx_REG (CC_Vmode, CC_REGNUM);
+rtx ccin = gen_rtx_REG (CC_Cmode, CC_REGNUM);
+operands[4] = gen_rtx_LTU (DImode, ccin, const0_rtx);
+operands[5] = gen_rtx_LTU (SImode, ccin, const0_rtx);
+  }
+)
+
+(define_insn "*addsi3_cin_vout_reg_insn"
+  [(set (reg:CC_V CC_REGNUM)
+	(compare:CC_V
+	 (plus:DI
+	  (plus:DI
+	   (match_operand:DI 3 "arm_carry_operation" "")
+	   (sign_extend:DI (match_operand:SI 1 "s_register_operand" "%0,r")))
+	  (sign_extend:DI (match_operand:SI 2 "s_register_operand" "l,r")))
+	 (sign_extend:DI
+	  (plus:SI (plus:SI (match_operand:SI 4 "arm_carry_operation" "")
+			

[PATCH 12/29] [arm] Implement negscc using SBC when appropriate.

2019-10-18 Thread Richard Earnshaw

When the carry flag is appropriately set by a comparison, negscc
patterns can expand into a simple SBC of a register with itself.  This
means we can convert two conditional instructions into a single
non-conditional instruction.  Furthermore, in Thumb2 we can avoid the
need for an IT instruction as well.  This patch also fixes the remaining
testcase that we initially XFAILed in the first patch of this series.
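
The trick is that subtracting a register from itself with borrow leaves only the negated borrow: 0 when there was no borrow and all ones when there was. The sketch below models that for the two-instruction sequence quoted in the negdi-3.c expected output (RSBS followed by SBC of a register with itself), on the assumption that the function under test negates a zero-extended 32-bit value. The names sbc32 and neg_zext32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "sbc rd, rn, rm": rn - rm - borrow, where borrow is the
   inverse of the Arm carry flag.  */
static uint32_t
sbc32 (uint32_t rn, uint32_t rm, unsigned borrow)
{
  assert (borrow <= 1);
  return rn - rm - borrow;
}

/* -(int64_t) X for a zero-extended 32-bit X, in two 32-bit steps.  */
static uint64_t
neg_zext32 (uint32_t x)
{
  uint32_t lo = 0u - x;		/* RSBS lo, x, #0 */
  unsigned borrow = x != 0;	/* 0 - x borrows unless x == 0.  */
  /* Both source operands are the same register, so its (arbitrary)
     value cancels and only -borrow remains.  */
  uint32_t any = 0x12345678u;
  uint32_t hi = sbc32 (any, any, borrow);	/* SBC hi, hi, hi */
  return ((uint64_t) hi << 32) | lo;
}
```

Because the result does not depend on the register's previous contents, no conditional move pair (and in Thumb2 no IT block) is needed.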

gcc:
* config/arm/arm.md (negscc_borrow): New pattern.
(mov_negscc): Don't split if the insn would match negscc_borrow.
* config/arm/thumb2.md (thumb2_mov_negscc): Likewise.
(thumb2_mov_negscc_strict_it): Likewise.

testsuite:
* gcc.target/arm/negdi-3.c: Remove XFAIL markers.
---
 gcc/config/arm/arm.md  | 14 --
 gcc/config/arm/thumb2.md   |  8 ++--
 gcc/testsuite/gcc.target/arm/negdi-3.c |  8 
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f53dbc27207..74f417fbe4b 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6612,13 +6612,23 @@ (define_insn_and_split "*mov_scc"
(set_attr "type" "multiple")]
 )
 
+(define_insn "*negscc_borrow"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(neg:SI (match_operand:SI 1 "arm_borrow_operation" "")))]
+  "TARGET_32BIT"
+  "sbc\\t%0, %0, %0"
+  [(set_attr "conds" "use")
+   (set_attr "length" "4")
+   (set_attr "type" "adc_reg")]
+)
+
 (define_insn_and_split "*mov_negscc"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(neg:SI (match_operator:SI 1 "arm_comparison_operator_mode"
 		 [(match_operand 2 "cc_register" "") (const_int 0)])))]
-  "TARGET_ARM"
+  "TARGET_ARM && !arm_borrow_operation (operands[1], SImode)"
   "#"   ; "mov%D1\\t%0, #0\;mvn%d1\\t%0, #0"
-  "TARGET_ARM"
+  "&& true"
   [(set (match_dup 0)
 (if_then_else:SI (match_dup 1)
  (match_dup 3)
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 6ccc875e2b4..8d0b6be9205 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -368,7 +368,9 @@ (define_insn_and_split "*thumb2_mov_negscc"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(neg:SI (match_operator:SI 1 "arm_comparison_operator_mode"
 		 [(match_operand 2 "cc_register" "") (const_int 0)])))]
-  "TARGET_THUMB2 && !arm_restrict_it"
+  "TARGET_THUMB2
+   && !arm_restrict_it
+   && !arm_borrow_operation (operands[1], SImode)"
   "#"   ; "ite\\t%D1\;mov%D1\\t%0, #0\;mvn%d1\\t%0, #0"
   "&& true"
   [(set (match_dup 0)
@@ -387,7 +389,9 @@ (define_insn_and_split "*thumb2_mov_negscc_strict_it"
   [(set (match_operand:SI 0 "low_register_operand" "=l")
 	(neg:SI (match_operator:SI 1 "arm_comparison_operator_mode"
 		 [(match_operand 2 "cc_register" "") (const_int 0)])))]
-  "TARGET_THUMB2 && arm_restrict_it"
+  "TARGET_THUMB2
+   && arm_restrict_it
+   && !arm_borrow_operation (operands[1], SImode)"
   "#"   ; ";mvn\\t%0, #0 ;it\\t%D1\;mov%D1\\t%0, #0\"
   "&& reload_completed"
   [(set (match_dup 0)
diff --git a/gcc/testsuite/gcc.target/arm/negdi-3.c b/gcc/testsuite/gcc.target/arm/negdi-3.c
index 3f6f2d1c2bb..76ddf49fc0d 100644
--- a/gcc/testsuite/gcc.target/arm/negdi-3.c
+++ b/gcc/testsuite/gcc.target/arm/negdi-3.c
@@ -11,7 +11,7 @@ Expected output:
 rsbs r0, r0, #0
 sbc r1, r1, r1
 */
-/* { dg-final { scan-assembler-times "rsb" 1 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times "sbc" 1 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times "mov" 0 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times "rsc" 0 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times "rsb" 1 } } */
+/* { dg-final { scan-assembler-times "sbc" 1 } } */
+/* { dg-final { scan-assembler-times "mov" 0 } } */
+/* { dg-final { scan-assembler-times "rsc" 0 } } */


[PATCH 10/29] [arm] Correct cost calculations involving borrow for subtracts.

2019-10-18 Thread Richard Earnshaw

The rtx_cost calculations for subtractions that involve a borrow
operation were incorrect.  The borrow is free as part of the
subtract-with-carry instructions.  This patch recognizes the various
idioms that can describe this and returns the correct costs.

* config/arm/arm.c (arm_rtx_costs_internal, case MINUS): Handle
borrow operations.
---
 gcc/config/arm/arm.c | 49 +---
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dfbd5cde5eb..b91b52f6d51 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10049,15 +10049,46 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	  rtx shift_by_reg = NULL;
 	  rtx shift_op;
 	  rtx non_shift_op;
+	  rtx op0 = XEXP (x, 0);
+	  rtx op1 = XEXP (x, 1);
 
-	  shift_op = shifter_op_p (XEXP (x, 0), &shift_by_reg);
+	  /* Factor out any borrow operation.  There's more than one way
+	 of expressing this; try to recognize them all.  */
+	  if (GET_CODE (op0) == MINUS)
+	{
+	  if (arm_borrow_operation (op1, SImode))
+		{
+		  op1 = XEXP (op0, 1);
+		  op0 = XEXP (op0, 0);
+		}
+	  else if (arm_borrow_operation (XEXP (op0, 1), SImode))
+		op0 = XEXP (op0, 0);
+	}
+	  else if (GET_CODE (op1) == PLUS
+		   && arm_borrow_operation (XEXP (op1, 0), SImode))
+	op1 = XEXP (op1, 0);
+	  else if (GET_CODE (op0) == NEG
+		   && arm_borrow_operation (op1, SImode))
+	{
+	  /* Negate with carry-in.  For Thumb2 this is done with
+		 SBC R, X, X lsl #1 (ie X - 2X - C) as Thumb lacks the
+		 RSC instruction that exists in Arm mode.  */
+	  if (speed_p)
+		*cost += (TARGET_THUMB2
+			  ? extra_cost->alu.arith_shift
+			  : extra_cost->alu.arith);
+	  *cost += rtx_cost (XEXP (op0, 0), mode, MINUS, 0, speed_p);
+	  return true;
+	}
+
+	  shift_op = shifter_op_p (op0, &shift_by_reg);
 	  if (shift_op == NULL)
 	{
-	  shift_op = shifter_op_p (XEXP (x, 1), &shift_by_reg);
-	  non_shift_op = XEXP (x, 0);
+	  shift_op = shifter_op_p (op1, &shift_by_reg);
+	  non_shift_op = op0;
 	}
 	  else
-	non_shift_op = XEXP (x, 1);
+	non_shift_op = op1;
 
 	  if (shift_op != NULL)
 	{
@@ -10087,10 +10118,10 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	  return true;
 	}
 
-	  if (CONST_INT_P (XEXP (x, 0)))
+	  if (CONST_INT_P (op0))
 	{
 	  int insns = arm_gen_constant (MINUS, SImode, NULL_RTX,
-	INTVAL (XEXP (x, 0)), NULL_RTX,
+	INTVAL (op0), NULL_RTX,
 	NULL_RTX, 1, 0);
 	  *cost = COSTS_N_INSNS (insns);
 	  if (speed_p)
@@ -10101,7 +10132,11 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	  else if (speed_p)
 	*cost += extra_cost->alu.arith;
 
-	  return false;
+	  /* Don't recurse as we don't want to cost any borrow that
+	 we've stripped.  */
+	  *cost += rtx_cost (op0, mode, MINUS, 0, speed_p);
+	  *cost += rtx_cost (op1, mode, MINUS, 1, speed_p);
+	  return true;
 	}
 
   if (GET_MODE_CLASS (mode) == MODE_INT


[PATCH 25/29] [arm] Early expansion of usubvdi4.

2019-10-18 Thread Richard Earnshaw

This patch adds early expansion of usubvdi4, allowing us to handle some
constants in place, which previously we were unable to do.
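
For readers who want the arithmetic spelled out, the following is a
sketch in plain C of an unsigned 64-bit subtract-with-overflow built
from 32-bit pieces.  The function name is hypothetical and this is a
model of the idea, not the expander itself.  The high-word test mirrors
the compare in the new insn patterns, where the zero-extended first
operand is compared against the borrow plus the zero-extended second
operand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: returns true if a - b wraps (borrow out).  */
static bool
usubv_di_by_halves (uint64_t a, uint64_t b, uint64_t *res)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  uint32_t lo = a_lo - b_lo;          /* Low word subtract.  */
  uint32_t borrow_lo = a_lo < b_lo;   /* Borrow from the low word.  */

  /* Widen so that the borrow out of the high word stays visible.  */
  uint64_t rhs = (uint64_t) b_hi + borrow_lo;
  uint32_t hi = (uint32_t) (a_hi - rhs);
  bool borrow_out = (uint64_t) a_hi < rhs;

  *res = ((uint64_t) hi << 32) | lo;
  return borrow_out;
}
```

When the low word of the second operand is known to be zero there is no
borrow to propagate, which is why the expander can fall back to the
SImode pattern for the high word alone in that case.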

* config/arm/arm.md (usubvdi4): Allow registers or integers for
incoming operands.  Early split the calculation into SImode
operations.
(usubvsi3_borrow): New insn pattern.
(usubvsi3_borrow_imm): Likewise.
---
 gcc/config/arm/arm.md | 113 --
 1 file changed, 109 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a465bf8e7a3..92f1823cdfa 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1390,13 +1390,81 @@ (define_expand "usubvsi4"
 
 (define_expand "usubvdi4"
   [(match_operand:DI 0 "s_register_operand")
-   (match_operand:DI 1 "s_register_operand")
-   (match_operand:DI 2 "s_register_operand")
+   (match_operand:DI 1 "reg_or_int_operand")
+   (match_operand:DI 2 "reg_or_int_operand")
(match_operand 3 "")]
   "TARGET_32BIT"
 {
-  emit_insn (gen_subdi3_compare1 (operands[0], operands[1], operands[2]));
-  arm_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
+  rtx lo_result, hi_result;
+  rtx lo_op1, hi_op1, lo_op2, hi_op2;
+  lo_result = gen_lowpart (SImode, operands[0]);
+  hi_result = gen_highpart (SImode, operands[0]);
+  machine_mode mode = CCmode;
+
+  if (CONST_INT_P (operands[1]) && CONST_INT_P (operands[2]))
+{
+  /* If both operands are constants we can decide the result statically.  */
+  wi::overflow_type overflow;
+  wide_int val = wi::sub (rtx_mode_t (operands[1], DImode),
+			  rtx_mode_t (operands[2], DImode),
+			  UNSIGNED, &overflow);
+  emit_move_insn (operands[0], GEN_INT (val.to_shwi ()));
+  if (overflow != wi::OVF_NONE)
+	emit_jump_insn (gen_jump (operands[3]));
+  DONE;
+}
+  else if (CONST_INT_P (operands[1]))
+{
+  arm_decompose_di_binop (operands[2], operands[1], &lo_op2, &hi_op2,
+			  &lo_op1, &hi_op1);
+  if (const_ok_for_arm (INTVAL (lo_op1)))
+	{
+	  emit_insn (gen_rsb_imm_compare (lo_result, lo_op1, lo_op2,
+	  GEN_INT (~UINTVAL (lo_op1))));
+	  /* We could potentially use RSC here in Arm state, but not
+	 in Thumb, so it's probably not worth the effort of handling
+	 this.  */
+	  hi_op1 = force_reg (SImode, hi_op1);
+	  mode = CC_RSBmode;
+	  goto highpart;
+	}
+  operands[1] = force_reg (DImode, operands[1]);
+}
+
+  arm_decompose_di_binop (operands[1], operands[2], &lo_op1, &hi_op1,
+			  &lo_op2, &hi_op2);
+  if (lo_op2 == const0_rtx)
+{
+  emit_move_insn (lo_result, lo_op1);
+  if (!arm_add_operand (hi_op2, SImode))
+hi_op2 = force_reg (SImode, hi_op2);
+  emit_insn (gen_usubvsi4 (hi_result, hi_op1, hi_op2, operands[3]));
+  DONE;
+}
+
+  if (CONST_INT_P (lo_op2) && !arm_addimm_operand (lo_op2, SImode))
+lo_op2 = force_reg (SImode, lo_op2);
+  if (CONST_INT_P (lo_op2))
+emit_insn (gen_cmpsi2_addneg (lo_result, lo_op1, lo_op2,
+  GEN_INT (-INTVAL (lo_op2))));
+  else
+emit_insn (gen_subsi3_compare1 (lo_result, lo_op1, lo_op2));
+
+ highpart:
+  if (!arm_not_operand (hi_op2, SImode))
+hi_op2 = force_reg (SImode, hi_op2);
+  rtx ccreg = gen_rtx_REG (mode, CC_REGNUM);
+  if (CONST_INT_P (hi_op2))
+emit_insn (gen_usubvsi3_borrow_imm (hi_result, hi_op1, hi_op2,
+	GEN_INT (UINTVAL (hi_op2) & 0x),
+	gen_rtx_LTU (SImode, ccreg, const0_rtx),
+	gen_rtx_LTU (DImode, ccreg,
+		 const0_rtx)));
+  else
+emit_insn (gen_usubvsi3_borrow (hi_result, hi_op1, hi_op2,
+gen_rtx_LTU (SImode, ccreg, const0_rtx),
+gen_rtx_LTU (DImode, ccreg, const0_rtx)));
+  arm_gen_unlikely_cbranch (LTU, CC_Bmode, operands[3]);
 
   DONE;
 })
@@ -1825,6 +1893,43 @@ (define_insn "rscsi3_out_scratch"
(set_attr "type" "alus_imm")]
 )
 
+(define_insn "usubvsi3_borrow"
+  [(set (reg:CC_B CC_REGNUM)
+	(compare:CC_B
+	 (zero_extend:DI (match_operand:SI 1 "s_register_operand" "0,r"))
+	 (plus:DI (match_operand:DI 4 "arm_borrow_operation" "")
+	  (zero_extend:DI
+		   (match_operand:SI 2 "s_register_operand" "l,r")))))
+   (set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(minus:SI (match_dup 1)
+		  (plus:SI (match_operand:SI 3 "arm_borrow_operation" "")
+			   (match_dup 2))))]
+  "TARGET_32BIT"
+  "sbcs%?\\t%0, %1, %2"
+  [(set_attr "conds" "set")
+   (set_attr "arch" "t2,*")
+   (set_attr "length" "2,4")]
+)
+
+(define_insn "usubvsi3_borrow_imm"
+  [(set (reg:CC_B CC_REGNUM)
+	(compare:CC_B
+	 (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r,r"))
+	 (plus:DI (match_operand:DI 5 "arm_borrow_operation" "")
+		  (match_operand:DI 3 "const_int_operand" "n,n"))))
+   (set (match_operand:SI 0 "s_register_operand" "=r,r")
+	(minus:SI (match_dup 1)
+		  (plus:SI (match_operand:SI 4 "arm_borrow_operation" "")
+			   (match_operand:SI 2 "arm_adcimm_operand" "I,K"))))]
+  "TARGET_32BIT
+   && (UINTVAL (operands[2]) & 0x) == UINTVAL (operands[3])"
+  "@
+  sbcs%?\\t%0, 

[PATCH 24/29] [arm] Improve constant handling for usubvsi4.

2019-10-18 Thread Richard Earnshaw

This patch improves the expansion of usubvsi4 by allowing suitable
constants to be passed directly.  Unlike normal subtraction, either
operand may be a constant (and indeed I have seen cases where both can
be with LTO enabled).  One interesting testcase that improves as a
result of this is:

unsigned f6 (unsigned a)
{
  unsigned x;
  return __builtin_sub_overflow (5U, a, &x) ? 0 : x;
}

Which previously compiled to:

rsbs    r3, r0, #5
cmp r0, #5
movls   r0, r3
movhi   r0, #0

but now generates the optimal sequence:

rsbs    r0, r0, #5
movcc   r0, #0
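
For checking that both sequences compute the same thing, here is the
testcase next to a reference written without the builtin.  The name
f6_reference is made up for this sketch.  Since the reverse subtract
computes 5 - a, a clear carry flag means a borrow occurred (a > 5),
which is the case the conditional move zeroes.

```c
/* The testcase from above, with the result pointer spelled out.  */
unsigned
f6 (unsigned a)
{
  unsigned x;
  return __builtin_sub_overflow (5U, a, &x) ? 0 : x;
}

/* Hypothetical reference model: 5 - a, or 0 if that would wrap.  */
unsigned
f6_reference (unsigned a)
{
  return a > 5U ? 0 : 5U - a;
}
```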

* config/arm/arm.md (usubv4): Delete expansion.
(usubvsi4): New pattern.  Allow some immediate values for inputs.
(usubvdi4): New pattern.
---
 gcc/config/arm/arm.md | 46 ++-
 1 file changed, 41 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index e9e0ca925d2..a465bf8e7a3 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -1352,14 +1352,50 @@ (define_expand "subv4"
   DONE;
 })
 
-(define_expand "usubv4"
-  [(match_operand:SIDI 0 "register_operand")
-   (match_operand:SIDI 1 "register_operand")
-   (match_operand:SIDI 2 "register_operand")
+(define_expand "usubvsi4"
+  [(match_operand:SI 0 "s_register_operand")
+   (match_operand:SI 1 "arm_rhs_operand")
+   (match_operand:SI 2 "arm_add_operand")
(match_operand 3 "")]
   "TARGET_32BIT"
 {
-  emit_insn (gen_sub3_compare1 (operands[0], operands[1], operands[2]));
+  machine_mode mode = CCmode;
+  if (CONST_INT_P (operands[1]) && CONST_INT_P (operands[2]))
+{
+  /* If both operands are constants we can decide the result statically.  */
+  wi::overflow_type overflow;
+  wide_int val = wi::sub (rtx_mode_t (operands[1], SImode),
+			  rtx_mode_t (operands[2], SImode),
+			  UNSIGNED, &overflow);
+  emit_move_insn (operands[0], GEN_INT (val.to_shwi ()));
+  if (overflow != wi::OVF_NONE)
+	emit_jump_insn (gen_jump (operands[3]));
+  DONE;
+}
+  else if (CONST_INT_P (operands[2]))
+emit_insn (gen_cmpsi2_addneg (operands[0], operands[1], operands[2],
+  GEN_INT (-INTVAL (operands[2]))));
+  else if (CONST_INT_P (operands[1]))
+{
+  mode = CC_RSBmode;
+  emit_insn (gen_rsb_imm_compare (operands[0], operands[1], operands[2],
+  GEN_INT (~UINTVAL (operands[1]))));
+}
+  else
+emit_insn (gen_subsi3_compare1 (operands[0], operands[1], operands[2]));
+  arm_gen_unlikely_cbranch (LTU, mode, operands[3]);
+
+  DONE;
+})
+
+(define_expand "usubvdi4"
+  [(match_operand:DI 0 "s_register_operand")
+   (match_operand:DI 1 "s_register_operand")
+   (match_operand:DI 2 "s_register_operand")
+   (match_operand 3 "")]
+  "TARGET_32BIT"
+{
+  emit_insn (gen_subdi3_compare1 (operands[0], operands[1], operands[2]));
   arm_gen_unlikely_cbranch (LTU, CCmode, operands[3]);
 
   DONE;


[PATCH 16/29] [arm] early split most DImode comparison operations.

2019-10-18 Thread Richard Earnshaw

This patch does most of the work for early splitting the DImode
comparisons.  We now handle LT, GE, LTU and GEU during early
expansion, in addition to EQ and NE, for which the expansion has now
been reworked to use a standard conditional-compare pattern already in
the back-end.

To handle this we introduce two new condition flag modes that are used
when comparing the upper words of decomposed DImode values: one for
signed, and one for unsigned comparisons.  CC_Bmode (B for Borrow) is
essentially the inverse of CC_Cmode and is used when the carry flag is
set by a subtraction of unsigned values.
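
To make the unsigned case concrete, here is a sketch in plain C of a
64-bit LTU test done on 32-bit halves.  The function name is
hypothetical and this only models the arithmetic: the low words are
compared first, and the resulting borrow is added to the second
operand's high word before the final compare, which is the form that
arm_select_cc_mode maps to CC_Bmode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of x < y (unsigned) computed word by word.  */
static bool
ltu_di_by_halves (uint64_t x, uint64_t y)
{
  uint32_t x_lo = (uint32_t) x, x_hi = (uint32_t) (x >> 32);
  uint32_t y_lo = (uint32_t) y, y_hi = (uint32_t) (y >> 32);

  /* Compare of the low words; only the borrow is kept.  */
  uint32_t borrow_lo = x_lo < y_lo;

  /* High words: zero-extended x_hi against y_hi plus the borrow.  */
  return (uint64_t) x_hi < (uint64_t) y_hi + borrow_lo;
}
```

GEU is simply the negation of this result.  The signed LT/GE case has
the same structure but needs the N and V flags of the high-word
subtract rather than the borrow.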

* config/arm/arm-modes.def (CC_NV, CC_B): New CC modes.
* config/arm/arm.c (arm_select_cc_mode): Recognize constructs that
need these modes.
(arm_gen_dicompare_reg): New code to early expand the sub-operations
of EQ, NE, LT, GE, LTU and GEU.
* config/arm/iterators.md (CC_EXTEND): New code attribute.
* config/arm/predicates.md (arm_adcimm_operand): New predicate.
* config/arm/arm.md (cmpsi3_carryin_out): New pattern.
(cmpsi3_imm_carryin_out): Likewise.
(cmpsi3_0_carryin_out): Likewise.
---
 gcc/config/arm/arm-modes.def |   6 +
 gcc/config/arm/arm.c | 220 ++-
 gcc/config/arm/arm.md|  45 +++
 gcc/config/arm/iterators.md  |   4 +
 gcc/config/arm/predicates.md |   6 +
 5 files changed, 278 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index 4fa7f1b43e5..65cddf68cdb 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -34,12 +34,16 @@ ADJUST_FLOAT_FORMAT (HF, ((arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
CC_Cmode should be used if only the C flag is set correctly, after an
  addition.
CC_Nmode should be used if only the N (sign) flag is set correctly
+   CC_NVmode should be used if only the N and V bits are set correctly,
+ (used for signed comparisons when the carry is propagated in).
CC_CZmode should be used if only the C and Z flags are correct
(used for DImode unsigned comparisons).
CC_RSBmode should be used where the comparison is set by an RSB immediate,
  or NEG instruction.  The form of the comparison for (const - reg) will
  be (COMPARE (not (reg)) (~const)).
CC_NCVmode should be used if only the N, C, and V flags are correct
+   CC_Bmode should be used if only the C flag is correct after a subtract
+ (eg after an unsigned borrow with carry-in propagation).
(used for DImode signed comparisons).
CCmode should be used otherwise.  */
 
@@ -47,6 +51,7 @@ CC_MODE (CC_NOOV);
 CC_MODE (CC_Z);
 CC_MODE (CC_CZ);
 CC_MODE (CC_NCV);
+CC_MODE (CC_NV);
 CC_MODE (CC_SWP);
 CC_MODE (CC_RSB);
 CC_MODE (CCFP);
@@ -62,6 +67,7 @@ CC_MODE (CC_DLTU);
 CC_MODE (CC_DGEU);
 CC_MODE (CC_DGTU);
 CC_MODE (CC_C);
+CC_MODE (CC_B);
 CC_MODE (CC_N);
 CC_MODE (CC_V);
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ddfe4335169..99c8bd79d30 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15348,6 +15348,22 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   && (rtx_equal_p (XEXP (x, 0), y) || rtx_equal_p (XEXP (x, 1), y)))
 return CC_Cmode;
 
+  if (GET_MODE (x) == DImode
+  && (op == GE || op == LT)
+  && GET_CODE (x) == SIGN_EXTEND
+  && ((GET_CODE (y) == PLUS
+	   && arm_borrow_operation (XEXP (y, 0), DImode))
+	  || arm_borrow_operation (y, DImode)))
+return CC_NVmode;
+
+  if (GET_MODE (x) == DImode
+  && (op == GEU || op == LTU)
+  && GET_CODE (x) == ZERO_EXTEND
+  && ((GET_CODE (y) == PLUS
+	   && arm_borrow_operation (XEXP (y, 0), DImode))
+	  || arm_borrow_operation (y, DImode)))
+return CC_Bmode;
+
   if (GET_MODE (x) == DImode || GET_MODE (y) == DImode)
 {
   switch (op)
@@ -15410,16 +15426,198 @@ arm_select_cc_mode (enum rtx_code op, rtx x, rtx y)
 static rtx
 arm_gen_dicompare_reg (rtx_code code, rtx x, rtx y, rtx scratch)
 {
-  /* We don't currently handle DImode in thumb1, but rely on libgcc.  */
+  machine_mode mode;
+  rtx cc_reg;
+
+/* We don't currently handle DImode in thumb1, but rely on libgcc.  */
   gcc_assert (TARGET_32BIT);
 
+  rtx x_lo = simplify_gen_subreg (SImode, x, DImode,
+  subreg_lowpart_offset (SImode, DImode));
+  rtx x_hi = simplify_gen_subreg (SImode, x, DImode,
+  subreg_highpart_offset (SImode, DImode));
+  rtx y_lo = simplify_gen_subreg (SImode, y, DImode,
+  subreg_lowpart_offset (SImode, DImode));
+  rtx y_hi = simplify_gen_subreg (SImode, y, DImode,
+  subreg_highpart_offset (SImode, DImode));
+  switch (code)
+{
+case EQ:
+case NE:
+  {
+	/* We should never have X as a const_int in this case.  */
+	gcc_assert (!CONST_INT_P (x));
+
+	if (y_lo == const0_rtx || y_hi == const0_rtx)
+	  {
+	if (y_lo != const0_rtx)
+	  {
+		rtx scratch2 = scratch ? scratch : gen_reg_rtx (SImode);
+
+		gcc_assert 

[PATCH 03/29] [arm] Early split zero- and sign-extension

2019-10-18 Thread Richard Earnshaw

This patch changes the insn patterns for zero- and sign-extend into
define_expands that generate the appropriate word operations
immediately.
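
In word terms the two expansions amount to the following.  This is a
sketch in plain C with made-up names, assuming a 32-bit source value.
Zero extension clears the high word; sign extension fills it with
copies of the sign bit, which is what an arithmetic shift right by 31
produces.

```c
#include <stdint.h>

/* Hypothetical type for illustration: a 64-bit value as two words.  */
typedef struct { uint32_t lo, hi; } di_halves;

static di_halves
zero_extend_si (uint32_t x)
{
  di_halves r = { x, 0u };     /* High word is a move of zero.  */
  return r;
}

static di_halves
sign_extend_si (int32_t x)
{
  di_halves r;
  r.lo = (uint32_t) x;
  /* Same value an arithmetic shift right by 31 would give.  */
  r.hi = x < 0 ? 0xffffffffu : 0u;
  return r;
}
```

For the narrower source modes the low word is first extended to 32
bits, and the high word is then derived from that result in the same
way.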

* config/arm/arm.md (zero_extenddi2): Convert to define_expand.
(extenddi2): Likewise.
---
 gcc/config/arm/arm.md | 75 +++
 1 file changed, 54 insertions(+), 21 deletions(-)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5ba42a13430..4a7a64e6613 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4196,31 +4196,64 @@ (define_expand "truncdfhf2"
 
 ;; Zero and sign extension instructions.
 
-(define_insn "zero_extenddi2"
-  [(set (match_operand:DI 0 "s_register_operand" "=r,?r")
-(zero_extend:DI (match_operand:QHSI 1 ""
-	"")))]
+(define_expand "zero_extenddi2"
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(zero_extend:DI (match_operand:QHSI 1 "" "")))]
   "TARGET_32BIT "
-  "#"
-  [(set_attr "length" "4,8")
-   (set_attr "arch" "*,*")
-   (set_attr "ce_count" "2")
-   (set_attr "predicable" "yes")
-   (set_attr "type" "mov_reg,multiple")]
+  {
+rtx res_lo, res_hi, op0_lo, op0_hi;
+res_lo = gen_lowpart (SImode, operands[0]);
+res_hi = gen_highpart (SImode, operands[0]);
+if (can_create_pseudo_p ())
+  {
+	op0_lo = mode == SImode ? operands[1] : gen_reg_rtx (SImode);
+	op0_hi = gen_reg_rtx (SImode);
+  }
+else
+  {
+	op0_lo = mode == SImode ? operands[1] : res_lo;
+	op0_hi = res_hi;
+  }
+if (mode != SImode)
+  emit_insn (gen_rtx_SET (op0_lo,
+			  gen_rtx_ZERO_EXTEND (SImode, operands[1])));
+emit_insn (gen_movsi (op0_hi, const0_rtx));
+if (res_lo != op0_lo)
+  emit_move_insn (res_lo, op0_lo);
+if (res_hi != op0_hi)
+  emit_move_insn (res_hi, op0_hi);
+DONE;
+  }
 )
 
-(define_insn "extenddi2"
-  [(set (match_operand:DI 0 "s_register_operand" "=r,?r,?r")
-(sign_extend:DI (match_operand:QHSI 1 ""
-	"")))]
+(define_expand "extenddi2"
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(sign_extend:DI (match_operand:QHSI 1 "" "")))]
   "TARGET_32BIT "
-  "#"
-  [(set_attr "length" "4,8,8")
-   (set_attr "ce_count" "2")
-   (set_attr "shift" "1")
-   (set_attr "predicable" "yes")
-   (set_attr "arch" "*,a,t")
-   (set_attr "type" "mov_reg,multiple,multiple")]
+  {
+rtx res_lo, res_hi, op0_lo, op0_hi;
+res_lo = gen_lowpart (SImode, operands[0]);
+res_hi = gen_highpart (SImode, operands[0]);
+if (can_create_pseudo_p ())
+  {
+	op0_lo = mode == SImode ? operands[1] : gen_reg_rtx (SImode);
+	op0_hi = gen_reg_rtx (SImode);
+  }
+else
+  {
+	op0_lo = mode == SImode ? operands[1] : res_lo;
+	op0_hi = res_hi;
+  }
+if (mode != SImode)
+  emit_insn (gen_rtx_SET (op0_lo,
+			  gen_rtx_SIGN_EXTEND (SImode, operands[1])));
+emit_insn (gen_ashrsi3 (op0_hi, op0_lo, GEN_INT (31)));
+if (res_lo != op0_lo)
+  emit_move_insn (res_lo, op0_lo);
+if (res_hi != op0_hi)
+  emit_move_insn (res_hi, op0_hi);
+DONE;
+  }
 )
 
 ;; Splits for all extensions to DImode


[PATCH 01/29] [arm] Rip out DImode addition and subtraction splits.

2019-10-18 Thread Richard Earnshaw

The first step towards early splitting of addition and subtraction at
DImode is to rip out the old patterns that are designed to propagate
DImode through the RTL optimization passes and then do late splitting.

This patch does cause some code size regressions, but it should still
execute correctly.  We will progressively add back the optimizations
we had here in later patches.

A small number of tests in the Arm-specific testsuite do fail as a
result of this patch, but that's to be expected, since the
optimizations they are looking for have just been removed.  I've kept
the tests, but XFAILed them for now.

One small technical change is also done in this patch as part of the
cleanup: the uaddv4 expander is changed to use LTU as the branch
comparison.  This eliminates the need for CC_Cmode to recognize
somewhat bogus equality constraints.
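
As a reminder of why LTU is the natural condition here, this sketch in
plain C (hypothetical function name) shows unsigned add-with-overflow
at word size: the addition wrapped exactly when the sum is less than
one of the inputs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: returns true if a + b wraps around.  */
static bool
uaddv_si (uint32_t a, uint32_t b, uint32_t *res)
{
  *res = a + b;
  return *res < a;   /* Unsigned less-than on the sum, i.e. LTU.  */
}
```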

gcc:
* arm.md (adddi3): Only accept register operands.
(arm_adddi3): Convert to simple insn with no split.  Do not accept
constants.
(adddi_sesidi_di): Delete pattern.
(adddi_zesidi_di): Likewise.
(uaddv4): Use LTU as condition for branch.
(adddi3_compareV): Convert to simple insn with no split.
(addsi3_compareV_upper): Delete pattern.
(adddi3_compareC): Convert to simple insn with no split.  Correct
flags setting expression.
(addsi3_compareC_upper): Delete pattern.
(addsi3_compareC): Correct flags setting expression.
(subdi3_compare1): Convert to simple insn with no split.
(subsi3_carryin_compare): Delete pattern.
(arm_subdi3): Convert to simple insn with no split.
(subdi_zesidi): Delete pattern.
(subdi_di_sesidi): Delete pattern.
(subdi_zesidi_di): Delete pattern.
(subdi_sesidi_di): Delete pattern.
(subdi_zesidi_zesidi): Delete pattern.
(negvdi3): Use s_register_operand.
(negdi2_compare): Convert to simple insn with no split.
(negdi2_insn): Likewise.
(negsi2_carryin_compare): Delete pattern.
(negdi_zero_extendsidi): Delete pattern.
(arm_cmpdi_insn): Convert to simple insn with no split.
(negdi2): Don't call gen_negdi2_neon.
* config/arm/neon.md (adddi3_neon): Delete pattern.
(subdi3_neon): Delete pattern.
(negdi2_neon): Delete pattern.
(splits for negdi2_neon): Delete splits.

testsuite:
* gcc.target/arm/negdi-3.c: Add XFAILS.
* gcc.target/arm/pr53447-1.c: Likewise.
* gcc.target/arm/pr53447-3.c: Likewise.
* gcc.target/arm/pr53447-4.c: Likewise.
---
 gcc/config/arm/arm.c |   2 -
 gcc/config/arm/arm.md| 569 ++-
 gcc/testsuite/gcc.target/arm/negdi-3.c   |   8 +-
 gcc/testsuite/gcc.target/arm/pr53447-1.c |   2 +-
 gcc/testsuite/gcc.target/arm/pr53447-3.c |   2 +-
 gcc/testsuite/gcc.target/arm/pr53447-4.c |   2 +-
 6 files changed, 56 insertions(+), 529 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ba330470141..41567af1869 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23581,8 +23581,6 @@ maybe_get_arm_condition_code (rtx comparison)
 	{
 	case LTU: return ARM_CS;
 	case GEU: return ARM_CC;
-	case NE: return ARM_CS;
-	case EQ: return ARM_CC;
 	default: return ARM_NV;
 	}
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f861c72ccfc..241ba97c4ba 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -437,7 +437,7 @@ (define_expand "adddi3"
  [(parallel
[(set (match_operand:DI   0 "s_register_operand")
 	  (plus:DI (match_operand:DI 1 "s_register_operand")
-	   (match_operand:DI 2 "arm_adddi_operand")))
+	   (match_operand:DI 2 "s_register_operand")))
 (clobber (reg:CC CC_REGNUM))])]
   "TARGET_EITHER"
   "
@@ -446,87 +446,13 @@ (define_expand "adddi3"
   "
 )
 
-(define_insn_and_split "*arm_adddi3"
-  [(set (match_operand:DI  0 "arm_general_register_operand" "=")
-	(plus:DI (match_operand:DI 1 "arm_general_register_operand" "%0, 0, r, 0, r")
-		 (match_operand:DI 2 "arm_general_adddi_operand""r,  0, r, Dd, Dd")))
+(define_insn "*arm_adddi3"
+  [(set (match_operand:DI 0 "s_register_operand"  "=,,")
+	(plus:DI (match_operand:DI 1 "s_register_operand" " %0,0,r")
+		 (match_operand:DI 2 "s_register_operand" " r,0,r")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT"
-  "#"
-  "TARGET_32BIT"
-  [(parallel [(set (reg:CC_C CC_REGNUM)
-		   (compare:CC_C (plus:SI (match_dup 1) (match_dup 2))
- (match_dup 1)))
-	  (set (match_dup 0) (plus:SI (match_dup 1) (match_dup 2)))])
-   (set (match_dup 3) (plus:SI (plus:SI (match_dup 4) (match_dup 5))
-			   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[4] = gen_highpart (SImode, operands[1]);
-operands[1] = gen_lowpart (SImode, 

[PATCH 02/29] [arm] Perform early splitting of adddi3.

2019-10-18 Thread Richard Earnshaw

This patch causes the expansion of adddi3 to split the operation
immediately for Arm and Thumb-2.  This is desirable as it frees up the
register allocator to pick whatever combination of registers suits
best and reduces the number of auxiliary patterns that we need in the
back-end.  Three of the testcases that we disabled earlier are already
fixed by this patch.  Finally, we add a new pattern to match the
canonicalization of add-with-carry when using an immediate of zero.
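
The decomposition can be sketched in plain C as below.  The function
name is made up and this models only the arithmetic, not the expander.
The carry is derived by comparing the low-word sum against one of its
inputs, which matches the compare form used by addsi3_compareC.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit add done as two 32-bit operations.  */
static uint64_t
add64_by_halves (uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  uint32_t lo = a_lo + b_lo;          /* Low words, sets the carry.  */
  uint32_t carry = lo < a_lo;         /* Carry out: sum wrapped.  */
  uint32_t hi = a_hi + b_hi + carry;  /* High words plus carry-in.  */

  return ((uint64_t) hi << 32) | lo;
}
```

If the low word of the second operand is zero, no carry can be
generated, so the expander only needs a plain add for the high word.
If the high word is zero, the add-with-carry has a zero immediate,
which is what the new pattern mentioned above is there to match.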

gcc:
* config/arm/arm-protos.h (arm_decompose_di_binop): New prototype.
* config/arm/arm.c (arm_decompose_di_binop): New function.
* config/arm/arm.md (adddi3): Also accept any const_int for op2.
If not generating Thumb-1 code, decompose the operation into 32-bit
pieces.
* add0si_carryin_: New pattern.

testsuite:
* gcc.target/arm/pr53447-1.c: Remove XFAIL.
* gcc.target/arm/pr53447-3.c: Remove XFAIL.
* gcc.target/arm/pr53447-4.c: Remove XFAIL.
---
 gcc/config/arm/arm-protos.h  |  1 +
 gcc/config/arm/arm.c | 15 +
 gcc/config/arm/arm.md| 73 ++--
 gcc/testsuite/gcc.target/arm/pr53447-1.c |  2 +-
 gcc/testsuite/gcc.target/arm/pr53447-3.c |  2 +-
 gcc/testsuite/gcc.target/arm/pr53447-4.c |  2 +-
 6 files changed, 76 insertions(+), 19 deletions(-)

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f995974f9bb..c685bcbf99c 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -57,6 +57,7 @@ extern rtx arm_simd_vect_par_cnst_half (machine_mode mode, bool high);
 extern bool arm_simd_check_vect_par_cnst_half_p (rtx op, machine_mode mode,
 		 bool high);
 extern void arm_emit_speculation_barrier_function (void);
+extern void arm_decompose_di_binop (rtx, rtx, rtx *, rtx *, rtx *, rtx *);
 
 #ifdef RTX_CODE
 extern void arm_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 41567af1869..db18651346f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -14933,6 +14933,21 @@ gen_cpymem_ldrd_strd (rtx *operands)
   return true;
 }
 
+/* Decompose operands for a 64-bit binary operation in OP1 and OP2
+   into its component 32-bit subregs.  OP2 may be an immediate
+   constant and we want to simplify it in that case.  */
+void
+arm_decompose_di_binop (rtx op1, rtx op2, rtx *lo_op1, rtx *hi_op1,
+			rtx *lo_op2, rtx *hi_op2)
+{
+  *lo_op1 = gen_lowpart (SImode, op1);
+  *hi_op1 = gen_highpart (SImode, op1);
+  *lo_op2 = simplify_gen_subreg (SImode, op2, DImode,
+ subreg_lowpart_offset (SImode, DImode));
+  *hi_op2 = simplify_gen_subreg (SImode, op2, DImode,
+ subreg_highpart_offset (SImode, DImode));
+}
+
 /* Select a dominance comparison mode if possible for a test of the general
form (OP (COND_OR (X) (Y)) (const_int 0)).  We support three forms.
COND_OR == DOM_CC_X_AND_Y => (X && Y)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 241ba97c4ba..5ba42a13430 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -437,25 +437,53 @@ (define_expand "adddi3"
  [(parallel
[(set (match_operand:DI   0 "s_register_operand")
 	  (plus:DI (match_operand:DI 1 "s_register_operand")
-	   (match_operand:DI 2 "s_register_operand")))
+		   (match_operand:DI 2 "reg_or_int_operand")))
 (clobber (reg:CC CC_REGNUM))])]
   "TARGET_EITHER"
   "
-  if (TARGET_THUMB1 && !REG_P (operands[2]))
-operands[2] = force_reg (DImode, operands[2]);
-  "
-)
+  if (TARGET_THUMB1)
+{
+  if (!REG_P (operands[2]))
+	operands[2] = force_reg (DImode, operands[2]);
+}
+  else
+{
+  rtx lo_result, hi_result, lo_dest, hi_dest;
+  rtx lo_op1, hi_op1, lo_op2, hi_op2;
+  arm_decompose_di_binop (operands[1], operands[2], &lo_op1, &hi_op1,
+			  &lo_op2, &hi_op2);
+  lo_result = lo_dest = gen_lowpart (SImode, operands[0]);
+  hi_result = hi_dest = gen_highpart (SImode, operands[0]);
+
+  if (lo_op2 == const0_rtx)
+	{
+	  lo_dest = lo_op1;
+	  if (!arm_add_operand (hi_op2, SImode))
+	hi_op2 = force_reg (SImode, hi_op2);
+	  /* Assume hi_op2 won't also be zero.  */
+	  emit_insn (gen_addsi3 (hi_dest, hi_op1, hi_op2));
+	}
+  else
+	{
+	  if (!arm_add_operand (lo_op2, SImode))
+	lo_op2 = force_reg (SImode, lo_op2);
+	  if (!arm_not_operand (hi_op2, SImode))
+	hi_op2 = force_reg (SImode, hi_op2);
+
+	  emit_insn (gen_addsi3_compareC (lo_dest, lo_op1, lo_op2));
+	  if (hi_op2 == const0_rtx)
+	emit_insn (gen_add0si3_carryin_ltu (hi_dest, hi_op1));
+	  else
+	emit_insn (gen_addsi3_carryin_ltu (hi_dest, hi_op1, hi_op2));
+	}
 
-(define_insn "*arm_adddi3"
-  [(set (match_operand:DI 0 "s_register_operand"  "=,,")
-	(plus:DI (match_operand:DI 1 "s_register_operand" " %0,0,r")
-		 (match_operand:DI 2 "s_register_operand" " r,0,r")))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "adds\\t%Q0, %Q1, 

[PATCH 00/29] [arm] Rewrite DImode arithmetic support

2019-10-18 Thread Richard Earnshaw

This series of patches rewrites all the DImode arithmetic patterns for
the Arm backend when compiling for Arm or Thumb2 to split the
operations during expand (the thumb1 code is unchanged and cannot
benefit from early splitting as we are unable to expose the carry
flag).

This has a number of benefits:
 - register allocation has more freedom to use independent
   registers for the upper and lower halves of the register
 - we can make better use of combine for spotting insn merge
   opportunities without needing many additional patterns that are
   only used for DImode
 - we eliminate a number of bugs in the machine description where
   the carry calculations were not correctly propagated by the
   split patterns (we mostly got away with this because the
   splitting previously happened only after most of the important
   optimization passes had been run).

The patch series starts by paring back all the DImode arithmetic
support to a very simple form without any splitting at all and then
progressively re-implementing the patterns with early split
operations.  This proved to be the only sane way of untangling the
existing code due to a number of latent bugs which would have been
exposed if a different approach had been taken.

Each patch should produce a working compiler (it did when it was
originally written), though since the patch set has been re-ordered
slightly there is a possibility that some of the intermediate steps
may have missing test updates that are only cleaned up later.
However, only the end of the series should be considered complete.
I've kept the patch as a series to permit easier regression hunting
should that prove necessary.

R.

Richard Earnshaw (29):
  [arm] Rip out DImode addition and subtraction splits.
  [arm] Perform early splitting of adddi3.
  [arm] Early split zero- and sign-extension
  [arm] Rewrite addsi3_carryin_shift_ in canonical form
  [arm] fix constraints on addsi3_carryin_alt2
  [arm] Early split subdi3
  [arm] Remove redundant DImode subtract patterns
  [arm] Introduce arm_carry_operation
  [arm] Correctly cost addition with a carry-in
  [arm] Correct cost calculations involving borrow for subtracts.
  [arm] Reduce cost of insns that are simple reg-reg moves.
  [arm] Implement negscc using SBC when appropriate.
  [arm] Add alternative canonicalizations for subtract-with-carry +
shift
  [arm] Early split simple DImode equality comparisons
  [arm] Improve handling of DImode comparisons against constants.
  [arm] early split most DImode comparison operations.
  [arm] Handle some constant comparisons using rsbs+rscs
  [arm] Cleanup dead code - old support for DImode comparisons
  [arm] Handle immediate values in uaddvsi4
  [arm] Early expansion of uaddvdi4.
  [arm] Improve code generation for addvsi4.
  [arm] Allow the summation result of signed add-with-overflow to be
discarded.
  [arm] Early split addvdi4
  [arm] Improve constant handling for usubvsi4.
  [arm] Early expansion of usubvdi4.
  [arm] Improve constant handling for subvsi4.
  [arm] Early expansion of subvdi4
  [arm] Improvements to negvsi4 and negvdi4.
  [arm] Fix testsuite nit when compiling for thumb2

 gcc/config/arm/arm-modes.def  |   19 +-
 gcc/config/arm/arm-protos.h   |1 +
 gcc/config/arm/arm.c  |  598 -
 gcc/config/arm/arm.md | 2020 ++---
 gcc/config/arm/iterators.md   |   15 +-
 gcc/config/arm/predicates.md  |   29 +-
 gcc/config/arm/thumb2.md  |8 +-
 .../gcc.dg/builtin-arith-overflow-3.c |   41 +
 gcc/testsuite/gcc.target/arm/negdi-3.c|4 +-
 9 files changed, 1757 insertions(+), 978 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-arith-overflow-3.c



[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #23 from Jakub Jelinek  ---
(In reply to Segher Boessenkool from comment #22)
> Hrm, I don't see how this is nicer than just adding a scratch in the
> pattern?  What makes that a worse option?

Most of the patterns don't have constraints and don't want to deal with that. 
See the ugliness I had to play with the enabled attribute in the earlier
version of the patch to deal properly with the constraints.  Many of them
actually don't create any pseudos, just want to be matched only before reload,
split there and not match afterwards.

[Bug libstdc++/92156] New: Cannot in-place construct std::any with std::any

2019-10-18 Thread jason.e.cobb at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92156

Bug ID: 92156
   Summary: Cannot in-place construct std::any with std::any
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jason.e.cobb at gmail dot com
  Target Milestone: ---

GCC (and clang, when using libstdc++) claim that the following program is
ill-formed when compiled with "-std=c++17":

#include <any>

int main() {
auto a = std::any(std::in_place_type<std::any>, 5);
}

[end code]

Both clang with libc++ and MSVC accept this program.

On Compiler Explorer: https://godbolt.org/z/ckzMsg .

The standard says that this is correct. Under
http://eel.is/c++draft/any.cons#itemdecl:5 , the only requirements are that T
be copy-constructible and constructible from the in-place args. std::any is
copy-constructible and constructible from int, so this program should be
well-formed.

Re: PR69455

2019-10-18 Thread Steve Kargl
On Fri, Oct 18, 2019 at 05:17:38PM +0100, Iain Sandoe wrote:
> 
> something like this, perhaps (I regret my Fortran skills are in the f77 era):
> 

If you know/knew F77 and have some working knowledge of C/C++ and
you want to see where modern Fortran sits, I recommend Modern Fortran
Explained by Metcalf et al.  You can probably read it in a day.

For the record, here is the patch (see attached) committed to all
open branches.

With this commit, I'll be taking a long break from looking at
any gfortran bugs.

2019-10-18  Steven G. Kargl  

PR fortran/69455
* trans-decl.c (generate_local_decl): Avoid misconstructed
intrinsic modules in a BLOCK construct.

2019-10-18  Steven G. Kargl  

PR fortran/69455
* gfortran.dg/pr69455_1.f90: New test.
* gfortran.dg/pr69455_2.f90: Ditto.

-- 
Steve
25.16%
Index: gcc/fortran/trans-decl.c
===
--- gcc/fortran/trans-decl.c	(revision 277157)
+++ gcc/fortran/trans-decl.c	(working copy)
@@ -5962,7 +5962,14 @@ generate_local_decl (gfc_symbol * sym)
 
   if (sym->ns && sym->ns->construct_entities)
 	{
-	  if (sym->attr.referenced)
+	  /* Construction of the intrinsic modules within a BLOCK
+	 construct, where ONLY and RENAMED entities are included,
+	 seems to be bogus.  This is a workaround that can be removed
+	 if someone ever takes on the task to creating full-fledge
+	 modules.  See PR 69455.  */
+	  if (sym->attr.referenced
+	  && sym->from_intmod != INTMOD_ISO_C_BINDING
+	  && sym->from_intmod != INTMOD_ISO_FORTRAN_ENV)
 	gfc_get_symbol_decl (sym);
 	  sym->mark = 1;
 	}
Index: gcc/testsuite/gfortran.dg/pr69455_1.f90
===
--- gcc/testsuite/gfortran.dg/pr69455_1.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr69455_1.f90	(working copy)
@@ -0,0 +1,14 @@
+! { dg-do run }
+program foo
+   block
+  use, intrinsic :: iso_c_binding, only: wp => c_float, ik => c_int
+  if (ik /= 4) stop 1
+  if (wp /= 4) stop 2
+   end block
+   block
+  use, intrinsic :: iso_c_binding, only: wp => c_double, ik => c_int64_t
+  if (ik /= 8) stop 3
+  if (wp /= 8) stop 4
+   end block
+end program foo
+
Index: gcc/testsuite/gfortran.dg/pr69455_2.f90
===
--- gcc/testsuite/gfortran.dg/pr69455_2.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr69455_2.f90	(working copy)
@@ -0,0 +1,13 @@
+! { dg-do run }
+program foo
+   block
+  use, intrinsic :: ISO_FORTRAN_ENV, only: wp => REAL32, ik => INT32
+  if (ik /= 4) stop 1
+  if (wp /= 4) stop 2
+   end block
+   block
+  use, intrinsic :: ISO_FORTRAN_ENV, only: wp => REAL64, ik => INT64
+  if (ik /= 8) stop 3
+  if (wp /= 8) stop 4
+   end block
+end program foo


[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #22 from Segher Boessenkool  ---
Hrm, I don't see how this is nicer than just adding a scratch in the
pattern?  What makes that a worse option?

[Bug tree-optimization/92155] New: strlen(a) not folded after memset(a, 0, sizeof a)

2019-10-18 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92155

Bug ID: 92155
   Summary: strlen(a) not folded after memset(a, 0, sizeof a)
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msebor at gcc dot gnu.org
  Target Milestone: ---

Now that GCC unrolls memset-like loops with small numbers of iterations
(pr91975) and transforms some of them into MEM_REF, the strlen pass can also
determine the lengths of zeroed-out arrays to be zero.  This can be seen in
function f below.

But GCC doesn't yet transform memset calls into the equivalent MEM_REFs, and
the strlen pass for some reason can't figure out that the length of an array
that's been zeroed-out by memset is also zero.  This missed optimization can be
seen in function g below.

$ cat z.c && gcc -O2 -S -Wall -fdump-tree-optimized=/dev/stdout z.c
extern char a4[4];
extern char b4[4];

void f (void)
{
  for (int i = 0; i != sizeof a4; ++i)
a4[i] = 0;
  for (int i = 0; i != sizeof b4; ++i)
b4[i] = 0;

  if (__builtin_strlen (a4) != __builtin_strlen (b4))
__builtin_abort ();
}

void g (void)
{
  __builtin_memset (a4, 0, sizeof a4);
  __builtin_memset (b4, 0, sizeof b4);

  if (__builtin_strlen (a4) != __builtin_strlen (b4))
__builtin_abort ();
}

;; Function f (f, funcdef_no=0, decl_uid=1932, cgraph_uid=1, symbol_order=0)

f ()
{
   [local count: 214748369]:
  MEM  [(char *)] = 0;
  MEM  [(char *)] = 0;
  return;

}



;; Function g (g, funcdef_no=1, decl_uid=1943, cgraph_uid=2, symbol_order=1)

g ()
{
  long unsigned int _1;
  long unsigned int _2;

   [local count: 1073741824]:
  __builtin_memset (, 0, 4);
  __builtin_memset (, 0, 4);
  _1 = __builtin_strlen ();
  _2 = __builtin_strlen ();
  if (_1 != _2)
goto ; [0.00%]
  else
goto ; [100.00%]

   [count: 0]:
  __builtin_abort ();

   [local count: 1073741824]:
  return;

}

[Bug debug/91929] missing inline subroutine information in build using sin/cos

2019-10-18 Thread dimhen at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91929

Dmitry G. Dyachenko  changed:

   What|Removed |Added

 CC||dimhen at gmail dot com

--- Comment #12 from Dmitry G. Dyachenko  ---
I see new warnings -Wuninitialized and -Wmaybe-uninitialized after r276993

r276992 no warnings
r276993 warnings

$ cat x_3.i
int *a;
int b, d;

int g() {
  int *c;
  int e[6];
  int f = 1;
  if (0)
goto cd;
  c = 0;
  for (; d; d++)
*e = 1 ^ *(c + 1);
  if (f)
for (b = 0;;)
  a[0] = e[b];
cd:
  return 0;
}

$ ~/arch-gcc/gcc_276993/bin/gcc -fpreprocessed -O2 -Wall -c x_3.i

x_3.i: In function ‘g’:
x_3.i:15:15: warning: ‘e[0]’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
   15 |   a[0] = e[b];
  |  ~^~~


$ cat x.i
typedef struct {
  int a[0];
} c;
typedef struct {
  c d;
} * e;
e a;
void f(void);
void f() {
  int c[1];
  for (;;) {
unsigned long d[0];
int b, g, h = b = h;
unsigned long *e = d;
for (; g; ++g)
  e[g] = 0;
*a->d.a = *c;
  }
}

$ ~/arch-gcc/gcc_276993/bin/gcc -fpreprocessed -O2 -Wall -c x.i
x.i: In function ‘f’:
x.i:17:13: warning: ‘c[0]’ is used uninitialized in this function
[-Wuninitialized]
   17 | *a->d.a = *c;
  | ^~~~

[Bug fortran/69455] [7/8/9/10 Regression] [F08] Assembler error(s) when using intrinsic modules in two BLOCK

2019-10-18 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69455

kargl at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Assignee|unassigned at gcc dot gnu.org  |kargl at gcc dot gnu.org

--- Comment #20 from kargl at gcc dot gnu.org ---
Fixed on all open branches and trunk.

[Bug fortran/69455] [7/8/9/10 Regression] [F08] Assembler error(s) when using intrinsic modules in two BLOCK

2019-10-18 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69455

--- Comment #19 from kargl at gcc dot gnu.org ---
Author: kargl
Date: Fri Oct 18 19:26:22 2019
New Revision: 277193

URL: https://gcc.gnu.org/viewcvs?rev=277193&root=gcc&view=rev
Log:
2019-10-18  Steven G. Kargl  

PR fortran/69455
* trans-decl.c (generate_local_decl): Avoid misconstructed
intrinsic modules in a BLOCK construct.

2019-10-18  Steven G. Kargl  

PR fortran/69455
* gfortran.dg/pr69455_1.f90: New test.
* gfortran.dg/pr69455_2.f90: Ditto.

Added:
branches/gcc-7-branch/gcc/testsuite/gfortran.dg/pr69455_1.f90
branches/gcc-7-branch/gcc/testsuite/gfortran.dg/pr69455_2.f90
Modified:
branches/gcc-7-branch/gcc/fortran/ChangeLog
branches/gcc-7-branch/gcc/fortran/trans-decl.c
branches/gcc-7-branch/gcc/testsuite/ChangeLog

(ARM) Wrong conditional codes when paired with tst instruction

2019-10-18 Thread AlwaysTeachingable .
The following C code:
unsigned int wrong(unsigned int n){
return (n%2) ? 0 : 42;
}

should return 42 when n is odd and 0 when n is even.

But ARM gcc 8.2 with -O3 produces following assembly:

tst r0, #1
moveq r0, #42
movne r0, #0
bx lr

tst r0,#1 sets Z=1 iff r0 is even, and moveq r0,#42 executes iff Z=1,
therefore
it sets r0 to 42 when r0 is even, which is wrong given the C code above (
it should return 0 ).
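
As a sanity check on the C semantics alone, the sketch below evaluates the
ternary directly and models the flag logic of the listed instructions.
`arm_model` is a hypothetical helper written for this illustration, not
compiler output. Under standard C rules `(n%2) ? 0 : 42` takes the first arm
when `n%2` is non-zero, so both functions give 0 for odd n and 42 for even n.

```c
#include <assert.h>

/* The function from the report, unchanged. */
unsigned int wrong(unsigned int n)
{
    return (n % 2) ? 0 : 42;
}

/* Hypothetical model of the instruction sequence:
   tst r0, #1 sets Z when (r0 & 1) == 0;
   moveq executes when Z is set, movne when Z is clear. */
unsigned int arm_model(unsigned int r0)
{
    int z = ((r0 & 1u) == 0);
    return z ? 42u : 0u;
}

/* Returns 1 if the C function and the model agree on 0..limit-1. */
int models_agree(unsigned int limit)
{
    for (unsigned int n = 0; n < limit; ++n)
        if (wrong(n) != arm_model(n))
            return 0;
    return 1;
}
```

Calling `models_agree(1000)` compares the two over a small range of inputs.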


[Bug tree-optimization/92056] [10 Regression] ice in expr_object_size, at tree-object-size.c:675 with -O3

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92056

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jakub Jelinek  ---
Fixed.

[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread harald at gigawatt dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

--- Comment #11 from Harald van Dijk  ---
(In reply to Rich Felker from comment #10)
> On this particular target, and on every target of any modern
> relevance, (float)16777217 has well-defined behavior.

That was exactly the point of my original comment. I do not understand why you
took issue with it.

[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

--- Comment #10 from Rich Felker  ---
GCC can choose the behavior for any undefined behavior it wants, and GCC
absolutely can make transformations based on behaviors it guarantees or that
Annex F guarantees on targets for which it implements the requirements of Annex
F. On this particular target, and on every target of any modern relevance,
(float)16777217 has well-defined behavior. On ones with floating point
environment (most/all hardfloat), it has side effects (inexact), so can't be
elided without the flags to make gcc ignore those side effects.
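
A small sketch of the conversion being discussed, assuming `float` is IEEE 754
binary32 and the default round-to-nearest-even mode is in effect (both are
assumptions about the target, not something this sketch checks). 16777217 is
2^24 + 1 and lies exactly halfway between two representable floats, so the
result is rounded rather than undefined. The helper name `int_to_float` is
made up for this illustration.

```c
#include <assert.h>

/* Convert at run time.  The volatile objects keep the compiler from
   folding the conversion away and force the result to float precision. */
float int_to_float(int value)
{
    volatile int v = value;
    volatile float f = (float) v;
    return f;
}

/* Returns 1 if the int survives a round trip through float unchanged. */
int exactly_representable(int value)
{
    assert(value > -2147483647 && value < 2147483647);
    return (double) int_to_float(value) == (double) value;
}
```

With these assumptions `int_to_float(16777217)` compares equal to
`16777216.0f`, and `exactly_representable` distinguishes 16777216 (exact) from
16777217 (inexact).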

Re: [Patch][Demangler] Fix for complex values

2019-10-18 Thread Miguel Saldivar
The only reason I wanted `float complex` was for interoperability
between the two other demanglers. Although the go demangler
does use `_Complex` and `_Imaginary`, so I guess it's sort of split.
But I agree, `_Complex` and `_Imaginary` is probably the
better option.

Thanks,
Miguel Saldivar

On Fri, Oct 18, 2019 at 9:04 AM Ian Lance Taylor  wrote:

> On Thu, Oct 17, 2019 at 10:20 PM Miguel Saldivar 
> wrote:
> >
> > This is a small fix for Bug 67299, where the symbol `Z1fCf` would
> > become `f(float complex)` instead of `f(floatcomplex )`.
> > I thought this would be the preferred way of printing, because
> > `llvm-cxxfilt` and `cpp_filt` both print the mangled name in this
> > fashion.
>
> Thanks.  Personally I think it would be better to change the strings
> to " _Complex" and " _Imaginary".  I'm open to discussion on this.
>
> Ian
>
> > From 4ca98c0749bae1389594b31ee7f6ef575aafcd8f Mon Sep 17 00:00:00 2001
> > From: Miguel Saldivar 
> > Date: Thu, 17 Oct 2019 16:36:19 -0700
> > Subject: [PATCH][Demangler] Small fix for complex values
> >
> > gcc/libiberty/
> > * cp-demangle.c (d_print_mod): Add a space before printing `complex`
> > and `imaginary`, as opposed to after.
> >
> > gcc/libiberty/
> > * testsuite/demangle-expected: Adjust test.
> > ---
> >  libiberty/ChangeLog   | 5 +
> >  libiberty/cp-demangle.c   | 4 ++--
> >  libiberty/testsuite/demangle-expected | 2 +-
> >  3 files changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
> > index 97d9767c2ea..62d5527b95b 100644
> > --- a/libiberty/ChangeLog
> > +++ b/libiberty/ChangeLog
> > @@ -1,3 +1,8 @@
> > +2019-10-17  Miguel Saldivar  
> > + * cp-demangle.c (d_print_mod): Add a space before printing `complex`
> > + and `imaginary`, as opposed to after.
> > + * testsuite/demangle-expected: Adjust test.
> > +
> >  2019-10-03  Eduard-Mihai Burtescu  
> >
> >   * rust-demangle.c (looks_like_rust): Remove.
> > diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
> > index aa78c86dd44..bd4dfb785a9 100644
> > --- a/libiberty/cp-demangle.c
> > +++ b/libiberty/cp-demangle.c
> > @@ -5977,10 +5977,10 @@ d_print_mod (struct d_print_info *dpi, int
> options,
> >d_append_string (dpi, "&&");
> >return;
> >  case DEMANGLE_COMPONENT_COMPLEX:
> > -  d_append_string (dpi, "complex ");
> > +  d_append_string (dpi, " complex");
> >return;
> >  case DEMANGLE_COMPONENT_IMAGINARY:
> > -  d_append_string (dpi, "imaginary ");
> > +  d_append_string (dpi, " imaginary");
> >return;
> >  case DEMANGLE_COMPONENT_PTRMEM_TYPE:
> >if (d_last_char (dpi) != '(')
> > diff --git a/libiberty/testsuite/demangle-expected
> > b/libiberty/testsuite/demangle-expected
> > index f21ed00e559..43f003655b2 100644
> > --- a/libiberty/testsuite/demangle-expected
> > +++ b/libiberty/testsuite/demangle-expected
> > @@ -1278,7 +1278,7 @@ int& int_if_addable(A > ((*((Y*)(0)))+(*((Y*)(0>*)
> >  #
> >  --format=gnu-v3
> >  _Z3bazIiEvP1AIXszcl3foocvT__ELCf_
> > -void baz(A )_))>*)
> > +void baz(A complex)_))>*)
> >  #
> >  --format=gnu-v3
> >  _Z3fooI1FEN1XIXszdtcl1PclcvT__EEE5arrayEE4TypeEv
> > --
> > 2.23.0
>


[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread harald at gigawatt dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

--- Comment #9 from Harald van Dijk  ---
(In reply to Rich Felker from comment #8)
> So arguments about generality to non-Annex-F C
> environments are not relevant to the topic here.

The comment it was a reply to suggested (possibly unintentionally) that
evaluating (float)16777217 would have undefined behaviour if 16777217 could not
be represented by float. A clarification that no, the standard says it only has
undefined behaviour if it is out of float's range, so GCC cannot optimise on
the assumption that such conversions do not happen, is absolutely relevant to
the topic here.

[Bug fortran/69455] [7/8/9/10 Regression] [F08] Assembler error(s) when using intrinsic modules in two BLOCK

2019-10-18 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69455

--- Comment #18 from kargl at gcc dot gnu.org ---
Author: kargl
Date: Fri Oct 18 18:18:34 2019
New Revision: 277161

URL: https://gcc.gnu.org/viewcvs?rev=277161&root=gcc&view=rev
Log:
2019-10-18  Steven G. Kargl  

PR fortran/69455
* trans-decl.c (generate_local_decl): Avoid misconstructed
intrinsic modules in a BLOCK construct.

2019-10-18  Steven G. Kargl  

PR fortran/69455
* gfortran.dg/pr69455_1.f90: New test.
* gfortran.dg/pr69455_2.f90: Ditto.

Added:
branches/gcc-8-branch/gcc/testsuite/gfortran.dg/pr69455_1.f90
branches/gcc-8-branch/gcc/testsuite/gfortran.dg/pr69455_2.f90
Modified:
branches/gcc-8-branch/gcc/fortran/ChangeLog
branches/gcc-8-branch/gcc/fortran/trans-decl.c
branches/gcc-8-branch/gcc/testsuite/ChangeLog

Re: [Patch, fortran] PR fortran/92142 - CFI_setpointer corrupts descriptor

2019-10-18 Thread Paul Richard Thomas
I will deal with this and various other issues associated with
ISO_Fortran_binding tomorrow.

Thanks for your help

Paul

On Thu, 17 Oct 2019 at 18:30, Tobias Burnus  wrote:
>
> Hi,
>
> +  fprintf (stderr, "CFI_setpointer: Result is NULL.\n");
> …
> > + return CFI_INVALID_DESCRIPTOR;
> > +! { dg-do run }
> > +! { dg-additional-options "-fbounds-check" }
> > +! { dg-additional-sources ISO_Fortran_binding_15.c }
>
>
> If you generate to stdout/stderr like in this case, I think it makes
> sense to also check for this output using "{dg-output …}".
>
> Otherwise, it looks okay at a glance – but I defer the proper review to
> either someone else or to later.
>
> Another question would be: Is it always guaranteed that
> result->attribute  is set? I am asking because it resembles to the
> untrained eye the code at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92027
>
> And there, the result attribute is unset – that might be a bug in the C
> code of the test itself – or in libgomp. But it doesn't harm to quickly
> think about whether that can be an issue here as well or not.
>
> Cheers,
>
> Tobias
>


-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

--- Comment #8 from Rich Felker  ---
> Floating point types are not guaranteed to support infinity by the C standard

Annex F (IEEE 754 alignment) does guarantee it, and GCC aims to implement this.
This issue report is specific to target sh*-*-* which uses either softfloat
with IEEE types and semantics or SH4 hardfloat which has IEEE types and
semantics. So arguments about generality to non-Annex-F C environments are not
relevant to the topic here.

[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread harald at gigawatt dot nl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

--- Comment #7 from Harald van Dijk  ---
(In reply to Rich Felker from comment #6)
> > Only if the int is out of float's range.
> 
> float's range is [-INF,INF] (endpoints included). There is no such thing as
> "out of float's range".

Floating point types are not guaranteed to support infinity by the C standard,
and checking GCC sources, it appears to support at least one representation
without infinities (VAX).

[Bug fortran/69455] [7/8/9/10 Regression] [F08] Assembler error(s) when using intrinsic modules in two BLOCK

2019-10-18 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69455

--- Comment #17 from kargl at gcc dot gnu.org ---
Author: kargl
Date: Fri Oct 18 17:59:32 2019
New Revision: 277160

URL: https://gcc.gnu.org/viewcvs?rev=277160&root=gcc&view=rev
Log:
2019-10-18  Steven G. Kargl  

PR fortran/69455
* trans-decl.c (generate_local_decl): Avoid misconstructed
intrinsic modules in a BLOCK construct.

2019-10-18  Steven G. Kargl  

PR fortran/69455
* gfortran.dg/pr69455_1.f90: New test.
* gfortran.dg/pr69455_2.f90: Ditto.

Added:
branches/gcc-9-branch/gcc/testsuite/gfortran.dg/pr69455_1.f90
branches/gcc-9-branch/gcc/testsuite/gfortran.dg/pr69455_2.f90
Modified:
branches/gcc-9-branch/gcc/fortran/ChangeLog
branches/gcc-9-branch/gcc/fortran/trans-decl.c
branches/gcc-9-branch/gcc/testsuite/ChangeLog

[Bug fortran/69455] [7/8/9/10 Regression] [F08] Assembler error(s) when using intrinsic modules in two BLOCK

2019-10-18 Thread kargl at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69455

--- Comment #16 from kargl at gcc dot gnu.org ---
Author: kargl
Date: Fri Oct 18 17:27:06 2019
New Revision: 277158

URL: https://gcc.gnu.org/viewcvs?rev=277158&root=gcc&view=rev
Log:
2019-10-18  Steven G. Kargl  

PR fortran/69455
* trans-decl.c (generate_local_decl): Avoid misconstructed
intrinsic modules in a BLOCK construct.

2019-10-18  Steven G. Kargl  

PR fortran/69455
* gfortran.dg/pr69455_1.f90: New test.
* gfortran.dg/pr69455_2.f90: Ditto.

Added:
trunk/gcc/testsuite/gfortran.dg/pr69455_1.f90
trunk/gcc/testsuite/gfortran.dg/pr69455_2.f90
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/trans-decl.c
trunk/gcc/testsuite/ChangeLog

[Bug target/92149] Enefficient x86_64 code

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92149

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Most likely dup of PR92038, which you've already filed yourself.

[Bug middle-end/92153] [10 Regression] ICE / segmentation fault, use-after-free at gcc/ggc-page.c:1159

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92153

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Jakub Jelinek  ---
Should be fixed now, thanks for the report.  Most developers test GCC on hosts
that do support mmap and thus this went unnoticed for a few days.

Re: New GCC mirror from Rabat, Morocco

2019-10-18 Thread Sami Ait Ali Oulahcen
A kind reminder.



Sent from my iPhone

> On Oct 6, 2019, at 01:01, Sami Ait Ali Oulahcen  wrote:
> 
> Hi,
> 
> We'd like to start mirroring the GCC.
> 
> URLs:
> http://mirror.marwan.ma/gcc/
> https://mirror.marwan.ma/gcc/
> rsync://mirror.marwan.ma/gcc/
> Location: Rabat, Morocco
> Contact:  Sami Ait Ali Oulahcen (noc{at}marwan{dot}ma)
> 
> Please let us know of the central rsync address, and the recommended pull 
> frequency.
> 
> Regards,
> 
> Sami



Re: Implement ggc_trim

2019-10-18 Thread Jakub Jelinek
On Fri, Oct 11, 2019 at 09:03:53AM +0200, Jan Hubicka wrote:
> Bootstrapped/regtested x86_64-linux, OK?
> 
>   * ggc-page.c (release_pages): Output statistics when !quiet_flag.
>   (ggc_collect): Dump later to not interfere with release_page dump.
>   (ggc_trim): New function.
>   * ggc-none.c (ggc_trim): New.
> 
>   * lto.c (lto_wpa_write_files): Call ggc_trim.

> @@ -1152,10 +1156,20 @@ release_pages (void)
>   *gp = g->next;
>   G.bytes_mapped -= g->alloc_size;
>   free (g->allocation);
> + n1 += g->alloc_size;
>}
>  else
>gp = >next;
>    gp = &g->next;

This broke !defined(USING_MMAP) support, the second g->alloc_size read
is after the memory containing *g is freed.

Fixed thusly, tested with #undef USING_MMAP in the file (without the patch
self-test ICEs, with it succeeds), committed to trunk as obvious.
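
The ordering problem can be reproduced in miniature. This is a minimal sketch,
not the ggc-page.c code: `struct group`, `group_new` and `release_all` are
simplified stand-ins, and the only property carried over is that the header
lives inside the block that `allocation` points to, so every field must be read
before the block is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a page group: the header sits at the start
   of the very block that `allocation` points to. */
struct group {
    struct group *next;
    char *allocation;
    size_t alloc_size;
};

/* Allocate a block of `size` bytes and place its header at the front. */
struct group *group_new(struct group *next, size_t size)
{
    assert(size >= sizeof(struct group));
    char *block = malloc(size);
    if (block == NULL)
        return NULL;
    struct group *g = (struct group *) block;
    g->next = next;
    g->allocation = block;
    g->alloc_size = size;
    return g;
}

/* Free every group and return the number of bytes released. */
size_t release_all(struct group **head)
{
    size_t released = 0;
    struct group *g;
    while ((g = *head) != NULL) {
        *head = g->next;
        released += g->alloc_size;  /* read the size first ...          */
        free(g->allocation);        /* ... because *g is gone after this */
    }
    return released;
}
```

Swapping the two marked statements would read `g->alloc_size` from freed
memory, which is the kind of bug the patch below removes.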

2019-10-18  Jakub Jelinek  

PR middle-end/92153
* ggc-page.c (release_pages): Read g->alloc_size before free rather
than after it.

--- gcc/ggc-page.c.jj   2019-10-11 14:10:44.987386981 +0200
+++ gcc/ggc-page.c  2019-10-18 19:13:59.458085610 +0200
@@ -1155,8 +1155,8 @@ release_pages (void)
   {
*gp = g->next;
G.bytes_mapped -= g->alloc_size;
-   free (g->allocation);
n1 += g->alloc_size;
+   free (g->allocation);
   }
 else
   gp = &g->next;


Jakub


[Bug middle-end/92153] [10 Regression] ICE / segmentation fault, use-after-free at gcc/ggc-page.c:1159

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92153

--- Comment #1 from Jakub Jelinek  ---
Author: jakub
Date: Fri Oct 18 17:18:21 2019
New Revision: 277157

URL: https://gcc.gnu.org/viewcvs?rev=277157&root=gcc&view=rev
Log:
PR middle-end/92153
* ggc-page.c (release_pages): Read g->alloc_size before free rather
than after it.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/ggc-page.c

[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #41 from Rich Felker  ---
> Josef Wolf mentioned that he ran into this on the gcc-help mailing list here: 
> https://gcc.gnu.org/ml/gcc-help/2019-10/msg00079.html

I don't think that's an instance of this issue. It's normal/expected that
__builtin_foo compiles to a call to foo in the absence of factors that lead to
it being optimized to something simpler. The idiom of using __builtin_foo to
get the compiler to emit an optimized implementation of foo for you, to serve
as the public definition of foo, is simply not valid. That's kinda a shame
because it would be nice to be able to do it for lots of math library
functions, but of course in order for this to be able to work gcc would have to
promise it can generate code for the operation for all targets, which is
unlikely to be reasonable.

[Bug middle-end/92153] [10 Regression] ICE / segmentation fault, use-after-free at gcc/ggc-page.c:1159

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92153

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
   Target Milestone|--- |10.0

[Bug sanitizer/92154] new glibc breaks arm bootstrap due to libsanitizer

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92154

--- Comment #1 from Jakub Jelinek  ---
If it has landed upstream already, please post the backport of it to
gcc-patches.

[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2019-10-18 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #40 from Eric Gallager  ---
Josef Wolf mentioned that he ran into this on the gcc-help mailing list here:
https://gcc.gnu.org/ml/gcc-help/2019-10/msg00079.html

[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #21 from Jakub Jelinek  ---
Created attachment 47069
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47069&action=edit
gcc10-prereload-splitters.patch

Ah, apparently we already have for ~ 2 years a property to handle this safely.
So perhaps following incremental (so far completely untested) patch?
It unfortunately requires the two generic changes, the alternative would be
to add a helper function somewhere in i386*.c which would
return can_create_pseudo_p () && !(cfun->curr_properties &
PROP_rtl_split_insns);
declare it in i386-protos.h and just use it in config/i386/*.md instead.
Any preferences?  From maintenance POV, it might be cleaner to have the
wrapper, but then the question is what is the best name for it.

Re: [PATCH] OpenACC 2.6 manual deep copy support (attach/detach)

2019-10-18 Thread Thomas Schwinge
Hi!

While reviewing
<http://mid.mail-archive.com/20191003163505.49997-2-julian@codesourcery.com>
"OpenACC reference count overhaul", I've just now stumbled over one thing
that originally was designed here:

On 2018-12-10T19:41:37+, Julian Brown  wrote:
> On Fri, 7 Dec 2018 14:50:19 +0100
> Jakub Jelinek  wrote:
>
>> On Fri, Nov 30, 2018 at 03:41:09AM -0800, Julian Brown wrote:
>> > @@ -918,8 +920,13 @@ struct splay_tree_key_s {
>> >uintptr_t tgt_offset;
>> >/* Reference count.  */
>> >uintptr_t refcount;
>> > -  /* Dynamic reference count.  */
>> > -  uintptr_t dynamic_refcount;
>> > +  /* Reference counts beyond those that represent genuine references in 
>> > the
>> > + linked splay tree key/target memory structures, e.g. for multiple 
>> > OpenACC
>> > + "present increment" operations (via "acc enter data") refering to 
>> > the same
>> > + host-memory block.  */
>> > +  uintptr_t virtual_refcount;
>> > +  /* For a block with attached pointers, the attachment counters for 
>> > each.  */
>> > +  unsigned short *attach_count;
>> >/* Pointer to the original mapping of "omp declare target link" object. 
>> >  */
>> >splay_tree_key link_key;
>> >  };  
>> 
>> This is something I'm worried about a lot, the nodes keep growing way
>> too much.

Is that just a would-be-nice-to-avoid, or is it an actual problem?

If the latter, can we maybe move some data into on-the-side data
structures, say an associative array keyed by [something suitable]?  I
would assume that compared to actual host to/from device data movement
(or even lookup etc.), lookup of values from such an associative array
should be relatively cheap?

I'm bringing this up, because:

>> Is there a way to reuse some other field if it is of
>> certain kind?
>
> How about this -- it seems that the link_key is only used for OpenMP,

So, is that actually correct?  Per my understanding, for the OpenACC
'link' clause we use 'GOMP_MAP_LINK', which sets "omp declare target
link", and thus:

> and the attach count is only needed for OpenACC. So the obvious thing
> to do is probably to put those two together into a tagged union. The
> question is where to put the tag?
>
> Options are, I guess:
>
> 1. The high or low bits of the address.  Potentially non-portable, ugly.
>
> 2. Or, the virtual refcount is also only needed for OpenACC, so we can
>reserve a magic value for that field to act as a tag.
>
> I've tried implementing the latter in the attached patch, and it seems
> to work OK.

... this is not actually feasible?
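
For readers following along, the quoted proposal amounts to something like the
sketch below. It is not the libgomp code: `struct key` is cut down to the
fields under discussion, the accessor names are invented, and the value given
to `VREFCOUNT_LINK_KEY` is an arbitrary placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder magic value; the real choice is not shown in this thread. */
#define VREFCOUNT_LINK_KEY (~(uintptr_t) 0)

struct key {
    uintptr_t refcount;
    uintptr_t virtual_refcount;       /* doubles as the union's tag */
    union {
        struct key *link_key;         /* valid when tag == VREFCOUNT_LINK_KEY */
        unsigned short *attach_count; /* valid otherwise */
    } u;
};

/* Return the link key, or NULL if this key holds attach counters. */
struct key *key_link(const struct key *k)
{
    return k->virtual_refcount == VREFCOUNT_LINK_KEY ? k->u.link_key : NULL;
}

/* Return the attach counters, or NULL if this key holds a link key. */
unsigned short *key_attach_counts(const struct key *k)
{
    return k->virtual_refcount == VREFCOUNT_LINK_KEY ? NULL : k->u.attach_count;
}
```

The sketch makes the limitation visible: a single key can carry either a link
key or attach counters, never both, which only works if no user needs the two
at once.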

It's certainly possible that we're totally lacking sufficient testsuite
coverage, and that there are issues in the 'link' implementation
( "libgomp.c/target-link-1.c fails for
nvptx: #pragma omp target link not implemented" comes to mind
immediately, and certainly for OpenACC I used to be aware of additional
issues; I think I intended to use that mechanism for Fortran
'allocatable' with OpenACC 'declare'), but the libgomp handling to me
seems reasonable upon quick review -- just that we need to keep it alive
for OpenACC, too, unless I'm confused?


Simplifying the libgomp code to avoid the 'VREFCOUNT_LINK_KEY' toggle
flag, and not putting 'link_key' into a union together with
'attach_count', that should -- I hope -- resolve/obsolete some of the
questions raised in my late-night pre-Christmas 2018 review,
, where I'm now
not sure yet whether all my questions have been addressed (or disputed,
but I didn't hear anything) in the recent -- split-out, thanks! --
version of this patch,
<http://mid.mail-archive.com/20191003163505.49997-2-julian@codesourcery.com>
"OpenACC reference count overhaul".


Grüße
 Thomas




Fixing cvs2svn branchpoints

2019-10-18 Thread Joseph Myers
As mentioned at the Cauldron, I'm looking at finding better branchpoints 
for the cases in the GCC repository where cvs2svn messed up identifying 
the parent branch and commit on which a branch was based, so that affected 
branches can be reparented as part of moving to git, since messed-up 
branchpoints are actually confusing in practice when looking at old 
branches.

An idiomatic branch in SVN would start with a commit that just copies one 
commit of one branch to another branch, with no further changes.  In many 
cases it's not possible to achieve that through reparenting because there 
is no commit on any parent branch exactly corresponding to the first 
commit on the cvs2svn-generated branch.  However, it's still possible to 
find a much better approximation than cvs2svn did in some cases.  (There 
are also cases where cvs2svn found a good branchpoint, but represented the 
branch-creation commit in a superfluously complicated way, replacing lots 
of files and subdirectories by copies of different revisions.  That 
doesn't really matter for conversion to git, however, since git's data 
structures don't say anything about where a particular subdirectory was 
copied from, just the tree hash and the parent commit.)

I'm using heuristics to see if a particular branch has a suspicious 
branchpoint.  First, if there is a branchpoint tag I take that as the best 
estimate of what the tree should look like at the branchpoint commit on 
the parent branch; otherwise, I take the first commit on the branch as the 
best estimate of that.  Then, I consider a branchpoint not to be 
suspicious if the only diffs between the tree at the parent commit and the 
tree estimated to start the branch are file deletions and, if there was 
no branchpoint commit, file additions.
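
The check described above could be sketched as follows. This is not the actual
script: the tree representation (an array of path/hash pairs), the function
names, and the reading of the last condition as "additions are tolerated only
when no branchpoint tag exists" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One file in a tree: its path and a content hash (both hypothetical). */
struct entry {
    const char *path;
    const char *hash;
};

/* Look up `path` in a tree; return NULL if the file is absent. */
const struct entry *tree_find(const struct entry *tree, size_t n,
                              const char *path)
{
    for (size_t i = 0; i < n; ++i)
        if (strcmp(tree[i].path, path) == 0)
            return &tree[i];
    return NULL;
}

/* Compare the candidate parent tree with the tree estimated to start
   the branch.  Deletions (files only in the parent) are always
   tolerated, additions only when there is no branchpoint tag, and
   content changes never. */
bool branchpoint_suspicious(const struct entry *parent, size_t n_parent,
                            const struct entry *start, size_t n_start,
                            bool have_tag)
{
    for (size_t i = 0; i < n_start; ++i) {
        const struct entry *p = tree_find(parent, n_parent, start[i].path);
        if (p == NULL) {
            if (have_tag)
                return true;      /* file addition, not tolerated here */
            continue;             /* file addition, tolerated */
        }
        if (strcmp(p->hash, start[i].hash) != 0)
            return true;          /* file content differs */
    }
    return false;                 /* only deletions (or nothing) remain */
}
```

A candidate parent for which this returns false has no file content
differences from the branch start.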

(There are several reasons why the creation of a branch might involve file 
deletions.  Some look like CVS glitches where it simply failed to create 
the branch in particular ,v files; some may be cases where the person 
created the branch only for certain subdirectories, deliberately; some 
look like cases where ,v files for separately developed subdirectories, 
e.g. libjava, got moved into the GCC CVS repository at some point, so 
resulting in the appearance of those subdirectories being deleted on 
creation of branches before they were moved into place.  File additions at 
branch creation look more like an artifact of how cvs2svn handles cases of 
a file first added on trunk after a branch was created, then backported to 
that branch.)

If the branchpoint is suspicious (54 are, out of 135 branches in /branches 
as of r105925, the last cvs2svn-generated commit), I then look for an 
alternative non-suspicious branchpoint, which might be either on the same 
parent branch currently used, or on a different one chosen by some 
heuristics.  Because pretty much all normal GCC commits change file 
contents (modifying a ChangeLog file, if nothing else), any candidate 
parent that is non-suspicious, and thus does not involve any file content 
differences when compared with the branchpoint commit or first commit on 
the branch, should be very close to being the right parent commit.

Here is a list of reparentings I suggest for 16 of those 54 branches, 
including in particular the cases of egcs_1_00_branch and gcc-3_2-branch 
that were noted on IRC to have bad branchpoints at present; some are only 
small changes, some are much more major fixes.  I expect I can find 
reparentings for some of the rest with more investigation and improved 
heuristics or hints for those heuristics, while others may well already be 
essentially the right branchpoint despite file content changes being 
present in the first commit.  (Two of the rest do have reparentings 
suggested by my script, but they need more careful investigation because 
of file content mismatches between the branchpoint tags and the first 
commit on the branch.)

The first two columns after REPARENT: list the SVN path of the branch, and 
the revision number of the first commit on it (the one that should be 
reparented).  The next two list the suspicious parent (that is, the branch 
and revision from which cvs2svn generated the copy that created the 
top-level /branches/whatever directory for the branch, along with further 
changes in the commit to fix up files and subdirectories in that copy to 
have the right tree contents).  The final two columns list the proposed 
new parent branch and revision on that branch.  In all cases, the tree 
content is expected to be left as generated by cvs2svn; it's simply the 
commit parent that should be changed in git.
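
For anyone wanting to consume this six-column format mechanically, a 
minimal parsing sketch might look like the following.  The struct and 
function names are hypothetical, and it assumes each entry has first been 
joined back onto a single line (the mailer wraps the long ones).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One reparenting entry.  Field names are hypothetical.  */
struct reparent
{
  char branch[256];      /* SVN path of the branch.  */
  long first_rev;        /* First commit on the branch.  */
  char old_parent[256];  /* Suspicious parent branch.  */
  long old_rev;          /* Revision on the suspicious parent.  */
  char new_parent[256];  /* Proposed new parent branch.  */
  long new_rev;          /* Revision on the proposed parent.  */
};

/* Parse one unwrapped "REPARENT:" line into *OUT.
   Return 1 if all six columns were read, 0 otherwise.  */
int
parse_reparent (const char *line, struct reparent *out)
{
  assert (line != NULL && out != NULL);
  if (strncmp (line, "REPARENT:", 9) != 0)
    return 0;
  int n = sscanf (line, "REPARENT: %255s %ld %255s %ld %255s %ld",
                  out->branch, &out->first_rev,
                  out->old_parent, &out->old_rev,
                  out->new_parent, &out->new_rev);
  return n == 6;
}
```

An incomplete line simply fails to parse rather than being filled in with 
guesses.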

REPARENT: /branches/GC_5_0_ALPHA_1 27860 /trunk 27852 /trunk 27855
REPARENT: /branches/csl-3_3_1-branch 70143 /trunk 60111 
/branches/gcc-3_3-branch 70142
REPARENT: /branches/csl-3_4-linux-branch 90110 /trunk 75991 
/branches/gcc-3_4-branch 90109
REPARENT: /branches/csl-3_4_0-hp-branch 80843 /trunk 75991 
/branches/gcc-3_4-branch 80842
REPARENT: 

UML - GCC program

2019-10-18 Thread Angelmarauder
Hello community,

I don't know the correct format for this email, so I'll give a summary
according to https://gcc.gnu.org/contributewhy.html and wait for
responses!

"what you are working on"

I am creating a program to link the draw.io platform with a c++
compiled gcc output
to allow for:

1. auto-documentation and graphical diagrams (class, sequence, etc.)
2. alternatively create basic c++ code based on diagram creation

"give occasional reports of how far you have come"

I have begun planning!  I have my own personal Trello for now but I
will be expanding my over time.

"how confident you are that you will finish the job"

This is my senior project for my BS Software Engineering.  I'm hoping
to eventually commercialize.
Since it is a requirement to finish the project, minimal functionality
will be required.

Sincerely,
Angelmarauder


Technical Feasibility.odt
Description: application/vnd.oasis.opendocument.text


Re: [Patch][Demangler] Fix for complex values

2019-10-18 Thread Ian Lance Taylor via gcc-patches
On Thu, Oct 17, 2019 at 10:20 PM Miguel Saldivar  wrote:
>
> This is a small fix for Bug 67299, so that the symbol `Z1fCf` becomes
> `f(float complex)` instead of `f(floatcomplex )`.
> I thought this would be the preferred way of printing, because
> `llvm-cxxfilt` and `cpp_filt` both print the mangled name in this
> fashion.

Thanks.  Personally I think it would be better to change the strings
to " _Complex" and " _Imaginary".  I'm open to discussion on this.

Ian

> From 4ca98c0749bae1389594b31ee7f6ef575aafcd8f Mon Sep 17 00:00:00 2001
> From: Miguel Saldivar 
> Date: Thu, 17 Oct 2019 16:36:19 -0700
> Subject: [PATCH][Demangler] Small fix for complex values
>
> gcc/libiberty/
> * cp-demangle.c (d_print_mod): Add a space before printing `complex`
> and `imaginary`, as opposed to after.
>
> gcc/libiberty/
> * testsuite/demangle-expected: Adjust test.
> ---
>  libiberty/ChangeLog   | 5 +
>  libiberty/cp-demangle.c   | 4 ++--
>  libiberty/testsuite/demangle-expected | 2 +-
>  3 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
> index 97d9767c2ea..62d5527b95b 100644
> --- a/libiberty/ChangeLog
> +++ b/libiberty/ChangeLog
> @@ -1,3 +1,8 @@
> +2019-10-17  Miguel Saldivar  
> + * cp-demangle.c (d_print_mod): Add a space before printing `complex`
> + and `imaginary`, as opposed to after.
> + * testsuite/demangle-expected: Adjust test.
> +
>  2019-10-03  Eduard-Mihai Burtescu  
>
>   * rust-demangle.c (looks_like_rust): Remove.
> diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
> index aa78c86dd44..bd4dfb785a9 100644
> --- a/libiberty/cp-demangle.c
> +++ b/libiberty/cp-demangle.c
> @@ -5977,10 +5977,10 @@ d_print_mod (struct d_print_info *dpi, int options,
>d_append_string (dpi, "&&");
>return;
>  case DEMANGLE_COMPONENT_COMPLEX:
> -  d_append_string (dpi, "complex ");
> +  d_append_string (dpi, " complex");
>return;
>  case DEMANGLE_COMPONENT_IMAGINARY:
> -  d_append_string (dpi, "imaginary ");
> +  d_append_string (dpi, " imaginary");
>return;
>  case DEMANGLE_COMPONENT_PTRMEM_TYPE:
>if (d_last_char (dpi) != '(')
> diff --git a/libiberty/testsuite/demangle-expected
> b/libiberty/testsuite/demangle-expected
> index f21ed00e559..43f003655b2 100644
> --- a/libiberty/testsuite/demangle-expected
> +++ b/libiberty/testsuite/demangle-expected
> @@ -1278,7 +1278,7 @@ int& int_if_addable(A ((*((Y*)(0)))+(*((Y*)(0>*)
>  #
>  --format=gnu-v3
>  _Z3bazIiEvP1AIXszcl3foocvT__ELCf_
> -void baz(A*)
> +void baz(A*)
>  #
>  --format=gnu-v3
>  _Z3fooI1FEN1XIXszdtcl1PclcvT__EEE5arrayEE4TypeEv
> --
> 2.23.0
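
To make the difference between the spellings concrete, here is a small 
standalone sketch.  It does not use libiberty; append_string and 
format_modified_type are hypothetical stand-ins for appending the base 
type and then the modifier text to the output buffer, so it only 
illustrates where the space ends up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append S to the NUL-terminated string in BUF, whose capacity is SIZE
   bytes, truncating if it does not fit.  Hypothetical helper.  */
static void
append_string (char *buf, size_t size, const char *s)
{
  size_t used = strlen (buf);
  assert (used < size);
  snprintf (buf + used, size - used, "%s", s);
}

/* Write BASE followed by MODIFIER into BUF, e.g. "float" followed by
   " complex".  Hypothetical helper, not part of cp-demangle.c.  */
void
format_modified_type (char *buf, size_t size, const char *base,
                      const char *modifier)
{
  assert (buf != NULL && size > 0);
  buf[0] = '\0';
  append_string (buf, size, base);
  append_string (buf, size, modifier);
}
```

With base "float", the old string "complex " gives "floatcomplex ", the 
patched " complex" gives "float complex", and the " _Complex" suggested 
in the reply would give "float _Complex".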


Re: Type representation in CTF and DWARF

2019-10-18 Thread Nick Alcock
On 18 Oct 2019, Pedro Alves stated:

> On 10/18/19 2:21 PM, Richard Biener wrote:
>
>>>> In most cases local types etc are a fairly small contributor to the
>>>> total volume -- but macros can contribute a lot in some codebases. (The
>>>> Linux kernel's READ_ONCE macro is one I've personally been bitten by in
>>>> the past, with a new local struct in every use. GCC doesn't deduplicate
>>>> any of those so the resulting bloat from tens of thousands of instances
>>>> of this identical structure is quite incredible...)
>>>
>>> Sounds like something that would be beneficial to do with DWARF too.
>> 
>> Otoh those are distinct types according to the C standard and since dwarf is 
>> a source level representation we should preserve this (source locations also 
>> differ). 
>
> Right.  Maybe some partial deduplication would be possible, preserving
> type distinction.  But since CTF doesn't include these, this is moot
> for now.

Yeah, the libctf API and existing CTF users only care if they're
assignment-compatible, which they are. We could preserve more
type-identity information if there was a need to do so, but none has yet
emerged.

-- 
NULL && (void)


[Bug target/92140] clang vs gcc optimizing with adc/sbb

2019-10-18 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92140

--- Comment #20 from Segher Boessenkool  ---
Ah, okay.  So it is either one or two insns (zero can not be handled, but you
can do a noop, a move of a reg to itself, and that will be optimised away just
fine).  Three insns is not something combine ever handles at all: it's always
{2,3,4}->{1,2}.

Since some years new pseudos *can* be created during combine, but there are
various problems with that still.

[Bug debug/90231] ivopts causes iterator in the loop

2019-10-18 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90231

--- Comment #19 from Jakub Jelinek  ---
Created attachment 47068
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=47068&action=edit
gcc10-pr90231.patch

Untested implementation of what I wrote above.
The difference on the testcase at -O2 -g is:
[local count: 955630224]:
   # base_17 = PHI 
   # ivtmp.11_20 = PHI 
-  # DEBUG i => NULL
+  # DEBUG i => (int) ((ivtmp.11_20 - (unsigned long) dst_10(D)) /[ex] 4)
   # DEBUG base => base_17
   # DEBUG BEGIN_STMT
   _4 = (void *) ivtmp.11_20;
   MEM[base: _4, offset: 0B] = base_17;
   # DEBUG BEGIN_STMT
-  # DEBUG D#1 => NULL
+  # DEBUG D#1 => (int) ((ivtmp.11_20 - (unsigned long) dst_10(D)) /[ex] 4) + 1
   # DEBUG i => D#1
   base_13 = base_17 + 15;
   # DEBUG i => D#1
and in the debugger I can actually see correct i values.
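
In plain C terms, the new debug expression recovers the original index 
from the pointer-valued induction variable that ivopts introduced.  A 
rough sketch of the same arithmetic, assuming 4-byte elements and a flat 
address space (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Recover the loop index from IVTMP, the address of the element currently
   being written, given the array base DST.  Mirrors
     (int) ((ivtmp - (unsigned long) dst) /[ex] 4)
   where "/[ex]" is an exact division, i.e. the remainder is known to be
   zero.  */
int
index_from_ivtmp (uintptr_t ivtmp, const int32_t *dst)
{
  uintptr_t delta = ivtmp - (uintptr_t) dst;
  assert (delta % sizeof (int32_t) == 0);
  return (int) (delta / sizeof (int32_t));
}
```

The D#1 binding in the dump is the same expression plus one, i.e. the 
value of i after the increment.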

[Bug tree-optimization/60540] Don't convert int to float when comparing int with float (double) constant

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60540

Rich Felker  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugdal at aerifal dot cx

--- Comment #6 from Rich Felker  ---
> Only if the int is out of float's range.

float's range is [-INF,INF] (endpoints included). There is no such thing as
"out of float's range".
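
A small illustration of the point, assuming 32-bit int and IEEE binary32 
float with round-to-nearest (the function names are hypothetical): every 
int converts to a finite float, so the conversion never leaves float's 
range, but it can lose precision, which is what makes comparing through 
float differ from comparing the integers.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>

/* Convert I to float.  The volatile store forces rounding to float
   precision even on targets that evaluate in a wider format.  */
float
int_to_float (int i)
{
  volatile float f = (float) i;
  return f;
}

/* Nonzero if I converts to a finite float.  */
int
converts_to_finite (int i)
{
  return isfinite (int_to_float (i));
}

/* Compare I with the float constant C by converting I first.  */
int
equal_via_float (int i, float c)
{
  return int_to_float (i) == c;
}
```

Under those assumptions converts_to_finite holds for both INT_MAX and 
INT_MIN, while equal_via_float (16777217, 16777216.0f) is true even 
though the integers differ, because 2^24 + 1 is not representable in 
float.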

[Bug target/12306] GOT pointer (r12) reloaded unnecessarily

2019-10-18 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12306

Rich Felker  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugdal at aerifal dot cx

--- Comment #8 from Rich Felker  ---
I think this should be closed as not a bug. There is no contract that, on
function entry, the r12 register contain the callee's GOT pointer. Rather it
contains the caller's GOT pointer, and the two will only be equal if both
reside in the same DSO.

(Note that PowerPC64 ELFv2 ABI goes to great lengths to optimize this case with
"local entry point" and fancy ABI contract for how the GOT pointer save/load
can be elided. I'm not sure the benefits are well-documented though.)
