Re: ipa-visibility TLC 2/n

2014-05-29 Thread Richard Sandiford
Jan Hubicka  writes:
>> Richard Sandiford wrote the original section anchors implementation,
>> so he would be a good person to comment about the interaction between
>> aliases and section anchors.
>
> Thanks! Richard, does this patch seem sane?

Looks good to me in principle, but with:

> +  struct symtab_node *snode;
>decl = SYMBOL_REF_DECL (symbol);
> +
> +  snode = symtab_node (decl);
> +  if (snode->alias)
> +   {
> + rtx target = DECL_RTL (symtab_alias_ultimate_target (snode)->decl);
> + SYMBOL_REF_BLOCK_OFFSET (symbol) = SYMBOL_REF_BLOCK_OFFSET (target);
> + return;
> +   }

is SYMBOL_REF_BLOCK_OFFSET (target) guaranteed to be valid at this point?
It looked at face value like you'd need a recursive call to place_block_symbol
on the target before the copy.
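
A minimal sketch of that suggestion (assuming, as the quoted context
suggests, that the hunk lives in place_block_symbol in varasm.c; the
exact guard needed to avoid re-placing an already-placed symbol is
omitted):

  snode = symtab_node (decl);
  if (snode->alias)
    {
      rtx target = DECL_RTL (symtab_alias_ultimate_target (snode)->decl);

      /* Place the ultimate target first so that its block offset has
         been computed before we copy it.  */
      place_block_symbol (target);
      SYMBOL_REF_BLOCK_OFFSET (symbol) = SYMBOL_REF_BLOCK_OFFSET (target);
      return;
    }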

Thanks,
Richard


RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-29 Thread Thomas Preud'homme
> From: Andreas Schwab [mailto:sch...@linux-m68k.org]
> 
> This adds a full byte of padding between each bitfield.  If you want a
> single padding bit you should use :1, but you also need to update the
> test to check for 0x44434241 (0x88868482 is impossible, since that
> requires at least 8 bits per bitfield).

Actually, if I understood C99 correctly, it depends on the storage unit
allocated for the bitfield preceding the zero-length bitfield. Instead of
trying to cover all possible values read from this bitfield, I rewrote the
test to check whether bswap misinterprets the expression and replaces it with
a load or load+bswap. This reduces the number of possible values to 2
and thus makes the test less fragile and easier to understand.
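
As a minimal illustration (assuming a little-endian target with 8-bit
storage units): with the bytes 0x83 0x85 0x87 0x89 in memory, a plain
32-bit load yields 0x89878583, which is exactly the value the updated
test rejects, since producing it would mean bswap had wrongly replaced
the bitfield reads with a full-word load:

#include <stdint.h>
#include <string.h>

uint32_t
plain_load (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);   /* gives 0x89878583 on little-endian */
  return v;
}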

By the way, I couldn't understand how you reached the value
0x44434241. Can you explain it to me?

Here is the ChangeLog:

2014-05-29  Thomas Preud'homme  

* gcc.c-torture/execute/bswap-2.c: Add alignment constraints to
bitfield and test wrong results instead of correct results to make the
test more portable.

And the patch:

diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
index 38f18fd..a47e01a 100644
--- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
@@ -6,8 +6,11 @@ typedef __UINT32_TYPE__ unsigned;
 
 struct bitfield {
   unsigned char f0:7;
+  unsigned char   :1;
   unsigned char f1:7;
+  unsigned char   :1;
   unsigned char f2:7;
+  unsigned char   :1;
   unsigned char f3:7;
 };
 
@@ -74,11 +77,17 @@ main ()
 return 0;
   bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 };
   out = partial_read_le32 (bfin);
-  if (out != 0x09070503 && out != 0x88868482 && out != 0x78306141)
+  /* Test what bswap would do if its checks are not strict enough, instead
+ of testing the expected result, as there are too many possible results
+ with bitfields.  */
+  if (out == 0x89878583)
 __builtin_abort ();
   bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 };
   out = partial_read_be32 (bfin);
-  if (out != 0x03050709 && out != 0x82848688 && out != 0x41613078)
+  /* Test what bswap would do if its checks are not strict enough, instead
+ of testing the expected result, as there are too many possible results
+ with bitfields.  */
+  if (out == 0x83858789)
 __builtin_abort ();
   out = fake_read_le32 (cin, &cin[2]);
   if (out != 0x89018583)

Best regards,

Thomas



Re: [Patch] Minor fixes for regtesting gfortran with -flto

2014-05-29 Thread Uros Bizjak
Hello!

> With the following patch, gfortran can be regtested with -flto
> with no failures except pr54852 and pr60061.

-! { dg-final { scan-assembler-times "myBindC" 1 { target { ! { hppa*-*-hpux* } } } } }
-! { dg-final { scan-assembler-times "myBindC,%r2" 1 { target { hppa*-*-hpux* } } } }
+! { dg-final { scan-assembler-times "call\[^\n\r\]*myBindC" 1 { target { ! { hppa*-*-hpux* } } } } }
+! { dg-final { scan-assembler-times "call\[^\n\r\]*myBindC,%r2" 1 { target { hppa*-*-hpux* } } } }

The change above fails on alpha, which doesn't emit "call" in the assembly, but:

$ grep myBindC bind_c_array_params_2.s
jsr $26,myBindC

Probably, alpha is not the only one that fails this assumption.

Uros.


Re: -fuse-caller-save - Collect register usage information

2014-05-29 Thread Tom de Vries

On 29-05-14 00:42, Bill Schmidt wrote:

Tom, the final version of this patch that you committed breaks bootstrap
on powerpc64le-linux-gnu.  The problem is that all uses of the variable
i are guarded by #ifdef STACK_REGS, but the declaration of i is
unconditional.  We get an unused variable warning that becomes an error
during stage 3.



Bill,

thanks for letting me know.

I've bootstrapped the attached patch on x86_64 and committed it.

Thanks,
- Tom

2014-05-29  Tom de Vries  

	* final.c (collect_fn_hard_reg_usage): Guard variable declaration
	with #ifdef STACK_REGS.

diff --git a/gcc/final.c b/gcc/final.c
index a345fe7..c32e177 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -4750,7 +4750,9 @@ static void
 collect_fn_hard_reg_usage (void)
 {
   rtx insn;
+#ifdef STACK_REGS
   int i;
+#endif
   struct cgraph_rtl_info *node;
 
   /* ??? To be removed when all the ports have been fixed.  */


Re: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-29 Thread Andreas Schwab
"Thomas Preud'homme"  writes:

> By the way, I couldn't understand how you reached the value
> 0x44434241. Can you explain it to me?

Each byte is composed of the first 7 bits of the original byte.
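
(A worked example of the arithmetic, assuming a target such as m68k
that allocates bitfields starting from the most significant bit:
0x83 >> 1 == 0x41, 0x85 >> 1 == 0x42, 0x87 >> 1 == 0x43,
0x89 >> 1 == 0x44, so the four 7-bit fields read back as 0x41..0x44
and the little-endian word is 0x44434241.)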

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


RE: [MIPS] Add basic support for MSA (SIMD)

2014-05-29 Thread Matthew Fortune
Mike Stump  writes:
> On May 28, 2014, at 7:27 AM, Richard Earnshaw  wrote:
> >
> > Speed of implementation.  We're gradually replacing these with proper
> > builtins, but that takes a lot more work.
> 
> As an owner of a port with more builtins than yours, I can offer a
> technological solution to reduce the cost of builtins to:
> 
> (define_builtin "my_stop"
>   [
> (define_outputs [(void_operand 0)])
> (define_rtl_pattern "my_stop" [])
>   ]
> )
> 
> (define_insn "my_stop"
>   [(unspec_volatile [(const_int 0)]
> UNSPECV_STOP)]
>   ""
>   "stop")
> 
> for example.  This creates the builtins, allows overloading, allows
> input/output parameters, can reorder operands, allows for complex types,
> allows memory reference parameters, allows pure markings, does vectors,
> conditional availability, generates documentation, creates test suites and
> more.  If you wire up a speaker it even sings.
> 
> Someone would have to step forward with a need and some time to port
> their port over to the new scheme and help with the reason for why the
> technology should go in.  It is mostly contained in 5600 lines of self
> contained python code, and is built to solve the problem generally.  It adds
> about 800 lines to builtins.c.  It has a macro system that is more powerful
> than the macro system .md files use, so one gets to share and collapse
> builtins rather nicely.  It is known to work for C and C++.  Other languages
> may need extending; C, for example, costs around 250 lines to support.

Myself and others at IMG would be interested in reviewing/evaluating the
implementation, and assuming it looks useful we would of course help to
get it in shape for submission.
 
> One promise, you will never have to create an argument list, or a type, for
> example here is a two output, type input functional instruction with some
> doc content:
> 
> (define_mode_iterator MYTYPE
> [V8QI V4HI V2SI DI ...])
> 
> (define_builtin "my_foo" "my_foo2_"
>   [
> (define_desc "Doc string for operation")
> (define_outputs [(var_operand:T_MYTYPE 0)
>  (var_operand:T_MYTYPE 1)])
> (define_inputs  [(var_operand:T_MYTYPE 2)
>  (var_operand:T_MYTYPE 3)])
> (define_rtl_pattern "my_foo2_" [0 2 1 3])
> (attributes [pure])
>   ]
> )
> 
> I stripped it so you can't know what the instruction was, but you get a
> flavor of multiple outputs, doc bits, pure, overloading, arguments and
> argument rearranging.

Can you post the implementation as an RFC? I suspect the python aspect
will cause the most trouble, as GCC builds do not currently require python;
I guess that could change depending on the value added. Otherwise it would
be a rewrite, I guess.

Before digging in too deep though it would be useful to know if RichardS
would be willing to consider this kind of thing for the MIPS port?

Regards,
Matthew


Re: [Patch] Minor fixes for regtesting gfortran with -flto

2014-05-29 Thread Dominique Dhumieres
> Probably, alpha is not the only one that fails this assumption.

Indeed! see the thread starting at
https://gcc.gnu.org/ml/fortran/2014-05/msg00127.html

Could you test the following patch

--- ../_clean/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90	2014-05-24 16:17:53.0 +0200
+++ gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90	2014-05-29 11:34:40.0 +0200
@@ -16,7 +16,7 @@ integer :: aa(4,4)
 call test(aa)
 end
 
-! { dg-final { scan-assembler-times "call\[^\n\r\]*myBindC" 1 { target { ! { hppa*-*-hpux* } } } } }
-! { dg-final { scan-assembler-times "call\[^\n\r\]*myBindC,%r2" 1 { target { hppa*-*-hpux* } } } }
+! { dg-final { scan-assembler-times "\[ \t\]\[$,_0-9\]*myBindC" 1 { target { ! { hppa*-*-hpux* } } } } }
+! { dg-final { scan-assembler-times "\[ \t\]\[$,_0-9\]*myBindC,%r2" 1 { target { hppa*-*-hpux* } } } }
 ! { dg-final { scan-tree-dump-times "test \\\(&parm\\." 1 "original" } }
 ! { dg-final { cleanup-tree-dump "original" } }

with

make -k check-gfortran RUNTESTFLAGS="dg.exp=bind_c_array_params_2.f90 --target_board=unix'{-m32,-m64,-m32/-flto,-m64/-flto}'"

Can you pre-approve it?

TIA

Dominique


Re: [MIPS] Add basic support for MSA (SIMD)

2014-05-29 Thread Ramana Radhakrishnan
On Wed, May 28, 2014 at 6:49 PM, Mike Stump  wrote:
> On May 28, 2014, at 7:27 AM, Richard Earnshaw  wrote:
>>
>> Speed of implementation.  We're gradually replacing these with proper
>> builtins, but that takes a lot more work.
>
> As an owner of a port with more builtins than yours, I can offer a
> technological solution to reduce the cost of builtins to:
>
> (define_builtin "my_stop"
>   [
> (define_outputs [(void_operand 0)])
> (define_rtl_pattern "my_stop" [])
>   ]
> )
>
> (define_insn "my_stop"
>   [(unspec_volatile [(const_int 0)]
> UNSPECV_STOP)]
>   ""
>   "stop")
>
> for example.  This creates the builtins, allows overloading, allows 
> input/output parameters, can reorder operands, allows for complex types, 
> allows memory reference parameters, allows pure markings, does vectors, 
> conditional availability, generates documentation, creates test suites and 
> more.  If you wire up a speaker it even sings.
>
> Someone would have to step forward with a need and some time to port
> their port over to the new scheme and help with the reason for why the 
> technology should go in.  It is mostly contained in 5600 lines of self 
> contained python code, and is built to solve the problem generally.  It adds 
> about 800 lines to builtins.c.  It has a macro system that is more powerful 
> than the macro system .md files use, so one gets to share and collapse 
> builtins rather nicely.  It is known to work for C and C++.  Other languages 
> may need extending; C, for example, costs around 250 lines to support.
>
> One promise, you will never have to create an argument list, or a type, for 
> example here is a two output, type input functional instruction with some doc 
> content:
>
> (define_mode_iterator MYTYPE
> [V8QI V4HI V2SI DI ...])
>
> (define_builtin "my_foo" "my_foo2_"
>   [
> (define_desc "Doc string for operation")
> (define_outputs [(var_operand:T_MYTYPE 0)
>  (var_operand:T_MYTYPE 1)])
> (define_inputs  [(var_operand:T_MYTYPE 2)
>  (var_operand:T_MYTYPE 3)])
> (define_rtl_pattern "my_foo2_" [0 2 1 3])
> (attributes [pure])
>   ]
> )
>
> I stripped it so you can't know what the instruction was, but you get a
> flavor of multiple outputs, doc bits, pure, overloading, arguments and 
> argument rearranging.
>
>
> Let me know if you're interested.

This sounds interesting - could you post something for an RFC or in a
branch so that one can play with it?

Ramana


Re: RFA: cache enabled attribute by insn code

2014-05-29 Thread Richard Earnshaw
On 27/05/14 16:07, Richard Sandiford wrote:
> Richard Sandiford  writes:
>> Richard Sandiford  writes:
>>> Does the following patch help?
>>
>> Bah, it won't of course: %i1 needs to be the operator.
> 
> Here's v2.  I tested that it worked for simple tests like:
> 
> int f1 (int x, int y) { return x + (y << 4); }
> int f2 (int x, int y) { return x - (y << 4); }
> int f3 (int x, int y) { return x & (y << 4); }
> int f4 (int x, int y) { return x | (y << 4); }
> int f5 (int x, int y) { return x ^ (y << 4); }
> int f6 (int x, int y) { return (y << 4) - x; }
> int g1 (int x, int y, int z) { return x + (y << z); }
> int g2 (int x, int y, int z) { return x - (y << z); }
> int g3 (int x, int y, int z) { return x & (y << z); }
> int g4 (int x, int y, int z) { return x | (y << z); }
> int g5 (int x, int y, int z) { return x ^ (y << z); }
> int g6 (int x, int y, int z) { return (y << z) - x; }
> 
> as well as the testcase.
> 
> Thanks,
> Richard
> 
> 
> gcc/
>   * config/arm/iterators.md (shiftable_ops): New code iterator.
>   (t2_binop0, arith_shift_insn): New code attributes.
>   * config/arm/arm.md (insn_enabled): Delete.
>   (enabled): Remove insn_enabled test.
>   (*arith_shiftsi): Split out...
>   (*arith_multsi): ...this pattern and remove insn_enabled attribute.
> 


Thanks, Richard.
I've tweaked this as follows and committed it.

I now consider "shift_operator" in the arm backend deprecated.  We
should be moving towards using shift_nomul_operator.

There's one final wart still to be handled, though.  'rotate' can only
take an immediate operand, not a register.  We can currently deal with
this, but it's not clean in terms of constraint handling.  I'll see if I
can fix this up sometime, but not today.

R.

2014-05-29  Richard Earnshaw 
Richard Sandiford  

* arm/iterators.md (shiftable_ops): New code iterator.
(t2_binop0, arith_shift_insn): New code attributes.
* arm/predicates.md (shift_nomul_operator): New predicate.
* arm/arm.md (insn_enabled): Delete.
(enabled): Remove insn_enabled test.
(*arith_shiftsi): Delete.  Replace with ...
(*<arith_shift_insn>_multsi): ... new pattern.
(*<arith_shift_insn>_shiftsi): ... new pattern.
* config/arm/arm.c (arm_print_operand): Handle operand format 'b'.





diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ccad548..b514757 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -21271,7 +21271,15 @@ arm_print_condition (FILE *stream)
 }
 
 
-/* If CODE is 'd', then the X is a condition operand and the instruction
+/* Globally reserved letters: acln
+   Puncutation letters currently used: @_|?().!#
+   Lower case letters currently used: bcdefhimpqtvwxyz
+   Upper case letters currently used: ABCDFGHJKLMNOPQRSTU
+   Letters previously used, but now deprecated/obsolete: sVWXYZ.
+
+   Note that the global reservation for 'c' is only for CONSTANT_ADDRESS_P.
+
+   If CODE is 'd', then the X is a condition operand and the instruction
should only be executed if the condition is true.
if CODE is 'D', then the X is a condition operand and the instruction
should only be executed if the condition is false: however, if the mode
@@ -21411,6 +21419,19 @@ arm_print_operand (FILE *stream, rtx x, int code)
}
   return;
 
+case 'b':
+  /* Print the log2 of a CONST_INT.  */
+  {
+   HOST_WIDE_INT val;
+
+   if (!CONST_INT_P (x)
+   || (val = exact_log2 (INTVAL (x) & 0xffffffff)) < 0)
+ output_operand_lossage ("Unsupported operand for code '%c'", code);
+   else
+ fprintf (stream, "#" HOST_WIDE_INT_PRINT_DEC, val);
+  }
+  return;
+
 case 'L':
   /* The low 16 bits of an immediate constant.  */
   fprintf (stream, HOST_WIDE_INT_PRINT_DEC, INTVAL(x) & 0xffff);
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 348a89c..cd7495f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -200,17 +200,9 @@
  (const_string "yes")]
 (const_string "no")))
 
-; Allows an insn to disable certain alternatives for reasons other than
-; arch support.
-(define_attr "insn_enabled" "no,yes"
-  (const_string "yes"))
-
 ; Enable all alternatives that are both arch_enabled and insn_enabled.
  (define_attr "enabled" "no,yes"
-   (cond [(eq_attr "insn_enabled" "no")
- (const_string "no")
-
- (and (eq_attr "predicable_short_it" "no")
+   (cond [(and (eq_attr "predicable_short_it" "no")
   (and (eq_attr "predicated" "yes")
(match_test "arm_restrict_it")))
  (const_string "no")
@@ -9876,38 +9868,34 @@
 
 ;; Patterns to allow combination of arithmetic, cond code and shifts
 
-(define_insn "*arith_shiftsi"
-  [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r")
-(match_operator:SI 1 "shiftable_operator"
-  [(match_operator:SI 3 "shift_operator"
- [(match_operand:SI 4 "s_register_operand" "r,r,r,r")
-   

RE: [PATCH] Fix PR54733 Optimize endian independent load/store

2014-05-29 Thread Thomas Preud'homme
> From: Andreas Schwab [mailto:sch...@linux-m68k.org]
> "Thomas Preud'homme"  writes:
> 
> > By the way, I couldn't understand how you reached the value
> > 0x44434241. Can you explain it to me?
> 
> Each byte is composed of the first 7 bits of the original byte.

Sorry, it seems I wasn't very awake when I checked that. Makes sense now.
Thanks. 

Does the patch solve the problem you had? What about you, Christophe?

Best regards,

Thomas





Re: [PATCH] Inline asm asan instrumentation

2014-05-29 Thread Evgeniy Stepanov
Cool, we don't have this in LLVM-ASan, but we have plans to instrument
inline asm soon (not just constraints).

asm-struct-1.c test looks like a false positive though - the code does
not access any invalid memory, it only does a harmless pointer cast.


On Wed, May 28, 2014 at 10:36 PM, Konstantin Serebryany
 wrote:
> On Wed, May 28, 2014 at 5:33 PM, Marat Zakirov  wrote:
>> Hi all,
>>
>> Here's a patch for optional Asan instrumentation of inline assembly.
>>
>> This version scans gimple for GIMPLE_ASMs and performs usual instrumentation
>> of arguments with memory constraints ("m", "o", etc.) with fixed size.
>>
>> Instrumentation is turned off by default.
>>
>> This was successfully bootstrapped and regtested on x64. I have also
>> instrumented and ran ffmpeg regression testsuite (it seems to have quite
>> some inline asm).
>>
>> --Marat


Re: [Patch] Minor fixes for regtesting gfortran with -flto

2014-05-29 Thread Uros Bizjak
On Thu, May 29, 2014 at 11:38 AM, Dominique Dhumieres
 wrote:
>> Probably, alpha is not the only one that fails this assumption.
>
> Indeed! see the thread starting at
> https://gcc.gnu.org/ml/fortran/2014-05/msg00127.html
>
> Could you test the following patch
>
> --- ../_clean/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90	2014-05-24 16:17:53.0 +0200
> +++ gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90	2014-05-29 11:34:40.0 +0200
> @@ -16,7 +16,7 @@ integer :: aa(4,4)
>  call test(aa)
>  end
>
> -! { dg-final { scan-assembler-times "call\[^\n\r\]*myBindC" 1 { target { ! { hppa*-*-hpux* } } } } }
> -! { dg-final { scan-assembler-times "call\[^\n\r\]*myBindC,%r2" 1 { target { hppa*-*-hpux* } } } }
> +! { dg-final { scan-assembler-times "\[ \t\]\[$,_0-9\]*myBindC" 1 { target { ! { hppa*-*-hpux* } } } } }
> +! { dg-final { scan-assembler-times "\[ \t\]\[$,_0-9\]*myBindC,%r2" 1 { target { hppa*-*-hpux* } } } }
>  ! { dg-final { scan-tree-dump-times "test \\\(&parm\\." 1 "original" } }
>  ! { dg-final { cleanup-tree-dump "original" } }
>
> with
>
> make -k check-gfortran RUNTESTFLAGS="dg.exp=bind_c_array_params_2.f90 --target_board=unix'{-m32,-m64,-m32/-flto,-m64/-flto}'"

This works on alpha with --target_board=unix'{,-flto}' and x86_64, so
I guess it is OK.

> Can you pre-approved it?

I'm not a testsuite maintainer (one is CC'd for a final approval), but
the situation is definitely better with the patched regexp.

Uros.


Re: [PATCH] Inline asm asan instrumentation

2014-05-29 Thread Jakub Jelinek
On Wed, May 28, 2014 at 05:33:44PM +0400, Marat Zakirov wrote:
> Here's a patch for optional Asan instrumentation of inline assembly.
> 
> This version scans gimple for GIMPLE_ASMs and performs usual instrumentation
> of arguments with memory constraints ("m", "o", etc.) with fixed size.

That doesn't look right to me.  The fact that some region appears in "m"
doesn't mean the inline asm actually accesses it; it might not touch it at
all, or touch only some part of it.
If you look e.g. at Linux kernel headers, you'll see lots of
struct __large_struct { unsigned long buf[100]; };
#define __m(x) (*(struct __large_struct __user *)(x))
...
  "m" (__m(addr))
and similar cases, if Asan wants to check that the whole 100*sizeof(long)
region is accessible, it could often just have false positives, because the
inline asm really accesses just some small part of it.
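
A compact sketch of that false-positive pattern (following the kernel
idiom above; get_word and the x86 mnemonic are illustrative, not actual
kernel code).  The asm reads only 4 bytes at addr, yet the "m"
constraint covers the whole 100-long structure:

struct __large_struct { unsigned long buf[100]; };
#define __m(x) (*(struct __large_struct *)(x))

int
get_word (int *addr)
{
  int ret;
  /* Loads 4 bytes, while the constraint claims 100 longs.  */
  __asm__ ("movl %1, %0" : "=r" (ret) : "m" (__m (addr)));
  return ret;
}

Size-based instrumentation of the "m" operand would flag any addr that
points into a buffer smaller than sizeof (struct __large_struct).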

Jakub


Re: RTABI half-precision conversion functions (ping)

2014-05-29 Thread Julian Brown
On Thu, 19 Jul 2012 14:47:54 +0100
Julian Brown  wrote:

> On Thu, 19 Jul 2012 13:54:57 +0100
> Paul Brook  wrote:
> 
> > > But, that means EABI-conformant callers are also perfectly
> > > entitled to sign-extend half-float values before calling our
> > > helper functions (although GCC itself won't do that). Using
> > > "unsigned int" and taking care to only examine the low-order bits
> > > of the value in the helper function itself serves to fix the
> > > latent bug, is compatible with existing code, allows us to be
> > > conformant with the eabi, and allows use of aliases to make the
> > > __gnu and __aeabi functions the same.
> > 
> > As long as LTO never sees this mismatch we should be fine :-)  AFAIK
> > we don't curently have any way of expressing the actual ABI.
> 
> Let's not worry about that for now :-).
> 
> > > The patch no longer applied as-is, so I've updated it (attached,
> > > re-tested). Note that there are no longer any target-independent
> > > changes (though I'm not certain that the symbol versions are still
> > > correct).
> > > 
> > > OK to apply?
> > 
> > I think this deserves a comment in the source.  Otherwise it's
> > liable to get "fixed" in the future :-) Something along the lines
> > of "While the EABI describes the arguments to the half-float helper
> > routines as 'short', it does not require that they be extended to
> > full register width. The normal ABI requires that the caller
> > sign/zero extend short values to 32 bit.  We use unsigned int
> > arguments to prevent gcc making assumptions about the high half
> > of the register."
> 
> Here's a version with an explanatory comment. I also fixed a couple of
> minor formatting nits I noticed (they don't upset the diff too much, I
> don't think).

It looks like this one got forgotten about. Ping?

Context:

https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00902.html
https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00912.html

This is an EABI-conformance fix.
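
For readers without the patch at hand, here is a self-contained sketch
(not the libgcc source; h2f_sketch is a made-up name) of the convention
the proposed comment describes: the half-precision value arrives as
unsigned int, and only its low 16 bits are examined, so callers that
sign-extend and callers that zero-extend a short both work:

#include <stdint.h>
#include <string.h>

float
h2f_sketch (unsigned int hv)
{
  uint32_t h = hv & 0xffffu;               /* low-order bits only */
  uint32_t sign = (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1f;
  uint32_t man = h & 0x3ffu;
  uint32_t bits;
  float f;

  if (exp == 0x1f)                         /* Inf or NaN */
    bits = sign | 0x7f800000u | (man << 13);
  else if (exp != 0)                       /* normal: rebias 15 -> 127 */
    bits = sign | ((exp + 112) << 23) | (man << 13);
  else if (man == 0)                       /* signed zero */
    bits = sign;
  else                                     /* subnormal: renormalize */
    {
      exp = 113;
      while ((man & 0x400u) == 0)
        {
          man <<= 1;
          exp--;
        }
      man &= 0x3ffu;
      bits = sign | (exp << 23) | (man << 13);
    }

  memcpy (&f, &bits, sizeof f);
  return f;
}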

Thanks,

Julian


Re: [DOC PATCH] Rewrite docs for inline asm

2014-05-29 Thread Eric Botcazou
> Yes.  We already know that this is better than the current docs.
> Let's check it in.

As far as I can see you did it, but didn't add a ChangeLog entry (so David 
isn't properly credited with the rewrite)?

-- 
Eric Botcazou


RE: [PATCH] Inline asm asan instrumentation

2014-05-29 Thread Marat Zakirov
>> asm-struct-1.c test looks like a false positive though - the code does not 
>> access any invalid memory, it only does a harmless pointer cast.

It is not, because st1 has a smaller size than st2:

struct st1 {
   int a[110];
};
struct st2 {
   int a[111];
};

And the asm constraint was declared as:

: "=m" (*((struct st2 *)&s1)));

The test violates the memory access constraint via the cast (struct st2 *)&s1.
We check only constraints, and by such a cast we consider that the user
declares he wants to access the full st2 structure, which has a bigger size
than st1.
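
A minimal self-contained version of that scenario (the exact
asm-struct-1.c source is not shown in this thread, so the surrounding
code is assumed): s1 is 440 bytes, but the "m" constraint claims a
444-byte object, so the instrumentation reports a 4-byte overflow:

struct st1 { int a[110]; };
struct st2 { int a[111]; };

static struct st1 s1;

void
touch (void)
{
  /* The cast tells the compiler the asm may write all of st2,
     which is larger than the actual object s1.  */
  __asm__ volatile ("" : "=m" (*(struct st2 *)&s1));
}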

-Original Message-
From: Evgeniy Stepanov [mailto:eugeni.stepa...@gmail.com] 
Sent: Thursday, May 29, 2014 1:58 PM
To: Konstantin Serebryany
Cc: Marat Zakirov; GCC Patches; Konstantin Serebryany; Jakub Jelinek; 
Viacheslav Garbuzov; Yuri Gribov; Marat Zakirov
Subject: Re: [PATCH] Inline asm asan instrumentation

Cool, we don't have this in LLVM-ASan, but we have plans to instrument inline 
asm soon (not just constraints).

asm-struct-1.c test looks like a false positive though - the code does not 
access any invalid memory, it only does a harmless pointer cast.


On Wed, May 28, 2014 at 10:36 PM, Konstantin Serebryany 
 wrote:
> On Wed, May 28, 2014 at 5:33 PM, Marat Zakirov  wrote:
>> Hi all,
>>
>> Here's a patch for optional Asan instrumentation of inline assembly.
>>
>> This version scans gimple for GIMPLE_ASMs and performs usual 
>> instrumentation of arguments with memory constraints ("m", "o", etc.) with 
>> fixed size.
>>
>> Instrumentation is turned off by default.
>>
>> This was successfully bootstrapped and regtested on x64. I have also 
>> instrumented and ran ffmpeg regression testsuite (it seems to have 
>> quite some inline asm).
>>
>> --Marat



Re: RFA: cache enabled attribute by insn code

2014-05-29 Thread Richard Sandiford
Richard Earnshaw  writes:
> On 27/05/14 16:07, Richard Sandiford wrote:
>> Richard Sandiford  writes:
>>> Richard Sandiford  writes:
 Does the following patch help?
>>>
>>> Bah, it won't of course: %i1 needs to be the operator.
>> 
>> Here's v2.  I tested that it worked for simple tests like:
>> 
>> int f1 (int x, int y) { return x + (y << 4); }
>> int f2 (int x, int y) { return x - (y << 4); }
>> int f3 (int x, int y) { return x & (y << 4); }
>> int f4 (int x, int y) { return x | (y << 4); }
>> int f5 (int x, int y) { return x ^ (y << 4); }
>> int f6 (int x, int y) { return (y << 4) - x; }
>> int g1 (int x, int y, int z) { return x + (y << z); }
>> int g2 (int x, int y, int z) { return x - (y << z); }
>> int g3 (int x, int y, int z) { return x & (y << z); }
>> int g4 (int x, int y, int z) { return x | (y << z); }
>> int g5 (int x, int y, int z) { return x ^ (y << z); }
>> int g6 (int x, int y, int z) { return (y << z) - x; }
>> 
>> as well as the testcase.
>> 
>> Thanks,
>> Richard
>> 
>> 
>> gcc/
>>  * config/arm/iterators.md (shiftable_ops): New code iterator.
>>  (t2_binop0, arith_shift_insn): New code attributes.
>>  * config/arm/arm.md (insn_enabled): Delete.
>>  (enabled): Remove insn_enabled test.
>>  (*arith_shiftsi): Split out...
>>  (*arith_multsi): ...this pattern and remove insn_enabled attribute.
>> 
>
>
> Thanks, Richard.
> I've tweaked this as followed and committed it.
>
> I now consider "shift_operator" in the arm backend deprecated.  We
> should be moving towards using shift_nomul_operator.
>
> There's one final wart still to be handled, though.  'rotate' can only
> take an immediate operand, not a register.  We can currently deal with
> this, but it's not clean in terms of constraint handling.  I'll see if I
> can fix this up sometime, but not today.

Thanks for picking it up.  I realised later that I'd fluffed the
MULT check in:

> @@ -9876,38 +9868,34 @@
>  
>  ;; Patterns to allow combination of arithmetic, cond code and shifts
>  
> -(define_insn "*arith_shiftsi"
> -  [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r")
> -(match_operator:SI 1 "shiftable_operator"
> -  [(match_operator:SI 3 "shift_operator"
> - [(match_operand:SI 4 "s_register_operand" "r,r,r,r")
> -  (match_operand:SI 5 "shift_amount_operand" "M,M,M,r")])
> -   (match_operand:SI 2 "s_register_operand" "rk,rk,r,rk")]))]
> +(define_insn "*<arith_shift_insn>_multsi"
> +  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
> + (shiftable_ops:SI
> +  (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
> +   (match_operand:SI 3 "power_of_two_operand" ""))
> +  (match_operand:SI 1 "s_register_operand" "rk,<t2_binop0>")))]
>"TARGET_32BIT"
> -  "%i1%?\\t%0, %2, %4%S3"
> +  "<arith_shift_insn>%?\\t%0, %1, %2, lsl %b3"
> +  [(set_attr "predicable" "yes")
> +   (set_attr "predicable_short_it" "no")
> +   (set_attr "shift" "4")
> +   (set_attr "arch" "a,t2")
> +   (set_attr "type" "alu_shift_imm")])
> +
> +(define_insn "*<arith_shift_insn>_shiftsi"
> +  [(set (match_operand:SI 0 "s_register_operand" "=r,r,r")
> + (shiftable_ops:SI
> +  (match_operator:SI 2 "shift_nomul_operator"
> +   [(match_operand:SI 3 "s_register_operand" "r,r,r")
> +(match_operand:SI 4 "shift_amount_operand" "M,M,r")])
> +  (match_operand:SI 1 "s_register_operand" "rk,<t2_binop0>,rk")))]
> +  "TARGET_32BIT && GET_CODE (operands[3]) != MULT"

...this condition: operands[3] was the old numbering of the operator
rather than the new numbering.  It looks like shift_nomul_operator
should make it redundant anyway.

Richard


[PATCH, i386, Pointer Bounds Checker 11/x] Keep bounds initial values

2014-05-29 Thread Ilya Enkovich
Hi,

This patch tries to keep the initial values of bounds when they may be
needed.  Even if the initial value is not fully known (e.g. we know only the
low bound), it still may help to remove some redundant checks.

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-05-29  Ilya Enkovich  

* ipa.c (symtab_remove_unreachable_nodes): Keep initial values for
pointer bounds to be used for check elimination.
* lto-cgraph.c (compute_ltrans_boundary): Likewise.
* lto/lto-partition.c (add_references_to_partition): Add
references to pointer bounds vars.


diff --git a/gcc/ipa.c b/gcc/ipa.c
index 1d7fa35..958cabe 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -568,7 +568,8 @@ symtab_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
  vnode->aux = NULL;
 
  /* Keep body if it may be useful for constant folding.  */
- if ((init = ctor_for_folding (vnode->decl)) == error_mark_node)
+ if ((init = ctor_for_folding (vnode->decl)) == error_mark_node
+ && !POINTER_BOUNDS_P (vnode->decl))
varpool_remove_initializer (vnode);
  else
DECL_INITIAL (vnode->decl) = init;
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 58105f0..7089516 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -829,7 +829,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
{
  if (!lto_symtab_encoder_encode_initializer_p (encoder,
vnode)
- && ctor_for_folding (vnode->decl) != error_mark_node)
+ && (ctor_for_folding (vnode->decl) != error_mark_node
+ || POINTER_BOUNDS_P (vnode->decl)))
{
  lto_set_symtab_encoder_encode_initializer (encoder, vnode);
  add_references (encoder, &vnode->ref_list);
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 2967d73..330253b 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -96,7 +96,8 @@ add_references_to_partition (ltrans_partition part, symtab_node *node)
Recursively look into the initializers of the constant variable and add
references, too.  */
 else if (is_a  (ref->referred)
-&& ctor_for_folding (ref->referred->decl) != error_mark_node
+&& (ctor_for_folding (ref->referred->decl) != error_mark_node
+|| POINTER_BOUNDS_P (ref->referred->decl))
+&& !lto_symtab_encoder_in_partition_p (part->encoder, ref->referred))
   {
if (!part->initializers_visited)


[PATCH, i386, Pointer Bounds Checker 12/x] Recognize instrumented special functions

2014-05-29 Thread Ilya Enkovich
Hi,

This patch allows recognizing an instrumented call to a special function by
using the original function's name for recognition.

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-05-29  Ilya Enkovich  

* calls.c (special_function_p): Use original decl name
when analyzing instrumentation clone.

diff --git a/gcc/calls.c b/gcc/calls.c
index f0c92dd..e1dc8eb 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -502,8 +502,16 @@ emit_call_1 (rtx funexp, tree fntree ATTRIBUTE_UNUSED, tree fndecl ATTRIBUTE_UNUSED
 static int
 special_function_p (const_tree fndecl, int flags)
 {
-  if (fndecl && DECL_NAME (fndecl)
-  && IDENTIFIER_LENGTH (DECL_NAME (fndecl)) <= 17
+  tree name_decl = DECL_NAME (fndecl);
+
+  /* For instrumentation clones we want to derive flags
+ from the original name.  */
+  if (cgraph_get_node (fndecl)
+  && cgraph_get_node (fndecl)->instrumentation_clone)
+name_decl = DECL_NAME (cgraph_get_node (fndecl)->orig_decl);
+
+  if (fndecl && name_decl
+  && IDENTIFIER_LENGTH (name_decl) <= 17
   /* Exclude functions not at the file scope, or not `extern',
 since they are not the magic functions we would otherwise
 think they are.
@@ -515,16 +523,16 @@ special_function_p (const_tree fndecl, int flags)
  || TREE_CODE (DECL_CONTEXT (fndecl)) == TRANSLATION_UNIT_DECL)
   && TREE_PUBLIC (fndecl))
 {
-  const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  const char *name = IDENTIFIER_POINTER (name_decl);
   const char *tname = name;
 
   /* We assume that alloca will always be called by name.  It
 makes no sense to pass it as a pointer-to-function to
 anything that does not understand its behavior.  */
-  if (((IDENTIFIER_LENGTH (DECL_NAME (fndecl)) == 6
+  if (((IDENTIFIER_LENGTH (name_decl) == 6
&& name[0] == 'a'
&& ! strcmp (name, "alloca"))
-  || (IDENTIFIER_LENGTH (DECL_NAME (fndecl)) == 16
+  || (IDENTIFIER_LENGTH (name_decl) == 16
   && name[0] == '_'
   && ! strcmp (name, "__builtin_alloca"))))
flags |= ECF_MAY_BE_ALLOCA;


[PATCH, Pointer Bounds Checker 13/x] Early versioning

2014-05-29 Thread Ilya Enkovich
Hi,

This patch allows performing function versioning when some structures are not
available yet.  It is required to make clones for the Pointer Bounds Checker
right after SSA build.

Bootstrapped and tested on linux-x86_64.

Thanks,
Ilya
--
gcc/

2014-05-29  Ilya Enkovich  

* tree-inline.c (copy_cfg_body): Check loop tree
existence before accessing it.
(tree_function_versioning): Check DF info existence
before accessing it.

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 4293241..23fef90 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -2544,7 +2544,8 @@ copy_cfg_body (copy_body_data * id, gcov_type count, int frequency_scale,
 
   /* If the loop tree in the source function needed fixup, mark the
  destination loop tree for fixup, too.  */
-  if (loops_for_fn (src_cfun)->state & LOOPS_NEED_FIXUP)
+  if (loops_for_fn (src_cfun)
+  && loops_for_fn (src_cfun)->state & LOOPS_NEED_FIXUP)
 loops_state_set (LOOPS_NEED_FIXUP);
 
   if (gimple_in_ssa_p (cfun))
@@ -5350,8 +5351,9 @@ tree_function_versioning (tree old_decl, tree new_decl,
   DECL_ARGUMENTS (new_decl) = DECL_ARGUMENTS (old_decl);
   initialize_cfun (new_decl, old_decl,
   old_entry_block->count);
-  DECL_STRUCT_FUNCTION (new_decl)->gimple_df->ipa_pta
-= id.src_cfun->gimple_df->ipa_pta;
+  if (DECL_STRUCT_FUNCTION (new_decl)->gimple_df)
+DECL_STRUCT_FUNCTION (new_decl)->gimple_df->ipa_pta
+  = id.src_cfun->gimple_df->ipa_pta;
 
   /* Copy the function's static chain.  */
   p = DECL_STRUCT_FUNCTION (old_decl)->static_chain_decl;


RE: [PATCH] Inline asm asan instrumentation

2014-05-29 Thread Marat Zakirov
Actually I do not think it is a good idea to use constraints in such an
arbitrary way.  By setting constraints, the user takes responsibility on
himself.  So even if full inline asm support is done one day, I do think
that checking of memory constraints will still exist.
It is the same situation as with compiler warnings - sometimes they are
bothersome, but if you think you do not need them, just do not use them.

-Original Message-
From: Jakub Jelinek [mailto:ja...@redhat.com] 
Sent: Thursday, May 29, 2014 2:09 PM
To: Marat Zakirov
Cc: gcc-patches@gcc.gnu.org; 'Konstantin Serebryany'; 'Viacheslav Garbuzov';
'Yuri Gribov'; 'Marat Zakirov'
Subject: Re: [PATCH] Inline asm asan instrumentation

On Wed, May 28, 2014 at 05:33:44PM +0400, Marat Zakirov wrote:
> Here's a patch for optional Asan instrumentation of inline assembly.
> 
> This version scans gimple for GIMPLE_ASMs and performs usual 
> instrumentation of arguments with memory constraints ("m", "o", etc.) with
fixed size.

That doesn't look right to me.  The fact that some region appears in "m"
doesn't mean the inline asm actually accesses it; it might not touch it at
all, or touch only some part of it.
If you look e.g. at Linux kernel headers, you'll see lots of
struct __large_struct { unsigned long buf[100]; };
#define __m(x) (*(struct __large_struct __user *)(x))
...
  "m" (__m(addr))
and similar cases, if Asan wants to check that the whole 100*sizeof(long)
region is accessible, it could often just have false positives, because the
inline asm really accesses just some small part of it.

Jakub



Re: [PATCH 7/7] Move SYMBOL_REF_FLAGS to rtx header

2014-05-29 Thread Tom de Vries

On 10-05-14 22:24, Richard Sandiford wrote:

  /* A set of flags on a symbol_ref that are, in some respects, redundant with
 information derivable from the tree decl associated with this symbol.
@@ -1791,7 +1794,9 @@ #define SYMBOL_REF_CONSTANT(RTX) \
 this information to avoid recomputing it.  Finally, this allows space for
 the target to store more than one bit of information, as with
 SYMBOL_REF_FLAG.  */
-#define SYMBOL_REF_FLAGS(RTX)  X0INT ((RTX), 1)
+#define SYMBOL_REF_FLAGS(RTX) \
+  (RTL_FLAG_CHECK1 ("SYMBOL_REF_FLAGS", (RTX), SYMBOL_REF) \
+   ->u2.symbol_ref_flags)



Richard,

with an arm-linux-gnueabi non-bootstrap build with --enable-checking=yes,rtl, I 
ran into the following error:

...
/home/vries/gcc_versions/devel/src/libgcc/libgcc2.c:819:1: internal compiler 
error: RTL check: attempt to treat non-block symbol as a block symbol in 
create_block_symbol, at varasm.c:394

 };
 ^
0xc3c16b rtl_check_failed_block_symbol(char const*, int, char const*)
/home/vries/gcc_versions/devel/src/gcc/rtl.c:844
0x103c09d create_block_symbol
/home/vries/gcc_versions/devel/src/gcc/varasm.c:394
0x103f42d make_decl_rtl(tree_node*)
/home/vries/gcc_versions/devel/src/gcc/varasm.c:1379
0x103fc87 notice_global_symbol(tree_node*)
/home/vries/gcc_versions/devel/src/gcc/varasm.c:1552
0x7588bf varpool_finalize_decl(tree_node*)
/home/vries/gcc_versions/devel/src/gcc/cgraphunit.c:823
0xb4eaa0 rest_of_decl_compilation(tree_node*, int, int)
/home/vries/gcc_versions/devel/src/gcc/passes.c:241
0x5902c4 finish_decl(tree_node*, unsigned int, tree_node*, tree_node*, 
tree_node*)
/home/vries/gcc_versions/devel/src/gcc/c/c-decl.c:4521
0x5e8586 c_parser_declaration_or_fndef
/home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1782
0x5e7644 c_parser_external_declaration
/home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1399
0x5e72c7 c_parser_translation_unit
/home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1286
0x606c6d c_parse_file()
/home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:14077
0x66b7fa c_common_parse_file()
/home/vries/gcc_versions/devel/src/gcc/c-family/c-opts.c:1067
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
...

It looks like BLOCK_SYMBOL_CHECK hasn't been updated.

Patch below fixes it for me. OK for trunk if bootstrap on x86_64 succeeds?

Thanks,
- Tom


2014-05-29  Tom de Vries  

	* rtl.h (BLOCK_SYMBOL_CHECK): Use SYMBOL_REF_FLAGS.
---
 gcc/rtl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/rtl.h b/gcc/rtl.h
index 02ce424..51cfae5 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -708,7 +708,7 @@ struct GTY(()) rtvec_def {
 
 #define BLOCK_SYMBOL_CHECK(RTX) __extension__\
 ({ __typeof (RTX) const _symbol = (RTX);\
-   const unsigned int flags = RTL_CHECKC1 (_symbol, 1, SYMBOL_REF).rt_int; \
+   const unsigned int flags = SYMBOL_REF_FLAGS (_symbol);		\
if ((flags & SYMBOL_FLAG_HAS_BLOCK_INFO) == 0)			\
  rtl_check_failed_block_symbol (__FILE__, __LINE__,			\
 __FUNCTION__);			\
-- 
1.9.1



Re: [DOC PATCH] Rewrite docs for inline asm

2014-05-29 Thread Andrew Haley
On 05/29/2014 11:22 AM, Eric Botcazou wrote:
>> Yes.  We already know that this is better than the current docs.
>> Let's check it in.
> 
> As far as I can see you did it, but didn't add a ChangeLog entry (so David 
> isn't properly credited with the rewrite)?

Fixed.

Thanks,
Andrew.




Re: [PATCH 7/7] Move SYMBOL_REF_FLAGS to rtx header

2014-05-29 Thread Richard Sandiford
Tom de Vries  writes:
> On 10-05-14 22:24, Richard Sandiford wrote:
>>   /* A set of flags on a symbol_ref that are, in some respects, redundant 
>> with
>>  information derivable from the tree decl associated with this symbol.
>> @@ -1791,7 +1794,9 @@ #define SYMBOL_REF_CONSTANT(RTX) \
>>  this information to avoid recomputing it.  Finally, this allows space 
>> for
>>  the target to store more than one bit of information, as with
>>  SYMBOL_REF_FLAG.  */
>> -#define SYMBOL_REF_FLAGS(RTX)   X0INT ((RTX), 1)
>> +#define SYMBOL_REF_FLAGS(RTX) \
>> +  (RTL_FLAG_CHECK1 ("SYMBOL_REF_FLAGS", (RTX), SYMBOL_REF) \
>> +   ->u2.symbol_ref_flags)
>>
>
> Richard,
>
> with an arm-linux-gnueabi non-bootstrap build with --enable-checking=yes,rtl, 
> I 
> ran into the following error:
> ...
> /home/vries/gcc_versions/devel/src/libgcc/libgcc2.c:819:1: internal compiler 
> error: RTL check: attempt to treat non-block symbol as a block symbol in 
> create_block_symbol, at varasm.c:394
>   };
>   ^
> 0xc3c16b rtl_check_failed_block_symbol(char const*, int, char const*)
>   /home/vries/gcc_versions/devel/src/gcc/rtl.c:844
> 0x103c09d create_block_symbol
>   /home/vries/gcc_versions/devel/src/gcc/varasm.c:394
> 0x103f42d make_decl_rtl(tree_node*)
>   /home/vries/gcc_versions/devel/src/gcc/varasm.c:1379
> 0x103fc87 notice_global_symbol(tree_node*)
>   /home/vries/gcc_versions/devel/src/gcc/varasm.c:1552
> 0x7588bf varpool_finalize_decl(tree_node*)
>   /home/vries/gcc_versions/devel/src/gcc/cgraphunit.c:823
> 0xb4eaa0 rest_of_decl_compilation(tree_node*, int, int)
>   /home/vries/gcc_versions/devel/src/gcc/passes.c:241
> 0x5902c4 finish_decl(tree_node*, unsigned int, tree_node*, tree_node*, 
> tree_node*)
>   /home/vries/gcc_versions/devel/src/gcc/c/c-decl.c:4521
> 0x5e8586 c_parser_declaration_or_fndef
>   /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1782
> 0x5e7644 c_parser_external_declaration
>   /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1399
> 0x5e72c7 c_parser_translation_unit
>   /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1286
> 0x606c6d c_parse_file()
>   /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:14077
> 0x66b7fa c_common_parse_file()
>   /home/vries/gcc_versions/devel/src/gcc/c-family/c-opts.c:1067
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> ...
>
> It looks like BLOCK_SYMBOL_CHECK hasn't been updated.
>
> Patch below fixes it for me. OK for trunk if bootstrap on x86_64 succeeds?

Can't really approve it, but it looks obviously correct to me.
Thanks for the fix.

Richard


Re: [PATCH 7/7] Move SYMBOL_REF_FLAGS to rtx header

2014-05-29 Thread Jeff Law

On 05/29/14 05:27, Tom de Vries wrote:

On 10-05-14 22:24, Richard Sandiford wrote:

  /* A set of flags on a symbol_ref that are, in some respects,
redundant with
 information derivable from the tree decl associated with this
symbol.
@@ -1791,7 +1794,9 @@ #define SYMBOL_REF_CONSTANT(RTX) \
 this information to avoid recomputing it.  Finally, this allows
space for
 the target to store more than one bit of information, as with
 SYMBOL_REF_FLAG.  */
-#define SYMBOL_REF_FLAGS(RTX)	X0INT ((RTX), 1)
+#define SYMBOL_REF_FLAGS(RTX) \
+  (RTL_FLAG_CHECK1 ("SYMBOL_REF_FLAGS", (RTX), SYMBOL_REF) \
+   ->u2.symbol_ref_flags)



Richard,

with an arm-linux-gnueabi non-bootstrap build with
--enable-checking=yes,rtl, I ran into the following error:
...
/home/vries/gcc_versions/devel/src/libgcc/libgcc2.c:819:1: internal
compiler error: RTL check: attempt to treat non-block symbol as a block
symbol in create_block_symbol, at varasm.c:394
  };
  ^
0xc3c16b rtl_check_failed_block_symbol(char const*, int, char const*)
 /home/vries/gcc_versions/devel/src/gcc/rtl.c:844
0x103c09d create_block_symbol
 /home/vries/gcc_versions/devel/src/gcc/varasm.c:394
0x103f42d make_decl_rtl(tree_node*)
 /home/vries/gcc_versions/devel/src/gcc/varasm.c:1379
0x103fc87 notice_global_symbol(tree_node*)
 /home/vries/gcc_versions/devel/src/gcc/varasm.c:1552
0x7588bf varpool_finalize_decl(tree_node*)
 /home/vries/gcc_versions/devel/src/gcc/cgraphunit.c:823
0xb4eaa0 rest_of_decl_compilation(tree_node*, int, int)
 /home/vries/gcc_versions/devel/src/gcc/passes.c:241
0x5902c4 finish_decl(tree_node*, unsigned int, tree_node*, tree_node*,
tree_node*)
 /home/vries/gcc_versions/devel/src/gcc/c/c-decl.c:4521
0x5e8586 c_parser_declaration_or_fndef
 /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1782
0x5e7644 c_parser_external_declaration
 /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1399
0x5e72c7 c_parser_translation_unit
 /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1286
0x606c6d c_parse_file()
 /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:14077
0x66b7fa c_common_parse_file()
 /home/vries/gcc_versions/devel/src/gcc/c-family/c-opts.c:1067
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
...

It looks like BLOCK_SYMBOL_CHECK hasn't been updated.

Patch below fixes it for me. OK for trunk if bootstrap on x86_64 succeeds?

Yes.  Ok.

jeff


Re: [PATCH 7/7] Move SYMBOL_REF_FLAGS to rtx header

2014-05-29 Thread Jeff Law

On 05/29/14 06:07, Richard Sandiford wrote:


Can't really approve it, but it looks obviously correct to me.
Thanks for the fix.

Is that something you'd like to see changed?

Jeff


[PATCH AArch64 0/2] Correct signedness of builtins, remove casts from arm_neon.h

2014-05-29 Thread Alan Lawrence
The __builtin_ functions registered by aarch64_init_simd_builtins use signed 
and/or unsigned types according to the qualifiers defined in aarch64-builtins.c 
and used in aarch64-simd-builtins.def. These __builtin functions are then used 
in arm_neon.h, with explicit casts converting between the signed/unsigned types 
declared for the intrinsics and for the builtins.


These two patches add a few more sets of qualifiers, and use existing ones more 
widely, allowing removing lots of the explicit casts from arm_neon.h.


There is no particular logic in the division between the two patches beyond that 
the second patch uses sets of qualifiers defined in the first.


No regressions on aarch64-none-elf or aarch64_be-none-elf.



[PATCH AArch64 1/2] Correct signedness of builtins, remove casts from arm_neon.h

2014-05-29 Thread Alan Lawrence
This adds three new sets of qualifiers to aarch64-builtins.c, and uses the 
already-present-but-unused USHIFTIMM.


gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.c
(aarch64_types_binop_uus_qualifiers,
aarch64_types_shift_to_unsigned_qualifiers,
aarch64_types_unsigned_shiftacc_qualifiers): Define.
* gcc/config/aarch64/aarch64-simd-builtins.def (uqshl, uqrshl, uqadd,
uqsub, usqadd, usra_n, ursra_n, uqshrn_n, uqrshrn_n, usri_n, usli_n,
sqshlu_n, uqshl_n): Update qualifiers.
* gcc/config/aarch64/arm_neon.h (vqadd_u8, vqadd_u16, vqadd_u32,
vqadd_u64, vqaddq_u8, vqaddq_u16, vqaddq_u32, vqaddq_u64, vqsub_u8,
vqsub_u16, vqsub_u32, vqsub_u64, vqsubq_u8, vqsubq_u16, vqsubq_u32,
vqsubq_u64, vqaddb_u8, vqaddh_u16, vqadds_u32, vqaddd_u64, vqrshl_u8,
vqrshl_u16, vqrshl_u32, vqrshl_u64, vqrshlq_u8, vqrshlq_u16,
vqrshlq_u32, vqrshlq_u64, vqrshlb_u8, vqrshlh_u16, vqrshls_u32,
vqrshld_u64, vqrshrn_n_u16, vqrshrn_n_u32, vqrshrn_n_u64,
vqrshrnh_n_u16, vqrshrns_n_u32, vqrshrnd_n_u64, vqshl_u8, vqshl_u16,
vqshl_u32, vqshl_u64, vqshlq_u8, vqshlq_u16, vqshlq_u32, vqshlq_u64,
vqshlb_u8, vqshlh_u16, vqshls_u32, vqshld_u64, vqshl_n_u8, vqshl_n_u16,
vqshl_n_u32, vqshl_n_u64, vqshlq_n_u8, vqshlq_n_u16, vqshlq_n_u32,
vqshlq_n_u64, vqshlb_n_u8, vqshlh_n_u16, vqshls_n_u32, vqshld_n_u64,
vqshlu_n_s8, vqshlu_n_s16, vqshlu_n_s32, vqshlu_n_s64, vqshluq_n_s8,
vqshluq_n_s16, vqshluq_n_s32, vqshluq_n_s64, vqshlub_n_s8,
vqshluh_n_s16, vqshlus_n_s32, vqshlud_n_s64, vqshrn_n_u16,
vqshrn_n_u32, vqshrn_n_u64, vqshrnh_n_u16, vqshrns_n_u32,
vqshrnd_n_u64, vqsubb_u8, vqsubh_u16, vqsubs_u32, vqsubd_u64,
vrsra_n_u8, vrsra_n_u16, vrsra_n_u32, vrsra_n_u64, vrsraq_n_u8,
vrsraq_n_u16, vrsraq_n_u32, vrsraq_n_u64, vrsrad_n_u64, vsli_n_u8,
vsli_n_u16, vsli_n_u32,vsli_n_u64, vsliq_n_u8, vsliq_n_u16,
vsliq_n_u32, vsliq_n_u64, vslid_n_u64, vsqadd_u8, vsqadd_u16,
vsqadd_u32, vsqadd_u64, vsqaddq_u8, vsqaddq_u16, vsqaddq_u32,
vsqaddq_u64, vsqaddb_u8, vsqaddh_u16, vsqadds_u32, vsqaddd_u64,
vsra_n_u8, vsra_n_u16, vsra_n_u32, vsra_n_u64, vsraq_n_u8,
vsraq_n_u16, vsraq_n_u32, vsraq_n_u64, vsrad_n_u64, vsri_n_u8,
vsri_n_u16, vsri_n_u32, vsri_n_u64, vsriq_n_u8, vsriq_n_u16,
vsriq_n_u32, vsriq_n_u64, vsrid_n_u64): Remove casts.



Alan Lawrence wrote:
The __builtin_ functions registered by aarch64_init_simd_builtins use signed 
and/or unsigned types according to the qualifiers defined in aarch64-builtins.c 
and used in aarch64-simd-builtins.def. These __builtin functions are then used 
in arm_neon.h, with explicit casts converting between the signed/unsigned types 	

declared for the intrinsics and for the builtins.

These two patches add a few more sets of qualifiers, and use existing ones more 
widely, allowing removing lots of the explicit casts from arm_neon.h.


There is no particular logic in the division between the two patches beyond that 
the second patch uses sets of qualifiers defined in the first.


No regressions on aarch64-none-elf or aarch64_be-none-elf.



diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 591260f18bcc084bcc6cc16b6597a3d2ec098d05..036da3e8646124cda7e1cb36db2ec48f84b58214 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -173,6 +173,10 @@ aarch64_types_binopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned };
 #define TYPES_BINOPU (aarch64_types_binopu_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binop_uus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
+#define TYPES_BINOP_UUS (aarch64_types_binop_uus_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_poly, qualifier_poly, qualifier_poly };
 #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
@@ -199,9 +203,14 @@ aarch64_types_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TYPES_GETLANE (aarch64_types_getlane_qualifiers)
 #define TYPES_SHIFTIMM (aarch64_types_getlane_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_shift_to_unsigned_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_immediate };
+#define TYPES_SHIFTIMM_USS (aarch64_types_shift_to_unsigned_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_unsigned_shift_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate };
 #define TYPES_USHIFTIMM (aarch64_types_unsigned_shift_qualifiers)
+
 static enum aarch64_type_qualifiers
 aarch64_types_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate };
@@ -210

[PATCH AArch64 2/2] Correct signedness of builtins, remove casts from arm_neon.h

2014-05-29 Thread Alan Lawrence
This adds another set of qualifiers to aarch64-builtins.c, and removes more 
casts from arm_neon.h, for the suqadd, ushl, urshl, urshr_n, ushll_n, and sshl 
intrinsics.


gcc/ChangeLog:

* gcc/config/aarch64/aarch64-builtins.c
(aarch64_types_binop_ssu_qualifiers): New static data.
(TYPES_BINOP_SSU): Define.
* gcc/config/aarch64/aarch64-simd-builtins.def (suqadd, ushl, urshl,
urshr_n, ushll_n): Use appropriate unsigned qualifiers.
* gcc/config/aarch64/arm_neon.h (vrshl_u8, vrshl_u16, vrshl_u32,
vrshl_u64, vrshlq_u8, vrshlq_u16, vrshlq_u32, vrshlq_u64, vrshld_u64,
vrshr_n_u8, vrshr_n_u16, vrshr_n_u32, vrshr_n_u64, vrshrq_n_u8,
vrshrq_n_u16, vrshrq_n_u32, vrshrq_n_u64, vrshrd_n_u64, vshll_n_u8,
vshll_n_u16, vshll_n_u32, vuqadd_s8, vuqadd_s16, vuqadd_s32,
vuqadd_s64, vuqaddq_s8, vuqaddq_s16, vuqaddq_s32, vuqaddq_s64,
vuqaddb_s8, vuqaddh_s16, vuqadds_s32, vuqaddd_s64): Add signedness
suffix to builtin function name, remove cast.
(vshl_s8, vshl_s16, vshl_s32, vshl_s64, vshl_u8, vshl_u16, vshl_u32,
vshl_u64, vshlq_s8, vshlq_s16, vshlq_s32, vshlq_s64, vshlq_u8,
vshlq_u16, vshlq_u32, vshlq_u64, vshld_s64, vshld_u64): Remove cast.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 036da3e8646124cda7e1cb36db2ec48f84b58214..481f68fe7f070545b63d39d2589720566288d638 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -177,6 +177,10 @@ aarch64_types_binop_uus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
 #define TYPES_BINOP_UUS (aarch64_types_binop_uus_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned };
+#define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_poly, qualifier_poly, qualifier_poly };
 #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 03e820f426fb0d99634a3e3d675ab648fca1691b..8e3dcf1d04de2e8ccef75bd2f37d5a5ac17321c0 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -86,7 +86,7 @@
   BUILTIN_VSDQ_I (BINOP, sqsub, 0)
   BUILTIN_VSDQ_I (BINOPU, uqsub, 0)
   /* Implemented by aarch64_qadd.  */
-  BUILTIN_VSDQ_I (BINOP, suqadd, 0)
+  BUILTIN_VSDQ_I (BINOP_SSU, suqadd, 0)
   BUILTIN_VSDQ_I (BINOP_UUS, usqadd, 0)
 
   /* Implemented by aarch64_get_dreg.  */
@@ -197,9 +197,9 @@
   BUILTIN_VSDQ_I_DI (BINOP, ashl, 3)
   /* Implemented by aarch64_shl.  */
   BUILTIN_VSDQ_I_DI (BINOP, sshl, 0)
-  BUILTIN_VSDQ_I_DI (BINOP, ushl, 0)
+  BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0)
   BUILTIN_VSDQ_I_DI (BINOP, srshl, 0)
-  BUILTIN_VSDQ_I_DI (BINOP, urshl, 0)
+  BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0)
 
   BUILTIN_VDQ_I (SHIFTIMM, ashr, 3)
   VAR1 (SHIFTIMM, ashr_simd, 0, di)
@@ -207,7 +207,7 @@
   VAR1 (USHIFTIMM, lshr_simd, 0, di)
   /* Implemented by aarch64_shr_n.  */
   BUILTIN_VSDQ_I_DI (SHIFTIMM, srshr_n, 0)
-  BUILTIN_VSDQ_I_DI (SHIFTIMM, urshr_n, 0)
+  BUILTIN_VSDQ_I_DI (USHIFTIMM, urshr_n, 0)
   /* Implemented by aarch64_sra_n.  */
   BUILTIN_VSDQ_I_DI (SHIFTACC, ssra_n, 0)
   BUILTIN_VSDQ_I_DI (USHIFTACC, usra_n, 0)
@@ -215,7 +215,7 @@
   BUILTIN_VSDQ_I_DI (USHIFTACC, ursra_n, 0)
   /* Implemented by aarch64_shll_n.  */
   BUILTIN_VDW (SHIFTIMM, sshll_n, 0)
-  BUILTIN_VDW (SHIFTIMM, ushll_n, 0)
+  BUILTIN_VDW (USHIFTIMM, ushll_n, 0)
   /* Implemented by aarch64_shll2_n.  */
   BUILTIN_VQW (SHIFTIMM, sshll2_n, 0)
   BUILTIN_VQW (SHIFTIMM, ushll2_n, 0)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index af86d8e127404c7bdf2763cbefbdabb2db047dfc..f96871aeaa4613988352a10a2a2a718dd24702ab 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -22605,25 +22605,25 @@ vrshl_s64 (int64x1_t __a, int64x1_t __b)
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vrshl_u8 (uint8x8_t __a, int8x8_t __b)
 {
-  return (uint8x8_t) __builtin_aarch64_urshlv8qi ((int8x8_t) __a, __b);
+  return __builtin_aarch64_urshlv8qi_uus (__a, __b);
 }
 
 __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__))
 vrshl_u16 (uint16x4_t __a, int16x4_t __b)
 {
-  return (uint16x4_t) __builtin_aarch64_urshlv4hi ((int16x4_t) __a, __b);
+  return __builtin_aarch64_urshlv4hi_uus (__a, __b);
 }
 
 __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
 vrshl_u32 (uint32x2_t __a, int32x2_t __b)
 {
-  return (uint32x2_t) __builtin_aarch64_urshlv2si ((int32x2_t) __a, __b);
+  return __builtin_aarch64_urshlv2si_uus (__a, __b);
 }
 
 __extension__ static __inline uint64x1_t 

Re: [PATCH] Inline asm asan instrumentation

2014-05-29 Thread Yury Gribov

> The fact that some region appears in "m"
> doesn't mean the inline asm actually accesses it; it might not touch it at
> all, or touch only some part of it.


Do we have precise semantics of "m" written somewhere?
My understanding was that even though the asm may not touch the buffer at all
(like e.g. in our tests), the user still declares the whole region accessible
to the compiler.


> if Asan wants to check that the whole 100*sizeof(long)
> region is accessible, it could often just have false positives, because the
> inline asm really accesses just some small part of it.


We've seen this abused (e.g. casting to struct { char x[0x]; } *)
and that's the main reason why we turned this off by default.
On the other hand, we've seen no problems with ffmpeg's testsuite,
and the ability to detect overflows in inline asm would be rather useful.

Do you see how we could make this more robust?
We could e.g. check only the first byte,
although this wouldn't be as useful.

-Y
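
For reference, a minimal sketch of the kind of code under discussion
(hypothetical names, x86-64 AT&T syntax assumed): the "m" operand declares
the whole 100*sizeof(long) region as accessed even though the asm template
only reads the first element.

static long buf[100];

long
first_elem (void)
{
  long r;
  /* Only buf[0] is read, but "m" (buf) covers the whole array.  */
  __asm__ ("movq %1, %0" : "=r" (r) : "m" (buf));
  return r;
}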


Re: [C++ Patch] PR 57543

2014-05-29 Thread Jason Merrill

On 05/28/2014 01:06 PM, Paolo Carlini wrote:

Now, I got this "insane" idea: would it make sense to simply invert the
substitutions (args and return) unconditionally?


If we're going to change the order, I want to do it in a more correct, 
rather than differently wrong, way.  DR 1227 clarified that substitution 
should proceed in lexical order.


http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1227

Jason
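
For illustration, a minimal sketch of why the substitution order matters
under SFINAE (hypothetical code, adapted from the DR's example):

template <class T> struct A { typedef typename T::X X; };

/* In lexical order the return type T::X is substituted first; for T = int
   this fails in the immediate context, so this overload is silently
   dropped and f(...) is chosen.  Substituting the parameter list first
   would instantiate A<int> and give a hard error instead.  */
template <class T> typename T::X f (typename A<T>::X);
template <class T> void f (...);

void h () { f<int> (0); }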



Re: [C++ Patch] PR 57543

2014-05-29 Thread Paolo Carlini

Hi,

On 05/29/2014 03:34 PM, Jason Merrill wrote:

On 05/28/2014 01:06 PM, Paolo Carlini wrote:

Now, I got this "insane" idea: would it make sense to simply invert the
substitutions (args and return) unconditionally?


If we're going to change the order, I want to do it in a more correct, 
rather than differently wrong, way.  DR 1227 clarified that 
substitution should proceed in lexical order.


http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1227
Ok, I had no idea we had this much more general issue. Then this
is really something for you to handle ;)


Thanks!
Paolo.


Re: ipa-visibility TLC 2/n

2014-05-29 Thread Yufeng Zhang

Hi Honza,

I can confirm that with your commit r211045 the 
arm-none-linux-gnueabi{hf} builds are OK now.  Thanks for the fix.


Yufeng

On 05/28/14 22:56, Jan Hubicka wrote:

Any update?

I've managed to generate a simple test case from
libstdc++-v3/src/c++98/strstream.cc which reproduces the issue on
ARM that Ramana has reported previously:


template<typename _CharT> struct char_traits;

template<typename _CharT, typename _Traits>
class basic_ios
{
};

template<typename _CharT, typename _Traits = char_traits<_CharT> >
class basic_istream : virtual public basic_ios<_CharT, _Traits>
{
protected:
   int _M_gcount;
   virtual ~basic_istream()
 { }
};

class istrstream : public basic_istream<char>
{
   virtual ~istrstream();
};

istrstream::~istrstream()
{
}

-- CUT --

With an arm-none-linux-gnueabi gcc configured as:

./gcc/configure --target=arm-none-linux-gnueabi
--enable-gnu-indirect-function --enable-shared --with-arch=armv7-a
--with-fpu=vfpv3-d16 --with-float=softfp --with-arch=armv7-a
(irrelevant parts omitted)

With the following command line options:

  -fdata-sections -O2 -fPIC -S ./test.cpp

We'll see

./test.cpp:17:7: error: istrstream::_ZTV10istrstream.localalias.0
causes a section type conflict with istrstream::_ZTV10istrstream
  class istrstream : public basic_istream<char>
^
./test.cpp:17:7: note: 'istrstream::_ZTV10istrstream' was declared here


This seems to be the same cause as on AIX - we compute the section for the decl
rather than the original.  The following patch seems to fix it.  Does it allow
bootstrap for you? (it doesn't for AIX, but that seems to be a bug in the output
machinery)

Index: varasm.c
===
--- varasm.c(revision 210914)
+++ varasm.c(working copy)
@@ -1083,6 +1083,9 @@
  {
addr_space_t as = ADDR_SPACE_GENERIC;
int reloc;
+  symtab_node *snode = symtab_get_node (decl);
+  if (snode)
+    decl = symtab_alias_ultimate_target (snode)->decl;

if (TREE_TYPE (decl) != error_mark_node)
  as = TYPE_ADDR_SPACE (TREE_TYPE (decl));



Yufeng







Re: detecting "container overflow" bugs in std::vector

2014-05-29 Thread Jonathan Wakely

On 26/05/14 19:19 +0400, Konstantin Serebryany wrote:

It does look useful, but I'm concerned about a proliferation of
container checks; we already have the libstdc++ Debug Mode, and I'd
like to see some of the lightweight checks from the Google branch
added to trunk too.

Me too, but these checks are mostly orthogonal to the proposed annotations.


Thanks for clarifying that (and to Paul).


Aren't they still much cheaper than asan
instrumentation?


Of course, they are much cheaper than asan. But they do not cover the case
that motivated the container overflow annotations (when the contents of the
vector are accessed via vector::data()).


Yes, I don't think I've ever seen that error in code I work with, but
if Asan can be made to detect it then I'm in favour of the changes.

Thanks.
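
For reference, a minimal sketch of that case (hypothetical code, not from
the patch): the read is inside the heap allocation, so plain asan cannot
flag it, but it is past size(), which the annotations track.

#include <vector>

int
main ()
{
  std::vector<int> v;
  v.reserve (100);     // capacity >= 100, size () == 0
  v.push_back (1);     // size () == 1
  int *p = v.data ();
  return p[42];        // in-capacity but past size (): container overflow
}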



[PATCH][ARM] Use mov_imm type for movw operations consistently

2014-05-29 Thread Kyrill Tkachov

Hi all,

I noticed that in some of our move patterns the movw instruction is given the 
mov_reg type rather than the
mov_imm type that all other uses of movw have. This patch fixes that.

Scanning through our pipeline descriptions I see that mov_imm is treated the 
same way as mov_reg everywhere anyway. In the Cortex-A7 description we do have 
a bit more complicated logic:

;; ALU instruction with an immediate operand can dual-issue.
(define_insn_reservation "cortex_a7_alu_imm" 2
  (and (eq_attr "tune" "cortexa7")
   (ior (eq_attr "type" "adr,alu_imm,alus_imm,logic_imm,logics_imm,\
 mov_imm,mvn_imm,extend")
(and (eq_attr "type" "mov_reg,mov_shift,mov_shift_reg")
 (not (eq_attr "length" "8")))))
  "cortex_a7_ex2|cortex_a7_ex1")

In the two patterns that I change, the mov_imm has a length of 4 and hence will 
still use this reservation.
Thus I don't expect codegen to change at all from this patch but for future 
scheduling jobs this could make a
difference.

Tested arm-none-eabi on qemu.

Ok for trunk?

Thanks,
Kyrill

2014-05-29  Kyrylo Tkachov  

* config/arm/thumb2.md (*thumb2_movhi_insn): Set type of movw
to mov_imm.
* config/arm/vfp.md (*thumb2_movsi_vfp): Likewise.
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 10bc8b1..6ea0810 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -329,7 +329,7 @@
movw%?\\t%0, %L1\\t%@ movhi
str%(h%)\\t%1, %0\\t%@ movhi
ldr%(h%)\\t%0, %1\\t%@ movhi"
-  [(set_attr "type" "mov_reg,mov_imm,mov_imm,mov_reg,store1,load1")
+  [(set_attr "type" "mov_reg,mov_imm,mov_imm,mov_imm,store1,load1")
(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "yes,no,yes,no,no,no")
(set_attr "length" "2,4,2,4,4,4")
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index e1a48ee..8147624 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -100,7 +100,7 @@
   "
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "yes,no,yes,no,no,no,no,no,no,no,no,no,no,no")
-   (set_attr "type" "mov_reg,mov_reg,mov_reg,mvn_reg,mov_reg,load1,load1,store1,store1,f_mcr,f_mrc,fmov,f_loads,f_stores")
+   (set_attr "type" "mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load1,load1,store1,store1,f_mcr,f_mrc,fmov,f_loads,f_stores")
(set_attr "length" "2,4,2,4,4,4,4,4,4,4,4,4,4,4")
(set_attr "pool_range" "*,*,*,*,*,1018,4094,*,*,*,*,*,1018,*")
(set_attr "neg_pool_range" "*,*,*,*,*,   0,   0,*,*,*,*,*,1008,*")]

Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2014-05-29 Thread H.J. Lu
On Wed, May 28, 2014 at 9:52 PM, Andrew Pinski  wrote:
> On Wed, Jul 13, 2011 at 9:39 AM, H.J. Lu  wrote:
>> On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini  wrote:
>>> On 07/11/2011 05:54 PM, H.J. Lu wrote:

 The key is the
 >
 > XEXP (x, 1) == convert_memory_address_addr_space
 >(to_mode, XEXP (x, 1), as)
 >
 >  test.  It ensures basically that the constant has 31-bit precision,
 > because
 >  otherwise the constant would change from e.g. (const_int -0x7ffffffc)
 > to
 >  (const_int 0x80000004) when zero-extending it from SImode to DImode.
 >
 >  But I'm not sure it's safe.  You have,
 >
 >(zero_extend:DI (plus:SI FOO:SI (const_int Y)))
 >
 >  and you want to convert it to
 >
 >(plus:DI FOO:DI (zero_extend:DI (const_int Y)))
 >
 >  (where the zero_extend is folded).  Ignore that FOO is a SYMBOL_REF
 > (this
 >  piece of code does not assume anything about its shape); if FOO ==
 >  0xfffffffc and Y = 8, the result will be respectively 0x4 (valid) and
 >  0x100000004 (invalid).

 This example contradicts what you said above "It ensures basically that
 the
 constant has 31-bit precision".
>>>
>>> Why?  Certainly Y = 8 has 31-bit (or less) precision.  So it has the same
>>> representation in SImode and DImode, and the test above on XEXP (x, 1)
>>> succeeds.
>>
>> And then we permute conversion and addition, which leads to the issue you
>> raised above.  In another word, the current code permutes conversion
>> and addition.
>> It leads to different values in case of symbol (0xfffffffc) + 8.
>> Basically the current
>> test for 31-bit (or less) precision is bogus.  The real question is
>> for a address
>> computation, A + B, if address wrap-around is supported in
>> convert_memory_address_addr_space.
>
> Unless the code has already reassociated the additions.
> Like in the AARCH64 ILP32 case:
>
> (plus:SI (plus:SI (mult:SI (reg/v:SI 80 [ b ])
> (const_int -4 [0xfffffffffffffffc]))
> (subreg/s/u:SI (reg/v/f:DI 79 [ a ]) 0))
> (const_int -1073742592 [0xffffffffbffffd00]))
>
> The Tree level is correct in that it did not reassociate the addition
> but the RTL level ignores that.
>
> So this patch is invalid and incorrect unless you know the non
> constant part of the addition is a pointer (which is not the case
> here).
>

There is an address overflow.  Is the address overflow behavior
defined here?


-- 
H.J.
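
To make the quoted arithmetic concrete, a self-contained sketch using the
values from the example above (FOO == 0xfffffffc, Y == 8):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint32_t foo = 0xfffffffcu;
  uint32_t y = 8;
  /* (zero_extend:DI (plus:SI FOO Y)): the SImode add wraps first.  */
  uint64_t extend_after_add = (uint64_t) (foo + y);
  /* (plus:DI (zero_extend:DI FOO) Y): extend first, then DImode add.  */
  uint64_t add_after_extend = (uint64_t) foo + y;
  /* Prints 0x4 and 0x100000004 - the permutation changes the value.  */
  printf ("%#llx %#llx\n", (unsigned long long) extend_after_add,
	  (unsigned long long) add_after_extend);
  return 0;
}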


Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2014-05-29 Thread pinskia


> On May 29, 2014, at 9:13 AM, "H.J. Lu"  wrote:
> 
>> On Wed, May 28, 2014 at 9:52 PM, Andrew Pinski  wrote:
>>> On Wed, Jul 13, 2011 at 9:39 AM, H.J. Lu  wrote:
 On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini  wrote:
> On 07/11/2011 05:54 PM, H.J. Lu wrote:
> 
> The key is the
>> 
>>XEXP (x, 1) == convert_memory_address_addr_space
>>   (to_mode, XEXP (x, 1), as)
>> 
>> test.  It ensures basically that the constant has 31-bit precision,
>> because
>> otherwise the constant would change from e.g. (const_int -0x7ffffffc)
>> to
>> (const_int 0x80000004) when zero-extending it from SImode to DImode.
>> 
>> But I'm not sure it's safe.  You have,
>> 
>>   (zero_extend:DI (plus:SI FOO:SI (const_int Y)))
>> 
>> and you want to convert it to
>> 
>>   (plus:DI FOO:DI (zero_extend:DI (const_int Y)))
>> 
>> (where the zero_extend is folded).  Ignore that FOO is a SYMBOL_REF
>> (this
>> piece of code does not assume anything about its shape); if FOO ==
>> 0xfffffffc and Y = 8, the result will be respectively 0x4 (valid) and
>> 0x100000004 (invalid).
> 
> This example contradicts what you said above "It ensures basically that
> the
> constant has 31-bit precision".
 
 Why?  Certainly Y = 8 has 31-bit (or less) precision.  So it has the same
 representation in SImode and DImode, and the test above on XEXP (x, 1)
 succeeds.
>>> 
>>> And then we permute conversion and addition, which leads to the issue you
>>> raised above.  In another word, the current code permutes conversion
>>> and addition.
>>> It leads to different values in case of symbol (0xfffffffc) + 8.
>>> Basically the current
>>> test for 31-bit (or less) precision is bogus.  The real question is
>>> for a address
>>> computation, A + B, if address wrap-around is supported in
>>> convert_memory_address_addr_space.
>> 
>> Unless the code has already reassociated the additions.
>> Like in the AARCH64 ILP32 case:
>> 
>> (plus:SI (plus:SI (mult:SI (reg/v:SI 80 [ b ])
>>(const_int -4 [0xfffffffffffffffc]))
>>(subreg/s/u:SI (reg/v/f:DI 79 [ a ]) 0))
>>(const_int -1073742592 [0xffffffffbffffd00]))
>> 
>> The Tree level is correct in that it did not reassociate the addition
>> but the RTL level ignores that.
>> 
>> So this patch is invalid and incorrect unless you know the non
>> constant part of the addition is a pointer (which is not the case
>> here).
> 
> There is an address overflow.  Is the address overflow behavior
> defined here?

There was no address overflow in the original code and there was no address 
overflow in the tree level. The rtl level does introduce an address overflow 
but the semantics of plus is defined to be wrapping so there is no overflow.   
This is blocking me from testing ilp32 under gnu/Linux as ld.so gets 
miscompiled and stack addresses have the "sign" bit set. 

Thanks,
Andrew


> 
> 
> -- 
> H.J.


Re: [PATCH AArch64] Remove from arm_neon.h functions not in the spec

2014-05-29 Thread Alan Lawrence

Patch retaining vfmaq_n_f64 attached, updated gcc/ChangeLog:

* config/aarch64/arm_neon.h (vmlaq_n_f64, vmlsq_n_f64, vrsrtsq_f64,
vcge_p8, vcgeq_p8, vcgez_p8, vcgez_u8, vcgez_u16, vcgez_u32, vcgez_u64,
vcgezq_p8, vcgezq_u8, vcgezq_u16, vcgezq_u32, vcgezq_u64, vcgezd_u64,
vcgt_p8, vcgtq_p8, vcgtz_p8, vcgtz_u8, vcgtz_u16, vcgtz_u32, vcgtz_u64,
vcgtzq_p8, vcgtzq_u8, vcgtzq_u16, vcgtzq_u32, vcgtzq_u64, vcgtzd_u64,
vcle_p8, vcleq_p8, vclez_p8, vclez_u64, vclezq_p8, vclezd_u64, vclt_p8,
vcltq_p8, vcltz_p8, vcltzq_p8, vcltzd_u64): Remove functions as they are
not in the spec.


Alan Lawrence wrote:
No, hold that, vfmaq_n_f64 has been added back in the latest version (to which I 
linked). Hang on...


--Alan

Alan Lawrence wrote:
arm_neon.h contains a bunch of functions (for example, the wonderful vcgez_u* 
intrinsics - that's an unsigned comparison of greater-than-or-equal-to zero) 
that are not present in the current ARM Neon Intrinsics spec:


http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/index.html

This patch just deletes those intrinsics.

OK for trunk?

Cheers, Alan

gcc/ChangeLog:
2014-05-27  Alan Lawrence  

* config/aarch64/arm_neon.h (vfmaq_n_f64, vmlaq_n_f64, vmlsq_n_f64,
vrsrtsq_f64, vtst_p16, vtstq_p16, vcge_p8, vcgeq_p8, vcgez_p8, vcgez_u8,
vcgez_u16, vcgez_u32, vcgez_u64, vcgezq_p8, vcgezq_u8, vcgezq_u16,
vcgezq_u32, vcgezq_u64, vcgezd_u64, vcgt_p8, vcgtq_p8, vcgtz_p8,
vcgtz_u8, vcgtz_u16, vcgtz_u32, vcgtz_u64, vcgtzq_p8, vcgtzq_u8,
vcgtzq_u16, vcgtzq_u32, vcgtzq_u64, vcgtzd_u64, vcle_p8, vcleq_p8,
vclez_p8, vclez_u64, vclezq_p8, vclezd_u64, vclt_p8, vcltq_p8, vcltz_p8,
vcltzq_p8, vcltzd_u64): Remove functions as they are not in the spec.






diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 747a292ba9b2260e74566c946fe57afaea267969..bbf47349ae0e21761637a670c6e59c5c1e3f5195 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7243,18 +7243,6 @@ vmlaq_n_f32 (float32x4_t a, float32x4_t b, float32_t c)
   return result;
 }
 
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vmlaq_n_f64 (float64x2_t a, float64x2_t b, float64_t c)
-{
-  float64x2_t result;
-  float64x2_t t1;
-  __asm__ ("fmul %1.2d, %3.2d, %4.d[0]; fadd %0.2d, %0.2d, %1.2d"
-   : "=w"(result), "=w"(t1)
-   : "0"(a), "w"(b), "w"(c)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlaq_n_s16 (int16x8_t a, int16x8_t b, int16_t c)
 {
@@ -7943,18 +7931,6 @@ vmlsq_n_f32 (float32x4_t a, float32x4_t b, float32_t c)
   return result;
 }
 
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vmlsq_n_f64 (float64x2_t a, float64x2_t b, float64_t c)
-{
-  float64x2_t result;
-  float64x2_t t1;
-  __asm__ ("fmul %1.2d, %3.2d, %4.d[0]; fsub %0.2d, %0.2d, %1.2d"
-   : "=w"(result), "=w"(t1)
-   : "0"(a), "w"(b), "x"(c)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vmlsq_n_s16 (int16x8_t a, int16x8_t b, int16_t c)
 {
@@ -11329,17 +11305,6 @@ vrsqrtss_f32 (float32_t a, float32_t b)
   return result;
 }
 
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vrsrtsq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("frsqrts %0.2d,%1.2d,%2.2d"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vrsubhn_high_s16 (int8x8_t a, int16x8_t b, int16x8_t c)
 {
@@ -16082,13 +16047,6 @@ vcge_f64 (float64x1_t __a, float64x1_t __b)
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vcge_p8 (poly8x8_t __a, poly8x8_t __b)
-{
-  return (uint8x8_t) __builtin_aarch64_cmgev8qi ((int8x8_t) __a,
-		 (int8x8_t) __b);
-}
-
-__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
 vcge_s8 (int8x8_t __a, int8x8_t __b)
 {
   return (uint8x8_t) __builtin_aarch64_cmgev8qi (__a, __b);
@@ -16152,13 +16110,6 @@ vcgeq_f64 (float64x2_t __a, float64x2_t __b)
 }
 
 __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
-vcgeq_p8 (poly8x16_t __a, poly8x16_t __b)
-{
-  return (uint8x16_t) __builtin_aarch64_cmgev16qi ((int8x16_t) __a,
-		   (int8x16_t) __b);
-}
-
-__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__))
 vcgeq_s8 (int8x16_t __a, int8x16_t __b)
 {
   return (uint8x16_t) __builtin_aarch64_cmgev16qi (__a, __b);
@@ -16252,14 +16203,6 @@ vcgez_f64 (float64x1_t __a)
 }
 
 __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
-vcgez_p8 (poly8x8_t __a)
-{
-  poly8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0};
-  return (uint8x8_

Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2014-05-29 Thread H.J. Lu
On Thu, May 29, 2014 at 9:23 AM,   wrote:
>
>
>> On May 29, 2014, at 9:13 AM, "H.J. Lu"  wrote:
>>
>>> On Wed, May 28, 2014 at 9:52 PM, Andrew Pinski  wrote:
 On Wed, Jul 13, 2011 at 9:39 AM, H.J. Lu  wrote:
> On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini  wrote:
>> On 07/11/2011 05:54 PM, H.J. Lu wrote:
>>
>> The key is the
>>>
>>>XEXP (x, 1) == convert_memory_address_addr_space
>>>   (to_mode, XEXP (x, 1), as)
>>>
>>> test.  It ensures basically that the constant has 31-bit precision,
>>> because
>>> otherwise the constant would change from e.g. (const_int -0x7ffffffc)
>>> to
>>> (const_int 0x80000004) when zero-extending it from SImode to DImode.
>>>
>>> But I'm not sure it's safe.  You have,
>>>
>>>   (zero_extend:DI (plus:SI FOO:SI (const_int Y)))
>>>
>>> and you want to convert it to
>>>
>>>   (plus:DI FOO:DI (zero_extend:DI (const_int Y)))
>>>
>>> (where the zero_extend is folded).  Ignore that FOO is a SYMBOL_REF
>>> (this
>>> piece of code does not assume anything about its shape); if FOO ==
>>> 0xfffffffc and Y = 8, the result will be respectively 0x4 (valid) and
>>> 0x100000004 (invalid).
>>
>> This example contradicts what you said above "It ensures basically that
>> the
>> constant has 31-bit precision".
>
> Why?  Certainly Y = 8 has 31-bit (or less) precision.  So it has the same
> representation in SImode and DImode, and the test above on XEXP (x, 1)
> succeeds.

 And then we permute conversion and addition, which leads to the issue you
 raised above.  In another word, the current code permutes conversion
 and addition.
 It leads to different values in case of symbol (0xfffffffc) + 8.
 Basically the current
 test for 31-bit (or less) precision is bogus.  The real question is
 for a address
 computation, A + B, if address wrap-around is supported in
 convert_memory_address_addr_space.
>>>
>>> Unless the code has already reassociated the additions.
>>> Like in the AARCH64 ILP32 case:
>>>
>>> (plus:SI (plus:SI (mult:SI (reg/v:SI 80 [ b ])
>>>(const_int -4 [0xfffffffffffffffc]))
>>>(subreg/s/u:SI (reg/v/f:DI 79 [ a ]) 0))
>>>(const_int -1073742592 [0xffffffffbffffd00]))
>>>
>>> The Tree level is correct in that it did not reassociate the addition
>>> but the RTL level ignores that.
>>>
>>> So this patch is invalid and incorrect unless you know the non
>>> constant part of the addition is a pointer (which is not the case
>>> here).
>>
>> There is an address overflow.  Is the address overflow behavior
>> defined here?
>
> There was no address overflow in the original code and there was no address 
> overflow in the tree level. The rtl level does introduce an address overflow 
> but the semantics of plus is defined to be wrapping so there is no overflow.  
>  This is blocking me from testing ilp32 under gnu/Linux as ld.so gets 
> miscompiled and stack addresses have the "sign" bit set.
>

What is your Pmode?


-- 
H.J.


Re: ipa-visibility TLC 2/n

2014-05-29 Thread Jan Hubicka
> Jan Hubicka  writes:
> >> Richard Sandiford wrote the original section anchors implementation,
> >> so he would be a good person to comment about the interaction between
> >> aliases and section anchors.
> >
> > Thanks! Richard, does this patch seem sane?
> 
> Looks good to me in principle, but with:
> 
> > +  struct symtab_node *snode;
> >decl = SYMBOL_REF_DECL (symbol);
> > +
> > +  snode = symtab_node (decl);
> > +  if (snode->alias)
> > +   {
> > + rtx target = DECL_RTL (symtab_alias_ultimate_target 
> > (snode)->decl);
> > + SYMBOL_REF_BLOCK_OFFSET (symbol) = SYMBOL_REF_BLOCK_OFFSET 
> > (target);
> > + return;
> > +   }
> 
> is SYMBOL_REF_BLOCK_OFFSET (target) guaranteed to be valid at this point?
> It looked at face value like you'd need a recursive call to place_block_symbol
> on the target before the copy.

My reading was that SYMBOL_REF_BLOCK_OFFSET is computed at DECL_RTL calculation
time.  But you are right - it is done by validize_mem, which is not called by
DECL_RTL.
Shall I just call place_block_symbol on the target first?

Honza
> 
> Thanks,
> Richard
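
For reference, a sketch of that suggestion (hypothetical; in particular,
whether the SYMBOL_REF has to be extracted from the MEM returned by DECL_RTL
depends on the surrounding code):

  snode = symtab_node (decl);
  if (snode->alias)
    {
      rtx target = DECL_RTL (symtab_alias_ultimate_target (snode)->decl);
      /* Recursively place the target first so its offset is valid.  */
      place_block_symbol (target);
      SYMBOL_REF_BLOCK_OFFSET (symbol) = SYMBOL_REF_BLOCK_OFFSET (target);
      return;
    }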


Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2014-05-29 Thread pinskia


> On May 29, 2014, at 10:09 AM, "H.J. Lu"  wrote:
> 
>> On Thu, May 29, 2014 at 9:23 AM,   wrote:
>> 
>> 
 On May 29, 2014, at 9:13 AM, "H.J. Lu"  wrote:
 
> On Wed, May 28, 2014 at 9:52 PM, Andrew Pinski  wrote:
>> On Wed, Jul 13, 2011 at 9:39 AM, H.J. Lu  wrote:
>>> On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini  wrote:
>>> On 07/11/2011 05:54 PM, H.J. Lu wrote:
>>> 
>>> The key is the
 
   XEXP (x, 1) == convert_memory_address_addr_space
  (to_mode, XEXP (x, 1), as)
 
 test.  It ensures basically that the constant has 31-bit precision,
 because
 otherwise the constant would change from e.g. (const_int -0x7ffffffc)
 to
 (const_int 0x80000004) when zero-extending it from SImode to DImode.
 
 But I'm not sure it's safe.  You have,
 
  (zero_extend:DI (plus:SI FOO:SI (const_int Y)))
 
 and you want to convert it to
 
  (plus:DI FOO:DI (zero_extend:DI (const_int Y)))
 
 (where the zero_extend is folded).  Ignore that FOO is a SYMBOL_REF
 (this
 piece of code does not assume anything about its shape); if FOO ==
 0xfffffffc and Y = 8, the result will be respectively 0x4 (valid) and
 0x100000004 (invalid).
>>> 
>>> This example contradicts what you said above "It ensures basically that
>>> the
>>> constant has 31-bit precision".
>> 
>> Why?  Certainly Y = 8 has 31-bit (or less) precision.  So it has the same
>> representation in SImode and DImode, and the test above on XEXP (x, 1)
>> succeeds.
> 
> And then we permute conversion and addition, which leads to the issue you
> raised above.  In another word, the current code permutes conversion
> and addition.
> It leads to different values in case of symbol (0xfffffffc) + 8.
> Basically the current
> test for 31-bit (or less) precision is bogus.  The real question is
> for a address
> computation, A + B, if address wrap-around is supported in
> convert_memory_address_addr_space.
 
 Unless the code has already reassociated the additions.
 Like in the AARCH64 ILP32 case:
 
 (plus:SI (plus:SI (mult:SI (reg/v:SI 80 [ b ])
   (const_int -4 [0xfffffffffffffffc]))
   (subreg/s/u:SI (reg/v/f:DI 79 [ a ]) 0))
   (const_int -1073742592 [0xffffffffbffffd00]))
 
 The Tree level is correct in that it did not reassociate the addition
 but the RTL level ignores that.
 
 So this patch is invalid and incorrect unless you know the non
 constant part of the addition is a pointer (which is not the case
 here).
>>> 
>>> There is an address overflow.  Is the address overflow behavior
>>> defined here?
>> 
>> There was no address overflow in the original code and there was no address 
>> overflow in the tree level. The rtl level does introduce an address overflow 
>> but the semantics of plus is defined to be wrapping so there is no overflow. 
>>   This is blocking me from testing ilp32 under gnu/Linux as ld.so gets 
>> miscompiled and stack addresses have the "sign" bit set.
>> 
> 
> What is your Pmode?

Pmode is DImode while ptr_mode is SImode.  Pointers are zero-extended when
converting between SImode and DImode.

Thanks,
Andrew


> 
> 
> -- 
> H.J.


Patch RFA: Move x86 _mm_pause out of pragma target("sse") scope

2014-05-29 Thread Ian Lance Taylor
The _mm_pause intrinsic is defined in xmmintrin.h.  Right now using it
with -m32 with the default -march option gives an error:

/home/iant/foo.c: In function ‘f’:
/home/iant/gcc/go-install/lib/gcc/x86_64-unknown-linux-gnu/4.10.0/include/xmmintrin.h:1238:1:
 error: inlining failed in call to always_inline ‘_mm_pause’: target specific 
option mismatch
 _mm_pause (void)
 ^
/home/iant/foo5.c:3:13: error: called from here
 void f () { _mm_pause (); }
 ^

This error is because _mm_pause is defined in the scope of #pragma GCC
target("sse").  But _mm_pause, which simply generates the pause
instruction, does not require SSE support.  The pause instruction has
nothing really to do with SSE, and it works on all x86 processors (on
processors that do not explicitly recognize it, it is a nop).
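
For reference, the typical use of the intrinsic is a busy-wait loop
(hypothetical flag variable):

#include <xmmintrin.h>

extern volatile int ready;

void
spin_wait (void)
{
  while (!ready)
    _mm_pause ();  /* hint to the CPU that this is a spin-wait */
}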

I propose the following patch, which moves _mm_pause out of the pragma
target scope.

I know that x86intrin.h provides a similar intrinsic, __pause, but I
think it's worth making _mm_pause work reasonably as well.

I'm running a full testsuite run.  OK for mainline if it passes?

Ian


gcc/ChangeLog:

2014-05-29  Ian Lance Taylor  

* config/i386/xmmintrin.h (_mm_pause): Move out of scope of pragma
target("sse").

gcc/testsuite/ChangeLog:

2014-05-29  Ian Lance Taylor  

* gcc.target/i386/pause-2.c: New test.


Index: config/i386/xmmintrin.h
===
--- config/i386/xmmintrin.h	(revision 211057)
+++ config/i386/xmmintrin.h	(working copy)
@@ -1231,15 +1231,6 @@ _mm_sfence (void)
   __builtin_ia32_sfence ();
 }
 
-/* The execution of the next instruction is delayed by an implementation
-   specific amount of time.  The instruction does not modify the
-   architectural state.  */
-extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_pause (void)
-{
-  __builtin_ia32_pause ();
-}
-
 /* Transpose the 4x4 matrix composed of row[0-3].  */
 #define _MM_TRANSPOSE4_PS(row0, row1, row2, row3)			\
 do {	\
@@ -1262,4 +1253,15 @@ do {	\
 #pragma GCC pop_options
 #endif /* __DISABLE_SSE__ */
 
+/* The execution of the next instruction is delayed by an implementation
+   specific amount of time.  The instruction does not modify the
+   architectural state.  This is after the pop_options pragma because
+   it does not require SSE support in the processor--the encoding is a
+   nop on processors that do not support it.  */
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_pause (void)
+{
+  __builtin_ia32_pause ();
+}
+
 #endif /* _XMMINTRIN_H_INCLUDED */
Index: testsuite/gcc.target/i386/pause-2.c
===
--- testsuite/gcc.target/i386/pause-2.c	(revision 0)
+++ testsuite/gcc.target/i386/pause-2.c	(revision 0)
@@ -0,0 +1,12 @@
+/* Test that pause instruction works even when SSE is not enabled.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -dp" } */
+/* { dg-final { scan-assembler-times "\\*pause" 1 } } */
+
+#include <xmmintrin.h>
+
+void
+foo (void)
+{
+  _mm_pause ();
+}


Re: [AArch64/ARM 3/3] Add execution tests of ARM EXT intrinsics

2014-05-29 Thread Alan Lawrence
I've just committed this as revision 211059, with the change of adding a _1 
suffix to the names of all the new tests (as per standard testsuite convention).


All passed on arm-none-eabi and armeb-none-eabi.

Cheers, Alan

Ramana Radhakrishnan wrote:

On Wed, Apr 23, 2014 at 9:32 PM, Alan Lawrence  wrote:

Final patch in series, adds new tests of the ARM EXT Intrinsics, that also
check
the execution results, reusing the test bodies introduced into AArch64 in
the
first patch. (These tests subsume the autogenerated ones in
testsuite/gcc.target/arm/neon/ that only check assembler output.)

Tests use gcc.target/arm/simd/simd.exp from corresponding patch for ZIP
Intrinsics http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01500.html, will
commit that first.

All tests passing on arm-none-eabi.



Ok if no regressions.

Thanks,
Ramana

gcc/testsuite/ChangeLog:
2014-04-23  Alan Lawrence  

gcc.target/arm/simd/vextQf32.c: New file.
gcc.target/arm/simd/vextQp16.c: New file.
gcc.target/arm/simd/vextQp8.c: New file.
gcc.target/arm/simd/vextQs16.c: New file.
gcc.target/arm/simd/vextQs32.c: New file.
gcc.target/arm/simd/vextQs64.c: New file.
gcc.target/arm/simd/vextQs8.c: New file.
gcc.target/arm/simd/vextQu16.c: New file.
gcc.target/arm/simd/vextQu32.c: New file.
gcc.target/arm/simd/vextQu64.c: New file.
gcc.target/arm/simd/vextQu8.c: New file.
gcc.target/arm/simd/vextQp64.c: New file.
gcc.target/arm/simd/vextf32.c: New file.
gcc.target/arm/simd/vextp16.c: New file.
gcc.target/arm/simd/vextp8.c: New file.
gcc.target/arm/simd/vexts16.c: New file.
gcc.target/arm/simd/vexts32.c: New file.
gcc.target/arm/simd/vexts64.c: New file.
gcc.target/arm/simd/vexts8.c: New file.
gcc.target/arm/simd/vextu16.c: New file.
gcc.target/arm/simd/vextu32.c: New file.
gcc.target/arm/simd/vextu64.c: New file.
gcc.target/arm/simd/vextu8.c: New file.
gcc.target/arm/simd/vextp64.c: New file.

diff --git a/gcc/testsuite/gcc.target/arm/simd/vextQf32.c
b/gcc/testsuite/gcc.target/arm/simd/vextQf32.c
new file mode 100644
index 000..c1da6d3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vextQf32.c
@@ -0,0 +1,12 @@
+/* Test the `vextQf32' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include "arm_neon.h"
+#include "../../aarch64/simd/extq_f32.x"
+
+/* { dg-final { scan-assembler-times "vext\.32\[ \t\]+\[qQ\]\[0-9\]+,
\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9
\]+\)?\n" 3 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vextQp16.c
b/gcc/testsuite/gcc.target/arm/simd/vextQp16.c
new file mode 100644
index 000..adc0861
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vextQp16.c
@@ -0,0 +1,12 @@
+/* Test the `vextQp16' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include "arm_neon.h"
+#include "../../aarch64/simd/extq_p16.x"
+
+/* { dg-final { scan-assembler-times "vext\.16\[ \t\]+\[qQ\]\[0-9\]+,
\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9
\]+\)?\n" 7 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vextQp64.c
b/gcc/testsuite/gcc.target/arm/simd/vextQp64.c
new file mode 100644
index 000..e8b688d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vextQp64.c
@@ -0,0 +1,33 @@
+/* Test the `vextQp64' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+/* { dg-add-options arm_crypto } */
+
+#include "arm_neon.h"
+
+extern void abort (void);
+
+poly64x2_t
+test_vextq_p64_1 (poly64x2_t a, poly64x2_t b)
+{
+  return vextq_p64(a, b, 1);
+}
+
+int
+main (int argc, char **argv)
+{
+  int i, off;
+  poly64x2_t in1 = {0, 1};
+  poly64x2_t in2 = {2, 3};
+  poly64x2_t actual = test_vextq_p64_1 (in1, in2);
+  for (i = 0; i < 2; i++)
+if (actual[i] != i + 1)
+  abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "vext\.64\[ \t\]+\[qQ\]\[0-9\]+,
\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9
\]+\)?\n" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/vextQp8.c
b/gcc/testsuite/gcc.target/arm/simd/vextQp8.c
new file mode 100644
index 000..5f2cc53
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/vextQp8.c
@@ -0,0 +1,12 @@
+/* Test the `vextQp8' ARM Neon intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-save-temps -O3 -fno-inline" } */
+/* { dg-add-options arm_neon } */
+
+#include "

patch to fix PR61325

2014-05-29 Thread Vladimir Makarov
  The following patch fixes PR61325.  The details can be found at

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61325

  The patch was bootstrapped and tested on x86/x86-64.

  Committed as rev. 211060 to gcc-4.9 branch and as rev.211061 to trunk.

2014-05-29  Vladimir Makarov  

PR rtl-optimization/61325
* lra-constraints.c (process_address): Rename to
process_address_1.
(process_address): New function.

2014-05-29  Vladimir Makarov  

PR rtl-optimization/61325
* gcc.target/aarch64/pr61325.c: New.

Index: lra-constraints.c
===
--- lra-constraints.c	(revision 210973)
+++ lra-constraints.c	(working copy)
@@ -2784,9 +2784,14 @@
 
Add reloads to the lists *BEFORE and *AFTER.  We might need to add
reloads to *AFTER because of inc/dec, {pre, post} modify in the
-   address.  Return true for any RTL change.  */
+   address.  Return true for any RTL change.
+
+   The function is a helper function which does not produce all
+   transformations which can be necessary.  It does just basic steps.
+   To do all necessary transformations use function
+   process_address.  */
 static bool
-process_address (int nop, rtx *before, rtx *after)
+process_address_1 (int nop, rtx *before, rtx *after)
 {
   struct address_info ad;
   rtx new_reg;
@@ -2986,6 +2991,18 @@
   return true;
 }
 
+/* Do address reloads while they are necessary.  Use process_address_1 as
+   a helper function.  Return true for any RTL changes.  */
+static bool
+process_address (int nop, rtx *before, rtx *after)
+{
+  bool res = false;
+
+  while (process_address_1 (nop, before, after))
+    res = true;
+  return res;
+}
+
 /* Emit insns to reload VALUE into a new register.  VALUE is an
auto-increment or auto-decrement RTX whose operand is a register or
memory location; so reloading involves incrementing that location.
@@ -3270,7 +3287,7 @@
 	change_p = true;
 	lra_update_dup (curr_id, i);
   }
-
+  
   if (change_p)
 /* If we've changed the instruction then any alternative that
we chose previously may no longer be valid.  */
Index: testsuite/gcc.target/aarch64/pr61325.c
===
--- testsuite/gcc.target/aarch64/pr61325.c	(revision 0)
+++ testsuite/gcc.target/aarch64/pr61325.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+typedef unsigned int wchar_t;
+typedef long unsigned int size_t;
+
+size_t
+wcstombs(char *s , const wchar_t *pwcs , size_t n)
+{
+  int count = 0;
+  
+  if (n != 0) {
+do {
+  if ((*s++ = (char) *pwcs++) == 0)
+break;
+  count++;
+} while (--n != 0);
+  }
+  return count;
+}


[patch, avr] ata6289 device ISA is updated

2014-05-29 Thread S, Pitchumani
Hi,

Device ATA6289 has the MUL instruction and belongs to the avr4 ISA.
Currently it is incorrectly listed under avr25.  The attached patch corrects it.

Please commit if the patch is OK. I do not have commit access.

Regards,
Pitchumani

2014-05-29  Pitchumani Sivanupandi 
 
* config/avr/avr-mcus.def: Change ATA6289 ISA to AVR4
* config/avr/avr-tables.opt: Regenerate.
* config/avr/t-multilib: Regenerate.
* doc/avr-mmcu.texi: Regenerate.



ata6289-isa-change.patch
Description: ata6289-isa-change.patch


Re: [PATCH] Do not build libsanitizer also for powerpc*-*-linux*

2014-05-29 Thread Peter Bergner
On Wed, 2014-05-28 at 09:36 +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Mon, 26 May 2014 18:53:22 +0800, Arseny Solokha  wrote:
> > Recent changes in GetPcSpBp() (libsanitizer/asan/asan_linux.cc) made it
> > impossible to build 4.10.0-alpha20140525 snapshot for powerpc targets.
> 
> I hit this, too.
> 
> > The
> > proposed patch disables building libsanitizer for powerpc*-*-linux* in 
> > addition
> > to already disabled powerpc*le-*-linux* until the smarter solution will 
> > emerge.
> > 
> > The actual issue preventing ASAN from porting to PPC seems to be inability 
> > to
> > retrieve values of PC and BP on this architecture.
> 
> This is being discussed in the thread at
> .  Until that
> has been resolved, I do agree to check in the following patch (and have
> successfully tested it, but cannot formally approve it for commit; thus
> copying the libsanitizer maintainers):

The re-enablement patch was submitted to the llvm mailing list here:

  
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140526/219249.html

Once that is committed and merged into gcc, we can re-enable building
libsanitizer for powerpc*-linux.

Peter





Re: RFA: A couple of ira_get_dup_out_num fixes

2014-05-29 Thread Vladimir Makarov
On 05/28/2014 04:32 PM, Richard Sandiford wrote:
> While working on patches to speed up the handling of constraints,
> I hit some behaviour in ira_get_dup_out_num that looked unintentional:
>
> - the check for output operands was part of the !ignored_p condition
>   so would be skipped if the first alternative is disabled/excluded.
>
> - the first disabled/excluded alternative stops all following alternatives
>   from being processed, since we get "stuck" in the first part of the
>   "if" statement and never increment curr_alt.
>
> This seems to have some effect on the testsuite.  E.g. at -O2
> gcc.c-torture/compile/20071117-1.c has changes like:
>
>  .LCFI2:
> movq%rsp, %rbx
> subq%rax, %rsp
> -   leaq15(%rsp), %rax
> -   andq$-16, %rax
> -   movq%rax, %rdi
> +   leaq15(%rsp), %rdi
> +   andq$-16, %rdi
> callbar
> xorl%esi, %esi
> movq%rbx, %rsp
>
> There are also some cases where the change introduces a move though.
> E.g. gcc.c-torture/compat/struct-ic.c has:
>
> movabsq $4294967296, %rdx
> addq$8, %rsp
>  .LCFI4:
> -   andq%rdi, %rax
> +   andq%rax, %rdi
> +   movq%rdi, %rax
> orq %rdx, %rax
> ret
>  .L9:
>
> But AFAICT the patch is what was originally intended.
>
> Tested on x86_64-linux-gnu.  OK to install?
>
>
Ok, Richard.  Thanks for fixing this.


Re: [C++ Patch] PR 57543

2014-05-29 Thread Paolo Carlini

Hi again,

On 05/29/2014 03:34 PM, Jason Merrill wrote:

On 05/28/2014 01:06 PM, Paolo Carlini wrote:

Now, I got this "insane" idea: would it make sense to simply invert the
substitutions (args and return) unconditionally?


If we're going to change the order, I want to do it in a more correct, 
rather than differently wrong, way.  DR 1227 clarified that 
substitution should proceed in lexical order.


http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1227
So, here is another iteration, sorry about the ping-pong. I put together 
the below which already passes testing. How does it look?


Thanks again for your patience,
Paolo.

/
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 211052)
+++ cp/cp-tree.h(working copy)
@@ -125,7 +125,7 @@ c-common.h, not after.
Usage of TYPE_LANG_FLAG_?:
0: TYPE_DEPENDENT_P
1: TYPE_HAS_USER_CONSTRUCTOR.
-   2: unused
+   2: TYPE_HAS_LATE_RETURN_TYPE (in FUNCTION_TYPE, METHOD_TYPE)
3: TYPE_FOR_JAVA.
4: TYPE_HAS_NONTRIVIAL_DESTRUCTOR
5: CLASS_TYPE_P (in RECORD_TYPE and UNION_TYPE)
@@ -3404,6 +3404,11 @@ more_aggr_init_expr_args_p (const aggr_init_expr_a
user-declared constructor.  */
 #define TYPE_HAS_USER_CONSTRUCTOR(NODE) (TYPE_LANG_FLAG_1 (NODE))
 
+/* Nonzero means that the FUNCTION_TYPE or METHOD_TYPE has a
+   late-specified return type.  */
+#define TYPE_HAS_LATE_RETURN_TYPE(NODE) \
+  (TYPE_LANG_FLAG_2 (FUNC_OR_METHOD_CHECK (NODE)))
+
 /* When appearing in an INDIRECT_REF, it means that the tree structure
underneath is actually a call to a constructor.  This is needed
when the constructor must initialize local storage (which can
Index: cp/decl.c
===
--- cp/decl.c   (revision 211052)
+++ cp/decl.c   (working copy)
@@ -8817,6 +8817,7 @@ grokdeclarator (const cp_declarator *declarator,
   bool template_parm_flag = false;
   bool typedef_p = decl_spec_seq_has_spec_p (declspecs, ds_typedef);
   bool constexpr_p = decl_spec_seq_has_spec_p (declspecs, ds_constexpr);
+  bool late_return_type_p = false;
   source_location saved_loc = input_location;
   const char *errmsg;
 
@@ -9660,6 +9661,9 @@ grokdeclarator (const cp_declarator *declarator,
if (type == error_mark_node)
  return error_mark_node;
 
+   if (declarator->u.function.late_return_type)
+ late_return_type_p = true;
+
if (ctype == NULL_TREE
&& decl_context == FIELD
&& funcdecl_p
@@ -10590,6 +10594,10 @@ grokdeclarator (const cp_declarator *declarator,
  decl_function_context (TYPE_MAIN_DECL (ctype)) : NULL_TREE;
publicp = (! friendp || ! staticp)
  && function_context == NULL_TREE;
+
+   if (late_return_type_p)
+ TYPE_HAS_LATE_RETURN_TYPE (type) = 1;
+
decl = grokfndecl (ctype, type,
   TREE_CODE (unqualified_id) != TEMPLATE_ID_EXPR
   ? unqualified_id : dname,
@@ -10814,6 +10822,9 @@ grokdeclarator (const cp_declarator *declarator,
publicp = (ctype != NULL_TREE
   || storage_class != sc_static);
 
+   if (late_return_type_p)
+ TYPE_HAS_LATE_RETURN_TYPE (type) = 1;
+
decl = grokfndecl (ctype, type, original_name, parms, unqualified_id,
   virtualp, flags, memfn_quals, rqual, raises,
   1, friendp,
Index: cp/pt.c
===
--- cp/pt.c (revision 211052)
+++ cp/pt.c (working copy)
@@ -11322,8 +11322,42 @@ tsubst_function_type (tree t,
   /* The TYPE_CONTEXT is not used for function/method types.  */
   gcc_assert (TYPE_CONTEXT (t) == NULL_TREE);
 
-  /* Substitute the return type.  */
-  return_type = tsubst (TREE_TYPE (t), args, complain, in_decl);
+  /* DR 1227:  Mixing immediate and non-immediate contexts in deduction
+ failure.  */
+  bool late_return_type_p = TYPE_HAS_LATE_RETURN_TYPE (t);
+
+  if (late_return_type_p)
+{
+  /* Substitute the argument types.  */
+  arg_types = tsubst_arg_types (TYPE_ARG_TYPES (t), args, NULL_TREE,
+   complain, in_decl);
+  if (arg_types == error_mark_node)
+   return error_mark_node;
+
+  tree save_ccp = current_class_ptr;
+  tree save_ccr = current_class_ref;
+  tree this_type = (TREE_CODE (t) == METHOD_TYPE
+   ? TREE_TYPE (TREE_VALUE (arg_types)) : NULL_TREE);
+  bool do_inject = this_type && !dependent_type_p (this_type);
+  if (do_inject)
+   {
+ /* DR 1207: 'this' is in scope in the trailing return type.  */
+ inject_this_parameter (this_type, cp_type_quals (this_type));
+   }
+
+  /* Substitute the return type.  */
+  return_type = tsubst (TREE_TYPE (t), args, complain, in_decl);

libgo patch committed: Add --without-libatomic configure option

2014-05-29 Thread Ian Lance Taylor
This patch from Peter Collingbourne adds a --without-libatomic configure
option to libgo, to make it easier to build libgo outside of the GCC
build system.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

diff -r 9e7a28ffe425 libgo/Makefile.am
--- a/libgo/Makefile.am	Wed May 28 17:02:52 2014 -0700
+++ b/libgo/Makefile.am	Thu May 29 13:05:27 2014 -0700
@@ -30,6 +30,8 @@
 LIBFFI = @LIBFFI@
 LIBFFIINCS = @LIBFFIINCS@
 
+LIBATOMIC = @LIBATOMIC@
+
 WARN_CFLAGS = $(WARN_FLAGS) $(WERROR)
 
 # -I/-D flags to pass when compiling.
@@ -1949,8 +1951,7 @@
 
 libgo_la_LIBADD = \
 	$(libgo_go_objs) ../libbacktrace/libbacktrace.la \
-	../libatomic/libatomic_convenience.la \
-	$(LIBFFI) $(PTHREAD_LIBS) $(MATH_LIBS) $(NET_LIBS)
+	$(LIBATOMIC) $(LIBFFI) $(PTHREAD_LIBS) $(MATH_LIBS) $(NET_LIBS)
 
 libgobegin_a_SOURCES = \
 	runtime/go-main.c
diff -r 9e7a28ffe425 libgo/configure.ac
--- a/libgo/configure.ac	Wed May 28 17:02:52 2014 -0700
+++ b/libgo/configure.ac	Thu May 29 13:05:27 2014 -0700
@@ -122,6 +122,21 @@
 AC_SUBST(LIBFFI)
 AC_SUBST(LIBFFIINCS)
 
+# See if the user wants to configure without libatomic. This is useful if we are
+# on an architecture for which libgo does not need an atomic support library and
+# libatomic does not support our C compiler.
+AC_ARG_WITH(libatomic,
+  AS_HELP_STRING([--without-libatomic],
+ [don't use libatomic]),
+  [:],
+  [with_libatomic=${with_libatomic_default-yes}])
+
+LIBATOMIC=
+if test "$with_libatomic" != no; then
+   LIBATOMIC=../libatomic/libatomic_convenience.la
+fi
+AC_SUBST(LIBATOMIC)
+
 # Used to tell GNU make to include a file without telling automake to
 # include it.
 go_include="-include"


Re: [DOC PATCH] Rewrite docs for inline asm

2014-05-29 Thread Eric Botcazou
> Fixed.

Thanks!

-- 
Eric Botcazou


Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb

2014-05-29 Thread Maciej W. Rozycki
On Wed, 28 May 2014, Richard Earnshaw wrote:

> Ah, light dawns (maybe).
> 
> I guess the problems stem from the attempts to combine Neon with ARMv5.
>  Neon shouldn't be used with anything prior to ARMv7, since that's the
> earliest version of the architecture that can support it.

 Good to know, thanks for the hint.  Anyway it's the test case doing 
something silly or maybe just odd.  After all, IIUC, ARMv5 code will run 
just fine on ARMv7/NEON hardware, so mixing ARMv5 scalar code with NEON 
vector code is not wrong per se.

> I guess that what is happening is that we see we have Neon, so start to
> generate a Neon-based copy sequence, but then notice that we don't have
> misaligned access (something that must exist if we have Neon) and
> generate VLDR instructions in a mistaken attempt to work around the
> first inconsistency.
> 
> Maybe we should tie -mfpu=neon to having at least ARMv7 (though ARMv6
> also has misaligned access support).

 So to move away from the odd mixture of instruction selection options, 
just as a quick test I rebuilt the same file with `-march=armv7-a 
-mno-unaligned-access', and the result is the same: a pair of VLDR 
instructions accessing unaligned memory, i.e. the same problem.

 So based on observations made so far I think there are two sensible 
ways to move forward:

1. Fix GCC so that a manual byte-wise copy is made whenever 
  `-mno-unaligned-access' is in effect.

2. Revert the change being discussed here as its lone purpose was to 
   disable the use of VLD1.8, etc. where `-mno-unaligned-access' is in 
   effect, and it does no good.

  Maciej


run dsymutil post lto

2014-05-29 Thread Mike Stump
Jack finally found the answer to a question I had back in 2010…  Why, yes, one 
does have to arrange to run the post ld pass when lto runs but doesn’t have to 
relink.

Committed revision 211067.

Thanks Jack.

PR debug/61352
* collect2.c (maybe_run_lto_and_relink): Be sure to always run
post ld passes when lto is used.

Index: collect2.c
===
--- collect2.c  (revision 211062)
+++ collect2.c  (working copy)
@@ -848,6 +848,8 @@ maybe_run_lto_and_relink (char **lto_ld_
   fork_execute ("ld", lto_ld_argv);
   post_ld_pass (false);
 }
+  else
+post_ld_pass (true);
 }
 

 /* Main program.  */



[PATCH 1/4] Make coverage_compute_cfg_checksum callable with an argument

2014-05-29 Thread Martin Liška

Hello,
  this is a small patchset that prepares the API for the new IPA identical code 
folding pass. The patch adds an argument to coverage_compute_cfg_checksum.
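
For example, a pass other than profiling could then compute the checksum of
an arbitrary function (hypothetical snippet; node is a cgraph_node visited by
an IPA walk):

  struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
  unsigned chksum = coverage_compute_cfg_checksum (fn);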

Bootstrapped and tested on x86_64-linux.
OK for trunk?

Thanks,
Martin

2014-05-29  Martin Liska  

* coverage.h (coverage_compute_cfg_checksum): Argument added.
* coverage.c (coverage_compute_cfg_checksum): Likewise.
* profile.c (branch_prob): Likewise.

diff --git a/gcc/coverage.c b/gcc/coverage.c
index 5e9005e..9d81387 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -594,12 +594,12 @@ coverage_compute_profile_id (struct cgraph_node *n)
but the compiler won't detect the change and use the wrong profile data.  */
 
 unsigned
-coverage_compute_cfg_checksum (void)
+coverage_compute_cfg_checksum (struct function *fn)
 {
   basic_block bb;
-  unsigned chksum = n_basic_blocks_for_fn (cfun);
+  unsigned chksum = n_basic_blocks_for_fn (fn);
 
-  FOR_EACH_BB_FN (bb, cfun)
+  FOR_EACH_BB_FN (bb, fn)
 {
   edge e;
   edge_iterator ei;
diff --git a/gcc/coverage.h b/gcc/coverage.h
index 81f87a6..392080c 100644
--- a/gcc/coverage.h
+++ b/gcc/coverage.h
@@ -32,8 +32,8 @@ extern int coverage_begin_function (unsigned, unsigned);
 /* Complete the coverage information for the current function.  */
 extern void coverage_end_function (unsigned, unsigned);
 
-/* Compute the control flow checksum for the current function.  */
-extern unsigned coverage_compute_cfg_checksum (void);
+/* Compute the control flow checksum for the FUNCTION given as argument.  */
+extern unsigned coverage_compute_cfg_checksum (struct function *);
 
 /* Compute the profile id of function N.  */
 extern unsigned coverage_compute_profile_id (struct cgraph_node *n);
diff --git a/gcc/profile.c b/gcc/profile.c
index 3282ee7..4e82eab 100644
--- a/gcc/profile.c
+++ b/gcc/profile.c
@@ -1195,7 +1195,7 @@ branch_prob (void)
  the checksum in only once place, since it depends on the shape
  of the control flow which can change during
  various transformations.  */
-  cfg_checksum = coverage_compute_cfg_checksum ();
+  cfg_checksum = coverage_compute_cfg_checksum (cfun);
   lineno_checksum = coverage_compute_lineno_checksum ();
 
   /* Write the data from which gcov can reconstruct the basic block

--
1.8.4.5




[PATCH 2/4] Enhancement of call graph API

2014-05-29 Thread Martin Liška

Hello,
   this patch enhances the callgraph API to enable more precise control of 
expand_thunk; another function (address_taken_from_non_vtable_p) becomes global.

Bootstrapped and tested on x86_64-linux.
OK for trunk?

Thanks,
Martin

2014-05-29  Martin Liska  

* cgraph.h (expand_thunk): New argument added.
(address_taken_from_non_vtable_p): New global function.
* ipa-visibility.c (address_taken_from_non_vtable_p): Likewise.
* cgraphclones.c (duplicate_thunk_for_node): Argument added to call.
* cgraphunit.c (analyze_function): Likewise.
(assemble_thunks_and_aliases): Argument added to call.
(expand_thunk): New argument forces to produce GIMPLE thunk.

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index e5aa833..bfd3d91 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -911,7 +911,7 @@ void fixup_same_cpp_alias_visibility (symtab_node *, symtab_node *target, tree);
 IN_SSA is true if the gimple is in SSA.  */
 basic_block init_lowered_empty_function (tree, bool);
 void cgraph_reset_node (struct cgraph_node *);
-bool expand_thunk (struct cgraph_node *, bool);
+bool expand_thunk (struct cgraph_node *, bool, bool);
 
 /* In cgraphclones.c  */
 
@@ -956,6 +956,7 @@ void ipa_record_stmt_references (struct cgraph_node *, gimple);
 
 /* In ipa.c  */

 bool symtab_remove_unreachable_nodes (bool, FILE *);
+bool address_taken_from_non_vtable_p (symtab_node *node);
 cgraph_node_set cgraph_node_set_new (void);
 cgraph_node_set_iterator cgraph_node_set_find (cgraph_node_set,
   struct cgraph_node *);
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 4387b99..75eba6d 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -353,7 +353,7 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
  CGRAPH_FREQ_BASE);
   e->call_stmt_cannot_inline_p = true;
   cgraph_call_edge_duplication_hooks (thunk->callees, e);
-  if (!expand_thunk (new_thunk, false))
+  if (!expand_thunk (new_thunk, false, false))
 new_thunk->analyzed = true;
   cgraph_call_node_duplication_hooks (thunk, new_thunk);
   return new_thunk;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index e19b0a2..55bf378 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -610,7 +610,7 @@ analyze_function (struct cgraph_node *node)
 {
   cgraph_create_edge (node, cgraph_get_node (node->thunk.alias),
  NULL, 0, CGRAPH_FREQ_BASE);
-  if (!expand_thunk (node, false))
+  if (!expand_thunk (node, false, false))
{
  node->thunk.alias = NULL;
  node->analyzed = true;
@@ -1466,11 +1466,13 @@ thunk_adjust (gimple_stmt_iterator * bsi,
 }
 
 /* Expand thunk NODE to gimple if possible.
+   When FORCE_GIMPLE_THUNK is true, gimple thunk is created and
+   no assembler is produced.
When OUTPUT_ASM_THUNK is true, also produce assembler for
thunks that are not lowered.  */
 
 bool
-expand_thunk (struct cgraph_node *node, bool output_asm_thunks)
+expand_thunk (struct cgraph_node *node, bool output_asm_thunks, bool force_gimple_thunk)
 {
   bool this_adjusting = node->thunk.this_adjusting;
   HOST_WIDE_INT fixed_offset = node->thunk.fixed_offset;
@@ -1481,7 +1483,7 @@ expand_thunk (struct cgraph_node *node, bool output_asm_thunks)
   tree a;
 
 
-  if (this_adjusting
+  if (!force_gimple_thunk && this_adjusting
   && targetm.asm_out.can_output_mi_thunk (thunk_fndecl, fixed_offset,
  virtual_value, alias))
 {
@@ -1719,7 +1721,7 @@ assemble_thunks_and_aliases (struct cgraph_node *node)
 
 	e = e->next_caller;

assemble_thunks_and_aliases (thunk);
-expand_thunk (thunk, true);
+expand_thunk (thunk, true, false);
   }
 else
   e = e->next_caller;
diff --git a/gcc/ipa-visibility.c b/gcc/ipa-visibility.c
index dc22b2e..7886722 100644
--- a/gcc/ipa-visibility.c
+++ b/gcc/ipa-visibility.c
@@ -115,7 +115,7 @@ cgraph_local_node_p (struct cgraph_node *node)
 }
 
 /* Return true when there is a reference to node and it is not vtable.  */
-static bool
+bool
 address_taken_from_non_vtable_p (symtab_node *node)
 {
   int i;
--
1.8.4.5




[PATCH 3/4] New attribute lookup function addition

2014-05-29 Thread Martin Liška

Hi,
   this patch introduces a new function lookup_attribute_starting that can find 
all attributes starting with a specified string.  The purpose of the function is 
to be able to identify e.g. whether a function has any 'omp' attribute.
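
For example, a caller could detect any OpenMP-related attribute with a helper
like this (hypothetical code; "omp" matches e.g. "omp declare simd"):

  static bool
  has_omp_attribute_p (tree fndecl)
  {
    return lookup_attribute_starting ("omp", DECL_ATTRIBUTES (fndecl))
	   != NULL_TREE;
  }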

Bootstrapped and tested on x86_64-linux.
OK for trunk?

Thanks,
Martin

2014-05-29  Martin Liska  

* tree.h (private_lookup_attribute_starting): New function.
(lookup_attribute_starting): Likewise.
* tree.c (private_lookup_attribute_starting): Likewise.

diff --git a/gcc/tree.c b/gcc/tree.c
index cf7e362..9c6b68c 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -5758,6 +5758,37 @@ private_lookup_attribute (const char *attr_name, size_t attr_len, tree list)
   return list;
 }
 
+tree
+private_lookup_attribute_starting (const char *attr_name, size_t attr_len, tree list)
+{
+  while (list)
+{
+  size_t ident_len = IDENTIFIER_LENGTH (get_attribute_name (list));
+
+  if (attr_len > ident_len)
+   {
+ list = TREE_CHAIN (list);
+ continue;
+   }
+
+  const char *p = IDENTIFIER_POINTER (get_attribute_name (list));
+
+  if (strncmp (attr_name, p, attr_len) == 0)
+   break;
+
+  /* TODO: If we made sure that attributes were stored in the
+canonical form without '__...__' (ie, as in 'text' as opposed
+to '__text__') then we could avoid the following case.  */
+  if (p[0] == '_' && p[1] == '_' && strncmp (attr_name, p + 2, attr_len) == 0)
+   break;
+
+  list = TREE_CHAIN (list);
+}
+
+  return list;
+}
+
+
 /* A variant of lookup_attribute() that can be used with an identifier
as the first argument, and where the identifier can be either
'text' or '__text__'.
diff --git a/gcc/tree.h b/gcc/tree.h
index 9fe7360..3e1b113 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3731,6 +3731,10 @@ extern tree merge_type_attributes (tree, tree);
and you should never call it directly.  */
 extern tree private_lookup_attribute (const char *, size_t, tree);
 
+/* This function is a private implementation detail
+   of lookup_attribute_starting() and you should never call it directly.  */
+extern tree private_lookup_attribute_starting (const char *, size_t, tree);
+
 /* Given an attribute name ATTR_NAME and a list of attributes LIST,
return a pointer to the attribute's list element if the attribute
is part of the list, or NULL_TREE if not found.  If the attribute
@@ -3753,6 +3757,23 @@ lookup_attribute (const char *attr_name, tree list)
 return private_lookup_attribute (attr_name, strlen (attr_name), list);
 }
 
+/* Given an attribute name ATTR_NAME and a list of attributes LIST,
+   return a pointer to the attribute's list element if the attribute
+   starts with ATTR_NAME. ATTR_NAME must be in the form 'text' (not
+   '__text__').  */
+
+static inline tree
+lookup_attribute_starting (const char *attr_name, tree list)
+{
+  gcc_checking_assert (attr_name[0] != '_');
+  /* In most cases, list is NULL_TREE.  */
+  if (list == NULL_TREE)
+return NULL_TREE;
+  else
+    return private_lookup_attribute_starting (attr_name, strlen (attr_name), list);
+}
+
+
 /* This function is a private implementation detail of
is_attribute_p() and you should never call it directly.  */
 extern bool private_is_attribute_p (const char *, size_t, const_tree);
--
1.8.4.5




[PATCH 4/4] New callgraph wrapper function creation added

2014-05-29 Thread Martin Liška

Hello,
   cgraph_make_wrapper is a new function that creates a simple wrapper of a 
function.  It will be used later by identical code folding, when the pass 
proves that two functions are semantically equivalent.
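
For example, the intended use in the ICF pass might look like this (the
predicate name is hypothetical):

  /* If the pass proved SOURCE and TARGET semantically equivalent, replace
     SOURCE's body with a thunk that calls TARGET.  */
  if (sem_function_equivalent_p (source, target))
    cgraph_make_wrapper (source, target);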

Bootstrapped and tested on x86_64-linux.
OK for trunk?

Thanks,
Martin

2014-05-29  Martin Liska  

* cgraph.h (cgraph_make_wrapper): New function introduced.
* cgraphunit.c (cgraph_make_wrapper): The function implementation.
* ipa-inline.h (inline_analyze_function): The function is global.
* ipa-inline-analysis.c (inline_analyze_function): Likewise.

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index bfd3d91..15dc6fa 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -912,6 +912,8 @@ void fixup_same_cpp_alias_visibility (symtab_node *, symtab_node *target, tree);
 basic_block init_lowered_empty_function (tree, bool);
 void cgraph_reset_node (struct cgraph_node *);
 bool expand_thunk (struct cgraph_node *, bool, bool);
+void cgraph_make_wrapper (struct cgraph_node *source,
+ struct cgraph_node *target);
 
 /* In cgraphclones.c  */
 
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 55bf378..7b40583 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2344,5 +2344,41 @@ finalize_compilation_unit (void)
   timevar_pop (TV_CGRAPH);
 }
 
+/* Creates a wrapper from SOURCE node to TARGET node.  Thunk is used for this
+   kind of wrapper method.  */
+
+void
+cgraph_make_wrapper (struct cgraph_node *source, struct cgraph_node *target)
+{
+/* Preserve DECL_RESULT so we get right by reference flag.  */
+tree decl_result = DECL_RESULT (source->decl);
+
+/* Remove the function's body.  */
+cgraph_release_function_body (source);
+cgraph_reset_node (source);
+
+DECL_RESULT (source->decl) = decl_result;
+DECL_INITIAL (source->decl) = NULL;
+allocate_struct_function (source->decl, false);
+set_cfun (NULL);
+
+/* Turn alias into thunk and expand it into GIMPLE representation.  */
+source->definition = true;
+source->thunk.thunk_p = true;
+source->thunk.this_adjusting = false;
+
+struct cgraph_edge *e = cgraph_create_edge (source, target, NULL, 0,
+   CGRAPH_FREQ_BASE);
+
+if (!expand_thunk (source, false, true))
+  source->analyzed = true;
+
+e->call_stmt_cannot_inline_p = true;
+
+/* Inline summary set-up.  */
+
+analyze_function (source);
+inline_analyze_function (source);
+}
 
 #include "gt-cgraphunit.h"

diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 2bb3759..c50a722 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -3960,7 +3960,7 @@ inline_indirect_intraprocedural_analysis (struct cgraph_node *node)
 
 /* Note function body size.  */
 
-static void
+void
 inline_analyze_function (struct cgraph_node *node)
 {
   push_cfun (DECL_STRUCT_FUNCTION (node->decl));
diff --git a/gcc/ipa-inline.h b/gcc/ipa-inline.h
index bddf29a..15e9d1c 100644
--- a/gcc/ipa-inline.h
+++ b/gcc/ipa-inline.h
@@ -216,6 +216,7 @@ void inline_generate_summary (void);
 void inline_read_summary (void);
 void inline_write_summary (void);
 void inline_free_summary (void);
+void inline_analyze_function (struct cgraph_node *node);
 void initialize_inline_failed (struct cgraph_edge *);
 int estimate_time_after_inlining (struct cgraph_node *, struct cgraph_edge *);
 int estimate_size_after_inlining (struct cgraph_node *, struct cgraph_edge *);
--
1.8.4.5