Re: PR middle-end/35535 part I

2013-12-18 Thread Jan Hubicka
> On 12/17/13 23:53, Tobias Burnus wrote:
> >Am 17.12.2013 21:56, schrieb Jeff Law:
> >>>* tree-vrp.c (extract_range_from_unary_expr_1): Add OBJ_TYPE_REF
> >>s/Add/Handle.  Please add the PR marker as well.
> >>
> >>OK with that trivial nit.
> >
> >And the proper PR. I don't think that INVALID C++ PR is the PR you want
> >to refer to.
> Yea, I mentioned that for the part II patch.  The right number is
> 35545 I think.

I see, now I understand the comment.  I fixed the changelog.

Honza
> 
> jeff


Re: [patch] Fix PR debug/59418

2013-12-18 Thread Eric Botcazou
> Is it really a good idea to put the XVECLEN into the loop condition?
> I mean, the functions that are called in the loop are unlikely pure
> and thus the length will need to be uselessly reread for each iteration.

I'm not sure we are supposed to care about this kind of micro-optimization in 
the compiler but...

> My preference would be to keep the limit hoisted manually before the loop.

...OK, I'll leave it alone.

> Otherwise looks good to me.

Thanks.

-- 
Eric Botcazou
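
For readers following the loop-limit point above, here is a minimal sketch of the two loop shapes under discussion. The names `struct vec`, `process`, `sum_reread` and `sum_hoisted` are hypothetical stand-ins, not code from the patch; the assumption is only that the callee takes a non-const pointer, so the compiler cannot prove the length is unchanged across the call.

```c
#include <assert.h>

/* Hypothetical container standing in for an rtx vector.  */
struct vec { int len; int elems[8]; };

/* Takes a non-const pointer, so a compiler that cannot see the body
   must assume V->len may change across the call.  */
int process (struct vec *v, int i) { return v->elems[i]; }

/* Length read in the loop condition: reloaded on every iteration.  */
int sum_reread (struct vec *v)
{
  int sum = 0;
  for (int i = 0; i < v->len; i++)
    sum += process (v, i);
  return sum;
}

/* Length hoisted manually before the loop: read exactly once.  */
int sum_hoisted (struct vec *v)
{
  int sum = 0;
  const int len = v->len;
  for (int i = 0; i < len; i++)
    sum += process (v, i);
  return sum;
}
```

Both forms compute the same result as long as the callee leaves the length alone; the hoisted form merely makes that assumption explicit.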


RFA: Fix test pr32912-2.c for 16-bit targets

2013-12-18 Thread Nick Clifton
Hi Guys,

  The test gcc/testsuite/gcc.dg/pr32912-2.c fails to execute correctly
  on targets that use 16-bit integers, because it assumes at least a
  32-bit integer.  The patch below removes this assumption, and it also
  tidies up the code slightly so that __SIZEOF_INT__ is only tested in
  one place, not three.

  There were no regressions when tested with an i686-pc-linux-gnu or an
  x86_64-pc-linux-gnu toolchain, and the test was fixed for an rl78-elf
  toolchain.

  OK to apply?

Cheers
  Nick

gcc/testsuite/ChangeLog
2013-12-18  Nick Clifton  

* gcc.dg/pr32912-2.c: Fix for 16-bit targets.

Index: gcc/testsuite/gcc.dg/pr32912-2.c
===
--- gcc/testsuite/gcc.dg/pr32912-2.c	(revision 206082)
+++ gcc/testsuite/gcc.dg/pr32912-2.c	(working copy)
@@ -1,14 +1,24 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -w" } */
-/* { dg-skip-if "TImode not supported" { "avr-*-*" } { "*" } { "" } } */
 
 extern void abort (void);
 
 #if(__SIZEOF_INT__ >= 4)
-typedef int __m128i __attribute__ ((__vector_size__ (16)));
+# define TYPE  int
+# define TYPED(a)  a
+
+#elif(__SIZEOF_INT__ > 2)
+# define TYPE  long
+# define TYPED(a)  a##L
+
 #else
-typedef long __m128i __attribute__ ((__vector_size__ (16)));
+# define TYPE  long long
+# define TYPED(a)  a##LL
 #endif
+
+
+typedef TYPE __m128i __attribute__ ((__vector_size__ (16)));
+
 __m128i
 foo (void)
 {
@@ -26,11 +36,7 @@
 int
 main (void)
 {
-#if(__SIZEOF_INT__ >= 4)
-  union { __m128i v; int i[sizeof (__m128i) / sizeof (int)]; } u, v;
-#else
-  union { __m128i v; long i[sizeof (__m128i) / sizeof (long)]; } u, v;
-#endif
+  union { __m128i v; TYPE i[sizeof (__m128i) / sizeof (TYPE)]; } u, v;
   int i;
 
   u.v = foo ();
@@ -39,9 +45,10 @@
 {
   if (u.i[i] != ~v.i[i])
abort ();
+
   if (i < 3)
{
- if (u.i[i] != (0x << i))
+ if (u.i[i] != (TYPED (0x) << i))
abort ();
}
   else if (u.i[i])
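
The type selection introduced by the patch can be tried in isolation. The following is a minimal sketch, assuming a compiler that predefines `__SIZEOF_INT__` (GCC does); the helper functions are illustrative only and are not part of the test.

```c
#include <assert.h>
#include <limits.h>

/* Same selection as in the patch: pick an integer type of at least
   32 bits and a matching literal suffix, testing __SIZEOF_INT__ once.  */
#if (__SIZEOF_INT__ >= 4)
# define TYPE      int
# define TYPED(a)  a
#elif (__SIZEOF_INT__ > 2)
# define TYPE      long
# define TYPED(a)  a##L
#else
# define TYPE      long long
# define TYPED(a)  a##LL
#endif

/* Illustrative helper: width in bits of the selected element type.  */
int selected_type_bits (void)
{
  return (int) (sizeof (TYPE) * CHAR_BIT);
}

/* Illustrative helper: a constant built with the matching suffix via
   token pasting, shifted as the test shifts its own constant.  */
TYPE shifted_constant (int i)
{
  return TYPED (0x10) << i;
}
```

The point of `TYPED` is that the literal gets the same width as `TYPE`, so the shift is not performed in a narrower type on targets with 16-bit `int`.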


Re: [ARM] Fix register r3 wrongly used to save ip in nested APCS frame

2013-12-18 Thread Eric Botcazou
> Revised patch attached, your testcase passes on arm-eabi with it.  Does it
> look OK to you?  If so, I'll run a testing cycle on arm-vxworks and
> arm-eabi.
> 
> 
>   * config/arm/arm.c (arm_expand_prologue): In a nested APCS frame with
>   arguments to push onto the stack and no varargs, save ip into the last
>   stack slot if r3 isn't available on entry.

It would be nice if we could settle this before Christmas. ;-)

-- 
Eric Botcazou


Re: [PATCH, i386 testsuite]: Fix -mabi=ms related failures for -mtune=corei7

2013-12-18 Thread Rainer Orth
Uros Bizjak  writes:

> On Sun, Dec 15, 2013 at 1:14 PM, Dominique Dhumieres  
> wrote:
>>> OTOH, I can't test darwin properly, please provide the patch and I'll
>>> commit it for you.
>>
>> Basically the patch I have in my tree since the PR replace 'linux' with
>> *' (see below).
>> Since I can only test darwin, there is no guarantee that the tests pass
>> on non-linux,
>> non-darwin platforms. So if you apply the patch below as such, it will be
>> necessary to
>> watch out for fall-out.
>
> Let's ask Rainer for help with x86 solaris.

I've just applied Dominique's patch to r206061 and ran the four changed
tests on i386-pc-solaris2.10: all of them passed, so from my POV the
change is fine.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, i386 testsuite]: Fix -mabi=ms related failures for -mtune=corei7

2013-12-18 Thread Uros Bizjak
On Wed, Dec 18, 2013 at 11:44 AM, Rainer Orth
 wrote:

>> On Sun, Dec 15, 2013 at 1:14 PM, Dominique Dhumieres  
>> wrote:
>>>> OTOH, I can't test darwin properly, please provide the patch and I'll
>>>> commit it for you.
>>>
>>> Basically the patch I have in my tree since the PR replace 'linux' with
>>> *' (see below).
>>> Since I can only test darwin, there is no guarantee that the tests pass
>>> on non-linux,
>>> non-darwin platforms. So if you apply the patch below as such, it will be
>>> necessary to
>>> watch out for fall-out.
>>
>> Let's ask Rainer for help with x86 solaris.
>
> I've just applied Dominique's patch to r206061 and ran the four changed
> tests on i386-pc-solaris2.10: all of them passed, so from my POV the
> change is fine.

So, the proposed patch (with proper ChangeLog) is OK and preapproved.

Thanks,
Uros.


[patch] Cleaning after the gfortran test suite

2013-12-18 Thread Dominique d'Humières
This patch extends the cleaning proposed in
http://gcc.gnu.org/ml/fortran/2013-10/msg00083.html to opened files.  The
patch is mostly obvious except for gfortran.dg/open_negative_unit_1.f90, for
which I assumed that the second OPEN closes the file foo.txt without
deleting it (and that this is the intended behavior).  I have also added the
INQUIRE as a test for PR59419.

After this patch no files are left behind in the gcc/testsuite/gfortran
directory. Note that the files m.mod or fort.10 are generated by several
tests. I am pretty sure that some of them do not clean up, but I
did not have the patience to look for them.

OK?

If yes, I don't have write access (although I have signed the FSF papers), so
someone will have to do the commit for me.

TIA

Dominique

PS Darwin users should run
rm -rf gcc/testsuite/gfortran*/*.dSYM before checking for leftover files.
For those interested, I have a patch for doing the cleaning automatically.


CL
Description: Binary data


patch-clean
Description: Binary data


Re: Fix PR58477, part II

2013-12-18 Thread Richard Biener
On Mon, Dec 16, 2013 at 2:58 PM, Jan Hubicka  wrote:
>> Hi,
>>
>> On Sat, Dec 14, 2013 at 11:01:53PM +0100, Jan Hubicka wrote:
>> > Hi,
>> > this patch makes stmt_may_be_vtbl_ptr_store to skip clobbers as discussed
>> > previously.
>>
>> This is the first time I hear about this but the change is obviously
>> OK, thanks.
>
> It actually seems to solve quite number of false positives, not exactly sure 
> why.
>>
>> > Martin, I do not fully follow the logic of this function.  Can't we just
>> > ignore all stores that are not of pointer type that looks like vptr
>> > type?  All those tores are compiler generated and we probably can just
>> > annotate them, right?
>>
>> Richard wanted me to be quite conservative here and to a big extent I
>> can see his point.  The AGGREGATE_TYPE_P is there because copies of
>> the whole object could change the vptr in an unpredictable ways.  I
>> doubt that this is what copy-constructors get expanded to but gimple
>> does not provide any necessary guarantees.  Perhaps we could come up
>> with an elaborate explanation why it cannot happen in valid C++ just
>> as I did for the call statement (and that is written in the comment
>> above the function) and then we could remove it.
>
> OK, so the explanation is not as simple as claim that non-POD types
> needs to be constructed or copied by constructor and C++ FE always
> generate an explicit vtbl store?

No as optimizers may combine stores for example.

>> Of course, having gimple defined and required annotation for vptr
>> changes would be much better but then of course all transformations
>> would have to make sure they preserve it.
>>
>> IIRC, flag_strict_aliasing test was explicitely Richard's request,
>> perhaps we could restrict the pointer type test further.
>>
>> Does that answer your question?
>
> Sort of, yes.  We should make some analysis how effective current
> methods are (i.e. disabling it and checking how much devirtualization
> improve for firefox) and if we find they seem insufficient, we probably
> should think of better analysis or annotation...
>
> Honza


Re: Vectorization for store with negative step

2013-12-18 Thread Richard Biener
On Mon, Dec 16, 2013 at 5:54 PM, Bingfeng Mei  wrote:
> Hi,
> I was looking at some loops that LLVM can vectorize but GCC cannot. One
> such type of loop has a store with a negative step.
>
> void test1(short * __restrict__ x, short * __restrict__ y,
>            short * __restrict__ z)
> {
>     int i;
>     for (i=127; i>=0; i--) {
>         x[i] = y[127-i] + z[127-i];
>     }
> }
>
> I don't know why GCC implements negative step only for loads and not for
> stores. I implemented a patch very similar to the code in vectorizable_load.
>
> ~/scratch/install-x86/bin/gcc ghs-dec.c -ftree-vectorize -S -O2 -mavx
>
> Without patch:
> test1:
> .LFB0:
>         addq    $254, %rdi
>         xorl    %eax, %eax
>         .p2align 4,,10
>         .p2align 3
> .L2:
>         movzwl  (%rsi,%rax), %ecx
>         subq    $2, %rdi
>         addw    (%rdx,%rax), %cx
>         addq    $2, %rax
>         movw    %cx, 2(%rdi)
>         cmpq    $256, %rax
>         jne     .L2
>         rep; ret
>
> With patch:
> test1:
> .LFB0:
>         vmovdqa .LC0(%rip), %xmm1
>         xorl    %eax, %eax
>         .p2align 4,,10
>         .p2align 3
> .L2:
>         vmovdqu (%rsi,%rax), %xmm0
>         movq    %rax, %rcx
>         negq    %rcx
>         vpaddw  (%rdx,%rax), %xmm0, %xmm0
>         vpshufb %xmm1, %xmm0, %xmm0
>         addq    $16, %rax
>         cmpq    $256, %rax
>         vmovups %xmm0, 240(%rdi,%rcx)
>         jne     .L2
>         rep; ret
>
> Performance is definitely improved here. The patch bootstraps on
> x86_64-unknown-linux-gnu and shows no additional regressions on my machine.
>
> For reference, LLVM seems to use different instructions and generates
> slightly worse code. I am not very familiar with x86 assembly code. The
> patch was originally written for our private port.
> test1:                                  # @test1
>         .cfi_startproc
> # BB#0:                                 # %entry
>         addq    $240, %rdi
>         xorl    %eax, %eax
>         .align  16, 0x90
> .LBB0_1:                                # %vector.body
>                                         # =>This Inner Loop Header: Depth=1
>         movdqu  (%rsi,%rax,2), %xmm0
>         movdqu  (%rdx,%rax,2), %xmm1
>         paddw   %xmm0, %xmm1
>         shufpd  $1, %xmm1, %xmm1        # xmm1 = xmm1[1,0]
>         pshuflw $27, %xmm1, %xmm0       # xmm0 = xmm1[3,2,1,0,4,5,6,7]
>         pshufhw $27, %xmm0, %xmm0       # xmm0 = xmm0[0,1,2,3,7,6,5,4]
>         movdqu  %xmm0, (%rdi)
>         addq    $8, %rax
>         addq    $-16, %rdi
>         cmpq    $128, %rax
>         jne     .LBB0_1
> # BB#2:                                 # %for.end
>         ret
>
> Any comment?

Looks good to me.  One of the various TODOs in vectorizable_store I presume.

Needs a testcase and at this stage a bugreport that is fixed by it.

Thanks,
Richard.

> Bingfeng Mei
> Broadcom UK
>
>
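
As a rough illustration of what the patched code generation does, the negative-step store can be modelled in plain C as contiguous loads, an add, an in-register reversal, and one contiguous store per block. This is only a sketch of the idea, not the vectorizer's implementation; the function names and the block size `VF = 8` (eight shorts per 128-bit vector) are assumptions chosen to mirror the assembly quoted above.

```c
#include <assert.h>
#include <string.h>

enum { N = 128, VF = 8 };   /* VF: shorts per 128-bit vector (assumed) */

/* The original loop: the store walks X backwards.  */
void store_reverse_scalar (short *x, const short *y, const short *z)
{
  for (int i = N - 1; i >= 0; i--)
    x[i] = y[N - 1 - i] + z[N - 1 - i];
}

/* Blocked model of the vectorized form.  */
void store_reverse_blocked (short *x, const short *y, const short *z)
{
  for (int j = 0; j < N; j += VF)
    {
      short tmp[VF];

      /* Contiguous loads and element-wise add (vmovdqu + vpaddw).  */
      for (int k = 0; k < VF; k++)
        tmp[k] = y[j + k] + z[j + k];

      /* Reverse the lanes (vpshufb) and store one contiguous block
         starting at x[N - VF - j] (vmovups at 240(%rdi,%rcx)).  */
      for (int k = 0; k < VF; k++)
        x[N - VF - j + k] = tmp[VF - 1 - k];
    }
}
```

The block base `N - VF - j` corresponds to the byte offset `240 - %rax` in the patched assembly, since each element is two bytes.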


Re: [patch] remove docs for SSA_OP_VMAYUSE and other doc/tree-ssa.texi cleanups

2013-12-18 Thread Richard Biener
On Mon, Dec 16, 2013 at 6:59 PM, Aldy Hernandez  wrote:
> While debugging something completely unrelated I ran into this...
>
> SSA_OP_VMAYUSE hasn't been around for a looong time.  For that matter, I
> think it was even me that got rid of it, so this was an sloppy oversight on
> my part.
>
> I also noticed that the SSA_OP* definitions in the documentation have
> diverged from the source.  I'm not a big fan of source embedded in the
> documentation, but if we must, then at least let's put a `NOTE' in the
> source to keep the .texi file up to date.  I've done this.
>
> I also cleaned up a small inconsistency in the comment preceding the macro
> definition of FOR_EACH_IMM_USE_STMT.
>
> Tested on x86-64 Linux.
>
> OK?

Ok.

Thanks,
Richard.


RE: Vectorization for store with negative step

2013-12-18 Thread Bingfeng Mei
Thanks, Richard. I will file a bug report and prepare a complete patch. For
the perm_mask_for_reverse function, should I move it before vectorizable_store
or add a declaration?


Bingfeng


Re: RFC Asan instrumentation control

2013-12-18 Thread Maxim Ostapenko

Hi all,

On 12/06/2013 05:32 PM, Yury Gribov wrote:
> So it looks like people are generally ok with
> * --param asan-instrument-reads=0/1
> * --param asan-instrument-writes=0/1
> * --param asan-stack=0/1
> * --param asan-globals=0/1

I've implemented these options.  Tested on x86_64.

> * --param asan-memintrin=0/1
> but not with blacklists (which is sad but understandable).
>
> -Y

This one will be implemented in the future.

I've also added 4 new testfiles to test new options.

Can you review this patch, please?

-Maxim
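
The read/write gating that the patch adds to instrument_derefs and instrument_mem_region_access reduces to a small predicate. A minimal stand-alone sketch follows, with a hypothetical `struct asan_params` standing in for the ASAN_INSTRUMENT_READS and ASAN_INSTRUMENT_WRITES macros; it models the decision only, not the instrumentation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two --param values.  */
struct asan_params
{
  bool instrument_reads;
  bool instrument_writes;
};

/* Return true if an access of the given kind should be instrumented.
   Mirrors the two early returns added by the patch.  */
bool should_instrument_access (const struct asan_params *p, bool is_store)
{
  if (is_store && !p->instrument_writes)
    return false;
  if (!is_store && !p->instrument_reads)
    return false;
  return true;
}
```

With both values left at their default of 1 every access is instrumented, which matches the documented default for -fsanitize=address in the patch.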
2013-12-18  Max Ostapenko  

	* gcc/asan.c (asan_emit_stack_protection): Optionally disable stack protection.
	(instrument_derefs): Optionally disable memory access instrumentation.
	(instrument_mem_region_access): Likewise.
	(instrument_strlen_call): Likewise.
	(asan_finish_file): Optionally disable global variables protection.
	* gcc/doc/invoke.texi: Added doc for new options.
	* gcc/params.def: Added new options.
	* gcc/params.h: Likewise.

2013-12-18  Max Ostapenko  
	* c-c++-common/asan/global-overflow-2.c: New test.
	* c-c++-common/asan/memcmp-3.c: Likewise.
	* c-c++-common/asan/no-instrument-reads.c: Likewise.
	* c-c++-common/asan/no-instrument-writes.c: Likewise.

diff --git a/gcc/asan.c b/gcc/asan.c
index 1394e13..1b8d0c2 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-builder.h"
 #include "ubsan.h"
 #include "predict.h"
+#include "params.h"
 
 /* AddressSanitizer finds out-of-bounds and use-after-free bugs
with <2x slowdown on average.
@@ -963,6 +964,9 @@ rtx
 asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
 			HOST_WIDE_INT *offsets, tree *decls, int length)
 {
+  if (!ASAN_STACK)
+    return NULL_RTX;
+
   rtx shadow_base, shadow_mem, ret, mem, orig_base, lab;
   char buf[30];
   unsigned char shadow_bytes[4];
@@ -1568,6 +1572,11 @@ static void
 instrument_derefs (gimple_stmt_iterator *iter, tree t,
 		   location_t location, bool is_store)
 {
+  if (is_store && !ASAN_INSTRUMENT_WRITES)
+    return;
+  if (!is_store && !ASAN_INSTRUMENT_READS)
+    return;
+
   tree type, base;
   HOST_WIDE_INT size_in_bytes;
 
@@ -1662,6 +1671,11 @@ instrument_mem_region_access (tree base, tree len,
 			  gimple_stmt_iterator *iter,
 			  location_t location, bool is_store)
 {
+  if (is_store && !ASAN_INSTRUMENT_WRITES)
+    return;
+  if (!is_store && !ASAN_INSTRUMENT_READS)
+    return;
+
   if (!POINTER_TYPE_P (TREE_TYPE (base))
   || !INTEGRAL_TYPE_P (TREE_TYPE (len))
   || integer_zerop (len))
@@ -1825,6 +1839,9 @@ instrument_mem_region_access (tree base, tree len,
 static bool
 instrument_strlen_call (gimple_stmt_iterator *iter)
 {
+  if (!ASAN_INSTRUMENT_READS)
+    return false;
+
   gimple call = gsi_stmt (*iter);
   gcc_assert (is_gimple_call (call));
 
@@ -2396,7 +2413,7 @@ asan_finish_file (void)
   ++gcount;
   htab_t const_desc_htab = constant_pool_htab ();
   htab_traverse (const_desc_htab, count_string_csts, &gcount);
-  if (gcount)
+  if (gcount && ASAN_GLOBALS)
 {
   tree type = asan_global_struct (), var, ctor;
   tree dtor_statements = NULL_TREE;
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 99ec1d2..d1f20a9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10037,6 +10037,26 @@ The default choice depends on the target.
 Set the maximum number of existing candidates that will be considered when
 seeking a basis for a new straight-line strength reduction candidate.
 
+@item asan-globals
+Enable overflow/underflow detection for global objects. This kind of protection
+is enabled by default if you are using @option{-fsanitize=address} option.
+To disable global objects protection use @option{--param asan-globals=0} option.
+
+@item asan-stack
+Enable overflow/underflow detection for stack objects. This kind of protection
+is enabled by default if you are using @option{-fsanitize=address} option.
+To disable stack protection use @option{--param asan-stack=0} option.
+
+@item asan-instrument-reads
+Enable overflow/underflow detection for memory reads instructions. This kind of protection
+is enabled by default if you are using @option{-fsanitize=address} option.
+To disable memory reads instructions protection use @option{--param asan-instrument-reads=0} option.
+
+@item asan-instrument-writes
+Enable overflow/underflow detection for memory writes instructions. This kind of protection
+is enabled by default if you are using @option{-fsanitize=address} option.
+To disable memory writes instructions protection use @option{--param asan-instrument-writes=0} option.
+
 @end table
 @end table
 
diff --git a/gcc/params.def b/gcc/params.def
index c0f9622..aea5f41 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1049,6 +1049,26 @@ DEFPARAM (PARAM_MAX_SLSR_CANDIDATE_SCAN,
 	  "strength reduction",
 	  50, 1, 99)
 
+DEFPARAM(PARAM_ASAN_STACK,
+ "asan-stack",
+ "Enable asan stack protection",
+ 1, 0, 1)
+
+DEFP

Re: [Patch,avr]: Fix wrong warning PR59396

2013-12-18 Thread Richard Biener
On Tue, Dec 17, 2013 at 2:05 PM, Georg-Johann Lay  wrote:
> Am 12/05/2013 04:09 PM, schrieb Richard Biener:
>>
>> On Thu, Dec 5, 2013 at 3:53 PM, Georg-Johann Lay  wrote:
>>
>>> This is a fix of a wrong warning for a bad ISR name.  The assumption was
>>> that if DECL_ASSEMBLER_NAME is set, it would always start with a *.
>>>
>>> This is not the case for the LTO compiler, where the assembler name is the
>>> plain name of the function (unless an assembler name is set).
>>
>>
>> That sounds odd to me.  Does the bug reproduce with -fwhole-program?
>> Or if the interrupt handler is static?
>
>
> Hi, I tried to debug lto1.
>
> What I see is that SET_DECL_ASSEMBLER_NAME with "__vector_14", i.e. without
> a leading '*', is called from
>
> tree-streamer-in.c:lto_input_ts_decl_with_vis_tree_pointers().
>
> Hope that helps in narrowing down the issue.

You need to debug the LTO IL creation process (cc1) then - this code merely
restores the assembler name that the compile stage assigned.  See
tree.c:assign_assembler_name_if_needed.

Richard.

> Johann
>
>
>> Richard.
>>
>>> Thus, do a more restrictive test of whether the first character of the
>>> function name has to be skipped.
>>>
>>> Ok to commit?
>>>
>>> Johann
>>>
>>>  PR target/59396
>>>  * config/avr/avr.c (avr_set_current_function): If the first char
>>>  of the function name is skipped, make sure it is actually '*'.
>
>
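
The more restrictive test described in the ChangeLog entry amounts to skipping the first character only when it really is '*'. A minimal sketch with a hypothetical helper name (this is not the code in avr_set_current_function):

```c
#include <assert.h>
#include <string.h>

/* Return NAME without a leading '*' marker, or NAME unchanged if it
   does not start with one (as seen with lto1, where the assembler name
   can be the plain function name).  */
const char *strip_asm_name_marker (const char *name)
{
  return name[0] == '*' ? name + 1 : name;
}
```

Skipping unconditionally would turn a plain "__vector_14" into "_vector_14", which is the kind of mismatch that would trigger a spurious ISR-name warning.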


Re: [patch] fix .DOT file generation for IPA passes

2013-12-18 Thread Richard Biener
On Tue, Dec 17, 2013 at 9:10 PM, Aldy Hernandez  wrote:
> I was trying to generate a graph file with
> -fdump-ipa-tmipa-blocks-details-vops-graph, but the .dot file was corrupted.
> It looks like the header bits printed in start_graph_dump() are not dumped
> because we are predicating the calls to
> clean_graph_dump_file->start_graph_dump by:
>
>   && cfun && (cfun->curr_properties & PROP_cfg))
>
> The problem is that for IPA passes (well at least for tmipa) cfun is NULL so
> we don't initialize the dump file, but later we go through each function
> (setting cfun appropriately) and dump the corresponding graphs somewhere in:
>
> do_per_function (execute_function_dump, NULL);
>
> I have fixed this by adding a bit in opt_pass to keep track of whether a graph
> .DOT file has been initialized, and initialize it if not.  I suppose we
> could move initialization of the graph file further down, but that seemed a
> bit more tedious given all the places where we dump.
>
> OK?

Ok.

Thanks,
Richard.
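
The approach described in the patch, a per-pass bit that records whether the graph file header has been written, can be sketched independently of GCC. Everything below is hypothetical (the struct, the function names, and the simplified "digraph {" header); it only illustrates lazy, once-only initialization when no function context was available at pass start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the new bit in opt_pass.  */
struct pass_state { bool graph_dump_initialized; };

/* Append S to the NUL-terminated buffer OUT of capacity CAP.  */
static void append (char *out, size_t cap, const char *s)
{
  size_t len = strlen (out);
  if (len < cap)
    snprintf (out + len, cap - len, "%s", s);
}

/* Dump one function's node; emit the header first if no earlier call
   for this pass has done so.  */
void dump_function_graph (struct pass_state *pass, char *out, size_t cap,
                          const char *fn_name)
{
  if (!pass->graph_dump_initialized)
    {
      append (out, cap, "digraph {\n");
      pass->graph_dump_initialized = true;
    }
  append (out, cap, "  ");
  append (out, cap, fn_name);
  append (out, cap, ";\n");
}
```

Because the check sits in the per-function dump path, it works the same whether or not a current function existed when the pass began.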


Re: Vectorization for store with negative step

2013-12-18 Thread Richard Biener
On Wed, Dec 18, 2013 at 12:34 PM, Bingfeng Mei  wrote:
> Thanks, Richard. I will file a bug report and prepare a complete patch. For 
> perm_mask_for_reverse function, should I move it before vectorizable_store or 
> add a declaration.

Move it.

Richard.



[PATCH][ARM] Add new cores to t-aprofile

2013-12-18 Thread Kyrill Tkachov

Hi all,

This patch adds the recently introduced cores to the t-aprofile multilib 
machinery. The values added are cortex-a15.cortex-a7, cortex-a12, cortex-a57 and 
cortex-a57.cortex-a53.


Tested arm-none-eabi on qemu and model.

Ok for trunk?

Thanks,
Kyrill


2013-12-18  James Greenhalgh  
Kyrylo Tkachov  

* config/arm/t-aprofile: Add cortex-a15.cortex-a7, cortex-a12,
cortex-a57, cortex-a57.cortex-a53.

diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index ce45d4d..5c5ee0c 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -83,8 +83,11 @@ MULTILIB_EXCEPTIONS+= *mcpu=cortex-a7/*mfpu=neon-fp-armv8*
 MULTILIB_MATCHES   += march?armv7-a=mcpu?cortex-a8
 MULTILIB_MATCHES   += march?armv7-a=mcpu?cortex-a9
 MULTILIB_MATCHES   += march?armv7-a=mcpu?cortex-a5
-MULTILIB_MATCHES   += mcpu?cortex-a7=mcpu?cortex-a15
+MULTILIB_MATCHES   += mcpu?cortex-a7=mcpu?cortex-a15=mcpu?cortex-a12
+MULTILIB_MATCHES   += mcpu?cortex-a7=mcpu?cortex-a15.cortex-a7
 MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a53
+MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a57
+MULTILIB_MATCHES   += march?armv8-a=mcpu?cortex-a57.cortex-a53
 
 # FPU matches
 MULTILIB_MATCHES   += mfpu?vfpv3-d16=mfpu?vfpv3

Re: RFC Asan instrumentation control

2013-12-18 Thread Jakub Jelinek
On Wed, Dec 18, 2013 at 03:35:35PM +0400, Maxim Ostapenko wrote:
> 2013-12-18  Max Ostapenko  
> 
>   * gcc/asan.c (asan_emit_stack_protection): Optionally disable stack 
> protection.
>   (instrument_derefs): Optionally disable memory access instrumentation.
>   (instrument_mem_region_access): Likewise.
>   (instrument_strlen_call): Likewise.
>   (asan_finish_file): Optionally disable global variables protection.
>   * gcc/doc/invoke.texi: Added doc for new options.
>   * gcc/params.def: Added new options.
>   * gcc/params.h: Likewise.

No gcc/ prefixes in ChangeLog entries.

> 2013-12-18  Max Ostapenko  
>   * c-c++-common/asan/global-overflow-2.c: New test.
>   * c-c++-common/asan/memcmp-3.c: Likewise.
>   * c-c++-common/asan/no-instrument-reads.c: Likewise.
>   * c-c++-common/asan/no-instrument-writes.c: Likewise.
> 
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-builder.h"
>  #include "ubsan.h"
>  #include "predict.h"
> +#include "params.h"
>  
>  /* AddressSanitizer finds out-of-bounds and use-after-free bugs
> with <2x slowdown on average.
> @@ -963,6 +964,9 @@ rtx
>  asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
>   HOST_WIDE_INT *offsets, tree *decls, int length)
>  {
> +  if (!ASAN_STACK)
> +    return NULL_RTX;

This is the wrong spot to do this.  Instead put it into the
if ((flag_sanitize & SANITIZE_ADDRESS) && pred)
condition in cfgexpand.c (and maybe also into the
if ((flag_sanitize & SANITIZE_ADDRESS) && isize != jsize ...)
condition, or perhaps into all four flag_sanitize & SANITIZE_ADDRESS
occurrences in cfgexpand.c).

> @@ -2396,7 +2413,7 @@ asan_finish_file (void)
>++gcount;
>htab_t const_desc_htab = constant_pool_htab ();
>htab_traverse (const_desc_htab, count_string_csts, &gcount);
> -  if (gcount)
> +  if (gcount && ASAN_GLOBALS)
>  {
>tree type = asan_global_struct (), var, ctor;
>tree dtor_statements = NULL_TREE;

I'd say this isn't sufficient, for !ASAN_GLOBALS you should also make sure
asan_protect_global always returns false, so that no extra padding is emitted
around the global vars.

> +@item asan-stack
> +Enable overflow/underflow detection for stack objects. This kind of 
> protection
> +is enabled by default if you are using @option{-fsanitize=address} option.
> +To disable stack protection use @option{--param asan-stack=0} option.

Talking about this, perhaps there should also be a
--param asan-use-after-return=0
knob to disable the support for use-after-return checking (in 4.8 this
didn't exist, in 4.9 there is some extra runtime code emitted, but still one
needs to enable it manually through an environment variable).  With that param
we would emit pretty much what 4.8 did, i.e. assume that use-after-return
will not be enabled in the runtime.

Jakub


[AArch64 1/3 big.LITTLE] Driver rewriting of big.LITTLE names.

2013-12-18 Thread James Greenhalgh

Hi,

As in the ARM backend we would like to use specs to rewrite a
-mcpu=big.LITTLE style name to just -mcpu=big before handing off
to the assembler.

As with the ARM backend, we can do this using ASM_SPEC. We need to
do a little more wiring up here, as the AArch64 backend doesn't
yet use these specs.

Tested on aarch64-none-elf with no regressions, built on
aarch64-none-linux-gnu with no problems.

OK?

Thanks,
James

---
2013-12-18  James Greenhalgh  

* common/config/aarch64/aarch64-common.c
(aarch64_rewrite_selected_cpu): New.
(aarch64_rewrite_mcpu): New.
* config/aarch64/aarch64-protos.h
(aarch64_rewrite_selected_cpu): New.
* config/aarch64/aarch64.h (BIG_LITTLE_SPEC): New.
(BIG_LITTLE_SPEC_FUNCTIONS): Likewise.
(ASM_CPU_SPEC): Likewise.
(EXTRA_SPEC_FUNCTIONS): Likewise.
(EXTRA_SPECS): Likewise.
(ASM_SPEC): Likewise.
* config/aarch64/aarch64.c (aarch64_start_file): Rewrite target
CPU name.
diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 9c8e770..19acce1 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -88,3 +88,38 @@ aarch64_handle_option (struct gcc_options *opts,
 }
 
 struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
+
+#define AARCH64_CPU_NAME_LENGTH 20
+
+/* Truncate NAME at the first '.' character seen, or return
+   NAME unmodified.  */
+
+const char *
+aarch64_rewrite_selected_cpu (const char *name)
+{
+  static char output_buf[AARCH64_CPU_NAME_LENGTH + 1] = {0};
+  char *arg_pos;
+
+  strncpy (output_buf, name, AARCH64_CPU_NAME_LENGTH);
+  arg_pos = strchr (output_buf, '.');
+
+  /* If we found a '.' truncate the entry at that point.  */
+  if (arg_pos)
+    *arg_pos = '\0';
+
+  return output_buf;
+}
+
+/* Called by the driver to rewrite a name passed to the -mcpu
+   argument in preparation to be passed to the assembler.  The
+   name will be in ARGV[0], ARGC should always be 1.  */
+
+const char *
+aarch64_rewrite_mcpu (int argc, const char **argv)
+{
+  gcc_assert (argc == 1);
+  return aarch64_rewrite_selected_cpu (argv[0]);
+}
+
+#undef AARCH64_CPU_NAME_LENGTH
+
diff --git a/gcc/config/aarch64/aarch64-elf.h b/gcc/config/aarch64/aarch64-elf.h
index a66c3db..97e1fb5 100644
--- a/gcc/config/aarch64/aarch64-elf.h
+++ b/gcc/config/aarch64/aarch64-elf.h
@@ -145,7 +145,8 @@
 %{mbig-endian:-EB} \
 %{mlittle-endian:-EL} \
 %{mcpu=*:-mcpu=%*} \
-%{march=*:-march=%*}" \
+%{march=*:-march=%*} \
+%(asm_cpu_spec)" \
 ASM_MABI_SPEC
 #endif
 
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 489fd1c..6ac059b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -189,6 +189,8 @@ bool aarch64_simd_valid_immediate (rtx, enum machine_mode, bool,
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 const char *aarch64_output_casesi (rtx *);
+const char *aarch64_rewrite_selected_cpu (const char *name);
+
 enum aarch64_symbol_type aarch64_classify_symbol (rtx,
 		  enum aarch64_symbol_context);
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index afcf43f..0c53e64 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7437,7 +7437,9 @@ aarch64_start_file (void)
 }
   else if (selected_cpu)
 {
-  asm_fprintf (asm_out_file, "\t.cpu %s", selected_cpu->name);
+  const char *truncated_name
+	= aarch64_rewrite_selected_cpu (selected_cpu->name);
+  asm_fprintf (asm_out_file, "\t.cpu %s", truncated_name);
   aarch64_print_extension ();
 }
   default_file_start();
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index cead022..d89c09b 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -857,4 +857,19 @@ extern enum aarch64_code_model aarch64_cmodel;
 #define ENDIAN_LANE_N(mode, n)  \
   (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
 
+#define BIG_LITTLE_SPEC \
+   " %{mcpu=*:%

[AArch64 0/3 big.LITTLE]

2013-12-18 Thread James Greenhalgh
Hi,

This patch series adds support for tuning for big.LITTLE systems
when compiling for the AArch64 target.

The patch series progresses as the one for the ARM backend did yesterday.
(http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01475.html)

As with the ARM backend, we take the convention that for some big.LITTLE
system where the big core is 'x' and the little core is 'y', the -mcpu
name will be x.y

We first add support for driver rewriting of -mcpu names before
handing off to the assembler; again, we truncate at the first
'.'. Then, as with the ARM backend, we remove the restriction
in aarch64-cores.def that each core must have a unique tuning
target.
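
As a rough illustration of the x.y naming convention and the truncation
step described above, here is a minimal standalone sketch in C.  The helper
name truncate_cpu_name and its caller-supplied buffer are assumptions made
for this example only; the patch itself uses a static buffer inside
aarch64_rewrite_selected_cpu.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy NAME into OUT (capacity OUT_SIZE bytes) and cut it at the first
   '.', so "cortex-a57.cortex-a53" becomes "cortex-a57".  Returns OUT.
   Hypothetical helper for illustration; not the function in the patch.  */
const char *
truncate_cpu_name (const char *name, char *out, size_t out_size)
{
  char *dot;

  assert (name != NULL && out != NULL && out_size > 0);

  /* strncpy does not NUL-terminate when NAME is too long, so terminate
     the buffer explicitly.  */
  strncpy (out, name, out_size - 1);
  out[out_size - 1] = '\0';

  /* If there is a '.', everything from it onwards names the little core
     and is dropped.  */
  dot = strchr (out, '.');
  if (dot != NULL)
    *dot = '\0';

  return out;
}
```

Passing the buffer in avoids the shared static storage, at the cost of
making the caller choose a size; whether that trade-off suits the driver is
not something this sketch tries to settle.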

We then add support for -mcpu=cortex-a57.cortex-a53.

The patch series has been regression tested on aarch64-none-elf
and aarch64-none-linux-gnu with no issues.

OK?

Thanks,
James

[AArch64 3/3 big.LITTLE] Add support for -mcpu=cortex-a57.cortex-a53

2013-12-18 Thread James Greenhalgh

Hi,

This patch wires up support for -mcpu=cortex-a57.cortex-a53.

Tested on aarch64-none-elf with no regressions.

OK?

---
2013-12-18  James Greenhalgh  

* config/aarch64/aarch64-cores.def: Add support for
-mcpu=cortex-a57.cortex-a53.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document -mcpu=cortex-a57.cortex-a53.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 1b4a49f..430cc56 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -36,3 +36,7 @@
 
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FPSIMD, cortexa53)
 AARCH64_CORE("cortex-a57",  cortexa15, cortexa15, 8,  AARCH64_FL_FPSIMD, generic)
+
+/* V8 big.LITTLE implementations.  */
+
+AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8,  AARCH64_FL_FPSIMD, generic)
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index 84081d1ba57e306398e4449e55bf4c4dadf2e391..b7e40e0b5d13842ba5db02b41c9d17a2e626d916 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa53,cortexa15"
+	"cortexa53,cortexa15,cortexa57cortexa53"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b102e13..86eafa7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11326,6 +11326,9 @@ possible values for @var{cpu} are @samp{generic}, @samp{cortex-a53},
 @samp{cortex-a57}.  The possible values for @var{feature} are documented
 in the sub-section below.
 
+Additionally, this option can specify that the target is a big.LITTLE system.
+The only possible value is @samp{cortex-a57.cortex-a53}.
+
 Where conflicting feature modifiers are specified, the right-most feature is
 used.
 

[AArch64 2/3 big.LITTLE] Allow tuning parameters without unique tuning targets.

2013-12-18 Thread James Greenhalgh

Hi,

AArch64 imports the limitation from ARM that each entry in the CORE_IDENT
column in aarch64-cores.def must be unique (as this column is used to
generate enums). We would like to have different -mcpu values map to
the same scheduler description, so remove that limitation.

One oddity this enforces: the core named by SCHEDULER_IDENT
must appear at least once in the CORE_IDENT column, so while we
wish to use the Cortex-A15 scheduler model for the Cortex-A57,
we must set the CORE_IDENT for "cortex-a57" to cortexa15.
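
The relationship between the two columns can be sketched with a small
X-macro example.  This is only an illustration under assumptions: the list
macro EXAMPLE_CORES stands in for #include "aarch64-cores.def", it carries
three columns instead of six, and example_processor, example_core and
example_scheduler_for are hypothetical names that do not exist in GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for aarch64-cores.def: NAME, CORE_IDENT, SCHEDULER_IDENT.  */
#define EXAMPLE_CORES(X) \
  X ("cortex-a53", cortexa53, cortexa53) \
  X ("cortex-a57", cortexa15, cortexa15) \
  X ("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53)

/* One enumerator per CORE_IDENT.  A SCHEDULER_IDENT is only usable if it
   also appears here, which is the oddity mentioned above.  */
enum example_processor
{
#define X(NAME, CORE_IDENT, SCHED_IDENT) CORE_IDENT,
  EXAMPLE_CORES (X)
#undef X
  example_none
};

struct example_core
{
  const char *name;
  enum example_processor sched;
};

/* The table keeps the SCHEDULER_IDENT, so several names may share one
   scheduler description.  */
static const struct example_core example_cores[] =
{
#define X(NAME, CORE_IDENT, SCHED_IDENT) { NAME, SCHED_IDENT },
  EXAMPLE_CORES (X)
#undef X
  { NULL, example_none }
};

/* Return the scheduler identifier for NAME, or example_none.  */
enum example_processor
example_scheduler_for (const char *name)
{
  const struct example_core *c;

  assert (name != NULL);
  for (c = example_cores; c->name != NULL; c++)
    if (strcmp (c->name, name) == 0)
      return c->sched;
  return example_none;
}
```

In this sketch, looking up "cortex-a57.cortex-a53" yields cortexa53 and
looking up "cortex-a57" yields cortexa15, matching the rows used in the
series.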

Checked for aarch64-none-elf and built for aarch64-none-linux-gnu
with no issues.

OK?

Thanks,
James

---
2013-12-18  James Greenhalgh  

* config/aarch64/aarch64-cores.def: Add new column for
SCHEDULER_IDENT.
* config/aarch64/aarch64-opts.h (AARCH64_CORE): Handle
SCHEDULER_IDENT.
* config/aarch64/aarch64.c (AARCH64_CORE): Handle
SCHEDULER_IDENT.
(aarch64_parse_cpu): mcpu implies a default value for mtune.
* config/aarch64/aarch64.h (AARCH64_CORE): Handle
SCHEDULER_IDENT.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index b631dbe..1b4a49f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -21,18 +21,18 @@
 
Before using #include to read this file, define a macro:
 
-  AARCH64_CORE(CORE_NAME, CORE_IDENT, ARCH, FLAGS, COSTS)
+  AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH, FLAGS, COSTS)
 
The CORE_NAME is the name of the core, represented as a string constant.
The CORE_IDENT is the name of the core, represented as an identifier.
+   The SCHEDULER_IDENT is the name of the core for which scheduling decisions
+   will be made, represented as an identifier.
ARCH is the architecture revision implemented by the chip.
FLAGS are the bitwise-or of the traits that apply to that core.
This need not include flags implied by the architecture.
COSTS is the name of the rtx_costs routine to use.  */
 
-/* V8 Architecture Processors.
-   This list currently contains example CPUs that implement AArch64, and
-   therefore serves as a template for adding more CPUs in the future.  */
+/* V8 Architecture Processors.  */
 
-AARCH64_CORE("cortex-a53",	  cortexa53,	 8,  AARCH64_FL_FPSIMD,cortexa53)
-AARCH64_CORE("cortex-a57",	  cortexa15,	 8,  AARCH64_FL_FPSIMD,generic)
+AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8,  AARCH64_FL_FPSIMD, cortexa53)
+AARCH64_CORE("cortex-a57",  cortexa15, cortexa15, 8,  AARCH64_FL_FPSIMD, generic)
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 31e105f..6275112 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -25,8 +25,8 @@
 /* The various cores that implement AArch64.  */
 enum aarch64_processor
 {
-#define AARCH64_CORE(NAME, IDENT, ARCH, FLAGS, COSTS) \
-  IDENT,
+#define AARCH64_CORE(NAME, INTERNAL_IDENT, IDENT, ARCH, FLAGS, COSTS) \
+  INTERNAL_IDENT,
 #include "aarch64-cores.def"
 #undef AARCH64_CORE
   /* Used to indicate that no processor has been specified.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0c53e64..e668088 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -246,7 +246,7 @@ struct processor
 /* Processor cores implementing AArch64.  */
 static const struct processor all_cores[] =
 {
-#define AARCH64_CORE(NAME, IDENT, ARCH, FLAGS, COSTS) \
+#define AARCH64_CORE(NAME, X, IDENT, ARCH, FLAGS, COSTS) \
   {NAME, IDENT, #ARCH, FLAGS | AARCH64_FL_FOR_ARCH##ARCH, &COSTS##_tunings},
 #include "aarch64-cores.def"
 #undef AARCH64_CORE
@@ -5119,6 +5119,7 @@ aarch64_parse_cpu (void)
   if (strlen (cpu->name) == len && strncmp (cpu->name, str, len) == 0)
 	{
 	  selected_cpu = cpu;
+	  selected_tune = cpu;
 	  aarch64_isa_flags = selected_cpu->flags;
 
 	  if (ext != NULL)
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index d89c09b..e3e4846 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -461,8 +461,8 @@ enum reg_class
 
 enum target_cpus
 {
-#define AARCH64_CORE(NAME, IDENT, ARCH, FLAGS, COSTS) \
-  TARGET_CPU_##IDENT,
+#define AARCH64_CORE(NAME, INTERNAL_IDENT, IDENT, ARCH, FLAGS, COSTS) \
+  TARGET_CPU_##INTERNAL_IDENT,
 #include "aarch64-cores.def"
 #undef AARCH64_CORE
   TARGET_CPU_generic

Re: [GOMP4] Patch to add option for offloading

2013-12-18 Thread Andrey Turetskiy
Ping

On Fri, Dec 13, 2013 at 3:09 PM, Andrey Turetskiy
 wrote:
> Hi,
> I've added option -foffload-target to specify target and options for
> target compiler for offloading.
> Please, have a look.
>
> --
> Best regards,
> Andrey Turetskiy



-- 
Best regards,
Andrey Turetskiy


Re: Fix PR58477, part II

2013-12-18 Thread Jan Hubicka
> > OK, so the explanation is not as simple as the claim that non-POD types
> > need to be constructed or copied by a constructor and the C++ FE always
> > generates an explicit vtbl store?
> 
> No, as optimizers may combine stores, for example.

Yep, I understand we can design an "evil" optimization that will make the
problem of tracking virtual table pointers very hard.
The question is how much we want to have these pre-IPA and how their benefits
relate to the benefits from improved devirtualization.

We are still weak on devirt (and current generation benchmarks don't seem to
care much about how good we are), so at the moment my approach is to improve
the analysis without imposing additional restrictions, but in the longer term
we may want to make some compromises (pre-IPA) here and preserve more
information in gimple.

Honza
> 
> >> Of course, having gimple defined and required annotation for vptr
> >> changes would be much better but then of course all transformations
> >> would have to make sure they preserve it.
> >>
> >> IIRC, flag_strict_aliasing test was explicitly Richard's request,
> >> perhaps we could restrict the pointer type test further.
> >>
> >> Does that answer your question?
> >
> > Sort of, yes.  We should make some analysis of how effective current
> > methods are (i.e. disabling it and checking how much devirtualization
> > improves for firefox) and if we find they seem insufficient, we probably
> > should think of better analysis or annotation...
> >
> > Honza


[AArch64 4.8-branch] Backport: Fix CM instruction generation.

2013-12-18 Thread James Greenhalgh

Hi,

Recently it was pointed out that GCC 4.8 can generate instruction
aliases which are no longer mentioned in the AArch64 portion of the
ARMv8 architecture reference manual.

It turns out that a series of patches to GCC 4.9 actually
corrected this behaviour, though at the time the instructions
were still around, so there was no need to backport the patch.

Now that there is a need I have backported parts of the following
three patches:

[AArch64] Improve description of CM instructions in RTL

  http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01789.html
  Committed as r198490.

{Partial} [AArch64] Remap neon vcmp functions to C/TREE

  http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01791.html
  Committed as r198491

[AArch64 Testsuite] Fix fallout from FCM changes.

  http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01793.html
  And committed as revision 198493.

I can split these out if anyone desires, but they make more sense
as this coherent blob.

Tested on aarch64-none-elf.

OK for 4.8-branch?

Thanks,
James

---
gcc/

2013-12-18  James Greenhalgh  

Backport from Mainline.
2013-05-01  James Greenhalgh  

* config/aarch64/aarch64-simd-builtins.def (cmhs): Rename to...
(cmgeu): ...This.
(cmhi): Rename to...
(cmgtu): ...This.
* config/aarch64/aarch64-simd.md
(simd_mode): Add SF.
(aarch64_vcond_internal): Use new names for unsigned comparison insns.
(aarch64_cm): Rewrite to not use UNSPECs.
* config/aarch64/aarch64.md (*cstore_neg): Rename to...
(cstore_neg): ...This.
* config/aarch64/iterators.md
(VALLF): New.
(unspec): Remove UNSPEC_CM.
(COMPARISONS): New.
(UCOMPARISONS): Likewise.
(optab): Add missing comparisons.
(n_optab): New.
(cmp_1): Likewise.
(cmp_2): Likewise.
(CMP): Likewise.
(cmp): Remove.
(VCMP_S): Likewise.
(VCMP_U): Likewise.
(V_cmp_result): Add DF, SF modes.
(v_cmp_result): Likewise.
(v): Likewise.
(vmtype): Likewise.
* config/aarch64/predicates.md (aarch64_reg_or_fp_zero): New.

Partial Backport from mainline.
2013-05-01  James Greenhalgh  

* config/aarch64/arm_neon.h
(vc_<8,16,32,64>): Remap
to builtins or C as appropriate.

gcc/testsuite/

2013-12-18  James Greenhalgh  

Backport from Mainline
2013-05-01  James Greenhalgh  

* gcc.target/aarch64/scalar_intrinsics.c (force_simd): New.
(test_vceqd_s64): Force arguments to SIMD registers.
(test_vceqzd_s64): Likewise.
(test_vcged_s64): Likewise.
(test_vcled_s64): Likewise.
(test_vcgezd_s64): Likewise.
(test_vcged_u64): Likewise.
(test_vcgtd_s64): Likewise.
(test_vcltd_s64): Likewise.
(test_vcgtzd_s64): Likewise.
(test_vcgtd_u64): Likewise.
(test_vclezd_s64): Likewise.
(test_vcltzd_s64): Likewise.
(test_vtst_s64): Likewise.
(test_vtst_u64): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 955da26..ed73c15 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -217,8 +217,8 @@
   BUILTIN_VSDQ_I_DI (BINOP, cmle)
   BUILTIN_VSDQ_I_DI (BINOP, cmlt)
   /* Implemented by aarch64_cm.  */
-  BUILTIN_VSDQ_I_DI (BINOP, cmhs)
-  BUILTIN_VSDQ_I_DI (BINOP, cmhi)
+  BUILTIN_VSDQ_I_DI (BINOP, cmgeu)
+  BUILTIN_VSDQ_I_DI (BINOP, cmgtu)
   BUILTIN_VSDQ_I_DI (BINOP, cmtst)
 
   /* Implemented by aarch64_.  */
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 00f3c3121f0330713565dfc45247a866e5c7763c..481222cf5287cef42f45d723faa4b28d176171c8 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -21,7 +21,7 @@
 
 ; Main data types used by the insntructions
 
-(define_attr "simd_mode" "unknown,none,V8QI,V16QI,V4HI,V8HI,V2SI,V4SI,V2DI,V2SF,V4SF,V2DF,OI,CI,XI,DI,DF,SI,HI,QI"
+(define_attr "simd_mode" "unknown,none,V8QI,V16QI,V4HI,V8HI,V2SI,V4SI,V2DI,V2SF,V4SF,V2DF,OI,CI,XI,DI,DF,SI,SF,HI,QI"
   (const_string "unknown"))
 
 
@@ -1548,12 +1548,12 @@ (define_expand "aarch64_vcond_internal (mask, operands[4], operands[5]));
+  emit_insn (gen_aarch64_cmgeu (mask, operands[4], operands[5]));
   break;
 
 case LEU:
 case GTU:
-  emit_insn (gen_aarch64_cmhi (mask, operands[4], operands[5]));
+  emit_insn (gen_aarch64_cmgtu (mask, operands[4], operands[5]));
   break;
 
 case NE:
@@ -3034,48 +3034,181 @@ (define_insn "aarch64_qshrn_n
 )
 
 
-;; cm(eq|ge|le|lt|gt)
+;; cm(eq|ge|gt|lt|le)
+;; Note, we have constraints for Dz and Z as different expanders
+;; have different ideas of what should be passed to this pattern.
 
-(define_insn "aarch64_cm"
+(define_insn "aarch64_cm"
   [(set (match_operand: 0 "register_operand" "=w,w")
-(u

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-18 Thread Allan Sandfeld Jensen
Updated patch. I solved __attribute((target("arch=corei7-avx"))) by defining 
proper architectures for the recent Intel families instead of renaming 
submodels. 

I think the patch is starting to touch rather a lot of different details; 
perhaps it should be split up, or is it good as is?
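
To illustrate why the priority given to each architecture matters, here is
a minimal sketch in C.  It assumes that dispatch prefers the candidate with
the highest priority value among those the running CPU supports.  All names
(example_priority, example_version, pick_version) are hypothetical, the
"supported" flag stands in for a real CPU check, and the enum lists only a
subset of the order in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the ordering used in the patch; a later enumerator is
   preferred over an earlier one.  Names are hypothetical.  */
enum example_priority
{
  EX_ZERO,
  EX_PROC_SSE4_2,
  EX_AVX,
  EX_PROC_AVX,
  EX_FMA,
  EX_AVX2,
  EX_PROC_AVX2
};

struct example_version
{
  const char *target;              /* Text of the target attribute.  */
  enum example_priority priority;  /* Rank used to order candidates.  */
  int supported;                   /* Nonzero if the CPU can run it.  */
};

/* Return the index of the supported version with the highest priority,
   or -1 if none of the N candidates is supported.  */
int
pick_version (const struct example_version *v, size_t n)
{
  int best = -1;
  size_t i;

  assert (v != NULL || n == 0);
  for (i = 0; i < n; i++)
    if (v[i].supported
        && (best < 0 || v[i].priority > v[best].priority))
      best = (int) i;
  return best;
}
```

With this ordering an "arch=corei7-avx" version ranks above a plain "avx"
version, and an "arch=core-avx2" version ranks above both when the CPU
supports it.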

Regards
Allan
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 206065)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2013-12-14  Allan Sandfeld Jensen 
+
+PR gcc/59422
+* config/i386/i386.c: Extend function multiversioning
+to better support recent Intel and AMD models.
+
 2013-12-17  Jan Hubicka  
 
 	* ipa-devirt.c (get_polymorphic_call_info): Fix offset calculatoin
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 206065)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29965,9 +29965,14 @@
 P_PROC_SSE4_2,
 P_POPCNT,
 P_AVX,
+P_PROC_AVX,
+P_FMA4,
+P_XOP,
+P_PROC_XOP,
+P_FMA,
+P_PROC_FMA,
 P_AVX2,
-P_FMA,
-P_PROC_FMA
+P_PROC_AVX2
   };
 
  enum feature_priority priority = P_ZERO;
@@ -29986,11 +29991,15 @@
   {"sse", P_SSE},
   {"sse2", P_SSE2},
   {"sse3", P_SSE3},
+  {"sse4a", P_SSE4_a},
   {"ssse3", P_SSSE3},
   {"sse4.1", P_SSE4_1},
   {"sse4.2", P_SSE4_2},
   {"popcnt", P_POPCNT},
   {"avx", P_AVX},
+  {"fma4", P_FMA4},
+  {"xop", P_XOP},
+  {"fma", P_FMA},
   {"avx2", P_AVX2}
 };
 
@@ -30044,25 +30053,49 @@
 	  break;
 case PROCESSOR_COREI7_AVX:
   arg_str = "corei7-avx";
-  priority = P_PROC_SSE4_2;
+  priority = P_PROC_AVX;
   break;
+case PROCESSOR_HASWELL:
+  arg_str = "core-avx2";
+  priority = P_PROC_AVX2;
+  break;
 	case PROCESSOR_ATOM:
 	  arg_str = "atom";
 	  priority = P_PROC_SSSE3;
 	  break;
+case PROCESSOR_SLM:
+  arg_str = "slm";
+  priority = P_PROC_SSE4_2;
+  break;
 	case PROCESSOR_AMDFAM10:
 	  arg_str = "amdfam10h";
 	  priority = P_PROC_SSE4_a;
 	  break;
+case PROCESSOR_BTVER1:
+  arg_str = "bobcat";
+  priority = P_PROC_SSE4_a;
+  break;
+case PROCESSOR_BTVER2:
+  arg_str = "jaguar";
+  priority = P_PROC_AVX;
+  break;
 	case PROCESSOR_BDVER1:
 	  arg_str = "bdver1";
-	  priority = P_PROC_FMA;
+	  priority = P_PROC_XOP;
 	  break;
 	case PROCESSOR_BDVER2:
 	  arg_str = "bdver2";
 	  priority = P_PROC_FMA;
 	  break;
-	}  
+case PROCESSOR_BDVER3:
+  arg_str = "bdver3";
+  priority = P_PROC_FMA;
+  break;
+case PROCESSOR_BDVER4:
+  arg_str = "bdver4";
+  priority = P_PROC_AVX2;
+  break;
+}  
 	}
 
   cl_target_option_restore (&global_options, &cur_target);
@@ -30922,9 +30955,13 @@
 F_SSE2,
 F_SSE3,
 F_SSSE3,
+F_SSE4_a,
 F_SSE4_1,
 F_SSE4_2,
 F_AVX,
+F_FMA4,
+F_XOP,
+F_FMA,
 F_AVX2,
 F_MAX
   };
@@ -30943,6 +30980,10 @@
 M_AMDFAM10H,
 M_AMDFAM15H,
 M_INTEL_SLM,
+M_INTEL_COREI7_AVX,
+M_INTEL_CORE_AVX2,
+M_AMD_BOBCAT,
+M_AMD_JAGUAR,
 M_CPU_SUBTYPE_START,
 M_INTEL_COREI7_NEHALEM,
 M_INTEL_COREI7_WESTMERE,
@@ -30953,7 +30994,9 @@
 M_AMDFAM15H_BDVER1,
 M_AMDFAM15H_BDVER2,
 M_AMDFAM15H_BDVER3,
-M_AMDFAM15H_BDVER4
+M_AMDFAM15H_BDVER4,
+M_INTEL_COREI7_IVYBRIDGE,
+M_INTEL_CORE_HASWELL
   };
 
   static struct _arch_names_table
@@ -30971,11 +31014,17 @@
   {"corei7", M_INTEL_COREI7},
   {"nehalem", M_INTEL_COREI7_NEHALEM},
   {"westmere", M_INTEL_COREI7_WESTMERE},
+  {"corei7-avx", M_INTEL_COREI7_AVX},
   {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
+  {"ivybridge", M_INTEL_COREI7_IVYBRIDGE},
+  {"core-avx2", M_INTEL_CORE_AVX2},
+  {"haswell", M_INTEL_CORE_HASWELL},
   {"amdfam10h", M_AMDFAM10H},
   {"barcelona", M_AMDFAM10H_BARCELONA},
   {"shanghai", M_AMDFAM10H_SHANGHAI},
   {"istanbul", M_AMDFAM10H_ISTANBUL},
+  {"bobcat", M_AMD_BOBCAT},
+  {"jaguar", M_AMD_JAGUAR},
   {"amdfam15h", M_AMDFAM15H},
   {"bdver1", M_AMDFAM15H_BDVER1},
   {"bdver2", M_AMDFAM15H_BDVER2},
@@ -30997,9 +31046,13 @@
   {"sse2",   F_SSE2},
   {"sse3",   F_SSE3},
   {"ssse3",  F_SSSE3},
+  {"sse4a",  F_SSE4_a},
   {"sse4.1", F_SSE4_1},
   {"sse4.2", F_SSE4_2},
   {"avx",F_AVX},
+  {"fma4",   F_FMA4},
+  {"xop",F_XOP},
+  {"fma",F_FMA},
   {"avx2",   F_AVX2}
 };
 
Index: gcc/testsuite/gcc.target/i386/funcspec-5.c
=

Re: [PATCH][ARM] Add new cores to t-aprofile

2013-12-18 Thread Ramana Radhakrishnan

On 18/12/13 11:46, Kyrill Tkachov wrote:

Hi all,

This patch adds the recently introduced cores to the t-aprofile multilib
machinery. The values added are cortex-a15.cortex-a7, cortex-a12, cortex-a57 and
cortex-a57.cortex-a53.

Tested arm-none-eabi on qemu and model.

Ok for trunk?


Ok.

Ramana



Thanks,
Kyrill


2013-12-18  James Greenhalgh  
  Kyrylo Tkachov  

  * config/arm/t-aprofile: Add cortex-a15.cortex-a7, cortex-a12,
  cortex-a57, cortex-a57.cortex-a53.






Re: [PATCH i386 4/8] [AVX512] [5/8] Add substed patterns: rounding subst.

2013-12-18 Thread Kirill Yukhin
Hello,
On 02 Dec 16:09, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:08, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:06, Kirill Yukhin wrote:
> > > Ping.
> > Ping.
> Ping.
Ping.

Rebased patch is at the bottom.

--
Thanks, K

---
 gcc/config/i386/i386.c   |  32 
 gcc/config/i386/i386.md  |  10 ++
 gcc/config/i386/sse.md   | 457 +--
 gcc/config/i386/subst.md |  41 +
 4 files changed, 326 insertions(+), 214 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ecf5e0b..a3dd307 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15041,6 +15041,38 @@ ix86_print_operand (FILE *file, rtx x, int code)
fputs ("{z}", file);
  return;
 
+   case 'R':
+ gcc_assert (CONST_INT_P (x));
+
+ if (ASSEMBLER_DIALECT == ASM_INTEL)
+   fputs (", ", file);
+
+ switch (INTVAL (x))
+   {
+   case ROUND_NEAREST_INT:
+ fputs ("{rn-sae}", file);
+ break;
+   case ROUND_NEG_INF:
+ fputs ("{rd-sae}", file);
+ break;
+   case ROUND_POS_INF:
+ fputs ("{ru-sae}", file);
+ break;
+   case ROUND_ZERO:
+ fputs ("{rz-sae}", file);
+ break;
+   case ROUND_SAE:
+ fputs ("{sae}", file);
+ break;
+   default:
+ gcc_unreachable ();
+   }
+
+ if (ASSEMBLER_DIALECT == ASM_ATT)
+   fputs (", ", file);
+
+ return;
+
case '*':
  if (ASSEMBLER_DIALECT == ASM_ATT)
putc ('*', file);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ab5b33f..30b8d74 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -241,6 +241,16 @@
(ROUND_NO_EXC   0x8)
   ])
 
+;; Constants to represent AVX512F embeded rounding
+(define_constants
+  [(ROUND_NEAREST_INT  0)
+   (ROUND_NEG_INF  1)
+   (ROUND_POS_INF  2)
+   (ROUND_ZERO 3)
+   (NO_ROUND   4)
+   (ROUND_SAE  5)
+  ])
+
 ;; Constants to represent pcomtrue/pcomfalse variants
 (define_constants
   [(PCOM_FALSE 0)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index adedf44..23edbd3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1229,23 +1229,23 @@
 }
   [(set_attr "isa" "noavx,noavx,avx,avx")])
 
-(define_expand "3"
+(define_expand "3"
   [(set (match_operand:VF 0 "register_operand")
(plusminus:VF
  (match_operand:VF 1 "nonimmediate_operand")
  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && "
+  "TARGET_SSE &&  && "
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
-(define_insn "*3"
+(define_insn "*3"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
(plusminus:VF
  (match_operand:VF 1 "nonimmediate_operand" "0,v")
- (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (, mode, operands) && 
"
+ (match_operand:VF 2 "nonimmediate_operand" "xm,")))]
+  "TARGET_SSE && ix86_binary_operator_ok (, mode, operands) && 
 && "
   "@
\t{%2, %0|%0, %2}
-   v\t{%2, %1, 
%0|%0, %1, %2}"
+   v\t{%2, %1, 
%0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseadd")
(set_attr "prefix" "")
@@ -1268,23 +1268,23 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])
 
-(define_expand "mul3"
+(define_expand "mul3"
   [(set (match_operand:VF 0 "register_operand")
(mult:VF
  (match_operand:VF 1 "nonimmediate_operand")
  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && "
+  "TARGET_SSE &&  && "
   "ix86_fixup_binary_operands_no_copy (MULT, mode, operands);")
 
-(define_insn "*mul3"
+(define_insn "*mul3"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
(mult:VF
  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
- (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && ix86_binary_operator_ok (MULT, mode, operands) && 
"
+ (match_operand:VF 2 "nonimmediate_operand" "xm,")))]
+  "TARGET_SSE && ix86_binary_operator_ok (MULT, mode, operands) && 
 && "
   "@
mul\t{%2, %0|%0, %2}
-   vmul\t{%2, %1, %0|%0, %1, %2}"
+   vmul\t{%2, %1, 
%0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "ssemul")
(set_attr "prefix" "")
@@ -1335,15 +1335,15 @@
 }
 })
 
-(define_insn "_div3"
+(define_insn "_div3"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
(div:VF
  (match_operand:VF 1 "register_operand" "0,v")
- (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
-  "TARGET_SSE && "
+ (match_operand:VF 2 "nonimmediate_operand" "xm,")))]
+  "TARGET_SSE &&  && "
   "@
div\t{%2, %0|%0, %2}
-   vdiv\t{%2, %1, 

Re: [PATCH i386 4/8] [AVX512] [6/8] Add substed patterns: `sae' subst.

2013-12-18 Thread Kirill Yukhin
Hello,

On 02 Dec 16:10, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:11, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:07, Kirill Yukhin wrote:
> > > > Is it ok for trunk?
> > > Ping.
> > Ping.
> Ping.
Ping.

Rebased patch is at the bottom.

--
Thanks, K

---
 gcc/config/i386/sse.md   | 168 +++
 gcc/config/i386/subst.md |  31 +
 2 files changed, 115 insertions(+), 84 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 23edbd3..5b10cec 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1523,45 +1523,45 @@
 ;; isn't really correct, as those rtl operators aren't defined when
 ;; applied to NaNs.  Hopefully the optimizers won't get too smart on us.
 
-(define_expand "3"
+(define_expand "3"
   [(set (match_operand:VF 0 "register_operand")
(smaxmin:VF
  (match_operand:VF 1 "nonimmediate_operand")
  (match_operand:VF 2 "nonimmediate_operand")))]
-  "TARGET_SSE && "
+  "TARGET_SSE &&  && 
"
 {
   if (!flag_finite_math_only)
 operands[1] = force_reg (mode, operands[1]);
   ix86_fixup_binary_operands_no_copy (, mode, operands);
 })
 
-(define_insn "*3_finite"
+(define_insn "*3_finite"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
(smaxmin:VF
  (match_operand:VF 1 "nonimmediate_operand" "%0,v")
- (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
+ (match_operand:VF 2 "nonimmediate_operand" 
"xm,")))]
   "TARGET_SSE && flag_finite_math_only
&& ix86_binary_operator_ok (, mode, operands)
-   && "
+   &&  && "
   "@
\t{%2, %0|%0, %2}
-   v\t{%2, %1, 
%0|%0, %1, %2}"
+   v\t{%2, %1, 
%0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseadd")
(set_attr "btver2_sse_attr" "maxmin")
(set_attr "prefix" "")
(set_attr "mode" "")])
 
-(define_insn "*3"
+(define_insn "*3"
   [(set (match_operand:VF 0 "register_operand" "=x,v")
(smaxmin:VF
  (match_operand:VF 1 "register_operand" "0,v")
- (match_operand:VF 2 "nonimmediate_operand" "xm,vm")))]
+ (match_operand:VF 2 "nonimmediate_operand" 
"xm,")))]
   "TARGET_SSE && !flag_finite_math_only
-   && "
+   &&  && "
   "@
\t{%2, %0|%0, %2}
-   v\t{%2, %1, 
%0|%0, %1, %2}"
+   v\t{%2, %1, 
%0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sseadd")
(set_attr "btver2_sse_attr" "maxmin")
@@ -2099,15 +2099,15 @@
   [(V16SF "const_0_to_31_operand") (V8DF "const_0_to_31_operand")
   (V16SI "const_0_to_7_operand") (V8DI "const_0_to_7_operand")])
 
-(define_insn "avx512f_cmp3"
+(define_insn "avx512f_cmp3"
   [(set (match_operand: 0 "register_operand" "=k")
(unspec:
  [(match_operand:VI48F_512 1 "register_operand" "v")
-  (match_operand:VI48F_512 2 "nonimmediate_operand" "vm")
+  (match_operand:VI48F_512 2 "nonimmediate_operand" 
"")
   (match_operand:SI 3 "" "n")]
  UNSPEC_PCMP))]
-  "TARGET_AVX512F"
-  "vcmp\t{%3, %2, %1, 
%0|%0, %1, %2, %3}"
+  "TARGET_AVX512F && "
+  "vcmp\t{%3, 
%2, %1, 
%0|%0, %1, 
%2, %3}"
   [(set_attr "type" "ssecmp")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
@@ -2127,35 +2127,35 @@
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
-(define_insn "avx512f_vmcmp3"
+(define_insn "avx512f_vmcmp3"
   [(set (match_operand: 0 "register_operand" "=k")
(and:
  (unspec:
[(match_operand:VF_128 1 "register_operand" "v")
-(match_operand:VF_128 2 "nonimmediate_operand" "vm")
+(match_operand:VF_128 2 "nonimmediate_operand" 
"")
 (match_operand:SI 3 "const_0_to_31_operand" "n")]
UNSPEC_PCMP)
  (const_int 1)))]
   "TARGET_AVX512F"
-  "vcmp\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vcmp\t{%3, %2, %1, %0|%0, %1, 
%2, %3}"
   [(set_attr "type" "ssecmp")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
-(define_insn "avx512f_vmcmp3_mask"
+(define_insn "avx512f_vmcmp3_mask"
   [(set (match_operand: 0 "register_operand" "=k")
(and:
  (unspec:
[(match_operand:VF_128 1 "register_operand" "v")
-(match_operand:VF_128 2 "nonimmediate_operand" "vm")
+(match_operand:VF_128 2 "nonimmediate_operand" 
"")
 (match_operand:SI 3 "const_0_to_31_operand" "n")]
UNSPEC_PCMP)
  (and:
(match_operand: 4 "register_operand" "k")
(const_int 1]
   "TARGET_AVX512F"
-  "vcmp\t{%3, %2, %1, %0%{%4%}|%0%{%4%}, %1, %2, %3}"
+  "vcmp\t{%3, %2, %1, 
%0%{%4%}|%0%{%4%}, %1, %2, %3}"
   [(set_attr "type" "ssecmp")
(set_attr "length_immediate" "1")
(set_attr "prefix" "evex")
@@ -2173,17 +2173,17 @@
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
-(define_insn "_comi"
+(define_insn "_comi"
   [(set (reg:CCFP FLAGS_REG)
(compare:CCFP
  (vec_select:MODEF
(match_operand: 0 "register_operand" "v")

Re: [PATCH i386 4/8] [AVX512] [7/8] Add substed patterns: `round for expand' subst.

2013-12-18 Thread Kirill Yukhin
Hello,

On 02 Dec 16:11, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:12, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:08, Kirill Yukhin wrote:
> > > > Is it ok for trunk?
> > > Ping.
> > Ping.
> Ping.
Ping.

Rebased patch is at the bottom.

--
Thanks, K

---
 gcc/config/i386/sse.md   | 24 
 gcc/config/i386/subst.md | 18 ++
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5b10cec..e15e1b1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2698,17 +2698,17 @@
  (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "")
 
-(define_expand "avx512f_fmadd__maskz"
+(define_expand "avx512f_fmadd__maskz"
   [(match_operand:VF_512 0 "register_operand")
-   (match_operand:VF_512 1 "nonimmediate_operand")
-   (match_operand:VF_512 2 "nonimmediate_operand")
-   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:VF_512 1 "")
+   (match_operand:VF_512 2 "")
+   (match_operand:VF_512 3 "")
(match_operand: 4 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_fma_fmadd__maskz_1 (
+  emit_insn (gen_fma_fmadd__maskz_1 (
 operands[0], operands[1], operands[2], operands[3],
-CONST0_RTX (mode), operands[4]));
+CONST0_RTX (mode), operands[4]));
   DONE;
 })
 
@@ -2940,17 +2940,17 @@
  UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4 || TARGET_AVX512F")
 
-(define_expand "avx512f_fmaddsub__maskz"
+(define_expand "avx512f_fmaddsub__maskz"
   [(match_operand:VF_512 0 "register_operand")
-   (match_operand:VF_512 1 "nonimmediate_operand")
-   (match_operand:VF_512 2 "nonimmediate_operand")
-   (match_operand:VF_512 3 "nonimmediate_operand")
+   (match_operand:VF_512 1 "")
+   (match_operand:VF_512 2 "")
+   (match_operand:VF_512 3 "")
(match_operand: 4 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_fma_fmaddsub__maskz_1 (
+  emit_insn (gen_fma_fmaddsub__maskz_1 (
 operands[0], operands[1], operands[2], operands[3],
-CONST0_RTX (mode), operands[4]));
+CONST0_RTX (mode), operands[4]));
   DONE;
 })
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index 0887f14..c41da4a 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -159,3 +159,21 @@
  (set (match_dup 0)
   (match_dup 1))
  (unspec [(match_operand:SI 2 "const_4_to_5_operand")] 
UNSPEC_EMBEDDED_ROUNDING)])])
+
+(define_subst_attr "round_expand_name" "round_expand" "" "_round")
+(define_subst_attr "round_expand_predicate" "round_expand" 
"nonimmediate_operand" "register_operand")
+(define_subst_attr "round_expand_operand" "round_expand" "" ", operands[5]")
+
+(define_subst "round_expand"
+ [(match_operand:SUBST_V 0)
+  (match_operand:SUBST_V 1)
+  (match_operand:SUBST_V 2)
+  (match_operand:SUBST_V 3)
+  (match_operand:SUBST_S 4)]
+  "TARGET_AVX512F"
+  [(match_dup 0)
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)
+   (match_dup 4)
+   (unspec [(match_operand:SI 5 "const_0_to_4_operand")] 
UNSPEC_EMBEDDED_ROUNDING)])


Re: [Patch, AArch64] [2/6] Implement support for Crypto -- Instruction types.

2013-12-18 Thread Ramana Radhakrishnan
On Fri, Dec 6, 2013 at 5:35 PM, Tejas Belagod  wrote:
>
> Hi,
>
> The attached patch adds crypto types for instruction classification.
>
> Tested on aarch64-none-elf. OK for trunk?

Ok but please work with Kyryll to make sure only one version of this
gets in, obviously.

Ramana

>
> Thanks,
> Tejas
>
> 2013-12-06  Tejas Belagod  
>
> * config/arm/types.md (neon_mul_d_long, crypto_aes, crypto_sha1_xor,
> crypto_sha1_fast, crypto_sha1_slow, crypto_sha256_fast,
> crypto_sha256_slow): New.
> diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
> index 1c4b9e3..81ca62d 100644
> --- a/gcc/config/arm/types.md
> +++ b/gcc/config/arm/types.md
> @@ -326,6 +326,7 @@
>  ; neon_mul_b_long
>  ; neon_mul_h_long
>  ; neon_mul_s_long
> +; neon_mul_d_long
>  ; neon_mul_h_scalar
>  ; neon_mul_h_scalar_q
>  ; neon_mul_s_scalar
> @@ -519,6 +520,15 @@
>  ; neon_fp_div_s_q
>  ; neon_fp_div_d
>  ; neon_fp_div_d_q
> +;
> +; The classification below is for Crypto instructions.
> +;
> +; crypto_aes
> +; crypto_sha1_xor
> +; crypto_sha1_fast
> +; crypto_sha1_slow
> +; crypto_sha256_fast
> +; crypto_sha256_slow
>
>  (define_attr "type"
>   "adc_imm,\
> @@ -821,6 +831,7 @@
>neon_mul_b_long,\
>neon_mul_h_long,\
>neon_mul_s_long,\
> +  neon_mul_d_long,\
>neon_mul_h_scalar,\
>neon_mul_h_scalar_q,\
>neon_mul_s_scalar,\
> @@ -1035,7 +1046,14 @@
>neon_fp_div_s,\
>neon_fp_div_s_q,\
>neon_fp_div_d,\
> -  neon_fp_div_d_q"
> +  neon_fp_div_d_q,\
> +\
> +  crypto_aes,\
> +  crypto_sha1_xor,\
> +  crypto_sha1_fast,\
> +  crypto_sha1_slow,\
> +  crypto_sha256_fast,\
> +  crypto_sha256_slow"
> (const_string "untyped"))
>
>  ; Is this an (integer side) multiply with a 32-bit (or smaller) result?


Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.

2013-12-18 Thread Kirill Yukhin
Hello,

On 02 Dec 16:11, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:14, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:09, Kirill Yukhin wrote:
> > > > Is it ok for trunk?
> > > Ping.
> > Ping.
> Ping.
Ping.

Rebased patch at the bottom.

--
Thanks, K


Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.

2013-12-18 Thread Kirill Yukhin
Hello,

On 02 Dec 16:13, Kirill Yukhin wrote:
> Hello,
> On 19 Nov 12:14, Kirill Yukhin wrote:
> > Hello,
> > On 15 Nov 20:10, Kirill Yukhin wrote:
> > > > Is it ok to commit to main trunk?
> > > Ping.
> > Ping.
> Ping.
Ping.

Updated patch at the bottom.

--
Thanks, K

---
 gcc/config/i386/i386.c | 607 +++--
 gcc/config/i386/sse.md |  94 ++--
 gcc/tree-vect-stmts.c  |  34 ++-
 gcc/tree-vectorizer.h  |   4 +-
 4 files changed, 636 insertions(+), 103 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a3dd307..c86aa0a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2307,7 +2307,7 @@ enum x86_64_reg_class
 X86_64_MEMORY_CLASS
   };
 
-#define MAX_CLASSES 4
+#define MAX_CLASSES 8
 
 /* Table of constants used by fldpi, fldln2, etc  */
 static REAL_VALUE_TYPE ext_80387_constants_table [5];
@@ -6293,7 +6293,7 @@ merge_classes (enum x86_64_reg_class class1, enum 
x86_64_reg_class class2)
sized containers, classes[0] will be NO_CLASS and 1 is returned.
 
BIT_OFFSET is used internally for handling records and specifies offset
-   of the offset in bits modulo 256 to avoid overflow cases.
+   of the offset in bits modulo 512 to avoid overflow cases.
 
See the x86-64 PS ABI for details.
 */
@@ -6393,7 +6393,7 @@ classify_argument (enum machine_mode mode, const_tree 
type,
  num = classify_argument (TYPE_MODE (type), type,
   subclasses,
   (int_bit_position (field)
-   + bit_offset) % 256);
+   + bit_offset) % 512);
  if (!num)
return 0;
  pos = (int_bit_position (field)
@@ -6643,6 +6643,21 @@ classify_argument (enum machine_mode mode, const_tree 
type,
   classes[2] = X86_64_SSEUP_CLASS;
   classes[3] = X86_64_SSEUP_CLASS;
   return 4;
+case V8DFmode:
+case V16SFmode:
+case V8DImode:
+case V16SImode:
+case V32HImode:
+case V64QImode:
+  classes[0] = X86_64_SSE_CLASS;
+  classes[1] = X86_64_SSEUP_CLASS;
+  classes[2] = X86_64_SSEUP_CLASS;
+  classes[3] = X86_64_SSEUP_CLASS;
+  classes[4] = X86_64_SSEUP_CLASS;
+  classes[5] = X86_64_SSEUP_CLASS;
+  classes[6] = X86_64_SSEUP_CLASS;
+  classes[7] = X86_64_SSEUP_CLASS;
+  return 8;
 case V4SFmode:
 case V4SImode:
 case V16QImode:
@@ -6828,6 +6843,18 @@ construct_container (enum machine_mode mode, enum 
machine_mode orig_mode,
   && mode != BLKmode)
 return gen_reg_or_parallel (mode, orig_mode,
SSE_REGNO (sse_regno));
+  if (n == 8
+  && regclass[0] == X86_64_SSE_CLASS
+  && regclass[1] == X86_64_SSEUP_CLASS
+  && regclass[2] == X86_64_SSEUP_CLASS
+  && regclass[3] == X86_64_SSEUP_CLASS
+  && regclass[4] == X86_64_SSEUP_CLASS
+  && regclass[5] == X86_64_SSEUP_CLASS
+  && regclass[6] == X86_64_SSEUP_CLASS
+  && regclass[7] == X86_64_SSEUP_CLASS
+  && mode != BLKmode)
+return gen_reg_or_parallel (mode, orig_mode,
+   SSE_REGNO (sse_regno));
   if (n == 2
   && regclass[0] == X86_64_X87_CLASS
   && regclass[1] == X86_64_X87UP_CLASS)
@@ -6909,6 +6936,18 @@ construct_container (enum machine_mode mode, enum 
machine_mode orig_mode,
tmpmode = OImode;
i += 3;
break;
+ case 8:
+   gcc_assert (i == 0
+   && regclass[1] == X86_64_SSEUP_CLASS
+   && regclass[2] == X86_64_SSEUP_CLASS
+   && regclass[3] == X86_64_SSEUP_CLASS
+   && regclass[4] == X86_64_SSEUP_CLASS
+   && regclass[5] == X86_64_SSEUP_CLASS
+   && regclass[6] == X86_64_SSEUP_CLASS
+   && regclass[7] == X86_64_SSEUP_CLASS);
+   tmpmode = XImode;
+   i += 7;
+   break;
  default:
gcc_unreachable ();
  }
@@ -6982,6 +7021,12 @@ function_arg_advance_32 (CUMULATIVE_ARGS *cum, enum 
machine_mode mode,
 
 case V8SFmode:
 case V8SImode:
+case V64QImode:
+case V32HImode:
+case V16SImode:
+case V8DImode:
+case V16SFmode:
+case V8DFmode:
 case V32QImode:
 case V16HImode:
 case V4DFmode:
@@ -7033,8 +7078,9 @@ function_arg_advance_64 (CUMULATIVE_ARGS *cum, enum 
machine_mode mode,
 {
   int int_nregs, sse_nregs;
 
-  /* Unnamed 256bit vector mode parameters are passed on stack.  */
-  if (!named && VALID_AVX256_REG_MODE (mode))
+  /* Unnamed 512 and 256bit vector mode parameters are passed on stack.  */
+  if (!named && (VALID_AVX512F_REG_MODE (mode)
+|| VALID_AVX256_REG_MODE (mode)))
 return;
 

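The classification change in the hunks above follows a simple rule: a vector argument is split into 64-bit "eightbytes", the first is classed SSE and the rest SSEUP, so a 512-bit mode needs eight entries (hence MAX_CLASSES going from 4 to 8). A minimal sketch of that rule, assuming nothing beyond what the diff shows; the `example_` names are hypothetical and are not the identifiers used in i386.c:

```c
/* Hypothetical stand-ins for the X86_64_*_CLASS enumerators.  */
enum example_class { EX_NO_CLASS, EX_SSE_CLASS, EX_SSEUP_CLASS };

/* Mirrors the patched MAX_CLASSES value: 512 bits / 64 bits = 8.  */
#define EXAMPLE_MAX_CLASSES 8

/* Classify a vector of BITS bits into eightbytes.  Returns the number
   of classes written, or 0 if the size is not handled by this sketch.  */
static int
classify_vector_bits (int bits, enum example_class classes[EXAMPLE_MAX_CLASSES])
{
  int n = bits / 64;
  int i;

  if (bits % 64 != 0 || n < 1 || n > EXAMPLE_MAX_CLASSES)
    return 0;

  /* The first eightbyte names the register class; the remaining ones
     only say "upper part of the same SSE register".  */
  classes[0] = EX_SSE_CLASS;
  for (i = 1; i < n; i++)
    classes[i] = EX_SSEUP_CLASS;
  return n;
}
```

With this sketch, a 512-bit size yields 8 classes and a 256-bit size yields 4, matching the `return 8` and `return 4` cases in the hunk.
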
Re: [PATCH i386 4/8] [AVX512] [8/8] Add substed patterns: `sae-only for expand' subst.

2013-12-18 Thread Kirill Yukhin
> Rebased patch at the bottom.
Adding the patch.

--
Thanks, K

---
 gcc/config/i386/sse.md   | 18 ++
 gcc/config/i386/subst.md | 20 
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e15e1b1..8620541 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6598,18 +6598,19 @@
 })
 
 
-(define_expand "avx512f_fixupimm_maskz"
+(define_expand "avx512f_fixupimm_maskz"
   [(match_operand:VF_512 0 "register_operand")
(match_operand:VF_512 1 "register_operand")
(match_operand:VF_512 2 "register_operand")
-   (match_operand: 3 "nonimmediate_operand")
+   (match_operand: 3 "")
(match_operand:SI 4 "const_0_to_255_operand")
(match_operand: 5 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_avx512f_fixupimm_maskz_1 (
+  emit_insn (gen_avx512f_fixupimm_maskz_1 (
operands[0], operands[1], operands[2], operands[3],
-   operands[4], CONST0_RTX (mode), operands[5]));
+   operands[4], CONST0_RTX (mode), operands[5]
+   ));
   DONE;
 })
 
@@ -6642,18 +6643,19 @@
   [(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
-(define_expand "avx512f_sfixupimm_maskz"
+(define_expand "avx512f_sfixupimm_maskz"
   [(match_operand:VF_128 0 "register_operand")
(match_operand:VF_128 1 "register_operand")
(match_operand:VF_128 2 "register_operand")
-   (match_operand: 3 "nonimmediate_operand")
+   (match_operand: 3 "")
(match_operand:SI 4 "const_0_to_255_operand")
(match_operand: 5 "register_operand")]
   "TARGET_AVX512F"
 {
-  emit_insn (gen_avx512f_sfixupimm_maskz_1 (
+  emit_insn (gen_avx512f_sfixupimm_maskz_1 (
operands[0], operands[1], operands[2], operands[3],
-   operands[4], CONST0_RTX (mode), operands[5]));
+   operands[4], CONST0_RTX (mode), operands[5]
+   ));
   DONE;
 })
 
diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md
index c41da4a..11f2e5f 100644
--- a/gcc/config/i386/subst.md
+++ b/gcc/config/i386/subst.md
@@ -177,3 +177,23 @@
(match_dup 3)
(match_dup 4)
(unspec [(match_operand:SI 5 "const_0_to_4_operand")] UNSPEC_EMBEDDED_ROUNDING)])
+
+(define_subst_attr "round_saeonly_expand_name5" "round_saeonly_expand5" "" "_round")
+(define_subst_attr "round_saeonly_expand_predicate5" "round_saeonly_expand5" "nonimmediate_operand" "register_operand")
+(define_subst_attr "round_saeonly_expand_operand6" "round_saeonly_expand5" "" ", operands[6]")
+
+(define_subst "round_saeonly_expand5"
+ [(match_operand:SUBST_V 0)
+  (match_operand:SUBST_V 1)
+  (match_operand:SUBST_V 2)
+  (match_operand:SUBST_A 3)
+  (match_operand:SI 4)
+  (match_operand:SUBST_S 5)]
+  "TARGET_AVX512F"
+  [(match_dup 0)
+   (match_dup 1)
+   (match_dup 2)
+   (match_dup 3)
+   (match_dup 4)
+   (match_dup 5)
+   (unspec [(match_operand:SI 6 "const_4_to_5_operand")] UNSPEC_EMBEDDED_ROUNDING)])


Re: [PATCH i386 6/8] [AVX-512] Add builtins/intrinsics.

2013-12-18 Thread Kirill Yukhin
Hello,

On 02 Dec 16:15, Kirill Yukhin wrote:
> Hello
> > Ok for trunk?
> Ping?
Ping.

Rebased patch attached.

Thanks, K


p.patch.bz2
Description: BZip2 compressed data


[PATCH] Vectorization for store with negative step

2013-12-18 Thread Bingfeng Mei
Hi,
I created PR59544 and here is the patch. OK to commit? 

Thanks,
Bingfeng


2013-12-18  Bingfeng Mei  

PR tree-optimization/59544
 * tree-vect-stmts.c (perm_mask_for_reverse): Move before
   vectorizable_store. (vectorizable_store): Handle negative step.

2013-12-18  Bingfeng Mei  

PR tree-optimization/59544
* gcc.target/i386/pr59544.c: New test

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On 
Behalf Of Richard Biener
Sent: 18 December 2013 11:47
To: Bingfeng Mei
Cc: gcc-patches@gcc.gnu.org
Subject: Re: Vectorization for store with negative step

On Wed, Dec 18, 2013 at 12:34 PM, Bingfeng Mei  wrote:
> Thanks, Richard. I will file a bug report and prepare a complete patch. For 
> perm_mask_for_reverse function, should I move it before vectorizable_store or 
> add a declaration.

Move it.

Richard.

>
> Bingfeng
> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: 18 December 2013 11:26
> To: Bingfeng Mei
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: Vectorization for store with negative step
>
> On Mon, Dec 16, 2013 at 5:54 PM, Bingfeng Mei  wrote:
>> Hi,
>> I was looking at some loops that can be vectorized by LLVM, but not GCC. One 
>> type of loop is with store of negative step.
>>
>> void test1(short * __restrict__ x, short * __restrict__ y, short * __restrict__ z)
>> {
>> int i;
>> for (i=127; i>=0; i--) {
>> x[i] = y[127-i] + z[127-i];
>> }
>> }
>>
>> I don't know why GCC only implements negative step for load, but not store. 
>> I implemented a patch, very similar to code in vectorizable_load.
>>
>> ~/scratch/install-x86/bin/gcc ghs-dec.c -ftree-vectorize -S -O2 -mavx
>>
>> Without patch:
>> test1:
>> .LFB0:
>> addq$254, %rdi
>> xorl%eax, %eax
>> .p2align 4,,10
>> .p2align 3
>> .L2:
>> movzwl  (%rsi,%rax), %ecx
>> subq$2, %rdi
>> addw(%rdx,%rax), %cx
>> addq$2, %rax
>> movw%cx, 2(%rdi)
>> cmpq$256, %rax
>> jne .L2
>> rep; ret
>>
>> With patch:
>> test1:
>> .LFB0:
>> vmovdqa .LC0(%rip), %xmm1
>> xorl%eax, %eax
>> .p2align 4,,10
>> .p2align 3
>> .L2:
>> vmovdqu (%rsi,%rax), %xmm0
>> movq%rax, %rcx
>> negq%rcx
>> vpaddw  (%rdx,%rax), %xmm0, %xmm0
>> vpshufb %xmm1, %xmm0, %xmm0
>> addq$16, %rax
>> cmpq$256, %rax
>> vmovups %xmm0, 240(%rdi,%rcx)
>> jne .L2
>> rep; ret
>>
>> Performance is definitely improved here. It is bootstrapped for 
>> x86_64-unknown-linux-gnu, and has no additional regressions on my machine.
>>
>> For reference, LLVM seems to use different instructions and slightly worse 
>> code. I am not so familiar with x86 assembly code. The patch is originally 
>> for our private port.
>> test1:  # @test1
>> .cfi_startproc
>> # BB#0: # %entry
>> addq$240, %rdi
>> xorl%eax, %eax
>> .align  16, 0x90
>> .LBB0_1:# %vector.body
>> # =>This Inner Loop Header: Depth=1
>> movdqu  (%rsi,%rax,2), %xmm0
>> movdqu  (%rdx,%rax,2), %xmm1
>> paddw   %xmm0, %xmm1
>> shufpd  $1, %xmm1, %xmm1# xmm1 = xmm1[1,0]
>> pshuflw $27, %xmm1, %xmm0   # xmm0 = xmm1[3,2,1,0,4,5,6,7]
>> pshufhw $27, %xmm0, %xmm0   # xmm0 = xmm0[0,1,2,3,7,6,5,4]
>> movdqu  %xmm0, (%rdi)
>> addq$8, %rax
>> addq$-16, %rdi
>> cmpq$128, %rax
>> jne .LBB0_1
>> # BB#2: # %for.end
>> ret
>>
>> Any comment?
>
> Looks good to me.  One of the various TODOs in vectorizable_store I presume.
>
> Needs a testcase and at this stage a bugreport that is fixed by it.
>
> Thanks,
> Richard.
>
>> Bingfeng Mei
>> Broadcom UK
>>
>>


patch_vec_store
Description: patch_vec_store


RE: Two build != host fixes

2013-12-18 Thread Bernd Edlinger
Hi,

On Wed, 18 Dec 2013 09:58:39, Alan Modra wrote:
>
> On Tue, Dec 17, 2013 at 01:14:23PM +0100, Bernd Edlinger wrote:
>> the reason for this is overwriting GMPINC for the auto-build generation, 
>> because
>> many test scripts include  which fails now completely (it is not 
>> installed,
>> I have it in-tree).
>
> Yes, I understand the reason why your setup is failing. Please try
> this patch.
>
> Index: gcc/configure.ac
> ===
> --- gcc/configure.ac (revision 206009)
> +++ gcc/configure.ac (working copy)
> @@ -1529,8 +1529,13 @@
> /* | [A-Za-z]:[\\/]* ) realsrcdir=${srcdir};;
> *) realsrcdir=../${srcdir};;
> esac
> + # Clearing GMPINC is necessary to prevent host headers being
> + # used by the build compiler. Defining GENERATOR_FILE stops
> + # system.h from including gmp.h.
> CC="${CC_FOR_BUILD}" CFLAGS="${CFLAGS_FOR_BUILD}" \
> - LDFLAGS="${LDFLAGS_FOR_BUILD}" GMPINC="" \
> + CXX="${CXX_FOR_BUILD}" CXXFLAGS="${CXXFLAGS_FOR_BUILD}" \
> + LD="${LD_FOR_BUILD}" LDFLAGS="${LDFLAGS_FOR_BUILD}" \
> + GMPINC="" CPPFLAGS="${CPPFLAGS} -DGENERATOR_FILE" \
> ${realsrcdir}/configure \
> --enable-languages=${enable_languages-all} \
> --target=$target_alias --host=$build_alias --build=$build_alias
>
> --
> Alan Modra
> Australia Development Lab, IBM

Yes. It works. Thanks!


g++ -c -DIN_GCC -DGENERATOR_FILE -I. -Ibuild -I../../gcc-4.9-20131215/gcc 
-I../../gcc-4.9-20131215/gcc/build -I../../gcc-4.9-20131215/gcc/../include 
-I../../gcc-4.9-20131215/gcc/../libcpp/include 
-I/home/ed/gnu/x/gcc-build-arm-linux-gnueabihf-cross/./gmp 
-I/home/ed/gnu/x/gcc-4.9-20131215/gmp 
-I/home/ed/gnu/x/gcc-build-arm-linux-gnueabihf-cross/./mpfr 
-I/home/ed/gnu/x/gcc-4.9-20131215/mpfr 
-I/home/ed/gnu/x/gcc-4.9-20131215/mpc/src  
-I../../gcc-4.9-20131215/gcc/../libdecnumber 
-I../../gcc-4.9-20131215/gcc/../libdecnumber/dpd -I../libdecnumber 
-I../../gcc-4.9-20131215/gcc/../libbacktrace    \
    -o build/gengtype.o ../../gcc-4.9-20131215/gcc/gengtype.c


I wonder if the GMPINC="" is still necessary, as the actual host g++ invocation 
also has GMP-directories
but does not use them because of the -DGENERATOR_FILE.


Bernd.

Re: [PATCH] Vectorization for store with negative step

2013-12-18 Thread Jakub Jelinek
On Wed, Dec 18, 2013 at 01:31:05PM +, Bingfeng Mei wrote:
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 206016)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2013-12-18  Bingfeng Mei  
+
+   PR tree-optimization/59544
+* tree-vect-stmts.c (perm_mask_for_reverse): Move before

This should be a tab instead of 8 spaces.

+   vectorizable_store. (vectorizable_store): Handle negative step.

Newline and tab after "store.", rather than space.

Property changes on: gcc/testsuite/gcc.target/i386/pr59544.c
___
Added: svn:executable
   + *

Please don't add such bogus property.  Testcases aren't executable.

Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 206016)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2013-12-18  Bingfeng Mei  
+
+   PR tree-optimization/59544
+   * gcc.target/i386/pr59544.c: New test

Missing dot at the end of line.
+
 2013-12-16  Jakub Jelinek  
 
PR middle-end/58956
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 206016)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -4859,6 +4859,25 @@ ensure_base_align (stmt_vec_info stmt_in
 }
 
 
+/* Given a vector type VECTYPE returns the VECTOR_CST mask that implements
+   reversal of the vector elements.  If that is impossible to do,
+   returns NULL.  */
+
+static tree
+perm_mask_for_reverse (tree vectype)
+{
+  int i, nunits;
+  unsigned char *sel;
+
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  sel = XALLOCAVEC (unsigned char, nunits);
+
+  for (i = 0; i < nunits; ++i)
+sel[i] = nunits - 1 - i;
+
+  return vect_gen_perm_mask (vectype, sel);
+}
+
 /* Function vectorizable_store.
 
Check if STMT defines a non scalar data-ref (array/pointer/structure) that
@@ -4902,6 +4921,8 @@ vectorizable_store (gimple stmt, gimple_
   vec oprnds = vNULL;
   vec result_chain = vNULL;
   bool inv_p;
+  bool negative = false;
+  tree offset = NULL_TREE;
   vec vec_oprnds = vNULL;
   bool slp = (slp_node != NULL);
   unsigned int vec_num;
@@ -4976,16 +4997,38 @@ vectorizable_store (gimple stmt, gimple_
   if (!STMT_VINFO_DATA_REF (stmt_info))
 return false;
 
-  if (tree_int_cst_compare (loop && nested_in_vect_loop_p (loop, stmt)
-   ? STMT_VINFO_DR_STEP (stmt_info) : DR_STEP (dr),
-   size_zero_node) < 0)
+  negative = tree_int_cst_compare (loop && nested_in_vect_loop_p (loop, stmt)
+? STMT_VINFO_DR_STEP (stmt_info) : DR_STEP (dr),
+size_zero_node) < 0;

The formatting looks wrong, do:
  negative
= tree_int_cst_compare (loop && nested_in_vect_loop_p (loop, stmt)
? STMT_VINFO_DR_STEP (stmt_info) : DR_STEP (dr),
size_zero_node) < 0;
instead.

+  if (negative && ncopies > 1)
 {
   if (dump_enabled_p ())
 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "negative step for store.\n");
+ "multiple types with negative step.");
   return false;
 }
 
+  if (negative)
+{
+  gcc_assert (!grouped_store);
+  alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
+  if (alignment_support_scheme != dr_aligned
+  && alignment_support_scheme != dr_unaligned_supported)

Lots of places where you use 8 spaces instead of tab, please fix.
+offset = size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1);
+
   if (store_lanes_p)
 aggr_type = build_array_type_nelts (elem_type, vec_num * nunits);
   else
@@ -5200,7 +5246,7 @@ vectorizable_store (gimple stmt, gimple_
dataref_ptr
  = vect_create_data_ref_ptr (first_stmt, aggr_type,
  simd_lane_access_p ? loop : NULL,
- NULL_TREE, &dummy, gsi, &ptr_incr,
+ offset, &dummy, gsi, &ptr_incr,
  simd_lane_access_p, &inv_p);
  gcc_assert (bb_vinfo || !inv_p);
}
@@ -5306,6 +5352,21 @@ vectorizable_store (gimple stmt, gimple_
set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
misalign);
 
+ if (negative)
+{
+  tree perm_mask = perm_mask_for_reverse (vectype);
+  tree perm_dest = vect_create_destination_var 
(gimple_assign_rhs1 (stmt), vectype);
+  tree new_temp = make_ssa_name (perm_dest, NULL);
+
+  /* Generate the permute statement.  */
+  gimple perm_stmt = gimple_build_assign_with_ops 
(VEC_PERM_EXPR,

[PATCH] Avoid uninitialized warning in i386.c

2013-12-18 Thread Marek Polacek
Bootstrap with -fsanitize=undefined revealed that the alg variable
may be used uninitialized.  Or at least gcc thinks so.  This patch
initializes it to 0.

Regtested/bootstrapped on x86_64-linux, ok for trunk?

2013-12-18  Marek Polacek  

* config/i386/i386.c (ix86_parse_stringop_strategy_string): Initialize
alg to 0.

--- gcc/config/i386/i386.c.mp   2013-12-18 13:38:21.908138307 +0100
+++ gcc/config/i386/i386.c  2013-12-18 13:38:41.417214628 +0100
@@ -2856,7 +2856,7 @@ ix86_parse_stringop_strategy_string (cha
   do
 {
   int maxs;
-  stringop_alg alg;
+  stringop_alg alg = (stringop_alg) 0;
   char alg_name[128];
   char align[16];
   next_range_str = strchr (curr_range_str, ',');

Marek


Re: [PATCH] Avoid uninitialized warning in i386.c

2013-12-18 Thread Jakub Jelinek
On Wed, Dec 18, 2013 at 02:39:44PM +0100, Marek Polacek wrote:
> Bootstrap with -fsanitize=undefined revealed that the alg variable
> may be used uninitialized.  Or at least gcc thinks so.  This patch
> initializes it to 0.
> 
> Regtested/bootstrapped on x86_64-linux, ok for trunk?

Do
stringop_alg alg = last_alg;
instead, or alternatively (perhaps better), remove the alg variable
altogether, just break; in the loop and set
  input_ranges[n].alg = (stringop_alg) i;

> 2013-12-18  Marek Polacek  
> 
>   * config/i386/i386.c (ix86_parse_stringop_strategy_string): Initialize
>   alg to 0.
> 
> --- gcc/config/i386/i386.c.mp 2013-12-18 13:38:21.908138307 +0100
> +++ gcc/config/i386/i386.c2013-12-18 13:38:41.417214628 +0100
> @@ -2856,7 +2856,7 @@ ix86_parse_stringop_strategy_string (cha
>do
>  {
>int maxs;
> -  stringop_alg alg;
> +  stringop_alg alg = (stringop_alg) 0;
>char alg_name[128];
>char align[16];
>next_range_str = strchr (curr_range_str, ',');

Jakub
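
The second alternative suggested above (drop the separate variable, break out of the search loop, and use the loop index itself) might look roughly like the following. This is a sketch, not the code in i386.c: the enum, the name table and `lookup_alg` are hypothetical, and only the pattern is the point.

```c
#include <string.h>

/* Hypothetical algorithm enum; EX_LAST_ALG doubles as "not found".  */
enum example_alg { EX_NO_ALG, EX_LIBCALL, EX_LOOP, EX_LAST_ALG };

static const char *const example_alg_names[EX_LAST_ALG] =
  { "no_alg", "libcall", "loop" };

/* Return the algorithm whose name matches NAME, or EX_LAST_ALG when
   nothing matches.  The loop index is the result, so there is no
   separate variable that could be read uninitialized.  */
static enum example_alg
lookup_alg (const char *name)
{
  int i;

  for (i = 0; i < EX_LAST_ALG; i++)
    if (strcmp (name, example_alg_names[i]) == 0)
      break;

  return (enum example_alg) i;
}
```

A caller would compare the result against `EX_LAST_ALG` to report an unknown name before storing it.
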


[Patch, ARM, LRA] Fix Thumb1 ICE

2013-12-18 Thread Yvan Roux
Hi,

this patch from Vladimir fixes an ICE when compiling newlib in Thumb1.
 It returns NO_REGS in THUMB_SECONDARY_OUTPUT_RELOAD_CLASS, the same
way we did for THUMB_SECONDARY_INPUT_RELOAD_CLASS.

The testsuite is OK with this patch, but as we have also a regression
on iWMMXT, I tried to avoid the secondary reload restriction at a
higher level : in SECONDARY_[INPUT|OUTPUT]_RELOAD_CLASS, as these
macros handle the iWMMXT target.  Unfortunately it doesn't fix the
issue, but the testsuite results are the same as with the attached
patch.

It seems to me that this second solution is more LRA friendly (i.e.
doing less thing on the target side) but I want your opinion.

If the Thumb fix is sufficient, here is the Changelog

2013-12-18  Vladimir Makarov  

* config/arm/arm.h (THUMB_SECONDARY_OUTPUT_RELOAD_CLASS): Return NO_REGS
for LRA.

Thanks,
Yvan
Index: config/arm/arm.h
===
--- config/arm/arm.h(revision 206023)
+++ config/arm/arm.h(working copy)
@@ -1285,11 +1285,12 @@ enum reg_class
   : NO_REGS))
 
 #define THUMB_SECONDARY_OUTPUT_RELOAD_CLASS(CLASS, MODE, X)\
-  ((CLASS) != LO_REGS && (CLASS) != BASE_REGS  \
-   ? ((true_regnum (X) == -1 ? LO_REGS \
-   : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS  \
-   : NO_REGS)) \
-   : NO_REGS)
+  (lra_in_progress ? NO_REGS   \
+   : (CLASS) != LO_REGS && (CLASS) != BASE_REGS \
+  ? ((true_regnum (X) == -1 ? LO_REGS  \
+ : (true_regnum (X) + HARD_REGNO_NREGS (0, MODE) > 8) ? LO_REGS \
+ : NO_REGS))   \
+  : NO_REGS)
 
 /* Return the register class of a scratch register needed to copy IN into
or out of a register in CLASS in MODE.  If it can be done directly,


Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-18 Thread Uros Bizjak
On Wed, Dec 18, 2013 at 1:57 PM, Allan Sandfeld Jensen
 wrote:
> Update patch. Solved __attribute((target("arch=corei7-avx"))) by defining
> proper architectures for the recent Intel families instead of renaming
> submodels.

@@ -30922,9 +30955,13 @@
 F_SSE2,
 F_SSE3,
 F_SSSE3,
+F_SSE4_a,
 F_SSE4_1,
 F_SSE4_2,
 F_AVX,
+F_FMA4,
+F_XOP,
+F_FMA,
 F_AVX2,
 F_MAX

and

@@ -89,9 +97,13 @@
   FEATURE_SSE2,
   FEATURE_SSE3,
   FEATURE_SSSE3,
+  FEATURE_SSE4_a,
   FEATURE_SSE4_1,
   FEATURE_SSE4_2,
   FEATURE_AVX,
+  FEATURE_FMA4,
+  FEATURE_XOP,
+  FEATURE_FMA,
   FEATURE_AVX2
 };


The above two enums should not be reordered.

> I am thinking the patch is starting to touch a bit many different details,
> perhaps it should be split up, or is it good as is?

It is OK.

Apart from the reordered enums, the patch looks mostly OK. Let's wait
a couple of days for possible comments from Intel and AMD people.

Uros.
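
A small illustration of why reordering such enums is a problem: inserting enumerators in the middle renumbers everything after them, while appending leaves existing values alone. The review does not state the reason; the usual concern, assumed here, is that the numeric values are shared between separately built pieces. All names below are hypothetical.

```c
/* Numbering as an already-built consumer would know it.  */
enum feature_old { OLD_SSSE3, OLD_SSE4_1, OLD_SSE4_2, OLD_AVX, OLD_AVX2 };

/* New entries inserted in the middle: later values shift.  */
enum feature_reordered
{
  RE_SSSE3, RE_SSE4_A, RE_SSE4_1, RE_SSE4_2, RE_AVX, RE_FMA4, RE_AVX2
};

/* New entries appended at the end: existing values are unchanged.  */
enum feature_appended
{
  AP_SSSE3, AP_SSE4_1, AP_SSE4_2, AP_AVX, AP_AVX2, AP_SSE4_A, AP_FMA4
};

/* A feature is typically tested as one bit in a mask.  */
static int
feature_bit (int feature)
{
  return 1 << feature;
}
```

Here `feature_bit (AP_AVX)` still equals `feature_bit (OLD_AVX)`, whereas `feature_bit (RE_AVX)` selects a different bit.
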


Re: RFA: Fix test pr32912-2.c for 16-bit targets

2013-12-18 Thread Jeff Law

On 12/18/13 03:09, Nick Clifton wrote:

Hi Guys,

   The test gcc/testsuite/gcc.dg/pr32912-2.c fails to execute correctly
   on targets that use 16-bit integers, because it assumes at least a
   32-bit integer.  The patch below removes this assumption, and it also
   tidies up the code slightly so that __SIZEOF_INT__ is only tested in
   one place, not three.

   There were no regressions when tested with a i686-pc-linux-gnu or a
   x86_64-pc-linux-gnu toolchain, and the test was fixed for a rl78-elf
   toolchain.

   OK to apply ?

Cheers
   Nick

gcc/testsuite/ChangeLog
2013-12-18  Nick Clifton  

* gcc.dg/pr32912-2.c: Fix for 16-bit targets.

OK.
Jeff



[Patch, Fortran, OOP] PR 59493: Cleanup of vtab generation code

2013-12-18 Thread Janus Weil
Hi all,

here is a follow-up to my recent patch for PR59493, doing some cleanup
related to the generation of vtab symbols:
1) Since the function gfc_find_intrinsic_vtab, contrary to its name,
handles not only intrinsic but also derived types, I removed the
latter functionality, and instead introduced a new function
gfc_find_vtab, which handles arbitrary types and simply decides
whether to call the corresponding function for intrinsic or derived
vtabs.
2) Basically all calls to gfc_find_intrinsic_vtab are replaced by
gfc_find_vtab. This often simplifies the logic and saves additional IF
clauses to distinguish between intrinsic and derived types.
3) As a consequence, gfc_find_intrinsic_vtab is made static and loses
the gfc_ prefix.

All of this results in the code being shorter, clearer and less
error-prone. The patch is regtested on x86_64-unknown-linux-gnu. Ok
for trunk?

Cheers,
Janus
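
The dispatch described in point 1) can be pictured with a toy model. This is a sketch only: the types and functions below are hypothetical stand-ins rather than gfortran's, and returning NULL for an unknown type is an assumption based on the check removed from the intrinsic function in the patch.

```c
#include <stddef.h>

/* Toy stand-ins for the front end's basic type codes and typespec.  */
enum example_bt
{
  EX_BT_UNKNOWN, EX_BT_INTEGER, EX_BT_REAL, EX_BT_CHARACTER,
  EX_BT_DERIVED, EX_BT_CLASS
};

struct example_typespec { enum example_bt type; };

/* Placeholders for the two specialised lookups.  */
static const char *
example_find_derived_vtab (const struct example_typespec *ts)
{
  (void) ts;
  return "derived";
}

static const char *
example_find_intrinsic_vtab (const struct example_typespec *ts)
{
  (void) ts;
  return "intrinsic";
}

/* Single entry point: callers no longer need their own IF to decide
   which of the two lookups applies.  */
static const char *
example_find_vtab (const struct example_typespec *ts)
{
  switch (ts->type)
    {
    case EX_BT_UNKNOWN:
      return NULL;
    case EX_BT_DERIVED:
    case EX_BT_CLASS:
      return example_find_derived_vtab (ts);
    default:
      return example_find_intrinsic_vtab (ts);
    }
}
```
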



2013-12-18  Janus Weil  

PR fortran/59493
* gfortran.h (gfc_find_intrinsic_vtab): Removed prototype.
(gfc_find_vtab): New prototype.
* class.c (gfc_find_intrinsic_vtab): Rename to 'find_intrinsic_vtab' and
make static. Minor modifications.
(gfc_find_vtab): New function.
(gfc_class_initializer): Use new function 'gfc_find_vtab'.
* check.c (gfc_check_move_alloc): Ditto.
* expr.c (gfc_check_pointer_assign): Ditto.
* interface.c (compare_actual_formal): Ditto.
* resolve.c (resolve_allocate_expr, resolve_select_type): Ditto.
* trans-expr.c (gfc_conv_intrinsic_to_class, gfc_trans_class_assign):
Ditto.
* trans-intrinsic.c (conv_intrinsic_move_alloc): Ditto.
* trans-stmt.c (gfc_trans_allocate): Ditto.
Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c (revision 206083)
+++ gcc/fortran/check.c (working copy)
@@ -2858,12 +2858,7 @@ gfc_check_move_alloc (gfc_expr *from, gfc_expr *to
 
   /* CLASS arguments: Make sure the vtab of from is present.  */
   if (to->ts.type == BT_CLASS && !UNLIMITED_POLY (from))
-{
-  if (from->ts.type == BT_CLASS || from->ts.type == BT_DERIVED)
-   gfc_find_derived_vtab (from->ts.u.derived);
-  else
-   gfc_find_intrinsic_vtab (&from->ts);
-}
+gfc_find_vtab (&from->ts);
 
   return true;
 }
Index: gcc/fortran/class.c
===
--- gcc/fortran/class.c (revision 206083)
+++ gcc/fortran/class.c (working copy)
@@ -423,18 +423,11 @@ gfc_class_initializer (gfc_typespec *ts, gfc_expr
   gfc_expr *init;
   gfc_component *comp;
   gfc_symbol *vtab = NULL;
-  bool is_unlimited_polymorphic;
 
-  is_unlimited_polymorphic = ts->u.derived
-  && ts->u.derived->components->ts.u.derived
-  && ts->u.derived->components->ts.u.derived->attr.unlimited_polymorphic;
-
-  if (is_unlimited_polymorphic && init_expr)
-vtab = gfc_find_intrinsic_vtab (&ts->u.derived->components->ts);
-  else if (init_expr && init_expr->expr_type != EXPR_NULL)
-vtab = gfc_find_derived_vtab (init_expr->ts.u.derived);
+  if (init_expr && init_expr->expr_type != EXPR_NULL)
+vtab = gfc_find_vtab (&init_expr->ts);
   else
-vtab = gfc_find_derived_vtab (ts->u.derived);
+vtab = gfc_find_vtab (ts);
 
   init = gfc_get_structure_constructor_expr (ts->type, ts->kind,
 &ts->u.derived->declared_at);
@@ -2403,39 +2396,34 @@ yes:
 
 
 /* Find (or generate) the symbol for an intrinsic type's vtab.  This is
-   need to support unlimited polymorphism.  */
+   needed to support unlimited polymorphism.  */
 
-gfc_symbol *
-gfc_find_intrinsic_vtab (gfc_typespec *ts)
+static gfc_symbol *
+find_intrinsic_vtab (gfc_typespec *ts)
 {
   gfc_namespace *ns;
   gfc_symbol *vtab = NULL, *vtype = NULL, *found_sym = NULL;
   gfc_symbol *copy = NULL, *src = NULL, *dst = NULL;
   int charlen = 0;
 
-  if (ts->type == BT_CHARACTER && ts->deferred)
+  if (ts->type == BT_CHARACTER)
 {
-  gfc_error ("TODO: Deferred character length variable at %C cannot "
-"yet be associated with unlimited polymorphic entities");
-  return NULL;
+  if (ts->deferred)
+   {
+ gfc_error ("TODO: Deferred character length variable at %C cannot "
+"yet be associated with unlimited polymorphic entities");
+ return NULL;
+   }
+  else if (ts->u.cl && ts->u.cl->length
+  && ts->u.cl->length->expr_type == EXPR_CONSTANT)
+   charlen = mpz_get_si (ts->u.cl->length->value.integer);
 }
 
-  if (ts->type == BT_UNKNOWN)
-return NULL;
-
-  /* Sometimes the typespec is passed from a single call.  */
-  if (ts->type == BT_DERIVED || ts->type == BT_CLASS)
-return gfc_find_derived_vtab (ts->u.derived);
-
   /* Find the top-level namespace.  */
   for (ns = gfc_current_ns; ns; ns = ns->parent)
 if (!ns->parent)
   break;
 
-  if (ts->type == BT_CHARACTER && ts->u.cl && ts->u.cl->length
-  && ts->u.cl->length->expr_type == EXP

RE: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-18 Thread Iyer, Balaji V


> -Original Message-
> From: Jakub Jelinek [mailto:ja...@zalov.cz]
> Sent: Wednesday, December 18, 2013 1:31 AM
> To: Iyer, Balaji V
> Cc: Joseph S. Myers; Aldy Hernandez (al...@redhat.com); 'gcc-
> patc...@gcc.gnu.org'
> Subject: Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly
> Elemental functions) for C
> 
> On Tue, Dec 17, 2013 at 11:38:48PM +, Iyer, Balaji V wrote:
> > > What I meant is
> > >   if (((mask >> PRAGMA_CILK_CLAUSE_VECTORLENGTH) & 1) != 0)
> > > is_cilk_simd_fn = true;
> > > (note, for 32-bit HWI targets, omp_clause_mask is a class and not
> > > all arithmetic is actually supported on it, so better limit yourself
> > > to forms used elsewhere already).
> > >
> >
> > I have a better idea.. The where string, if it is "SIMD-enabled
> > functions attribute" will indicate that it is a Cilk Plus SIMD-enabled 
> > function.
> > So, if I do a check for that, then I don't have to do any of this mask
> > anding.
> >
> > This is what I am talking about:
> >
> >   if (where && !strcmp (where, "SIMD-enabled functions attribute"))
> > is_cilk_simd_fn = false;
> 
> But this is more expensive and the string really is meant for diagnostics
> messages, so I'd strongly prefer the above mask check instead.
> Ok with that change.
> 

OK, will make this fix.

> > From what I understood, all the #pragma omp declare simd work are
> pushed into trunk right?
> 
> Yes, though I still want to optimize it a little bit (generate thunks and/or
> aliases when desirable/possible), but that only affects exported entry-points
> for OpenMP, for Cilk+ the code matches more the Intel ABI paper and
> generates only one ISA variant (and expects to parse processor clause for
> other ISA variants), rather than emitting all 3.

So, install it into gomp4 branch then?

> 
>   Jakub


Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-18 Thread Jakub Jelinek
On Wed, Dec 18, 2013 at 02:32:54PM +, Iyer, Balaji V wrote:
> > Yes, though I still want to optimize it a little bit (generate thunks and/or
> > aliases when desirable/possible), but that only affects exported 
> > entry-points
> > for OpenMP, for Cilk+ the code matches more the Intel ABI paper and
> > generates only one ISA variant (and expects to parse processor clause for
> > other ISA variants), rather than emitting all 3.
> 
> So, install it into gomp4 branch then?

No, please test against trunk and commit there.

Jakub
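
For comparison with the string test, the mask check preferred earlier in this thread is a single shift-and-mask. A minimal sketch, assuming a plain 64-bit integer mask; in GCC the mask can be a class on some hosts, and the clause numbers below are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clause numbers; the real pragma enum values differ.  */
enum example_clause
{
  EXAMPLE_CLAUSE_LINEAR = 3,
  EXAMPLE_CLAUSE_VECTORLENGTH = 5
};

/* Return true when bit CLAUSE is set in MASK, using only the shift,
   bitwise-and and comparison forms from the suggested check.  */
static bool
clause_is_set (uint64_t mask, enum example_clause clause)
{
  return ((mask >> clause) & 1) != 0;
}
```

Unlike comparing a diagnostic string, this test does not change behaviour if the wording of the message is later edited.
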


Re: [RS6000] bswapdi2 pattern, reload and lra

2013-12-18 Thread David Edelsohn
On Tue, Dec 17, 2013 at 6:50 AM, Alan Modra  wrote:

> Bootstrapped and regression tested powerpc64-linux.  Output of
> pr53199.c inspected for sanity with -mcpu=power{6,7} -m{32,64} and
> {-mlra,}.  OK to apply?
>
> gcc/
> * config/rs6000/rs6000.md (bswapdi2): Remove one scratch reg.
> Modify Z->r bswapdi splitter to use dest in place of scratch.
> In r->Z and Z->r bswapdi splitter rename word_high, word_low
> to word1, word2 and rearrange logic to suit.
> (bswapdi2_64bit): Remove early clobber on Z->r alternative.
> (bswapdi2_ldbrx): Likewise.  Remove '??' on r->r.
> (bswapdi2_32bit): Remove early clobber on Z->r alternative.
> Add one '?' on r->r.  Modify Z->r splitter to avoid need for
> early clobber.
> gcc/testsuite/
> * gcc.target/powerpc/pr53199.c: Add extra functions.

> @@ -2438,29 +2433,29 @@
>addr2 = gen_rtx_PLUS (Pmode, op2, addr1);
>  }
>
> +  word1 = change_address (src, SImode, addr1);
> +  word2 = change_address (src, SImode, addr2);
> +
>if (BYTES_BIG_ENDIAN)
>  {
> -  word_high = change_address (src, SImode, addr1);
> -  word_low  = change_address (src, SImode, addr2);
> +  emit_insn (gen_bswapsi2 (op3_32, word2));
> +  emit_insn (gen_bswapsi2 (dest_32, word1));
>  }
>else
>  {
> -  word_high = change_address (src, SImode, addr2);
> -  word_low  = change_address (src, SImode, addr1);
> +  emit_insn (gen_bswapsi2 (op3_32, word1));
> +  emit_insn (gen_bswapsi2 (dest_32, word2));
>  }
>
> -  emit_insn (gen_bswapsi2 (op3_32, word_low));
> -  emit_insn (gen_bswapsi2 (op4_32, word_high));
> -  emit_insn (gen_ashldi3 (dest, op3, GEN_INT (32)));
> -  emit_insn (gen_iordi3 (dest, dest, op4));
> +  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
> +  emit_insn (gen_iordi3 (dest, dest, op3));
>  }")

Why change the code from swapping the words at the initial
change_address() to swapping the words in the call to gen_bswapsi2()?

- David
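
For readers following the splitter, here is a minimal sketch in plain C of the
arithmetic it relies on: a 64-bit byte swap built from two 32-bit byte swaps, a
shift and an OR. It is only a model, not the rs6000 code; swap32 and
swap64_via_words are hypothetical names, and the comments map each step onto the
RTL generators quoted above on a best-effort basis.

```c
#include <stdint.h>

/* Byte-swap one 32-bit word (a portable stand-in for bswapsi2).  */
static uint32_t
swap32 (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
	 | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Byte-swap a 64-bit value using two 32-bit swaps, a shift and an OR.
   The swapped low word ends up in the high half of the result.  */
static uint64_t
swap64_via_words (uint64_t x)
{
  uint32_t word_high = (uint32_t) (x >> 32);
  uint32_t word_low = (uint32_t) x;

  uint64_t tmp = swap32 (word_low);	/* plays the role of op3 */
  uint64_t dest = swap32 (word_high);	/* plays the role of dest */

  tmp <<= 32;				/* compare gen_ashldi3 */
  return dest | tmp;			/* compare gen_iordi3 */
}
```

Because the second swap is written straight into the destination, only one
temporary is needed in this model, which is the same effect the patch gets by
dropping a scratch register.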


RE: [PATCH] Fix PR58944

2013-12-18 Thread Bernd Edlinger
On Tue, 17 Dec 2013 10:29:03, Sriraman Tallam wrote:
>
> On Fri, Dec 13, 2013 at 5:06 AM, H.J. Lu  wrote:
>> On Mon, Dec 2, 2013 at 6:46 PM, Sriraman Tallam  wrote:
>>> On Thu, Nov 28, 2013 at 9:36 PM, Bernd Edlinger
>>>  wrote:
 Hi,

 On Wed, 27 Nov 2013 19:49:39, Uros Bizjak wrote:
>
> On Mon, Nov 25, 2013 at 10:08 PM, Sriraman Tallam  wrote:
>
>> I have attached a patch to fix this bug :
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58944
>>
>> A similar problem was also reported here:
>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01050.html
>>
>>
>> Recently, ix86_valid_target_attribute_tree in config/i386/i386.c was
>> refactored to not depend on global_options structure and to be able to
>> use any gcc_options structure. One clean way to fix this is by having
>> target_option_default_node save all the default target options which
>> can be restored to any gcc_options structure. The root cause of the
>> above bugs was that ix86_arch_string and ix86_tune_string were not
>> saved in target_option_default_node in PR58944 and
>> ix86_preferred_stack_boundary_arg was not saved in the latter case.
>>
>> This patch saves all the target options used in i386.opt which are
>> either obtained from the command-line or set to some default. Is this
>> patch alright?
>
> Things look rather complicated, but I see no other solution than save
> and restore the way you propose.
>
> Please wait 24h if somebody has a different idea, otherwise please go
> ahead and commit the patch to mainline.
>

 Maybe you should also look at the handling of preferred_stack_boundary_arg
 versus incoming_stack_boundary_arg in ix86_option_override_internal:

 Remember ix86_incoming_stack_boundary_arg is defined to
 global_options.x_ix86_incoming_stack_boundary_arg.

 like this?

 if (opts_set->x_ix86_incoming_stack_boundary_arg)
 {
 - if (ix86_incoming_stack_boundary_arg
 + if (opts->x_ix86_incoming_stack_boundary_arg
 < (TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 4 : 2)
 - || ix86_incoming_stack_boundary_arg> 12)
 + || opts->x_ix86_incoming_stack_boundary_arg> 12)
 error ("-mincoming-stack-boundary=%d is not between %d and 12",
 - ix86_incoming_stack_boundary_arg,
 + opts->x_ix86_incoming_stack_boundary_arg,
 TARGET_64BIT_P (opts->x_ix86_isa_flags) ? 4 : 2);
 else
 {
 ix86_user_incoming_stack_boundary
 - = (1 << ix86_incoming_stack_boundary_arg) * BITS_PER_UNIT;
 + = (1 << opts->x_ix86_incoming_stack_boundary_arg) * BITS_PER_UNIT;
 ix86_incoming_stack_boundary
 = ix86_user_incoming_stack_boundary;
 }

>>>
>>> Thanks for catching this. I will make this change in the same patch.
>>>
>>
>> Your change caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59492
>
> Thanks for fixing this. This is making me wonder if I am missing some
> other important flags. Is there a way to detect this? I originally
> looked at everything in i386.opt to form my list.
>

Maybe that is related too; see: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38077

#pragma GCC optimize ("strict-volatile-bitfields")


does not work together with:

#pragma GCC push
#pragma GCC pop

> Thanks
> Sri
>
>>
>>
>> --
>> H.J.   
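
A minimal sketch of the point made about ix86_incoming_stack_boundary_arg,
assuming nothing about the real gcc_options layout: the range check and the
boundary computation read only from the structure passed in, never from a
global. struct opts_model and incoming_stack_boundary_bits are hypothetical
names, and BITS_PER_UNIT is assumed to be 8.

```c
#include <stdbool.h>

#define BITS_PER_UNIT 8		/* assumption: 8-bit bytes */

/* Hypothetical stand-in for the relevant gcc_options fields.  */
struct opts_model
{
  int incoming_stack_boundary_arg;
  bool target_64bit;
};

/* Return the incoming stack boundary in bits, or -1 if the argument is
   outside [4, 12] for 64-bit or [2, 12] for 32-bit.  Everything is read
   from OPTS, so the function works on any options structure.  */
static int
incoming_stack_boundary_bits (const struct opts_model *opts)
{
  int min = opts->target_64bit ? 4 : 2;
  int arg = opts->incoming_stack_boundary_arg;

  if (arg < min || arg > 12)
    return -1;
  return (1 << arg) * BITS_PER_UNIT;
}
```

With this shape, validating a saved or per-function options structure cannot
accidentally pick up a value from the command-line defaults.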

[PATCH] Improve _mm*loadu* intrinsics handling (PR target/59539)

2013-12-18 Thread Jakub Jelinek
Hi!

As discussed in the PR, this patch similarly to the recent changes
in movmisalign expansion for TARGET_AVX for unaligned loads from
misaligned_operand just expands those as *mov_internal pattern,
because that pattern emits vmovdqu/vmovup[sd] too, but doesn't contain
UNSPECs and thus can be also merged into most other AVX insns that use
the load target if those insns accept a memory operand.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-12-18  Jakub Jelinek  

PR target/59539
* config/i386/sse.md
(_loadu,
_loaddqu): New expanders,
prefix existing define_insn names with *.

* gcc.target/i386/pr59539-1.c: New test.
* gcc.target/i386/pr59539-2.c: New test.

--- gcc/config/i386/sse.md.jj   2013-12-10 12:43:21.0 +0100
+++ gcc/config/i386/sse.md  2013-12-18 11:10:36.428643400 +0100
@@ -912,7 +912,27 @@ (define_expand "movmisalign"
   DONE;
 })
 
-(define_insn "_loadu"
+(define_expand "_loadu"
+  [(set (match_operand:VF 0 "register_operand")
+   (unspec:VF [(match_operand:VF 1 "nonimmediate_operand")]
+ UNSPEC_LOADU))]
+  "TARGET_SSE && "
+{
+  /* For AVX, normal *mov_internal pattern will handle unaligned loads
+ just fine if misaligned_operand is true, and without the UNSPEC it can
+ be combined with arithmetic instructions.  If misaligned_operand is
+ false, still emit UNSPEC_LOADU insn to honor user's request for
+ misaligned load.  */
+  if (TARGET_AVX
+  && misaligned_operand (operands[1], mode)
+  && !)
+{
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0], operands[1]));
+  DONE;
+}
+})
+
+(define_insn "*_loadu"
   [(set (match_operand:VF 0 "register_operand" "=v")
(unspec:VF
  [(match_operand:VF 1 "nonimmediate_operand" "vm")]
@@ -999,7 +1019,28 @@ (define_insn "avx512f_storeu")])
 
-(define_insn "_loaddqu"
+(define_expand "_loaddqu"
+  [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "register_operand")
+   (unspec:VI_UNALIGNED_LOADSTORE
+ [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand")]
+ UNSPEC_LOADU))]
+  "TARGET_SSE2 && "
+{
+  /* For AVX, normal *mov_internal pattern will handle unaligned loads
+ just fine if misaligned_operand is true, and without the UNSPEC it can
+ be combined with arithmetic instructions.  If misaligned_operand is
+ false, still emit UNSPEC_LOADU insn to honor user's request for
+ misaligned load.  */
+  if (TARGET_AVX
+  && misaligned_operand (operands[1], mode)
+  && !)
+{
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0], operands[1]));
+  DONE;
+}
+})
+
+(define_insn "*_loaddqu"
   [(set (match_operand:VI_UNALIGNED_LOADSTORE 0 "register_operand" "=v")
(unspec:VI_UNALIGNED_LOADSTORE
  [(match_operand:VI_UNALIGNED_LOADSTORE 1 "nonimmediate_operand" "vm")]
--- gcc/testsuite/gcc.target/i386/pr59539-1.c.jj 2013-12-18 08:46:26.023864371 +0100
+++ gcc/testsuite/gcc.target/i386/pr59539-1.c   2013-12-18 08:53:12.304743270 +0100
@@ -0,0 +1,16 @@
+/* PR target/59539 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+#include 
+
+int
+foo (void *p1, void *p2)
+{
+  __m128i d1 = _mm_loadu_si128 ((__m128i *) p1);
+  __m128i d2 = _mm_loadu_si128 ((__m128i *) p2);
+  __m128i result = _mm_cmpeq_epi16 (d1, d2);
+  return _mm_movemask_epi8 (result);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu" 1 } } */
--- gcc/testsuite/gcc.target/i386/pr59539-2.c.jj 2013-12-18 08:46:33.130826198 +0100
+++ gcc/testsuite/gcc.target/i386/pr59539-2.c   2013-12-18 08:47:14.890608917 +0100
@@ -0,0 +1,16 @@
+/* PR target/59539 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2" } */
+
+#include 
+
+int
+foo (void *p1, void *p2)
+{
+  __m256i d1 = _mm256_loadu_si256 ((__m256i *) p1);
+  __m256i d2 = _mm256_loadu_si256 ((__m256i *) p2);
+  __m256i result = _mm256_cmpeq_epi16 (d1, d2);
+  return _mm256_movemask_epi8 (result);
+}
+
+/* { dg-final { scan-assembler-times "vmovdqu" 1 } } */

Jakub


[ARM] Fix thinko in arm_expand_epilogue_apcs_frame

2013-12-18 Thread Eric Botcazou
While arm_expand_epilogue has the correct:

  if (crtl->calls_eh_return)
emit_insn (gen_addsi3 (stack_pointer_rtx,
   stack_pointer_rtx,
   gen_rtx_REG (SImode, ARM_EH_STACKADJ_REGNUM)));

arm_expand_epilogue_apcs_frame has the bogus:

   if (crtl->calls_eh_return)
 emit_insn (gen_addsi3 (stack_pointer_rtx,
   stack_pointer_rtx,
   GEN_INT (ARM_EH_STACKADJ_REGNUM)));

leading to:

  add sp, sp, #2

in the assembly file.

Tested on ARM/VxWorks, applied on the mainline and 4.8 branch as obvious.
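
To make the difference concrete, here is a toy model in C (not GCC code) of an
operand that is either an immediate or a register. The names operand,
make_immediate, make_register and format_sp_adjust are hypothetical, and the
register number 2 is inferred from the "#2" in the assembly quoted above.

```c
#include <stdio.h>

/* Toy operand: either a literal value or a register number.  */
enum operand_kind { OP_IMMEDIATE, OP_REGISTER };

struct operand
{
  enum operand_kind kind;
  int value;
};

#define EH_STACKADJ_REGNUM 2	/* assumed from the "#2" seen above */

/* Rough analogue of GEN_INT: the number is taken as a constant.  */
static struct operand
make_immediate (int n)
{
  struct operand op = { OP_IMMEDIATE, n };
  return op;
}

/* Rough analogue of gen_rtx_REG: the number names a register.  */
static struct operand
make_register (int regno)
{
  struct operand op = { OP_REGISTER, regno };
  return op;
}

/* Write "add sp, sp, <operand>" into BUF.  */
static void
format_sp_adjust (char *buf, size_t len, struct operand op)
{
  if (op.kind == OP_IMMEDIATE)
    snprintf (buf, len, "add sp, sp, #%d", op.value);
  else
    snprintf (buf, len, "add sp, sp, r%d", op.value);
}
```

Passing make_immediate (EH_STACKADJ_REGNUM) reproduces the bogus constant
adjustment, while make_register (EH_STACKADJ_REGNUM) adds the contents of the
register instead.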


2013-12-18  Eric Botcazou  

* config/arm/arm.c (arm_expand_epilogue_apcs_frame): Fix thinko.


-- 
Eric Botcazou

Index: config/arm/arm.c
===
--- config/arm/arm.c	(revision 206039)
+++ config/arm/arm.c	(working copy)
@@ -26852,8 +26852,8 @@ arm_expand_epilogue_apcs_frame (bool rea
 
   if (crtl->calls_eh_return)
 emit_insn (gen_addsi3 (stack_pointer_rtx,
-   stack_pointer_rtx,
-   GEN_INT (ARM_EH_STACKADJ_REGNUM)));
+			   stack_pointer_rtx,
+			   gen_rtx_REG (SImode, ARM_EH_STACKADJ_REGNUM)));
 
   if (IS_STACKALIGN (func_type))
 /* Restore the original stack pointer.  Before prologue, the stack was

[PATCH] Fix ifcvt (PR rtl-optimization/58668)

2013-12-18 Thread Jakub Jelinek
Hi!

As discussed in the PR, this testcase ICEs on arm, because ifcvt
is relying on active instruction counts from various routines
(count_bb_insns, flow_find_cross_jump and flow_find_head_matching_sequence),
but each of those routines has a different view of what counts as
active insns.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux
and tested on the testcase using cross to arm.  Ok for trunk?
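
The idea can be modelled outside GCC roughly as follows: one predicate decides
what an active insn is, and every counter goes through it. This is a sketch
only; insn_kind, counts_as_active and count_active_insns are hypothetical names,
and the enum is a drastic simplification of real RTL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified insn kinds for illustration.  */
enum insn_kind
{
  INSN_SET, INSN_CALL, INSN_USE, INSN_CLOBBER, INSN_DEBUG, INSN_JUMP
};

/* Single definition of "active": calls and ordinary insns, but not
   USE, CLOBBER, debug insns or jumps.  */
static bool
counts_as_active (enum insn_kind kind)
{
  return kind == INSN_SET || kind == INSN_CALL;
}

/* Count active insns in a block.  Any other routine that needs a
   comparable count should use the same predicate.  */
static int
count_active_insns (const enum insn_kind *insns, size_t n)
{
  int count = 0;
  for (size_t i = 0; i < n; i++)
    if (counts_as_active (insns[i]))
      count++;
  return count;
}
```

If two counters disagree about USE or CLOBBER, a caller that subtracts one count
from the other can end up with a wrong remaining length, which is the kind of
inconsistency the patch removes.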

2013-12-18  Jakub Jelinek  

PR rtl-optimization/58668
* cfgcleanup.c (flow_find_cross_jump): Don't count
any jumps if dir_p is NULL.  Remove p1 variable and make USE/CLOBBER
check consistent with other places.
(flow_find_head_matching_sequence): Don't count USE or CLOBBER insns.
(try_head_merge_bb): Adjust for the flow_find_head_matching_sequence
counting change.
* ifcvt.c (count_bb_insns): Don't count USE or CLOBBER insns.

* gcc.dg/pr58668.c: New test.

--- gcc/cfgcleanup.c.jj 2013-12-10 08:52:13.0 +0100
+++ gcc/cfgcleanup.c2013-12-18 12:23:11.695684615 +0100
@@ -1295,7 +1295,6 @@ flow_find_cross_jump (basic_block bb1, b
 {
   rtx i1, i2, last1, last2, afterlast1, afterlast2;
   int ninsns = 0;
-  rtx p1;
   enum replace_direction dir, last_dir, afterlast_dir;
   bool follow_fallthru, did_fallthru;
 
@@ -1323,8 +1322,9 @@ flow_find_cross_jump (basic_block bb1, b
   || (returnjump_p (i2) && !side_effects_p (PATTERN (i2
 {
   last2 = i2;
-  /* Count everything except for unconditional jump as insn.  */
-  if (!simplejump_p (i2) && !returnjump_p (i2) && last1)
+  /* Count everything except for unconditional jump as insn.
+Don't count any jumps if dir_p is NULL.  */
+  if (!simplejump_p (i2) && !returnjump_p (i2) && last1 && dir_p)
ninsns++;
   i2 = PREV_INSN (i2);
 }
@@ -1375,8 +1375,8 @@ flow_find_cross_jump (basic_block bb1, b
  last1 = i1, last2 = i2;
  afterlast_dir = last_dir;
  last_dir = dir;
- p1 = PATTERN (i1);
- if (!(GET_CODE (p1) == USE || GET_CODE (p1) == CLOBBER))
+ if (GET_CODE (PATTERN (i1)) != USE
+ && GET_CODE (PATTERN (i1)) != CLOBBER)
ninsns++;
}
 
@@ -1494,7 +1494,9 @@ flow_find_head_matching_sequence (basic_
 
  beforelast1 = last1, beforelast2 = last2;
  last1 = i1, last2 = i2;
- ninsns++;
+ if (GET_CODE (PATTERN (i1)) != USE
+ && GET_CODE (PATTERN (i1)) != CLOBBER)
+   ninsns++;
}
 
   if (i1 == BB_END (bb1) || i2 == BB_END (bb2)
@@ -2410,7 +2412,9 @@ try_head_merge_bb (basic_block bb)
return false;
   do
e0_last_head = prev_real_insn (e0_last_head);
-  while (DEBUG_INSN_P (e0_last_head));
+  while (DEBUG_INSN_P (e0_last_head)
+|| GET_CODE (PATTERN (e0_last_head)) == USE
+|| GET_CODE (PATTERN (e0_last_head)) == CLOBBER);
 }
 
   if (max_match == 0)
@@ -2430,7 +2434,9 @@ try_head_merge_bb (basic_block bb)
   basic_block merge_bb = EDGE_SUCC (bb, ix)->dest;
   rtx head = BB_HEAD (merge_bb);
 
-  while (!NONDEBUG_INSN_P (head))
+  while (!NONDEBUG_INSN_P (head)
+|| GET_CODE (PATTERN (head)) == USE
+|| GET_CODE (PATTERN (head)) == CLOBBER)
head = NEXT_INSN (head);
   headptr[ix] = head;
   currptr[ix] = head;
@@ -2439,7 +2445,9 @@ try_head_merge_bb (basic_block bb)
   for (j = 1; j < max_match; j++)
do
  head = NEXT_INSN (head);
-   while (!NONDEBUG_INSN_P (head));
+   while (!NONDEBUG_INSN_P (head)
+  || GET_CODE (PATTERN (head)) == USE
+  || GET_CODE (PATTERN (head)) == CLOBBER);
   simulate_backwards_to_point (merge_bb, live, head);
   IOR_REG_SET (live_union, live);
 }
--- gcc/ifcvt.c.jj  2013-12-11 10:11:04.0 +0100
+++ gcc/ifcvt.c 2013-12-18 11:37:31.924056330 +0100
@@ -118,7 +118,11 @@ count_bb_insns (const_basic_block bb)
 
   while (1)
 {
-  if (CALL_P (insn) || NONJUMP_INSN_P (insn))
+  if ((CALL_P (insn) || NONJUMP_INSN_P (insn))
+ /* Don't count USE/CLOBBER insns, flow_find_cross_jump etc.
+don't count them either and we need consistency.  */
+ && GET_CODE (PATTERN (insn)) != USE
+ && GET_CODE (PATTERN (insn)) != CLOBBER)
count++;
 
   if (insn == BB_END (bb))
--- gcc/testsuite/gcc.dg/pr58668.c.jj   2013-12-18 11:43:05.729311888 +0100
+++ gcc/testsuite/gcc.dg/pr58668.c  2013-12-18 11:42:54.0 +0100
@@ -0,0 +1,25 @@
+/* PR rtl-optimization/58668 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-mthumb" { target { { arm*-*-* } && arm_thumb2_ok } } } */
+
+void *fn1 (void *);
+void *fn2 (void *, const char *);
+void fn3 (void *);
+void fn4 (void *, int);
+
+void *
+test (void *x)
+{
+  void *a, *b;
+  if (!(a = fn1 (x)))
+return (void *) 0;
+  if (!(b = fn2 (a, "w")))
+{
+  fn3 (a);
+  

Re: [PATCH] Avoid uninitialized warning in i386.c

2013-12-18 Thread Marek Polacek
On Wed, Dec 18, 2013 at 02:45:12PM +0100, Jakub Jelinek wrote:
> On Wed, Dec 18, 2013 at 02:39:44PM +0100, Marek Polacek wrote:
> > Bootstrap with -fsanitize=undefined revealed that the alg variable
> > may be used uninitialized.  Or at least gcc thinks so.  This patch
> > initializes it to 0.
> > 
> > Regtested/bootstrapped on x86_64-linux, ok for trunk?
> 
> Do
>   stringop_alg alg = last_alg;
> instead, or alternatively (perhaps better), remove the alg variable
> altogether, just break; in the loop and set
>   input_ranges[n].alg = (stringop_alg) i;

Okay.  Regtested/bootstrapped on x86_64-linux, ok now?

2013-12-18  Marek Polacek  

* config/i386/i386.c (ix86_parse_stringop_strategy_string): Remove
variable alg.  Use index variable i directly.

--- gcc/config/i386/i386.c.mp   2013-12-18 13:38:21.908138307 +0100
+++ gcc/config/i386/i386.c  2013-12-18 16:24:37.034353633 +0100
@@ -2856,7 +2856,6 @@ ix86_parse_stringop_strategy_string (cha
   do
 {
   int maxs;
-  stringop_alg alg;
   char alg_name[128];
   char align[16];
   next_range_str = strchr (curr_range_str, ',');
@@ -2879,13 +2878,8 @@ ix86_parse_stringop_strategy_string (cha
 }
 
   for (i = 0; i < last_alg; i++)
-{
-  if (!strcmp (alg_name, stringop_alg_names[i]))
-{
-  alg = (stringop_alg) i;
-  break;
-}
-}
+   if (!strcmp (alg_name, stringop_alg_names[i]))
+ break;
 
   if (i == last_alg)
 {
@@ -2896,7 +2890,7 @@ ix86_parse_stringop_strategy_string (cha
 }
 
   input_ranges[n].max = maxs;
-  input_ranges[n].alg = alg;
+  input_ranges[n].alg = (stringop_alg) i;
   if (!strcmp (align, "align"))
 input_ranges[n].noalign = false;
   else if (!strcmp (align, "noalign"))

Marek
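
The same pattern in isolation, as a small sketch: the loop index itself carries
the result, and the "not found" case is i == last_alg, so no separate variable
can be left uninitialized. The table contents and the name find_alg are
illustrative only and are not the real stringop_alg_names table.

```c
#include <string.h>

/* Illustrative name table; last_alg is one past the final entry.  */
static const char *const alg_names[] = { "libcall", "loop", "unrolled_loop" };
enum { last_alg = 3 };

/* Return the index of NAME in alg_names, or last_alg if it is absent.  */
static int
find_alg (const char *name)
{
  int i;

  for (i = 0; i < last_alg; i++)
    if (!strcmp (name, alg_names[i]))
      break;

  return i;
}
```

A caller checks the result against last_alg before using it, just as the
patched code tests i == last_alg before storing (stringop_alg) i.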


Re: [Patch, AArch64] [3/6] Implement support for Crypto -- AES.

2013-12-18 Thread Tejas Belagod

Marcus Shawcroft wrote:

On 6 December 2013 17:36, Tejas Belagod  wrote:


* gcc.target/aarch64/aes.c: New.


Add _1 on the test case file name (see http://gcc.gnu.org/wiki/TestCaseWriting)



diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index dc56170..9f35e09 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -15793,6 +15793,42 @@ vaddvq_f64 (float64x2_t __a)
   return vgetq_lane_f64 (__t, __LANE0 (2));
 }

+#ifdef __ARM_FEATURE_CRYPTO
+
+/* vaes  */
+
+static __inline uint8x16_t
+vaeseq_u8 (uint8x16_t data, uint8x16_t key)
+{
+  return
+(uint8x16_t) __builtin_aarch64_crypto_aesev16qi ((int8x16_t) data,
+(int8x16_t) key);


James G fixed the infrastructure to allow properly typed builtins, see:

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02005.html
and
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02880.html



@@ -959,3 +966,7 @@
(UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")])

 (define_int_attr frecp_suffix  [(UNSPEC_FRECPE "e") (UNSPEC_FRECPX "x")])
+
+(define_int_attr aes_op [(UNSPEC_AESE "e") (UNSPEC_AESD "d")])
+(define_int_attr aesmc_op [(UNSPEC_AESMC "mc") (UNSPEC_AESIMC "imc")])
+


Superfluous trailing blank line.


diff --git a/gcc/testsuite/gcc.target/aarch64/aes.c b/gcc/testsuite/gcc.target/aarch64/aes.c
new file mode 100644
index 000..82665fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/aes.c
@@ -0,0 +1,40 @@
+
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a+crypto" } */
+
+#include "arm_neon.h"
+
+uint8x16_t
+test_vaeseq_u8 (uint8x16_t data, uint8x16_t key)
+{
+  return vaeseq_u8 (data, key);
+}
+
+/* { dg-final { scan-assembler "aese\\tv\[0-9\]+\.16b, v\[0-9\]+\.16b" } }


Use scan-assembler-times 1 instead please.


Thanks for the review. Here is an improved patch.

Tested on aarch64-none-elf. OK for trunk?

Thanks
Tejas.

2013-12-18  Tejas Belagod  

gcc/
* config/aarch64/aarch64-simd-builtins.def: Update builtins table.
* config/aarch64/aarch64-builtins.c (aarch64_types_binopu_qualifiers,
TYPES_BINOPU): New.
* config/aarch64/aarch64-simd.md (aarch64_crypto_aesv16qi,
aarch64_crypto_aesv16qi): New.
* config/aarch64/arm_neon.h (vaeseq_u8, vaesdq_u8, vaesmcq_u8,
vaesimcq_u8): New.
* config/aarch64/iterators.md (UNSPEC_AESE, UNSPEC_AESD, UNSPEC_AESMC,
UNSPEC_AESIMC): New.
(CRYPTO_AES, CRYPTO_AESMC): New int iterators.
(aes_op, aesmc_op): New int attributes.

testsuite/
* gcc.target/aarch64/aes_1.c: New.


diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 1bc3cc5..00a33ce 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -142,6 +142,10 @@ static enum aarch64_type_qualifiers
 aarch64_types_unop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none };
 #define TYPES_UNOP (aarch64_types_unop_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_unopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned };
+#define TYPES_UNOPU (aarch64_types_unopu_qualifiers)
 #define TYPES_CREATE (aarch64_types_unop_qualifiers)
 #define TYPES_REINTERP (aarch64_types_unop_qualifiers)
 static enum aarch64_type_qualifiers
@@ -149,6 +153,10 @@ aarch64_types_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_maybe_immediate };
 #define TYPES_BINOP (aarch64_types_binop_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned };
+#define TYPES_BINOPU (aarch64_types_binopu_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
 #define TYPES_TERNOP (aarch64_types_ternop_qualifiers)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 1dc3c1f..6b72e8f 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -367,3 +367,8 @@
   BUILTIN_VSDQ_I_DI (BSL_U, simd_bsl, 0)
   BUILTIN_VALLDIF (BSL_S, simd_bsl, 0)
 
+  /* Implemented by aarch64_crypto_aes.  */
+  VAR1 (BINOPU, crypto_aese, 0, v16qi)
+  VAR1 (BINOPU, crypto_aesd, 0, v16qi)
+  VAR1 (UNOPU, crypto_aesmc, 0, v16qi)
+  VAR1 (UNOPU, crypto_aesimc, 0, v16qi)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 158b3dc..f8c204f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4074,3 +4074,25 @@
   (gen_aarch64_get_lane (operands[0], operands[1], operands[2]));
 DONE;
 })
+
+;; aes
+
+(define_insn "aarch64_crypto_aesv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "=w")
+(unspec:V

Re: [Patch, AArch64] [4/6] Implement support for Crypto -- SHA1.

2013-12-18 Thread Tejas Belagod

Marcus Shawcroft wrote:

Same comments as previous patch:

On 6 December 2013 17:36, Tejas Belagod  wrote:


testsuite/
* gcc.target/aarch64/sha1.c: New.


Add _1 on the test case file name (see http://gcc.gnu.org/wiki/TestCaseWriting)


+static __inline uint32x4_t
+vsha1cq_u32 (uint32x4_t hash_abcd, uint32_t hash_e, uint32x4_t wk)
+{
+  return
+(uint32x4_t) __builtin_aarch64_crypto_sha1cv4si ((int32x4_t) hash_abcd,
+(int32_t) hash_e,
+(int32x4_t) wk);
+}


James G fixed the infrastructure to allow properly typed builtins, see:

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02005.html
and
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02880.html


+/* { dg-final { scan-assembler "sha1c\\tq" } } */


Use scan-assembler-times 1


Thanks for the review. Here is an improved patch.

Tested on aarch64-none-elf. OK for trunk?

Thanks
Tejas.

2013-12-18  Tejas Belagod  
gcc/
* config/aarch64/aarch64-simd-builtins.def: Update builtins table.
* config/aarch64/aarch64-builtins.c (aarch64_types_ternopu_qualifiers,
TYPES_TERNOPU): New.
* config/aarch64/aarch64-simd.md (aarch64_crypto_sha1hsi,
aarch64_crypto_sha1su1v4si, aarch64_crypto_sha1v4si,
aarch64_crypto_sha1su0v4si): New.
* config/aarch64/arm_neon.h (vsha1cq_u32, vsha1mq_u32, vsha1pq_u32,
vsha1h_u32, vsha1su0q_u32, vsha1su1q_u32): New.
* config/aarch64/iterators.md (UNSPEC_SHA1, UNSPEC_SHA1SU<01>):
New.
(CRYPTO_SHA1): New int iterator.
(sha1_op): New int attribute.

testsuite/
* gcc.target/aarch64/sha1_1.c: New.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 00a33ce..ea933d6 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -161,6 +161,12 @@ aarch64_types_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
 #define TYPES_TERNOP (aarch64_types_ternop_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_ternopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned,
+  qualifier_unsigned, qualifier_unsigned };
+#define TYPES_TERNOPU (aarch64_types_ternopu_qualifiers)
+
+static enum aarch64_type_qualifiers
 aarch64_types_quadop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none,
   qualifier_none, qualifier_none };
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 6b72e8f..2d3ccb0 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -372,3 +372,11 @@
   VAR1 (BINOPU, crypto_aesd, 0, v16qi)
   VAR1 (UNOPU, crypto_aesmc, 0, v16qi)
   VAR1 (UNOPU, crypto_aesimc, 0, v16qi)
+
+  /* Implemented by aarch64_crypto_sha1.  */
+  VAR1 (UNOPU, crypto_sha1h, 0, si)
+  VAR1 (BINOPU, crypto_sha1su1, 0, v4si)
+  VAR1 (TERNOPU, crypto_sha1c, 0, v4si)
+  VAR1 (TERNOPU, crypto_sha1m, 0, v4si)
+  VAR1 (TERNOPU, crypto_sha1p, 0, v4si)
+  VAR1 (TERNOPU, crypto_sha1su0, 0, v4si)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f8c204f..5b454ca 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4096,3 +4096,46 @@
   [(set_attr "type" "crypto_aes")]
 )
 
+;; sha1
+
+(define_insn "aarch64_crypto_sha1hsi"
+  [(set (match_operand:SI 0 "register_operand" "=w")
+(unspec:SI [(match_operand:SI 1
+   "register_operand" "w")]
+ UNSPEC_SHA1H))]
+  "TARGET_SIMD && TARGET_CRYPTO"
+  "sha1h\\t%s0, %s1"
+  [(set_attr "type" "crypto_sha1_fast")]
+)
+
+(define_insn "aarch64_crypto_sha1su1v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+  (match_operand:V4SI 2 "register_operand" "w")]
+ UNSPEC_SHA1SU1))]
+  "TARGET_SIMD && TARGET_CRYPTO"
+  "sha1su1\\t%0.4s, %2.4s"
+  [(set_attr "type" "crypto_sha1_fast")]
+)
+
+(define_insn "aarch64_crypto_sha1v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+  (match_operand:SI 2 "register_operand" "w")
+  (match_operand:V4SI 3 "register_operand" "w")]
+ CRYPTO_SHA1))]
+  "TARGET_SIMD && TARGET_CRYPTO"
+  "sha1\\t%q0, %s2, %3.4s"
+  [(set_attr "type" "crypto_sha1_slow")]
+)
+
+(define_insn "aarch64_crypto_sha1su0v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+  (match_operand:V4SI 2 "register_operand" "w")
+  (match_operand:V4SI 3 "register_operand" "w")]
+ UNSPEC_SHA1SU0))]
+  "TARGET_SIMD && TARGET_CRYPTO"
+  "sha1su0

Re: [Patch, AArch64] [6/6] Implement support for Crypto -- PMULL.64.

2013-12-18 Thread Tejas Belagod

Tejas Belagod wrote:

Hi,

This patch implements support for crypto pmull.64.

Tested on aarch64-none-elf. OK for trunk?

Thanks,
Tejas.

2013-12-06  Tejas Belagod  

gcc/
* config/aarch64/aarch64-builtins.c: Define builtin types for poly64_t
poly128_t.
* aarch64/aarch64-simd-builtins.def: Update builtins table.
* config/aarch64/aarch64-simd.md (aarch64_crypto_pmulldi,
aarch64_crypto_pmullv2di): New.
* config/aarch64/aarch64.c (aarch64_simd_mangle_map): Update table for
poly64x2_t mangler.
* config/aarch64/arm_neon.h (poly64x2_t, poly64_t, poly128_t): Define.
(vmull_p64, vmull_high_p64): New.
* config/aarch64/iterators.md (UNSPEC_PMULL<2>): New.

testsuite/

* gcc.target/aarch64/pmull.c: New.


Here is an improved patch.

Tested on aarch64-none-elf. OK for trunk?

Thanks
Tejas.
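
As background for the intrinsics, here is a portable C model of what a 64x64
polynomial multiply computes: a carry-less product over GF(2), giving a 128-bit
result. This is a reference sketch only, with hypothetical names poly128_model
and clmul64; it does not use or describe the builtins added by the patch.

```c
#include <stdint.h>

/* 128-bit result held as two 64-bit halves.  */
struct poly128_model
{
  uint64_t lo;
  uint64_t hi;
};

/* Carry-less multiply: for every set bit i of B, XOR in A shifted left
   by i.  XOR replaces addition, so no carries propagate.  */
static struct poly128_model
clmul64 (uint64_t a, uint64_t b)
{
  struct poly128_model r = { 0, 0 };

  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
	r.lo ^= a << i;
	if (i != 0)
	  r.hi ^= a >> (64 - i);	/* bits shifted out of the low half */
      }

  return r;
}
```

For example, clmul64 (3, 3) models (x + 1) * (x + 1) = x^2 + 1 and yields 5 in
the low half, where an ordinary integer multiply would give 9.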

2013-12-18  Tejas Belagod  

gcc/
* config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins):
Define builtin types for poly64_t poly128_t.
(TYPES_BINOPP, aarch64_types_binopp_qualifiers): New.
* aarch64/aarch64-simd-builtins.def: Update builtins table.
* config/aarch64/aarch64-simd.md (aarch64_crypto_pmulldi,
aarch64_crypto_pmullv2di): New.
* config/aarch64/aarch64.c (aarch64_simd_mangle_map): Update table for
poly64x2_t mangler.
* config/aarch64/arm_neon.h (poly64x2_t, poly64_t, poly128_t): Define.
(vmull_p64, vmull_high_p64): New.
* config/aarch64/iterators.md (UNSPEC_PMULL<2>): New.

testsuite/

* gcc.target/aarch64/pmull_1.c: New.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index ea933d6..439c3f4 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -157,6 +157,11 @@ aarch64_types_binopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned };
 #define TYPES_BINOPU (aarch64_types_binopu_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_poly, qualifier_poly, qualifier_poly };
+#define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
+
+static enum aarch64_type_qualifiers
 aarch64_types_ternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
 #define TYPES_TERNOP (aarch64_types_ternop_qualifiers)
@@ -574,6 +579,8 @@ aarch64_init_simd_builtins (void)
   /* Poly scalar type nodes.  */
   tree aarch64_simd_polyQI_type_node = aarch64_build_poly_type (QImode);
   tree aarch64_simd_polyHI_type_node = aarch64_build_poly_type (HImode);
+  tree aarch64_simd_polyDI_type_node = aarch64_build_poly_type (DImode);
+  tree aarch64_simd_polyTI_type_node = aarch64_build_poly_type (TImode);
 
   /* Float type nodes.  */
   tree aarch64_simd_float_type_node = aarch64_build_signed_type (SFmode);
@@ -598,6 +605,10 @@ aarch64_init_simd_builtins (void)
 "__builtin_aarch64_simd_poly8");
   (*lang_hooks.types.register_builtin_type) (aarch64_simd_polyHI_type_node,
 "__builtin_aarch64_simd_poly16");
+  (*lang_hooks.types.register_builtin_type) (aarch64_simd_polyDI_type_node,
+"__builtin_aarch64_simd_poly64");
+  (*lang_hooks.types.register_builtin_type) (aarch64_simd_polyTI_type_node,
+"__builtin_aarch64_simd_poly128");
   (*lang_hooks.types.register_builtin_type) (aarch64_simd_intTI_type_node,
 "__builtin_aarch64_simd_ti");
   (*lang_hooks.types.register_builtin_type) (aarch64_simd_intEI_type_node,
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index adda948..159d98d 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -386,3 +386,7 @@
   VAR1 (TERNOPU, crypto_sha256h2, 0, v4si)
   VAR1 (BINOPU, crypto_sha256su0, 0, v4si)
   VAR1 (TERNOPU, crypto_sha256su1, 0, v4si)
+
+  /* Implemented by aarch64_crypto_pmull.  */
+  VAR1 (BINOPP, crypto_pmull, 0, di)
+  VAR1 (BINOPP, crypto_pmull, 0, v2di)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 874d532..5345759 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4173,3 +4173,25 @@
   "sha256su1\\t%0.4s, %2.4s, %3.4s"
   [(set_attr "type" "crypto_sha256_slow")]
 )
+
+;; pmull
+
+(define_insn "aarch64_crypto_pmulldi"
+  [(set (match_operand:TI 0 "register_operand" "=w")
+(unspec:TI  [(match_operand:DI 1 "register_operand" "w")
+(match_operand:DI 2 "register_operand" "w")]
+   UNSPEC_PMULL))]
+ "TARGET_SIMD && TARGET_CRYPTO"
+ "pmull\\t%0.1q, %1.1d, %2.1d"
+  [(set_attr "type" "neon_mul_d_long")]
+)
+
+(define

Re: [Patch, AArch64] [5/6] Implement support for Crypto -- SHA256.

2013-12-18 Thread Tejas Belagod

Marcus Shawcroft wrote:

On 6 December 2013 17:36, Tejas Belagod  wrote:

Hi,

The attached patch implements support for crypto sha256.


Same comments as previous crypto patch.


Thanks for the review. Here is an improved patch.

Tested on aarch64-none-elf. OK for trunk?

Thanks
Tejas.

2013-12-18  Tejas Belagod  

gcc/
* config/aarch64/aarch64-simd-builtins.def: Update builtins table.
* config/aarch64/aarch64-simd.md (aarch64_crypto_sha256hv4si,
aarch64_crypto_sha256su0v4si, aarch64_crypto_sha256su1v4si): New.
* config/aarch64/arm_neon.h (vsha256hq_u32, vsha256h2q_u32,
vsha256su0q_u32, vsha256su1q_u32): New.
* config/aarch64/iterators.md (UNSPEC_SHA256H<2>, UNSPEC_SHA256SU<01>):
New.
(CRYPTO_SHA256): New int iterator.
(sha256_op): New int attribute.

testsuite/
* gcc.target/aarch64/sha256_1.c: New.

diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 2d3ccb0..adda948 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -380,3 +380,9 @@
   VAR1 (TERNOPU, crypto_sha1m, 0, v4si)
   VAR1 (TERNOPU, crypto_sha1p, 0, v4si)
   VAR1 (TERNOPU, crypto_sha1su0, 0, v4si)
+
+  /* Implemented by aarch64_crypto_sha256.  */
+  VAR1 (TERNOPU, crypto_sha256h, 0, v4si)
+  VAR1 (TERNOPU, crypto_sha256h2, 0, v4si)
+  VAR1 (BINOPU, crypto_sha256su0, 0, v4si)
+  VAR1 (TERNOPU, crypto_sha256su1, 0, v4si)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 5b454ca..874d532 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4139,3 +4139,37 @@
   "sha1su0\\t%0.4s, %2.4s, %3.4s"
   [(set_attr "type" "crypto_sha1_xor")]
 )
+
+;; sha256
+
+(define_insn "aarch64_crypto_sha256hv4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+  (match_operand:V4SI 2 "register_operand" "w")
+  (match_operand:V4SI 3 "register_operand" "w")]
+ CRYPTO_SHA256))]
+  "TARGET_SIMD && TARGET_CRYPTO"
+  "sha256h\\t%q0, %q2, %3.4s"
+  [(set_attr "type" "crypto_sha256_slow")]
+)
+
+(define_insn "aarch64_crypto_sha256su0v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+  (match_operand:V4SI 2 "register_operand" "w")]
+ UNSPEC_SHA256SU0))]
+  "TARGET_SIMD && TARGET_CRYPTO"
+  "sha256su0\\t%0.4s, %2.4s"
+  [(set_attr "type" "crypto_sha256_fast")]
+)
+
+(define_insn "aarch64_crypto_sha256su1v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=w")
+(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+  (match_operand:V4SI 2 "register_operand" "w")
+  (match_operand:V4SI 3 "register_operand" "w")]
+ UNSPEC_SHA256SU1))]
+  "TARGET_SIMD && TARGET_CRYPTO"
+  "sha256su1\\t%0.4s, %2.4s, %3.4s"
+  [(set_attr "type" "crypto_sha256_slow")]
+)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 5a5691d..709c6a1 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -22990,6 +22990,30 @@ vsha1su1q_u32 (uint32x4_t tw0_3, uint32x4_t w12_15)
   return __builtin_aarch64_crypto_sha1su1v4si_uuu (tw0_3, w12_15);
 }
 
+static __inline uint32x4_t
+vsha256hq_u32 (uint32x4_t hash_abcd, uint32x4_t hash_efgh, uint32x4_t wk)
+{
+  return __builtin_aarch64_crypto_sha256hv4si_ (hash_abcd, hash_efgh, wk);
+}
+
+static __inline uint32x4_t
+vsha256h2q_u32 (uint32x4_t hash_efgh, uint32x4_t hash_abcd, uint32x4_t wk)
+{
+  return __builtin_aarch64_crypto_sha256h2v4si_ (hash_efgh, hash_abcd, wk);
+}
+
+static __inline uint32x4_t
+vsha256su0q_u32 (uint32x4_t w0_3, uint32x4_t w4_7)
+{
+  return __builtin_aarch64_crypto_sha256su0v4si_uuu (w0_3, w4_7);
+}
+
+static __inline uint32x4_t
+vsha256su1q_u32 (uint32x4_t tw0_3, uint32x4_t w8_11, uint32x4_t w12_15)
+{
+  return __builtin_aarch64_crypto_sha256su1v4si_ (tw0_3, w8_11, w12_15);
+}
+
 #endif
 
 /* vshl */
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 12de4ac..88e 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -277,6 +277,10 @@
 UNSPEC_SHA1H; Used in aarch64-simd.md.
 UNSPEC_SHA1SU0  ; Used in aarch64-simd.md.
 UNSPEC_SHA1SU1  ; Used in aarch64-simd.md.
+UNSPEC_SHA256H  ; Used in aarch64-simd.md.
+UNSPEC_SHA256H2 ; Used in aarch64-simd.md.
+UNSPEC_SHA256SU0; Used in aarch64-simd.md.
+UNSPEC_SHA256SU1; Used in aarch64-simd.md.
 ])
 
 ;; ---
@@ -863,6 +867,8 @@
 
 (define_int_iterator CRYPTO_SHA1 [UNSPEC_SHA1C UNSPEC_SHA1M UNSPEC_SHA1P])
 
+(define_int_iterator CRYPTO_SHA256 [UNSPEC_SHA256H UNSPEC_SHA256H2])
+
 ;

Re: [PATCH][ARM] Implement CRC32 intrinsics for AArch32 in ARMv8-A

2013-12-18 Thread Ramana Radhakrishnan
On Tue, Dec 3, 2013 at 1:46 PM, Kyrill Tkachov  wrote:
> Ping?
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02351.html
>
> Thanks,
> Kyrill

Ok if no objections in 24 hours.

Ramana

>
>
> On 26/11/13 09:44, Kyrill Tkachov wrote:
>>
>> Ping?
>>
>> Thanks,
>> Kyrill
>>
>> On 19/11/13 17:04, Kyrill Tkachov wrote:
>>>
>>> On 19/11/13 16:26, Joseph S. Myers wrote:

>>>> In any target header installed for user use, such as arm_acle.h, you need
>>>> to be namespace-clean.  In this case, that means you need to use
>>>> implementation-namespace identifiers such as __a, __b and __d in case the
>>>> user has defined macros with names such as a, b and d (unless the ACLE
>>>> says that identifiers a, b and d are in the implementation's namespace
>>>> when this header is included, which would be a very odd thing for it to
>>>> do).

>>> Hi Joseph,
>>>
>>> Thanks for the catch. ACLE doesn't expect a,b,d to be in the
>>> implementation
>>> namespace. I've added underscores before them.
>>>
>>> Made sure tests pass.
>>>
>>> Revised patch attached.
>>> How's this?
>>>
>>> Kyrill
>>>
>>> gcc/
>>> 2013-11-19  Kyrylo Tkachov  
>>>
>>>* Makefile.in (TEXI_GCC_FILES): Add arm-acle-intrinsics.texi.
>>>* config.gcc (extra_headers): Add arm_acle.h.
>>>* config/arm/arm.c (FL_CRC32): Define.
>>>(arm_have_crc): Likewise.
>>>(arm_option_override): Set arm_have_crc.
>>>(arm_builtins): Add CRC32 builtins.
>>>(bdesc_2arg): Likewise.
>>>(arm_init_crc32_builtins): New function.
>>>(arm_init_builtins): Initialise CRC32 builtins.
>>>(arm_file_start): Handle architecture extensions.
>>>* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Define
>>> __ARM_FEATURE_CRC32.
>>>Define __ARM_32BIT_STATE.
>>>(TARGET_CRC32): Define.
>>>* config/arm/arm-arches.def: Add armv8-a+crc.
>>>* config/arm/arm-tables.opt: Regenerate.
>>>* config/arm/arm.md (type): Add crc.
>>>(): New insn.
>>>* config/arm/arm_acle.h: New file.
>>>* config/arm/iterators.md (CRC): New int iterator.
>>>(crc_variant, crc_mode): New int attributes.
>>>* config/arm/unspecs.md (UNSPEC_CRC32B, UNSPEC_CRC32H,
>>> UNSPEC_CRC32W,
>>>UNSPEC_CRC32CB, UNSPEC_CRC32CH, UNSPEC_CRC32CW): New unspecs.
>>>* doc/invoke.texi: Document -march=armv8-a+crc option.
>>>* doc/extend.texi: Document ACLE intrinsics.
>>>* doc/arm-acle-intrinsics.texi: New.
>>>
>>>
>>> gcc/testsuite
>>> 2013-11-19  Kyrylo Tkachov  
>>>
>>>* lib/target-supports.exp (add_options_for_arm_crc): New
>>> procedure.
>>>(check_effective_target_arm_crc_ok_nocache): Likewise.
>>>(check_effective_target_arm_crc_ok): Likewise.
>>>* gcc.target/arm/acle/: New directory.
>>>* gcc.target/arm/acle/acle.exp: New.
>>>* gcc.target/arm/acle/crc32b.c: New test.
>>>* gcc.target/arm/acle/crc32h.c: Likewise.
>>>* gcc.target/arm/acle/crc32w.c: Likewise.
>>>* gcc.target/arm/acle/crc32d.c: Likewise.
>>>* gcc.target/arm/acle/crc32cb.c: Likewise.
>>>* gcc.target/arm/acle/crc32ch.c: Likewise.
>>>* gcc.target/arm/acle/crc32cw.c: Likewise.
>>>* gcc.target/arm/acle/crc32cd.c: Likewise.
>
>
>
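
The namespace-cleanliness point from Joseph's quoted review can be illustrated with a small sketch. The function name my_crc_step and its body are hypothetical and are not taken from arm_acle.h; only the __a/__b naming convention comes from the thread.

```c
#include <assert.h>

/* A user translation unit may legitimately define macros with short
   names before it includes a target header.  */
#define a 1000
#define b 2000

/* Hypothetical header-style inline function (NOT the real arm_acle.h
   code).  Identifiers beginning with a double underscore are reserved
   for the implementation, so a conforming user program cannot define
   macros named __a or __b.  Had the parameters been spelled 'a' and
   'b', the preprocessor would have rewritten the parameter list using
   the macros above and the header would no longer compile.  */
static inline unsigned int
my_crc_step (unsigned int __a, unsigned char __b)
{
  return (__a << 1) ^ __b;
}
```

The same reasoning applies to local variables inside such inline functions, not only to parameters.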


RE: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-18 Thread Gopalasubramanian, Ganesh

Ping!

"Gopalasubramanian, Ganesh"  wrote:


> Yes, I figured that was the original idea behind it, but the final family of 
> the jaguar processors seems to have become 16h instead of 14h (bobcat) at 
> some point.

Yes. It is amdfam16h. I was supposed to pass on some comments on the patch.
1. Amdfam16h for Jaguar.
2. For Jaguar, the priority needs to be AVX (AVX got included into the Jaguar 
ISA).

I have a doubt! What would be done if priority is set to "F_FMA4" instead of 
"F_XOP" for bdver1?

Regards
Ganesh




Re: [PATCH] Avoid uninitialized warning in i386.c

2013-12-18 Thread Jakub Jelinek
On Wed, Dec 18, 2013 at 04:27:23PM +0100, Marek Polacek wrote:
> Okay.  Regtested/bootstrapped on x86_64-linux, ok now?
> 
> 2013-12-18  Marek Polacek  
> 
>   * config/i386/i386.c (ix86_parse_stringop_strategy_string): Remove
>   variable alg.  Use index variable i directly.

Yes, thanks.

Jakub


Re: [PATCH i386 7/8] [AVX-512] Add tests.

2013-12-18 Thread Uros Bizjak
On Wed, Dec 18, 2013 at 2:23 PM, Kirill Yukhin  wrote:
> Hello,
> On 02 Dec 16:18, Kirill Yukhin wrote:
>> Hello,
>> > Is it ok now?
>> Ping?
> Ping.
>
> Rebased patch attached.

Whoa.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md

No, not in this patch.

+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512cd -DHAVE_512 -DAVX512CD" } */
+/* { dg-require-effective-target avx512cd } */
+
+#include "avx512f-helper.h"

Please put the definitions in the source itself, not in command line.

+  if ((ecx & bit_OSXSAVE) == (bit_OSXSAVE))
+{
+  if (__get_cpuid_max (0, NULL) < 7)
+return 0;
+
+  __cpuid_count (7, 0, eax, ebx, ecx, edx);
+
+  if ((avx512f_os_support ()) && ((ebx & (bit_AVX512ER)) ==
(bit_AVX512ER)))

No need for parentheses around bit_* constants. They already have the
needed parentheses in cpuid.h.

+++ b/gcc/testsuite/gcc.target/i386/avx512f-vgetmantps-2.c
@@ -0,0 +1,110 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512f -DAVX512F" } */
+/* { dg-require-effective-target avx512f } */
+
+#include "avx512f-helper.h"
+
+#define SIZE (AVX512F_LEN / 32)
+#include "avx512f-mask-type.h"
+#include 
+
+#ifndef GET_NORM_MANT
+#define GET_NORM_MANT
+float
+get_norm_mant (float source, int signctrl, int interv)
+{
+  int dest, src, sign, exp, fraction;
+  src = *(int *) &source;

Strict aliasing violation.

+  dest = (sign << 31) | (exp << 23) | fraction;
+  return *(float *) &dest;

also here. I wonder, why you didn't get:

warning: dereferencing type-punned pointer will break strict-aliasing
rules [-Wstrict-aliasing]
   src = *(int *) &source;
warning: dereferencing type-punned pointer will break strict-aliasing
rules [-Wstrict-aliasing]
   return *(float *) &dest;

Can you please review testcases for aliasing violations (you can check
the new tests with -Wstrict-aliasing warning enabled)

The patch is otherwise OK for mainline, on the (obvious) condition
that other patches get approved and committed first.

Thanks,
Uros.
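
One common way to avoid the type-punned pointer dereferences flagged above is to copy the object representation with memcpy. The following is only a sketch of that idiom, not the fix that was applied to the testcases; the helper names are hypothetical, and it assumes float is a 32-bit IEEE 754 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into a 32-bit integer.
   memcpy between distinct objects does not violate the aliasing rules,
   and compilers typically lower it to a plain register move.  */
static uint32_t
float_to_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* The inverse operation: reinterpret a 32-bit pattern as a float.  */
static float
bits_to_float (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

A union with int and float members is another option that GCC documents as supported, but the memcpy form does not rely on a compiler extension.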


Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-18 Thread Uros Bizjak
On Wed, Dec 18, 2013 at 4:38 PM, Gopalasubramanian, Ganesh
 wrote:
>
> Ping!
>
> "Gopalasubramanian, Ganesh"  wrote:
>
>
>> Yes, I figured that was the original idea behind it, but the final family of 
>> the jaguar processors seems to have become 16h instead of 14h (bobcat) at 
>> some point.
>
> Yes. It is amdfam16h. I was supposed to pass on some comments on the patch.
> 1. Amdfam16h for Jaguar.
> 2. For Jaguar, the priority needs to be AVX (AVX got included into the Jaguar 
> ISA).
>
> I have a doubt! What would be done if priority is set to "F_FMA4" instead of 
> "F_XOP" for bdver1?

XOP enables FMA4,  so it should be better to set priority to
P_PROC_XOP. From config/i386/i386-common.c:

#define OPTION_MASK_ISA_XOP_SET \
  (OPTION_MASK_ISA_XOP | OPTION_MASK_ISA_FMA4_SET)

Looking at processor_dispatch_table, bdver1 doesn't support F_FMA, so
the proposed patch fixes an error here.

Uros.
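
The "XOP enables FMA4" relationship can be sketched with plain bitmasks. All DEMO_* names and bit values below are hypothetical; the real OPTION_MASK_ISA_* values are generated and differ, and the real FMA4 set contains more than one bit.

```c
#include <assert.h>

/* Hypothetical bit assignments, for illustration only.  */
#define DEMO_MASK_FMA4 (1u << 0)
#define DEMO_MASK_XOP  (1u << 1)

/* Same shape as the macro quoted above: selecting XOP also selects
   everything in the FMA4 set (simplified here to a single bit).  */
#define DEMO_MASK_FMA4_SET DEMO_MASK_FMA4
#define DEMO_MASK_XOP_SET  (DEMO_MASK_XOP | DEMO_MASK_FMA4_SET)

/* Return nonzero if FLAGS contains every bit of FEATURE.  */
static int
demo_has_feature (unsigned int flags, unsigned int feature)
{
  return (flags & feature) == feature;
}
```

Because an XOP-capable target always satisfies the FMA4 check but not the other way round, ranking XOP above FMA4 picks the more specific version.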


RE: [PATCH] Vectorization for store with negative step

2013-12-18 Thread Bingfeng Mei
Hi, Jakub,
Sorry for all the formatting issues. I haven't submitted a patch for a while :-).
Please find the updated patch. 

Thanks,
Bingfeng

-Original Message-
From: Jakub Jelinek [mailto:ja...@redhat.com] 
Sent: 18 December 2013 13:38
To: Bingfeng Mei
Cc: Richard Biener; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Vectorization for store with negative step

On Wed, Dec 18, 2013 at 01:31:05PM +, Bingfeng Mei wrote:
Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 206016)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2013-12-18  Bingfeng Mei  
+
+   PR tree-optimization/59544
+* tree-vect-stmts.c (perm_mask_for_reverse): Move before

This should be a tab instead of 8 spaces.

+   vectorizable_store. (vectorizable_store): Handle negative step.

Newline and tab after "store.", rather than space.

Property changes on: gcc/testsuite/gcc.target/i386/pr59544.c
___
Added: svn:executable
   + *

Please don't add such bogus property.  Testcases aren't executable.

Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 206016)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2013-12-18  Bingfeng Mei  
+
+   PR tree-optimization/59544
+   * gcc.target/i386/pr59544.c: New test

Missing dot at the end of line.
+
 2013-12-16  Jakub Jelinek  
 
PR middle-end/58956
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 206016)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -4859,6 +4859,25 @@ ensure_base_align (stmt_vec_info stmt_in
 }
 
 
+/* Given a vector type VECTYPE returns the VECTOR_CST mask that implements
+   reversal of the vector elements.  If that is impossible to do,
+   returns NULL.  */
+
+static tree
+perm_mask_for_reverse (tree vectype)
+{
+  int i, nunits;
+  unsigned char *sel;
+
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  sel = XALLOCAVEC (unsigned char, nunits);
+
+  for (i = 0; i < nunits; ++i)
+sel[i] = nunits - 1 - i;
+
+  return vect_gen_perm_mask (vectype, sel);
+}
+
 /* Function vectorizable_store.
 
Check if STMT defines a non scalar data-ref (array/pointer/structure) that
@@ -4902,6 +4921,8 @@ vectorizable_store (gimple stmt, gimple_
   vec oprnds = vNULL;
   vec result_chain = vNULL;
   bool inv_p;
+  bool negative = false;
+  tree offset = NULL_TREE;
   vec vec_oprnds = vNULL;
   bool slp = (slp_node != NULL);
   unsigned int vec_num;
@@ -4976,16 +4997,38 @@ vectorizable_store (gimple stmt, gimple_
   if (!STMT_VINFO_DATA_REF (stmt_info))
 return false;
 
-  if (tree_int_cst_compare (loop && nested_in_vect_loop_p (loop, stmt)
-   ? STMT_VINFO_DR_STEP (stmt_info) : DR_STEP (dr),
-   size_zero_node) < 0)
+  negative = tree_int_cst_compare (loop && nested_in_vect_loop_p (loop, stmt)
+? STMT_VINFO_DR_STEP (stmt_info) : DR_STEP (dr),
+size_zero_node) < 0;

The formatting looks wrong, do:
  negative
    = tree_int_cst_compare (loop && nested_in_vect_loop_p (loop, stmt)
                            ? STMT_VINFO_DR_STEP (stmt_info) : DR_STEP (dr),
                            size_zero_node) < 0;
instead.

+  if (negative && ncopies > 1)
 {
   if (dump_enabled_p ())
 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "negative step for store.\n");
+ "multiple types with negative step.");
   return false;
 }
 
+  if (negative)
+{
+  gcc_assert (!grouped_store);
+  alignment_support_scheme = vect_supportable_dr_alignment (dr, false);
+  if (alignment_support_scheme != dr_aligned
+  && alignment_support_scheme != dr_unaligned_supported)

Lots of places where you use 8 spaces instead of tab, please fix.
+offset = size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1);
+
   if (store_lanes_p)
 aggr_type = build_array_type_nelts (elem_type, vec_num * nunits);
   else
@@ -5200,7 +5246,7 @@ vectorizable_store (gimple stmt, gimple_
dataref_ptr
  = vect_create_data_ref_ptr (first_stmt, aggr_type,
  simd_lane_access_p ? loop : NULL,
- NULL_TREE, &dummy, gsi, &ptr_incr,
+ offset, &dummy, gsi, &ptr_incr,
  simd_lane_access_p, &inv_p);
  gcc_assert (bb_vinfo || !inv_p);
}
@@ -5306,6 +5352,21 @@ vectorizable_store (gimple stmt, gimple_
set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
misalign);
 
+ if (negative)
+{
+  
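
The idea behind the patch above (reverse each vector with a permutation and start the store NUNITS - 1 elements below the scalar address) can be modelled in plain C. This is only a scalar sketch under stated assumptions: NUNITS, the function names and the blocked loop are hypothetical and do not correspond to vectorizer internals.

```c
#include <assert.h>

#define NUNITS 4  /* illustrative vector length, not tied to any target */

/* Build the selector for a full reversal, in the same way the patch
   fills its SEL array: result lane i takes input lane NUNITS - 1 - i.  */
static void
build_reverse_sel (unsigned char sel[NUNITS])
{
  for (int i = 0; i < NUNITS; ++i)
    sel[i] = NUNITS - 1 - i;
}

/* Scalar reference: a store whose address decreases every iteration.  */
static void
store_reverse_scalar (int *dst, const int *src, int n)
{
  for (int i = 0; i < n; ++i)
    dst[n - 1 - i] = src[i];
}

/* Blocked model of the vectorized loop; N must be a multiple of NUNITS.
   Each step loads NUNITS elements, permutes them with SEL, and stores
   the block starting NUNITS - 1 elements below the scalar address.  */
static void
store_reverse_blocked (int *dst, const int *src, int n)
{
  unsigned char sel[NUNITS];
  build_reverse_sel (sel);

  for (int i = 0; i < n; i += NUNITS)
    {
      int vec[NUNITS];
      for (int lane = 0; lane < NUNITS; ++lane)
        vec[lane] = src[i + sel[lane]];

      /* The scalar store for iteration i targets dst + (n - 1 - i);
         subtracting NUNITS - 1 gives the lowest address of the block.  */
      int *block = dst + (n - 1 - i) - (NUNITS - 1);
      for (int lane = 0; lane < NUNITS; ++lane)
        block[lane] = vec[lane];
    }
}
```

Both functions should leave identical contents in dst, which is the property the negative-step handling has to preserve.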

Re: [PATCH] Improve _mm*loadu* intrinsics handling (PR target/59539)

2013-12-18 Thread Uros Bizjak
On Wed, Dec 18, 2013 at 4:11 PM, Jakub Jelinek  wrote:
> Hi!
>
> As discussed in the PR, this patch similarly to the recent changes
> in movmisalign expansion for TARGET_AVX for unaligned loads from
> misaligned_operand just expands those as *mov_internal pattern,
> because that pattern emits vmovdqu/vmovup[sd] too, but doesn't contain
> UNSPECs and thus can be also merged into most other AVX insns that use
> the load target if those insns accept a memory operand.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2013-12-18  Jakub Jelinek  
>
> PR target/59539
> * config/i386/sse.md
> (_loadu,
> _loaddqu): New expanders,
> prefix existing define_insn names with *.
>
> * gcc.target/i386/pr59539-1.c: New test.
> * gcc.target/i386/pr59539-2.c: New test.

OK for mainline, with a FIXME comment to review !
condition once avx512f merge is finished.

Thanks,
Uros.


Re: [PATCH i386 6/8] [AVX-512] Add builtins/intrinsics.

2013-12-18 Thread Uros Bizjak
On Wed, Dec 18, 2013 at 2:21 PM, Kirill Yukhin  wrote:
> Hello,
>
> On 02 Dec 16:15, Kirill Yukhin wrote:
>> Hello
>> > Ok for trunk?
>> Ping?
> Ping.
>
> Rebased patch attached.

+  error ("third argument must be comparison constant.");

"the third ...", without dot at the end. Please review many other
instances of either missing "the" and/or unneeded dot.

+  error ("the immediate argument must be 4-bit immediate.");

"... a 4-bit immediate"

+  error ("last argument must be scale 1, 2, 4, 8");

"the last ..."

+  error ("forth argument must be scale 1, 2, 4, 8");

"the fourth ..."

(disclaimer: English is not my native language).

+/* Walk through insns sequence or pattern and erase rounding mentions.
+   Main transformation is performed in ix86_erase_embedded_rounding_1.  */
+static rtx
+ix86_erase_embedded_rounding (rtx pat)

All calls to this function are made with insn pattern, so we can
remove this function and use ix86_erase_embedded_rounding_1 directly
instead. The function to handle sequences can be re-introduced when
needed, probably in a later patch.

@@ -34092,6 +35818,16 @@ ix86_builtin_vectorized_function (tree
fndecl, tree type_out,

Changes to ix86_builtin_vectorized_function belong to "[PATCH i386
5/8] [AVX-512] Extend vectorizer hooks.". This one is already huge...

So, based on following findings:
- the patch uses standard builtin expansion infrastructure and
expansion approaches
- headers and builtins survive gcc.target/i386/sse-{12,13,14,22,23}.c
and g++.dg/other/i386-{2,3}.C regression smoketests
- the functionality is covered by extensive testsuite

the patch is OK (with above mentioned changes) for mainline.

Uros.


Re: [RFC] libgcov.c re-factoring and offline profile-tool

2013-12-18 Thread Xinliang David Li
>>
>>  #ifdef L_gcov_merge_ior
>>  /* The profile merging function that just adds the counters.  It is given
>> -   an array COUNTERS of N_COUNTERS old counters and it reads the same number
>> -   of counters from the gcov file.  */
>> +   an array COUNTERS of N_COUNTERS old counters.
>> +   When SRC==NULL, it reads the same number of counters from the gcov file.
>> +   Otherwise, it reads from SRC array.  */
>>  void
>> -__gcov_merge_ior (gcov_type *counters, unsigned n_counters)
>> +__gcov_merge_ior (gcov_type *counters, unsigned n_counters,
>> +  gcov_type *src, unsigned w __attribute__ ((unused)))
>
> So the new in-memory variants are introduced for the merging tool, while
> libgcc uses the gcov_read_counter interface?
> Perhaps we can actually just duplicate the functions, so that the runtime
> avoids all the scaling and in_mem tests it won't need?


I thought about this one a little. How about making the interface
change conditionally, but still share the implementation?  The merge
function bodies mostly remain unchanged and there is no runtime
penalty for libgcov.  The new macros can be shared across most of the
mergers.

#ifdef IN_PROFILE_TOOL
#define GCOV_MERGE_EXTRA_ARGS  gcov_type *src, unsigned w
#define GCOV_READ_COUNTER  *(src++) * w
#else
#define GCOV_MERGE_EXTRA_ARGS
#define GCOV_READ_COUNTER gcov_read_counter ()
#endif

__gcov_merge_add (gcov_type *counters, unsigned n_counters,
                  GCOV_MERGE_EXTRA_ARGS)
{
  for (; n_counters; counters++, n_counters--)
    {
      *counters += GCOV_READ_COUNTER;
    }
}

thanks,

David

>
> I would suggest going with the libgcov.h changes and cleanups first, with
> the interface changes next; the gcov-tool is then probably quite obvious at
> the end?
> Do you think you can split the patch this way?
>
> Thanks and sorry for taking long to review. I should have more time again now.
> Honza
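
A compilable variant of the macro scheme proposed in this message might look like the sketch below. It is not the code that was committed: the name demo_merge_add is hypothetical, gcov_type is a stand-in typedef, and IN_PROFILE_TOOL is assumed to be the intended spelling of the guard macro. The one deliberate change is that the separating comma moves into GCOV_MERGE_EXTRA_ARGS, so the parameter list stays valid when the macro expands to nothing.

```c
#include <assert.h>

typedef long long gcov_type;   /* stand-in for the real libgcov typedef */

/* Defined here only so the sketch exercises the in-memory variant.  */
#define IN_PROFILE_TOOL 1

#ifdef IN_PROFILE_TOOL
/* The leading comma lives inside the macro; see the note above.  */
#define GCOV_MERGE_EXTRA_ARGS , gcov_type *src, unsigned w
/* Parenthesized so the expansion is safe inside larger expressions.  */
#define GCOV_READ_COUNTER (*(src++) * w)
#else
gcov_type gcov_read_counter (void);
#define GCOV_MERGE_EXTRA_ARGS
#define GCOV_READ_COUNTER gcov_read_counter ()
#endif

/* Hypothetical name; models the shared body of an "add" merger.  Each
   old counter is increased by the next value read, which in the
   in-memory variant is the source counter scaled by weight W.  */
static void
demo_merge_add (gcov_type *counters, unsigned n_counters
                GCOV_MERGE_EXTRA_ARGS)
{
  for (; n_counters; counters++, n_counters--)
    *counters += GCOV_READ_COUNTER;
}
```

With the guard macro undefined, the same body compiles against gcov_read_counter () and takes only the two original parameters, so the runtime library would pay nothing for the tool's extra arguments.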


Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-18 Thread Allan Sandfeld Jensen
On Wednesday 18 December 2013, Gopalasubramanian, Ganesh wrote:
> Ping!
> 
> "Gopalasubramanian, Ganesh"  wrote:
> > Yes, I figured that was the original idea behind it, but the final family
> > of the jaguar processors seems to have become 16h instead of 14h
> > (bobcat) at some point.
> 
> Yes. It is amdfam16h. I was supposed to pass on some comments on the patch.
> 1. Amdfam16h for Jaguar.
> 2. For Jaguar, the priority needs to be AVX (AVX got included into the
> Jaguar ISA).
> 
Yes, I changed that in the last patch, though I consider it momentarily 
problematic because you do not yet enable AVX with march=btver2 (AVX versions 
would currently be better than btver2 versions for a btver2 arch), but expect 
march=btver2 will be fixed soon.

Regards
'Allan


Re: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-18 Thread Joseph S. Myers
On Mon, 16 Dec 2013, Eric Botcazou wrote:

> which of course blatantly violates the do-not-rely-on-mode rule.  Although 
> the 
> layout change apparently occurs very rarely, I think that this rules out the 
> direct mode change in stor-layout.c... 

Well - makes such a change unsuitable for 4.9.  Longer-term we still want 
to avoid ABIs depending on modes of structs / unions / arrays (and 
depending on modes of vectors can also have undesired effects in some 
cases - I took care when working on the ARM hard-float ABI variant to 
avoid the ABI for argument passing of GCC generic vectors depending on 
whether NEON vector support was enabled).  That suggests to me:

* Produce a list of all ABI-affecting target macros and hooks that might, 
wrongly, use modes to determine the ABI (struct layout, alignment, 
argument passing, function return, etc.).

* Put that on a wiki page along with a description of the problem (when 
it's OK to use modes in determining the ABI and when it isn't).  Also make 
sure the documentation of the macros / hooks in the internals manual is 
clear about not using modes inappropriately.

* Put a table of target architectures on the wiki page with a column for 
architecture maintainers to record whether they have checked for and fixed 
any problematic uses of modes.

* Ask architecture maintainers to do the checks and fixes.  Fixes may be 
nontrivial in some cases, as compatibility means that if the ABI did 
depend on the mode you may need to replicate the relevant bits of 
stor-layout.c logic that determined the mode inside the back end.

* After a while (pinging maintainers as needed), obsolete then remove any 
architectures that have not been checked.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Ubsan load of bool/enum sanitization

2013-12-18 Thread Joseph S. Myers
On Mon, 16 Dec 2013, Jakub Jelinek wrote:

> On Mon, Dec 16, 2013 at 07:40:16PM +0100, Jakub Jelinek wrote:
> > It can be the last thing, sure.  I think the still unimplemented and
> > potentially useful are the floating point overflow sanitization (haven't
> > looked yet what exactly it is, I suppose casts from floating point to
> > integers where the values are out of range, but dunno exactly) and
> > they have also some __builtin_object_size based bounds checking.
> 
> Oh, and then there is sanitization of nonnull arguments and returns_nonnull
> return values which ideally we should add, clang doesn't have it (yet?), but
> it is really desirable when we aggressively optimize based on those
> attributes.  We need to discuss with compiler-rt ubsan upstream first
> though, unless we want to add the entrypoint as a GCC only libubsan addition.

I think there's probably more for future versions once you get into adding 
new entry points.  See e.g. 
 about not being 
able to check systematically that the size in bytes of a VLA, including a 
constant-size array of VLAs, does not exceed target PTRDIFF_MAX, for lack 
of libubsan support.

More generally: look at the list of undefined behavior in C11 J.2.  For 
each item there, it should be possible to say one (or more) of:

* GCC bounds the effects of this undefined behavior.

* This undefined behavior is detected at runtime with (specified 
sanitization option).

* This undefined behavior is critical undefined behavior as defined in 
L.3.

Perhaps a table to that effect, along with one for GCC extensions for 
which sanitization could usefully apply, should go on the wiki.  We could 
then see what sanitization (or other bounding) features are needed in 
order to provide an option / options that would conform to Annex L and 
define __STDC_ANALYZABLE__.

-- 
Joseph S. Myers
jos...@codesourcery.com
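
The VLA size condition mentioned above (total size in bytes must not exceed PTRDIFF_MAX) could be expressed as the check sketched here. The function name is hypothetical and this is not an existing libubsan entry point; it only shows the arithmetic such a check would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if an array of COUNT elements of ELEM_SIZE bytes each
   has a total size in bytes that does not exceed PTRDIFF_MAX.  Dividing
   instead of multiplying keeps the check itself free of overflow.  */
static int
demo_array_size_ok (size_t count, size_t elem_size)
{
  if (elem_size == 0)
    return 1;
  return count <= (size_t) PTRDIFF_MAX / elem_size;
}
```

For an array of VLAs, the element size passed in would itself be a runtime value that has already passed the same check.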


Re: [PING]: [GOMP4] [PATCH] SIMD-Enabled Functions (formerly Elemental functions) for C

2013-12-18 Thread Joseph S. Myers
On Mon, 16 Dec 2013, Jakub Jelinek wrote:

> On Mon, Dec 16, 2013 at 09:41:43PM +, Iyer, Balaji V wrote:
> > --- gcc/c/c-parser.c(revision 205759)
> > +++ gcc/c/c-parser.c(working copy)
> > @@ -208,6 +208,12 @@
> >/* True if we are in a context where the Objective-C "Property attribute"
> >   keywords are valid.  */
> >BOOL_BITFIELD objc_property_attr_context : 1;
> > +
> > +  /* Cilk Plus specific parser/lexer information.  */
> > +
> > +  /* Buffer to hold all the tokens from parsing the vector attribute for 
> > the
> > + SIMD-enabled functions (formerly known as elemental functions).  */
> > +  vec  *cilk_simd_fn_tokens;
> >  } c_parser;
> 
> Joseph, is this ok for you?

Fine with me.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Ubsan load of bool/enum sanitization

2013-12-18 Thread Joseph S. Myers
On Mon, 16 Dec 2013, Jakub Jelinek wrote:

> It can be the last thing, sure.  I think the still unimplemented and
> potentially useful are the floating point overflow sanitization (haven't
> looked yet what exactly it is, I suppose casts from floating point to
> integers where the values are out of range, but dunno exactly) and

Note that under Annex F that's only unspecified value plus "invalid" 
exception, rather than undefined behavior (though that issue is covered by 
allowing this checking to be enabled / disabled independent of the other 
cases).  (Reliably getting the "invalid" exception is one of the many 
Annex F pieces not implemented in GCC.)

I think it would be most appropriate for floating-point conversion to 
bit-fields in C to count as out of range (with sanitization / exception as 
appropriate) based on the range of the bit-field, but in C++ it should 
probably be based on the range of the underlying type not taking into 
account the bit-field width, with conversion from that type to the 
bit-field then being modulo, in accordance with the principle that 
bit-field width is not part of the type in C++.

-- 
Joseph S. Myers
jos...@codesourcery.com
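
For the plain (non-bit-field) case, the range test that a floating-point to integer conversion check needs can be sketched as follows. The function name is hypothetical, and the sketch assumes that int is at most 32 bits wide so that both bounds are exactly representable as double.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>

/* Return nonzero if converting X to int is in range.  The conversion
   truncates toward zero, so every value strictly between INT_MIN - 1
   and INT_MAX + 1 is acceptable.  A NaN fails both comparisons and is
   therefore reported as out of range.  */
static int
demo_double_fits_int (double x)
{
  return x > (double) INT_MIN - 1.0 && x < (double) INT_MAX + 1.0;
}
```

For a C bit-field target, the same test would use the bounds implied by the field width instead of INT_MIN and INT_MAX, following the suggestion above.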


Re: [AArch64 2/3 big.LITTLE] Allow tuning parameters without unique tuning targets.

2013-12-18 Thread Marcus Shawcroft
On 18 December 2013 12:23, James Greenhalgh  wrote:

> 2013-12-18  James Greenhalgh  
>
> * config/aarch64/aarch64-cores.def: Add new column for
> SCHEDULER_IDENT.
> * config/aarch64/aarch64-opts.h (AARCH64_CORE): Handle
> SCHEDULER_IDENT.
> * config/aarch64/aarch64.c (AARCH64_CORE): Handle
> SCHEDULER_IDENT.
> (aarch64_parse_cpu): mcpu implies a default value for mtune.
> * config/aarch64/aarch64.h (AARCH64_CORE): Handle
> SCHEDULER_IDENT.

OK
/Marcus


Re: [AArch64 1/3 big.LITTLE] Driver rewriting of big.LITTLE names.

2013-12-18 Thread Marcus Shawcroft
On 18 December 2013 12:23, James Greenhalgh  wrote:

> 2013-12-18  James Greenhalgh  
>
> * common/config/aarch64/aarch64-common.c
> (aarch64_rewrite_selected_cpu): New.
> (aarch64_rewrite_mcpu): New.
> * config/aarch64/aarch64-protos.h
> (aarch64_rewrite_selected_cpu): New.
> * config/aarch64/aarch64.h (BIG_LITTLE_SPEC): New.
> (BIG_LITTLE_SPEC_FUNCTIONS): Likewise.
> (ASM_CPU_SPEC): Likewise.
> (EXTRA_SPEC_FUNCTIONS): Likewise.
> (EXTRA_SPECS): Likewise.
> (ASM_SPEC): Likewise.
> * config/aarch64/aarch64.c (aarch64_start_file): Rewrite target
> CPU name.

OK
/Marcus


Re: [AArch64 3/3 big.LITTLE] Add support for -mcpu=cortex-a57.cortex-a53

2013-12-18 Thread Marcus Shawcroft
On 18 December 2013 12:23, James Greenhalgh  wrote:

> 2013-12-18  James Greenhalgh  
>
> * config/aarch64/aarch64-cores.def: Add support for
> -mcpu=cortex-a57.cortex-a53.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * doc/invoke.texi: Document -mcpu=cortex-a57.cortex-a53.

OK /Marcus


Re: [Patch, AArch64] [3/6] Implement support for Crypto -- AES.

2013-12-18 Thread Marcus Shawcroft
On 18 December 2013 15:28, Tejas Belagod  wrote:

> 2013-12-18  Tejas Belagod  
>
>
> gcc/
> * config/aarch64/aarch64-simd-builtins.def: Update builtins table.
> * config/aarch64/aarch64-builtins.c
> (aarch64_types_binopu_qualifiers,
> TYPES_BINOPU): New.
>
> * config/aarch64/aarch64-simd.md (aarch64_crypto_aesv16qi,
> aarch64_crypto_aesv16qi): New.
> * config/aarch64/arm_neon.h (vaeseq_u8, vaesdq_u8, vaesmcq_u8,
> vaesimcq_u8): New.
> * config/aarch64/iterators.md (UNSPEC_AESE, UNSPEC_AESD,
> UNSPEC_AESMC,
> UNSPEC_AESIMC): New.
> (CRYPTO_AES, CRYPTO_AESMC): New int iterators.
> (aes_op, aesmc_op): New int attributes.
>
> testsuite/
> * gcc.target/aarch64/aes_1.c: New.

OK, Thanks /Marcus


Re: [Patch, AArch64] [4/6] Implement support for Crypto -- SHA1.

2013-12-18 Thread Marcus Shawcroft
On 18 December 2013 15:28, Tejas Belagod  wrote:

> 2013-12-18  Tejas Belagod  
>
> gcc/
> * config/aarch64/aarch64-simd-builtins.def: Update builtins table.
> * config/aarch64/aarch64-builtins.c
> (aarch64_types_ternopu_qualifiers,
> TYPES_TERNOPU): New.
>
> * config/aarch64/aarch64-simd.md (aarch64_crypto_sha1hsi,
> aarch64_crypto_sha1su1v4si, aarch64_crypto_sha1v4si,
> aarch64_crypto_sha1su0v4si): New.
> * config/aarch64/arm_neon.h (vsha1cq_u32, vsha1mq_u32, vsha1pq_u32,
> vsha1h_u32, vsha1su0q_u32, vsha1su1q_u32): New.
> * config/aarch64/iterators.md (UNSPEC_SHA1,
> UNSPEC_SHA1SU<01>):
> New.
> (CRYPTO_SHA1): New int iterator.
> (sha1_op): New int attribute.
>
> testsuite/
> * gcc.target/aarch64/sha1_1.c: New.

OK, thanks /Marcus


Re: [Patch, AArch64] [5/6] Implement support for Crypto -- SHA256.

2013-12-18 Thread Marcus Shawcroft
On 18 December 2013 15:28, Tejas Belagod  wrote:

> 2013-12-18  Tejas Belagod  
>
>
> gcc/
> * config/aarch64/aarch64-simd-builtins.def: Update builtins table.
> * config/aarch64/aarch64-simd.md
> (aarch64_crypto_sha256hv4si,
> aarch64_crypto_sha256su0v4si, aarch64_crypto_sha256su1v4si): New.
> * config/aarch64/arm_neon.h (vsha256hq_u32, vsha256h2q_u32,
> vsha256su0q_u32, vsha256su1q_u32): New.
> * config/aarch64/iterators.md (UNSPEC_SHA256H<2>,
> UNSPEC_SHA256SU<01>):
> New.
> (CRYPTO_SHA256): New int iterator.
> (sha256_op): New int attribute.
>
> testsuite/
> * gcc.target/aarch64/sha256_1.c: New.

OK /Marcus


Re: [Patch, AArch64] [6/6] Implement support for Crypto -- PMULL.64.

2013-12-18 Thread Marcus Shawcroft
On 18 December 2013 15:28, Tejas Belagod  wrote:

>> 2013-12-06  Tejas Belagod  
>>
>> gcc/
>> * config/aarch64/aarch64-builtins.c: Define builtin types for
>> poly64_t
>> poly128_t.
>> * aarch64/aarch64-simd-builtins.def: Update builtins table.
>> * config/aarch64/aarch64-simd.md (aarch64_crypto_pmulldi,
>> aarch64_crypto_pmullv2di): New.
>> * config/aarch64/aarch64.c (aarch64_simd_mangle_map): Update table
>> for
>> poly64x2_t mangler.
>> * config/aarch64/arm_neon.h (poly64x2_t, poly64_t, poly128_t):
>> Define.
>> (vmull_p64, vmull_high_p64): New.
>> * config/aarch64/iterators.md (UNSPEC_PMULL<2>): New.
>>
>> testsuite/
>>
>> * gcc.target/aarch64/pmull.c: New.

OK /Marcus


Re: GOMP_target: alignment (was: [gomp4] #pragma omp target* fixes)

2013-12-18 Thread Thomas Schwinge
Hi!

This one's owed to me still learning about GCC internals; if someone
could please be so kind as to point me to the appropriate documentation, or
explain:

On Mon, 16 Dec 2013 16:38:18 +0100, Jakub Jelinek  wrote:
> The reason for 3 separate arrays is that some of the values
> are always variable, some are sometimes variable (sizes), some are
> never variable (alignment + kind).

Related to this, in gcc/omp-low.c:lower_omp_target, I see:

  tree clobber = build_constructor (ctx->record_type, NULL);
  TREE_THIS_VOLATILE (clobber) = 1;
  gimple_seq_add_stmt (&olist, gimple_build_assign (ctx->sender_decl,
clobber));

I'm assuming the point of this clobber is to tell the compiler (because
it can't figure this out on its own?) that afterwards, after the
gimple_seq olist has been "executed", we're not going to use the
ctx->sender_decl object anymore, right?  What would happen if this
clobber were not added?  Missed optimizations due to the object being
kept alive, or correctness issues, or something else?

And, why doesn't the same also need to be done for the sizes object (in
the non-static case)?


Grüße,
 Thomas




Re: [Patch, Fortran, OOP] PR 59493: Cleanup of vtab generation code

2013-12-18 Thread Tobias Burnus

Janus Weil wrote:
> here is a follow-up to my recent patch for PR59493, doing some cleanup
> related to the generation of vtab symbols:
> 1) Since the function gfc_find_intrinsic_vtab, contrary to its name,
> handles not only intrinsic but also derived types, I removed the
> latter functionality, and instead introduced a new function
> gfc_find_vtab, which handles arbitrary types and simply decides
> whether to call the corresponding function for intrinsic or derived
> vtabs.
> 2) Basically all calls to gfc_find_intrinsic_vtab are replaced by
> gfc_find_vtab. This often simplifies the logic and saves additional IF
> clauses to distinguish between intrinsic and derived types.
> 3) As a consequence, gfc_find_intrinsic_vtab is made static and loses
> the gfc_ prefix.
>
> All of this results in the code being shorter, clearer and more
> error-prone. The patch is regtested on x86_64-unknown-linux-gnu. Ok
> for trunk?


I assume you mean "less error prone".

Looks good to me. Thanks for the cleanup!

Tobias


2013-12-18  Janus Weil  

 PR fortran/59493
 * gfortran.h (gfc_find_intrinsic_vtab): Removed prototype.
 (gfc_find_vtab): New prototype.
 * class.c (gfc_find_intrinsic_vtab): Rename to 'find_intrinsic_vtab' and
 make static. Minor modifications.
 (gfc_find_vtab): New function.
 (gfc_class_initializer): Use new function 'gfc_find_vtab'.
 * check.c (gfc_check_move_alloc): Ditto.
 * expr.c (gfc_check_pointer_assign): Ditto.
 * interface.c (compare_actual_formal): Ditto.
 * resolve.c (resolve_allocate_expr, resolve_select_type): Ditto.
 * trans-expr.c (gfc_conv_intrinsic_to_class, gfc_trans_class_assign):
 Ditto.
 * trans-intrinsic.c (conv_intrinsic_move_alloc): Ditto.
 * trans-stmt.c (gfc_trans_allocate): Ditto.
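
The dispatch described in point 1 of the cleanup could look roughly like the sketch below. This is only a toy model: `TypeKind`, `TypeSpec`, the returned strings and all three function bodies are hypothetical and are not taken from class.c; only the shape (one public entry point that picks the intrinsic or derived helper) follows the description.

```cpp
#include <string>

// Hypothetical, simplified stand-in for a Fortran type specification.
enum class TypeKind { Integer, Real, Character, Derived };

struct TypeSpec {
  TypeKind kind;
  std::string name;
};

// File-local helper for intrinsic types (made static, no public prefix).
static std::string find_intrinsic_vtab(const TypeSpec &ts) {
  return "vtab_intrinsic_" + ts.name;  // made-up naming, illustration only
}

// Helper for derived types.
static std::string find_derived_vtab(const TypeSpec &ts) {
  return "vtab_derived_" + ts.name;    // made-up naming, illustration only
}

// Single entry point: callers no longer need their own IF clauses to
// distinguish between intrinsic and derived types.
std::string find_vtab(const TypeSpec &ts) {
  return ts.kind == TypeKind::Derived ? find_derived_vtab(ts)
                                      : find_intrinsic_vtab(ts);
}
```

A caller would simply write `find_vtab(ts)` for any type, which is the simplification point 2 refers to.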




Re: GOMP_target: alignment (was: [gomp4] #pragma omp target* fixes)

2013-12-18 Thread Jakub Jelinek
On Wed, Dec 18, 2013 at 09:03:40PM +0100, Thomas Schwinge wrote:
> This one's owed to me still learning about GCC internals; if someone
> could please be so kind as to point me to the appropriate documentation, or
> explain:
> 
> On Mon, 16 Dec 2013 16:38:18 +0100, Jakub Jelinek  wrote:
> > The reason for 3 separate arrays is that some of the values
> > are always variable, some are sometimes variable (sizes), some are
> > never variable (alignment + kind).
> 
> Related to this, in gcc/omp-low.c:lower_omp_target, I see:
> 
>   tree clobber = build_constructor (ctx->record_type, NULL);
>   TREE_THIS_VOLATILE (clobber) = 1;
>   gimple_seq_add_stmt (&olist, gimple_build_assign (ctx->sender_decl,
> clobber));

A clobber stmt is an artificial statement that tells various optimization
passes that the decl is dead at that point, so e.g. DSE can remove stores
to the decl that are only followed by the clobber, and the cfgexpand
automatic variable layout code can better share stack slots for variables
that aren't live concurrently.

It is purely an optimization thing, right.  Given that the address of the
object is passed to some other function, the clobber can help the compiler:
without it, the compiler has to assume that the function might remember that
address somewhere, which would keep the object live for longer than it
really is.
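
As a source-level analogy for the stack-slot sharing point (my own sketch, not code from omp-low.c): GCC also emits clobbers when a local variable goes out of scope, which is what can allow two buffers in disjoint scopes to share one stack slot. The names `fill` and `sum_two_phases` are hypothetical, and whether the slots really are shared depends on the target and optimization level.

```cpp
#include <cstddef>

// Hypothetical helper that receives the address of a local buffer, much
// like a runtime call that receives the address of a sender object.
static void fill(int *buf, std::size_t n, int value) {
  for (std::size_t i = 0; i < n; ++i)
    buf[i] = value;
}

// Two buffers whose lifetimes do not overlap.  Once a scope ends, the
// compiler knows the buffer is dead even though its address escaped to
// fill(), so it is free (but not obliged) to reuse the stack slot.
int sum_two_phases() {
  int total = 0;
  {
    int a[64];
    fill(a, 64, 1);
    for (int v : a)
      total += v;
  }  // `a` is dead from here on
  {
    int b[64];
    fill(b, 64, 2);
    for (int v : b)
      total += v;
  }  // `b` is dead from here on
  return total;
}
```

The result is the same with or without slot sharing, which matches the point that the clobber affects optimization rather than correctness.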

Jakub


lra_in_progress check in simplify_subreg_regno

2013-12-18 Thread Richard Sandiford
Hi Vlad,

The initial LRA merge added the lra_in_progress check to this code
in simplify_subreg_regno:

  /* Give the backend a chance to disallow the mode change.  */
  if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
  && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
  && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
  /* We can use mode change in LRA for some transformations.  */
  && ! lra_in_progress)
return -1;

I realise that LRA internally uses subregs that wouldn't normally be valid
(e.g. to handle matching constraints) but I think it's really dangerous
to ignore REG_CANNOT_CHANGE_MODE_P and reduce a subreg to a specific
hard reg anyway.  CLASS_CANNOT_CHANGE_MODE says that all bets are off in
terms of the normal subreg rules.  E.g. the register class might have the
opposite endianness to GENERAL_REGS or something extreme like that.

It wasn't obvious from the comment exactly what case this !lra_in_progress
was handling, but I tried reverting it and saw a failure in gcc.dg/pr49948.c.
The problem there was with a subreg of the form:

(subreg:DI (reg:V2DI X) 0)

We were trying to reload this subreg in GENERAL_REGS and X had been
allocated an SSE reg.  We then hit this code in curr_insn_transform,
which was deciding whether to reload the reg or the subreg:

  if (REG_P (reg)
  /* Strict_low_part requires reload the register not
 the sub-register.  */
  && (curr_static_id->operand[i].strict_low
  || (GET_MODE_SIZE (mode)
  <= GET_MODE_SIZE (GET_MODE (reg))
  && (hard_regno
  = get_try_hard_regno (REGNO (reg))) >= 0
  && (simplify_subreg_regno
  (hard_regno,
   GET_MODE (reg), byte, mode) < 0)
  && (goal_alt[i] == NO_REGS
  || (simplify_subreg_regno
  (ira_class_hard_regs[goal_alt[i]][0],
   GET_MODE (reg), byte, mode) >= 0)
{
  loc = &SUBREG_REG (*loc);
  mode = GET_MODE (*loc);
}

The first simplify_subreg_regno (on the SSE reg) used to return >= 0
thanks to the !lra_in_progress, even though the change is usually supposed
to be disallowed.  With the !lra_in_progress removed simplify_subreg_regno
rejects the simplification instead.  Then the second simplify_subreg_regno
(on the first allocatable GENERAL_REGS) succeeds and we go on to reload
the inner reg.

But the key in this case is that GENERAL_REGS isn't allowed to hold V2DImode,
so the simplification performed by the second simplify_subreg_regno
doesn't really have any meaning.  I think we should check that
ira_class_hard_regs[goal_alt[i]][0] can hold GET_MODE (reg) before
trying to simplify a subreg of it.
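
The ordering argued for here can be sketched with a toy model. Everything below (`Mode`, `RegClass`, `class_can_hold`, `toy_simplify_subreg`, `goal_class_accepts_inner_reg`) is hypothetical and only stands in for a HARD_REGNO_MODE_OK-style query and a simplify_subreg_regno-style query; it is not GCC code and does not reproduce the real register tables.

```cpp
// Toy machine modes and register classes.
enum class Mode { DI, V2DI };
enum class RegClass { NO_REGS, GENERAL_REGS, SSE_REGS };

// Toy stand-in for "can the first register of this class hold this mode?".
// In this model general registers cannot hold the vector mode.
bool class_can_hold(RegClass rc, Mode m) {
  switch (rc) {
    case RegClass::GENERAL_REGS: return m == Mode::DI;
    case RegClass::SSE_REGS:     return true;
    case RegClass::NO_REGS:      return false;
  }
  return false;
}

// Toy stand-in for a subreg simplification: returns a register number
// (>= 0) on success and -1 on failure.  It deliberately does not look at
// whether the class can hold the inner mode, mirroring the problem above.
int toy_simplify_subreg(RegClass rc, Mode /*inner*/, Mode /*outer*/) {
  return rc == RegClass::NO_REGS ? -1 : 0;
}

// Only trust the simplification when the goal class can hold the inner
// mode in the first place.
bool goal_class_accepts_inner_reg(RegClass goal, Mode inner, Mode outer) {
  if (!class_can_hold(goal, inner))
    return false;
  return toy_simplify_subreg(goal, inner, outer) >= 0;
}
```

In this model the unguarded simplification "succeeds" for GENERAL_REGS with a V2DI inner register, while the guarded query rejects it, which is the meaningless success described above.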

Tested on x86_64-linux-gnu ({,-m32}, all languages including Go and Ada).
Does it look OK?  Or, if the check was originally added for a different
situation, are you still able to reproduce it?

FWIW, this is all part of an attempt to fix the vec_select thing for x86_64.
I also want to get rid of the GET_MODE_CLASS checks seen in the first hunk,
but that's an entirely separate issue.

Thanks,
Richard


gcc/
* rtlanal.c (simplify_subreg_regno): Remove lra_in_progress check.
* lra-constraints.c (curr_insn_transform): Before trying to simplify
a subreg, check whether the original reg is valid.

Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c   2013-12-09 08:03:31.931201686 +
+++ gcc/rtlanal.c   2013-12-18 21:22:10.605987652 +
@@ -3533,9 +3533,7 @@ simplify_subreg_regno (unsigned int xreg
   /* Give the backend a chance to disallow the mode change.  */
   if (GET_MODE_CLASS (xmode) != MODE_COMPLEX_INT
   && GET_MODE_CLASS (xmode) != MODE_COMPLEX_FLOAT
-  && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode)
-  /* We can use mode change in LRA for some transformations.  */
-  && ! lra_in_progress)
+  && REG_CANNOT_CHANGE_MODE_P (xregno, xmode, ymode))
 return -1;
 #endif
 
Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c   2013-12-09 21:35:05.589588280 +
+++ gcc/lra-constraints.c   2013-12-18 21:22:10.585987483 +
@@ -3493,9 +3493,12 @@ curr_insn_transform (void)
  (hard_regno,
   GET_MODE (reg), byte, mode) < 0)
  && (goal_alt[i] == NO_REGS
- || (simplify_subreg_regno
+ || (HARD_REGNO_MODE_OK
  (ira_class_hard_regs[goal_alt[i]][0],
-  GET_MODE (reg), byte, mode) >= 0)
+  GET

Re: [Patch, Fortran, OOP] PR 59493: Cleanup of vtab generation code

2013-12-18 Thread Janus Weil
2013/12/18 Tobias Burnus :
> Janus Weil wrote:
>>
>> here is a follow-up to my recent patch for PR59493, doing some cleanup
>> related to the generation of vtab symbols:
>> 1) Since the function gfc_find_intrinsic_vtab, contrary to its name,
>> handles not only intrinsic but also derived types, I removed the
>> latter functionality, and instead introduced a new function
>> gfc_find_vtab, which handles arbitrary types and simply decides
>> whether to call the corresponding function for intrinsic or derived
>> vtabs.
>> 2) Basically all calls to gfc_find_intrinsic_vtab are replaced by
>> gfc_find_vtab. This often simplifies the logic and saves additional IF
>> clauses to distinguish between intrinsic and derived types.
>> 3) As a consequence, gfc_find_intrinsic_vtab is made static and loses
>> the gfc_ prefix.
>>
>> All of this results in the code being shorter, clearer and more
>> error-prone. The patch is regtested on x86_64-unknown-linux-gnu. Ok
>> for trunk?
>
> I assume you mean "less error prone".

I guess I meant either "less error-prone" or "more error-proof", and
somehow picked the wrong combination ;)


> Looks good to me. Thanks for the cleanup!

Thanks, committed as r206101.

Btw, I also thought about the possibility of replacing all calls to
gfc_find_derived_vtab by gfc_find_vtab, but I need to check if that is
reasonable/helpful in all cases.

Cheers,
Janus


Re: [PATCH] Fix ifcvt (PR rtl-optimization/58668)

2013-12-18 Thread Steven Bosscher
On Wednesday, December 18, 2013, Jakub Jelinek wrote:
>
> Hi!
>
> As discussed in the PR, this testcase ICEs on arm, because ifcvt
> is relying on active instruction counts from various routines
> (count_bb_insns, flow_find_cross_jump and flow_find_head_matching_sequence),
> but each of those routines has a different view of what counts as
> active insns.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux
> and tested on the testcase using cross to arm.  Ok for trunk?
>
> 2013-12-18  Jakub Jelinek  <>
> PR rtl-optimization/58668
> * cfgcleanup.c (flow_find_cross_jump): Don't count
> any jumps if dir_p is NULL.  Remove p1 variable and make USE/CLOBBER
> check consistent with other places.
> (flow_find_head_matching_sequence): Don't count USE or CLOBBER insns.
> (try_head_merge_bb): Adjust for the flow_find_head_matching_sequence
> counting change.
> * ifcvt.c (count_bb_insns): Don't count USE or CLOBBER insns.

Why not use active_insn_p instead of hand-checks for USE and CLOBBER insns?
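
The underlying idea, one shared predicate instead of per-routine checks, can be sketched as follows. This is a toy model: `InsnKind`, `is_active`, `count_active` and `count_matching_head` are hypothetical, and `is_active` is not a reimplementation of GCC's active_insn_p, whose exact rules are not reproduced here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction kinds.
enum class InsnKind { Set, Jump, Use, Clobber, Note };

// The single definition of "active" that every counter goes through.
bool is_active(InsnKind k) {
  return k == InsnKind::Set || k == InsnKind::Jump;
}

// Counts active instructions in a block.
int count_active(const std::vector<InsnKind> &block) {
  int n = 0;
  for (InsnKind k : block)
    if (is_active(k))
      ++n;
  return n;
}

// Counts how many leading active instructions two blocks have in common,
// skipping inactive ones on both sides with the same predicate.
int count_matching_head(const std::vector<InsnKind> &a,
                        const std::vector<InsnKind> &b) {
  std::size_t i = 0, j = 0;
  int matched = 0;
  while (true) {
    while (i < a.size() && !is_active(a[i])) ++i;
    while (j < b.size() && !is_active(b[j])) ++j;
    if (i == a.size() || j == b.size() || a[i] != b[j])
      break;
    ++matched;
    ++i;
    ++j;
  }
  return matched;
}
```

Because both counters share `is_active`, a count returned by one can be compared safely with a count returned by the other.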

Ciao!

Steven


RE: [GOMP4][PATCH] SIMD-enabled functions (formerly Elemental functions) for C++

2013-12-18 Thread Iyer, Balaji V
Hello Jakub,
Attached, please find a patch that implements SIMD-enabled functions for
C++.  To the best of my knowledge, I have incorporated all the changes you
mentioned for the C patch that are applicable to C++.

Is this OK for trunk? It passes all the tests on my SUSE machine for -m32
and -m64.

Here are the ChangeLog entries:
gcc/cp/ChangeLog
2013-12-18  Balaji V. Iyer  

* decl2.c (is_late_template_attribute): Added a check for SIMD-enabled
functions attribute.  If found, return true.
* parser.c (cp_parser_direct_declarator): When Cilk Plus is enabled
see if there is an attribute after function decl.  If so, then
parse them now.
(cp_parser_late_return_type_opt): Handle parsing of Cilk Plus SIMD
enabled function late parsing.
(cp_parser_gnu_attribute_list): Parse all the tokens for the vector
attribute for a SIMD-enabled function.
(cp_parser_omp_all_clauses): Skip parsing to the end of pragma when
the function is used by SIMD-enabled function (indicated by NULL
pragma token).  Added 3 new clauses: PRAGMA_CILK_CLAUSE_MASK,
PRAGMA_CILK_CLAUSE_NOMASK and PRAGMA_CILK_CLAUSE_VECTORLENGTH.
(cp_parser_cilk_simd_vectorlength): Modified this function to handle
vectorlength clause in SIMD-enabled function and #pragma SIMD's
vectorlength clause.  Added a new bool parameter to differentiate
between the two.
(cp_parser_cilk_simd_fn_vector_attrs): New function.
(is_cilkplus_vector_p): Likewise.
(cp_parser_late_parsing_elem_fn_info): Likewise.
(cp_parser_omp_clause_name): Added a check for "mask," "nomask"
and "vectorlength" clauses when Cilk Plus is enabled.
(cp_parser_omp_clause_linear): Added a new parameter of type bool
and emit a sorry message when step size is a parameter.
* parser.h (cp_parser::cilk_simd_fn_info): New field.

gcc/testsuite/ChangeLog
2013-12-18  Balaji V. Iyer  

* g++.dg/cilk-plus/cilk-plus.exp: Called the C/C++ common tests for
SIMD enabled function.
* g++.dg/cilk-plus/ef_test.C: New test.
* c-c++-common/cilk-plus/vlength_errors.c: Added new dg-error tags
to differentiate C error messages from C++ ones.

Thanks,
Balaji V. Iyer.



> -Original Message-
> From: Iyer, Balaji V
> Sent: Sunday, December 15, 2013 11:53 PM
> To: 'Jakub Jelinek'
> Cc: Aldy Hernandez (al...@redhat.com); 'gcc-patches@gcc.gnu.org'
> Subject: RE: [GOMP4][PATCH] SIMD-enabled functions (formerly Elemental
> functions) for C++
> 
> Hello Everyone,
>   The following changes mentioned in this thread
> (http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01280.html) are also
> applicable to the C++ patch and the attached patch has been fixed
> accordingly:
> 
> 1. Sharing the vectorlength parsing function between #pragma simd and
>    SIMD enabled functions
> 2. Renaming the function that is parsing the SIMD enabled function
>    attributes
> 3. Renaming "Cilk plus elementals" to "Cilk SIMD function" for the
>    attribute name
> 4. Marking all the SIMD enabled function attributes with both
>    "omp declare simd" and "cilk simd function."
> 5. Renaming an error message from "..SIMD-enabled function" to "Cilk Plus
>    SIMD-enabled function..."
> 
> So, is this patch OK for branch/trunk?
> 
> Here are the ChangeLog entries:
> 
> gcc/cp/ChangeLog:
> 2013-12-16  Balaji V. Iyer  
> 
> * decl2.c (is_late_template_attribute): Added a check for SIMD-enabled
> functions attribute.  If found, return true.
> * parser.c (cp_parser_direct_declarator): When Cilk Plus is enabled
> see if there is an attribute after function decl.  If so, then
> parse them now.
> (cp_parser_late_return_type_opt): Handle parsing of Cilk Plus SIMD
> enabled function late parsing.
> (cp_parser_gnu_attribute_list): Parse all the tokens for the vector
> attribute for a SIMD-enabled function.
> (cp_parser_omp_all_clauses): Skip parsing to the end of pragma when
> the function is used by SIMD-enabled function (indicated by NULL
> pragma token).
> (cp_parser_cilk_simd_vectorlength): Modified this function to parse
> vectorlength attribute in SIMD-enabled function and #pragma SIMD's
> vectorlength clause.  Added a new parameter to pass in SIMD-enabled
> function's info.
> (cp_parser_cilk_simd_fn_vector_attrs): New function.
> (cp_parser_late_parsing_elem_fn_info): Likewise.
> * parser.h (cp_parser::elem_fn_info): New field.
> * decl.c (grokfndecl): Added a check if Cilk Plus is enabled and
> if so, adjust the Cilk Plus SIMD-enabled function attributes.
> 
> gcc/testsuite/ChangeLog
> 2013-12-16  Balaji V. Iyer  
> 
> * g++.dg/cilk-plus/cilk-plus.exp: Called the C/C++ common tests for
> SIMD enabled function.
> * g++.dg/cilk-plus/ef_test.C: Ne

[PATCH,committed] Convert assert to runtime error in reading exponents in Fortran

2013-12-18 Thread Steve Kargl
I committed the following changes to gfortran's runtime library.
It converts an assert on an invalid floating-point exponent into
a runtime error.
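
The general pattern, reporting bad input to the caller instead of asserting, can be sketched outside libgfortran as below. This is an illustration only: `format_exponent4`, the four-digit width and the limit of 10000 are assumptions made for this sketch and are not values taken from the patch.

```cpp
#include <string>

// Formats a non-negative exponent as exactly four decimal digits.
// Returns false, leaving `out` untouched, when the value does not fit,
// so the caller can raise a proper runtime error instead of aborting.
bool format_exponent4(int exponent, std::string &out) {
  if (exponent < 0 || exponent >= 10000)
    return false;  // user-controlled input, so not an internal invariant

  std::string digits(4, '0');
  for (int dig = 3; dig >= 0; --dig) {
    digits[dig] = static_cast<char>('0' + exponent % 10);
    exponent /= 10;
  }
  out = digits;
  return true;
}
```

An assert is appropriate for conditions that can only be violated by a bug in the library itself; a value read from user input is not such a condition.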

2013-12-18  Steven G. Kargl  

* io/read.c (read_f): Convert assert to runtime error.

2013-12-18  Steven G. Kargl  

* gfortran.dg/io_err_1.f90: New test.

Index: libgfortran/io/read.c
===
--- libgfortran/io/read.c   (revision 206100)
+++ libgfortran/io/read.c   (working copy)
@@ -1150,7 +1150,9 @@ done:
  exponent = - exponent;
}
 
-  assert (exponent < 1);
+  if (exponent >= 1)
+   goto bad_float;
+
   for (dig = 3; dig >= 0; --dig)
{
  out[dig] = (char) ('0' + exponent % 10);
Index: gcc/testsuite/gfortran.dg/io_err_1.f90
===
--- gcc/testsuite/gfortran.dg/io_err_1.f90  (revision 0)
+++ gcc/testsuite/gfortran.dg/io_err_1.f90  (working copy)
@@ -0,0 +1,14 @@
+! { dg-do run }
+! { dg-shouldfail "Compile-time specifier checking" }
+!
+! Contributed by Dominique Dhumieres 
+program read
+   character(50) :: buf='0.D9'
+   double precision val
+   read (UNIT=buf, FMT='(D60.0)', ERR=10) Val
+   call abort
+10 read (UNIT=buf, FMT='(D60.0)') Val
+end program read
+! { dg-output "At line 10 of file.*" }
+! { dg-output "Fortran runtime error: Bad value during floating point read" }
+

-- 
Steve


Add const char* constructors for exception classes in

2013-12-18 Thread Oleg Endo
Hello,

When writing code such as
...
  throw std::logic_error ("cold coffee");
...
currently the construction of std::string happens in the code that
throws the exception, which results in code bloat.  Implementing the
const char* constructors as defined by C++11 fixes the issue.
I'm not sure whether the #if __cplusplus >= 201103L checks are required.
C++98 code could also benefit from the overloads.
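
As a rough illustration of the overload resolution involved, here is a toy exception class with both constructors. `toy_error` and `built_from_c_string` are hypothetical names used only for this sketch; this is not the libstdc++ implementation.

```cpp
#include <exception>
#include <string>

// Toy exception type with both constructor overloads.
class toy_error : public std::exception {
 public:
  // With only this overload, a caller passing a string literal has to
  // build a temporary std::string at the throw site.
  explicit toy_error(const std::string &arg)
      : msg_(arg), from_c_string_(false) {}

  // With this overload, the literal is passed as a plain pointer and the
  // std::string is built inside the constructor.
  explicit toy_error(const char *arg)
      : msg_(arg), from_c_string_(true) {}

  const char *what() const noexcept override { return msg_.c_str(); }

  // Reports which overload was selected (for demonstration only).
  bool built_from_c_string() const { return from_c_string_; }

 private:
  std::string msg_;
  bool from_c_string_;
};
```

For `throw toy_error("cold coffee");` the `const char*` overload is the better match, since the array-to-pointer conversion outranks the user-defined conversion to `std::string`.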

Tested with 'make all' and 'make install', writing a hello world and
checking the asm output.

Cheers,
Oleg

libstdc++-v3/ChangeLog:

* include/std/stdexcept (logic_error, domain_error, 
invalid_argument, length_error, out_of_range, runtime_error, 
range_error, overflow_error, underflow_error): Declare const 
char* constructors.
* src/c++98/stdexcept.cc (logic_error, domain_error, 
invalid_argument, length_error, out_of_range, runtime_error, 
range_error, overflow_error, underflow_error): Implement them.
Index: libstdc++-v3/include/std/stdexcept
===
--- libstdc++-v3/include/std/stdexcept	(revision 206101)
+++ libstdc++-v3/include/std/stdexcept	(working copy)
@@ -58,9 +58,12 @@
 
   public:
 /** Takes a character string describing the error.  */
-explicit 
+explicit
 logic_error(const string& __arg);
-
+#if __cplusplus >= 201103L
+explicit
+logic_error(const char* __arg);
+#endif
 virtual ~logic_error() _GLIBCXX_USE_NOEXCEPT;
 
 /** Returns a C-style character string describing the general cause of
@@ -75,6 +78,9 @@
   {
   public:
 explicit domain_error(const string& __arg);
+#if __cplusplus >= 201103L
+explicit domain_error(const char* __arg);
+#endif
 virtual ~domain_error() _GLIBCXX_USE_NOEXCEPT;
   };
 
@@ -83,6 +89,9 @@
   {
   public:
 explicit invalid_argument(const string& __arg);
+#if __cplusplus >= 201103L
+explicit invalid_argument(const char* __arg);
+#endif
 virtual ~invalid_argument() _GLIBCXX_USE_NOEXCEPT;
   };
 
@@ -92,6 +101,9 @@
   {
   public:
 explicit length_error(const string& __arg);
+#if __cplusplus >= 201103L
+explicit length_error(const char* __arg);
+#endif
 virtual ~length_error() _GLIBCXX_USE_NOEXCEPT;
   };
 
@@ -101,6 +113,9 @@
   {
   public:
 explicit out_of_range(const string& __arg);
+#if __cplusplus >= 201103L
+explicit out_of_range(const char* __arg);
+#endif
 virtual ~out_of_range() _GLIBCXX_USE_NOEXCEPT;
   };
 
@@ -115,9 +130,12 @@
 
   public:
 /** Takes a character string describing the error.  */
-explicit 
+explicit
 runtime_error(const string& __arg);
-
+#if __cplusplus >= 201103L
+explicit
+runtime_error(const char* __arg);
+#endif
 virtual ~runtime_error() _GLIBCXX_USE_NOEXCEPT;
 
 /** Returns a C-style character string describing the general cause of
@@ -131,6 +149,9 @@
   {
   public:
 explicit range_error(const string& __arg);
+#if __cplusplus >= 201103L
+explicit range_error(const char* __arg);
+#endif
 virtual ~range_error() _GLIBCXX_USE_NOEXCEPT;
   };
 
@@ -139,6 +160,9 @@
   {
   public:
 explicit overflow_error(const string& __arg);
+#if __cplusplus >= 201103L
+explicit overflow_error(const char* __arg);
+#endif
 virtual ~overflow_error() _GLIBCXX_USE_NOEXCEPT;
   };
 
@@ -147,6 +171,9 @@
   {
   public:
 explicit underflow_error(const string& __arg);
+#if __cplusplus >= 201103L
+explicit underflow_error(const char* __arg);
+#endif
 virtual ~underflow_error() _GLIBCXX_USE_NOEXCEPT;
   };
 
Index: libstdc++-v3/src/c++98/stdexcept.cc
===
--- libstdc++-v3/src/c++98/stdexcept.cc	(revision 206101)
+++ libstdc++-v3/src/c++98/stdexcept.cc	(working copy)
@@ -36,6 +36,11 @@
   logic_error::logic_error(const string& __arg)
   : exception(), _M_msg(__arg) { }
 
+#if __cplusplus >= 201103L
+  logic_error::logic_error(const char* __arg)
+  : exception(), _M_msg(__arg) { }
+#endif
+
   logic_error::~logic_error() _GLIBCXX_USE_NOEXCEPT { }
 
   const char*
@@ -45,26 +50,51 @@
   domain_error::domain_error(const string& __arg)
   : logic_error(__arg) { }
 
+#if __cplusplus >= 201103L
+  domain_error::domain_error(const char* __arg)
+  : logic_error(__arg) { }
+#endif
+
   domain_error::~domain_error() _GLIBCXX_USE_NOEXCEPT { }
 
   invalid_argument::invalid_argument(const string& __arg)
   : logic_error(__arg) { }
 
+#if __cplusplus >= 201103L
+  invalid_argument::invalid_argument(const char* __arg)
+  : logic_error(__arg) { }
+#endif
+
   invalid_argument::~invalid_argument() _GLIBCXX_USE_NOEXCEPT { }
 
   length_error::length_error(const string& __arg)
   : logic_error(__arg) { }
 
+#if __cplusplus >= 201103L
+  length_error::length_error(const char* __arg)
+  : logic_error(__arg) { }
+#endif
+
   length_error::~length_error() _GLIBCXX_USE_NOEXCEPT { }
 
   out_of_range::out_of_range(const string& __arg)
   : logic_error(__arg) { }
 
+#if _

Re: PATCH: PR driver/59321: -fuse-ld has no effect on -print-prog-name nor on --with-ld=

2013-12-18 Thread Joseph S. Myers
On Mon, 2 Dec 2013, H.J. Lu wrote:

> @@ -3952,6 +3955,10 @@ process_command (unsigned int decoded_options_count,
>free (fname);
> continue;
>   }
> +  else if (decoded_options[j].opt_index == OPT_fuse_ld_bfd)
> + use_ld = ".bfd";
> +  else if (decoded_options[j].opt_index == OPT_fuse_ld_gold)
> + use_ld = ".gold";

Is there a reason these options need handling there rather than in the 
switch statement in driver_handle_option?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Add const char* constructors for exception classes in

2013-12-18 Thread Jonathan Wakely
On 19 December 2013 00:10, Oleg Endo wrote:
> Hello,
>
> When writing code such as
> ...
>   throw std::logic_error ("cold coffee");
> ...
> currently the construction of std::string happens in the code that
> throws the exception, which results in code bloat.  Implementing the
> const char* constructors as defined by C++11 fixes the issue.
> I'm not sure whether the #if __cplusplus >= 201103L checks are required.
> C++98 code could also benefit from the overloads.

I think there was some good reason we haven't added these yet, but I
can't remember it.

> Tested with 'make all' and 'make install', writing a hello world and
> checking the asm output.

For all patches we need to know that the libstdc++ testsuite passes too.


Re: PATCH: PR driver/59321: -fuse-ld has no effect on -print-prog-name nor on --with-ld=

2013-12-18 Thread H.J. Lu
On Wed, Dec 18, 2013 at 4:13 PM, Joseph S. Myers
 wrote:
> On Mon, 2 Dec 2013, H.J. Lu wrote:
>
>> @@ -3952,6 +3955,10 @@ process_command (unsigned int decoded_options_count,
>>free (fname);
>> continue;
>>   }
>> +  else if (decoded_options[j].opt_index == OPT_fuse_ld_bfd)
>> + use_ld = ".bfd";
>> +  else if (decoded_options[j].opt_index == OPT_fuse_ld_gold)
>> + use_ld = ".gold";
>
> Is there a reason these options need handling there rather than in the
> switch statement in driver_handle_option?
>

It is because driver_handle_option isn't called for -fuse-ld=gold:

Starting program: /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B./ -print-prog-name=ld -fuse-ld=gold

Breakpoint 5, driver_handle_option (opts=0x6e5580 ,
opts_set=0x6e6020 , decoded=0x6fbab0,
lang_mask=524288, kind=0, loc=0, handlers=0x7fffddc0,
dc=0x6e6b00 )
at /export/gnu/import/git/gcc/gcc/gcc.c:3291
3291  size_t opt_index = decoded->opt_index;
$5 = {opt_index = 116, warn_message = 0x0, arg = 0x7fffe4f8 "./",
  orig_option_with_args_text = 0x6fa410 "-B./", canonical_option = {
0x495bcc "-B", 0x7fffe4f8 "./", 0x0, 0x0},
  canonical_option_num_elements = 2, value = 1, errors = 0}
(gdb) c
Continuing.

Breakpoint 5, driver_handle_option (opts=0x6e5580 ,
opts_set=0x6e6020 , decoded=0x6fbb00,
lang_mask=524288, kind=0, loc=0, handlers=0x7fffddc0,
dc=0x6e6b00 )
at /export/gnu/import/git/gcc/gcc/gcc.c:3291
3291  size_t opt_index = decoded->opt_index;
$6 = {opt_index = 1212, warn_message = 0x0, arg = 0x7fffe50c "ld",
  orig_option_with_args_text = 0x6fa440 "-print-prog-name=ld",
  canonical_option = {0x6fa420 "-print-prog-name=ld", 0x0, 0x0, 0x0},
  canonical_option_num_elements = 1, value = 1, errors = 0}
(gdb) c
Continuing.
ld

Breakpoint 3, 0x003a91a39290 in exit () from /lib64/libc.so.6
(gdb)


-- 
H.J.


RE: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-18 Thread Gopalasubramanian, Ganesh
> Yes, I changed that in the last patch, though I consider it momentarily
> problematic because you do not yet enable AVX with march=btver2 (AVX versions
> would currently be better than btver2 versions for a btver2 arch), but expect
> march=btver2 will be fixed soon.

The "processor_alias_table" entry for "btver2" in i386.c enables AVX.


  {"btver2", PROCESSOR_BTVER2, CPU_BTVER2,
PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
| PTA_BMI | PTA_F16C | PTA_MOVBE | PTA_PRFCHW
| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},


The assembly listing for a simple test (compiled with -march=btver2) also has 
-mavx enabled. So, can you please enable AVX for btver2?

Regards
Ganesh



Re: [GOMP4][PATCH] SIMD-enabled functions (formerly Elemental functions) for C++

2013-12-18 Thread Jakub Jelinek
On Wed, Dec 18, 2013 at 11:36:04PM +, Iyer, Balaji V wrote:
> --- a/gcc/cp/decl2.c
> +++ b/gcc/cp/decl2.c
> @@ -1124,6 +1124,10 @@ is_late_template_attribute (tree attr, tree decl)
>&& is_attribute_p ("omp declare simd", name))
>  return true;
>  
> +  /* Ditto as above for Cilk Plus SIMD-enabled function attributes.  */
> +  if (flag_enable_cilkplus && is_attribute_p ("cilk simd function", name))
> +return true;

Why?  It doesn't have any argument, why it should be processed late?

> @@ -17097,6 +17102,14 @@ cp_parser_direct_declarator (cp_parser* parser,
>  
> attrs = cp_parser_std_attribute_spec_seq (parser);
>  
> +   /* In here, we handle cases where attribute is used after
> +  the function declaration.  For example:
> +  void func (int x) __attribute__((vector(..)));  */
> +   if (flag_enable_cilkplus
> +   && cp_lexer_next_token_is_keyword (parser->lexer,
> +  RID_ATTRIBUTE))
> + attrs = chainon (cp_parser_gnu_attributes_opt (parser),
> +  attrs);
> late_return = (cp_parser_late_return_type_opt
>(parser, declarator,
> memfn ? cv_quals : -1));

Doesn't this change the grammar (for all attributes, not just Cilk+ specific
ones) just based on whether -fcilkplus has been specified or not?

> @@ -17820,10 +17833,14 @@ cp_parser_late_return_type_opt (cp_parser* parser, cp_declarator *declarator,
>&& declarator
>&& declarator->kind == cdk_id);
>  
> +  bool cilk_simd_fn_vector_p = (parser->cilk_simd_fn_info
> +&& declarator
> +&& declarator->kind == cdk_id);

Formatting looks wrong, put = on the next line and align && right below
parser.
> +
> +cp_omp_declare_simd_data info;

Global var?  Why?  Isn't heap or GC allocation better?
> +  /* The vectorlength clause in #pragma simdbehaves exactly like OpenMP's

Missing space after simd

> @@ -8602,9 +8602,12 @@ apply_late_template_attributes (tree *decl_p, tree attributes, int attr_flags,
>   {
> *p = TREE_CHAIN (t);
> TREE_CHAIN (t) = NULL_TREE;
> -   if (flag_openmp
> -   && is_attribute_p ("omp declare simd",
> -  get_attribute_name (t))
> +   if (((flag_openmp
> + && is_attribute_p ("omp declare simd",
> +get_attribute_name (t)))
> +|| (flag_enable_cilkplus
> +&& is_attribute_p ("cilk simd function",
> +   get_attribute_name (t

Again, why this?

Jakub