[PATCH i386 11/8] [AVX512] Add missing packed PF gathers/scatters, rename load/store.

2014-01-13 Thread Kirill Yukhin
Hello,
This patch introduces missing AVX-512PF intrinsics and tests.
It also renames store/load intrinsics according to EAS.

gcc/
* config/i386/avx512fintrin.h (_mm512_loadu_si512): Rename.
(_mm512_storeu_si512): Ditto.
* config/i386/avx512pfintrin.h (_mm512_mask_prefetch_i32gather_pd): New.
(_mm512_mask_prefetch_i64gather_pd): Ditto.
(_mm512_prefetch_i32scatter_pd): Ditto.
(_mm512_mask_prefetch_i32scatter_pd): Ditto.
(_mm512_prefetch_i64scatter_pd): Ditto.
(_mm512_mask_prefetch_i64scatter_pd): Ditto.
(_mm512_mask_prefetch_i32gather_ps): Fix operand type.
(_mm512_mask_prefetch_i64gather_ps): Ditto.
(_mm512_prefetch_i32scatter_ps): Ditto.
(_mm512_mask_prefetch_i32scatter_ps): Ditto.
(_mm512_prefetch_i64scatter_ps): Ditto.
(_mm512_mask_prefetch_i64scatter_ps): Ditto.
* config/i386/i386-builtin-types.def: Define
VOID_FTYPE_QI_V8SI_PCINT64_INT_INT and 
VOID_FTYPE_QI_V8DI_PCINT64_INT_INT.
* config/i386/i386.c (ix86_builtins): Define IX86_BUILTIN_GATHERPFQPD,
IX86_BUILTIN_GATHERPFDPD, IX86_BUILTIN_SCATTERPFDPD,
IX86_BUILTIN_SCATTERPFQPD.
(ix86_init_mmx_sse_builtins): Define __builtin_ia32_gatherpfdpd,
__builtin_ia32_gatherpfdps, __builtin_ia32_gatherpfqpd,
__builtin_ia32_gatherpfqps, __builtin_ia32_scatterpfdpd,
__builtin_ia32_scatterpfdps, __builtin_ia32_scatterpfqpd,
__builtin_ia32_scatterpfqps.
(ix86_expand_builtin): Expand new built-ins.
* config/i386/sse.md (avx512pf_gatherpf): Add SF suffix,
fix memory access data type.
(*avx512pf_gatherpf_mask): Ditto.
(*avx512pf_gatherpf): Ditto.
(avx512pf_scatterpf): Ditto.
(*avx512pf_scatterpf_mask): Ditto.
(*avx512pf_scatterpf): Ditto.
(avx512pf_gatherpfdf): New.
(*avx512pf_gatherpfdf_mask): Ditto.
(*avx512pf_gatherpfdf): Ditto.
(avx512pf_scatterpfdf): Ditto.
(*avx512pf_scatterpfdf_mask): Ditto.
(*avx512pf_scatterpfdf): Ditto.

testsuite/
* gcc.target/i386/avx512f-vmovdqu32-1.c: Fix intrinsic name.
* gcc.target/i386/avx512f-vmovdqu32-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpd-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpq-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpud-2.c: Ditto.
* gcc.target/i386/avx512f-vpcmpuq-2.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Ditto.
* gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Ditto.
* gcc.target/i386/sse-14.c: Add new built-ins, fix AVX-512ER
built-ins rounding immediate.
* gcc.target/i386/sse-22.c: Add new built-ins.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx-1.c: Ditto.

I have doubts about the changes to sse.md.
I've split the existing (SF-only) patterns in two: DF and SF.
Since the insn operands and the final instruction have no such data-type
discrimination, I attached this data type to the (mem:..) part.
Having this (for SF):
  (define_expand "avx512pf_scatterpfsf"
[(unspec
   [(match_operand: 0 "register_or_constm1_operand")
(mem:SF
  ...

instead of this:
  (define_expand "avx512pf_scatterpf"
[(unspec
   [(match_operand: 0 "register_or_constm1_operand")
(mem:
  ...

Not sure if this (DI/SI) mode for mem is needed. Moreover, not sure what
that data type represents.

The patch is at the bottom. AVX* and SSE* tests pass.

Comments, or is it OK for trunk?

--
Thanks, K

---
 gcc/config/i386/avx512fintrin.h|   4 +-
 gcc/config/i386/avx512pfintrin.h   | 113 --
 gcc/config/i386/i386-builtin-types.def |   2 +
 gcc/config/i386/i386.c |  37 -
 gcc/config/i386/sse.md | 171 +++--
 gcc/testsuite/gcc.target/i386/avx-1.c  |   4 +
 .../gcc.target/i386/avx512f-vmovdqu32-1.c  |   4 +-
 .../gcc.target/i386/avx512f-vmovdqu32-2.c  |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpd-2.c   |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpq-2.c   |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpud-2.c  |   4 +-
 gcc/testsuite/gcc.target/i386/avx512f-vpcmpuq-2.c  |   4 +-
 .../gcc.target/i386/avx512pf-vgatherpf0dpd-1.c |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf0qpd-1.c |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1dpd-1.c |  15 ++
 .../gcc.target/i386/avx512pf-vgatherpf1qpd-1.c |  15 ++
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c|  17 ++
 .../gcc.tar

Re: [PATCH] Fixing PR59006 and PR58921 by delaying loop invariant hoisting in vectorizer.

2014-01-13 Thread Cong Hou
I noticed that LIM could not hoist vector invariants, which is why my
first implementation tried to hoist them all.

In addition, the hoist-invariant-load-then-LIM method has two
disadvantages:

First, for some operations the scalar version is faster than the
vector version, and in those cases hoisting the scalar instructions
before vectorization is better. Such operations include data
packing/unpacking, integer multiplication with SSE2, etc.

Second, it may use more SIMD registers.

The following code shows a simple example:

char *a, *b, *c;
for (int i = 0; i < N; ++i)
  a[i] = b[0] * c[0] + a[i];

Vectorizing b[0]*c[0] is worse than loading the result of b[0]*c[0]
into a vector.


thanks,
Cong


On Mon, Jan 13, 2014 at 5:37 AM, Richard Biener  wrote:
> On Wed, 27 Nov 2013, Jakub Jelinek wrote:
>
>> On Wed, Nov 27, 2013 at 10:53:56AM +0100, Richard Biener wrote:
>> > Hmm.  I'm still thinking that we should handle this during the regular
>> > transform step.
>>
>> I wonder if it can't be done instead just in vectorizable_load,
>> if LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) and the load is
>> invariant, just emit the (broadcasted) load not inside of the loop, but on
>> the loop preheader edge.
>
> So this implements this suggestion, XFAILing the no longer handled cases.
> For example we get
>
>   _94 = *b_8(D);
>   vect_cst_.18_95 = {_94, _94, _94, _94};
>   _99 = prolog_loop_adjusted_niters.9_132 * 4;
>   vectp_a.22_98 = a_6(D) + _99;
>   ivtmp.43_77 = (unsigned long) vectp_a.22_98;
>
>   :
>   # ivtmp.41_67 = PHI 
>   # ivtmp.43_71 = PHI 
>   vect__10.19_97 = vect_cst_.18_95 + { 1, 1, 1, 1 };
>   _76 = (void *) ivtmp.43_71;
>   MEM[base: _76, offset: 0B] = vect__10.19_97;
>
> ...
>
> instead of having hoisted *b_8 + 1 as scalar computation.  Not sure
> why LIM doesn't hoist the vector variant later.
>
> vect__10.19_97 = vect_cst_.18_95 + vect_cst_.20_96;
>   invariant up to level 1, cost 1.
>
> ah, the cost thing.  Should be "improved" to see that hoisting
> reduces the number of live SSA names in the loop.
>
> Eventually lower_vector_ssa could optimize vector to scalar
> code again ... (ick).
>
> Bootstrap / regtest running on x86_64.
>
> Comments?
>
> Thanks,
> Richard.
>
> 2014-01-13  Richard Biener  
>
> PR tree-optimization/58921
> PR tree-optimization/59006
> * tree-vect-loop-manip.c (vect_loop_versioning): Remove code
> hoisting invariant stmts.
> * tree-vect-stmts.c (vectorizable_load): Insert the splat of
> invariant loads on the preheader edge if possible.
>
> * gcc.dg/torture/pr58921.c: New testcase.
> * gcc.dg/torture/pr59006.c: Likewise.
> * gcc.dg/vect/pr58508.c: XFAIL no longer handled cases.
>
> Index: gcc/tree-vect-loop-manip.c
> ===
> *** gcc/tree-vect-loop-manip.c  (revision 206576)
> --- gcc/tree-vect-loop-manip.c  (working copy)
> *** vect_loop_versioning (loop_vec_info loop
> *** 2435,2507 
> }
>   }
>
> -
> -   /* Extract load statements on memrefs with zero-stride accesses.  */
> -
> -   if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
> - {
> -   /* In the loop body, we iterate each statement to check if it is a 
> load.
> -Then we check the DR_STEP of the data reference.  If DR_STEP is zero,
> -then we will hoist the load statement to the loop preheader.  */
> -
> -   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> -   int nbbs = loop->num_nodes;
> -
> -   for (int i = 0; i < nbbs; ++i)
> -   {
> - for (gimple_stmt_iterator si = gsi_start_bb (bbs[i]);
> -  !gsi_end_p (si);)
> -   {
> - gimple stmt = gsi_stmt (si);
> - stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> - struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
> -
> - if (is_gimple_assign (stmt)
> - && (!dr
> - || (DR_IS_READ (dr) && integer_zerop (DR_STEP (dr)
> -   {
> - bool hoist = true;
> - ssa_op_iter iter;
> - tree var;
> -
> - /* We hoist a statement if all SSA uses in it are defined
> -outside of the loop.  */
> - FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
> -   {
> - gimple def = SSA_NAME_DEF_STMT (var);
> - if (!gimple_nop_p (def)
> - && flow_bb_inside_loop_p (loop, gimple_bb (def)))
> -   {
> - hoist = false;
> - break;
> -   }
> -   }
> -
> - if (hoist)
> -   {
> - if (dr)
> -   gimple_set_vuse (stmt, NULL);
> -
> - gsi_remove (&si, false);
> - gsi_i

[PATCH/AARCH64] Add issue_rate tuning field

2014-01-13 Thread Andrew Pinski
Hi,
  While writing a scheduler for Cavium's aarch64 processor (Thunder),
I found there was currently no way to change the issue rate in the
back-end.  This patch adds a field (issue_rate) to tune_params and
creates a new function that the middle-end calls.  I updated the
current two tuning structures (generic_tunings and cortexa53_tunings)
to use an issue rate of 1, which was the default before.

OK?  Built and tested for aarch64-elf with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/aarch64-protos.h (tune_params): Add issue_rate.
* config/aarch64/aarch64.c (generic_tunings): Add issue rate of 1.
 (cortexa53_tunings): Likewise.
(aarch64_sched_issue_rate): New function.
(TARGET_SCHED_ISSUE_RATE): Define.
Index: config/aarch64/aarch64-protos.h
===
--- config/aarch64/aarch64-protos.h (revision 206594)
+++ config/aarch64/aarch64-protos.h (working copy)
@@ -156,6 +156,7 @@ struct tune_params
   const struct cpu_regmove_cost *const regmove_cost;
   const struct cpu_vector_cost *const vec_costs;
   const int memmov_cost;
+  const int issue_rate;
 };
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
Index: config/aarch64/aarch64.c
===
--- config/aarch64/aarch64.c(revision 206594)
+++ config/aarch64/aarch64.c(working copy)
@@ -221,7 +221,8 @@ static const struct tune_params generic_
   &generic_addrcost_table,
   &generic_regmove_cost,
   &generic_vector_cost,
-  NAMED_PARAM (memmov_cost, 4)
+  NAMED_PARAM (memmov_cost, 4),
+  NAMED_PARAM (issue_rate, 1)
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -230,7 +231,8 @@ static const struct tune_params cortexa5
   &generic_addrcost_table,
   &generic_regmove_cost,
   &generic_vector_cost,
-  NAMED_PARAM (memmov_cost, 4)
+  NAMED_PARAM (memmov_cost, 4),
+  NAMED_PARAM (issue_rate, 1)
 };
 
 /* A processor implementing AArch64.  */
@@ -4895,6 +4897,13 @@ aarch64_memory_move_cost (enum machine_m
   return aarch64_tune_params->memmov_cost;
 }
 
+/* Return the number of instructions that can be issued per cycle.  */
+static int
+aarch64_sched_issue_rate (void)
+{
+  return aarch64_tune_params->issue_rate;
+}
+
 /* Vectorizer cost model target hooks.  */
 
 /* Implement targetm.vectorize.builtin_vectorization_cost.  */
@@ -8411,6 +8420,9 @@ aarch64_vectorize_vec_perm_const_ok (enu
 #undef TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS aarch64_rtx_costs
 
+#undef TARGET_SCHED_ISSUE_RATE
+#define TARGET_SCHED_ISSUE_RATE aarch64_sched_issue_rate
+
 #undef TARGET_TRAMPOLINE_INIT
 #define TARGET_TRAMPOLINE_INIT aarch64_trampoline_init
 


[PATCH,rs6000] Implement -maltivec=be for vec_mule and vec_mulo Altivec intrinsics

2014-01-13 Thread Bill Schmidt
This patch provides for interpreting parity of element numbers for the
Altivec vec_mule and vec_mulo intrinsics as big-endian (left to right in
a vector register) when targeting a little endian machine and specifying
-maltivec=be.  New test cases are added to test this functionality on
all supported vector types.

The main change is in the altivec.md define_insns for
vec_widen_{su}mult_{even,odd}_{v8hi,v16qi}, where we now test for
VECTOR_ELT_ORDER_BIG rather than BYTES_BIG_ENDIAN in order to treat the
element order as big-endian.  However, this necessitates changes to
other places in altivec.md where we previously called
gen_vec_widen_{su}mult_*.  The semantics of these internal uses are not
affected by -maltivec=be, so these are now replaced with direct
generation of the underlying instructions that were previously
generated.

Bootstrapped and tested with no new regressions on
powerpc64{,le}-unknown-linux-gnu.  Ok for trunk?

Thanks,
Bill


gcc:

2014-01-13  Bill Schmidt  

* config/rs6000/altivec.md (mulv8hi3): Explicitly generate vmulesh
and vmulosh rather than call gen_vec_widen_smult_*.
(vec_widen_umult_even_v16qi): Test VECTOR_ELT_ORDER_BIG rather
than BYTES_BIG_ENDIAN to determine use of even or odd instruction.
(vec_widen_smult_even_v16qi): Likewise.
(vec_widen_umult_even_v8hi): Likewise.
(vec_widen_smult_even_v8hi): Likewise.
(vec_widen_umult_odd_v16qi): Likewise.
(vec_widen_smult_odd_v16qi): Likewise.
(vec_widen_umult_odd_v8hi): Likewise.
(vec_widen_smult_odd_v8hi): Likewise.
(vec_widen_umult_hi_v16qi): Explicitly generate vmuleub and
vmuloub rather than call gen_vec_widen_umult_*.
(vec_widen_umult_lo_v16qi): Likewise.
(vec_widen_smult_hi_v16qi): Explicitly generate vmulesb and
vmulosb rather than call gen_vec_widen_smult_*.
(vec_widen_smult_lo_v16qi): Likewise.
(vec_widen_umult_hi_v8hi): Explicitly generate vmuleuh and vmulouh
rather than call gen_vec_widen_umult_*.
(vec_widen_umult_lo_v8hi): Likewise.
(vec_widen_smult_hi_v8hi): Explicitly generate vmulesh and vmulosh
rather than call gen_vec_widen_smult_*.
(vec_widen_smult_lo_v8hi): Likewise.

gcc/testsuite:

2014-01-13  Bill Schmidt  

* gcc.dg/vmx/mult-even-odd.c: New.
* gcc.dg/vmx/mult-even-odd-be-order.c: New.



Index: gcc/testsuite/gcc.dg/vmx/mult-even-odd.c
===
--- gcc/testsuite/gcc.dg/vmx/mult-even-odd.c(revision 0)
+++ gcc/testsuite/gcc.dg/vmx/mult-even-odd.c(revision 0)
@@ -0,0 +1,43 @@
+#include "harness.h"
+
+static void test()
+{
+  vector unsigned char vuca = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  vector unsigned char vucb = {2,3,2,3,2,3,2,3,2,3,2,3,2,3,2,3};
+  vector signed char vsca = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};
+  vector signed char vscb = {2,-3,2,-3,2,-3,2,-3,2,-3,2,-3,2,-3,2,-3};
+  vector unsigned short vusa = {0,1,2,3,4,5,6,7};
+  vector unsigned short vusb = {2,3,2,3,2,3,2,3};
+  vector signed short vssa = {-4,-3,-2,-1,0,1,2,3};
+  vector signed short vssb = {2,-3,2,-3,2,-3,2,-3};
+  vector unsigned short vuse, vuso;
+  vector signed short vsse, vsso;
+  vector unsigned int vuie, vuio;
+  vector signed int vsie, vsio;
+
+  vuse = vec_mule (vuca, vucb);
+  vuso = vec_mulo (vuca, vucb);
+  vsse = vec_mule (vsca, vscb);
+  vsso = vec_mulo (vsca, vscb);
+  vuie = vec_mule (vusa, vusb);
+  vuio = vec_mulo (vusa, vusb);
+  vsie = vec_mule (vssa, vssb);
+  vsio = vec_mulo (vssa, vssb);
+
+  check (vec_all_eq (vuse,
+((vector unsigned short){0,4,8,12,16,20,24,28})),
+"vuse");
+  check (vec_all_eq (vuso,
+((vector unsigned short){3,9,15,21,27,33,39,45})),
+"vuso");
+  check (vec_all_eq (vsse,
+((vector signed short){-16,-12,-8,-4,0,4,8,12})),
+"vsse");
+  check (vec_all_eq (vsso,
+((vector signed short){21,15,9,3,-3,-9,-15,-21})),
+"vsso");
+  check (vec_all_eq (vuie, ((vector unsigned int){0,4,8,12})), "vuie");
+  check (vec_all_eq (vuio, ((vector unsigned int){3,9,15,21})), "vuio");
+  check (vec_all_eq (vsie, ((vector signed int){-8,-4,0,4})), "vsie");
+  check (vec_all_eq (vsio, ((vector signed int){9,3,-3,-9})), "vsio");
+}
Index: gcc/testsuite/gcc.dg/vmx/mult-even-odd-be-order.c
===
--- gcc/testsuite/gcc.dg/vmx/mult-even-odd-be-order.c   (revision 0)
+++ gcc/testsuite/gcc.dg/vmx/mult-even-odd-be-order.c   (revision 0)
@@ -0,0 +1,64 @@
+/* { dg-options "-maltivec=be -mabi=altivec -std=gnu99 -mno-vsx" } */
+
+#include "harness.h"
+
+static void test()
+{
+  vector unsigned char vuca = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  vector unsigned char vucb = {2,3,2,3,2,3,2,3,2,3,2,3,2,3,2,3};
+  vector signed char vsca = {-8,-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7};
+  vect

Re: [PATCH] i?86 unaligned/aligned load improvement for AVX512F

2014-01-13 Thread Jakub Jelinek
On Mon, Jan 13, 2014 at 07:35:41PM +0100, Uros Bizjak wrote:
> Jakub, do you plan to submit this patch?

That would be following patch then, tested on x86_64-linux.
Unfortunately, it doesn't help for the avx512f-vmovdqu32-1.c
testcase, the thing is that the __m512i type is V8DImode and while
the emitted (unaligned) load is V16SImode, as it is then cast to
V8DImode, combiner combines it into V8DImode load and thus it is
vmovdqu64 anyway.  So not sure if this is worth it, your call...

But, while at it, is there any reason why we treat V64QImode and V32HImode
so badly?  As vec_initv64qi and vec_initv32hi aren't defined, e.g. for the
foo_1 in avx512f-vec-init.c we generate ~ 180 instructions when I'd say
vmovd   %edi, %xmm0
vpbroadcastb%xmm0, %xmm0
vpbroadcastq%xmm0, %zmm0
ret
would do the trick just fine.

2014-01-13  Jakub Jelinek  

* config/i386/sse.md (*mov_internal): Only use
vmovdqa64 or vmovdqu64 instructions for V?DImode, for other
MODE_VECT_INT modes use vmovdqa32 or vmovdqu32.

* gcc.target/i386/avx512f-vec-init.c: Expect vmovdqa32 instead
of vmovdqa64.

--- gcc/config/i386/sse.md.jj   2014-01-04 10:56:54.795976470 +0100
+++ gcc/config/i386/sse.md  2014-01-13 20:30:04.052499798 +0100
@@ -705,7 +705,14 @@ (define_insn "*mov_internal"
return "vmovapd\t{%g1, %g0|%g0, %g1}";
  case MODE_OI:
  case MODE_TI:
-   return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
+   switch (mode)
+ {
+ case V4DImode:
+ case V2DImode:
+   return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
+ default:
+   return "vmovdqa32\t{%g1, %g0|%g0, %g1}";
+ }
  default:
gcc_unreachable ();
  }
@@ -743,9 +750,16 @@ (define_insn "*mov_internal"
case MODE_XI:
  if (misaligned_operand (operands[0], mode)
  || misaligned_operand (operands[1], mode))
-   return "vmovdqu64\t{%1, %0|%0, %1}";
- else
+   {
+ if (mode == V8DImode)
+   return "vmovdqu64\t{%1, %0|%0, %1}";
+ else
+   return "vmovdqu32\t{%1, %0|%0, %1}";
+   }
+ else if (mode == V8DImode)
return "vmovdqa64\t{%1, %0|%0, %1}";
+ else
+   return "vmovdqa32\t{%1, %0|%0, %1}";
 
default:
  gcc_unreachable ();
--- gcc/testsuite/gcc.target/i386/avx512f-vec-init.c.jj 2013-12-31 
12:51:09.0 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-vec-init.c2014-01-13 
21:42:48.410415601 +0100
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -mavx512f" } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+%zmm" 2 } } */
+/* { dg-final { scan-assembler-times "vmovdqa32\[ \\t\]+%zmm" 2 } } */
 /* { dg-final { scan-assembler-times "vpbroadcastd" 1 } } */
 /* { dg-final { scan-assembler-times "vpbroadcastq" 1 } } */
 /* { dg-final { scan-assembler-times "vpbroadcastb" 2 } } */


Jakub


Re: [C PATCH] Disallow subtracting pointers to empty structs (PR c/58346)

2014-01-13 Thread Marek Polacek
On Mon, Jan 13, 2014 at 05:48:59PM +0100, Marek Polacek wrote:
> The patch will need some tweaking, I realized that e.g. for struct S {
> union {}; }; it doesn't do the right thing...

Done in the patch below.  CCing Jason for the C++ part.  Does this
look sane now?

Regtested/bootstrapped on x86_64.

2014-01-13  Marek Polacek  

PR c/58346
c-family/
* c-common.c (pointer_to_zero_sized_aggr_p): New function.
* c-common.h: Declare it.
cp/
* typeck.c (pointer_diff): Give an error on arithmetic on pointer to
an empty aggregate.
c/
* c-typeck.c (pointer_diff): Give an error on arithmetic on pointer to
an empty aggregate.
testsuite/
* c-c++-common/pr58346.c: New test.

--- gcc/c-family/c-common.h.mp  2014-01-13 19:02:22.249870601 +0100
+++ gcc/c-family/c-common.h 2014-01-13 19:04:15.068294390 +0100
@@ -789,6 +789,7 @@ extern bool keyword_is_storage_class_spe
 extern bool keyword_is_type_qualifier (enum rid);
 extern bool keyword_is_decl_specifier (enum rid);
 extern bool cxx_fundamental_alignment_p (unsigned);
+extern bool pointer_to_zero_sized_aggr_p (tree);
 
 #define c_sizeof(LOC, T)  c_sizeof_or_alignof_type (LOC, T, true, false, 1)
 #define c_alignof(LOC, T) c_sizeof_or_alignof_type (LOC, T, false, false, 1)
--- gcc/c-family/c-common.c.mp  2014-01-13 19:01:20.503637616 +0100
+++ gcc/c-family/c-common.c 2014-01-13 19:42:32.805135382 +0100
@@ -11829,4 +11829,17 @@ cxx_fundamental_alignment_p  (unsigned a
 TYPE_ALIGN (long_double_type_node)));
 }
 
+/* Return true if T is a pointer to a zero-sized struct/union.  */
+
+bool
+pointer_to_zero_sized_aggr_p (tree t)
+{
+  t = strip_pointer_operator (t);
+  if (RECORD_OR_UNION_TYPE_P (t)
+  && TYPE_SIZE (t)
+  && integer_zerop (TYPE_SIZE (t)))
+return true;
+  return false;
+}
+
 #include "gt-c-family-c-common.h"
--- gcc/cp/typeck.c.mp  2014-01-13 19:08:12.237244663 +0100
+++ gcc/cp/typeck.c 2014-01-13 19:10:23.350742070 +0100
@@ -5043,6 +5043,14 @@ pointer_diff (tree op0, tree op1, tree p
return error_mark_node;
 }
 
+  if (pointer_to_zero_sized_aggr_p (TREE_TYPE (op1)))
+{
+  if (complain & tf_error)
+   error ("arithmetic on pointer to an empty aggregate");
+  else
+   return error_mark_node;
+}
+
   op1 = (TYPE_PTROB_P (ptrtype)
 ? size_in_bytes (target_type)
 : integer_one_node);
--- gcc/c/c-typeck.c.mp 2014-01-13 15:47:01.316105676 +0100
+++ gcc/c/c-typeck.c2014-01-13 19:58:19.237271626 +0100
@@ -3536,6 +3536,9 @@ pointer_diff (location_t loc, tree op0,
   /* This generates an error if op0 is pointer to incomplete type.  */
   op1 = c_size_in_bytes (target_type);
 
+  if (pointer_to_zero_sized_aggr_p (TREE_TYPE (orig_op1)))
+error_at (loc, "arithmetic on pointer to an empty aggregate");
+
   /* Divide by the size, in easiest possible way.  */
   result = fold_build2_loc (loc, EXACT_DIV_EXPR, inttype,
op0, convert (inttype, op1));
--- gcc/testsuite/c-c++-common/pr58346.c.mp 2014-01-13 15:48:20.011420141 
+0100
+++ gcc/testsuite/c-c++-common/pr58346.c2014-01-13 20:25:17.544582444 
+0100
@@ -0,0 +1,24 @@
+/* PR c/58346 */
+/* { dg-do compile } */
+
+struct U {
+#ifdef __cplusplus
+  char a[0];
+#endif
+};
+static struct U b[6];
+static struct U **u1, **u2;
+
+int
+foo (struct U *p, struct U *q)
+{
+  return q - p; /* { dg-error "arithmetic on pointer to an empty aggregate" } 
*/
+}
+
+void
+bar (void)
+{
+  __PTRDIFF_TYPE__ d = u1 - u2; /* { dg-error "arithmetic on pointer to an 
empty aggregate" } */
+  __asm volatile ("" : "+g" (d));
+  foo (&b[0], &b[4]);
+}

Marek


Re: [PATCH,rs6000] Implement -maltivec=be for vec_insert and vec_extract Altivec intrinsics

2014-01-13 Thread David Edelsohn
On Sun, Jan 12, 2014 at 7:53 PM, Bill Schmidt
 wrote:
> This patch provides for interpreting element numbers for the Altivec
> vec_insert and vec_extract intrinsics as big-endian (left to right in a
> vector register) when targeting a little endian machine and specifying
> -maltivec=be.  New test cases are added to test this functionality on
> all supported vector types.
>
> Bootstrapped and tested with no new regressions on
> powerpc64{,le}-unknown-linux-gnu.  Ok for trunk?
>
> Thanks,
> Bill
>
>
> gcc:
>
> 2014-01-12  Bill Schmidt  
>
> * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> Implement -maltivec=be for vec_insert and vec_extract.
>
> gcc/testsuite:
>
> 2014-01-12  Bill Schmidt  
>
> * gcc.dg/vmx/insert.c: New.
> * gcc.dg/vmx/insert-be-order.c: New.
> * gcc.dg/vmx/extract.c: New.
> * gcc.dg/vmx/extract-be-order.c: New.


> +  if (!BYTES_BIG_ENDIAN && rs6000_altivec_element_order == 2)
> +   {
> + int last_elem = TYPE_VECTOR_SUBPARTS (arg1_type) - 1;
> + double_int di_last_elem = double_int::from_uhwi (last_elem);
> + arg2 = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (arg2),
> + double_int_to_tree (TREE_TYPE (arg2),
> + di_last_elem),
> + arg2);
> +   }

Please change last_elem to unsigned int in both blocks of code.  And I
believe that GCC provides a more direct API to create a Tree from
last_elem than the double_int::from_uhwi() and double_int_to_tree()
dance because it seems that the value is constant for each instance.
build_int_cstu()?

Okay with those changes.

Thanks, David


Re: Fix tree containers debug mode C++11 allocator awareness

2014-01-13 Thread François Dumont

On 12/22/2013 09:55 PM, François Dumont wrote:

On 12/22/2013 12:51 PM, Jonathan Wakely wrote:

On 21 December 2013 08:51, François Dumont wrote:

Any feedback for this proposal ?
It looks good but I don't have time to review it fully yet, please be 
patient.


I'm more concerned about your comment about the non-debug mode
implementation being incorrect, could you provide more details?
.

That's not a big issue. The constructor taking an rvalue reference 
and an allocator doesn't take care of safe iterators. They should 
be swapped, as in the move constructor, when the allocators are 
equivalent, and invalidated if we have not been able to move the 
memory. I plan to submit a patch to fix all implementations the same 
way at once, but I can include it in this patch if you prefer.




Following agreement given here:

http://gcc.gnu.org/ml/libstdc++/2014-01/msg00066.html

Attached patch applied.

Profile mode will need the same kind of patch too.

2014-01-13  François Dumont  

* include/debug/set.h (set): Implement C++11 allocator-aware
container requirements.
* include/debug/map.h (map): Likewise.
* include/debug/multiset.h (multiset): Likewise.
* include/debug/multimap.h (multimap): Likewise.
* include/debug/set.h (set::operator=(set&&)): Add noexcept and
fix implementation regarding management of safe iterators.
* include/debug/map.h (map::operator=(map&&)): Likewise.
* include/debug/multiset.h (multiset::operator=(multiset&&)): Likewise.
* include/debug/multimap.h (multimap::operator=(multimap&&)):
Likewise.
* include/debug/set.h (set::operator=(std::initializer_list<>)):
Rely on the same operator from normal mode.
* include/debug/map.h (map::operator=(std::initializer_list<>)):
Likewise.
* include/debug/multiset.h
(multiset::operator=(std::initializer_list<>)): Likewise.
* include/debug/multimap.h
(multimap::operator=(std::initializer_list<>)): Likewise.
* include/debug/set.h (set::swap(set&)): Add noexcept
specification, add allocator equality check.
* include/debug/map.h (map::swap(map&)): Likewise.
* include/debug/multiset.h (multiset::swap(multiset&)): Likewise.
* include/debug/multimap.h (multimap::swap(multimap&)): Likewise.

François

Index: include/debug/set.h
===
--- include/debug/set.h	(revision 206587)
+++ include/debug/set.h	(working copy)
@@ -49,6 +49,10 @@
   typedef typename _Base::const_iterator _Base_const_iterator;
   typedef typename _Base::iterator _Base_iterator;
   typedef __gnu_debug::_Equal_to<_Base_const_iterator> _Equal;
+#if __cplusplus >= 201103L
+  typedef __gnu_cxx::__alloc_traits _Alloc_traits;
+#endif
 public:
   // types:
   typedef _Keykey_type;
@@ -101,6 +105,28 @@
 	  const _Compare& __comp = _Compare(),
 	  const allocator_type& __a = allocator_type())
   : _Base(__l, __comp, __a) { }
+
+  explicit
+  set(const allocator_type& __a)
+  : _Base(__a) { }
+
+  set(const set& __x, const allocator_type& __a)
+  : _Base(__x, __a) { }
+
+  set(set&& __x, const allocator_type& __a)
+  : _Base(std::move(__x._M_base()), __a) { }
+
+  set(initializer_list __l, const allocator_type& __a)
+	: _Base(__l, __a)
+  { }
+
+  template
+set(_InputIterator __first, _InputIterator __last,
+	const allocator_type& __a)
+	: _Base(__gnu_debug::__base(__gnu_debug::__check_valid_range(__first,
+ __last)),
+		__gnu_debug::__base(__last), __a)
+{ }
 #endif
 
   ~set() _GLIBCXX_NOEXCEPT { }
@@ -108,7 +134,7 @@
   set&
   operator=(const set& __x)
   {
-	*static_cast<_Base*>(this) = __x;
+	_M_base() = __x;
 	this->_M_invalidate_all();
 	return *this;
   }
@@ -116,20 +142,25 @@
 #if __cplusplus >= 201103L
   set&
   operator=(set&& __x)
+  noexcept(_Alloc_traits::_S_nothrow_move())
   {
-	// NB: DR 1204.
-	// NB: DR 675.
 	__glibcxx_check_self_move_assign(__x);
-	clear();
-	swap(__x);
+	bool xfer_memory = _Alloc_traits::_S_propagate_on_move_assign()
+	|| __x.get_allocator() == this->get_allocator();
+	_M_base() = std::move(__x._M_base());
+	if (xfer_memory)
+	  this->_M_swap(__x);
+	else
+	  this->_M_invalidate_all();
+	__x._M_invalidate_all();
 	return *this;
   }
 
   set&
   operator=(initializer_list __l)
   {
-	this->clear();
-	this->insert(__l);
+	_M_base() = __l;
+	this->_M_invalidate_all();
 	return *this;
   }
 #endif
@@ -337,7 +368,14 @@
 
   void
   swap(set& __x)
+#if __cplusplus >= 201103L
+  noexcept(_Alloc_traits::_S_nothrow_swap())
+#endif
   {
+#if __cplusplus >= 201103L
+	if (!_Alloc_traits::_S_propagate_on_swap())
+	  __glibcxx_check_equal_allocs(__x);
+#endif
 	_Base::swap(__x);
 	this->_M_swap(__x);
   }
Index: include/debug/map.h
===
--- include/debug/map.h	(revision 206587)
++

PATCH: PR middle-end/59789: [4.9 Regression] ICE in in convert_move, at expr.c:333

2014-01-13 Thread H.J. Lu
Hi,

We should report some early inlining errors.  This patch is based on

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57698#c7

It adds report_early_inliner_always_inline_failure and uses it in
expand_call_inline.  Tested on Linux/x86-64. OK to install?

Thanks.


H.J.

commit 7b18b53d308b2c25bef5664be3e6544249d86bdc
Author: H.J. Lu 
Date:   Mon Jan 13 11:54:36 2014 -0800

Update error handling during early_inlining

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5c674bc..284bc66 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2014-01-13  Sriraman Tallam  
+   H.J. Lu  
+
+   PR middle-end/59789
+   * tree-inline.c (report_early_inliner_always_inline_failure): New
+   function.
+   (expand_call_inline): Emit errors during early_inlining if
+   report_early_inliner_always_inline_failure returns true.
+
 2014-01-10  DJ Delorie  
 
* config/msp430/msp430.md (call_internal): Don't allow memory
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 459e365..2a7b3ca 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-01-13  H.J. Lu  
+
+   PR middle-end/59789
+   * gcc.target/i386/pr59789.c: New testcase.
+
 2014-01-13  Jakub Jelinek  
 
PR tree-optimization/59387
diff --git a/gcc/testsuite/gcc.target/i386/pr59789.c 
b/gcc/testsuite/gcc.target/i386/pr59789.c
new file mode 100644
index 000..b476d6c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr59789.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ia32 } */
+/* { dg-options "-O -march=i686" } */
+
+#pragma GCC push_options
+#pragma GCC target("sse2")
+typedef int __v4si __attribute__ ((__vector_size__ (16)));
+typedef long long __m128i __attribute__ ((__vector_size__ (16), 
__may_alias__));
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
+_mm_set_epi32 (int __q3, int __q2, int __q1, int __q0) /* { dg-error "target 
specific option mismatch" } */
+{
+  return __extension__ (__m128i)(__v4si){ __q0, __q1, __q2, __q3 };
+}
+#pragma GCC pop_options
+
+
+__m128i
+f1(void) /* { dg-message "warning: SSE vector return without SSE enabled 
changes the ABI" } */
+{
+  return _mm_set_epi32 (0, 0, 0, 0); /* { dg-error "called from here" } */
+}
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 22521b1..ce1e3af 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4046,6 +4046,32 @@ add_local_variables (struct function *callee, struct 
function *caller,
   }
 }
 
+/* Should an error be reported when early inliner fails to inline an
+   always_inline function?  That depends on the REASON.  */
+
+static inline bool
+report_early_inliner_always_inline_failure (cgraph_inline_failed_t reason)
+{
+  /* Only the following reasons need to be reported when the early inliner
+ fails to inline an always_inline function.  Called from
+ expand_call_inline.  */
+  switch (reason)
+{
+case CIF_BODY_NOT_AVAILABLE:
+case CIF_FUNCTION_NOT_INLINABLE:
+case CIF_OVERWRITABLE:
+case CIF_MISMATCHED_ARGUMENTS:
+case CIF_EH_PERSONALITY:
+case CIF_UNSPECIFIED:
+case CIF_NON_CALL_EXCEPTIONS:
+case CIF_TARGET_OPTION_MISMATCH:
+case CIF_OPTIMIZATION_MISMATCH:
+  return true;
+default:
+  return false;
+}
+}
+
 /* If STMT is a GIMPLE_CALL, replace it with its inline expansion.  */
 
 static bool
@@ -4116,7 +4142,8 @@ expand_call_inline (basic_block bb, gimple stmt, copy_body_data *id)
  /* During early inline pass, report only when optimization is
 not turned on.  */
  && (cgraph_global_info_ready
- || !optimize)
+ || !optimize
+ || report_early_inliner_always_inline_failure (reason))
  /* PR 20090218-1_0.c. Body can be provided by another module. */
  && (reason != CIF_BODY_NOT_AVAILABLE || !flag_generate_lto))
{


[Patch,AArch64] Support SISD variants of SCVTF,UCVTF

2014-01-13 Thread Vidya Praveen
Hello,

This patch adds support for the SISD variants of the SCVTF/UCVTF instructions.
It also refactors the existing support for the floating-point variants of
SCVTF/UCVTF so that instruction selection is directed by the constraints.
Given that the floating-point variants support unequal-width conversions
(SI to DF and DI to SF), new mode attributes w1 and w2 have been introduced,
and fcvt_target/FCVT_TARGET have been extended to support non-vector types.
Since this patch changes the existing patterns, the testcase covers both the
SISD and the floating-point variants of the instructions.
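As a reference for the conversions involved, here is a plain-C sketch of the semantics only — it does not, of course, check which instruction variant gets selected; that is what the new cvtf_1.c testcase does with scan-assembler:

```c
#include <assert.h>

/* Equal-width conversions (SI->SF, DI->DF): scvtf/ucvtf s<-w, d<-x.  */
static float  si_to_sf (int x)        { return (float) x; }
static double di_to_df (long long x)  { return (double) x; }

/* Unequal-width conversions (SI->DF, DI->SF), the cases the new
   w1/w2 mode attributes describe.  */
static double si_to_df (int x)        { return (double) x; }
static float  di_to_sf (long long x)  { return (float) x; }

/* Unsigned counterpart (ucvtf).  */
static double ui_to_df (unsigned int x) { return (double) x; }
```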

Tested for aarch64-none-elf.

OK for trunk?

Cheers
VP.

gcc/ChangeLog:

2013-01-13  Vidya Praveen  

* aarch64.md (float2): Remove.
(floatuns2): Remove.
(2): New pattern for equal width float
and floatuns conversions.
(2): New pattern for unequal width float
and floatuns conversions.
* iterators.md (fcvt_target, FCVT_TARGET): Support SF and DF modes.
(w1,w2): New mode attributes for unequal width conversions.

gcc/testsuite/ChangeLog:

2013-01-13  Vidya Praveen  

* gcc.target/aarch64/cvtf_1.c: New.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c83622d..1775849 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3295,20 +3295,24 @@
   [(set_attr "type" "f_cvtf2i")]
 )
 
-(define_insn "float2"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-(float:GPF (match_operand:GPI 1 "register_operand" "r")))]
-  "TARGET_FLOAT"
-  "scvtf\\t%0, %1"
-  [(set_attr "type" "f_cvti2f")]
+(define_insn "2"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w")
+(FLOATUORS:GPF (match_operand: 1 "register_operand" "w,r")))]
+  ""
+  "@
+   cvtf\t%0, %1
+   cvtf\t%0, %1"
+  [(set_attr "simd" "yes,no")
+   (set_attr "fp" "no,yes")
+   (set_attr "type" "neon_int_to_fp_,f_cvti2f")]
 )
 
-(define_insn "floatuns2"
+(define_insn "2"
   [(set (match_operand:GPF 0 "register_operand" "=w")
-(unsigned_float:GPF (match_operand:GPI 1 "register_operand" "r")))]
+(FLOATUORS:GPF (match_operand: 1 "register_operand" "r")))]
   "TARGET_FLOAT"
-  "ucvtf\\t%0, %1"
-  [(set_attr "type" "f_cvt")]
+  "cvtf\t%0, %1"
+  [(set_attr "type" "f_cvti2f")]
 )
 
 ;; ---
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c4f95dc..11bdc35 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -293,6 +293,10 @@
 ;; 32-bit version and "%x0" in the 64-bit version.
 (define_mode_attr w [(QI "w") (HI "w") (SI "w") (DI "x") (SF "s") (DF "d")])
 
+;; For inequal width int to float conversion
+(define_mode_attr w1 [(SF "w") (DF "x")])
+(define_mode_attr w2 [(SF "x") (DF "w")])
+
 ;; For constraints used in scalar immediate vector moves
 (define_mode_attr hq [(HI "h") (QI "q")])
 
@@ -558,8 +562,12 @@
 (define_mode_attr atomic_sfx
   [(QI "b") (HI "h") (SI "") (DI "")])
 
-(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si")])
-(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI")])
+(define_mode_attr fcvt_target [(V2DF "v2di") (V4SF "v4si") (V2SF "v2si") (SF "si") (DF "di")])
+(define_mode_attr FCVT_TARGET [(V2DF "V2DI") (V4SF "V4SI") (V2SF "V2SI") (SF "SI") (DF "DI")])
+
+;; for the inequal width integer to fp conversions
+(define_mode_attr fcvt_iesize [(SF "di") (DF "si")])
+(define_mode_attr FCVT_IESIZE [(SF "DI") (DF "SI")])
 
 (define_mode_attr VSWAP_WIDTH [(V8QI "V16QI") (V16QI "V8QI")
 (V4HI "V8HI") (V8HI  "V4HI")
diff --git a/gcc/testsuite/gcc.target/aarch64/cvtf_1.c b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
new file mode 100644
index 000..80ab9a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cvtf_1.c
@@ -0,0 +1,95 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -fno-inline -O1" } */
+
+#define FCVTDEF(ftype,itype) \
+void \
+cvt_##itype##_to_##ftype (itype a, ftype b)\
+{\
+  ftype c;\
+  c = (ftype) a;\
+  if ( (c - b) > 0.1) abort();\
+}
+
+#define force_simd_for_float(v) asm volatile ("mov %s0, %1.s[0]" :"=w" (v) :"w" (v) :)
+#define force_simd_for_double(v) asm volatile ("mov %d0, %1.d[0]" :"=w" (v) :"w" (v) :)
+
+#define FCVTDEF_SISD(ftype,itype) \
+void \
+cvt_##itype##_to_##ftype##_sisd (itype a, ftype b)\
+{\
+  ftype c;\
+  force_simd_for_##ftype(a);\
+  c = (ftype) a;\
+  if ( (c - b) > 0.1) abort();\
+}
+
+#define FCVT(ftype,itype,ival,fval) cvt_##itype##_to_##ftype (ival, fval);
+#define FCVT_SISD(ftype,itype,ival,fval) cvt_##itype##_to_##ftype##_sisd (ival, fval);
+
+typedef int int32_t;
+typedef unsigned int uint32_t;
+typedef long long int int64_t;
+typedef unsigned long long int uint64_t;
+
+extern void abort();
+
+FCVTDEF (float, int32_t)
+/* { dg-final { scan-assembler "scvtf\ts\[0-9\]+,\ w\[0-9\]+" } } */
+FCVTDEF (float, uint32_t)
+/* { dg-final { scan-assembler "uc

[PATCH] Fix up vect/fast-math-mgrid-resid.f testcase (PR testsuite/59494)

2014-01-13 Thread Jakub Jelinek
Hi!

As discussed in the PR and on IRC, this testcase is very fragile: counting
additions with a vect_-named SSA_NAME on the lhs works only for some
tunings; for other tunings, reassociation width etc. affect it and we can
e.g. have anonymous SSA_NAMEs on the lhs in the optimized dump instead.

These alternate regexps seem to match regardless of the tunings (at least
the ones I've tried), starting with the corresponding fix onwards, and FAIL
before the fix.

Regtested on x86_64-linux and i686-linux, ok for trunk?

2014-01-13  Jakub Jelinek  

PR testsuite/59494
* gfortran.dg/vect/fast-math-mgrid-resid.f: Change
-fdump-tree-optimized to -fdump-tree-pcom-details in dg-options and
cleanup-tree-dump from optimized to pcom.  Remove scan-tree-dump-times
for vect_\[^\\n\]*\\+, add scan-tree-dump-times for no suitable chains and
Executing predictive commoning without unrolling.

--- gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f.jj   2013-04-08 15:38:21.0 +0200
+++ gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f  2014-01-13 13:18:39.904315828 +0100
@@ -1,7 +1,7 @@
 ! { dg-do compile { target i?86-*-* x86_64-*-* } }
 ! { dg-require-effective-target vect_double }
 ! { dg-require-effective-target sse2 }
-! { dg-options "-O3 -ffast-math -msse2 -fpredictive-commoning -ftree-vectorize -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -msse2 -fpredictive-commoning -ftree-vectorize -fdump-tree-pcom-details" }
 
 
 *** RESID COMPUTES THE RESIDUAL:  R = V - AU
@@ -39,8 +39,9 @@ C
   RETURN
   END
 ! we want to check that predictive commoning did something on the
-! vectorized loop, which means we have to have exactly 13 vector
-! additions.
-! { dg-final { scan-tree-dump-times "vect_\[^\\n\]*\\+ " 13 "optimized" } }
+! vectorized loop.
+! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 1 "pcom" { target lp64 } } }
+! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 2 "pcom" { target ia32 } } }
+! { dg-final { scan-tree-dump-times "Predictive commoning failed: no suitable chains" 0 "pcom" } }
 ! { dg-final { cleanup-tree-dump "vect" } }
-! { dg-final { cleanup-tree-dump "optimized" } }
+! { dg-final { cleanup-tree-dump "pcom" } }

Jakub


Re: PING: PATCH: PR libitm/53113: Build fails in x86_avx.cc if AVX disabled by -mno-avx

2014-01-13 Thread Richard Henderson
On 01/11/2014 08:28 AM, H.J. Lu wrote:
> +2013-12-25  H.J. Lu  
>> +
>> +   PR libitm/53113
>> +   * Makefile.am (x86_sse.lo): Append -msse to CXXFLAGS.
>> +   (x86_avx.lo): Append -mavx to CXXFLAGS.
>> +   * Makefile.in: Regenerate.
>> +

Ok.


r~


[msp430] fix call-via-sp and epilogue helper patterns

2014-01-13 Thread DJ Delorie

The call change avoids a problem on hardware where indirect calls that
use SP as a base register don't seem to do what you expect.  The 'J'
one fixes a link-time error wrt epilogue helper functions.  Committed.
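For illustration only — plain C with made-up names, not msp430-specific code: the problematic pattern is an indirect call whose target sits in a stack slot, which the old "rmi" constraint allowed to be emitted as a CALL with an SP-based memory operand; the new 'Yc' constraint forces such targets through a register instead.

```c
#include <assert.h>

static int add_one (int x) { return x + 1; }

/* The function pointer is a local, so it can live in a stack slot;
   before the fix the backend could try to CALL through that slot
   (an SP-based memory operand), which misbehaves on hardware.  */
static int call_via_stack_slot (int x)
{
  int (*volatile fp) (int) = add_one;  /* volatile keeps it in memory */
  return fp (x);
}
```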

* config/msp430/msp430.md (call_internal): Don't allow memory
references with SP as the base register.
(call_value_internal): Likewise.
* config/msp430/constraints.md (Yc): New.  For memory references
that don't use SP as a base register.

* config/msp430/msp430.c (msp430_print_operand): Add 'J' to mean
"an integer without a # prefix"
* config/msp430/msp430.md (epilogue_helper): Use it.

 
Index: config/msp430/msp430.md
===
--- config/msp430/msp430.md (revision 206582)
+++ config/msp430/msp430.md (working copy)
@@ -917,13 +917,13 @@
   )
 
 
 (define_insn "epilogue_helper"
  [(unspec_volatile [(match_operand 0 "immediate_operand" "i")] UNS_EPILOGUE_HELPER)]
   ""
-  "BR%Q0\t#__mspabi_func_epilog_%0"
+  "BR%Q0\t#__mspabi_func_epilog_%J0"
   )
 
 
 (define_insn "prologue_start_marker"
   [(unspec_volatile [(const_int 0)] UNS_PROLOGUE_START_MARKER)]
   ""
@@ -950,13 +950,13 @@
(match_operand 1 ""))]
   ""
   ""
 )
 
 (define_insn "call_internal"
-  [(call (mem:HI (match_operand 0 "general_operand" "rmi"))
+  [(call (mem:HI (match_operand 0 "general_operand" "rYci"))
 (match_operand 1 ""))]
   ""
   "CALL%Q0\t%0"
 )
 
 (define_expand "call_value"
@@ -966,13 +966,13 @@
   ""
   ""
 )
 
 (define_insn "call_value_internal"
   [(set (match_operand   0 "register_operand" "=r")
-   (call (mem:HI (match_operand 1 "general_operand" "rmi"))
+   (call (mem:HI (match_operand 1 "general_operand" "rYci"))
  (match_operand 2 "")))]
   ""
   "CALL%Q0\t%1"
 )
 
 (define_insn "msp_return"
Index: config/msp430/constraints.md
===
--- config/msp430/constraints.md(revision 206582)
+++ config/msp430/constraints.md(working copy)
@@ -67,6 +67,19 @@
(and (match_code "plus" "0")
 (and (match_code "reg" "00")
  (match_test ("CONST_INT_P (XEXP (XEXP (op, 0), 1))"))
  (match_test ("IN_RANGE (INTVAL (XEXP (XEXP (op, 0), 1)), -1 
<< 15, (1 << 15)-1)"
(match_code "reg" "0")
)))
+
+(define_constraint "Yc"
+  "Memory reference, for CALL - we can't use SP"
+  (and (match_code "mem")
+   (match_code "mem" "0")
+   (not (ior
+(and (match_code "plus" "00")
+ (and (match_code "reg" "000")
+  (match_test ("REGNO (XEXP (XEXP (op, 0), 0)) != SP_REGNO"
+(and (match_code "reg" "0")
+ (match_test ("REGNO (XEXP (XEXP (op, 0), 0)) != SP_REGNO")))
+
+
Index: config/msp430/msp430.c
===
--- config/msp430/msp430.c  (revision 206582)
+++ config/msp430/msp430.c  (working copy)
@@ -1917,12 +1917,13 @@ msp430_print_operand_addr (FILE * file, 
 /* A   low 16-bits of int/lower of register pair
B   high 16-bits of int/higher of register pair
C   bits 32-47 of a 64-bit value/reg 3 of a DImode value
D   bits 48-63 of a 64-bit value/reg 4 of a DImode value
H   like %B (for backwards compatibility)
I   inverse of value
+   J   an integer without a # prefix
L   like %A (for backwards compatibility)
O   offset of the top of the stack
Q   like X but generates an A postfix
R   inverse of condition code, unsigned.
X   X instruction postfix in large mode
Y   value - 4
@@ -1947,13 +1948,12 @@ msp430_print_operand (FILE * file, rtx o
   return;
 case 'Y':
   gcc_assert (CONST_INT_P (op));
   /* Print the constant value, less four.  */
   fprintf (file, "#%ld", INTVAL (op) - 4);
   return;
-  /* case 'D': used for "decimal without '#'" */
 case 'I':
   if (GET_CODE (op) == CONST_INT)
{
  /* Inverse of constants */
  int i = INTVAL (op);
  fprintf (file, "%d", ~i);
@@ -2107,12 +2107,14 @@ msp430_print_operand (FILE * file, rtx o
 because builtins are expanded before the frame layout is determined.  
*/
   fprintf (file, "%d",
   msp430_initial_elimination_offset (ARG_POINTER_REGNUM, 
STACK_POINTER_REGNUM)
- 2);
   return;
 
+case 'J':
+  gcc_assert (GET_CODE (op) == CONST_INT);
 case 0:
   break;
 default:
   output_operand_lossage ("invalid operand prefix");
   return;
 }


Re: [PATCH] Avoid introducing undefined behavior in sccp (PR tree-optimization/59387)

2014-01-13 Thread Jakub Jelinek
On Mon, Jan 13, 2014 at 11:42:11AM +0100, Richard Biener wrote:
> > + if (TREE_CODE (def) == INTEGER_CST && TREE_OVERFLOW (def))
> 
> TREE_OVERFLOW_P (), but it seems to me that the SCEV machinery
> should do this at a good place (like where it finally records
> the result into its cache before returning it, at set_and_end:
> of analyze_scalar_evolution_1).
> 
> > +   def = drop_tree_overflow (def);

As discussed on IRC, dropped this part of the change altogether (for now).

> Hmm, stmt is still in the 'stmts' sequence here, I think you should
> gsi_remove it before inserting it elsewhere.

Fixed, bootstrapped/regtested on x86_64-linux and i686-linux, here is
what I've committed in the end:

2014-01-13  Jakub Jelinek  

PR tree-optimization/59387
* tree-scalar-evolution.c: Include gimple-fold.h and gimplify-me.h.
(scev_const_prop): If folded_casts and type has undefined overflow,
use force_gimple_operand instead of force_gimple_operand_gsi and
for each added stmt if it is assign with
arith_code_with_undefined_signed_overflow, call
rewrite_to_defined_overflow.
* tree-ssa-loop-im.c: Don't include gimplify-me.h, include
gimple-fold.h instead.
(arith_code_with_undefined_signed_overflow,
rewrite_to_defined_overflow): Moved to ...
* gimple-fold.c (arith_code_with_undefined_signed_overflow,
rewrite_to_defined_overflow): ... here.  No longer static.
Include gimplify-me.h.
* gimple-fold.h (arith_code_with_undefined_signed_overflow,
rewrite_to_defined_overflow): New prototypes.

* gcc.c-torture/execute/pr59387.c: New test.
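The rewrite_to_defined_overflow transform that the patch reuses can be sketched in plain C (a sketch of the idea only; the real transform rewrites GIMPLE assign statements): the signed operation, whose overflow would be undefined, is carried out in the corresponding unsigned type, where wraparound is well defined, and the result is converted back.  The final unsigned-to-signed conversion is implementation-defined in ISO C; GCC defines it as modulo wraparound.

```c
#include <assert.h>
#include <limits.h>

/* Signed addition with wraparound semantics, via unsigned arithmetic.  */
static int add_defined_wrap (int a, int b)
{
  return (int) ((unsigned int) a + (unsigned int) b);
}
```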

--- gcc/tree-scalar-evolution.c.jj  2014-01-08 17:44:57.596582925 +0100
+++ gcc/tree-scalar-evolution.c 2014-01-10 15:46:55.355915072 +0100
@@ -286,6 +286,8 @@ along with GCC; see the file COPYING3.
 #include "dumpfile.h"
 #include "params.h"
 #include "tree-ssa-propagate.h"
+#include "gimple-fold.h"
+#include "gimplify-me.h"
 
 static tree analyze_scalar_evolution_1 (struct loop *, tree, tree);
 static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
@@ -3409,7 +3411,7 @@ scev_const_prop (void)
 {
   edge exit;
   tree def, rslt, niter;
-  gimple_stmt_iterator bsi;
+  gimple_stmt_iterator gsi;
 
   /* If we do not know exact number of iterations of the loop, we cannot
 replace the final value.  */
@@ -3424,7 +3426,7 @@ scev_const_prop (void)
   /* Ensure that it is possible to insert new statements somewhere.  */
   if (!single_pred_p (exit->dest))
split_loop_exit_edge (exit);
-  bsi = gsi_after_labels (exit->dest);
+  gsi = gsi_after_labels (exit->dest);
 
   ex_loop = superloop_at_depth (loop,
loop_depth (exit->dest->loop_father) + 1);
@@ -3447,7 +3449,9 @@ scev_const_prop (void)
  continue;
}
 
- def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL);
+ bool folded_casts;
+ def = analyze_scalar_evolution_in_loop (ex_loop, loop, def,
+ &folded_casts);
  def = compute_overall_effect_of_inner_loop (ex_loop, def);
  if (!tree_does_not_contain_chrecs (def)
  || chrec_contains_symbols_defined_in_loop (def, ex_loop->num)
@@ -3485,10 +3489,37 @@ scev_const_prop (void)
  def = unshare_expr (def);
  remove_phi_node (&psi, false);
 
- def = force_gimple_operand_gsi (&bsi, def, false, NULL_TREE,
- true, GSI_SAME_STMT);
+ /* If def's type has undefined overflow and there were folded
+casts, rewrite all stmts added for def into arithmetics
+with defined overflow behavior.  */
+ if (folded_casts && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (def)))
+   {
+ gimple_seq stmts;
+ gimple_stmt_iterator gsi2;
+ def = force_gimple_operand (def, &stmts, true, NULL_TREE);
+ gsi2 = gsi_start (stmts);
+ while (!gsi_end_p (gsi2))
+   {
+ gimple stmt = gsi_stmt (gsi2);
+ gimple_stmt_iterator gsi3 = gsi2;
+ gsi_next (&gsi2);
+ gsi_remove (&gsi3, false);
+ if (is_gimple_assign (stmt)
+ && arith_code_with_undefined_signed_overflow
+   (gimple_assign_rhs_code (stmt)))
+   gsi_insert_seq_before (&gsi,
+  rewrite_to_defined_overflow (stmt),
+  GSI_SAME_STMT);
+ else
+   gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+   }
+   }
+ else
+   def = force_gimple_operand_gsi (&gsi, def, false, NULL_TREE,
+   true, GSI_SAME_STMT);
+
   

[C PATCH] Preevaluate rhs for lhs op= rhs in C (PR c/58943)

2014-01-13 Thread Jakub Jelinek
Hi!

This patch fixes the following testcase by preevaluating the rhs of
lhs op= rhs expressions if it has (or can have) side-effects.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
C++ already does a similar thing (though in that case with TARGET_EXPRs).

Note 1: I had to tweak the ssa-fre-33.c testcase a little (it still FAILs
without the fix it went in with and succeeds from that fix onwards),
because before fre1 there isn't enough forward propagation to make it
constant (the addition result becomes constant during fre1).

Note 2: c-c++-common/cilk-plus/AN/rank_mismatch2.c ICEs now; presumably
the array notation handling doesn't handle SAVE_EXPRs properly.  Balaji,
do you think you can debug and fix it afterwards?
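A hypothetical example of the class of testcase involved (modeled on the description above, not the actual pr58943.c): the rhs is a call whose side effect modifies the lhs, so the result depends on whether the lhs is read before or after the call.  C leaves that order unspecified; with this patch the C front end preevaluates the rhs (via a SAVE_EXPR) before building the binary operation.

```c
#include <assert.h>

static int x;

static int set_x_and_return_one (void)
{
  x = 5;   /* side effect on the lhs of the compound assignment below */
  return 1;
}

static int compound_assign_demo (void)
{
  x = 10;
  x += set_x_and_return_one ();  /* reads x as either 10 or 5 */
  return x;
}
```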

2014-01-13  Jakub Jelinek  

PR c/58943
* c-typeck.c (build_modify_expr): For lhs op= rhs, if rhs has side
effects, preevaluate rhs using SAVE_EXPR first.

* c-omp.c (c_finish_omp_atomic): Set in_late_binary_op around
build_modify_expr with non-NOP_EXPR opcode.  Handle return from it
being COMPOUND_EXPR.
(c_finish_omp_for): Handle incr being COMPOUND_EXPR with first
operand a SAVE_EXPR and second MODIFY_EXPR.

* gcc.c-torture/execute/pr58943.c: New test.
* gcc.dg/tree-ssa/ssa-fre-33.c (main): Avoid using += in the test.

--- gcc/c/c-typeck.c.jj 2014-01-04 09:48:20.845147744 +0100
+++ gcc/c/c-typeck.c2014-01-13 14:57:27.133743740 +0100
@@ -5193,6 +5193,7 @@ build_modify_expr (location_t location,
 {
   tree result;
   tree newrhs;
+  tree rhseval = NULL_TREE;
   tree rhs_semantic_type = NULL_TREE;
   tree lhstype = TREE_TYPE (lhs);
   tree olhstype = lhstype;
@@ -5254,8 +5255,17 @@ build_modify_expr (location_t location,
   /* Construct the RHS for any non-atomic compound assignemnt. */
   if (!is_atomic_op)
 {
+ /* If in LHS op= RHS the RHS has side-effects, ensure they
+are preevaluated before the rest of the assignment expression's
+side-effects, because RHS could contain e.g. function calls
+that modify LHS.  */
+ if (TREE_SIDE_EFFECTS (rhs))
+   {
+ newrhs = in_late_binary_op ? save_expr (rhs) : c_save_expr (rhs);
+ rhseval = newrhs;
+   }
  newrhs = build_binary_op (location,
-   modifycode, lhs, rhs, 1);
+   modifycode, lhs, newrhs, 1);
 
  /* The original type of the right hand side is no longer
 meaningful.  */
@@ -5269,7 +5279,7 @@ build_modify_expr (location_t location,
 if so, we need to generate setter calls.  */
   result = objc_maybe_build_modify_expr (lhs, newrhs);
   if (result)
-   return result;
+   goto return_result;
 
   /* Else, do the check that we postponed for Objective-C.  */
   if (!lvalue_or_else (location, lhs, lv_assign))
@@ -5363,7 +5373,7 @@ build_modify_expr (location_t location,
   if (result)
{
  protected_set_expr_location (result, location);
- return result;
+ goto return_result;
}
 }
 
@@ -5384,11 +5394,15 @@ build_modify_expr (location_t location,
  as the LHS argument.  */
 
   if (olhstype == TREE_TYPE (result))
-return result;
+goto return_result;
 
   result = convert_for_assignment (location, olhstype, result, rhs_origtype,
   ic_assign, false, NULL_TREE, NULL_TREE, 0);
   protected_set_expr_location (result, location);
+
+return_result:
+  if (rhseval)
+result = build2 (COMPOUND_EXPR, TREE_TYPE (result), rhseval, result);
   return result;
 }
 
--- gcc/c-family/c-omp.c.jj 2014-01-04 09:48:20.0 +0100
+++ gcc/c-family/c-omp.c2014-01-13 15:23:51.653591098 +0100
@@ -136,7 +136,7 @@ c_finish_omp_atomic (location_t loc, enu
 enum tree_code opcode, tree lhs, tree rhs,
 tree v, tree lhs1, tree rhs1, bool swapped, bool seq_cst)
 {
-  tree x, type, addr;
+  tree x, type, addr, pre = NULL_TREE;
 
   if (lhs == error_mark_node || rhs == error_mark_node
   || v == error_mark_node || lhs1 == error_mark_node
@@ -194,9 +194,18 @@ c_finish_omp_atomic (location_t loc, enu
   rhs = build2_loc (loc, opcode, TREE_TYPE (lhs), rhs, lhs);
   opcode = NOP_EXPR;
 }
+  bool save = in_late_binary_op;
+  in_late_binary_op = true;
   x = build_modify_expr (loc, lhs, NULL_TREE, opcode, loc, rhs, NULL_TREE);
+  in_late_binary_op = save;
   if (x == error_mark_node)
 return error_mark_node;
+  if (TREE_CODE (x) == COMPOUND_EXPR)
+{
+  pre = TREE_OPERAND (x, 0);
+  gcc_assert (TREE_CODE (pre) == SAVE_EXPR);
+  x = TREE_OPERAND (x, 1);
+}
   gcc_assert (TREE_CODE (x) == MODIFY_EXPR);
   rhs = TREE_OPERAND (x, 1);
 
@@ -264,6 +273,8 @@ c_finish_omp_atomic (location_t loc, enu
   x = omit_one_operand_loc (loc, type, x, rhs1ad

Re: Patch ping

2014-01-13 Thread Jakub Jelinek
On Mon, Jan 13, 2014 at 07:40:16PM +0100, Uros Bizjak wrote:
> An unrelated observation: gcc should figure out that %k1 mask register
> can be used in all gather insns and avoid unnecessary copies at the
> beginning of the loop.

I thought about that too, and even started modifying sse.md, but then I read
the spec: the AVX512F gather insns overwrite the mask register (like they do
the vector mask register in the AVX2 case).

Jakub


Re: Patch ping

2014-01-13 Thread Uros Bizjak
On Mon, Jan 13, 2014 at 7:26 PM, Kirill Yukhin  wrote:

>> > Kirill, is it possible for you to test the patch in the simulator? Do
>> > we have a testcase in gcc's testsuite that can be used to check this
>> > patch?
>>
>> E.g. gcc.target/i386/avx2-gather* and avx512f-gather*.
> These tests are for built-in generation. The issue is connected to
> auto code gen.
>
> It seems to be working, we have for hss2a.fppized.f:
> .L402:
> vmovdqu64   (%rdi,%rax), %zmm1
> kmovw   %k1, %k3
> kmovw   %k1, %k2
> kmovw   %k1, %k4
> kmovw   %k1, %k5
> addl$1, %esi
> vpgatherdd  npwrx.4971-4(,%zmm1,4), %zmm0{%k3}
> vpgatherdd  (%r10,%zmm1,4), %zmm2{%k2}
> vpmulld %zmm3, %zmm0, %zmm0
> vpaddd  %zmm7, %zmm0, %zmm0
> vmovdqu32   %zmm0, (%r11,%rax)
> vpgatherdd  npwry.4973-4(,%zmm1,4), %zmm0{%k4}
> vpmulld %zmm3, %zmm0, %zmm0
> vpaddd  %zmm6, %zmm0, %zmm0
> vmovdqu32   %zmm0, (%r9,%rax)
> vpgatherdd  npwrz.4975-4(,%zmm1,4), %zmm0{%k5}
> vpmulld %zmm3, %zmm0, %zmm0
> vpaddd  %zmm5, %zmm0, %zmm0
> vmovdqu32   %zmm0, (%r14,%rax)
> vpaddd  %zmm2, %zmm4, %zmm0
> vmovdqa64   %zmm0, (%r15,%rax)
> addq$64, %rax
> cmpl%esi, %edx
> ja  .L402

An unrelated observation: gcc should figure out that %k1 mask register
can be used in all gather insns and avoid unnecessary copies at the
beginning of the loop.

Uros.


Re: Patch ping

2014-01-13 Thread Uros Bizjak
On Mon, Jan 13, 2014 at 7:26 PM, Kirill Yukhin  wrote:

>> On Mon, Jan 13, 2014 at 09:15:14AM +0100, Uros Bizjak wrote:
>> > On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek  wrote:
>> > Kirill, is it possible for you to test the patch in the simulator? Do
>> > we have a testcase in gcc's testsuite that can be used to check this
>> > patch?
>>
>> E.g. gcc.target/i386/avx2-gather* and avx512f-gather*.
> These tests are for built-in generation. The issue is connected to
> auto code gen.
>
> It seems to be working, we have for hss2a.fppized.f:
> .L402:
> vmovdqu64   (%rdi,%rax), %zmm1
> kmovw   %k1, %k3
> kmovw   %k1, %k2
> kmovw   %k1, %k4
> kmovw   %k1, %k5
> addl$1, %esi
> vpgatherdd  npwrx.4971-4(,%zmm1,4), %zmm0{%k3}
> vpgatherdd  (%r10,%zmm1,4), %zmm2{%k2}
> vpmulld %zmm3, %zmm0, %zmm0
> vpaddd  %zmm7, %zmm0, %zmm0
> vmovdqu32   %zmm0, (%r11,%rax)
> vpgatherdd  npwry.4973-4(,%zmm1,4), %zmm0{%k4}
> vpmulld %zmm3, %zmm0, %zmm0
> vpaddd  %zmm6, %zmm0, %zmm0
> vmovdqu32   %zmm0, (%r9,%rax)
> vpgatherdd  npwrz.4975-4(,%zmm1,4), %zmm0{%k5}
> vpmulld %zmm3, %zmm0, %zmm0
> vpaddd  %zmm5, %zmm0, %zmm0
> vmovdqu32   %zmm0, (%r14,%rax)
> vpaddd  %zmm2, %zmm4, %zmm0
> vmovdqa64   %zmm0, (%r15,%rax)
> addq$64, %rax
> cmpl%esi, %edx
> ja  .L402
>
> So, I vote that the patch is working.

Well, OK for mainline, then.

Thanks,
Uros.


Re: Patch ping

2014-01-13 Thread Kirill Yukhin
Hello,
On 13 Jan 09:35, Jakub Jelinek wrote:
> On Mon, Jan 13, 2014 at 09:15:14AM +0100, Uros Bizjak wrote:
> > On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek  wrote:
> > Kirill, is it possible for you to test the patch in the simulator? Do
> > we have a testcase in gcc's testsuite that can be used to check this
> > patch?
> 
> E.g. gcc.target/i386/avx2-gather* and avx512f-gather*.
These tests are for built-in generation. The issue is connected to
auto code gen.

It seems to be working, we have for hss2a.fppized.f:
.L402:
vmovdqu64   (%rdi,%rax), %zmm1
kmovw   %k1, %k3
kmovw   %k1, %k2
kmovw   %k1, %k4
kmovw   %k1, %k5
addl$1, %esi
vpgatherdd  npwrx.4971-4(,%zmm1,4), %zmm0{%k3}
vpgatherdd  (%r10,%zmm1,4), %zmm2{%k2}
vpmulld %zmm3, %zmm0, %zmm0
vpaddd  %zmm7, %zmm0, %zmm0
vmovdqu32   %zmm0, (%r11,%rax)
vpgatherdd  npwry.4973-4(,%zmm1,4), %zmm0{%k4}
vpmulld %zmm3, %zmm0, %zmm0
vpaddd  %zmm6, %zmm0, %zmm0
vmovdqu32   %zmm0, (%r9,%rax)
vpgatherdd  npwrz.4975-4(,%zmm1,4), %zmm0{%k5}
vpmulld %zmm3, %zmm0, %zmm0
vpaddd  %zmm5, %zmm0, %zmm0
vmovdqu32   %zmm0, (%r14,%rax)
vpaddd  %zmm2, %zmm4, %zmm0
vmovdqa64   %zmm0, (%r15,%rax)
addq$64, %rax
cmpl%esi, %edx
ja  .L402

So, I vote that the patch is working.

--
Thanks, K


Re: [Patch] Remove references to non-existent tree-flow.h file

2014-01-13 Thread Jeff Law

On 01/09/14 10:45, Steve Ellcey wrote:

While looking at PR 59335 (plugin doesn't build) I saw the comments about
tree-flow.h and tree-flow-inline.h not existing anymore.  While these
files have been removed there are still some references to them in
Makefile.in, doc/tree-ssa.texi, and a couple of source files.  This patch
removes the references to these now-nonexistent files.

OK to checkin?

Steve Ellcey
sell...@mips.com


2014-01-09  Steve Ellcey  

* Makefile.in (TREE_FLOW_H): Remove.
(TREE_SSA_H): Add file names from tree-flow.h.
* doc/tree-ssa.texi (Annotations): Remove reference to tree-flow.h
* tree.h: Remove tree-flow.h reference.
* hash-table.h: Remove tree-flow.h reference.
* tree-ssa-loop-niter.c (dump_affine_iv): Replace tree-flow.h
reference with tree-ssa-loop.h.

Yes, this is fine.

jeff



Re: [PATCH] Add zero-overhead looping for xtensa backend

2014-01-13 Thread Sterling Augustine
On Thu, Jan 9, 2014 at 7:48 PM, Yangfei (Felix)  wrote:
> And here is the xtensa configuration tested (include/xtensa-config.h):
>
> #define XCHAL_HAVE_BE   0
> #define XCHAL_HAVE_LOOPS1


Hi Felix,

I like this patch, and expect I will approve it. However, I would like
you to do two more things before I do:

1. Ensure it doesn't generate zcl's when:

#define XCHAL_HAVE_LOOPS 0

2. Ensure it doesn't produce loop bodies that contain ret, retw,
ret.n or retw.n as the last instruction. It might be easier to just
disallow them in loop bodies entirely, though.

Thanks!


Re: [C PATCH] Disallow subtracting pointers to empty structs (PR c/58346)

2014-01-13 Thread Marek Polacek
On Mon, Jan 13, 2014 at 05:32:26PM +0100, Marek Polacek wrote:
> This doesn't really fix the PR, but solves a related issue, where we
> have e.g.
> struct U {};
> static struct U b[6];
> 
> int foo (struct U *p, struct U *q)
> {
>   return q - p;
> }
> int main()
> {
>   return foo (&b[0], &b[4]);
> }
> Such a program SIGFPEs at runtime.  But subtraction of pointers to empty
> structures/unions doesn't really make sense and this patch forbids that.
> Note that GCC permits a structure/union to have no members, but it's only
> an extension, in C11 it's undefined behavior.
> 
> Regtested/bootstrapped on x86_64, ok for trunk?

The patch will need some tweaking, I realized that e.g. for struct S {
union {}; }; it doesn't do the right thing...

Marek


Re: [C PATCH] Disallow subtracting pointers to empty structs (PR c/58346)

2014-01-13 Thread Florian Weimer

On 01/13/2014 05:32 PM, Marek Polacek wrote:


This doesn't really fix the PR, but solves a related issue, where we
have e.g.
struct U {};
static struct U b[6];

int foo (struct U *p, struct U *q)
{
   return q - p;
}
int main()
{
   return foo (&b[0], &b[4]);
}



Such a program SIGFPEs at runtime.  But subtraction of pointers to empty
structures/unions doesn't really make sense and this patch forbids that.
Note that GCC permits a structure/union to have no members, but it's only



+  if (pointer_to_empty_aggr_p (TREE_TYPE (orig_op1)))
+error_at (loc, "arithmetic on pointer to an empty aggregate");


You need to check the size of the aggregate, not if it has no members. 
With your patch applied, if the struct definition in your test case is 
changed to this:


struct U { char empty[0]; };

it still compiles and fails at run time.

Empty structs have size 1 in C++, but structs with a zero-length array 
have size 0, so the C++ compiler should be changed as well.


--
Florian Weimer / Red Hat Product Security Team


[C PATCH] Disallow subtracting pointers to empty structs (PR c/58346)

2014-01-13 Thread Marek Polacek
This doesn't really fix the PR, but solves a related issue, where we
have e.g.
struct U {};
static struct U b[6];

int foo (struct U *p, struct U *q)
{
  return q - p;
}
int main()
{
  return foo (&b[0], &b[4]);
}
Such a program SIGFPEs at runtime.  But subtraction of pointers to empty
structures/unions doesn't really make sense, and this patch forbids it.
Note that GCC permits a structure/union to have no members, but that is only
an extension; in C11 it's undefined behavior.
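The SIGFPE falls out of how pointer subtraction is lowered: the byte difference is divided by the element size (the EXACT_DIV_EXPR visible in pointer_diff), and in GNU C an empty struct has size 0 — unlike C++, where it has size 1 — so the division traps.  A sketch of the arithmetic, stopping just short of the actual division by zero:

```c
#include <assert.h>
#include <stddef.h>

struct empty {};             /* GNU C extension: sizeof (struct empty) == 0 */
struct one { int i; };

static struct one arr[6];

/* q - p for T * is computed as the byte difference divided by sizeof (T);
   with sizeof (T) == 0 that division traps at run time.  */
static ptrdiff_t byte_diff (const void *q, const void *p)
{
  return (const char *) q - (const char *) p;
}
```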

Regtested/bootstrapped on x86_64, ok for trunk?

2014-01-13  Marek Polacek  

PR c/58346
c/
* c-typeck.c (pointer_to_empty_aggr_p): New function.
(pointer_diff): Give an error on arithmetic on pointer to an
empty aggregate.
testsuite/
* gcc.dg/pr58346.c: New test.

--- gcc/c/c-typeck.c.mp 2014-01-13 15:47:01.316105676 +0100
+++ gcc/c/c-typeck.c2014-01-13 16:03:35.513081392 +0100
@@ -3427,6 +3427,18 @@ parser_build_binary_op (location_t locat
   return result;
 }
 
+
+/* Return true if T is a pointer to an empty struct/union.  */
+
+static bool
+pointer_to_empty_aggr_p (tree t)
+{
+  t = strip_pointer_operator (t);
+  if (!RECORD_OR_UNION_TYPE_P (t))
+return false;
+  return TYPE_FIELDS (t) == NULL_TREE;
+}
+
 /* Return a tree for the difference of pointers OP0 and OP1.
The resulting tree has type int.  */
 
@@ -3536,6 +3548,9 @@ pointer_diff (location_t loc, tree op0,
   /* This generates an error if op0 is pointer to incomplete type.  */
   op1 = c_size_in_bytes (target_type);
 
+  if (pointer_to_empty_aggr_p (TREE_TYPE (orig_op1)))
+error_at (loc, "arithmetic on pointer to an empty aggregate");
+
   /* Divide by the size, in easiest possible way.  */
   result = fold_build2_loc (loc, EXACT_DIV_EXPR, inttype,
op0, convert (inttype, op1));
--- gcc/testsuite/gcc.dg/pr58346.c.mp   2014-01-13 15:48:20.011420141 +0100
+++ gcc/testsuite/gcc.dg/pr58346.c  2014-01-13 16:01:41.741713601 +0100
@@ -0,0 +1,21 @@
+/* PR c/58346 */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" } */
+
+struct U {};
+static struct U b[6];
+static struct U **s1, **s2;
+
+int
+foo (struct U *p, struct U *q)
+{
+  return q - p; /* { dg-error "arithmetic on pointer to an empty aggregate" } */
+}
+
+void
+bar (void)
+{
+  __PTRDIFF_TYPE__ d = s1 - s2; /* { dg-error "arithmetic on pointer to an empty aggregate" } */
+  __asm volatile ("" : "+g" (d));
+  foo (&b[0], &b[4]);
+}

Marek


Re: [PATCH] Fix for PR57698

2014-01-13 Thread H.J. Lu
On Fri, Jul 12, 2013 at 3:16 PM, Sriraman Tallam  wrote:
> Patch attached to fix this: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57698
>
> Here is what is going on. In rev. 200179, this change to tree-inline.c
>
> Index: tree-inline.c
> ===
> --- tree-inline.c   (revision 200178)
> +++ tree-inline.c   (revision 200179)
> @@ -3905,8 +3905,6 @@
>  for inlining, but we can't do that because frontends overwrite
>  the body.  */
>   && !cg_edge->callee->local.redefined_extern_inline
> - /* Avoid warnings during early inline pass. */
> - && cgraph_global_info_ready
>   /* PR 20090218-1_0.c. Body can be provided by another module. */
>   && (reason != CIF_BODY_NOT_AVAILABLE || !flag_generate_lto))
> {
>
> made inline failure errors during early inlining reportable.  Now,
> this function is called when the early_inliner calls
> optimize_inline_calls.  The reason for the failure,
> CIF_INDIRECT_UNKNOWN_CALL, should not be reported because it is not a
> valid reason (see can_inline_edge_p in ipa-inline.c for the list of
> reasons we intend to report), but it gets reported because of the above
> change.
>
>
> The reported bug happens only when optimization is turned on as the
> early inliner pass invokes incremental inlining which calls
> optimize_inline_calls and triggers the above failure.
>
> So, the fix is then as simple as:
>
> Index: tree-inline.c
> ===
> --- tree-inline.c   (revision 200912)
> +++ tree-inline.c   (working copy)
> @@ -3905,6 +3905,10 @@ expand_call_inline (basic_block bb, gimple stmt, c
>  for inlining, but we can't do that because frontends overwrite
>  the body.  */
>   && !cg_edge->callee->local.redefined_extern_inline
> + /* During early inline pass, report only when optimization is
> +not turned on.  */
> + && (cgraph_global_info_ready
> + || !optimize)
>   /* PR 20090218-1_0.c. Body can be provided by another module. */
>   && (reason != CIF_BODY_NOT_AVAILABLE || !flag_generate_lto))
> {
>
> Seems like the right fix to me. Ok?  The whole patch with test case
> included is attached.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59789

-- 
H.J.


Re: Patch ping

2014-01-13 Thread Jakub Jelinek
On Mon, Jan 13, 2014 at 08:15:11AM -0700, Jeff Law wrote:
> On 01/13/14 01:07, Jakub Jelinek wrote:
> >I'd like to ping 2 patches:
> >
> >http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html
> >- Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than
> >   memory load after optimization (I'd like to keep the current 
> >   patch for the reasons mentioned there, but also add this patch)
> I'd tend to think this is 4.10/5.0 material.  Unless (for example),
> you've got a PR where this makes a significant difference in compile
> time.

Ok, will defer it then.

Jakub


Re: [PATCH][IRA] Analysis of register usage of functions for usage by IRA.

2014-01-13 Thread Tom de Vries

On 10-01-14 12:39, Richard Earnshaw wrote:

>>Consequently, you'll need to add a patch for AArch64 which has two
>>registers clobbered by PLT-based calls.
>>

>
>Thanks for pointing that out. That's r16 and r17, right? I can propose the hook
>for AArch64, once we all agree on how the hook should look.
>

Yes; and thanks!


Hi Richard,

I'm posting this patch that implements the TARGET_FN_OTHER_HARD_REG_USAGE hook 
for aarch64. It uses the conservative hook format for now.


I've built gcc and cc1 with the patch, and observed the impact on this code 
snippet:
...
static int
bar (int x)
{
  return x + 3;
}

int
foo (int y)
{
  return y + bar (y);
}
...

AFAICT, that looks as expected:
...
$ gcc fuse-caller-save.c -mno-lra -fno-use-caller-save -O2 -S -o- > 1
$ gcc fuse-caller-save.c -mno-lra -fuse-caller-save -O2 -S -o- > 2
$ diff -u 1 2
--- 1   2014-01-13 16:51:24.0 +0100
+++ 2   2014-01-13 16:51:19.0 +0100
@@ -11,14 +11,12 @@
.global foo
.type   foo, %function
 foo:
-   stp x29, x30, [sp, -32]!
+   stp x29, x30, [sp, -16]!
+   mov w1, w0
add x29, sp, 0
-   str x19, [sp,16]
-   mov w19, w0
bl  bar
-   add w0, w0, w19
-   ldr x19, [sp,16]
-   ldp x29, x30, [sp], 32
+   ldp x29, x30, [sp], 16
+   add w0, w0, w1
ret
.size   foo, .-foo
.section.text.startup,"ax",%progbits
...

Btw, the results are the same for -mno-lra and -mlra. I'm just using the 
-mno-lra version here because the -mlra version of -fuse-caller-save is still in 
review ( http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00586.html ).


Thanks,
- Tom

2014-01-11  Tom de Vries  

	* config/aarch64/aarch64.c (TARGET_FN_OTHER_HARD_REG_USAGE): Redefine as
	aarch64_fn_other_hard_reg_usage.
	(aarch64_fn_other_hard_reg_usage): New function.
---
 gcc/config/aarch64/aarch64.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3b1f6b5..295fd5d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3287,6 +3287,16 @@ aarch64_fixed_condition_code_regs (unsigned int *p1, unsigned int *p2)
   return true;
 }
 
+/* Implement TARGET_FN_OTHER_HARD_REG_USAGE.  */
+
+static bool
+aarch64_fn_other_hard_reg_usage (struct hard_reg_set_container *regs)
+{
+  SET_HARD_REG_BIT (regs->set, R16_REGNUM);
+  SET_HARD_REG_BIT (regs->set, R17_REGNUM);
+  return true;
+}
+
 enum machine_mode
 aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
 {
@@ -8472,6 +8482,11 @@ aarch64_vectorize_vec_perm_const_ok (enum machine_mode vmode,
 #undef TARGET_FIXED_CONDITION_CODE_REGS
 #define TARGET_FIXED_CONDITION_CODE_REGS aarch64_fixed_condition_code_regs
 
+#undef TARGET_FN_OTHER_HARD_REG_USAGE
+#define TARGET_FN_OTHER_HARD_REG_USAGE \
+  aarch64_fn_other_hard_reg_usage
+
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
-- 
1.8.3.2



[PATCH][ARM][committed] Fix typo in arm.h

2014-01-13 Thread Kyrill Tkachov

Hi all,

I've committed this obvious typo fix to trunk as r206580.

Kyrill

2014-01-13  Kyrylo Tkachov  

* config/arm/arm.h (MAX_CONDITIONAL_EXECUTE): Fix typo in description.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 409589d..b815488 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -189,7 +189,7 @@ extern arm_cc arm_current_cc;
 
 #define ARM_INVERSE_CONDITION_CODE(X)  ((arm_cc) (((int)X) ^ 1))
 
-/* The maximaum number of instructions that is beneficial to
+/* The maximum number of instructions that is beneficial to
conditionally execute. */
 #undef MAX_CONDITIONAL_EXECUTE
 #define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()

Re: Patch ping

2014-01-13 Thread Jeff Law

On 01/13/14 08:20, Jakub Jelinek wrote:

On Mon, Jan 13, 2014 at 08:15:11AM -0700, Jeff Law wrote:

On 01/13/14 01:07, Jakub Jelinek wrote:

I'd like to ping 2 patches:

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html
- Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than
   memory load after optimization (I'd like to keep the current 
   patch for the reasons mentioned there, but also add this patch)

I'd tend to think this is 4.10/5.0 material.  Unless (for example),
you've got a PR where this makes a significant difference in compile
time.


Ok, will defer it then.

Thanks.  I've put it in my queued folder as well ;-)

jeff


Re: Patch ping

2014-01-13 Thread Jeff Law

On 01/13/14 01:07, Jakub Jelinek wrote:

Hi!

I'd like to ping 2 patches:

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html
- Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than
   memory load after optimization (I'd like to keep the current 
   patch for the reasons mentioned there, but also add this patch)
I'd tend to think this is 4.10/5.0 material.  Unless (for example), 
you've got a PR where this makes a significant difference in compile time.


jeff



Re: [PING^2][PATCH] -fuse-caller-save - Implement TARGET_FN_OTHER_HARD_REG_USAGE hook for MIPS

2014-01-13 Thread Tom de Vries

On 10-01-14 09:47, Richard Sandiford wrote:

Tom de Vries  writes:

  Why not just collect the usage information at
the end of final rather than at the beginning, so that all splits during
final have been done?


If we have a call to a leaf function, the final rtl representation does not
contain calls. The problem does not lie in the final pass where the callee is
analyzed, but in the caller, where information is used, and where the unsplit
call is missing the clobber of r6.


Ah, so when you're using this hook in final, you're actually adding in
the set of registers that will be clobbered by a future caller's CALL_INSN,
as well as the registers that are clobbered by the callee itself?


Right. The first part is not the intended usage of the hook, but it was the 
simplest fix.



That seems a bit error-prone, since we don't know at this stage what
the future caller will look like.  (Things like the target attribute
make this harder to predict.)

I think it would be cleaner to just calculate the callee-clobbered
registers during final and leave the caller to say what it clobbers.



Agree. I've rewritten the patch as such.


FWIW, I still think it'd be better to collect the set at the end of final
(after any final splits) rather than at the beginning.



Hmm. I was not aware that splits can happen during final. I'll try to update 
that patch as well.



For other cases (where the usage isn't explicit
at the rtl level), why not record the usage in CALL_INSN_FUNCTION_USAGE
instead?



Right, we could add the r6 clobber that way. But to keep things simple, I've
used the hook instead.


Why's it simpler though?  That's the kind of thing CALL_INSN_FUNCTION_USAGE
is there for.



It was simpler to implement. But you're right, using CALL_INSN_FUNCTION_USAGE 
was simple as well.


Built and reg-tested on MIPS. OK for stage1? (You've already OK-ed the
test-case part.)


Thanks,
- Tom


Thanks,
Richard



2014-01-12  Radovan Obradovic  
Tom de Vries  

	* config/mips/mips.c (POST_CALL_TMP_REG): Define.
	(mips_split_call): Use POST_CALL_TMP_REG.
	(mips_fn_other_hard_reg_usage): New function.
	(TARGET_FN_OTHER_HARD_REG_USAGE): Define targhook using new function.
	(mips_expand_call): Add POST_CALL_TMP_REG clobber.

	* gcc.target/mips/mips.exp: Add use-caller-save to -ffoo/-fno-foo
	options.
	* gcc.target/mips/fuse-caller-save.c: New test.
---
 gcc/config/mips/mips.c   | 41 +---
 gcc/testsuite/gcc.target/mips/fuse-caller-save.c | 30 +
 gcc/testsuite/gcc.target/mips/mips.exp   |  1 +
 3 files changed, 67 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/fuse-caller-save.c

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 617391c..ef7a3f9 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -175,6 +175,11 @@ along with GCC; see the file COPYING3.  If not see
 /* Return the usual opcode for a nop.  */
 #define MIPS_NOP 0
 
+/* Temporary register that is used after a call, and suitable for both
+   MIPS16 and non-MIPS16 code.  $4 and $5 are used for returning complex double
+   values in soft-float code, so $6 is the first suitable candidate.  */
+#define POST_CALL_TMP_REG (GP_ARG_FIRST + 2)
+
 /* Classifies an address.
 
ADDRESS_REG
@@ -6906,11 +6911,19 @@ mips_expand_call (enum mips_call_type type, rtx result, rtx addr,
 {
   rtx orig_addr, pattern, insn;
   int fp_code;
+  rtx post_call_tmp_reg = gen_rtx_REG (word_mode, POST_CALL_TMP_REG);
 
   fp_code = aux == 0 ? 0 : (int) GET_MODE (aux);
   insn = mips16_build_call_stub (result, &addr, args_size, fp_code);
   if (insn)
 {
+  if (TARGET_EXPLICIT_RELOCS
+	  && TARGET_CALL_CLOBBERED_GP
+	  && !find_reg_note (insn, REG_NORETURN, 0))
+	CALL_INSN_FUNCTION_USAGE (insn)
+	  = gen_rtx_EXPR_LIST (VOIDmode,
+			   gen_rtx_CLOBBER (VOIDmode, post_call_tmp_reg),
+			   CALL_INSN_FUNCTION_USAGE (insn));
   gcc_assert (!lazy_p && type == MIPS_CALL_NORMAL);
   return insn;
 }
@@ -6966,7 +6979,16 @@ mips_expand_call (enum mips_call_type type, rtx result, rtx addr,
   pattern = fn (result, addr, args_size);
 }
 
-  return mips_emit_call_insn (pattern, orig_addr, addr, lazy_p);
+  insn = mips_emit_call_insn (pattern, orig_addr, addr, lazy_p);
+  if (TARGET_EXPLICIT_RELOCS
+  && TARGET_CALL_CLOBBERED_GP
+  && !find_reg_note (insn, REG_NORETURN, 0))
+CALL_INSN_FUNCTION_USAGE (insn)
+  = gen_rtx_EXPR_LIST (VOIDmode,
+			   gen_rtx_CLOBBER (VOIDmode, post_call_tmp_reg),
+			   CALL_INSN_FUNCTION_USAGE (insn));
+
+  return insn;
 }
 
 /* Split call instruction INSN into a $gp-clobbering call and
@@ -6978,10 +7000,8 @@ mips_split_call (rtx insn, rtx call_pattern)
 {
   emit_call_insn (call_pattern);
   if (!find_reg_note (insn, REG_NORETURN, 0))
-/* Pick a temporary register that is suitable for both MIPS16 and
-   non-MIPS16 code.  $4 and $5 are used for returning complex double
-   v

Re: [PATCH][testsuite][ARM] Properly figure -mfloat-abi option for crypto tests

2014-01-13 Thread Christophe Lyon
On 13 January 2014 15:51, Kyrill Tkachov  wrote:
> On 13/01/14 13:57, Christophe Lyon wrote:
>>
>> Hi Kyrill,
>>
>> Your patch fixes most of the problems I noticed, however, it makes the
>> compiler crash on vld1Q_dupp64 when the target is big-endian:
>> --with-target= armeb-none-linux-gnueabihf
>> --with-cpu=cortex-a9
>> --with-fpu=neon-fp16
>>
>>
>>
>> /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:
>> In function 'test_vld1Q_dupp64':
>>
>> /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1:
>> error: unrecognizable insn:
>> (insn 30 29 16 (set (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110])
>> 0)
>>  (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 8))
>>
>> /aci-gcc-fsf/builds/gcc-fsf-trunk/obj-armeb-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:8624
>> -1
>>   (nil))
>
>
> Hmmm... This seems to be a failure in the vld1Q_dupu64 and vld1Q_dups64
> intrinsics as well, which were not part of my crypto patches and were likely
> ICEing before them in big-endian. The problem seems to be that we end up
> splitting into subregs after register allocation, which causes the ICE. The
> culprit is the neon_vld1_dupv2di pattern. I think it can be modified to use
> the hard registers directly after reload instead of generating their low and
> high parts.
>

You are probably right; before your patch it failed in my
configuration because it was trying to #include gnu/stubs-soft.h in
the hf configuration. Since you fixed that, the other problem
appeared.

> I'll test a patch...
>
Thanks


Re: [PATCH][testsuite][ARM] Properly figure -mfloat-abi option for crypto tests

2014-01-13 Thread Kyrill Tkachov

On 13/01/14 13:57, Christophe Lyon wrote:

Hi Kyrill,

Your patch fixes most of the problems I noticed, however, it makes the
compiler crash on vld1Q_dupp64 when the target is big-endian:
--with-target= armeb-none-linux-gnueabihf
--with-cpu=cortex-a9
--with-fpu=neon-fp16


/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:
In function 'test_vld1Q_dupp64':
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1:
error: unrecognizable insn:
(insn 30 29 16 (set (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 0)
 (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 8))
/aci-gcc-fsf/builds/gcc-fsf-trunk/obj-armeb-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:8624
-1
  (nil))


Hmmm... This seems to be a failure in the vld1Q_dupu64 and vld1Q_dups64
intrinsics as well, which were not part of my crypto patches and were likely
ICEing before them in big-endian. The problem seems to be that we end up
splitting into subregs after register allocation, which causes the ICE. The
culprit is the neon_vld1_dupv2di pattern. I think it can be modified to use the
hard registers directly after reload instead of generating their low and high
parts.


I'll test a patch...

Thanks,
Kyrill


/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1:
internal compiler error: in extract_insn, at recog.c:2168
0xa9e560 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/rtl-error.c:109
0xa9e59f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/rtl-error.c:117
0xa58fef extract_insn(rtx_def*)
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2168
0xa592ec extract_insn_cached(rtx_def*)
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2071
0x7e5309 cleanup_subreg_operands(rtx_def*)
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/final.c:3074
0xa5845f split_insn
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2886
0xa585b7 split_all_insns_noflow()
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2991
0xe31941 arm_reorg
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/config/arm/arm.c:16962
0xa9e240 rest_of_handle_machine_reorg
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/reorg.c:3933
0xa9e26e execute
 /aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/reorg.c:3963


Christophe.


On 10 January 2014 12:31, Richard Earnshaw  wrote:

On 09/01/14 17:02, Kyrill Tkachov wrote:

Hi all,

When adding the testsuite options for the crypto tests, we need to make sure
that we don't end up adding -mfloat-abi=softfp to a hard-float target like
arm-none-linux-gnueabihf. This patch adds code to figure out which
-mfpu/-mfloat-abi combination to use, using a similar approach to the NEON tests.

This patch addresses the same failures that Christophe mentioned in
http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00375.html
but with this patch we can get those tests to PASS on arm-none-linux-gnueabihf
instead of being just UNSUPPORTED.

Tested arm-none-linux-gnueabihf and arm-none-eabi.

Ok for trunk?

Thanks,
Kyrill


2014-01-09  Kyrylo Tkachov  

  * lib/target-supports.exp
  (check_effective_target_arm_crypto_ok_nocache): New.
  (check_effective_target_arm_crypto_ok): Use above procedure.
  (add_options_for_arm_crypto): Use et_arm_crypto_flags.



OK.

R.







[PATCH] remove some old code from ansidecl.h

2014-01-13 Thread Tom Tromey
ansidecl.h still defines a number of macros which I think are now
obsolete.  I recently removed all uses of these macros from
binutils-gdb.git; and there are no more uses in gcc.  So, I'd like to
propose removing the old macros entirely.

This patch removes the last uses of PARAMS from include, and the last
uses of the obsolete VA_* wrapper macros from libiberty.  Then, it
removes many obsolete macro definitions from ansidecl.h.

I tested this by rebuilding gcc and binutils-gdb with the patch.

Note that even if I missed a use of one of the macros, the
consequences are small, as the fix is always trivial.

2014-01-13  Tom Tromey  

* ansidecl.h (ANSI_PROTOTYPES, PTRCONST, LONG_DOUBLE, PARAMS)
(VPARAMS, VA_START, VA_OPEN, VA_CLOSE, VA_FIXEDARG, CONST)
(VOLATILE, SIGNED, PROTO, EXFUN, DEFUN, DEFUN_VOID, AND, DOTS)
(NOARGS): Don't define.
* libiberty.h (expandargv, writeargv): Don't use PARAMS.

2014-01-13  Tom Tromey  

* _doprint.c (checkit): Use stdarg, not VA_* macros.
* asprintf.c (asprintf): Use stdarg, not VA_* macros.
* concat.c (concat_length, concat_copy, concat_copy2, concat)
(reconcat): Use stdarg, not VA_* macros.
* snprintf.c (snprintf): Use stdarg, not VA_* macros.
* vasprintf.c (checkit): Use stdarg, not VA_* macros.
* vsnprintf.c (checkit): Use stdarg, not VA_* macros.
---
 include/ChangeLog |   8 +++
 include/ansidecl.h| 141 +-
 include/libiberty.h   |   6 +--
 libiberty/ChangeLog   |  10 
 libiberty/_doprnt.c   |   6 +--
 libiberty/asprintf.c  |   9 ++--
 libiberty/concat.c|  45 +++-
 libiberty/snprintf.c  |  10 ++--
 libiberty/vasprintf.c |   8 +--
 libiberty/vsnprintf.c |  10 ++--
 10 files changed, 62 insertions(+), 191 deletions(-)

diff --git a/include/ansidecl.h b/include/ansidecl.h
index 5cd03a7..0fb23bb 100644
--- a/include/ansidecl.h
+++ b/include/ansidecl.h
@@ -1,6 +1,6 @@
 /* ANSI and traditional C compatability macros
Copyright 1991, 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001,
-   2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010
+   2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2013
Free Software Foundation, Inc.
This file is part of the GNU C Library.
 
@@ -24,93 +24,16 @@ Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, 
MA 02110-1301, USA.
 
Macro   ANSI C definition   Traditional C definition
-    - --   --- - --
-   ANSI_PROTOTYPES 1   not defined
PTR `void *'`char *'
-   PTRCONST`void *const'   `char *'
-   LONG_DOUBLE `long double'   `double'
const   not defined `'
volatilenot defined `'
signed  not defined `'
-   VA_START(ap, var)   va_start(ap, var)   va_start(ap)
-
-   Note that it is safe to write "void foo();" indicating a function
-   with no return value, in all K+R compilers we have been able to test.
-
-   For declaring functions with prototypes, we also provide these:
-
-   PARAMS ((prototype))
-   -- for functions which take a fixed number of arguments.  Use this
-   when declaring the function.  When defining the function, write a
-   K+R style argument list.  For example:
-
-   char *strcpy PARAMS ((char *dest, char *source));
-   ...
-   char *
-   strcpy (dest, source)
-char *dest;
-char *source;
-   { ... }
-
-
-   VPARAMS ((prototype, ...))
-   -- for functions which take a variable number of arguments.  Use
-   PARAMS to declare the function, VPARAMS to define it.  For example:
-
-   int printf PARAMS ((const char *format, ...));
-   ...
-   int
-   printf VPARAMS ((const char *format, ...))
-   {
-  ...
-   }
-
-   For writing functions which take variable numbers of arguments, we
-   also provide the VA_OPEN, VA_CLOSE, and VA_FIXEDARG macros.  These
-   hide the differences between K+R <varargs.h> and C89 <stdarg.h> more
-   thoroughly than the simple VA_START() macro mentioned above.
-
-   VA_OPEN and VA_CLOSE are used *instead of* va_start and va_end.
-   Immediately after VA_OPEN, put a sequence of VA_FIXEDARG calls
-   corresponding to the list of fixed arguments.  Then use va_arg
-   normally to get the variable arguments, or pass your va_list object
-   around.  You do not declare the va_list yourself; VA_OPEN does it
-   for you.
-
-   Here is a complete example:
-
-   int
-   printf VPARAMS ((const char *format, ...))
-   {
-  int result;
-
-  VA_OPEN (ap, format);
-  VA_FIXEDARG (ap, const char *, format);
-
-  result = vfprintf (stdout, format, ap);
-  VA_CLOSE (ap);
-
-  return result;
-   }
-
-
-   You can declare variables either before or after the VA_OPEN,
-   VA_FIXED

Re: [PATCH][testsuite][ARM] Properly figure -mfloat-abi option for crypto tests

2014-01-13 Thread Christophe Lyon
Hi Kyrill,

Your patch fixes most of the problems I noticed, however, it makes the
compiler crash on vld1Q_dupp64 when the target is big-endian:
--with-target= armeb-none-linux-gnueabihf
--with-cpu=cortex-a9
--with-fpu=neon-fp16


/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:
In function 'test_vld1Q_dupp64':
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1:
error: unrecognizable insn:
(insn 30 29 16 (set (subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 0)
(subreg:DI (reg:V2DI 48 d16 [orig:110 D.14607 ] [110]) 8))
/aci-gcc-fsf/builds/gcc-fsf-trunk/obj-armeb-none-linux-gnueabihf/gcc3/gcc/include/arm_neon.h:8624
-1
 (nil))
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/testsuite/gcc.target/arm/neon/vld1Q_dupp64.c:16:1:
internal compiler error: in extract_insn, at recog.c:2168
0xa9e560 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/rtl-error.c:109
0xa9e59f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/rtl-error.c:117
0xa58fef extract_insn(rtx_def*)
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2168
0xa592ec extract_insn_cached(rtx_def*)
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2071
0x7e5309 cleanup_subreg_operands(rtx_def*)
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/final.c:3074
0xa5845f split_insn
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2886
0xa585b7 split_all_insns_noflow()
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/recog.c:2991
0xe31941 arm_reorg
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/config/arm/arm.c:16962
0xa9e240 rest_of_handle_machine_reorg
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/reorg.c:3933
0xa9e26e execute
/aci-gcc-fsf/sources/gcc-fsf/trunk/gcc/reorg.c:3963


Christophe.


On 10 January 2014 12:31, Richard Earnshaw  wrote:
> On 09/01/14 17:02, Kyrill Tkachov wrote:
>> Hi all,
>>
>> When adding the testsuite options for the crypto tests, we need to make sure
>> that we don't end up adding -mfloat-abi=softfp to a hard-float target like
>> arm-none-linux-gnueabihf. This patch adds code to figure out which
>> -mfpu/-mfloat-abi combination to use, using a similar approach to the NEON tests.
>>
>> This patch addresses the same failures that Christophe mentioned in
>> http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00375.html
>> but with this patch we can get those tests to PASS on 
>> arm-none-linux-gnueabihf
>> instead of being just UNSUPPORTED.
>>
>> Tested arm-none-linux-gnueabihf and arm-none-eabi.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>>
>> 2014-01-09  Kyrylo Tkachov  
>>
>>  * lib/target-supports.exp
>>  (check_effective_target_arm_crypto_ok_nocache): New.
>>  (check_effective_target_arm_crypto_ok): Use above procedure.
>>  (add_options_for_arm_crypto): Use et_arm_crypto_flags.
>>
>>
>
> OK.
>
> R.
>
>


Re: [PATCH] Fixing PR59006 and PR58921 by delaying loop invariant hoisting in vectorizer.

2014-01-13 Thread Jakub Jelinek
On Mon, Jan 13, 2014 at 02:37:38PM +0100, Richard Biener wrote:
> 2014-01-13  Richard Biener  
> 
>   PR tree-optimization/58921
>   PR tree-optimization/59006
>   * tree-vect-loop-manip.c (vect_loop_versioning): Remove code
>   hoisting invariant stmts.
>   * tree-vect-stmts.c (vectorizable_load): Insert the splat of
>   invariant loads on the preheader edge if possible.
> 
>   * gcc.dg/torture/pr58921.c: New testcase.
>   * gcc.dg/torture/pr59006.c: Likewise.
>   * gcc.dg/vect/pr58508.c: XFAIL no longer handled cases.

Looks good to me.  If you want, I can add another bool to loop_vinfo, which
would say if in the vectorized loop could be aliasing preventing the
hoisting (i.e. set to false always, unless the loop->simdlen > 0, when it
would be set if we would without loop->simdlen > 0 use versioning for alias
or punting, but loop->simdlen > 0 resulted in vectorization of the loop
anyway).  Then, as a follow-up we could use that predicate instead of
LOOP_REQUIRES_VERSIONING_FOR_ALIAS in vectorizable_load.

Jakub


Re: [PATCH] Fixing PR59006 and PR58921 by delaying loop invariant hoisting in vectorizer.

2014-01-13 Thread Richard Biener
On Wed, 27 Nov 2013, Jakub Jelinek wrote:

> On Wed, Nov 27, 2013 at 10:53:56AM +0100, Richard Biener wrote:
> > Hmm.  I'm still thinking that we should handle this during the regular
> > transform step.
> 
> I wonder if it can't be done instead just in vectorizable_load,
> if LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) and the load is
> invariant, just emit the (broadcasted) load not inside of the loop, but on
> the loop preheader edge.

So this implements this suggestion, XFAILing the no longer handled cases.
For example we get

  _94 = *b_8(D);
  vect_cst_.18_95 = {_94, _94, _94, _94};
  _99 = prolog_loop_adjusted_niters.9_132 * 4;
  vectp_a.22_98 = a_6(D) + _99;
  ivtmp.43_77 = (unsigned long) vectp_a.22_98;

  :
  # ivtmp.41_67 = PHI 
  # ivtmp.43_71 = PHI 
  vect__10.19_97 = vect_cst_.18_95 + { 1, 1, 1, 1 };
  _76 = (void *) ivtmp.43_71;
  MEM[base: _76, offset: 0B] = vect__10.19_97;

...

instead of having hoisted *b_8 + 1 as scalar computation.  Not sure
why LIM doesn't hoist the vector variant later.

vect__10.19_97 = vect_cst_.18_95 + vect_cst_.20_96;
  invariant up to level 1, cost 1.

ah, the cost thing.  Should be "improved" to see that hoisting
reduces the number of live SSA names in the loop.

Eventually lower_vector_ssa could optimize vector to scalar
code again ... (ick).

Bootstrap / regtest running on x86_64.

Comments?

Thanks,
Richard.

2014-01-13  Richard Biener  

PR tree-optimization/58921
PR tree-optimization/59006
* tree-vect-loop-manip.c (vect_loop_versioning): Remove code
hoisting invariant stmts.
* tree-vect-stmts.c (vectorizable_load): Insert the splat of
invariant loads on the preheader edge if possible.

* gcc.dg/torture/pr58921.c: New testcase.
* gcc.dg/torture/pr59006.c: Likewise.
* gcc.dg/vect/pr58508.c: XFAIL no longer handled cases.

Index: gcc/tree-vect-loop-manip.c
===
*** gcc/tree-vect-loop-manip.c  (revision 206576)
--- gcc/tree-vect-loop-manip.c  (working copy)
*** vect_loop_versioning (loop_vec_info loop
*** 2435,2507 
}
  }
  
- 
-   /* Extract load statements on memrefs with zero-stride accesses.  */
- 
-   if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
- {
-   /* In the loop body, we iterate each statement to check if it is a load.
-Then we check the DR_STEP of the data reference.  If DR_STEP is zero,
-then we will hoist the load statement to the loop preheader.  */
- 
-   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-   int nbbs = loop->num_nodes;
- 
-   for (int i = 0; i < nbbs; ++i)
-   {
- for (gimple_stmt_iterator si = gsi_start_bb (bbs[i]);
-  !gsi_end_p (si);)
-   {
- gimple stmt = gsi_stmt (si);
- stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
- struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
- 
- if (is_gimple_assign (stmt)
- && (!dr
- || (DR_IS_READ (dr) && integer_zerop (DR_STEP (dr)
-   {
- bool hoist = true;
- ssa_op_iter iter;
- tree var;
- 
- /* We hoist a statement if all SSA uses in it are defined
-outside of the loop.  */
- FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
-   {
- gimple def = SSA_NAME_DEF_STMT (var);
- if (!gimple_nop_p (def)
- && flow_bb_inside_loop_p (loop, gimple_bb (def)))
-   {
- hoist = false;
- break;
-   }
-   }
- 
- if (hoist)
-   {
- if (dr)
-   gimple_set_vuse (stmt, NULL);
- 
- gsi_remove (&si, false);
- gsi_insert_on_edge_immediate (loop_preheader_edge (loop),
-   stmt);
- 
- if (dump_enabled_p ())
-   {
- dump_printf_loc
- (MSG_NOTE, vect_location,
-  "hoisting out of the vectorized loop: ");
- dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
- dump_printf (MSG_NOTE, "\n");
-   }
- continue;
-   }
-   }
- gsi_next (&si);
-   }
-   }
- }
- 
/* End loop-exit-fixes after versioning.  */
  
if (cond_expr_stmt_list)
--- 2435,2440 
Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 206576)
--- gcc/tree-vect-stmts.c   (working copy)
*** vectorizable_loa

Re: [PATCH] Fix ifcvt (PR rtl-optimization/58668)

2014-01-13 Thread Christophe Lyon
Hi Jakub,
I can confirm it's OK now.

Thanks,

Christophe.

On 10 January 2014 17:56, Christophe Lyon  wrote:
> On 10 January 2014 17:45, Jakub Jelinek  wrote:
>> On Fri, Jan 10, 2014 at 05:44:22PM +0100, Christophe Lyon wrote:
>>> It seems this patch causes several regressions in gfortran on ARM too:
>>> gfortran.dg/default_format_1.f90
>>> gfortran.dg/default_format_denormal_1.f90
>>> gfortran.dg/fmt_bz_bn.f
>>> gfortran.dg/fmt_read_bz_bn.f90
>>> gfortran.dg/g77/f77-edit-t-in.f
>>> gfortran.dg/list_read_4.f90
>>> gfortran.dg/namelist_11.f
>>> gfortran.dg/past_eor.f90
>>> gfortran.dg/read_2.f90
>>> gfortran.dg/read_float_2.f03
>>> gfortran.dg/read_float_3.f90
>>> gfortran.dg/read_float_4.f90
>>> now fail after this patch.
>>
>> Even after the http://gcc.gnu.org/r206456 fix?
>>
> I don't know yet. My validations are still catching up with the backlog.
> I'll tell you shortly.


Re: wide-int, wide

2014-01-13 Thread Richard Biener
On Sat, Nov 23, 2013 at 8:23 PM, Mike Stump  wrote:
> Richi has asked that we break the wide-int patch up so that the individual port
> and front-end maintainers can review their parts without having to go through
> the entire patch.  This patch covers the new wide-int code.
>
> Ok?

I know the patch is not up-to-date.  I've looked at the wide-int.h pieces
on the branch repeatedly - more eyes on .cc bits appreciated.

Ok for stage1.

Thanks,
Richard.


Re: wide-int, fold

2014-01-13 Thread Richard Biener
On Sat, Nov 23, 2013 at 8:21 PM, Mike Stump  wrote:
> Richi has asked that we break the wide-int patch up so that the individual port
> and front-end maintainers can review their parts without having to go through
> the entire patch.  This patch covers the constant folding code.
>
> Ok?

Ok for stage1.

Thanks,
Richard.


[PATCH] Fix test case vect-nop-move.c

2014-01-13 Thread Bernd Edlinger
Hello,

there is another test case that is missing the necessary check_vect() runtime 
check.

Tested on i686-pc-linux-gnu.
OK for trunk?

Regards
Bernd.

patch-vect-nop-move.diff
Description: Binary data


Re: [PATCH] Fix unaligned access generated by IVOPTS

2014-01-13 Thread Richard Biener
On Mon, Jan 13, 2014 at 11:37 AM, Eric Botcazou  wrote:
>> Note that this now lets unaligned vector moves slip through as
>> their TYPE_ALIGN (TREE_TYPE (ref)) is properly reflecting this
>> fact, so is anything which dereferences a type with an aligned
>> attribute lowering its alignment.
>>
>> Which of course raises the question what the function is
>> supposed to verify alignment against - given that it is only
>> queried for STRICT_ALIGNMENT targets I would guess
>> it wants to verify against mode alignment (historically
>> at least ...).  Not sure how this observation relates to the
>> bug you want to fix though.
>
> Yes, it was the mode, but on STRICT_ALIGNMENT targets types must be as aligned
> as their mode (unless you previously under-aligned the type and knew what you
> were doing when you did it...).

Yeah, the vectorizer first querying target capabilities and then under-aligning
the vector type probably qualifies here.

>  The bug is that, for BLKmode, you really need
> to look at the type to have the alignment.

Of course.

>> Still the patch is an improvement and thus ok.
>
> Thanks.
>
> --
> Eric Botcazou


Re: [PATCH] Avoid introducing undefined behavior in sccp (PR tree-optimization/59387)

2014-01-13 Thread Richard Biener
On Fri, 10 Jan 2014, Jakub Jelinek wrote:

> Hi!
> 
> If folded_casts is true, sccp can introduce undefined behavior even when
> there was none in the original loop, e.g. all actual additions performed in
> unsigned type and then cast back to signed.
> 
> The following patch fixes that by making the arithmetic stmts added by sccp
> use unsigned operations when folded_casts is set and def's type has undefined
> overflow behavior.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2014-01-10  Jakub Jelinek  
> 
>   PR tree-optimization/59387
>   * tree-scalar-evolution.c: Include gimple-fold.h and gimplify-me.h.
>   (scev_const_prop): If folded_casts and type has undefined overflow,
>   use force_gimple_operand instead of force_gimple_operand_gsi and
>   for each added stmt if it is assign with
>   arith_code_with_undefined_signed_overflow, call
>   rewrite_to_defined_overflow.
>   * tree-ssa-loop-im.c: Don't include gimplify-me.h, include
>   gimple-fold.h instead.
>   (arith_code_with_undefined_signed_overflow,
>   rewrite_to_defined_overflow): Moved to ...
>   * gimple-fold.c (arith_code_with_undefined_signed_overflow,
>   rewrite_to_defined_overflow): ... here.  No longer static.
>   Include gimplify-me.h.
>   * gimple-fold.h (arith_code_with_undefined_signed_overflow,
>   rewrite_to_defined_overflow): New prototypes.
> 
>   * gcc.c-torture/execute/pr59387.c: New test.
> 
> --- gcc/tree-scalar-evolution.c.jj2014-01-08 17:44:57.596582925 +0100
> +++ gcc/tree-scalar-evolution.c   2014-01-10 15:46:55.355915072 +0100
> @@ -286,6 +286,8 @@ along with GCC; see the file COPYING3.
>  #include "dumpfile.h"
>  #include "params.h"
>  #include "tree-ssa-propagate.h"
> +#include "gimple-fold.h"
> +#include "gimplify-me.h"
>  
>  static tree analyze_scalar_evolution_1 (struct loop *, tree, tree);
>  static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
> @@ -3409,7 +3411,7 @@ scev_const_prop (void)
>  {
>edge exit;
>tree def, rslt, niter;
> -  gimple_stmt_iterator bsi;
> +  gimple_stmt_iterator gsi;
>  
>/* If we do not know exact number of iterations of the loop, we cannot
>replace the final value.  */
> @@ -3424,7 +3426,7 @@ scev_const_prop (void)
>/* Ensure that it is possible to insert new statements somewhere.  */
>if (!single_pred_p (exit->dest))
>   split_loop_exit_edge (exit);
> -  bsi = gsi_after_labels (exit->dest);
> +  gsi = gsi_after_labels (exit->dest);
>  
>ex_loop = superloop_at_depth (loop,
>   loop_depth (exit->dest->loop_father) + 1);
> @@ -3447,7 +3449,9 @@ scev_const_prop (void)
> continue;
>   }
>  
> -   def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, NULL);
> +   bool folded_casts;
> +   def = analyze_scalar_evolution_in_loop (ex_loop, loop, def,
> +   &folded_casts);
> def = compute_overall_effect_of_inner_loop (ex_loop, def);
> if (!tree_does_not_contain_chrecs (def)
> || chrec_contains_symbols_defined_in_loop (def, ex_loop->num)
> @@ -3485,10 +3489,38 @@ scev_const_prop (void)
> def = unshare_expr (def);
> remove_phi_node (&psi, false);
>  
> -   def = force_gimple_operand_gsi (&bsi, def, false, NULL_TREE,
> -   true, GSI_SAME_STMT);
> +   if (TREE_CODE (def) == INTEGER_CST && TREE_OVERFLOW (def))

TREE_OVERFLOW_P (), but it seems to me that the SCEV machinery
should do this at a good place (like where it finally records
the result into its cache before returning it, at set_and_end:
of analyze_scalar_evolution_1).

> + def = drop_tree_overflow (def);
> +
> +   /* If def's type has undefined overflow and there were folded
> +  casts, rewrite all stmts added for def into arithmetics
> +  with defined overflow behavior.  */
> +   if (folded_casts && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (def)))
> + {
> +   gimple_seq stmts;
> +   gimple_stmt_iterator gsi2;
> +   def = force_gimple_operand (def, &stmts, true, NULL_TREE);
> +   gsi2 = gsi_start (stmts);
> +   while (!gsi_end_p (gsi2))
> + {
> +   gimple stmt = gsi_stmt (gsi2);
> +   gsi_next (&gsi2);
> +   if (is_gimple_assign (stmt)
> +   && arith_code_with_undefined_signed_overflow
> + (gimple_assign_rhs_code (stmt)))
> + gsi_insert_seq_before (&gsi,
> +rewrite_to_defined_overflow (stmt),
> +GSI_SAME_STMT);

Hmm, stmt is still in the 'stmts' sequence here, I think you should
gsi_remove it before inserting it elsewhere.

> +   else
> + gsi_insert_b

Re: [PATCH] Fix unaligned access generated by IVOPTS

2014-01-13 Thread Eric Botcazou
> Note that this now lets unaligned vector moves slip through as
> their TYPE_ALIGN (TREE_TYPE (ref)) is properly reflecting this
> fact, so is anything which dereferences a type with an aligned
> attribute lowering its alignment.
> 
> Which of course raises the question what the function is
> supposed to verify alignment against - given that it is only
> queried for STRICT_ALIGNMENT targets I would guess
> it wants to verify against mode alignment (historically
> at least ...).  Not sure how this observation relates to the
> bug you want to fix though.

Yes, it was the mode, but on STRICT_ALIGNMENT targets types must be as aligned 
as their mode (unless you previously under-aligned the type and knew what you 
were doing when you did it...).  The bug is that, for BLKmode, you really need 
to look at the type to have the alignment.

> Still the patch is an improvement and thus ok.

Thanks.

-- 
Eric Botcazou


Re: Patch ping

2014-01-13 Thread Richard Biener
On Mon, 13 Jan 2014, Jakub Jelinek wrote:

> On Mon, Jan 13, 2014 at 09:15:14AM +0100, Uros Bizjak wrote:
> > On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek  wrote:
> > 
> > > http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html
> > > - PR target/59617
> > >   handle gather loads for AVX512 (at least non-masked ones, masked ones
> > >   will need to wait for 5.0 and we need to find how to represent it in
> > >   GIMPLE)
> > 
> > This one needs tree-optimization approval first.
> 
> Sure, that is why Richard was on To line too ;)

The vectorizer parts are ok.

Richard.


Re: [PATCH] Fix unaligned access generated by IVOPTS

2014-01-13 Thread Richard Biener
On Sat, Jan 11, 2014 at 12:42 AM, Eric Botcazou  wrote:
> [Sorry for dropping the ball here]
>
>> I think that may_be_unaligned_p is just seriously out-dated ... shouldn't it
>> be sth like
>>
>>   get_object_alignment_1 (ref, &align, &bitpos);
>>   if step * BITS_PER_UNIT + bitpos is misaligned
>> ...
>>
>> or rather all this may_be_unaligned_p stuff should be dropped and IVOPTs
>> should finally generate proper [TARGET_]MEM_REFs instead?  That is,
>> we already handle aliasing fine:
>>
>>   ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff,
>> reference_alias_ptr_type (*use->op_p),
>> iv, base_hint, data->speed);
>>
>> so just also handle alignment properly by passing down
>> get_object_alignment (*use->op_p) and in create_mem_ref_raw
>> do at the end do the
>>
>>   if (TYPE_MODE (type) != BLKmode
>>   && GET_MODE_ALIGNMENT (TYPE_MODE (type)) > align)
>> type = build_aligned_type (type, align);
>>
>> for BLKmode we already look at TYPE_ALIGN and as we do not change
>> the access type(?) either the previous code was already wrong or it was
>> fine, so there is nothing to do.
>>
>> So - if you want to give it a try...?
>
> After a bit of pondering, I'm not really thrilled, as this would mean changing
> TARGET_MEM_REF to accept invalid (unaligned) memory references for the target.

AFAIK the expander already handles this if the target can expand it
via movmisalign at least.  One issue with vectorization is that
possibly unaligned vector accesses are not handled/optimized by IVOPTs
which is bad.  Something to re-visit for 4.10.

> But I agree that may_be_unaligned_p is seriously outdated, so the attached
> patch entirely rewrites it, fixing the bug in the process.
>
> Tested on SPARC, SPARC64, IA-64 and ARM, OK for the mainline?

OK.

Note that this now lets unaligned vector moves slip through as
their TYPE_ALIGN (TREE_TYPE (ref)) is properly reflecting this
fact, so is anything which dereferences a type with an aligned
attribute lowering its alignment.

Which of course raises the question what the function is
supposed to verify alignment against - given that it is only
queried for STRICT_ALIGNMENT targets I would guess
it wants to verify against mode alignment (historically
at least ...).  Not sure how this observation relates to the
bug you want to fix though.

Still the patch is an improvement and thus ok.

Thanks,
Richard.

> 2014-01-10  Eric Botcazou  
>
> * builtins.c (get_object_alignment_2): Minor tweak.
> * tree-ssa-loop-ivopts.c (may_be_unaligned_p): Rewrite.
>
>
> --
> Eric Botcazou


Re: [AARCH64][PATCH] PR59695

2014-01-13 Thread Richard Earnshaw
On 11/01/14 23:42, Kugan wrote:
> Hi,
> 
> aarch64_build_constant incorrectly truncates the immediate when
> constants are generated with MOVN. This causes coinor-osi tests to fail
> (tracked also in https://bugs.launchpad.net/gcc-linaro/+bug/1263576)
> 
> Attached patch fixes this. Also attaching a reduced testcase that
> reproduces this. Tested on aarch64-none-linux-gnu with no new
> regressions. Is this OK for trunk?
> 
> Thanks,
> Kugan
> 
> gcc/
> +2013-10-15  Matthew Gretton-Dann  
> + Kugan Vivekanandarajah  
> +
> + PR target/59588
> + * config/aarch64/aarch64.c (aarch64_build_constant): Fix incorrect
> + truncation.
> +
> 
> 
> gcc/testsuite/
> +2014-01-11  Matthew Gretton-Dann  
> + Kugan Vivekanandarajah  
> +
> + PR target/59695
> + * g++.dg/pr59695.C: New file.
> +
> 
> 
> p.txt
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3d32ea5..854666f 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -2486,7 +2486,7 @@ aarch64_build_constant (int regnum, HOST_WIDE_INT val)
>if (ncount < zcount)
>   {
> emit_move_insn (gen_rtx_REG (Pmode, regnum),
> -   GEN_INT ((~val) & 0x));
> +   GEN_INT (~((~val) & 0x)));

I think that would be better written as

GEN_INT (val | ~(HOST_WIDE_INT) 0x);

Note the cast after the ~ to ensure we invert the right number of bits.

Otherwise OK.

R.

> tval = 0x;
>   }
>else
> diff --git a/gcc/testsuite/g++.dg/pr59695.C b/gcc/testsuite/g++.dg/pr59695.C
> index e69de29..0da06cb 100644
> --- a/gcc/testsuite/g++.dg/pr59695.C
> +++ b/gcc/testsuite/g++.dg/pr59695.C
> @@ -0,0 +1,125 @@
> +
> +/* PR target/53055 */
> +/* { dg-do run { target aarch64*-*-* } } */
> +/* { dg-options "-O0" } */
> +
> +#define  DEFINE_VIRTUALS_FNS(i)  virtual void  xxx##i () {} \
> +  virtual void  foo1_##i ()  {}\
> +  virtual void  foo2_##i ()  {}\
> +  virtual void  foo3_##i ()  {}\
> +  virtual void  foo4_##i ()  {}\
> +  virtual void  foo5_##i ()  {}\
> +  virtual void  foo6_##i ()  {}\
> +  virtual void  foo7_##i ()  {}\
> +  virtual void  foo8_##i ()  {}\
> +  virtual void  foo9_##i ()  {}\
> +  virtual void  foo10_##i () {}\
> +  virtual void  foo11_##i () {}\
> +  virtual void  foo12_##i () {}\
> +  virtual void  foo13_##i () {}\
> +  virtual void  foo14_##i () {}\
> +  virtual void  foo15_##i () {}\
> +  virtual void  foo16_##i () {}\
> +  virtual void  foo17_##i () {}\
> +  virtual void  foo18_##i () {}\
> +  virtual void  foo19_##i () {}\
> +  virtual void  foo20_##i () {}\
> +  virtual void  foo21_##i () {}\
> +  virtual void  foo22_##i () {}\
> +
> +class base_class_2
> +{
> +
> +public:
> +  /* Define lots of virtual functions */
> +  DEFINE_VIRTUALS_FNS (1)
> +  DEFINE_VIRTUALS_FNS (2)
> +  DEFINE_VIRTUALS_FNS (3)
> +  DEFINE_VIRTUALS_FNS (4)
> +  DEFINE_VIRTUALS_FNS (5)
> +  DEFINE_VIRTUALS_FNS (6)
> +  DEFINE_VIRTUALS_FNS (7)
> +  DEFINE_VIRTUALS_FNS (8)
> +  DEFINE_VIRTUALS_FNS (9)
> +  DEFINE_VIRTUALS_FNS (10)
> +  DEFINE_VIRTUALS_FNS (11)
> +  DEFINE_VIRTUALS_FNS (12)
> +  DEFINE_VIRTUALS_FNS (13)
> +  DEFINE_VIRTUALS_FNS (14)
> +  DEFINE_VIRTUALS_FNS (15)
> +  DEFINE_VIRTUALS_FNS (16)
> +  DEFINE_VIRTUALS_FNS (17)
> +  DEFINE_VIRTUALS_FNS (18)
> +  DEFINE_VIRTUALS_FNS (19)
> +  DEFINE_VIRTUALS_FNS (20)
> +
> +  base_class_2();
> +  virtual ~base_class_2 ();
> +};
> +
> +base_class_2::base_class_2()
> +{
> +}
> +
> +base_class_2::~base_class_2 ()
> +{
> +}
> +
> +class base_class_1
> +{
> +public:
> +  virtual ~base_class_1();
> +  base_class_1();
> +};
> +
> +base_class_1::base_class_1()
> +{
> +}
> +
> +base_class_1::~base_class_1()
> +{
> +}
> +
> +class base_Impl_class :
> +  virtual public base_class_2, public base_class_1
> +{
> +public:
> +  base_Impl_class ();
> +  virtual ~base_Impl_class ();
> +};
> +
> +base_Impl_class::base_Impl_class ()
> +{
> +}
> +
> +base_Impl_class::~base_Impl_class ()
> +{
> +}
> +
> +
> +class test_cls : public base_Impl_class
> +{
> +public:
> +  test_cls();
> +  virtual ~test_cls();
> +};
> +
> +test_cls::test_cls()
> +{
> +}
> +
> +test_cls::~test_cls()
> +{
> +}
> +
> +int main()
> +{
> +  test_cls *test = new test_cls;
> +  base_class_2 *p1 = test;
> +
> +  /* PR 53055  destructor thunk offsets are not setup
> +   correctly resulting in crash.  */
> +  delete p1;
> +  return 0;
> +}
> +
> 




Re: [PATCH] Allocate all target globals using GC for SWITCHABLE_TARGETs

2014-01-13 Thread Richard Biener
On Sun, Jan 12, 2014 at 10:51 PM, Trevor Saunders  wrote:
> On Sun, Jan 12, 2014 at 02:23:21PM +0100, Richard Biener wrote:
>> On Fri, Jan 10, 2014 at 6:37 PM, Richard Henderson  wrote:
>> > On 01/09/2014 03:34 PM, Jakub Jelinek wrote:
>> >> 2014-01-09  Jakub Jelinek  
>> >>
>> >>   * target-globals.c (save_target_globals): Allocate < 4KB structs 
>> >> using
>> >>   GC in payload of target_globals struct instead of allocating them on
>> >>   the heap and the larger structs separately using GC.
>> >>   * target-globals.h (struct target_globals): Make regs, hard_regs,
>> >>   reload, expmed, ira, ira_int and lra_fields GTY((atomic)) instead
>> >>   of GTY((skip)) and change type to void *.
>> >>   (reset_target_globals): Cast loads from those fields to 
>> >> corresponding
>> >>   types.
>> >>
>> >> --- gcc/target-globals.h.jj   2014-01-09 19:24:20.0 +0100
>> >> +++ gcc/target-globals.h  2014-01-09 19:39:43.879348712 +0100
>> >> @@ -41,17 +41,17 @@ extern struct target_lower_subreg *this_
>> >>
>> >>  struct GTY(()) target_globals {
>> >>struct target_flag_state *GTY((skip)) flag_state;
>> >> -  struct target_regs *GTY((skip)) regs;
>> >> +  void *GTY((atomic)) regs;
>> >
>> > I'm not entirely fond of this either, for the obvious reason.  Clearly a
>> > deficiency in gengtype, but after 2 hours of poking around I can see that
>> > it isn't a quick fix.
>> >
>> > I guess I'm ok with the patch, since the use of the target_globals 
>> > structure
>> > is so restricted.
>>
>> Yeah.  At some time we need a way to specify a finalization hook called
>> if an object is collected and eventually a hook that walks extra roots
>> indirectly
>> reachable via an object (so you can have GC -> heap -> GC memory layouts
>> more easily).
>
> I actually tried to add finalizers a couple of weeks ago, but it seems
> pretty nontrivial.  ggc seems to basically just allocate by searching
> for the first unmarked block.  It doesn't even sweep unmarked stuff; it
> just marks and then waits for the space to be allocated over.  I believe
> it deals with size by using different pages for each size class? So even
> if it did sweep it would be somewhat tricky to know what finalizer to
> call. Perhaps a solution is to have separate pages for each type that
> needs a finalizer, and be able to mark things as being in one of three
> states (in use, needs finalization but not in use, finalized and not in
> use).  That might hurt memory consumption in the short term, but I think
> finalizers will be really useful in getting stuff out of gc memory so
> that's probably not too bad.

I think you would need to have a list of object/finalizer per GC page
and do finalization at sweep_pages () time.

Yes, per-type pools would also work (for types with finalizers).

Or rework how the GC works - surely advanced techs like incremental
or copying collection might benefit GCC.

Richard.

> Trev
>
>>
>> Richard.
>>
>> >
>> > r~
>> >


Re: Test cases vect-simd-clone-10/12.c keep failing

2014-01-13 Thread Jakub Jelinek
On Sun, Jan 12, 2014 at 10:53:12PM +0100, Bernd Edlinger wrote:
> Yes, explicit /* { dg-do run } */ works.

Ok, I've committed
2014-01-13  Jakub Jelinek  

* gcc.dg/vect/vect-simd-clone-10.c: Add dg-do run.
* gcc.dg/vect/vect-simd-clone-12.c: Likewise.

--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c  (revision 206573)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c  (working copy)
@@ -1,3 +1,4 @@
+/* { dg-do run } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd" } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-12.c  (revision 206573)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-12.c  (working copy)
@@ -1,3 +1,4 @@
+/* { dg-do run } */
 /* { dg-require-effective-target vect_simd_clones } */
 /* { dg-additional-options "-fopenmp-simd" } */
 /* { dg-additional-options "-mavx" { target avx_runtime } } */

then.

Jakub


Re: Patch ping

2014-01-13 Thread Jakub Jelinek
On Mon, Jan 13, 2014 at 09:15:14AM +0100, Uros Bizjak wrote:
> On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek  wrote:
> 
> > http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html
> > - PR target/59617
> >   handle gather loads for AVX512 (at least non-masked ones, masked ones
> >   will need to wait for 5.0 and we need to find how to represent it in
> >   GIMPLE)
> 
> This one needs tree-optimization approval first.

Sure, that is why Richard was on To line too ;)

> Kirill, is it possible for you to test the patch in the simulator? Do
> we have a testcase in gcc's testsuite that can be used to check this
> patch?

E.g. gcc.target/i386/avx2-gather* and avx512f-gather*.

Jakub


Re: Patch ping

2014-01-13 Thread Uros Bizjak
On Mon, Jan 13, 2014 at 9:07 AM, Jakub Jelinek  wrote:

> http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html
> - PR target/59617
>   handle gather loads for AVX512 (at least non-masked ones, masked ones
>   will need to wait for 5.0 and we need to find how to represent it in
>   GIMPLE)

This one needs tree-optimization approval first.

Kirill, is it possible for you to test the patch in the simulator? Do
we have a testcase in gcc's testsuite that can be used to check this
patch?

Uros.


Patch ping

2014-01-13 Thread Jakub Jelinek
Hi!

I'd like to ping 2 patches:

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00140.html
- Ensure GET_MODE_{SIZE,INNER,NUNITS} (const) is constant rather than
  memory load after optimization (I'd like to keep the current 
  patch for the reasons mentioned there, but also add this patch)

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00131.html
- PR target/59617
  handle gather loads for AVX512 (at least non-masked ones, masked ones
  will need to wait for 5.0 and we need to find how to represent it in
  GIMPLE)

Jakub


[committed] Fix #pragma omp atomic/atomic reductions (PR libgomp/59194)

2014-01-13 Thread Jakub Jelinek
Hi!

When expanding #pragma omp atomic or reduction merging using the
expand_omp_atomic_pipeline loop, we start by fetching the initial value
with a normal memory read and only in the second and following iterations
use the one from the atomic compare and exchange.  The initial value
is just an optimization: it is better if it is the value we'll want to use,
but if it is something different, it shouldn't really matter what exact
value we load, except perhaps for floating point exceptions.
This patch uses __atomic_load_N with MEMMODEL_RELAXED instead of
normal load.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2014-01-13  Jakub Jelinek  

PR libgomp/59194
* omp-low.c (expand_omp_atomic_pipeline): Expand the initial
load as __atomic_load_N if possible.

--- gcc/omp-low.c.jj2014-01-08 17:45:05.0 +0100
+++ gcc/omp-low.c   2014-01-10 21:12:22.498276852 +0100
@@ -7536,12 +7536,21 @@ expand_omp_atomic_pipeline (basic_block
   loadedi = loaded_val;
 }
 
+  fncode = (enum built_in_function) (BUILT_IN_ATOMIC_LOAD_N + index + 1);
+  tree loaddecl = builtin_decl_explicit (fncode);
+  if (loaddecl)
+    initial
+      = fold_convert (TREE_TYPE (TREE_TYPE (iaddr)),
+                      build_call_expr (loaddecl, 2, iaddr,
+                                       build_int_cst (NULL_TREE,
+                                                      MEMMODEL_RELAXED)));
+  else
+    initial = build2 (MEM_REF, TREE_TYPE (TREE_TYPE (iaddr)), iaddr,
+                      build_int_cst (TREE_TYPE (iaddr), 0));
+
   initial
-    = force_gimple_operand_gsi (&si,
-                                build2 (MEM_REF, TREE_TYPE (TREE_TYPE (iaddr)),
-                                        iaddr,
-                                        build_int_cst (TREE_TYPE (iaddr), 0)),
-                                true, NULL_TREE, true, GSI_SAME_STMT);
+    = force_gimple_operand_gsi (&si, initial, true, NULL_TREE, true,
+                                GSI_SAME_STMT);
 
   /* Move the value to the LOADEDI temporary.  */
   if (gimple_in_ssa_p (cfun))

Jakub