date:20111013

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini


On 10/13/2011 10:07 PM, H.J. Lu wrote:

On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner
  wrote:

The answer to H.J.'s "Why do we do it for MEM then?" is simply
"because no one ever thought about not doing it"


No, that's false.  The same expand_compound_operation / make_compound_operation
pair is present in the MEM case as in the SET case.  It's just that
there's some bug here that's noticeable in not making proper MEMs that
doesn't show up in the SET case because of the way the insns are structured.



When we have (and (OP) M) where

(and (OP) M) == (and (OP) ((1<<  ceil_log2 (M)) - 1) ))

(and (OP) M) is zero_extract bits 0 to ceil_log2 (M).

Does it look OK?


Yes, it does.  How did you test it?

Paolo
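
The mask condition quoted above can be checked numerically. The following is a minimal sketch in plain C, not code from the patch: ceil_log2_u32, mask_is_low_bits and zero_extract_low are hypothetical helper names, and ceil_log2_u32 is only assumed to behave like GCC's ceil_log2 (smallest n with 2^n >= x).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smallest n with (1 << n) >= x.  Hypothetical stand-in for GCC's
   ceil_log2, written here only for illustration.  */
static int
ceil_log2_u32 (uint32_t x)
{
  int n = 0;
  while (n < 32 && ((uint64_t) 1 << n) < x)
    n++;
  return n;
}

/* True if M == (1 << ceil_log2 (M)) - 1, the condition quoted above.  */
static bool
mask_is_low_bits (uint32_t m)
{
  uint64_t low = ((uint64_t) 1 << ceil_log2_u32 (m)) - 1;
  return m == low;
}

/* Extract LEN bits of X starting at bit 0, i.e. what a zero_extract
   at position 0 computes.  */
static uint32_t
zero_extract_low (uint32_t x, int len)
{
  assert (len >= 0 && len <= 32);
  return (uint32_t) (x & (((uint64_t) 1 << len) - 1));
}
```

For a mask such as 0xff the condition holds, and x & 0xff gives the same value as zero_extract_low (x, 8). For a mask such as 0xf0 the condition fails, so the AND cannot be treated as a low-bit extraction.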

Re: [Patch,AVR] Print no-return functions as JMP

2011-10-13 Thread Georg-Johann Lay


Richard Henderson schrieb:

On 10/13/2011 12:00 PM, Georg-Johann Lay wrote:


What do you propose?

o A command line option that is on per default like
 -mnoreturn-tail-calls or -mjmp-noreturn


The command-line-option.  I think I prefer -mjump-noreturn,
as the inverse -mno-noreturn-tail-calls is too awkward.


What about flag_optimize_sibling_calls?
What we are seeing here is actually a tail call.

Johann



r~

[PATCH] Add mulv4di3 expander

2011-10-13 Thread Jakub Jelinek

Hi!

mulv4di3 can be expanded the same as mulv2di3.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-10-14  Jakub Jelinek  

* config/i386/sse.md (mulv2di3): Macroize using VI8_AVX2
iterator.
(ashl3): Use VI248_AVX2 iterator instead of VI248_128.
Use  instead of TI in mode attr.

--- gcc/config/i386/sse.md.jj   2011-10-13 21:10:52.0 +0200
+++ gcc/config/i386/sse.md  2011-10-13 22:51:55.0 +0200
@@ -5419,10 +5419,10 @@ (define_insn_and_split "*sse2_mulv4si3"
   DONE;
 })
 
-(define_insn_and_split "mulv2di3"
-  [(set (match_operand:V2DI 0 "register_operand" "")
-   (mult:V2DI (match_operand:V2DI 1 "register_operand" "")
-  (match_operand:V2DI 2 "register_operand" "")))]
+(define_insn_and_split "mul3"
+  [(set (match_operand:VI8_AVX2 0 "register_operand" "")
+   (mult:VI8_AVX2 (match_operand:VI8_AVX2 1 "register_operand" "")
+  (match_operand:VI8_AVX2 2 "register_operand" "")))]
   "TARGET_SSE2
&& can_create_pseudo_p ()"
   "#"
@@ -5436,7 +5436,7 @@ (define_insn_and_split "mulv2di3"
   op1 = operands[1];
   op2 = operands[2];
 
-  if (TARGET_XOP)
+  if (TARGET_XOP && mode == V2DImode)
 {
   /* op1: A,B,C,D, op2: E,F,G,H */
   op1 = gen_lowpart (V4SImode, op1);
@@ -5468,39 +5468,42 @@ (define_insn_and_split "mulv2di3"
 }
   else
 {
-  t1 = gen_reg_rtx (V2DImode);
-  t2 = gen_reg_rtx (V2DImode);
-  t3 = gen_reg_rtx (V2DImode);
-  t4 = gen_reg_rtx (V2DImode);
-  t5 = gen_reg_rtx (V2DImode);
-  t6 = gen_reg_rtx (V2DImode);
+  t1 = gen_reg_rtx (mode);
+  t2 = gen_reg_rtx (mode);
+  t3 = gen_reg_rtx (mode);
+  t4 = gen_reg_rtx (mode);
+  t5 = gen_reg_rtx (mode);
+  t6 = gen_reg_rtx (mode);
   thirtytwo = GEN_INT (32);
 
   /* Multiply low parts.  */
-  emit_insn (gen_sse2_umulv2siv2di3 (t1, gen_lowpart (V4SImode, op1),
-gen_lowpart (V4SImode, op2)));
-
-  /* Shift input vectors left 32 bits so we can multiply high parts.  */
-  emit_insn (gen_lshrv2di3 (t2, op1, thirtytwo));
-  emit_insn (gen_lshrv2di3 (t3, op2, thirtytwo));
+  emit_insn (gen__umulvsi3
+ (t1, gen_lowpart (mode, op1),
+  gen_lowpart (mode, op2)));
+
+  /* Shift input vectors right 32 bits so we can multiply high parts.  */
+  emit_insn (gen_lshr3 (t2, op1, thirtytwo));
+  emit_insn (gen_lshr3 (t3, op2, thirtytwo));
 
   /* Multiply high parts by low parts.  */
-  emit_insn (gen_sse2_umulv2siv2di3 (t4, gen_lowpart (V4SImode, op1),
-gen_lowpart (V4SImode, t3)));
-  emit_insn (gen_sse2_umulv2siv2di3 (t5, gen_lowpart (V4SImode, op2),
-gen_lowpart (V4SImode, t2)));
+  emit_insn (gen__umulvsi3
+ (t4, gen_lowpart (mode, op1),
+  gen_lowpart (mode, t3)));
+  emit_insn (gen__umulvsi3
+ (t5, gen_lowpart (mode, op2),
+  gen_lowpart (mode, t2)));
 
   /* Shift them back.  */
-  emit_insn (gen_ashlv2di3 (t4, t4, thirtytwo));
-  emit_insn (gen_ashlv2di3 (t5, t5, thirtytwo));
+  emit_insn (gen_ashl3 (t4, t4, thirtytwo));
+  emit_insn (gen_ashl3 (t5, t5, thirtytwo));
 
   /* Add the three parts together.  */
-  emit_insn (gen_addv2di3 (t6, t1, t4));
-  emit_insn (gen_addv2di3 (op0, t6, t5));
+  emit_insn (gen_add3 (t6, t1, t4));
+  emit_insn (gen_add3 (op0, t6, t5));
 }
 
   set_unique_reg_note (get_last_insn (), REG_EQUAL,
-  gen_rtx_MULT (V2DImode, operands[1], operands[2]));
+  gen_rtx_MULT (mode, operands[1], operands[2]));
   DONE;
 })
 
@@ -5768,9 +5771,9 @@ (define_insn "avx2_lshl3"
(set_attr "mode" "OI")])
 
 (define_insn "ashl3"
-  [(set (match_operand:VI248_128 0 "register_operand" "=x,x")
-   (ashift:VI248_128
- (match_operand:VI248_128 1 "register_operand" "0,x")
+  [(set (match_operand:VI248_AVX2 0 "register_operand" "=x,x")
+   (ashift:VI248_AVX2
+ (match_operand:VI248_AVX2 1 "register_operand" "0,x")
  (match_operand:SI 2 "nonmemory_operand" "xN,xN")))]
   "TARGET_SSE2"
   "@
@@ -5784,7 +5787,7 @@ (define_insn "ashl3"
(const_string "0")))
(set_attr "prefix_data16" "1,*")
(set_attr "prefix" "orig,vex")
-   (set_attr "mode" "TI")])
+   (set_attr "mode" "")])
 
 (define_expand "vec_shl_"
   [(set (match_operand:VI_128 0 "register_operand" "")

Jakub
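
The non-XOP branch of the expander builds a 64-bit product per lane from three 32x32->64 unsigned multiplies, two shifts and two adds. A minimal scalar sketch of that arithmetic for a single lane, assuming plain C integers in place of vector registers (umul_low32 and mul64_by_parts are hypothetical names, not functions from the patch):

```c
#include <stdint.h>

/* Model of the widening unsigned multiply applied to one 64-bit lane:
   only the low 32 bits of each input take part.  */
static uint64_t
umul_low32 (uint64_t a, uint64_t b)
{
  return (a & 0xffffffffu) * (b & 0xffffffffu);
}

/* 64-bit multiply assembled from 32-bit parts, following the t1..t5
   steps in the diff above.  The high*high term is dropped because it
   only affects bits 64 and up.  */
static uint64_t
mul64_by_parts (uint64_t op1, uint64_t op2)
{
  uint64_t t1 = umul_low32 (op1, op2);  /* Multiply low parts.  */
  uint64_t t2 = op1 >> 32;              /* High part of op1.  */
  uint64_t t3 = op2 >> 32;              /* High part of op2.  */
  uint64_t t4 = umul_low32 (op1, t3);   /* low (op1) * high (op2).  */
  uint64_t t5 = umul_low32 (op2, t2);   /* low (op2) * high (op1).  */

  t4 <<= 32;                            /* Shift them back.  */
  t5 <<= 32;
  return t1 + t4 + t5;                  /* Add the three parts.  */
}
```

The result agrees with op1 * op2 modulo 2^64, which is why the same sequence works for any vector of 64-bit lanes once the insn names are parameterized by mode.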

[PATCH] 32-byte integer vec_interleave_{high,low}

2011-10-13 Thread Jakub Jelinek

Hi!

This patch adds VI_256 vec_interleave_{high,low} as well as
using it in the vector expander.
While it needs 3 insns for each, the first two will be actually CSEd
if both patterns are expanded (the usual case from the vectorizer, e.g.
for vect-strided-store-u32-i2.c), so we end up with 2 vunpck* insns
followed by 2 vperm2i128 insns.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-10-14  Jakub Jelinek  

* config/i386/sse.md (vec_interleave_high,
vec_interleave_low): Add AVX2 expanders for VI_256
modes.
* config/i386/i386.c (expand_vec_perm_interleave3): New function.
(ix86_expand_vec_perm_builtin_1): Call it.

--- gcc/config/i386/sse.md.jj   2011-10-13 14:50:15.0 +0200
+++ gcc/config/i386/sse.md  2011-10-13 17:34:26.0 +0200
@@ -6765,6 +6765,38 @@ (define_insn "vec_interleave_lowv4si"
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
+(define_expand "vec_interleave_high"
+  [(match_operand:VI_256 0 "register_operand" "=x")
+   (match_operand:VI_256 1 "register_operand" "x")
+   (match_operand:VI_256 2 "nonimmediate_operand" "xm")]
+ "TARGET_AVX2"
+{
+  rtx t1 = gen_reg_rtx (mode);
+  rtx t2 = gen_reg_rtx (mode);
+  emit_insn (gen_avx2_interleave_low (t1, operands[1], operands[2]));
+  emit_insn (gen_avx2_interleave_high (t2,  operands[1], operands[2]));
+  emit_insn (gen_avx2_permv2ti (gen_lowpart (V4DImode, operands[0]),
+   gen_lowpart (V4DImode, t1),
+   gen_lowpart (V4DImode, t2), GEN_INT (1 + (3 << 4))));
+  DONE;
+})
+
+(define_expand "vec_interleave_low"
+  [(match_operand:VI_256 0 "register_operand" "=x")
+   (match_operand:VI_256 1 "register_operand" "x")
+   (match_operand:VI_256 2 "nonimmediate_operand" "xm")]
+ "TARGET_AVX2"
+{
+  rtx t1 = gen_reg_rtx (mode);
+  rtx t2 = gen_reg_rtx (mode);
+  emit_insn (gen_avx2_interleave_low (t1, operands[1], operands[2]));
+  emit_insn (gen_avx2_interleave_high (t2, operands[1], operands[2]));
+  emit_insn (gen_avx2_permv2ti (gen_lowpart (V4DImode, operands[0]),
+   gen_lowpart (V4DImode, t1),
+   gen_lowpart (V4DImode, t2), GEN_INT (0 + (2 << 4))));
+  DONE;
+})
+
 ;; Modes handled by pinsr patterns.
 (define_mode_iterator PINSR_MODE
   [(V16QI "TARGET_SSE4_1") V8HI
--- gcc/config/i386/i386.c.jj   2011-10-13 11:56:19.0 +0200
+++ gcc/config/i386/i386.c  2011-10-13 18:36:58.0 +0200
@@ -35474,6 +35474,82 @@ expand_vec_perm_interleave2 (struct expa
   return true;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to simplify
+   a two vector permutation using 2 intra-lane interleave insns
+   and cross-lane shuffle for 32-byte vectors.  */
+
+static bool
+expand_vec_perm_interleave3 (struct expand_vec_perm_d *d)
+{
+  unsigned i, nelt;
+  rtx (*gen) (rtx, rtx, rtx);
+
+  if (d->op0 == d->op1)
+return false;
+  if (TARGET_AVX2 && GET_MODE_SIZE (d->vmode) == 32)
+;
+  else if (TARGET_AVX && (d->vmode == V8SFmode || d->vmode == V4DFmode))
+;
+  else
+return false;
+
+  nelt = d->nelt;
+  if (d->perm[0] != 0 && d->perm[0] != nelt / 2)
+return false;
+  for (i = 0; i < nelt; i += 2)
+if (d->perm[i] != d->perm[0] + i / 2
+   || d->perm[i + 1] != d->perm[0] + i / 2 + nelt)
+  return false;
+
+  if (d->testing_p)
+return true;
+
+  switch (d->vmode)
+{
+case V32QImode:
+  if (d->perm[0])
+   gen = gen_vec_interleave_highv32qi;
+  else
+   gen = gen_vec_interleave_lowv32qi;
+  break;
+case V16HImode:
+  if (d->perm[0])
+   gen = gen_vec_interleave_highv16hi;
+  else
+   gen = gen_vec_interleave_lowv16hi;
+  break;
+case V8SImode:
+  if (d->perm[0])
+   gen = gen_vec_interleave_highv8si;
+  else
+   gen = gen_vec_interleave_lowv8si;
+  break;
+case V4DImode:
+  if (d->perm[0])
+   gen = gen_vec_interleave_highv4di;
+  else
+   gen = gen_vec_interleave_lowv4di;
+  break;
+case V8SFmode:
+  if (d->perm[0])
+   gen = gen_vec_interleave_highv8sf;
+  else
+   gen = gen_vec_interleave_lowv8sf;
+  break;
+case V4DFmode:
+  if (d->perm[0])
+   gen = gen_vec_interleave_highv4df;
+  else
+   gen = gen_vec_interleave_lowv4df;
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  emit_insn (gen (d->target, d->op0, d->op1));
+  return true;
+}
+
 /* A subroutine of expand_vec_perm_even_odd_1.  Implement the double-word
permutation with two pshufb insns and an ior.  We should have already
failed all two instruction sequences.  */
@@ -35972,6 +36048,9 @@ ix86_expand_vec_perm_builtin_1 (struct e
   if (expand_vec_perm_pshufb2 (d))
 return true;
 
+  if (expand_vec_perm_interleave3 (d))
+return true;
+
   /* Try sequences of four instructions.  */
 
   if (expand_vec_perm_vpshufb2_vpermq (d))

Jakub
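
The selector test in expand_vec_perm_interleave3 can be read in isolation. Below is a minimal sketch of just that check, assuming a plain array in place of struct expand_vec_perm_d (classify_interleave is a hypothetical name; indices 0..nelt-1 refer to the first vector and nelt..2*nelt-1 to the second):

```c
#include <assert.h>

/* Return 0 if PERM interleaves the low halves of two NELT-element
   vectors, 1 if it interleaves the high halves, -1 otherwise.  */
static int
classify_interleave (const unsigned char *perm, unsigned nelt)
{
  unsigned i;

  assert (nelt >= 2 && nelt % 2 == 0);

  /* The first element must come from the start of the low or the
     high half of the first vector.  */
  if (perm[0] != 0 && perm[0] != nelt / 2)
    return -1;

  /* Even slots walk through the first vector, odd slots through the
     matching elements of the second.  */
  for (i = 0; i < nelt; i += 2)
    if (perm[i] != perm[0] + i / 2
        || perm[i + 1] != perm[0] + i / 2 + nelt)
      return -1;

  return perm[0] ? 1 : 0;
}
```

With nelt == 8, the selector {0, 8, 1, 9, 2, 10, 3, 11} is classified as interleave-low and {4, 12, 5, 13, 6, 14, 7, 15} as interleave-high. Anything else is rejected, so the caller would fall through to the next strategy.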

Re: [PATCH 0/6] Cleanups for generic vector permutation.

2011-10-13 Thread David Miller

From: r...@redhat.com
Date: Thu, 13 Oct 2011 20:43:19 -0700

> These patches allow __builtin_shuffle to handle any vector permutation
> via optabs.  It allows for a not-uncommon fallback to byte permutation
> at rtl expansion time, while leaving the tree/gimple-level permutation
> as element-based.
> 
> All three targets which heretofore supported vector permutation in any
> way have been updated to support the new optabs.
> 
> The next step is to convert the vectorizer to use the VEC_PERM_EXPR code
> rather than using the hook that returns builtins.  Once that is done,
> it would be possible for the targets to delete the builtins.  Supposing
> that they're not exposed for user-level consumption (which is the case
> for i386; the user-level interface is via inlines in a header file, which
> can be updated to use __builtin_shuffle).

Looks good Richard, I'll work on vec_init and vec_perm* patterns on
Sparc when I get a chance.

Re: [google] support for building Linux kernel with FDO (issue4523061)

2011-10-13 Thread Xinliang David Li

This patch is for google/main which is 4.7 based, but the validated
version is in google_46 branch (which is based on 4.6).

By the way (given that you are from intel),  do you know if linux
kernel can be built with icc with PGO turned on? Our intern Xiaotian
has tried to use icc (12.0) to build the kernel, and had some problems.
The bootable kernel built with icc + gcc (for those failed with icc)
does not perform quite well.

Thanks,

David

On Thu, Oct 13, 2011 at 7:02 PM, vulcansh  wrote:
>
>
> Rong Xu wrote:
>>
>> That will be good.
>> But you never know, we internally have fixed some bugs that were filed to
>> us because people use kernel's old gcov code (many versions guarded by
>> ifdef) for their tests.
>>
>> -Rong
>>
>
> Has there been any progress on this patch?  What version of gcc is this
> patch for?  I am interested in something that works with gcc 4.7.
>
> -Steve
>
> --
> View this message in context: 
> http://old.nabble.com/-google---support-for-building-Linux-kernel-with-FDO-%28issue4523061%29-tp31607746p32649731.html
> Sent from the gcc - patches mailing list archive at Nabble.com.
>
>

[PATCH] Merge sparc plus/minus vector operations using a code iterator.

2011-10-13 Thread David Miller


This is based upon suggestions from David Bremner.

Committed to trunk.

gcc/

* config/sparc/sparc.md (plusminus): New code iterator.
(plusminus_insn): New code attr.
(addv2si3, subv2si3, addv4hi3, subv4hi3, addv2hi3, subv2hi3): Merge
using plusminus and plusminus_insn.
(fpadd64_vis, fpsub64_vis): Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179959 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |8 +
 gcc/config/sparc/sparc.md |   73 -
 2 files changed, 27 insertions(+), 54 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index db96937..a8f51e9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2011-10-13  David S. Miller  
+
+   * config/sparc/sparc.md (plusminus): New code iterator.
+   (plusminus_insn): New code attr.
+   (addv2si3, subv2si3, addv4hi3, subv4hi3, addv2hi3, subv2hi3): Merge
+   using plusminus and plusminus_insn.
+   (fpadd64_vis, fpsub64_vis): Likewise.
+
 2011-10-13  Richard Henderson  
 
* doc/md.texi (vec_perm): Document fallback to byte permutation.
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index c41e259..6118e6d 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -7881,64 +7881,36 @@
   [(set_attr "type" "multi")
(set_attr "length" "4")])
 
-
 ;; Vector instructions.
 
-(define_insn "addv2si3"
-  [(set (match_operand:V2SI 0 "register_operand" "=e")
-   (plus:V2SI (match_operand:V2SI 1 "register_operand" "e")
-  (match_operand:V2SI 2 "register_operand" "e")))]
-  "TARGET_VIS"
-  "fpadd32\t%1, %2, %0"
-  [(set_attr "type" "fga")
-   (set_attr "fptype" "double")])
-
-(define_insn "addv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=e")
-(plus:V4HI (match_operand:V4HI 1 "register_operand" "e")
-   (match_operand:V4HI 2 "register_operand" "e")))]
-  "TARGET_VIS"
-  "fpadd16\t%1, %2, %0"
-  [(set_attr "type" "fga")
-   (set_attr "fptype" "double")])
-
-;; fpadd32s is emitted by the addsi3 pattern.
-
-(define_insn "addv2hi3"
-  [(set (match_operand:V2HI 0 "register_operand" "=f")
-   (plus:V2HI (match_operand:V2HI 1 "register_operand" "f")
-  (match_operand:V2HI 2 "register_operand" "f")))]
-  "TARGET_VIS"
-  "fpadd16s\t%1, %2, %0"
-  [(set_attr "type" "fga")
-   (set_attr "fptype" "single")])
+(define_code_iterator plusminus [plus minus])
+(define_code_attr plusminus_insn [(plus "add") (minus "sub")])
 
-(define_insn "subv2si3"
+;; fp{add,sub}32s are emitted by the {add,sub}si3 patterns.
+(define_insn "<plusminus_insn>v2si3"
   [(set (match_operand:V2SI 0 "register_operand" "=e")
-   (minus:V2SI (match_operand:V2SI 1 "register_operand" "e")
-   (match_operand:V2SI 2 "register_operand" "e")))]
+   (plusminus:V2SI (match_operand:V2SI 1 "register_operand" "e")
+   (match_operand:V2SI 2 "register_operand" "e")))]
   "TARGET_VIS"
-  "fpsub32\t%1, %2, %0"
+  "fp<plusminus_insn>32\t%1, %2, %0"
   [(set_attr "type" "fga")
(set_attr "fptype" "double")])
 
-(define_insn "subv4hi3"
+(define_insn "<plusminus_insn>v4hi3"
   [(set (match_operand:V4HI 0 "register_operand" "=e")
-   (minus:V4HI (match_operand:V4HI 1 "register_operand" "e")
-   (match_operand:V4HI 2 "register_operand" "e")))]
+   (plusminus:V4HI (match_operand:V4HI 1 "register_operand" "e")
+   (match_operand:V4HI 2 "register_operand" "e")))]
   "TARGET_VIS"
-  "fpsub16\t%1, %2, %0"
+  "fp<plusminus_insn>16\t%1, %2, %0"
   [(set_attr "type" "fga")
(set_attr "fptype" "double")])
 
-;; fpsub32s is emitted by the subsi3 pattern.
-
-(define_insn "subv2hi3"
+(define_insn "<plusminus_insn>v2hi3"
   [(set (match_operand:V2HI 0 "register_operand" "=f")
-   (minus:V2HI (match_operand:V2HI 1 "register_operand" "f")
-   (match_operand:V2HI 2 "register_operand" "f")))]
+   (plusminus:V2HI (match_operand:V2HI 1 "register_operand" "f")
+   (match_operand:V2HI 2 "register_operand" "f")))]
   "TARGET_VIS"
-  "fpsub16s\t%1, %2, %0"
+  "fp<plusminus_insn>16s\t%1, %2, %0"
   [(set_attr "type" "fga")
(set_attr "fptype" "single")])
 
@@ -8505,19 +8477,12 @@
   "TARGET_VIS3"
   "fmean16\t%1, %2, %0")
 
-(define_insn "fpadd64_vis"
-  [(set (match_operand:DI 0 "register_operand" "=e")
-(plus:DI (match_operand:DI 1 "register_operand" "e")
- (match_operand:DI 2 "register_operand" "e")))]
-  "TARGET_VIS3"
-  "fpadd64\t%1, %2, %0")
-
-(define_insn "fpsub64_vis"
+(define_insn "fp<plusminus_insn>64_vis"
   [(set (match_operand:DI 0 "register_operand" "=e")
-(minus:DI (match_operand:DI 1 "register_operand" "e")
-  (match_operand:DI 2 "register_operand" "e")))]
+   (plusminus:DI (match_operand:DI 1 "register_operand" "e")
+ (match_operand:DI 2 "register_operand" "e")))]
   "TARGET_VIS3"
-  "fpsub64\t%1, %2, %0")
+  "fp<plusminus_insn>64\t%1, %2, %0")
 
 (define_mode_iterator VASS [V4HI

[PATCH 6/6] Expand vector permutation with vec_perm and vec_perm_const.

2011-10-13 Thread rth

From: Richard Henderson 

---
 gcc/doc/md.texi |6 ++
 gcc/genopinit.c |1 +
 gcc/optabs.c|  216 ---
 gcc/optabs.h|   12 ++-
 gcc/tree-vect-generic.c |2 +-
 5 files changed, 181 insertions(+), 56 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fe27210..68a5548 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4041,6 +4041,12 @@ be computed modulo @math{2*@var{N}}.  Note that if
 @code{rtx_equal_p(operand1, operand2)}, this can be implemented
 with just operand 1 and selector elements modulo @var{N}.
 
+In order to make things easy for a number of targets, if there is no
+@samp{vec_perm} pattern for mode @var{m}, but there is for mode @var{q}
+where @var{q} is a vector of @code{QImode} of the same width as @var{m},
+the middle-end will lower the mode @var{m} @code{VEC_PERM_EXPR} to
+mode @var{q}.
+
 @cindex @code{vec_perm_const@var{m}} instruction pattern
 @item @samp{vec_perm_const@var{m}}
 Like @samp{vec_perm} except that the permutation is a compile-time
diff --git a/gcc/genopinit.c b/gcc/genopinit.c
index 4eefa03..d40e4c4 100644
--- a/gcc/genopinit.c
+++ b/gcc/genopinit.c
@@ -254,6 +254,7 @@ static const char * const optabs[] =
   "set_optab_handler (vec_shr_optab, $A, CODE_FOR_$(vec_shr_$a$))",
   "set_optab_handler (vec_realign_load_optab, $A, CODE_FOR_$(vec_realign_load_$a$))",
   "set_direct_optab_handler (vec_perm_optab, $A, CODE_FOR_$(vec_perm$a$))",
+  "set_direct_optab_handler (vec_perm_const_optab, $A, CODE_FOR_$(vec_perm_const$a$))",
   "set_convert_optab_handler (vcond_optab, $A, $B, CODE_FOR_$(vcond$a$b$))",
   "set_convert_optab_handler (vcondu_optab, $A, $B, CODE_FOR_$(vcondu$a$b$))",
   "set_optab_handler (ssum_widen_optab, $A, CODE_FOR_$(widen_ssum$I$a3$))",
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e112467..e9a23f4 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6687,87 +6687,203 @@ vector_compare_rtx (tree cond, bool unsignedp, enum insn_code icode)
 
 /* Return true if VEC_PERM_EXPR can be expanded using SIMD extensions
of the CPU.  */
+
 bool
-expand_vec_perm_expr_p (enum machine_mode mode, tree v0, tree v1, tree mask)
+can_vec_perm_expr_p (tree type, tree sel)
 {
-  int v0_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (v0))));
-  int mask_mode_s = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (mask))));
+  enum machine_mode mode, qimode;
+  mode = TYPE_MODE (type);
+
+  /* If the target doesn't implement a vector mode for the vector type,
+ then no operations are supported.  */
+  if (!VECTOR_MODE_P (mode))
+return false;
+
+  if (TREE_CODE (sel) == VECTOR_CST)
+{
+  if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing
+ && targetm.vectorize.builtin_vec_perm_ok (type, sel))
+   return true;
+}
 
-  if (TREE_CODE (mask) == VECTOR_CST
-  && targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), mask))
+  if (direct_optab_handler (vec_perm_optab, mode) != CODE_FOR_nothing)
 return true;
 
-  if (v0_mode_s != mask_mode_s
-  || TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
-!= TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
-  || TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
-!= TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+  /* We allow fallback to a QI vector mode, and adjust the mask.  */
+  qimode = mode_for_vector (QImode, GET_MODE_SIZE (mode));
+  if (!VECTOR_MODE_P (qimode))
 return false;
 
-  return direct_optab_handler (vec_perm_optab, mode) != CODE_FOR_nothing;
+  /* ??? For completeness, we ought to check the QImode version of
+  vec_perm_const_optab.  But all users of this implicit lowering
+  feature implement the variable vec_perm_optab.  */
+  if (direct_optab_handler (vec_perm_optab, qimode) == CODE_FOR_nothing)
+return false;
+
+  /* In order to support the lowering of non-constant permutations,
+ we need to support shifts and adds.  */
+  if (TREE_CODE (sel) != VECTOR_CST)
+{
+  if (GET_MODE_UNIT_SIZE (mode) > 2
+ && optab_handler (ashl_optab, mode) == CODE_FOR_nothing
+ && optab_handler (vashl_optab, mode) == CODE_FOR_nothing)
+   return false;
+  if (optab_handler (add_optab, qimode) == CODE_FOR_nothing)
+   return false;
+}
+
+  return true;
 }
 
-/* Generate instructions for VEC_COND_EXPR given its type and three
-   operands.  */
-rtx
-expand_vec_perm_expr (tree type, tree v0, tree v1, tree mask, rtx target)
+/* A subroutine of expand_vec_perm_expr for expanding one vec_perm insn.  */
+
+static rtx
+expand_vec_perm_expr_1 (enum insn_code icode, rtx target,
+   rtx v0, rtx v1, rtx sel)
 {
+  enum machine_mode tmode = GET_MODE (target);
+  enum machine_mode smode = GET_MODE (sel);
   struct expand_operand ops[4];
-  enum insn_code icode;
-  enum machine_mode mode = TYPE_MODE (type);
 
-  gcc_checking_assert (expand_vec_perm_expr_p (mode, v0, v1, mask));
+  create_output_operand (&ops[0
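
The QImode fallback described in the md.texi hunk needs the element selector turned into a byte selector, which is where the "shifts and adds" mentioned in can_vec_perm_expr_p come in. A minimal scalar model of that adjustment follows. It is an illustration of the idea only, not the rtl sequence from the (truncated) patch, and lower_selector_to_bytes is a hypothetical name.

```c
#include <assert.h>

/* Expand an element selector SEL (NELT entries, each taken modulo
   2*NELT) into a byte selector for elements that are UNIT bytes wide.
   Multiplying by UNIT models the shift; adding J models the add.  */
static void
lower_selector_to_bytes (const unsigned *sel, unsigned nelt, unsigned unit,
                         unsigned char *bytesel)
{
  unsigned i, j;

  /* Byte indices must fit in an unsigned char.  */
  assert (nelt > 0 && unit > 0 && 2 * nelt * unit <= 256);

  for (i = 0; i < nelt; i++)
    {
      unsigned e = sel[i] % (2 * nelt);
      for (j = 0; j < unit; j++)
        bytesel[i * unit + j] = (unsigned char) (e * unit + j);
    }
}
```

For a four-element vector of 4-byte elements, the selector {0, 5, 2, 7} becomes the 16-byte selector 0..3, 20..23, 8..11, 28..31, which a byte permute over the same two inputs can consume directly.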

[PATCH 4/6] Move lowering of vector shifts from v/s to v/v to rtl.

2011-10-13 Thread rth

From: Richard Henderson 

This allows other rtl expanders to rely on shifts of vector by scalar.

This replaces the patch posted a couple of days ago that adds these
scalar shifts to the rs6000 backend, following the info that Sparc
needs this fallback as well.
---
 gcc/optabs.c  |   65 
 gcc/testsuite/gcc.dg/vect/vec-scal-opt.c  |2 +-
 gcc/testsuite/gcc.dg/vect/vec-scal-opt1.c |2 +-
 gcc/testsuite/gcc.dg/vect/vec-scal-opt2.c |2 +-
 gcc/testsuite/lib/target-supports.exp |   21 -
 gcc/tree-vect-generic.c   |   66 +
 6 files changed, 88 insertions(+), 70 deletions(-)

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 0ba1333..e112467 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -735,6 +735,41 @@ expand_vec_shift_expr (sepops ops, rtx target)
   return eops[0].value;
 }
 
+/* Create a new vector value in VMODE with all elements set to OP.  The
+   mode of OP must be the element mode of VMODE.  If OP is a constant,
+   then the return value will be a constant.  */
+
+static rtx
+expand_vector_broadcast (enum machine_mode vmode, rtx op)
+{
+  enum insn_code icode;
+  rtvec vec;
+  rtx ret;
+  int i, n;
+
+  gcc_checking_assert (VECTOR_MODE_P (vmode));
+
+  n = GET_MODE_NUNITS (vmode);
+  vec = rtvec_alloc (n);
+  for (i = 0; i < n; ++i)
+RTVEC_ELT (vec, i) = op;
+
+  if (CONSTANT_P (op))
+return gen_rtx_CONST_VECTOR (vmode, vec);
+
+  /* ??? If the target doesn't have a vec_init, then we have no easy way
+ of performing this operation.  Most of this sort of generic support
+ is hidden away in the vector lowering support in gimple.  */
+  icode = optab_handler (vec_init_optab, vmode);
+  if (icode == CODE_FOR_nothing)
+return NULL;
+
+  ret = gen_reg_rtx (vmode);
+  emit_insn (GEN_FCN (icode) (ret, gen_rtx_PARALLEL (vmode, vec)));
+
+  return ret;
+}
+
 /* This subroutine of expand_doubleword_shift handles the cases in which
the effective shift value is >= BITS_PER_WORD.  The arguments and return
value are the same as for the parent routine, except that SUPERWORD_OP1
@@ -1533,6 +1568,36 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,
}
 }
 
+  /* If this is a vector shift by a scalar, see if we can do a vector
+ shift by a vector.  If so, broadcast the scalar into a vector.  */
+  if (mclass == MODE_VECTOR_INT)
+{
+  optab otheroptab = NULL;
+
+  if (binoptab == ashl_optab)
+   otheroptab = vashl_optab;
+  else if (binoptab == ashr_optab)
+   otheroptab = vashr_optab;
+  else if (binoptab == lshr_optab)
+   otheroptab = vlshr_optab;
+  else if (binoptab == rotl_optab)
+   otheroptab = vrotl_optab;
+  else if (binoptab == rotr_optab)
+   otheroptab = vrotr_optab;
+
+  if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
+   {
+ rtx vop1 = expand_vector_broadcast (mode, op1);
+ if (vop1)
+   {
+ temp = expand_binop_directly (mode, otheroptab, op0, vop1,
+   target, unsignedp, methods, last);
+ if (temp)
+   return temp;
+   }
+   }
+}
+
   /* Look for a wider mode of the same class for which we think we
  can open-code the operation.  Check for a widening multiply at the
  wider mode as well.  */
diff --git a/gcc/testsuite/gcc.dg/vect/vec-scal-opt.c b/gcc/testsuite/gcc.dg/vect/vec-scal-opt.c
index 6514f05..f53e66d 100644
--- a/gcc/testsuite/gcc.dg/vect/vec-scal-opt.c
+++ b/gcc/testsuite/gcc.dg/vect/vec-scal-opt.c
@@ -19,5 +19,5 @@ int main (int argc, char *argv[]) {
return vidx(short, r1, 0);
 }
 
-/* { dg-final { scan-tree-dump-times ">> k.\[0-9_\]*" 1 "veclower2" { target vect_shift_scalar } } } */
+/* { dg-final { scan-tree-dump-times ">> k.\[0-9_\]*" 1 "veclower2" } } */
 /* { dg-final { cleanup-tree-dump "veclower2" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vec-scal-opt1.c b/gcc/testsuite/gcc.dg/vect/vec-scal-opt1.c
index acab407..4025f67 100644
--- a/gcc/testsuite/gcc.dg/vect/vec-scal-opt1.c
+++ b/gcc/testsuite/gcc.dg/vect/vec-scal-opt1.c
@@ -17,5 +17,5 @@ int main (int argc, char *argv[]) {
return vidx(short, r1, 0);
 }
 
-/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower2" { target vect_shift_scalar } } } */
+/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower2" } } */
 /* { dg-final { cleanup-tree-dump "veclower2" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vec-scal-opt2.c b/gcc/testsuite/gcc.dg/vect/vec-scal-opt2.c
index cfaf5e0..677836d 100644
--- a/gcc/testsuite/gcc.dg/vect/vec-scal-opt2.c
+++ b/gcc/testsuite/gcc.dg/vect/vec-scal-opt2.c
@@ -16,5 +16,5 @@ int main (int argc, char *argv[]) {
return vidx(short, r1, 0);
 }
 
-/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower2" { target vect_shift_scalar } } } */
+/* { dg-final { scan-tree-dump-times ">> 2" 1 "veclower2" } } */
 /
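
The fallback added to expand_binop rewrites a vector-by-scalar shift as a vector-by-vector shift whose count vector holds the scalar in every element. A minimal scalar model of that rewrite, assuming eight 16-bit lanes and plain arrays in place of rtx values (NELT, broadcast, shift_left_vv and shift_left_vs are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

#define NELT 8

/* Set every element of VEC to OP, as expand_vector_broadcast does
   for an rtx operand.  */
static void
broadcast (uint16_t *vec, uint16_t op)
{
  int i;
  for (i = 0; i < NELT; i++)
    vec[i] = op;
}

/* Vector shifted by vector: each lane uses its own count.  */
static void
shift_left_vv (uint16_t *dst, const uint16_t *src, const uint16_t *counts)
{
  int i;
  for (i = 0; i < NELT; i++)
    {
      assert (counts[i] < 16);
      dst[i] = (uint16_t) (src[i] << counts[i]);
    }
}

/* Vector shifted by scalar, implemented through the vector/vector
   form by broadcasting the scalar count first.  */
static void
shift_left_vs (uint16_t *dst, const uint16_t *src, uint16_t count)
{
  uint16_t counts[NELT];
  broadcast (counts, count);
  shift_left_vv (dst, src, counts);
}
```

A target that only provides the vector/vector shift pattern can therefore still serve callers that ask for the vector/scalar form, provided it can build the broadcast vector.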

[PATCH 3/6] i386: Implement vec_perm_const.

2011-10-13 Thread rth

From: Richard Henderson 

---
 gcc/config/i386/i386-protos.h |1 +
 gcc/config/i386/i386.c|   61 +
 gcc/config/i386/sse.md|   21 ++
 3 files changed, 83 insertions(+), 0 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index eea038e..bdac6ff 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -124,6 +124,7 @@ extern bool ix86_expand_fp_movcc (rtx[]);
 extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
 extern void ix86_expand_vec_perm (rtx[]);
+extern bool ix86_expand_vec_perm_const (rtx[]);
 extern void ix86_expand_sse_unpack (rtx[], bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a81292b..df6267b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -36132,6 +36132,67 @@ ix86_expand_vec_perm_builtin (tree exp)
   return CONST0_RTX (d.vmode);
 }
 
+bool
+ix86_expand_vec_perm_const (rtx operands[4])
+{
+  struct expand_vec_perm_d d;
+  int i, nelt, which;
+  rtx sel;
+
+  d.target = operands[0];
+  d.op0 = operands[1];
+  d.op1 = operands[2];
+  sel = operands[3];
+
+  d.vmode = GET_MODE (d.target);
+  gcc_assert (VECTOR_MODE_P (d.vmode));
+  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
+  d.testing_p = false;
+
+  gcc_assert (GET_CODE (sel) == CONST_VECTOR);
+  gcc_assert (XVECLEN (sel, 0) == nelt);
+
+  for (i = which = 0; i < nelt; ++i)
+{
+  rtx e = XVECEXP (sel, 0, i);
+  int ei = INTVAL (e) & (2 * nelt - 1);
+
+  which |= (ei < nelt ? 1 : 2);
+  d.perm[i] = ei;
+}
+
+  switch (which)
+{
+default:
+  gcc_unreachable();
+
+case 3:
+  if (!rtx_equal_p (d.op0, d.op1))
+   break;
+
+  /* The elements of PERM do not suggest that only the first operand
+is used, but both operands are identical.  Allow easier matching
+of the permutation by folding the permutation into the single
+input vector.  */
+  for (i = 0; i < nelt; ++i)
+   if (d.perm[i] >= nelt)
+ d.perm[i] -= nelt;
+  /* FALLTHRU */
+
+case 1:
+  d.op1 = d.op0;
+  break;
+
+case 2:
+  for (i = 0; i < nelt; ++i)
+d.perm[i] -= nelt;
+  d.op0 = d.op1;
+  break;
+}
+
+  return ix86_expand_vec_perm_builtin_1 (&d);
+}
+
 /* Implement targetm.vectorize.builtin_vec_perm_ok.  */
 
 static bool
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5bf30a8..d5e2de5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6236,6 +6236,27 @@
   DONE;
 })
 
+(define_mode_iterator VEC_PERM_CONST
+  [(V4SF "TARGET_SSE") (V4SI "TARGET_SSE")
+   (V2DF "TARGET_SSE") (V2DI "TARGET_SSE")
+   (V16QI "TARGET_SSE2") (V8HI "TARGET_SSE2")
+   (V8SF "TARGET_AVX") (V4DF "TARGET_AVX")
+   (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")
+   (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")])
+
+(define_expand "vec_perm_const"
+  [(match_operand:VEC_PERM_CONST 0 "register_operand" "")
+   (match_operand:VEC_PERM_CONST 1 "register_operand" "")
+   (match_operand:VEC_PERM_CONST 2 "register_operand" "")
+   (match_operand: 3 "" "")]
+  ""
+{
+  if (ix86_expand_vec_perm_const (operands))
+DONE;
+  else
+FAIL;
+})
+
 ;
 ;;
 ;; Parallel bitwise logical operations
-- 
1.7.6.4
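
The selector normalization in ix86_expand_vec_perm_const can be sketched without any rtl. The version below is a simplified model: it covers the masking, the "which" classification and the rebasing for a selector that only names the second operand, and it leaves out the rtx_equal_p folding for identical operands. normalize_selector is a hypothetical name.

```c
#include <assert.h>

/* Reduce each entry of SEL modulo 2*NELT into PERM and return a
   bitmask: bit 0 set if the first operand is referenced, bit 1 set
   if the second is.  When only the second operand is referenced,
   PERM is rebased to 0..NELT-1 so that it reads as a one-operand
   permutation.  NELT must be a power of two.  */
static int
normalize_selector (const int *sel, unsigned nelt, unsigned char *perm)
{
  unsigned i;
  int which = 0;

  assert (nelt != 0 && (nelt & (nelt - 1)) == 0);

  for (i = 0; i < nelt; i++)
    {
      unsigned ei = (unsigned) sel[i] & (2 * nelt - 1);
      which |= (ei < nelt ? 1 : 2);
      perm[i] = (unsigned char) ei;
    }

  if (which == 2)
    for (i = 0; i < nelt; i++)
      perm[i] -= nelt;

  return which;
}
```

With nelt == 4, the selector {4, 5, 6, 7} yields which == 2 and the rebased permutation {0, 1, 2, 3}, while {0, 5, 2, 7} yields which == 3 and is left as a genuine two-operand permutation.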

[PATCH 5/6] rs6000: Fix typo in rs6000_expand_vector_init

2011-10-13 Thread rth

From: Richard Henderson 

Of course we don't support vectors of size <= 4.
We're supposed to be checking the vector element size.
---
 gcc/config/rs6000/rs6000.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4fd2192..aee976c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4758,7 +4758,7 @@ rs6000_expand_vector_init (rtx target, rtx vals)
 
   /* Store value to stack temp.  Load vector element.  Splat.  However, splat
  of 64-bit items is not supported on Altivec.  */
-  if (all_same && GET_MODE_SIZE (mode) <= 4)
+  if (all_same && GET_MODE_SIZE (inner_mode) <= 4)
 {
   mem = assign_stack_temp (mode, GET_MODE_SIZE (inner_mode), 0);
   emit_move_insn (adjust_address_nv (mem, inner_mode, 0),
-- 
1.7.6.4

[PATCH 2/6] spu: Implement vec_permv16qi.

2011-10-13 Thread rth

From: Richard Henderson 

---
 gcc/config/spu/spu.md |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/gcc/config/spu/spu.md b/gcc/config/spu/spu.md
index 676d54e..00cfaa4 100644
--- a/gcc/config/spu/spu.md
+++ b/gcc/config/spu/spu.md
@@ -4395,6 +4395,18 @@ selb\t%0,%4,%0,%3"
   "shufb\t%0,%1,%2,%3"
   [(set_attr "type" "shuf")])
 
+(define_expand "vec_permv16qi"
+  [(set (match_operand:V16QI 0 "spu_reg_operand" "")
+   (unspec:V16QI
+ [(match_operand:V16QI 1 "spu_reg_operand" "")
+  (match_operand:V16QI 2 "spu_reg_operand" "")
+  (match_operand:V16QI 3 "spu_reg_operand" "")]
+ UNSPEC_SHUFB))]
+  ""
+  {
+operands[3] = gen_lowpart (TImode, operands[3]);
+  })
+
 (define_insn "nop"
   [(unspec_volatile [(const_int 0)] UNSPECV_NOP)]
   ""
-- 
1.7.6.4

[PATCH 0/6] Cleanups for generic vector permutation.

2011-10-13 Thread rth

From: Richard Henderson 

These patches allow __builtin_shuffle to handle any vector permutation
via optabs.  It allows for a not-uncommon fallback to byte permutation
at rtl expansion time, while leaving the tree/gimple-level permutation
as element-based.

All three targets which heretofore supported vector permutation in any
way have been updated to support the new optabs.

The next step is to convert the vectorizer to use the VEC_PERM_EXPR code
rather than using the hook that returns builtins.  Once that is done,
it would be possible for the targets to delete the builtins.  Supposing
that they're not exposed for user-level consumption (which is the case
for i386; the user-level interface is via inlines in a header file, which
can be updated to use __builtin_shuffle).
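The fallback from element-based to byte-based permutation mentioned above can be pictured with a small portable model. This is only a sketch, not code from the patch series: the names `toy_expand_selector` and `toy_byte_permute`, the fixed 16-byte vector size, and the modulo reduction of the selector are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16  /* assumed vector width for this sketch */

/* Expand an element-based selector into a byte-based one.  ELT_SIZE is
   the element width in bytes; SEL has VEC_BYTES / ELT_SIZE entries, each
   indexing into the concatenation of two input vectors.  */
static void
toy_expand_selector (const unsigned *sel, size_t elt_size,
                     uint8_t byte_sel[VEC_BYTES])
{
  assert (elt_size != 0 && VEC_BYTES % elt_size == 0);
  size_t nelt = VEC_BYTES / elt_size;

  for (size_t i = 0; i < nelt; i++)
    {
      /* Two input vectors give 2 * nelt selectable elements.  */
      size_t src = sel[i] % (2 * nelt);
      for (size_t b = 0; b < elt_size; b++)
        byte_sel[i * elt_size + b] = (uint8_t) (src * elt_size + b);
    }
}

/* Byte permutation over the 32-byte concatenation of V0 and V1.  */
static void
toy_byte_permute (const uint8_t v0[VEC_BYTES], const uint8_t v1[VEC_BYTES],
                  const uint8_t byte_sel[VEC_BYTES], uint8_t out[VEC_BYTES])
{
  for (size_t i = 0; i < VEC_BYTES; i++)
    {
      unsigned idx = byte_sel[i] % (2 * VEC_BYTES);
      out[i] = idx < VEC_BYTES ? v0[idx] : v1[idx - VEC_BYTES];
    }
}
```

With four 4-byte elements and the selector {0, 5, 2, 7}, element 1 of the result comes from element 1 of the second vector, which the expanded selector expresses as bytes 20..23 of the concatenation.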

Tested on x86_64-linux, --with-cpu=corei7.
Tested on ppc64-linux, --with-cpu=G5.

Committed.


r~


Richard Henderson (6):
  rs6000: Implement vec_permv16qi.
  spu: Implement vec_permv16qi.
  i386: Implement vec_perm_const.
  Move lowering of vector shifts from v/s to v/v to rtl.
  rs6000: Fix typo in rs6000_expand_vector_init
  Expand vector permutation with vec_perm and vec_perm_const.

 gcc/config/i386/i386-protos.h |1 +
 gcc/config/i386/i386.c|   61 +++
 gcc/config/i386/sse.md|   21 +++
 gcc/config/rs6000/altivec.md  |9 +
 gcc/config/rs6000/rs6000.c|2 +-
 gcc/config/spu/spu.md |   12 ++
 gcc/doc/md.texi   |6 +
 gcc/genopinit.c   |1 +
 gcc/optabs.c  |  281 -
 gcc/optabs.h  |   12 +-
 gcc/testsuite/gcc.dg/vect/vec-scal-opt.c  |2 +-
 gcc/testsuite/gcc.dg/vect/vec-scal-opt1.c |2 +-
 gcc/testsuite/gcc.dg/vect/vec-scal-opt2.c |2 +-
 gcc/testsuite/lib/target-supports.exp |   21 ---
 gcc/tree-vect-generic.c   |   68 +++-
 15 files changed, 374 insertions(+), 127 deletions(-)

-- 
1.7.6.4

[PATCH 1/6] rs6000: Implement vec_permv16qi.

2011-10-13 Thread rth

From: Richard Henderson 

---
 gcc/config/rs6000/altivec.md |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 9e7437e..84c5444 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1357,6 +1357,15 @@
   "vperm %0,%1,%2,%3"
   [(set_attr "type" "vecperm")])
 
+(define_expand "vec_permv16qi"
+  [(set (match_operand:V16QI 0 "register_operand" "")
+   (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "")
+  (match_operand:V16QI 2 "register_operand" "")
+  (match_operand:V16QI 3 "register_operand" "")]
+ UNSPEC_VPERM))]
+  "TARGET_ALTIVEC"
+  "")
+
 (define_insn "altivec_vrfip"   ; ceil
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")]
-- 
1.7.6.4

[C++ Patch / RFC] PR 38174

2011-10-13 Thread Paolo Carlini


Hi,

so, assuming I understood Jason's tips correctly (thanks again for your 
patience ;), the fix for this pretty old issue seems even simpler than I 
guessed at triage time, because composite_pointer_type is already available 
and does all the real work.

The below passes the testsuite on x86_64-linux.

What do you think?

Paolo.

/
Index: testsuite/g++.dg/overload/operator4.C
===
--- testsuite/g++.dg/overload/operator4.C   (revision 0)
+++ testsuite/g++.dg/overload/operator4.C   (revision 0)
@@ -0,0 +1,14 @@
+// PR c++/38174
+
+struct VolatileIntPtr {
+  operator int volatile *();
+};
+
+struct ConstIntPtr {
+  operator int const *();
+};
+
+void test_with_ptrs(VolatileIntPtr vip, ConstIntPtr cip) {
+  bool b1 = (vip == cip);
+  long p1 = vip - cip;
+}
Index: cp/call.c
===
--- cp/call.c   (revision 179947)
+++ cp/call.c   (working copy)
@@ -2582,6 +2582,23 @@ add_builtin_candidate (struct z_candidate **candid
  || MAYBE_CLASS_TYPE_P (type1)
  || TREE_CODE (type1) == ENUMERAL_TYPE))
 {
+  if ((TYPE_PTR_P (type1) && TYPE_PTR_P (type2))
+ || (TYPE_PTRMEM_P (type1) && TYPE_PTRMEM_P (type2))
+ || TYPE_PTRMEMFUNC_P (type1))
+   {
+ tree cptype = composite_pointer_type (type1, type2,
+   error_mark_node,
+   error_mark_node,
+   CPO_CONVERSION,
+   tf_none);
+ if (cptype != error_mark_node)
+   {
+ build_builtin_candidate
+   (candidates, fnname, cptype, cptype, args, argtypes, flags);
+ return;
+   }
+   }
+
   build_builtin_candidate
(candidates, fnname, type1, type1, args, argtypes, flags);
   build_builtin_candidate

Re: [google] support for building Linux kernel with FDO (issue4523061)

2011-10-13 Thread vulcansh



Rong Xu wrote:
> 
> That will be good.
> But you never know, we have internally fixed some bugs that were filed
> with us because people use the kernel's old gcov code (many versions
> guarded by ifdef) for their tests.
> 
> -Rong
> 

Has there been any progress on this patch?  What version of gcc is this
patch for?  I am interested in something that works with gcc 4.7.

-Steve

-- 
View this message in context: 
http://old.nabble.com/-google---support-for-building-Linux-kernel-with-FDO-%28issue4523061%29-tp31607746p32649731.html
Sent from the gcc - patches mailing list archive at Nabble.com.

[v3] libstdc++/50714

2011-10-13 Thread Paolo Carlini


Hi,

tested x86_64-linux, committed to mainline.

Thanks,
Paolo.

/
2011-10-13  Paolo Carlini  

PR libstdc++/50714
* include/bits/codecvt.h (codecvt<>::codecvt(size_t)): Initialize
_M_c_locale_codecvt member.
* testsuite/22_locale/codecvt_byname/50714.cc: New.
Index: include/bits/codecvt.h
===
--- include/bits/codecvt.h  (revision 179947)
+++ include/bits/codecvt.h  (working copy)
@@ -292,7 +292,9 @@
 
   explicit
   codecvt(size_t __refs = 0)
-  : __codecvt_abstract_base<_InternT, _ExternT, _StateT> (__refs) { }
+  : __codecvt_abstract_base<_InternT, _ExternT, _StateT> (__refs),
+   _M_c_locale_codecvt(0)
+  { }
 
   explicit
   codecvt(__c_locale __cloc, size_t __refs = 0);
Index: testsuite/22_locale/codecvt_byname/50714.cc
===
--- testsuite/22_locale/codecvt_byname/50714.cc (revision 0)
+++ testsuite/22_locale/codecvt_byname/50714.cc (revision 0)
@@ -0,0 +1,94 @@
+// { dg-require-namedlocale "de_DE" }
+
+// Copyright (C) 2011 Free Software Foundation
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+
+#define mychar unsigned short
+
+namespace std
+{
+  template<> codecvt::~codecvt()
+  { }
+
+  template<>
+  codecvt::result
+  codecvt::
+  do_out(state_type&, const intern_type*, const intern_type*,
+const intern_type*&, extern_type*, extern_type*,
+extern_type*&) const
+  { return codecvt_base::ok; }
+
+  template<>
+  codecvt::result
+  codecvt::
+  do_in(state_type&, const extern_type*, const extern_type*,
+   const extern_type*&, intern_type*, intern_type*,
+   intern_type*&) const
+  { return codecvt_base::ok; }
+
+  template<>
+  codecvt::result
+  codecvt::
+  do_unshift(state_type&, extern_type*, extern_type*,
+extern_type*&) const
+  { return noconv; }
+
+  template<>
+  int
+  codecvt::do_encoding() const
+  { return 0; }
+
+  template<>
+  bool
+  codecvt::do_always_noconv() const
+  { return false; }
+
+  template<>
+  int
+  codecvt::
+  do_length(state_type&, const extern_type*, const extern_type*,
+   size_t) const
+  { return 0; }
+
+  template<>
+  int
+  codecvt::do_max_length() const
+  { return 4; }
+}
+
+// libstdc++/50714
+void test01()
+{
+  using namespace std;
+
+  {
+locale loc(locale::classic(),
+  new codecvt());
+  }
+  {
+locale loc2(locale::classic(),
+   new codecvt_byname("de_DE"));
+  }
+}
+
+int main()
+{
+  test01();
+  return 0;
+}

RE: ObjC/ObjC++ Patch: rewrite objc/objc++ frontend hashtables

2011-10-13 Thread Nicola Pero

I actually forgot to post a tiny bit that is required to support
the additional objc/objc-map.h and objc/objc-map.c files.  It's
part of the same patch.  Apologies.

Thanks

Index: gengtype.c
===
--- gengtype.c  (revision 179947)
+++ gengtype.c  (working copy)
@@ -1817,6 +1817,11 @@ struct file_rule_st files_rules[] = {
 REG_EXTENDED, NULL_REGEX,
 "gt-objc-objc-act.h", "objc/objc-act.c", NULL_FRULACT },
 
+  /* objc/objc-map.h gives gt-objc-objc-map.h for objc/objc-map.c !  */
+  { DIR_PREFIX_REGEX "objc/objc-map\\.h$",
+REG_EXTENDED, NULL_REGEX,
+"gt-objc-objc-map.h", "objc/objc-map.c", NULL_FRULACT },
+
   /* General cases.  For header *.h and source *.c files, we need
* special actions to handle the language.  */
 
Index: ChangeLog
===
--- ChangeLog   (revision 179947)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2011-10-14  Nicola Pero  
+
+   * gengtype.c (files_rules): Added rules for objc/objc-map.h and
+   objc/objc-map.c.
+
 2011-10-13  Jakub Jelinek  
 
* config/i386/sse.md (vec_set): Change V_128 iterator mode

ObjC/ObjC++ Patch: rewrite objc/objc++ frontend hashtables

2011-10-13 Thread Nicola Pero

This patch finally rewrites the hashtables used by the ObjC (and ObjC++)
frontend.  The new code speeds up the compiler by about 4% when compiling
the standard GNUstep ObjC system headers with -fsyntax-only.  That's quite
good for a change that does nothing but swap a hashtable implementation
with another one.

PS: This also supersedes the two small ObjC hashtable patches that I sent in
the past 12 months or so and that were never applied.  The hashtable
implemented by the current patch is polished and fast.

Bootstrapped and regtested on gnu-linux i686.

Ok to commit ?

Thanks

Index: gcc/objc/ChangeLog
===
--- gcc/objc/ChangeLog  (revision 179864)
+++ gcc/objc/ChangeLog  (working copy)
@@ -1,3 +1,54 @@
+2011-10-14  Nicola Pero  
+
+   * objc-map.h: New file.
+   * objc-map.c: New file. 
+   * config-lang.in (gtfiles): Added objc-map.h.
+   * Make-lang.in (OBJC_OBJS): Added objc-map.o.
+   (objc/objc-map.o): New rule.
+   (objc/objc-act.o): Depend on objc/objc-map.h.
+   * objc-next-runtime-abi-02.c: Added a TODO comment.
+   * objc-act.c: Include objc-map.h.
+   (nst_method_hash_list, cls_method_hash_list): Removed.
+   (instance_method_map, class_method_map): New.
+   (cls_name_hash_list, als_name_hash_list): Removed.
+   (class_name_map, alias_name_map): Removed.
+   (ivar_offset_hash_list): Removed.
+   (hash_class_name_enter, hash_class_name_lookup, hash_enter,
+   hash_lookup, hash_add_attr, add_method_to_hash_list): Removed.
+   (interface_hash_init): New.
+   (objc_init): Call interface_hash_init.
+   (objc_write_global_declarations): Iterate over class_method_map
+   and instance_method_map instead of cls_method_hash_list and
+   nst_method_hash_list.
+   (objc_declare_alias): Use alias_name_map instead of
+   cls_name_hash_list.
+   (objc_is_class_name): Use class_name_map and alias_name_map
+   instead of cls_name_hash_list and als_name_hash_list.
+   (interface_tuple, interface_htab, hash_interface, eq_interface):
+   Removed.
+   (interface_map): New.
+   (add_class): Renamed to add_interface.  Use interface_map instead
+   of interface_htab.
+   (lookup_interface): Use interface_map instead of interface_htab.
+   (check_duplicates): Changed first argument to be a tree,
+   potentially a TREE_VEC, instead of a hash.  Changed implementation
+   to match.
+   (lookup_method_in_hash_lists): Use class_method_map and
+   instance_method_map instead of cls_method_hash_list and
+   nst_method_hash_list.
+   (objc_build_selector_expr): Likewise.
+   (hash_func): Removed.
+   (hash_init): Create instance_method_map, class_method_map,
+   class_name_map, and alias_name_map.  Do not create
+   nst_method_hash_list, cls_method_hash_list, cls_name_hash_list,
+   als_name_hash_list, and ivar_offset_hash_list.
+   (insert_method_into_method_map): New.
+   (objc_add_method): Use insert_method_into_method_map instead of
+   add_method_to_hash_list.
+   (start_class): Call add_interface instead of add_class.
+   * objc-act.h (cls_name_hash_list, als_name_hash_list,
+   nst_method_hash_list, cls_method_hash_list): Removed.
+
 2011-10-11  Michael Meissner  
 
* objc-next-runtime-abi-01.c (objc_build_exc_ptr): Delete old
Index: gcc/objc/config-lang.in
===
--- gcc/objc/config-lang.in (revision 179864)
+++ gcc/objc/config-lang.in (working copy)
@@ -36,4 +36,4 @@ lang_requires="c"
 # Order is important.  If you change this list, make sure you test
 # building without C++ as well; that is, remove the gcc/cp directory,
 # and build with --enable-languages=c,objc.
-gtfiles="\$(srcdir)/c-family/c-objc.h \$(srcdir)/objc/objc-act.h 
\$(srcdir)/objc/objc-act.c \$(srcdir)/objc/objc-runtime-shared-support.c 
\$(srcdir)/objc/objc-gnu-runtime-abi-01.c 
\$(srcdir)/objc/objc-next-runtime-abi-01.c 
\$(srcdir)/objc/objc-next-runtime-abi-02.c \$(srcdir)/c-parser.c 
\$(srcdir)/c-tree.h \$(srcdir)/c-decl.c \$(srcdir)/c-lang.h 
\$(srcdir)/c-objc-common.c \$(srcdir)/c-family/c-common.c 
\$(srcdir)/c-family/c-common.h \$(srcdir)/c-family/c-cppbuiltin.c 
\$(srcdir)/c-family/c-pragma.h \$(srcdir)/c-family/c-pragma.c"
+gtfiles="\$(srcdir)/objc/objc-map.h \$(srcdir)/c-family/c-objc.h 
\$(srcdir)/objc/objc-act.h \$(srcdir)/objc/objc-act.c 
\$(srcdir)/objc/objc-runtime-shared-support.c 
\$(srcdir)/objc/objc-gnu-runtime-abi-01.c 
\$(srcdir)/objc/objc-next-runtime-abi-01.c 
\$(srcdir)/objc/objc-next-runtime-abi-02.c \$(srcdir)/c-parser.c 
\$(srcdir)/c-tree.h \$(srcdir)/c-decl.c \$(srcdir)/c-lang.h 
\$(srcdir)/c-objc-common.c \$(srcdir)/c-family/c-common.c 
\$(srcdir)/c-family/c-common.h \$(srcdir)/c-family/c-cppbuiltin.c 
\$(srcdir)/c-family/c-pragma.h \$(srcdir)/c-family/c-pragma.c"
Index: gcc/objc/Make-lang

Re: [C++ Patch] PR 17212

2011-10-13 Thread Paolo Carlini


On 10/13/2011 04:24 PM, Jason Merrill wrote:

On 10/13/2011 09:53 AM, Paolo Carlini wrote:
Yes I briefly wondered that but I know *so* little about that front 
end... Do you think we can just add it? Probably yes ;)
Definitely.  Anything supported in C++ should also be in Obj-C++ by 
default.

Ok, many thanks to Mike too for the additional clarifications.

I tested on x86_64-linux the below. Ok for mainline?

Thanks,
Paolo.

//
/gcc
2011-10-13  Paolo Carlini  

PR c++/17212
* c-family/c.opt ([Wformat-zero-length]): Add C++ and Objective-C++.
* doc/invoke.texi: Update.

/testsuite
2011-10-13  Paolo Carlini  

PR c++/17212
* g++.dg/warn/format6.C: New.
* obj-c++.dg/warn6.mm: Likewise.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 179947)
+++ doc/invoke.texi (working copy)
@@ -3190,7 +3190,7 @@ in the case of @code{scanf} formats, this option w
 warning if the unused arguments are all pointers, since the Single
 Unix Specification says that such unused arguments are allowed.
 
-@item -Wno-format-zero-length @r{(C and Objective-C only)}
+@item -Wno-format-zero-length @r{(C, C++, Objective-C and Objective-C++ only)}
 @opindex Wno-format-zero-length
 @opindex Wformat-zero-length
 If @option{-Wformat} is specified, do not warn about zero-length formats.
Index: c-family/c.opt
===
--- c-family/c.opt  (revision 179947)
+++ c-family/c.opt  (working copy)
@@ -396,7 +396,7 @@ C ObjC C++ ObjC++ Var(warn_format_y2k) Warning
 Warn about strftime formats yielding 2-digit years
 
 Wformat-zero-length
-C ObjC Var(warn_format_zero_length) Warning
+C ObjC C++ ObjC++ Var(warn_format_zero_length) Warning
 Warn about zero-length formats
 
 Wformat=
Index: testsuite/g++.dg/warn/format6.C
===
--- testsuite/g++.dg/warn/format6.C (revision 0)
+++ testsuite/g++.dg/warn/format6.C (revision 0)
@@ -0,0 +1,7 @@
+// PR c++/17212
+// { dg-options "-Wformat -Wno-format-zero-length" }
+
+void f()
+{
+  __builtin_printf("");
+}
Index: testsuite/obj-c++.dg/warn6.mm
===
--- testsuite/obj-c++.dg/warn6.mm   (revision 0)
+++ testsuite/obj-c++.dg/warn6.mm   (revision 0)
@@ -0,0 +1,7 @@
+// PR c++/17212
+// { dg-options "-Wformat -Wno-format-zero-length" }
+
+void f()
+{
+  __builtin_printf("");
+}

Re: [PR50672, PATCH] Fix ice triggered by -ftree-tail-merge: verify_ssa failed: no immediate_use list

2011-10-13 Thread Tom de Vries

On 10/12/2011 02:19 PM, Richard Guenther wrote:
> On Wed, Oct 12, 2011 at 8:35 AM, Tom de Vries  wrote:
>> Richard,
>>
>> I have a patch for PR50672.
>>
>> When compiling the testcase from the PR with -ftree-tail-merge, the scenario
>> is as follows:
>>
>> We start out tail_merge_optimize with blocks 14 and 20, which are alike, but
>> not equal, since they have different successors:
>> ...
>>  # BLOCK 14 freq:690
>>  # PRED: 25 [61.0%]  (false,exec)
>>
>>  if (wD.2197_57(D) != 0B)
>>goto ;
>>  else
>>goto ;
>>  # SUCC: 15 [78.4%]  (true,exec) 16 [21.6%]  (false,exec)
>>
>>
>>  # BLOCK 20 freq:2900
>>  # PRED: 29 [100.0%]  (fallthru) 31 [100.0%]  (fallthru)
>>
>>  # .MEMD.2447_209 = PHI <.MEMD.2447_125(29), .MEMD.2447_129(31)>
>>  if (wD.2197_57(D) != 0B)
>>goto ;
>>  else
>>goto ;
>>  # SUCC: 5 [85.0%]  (true,exec) 6 [15.0%]  (false,exec)
>> ...
>>
>> In the first iteration, we merge block 5 with block 15 and block 6 with block
>> 16. After that, the blocks 14 and 20 are equal.
>>
>> In the second iteration, the blocks 14 and 20 are merged, by redirecting the
>> incoming edges of block 20 to block 14, and removing block 20.
>>
>> Block 20 also contains the definition of .MEMD.2447_209. Removing the
>> definition delinks the vuse of .MEMD.2447_209 in block 5:
>> ...
>>  # BLOCK 5 freq:6036
>>  # PRED: 20 [85.0%]  (true,exec)
>>
>>  # PT = nonlocal escaped
>>  D.2306_58 = &thisD.2200_10(D)->D.2156;
>>  # .MEMD.2447_132 = VDEF <.MEMD.2447_209>
>>  # USE = anything
>>  # CLB = anything
>>  drawLineD.2135 (D.2306_58, wD.2197_57(D), gcD.2198_59(D));
>>  goto ;
>>  # SUCC: 17 [100.0%]  (fallthru,exec)
>> ...
> 
> And block 5 is retained and block 15 is discarded?
> 

Indeed.

>> After the pass, when executing the TODO_update_ssa_only_virtuals, we update
>> the drawLine call in block 5 using rewrite_update_stmt, which calls
>> maybe_replace_use for the vuse operand.
>>
>> However, maybe_replace_use doesn't have an effect since the old vuse and the
>> new vuse happen to be the same (rdef == use), so SET_USE is not called and
>> the vuse remains delinked:
>> ...
>>  if (rdef && rdef != use)
>>SET_USE (use_p, rdef);
>> ...
>>
>> The patch fixes this by forcing SET_USE for delinked uses.
> 
> That isn't the correct fix.  Whoever unlinks the vuse (by removing its
> definition) has to replace it with something valid, which is either the
> bare symbol .MEM, or the VUSE associated with the removed VDEF
> (thus, as unlink_stmt_vdef does).
> 

Another try. For each deleted bb, we call unlink_stmt_vdef for the statements,
and replace the .MEM phi uses with the bare .MEM symbol.
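The difference between the two approaches discussed in this thread can be illustrated with a toy model. This is a sketch under loud assumptions: `struct toy_use` and all `toy_*` functions are hypothetical stand-ins, not GCC's data structures, and the only thing taken from the thread is the shape of the `rdef && rdef != use` guard.

```c
#include <assert.h>

/* Toy stand-in for a virtual use operand.  NAME identifies the SSA name
   being used (0 means "none"); LINKED records whether the operand sits
   on that name's immediate-use list.  */
struct toy_use
{
  int name;
  int linked;
};

/* Model of SET_USE: point the operand at NAME and (re)link it.  */
static void
toy_set_use (struct toy_use *use, int name)
{
  assert (name != 0);
  use->name = name;
  use->linked = 1;
}

/* Model of deleting the defining statement without fixing up its uses:
   the operand keeps its old name but drops off the use list.  */
static void
toy_delete_def (struct toy_use *use)
{
  use->linked = 0;
}

/* Model of the guard quoted in the thread: only relink when the
   reaching definition differs from the current name.  */
static void
toy_maybe_replace_use (struct toy_use *use, int rdef)
{
  if (rdef && rdef != use->name)
    toy_set_use (use, rdef);
}

/* Model of the reviewer's suggestion: whoever removes the definition
   immediately rewrites the use to something valid (here BARE_SYMBOL).  */
static void
toy_release_def (struct toy_use *use, int bare_symbol)
{
  toy_delete_def (use);
  toy_set_use (use, bare_symbol);
}
```

In this model, calling `toy_delete_def` followed by `toy_maybe_replace_use` with the same name leaves the use delinked, whereas `toy_release_def` never leaves it in that state.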

Bootstrapped and reg-tested on x86_64.

Ok for trunk?

Thanks,
- Tom

> Richard.
> 


2011-10-14  Tom de Vries  

PR tree-optimization/50672
* tree-ssa-tail-merge.c (release_vdefs): New function.
(purge_bbs): Add update_vops parameter.  Call release_vdefs for each
deleted basic block.
(tail_merge_optimize): Add argument to call to purge_bbs.
Index: gcc/tree-ssa-tail-merge.c
===
--- gcc/tree-ssa-tail-merge.c (revision 179773)
+++ gcc/tree-ssa-tail-merge.c (working copy)
@@ -773,18 +773,53 @@ same_succ_flush_bbs (bitmap bbs)
 }
 }
 
+/* Release all vdefs in BB, either normal or phi results.  */
+
+static void
+release_vdefs (basic_block bb)
+{
+  gimple_stmt_iterator i;
+
+  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
+{
+  gimple phi = gsi_stmt (i);
+  tree res = gimple_phi_result (phi);
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  gimple use_stmt;
+
+  if (is_gimple_reg (res))
+	continue;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, res)
+	{
+	  FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+	SET_USE (use_p, SSA_NAME_VAR (res));
+	}
+}
+  
+  for (i = gsi_last_bb (bb); !gsi_end_p (i); gsi_prev_nondebug (&i))
+unlink_stmt_vdef (gsi_stmt (i));
+}
+
 /* Delete all deleted_bbs.  */
 
 static void
-purge_bbs (void)
+purge_bbs (bool update_vops)
 {
   unsigned int i;
   bitmap_iterator bi;
+  basic_block bb;
 
   same_succ_flush_bbs (deleted_bbs);
 
   EXECUTE_IF_SET_IN_BITMAP (deleted_bbs, 0, i, bi)
-delete_basic_block (BASIC_BLOCK (i));
+{
+  bb = BASIC_BLOCK (i);
+  if (!update_vops)
+	release_vdefs (bb);
+  delete_basic_block (bb);
+}
 
   bitmap_and_compl_into (deleted_bb_preds, deleted_bbs);
   bitmap_clear (deleted_bbs);
@@ -1665,7 +1700,7 @@ tail_merge_optimize (unsigned int todo)
 	break;
 
   free_dominance_info (CDI_DOMINATORS);
-  purge_bbs ();
+  purge_bbs (update_vops);
 
   if (iteration_nr == max_iterations)
 	break;

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 3:52 PM, Richard Kenner
 wrote:
>> Like this?
>
> Yes, that's what I meant.  Thanks.
>
> Again, I'd suggest doing some performance testing on this just to verify
> that it doesn't pessimize things.
>

I will run SPEC CPU 2K/2006 on ia32, x86-64 and x32.

-- 
H.J.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> Like this?

Yes, that's what I meant.  Thanks.

Again, I'd suggest doing some performance testing on this just to verify
that it doesn't pessimize things.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 3:33 PM, Richard Kenner
 wrote:
>> I am testing this patch.  The difference is it checks nonzero
>> bits of the first operand.
>
> I would suggest moving (and expanding) the comments from the existing block
> into your new block.
>

Like this?

-- 
H.J.
---
diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..4b57b88 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7739,16 +7739,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
 XEXP (XEXP (x, 0), 1)));
}

-  /* If the constant is one less than a power of two, this might be
-representable by an extraction even if no shift is present.
-If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
-we are in a COMPARE.  */
-  else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
-   new_rtx = make_extraction (mode,
-  make_compound_operation (XEXP (x, 0),
-   next_code),
-  0, NULL_RTX, i, 1, 0, in_code == COMPARE);
-
   /* If we are in a comparison and this is an AND with a power of two,
 convert this into the appropriate bit extract.  */
   else if (in_code == COMPARE
@@ -7758,6 +7748,26 @@ make_compound_operation (rtx x, enum rtx_code in_code)
next_code),
   i, NULL_RTX, 1, 1, 0, 1);

+  /* If the constant is an extraction mask with the zero bits in
+the first operand ignored, this might be representable by an
+extraction even if no shift is present.  If it doesn't end up
+being a ZERO_EXTEND, we will ignore it unless we are in a
+COMPARE.  */
+  else
+   {
+ unsigned HOST_WIDE_INT nonzero =
+   nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0)));
+ unsigned HOST_WIDE_INT mask = UINTVAL (XEXP (x, 1));
+ unsigned HOST_WIDE_INT len = ceil_log2 (mask);
+ if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1))
+ == (nonzero & mask))
+   {
+ new_rtx = make_compound_operation (XEXP (x, 0), next_code);
+ new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX,
+len, 1, 0, in_code == COMPARE);
+   }
+   }
+
   break;

 case LSHIFTRT:

Re: Ping shrink wrap patches

2011-10-13 Thread Alan Modra

On Thu, Oct 13, 2011 at 07:04:59PM +0200, Bernd Schmidt wrote:
> On 10/13/11 18:50, Bernd Schmidt wrote:
> > On 10/13/11 14:27, Alan Modra wrote:
> >> Without the ifcvt
> >> optimization for a function "int foo (int x)" we might have something
> >> like
> >>
> >>  r29 = r3; // save r3 in callee saved reg
> >>  if (some test) goto exit_label
> >>  // main body of foo, calling other functions
> >>  r3 = 0;
> >>  return;
> >> exit_label:
> >>  r3 = 1;
> >>  return;
> >>
> >> Bernd's http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00380.html quite
> >> happily rearranges the r29 assignment to be after the "if", and shrink
> >> wrapping occurs.  With the ifcvt optimization we get
> >>
> >>  r29 = r3; // save r3 in callee saved reg
> >>  r3 = 1;
> >>  if (some test) goto exit_label
> >>  // main body of foo, calling other functions
> >>  r3 = 0;
> >> exit_label:
> >>  return;
> > 
> > I wonder if this can't be described as another case for moving an insn
> > downwards in prepare_shrink_wrap, rather than stopping ifcvt?

Actually, I don't stop ifcvt completely, just on the ifcvt passes
before the function prologue/epilogue is emitted, and only on a very
restricted set of blocks.  Also, x86 and any other target that doesn't
define both FUNCTION_VALUE_REGNO_P and FUNCTION_ARG_REGNO_P won't be
affected at all.

> I.e. something like this? Minimally tested by inspecting some generated
> assembly. I haven't found a case where it enables extra shrink-wrapping
> on i686, but maybe it's different on ppc?

It certainly is different on ppc. :)  x86 doesn't even run into this
issue: since x86 function args are passed on the stack, they won't be
trashed by ifcvt setting up function return values early.

I'll try out your patch, but I think it likely won't work.
prepare_shrink_wrap pushes insns down into successor blocks only when
there's just one live edge for the reg, and that's quite unlikely for
an input arg like r3.

> Index: /local/src/egcs/scratch-trunk/gcc/function.c
> ===
> --- /local/src/egcs/scratch-trunk/gcc/function.c  (revision 179848)
> +++ /local/src/egcs/scratch-trunk/gcc/function.c  (working copy)
> @@ -5369,13 +5369,13 @@ static void
>  prepare_shrink_wrap (basic_block entry_block)
>  {
>rtx insn, curr;
> -  FOR_BB_INSNS_SAFE (entry_block, insn, curr)
> +  FOR_BB_INSNS_REVERSE_SAFE (entry_block, insn, curr)
>  {
>basic_block next_bb;
>edge e, live_edge;
>edge_iterator ei;
> -  rtx set, scan;
> -  unsigned destreg, srcreg;
> +  rtx set, src, dst, scan;
> +  unsigned destreg;
>  
>if (!NONDEBUG_INSN_P (insn))
>   continue;
> @@ -5383,12 +5383,14 @@ prepare_shrink_wrap (basic_block entry_b
>if (!set)
>   continue;
>  
> -  if (!REG_P (SET_SRC (set)) || !REG_P (SET_DEST (set)))
> +  src = SET_SRC (set);
> +  dst = SET_DEST (set);
> +  if (!(REG_P (src) || CONSTANT_P (src)) || !REG_P (dst))
>   continue;
> -  srcreg = REGNO (SET_SRC (set));
> -  destreg = REGNO (SET_DEST (set));
> -  if (hard_regno_nregs[srcreg][GET_MODE (SET_SRC (set))] > 1
> -   || hard_regno_nregs[destreg][GET_MODE (SET_DEST (set))] > 1)
> +  destreg = REGNO (dst);
> +  if (hard_regno_nregs[destreg][GET_MODE (dst)] > 1)
> + continue;
> +  if (REG_P (src) && hard_regno_nregs[REGNO (src)][GET_MODE (src)] > 1)
>   continue;
>  
>next_bb = entry_block;
> @@ -5436,7 +5438,8 @@ prepare_shrink_wrap (basic_block entry_b
>   if (REG_NOTE_KIND (link) == REG_INC)
> record_hard_reg_sets (XEXP (link, 0), NULL, &set_regs);
>  
> -   if (TEST_HARD_REG_BIT (set_regs, srcreg)
> +   if ((REG_P (src)
> +&& TEST_HARD_REG_BIT (set_regs, REGNO (src)))
> || reg_referenced_p (SET_DEST (set),
>  PATTERN (scan)))
>   {


-- 
Alan Modra
Australia Development Lab, IBM

Re: [Patch, Darwin] fix PR50699.

2011-10-13 Thread Iain Sandoe



On 13 Oct 2011, at 23:22, Mike Stump wrote:

+/* Add $LDBL128 suffix to long double builtins for ppc darwin.  */

static void
-darwin_patch_builtin (int fncode)
+darwin_patch_builtin (enum built_in_function fncode)


This is a property of the target machine.  DARWIN_PPC is a property  
of the target machine; maybe if (DARWIN_PPC) { } will do what you  
want?


yes, that should be right - there was no reason that this should have  
broken x86 darwin.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> I am testing this patch.  The difference is it checks nonzero
> bits of the first operand.

I would suggest moving (and expanding) the comments from the existing block
into your new block.

Re: [Patch, Darwin] fix PR50699.

2011-10-13 Thread Mike Stump

On Oct 13, 2011, at 8:22 AM, Iain Sandoe wrote:
> .. this looks like an (almost) obvious fix for the bootstrap breakage...

No...

> -/* Add $LDBL128 suffix to long double builtins.  */
> +#if defined (__ppc__) || defined (__ppc64__)

__ppc__ is a property of the host machine.

> +/* Add $LDBL128 suffix to long double builtins for ppc darwin.  */
> 
> static void
> -darwin_patch_builtin (int fncode)
> +darwin_patch_builtin (enum built_in_function fncode)

This is a property of the target machine.  DARWIN_PPC is a property of the 
target machine; maybe if (DARWIN_PPC) { } will do what you want?  If that 
works, Ok with that change.
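The host/target distinction above can be sketched as follows. This is a minimal illustration, not the committed fix: it assumes the target headers define `DARWIN_PPC` to 1 for ppc darwin and 0 otherwise (defaulted to 0 here so the fragment compiles on its own), and `toy_patch_builtin`, `toy_patch_builtins` and `toy_patched_count` are hypothetical names.

```c
#include <assert.h>

/* Assumption for this sketch: the target configuration headers define
   DARWIN_PPC.  Default it to 0 so the fragment compiles standalone.  */
#ifndef DARWIN_PPC
#define DARWIN_PPC 0
#endif

/* Hypothetical counter standing in for the real patching side effect.  */
static int toy_patched_count;

static void
toy_patch_builtin (int fncode)
{
  assert (fncode >= 0);
  toy_patched_count++;
}

/* The decision depends on the target being compiled for, so it is an
   ordinary condition on a target macro.  A test such as
   "#if defined (__ppc__)" would instead ask which machine is building
   the compiler itself.  */
static void
toy_patch_builtins (void)
{
  if (DARWIN_PPC)
    {
      toy_patch_builtin (0);
      toy_patch_builtin (1);
    }
}
```

Because the condition is a plain `if`, the guarded code is still parsed and type-checked for every target, which tends to catch breakage earlier than a preprocessor conditional would.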

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> But the current code converts (and X 3) into a bit extraction
> since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true
> when UINTVAL (XEXP (x, 1)) == 3.  Should we do it or not?

By adding the test for nonzero bits, you'd potentially be doing the
conversion more often (which is the point of this patch, after all) and
it's therefore necessary to be sure you're not doing it *too* often.
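To make the two conditions under discussion concrete, here is a small standalone model using 64-bit integers. It is a sketch only: the `toy_*` helpers are stand-ins written for this example (they are not GCC's `exact_log2` or `ceil_log2`), and the sample values in the note below are illustrative rather than taken from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest n such that 2**n >= x, for x != 0.  */
static unsigned
toy_ceil_log2 (uint64_t x)
{
  unsigned n = 0;
  assert (x != 0);
  while (n < 64 && (UINT64_C (1) << n) < x)
    n++;
  return n;
}

/* log2 of x if x is a power of two, otherwise -1.  */
static int
toy_exact_log2 (uint64_t x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  return (int) toy_ceil_log2 (x);
}

/* Mask with the low LEN bits set, avoiding an out-of-range shift.  */
static uint64_t
toy_low_mask (unsigned len)
{
  return len >= 64 ? UINT64_MAX : (UINT64_C (1) << len) - 1;
}

/* Existing condition: the mask is one less than a power of two.  */
static int
toy_old_test (uint64_t mask)
{
  return toy_exact_log2 (mask + 1) >= 0;
}

/* Proposed condition: within the bits that can be nonzero in the first
   operand, the mask behaves like a low-bits mask of ceil_log2 (mask)
   bits.  */
static int
toy_new_test (uint64_t nonzero, uint64_t mask)
{
  unsigned len = toy_ceil_log2 (mask);
  return (nonzero & toy_low_mask (len)) == (nonzero & mask);
}
```

For a mask of 3 both tests accept. For a mask of 0xfffffffc applied to an operand whose low two bits are known to be zero (nonzero bits 0xfffffffc), only the new test accepts, which is the extra case the patch is meant to catch and the reason for the performance check suggested above.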

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 2:45 PM, H.J. Lu  wrote:
> On Thu, Oct 13, 2011 at 2:30 PM, Richard Kenner
>  wrote:
>>> It is because mask 0x is optimized to 0xfffc by keeping track
>>> of non-zero bits in registers and the above code doesn't take that
>>> into account.
>>
>> Then I'd suggest modifying that code so that it does rather than
>> essentially duplicating it.  But I'd recommend running some
>> performance tests to verify that you're not pessimizing things when
>> you do that: this stuff can be very tricky and you want to make sure
>> that you're not converting something like (and X 3) into a bit
>> extraction unnecessarily.
>>
>
> But the current code converts (and X 3) into a bit extraction
> since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true
> when UINTVAL (XEXP (x, 1))  == 3.  Should we do it or not?
>

I am testing this patch.  The difference is it checks nonzero
bits of the first operand.

-- 
H.J.
--

diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..598dee3 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7739,16 +7739,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
 XEXP (XEXP (x, 0), 1)));
}

-  /* If the constant is one less than a power of two, this might be
-representable by an extraction even if no shift is present.
-If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
-we are in a COMPARE.  */
-  else if ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0)
-   new_rtx = make_extraction (mode,
-  make_compound_operation (XEXP (x, 0),
-   next_code),
-  0, NULL_RTX, i, 1, 0, in_code == COMPARE);
-
   /* If we are in a comparison and this is an AND with a power of two,
 convert this into the appropriate bit extract.  */
   else if (in_code == COMPARE
@@ -7758,6 +7748,23 @@ make_compound_operation (rtx x, enum rtx_code in_code)
next_code),
   i, NULL_RTX, 1, 1, 0, 1);

+  /* If we are (and (OP) M) and M is an extraction mask, this is an
+extraction.  */
+  else
+   {
+ unsigned HOST_WIDE_INT nonzero =
+   nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0)));
+ unsigned HOST_WIDE_INT mask = UINTVAL (XEXP (x, 1));
+ unsigned HOST_WIDE_INT len = ceil_log2 (mask);
+ if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1))
+ == (nonzero & mask))
+   {
+ new_rtx = make_compound_operation (XEXP (x, 0), next_code);
+ new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX,
+len, 1, 0, in_code == COMPARE);
+   }
+   }
+
   break;

 case LSHIFTRT:

Re: [pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-13 Thread Gabriel Charette

Just looked at the line_table related sections, but see comments below:

On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo  wrote:
>
> Currently, the consistency check done on pre-processor symbols is
> triggering on symbols that are not really problematic (e.g., symbols
> used for double-include guards).
>
> The problem is that in the testsuite, we are refusing to process PPH
> images that fail that test, which means we don't get to test other
> issues.  To avoid this, I changed the error() call to warning().  Seemed
> innocent enough, but there were more problems behind that one:
>
> 1- We do not really try to avoid reading PPH images more than once.
>   This problem is different than the usual double-inclusion guard.
>   For instance, suppose a file foo.pph includes 1.pph, 2.pph and
>   3.pph.  When generating foo.pph, we read all 3 files just once and
>   double-include guards do not need to trigger.  However, if we are
>   later building a TU with:
>        #include 2.pph
>        #include foo.pph
>   we first read 2.pph and when reading foo.pph, we try to read 2.pph
>   again, because it is mentioned in foo.pph's line map table.
>
>   I added a guard in pph_stream_open() so it doesn't try to open the
>   same file more than once, but that meant adjusting some of the
>   assertions while reading the line table.  We should not expect to
>   find foo.pph's line map table exactly like the one we wrote.

That makes sense.

> @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream)
>   int entries_offset = line_table->used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES;
>   enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream);
>
> -  pph_reading_includes++;
> -
>   for (first = true; next_lt_marker != PPH_LINETABLE_END;
>        next_lt_marker = pph_in_linetable_marker (stream))
>     {
> @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream)
>          else
>            lm->included_from += entries_offset;
>
> -         gcc_assert (lm->included_from < (int) line_table->used);
> -

This should still hold: included_from can never point to an entry that
doesn't exist (i.e. one beyond line_table->used).  Since we recalculate
it on the previous line by adding entries_offset, the assert was just a
sanity check that everything read makes sense.

>          lm->start_location += pph_loc_offset;
>
>          line_table->used++;
>        }
>     }
>
> -  pph_reading_includes--;
> +  /* We used to expect exactly the same number of entries, but files
> +     included from this PPH file may sometimes not be needed.  For
> +     example,
> +
> +       #include "2.pph"
> +       #include "foo.pph"
> +         +-->  #include "1.pph"
> +               #include "2.pph"
> +               #include "3.pph"
> +
> +     When foo.pph was originally created, the line table was built
> +     with inclusions of 1.pph, 2.pph and 3.pph.  But when compiling
> +     the main translation unit, we include 2.pph before foo.pph, so
> +     the inclusion of 2.pph from foo.pph does nothing.  Leaving the
> +     line table in a different shape than the original compilation.
>
> +     Instead of insisting on getting EXPECTED_IN entries, we expect at
> +     most EXPECTED_IN entries.  */
>   {
>     unsigned int expected_in = pph_in_uint (stream);
> -    gcc_assert (line_table->used - used_before == expected_in);
> +    gcc_assert (line_table->used - used_before <= expected_in);

I'm not sure exactly how you skip already-parsed headers now.  We
didn't when I wrote this code, and that was the only problem
remaining in the line_table (i.e. duplicate entries for guarded
headers in the non-pph compile).  But couldn't you count the number of
skipped entries and assert ((line_table->used - used_before) +
numSkipped == expected_in)?
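
The counting idea can be sketched on a toy model.  This is only an
illustration under assumptions: toy_line_table, toy_contains and
toy_merge are hypothetical names, entries are reduced to file names, and
none of this is the actual pph reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16

/* Toy line table: only the names of the files entered so far.  */
struct toy_line_table {
    const char *files[TOY_MAX_ENTRIES];
    size_t used;
};

/* Return true if NAME is already present in T.  */
static bool toy_contains(const struct toy_line_table *t, const char *name)
{
    for (size_t i = 0; i < t->used; i++)
        if (strcmp(t->files[i], name) == 0)
            return true;
    return false;
}

/* Merge the EXPECTED_IN entries recorded in an image into T, skipping
   files that are already present.  Returns the number of skipped
   entries and checks the strict equality suggested above.  */
static size_t toy_merge(struct toy_line_table *t,
                        const char *const *incoming, size_t expected_in)
{
    size_t used_before = t->used;
    size_t skipped = 0;

    for (size_t i = 0; i < expected_in; i++) {
        if (toy_contains(t, incoming[i])) {
            skipped++;
        } else {
            assert(t->used < TOY_MAX_ENTRIES);
            t->files[t->used++] = incoming[i];
        }
    }

    /* Added entries plus skipped entries must account for every
       entry that was written to the image.  */
    assert((t->used - used_before) + skipped == expected_in);
    return skipped;
}
```

With 2.pph already in the table, merging foo.pph's three includes adds
two entries and skips one, so the equality holds where the looser "<="
check would also have accepted a silently dropped entry.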

I'd have to re-download the code; I've been following along through the
patches, but I'm no longer sure exactly how the "guarded headers
skipping" is done.  My memorized knowledge of the codebase has
diverged, I feel!

A more important note: I think it could be worth having a new flag
that outputs the line_table when parsing is done (as a means of robustly
testing it).  My usual way to test it was to set a breakpoint on
varpool_assemble_decl (a random choice, but it is only called after
parsing is done), in both pph and non-pph compiles, and compare the
line_table in gdb.  However, to have a stable test in the long run,
it would be nice to have a flag that asks for a dump of the
line_table; we could then checksum and compare the line_tables
output by the pph and non-pph compiles.
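
A minimal sketch of the checksum comparison, assuming a dump reduced to
two fields per entry.  The struct and function names are hypothetical,
and FNV-1a is just one convenient hash, not something the pph branch is
known to use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one line map entry; the real struct has more fields.  */
struct toy_map_entry {
    uint32_t start_location;
    int32_t included_from;
};

/* Fold one 32-bit value into a 64-bit FNV-1a hash, low byte first.  */
static uint64_t toy_fnv1a_u32(uint64_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 1099511628211ULL;          /* 64-bit FNV prime */
    }
    return h;
}

/* Checksum N entries in order, so two compiles can be compared by
   comparing a single number.  */
static uint64_t toy_line_table_checksum(const struct toy_map_entry *e, size_t n)
{
    uint64_t h = 14695981039346656037ULL;   /* 64-bit FNV offset basis */

    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        h = toy_fnv1a_u32(h, e[i].start_location);
        h = toy_fnv1a_u32(h, (uint32_t) e[i].included_from);
    }
    return h;
}
```

A test would compute this over the table produced by the pph compile and
by the non-pph compile and fail if the two values differ.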

A good test I had found to break in and analyze the line_table was
p4eabi.h as it pretty much had all the problems that I fixed regarding
the line_table (it also has re-includes if I remember correctly, but
that wasn't a problem before as we would not guard out re-includes as
I just mentioned above).

Having such a robust test would be important I feel as, as we saw with
previous bugs, discrepa

Re: [PATCH] vec_unpack{s,u}_float_{hi,lo}_{v8hi,v4si} support

2011-10-13 Thread Richard Henderson

On 10/13/2011 02:35 PM, Jakub Jelinek wrote:
>   * config/i386/sse.md (*avx_cvtdq2pd256_2): Rename to...
>   (avx_cvtdq2pd256_2): ... this.
>   (sseunpackfltmode): New mode attr.
>   (vec_unpacks_float_hi_v8hi, vec_unpacks_float_lo_v8hi,
>   vec_unpacku_float_hi_v8hi, vec_unpacku_float_lo_v8hi): Macroize
>   using VI2_AVX2 iterator.
>   (vec_unpacku_float_hi_v8si, vec_unpacku_float_lo_v8si): New
>   expanders.

Ok.


r~

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 2:30 PM, Richard Kenner
 wrote:
>> It is because mask 0x is optimized to 0xfffc by keeping track
>> of non-zero bits in registers and the above code doesn't take that
>> into account.
>
> Then I'd suggest modifying that code so that it does rather than
> essentially duplicating it.  But I'd recommend running some
> performance tests to verify that you're not pessimizing things when
> you do that: this stuff can be very tricky and you want to make sure
> that you're not converting something like (and X 3) into a bit
> extraction unnecessarily.
>

But the current code converts (and X 3) into a bit extraction
since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true
when UINTVAL (XEXP (x, 1))  == 3.  Should we do it or not?
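
To make the two tests concrete, here is a small stand-alone sketch.  The
helpers are simplified stand-ins written for this illustration (64-bit
only), not GCC's exact_log2 and ceil_log2, and the nonzero argument
stands for whatever nonzero_bits would report for the operand.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of X if X is a power of two, otherwise -1.  */
static int exact_log2_u64(uint64_t x)
{
    int n = 0;

    if (x == 0 || (x & (x - 1)) != 0)
        return -1;
    while ((x >>= 1) != 0)
        n++;
    return n;
}

/* Smallest N such that (1 << N) >= X, capped at 64.  */
static int ceil_log2_u64(uint64_t x)
{
    int n = 0;

    while (n < 64 && ((uint64_t) 1 << n) < x)
        n++;
    return n;
}

/* Existing test: MASK is one less than a power of two.  */
static int old_rule_fires(uint64_t mask)
{
    return exact_log2_u64(mask + 1) >= 0;
}

/* Proposed test: on the bits of the operand that can be nonzero,
   MASK behaves like a mask of the low ceil_log2 (MASK) bits.  */
static int new_rule_fires(uint64_t nonzero, uint64_t mask)
{
    int len = ceil_log2_u64(mask);
    uint64_t low;

    assert(len >= 0 && len <= 64);
    low = len >= 64 ? UINT64_MAX : (((uint64_t) 1 << len) - 1);
    return (nonzero & low) == (nonzero & mask);
}
```

With mask 3 both tests fire, since 3 + 1 is a power of two.  With a mask
whose low two bits are clear only the second test can fire, and only
when the operand's low two bits are known to be zero.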

-- 
H.J.

[pph] Triage test status. (issue5271044)

2011-10-13 Thread Lawrence Crowl

Mark test x3hardorder.cc as passing.  Update many other tests to
indicate their current failure reason.  Fix the readme.


Index: gcc/testsuite/ChangeLog.pph

2011-10-13   Lawrence Crowl  

* g++.dg/pph/README: Put z files in regular expression.
* g++.dg/pph/x3hardorder.cc: Mark passing.
* g++.dg/pph/c1limits-externalid.cc: Add triage comment.
* g++.dg/pph/e4variables.cc: Likewise.
* g++.dg/pph/x1tmplclass1.cc: Likewise.
* g++.dg/pph/x1tmplclass2.cc: Likewise.
* g++.dg/pph/x4keyed.cc: Likewise.
* g++.dg/pph/x4keyex.cc: Likewise.
* g++.dg/pph/x4keyno.cc: Likewise.
* g++.dg/pph/x4resolve1.cc: Likewise.
* g++.dg/pph/x4resolve2.cc: Likewise.
* g++.dg/pph/x4structover1.cc: Likewise.
* g++.dg/pph/x4tmplclass2.cc: Likewise.
* g++.dg/pph/x4tmplfuncinln.cc: Likewise.
* g++.dg/pph/x4tmplfuncninl.cc: Likewise.
* g++.dg/pph/x6dynarray3.cc: Likewise.
* g++.dg/pph/x6dynarray4.cc: Likewise.
* g++.dg/pph/x6rtti.cc: Likewise.
* g++.dg/pph/x7dynarray5.cc: Likewise.
* g++.dg/pph/x7rtti.cc: Likewise.
* g++.dg/pph/z4nontrivinit.cc: Likewise.
* g++.dg/pph/z4tmplclass1.cc: Likewise.
* g++.dg/pph/z4tmplclass2.cc: Likewise.
* g++.dg/pph/z4tmplfuncinln.cc: Likewise.
* g++.dg/pph/z4tmplfuncninl.cc: Likewise.


Index: gcc/testsuite/g++.dg/pph/x4resolve1.cc
===
--- gcc/testsuite/g++.dg/pph/x4resolve1.cc  (revision 179942)
+++ gcc/testsuite/g++.dg/pph/x4resolve1.cc  (working copy)
@@ -1,4 +1,6 @@
 // pph asm xwant 03374
+// This test produces overload differences because the declaration and
+// call orders are different between pph and textual parsing.
 
 #include "x0resolve1.h"
 #include "x0resolve2.h"
Index: gcc/testsuite/g++.dg/pph/x4tmplfuncninl.cc
===
--- gcc/testsuite/g++.dg/pph/x4tmplfuncninl.cc  (revision 179942)
+++ gcc/testsuite/g++.dg/pph/x4tmplfuncninl.cc  (working copy)
@@ -1,4 +1,6 @@
 // pph asm xdiff 37887
+// xfail BOGUS DIFF LABEL
+
 #include "x0tmplfuncninl1.h"
 #include "x0tmplfuncninl2.h"
 #include "a0tmplfuncninl_u.h"
Index: gcc/testsuite/g++.dg/pph/z4tmplfuncninl.cc
===
--- gcc/testsuite/g++.dg/pph/z4tmplfuncninl.cc  (revision 179942)
+++ gcc/testsuite/g++.dg/pph/z4tmplfuncninl.cc  (working copy)
@@ -1,4 +1,6 @@
 // pph asm xdiff 05125
+// xfail BOGUS DUPFUN
+
 #include "x0tmplfuncninl3.h"
 #include "x0tmplfuncninl4.h"
 #include "a0tmplfuncninl_u.h"
Index: gcc/testsuite/g++.dg/pph/x6dynarray3.cc
===
--- gcc/testsuite/g++.dg/pph/x6dynarray3.cc (revision 179942)
+++ gcc/testsuite/g++.dg/pph/x6dynarray3.cc (working copy)
@@ -1,5 +1,7 @@
 // pph asm xdiff 30893
-// .Lnn labels emitted with different values of 'nn'.
+// xfail BOGUS UNKNOWN
+// Some branches seem to be missing.
+
 #include "x5dynarray3.h"
 
 #include "a0integer.h"
Index: gcc/testsuite/g++.dg/pph/x4keyno.cc
===
--- gcc/testsuite/g++.dg/pph/x4keyno.cc (revision 179942)
+++ gcc/testsuite/g++.dg/pph/x4keyno.cc (working copy)
@@ -1,5 +1,6 @@
 // { dg-xfail-if "BOGUS MERGE AUXVAR" { "*-*-*" } { "-fpph-map=pph.map" } }
-// { dg-bogus "x4keyno.cc:11:1: error: redefinition of 'const char _ZTS5keyno" "" { xfail *-*-* } 0 }
+// { dg-bogus "x4keyno.cc:12:1: error: redefinition of 'const char _ZTS5keyno" "" { xfail *-*-* } 0 }
+// The variable for the typeinfo name for 'keyno' is duplicated.
 
 #include "x0keyno1.h"
 #include "x0keyno2.h"
Index: gcc/testsuite/g++.dg/pph/x7dynarray5.cc
===
--- gcc/testsuite/g++.dg/pph/x7dynarray5.cc (revision 179942)
+++ gcc/testsuite/g++.dg/pph/x7dynarray5.cc (working copy)
@@ -1,4 +1,5 @@
-// { dg-xfail-if "BOGUS" { "*-*-*" } { "-fpph-map=pph.map" } }
+// { dg-xfail-if "BOGUS POSSIBLY DROPPING SYMBOLS " { "*-*-*" } { "-fpph-map=pph.map" } }
+
 #include "x0dynarray4.h"
 #include "x6dynarray5.h"
 
Index: gcc/testsuite/g++.dg/pph/README
===
--- gcc/testsuite/g++.dg/pph/README (revision 179942)
+++ gcc/testsuite/g++.dg/pph/README (working copy)
@@ -1,7 +1,7 @@
 The test names have the following convention on the prefix of their
 names.
 
-[acdpxy][0-9]*
+[acdpxyz][0-9]*
 
 a - auxillary headers
 c - positive tests for C-level headers and sources
Index: gcc/testsuite/g++.dg/pph/x4resolve2.cc
===
--- gcc/testsuite/g++.dg/pph/x4resolve2.cc  (revision 179942)
+++ gcc/testsuite/g++.dg/pph/x4resolve2.cc  (working copy)
@@ -1,4 +1,6 @@
 // pph asm xwant 37643
+// This test produces ove

[PATCH] vec_unpack{s,u}_float_{hi,lo}_{v8hi,v4si} support

2011-10-13 Thread Jakub Jelinek

Hi!

This patch allows 32-byte vectorization of e.g.
short a[512];
unsigned short b[512];
int c[512];
unsigned int d[512];
float e[512];
double f[512];

void
f1 (void)
{
  int i;
  for (i = 0; i < 512; ++i)
    e[i] = a[i];
}

void
f2 (void)
{
  int i;
  for (i = 0; i < 512; ++i)
    e[i] = b[i];
}

void
f3 (void)
{
  int i;
  for (i = 0; i < 512; ++i)
    f[i] = c[i];
}

void
f4 (void)
{
  int i;
  for (i = 0; i < 512; ++i)
    f[i] = d[i];
}
with -O3 -mavx2.  Bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?
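
For readers wondering why the unsigned int to double case (f4) needs a
2^32 constant: one common scalar formulation, shown here only as a
sketch and not as a description of what the expander emits, converts the
bits as a signed value and then adds 2^32 when the result is negative.
The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert U to double using only a signed int -> double conversion
   plus a 2^32 fixup.  */
static double u32_to_double_via_signed(uint32_t u)
{
    int32_t s;
    double d;

    /* Reinterpret U as a two's complement signed value without relying
       on implementation-defined narrowing.  */
    if (u & 0x80000000u)
        s = (int32_t) (u - 0x80000000u) - INT32_MAX - 1;
    else
        s = (int32_t) u;

    d = (double) s;             /* signed conversion, exact in double */
    if (d < 0.0)
        d += 4294967296.0;      /* add 2^32 to undo the sign wrap */

    assert(d >= 0.0);
    return d;
}
```

Every 32-bit integer is exactly representable in a double, so both the
signed conversion and the fixup are exact.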

2011-10-13  Jakub Jelinek  

* config/i386/sse.md (*avx_cvtdq2pd256_2): Rename to...
(avx_cvtdq2pd256_2): ... this.
(sseunpackfltmode): New mode attr.
(vec_unpacks_float_hi_v8hi, vec_unpacks_float_lo_v8hi,
vec_unpacku_float_hi_v8hi, vec_unpacku_float_lo_v8hi): Macroize
using VI2_AVX2 iterator.
(vec_unpacku_float_hi_v8si, vec_unpacku_float_lo_v8si): New
expanders.

--- gcc/config/i386/sse.md.jj   2011-10-13 17:34:26.0 +0200
+++ gcc/config/i386/sse.md  2011-10-13 21:10:52.0 +0200
@@ -2517,7 +2517,7 @@ (define_insn "avx_cvtdq2pd256"
(set_attr "prefix" "vex")
(set_attr "mode" "V4DF")])
 
-(define_insn "*avx_cvtdq2pd256_2"
+(define_insn "avx_cvtdq2pd256_2"
   [(set (match_operand:V4DF 0 "register_operand" "=x")
(float:V4DF
  (vec_select:V4SI
@@ -2786,51 +2786,58 @@ (define_expand "vec_unpacks_lo_v8sf"
   (const_int 2) (const_int 3)]]
   "TARGET_AVX")
 
-(define_expand "vec_unpacks_float_hi_v8hi"
-  [(match_operand:V4SF 0 "register_operand" "")
-   (match_operand:V8HI 1 "register_operand" "")]
+(define_mode_attr sseunpackfltmode
+  [(V8HI "V4SF") (V4SI "V2DF") (V16HI "V8SF") (V8SI "V4DF")])
+
+(define_expand "vec_unpacks_float_hi_"
+  [(match_operand: 0 "register_operand" "")
+   (match_operand:VI2_AVX2 1 "register_operand" "")]
   "TARGET_SSE2"
 {
-  rtx tmp = gen_reg_rtx (V4SImode);
+  rtx tmp = gen_reg_rtx (mode);
 
-  emit_insn (gen_vec_unpacks_hi_v8hi (tmp, operands[1]));
-  emit_insn (gen_sse2_cvtdq2ps (operands[0], tmp));
+  emit_insn (gen_vec_unpacks_hi_ (tmp, operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
+ gen_rtx_FLOAT (mode, tmp)));
   DONE;
 })
 
-(define_expand "vec_unpacks_float_lo_v8hi"
-  [(match_operand:V4SF 0 "register_operand" "")
-   (match_operand:V8HI 1 "register_operand" "")]
+(define_expand "vec_unpacks_float_lo_"
+  [(match_operand: 0 "register_operand" "")
+   (match_operand:VI2_AVX2 1 "register_operand" "")]
   "TARGET_SSE2"
 {
-  rtx tmp = gen_reg_rtx (V4SImode);
+  rtx tmp = gen_reg_rtx (mode);
 
-  emit_insn (gen_vec_unpacks_lo_v8hi (tmp, operands[1]));
-  emit_insn (gen_sse2_cvtdq2ps (operands[0], tmp));
+  emit_insn (gen_vec_unpacks_lo_ (tmp, operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
+ gen_rtx_FLOAT (mode, tmp)));
   DONE;
 })
 
-(define_expand "vec_unpacku_float_hi_v8hi"
-  [(match_operand:V4SF 0 "register_operand" "")
-   (match_operand:V8HI 1 "register_operand" "")]
+(define_expand "vec_unpacku_float_hi_"
+  [(match_operand: 0 "register_operand" "")
+   (match_operand:VI2_AVX2 1 "register_operand" "")]
   "TARGET_SSE2"
 {
-  rtx tmp = gen_reg_rtx (V4SImode);
+  rtx tmp = gen_reg_rtx (mode);
 
-  emit_insn (gen_vec_unpacku_hi_v8hi (tmp, operands[1]));
-  emit_insn (gen_sse2_cvtdq2ps (operands[0], tmp));
+  emit_insn (gen_vec_unpacku_hi_ (tmp, operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
+ gen_rtx_FLOAT (mode, tmp)));
   DONE;
 })
 
-(define_expand "vec_unpacku_float_lo_v8hi"
-  [(match_operand:V4SF 0 "register_operand" "")
-   (match_operand:V8HI 1 "register_operand" "")]
+(define_expand "vec_unpacku_float_lo_"
+  [(match_operand: 0 "register_operand" "")
+   (match_operand:VI2_AVX2 1 "register_operand" "")]
   "TARGET_SSE2"
 {
-  rtx tmp = gen_reg_rtx (V4SImode);
+  rtx tmp = gen_reg_rtx (mode);
 
-  emit_insn (gen_vec_unpacku_lo_v8hi (tmp, operands[1]));
-  emit_insn (gen_sse2_cvtdq2ps (operands[0], tmp));
+  emit_insn (gen_vec_unpacku_lo_ (tmp, operands[1]));
+  emit_insn (gen_rtx_SET (VOIDmode, operands[0],
+ gen_rtx_FLOAT (mode, tmp)));
   DONE;
 })
 
@@ -2942,6 +2949,58 @@ (define_expand "vec_unpacku_float_lo_v4s
 operands[i] = gen_reg_rtx (V2DFmode);
 })
 
+(define_expand "vec_unpacku_float_hi_v8si"
+  [(match_operand:V4DF 0 "register_operand" "")
+   (match_operand:V8SI 1 "register_operand" "")]
+  "TARGET_AVX"
+{
+  REAL_VALUE_TYPE TWO32r;
+  rtx x, tmp[6];
+  int i;
+
+  real_ldexp (&TWO32r, &dconst1, 32);
+  x = const_double_from_real_value (TWO32r, DFmode);
+
+  tmp[0] = force_reg (V4DFmode, CONST0_RTX (V4DFmode));
+  tmp[1] = force_reg (V4DFmode, ix86_build_const_vector (V4DFmode, 1, x));
+  tmp[5] = gen_reg_rtx (V4SImode);
+
+  for (i = 2; i < 5; i++)
+tmp[i] = gen_reg_rtx (V4DFmode);
+  emit_insn (gen_vec_extract_hi_v8si (tmp[5], operands[1]));
+

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> It is because mask 0x is optimized to 0xfffc by keeping track
> of non-zero bits in registers and the above code doesn't take that
> into account.

Then I'd suggest modifying that code so that it does rather than
essentially duplicating it.  But I'd recommend running some
performance tests to verify that you're not pessimizing things when
you do that: this stuff can be very tricky and you want to make sure
that you're not converting something like (and X 3) into a bit
extraction unnecessarily.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 2:23 PM, Richard Kenner
 wrote:
>> Does it look OK?
>
> No.
>
> If I understand your code correctly, there's essentially the same code
> as you have a bit above that:
>
>      /* If the constant is one less than a power of two, this might be
>         representable by an extraction even if no shift is present.
>         If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
>         we are in a COMPARE.  */
>      else if ((i = exact_log2 (INTVAL (XEXP (x, 1)) + 1)) >= 0)
>        new_rtx = make_extraction (mode,
>                               make_compound_operation (XEXP (x, 0),
>                                                        next_code),
>                               0, NULL_RTX, i, 1, 0, in_code == COMPARE);
>
> So you need to understand why your code "fires" and it doesn't.
>
>

It is because mask 0x is optimized to 0xfffc by keeping track
of non-zero bits in registers and the above code doesn't take that
into account.

-- 
H.J.

Re: [Patch,AVR] Fix PR46278, Take #3

2011-10-13 Thread Georg-Johann Lay


Weddington, Eric wrote:


Georg-Johann Lay wrote:


This is yet another attempt to fix PR46278 (fake X addressing).

After the previous clean-ups it is just a small change.

caller-saves.c tries to eliminate call-clobbered hard regs allocated
to pseudos around function calls, and that leads to situations where
reload is no longer able to perform all requested spills because the
AVR has so few address registers.

Thus, the patch adds a new target option -mstrict-X that the user can
turn on if he likes; -fcaller-save is then disabled.

The patch passes the testsuite without regressions.  Moreover, the
testsuite passes without regressions if all test cases are run with
-mstrict-X and all libraries (libgcc, avr-libc) are built with the
new option turned on.


Hi Johann,

But if all test cases pass when run with -mstrict-X and with everything
built with that option on, then why is this even an option?  Is it
because it may not always reduce code size?


As with any other optimization, I'd guess yes.

But the major problem with this patch, or any patch that addresses
this PR, is that taking away the X+const addressing might make
register allocation for AVR so hard that reload can no longer cope
and ICEs with a spill failure.


Denis' analysis showed that the root cause of these spill fails is that 
there is just one register that can perform R+const addressing besides 
FP but that register (Z) is call-clobbered.  Dunno if these problems 
were also triggered by caller-saves.  Thus, if some real world code 
breaks with a spill fail, the option provides a fallback to cure 
reload's shortcomings.


It's just a kludge, of course, but fixing reload is not something I
would attempt (I know my limits), and the prospects are slim that a
target as unimportant as AVR will draw the attention of a reload expert.


But looking at the results, I think it's worth it. 30% less for an 
already optimized program is not bad -- or the other way round: 50% 
bloat without this option is horrible.  And these test programs are not 
fancy; they are just accessing struct components via


pstruct->component

which is not uncommon code.
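
As an illustration of the kind of code meant here (the struct and
function are made up for this sketch and are not taken from the test
programs): each member access below is a load from a base pointer plus
a small constant offset.  On AVR such loads map directly onto the
displacement addressing offered by Y and Z, whereas X supports only
plain, post-increment and pre-decrement accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; any struct with several members would do.  */
struct sample {
    uint8_t  flags;
    uint16_t count;
    uint32_t total;
};

/* Three member reads through one pointer: base + constant offset each.  */
static uint32_t sample_weight(const struct sample *p)
{
    assert(p != NULL);
    return (uint32_t) p->flags + p->count + p->total;
}
```

When the pointer ends up in X, each access has to be emulated by
adjusting X around the load, which is the overhead discussed in this
thread.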


Thanks, Eric


Johann

C++ PATCH for c++/50614 (ICE with NSDMI and -fcompare-debug)

2011-10-13 Thread Jason Merrill

The problem here was that with -fcompare-debug, 
execute_cleanup_cfg_post_optimizing wants to print out all the decls 
used in a function, which involves printing the DECL_INITIAL, and the 
instantiation of a FIELD_DECL with an NSDMI had an uninstantiated 
DECL_INITIAL, so the dumper got confused by the C++ tree codes.  Fixed 
by setting DECL_INITIAL of instantiated FIELD_DECLs to error_mark_node 
and using DECL_TEMPLATE_INFO to look up the original NSDMI instead.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 69647e299fba76f47d157699d80afcc2c703e408
Author: Jason Merrill 
Date:   Thu Oct 13 16:59:42 2011 -0400

	PR c++/50614
	* cp-tree.h (VAR_TEMPL_TYPE_FIELD_OR_FUNCTION_DECL_CHECK): New.
	(DECL_TEMPLATE_INFO): Use it.
	* pt.c (tsubst_decl) [FIELD_DECL]: Set DECL_TEMPLATE_INFO
	if the decl has an NSDMI.
	* init.c (perform_member_init): Use it.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index e42cda1..98599f9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -201,6 +201,9 @@ c-common.h, not after.
 #define VAR_TEMPL_TYPE_OR_FUNCTION_DECL_CHECK(NODE) \
   TREE_CHECK4(NODE,VAR_DECL,FUNCTION_DECL,TYPE_DECL,TEMPLATE_DECL)
 
+#define VAR_TEMPL_TYPE_FIELD_OR_FUNCTION_DECL_CHECK(NODE) \
+  TREE_CHECK5(NODE,VAR_DECL,FIELD_DECL,FUNCTION_DECL,TYPE_DECL,TEMPLATE_DECL)
+
 #define BOUND_TEMPLATE_TEMPLATE_PARM_TYPE_CHECK(NODE) \
   TREE_CHECK(NODE,BOUND_TEMPLATE_TEMPLATE_PARM)
 
@@ -2556,7 +2559,7 @@ extern void decl_shadowed_for_var_insert (tree, tree);
global function f.  In this case, DECL_TEMPLATE_INFO for S::f
will be non-NULL, but DECL_USE_TEMPLATE will be zero.  */
 #define DECL_TEMPLATE_INFO(NODE) \
-  (DECL_LANG_SPECIFIC (VAR_TEMPL_TYPE_OR_FUNCTION_DECL_CHECK (NODE)) \
+  (DECL_LANG_SPECIFIC (VAR_TEMPL_TYPE_FIELD_OR_FUNCTION_DECL_CHECK (NODE)) \
->u.min.template_info)
 
 /* For a VAR_DECL, indicates that the variable is actually a
@@ -2701,7 +2704,10 @@ extern void decl_shadowed_for_var_insert (tree, tree);
  template  struct S { friend void f(int, double); }
 
the DECL_TI_TEMPLATE will be an IDENTIFIER_NODE for `f' and the
-   DECL_TI_ARGS will be {int}.  */
+   DECL_TI_ARGS will be {int}.
+
+   For a FIELD_DECL, this value is the FIELD_DECL it was instantiated
+   from.  */
 #define DECL_TI_TEMPLATE(NODE)  TI_TEMPLATE (DECL_TEMPLATE_INFO (NODE))
 
 /* The template arguments used to obtain this decl from the most
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index a21e566..4561979 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -497,11 +497,11 @@ perform_member_init (tree member, tree init)
  mem-initializer for this field.  */
   if (init == NULL_TREE)
 {
-  if (CLASSTYPE_TEMPLATE_INSTANTIATION (DECL_CONTEXT (member)))
+  if (DECL_LANG_SPECIFIC (member) && DECL_TEMPLATE_INFO (member))
 	/* Do deferred instantiation of the NSDMI.  */
 	init = (tsubst_copy_and_build
-		(DECL_INITIAL (member),
-		 CLASSTYPE_TI_ARGS (DECL_CONTEXT (member)),
+		(DECL_INITIAL (DECL_TI_TEMPLATE (member)),
+		 DECL_TI_ARGS (member),
 		 tf_warning_or_error, member, /*function_p=*/false,
 		 /*integral_constant_expression_p=*/false));
   else
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 880f3d1..1632c01 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10269,6 +10269,16 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 	= tsubst_expr (DECL_INITIAL (t), args,
 			   complain, in_decl,
 			   /*integral_constant_expression_p=*/true);
+	else if (DECL_INITIAL (t))
+	  {
+	/* Set up DECL_TEMPLATE_INFO so that we can get at the
+	   NSDMI in perform_member_init.  Still set DECL_INITIAL
+	   to error_mark_node so that we know there is one.  */
+	DECL_INITIAL (r) = error_mark_node;
+	gcc_assert (DECL_LANG_SPECIFIC (r) == NULL);
+	retrofit_lang_decl (r);
+	DECL_TEMPLATE_INFO (r) = build_template_info (t, args);
+	  }
 	/* We don't have to set DECL_CONTEXT here; it is set by
 	   finish_member_declaration.  */
 	DECL_CHAIN (r) = NULL_TREE;
diff --git a/gcc/testsuite/g++.dg/cpp0x/nsdmi-template2.C b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template2.C
new file mode 100644
index 000..27b0aa5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nsdmi-template2.C
@@ -0,0 +1,14 @@
+// PR c++/50614
+// { dg-options "-std=c++0x -fcompare-debug" }
+
+struct A
+{
+  int f ();
+};
+
+template  struct B : A
+{
+  int i = this->f ();
+};
+
+B<0> b;

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> Does it look OK?

No.  

If I understand your code correctly, there's essentially the same code
as you have a bit above that:

  /* If the constant is one less than a power of two, this might be 
 representable by an extraction even if no shift is present. 
 If it doesn't end up being a ZERO_EXTEND, we will ignore it unless
 we are in a COMPARE.  */
  else if ((i = exact_log2 (INTVAL (XEXP (x, 1)) + 1)) >= 0)
new_rtx = make_extraction (mode,
   make_compound_operation (XEXP (x, 0),
next_code),
   0, NULL_RTX, i, 1, 0, in_code == COMPARE);

So you need to understand why your code "fires" and it doesn't.

[PATCH] Fix the RTL of some sparc VIS patterns.

2011-10-13 Thread David Miller


Based upon a review of the sparc VIS support by Richard Henderson.

Committed to trunk.

gcc/

* config/sparc/sparc.md (UNSPEC_FPMERGE): Delete.
(UNSPEC_MUL16AU, UNSPEC_MUL8, UNSPEC_MUL8SU, UNSPEC_MULDSU): New
unspecs.
(fpmerge_vis): Remove inaccurate comment, represent using vec_select
of a vec_concat.
(vec_interleave_lowv8qi, vec_interleave_highv8qi): New insns.
(fmul8x16_vis, fmul8x16au_vis, fmul8sux16_vis, fmuld8sux16_vis):
Reimplement as unspecs and remove inaccurate comments.
(vis3_shift_patname): New code attr.
(_vis): Rename to 
"v3".
(vis3_addsub_ss_patname): New code attr.
(_vis): Rename to
"3".
* config/sparc/sparc.c (sparc_vis_init_builtins): Update to
accommodate pattern name changes.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179943 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |   16 +
 gcc/config/sparc/sparc.c  |   32 +-
 gcc/config/sparc/sparc.md |   79 
 3 files changed, 89 insertions(+), 38 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 18c6e88..6a514e8 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -64,6 +64,22 @@
 
 2011-10-12  David S. Miller  
 
+   * config/sparc/sparc.md (UNSPEC_FPMERGE): Delete.
+   (UNSPEC_MUL16AU, UNSPEC_MUL8, UNSPEC_MUL8SU, UNSPEC_MULDSU): New
+   unspecs.
+   (fpmerge_vis): Remove inaccurate comment, represent using vec_select
+   of a vec_concat.
+   (vec_interleave_lowv8qi, vec_interleave_highv8qi): New insns.
+   (fmul8x16_vis, fmul8x16au_vis, fmul8sux16_vis, fmuld8sux16_vis):
+   Reimplement as unspecs and remove inaccurate comments.
+   (vis3_shift_patname): New code attr.
+   (_vis): Rename to 
"v3".
+   (vis3_addsub_ss_patname): New code attr.
+   (_vis): Rename to
+   "3".
+   * config/sparc/sparc.c (sparc_vis_init_builtins): Update to
+   accommodate pattern name changes.
+
* config/sparc/sparc.h: Do not force TARGET_VIS3 and TARGET_FMAF
to zero when assembler lacks support for such instructions.
* config/sparc/sparc.c (sparc_option_override): Clear MASK_VIS3
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 5ecfe95..fc448cc 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -9496,21 +9496,21 @@ sparc_vis_init_builtins (void)
   def_builtin_const ("__builtin_vis_fchksm16", CODE_FOR_fchksm16_vis,
 v4hi_ftype_v4hi_v4hi);
 
-  def_builtin_const ("__builtin_vis_fsll16", CODE_FOR_fsll16_vis,
+  def_builtin_const ("__builtin_vis_fsll16", CODE_FOR_vashlv4hi3,
 v4hi_ftype_v4hi_v4hi);
-  def_builtin_const ("__builtin_vis_fslas16", CODE_FOR_fslas16_vis,
+  def_builtin_const ("__builtin_vis_fslas16", CODE_FOR_vssashlv4hi3,
 v4hi_ftype_v4hi_v4hi);
-  def_builtin_const ("__builtin_vis_fsrl16", CODE_FOR_fsrl16_vis,
+  def_builtin_const ("__builtin_vis_fsrl16", CODE_FOR_vlshrv4hi3,
 v4hi_ftype_v4hi_v4hi);
-  def_builtin_const ("__builtin_vis_fsra16", CODE_FOR_fsra16_vis,
+  def_builtin_const ("__builtin_vis_fsra16", CODE_FOR_vashrv4hi3,
 v4hi_ftype_v4hi_v4hi);
-  def_builtin_const ("__builtin_vis_fsll32", CODE_FOR_fsll32_vis,
+  def_builtin_const ("__builtin_vis_fsll32", CODE_FOR_vashlv2si3,
 v2si_ftype_v2si_v2si);
-  def_builtin_const ("__builtin_vis_fslas32", CODE_FOR_fslas32_vis,
+  def_builtin_const ("__builtin_vis_fslas32", CODE_FOR_vssashlv2si3,
 v2si_ftype_v2si_v2si);
-  def_builtin_const ("__builtin_vis_fsrl32", CODE_FOR_fsrl32_vis,
+  def_builtin_const ("__builtin_vis_fsrl32", CODE_FOR_vlshrv2si3,
 v2si_ftype_v2si_v2si);
-  def_builtin_const ("__builtin_vis_fsra32", CODE_FOR_fsra32_vis,
+  def_builtin_const ("__builtin_vis_fsra32", CODE_FOR_vashrv2si3,
 v2si_ftype_v2si_v2si);
 
   if (TARGET_ARCH64)
@@ -9527,21 +9527,21 @@ sparc_vis_init_builtins (void)
   def_builtin_const ("__builtin_vis_fpsub64", CODE_FOR_fpsub64_vis,
 di_ftype_di_di);
 
-  def_builtin_const ("__builtin_vis_fpadds16", CODE_FOR_fpadds16_vis,
+  def_builtin_const ("__builtin_vis_fpadds16", CODE_FOR_ssaddv4hi3,
 v4hi_ftype_v4hi_v4hi);
-  def_builtin_const ("__builtin_vis_fpadds16s", CODE_FOR_fpadds16s_vis,
+  def_builtin_const ("__builtin_vis_fpadds16s", CODE_FOR_ssaddv2hi3,
 v2hi_ftype_v2hi_v2hi);
-  def_builtin_const ("__builtin_vis_fpsubs16", CODE_FOR_fpsubs16_vis,
+  def_builtin_const ("__builtin_vis_fpsubs16", CODE_FOR_sssubv4hi3,
 v4hi_ftype_v4hi_v4hi);
-  def_builtin_const ("__builtin_vis_fpsubs16s", CODE_FOR_fpsubs16s_

C++ PATCH for c++/50437 (ICE on auto with lambda in template)

2011-10-13 Thread Jason Merrill

The problem here was that auto deduced the closure type of the lambda in 
the template, and then instantiation tried to instantiate the closure 
outside of the context of the LAMBDA_EXPR, which doesn't work.  So I've 
changed LAMBDA_EXPR to always have a TREE_TYPE of NULL_TREE, and put the 
closure in a new field instead.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 1e940d8c7b567f7e0994ca99fe34b51309705d7f
Author: Jason Merrill 
Date:   Thu Oct 13 15:27:31 2011 -0400

	PR c++/50437
	* cp-tree.h (struct tree_lambda_expr): Add closure field.
	(LAMBDA_EXPR_CLOSURE): New.
	* pt.c (tsubst_copy_and_build) [LAMBDA_EXPR]: Likewise.
	* semantics.c (build_lambda_object): Use it instead of TREE_TYPE.
	(begin_lambda_type, lambda_function, add_capture): Likewise.
	(add_default_capture, lambda_expr_this_capture): Likewise.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index b53accf..e42cda1 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -671,6 +671,12 @@ enum cp_lambda_default_capture_mode_type {
 #define LAMBDA_EXPR_PENDING_PROXIES(NODE) \
   (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->pending_proxies)
 
+/* The closure type of the lambda.  Note that the TREE_TYPE of a
+   LAMBDA_EXPR is always NULL_TREE, because we need to instantiate the
+   LAMBDA_EXPR in order to instantiate the type.  */
+#define LAMBDA_EXPR_CLOSURE(NODE) \
+  (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->closure)
+
 struct GTY (()) tree_lambda_expr
 {
   struct tree_typed typed;
@@ -678,6 +684,7 @@ struct GTY (()) tree_lambda_expr
   tree this_capture;
   tree return_type;
   tree extra_scope;
+  tree closure;
   VEC(tree,gc)* pending_proxies;
   location_t locus;
   enum cp_lambda_default_capture_mode_type default_capture_mode;
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index bfbd244..880f3d1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13937,8 +13937,8 @@ tsubst_copy_and_build (tree t,
   {
 	tree r = build_lambda_expr ();
 
-	tree type = tsubst (TREE_TYPE (t), args, complain, NULL_TREE);
-	TREE_TYPE (r) = type;
+	tree type = tsubst (LAMBDA_EXPR_CLOSURE (t), args, complain, NULL_TREE);
+	LAMBDA_EXPR_CLOSURE (r) = type;
 	CLASSTYPE_LAMBDA_EXPR (type) = r;
 
 	LAMBDA_EXPR_LOCATION (r)
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index eed38e6..7d37fa3 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -8324,7 +8324,7 @@ build_lambda_object (tree lambda_expr)
 
   /* N2927: "[The closure] class type is not an aggregate."
  But we briefly treat it as an aggregate to make this simpler.  */
-  type = TREE_TYPE (lambda_expr);
+  type = LAMBDA_EXPR_CLOSURE (lambda_expr);
   CLASSTYPE_NON_AGGREGATE (type) = 0;
   expr = finish_compound_literal (type, expr, tf_warning_or_error);
   CLASSTYPE_NON_AGGREGATE (type) = 1;
@@ -8365,7 +8365,7 @@ begin_lambda_type (tree lambda)
   type = begin_class_definition (type, /*attributes=*/NULL_TREE);
 
   /* Cross-reference the expression and the type.  */
-  TREE_TYPE (lambda) = type;
+  LAMBDA_EXPR_CLOSURE (lambda) = type;
   CLASSTYPE_LAMBDA_EXPR (type) = lambda;
 
   return type;
@@ -8399,7 +8399,7 @@ lambda_function (tree lambda)
 {
   tree type;
   if (TREE_CODE (lambda) == LAMBDA_EXPR)
-type = TREE_TYPE (lambda);
+type = LAMBDA_EXPR_CLOSURE (lambda);
   else
 type = lambda;
   gcc_assert (LAMBDA_TYPE_P (type));
@@ -8714,7 +8714,7 @@ add_capture (tree lambda, tree id, tree initializer, bool by_reference_p,
 
   /* If TREE_TYPE isn't set, we're still in the introducer, so check
  for duplicates.  */
-  if (!TREE_TYPE (lambda))
+  if (!LAMBDA_EXPR_CLOSURE (lambda))
 {
   if (IDENTIFIER_MARKED (name))
 	{
@@ -8740,13 +8740,14 @@ add_capture (tree lambda, tree id, tree initializer, bool by_reference_p,
 LAMBDA_EXPR_THIS_CAPTURE (lambda) = member;
 
   /* Add it to the appropriate closure class if we've started it.  */
-  if (current_class_type && current_class_type == TREE_TYPE (lambda))
+  if (current_class_type
+  && current_class_type == LAMBDA_EXPR_CLOSURE (lambda))
 finish_member_declaration (member);
 
   LAMBDA_EXPR_CAPTURE_LIST (lambda)
 = tree_cons (member, initializer, LAMBDA_EXPR_CAPTURE_LIST (lambda));
 
-  if (TREE_TYPE (lambda))
+  if (LAMBDA_EXPR_CLOSURE (lambda))
 return build_capture_proxy (member);
   /* For explicit captures we haven't started the function yet, so we wait
  and build the proxy from cp_parser_lambda_body.  */
@@ -8789,7 +8790,7 @@ add_default_capture (tree lambda_stack, tree id, tree initializer)
 {
   tree lambda = TREE_VALUE (node);
 
-  current_class_type = TREE_TYPE (lambda);
+  current_class_type = LAMBDA_EXPR_CLOSURE (lambda);
   var = add_capture (lambda,
 id,
 initializer,
@@ -8820,7 +8821,7 @@ lambda_expr_this_capture (tree lambda)
   if (!this_capture
   && LAMBDA_EXPR_DEFAULT_CAPTURE_MODE (lambda) != CPLD_NONE)
 {
-  tree containing_function =

Re: [rs6000] Enable scalar shifts of vectors

2011-10-13 Thread David Edelsohn

On Wed, Oct 12, 2011 at 6:32 PM, Richard Henderson  wrote:
> I suppose technically the middle-end could be improved to implement
> ashl as vashl by broadcasting the scalar, but Altivec
> is the only extant SIMD ISA that would make use of this.  All of
> the others can arrange for constant shifts to be encoded into the
> insn, and so implement the ashl named pattern.
>
> Tested on ppc64-linux, --with-cpu=G5.
>
> Ok?
>
>
> r~
>
>
>        * config/rs6000/rs6000.c (rs6000_expand_vector_broadcast): New.
>        * config/rs6000/rs6000-protos.h: Update.
>        * config/rs6000/vector.md (ashl3): New.
>        (lshr3, ashr3): New.

The patch is fine.

Thanks, David

Re: ifcvt cond_exec support rewrite

2011-10-13 Thread Bernd Schmidt

Ping.  Better support for nested if-then-else structures:

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01935.html


Bernd

Re: [trans-mem] Add gl_wt TM method.

2011-10-13 Thread Torvald Riegel

On Tue, 2011-08-30 at 00:33 +0200, Torvald Riegel wrote:
> The attached patches are several changes required for a new TM method,
> gl_wt (global lock, write-through), which is added by the last patch
> 
> patch1: Add TM-method-specific begin code. All time-based TMs need to
> know at which point in time they start working. Initializing lazily on
> the first txnal load or store would be unnecessary overhead.
> 
> patch2: A small fix for serial mode. This change should have been
> included in the previous renaming of the serial mode dispatches.
> 
> patch3: We can't free transaction-local memory during nested commits
> unless we also go through the undo and redo logs and remove all
> references to the to-be-freed memory (otherwise, we'll undo/redo to
> privatized memory...). I guess going through the logs is higher overhead
> than just keeping the allocations around. If we see transactions in
> practice that have large malloc/free cycles embedded in nested txns that
> are not flattened, we can still add special handling for this case.
> 
> patch4: We sometimes need to re-initialize method groups (e.g., to avoid
> overflow of counters etc.). TM methods can request this using a special
> restart reason.
> 
> patch5: The undo log is used for both thread-local and shared data
> (which are separate). Maintaining two undo logs does not provide any
> advantages. However, we have to perform undo actions to shared data
> before dispatch-specific rollback (e.g., where we release locks).
> 
> patch6: Add support for quiescence-based privatization safety (using
> gtm_thread::shared_state as the value of the current (snapshot) time of
> a transaction). Currently, this uses just spinning, but it should
> eventually be changed to block using cond vars / futexes if necessary.
> This requires more thought and tuning however, as it should probably be
> integrated with the serial lock (and it poses similar challenges, such
> as having to minimize the number of wait/wakeup calls, number of cache
> misses, etc.). Therefore, this should be addressed in a future patch.
> 
> patch7: Finally, the new TM method, gl_wt (global lock, write-through).
> This is a simple algorithm that uses a global versioned lock (aka
> ownership record or orec) together with write-through / undolog-style
> txnal writes. This has a lot of similarities to undolog-style TM methods
> that use several locks (e.g., privatization safety has to be ensured),
> but has less overhead. If update txns are frequent, it obviously won't
> scale. With the current code base, gl_wt performs better than
> serialirr_onwrite but probably mostly due to spinning and restarts when the
> global lock is acquired instead of falling back to heavyweight waiting
> via futex wait/wakeup calls.
> gl_wt is in the globallock method group, to which at least another
> write-back, value-based-validation TM method will be added.

The attached two patches fix one bug in gl_wt (patch8) and one unrelated
bug that affected gl_wt (patch9).

patch8: The previous code did not handle transitions to serial mode
properly. On such a transition, a method's trycommit() and/or rollback()
functions are called when the serial lock is already acquired. This
acquisition is handled through (and stored in) gtm_thread::shared_state,
which the gl_wt method uses too. Thus, blindly overwriting shared_state
in commit/rollback doesn't work. The fix now detects this case
and makes gl_wt not interfere with serial mode anymore.

patch9: The previous custom TLS slot read accesses were reordered by the
compiler across non-inlined function calls. For gl_wt, this caused
restarts to operate on a stale value for abi_disp() (because this call
was moved to before the decide_retry_strategy() call, which would call
set_abi_disp). In turn, this broke gl_wt because it was being used
together with serial mode, which does not work because it tries to use
shared_state too.

Ok for branch (patches 1-9)?

commit 08d3f16c5fffa2bd3acb0c27af4010c72c75b23d
Author: Torvald Riegel 
Date:   Thu Oct 13 18:16:39 2011 +0200

Fix gl_wt commit/rollback when serial lock has been acquired.

* method-gl.cc (gl_wt_dispatch::trycommit): Fix interaction with
gtm_thread::shared_state when the serial lock is acquired.
(gl_wt_dispatch::rollback): Same.

diff --git a/libitm/method-gl.cc b/libitm/method-gl.cc
index 17a2b9f..1dc700a 100644
--- a/libitm/method-gl.cc
+++ b/libitm/method-gl.cc
@@ -72,6 +72,12 @@ static gl_mg o_gl_mg;
 // validate that no other update transaction comitted before we acquired the
 // orec, so we have the most recent timestamp and no other transaction can
 // commit until we have committed).
+// However, we therefore cannot use this method for a serial transaction
+// (because shared_state needs to remain at ~0) and we have to be careful
+// when switching to serial mode (see the special handling in trycommit() and
+// rollback()).
+// ??? This sharing adds some complexity wrt. serial mode. Just use

Re: Predication during scheduling

2011-10-13 Thread Bernd Schmidt

On 09/30/11 17:29, Bernd Schmidt wrote:
> This patch allows a backend to set a new scheduler flag, DO_PREDICATION,
> which will make the haifa scheduler try to move insns across jumps by
> predicating them. On C6X, the primary benefit is to fill jump delay slots.

Ping.

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg02053.html


Bernd

Re: [patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling

2011-10-13 Thread Jan Kratochvil

On Wed, 12 Oct 2011 16:18:07 +0200, Jan Kratochvil wrote:
> On Wed, 12 Oct 2011 16:07:24 +0200, Tristan Gingold wrote:
> > I fear that this may degrade performance of other debuggers.  What about
> > adding a command line option ?
> 
> I can test idb,

I do not find the difference measurable.  Dropping DW_AT_sibling gives a 0.25%
performance _improvement_, but I guess that is within the measurement
error.

libstdc++ was built with gcc -gdwarf-2, because with gcc -gdwarf-4
-fdebug-types-section idb crashes.  An i7-920 x86_64 was used for testing:
Intel(R) Debugger for applications running on Intel(R) 64, Version 12.1, Build [76.472.14]

with DW_AT_sibling
real 2m34.206s 2m31.822s 2m31.709s 2m32.316s
avg = 152.51325 seconds

patched GCC without DW_AT_sibling
real 2m32.528s 2m30.524s 2m33.767s 2m31.719s
avg = 152.1345 seconds
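
The averages and the 0.25% figure can be rechecked with a few lines of C. This is only a sketch: the function names are made up for illustration, and the samples are the "real" times above converted to seconds.

```c
#include <assert.h>
#include <stdio.h>

/* Mean of N wall-clock samples given in seconds.  */
double
mean (const double *samples, int n)
{
  assert (n > 0);
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += samples[i];
  return sum / n;
}

/* Relative improvement of PATCHED over BASELINE, in percent.  */
double
improvement_percent (double baseline, double patched)
{
  return 100.0 * (baseline - patched) / baseline;
}

/* The four runs of each configuration (2mXX.XXXs = 120 + XX.XXX).  */
const double with_sibling[] = { 154.206, 151.822, 151.709, 152.316 };
const double without_sibling[] = { 152.528, 150.524, 153.767, 151.719 };

/* Print both averages and the relative difference between them.  */
void
report (void)
{
  double a = mean (with_sibling, 4);
  double b = mean (without_sibling, 4);
  printf ("with: %.5f s, without: %.5f s, diff: %.2f%%\n",
          a, b, improvement_percent (a, b));
}
```

Calling report () should reproduce the two averages quoted above. Note that the spread between individual runs (about 2.5 s in the first set) is several times larger than the difference between the averages.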

I do not see a point in keeping DW_AT_sibling there.

Regards,
Jan

Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.

2011-10-13 Thread David Miller

From: Eric Botcazou 
Date: Thu, 29 Sep 2011 00:38:49 +0200

> [Vlad, if you have a few minutes, would you mind having a look at the couple
> of questions at the end of the message?  Thanks in advance].

Vlad, ping?

[lra] patch to improve elimination and inheritance

2011-10-13 Thread Vladimir Makarov


The following patch contains some of my work from the last 2 weeks.

First of all, it improves register elimination to permit elimination of a
register to itself.  This fixed a SPEC2000 code size degradation for
ppc64.


The patch also improves inheritance by assigning the same hard register
to an inheritance pseudo and its connected reload pseudos.


And, finally, various changes to speed up LRA and some bug fixes are
also in the patch.


The patch was successfully bootstrapped on x86/x86-64 and ppc64.

Committed as revision 179942.

2011-10-13  Vladimir Makarov 

* lra-assigns.c (process_copy_to_form_allocno): Rename to
process_copy_to_form_thread.
(conflict_reload_pseudos): Rename to
conflict_reload_and_inheritance_pseudos.
(live_reload_pseudos): Rename to
live_reload_and_inheritance_pseudos.
(init_live_reload_pseudos): Rename to
init_live_reload_and_inheritance_pseudos.
(finish_live_reload_pseudos): Rename to
finish_live_reload_and_inheritance_pseudos.
(find_hard_regno_for): Add new argument try_only_hard_regno.  Use
live_pseudos_reg_renumber instead of reg_renumber.  Check
live_pseudos_reg_renumber when adding to
conflict_reload_and_inheritance_pseudos.  Process
preferred_hard_regno2 only if preferred_hard_regno1 is
non-negative.
(setup_try_hard_regno_pseudos): Use live_pseudos_reg_renumber
instead of reg_renumber.
(spill_for): Ditto.  Pass new parameter to find_hard_regno_for.
(assign_temporarily): Don't change reg_renumber.
(setup_live_pseudos_and_spill_after_equiv_moves): Only call
update_lives if lra_risky_equiv_subst_p is false.
(improve_inheritance): New function.
(assign_by_spills): Call improve_inheritance.

* lra.c (lra): Call lra_eliminate after lra_constraints.

* lra-constraints.c (contains_pseudo_p): Rename to contains_reg_p.
Add a new argument.  Modify it for new semantics.
(lra_risky_equiv_subst_p): New global var.
(lra_constraints): Set up lra_risky_equiv_subst_p.  Compare
get_equiv_substitution with right value.

* lra-eliminations.c (eliminate_regs_in_insn): Remove dead
code.
(spill_pseudos): Clear to_process after its processing.
(update_reg_eliminate): Add a new argument.  Set it up.  Modify
lra_no_alloc_regs and eliminable_regset.
(init_elim_table): Remove the argument.  Permit elimination from a
register to itself.
(lra_init_elimination): Add a new argument.  Set it up.  Don't set
up liveness for HARD_FRAME_POINTER_REGNUM.
(lra_eliminate): Rename to_process to insns_with_changed_offsets.
Call update_reg_eliminate with insns_with_changed_offsets.
Restructure code.  Add additional assertion on
insns_with_changed_offsets.

* lra.h (lra_init_elimination): New argument.

* lra-int.h (lra_risky_equiv_subst_p): New flag declaration.

* ira.c (ira_setup_eliminable_regset): Add a new argument.  Set
up liveness for HARD_FRAME_POINTER_REGNUM if necessary.
(ira): Call ira_setup_eliminable_regset with a new parameter.

* ira.h (ira_setup_eliminable_regset): New argument.

* loop-invariant.c (calculate_loop_reg_pressure): Call
ira_setup_eliminable_regset with a new parameter.

* haifa-sched.c (sched_init): Ditto.


Index: lra-assigns.c
===
--- lra-assigns.c   (revision 179932)
+++ lra-assigns.c   (working copy)
@@ -64,7 +64,7 @@ static struct regno_assign_info *regno_a
 /* Process a pseudo copy with frequency COPY_FREQ connecting REGNO1
and REGNO2 to form threads.  */
 static void
-process_copy_to_form_allocno (int regno1, int regno2, int copy_freq)
+process_copy_to_form_thread (int regno1, int regno2, int copy_freq)
 {
   int last, regno1_first, regno2_first;
 
@@ -111,7 +111,7 @@ init_regno_assign_info (void)
&& reg_renumber[regno2] < 0 && lra_reg_info[regno2].nrefs != 0
&& (ira_available_class_regs[regno_allocno_class_array[regno1]]
== ira_available_class_regs[regno_allocno_class_array[regno2]]))
-  process_copy_to_form_allocno (regno1, regno2, cp->freq);
+  process_copy_to_form_thread (regno1, regno2, cp->freq);
 }
 
 /* Free REGNO_ASSIGN_INFO.  */
@@ -243,42 +243,42 @@ update_lives (int regno, bool free_p)
 /* Sparseset used to calculate reload pseudos conflicting with a given
pseudo when we are trying to find a hard register for the given
pseudo.  */
-static sparseset conflict_reload_pseudos;
+static sparseset conflict_reload_and_inheritance_pseudos;
 
-/* Map: program point -> bitmap of all reload pseudos living at the
-   point.  */
-static bitmap_head *live_reload_pseudos;
+/* Map: program point -> bitmap of all reload and inheritance pseudos
+   living at the point.  */
+static bitmap_head *live_reload_and_inheritance_pseudos;
 
 /* Allocate and initialize data about living reload pseudos at any
given program point.  */
 static void
-init_live_relo

[pph] Fix builtin merges (issue5276044)

2011-10-13 Thread Diego Novillo


Computing the assembler name of a builtin function prevents the
middle-end from open coding the builtin.  This was causing assembly
differences between the non-pph and pph compiles.

Tested on x86_64.  Committed to branch.


Diego.

cp/ChangeLog.pph

* pph-streamer.c (pph_merge_name): Do not mangle names for
builtin functions.

testsuite/ChangeLog.pph

* g++.dg/pph/p2pr36533.cc: Mark fixed.
* g++.dg/pph/p4mean.cc: Likewise.
* g++.dg/pph/p4pr36533.cc: Likewise.

diff --git a/gcc/cp/pph-streamer.c b/gcc/cp/pph-streamer.c
index 7bcff92..ed2dfca 100644
--- a/gcc/cp/pph-streamer.c
+++ b/gcc/cp/pph-streamer.c
@@ -577,7 +577,7 @@ pph_get_signature (tree t, size_t *nbytes_p)
 tree
 pph_merge_name (tree expr)
 {
-  if (TREE_CODE (expr) == FUNCTION_DECL)
+  if (TREE_CODE (expr) == FUNCTION_DECL && !DECL_BUILT_IN (expr))
 return DECL_ASSEMBLER_NAME (expr);
   else
 return DECL_NAME (expr);
diff --git a/gcc/testsuite/g++.dg/pph/p2pr36533.cc b/gcc/testsuite/g++.dg/pph/p2pr36533.cc
index 8ff602a..3797327 100644
--- a/gcc/testsuite/g++.dg/pph/p2pr36533.cc
+++ b/gcc/testsuite/g++.dg/pph/p2pr36533.cc
@@ -1,6 +1,2 @@
 /* { dg-options "-w -fpermissive" } */
-// pph asm xdiff 25347
-// xfail BOGUS INTRINSIC
-// failing to recognise memset as an intrinsic
-
 #include "p1pr36533.h"
diff --git a/gcc/testsuite/g++.dg/pph/p4mean.cc b/gcc/testsuite/g++.dg/pph/p4mean.cc
index 80c2db6..e832ce5 100644
--- a/gcc/testsuite/g++.dg/pph/p4mean.cc
+++ b/gcc/testsuite/g++.dg/pph/p4mean.cc
@@ -1,8 +1,4 @@
 /* { dg-options "-w -fpermissive" }  */
-// pph asm xdiff 39234
-// xfail BOGUS INTRINSIC
-// failing to recognize sqrt as an intrinsic
-
 #include 
 #include 
 #include 
diff --git a/gcc/testsuite/g++.dg/pph/p4pr36533.cc b/gcc/testsuite/g++.dg/pph/p4pr36533.cc
index b230095..1fd03fa 100644
--- a/gcc/testsuite/g++.dg/pph/p4pr36533.cc
+++ b/gcc/testsuite/g++.dg/pph/p4pr36533.cc
@@ -1,6 +1,2 @@
 /* { dg-options "-w -fpermissive" } */
-// pph asm xdiff 25347
-// xfail BOGUS INTRINSIC
-// failing to recognise memset as an intrinsic
-
 #include "p4pr36533.h"

--
This patch is available for review at http://codereview.appspot.com/5276044

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner
 wrote:
>> The answer to H.J.'s "Why do we do it for MEM then?" is simply
>> "because no one ever thought about not doing it"
>
> No, that's false.  The same expand_compound_operation / 
> make_compound_operation
> pair is present in the MEM case as in the SET case.  It's just that
> there's some bug here that's noticeable in not making proper MEMs that
> doesn't show up in the SET case because of the way the insns are structured.
>

When we have (and (OP) M) where

(and (OP) M) == (and (OP) ((1 << ceil_log2 (M)) - 1))

(and (OP) M) is zero_extract bits 0 to ceil_log2 (M).

Does it look OK?

Thanks.

-- 
H.J.
---
diff --git a/gcc/combine.c b/gcc/combine.c
index 6c3b17c..5962b1d 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7758,6 +7758,23 @@ make_compound_operation (rtx x, enum rtx_code in_code)
next_code),
   i, NULL_RTX, 1, 1, 0, 1);

+
+  /* If we are (and (OP) M) and M is an extraction mask, this is an
+extraction.  */
+  else
+   {
+ unsigned HOST_WIDE_INT nonzero =
+   nonzero_bits (XEXP (x, 0), GET_MODE (XEXP (x, 0)));
+ unsigned HOST_WIDE_INT mask = INTVAL (XEXP (x, 1));
+ unsigned HOST_WIDE_INT len = ceil_log2 (mask);
+ if ((nonzero & (((unsigned HOST_WIDE_INT) 1 << len) - 1))
+ == (nonzero & mask))
+   {
+ new_rtx = make_compound_operation (XEXP (x, 0), next_code);
+ new_rtx = make_extraction (mode, new_rtx, 0, NULL_RTX,
+len, 1, 0, in_code == COMPARE);
+   }
+   }
   break;

 case LSHIFTRT:
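
The mask test in the patch above can be tried outside of GCC with plain integers. The following is a minimal sketch, not the combine.c code: ceil_log2_u64 is a hypothetical stand-in for GCC's ceil_log2, and the nonzero argument plays the role of the value returned by nonzero_bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ceil_log2: smallest N with (1 << N) >= X.
   Only defined here for X >= 1; returns 64 if no such N below 64 exists.  */
int
ceil_log2_u64 (uint64_t x)
{
  assert (x >= 1);
  int n = 0;
  uint64_t p = 1;
  while (p < x && n < 63)
    {
      p <<= 1;
      n++;
    }
  return p < x ? 64 : n;
}

/* Return true if (and OP MASK) keeps exactly the low LEN bits of OP that
   can be nonzero, i.e. the AND behaves like an extraction of bits 0..LEN-1.
   NONZERO is a conservative set of bits of OP that may be set.
   On success the width is stored in *LEN_OUT.  */
bool
and_is_low_extraction (uint64_t nonzero, uint64_t mask, int *len_out)
{
  int len = ceil_log2_u64 (mask);
  if (len >= 64)
    return false;		/* Avoid an undefined shift.  */

  uint64_t low = ((uint64_t) 1 << len) - 1;
  if ((nonzero & low) != (nonzero & mask))
    return false;

  *len_out = len;
  return true;
}
```

For example, with mask 0xfe and an operand whose bit 0 is known to be zero (nonzero = 0xfffffffe, as after a left shift by one), the check succeeds with a width of 8. With nonzero = 0xffffffff and the same mask it fails, because bit 0 could be set and the AND would clear it.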

RE: [Patch,AVR] Print no-return functions as JMP

2011-10-13 Thread Paul_Koning

>> You should have a way to turn this off.  Otherwise this makes 
>> debugging the call to abort impossible.
>
>What do you propose?
>
>o A command line option that is on by default like
>  -mnoreturn-tail-calls or -mjmp-noreturn
>
>o Factor out a hard-coded list of function names like "abort",
>  "exit", "_exit"

I'd suggest the first option.  That way you can do this for other similar 
functions like panic().

paul

Re: [Patch,AVR] Print no-return functions as JMP

2011-10-13 Thread Richard Henderson

On 10/13/2011 12:00 PM, Georg-Johann Lay wrote:
> What do you propose?
> 
> o A command line option that is on by default like
>   -mnoreturn-tail-calls or -mjmp-noreturn

The command-line-option.  I think I prefer -mjump-noreturn,
as the inverse -mno-noreturn-tail-calls is too awkward.


r~

Re: [Patch,AVR] Print no-return functions as JMP

2011-10-13 Thread Georg-Johann Lay

Richard Henderson schrieb:
> On 10/13/2011 11:16 AM, Georg-Johann Lay wrote:
>> This patch saves some ticks and bytes on stack by JUMPing to no-return
>> functions instead of CALLing them.
>>
>> Passes without regression.
>>
>> Ok for trunk?
>>
>> Johann
>>
>>  * config/avr/avr-protos.h (avr_out_call): New prototype.
>>  * config/avr/avr.md (adjust_len): Add alternative "call".
>>  (call_insn, call_value_insn): Use it.  Use avr_out_call to print
>>  assembler.
>>  * config/avr/avr.c (avr_out_call): New function.
>>  (adjust_insn_length): Handle ADJUST_LEN_CALL.
> 
> You should have a way to turn this off.  Otherwise this makes debugging
> the call to abort impossible.
> 
> r~

What do you propose?

o A command line option that is on by default like
  -mnoreturn-tail-calls or -mjmp-noreturn

o Factor out a hard-coded list of function names like "abort",
  "exit", "_exit"

Johann

Re: [C++ Patch] PR 17212

2011-10-13 Thread Mike Stump

On Oct 13, 2011, at 6:53 AM, Paolo Carlini wrote:
>> Why not support it in Obj-C++, too?
> 
> Yes I briefly wondered that but I know *so* little about that front end... Do 
> you think we can just add it? Probably yes ;)

The ground rule is, make ObjC behave just like C, unless an ObjC expert decides
differently.
The C++ rule is the same.  The mental model is to think of Obj as a slight
modifier on the language, rather than copying the entire language and then
forever having to put every new feature from the base language into the tree.
gcc/cp was done in the copy-the-world style (and suffers from that in various
ways).  Obj tries hard to be in the few-small-patches style.  It'd be nice if
we had NotObjC instead of ObjC, and NotObjC++ instead of ObjC++, to reduce the
burden on regular folks.  Yes, it is always safe.
Thanks.

RE: [Patch,AVR] Fix PR46278, Take #3

2011-10-13 Thread Weddington, Eric



> -----Original Message-----
> From: Georg-Johann Lay [mailto:a...@gjlay.de]
> Sent: Thursday, October 13, 2011 8:32 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Anatoly Sokolov; Denis Chertykov; Weddington, Eric
> Subject: [Patch,AVR] Fix PR46278, Take #3
> 
> This is yet another attempt to fix PR46278 (fake X addressing).
> 
> After the previous clean-ups it is just a small change.
> 
> caller-saves.c tries to eliminate call-clobbered hard-regs allocated to
> pseudos around function calls, and that leads to situations where reload is
> no longer able to perform all requested spills because of AVR's very few
> address registers.
> 
> Thus, the patch adds a new target option -mstrict-X so that the user can
> turn that option on if he likes to do so, and then -fcaller-save is
> disabled.
> 
> The patch passes the testsuite without regressions. Moreover, the testsuite
> passes without regressions if all test cases are run with -mstrict-X and
> all libraries (libgcc, avr-libc) are built with the new option turned on.

Hi Johann,

Sorry, I haven't been keeping up with the discussion on this PR.

But if all test cases pass when run with -mstrict-X and with everything built
with that option on, then why is this even an option? Is it because it may
not always reduce code size?

Thanks,
Eric

Re: [rs6000] Enable scalar shifts of vectors

2011-10-13 Thread Richard Henderson

On 10/13/2011 11:36 AM, David Edelsohn wrote:
> Are there testcases in the GCC testsuite that exercise these patterns?

I thought the vectorizer would use them.  E.g. gcc.dg/vect/vect-shift-3.c.

I see that I should have added ppc to check_effective_target_vect_shift_scalar,
though, to enable even more testing.


r~

Re: [Patch,AVR] Print no-return functions as JMP

2011-10-13 Thread Richard Henderson

On 10/13/2011 11:16 AM, Georg-Johann Lay wrote:
> This patch saves some ticks and bytes on stack by JUMPing to no-return
> functions instead of CALLing them.
> 
> Passes without regression.
> 
> Ok for trunk?
> 
> Johann
> 
>   * config/avr/avr-protos.h (avr_out_call): New prototype.
>   * config/avr/avr.md (adjust_len): Add alternative "call".
>   (call_insn, call_value_insn): Use it.  Use avr_out_call to print
>   assembler.
>   * config/avr/avr.c (avr_out_call): New function.
>   (adjust_insn_length): Handle ADJUST_LEN_CALL.

You should have a way to turn this off.  Otherwise this makes debugging
the call to abort impossible.


r~

Re: [rs6000] Enable scalar shifts of vectors

2011-10-13 Thread David Edelsohn

On Wed, Oct 12, 2011 at 6:32 PM, Richard Henderson  wrote:
> I suppose technically the middle-end could be improved to implement
> ashl as vashl by broadcasting the scalar, but Altivec
> is the only extant SIMD ISA that would make use of this.  All of
> the others can arrange for constant shifts to be encoded into the
> insn, and so implement the ashl named pattern.
>
> Tested on ppc64-linux, --with-cpu=G5.

Richard,

Are there testcases in the GCC testsuite that exercise these patterns?

Thanks, David

[Patch,AVR] Print no-return functions as JMP

2011-10-13 Thread Georg-Johann Lay

This patch saves some ticks and bytes on stack by JUMPing to no-return
functions instead of CALLing them.

Passes without regression.

Ok for trunk?

Johann

* config/avr/avr-protos.h (avr_out_call): New prototype.
* config/avr/avr.md (adjust_len): Add alternative "call".
(call_insn, call_value_insn): Use it.  Use avr_out_call to print
assembler.
* config/avr/avr.c (avr_out_call): New function.
(adjust_insn_length): Handle ADJUST_LEN_CALL.
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 179843)
+++ config/avr/avr.md	(working copy)
@@ -133,11 +133,10 @@ (define_attr "length" ""
 ;; Following insn attribute tells if and how the adjustment has to be
 ;; done:
 ;; no No adjustment needed; attribute "length" is fine.
-;; yesAnalyse pattern in adjust_insn_length by hand.
 ;; Otherwise do special processing depending on the attribute.
 
 (define_attr "adjust_len"
-  "out_bitop, out_plus, addto_sp, tsthi, tstsi, compare,
+  "out_bitop, out_plus, addto_sp, tsthi, tstsi, compare, call,
mov8, mov16, mov32, reload_in16, reload_in32,
ashlqi, ashrqi, lshrqi,
ashlhi, ashrhi, lshrhi,
@@ -3634,21 +3633,12 @@ (define_insn "call_insn"
   ;; Operand 1 not used on the AVR.
   ;; Operand 2 is 1 for tail-call, 0 otherwise.
   ""
-  "@
-%!icall
-%~call %x0
-%!ijmp
-%~jmp %x0"
+  {
+ return avr_out_call (insn, operands[0], 0 != INTVAL (operands[2]));
+  }
   [(set_attr "cc" "clobber")
-   (set_attr_alternative "length"
- [(const_int 1)
-  (if_then_else (eq_attr "mcu_mega" "yes")
-(const_int 2)
-(const_int 1))
-  (const_int 1)
-  (if_then_else (eq_attr "mcu_mega" "yes")
-(const_int 2)
-(const_int 1))])])
+   (set_attr "length" "1,*,1,*")
+   (set_attr "adjust_len" "*,call,*,call")])
 
 (define_insn "call_value_insn"
   [(parallel[(set (match_operand 0 "register_operand"   "=r,r,r,r")
@@ -3658,21 +3648,12 @@ (define_insn "call_value_insn"
   ;; Operand 2 not used on the AVR.
   ;; Operand 3 is 1 for tail-call, 0 otherwise.
   ""
-  "@
-%!icall
-%~call %x1
-%!ijmp
-%~jmp %x1"
+  {
+ return avr_out_call (insn, operands[1], 0 != INTVAL (operands[3]));
+  }
   [(set_attr "cc" "clobber")
-   (set_attr_alternative "length"
- [(const_int 1)
-  (if_then_else (eq_attr "mcu_mega" "yes")
-(const_int 2)
-(const_int 1))
-  (const_int 1)
-  (if_then_else (eq_attr "mcu_mega" "yes")
-(const_int 2)
-(const_int 1))])])
+   (set_attr "length" "1,*,1,*")
+   (set_attr "adjust_len" "*,call,*,call")])
 
 (define_insn "nop"
   [(const_int 0)]
Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(revision 179842)
+++ config/avr/avr-protos.h	(working copy)
@@ -84,6 +84,7 @@ extern const char *avr_out_sbxx_branch (
 extern const char* avr_out_bitop (rtx, rtx*, int*);
 extern const char* avr_out_plus (rtx*, int*, int*);
 extern const char* avr_out_addto_sp (rtx*, int*);
+extern const char* avr_out_call (rtx, rtx, bool);
 extern bool avr_popcount_each_byte (rtx, int, int);
 
 extern int extra_constraint_Q (rtx x);
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 179843)
+++ config/avr/avr.c	(working copy)
@@ -4905,6 +4905,27 @@ avr_out_plus (rtx *xop, int *plen, int *
 }
 
 
+/* Print call insn INSN to the assembler file and return "".
+   ADDRESS is the target address.
+   If SIBCALL_P then INSN is a tail-call.  */
+   
+const char*
+avr_out_call (rtx insn, rtx address, bool sibcall_p)
+{
+  /* No need to waste stack or time for no-return calls.  */
+  
+  if (optimize && find_reg_note (insn, REG_NORETURN, NULL))
+sibcall_p = true;
+
+  if (REG_P (address))
+output_asm_insn (sibcall_p ? "%!ijmp" : "%!icall", &address);
+  else
+output_asm_insn (sibcall_p ? "%~jmp %x0" : "%~call %x0", &address);
+
+  return "";
+}
+
+
 /* Output bit operation (IOR, AND, XOR) with register XOP[0] and compile
time constant XOP[2]:
 
@@ -5311,6 +5332,8 @@ adjust_insn_length (rtx insn, int len)
 case ADJUST_LEN_ASHLQI: ashlqi3_out (insn, op, &len); break;
 case ADJUST_LEN_ASHLHI: ashlhi3_out (insn, op, &len); break;
 case ADJUST_LEN_ASHLSI: ashlsi3_out (insn, op, &len); break;
+
+case ADJUST_LEN_CALL: len = AVR_HAVE_JMP_CALL ? 2 : 1; break;
   
 default:
   gcc_unreachable();
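
The selection logic of avr_out_call in the patch above can be modeled outside GCC as follows. This is a simplified sketch under stated assumptions: the enum and function names are hypothetical, the rtx arguments are replaced by booleans, and the %! and %~ output modifiers of the real templates are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* The four instruction shapes the call patterns can print.  */
enum branch_kind { BR_ICALL, BR_CALL, BR_IJMP, BR_JMP };

/* Pick the instruction for a call.  ADDRESS_IN_REG models REG_P (address),
   NORETURN_P models the presence of a REG_NORETURN note on the insn, and
   OPTIMIZE models the global optimize flag.  */
enum branch_kind
pick_branch (bool address_in_reg, bool sibcall_p, bool noreturn_p,
	     bool optimize)
{
  /* A call that never returns needs no return address on the stack,
     so it can be treated like a tail call.  */
  if (optimize && noreturn_p)
    sibcall_p = true;

  if (address_in_reg)
    return sibcall_p ? BR_IJMP : BR_ICALL;

  return sibcall_p ? BR_JMP : BR_CALL;
}

/* Base mnemonic for KIND, without any device-dependent prefix.  */
const char *
branch_name (enum branch_kind kind)
{
  static const char *const names[] = { "icall", "call", "ijmp", "jmp" };
  assert ((unsigned) kind < sizeof names / sizeof names[0]);
  return names[kind];
}
```

In this model a direct call to a no-return function such as abort maps to BR_JMP when optimizing and to BR_CALL otherwise, which is the behavior the debugging concern in this thread is about.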

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> The answer to H.J.'s "Why do we do it for MEM then?" is simply
> "because no one ever thought about not doing it" 

No, that's false.  The same expand_compound_operation / make_compound_operation
pair is present in the MEM case as in the SET case.  It's just that
there's some bug here that's noticeable in not making proper MEMs that
doesn't show up in the SET case because of the way the insns are structured.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> We first expand zero_extend:DI address to and:DI and then try
> to restore zero_extend:DI.   Why do we do this transformation
> to begin with?

Suppose there were an outer AND that duplicated what this one did.
Then when you combine those two, you merge it to one AND.  Then
make_compound_operation puts it back.  The net result is to eliminate
the outer AND.  There are lots of similar sorts of things.

As I said, the strategy there was to convert extractions and expansions
into the corresponding logical and shift operations, see if they can
merge with something outside (which is similarly converted), then convert
the result (possibly merged) back.

This, for example, is the code that will remove nested SIGN_EXTENDs.
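
A toy illustration of the merge step described above, written against plain integers rather than RTL. It is only a sketch: the function names are hypothetical, and the 8-bit zero-extension is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stdint.h>

/* (and (and X INNER) OUTER) collapses to one AND with the combined mask.  */
uint32_t
merge_and_masks (uint32_t inner, uint32_t outer)
{
  return inner & outer;
}

/* An 8-bit zero-extension written as the AND it is expanded to.  */
uint32_t
zero_extend8_as_and (uint32_t x)
{
  return x & 0xff;
}

/* Before merging: an outer AND on top of the expanded zero-extension.  */
uint32_t
nested (uint32_t x, uint32_t outer)
{
  return zero_extend8_as_and (x) & outer;
}

/* After merging: a single AND with the combined mask.  */
uint32_t
merged (uint32_t x, uint32_t outer)
{
  return x & merge_and_masks (0xff, outer);
}

/* Check on one sample value that merging did not change the result.  */
void
check_merge (uint32_t x, uint32_t outer)
{
  assert (nested (x, outer) == merged (x, outer));
}
```

When the outer mask duplicates the inner one (0xff here), the combined mask is again 0xff, so converting back yields the original zero-extension and the outer AND has disappeared.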

C++ PATCH for c++/50618 (wrong-code with virtual bases)

2011-10-13 Thread Jason Merrill

When an object is value-initialized, if the type doesn't have a 
user-provided default constructor, the object is zero-initialized first, 
and then the synthesized constructor is called.  The problem in this PR 
was that when value-initializing a base in a constructor we were 
zero-initializing virtual bases of that base even though they had 
already been initialized properly.  The fix is to specify to 
build_zero_init_1 that we only want to clear the as-base portion of the 
type.


On the trunk I also tidied up the logic; on release branches I made the 
minimal change.  For the 4.4 branch I also needed to backport the fix 
for 48035.


Tested x86_64-pc-linux-gnu, applying to 4.4, 4.5, 4.6 and trunk.
commit ae7159a0bd2822557c503cd85d911e0390aaf55d
Author: Jason Merrill 
Date:   Thu Oct 13 12:59:31 2011 -0400

	PR c++/50618
	* init.c (expand_aggr_init_1): Don't zero-initialize virtual
	bases of a base subobject.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 7897fff..8a5bece 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -1588,27 +1588,26 @@ expand_aggr_init_1 (tree binfo, tree true_exp, tree exp, tree init, int flags,
  that's value-initialization.  */
   if (init == void_type_node)
 {
-  /* If there's a user-provided constructor, we just call that.  */
-  if (type_has_user_provided_constructor (type))
-	/* Fall through.  */;
-  /* If there isn't, but we still need to call the constructor,
-	 zero out the object first.  */
-  else if (type_build_ctor_call (type))
+  /* If no user-provided ctor, we need to zero out the object.  */
+  if (!type_has_user_provided_constructor (type))
 	{
-	  init = build_zero_init (type, NULL_TREE, /*static_storage_p=*/false);
+	  tree field_size = NULL_TREE;
+	  if (exp != true_exp
+	  && CLASSTYPE_AS_BASE (type) != type)
+	/* Don't clobber already initialized virtual bases.  */
+	field_size = TYPE_SIZE (CLASSTYPE_AS_BASE (type));
+	  init = build_zero_init_1 (type, NULL_TREE, /*static_storage_p=*/false,
+field_size);
 	  init = build2 (INIT_EXPR, type, exp, init);
 	  finish_expr_stmt (init);
-	  /* And then call the constructor.  */
 	}
+
   /* If we don't need to mess with the constructor at all,
-	 then just zero out the object and we're done.  */
-  else
-	{
-	  init = build2 (INIT_EXPR, type, exp,
-			 build_value_init_noctor (type, complain));
-	  finish_expr_stmt (init);
-	  return;
-	}
+	 then we're done.  */
+  if (! type_build_ctor_call (type))
+	return;
+
+  /* Otherwise fall through and call the constructor.  */
   init = NULL_TREE;
 }
 
diff --git a/gcc/testsuite/g++.dg/init/vbase1.C b/gcc/testsuite/g++.dg/init/vbase1.C
new file mode 100644
index 000..bbfd58f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/vbase1.C
@@ -0,0 +1,39 @@
+// PR c++/50618
+// { dg-do run }
+
+struct Base
+{
+const int text;
+Base():text(1) {}
+Base(int aText)
+: text(aText) {}
+};
+struct SubA : public virtual Base
+{
+protected:
+  int x;
+public:
+  SubA(int aX)
+  : x(aX) {}
+};
+class SubB : public virtual Base
+{};
+struct Diamond : public SubA, public SubB
+{
+Diamond(int text)
+: Base(text), SubA(5), SubB() {}
+
+void printText()
+{
+if(text != 2)
+  __builtin_abort();
+if(x!=5)
+  __builtin_abort();
+}
+};
+
+int main(int, char**)
+{
+Diamond x(2);
+x.printText();
+}
commit fda61bbc8fd29b1df3dbeb576e0dcf806b2fcdf5
Author: Jason Merrill 
Date:   Thu Oct 13 13:11:31 2011 -0400

	PR c++/50618
	* init.c (expand_aggr_init_1): Don't zero-initialize virtual
	bases of a base subobject.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index c4bd635..f85a30b 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -1561,7 +1561,12 @@ expand_aggr_init_1 (tree binfo, tree true_exp, tree exp, tree init, int flags,
 	 zero out the object first.  */
   else if (TYPE_NEEDS_CONSTRUCTING (type))
 	{
-	  init = build_zero_init (type, NULL_TREE, /*static_storage_p=*/false);
+	  tree field_size = NULL_TREE;
+	  if (exp != true_exp && CLASSTYPE_AS_BASE (type) != type)
+	/* Don't clobber already initialized virtual bases.  */
+	field_size = TYPE_SIZE (CLASSTYPE_AS_BASE (type));
+	  init = build_zero_init_1 (type, NULL_TREE, /*static_storage_p=*/false,
+field_size);
 	  init = build2 (INIT_EXPR, type, exp, init);
 	  finish_expr_stmt (init);
 	  /* And then call the constructor.  */
diff --git a/gcc/testsuite/g++.dg/init/vbase1.C b/gcc/testsuite/g++.dg/init/vbase1.C
new file mode 100644
index 000..bbfd58f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/vbase1.C
@@ -0,0 +1,39 @@
+// PR c++/50618
+// { dg-do run }
+
+struct Base
+{
+const int text;
+Base():text(1) {}
+Base(int aText)
+: text(aText) {}
+};
+struct SubA : public virtual Base
+{
+protected:
+  int x;
+public:
+  SubA(int aX)
+  : x(aX) {}
+};
+class SubB : public virtual Base
+{};
+struct Diamond : public SubA, public SubB
+{
+Di

Re: [PATCH, ARM] Unaligned accesses for builtin memcpy [2/2]

2011-10-13 Thread Julian Brown

On Wed, 28 Sep 2011 14:33:17 +0100
Ramana Radhakrishnan  wrote:

> On 6 May 2011 14:13, Julian Brown  wrote:
> > Hi,
> >
> > This is the second of two patches to add unaligned-access support to
> > the ARM backend. It builds on the first patch to provide support for
> > unaligned accesses when expanding block moves (i.e. for builtin
> > memcpy operations). It makes some effort to use load/store multiple
> > instructions where appropriate (when accessing sufficiently-aligned
> > source or destination addresses), and also makes some effort to
> > generate fast code (for -O1/2/3) or small code (for -Os), though
> > some of the heuristics may need tweaking still
> 
> Sorry it's taken me a while to get around to this one. Do you know
> what difference this makes to performance on some standard benchmarks
> on let's say an A9 and an M4 as I see that this gets triggered only
> when we have less than 64 bytes to copy?

No, sorry, I don't have any benchmark results available at present. I
think we'd have to have terrifically bad luck for it to be a
performance degradation, though...

> Please add a few testcases from the examples that you've shown here to
> be sure that ldm's are being generated in the right cases.

I've added test cases which cover copies with combinations of
aligned/unaligned sources/destinations, gated on a new
require-effective-target so the tests only run when suitable support is
available.

I re-tested the patch for good measure, in case of bitrot (and the new
tests pass with the patch applied, of course). OK to apply now?

Thanks,

Julian

ChangeLog

gcc/
* config/arm/arm.c (arm_block_move_unaligned_straight)
(arm_adjust_block_mem, arm_block_move_unaligned_loop)
(arm_movmemqi_unaligned): New.
(arm_gen_movmemqi): Support unaligned block copies.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_arm_unaligned):
New.
* gcc.target/arm/unaligned-memcpy-1.c: New.
* gcc.target/arm/unaligned-memcpy-2.c: New.
* gcc.target/arm/unaligned-memcpy-3.c: New.
* gcc.target/arm/unaligned-memcpy-4.c: New.
Index: gcc/testsuite/gcc.target/arm/unaligned-memcpy-3.c
===
--- gcc/testsuite/gcc.target/arm/unaligned-memcpy-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-memcpy-3.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_unaligned } */
+/* { dg-options "-O2" } */
+
+#include <string.h>
+
+char src[16];
+
+void aligned_src (char *dest)
+{
+  memcpy (dest, src, 15);
+}
+
+/* Expect a multi-word load for the main part of the copy, but subword
+   loads/stores for the remainder.  */
+
+/* { dg-final { scan-assembler-times "ldmia" 1 } } */
+/* { dg-final { scan-assembler-times "ldrh" 1 } } */
+/* { dg-final { scan-assembler-times "strh" 1 } } */
+/* { dg-final { scan-assembler-times "ldrb" 1 } } */
+/* { dg-final { scan-assembler-times "strb" 1 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-memcpy-4.c
===
--- gcc/testsuite/gcc.target/arm/unaligned-memcpy-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-memcpy-4.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_unaligned } */
+/* { dg-options "-O2" } */
+
+#include <string.h>
+
+char src[16];
+char dest[16];
+
+void aligned_both (void)
+{
+  memcpy (dest, src, 15);
+}
+
+/* We know both src and dest to be aligned: expect multiword loads/stores.  */
+
+/* { dg-final { scan-assembler-times "ldmia" 1 } } */
+/* { dg-final { scan-assembler-times "stmia" 1 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-memcpy-1.c
===
--- gcc/testsuite/gcc.target/arm/unaligned-memcpy-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-memcpy-1.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_unaligned } */
+/* { dg-options "-O2" } */
+
+#include <string.h>
+
+void unknown_alignment (char *dest, char *src)
+{
+  memcpy (dest, src, 15);
+}
+
+/* We should see three unaligned word loads and store pairs, one unaligned
+   ldrh/strh pair, and an ldrb/strb pair.  Sanity check that.  */
+
+/* { dg-final { scan-assembler-times "@ unaligned" 8 } } */
+/* { dg-final { scan-assembler-times "ldrh" 1 } } */
+/* { dg-final { scan-assembler-times "strh" 1 } } */
+/* { dg-final { scan-assembler-times "ldrb" 1 } } */
+/* { dg-final { scan-assembler-times "strb" 1 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-memcpy-2.c
===
--- gcc/testsuite/gcc.target/arm/unaligned-memcpy-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-memcpy-2.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_unaligned } */
+/* { dg-options "-O2" } */
+
+#include <string.h>
+
+char dest[16];
+
+void aligned_dest (char *src)
+

RE: Intrinsics for N2965: Type traits and base classes

2011-10-13 Thread Michael Spertus

Addressing Jason's comments:

Index: libstdc++-v3/include/tr2/type_traits
===
--- libstdc++-v3/include/tr2/type_traits(revision 0)
+++ libstdc++-v3/include/tr2/type_traits(revision 0)
@@ -0,0 +1,96 @@
+// TR2 type_traits -*- C++ -*-
+
+// Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
+// Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file tr2/type_traits
+ *  This is a TR2 C++ Library header.
+ */
+
+#ifndef _GLIBCXX_TR2_TYPE_TRAITS
+#define _GLIBCXX_TR2_TYPE_TRAITS 1
+
+#pragma GCC system_header
+#include 
+#include 
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+namespace tr2
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  /**
+   * @defgroup metaprogramming Type Traits
+   * @ingroup utilities
+   *
+   * Compile time type transformation and information.
+   * @{
+   */
+
+  template<typename...> struct typelist;
+  template<>
+struct typelist<>
+{
+  typedef std::true_type empty;
+};
+
+  template<typename _First, typename... _Rest>
+struct typelist<_First, _Rest...>
+{
+  struct first
+  {
+typedef _First type;
+  };
+
+  struct rest
+  {
+typedef typelist<_Rest...> type;
+  };
+
+  typedef std::false_type empty;
+};
+
+  // Sequence abstraction metafunctions default to looking in the type
+  template<typename T> struct first : public T::first {};
+  template<typename T> struct rest : public T::rest {};
+  template<typename T> struct empty : public T::empty {};
+
+
+  template<typename T>
+struct bases
+{
+ typedef typelist<__bases(T)...> type;
+};
+
+  template<typename T>
+struct direct_bases
+{
+  typedef typelist<__direct_bases(T)...> type;
+};
+
+_GLIBCXX_END_NAMESPACE_VERSION
+}
+}
+
+#endif // _GLIBCXX_TR2_TYPE_TRAITS
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 178892)
+++ gcc/c-family/c-common.c (working copy)
@@ -423,6 +423,7 @@
   { "__asm__", RID_ASM,0 },
   { "__attribute", RID_ATTRIBUTE,  0 },
   { "__attribute__",   RID_ATTRIBUTE,  0 },
+  { "__bases",  RID_BASES, D_CXXONLY },
   { "__builtin_choose_expr", RID_CHOOSE_EXPR, D_CONLY },
   { "__builtin_complex", RID_BUILTIN_COMPLEX, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
@@ -433,6 +434,7 @@
   { "__const", RID_CONST,  0 },
   { "__const__",   RID_CONST,  0 },
   { "__decltype",   RID_DECLTYPE,   D_CXXONLY },
+  { "__direct_bases",   RID_DIRECT_BASES, D_CXXONLY },
   { "__extension__",   RID_EXTENSION,  0 },
   { "__func__",RID_C99_FUNCTION_NAME, 0 },
   { "__has_nothrow_assign", RID_HAS_NOTHROW_ASSIGN, D_CXXONLY },
Index: gcc/c-family/c-common.h
===
--- gcc/c-family/c-common.h (revision 178892)
+++ gcc/c-family/c-common.h (working copy)
@@ -129,12 +129,13 @@
   RID_CONSTCAST, RID_DYNCAST, RID_REINTCAST, RID_STATCAST,

   /* C++ extensions */
+  RID_BASES,  RID_DIRECT_BASES,
   RID_HAS_NOTHROW_ASSIGN,  RID_HAS_NOTHROW_CONSTRUCTOR,
   RID_HAS_NOTHROW_COPY,RID_HAS_TRIVIAL_ASSIGN,
   RID_HAS_TRIVIAL_CONSTRUCTOR, RID_HAS_TRIVIAL_COPY,
   RID_HAS_TRIVIAL_DESTRUCTOR,  RID_HAS_VIRTUAL_DESTRUCTOR,
   RID_IS_ABSTRACT, RID_IS_BASE_OF,
-  RID_IS_CONVERTIBLE_TO,   RID_IS_CLASS,
+  RID_IS_CLASS,RID_IS_CONVERTIBLE_TO,
   RID_IS_EMPTY,RID_IS_ENUM,
   RID_IS_LITERAL_TYPE, RID_IS_POD,
   RID_IS_POLYMORPHIC,  RID_IS_STD_LAYOUT,
Index: gcc/testsuite/g++.dg/ext/bases.C
===
--- gcc/testsuite/g++.dg/ext/bases.C(revision 0)
+++ gcc/testsuite/g++.dg/ext/bases.C(revision 0)
@@ -0,0 +1,29 @@
+// { dg-do run }
+#include
+#include
+// A simple typelist
+template<typename...> struct types {};
+
+// Simple bases implementation
+template<typename T> struct b {
+  typedef types<__bases(T)...> type;
+};
+
+// Simple dire

Re: [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.

2011-10-13 Thread Richard Henderson

On 10/11/2011 02:21 AM, Sameera Deshpande wrote:
> +/* When saved-register index (i) is odd, RTXs for both the registers
> +   to be loaded are generated in above given LDRD pattern, and the
> +   pattern can be emitted now.  */
> +par = emit_insn (par);
> +add_reg_note (par, REG_FRAME_RELATED_EXPR, dwarf);

I don't believe REG_FRAME_RELATED_EXPR does the right thing for 
anything besides prologues.  You need to emit REG_CFA_RESTORE
for the pop inside an epilogue.

r~

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini

On Thu, Oct 13, 2011 at 19:06, Richard Kenner
 wrote:
>> An and:DI is cheaper than a zero_extend:DI of an and:SI.
>
> That depends strongly on the constants and whether the machine is 32-bit
> or 64-bit.

Yes, the rtx_costs take care of that.

> But that's irrelevant in this case since the and:SI will be removed (it
> reflects what has already been done).

Do you refer to this in make_extraction:

  /* See if this can be done without an extraction.  We never can if the
 width of the field is not the same as that of some integer mode. For
 registers, we can only avoid the extraction if the position is at the
 low-order bit and this is either not in the destination or we have the
 appropriate STRICT_LOW_PART operation available.  */

and this call to force_to_mode in particular:

new_rtx = force_to_mode (inner, tmode,
 len >= HOST_BITS_PER_WIDE_INT
 ? ~(unsigned HOST_WIDE_INT) 0
 : ((unsigned HOST_WIDE_INT) 1 << len) - 1,
 0);

and from there the call to simplify_and_const_int that does this:

  if (constop == nonzero)
return varop;

?

Then indeed it should work if you call make_extraction more greedily
than what we do now (which is, just if the constant is one less than a
power of two).

The answer to H.J.'s "Why do we do it for MEM then?" is simply
"because no one ever thought about not doing it" (because there are no
other POINTERS_EXTEND_UNSIGNED == 1 machines).  In fact it may even be
advantageous to do it in general, even if in_code != MEM.  Only
experimentation can tell.
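
The mask handed to force_to_mode above can be sketched in plain C. This is
only an illustration, not GCC code: low_mask, zero_extract_low and
is_low_mask are hypothetical names, and uint64_t stands in for
unsigned HOST_WIDE_INT on the assumption of a 64-bit host.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for HOST_BITS_PER_WIDE_INT, assuming a 64-bit host.  */
#define WIDE_BITS 64

/* Mask selecting the low LEN bits.  Shifting by the full width is
   undefined in C, so the all-ones case is handled separately, as in
   the force_to_mode call quoted above.  */
static uint64_t
low_mask (unsigned len)
{
  return len >= WIDE_BITS ? ~(uint64_t) 0 : ((uint64_t) 1 << len) - 1;
}

/* Extracting LEN bits at position 0 without sign extension is the
   same as an AND with low_mask (LEN).  */
static uint64_t
zero_extract_low (uint64_t x, unsigned len)
{
  assert (len > 0);   /* a zero-width field is meaningless */
  return x & low_mask (len);
}

/* Nonzero if M is one less than a power of two, i.e. an AND with M is
   exactly a zero-extraction of some number of low bits.  */
static int
is_low_mask (uint64_t m)
{
  return (m & (m + 1)) == 0;
}
```

Under this sketch 0xffffffff passes is_low_mask while 0xfffffffc does not,
which matches the "one less than a power of two" condition mentioned above.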

Paolo

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 10:21 AM, Paolo Bonzini  wrote:
> On Thu, Oct 13, 2011 at 19:19, H.J. Lu  wrote:
>> On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini  wrote:
>>> On 10/13/2011 06:35 PM, Richard Kenner wrote:
>>>>>
>>>>> It never calls make_extraction.  There are several cases handled
>>>>> for AND operation. But
>>>>>
>>>>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>>>>>                (const_int 4 [0x4])) 0)
>>>>>        (subreg:DI (reg:SI 106) 0))
>>>>>    (const_int 4294967292 [0xfffffffc]))
>>>>>
>>>>> isn't one of them.
>>>>
>>>> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
>>>> is to make it to do that.  That's where this needs to be fixed: in
>>>> make_compound_operation.
>>>
>>> An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is correct
>>> in not doing this transformation.  I think adding a case to
>>> make_compound_operation that simply undoes the transformation (without
>>> calling make_extraction) is fine if you guard it with if (in_code == MEM).
>>>
>>
>> We first expand zero_extend:DI address to and:DI and then try
>> to restore zero_extend:DI.   Why do we do this transformation
>> to begin with?
>
> Because outside of a MEM it may be beneficial _not_ to restore
> zero_extend:DI in this case (depending on rtx_costs).
>

Why do we do it for MEM then?

-- 
H.J.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini

On Thu, Oct 13, 2011 at 19:19, H.J. Lu  wrote:
> On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini  wrote:
>> On 10/13/2011 06:35 PM, Richard Kenner wrote:

>>>> It never calls make_extraction.  There are several cases handled
>>>> for AND operation. But
>>>>
>>>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>>>>                (const_int 4 [0x4])) 0)
>>>>        (subreg:DI (reg:SI 106) 0))
>>>>    (const_int 4294967292 [0xfffffffc]))
>>>>
>>>> isn't one of them.
>>>
>>> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
>>> is to make it to do that.  That's where this needs to be fixed: in
>>> make_compound_operation.
>>
>> An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is correct
>> in not doing this transformation.  I think adding a case to
>> make_compound_operation that simply undoes the transformation (without
>> calling make_extraction) is fine if you guard it with if (in_code == MEM).
>>
>
> We first expand zero_extend:DI address to and:DI and then try
> to restore zero_extend:DI.   Why do we do this transformation
> to begin with?

Because outside of a MEM it may be beneficial _not_ to restore
zero_extend:DI in this case (depending on rtx_costs).

Paolo

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini  wrote:
> On 10/13/2011 06:35 PM, Richard Kenner wrote:
>>>
>>> It never calls make_extraction.  There are several cases handled
>>> for AND operation. But
>>>
>>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>>>                (const_int 4 [0x4])) 0)
>>>        (subreg:DI (reg:SI 106) 0))
>>>    (const_int 4294967292 [0xfffffffc]))
>>>
>>> isn't one of them.
>>
>> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
>> is to make it to do that.  That's where this needs to be fixed: in
>> make_compound_operation.
>
> An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is correct
> in not doing this transformation.  I think adding a case to
> make_compound_operation that simply undoes the transformation (without
> calling make_extraction) is fine if you guard it with if (in_code == MEM).
>

We first expand zero_extend:DI address to and:DI and then try
to restore zero_extend:DI.   Why do we do this transformation
to begin with?


-- 
H.J.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> An and:DI is cheaper than a zero_extend:DI of an and:SI.  

That depends strongly on the constants and whether the machine is 32-bit
or 64-bit. 

But that's irrelevant in this case since the and:SI will be removed (it
reflects what has already been done).

Re: Ping shrink wrap patches

2011-10-13 Thread Bernd Schmidt

On 10/13/11 18:50, Bernd Schmidt wrote:
> On 10/13/11 14:27, Alan Modra wrote:
>> Without the ifcvt
>> optimization for a function "int foo (int x)" we might have something
>> like
>>
>>  r29 = r3; // save r3 in callee saved reg
>>  if (some test) goto exit_label
>>  // main body of foo, calling other functions
>>  r3 = 0;
>>  return;
>> exit_label:
>>  r3 = 1;
>>  return;
>>
>> Bernd's http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00380.html quite
>> happily rearranges the r29 assignment to be after the "if", and shrink
>> wrapping occurs.  With the ifcvt optimization we get
>>
>>  r29 = r3; // save r3 in callee saved reg
>>  r3 = 1;
>>  if (some test) goto exit_label
>>  // main body of foo, calling other functions
>>  r3 = 0;
>> exit_label:
>>  return;
> 
> I wonder if this can't be described as another case for moving an insn
> downwards in prepare_shrink_wrap, rather than stopping ifcvt?

I.e. something like this? Minimally tested by inspecting some generated
assembly. I haven't found a case where it enables extra shrink-wrapping
on i686, but maybe it's different on ppc?


Bernd

Index: /local/src/egcs/scratch-trunk/gcc/function.c
===
--- /local/src/egcs/scratch-trunk/gcc/function.c(revision 179848)
+++ /local/src/egcs/scratch-trunk/gcc/function.c(working copy)
@@ -5369,13 +5369,13 @@ static void
 prepare_shrink_wrap (basic_block entry_block)
 {
   rtx insn, curr;
-  FOR_BB_INSNS_SAFE (entry_block, insn, curr)
+  FOR_BB_INSNS_REVERSE_SAFE (entry_block, insn, curr)
 {
   basic_block next_bb;
   edge e, live_edge;
   edge_iterator ei;
-  rtx set, scan;
-  unsigned destreg, srcreg;
+  rtx set, src, dst, scan;
+  unsigned destreg;
 
   if (!NONDEBUG_INSN_P (insn))
continue;
@@ -5383,12 +5383,14 @@ prepare_shrink_wrap (basic_block entry_b
   if (!set)
continue;
 
-  if (!REG_P (SET_SRC (set)) || !REG_P (SET_DEST (set)))
+  src = SET_SRC (set);
+  dst = SET_DEST (set);
+  if (!(REG_P (src) || CONSTANT_P (src)) || !REG_P (dst))
continue;
-  srcreg = REGNO (SET_SRC (set));
-  destreg = REGNO (SET_DEST (set));
-  if (hard_regno_nregs[srcreg][GET_MODE (SET_SRC (set))] > 1
- || hard_regno_nregs[destreg][GET_MODE (SET_DEST (set))] > 1)
+  destreg = REGNO (dst);
+  if (hard_regno_nregs[destreg][GET_MODE (dst)] > 1)
+   continue;
+  if (REG_P (src) && hard_regno_nregs[REGNO (src)][GET_MODE (src)] > 1)
continue;
 
   next_bb = entry_block;
@@ -5436,7 +5438,8 @@ prepare_shrink_wrap (basic_block entry_b
if (REG_NOTE_KIND (link) == REG_INC)
  record_hard_reg_sets (XEXP (link, 0), NULL, &set_regs);
 
- if (TEST_HARD_REG_BIT (set_regs, srcreg)
+ if ((REG_P (src)
+  && TEST_HARD_REG_BIT (set_regs, REGNO (src)))
  || reg_referenced_p (SET_DEST (set),
   PATTERN (scan)))
{

Re: Vector alignment tracking

2011-10-13 Thread Jakub Jelinek

On Thu, Oct 13, 2011 at 06:57:47PM +0200, Andi Kleen wrote:
> > Or am I missing something?
> 
> I often see the x86 vectorizer with -mtune=generic generate a lot of
> complicated code just to adjust for potential misalignment.
> 
> My thought was just if the alias oracle knows what the original
> declaration is, and it's available for changes (e.g. LTO), it would
> likely be better to just add an __attribute__((aligned()))
> there.
> 
> In the general case it's probably harder, you would need some 
> cost model to decide when it's worth it.

GCC already does that on certain targets, see
increase_alignment in tree-vectorizer.c.  Plus, various backends attempt
to align larger arrays more than they have to be aligned.
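
At the source level, raising the alignment of a declaration amounts to
something like the sketch below. It is not what increase_alignment does
internally; the array name samples, the helper is_aligned and the 16-byte
figure are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical array that a vectorizer would like to access with
   16-byte vector loads; the attribute requests that alignment.  */
static float samples[64] __attribute__ ((aligned (16)));

/* Nonzero if P is a multiple of ALIGN, which must be a power of two.  */
static int
is_aligned (const void *p, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Start of the array, for callers that only see a pointer.  */
static const float *
samples_start (void)
{
  return samples;
}
```

With the attribute in place the start of the array can be assumed to be
16-byte aligned, so no runtime peeling is needed for accesses that begin
at element 0.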

Jakub

Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-10-13 Thread Richard Henderson

On 10/13/2011 08:49 AM, Peter Bergner wrote:
> +   if (TARGET_LINK_STACK)
> + asm_fprintf (file, "\tbl 1f\n\tb 2f\n1:\n\tblr\n2:\n");
> +   else
> + asm_fprintf (file, "\tbcl 20,31,1f\n1:\n");

Wouldn't it be better to set up an out-of-line "blr" insn that could
be shared by all instances?  That would solve a lot of this sort of
branch-to-branch-to-branch ugliness.

See the i386 port for an example of this, if you need it.

r~

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini


On 10/13/2011 06:35 PM, Richard Kenner wrote:

>> It never calls make_extraction.  There are several cases handled
>> for AND operation. But
>>
>> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>>                (const_int 4 [0x4])) 0)
>>        (subreg:DI (reg:SI 106) 0))
>>    (const_int 4294967292 [0xfffffffc]))
>>
>> isn't one of them.
>
> Yes, clearly.  Otherwise it would work!  The correct fix for this problem
> is to make it to do that.  That's where this needs to be fixed: in
> make_compound_operation.


An and:DI is cheaper than a zero_extend:DI of an and:SI.  So GCC is 
correct in not doing this transformation.  I think adding a case to 
make_compound_operation that simply undoes the transformation (without 
calling make_extraction) is fine if you guard it with if (in_code == MEM).
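
As a rough C illustration of why the and:DI form can stand in for a
zero_extend:DI here: the two agree whenever the low two bits of the operand
are already known to be zero. That precondition is an assumption of this
sketch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* zero_extend:DI of an SImode value: keep the low 32 bits and clear
   the upper half.  */
static uint64_t
zero_extend_si (uint64_t v)
{
  return (uint64_t) (uint32_t) v;
}

/* The and:DI form with the constant from the RTL dump.  */
static uint64_t
and_fffffffc (uint64_t v)
{
  return v & UINT64_C (0xfffffffc);
}

/* The two forms agree when the low two bits of V are zero, for
   example when V is a multiple of 4.  */
static int
forms_agree (uint64_t v)
{
  assert ((v & 3) == 0);
  return zero_extend_si (v) == and_fffffffc (v);
}
```

Without the precondition the forms differ: for 0x100000007 the zero
extension gives 7 while the AND gives 4.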


Paolo

Re: Vector alignment tracking

2011-10-13 Thread Andi Kleen

> Or am I missing something?

I often see the x86 vectorizer with -mtune=generic generate a lot of
complicated code just to adjust for potential misalignment.

My thought was just if the alias oracle knows what the original
declaration is, and it's available for changes (e.g. LTO), it would
likely be better to just add an __attribute__((aligned()))
there.

In the general case it's probably harder, you would need some 
cost model to decide when it's worth it.

Your approach of course would still be needed for cases where this
isn't possible. But it sounded like the infrastructure you're building
could in principle do both.

-Andi

Re: Ping shrink wrap patches

2011-10-13 Thread Bernd Schmidt

On 10/13/11 14:27, Alan Modra wrote:
> Without the ifcvt
> optimization for a function "int foo (int x)" we might have something
> like
> 
>  r29 = r3; // save r3 in callee saved reg
>  if (some test) goto exit_label
>  // main body of foo, calling other functions
>  r3 = 0;
>  return;
> exit_label:
>  r3 = 1;
>  return;
> 
> Bernd's http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00380.html quite
> happily rearranges the r29 assignment to be after the "if", and shrink
> wrapping occurs.  With the ifcvt optimization we get
> 
>  r29 = r3; // save r3 in callee saved reg
>  r3 = 1;
>  if (some test) goto exit_label
>  // main body of foo, calling other functions
>  r3 = 0;
> exit_label:
>  return;

I wonder if this can't be described as another case for moving an insn
downwards in prepare_shrink_wrap, rather than stopping ifcvt? Doesn't
matter much however.
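
For reference, the pseudo-code above corresponds to a source shape roughly
like the following. This is a hypothetical example: other_function and the
test on x are placeholders, not taken from any of the patches.

```c
#include <assert.h>

static int calls;   /* accumulates work done on the slow path */

/* Stand-in for the "other functions" called from the main body.  */
static void
other_function (int x)
{
  assert (x >= 0);   /* only reached on the slow path */
  calls += x;
}

int
foo (int x)
{
  /* "some test": the early exit needs no callee-saved registers, so a
     shrink-wrapped prologue could be skipped on this path.  */
  if (x < 0)
    return 1;

  /* Main body: x has to survive the calls, which is why it gets
     copied into a callee-saved register (r29 in the pseudo-code).  */
  other_function (x);
  other_function (x);
  return 0;
}
```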


Bernd

Re: Ping shrink wrap patches

2011-10-13 Thread Richard Henderson

On 10/13/2011 05:27 AM, Alan Modra wrote:
> Ping
> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01002.html
> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01003.html

Ok.

> http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01596.html
> 
> The last one needs a tweak.
> s/FUNCTION_VALUE_REGNO_P/targetm.calls.function_value_regno_p/,
> or wrap the whole patch in #ifdef FUNCTION_VALUE_REGNO_P. 

Ok with the s///.


r~

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> It never calls make_extraction.  There are several cases handled
> for AND operation. But
> 
> (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
>(const_int 4 [0x4])) 0)
>(subreg:DI (reg:SI 106) 0))
>(const_int 4294967292 [0xfffffffc]))
> 
> isn't one of them.

Yes, clearly.  Otherwise it would work!  The correct fix for this problem
is to make it to do that.  That's where this needs to be fixed: in
make_compound_operation.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 9:11 AM, Richard Kenner
 wrote:
>> at the end.  make_compound_operation doesn't know how to
>> restore ZERO_EXTEND.
>
> It does in general.  See make_extraction, which it calls.  The question is
> why it doesn't in this case.  That's the bug.
>

It never calls make_extraction.  There are several cases handled
for AND operation. But

(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
   (const_int 4 [0x4])) 0)
   (subreg:DI (reg:SI 106) 0))
   (const_int 4294967292 [0xfffffffc]))

isn't one of them.

-- 
H.J.

Re: [PATCH] vec_set for 32-byte vectors

2011-10-13 Thread Richard Henderson

On 10/13/2011 09:21 AM, Jakub Jelinek wrote:
>   * config/i386/sse.md (vec_set): Change V_128 iterator mode to V.

Ok.


r~

[PATCH] vec_set for 32-byte vectors

2011-10-13 Thread Jakub Jelinek

Hi!

As noted by Kirill Yukhin (and what led to the previous tree-ssa.c patch),
vec_set wasn't wired for 32-byte vectors.
Although ix86_expand_vector_set handles 32-byte vectors just fine (even for
AVX and integer vectors), without the expander we'd force things into memory
etc.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2011-10-13  Jakub Jelinek  

* config/i386/sse.md (vec_set): Change V_128 iterator mode
to V.

--- gcc/config/i386/sse.md.jj   2011-10-13 12:26:13.0 +0200
+++ gcc/config/i386/sse.md  2011-10-13 14:50:15.0 +0200
@@ -3786,7 +3786,7 @@ (define_split
 })
 
 (define_expand "vec_set"
-  [(match_operand:V_128 0 "register_operand" "")
+  [(match_operand:V 0 "register_operand" "")
(match_operand: 1 "register_operand" "")
(match_operand 2 "const_int_operand" "")]
   "TARGET_SSE"

Jakub

[committed] Drop TREE_ADDRESSABLE from BIT_FIELD_REF on lhs accessed vectors/complex

2011-10-13 Thread Jakub Jelinek

Hi!

I've noticed that
#define vector(elcount, type)  \
__attribute__((vector_size((elcount)*sizeof(type)))) type

vector (4, int)
f1 (vector (4, int) a, int b)
{
  ((int *)&a)[0] = b;
  return a;
}

as well as

vector (4, int)
f2 (vector (4, int) a, int b)
{
  a[0] = b;
  return a;
}

don't result in vec_set_optab being used, instead the argument is
forced in memory.  The problem is that update_addresses_taken
wouldn't drop TREE_ADDRESSABLE from the vector when it is no
longer address taken.  While it can't be turned into DECL_GIMPLE_REG_P,
TREE_ADDRESSABLE can go, it will still not be considered a gimple register,
but at least the expander will be free to generate better code for it.

Bootstrapped/regtested on x86_64-linux and i686-linux, preapproved by
richi on IRC, committed to trunk.

2011-10-13  Jakub Jelinek  
Richard Guenther  

* tree-ssa.c (maybe_optimize_var): Drop TREE_ADDRESSABLE
from vector or complex vars even if their DECL_UID is in not_reg_needs
bitmap.

--- gcc/tree-ssa.c.jj   2011-10-13 11:19:30.0 +0200
+++ gcc/tree-ssa.c  2011-10-13 14:27:02.0 +0200
@@ -1976,6 +1976,8 @@ maybe_optimize_var (tree var, bitmap add
 a non-register.  Otherwise we are confused and forget to
 add virtual operands for it.  */
   && (!is_gimple_reg_type (TREE_TYPE (var))
+ || TREE_CODE (TREE_TYPE (var)) == VECTOR_TYPE
+ || TREE_CODE (TREE_TYPE (var)) == COMPLEX_TYPE
  || !bitmap_bit_p (not_reg_needs, DECL_UID (var))))
 {
   TREE_ADDRESSABLE (var) = 0;

Jakub

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner

> at the end.  make_compound_operation doesn't know how to
> restore ZERO_EXTEND.

It does in general.  See make_extraction, which it calls.  The question is
why it doesn't in this case.  That's the bug.

Re: Vector alignment tracking

2011-10-13 Thread Artem Shinkarov

On Thu, Oct 13, 2011 at 4:54 PM, Andi Kleen  wrote:
> Artem Shinkarov  writes:
>>
>> 1) Currently in C we cannot provide information that an array is
>> aligned to a certain number.  The problem is hidden in the fact, that
>
> Have you considered doing it the other way round: when an optimization
> needs something to be aligned, make the declaration aligned?
>
> -Andi

Andi, I can't realistically imagine how it could work.  The problem is
that for an arbitrary arr[x], I have no idea whether it should be
aligned or not.

what if

arr = ptr +  5;
v = *(vec *) arr;

I can make arr aligned, because it would be better for performance,
but obviously, the pointer expression breaks this alignment.  But the
code is valid, because unaligned move is still possible.  So I think
that checking is a more conservative approach.
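
A small sketch of the situation, assuming 4-byte int and a 16-byte vector:
the base is aligned, but base + 5 is not, so the load has to be done in a
way that is valid for any alignment. The names vec4, buf, is_aligned and
load_unaligned are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 16-byte vector of four ints.  */
typedef struct { int e[4]; } vec4;

/* Aligned base array, playing the role of "ptr".  */
static int buf[16] __attribute__ ((aligned (16)));

/* Nonzero if P is a multiple of ALIGN, which must be a power of two.  */
static int
is_aligned (const void *p, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Load four ints starting at P without assuming any alignment;
   memcpy is valid for arbitrary addresses.  */
static vec4
load_unaligned (const int *p)
{
  vec4 v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

Here buf is 16-byte aligned while buf + 5 sits 20 bytes in, so a check
rather than an assumption is what keeps the access correct.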

Or am I missing something?

Thanks,
Artem.
> --
> a...@linux.intel.com -- Speaking for myself only
>

Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching

2011-10-13 Thread Matthew Gretton-Dann


This patch seems to have caused PR50717:
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50717

Thanks,

Matt

On 19/08/11 15:49, Andrew Stubbs wrote:

On 14/07/11 15:35, Richard Guenther wrote:

Ok.


I've just committed this updated patch.

I found bugs with VOIDmode constants that have caused me to recast my
patches to is_widening_mult_rhs_p. They should be logically the same for
non VOIDmode cases, but work correctly for constants. I think the new
version is a bit easier to understand in any case.
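
For readers unfamiliar with the pattern, a widening multiply-accumulate
written out explicitly in C looks roughly like this. It is only a sketch
with hypothetical names (wmacc, wmacc_demo); it spells out the conversions
and is not the form used in the wmul-8.c test.

```c
#include <assert.h>
#include <stdint.h>

/* Both 32-bit operands are widened to 64 bits before the multiply, so
   the product cannot overflow, and the result is added to a 64-bit
   accumulator.  */
static int64_t
wmacc (int64_t acc, int32_t b, int32_t c)
{
  return acc + (int64_t) b * (int64_t) c;
}

/* Small usage check: 65536 * 65536 does not fit in 32 bits but is
   exact in the widened form.  */
static void
wmacc_demo (void)
{
  assert (wmacc (1, 65536, 65536) == INT64_C (0x100000001));
  assert (wmacc (0, -3, 7) == -21);
}
```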

Andrew


widening-multiplies-6.patch


2011-08-19  Andrew Stubbs

gcc/
* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument
'type'.
Use 'type' from caller, not inferred from 'rhs'.
Don't reject non-conversion statements. Do return lhs in this case.
(is_widening_mult_p): Add new argument 'type'.
Use 'type' from caller, not inferred from 'stmt'.
Pass type to is_widening_mult_rhs_p.
(convert_mult_to_widen): Pass type to is_widening_mult_p.
(convert_plusminus_to_widen): Likewise.

gcc/testsuite/
* gcc.target/arm/wmul-8.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-8.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target arm_dsp } */
+
+long long
+foo (long long a, int *b, int *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1966,7 +1966,8 @@ struct gimple_opt_pass pass_optimize_bswap =
   }
  };

-/* Return true if RHS is a suitable operand for a widening multiplication.
+/* Return true if RHS is a suitable operand for a widening multiplication,
+   assuming a target type of TYPE.
 There are two cases:

   - RHS makes some value at least twice as wide.  Store that value
@@ -1976,27 +1977,31 @@ struct gimple_opt_pass pass_optimize_bswap =
 but leave *TYPE_OUT untouched.  */

  static bool
-is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
+is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
+   tree *new_rhs_out)
  {
gimple stmt;
-  tree type, type1, rhs1;
+  tree type1, rhs1;
enum tree_code rhs_code;

if (TREE_CODE (rhs) == SSA_NAME)
  {
-  type = TREE_TYPE (rhs);
stmt = SSA_NAME_DEF_STMT (rhs);
-  if (!is_gimple_assign (stmt))
-   return false;
-
-  rhs_code = gimple_assign_rhs_code (stmt);
-  if (TREE_CODE (type) == INTEGER_TYPE
- ? !CONVERT_EXPR_CODE_P (rhs_code)
- : rhs_code != FIXED_CONVERT_EXPR)
-   return false;
+  if (is_gimple_assign (stmt))
+   {
+ rhs_code = gimple_assign_rhs_code (stmt);
+ if (TREE_CODE (type) == INTEGER_TYPE
+ ? !CONVERT_EXPR_CODE_P (rhs_code)
+ : rhs_code != FIXED_CONVERT_EXPR)
+   rhs1 = rhs;
+ else
+   rhs1 = gimple_assign_rhs1 (stmt);
+   }
+  else
+   rhs1 = rhs;

-  rhs1 = gimple_assign_rhs1 (stmt);
type1 = TREE_TYPE (rhs1);
+
if (TREE_CODE (type1) != TREE_CODE (type)
  || TYPE_PRECISION (type1) * 2>  TYPE_PRECISION (type))
return false;
@@ -2016,28 +2021,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
return false;
  }

-/* Return true if STMT performs a widening multiplication.  If so,
-   store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT
-   respectively.  Also fill *RHS1_OUT and *RHS2_OUT such that converting
-   those operands to types *TYPE1_OUT and *TYPE2_OUT would give the
-   operands of the multiplication.  */
+/* Return true if STMT performs a widening multiplication, assuming the
+   output type is TYPE.  If so, store the unwidened types of the operands
+   in *TYPE1_OUT and *TYPE2_OUT respectively.  Also fill *RHS1_OUT and
+   *RHS2_OUT such that converting those operands to types *TYPE1_OUT
+   and *TYPE2_OUT would give the operands of the multiplication.  */

  static bool
-is_widening_mult_p (gimple stmt,
+is_widening_mult_p (tree type, gimple stmt,
tree *type1_out, tree *rhs1_out,
tree *type2_out, tree *rhs2_out)
  {
-  tree type;
-
-  type = TREE_TYPE (gimple_assign_lhs (stmt));
if (TREE_CODE (type) != INTEGER_TYPE
&&  TREE_CODE (type) != FIXED_POINT_TYPE)
  return false;

-  if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out,
+  rhs1_out))
  return false;

-  if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out,
+  rhs2_out))
  return false;

if (*type1_out == NULL)
@@ -2089,7 +2093,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
if (TREE_CODE

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 7:14 AM, Richard Kenner
 wrote:
>> Or being fooled by the 0xfffffffc masking, perhaps.
>
> No, I'm pretty sure that's NOT the case.  The *whole point* of the
> routine is to deal with that masking.
>

I got

(gdb) step
make_compound_operation (x=0x7139c4c8, in_code=MEM)
at /export/gnu/import/git/gcc/gcc/combine.c:7572
7572  enum rtx_code code = GET_CODE (x);
(gdb) call debug_rtx (x)
(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
(const_int 4 [0x4])) 0)
(subreg:DI (reg:SI 106) 0))
(const_int 4294967292 [0xfffffffc]))

and it produces

(gdb) call debug_rtx (x)
(and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ])
(const_int 4 [0x4])) 0)
(subreg:DI (reg:SI 106) 0))
(const_int 4294967292 [0xfffffffc]))

at the end.  make_compound_operation doesn't know how to
restore ZERO_EXTEND.

BTW, there is a small testcase at

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696

You can reproduce it on Linux/x86-64.

-- 
H.J.
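
The arithmetic behind the RTL above can be sketched in plain C. This is only an illustration of why the masked 64-bit add is equivalent to a zero-extended 32-bit computation; it is not code from combine.c, and the function names are hypothetical.

```c
#include <stdint.h>

/* The form combine sees: a 64-bit add whose result is masked with
   0xfffffffc.  The upper 32 bits of A and B are "don't care".  */
uint64_t
masked_sum (uint64_t a, uint64_t b)
{
  return (a + b) & UINT64_C (0xfffffffc);
}

/* The equivalent form: do the add and the low-bit masking in 32 bits,
   then zero-extend the result to 64 bits.  */
uint64_t
zext_sum (uint64_t a, uint64_t b)
{
  uint32_t sum = (uint32_t) a + (uint32_t) b;
  return (uint64_t) (sum & ~UINT32_C (3));
}
```

Both functions return the same value for every input, because the mask discards both the upper 32 bits and the low two bits of the sum.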

Re: Vector alignment tracking

2011-10-13 Thread Andi Kleen

Artem Shinkarov  writes:
>
> 1) Currently in C we cannot provide information that an array is
> aligned to a certain number.  The problem is hidden in the fact, that

Have you considered doing it the other way round: when an optimization
needs something to be aligned, make the declaration aligned?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
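
What "make the declaration aligned" means at the source level can be sketched by hand as follows. The suggestion above is for the compiler to do this automatically; here the existing aligned attribute is written out manually, and the names `samples`, `get_samples` and `is_aligned` are illustrative assumptions.

```c
#include <stdint.h>

/* Request 16-byte alignment on the declaration itself (GNU C syntax).  */
static float samples[8] __attribute__ ((aligned (16)));

/* Return nonzero if P is aligned to ALIGN bytes (ALIGN a power of two).  */
int
is_aligned (const void *p, uintptr_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Hand out the aligned array.  */
const float *
get_samples (void)
{
  return samples;
}
```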

Re: [PATCH, rs6000] Preserve link stack for 476 cpus

2011-10-13 Thread Peter Bergner

On Mon, 2011-09-12 at 15:29 -0400, David Edelsohn wrote:
> First, please choose a more informative option name.
> -mpreserve-link-stack seems like something generally useful for all
> processors and someone may randomly add the option.  It always is
> useful to preserve the link stack -- that's why you're jumping through
> hoops to fix this bug.  Maybe -mpreserve-ppc476-link-stack .

Done.


> I would prefer that this patch were maintained by the chip vendors
> distributing SDKs for PPC476 instead of complicating the FSF codebase.

Talking with the chip folks, they said there were a number of companies
already downloading the FSF gcc sources and building it unpatched and
that they expected more to do so in the future, so I'm not sure how many
(if any) are actually even relying on an SDK.  So...


> Otherwise, please implement this like Xilinx FPU in rs6000.opt,
> rs6000.h, ppc476.h and config.gcc where TARGET_LINK_STACK is defined
> as 0 unless GCC explicitly is configured for powerpc476.

Here's a patch to do that, by adding a variant to the powerpc*-*-linux*
target for the 476.  I bootstrapped and regtested this as before, meaning
I also tested with -mpreserve-ppc476-link-stack on by default.  I also
configured without 476 support and verified that the TARGET_LINK_STACK
tests are optimized away, along with the -mpreserve-ppc476-link-stack
option itself.

Is this ok for mainline now?

Peter
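
The "defined as 0 unless configured" pattern can be sketched as below. This is a minimal illustration under stated assumptions: the fallback define stands in for whatever rs6000.h does in the patch (that hunk is not shown here), and `uses_link_stack_sequence` is a hypothetical helper, not a function from the patch.

```c
/* Default: the feature is a compile-time constant 0 unless a
   target header (like the proposed 476.h) has already defined it.  */
#ifndef TARGET_LINK_STACK
#define TARGET_LINK_STACK 0
#endif

/* Hypothetical helper: report which code sequence would be chosen.  */
int
uses_link_stack_sequence (void)
{
  if (TARGET_LINK_STACK)
    return 1;   /* dead code when the macro is the constant 0 */
  return 0;
}
```

With the constant 0, the compiler can drop the guarded branch entirely, which is what lets the tests disappear on configurations without 476 support.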


* config.gcc (powerpc*-*-linux*): Add powerpc*-*-linux*ppc476* variant.
* config/rs6000/476.h: New file.
* config/rs6000/476.opt: Likewise.
* config/rs6000/rs6000.h (TARGET_LINK_STACK): New define.
(SET_TARGET_LINK_STACK): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
TARGET_LINK_STACK for -mtune=476 and -mtune=476fp.
(rs6000_legitimize_tls_address): Emit the link stack preserving GOT
code if TARGET_LINK_STACK.
(rs6000_emit_load_toc_table): Likewise.
(output_function_profiler): Likewise.
(macho_branch_islands): Likewise.
(machopic_output_stub): Likewise.
* config/rs6000/rs6000.md (load_toc_v4_PIC_1, load_toc_v4_PIC_1b):
Convert to a define_expand.
(load_toc_v4_PIC_1_normal): New define_insn.
(load_toc_v4_PIC_1_476): Likewise.
(load_toc_v4_PIC_1b_normal): Likewise.
(load_toc_v4_PIC_1b_476): Likewise.


Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 179091)
+++ gcc/config.gcc  (working copy)
@@ -2133,6 +2133,9 @@ powerpc-*-linux* | powerpc64-*-linux*)
esac
tmake_file="${tmake_file} t-slibgcc-libgcc"
case ${target} in
+   powerpc*-*-linux*ppc476*)
+   tm_file="${tm_file} rs6000/476.h"
+   extra_options="${extra_options} rs6000/476.opt" ;;
powerpc*-*-linux*altivec*)
tm_file="${tm_file} rs6000/linuxaltivec.h" ;;
powerpc*-*-linux*spe*)
Index: gcc/config/rs6000/476.h
===
--- gcc/config/rs6000/476.h (revision 0)
+++ gcc/config/rs6000/476.h (revision 0)
@@ -0,0 +1,29 @@
+/* Enable IBM PowerPC 476 support.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Peter Bergner (berg...@vnet.ibm.com)
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#undef TARGET_LINK_STACK
+#define TARGET_LINK_STACK (rs6000_link_stack)
+
+#undef SET_TARGET_LINK_STACK
+#define SET_TARGET_LINK_STACK(X) do { TARGET_LINK_STACK = (X); } while (0)
Index: gcc/config/rs6000/476.opt
===
--- gcc/config/rs6000/476.opt   (revision 0)
+++ gcc/config/rs6000/476.opt   (revision 0)
@@ -0,0 +1,24 @@
+; IBM PowerPC 476 options.
+;
+; Copyright (C) 2011 Free Software Foundation, Inc.
+; Contributed by Peter Bergner (berg...@vnet.ibm.com)
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify it under
+;

[pph] Make streamer hooks internal (issue5278043)

2011-10-13 Thread Diego Novillo


To avoid confusion, I moved the callbacks into pph-streamer.c so they
can be internal to that file.  They don't need to be called directly
ever.

Tested on x86_64.  Committed to branch.


Diego.
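
The general shape of this change, a callback with internal linkage reached only through a public wrapper, can be sketched as follows. All names here (`struct hooks`, `read_value_callback`, `hooks_init`, `in_value`) are hypothetical stand-ins and are not the pph or streamer API.

```c
#include <stddef.h>

/* Hypothetical hook table, standing in for the streamer hooks.  */
struct hooks
{
  int (*read_value) (const int *data);
};

/* The callback has internal linkage: only this file can name it.  */
static int
read_value_callback (const int *data)
{
  return data ? *data : -1;
}

static struct hooks the_hooks;

/* Public entry point that installs the callback.  */
void
hooks_init (void)
{
  the_hooks.read_value = read_value_callback;
}

/* Public wrapper; callers never touch the callback directly.  */
int
in_value (const int *data)
{
  return the_hooks.read_value (data);
}
```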

* pph-streamer-in.c (pph_in_mergeable_tree): Fix comment.
(pph_read_tree): Move to pph-streamer.c.
(pph_in_location): Rename from pph_read_location.
(pph_read_location): Move to pph-streamer.c.
(pph_in_mergeable_chain): Call pph_in_hwi.
(pph_in_any_tree): Fix comment.
* pph-streamer-out.c (pph_write_tree): Move to pph-streamer.c.
(pph_out_location): Rename from pph_write_location.
(pph_write_location): Move to pph-streamer.c.
* pph-streamer.c (pph_write_tree): Move from pph-streamer-out.c.
Make static.
(pph_read_tree): Move from pph-streamer-in.c.  Make static.
(pph_input_location): Move from pph-streamer-in.c.  Rename
from pph_read_location.
(pph_output_location): Move from pph-streamer-out.c. Rename
from pph_out_location.
* pph-streamer.h (pph_write_tree): Remove.
(pph_write_location): Remove.
(pph_read_tree): Remove.
(pph_read_location): Remove.
(pph_out_location): Declare.
(pph_out_tree): Declare.
(pph_in_location): Declare.
(pph_in_tree): Declare.


diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 3893ad2..f8d6393 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -517,6 +517,7 @@ static tree pph_in_any_tree (pph_stream *stream, tree *chain);
 
 
 /* Load an AST from STREAM.  Return the corresponding tree.  */
+
 tree
 pph_in_tree (pph_stream *stream)
 {
@@ -525,8 +526,7 @@ pph_in_tree (pph_stream *stream)
 }
 
 
-/* Load an AST in an ENCLOSING_NAMESPACE from STREAM.
-   Return the corresponding tree.  */
+/* Load an AST into CHAIN from STREAM.  */
 static void
 pph_in_mergeable_tree (pph_stream *stream, tree *chain)
 {
@@ -534,41 +534,23 @@ pph_in_mergeable_tree (pph_stream *stream, tree *chain)
 }
 
 
-/* Callback for reading ASTs from a stream.  Instantiate and return a
-   new tree from the PPH stream in DATA_IN.  */
-
-tree
-pph_read_tree (struct lto_input_block *ib_unused ATTRIBUTE_UNUSED,
-  struct data_in *root_data_in)
-{
-  /* Find data.  */
-  pph_stream *stream = (pph_stream *) root_data_in->sdata;
-  return pph_in_any_tree (stream, NULL);
-}
-
-
 /** lexical elements */
 
 
-/* Callback for streamer_hooks.input_location.  An offset is applied to
-   the location_t read in according to the properties of the merged
-   line_table.  IB and DATA_IN are as in lto_input_location.  This function
-   should only be called after pph_in_and_merge_line_table was called as
-   we expect pph_loc_offset to be set.  */
+/* Read and return a location_t from STREAM.  */
 
 location_t
-pph_read_location (struct lto_input_block *ib,
-   struct data_in *data_in ATTRIBUTE_UNUSED)
+pph_in_location (pph_stream *stream)
 {
   struct bitpack_d bp;
   bool is_builtin;
   unsigned HOST_WIDE_INT n;
   location_t old_loc;
 
-  bp = streamer_read_bitpack (ib);
+  bp = pph_in_bitpack (stream);
   is_builtin = bp_unpack_value (&bp, 1);
 
-  n = streamer_read_uhwi (ib);
+  n = pph_in_uhwi (stream);
   old_loc = (location_t) n;
   gcc_assert (old_loc == n);
 
@@ -576,20 +558,6 @@ pph_read_location (struct lto_input_block *ib,
 }
 
 
-/* Read and return a location_t from STREAM.
-   FIXME pph: Tracing doesn't depend on STREAM any more.  We could avoid having
-   to call this function, only for it to call lto_input_location, which calls
-   the streamer hook back to pph_read_location.  Say what?  */
-
-location_t
-pph_in_location (pph_stream *stream)
-{
-  location_t loc = pph_read_location (stream->encoder.r.ib,
-   stream->encoder.r.data_in);
-  return loc;
-}
-
-
 /* Load the tree value associated with TOKEN from STREAM.  */
 
 static void
@@ -761,7 +729,7 @@ pph_in_mergeable_chain (pph_stream *stream, tree *chain)
 {
   int i, count;
 
-  count = streamer_read_hwi (stream->encoder.r.ib);
+  count = pph_in_hwi (stream);
   for (i = 0; i < count; i++)
 pph_in_mergeable_tree (stream, chain);
 }
@@ -1954,8 +1922,8 @@ pph_in_tree_header (pph_stream *stream, enum LTO_tags tag)
 }
 
 
-/* Read a tree from the STREAM.  It ENCLOSING_NAMESPACE is not null,
-   the tree may be unified with an existing tree in that namespace.  */
+/* Read a tree from the STREAM.  If CHAIN is not null, the tree may be
+   unified with an existing tree in that chain.  */
 
 static tree
 pph_in_any_tree (pph_stream *stream, tree *chain)
diff --git a/gcc/cp/pph-streamer-out.c b/gcc/cp/pph-streamer-out.c
index 0c00054..b5020f2 100644
--- a/gcc/cp/pph-streamer-out.c
+++ b/gcc/cp/pph-streamer-out.c
@@ -641,26 +641,13 @@ pph_out_mergeable_tree (pph_stream *stream, tree t)
 }
 
 
-/* Callback for writing ASTs t

[pph] shorten timeout on c1limits-externalid.cc and XFAIL (issue5278042)

2011-10-13 Thread Diego Novillo


I think this may be an infinite loop, but it may also just be taking a
long time to do the merge operations.

Tested on x86_64.  Committed to branch.


Diego.

* g++.dg/pph/c1limits-externalid.cc: Add shorter timeout.
Document failure mode.

diff --git a/gcc/testsuite/g++.dg/pph/c1limits-externalid.cc b/gcc/testsuite/g++.dg/pph/c1limits-externalid.cc
index b10f1c1..c44475f 100644
--- a/gcc/testsuite/g++.dg/pph/c1limits-externalid.cc
+++ b/gcc/testsuite/g++.dg/pph/c1limits-externalid.cc
@@ -1 +1,6 @@
+/* FIXME pph - The following timeout may cause failures on slow targets.
+   In general it takes no longer than a couple of seconds to compile
+   this test, but the new merging code is having trouble with this.  */
+/* { dg-timeout 15 } */
+/* { dg-xfail-if "MERGE INFINITE LOOP" { *-*-* } { "-fpph-map=pph.map" } } */
 #include "c0limits-externalid.h"
-- 
1.7.3.1


--
This patch is available for review at http://codereview.appspot.com/5278042

Vector alignment tracking

2011-10-13 Thread Artem Shinkarov

Hi

I would like to share some plans about improving the situation with
vector alignment tracking.  First of all, I would like to start with a
well-known bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50716.

There are several aspects of the problem:
1) We would like to avoid the quiet segmentation fault.
2) We would like to warn a user about the potential problems
concerning assignment of vectors with different alignment.
3) We would like to replace obviously aligned vector assignments with
aligned moves, and unaligned ones with unaligned moves.

All these aspects are interconnected and in order to find the problem,
we have to improve the alignment tracking facilities.

1) Currently in C we cannot provide information that an array is
aligned to a certain number.  The problem lies in the fact that a
pointer can represent an array or the address of an object, and it
turns out that the current aligned attribute doesn't help here.  My
proposal is to introduce an attribute called array_aligned (I am very
flexible on the name) which can be applied only to pointers and
which would show that a pointer of this type represents an array
whose first element is aligned to the given number.

2) After we have the new attribute, we can have a pass which would
check all the pointer arithmetic expressions, and in case of vectors,
mark the assignments with __builtin_assume_aligned.

3) In a separate pass we need to mark the alignments of the function
return types, in order to propagate this information through the
flow-graph.

4) In case of LTO, it becomes possible to track all the pointer
dereferences, and depending on the parameters warn, or change aligned
assignment to unaligned and vice-versa.


As a very first draft of (1) I include a patch that introduces the
array_aligned attribute.  The attribute sets is_array_flag in the
type, and uses the alignment number to store the alignment of the array.
In this implementation, we lose information about the alignment of
the pointer itself, but I don't know if we need it in this particular
situation.  Alternatively we can keep array_alignment in a separate
field; I am not sure which one is better.


Thanks,
Artem.
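
For step (2), the builtin that such a pass would insert can already be used by hand. A minimal sketch, assuming a caller that really does pass a 16-byte-aligned pointer (the function name `sum_aligned` is illustrative):

```c
#include <stddef.h>

/* Sum N floats from P.  The caller promises that P is 16-byte aligned;
   __builtin_assume_aligned passes that promise on to the optimizer.  */
float
sum_aligned (const float *p, size_t n)
{
  const float *q = __builtin_assume_aligned (p, 16);
  float total = 0.0f;
  size_t i;

  for (i = 0; i < n; i++)
    total += q[i];
  return total;
}
```

If the promise is false the behaviour is undefined, which is why the proposal pairs the marking with alignment tracking and warnings.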
Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 179906)
+++ gcc/c-family/c-common.c (working copy)
@@ -341,6 +341,7 @@ static tree handle_destructor_attribute
 static tree handle_mode_attribute (tree *, tree, tree, int, bool *);
 static tree handle_section_attribute (tree *, tree, tree, int, bool *);
 static tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
+static tree handle_aligned_array_attribute (tree *, tree, tree, int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
 static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *);
@@ -643,6 +644,8 @@ const struct attribute_spec c_common_att
  handle_section_attribute, false },
   { "aligned",0, 1, false, false, false,
  handle_aligned_attribute, false },
+  { "aligned_array",  0, 1, false, false, false,
+ handle_aligned_array_attribute, false },
   { "weak",   0, 0, true,  false, false,
  handle_weak_attribute, false },
   { "ifunc",  1, 1, true,  false, false,
@@ -6682,6 +6685,26 @@ handle_section_attribute (tree *node, tr
 }
 
   return NULL_TREE;
+}
+
+/* Handle "aligned_array" attribute.  */
+static tree
+handle_aligned_array_attribute (tree *node, tree ARG_UNUSED (name), tree args,
+   int flags, bool *no_add_attrs)
+{
+  if (!TYPE_P (*node) || !POINTER_TYPE_P (*node))
+{
+  error ("array_alignment attribute must be applied to a pointer-type");
+  *no_add_attrs = true;
+}
+  else
+{
+  tree ret = handle_aligned_attribute (node, name, args, flags, no_add_attrs);
+  TYPE_IS_ARRAY (*node) = true;
+  return ret;
+}
+
+  return NULL_TREE;
 }
 
 /* Handle a "aligned" attribute; arguments as in
Index: gcc/tree.h
===
--- gcc/tree.h  (revision 179906)
+++ gcc/tree.h  (working copy)
@@ -2149,6 +2149,7 @@ struct GTY(()) tree_block {
 #define TYPE_NEXT_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.next_variant)
 #define TYPE_MAIN_VARIANT(NODE) (TYPE_CHECK (NODE)->type_common.main_variant)
 #define TYPE_CONTEXT(NODE) (TYPE_CHECK (NODE)->type_common.context)
+#define TYPE_IS_ARRAY(NODE) (TYPE_CHECK (NODE)->type_common.is_array_flag)
 
 /* Vector types need to check target flags to determine type.  */
 extern enum machine_mode vector_type_mode (const_tree);
@@ -2411,6 +2412,7 @@ struct GTY(()) tree_type_common {
   unsigned lang_flag_5 : 1;
   unsigned lang_flag_6 : 1;
 
+  unsigned is_array_flag: 1;

Re: [PATCH] Fix PR50712

2011-10-13 Thread H.J. Lu

On Thu, Oct 13, 2011 at 4:55 AM, Richard Guenther  wrote:
>
> This fixes PR50712, an issue with IPA split uncovered by adding
> verifier calls after it ... we need to also gimplify reads of
> register typed memory when passing it as argument.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Richard.
>
> 2011-10-13  Richard Guenther  
>
>        PR tree-optimization/50712
>        * ipa-split.c (split_function): Always re-gimplify parameters
>        when they are not gimple vals before passing them.  Properly
>        check for type compatibility.
>
>        * gcc.target/i386/pr50712.c: New testcase.
>

This test is valid only for ia32, not ilp32. I checked in this patch
to fix it.

-- 
H.J.
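
The distinction between the two effective targets can be sketched roughly as below: ilp32 is about type sizes, while ia32 is about the instruction set, so an x32 compiler satisfies the first but not the second. The helper names are hypothetical and only approximate what the testsuite checks.

```c
/* Nonzero when int, long and pointers are all 4 bytes wide.  */
int
is_ilp32 (void)
{
  return sizeof (int) == 4 && sizeof (long) == 4 && sizeof (void *) == 4;
}

/* Nonzero only for 32-bit x86; x32 defines __x86_64__ instead.  */
int
is_ia32 (void)
{
#if defined (__i386__)
  return 1;
#else
  return 0;
#endif
}
```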
---
Index: gcc.target/i386/pr50712.c
===
--- gcc.target/i386/pr50712.c   (revision 179925)
+++ gcc.target/i386/pr50712.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-require-effective-target ilp32 } */
+/* { dg-require-effective-target ia32 } */
 /* { dg-options "-O2" } */

 typedef __builtin_va_list __va_list;
Index: ChangeLog
===
--- ChangeLog   (revision 179925)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2011-10-13  H.J. Lu  
+
+   * gcc.target/i386/pr50712.c: Check ia32 instead of ilp32.
+
 2011-10-13  Eric Botcazou  

* gcc.dg/builtins-67.c: Guard iround and irint with HAVE_C99_RUNTIME.

[Patch, Darwin] fix PR50699.

2011-10-13 Thread Iain Sandoe


.. this looks like an (almost) obvious fix for the bootstrap breakage...
OK for trunk?
Iain

Index: gcc/config/darwin.c
===
--- gcc/config/darwin.c (revision 179865)
+++ gcc/config/darwin.c (working copy)
@@ -2957,10 +2957,11 @@ darwin_override_options (void)
   darwin_running_cxx = (strstr (lang_hooks.name, "C++") != 0);
 }

-/* Add $LDBL128 suffix to long double builtins.  */
+#if defined (__ppc__) || defined (__ppc64__)
+/* Add $LDBL128 suffix to long double builtins for ppc darwin.  */

 static void
-darwin_patch_builtin (int fncode)
+darwin_patch_builtin (enum built_in_function fncode)
 {
   tree fn = builtin_decl_explicit (fncode);
   tree sym;
@@ -2998,6 +2999,7 @@ darwin_patch_builtins (void)
 #undef PATCH_BUILTIN_NO64
 #undef PATCH_BUILTIN_VARIADIC
 }
+#endif

 /*  CFStrings implementation.  */
 static GTY(()) tree cfstring_class_reference = NULL_TREE;

Re: RFC: Add ADD_RESTRICT tree code

2011-10-13 Thread Joseph S. Myers

On Thu, 13 Oct 2011, Michael Matz wrote:

> Yeah.  But I continue to think that this reading is against the intent (or 
> should be).  All the examples in the standard and rationale never say 
> anything about pointers to restricted objects and the problematic cases 
> one can construct with them, i.e. that one restricted pointer object might 
> have different names.  That leads me to think that this aspect simply was 
> overlooked or thought to be irrelevant.

(Restricted) pointers to restricted objects are exactly what the sentence 
"Every access that modifies X shall be considered also to modify P, for 
the purposes of this subclause." is about.  See my annotation in 
.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [pph] More DECL merging. (issue5268042)

2011-10-13 Thread Diego Novillo

I'm seeing an infinite loop in g++.dg/pph/c1limits-externalid.cc.  The
while() loop in pph_search_in_chain is not ending.  Or maybe it's
falling into the N^2 trap you mention in that routine?

I've added a short timeout to this test and XFAIL'd it so you can debug it.


Diego.

Re: [testsuite] require arm_little_endian in two tests

2011-10-13 Thread Richard Earnshaw

On 13/10/11 15:56, Joseph S. Myers wrote:
> On Thu, 13 Oct 2011, Richard Earnshaw wrote:
> 
>> 2) Change the compiler to make initializers of vectors assign elements
>> of initializers to consecutive lanes in a vector, rather than the
>> current behaviour of 'casting' an array of elements to a vector.
>>
>> While the second would be my preferred change, I suspect it's too hard
>> to fix, and may well cause code written for other targets to break on
>> big-endian (altivec for example).
> 
> Indeed, vector initializers are part of the target-independent GNU C 
> language and have target-independent semantics that the elements go in 
> memory order, corresponding to the target-independent semantics of lane 
> numbers where they appear in GENERIC, GIMPLE and (non-UNSPEC) RTL and any 
> target-independent built-in functions that use such numbers.  (The issue 
> here being, as you saw, that the lane numbers used in ARM-specific NEON 
> intrinsics are for big-endian not the same as those used in 
> target-independent features of GNU C and target-independent internal 
> representations in GCC - hence various code to translate them between the 
> two conventions when processing intrinsics into non-UNSPEC RTL, and to 
> translate back when generating assembly instructions that encode lane 
> numbers with the ARM conventions, as expounded at greater length at 
> .)
> 

This is all rather horrible, and leads to THREE different layouts for a
128-bit vector for big-endian Neon.

GCC format
'VLD1.n' format
'ABI' format

GCC format and 'ABI' format differ in that the 64-bit words of the
128-bit vector are swapped.

All this and they are all expected to share a single machine mode.

Furthermore, the definitions in GCC are broken, in that the types
defined in arm_neon.h (eg int8x16_t) are supposed to be ABI format, not
GCC format.

Eukk! :-(

R.
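
The target-independent rule Joseph describes, that initializer elements go in memory order, can be checked with the generic GNU C vector extension. A minimal sketch, assuming a 4-byte int; the names `v4si` and `store_lanes` are illustrative and this does not use the NEON intrinsics or types from arm_neon.h:

```c
#include <string.h>

/* GNU C vector of four ints (16 bytes).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Build a vector from an initializer and copy it out to memory.  */
void
store_lanes (int out[4])
{
  v4si v = { 10, 20, 30, 40 };
  memcpy (out, &v, sizeof v);
}
```

The array receives the elements in the order they were written in the initializer, regardless of endianness.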

[Patch]: fix typo in rs6000.c (AIX bootstrap broken)

2011-10-13 Thread Tristan Gingold

Hi,

looks like an obvious typo.  Ok for trunk ?

Tristan.

2011-10-13  Tristan Gingold  

* config/rs6000/rs6000.c (rs6000_init_builtins): Fix typo.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4fd2192..3bfe33e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -12213,7 +12213,7 @@ rs6000_init_builtins (void)
 
 #if TARGET_XCOFF
   /* AIX libm provides clog as __clog.  */
-  if ((tdecl = builtin_decl_explicit ([BUILT_IN_CLOG))) != NULL_TREE)
+  if ((tdecl = builtin_decl_explicit (BUILT_IN_CLOG)) != NULL_TREE)
 set_user_assembler_name (tdecl, "__clog");
 #endif

Re: RFC: Add ADD_RESTRICT tree code

2011-10-13 Thread Michael Matz

Hi,

On Thu, 13 Oct 2011, Jakub Jelinek wrote:

> On Thu, Oct 13, 2011 at 02:57:56PM +0200, Michael Matz wrote:
> > struct S {int * restrict p;};
> > void foo (struct S *s, struct S *t) {
> >   s->p[0] = 0;
> >   t->p[0] = 1;  // undefined if s->p == t->p; the caller was responsible 
> > // to not do that
> 
> This is undefined only if s->p == t->p && &s->p != &t->p.  If both
> s->p and t->p designate the same restricted pointer object,
> it is fine.

Yeah.  But I continue to think that this reading is against the intent (or 
should be).  All the examples in the standard and rationale never say 
anything about pointers to restricted objects and the problematic cases 
one can construct with them, i.e. that one restricted pointer object might 
have different names.  That leads me to think that this aspect simply was 
overlooked or thought to be irrelevant.

I'm leaning towards ignoring (for C) restrict qualifications on all 
indirectly accessed or address-taken objects.  Or better, not ignoring the 
restrict but making such pointers conflict with all other pointers, restrict 
or non-restrict (normally non-restrict and restrict don't conflict in 
theory, although for GCC they do).

Ciao,
Michael.
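
The example under discussion can be made runnable with a caller that is well defined under either reading: two distinct restricted pointer objects that point to distinct arrays. The function `call_distinct` is an illustrative addition, not part of the original example.

```c
struct S { int *restrict p; };

/* The example from the thread: two stores through restricted
   pointer members.  */
void
foo (struct S *s, struct S *t)
{
  s->p[0] = 0;
  t->p[0] = 1;
}

/* A caller that is fine under either reading: distinct pointer
   objects that point to distinct arrays.  */
int
call_distinct (void)
{
  int a[1] = { 5 }, b[1] = { 5 };
  struct S s = { a }, t = { b };

  foo (&s, &t);
  return a[0] * 10 + b[0];
}
```

The contested cases are the ones this caller avoids: s->p == t->p with different pointer objects, and s == t.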

Re: [Patch, Fortran, committed] PR 50659: [4.4/4.5/4.6/4.7 Regression] ICE with PROCEDURE statement

2011-10-13 Thread Janus Weil

> Committed to the 4.6 branch as r179864:

... and to 4.5 as r179923.

Cheers,
Janus



> 2011/10/9 Janus Weil :
>> Hi all,
>>
>> I have just committed as obvious a patch for an ICE-on-valid problem
>> with PROCEDURE statements:
>>
>> http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179723
>>
>> The problem was the following: When setting up an external procedure
>> or procedure pointer (declared via a PROCEDURE statement), we copy the
>> expressions for the array bounds and string length from the interface
>> symbol given in the PROCEDURE declaration (cf.
>> 'resolve_procedure_interface'). If those expressions depend on the
>> actual args of the interface, we have to replace those args by the
>> args of the new procedure symbol that we're setting up. This is what
>> 'gfc_expr_replace_symbols' / 'replace_symbol' does. Unfortunately we
>> failed to check whether the symbol we try to replace is actually a
>> dummy!
>>
>> Contrary to Andrew's initial assumption, I think the test case is
>> valid. I could neither find a compiler which rejects it, nor a
>> restriction in the standard which makes it invalid. The relevant part
>> of F08 is probably chapter 7.1.11 ("Specification expression"). This
>> states that a specification expression can contain variables, which
>> are made accessible via use association.
>>
>> I'm planning to apply the patch to the 4.6, 4.5 and 4.4 branches soon.
>>
>> Cheers,
>> Janus
>>
>
