Re: PATCH: PR target/47715: [x32] TLS doesn't work

2011-07-27 Thread Uros Bizjak
On Thu, Jul 28, 2011 at 4:55 AM, H.J. Lu  wrote:
> TLS on X32 is almost identical to TLS on x86-64.  The only difference is
> x32 address space is 32bit.  That means TLS symbols can be in either
> SImode or DImode with upper 32bit zero.  This patch updates
> tls_global_dynamic_64 to support x32.  OK for trunk?

> 2011-07-27  H.J. Lu  
>
>        PR target/47715
>        * config/i386/i386.md (PTR64): New.
>        (*tls_global_dynamic_64): Rename to ...
>        (*tls_global_dynamic_64_): This.  Put PTR64 on operand 1.
>        (tls_global_dynamic_64): Rename to ...
>        (tls_global_dynamic_64_): This.  Put PTR64 on operand 1.
>        * config/i386/i386.c (legitimize_tls_address): Updated.

Just remove mode check, so:

(unspec:DI [(match_operand 1 "tls_symbolic_operand" "")]

at both sites.

-  fputs (ASM_BYTE "0x66\n", asm_out_file);
+  if (!TARGET_X32)
+fputs (ASM_BYTE "0x66\n", asm_out_file);

Are you sure? There are some scary comments in binutils that these
sequences have to be written _exactly_ as shown to enable certain
linker relaxations w.r.t. TLS relocs.

Uros.


Re: PATCH: PR target/47364: [x32] internal compiler error: in emit_move_insn, at expr.c:3355

2011-07-27 Thread Uros Bizjak
On Thu, Jul 28, 2011 at 5:48 AM, H.J. Lu  wrote:

> We should only expand strlen to Pmode.  Otherwise, we got
>
> [hjl@gnu-6 ilp32-38]$ cat x.i
> char one[50] = "ijk";
> int
> main (void)
> {
>  return __builtin_strlen (one) != 3;
> }
> [hjl@gnu-6 ilp32-38]$ /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc 
> -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -S -o x.s -mx32 -O2 x.i
> x.i: In function ‘main’:
> x.i:5:27: internal compiler error: in emit_move_insn, at expr.c:
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See  for instructions.
>
> OK for trunk?
>
> 2011-07-27  H.J. Lu  
>
>        PR target/47364
>        * config/i386/i386.md (strlen): Replace SWI48x with P.

OK.

Thanks,
Uros.


Re: [PATCH, i386, testsuite] New BMI testcases

2011-07-27 Thread Uros Bizjak
On Wed, Jul 27, 2011 at 11:29 PM, Jakub Jelinek  wrote:

>> > Guys, with write approval, could you please commit that?
>> >
>>
>> I checked it in for you.
>
> Unfortunately many of the new tests fail with old assembler, because
> the builtin in check_effective_target_bmi is optimized away (ignored, as
> well as using constant arguments, two reasons to get rid of it).
>
> Fixed thusly, tested on i686-linux and x86_64-linux, both with old and new
> binutils.  Ok for trunk?
>
> 2011-07-27  Jakub Jelinek  
>
>        * gcc.target/i386/i386.exp (check_effective_target_bmi): Make sure
>        the builtin isn't optimized away.

OK.

Thanks,
Uros.


Re: Mention avx2 patch

2011-07-27 Thread Kirill Yukhin
Ping

--
Thanks, K

On Mon, Jun 20, 2011 at 4:43 PM, H.J. Lu  wrote:
> Hi,
>
> This patch removes ix86/avx branch and mentions avx2 branch.  OK
> to install?
>
>
> H.J.
> 
> Index: svn.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/svn.html,v
> retrieving revision 1.162
> diff -u -p -r1.162 svn.html
> --- svn.html    3 Jun 2011 14:04:32 -       1.162
> +++ svn.html    20 Jun 2011 12:41:27 -
> @@ -353,13 +353,12 @@ the command svn log --stop-on-copy
>   and Erven Rohou
>   erven.ro...@st.com>.
>
> -  ix86/avx
> -  The goal of this branch is to implement Intel AVX (Intel Advanced
> -  Vector Extensions).  The branch is maintained by
> -  H.J. Lu < href="mailto:hjl.to...@gmail.com";>hjl.to...@gmail.com>,
> -  Joey Ye joey...@intel.com>
> -  and Xuepeng Guo < href="mailto:xuepeng@intel.com";>xuepeng@intel.com>.
> -  Patches should be marked with the tag [AVX] in the subject
> +  avx2
> +  The goal of this branch is to implement AVX Programming Reference
> +  (June, 2011). The branch is maintained by
> +  H.J. Lu < href="mailto:hjl.to...@gmail.com";>hjl.to...@gmail.com>
> +  and Yukhin Kirill < href="mailto:kirill.yuk...@intel.com";>kirill.yuk...@intel.com>.
> +  Patches should be marked with the tag [AVX2] in the subject
>   line.
>
>   ix86/gcc-4_5-branch
>


PATCH: PR target/47364: [x32] internal compiler error: in emit_move_insn, at expr.c:3355

2011-07-27 Thread H.J. Lu
Hi,

We should only expand strlen to Pmode.  Otherwise, we got

[hjl@gnu-6 ilp32-38]$ cat x.i
char one[50] = "ijk";
int
main (void)
{
  return __builtin_strlen (one) != 3;
}
[hjl@gnu-6 ilp32-38]$ /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc 
-B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -S -o x.s -mx32 -O2 x.i
x.i: In function ‘main’:
x.i:5:27: internal compiler error: in emit_move_insn, at expr.c:
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

OK for trunk?

Thanks.


H.J.

2011-07-27  H.J. Lu  

PR target/47364
* config/i386/i386.md (strlen): Replace SWI48x with P.

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e91a299..c772f94 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15824,11 +15855,11 @@
(set_attr "prefix_rep" "1")])
 
 (define_expand "strlen"
-  [(set (match_operand:SWI48x 0 "register_operand" "")
-   (unspec:SWI48x [(match_operand:BLK 1 "general_operand" "")
-   (match_operand:QI 2 "immediate_operand" "")
-   (match_operand 3 "immediate_operand" "")]
-  UNSPEC_SCAS))]
+  [(set (match_operand:P 0 "register_operand" "")
+   (unspec:P [(match_operand:BLK 1 "general_operand" "")
+  (match_operand:QI 2 "immediate_operand" "")
+  (match_operand 3 "immediate_operand" "")]
+ UNSPEC_SCAS))]
   ""
 {
  if (ix86_expand_strlen (operands[0], operands[1], operands[2], operands[3]))


PATCH: PR target/47715: [x32] Use SImode for thread pointer

2011-07-27 Thread H.J. Lu
Hi,

In x32, thread pointer is 32bit and choice of segment register for the
thread base ptr load should be based on TARGET_64BIT.  This patch
implements it.  OK for trunk?

Thanks.


H.J.
---
2011-07-27  H.J. Lu  

PR target/47715
* config/i386/i386.c (get_thread_pointer): Use ptr_mode
instead of Pmode with UNSPEC_TP.

* config/i386/i386.md (tp_seg): Removed.
(*load_tp_): Replace :P with :PTR.
(*add_tp_): Likewise.
(*load_tp_x32): New.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8723dc5..d32d64d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12120,7 +12120,9 @@ get_thread_pointer (bool to_reg)
 {
   rtx tp, reg, insn;
 
-  tp = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
+  tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
+  if (ptr_mode != Pmode)
+tp = convert_to_mode (Pmode, tp, 1);
   if (!to_reg)
 return tp;
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e91a299..c772f94 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -951,9 +951,14 @@
 ;; This mode iterator allows :P to be used for patterns that operate on
 ;; pointer-sized quantities.  Exactly one of the two alternatives will match.
 (define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")])
+
+;; This mode iterator allows :PTR to be used for patterns that operate on
+;; ptr_mode sized quantities.
+(define_mode_iterator PTR
+  [(SI "ptr_mode == SImode") (DI "ptr_mode == DImode")])
 
 ;; Pointer modes in 64bit.
 (define_mode_iterator PTR64 [(SI "TARGET_X32") DI])
 
 ;; Scheduling descriptions
   output_asm_insn
 ("lea{q}\t{%a1@tlsgd(%%rip), %%rdi|rdi, %a1@tlsgd[rip]}", operands);
   fputs (ASM_SHORT "0x\n", asm_out_file);
 (define_insn "*tls_local_dynamic_base_32_gnu"
@@ -12438,15 +12451,28 @@
   (clobber (match_dup 5))
   (clobber (reg:CC FLAGS_REG))])])
 
-;; Segment register for the thread base ptr load
-(define_mode_attr tp_seg [(SI "gs") (DI "fs")])
-
-;; Load and add the thread base pointer from %:0.
+;; Load and add the thread base pointer from %gs:0 or %fs:0.
 (define_insn "*load_tp_"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (unspec:P [(const_int 0)] UNSPEC_TP))]
+  [(set (match_operand:PTR 0 "register_operand" "=r")
+   (unspec:PTR [(const_int 0)] UNSPEC_TP))]
   ""
-  "mov{}\t{%%:0, %0|%0,  PTR :0}"
+{
+  if (TARGET_64BIT)
+return "mov{}\t{%%fs:0, %0|%0,  PTR fs:0}";
+  else
+return "mov{}\t{%%gs:0, %0|%0,  PTR gs:0}";
+}
+  [(set_attr "type" "imov")
+   (set_attr "modrm" "0")
+   (set_attr "length" "7")
+   (set_attr "memory" "load")
+   (set_attr "imm_disp" "false")])
+
+(define_insn "*load_tp_x32"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
+  "TARGET_X32"
+  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
   [(set_attr "type" "imov")
(set_attr "modrm" "0")
(set_attr "length" "7")
@@ -12454,12 +12480,17 @@
(set_attr "imm_disp" "false")])
 
 (define_insn "*add_tp_"
-  [(set (match_operand:P 0 "register_operand" "=r")
-   (plus:P (unspec:P [(const_int 0)] UNSPEC_TP)
-   (match_operand:P 1 "register_operand" "0")))
+  [(set (match_operand:PTR 0 "register_operand" "=r")
+   (plus:PTR (unspec:PTR [(const_int 0)] UNSPEC_TP)
+ (match_operand:PTR 1 "register_operand" "0")))
(clobber (reg:CC FLAGS_REG))]
   ""
-  "add{}\t{%%:0, %0|%0,  PTR :0}"
+{
+  if (TARGET_64BIT)
+return "add{}\t{%%fs:0, %0|%0,  PTR fs:0}";
+  else
+return "add{}\t{%%gs:0, %0|%0,  PTR gs:0}";
+}
   [(set_attr "type" "alu")
(set_attr "modrm" "0")
(set_attr "length" "7")


Re: PATCH: PR target/47715: [x32] TLS doesn't work

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 07:55:08PM -0700, H.J. Lu wrote:
> TLS on X32 is almost identical to TLS on x86-64.  The only difference is
> x32 address space is 32bit.  That means TLS symbols can be in either
> SImode or DImode with upper 32bit zero.  This patch updates
> tls_global_dynamic_64 to support x32.  OK for trunk?
> 

Small update to correct *tls_global_dynamic_64 length.

H.J.
--
2011-07-27  H.J. Lu  

PR target/47715
* config/i386/i386.md (PTR64): New.
(*tls_global_dynamic_64): Rename to ...
(*tls_global_dynamic_64_): This.  Put PTR64 on operand 1.
(tls_global_dynamic_64): Rename to ...
(tls_global_dynamic_64_): This.  Put PTR64 on operand 1.
* config/i386/i386.c (legitimize_tls_address): Updated.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8723dc5..31d5b8e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12215,9 +12215,22 @@ legitimize_tls_address (rtx x, enum tls_model model, 
bool for_mov)
  if (TARGET_64BIT)
{
  rtx rax = gen_rtx_REG (Pmode, AX_REG), insns;
+ rtx (*tls_global_dynamic) (rtx, rtx, rtx);
+
+ switch (GET_MODE (x))
+   {
+   case SImode:
+ tls_global_dynamic = gen_tls_global_dynamic_64_si;
+ break;
+   case DImode:
+ tls_global_dynamic = gen_tls_global_dynamic_64_di;
+ break;
+   default:
+ gcc_unreachable ();
+   }
 
  start_sequence ();
- emit_call_insn (gen_tls_global_dynamic_64 (rax, x, caddr));
+ emit_call_insn (tls_global_dynamic (rax, x, caddr));
  insns = get_insns ();
  end_sequence ();
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e91a299..f59e685 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -951,6 +951,9 @@
 ;; This mode iterator allows :P to be used for patterns that operate on
 ;; pointer-sized quantities.  Exactly one of the two alternatives will match.
 (define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")])
+
+;; Pointer modes in 64bit.
+(define_mode_iterator PTR64 [(SI "TARGET_X32") DI])
 
 ;; Scheduling descriptions
 
@@ -12322,16 +12325,17 @@
  (clobber (match_scratch:SI 5 ""))
  (clobber (reg:CC FLAGS_REG))])])
 
-(define_insn "*tls_global_dynamic_64"
+(define_insn "*tls_global_dynamic_64_"
   [(set (match_operand:DI 0 "register_operand" "=a")
(call:DI
 (mem:QI (match_operand:DI 2 "constant_call_address_operand" "z"))
 (match_operand:DI 3 "" "")))
-   (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")]
+   (unspec:DI [(match_operand:PTR64 1 "tls_symbolic_operand" "")]
  UNSPEC_TLS_GD)]
   "TARGET_64BIT"
 {
-  fputs (ASM_BYTE "0x66\n", asm_out_file);
+  if (!TARGET_X32)
+fputs (ASM_BYTE "0x66\n", asm_out_file);
   output_asm_insn
 ("lea{q}\t{%a1@tlsgd(%%rip), %%rdi|rdi, %a1@tlsgd[rip]}", operands);
   fputs (ASM_SHORT "0x\n", asm_out_file);
@@ -12341,15 +12345,16 @@
   return "call\t%P2";
 }
   [(set_attr "type" "multi")
-   (set_attr "length" "16")])
+   (set (attr "length")
+   (symbol_ref "TARGET_X32 ? 15 : 16"))])
 
-(define_expand "tls_global_dynamic_64"
+(define_expand "tls_global_dynamic_64_"
   [(parallel
 [(set (match_operand:DI 0 "register_operand" "")
  (call:DI
   (mem:QI (match_operand:DI 2 "constant_call_address_operand" ""))
   (const_int 0)))
- (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")]
+ (unspec:DI [(match_operand:PTR64 1 "tls_symbolic_operand" "")]
UNSPEC_TLS_GD)])])
 
 (define_insn "*tls_local_dynamic_base_32_gnu"


PATCH: PR target/47715: [x32] TLS doesn't work

2011-07-27 Thread H.J. Lu
TLS on X32 is almost identical to TLS on x86-64.  The only difference is
x32 address space is 32bit.  That means TLS symbols can be in either
SImode or DImode with upper 32bit zero.  This patch updates
tls_global_dynamic_64 to support x32.  OK for trunk?

Thanks.


H.J.
---
2011-07-27  H.J. Lu  

PR target/47715
* config/i386/i386.md (PTR64): New.
(*tls_global_dynamic_64): Rename to ...
(*tls_global_dynamic_64_): This.  Put PTR64 on operand 1.
(tls_global_dynamic_64): Rename to ...
(tls_global_dynamic_64_): This.  Put PTR64 on operand 1.
* config/i386/i386.c (legitimize_tls_address): Updated.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8723dc5..31d5b8e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12215,9 +12215,22 @@ legitimize_tls_address (rtx x, enum tls_model model, 
bool for_mov)
  if (TARGET_64BIT)
{
  rtx rax = gen_rtx_REG (Pmode, AX_REG), insns;
+ rtx (*tls_global_dynamic) (rtx, rtx, rtx);
+
+ switch (GET_MODE (x))
+   {
+   case SImode:
+ tls_global_dynamic = gen_tls_global_dynamic_64_si;
+ break;
+   case DImode:
+ tls_global_dynamic = gen_tls_global_dynamic_64_di;
+ break;
+   default:
+ gcc_unreachable ();
+   }
 
  start_sequence ();
- emit_call_insn (gen_tls_global_dynamic_64 (rax, x, caddr));
+ emit_call_insn (tls_global_dynamic (rax, x, caddr));
  insns = get_insns ();
  end_sequence ();
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e91a299..06d65fc 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -951,6 +951,9 @@
 ;; This mode iterator allows :P to be used for patterns that operate on
 ;; pointer-sized quantities.  Exactly one of the two alternatives will match.
 (define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")])
+
+;; Pointer modes in 64bit.
+(define_mode_iterator PTR64 [(SI "TARGET_X32") DI])
 
 ;; Scheduling descriptions
 
@@ -12322,16 +12325,17 @@
  (clobber (match_scratch:SI 5 ""))
  (clobber (reg:CC FLAGS_REG))])])
 
-(define_insn "*tls_global_dynamic_64"
+(define_insn "*tls_global_dynamic_64_"
   [(set (match_operand:DI 0 "register_operand" "=a")
(call:DI
 (mem:QI (match_operand:DI 2 "constant_call_address_operand" "z"))
 (match_operand:DI 3 "" "")))
-   (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")]
+   (unspec:DI [(match_operand:PTR64 1 "tls_symbolic_operand" "")]
  UNSPEC_TLS_GD)]
   "TARGET_64BIT"
 {
-  fputs (ASM_BYTE "0x66\n", asm_out_file);
+  if (!TARGET_X32)
+fputs (ASM_BYTE "0x66\n", asm_out_file);
   output_asm_insn
 ("lea{q}\t{%a1@tlsgd(%%rip), %%rdi|rdi, %a1@tlsgd[rip]}", operands);
   fputs (ASM_SHORT "0x\n", asm_out_file);
@@ -12343,13 +12347,13 @@
   [(set_attr "type" "multi")
(set_attr "length" "16")])
 
-(define_expand "tls_global_dynamic_64"
+(define_expand "tls_global_dynamic_64_"
   [(parallel
 [(set (match_operand:DI 0 "register_operand" "")
  (call:DI
   (mem:QI (match_operand:DI 2 "constant_call_address_operand" ""))
   (const_int 0)))
- (unspec:DI [(match_operand:DI 1 "tls_symbolic_operand" "")]
+ (unspec:DI [(match_operand:PTR64 1 "tls_symbolic_operand" "")]
UNSPEC_TLS_GD)])])
 
 (define_insn "*tls_local_dynamic_base_32_gnu"


Re: Unreviewed libgcc patches

2011-07-27 Thread NightStrike
On Mon, Jul 18, 2011 at 8:21 AM, Rainer Orth
 wrote:
> The following two libgcc patches have seen almost no comments, and
> certainly neither testing nor review in a week:
>
>        CFT: [build] Move fp-bit support to toplevel libgcc
>        http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00927.html
>
>        CFT: [build] Move soft-fp support to toplevel libgcc
>        http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00931.html
>
> This patch will need to be updated for the recent addition of the c6x
> port.
>
> Both will probably need build and libgcc maintainers and either a bunch
> of target maintainers or a global reviewer.  I wonder how to proceed
> here: I've got a bunch of further libgcc patches in the works or
> planned, but if I can't get them reviewed, there's no point in
> continuing that work.

Do you still need support?


Re: [RS6000] asynch exceptions and unwind info

2011-07-27 Thread David Edelsohn
On Wed, Jul 27, 2011 at 1:30 AM, Alan Modra  wrote:

>        * config/rs6000/linux-unwind.h (frob_update_context <__powerpc64__>):
>        Leave r2 REG_UNSAVED if stopped on the instruction that saves r2
>        in a plt call stub.  Do restore r2 if stopped on bctrl.

Okay.

Thanks, David


Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 1:25 PM, Paolo Bonzini  wrote:
> On 07/27/2011 07:29 PM, H.J. Lu wrote:
>>
>> If IGNORE_ADDRESS_WRAP_AROUND is TRUE, we
>> +   also permute the conversion and addition of a constant.  It is used to
>> +   optimize cases where overflow of base + constant offset won't happen
>> or
>> +   its behavior is implementation-defined for a given target.  */
>
> Regarding correctness: you're converting a SImode operation to DImode by
> "pushing in" the zero_extend operation.  What makes you think that base +
> constant offset won't overflow in any case?
>
> And also: what are you gaining by allowing the wrap around?  I don't need to
> know what ignore_address_wrap_around does, I need to know _why_ it is
> necessary.
>

We have

(zero_extend:DI (plus:SI (FOO:SI) (const_int Y)))

I want to convert it to

(plus:DI (zero_extend:DI (FOO:SI)) (const_int Y))

There is no zero-extend on (const_int Y).  If FOO == 0xfffffffc and Y == 8,

(zero_extend:DI (plus:SI (FOO:SI) (const_int Y)))

gives 0x4 and

(plus:DI (zero_extend:DI (FOO:SI)) (const_int Y))

gives 0x100000004.  If (plus:SI (FOO:SI) (const_int Y)) won't overflow
or its behavior is implementation-defined, the conversion is safe.  If
that isn't the case, we should just drop it.
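
The difference between the two RTL forms can be reproduced in plain C.
This is only a sketch under assumptions: the function names are made up
for illustration, Y is modelled as a non-negative 32-bit constant, and
FOO = 0xfffffffc with Y = 8 is taken as the wrap-around case.

```c
#include <assert.h>
#include <stdint.h>

/* Models (zero_extend:DI (plus:SI foo y)): add in 32 bits, where the
   sum may wrap, and only then widen to 64 bits.  */
uint64_t
extend_after_add (uint32_t foo, uint32_t y)
{
  return (uint64_t) (uint32_t) (foo + y);
}

/* Models (plus:DI (zero_extend:DI foo) y): widen first, then add in
   64 bits, where the sum cannot wrap for 32-bit inputs.  */
uint64_t
add_after_extend (uint32_t foo, uint32_t y)
{
  return (uint64_t) foo + y;
}

/* The two forms agree exactly when the 32-bit addition does not wrap.
   This helper is hypothetical; it is not part of the patch.  */
int
permutation_is_safe (uint32_t foo, uint32_t y)
{
  assert (extend_after_add (foo, y) <= add_after_extend (foo, y));
  return foo <= UINT32_MAX - y;
}
```

With foo = 0xfffffffc and y = 8, extend_after_add yields 0x4 while
add_after_extend yields 0x100000004, so the permutation changes the
value unless the wrap-around is known not to happen or is ignored.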


-- 
H.J.


Re: [Patch, Fortran] PR 45586: Mark type pointer components as nonrestricted

2011-07-27 Thread Tobias Burnus

Mikael Morin wrote:
>> Built and regtested on x86-64-linux.
>> OK for the trunk? What about backporting to 4.6 and to 4.5?
>
> OK for trunk and 4.6.
> 4.5 doesn't have the gfc_nonrestricted_type function, so it's not worth the
> bother IMO (unless you feel like backporting Michael's patch ;-) ).


Thanks for the quick review! I don't plan to backport 
gfc_nonrestricted_type - thus, I will follow your suggestion to only 
backport it to 4.6 ;-)


I have now committed the patch to the trunk (Rev. 176852) - and will 
commit later to the branch.


Thanks,

Tobias


Re: [Patch, Fortran] PR 45586: Mark type pointer components as nonrestricted

2011-07-27 Thread Mikael Morin
On Wednesday 27 July 2011 22:39:20 Tobias Burnus wrote:
> See discussion at http://gcc.gnu.org/ml/fortran/2011-07/msg00281.html
> and see PR 45586.
> 
> This patch fixes the test case of the PR by properly using the
> nonrestricted type for pointer components. Before, the test case failed
> (ICE) in some tree checking. While the ICE only affects the trunk (more
> precisely: --enable-checking={yes,tree}), the issue itself also affects
> the branches.
> 
> I am not completely sure that the patch covers all restrict/nonrestrict
> issues, but it fixes the PR and I couldn't create a variant which still
> gave an ICE.
I think the patch is good. Anyway, it shouldn't make things worse, which makes 
it good enough at least.

> 
> Built and regtested on x86-64-linux.
> OK for the trunk? What about backporting to 4.6 and to 4.5?
OK for trunk and 4.6. 
4.5 doesn't have the gfc_nonrestricted_type function, so it's not worth the 
bother IMO (unless you feel like backporting Michael's patch ;-) ).

Thanks
Mikael



RE: [Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-27 Thread Weddington, Eric

> -Original Message-
> From: Georg-Johann Lay [mailto:a...@gjlay.de]
> Sent: Wednesday, July 27, 2011 3:00 PM
> To: Richard Henderson
> Cc: Weddington, Eric; gcc-patches@gcc.gnu.org; Anatoly Sokolov; Denis
> Chertykov
> Subject: Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)
> 
> >
> > Fair enough.
> >
> > I didn't review the asm code, but the rest of the patch looks ok to me.
> >
> > r~
> 
> Thanks, Eric will review the asm part  :-)

LOL
I trust you on the asm stuff. Ok by me.

However, how is our test coverage in this area?

Eric


[PATCH] Disable size optimizations of -gdwarf-2 DW_AT_data_member_location DW_OP_plus_uconst

2011-07-27 Thread Jakub Jelinek
Hi!

Apparently gdb (and quite likely other consumers) can't handle
arbitrary location descriptions in DW_AT_data_member_location, so my
recent optimization, which rewrites e.g. DW_OP_plus_uconst into an
equivalent but shorter sequence of more operations, doesn't work there.
While GDB is being fixed (or has it been already?), this affects just
-gdwarf-2, and there DW_OP_plus_uconst could be considered a fixed idiom.

We should IMHO change the default DWARF level to -gdwarf-3 at least for now,
except for targets with broken tools (that aren't able to cope properly even
with DWARF 2, like Apple).  DWARF 3 has been released more than 5 years
ago...

So, the following patch just disables the size optimization when, for
-gdwarf-2, the only op in DW_AT_data_member_location is DW_OP_plus_uconst.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-07-27  Jakub Jelinek  

* dwarf2out.c (resolve_addr): For -gdwarf-2 don't
optimize DW_AT_data_member_location containing just
DW_OP_plus_uconst.

--- gcc/dwarf2out.c.jj  2011-07-27 18:55:11.0 +0200
+++ gcc/dwarf2out.c 2011-07-27 20:53:55.0 +0200
@@ -21808,13 +21808,26 @@ resolve_addr (dw_die_ref die)
  }
break;
   case dw_val_class_loc:
-   if (!resolve_addr_in_expr (AT_loc (a)))
- {
-   remove_AT (die, a->dw_attr);
-   ix--;
- }
-   else
- mark_base_types (AT_loc (a));
+   {
+ dw_loc_descr_ref l = AT_loc (a);
+ /* For -gdwarf-2 don't attempt to optimize
+DW_AT_data_member_location containing
+DW_OP_plus_uconst - older consumers might
+rely on it being that op instead of a more complex,
+but shorter, location description.  */
+ if ((dwarf_version > 2
+  || a->dw_attr != DW_AT_data_member_location
+  || l == NULL
+  || l->dw_loc_opc != DW_OP_plus_uconst
+  || l->dw_loc_next != NULL)
+ && !resolve_addr_in_expr (l))
+   {
+ remove_AT (die, a->dw_attr);
+ ix--;
+   }
+ else
+   mark_base_types (l);
+   }
break;
   case dw_val_class_addr:
if (a->dw_attr == DW_AT_const_value


Jakub


Re: [DF] Replace various bitmaps with HARD_REG_SETs

2011-07-27 Thread Paolo Bonzini

On 07/27/2011 06:17 PM, Joseph S. Myers wrote:



>>  --- gcc/target.h  2011-04-06 11:08:17 +
>>  +++ gcc/target.h  2011-07-27 10:27:56 +
>>  @@ -50,6 +50,7 @@
>>#define GCC_TARGET_H
>>
>>#include "tm.h"
>>  +#include "hard-reg-set.h"
>>#include "insn-modes.h"
>
> Please send a patch against current trunk.  target.h hasn't included tm.h
> for over a month.  Since hard-reg-set.h depends on tm.h, you won't be able
> to include hard-reg-set.h in target.h any more, so you'll need to find
> another solution for that.


For example you can make HARD_REG_SET always a struct, so that you can 
add a forward declaration in target.h.  GCC is able to optimize the 
struct away, we rely on that.
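
A minimal sketch of the forward-declaration idea, assuming the set is
always a struct.  All names here (hard_reg_set_sketch,
count_set_regs_sketch, SKETCH_NUM_WORDS) are hypothetical and only
illustrate the layering; they are not the real GCC declarations.

```c
#include <assert.h>
#include <stddef.h>

/* What a header like target.h would need: a forward declaration is
   enough to declare hooks that take the set by pointer, so the header
   does not have to include the full definition.  */
struct hard_reg_set_sketch;
int count_set_regs_sketch (const struct hard_reg_set_sketch *set);

/* What a header like hard-reg-set.h would provide: the full definition,
   always a struct even when a single word would be enough.  */
#define SKETCH_NUM_WORDS 2
struct hard_reg_set_sketch
{
  unsigned long elts[SKETCH_NUM_WORDS];
};

/* Example consumer: count the registers present in the set.  */
int
count_set_regs_sketch (const struct hard_reg_set_sketch *set)
{
  int n = 0;
  assert (set != NULL);
  for (int i = 0; i < SKETCH_NUM_WORDS; i++)
    /* Clear the lowest set bit on each iteration.  */
    for (unsigned long w = set->elts[i]; w != 0; w &= w - 1)
      n++;
  return n;
}
```

Only code that looks inside the struct needs the full definition;
declarations that merely mention a pointer to it compile with the
forward declaration alone.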


Also, compiling GCC (make all-gcc TARGET-gcc=cc1 CFLAGS=-g) for the 
affected targets would be better.


Paolo


Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-27 Thread Paolo Bonzini

On 07/27/2011 07:29 PM, H.J. Lu wrote:

> If IGNORE_ADDRESS_WRAP_AROUND is TRUE, we
> +   also permute the conversion and addition of a constant.  It is used to
> +   optimize cases where overflow of base + constant offset won't happen or
> +   its behavior is implementation-defined for a given target.  */


Regarding correctness: you're converting a SImode operation to DImode by 
"pushing in" the zero_extend operation.  What makes you think that base 
+ constant offset won't overflow in any case?


And also: what are you gaining by allowing the wrap around?  I don't 
need to know what ignore_address_wrap_around does, I need to know _why_ 
it is necessary.


DO NOT post another patch.  Answer questions in English, here, please.

Paolo


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Joseph S. Myers
On Wed, 27 Jul 2011, Andrew MacLeod wrote:

> On 07/27/2011 01:08 PM, Aldy Hernandez wrote:
> > 
> > > Anyway, I don't think a --param is appropriate to control a flag whether
> > > to allow store data-races to be created.  Why not use a regular option
> > > instead?
> > 
> > I don't care either way.  What -foption-name do you suggest?
> Well, I suggested a -f option set last year when this was laid out, and Ian
> suggested that it should be a --param
> 
> http://gcc.gnu.org/ml/gcc/2010-05/msg00118.html
> 
> "I don't agree with your proposed command line options.  They seem fine
> for internal use, but I think very very few users would know when or
> whether they should use -fno-data-race-stores.  I think you should
> downgrade those options to a --param value, and think about a
> multi-layered -fmemory-model option. "

The documentation says --param is for "various constants to control the 
amount of optimization that is done".  I don't think it should be used for 
anything that affects the semantics of the program; I think -f options are 
what's appropriate here (with appropriate warnings in the documentation if 
most of the options should not generally be used directly by users).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Fix -gdwarf-3 DW_AT_data_member_location for >= 64KB offsets (PR debug/49871)

2011-07-27 Thread Jakub Jelinek
Hi!

As the attached testcase shows, we were generating invalid DWARF 3
for -gdwarf-3 for large DW_AT_data_member_location offsets.
Unlike DWARF 2, DWARF 3 allows DW_AT_data_member_location to be
either constant, block or loclistptr class, and for this combination
it has a note there that DW_FORM_data[48] (which are used for loclistptr)
aren't included then in constant class.  We were generating a DW_FORM_data4
for the constant anyway, the following patch fixes it to generate
DW_FORM_udata in that case instead.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Would this be appropriate for 4.6 too?
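
For readers unfamiliar with DW_FORM_udata: it stores the constant as
unsigned LEB128, seven value bits per byte with the top bit set on every
byte except the last.  The encoder below is an illustrative sketch, not
the dwarf2out.c code; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode VALUE as unsigned LEB128 into OUT and return the number of
   bytes written.  OUT must have room for at least 10 bytes, the most
   a 64-bit value can need.  */
size_t
encode_uleb128_sketch (uint64_t value, unsigned char *out)
{
  size_t n = 0;
  assert (out != NULL);
  do
    {
      unsigned char byte = value & 0x7f;   /* low seven bits */
      value >>= 7;
      if (value != 0)
        byte |= 0x80;                      /* more bytes follow */
      out[n++] = byte;
    }
  while (value != 0);
  return n;
}
```

For the member offset 65536 used in the testcase (the offset of b after
char a[1 << 16]), this yields the three bytes 0x80 0x80 0x04, whereas
DW_FORM_data4 would take four bytes and, in DWARF 3, be read as a
loclistptr for this attribute.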

2011-07-27  Jakub Jelinek  

PR debug/49871
* dwarf2out.c (size_of_die, value_format, output_die): Use
DW_FORM_udata instead of DW_FORM_data[48] for
dw_val_class_unsigned_const DW_AT_data_member_location for DWARF 3.

* gcc.dg/debug/dwarf2/pr49871.c: New test.

--- gcc/dwarf2out.c.jj  2011-07-26 16:19:56.0 +0200
+++ gcc/dwarf2out.c 2011-07-27 18:55:11.0 +0200
@@ -7652,7 +7652,15 @@ size_of_die (dw_die_ref die)
  size += size_of_sleb128 (AT_int (a));
  break;
case dw_val_class_unsigned_const:
- size += constant_size (AT_unsigned (a));
+ {
+   int csize = constant_size (AT_unsigned (a));
+   if (dwarf_version == 3
+   && a->dw_attr == DW_AT_data_member_location
+   && csize >= 4)
+ size += size_of_uleb128 (AT_unsigned (a));
+   else
+ size += csize;
+ }
  break;
case dw_val_class_const_double:
  size += 2 * HOST_BITS_PER_WIDE_INT / HOST_BITS_PER_CHAR;
@@ -7953,8 +7961,16 @@ value_format (dw_attr_ref a)
case 2:
  return DW_FORM_data2;
case 4:
+ /* In DWARF3 DW_AT_data_member_location with
+DW_FORM_data4 or DW_FORM_data8 is a loclistptr, not
+constant, so we need to use DW_FORM_udata if we need
+a large constant.  */
+ if (dwarf_version == 3 && a->dw_attr == DW_AT_data_member_location)
+   return DW_FORM_udata;
  return DW_FORM_data4;
case 8:
+ if (dwarf_version == 3 && a->dw_attr == DW_AT_data_member_location)
+   return DW_FORM_udata;
  return DW_FORM_data8;
default:
  gcc_unreachable ();
@@ -8261,8 +8277,15 @@ output_die (dw_die_ref die)
  break;
 
case dw_val_class_unsigned_const:
- dw2_asm_output_data (constant_size (AT_unsigned (a)),
-  AT_unsigned (a), "%s", name);
+ {
+   int csize = constant_size (AT_unsigned (a));
+   if (dwarf_version == 3
+   && a->dw_attr == DW_AT_data_member_location
+   && csize >= 4)
+ dw2_asm_output_data_uleb128 (AT_unsigned (a), "%s", name);
+   else
+ dw2_asm_output_data (csize, AT_unsigned (a), "%s", name);
+ }
  break;
 
case dw_val_class_const_double:
--- gcc/testsuite/gcc.dg/debug/dwarf2/pr49871.c.jj  2011-07-27 
19:14:10.0 +0200
+++ gcc/testsuite/gcc.dg/debug/dwarf2/pr49871.c 2011-07-27 19:15:40.0 
+0200
@@ -0,0 +1,12 @@
+/* PR debug/49871 */
+/* { dg-do compile } */
+/* { dg-options "-gdwarf-3 -dA -fno-merge-debug-strings" } */
+
+struct S
+{
+  char a[1 << 16];
+  int b;
+} s;
+
+/* { dg-final { scan-assembler 
"\\(DW_AT_data_member_location\\)\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\\(DW_FORM_udata\\)"
 } } */
+/* { dg-final { scan-assembler-not 
"\\(DW_AT_data_member_location\\)\[^\\r\\n\]*\[\\r\\n\]+\[^\\r\\n\]*\\(DW_FORM_data\[48\]\\)"
 } } */

Jakub


Re: [PATCH, i386, testsuite] New BMI testcases

2011-07-27 Thread Jakub Jelinek
On Wed, Jul 27, 2011 at 10:46:06AM -0700, H.J. Lu wrote:
> On Wed, Jul 27, 2011 at 10:26 AM, Kirill Yukhin  
> wrote:
> > Sharp eye! Thanks.
> > Updated patch is attached.
> > Guys, with write approval, could you please commit that?
> >
> 
> I checked it in for you.

Unfortunately many of the new tests fail with an old assembler, because
the builtin in check_effective_target_bmi is optimized away: its result is
ignored and its arguments are constants (two reasons to get rid of it).

Fixed thusly, tested on i686-linux and x86_64-linux, both with old and new
binutils.  Ok for trunk?

2011-07-27  Jakub Jelinek  

* gcc.target/i386/i386.exp (check_effective_target_bmi): Make sure
the builtin isn't optimized away.

--- gcc/testsuite/gcc.target/i386/i386.exp.jj   2011-07-27 20:18:14.0 
+0200
+++ gcc/testsuite/gcc.target/i386/i386.exp  2011-07-27 23:20:49.705402014 
+0200
@@ -189,9 +189,9 @@ proc check_effective_target_xop { } {
 # Return 1 if bmi instructions can be compiled.
 proc check_effective_target_bmi { } {
 return [check_no_compiler_messages bmi object {
-   void __bextr_u32 (void)
+   unsigned int __bextr_u32 (unsigned int __X, unsigned int __Y)
{
- __builtin_ia32_bextr_u32 (0, 0);
+ return __builtin_ia32_bextr_u32 (__X, __Y);
}
 } "-mbmi" ]
 }


Jakub


Re: [cxx-mem-model] __sync_mem builtin support patch 2/3 - code

2011-07-27 Thread Richard Henderson
On 07/27/2011 11:17 AM, Andrew MacLeod wrote:
> On 07/27/2011 12:03 PM, Richard Henderson wrote:
>> Please disable the relevant tests too.
> sure.
> 
>>>  if ((icode != CODE_FOR_nothing)&&  (model == MEMMODEL_SEQ_CST ||
>>> model == MEMMODEL_ACQ_REL))
>>> + #ifdef HAVE_sync_mem_thread_fence
>>> + emit_mem_thread_fence (model);
>>> + #else
>>>expand_builtin_sync_synchronize ();
>>> + #endif
>> Coding style requires braces here.  Yes, only one of the two
>> functions are called, but that's not immediately obvious to
>> the eye.
>>
>> Lots of other instances in your new code.
>>
>> That said, why wouldn't emit_mem_thread_fence always exist
>> and generate the expand_builtin_sync_synchronize as needed?
> 
> Done after a chat with you sorting it out.. I added an 
> expand_builtin_mem_thread_fence() routine which does just this, much cleaner 
> :-)
> 
> I also noticed that the expand_builtin_sync_mem_ flag and fence routines
> (4 in total) were not quite correct.  None of them would actually use a
> pattern if it was defined, so I changed them a bit to do that.  They were
> basically the set which did not have a TYPE modifier, so they didn't have
> entries in the direct_optab table and were missing out.
> 
> Rest is done.  Patch is attached in case you want to look at the changes.

Looks ok.


r~
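
A minimal sketch of the shape agreed on in this exchange: a fence helper that always exists, so call sites need neither an #ifdef nor an unbraced two-way choice. This is not the GCC code. It uses C11 <stdatomic.h> as a stand-in for the target pattern, and the names emit_thread_fence, store_with_fence and fences_emitted are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counter, only here so the effect can be observed.  */
static int fences_emitted;

/* Always-available helper: the decision about how to emit the barrier
   lives in one place instead of at every call site.  */
static void emit_thread_fence(memory_order model)
{
    if (model == memory_order_relaxed)
        return;                     /* relaxed ordering needs no barrier */
    atomic_thread_fence(model);     /* stand-in for the target's fence */
    fences_emitted++;
}

static void store_with_fence(int *slot, int value, memory_order model)
{
    assert(slot != NULL);
    if (model == memory_order_seq_cst || model == memory_order_acq_rel)
    {
        /* Braces kept on purpose, as the review requests.  */
        emit_thread_fence(model);
    }
    *slot = value;
}
```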


Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-27 Thread Georg-Johann Lay

http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02391.html

Richard Henderson wrote:


On 07/27/2011 08:57 AM, Georg-Johann Lay wrote:

You'll probably end up with quite a few register classes
out of this, but hopefully reload can do a better job than
you can manually...


Agreed.

insns that will benefit are insns with two input operands that
commute, i.e. mulsi3, umulhisi3, mulhisi3, mulhi3.

Maybe even other 2-input insns could benefit because there's no
predetermined order in which the moves are accomplished; e.g.
moving R24 before R22 in udivmodqi4.  I don't know if register
allocator is smart enough to swap the assignments if that is
better.

Moreover, it would reduce the number of insns resp. split
patterns and help cleanup md.

I'd prefer to do that work in a separate patch.  The current patch
behaves the same as the old code, so it's not a performance
regression of the current patch.


Fair enough.

I didn't review the asm code, but the rest of the patch looks ok to me.

r~


Thanks, Eric will review the asm part  :-)

Johann
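
For context, the source-level operations under discussion look roughly like the following in C. This is only a sketch of 16x16->32 widening multiplies; which machine pattern (mulhisi3, umulhisi3, ...) the compiler actually selects depends on the target and the optimization level, and the function names are hypothetical.

```c
#include <stdint.h>

/* Signed 16x16 -> 32 widening multiply.  The casts matter on targets
   where int is 16 bits wide, such as AVR.  */
static int32_t mul_s16_widen(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Unsigned counterpart.  */
static uint32_t mul_u16_widen(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}
```

Both operations commute, which is why the thread suggests letting the register allocator decide which input goes in which register instead of fixing the order by hand.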





[PATCH] PR c++/33255 - Support -Wunused-local-typedefs warning

2011-07-27 Thread Dodji Seketeli
Hello,

The patch below implements a new flag to warn when a typedef defined
in a function is unused.  The warning is -Wunused-local-typedefs; it is
active for the C and C++ FEs and is intended for trunk.

With this patch the compiler caught a few spots of unused local
typedefs in libstdc++ that I have fixed thus.

Bootstrapped and tested on x86_64-unknown-linux-gnu against trunk.

-- 
Dodji
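
As an illustration of the kind of code the new warning is aimed at, here is a small C sketch. It assumes nothing beyond what the description above says: a typedef declared inside a function and never referenced is reported. The unused typedef is kept behind a hypothetical SHOW_UNUSED_TYPEDEF macro so the example compiles quietly by default; all names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

static unsigned sum_bytes(const unsigned char *buf, size_t n)
{
#ifdef SHOW_UNUSED_TYPEDEF
    typedef unsigned long wide_t;   /* declared but never used */
#endif
    typedef unsigned acc_t;         /* used below, so nothing to report */
    acc_t total = 0;

    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

Compiling with -DSHOW_UNUSED_TYPEDEF and the new flag would be expected to flag wide_t and leave acc_t alone.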

From b4612a6dd8a642795fe81398b372746f19c86614 Mon Sep 17 00:00:00 2001
From: Dodji Seketeli 
Date: Mon, 25 Jul 2011 19:02:07 +0200
Subject: [PATCH] PR c++/33255 - Support -Wunused-local-typedefs warning

gcc/

* Makefile.in: Add pointer-set.h dependency to function.h.
* function.h (function::{local_typedefs,used_local_typedefs}): New
struct members.
* tree.c (walk_type_fields): Don't forget to walk the underlying
type of a typedef.
* c-decl.c (pushdecl, grokdeclarator): Use the new
record_locally_defined_typedef.
(finish_function): Use the new maybe_warn_unused_local_typedefs.
(maybe_record_local_typedef_use_r)
(c_maybe_record_local_typedef_use): New static functions.
(maybe_record_local_typedef_use): New public function definition.
* c-typeck.c (c_expr_sizeof_type, c_cast_expr): Use the new
maybe_record_local_typedef_use.

gcc/c-family

* c-common.h (record_locally_defined_typedef)
(maybe_record_local_typedef_use)
(maybe_record_local_typedef_use_real)
(maybe_warn_unused_local_typedefs): Declare new functions.
* c-common.c (c_sizeof_or_alignof_type): Use the new
maybe_record_local_typedef_use.
(record_locally_defined_typedef)
(maybe_record_local_typedef_use_real)
(maybe_warn_unused_local_typedefs): Define new functions.
* c.opt: Declare new -Wunused-local-typedefs flag.

gcc/cp

* name-lookup.c (pushdecl_maybe_friend_1): Use the new
record_locally_defined_typedef.
* cp-tree.h (maybe_record_local_typedef_use): Declare new function.
* decl.c (grokdeclarator): Use the new
maybe_record_local_typedef_use.
(finish_function): Use the new maybe_warn_unused_local_typedefs.
* decl2.c (cp_maybe_record_local_typedef_use_r)
(cp_maybe_record_local_typedef_use): New static functions.
(maybe_record_local_typedef_use): New public function.
(mark_used): Use the new maybe_record_local_typedef_use.
* init.c (build_new): Likewise.
* parser.c (cp_parser_qualifying_entity, cp_parser_template_id):
Likewise.
* rtti.c (build_dynamic_cast_1): Use the new
maybe_record_local_typedef_use.
* typeck.c (cxx_sizeof_or_alignof_type, build_static_cast_1)
(build_reinterpret_cast_1)
(build_const_cast_1): Use the new maybe_record_local_typedef_use.
* typeck2.c (build_functional_cast): Likewise.

gcc/doc/

* invoke.texi: Update documentation for -Wunused-local-typedefs.

gcc/testsuite/

* g++.dg/warn/Wunused-local-typedefs.C: New test file.
* c-c++-common/Wunused-local-typedefs.c: Likewise.

libstdc++-v3/

* include/ext/bitmap_allocator.h
(__detail::__mini_vector::__lower_bound): Remove unused typedef.
* src/istream.cc (std::operator>>(basic_istream& __in,
basic_string& __str)): Likewise.
(std::getline): Likewise.
* src/valarray.cc (__valarray_product): Likewise.
---
 gcc/Makefile.in|2 +-
 gcc/c-decl.c   |   54 +++-
 gcc/c-family/c-common.c|   95 
 gcc/c-family/c-common.h|4 +
 gcc/c-family/c.opt |4 +
 gcc/c-typeck.c |5 +
 gcc/cp/cp-tree.h   |1 +
 gcc/cp/decl.c  |6 ++
 gcc/cp/decl2.c |   46 ++
 gcc/cp/init.c  |2 +
 gcc/cp/name-lookup.c   |7 ++
 gcc/cp/parser.c|7 ++
 gcc/cp/rtti.c  |2 +
 gcc/cp/typeck.c|7 ++
 gcc/cp/typeck2.c   |2 +
 gcc/doc/invoke.texi|9 ++-
 gcc/function.h |8 ++
 .../c-c++-common/Wunused-local-typedefs.c  |   38 
 gcc/tree.c |5 +
 libstdc++-v3/include/ext/bitmap_allocator.h|2 -
 libstdc++-v3/src/istream.cc|3 -
 libstdc++-v3/src/valarray.cc   |1 -
 22 files changed, 300 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wunused-lo

[Patch, Fortran] PR 45586: Mark type pointer components as nonrestricted

2011-07-27 Thread Tobias Burnus
See discussion at http://gcc.gnu.org/ml/fortran/2011-07/msg00281.html 
and see PR 45586.


This patch fixes the test case of the PR by properly using the 
nonrestricted type for pointer components. Before, the test case failed 
(ICE) in some tree checking. While the ICE only affects the trunk (more 
precisely: --enable-checking={yes,tree}), the issue itself also affects 
the branches.


I am not completely sure that the patch covers all restrict/nonrestrict 
issues, but it fixes the PR and I couldn't create a variant which still 
gave an ICE.
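
A loose C analogy for why pointer components must not be treated as restricted: two pointers may name the same storage, so the compiler has to assume that a write through one can change what is read through the other. The sketch below is not related to the gfortran internals; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Neither parameter is declared restrict, so passing the same array
   twice is well defined and each element ends up as 2*x + 1.  Declaring
   them restrict and then aliasing them would be undefined behaviour.  */
static void double_then_increment(double *storage, double *view, size_t n)
{
    assert(storage != NULL && view != NULL);
    for (size_t i = 0; i < n; i++) {
        storage[i] *= 2.0;  /* write through one name */
        view[i] += 1.0;     /* read-modify-write through the other */
    }
}
```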


Built and regtested on x86-64-linux.
OK for the trunk? What about backporting to 4.6 and to 4.5?

Tobias
2011-07-27  Tobias Burnus  

	PR fortran/45586
	* trans-types.c (gfc_get_derived_type): Ensure that pointer
	component types are marked as nonrestricted.

2011-07-27  Tobias Burnus  

	PR fortran/45586
	* gfortran.dg/lto/pr45586-2_0.f90: New.

diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
index b66941f..bec2a11 100644
--- a/gcc/fortran/trans-types.c
+++ b/gcc/fortran/trans-types.c
@@ -2421,6 +2421,9 @@ gfc_get_derived_type (gfc_symbol * derived)
 	   && !c->attr.proc_pointer)
 	field_type = build_pointer_type (field_type);
 
+  if (c->attr.pointer)
+	field_type = gfc_nonrestricted_type (field_type);
+
   /* vtype fields can point to different types to the base type.  */
   if (c->ts.type == BT_DERIVED && c->ts.u.derived->attr.vtype)
 	  field_type = build_pointer_type_for_mode (TREE_TYPE (field_type),
--- /dev/null	2011-07-27 08:24:41.216620249 +0200
+++ gcc/gcc/testsuite/gfortran.dg/lto/pr45586-2_0.f90	2011-07-27 21:44:48.0 +0200
@@ -0,0 +1,34 @@
+! { dg-lto-do link }
+!
+! PR fortran/45586 (comment 53)
+!
+
+MODULE M1
+  INTEGER, PARAMETER :: dp=8
+  TYPE realspace_grid_type
+ REAL(KIND=dp), DIMENSION ( :, :, : ), ALLOCATABLE :: r
+  END TYPE realspace_grid_type
+  TYPE realspace_grid_p_type
+ TYPE(realspace_grid_type), POINTER :: rs_grid
+  END TYPE realspace_grid_p_type
+  TYPE realspaces_grid_p_type
+ TYPE(realspace_grid_p_type), DIMENSION(:), POINTER :: rs
+  END TYPE realspaces_grid_p_type
+END MODULE
+
+MODULE M2
+ USE M1
+CONTAINS
+ SUBROUTINE S1()
+  INTEGER :: i,j
+  TYPE(realspaces_grid_p_type), DIMENSION(:), POINTER :: rs_gauge
+  REAL(dp), DIMENSION(:, :, :), POINTER:: y
+  y=>rs_gauge(i)%rs(j)%rs_grid%r
+ END SUBROUTINE
+END MODULE
+
+USE M2
+  CALL S1()
+END
+
+! { dg-final { cleanup-modules "m1 m2" } }


[v3] Library bits of c++/49813

2011-07-27 Thread Paolo Carlini

Hi,

these are the library bits of the issue, with a workaround in place for 
the weird std::isinf issue described in the audit trail. Tested 
x86_64-linux multilib, committed to mainline.


Paolo.

///
2011-07-27  Paolo Carlini  

PR c++/49813
* include/c_global/cmath: Use _GLIBCXX_CONSTEXPR and constexpr.
Index: include/c_global/cmath
===
--- include/c_global/cmath  (revision 176832)
+++ include/c_global/cmath  (working copy)
@@ -78,84 +78,88 @@
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-  inline double
+  inline _GLIBCXX_CONSTEXPR double
   abs(double __x)
   { return __builtin_fabs(__x); }
 
-  inline float
+  inline _GLIBCXX_CONSTEXPR float
   abs(float __x)
   { return __builtin_fabsf(__x); }
 
-  inline long double
+  inline _GLIBCXX_CONSTEXPR long double
   abs(long double __x)
   { return __builtin_fabsl(__x); }
 
   template<typename _Tp>
-inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
-  double>::__type
+inline _GLIBCXX_CONSTEXPR
+typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value,
+double>::__type
 abs(_Tp __x)
 { return __builtin_fabs(__x); }
 
   using ::acos;
 
-  inline float
+  inline _GLIBCXX_CONSTEXPR float
   acos(float __x)
   { return __builtin_acosf(__x); }
 
-  inline long double
+  inline _GLIBCXX_CONSTEXPR long double
   acos(long double __x)
   { return __builtin_acosl(__x); }
 
   template<typename _Tp>
-inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
-  double>::__type
+inline _GLIBCXX_CONSTEXPR
+typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
+double>::__type
 acos(_Tp __x)
 { return __builtin_acos(__x); }
 
   using ::asin;
 
-  inline float
+  inline _GLIBCXX_CONSTEXPR float
   asin(float __x)
   { return __builtin_asinf(__x); }
 
-  inline long double
+  inline _GLIBCXX_CONSTEXPR long double
   asin(long double __x)
   { return __builtin_asinl(__x); }
 
   template<typename _Tp>
-inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value,
-  double>::__type
+inline _GLIBCXX_CONSTEXPR
+typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value,
+double>::__type
 asin(_Tp __x)
 { return __builtin_asin(__x); }
 
   using ::atan;
 
-  inline float
+  inline _GLIBCXX_CONSTEXPR float
   atan(float __x)
   { return __builtin_atanf(__x); }
 
-  inline long double
+  inline _GLIBCXX_CONSTEXPR long double
   atan(long double __x)
   { return __builtin_atanl(__x); }
 
   template<typename _Tp>
-inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
-  double>::__type
+inline _GLIBCXX_CONSTEXPR
+typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value,
+double>::__type
 atan(_Tp __x)
 { return __builtin_atan(__x); }
 
   using ::atan2;
 
-  inline float
+  inline _GLIBCXX_CONSTEXPR float
   atan2(float __y, float __x)
   { return __builtin_atan2f(__y, __x); }
 
-  inline long double
+  inline _GLIBCXX_CONSTEXPR long double
   atan2(long double __y, long double __x)
   { return __builtin_atan2l(__y, __x); }
 
   template<typename _Tp, typename _Up>
-inline
+inline _GLIBCXX_CONSTEXPR
 typename __gnu_cxx::__promote_2<
 typename __gnu_cxx::__enable_if<__is_arithmetic<_Tp>::__value
&& __is_arithmetic<_Up>::__value,
@@ -168,191 +172,201 @@
 
   using ::ceil;
 
-  inline float
+  inline _GLIBCXX_CONSTEXPR float
   ceil(float __x)
   { return __builtin_ceilf(__x); }
 
-  inline long double
+  inline _GLIBCXX_CONSTEXPR long double
   ceil(long double __x)
   { return __builtin_ceill(__x); }
 
   template<typename _Tp>
-inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
-  double>::__type
+inline _GLIBCXX_CONSTEXPR
+typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
+double>::__type
 ceil(_Tp __x)
 { return __builtin_ceil(__x); }
 
   using ::cos;
 
-  inline float
+  inline _GLIBCXX_CONSTEXPR float
   cos(float __x)
   { return __builtin_cosf(__x); }
 
-  inline long double
+  inline _GLIBCXX_CONSTEXPR long double
   cos(long double __x)
   { return __builtin_cosl(__x); }
 
   template<typename _Tp>
-inline typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value, 
-  double>::__type
+inline _GLIBCXX_CONSTEXPR
+typename __gnu_cxx::__enable_if<__is_integer<_Tp>::__value,
+double>::__type
 cos(_Tp __x)
 { return __builtin_cos(__x); }
 
   using ::cosh;
 
-  inline float
+  inline _GLIBCXX_CONSTEXPR float
   cosh(float __x)
   { return __builtin_coshf(__x); }
 
-  inline long 

Re: [PATCH, PR 49094] Refrain from creating misaligned accesses in SRA

2011-07-27 Thread Ulrich Weigand
Martin Jambor wrote:
> On Wed, Jul 27, 2011 at 02:34:59PM +0200, Ulrich Weigand wrote:
> > Martin Jambor wrote:
> > 
> > > OK, this is what I have just committed as revision 176797 after
> > > re-testing.
> > 
> > Thanks, this has fixed the forwprop-5.c regression on spu-elf on mainline.
> > 
> > I'm seeing the same failure on the 4.6 branch -- would this patch also be
> > appropriate there?
> > 
> 
> You're right, it should be applied to the 4.6 branch too.  Since you
> > have the setup to test it, can you do it please?  Otherwise I'll do
> it in a few days.

Full test on spu-elf has now completed.  In addition to the forwprop-5.c
regression, the patch also fixes this regression (see PR 49545):
FAIL: g++.dg/tree-ssa/fwprop-align.C scan-tree-dump-times forwprop2 "& 1" 0

No new regressions.

OK for the branch?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Aldy Hernandez

On 07/27/11 13:55, Jakub Jelinek wrote:

On Wed, Jul 27, 2011 at 01:51:04PM -0500, Aldy Hernandez wrote:

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49875


The assembler sequence on ia32 was a bit different.

H.J.  Can you try this on your end?  If it fixes the problem, I will
commit as obvious.


You could test it yourself on x86_64-linux too with
make check -k RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=cxxbit*'


Committed.
PR middle-end/49875
* c-c++-common/cxxbitfields-4.c: Check for smaller than long
moves.
* c-c++-common/cxxbitfields-5.c: Same.

Index: c-c++-common/cxxbitfields-4.c
===
--- c-c++-common/cxxbitfields-4.c   (revision 176824)
+++ c-c++-common/cxxbitfields-4.c   (working copy)
@@ -15,4 +15,4 @@ void update_c(struct bits *p, int val) 
 p -> c = val;
 }
 
-/* { dg-final { scan-assembler-not "movl" } } */
+/* { dg-final { scan-assembler "mov\[bw\]" } } */
Index: c-c++-common/cxxbitfields-5.c
===
--- c-c++-common/cxxbitfields-5.c   (revision 176824)
+++ c-c++-common/cxxbitfields-5.c   (working copy)
@@ -26,4 +26,4 @@ void foo()
   p -> c = 55;
 }
 
-/* { dg-final { scan-assembler-not "movl\t\\(" } } */
+/* { dg-final { scan-assembler "mov\[bw\]" } } */
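
To make the intent of these scan-assembler checks concrete, here is a small C sketch of the situation being tested. The struct layout and names are hypothetical and are not copied from the testsuite files. The point is that a store to one bit-field should not be widened into a read-modify-write that also rewrites neighbouring non-bit-field members, because another thread may be updating those concurrently.

```c
#include <assert.h>
#include <stddef.h>

struct flags {
    char a;               /* separate memory location */
    unsigned int b : 7;   /* b and c share one bit-field sequence */
    unsigned int c : 9;
    unsigned char d;      /* separate memory location */
};

/* Updating c may touch b (same sequence of adjacent bit-fields), but
   must not be implemented with a store wide enough to rewrite a or d.  */
static void update_c(struct flags *p, unsigned int val)
{
    assert(p != NULL);
    p->c = val;
}
```

A single-threaded run cannot show the race itself, which is why the tests inspect the generated assembly for narrow moves instead.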


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Jakub Jelinek
On Wed, Jul 27, 2011 at 01:51:04PM -0500, Aldy Hernandez wrote:
> >This caused:
> >
> >http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49875
> 
> The assembler sequence on ia32 was a bit different.
> 
> H.J.  Can you try this on your end?  If it fixes the problem, I will
> commit as obvious.

You could test it yourself on x86_64-linux too with
make check -k RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=cxxbit*'

>   PR middle-end/49875
>   * c-c++-common/cxxbitfields-4.c: Check for smaller than long
>   moves.

Jakub


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Aldy Hernandez



This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49875


The assembler sequence on ia32 was a bit different.

H.J.  Can you try this on your end?  If it fixes the problem, I will 
commit as obvious.


Aldy
PR middle-end/49875
* c-c++-common/cxxbitfields-4.c: Check for smaller than long
moves.

Index: c-c++-common/cxxbitfields-4.c
===
--- c-c++-common/cxxbitfields-4.c   (revision 176824)
+++ c-c++-common/cxxbitfields-4.c   (working copy)
@@ -15,4 +15,4 @@ void update_c(struct bits *p, int val) 
 p -> c = val;
 }
 
-/* { dg-final { scan-assembler-not "movl" } } */
+/* { dg-final { scan-assembler "mov\[bw\]" } } */


[PATCH] Fix PR49876: Continue code generation with integer_zero_node on gloog_error

2011-07-27 Thread Sebastian Pop
When setting gloog_error, graphite should continue code generation
without early returns, as otherwise the SSA representation would not
be complete.  So set the new expression to integer_zero_node, which
does not require more SSA updates, and continue code generation as
if nothing had happened.
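
The general pattern, stripped of all GCC specifics, can be sketched in C as follows: on a failure, record the error and substitute a harmless placeholder, but keep producing output so that the result stays structurally complete. All names here are hypothetical, and the doubling is only a stand-in for the real rewriting step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Rewrite every input element.  A negative input plays the role of an
   expression that cannot be analysed: the error flag is set and a zero
   placeholder is written, but the loop does not stop early.  */
static size_t rewrite_all(const int *in, int *out, size_t n, bool *error)
{
    size_t written = 0;

    assert(in != NULL && out != NULL && error != NULL);
    for (size_t i = 0; i < n; i++) {
        int value;
        if (in[i] < 0) {
            *error = true;
            value = 0;          /* placeholder, like integer_zero_node */
        } else {
            value = in[i] * 2;  /* stand-in for the real transformation */
        }
        out[written++] = value;
    }
    return written;             /* always n: output is never truncated */
}
```

The caller checks the error flag afterwards and discards the generated result if it is set, which is cheaper than repairing a half-updated structure.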

Regstrapping on amd64-linux.

2011-07-27  Sebastian Pop  

PR tree-optimization/49876
* sese.c (rename_uses): Do not return false on gloog_error: set
the new_expr to integer_zero_node and continue code generation.
(graphite_copy_stmts_from_block): Remove early exit on gloog_error.
---
 gcc/ChangeLog |7 +++
 gcc/sese.c|   18 --
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b07d494..a565c18 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,12 @@
 2011-07-27  Sebastian Pop  
 
+   PR tree-optimization/49876
+   * sese.c (rename_uses): Do not return false on gloog_error: set
+   the new_expr to integer_zero_node and continue code generation.
+   (graphite_copy_stmts_from_block): Remove early exit on gloog_error.
+
+2011-07-27  Sebastian Pop  
+
PR tree-optimization/49471
* tree-ssa-loop-manip.c (canonicalize_loop_ivs): Build an unsigned
iv only when the largest type is unsigned.  Do not call
diff --git a/gcc/sese.c b/gcc/sese.c
index ec96dfb..04a8e75 100644
--- a/gcc/sese.c
+++ b/gcc/sese.c
@@ -527,10 +527,10 @@ rename_uses (gimple copy, htab_t rename_map, gimple_stmt_iterator *gsi_tgt,
   if (chrec_contains_undetermined (scev))
{
  *gloog_error = true;
- return false;
+ new_expr = integer_zero_node;
}
-
-  new_expr = chrec_apply_map (scev, iv_map);
+  else
+   new_expr = chrec_apply_map (scev, iv_map);
 
   /* The apply should produce an expression tree containing
 the uses of the new induction variables.  We should be
@@ -540,12 +540,13 @@ rename_uses (gimple copy, htab_t rename_map, gimple_stmt_iterator *gsi_tgt,
  || tree_contains_chrecs (new_expr, NULL))
{
  *gloog_error = true;
- return false;
+ new_expr = integer_zero_node;
}
+  else
+   /* Replace the old_name with the new_expr.  */
+   new_expr = force_gimple_operand (unshare_expr (new_expr), &stmts,
+true, NULL_TREE);
 
-  /* Replace the old_name with the new_expr.  */
-  new_expr = force_gimple_operand (unshare_expr (new_expr), &stmts,
-  true, NULL_TREE);
   gsi_insert_seq_before (gsi_tgt, stmts, GSI_SAME_STMT);
   replace_exp (use_p, new_expr);
 
@@ -621,9 +622,6 @@ graphite_copy_stmts_from_block (basic_block bb, basic_block new_bb,
   gloog_error))
fold_stmt_inplace (copy);
 
-  if (*gloog_error)
-   break;
-
   update_stmt (copy);
 }
 }
-- 
1.7.4.1



Re: [PATCH] Propagate source locations from function_decls to their template_decls

2011-07-27 Thread Jason Merrill
Yes.

Jeffrey Yasskin  wrote:

Thanks. I'll commit to trunk in the morning when I can be around to
watch for breakage.

Is this also ok for gcc-4_6-branch?

On Tue, Jul 26, 2011 at 7:16 PM, Jason Merrill  wrote:
> Ok.
>
> Jeffrey Yasskin  wrote:
>
> Hi Jason. Paolo suggested I ping you directly about this patch for the
> C++ parser. Thanks in advance for taking a look.
>
> On Tue, Jul 26, 2011 at 2:20 PM, Jeffrey Yasskin  wrote:
>> This patch copies the source location of a FUNCTION_DECL to the
>> TEMPLATE_DECL that build_template_decl() builds out of it. Otherwise,
>> the TEMPLATE_DECL's location becomes input_location, which is the end
>> of the parameter list, while the FUNCTION_DECL's location is the
>> location of the name of the function. Depending on what order
>> templates are defined and used, gcc may emit either the
>> FUNCTION_DECL's or TEMPLATE_DECL's location into the debug location,
>> which causes gold's ODR checker to emit false positives.
>>
>> Tested with a bootstrap+`make -k check-c++` on
>> x86_64-unknown-linux-gnu. I'm looking to check it in to trunk, and
>> will propagate it to the gcc-4_6-branch if you think that's the right
>> thing to do.
>>
>> No more tests fail than in
>> http://gcc.gnu.org/ml/gcc-testresults/2011-07/msg02995.html.
>>
>> gcc/cp/ChangeLog:
>> 2011-07-26  Jeffrey Yasskin
>>
>>        * pt.c (build_template_decl): Copy the function_decl's source
>>        location to the new template_decl.
>>
>> gcc/testsuite/ChangeLog:
>> 2011-07-26  Jeffrey Yasskin
>>
>>        * g++.old-deja/g++.pt/crash60.C: Updated.
>>
>> libstdc++-v3/ChangeLog:
>> 2011-07-26  Jeffrey Yasskin
>>
>>        * testsuite/20_util/weak_ptr/comparison/cmp_neg.cc: Updated.
>>
>


Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread Kirill Yukhin
Okay, then here is an updated patch

updated ChangeLog entry:
2011-07-26  Kirill Yukhin  

PR target/49547
* config.gcc (i[34567]86-*-*): Replace abmintrin.h with
lzcntintrin.h.
(x86_64-*-*): Likewise.
* config/i386/i386.opt (mlzcnt): New.
* config/i386/abmintrin.h: File removed.
(__lzcnt_u16, __lzcnt, __lzcnt_u64): Moved to ...
* config/i386/lzcntintrin.h: ... here. New file.
(__lzcnt): Rename to ...
(__lzcnt32): ... this.
* config/i386/bmiintrin.h (head): Update copyright year.
(__lzcnt_u16): Removed.
(__lzcnt_u32): Likewise.
(__lzcnt_u64): Likewise.
* config/i386/x86intrin.h: Include lzcntintrin.h when __LZCNT__
is defined, remove abmintrin.h.
* config/i386/cpuid.h: New define.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
LZCNT feature.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__LZCNT__ if needed.
* config/i386/i386.c (ix86_target_string): New option -mlzcnt.
(ix86_option_override_internal): Handle LZCNT option.
(ix86_valid_target_attribute_inner_p): Likewise.
(struct builtin_description bdesc_args) : Update.
* config/i386/i386.h (TARGET_LZCNT): New.
(CLZ_DEFINED_VALUE_AT_ZERO): Update.
* config/i386/i386.md (clz2): Update insn constraint.
(clz2_lzcnt): Likewise.
* doc/invoke.texi: Mention -mlzcnt option.
* doc/extend.texi: Likewise.

Bootstrapped successfully.
Ok?

K
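
For reference, a portable C sketch of what a 32-bit leading-zero count returns, including the zero-input case that the CLZ_DEFINED_VALUE_AT_ZERO entry in the ChangeLog is concerned with. It assumes the LZCNT convention that a zero input yields the operand width (32). The name lzcnt32_emul is hypothetical and this is not the intrinsic from lzcntintrin.h.

```c
#include <stdint.h>

/* Count leading zero bits of a 32-bit value.  Unlike a BSR-based
   sequence, the result for x == 0 is defined: it is 32.  */
static unsigned lzcnt32_emul(uint32_t x)
{
    unsigned count = 0;

    if (x == 0)
        return 32;
    while ((x & UINT32_C(0x80000000)) == 0) {
        x <<= 1;
        count++;
    }
    return count;
}
```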

On Wed, Jul 27, 2011 at 8:51 PM, H.J. Lu  wrote:
> On Wed, Jul 27, 2011 at 9:49 AM, Uros Bizjak  wrote:
>> On Wed, Jul 27, 2011 at 6:12 PM, Kirill Yukhin  
>> wrote:
>>> Then, as it is the ABM header, it should include two headers: lzcntintrin.h
>>> and popcntintrin.h
>>>
>>> But again, it seems useless to me. If we cannot remove empty header,
>>> let it stay empty...
>>>
>>> K
>>>
>>> On Wed, Jul 27, 2011 at 7:53 PM, H.J. Lu  wrote:
>>>> On Wed, Jul 27, 2011 at 8:45 AM, Kirill Yukhin
>>>> wrote:
>>>>> Just have a closer look to ABM intrinsics support in GCC
>>>>> Seems, we have popcnt support in separate file: popcntintrin.h
>>>>>
>>>>> So, after I move lzcnt intrinsics to lzcntintrin.h, abmintrin will
>>>>> become useless and have to be removed at all
>>>>
>>>> We can't remove an installed header file.  It should just include
>>>> other header files.
>>
>> abmintrin.h has:
>>
>> #ifndef _X86INTRIN_H_INCLUDED
>> # error "Never use <abmintrin.h> directly; include <x86intrin.h> instead."
>> #endif
>>
>> I see no problem in removing this header. It is not possible to
>> #include it directly.
>>
>
> Sounds good to me.
>
> --
> H.J.
>


lzcnt-4.gcc.patch
Description: Binary data


Re: [PATCH] PR45450: disable legality check after an openscop read

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 7:37 AM, Sebastian Pop  wrote:
> Hi,
>
> I will commit this patch to trunk after regstrap.
>
> Sebastian
>
> 2011-07-23  Sebastian Pop  
>
>        PR middle-end/45450
>        * graphite-poly.c (apply_poly_transforms): Disable legality check
>        after an openscop read.

One of your changes from revision 176836-176838 may have caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49876

-- 
H.J.


[4.6] Fix -mcmodel=large calls (PR target/49866)

2011-07-27 Thread Jakub Jelinek
Hi!

As the testcase shows, 4.6 generates invalid
jmp *$baz
insn which fails to assemble with -O2 -mcmodel=large.  This has been fixed
by Uros on the trunk already, but with a larger patch, this patch just
backports the addition of the z constraint and uses it in all call patterns
instead of the s constraint.
Bootstrapped/regtested on x86_64-linux and i686-linux, approved by Uros
in the PR, committed to 4.6 branch and the testcase also to the trunk.

2011-07-27  Jakub Jelinek  

PR target/49866
* config/i386/i386.md (*call_pop_1_vzeroupper, *call_pop_1,
*sibcall_pop_1_vzeroupper, *sibcall_pop_1, *call_1_vzeroupper,
*call_1, *sibcall_1_vzeroupper, *sibcall_1, *call_1_rex64_vzeroupper,
*call_1_rex64, *call_1_rex64_ms_sysv_vzeroupper,
*call_1_rex64_ms_sysv, *sibcall_1_rex64_vzeroupper,
*sibcall_1_rex64, *call_value_pop_1_vzeroupper,
*call_value_pop_1, *sibcall_value_pop_1_vzeroupper,
*sibcall_value_pop_1, *call_value_1_vzeroupper,
*call_value_1, *sibcall_value_1_vzeroupper,
*sibcall_value_1, *call_value_1_rex64_vzeroupper,
*call_value_1_rex64, *call_value_1_rex64_ms_sysv_vzeroupper,
*call_value_1_rex64_ms_sysv, *sibcall_value_1_rex64_vzeroupper,
*sibcall_value_1_rex64): Use z constraint instead of s constraint.

Backport from mainline
2011-05-16  Uros Bizjak  

* config/i386/constraints.md (z): New constraint.
testsuite/
* gcc.target/i386/pr49866.c: New test.

--- gcc/config/i386/constraints.md.jj   2011-05-18 12:00:01.0 +0200
+++ gcc/config/i386/constraints.md  2011-07-27 14:28:06.0 +0200
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;; B H   T  W
-;;;   h jk  vw  z
+;;;   h jk  vw
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -105,6 +105,10 @@ (define_register_constraint "Ym"
  "TARGET_MMX && TARGET_INTER_UNIT_MOVES ? MMX_REGS : NO_REGS"
  "@internal Any MMX register, when inter-unit moves are enabled.")
 
+(define_constraint "z"
+  "@internal Constant call address operand."
+  (match_operand 0 "constant_call_address_operand"))
+
 ;; Integer constant constraints.
 (define_constraint "I"
   "Integer constant in the range 0 @dots{} 31, for 32-bit shifts."
--- gcc/config/i386/i386.md.jj  2011-07-27 13:45:38.0 +0200
+++ gcc/config/i386/i386.md 2011-07-27 14:30:16.0 +0200
@@ -11350,7 +11350,7 @@ (define_insn "*call_pop_0"
 
 (define_insn_and_split "*call_pop_1_vzeroupper"
   [(parallel
-[(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lsm"))
+[(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lzm"))
   (match_operand:SI 1 "" ""))
  (set (reg:SI SP_REG)
  (plus:SI (reg:SI SP_REG)
@@ -11365,7 +11365,7 @@ (define_insn_and_split "*call_pop_1_vzer
   [(set_attr "type" "call")])
 
 (define_insn "*call_pop_1"
-  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lsm"))
+  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lzm"))
 (match_operand:SI 1 "" ""))
(set (reg:SI SP_REG)
(plus:SI (reg:SI SP_REG)
@@ -11380,7 +11380,7 @@ (define_insn "*call_pop_1"
 
 (define_insn_and_split "*sibcall_pop_1_vzeroupper"
  [(parallel
-   [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "s,U"))
+   [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "z,U"))
   (match_operand:SI 1 "" ""))
  (set (reg:SI SP_REG)
  (plus:SI (reg:SI SP_REG)
@@ -11395,7 +11395,7 @@ (define_insn_and_split "*sibcall_pop_1_v
   [(set_attr "type" "call")])
 
 (define_insn "*sibcall_pop_1"
-  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "s,U"))
+  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "z,U"))
 (match_operand:SI 1 "" ""))
(set (reg:SI SP_REG)
(plus:SI (reg:SI SP_REG)
@@ -11446,7 +11446,7 @@ (define_insn "*call_0"
   [(set_attr "type" "call")])
 
 (define_insn_and_split "*call_1_vzeroupper"
-  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lsm"))
+  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lzm"))
 (match_operand 1 "" ""))
(unspec [(match_operand 2 "const_int_operand" "")]
   UNSPEC_CALL_NEEDS_VZEROUPPER)]
@@ -11458,14 +11458,14 @@ (define_insn_and_split "*call_1_vzeroupp
   [(set_attr "type" "call")])
 
 (define_insn "*call_1"
-  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lsm"))
+  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lzm"))
 (match_operand 1 "" ""))]
   "!TARGET_64BIT && !SIBLING_CALL_P (insn)"
   { return ix86_output_call_insn (insn, operands[0], 0); }
   [(set_attr "type" "call")])
 
 (define_insn_and_split "*sibcall_1_vzeroupper"
-  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "s,U"))
+  [(call (mem:QI (match_operand:SI 0 "sibcall_insn_operand" "z,U"))
 (match_operand 1 "" ""))
(unspec [(match_operand 2 "const_int_ope

Re: [cxx-mem-model] __sync_mem builtin support patch 2/3 - code

2011-07-27 Thread Andrew MacLeod

On 07/27/2011 12:03 PM, Richard Henderson wrote:

Please disable the relevant tests too.

sure.


 if ((icode != CODE_FOR_nothing)&&  (model == MEMMODEL_SEQ_CST ||
 model == MEMMODEL_ACQ_REL))
+ #ifdef HAVE_sync_mem_thread_fence
+ emit_mem_thread_fence (model);
+ #else
   expand_builtin_sync_synchronize ();
+ #endif

Coding style requires braces here.  Yes, only one of the two
functions are called, but that's not immediately obvious to
the eye.

Lots of other instances in your new code.

That said, why wouldn't emit_mem_thread_fence always exist
and generate the expand_builtin_sync_synchronize as needed?


Done after a chat with you sorting it out.. I added an 
expand_builtin_mem_thread_fence() routine which does just this, much 
cleaner :-)


I also noticed that all the expand_builtin_sync_mem_ flag and fence
routines (4 in total) were not quite correct.  None of them would
actually use a pattern if it was defined, so I changed them a bit to do
that.  They were basically the set which did not have a TYPE modifier,
so didn't have entries in the direct_optab table, so were missing out.


Rest is done.  Patch is attached in case you want to look at the changes.

Bootstraps on x86_64-unknown-linux-gnu with no regressions.

Andrew

* expr.h (expand_sync_mem_exchange): Change parameter order.
(expand_sync_mem_*): New prototypes.
(expand_builtin_sync_synchronize): Remove prototype.
(expand_builtin_mem_thread_fence): Add prototype.
* optabs.h (DOI_sync_mem_*): Add new optable enums.
(sync_mem_*_optab): Add new #defines for table entries.
* genopinit.c (const optabs[]): Add direct optab handlers.
* optabs.c (expand_sync_mem_exchange): Change parameter order, and use
builtin_mem_thread_fence.
(expand_sync_mem_compare_exchange, expand_sync_mem_load,
expand_sync_mem_store, expand_sync_mem_fetch_op): New. Expand
__sync_mem functions which handle multiple integral types.
* builtins.c (expand_expr_force_mode): New. Factor out common code for
ensuring an integer argument is in the proper mode.
(expand_builtin_sync_operation, expand_builtin_compare_and_swap,
expand_builtin_sync_lock_test_and_set): Use maybe_convert_modes.
(expand_builtin_sync_lock_release): Relocate higher in the file.
(get_memmodel): Don't assume the memmodel is the 3rd argument.
(expand_builtin_sync_mem_exchange): Change error check and use
maybe_convert_modes.
(expand_builtin_sync_mem_compare_exchange): New.
(expand_builtin_sync_mem_load, expand_builtin_sync_mem_store): New.
(expand_builtin_sync_mem_fetch_op): New.
(expand_builtin_sync_mem_flag_test_and_set): New.
(expand_builtin_sync_mem_flag_clear): New.
(expand_builtin_mem_thread_fence): New.
(expand_builtin_sync_mem_thread_fence): New.
(expand_builtin_mem_signal_fence): New.
(expand_builtin_sync_mem_signal_fence): New.
(expand_builtin): Handle BUILT_IN_SYNC_MEM_* types.
* c-family/c-common.c (resolve_overloaded_builtin): Handle
BUILT_IN_SYNC_MEM_* types.
* builtin-types.def (BT_FN_I{1,2,4,8,16}_VPTR_INT): New builtin type.
(BT_FN_VOID_VPTR_INT, BT_FN_BOOL_VPTR_INT): New builtin types.
(BT_FN_VOID_VPTR_I{1,2,4,8,16}_INT): New builtin type.
(BT_FN_BOOL_VPTR_PTR_I{1,2,4,8,16}_INT_INT): New builtin type.
* fortran/types.def (BT_FN_VOID_INT): New type.
(BT_FN_I{1,2,4,8,16}_VPTR_INT): New builtin type.
(BT_FN_VOID_VPTR_INT, BT_FN_BOOL_VPTR_INT): New builtin types.
(BT_FN_VOID_VPTR_I{1,2,4,8,16}_INT): New builtin type.
(BT_FN_BOOL_VPTR_PTR_I{1,2,4,8,16}_INT_INT): New builtin type.
* sync-builtins.def (BUILT_IN_SYNC_MEM_*): New sync builtins.


Index: expr.h
===
*** expr.h  (revision 175331)
--- expr.h  (working copy)
*** rtx expand_bool_compare_and_swap (rtx, r
*** 217,223 
  rtx expand_sync_operation (rtx, rtx, enum rtx_code);
  rtx expand_sync_fetch_operation (rtx, rtx, enum rtx_code, bool, rtx);
  rtx expand_sync_lock_test_and_set (rtx, rtx, rtx);
! rtx expand_sync_mem_exchange (enum memmodel, rtx, rtx, rtx);
  
  /* Functions from expmed.c:  */
  
--- 217,234 
  rtx expand_sync_operation (rtx, rtx, enum rtx_code);
  rtx expand_sync_fetch_operation (rtx, rtx, enum rtx_code, bool, rtx);
  rtx expand_sync_lock_test_and_set (rtx, rtx, rtx);
! 
! rtx expand_sync_mem_exchange (rtx, rtx, rtx, enum memmodel);
! rtx expand_sync_mem_compare_exchange (rtx, rtx, rtx, rtx, enum memmodel, 
! enum memmodel);
! rtx expand_sync_mem_load (rtx, rtx, enum memmodel);
! void expand_sync_mem_store (rtx, rtx, enum memmodel);
! rtx expand_sync_mem_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel);

Re: Support -march=native on IRIX

2011-07-27 Thread Richard Sandiford
Rainer Orth  writes:
> Here's the last of my patches to support -march=native, this time for
> IRIX.  It uses the getinvent(3) family of functions since /proc/cpuinfo
> is Linux-only.  The patch itself is pretty straightforward, the basic
> approach has been tested in a separate program, and the code compiles :-)
> I'm waiting for another bootstrap to complete to fully test it.
>
> Prompted by rth's response to my Tru64 UNIX/Alpha patch, I had another
> look at using mfc0 $reg, $15 to access the PRId register directly, but
> unfortunately that is a privileged operation, just as on SPARC.
>
> Ok for mainline if the bootstrap passes?

Yeah, looks good, thanks.

Richard


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread H.J. Lu
On Mon, Jul 25, 2011 at 10:07 AM, Aldy Hernandez  wrote:
> On 07/22/11 13:44, Jason Merrill wrote:
>>
>> On 07/18/2011 08:02 AM, Aldy Hernandez wrote:
>>>
>>> + /* If other threads can't see this value, no need to restrict
>>> stores. */
>>> + if (ALLOW_STORE_DATA_RACES
>>> + || !DECL_THREAD_VISIBLE_P (innerdecl))
>>> + {
>>> + *bitstart = *bitend = 0;
>>> + return;
>>> + }
>>
>> What if get_inner_reference returns something that isn't a DECL, such as
>> an INDIRECT_REF?
>
> I had changed this already to take into account aliasing, so if we get an
> INDIRECT_REF, ptr_deref_may_alias_global_p() returns true, and we proceed
> with the restriction:
>
> +  /* If other threads can't see this value, no need to restrict stores.  */
> +  if (ALLOW_STORE_DATA_RACES
> +      || (!ptr_deref_may_alias_global_p (innerdecl)
> +         && (DECL_THREAD_LOCAL_P (innerdecl)
> +             || !TREE_STATIC (innerdecl
>
>
>>> + if (fld)
>>> + {
>>> + /* We found the end of the bit field sequence. Include the
>>> + padding up to the next field and be done. */
>>> + *bitend = bitpos - 1;
>>> + }
>>
>> bitpos is the position of "field", and it seems to me we want the
>> position of "fld" here.
>
> Notice that bitpos gets recalculated at each iteration by
> get_inner_reference, so bitpos is actually the position of fld.
>
>>> + /* If unset, no restriction. */
>>> + if (!bitregion_end)
>>> + maxbits = 0;
>>> + else
>>> + maxbits = (bitregion_end - bitregion_start) % align;
>>
>> Maybe use MAX_FIXED_MODE_SIZE so you don't have to test it against 0?
>
> Fixed everywhere.
>
>>> + if (!bitregion_end)
>>> + maxbits = 0;
>>> + else if (1||bitpos + offset * BITS_PER_UNIT < bitregion_start)
>>> + maxbits = bitregion_end - bitregion_start;
>>> + else
>>> + maxbits = bitregion_end - (bitpos + offset * BITS_PER_UNIT) + 1;
>>
>> I assume the 1|| was there for debugging?
>
> Fixed, plus I adjusted the calculation of maxbits everywhere because I found
> an off-by-one error.
>
> I have also overhauled store_bit_field() to adjust the address of the
> address to point to the beginning of the bit region.  This fixed a myriad of
> corner cases pointed out by a test Hans Boehm was kind enough to provide.
>
> I have added more tests.
>
> How does this look?  (Pending tests.)
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49875

-- 
H.J.


Re: [PATCH, i386, testsuite] New BMI testcases

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 10:26 AM, Kirill Yukhin  wrote:
> Sharp eye! Thanks.
> Updated patch is attached.
> Guys, with write approval, could you please commit that?
>

I checked it in for you.

Thanks.

-- 
H.J.


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Aldy Hernandez



Oh, and

INNERDECL is the actual object being referenced.

   || (!ptr_deref_may_alias_global_p (innerdecl)

is surely not what you want.  That asks if *innerdecl is global memory.
I suppose you want is_global_var (innerdecl)?  But with

   &&  (DECL_THREAD_LOCAL_P (innerdecl)
   || !TREE_STATIC (innerdecl

you can simply skip this test.  Or what was it supposed to do?


The test was there because neither DECL_THREAD_LOCAL_P nor is_global_var 
can handle MEM_REF's.


Would you prefer an explicit check for a *_DECL?

   if (ALLOW_STORE_DATA_RACES
-  || (!ptr_deref_may_alias_global_p (innerdecl)
+  || (DECL_P (innerdecl)
  && (DECL_THREAD_LOCAL_P (innerdecl)
  || !TREE_STATIC (innerdecl


Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-27 Thread H.J. Lu
On Mon, Jul 25, 2011 at 2:58 AM, Paolo Bonzini  wrote:
> On 07/13/2011 07:48 PM, H.J. Lu wrote:
>>
>> Here is the patch.  OK for trunk?
>
> Again, at least you should explain clearly _why_ you need
> ignore_address_wrap_around.  You said elsewhere x32 should be first clean,
> then fast.
>
>>   if (GET_CODE (x) == SUBREG && SUBREG_PROMOTED_VAR_P (x)
>>       && GET_MODE_SIZE (GET_MODE (SUBREG_REG (x))) >= GET_MODE_SIZE (mode)
>>       && SUBREG_PROMOTED_UNSIGNED_P (x) == unsignedp)
>> +    {
>> +      if (no_emit)
>> +       x = rtl_hooks.gen_lowpart_no_emit (mode, x);
>> +      else
>> +       x = gen_lowpart (mode, x);
>> +    }
>> @@ -773,7 +781,10 @@ convert_modes (enum machine_mode mode, enum
>> machine_mode oldmode, rtx x, int uns
>>          return gen_int_mode (val, mode);
>>        }
>>
>> -      return gen_lowpart (mode, x);
>> +      if (no_emit)
>> +       return rtl_hooks.gen_lowpart_no_emit (mode, x);
>> +      else
>> +       return gen_lowpart (mode, x);
>>     }
>
> These should be
>
>  rtx tem = rtl_hooks.gen_lowpart_no_emit (mode, x);
>  if (tem)
>    x = tem;
>
>  rtx tem = rtl_hooks.gen_lowpart_no_emit (mode, x);
>  if (tem)
>    return x;
>
> since the "emitting" case can just reuse the code below.  However, see the
> patch I'm sending now.
>
> Paolo
>

Here is the updated patch.  OK for trunk?

Thanks.


-- 
H.J.

2011-07-27  H.J. Lu  

PR middle-end/49721
* explow.c (convert_memory_address_addr_space_1): New.
(convert_memory_address_addr_space): Use it.

* expr.c (convert_modes_1): New.
(convert_modes): Use it.

* expr.h (convert_modes_1): New.

* rtl.h (convert_memory_address_addr_space_1): New.
(convert_memory_address_1): Likewise.

* simplify-rtx.c (simplify_unary_operation_1): Call
convert_memory_address_1 instead of convert_memory_address.

diff --git a/gcc/explow.c b/gcc/explow.c
index 3c692f4..1b2b4e9 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -317,11 +317,16 @@ break_out_memory_refs (rtx x)
an address in the address space's address mode, or vice versa (TO_MODE says
which way).  We take advantage of the fact that pointers are not allowed to
overflow by commuting arithmetic operations over conversions so that address
-   arithmetic insns can be used.  */
+   arithmetic insns can be used.  If IGNORE_ADDRESS_WRAP_AROUND is TRUE, we
+   also permute the conversion and addition of a constant.  It is used to
+   optimize cases where overflow of base + constant offset won't happen or
+   its behavior is implementation-defined for a given target.  */
 
 rtx
-convert_memory_address_addr_space (enum machine_mode to_mode ATTRIBUTE_UNUSED,
-   rtx x, addr_space_t as ATTRIBUTE_UNUSED)
+convert_memory_address_addr_space_1 (enum machine_mode to_mode ATTRIBUTE_UNUSED,
+ rtx x, addr_space_t as ATTRIBUTE_UNUSED,
+ bool no_emit ATTRIBUTE_UNUSED,
+ bool ignore_address_wrap_around ATTRIBUTE_UNUSED)
 {
 #ifndef POINTERS_EXTEND_UNSIGNED
   gcc_assert (GET_MODE (x) == to_mode || GET_MODE (x) == VOIDmode);
@@ -377,28 +382,28 @@ convert_memory_address_addr_space (enum machine_mode to_mode ATTRIBUTE_UNUSED,
   break;
 
 case CONST:
-  return gen_rtx_CONST (to_mode,
-			convert_memory_address_addr_space
-			  (to_mode, XEXP (x, 0), as));
+  temp = convert_memory_address_addr_space_1 (to_mode, XEXP (x, 0),
+		  as, no_emit,
+		  ignore_address_wrap_around);
+  return temp ? gen_rtx_CONST (to_mode, temp) : temp;
   break;
 
 case PLUS:
 case MULT:
-  /* For addition we can safely permute the conversion and addition
-	 operation if one operand is a constant and converting the constant
-	 does not change it or if one operand is a constant and we are
-	 using a ptr_extend instruction  (POINTERS_EXTEND_UNSIGNED < 0).
-	 We can always safely permute them if we are making the address
-	 narrower.  */
+  /* For addition, we can safely permute the conversion and addition
+	 operation if one operand is a constant and we are using a
+	 ptr_extend instruction (POINTERS_EXTEND_UNSIGNED < 0) or address
+	 wrap-around is ignored.  We can always safely permute them if
+	 we are making the address narrower.  */
   if (GET_MODE_SIZE (to_mode) < GET_MODE_SIZE (from_mode)
 	  || (GET_CODE (x) == PLUS
 	  && CONST_INT_P (XEXP (x, 1))
-	  && (XEXP (x, 1) == convert_memory_address_addr_space
-   (to_mode, XEXP (x, 1), as)
- || POINTERS_EXTEND_UNSIGNED < 0)))
+	  && (POI
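
The comment in the explow.c hunk above says the conversion and the addition
of a constant may be permuted when address wrap-around is ignored.  A minimal
sketch of why wrap-around matters, assuming 32-bit pointers zero-extended to
64 bits.  This is not code from the patch; every name below is made up for
illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Add in 32 bits first (the sum may wrap), then zero-extend to 64 bits.  */
static uint64_t extend_after_add(uint32_t base, uint32_t offset)
{
    return (uint64_t)(uint32_t)(base + offset);
}

/* Zero-extend both operands first, then add in 64 bits (no wrap).  */
static uint64_t add_after_extend(uint32_t base, uint32_t offset)
{
    return (uint64_t)base + (uint64_t)offset;
}

/* The two orders agree exactly when the 32-bit sum does not wrap.  */
static int permutation_is_safe(uint32_t base, uint32_t offset)
{
    return extend_after_add(base, offset) == add_after_extend(base, offset);
}

/* Hypothetical usage: one address that does not wrap, one that does.  */
static void permutation_demo(void)
{
    assert(permutation_is_safe(0x1000u, 0x20u));
    assert(!permutation_is_safe(0xFFFFFFF0u, 0x20u));
}
```

When the sum wraps, the two orders differ by 2^32, which is the case a
target has to rule out (or declare implementation-defined) before the
permutation is valid.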

Re: [PATCH, i386, testsuite] New BMI testcases

2011-07-27 Thread Kirill Yukhin
Sharp eye! Thanks.
Updated patch is attached.
Guys, with write approval, could you please commit that?

Thanks, K

On Wed, Jul 27, 2011 at 8:46 PM, Uros Bizjak  wrote:
> On Wed, Jul 27, 2011 at 5:02 PM, Kirill Yukhin  
> wrote:
>
>> Thanks, for inputs.
>> Sure, lzcnt is useless here. I have updated and tested BMI detection in
>> the test driver.
>>
>> testsuite/ChangeLog entry:
>> 2011-07-27  Yukhin Kirill  
>>
>>        * gcc.target/i386/i386.exp (check_effective_target_bmi): New.
>>        * gcc.target/i386/bmi-andn-1.c: New test.
>>        * gcc.target/i386/bmi-andn-1a.c: Likewise.
>>        * gcc.target/i386/bmi-andn-2.c: Likewise.
>>        * gcc.target/i386/bmi-andn-2a.c: Likewise.
>>        * gcc.target/i386/bmi-bextr-1.c: Likewise.
>>        * gcc.target/i386/bmi-bextr-1a.c: Likewise.
>>        * gcc.target/i386/bmi-bextr-2.c: Likewise.
>>        * gcc.target/i386/bmi-bextr-2a.c: Likewise.
>>        * gcc.target/i386/bmi-blsi-1.c: Likewise.
>>        * gcc.target/i386/bmi-blsi-1a.c: Likewise.
>>        * gcc.target/i386/bmi-blsi-2.c: Likewise.
>>        * gcc.target/i386/bmi-blsi-2a.c: Likewise.
>>        * gcc.target/i386/bmi-blsmsk-1.c: Likewise.
>>        * gcc.target/i386/bmi-blsmsk-1a.c: Likewise.
>>        * gcc.target/i386/bmi-blsmsk-2.c: Likewise.
>>        * gcc.target/i386/bmi-blsmsk-2a.c: Likewise.
>>        * gcc.target/i386/bmi-blsr-1.c: Likewise.
>>        * gcc.target/i386/bmi-blsr-1a.c: Likewise.
>>        * gcc.target/i386/bmi-blsr-2.c: Likewise.
>>        * gcc.target/i386/bmi-blsr-2a.c: Likewise.
>>        * gcc.target/i386/bmi-tzcnt-1.c: Likewise.
>>        * gcc.target/i386/bmi-tzcnt-1a.c: Likewise.
>>        * gcc.target/i386/bmi-tzcnt-2.c: Likewise.
>>        * gcc.target/i386/bmi-tzcnt-2a.c: Likewise.
>>
>>
>> New patch is attached.
>> Is it OK?
>
> +++ b/gcc/testsuite/gcc.target/i386/bmi-tzcnt-1a.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
> +
> +#include "bmi-tzcnt-1.c"
> +
> +/* { dg-final { scan-assembler-times "tzcntq" 1 } } */
>
> You don't need -dp there.
>
> The patch is OK for mainline with this change.
>
> Thanks,
> Uros.
>


bmi1-3.testcases.gcc.patch
Description: Binary data


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Andrew MacLeod

On 07/27/2011 01:08 PM, Aldy Hernandez wrote:



Anyway, I don't think a --param is appropriate to control a flag whether
to allow store data-races to be created.  Why not use a regular 
option instead?


I don't care either way.  What -foption-name do you suggest?

Well, I suggested a -f option set last year when this was laid out, and
Ian suggested that it should be a --param:


http://gcc.gnu.org/ml/gcc/2010-05/msg00118.html

"I don't agree with your proposed command line options.  They seem fine
for internal use, but I think very very few users would know when or
whether they should use -fno-data-race-stores.  I think you should
downgrade those options to a --param value, and think about a
multi-layered -fmemory-model option. "

Andrew


Re: [PATCH PR43513, 1/3] Replace vla with array - Implementation.

2011-07-27 Thread Tom de Vries
On 07/27/2011 05:27 PM, Richard Guenther wrote:
> On Wed, 27 Jul 2011, Tom de Vries wrote:
> 
>> On 07/27/2011 02:12 PM, Richard Guenther wrote:
>>> On Wed, 27 Jul 2011, Tom de Vries wrote:
>>>
>>>> On 07/27/2011 01:50 PM, Tom de Vries wrote:
>>>>> Hi Richard,
>>>>>
>>>>> I have a patch set for bug 43513 - The stack pointer is adjusted twice.
>>>>>
>>>>> 01_pr43513.3.patch
>>>>> 02_pr43513.3.test.patch
>>>>> 03_pr43513.3.mudflap.patch
>>>>>
>>>>> The patch set has been bootstrapped and reg-tested on x86_64.
>>>>>
>>>>> I will send out the patches individually.
>>>>>
>>>>
>>>> The patch replaces a vla __builtin_alloca that has a constant argument
>>>> with an array declaration.
>>>>
>>>> OK for trunk?
>>>
>>> I don't think it is safe to try to get at the VLA type the way you do.
>>
>> I don't understand in what way it's not safe. Do you mean I don't manage to 
>> find
>> the type always, or that I find the wrong type, or something else?
> 
> I think you might get the wrong type,

Ok, I'll review that code one more time.

> you also do not transform code
> like
> 
>   int *p = alloca(4);
>   *p = 3;
> 
> as there is no array type involved here.
> 

I was trying to stay away from non-vla allocas.  A source declared alloca has
function lifetime, so we could have a single alloca in a loop, called 10 times,
with all 10 instances live at the same time. This patch does not detect such
cases, and thus stays away from non-vla allocas. A vla decl does not have such
problems, the lifetime ends when it goes out of scope.
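
To illustrate the lifetime difference described above, here is a minimal
sketch.  It is not part of the patch: the function names are invented, and
<alloca.h> is assumed to be available, as it is with glibc.

```c
#include <alloca.h>
#include <assert.h>

#define ITERATIONS 10

/* Every alloca block stays allocated until the function returns, so all
   ten blocks are live at once after the loop.  Returns how many blocks
   still hold the value stored in them.  */
static int count_live_alloca_blocks(void)
{
    int *blocks[ITERATIONS];
    int i, live = 0;

    for (i = 0; i < ITERATIONS; i++) {
        blocks[i] = alloca(sizeof(int));
        *blocks[i] = i;
    }
    for (i = 0; i < ITERATIONS; i++)
        if (*blocks[i] == i)
            live++;
    return live;
}

/* A VLA dies at the end of its enclosing block, so at most one instance
   is live at a time and its stack space can be reused on each iteration.
   N must be positive.  */
static int sum_with_vla(int n)
{
    int i, total = 0;

    for (i = 0; i < ITERATIONS; i++) {
        int buf[n];          /* lifetime ends at the closing brace */
        buf[0] = i;
        total += buf[0];
    }
    return total;
}

/* Hypothetical usage.  */
static void lifetime_demo(void)
{
    assert(count_live_alloca_blocks() == ITERATIONS);
    assert(sum_with_vla(4) == 45);
}
```

Replacing the alloca in the first function with a single fixed-size array
would make all ten pointers alias one object, which is why only the vla
case is safe to rewrite without further analysis.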

>>> In fact I would simply do sth like
>>>
>>>   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
>>>   n_elem = size * 8 / BITS_PER_UNIT;
>>>   array_type = build_array_type_nelts (elem_type, n_elem);
>>>   var = create_tmp_var (array_type, NULL);
>>>   return fold_convert (TREE_TYPE (lhs), build_fold_addr_expr (var));
>>>
>>
>> I tried this code on the example, and it works, but the newly declared type
>> has an 8-bit alignment, while the vla base type has a 32-bit alignment.  This
>> makes the memory access in the example potentially unaligned, which prohibits
>> an ivopts optimization, so the resulting text size is 68 instead of the 64
>> achieved with my current patch.
> 
> Ok, so then set DECL_ALIGN of the variable to something reasonable
> like MIN (size * 8, GET_MODE_PRECISION (word_mode)).  Basically the
> alignment that the targets alloca function would guarantee.
> 

I tried that, but that doesn't help. It's the alignment of the type that
matters, not of the decl.

So should we try to find the base type of the vla, and use that, or use the
nonstandard char type?

>>> And obviously you lose the optimization we arrange with inserting
>>> __builtin_stack_save/restore pairs that way - stack space will no
>>> longer be shared for subsequent VLAs.  Which means that you'd
>>> better limit the size you allow this promotion.
>>>
>>
>> Right, I could introduce a parameter for this.
> 
> I would think you could use PARAM_LARGE_STACK_FRAME for now and say,
> allow a size of PARAM_LARGE_STACK_FRAME / 10?
> 

That unfortunately is too small for the example from the bug report. The default
value of the param is 250, so that would be a threshold of 25, and the alloca
size of the example is 40.  Perhaps we can try a threshold of
PARAM_LARGE_STACK_FRAME - estimated_stack_size or some such?

>>> Alternatively this promotion could happen alongside
>>> optimize_stack_restore using more global knowledge of the effects
>>> on the maximum stack size this folding produces.
>>>
>>
>> OK, I'll look into this.
> 

Thanks,
- Tom


Re: [PATCH 5/9] [SMS] Support new loop pattern

2011-07-27 Thread Roman Zhuykov
2011/7/26 Richard Sandiford :
> Note that on ARM, the comparison and loop counter addition can happen
> as a single parallel:

Certainly, I have noticed such "subs" ARM instructions.  IMHO, this pattern
seems to appear rarely in real loops.  For loops without a doloop_end pattern
we have to make the following instruction transformation, as I have already
noted:
"The final register value X in compare instruction regF=COMPARE(regC,X) is
changed to another value Y respective to the stage this instruction is
scheduled: (Y = X - stage * step)"

With a subs instruction we are unable to do this, because we can't change the
number to compare with.  There seem to be three ways of solving this.

The first way is to check that the counter register is not used by
non-control-flow instructions before running SMS on such loops.  The same
condition is checked in doloop_condition_get.

The second way is to allow SMS to process a loop with a subs instruction, but
once the schedule is computed, to apply it only if X == Y (otherwise the new
schedule would lead to miscompilation).

The third way is to create a pair of sub and cmp instructions instead of subs
when needed.
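
The bound adjustment quoted above and the X == Y check from the second way
can be sketched as follows.  This is only an illustration of the arithmetic,
not code from the SMS pass, and the function names are hypothetical:

```c
#include <assert.h>

/* Bound used by the compare once it is scheduled in the given stage:
   Y = X - stage * step.  */
static long adjusted_bound(long x, long stage, long step)
{
    return x - stage * step;
}

/* A fused subs cannot take a new bound, so a computed schedule is only
   usable when the adjustment leaves the bound unchanged (X == Y).  */
static int subs_schedule_is_usable(long x, long stage, long step)
{
    return adjusted_bound(x, stage, step) == x;
}

/* Hypothetical usage.  */
static void sms_bound_demo(void)
{
    assert(adjusted_bound(100, 2, 4) == 92);
    assert(subs_schedule_is_usable(0, 0, -1));    /* stage 0: Y == X */
    assert(!subs_schedule_is_usable(0, 1, -1));   /* stage 1: Y == 1 */
}
```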

> I think we'd need to handle that too before getting rid of the
> ARM doloop_end pattern.

I think all three ways are complicated enough, so I decided to begin by
implementing SMS without support for such loops.

--
Roman Zhuykov
zhr...@ispras.ru


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Aldy Hernandez



Anyway, I don't think a --param is appropriate to control a flag whether
to allow store data-races to be created.  Why not use a regular option instead?


I don't care either way.  What -foption-name do you suggest?


Re: [Patch,AVR]: Fix PR29560 (map 16-bit shift to 8-bit)

2011-07-27 Thread Georg-Johann Lay
Richard Henderson wrote:
>> +;; "*ashluqihiqi3.mem"
>> +;; "*ashlsqihiqi3.mem"
>> +(define_insn_and_split "*ashlqihiqi3.mem"
>> +  [(set (match_operand:QI 0 "memory_operand" "=m")
>> +(subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 
>> "register_operand" "r"))
>> +  (match_operand:QI 2 "register_operand" "r"))
>> +   0))]
>> +  "!reload_completed"
>> +  { gcc_unreachable(); }
> 
> Surely this isn't necessary.  Why would you ever be matching a memory output?
> 
>> +(define_insn_and_split "*ashlhiqi3"
>> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=r")
>> +(subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0")
>> +  (match_operand:QI 2 "register_operand" "r")) 
>> 0))]
>> +  "!reload_completed"
>> +  { gcc_unreachable(); }
> 
> Likewise.
> 
> But the first pattern and the peep2 look good.
> 

That is what combine comes up with, and combine is not smart enough
to find a split point between the mem and the subreg.  I don't know
enough about combine; maybe it's because can_create_pseudo_p is false
during combine, so combine has no spare reg.  A combine-split won't
help as it needs a pseudo/spare reg.

As a consequence, the code is better if a memory operand is allowed,
which is a typical use case, e.g. setting some bits in an SFR.
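
For readers following along, the identity these patterns rely on can be
checked on any host.  The sketch below is not AVR code and not part of the
patch; the function names are invented.  It compares the low byte of a
16-bit shift of an extended 8-bit value against a plain 8-bit shift:

```c
#include <assert.h>
#include <stdint.h>

/* What the source asks for: extend to 16 bits, shift, keep the low byte.  */
static uint8_t shift_wide(uint8_t value, uint8_t count)
{
    uint16_t wide = value;
    return (uint8_t)(wide << count);
}

/* What an 8-bit shift gives: counts of 8 or more push every bit out.  */
static uint8_t shift_narrow(uint8_t value, uint8_t count)
{
    return count < 8 ? (uint8_t)(value << count) : 0;
}

/* Count the (value, count) pairs on which the two versions disagree,
   for all byte values and shift counts 0..15.  */
static int count_mismatches(void)
{
    int value, count, mismatches = 0;

    for (value = 0; value < 256; value++)
        for (count = 0; count < 16; count++)
            if (shift_wide((uint8_t)value, (uint8_t)count)
                != shift_narrow((uint8_t)value, (uint8_t)count))
                mismatches++;
    return mismatches;
}

/* Hypothetical usage.  */
static void shift_demo(void)
{
    assert(count_mismatches() == 0);
}
```

Since only the low byte is stored, the high byte of the 16-bit result never
needs to be computed, which is the saving the 8-bit mapping is after.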

Johann



Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 9:49 AM, Uros Bizjak  wrote:
> On Wed, Jul 27, 2011 at 6:12 PM, Kirill Yukhin  
> wrote:
>> Then, as it is the ABM header, it should include two headers: lzcntintrin.h
>> and popcntintrin.h
>>
>> But again, it seems useless to me. If we cannot remove empty header,
>> let it stay empty...
>>
>> K
>>
>> On Wed, Jul 27, 2011 at 7:53 PM, H.J. Lu  wrote:
>>> On Wed, Jul 27, 2011 at 8:45 AM, Kirill Yukhin  
>>> wrote:
>>>> Just have a closer look to ABM intrinsics support in GCC
>>>> Seems, we have popcnt support in separate file: popcntintrin.h
>>>>
>>>> So, after I move lzcnt intrinsics to lzcntintrin.h, abmintrin will
>>>> become useless and have to be removed at all
>>>
>>> We can't remove an installed header file.  It should just include
>>> other header files.
>
> abmintrin.h has:
>
> #ifndef _X86INTRIN_H_INCLUDED
> # error "Never use <abmintrin.h> directly; include <x86intrin.h> instead."
> #endif
>
> I see no problem in removing this header. It is not possible to
> #include it directly.
>

Sounds good to me.

-- 
H.J.


Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread Uros Bizjak
On Wed, Jul 27, 2011 at 6:12 PM, Kirill Yukhin  wrote:
> Then, as it is the ABM header, it should include two headers: lzcntintrin.h
> and popcntintrin.h
>
> But again, it seems useless to me. If we cannot remove empty header,
> let it stay empty...
>
> K
>
> On Wed, Jul 27, 2011 at 7:53 PM, H.J. Lu  wrote:
>> On Wed, Jul 27, 2011 at 8:45 AM, Kirill Yukhin  
>> wrote:
>>> Just have a closer look to ABM intrinsics support in GCC
>>> Seems, we have popcnt support in separate file: popcntintrin.h
>>>
>>> So, after I move lzcnt intrinsics to lzcntintrin.h, abmintrin will
>>> become useless and have to be removed at all
>>
>> We can't remove an installed header file.  It should just include
>> other header files.

abmintrin.h has:

#ifndef _X86INTRIN_H_INCLUDED
# error "Never use <abmintrin.h> directly; include <x86intrin.h> instead."
#endif

I see no problem in removing this header. It is not possible to
#include it directly.

Uros.


Re: [Patch,AVR]: Fix PR29560 (map 16-bit shift to 8-bit)

2011-07-27 Thread Richard Henderson
> +;; "*ashluqihiqi3.mem"
> +;; "*ashlsqihiqi3.mem"
> +(define_insn_and_split "*ashlqihiqi3.mem"
> +  [(set (match_operand:QI 0 "memory_operand" "=m")
> +(subreg:QI (ashift:HI (any_extend:HI (match_operand:QI 1 
> "register_operand" "r"))
> +  (match_operand:QI 2 "register_operand" "r"))
> +   0))]
> +  "!reload_completed"
> +  { gcc_unreachable(); }

Surely this isn't necessary.  Why would you ever be matching a memory output?

> +(define_insn_and_split "*ashlhiqi3"
> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=r")
> +(subreg:QI (ashift:HI (match_operand:HI 1 "register_operand" "0")
> +  (match_operand:QI 2 "register_operand" "r")) 
> 0))]
> +  "!reload_completed"
> +  { gcc_unreachable(); }

Likewise.

But the first pattern and the peep2 look good.


r~


Re: [PATCH, i386, testsuite] New BMI testcases

2011-07-27 Thread Uros Bizjak
On Wed, Jul 27, 2011 at 5:02 PM, Kirill Yukhin  wrote:

> Thanks, for inputs.
> Sure, lzcnt is useless here. I have updated and tested BMI detection in
> the test driver.
>
> testsuite/ChangeLog entry:
> 2011-07-27  Yukhin Kirill  
>
>        * gcc.target/i386/i386.exp (check_effective_target_bmi): New.
>        * gcc.target/i386/bmi-andn-1.c: New test.
>        * gcc.target/i386/bmi-andn-1a.c: Likewise.
>        * gcc.target/i386/bmi-andn-2.c: Likewise.
>        * gcc.target/i386/bmi-andn-2a.c: Likewise.
>        * gcc.target/i386/bmi-bextr-1.c: Likewise.
>        * gcc.target/i386/bmi-bextr-1a.c: Likewise.
>        * gcc.target/i386/bmi-bextr-2.c: Likewise.
>        * gcc.target/i386/bmi-bextr-2a.c: Likewise.
>        * gcc.target/i386/bmi-blsi-1.c: Likewise.
>        * gcc.target/i386/bmi-blsi-1a.c: Likewise.
>        * gcc.target/i386/bmi-blsi-2.c: Likewise.
>        * gcc.target/i386/bmi-blsi-2a.c: Likewise.
>        * gcc.target/i386/bmi-blsmsk-1.c: Likewise.
>        * gcc.target/i386/bmi-blsmsk-1a.c: Likewise.
>        * gcc.target/i386/bmi-blsmsk-2.c: Likewise.
>        * gcc.target/i386/bmi-blsmsk-2a.c: Likewise.
>        * gcc.target/i386/bmi-blsr-1.c: Likewise.
>        * gcc.target/i386/bmi-blsr-1a.c: Likewise.
>        * gcc.target/i386/bmi-blsr-2.c: Likewise.
>        * gcc.target/i386/bmi-blsr-2a.c: Likewise.
>        * gcc.target/i386/bmi-tzcnt-1.c: Likewise.
>        * gcc.target/i386/bmi-tzcnt-1a.c: Likewise.
>        * gcc.target/i386/bmi-tzcnt-2.c: Likewise.
>        * gcc.target/i386/bmi-tzcnt-2a.c: Likewise.
>
>
> New patch is attached.
> Is it OK?

+++ b/gcc/testsuite/gcc.target/i386/bmi-tzcnt-1a.c
@@ -0,0 +1,6 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mbmi -fno-inline -dp" } */
+
+#include "bmi-tzcnt-1.c"
+
+/* { dg-final { scan-assembler-times "tzcntq" 1 } } */

You don't need -dp there.

The patch is OK for mainline with this change.

Thanks,
Uros.


Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-27 Thread Richard Henderson
On 07/27/2011 08:57 AM, Georg-Johann Lay wrote:
>> > You'll probably end up with quite a few register classes 
>> > out of this, but hopefully reload can do a better job than
>> > you can manually...
> Agreed.
> 
> insns that will benefit are insns with two input operands that
> commute, i.e. mulsi3, umulhisi3, mulhisi3, mulhi3.
> 
> Maybe even other 2-input insns could benefit because there's no
> predetermined order in which the moves are accomplished; e.g.
> moving R24 before R22 in udivmodqi4.  I don't know if register
> allocator is smart enough to swap the assignments if that is
> better.
> 
> Moreover, it would reduce the number of insns resp. split
> patterns and help cleanup md.
> 
> I'd prefer to do that work in a separate patch.  The current patch
> behaves the same as the old code, so it's not a performance
> regression of the current patch.

Fair enough.

I didn't review the asm code, but the rest of the patch looks ok to me.


r~


Re: [PATCH] Fix PR47594: Sign extend constants while translating to Graphite

2011-07-27 Thread Sebastian Pop
On Tue, Jul 26, 2011 at 09:34, Richard Guenther  wrote:
> Truncating -1 doesn't matter - it matters that if you perform any
> unsigned arithmetic in arbitrary precision signed arithmetic that
> you properly truncate after each operation to simulate unsigned
> twos-complement wrapping semantic.  And if you did that you wouldn't
> need to sign-extend -1U either.

Ok, so I guess that the type of the expression that we generate from
Graphite should be, as the original expression, of unsigned type.
In the previous example,

> for (scat_3=0;scat_3<=4294967295*scat_1+T_51-1;scat_3++) {
>   S6(scat_1,scat_3);
> }

this is still valid if the type of "4294967295*scat_1" is unsigned.
That would fix only -fgraphite-identity: we also have to watch out for
operations on the polyhedral representation that would use -1U in
other computations, and here I'm thinking about everything we have
implemented on the polyhedral representation: dependence test,
counting the number of points, i.e., all the heuristics, etc.
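
To make the truncation point concrete, here is a small sketch of the loop
bound above evaluated three ways.  It is not Graphite code; the function
names are invented, and 64-bit signed integers stand in for the arbitrary
precision arithmetic, so inputs are assumed small enough not to overflow
them:

```c
#include <assert.h>
#include <stdint.h>

/* Reference: native 32-bit unsigned arithmetic wraps modulo 2^32.  */
static uint32_t bound_unsigned(uint32_t scat_1, uint32_t t_51)
{
    return UINT32_C(4294967295) * scat_1 + t_51 - 1u;
}

/* The same expression in wider signed arithmetic, never truncated.  */
static int64_t bound_wide_untruncated(int64_t scat_1, int64_t t_51)
{
    return INT64_C(4294967295) * scat_1 + t_51 - 1;
}

/* Reduce a wide value modulo 2^32, as unsigned wrapping would.  */
static int64_t wrap32(int64_t v)
{
    return (int64_t)((uint64_t)v & UINT64_C(0xFFFFFFFF));
}

/* Wider signed arithmetic, truncated after each operation.  */
static int64_t bound_wide_truncated(int64_t scat_1, int64_t t_51)
{
    int64_t product = wrap32(INT64_C(4294967295) * scat_1);
    int64_t sum = wrap32(product + t_51);
    return wrap32(sum - 1);
}

/* Hypothetical usage with scat_1 = 2 and T_51 = 10.  */
static void wrap_demo(void)
{
    assert(bound_unsigned(2, 10) == 7);
    assert(bound_wide_truncated(2, 10) == 7);
    assert(bound_wide_untruncated(2, 10) == INT64_C(8589934599));
}
```

Only the version that truncates after every operation matches the unsigned
result; the untruncated one is off by a multiple of 2^32.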

When disabling Graphite on all unsigned niter expressions, we get
the following fails:

FAIL: gcc.dg/graphite/scop-0.c scan-tree-dump-times graphite "number
of SCoPs: 1" 1
FAIL: gcc.dg/graphite/scop-1.c scan-tree-dump-times graphite "number
of SCoPs: 3" 1
FAIL: gcc.dg/graphite/scop-10.c scan-tree-dump-times graphite "number
of SCoPs: 3" 1
FAIL: gcc.dg/graphite/scop-11.c scan-tree-dump-times graphite "number
of SCoPs: 3" 1
FAIL: gcc.dg/graphite/scop-12.c scan-tree-dump-times graphite "number
of SCoPs: 5" 1
FAIL: gcc.dg/graphite/scop-13.c scan-tree-dump-times graphite "number
of SCoPs: 2" 1
FAIL: gcc.dg/graphite/scop-16.c scan-tree-dump-times graphite "number
of SCoPs: 2" 1
FAIL: gcc.dg/graphite/scop-17.c scan-tree-dump-times graphite "number
of SCoPs: 2" 1
FAIL: gcc.dg/graphite/scop-18.c scan-tree-dump-times graphite "number
of SCoPs: 2" 1
FAIL: gcc.dg/graphite/scop-2.c scan-tree-dump-times graphite "number
of SCoPs: 4" 1
FAIL: gcc.dg/graphite/scop-20.c scan-tree-dump-times graphite "number
of SCoPs: 2" 1
FAIL: gcc.dg/graphite/scop-21.c scan-tree-dump-times graphite "number
of SCoPs: 1" 1
FAIL: gcc.dg/graphite/scop-22.c scan-tree-dump-times graphite "number
of SCoPs: 1" 1
FAIL: gcc.dg/graphite/scop-3.c scan-tree-dump-times graphite "number
of SCoPs: 1" 1
FAIL: gcc.dg/graphite/scop-4.c scan-tree-dump-times graphite "number
of SCoPs: 2" 1
FAIL: gcc.dg/graphite/scop-5.c scan-tree-dump-times graphite "number
of SCoPs: 3" 1
FAIL: gcc.dg/graphite/scop-6.c scan-tree-dump-times graphite "number
of SCoPs: 3" 1
FAIL: gcc.dg/graphite/scop-7.c scan-tree-dump-times graphite "number
of SCoPs: 3" 1
FAIL: gcc.dg/graphite/scop-8.c scan-tree-dump-times graphite "number
of SCoPs: 2" 1
FAIL: gcc.dg/graphite/scop-9.c scan-tree-dump-times graphite "number
of SCoPs: 2" 1
FAIL: gcc.dg/graphite/scop-dsyr2k.c scan-tree-dump-times graphite
"number of SCoPs: 1" 1
FAIL: gcc.dg/graphite/scop-dsyrk.c scan-tree-dump-times graphite
"number of SCoPs: 1" 1
FAIL: gcc.dg/graphite/scop-matmult.c scan-tree-dump-times graphite
"number of SCoPs: 1" 1
FAIL: gcc.dg/graphite/scop-mvt.c scan-tree-dump-times graphite "number
of SCoPs: 2" 1
FAIL: gcc.dg/graphite/scop-sor.c scan-tree-dump-times graphite "number
of SCoPs: 1" 1
FAIL: gcc.dg/graphite/interchange-0.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-1.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-10.c scan-tree-dump-times graphite
"will be interchanged" 2
FAIL: gcc.dg/graphite/interchange-11.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-12.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-13.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-3.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-4.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-5.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-6.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-7.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/interchange-8.c scan-tree-dump-times graphite
"will be interchanged" 2
FAIL: gcc.dg/graphite/interchange-9.c scan-tree-dump-times graphite
"will be interchanged" 1
FAIL: gcc.dg/graphite/block-1.c scan-tree-dump-times graphite "will be
loop blocked" 3
FAIL: gcc.dg/graphite/block-5.c scan-tree-dump-times graphite "will be
loop blocked" 1
FAIL: gcc.dg/graphite/vect-pr43423.c scan-tree-dump-times vect
"vectorized 2 loops" 1
FAIL: gcc.dg/graphite/pr35356-1.c scan-tree-dump-times graphite "loop_1" 0
FAIL: gcc.dg/graphite/pr35356-2.c scan-tree-dump-times graphite "MIN_EXPR" 4
FAIL: gcc.dg/graphite/pr35356-2.c scan-tree-dump-times graphite "MAX_EXPR" 4

FAIL: gfortran.dg/graphite/interchange-3.f90  -O  scan-

Support -march=native on IRIX

2011-07-27 Thread Rainer Orth
Here's the last of my patches to support -march=native, this time for
IRIX.  It uses the getinvent(3) family of functions since /proc/cpuinfo
is Linux-only.  The patch itself is pretty straightforward, the basic
approach has been tested in a separate program, and the code compiles :-)
I'm waiting for another bootstrap to complete to fully test it.

Prompted by rth's response to my Tru64 UNIX/Alpha patch, I had another
look at using mfc0 $reg, $15 to access the PRId register directly, but
unfortunately that is a privileged operation, just as on SPARC.

Ok for mainline if the bootstrap passes?

Thanks.
Rainer


2011-07-26  Rainer Orth  

* config/mips/driver-native.c [__sgi__]: Include ,
.
(cpu_types): New array.
(cputype): New function.
(host_detect_local_cpu): Only define buf, f if !__sgi__.
Use scaninvent instead of /proc/cpuinfo if __sgi__.
* config.host: Also use driver-native.o, mips/x-native on
mips-sgi-irix*.
* config/mips/iris6.h [__mips__] (host_detect_local_cpu):
Declare.
(EXTRA_SPEC_FUNCTIONS, MARCH_MTUNE_NATIVE_SPECS): Define.
(DRIVER_SELF_SPECS): Add MARCH_MTUNE_NATIVE_SPECS.

diff --git a/gcc/config.host b/gcc/config.host
--- a/gcc/config.host
+++ b/gcc/config.host
@@ -118,9 +118,9 @@ case ${host} in
;;
 esac
 ;;
-  mips*-*-linux*)
+  mips*-*-linux* | mips-sgi-irix*)
 case ${target} in
-  mips*-*-linux*)
+  mips*-*-linux* | mips-sgi-irix*)
host_extra_gcc_objs="driver-native.o"
host_xmake_file="${host_xmake_file} mips/x-native"
   ;;
diff --git a/gcc/config/mips/driver-native.c b/gcc/config/mips/driver-native.c
--- a/gcc/config/mips/driver-native.c
+++ b/gcc/config/mips/driver-native.c
@@ -1,5 +1,5 @@
 /* Subroutines for the gcc driver.
-   Copyright (C) 2008 Free Software Foundation, Inc.
+   Copyright (C) 2008, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -22,6 +22,59 @@ along with GCC; see the file COPYING3.  
 #include "coretypes.h"
 #include "tm.h"
 
+#ifdef __sgi__
+#include 
+#include 
+
+/* Cf. MIPS R1 Microprocessor User Guide, Version 2.0, 14.13 Processor
+   Revision Identifier (PRId) Register (15).
+
+   
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/hdwr/bks/SGI_Developer/books/R10K_UM/sgi_html/t5.Ver.2.0.book_279.html
  */
+
+static const struct cpu_types {
+  int impl;
+  const char *cpu;
+} cpu_types[] = {
+  { C0_IMP_R2000, "r2000" },
+  { C0_IMP_R3000, "r3000" },
+  { C0_IMP_R6000, "r6000" },
+  { C0_IMP_R4000, "r4000" },
+  { C0_IMP_R6000A, "r6000" },
+  { C0_IMP_R1, "r1" },
+  { C0_IMP_R12000, "r12000" },
+  { C0_IMP_R14000, "r14000" },
+  { C0_IMP_R8000,  "r8000" },
+  { C0_IMP_R4600,  "r4600" },
+  { C0_IMP_R4700,  "r4600" },
+  { C0_IMP_R4650,  "r4650" },
+  { C0_IMP_R5000,  "vr5000" },
+  { C0_IMP_RM7000, "rm7000" },
+  { C0_IMP_RM5271, "vr5000" },
+  { 0, 0 }
+};
+
+static int
+cputype (inventory_t *inv, void *arg)
+{
+  if (inv != NULL
+  && inv->inv_class == INV_PROCESSOR
+  && inv->inv_type == INV_CPUCHIP)
+{
+  int i;
+  /* inv_state is the cpu revision number.  */
+  int impl = (inv->inv_state & C0_IMPMASK) >> C0_IMPSHIFT;
+
+  for (i = 0; cpu_types[i].cpu != NULL; i++)
+   if (cpu_types[i].impl == impl)
+ {
+   *((const char **) arg) = cpu_types[i].cpu;
+   break;
+ }
+}
+  return 0;
+}
+#endif
+
 /* This will be called by the spec parser in gcc.c when it sees
a %:local_cpu_detect(args) construct.  Currently it will be called
with either "arch" or "tune" as argument depending on if -march=native
@@ -39,8 +92,10 @@ const char *
 host_detect_local_cpu (int argc, const char **argv)
 {
   const char *cpu = NULL;
+#ifndef __sgi__
   char buf[128];
   FILE *f;
+#endif
   bool arch;
 
   if (argc < 1)
@@ -50,6 +105,9 @@ host_detect_local_cpu (int argc, const c
   if (!arch && strcmp (argv[0], "tune"))
 return NULL;
 
+#ifdef __sgi__
+  scaninvent (cputype, &cpu);
+#else
   f = fopen ("/proc/cpuinfo", "r");
   if (f == NULL)
 return NULL;
@@ -73,6 +131,7 @@ host_detect_local_cpu (int argc, const c
   }
 
   fclose (f);
+#endif
 
   if (cpu == NULL)
 return NULL;
diff --git a/gcc/config/mips/iris6.h b/gcc/config/mips/iris6.h
--- a/gcc/config/mips/iris6.h
+++ b/gcc/config/mips/iris6.h
@@ -27,13 +27,28 @@ along with GCC; see the file COPYING3.  
 #undef MULTILIB_DEFAULTS
 #define MULTILIB_DEFAULTS { "mabi=n32" }
 
+/* -march=native handling only makes sense with compiler running on
+   a MIPS chip.  */
+#if defined(__mips__)
+extern const char *host_detect_local_cpu (int argc, const char **argv);
+# define EXTRA_SPEC_FUNCTIONS \
+  { "local_cpu_detect", host_detect_local_cpu },
+
+# define MARCH_MTUNE_NATIVE_SPECS  \
+  " %{march=native:%

Re: [Patch,AVR]: PR49313

2011-07-27 Thread Richard Henderson
On 07/27/2011 09:12 AM, Georg-Johann Lay wrote:
>   PR target/49313
>   * config/avr/libgcc.S (__ffshi2): Don't skip 2-word instruction.
>   (__ctzsi2): Result for 0 may be undefined.
>   (__ctzhi2): Result for 0 may be undefined.
>   (__popcounthi2): Don't clobber r30. Use __popcounthi2_tail.
>   (__popcountsi2): Ditto. And don't clobber r26.
>   (__popcountdi2): Ditto. And don't clobber r27.
>   * config/avr/avr.md (UNSPEC_COPYSIGN): New c_enum.
>   (parityhi2): New expand.
>   (paritysi2): New expand.
>   (popcounthi2): New expand.
>   (popcountsi2): New expand.
>   (clzhi2): New expand.
>   (clzsi2): New expand.
>   (ctzhi2): New expand.
>   (ctzsi2): New expand.
>   (ffshi2): New expand.
>   (ffssi2): New expand.
>   (copysignsf2): New insn.
>   (bswapsi2): New expand.
>   (*parityhi2.libgcc): New insn.
>   (*parityqihi2.libgcc): New insn.
>   (*paritysihi2.libgcc): New insn.
>   (*popcounthi2.libgcc): New insn.
>   (*popcountsi2.libgcc): New insn.
>   (*popcountqi2.libgcc): New insn.
>   (*popcountqihi2.libgcc): New insn-and-split.
>   (*clzhi2.libgcc): New insn.
>   (*clzsihi2.libgcc): New insn.
>   (*ctzhi2.libgcc): New insn.
>   (*ctzsihi2.libgcc): New insn.
>   (*ffshi2.libgcc): New insn.
>   (*ffssihi2.libgcc): New insn.
>   (*bswapsi2.libgcc): New insn.

Looks good.


r~


[PATCH, i386]: Do not explicitly check symbol_operands in ix86_expand_move

2011-07-27 Thread Uros Bizjak
Hello!

There is no way symbolic_operand uses modes other than DImode or SImode on x86.

2011-07-27  Uros Bizjak  

* config/i386/i386.c (ix86_expand_move): Do not explicitly check
the mode of symbolic_operand RTXes.

Tested on x86_64-pc-linux-gnu {,-m32}. Committed to mainline SVN.

Uros.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 176833)
+++ config/i386/i386.c  (working copy)
@@ -15032,7 +15032,6 @@
 }

   if ((flag_pic || MACHOPIC_INDIRECT)
-  && (mode == SImode || mode == DImode)
   && symbolic_operand (op1, mode))
 {
   if (TARGET_MACHO && !TARGET_64BIT)


Re: [cxx-mem-model] __sync_mem builtin support patch 1/3 - documentation

2011-07-27 Thread Richard Henderson
On 07/26/2011 06:20 PM, Andrew MacLeod wrote:
>   * doc/extend.texi (__sync_mem_*) : Document all the atomic builtin
>   functions which deal with memory models.

Ok.


r~


Re: [PATCH 0/3] Move Graphite to CLooG 0.16.3 with isl backend.

2011-07-27 Thread Jack Howarth
On Fri, Jul 22, 2011 at 01:00:09AM +0200, Tobias Grosser wrote:
> Hi,
> 
> I propose to switch to the official cloog.org cloog version with isl backend 
> and
> at the same time to remove support for both CLooG-PPL legacy as well as
> CLooG-Parma.
> 
> We want to switch to cloog-isl as it is the only officially maintained version
> of cloog. Furthermore, it provides features that will help to fix some bugs in
> the graphite code generation[1].
> The reason to abandon CLooG-PPL (legacy version) is that cloog-isl provides the
> new CloogInput library interface. This interface is not available in the old
> CLooG.
> I plan to move graphite to this interface. As I do not see enough benefits 
> from
> being able to use CLooG PPL, I decided to not introduce any compatibility
> scheme, but just remove any code that is only needed for CLooG-PPL.
> I also removed CLooG-Parma (cloog.org with PPL backend), as it is currently 
> not
> actively maintained and not well tested. I believe our time is better spent on
> improving graphite or cloog isl than in putting time into this cloog version.
> 
> So here we are: Moving graphite back to the official cloog.org version!
> 
> Passes 'make check RUNTESTFLAGS=graphite.exp' as well as a bootstrap on Linux
> amd64.
> 
> Cheers
> Tobi

Tobi,
   Are there any additional plans for gcc 4.7? In particular, wasn't the
-fgraphite-identity option supposed to be enabled at -O3 by defaulting
-ftree-loop-linear on, which has been an alias of -floop-interchange since
gcc 4.6?
  Jack

> 
> P.S.: Why do we move to the super latest one. Because we expect that most 
> users
> would need an update, and, as we will soon use some of the newer features, 
> there
> is no need to force another update later.
> 
> 
> Tobias Grosser (3):
>   Make CLooG isl the only supported CLooG version.
>   Require cloog 0.16.3
>   Remove code that supported legacy CLooG.
> 
>  ChangeLog  |   17 +++
>  config/cloog.m4|  109 ++--
>  configure  |  176 ++
>  configure.ac   |2 +-
>  gcc/ChangeLog  |   18 +++
>  gcc/Makefile.in|4 +-
>  gcc/graphite-clast-to-gimple.c |   93 ++
>  gcc/graphite-cloog-compat.h|  275 
> 
>  gcc/graphite-cloog-util.c  |   15 +--
>  gcc/graphite-cloog-util.h  |1 -
>  gcc/graphite.c |2 -
>  11 files changed, 106 insertions(+), 606 deletions(-)
>  delete mode 100644 gcc/graphite-cloog-compat.h
> 
> -- 
> 1.7.4.1


Re: [DF] Replace various bitmaps with HARD_REG_SETs

2011-07-27 Thread Joseph S. Myers
On Wed, 27 Jul 2011, Dimitrios Apostolou wrote:

> --- gcc/target.h  2011-04-06 11:08:17 +
> +++ gcc/target.h  2011-07-27 10:27:56 +
> @@ -50,6 +50,7 @@
>  #define GCC_TARGET_H
>  
>  #include "tm.h"
> +#include "hard-reg-set.h"
>  #include "insn-modes.h"

Please send a patch against current trunk.  target.h hasn't included tm.h 
for over a month.  Since hard-reg-set.h depends on tm.h, you won't be able 
to include hard-reg-set.h in target.h any more, so you'll need to find 
another solution for that.

-- 
Joseph S. Myers
jos...@codesourcery.com


[Patch,AVR]: PR49313

2011-07-27 Thread Georg-Johann Lay
This patch is to finalize the work on PR49313, i.e. better libgcc
implementation of some functions like bswap, counting zeros,
parity and popcount.

These functions are already implemented in libgcc.

This patch now provides a better integration of these functions:
the calls are no longer emitted as ordinary black-box calls by optabs;
instead there are insns to describe the exact register usage of
the functions which are represented as implicit library calls.

This is advantageous because some call-clobbered registers are not
touched and there are more leaf-functions.

Some libgcc functions have minor changes to reduce register
footprint.
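
The layering of the popcount helpers (each wider routine calls the next
narrower one twice and adds the results) can be modelled in C as below.  This
is a sketch of the structure only, not of the AVR assembly or its register
usage; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-by-bit count for one byte.  */
static uint8_t
demo_popcount8 (uint8_t x)
{
  uint8_t n = 0;

  while (x)
    {
      n += x & 1u;
      x >>= 1;
    }
  return n;
}

/* Each wider variant sums the counts of its two halves.  */
static uint8_t
demo_popcount16 (uint16_t x)
{
  return demo_popcount8 (x & 0xff) + demo_popcount8 (x >> 8);
}

static uint8_t
demo_popcount32 (uint32_t x)
{
  return demo_popcount16 (x & 0xffff) + demo_popcount16 (x >> 16);
}

uint8_t
demo_popcount64 (uint64_t x)
{
  return demo_popcount32 ((uint32_t) x) + demo_popcount32 ((uint32_t) (x >> 32));
}

/* Usage: a few spot checks on the widest variant.  */
void
demo_popcount_checks (void)
{
  assert (demo_popcount64 (0) == 0);
  assert (demo_popcount64 (0x8000000000000001ull) == 2);
  assert (demo_popcount64 (0xffffffffffffffffull) == 64);
}
```

In the assembly version the intermediate count is kept on the stack and the
final addition is shared through a common tail, which is what frees the
registers previously used as temporaries.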

Besides that, copysignsf3 is implemented, which is easy on AVR.

Ok to commit?

Johann


PR target/49313
* config/avr/libgcc.S (__ffshi2): Don't skip 2-word instruction.
(__ctzsi2): Result for 0 may be undefined.
(__ctzhi2): Result for 0 may be undefined.
(__popcounthi2): Don't clobber r30. Use __popcounthi2_tail.
(__popcountsi2): Ditto. And don't clobber r26.
(__popcountdi2): Ditto. And don't clobber r27.
* config/avr/avr.md (UNSPEC_COPYSIGN): New c_enum.
(parityhi2): New expand.
(paritysi2): New expand.
(popcounthi2): New expand.
(popcountsi2): New expand.
(clzhi2): New expand.
(clzsi2): New expand.
(ctzhi2): New expand.
(ctzsi2): New expand.
(ffshi2): New expand.
(ffssi2): New expand.
(copysignsf2): New insn.
(bswapsi2): New expand.
(*parityhi2.libgcc): New insn.
(*parityqihi2.libgcc): New insn.
(*paritysihi2.libgcc): New insn.
(*popcounthi2.libgcc): New insn.
(*popcountsi2.libgcc): New insn.
(*popcountqi2.libgcc): New insn.
(*popcountqihi2.libgcc): New insn-and-split.
(*clzhi2.libgcc): New insn.
(*clzsihi2.libgcc): New insn.
(*ctzhi2.libgcc): New insn.
(*ctzsihi2.libgcc): New insn.
(*ffshi2.libgcc): New insn.
(*ffssihi2.libgcc): New insn.
(*bswapsi2.libgcc): New insn.
Index: config/avr/libgcc.S
===
--- config/avr/libgcc.S	(revision 176818)
+++ config/avr/libgcc.S	(working copy)
@@ -1061,9 +1061,15 @@ ENDF __ffssi2
 ;; clobbers: r26
 DEFUN __ffshi2
 clr  r26
+#ifdef __AVR_HAVE_JMP_CALL__
+;; Some cores have problem skipping 2-word instruction
+tst  r24
+breq 2f
+#else
 cpse r24, __zero_reg__
+#endif /* __AVR_HAVE_JMP_CALL__ */
 1:  XJMP __loop_ffsqi2
-ldi  r26, 8
+2:  ldi  r26, 8
 or   r24, r25
 brne 1b
 ret
@@ -1093,12 +1099,12 @@ ENDF __loop_ffsqi2
 #if defined (L_ctzsi2)
 ;; count trailing zeros
 ;; r25:r24 = ctz32 (r25:r22)
-;; ctz(0) = 32
+;; clobbers: r26, r22
+;; ctz(0) = 255
+;; Note that ctz(0) in undefined for GCC
 DEFUN __ctzsi2
 XCALL __ffssi2
 dec  r24
-sbrc r24, 7
-ldi  r24, 32
 ret
 ENDF __ctzsi2
 #endif /* defined (L_ctzsi2) */
@@ -1106,12 +1112,12 @@ ENDF __ctzsi2
 #if defined (L_ctzhi2)
 ;; count trailing zeros
 ;; r25:r24 = ctz16 (r25:r24)
-;; ctz(0) = 16
+;; clobbers: r26
+;; ctz(0) = 255
+;; Note that ctz(0) in undefined for GCC
 DEFUN __ctzhi2
 XCALL __ffshi2
 dec  r24
-sbrc r24, 7
-ldi  r24, 16
 ret
 ENDF __ctzhi2
 #endif /* defined (L_ctzhi2) */
@@ -1245,47 +1251,50 @@ ENDF __parityqi2
 #if defined (L_popcounthi2)
 ;; population count
 ;; r25:r24 = popcount16 (r25:r24)
-;; clobbers: r30, __tmp_reg__
+;; clobbers: __tmp_reg__
 DEFUN __popcounthi2
 XCALL __popcountqi2
-mov  r30, r24
+push r24
 mov  r24, r25
 XCALL __popcountqi2
-add  r24, r30
 clr  r25
-ret
+;; FALLTHRU
 ENDF __popcounthi2
+
+DEFUN __popcounthi2_tail
+pop   __tmp_reg__
+add   r24, __tmp_reg__
+ret
+ENDF __popcounthi2_tail
 #endif /* defined (L_popcounthi2) */
 
 #if defined (L_popcountsi2)
 ;; population count
 ;; r25:r24 = popcount32 (r25:r22)
-;; clobbers: r26, r30, __tmp_reg__
+;; clobbers: __tmp_reg__
 DEFUN __popcountsi2
 XCALL __popcounthi2
-mov   r26, r24
+push  r24
 mov_l r24, r22
 mov_h r25, r23
 XCALL __popcounthi2
-add   r24, r26
-ret
+XJMP  __popcounthi2_tail
 ENDF __popcountsi2
 #endif /* defined (L_popcountsi2) */
 
 #if defined (L_popcountdi2)
 ;; population count
 ;; r25:r24 = popcount64 (r25:r18)
-;; clobbers: r22, r23, r26, r27, r30, __tmp_reg__
+;; clobbers: r22, r23, __tmp_reg__
 DEFUN __popcountdi2
 XCALL __popcountsi2
-mov   r27, r24
+push  r24
 mov_l r22, r18
 mov_h r23, r19
 mov_l r24, r20
 mov_h r25, r21
 XCALL __popcountsi2
-add   r24, r27
-ret
+XJMP  __popcounthi2_tail
 ENDF __popcountdi2
 #endif /* defined (L_popcountdi2) */
 
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 176818)
+++ config/avr/avr.md	(working copy)
@@ -55,6 +55,7 @@ (define_c_enum "u

Fix comment of get_last_value

2011-07-27 Thread Paulo J. Matos
There is a mistake in the comment for get_last_value in combine.c. This 
patch fixes this.


PMatos

2011-07-27  Paulo J. Matos 

* Fix comment of get_last_value in combine.c.

diff --git a/gcc/combine.c b/gcc/combine.c
index 4dbf022..affb509 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -12697,7 +12697,7 @@ get_last_value_validate (rtx *loc, rtx insn, int tick, int replace)
 
 /* Get the last value assigned to X, if known.  Some registers
in the value may be replaced with (clobber (const_int 0)) if their value
-   is known longer known reliably.  */
+   is no longer known reliably.  */
 
 static rtx
 get_last_value (const_rtx x)


Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread Kirill Yukhin
Then, as it is the ABM header, it should include two headers: lzcntintrin.h
and popcntintrin.h.

But again, it seems useless to me. If we cannot remove an empty header,
let it stay empty...

K

On Wed, Jul 27, 2011 at 7:53 PM, H.J. Lu  wrote:
> On Wed, Jul 27, 2011 at 8:45 AM, Kirill Yukhin  
> wrote:
>> Just have a closer look to ABM intrinsics support in GCC
>> Seems, we have popcnt support in separate file: popcntintrin.h
>>
>> So, after I move lzcnt intrinsics to lzcntintrin.h, abmintrin will
>> become useless and have to be removed at all
>
> We can't remove an installed header file.  It should just include
> other header files.
>
>
> H.J.
> ---
>


Re: [PATCH, RFC] PR49749 biased reassociation for accumulator patterns

2011-07-27 Thread Michael Matz
Hi,

On Wed, 27 Jul 2011, William J. Schmidt wrote:

> +static long
> +propagate_rank (long rank, tree op)
> +{
> +  long phi_prop_rank = phi_propagation_rank (op);
> +
> +  if (phi_prop_rank)
> +return MAX (rank, phi_prop_rank);
> +
> +  return MAX (rank, get_rank (op));
> +}

I know it's pre-existing code, but as you're touching it anyway: function 
calls in min/max macros usually are a bad idea due to multiple 
evaluations.  If nothing else it's just wasted work.
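
A small sketch of the effect, assuming a conventional MAX macro of the
textbook form; slow_rank is a hypothetical stand-in for a call such as
get_rank, and the counter exists only to make the repeated evaluation visible.

```c
#include <assert.h>

/* Textbook macro: whichever argument wins is evaluated a second time.  */
#define DEMO_MAX(a, b) ((a) > (b) ? (a) : (b))

int demo_rank_calls;    /* number of times slow_rank has run */

static long
slow_rank (long v)
{
  demo_rank_calls++;
  return v;
}

/* Call placed directly in the macro argument.  */
long
demo_max_direct (long rank, long v)
{
  return DEMO_MAX (rank, slow_rank (v));
}

/* Call hoisted into a temporary first.  */
long
demo_max_cached (long rank, long v)
{
  long r = slow_rank (v);
  return DEMO_MAX (rank, r);
}

/* Usage: when the call's result is the maximum, the direct form runs
   the function twice, the cached form once.  */
void
demo_max_checks (void)
{
  demo_rank_calls = 0;
  assert (demo_max_direct (1, 5) == 5);
  assert (demo_rank_calls == 2);

  demo_rank_calls = 0;
  assert (demo_max_cached (1, 5) == 5);
  assert (demo_rank_calls == 1);
}
```

Hoisting the call into a local keeps the result identical while bounding the
work to one evaluation regardless of which operand is larger.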


Ciao,
Michael.


Re: [cxx-mem-model] __sync_mem builtin support patch 3/3 - testcases

2011-07-27 Thread Richard Henderson
On 07/26/2011 06:20 PM, Andrew MacLeod wrote:
>   * gcc.dg/sync-mem-{1-5}.c: Remove.
>   * gcc.dg/sync-mem.h: Remove.
>   * gcc.dg/sync-mem-compare-exchange-{1-5}.c: New functional tests.
>   * gcc.dg/sync-mem-exchange-{1-5}.c: New functional tests.
>   * gcc.dg/sync-mem-fence.c: New functional tests.
>   * gcc.dg/sync-mem-fetch-*-{1-5}.c: New functional tests.
>   * gcc.dg/sync-mem-flag.c: New functional tests.
>   * gcc.dg/sync-mem-invalid.c: Add new invalid combinations.
>   * gcc.dg/sync-mem-load-{1-5}.c: New functional tests.
>   * gcc.dg/sync-mem-store-{1-5}.c: New functional tests.

Ok, but as said earlier, don't commit the compare-exchange tests
until we fix the code.


r~


Re: [cxx-mem-model] __sync_mem builtin support patch 2/3 - code

2011-07-27 Thread Richard Henderson
On 07/26/2011 06:20 PM, Andrew MacLeod wrote:
> * __sync_mem_compare_exchange has the skeleton in place, but not the
> guts.  There are some issues that rth and I will work out later, I
> just don't want to hold up the rest of the patch for that. Right now
> it will fail the compare_exchange tests.

Please disable the relevant tests too.

> if ((icode != CODE_FOR_nothing) && (model == MEMMODEL_SEQ_CST || 
>model == MEMMODEL_ACQ_REL))
> + #ifdef HAVE_sync_mem_thread_fence
> + emit_mem_thread_fence (model);
> + #else
>   expand_builtin_sync_synchronize ();
> + #endif

Coding style requires braces here.  Yes, only one of the two
functions is called, but that's not immediately obvious to
the eye.

Lots of other instances in your new code.

That said, why wouldn't emit_mem_thread_fence always exist
and generate the expand_builtin_sync_synchronize as needed?
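
One way to read that suggestion, sketched with stand-in stubs rather than the
real expanders: the helper always exists, and the decision between a target
fence pattern and the full barrier lives inside it.  Everything prefixed
demo_ is hypothetical, the flag replaces the HAVE_sync_mem_thread_fence
preprocessor test, and skipping the barrier for the relaxed model is an
assumption taken from the second quoted hunk.

```c
#include <assert.h>

enum demo_memmodel { DEMO_RELAXED, DEMO_ACQ_REL, DEMO_SEQ_CST };

int demo_have_fence_pattern;   /* stand-in for HAVE_sync_mem_thread_fence */
int demo_target_fences;        /* target-specific fences "emitted" */
int demo_full_barriers;        /* full-barrier fallbacks "emitted" */

/* Always available; callers no longer need #ifdef blocks.  */
void
demo_emit_mem_thread_fence (enum demo_memmodel model)
{
  if (demo_have_fence_pattern)
    {
      demo_target_fences++;
      return;
    }

  /* Fallback: a full barrier, unless no ordering was requested.  */
  if (model != DEMO_RELAXED)
    demo_full_barriers++;
}

/* Usage: the same call works with and without a target pattern.  */
void
demo_fence_fallback (void)
{
  demo_have_fence_pattern = 0;
  demo_target_fences = 0;
  demo_full_barriers = 0;

  demo_emit_mem_thread_fence (DEMO_SEQ_CST);
  demo_emit_mem_thread_fence (DEMO_RELAXED);
  assert (demo_full_barriers == 1);

  demo_have_fence_pattern = 1;
  demo_emit_mem_thread_fence (DEMO_SEQ_CST);
  assert (demo_target_fences == 1 && demo_full_barriers == 1);
}
```

With the choice centralised, each call site becomes a single statement and the
brace question at the call sites goes away.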

> + #ifdef HAVE_sync_mem_thread_fence
> + emit_mem_thread_fence (model);
> + #else
> + if (model != MEMMODEL_RELAXED)
> +   expand_builtin_sync_synchronize ();
> + #endif
> + 
> +target = expand_sync_fetch_operation (mem, val, code, false, target);
> + 
> + #ifdef HAVE_sync_mem_thread_fence
> + emit_mem_thread_fence (model);
> + #else
> + if (model != MEMMODEL_RELAXED)
> +   expand_builtin_sync_synchronize ();
> + #endif
> +   return target;

Over-zealous with your pattern.  The sync_fetch_op is a
full barrier.  You don't need the extra stuff.

> + static rtx
> + maybe_convert_modes (tree exp, enum machine_mode mode)

I think a better name might be expand_expr_force_mode.


Otherwise it looks ok.


r~


Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-27 Thread Georg-Johann Lay
Richard Henderson wrote:
> On 07/27/2011 06:21 AM, Georg-Johann Lay wrote:
>> +(define_insn_and_split "*mulsi3"
>> +  [(set (match_operand:SI 0 "pseudo_register_operand"  
>> "=r")
>> +(mult:SI (match_operand:SI 1 "pseudo_register_operand"  
>> "r")
>> + (match_operand:SI 2 "pseudo_register_or_const_int_operand" 
>> "rn")))
>> +   (clobber (reg:DI 18))]
>> +  "AVR_HAVE_MUL && !reload_completed"
>> +  { gcc_unreachable(); }
>> +  "&& 1"
>> +  [(set (reg:SI 18)
>> +(match_dup 1))
> 
> That seems like it's guaranteed to force an unnecessary move.

It's the same as with the present implementation:


long mul (long a, long b)
{
return a*b;
}

long mul2 (long a, long b)
{
return b*a;
}

translates with -Os -mmcu=atmega8 to

mul:
rcall __mulsi3
ret

mul2:
push r12
push r13
push r14
push r15
movw r12,r22
movw r14,r24
movw r24,r20
movw r22,r18
movw r20,r14
movw r18,r12
rcall __mulsi3
pop r15
pop r14
pop r13
pop r12
ret

> Have you tried defining special-purpose register classes to
> force reload to move the data into the right hard regs?
> 
> E.g.  "Y" prefix
>   "QHS" size
>   two digit starting register number, as needed.

I already thought about such register classes/constraints with
almost the same nomenclature, i.e. with prefix "R".

> You'll probably end up with quite a few register classes 
> out of this, but hopefully reload can do a better job than
> you can manually...

Agreed.

insns that will benefit are insns with two input operands that
commute, i.e. mulsi3, umulhisi3, mulhisi3, mulhi3.

Maybe even other 2-input insns could benefit because there's no
predetermined order in which the moves are accomplished; e.g.
moving R24 before R22 in udivmodqi4.  I don't know if the register
allocator is smart enough to swap the assignments if that is
better.

Moreover, it would reduce the number of insns and split
patterns and help clean up the md file.

I'd prefer to do that work in a separate patch.  The current patch
behaves the same as the old code, so it's not a performance
regression of the current patch.

Johann






Re: [PATCH PR43513, 1/3] Replace vla with array - Implementation.

2011-07-27 Thread Michael Matz
Hi,

On Wed, 27 Jul 2011, Richard Guenther wrote:

> > > I don't think it is safe to try to get at the VLA type the way you do.
> > 
> > I don't understand in what way it's not safe. Do you mean I don't manage to 
> > find
> > the type always, or that I find the wrong type, or something else?
> 
> I think you might get the wrong type, you also do not transform code
> like
> 
>   int *p = alloca(4);
>   *p = 3;
> 
> as there is no array type involved here.

That's good, because you _can't_ transform that code into an array decl.  
See:

   for (int i = 0; i < 100; i++)
     p[i] = alloca(4);
   assert (p[0] != p[1]);

vs.
   char vla_cst[4];
   for (int i = 0; i < 100; i++)
     p[i] = &vla_cst;
   assert (p[0] != p[1]);

Tom: you can reliably detect if an alloca call is for a VLA by checking 
CALL_ALLOCA_FOR_VAR_P (on a tree call expression, but only if it's a 
builtin call) or gimple_call_alloca_for_var_p (on a gimple call stmt).
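
The difference between the two loops above can be checked with a small
program.  This is a sketch assuming a glibc-style <alloca.h>; the function
names are hypothetical.

```c
#include <alloca.h>
#include <assert.h>

/* Every alloca call hands out fresh stack space, so the pointers differ.  */
int
demo_allocas_are_distinct (void)
{
  void *p[4];
  int i;

  for (i = 0; i < 4; i++)
    p[i] = alloca (4);
  return p[0] != p[1];
}

/* A single array declaration gives one object; every slot aliases it.  */
int
demo_array_slots_are_distinct (void)
{
  char buf[4];
  void *p[4];
  int i;

  for (i = 0; i < 4; i++)
    p[i] = buf;
  return p[0] != p[1];
}

/* Usage: only the alloca version satisfies the assertion from the mail.  */
void
demo_alloca_checks (void)
{
  assert (demo_allocas_are_distinct ());
  assert (!demo_array_slots_are_distinct ());
}
```

Replacing a general alloca call with one fixed array would therefore change
observable behaviour, which is why the transformation has to be limited to
calls that really come from a VLA declaration.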


Ciao,
Michael.


Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 8:45 AM, Kirill Yukhin  wrote:
> Just have a closer look to ABM intrinsics support in GCC
> Seems, we have popcnt support in separate file: popcntintrin.h
>
> So, after I move lzcnt intrinsics to lzcntintrin.h, abmintrin will
> become useless and have to be removed at all

We can't remove an installed header file.  It should just include
other header files.


H.J.
---


Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread Kirill Yukhin
I just had a closer look at ABM intrinsics support in GCC.
It seems we have popcnt support in a separate file: popcntintrin.h.

So, after I move the lzcnt intrinsics to lzcntintrin.h, abmintrin.h will
become useless and will have to be removed altogether.

K

On Wed, Jul 27, 2011 at 6:20 PM, H.J. Lu  wrote:
> On Wed, Jul 27, 2011 at 7:06 AM, Kirill Yukhin  
> wrote:
>> Okay,
>> Uros, thanks for correcting me. Here is updated Changelogs and patch.
>>
>> ChangeLog entry:
>> 2011-07-27  Kirill Yukhin  
>>
>>        PR target/49547
>>        * config/i386/abmintrin.h (head): Check if __LZCNT__ is defined.
>>        (__lzcnt): Rename to ...
>>        (__lzcnt32): ... this.
>>        * config/i386/bmiintrin.h (head): Update copyright year.
>>        (__lzcnt_u16): Removed.
>>        (__lzcnt_u32): Removed.
>>        (__lzcnt_u64): Likewise.
>>        * config/i386/cpuid.h: New define.
>>        * config/i386/driver-i386.c (host_detect_local_cpu): Detect
>>        LZCNT feature.
>>        * config/i386/i386-c.c (ix86_target_macros_internal): Define
>>        __LZCNT__ if needed.
>>        * config/i386/i386.c (ix86_target_string): New option -mlzcnt.
>>        (ix86_option_override_internal): Handle LZCNT option.
>>        (ix86_valid_target_attribute_inner_p): Likewise.
>>        (struct builtin_description bdesc_args) : Update.
>>        * config/i386/i386.h (TARGET_LZCNT): New.
>>        (CLZ_DEFINED_VALUE_AT_ZERO): Update.
>>        * config/i386/i386.md (clz2): Update insn constraint.
>>        (clz2_lzcnt): Likewise.
>>        * doc/invoke.texi: Mention -mlzcnt option.
>>        * doc/extend.texi: Likewise.
>
> Please mention config/i386/i386.opt.  It is very odd to include
> abmintrin.h for lzcnt.  What if someone decides to add new intrinsics
> for ABM?  I think it should be renamed to lzcntintrin.h and make
> abmintrin.h include it instead.
>
> H.J.
>


Re: [patch i386]: Allow attribute ms_abi/sysv_abi for 32-bit

2011-07-27 Thread Richard Henderson
On 07/27/2011 02:12 AM, Kai Tietz wrote:
> 2011-07-27  Kai Tietz  
> 
> * config/i386/i386.c (ix86_option_override_internal): Allow -mabi
> for 32-bit, too.
> (ix86_handle_abi_attribute): Allow function attributes ms_abi/sysv_abi
> in 32-bit mode, too.
> * doc/extend.texi: Adjust attribute documentation.
> 
> 2011-07-27  Kai Tietz  
> 
> * gcc.target/i386/aggregate-ret3.c: New test.
> * gcc.target/i386/aggregate-ret4.c: New test.

Ok.


r~


[PATCH, RFC] PR49749 biased reassociation for accumulator patterns

2011-07-27 Thread William J. Schmidt
This is a draft patch that biases the reassociation machinery so that
each iteration of an accumulator pattern in a loop is independent of the
other iterations.  This addresses a problem identified as an accidental
side effect of the bug observed in PR tree-optimization/49749.  This
patch reverses a substantial performance loss to 410.bwaves in cpu2006.

I've restricted the bias to take place only for phi results that are
identified as true accumulators within innermost loops.  Currently there
is no restriction on the size or complexity of the loop, otherwise.
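
As an illustration of the pattern in question (my reading of the description
above, not output of the patch): in the first function every addition in the
loop body depends on the running sum, while in the second the per-iteration
terms are combined first and the accumulator is touched once.  Integer
operands are used so that both forms are exactly equal; the function names are
hypothetical.

```c
#include <assert.h>

/* Accumulator threaded through every addition of the iteration.  */
long
demo_sum_chained (const int *a, const int *b, const int *c, int n)
{
  long sum = 0;
  int i;

  for (i = 0; i < n; i++)
    sum = ((sum + a[i]) + b[i]) + c[i];
  return sum;
}

/* Per-iteration work is independent; the accumulator is added last.  */
long
demo_sum_independent (const int *a, const int *b, const int *c, int n)
{
  long sum = 0;
  int i;

  for (i = 0; i < n; i++)
    {
      long t = ((long) a[i] + b[i]) + c[i];
      sum += t;
    }
  return sum;
}

/* Usage: both shapes produce the same total on sample data.  */
long
demo_accumulate (int independent)
{
  static const int a[] = { 1, 2, 3 };
  static const int b[] = { 10, 20, 30 };
  static const int c[] = { 100, 200, 300 };

  assert (demo_sum_chained (a, b, c, 3) == demo_sum_independent (a, b, c, 3));
  return independent ? demo_sum_independent (a, b, c, 3)
                     : demo_sum_chained (a, b, c, 3);
}
```

In the second shape only one addition per iteration sits on the loop-carried
dependence chain, so the remaining additions of different iterations can
overlap.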

I've bootstrapped and regression-tested this on powerpc64-linux with no
new failures.  I'm still doing performance runs to assess the results,
and may still need to tweak this.  It's close, though, and since I have
upcoming vacation, I wanted to post this for comments now in hopes of
wrapping this up by the end of the week.  Please let me know what you
think.

Thanks,
Bill


2011-07-27  Bill Schmidt  

PR tree-optimization/49749
* tree-ssa-reassoc.c (get_rank): Add forward declaration.
(PHI_LOOP_BIAS): New macro.
(phi_rank): New function.
(phi_propagation_rank): Likewise.
(propagate_rank): Likewise.
(get_rank): Add calls to phi_rank and propagate_rank.

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 176585)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -190,7 +190,118 @@ static long *bb_rank;
 /* Operand->rank hashtable.  */
 static struct pointer_map_t *operand_rank;
 
+/* Forward decls.  */
+static long get_rank (tree);
 
+
+/* Bias amount for loop-carried phis.  We want this to be larger than
+   the depth of any reassociation tree we can see, but not larger than
+   the rank difference between two blocks.  */
+#define PHI_LOOP_BIAS (1 << 15)
+
+/* Rank assigned to a phi statement.  If STMT is a loop-carried phi of
+   an innermost loop, and the phi has only a single use which is inside
+   the loop, then the rank is the block rank of the loop latch plus an
+   extra bias for the loop-carried dependence.  This causes expressions
+   calculated into an accumulator variable to be independent for each
+   iteration of the loop.  If STMT is some other phi, the rank is the
+   block rank of its containing block.  */
+static long
+phi_rank (gimple stmt)
+{
+  basic_block bb = gimple_bb (stmt);
+  struct loop *father = bb->loop_father;
+  tree res;
+  unsigned i;
+  use_operand_p use;
+  gimple use_stmt;
+
+  /* We only care about real loops (those with a latch).  */
+  if (!father->latch)
+return bb_rank[bb->index];
+
+  /* Interesting phis must be in headers of innermost loops.  */
+  if (bb != father->header
+  || father->inner)
+return bb_rank[bb->index];
+
+  /* Ignore virtual SSA_NAMEs.  */
+  res = gimple_phi_result (stmt);
+  if (!is_gimple_reg (SSA_NAME_VAR (res)))
+return bb_rank[bb->index];
+
+  /* The phi definition must have a single use, and that use must be
+ within the loop.  Otherwise this isn't an accumulator pattern.  */
+  if (!single_imm_use (res, &use, &use_stmt)
+  || gimple_bb (use_stmt)->loop_father != father)
+return bb_rank[bb->index];
+
+  /* Look for phi arguments from within the loop.  If found, bias this phi.  */
+  for (i = 0; i < gimple_phi_num_args (stmt); i++)
+{
+  tree arg = gimple_phi_arg_def (stmt, i);
+  if (TREE_CODE (arg) == SSA_NAME
+ && !SSA_NAME_IS_DEFAULT_DEF (arg))
+   {
+ gimple def_stmt = SSA_NAME_DEF_STMT (arg);
+ if (gimple_bb (def_stmt)->loop_father == father)
+   return bb_rank[father->latch->index] + PHI_LOOP_BIAS;
+   }
+}
+
+  /* Must be an uninteresting phi.  */
+  return bb_rank[bb->index];
+}
+
+/* If EXP is an SSA_NAME defined by a PHI statement that represents a
+   loop-carried dependence of an innermost loop, return the block rank
+   of the defining PHI statement.  Otherwise return zero.
+
+   The motivation for this is that we can't propagate the biased rank
+   of the loop-carried phi, as this defeats the purpose of the bias.
+   However, the rank of a value that depends on the result of a loop-
+   carried phi should still be higher than the rank of a value that
+   depends on values from more distant blocks.  */
+static long
+phi_propagation_rank (tree exp)
+{
+  gimple phi_stmt;
+  long block_rank;
+
+  if (TREE_CODE (exp) != SSA_NAME
+  || SSA_NAME_IS_DEFAULT_DEF (exp))
+return 0;
+
+  phi_stmt = SSA_NAME_DEF_STMT (exp);
+
+  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
+return 0;
+
+  /* Non-loop-carried phis have block rank.  Loop-carried phis have
+ an additional bias added in.  If this phi doesn't have block rank,
+ it's biased and should not be propagated.  */
+  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
+
+  if (phi_rank (phi_stmt) != block_rank)
+return block_rank;
+
+  return 0;
+}
+
+/* Retu

Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-27 Thread Richard Henderson
On 07/27/2011 06:21 AM, Georg-Johann Lay wrote:
> +(define_insn_and_split "*mulsi3"
> +  [(set (match_operand:SI 0 "pseudo_register_operand"  
> "=r")
> +(mult:SI (match_operand:SI 1 "pseudo_register_operand"  
> "r")
> + (match_operand:SI 2 "pseudo_register_or_const_int_operand" 
> "rn")))
> +   (clobber (reg:DI 18))]
> +  "AVR_HAVE_MUL && !reload_completed"
> +  { gcc_unreachable(); }
> +  "&& 1"
> +  [(set (reg:SI 18)
> +(match_dup 1))

That seems like it's guaranteed to force an unnecessary move.
Have you tried defining special-purpose register classes to
force reload to move the data into the right hard regs?

E.g.  "Y" prefix
  "QHS" size
  two digit starting register number, as needed.

You'll probably end up with quite a few register classes 
out of this, but hopefully reload can do a better job than
you can manually...


r~


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Richard Guenther
On Wed, Jul 27, 2011 at 4:56 PM, Richard Guenther
 wrote:
> On Wed, Jul 27, 2011 at 4:52 PM, Richard Guenther
>  wrote:
>> On Tue, Jul 26, 2011 at 7:38 PM, Jason Merrill  wrote:
>>> On 07/26/2011 10:32 AM, Aldy Hernandez wrote:

> I think the adjustment above is intended to match the adjustment of the
> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
> that bitregion_start%BITS_PER_UNIT == 0.

 That was intentional. bitregion_start always falls on a byte boundary,
 does it not?
>>>
>>> Ah, yes, of course, it's bitnum that might not.  The code changes look good,
>>> then.
>>
>> Looks like this was an approval ...
>>
>> Anyway, I don't think a --param is appropriate to control a flag whether
>> to allow store data-races to be created.  Why not use a regular option 
>> instead?
>>
>> I believe that any after-the-fact attempt to recover bitfield boundaries is
>> going to fail unless you preserve more information during bitfield layout.
>>
>> Consider
>>
>> struct {
>>  char : 8;
>>  char : 0;
>>  char : 8;
>> };
>>
>> where the : 0 isn't preserved in any way and you can't distinguish
>> it from struct { char : 8; char : 8; }.
>
> Oh, and
>
>   INNERDECL is the actual object being referenced.
>
>      || (!ptr_deref_may_alias_global_p (innerdecl)
>
> is surely not what you want.  That asks if *innerdecl is global memory.
> I suppose you want is_global_var (innerdecl)?  But with
>
>          && (DECL_THREAD_LOCAL_P (innerdecl)
>              || !TREE_STATIC (innerdecl
>
> you can simply skip this test.  Or what was it supposed to do?

And

  t = build3 (COMPONENT_REF, TREE_TYPE (exp),
              unshare_expr (TREE_OPERAND (exp, 0)),
              fld, NULL_TREE);
  get_inner_reference (t, &bitsize, &bitpos, &offset,
                       &mode, &unsignedp, &volatilep, true);

for each field of a struct type is of course ... gross!  In fact you already
have the FIELD_DECL in the single caller!  Yes I know there is not
enough information preserved by bitfield layout - see my previous reply.

  if (TREE_CODE (to) == COMPONENT_REF
      && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
    get_bit_range (&bitregion_start, &bitregion_end,
                   to, tem, bitpos, bitsize);

and shouldn't this test DECL_BIT_FIELD instead of DECL_BIT_FIELD_TYPE?

Richard.


Re: [PATCH, i386, testsuite] New BMI testcases

2011-07-27 Thread Kirill Yukhin
Thanks for the input.
Sure, lzcnt is useless here. I have updated and tested BMI detection in the test driver.

testsuite/ChangeLog entry:
2011-07-27  Yukhin Kirill  

* gcc.target/i386/i386.exp (check_effective_target_bmi): New.
* gcc.target/i386/bmi-andn-1.c: New test.
* gcc.target/i386/bmi-andn-1a.c: Likewise.
* gcc.target/i386/bmi-andn-2.c: Likewise.
* gcc.target/i386/bmi-andn-2a.c: Likewise.
* gcc.target/i386/bmi-bextr-1.c: Likewise.
* gcc.target/i386/bmi-bextr-1a.c: Likewise.
* gcc.target/i386/bmi-bextr-2.c: Likewise.
* gcc.target/i386/bmi-bextr-2a.c: Likewise.
* gcc.target/i386/bmi-blsi-1.c: Likewise.
* gcc.target/i386/bmi-blsi-1a.c: Likewise.
* gcc.target/i386/bmi-blsi-2.c: Likewise.
* gcc.target/i386/bmi-blsi-2a.c: Likewise.
* gcc.target/i386/bmi-blsmsk-1.c: Likewise.
* gcc.target/i386/bmi-blsmsk-1a.c: Likewise.
* gcc.target/i386/bmi-blsmsk-2.c: Likewise.
* gcc.target/i386/bmi-blsmsk-2a.c: Likewise.
* gcc.target/i386/bmi-blsr-1.c: Likewise.
* gcc.target/i386/bmi-blsr-1a.c: Likewise.
* gcc.target/i386/bmi-blsr-2.c: Likewise.
* gcc.target/i386/bmi-blsr-2a.c: Likewise.
* gcc.target/i386/bmi-tzcnt-1.c: Likewise.
* gcc.target/i386/bmi-tzcnt-1a.c: Likewise.
* gcc.target/i386/bmi-tzcnt-2.c: Likewise.
* gcc.target/i386/bmi-tzcnt-2a.c: Likewise.


New patch is attached.
Is it OK?

Thanks, K

On Wed, Jul 27, 2011 at 6:23 PM, H.J. Lu  wrote:
> On Wed, Jul 27, 2011 at 7:08 AM, Kirill Yukhin  
> wrote:
>> Hi,
>> I've implemented a dozen tests which cover the BMI extensions.
>>
>> testsuite/ChangeLog entry:
>> 2011-07-27  Yukhin Kirill  
>>
>>        * gcc.target/i386/i386.exp (check_effective_target_bmi): New.
>>        * gcc.target/i386/bmi-bextr-1.c: New test.
>>        * gcc.target/i386/bmi-bextr-1a.c: Likewise.
>>        * gcc.target/i386/bmi-bextr-2.c: Likewise.
>>        * gcc.target/i386/bmi-bextr-2a.c: Likewise.
>>        * gcc.target/i386/bmi-blsi-1.c: Likewise.
>>        * gcc.target/i386/bmi-blsi-1a.c: Likewise.
>>        * gcc.target/i386/bmi-blsi-2.c: Likewise.
>>        * gcc.target/i386/bmi-blsi-2a.c: Likewise.
>>        * gcc.target/i386/bmi-blsmsk-1.c: Likewise.
>>        * gcc.target/i386/bmi-blsmsk-1a.c: Likewise.
>>        * gcc.target/i386/bmi-blsmsk-2.c: Likewise.
>>        * gcc.target/i386/bmi-blsmsk-2a.c: Likewise.
>>        * gcc.target/i386/bmi-blsr-1.c: Likewise.
>>        * gcc.target/i386/bmi-blsr-1a.c: Likewise.
>>        * gcc.target/i386/bmi-blsr-2.c: Likewise.
>>        * gcc.target/i386/bmi-blsr-2a.c: Likewise.
>>        * gcc.target/i386/bmi-lzcnt-1.c: Likewise.
>>        * gcc.target/i386/bmi-lzcnt-1a.c: Likewise.
>>        * gcc.target/i386/bmi-lzcnt-2.c: Likewise.
>>        * gcc.target/i386/bmi-lzcnt-2a.c: Likewise.
>
> Are you sure your patch has those lzcnt tests?
>
>>        * gcc.target/i386/bmi-tzcnt-1.c: Likewise.
>>        * gcc.target/i386/bmi-tzcnt-1a.c: Likewise.
>>        * gcc.target/i386/bmi-tzcnt-2.c: Likewise.
>>        * gcc.target/i386/bmi-tzcnt-2a.c: Likewise.
>>
>
> BMI check:
>
>  __asm__ ("xchg{l}\t{%%}ebx, %1\n\t"
> +          "cpuid\n\t"
> +          "xchg{l}\t{%%}ebx, %1\n\t"
> +          : "=a" (eax), "=r" (ebx), "=c" (ecx), "=d" (edx)
> +          : "0" (7), "2" (0));
>
> is wrong.  It should be
>
>  if (__get_cpuid_max (0, NULL) < 7)
>    return 0;
>
>  __cpuid_count (7, 0, eax, ebx, ecx, edx);
>
>
> --
> H.J.
>


bmi1-2.testcases.gcc.patch
Description: Binary data


Re: Fix typo in internal documents

2011-07-27 Thread Paulo J. Matos

Today is the 27th, not 26th so the Changelog should be:
2011-07-27 Paulo J. Matos 
* Fix internal documentation typo. TERGET should be TARGET.

On 27/07/11 15:21, Paulo J. Matos wrote:

There is a typo in the internal documentation. This patch fixes this.

Please let me know if the patch is not in the required format.

PMatos

2011-07-26 Paulo J. Matos 

* Fix internal documentation typo. TERGET should be TARGET.



--
PMatos



Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Richard Guenther
On Wed, Jul 27, 2011 at 4:52 PM, Richard Guenther
 wrote:
> On Tue, Jul 26, 2011 at 7:38 PM, Jason Merrill  wrote:
>> On 07/26/2011 10:32 AM, Aldy Hernandez wrote:
>>>
>>>> I think the adjustment above is intended to match the adjustment of the
>>>> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
>>>> that bitregion_start%BITS_PER_UNIT == 0.
>>>
>>> That was intentional. bitregion_start always falls on a byte boundary,
>>> does it not?
>>
>> Ah, yes, of course, it's bitnum that might not.  The code changes look good,
>> then.
>
> Looks like this was an approval ...
>
> Anyway, I don't think a --param is appropriate to control a flag whether
> to allow store data-races to be created.  Why not use a regular option 
> instead?
>
> I believe that any after-the-fact attempt to recover bitfield boundaries is
> going to fail unless you preserve more information during bitfield layout.
>
> Consider
>
> struct {
>  char : 8;
>  char : 0;
>  char : 8;
> };
>
> where the : 0 isn't preserved in any way and you can't distinguish
> it from struct { char : 8; char : 8; }.

Oh, and

   INNERDECL is the actual object being referenced.

  || (!ptr_deref_may_alias_global_p (innerdecl)

is surely not what you want.  That asks if *innerdecl is global memory.
I suppose you want is_global_var (innerdecl)?  But with

  && (DECL_THREAD_LOCAL_P (innerdecl)
  || !TREE_STATIC (innerdecl

you can simply skip this test.  Or what was it supposed to do?

Richard.


Re: Support -mcpu=native on Tru64 UNIX

2011-07-27 Thread Richard Henderson
On 07/27/2011 04:57 AM, Rainer Orth wrote:
> The following patch does so for -mcpu=native/-mtune=native on Tru64
> UNIX, using getsysinfo(2).  A non-bootstrap C-only build is currently
> running, the options above work as expected.

I hadn't realized that the =native detection wasn't being done
via __builtin_implver and __builtin_amask.  Seems to me that
we should just use that and eliminate all the OS-specific stuff.


r~


Re: PATCH: PR target/49860: [x32] Error: cannot represent relocation type BFD_RELOC_64 in x32 mode

2011-07-27 Thread Uros Bizjak
On Wed, Jul 27, 2011 at 3:28 PM, Uros Bizjak  wrote:

>>>> Pmode is still in DImode and DImode addresses are *valid* addresses.
>>>> For the testcase from PR,
>>>> expand generates SImode symbol that is later extended to DImode and
>>>> handled through movabs.
>>>>
>>>> Your patch just papers over this fact. Please see how
>>>> *movdi_internal_rex64 handles immediates.

>>>
>>> For the testcase in:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49860
>>>
>>> my goal is  to make TARGET_X32 to generate code very similar to
>>> TARGET_32BIT, except for r8-r15.  How can we achieve that?
>>
>> Unless you can prevent DImode symbols in ix86_legitimate_constant_p, we 
>> can't.
>
> _Perhaps_ we can reject large offsets for X32 from
> ix86_legitimate_constant_p here:
>
>      if (GET_CODE (x) == PLUS)
>        {
>          if (!CONST_INT_P (XEXP (x, 1)))
>            return false;
>          x = XEXP (x, 0);
>        }
>
> (I didn't test this idea yet).

This won't fly:

pr49860.c:18:1: error: unrecognizable insn:
(insn 61 65 62 4 (set (reg:DI 96)
(const:DI (plus:DI (symbol_ref:DI ("outbuf") [flags 0x40]
)
(const_int -4294967295 [0x0001] -1
 (nil))

Uros.


Re: [C++0x] contiguous bitfields race implementation

2011-07-27 Thread Richard Guenther
On Tue, Jul 26, 2011 at 7:38 PM, Jason Merrill  wrote:
> On 07/26/2011 10:32 AM, Aldy Hernandez wrote:
>>
>>> I think the adjustment above is intended to match the adjustment of the
>>> address by bitregion_start/BITS_PER_UNIT, but the above seems to assume
>>> that bitregion_start%BITS_PER_UNIT == 0.
>>
>> That was intentional. bitregion_start always falls on a byte boundary,
>> does it not?
>
> Ah, yes, of course, it's bitnum that might not.  The code changes look good,
> then.

Looks like this was an approval ...

Anyway, I don't think a --param is appropriate to control a flag whether
to allow store data-races to be created.  Why not use a regular option instead?

I believe that any after-the-fact attempt to recover bitfield boundaries is
going to fail unless you preserve more information during bitfield layout.

Consider

struct {
  char : 8;
  char : 0;
  char : 8;
};

where the : 0 isn't preserved in any way and you can't distinguish
it from struct { char : 8; char : 8; }.
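
As a rough illustration of the point above (a sketch only, not taken from
the patch; the struct and function names are invented for this example,
and the fields are named so the code stays valid C): with 8-bit fields the
layout with a zero-width separator typically has the same size as the
layout without it, so the finished layout alone does not reveal where the
separator was.

```c
#include <stddef.h>

/* Hypothetical types, for illustration only.  */
struct with_zero_width
{
  char a : 8;
  char : 0;	/* Zero-width bitfield: starts a new memory location.  */
  char b : 8;
};

struct without_zero_width
{
  char a : 8;
  char b : 8;
};

/* Return 1 when both structs occupy the same number of bytes, i.e. when
   the zero-width field left no trace in the size of the type.  */
int
layouts_have_same_size (void)
{
  return sizeof (struct with_zero_width)
	 == sizeof (struct without_zero_width);
}
```

On the usual GCC targets both structs should come out as two bytes, which
is why the boundary would have to be recorded during layout rather than
recovered afterwards.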

Richard.

> Jason
>


Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread Kirill Yukhin
Good point, I forgot about ABM's other instruction, popcnt.
Will do.

Thanks, K

On Wed, Jul 27, 2011 at 6:20 PM, H.J. Lu  wrote:
> On Wed, Jul 27, 2011 at 7:06 AM, Kirill Yukhin  
> wrote:
>> Okay,
>> Uros, thanks for correcting me. Here are the updated ChangeLogs and patch.
>>
>> ChangeLog entry:
>> 2011-07-27  Kirill Yukhin  
>>
>>        PR target/49547
>>        * config/i386/abmintrin.h (head): Check if __LZCNT__ is defined.
>>        (__lzcnt): Rename to ...
>>        (__lzcnt32): ... this.
>>        * config/i386/bmiintrin.h (head): Update copyright year.
>>        (__lzcnt_u16): Removed.
>>        (__lzcnt_u32): Removed.
>>        (__lzcnt_u64): Likewise.
>>        * config/i386/cpuid.h: New define.
>>        * config/i386/driver-i386.c (host_detect_local_cpu): Detect
>>        LZCNT feature.
>>        * config/i386/i386-c.c (ix86_target_macros_internal): Define
>>        __LZCNT__ if needed.
>>        * config/i386/i386.c (ix86_target_string): New option -mlzcnt.
>>        (ix86_option_override_internal): Handle LZCNT option.
>>        (ix86_valid_target_attribute_inner_p): Likewise.
>>        (struct builtin_description bdesc_args) : Update.
>>        * config/i386/i386.h (TARGET_LZCNT): New.
>>        (CLZ_DEFINED_VALUE_AT_ZERO): Update.
>>        * config/i386/i386.md (clz2): Update insn constraint.
>>        (clz2_lzcnt): Likewise.
>>        * doc/invoke.texi: Mention -mlzcnt option.
>>        * doc/extend.texi: Likewise.
>
> Please mention config/i386/i386.opt.  It is very odd to include
> abmintrin.h for lzcnt.  What if someone decides to add new intrinsics
> for ABM?  I think it should be renamed to lzcntintrin.h and make
> abmintrin.h include it instead.
>
> H.J.
>


Re: [testsuite] Provide and use mmap effective-target keyword

2011-07-27 Thread Rainer Orth
Ulrich,

> ChangeLog:
>
>   * lib/target-supports.exp (check_effective_target_mmap): Use
>   check_function_available.

Ok, thanks.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] PR45450: disable legality check after an openscop read

2011-07-27 Thread Sebastian Pop
Hi,

I will commit this patch to trunk after regstrap.

Sebastian

2011-07-23  Sebastian Pop  

PR middle-end/45450
* graphite-poly.c (apply_poly_transforms): Disable legality check
after an openscop read.
---
 gcc/ChangeLog   |6 ++
 gcc/graphite-poly.c |6 +-
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 266dd28..4dbca71 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2011-07-23  Sebastian Pop  
 
+   PR middle-end/45450
+   * graphite-poly.c (apply_poly_transforms): Disable legality check
+   after an openscop read.
+
+2011-07-23  Sebastian Pop  
+
PR middle-end/47691
* graphite-clast-to-gimple.c (translate_clast_user): Update use of
copy_bb_and_scalar_dependences.
diff --git a/gcc/graphite-poly.c b/gcc/graphite-poly.c
index bfdbc9f..db5b0cb 100644
--- a/gcc/graphite-poly.c
+++ b/gcc/graphite-poly.c
@@ -738,7 +738,11 @@ apply_poly_transforms (scop_p scop)
   graphite_file = init_graphite_in_file (file_scop_number);
   transform_done |= graphite_read_scop_file (graphite_file, scop);
 
-  if (!graphite_legal_transform (scop))
+  /* We cannot check for the legality of the transform here: there
+     are cases where graphite_legal_transform cannot determine the
+     dependence at compile time.  For an example, see the
+     explanation of why http://gcc.gnu.org/PR45450 is invalid.  */
+  if (0 && !graphite_legal_transform (scop))
     fatal_error ("the graphite file read for scop %d does not contain a legal transform",
                  (int) file_scop_number);
 
-- 
1.7.4.1



Fix typo in internal documents

2011-07-27 Thread Paulo J. Matos

There is a typo in the internal documentation. This patch fixes this.

Please let me know if the patch is not in the required format.

PMatos

2011-07-26  Paulo J. Matos 

* Fix internal documentation typo. TERGET should be TARGET.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 097531f..419faf0 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2851,7 +2851,7 @@ A target hook returns the maximum number of consecutive registers
 of class @var{rclass} needed to hold a value of mode @var{mode}.
 
 This is closely related to the macro @code{HARD_REGNO_NREGS}.  In fact,
-the value returned by @code{TERGET_CLASS_MAX_NREGS (@var{rclass},
+the value returned by @code{TARGET_CLASS_MAX_NREGS (@var{rclass},
 @var{mode})} target hook should be the maximum value of
 @code{HARD_REGNO_NREGS (@var{regno}, @var{mode})} for all @var{regno}
 values in the class @var{rclass}.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 01beeb4..0a3a396 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2837,7 +2837,7 @@ A target hook returns the maximum number of consecutive registers
 of class @var{rclass} needed to hold a value of mode @var{mode}.
 
 This is closely related to the macro @code{HARD_REGNO_NREGS}.  In fact,
-the value returned by @code{TERGET_CLASS_MAX_NREGS (@var{rclass},
+the value returned by @code{TARGET_CLASS_MAX_NREGS (@var{rclass},
 @var{mode})} target hook should be the maximum value of
 @code{HARD_REGNO_NREGS (@var{regno}, @var{mode})} for all @var{regno}
 values in the class @var{rclass}.


Re: [PATCH PR43513, 1/3] Replace vla with array - Implementation.

2011-07-27 Thread Richard Guenther
On Wed, 27 Jul 2011, Tom de Vries wrote:

> On 07/27/2011 02:12 PM, Richard Guenther wrote:
> > On Wed, 27 Jul 2011, Tom de Vries wrote:
> > 
> >> On 07/27/2011 01:50 PM, Tom de Vries wrote:
> >>> Hi Richard,
> >>>
> >>> I have a patch set for bug 43513 - The stack pointer is adjusted twice.
> >>>
> >>> 01_pr43513.3.patch
> >>> 02_pr43513.3.test.patch
> >>> 03_pr43513.3.mudflap.patch
> >>>
> >>> The patch set has been bootstrapped and reg-tested on x86_64.
> >>>
> >>> I will send out the patches individually.
> >>>
> >>
> >> The patch replaces a vla __builtin_alloca that has a constant argument
> >> with an array declaration.
> >>
> >> OK for trunk?
> > 
> > I don't think it is safe to try to get at the VLA type the way you do.
> 
> I don't understand in what way it's not safe. Do you mean I don't manage to
> find the type always, or that I find the wrong type, or something else?

I think you might get the wrong type, you also do not transform code
like

  int *p = alloca(4);
  *p = 3;

as there is no array type involved here.

> > In fact I would simply do sth like
> > 
> >   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
> >   n_elem = size * 8 / BITS_PER_UNIT;
> >   array_type = build_array_type_nelts (elem_type, n_elem);
> >   var = create_tmp_var (array_type, NULL);
> >   return fold_convert (TREE_TYPE (lhs), build_fold_addr_expr (var));
> > 
> 
> I tried this code on the example, and it works, but the newly declared type
> has an 8-bit alignment, while the vla base type has a 32 bit alignment.  This
> makes the memory access in the example potentially unaligned, which prohibits
> an ivopts optimization, so the resulting text size is 68 instead of the 64
> achieved with my current patch.

Ok, so then set DECL_ALIGN of the variable to something reasonable
like MIN (size * 8, GET_MODE_PRECISION (word_mode)).  Basically the
alignment that the target's alloca function would guarantee.
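
To make the formula concrete, here is a standalone sketch in plain C.  It
is not GCC-internal code: the function name and parameters are invented
for illustration, and word_bits merely stands in for
GET_MODE_PRECISION (word_mode).

```c
/* Hypothetical helper: alignment in bits for an array that replaces
   alloca (size_bytes), i.e. MIN (size * 8, word_bits).  */
unsigned int
promoted_array_align_bits (unsigned long size_bytes, unsigned int word_bits)
{
  unsigned long size_bits = size_bytes * 8;

  /* Never ask for more than word alignment.  */
  return size_bits < word_bits ? (unsigned int) size_bits : word_bits;
}
```

For the alloca(4) example with a 64-bit word this gives 32 bits, matching
the alignment of the vla base type mentioned above.  Sizes that are not a
power of two (say 3 bytes) give a value that is not a valid alignment, so
real code would presumably have to round it; that detail is left out here.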

> > And obviously you lose the optimization we arrange with inserting
> > __builtin_stack_save/restore pairs that way - stack space will no
> > longer be shared for subsequent VLAs.  Which means that you'd
> > better limit the size you allow this promotion.
> > 
> 
> Right, I could introduce a parameter for this.

I would think you could use PARAM_LARGE_STACK_FRAME for now and say,
allow a size of PARAM_LARGE_STACK_FRAME / 10?

> > Alternatively this promotion could happen alongside
> > optimize_stack_restore using more global knowledge of the effects
> > on the maximum stack size this folding produces.
> > 
> 
> OK, I'll look into this.

Thanks,
Richard.


Re: [PATCH, i386, testsuite] New BMI testcases

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 7:08 AM, Kirill Yukhin  wrote:
> Hi,
> I've implemented a dozen tests which cover the BMI extensions.
>
> testsuite/ChangeLog entry:
> 2011-07-27  Yukhin Kirill  
>
>        * gcc.target/i386/i386.exp (check_effective_target_bmi): New.
>        * gcc.target/i386/bmi-bextr-1.c: New test.
>        * gcc.target/i386/bmi-bextr-1a.c: Likewise.
>        * gcc.target/i386/bmi-bextr-2.c: Likewise.
>        * gcc.target/i386/bmi-bextr-2a.c: Likewise.
>        * gcc.target/i386/bmi-blsi-1.c: Likewise.
>        * gcc.target/i386/bmi-blsi-1a.c: Likewise.
>        * gcc.target/i386/bmi-blsi-2.c: Likewise.
>        * gcc.target/i386/bmi-blsi-2a.c: Likewise.
>        * gcc.target/i386/bmi-blsmsk-1.c: Likewise.
>        * gcc.target/i386/bmi-blsmsk-1a.c: Likewise.
>        * gcc.target/i386/bmi-blsmsk-2.c: Likewise.
>        * gcc.target/i386/bmi-blsmsk-2a.c: Likewise.
>        * gcc.target/i386/bmi-blsr-1.c: Likewise.
>        * gcc.target/i386/bmi-blsr-1a.c: Likewise.
>        * gcc.target/i386/bmi-blsr-2.c: Likewise.
>        * gcc.target/i386/bmi-blsr-2a.c: Likewise.
>        * gcc.target/i386/bmi-lzcnt-1.c: Likewise.
>        * gcc.target/i386/bmi-lzcnt-1a.c: Likewise.
>        * gcc.target/i386/bmi-lzcnt-2.c: Likewise.
>        * gcc.target/i386/bmi-lzcnt-2a.c: Likewise.

Are you sure your patch has those lzcnt tests?

>        * gcc.target/i386/bmi-tzcnt-1.c: Likewise.
>        * gcc.target/i386/bmi-tzcnt-1a.c: Likewise.
>        * gcc.target/i386/bmi-tzcnt-2.c: Likewise.
>        * gcc.target/i386/bmi-tzcnt-2a.c: Likewise.
>

BMI check:

 __asm__ ("xchg{l}\t{%%}ebx, %1\n\t"
+  "cpuid\n\t"
+  "xchg{l}\t{%%}ebx, %1\n\t"
+  : "=a" (eax), "=r" (ebx), "=c" (ecx), "=d" (edx)
+  : "0" (7), "2" (0));

is wrong.  It should be

  if (__get_cpuid_max (0, NULL) < 7)
    return 0;

  __cpuid_count (7, 0, eax, ebx, ecx, edx);
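
For reference, a self-contained sketch of such a check using the helpers
from GCC's <cpuid.h>.  The function names cpu_has_bmi1 and ebx_has_bmi1
are made up for this example, and the non-x86 fallback is only there so
the file compiles elsewhere; BMI1 is reported in leaf 7, subleaf 0, EBX
bit 3.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* BMI1 feature flag: CPUID leaf 7, subleaf 0, EBX bit 3.  */
#define BMI1_EBX_BIT (1u << 3)

/* Hypothetical helper: test the BMI1 bit in a raw EBX value.  */
int
ebx_has_bmi1 (unsigned int ebx)
{
  return (ebx & BMI1_EBX_BIT) != 0;
}

/* Hypothetical helper: return 1 if the running CPU reports BMI1.  */
int
cpu_has_bmi1 (void)
{
#if defined(__i386__) || defined(__x86_64__)
  unsigned int eax, ebx, ecx, edx;

  /* Leaf 7 must exist before it can be queried.  */
  if (__get_cpuid_max (0, 0) < 7)
    return 0;

  __cpuid_count (7, 0, eax, ebx, ecx, edx);
  return ebx_has_bmi1 (ebx);
#else
  return 0;
#endif
}
```

Checking the maximum leaf first matters because the result of querying a
leaf the CPU does not implement cannot be trusted as a feature mask.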


-- 
H.J.


Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 7:06 AM, Kirill Yukhin  wrote:
> Okay,
> Uros, thanks for correcting me. Here are the updated ChangeLogs and patch.
>
> ChangeLog entry:
> 2011-07-27  Kirill Yukhin  
>
>        PR target/49547
>        * config/i386/abmintrin.h (head): Check if __LZCNT__ is defined.
>        (__lzcnt): Rename to ...
>        (__lzcnt32): ... this.
>        * config/i386/bmiintrin.h (head): Update copyright year.
>        (__lzcnt_u16): Removed.
>        (__lzcnt_u32): Removed.
>        (__lzcnt_u64): Likewise.
>        * config/i386/cpuid.h: New define.
>        * config/i386/driver-i386.c (host_detect_local_cpu): Detect
>        LZCNT feature.
>        * config/i386/i386-c.c (ix86_target_macros_internal): Define
>        __LZCNT__ if needed.
>        * config/i386/i386.c (ix86_target_string): New option -mlzcnt.
>        (ix86_option_override_internal): Handle LZCNT option.
>        (ix86_valid_target_attribute_inner_p): Likewise.
>        (struct builtin_description bdesc_args) : Update.
>        * config/i386/i386.h (TARGET_LZCNT): New.
>        (CLZ_DEFINED_VALUE_AT_ZERO): Update.
>        * config/i386/i386.md (clz2): Update insn constraint.
>        (clz2_lzcnt): Likewise.
>        * doc/invoke.texi: Mention -mlzcnt option.
>        * doc/extend.texi: Likewise.

Please mention config/i386/i386.opt.  It is very odd to include
abmintrin.h for lzcnt.  What if someone decides to add new intrinsics
for ABM?  I think it should be renamed to lzcntintrin.h and make
abmintrin.h include it instead.

H.J.


Re: [PATCH PR43513, 1/3] Replace vla with array - Implementation.

2011-07-27 Thread Tom de Vries
On 07/27/2011 02:12 PM, Richard Guenther wrote:
> On Wed, 27 Jul 2011, Tom de Vries wrote:
> 
>> On 07/27/2011 01:50 PM, Tom de Vries wrote:
>>> Hi Richard,
>>>
>>> I have a patch set for bug 43513 - The stack pointer is adjusted twice.
>>>
>>> 01_pr43513.3.patch
>>> 02_pr43513.3.test.patch
>>> 03_pr43513.3.mudflap.patch
>>>
>>> The patch set has been bootstrapped and reg-tested on x86_64.
>>>
>>> I will send out the patches individually.
>>>
>>
>> The patch replaces a vla __builtin_alloca that has a constant argument with
>> an array declaration.
>>
>> OK for trunk?
> 
> I don't think it is safe to try to get at the VLA type the way you do.

I don't understand in what way it's not safe. Do you mean I don't manage to find
the type always, or that I find the wrong type, or something else?

> In fact I would simply do sth like
> 
>   elem_type = build_nonstandard_integer_type (BITS_PER_UNIT, 1);
>   n_elem = size * 8 / BITS_PER_UNIT;
>   array_type = build_array_type_nelts (elem_type, n_elem);
>   var = create_tmp_var (array_type, NULL);
>   return fold_convert (TREE_TYPE (lhs), build_fold_addr_expr (var));
> 

I tried this code on the example, and it works, but the newly declared type has
an 8-bit alignment, while the vla base type has a 32 bit alignment.  This makes
the memory access in the example potentially unaligned, which prohibits an
ivopts optimization, so the resulting text size is 68 instead of the 64 achieved
with my current patch.

> And obviously you lose the optimization we arrange with inserting
> __builtin_stack_save/restore pairs that way - stack space will no
> longer be shared for subsequent VLAs.  Which means that you'd
> better limit the size you allow this promotion.
> 

Right, I could introduce a parameter for this.

> Alternatively this promotion could happen alongside
> optimize_stack_restore using more global knowledge of the effects
> on the maximum stack size this folding produces.
> 

OK, I'll look into this.

Thanks,
- Tom


[PATCH, i386, testsuite] New BMI testcases

2011-07-27 Thread Kirill Yukhin
Hi,
I've implemented a dozen tests which cover the BMI extensions.

testsuite/ChangeLog entry:
2011-07-27  Yukhin Kirill  

* gcc.target/i386/i386.exp (check_effective_target_bmi): New.
* gcc.target/i386/bmi-bextr-1.c: New test.
* gcc.target/i386/bmi-bextr-1a.c: Likewise.
* gcc.target/i386/bmi-bextr-2.c: Likewise.
* gcc.target/i386/bmi-bextr-2a.c: Likewise.
* gcc.target/i386/bmi-blsi-1.c: Likewise.
* gcc.target/i386/bmi-blsi-1a.c: Likewise.
* gcc.target/i386/bmi-blsi-2.c: Likewise.
* gcc.target/i386/bmi-blsi-2a.c: Likewise.
* gcc.target/i386/bmi-blsmsk-1.c: Likewise.
* gcc.target/i386/bmi-blsmsk-1a.c: Likewise.
* gcc.target/i386/bmi-blsmsk-2.c: Likewise.
* gcc.target/i386/bmi-blsmsk-2a.c: Likewise.
* gcc.target/i386/bmi-blsr-1.c: Likewise.
* gcc.target/i386/bmi-blsr-1a.c: Likewise.
* gcc.target/i386/bmi-blsr-2.c: Likewise.
* gcc.target/i386/bmi-blsr-2a.c: Likewise.
* gcc.target/i386/bmi-lzcnt-1.c: Likewise.
* gcc.target/i386/bmi-lzcnt-1a.c: Likewise.
* gcc.target/i386/bmi-lzcnt-2.c: Likewise.
* gcc.target/i386/bmi-lzcnt-2a.c: Likewise.
* gcc.target/i386/bmi-tzcnt-1.c: Likewise.
* gcc.target/i386/bmi-tzcnt-1a.c: Likewise.
* gcc.target/i386/bmi-tzcnt-2.c: Likewise.
* gcc.target/i386/bmi-tzcnt-2a.c: Likewise.


Patch is attached.
Changes are bootstrapped and pass make check (with and
without the BMI simulator).
OK for trunk?

Thanks, K


bmi1.testcases.gcc.patch
Description: Binary data


Re: [Patch, i386, testsuite] Fix for PR49547, new tescases for lzcnt instruction

2011-07-27 Thread Kirill Yukhin
Okay,
Uros, thanks for correcting me. Here are the updated ChangeLogs and patch.

ChangeLog entry:
2011-07-27  Kirill Yukhin  

PR target/49547
* config/i386/abmintrin.h (head): Check if __LZCNT__ is defined.
(__lzcnt): Rename to ...
(__lzcnt32): ... this.
* config/i386/bmiintrin.h (head): Update copyright year.
(__lzcnt_u16): Removed.
(__lzcnt_u32): Removed.
(__lzcnt_u64): Likewise.
* config/i386/cpuid.h: New define.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
LZCNT feature.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__LZCNT__ if needed.
* config/i386/i386.c (ix86_target_string): New option -mlzcnt.
(ix86_option_override_internal): Handle LZCNT option.
(ix86_valid_target_attribute_inner_p): Likewise.
(struct builtin_description bdesc_args) : Update.
* config/i386/i386.h (TARGET_LZCNT): New.
(CLZ_DEFINED_VALUE_AT_ZERO): Update.
* config/i386/i386.md (clz2): Update insn constraint.
(clz2_lzcnt): Likewise.
* doc/invoke.texi: Mention -mlzcnt option.
* doc/extend.texi: Likewise.

testsuite/ChangeLog entry:
2011-07-27  Kirill Yukhin  

* gcc.target/i386/i386.exp (check_effective_target_lzcnt): New.
* gcc.target/i386/lzcnt-1.c: New test.
* gcc.target/i386/lzcnt-2.c: Likewise.
* gcc.target/i386/lzcnt-2a.c: Likewise.
* gcc.target/i386/lzcnt-3.c: Likewise.
* gcc.target/i386/lzcnt-4.c: Likewise.
* gcc.target/i386/lzcnt-4a.c: Likewise.
* gcc.target/i386/lzcnt-5.c: Likewise.
* gcc.target/i386/lzcnt-6.c: Likewise.
* gcc.target/i386/lzcnt-6a.c: Likewise.
* gcc.target/i386/lzcnt-check.h: Likewise.
* gcc.target/i386/sse-12.c (dg-compile): Add -mlzcnt.
* gcc.target/i386/sse-13.c: Likewise.
* gcc.target/i386/sse-14.c: Likewise.
* g++.dg/other/i386-2.C: Likewise.
* g++.dg/other/i386-3.C: Likewise.

Harsha, is it OK for trunk?

Thanks, K


On Wed, Jul 27, 2011 at 3:23 PM, Uros Bizjak  wrote:
> On Wed, Jul 27, 2011 at 12:56 PM, Kirill Yukhin  
> wrote:
>> Sorry for the misunderstanding I introduced with the error in my comment.
>> Your inputs are fixed. Since they don't touch sources, just the testsuite,
>> I am posting only the updated testsuite/ChangeLog entry.
>
>> testsuite/ChangeLog entry:
>> 2011-07-27  Kirill Yukhin  
>>
>>        * gcc.target/i386/i386.exp (check_effective_target_lzcnt): New.
>>        * gcc.target/i386/lzcnt-1.c: New test.
>>        * gcc.target/i386/lzcnt-2.c: Likewise.
>>        * gcc.target/i386/lzcnt-2a.c: Likewise.
>>        * gcc.target/i386/lzcnt-3.c: New test.
>>        * gcc.target/i386/lzcnt-4.c: Likewise.
>>        * gcc.target/i386/lzcnt-4a.c: Likewise.
>>        * gcc.target/i386/lzcnt-5.c: Likewise.
>>        * gcc.target/i386/lzcnt-6.c: Likewise.
>>        * gcc.target/i386/lzcnt-6a.c: Likewise.
>>        * gcc.target/i386/lzcnt-check.h: New driver to run LZCNT-*
>>        tests only if HW available.
>
> New.
>
>>        * gcc.target/i386/sse-12.c: Added -mlzcnt switch.
>
> * gcc.target/i386/sse-12.c (dg-compile): Add -mlzcnt.
>
>>        * gcc.target/i386/sse-13.c: Likewise.
>>        * gcc.target/i386/sse-14.c: Likewise.
>>        * g++.dg/other/i386-2.C: Likewise.
>>        * g++.dg/other/i386-3.C: Likewise.
>>
>>
>> Patch attached.
>
> Patch also includes non-testsuite changes, please update ChangeLog
> entry as follows:
>
>> 2011-07-26  Kirill Yukhin  
>>
>>        PR target/49547
>>       * config/i386/abmintrin.h (head): Added check if __LZCNT__ is defined.
>
> Check if __LZCNT__ is defined.
>
>>       (__lzcnt32): Fixed name according to Spec.
>
> Rename to ...
>
>>       * config/i386/bmiintrin.h (head): Updated year for Copyright.
>
> Update copyright year.
>
>>       (__lzcnt_u16): Removed.
>>       (__lzcnt_u32): Removed.
>>       (__lzcnt_u64): Likewise.
>>       * config/i386/cpuid.h: New bit defined.
>
> * config/i386/cpuid.h (__bit_LZCNT): New define.
>
>>       * config/i386/driver-i386.c (host_detect_local_cpu): Detect
>>       LZCNT feature.
>>       * config/i386/i386-c.c (ix86_target_macros_internal): Define
>>       __LZCNT__ if needed.
>>       * config/i386/i386.c (ix86_target_string): New entry to array.
>
> New option -mlzcnt.
>
>>       (ix86_option_override_internal): Handling LZCNT option.
>
> Handle ...
>
>>       (ix86_valid_target_attribute_inner_p): Likewise.
>>       (bdesc_args): built-in for LZCNT is extended to work under
>>       another flag.
>
> (struct builtin_description bdesc_args) : Update.
>
>>       * config/i386/i386.h (TARGET_LZCNT): New.
>>       (CLZ_DEFINED_VALUE_AT_ZERO): Updated flag name.
>
> ... : Update.
>
>>       * config/i386/i386.md (clz2): Target fixed.
>
> ... : Update insn constraint.
>
>>       (clz2_lzcnt): Likewise.
>>       * doc/invoke.texi: Added mention of -mlzcnt option.
>
> Mention  -mlzcnt option

Support -mcpu=native on Solaris/SPARC

2011-07-27 Thread Rainer Orth
This is a first cut at supporting -mcpu=native/-mtune=native on
Solaris/SPARC.  Unlike its Tru64 UNIX/Alpha and IRIX/MIPS (to be
submitted soon) counterparts, it's a bit more involved:

* There's no support for -mcpu=native in the SPARC port yet.

* Access to the %ver register is privileged, so we need OS interfaces to
  access the information.  I couldn't find anything in libc.  While the
  AT_SUN_CPU file from  might fill the bill, it isn't
  actually set according to pargs -x.

  There seem to be two options: libkstat and libpicl.  The former has
  the advantage that it's a tad better documented and talks directly to
  the kernel, while the latter needs picld, which seems overkill.  Both
  are present in Solaris 8, though.

  I prefer the cpu_info:::brand kstat over cpu_info:::implementation:

  The former looks like (from kstat -p cpu_info:::brand):

  cpu_info:0:cpu_info0:brand  UltraSPARC-T2

  compared to

  cpu_info:0:cpu_info0:implementation UltraSPARC-T2 (chipid 0, clock 1165 MHz)

  but brand was only introduced in Solaris 10.  Before that, only
  implementation existed with these contents:

  cpu_info:0:cpu_info0:implementation UltraSPARC-IIIi

* Unlike IRIX and Tru64 UNIX, where the respective interfaces return a
  numeric identifier for the cpu type from a finite range, on SPARC we
  get string names, and I'm having some trouble determining the complete
  set.  The patch below is based on what I've found so far, but
  certainly needs to be augmented for sun4m cpus which I don't have any
  longer.

* The requirement to link the drivers with an additional library
  (-lkstat) prompted me to introduce EXTRA_GCC_LIBS.  I didn't want to
  link the backends with -lkstat since they don't need it.  The build
  maintainers may not like the way this was done, though.

* Right now, this is Solaris-only since I have no idea what
  /proc/cpuinfo on Linux/SPARC contains.
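
To illustrate the string-matching step described above, here is a small
portable sketch.  It is not the code from the patch: the table, the
function name cpu_for_brand, and in particular the pairing of the two
brand strings shown earlier with -mcpu values are assumptions made for
this example, and the kstat lookup itself is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a kstat brand string to a -mcpu value.  */
struct brand_map
{
  const char *brand;
  const char *cpu;
};

static const struct brand_map brand_table[] =
{
  { "UltraSPARC-T2",   "niagara2" },	/* Assumed pairing.  */
  { "UltraSPARC-IIIi", "ultrasparc3" },	/* Assumed pairing.  */
};

/* Return the -mcpu value for BRAND, or NULL if the brand is unknown.  */
const char *
cpu_for_brand (const char *brand)
{
  size_t i;

  for (i = 0; i < sizeof brand_table / sizeof brand_table[0]; i++)
    if (strcmp (brand, brand_table[i].brand) == 0)
      return brand_table[i].cpu;

  return NULL;
}
```

Returning NULL for an unknown string lets the caller fall back to a
default instead of guessing, which matters while the set of names is
incomplete.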

With all those caveats, the patch has been run through a C-only
non-bootstrap build on sparc-sun-solaris2.11 so far.
-mcpu=native/-mtune=native seem to work as expected, though I'll have to
broaden the range of OS versions tested.  I'm seeing tons of testsuite
failures for -gdwarf-2 -g3 tests, but suppose they are related to recent
debug patches.

Comments, suggestions?

Thanks.
Rainer


2011-07-27  Rainer Orth  

gcc:
* config/sparc/driver-sparc.c: New file.
* config/sparc/x-sparc: New file.
* config.host: Use driver-sparc.o, sparc/x-sparc on
sparc*-*-solaris2*.
* config/sparc/sparc.opt (native): New value for enum
processor_type.
* config/sparc/sparc-opts.h (PROCESSOR_NATIVE): Declare.
* config/sparc/sol2.h [__sparc__] (host_detect_local_cpu): Declare.
(EXTRA_SPEC_FUNCTIONS, MCPU_MTUNE_NATIVE_SPECS,
DRIVER_SELF_SPECS): Define.
* configure.ac (EXTRA_GCC_LIBS): Check for libkstat.
Substitute result.
* configure: Regenerate.
* Makefile.in (EXTRA_GCC_LIBS): Set.
(xgcc$(exeext)): Add $(EXTRA_GCC_LIBS).
(cpp$(exeext)): Likewise.

gcc/cp:
* Make-lang.in (g++$(exeext)): Add $(EXTRA_GCC_LIBS).

gcc/fortran:
* Make-lang.in (gfortran$(exeext)): Add $(EXTRA_GCC_LIBS).

gcc/go:
* Make-lang.in (gccgo$(exeext)): Add $(EXTRA_GCC_LIBS).

gcc/java:
* Make-lang.in ($(XGCJ)$(exeext)): Add $(EXTRA_GCC_LIBS).

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -723,6 +723,9 @@ EXTRA_OBJS = @extra_objs@
 # the gcc driver.
 EXTRA_GCC_OBJS =@extra_gcc_objs@
 
+# List of extra libraries that should be linked with the gcc driver.
+EXTRA_GCC_LIBS = @EXTRA_GCC_LIBS@
+
 # List of additional header files to install.
 EXTRA_HEADERS =@extra_headers_list@
 
@@ -1828,7 +1831,8 @@ libcommon.a: $(OBJS-libcommon)
 xgcc$(exeext): $(GCC_OBJS) gccspec.o libcommon-target.a $(LIBDEPS) \
 	$(EXTRA_GCC_OBJS)
 	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ $(GCC_OBJS) \
-	  gccspec.o $(EXTRA_GCC_OBJS) libcommon-target.a $(LIBS)
+	  gccspec.o $(EXTRA_GCC_OBJS) libcommon-target.a \
+	  $(EXTRA_GCC_LIBS) $(LIBS)
 
 # cpp is to cpp0 as gcc is to cc1.
 # The only difference from xgcc is that it's linked with cppspec.o
@@ -1836,7 +1840,8 @@ xgcc$(exeext): $(GCC_OBJS) gccspec.o lib
 cpp$(exeext): $(GCC_OBJS) cppspec.o libcommon-target.a $(LIBDEPS) \
 	$(EXTRA_GCC_OBJS)
 	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ $(GCC_OBJS) \
-	  cppspec.o $(EXTRA_GCC_OBJS) libcommon-target.a $(LIBS)
+	  cppspec.o $(EXTRA_GCC_OBJS) libcommon-target.a \
+	  $(EXTRA_GCC_LIBS) $(LIBS)
 
 # Dump a specs file to make -B./ read these specs over installed ones.
 $(SPECS): xgcc$(exeext)
diff --git a/gcc/config.host b/gcc/config.host
--- a/gcc/config.host
+++ b/gcc/config.host
@@ -157,6 +157,14 @@ case ${host} in
;;
 esac
 ;;
+  sparc*-*-solaris2*)
+case ${target} in
+   

Re: PATCH: PR target/49860: [x32] Error: cannot represent relocation type BFD_RELOC_64 in x32 mode

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 6:09 AM, Uros Bizjak  wrote:
> On Wed, Jul 27, 2011 at 3:00 PM, H.J. Lu  wrote:
>
>>> Pmode is still in DImode and DImode addresses are *valid* addresses.
>>> For the testcase from PR,
>>> expand generates SImode symbol that is later extended to DImode and
>>> handled through movabs.
>>>
>>> Your patch just papers over this fact. Please see how
>>> *movdi_internal_rex64 handles immediates.
>>>
>>
>> For the testcase in:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49860
>>
>> my goal is  to make TARGET_X32 to generate code very similar to
>> TARGET_32BIT, except for r8-r15.  How can we achieve that?
>
> Unless you can prevent DImode symbols in ix86_legitimate_constant_p, we can't.
>

This

	subq	$outbuf-4294967295, %rdx

comes from loop unrolling on ptr_mode and works in x32.
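
As a worked illustration of why such an operand can still be harmless when
addresses are only 32 bits wide: modulo 2^32, subtracting 4294967295 from a
symbol address is the same as adding 1.  The sketch below assumes the final
address is truncated to 32 bits; the function name is made up for the
example.

```c
#include <stdint.h>

/* Compute SYM + OFF and keep only the low 32 bits, as a 32-bit
   address space would.  The conversions to uint64_t are modular, so
   a negative OFF is handled without undefined behaviour.  */
static uint32_t
addr32 (uint32_t sym, int64_t off)
{
  return (uint32_t) ((uint64_t) sym + (uint64_t) off);
}
```

With a symbol at 0x1000, addr32 (0x1000, -4294967295LL) and
addr32 (0x1000, 1) give the same 32-bit address.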


-- 
H.J.


Re: PATCH: PR target/49860: [x32] Error: cannot represent relocation type BFD_RELOC_64 in x32 mode

2011-07-27 Thread Uros Bizjak
On Wed, Jul 27, 2011 at 3:09 PM, Uros Bizjak  wrote:
> On Wed, Jul 27, 2011 at 3:00 PM, H.J. Lu  wrote:
>
>>> Pmode is still in DImode and DImode addresses are *valid* addresses.
>>> For the testcase from PR,
>>> expand generates SImode symbol that is later extended to DImode and
>>> handled through movabs.
>>>
>>> Your patch just papers over this fact. Please see how
>>> *movdi_internal_rex64 handles immediates.
>>>
>>
>> For the testcase in:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49860
>>
>> my goal is  to make TARGET_X32 to generate code very similar to
>> TARGET_32BIT, except for r8-r15.  How can we achieve that?
>
> Unless you can prevent DImode symbols in ix86_legitimate_constant_p, we can't.

_Perhaps_ we can reject large offsets for X32 from
ix86_legitimate_constant_p here:

  if (GET_CODE (x) == PLUS)
    {
      if (!CONST_INT_P (XEXP (x, 1)))
	return false;
      x = XEXP (x, 0);
    }

(I didn't test this idea yet).
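
One possible reading of "large offsets" is any offset that does not fit in
a signed 32-bit value.  The standalone predicate below sketches that check
outside of GCC; the function name and the exact range are assumptions, not
part of any posted patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: accept OFFSET only if it is representable as
   a signed 32-bit value.  */
static bool
x32_offset_ok (int64_t offset)
{
  return offset >= INT32_MIN && offset <= INT32_MAX;
}
```

Under this assumption, an offset such as -4294967295 would be rejected,
while small offsets like 4 or -4 would still be accepted.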

Uros.


Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-27 Thread Georg-Johann Lay
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02113.html

Weddington, Eric wrote:
> 
>> -----Original Message-----
>> From: Georg-Johann Lay
>>
>> This means that a pure __mulsi3 will have 30+30+20 = 80 bytes (+18).
>>
>> If all functions are used they occupy 116 bytes (-4), so they actually
>> save a little space if they are used all with the benefit that they also
>> can one-extend, extend 32 = 16*32 as well as 32=16*16 and work for
>> small (17 bit signed) constants.
>>
>> __umulhisi3 reads:
>>
>> DEFUN __umulhisi3
>> mul A0, B0
>> movw    C0, r0
>> mul A1, B1
>> movw    C2, r0
>> mul A0, B1
>> add C1, r0
>> adc C2, r1
>> clr __zero_reg__
>> adc C3, __zero_reg__
>> mul A1, B0
>> add C1, r0
>> adc C2, r1
>> clr __zero_reg__
>> adc C3, __zero_reg__
>> ret
>> ENDF __umulhisi3
>>
>> It could be compressed to the following sequence, i.e.
>> 24 bytes instead of 30, but I think that goes too far in
>> squeezing the last byte out of the code:
>>
>> DEFUN __umulhisi3
>> mul A0, B0
>> movw    C0, r0
>> mul A1, B1
>> movw    C2, r0
>> mul A0, B1
>> rcall   1f
>> mul A1, B0
>> 1:  add C1, r0
>> adc C2, r1
>> clr __zero_reg__
>> adc C3, __zero_reg__
>> ret
>> ENDF __umulhisi3
>>
>>
>> Lacking real-world code that uses 32-bit arithmetic, I trust
>> my intuition that code size will decrease in general ;-)
>>
> 
> Hi Johann,
> 
> I would agree with you that it seems that overall code size will decrease in 
> general.
> 
> However, I also like your creative compression in the second sequence above, 
> and I think that it would be best to implement that sequence and try to find 
> others like that where possible.
> 
> Remember that to AVR users, code size is *everything*. Even saving 6 bytes 
> here or there has a positive effect.
> 
> I'll let Richard (or Denis if he's back from vacation) do the actual approval 
> of the patch, as they are a lot more technically competent in this area. But 
> I'm ok with the general tactic of the code reuse with looking at further ways 
> to reduce code size like the example above.
> 
> Eric Weddington


This is a revised patch for review with the changes proposed by Eric,
i.e. __umulhisi3 is calling its own tail.

A pure __mulsi3 will now cost 30+24+20 = 74 bytes (+12).

Using all functions will cost 110 bytes (-10).

__mulsi3 missed a final ENDF __mulsi3, I added it.
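
For anyone who wants to check the arithmetic behind __umulhisi3, here is a
C model of the four 8x8 partial products.  It is only a sketch: the name
umulhisi3_model is made up and is not part of libgcc, and the comments map
each step to the assembly only loosely.

```c
#include <stdint.h>

/* Model of a 16x16 -> 32 bit unsigned multiply built from four
   8x8 -> 16 bit products, as an 8-bit MUL instruction would supply.  */
static uint32_t
umulhisi3_model (uint16_t a, uint16_t b)
{
  uint8_t a0 = a & 0xff, a1 = a >> 8;
  uint8_t b0 = b & 0xff, b1 = b >> 8;
  uint32_t c;

  c = (uint32_t) a0 * b0;		/* low product into C1:C0 */
  c |= ((uint32_t) a1 * b1) << 16;	/* high product into C3:C2 */
  c += ((uint32_t) a0 * b1) << 8;	/* cross product, added at C1 */
  c += ((uint32_t) a1 * b0) << 8;	/* cross product, added at C1 */
  return c;
}
```

The two cross products go through the same add/adc sequence, which is what
makes it possible for the routine to call its own tail.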

The rest of the patch is just technical:

* Postponing the emission of the implicit library call from expand to
  split1, i.e. after the combiner but prior to reload, of course.

* The patch covers QI->SI extensions where such extensions are
  done in two steps:  First an explicit QI->HI extension expanded
  inline and second the implicit HI->SI extension as done by, e.g.
  __muluhisi3 (32 = 16 * 32)

* There is a bunch of possible HI/QI combinations. This is done
  with help of code iterators; the cross product covers all 16
  cases of QI->SI resp. HI->SI as signed resp. unsigned extension
  for operand1 resp. operand2.

* extendhisi2 does not need to early-clobber the output because HI
  will always start in an even register.
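
To see where the 16 cases come from, the cross product can be enumerated
outside of the machine description.  This is a plain C sketch, not how the
code iterators in avr.md work internally; the function name and buffer
sizes are made up, and the name scheme is inferred from the pattern names
in the ChangeLog below.

```c
#include <stdio.h>

/* Fill NAMES with one pattern name per combination of
   {unsigned, signed} x {unsigned, signed} x {QI, HI} x {QI, HI}
   and return how many names were written.  */
static int
avr_widening_mul_names (char names[16][24])
{
  static const char *const ext[] = { "u", "s" };
  static const char *const mode[] = { "qi", "hi" };
  int n = 0;

  for (int e1 = 0; e1 < 2; e1++)
    for (int e2 = 0; e2 < 2; e2++)
      for (int m1 = 0; m1 < 2; m1++)
	for (int m2 = 0; m2 < 2; m2++)
	  snprintf (names[n++], 24, "*%s%smul%s%ssi3",
		    ext[e1], ext[e2], mode[m1], mode[m2]);
  return n;
}
```

Two extension kinds and two source modes per operand give 2 * 2 * 2 * 2 = 16
names.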

Tested without regressions.

Ok to install?

Johann

PR target/49687
* config/avr/t-avr (LIB1ASMFUNCS): Remove _xmulhisi3_exit.
Add _muluhisi3, _mulshisi3, _usmulhisi3.
* config/avr/libgcc.S (__mulsi3): Rewrite.
(__mulhisi3): Rewrite.
(__umulhisi3): Rewrite.
(__usmulhisi3): New.
(__muluhisi3): New.
(__mulshisi3): New.
(__mulohisi3): New.
(__mulqi3, __mulqihi3, __umulqihi3, __mulhi3): Use DEFUN/ENDF to
declare.
* config/avr/predicates.md (pseudo_register_operand): Rewrite.
(pseudo_register_or_const_int_operand): New.
(combine_pseudo_register_operand): New.
(u16_operand): New.
(s16_operand): New.
(o16_operand): New.
* config/avr/avr.c (avr_rtx_costs): Handle costs for mult:SI.
* config/avr/avr.md (QIHI, QIHI2): New mode iterators.
(any_extend, any_extend2): New code iterators.
(extend_prefix): New code attribute.
(mulsi3): Rewrite. Turn insn to expander.
(mulhisi3): Ditto.
(umulhisi3): Ditto.
(usmulhisi3): New expander.
(*mulsi3): New insn-and-split.
(mulusi3): New insn-and-split.
(mulssi3): New insn-and-split.
(mulohisi3): New insn-and-split.
(*uumulqihisi3, *uumulhiqisi3, *uumulhihisi3, *uumulqiqisi3,
*usmulqihisi3, *usmulhiqisi3, *usmulhihisi3, *usmulqiqisi3,
*sumulqihisi3, *sumulhiqisi3, *sumulhihisi3, *sumulqiqisi3,
*ssmulqihisi3, *ssmulhiqisi3, *ssmulhihisi3, *ssmulqiqisi3): New
insn-and-split.
(*mulsi3_call): Rewrite.
(*mulhisi3_call): Rewrite.
(*umulhisi3_call): Rewrite.
(*usmulhisi3_call): New insn.
(*muluhisi3_call): N

Re: [PATCH, PR 49094] Refrain from creating misaligned accesses in SRA

2011-07-27 Thread Ulrich Weigand
Martin Jambor wrote:
> On Wed, Jul 27, 2011 at 02:34:59PM +0200, Ulrich Weigand wrote:
> > I'm seeing the same failure on the 4.6 branch -- would this patch also be
> > appropriate there?
> 
> You're right, it should be applied to the 4.6 branch too.  Since you
> have the setup to test it, can you do it please?  Otherwise I'll do
> it in a few days.

Sure; I've verified that the patch does fix the test case regression
on the branch as well.  A full regression test is still running ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: PATCH: PR target/49860: [x32] Error: cannot represent relocation type BFD_RELOC_64 in x32 mode

2011-07-27 Thread Uros Bizjak
On Wed, Jul 27, 2011 at 3:00 PM, H.J. Lu  wrote:

>> Pmode is still in DImode and DImode addresses are *valid* addresses.
>> For the testcase from PR,
>> expand generates SImode symbol that is later extended to DImode and
>> handled through movabs.
>>
>> Your patch just papers over this fact. Please see how
>> *movdi_internal_rex64 handles immediates.
>>
>
> For the testcase in:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49860
>
> my goal is  to make TARGET_X32 to generate code very similar to
> TARGET_32BIT, except for r8-r15.  How can we achieve that?

Unless you can prevent DImode symbols in ix86_legitimate_constant_p, we can't.

Uros.


Re: PATCH: PR target/49860: [x32] Error: cannot represent relocation type BFD_RELOC_64 in x32 mode

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 5:53 AM, Uros Bizjak  wrote:
> On Wed, Jul 27, 2011 at 2:44 PM, H.J. Lu  wrote:
>> On Wed, Jul 27, 2011 at 1:06 AM, Uros Bizjak  wrote:
>>> On Wed, Jul 27, 2011 at 6:31 AM, H.J. Lu  wrote:
>>>
>>>> The offsetted memory references always work for x32.  OK for trunk?
>>>
>>> No, this is the same issue as in [1]. Please fix the assembler to
>>> zero-extend this relocation.
>>>
>>
>> It is about address range.  The  offsetted memory references are for x32
>> since they are OK for TARGET_32BIT.  There is no difference between
>> TARGET_X32 and TARGET_32BIT on this.
>
> Pmode is still in DImode and DImode addresses are *valid* addresses.
> For the testcase from PR,
> expand generates SImode symbol that is later extended to DImode and
> handled through movabs.
>
> Your patch just papers over this fact. Please see how
> *movdi_internal_rex64 handles immediates.
>

For the testcase in:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49860

my goal is  to make TARGET_X32 to generate code very similar to
TARGET_32BIT, except for r8-r15.  How can we achieve that?

Thanks.

-- 
H.J.


Re: [cxx-mem-model] __sync_mem builtin support patch 1/3 - documentation

2011-07-27 Thread Andrew MacLeod

On 07/27/2011 04:42 AM, Torvald Riegel wrote:
> On Tue, 2011-07-26 at 21:20 -0400, Andrew MacLeod wrote:
>> This patch is simply the documentation for extend.texi which adds a
>> section about the new memory model __sync_mem routines.  I've supplied
>> the .info output since it's easier to read, followed by the patch.
>>
>> OK for the branch?
>
> I think that __SYNC_MEM_ACQUIRE does not synchronize with all stores
> but only with stores that have release semantics or stronger. Likewise
> for loads / acquire semantics.

True, I'll make that modification.
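
The pairing described above can be sketched with the C11 <stdatomic.h>
interface, used here only as a stand-in for the __sync_mem routines under
discussion.  The variable and function names are made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;		/* plain data, published via the flag */
static atomic_bool ready;	/* zero-initialized, i.e. false */

/* Writer: the release store orders the write to PAYLOAD before the
   flag becomes visible.  */
static void
publish (int value)
{
  payload = value;
  atomic_store_explicit (&ready, true, memory_order_release);
}

/* Reader: the acquire load synchronizes with the release store above
   only if it actually observes the value that store wrote.  */
static bool
try_consume (int *out)
{
  if (!atomic_load_explicit (&ready, memory_order_acquire))
    return false;
  *out = payload;
  return true;
}
```

If publish used memory_order_relaxed instead, the acquire load in
try_consume would no longer guarantee that the reader sees the payload.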

Thanks
Andrew


Re: PATCH: PR target/49860: [x32] Error: cannot represent relocation type BFD_RELOC_64 in x32 mode

2011-07-27 Thread Uros Bizjak
On Wed, Jul 27, 2011 at 2:44 PM, H.J. Lu  wrote:
> On Wed, Jul 27, 2011 at 1:06 AM, Uros Bizjak  wrote:
>> On Wed, Jul 27, 2011 at 6:31 AM, H.J. Lu  wrote:
>>
>>> The offsetted memory references always work for x32.  OK for trunk?
>>
>> No, this is the same issue as in [1]. Please fix the assembler to
>> zero-extend this relocation.
>>
>
> It is about address range.  The  offsetted memory references are for x32
> since they are OK for TARGET_32BIT.  There is no difference between
> TARGET_X32 and TARGET_32BIT on this.

Pmode is still in DImode and DImode addresses are *valid* addresses.
For the testcase from PR,
expand generates SImode symbol that is later extended to DImode and
handled through movabs.

Your patch just papers over this fact. Please see how
*movdi_internal_rex64 handles immediates.

Uros.


Re: [PATCH, PR 49094] Refrain from creating misaligned accesses in SRA

2011-07-27 Thread Martin Jambor
Hi,

On Wed, Jul 27, 2011 at 02:34:59PM +0200, Ulrich Weigand wrote:
> Martin Jambor wrote:
> 
> > OK, this is what I have just committed as revision 176797 after
> > re-testing.
> 
> Thanks, this has fixed the forwprop-5.c regression on spu-elf on mainline.
> 
> I'm seeing the same failure on the 4.6 branch -- would this patch also be
> appropriate there?
> 

You're right, it should be applied to the 4.6 branch too.  Since you
have the setup to test it, can you do it please?  Otherwise I'll do
it in a few days.

Thanks,

Martin


Re: PATCH: PR target/49860: [x32] Error: cannot represent relocation type BFD_RELOC_64 in x32 mode

2011-07-27 Thread H.J. Lu
On Wed, Jul 27, 2011 at 1:06 AM, Uros Bizjak  wrote:
> On Wed, Jul 27, 2011 at 6:31 AM, H.J. Lu  wrote:
>
>> The offsetted memory references always work for x32.  OK for trunk?
>
> No, this is the same issue as in [1]. Please fix the assembler to
> zero-extend this relocation.
>

It is about address range.  The  offsetted memory references are for x32
since they are OK for TARGET_32BIT.  There is no difference between
TARGET_X32 and TARGET_32BIT on this.


-- 
H.J.


Re: [patch tree-optimization]: Remove dead-code from gimple-fold

2011-07-27 Thread Richard Guenther
On Wed, Jul 27, 2011 at 1:49 PM, Kai Tietz  wrote:
> Hello,
>
> this patch removes from gimple-fold the dead-code about 
> TRUTH_AND/TRUTH_OR-expression checks.
>
> ChangeLog
>
>        * gimple-fold.c (or_comparisons_1): Remove TRUTH_AND/OR
>        expression handling.
>        (and_var_with_comparison_1): Likewise.
>
> Bootstrapped and regression tested on host x86_64-pc-linux-gnu.  Ok for apply?

Ok.

Thanks,
Richard.

> Regards,
> Kai
>
> Index: gcc-head/gcc/gimple-fold.c
> ===
> --- gcc-head.orig/gcc/gimple-fold.c
> +++ gcc-head/gcc/gimple-fold.c
> @@ -1937,17 +1937,15 @@ and_var_with_comparison_1 (gimple stmt,
>
>   /* If the definition is an AND or OR expression, we may be able to
>      simplify by reassociating.  */
> -  if (innercode == TRUTH_AND_EXPR
> -      || innercode == TRUTH_OR_EXPR
> -      || (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
> -         && (innercode == BIT_AND_EXPR || innercode == BIT_IOR_EXPR)))
> +  if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
> +      && (innercode == BIT_AND_EXPR || innercode == BIT_IOR_EXPR))
>     {
>       tree inner1 = gimple_assign_rhs1 (stmt);
>       tree inner2 = gimple_assign_rhs2 (stmt);
>       gimple s;
>       tree t;
>       tree partial = NULL_TREE;
> -      bool is_and = (innercode == TRUTH_AND_EXPR || innercode == BIT_AND_EXPR);
> +      bool is_and = (innercode == BIT_AND_EXPR);
>
>       /* Check for boolean identities that don't require recursive examination
>         of inner1/inner2:
> @@ -2069,6 +2067,7 @@ and_comparisons_1 (enum tree_code code1,
>   if (operand_equal_p (op1a, op2a, 0)
>       && operand_equal_p (op1b, op2b, 0))
>     {
> +      /* Result will be either NULL_TREE, or a combined comparison.  */
>       tree t = combine_comparisons (UNKNOWN_LOCATION,
>                                    TRUTH_ANDIF_EXPR, code1, code2,
>                                    boolean_type_node, op1a, op1b);
> @@ -2080,6 +2079,7 @@ and_comparisons_1 (enum tree_code code1,
>   if (operand_equal_p (op1a, op2b, 0)
>       && operand_equal_p (op1b, op2a, 0))
>     {
> +      /* Result will be either NULL_TREE, or a combined comparison.  */
>       tree t = combine_comparisons (UNKNOWN_LOCATION,
>                                    TRUTH_ANDIF_EXPR, code1,
>                                    swap_tree_comparison (code2),
> @@ -2398,17 +2398,15 @@ or_var_with_comparison_1 (gimple stmt,
>
>   /* If the definition is an AND or OR expression, we may be able to
>      simplify by reassociating.  */
> -  if (innercode == TRUTH_AND_EXPR
> -      || innercode == TRUTH_OR_EXPR
> -      || (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
> -         && (innercode == BIT_AND_EXPR || innercode == BIT_IOR_EXPR)))
> +  if (TREE_CODE (TREE_TYPE (var)) == BOOLEAN_TYPE
> +      && (innercode == BIT_AND_EXPR || innercode == BIT_IOR_EXPR))
>     {
>       tree inner1 = gimple_assign_rhs1 (stmt);
>       tree inner2 = gimple_assign_rhs2 (stmt);
>       gimple s;
>       tree t;
>       tree partial = NULL_TREE;
> -      bool is_or = (innercode == TRUTH_OR_EXPR || innercode == BIT_IOR_EXPR);
> +      bool is_or = (innercode == BIT_IOR_EXPR);
>
>       /* Check for boolean identities that don't require recursive examination
>         of inner1/inner2:
> @@ -2531,6 +2529,7 @@ or_comparisons_1 (enum tree_code code1,
>   if (operand_equal_p (op1a, op2a, 0)
>       && operand_equal_p (op1b, op2b, 0))
>     {
> +      /* Result will be either NULL_TREE, or a combined comparison.  */
>       tree t = combine_comparisons (UNKNOWN_LOCATION,
>                                    TRUTH_ORIF_EXPR, code1, code2,
>                                    boolean_type_node, op1a, op1b);
> @@ -2542,6 +2541,7 @@ or_comparisons_1 (enum tree_code code1,
>   if (operand_equal_p (op1a, op2b, 0)
>       && operand_equal_p (op1b, op2a, 0))
>     {
> +      /* Result will be either NULL_TREE, or a combined comparison.  */
>       tree t = combine_comparisons (UNKNOWN_LOCATION,
>                                    TRUTH_ORIF_EXPR, code1,
>                                    swap_tree_comparison (code2),
>

