Re: [PATCH] Fix breakage for m68k-linux introduced by 4a020a8 / r189359

2012-07-10 Thread Steven Bosscher
On Wed, Jul 11, 2012 at 7:49 AM, Jan-Benedict Glaw  wrote:
> Hi!
>
> Git revision 4a020a8 [aka. SVN 189359], the large header reordering patch,
> broke m68k-linux (.../configure --target=m68k-linux --prefix=...
> --enable-languages=c --disable-threads) for me:
>
> [...]
> gcc -c   -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE  -W -Wall -Wno-narrowing 
> -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes 
> -Wmissing-format-attribute -pedantic -Wno-long-long
> -Wno-variadic-macros -Wno-overlength-strings 
> -Wold-style-definition -Wc++-compat -fno-common  -DHAVE_CONFIG_H -I. -I. 
> -I../../../../gcc/gcc -I../../../../gcc/gcc/. 
> -I../../../../gcc/gcc/../include -I../../../../gcc/gcc/../libcpp/include  
> -I../../../../gcc/gcc/../libdecnumber 
> -I../../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> ../../../../gcc/gcc/resource.c -o resource.o
> ../../../../gcc/gcc/resource.c: In function ‘init_resource_info’:
> ../../../../gcc/gcc/resource.c:1179:5: error: ‘current_function_decl’ 
> undeclared (first use in this function)
> ../../../../gcc/gcc/resource.c:1179:5: note: each undeclared identifier is 
> reported only once for each function it appears in
> make[2]: *** [resource.o] Error 1
>
> I suggest the following patch, which is only compile-tested.
>
> MfG, JBG
>
>
> 2012-07-11  Jan-Benedict Glaw  
>
> * config/m68k/m68k.c (m68k_epilogue_uses): New.
> * config/m68k/m68k.h (m68k_epilogue_uses): Use new function
> instead of former macro.
> * config/m68k/m68k-protos.h (m68k_epilogue_uses): Declare.

Did your build already include
http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00378.html ?

Ciao!
Steven


[PATCH] Fix breakage for m68k-linux introduced by 4a020a8 / r189359

2012-07-10 Thread Jan-Benedict Glaw
Hi!

Git revision 4a020a8 [aka. SVN 189359], the large header reordering patch,
broke m68k-linux (.../configure --target=m68k-linux --prefix=...
--enable-languages=c --disable-threads) for me:

[...]
gcc -c   -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE  -W -Wall -Wno-narrowing 
-Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes 
-Wmissing-format-attribute -pedantic -Wno-long-long
-Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition 
-Wc++-compat -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../../../gcc/gcc 
-I../../../../gcc/gcc/. -I../../../../gcc/gcc/../include 
-I../../../../gcc/gcc/../libcpp/include  -I../../../../gcc/gcc/../libdecnumber 
-I../../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber
../../../../gcc/gcc/resource.c -o resource.o
../../../../gcc/gcc/resource.c: In function ‘init_resource_info’:
../../../../gcc/gcc/resource.c:1179:5: error: ‘current_function_decl’ 
undeclared (first use in this function)
../../../../gcc/gcc/resource.c:1179:5: note: each undeclared identifier is 
reported only once for each function it appears in
make[2]: *** [resource.o] Error 1

I suggest the following patch, which is only compile-tested.

MfG, JBG


2012-07-11  Jan-Benedict Glaw  

* config/m68k/m68k.c (m68k_epilogue_uses): New.
* config/m68k/m68k.h (m68k_epilogue_uses): Use new function
instead of former macro.
* config/m68k/m68k-protos.h (m68k_epilogue_uses): Declare.


diff --git a/gcc/config/m68k/m68k-protos.h b/gcc/config/m68k/m68k-protos.h
index c779588..16ad157 100644
--- a/gcc/config/m68k/m68k-protos.h
+++ b/gcc/config/m68k/m68k-protos.h
@@ -99,3 +99,4 @@ extern void init_68881_table (void);
 extern rtx m68k_legitimize_call_address (rtx);
 extern rtx m68k_legitimize_sibcall_address (rtx);
 extern int m68k_hard_regno_rename_ok(unsigned int, unsigned int);
+extern bool m68k_epilogue_uses (unsigned int regno);
diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index df70560..d07ee18 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -6506,4 +6506,14 @@ m68k_init_sync_libfuncs (void)
   init_sync_libfuncs (UNITS_PER_WORD);
 }
 
+/* Implement EPILOGUE_USES.  */
+
+bool
+m68k_epilogue_uses (unsigned int regno ATTRIBUTE_UNUSED)
+{
+  return (reload_completed
+ && (m68k_get_function_kind (current_function_decl)
+ == m68k_fk_interrupt_handler));
+}
+
 #include "gt-m68k.h"
diff --git a/gcc/config/m68k/m68k.h b/gcc/config/m68k/m68k.h
index b8d8d9c..4c3bb1b 100644
--- a/gcc/config/m68k/m68k.h
+++ b/gcc/config/m68k/m68k.h
@@ -803,10 +803,7 @@ do { if (cc_prev_status.flags & CC_IN_68881)   \
 #define INCOMING_FRAME_SP_OFFSET 4
 
 /* All registers are live on exit from an interrupt routine.  */
-#define EPILOGUE_USES(REGNO)   \
-  (reload_completed\
-   && (m68k_get_function_kind (current_function_decl)  \
-   == m68k_fk_interrupt_handler))
+#define EPILOGUE_USES(REGNO)   m68k_epilogue_uses (REGNO)
 
 /* Describe how we implement __builtin_eh_return.  */
 #define EH_RETURN_DATA_REGNO(N) \

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
 Signature of:Don't believe in miracles: Rely on them!
 the second  :




Re: [gimplefe] Fixing the bug for gimple_assign statement with ternary operands

2012-07-10 Thread Sandeep Soni
On Fri, Jul 6, 2012 at 7:26 PM, Diego Novillo  wrote:
> On 12-07-06 00:38 , Sandeep Soni wrote:
>
>> I am halfway through the patch for building gimple_cond statements. I
>> will be able to complete the patch over the weekend. I am also working
>> towards a patch that generalizes the assignment statements considering
>> all possible types of assignments.
>
>
> Great!  Thanks.
>
>
>>
>> Tested on x86.
>>
>> ChangeLog as follows:
>>
>> 2012-06-06   Sandeep Soni  
>>
>> * parser.c (gp_parse_expect_rhs_op): Tidy. Returns the tree
>> operand in rhs.
>> (gp_parse_assign_stmt): Tidy. Creates the gimple assignment
>> statement.
>
>
> OK with a couple of minor nits.
>
>
>> -/* Return the string representation of token TOKEN.  */
>>
>> -static const char *
>> -gl_token_as_text (const gimple_token *token)
>> +/* Gets the tree node for the corresponding identifier ID  */
>
>
> Period at the end of the comment.
>
>
>> +
>> +static tree
>> +gimple_symtab_get (tree id)
>>   {
>> -  switch (token->type)
>> +  struct gimple_symtab_entry_def temp;
>> +  gimple_symtab_entry_t entry;
>> +  void **slot;
>> +
>> +  gimple_symtab_maybe_init_hash_table();
>
>
> Space before '('.
>
>
> Diego.

Done with both the changes.

-- 
Cheers
Sandy


[gimplefe] Construction of individual gimple statements for gimple_cond and gimple_label

2012-07-10 Thread Sandeep Soni
The patch adds support for creating individual gimple statements for
the gimple_cond and gimple_label statements.

Diego, I need your help in generalizing to include all possible cases
of these statements.

Here is the ChangeLog

2012-07-10   Sandeep Soni 

* parser.c (gp_parse_expect_op1): Tidy. Returns tree operand.
Update all callers.
(gp_parse_expect_op2): Likewise.
(gp_parse_expect_true_label): Tidy. Returns a label.
Update all callers.
(gp_parse_expect_false_label): Likewise.
(gp_parse_cond_stmt): Tidy. Creates and returns a gimple cond
statement.
(gp_parse_label_stmt): Creates and returns the gimple label statement.


And the patch
Index: gcc/gimple/parser.c
===
--- gcc/gimple/parser.c (revision 188546)
+++ gcc/gimple/parser.c (working copy)

-static void
+static tree
 gp_parse_expect_op1 (gimple_parser *parser)
 {
   const gimple_token *next_token;
   next_token = gl_consume_token (parser->lexer);
+  tree op1 = NULL_TREE;

   switch (next_token->type)
 {
 case CPP_NAME:
+  op1 = gimple_symtab_get_token (next_token);
+  break;
+
 case CPP_NUMBER:
   break;

@@ -476,20 +529,24 @@
 }

   gl_consume_expected_token (parser->lexer, CPP_COMMA);
+  return op1;
 }

 /* Helper for gp_parse_cond_stmt. The token read from reader PARSER should
be the second operand in the tuple.  */

-static void
+static tree
 gp_parse_expect_op2 (gimple_parser *parser)
 {
   const gimple_token *next_token;
   next_token = gl_consume_token (parser->lexer);
-
+  tree op2 = NULL_TREE;
   switch (next_token->type)
 {
 case CPP_NAME:
+  op2 = gimple_symtab_get_token (next_token);
+  break;
+
 case CPP_NUMBER:
 case CPP_STRING:
   break;
@@ -503,50 +560,55 @@
   break;
 }

-  gl_consume_expected_token (parser->lexer, CPP_COMMA);
+  gl_consume_expected_token (parser->lexer, CPP_COMMA);
+  return op2;
 }

 /* Helper for gp_parse_cond_stmt. The token read from reader PARSER should
be the true label in the tuple that means the label where the control
jumps if the condition evaluates to true.  */

-static void
+static tree
 gp_parse_expect_true_label (gimple_parser *parser)
 {
   gl_consume_expected_token (parser->lexer, CPP_LESS);
   gl_consume_expected_token (parser->lexer, CPP_NAME);
   gl_consume_expected_token (parser->lexer, CPP_GREATER);
   gl_consume_expected_token (parser->lexer, CPP_COMMA);
+  return create_artificial_label (UNKNOWN_LOCATION);
 }

 /* Helper for gp_parse_cond_stmt. The token read from reader PARSER should
be the false label in the tuple that means the label where the control
jumps if the condition evaluates to false.  */

-static void
+static tree
 gp_parse_expect_false_label (gimple_parser *parser)
 {
   gl_consume_expected_token (parser->lexer, CPP_LESS);
   gl_consume_expected_token (parser->lexer, CPP_NAME);
   gl_consume_expected_token (parser->lexer, CPP_GREATER);
   gl_consume_expected_token (parser->lexer, CPP_GREATER);
+  return create_artificial_label (UNKNOWN_LOCATION);
 }

 /* Parse a gimple_cond tuple that is read from the reader PARSER. For
now we only recognize the tuple. Refer gimple.def for the format of
this tuple.  */

-static void
+static gimple
 gp_parse_cond_stmt (gimple_parser *parser)
 {
   gimple_token *optoken;
   enum tree_code opcode = gp_parse_expect_subcode (parser, &optoken);
   if (get_gimple_rhs_class (opcode) != GIMPLE_BINARY_RHS)
 error_at (optoken->location, "Unsupported gimple_cond expression");
-  gp_parse_expect_op1 (parser);
-  gp_parse_expect_op2 (parser);
-  gp_parse_expect_true_label (parser);
-  gp_parse_expect_false_label (parser);
+  tree op1 = gp_parse_expect_op1 (parser);
+  tree op2 = gp_parse_expect_op2 (parser);
+  tree true_label = gp_parse_expect_true_label (parser);
+  tree false_label = gp_parse_expect_false_label (parser);
+  gimple cond_stmt = gimple_build_cond (opcode, op1, op2, true_label, false_label);
+  return cond_stmt;
 }

 /* Parse a gimple_goto tuple that is read from the reader PARSER. For
@@ -567,14 +629,18 @@
now we only recognize the tuple. Refer gimple.def for the format of
this tuple.  */

-static void
+static gimple
 gp_parse_label_stmt (gimple_parser *parser)
 {
   gl_consume_expected_token (parser->lexer, CPP_LESS);
   gl_consume_expected_token (parser->lexer, CPP_LESS);
-  gl_consume_expected_token (parser->lexer, CPP_NAME);
+  gimple_token *token = gl_consume_token (parser->lexer);
   gl_consume_expected_token (parser->lexer, CPP_GREATER);
-  gl_consume_expected_token (parser->lexer, CPP_GREATER);
+  gl_consume_expected_token (parser->lexer, CPP_GREATER);
+
+  tree label = create_artificial_label (token->location);
+  gimple stmt = gimple_build_label (label);
+  return stmt;
 }

 /* Parse a gimple_switch tuple that is read from the reader PARSER.


-- 
Cheers
Sandy
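
The core idea of the patch above (helpers that used to return void now hand
their result back so the caller can assemble a statement) can be sketched
outside GCC. This is only an illustration under stated assumptions: the
toy_* types and functions below are hypothetical stand-ins and are not
GCC's gimple, tree or lexer API. As in the patch, a number token currently
yields no operand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; these are not GCC's token or tree types.  */
enum toy_token_type { TOY_NAME, TOY_NUMBER };

struct toy_token
{
  enum toy_token_type type;
  const char *text;
};

/* A toy "condition" holding the two operands the helpers return.  */
struct toy_cond
{
  const char *op1;
  const char *op2;
};

/* Return the operand for TOKEN, or NULL when the token is not a name.
   Returning a value (instead of void) lets the caller use the result.  */
const char *
toy_parse_operand (const struct toy_token *token)
{
  switch (token->type)
    {
    case TOY_NAME:
      return token->text;
    case TOY_NUMBER:
    default:
      return NULL;
    }
}

/* Build a toy condition from the values returned by the helpers.  */
struct toy_cond
toy_parse_cond (const struct toy_token *first,
                const struct toy_token *second)
{
  struct toy_cond cond;
  cond.op1 = toy_parse_operand (first);
  cond.op2 = toy_parse_operand (second);
  return cond;
}
```

A caller would pass the two operand tokens to toy_parse_cond and then check
each operand for NULL before using it.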


Re: G++ namespace association extension

2012-07-10 Thread Jason Merrill

OK.

Jason


Re: [RFT] Remove -fno-tree-dominator-opts from libgcc/config/t-darwin

2012-07-10 Thread Iain Sandoe
Hi Steven,

On 9 Jul 2012, at 09:21, Iain Sandoe wrote:
> On 9 Jul 2012, at 09:11, Iain Sandoe wrote:
>>> crt3.o: $(srcdir)/config/darwin-crt3.c
> 
>> regstrapped (all+ada+objc++) on i686-darwin9 with no regressions.
> 
> .. but, now I re-check, crt3 is only used on Darwin 8 and earlier; 
> That will take somewhat longer to check, machine availability is an issue  - 
> I'll see if I can run up a i686 or ppc darwin8 box.

Looking at the original PR it seems that the problem was fixed on the same day 
that the work-around (which your patch removes) was checked in.

I've only done c/c++, should be enough for this. 

Anyway, although i686-Darwin 8 is sadly in need of some TLC, the proposed patch 
causes no regressions.
ppc-darwin 8 tests are still running, but it bootstrapped (500M G4, > 24hrs for 
c/c++ build & test).

Iain.



Re: [SH] PR 53911 - Remove SImode displacement addressing related splits

2012-07-10 Thread Kaz Kojima
Oleg Endo  wrote:
> The attached patch removes two splits that undo displacement address
> re-basing.  I've noticed that removing the two splits seems to result in
> overall slightly smaller code according to the CSiBE set (compared with
> -m4-single -ml -O2 -mpretend-cmove, -1048 bytes in total), despite some
> code size increases here and there. 

This patch is OK.

Regards,
kaz


Re: G++ namespace association extension

2012-07-10 Thread Jonathan Wakely
On 9 July 2012 14:18, Jason Merrill wrote:
> On 07/09/2012 01:26 PM, Jonathan Wakely wrote:
>>
>> http://gcc.gnu.org/onlinedocs/gcc/Namespace-Association.html says:
>>
>> "Caution: The semantics of this extension are not fully defined. Users
>> should refrain from using this extension as its semantics may change
>> subtly over time. It is possible that this extension will be removed
>> in future versions of G++. "
>>
>> Is it safe to assume that the semantics are now fixed to match those
>> of C++11 inline namespaces and will not change unless removed?
>
>
> Yes, but people should use inline namespaces instead; we should deprecate
> this form and then remove it in 4.9.

* doc/extend.texi (Namespace Association): Alter cautionary text.

How's this, OK for trunk?
commit d6a414f6ebcd96645a1a6612e324eafee24b39e9
Author: Jonathan Wakely 
Date:   Tue Jul 10 21:21:09 2012 +0100

* doc/extend.texi (Namespace Association): Alter cautionary text.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 91e7385..c3faf09 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -15527,10 +15527,9 @@ See also @ref{Namespace Association}.
 @node Namespace Association
 @section Namespace Association
 
-@strong{Caution:} The semantics of this extension are not fully
-defined.  Users should refrain from using this extension as its
-semantics may change subtly over time.  It is possible that this
-extension will be removed in future versions of G++.
+@strong{Caution:} The semantics of this extension are equivalent
+to C++ 2011 inline namespaces.  Users should use inline namespaces
+instead as this extension will be removed in future versions of G++.
 
 A using-directive with @code{__attribute ((strong))} is stronger
 than a normal using-directive in two ways:


Re: [C++ Pubnames Patch] Anonymous namespaces enclosed in named namespaces. (issue6343052)

2012-07-10 Thread Sterling Augustine
On Tue, Jul 10, 2012 at 6:51 AM, Dominique Dhumieres  wrote:
> Hi Sterling,
>
> On x86_64-apple-darwin10 the test fails with
>
> FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler 
> .section\t.debug_pubnames
> FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler 
> "_GLOBAL__sub_I__ZN3one3c1vE0"+[ \t]+[#;]+[ \t]+external name
> FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler 
> .section\t.debug_pubtypes
>
> (note that looking at the test I was expecting many more failures ;-).
> Grepping the assembly for 'debug_pubnames' and 'ZN3one3c1vE' returns
>
>         .section __DWARF,__debug_pubnames,regular,debug
> Lsection__debug_pubnames:
>
> and
>
>         .globl __ZN3one3c1vE
> __ZN3one3c1vE:
>         leal    __ZN3one3c1vE-L3$pb(%ebx), %eax
>         leal    __ZN3one3c1vE-L8$pb(%ebx), %eax
>         leal    __ZN3one3c1vE-L8$pb(%ebx), %eax
>         .ascii "_ZN3one3c1vE\0" # DW_AT_linkage_name
>         .long   __ZN3one3c1vE
>
> So fixing the debug_pubnames failures for Darwin is fairly easy,
> but I am not sure what to pick for ZN3one3c1vE.

Hmm. None of those from the grep appear to be inside the pubnames section.

The particular pubname here is a global constructor, and Darwin
(Mach-O) must handle those differently than ELF. I think it is OK to
either just delete the offending check, or to make it conditional on
ELF, though I am not sure how to do that.

Sterling


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-07-10 Thread Xinliang David Li
On Tue, Jul 10, 2012 at 2:46 AM, Jason Merrill  wrote:
> On 07/09/2012 11:27 PM, Xinliang David Li wrote:
>>
>> Ok.  Do you have specific comments on the patch?
>
>
> My comment is "Perhaps we want to implement this using a more generic
> mechanism."  I was thinking to defer a detailed code review until that
> question is settled.

We all like more generic solutions :)


Sri, can you provide a more detailed description of the FE changes? That
will help reviewers get started.

By the way, a couple of files have bad contents and need to be
re-uploaded, e.g. cp/decl.c.

thanks,

David

>
> Jason


[PATCH] [LM32] Fix lm32-elf-gcc build error by remove unnecessary constant legitimate check.

2012-07-10 Thread Jia Liu
Hi all,

When I build lm32-elf-gcc, it fails at the libgcc configure step because
lm32-elf-cc1 segfaults when compiling conftest.c:

void bar ();
void clean (int *);
void foo ()
{
  int i __attribute__ ((cleanup (clean)));
  bar();
}

I then found that lm32_legitimate_constant_p returns false too often,
which I don't think is right.

The movsi pattern already handles the pic and reloc_operand cases, but
lm32_legitimate_constant_p handles them again, so I think the check there
may be unnecessary.

When I remove the unnecessary constant legitimate check, lm32-elf-gcc
builds OK.
I made a patch for this. Please review.

---
 gcc/config/lm32/lm32.c |4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/config/lm32/lm32.c b/gcc/config/lm32/lm32.c
index 376df05..47024ff 100644
--- a/gcc/config/lm32/lm32.c
+++ b/gcc/config/lm32/lm32.c
@@ -1236,9 +1236,5 @@ lm32_move_ok (enum machine_mode mode, rtx operands[2]) {
 static bool
 lm32_legitimate_constant_p (enum machine_mode mode, rtx x)
 {
-  /* 32-bit addresses require multiple instructions.  */
-  if (!flag_pic && reloc_operand (x, mode))
-return false;
-
   return true;
 }
--

ChangeLog

2012-07-10  Jia Liu  

gcc/
* config/lm32/lm32.c (lm32_legitimate_constant_p): Remove
unnecessary constant legitimate check.

Regards,
Jia
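
For readers unfamiliar with the construct in conftest.c above: the cleanup
variable attribute (a GNU C extension, also accepted by Clang) runs a
handler when the variable goes out of scope. The sketch below only shows
that behaviour; the demo_* names are made up for illustration and it says
nothing about the lm32 failure itself.

```c
/* Counts how many times the cleanup handler has run.  */
int demo_cleanup_calls = 0;

/* Cleanup handler: receives a pointer to the variable leaving scope.  */
void
demo_clean (int *value)
{
  (void) value;
  demo_cleanup_calls++;
}

/* The handler runs automatically when `i' goes out of scope, i.e. when
   this function returns.  */
void
demo_scope (void)
{
  int i __attribute__ ((cleanup (demo_clean))) = 0;
  (void) i;
}
```

Calling demo_scope () once should bump demo_cleanup_calls by one.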




Re: [C++ Pubnames Patch] Anonymous namespaces enclosed in named namespaces. (issue6343052)

2012-07-10 Thread Dominique Dhumieres
Hi Sterling,

On x86_64-apple-darwin10 the test fails with

FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler .section\t.debug_pubnames
FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler 
"_GLOBAL__sub_I__ZN3one3c1vE0"+[ \t]+[#;]+[ \t]+external name
FAIL: g++.dg/debug/dwarf2/pubnames-2.C scan-assembler .section\t.debug_pubtypes

(note that looking at the test I was expecting many more failures ;-).
Grepping the assembly for 'debug_pubnames' and 'ZN3one3c1vE' returns

.section __DWARF,__debug_pubnames,regular,debug
Lsection__debug_pubnames:

and

.globl __ZN3one3c1vE
__ZN3one3c1vE:
leal    __ZN3one3c1vE-L3$pb(%ebx), %eax
leal    __ZN3one3c1vE-L8$pb(%ebx), %eax
leal    __ZN3one3c1vE-L8$pb(%ebx), %eax
.ascii "_ZN3one3c1vE\0" # DW_AT_linkage_name
.long   __ZN3one3c1vE

So fixing the debug_pubnames failures for Darwin is fairly easy,
but I am not sure what to pick for ZN3one3c1vE.

Cheers,

Dominique


Re: [wwwdocs] Update coding conventions for C++

2012-07-10 Thread Gabriel Dos Reis
Jason Merrill  writes:

| On 07/09/2012 06:00 PM, Lawrence Crowl wrote:
| > Done.  New patch attached, but note that the  tags have been
| > stripped from the patch to avoid mailer problems.
| 
| Thanks.  If nobody else has any comments, I think this is good to go.

Great!

-- Gaby


[patch] fixes for power4 scheduler description

2012-07-10 Thread Steven Bosscher
Hello,

These look like typos:

* "power4-store-update" wants "iuX,iuY" for X=1|2 and Y=1|2. The
"iu2,iu1" case appeared twice.
* "power4-three" wants "iuX,iuX,iuY|iuX,iuY,iuY" for X=1|2 and Y=1|2.
The "iu1,iu1,iu2" case appeared twice.

Bootstrapped&tested on powerpc64-unknown-linux-gnu.
OK for trunk?

Note, it'd be nice if the size of the power4iu automaton could be
reduced somehow. It is by far the largest automaton in the rs6000 back
end, accounting for ~40% of the total.

Automaton `power4iu'
 8128 NDFA states,  68391 NDFA arcs
12609 DFA states,   104521 DFA arcs
10894 minimal DFA states,   89248 minimal DFA arcs
  683 all insns 23 insn equivalence classes
0 locked states
107203 transition comb vector els, 250562 trans table els: use simple vect
250562 min delay table els, compression factor 1

(All:)
124282 all allocated states, 493955 all allocated arcs
811237 all allocated alternative states
248236 all transition comb vector els, 613376 all trans table els
613376 all min delay table els
0 all locked states

For comparison, power7iu:

Automaton `power7iu'
 7697 NDFA states,  23034 NDFA arcs
 3738 DFA states,   10277 DFA arcs
 3690 minimal DFA states,   10056 minimal DFA arcs
  683 all insns  9 insn equivalence classes
0 locked states
10690 transition comb vector els, 33210 trans table els: use comb vect
33210 min delay table els, compression factor 1

I don't understand well enough how the scheduler descriptions are
translated to DFAs, so I don't really understand why the power4iu
automaton needs so many table elts, but the above seems
disproportional to me.

Ciao!
Steven

* config/rs6000/power4.md (power4-store-update): Fix reservation.
(power4-three): Likewise.

Index: config/rs6000/power4.md
===
--- config/rs6000/power4.md (revision 189388)
+++ config/rs6000/power4.md (working copy)
@@ -145,7 +145,7 @@
 |(du3_power4+du4_power4,lsu2_power4))+\
((nothing,iu2_power4,iu1_power4)\
 |(nothing,iu2_power4,iu2_power4)\
-|(nothing,iu1_power4,iu2_power4)\
+|(nothing,iu1_power4,iu1_power4)\
 |(nothing,iu1_power4,iu2_power4))")

 (define_insn_reservation "power4-store-update-indexed" 12
@@ -212,7 +212,7 @@
((iu1_power4,nothing,iu2_power4,nothing,iu2_power4)\
 |(iu2_power4,nothing,iu2_power4,nothing,iu1_power4)\
 |(iu2_power4,nothing,iu1_power4,nothing,iu1_power4)\
-|(iu1_power4,nothing,iu2_power4,nothing,iu2_power4))")
+|(iu1_power4,nothing,iu1_power4,nothing,iu2_power4))")

 (define_insn_reservation "power4-insert" 4
   (and (eq_attr "type" "insert_word")
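
A duplicated alternative of the kind fixed in the power4-store-update hunk
above is easy to check for mechanically. The following is a minimal sketch,
assuming each alternative is reduced to a pair of integer-unit numbers; the
iu_pair type and count_distinct_pairs function are hypothetical and have
nothing to do with how genautomata actually represents reservations.

```c
#include <stddef.h>

/* One reservation alternative, reduced to the two integer units used.  */
struct iu_pair
{
  int first;
  int second;
};

/* Return the number of distinct (first, second) pairs among the N
   entries of ALTS.  Four alternatives over two units should yield 4.  */
size_t
count_distinct_pairs (const struct iu_pair *alts, size_t n)
{
  size_t distinct = 0;

  for (size_t i = 0; i < n; i++)
    {
      int seen = 0;

      /* Look for an identical pair earlier in the list.  */
      for (size_t j = 0; j < i; j++)
        if (alts[j].first == alts[i].first
            && alts[j].second == alts[i].second)
          seen = 1;

      if (!seen)
        distinct++;
    }

  return distinct;
}
```

Fed the four pairs from the old power4-store-update reservation, this
reports three distinct alternatives; with the patched list it reports four.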


[Patch, ARM] Fix PR53859: ICE on armv7e-m

2012-07-10 Thread Greta Yorsh
New RTL patterns generated for epilogues with RETURN (trunk r188742) are not
recognized by the pattern matching code in arm_early_load_addr_dep, which is
used for insn latency calculation when tuning for cortex-m4. It causes an
ICE when tuning for armv7e-m or cortex-m4:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53859.

The obvious fix is to detect RETURN pattern in arm_early_load_addr_dep. 

No regression on qemu.

Ok for trunk?

Thanks,
Greta

ChangeLog

2012-07-10  Greta Yorsh  

gcc/
PR target/53859
* config/arm/arm.c (arm_early_load_addr_dep): Handle new
epilogue patterns.

gcc/testsuite

PR target/53859
* gcc.target/arm/pr53859.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a385e30..4a71a14 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24038,7 +24038,12 @@ arm_early_load_addr_dep (rtx producer, rtx consumer)
   if (GET_CODE (addr) == COND_EXEC)
 addr = COND_EXEC_CODE (addr);
   if (GET_CODE (addr) == PARALLEL)
-addr = XVECEXP (addr, 0, 0);
+{
+  if (GET_CODE (XVECEXP (addr, 0, 0)) == RETURN)
+addr = XVECEXP (addr, 0, 1);
+  else
+addr = XVECEXP (addr, 0, 0);
+}
   addr = XEXP (addr, 1);
 
   return reg_overlap_mentioned_p (value, addr);
diff --git a/gcc/testsuite/gcc.target/arm/pr53859.c 
b/gcc/testsuite/gcc.target/arm/pr53859.c
new file mode 100644
index 0000000..e4e9380
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr53859.c
@@ -0,0 +1,11 @@
+/* PR target/53859 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-options "-mcpu=cortex-m4 -mthumb -O2" } */
+
+void bar (int,int,char* ,int);
+
+void foo (char c)
+{
+bar (1,2,&c,3);
+}
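
The shape of the fix in arm_early_load_addr_dep above (when the first
element of a PARALLEL is a RETURN, look at the second element instead) can
be shown with a toy model. This is a sketch under stated assumptions: the
toy_rtx type and toy_pick function are hypothetical and do not use GCC's
rtx accessors.

```c
#include <stddef.h>

/* Hypothetical stand-ins for rtx codes and expressions.  */
enum toy_code { TOY_RETURN, TOY_SET };

struct toy_rtx
{
  enum toy_code code;
  int id;
};

/* Return the element of the N-element vector VEC that should be
   inspected: the second one if the first is a RETURN, else the first.
   Returns NULL when no suitable element exists.  */
const struct toy_rtx *
toy_pick (const struct toy_rtx *vec, size_t n)
{
  if (n == 0)
    return NULL;
  if (vec[0].code == TOY_RETURN)
    return n > 1 ? &vec[1] : NULL;
  return &vec[0];
}
```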


Re: [wwwdocs] Update coding conventions for C++

2012-07-10 Thread Jason Merrill

On 07/09/2012 06:00 PM, Lawrence Crowl wrote:

Done.  New patch attached, but note that the  tags have been
stripped from the patch to avoid mailer problems.


Thanks.  If nobody else has any comments, I think this is good to go.

Jason



PR bootstrap/53913

2012-07-10 Thread Andreas Schwab
To avoid having to include tree.h in resource.c move the body of
EPILOGUE_USES to m68k.c.  Tested on m68k-linux.

Andreas.
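
The technique described above, sketched in isolation: when a macro body
refers to identifiers that not every user of the macro has declarations
for, moving the body into an out-of-line function means users only need
the function's prototype. All demo_* names here are hypothetical stand-ins,
not the real m68k port symbols.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for state that, in a real port, is declared in
   headers that not every including file pulls in.  */
enum demo_fn_kind { DEMO_FK_NORMAL, DEMO_FK_INTERRUPT_HANDLER };

int demo_reload_completed = 0;
enum demo_fn_kind demo_current_kind = DEMO_FK_NORMAL;

/* Out-of-line body of the former macro.  Only this definition needs to
   see the state above; callers need nothing but the prototype.  */
bool
demo_epilogue_uses (int regno)
{
  (void) regno;
  return (demo_reload_completed
          && demo_current_kind == DEMO_FK_INTERRUPT_HANDLER);
}

/* The macro now expands to a plain call, so expanding it no longer
   requires the declarations used inside the function body.  */
#define DEMO_EPILOGUE_USES(REGNO) demo_epilogue_uses (REGNO)
```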

PR bootstrap/53913
* config/m68k/m68k.c (m68k_epilogue_uses): New.
* config/m68k/m68k.h (EPILOGUE_USES): Use it.
* config/m68k/m68k-protos.h (m68k_epilogue_uses): Add prototype.

diff --git a/gcc/config/m68k/m68k-protos.h b/gcc/config/m68k/m68k-protos.h
index c779588..a6b5dee 100644
--- a/gcc/config/m68k/m68k-protos.h
+++ b/gcc/config/m68k/m68k-protos.h
@@ -1,5 +1,5 @@
 /* Definitions of target machine for GNU compiler.  Sun 68000/68020 version.
-   Copyright (C) 2000, 2002, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2000, 2002, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012
Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -68,6 +68,7 @@ extern int emit_move_sequence (rtx *, enum machine_mode, rtx);
 extern bool m68k_movem_pattern_p (rtx, rtx, HOST_WIDE_INT, bool);
 extern const char *m68k_output_movem (rtx *, rtx, HOST_WIDE_INT, bool);
 extern void m68k_final_prescan_insn (rtx, rtx *, int);
+extern bool m68k_epilogue_uses (int);
 
 /* Functions from m68k.c used in constraints.md.  */
 extern rtx m68k_unwrap_symbol (rtx, bool);
diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index df70560..0e55e1c 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -6506,4 +6506,14 @@ m68k_init_sync_libfuncs (void)
   init_sync_libfuncs (UNITS_PER_WORD);
 }
 
+/* Implements EPILOGUE_USES.  All registers are live on exit from an
+   interrupt routine.  */
+bool
+m68k_epilogue_uses (int regno ATTRIBUTE_UNUSED)
+{
+  return (reload_completed
+ && (m68k_get_function_kind (current_function_decl)
+ == m68k_fk_interrupt_handler));
+}
+
 #include "gt-m68k.h"
diff --git a/gcc/config/m68k/m68k.h b/gcc/config/m68k/m68k.h
index b8d8d9c..8be0879 100644
--- a/gcc/config/m68k/m68k.h
+++ b/gcc/config/m68k/m68k.h
@@ -802,11 +802,7 @@ do { if (cc_prev_status.flags & CC_IN_68881)   \
 /* Before the prologue, the top of the frame is at 4(%sp).  */
 #define INCOMING_FRAME_SP_OFFSET 4
 
-/* All registers are live on exit from an interrupt routine.  */
-#define EPILOGUE_USES(REGNO)   \
-  (reload_completed\
-   && (m68k_get_function_kind (current_function_decl)  \
-   == m68k_fk_interrupt_handler))
+#define EPILOGUE_USES(REGNO) m68k_epilogue_uses (REGNO)
 
 /* Describe how we implement __builtin_eh_return.  */
 #define EH_RETURN_DATA_REGNO(N) \
-- 
1.7.11.1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-07-10 Thread Jason Merrill

On 07/09/2012 11:27 PM, Xinliang David Li wrote:

Ok.  Do you have specific comments on the patch?


My comment is "Perhaps we want to implement this using a more generic 
mechanism."  I was thinking to defer a detailed code review until that 
question is settled.


Jason


Re: [patch][i386] Remove some dead code (TARGET_BRANCH_PREDICTION_HINTS)

2012-07-10 Thread H.J. Lu
On Tue, Jul 10, 2012 at 1:57 AM, Uros Bizjak  wrote:
> On Tue, Jul 10, 2012 at 9:04 AM, Steven Bosscher  
> wrote:
>>>> TARGET_BRANCH_PREDICTION_HINTS isn't used at all. This patch removes it.
>>>> Bootstrapped&tested (incl. -m32) on x86_64-unknown-linux-gnu. OK for trunk?
>>>
>>> This infrastructure can be used for future targets, so let's leave it as is.
>>
>> Yes, I suppose that's theoretically possible. However, I don't think
>> this is very likely to happen. The branch hints only ever were
>> supported for P4, but even Intel's own ICC never emits them
>> (http://sources.redhat.com/ml/binutils/2004-07/msg00322.html). In
>> fact, the whole Netburst microarchitecture appears to have been
>> abandoned (in favor of the ppro microarchitecture).  For EM64T another
>> form of branch hints was introduced (which GCC doesn't support and ICC
>> doesn't emit).
>>
>> So any future target that would need this, would have to be a 32-bits,
>> deep-pipeline microarchitecture. I personally don't believe that such
>> an architecture will emerge, given the history of failure of this
>> concept.
>>
>> Are these arguments reason enough for you to reconsider? :-)
>
> Arguments are indeed good, but I'd like to hear HJ's opinion about
> future targets.
>
> Anyway, this functionality can be reimplemented from scratch for a
> future target, but OTOH, it is not a maintenance burden at all...
>

I'd like to keep it since we are planning to update and use
it.

Thanks.


-- 
H.J.


Re: [PATCH 0/7] Clean up widen mult even/odd

2012-07-10 Thread Jakub Jelinek
On Tue, Jul 10, 2012 at 10:22:44AM +0200, Richard Henderson wrote:
> I've not touched the interface to supportable_widening_operation,
> which is still prepared to return a CALL_EXPR and some decls.  After
> this patch set it will never do so.  I'm undecided as to whether we
> ought to be prepared for such in the future, or whether this should
> simply go in as a completely separate patch that could in the future
> be easily reverted.

I think it would be nice to remove the support for widening operation
calls as a follow-up.  If we ever need it in the future, we can restore
it from svn, and removing it will simplify the callers, which already
handle way too many different cases.

Thanks for working on this.

Jakub


Re: [PATCH 0/7] Clean up widen mult even/odd

2012-07-10 Thread Richard Guenther
On Tue, Jul 10, 2012 at 10:22 AM, Richard Henderson  wrote:
> I find it instructive that 4 of the 5 isas that actually implement
> widening integer multiplication do have mult-widen-even as the isa
> primitive (even if the -odd variant is missing).  The fact that this
> operation is implemented as a set of builtins and target hooks has
> led to disturbingly cookie-cutter implementations of these hooks
> in the various backends.
>
> Thus I choose to add VEC_WIDEN_MULT_EVEN/ODD_EXPR as tree codes and
> optabs.  This removes a fairly trivial amount of code from three
> backends (the fourth backend, ia64, never grew this support).
>
> The existence of optabs then allows the expansion of MULT_HIGHPART_EXPR
> at the rtl-expansion level without having to resort to builtin expansion
> in order to emit the even/odd alternative.  This saves a fairly
> substantial amount of code from the vectorizer.
>
> I've not touched the interface to supportable_widening_operation,
> which is still prepared to return a CALL_EXPR and some decls.  After
> this patch set it will never do so.  I'm undecided as to whether we
> ought to be prepared for such in the future, or whether this should
> simply go in as a completely separate patch that could in the future
> be easily reverted.
>
> Tested on x86_64; cross-compiled to ppc64 and spu, spot checking the
> relevant division-by-constant testcases.

Thanks,

This looks all good!
Richard.

>
> r~
>
>
>
> Richard Henderson (7):
>   Add VEC_WIDEN_MULT_EVEN/ODD_EXPR
>   i386: Rename patterns for vec_widen_mult_even/odd_
>   rs6000: Rename patterns for vec_widen_mult_even/odd_
>   spu: Rename patterns for vec_widen_mult_even/odd_
>   Move vector highpart emulation to the optabs layer
>   Use VEC_WIDEN_MULT_EVEN/ODD_EXPR in supportable_widening_operation
>   Zap now unused builtin_mul_widen_even/odd target hooks
>
>  gcc/ChangeLog|   89 
>  gcc/cfgexpand.c  |4 +-
>  gcc/config/i386/i386.c   |  103 ++---
>  gcc/config/i386/sse.md   |   18 +--
>  gcc/config/rs6000/altivec.md |   54 +++
>  gcc/config/rs6000/rs6000-builtin.def |   24 +--
>  gcc/config/rs6000/rs6000.c   |   51 ---
>  gcc/config/spu/spu-builtins.def  |   24 +--
>  gcc/config/spu/spu-builtins.md   |   65 -
>  gcc/config/spu/spu.c |   42 --
>  gcc/config/spu/spu.md|   86 +--
>  gcc/doc/md.texi  |   12 +-
>  gcc/doc/tm.texi  |   22 ---
>  gcc/doc/tm.texi.in   |   22 ---
>  gcc/expmed.c |   32 ++--
>  gcc/expr.c   |   35 ++---
>  gcc/fold-const.c |   36 +++--
>  gcc/genopinit.c  |4 +
>  gcc/gimple-pretty-print.c|2 +
>  gcc/optabs.c |  134 +
>  gcc/optabs.h |   18 ++-
>  gcc/system.h |4 +-
>  gcc/target.def   |   14 --
>  gcc/tree-cfg.c   |2 +
>  gcc/tree-inline.c|2 +
>  gcc/tree-pretty-print.c  |   32 ++--
>  gcc/tree-vect-generic.c  |  145 +-
>  gcc/tree-vect-patterns.c |   23 +--
>  gcc/tree-vect-stmts.c|  267 +-
>  gcc/tree.c   |2 +
>  gcc/tree.def |4 +
>  31 files changed, 580 insertions(+), 792 deletions(-)
>
> --
> 1.7.10.4
>
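
For readers following the thread: the cover letter above ties MULT_HIGHPART_EXPR to the division-by-constant testcases. A minimal scalar sketch of that connection follows, assuming 32-bit unsigned operands. The helper names mulhi_u32 and div3_u32 are hypothetical and do not appear in the patch set; they only illustrate why the upper half of a widened product is the quantity of interest.

```c
#include <stdint.h>

/* High 32 bits of the 64-bit product a * b.  This is the scalar
   analogue of MULT_HIGHPART_EXPR on 32-bit unsigned elements.  */
uint32_t
mulhi_u32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * b) >> 32);
}

/* x / 3 without a divide instruction.  0xAAAAAAAB == (2^33 + 1) / 3,
   so (x * 0xAAAAAAAB) >> 33 equals x / 3 for every 32-bit x.  The
   highpart multiply supplies the ">> 32"; one more shift finishes it.  */
uint32_t
div3_u32 (uint32_t x)
{
  return mulhi_u32 (x, 0xAAAAAAABu) >> 1;
}
```

A vectorized division by a constant performs the same multiply on each element, which is why a target lacking a direct highpart instruction benefits from a generic way to build one out of widening multiplies.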


Re: [patch][i386] Remove some dead code (TARGET_BRANCH_PREDICTION_HINTS)

2012-07-10 Thread Uros Bizjak
On Tue, Jul 10, 2012 at 9:04 AM, Steven Bosscher  wrote:
>>> TARGET_BRANCH_PREDICTION_HINTS isn't used at all. This patch removes it.
>>> Bootstrapped&tested (incl. -m32) on x86_64-unknown-linux-gnu. OK for trunk?
>>
>> This infrastructure can be used for future targets, so let's leave it as is.
>
> Yes, I suppose that's theoretically possible. However, I don't think
> this is very likely to happen. The branch hints only ever were
> supported for P4, but even Intel's own ICC never emits them
> (http://sources.redhat.com/ml/binutils/2004-07/msg00322.html). In
> fact, the whole Netburst microarchitecture appears to have been
> abandoned (in favor of the ppro microarchitecture).  For EM64T another
> form of branch hints was introduced (which GCC doesn't support and ICC
> doesn't emit).
>
> So any future target that would need this, would have to be a 32-bit,
> deep-pipeline microarchitecture. I personally don't believe that such
> an architecture will emerge, given the history of failure of this
> concept.
>
> Are these arguments reason enough for you to reconsider? :-)

Arguments are indeed good, but I'd like to hear HJ's opinion about
future targets.

Anyway, this functionality can be reimplemented from scratch for a
future target, but OTOH, it is not a maintenance burden at all...

Uros.


[PATCH 5/7] Move vector highpart emulation to the optabs layer

2012-07-10 Thread Richard Henderson
* expmed.c (expmed_mult_highpart): Rename from expand_mult_highpart.
(expmed_mult_highpart_optab): Rename from expand_mult_highpart_optab.
* optabs.c (can_mult_highpart_p): New.
(expand_mult_highpart): New.
* expr.c (expand_expr_real_2) [MULT_HIGHPART_EXPR]: Use it.
* tree-vect-generic.c (expand_vector_operations_1): Don't expand
by pieces if can_mult_highpart_p.
(expand_vector_divmod): Use can_mult_highpart_p and always
generate MULT_HIGHPART_EXPR.
* tree-vect-patterns.c (vect_recog_divmod_pattern): Likewise.
* tree-vect-stmts.c (vectorizable_operation): Likewise.
---
 gcc/ChangeLog|   12 
 gcc/expmed.c |   32 -
 gcc/expr.c   |7 +-
 gcc/optabs.c |  126 ++
 gcc/optabs.h |6 ++
 gcc/tree-vect-generic.c  |  113 +-
 gcc/tree-vect-patterns.c |   23 +--
 gcc/tree-vect-stmts.c|  171 ++
 8 files changed, 204 insertions(+), 286 deletions(-)
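
As a rough illustration of the even/odd route to a vector highpart multiply that this patch moves into optabs.c, here is a plain C model on arrays. It is a sketch under stated assumptions, not the code from the patch: the function names, the NELT constant and the array representation are all hypothetical. The model extracts the high halves with shifts, whereas a real expansion would more likely pick them out with a permutation whose selector depends on target endianness.

```c
#include <stdint.h>

#define NELT 8  /* elements per input vector, modelled on V8HI */

/* Widening multiply of the even-indexed elements (0, 2, 4, 6).  */
void
widen_umult_even (const uint16_t a[NELT], const uint16_t b[NELT],
                  uint32_t out[NELT / 2])
{
  for (int i = 0; i < NELT / 2; i++)
    out[i] = (uint32_t) a[2 * i] * b[2 * i];
}

/* Widening multiply of the odd-indexed elements (1, 3, 5, 7).  */
void
widen_umult_odd (const uint16_t a[NELT], const uint16_t b[NELT],
                 uint32_t out[NELT / 2])
{
  for (int i = 0; i < NELT / 2; i++)
    out[i] = (uint32_t) a[2 * i + 1] * b[2 * i + 1];
}

/* Element-wise high 16 bits of a[i] * b[i], assembled from the two
   half-width product vectors.  Product i of the even vector belongs
   to result element 2*i, product i of the odd vector to 2*i + 1.  */
void
mult_highpart_u16 (const uint16_t a[NELT], const uint16_t b[NELT],
                   uint16_t out[NELT])
{
  uint32_t even[NELT / 2], odd[NELT / 2];

  widen_umult_even (a, b, even);
  widen_umult_odd (a, b, odd);
  for (int i = 0; i < NELT / 2; i++)
    {
      out[2 * i] = (uint16_t) (even[i] >> 16);
      out[2 * i + 1] = (uint16_t) (odd[i] >> 16);
    }
}
```

The point of the model is that the even/odd products already sit next to the result lanes they feed, so a single interleaving step recovers the highpart vector in element order.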

diff --git a/gcc/expmed.c b/gcc/expmed.c
index cec8d23..4101f61 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -2381,8 +2381,8 @@ static rtx expand_mult_const (enum machine_mode, rtx, HOST_WIDE_INT, rtx,
  const struct algorithm *, enum mult_variant);
 static unsigned HOST_WIDE_INT invert_mod2n (unsigned HOST_WIDE_INT, int);
 static rtx extract_high_half (enum machine_mode, rtx);
-static rtx expand_mult_highpart (enum machine_mode, rtx, rtx, rtx, int, int);
-static rtx expand_mult_highpart_optab (enum machine_mode, rtx, rtx, rtx,
+static rtx expmed_mult_highpart (enum machine_mode, rtx, rtx, rtx, int, int);
+static rtx expmed_mult_highpart_optab (enum machine_mode, rtx, rtx, rtx,
   int, int);
 /* Compute and return the best algorithm for multiplying by T.
The algorithm must cost less than cost_limit
@@ -3477,7 +3477,7 @@ expand_mult_highpart_adjust (enum machine_mode mode, rtx adj_operand, rtx op0,
   return target;
 }
 
-/* Subroutine of expand_mult_highpart.  Return the MODE high part of OP.  */
+/* Subroutine of expmed_mult_highpart.  Return the MODE high part of OP.  */
 
 static rtx
 extract_high_half (enum machine_mode mode, rtx op)
@@ -3495,11 +3495,11 @@ extract_high_half (enum machine_mode mode, rtx op)
   return convert_modes (mode, wider_mode, op, 0);
 }
 
-/* Like expand_mult_highpart, but only consider using a multiplication
+/* Like expmed_mult_highpart, but only consider using a multiplication
optab.  OP1 is an rtx for the constant operand.  */
 
 static rtx
-expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
+expmed_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
rtx target, int unsignedp, int max_cost)
 {
   rtx narrow_op1 = gen_int_mode (INTVAL (op1), mode);
@@ -3610,7 +3610,7 @@ expand_mult_highpart_optab (enum machine_mode mode, rtx op0, rtx op1,
MAX_COST is the total allowed cost for the expanded RTL.  */
 
 static rtx
-expand_mult_highpart (enum machine_mode mode, rtx op0, rtx op1,
+expmed_mult_highpart (enum machine_mode mode, rtx op0, rtx op1,
  rtx target, int unsignedp, int max_cost)
 {
   enum machine_mode wider_mode = GET_MODE_WIDER_MODE (mode);
@@ -3633,7 +3633,7 @@ expand_mult_highpart (enum machine_mode mode, rtx op0, rtx op1,
  mode == word_mode, however all the cost calculations in
  synth_mult etc. assume single-word operations.  */
   if (GET_MODE_BITSIZE (wider_mode) > BITS_PER_WORD)
-return expand_mult_highpart_optab (mode, op0, op1, target,
+return expmed_mult_highpart_optab (mode, op0, op1, target,
   unsignedp, max_cost);
 
   extra_cost = shift_cost[speed][mode][GET_MODE_BITSIZE (mode) - 1];
@@ -3651,7 +3651,7 @@ expand_mult_highpart (enum machine_mode mode, rtx op0, rtx op1,
 {
   /* See whether the specialized multiplication optabs are
 cheaper than the shift/add version.  */
-  tem = expand_mult_highpart_optab (mode, op0, op1, target, unsignedp,
+  tem = expmed_mult_highpart_optab (mode, op0, op1, target, unsignedp,
alg.cost.cost + extra_cost);
   if (tem)
return tem;
@@ -3666,7 +3666,7 @@ expand_mult_highpart (enum machine_mode mode, rtx op0, rtx op1,
 
   return tem;
 }
-  return expand_mult_highpart_optab (mode, op0, op1, target,
+  return expmed_mult_highpart_optab (mode, op0, op1, target,
 unsignedp, max_cost);
 }
 
@@ -3940,7 +3940,7 @@ expand_divmod (int rem_flag, enum tree_code code, enum machine_mode mode,
 
  In all cases but EXACT_DIV_EXPR, this multiplication requires the upper
  half of the product.  Different strategies for generating the product are
- implemented in expand_mult_highpart.
+ impleme

[PATCH 7/7] Zap now unused builtin_mul_widen_even/odd target hooks

2012-07-10 Thread Richard Henderson
* target.def (builtin_mul_widen_even, builtin_mul_widen_odd): Remove.
* system.h (TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): Poison.
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): Poison.
* config/i386/i386.c (IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI): Remove.
(IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V8SI): Remove.
(IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V4SI): Remove.
(IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V8SI): Remove.
(IX86_BUILTIN_VEC_WIDEN_SMUL_EVEN_V4SI): Remove.
(IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V4SI): Remove.
(IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V8SI): Remove.
(bdesc_args): Remove entries to match.
(ix86_builtin_mul_widen_even, ix86_builtin_mul_widen_odd): Remove.
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): Remove.
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): Remove.
* config/rs6000/rs6000.c (rs6000_builtin_mul_widen_even): Remove.
(rs6000_builtin_mul_widen_odd): Remove.
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): Remove.
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): Remove.
* config/spu/spu.c (spu_builtin_mul_widen_even): Remove.
(spu_builtin_mul_widen_odd): Remove.
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN): Remove.
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD): Remove.
* doc/tm.texi.in: Don't document the removed hooks.
---
 gcc/ChangeLog  |   24 ++
 gcc/config/i386/i386.c |   76 
 gcc/config/rs6000/rs6000.c |   51 -
 gcc/config/spu/spu.c   |   42 
 gcc/doc/tm.texi|   22 -
 gcc/doc/tm.texi.in |   22 -
 gcc/system.h   |4 ++-
 gcc/target.def |   14 
 8 files changed, 27 insertions(+), 228 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3cb34ce..23abe01 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25754,14 +25754,6 @@ enum ix86_builtins
   IX86_BUILTIN_CPYSGNPS256,
   IX86_BUILTIN_CPYSGNPD256,
 
-  IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI,
-  IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V8SI,
-  IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V4SI,
-  IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V8SI,
-  IX86_BUILTIN_VEC_WIDEN_SMUL_EVEN_V4SI,
-  IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V4SI,
-  IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V8SI,
-
   /* FMA4 instructions.  */
   IX86_BUILTIN_VFMADDSS,
   IX86_BUILTIN_VFMADDSD,
@@ -26620,10 +26612,6 @@ static const struct builtin_description bdesc_args[] =
 
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_umulv1siv1di3, 
"__builtin_ia32_pmuludq", IX86_BUILTIN_PMULUDQ, UNKNOWN, (int) 
V1DI_FTYPE_V2SI_V2SI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_umult_even_v4si, 
"__builtin_ia32_pmuludq128", IX86_BUILTIN_PMULUDQ128, UNKNOWN, (int) 
V2DI_FTYPE_V4SI_V4SI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_umult_even_v4si, 
"__builtin_vw_umul_even_v4si", IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V4SI, UNKNOWN, 
(int) V2UDI_FTYPE_V4USI_V4USI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_smult_even_v4si, 
"__builtin_ia32_vw_smul_even_v4si", IX86_BUILTIN_VEC_WIDEN_SMUL_EVEN_V4SI, 
UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_umult_odd_v4si, 
"__builtin_ia32_vw_umul_odd_v4si", IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V4SI, 
UNKNOWN, (int) V2UDI_FTYPE_V4USI_V4USI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_smult_odd_v4si, 
"__builtin_ia32_vw_smul_odd_v4si", IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI, 
UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
 
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_pmaddwd, "__builtin_ia32_pmaddwd128", 
IX86_BUILTIN_PMADDWD128, UNKNOWN, (int) V4SI_FTYPE_V8HI_V8HI },
 
@@ -27016,15 +27004,12 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_zero_extendv4hiv4di2  , 
"__builtin_ia32_pmovzxwq256", IX86_BUILTIN_PMOVZXWQ256, UNKNOWN, (int) 
V4DI_FTYPE_V8HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_zero_extendv4siv4di2  , 
"__builtin_ia32_pmovzxdq256", IX86_BUILTIN_PMOVZXDQ256, UNKNOWN, (int) 
V4DI_FTYPE_V4SI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_vec_widen_smult_even_v8si, 
"__builtin_ia32_pmuldq256", IX86_BUILTIN_PMULDQ256, UNKNOWN, (int) 
V4DI_FTYPE_V8SI_V8SI },
-  { OPTION_MASK_ISA_AVX2, CODE_FOR_vec_widen_smult_odd_v8si, 
"__builtin_ia32_vw_smul_odd_v8si", IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V8SI, 
UNKNOWN, (int) V4DI_FTYPE_V8SI_V8SI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_umulhrswv16hi3 , 
"__builtin_ia32_pmulhrsw256", IX86_BUILTIN_PMULHRSW256, UNKNOWN, (int) 
V16HI_FTYPE_V16HI_V16HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_umulv16hi3_highpart, 
"__builtin_ia32_pmulhuw256" , IX86_BUILTIN_PMULHUW256 , UNKNOWN, (int) 
V16HI_FTYPE_V16HI_V16HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_smulv16hi3_highpart, 
"__builtin_ia32_pmulhw256"  , IX86_BUILTIN_PMULHW256  , UNKNOWN, (int) 
V16HI_FTYPE_V16HI_V16HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_mulv16hi3, "_

[PATCH 6/7] Use VEC_WIDEN_MULT_EVEN/ODD_EXPR in supportable_widening_operation

2012-07-10 Thread Richard Henderson
* tree-vect-stmts.c (supportable_widening_operation): Expand
WIDEN_MULT_EXPR via VEC_WIDEN_MULT_EVEN/ODD_EXPR if possible.
---
 gcc/ChangeLog |3 ++
 gcc/tree-vect-stmts.c |   96 +
 2 files changed, 53 insertions(+), 46 deletions(-)
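
The comment moved by this patch argues that a reduction does not care whether the widened products arrive in lo/hi or even/odd order. The following C model is a hedged sketch of that argument. All names are hypothetical, "lo" is assumed to mean elements 0..3 and "hi" elements 4..7 (in GCC the mapping of lo/hi to element numbers depends on the target), and the accumulator is 64-bit only to keep the sketch free of signed overflow.

```c
#include <stdint.h>

#define NELT 8  /* elements per input vector, modelled on V8HI */

/* Generic model of a widening signed multiply that produces NELT/2
   products, reading input elements START, START + STRIDE, ...  */
void
widen_smult (const int16_t a[NELT], const int16_t b[NELT],
             int start, int stride, int32_t out[NELT / 2])
{
  for (int i = 0; i < NELT / 2; i++)
    out[i] = (int32_t) a[start + stride * i] * b[start + stride * i];
}

/* Sum every element of two product vectors.  */
int64_t
reduce_sum (const int32_t v1[NELT / 2], const int32_t v2[NELT / 2])
{
  int64_t acc = 0;
  for (int i = 0; i < NELT / 2; i++)
    acc += (int64_t) v1[i] + v2[i];
  return acc;
}

/* Dot product via in-order halves: [p0,p1,p2,p3] and [p4,p5,p6,p7].  */
int64_t
dot_lo_hi (const int16_t a[NELT], const int16_t b[NELT])
{
  int32_t lo[NELT / 2], hi[NELT / 2];
  widen_smult (a, b, 0, 1, lo);
  widen_smult (a, b, NELT / 2, 1, hi);
  return reduce_sum (lo, hi);
}

/* Dot product via even/odd: [p0,p2,p4,p6] and [p1,p3,p5,p7].  */
int64_t
dot_even_odd (const int16_t a[NELT], const int16_t b[NELT])
{
  int32_t even[NELT / 2], odd[NELT / 2];
  widen_smult (a, b, 0, 2, even);
  widen_smult (a, b, 1, 2, odd);
  return reduce_sum (even, odd);
}
```

Both functions add up the same eight products, only grouped differently, so they agree for any input. The individual product vectors do differ element by element, which is why the even/odd form is only a drop-in replacement when the results feed a reduction and nothing observes their order.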

diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9caf1c6..fe6a997 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6199,7 +6199,8 @@ vect_is_simple_use_1 (tree operand, gimple stmt, loop_vec_info loop_vinfo,
 bool
 supportable_widening_operation (enum tree_code code, gimple stmt,
tree vectype_out, tree vectype_in,
-tree *decl1, tree *decl2,
+tree *decl1 ATTRIBUTE_UNUSED,
+   tree *decl2 ATTRIBUTE_UNUSED,
 enum tree_code *code1, enum tree_code *code2,
 int *multi_step_cvt,
 VEC (tree, heap) **interm_types)
@@ -6207,7 +6208,6 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *vect_loop = NULL;
-  bool ordered_p;
   enum machine_mode vec_mode;
   enum insn_code icode1, icode2;
   optab optab1, optab2;
@@ -6223,56 +6223,60 @@ supportable_widening_operation (enum tree_code code, gimple stmt,
   if (loop_info)
 vect_loop = LOOP_VINFO_LOOP (loop_info);
 
-  /* The result of a vectorized widening operation usually requires two vectors
- (because the widened results do not fit into one vector). The generated
- vector results would normally be expected to be generated in the same
- order as in the original scalar computation, i.e. if 8 results are
- generated in each vector iteration, they are to be organized as follows:
-vect1: [res1,res2,res3,res4], vect2: [res5,res6,res7,res8].
-
- However, in the special case that the result of the widening operation is
- used in a reduction computation only, the order doesn't matter (because
- when vectorizing a reduction we change the order of the computation).
- Some targets can take advantage of this and generate more efficient code.
- For example, targets like Altivec, that support widen_mult using a sequence
- of {mult_even,mult_odd} generate the following vectors:
-vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8].
-
- When vectorizing outer-loops, we execute the inner-loop sequentially
- (each vectorized inner-loop iteration contributes to VF outer-loop
- iterations in parallel).  We therefore don't allow to change the order
- of the computation in the inner-loop during outer-loop vectorization.  */
-
-   if (vect_loop
-   && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
-   && !nested_in_vect_loop_p (vect_loop, stmt))
- ordered_p = false;
-   else
- ordered_p = true;
-
-  if (!ordered_p
-  && code == WIDEN_MULT_EXPR
-  && targetm.vectorize.builtin_mul_widen_even
-  && targetm.vectorize.builtin_mul_widen_even (vectype)
-  && targetm.vectorize.builtin_mul_widen_odd
-  && targetm.vectorize.builtin_mul_widen_odd (vectype))
-{
-  if (vect_print_dump_info (REPORT_DETAILS))
-fprintf (vect_dump, "Unordered widening operation detected.");
-
-  *code1 = *code2 = CALL_EXPR;
-  *decl1 = targetm.vectorize.builtin_mul_widen_even (vectype);
-  *decl2 = targetm.vectorize.builtin_mul_widen_odd (vectype);
-  return true;
-}
-
   switch (code)
 {
 case WIDEN_MULT_EXPR:
+  /* The result of a vectorized widening operation usually requires
+two vectors (because the widened results do not fit into one vector).
+The generated vector results would normally be expected to be
+generated in the same order as in the original scalar computation,
+i.e. if 8 results are generated in each vector iteration, they are
+to be organized as follows:
+   vect1: [res1,res2,res3,res4],
+   vect2: [res5,res6,res7,res8].
+
+However, in the special case that the result of the widening
+operation is used in a reduction computation only, the order doesn't
+matter (because when vectorizing a reduction we change the order of
+the computation).  Some targets can take advantage of this and
+generate more efficient code.  For example, targets like Altivec,
+that support widen_mult using a sequence of {mult_even,mult_odd}
+generate the following vectors:
+   vect1: [res1,res3,res5,res7],
+   vect2: [res2,res4,res6,res8].
+
+When vectorizing outer-loops, we execute the inner-loop sequentially
+(each vectorized inner-loop iteration contributes to VF outer-loop
+iterations in par

[PATCH 4/7] spu: Rename patterns for vec_widen_mult_even/odd_

2012-07-10 Thread Richard Henderson
* config/spu/spu-builtins.md (spu_mpy): Move to spu.md.
(spu_mpyu, spu_mpyhhu, spu_mpyhh): Likewise.
* config/spu/spu.md (vec_widen_smult_odd_v8hi): Rename from spu_mpy.
(vec_widen_umult_odd_v8hi): Rename from spu_mpyu.
(vec_widen_smult_even_v8hi): Rename from spu_mpyhh.
(vec_widen_umult_even_v8hi): Rename from spu_mpyhhu.
* config/spu/spu-builtins.def: Update pattern names to match.
---
 gcc/ChangeLog   |8 
 gcc/config/spu/spu-builtins.def |   24 +--
 gcc/config/spu/spu-builtins.md  |   65 -
 gcc/config/spu/spu.md   |   86 ++-
 4 files changed, 95 insertions(+), 88 deletions(-)

diff --git a/gcc/config/spu/spu-builtins.def b/gcc/config/spu/spu-builtins.def
index 4d01d94..6095e9c 100644
--- a/gcc/config/spu/spu-builtins.def
+++ b/gcc/config/spu/spu-builtins.def
@@ -62,15 +62,15 @@ DEF_BUILTIN (SI_SFI, CODE_FOR_spu_sf,
"si_sfi", B_INSN,
 DEF_BUILTIN (SI_SFX, CODE_FOR_spu_sfx,   "si_sfx", B_INSN, 
  _A4(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
 DEF_BUILTIN (SI_BG,  CODE_FOR_spu_bg,"si_bg",  B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
 DEF_BUILTIN (SI_BGX, CODE_FOR_spu_bgx,   "si_bgx", B_INSN, 
  _A4(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
-DEF_BUILTIN (SI_MPY, CODE_FOR_spu_mpy,   "si_mpy", B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
-DEF_BUILTIN (SI_MPYU,CODE_FOR_spu_mpyu,  "si_mpyu",B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
-DEF_BUILTIN (SI_MPYI,CODE_FOR_spu_mpy,   "si_mpyi",B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_S10))
-DEF_BUILTIN (SI_MPYUI,   CODE_FOR_spu_mpyu,  "si_mpyui",   B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_S10))
+DEF_BUILTIN (SI_MPY, CODE_FOR_vec_widen_smult_odd_v8hi, "si_mpy",  B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
+DEF_BUILTIN (SI_MPYU, CODE_FOR_vec_widen_umult_odd_v8hi, "si_mpyu",B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
+DEF_BUILTIN (SI_MPYI, CODE_FOR_vec_widen_smult_odd_v8hi, "si_mpyi",B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_S10))
+DEF_BUILTIN (SI_MPYUI, CODE_FOR_vec_widen_umult_odd_v8hi, "si_mpyui",  B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_S10))
 DEF_BUILTIN (SI_MPYA,CODE_FOR_spu_mpya,  "si_mpya",B_INSN, 
  _A4(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
 DEF_BUILTIN (SI_MPYH,CODE_FOR_spu_mpyh,  "si_mpyh",B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
 DEF_BUILTIN (SI_MPYS,CODE_FOR_spu_mpys,  "si_mpys",B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
-DEF_BUILTIN (SI_MPYHH,   CODE_FOR_spu_mpyhh, "si_mpyhh",   B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
-DEF_BUILTIN (SI_MPYHHU,  CODE_FOR_spu_mpyhhu,"si_mpyhhu",  B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
+DEF_BUILTIN (SI_MPYHH, CODE_FOR_vec_widen_smult_even_v8hi, "si_mpyhh", B_INSN, 
  _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
+DEF_BUILTIN (SI_MPYHHU, CODE_FOR_vec_widen_umult_even_v8hi, "si_mpyhhu", 
B_INSN, _A3(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
 DEF_BUILTIN (SI_MPYHHA,  CODE_FOR_spu_mpyhha,"si_mpyhha",  B_INSN, 
  _A4(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
 DEF_BUILTIN (SI_MPYHHAU, CODE_FOR_spu_mpyhhau,   "si_mpyhhau", B_INSN, 
  _A4(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
 DEF_BUILTIN (SI_CLZ, CODE_FOR_clzv4si2,  "si_clz", B_INSN, 
  _A2(SPU_BTI_QUADWORD, SPU_BTI_QUADWORD))
@@ -295,16 +295,16 @@ DEF_BUILTIN (SPU_MHHADD,   CODE_FOR_nothing,  
 "spu_mhhadd",
 DEF_BUILTIN (SPU_MHHADD_0, CODE_FOR_spu_mpyhhau,   "spu_mhhadd_0", 
B_INTERNAL, _A4(SPU_BTI_UV4SI,  SPU_BTI_UV8HI,  SPU_BTI_UV8HI,  
SPU_BTI_UV4SI))
 DEF_BUILTIN (SPU_MHHADD_1, CODE_FOR_spu_mpyhha,"spu_mhhadd_1", 
B_INTERNAL, _A4(SPU_BTI_V4SI,   SPU_BTI_V8HI,   SPU_BTI_V8HI,   
SPU_BTI_V4SI))
 DEF_BUILTIN (SPU_MULE, CODE_FOR_nothing,   "spu_mule", 
B_OVERLOAD, _A1(SPU_BTI_VOID))
-DEF_BUILTIN (SPU_MULE_0,   CODE_FOR_spu_mpyhhu,"spu_mule_0",   
B_INTERNAL, _A3(SPU_BTI_UV4SI,  SPU_BTI_UV8HI,  SPU_BTI_UV8HI))
-DEF_BUILTIN (SPU_MULE_1,   CODE_FOR_spu_mpyhh, "spu_mule_1",   
B_INTERNAL, _A3(SPU_BTI_V4SI,   SPU_BTI_V8HI,   SPU_BTI_V8HI))
+DEF_BUILTIN (SPU_MULE_0, CODE_FOR_vec_widen_umult_even_v8hi, "spu_mule_0", 
B_INTERN

[PATCH 3/7] rs6000: Rename patterns for vec_widen_mult_even/odd_

2012-07-10 Thread Richard Henderson
* config/rs6000/altivec.md (vec_widen_umult_even_v16qi): Rename
from altivec_vmuleub.
(vec_widen_smult_even_v16qi): Rename from altivec_vmulesb.
(vec_widen_umult_even_v8hi): Rename from altivec_vmuleuh.
(vec_widen_smult_even_v8hi): Rename from altivec_vmulesh.
(vec_widen_umult_odd_v16qi): Rename from altivec_vmuloub.
(vec_widen_smult_odd_v16qi): Rename from altivec_vmulosb.
(vec_widen_umult_odd_v8hi): Rename from altivec_vmulouh.
(vec_widen_smult_odd_v8hi): Rename from altivec_vmulosh.
* config/rs6000/rs6000-builtin.def: Update pattern names to match.
---
 gcc/ChangeLog|   11 +++
 gcc/config/rs6000/altivec.md |   54 +-
 gcc/config/rs6000/rs6000-builtin.def |   24 +++
 3 files changed, 50 insertions(+), 39 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index fd4bc9d..8c168c8 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -643,7 +643,7 @@
convert_move (small_swap, swap, 0);
  
low_product = gen_reg_rtx (V4SImode);
-   emit_insn (gen_altivec_vmulouh (low_product, one, two));
+   emit_insn (gen_vec_widen_umult_odd_v8hi (low_product, one, two));
  
high_product = gen_reg_rtx (V4SImode);
emit_insn (gen_altivec_vmsumuhm (high_product, one, small_swap, zero));
@@ -667,8 +667,8 @@
rtx high = gen_reg_rtx (V4SImode);
rtx low = gen_reg_rtx (V4SImode);
 
-   emit_insn (gen_altivec_vmulesh (even, operands[1], operands[2]));
-   emit_insn (gen_altivec_vmulosh (odd, operands[1], operands[2]));
+   emit_insn (gen_vec_widen_smult_even_v8hi (even, operands[1], operands[2]));
+   emit_insn (gen_vec_widen_smult_odd_v8hi (odd, operands[1], operands[2]));
 
emit_insn (gen_altivec_vmrghw (high, even, odd));
emit_insn (gen_altivec_vmrglw (low, even, odd));
@@ -936,7 +936,7 @@
   "vmrglw %0,%1,%2"
   [(set_attr "type" "vecperm")])
 
-(define_insn "altivec_vmuleub"
+(define_insn "vec_widen_umult_even_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
 (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
   (match_operand:V16QI 2 "register_operand" "v")]
@@ -945,7 +945,7 @@
   "vmuleub %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "altivec_vmulesb"
+(define_insn "vec_widen_smult_even_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
 (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
   (match_operand:V16QI 2 "register_operand" "v")]
@@ -954,7 +954,7 @@
   "vmulesb %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "altivec_vmuleuh"
+(define_insn "vec_widen_umult_even_v8hi"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
   (match_operand:V8HI 2 "register_operand" "v")]
@@ -963,7 +963,7 @@
   "vmuleuh %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "altivec_vmulesh"
+(define_insn "vec_widen_smult_even_v8hi"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
   (match_operand:V8HI 2 "register_operand" "v")]
@@ -972,7 +972,7 @@
   "vmulesh %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "altivec_vmuloub"
+(define_insn "vec_widen_umult_odd_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
 (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
   (match_operand:V16QI 2 "register_operand" "v")]
@@ -981,7 +981,7 @@
   "vmuloub %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "altivec_vmulosb"
+(define_insn "vec_widen_smult_odd_v16qi"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
 (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")
   (match_operand:V16QI 2 "register_operand" "v")]
@@ -990,7 +990,7 @@
   "vmulosb %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "altivec_vmulouh"
+(define_insn "vec_widen_umult_odd_v8hi"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
   (match_operand:V8HI 2 "register_operand" "v")]
@@ -999,7 +999,7 @@
   "vmulouh %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "altivec_vmulosh"
+(define_insn "vec_widen_smult_odd_v8hi"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")
   (match_operand:V8HI 2 "register_operand" "v")]
@@ -2175,8 +2175,8 @@
   rtx ve = gen_reg_rtx (V8HImode);
   rtx vo = gen_reg_rtx (V8HImode);
   
-  emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
-  emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
+  emit_insn (gen_vec_widen_umult_even

[PATCH 1/7] Add VEC_WIDEN_MULT_EVEN/ODD_EXPR

2012-07-10 Thread Richard Henderson
* tree.def (VEC_WIDEN_MULT_EVEN_EXPR, VEC_WIDEN_MULT_ODD_EXPR): New.
* cfgexpand.c (expand_debug_expr): Handle them.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (fold_binary_loc): Likewise.
* gimple-pretty-print.c (dump_binary_rhs): Likewise.
* optabs.c (optab_for_tree_code): Likewise.
* tree-cfg.c (verify_gimple_assign_binary): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree.c (commutative_tree_code): Likewise.
* tree-vect-generic.c (expand_vector_operations_1): Likewise.
Handle type change before looking up optab.
* optabs.h (OTI_vec_widen_umult_even, OTI_vec_widen_umult_odd): New.
(OTI_vec_widen_smult_even, OTI_vec_widen_smult_odd): New.
(vec_widen_umult_even_optab, vec_widen_umult_odd_optab): New.
(vec_widen_smult_even_optab, vec_widen_smult_odd_optab): New.
* genopinit.c (optabs): Initialize them.
* doc/md.texi: Document them.
---
 gcc/ChangeLog |   21 +
 gcc/cfgexpand.c   |4 +++-
 gcc/doc/md.texi   |   12 +---
 gcc/expr.c|   28 +++-
 gcc/fold-const.c  |   36 
 gcc/genopinit.c   |4 
 gcc/gimple-pretty-print.c |2 ++
 gcc/optabs.c  |8 
 gcc/optabs.h  |   12 ++--
 gcc/tree-cfg.c|2 ++
 gcc/tree-inline.c |2 ++
 gcc/tree-pretty-print.c   |   32 +---
 gcc/tree-vect-generic.c   |   32 +---
 gcc/tree.c|2 ++
 gcc/tree.def  |4 
 15 files changed, 124 insertions(+), 77 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index ad2f667..c8d09c7 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -1,5 +1,5 @@
 /* A pass for lowering trees to RTL.
-   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
+   Copyright (C) 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012
Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -3410,6 +3410,8 @@ expand_debug_expr (tree exp)
 case VEC_UNPACK_LO_EXPR:
 case VEC_WIDEN_MULT_HI_EXPR:
 case VEC_WIDEN_MULT_LO_EXPR:
+case VEC_WIDEN_MULT_EVEN_EXPR:
+case VEC_WIDEN_MULT_ODD_EXPR:
 case VEC_WIDEN_LSHIFT_HI_EXPR:
 case VEC_WIDEN_LSHIFT_LO_EXPR:
 case VEC_PERM_EXPR:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index c71c59c..99f6528 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4561,15 +4561,21 @@ floating point conversion and place the resulting N/2 values of size 2*S in
 the output vector (operand 0).
 
 @cindex @code{vec_widen_umult_hi_@var{m}} instruction pattern
-@cindex @code{vec_widen_umult_lo__@var{m}} instruction pattern
+@cindex @code{vec_widen_umult_lo_@var{m}} instruction pattern
 @cindex @code{vec_widen_smult_hi_@var{m}} instruction pattern
 @cindex @code{vec_widen_smult_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_umult_even_@var{m}} instruction pattern
+@cindex @code{vec_widen_umult_odd_@var{m}} instruction pattern
+@cindex @code{vec_widen_smult_even_@var{m}} instruction pattern
+@cindex @code{vec_widen_smult_odd_@var{m}} instruction pattern
 @item @samp{vec_widen_umult_hi_@var{m}}, @samp{vec_widen_umult_lo_@var{m}}
 @itemx @samp{vec_widen_smult_hi_@var{m}}, @samp{vec_widen_smult_lo_@var{m}}
+@itemx @samp{vec_widen_umult_even_@var{m}}, @samp{vec_widen_umult_odd_@var{m}}
+@itemx @samp{vec_widen_smult_even_@var{m}}, @samp{vec_widen_smult_odd_@var{m}}
 Signed/Unsigned widening multiplication.  The two inputs (operands 1 and 2)
 are vectors with N signed/unsigned elements of size S@.  Multiply the high/low
-elements of the two vectors, and put the N/2 products of size 2*S in the
-output vector (operand 0).
+or even/odd elements of the two vectors, and put the N/2 products of size 2*S
+in the output vector (operand 0).
 
 @cindex @code{vec_widen_ushiftl_hi_@var{m}} instruction pattern
 @cindex @code{vec_widen_ushiftl_lo_@var{m}} instruction pattern
diff --git a/gcc/expr.c b/gcc/expr.c
index 1279186..c56b0e5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8917,29 +8917,15 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
 
 case VEC_WIDEN_MULT_HI_EXPR:
 case VEC_WIDEN_MULT_LO_EXPR:
-  {
-   tree oprnd0 = treeop0;
-   tree oprnd1 = treeop1;
-
-   expand_operands (oprnd0, oprnd1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
-   target = expand_widen_pattern_expr (ops, op0, op1, NULL_RTX,
-   target, unsignedp);
-   gcc_assert (target);
-   return target;
-  }
-
+case VEC_WIDEN_MULT_EVEN_EXPR:
+case VEC_WIDEN_MULT_ODD_EXPR:
 case VEC_WIDEN_LSHIFT_HI_EXPR:
 case VEC_WIDEN_LSHIFT_LO_EXPR:
-  {
-tree oprnd0 = treeop0;
-  

[PATCH 2/7] i386: Rename patterns for vec_widen_mult_even/odd_

2012-07-10 Thread Richard Henderson
* config/i386/sse.md (vec_widen_umult_even_v8si): Rename from
avx2_umulv4siv4di3.
(vec_widen_umult_even_v4si): Rename from sse2_umulv2siv2di3.
(vec_widen_smult_even_v8si): Rename from avx2_mulv4siv4di3.
(mulv4si3): Remove XOP test shadowed by SSE4 test.
* config/i386/i386.c (bdesc_args): Update pattern names.
(ix86_expand_sse2_mulvxdi3): Likewise.
(ix86_expand_mul_widen_evenodd): Likewise.  Remove XOP test
shadowed by SSE4 test.
---
 gcc/ChangeLog  |   10 ++
 gcc/config/i386/i386.c |   31 +--
 gcc/config/i386/sse.md |   18 ++
 3 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fbab32f..3cb34ce 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -26619,8 +26619,8 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_psadbw, "__builtin_ia32_psadbw128", 
IX86_BUILTIN_PSADBW128, UNKNOWN, (int) V2DI_FTYPE_V16QI_V16QI },
 
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_umulv1siv1di3, 
"__builtin_ia32_pmuludq", IX86_BUILTIN_PMULUDQ, UNKNOWN, (int) 
V1DI_FTYPE_V2SI_V2SI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_umulv2siv2di3, 
"__builtin_ia32_pmuludq128", IX86_BUILTIN_PMULUDQ128, UNKNOWN, (int) 
V2DI_FTYPE_V4SI_V4SI },
-  { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_umulv2siv2di3, 
"__builtin_vw_umul_even_v4si", IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V4SI, UNKNOWN, 
(int) V2UDI_FTYPE_V4USI_V4USI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_umult_even_v4si, 
"__builtin_ia32_pmuludq128", IX86_BUILTIN_PMULUDQ128, UNKNOWN, (int) 
V2DI_FTYPE_V4SI_V4SI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_umult_even_v4si, 
"__builtin_vw_umul_even_v4si", IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V4SI, UNKNOWN, 
(int) V2UDI_FTYPE_V4USI_V4USI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_smult_even_v4si, 
"__builtin_ia32_vw_smul_even_v4si", IX86_BUILTIN_VEC_WIDEN_SMUL_EVEN_V4SI, 
UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_umult_odd_v4si, 
"__builtin_ia32_vw_umul_odd_v4si", IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V4SI, 
UNKNOWN, (int) V2UDI_FTYPE_V4USI_V4USI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_smult_odd_v4si, 
"__builtin_ia32_vw_smul_odd_v4si", IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI, 
UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
@@ -27015,15 +27015,15 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_zero_extendv8hiv8si2  , 
"__builtin_ia32_pmovzxwd256", IX86_BUILTIN_PMOVZXWD256, UNKNOWN, (int) 
V8SI_FTYPE_V8HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_zero_extendv4hiv4di2  , 
"__builtin_ia32_pmovzxwq256", IX86_BUILTIN_PMOVZXWQ256, UNKNOWN, (int) 
V4DI_FTYPE_V8HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_zero_extendv4siv4di2  , 
"__builtin_ia32_pmovzxdq256", IX86_BUILTIN_PMOVZXDQ256, UNKNOWN, (int) 
V4DI_FTYPE_V4SI },
-  { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_mulv4siv4di3  , 
"__builtin_ia32_pmuldq256"  , IX86_BUILTIN_PMULDQ256  , UNKNOWN, (int) 
V4DI_FTYPE_V8SI_V8SI },
+  { OPTION_MASK_ISA_AVX2, CODE_FOR_vec_widen_smult_even_v8si, 
"__builtin_ia32_pmuldq256", IX86_BUILTIN_PMULDQ256, UNKNOWN, (int) 
V4DI_FTYPE_V8SI_V8SI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_vec_widen_smult_odd_v8si, 
"__builtin_ia32_vw_smul_odd_v8si", IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V8SI, 
UNKNOWN, (int) V4DI_FTYPE_V8SI_V8SI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_umulhrswv16hi3 , 
"__builtin_ia32_pmulhrsw256", IX86_BUILTIN_PMULHRSW256, UNKNOWN, (int) 
V16HI_FTYPE_V16HI_V16HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_umulv16hi3_highpart, 
"__builtin_ia32_pmulhuw256" , IX86_BUILTIN_PMULHUW256 , UNKNOWN, (int) 
V16HI_FTYPE_V16HI_V16HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_smulv16hi3_highpart, 
"__builtin_ia32_pmulhw256"  , IX86_BUILTIN_PMULHW256  , UNKNOWN, (int) 
V16HI_FTYPE_V16HI_V16HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_mulv16hi3, "__builtin_ia32_pmullw256"  , 
IX86_BUILTIN_PMULLW256  , UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_mulv8si3, "__builtin_ia32_pmulld256"  , 
IX86_BUILTIN_PMULLD256  , UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI },
-  { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_umulv4siv4di3  , 
"__builtin_ia32_pmuludq256" , IX86_BUILTIN_PMULUDQ256 , UNKNOWN, (int) 
V4DI_FTYPE_V8SI_V8SI },
-  { OPTION_MASK_ISA_AVX2, CODE_FOR_avx2_umulv4siv4di3  , 
"__builtin_i386_vw_umul_even_v8si" , IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V8SI, 
UNKNOWN, (int) V4UDI_FTYPE_V8USI_V8USI },
+  { OPTION_MASK_ISA_AVX2, CODE_FOR_vec_widen_umult_even_v8si, 
"__builtin_ia32_pmuludq256", IX86_BUILTIN_PMULUDQ256, UNKNOWN, (int) 
V4DI_FTYPE_V8SI_V8SI },
+  { OPTION_MASK_ISA_AVX2, CODE_FOR_vec_widen_umult_even_v8si, 
"__builtin_i386_vw_umul_even_v8si", IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V8SI, 
UNKNOWN, (int) V4UDI_FTYPE_V8USI_V8USI },
   { OPTION_MASK_ISA_AVX2, CODE_FOR_vec_widen_umult_odd_v8si, 
"__builtin_ia32_vw_umul_odd_v8si", IX86_

[PATCH 0/7] Clean up widen mult even/odd

2012-07-10 Thread Richard Henderson
I find it instructive that 4 of the 5 ISAs that actually implement
widening integer multiplication have mult-widen-even as the ISA
primitive (even if the -odd variant is missing).  The fact that this
operation is implemented as a set of builtins and target hooks has
led to disturbingly cookie-cutter implementations of these hooks
in the various backends.

Thus I chose to add VEC_WIDEN_MULT_EVEN/ODD_EXPR as tree codes and
optabs.  This removes a fairly trivial amount of code from three
backends (the fourth backend, ia64, never grew this support).
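
For readers unfamiliar with the operation, here is a minimal scalar sketch
of what an unsigned widening multiply of the even and odd lanes computes for
a 4 x 32-bit vector.  The function names are hypothetical and exist only for
illustration; they are not GCC internals, and the lane numbering assumes
lane 0 is the first element in memory (which lanes count as "even" on a
given target can depend on endianness).

```c
#include <stdint.h>

/* Hypothetical scalar model, for illustration only.  Each input vector
   holds four 32-bit lanes; each result holds two 64-bit products.  */

/* Multiply lanes 0 and 2, widening each product to 64 bits.  */
static void
widen_umult_even_v4si (const uint32_t a[4], const uint32_t b[4],
		       uint64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (uint64_t) a[2 * i] * b[2 * i];
}

/* Multiply lanes 1 and 3, widening each product to 64 bits.  */
static void
widen_umult_odd_v4si (const uint32_t a[4], const uint32_t b[4],
		      uint64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (uint64_t) a[2 * i + 1] * b[2 * i + 1];
}
```

The cast to uint64_t before the multiplication is what makes the product
widening; without it the multiply would be done in 32 bits and truncated.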

The existence of optabs then allows the expansion of MULT_HIGHPART_EXPR
at the rtl-expansion level without having to resort to builtin expansion
in order to emit the even/odd alternative.  This removes a fairly
substantial amount of code from the vectorizer.
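
As a rough illustration of the even/odd alternative, the sketch below
models an unsigned 32-bit highpart multiply built from the two widening
products.  It is a scalar model under my own assumptions: the function name
is hypothetical, and a real expansion would form the even and odd product
vectors and then select the high halves with a permute rather than loop
over lanes.

```c
#include <stdint.h>

/* Hypothetical scalar model, for illustration only: compute the high
   32 bits of each 32x32->64 product from even/odd widening multiplies.  */
static void
umul_highpart_v4si (const uint32_t a[4], const uint32_t b[4],
		    uint32_t hi[4])
{
  for (int i = 0; i < 2; i++)
    {
      /* Widening products of the even lane and the odd lane of pair i.  */
      uint64_t even = (uint64_t) a[2 * i] * b[2 * i];
      uint64_t odd = (uint64_t) a[2 * i + 1] * b[2 * i + 1];

      /* Keep only the upper half of each product, interleaved back
	 into the original lane order.  */
      hi[2 * i] = (uint32_t) (even >> 32);
      hi[2 * i + 1] = (uint32_t) (odd >> 32);
    }
}
```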

I've not touched the interface to supportable_widening_operation,
which is still prepared to return a CALL_EXPR and some decls.  After
this patch set it will never do so.  I'm undecided as to whether we
ought to be prepared for such in the future, or whether this should
simply go in as a completely separate patch that could in the future
be easily reverted.

Tested on x86_64; cross-compiled to ppc64 and spu, spot checking the
relevant division-by-constant testcases.


r~



Richard Henderson (7):
  Add VEC_WIDEN_MULT_EVEN/ODD_EXPR
  i386: Rename patterns for vec_widen_mult_even/odd_
  rs6000: Rename patterns for vec_widen_mult_even/odd_
  spu: Rename patterns for vec_widen_mult_even/odd_
  Move vector highpart emulation to the optabs layer
  Use VEC_WIDEN_MULT_EVEN/ODD_EXPR in supportable_widening_operation
  Zap now unused builtin_mul_widen_even/odd target hooks

 gcc/ChangeLog                        |   89
 gcc/cfgexpand.c                      |    4 +-
 gcc/config/i386/i386.c               |  103 ++---
 gcc/config/i386/sse.md               |   18 +--
 gcc/config/rs6000/altivec.md         |   54 +++
 gcc/config/rs6000/rs6000-builtin.def |   24 +--
 gcc/config/rs6000/rs6000.c           |   51 ---
 gcc/config/spu/spu-builtins.def      |   24 +--
 gcc/config/spu/spu-builtins.md       |   65 -
 gcc/config/spu/spu.c                 |   42 --
 gcc/config/spu/spu.md                |   86 +--
 gcc/doc/md.texi                      |   12 +-
 gcc/doc/tm.texi                      |   22 ---
 gcc/doc/tm.texi.in                   |   22 ---
 gcc/expmed.c                         |   32 ++--
 gcc/expr.c                           |   35 ++---
 gcc/fold-const.c                     |   36 +++--
 gcc/genopinit.c                      |    4 +
 gcc/gimple-pretty-print.c            |    2 +
 gcc/optabs.c                         |  134 +
 gcc/optabs.h                         |   18 ++-
 gcc/system.h                         |    4 +-
 gcc/target.def                       |   14 --
 gcc/tree-cfg.c                       |    2 +
 gcc/tree-inline.c                    |    2 +
 gcc/tree-pretty-print.c              |   32 ++--
 gcc/tree-vect-generic.c              |  145 +-
 gcc/tree-vect-patterns.c             |   23 +--
 gcc/tree-vect-stmts.c                |  267 +-
 gcc/tree.c                           |    2 +
 gcc/tree.def                         |    4 +
 31 files changed, 580 insertions(+), 792 deletions(-)

-- 
1.7.10.4



[SH] PR 53911 - Remove SImode displacement addressing related splits

2012-07-10 Thread Oleg Endo
Hello,

The attached patch removes two splits that undo displacement address
re-basing.  I've noticed that removing the two splits seems to result in
overall slightly smaller code according to the CSiBE set (compared with
-m4-single -ml -O2 -mpretend-cmove, -1048 bytes in total), despite some
code size increases here and there. 

Tested on rev. 189361 with
make -k check RUNTESTFLAGS="--target_board=sh-sim
\{-m2/-ml,-m2/-mb,-m2a/-mb,-m2a-single/-mb,-m4/-ml,-m4/-mb,-m4-single/-ml,
-m4-single/-mb,-m4a-single/-ml,-m4a-single/-mb}"

and no new failures.

Cheers,
Oleg

ChangeLog

PR target/53911
* config/sh/sh.md: Remove displacement addressing related
splits.
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 189362)
+++ gcc/config/sh/sh.md	(working copy)
@@ -6683,63 +6683,6 @@
 FAIL;
 })
 
-;; If a base address generated by LEGITIMIZE_ADDRESS for SImode is
-;; used only once, let combine add in the index again.
-
-(define_split
-  [(set (match_operand:SI 0 "register_operand" "")
-	(match_operand:SI 1 "" ""))
-   (clobber (match_operand 2 "register_operand" ""))]
-  "TARGET_SH1 && ! reload_in_progress && ! reload_completed
-   && ALLOW_INDEXED_ADDRESS"
-  [(use (reg:SI R0_REG))]
-{
-  rtx addr, reg, const_int;
-
-  if (!MEM_P (operands[1]))
-FAIL;
-  addr = XEXP (operands[1], 0);
-  if (GET_CODE (addr) != PLUS)
-FAIL;
-  reg = XEXP (addr, 0);
-  const_int = XEXP (addr, 1);
-  if (! (BASE_REGISTER_RTX_P (reg) && INDEX_REGISTER_RTX_P (operands[2])
-	 && CONST_INT_P (const_int)))
-FAIL;
-  emit_move_insn (operands[2], const_int);
-  emit_move_insn (operands[0],
-		  change_address (operands[1], VOIDmode,
-  gen_rtx_PLUS (SImode, reg, operands[2])));
-  DONE;
-})
-
-(define_split
-  [(set (match_operand:SI 1 "" "")
-	(match_operand:SI 0 "register_operand" ""))
-   (clobber (match_operand 2 "register_operand" ""))]
-  "TARGET_SH1 && ! reload_in_progress && ! reload_completed
-   && ALLOW_INDEXED_ADDRESS"
-  [(use (reg:SI R0_REG))]
-{
-  rtx addr, reg, const_int;
-
-  if (!MEM_P (operands[1]))
-FAIL;
-  addr = XEXP (operands[1], 0);
-  if (GET_CODE (addr) != PLUS)
-FAIL;
-  reg = XEXP (addr, 0);
-  const_int = XEXP (addr, 1);
-  if (! (BASE_REGISTER_RTX_P (reg) && INDEX_REGISTER_RTX_P (operands[2])
-	 && CONST_INT_P (const_int)))
-FAIL;
-  emit_move_insn (operands[2], const_int);
-  emit_move_insn (change_address (operands[1], VOIDmode,
-  gen_rtx_PLUS (SImode, reg, operands[2])),
-		  operands[0]);
-  DONE;
-})
-
 (define_expand "movdf"
   [(set (match_operand:DF 0 "general_movdst_operand" "")
 	(match_operand:DF 1 "general_movsrc_operand" ""))]


Re: [patch] Add a lexical block only when the callsite has source location info

2012-07-10 Thread Dehao Chen
On Tue, Jul 10, 2012 at 1:57 PM, Xinliang David Li  wrote:
> Is this related to the problem described in
> http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01511.html ?

This does not sound related to me. This patch only fixes the block info
for phi_arg_t.

The following patch is related to function split, but only for debug info.

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01579.html

Thanks,
Dehao

>
> David
>
> On Mon, Jun 25, 2012 at 4:43 AM, Dehao Chen  wrote:
>> During function inlining, a lexical block is added for each cloned
>> callee, and source info is attached to this block for addr2line to
>> derive the inline stack. However, some callsites do not have source
>> information attached to them. Adding a lexical block would be misleading
>> in this case. E.g. If a function is split, when the split callsite is
>> inlined back, the cloned callee should stay in the same lexical block
>> with its caller. This patch ensures that lexical blocks are only added
>> when the callsite has source location info in it.
>>
>> Bootstrapped and passed gcc regression tests.
>>
>> Is it ok for trunk?
>>
>> Thanks,
>> Dehao
>>
>> gcc/ChangeLog:
>> 2012-06-25  Dehao Chen  
>>
>> * tree-inline.c (expand_call_inline): Make a new lexical block only
>> when the call stmt has source location.
>>
>> Index: gcc/tree-inline.c
>> ===
>> --- gcc/tree-inline.c   (revision 188926)
>> +++ gcc/tree-inline.c   (working copy)
>> @@ -3950,10 +3950,17 @@
>>   actual inline expansion of the body, and a label for the return
>>   statements within the function to jump to.  The type of the
>>   statement expression is the return type of the function call.  */
>> -  id->block = make_node (BLOCK);
>> -  BLOCK_ABSTRACT_ORIGIN (id->block) = fn;
>> -  BLOCK_SOURCE_LOCATION (id->block) = input_location;
>> -  prepend_lexical_block (gimple_block (stmt), id->block);
>> +  if (gimple_has_location (stmt))
>> +{
>> +  id->block = make_node (BLOCK);
>> +  BLOCK_ABSTRACT_ORIGIN (id->block) = fn;
>> +  BLOCK_SOURCE_LOCATION (id->block) = input_location;
>> +  prepend_lexical_block (gimple_block (stmt), id->block);
>> +}
>> +  else
>> +{
>> +  id->block = gimple_block (stmt);
>> +}
>>
>>/* Local declarations will be replaced by their equivalents in this
>>   map.  */


Re: [patch][i386] Remove some dead code (TARGET_BRANCH_PREDICTION_HINTS)

2012-07-10 Thread Steven Bosscher
On Tue, Jul 10, 2012 at 8:56 AM, Uros Bizjak  wrote:
> Hello!
>
>> TARGET_BRANCH_PREDICTION_HINTS isn't used at all. This patch removes it.
>> Bootstrapped&tested (incl. -m32) on x86_64-unknown-linux-gnu. OK for trunk?
>
> This infrastructure can be used for future targets, so let's leave it as is.

Hi Uros,

Yes, I suppose that's theoretically possible. However, I don't think
this is very likely to happen. The branch hints were only ever
supported for P4, but even Intel's own ICC never emits them
(http://sources.redhat.com/ml/binutils/2004-07/msg00322.html). In
fact, the whole Netburst microarchitecture appears to have been
abandoned (in favor of the ppro microarchitecture).  For EM64T another
form of branch hints was introduced (which GCC doesn't support and ICC
doesn't emit).

So any future target that would need this would have to be a 32-bit,
deep-pipeline microarchitecture. I personally don't believe that such
an architecture will emerge, given the history of failure of this
concept.

Are these arguments reason enough for you to reconsider? :-)

Ciao!
Steven