Re: varpool alias reorg

2011-06-28 Thread Richard Guenther
On Mon, 27 Jun 2011, Jan Hubicka wrote:

  On Fri, 24 Jun 2011, Jan Hubicka wrote:
  
   Hi,
   this is yet another variant of the fix.  This time we stream builtin
   decls as usual, but at fixup time we copy the assembler names (if set)
   into the builtin decls used by folders.  Not sure if it is any better
   than breaking memops-asm, but I can imagine that things like glibc
   actually rename string functions into their internal variants (and thus
   with this version of the patch we would be able to LTO such a library,
   but still we won't be able to LTO such a library into something else,
   because that something else would end up referencing the internal
   versions of builtins).  I doubt we could do any better, however.
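   (For illustration only - an assumption about the kind of renaming meant
   here, not glibc's actual headers - such a rename is usually done with an
   asm label on the declaration:

     extern void *memcpy (void *dest, const void *src, __SIZE_TYPE__ n)
       __asm__ ("__memcpy_internal");

   where "__memcpy_internal" is a made-up internal symbol name.)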
  
  Not stream builtins with adjusted assembler names (I guess we'd need
  a flag for this, DECL_USER_ASSEMBLER_NAME_SET_P?  Or just check for
 
 Most of the code just checks for '*' at the beginning of the assembler name.
 I suppose it is safe.
 
  attributes?) as builtins but as new decls.  Let lto symbol merging
  then register those as aliases.  But which way around?  probably
  similar to how we should handle re-defined extern inlines, the
  extern inline being the GCC builtin and the re-definition being
  the aliased one.
 
 I don't quite get your answer here.  What we do now is:
 
  1) stream in the builtin as a special kind of reference with the decl
     assembler name associated to it;
  2) at stream-in time always resolve the builtin to the official builtin
     decl (no matter what types and other stuff the builtin had at
     stream-out time) and overwrite the official builtin's assembler name
     with the one specified.
 
 What I suggest is:
 
  1) Stream out builtins as usual decls, just with the extra function code.
  2) Stream in builtins as usual.
  3) Optionally set the assembler name of the official decl.
 
 I see there are problems with e.g. the one-decl rule, but we have the same
 problems with normal frontends, which sadly also use a different decl for
 explicit builtin calls than for implicit ones.
 
 I am not quite sure what the proper fix for this problem is - it is very
 handy to have a builtin decl in the middle end where I know it is sane
 (i.e. it has the right types etc.).  Since C allows declaring the builtins
 arbitrarily, it gets a bit tricky to preserve the one-decl rule here.

Hm.  I would suggest to do as now: stream in the builtin specially if it
does not have an assembler name attribute.  If it does have one, stream
it as usual and let lto-symtab do its job (I suppose we need to
register builtin functions with the symtab as well).

   __attribute__ ((used)) is still needed in memops-asm-lib.c because the
   LTO symtab of course doesn't see the future references to builtins that
   we will emit later via folding.  I think that is a reasonable
   requirement, as discussed at the time the plugin was enabled.
  
  Yes, I think the testcase fix sounds reasonable.
  
  I suppose you can come up with a simpler testcase for this feature
  for gcc.dg/lto highlighting the different issues?  I'm not sure
  if we are talking about my_memcpy () alias(memcpy) or
  memcpy () alias(my_memcpy).
  
  I still like to stream unmodified builtins as builtins, as that is
  similar to pre-loading the streamer caches with things like
  void_type_node or sizetype.
 
 Doing so will require us to solve the other one-decl-rule problems, probably.
 I didn't really get what the preloading is useful for, after all?

Saving memory mostly, apart from the special singletons we have
(as Micha already hinted).

Richard.


Re: [RFC] Fix full memory barrier on SPARC-V8

2011-06-28 Thread Eric Botcazou
 Let's clarify something, did you run your testcase that triggered this
 bug on a v8 or a v9 machine?

Sun UltraSPARC, so V9 of course.  The point is that Solaris is TSO (TSO as 
defined for the V9 architecture, i.e. backward compatible with V8) so you have 
a V8-compatible TSO implementation, in particular not a Strong Consistency V8.

It is perfectly valid to compile with -mcpu=v8 on Solaris and expect to get a 
working program.  Now if you start to play seriously with __sync_synchronize, 
you conclude that it doesn't implement a full memory barrier with -mcpu=v8.

The V8 architecture manual is quite clear about it: TSO allows stores to be 
reordered after subsequent loads (it's the only difference between TSO and 
Strong Consistency), so you need to do something to get a full memory barrier.
As there is no specific instruction to that effect in V8, you need to do what 
is done for pre-SSE2 x86, i.e. use an atomic instruction.
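A minimal sketch of what that means in practice (an illustration of the idea
only, not the actual patch; the dummy variable and the exact asm string are
assumptions):

  /* Full barrier on SPARC V8 TSO: an atomic ldstub to a dummy location
     orders earlier stores before later loads, like the pre-SSE2 x86 trick
     of using a locked instruction.  Illustrative sketch only.  */
  static inline void
  v8_full_barrier (void)
  {
    static unsigned char dummy;
    unsigned char tmp;
    __asm__ __volatile__ ("ldstub [%1], %0"
                          : "=&r" (tmp)
                          : "r" (&dummy)
                          : "memory");
  }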

-- 
Eric Botcazou


[ARM] fix PR target/48637

2011-06-28 Thread Richard Earnshaw
For a long time now the compiler has permitted printing a symbol with
the %c operator, but for some reason we've never permitted
symbol+offset.  This patch fixes this omission and also makes the
compiler slightly more friendly to users of ASM statements by not
generating an ICE when it can't handle an expression.  Tested on
arm-eabi and installed on trunk.

This is not a regression, so I don't propose to back-port it to older
compilers (though doing so would most likely be trivial).
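For reference, the kind of user asm this affects looks roughly like the
following (a made-up example, not taken from the PR):

  extern int table[16];

  void
  emit_table_address (void)
  {
    /* A symbol-plus-offset constant printed with the 'c' operand modifier.
       Previously this could ICE; with the patch it is either printed or
       reported as an invalid asm operand.  */
    __asm__ (".word %c0" : : "i" (&table[2]));
  }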

R.

2011-06-27  Richard Earnshaw  rearn...@arm.com

PR target/48637
* arm.c (arm_print_operand): Allow sym+offset.  Don't abort on
invalid asm operands.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index efffcf8..8b9cb25 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16242,8 +16242,17 @@ arm_print_operand (FILE *stream, rtx x, int code)
 	  output_addr_const (stream, x);
 	  break;
 
+	case CONST:
+	  if (GET_CODE (XEXP (x, 0)) == PLUS
+	      && GET_CODE (XEXP (XEXP (x, 0), 0)) == SYMBOL_REF)
+	{
+	  output_addr_const (stream, x);
+	  break;
+	}
+	  /* Fall through.  */
+
 	default:
-	  gcc_unreachable ();
+	  output_operand_lossage ("Unsupported operand for code '%c'", code);
 	}
   return;
 


Re: [PATCH] __builtin_assume_aligned

2011-06-28 Thread Richard Guenther
On Mon, Jun 27, 2011 at 6:54 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Mon, Jun 27, 2011 at 12:17:40PM +0200, Richard Guenther wrote:
 Ok if you remove the builtins.c folding and instead verify arguments
 from check_builtin_function_arguments.

 Thanks, here is what I've committed after bootstrapping/regtesting
 again on x86_64-linux and i686-linux.

Thanks Jakub.  Probably worth an entry in changes.html.
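For a changes.html entry, usage looks like this (a minimal example of the
two-argument form; the optional misalignment argument is omitted):

  void
  scale (double *p)
  {
    /* Tell the compiler the returned pointer is 16-byte aligned.  */
    double *ap = (double *) __builtin_assume_aligned (p, 16);
    int i;
    for (i = 0; i < 64; i++)
      ap[i] *= 2.0;
  }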

Richard.

 2011-06-27  Jakub Jelinek  ja...@redhat.com

        * builtin-types.def (BT_FN_PTR_CONST_PTR_SIZE_VAR): New.
        * builtins.def (BUILT_IN_ASSUME_ALIGNED): New builtin.
        * tree-ssa-structalias.c (find_func_aliases_for_builtin_call,
        find_func_clobbers): Handle BUILT_IN_ASSUME_ALIGNED.
        * tree-ssa-ccp.c (bit_value_assume_aligned): New function.
        (evaluate_stmt, execute_fold_all_builtins): Handle
        BUILT_IN_ASSUME_ALIGNED.
        * tree-ssa-dce.c (propagate_necessity): Likewise.
        * tree-ssa-alias.c (ref_maybe_used_by_call_p_1,
        call_may_clobber_ref_p_1): Likewise.
        * builtins.c (is_simple_builtin, expand_builtin): Likewise.
        (expand_builtin_assume_aligned): New function.
        * doc/extend.texi (__builtin_assume_aligned): Document.

        * c-common.c (check_builtin_function_arguments): Handle
        BUILT_IN_ASSUME_ALIGNED.

        * gcc.dg/builtin-assume-aligned-1.c: New test.
        * gcc.dg/builtin-assume-aligned-2.c: New test.
        * gcc.target/i386/builtin-assume-aligned-1.c: New test.

 --- gcc/builtin-types.def.jj    2011-06-26 09:55:16.0 +0200
 +++ gcc/builtin-types.def       2011-06-27 15:08:12.0 +0200
 @@ -454,6 +454,8 @@ DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_CONST
                         BT_INT, BT_CONST_STRING, BT_CONST_STRING)
  DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_INT_CONST_STRING_VAR,
                         BT_INT, BT_INT, BT_CONST_STRING)
 +DEF_FUNCTION_TYPE_VAR_2 (BT_FN_PTR_CONST_PTR_SIZE_VAR, BT_PTR,
 +                        BT_CONST_PTR, BT_SIZE)

  DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_STRING_SIZE_CONST_STRING_VAR,
                         BT_INT, BT_STRING, BT_SIZE, BT_CONST_STRING)
 --- gcc/builtins.def.jj 2011-06-26 09:55:16.0 +0200
 +++ gcc/builtins.def    2011-06-27 15:08:12.0 +0200
 @@ -1,7 +1,7 @@
  /* This file contains the definitions and documentation for the
    builtins used in the GNU compiler.
    Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
 -   2010 Free Software Foundation, Inc.
 +   2010, 2011 Free Software Foundation, Inc.

  This file is part of GCC.

 @@ -638,6 +638,7 @@ DEF_EXT_LIB_BUILTIN        (BUILT_IN_EXE
  DEF_EXT_LIB_BUILTIN        (BUILT_IN_EXECVE, execve, 
 BT_FN_INT_CONST_STRING_PTR_CONST_STRING_PTR_CONST_STRING, ATTR_NOTHROW_LIST)
  DEF_LIB_BUILTIN        (BUILT_IN_EXIT, exit, BT_FN_VOID_INT, 
 ATTR_NORETURN_NOTHROW_LIST)
  DEF_GCC_BUILTIN        (BUILT_IN_EXPECT, expect, BT_FN_LONG_LONG_LONG, 
 ATTR_CONST_NOTHROW_LEAF_LIST)
 +DEF_GCC_BUILTIN        (BUILT_IN_ASSUME_ALIGNED, "assume_aligned", 
 BT_FN_PTR_CONST_PTR_SIZE_VAR, ATTR_CONST_NOTHROW_LEAF_LIST)
  DEF_GCC_BUILTIN        (BUILT_IN_EXTEND_POINTER, extend_pointer, 
 BT_FN_UNWINDWORD_PTR, ATTR_CONST_NOTHROW_LEAF_LIST)
  DEF_GCC_BUILTIN        (BUILT_IN_EXTRACT_RETURN_ADDR, extract_return_addr, 
 BT_FN_PTR_PTR, ATTR_LEAF_LIST)
  DEF_EXT_LIB_BUILTIN    (BUILT_IN_FFS, ffs, BT_FN_INT_INT, 
 ATTR_CONST_NOTHROW_LEAF_LIST)
 --- gcc/tree-ssa-structalias.c.jj       2011-06-26 09:55:16.0 +0200
 +++ gcc/tree-ssa-structalias.c  2011-06-27 15:08:12.0 +0200
 @@ -4002,6 +4002,7 @@ find_func_aliases_for_builtin_call (gimp
       case BUILT_IN_STPCPY_CHK:
       case BUILT_IN_STRCAT_CHK:
       case BUILT_IN_STRNCAT_CHK:
 +      case BUILT_IN_ASSUME_ALIGNED:
        {
          tree res = gimple_call_lhs (t);
          tree dest = gimple_call_arg (t, (DECL_FUNCTION_CODE (fndecl)
 @@ -4726,6 +4727,7 @@ find_func_clobbers (gimple origt)
              return;
            }
          /* The following functions neither read nor clobber memory.  */
 +         case BUILT_IN_ASSUME_ALIGNED:
          case BUILT_IN_FREE:
            return;
          /* Trampolines are of no interest to us.  */
 --- gcc/tree-ssa-ccp.c.jj       2011-06-26 09:55:16.0 +0200
 +++ gcc/tree-ssa-ccp.c  2011-06-27 15:08:12.0 +0200
 @@ -1476,6 +1476,64 @@ bit_value_binop (enum tree_code code, tr
   return val;
  }

 +/* Return the propagation value when applying __builtin_assume_aligned to
 +   its arguments.  */
 +
 +static prop_value_t
 +bit_value_assume_aligned (gimple stmt)
 +{
 +  tree ptr = gimple_call_arg (stmt, 0), align, misalign = NULL_TREE;
 +  tree type = TREE_TYPE (ptr);
 +  unsigned HOST_WIDE_INT aligni, misaligni = 0;
 +  prop_value_t ptrval = get_value_for_expr (ptr, true);
 +  prop_value_t alignval;
 +  double_int value, mask;
 +  prop_value_t val;
 +  if (ptrval.lattice_val == UNDEFINED)
 +    return ptrval;
 +  gcc_assert ((ptrval.lattice_val == CONSTANT
 

Re: [patch tree-optimization]: Try to sink type-casts for binary and/or/xor operations

2011-06-28 Thread Richard Guenther
On Mon, Jun 27, 2011 at 7:17 PM, Kai Tietz kti...@redhat.com wrote:
 Oops, I missed updating the patch.

You still modify the

  /* If the first argument is an SSA name that is itself a result of a
 typecast of an ADDR_EXPR to an integer, feed the ADDR_EXPR to the
 folder rather than the ssa name.  */

block.  Please merge the constant handling with the
CONVERT_EXPR_CODE_P path instead.  The above block is purely
legacy and should probably be entirely dropped.

Richard.


 Kai

 - Original Message -
 From: Kai Tietz kti...@redhat.com
 To: Richard Guenther richard.guent...@gmail.com
 Cc: gcc-patches@gcc.gnu.org
 Sent: Monday, June 27, 2011 7:04:04 PM
 Subject: Re: [patch tree-optimization]: Try to sink type-casts for binary 
 and/or/xor operations

 Hi,

 so I modified the patch to use int_fits_type_p() for integer CST checking.
 Well, this approach is - as discussed on IRC - suboptimal, as my initial
 approach was for and-operations with precision of type > precision of type-x
 and unsigned type-x, for constant values bigger than (type-x) ~0.
 But those we now miss with the int_fits_type_p() approach, too.  And we also
 now miss the cases where type is signed and type-x is unsigned with the same
 precision.

 Anyway ... here is the updated patch

 Regards,
 Kai

 - Original Message -
 From: Richard Guenther richard.guent...@gmail.com
 To: Kai Tietz kti...@redhat.com
 Cc: gcc-patches@gcc.gnu.org
 Sent: Monday, June 27, 2011 4:08:41 PM
 Subject: Re: [patch tree-optimization]: Try to sink type-casts for binary 
 and/or/xor operations

 On Mon, Jun 27, 2011 at 3:46 PM, Kai Tietz kti...@redhat.com wrote:
 Hello,

 this patch sinks type conversions in forward-propagate for the following
 patterns:
 - ((type) X) op ((type) Y): if X and Y have compatible types.
 - ((type) X) op CST: if the conversion (type) ((type-x) CST) == CST and X
   has integral type.
 - CST op ((type) X): if the conversion (type) ((type-x) CST) == CST and X
   has integral type.
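 As an illustration (not from the patch or its testsuite), the intended
 effect on source like the following is to apply the conversion once to the
 result of the bitwise operation:

   int f (short x, short y)
   {
     return (int) x & (int) y;   /* becomes (int) (x & y)            */
   }

   int g (short x)
   {
     return (int) x & 0x0f;      /* becomes (int) (x & (short) 0x0f) */
   }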

 See IRC comments.

 Additionally it fixes another issue shown by this type-sinking in bswap
 detection.  The bswap pattern-matching algorithm goes for the first hit and
 does not try to find the best hit.  So we search here twice: first for the
 DImode case (if present) and then for the SImode case.

 Please split this piece out.  I suppose either walking over stmts backwards
 or simply handling __builtin_bswap in find_bswap_1 would be a better
 fix than yours.

 Richard.

 ChangeLog

 2011-06-27  Kai Tietz  kti...@redhat.com

        * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
        type sinking.
        * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
        search for di/si mode patterns for finding widest match.

 Bootstrapped and regression tested for x86_64-pc-linux-gnu.  Ok for apply?

 Regards,
 Kai




Re: {patch tree-ssa-math-opts]: Change searching direction for bswap

2011-06-28 Thread Richard Guenther
On Mon, Jun 27, 2011 at 7:33 PM, Kai Tietz kti...@redhat.com wrote:
 Hello,

 this is the separated-out patch for the issue noticed while doing
 type-sinking on bitwise operations.  The tests exposed that bswap pattern
 matching searches from top to bottom within each BB.  As it replaces the
 found match for a bswap in the tree, but doesn't know how to handle the
 inserted builtin bswap when pattern-matching for a wider-mode bswap, the
 search failed.  By reversing the search order within a BB from last to
 first, this issue can be fixed.
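 A hand-written byte swap of the kind involved (an illustrative assumption,
 not a testcase from this patch): when scanning forward, part of such a
 computation may first be replaced by a narrower __builtin_bswap, after
 which the wider pattern no longer matches.

   unsigned long long
   swap64 (unsigned long long x)
   {
     return ((x & 0x00000000000000ffULL) << 56)
          | ((x & 0x000000000000ff00ULL) << 40)
          | ((x & 0x0000000000ff0000ULL) << 24)
          | ((x & 0x00000000ff000000ULL) <<  8)
          | ((x & 0x000000ff00000000ULL) >>  8)
          | ((x & 0x0000ff0000000000ULL) >> 24)
          | ((x & 0x00ff000000000000ULL) >> 40)
          | ((x & 0xff00000000000000ULL) >> 56);
   }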

 ChangeLog

 2011-06-27  Kai Tietz  kti...@redhat.com

        * tree-ssa-math-opts.c (execute_optimize_bswap): Search
        within BB from last to first.

 Bootstrapped and regression-tested for x86_64-pc-linux-gnu. Ok for apply?

 Regards,
 Kai

 Index: gcc-head/gcc/tree-ssa-math-opts.c
 ===
 --- gcc-head.orig/gcc/tree-ssa-math-opts.c
 +++ gcc-head/gcc/tree-ssa-math-opts.c
 @@ -1820,8 +1820,10 @@ execute_optimize_bswap (void)
   FOR_EACH_BB (bb)
     {
       gimple_stmt_iterator gsi;
 -
 +/*
       for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (gsi))
 + */
 +      for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi))
         {

Err ... please 1) don't comment out stuff this way, 2) add a comment
why we loop backwards

Richard.

          gimple stmt = gsi_stmt (gsi);
          tree bswap_src, bswap_type;



Re: [RFC] Fix full memory barrier on SPARC-V8

2011-06-28 Thread David Miller
From: Eric Botcazou ebotca...@adacore.com
Date: Tue, 28 Jun 2011 10:11:03 +0200

 The V8 architecture manual is quite clear about it: TSO allows stores to be 
 reordered after subsequent loads (it's the only difference in TSO with Strong 
 Consistency) so you need to do something to have a full memory barrier.  As 
 there is no specific instruction to that effect in V8, you need to do what is 
 done for pre-SSE2 x86, i.e. use an atomic instruction.

Fair enough, you can add this code if you want.





[RFC] Add middle end hook for stack red zone size

2011-06-28 Thread Jiangning Liu
This patch fixes PR38644, a bug with a long history about stack red zone
access; PR30282 is related.

Originally the red zone concept was not exposed to the middle end, and each
back end used special logic to add extra memory barrier RTL to enforce the
correct dependences in the middle end.  This way, different back ends must
handle the red zone problem by themselves.  For example, the X86 target
introduced the function ix86_using_red_zone() to judge red zone access,
while POWER introduced offset_below_red_zone_p() to judge it.  Note that
they have different semantics, but the logic in the back-end caller sites
uses them to decide whether to add memory barrier RTL or not.  If a back
end handles this incorrectly, a bug is introduced.

Therefore, the correct approach is for the middle end to handle red zone
related issues, to avoid the burden in different back ends.  To be specific
for PR38644, this middle-end problem causes incorrect behavior for the ARM
target.  This patch exposes the red zone concept to the middle end by
introducing a middle-end/back-end hook TARGET_STACK_RED_ZONE_SIZE defined in
target.def, whose default value is 0.  A back end may redefine this function
to provide the concrete red zone size according to specific ABI requirements.

In the middle end, the scheduling dependence is modified by using this hook
plus checking for stack frame pointer adjustment instructions to decide
whether memory references need to be flushed out or not.  In theory, if
TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back end would no longer
be required to handle this scheduling dependence specially by introducing
extra memory barrier RTL.

In the back ends, the following changes are made to define the hook:
1) For X86, TARGET_STACK_RED_ZONE_SIZE is redefined to
ix86_stack_red_zone_size() in i386.c, which is a newly introduced function.
2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to
rs6000_stack_red_zone_size() in rs6000.c, which is also a newly defined
function.
3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to
default_stack_red_zone_size in targhooks.c, and this function returns 0,
which means ARM EABI and the others don't support red zone access at all.

In summary, the relationship between ABI and red zone access is as below:

 ---------------------------------------------------------------
 |     ARCH      |  ARM  |      X86      |     POWER     |others|
 |---------------+-------+-------+-------+-------+-------+------|
 |     ABI       | EABI  | MS_64 | other |  AIX  |  V4   |      |
 |---------------+-------+-------+-------+-------+-------+------|
 |   RED ZONE    |  No   |  YES  |  No   |  YES  |  No   |  No  |
 |---------------+-------+-------+-------+-------+-------+------|
 | RED ZONE SIZE |   0   |  128  |   0   |220/288|   0   |  0   |
 ---------------------------------------------------------------
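To make the proposal concrete, here is a sketch of what the default hook and
a POWER implementation could look like (the names follow the description
above and the 220/288 values are the AIX numbers from the table; the actual
attached patch may differ):

  /* targhooks.c: default - no red zone (e.g. ARM EABI).  */
  HOST_WIDE_INT
  default_stack_red_zone_size (void)
  {
    return 0;
  }

  /* rs6000.c: AIX ABIs have a 220 (32-bit) / 288 (64-bit) byte red zone,
     the V4 ABI has none.  */
  static HOST_WIDE_INT
  rs6000_stack_red_zone_size (void)
  {
    if (DEFAULT_ABI == ABI_V4)
      return 0;
    return TARGET_64BIT ? 288 : 220;
  }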

Thanks,
-Jiangning

stack-red-zone-patch-38644-3.patch
Description: Binary data


Re: [patch tree-optimization]: Try to do type sinking on comparisons

2011-06-28 Thread Richard Guenther
On Mon, Jun 27, 2011 at 8:52 PM, Kai Tietz kti...@redhat.com wrote:
 Hello,

 this patch tries to sink conversions for these comparison patterns:
 a) (type) X cmp (type) Y => X cmp Y
 b) (type) X cmp CST => X cmp ((type-x) CST)
 c) CST cmp (type) X => ((type-x) CST) cmp X

 This patch only allows type sinking when the precision of type is wider
 than or equal to the precision of type-x, or if type and type-x have the
 same signedness and CST fits into type-x.  If the cmp operation is == or
 !=, we also allow type and type-x to have different signedness, as long as
 CST fits into type-x without truncation.
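 As an illustration (not taken from the patch), the transformation drops the
 widening conversions on the comparison operands:

   int f (short x, short y)
   {
     return (int) x == (int) y;   /* becomes x == y                 */
   }

   int g (unsigned char x)
   {
     return (int) x < 10;         /* becomes x < (unsigned char) 10 */
   }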

 ChangeLog

 2011-06-27  Kai Tietz  kti...@redhat.com

        * tree-ssa-forwprop.c (forward_propagate_into_comparision):
        Sink types within comparison operands, if suitable.

 Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply?

Hmm, why do fold_widened_comparison and fold_sign_changed_comparison
not handle these cases?  We already dispatch to fold in this function,
so this is a case where we'd want fold to be improved.  You didn't add
testcases - do you have some that are not handled by fold already?

Thanks,
Richard.

 Regards,
 Kai



Re: [RFC, ARM] Convert thumb1 prologue completely to rtl

2011-06-28 Thread Richard Earnshaw
On 27/06/11 19:31, Richard Henderson wrote:
 On 06/24/2011 02:59 AM, Richard Earnshaw wrote:
 On 18/06/11 20:02, Richard Henderson wrote:
 I couldn't find anything terribly tricky about the conversion.

 The existing push_mult pattern would service thumb1 with just
 a tweak or two to the memory predicate and the length.

  The existing emit_multi_reg_push wasn't set up to handle a
  complete switch of registers for unwind info.  I thought about
  trying to merge them, but chickened out.
 
  I haven't cleaned out the code that is now dead in thumb_pushpop.
  I'd been thinking about maybe converting epilogues completely
  to rtl as well, which would allow the function to be deleted
  completely, rather than incrementally.

 I'm unsure what testing should be applied.  I'm currently doing
 arm-elf, which does at least have a thumb1 multilib, and uses
 newlib so I don't have to fiddle with setting up a full native
 cross environment.  What else should be done?  arm-eabi?


 Testing this on arm-eabi is essential since this may affect C++ unwind
 table generation (I can't see any obvious problems, but you never know).

 
 I've now tested the patch with both arm-elf and arm-eabi with
 RUNTESTFLAGS='--target_board=arm-sim{-mthumb}' with no regressions.
 
 Ok to install?
 

Yep, thanks.

R.

 
 r~
 




Re: Backport AVX256 load/store split patches to gcc 4.6 for performance boost on latest AMD/Intel hardware.

2011-06-28 Thread Richard Guenther
On Tue, Jun 28, 2011 at 12:33 AM, Fang, Changpeng
changpeng.f...@amd.com wrote:
 Hi,

 Attached are the patches we propose to backport to the gcc 4.6 branch, which
 are related to avx256 unaligned load/store splitting.  As we mentioned
 before, the combined effect of these patches is positive on both AMD and
 Intel CPUs on CPU2006 and Polyhedron 2005.

 0001-Split-32-byte-AVX-unaligned-load-store.patch
 Initial patch that implements unaligned load/store splitting

 0001-Don-t-assert-unaligned-256bit-load-store.patch
 Remove the assert.

 0001-Fix-a-typo-in-mavx256-split-unaligned-store.patch
 Fix a typo.

 0002-pr49089-enable-avx256-splitting-unaligned-load-store.patch
 Disable unaligned load splitting for bdver1.

 All these patches are in 4.7 trunk.

 Bootstrap and tests are on-going in gcc 4.6 branch.

 Is it OK to commit to the 4.6 branch as long as the tests pass?

Yes, if they have been approved and checked in for trunk.

Thanks,
Richard.

 Thanks,

 Changpeng



 
 From: Jagasia, Harsha
 Sent: Monday, June 20, 2011 12:03 PM
 To: 'H.J. Lu'
 Cc: 'gcc-patches@gcc.gnu.org'; 'hubi...@ucw.cz'; 'ubiz...@gmail.com'; 
 'hongjiu...@intel.com'; Fang, Changpeng
 Subject: RE: Backport AVX256 load/store split patches to gcc 4.6 for 
 performance boost on latest AMD/Intel hardware.

 On Mon, Jun 20, 2011 at 9:58 AM,  harsha.jaga...@amd.com wrote:
  Is it ok to backport patches, with ChangeLogs below, already in trunk to
  gcc 4.6?  These patches are for AVX 256-bit load/store splitting.  These
  patches make a significant performance difference (>=3%) to several
  CPU2006 and Polyhedron benchmarks on latest AMD and Intel hardware.  If
  ok, I will post backported patches for commit approval.
 
  AMD plans to submit additional patches on AVX-256 load/store splitting to
  trunk.  We will send additional backport requests for those later once
  they are accepted/committed to trunk.
 

  Since we will make some changes on trunk, I would prefer to do
  the backport after the trunk change is finished.

 Ok, thanks. Adding Changpeng who is working on the trunk changes.

 Harsha




Commit: Add support for V850 variants to libgcc

2011-06-28 Thread Nick Clifton
Hi Guys,

  I am checking in the patch below to add support for V850 variant
  architectures to the libgcc/config.host file.

Cheers
  Nick

libgcc/ChangeLog
2011-06-28  Nick Clifton  ni...@redhat.com

* config.host: Recognize all V850 variants.

Index: libgcc/config.host
===
--- libgcc/config.host  (revision 175575)
+++ libgcc/config.host  (working copy)
@@ -143,6 +143,9 @@
 sh[123456789lbe]*-*-*)
cpu_type=sh
;;
+v850*-*-*)
+   cpu_type=v850
+   ;;
 esac
 
 # Common parts for widely ported systems.
@@ -645,12 +648,8 @@
;;
 spu-*-elf*)
;;
-v850e1-*-*)
+v850*-*-*)
;;
-v850e-*-*)
-   ;;
-v850-*-*)
-   ;;
 vax-*-linux*)
;;
 vax-*-netbsdelf*)


Re: [patch, darwin, committed] fix PR47997

2011-06-28 Thread Iain Sandoe


On 26 Jun 2011, at 17:28, Iain Sandoe wrote:


It should also be applied to 4.6.x at some stage.


applied to 4.6 branch.


gcc/

PR target/47997
* config/darwin.c (darwin_mergeable_string_section): Place string
constants in '.cstring' rather than '.const' when CF/NSStrings are
active.

Index: gcc/config/darwin.c
===
--- gcc/config/darwin.c (revision 175409)
+++ gcc/config/darwin.c (working copy)
@@ -1195,7 +1195,11 @@ static section *
darwin_mergeable_string_section (tree exp,
 unsigned HOST_WIDE_INT align)
{
-  if (flag_merge_constants
+  /* Darwin's ld expects to see non-writable string literals in the .cstring
+     section.  Later versions of ld check and complain when CFStrings are
+     enabled.  Therefore we shall force the strings into .cstring since we
+     don't support writable ones anyway.  */
+  if ((darwin_constant_cfstrings || flag_merge_constants)
       && TREE_CODE (exp) == STRING_CST
       && TREE_CODE (TREE_TYPE (exp)) == ARRAY_TYPE
       && align <= 256





Re: {patch tree-ssa-math-opts]: Change searching direction for bswap

2011-06-28 Thread Kai Tietz
Oh, missed to fill comment.

Thanks,
Kai

Index: gcc-head/gcc/tree-ssa-math-opts.c
===
--- gcc-head.orig/gcc/tree-ssa-math-opts.c
+++ gcc-head/gcc/tree-ssa-math-opts.c
@@ -1821,7 +1821,11 @@ execute_optimize_bswap (void)
 {
   gimple_stmt_iterator gsi;

-  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (gsi))
+  /* We scan for bswap patterns reverse for making sure we get
+widest match.  As bswap pattern matching doesn't handle
+previously inserted smaller bswap replacements as sub-
+patterns, the wider variant wouldn't be detected.  */
+  for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi))
 {
  gimple stmt = gsi_stmt (gsi);
  tree bswap_src, bswap_type;


Re: {patch tree-ssa-math-opts]: Change searching direction for bswap

2011-06-28 Thread Richard Guenther
On Tue, Jun 28, 2011 at 11:29 AM, Kai Tietz kti...@redhat.com wrote:
 Oh, missed to fill comment.



 Thanks,
 Kai

 Index: gcc-head/gcc/tree-ssa-math-opts.c
 ===
 --- gcc-head.orig/gcc/tree-ssa-math-opts.c
 +++ gcc-head/gcc/tree-ssa-math-opts.c
 @@ -1821,7 +1821,11 @@ execute_optimize_bswap (void)
     {
       gimple_stmt_iterator gsi;

 -      for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (gsi))
 +      /* We scan for bswap patterns reverse for making sure we get

We do a reverse scan for bswap patterns to make sure we get the widest match.

Ok with that change.

Richard.

 +        widest match.  As bswap pattern matching doesn't handle
 +        previously inserted smaller bswap replacements as sub-
 +        patterns, the wider variant wouldn't be detected.  */
 +      for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi))
         {
          gimple stmt = gsi_stmt (gsi);
          tree bswap_src, bswap_type;



Re: [testsuite] ARM tests vfp-ldm*.c and vfp-stm*.c

2011-06-28 Thread Richard Earnshaw
On 24/06/11 15:49, Janis Johnson wrote:
 On 06/24/2011 03:29 AM, Joseph S. Myers wrote:
 On Thu, 23 Jun 2011, Janis Johnson wrote:

 Tests target/arm/vfp-ldm*.c and vfp-sdm*.c add -mfloat-abi=softfp but
 fail if multilib flags override that option.  This patch skips the test
 for multilibs that specify a different value for -mfloat-abi.

 While they need to be skipped for -mfloat-abi=soft, I'd think they ought 
 to pass for -mfloat-abi=hard - why do they fail there?

 
 They don't, this would be better:
 
  /* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
 
 Janis
 
 

OK with that change.

R.



Re: [Patch ARM] Add predefine for availability of DSP multiplication functions.

2011-06-28 Thread Richard Earnshaw
On 24/06/11 09:09, James Greenhalgh wrote:
 Hi,
 
 This patch adds a builtin macro __ARM_FEATURE_DSP which is defined
 when the ARMv5E DSP multiplication extensions are available for use.
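  For example (an illustrative guard, not part of the patch), user code can
  then test for the extension:
  
    #ifdef __ARM_FEATURE_DSP
      /* ARMv5E DSP multiply extensions are available.  */
    #else
      /* Fall back to generic C code.  */
    #endif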
 
 Thanks,
 James Greenhalgh
 
 2011-06-22  James Greenhalgh  james.greenha...@arm.com
 
   * TARGET_CPU_CPP_BUILTINS: Add __ARM_FEATURE_DSP.
 
 
 0001-Patch-ARM-Add-predefine-for-availability-of-DSP-mult.patch
 
 
 diff --git gcc/config/arm/arm.h gcc/config/arm/arm.h
 index c32ef1a..892065b 100644
 --- gcc/config/arm/arm.h
 +++ gcc/config/arm/arm.h
 @@ -45,6 +45,8 @@ extern char arm_arch_name[];
  #define TARGET_CPU_CPP_BUILTINS()\
do \
  {\
 + if (TARGET_DSP_MULTIPLY)\
 +   builtin_define ("__ARM_FEATURE_DSP"); \
   /* Define __arm__ even when in thumb mode, for  \
  consistency with armcc.  */  \
    builtin_define ("__arm__"); \

OK.

R.



Re: [patch tree-optimization]: Try to sink type-casts for binary and/or/xor operations

2011-06-28 Thread Kai Tietz
Ok, moved code out of special case for addresses.

Bootstrapped for x86_64-pc-linux-gnu.  Patch ok for apply?

Regards,
Kai
Index: gcc-head/gcc/tree-ssa-forwprop.c
===
--- gcc-head.orig/gcc/tree-ssa-forwprop.c
+++ gcc-head/gcc/tree-ssa-forwprop.c
@@ -1676,16 +1676,61 @@ simplify_bitwise_binary (gimple_stmt_ite
}
 }
 
+  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).  */
+  if (TREE_CODE (arg2) == INTEGER_CST
+      && CONVERT_EXPR_CODE_P (def1_code)
+      && INTEGRAL_TYPE_P (def1_arg1)
+      && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
+{
+  gimple newop;
+  tree tem = create_tmp_reg (TREE_TYPE (def1_arg1), NULL);
+  newop =
+gimple_build_assign_with_ops (code, tem, def1_arg1,
+ fold_convert_loc (gimple_location (stmt),
+   TREE_TYPE (def1_arg1),
+   arg2));
+  tem = make_ssa_name (tem, newop);
+  gimple_assign_set_lhs (newop, tem);
+  gsi_insert_before (gsi, newop, GSI_SAME_STMT);
+  gimple_assign_set_rhs_with_ops_1 (gsi, NOP_EXPR,
+   tem, NULL_TREE, NULL_TREE);
+  update_stmt (gsi_stmt (*gsi));
+  return true;
+}
+
+  /* Try to fold CST op (type) X -> (type) (((type-x) CST) op X).  */
+  if (TREE_CODE (arg1) == INTEGER_CST
+      && CONVERT_EXPR_CODE_P (def2_code)
+      && INTEGRAL_TYPE_P (def2_arg1)
+      && int_fits_type_p (arg1, TREE_TYPE (def2_arg1)))
+{
+  gimple newop;
+  tree tem = create_tmp_reg (TREE_TYPE (def2_arg1), NULL);
+  newop =
+gimple_build_assign_with_ops (code, tem, def2_arg1,
+ fold_convert_loc (gimple_location (stmt),
+   TREE_TYPE (def2_arg1),
+   arg1));
+  tem = make_ssa_name (tem, newop);
+  gimple_assign_set_lhs (newop, tem);
+  gsi_insert_before (gsi, newop, GSI_SAME_STMT);
+  gimple_assign_set_rhs_with_ops_1 (gsi, NOP_EXPR,
+   tem, NULL_TREE, NULL_TREE);
+  update_stmt (gsi_stmt (*gsi));
+  return true;
+}
+
   /* For bitwise binary operations apply operand conversions to the
  binary operation result instead of to the operands.  This allows
  to combine successive conversions and bitwise binary operations.  */
   if (CONVERT_EXPR_CODE_P (def1_code)
       && CONVERT_EXPR_CODE_P (def2_code)
       && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1))
-      /* Make sure that the conversion widens the operands or that it
-	 changes the operation to a bitfield precision.  */
+      /* Make sure that the conversion widens the operands, or has same
+	 precision, or that it changes the operation to a bitfield
+	 precision.  */
       && ((TYPE_PRECISION (TREE_TYPE (def1_arg1))
-	   < TYPE_PRECISION (TREE_TYPE (arg1)))
+	   <= TYPE_PRECISION (TREE_TYPE (arg1)))
  || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1)))
  != MODE_INT)
  || (TYPE_PRECISION (TREE_TYPE (arg1))


Re: [patch tree-optimization]: Try to do type sinking on comparisons

2011-06-28 Thread Kai Tietz
- Original Message -
From: Richard Guenther richard.guent...@gmail.com
To: Kai Tietz kti...@redhat.com
Cc: gcc-patches@gcc.gnu.org
Sent: Tuesday, June 28, 2011 10:45:20 AM
Subject: Re: [patch tree-optimization]: Try to do type sinking on comparisons

On Mon, Jun 27, 2011 at 8:52 PM, Kai Tietz kti...@redhat.com wrote:
 Hello,

 this patch tries to sink conversions for these comparison patterns:
 a) (type) X cmp (type) Y => X cmp Y
 b) (type) X cmp CST => X cmp ((type-x) CST)
 c) CST cmp (type) X => ((type-x) CST) cmp X

 This patch only allows type sinking when the precision of type is wider
 than or equal to the precision of type-x, or if type and type-x have the
 same signedness and CST fits into type-x.  If the cmp operation is == or
 !=, we also allow type and type-x to have different signedness, as long as
 CST fits into type-x without truncation.

 ChangeLog

 2011-06-27  Kai Tietz  kti...@redhat.com

        * tree-ssa-forwprop.c (forward_propagate_into_comparision):
        Sink types within comparison operands, if suitable.

 Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply?

Hmm, why do fold_widened_comparison and fold_sign_changed_comparison
not handle these cases?  We already dispatch to fold in this function,
so this is a case where we'd want fold to be improved.  You didn't add
testcases - do you have some that are not handled by fold already?

Thanks,
Richard.

 Regards,
 Kai


Well, I noticed this kind of pattern in the boolification of comparisons.
They seem to appear if one of the comparison operands itself has a type
promotion and a non-trivial tree.
Nevertheless, I am about to rework this patch a bit, as it has some issues
with type truncation if the outer type has smaller precision than the inner
type.  I think for now it would be OK to do the transformation only for the
case where the inner type has smaller or equal precision than the outer
type, and inner and outer type are of integer kind.
In the other case we might want to transform such integer comparisons from
((char) a:int) cmp CST to (a:int & (char) ~0) cmp (char) CST, so that the
truncation is done properly for the comparison.

Nevertheless, the more interesting part is when the inner type has smaller
or equal precision than the outer type.
Kai


Ping Re: Clean up TARGET_ASM_NAMED_SECTION defaults

2011-06-28 Thread Joseph S. Myers
Ping.  This patch 
http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01642.html is pending 
review.

-- 
Joseph S. Myers
jos...@codesourcery.com


[Patch, AVR]: Better 32=16*16 widening multiplication

2011-06-28 Thread Georg-Johann Lay
This implements mulhisi3 and umulhisi3 widening multiplication
insns if AVR_HAVE_MUL.

I chose the interface as r25:r22 = r19:r18 * r21:r20 which is ok
because only avr-gcc BE will call respective __* support functions in
libgcc.

Tested without regression and hand-tested assembler code.
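For reference (a C-level illustration, not part of the patch), these
expanders cover 16x16->32 bit widening multiplies such as:

  /* On AVR, int is 16 bits and long is 32 bits, so these are the
     16x16->32 widening multiplies handled by the new expanders.  */
  long
  smul (int a, int b)                     /* mulhisi3  */
  {
    return (long) a * b;
  }

  unsigned long
  umul (unsigned int a, unsigned int b)   /* umulhisi3 */
  {
    return (unsigned long) a * b;
  }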

Johann

* config/avr/t-avr (LIB1ASMFUNCS): Add _mulhisi3, _umulhisi3,
_xmulhisi3_exit.
* config/avr/libgcc.S (_xmulhisi3_exit): New Function.
(__mulhisi3): Optimize if have MUL*.  Use XJMP instead of rjmp.
(__umulhisi3): Ditto.
* config/avr/avr.md (mulhisi3): New insn expander.
(umulhisi3): New insn expander.
(*mulhisi3_call): New insn.
(*umulhisi3_call): New insn.
Index: config/avr/libgcc.S
===
--- config/avr/libgcc.S	(revision 175574)
+++ config/avr/libgcc.S	(working copy)
@@ -178,10 +178,57 @@ __mulhi3_exit:
 #endif /* defined (L_mulhi3) */
 #endif /* !defined (__AVR_HAVE_MUL__) */
 
+/***
+  Widening Multiplication  32 = 16 x 16
+***/
+  
 #if defined (L_mulhisi3)
-	.global	__mulhisi3
-	.func	__mulhisi3
-__mulhisi3:
+DEFUN __mulhisi3
+#if defined (__AVR_HAVE_MUL__)
+
+;; r25:r22 = r19:r18 * r21:r20
+
+#define A0 18
+#define B0 20
+#define C0 22
+
+#define A1 A0+1
+#define B1 B0+1
+#define C1 C0+1
+#define C2 C0+2
+#define C3 C0+3
+ 
+; C = (signed)A1 * (signed)B1
+muls  A1, B1
+movw  C2, R0
+
+; C += A0 * B0
+mul   A0, B0
+movw  C0, R0
+
+; C += (signed)A1 * B0
+mulsu A1, B0
+sbci  C3, 0
+add   C1, R0
+adc   C2, R1
+clr   __zero_reg__
+adc   C3, __zero_reg__
+
+; C += (signed)B1 * A0
+mulsu B1, A0
+sbci  C3, 0
+XJMP  __xmulhisi3_exit
+
+#undef A0
+#undef A1
+#undef B0
+#undef B1
+#undef C0
+#undef C1
+#undef C2
+#undef C3
+
+#else /* !__AVR_HAVE_MUL__ */
 	mov_l	r18, r24
 	mov_h	r19, r25
 	clr	r24
@@ -192,24 +239,91 @@ __mulhisi3:
 	sbrc	r19, 7
 	dec	r20
 	mov	r21, r20
-	rjmp	__mulsi3
-	.endfunc
+	XJMP	__mulsi3
+#endif /* __AVR_HAVE_MUL__ */
+ENDF __mulhisi3
 #endif /* defined (L_mulhisi3) */
 
 #if defined (L_umulhisi3)
-	.global	__umulhisi3
-	.func	__umulhisi3
-__umulhisi3:
+DEFUN __umulhisi3
+#if defined (__AVR_HAVE_MUL__)
+
+;; r25:r22 = r19:r18 * r21:r20
+
+#define A0 18
+#define B0 20
+#define C0 22
+
+#define A1 A0+1
+#define B1 B0+1
+#define C1 C0+1
+#define C2 C0+2
+#define C3 C0+3
+
+; C = A1 * B1
+mul   A1, B1
+movw  C2, R0
+
+; C += A0 * B0
+mul   A0, B0
+movw  C0, R0
+
+; C += A1 * B0
+mul   A1, B0
+add   C1, R0
+adc   C2, R1
+clr   __zero_reg__
+adc   C3, __zero_reg__
+
+; C += B1 * A0
+mul   B1, A0
+XJMP  __xmulhisi3_exit
+
+#undef A0
+#undef A1
+#undef B0
+#undef B1
+#undef C0
+#undef C1
+#undef C2
+#undef C3
+
+#else /* !__AVR_HAVE_MUL__ */
 	mov_l	r18, r24
 	mov_h	r19, r25
 	clr	r24
 	clr	r25
 	clr	r20
 	clr	r21
-	rjmp	__mulsi3
-	.endfunc
+	XJMP	__mulsi3
+#endif /* __AVR_HAVE_MUL__ */
+ENDF __umulhisi3
 #endif /* defined (L_umulhisi3) */
 
+#if defined (L_xmulhisi3_exit)
+
+;;; Helper for __mulhisi3 resp. __umulhisi3.
+
+#define C0 22
+#define C1 C0+1
+#define C2 C0+2
+#define C3 C0+3
+
+DEFUN __xmulhisi3_exit
+add   C1, R0
+adc   C2, R1
+clr   __zero_reg__
+adc   C3, __zero_reg__
+ret
+ENDF __xmulhisi3_exit
+
+#undef C0
+#undef C1
+#undef C2
+#undef C3
+
+#endif /* defined (L_xmulhisi3_exit) */
+
 #if defined (L_mulsi3)
 /***
Multiplication  32 x 32
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 175574)
+++ config/avr/avr.md	(working copy)
@@ -1056,6 +1056,50 @@ (define_insn "*mulsi3_call"
   [(set_attr "type" "xcall")
    (set_attr "cc" "clobber")])
 
+(define_expand "mulhisi3"
+  [(set (reg:HI 18)
+        (match_operand:HI 1 "register_operand" ""))
+   (set (reg:HI 20)
+        (match_operand:HI 2 "register_operand" ""))
+   (set (reg:SI 22)
+        (mult:SI (sign_extend:SI (reg:HI 18))
+                 (sign_extend:SI (reg:HI 20))))
+   (set (match_operand:SI 0 "register_operand" "")
+        (reg:SI 22))]
+  "AVR_HAVE_MUL"
+  "")
+
+(define_expand "umulhisi3"
+  [(set (reg:HI 18)
+        (match_operand:HI 1 "register_operand" ""))
+   (set (reg:HI 20)
+        (match_operand:HI 2 "register_operand" ""))
+   (set (reg:SI 22)
+        (mult:SI (zero_extend:SI (reg:HI 18))
+                 (zero_extend:SI (reg:HI 20))))
+   (set (match_operand:SI 0 "register_operand" "")
+        (reg:SI 22))]
+  "AVR_HAVE_MUL"
+  "")
+
+(define_insn "*mulhisi3_call"
+  [(set (reg:SI 22)
+        (mult:SI (sign_extend:SI (reg:HI 18))
+                 (sign_extend:SI (reg:HI 20))))]
+  "AVR_HAVE_MUL"
+  "%~call __mulhisi3"
+  [(set_attr "type" "xcall")
+   (set_attr "cc" "clobber")])
+
+(define_insn 

Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-28 Thread Andrew Stubbs

On 24/06/11 16:47, Richard Guenther wrote:

I can certainly add checks to make sure that the skipped operations
  actually don't make any important changes to the value, but do I need to?

Yes.


OK, how about this patch?

I've added checks to make sure the value is not truncated at any point.

I've also changed the test cases to address Janis' comments.

Andrew
2011-06-28  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* gimple.h (tree_ssa_harmless_type_conversion): New prototype.
	(tree_ssa_strip_harmless_type_conversions): New prototype.
	(harmless_type_conversion_p): New prototype.
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Look for
	multiply statement beyond no-op conversion statements.
	* tree-ssa.c (harmless_type_conversion_p): New function.
	(tree_ssa_harmless_type_conversion): New function.
	(tree_ssa_strip_harmless_type_conversions): New function.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1090,8 +1090,11 @@ extern bool validate_gimple_arglist (const_gimple, ...);
 
 /* In tree-ssa.c  */
 extern bool tree_ssa_useless_type_conversion (tree);
+extern bool tree_ssa_harmless_type_conversion (tree);
 extern tree tree_ssa_strip_useless_type_conversions (tree);
+extern tree tree_ssa_strip_harmless_type_conversions (tree);
 extern bool useless_type_conversion_p (tree, tree);
+extern bool harmless_type_conversion_p (tree, tree);
 extern bool types_compatible_p (tree, tree);
 
 /* Return the code for GIMPLE statement G.  */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+ int bc = b * c;
+return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2117,23 +2117,19 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   rhs1 = gimple_assign_rhs1 (stmt);
   rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (rhs1) == SSA_NAME)
-{
-  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-  if (is_gimple_assign (rhs1_stmt))
-	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
-}
-  else
+  if (TREE_CODE (rhs1) != SSA_NAME
+  || TREE_CODE (rhs2) != SSA_NAME)
 return false;
 
-  if (TREE_CODE (rhs2) == SSA_NAME)
-{
-  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-  if (is_gimple_assign (rhs2_stmt))
-	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
-}
-  else
-return false;
+  rhs1 = tree_ssa_strip_harmless_type_conversions (rhs1);
+  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+  if (is_gimple_assign (rhs1_stmt))
+rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+
+  rhs2 = tree_ssa_strip_harmless_type_conversions(rhs2);
+  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+  if (is_gimple_assign (rhs2_stmt))
+rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
 {
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1484,6 +1484,33 @@ useless_type_conversion_p (tree outer_type, tree inner_type)
   return false;
 }
 
+/* Return true if the conversion from INNER_TYPE to OUTER_TYPE will
+   not alter the arithmetic meaning of a type, otherwise return false.
+
+   For example, widening an integer type leaves the value unchanged,
+   but narrowing an integer type can cause truncation.
+
+   Note that switching between signed and unsigned modes doesn't change
+   the underlying representation, and so is harmless.
+
+   This function is not yet a complete definition of what is harmless
+   but should reject everything that is not.  */
+
+bool
+harmless_type_conversion_p (tree outer_type, tree inner_type)
+{
+  /* If it's useless, it's also harmless.  */
+  if (useless_type_conversion_p (outer_type, inner_type))
+return true;
+
+  if (INTEGRAL_TYPE_P (inner_type)
+      && INTEGRAL_TYPE_P (outer_type)
+      && TYPE_PRECISION (inner_type) <= TYPE_PRECISION (outer_type))
+return true;
+
+  return false;
+}
+
 /* Return true if a conversion from either type of TYPE1 and TYPE2
to the other is not required.  Otherwise return false.  */
 
@@ -1515,6 +1542,29 @@ tree_ssa_useless_type_conversion (tree expr)
   return false;
 }
 
+/* Return true if EXPR is a harmless type conversion, otherwise return
+   false.  */
+
+bool
+tree_ssa_harmless_type_conversion (tree expr)
+{
+  gimple stmt;
+
+  if (TREE_CODE (expr) != SSA_NAME)
+return false;
+
+  stmt = SSA_NAME_DEF_STMT (expr);
+
+  if (!is_gimple_assign (stmt))
+return false;
+
+  if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
+return false;
+
+  return harmless_type_conversion_p 

Ping #1: [Patch, AVR]: Fix PR34734

2011-06-28 Thread Georg-Johann Lay
http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01462.html

Georg-Johann Lay wrote:
 PR34734 produces annoying, false warnings if __attribute__((progmem))
 is used in conjunction with C++.  DECL_INITIAL is not yet set up in
 avr_handle_progmem_attribute.
 
 Johann
 
   PR target/34734
   * config/avr/avr.c (avr_handle_progmem_attribute): Move warning
   about uninitialized data attributed 'progmem' from here...
   (avr_encode_section_info): ...to this new function.
   (TARGET_ENCODE_SECTION_INFO): New define.
   (avr_section_type_flags): For data in .progmem.data, remove
   section flag SECTION_WRITE.

avr_encode_section_info is a good place to emit the warning:
DECL_INITIAL has stabilized for C++, the warning will appear even for
unused variables that will eventually be thrown away, and the warning
appears only once (new_decl_p).

Johann



Re: [pph] Fix var order when streaming in. (issue4635074)

2011-06-28 Thread dnovillo

On 2011/06/28 00:27:04, Gabriel Charette wrote:

The names and namespaces chains are built by adding each new element to the
front of the list.  When streaming it in we traverse the list of names and
re-add them to the current chains, thus reversing the order in which they
were defined in the header file.



Since this is a singly linked list we cannot start from the tail; thus we
reverse the chain in place and then traverse it, now adding the bindings in
the same order they were found in the header file.
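A minimal sketch of the in-place reversal being described (an illustration
with a generic chain field, not the pph code itself):

  struct node { struct node *chain; /* ... payload ... */ };

  static struct node *
  reverse_chain (struct node *head)
  {
    struct node *prev = NULL;
    while (head)
      {
        struct node *next = head->chain;
        head->chain = prev;  /* point this element at the previous one */
        prev = head;
        head = next;
      }
    return prev;             /* new head: the old tail */
  }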



I introduced a new failing test to test this.  The test showed the reverse
behaviour prior to the patch.

The test still fails however; there is another inversion problem between the
global variables and the .LFBO, .LCFI0, ...

This patch only fixes the inversion of the global variable declarations in
the assembly, not the second issue this is exposing.  This second issue is
potentially already exposed by another test??  Do we need this new test?


It can't hurt.


This fixes all of the assembly mismatches in c1limits-externalid.cc however!

Nice!


2011-06-27  Gabriel Charette  gch...@google.com

	* pph-streamer-in.c (pph_add_bindings_to_namespace): Reverse names
	and namespaces chains.



* g++.dg/pph/c1limits-externalid.cc: Remove pph asm xdiff.
* g++.dg/pph/c1varorder.cc: New.
* g++.dg/pph/c1varorder.h: New.
* g++.dg/pph/pph.map: Add c1varorder.h


OK with a minor comment nit.

http://codereview.appspot.com/4635074/


Re: [pph] Fix var order when streaming in. (issue4635074)

2011-06-28 Thread dnovillo


http://codereview.appspot.com/4635074/diff/1/gcc/cp/pph-streamer-in.c
File gcc/cp/pph-streamer-in.c (right):

http://codereview.appspot.com/4635074/diff/1/gcc/cp/pph-streamer-in.c#newcode1144
gcc/cp/pph-streamer-in.c:1144: /* The chains are built backwards (ref:
add_decl_to_level@name-lookup.c),

s/add_decl_to_level@name-lookup.c/add_decl_to_level/

http://codereview.appspot.com/4635074/


Re: [patch tree-optimization]: Try to sink type-casts for binary and/or/xor operations

2011-06-28 Thread Richard Guenther
On Tue, Jun 28, 2011 at 12:04 PM, Kai Tietz kti...@redhat.com wrote:
 Ok, moved code out of special case for addresses.

 Bootstrapped for x86_64-pc-linux-gnu.  Patch ok for apply?

There is no need to check for CST op (T) arg, the constant is always
the 2nd operand for commutative operations.

Ok with that variant removed.

Thanks,
Richard.

 Regards,
 Kai



Re: [PATCH, PR 49094] Refrain from creating misaligned accesses in SRA

2011-06-28 Thread Martin Jambor
Hi,

On Mon, Jun 27, 2011 at 03:18:01PM +0200, Richard Guenther wrote:
 On Sun, 26 Jun 2011, Martin Jambor wrote:
 
  Hi,
  
  under some circumstances involving user specified alignment and/or
  packed attributes, SRA can create a misaligned MEM_REF.  As the
  testcase demonstrates, it is not enough to not consider variables with
  these type attributes, mainly because we might attempt to load/store
  the scalar replacements from/to right/left sides of original aggregate
  assignments which might be misaligned.
  

...

 
 I think you want something like
 
 static bool
 tree_non_mode_aligned_mem_p (tree exp)
 {
   enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
   unsigned int align;
 
   if (mode == BLKmode
   || !STRICT_ALIGNMENT)
 return false;
 
   align = get_object_alignment (exp, BIGGEST_ALIGNMENT);
   if (GET_MODE_ALIGNMENT (mode) > align)
 return true;
 
   return false;
 }
 
 as for STRICT_ALIGNMENT targets we assume that the loads/stores SRA
 inserts have the alignment of the mode.
 

I admit I am surprised this works; I did not know aggregates could
have non-BLK modes.  Anyway, it does, and so I intend to commit the
following this evening, after a testsuite run on sparc64.  Please
stop me if the previous message was not a pre-approval of sorts.

Thanks a lot,

Martin


2011-06-28  Martin Jambor  mjam...@suse.cz

PR tree-optimization/49094
* tree-sra.c (tree_non_mode_aligned_mem_p): New function.
(build_accesses_from_assign): Use it.

* testsuite/gcc.dg/tree-ssa/pr49094.c: New test.


Index: src/gcc/tree-sra.c
===
--- src.orig/gcc/tree-sra.c
+++ src/gcc/tree-sra.c
@@ -1050,6 +1050,25 @@ disqualify_ops_if_throwing_stmt (gimple
   return false;
 }
 
+/* Return true iff type of EXP is not sufficiently aligned.  */
+
+static bool
+tree_non_mode_aligned_mem_p (tree exp)
+{
+  enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+  unsigned int align;
+
+  if (mode == BLKmode
+  || !STRICT_ALIGNMENT)
+return false;
+
+  align = get_object_alignment (exp, BIGGEST_ALIGNMENT);
+  if (GET_MODE_ALIGNMENT (mode) > align)
+return true;
+
+  return false;
+}
+
 /* Scan expressions occuring in STMT, create access structures for all accesses
to candidates for scalarization and remove those candidates which occur in
statements or expressions that prevent them from being split apart.  Return
@@ -1074,7 +1093,10 @@ build_accesses_from_assign (gimple stmt)
   lacc = build_access_from_expr_1 (lhs, stmt, true);
 
   if (lacc)
-    lacc->grp_assignment_write = 1;
+    {
+      lacc->grp_assignment_write = 1;
+      lacc->grp_unscalarizable_region |= tree_non_mode_aligned_mem_p (rhs);
+    }
 
   if (racc)
     {
@@ -1082,6 +1104,7 @@ build_accesses_from_assign (gimple stmt)
       if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
	  && !is_gimple_reg_type (racc->type))
	bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
+      racc->grp_unscalarizable_region |= tree_non_mode_aligned_mem_p (lhs);
     }
 
   if (lacc && racc
Index: src/gcc/testsuite/gcc.dg/tree-ssa/pr49094.c
===
--- /dev/null
+++ src/gcc/testsuite/gcc.dg/tree-ssa/pr49094.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options -O } */
+
+struct in_addr {
+   unsigned int s_addr;
+};
+
+struct ip {
+   unsigned char ip_p;
+   unsigned short ip_sum;
+   struct  in_addr ip_src,ip_dst;
+} __attribute__ ((aligned(1), packed));
+
+struct ip ip_fw_fwd_addr;
+
+int test_alignment( char *m )
+{
+  struct ip *ip = (struct ip *) m;
+  struct in_addr pkt_dst;
+  pkt_dst = ip->ip_dst;
+  if( pkt_dst.s_addr == 0 )
+return 1;
+  else
+return 0;
+}
+
+int __attribute__ ((noinline, noclone))
+intermediary (char *p)
+{
+  return test_alignment (p);
+}
+
+int
+main (int argc, char *argv[])
+{
+  ip_fw_fwd_addr.ip_dst.s_addr = 1;
+  return intermediary ((void *) &ip_fw_fwd_addr);
+}


Re: [google] Enable both ld and gold in gcc (issue4664051)

2011-06-28 Thread Diego Novillo

On 11-06-27 19:09 , Doug Kwan wrote:

This patch enables both ld and gold in gcc using the -fuse-ld switch.  The
original patch was written by Nick Clifton and was subsequently updated by
Matthias Klose.  The patch currently does not work with LTO, but that is
okay for now and it is no worse than its counterpart in an older gcc version.
We need this functionality for now.  It is mostly used as a safety net in the
Android toolchain if gold does not work.  We can disable LTO in that case.
Hopefully we will fix this and can resubmit it for trunk later.

This is tested by running ./buildit and building the Android toolchain.

I would like to apply this to goolge/main only.

2011-06-27   Doug Kwan  dougk...@google.com

Google ref 41164-p2
Backport upstream patch under review.

2011-01-19   Nick Clifton  ni...@redhat.com
	     Matthias Klose  d...@debian.org

* configure.ac (gcc_cv_gold_srcdir): New cached variable -
contains the location of the gold sources.
(ORIGINAL_GOLD_FOR_TARGET): New substituted variable - contains
the name of the locally built gold executable.
* configure: Regenerate.
* collect2.c (main): Detect the -use-gold and -use-ld switches
and select the appropriate linker, if found.
If a linker cannot be found and collect2 is executing in
verbose mode then report the search paths examined.
* exec-tool.in: Detect the -use-gold and -use-ld switches and
select the appropriate linker, if found.
Add support for -v switch.
Report problems locating linker executable.
* gcc.c (LINK_COMMAND_SPEC): Translate -fuse-ld=gold into
-use-gold and -fuse-ld=bfd into -use-ld.
* common.opt: Add fuse-ld=gold and fuse-ld=bfd.
* opts.c (common_handle_option): Ignore -fuse-ld=gold and
-fuse-ld=bfd.
* doc/invoke.texi: Document the new options.


OK for google/main.

Nick/Matthias, anything in particular blocking this patch in trunk? 
(other than the LTO issue)



Diego.

--
This patch is available for review at http://codereview.appspot.com/4664051



Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-28 Thread Richard Guenther
On Tue, Jun 28, 2011 at 12:47 PM, Andrew Stubbs andrew.stu...@gmail.com wrote:
 On 24/06/11 16:47, Richard Guenther wrote:

 I can certainly add checks to make sure that the skipped operations
   actually don't make any important changes to the value, but do I need
  to?

 Yes.

 OK, how about this patch?

I'd name the predicate value_preserving_conversion_p which I think
is what you mean.  harmless isn't really descriptive.

Note that you include non-value-preserving conversions, namely
int -> unsigned int.  Don't dispatch to useless_type_conversion_p,
it's easy to enumerate which conversions are value-preserving.

Don't try to match the tree_ssa_useless_* set of functions, instead
put the value_preserving_conversion_p predicate in tree.[ch] and
a suitable function using it in tree-ssa-math-opts.c.
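For instance, a conservative sketch of such an enumeration (an illustration
of the suggestion only; a real predicate would likely accept more cases,
e.g. widening an unsigned type into a wider signed one):

  bool
  value_preserving_conversion_p (tree outer_type, tree inner_type)
  {
    return INTEGRAL_TYPE_P (inner_type)
           && INTEGRAL_TYPE_P (outer_type)
           && TYPE_UNSIGNED (inner_type) == TYPE_UNSIGNED (outer_type)
           && TYPE_PRECISION (inner_type) <= TYPE_PRECISION (outer_type);
  }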

Thanks,
Richard.

 I've added checks to make sure the value is not truncated at any point.

 I've also changed the test cases to address Janis' comments.

 Andrew



Re: Simplify Solaris configuration

2011-06-28 Thread Rainer Orth
Eric,

 At least I can build the 64-bit libgcc now, but the 32-bit one fails for
 unrelated reasons:

 configure:3247: checking for suffix of object files
 configure:3269: /var/gcc/gcc-4.7.0-20110622/11-gcc/./gcc/xgcc
 -B/var/gcc/gcc-4.7.0-20110622/11-gcc/./gcc/
 -B/usr/local/sparcv9-sun-solaris2.11/bin/
 -B/usr/local/sparcv9-sun-solaris2.11/lib/ -isystem
 /usr/local/sparcv9-sun-solaris2.11/include -isystem
 /usr/local/sparcv9-sun-solaris2.11/sys-include  -m32 -c -g -O2  conftest.c >&5
 conftest.c:16:1: internal compiler error: in simplify_subreg, at
 simplify-rtx.c:5362

 It's very likely the same problem, the options -mptr32 -mno-stack-bias aren't 
 passed to cc1 anymore.

Right, sparc/sol2-64.h was included too late.  The following patch fixes
this.  Other approaches to reordering the headers ran into various
issues since TARGET_DEFAULT is defined and redefined in several places.

The patch allowed a sparcv9-sun-solaris2.11 bootstrap to run well into
building the target libraries (failed configuring libgfortran since I'd
mis-merged the 32-bit and 64-bit gmp.h), a sparc-sun-solaris2.10
bootstrap is still running.

I'll probably fix the gmp.h issue, rebuild the sparcv9-sun-solaris2.11
configuration and commit unless I find problems or you disapprove of the
approach.

Rainer


2011-06-28  Rainer Orth  r...@cebitec.uni-bielefeld.de

* config/sparc/sol2-64.h (TARGET_DEFAULT): Remove.
(TARGET_64BIT_DEFAULT): Define.
* config.gcc (sparc*-*-solaris2*): Move sparc/sol2-64.h to front
of tm_file.
* config/sparc/sol2.h [TARGET_64BIT_DEFAULT] (TARGET_DEFAULT): Define.

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2482,7 +2482,7 @@ sparc*-*-solaris2*)
	tm_file="sparc/biarch64.h ${tm_file} ${sol2_tm_file} sol2-bi.h"
	case ${target} in
	    sparc64-*-* | sparcv9-*-*)
-		tm_file="${tm_file} sparc/sol2-64.h"
+		tm_file="sparc/sol2-64.h ${tm_file}"
;;
*)
test x$with_cpu != x || with_cpu=v9
diff --git a/gcc/config/sparc/sol2-64.h b/gcc/config/sparc/sol2-64.h
--- a/gcc/config/sparc/sol2-64.h
+++ b/gcc/config/sparc/sol2-64.h
@@ -1,7 +1,7 @@
 /* Definitions of target machine for GCC, for bi-arch SPARC
running Solaris 2, defaulting to 64-bit code generation.
 
-   Copyright (C) 1999, 2010 Free Software Foundation, Inc.
+   Copyright (C) 1999, 2010, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -19,7 +19,4 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 http://www.gnu.org/licenses/.  */
 
-#undef TARGET_DEFAULT
-#define TARGET_DEFAULT \
-  (MASK_V9 + MASK_PTR64 + MASK_64BIT /* + MASK_HARD_QUAD */ + \
-   MASK_STACK_BIAS + MASK_APP_REGS + MASK_FPU + MASK_LONG_DOUBLE_128)
+#define TARGET_64BIT_DEFAULT 1
diff --git a/gcc/config/sparc/sol2.h b/gcc/config/sparc/sol2.h
--- a/gcc/config/sparc/sol2.h
+++ b/gcc/config/sparc/sol2.h
@@ -20,11 +20,17 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 http://www.gnu.org/licenses/.  */
 
+#undef TARGET_DEFAULT
+#ifdef TARGET_64BIT_DEFAULT
+#define TARGET_DEFAULT \
+  (MASK_V9 + MASK_PTR64 + MASK_64BIT /* + MASK_HARD_QUAD */ + \
+   MASK_STACK_BIAS + MASK_APP_REGS + MASK_FPU + MASK_LONG_DOUBLE_128)
+#else
 /* Solaris allows 64 bit out and global registers in 32 bit mode.
sparc_override_options will disable V8+ if not generating V9 code.  */
-#undef TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_V8PLUS + MASK_APP_REGS + MASK_FPU \
+ MASK_LONG_DOUBLE_128)
+#endif
 
 /* The default code model used to be CM_MEDANY on Solaris
but even Sun eventually found it to be quite wasteful


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] [annotalysis] Support IPA-SRA cloned functions (issue 4591066)

2011-06-28 Thread Diego Novillo
On Wed, Jun 22, 2011 at 16:10, Delesley Hutchins deles...@google.com wrote:

 Hi,
 This patch is merely a port of an earlier patch, made by Le-Chun Wu, from
 google/main to annotalysis.  It extends Annotalysis to support cloned
 functions/methods (especially created by IPA-SRA).
 Bootstrapped and passed GCC regression testsuite on
 x86_64-unknown-linux-gnu.
 Okay for branches/annotalysis?
   -DeLesley

 2011-06-22  Le-Chun Wu  l...@google.com,  DeLesley Hutchins
 deles...@google.com

Minor nit.  Align names vertically:

2011-06-22  Le-Chun Wu  l...@google.com
DeLesley Hutchins deles...@google.com

         * tree-threadsafe-analyze.c (build_fully_qualified_lock): Handle
         IPA-SRA cloned methods.
         (get_canonical_lock_expr): Fold expressions that are INDIRECT_REF on
         top of ADDR_EXPR.
         (check_lock_required): Handle IPA-SRA cloned methods.
         (check_func_lock_excluded): Likewise.
         (process_function_attrs): Likewise.

OK.

Incidentally, I think it would make sense to have you added to the
list of maintainers for the annotalysis branch.  Le-Chun, what do you
think?


Diego.


Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies

2011-06-28 Thread Andrew Stubbs

On 23/06/11 15:41, Andrew Stubbs wrote:

If one or both of the inputs to a widening multiply are of unsigned type
then the compiler will attempt to use usmul_widen_optab or
umul_widen_optab, respectively.

That works fine, but only if the target supports those operations
directly. Otherwise, it just bombs out and reverts to the normal
inefficient non-widening multiply.

This patch attempts to catch these cases and use an alternative signed
widening multiply instruction, if one of those is available.

I believe this should be legal as long as the top bit of both inputs is
guaranteed to be zero. The code achieves this guarantee by
zero-extending the inputs to a wider mode (which must still be narrower
than the output mode).

OK?


This update fixes the testsuite issue Janis pointed out.

Andrew
2011-06-28  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* Makefile.in (tree-ssa-math-opts.o): Add langhooks.h dependency.
	* optabs.c (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this, and add new
	argument 'found_mode'.
	* optabs.h (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this.
	(find_widening_optab_handler): New macro.
	* tree-ssa-math-opts.c: Include langhooks.h
	(build_and_insert_cast): New function.
	(convert_mult_to_widen): Add new argument 'gsi'.
	Convert unsupported unsigned multiplies to signed.
	(convert_plusminus_to_widen): Likewise.
	(execute_optimize_widening_mul): Pass gsi to convert_mult_to_widen.

	gcc/testsuite/
	* gcc.target/arm/wmul-6.c: New file.

--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2672,7 +2672,8 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLOW_H) $(CONFIG_H) \
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
$(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
$(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
-   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
+   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
+   langhooks.h
 tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
$(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
$(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -232,9 +232,10 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
non-widening optabs also.  */
 
 enum insn_code
-find_widening_optab_handler (optab op, enum machine_mode to_mode,
-			 enum machine_mode from_mode,
-			 int permit_non_widening)
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+  enum machine_mode from_mode,
+  int permit_non_widening,
+  enum machine_mode *found_mode)
 {
   for (; (permit_non_widening || from_mode != to_mode)
 	  && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
@@ -245,7 +246,11 @@ find_widening_optab_handler (optab op, enum machine_mode to_mode,
 		   from_mode);
 
   if (handler != CODE_FOR_nothing)
-	return handler;
+	{
+	  if (found_mode)
+	*found_mode = from_mode;
+	  return handler;
+	}
 }
 
   return CODE_FOR_nothing;
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -808,8 +808,13 @@ extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
 /* Find a widening optab even if it doesn't widen as much as we want.  */
-extern enum insn_code find_widening_optab_handler (optab, enum machine_mode,
-		   enum machine_mode, int);
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+			enum machine_mode,
+			enum machine_mode,
+			int,
+			enum machine_mode *);
 
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
distinguish between machines with a vector shift that takes a scalar for the
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler smlal } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -98,6 +98,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "basic-block.h"
 #include "target.h"
 #include "gimple-pretty-print.h"
+#include "langhooks.h"
 
 /* FIXME: RTL headers have to be included here for optabs.  */
 #include "rtl.h"		/* Because optabs.h wants enum rtx_code.  */
@@ -1086,6 +1087,21 @@ build_and_insert_ref (gimple_stmt_iterator *gsi, location_t loc, tree type,
   return result;
 }
 
+/* Build a gimple assignment to cast VAL to TYPE, and put the result in
+   TARGET.  Insert the statement prior to GSI's current 

Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies

2011-06-28 Thread Paolo Bonzini

On 06/23/2011 04:41 PM, Andrew Stubbs wrote:


I believe this should be legal as long as the top bit of both inputs is
guaranteed to be zero. The code achieves this guarantee by
zero-extending the inputs to a wider mode (which must still be narrower
than the output mode).


Yes, that's correct.
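
A quick way to convince oneself, shrunk to 8-bit inputs with a signed
16x16->32 multiply standing in for the machine instruction (a
standalone check, not part of the patch):

#include <assert.h>
#include <stdint.h>

/* Stand-in for the target's signed widening multiply.  */
static int32_t
signed_widening_mul_16 (int16_t a, int16_t b)
{
  return (int32_t) a * (int32_t) b;
}

int
main (void)
{
  /* Zero-extending the unsigned 8-bit inputs to 16 bits guarantees the
     top bit of each operand is zero, so the signed multiply cannot
     treat them as negative and the product is always exact.  */
  for (uint32_t x = 0; x <= 0xff; x++)
    for (uint32_t y = 0; y <= 0xff; y++)
      assert (signed_widening_mul_16 ((int16_t) x, (int16_t) y)
              == (int32_t) (x * y));
  return 0;
}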

Paolo


[PATCH, SRA] Total scalarization and padding

2011-06-28 Thread Martin Jambor
Hi,

at the moment SRA can get confused by alignment padding and think that
the padding actually contains some data for which no replacement is
planned, and so it might leave some loads and stores of the aggregate in
place instead of removing them.  This is perhaps the biggest problem when
we attempt total scalarization of simple structures, which we do exactly
in order to get rid of these accesses and of the variables altogether.
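
For concreteness, the kind of structure this is about looks like the
following (an illustration only, not the new sra-12.c testcase):

struct S
{
  char c;      /* 1 byte */
               /* typically 3 bytes of alignment padding here */
  int i;       /* 4 bytes */
};

void
copy_s (struct S *p, struct S *q)
{
  struct S tmp = *p;   /* total scalarization creates tmp$c and tmp$i,   */
  *q = tmp;            /* but nothing ever represents the padding bytes. */
}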

I've pondered for quite a while how to best deal with them.  One
option was to make just the total scalarization stronger.  I have also
contemplated creating phantom accesses for padding I could detect
(i.e. in simple structures) which would be more general, but this
would complicate the parts of SRA which are already quite convoluted
and I was not really sure it was worth it.

Eventually I decided for the total scalarization option.  This patch
changes it such that the flag is propagated down the access tree but
also, if it does not work out, is reset on the way up.  If the flag
survives, the access tree is considered covered by scalar
replacements and thus it is known not to contain unscalarized data.
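
In other words, roughly the following scheme (an abstract sketch of the
propagation just described, not the actual tree-sra.c code):

#include <stdbool.h>
#include <stddef.h>

struct access
{
  struct access *first_child;
  struct access *next_sibling;
  bool scalarizable;               /* stand-in for the real per-access test */
  bool grp_total_scalarization;
};

/* Return true iff the subtree rooted at ACC is fully covered by scalar
   replacements.  The flag is propagated down, and reset on the way back
   up whenever a subtree does not work out.  */
static bool
analyze_subtree (struct access *acc, bool totally)
{
  bool covered;

  acc->grp_total_scalarization = totally;

  if (acc->first_child == NULL)
    covered = acc->scalarizable;
  else
    {
      struct access *child;
      covered = true;
      for (child = acc->first_child; child; child = child->next_sibling)
        covered &= analyze_subtree (child, acc->grp_total_scalarization);
    }

  if (!covered)
    acc->grp_total_scalarization = false;

  return covered;
}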

While changing function analyze_access_subtree I have simplified the
way we compute the hole flag and also fixed one comparison which we
currently have the wrong way round but it fortunately does not matter
because if there is a hole, the covered_to will never add up to the
total size.  I'll probably post a separate patch against 4.6 just in
case someone attempts to read the source.

Bootstrapped and tested on x86_64-linux, OK for trunk?

Thanks,

Martin


2011-06-24  Martin Jambor  mjam...@suse.cz

* tree-sra.c (struct access): Rename total_scalarization to
grp_total_scalarization.
(completely_scalarize_var): New function.
(sort_and_splice_var_accesses): Set total_scalarization in the
representative access.
(analyze_access_subtree): Propagate total scalarization across the
tree, no holes in totally scalarized trees, simplify coverage
computation.
(analyze_all_variable_accesses): Call completely_scalarize_var instead
of completely_scalarize_record.

* testsuite/gcc.dg/tree-ssa/sra-12.c: New test.

Index: src/gcc/tree-sra.c
===
*** src.orig/gcc/tree-sra.c
--- src/gcc/tree-sra.c
*** struct access
*** 170,179 
/* Is this particular access write access? */
unsigned write : 1;
  
-   /* Is this access an artificial one created to scalarize some record
-  entirely? */
-   unsigned total_scalarization : 1;
- 
/* Is this access an access to a non-addressable field? */
unsigned non_addressable : 1;
  
--- 170,175 
*** struct access
*** 204,209 
--- 200,209 
   is not propagated in the access tree in any direction.  */
unsigned grp_scalar_write : 1;
  
+   /* Is this access an artificial one created to scalarize some record
+  entirely? */
+   unsigned grp_total_scalarization : 1;
+ 
/* Other passes of the analysis use this bit to make function
   analyze_access_subtree create scalar replacements for this group if
   possible.  */
*** dump_access (FILE *f, struct access *acc
*** 377,402 
fprintf (f, , type = );
print_generic_expr (f, access-type, 0);
if (grp)
! fprintf (f, , total_scalarization = %d, grp_read = %d, grp_write = %d, 
!grp_assignment_read = %d, grp_assignment_write = %d, 
!grp_scalar_read = %d, grp_scalar_write = %d, 
 grp_hint = %d, grp_covered = %d, 
 grp_unscalarizable_region = %d, grp_unscalarized_data = %d, 
 grp_partial_lhs = %d, grp_to_be_replaced = %d, 
 grp_maybe_modified = %d, 
 grp_not_necessarilly_dereferenced = %d\n,
!access-total_scalarization, access-grp_read, access-grp_write,
!access-grp_assignment_read, access-grp_assignment_write,
!access-grp_scalar_read, access-grp_scalar_write,
 access-grp_hint, access-grp_covered,
 access-grp_unscalarizable_region, access-grp_unscalarized_data,
 access-grp_partial_lhs, access-grp_to_be_replaced,
 access-grp_maybe_modified,
 access-grp_not_necessarilly_dereferenced);
else
! fprintf (f, , write = %d, total_scalarization = %d, 
 grp_partial_lhs = %d\n,
!access-write, access-total_scalarization,
 access-grp_partial_lhs);
  }
  
--- 377,402 
fprintf (f, , type = );
print_generic_expr (f, access-type, 0);
if (grp)
! fprintf (f, , grp_read = %d, grp_write = %d, grp_assignment_read = %d, 
!grp_assignment_write = %d, grp_scalar_read = %d, 
!grp_scalar_write = %d, grp_total_scalarization = %d, 
 grp_hint = %d, grp_covered = %d, 
 grp_unscalarizable_region = %d, grp_unscalarized_data = %d, 
 grp_partial_lhs = 

Re: [PATCH, SRA] Total scalarization and padding

2011-06-28 Thread Richard Guenther
On Tue, Jun 28, 2011 at 2:50 PM, Martin Jambor mjam...@suse.cz wrote:
 Hi,

 at the moment SRA can get confused by alignment padding and think that
 it actually contains some data for which there is no planned
 replacement and thus might leave some loads and stores in place
 instead of removing them.  This is perhaps the biggest problem when we
 attempt total scalarization of simple structures exactly in order to
 get rid of these and of the variables altogether.

 I've pondered for quite a while how to best deal with them.  One
 option was to make just the total scalarization stronger.  I have also
 contemplated creating phantom accesses for padding I could detect
 (i.e. in simple structures) which would be more general, but this
 would complicate the parts of SRA which are already quite convoluted
 and I was not really sure it was worth it.

 Eventually I decided for the total scalarization option.  This patch
 changes it such that the flag is propagated down the access tree but
 also, if it does not work out, is reset on the way up.  If the flag
 survives, the access tree is considered covered by scalar
 replacements and thus it is known not to contain unscalarized data.

 While changing function analyze_access_subtree I have simplified the
 way we compute the hole flag and also fixed one comparison which we
 currently have the wrong way round but it fortunately does not matter
 because if there is a hole, the covered_to will never add up to the
 total size.  I'll probably post a separate patch against 4.6 just in
 case someone attempts to read the source.

 Bootstrapped and tested on x86_64-linux, OK for trunk?

So, what will it do for the testcase?

The following is what I _think_ it should do:

bb 2:
  l = *p_1(D);
  l$i_6 = p_1(D)->i;
  D.2700_2 = l$i_6;
  D.2701_3 = D.2700_2 + 1;
  l$i_12 = D.2701_3;
  *p_1(D) = l;
  p_1(D)->i = l$i_12;

and let FRE/DSE do their job (which they don't do, unfortunately).
So does your patch then remove the load/store from/to l but keep
the elementwise loads/stores (which are probably cleaned up by FRE)?

Richard.


 Thanks,

 Martin


 2011-06-24  Martin Jambor  mjam...@suse.cz

        * tree-sra.c (struct access): Rename total_scalarization to
        grp_total_scalarization
        (completely_scalarize_var): New function.
        (sort_and_splice_var_accesses): Set total_scalarization in the
        representative access.
        (analyze_access_subtree): Propagate total scalarization across the
        tree, no holes in totally scalarized trees, simplify coverage
        computation.
        (analyze_all_variable_accesses): Call completely_scalarize_var instead
        of completely_scalarize_record.

        * testsuite/gcc.dg/tree-ssa/sra-12.c: New test.

 Index: src/gcc/tree-sra.c
 ===
 *** src.orig/gcc/tree-sra.c
 --- src/gcc/tree-sra.c
 *** struct access
 *** 170,179 
    /* Is this particular access write access? */
    unsigned write : 1;

 -   /* Is this access an artificial one created to scalarize some record
 -      entirely? */
 -   unsigned total_scalarization : 1;
 -
    /* Is this access an access to a non-addressable field? */
    unsigned non_addressable : 1;

 --- 170,175 
 *** struct access
 *** 204,209 
 --- 200,209 
       is not propagated in the access tree in any direction.  */
    unsigned grp_scalar_write : 1;

 +   /* Is this access an artificial one created to scalarize some record
 +      entirely? */
 +   unsigned grp_total_scalarization : 1;
 +
    /* Other passes of the analysis use this bit to make function
       analyze_access_subtree create scalar replacements for this group if
       possible.  */
 *** dump_access (FILE *f, struct access *acc
 *** 377,402 
    fprintf (f, , type = );
    print_generic_expr (f, access-type, 0);
    if (grp)
 !     fprintf (f, , total_scalarization = %d, grp_read = %d, grp_write = %d, 
 
 !            grp_assignment_read = %d, grp_assignment_write = %d, 
 !            grp_scalar_read = %d, grp_scalar_write = %d, 
             grp_hint = %d, grp_covered = %d, 
             grp_unscalarizable_region = %d, grp_unscalarized_data = %d, 
             grp_partial_lhs = %d, grp_to_be_replaced = %d, 
             grp_maybe_modified = %d, 
             grp_not_necessarilly_dereferenced = %d\n,
 !            access-total_scalarization, access-grp_read, access-grp_write,
 !            access-grp_assignment_read, access-grp_assignment_write,
 !            access-grp_scalar_read, access-grp_scalar_write,
             access-grp_hint, access-grp_covered,
             access-grp_unscalarizable_region, access-grp_unscalarized_data,
             access-grp_partial_lhs, access-grp_to_be_replaced,
             access-grp_maybe_modified,
             access-grp_not_necessarilly_dereferenced);
    else
 !     fprintf (f, , write = %d, total_scalarization = %d, 
             grp_partial_lhs = %d\n,
 

Re: [Patch, AVR]: Better 32=16*16 widening multiplication

2011-06-28 Thread Denis Chertykov
2011/6/28 Georg-Johann Lay a...@gjlay.de:
 This implements mulhisi3 and umulhisi3 widening multiplication
 insns if AVR_HAVE_MUL.

 I chose the interface as r25:r22 = r19:r18 * r21:r20 which is ok
 because only avr-gcc BE will call respective __* support functions in
 libgcc.

 Tested without regression and hand-tested assembler code.

 Johann

        * config/avr/t-avr (LIB1ASMFUNCS): Add _mulhisi3, _umulhisi3,
        _xmulhisi3_exit.
        * config/avr/libgcc.S (_xmulhisi3_exit): New Function.
        (__mulhisi3): Optimize if have MUL*.  Use XJMP instead of rjmp.
        (__umulhisi3): Ditto.
        * config/avr/avr.md (mulhisi3): New insn expander.
        (umulhisi3): New insn expander.
        (*mulhisi3_call): New insn.
        (*umulhisi3_call): New insn.


Approved.

Denis.


Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const

2011-06-28 Thread H.J. Lu
On Mon, Jun 27, 2011 at 3:25 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Jun 27, 2011 at 3:19 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Jun 27, 2011 at 3:08 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
 H.J. Lu wrote:

 reload generates:

 (insn 914 912 0 (set (reg:SI 0 ax)
         (plus:SI (subreg:SI (reg/v/f:DI 182 [ b ]) 0)
             (const_int 8 [0x8]))) 248 {*lea_1_x32}
      (nil))

 from

 insn = emit_insn_if_valid_for_reload (gen_rtx_SET (VOIDmode, out, in));

 Interesting.  The pseudo should have been replaced by the
 hard register (reg:DI 1) during the preceding call to
      op0 = find_replacement (XEXP (in, 0));
 (since reload 0 should have pushed a replacement record.)

 Interestingly enough, in the final output that replacement *is*
 performed in the REG_EQUIV note:

 (insn 1023 1022 1024 34 (set (reg:SI 1 dx)
        (plus:SI (reg:SI 1 dx)
            (const_int 8 [0x8]))) spooles.c:291 248 {*lea_1_x32}
     (expr_list:REG_EQUIV (plus:SI (subreg:SI (reg:DI 1 dx) 0)
            (const_int 8 [0x8]))
        (nil)))

 which is why I hadn't expected this to be a problem here.

 Can you try to find out why the find_replacement doesn't work
 with your test case?


 I will investigate.  Could (reg:SI 1 dx) vs. (subreg:SI (reg:DI 1 dx) 0)
 be a problem?


 find_replacement never checks subreg:

 Breakpoint 3, find_replacement (loc=0x7068ab00)
    at /export/gnu/import/git/gcc-x32/gcc/reload.c:6411
 6411          if (reloadreg && r->where == loc)
 (reg:DI 0 ax)
 (reg/v/f:DI 182 [ b ])
 (gdb) call debug_rtx (*loc)
 (subreg:SI (reg/v/f:DI 182 [ b ]) 0)
 (gdb)


This patch checks for a SUBREG of a pointer when Pmode != ptr_mode.  OK
for trunk?

Thanks.

-- 
H.J.
---
2011-06-28  H.J. Lu  hongjiu...@intel.com

PR rtl-optimization/49114
* reload.c (find_replacement): Properly handle SUBREG pointers.

diff --git a/gcc/reload.c b/gcc/reload.c
index 3ad46b9..829e45b 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -6415,6 +6415,36 @@ find_replacement (rtx *loc)

  return reloadreg;
}
+  else if (Pmode != ptr_mode
+	   && !r->subreg_loc
+	   && reloadreg
+	   && (r->mode == Pmode || GET_MODE (reloadreg) == Pmode)
+	   && REG_P (reloadreg)
+	   && GET_CODE (*loc) == SUBREG
+	   && REG_P (SUBREG_REG (*loc))
+	   && REG_POINTER (SUBREG_REG (*loc))
+	   && GET_MODE (*loc) == ptr_mode
+	   && r->where == &SUBREG_REG (*loc))
+    {
+      int offset;
+
+      if (r->mode != VOIDmode && GET_MODE (reloadreg) != r->mode)
+	reloadreg = gen_rtx_REG (r->mode, REGNO (reloadreg));
+
+      if ((WORDS_BIG_ENDIAN || BYTES_BIG_ENDIAN)
+	  && GET_MODE_SIZE (Pmode) > GET_MODE_SIZE (ptr_mode))
+	{
+	  offset = GET_MODE_SIZE (Pmode) - GET_MODE_SIZE (ptr_mode);
+	  if (! BYTES_BIG_ENDIAN)
+	    offset = (offset / UNITS_PER_WORD) * UNITS_PER_WORD;
+	  else if (! WORDS_BIG_ENDIAN)
+	    offset %= UNITS_PER_WORD;
+	}
+      else
+	offset = 0;
+
+      return gen_rtx_SUBREG (ptr_mode, reloadreg, offset);
+    }
   else if (reloadreg  r-subreg_loc == loc)
{
  /* RELOADREG must be either a REG or a SUBREG.


MN10300: Do not use linker relaxation and incremental linking together

2011-06-28 Thread Nick Clifton
Hi Guys,

  With the MN10300, enabling linker relaxation when performing an
  incremental link does not work:

% mn10300-elf-gcc hello.c -mrelax -r
collect-ld: --relax and -r may not be used together
collect2: error: ld returned 1 exit status

  Hence I am applying the patch below as an obvious fix for the problem.
  Tested without regressions on an mn10300-elf toolchain.

Cheers
  Nick

gcc/ChangeLog
2011-06-28  Nick Clifton  ni...@redhat.com

* config/mn10300/mn10300.h (LINK_SPEC): Do not use linker
relaxation when performing an incremental link.

Index: gcc/config/mn10300/mn10300.h
===
--- gcc/config/mn10300/mn10300.h(revision 175576)
+++ gcc/config/mn10300/mn10300.h(working copy)
@@ -24,7 +24,7 @@
 #undef LIB_SPEC
 #undef ENDFILE_SPEC
 #undef  LINK_SPEC
-#define LINK_SPEC "%{mrelax:--relax}"
+#define LINK_SPEC "%{mrelax:%{!r:--relax}}"
 #undef  STARTFILE_SPEC
 #define STARTFILE_SPEC 
%{!mno-crt0:%{!shared:%{pg:gcrt0%O%s}%{!pg:%{p:mcrt0%O%s}%{!p:crt0%O%s


Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies

2011-06-28 Thread Andrew Stubbs

On 28/06/11 13:33, Andrew Stubbs wrote:

On 23/06/11 15:41, Andrew Stubbs wrote:

If one or both of the inputs to a widening multiply are of unsigned type
then the compiler will attempt to use usmul_widen_optab or
umul_widen_optab, respectively.

That works fine, but only if the target supports those operations
directly. Otherwise, it just bombs out and reverts to the normal
inefficient non-widening multiply.

This patch attempts to catch these cases and use an alternative signed
widening multiply instruction, if one of those is available.

I believe this should be legal as long as the top bit of both inputs is
guaranteed to be zero. The code achieves this guarantee by
zero-extending the inputs to a wider mode (which must still be narrower
than the output mode).

OK?


This update fixes the testsuite issue Janis pointed out.


And this one also fixes up the wmul-5.c testcase: the patch changes
the expected (correct) result.


Andrew
2011-06-28  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* Makefile.in (tree-ssa-math-opts.o): Add langhooks.h dependency.
	* optabs.c (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this, and add new
	argument 'found_mode'.
	* optabs.h (find_widening_optab_handler): Rename to ...
	(find_widening_optab_handler_and_mode): ... this.
	(find_widening_optab_handler): New macro.
	* tree-ssa-math-opts.c: Include langhooks.h
	(build_and_insert_cast): New function.
	(convert_mult_to_widen): Add new argument 'gsi'.
	Convert unsupported unsigned multiplies to signed.
	(convert_plusminus_to_widen): Likewise.
	(execute_optimize_widening_mul): Pass gsi to convert_mult_to_widen.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: Update expected result.
	* gcc.target/arm/wmul-6.c: New file.

--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2672,7 +2672,8 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLOW_H) $(CONFIG_H) \
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
$(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
$(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
-   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
+   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
+   langhooks.h
 tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
$(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
$(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -232,9 +232,10 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
non-widening optabs also.  */
 
 enum insn_code
-find_widening_optab_handler (optab op, enum machine_mode to_mode,
-			 enum machine_mode from_mode,
-			 int permit_non_widening)
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+  enum machine_mode from_mode,
+  int permit_non_widening,
+  enum machine_mode *found_mode)
 {
   for (; (permit_non_widening || from_mode != to_mode)
 	  && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
@@ -245,7 +246,11 @@ find_widening_optab_handler (optab op, enum machine_mode to_mode,
 		   from_mode);
 
   if (handler != CODE_FOR_nothing)
-	return handler;
+	{
+	  if (found_mode)
+	*found_mode = from_mode;
+	  return handler;
+	}
 }
 
   return CODE_FOR_nothing;
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -808,8 +808,13 @@ extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
 /* Find a widening optab even if it doesn't widen as much as we want.  */
-extern enum insn_code find_widening_optab_handler (optab, enum machine_mode,
-		   enum machine_mode, int);
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+			enum machine_mode,
+			enum machine_mode,
+			int,
+			enum machine_mode *);
 
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
distinguish between machines with a vector shift that takes a scalar for the
--- a/gcc/testsuite/gcc.target/arm/wmul-5.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -7,4 +7,4 @@ foo (long long a, char *b, char *c)
   return a + *b * *c;
 }
 
-/* { dg-final { scan-assembler umlal } } */
+/* { dg-final { scan-assembler smlalbb } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler smlal } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -98,6 +98,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "basic-block.h"
 

[PATCH] [ARM] Fix constraint modifiers for VFP patterns.

2011-06-28 Thread Ramana Radhakrishnan
Hi,

Some time back, Chung-Lin noticed that a few of the VFP patterns below
had the '+' constraint modifier rather than the '=' constraint
modifier.

I've now corrected this as follows and tested this on trunk with 
arm-linux-gnueabi
and qemu for a v7-a neon test run. Committed.

cheers
Ramana

2011-06-28  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

* config/arm/vfp.md (*divsf3_vfp): Replace '+' constraint modifier
with '=' constraint modifier.
(*divdf3_vfp): Likewise.
(*mulsf3_vfp): Likewise.
(*muldf3_vfp): Likewise.
(*mulsf3negsf_vfp): Likewise.
(*muldf3negdf_vfp): Likewise.
---
 gcc/config/arm/arm.h  |2 +-
 gcc/config/arm/vfp.md |   13 ++---
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index b0d2625..edd6afd 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1597,7 +1597,7 @@ typedef struct
frame.  */
 #define EXIT_IGNORE_STACK 1
 
-#define EPILOGUE_USES(REGNO) ((REGNO) == LR_REGNUM)
+#define EPILOGUE_USES(REGNO) (epilogue_completed && (REGNO) == LR_REGNUM)
 
 /* Determine if the epilogue should be output as RTL.
You should override this if you define FUNCTION_EXTRA_EPILOGUE.  */
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 42be2ff..e2165a8 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -719,7 +719,7 @@
 ;; Division insns
 
 (define_insn *divsf3_vfp
-  [(set (match_operand:SF0 s_register_operand +t)
+  [(set (match_operand:SF0 s_register_operand =t)
(div:SF (match_operand:SF 1 s_register_operand t)
(match_operand:SF 2 s_register_operand t)))]
   TARGET_32BIT  TARGET_HARD_FLOAT  TARGET_VFP
@@ -729,7 +729,7 @@
 )
 
 (define_insn *divdf3_vfp
-  [(set (match_operand:DF0 s_register_operand +w)
+  [(set (match_operand:DF0 s_register_operand =w)
(div:DF (match_operand:DF 1 s_register_operand w)
(match_operand:DF 2 s_register_operand w)))]
   TARGET_32BIT  TARGET_HARD_FLOAT  TARGET_VFP_DOUBLE
@@ -742,7 +742,7 @@
 ;; Multiplication insns
 
 (define_insn *mulsf3_vfp
-  [(set (match_operand:SF 0 s_register_operand +t)
+  [(set (match_operand:SF 0 s_register_operand =t)
(mult:SF (match_operand:SF 1 s_register_operand t)
 (match_operand:SF 2 s_register_operand t)))]
   TARGET_32BIT  TARGET_HARD_FLOAT  TARGET_VFP
@@ -752,7 +752,7 @@
 )
 
 (define_insn *muldf3_vfp
-  [(set (match_operand:DF 0 s_register_operand +w)
+  [(set (match_operand:DF 0 s_register_operand =w)
(mult:DF (match_operand:DF 1 s_register_operand w)
 (match_operand:DF 2 s_register_operand w)))]
   TARGET_32BIT  TARGET_HARD_FLOAT  TARGET_VFP_DOUBLE
@@ -761,9 +761,8 @@
(set_attr type fmuld)]
 )
 
-
 (define_insn *mulsf3negsf_vfp
-  [(set (match_operand:SF 0 s_register_operand +t)
+  [(set (match_operand:SF 0 s_register_operand =t)
(mult:SF (neg:SF (match_operand:SF 1 s_register_operand t))
 (match_operand:SF 2 s_register_operand t)))]
   TARGET_32BIT  TARGET_HARD_FLOAT  TARGET_VFP
@@ -773,7 +772,7 @@
 )
 
 (define_insn *muldf3negdf_vfp
-  [(set (match_operand:DF 0 s_register_operand +w)
+  [(set (match_operand:DF 0 s_register_operand =w)
(mult:DF (neg:DF (match_operand:DF 1 s_register_operand w))
 (match_operand:DF 2 s_register_operand w)))]
   TARGET_32BIT  TARGET_HARD_FLOAT  TARGET_VFP_DOUBLE
-- 
1.7.4.1



Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const

2011-06-28 Thread Ulrich Weigand
H.J. Lu wrote:
  find_replacement never checks subreg:
 
  Breakpoint 3, find_replacement (loc=0x7068ab00)
     at /export/gnu/import/git/gcc-x32/gcc/reload.c:6411
  6411          if (reloadreg && r->where == loc)
  (reg:DI 0 ax)
  (reg/v/f:DI 182 [ b ])
  (gdb) call debug_rtx (*loc)
  (subreg:SI (reg/v/f:DI 182 [ b ]) 0)
  (gdb)
 
 
 This seems to work.  Does it make any senses?

Ah, I see.  This was supposed to be handled via the SUBREG_LOC
member of the replacement struct.  Unfortunately, it turns out
that this is no longer reliably set these days ...

At first I was concerned that this might also cause problems at
the other location where replacements are processed, subst_reloads.
However, it turns out that code in subst_reloads is dead these days
anyway, as the reloadreg is *always* a REG, and never a SUBREG.

Once that code (and similar code in find_replacement that tries
to handle SUBREG reloadregs) is removed, the only remaining user
of the SUBREG_LOC field is actually find_replacement.  But here
we're doing a recursive descent through an RTL anyway, so we
always know we're replacing inside a SUBREG.

This makes the whole SUBREG_LOC field obsolete.

The patch below implements those changes (untested so far).
Can you verify that this works for you as well?

Thanks,
Ulrich


ChangeLog:

* reload.c (struct replacement): Remove SUBREG_LOC member.
(push_reload): Do not set it.
(push_replacement): Likewise.
(subst_reload): Remove dead code.
(copy_replacements): Remove assertion.
(copy_replacements_1): Do not handle SUBREG_LOC.
(move_replacements): Likewise.
(find_replacement): Remove dead code.  Detect subregs via
recursive descent instead of via SUBREG_LOC.

Index: gcc/reload.c
===
*** gcc/reload.c(revision 175580)
--- gcc/reload.c(working copy)
*** static int replace_reloads;
*** 158,165 
  struct replacement
  {
rtx *where; /* Location to store in */
-   rtx *subreg_loc;/* Location of SUBREG if WHERE is inside
-  a SUBREG; 0 otherwise.  */
int what;   /* which reload this is for */
enum machine_mode mode; /* mode it must have */
  };
--- 158,163 
*** push_reload (rtx in, rtx out, rtx *inloc
*** 1496,1502 
{
  struct replacement *r = replacements[n_replacements++];
  r-what = i;
- r-subreg_loc = in_subreg_loc;
  r-where = inloc;
  r-mode = inmode;
}
--- 1494,1499 
*** push_reload (rtx in, rtx out, rtx *inloc
*** 1505,1511 
  struct replacement *r = replacements[n_replacements++];
  r-what = i;
  r-where = outloc;
- r-subreg_loc = out_subreg_loc;
  r-mode = outmode;
}
  }
--- 1502,1507 
*** push_replacement (rtx *loc, int reloadnu
*** 1634,1640 
struct replacement *r = replacements[n_replacements++];
r-what = reloadnum;
r-where = loc;
-   r-subreg_loc = 0;
r-mode = mode;
  }
  }
--- 1630,1635 
*** subst_reloads (rtx insn)
*** 6287,6319 
  if (GET_MODE (reloadreg) != r-mode  r-mode != VOIDmode)
reloadreg = reload_adjust_reg_for_mode (reloadreg, r-mode);
  
! /* If we are putting this into a SUBREG and RELOADREG is a
!SUBREG, we would be making nested SUBREGs, so we have to fix
!this up.  Note that r-where == SUBREG_REG (*r-subreg_loc).  */
! 
! if (r-subreg_loc != 0  GET_CODE (reloadreg) == SUBREG)
!   {
! if (GET_MODE (*r-subreg_loc)
! == GET_MODE (SUBREG_REG (reloadreg)))
!   *r-subreg_loc = SUBREG_REG (reloadreg);
! else
!   {
! int final_offset =
!   SUBREG_BYTE (*r-subreg_loc) + SUBREG_BYTE (reloadreg);
! 
! /* When working with SUBREGs the rule is that the byte
!offset must be a multiple of the SUBREG's mode.  */
! final_offset = (final_offset /
! GET_MODE_SIZE (GET_MODE (*r-subreg_loc)));
! final_offset = (final_offset *
! GET_MODE_SIZE (GET_MODE (*r-subreg_loc)));
! 
! *r-where = SUBREG_REG (reloadreg);
! SUBREG_BYTE (*r-subreg_loc) = final_offset;
!   }
!   }
! else
!   *r-where = reloadreg;
}
/* If reload got no reg and isn't optional, something's wrong.  */
else
--- 6282,6288 
  if (GET_MODE (reloadreg) != r-mode  r-mode != VOIDmode)
reloadreg = reload_adjust_reg_for_mode (reloadreg, r-mode);
  
! *r-where = reloadreg;
}
/* If reload got no reg and isn't optional, 

Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const

2011-06-28 Thread H.J. Lu
On Tue, Jun 28, 2011 at 7:24 AM, Ulrich Weigand uweig...@de.ibm.com wrote:
 H.J. Lu wrote:
  find_replacement never checks subreg:
 
  Breakpoint 3, find_replacement (loc=0x7068ab00)
     at /export/gnu/import/git/gcc-x32/gcc/reload.c:6411
  6411          if (reloadreg && r->where == loc)
  (reg:DI 0 ax)
  (reg/v/f:DI 182 [ b ])
  (gdb) call debug_rtx (*loc)
  (subreg:SI (reg/v/f:DI 182 [ b ]) 0)
  (gdb)
 

 This seems to work.  Does it make any senses?

 Ah, I see.  This was supposed to be handled via the SUBREG_LOC
 member of the replacement struct.  Unfortunately, it turns out
 that this is no longer reliably set these days ...

 At first I was concerned that this might also cause problems at
 the other location where replacements are processed, subst_reloads.
 However, it turns out that code in subst_reloads is dead these days
 anyway, as the reloadreg is *always* a REG, and never a SUBREG.

 Once that code (and similar code in find_replacement that tries
 to handle SUBREG reloadregs) is removed, the only remaining user
 of the SUBREG_LOC field is actually find_replacement.  But here
 we're doing a recursive descent through an RTL anyway, so we
 always know we're replacing inside a SUBREG.

 This makes the whole SUBREG_LOC field obsolete.

 The patch below implements those changes (untested so far).
 Can you verify that this works for you as well?

 Thanks,
 Ulrich


 ChangeLog:

        * reload.c (struct replacement): Remove SUBREG_LOC member.
        (push_reload): Do not set it.
        (push_replacement): Likewise.
        (subst_reload): Remove dead code.
        (copy_replacements): Remove assertion.
        (copy_replacements_1): Do not handle SUBREG_LOC.
        (move_replacements): Likewise.
        (find_replacement): Remove dead code.  Detect subregs via
        recursive descent instead of via SUBREG_LOC.

 Index: gcc/reload.c
 ===
 *** gcc/reload.c        (revision 175580)
 --- gcc/reload.c        (working copy)
 *** static int replace_reloads;
 *** 158,165 
  struct replacement
  {
    rtx *where;                 /* Location to store in */
 -   rtx *subreg_loc;            /* Location of SUBREG if WHERE is inside
 -                                  a SUBREG; 0 otherwise.  */
    int what;                   /* which reload this is for */
    enum machine_mode mode;     /* mode it must have */
  };
 --- 158,163 
 *** push_reload (rtx in, rtx out, rtx *inloc
 *** 1496,1502 
        {
          struct replacement *r = replacements[n_replacements++];
          r-what = i;
 -         r-subreg_loc = in_subreg_loc;
          r-where = inloc;
          r-mode = inmode;
        }
 --- 1494,1499 
 *** push_reload (rtx in, rtx out, rtx *inloc
 *** 1505,1511 
          struct replacement *r = replacements[n_replacements++];
          r-what = i;
          r-where = outloc;
 -         r-subreg_loc = out_subreg_loc;
          r-mode = outmode;
        }
      }
 --- 1502,1507 
 *** push_replacement (rtx *loc, int reloadnu
 *** 1634,1640 
        struct replacement *r = replacements[n_replacements++];
        r-what = reloadnum;
        r-where = loc;
 -       r-subreg_loc = 0;
        r-mode = mode;
      }
  }
 --- 1630,1635 
 *** subst_reloads (rtx insn)
 *** 6287,6319 
          if (GET_MODE (reloadreg) != r-mode  r-mode != VOIDmode)
            reloadreg = reload_adjust_reg_for_mode (reloadreg, r-mode);

 !         /* If we are putting this into a SUBREG and RELOADREG is a
 !            SUBREG, we would be making nested SUBREGs, so we have to fix
 !            this up.  Note that r-where == SUBREG_REG (*r-subreg_loc).  */
 !
 !         if (r-subreg_loc != 0  GET_CODE (reloadreg) == SUBREG)
 !           {
 !             if (GET_MODE (*r-subreg_loc)
 !                 == GET_MODE (SUBREG_REG (reloadreg)))
 !               *r-subreg_loc = SUBREG_REG (reloadreg);
 !             else
 !               {
 !                 int final_offset =
 !                   SUBREG_BYTE (*r-subreg_loc) + SUBREG_BYTE (reloadreg);
 !
 !                 /* When working with SUBREGs the rule is that the byte
 !                    offset must be a multiple of the SUBREG's mode.  */
 !                 final_offset = (final_offset /
 !                                 GET_MODE_SIZE (GET_MODE (*r-subreg_loc)));
 !                 final_offset = (final_offset *
 !                                 GET_MODE_SIZE (GET_MODE (*r-subreg_loc)));
 !
 !                 *r-where = SUBREG_REG (reloadreg);
 !                 SUBREG_BYTE (*r-subreg_loc) = final_offset;
 !               }
 !           }
 !         else
 !           *r-where = reloadreg;
        }
        /* If reload got no reg and isn't optional, something's wrong.  */
        else
 --- 6282,6288 
          if (GET_MODE (reloadreg) != r-mode  r-mode != VOIDmode)
            reloadreg = 

Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const

2011-06-28 Thread H.J. Lu
On Tue, Jun 28, 2011 at 7:47 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Tue, Jun 28, 2011 at 7:24 AM, Ulrich Weigand uweig...@de.ibm.com wrote:
 H.J. Lu wrote:
  find_replacement never checks subreg:
 
  Breakpoint 3, find_replacement (loc=0x7068ab00)
     at /export/gnu/import/git/gcc-x32/gcc/reload.c:6411
  6411          if (reloadreg && r->where == loc)
  (reg:DI 0 ax)
  (reg/v/f:DI 182 [ b ])
  (gdb) call debug_rtx (*loc)
  (subreg:SI (reg/v/f:DI 182 [ b ]) 0)
  (gdb)
 

 This seems to work.  Does it make any senses?

 Ah, I see.  This was supposed to be handled via the SUBREG_LOC
 member of the replacement struct.  Unfortunately, it turns out
 that this is no longer reliably set these days ...

 At first I was concerned that this might also cause problems at
 the other location where replacements are processed, subst_reloads.
 However, it turns out that code in subst_reloads is dead these days
 anyway, as the reloadreg is *always* a REG, and never a SUBREG.

 Once that code (and similar code in find_replacement that tries
 to handle SUBREG reloadregs) is removed, the only remaining user
 of the SUBREG_LOC field is actually find_replacement.  But here
 we're doing a recursive descent through an RTL anyway, so we
 always know we're replacing inside a SUBREG.

 This makes the whole SUBREG_LOC field obsolete.

 The patch below implements those changes (untested so far).
 Can you verify that this works for you as well?

 Thanks,
 Ulrich


 ChangeLog:

        * reload.c (struct replacement): Remove SUBREG_LOC member.
        (push_reload): Do not set it.
        (push_replacement): Likewise.
        (subst_reload): Remove dead code.
        (copy_replacements): Remove assertion.
        (copy_replacements_1): Do not handle SUBREG_LOC.
        (move_replacements): Likewise.
        (find_replacement): Remove dead code.  Detect subregs via
        recursive descent instead of via SUBREG_LOC.

 Index: gcc/reload.c
 ===
 *** gcc/reload.c        (revision 175580)
 --- gcc/reload.c        (working copy)
 *** static int replace_reloads;
 *** 158,165 
  struct replacement
  {
    rtx *where;                 /* Location to store in */
 -   rtx *subreg_loc;            /* Location of SUBREG if WHERE is inside
 -                                  a SUBREG; 0 otherwise.  */
    int what;                   /* which reload this is for */
    enum machine_mode mode;     /* mode it must have */
  };
 --- 158,163 
 *** push_reload (rtx in, rtx out, rtx *inloc
 *** 1496,1502 
        {
          struct replacement *r = replacements[n_replacements++];
          r-what = i;
 -         r-subreg_loc = in_subreg_loc;
          r-where = inloc;
          r-mode = inmode;
        }
 --- 1494,1499 
 *** push_reload (rtx in, rtx out, rtx *inloc
 *** 1505,1511 
          struct replacement *r = replacements[n_replacements++];
          r-what = i;
          r-where = outloc;
 -         r-subreg_loc = out_subreg_loc;
          r-mode = outmode;
        }
      }
 --- 1502,1507 
 *** push_replacement (rtx *loc, int reloadnu
 *** 1634,1640 
        struct replacement *r = replacements[n_replacements++];
        r-what = reloadnum;
        r-where = loc;
 -       r-subreg_loc = 0;
        r-mode = mode;
      }
  }
 --- 1630,1635 
 *** subst_reloads (rtx insn)
 *** 6287,6319 
          if (GET_MODE (reloadreg) != r-mode  r-mode != VOIDmode)
            reloadreg = reload_adjust_reg_for_mode (reloadreg, r-mode);

 !         /* If we are putting this into a SUBREG and RELOADREG is a
 !            SUBREG, we would be making nested SUBREGs, so we have to fix
 !            this up.  Note that r-where == SUBREG_REG (*r-subreg_loc).  
 */
 !
 !         if (r-subreg_loc != 0  GET_CODE (reloadreg) == SUBREG)
 !           {
 !             if (GET_MODE (*r-subreg_loc)
 !                 == GET_MODE (SUBREG_REG (reloadreg)))
 !               *r-subreg_loc = SUBREG_REG (reloadreg);
 !             else
 !               {
 !                 int final_offset =
 !                   SUBREG_BYTE (*r-subreg_loc) + SUBREG_BYTE (reloadreg);
 !
 !                 /* When working with SUBREGs the rule is that the byte
 !                    offset must be a multiple of the SUBREG's mode.  */
 !                 final_offset = (final_offset /
 !                                 GET_MODE_SIZE (GET_MODE (*r-subreg_loc)));
 !                 final_offset = (final_offset *
 !                                 GET_MODE_SIZE (GET_MODE (*r-subreg_loc)));
 !
 !                 *r-where = SUBREG_REG (reloadreg);
 !                 SUBREG_BYTE (*r-subreg_loc) = final_offset;
 !               }
 !           }
 !         else
 !           *r-where = reloadreg;
        }
        /* If reload got no reg and isn't optional, something's wrong.  */
        else
 --- 6282,6288 
          if (GET_MODE 

Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs

2011-06-28 Thread Andrew Stubbs

On 23/06/11 15:41, Andrew Stubbs wrote:

This patch removes the restriction that the inputs to a widening
multiply must be of the same mode.

It does this by extending the smaller of the two inputs to match the
larger; therefore, it remains the case that subsequent code (in the
expand pass, for example) can rely on the type of rhs1 being the input
type of the operation, and the gimple verification code is still valid.
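
For illustration, the kind of source this now catches is a widening
multiply whose unwidened operands have different widths, e.g. (a sketch
only; the committed test below is wmul-7.c):

long long
f (int a, short b)
{
  /* int * short: previously rejected because the operand widths differ;
     now the short operand is first extended to int.  */
  return (long long) a * b;
}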

OK?


This update fixes the testcase issue Janis highlighted.

Andrew
2011-06-28  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME.
	Ensure that the larger type is the first operand.
	(convert_mult_to_widen): Insert cast if type2 is smaller than type1.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-7.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options -O2 -march=armv7-a } */
+
+unsigned long long
+foo (unsigned long long a, unsigned char *b, unsigned short *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler umlal } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2051,9 +2051,17 @@ is_widening_mult_p (gimple stmt,
   *type2_out = *type1_out;
 }
 
-  /* FIXME: remove this restriction.  */
-  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
-return false;
+  /* Ensure that the larger of the two operands comes first. */
+  if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out))
+{
+  tree tmp;
+  tmp = *type1_out;
+  *type1_out = *type2_out;
+  *type2_out = tmp;
+  tmp = *rhs1_out;
+  *rhs1_out = *rhs2_out;
+  *rhs2_out = tmp;
+}
 
   return true;
 }
@@ -2069,6 +2077,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   enum insn_code handler;
   enum machine_mode to_mode, from_mode;
   optab op;
+  int cast1 = false, cast2 = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2107,16 +2116,26 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 	return false;
 
 	  type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0);
-
-	  rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
-	create_tmp_var (type1, NULL), rhs1, type1);
-	  rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
-	create_tmp_var (type2, NULL), rhs2, type2);
+	  cast1 = cast2 = true;
 	}
   else
 	return false;
 }
 
+  if (TYPE_MODE (type2) != from_mode)
+{
+  type2 = lang_hooks.types.type_for_mode (from_mode,
+	  TYPE_UNSIGNED (type2));
+  cast2 = true;
+}
+
+  if (cast1)
+rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+  create_tmp_var (type1, NULL), rhs1, type1);
+  if (cast2)
+rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+  create_tmp_var (type2, NULL), rhs2, type2);
+
   gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1));
   gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2));
   gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
@@ -2142,6 +2161,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   optab this_optab;
   enum tree_code wmult_code;
   enum insn_code handler;
+  int cast1 = false, cast2 = false;
 
   lhs = gimple_assign_lhs (stmt);
   type = TREE_TYPE (lhs);
@@ -2211,17 +2231,28 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (GET_MODE_SIZE (mode)  GET_MODE_SIZE (TYPE_MODE (type)))
 	{
 	  type1 = type2 = lang_hooks.types.type_for_mode (mode, 0);
-	  mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
-	 create_tmp_var (type1, NULL),
-	 mult_rhs1, type1);
-	  mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
-	 create_tmp_var (type2, NULL),
-	 mult_rhs2, type2);
+	  cast1 = cast2 = true;
 	}
   else
 	return false;
 }
 
+  if (TYPE_MODE (type2) != TYPE_MODE (type1))
+{
+  type2 = lang_hooks.types.type_for_mode (TYPE_MODE (type1),
+	  TYPE_UNSIGNED (type2));
+  cast2 = true;
+}
+
+  if (cast1)
+mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt),
+   create_tmp_var (type1, NULL),
+   mult_rhs1, type1);
+  if (cast2)
+mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt),
+   create_tmp_var (type2, NULL),
+   mult_rhs2, type2);
+
   /* Verify that the machine can perform a widening multiply
  accumulate in this mode/signedness combination, otherwise
  this transformation is likely to pessimize code.  */


Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const

2011-06-28 Thread Ulrich Weigand
H.J. Lu wrote:
  it doesn't work;
 
  allocation.f: In function 'allocation':
  allocation.f:1048:0: internal compiler error: in subreg_get_info, at
  rtlanal.c:3235
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See http://gcc.gnu.org/bugs.html for instructions.

  since subreg_regno_offset only works on hard registers.

Hmm, OK.  That looks like another latent bug in the original code ...

 + if (r->mode != VOIDmode && GET_MODE (reloadreg) != r->mode)
 +   reloadreg = gen_rtx_REG (r->mode, REGNO (reloadreg));

(As an aside, this is wrong; it's already wrong in the place where you
copied it from.  This should now use reload_adjust_reg_for_mode just
like subst_reload does.)

 +
 + if ((WORDS_BIG_ENDIAN || BYTES_BIG_ENDIAN)
 +  && GET_MODE_SIZE (Pmode) > GET_MODE_SIZE (ptr_mode))
 +   {
 + offset = GET_MODE_SIZE (Pmode) - GET_MODE_SIZE (ptr_mode);
 + if (! BYTES_BIG_ENDIAN)
 +   offset = (offset / UNITS_PER_WORD) * UNITS_PER_WORD;
 + else if (! WORDS_BIG_ENDIAN)
 +   offset %= UNITS_PER_WORD;
 +   }
 +  else
 +offset = 0;
 +
 + return gen_rtx_SUBREG (ptr_mode, reloadreg, offset);
 
 works for me.

This doesn't seem correct either; it completely ignores the SUBREG_BYTE
of the original SUBREG ...  Also, I don't quite see why this should
have anything special for Pmode / ptr_mode.

It seems simplest to just use simplify_gen_subreg here.  Can you try
the following version?

Thanks,
Ulrich


ChangeLog:

* reload.c (struct replacement): Remove SUBREG_LOC member.
(push_reload): Do not set it.
(push_replacement): Likewise.
(subst_reload): Remove dead code.
(copy_replacements): Remove assertion.
(copy_replacements_1): Do not handle SUBREG_LOC.
(move_replacements): Likewise.
(find_replacement): Remove dead code.  Use reload_adjust_reg_for_mode.
Detect subregs via recursive descent instead of via SUBREG_LOC.


Index: gcc/reload.c
===
*** gcc/reload.c(revision 175580)
--- gcc/reload.c(working copy)
*** static int replace_reloads;
*** 158,165 
  struct replacement
  {
rtx *where; /* Location to store in */
-   rtx *subreg_loc;/* Location of SUBREG if WHERE is inside
-  a SUBREG; 0 otherwise.  */
int what;   /* which reload this is for */
enum machine_mode mode; /* mode it must have */
  };
--- 158,163 
*** push_reload (rtx in, rtx out, rtx *inloc
*** 1496,1502 
{
  struct replacement *r = replacements[n_replacements++];
  r-what = i;
- r-subreg_loc = in_subreg_loc;
  r-where = inloc;
  r-mode = inmode;
}
--- 1494,1499 
*** push_reload (rtx in, rtx out, rtx *inloc
*** 1505,1511 
  struct replacement *r = replacements[n_replacements++];
  r-what = i;
  r-where = outloc;
- r-subreg_loc = out_subreg_loc;
  r-mode = outmode;
}
  }
--- 1502,1507 
*** push_replacement (rtx *loc, int reloadnu
*** 1634,1640 
struct replacement *r = replacements[n_replacements++];
r-what = reloadnum;
r-where = loc;
-   r-subreg_loc = 0;
r-mode = mode;
  }
  }
--- 1630,1635 
*** subst_reloads (rtx insn)
*** 6287,6319 
  if (GET_MODE (reloadreg) != r-mode  r-mode != VOIDmode)
reloadreg = reload_adjust_reg_for_mode (reloadreg, r-mode);
  
! /* If we are putting this into a SUBREG and RELOADREG is a
!SUBREG, we would be making nested SUBREGs, so we have to fix
!this up.  Note that r-where == SUBREG_REG (*r-subreg_loc).  */
! 
! if (r-subreg_loc != 0  GET_CODE (reloadreg) == SUBREG)
!   {
! if (GET_MODE (*r-subreg_loc)
! == GET_MODE (SUBREG_REG (reloadreg)))
!   *r-subreg_loc = SUBREG_REG (reloadreg);
! else
!   {
! int final_offset =
!   SUBREG_BYTE (*r-subreg_loc) + SUBREG_BYTE (reloadreg);
! 
! /* When working with SUBREGs the rule is that the byte
!offset must be a multiple of the SUBREG's mode.  */
! final_offset = (final_offset /
! GET_MODE_SIZE (GET_MODE (*r-subreg_loc)));
! final_offset = (final_offset *
! GET_MODE_SIZE (GET_MODE (*r-subreg_loc)));
! 
! *r-where = SUBREG_REG (reloadreg);
! SUBREG_BYTE (*r-subreg_loc) = final_offset;
!   }
!   }
! else
!   *r-where = reloadreg;
}
/* If reload got no reg and isn't optional, 

Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching

2011-06-28 Thread Andrew Stubbs

On 23/06/11 15:42, Andrew Stubbs wrote:

This patch fixes the case where widening multiply-and-accumulate were
not recognised because the multiplication itself is not actually widening.

This can happen when you have DI + SI * SI: the multiplication will
be done in SImode as a non-widening multiply, and it's only the final
accumulate step that is widening.

This was not recognised for two reasons:

1. is_widening_mult_p inferred the output type from the multiply
statement, which is not useful in this case.

2. The inputs to the multiply instruction may not have been converted at
all (because they're not being widened), so the pattern match failed.

The patch fixes these issues by making the output type explicit, and by
permitting unconverted inputs (the types are still checked, so this is
safe).
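
The shape of the affected source is something like this (an
illustration in the spirit of the new wmul-8.c test below):

long long
f (long long a, int b, int c)
{
  /* b * c is a plain SImode multiply; only the accumulation into 'a'
     widens to DImode, which is exactly the case described above.  */
  return a + b * c;
}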

OK?


This update fixes Janis' testsuite issue.

Andrew
2011-06-28  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument
	'type'.
	Use 'type' from caller, not inferred from 'rhs'.
	Don't reject non-conversion statements. Do return lhs in this case.
	(is_widening_mult_p): Add new argument 'type'.
	Use 'type' from caller, not inferred from 'stmt'.
	Pass type to is_widening_mult_rhs_p.
	(convert_mult_to_widen): Pass type to is_widening_mult_p.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-8.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, int *b, int *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler smlal } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1963,7 +1963,8 @@ struct gimple_opt_pass pass_optimize_bswap =
  }
 };
 
-/* Return true if RHS is a suitable operand for a widening multiplication.
+/* Return true if RHS is a suitable operand for a widening multiplication,
+   assuming a target type of TYPE.
There are two cases:
 
  - RHS makes some value at least twice as wide.  Store that value
@@ -1973,32 +1974,32 @@ struct gimple_opt_pass pass_optimize_bswap =
but leave *TYPE_OUT untouched.  */
 
 static bool
-is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
+is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
+			tree *new_rhs_out)
 {
   gimple stmt;
-  tree type, type1, rhs1;
+  tree type1, rhs1;
   enum tree_code rhs_code;
 
   if (TREE_CODE (rhs) == SSA_NAME)
 {
-  type = TREE_TYPE (rhs);
   stmt = SSA_NAME_DEF_STMT (rhs);
   if (!is_gimple_assign (stmt))
 	return false;
 
-  rhs_code = gimple_assign_rhs_code (stmt);
-  if (TREE_CODE (type) == INTEGER_TYPE
-	  ? !CONVERT_EXPR_CODE_P (rhs_code)
-	  : rhs_code != FIXED_CONVERT_EXPR)
-	return false;
-
   rhs1 = gimple_assign_rhs1 (stmt);
   type1 = TREE_TYPE (rhs1);
   if (TREE_CODE (type1) != TREE_CODE (type)
 	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
-  *new_rhs_out = rhs1;
+  rhs_code = gimple_assign_rhs_code (stmt);
+  if (TREE_CODE (type) == INTEGER_TYPE
+	  ? !CONVERT_EXPR_CODE_P (rhs_code)
+	  : rhs_code != FIXED_CONVERT_EXPR)
+	*new_rhs_out = gimple_assign_lhs (stmt);
+  else
+	*new_rhs_out = rhs1;
   *type_out = type1;
   return true;
 }
@@ -2013,28 +2014,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
   return false;
 }
 
-/* Return true if STMT performs a widening multiplication.  If so,
-   store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT
-   respectively.  Also fill *RHS1_OUT and *RHS2_OUT such that converting
-   those operands to types *TYPE1_OUT and *TYPE2_OUT would give the
-   operands of the multiplication.  */
+/* Return true if STMT performs a widening multiplication, assuming the
+   output type is TYPE.  If so, store the unwidened types of the operands
+   in *TYPE1_OUT and *TYPE2_OUT respectively.  Also fill *RHS1_OUT and
+   *RHS2_OUT such that converting those operands to types *TYPE1_OUT
+   and *TYPE2_OUT would give the operands of the multiplication.  */
 
 static bool
-is_widening_mult_p (gimple stmt,
+is_widening_mult_p (tree type, gimple stmt,
 		tree *type1_out, tree *rhs1_out,
 		tree *type2_out, tree *rhs2_out)
 {
-  tree type;
-
-  type = TREE_TYPE (gimple_assign_lhs (stmt));
   if (TREE_CODE (type) != INTEGER_TYPE
      && TREE_CODE (type) != FIXED_POINT_TYPE)
 return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out,
+			   rhs1_out))
 return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out,
+			   rhs2_out))
 return false;
 
   if (*type1_out == NULL)
@@ -2084,7 +2084,7 @@ convert_mult_to_widen (gimple stmt, 

[Patch, Fortran, F08] PR 49562: [4.6/4.7 Regression] [OOP] assigning value to type-bound function

2011-06-28 Thread Janus Weil
Hi all,

here is a patch for a problem which was originally reported as an
ICE-on-invalid regression (assigning to a type-bound function).

In the course of fixing it, I noticed that it becomes valid according
to F08 if the function is pointer-valued, and modified the patch such
that it will accept this variant. I also adapted the original test
case to be a run-time test of this F08 feature (in fact it is just a
very complicated way of performing an increment from 0 to 1, and would
still segfault without the patch).

The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.6.2?

Cheers,
Janus



2011-06-28  Janus Weil  ja...@gcc.gnu.org

PR fortran/49562
* expr.c (gfc_check_vardef_context): Handle type-bound procedures.


2011-06-28  Janus Weil  ja...@gcc.gnu.org

PR fortran/49562
* gfortran.dg/typebound_proc_23.f90: New.
Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c	(revision 175580)
+++ gcc/fortran/expr.c	(working copy)
@@ -4394,8 +4394,8 @@ gfc_check_vardef_context (gfc_expr* e, bool pointe
   sym = e->value.function.esym ? e->value.function.esym : e->symtree->n.sym;
 }
 
-  if (!pointer && e->expr_type == EXPR_FUNCTION
-      && sym->result->attr.pointer)
+  attr = gfc_expr_attr (e);
+  if (!pointer && e->expr_type == EXPR_FUNCTION && attr.pointer)
 {
   if (!(gfc_option.allow_std & GFC_STD_F2008))
 	{
@@ -4432,7 +4432,6 @@ gfc_check_vardef_context (gfc_expr* e, bool pointe
 
   /* Find out whether the expr is a pointer; this also means following
  component references to the last one.  */
-  attr = gfc_expr_attr (e);
   is_pointer = (attr.pointer || attr.proc_pointer);
   if (pointer  !is_pointer)
 {
! { dg-do compile }
!
! PR 49562: [4.6/4.7 Regression] [OOP] assigning value to type-bound function
!
! Contributed by Hans-Werner Boschmann boschm...@tp1.physik.uni-siegen.de

module ice
  type::ice_type
   contains
 procedure::ice_func
  end type
  integer, target :: it = 0
contains
  function ice_func(this)
integer, pointer :: ice_func
class(ice_type)::this
ice_func = it
  end function ice_func
  subroutine ice_sub(a)
class(ice_type)::a
a%ice_func() = 1
  end subroutine ice_sub
end module

use ice
type(ice_type) :: t
if (it/=0) call abort()
call ice_sub(t)
if (it/=1) call abort()
end

! { dg-final { cleanup-modules ice } }


Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const

2011-06-28 Thread H.J. Lu
On Tue, Jun 28, 2011 at 8:19 AM, Ulrich Weigand uweig...@de.ibm.com wrote:
 H.J. Lu wrote:
  it doesn't work;
 
  allocation.f: In function 'allocation':
  allocation.f:1048:0: internal compiler error: in subreg_get_info, at
  rtlanal.c:3235
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See http://gcc.gnu.org/bugs.html for instructions.

  since subreg_regno_offset only works on hard registers.

 Hmm, OK.  That look like another latent bug in the original code ...

 +         if (r->mode != VOIDmode && GET_MODE (reloadreg) != r->mode)
 +           reloadreg = gen_rtx_REG (r->mode, REGNO (reloadreg));

 (As an aside, this is wrong; it's already wrong in the place where you
 copied it from.  This should now use reload_adjust_reg_for_mode just
 like subst_reload does.)

 +
 +         if ((WORDS_BIG_ENDIAN || BYTES_BIG_ENDIAN)
 +              && GET_MODE_SIZE (Pmode) > GET_MODE_SIZE (ptr_mode))
 +           {
 +             offset = GET_MODE_SIZE (Pmode) - GET_MODE_SIZE (ptr_mode);
 +             if (! BYTES_BIG_ENDIAN)
 +               offset = (offset / UNITS_PER_WORD) * UNITS_PER_WORD;
 +             else if (! WORDS_BIG_ENDIAN)
 +               offset %= UNITS_PER_WORD;
 +           }
 +          else
 +            offset = 0;
 +
 +         return gen_rtx_SUBREG (ptr_mode, reloadreg, offset);

 works for me.

 This doesn't seem correct either, it completely ignores the SUBREG_BYTE
 of the original SUBREG ...   Also, I don't quite see why this should
 have anything special for Pmode / ptr_mode.

 It seems simplest to just use simplify_gen_subreg here.  Can you try
 the following version?

 Thanks,
 Ulrich


 ChangeLog:

        * reload.c (struct replacement): Remove SUBREG_LOC member.
        (push_reload): Do not set it.
        (push_replacement): Likewise.
        (subst_reload): Remove dead code.
        (copy_replacements): Remove assertion.
        (copy_replacements_1): Do not handle SUBREG_LOC.
        (move_replacements): Likewise.
        (find_replacement): Remove dead code.  Use reload_adjust_reg_for_mode.
        Detect subregs via recursive descent instead of via SUBREG_LOC.



It works much better.  I am testing it now.

Thanks.

-- 
H.J.


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-06-28 Thread Michael Matz
Hi,

On Tue, 28 Jun 2011, Richard Guenther wrote:

 I'd name the predicate value_preserving_conversion_p which I think is 
 what you mean.  "harmless" isn't really descriptive.
 
 Note that you include non-value-preserving conversions, namely int -> 
 unsigned int.

It seems that Andrew really does want to accept them.  If so 
value_preserving_conversion_p would be the wrong name.  It seems to me he 
wants to accept those conversions that make it possible to retrieve the 
old value, i.e. when T1 x; (T1)(T2)x == x, then T1->T2 has the 
to-be-named property.  bits_preserving?  Hmm.
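
Concretely (a standalone sketch of the property, not part of the patch,
assuming the usual two's-complement behaviour):

  #include <assert.h>

  int
  main (void)
  {
    int x = -5;

    /* int -> long is value preserving: the value itself survives.  */
    long l = (long) x;
    assert (l == -5L);

    /* int -> unsigned int is not value preserving (x becomes a large
       positive value), but it is recoverable in the sense above:
       converting back yields the original value on the two's-complement
       targets GCC supports.  */
    unsigned int u = (unsigned int) x;
    assert ((int) u == x);

    return 0;
  }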


Ciao,
Michael.


Re: [PATCH, SRA] Total scalarization and padding

2011-06-28 Thread Martin Jambor
Hi,

On Tue, Jun 28, 2011 at 03:01:17PM +0200, Richard Guenther wrote:
 On Tue, Jun 28, 2011 at 2:50 PM, Martin Jambor mjam...@suse.cz wrote:
  Hi,
 
  at the moment SRA can get confused by alignment padding and think that
  it actually contains some data for which there is no planned
  replacement and thus might leave some loads and stores in place
  instead of removing them.  This is perhaps the biggest problem when we
  attempt total scalarization of simple structures exactly in order to
  get rid of these and of the variables altogether.
 
  I've pondered for quite a while how to best deal with them.  One
  option was to make just the total scalarization stronger.  I have also
  contemplated creating phantom accesses for padding I could detect
  (i.e. in simple structures) which would be more general, but this
  would complicate the parts of SRA which are already quite convoluted
  and I was not really sure it was worth it.
 
  Eventually I decided for the total scalarization option.  This patch
  changes it such that the flag is propagated down the access tree but
  also, if it does not work out, is reset on the way up.  If the flag
  survives, the access tree is considered covered by scalar
  replacements and thus it is known not to contain unscalarized data.
 
  While changing function analyze_access_subtree I have simplified the
  way we compute the hole flag and also fixed one comparison which we
  currently have the wrong way round but it fortunately does not matter
  because if there is a hole, the covered_to will never add up to the
  total size.  I'll probably post a separate patch against 4.6 just in
  case someone attempts to read the source.
 
  Bootstrapped and tested on x86_64-linux, OK for trunk?
 
 So, what will it do for the testcase?
 
 The following is what I _think_ it should do:
 
 bb 2:
   l = *p_1(D);
    l$i_6 = p_1(D)->i;
    D.2700_2 = l$i_6;
    D.2701_3 = D.2700_2 + 1;
    l$i_12 = D.2701_3;
    *p_1(D) = l;
    p_1(D)->i = l$i_12;
 
 and let FRE/DSE do their job (which they don't do, unfortunately).
 So does your patch then remove the load/store from/to l but keep
 the elementwise loads/stores (which are probably cleaned up by FRE)?
 

Well, that is what would happen if no total scalarization was going
on.  Total scalarization is a poor-man's aggregate copy-propagation by
splitting up small structures to individual fields whenever we can get
rid of them this way (i.e. if they are never used in a non-assignment)
which I introduced to fix PR 42585 - but unfortunately the padding
problem did not occur to me until this winter.

Currently, SRA performs very badly on the testcase, creating:

bb 2:
  l = *p_1(D);
  l$i_6 = p_1(D)->i;
  l$f1_8 = p_1(D)->f1;
  l$f2_9 = p_1(D)->f2;
  l$f3_10 = p_1(D)->f3;
  l$f4_11 = p_1(D)->f4;
  D.1966_2 = l$i_6;
  D.1967_3 = D.1966_2 + 1;
  l$i_12 = D.1967_3;
  *p_1(D) = l;  <-- this should not be here
  p_1(D)->i = l$i_12;
  p_1(D)->f1 = l$f1_8;
  p_1(D)->f2 = l$f2_9;
  p_1(D)->f3 = l$f3_10;
  p_1(D)->f4 = l$f4_11;
  return;

Unfortunately, this basically survives all the way to the optimized
dump.  With the patch, the assignment *p_1(D) = l; is removed and
copyprop1 and cddce1 turn this into:

bb 2:
  l$i_6 = p_1(D)->i;
  D.1967_3 = l$i_6 + 1;
  p_1(D)->i = D.1967_3;
  return;

which is then the optimized gimple, already before IPA and at -O1.

For the record, without total scalarization, the optimized gimple
would be:

bb 2:
  l = *p_1(D);
  l$i_6 = p_1(D)->i;
  D.1967_3 = l$i_6 + 1;
  *p_1(D) = l;
  p_1(D)->i = D.1967_3;
  return;

So at the moment FRE/DSE certainly does not help.  Eventually we
should do something like that or a real aggregate copy propagation but
until then we probably need to live with the total scalarization
thingy - I have learned in the PR mentioned above and a few others,
there are people who really want at least this functionality now - and
it should not perform this badly on unaligned structures.
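
For reference, a hypothetical reconstruction of the kind of testcase being
discussed (the real sra-12.c is not quoted in this thread; the field names
follow the dumps above, the exact types and any padding are my assumption):

  struct S
  {
    int i;
    float f1, f2, f3, f4;
  };

  void
  foo (struct S *p)
  {
    struct S l = *p;   /* aggregate copy that total scalarization splits up */
    l.i++;
    *p = l;            /* with the patch this aggregate store becomes dead,
                          leaving only the element-wise stores for
                          copyprop/cddce to clean up */
  }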

Martin




 Richard.
 
 
  Thanks,
 
  Martin
 
 
  2011-06-24  Martin Jambor  mjam...@suse.cz
 
         * tree-sra.c (struct access): Rename total_scalarization to
         grp_total_scalarization
         (completely_scalarize_var): New function.
         (sort_and_splice_var_accesses): Set total_scalarization in the
         representative access.
         (analyze_access_subtree): Propagate total scalarization across the
         tree, no holes in totally scalarized trees, simplify coverage
         computation.
         (analyze_all_variable_accesses): Call completely_scalarize_var 
  instead
         of completely_scalarize_record.
 
         * testsuite/gcc.dg/tree-ssa/sra-12.c: New test.
 
  Index: src/gcc/tree-sra.c
  ===
  *** src.orig/gcc/tree-sra.c
  --- src/gcc/tree-sra.c
  *** struct access
  *** 170,179 
     /* Is this particular access write access? */
     unsigned write : 1;
 
  -   /* Is this access an artificial one created to 

[testsuite, objc] Don't XFAIL objc.dg/torture/forward-1.m

2011-06-28 Thread Rainer Orth
objc.dg/torture/forward-1.m now seems to XPASS everywhere, creating an
annoying amount of testsuite noise.  Dominique provided the following
patch in PR libobjc/Bug 36610.

Tested with the appropriate runtest invocations on i386-pc-solaris2.10
(both multilibs), sparc-sun-solaris2.10 (both multilibs),
alpha-dec-osf5.1b, mips-sgi-irix6.5 (both multilibs),
powerpc-apple-darwin9.8.0 (32-bit only).

Ok for mainline?

Thanks.
Rainer


2011-06-28  Dominique d'Humieres  domi...@lps.ens.fr

* objc.dg/torture/forward-1.m: Remove dg-xfail-run-if, dg-skip-if.

Index: gcc/testsuite/objc.dg/torture/forward-1.m
===
--- gcc/testsuite/objc.dg/torture/forward-1.m   (revision 175589)
+++ gcc/testsuite/objc.dg/torture/forward-1.m   (working copy)
@@ -1,7 +1,5 @@
 /* { dg-do run } */
 /* See if -forward:: is able to work. */
-/* { dg-xfail-run-if "PR36610" { ! { { i?86-*-* x86_64-*-* } && ilp32 } } { "-fgnu-runtime" } { "" } } */
-/* { dg-skip-if "Needs OBJC2 Implementation" { *-*-darwin* && { lp64 } } { "-fnext-runtime" } { "" } } */
 
 #include <stdio.h>
 #include <stdlib.h>

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [testsuite, objc] Don't XFAIL objc.dg/torture/forward-1.m

2011-06-28 Thread Iain Sandoe


On 28 Jun 2011, at 17:47, Rainer Orth wrote:


objc.dg/torture/forward-1.m now seems to XPASS everywhere, creating an
annoying amount of testsuite noise.  Dominique provided the following
patch in PR libobjc/Bug 36610.

Tested with the appropriate runtest invocations on i386-pc-solaris2.10
(both multilibs), sparc-sun-solaris2.10 (both multilibs),
alpha-dec-osf5.1b, mips-sgi-irix6.5 (both multilibs),
powerpc-apple-darwin9.8.0 (32-bit only).

Ok for mainline?

Thanks.
   Rainer


2011-06-28  Dominique d'Humieres  domi...@lps.ens.fr

* objc.dg/torture/forward-1.m: Remove dg-xfail-run-if, dg-skip-if.

Index: gcc/testsuite/objc.dg/torture/forward-1.m
===
--- gcc/testsuite/objc.dg/torture/forward-1.m   (revision 175589)
+++ gcc/testsuite/objc.dg/torture/forward-1.m   (working copy)
@@ -1,7 +1,5 @@
/* { dg-do run } */
/* See if -forward:: is able to work. */
-/* { dg-xfail-run-if "PR36610" { ! { { i?86-*-* x86_64-*-* } && ilp32 } } { "-fgnu-runtime" } { "" } } */
-/* { dg-skip-if "Needs OBJC2 Implementation" { *-*-darwin* && { lp64 } } { "-fnext-runtime" } { "" } } */


actually, looking at this,  it should likely read (untested):

/* { dg-skip-if "Needs OBJC2 Implementation" { *-*-darwin8* && { lp64 && { ! objc2 } } } { "-fnext-runtime" } { "" } } */


and should stay in place to protect the test-cases for m64 on *-*-darwin8*


(not that there's ever likely to be an m64 objc2 on darwin 8.. but)

Iain



Use common and target option handling hooks in driver

2011-06-28 Thread Joseph S. Myers
This patch makes the driver use the common and target option handling
hooks, so making the option state in the driver much closer to that in
the core compiler as needed for it to drive multilib selection.
opts.o is put in libcommon-target; a few cases of global state usage
in opts.c (either missed in my previous changes, or recently added)
are fixed.  In a few cases where the driver has its own handling of a
common option, or where the common handling may not work in the driver
at present, common_handle_option is made to return early in the
driver.  In particular, this applies to --help (right now the driver
has its own code reporting help information for driver options and
they generally don't have help text in the .opt files; it would be
good to integrate things better so that there is only one set of
--help machinery used) and to -Werror= (the diagnostic machinery is
initialized in the driver without the support for individual option
control, which doesn't seem particularly useful there).

Bootstrapped with no regressions on x86_64-unknown-linux-gnu.  Applied
to mainline.

2011-06-28  Joseph Myers  jos...@codesourcery.com

* common.opt (in_lto_p): New Variable entry.
* flags.h (in_lto_p): Move to common.opt.
* gcc.c: Include params.h.
(set_option_handlers): Also use common_handle_option and
target_handle_option.
(main): Call global_init_params, finish_params and
init_options_struct.
* opts.c (debug_type_names): Move from toplev.c.
(print_filtered_help): Access quiet_flag through opts pointer.
(common_handle_option): Return early in the driver for some
options.  Access in_lto_p, dwarf_version and
warn_maybe_uninitialized through opts pointer.
* toplev.c (in_lto_p): Move to common.opt.
(debug_type_names): Move to opts.c.
* Makefile.in (OBJS): Remove opts.o.
(OBJS-libcommon-target): Add opts.o.
(gcc.o): Update dependencies.

Index: gcc/flags.h
===
--- gcc/flags.h (revision 175330)
+++ gcc/flags.h (working copy)
@@ -1,6 +1,6 @@
 /* Compilation switch flag definitions for GCC.
Copyright (C) 1987, 1988, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2002,
-   2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -34,13 +34,6 @@ extern const char *const debug_type_name
 extern void strip_off_ending (char *, int);
 extern int base_of_path (const char *path, const char **base_out);
 
-/* True if this is the LTO front end (lto1).  This is used to disable
-   gimple generation and lowering passes that are normally run on the
-   output of a front end.  These passes must be bypassed for lto since
-   they have already been done before the gimple was written.  */
-
-extern bool in_lto_p;
-
 /* Return true iff flags are set as if -ffast-math.  */
 extern bool fast_math_flags_set_p (const struct gcc_options *);
 extern bool fast_math_flags_struct_set_p (struct cl_optimization *);
Index: gcc/gcc.c
===
--- gcc/gcc.c   (revision 175330)
+++ gcc/gcc.c   (working copy)
@@ -43,6 +43,7 @@ compilation is specified by a string cal
 #include "diagnostic.h"
 #include "flags.h"
 #include "opts.h"
+#include "params.h"
 #include "vec.h"
 #include "filenames.h"
 
@@ -3532,9 +3533,13 @@ set_option_handlers (struct cl_option_ha
   handlers->unknown_option_callback = driver_unknown_option_callback;
   handlers->wrong_lang_callback = driver_wrong_lang_callback;
   handlers->post_handling_callback = driver_post_handling_callback;
-  handlers->num_handlers = 1;
+  handlers->num_handlers = 3;
   handlers->handlers[0].handler = driver_handle_option;
   handlers->handlers[0].mask = CL_DRIVER;
+  handlers->handlers[1].handler = common_handle_option;
+  handlers->handlers[1].mask = CL_COMMON;
+  handlers->handlers[2].handler = target_handle_option;
+  handlers->handlers[2].mask = CL_TARGET;
 }
 
 /* Create the vector `switches' and its contents.
@@ -6156,7 +6161,11 @@ main (int argc, char **argv)
   if (argv != old_argv)
 at_file_supplied = true;
 
-  global_options = global_options_init;
+  /* Register the language-independent parameters.  */
+  global_init_params ();
+  finish_params ();
+
+  init_options_struct (&global_options, &global_options_set);
 
   decode_cmdline_options_to_array (argc, CONST_CAST2 (const char **, char **,
  argv),
Index: gcc/toplev.c
===
--- gcc/toplev.c(revision 175330)
+++ gcc/toplev.c(working copy)
@@ -125,13 +125,6 @@ unsigned int save_decoded_options_count;
 
 const struct gcc_debug_hooks *debug_hooks;
 
-/* True if this is the lto front end.  This is used to disable
-   gimple generation and lowering passes that are normally 

[testsuite] Remove dg-extra-errors in gcc.dg/inline_[12].c etc.

2011-06-28 Thread Rainer Orth
Three new testcases seem to XPASS everywhere, at least on all of my
targets:

XPASS: gcc.dg/inline_1.c (test for excess errors)
XPASS: gcc.dg/inline_2.c (test for excess errors)
XPASS: gcc.dg/unroll_1.c (test for excess errors)

The following patch fixes this to remove the noise.  Tested with the
appropriate runtest invocation on i386-pc-solaris2.10.

Ok for mainline?

Rainer


2011-06-28  Rainer Orth  r...@cebitec.uni-bielefeld.de

* gcc.dg/inline_1.c: Remove dg-excess-errors.
* gcc.dg/inline_2.c: Likewise.
* gcc.dg/unroll_1.c: Likewise.

Index: gcc/testsuite/gcc.dg/inline_2.c
===
--- gcc/testsuite/gcc.dg/inline_2.c (revision 175590)
+++ gcc/testsuite/gcc.dg/inline_2.c (working copy)
@@ -20,4 +20,3 @@
 
 /* { dg-final { scan-tree-dump-times bar 5 optimized } } */
 /* { dg-final { cleanup-tree-dump optimized } } */
-/* { dg-excess-errors extra notes } */
Index: gcc/testsuite/gcc.dg/inline_1.c
===
--- gcc/testsuite/gcc.dg/inline_1.c (revision 175590)
+++ gcc/testsuite/gcc.dg/inline_1.c (working copy)
@@ -20,4 +20,3 @@
 
 /* { dg-final { scan-tree-dump-times bar 5 optimized } } */
 /* { dg-final { cleanup-tree-dump optimized } } */
-/* { dg-excess-errors extra notes } */
Index: gcc/testsuite/gcc.dg/unroll_1.c
===
--- gcc/testsuite/gcc.dg/unroll_1.c (revision 175590)
+++ gcc/testsuite/gcc.dg/unroll_1.c (working copy)
@@ -30,4 +30,3 @@
 
 /* { dg-final { scan-rtl-dump-times Decided to peel loop completely 2 
loop2_unroll } } */
 /* { dg-final { cleanup-rtl-dump loop2_unroll } } */
-/* { dg-excess-errors extra notes } */

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [testsuite] Remove dg-extra-errors in gcc.dg/inline_[12].c etc.

2011-06-28 Thread Xinliang David Li
Your fix works ok for me (on x86-64/linux) too.

Thanks,

David

On Tue, Jun 28, 2011 at 10:09 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Three new testcases seem to XPASS everywhere, at least on all of my
 targets:

 XPASS: gcc.dg/inline_1.c (test for excess errors)
 XPASS: gcc.dg/inline_2.c (test for excess errors)
 XPASS: gcc.dg/unroll_1.c (test for excess errors)

 The following patch fixes this to remove the noise.  Tested with the
 appropriate runtest invocation on i386-pc-solaris2.10.

 Ok for mainline?

        Rainer


 2011-06-28  Rainer Orth  r...@cebitec.uni-bielefeld.de

        * gcc.dg/inline_1.c: Remove dg-excess-errors.
        * gcc.dg/inline_2.c: Likewise.
        * gcc.dg/unroll_1.c: Likewise.

 Index: gcc/testsuite/gcc.dg/inline_2.c
 ===
 --- gcc/testsuite/gcc.dg/inline_2.c     (revision 175590)
 +++ gcc/testsuite/gcc.dg/inline_2.c     (working copy)
 @@ -20,4 +20,3 @@

  /* { dg-final { scan-tree-dump-times bar 5 optimized } } */
  /* { dg-final { cleanup-tree-dump optimized } } */
 -/* { dg-excess-errors extra notes } */
 Index: gcc/testsuite/gcc.dg/inline_1.c
 ===
 --- gcc/testsuite/gcc.dg/inline_1.c     (revision 175590)
 +++ gcc/testsuite/gcc.dg/inline_1.c     (working copy)
 @@ -20,4 +20,3 @@

  /* { dg-final { scan-tree-dump-times bar 5 optimized } } */
  /* { dg-final { cleanup-tree-dump optimized } } */
 -/* { dg-excess-errors extra notes } */
 Index: gcc/testsuite/gcc.dg/unroll_1.c
 ===
 --- gcc/testsuite/gcc.dg/unroll_1.c     (revision 175590)
 +++ gcc/testsuite/gcc.dg/unroll_1.c     (working copy)
 @@ -30,4 +30,3 @@

  /* { dg-final { scan-rtl-dump-times Decided to peel loop completely 2 
 loop2_unroll } } */
  /* { dg-final { cleanup-rtl-dump loop2_unroll } } */
 -/* { dg-excess-errors extra notes } */

 --
 -
 Rainer Orth, Center for Biotechnology, Bielefeld University



Re: [pph] Fix var order when streaming in. (issue4635074)

2011-06-28 Thread gchare

On 2011/06/28 11:27:56, Diego Novillo wrote:

http://codereview.appspot.com/4635074/diff/1/gcc/cp/pph-streamer-in.c
File gcc/cp/pph-streamer-in.c (right):



http://codereview.appspot.com/4635074/diff/1/gcc/cp/pph-streamer-in.c#newcode1144

gcc/cp/pph-streamer-in.c:1144: /* The chains are built backwards (ref:
add_decl_to_level@name-lookup.c),
 1143
 1144   /* The chains are built backwards (ref:
add_decl_to_level@name-lookup.c),



s/add_decl_to_level@name-lookup.c/add_decl_to_level/


Done.

Commited as r175592.

Gab


http://codereview.appspot.com/4635074/


Re: Simplify Solaris configuration

2011-06-28 Thread Eric Botcazou
 The patch allowed a sparcv9-sun-solaris2.11 bootstrap to run well into
 building the target libraries (failed configuring libgfortran since I'd
 mis-merged the 32-bit and 64-bit gmp.h), a sparc-sun-solaris2.10
 bootstrap is still running.

 I'll probably fix the gmp.h issue, rebuild the sparcv9-sun-solaris2.11
 configuration and commit unless I find problems or you disapprove of the
 approach.

No, this is fine by me, thanks.

-- 
Eric Botcazou


Re: Simplify Solaris configuration

2011-06-28 Thread Rainer Orth
Eric Botcazou ebotca...@adacore.com writes:

 The patch allowed a sparcv9-sun-solaris2.11 bootstrap to run well into
 building the target libraries (failed configuring libgfortran since I'd
 mis-merged the 32-bit and 64-bit gmp.h), a sparc-sun-solaris2.10
 bootstrap is still running.

 I'll probably fix the gmp.h issue, rebuild the sparcv9-sun-solaris2.11
 configuration and commit unless I find problems or you disapprove of the
 approach.

 No, this is fine by me, thanks.

Both bootstraps have completed successfully, so I've checked in the
patch.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: PATCH [8/n]: Prepare x32: PR other/48007: Unwind library doesn't work with UNITS_PER_WORD > sizeof (void *)

2011-06-28 Thread H.J. Lu
On Mon, Jun 27, 2011 at 7:58 AM, Jason Merrill ja...@redhat.com wrote:
 On 06/26/2011 05:58 PM, H.J. Lu wrote:

 The current unwind library scheme provides only one unwind
 context and is backward compatible with multiple different unwind
 contexts from multiple unwind libraries:

 http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01769.html

 My patch fixes UNITS_PER_WORD > sizeof (void *) and
 enforces single unwind context when backward compatibility
 isn't needed.

 OK, there seem to be two things going on in this patch:

 1) Handle registers larger than pointers.
 2) Require that all code share a single copy of the unwinder.

 For #2, how are you avoiding the issues Jakub describes in that message?
  Isn't his scenario 2 still possible?  Are you deciding that it's better to
 abort at run-time in that case?

 It seems to me that for targets newer than Jakub's patch we can hard-wire
 _Unwind_IsExtendedContext to true, but making further assumptions would be a
 mistake.

 Then, if we're still trying to handle versioning, I think your earlier patch
 for #1 (r170716) that just changes the type of the reg array is a better way
 to go.  But that change should be dependent on a target macro to avoid ABI
 changes for existing targets.


This updated patch.  It allows multiple unwind contexts.  It replaces

char by_value[DWARF_FRAME_REGISTERS+1];

with

_Unwind_Word value[DWARF_FRAME_REGISTERS+1];

The code is cleaner than conditionally replacing

void *reg[DWARF_FRAME_REGISTERS+1];

with

_Unwind_Word reg[DWARF_FRAME_REGISTERS+1];

with a bigger unwind context.  But it is more flexible if we
want to extend unwind context later, like saving/restoring
128bit or vector registers which may be bigger than the current
_Unwind_Word.

Thanks.

-- 
H.J.

gcc/

2011-06-28  H.J. Lu  hongjiu...@intel.com

* config.gcc (libgcc_tm_file): Add i386/value-unwind.h for
Linux/x86.

* system.h (REG_VALUE_IN_UNWIND_CONTEXT): Poisoned.

* unwind-dw2.c (_Unwind_Context): If REG_VALUE_IN_UNWIND_CONTEXT
is defined, add value and remove by_value.
(SIGNAL_FRAME_BIT): Define if REG_VALUE_IN_UNWIND_CONTEXT is
defined.
(EXTENDED_CONTEXT_BIT): Don't define if REG_VALUE_IN_UNWIND_CONTEXT
is defined.
(_Unwind_IsExtendedContext): Likewise.
(_Unwind_GetGR): Support REG_VALUE_IN_UNWIND_CONTEXT.
(_Unwind_SetGR): Likewise.
(_Unwind_GetGRPtr): Likewise.
(_Unwind_SetGRPtr): Likewise.
(_Unwind_SetGRValue): Likewise.
(_Unwind_GRByValue): Likewise.
(__frame_state_for): Likewise.
(uw_install_context_1): Likewise.

* doc/tm.texi.in: Document REG_VALUE_IN_UNWIND_CONTEXT.
* doc/tm.texi: Regenerated.

libgcc/

2011-06-28  H.J. Lu  hongjiu...@intel.com

* config/i386/value-unwind.h: New.
gcc/

2011-06-28  H.J. Lu  hongjiu...@intel.com

* config.gcc (libgcc_tm_file): Add i386/value-unwind.h for
Linux/x86.

* system.h (REG_VALUE_IN_UNWIND_CONTEXT): Poisoned.

* unwind-dw2.c (_Unwind_Context): If REG_VALUE_IN_UNWIND_CONTEXT
is defined, add value and remove by_value.
(SIGNAL_FRAME_BIT): Define if REG_VALUE_IN_UNWIND_CONTEXT is
defined.
(EXTENDED_CONTEXT_BIT): Don't define if REG_VALUE_IN_UNWIND_CONTEXT
is defined.
(_Unwind_IsExtendedContext): Likewise.
(_Unwind_GetGR): Support REG_VALUE_IN_UNWIND_CONTEXT.
(_Unwind_SetGR): Likewise.
(_Unwind_GetGRPtr): Likewise.
(_Unwind_SetGRPtr): Likewise.
(_Unwind_SetGRValue): Likewise.
(_Unwind_GRByValue): Likewise.
(__frame_state_for): Likewise.
(uw_install_context_1): Likewise.

* doc/tm.texi.in: Document REG_VALUE_IN_UNWIND_CONTEXT.
* doc/tm.texi: Regenerated.

libgcc/

2011-06-28  H.J. Lu  hongjiu...@intel.com

* config/i386/value-unwind.h: New.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a1dbd1a..c9867a2 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2627,6 +2648,7 @@ esac
 case ${target} in
 i[34567]86-*-linux* | x86_64-*-linux*)
tmake_file="${tmake_file} i386/t-pmm_malloc i386/t-i386"
+   libgcc_tm_file="${libgcc_tm_file} i386/value-unwind.h"
;;
 i[34567]86-*-* | x86_64-*-*)
tmake_file=${tmake_file} i386/t-gmm_malloc i386/t-i386
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 341628b..2666716 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3701,6 +3701,14 @@ return @code{@var{regno}}.
 
 @end defmac
 
+@defmac REG_VALUE_IN_UNWIND_CONTEXT
+
+Define this macro if the target stores register values as
+@code{_Unwind_Word} type in unwind context.  The default is to
+store register values as @code{void *} type.
+
+@end defmac
+
 @node Elimination
 @subsection Eliminating Frame Pointer and Arg Pointer
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f7c16e9..690fa52 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3687,6 

Re: [patch, fortran] Fix PR 49479, reshape with optional arg

2011-06-28 Thread Thomas Koenig

Hi Jerry,

On 06/27/2011 03:18 PM, Thomas Koenig wrote:

Hello world,

the attached patch fixes PR 49479, a regression for 4.7 and 4.6. Test
case was supplied by Joost, the approach to the patch was suggested by
Tobias in comment#4 of the PR. The patch certainly looks safe enough.

Regression-tested. OK for trunk and, after a couple of days, for 4.6?

Thomas



OK,


After your approval, I realized that I had forgotten the generic
reshape.  I added that as obvious.  Here is what I committed,
revision 175594.

Regards

Thomas

2011-06-28  Thomas Koenig  tkoe...@gcc.gnu.org

PR fortran/49479
* m4/reshape.m4: If source allocation is smaller than one, set it
to one.
* intrinsics/reshape_generic.c:  Likewise.
* generated/reshape_r16.c: Regenerated.
* generated/reshape_c4.c: Regenerated.
* generated/reshape_c16.c: Regenerated.
* generated/reshape_c8.c: Regenerated.
* generated/reshape_r4.c: Regenerated.
* generated/reshape_i4.c: Regenerated.
* generated/reshape_r10.c: Regenerated.
* generated/reshape_r8.c: Regenerated.
* generated/reshape_c10.c: Regenerated.
* generated/reshape_i8.c: Regenerated.
* generated/reshape_i16.c: Regenerated.

2011-06-28  Thomas Koenig  tkoe...@gcc.gnu.org

PR fortran/49479
* gfortran.dg/reshape_zerosize_3.f90:  New test.
Index: m4/reshape.m4
===
--- m4/reshape.m4	(revision 175593)
+++ m4/reshape.m4	(working copy)
@@ -101,6 +101,8 @@
 
   if (ret->data == NULL)
 {
+  index_type alloc_size;
+
   rs = 1;
   for (n = 0; n < rdim; n++)
 	{
@@ -111,7 +113,13 @@
 	  rs *= rex;
 	}
   ret->offset = 0;
-  ret->data = internal_malloc_size ( rs * sizeof ('rtype_name`));
+
+  if (unlikely (rs < 1))
+	alloc_size = 1;
+  else
+	alloc_size = rs * sizeof ('rtype_name`);
+
+  ret->data = internal_malloc_size (alloc_size);
   ret->dtype = (source->dtype & ~GFC_DTYPE_RANK_MASK) | rdim;
 }
 
Index: intrinsics/reshape_generic.c
===
--- intrinsics/reshape_generic.c	(revision 175593)
+++ intrinsics/reshape_generic.c	(working copy)
@@ -85,6 +85,8 @@
 
   if (ret->data == NULL)
 {
+  index_type alloc_size;
+
   rs = 1;
   for (n = 0; n < rdim; n++)
 	{
@@ -95,7 +97,14 @@
 	  rs *= rex;
 	}
   ret->offset = 0;
-  ret->data = internal_malloc_size ( rs * size );
+
+  if (unlikely (rs < 1))
+	alloc_size = 1;
+  else
+	alloc_size = rs * size;
+
+  ret->data = internal_malloc_size (alloc_size);
+
   ret->dtype = (source->dtype & ~GFC_DTYPE_RANK_MASK) | rdim;
 }
 
! { dg-do run }
! PR 49479 - this used not to print anything.
! Test case by Joost VandeVondele.
MODULE M1
  IMPLICIT NONE
  type foo
 character(len=5) :: x
  end type foo
CONTAINS
  SUBROUTINE S1(data)
INTEGER, DIMENSION(:), INTENT(IN), &
 OPTIONAL   :: DATA
character(20) :: line
IF (.not. PRESENT(data)) call abort
write (unit=line,fmt='(I5)') size(data)
if (line /= '0   ') call abort
  END SUBROUTINE S1

  subroutine s_type(data)
type(foo), dimension(:), intent(in), optional :: data
character(20) :: line
IF (.not. PRESENT(data)) call abort
write (unit=line,fmt='(I5)') size(data)
if (line /= '0   ') call abort
  end subroutine s_type

  SUBROUTINE S2(N)
INTEGER :: N
INTEGER, ALLOCATABLE, DIMENSION(:, :):: blki
type(foo), allocatable, dimension(:, :)  :: bar
ALLOCATE(blki(3,N))
allocate (bar(3,n))
blki=0
CALL S1(RESHAPE(blki,(/3*N/)))
call s_type(reshape(bar, (/3*N/)))
  END SUBROUTINE S2

END MODULE M1

USE M1
CALL S2(0)
END
! { dg-final { cleanup-modules m1 } }


Re: Updated: RFA: partially hookize POINTER_SIZE

2011-06-28 Thread Tom Tromey
 Joern == Joern Rennecke amyl...@spamcop.net writes:

Joern This is basically the same patch as posted before in
Joern http://gcc.gnu.org/ml/gcc-patches/2010-11/msg02772.html and updated in
Joern http://gcc.gnu.org/viewcvs?view=revisionrevision=168273, but with a
Joern few merge conflicts in current mainline resolved.

Joern  * java-tree.h (JAVA_POINTER_SIZE): Define.
Joern  * class.c (make_class_data): Use JAVA_POINTER_SIZE.
Joern  (emit_register_classes): Likewise.
Joern  * jcf-parse.c (handle_long_constant): Likewise.
Joern  * constants.c (build_constants_constructor): Likewise.
Joern  * builtins.c (UNMARSHAL3, UNMARSHAL4, UNMARSHAL5): Likewise.
Joern  (compareAndSwapObject_builtin): Likewise.
Joern  * boehm.c (get_boehm_type_descriptor): Likewise.
Joern  (mark_reference_fields): Add log2_size parameter.  Changed all callers.
Joern gcc/cp:

One question about the Java parts...

Joern -  if (offset % (HOST_WIDE_INT) (POINTER_SIZE / BITS_PER_UNIT))
Joern +  if (offset & ((1 << log2_size) - 1))

I think this has to be '(((HOST_WIDE_INT) 1) << log2_size) - 1'.
Otherwise it seems like this could overflow.

The rest of the java parts are ok.

Tom


[pph] Add cp_global_trees to cache in preload (issue4635077)

2011-06-28 Thread Gabriel Charette
Add the cp_global_trees to the cache during the preload.

Those are preconstructed trees which we only need the pointers to (i.e. they 
should be identical in both the .cc and .h)

One exception to this is the keyed_classes tree which is generated during 
parsing.

We will need to merge the keyed_classes tree eventually when working with 
multiple pph's.

2011-06-28  Gabriel Charette  gch...@google.com

* pph-streamer.c (pph_preload_common_nodes):
Add cp_global_trees[] to cache.

* g++.dg/pph/x1typerefs.cc: Remove xfail.

diff --git a/gcc/cp/pph-streamer.c b/gcc/cp/pph-streamer.c
index e919baf..c62864a 100644
--- a/gcc/cp/pph-streamer.c
+++ b/gcc/cp/pph-streamer.c
@@ -79,6 +79,17 @@ pph_preload_common_nodes (struct lto_streamer_cache_d *cache)
 if (c_global_trees[i])
   lto_streamer_cache_append (cache, c_global_trees[i]);
 
+  /* cp_global_trees[] can have NULL entries in it.  Skip them.  */
+  for (i = 0; i < CPTI_MAX; i++)
+{
+  /* Also skip trees which are generated while parsing.  */
+  if (i == CPTI_KEYED_CLASSES)
+   continue;
+
+  if (cp_global_trees[i])
+   lto_streamer_cache_append (cache, cp_global_trees[i]);
+}
+
   lto_streamer_cache_append (cache, global_namespace);
 }
 
diff --git a/gcc/testsuite/g++.dg/pph/x1typerefs.cc 
b/gcc/testsuite/g++.dg/pph/x1typerefs.cc
index ba7580f..6aa0e96 100644
--- a/gcc/testsuite/g++.dg/pph/x1typerefs.cc
+++ b/gcc/testsuite/g++.dg/pph/x1typerefs.cc
@@ -1,6 +1,3 @@
-// { dg-xfail-if BOGUS { *-*-* } { -fpph-map=pph.map } }
-// { dg-bogus c1typerefs.h:11:18: error: cannot convert 'const 
std::type_info.' to 'const std::type_info.' in initialization  { xfail *-*-* 
} 0 }
-
 #include x1typerefs.h
 
 int derived::method() {

--
This patch is available for review at http://codereview.appspot.com/4635077


Re: [pph] Add cp_global_trees to cache in preload (issue4635077)

2011-06-28 Thread Diego Novillo
On Tue, Jun 28, 2011 at 15:23, Gabriel Charette gch...@google.com wrote:

 2011-06-28  Gabriel Charette  gch...@google.com

        * pph-streamer.c (pph_preload_common_nodes):
        Add cp_global_trees[] to cache.

        * g++.dg/pph/x1typerefs.cc: Remove xfail.

OK.


Diego.


Re: [pph] Add cp_global_trees to cache in preload (issue4635077)

2011-06-28 Thread gchare

Commited as r175595.

http://codereview.appspot.com/4635077/


[patch] libiberty/cp-demangle.c: Fix CP_DEMANGLE_DEBUG SIGSEGV

2011-06-28 Thread Jan Kratochvil
Hi,

a mechanical patch which fixes the following crash during

#define CP_DEMANGLE_DEBUG
make check
-
/bin/sh: line 1:  9179 Segmentation fault  ./test-demangle < ./demangle-expected

which also fixes confusing output for _Z1hI1AIiEdEDTcldtfp_1gIT0_EEET_S2_
binary operator arguments
  binary operator
operator .
binary operator arguments
???--- template
name 'g'
template argument list
  template parameter 1
  argument list


Thanks,
Jan


libiberty/
2011-06-28  Jan Kratochvil  jan.kratoch...@redhat.com

* cp-demangle.c (d_dump): Add " (zero-based)" to
DEMANGLE_COMPONENT_TEMPLATE_PARAM.  Implement
DEMANGLE_COMPONENT_FUNCTION_PARAM, DEMANGLE_COMPONENT_VECTOR_TYPE,
DEMANGLE_COMPONENT_NUMBER, DEMANGLE_COMPONENT_GLOBAL_CONSTRUCTORS,
DEMANGLE_COMPONENT_GLOBAL_DESTRUCTORS, DEMANGLE_COMPONENT_LAMBDA,
DEMANGLE_COMPONENT_DEFAULT_ARG and DEMANGLE_COMPONENT_UNNAMED_TYPE.
Print "??? %d" on unknown dc->type.

--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -506,7 +507,10 @@ d_dump (struct demangle_component *dc, int indent)
   printf ("name '%.*s'\n", dc->u.s_name.len, dc->u.s_name.s);
   return;
 case DEMANGLE_COMPONENT_TEMPLATE_PARAM:
-  printf ("template parameter %ld\n", dc->u.s_number.number);
+  printf ("template parameter %ld (zero-based)\n", dc->u.s_number.number);
+  return;
+case DEMANGLE_COMPONENT_FUNCTION_PARAM:
+  printf ("function parameter %ld (zero-based)\n", dc->u.s_number.number);
   return;
 case DEMANGLE_COMPONENT_CTOR:
   printf ("constructor %d\n", (int) dc->u.s_ctor.kind);
@@ -633,6 +637,9 @@ d_dump (struct demangle_component *dc, int indent)
 case DEMANGLE_COMPONENT_FIXED_TYPE:
   printf ("fixed-point type\n");
   break;
+case DEMANGLE_COMPONENT_VECTOR_TYPE:
+  printf ("vector type\n");
+  break;
 case DEMANGLE_COMPONENT_ARGLIST:
   printf ("argument list\n");
   break;
@@ -675,12 +682,35 @@ d_dump (struct demangle_component *dc, int indent)
 case DEMANGLE_COMPONENT_CHARACTER:
   printf ("character '%c'\n",  dc->u.s_character.character);
   return;
+case DEMANGLE_COMPONENT_NUMBER:
+  printf ("number %ld\n", dc->u.s_number.number);
+  return;
 case DEMANGLE_COMPONENT_DECLTYPE:
   printf ("decltype\n");
   break;
+case DEMANGLE_COMPONENT_GLOBAL_CONSTRUCTORS:
+  printf ("global constructors keyed to name\n");
+  break;
+case DEMANGLE_COMPONENT_GLOBAL_DESTRUCTORS:
+  printf ("global destructors keyed to name\n");
+  break;
+case DEMANGLE_COMPONENT_LAMBDA:
+  printf ("lambda %d (zero-based)\n", dc->u.s_unary_num.num);
+  d_dump (dc->u.s_unary_num.sub, indent + 2);
+  return;
+case DEMANGLE_COMPONENT_DEFAULT_ARG:
+  printf ("default argument %d (zero-based)\n", dc->u.s_unary_num.num);
+  d_dump (dc->u.s_unary_num.sub, indent + 2);
+  return;
+case DEMANGLE_COMPONENT_UNNAMED_TYPE:
+  printf ("unnamed type %ld\n", dc->u.s_number.number);
+  return;
 case DEMANGLE_COMPONENT_PACK_EXPANSION:
   printf ("pack expansion\n");
   break;
+default:
+  printf ("??? %d\n", dc->type);
+  break;
 }
 
   d_dump (d_left (dc), indent + 2);


Re: [patch] Fix oversight in tuplification of DOM

2011-06-28 Thread Jeff Law
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 06/28/11 14:36, Eric Botcazou wrote:
 Hi,
 
 the attached testcase triggers an ICE when compiled at -O or above, on all 
 the 
 open branches.  This is a regression introduced with the tuplification.  The 
 problem is that 2 ARRAY_RANGE_REFs are recognized as equivalent, although 
 they 
 don't have the same number of elements.  This is so because their type isn't 
 taken into account by the hash equality function as it simply isn't recorded 
 in initialize_hash_element (GIMPLE_SINGLE_RHS case).  Now in all the other 
 cases it is recorded so this very likely is an oversight.
 
 Tested on x86_64-suse-linux, OK for all branches?
 
 
 2011-06-28  Eric Botcazou  ebotca...@adacore.com
 
   * tree-ssa-dom.c (initialize_hash_element): Fix oversight.
 
 
 2011-06-28  Eric Botcazou  ebotca...@adacore.com
 
   * gnat.dg/opt17.ad[sb]: New test.
OK.
Jeff
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOCkOJAAoJEBRtltQi2kC7d74H/1UxVoJCRtAJyLBzwPvVCKni
7uowRbHYTWVpB5y+LrrrIh8vkcuM/SZ6LAB6SuowK00G+4zQJtmvnA9DBLq65WSZ
/vOiond3LljmH8E5m7lg9umx5VO7jdErScB7xORfEezNy4857Y0p78UOkZxKiDpI
RqKThfRYK/0mjizTlDaPaBQH/LIRJU8MgxWA8SDxLKZ1FmmqhOqcyH7Z+wbGdNPf
QoHAd5xrQsA7Ga3kmwI/eBjNqlKkWS92L0ggQnn6aKsJJNeDuLdfolFKw4Fi4waN
X8BV4vYDlDVywRoFRzo1lvBIjeJ9hpJsT3cLuW6Kp3BUvEzQjyv7d0o/BRxWYfw=
=lvG1
-END PGP SIGNATURE-


Re: [RFC] Fix full memory barrier on SPARC-V8

2011-06-28 Thread Eric Botcazou
 Fair enough, you can add this code if you want.

Thanks.  Note that this is marginal for Solaris as GCC defaults to -mcpu=v9 on 
Solaris but, in all other cases, it defaults to -mcpu=v8.  I can reproduce the 
problem on the SPARC/Linux machine 'grobluk' of the CompileFarm:

cpu : TI UltraSparc II  (BlackBird)
fpu : UltraSparc II integrated FPU
prom: OBP 3.2.30 2002/10/25 14:03
type: sun4u
ncpus probed: 4
ncpus active: 4

Linux grobluk 2.6.26-2-sparc64-smp #1 SMP Thu Nov 5 03:34:29 UTC 2009 sparc64 
GNU/Linux

With the pristine compiler, the test passes with -mcpu=v9 but fails otherwise.
It passes with the patched compiler.  However, I suspect that we would still 
have problems with newer UltraSparc CPUs supporting full RMO, because the new 
insn membar_v8 is only half a memory barrier for V9.

-- 
Eric Botcazou


[patch, fortran] Always return malloc(1) for empty arrays in the library

2011-06-28 Thread Thomas Koenig

Hello world,

looking at PR 49479 and other functions in the library made me realize
there are lots of places where we don't malloc one byte for empty
arrays.
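
The idiom being spread through the generated files is roughly this (a
simplified, self-contained sketch; the real code uses libgfortran's
index_type and internal_malloc_size rather than the names below):

  #include <stdlib.h>

  /* An empty array has zero elements, but we still request at least one
     byte so the data pointer of the descriptor is usable.  */
  static void *
  alloc_array_data (size_t elements, size_t element_size)
  {
    size_t alloc_size = elements * element_size;

    if (alloc_size < 1)
      alloc_size = 1;

    return malloc (alloc_size);
  }

  int
  main (void)
  {
    void *p = alloc_array_data (0, sizeof (double));  /* empty array */
    /* p is non-NULL (barring allocation failure) even with zero elements.  */
    free (p);
    return 0;
  }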

This patch is an attempt at fixing the ton of regressions likely
caused by this (like in the PR) which haven't been found yet.
No test cases, as they haven't been found yet :-)

I also noticed two places where we had a memory leak (in eoshift1 and
eoshift3), which I also fixed.

Regression-tested.  OK for trunk and, after a few days, for 4.6?

Thomas


2011-06-28  Thomas Koenig  tkoe...@gcc.gnu.org

* m4/in_pack.m4 (internal_pack_'rtype_ccode`):  If size is
less than one, allocate a single byte.
* m4/transpose.m4 (transpose_'rtype_code`):  Likewise.
* m4/cshift1.m4 (cshift1):  Likewise.
* m4/matmull.m4 (matmul_'rtype_code`):  Likewise.
* m4/unpack.m4 (unpack0_'rtype_code`):  Likewise.
* m4/ifunction_logical.m4 (name`'rtype_qual`_'atype_code):  Likewise.
* m4/matmul.m4 (name`'rtype_qual`_'atype_code):  Likewise.
* intrinics/transpose_generic.c (transpose_internal):  Likewise.
* intrinsics/unpack_generic.c (unpack_internal):  Likewise.
* m4/eoshift1.m4 (eoshift1):  Remove double allocation.
* m4/eoshift3.m4 (eoshift3):  Likewise.
* generated/all_l16.c: Regenerated.
* generated/all_l1.c: Regenerated.
* generated/all_l2.c: Regenerated.
* generated/all_l4.c: Regenerated.
* generated/all_l8.c: Regenerated.
* generated/any_l16.c: Regenerated.
* generated/any_l1.c: Regenerated.
* generated/any_l2.c: Regenerated.
* generated/any_l4.c: Regenerated.
* generated/any_l8.c: Regenerated.
* generated/count_16_l.c: Regenerated.
* generated/count_1_l.c: Regenerated.
* generated/count_2_l.c: Regenerated.
* generated/count_4_l.c: Regenerated.
* generated/count_8_l.c: Regenerated.
* generated/cshift1_16.c: Regenerated.
* generated/cshift1_4.c: Regenerated.
* generated/cshift1_8.c: Regenerated.
* generated/eoshift1_16.c: Regenerated.
* generated/eoshift1_4.c: Regenerated.
* generated/eoshift1_8.c: Regenerated.
* generated/eoshift3_16.c: Regenerated.
* generated/eoshift3_4.c: Regenerated.
* generated/eoshift3_8.c: Regenerated.
* generated/in_pack_c10.c: Regenerated.
* generated/in_pack_c16.c: Regenerated.
* generated/in_pack_c4.c: Regenerated.
* generated/in_pack_c8.c: Regenerated.
* generated/in_pack_i16.c: Regenerated.
* generated/in_pack_i1.c: Regenerated.
* generated/in_pack_i2.c: Regenerated.
* generated/in_pack_i4.c: Regenerated.
* generated/in_pack_i8.c: Regenerated.
* generated/in_pack_r10.c: Regenerated.
* generated/in_pack_r16.c: Regenerated.
* generated/in_pack_r4.c: Regenerated.
* generated/in_pack_r8.c: Regenerated.
* generated/matmul_c10.c: Regenerated.
* generated/matmul_c16.c: Regenerated.
* generated/matmul_c4.c: Regenerated.
* generated/matmul_c8.c: Regenerated.
* generated/matmul_i16.c: Regenerated.
* generated/matmul_i1.c: Regenerated.
* generated/matmul_i2.c: Regenerated.
* generated/matmul_i4.c: Regenerated.
* generated/matmul_i8.c: Regenerated.
* generated/matmul_l16.c: Regenerated.
* generated/matmul_l4.c: Regenerated.
* generated/matmul_l8.c: Regenerated.
* generated/matmul_r10.c: Regenerated.
* generated/matmul_r16.c: Regenerated.
* generated/matmul_r4.c: Regenerated.
* generated/matmul_r8.c: Regenerated.
* generated/maxloc1_16_i16.c: Regenerated.
* generated/maxloc1_16_i1.c: Regenerated.
* generated/maxloc1_16_i2.c: Regenerated.
* generated/maxloc1_16_i4.c: Regenerated.
* generated/maxloc1_16_i8.c: Regenerated.
* generated/maxloc1_16_r10.c: Regenerated.
* generated/maxloc1_16_r16.c: Regenerated.
* generated/maxloc1_16_r4.c: Regenerated.
* generated/maxloc1_16_r8.c: Regenerated.
* generated/maxloc1_4_i16.c: Regenerated.
* generated/maxloc1_4_i1.c: Regenerated.
* generated/maxloc1_4_i2.c: Regenerated.
* generated/maxloc1_4_i4.c: Regenerated.
* generated/maxloc1_4_i8.c: Regenerated.
* generated/maxloc1_4_r10.c: Regenerated.
* generated/maxloc1_4_r16.c: Regenerated.
* generated/maxloc1_4_r4.c: Regenerated.
* generated/maxloc1_4_r8.c: Regenerated.
* generated/maxloc1_8_i16.c: Regenerated.
* generated/maxloc1_8_i1.c: Regenerated.
* generated/maxloc1_8_i2.c: Regenerated.
* generated/maxloc1_8_i4.c: Regenerated.
* generated/maxloc1_8_i8.c: Regenerated.
* generated/maxloc1_8_r10.c: Regenerated.
* generated/maxloc1_8_r16.c: Regenerated.
* 

RE: [PATCH, i386] Enable -mprefer-avx128 by default for Bulldozer

2011-06-28 Thread Fang, Changpeng
Hi, 

 I re-attached the patch here. Can someone review it?

We would like to commit to trunk as well as 4.6 branch.

Thanks,

Changpeng




From: Fang, Changpeng
Sent: Monday, June 27, 2011 5:42 PM
To: Fang, Changpeng; Jan Hubicka
Cc: Uros Bizjak; gcc-patches@gcc.gnu.org; rguent...@suse.de
Subject: RE: [PATCH, i386] Enable -mprefer-avx128 by default for Bulldozer

Is this patch OK to commit to trunk?

Also I would like to backport this patch to gcc 4.6 branch. Do I have to send a 
separate
request or use this one?

Thanks,

Changpeng





From: Fang, Changpeng
Sent: Friday, June 24, 2011 7:12 PM
To: Jan Hubicka
Cc: Uros Bizjak; gcc-patches@gcc.gnu.org; rguent...@suse.de
Subject: RE: [PATCH, i386] Enable -mprefer-avx128 by default for Bulldozer

Hi,

 I have no preference in tune feature coding. But I agree with you it's better 
to
put similar things together. I modified the code following your suggestion.

Is it OK to commit this modified patch?

Thanks,

Changpeng




From: Jan Hubicka [hubi...@ucw.cz]
Sent: Thursday, June 23, 2011 6:20 PM
To: Fang, Changpeng
Cc: Uros Bizjak; gcc-patches@gcc.gnu.org; hubi...@ucw.cz; rguent...@suse.de
Subject: Re: [PATCH, i386] Enable -mprefer-avx128 by default for Bulldozer

Hi,
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -2128,6 +2128,9 @@ static const unsigned int 
 x86_avx256_split_unaligned_load
  static const unsigned int x86_avx256_split_unaligned_store
= m_COREI7 | m_BDVER1 | m_GENERIC;

 +static const unsigned int x86_prefer_avx128
 +  = m_BDVER1;

What is the reason for stuff like this not to go into initial_ix86_tune_features?
I sort of liked them better when they were individual flags, but having the
target tuning flags spread across multiple places seems unnecessary.

Honza

From a325395439a314f87b3c79a5b9ce79a6a976a710 Mon Sep 17 00:00:00 2001
From: Changpeng Fang chfang@huainan.(none)
Date: Wed, 22 Jun 2011 15:03:05 -0700
Subject: [PATCH] Auto-vectorizer generates 128-bit AVX insns by default for bdver1

	* config/i386/i386.opt (mprefer-avx128): Redefine the flag as a Mask option.

	* config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_AVX128_OPTIMAL entry.
	(TARGET_AVX128_OPTIMAL): New definition.

	* config/i386/i386.c (initial_ix86_tune_features): Initialize
	X86_TUNE_AVX128_OPTIMAL entry.
	(ix86_option_override_internal): Enable the generation
	of the 128-bit instructions when TARGET_AVX128_OPTIMAL is set.
	(ix86_preferred_simd_mode): Use TARGET_PREFER_AVX128.
	(ix86_autovectorize_vector_sizes): Use TARGET_PREFER_AVX128.
---
 gcc/config/i386/i386.c   |   16 
 gcc/config/i386/i386.h   |4 +++-
 gcc/config/i386/i386.opt |2 +-
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 014401b..b3434dd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2089,7 +2089,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = {
   /* X86_SOFTARE_PREFETCHING_BENEFICIAL: Enable software prefetching
  at -O3.  For the moment, the prefetching seems badly tuned for Intel
  chips.  */
-  m_K6_GEODE | m_AMD_MULTIPLE
+  m_K6_GEODE | m_AMD_MULTIPLE,
+
+  /* X86_TUNE_AVX128_OPTIMAL: Enable 128-bit AVX instruction generation for
+ the auto-vectorizer.  */
+  m_BDVER1
 };
 
 /* Feature tests against the various architecture variations.  */
@@ -2623,6 +2627,7 @@ ix86_target_string (int isa, int flags, const char *arch, const char *tune,
 { "-mvzeroupper",			MASK_VZEROUPPER },
 { "-mavx256-split-unaligned-load",	MASK_AVX256_SPLIT_UNALIGNED_LOAD},
 { "-mavx256-split-unaligned-store",	MASK_AVX256_SPLIT_UNALIGNED_STORE},
+{ "-mprefer-avx128",		MASK_PREFER_AVX128},
   };
 
   const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (flag_opts) + 6][2];
@@ -3672,6 +3677,9 @@ ix86_option_override_internal (bool main_args_p)
 	  if ((x86_avx256_split_unaligned_store & ix86_tune_mask)
 	      && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
 	target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
+	  /* Enable 128-bit AVX instruction generation for the auto-vectorizer.  */
+	  if (TARGET_AVX128_OPTIMAL && !(target_flags_explicit & MASK_PREFER_AVX128))
+	target_flags |= MASK_PREFER_AVX128;
 	}
 }
   else 
@@ -34614,7 +34622,7 @@ ix86_preferred_simd_mode (enum machine_mode mode)
   return V2DImode;
 
 case SFmode:
-  if (TARGET_AVX && !flag_prefer_avx128)
+  if (TARGET_AVX && !TARGET_PREFER_AVX128)
 	return V8SFmode;
   else
 	return V4SFmode;
@@ -34622,7 +34630,7 @@ ix86_preferred_simd_mode (enum machine_mode mode)
 case DFmode:
   if (!TARGET_VECTORIZE_DOUBLE)
 	return word_mode;
-  else if (TARGET_AVX && !flag_prefer_avx128)
+  else if (TARGET_AVX && !TARGET_PREFER_AVX128)
 	return V4DFmode;
   else if (TARGET_SSE2)
 	return V2DFmode;
@@ -34639,7 +34647,7 @@ 

[patch] Fix PR tree-optimization/49539

2011-06-28 Thread Eric Botcazou
Hi,

this is an ICE building the gnattools on ARM, a regression present on the
mainline (and reproducible on x86/Linux by switching to SJLJ exceptions).

For the reduced testcase compiled at -O:

Unable to coalesce ssa_names 2 and 174 which are marked as MUST COALESCE.
comp_last_2(ab) and  comp_last_174(ab)
+===GNAT BUG DETECTED==+
| 4.7.0 20110626 (experimental) [trunk revision 175408] (i586-suse-linux-gnu) 
GCC error:|
| SSA corruption   |
| Error detected around p.adb:3:4

The SSA names (or rather 2 related ones) have overlapping lifetimes.  The
problem is created by forwprop1.  Before:

bb 23:
  # comp_last_1(ab) = PHI comp_last_159(ab)(20), comp_last_2(ab)(22)

[...]

  comp_last_174(ab) = comp_last_1(ab) + 1;
  D.2425_175 = args.P_BOUNDS;
  D.2426_176 = D.2425_175-LB0;
  if (D.2426_176  comp_last_174(ab))
goto bb 39;
  else
goto bb 38;

bb 38:
  D.2425_177 = args.P_BOUNDS;
  D.2427_178 = D.2425_177-UB0;
  if (D.2427_178  comp_last_174(ab))
goto bb 39;
  else
goto bb 40;

[...]

  comp_last_185(ab) = comp_last_174(ab) + 1;
  D.2425_186 = args.P_BOUNDS;
  D.2426_187 = D.2425_186-LB0;
  if (D.2426_187  comp_last_185(ab))
goto bb 43;
  else
goto bb 42;


After:

  comp_last_185(ab) = comp_last_1(ab) + 2;
  D.2425_186 = args.P_BOUNDS;
  D.2426_187 = D.2425_186-LB0;
  if (D.2426_187  comp_last_185(ab))
goto bb 43;
  else
goto bb 42;


The pass already contains a check for this situation in can_propagate_from but 
it isn't applied in this case.

Tested on x86_64-suse-linux, OK for the mainline?


2011-06-28  Eric Botcazou  ebotca...@adacore.com

PR tree-optimization/49539
* tree-ssa-forwprop.c (can_propagate_from): Check for abnormal SSA
by means of stmt_references_abnormal_ssa_name.
(associate_plusminus): Call can_propagate_from before propagating
from definition statements.
(ssa_forward_propagate_and_combine): Remove superfluous newline.


-- 
Eric Botcazou
Index: tree-ssa-forwprop.c
===
--- tree-ssa-forwprop.c	(revision 175408)
+++ tree-ssa-forwprop.c	(working copy)
@@ -260,9 +260,6 @@ get_prop_source_stmt (tree name, bool si
 static bool
 can_propagate_from (gimple def_stmt)
 {
-  use_operand_p use_p;
-  ssa_op_iter iter;
-
   gcc_assert (is_gimple_assign (def_stmt));
 
   /* If the rhs has side-effects we cannot propagate from it.  */
@@ -280,9 +277,8 @@ can_propagate_from (gimple def_stmt)
 return true;
 
   /* We cannot propagate ssa names that occur in abnormal phi nodes.  */
-  FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE)
-if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (USE_FROM_PTR (use_p)))
-  return false;
+  if (stmt_references_abnormal_ssa_name (def_stmt))
+return false;
 
   /* If the definition is a conversion of a pointer to a function type,
  then we can not apply optimizations as some targets require
@@ -1780,7 +1776,8 @@ associate_plusminus (gimple stmt)
 	{
 	  gimple def_stmt = SSA_NAME_DEF_STMT (rhs2);
 	  if (is_gimple_assign (def_stmt)
-	      && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR)
+	      && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR
+	      && can_propagate_from (def_stmt))
 	{
 	  code = (code == MINUS_EXPR) ? PLUS_EXPR : MINUS_EXPR;
 	  gimple_assign_set_rhs_code (stmt, code);
@@ -1797,7 +1794,8 @@ associate_plusminus (gimple stmt)
 	{
 	  gimple def_stmt = SSA_NAME_DEF_STMT (rhs1);
 	  if (is_gimple_assign (def_stmt)
-	      && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR)
+	      && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR
+	      && can_propagate_from (def_stmt))
 	{
 	  code = MINUS_EXPR;
 	  gimple_assign_set_rhs_code (stmt, code);
@@ -1840,7 +1838,7 @@ associate_plusminus (gimple stmt)
   if (TREE_CODE (rhs1) == SSA_NAME)
 {
   gimple def_stmt = SSA_NAME_DEF_STMT (rhs1);
-  if (is_gimple_assign (def_stmt))
+  if (is_gimple_assign (def_stmt) && can_propagate_from (def_stmt))
 	{
 	  enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
 	  if (def_code == PLUS_EXPR
@@ -1940,7 +1938,7 @@ associate_plusminus (gimple stmt)
   if (rhs2  TREE_CODE (rhs2) == SSA_NAME)
 {
   gimple def_stmt = SSA_NAME_DEF_STMT (rhs2);
-  if (is_gimple_assign (def_stmt))
+  if (is_gimple_assign (def_stmt) && can_propagate_from (def_stmt))
 	{
 	  enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
 	  if (def_code == PLUS_EXPR
@@ -2262,8 +2260,7 @@ ssa_forward_propagate_and_combine (void)
 	  else
 		gsi_next (gsi);
 	}
-	  else if (code == POINTER_PLUS_EXPR
-		   && can_propagate_from (stmt))
+	  else if (code == POINTER_PLUS_EXPR && can_propagate_from (stmt))
 	{
 	  if (TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST
 		  /* ???  Better adjust the interface to that function


[ARM] Clean up dead code in thumb_pushpop

2011-06-28 Thread Richard Henderson
When I presented the patch that converted thumb1 prologue
to rtl, I said I didn't clean up thumb_pushpop.  I had 
thought about converting the epilogue to rtl as well and
deleting the function entirely.

However, for my immediate purposes cleaning up dwarf2out,
I need to remove the text-based interface to the unwind
info, and that means cleaning out the dead code from
thumb_pushpop now.

Tested with crosses to arm-elf and arm-eabi, -mthumb.
Committed as obvious.


r~
* config/arm/arm.c (thumb_pop): Rename from thumb_pushpop.  Delete
all code and arguments that handled pushes.  Update all callers.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index be03659..4c6041a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -20188,16 +20188,9 @@ thumb1_emit_multi_reg_push (unsigned long mask, 
unsigned long real_regs)
 }
 
 /* Emit code to push or pop registers to or from the stack.  F is the
-   assembly file.  MASK is the registers to push or pop.  PUSH is
-   nonzero if we should push, and zero if we should pop.  For debugging
-   output, if pushing, adjust CFA_OFFSET by the amount of space added
-   to the stack.  REAL_REGS should have the same number of bits set as
-   MASK, and will be used instead (in the same order) to describe which
-   registers were saved - this is used to mark the save slots when we
-   push high registers after moving them to low registers.  */
+   assembly file.  MASK is the registers to pop.  */
 static void
-thumb_pushpop (FILE *f, unsigned long mask, int push, int *cfa_offset,
-  unsigned long real_regs)
+thumb_pop (FILE *f, unsigned long mask)
 {
   int regno;
  int lo_mask = mask & 0xFF;
@@ -20205,7 +20198,7 @@ thumb_pushpop (FILE *f, unsigned long mask, int push, 
int *cfa_offset,
 
   gcc_assert (mask);
 
-  if (lo_mask == 0 && !push && (mask & (1 << PC_REGNUM)))
+  if (lo_mask == 0 && (mask & (1 << PC_REGNUM)))
 {
   /* Special case.  Do not generate a POP PC statement here, do it in
 thumb_exit() */
@@ -20213,22 +20206,7 @@ thumb_pushpop (FILE *f, unsigned long mask, int push, 
int *cfa_offset,
   return;
 }
 
-  if (push && arm_except_unwind_info (global_options) == UI_TARGET)
-    {
-      fprintf (f, "\t.save\t{");
-      for (regno = 0; regno < 15; regno++)
-	{
-	  if (real_regs & (1 << regno))
-	    {
-	      if (real_regs & ((1 << regno) -1))
-		fprintf (f, ", ");
-	      asm_fprintf (f, "%r", regno);
-	    }
-	}
-      fprintf (f, "}\n");
-    }
-
-  fprintf (f, "\t%s\t{", push ? "push" : "pop");
+  fprintf (f, "\tpop\t{");
 
   /* Look at the low registers first.  */
  for (regno = 0; regno <= LAST_LO_REGNUM; regno++, lo_mask >>= 1)
@@ -20244,17 +20222,7 @@ thumb_pushpop (FILE *f, unsigned long mask, int push, 
int *cfa_offset,
}
 }
 
-  if (push && (mask & (1 << LR_REGNUM)))
-    {
-      /* Catch pushing the LR.  */
-      if (mask & 0xFF)
-	fprintf (f, ", ");
-
-      asm_fprintf (f, "%r", LR_REGNUM);
-
-      pushed_words++;
-    }
-  else if (!push && (mask & (1 << PC_REGNUM)))
+  if (mask & (1 << PC_REGNUM))
 {
   /* Catch popping the PC.  */
   if (TARGET_INTERWORK || TARGET_BACKTRACE
@@ -20278,23 +20246,6 @@ thumb_pushpop (FILE *f, unsigned long mask, int push, 
int *cfa_offset,
 }
 
  fprintf (f, "}\n");
-
-  if (push && pushed_words && dwarf2out_do_frame ())
-    {
-      char *l = dwarf2out_cfi_label (false);
-      int pushed_mask = real_regs;
-
-      *cfa_offset += pushed_words * 4;
-      dwarf2out_def_cfa (l, SP_REGNUM, *cfa_offset);
-
-      pushed_words = 0;
-      pushed_mask = real_regs;
-      for (regno = 0; regno <= 14; regno++, pushed_mask >>= 1)
-	{
-	  if (pushed_mask & 1)
-	    dwarf2out_reg_save (l, regno, 4 * pushed_words++ - *cfa_offset);
-	}
-    }
 }
 
 /* Generate code to return from a thumb function.
@@ -20440,8 +20391,7 @@ thumb_exit (FILE *f, int reg_containing_return_addr)
 }
 
   /* Pop as many registers as we can.  */
-  thumb_pushpop (f, regs_available_for_popping, FALSE, NULL,
-regs_available_for_popping);
+  thumb_pop (f, regs_available_for_popping);
 
   /* Process the registers we popped.  */
   if (reg_containing_return_addr == -1)
@@ -20522,8 +20472,7 @@ thumb_exit (FILE *f, int reg_containing_return_addr)
   int  popped_into;
   int  move_to;
 
-  thumb_pushpop (f, regs_available_for_popping, FALSE, NULL,
-regs_available_for_popping);
+  thumb_pop (f, regs_available_for_popping);
 
   /* We have popped either FP or SP.
 Move whichever one it is into the correct register.  */
@@ -20543,8 +20492,7 @@ thumb_exit (FILE *f, int reg_containing_return_addr)
 {
   int  popped_into;
 
-  thumb_pushpop (f, regs_available_for_popping, FALSE, NULL,
-regs_available_for_popping);
+  thumb_pop (f, regs_available_for_popping);
 
   popped_into = number_of_first_bit_set (regs_available_for_popping);
 
@@ 

Re: [pph] Append DECL_CONTEXT of global namespace to cache in preload (issue4629081)

2011-06-28 Thread Diego Novillo
On Tue, Jun 28, 2011 at 18:37, Gabriel Charette gch...@google.com wrote:

 2011-06-28  Gabriel Charette  gch...@google.com

        * pph-streamer.c (pph_preload_common_nodes):
        Append DECL_CONTEXT of global_namespace to cache.

OK.


Diego.


Remove __GCC_FLOAT_NOT_NEEDED define

2011-06-28 Thread Joseph S. Myers
In the course of options changes I noted the existence of too many defines 
conditioning code built for the target 
<http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00947.html>.  One of those 
defines, __GCC_FLOAT_NOT_NEEDED, is not tested anywhere, and this patch 
removes the definition.  Bootstrapped with no regressions on 
x86_64-unknown-linux-gnu.  Applied to mainline as obvious.

Index: ChangeLog
===
--- ChangeLog   (revision 175606)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2011-06-28  Joseph Myers  jos...@codesourcery.com
+
+   * Makefile.in (LIBGCC2_CFLAGS): Remove -D__GCC_FLOAT_NOT_NEEDED.
+
 2011-06-28  Richard Henderson  r...@redhat.com
 
* config/arm/arm.c (thumb_pop): Rename from thumb_pushpop.  Delete
Index: Makefile.in
===
--- Makefile.in (revision 175606)
+++ Makefile.in (working copy)
@@ -670,7 +670,7 @@
 LIBGCC2_DEBUG_CFLAGS = -g
 LIBGCC2_CFLAGS = -O2 $(LIBGCC2_INCLUDES) $(GCC_CFLAGS) 
$(TARGET_LIBGCC2_CFLAGS) \
 $(LIBGCC2_DEBUG_CFLAGS) $(GTHREAD_FLAGS) \
--DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED \
+-DIN_LIBGCC2 \
 -fbuilding-libgcc -fno-stack-protector \
 $(INHIBIT_LIBC_CFLAGS)
 

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC] Fix full memory barrier on SPARC-V8

2011-06-28 Thread David Miller
From: Eric Botcazou ebotca...@adacore.com
Date: Tue, 28 Jun 2011 23:27:43 +0200

 With the pristine compiler, the test passes with -mcpu=v9 but fails otherwise.
 It passes with the patched compiler.  However, I suspect that we would still 
 have problems with newer UltraSparc CPUs supporting full RMO, because the new 
 insn membar_v8 is only half a memory barrier for V9.

Linux doesn't ever run the cpu in the RMO memory model any more.  All
sparc64 chips run only in TSO now.

All of the Niagara chips implement an even stricter than TSO memory
model, and the membars we used to have all over the kernel to handle
that properly were just wasted I-cache space.  So I just moved
unilaterally to TSO everywhere and killed off the membars necessitated
by RMO.
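
An illustrative sketch of the distinction at issue; the instruction sequences below are assumptions for illustration, not the membar_v8 pattern from the patch.  V8 only architects STBAR, which orders stores, so a full barrier is usually approximated with a dummy atomic such as LDSTUB, whereas under V9 TSO the one reordering still permitted (a store followed by a load) is handled by MEMBAR #StoreLoad:

/* Illustration only -- not GCC's barrier expansion.  */
static inline void
full_barrier_v8 (void)
{
  unsigned char scratch;
  /* STBAR orders stores; the dummy LDSTUB (an atomic load-store of a
     byte) supplies the missing store->load ordering on V8.  */
  __asm__ __volatile__ ("stbar\n\tldstub [%0], %%g0"
			: : "r" (&scratch) : "memory");
}

static inline void
full_barrier_v9_tso (void)
{
  /* Under TSO the only reordering left to prevent is store->load.  */
  __asm__ __volatile__ ("membar #StoreLoad" : : : "memory");
}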


Request to backport two -mvzeroupper related patches to 4.6 branch

2011-06-28 Thread Fang, Changpeng
Hi, 

Attached are two patches in gcc 4.7 trunk that we request to backport to the
4.6 branch.  They are both related to -mvzeroupper.

1)
0001-Save-the-initial-options-after-checking-vzeroupper.patch
This patch fixes bug 47315, ICE: in extract_insn, at recog.c:2109 
(unrecognizable insn) with -mvzeroupper and __attribute__((target(avx)))

The patch was committed to trunk: 2011-05-23  H.J. Lu  hongjiu...@intel.com

The bug still exists in gcc 4.6.1.  Backporting this patch would fix it.

2).
0001--config-i386-i386.c-ix86_reorg-Run-move_or_dele.patch
This patch runs move_or_delete_vzeroupper first; it was committed to trunk:
2011-05-04  Uros Bizjak  ubiz...@gmail.com


Is it OK to commit to the 4.6 branch?

Thanks,

Changpeng

From 0b70e1e33afa25536305f4a228409cf9b4e0eaad Mon Sep 17 00:00:00 2001
From: hjl hjl@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Mon, 23 May 2011 16:51:42 +
Subject: [PATCH] Save the initial options after checking vzeroupper.

gcc/

2011-05-23  H.J. Lu  hongjiu...@intel.com

	PR target/47315
	* config/i386/i386.c (ix86_option_override_internal): Save the
	initial options after checking vzeroupper.

gcc/testsuite/

2011-05-23  H.J. Lu  hongjiu...@intel.com

	PR target/47315
	* gcc.target/i386/pr47315.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@174078 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   |6 ++
 gcc/config/i386/i386.c  |   11 ++-
 gcc/testsuite/ChangeLog |5 +
 gcc/testsuite/gcc.target/i386/pr47315.c |   10 ++
 4 files changed, 27 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47315.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a3cb0f1..1d46b04 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2011-05-23  H.J. Lu  hongjiu...@intel.com
+
+	PR target/47315
+	* config/i386/i386.c (ix86_option_override_internal): Save the
+	initial options after checking vzeroupper.
+
 2011-05-23  David Li  davi...@google.com
 
 	PR tree-optimization/48988
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0709be8..854e376 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4191,11 +4191,6 @@ ix86_option_override_internal (bool main_args_p)
 #endif
}
 
-  /* Save the initial options in case the user does function specific options */
-  if (main_args_p)
-target_option_default_node = target_option_current_node
-  = build_target_option_node ();
-
   if (TARGET_AVX)
 {
   /* When not optimize for size, enable vzeroupper optimization for
@@ -4217,6 +4212,12 @@ ix86_option_override_internal (bool main_args_p)
   /* Disable vzeroupper pass if TARGET_AVX is disabled.  */
   target_flags &= ~MASK_VZEROUPPER;
 }
+
+  /* Save the initial options in case the user does function specific
+ options.  */
+  if (main_args_p)
+target_option_default_node = target_option_current_node
+  = build_target_option_node ();
 }
 
 /* Return TRUE if VAL is passed in register with 256bit AVX modes.  */
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 72aae61..85137d0 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2011-05-23  H.J. Lu  hongjiu...@intel.com
+
+	PR target/47315
+	* gcc.target/i386/pr47315.c: New test.
+
 2011-05-23  Jason Merrill  ja...@redhat.com
 
 	* g++.dg/cpp0x/lambda/lambda-eh2.C: New.
diff --git a/gcc/testsuite/gcc.target/i386/pr47315.c b/gcc/testsuite/gcc.target/i386/pr47315.c
new file mode 100644
index 000..871d3f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr47315.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mvzeroupper" } */
+
+__attribute__ ((__target__ ("avx")))
+float bar (float f) {}
+
+void foo (float f)
+{
+bar (f);
+}
-- 
1.6.0.2

From 343f07cbec2d66bebe71e4f48b0403f52ebfe8f9 Mon Sep 17 00:00:00 2001
From: uros uros@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Wed, 4 May 2011 17:07:03 +
Subject: [PATCH] 	* config/i386/i386.c (ix86_reorg): Run move_or_delete_vzeroupper first.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@173383 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  |   16 ++--
 gcc/config/i386/i386.c |8 
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5412506..ca85616 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2011-05-04  Uros Bizjak  ubiz...@gmail.com
+
+	* config/i386/i386.c (ix86_reorg): Run move_or_delete_vzeroupper first.
+
 2011-05-04  Eric Botcazou  ebotca...@adacore.com
 
 	* stor-layout.c (variable_size): Do not issue errors.
@@ -263,9 +267,9 @@
 
 2011-05-03  Stuart Henderson  shend...@gcc.gnu.org
 
-From Mike Frysinger:
-* config/bfin/bfin.c (bfin_cpus[]): Add 0.4 for
-bf542/bf544/bf547/bf548/bf549.
+	From Mike Frysinger:
+	* config/bfin/bfin.c (bfin_cpus[]): Add 0.4 for
+	bf542/bf544/bf547/bf548/bf549.
 
 

Re: [patch] Fix oversight in tuplification of DOM

2011-06-28 Thread Hans-Peter Nilsson
On Tue, 28 Jun 2011, Eric Botcazou wrote:
 Hi,

 the attached testcase triggers an ICE when compiled at -O or above, on all the
 open branches.  This is a regression introduced with the tuplification.  The
 problem is that 2 ARRAY_RANGE_REFs are recognized as equivalent, although they
 don't have the same number of elements.  This is so because their type isn't
 taken into account by the hash equality function as it simply isn't recorded
 in initialize_hash_element (GIMPLE_SINGLE_RHS case).  Now in all the other
 cases it is recorded so this very likely is an oversight.
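
A minimal standalone model of the failure mode (my own sketch, not GCC code and not the committed fix): when the slice length, which lives only in the type, is not part of the hashed expression, two ARRAY_RANGE_REFs with the same base and index but different element counts compare equal:

/* Assumed illustration: range_ref stands in for the hashed expression.  */
#include <stdbool.h>
#include <stdio.h>

struct range_ref
{
  const char *base;   /* array being sliced                       */
  unsigned index;     /* first element of the slice               */
  unsigned nelts;     /* number of elements -- lives in the type  */
};

/* Equality with the type ignored, as described above.  */
static bool
equal_without_type (struct range_ref a, struct range_ref b)
{
  return a.base == b.base && a.index == b.index;
}

/* Equality once the type is recorded as well.  */
static bool
equal_with_type (struct range_ref a, struct range_ref b)
{
  return a.base == b.base && a.index == b.index && a.nelts == b.nelts;
}

int
main (void)
{
  static const char arr[8] = "";
  struct range_ref two = { arr, 0, 2 };
  struct range_ref four = { arr, 0, 4 };

  /* Prints "without type: 1, with type: 0".  */
  printf ("without type: %d, with type: %d\n",
	  equal_without_type (two, four), equal_with_type (two, four));
  return 0;
}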

 Tested on x86_64-suse-linux, OK for all branches?


 2011-06-28  Eric Botcazou  ebotca...@adacore.com

   * tree-ssa-dom.c (initialize_hash_element): Fix oversight.

This caused a regression on 4.4 for cris-elf (at least), see
PR49572.

brgds, H-P


[pph] Support simple C++ programs (issue4630074)

2011-06-28 Thread Diego Novillo

This patch adds support for emitting functions read from a PPH image.
With this, we can now run some simple C++ programs whose header has
been reconstructed from a single PPH image.

The core problem it fixes was in the saving and restoring of functions
with a body.

1- When the parser wants to register a function for code
   generation, it calls expand_or_defer_fn().  When reading from the
   pph image, we were not calling this, so the callgraph manager was
   tossing these functions out.

2- Even when we call expand_or_defer_fn, we need to take care of
   another side-effect.  In the writer, the call to expand_or_defer_fn
   sets DECL_EXTERNAL to 1 (for reasons that I'm not too sure I
   understand).  At the same time, it remembers that it forced
   DECL_EXTERNAL by setting DECL_NOT_REALLY_EXTERN.  Since I don't
   think I understand why it does this, I'm simply using
   DECL_NOT_REALLY_EXTERN in the reader to recognize that the decl
   should have DECL_EXTERNAL set to 0 (see the sketch right after this
   list).  Jason, does this make any sense?
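
A rough sketch of the reader-side handling described in item 2 (hypothetical: pph_post_process_function is a made-up name, and the fragment assumes the usual GCC tree macros and the C++ front end's expand_or_defer_fn; the real change lives in pph_in_function_decl in the patch below):

/* After reading FNDECL from the image, undo the artificial
   DECL_EXTERNAL marking recorded via DECL_NOT_REALLY_EXTERN and hand
   the function to the callgraph, the way the parser would have.  */
static void
pph_post_process_function (tree fndecl)
{
  if (DECL_SAVED_TREE (fndecl) && DECL_NOT_REALLY_EXTERN (fndecl))
    {
      DECL_EXTERNAL (fndecl) = 0;
      expand_or_defer_fn (fndecl);
    }
}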

This fixed a whole bunch of tests: c1builtin-object-size-2.cc,
c1funcstatic.cc, c1return-5.cc, c1simple.cc, x1autometh.cc,
x1funcstatic.cc, x1struct1.cc, x1ten-hellos.cc and x1tmplfunc.cc.

It also exposed other bugs in c1attr-warn-unused-result.cc and
x1template.cc. Lawrence, Gab, I think this affects some of the
failures you were looking at today.  Please double check.

I also added support for 'dg-do run' tests to support x1ten-hellos.cc
which now actually works (though it is not completely bug-free, I see
that the counter it initializes starts with a bogus value).

Tested on x86_64.  Committed to branch.


cp/ChangeLog.pph
2011-06-28   Diego Novillo  dnovi...@google.com

* pph-streamer-in.c (pph_in_ld_fn): Instantiate
DECL_STRUCT_FUNCTION by calling allocate_struct_function.
Remove assertion for stream->data_in.
(pph_in_function_decl): Factor out of ...
(pph_read_tree): ... here.
* pph-streamer-out.c (pph_out_function_decl): Factor out of ...
(pph_write_tree): ... here.

testsuite/ChangeLog.pph
* g++.dg/pph/c1attr-warn-unused-result.cc: Expect an ICE.
* g++.dg/pph/x1template.cc: Likewise.
* g++.dg/pph/c1builtin-object-size-2.cc: Expect no asm difference.
* g++.dg/pph/c1funcstatic.cc: Likewise.
* g++.dg/pph/c1return-5.cc: Likewise.
* g++.dg/pph/c1simple.cc: Likewise.
* g++.dg/pph/x1autometh.cc: Likewise.
* g++.dg/pph/x1funcstatic.cc: Likewise.
* g++.dg/pph/x1struct1.cc: Likewise.
* g++.dg/pph/x1ten-hellos.cc: Likewise.
* g++.dg/pph/x1tmplfunc.cc: Likewise.
* g++.dg/pph/c1meteor-contest.cc: Adjust timeout.
* g++.dg/pph/x1dynarray1.cc: Adjust expected ICE.
* g++.dg/pph/x1namespace.cc: Likewise.
* lib/dg-pph.exp: Do not compare assembly output if the test
is marked 'dg-do run'.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 7f70b65..1dabcf1 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -767,18 +767,17 @@ pph_in_ld_fn (pph_stream *stream, struct lang_decl_fn 
*ldf)
 }
 
 
-/* Read applicable fields of struct function instance FN from STREAM.  */
+/* Read applicable fields of struct function from STREAM.  Associate
+   the read structure to DECL.  */
 
 static struct function *
-pph_in_struct_function (pph_stream *stream)
+pph_in_struct_function (pph_stream *stream, tree decl)
 {
   size_t count, i;
   unsigned ix;
   enum pph_record_marker marker;
   struct function *fn;
 
-  gcc_assert (stream->data_in != NULL);
-
   marker = pph_in_start_record (stream, ix);
   if (marker == PPH_RECORD_END)
 return NULL;
@@ -786,7 +785,8 @@ pph_in_struct_function (pph_stream *stream)
   /* Since struct function is embedded in every decl, fn cannot be shared.  */
   gcc_assert (marker != PPH_RECORD_SHARED);
 
-  fn = ggc_alloc_cleared_function ();
+  allocate_struct_function (decl, false);
+  fn = DECL_STRUCT_FUNCTION (decl);
 
  input_struct_function_base (fn, stream->data_in, stream->ib);
 
@@ -1355,6 +1355,35 @@ pph_read_file (const char *filename)
 }
 
 
+/* Read the attributes for a FUNCTION_DECL FNDECL.  If FNDECL had
+   a body, mark it for expansion.  */
+
+static void
+pph_in_function_decl (pph_stream *stream, tree fndecl)
+{
+  DECL_INITIAL (fndecl) = pph_in_tree (stream);
+  pph_in_lang_specific (stream, fndecl);
+  DECL_SAVED_TREE (fndecl) = pph_in_tree (stream);
+  DECL_STRUCT_FUNCTION (fndecl) = pph_in_struct_function (stream, fndecl);
+  DECL_CHAIN (fndecl) = pph_in_tree (stream);
+  if (DECL_SAVED_TREE (fndecl))
+{
+  /* FIXME pph - This is somewhat gross.  When we generated the
+PPH image, the parser called expand_or_defer_fn on FNDECL,
+which marked it DECL_EXTERNAL (see expand_or_defer_fn_1 for
+details).
+
+However, this is not really an extern definition, so it was
+also marked not-really-extern (yes, I