*ping* - Re: [Patch, Fortran] Support allocatable *scalar* coarrays

2011-07-14 Thread Tobias Burnus

*ping*
http://gcc.gnu.org/ml/fortran/2011-07/msg00106.html

On 07/11/2011 09:49 AM, Tobias Burnus wrote:

On 07/10/2011 09:56 PM, Tobias Burnus wrote:
This patch implements the trans*.c part of allocatable scalar
coarrays; contrary to non-coarray allocatable scalars, they have
cobounds and thus use an array descriptor.


I found a test case (part of Reinhold Bader's fortran_tests), which 
gave an ICE: Allocatable scalar coarrays with SAVE.


I have fixed that (trans-decl.c) and added a test.


The attached patch was built and regtested on x86-64-linux.
OK for the trunk?


Tobias




RE: [PATCH, ARM, iWMMXt][1/5]: ARM code generic change

2011-07-14 Thread Xinyu Qi
  Hi,
 
  It is the first part of iWMMXt maintenance.
 
  *config/arm/arm.c (arm_option_override):
Enable iWMMXt with VFP. iWMMXt and NEON are incompatible. 
 iWMMXt is unsupported in Thumb-2 mode.
(arm_expand_binop_builtin): Accept immediate op (with mode VOID)
  *config/arm/arm.md:
Move the include location of iwmmxt.md so that *arm_movdi 
 and *arm_movsi_insn can be used when iWMMXt is enabled.
 
 With the current work in trunk to handle enabled attributes 
 and per-alternative predicable attributes (Thanks Bernd) we 
 should be able to get rid of *cond_iwmmxt_movsi_insn  in 
 iwmmxt.md file. It's not a matter for this patch but for a 
 follow-up patch.
 
 Actually we should probably do the same for the various insns 
 that are dotted around all over the place with final 
 conditions that prevent matching - at least it makes the backend 
 description slightly smaller :).
 
Add pipeline description file include.
 
 It is enough to say
 
  (filename): Include.
 
 in the changelog entry.
 
 The include for the pipeline description file should be in 
 the patch that adds it, i.e. patch #5. Please add this 
 to MD_INCLUDES in t-arm as well.
 
 Also as a general note, please provide a correct Changelog entry.
 
 This is not the format that we expect Changelog entries to be in.
 Please look at the coding standards on the website for this 
 or at other patches submitted with respect to Changelog 
 entries. Please fix this for each patch in the patch stack.
 
 
 cheers
 Ramana

Thanks for reviewing. I have updated the patches and the Changelog.

*config/arm/arm.c (arm_option_override): Enable iWMMXt with VFP.
 (arm_expand_binop_builtin): Accept VOIDmode op.
*config/arm/arm.md (*arm_movdi, *arm_movsi_insn): Remove condition 
!TARGET_IWMMXT.
 (iwmmxt.md): Include location.

Thanks,
Xinyu

1_generic.diff
Description: 1_generic.diff


RE: [PATCH, ARM, iWMMXt][2/5]: intrinsic head file change

2011-07-14 Thread Xinyu Qi
 
 Hi,
 
 It is the second part of iWMMXt maintenance.


*config/arm/mmintrin.h: Revise.

Thanks,
Xinyu


2_mmintrin.diff
Description: 2_mmintrin.diff


Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions

2011-07-14 Thread Andreas Krebbel
On 07/13/2011 04:34 PM, Richard Guenther wrote:
 On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel
 kreb...@linux.vnet.ibm.com wrote:
 Hi,

 the widening_mul pass might increase the number of multiplications in
 the code by transforming

 a = b * c
 d = a + 2
 e = a + 3

 into:

 d = b * c + 2
 e = b * c + 3

 under the assumption that an FMA instruction is not more expensive
 than a simple add.  This certainly isn't always true.  While e.g. on
 s390 an fma is indeed not slower than an add execution-wise it has
 disadvantages regarding instruction grouping.  It doesn't group with
 any other instruction, which has a major impact on the instruction
 dispatch bandwidth.
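
[For reference, a source-level illustration of that transformation — a sketch only, assuming double operands; the pass works on GIMPLE, and the fma() calls merely stand in for the FMA instructions the backend would emit:]

/* Before: one multiplication feeding two additions.  */
void before (double b, double c, double *d, double *e)
{
  double a = b * c;
  *d = a + 2.0;
  *e = a + 3.0;
}

/* After: the multiply is duplicated into each use and fused, so the
   standalone multiply disappears but two FMAs remain.  */
#include <math.h>

void after (double b, double c, double *d, double *e)
{
  *d = fma (b, c, 2.0);
  *e = fma (b, c, 3.0);
}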

 The following patch tries to figure out the costs for adds, mults and
 fmas by building an RTX and asking the backend's cost function in order
 to estimate whether it is worthwhile doing the transformation.

 With that patch the 436.cactus hot loop contains 28 fewer
 multiplications than before increasing performance slightly (~2%).

 Bootstrapped and regtested on x86_64 and s390x.
 
 Ick ;)

Just to defend myself I did not invent this. I basically copied it over from 
ivopts.

 Maybe this is finally the time to introduce target hook(s) to
 get us back costs for trees?  For this case we'd need two
 actually, or just one - dependent on what finegrained information
 we pass.  Choices:

Having another target hook for costs actually doesn't look that much better
to me.  Duplicating the logic is not really practical for backend developers.
We should rather have a generic interface for calculating tree costs based on
the rtx_cost hook.

Bye,

-Andreas-


RE: [PATCH, ARM, iWMMXt][3/5]: built in define and expand

2011-07-14 Thread Xinyu Qi
 Hi,
 
 It is the third part of iWMMXt maintenance.
 

*config/arm/arm.c (enum arm_builtins): Built-in fcode.
 (builtin_description bdesc_2arg): Built in declare.
 (builtin_description bdesc_1arg): Ditto.
 (arm_init_iwmmxt_builtins): Built in initialize.
 (arm_expand_builtin): Built in expand.


Thanks,
Xinyu


3_arm_c.diff
Description: 3_arm_c.diff


Re: [Patch, AVR]: Fix PR49487 (ICE for wrong rotate scratch)

2011-07-14 Thread Georg-Johann Lay
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01036.html

Georg-Johann Lay wrote:
 This is a patch to fix PR49487.

Forgot to attach it. Here it is.

 As Denis will be off-line for some time, it'd be great if
 a global reviewer would review it.  It appears that he is
 the only AVR maintainer who approves patches.
 
 The reason for the ICE is as explained in the PR:
 
 The rotate patterns use X as the constraint for an operand which is
 used as scratch.  However, the operand is a match_operand
 and not a match_scratch.
 
 Because the scratch is not needed in all situations, I chose
 to use match_scratch instead of match_operand and not to fix
 the constraints.  Fixing the constraints would lead to superfluous
 allocation of a register if no scratch was needed.
 
 Tested with 2 fewer FAILs: gcc.c-torture/compile/pr46883.c
 ICEs without this patch and passes with it.
 
 The test case in the PR passes, too. That test case
 also passes with the current unpatched 4.7, but it's obvious that
 the constraint/operand combination is a bug.
 
 Ok to commit and back-port to 4.6?
 
 Johann
 
   PR target/49487
   * config/avr/avr.md (rotl<mode>3): Generate SCRATCH instead
   of REG.
   (*rotw<mode>): Use const_int_operand for operand 2.
   Use match_scratch for operand 3.
   (*rotb<mode>): Ditto.
   * config/avr/avr.c (avr_rotate_bytes): Treat SCRATCH.


Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 176136)
+++ config/avr/avr.md	(working copy)
@@ -1597,18 +1597,18 @@ (define_expand "rotl<mode>3"
 (match_operand:VOID 2 const_int_operand )))
 		(clobber (match_dup 3))])]
   
-  
-{
-  if (CONST_INT_P (operands[2]) && 0 == (INTVAL (operands[2]) % 8))
   {
-  if (AVR_HAVE_MOVW && 0 == INTVAL (operands[2]) % 16)
-operands[3] = gen_reg_rtx (<rotsmode>mode);
-  else
-operands[3] = gen_reg_rtx (QImode);
-  }
-  else
-FAIL;
-})
+if (CONST_INT_P (operands[2])
+    && 0 == INTVAL (operands[2]) % 8)
+  {
+if (AVR_HAVE_MOVW && 0 == INTVAL (operands[2]) % 16)
+  operands[3] = gen_rtx_SCRATCH (<rotsmode>mode);
+else
+  operands[3] = gen_rtx_SCRATCH (QImode);
+  }
+else
+  FAIL;
+  })
 
 
 ;; Overlapping non-HImode registers often (but not always) need a scratch.
@@ -1620,34 +1620,38 @@ (define_expand "rotl<mode>3"
 ; Split word aligned rotates using scratch that is mode dependent.
(define_insn_and_split "*rotw<mode>"
   [(set (match_operand:HIDI 0 register_operand =r,r,#r)
-	(rotate:HIDI (match_operand:HIDI 1 register_operand 0,r,r)
-		 (match_operand 2 immediate_operand n,n,n)))
-   (clobber (match_operand:<rotsmode> 3 register_operand  =<rotx> ))]
-  (CONST_INT_P (operands[2])
-     && (0 == (INTVAL (operands[2]) % 16) && AVR_HAVE_MOVW))
+(rotate:HIDI (match_operand:HIDI 1 register_operand 0,r,r)
+ (match_operand 2 const_int_operand n,n,n)))
+   (clobber (match_scratch:<rotsmode> 3 =<rotx>))]
+  AVR_HAVE_MOVW
+    && CONST_INT_P (operands[2])
+    && 0 == INTVAL (operands[2]) % 16
   #
 (reload_completed || <MODE>mode == DImode)
   [(const_int 0)]
-  avr_rotate_bytes (operands);
-  DONE;
-)
+  {
+avr_rotate_bytes (operands);
+DONE;
+  })
 
 
 ; Split byte aligned rotates using scratch that is always QI mode.
(define_insn_and_split "*rotb<mode>"
   [(set (match_operand:HIDI 0 register_operand =r,r,#r)
-	(rotate:HIDI (match_operand:HIDI 1 register_operand 0,r,r)
-		 (match_operand 2 immediate_operand n,n,n)))
-   (clobber (match_operand:QI 3 register_operand =<rotx> ))]
-  (CONST_INT_P (operands[2])
-     && (8 == (INTVAL (operands[2]) % 16)
- 	 || (!AVR_HAVE_MOVW && 0 == (INTVAL (operands[2]) % 16))))
+(rotate:HIDI (match_operand:HIDI 1 register_operand 0,r,r)
+ (match_operand 2 const_int_operand n,n,n)))
+   (clobber (match_scratch:QI 3 =<rotx>))]
+  CONST_INT_P (operands[2])
+    && (8 == INTVAL (operands[2]) % 16
+        || (!AVR_HAVE_MOVW
+            && 0 == INTVAL (operands[2]) % 16))
   #
 (reload_completed || <MODE>mode == DImode)
   [(const_int 0)]
-  avr_rotate_bytes (operands);
-  DONE;
-)
+  {
+avr_rotate_bytes (operands);
+DONE;
+  })
 
 
 ;; 
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 176141)
+++ config/avr/avr.c	(working copy)
@@ -4438,7 +4438,9 @@ avr_rotate_bytes (rtx operands[])
 if (mode == DImode)
   move_mode = QImode;
 /* Make scratch smaller if needed.  */
-if (GET_MODE (scratch) == HImode && move_mode == QImode)
+if (SCRATCH != GET_CODE (scratch)
+    && HImode == GET_MODE (scratch)
+    && QImode == move_mode)
   scratch = simplify_gen_subreg (move_mode, scratch, HImode, 0); 
 
 move_size = GET_MODE_SIZE (move_mode);
@@ -4534,6 +4536,8 @@ avr_rotate_bytes (rtx operands[])
 		   When this move occurs, it will break chain deadlock.
 		   The scratch register is substituted for real move.  */
 
+		

RE: [PATCH, ARM, iWMMXt][4/5]: WMMX machine description

2011-07-14 Thread Xinyu Qi
 Hi,
 
 It is the fourth part of iWMMXt maintenance.
 

Since *cond_iwmmxt_movsi_insn will be removed soon, I keep it unchanged.

*config/arm/arm.c (arm_output_iwmmxt_shift_immediate): New function.
 (arm_output_iwmmxt_tinsr): Ditto.
*config/arm/arm-protos.h (arm_output_iwmmxt_shift_immediate): New prototype.
 (arm_output_iwmmxt_tinsr): Ditto.
*config/arm/iwmmxt.md (WCGR0, WCGR1, WCGR2, WCGR3): New constant.
 (iwmmxt_psadbw, iwmmxt_walign, iwmmxt_tmrc, iwmmxt_tmcr): Delete.
 (iwmmxt_tbcstqi, iwmmxt_tbcsthi, iwmmxt_tbcstsi, *iwmmxt_clrv8qi, 
*iwmmxt_clrv4hi, *iwmmxt_clrv2si): Rename...
 (tbcstv8qi, tbcstv4hi, tbsctv2si, iwmmxt_clrv8qi, iwmmxt_clrv4hi, 
iwmmxt_clrv2si): ...New pattern.
 (*andmode3_iwmmxt, *iormode3_iwmmxt, *xormode3_iwmmxt, rorimode3, 
ashrimode3_iwmmxt, lshrimode3_iwmmxt, ashlimode3_iwmmxt, iwmmxt_waligni, 
iwmmxt_walignr, iwmmxt_walignr0, iwmmxt_walignr1, iwmmxt_walignr2, 
iwmmxt_walignr3, iwmmxt_setwcgr0, iwmmxt_setwcgr1, iwmmxt_setwcgr2, 
iwmmxt_setwcgr3, iwmmxt_getwcgr0, iwmmxt_getwcgr1, iwmmxt_getwcgr2, 
iwmmxt_getwcgr3): New pattern.
(All instruction patterns): Add wtype attribute.
 (*iwmmxt_arm_movdi, *iwmmxt_movsi_insn, iwmmxt_uavgrndv8qi3, 
iwmmxt_uavgrndv4hi3, iwmmxt_uavgv8qi3, iwmmxt_uavgv4hi3, iwmmxt_tinsrb, 
iwmmxt_tinsrh, iwmmxt_tinsrw, eqv8qi3, eqv4hi3, eqv2si3, gtuv8qi3, gtuv4hi3, 
gtuv2si3, gtv8qi3, gtv4hi3, gtv2si3, iwmmxt_wunpckihb, iwmmxt_wunpckihh, 
iwmmxt_wunpckihw, iwmmxt_wunpckilb, iwmmxt_wunpckilh, iwmmxt_wunpckilw, 
iwmmxt_wunpckehub, iwmmxt_wunpckehuh, iwmmxt_wunpckehuw, iwmmxt_wunpckehsb, 
iwmmxt_wunpckehsh, iwmmxt_wunpckehsw, iwmmxt_wunpckelub, iwmmxt_wunpckeluh, 
iwmmxt_wunpckeluw, iwmmxt_wunpckelsb, iwmmxt_wunpckelsh, iwmmxt_wunpckelsw, 
iwmmxt_wmadds, iwmmxt_wmaddu, iwmmxt_wsadb, iwmmxt_wsadh, iwmmxt_wsadbz, 
iwmmxt_wsadhz): Revise.
 (iwmmxt2.md): Include.
*config/arm/iwmmxt2.md: New file.
*config/arm/iterators.md (VMMX2): New mode_iterator.
*config/arm/arm.md (wtype): New attribute.
*config/arm/t-arm (MD_INCLUDES): Add iwmmxt2.md.

Thanks,
Xinyu


4_wmmx_md.diff.gz
Description: 4_wmmx_md.diff.gz


RE: [PATCH, ARM, iWMMXt][5/5]: pipeline description

2011-07-14 Thread Xinyu Qi
 Hi,
 
 It is the fifth part of iWMMXt maintenance.
 

*config/arm/t-arm (MD_INCLUDES): Add marvell-f-iwmmxt.md.
*config/arm/marvell-f-iwmmxt.md: New file.
*config/arm/arm.md (marvell-f-iwmmxt.md): Include.


Thanks,
Xinyu


5_pipeline.diff
Description: 5_pipeline.diff


[Patch, Fortran] Mark token of static coarrays restrict

2011-07-14 Thread Tobias Burnus

As the title says: Mark token of static coarrays restrict

Bootstrapped and regtested on x86-64-linux.
OK for the trunk?

Tobias
2011-07-13  Tobias Burnus  bur...@net-b.de

	* trans-decl.c (gfc_build_qualified_array): Make coarray's
	token TYPE_QUAL_RESTRICT.

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index ddc7c36..65a8efa 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -759,7 +759,9 @@ gfc_build_qualified_array (tree decl, gfc_symbol * sym)
 {
   tree token;
 
-  token = gfc_create_var_np (pvoid_type_node, "caf_token");
+  token = gfc_create_var_np (build_qualified_type (pvoid_type_node,
+		   TYPE_QUAL_RESTRICT),
+ "caf_token");
   GFC_TYPE_ARRAY_CAF_TOKEN (type) = token;
   DECL_ARTIFICIAL (token) = 1;
   TREE_STATIC (token) = 1;


Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions

2011-07-14 Thread Andreas Krebbel
On 07/13/2011 06:58 PM, Richard Henderson wrote:
 Why the force_operand?  You've got register inputs.  Either the target
 is going to support the operation or it isn't.
 Seems to me you can check the availability of the operation in the 
 optab and pass that gen_rtx_fmt_ee result to rtx_cost directly.

Do I really need to check the optab for availability at this point? For FMA,
convert_mult_to_fma already did the check.  But what to do if no optab is
available for add/mul? Assume extraordinarily high costs? This probably
wouldn't make sense, since it would make a mul or add more expensive than an
fma. Or perhaps the rtx_cost hooks should handle that by returning high costs
for everything the backend is not able to implement directly?

 +   bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt));
 +   static unsigned mul_cost[NUM_MACHINE_MODES];
 +   static unsigned add_cost[NUM_MACHINE_MODES];
 +   static unsigned fma_cost[NUM_MACHINE_MODES];
 ...
 +   if (!fma_cost[mode])
 + {
 +   fma_cost[mode] = compute_costs (mode, FMA, speed);
 +   add_cost[mode] = compute_costs (mode, PLUS, speed);
 +   mul_cost[mode] = compute_costs (mode, MULT, speed);
 + }
 
 Saving cost data dependent on speed, which is non-constant.
 You probably need to make this a two dimensional array.
Right. I'll fix this.
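
[A minimal standalone sketch of what that fix could look like — NUM_MODES and compute_cost_stub are placeholders, not GCC names; only the [speed][mode] indexing is the point:]

#include <stdbool.h>

enum { NUM_MODES = 16 };                 /* stand-in for NUM_MACHINE_MODES */

static unsigned fma_cost[2][NUM_MODES];  /* indexed by [speed][mode] */
static bool cost_valid[2][NUM_MODES];

/* Placeholder for the real backend query (rtx_cost in the patch).  */
static unsigned
compute_cost_stub (int mode, bool speed)
{
  return speed ? 4 : 2;
}

static unsigned
get_fma_cost (int mode, bool speed)
{
  if (!cost_valid[speed][mode])
    {
      fma_cost[speed][mode] = compute_cost_stub (mode, speed);
      cost_valid[speed][mode] = true;
    }
  return fma_cost[speed][mode];
}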

Bye,

-Andreas-



Re: [build] Remove crt0, mcrt0 support

2011-07-14 Thread Rainer Orth
Richard,

[Talking about an immediate removal of the netware target...]

 Jan Beulich jbeul...@novell.com writes:

 Rainer Orth r...@cebitec.uni-bielefeld.de 07/13/11 4:34 PM 
which variant would you prefer: obsoletion now and removal in 4.8 or
immediate removal?

 Both are fine with me, so unless someone else objects immediate removal
 would seem better given it had been pretty much unmaintained.

 Right: it would be a one-time effort to remove the support, but
 subsequent cleanups wouldn't have to deal with the effectively dead
 code.

 I had a quick look and it doesn't seem hard: apart from removing the
 netware-specific files in gcc and libgcc (and corresponding gcc/config.gcc
 and libgcc/config.host changes), there's only a small list (apart from
 netware-related target triplets in the testsuite):

 config/elf.m4
 configure.ac
 contrib/config-list.mk
 gcc/config/i386/i386.c
 gcc/config/i386/i386.h
 gcc/doc/extend.texi
 libstdc++-v3/crossconfig.m4

 configure.ac may have to stay if binutils/src wants to retain the
 report, but that's about it.

 Let's see what the release managers/global reviewers think.

what's your take on this?

Thanks
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Fix PR ada/48711

2011-07-14 Thread Eric Botcazou
This is the build failure of GNAT 4.6.x on Windows.  The patch was posted in 
September of last year: http://gcc.gnu.org/ml/gcc/2010-09/msg00228.html

Applied on the mainline and 4.6 branch.


2011-07-14  Florian Weimer  f...@deneb.enyo.de

PR ada/48711
* g-socthi-mingw.adb (Fill): Guard against invalid MSG_WAITALL.


-- 
Eric Botcazou
Index: g-socthi-mingw.adb
===
--- g-socthi-mingw.adb	(revision 176261)
+++ g-socthi-mingw.adb	(working copy)
@@ -277,7 +277,8 @@ package body GNAT.Sockets.Thin is
   use type C.size_t;
 
   Fill  : constant Boolean :=
-(C.unsigned (Flags) and SOSC.MSG_WAITALL) /= 0;
+	SOSC.MSG_WAITALL /= -1
+	and then (C.unsigned (Flags) and SOSC.MSG_WAITALL) /= 0;
   --  Is the MSG_WAITALL flag set? If so we need to fully fill all vectors
 
   Res   : C.int;


Re: [Patch] Fix PR 49309, ICE with mudflap

2011-07-14 Thread Richard Guenther
On Wed, Jul 13, 2011 at 10:37 PM, Andrew Pinski pins...@gmail.com wrote:
 Hi,
  The problem here is that the type of the POINTER_PLUS_EXPR is
 incorrect and also the non folded version leaks to the IR.  This patch
 fixes those two problems and fixes the ICE.

 OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Ok.

Thanks,
Richard.

 Thanks,
 Andrew Pinski

 2011-07-13  Andrew Pinski  pins...@gmail.com

        PR tree-opt/49309
        * tree-mudflap.c (mf_xform_derefs_1 case MEM_REF):
        Use fold_build2_loc instead of build2.
        Use the correct type for the new tree.

 2011-07-13  Andrew Pinski  pins...@gmail.com

        PR tree-opt/49309
        * g++.dg/torture/pr49309.C: New testcase.



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Richard Guenther
2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com:
 One minor note, you will need to update doc/invoke.texi to document the new
 switch before checkin: -ftree-reassoc-width=n

You also need approval and somebody to review the patch before checkin.

Richard.

 --
 Michael Meissner, IBM
 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
 meiss...@linux.vnet.ibm.com     fax +1 (978) 399-6899



Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Richard Guenther
On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com:
 One minor note, you will need to update doc/invoke.texi to document the new
 switch before checkin: -ftree-reassoc-width=n

 You also need approval and somebody to review the patch before checkin.

Btw, rather than a new target hook and an option I suggest to use
a --param which default you can modify in the backend.

Richard.

 Richard.

 --
 Michael Meissner, IBM
 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
 meiss...@linux.vnet.ibm.com     fax +1 (978) 399-6899




Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions

2011-07-14 Thread Richard Guenther
On Wed, Jul 13, 2011 at 11:49 PM, Steven Bosscher stevenb@gmail.com wrote:
 On Wed, Jul 13, 2011 at 4:34 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel
 kreb...@linux.vnet.ibm.com wrote:
 Hi,

 the widening_mul pass might increase the number of multiplications in
 the code by transforming

 a = b * c
 d = a + 2
 e = a + 3

 into:

 d = b * c + 2
 e = b * c + 3

 under the assumption that an FMA instruction is not more expensive
 than a simple add.  This certainly isn't always true.  While e.g. on
 s390 an fma is indeed not slower than an add execution-wise it has
 disadvantages regarding instruction grouping.  It doesn't group with
 any other instruction, which has a major impact on the instruction
 dispatch bandwidth.

 The following patch tries to figure out the costs for adds, mults and
 fmas by building an RTX and asking the backend's cost function in order
 to estimate whether it is worthwhile doing the transformation.

 With that patch the 436.cactus hot loop contains 28 fewer
 multiplications than before increasing performance slightly (~2%).

 Bootstrapped and regtested on x86_64 and s390x.

 Ick ;)

 +1

 Maybe this is finally the time to introduce target hook(s) to
 get us back costs for trees?  For this case we'd need two
 actually, or just one - dependent on what finegrained information
 we pass.  Choices:

  tree_code_cost (enum tree_code)
  tree_code_cost (enum tree_code, enum machine_mode mode)
  unary_cost (enum tree_code, tree actual_arg0) // args will be mostly
 SSA names or constants, but at least they are typed - works for
 mixed-typed operations
  binary_cost (...)
  ...
  unary_cost (enum tree_code, enum tree_code arg0_kind) // constant
 vs. non-constant arg, but lacks type/mode

 Or maybe add a cost function for all named insns (i.e.
 http://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names)?
 I think that any form of lower GIMPLE will not be so low level that
 more combinations will exist than the available named patterns. It
 should be possible to write a gen* tool using rtx_costs to compute
 some useful cost metric for all named patterns. How complicated that
 could be (modes, reg vs. mem, etc.), I don't know... But at least that
 way we don't end up with multiple target costs depending on the IR in
 use.

Yeah, it occurred to me as well that when we look for supportable operations
via optabs the same mechanism should also be able to provide a cost,
maybe as simple as attaching one to the named expanders.

Generating RTL from GIMPLE passes just to be able to use rtx_cost is,
well ... gross.  Yes, we do it in IVOPTs (and that case is even more complex),
but I don't think we want to start doing it elsewhere (look how the vectorizer
for example uses new target hooks instead of generating vectorized RTL
and then using rtx_cost).

Richard.

 Ciao!
 Steven



Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)

2011-07-14 Thread Jakub Jelinek
On Wed, Jul 13, 2011 at 10:30:58AM -0400, Jason Merrill wrote:
 On 07/13/2011 10:06 AM, Jakub Jelinek wrote:
 --- gcc/testsuite/lib/dg-pch.exp.jj2011-01-03 18:58:03.0 
 +0100
 +++ gcc/testsuite/lib/dg-pch.exp   2011-07-12 23:13:50.943670171 +0200
 -  dg-test -keep-output ./$bname$suffix $otherflags $flags 
 +  dg-test -keep-output ./$bname$suffix -gno-record-gcc-switches 
 $otherflags $flags 
 
 It is only necessary if somebody wants to make -grecord-gcc-switches
 the default (for bootstrap/regtest I've tweaked common.opt to do that
 to test it better).  PCH is a big mess and screws debuginfo in many ways,
 in this case it was just small differences in DW_AT_producer, but
 we have e.g. ICEs with PCH and -feliminate-dwarf-dups etc.
 
 Why would PCH change DW_AT_producer?  Because we're restoring
 single_comp_unit_die from the PCH?  Then perhaps we should set
 DW_AT_producer in output_comp_unit rather than gen_compile_unit_die.

output_comp_unit is too late, because at that point the form has been
finalized and sizes calculated, etc.  But at the start of dwarf2out_finish
it is possible.
Here is a PCH-friendly variant of the patch which tries to set the right
producer from the start, but early in dwarf2out_finish double-checks
whether PCH has changed it and, if it did, updates it back to the expected
string.
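
[For readers unfamiliar with DW_AT_producer, a toy standalone sketch of the idea behind gen_producer_string in the patch below — filter the saved command-line switches and append the survivors to the language/version string.  The filter heuristic and the version literal here are made up:]

#include <stdio.h>
#include <string.h>

/* Keep only options that plausibly affect code generation.  */
static int
keep_switch (const char *opt)
{
  return opt[0] == '-' && (opt[1] == 'O' || opt[1] == 'g' || opt[1] == 'm');
}

int
main (int argc, char **argv)
{
  char producer[1024];

  snprintf (producer, sizeof producer, "GNU C 4.7.0");   /* hypothetical base */
  for (int i = 1; i < argc; i++)
    if (keep_switch (argv[i]))
      {
        strncat (producer, " ", sizeof producer - strlen (producer) - 1);
        strncat (producer, argv[i], sizeof producer - strlen (producer) - 1);
      }
  printf ("DW_AT_producer: %s\n", producer);
  return 0;
}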

2011-07-14  Jakub Jelinek  ja...@redhat.com

PR other/32998
* common.opt (grecord-gcc-switches, gno-record-gcc-switches): New
options.
* dwarf2out.c: Include opts.h.
(dchar_p): New typedef.  Define heap VEC for it.
(producer_string): New variable.
(gen_producer_string): New function.
(gen_compile_unit_die): Use it.
(dwarf2out_finish): Fix up comp_unit_die () DW_AT_producer
if needed.
* Makefile.in (dwarf2out.o): Depend on $(OPTS_H).

--- gcc/common.opt.jj   2011-07-13 17:31:09.0 +0200
+++ gcc/common.opt  2011-07-14 10:36:40.0 +0200
@@ -2184,6 +2184,14 @@ ggdb
 Common JoinedOrMissing
 Generate debug information in default extended format
 
+gno-record-gcc-switches
+Common RejectNegative Var(dwarf_record_gcc_switches,0) Init(0)
+Don't record gcc command line switches in DWARF DW_AT_producer.
+
+grecord-gcc-switches
+Common RejectNegative Var(dwarf_record_gcc_switches,1)
+Record gcc command line switches in DWARF DW_AT_producer.
+
 gstabs
 Common JoinedOrMissing Negative(gstabs+)
 Generate debug information in STABS format
--- gcc/dwarf2out.c.jj  2011-07-13 17:31:18.0 +0200
+++ gcc/dwarf2out.c 2011-07-14 11:04:50.0 +0200
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.  
 #include "tree-pass.h"
 #include "tree-flow.h"
 #include "cfglayout.h"
+#include "opts.h"
 
 static void dwarf2out_source_line (unsigned int, const char *, int, bool);
 static rtx last_var_location_insn;
@@ -18108,13 +18109,125 @@ gen_ptr_to_mbr_type_die (tree type, dw_d
   add_type_attribute (ptr_die, TREE_TYPE (type), 0, 0, context_die);
 }
 
+typedef const char *dchar_p; /* For DEF_VEC_P.  */
+DEF_VEC_P(dchar_p);
+DEF_VEC_ALLOC_P(dchar_p,heap);
+
+static char *producer_string;
+
+/* Return a heap allocated producer string including command line options
+   if -grecord-gcc-switches.  */
+
+static char *
+gen_producer_string (void)
+{
+  size_t j;
+  VEC(dchar_p, heap) *switches = NULL;
+  const char *language_string = lang_hooks.name;
+  char *producer, *tail;
+  const char *p;
+  size_t len = dwarf_record_gcc_switches ? 0 : 3;
+  size_t plen = strlen (language_string) + 1 + strlen (version_string);
+
+  for (j = 1; dwarf_record_gcc_switches && j < save_decoded_options_count; j++)
+switch (save_decoded_options[j].opt_index)
+  {
+  case OPT_o:
+  case OPT_d:
+  case OPT_dumpbase:
+  case OPT_dumpdir:
+  case OPT_auxbase:
+  case OPT_auxbase_strip:
+  case OPT_quiet:
+  case OPT_version:
+  case OPT_v:
+  case OPT_w:
+  case OPT_L:
+  case OPT_D:
+  case OPT_I:
+  case OPT_U:
+  case OPT_SPECIAL_unknown:
+  case OPT_SPECIAL_ignore:
+  case OPT_SPECIAL_program_name:
+  case OPT_SPECIAL_input_file:
+  case OPT_grecord_gcc_switches:
+  case OPT_gno_record_gcc_switches:
+  case OPT__output_pch_:
+  case OPT_fdiagnostics_show_location_:
+  case OPT_fdiagnostics_show_option:
+  case OPT:
+  case OPT__sysroot_:
+   /* Ignore these.  */
+   continue;
+  default:
+   if (save_decoded_options[j].orig_option_with_args_text[0] != '-')
+ continue;
+   switch (save_decoded_options[j].orig_option_with_args_text[1])
+ {
+ case 'M':
+ case 'i':
+ case 'W':
+   continue;
+ case 'n':
+   if (save_decoded_options[j].orig_option_with_args_text[2] == 'o')
+ continue;
+   break;
+ case 'f':
+   if (strncmp (save_decoded_options[j].orig_option_with_args_text
++ 2, dump, 4) == 0)

Re: [build] Remove crt0, mcrt0 support

2011-07-14 Thread Richard Guenther
On Thu, 14 Jul 2011, Rainer Orth wrote:

 Richard,
 
 [Talking about an immediate removal of the netware target...]
 
  Jan Beulich jbeul...@novell.com writes:
 
  Rainer Orth r...@cebitec.uni-bielefeld.de 07/13/11 4:34 PM 
 which variant would you prefer: obsoletion now and removal in 4.8 or
 immediate removal?
 
  Both are fine with me, so unless someone else objects immediate removal
  would seem better given it had been pretty much unmaintained.
 
  Right: it would be a one-time effort to remove the support, but
  subsequent cleanups wouldn't have to deal with the effectively dead
  code.
 
  I had a quick look and it doesn't seem hard: apart from removing the
  netware-specific files in gcc and libgcc (and corresponding gcc/config.gcc
  and libgcc/config.host changes), there's only a small list (apart from
  netware-related target triplets in the testsuite):
 
  config/elf.m4
  configure.ac
  contrib/config-list.mk
  gcc/config/i386/i386.c
  gcc/config/i386/i386.h
  gcc/doc/extend.texi
  libstdc++-v3/crossconfig.m4
 
  configure.ac may have to stay if binutils/src wants to retain the
  report, but that's about it.
 
  Let's see what the release managers/global reviewers think.
 
 what's your take on this?

I'm fine with it if you install a deprecation patch on the 4.6 branch
and mention that in the 4.6 changes.html.

Richard.

-- 
Richard Guenther rguent...@suse.de
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

[Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Daniel Carrera
This patch adds a caf_runtime_error function to the non-MPI 
implementation of Coarray Fortran. It is based on the MPI function of 
the same name in mpi.c.


Ok to commit?

ChangeLog:

2011-07-14  Daniel Carrera  dcarr...@gmail.com

* caf/single.c:  Include stdarg.h header.
(caf_runtime_error): New function based on the function in
mpi.c with the same name.
(_gfortran_caf_init): Use caf_runtime_error.
* caf/mpi.c (caf_runtime_error): Add a note to keep in sync
with the function in single.c.
--
I'm not overweight, I'm undertall.
Index: libgfortran/caf/single.c
===
--- libgfortran/caf/single.c	(revision 176230)
+++ libgfortran/caf/single.c	(working copy)
@@ -28,6 +28,7 @@ see the files COPYING3 and COPYING.RUNTI
 #include <stdio.h>  /* For fputs and fprintf.  */
 #include <stdlib.h> /* For exit and malloc.  */
 #include <string.h> /* For memcpy and memset.  */
+#include <stdarg.h> /* For variadic arguments.  */
 
 /* Define GFC_CAF_CHECK to enable run-time checking.  */
 /* #define GFC_CAF_CHECK  1  */
@@ -40,6 +41,21 @@ see the files COPYING3 and COPYING.RUNTI
 caf_static_t *caf_static_list = NULL;
 
 
+/* Keep in sync with mpi.c.  */
+static void
+caf_runtime_error (int error, const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error.");
+  va_start (ap, message);
+  vfprintf (stderr, message, ap);
+  va_end (ap);
+  fprintf (stderr, "\n");
+
+  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
+  exit (error);
+}
+
 void
 _gfortran_caf_init (int *argc __attribute__ ((unused)),
 		char ***argv __attribute__ ((unused)),
@@ -73,12 +89,12 @@ _gfortran_caf_register (ptrdiff_t size, 
 
   if (unlikely (local == NULL || token == NULL))
 {
+  const char msg[] = "Failed to allocate coarray";
   if (stat)
 	{
 	  *stat = 1;
 	  if (errmsg_len  0)
 	{
-	  const char msg[] = "Failed to allocate coarray";
 	  int len = ((int) sizeof (msg) > errmsg_len) ? errmsg_len
 			  : (int) sizeof (msg);
 	  memcpy (errmsg, msg, len);
@@ -88,10 +104,7 @@ _gfortran_caf_register (ptrdiff_t size, 
 	  return NULL;
 	}
   else
-	{
-	  fprintf (stderr, "ERROR: Failed to allocate coarray");
-	  exit (1);
-	}
+	  caf_runtime_error (1, msg);
 }
 
   if (stat)
Index: libgfortran/caf/mpi.c
===
--- libgfortran/caf/mpi.c	(revision 176230)
+++ libgfortran/caf/mpi.c	(working copy)
@@ -47,6 +47,7 @@ static int caf_is_finalized;
 caf_static_t *caf_static_list = NULL;
 
 
+/* Keep in sync with single.c.  */
 static void
 caf_runtime_error (int error, const char *message, ...)
 {
@@ -62,7 +63,7 @@ caf_runtime_error (int error, const char
   MPI_Abort (MPI_COMM_WORLD, error);
 
   /* Should be unreachable, but to make sure also call exit.  */
-  exit (2);
+  exit (error);
 }
 
 


Re: [build] Move crtfastmath to toplevel libgcc

2011-07-14 Thread Andreas Schwab
H.J. Lu hjl.to...@gmail.com writes:

 On Wed, Jul 13, 2011 at 10:12 AM, Rainer Orth
 r...@cebitec.uni-bielefeld.de wrote:
 Richard Henderson r...@redhat.com writes:

 On 07/13/2011 09:57 AM, Rainer Orth wrote:
 Do you think the revised crtfastmath patch is safe enough to commit
 together to avoid this mess?

 Probably.

 Ok.  I'll take this on me to get us out of this mess.  It has
 survived i386-pc-solaris2.11, sparc-sun-solaris2.11,
 x86_64-unknown-linux-gnu, and i386-apple-darwin9.8.0 bootstraps, so the
 risk seems acceptable.

 +# -frandom-seed is necessary to keep the mangled name of the constructor 
 on
 +# Tru64 Unix stable, but harmless otherwise.

 Instead of implying permanent stability, I'd mention bootstrap comparison
 failures specifically.

 Ok, will do.

 I think your patch caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49739

Same on ia64:

Configuration mismatch!
Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o
Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o

Andreas.

-- 
Andreas Schwab, sch...@redhat.com
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84  5EC7 45C6 250E 6F00 984E
And now for something completely different.


Re: [build] Remove crt0, mcrt0 support

2011-07-14 Thread Rainer Orth
Richard,

 what's your take on this?

 I'm fine with it if you install a deprecation patch on the 4.6 branch
 and mention that in the 4.6 changes.html.

ok, will do.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich
2011/7/14 Richard Guenther richard.guent...@gmail.com:

 Btw, rather than a new target hook and an option I suggest to use
 a --param which default you can modify in the backend.

 Richard.


The introduced target hook does not just define a default value for the option.
It is meant to make the decision in each particular case. For now it
returns a constant value, but march-dependent heuristics will be added
later.
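
[For context, this is what the reassociation width controls, illustrated at the source level — an illustrative sketch using integer arithmetic so the rewrite is exact:]

/* Width 1: a serial chain; each addition waits for the previous one.  */
int sum_serial (int a, int b, int c, int d)
{
  return ((a + b) + c) + d;
}

/* Width 2: the rewritten form exposes two independent additions that a
   superscalar core can issue in the same cycle.  */
int sum_parallel (int a, int b, int c, int d)
{
  return (a + b) + (c + d);
}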

Ilya


Correct fix for scheduler bug PR11320

2011-07-14 Thread Bernd Schmidt
PR11320 was a scheduler problem where an instruction was moved backwards
across a branch, leading to
addl r14 = @ltoffx(.LC2), r1
;;
(p7) addl r14 = 1, r0   --- r14 clobbered
;;
(p7) st4 [r15] = r14
ld8.mov r14 = [r14], .LC2   --- crash
(branch was here)

At the time, this was solved by a patch that recognizes when a register
is conditionally set in a basic block, and then treats the branch as
setting such a register:

http://gcc.gnu.org/ml/gcc-patches/2003-07/msg01044.html

The proposed fix is to say that JUMP_INSNs set the registers that are
live on entry of the fallthrough block and are conditionally set before
the jump, because they can be considered as restoring the former value
of the conditionally set registers.

While it is true that a jump could be seen as setting the correct value
for a register, this isn't at all related to conditional sets. Consider
a trivial example:

a = b + 3;
if (b < 0) { use a } else { use a }

where the value of a is different in each branch of the else, with no
conditional execution in sight. From the point of view of the uses, it
can be argued that the jump sets the value, but this is irrelevant for
scheduling purposes.

There's nothing wrong with using a wrong value in an instruction hoisted
across a branch if the result of that instruction will (eventually) be
dead. We do this all the time in sched-ebb. The patch needlessly
restricts scheduling opportunities and should be reverted.
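
[To make that argument concrete, a small hypothetical C fragment — not taken from the PR:]

int g (int x, int p)
{
  int t;
  if (p)
    t = x * 2;   /* candidate for hoisting above the branch */
  else
    t = x - 1;
  return t;
  /* A scheduler may compute x * 2 before the branch; when the else path
     is taken, that speculative result is dead, so the "wrong" value is
     never observed -- exactly the situation described above.  */
}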

The real problem here is that the ia64 backend lies to the rest of the
compiler; it produces a load instruction without showing a MEM anywhere
in the instruction pattern. Hence, the following patch, which reverts
the bogus scheduler changes and adds a MEM to a pattern in ia64.md. The
representation is somewhat weird, but no more so than before where the
load was represented as a plain LO_SUM. The documentation I have doesn't
mention ld8.mov, so I'll have to leave it to the ia64 maintainers to
choose between this change or implement something more involved.

Bootstrapped and tested without regressions on ia64-linux with
languages=c,c++,obj,fortran and --disable-shared; that was what I could
get to work at all at the moment. Even then, C++ seems pretty broken.
I've also built a 3.3 cross-cc1 to see if the original bug is still
fixed (it is).

Ok (scheduler  ia64 bits)?


Bernd
Revert
2003-07-10  Eric Botcazou  ebotca...@libertysurf.fr

PR optimization/11320
* sched-int.h (struct deps) [reg_conditional_sets]: New field.
(struct sched_info) [compute_jump_reg_dependencies]: New prototype.
* sched-deps.c (sched_analyze_insn) [JUMP_INSN]: Update call to
current_sched_info->compute_jump_reg_dependencies. Record which
registers are used and which registers are set by the jump.
Clear deps->reg_conditional_sets after a barrier.
Set deps->reg_conditional_sets if the insn is a COND_EXEC.
Clear deps->reg_conditional_sets if the insn is not a COND_EXEC.
(init_deps): Initialize reg_conditional_sets.
(free_deps): Clear reg_conditional_sets.
* sched-ebb.c (compute_jump_reg_dependencies): New prototype.
Mark registers live on entry of the fallthrough block and conditionally
set as set by the jump. Mark registers live on entry of non-fallthrough
blocks as used by the jump.
* sched-rgn.c (compute_jump_reg_dependencies): New prototype.
Mark new parameters as unused.

* config/ia64/ia64.md (load_symptr_low): Show a MEM.
* config/ia64/ia64.c (ia64_expand_load_address): Generate it.

Index: gcc/sched-ebb.c
===
--- gcc/sched-ebb.c (revision 176195)
+++ gcc/sched-ebb.c (working copy)
@@ -238,28 +238,18 @@
   return 1;
 }
 
- /* INSN is a JUMP_INSN, COND_SET is the set of registers that are
-conditionally set before INSN.  Store the set of registers that
-must be considered as used by this jump in USED and that of
-registers that must be considered as set in SET.  */
+ /* INSN is a JUMP_INSN.  Store the set of registers that
+must be considered as used by this jump in USED.  */
 
 void
-ebb_compute_jump_reg_dependencies (rtx insn, regset cond_set, regset used,
-  regset set)
+ebb_compute_jump_reg_dependencies (rtx insn, regset used)
 {
   basic_block b = BLOCK_FOR_INSN (insn);
   edge e;
   edge_iterator ei;
 
   FOR_EACH_EDGE (e, ei, b->succs)
-    if (e->flags & EDGE_FALLTHRU)
-  /* The jump may be a by-product of a branch that has been merged
-in the main codepath after being conditionalized.  Therefore
-it may guard the fallthrough block from using a value that has
-conditionally overwritten that of the main codepath.  So we
-consider that it restores the value of the main codepath.  */
-      bitmap_and (set, df_get_live_in (e->dest), 

Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Tobias Burnus

On 07/14/2011 11:48 AM, Daniel Carrera wrote:
This patch adds a caf_runtime_error function to the non-MPI 
implementation of Coarray Fortran. It is based on the MPI function of 
the same name in mpi.c.


Ok to commit?


I was wondering - based on the discussion - whether one should remove 
the int error argument from caf_runtime_error and simply use exit 
(EXIT_FAILURE) for all exit() calls in mpi.c/single.c, cf. 
http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html

But one can also do so as follow up patch.

Thus, OK for the patch as is - or with replacing all exit(...) by 
exit(EXIT_FAILURE), which uses stdlib.h's EXIT_FAILURE. One can then 
also drop the int error argument to the caf_runtime_error function.
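
[A minimal sketch of that simplified variant, assuming the vfprintf-based body of the mpi.c function it mirrors:]

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch only: no "int error" argument, always exit (EXIT_FAILURE).  */
static void
caf_runtime_error (const char *message, ...)
{
  va_list ap;

  fprintf (stderr, "Fortran runtime error.");
  va_start (ap, message);
  vfprintf (stderr, message, ap);
  va_end (ap);
  fprintf (stderr, "\n");

  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
  exit (EXIT_FAILURE);
}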


Thanks for the patch!

Tobias


ChangeLog:

2011-07-14  Daniel Carrera dcarr...@gmail.com

* caf/single.c:  Include stdarg.h header.
(caf_runtime_error): New function based on the function in
mpi.c with the same name.
(_gfortran_caf_init): Use caf_runtime_error.
* caf/mpi.c (caf_runtime_error): Add a note to keep in sync
with the function in single.c.


Re: [Patch, Fortran] Allocate + CAF library

2011-07-14 Thread Daniel Carrera

*ping* ?

On 07/11/2011 08:16 PM, Daniel Carrera wrote:

Hello,

This is my largest patch so far and the first that I'll commit myself.
This patch improves support for the ALLOCATE statement when using the
coarray library. Specifically, it adds support for the stat= and errmsg=
attributes:

ALLOCATE( x(n)[*] , stat=i , errmsg=str )

These attributes are now written by the CAF library. This patch also
involves a good amount of code cleanup.

ChangeLog is attached.

As soon as I get the go-ahead, I'll commit this patch.


Cheers,
Daniel.



--
I'm not overweight, I'm undertall.


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Richard Guenther
On Thu, Jul 14, 2011 at 11:59 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2011/7/14 Richard Guenther richard.guent...@gmail.com:

 Btw, rather than a new target hook and an option I suggest to use
 a --param which default you can modify in the backend.

 Richard.


 Introduced target hook does not just define default value for option.
 It is meant to make decision in each particular case. For now it
 returns a constant value but march dependent heuristics will be added
 later.

But then how come the option to override it is useful?  It isn't dependent
on the particular case.  At least the option should be a --param.

Richard.

 Ilya



Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Daniel Carrera

On 07/14/2011 12:04 PM, Tobias Burnus wrote:

I was wondering - based on the discussion - whether one should remove
the int error argument from caf_runtime_error and simply use exit
(EXIT_FAILURE) for all exit() calls in mpi.c/single.c, cf.
http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html
But one can also do so as follow up patch.


You are the boss. The message I got from Nick's post is that it doesn't 
matter much and that you could even get surprising behaviour because 
EXIT_SUCCESS is not required to be zero and EXIT_FAILURE is not required 
to be non-zero. But maybe I missed the point. So it's up to you.


Cheers,
Daniel.
--
I'm not overweight, I'm undertall.


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich
 One minor note, you will need to update doc/invoke.texi to document the new
 switch before checkin: -ftree-reassoc-width=n

 --
 Michael Meissner, IBM

Thanks for the note! Here is fixed patch.

Ilya
--
gcc/

2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): New hook documentation.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (ftree-reassoc-width): New option documented.

* hooks.h (hook_int_const_gimple_1): New default hook.

* hooks.c (hook_int_const_gimple_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* common.opt (ftree-reassoc-width): New option added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
-ftree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.


PR44382.diff
Description: Binary data


Re: [build] Move crtfastmath to toplevel libgcc

2011-07-14 Thread Rainer Orth
Andreas Schwab sch...@redhat.com writes:

 Same on ia64:

 Configuration mismatch!
 Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o
 Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o 
 crtfastmath.o

I believe you need the following patch.

As one of my next tasks, I'll work to completely move
crtstuff/extra_parts/EXTRA_PARTS/EXTRA_MULTILIB_PARTS over to libgcc so
there's no duplication of configuration between gcc and libgcc.

If this tests ok, I suppose it can go in as obvious.

Rainer


2011-07-14  Rainer Orth  r...@cebitec.uni-bielefeld.de

Revert:
* config.gcc (ia64*-*-elf*): Remove crtfastmath.o from extra_parts.
(ia64*-*-freebsd*): Likewise.
(ia64*-*-linux*): Likewise.
(mips64*-*-linux*): Likewise.
(mips*-*-linux*): Likewise.

Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 176266)
+++ gcc/config.gcc  (working copy)
@@ -1594,13 +1594,13 @@
then
target_cpu_default=${target_cpu_default}|MASK_GNU_LD
fi
-   extra_parts=crtbegin.o crtend.o crtbeginS.o crtendS.o
+   extra_parts=crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o
;;
 ia64*-*-freebsd*)
tm_file=${tm_file} dbxelf.h elfos.h ${fbsd_tm_file} ia64/sysv4.h 
ia64/freebsd.h
target_cpu_default=MASK_GNU_AS|MASK_GNU_LD
tmake_file=${tmake_file} ia64/t-ia64
-   extra_parts=crtbegin.o crtend.o crtbeginS.o crtendS.o
+   extra_parts=crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o
;;
 ia64*-*-linux*)
tm_file=${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h 
ia64/sysv4.h ia64/linux.h
@@ -1609,7 +1609,7 @@
tmake_file=${tmake_file} t-libunwind-elf 
ia64/t-glibc-libunwind
fi
target_cpu_default=MASK_GNU_AS|MASK_GNU_LD
-   extra_parts=crtbegin.o crtend.o crtbeginS.o crtendS.o
+   extra_parts=crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o
;;
 ia64*-*-hpux*)
tm_file=${tm_file} dbxelf.h elfos.h ia64/sysv4.h ia64/hpux.h
@@ -1849,6 +1849,7 @@
tm_defines=${tm_defines} MIPS_ISA_DEFAULT=65
;;
esac
+   extra_parts=$extra_parts crtfastmath.o
gnu_ld=yes
gas=yes
test x$with_llsc != x || with_llsc=yes
@@ -1868,6 +1869,7 @@
 mipsisa32*)
tm_defines=${tm_defines} MIPS_ISA_DEFAULT=32
 esac
+   extra_parts=$extra_parts crtfastmath.o
test x$with_llsc != x || with_llsc=yes
;;
 mips*-*-openbsd*)


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich
2011/7/14 Richard Guenther richard.guent...@gmail.com:

 But then how comes the option to override it is useful?  It isn't dependent
 on the particular case.  At least the option should be a --param.

 Richard.


The option is still useful if you want to try the feature on a platform with
no hook implemented, and for other performance experiments. I agree
--param usage would be better here. I'll fix it.

Ilya


Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Tobias Burnus

On 07/14/2011 12:14 PM, Daniel Carrera wrote:

On 07/14/2011 12:04 PM, Tobias Burnus wrote:

I was wondering - based on the discussion - whether one should remove
the int error argument from caf_runtime_error and simply use exit
(EXIT_FAILURE) for all exit() calls in mpi.c/single.c, cf.
http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html
But one can also do so as follow up patch.


You are the boss. The message I got from Nick's post is that it 
doesn't matter much and that you could even get surprising behaviour 
because EXIT_SUCCESS is not required to be zero and EXIT_FAILURE is 
not required to be non-zero. But maybe I missed the point. So it's up 
to you.


Well, for exit(3) one knows that one will get a surprise on Windows: 
an abort. By contrast, EXIT_FAILURE [but also exit(1)] has a very high chance 
of working (nearly) everywhere. In the end, as long as the operating 
system does not react in a completely odd way, the exit status does not 
matter - it's the problem of the caller (e.g. some script) to make use 
of the result. If EXIT_FAILURE is 0 and the script assumes that it is a 
success, it's either a fault of the script writer, or of the one putting 
0 for EXIT_FAILURE into the header, or a fault of the OS designer.


Thus, I think the chance that one avoids surprises is higher with 
exit(EXIT_FAILURE) and one avoids having some random number (for the 
exit status) in the code. Users usually do not check the exact exit 
status value but only its 0- vs non-0-ness.


Hence, I think exit(EXIT_FAILURE) is better than the current version.

Tobias


[PATCH] Only insert required conversions during gimplification

2011-07-14 Thread Richard Guenther

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2011-07-14  Richard Guenther  rguent...@suse.de

* gimplify.c (gimplify_expr): Only do required conversions.

Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 176266)
+++ gcc/gimplify.c  (working copy)
@@ -6787,22 +6787,20 @@ gimplify_expr (tree *expr_p, gimple_seq
 
case TRUTH_NOT_EXPR:
  {
-   tree org_type = TREE_TYPE (*expr_p);
-
+   tree orig_type = TREE_TYPE (*expr_p);
*expr_p = gimple_boolify (*expr_p);
-   if (org_type != boolean_type_node)
+   if (!useless_type_conversion_p (orig_type, TREE_TYPE (*expr_p)))
  {
-   *expr_p = fold_convert (org_type, *expr_p);
+   *expr_p = fold_convert_loc (saved_location, orig_type, *expr_p);
ret = GS_OK;
break;
  }
+   ret = gimplify_expr (TREE_OPERAND (*expr_p, 0), pre_p, post_p,
+is_gimple_val, fb_rvalue);
+   recalculate_side_effects (*expr_p);
+   break;
  }
 
- ret = gimplify_expr (TREE_OPERAND (*expr_p, 0), pre_p, post_p,
-  is_gimple_val, fb_rvalue);
- recalculate_side_effects (*expr_p);
- break;
-
case ADDR_EXPR:
  ret = gimplify_addr_expr (expr_p, pre_p, post_p);
  break;
@@ -7227,40 +7225,36 @@ gimplify_expr (tree *expr_p, gimple_seq
case TRUTH_OR_EXPR:
case TRUTH_XOR_EXPR:
  {
-   tree org_type = TREE_TYPE (*expr_p);
-   
+   tree orig_type = TREE_TYPE (*expr_p);
*expr_p = gimple_boolify (*expr_p);
-
-   /* This shouldn't happen, but due fold-const (and here especially
-  fold_truth_not_expr) happily uses operand type and doesn't
-  automatically uses boolean_type as result, we need to keep
-  orignal type.  */
-   if (org_type != boolean_type_node)
+   if (!useless_type_conversion_p (orig_type, TREE_TYPE (*expr_p)))
  {
-   *expr_p = fold_convert (org_type, *expr_p);
+   *expr_p = fold_convert_loc (saved_location, orig_type, *expr_p);
ret = GS_OK;
break;
  }
- }
 
- /* With two-valued operand types binary truth expressions are
-    semantically equivalent to bitwise binary expressions.  Canonicalize
-    them to the bitwise variant.  */
- switch (TREE_CODE (*expr_p))
- {
- case TRUTH_AND_EXPR:
-   TREE_SET_CODE (*expr_p, BIT_AND_EXPR);
-   break;
- case TRUTH_OR_EXPR:
-   TREE_SET_CODE (*expr_p, BIT_IOR_EXPR);
-   break;
- case TRUTH_XOR_EXPR:
-   TREE_SET_CODE (*expr_p, BIT_XOR_EXPR);
-   break;
- default:
-   break;
+ /* Boolified binary truth expressions are semantically equivalent
+to bitwise binary expressions.  Canonicalize them to the
+bitwise variant.  */
+   switch (TREE_CODE (*expr_p))
+ {
+ case TRUTH_AND_EXPR:
+   TREE_SET_CODE (*expr_p, BIT_AND_EXPR);
+   break;
+ case TRUTH_OR_EXPR:
+   TREE_SET_CODE (*expr_p, BIT_IOR_EXPR);
+   break;
+ case TRUTH_XOR_EXPR:
+   TREE_SET_CODE (*expr_p, BIT_XOR_EXPR);
+   break;
+ default:
+   break;
+ }
+
+   /* Continue classified as tcc_binary.  */
+   goto expr_2;
  }
- /* Classified as tcc_expression.  */
- goto expr_2;
 
case FMA_EXPR:
  /* Classified as tcc_expression.  */
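
[As a sanity check on the canonicalization above — truth ops on already-boolean (0/1) operands behave exactly like the bitwise ops they are rewritten to — a small self-contained test:]

#include <assert.h>

int main (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      {
        assert ((a && b) == (a & b));      /* TRUTH_AND_EXPR vs BIT_AND_EXPR */
        assert ((a || b) == (a | b));      /* TRUTH_OR_EXPR  vs BIT_IOR_EXPR */
        assert ((!a != !b) == (a ^ b));    /* TRUTH_XOR_EXPR vs BIT_XOR_EXPR */
      }
  return 0;
}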


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Andrey Belevantsev

Hello Bernd,

FWIW, we have discussed your change with Alexander and we think you are 
right about the scheduler changes.  One question is:


On 14.07.2011 14:03, Bernd Schmidt wrote:

--- gcc/sched-deps.c(revision 176195)
+++ gcc/sched-deps.c(working copy)
@@ -568,7 +568,7 @@
  (rev1==rev2
  ? reversed_comparison_code (cond2, NULL)
  : GET_CODE (cond2))
-       && XEXP (cond1, 0) == XEXP (cond2, 0)
+       && rtx_equal_p (XEXP (cond1, 0), XEXP (cond2, 0))
        && XEXP (cond1, 1) == XEXP (cond2, 1))
 return 1;
   return 0;


this hunk from conditions_mutex_p seems to be unrelated?

Andrey



Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Bernd Schmidt
On 07/14/11 13:20, Andrey Belevantsev wrote:
 Hello Bernd,
 
 FWIW, we have discussed your change with Alexander and we think you are
 right about the scheduler changes.  One question is:
 
 On 14.07.2011 14:03, Bernd Schmidt wrote:
 --- gcc/sched-deps.c(revision 176195)
 +++ gcc/sched-deps.c(working copy)
 @@ -568,7 +568,7 @@
(rev1==rev2
? reversed_comparison_code (cond2, NULL)
: GET_CODE (cond2))
 -       && XEXP (cond1, 0) == XEXP (cond2, 0)
 +       && rtx_equal_p (XEXP (cond1, 0), XEXP (cond2, 0))
         && XEXP (cond1, 1) == XEXP (cond2, 1))
  return 1;
return 0;
 
 this hunk from conditions_mutex_p seems to be unrelated?

Oh yes, sorry about that. That was approved a while ago and I haven't
gotten around to checking it in.


Bernd



[patch] fix typo in doc/extend.texi, committed as obvious

2011-07-14 Thread Matthias Klose
fix a typo in doc/extend.texi, committed as obvious.

  Matthias


2011-07-14  Matthias Klose d...@ubuntu.com

* doc/extend.texi (optimize attribute): Fix typo.

Index: doc/extend.texi
===
--- doc/extend.texi (revision 176263)
+++ doc/extend.texi (working copy)
@@ -3594,7 +3594,7 @@
 @item cpu=@var{CPU}
 @cindex @code{target(cpu=@var{CPU})} attribute
 Specify the architecture to generate code for when compiling the
-function.  If you select the @code{target(cpu=power7)} attribute when
+function.  If you select the @code{target(cpu=power7)} attribute when
 generating 32-bit code, VSX and Altivec instructions are not generated
 unless you use the @option{-mabi=altivec} option on the command line.
 


Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr

2011-07-14 Thread Richard Guenther
On Wed, Jul 13, 2011 at 2:15 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 1:08 PM, Kai Tietz ktiet...@googlemail.com wrote:
 2011/7/13 Richard Guenther richard.guent...@gmail.com:
 On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com 
 wrote:
 Sorry, the TRUTH_NOT_EXPR isn't the point here at all. The underlying
 issue is that fold-const re-introduces TRUTH_AND/OR and co.

 I'm very sure it doesn't re-construct TRUTH_ variants out of thin air
 when you present it with BIT_ variants as input.

 Well, look into fold-const's fold_binary_loc function and see

  /* ARG0 is the first operand of EXPR, and ARG1 is the second operand.

     First check for cases where an arithmetic operation is applied to a
     compound, conditional, or comparison operation.  Push the arithmetic
     operation inside the compound or conditional to see if any folding
     can then be done.  Convert comparison to conditional for this purpose.
     The also optimizes non-constant cases that used to be done in
     expand_expr.

     Before we do that, see if this is a BIT_AND_EXPR or a BIT_IOR_EXPR,
     one of the operands is a comparison and the other is a comparison, a
     BIT_AND_EXPR with the constant 1, or a truth value.  In that case, the
     code below would make the expression more complex.  Change it to a
     TRUTH_{AND,OR}_EXPR.  Likewise, convert a similar NE_EXPR to
     TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a TRUTH_XOR_EXPR.  */

  if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR
       || code == EQ_EXPR || code == NE_EXPR)
       && ((truth_value_p (TREE_CODE (arg0))
            && (truth_value_p (TREE_CODE (arg1))
                || (TREE_CODE (arg1) == BIT_AND_EXPR
                    && integer_onep (TREE_OPERAND (arg1, 1)))))
           || (truth_value_p (TREE_CODE (arg1))
               && (truth_value_p (TREE_CODE (arg0))
                   || (TREE_CODE (arg0) == BIT_AND_EXPR
                       && integer_onep (TREE_OPERAND (arg0, 1)))))))
    {
      tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR
                         : code == BIT_IOR_EXPR ? TRUTH_OR_EXPR
                         : TRUTH_XOR_EXPR,
                         boolean_type_node,
                         fold_convert_loc (loc, boolean_type_node, arg0),
                         fold_convert_loc (loc, boolean_type_node, arg1));

      if (code == EQ_EXPR)
        tem = invert_truthvalue_loc (loc, tem);

      return fold_convert_loc (loc, type, tem);
    }

 Here TRUTH_AND/TRUTH_OR gets introduced unconditionally if the operands
 are of truth kind.  This is, btw, the point why you see those
 cases handled.  But as soon as this part is turned off for
 BIT_IOR/AND, we need to do the folding for the 1-bit precision case explicitly.

 First of all this checks for a quite complex pattern - where do we pass
 such a complex pattern from the gimple level to fold?  For the EQ/NE_EXPR
 case forwprop probably might be able to feed it that, but then how does
 it go wrong?  The above could also simply be guarded by !in_gimple_form.

 Richard.

 See reassoc pass as example

I don't see where.

 and this hacky maybe_fold_and_comparisons
 / maybe_fold_or_comparisons functions.

Yes, that case is known - but unless the folding result is sane we
drop it anyway.

   As indeed we still want to be
 able to do comparison foldings without getting back a TRUTH op.

Well, if we get back one we have canonicalize_cond_expr_cond to
make it sane again.

 Additionally we have a lot of passes - like the vectorizer - which
 happily try to build new conditions on the tree level.  This is another

That's true and you promised to clean up remaining TRUTH_*
consumers / producers when doing the canonicalization to BIT_*.
At the moment they get back to BIT_* via force_gimple_operand*
which is probably even fine.

 place I saw issues and tree-cfg failures. And last but not least those
 truth ops might be reintroduced in gimple_fold, as soon as we treat
 bitwise ops on one-bit precision integral types as truth values.

Well, I already said that one-bit-precision as truth_value is wrong.

Relying on fold-const.c here with its very strict type rules always
was a stretch.  If it turns out there are now cases that we want to
do differently then we finally have to bite the bullet and do it.
There are always two options - duplicate it properly in fold_stmt
(or forwprop in case it needs to look up defs), or remove it from fold
and make sure we fold things using fold_stmt at gimplification time.

Richard.

 Kai



Re: [patch] fix typo in doc/extend.texi, committed as obvious

2011-07-14 Thread Richard Guenther
On Thu, Jul 14, 2011 at 1:41 PM, Matthias Klose d...@ubuntu.com wrote:
 fix a typo in doc/extend.texi, committed as obvious.

 Specify the architecture to generate code for when compiling the
-function.  If you select the @code{target(cpu=power7)} attribute when
+function.  If you select the @code{target(cpu=power7)} attribute when

still mismatched number of ''s.  I suggest to drop the first one.

Richard.

  Matthias





Re: RFA: Avoid unnecessary clearing in union initialisers

2011-07-14 Thread Richard Sandiford
H.J. Lu hjl.to...@gmail.com writes:
 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49736

Sorry for the breakage.  It was due to a humiliating stupid mistake
in the hunk to update all_zeros_p:

@@ -5129,13 +5152,12 @@ mostly_zeros_p (const_tree exp)
 all_zeros_p (const_tree exp)
 {
   if (TREE_CODE (exp) == CONSTRUCTOR)
-
 {
-  HOST_WIDE_INT nz_elts, count;
-  bool must_clear;
+  HOST_WIDE_INT nz_elts, init_elts;
+  bool complete_p;
 
-  categorize_ctor_elements (exp, &nz_elts, &count, &must_clear);
-  return nz_elts == 0;
+  categorize_ctor_elements (exp, &nz_elts, &init_elts, &complete_p);
+  return nz_elts == init_elts;
 }
 
   return initializer_zerop (exp);

which was due to trying a different approach and not properly reverting it.

I've applied the following as obvious.  Tested on x86_64-linux-gnu,
and on 255.vortex.

Richard


gcc/
PR middle-end/49736
* expr.c (all_zeros_p): Undo bogus part of last change.

Index: gcc/expr.c
===
--- gcc/expr.c  2011-07-14 11:46:42.0 +0100
+++ gcc/expr.c  2011-07-14 11:47:40.0 +0100
@@ -5157,7 +5157,7 @@ all_zeros_p (const_tree exp)
   bool complete_p;
 
   categorize_ctor_elements (exp, &nz_elts, &init_elts, &complete_p);
-  return nz_elts == init_elts;
+  return nz_elts == 0;
 }
 
   return initializer_zerop (exp);


Re: Ping: The TI C6X port

2011-07-14 Thread Bernd Schmidt
On 06/14/11 22:52, Vladimir Makarov wrote:
 On 06/06/2011 07:26 AM, Bernd Schmidt wrote:
 Ping^3 for the C6X port. Now with extra patches:

 Additional preliminary scheduler tweaks:
 http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02408.html

 It is ok for me.  Thanks, Bernd.

I've committed this with one change: I've dropped the hunk that resolves
debug_insn dependencies rather than deleting them. We should do that,
but when I tested these changes together with the backtracking
scheduler, I saw some problems, so I'll postpone this.


Bernd


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Eric Botcazou
 The real problem here is that the ia64 backend lies to the rest of the
 compiler; it produces a load instruction without showing a MEM anywhere
 in the instruction pattern. Hence, the following patch, which reverts
 the bogus scheduler changes and adds a MEM to a pattern in ia64.md.

This is probably the root cause of the problem, indeed.  But you don't revert 
everything so, if this isn't an oversight, then the ChangeLog is incorrect.
And there is another change in sched-deps.c not mentioned in the ChangeLog.

-- 
Eric Botcazou


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Bernd Schmidt
On 07/14/11 13:57, Eric Botcazou wrote:
 The real problem here is that the ia64 backend lies to the rest of the
 compiler; it produces a load instruction without showing a MEM anywhere
 in the instruction pattern. Hence, the following patch, which reverts
 the bogus scheduler changes and adds a MEM to a pattern in ia64.md.
 
 This is probably the root cause of the problem, indeed.  But you don't revert 
 everything so, if this isn't an oversight, then the ChangeLog is incorrect.
 And there is another change in sched-deps.c not mentioned in the ChangeLog.

Well, the actual code has completely changed in the meantime. All the
hunks of the original patch failed :) I can write a new ChangeLog entry
if that seems important.

Any particular bits you still see that don't get reverted with this patch?


Bernd


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Bernd Schmidt
On 07/14/11 14:18, Eric Botcazou wrote:
 Any particular bits you still see that don't get reverted with this patch?
 
 The ebb_compute_jump_reg_dependencies changes.  The original patch has:
 
   * sched-ebb.c (compute_jump_reg_dependencies): New prototype.
   Mark registers live on entry of the fallthrough block and conditionally
   set as set by the jump. Mark registers live on entry of non-fallthrough
   blocks as used by the jump.
 
 but you're reverting only:
 
   * sched-ebb.c (compute_jump_reg_dependencies): New prototype.
   Mark registers live on entry of the fallthrough block and conditionally
   set as set by the jump.
 

??? Original code:

   basic_block b = BLOCK_FOR_INSN (insn);
edge e;
for (e = b->succ; e; e = e->succ_next)
! if ((e->flags & EDGE_FALLTHRU) == 0)
!   {
!   bitmap_operation (set, set, e->dest->global_live_at_start,
! BITMAP_IOR);
!   }
  }

Code after the revert:

   FOR_EACH_EDGE (e, ei, b->succs)
+    if ((e->flags & EDGE_FALLTHRU) == 0)
       bitmap_ior_into (used, df_get_live_in (e->dest));

As far as I can tell these are identical, modulo the change in variable
name (set - used which seems like a better name).


Bernd


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Eric Botcazou
 Any particular bits you still see that don't get reverted with this patch?

The ebb_compute_jump_reg_dependencies changes.  The original patch has:

* sched-ebb.c (compute_jump_reg_dependencies): New prototype.
Mark registers live on entry of the fallthrough block and conditionally
set as set by the jump. Mark registers live on entry of non-fallthrough
blocks as used by the jump.

but you're reverting only:

* sched-ebb.c (compute_jump_reg_dependencies): New prototype.
Mark registers live on entry of the fallthrough block and conditionally
set as set by the jump.

-- 
Eric Botcazou


[PATCH] Fold to more canonical TRUHT_NOT_EXPRs

2011-07-14 Thread Richard Guenther

The following generates boolean-typed TRUTH_NOT_EXPRs instead of
operating on converted boolean operands.
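
For illustration (hypothetical, not from the patch or its testsuite): when
bool_var != 1 is folded to a result of type int, the previous code produced

  !(int) bool_var        /* TRUTH_NOT_EXPR built in the int result type */

while with this change the negation is built as a boolean TRUTH_NOT_EXPR on
bool_var itself and only the result is converted:

  (int) !bool_var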

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-07-14  Richard Guenther  rguent...@suse.de

* fold-const.c (fold_binary_loc): Convert the !bool_var result,
not bool_var when folding bool_var != 1 or bool_var == 0.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 176266)
+++ gcc/fold-const.c(working copy)
@@ -12156,14 +12156,16 @@ fold_binary_loc (location_t loc,
   /* bool_var != 1 becomes !bool_var. */
   if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_onep (arg1)
       && code == NE_EXPR)
-return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
-   fold_convert_loc (loc, type, arg0));
+return fold_convert_loc (loc, type,
+fold_build1_loc (loc, TRUTH_NOT_EXPR,
+ TREE_TYPE (arg0), arg0));
 
   /* bool_var == 0 becomes !bool_var. */
   if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_zerop (arg1)
       && code == EQ_EXPR)
-return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
-   fold_convert_loc (loc, type, arg0));
+return fold_convert_loc (loc, type,
+fold_build1_loc (loc, TRUTH_NOT_EXPR,
+ TREE_TYPE (arg0), arg0));
 
   /* !exp != 0 becomes !exp */
   if (TREE_CODE (arg0) == TRUTH_NOT_EXPR && integer_zerop (arg1)


Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions

2011-07-14 Thread Andreas Krebbel
On Wed, Jul 13, 2011 at 09:58:08AM -0700, Richard Henderson wrote:
 Why the force_operand?  You've got register inputs.  Either the target
 is going to support the operation or it isn't.

I agree that it doesn't seem to be necessary. I've used force_operand
since ivopts (add_cost) is doing it without seeing a clear reason for
it.  So I've removed it now.

 Saving cost data dependent on speed, which is non-constant.
 You probably need to make this a two dimensional array.

Fixed.

Here is an updated version.

Bye,

-Andreas-

2011-07-14  Andreas Krebbel  andreas.kreb...@de.ibm.com

* tree-ssa-math-opts.c (compute_costs): New function.
(convert_mult_to_fma): Take costs into account when propagating
multiplications into several additions.
* config/s390/s390.c (z196_costs): Adjust costs for madbr and
maebr.


Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c.orig
--- gcc/tree-ssa-math-opts.c
*** convert_plusminus_to_widen (gimple_stmt_
*** 2185,2190 
--- 2185,2236 
return true;
  }
  
+ /* Computing the costs for calculating RTX with CODE in MODE.  */
+ 
+ static unsigned
+ compute_costs (enum machine_mode mode, enum rtx_code code, bool speed)
+ {
+   rtx insn;
+   unsigned cost;
+ 
+   switch (GET_RTX_LENGTH (code))
+ {
+ case 2:
+   insn = gen_rtx_fmt_ee (code, mode,
+gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
+gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2));
+   break;
+ case 3:
+   insn = gen_rtx_fmt_eee (code, mode,
+ gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1),
+ gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2),
+ gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3));
+   break;
+ default:
+   gcc_unreachable ();
+ }
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "Calculating costs of %s in %s mode.  RTX is:\n",
+  GET_RTX_NAME (code), GET_MODE_NAME (mode));
+   print_rtl (dump_file, insn);
+ }
+ 
+   cost = rtx_cost (insn, SET, speed);
+ 
+   /* If the backend returns a cost of zero it is most certainly lying.
+  Set this to one in order to notice that we already calculated it
+  once.  */
+   cost = cost ? cost : 1;
+ 
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "\n%s in %s costs %d\n\n",
+  GET_RTX_NAME (code), GET_MODE_NAME (mode), cost);
+ 
+   return cost;
+ }
+ 
  /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2
 with uses in additions and subtractions to form fused multiply-add
 operations.  Returns true if successful and MUL_STMT should be removed.  */
*** convert_mult_to_fma (gimple mul_stmt, tr
*** 2197,2202 
--- 2243,2254 
gimple use_stmt, neguse_stmt, fma_stmt;
use_operand_p use_p;
imm_use_iterator imm_iter;
+   enum machine_mode mode;
+   int uses = 0;
+   bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt));
+   static unsigned mul_cost[2][NUM_MACHINE_MODES];
+   static unsigned add_cost[2][NUM_MACHINE_MODES];
+   static unsigned fma_cost[2][NUM_MACHINE_MODES];
  
if (FLOAT_TYPE_P (type)
       && flag_fp_contract_mode == FP_CONTRACT_OFF)
*** convert_mult_to_fma (gimple mul_stmt, tr
*** 2213, 
if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
  return false;
  
/* Make sure that the multiplication statement becomes dead after
!  the transformation, thus that all uses are transformed to FMAs.
!  This means we assume that an FMA operation has the same cost
!  as an addition.  */
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result)
  {
enum tree_code use_code;
--- 2265,2281 
if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
  return false;
  
+   mode = TYPE_MODE (type);
+ 
+   if (!fma_cost[speed][mode])
+ {
+   fma_cost[speed][mode] = compute_costs (mode, FMA, speed);
+   add_cost[speed][mode] = compute_costs (mode, PLUS, speed);
+   mul_cost[speed][mode] = compute_costs (mode, MULT, speed);
+ }
+ 
/* Make sure that the multiplication statement becomes dead after
!  the transformation, thus that all uses are transformed to FMAs.  */
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result)
  {
enum tree_code use_code;
*** convert_mult_to_fma (gimple mul_stmt, tr
*** 2292,2297 
--- 2351,2357 
if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt))
return false;
  
+   uses++;
/* While it is possible to validate whether or not the exact form
 that we've recognized is available in the backend, the assumption
 is that the transformation is never a loss.  For instance, suppose

[PATCH] More efficiently canoncialize operands in fold_stmt

2011-07-14 Thread Richard Guenther

We currently go through a tree expression generation when canonicalizing
operand order.  That's quite wasteful.  The following patch re-orders
things in a way to avoid this and remove some duplicate code.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2011-07-14  Richard Guenther  rguent...@suse.de

* gimple-fold.c (fold_gimple_assign): Remove operand swapping.
(fold_stmt_1): Do it here directly on gimple and as a first thing.

Index: gcc/gimple-fold.c
===
*** gcc/gimple-fold.c   (revision 176267)
--- gcc/gimple-fold.c   (working copy)
*** fold_gimple_assign (gimple_stmt_iterator
*** 817,842 
  
if (!result)
  result = fold_binary_loc (loc, subcode,
!   TREE_TYPE (gimple_assign_lhs (stmt)),
!   gimple_assign_rhs1 (stmt),
!   gimple_assign_rhs2 (stmt));
  
if (result)
  {
STRIP_USELESS_TYPE_CONVERSION (result);
if (valid_gimple_rhs_p (result))
return result;
- 
- /* Fold might have produced non-GIMPLE, so if we trust it blindly
-we lose canonicalization opportunities.  Do not go again
-through fold here though, or the same non-GIMPLE will be
-produced.  */
-   if (commutative_tree_code (subcode)
-       && tree_swap_operands_p (gimple_assign_rhs1 (stmt),
-                                gimple_assign_rhs2 (stmt), false))
- return build2 (subcode, TREE_TYPE (gimple_assign_lhs (stmt)),
-gimple_assign_rhs2 (stmt),
-gimple_assign_rhs1 (stmt));
  }
break;
  
--- 817,831 
  
if (!result)
  result = fold_binary_loc (loc, subcode,
! TREE_TYPE (gimple_assign_lhs (stmt)),
! gimple_assign_rhs1 (stmt),
! gimple_assign_rhs2 (stmt));
  
if (result)
  {
STRIP_USELESS_TYPE_CONVERSION (result);
if (valid_gimple_rhs_p (result))
return result;
  }
break;
  
*** fold_gimple_assign (gimple_stmt_iterator
*** 852,869 
STRIP_USELESS_TYPE_CONVERSION (result);
if (valid_gimple_rhs_p (result))
return result;
- 
- /* Fold might have produced non-GIMPLE, so if we trust it blindly
-we lose canonicalization opportunities.  Do not go again
-through fold here though, or the same non-GIMPLE will be
-produced.  */
-   if (commutative_ternary_tree_code (subcode)
-       && tree_swap_operands_p (gimple_assign_rhs1 (stmt),
-                                gimple_assign_rhs2 (stmt), false))
- return build3 (subcode, TREE_TYPE (gimple_assign_lhs (stmt)),
-  gimple_assign_rhs2 (stmt),
-  gimple_assign_rhs1 (stmt),
-  gimple_assign_rhs3 (stmt));
  }
break;
  
--- 841,846 
*** fold_stmt_1 (gimple_stmt_iterator *gsi,
*** 1576,1583 
  case GIMPLE_ASSIGN:
{
unsigned old_num_ops = gimple_num_ops (stmt);
!   tree new_rhs = fold_gimple_assign (gsi);
tree lhs = gimple_assign_lhs (stmt);
if (new_rhs
     && !useless_type_conversion_p (TREE_TYPE (lhs),
   TREE_TYPE (new_rhs)))
--- 1553,1574 
  case GIMPLE_ASSIGN:
{
unsigned old_num_ops = gimple_num_ops (stmt);
!   enum tree_code subcode = gimple_assign_rhs_code (stmt);
tree lhs = gimple_assign_lhs (stmt);
+   tree new_rhs;
+   /* First canonicalize operand order.  This avoids building new
+  trees if this is the only thing fold would later do.  */
+   if ((commutative_tree_code (subcode)
+        || commutative_ternary_tree_code (subcode))
+       && tree_swap_operands_p (gimple_assign_rhs1 (stmt),
+                                gimple_assign_rhs2 (stmt), false))
+ {
+   tree tem = gimple_assign_rhs1 (stmt);
+   gimple_assign_set_rhs1 (stmt, gimple_assign_rhs2 (stmt));
+   gimple_assign_set_rhs2 (stmt, tem);
+   changed = true;
+ }
+   new_rhs = fold_gimple_assign (gsi);
if (new_rhs
     && !useless_type_conversion_p (TREE_TYPE (lhs),
   TREE_TYPE (new_rhs)))


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Eric Botcazou
 ??? Original code:

basic_block b = BLOCK_FOR_INSN (insn);
 edge e;
 for (e = b->succ; e; e = e->succ_next)
 ! if ((e->flags & EDGE_FALLTHRU) == 0)
 !   {
 ! bitmap_operation (set, set, e->dest->global_live_at_start,
 !   BITMAP_IOR);
 !   }
   }

 Code after the revert:

 FOR_EACH_EDGE (e, ei, b->succs)
 +    if ((e->flags & EDGE_FALLTHRU) == 0)
        bitmap_ior_into (used, df_get_live_in (e->dest));

 As far as I can tell these are identical, modulo the change in variable
 name (set - used which seems like a better name).

Yes, the code does the same thing, but the original patch did clear up the
set/use confusion in sched_analyze_insn and compute_jump_reg_dependencies,
in particular in the comment of the latter.  But OK, never mind.

-- 
Eric Botcazou


Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching

2011-07-14 Thread Andrew Stubbs
This update changes only the context modified by changes to patch 2. The 
patch has already been approved. I'm just posting it for completeness.


Andrew
2011-07-14  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single
	conversion statement separating multiply-and-accumulate.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+ int bc = b * c;
+return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler mul } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler umlal } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			enum tree_code code)
 {
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
+  gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt;
   tree type, type1, type2, tmp;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
@@ -2177,6 +2178,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   else
 return false;
 
+  /* Allow for one conversion statement between the multiply
+ and addition/subtraction statement.  If there are more than
+ one conversions then we assume they would invalidate this
+ transformation.  If that's not the case then they should have
+ been folded before now.  */
+  if (CONVERT_EXPR_CODE_P (rhs1_code))
+{
+  conv1_stmt = rhs1_stmt;
+  rhs1 = gimple_assign_rhs1 (rhs1_stmt);
+  if (TREE_CODE (rhs1) == SSA_NAME)
+	{
+	  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+	  if (is_gimple_assign (rhs1_stmt))
+	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+	}
+  else
+	return false;
+}
+  if (CONVERT_EXPR_CODE_P (rhs2_code))
+{
+  conv2_stmt = rhs2_stmt;
+  rhs2 = gimple_assign_rhs1 (rhs2_stmt);
+  if (TREE_CODE (rhs2) == SSA_NAME)
+	{
+	  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+	  if (is_gimple_assign (rhs2_stmt))
+	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+	}
+  else
+	return false;
+}
+
   /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call
  is_widening_mult_p, but we still need the rhs returns.
 
@@ -2190,6 +2223,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			   type2, mult_rhs2))
 	return false;
   add_rhs = rhs2;
+  conv_stmt = conv1_stmt;
 }
   else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR)
 {
@@ -2197,6 +2231,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
 			   type2, mult_rhs2))
 	return false;
   add_rhs = rhs1;
+  conv_stmt = conv2_stmt;
 }
   else
 return false;
@@ -2207,6 +2242,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
 return false;
 
+  /* If there was a conversion between the multiply and addition
+ then we need to make sure it fits a multiply-and-accumulate.
+     There should be a single mode change which does not change the
+ value.  */
+  if (conv_stmt)
+{
+  tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt));
+  tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt));
+  int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+  bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+  if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is a truncate.  */
+	  if (TYPE_PRECISION (to_type) < data_size)
+	    return false;
+	}
+  else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type))
+	{
+	  /* Conversion is an extend.  Check it's the right sort.  */
+	  if (TYPE_UNSIGNED (from_type) != is_unsigned
+	      && !(is_unsigned && TYPE_PRECISION (from_type) > data_size))
+	return false;
+	}
+  /* else convert is a no-op for our purposes.  */
+}
+
   /* Verify that the machine can perform a widening multiply
  accumulate in this mode/signedness combination, otherwise
  this transformation is likely to pessimize code.  */


Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs

2011-07-14 Thread Andrew Stubbs
I've updated this patch following the changes earlier in the patch 
series. There isn't much left.


This should obviate all the review comments. :)

OK?

Andrew
2011-07-14  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME.
	Ensure that the larger type is the first operand.

	gcc/testsuite/
	* gcc.target/arm/wmul-7.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+unsigned long long
+foo (unsigned long long a, unsigned char *b, unsigned short *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler umlal } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2053,9 +2053,17 @@ is_widening_mult_p (gimple stmt,
   *type2_out = *type1_out;
 }
 
-  /* FIXME: remove this restriction.  */
-  if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out))
-return false;
+  /* Ensure that the larger of the two operands comes first. */
+  if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out))
+{
+  tree tmp;
+  tmp = *type1_out;
+  *type1_out = *type2_out;
+  *type2_out = tmp;
+  tmp = *rhs1_out;
+  *rhs1_out = *rhs2_out;
+  *rhs2_out = tmp;
+}
 
   return true;
 }


Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs

2011-07-14 Thread Richard Guenther
On Thu, Jul 14, 2011 at 4:28 PM, Andrew Stubbs a...@codesourcery.com wrote:
 I've updated this patch following the changes earlier in the patch series.
 There isn't much left.

 This should obviate all the review comments. :)

Indeed ;)

 OK?

Ok.

Thanks,
Richard.

 Andrew



Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching

2011-07-14 Thread Andrew Stubbs

On 07/07/11 11:13, Richard Guenther wrote:

This updates the context changed by my update to patch 3.

  The content of this patch has not changed.

Ok.


I know this patch was already approved, but I discovered a bug in this 
patch that missed optimizing the case where the input to multiply did 
not come from an assign statement (this can happen when the value comes 
from a function parameter).


This patch fixes that case, and updates the context changed by my 
updates earlier in the patch series.


OK?

Andrew
2011-07-14  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument
	'type'.
	Use 'type' from caller, not inferred from 'rhs'.
	Don't reject non-conversion statements. Do return lhs in this case.
	(is_widening_mult_p): Add new argument 'type'.
	Use 'type' from caller, not inferred from 'stmt'.
	Pass type to is_widening_mult_rhs_p.
	(convert_mult_to_widen): Pass type to is_widening_mult_p.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-8.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, int *b, int *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler smlal } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1965,7 +1965,8 @@ struct gimple_opt_pass pass_optimize_bswap =
  }
 };
 
-/* Return true if RHS is a suitable operand for a widening multiplication.
+/* Return true if RHS is a suitable operand for a widening multiplication,
+   assuming a target type of TYPE.
There are two cases:
 
  - RHS makes some value at least twice as wide.  Store that value
@@ -1975,32 +1976,43 @@ struct gimple_opt_pass pass_optimize_bswap =
but leave *TYPE_OUT untouched.  */
 
 static bool
-is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
+is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out,
+			tree *new_rhs_out)
 {
   gimple stmt;
-  tree type, type1, rhs1;
+  tree type1, rhs1;
   enum tree_code rhs_code;
 
   if (TREE_CODE (rhs) == SSA_NAME)
 {
-  type = TREE_TYPE (rhs);
   stmt = SSA_NAME_DEF_STMT (rhs);
   if (!is_gimple_assign (stmt))
-	return false;
-
-  rhs_code = gimple_assign_rhs_code (stmt);
-  if (TREE_CODE (type) == INTEGER_TYPE
-	  ? !CONVERT_EXPR_CODE_P (rhs_code)
-	  : rhs_code != FIXED_CONVERT_EXPR)
-	return false;
+	{
+	  rhs1 = NULL;
+	  type1 = TREE_TYPE (rhs);
+	}
+  else
+	{
+	  rhs1 = gimple_assign_rhs1 (stmt);
+	  type1 = TREE_TYPE (rhs1);
+	}
 
-  rhs1 = gimple_assign_rhs1 (stmt);
-  type1 = TREE_TYPE (rhs1);
   if (TREE_CODE (type1) != TREE_CODE (type)
	  || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type))
 	return false;
 
-  *new_rhs_out = rhs1;
+  if (rhs1)
+	{
+	  rhs_code = gimple_assign_rhs_code (stmt);
+	  if (TREE_CODE (type) == INTEGER_TYPE
+	  ? !CONVERT_EXPR_CODE_P (rhs_code)
+	  : rhs_code != FIXED_CONVERT_EXPR)
+	*new_rhs_out = rhs;
+	  else
+	*new_rhs_out = rhs1;
+	}
+  else
+	*new_rhs_out = rhs;
   *type_out = type1;
   return true;
 }
@@ -2015,28 +2027,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out)
   return false;
 }
 
-/* Return true if STMT performs a widening multiplication.  If so,
-   store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT
-   respectively.  Also fill *RHS1_OUT and *RHS2_OUT such that converting
-   those operands to types *TYPE1_OUT and *TYPE2_OUT would give the
-   operands of the multiplication.  */
+/* Return true if STMT performs a widening multiplication, assuming the
+   output type is TYPE.  If so, store the unwidened types of the operands
+   in *TYPE1_OUT and *TYPE2_OUT respectively.  Also fill *RHS1_OUT and
+   *RHS2_OUT such that converting those operands to types *TYPE1_OUT
+   and *TYPE2_OUT would give the operands of the multiplication.  */
 
 static bool
-is_widening_mult_p (gimple stmt,
+is_widening_mult_p (tree type, gimple stmt,
 		tree *type1_out, tree *rhs1_out,
 		tree *type2_out, tree *rhs2_out)
 {
-  tree type;
-
-  type = TREE_TYPE (gimple_assign_lhs (stmt));
   if (TREE_CODE (type) != INTEGER_TYPE
      && TREE_CODE (type) != FIXED_POINT_TYPE)
 return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out,
+			   rhs1_out))
 return false;
 
-  if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out))
+  if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out,
+			   rhs2_out))
 return false;
 
   if (*type1_out == NULL)
@@ -2088,7 +2099,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != INTEGER_TYPE)
 return false;
 
-  if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2))
+  if 

Re: [PATCH][1/N][C][C++][Fortran][Java] Change POINTER_PLUS_EXPR offset type requirements

2011-07-14 Thread Richard Henderson
On 07/14/2011 04:28 AM, Richard Guenther wrote:
 On Tue, 12 Jul 2011, Richard Guenther wrote:
 

 This patch is step 1, it abstracts away the type of the offset operand
 when building a POINTER_PLUS_EXPR.  It does so by introducing
 fold_build_pointer_plus_expr{_hwi,_loc} helpers and use them in all
 places that currently build POINTER_PLUS_EXPRs (including those
 that do not fold the result).  This patch will make steps 2 and more
 purely middle-end (fingers crossed).

 Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages.

 I also built cc1 for all supported targets via contrib/config-list.mk
 and saw no fails that relate to this patch.

 Are the frontend parts ok?
 
 Ping.  Joseph seems to be on vacation - Richard, can you approve
 the C frontend/family parts?

Yeah, this patch is ok.


r~


Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching

2011-07-14 Thread Richard Guenther
On Thu, Jul 14, 2011 at 4:34 PM, Andrew Stubbs a...@codesourcery.com wrote:
 On 07/07/11 11:13, Richard Guenther wrote:

 This updates the context changed by my update to patch 3.
 
   The content of this patch has not changed.

 Ok.

 I know this patch was already approved, but I discovered a bug in this patch
 that missed optimizing the case where the input to multiply did not come
 from an assign statement (this can happen when the value comes from a
 function parameter).

 This patch fixes that case, and updates the context changed by my updates
 earlier in the patch series.

 OK?

Ok.

Thanks,
Richard.

 Andrew



Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode

2011-07-14 Thread Andrew Stubbs

On 28/06/11 17:23, Andrew Stubbs wrote:

On 23/06/11 15:43, Andrew Stubbs wrote:

Patch 4 introduced support for using signed multiplies to code unsigned
multiplies in a narrower mode. Patch 5 then introduced support for
mis-matched input modes.

These two combined mean that there is a case where only the smaller of two
inputs is unsigned, and yet it still tries to use a mode wider than the
larger, signed input. This is bad because it means unnecessary extends
and because the wider operation might not exist.

This patch catches that case, and ensures that the smaller, unsigned
input, is zero-extended to match the mode of the larger, signed input.

Of course, both inputs may still have to be extended to fit the nearest
available instruction, so it doesn't make a difference every time.
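
To illustrate the kind of source this affects (a sketch only; the new tests
below are the authoritative examples):

  long long
  f (long long a, short *b, unsigned char *c)
  {
    return a + *b * *c;
  }

Here the unsigned char operand is the smaller one, so it is zero-extended to
the 16-bit mode of the signed short operand and a signed 16-bit widening
multiply-accumulate can be used, instead of first widening both inputs to
32 bits.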

OK?


This update fixes Janis' issue with the testsuite.


And this version is updated to fit the changes made earlier in the 
series, and also to use the precision, instead of the mode-size, in 
order to better optimize bitfields.


OK?

Andrew
2011-06-24  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* tree-ssa-math-opts.c (convert_mult_to_widen): Better handle
	unsigned inputs of different modes.
	(convert_plusminus_to_widen): Likewise.

	gcc/testsuite/
	* gcc.target/arm/wmul-9.c: New file.
	* gcc.target/arm/wmul-bitfield-2.c: New file.

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-9.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, short *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler smlalbb } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-bitfield-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+struct bf
+{
+  int a : 3;
+  unsigned int b : 15;
+  int c : 3;
+};
+
+long long
+foo (long long a, struct bf b, struct bf c)
+{
+  return a + b.b * c.c;
+}
+
+/* { dg-final { scan-assembler smlalbb } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2121,9 +2121,18 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi)
 {
   if (op != smul_widen_optab)
 	{
-	  from_mode = GET_MODE_WIDER_MODE (from_mode);
-	  if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
-	    return false;
+	  /* We can use a signed multiply with unsigned types as long as
+	     there is a wider mode to use, or it is the smaller of the two
+	     types that is unsigned.  Note that type1 >= type2, always.  */
+	  if ((TYPE_UNSIGNED (type1)
+	       && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode))
+	      || (TYPE_UNSIGNED (type2)
+		  && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode)))
+	    {
+	      from_mode = GET_MODE_WIDER_MODE (from_mode);
+	      if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode))
+		return false;
+	}
 
 	  op = smul_widen_optab;
 	  handler = find_widening_optab_handler_and_mode (op, to_mode,
@@ -2290,14 +2299,20 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   /* There's no such thing as a mixed sign madd yet, so use a wider mode.  */
   if (from_unsigned1 != from_unsigned2)
 {
-  enum machine_mode mode = GET_MODE_WIDER_MODE (from_mode);
-  if (GET_MODE_PRECISION (mode) < GET_MODE_PRECISION (to_mode))
+  /* We can use a signed multiply with unsigned types as long as
+	 there is a wider mode to use, or it is the smaller of the two
+	 types that is unsigned.  Note that type1 >= type2, always.  */
+  if ((from_unsigned1
+	&& TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode))
+	  || (from_unsigned2
+	      && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode)))
 	{
-	  from_mode = mode;
-	  from_unsigned1 = from_unsigned2 = false;
+	  from_mode = GET_MODE_WIDER_MODE (from_mode);
+	  if (GET_MODE_SIZE (from_mode) >= GET_MODE_SIZE (to_mode))
+	return false;
 	}
-  else
-	return false;
+
+  from_unsigned1 = from_unsigned2 = false;
 }
 
   /* If there was a conversion between the multiply and addition


Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode

2011-07-14 Thread Richard Guenther
On Thu, Jul 14, 2011 at 4:38 PM, Andrew Stubbs a...@codesourcery.com wrote:
 On 28/06/11 17:23, Andrew Stubbs wrote:

 On 23/06/11 15:43, Andrew Stubbs wrote:

 Patch 4 introduced support for using signed multiplies to code unsigned
 multiplies in a narrower mode. Patch 5 then introduced support for
 mis-matched input modes.

 These two combined mean that there is a case where only the smaller of two
 inputs is unsigned, and yet it still tries to use a mode wider than the
 larger, signed input. This is bad because it means unnecessary extends
 and because the wider operation might not exist.

 This patch catches that case, and ensures that the smaller, unsigned
 input, is zero-extended to match the mode of the larger, signed input.

 Of course, both inputs may still have to be extended to fit the nearest
 available instruction, so it doesn't make a difference every time.

 OK?

 This update fixes Janis' issue with the testsuite.

 And this version is updated to fit the changes made earlier in the series,
 and also to use the precision, instead of the mode-size, in order to better
 optimize bitfields.

 OK?

Ok.

Thanks,
Richard.

 Andrew



[PATCH] Fix PR49651

2011-07-14 Thread Richard Guenther

This fixes PR49651 where we fail to handle aggregate dereferences
in call arguments in PTA properly.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied
where applicable.

Richard.

2011-07-14  Richard Guenther  rguent...@suse.de

PR tree-optimization/49651
* tree-ssa-structalias.c (get_constraint_for_1): Properly
handle dereferences with subvariables.

* gcc.dg/torture/pr49651.c: New testcase.

Index: gcc/tree-ssa-structalias.c
===
*** gcc/tree-ssa-structalias.c  (revision 176272)
--- gcc/tree-ssa-structalias.c  (working copy)
*** get_constraint_for_1 (tree t, VEC (ce_s,
*** 3349,3357 
  
  /* If we are not taking the address then make sure to process
 all subvariables we might access.  */
  cs = *VEC_last (ce_s, *results);
! if (address_p
! || cs.type != SCALAR)
return;
  
  vi = get_varinfo (cs.var);
--- 3349,3366 
  
  /* If we are not taking the address then make sure to process
 all subvariables we might access.  */
+ if (address_p)
+   return;
+ 
  cs = *VEC_last (ce_s, *results);
! if (cs.type == DEREF)
!   {
! /* For dereferences this means we have to defer it
!to solving time.  */
! VEC_last (ce_s, *results)->offset = UNKNOWN_OFFSET;
! return;
!   }
! if (cs.type != SCALAR)
return;
  
  vi = get_varinfo (cs.var);
Index: gcc/testsuite/gcc.dg/torture/pr49651.c
===
*** gcc/testsuite/gcc.dg/torture/pr49651.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr49651.c  (revision 0)
***
*** 0 
--- 1,31 
+ /* { dg-do run } */
+ 
+ extern void abort (void);
+ 
+ struct X {
+ int *p;
+ int *q;
+ };
+ 
+ void __attribute__((noinline, noclone))
+ foo (struct X x) { *x.q = 0; }
+ 
+ volatile int what;
+ struct X y;
+ 
+ int main()
+ {
+   int i, j;
+   struct X x, *p;
+   x.p = &i;
+   x.q = &j;
+   if (what)
+     p = &y;
+   else
+     p = &x;
+   j = 1;
+   foo (*p);
+   if (j != 0)
+ abort ();
+   return 0;
+ }


Re: [Patch, AVR]: Fix PR49487 (ICE for wrong rotate scratch)

2011-07-14 Thread Richard Henderson
On 07/14/2011 12:41 AM, Georg-Johann Lay wrote:
 Ok to commit and back-port to 4.6?
 
 Johann
 
  PR target/49487
  * config/avr/avr.md (rotl<mode>3): Generate SCRATCH instead
  of REG.
  (*rotw<mode>): Use const_int_operand for operands2.
  Use match_scratch for operands3.
  (*rotb<mode>): Ditto.
  * config/avr/avr.c (avr_rotate_bytes): Treat SCRATCH.

Ok.


r~


Re: [PATCH (1/7)] New optab framework for widening multiplies

2011-07-14 Thread Andrew Stubbs

Ping. This is the last unreviewed patch in this series ...

Thanks

Andrew

On 09/07/11 15:43, Andrew Stubbs wrote:

On 23/06/11 15:37, Andrew Stubbs wrote:

This patch should have no effect on the compiler output. It merely
replaces one way to represent widening operations with another, and
refactors the other parts of the compiler to match. The rest of the
patch set uses this new framework to implement the optimization
improvements.

I considered and discarded many approaches to this patch before arriving
at this solution, and I feel sure that there'll be somebody out there
who will think I chose the wrong one, so let me first explain how I got
here 

The aim is to be able to encode and query optabs that have any given
input mode, and any given output mode. This is similar to the
convert_optab, but not compatible with that optab since it is handled
completely differently in the code.

(Just to be clear, the existing widening multiply support only covers
instructions that widen by *one* mode, so it's only ever been necessary
to know the output mode, up to now.)

Option 1 was to add a second dimension to the handlers table in optab_d,
but I discarded this option because it would increase the memory usage
by the square of the number of modes, which is a bit much.

Option 2 was to add a whole new optab, similar to optab_d, but with a
second dimension like convert_optab_d, however this turned out to cause
way too many pointer type mismatches in the code, and would have been
very difficult to fix up.

Option 3 was to add new optab entries for widening by two modes, by
three modes, and so on. True, I would only need to add one extra set for
what I need, but there would be so many places in the code that compare
against smul_widen_optab, for example, that would need to be taught
about these, that it seemed like a bad idea.

Option 4 was to have a separate table that contained the widening
operations, and refer to that whenever a widening entry in the main
optab is referenced, but I found that there was no easy way to do the
mapping without putting some sort of switch table in
widening_optab_handler, and that negates the other advantages.

So, what I've done in the end is add a new pointer entry widening into
optab_d, and dynamically build the widening operations table for each
optab that needs it. I've then added new accessor functions that take
both input and output modes, and altered the code to use them where
appropriate.
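
To make the shape of the new accessors a bit more concrete, the lookup
conceptually ends up as something like the sketch below; the names and
field layout are illustrative only, not lifted from the patch:

  /* Sketch: handlers keyed by both output and input mode, with the
     per-optab widening table built lazily.  */
  static inline enum insn_code
  widening_optab_handler (optab op, enum machine_mode to_mode,
                          enum machine_mode from_mode)
  {
    if (to_mode == from_mode)
      return optab_handler (op, to_mode);
    if (op->widening == NULL)
      return CODE_FOR_nothing;
    return op->widening->handlers[(int) to_mode][(int) from_mode].insn_code;
  }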

The down-side of this approach is that the optab entries for widening
operations now have two handlers tables, one of which is redundant.
That said, those cases are in the minority, and it is the smaller table
which is unused.

If people find that very distasteful, it might be possible to remove the
*_widen_optab entries and unify smul_optab with smul_widen_optab, and so
on, and save space that way. I've not done so yet, but I expect I could
if people feel strongly about it.

As a side-effect, it's now possible for any optab to be widening,
should some target happen to have a widening add, shift, or whatever.

Is this patch OK?


This update has been rebaselined to fix some conflicts with other recent
commits in this area.

I also identified a small bug which resulted in the operands to some
commutative operations being reversed. I don't believe the bug did any
harm, logically speaking, but I suppose there could be a testcase that
resulted in worse code being generated. With this fix, I now see exactly
matching output in all my testcases.

Andrew




Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich
 2011/7/14 Richard Guenther richard.guent...@gmail.com:

 But then how comes the option to override it is useful?  It isn't dependent
 on the particular case.  At least the option should be a --param.

 Richard.


 The option is still useful if you want to try the feature on a platform with
 no hook implemented, and for other performance experiments.  I agree
 --param usage would be better here.  I'll fix it.

 Ilya


Here is a patch with a new param instead of a new option. Bootstrapped and
checked on x86_64-linux.

Ilya
--
gcc/

2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): New hook documentation.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (tree-reassoc-width): New param documented.

* hooks.h (hook_int_const_gimple_1): New default hook.

* hooks.c (hook_int_const_gimple_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width): Implementation of
the new hook for the i386 target.

* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(reassociate_bb): Now checks reassociation width to be used
and calls rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
--param tree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.
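
For readers who have not looked at the pass itself: a reassociation width of
2 means a chain like a + b + c + d is rebuilt into two parallel sub-chains
instead of one linear chain (illustrative sketch, not actual pass output):

  /* width 1 (linear chain)        width 2 (parallel chains)  */
  t1 = a + b;                      t1 = a + b;
  t2 = t1 + c;                     t2 = c + d;
  s  = t2 + d;                     s  = t1 + t2;

which shortens the dependency chain when the target has more than one
suitable functional unit.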


PR44382.diff
Description: Binary data


[testsuite]: Some test case skips.

2011-07-14 Thread Georg-Johann Lay
This patchlet fixes two test cases:

* gcc.dg/pr32912-2.c: TImode is overkill for AVR.
* gcc.dg/pr44674.c: Test if -fprofile-generate is available.

Ok to commit?

Johann

testsuite/
* gcc.dg/pr32912-2.c: Skip for AVR.
* gcc.dg/pr44674.c: Add dg-require-profiling.


Index: gcc.dg/pr32912-2.c
===
--- gcc.dg/pr32912-2.c  (revision 176136)
+++ gcc.dg/pr32912-2.c  (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -w" } */
+/* { dg-skip-if "TImode not supported" { "avr-*-*" } { "*" } { "" } } */

 extern void abort (void);

Index: gcc.dg/pr44674.c
===
--- gcc.dg/pr44674.c(revision 176136)
+++ gcc.dg/pr44674.c(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fprofile-generate" } */
+/* { dg-require-profiling "-fprofile-generate" } */

 void
 jumpfunc (void *p)


Re: [PATCH] New IPA-CP with real function cloning

2011-07-14 Thread Martin Jambor
Hi,

like with the previous mail, I'll reply to the comments here and then
post a new version of the patch in a separate thread.

On Sun, Jul 10, 2011 at 07:04:21PM +0200, Jan Hubicka wrote:
  
  /* If checking is enabled, verify that no lattice is in the TOP state, i.e. 
  not
 bottom, not containing a variable component and without any known value 
  at
 the same time.  */
  
  static void
  verify_propagated_values (void)
  {
  #ifdef ENABLE_CHECKING
 
  Hmm, would it not be better to keep this function around to be called from
  the debugger, like
  other verifiers do?

OK.

struct cgraph_node *node;
  
FOR_EACH_DEFINED_FUNCTION (node)
  {
struct ipa_node_params *info = IPA_NODE_REF (node);
int i, count = ipa_get_param_count (info);
  
for (i = 0; i  count; i++)
  {
struct ipcp_lattice *lat = ipa_get_lattice (info, i);
  
if (!lat->bottom
    && !lat->contains_variable
    && lat->values_count == 0)
  {
if (dump_file)
  {
fprintf (dump_file, "\nIPA lattices after constant "
         "propagation:\n");
print_all_lattices (dump_file, true, false);
  }
  
gcc_unreachable ();
  }
  }
  }
  #endif
  }
  
  /* Propagate values through a pass-through jump function JFUNC associated 
  with
 edge CS, taking values from SRC_LAT and putting them into DEST_LAT.  
  SRC_IDX
 is the index of the source parameter.  */
  
  static bool
  propagate_vals_accross_pass_through (struct cgraph_edge *cs,
   struct ipa_jump_func *jfunc,
   struct ipcp_lattice *src_lat,
   struct ipcp_lattice *dest_lat,
   int src_idx)
  {
struct ipcp_value *src_val;
bool ret = false;
  
   if (jfunc->value.pass_through.operation == NOP_EXPR)
     for (src_val = src_lat->values; src_val; src_val = src_val->next)
       ret |= add_value_to_lattice (dest_lat, src_val->value, cs,
 src_val, src_idx);
else if (edge_within_scc (cs))
  ret = set_lattice_contains_variable (dest_lat);
 
  Hmm, is the reason for not using arithmetic within an SCC documented somewhere?
  It seems like code someone will eventually run into and remove without much
  consideration.

I added a comment.

  
  /* Calculate devirtualization time bonus for NODE, assuming we know 
  KNOWN_CSTS
 and KNOWN_BINFOS.  */
  
  static int
  devirtualization_time_bonus (struct cgraph_node *node,
   VEC (tree, heap) *known_csts,
   VEC (tree, heap) *known_binfos)
 
 Eventually it would be nice to make this common with inliner analysis.  We 
 want to 
 increase inlining limits here too

True.

/* Only bare minimum benefit for clearly un-inlineable targets.  */
res += 1;
callee = cgraph_get_node (target);
if (!callee)
  continue;
 
 Hmm, when you test it here, you might probably ask if callee is analyzed and 
 inlinable then ;)

Good point, did that.

  
  /* Return true if cloning NODE is a good idea, given the estimated 
  TIME_BENEFIT
 and SIZE_COST and with the sum of frequencies of incoming edges to the
 potential new clone in FREQUENCIES.  */
  
  static bool
  good_cloning_opportunity_p (struct cgraph_node *node, int time_benefit,
  int freq_sum, gcov_type count_sum, int size_cost)
  {
if (time_benefit == 0
|| !flag_ipa_cp_clone
    || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
  return false;
 
 Still I think a cloning opportunity is good if the saving on call size
 is enough
 to pay back for the function body duplication.
 It would probably make sense then to create simple wrapper functions instead 
 of
 duplicating the function body.

I was already doing that for considering IPA-CP opportunities for all
known contexts - and the effect is more profound still now when we
consider moving costs.

On the other side, when opportunities for function specialization are
concerned, size is always considered a cost and never a benefit when
doing the calculations.  This simplifies a the code a bit because the
size savings are on the side of the caller and so it depends on the
number of call sites that actually get redirected.  That is not known
when estimating size but only we did specialization decisions for all
the callers and so it would need to be an extra thing stored along
values.  I think that since such calls should be picked up by
inlining, it's not really worth the added data and complexity.  But I
could add it later, sure.

 
  
   gcc_checking_assert (size_cost >= 0);
  
/* FIXME:  These decisions need tuning.  */
if (max_count)
  {
int evaluation, factor = (count_sum * 1000) / max_count;
  
evaluation = (time_benefit * factor) / 

Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Daniel Carrera

Hi Tobias,

As per your suggestion, I'm copying to gfortran and gcc-patches. I've
made the change you asked for and I'm going to commit the patch.


Cheers,
Daniel.


On 07/14/2011 05:31 PM, Tobias Burnus wrote:

On 07/14/2011 05:05 PM, Daniel Carrera wrote:

+caf_runtime_error (const char *message, ...)
+{
+ va_list ap;
+ fprintf (stderr, "Fortran runtime error.");


Could you replace "." by ": " (colon, space); I think "Fortran runtime
error: Could not ..." looks better than "Fortran runtime error.Could not".


Otherwise, the patch is OK. Could you then send the patch as to fortran@
and gcc-patches@?

Tobias

PS: You could directly commit the modified patch, simply reply to this
email, add a CC to fortran@/gcc-patches@ and attach the patch and
changelog. There is no need that I approve the patch again.



--
I'm not overweight, I'm undertall.


Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Daniel Carrera

And of course, here is the patch and ChangeLog.

2011-07-14  Daniel Carrera  dcarr...@gmail.com

* caf/single.c:  Include stdarg.h header.
(caf_runtime_error): New function. Use exit(EXIT_FAILURE).
(_gfortran_caf_register): Use caf_runtime_error.
(_gfortran_caf_sync_images): Use exit(EXIT_FAILURE).
* caf/mpi.c (caf_runtime_error): Remove error parameter.
Return EXIT_FAILURE instead.
(_gfortran_caf_register): Update call to caf_runtime_error.
(_gfortran_caf_sync_all): Ditto.
(_gfortran_caf_sync_images): Ditto.
(_gfortran_caf_error_stop_str): Use exit(EXIT_FAILURE).


Now I'm just waiting for SVN update before I commit...


On 07/14/2011 05:34 PM, Daniel Carrera wrote:

Hi Tobias,

As per your suggestion, I'm copying to gfortran and gcc-patches. I've
made the change you asked for and I'm going to commit the patch.

Cheers,
Daniel.


On 07/14/2011 05:31 PM, Tobias Burnus wrote:

On 07/14/2011 05:05 PM, Daniel Carrera wrote:

+caf_runtime_error (const char *message, ...)
+{
+ va_list ap;
+ fprintf (stderr, "Fortran runtime error.");


Could you replace "." by ": " (colon, space); I think "Fortran runtime
error: Could not ..." looks better than "Fortran runtime error.Could not".


Otherwise, the patch is OK. Could you then send the patch as to fortran@
and gcc-patches@?

Tobias

PS: You could directly commit the modified patch, simply reply to this
email, add a CC to fortran@/gcc-patches@ and attach the patch and
changelog. There is no need that I approve the patch again.






--
I'm not overweight, I'm undertall.
Index: libgfortran/caf/single.c
===
--- libgfortran/caf/single.c	(revision 176230)
+++ libgfortran/caf/single.c	(working copy)
@@ -28,6 +28,7 @@ see the files COPYING3 and COPYING.RUNTI
 #include <stdio.h>  /* For fputs and fprintf.  */
 #include <stdlib.h> /* For exit and malloc.  */
 #include <string.h> /* For memcpy and memset.  */
+#include <stdarg.h> /* For variadic arguments.  */
 
 /* Define GFC_CAF_CHECK to enable run-time checking.  */
 /* #define GFC_CAF_CHECK  1  */
@@ -40,6 +41,21 @@ see the files COPYING3 and COPYING.RUNTI
 caf_static_t *caf_static_list = NULL;
 
 
+/* Keep in sync with mpi.c.  */
+static void
+caf_runtime_error (const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error: ");
+  va_start (ap, message);
+  fprintf (stderr, message, ap);
+  va_end (ap);
+  fprintf (stderr, "\n");
+
+  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
+  exit (EXIT_FAILURE);
+}
+
 void
 _gfortran_caf_init (int *argc __attribute__ ((unused)),
 		char ***argv __attribute__ ((unused)),
@@ -73,12 +89,12 @@ _gfortran_caf_register (ptrdiff_t size, 
 
   if (unlikely (local == NULL || token == NULL))
 {
+  const char msg[] = "Failed to allocate coarray";
   if (stat)
 	{
 	  *stat = 1;
 	  if (errmsg_len  0)
 	{
-	  const char msg[] = "Failed to allocate coarray";
	  int len = ((int) sizeof (msg) > errmsg_len) ? errmsg_len
 			  : (int) sizeof (msg);
 	  memcpy (errmsg, msg, len);
@@ -88,10 +104,7 @@ _gfortran_caf_register (ptrdiff_t size, 
 	  return NULL;
 	}
   else
-	{
-	  fprintf (stderr, "ERROR: Failed to allocate coarray");
-	  exit (1);
-	}
+	  caf_runtime_error (msg);
 }
 
   if (stat)
@@ -140,7 +153,7 @@ _gfortran_caf_sync_images (int count __a
   {
	fprintf (stderr, "COARRAY ERROR: Invalid image index %d to SYNC "
		 "IMAGES", images[i]);
-	exit (1);
+	exit (EXIT_FAILURE);
   }
 #endif
 
Index: libgfortran/caf/mpi.c
===
--- libgfortran/caf/mpi.c	(revision 176230)
+++ libgfortran/caf/mpi.c	(working copy)
@@ -47,8 +47,9 @@ static int caf_is_finalized;
 caf_static_t *caf_static_list = NULL;
 
 
+/* Keep in sync with single.c.  */
 static void
-caf_runtime_error (int error, const char *message, ...)
+caf_runtime_error (const char *message, ...)
 {
   va_list ap;
   fprintf (stderr, "Fortran runtime error on image %d: ", caf_this_image);
@@ -59,10 +60,10 @@ caf_runtime_error (int error, const char
 
   /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
   /* FIXME: Do some more effort than just MPI_ABORT.  */
-  MPI_Abort (MPI_COMM_WORLD, error);
+  MPI_Abort (MPI_COMM_WORLD, EXIT_FAILURE);
 
   /* Should be unreachable, but to make sure also call exit.  */
-  exit (2);
+  exit (EXIT_FAILURE);
 }
 
 
@@ -179,7 +180,7 @@ error:
 	  }
   }
 else
-  caf_runtime_error (caf_is_finalized ? STAT_STOPPED_IMAGE : 1, msg);
+  caf_runtime_error (msg);
   }
 
   return NULL;
@@ -223,7 +224,7 @@ _gfortran_caf_sync_all (int *stat, char 
	memset (&errmsg[len], ' ', errmsg_len-len);
 	}
   else
-	caf_runtime_error (caf_is_finalized ? STAT_STOPPED_IMAGE : ierr, msg);
+	caf_runtime_error (msg);
 }
 }
 
@@ -291,7 +292,7 @@ _gfortran_caf_sync_images (int count, in
	memset (&errmsg[len],

Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Michael Meissner
On Thu, Jul 14, 2011 at 11:32:59AM +0200, Richard Guenther wrote:
 On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
  2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com:
  One minor note, you will need to update doc/invoke.texi to document the new
  switch before checkin: -ftree-reassoc-width=n
 
  You also need approval and somebody to review the patch before checkin.
 
 Btw, rather than a new target hook and an option I suggest to use
 a --param which default you can modify in the backend.

You need the target hook that tells how wide the reassociation should be based
on the type.  Machines have different numbers of functional units, for example
maybe 3 integer units and 2 floating point units.  On the PowerPC, for example,
I would set the basic integer and binary floating point types to 2 to match the
dispatch rules, but for decimal floating point I would set it to one, since
the machines currently only have 1 decimal unit.

With the hook, I could see eliminating the switch and/or --param altogether,
and doing it only in the hook.
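
A hypothetical sketch of such a hook, assuming the const_gimple-based
signature suggested by the hook_int_const_gimple_1 default in the ChangeLog
above (this is not part of the posted patch, and rs6000 does not implement
it there):

  static int
  rs6000_reassociation_width (const_gimple stmt)
  {
    tree type = TREE_TYPE (gimple_assign_lhs (stmt));

    if (DECIMAL_FLOAT_TYPE_P (type))
      return 1;   /* only one decimal FP unit */
    return 2;     /* match the dispatch rules for integer / binary FP */
  }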

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899


Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-07-14 Thread Ilya Enkovich

 You need the target hook that tells how big the reassociation based on the
 type.  Machines have different numbers of functional units, for example, maybe
 3 integer units and 2 floating point units.  For example, in the PowerPC, I
 would set the basic integer and binary floating point types to 2 to match the
 dispatch rules, but for decimal floating point, I would set it to one, since
 the machines currently only have 1 decimal unit.

 With the hook, I could see eliminating the switch and/or --param altogether,
 and doing only in the hook.


The hook introduced by this patch meets these requirements. But I think it
is useful to have an option to override the hook because you cannot tune
it perfectly for every case.

Ilya


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Richard Henderson
On 07/14/2011 03:03 AM, Bernd Schmidt wrote:
 +++ gcc/config/ia64/ia64.c(working copy)
 @@ -1047,7 +1047,7 @@
tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
  
 -  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
 +  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);

And the bug stems from ... what? 

Is this bug still fixed if you change this to gen_const_mem?

This is a load from the .got.  It's constant memory, and it's
always present.  There's nowhere in the function we could
move this load (assuming the address is computed properly)
where the load would fail.
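
In other words, a sketch of the suggested substitution (untested, just to
spell it out):

  tmp = gen_rtx_LO_SUM (Pmode, gen_const_mem (Pmode, dest), src);

gen_const_mem marks the MEM as MEM_READONLY_P, so the alias machinery knows
the GOT slot cannot be clobbered, rather than leaving it a bare MEM with no
aliasing information at all.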

It's difficult to tell if your raw gen_rtx_MEM with no aliasing
info doesn't just paper over a problem by preventing it from
being moved.


r~


Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions

2011-07-14 Thread Steven Bosscher
On Thu, Jul 14, 2011 at 11:40 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 Generating RTL from GIMPLE passes just to be able to use rtx_cost is,
 well ... gross.

Indeed. And it is one of the major blockers for fully separating the
RTL, GIMPLE and target code into separate modules. It would be
great to get rid of RTL in IVOPTS too...

Ciao!
Steven


Re: [PATCH] New IPA-CP with real function cloning

2011-07-14 Thread Jan Hubicka
 if (dec < cs->count)
 cs->count -= dec;
 else
 cs->count = 0;
   }
   
 if (dump_file)
   dump_profile_updates (orig_node, new_node);
   }
   
 if (node->local.can_change_signature)
   {
 args_to_skip = BITMAP_GGC_ALLOC ();
 for (i = 0; i < count; i++)
 {
   tree t = VEC_index (tree, known_vals, i);
   
   if ((t && TREE_CODE (t) != TREE_BINFO)
   || !ipa_is_param_used (info, i))
 bitmap_set_bit (args_to_skip, i);
 }
   }
 else
   args_to_skip = NULL;
  When we can't change signature, still we can set is_parm_unused flag for 
  the callee
  to aid later optimizers.
 
 I assume I can re-use the node->local.can_change_signature flag?  Is
 that supposed to be set at any given place or can IPA-CP do it on its own?

can_change_signature is currently used by the i386 backend and it is set by the
inliner.  I plan to move it to the visibility pass at the time local functions
are identified.  So yes, you can assume it is set and up to date at the time
IPA-CP is run.

Honza
 
  
  Rest of patch looks OK. It definitely reads better than previous ipa-cp.c ;)
  I suppose we will need to get some experience with the logic deciding 
  whether to clone..
 
 
 Thanks, I'll post the current version in a moment.
 
 Martin


Re: Fix PR48542: reload register contents reuse crossing setjmp

2011-07-14 Thread Ulrich Weigand
Jeff Law wrote:
 On 06/15/11 21:46, Hans-Peter Nilsson wrote:
  PR rtl-optimization/48542
  * reload.c (find_equiv_reg): Stop looking when finding a
  setjmp-type call.
  * reload1.c (reload_as_needed): Invalidate all reload
  registers when crossing a setjmp-type call.
 OK.
 Jeff

I see that this went already in, but I'm wondering why this
change should be necessary.  As far as register use is
concerned, setjmp ought to behave just like a regular function:
if a register is call-clobbered, reload will not attempt to
use it across a function call (*any* function call) anyway;
but if the register is call-saved, setjmp really ought to
restore the old value, *both* times it returns, and so reuse
ought to be allowed ...
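
For illustration, a minimal example of the scenario (do_work is just a
stand-in for any callee that may longjmp back):

#include <setjmp.h>

static jmp_buf env;

static void
do_work (void)
{
  longjmp (env, 1);     /* return to the setjmp below a second time */
}

static int
f (int x)
{
  int v = x + 1;        /* may well live in a call-saved register */
  if (setjmp (env) == 0)
    do_work ();
  return v;             /* must see the same value on both returns */
}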

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Bernd Schmidt
On 07/14/11 18:03, Richard Henderson wrote:
 On 07/14/2011 03:03 AM, Bernd Schmidt wrote:
 +++ gcc/config/ia64/ia64.c   (working copy)
 @@ -1047,7 +1047,7 @@
tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
  
 -  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
 +  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);
 
 And the bug stems from ... what? 
 
 Is this bug still fixed if you change this to gen_const_mem?

It should be. Testing this is a bit tricky since the original bug is in
gcc-3.3, which doesn't have gen_const_mem, and current mainline with just
the scheduler patch removed doesn't reproduce it with the testcase.

In mainline, gen_const_mem sets MEM_NOTRAP_P, but it looks like we
handle that correctly in may_trap_p:

  if (/* MEM_NOTRAP_P only relates to the actual position of the memory
 reference; moving it out of context such as when moving code
 when optimizing, might cause its address to become invalid.  */
  code_changed
  || !MEM_NOTRAP_P (x))

and sched_deps uses rtx_addr_can_trap anyway.

 This is a load from the .got.

Yes, but not using the fixed got pointer in r1, but a random other
register which can have different values in the function.

 It's difficult to tell if your raw gen_rtx_MEM with no aliasing
 info doesn't just paper over a problem by preventing it from
 being moved.

The problem isn't about memory aliasing, it's about the pointer register
being clobbered.


Bernd


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Bernd Schmidt
On 07/14/11 18:19, Bernd Schmidt wrote:
 On 07/14/11 18:03, Richard Henderson wrote:
 On 07/14/2011 03:03 AM, Bernd Schmidt wrote:
 +++ gcc/config/ia64/ia64.c  (working copy)
 @@ -1047,7 +1047,7 @@
tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
  
 -  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
 +  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);

 And the bug stems from ... what? 

 Is this bug still fixed if you change this to gen_const_mem?
 
 It should be. Testing this isn't straightforward bit tricky since the
 original bug is in gcc-3.3 which doesn't have gen_const_mem, and current
 mainline with just the scheduler patch removed doesn't reproduce it with
 the testcase.

Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P
which doesn't exist in that tree) the load stays behind the branch where
it should be.


Bernd


Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)

2011-07-14 Thread Jakub Jelinek
On Thu, Jul 14, 2011 at 12:26:41PM -0400, Jason Merrill wrote:
 On 07/14/2011 05:41 AM, Jakub Jelinek wrote:
 Here is a PCH friendly variant of the patch which tries to set right
 producer from the start, but early in dwarf2out_finish double checks
 if PCH hasn't changed it and if it did, updates it back to the expected
 string.
 
 Why not just wait until then to set it?

Because the same routine that sets it is used both to create the single
compilation unit early and to create CUs late (for -feliminate-dwarf2-dups
already during dwarf2out_finish).

Jakub


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Richard Henderson
On 07/14/2011 09:23 AM, Bernd Schmidt wrote:
 Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P
 which doesn't exist in that tree) the load stays behind the branch where
 it should be.
 

RTX_UNCHANGING_P was the bit back then, I believe.


r~


Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)

2011-07-14 Thread Jason Merrill

On 07/14/2011 12:29 PM, Jakub Jelinek wrote:

On Thu, Jul 14, 2011 at 12:26:41PM -0400, Jason Merrill wrote:

On 07/14/2011 05:41 AM, Jakub Jelinek wrote:

Here is a PCH friendly variant of the patch which tries to set right
producer from the start, but early in dwarf2out_finish double checks
if PCH hasn't changed it and if it did, updates it back to the expected
string.


Why not just wait until then to set it?


Because the same routine that sets it is used both to create the single
compilation unit early and to create CUs late (for -feliminate-dwarf2-dups
already during dwarf2out_finish).


Ah.  OK, then.

Jason


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Bernd Schmidt
On 07/14/11 18:29, Richard Henderson wrote:
 On 07/14/2011 09:23 AM, Bernd Schmidt wrote:
 Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P
 which doesn't exist in that tree) the load stays behind the branch where
 it should be.

 
 RTX_UNCHANGING_P was the bit back then, I believe.

Still ok with that also set:
addl r14 = @ltoff(ap_standalone#), r1
;;
.mii
ld8 r15 = [r14]
addl r14 = @ltoff(.LC2), r1
;;
(p7) addl r14 = 1, r0
;;
.mib
(p7) st4 [r15] = r14
nop.i 0
(p7) br.cond.dptk .L5
.mib
ld8 r14 = [r14]

And not ok if the MEM isn't exposed in RTL:
addl r14 = @ltoff(ap_standalone#), r1
;;
.mii
ld8 r15 = [r14]
addl r14 = @ltoff(.LC2), r1
;;
(p7) addl r14 = 1, r0
;;
.mii
(p7) st4 [r15] = r14
nop.i 0
nop.i 0
.mbb
ld8 r14 = [r14]
(p7) br.cond.dptk .L5

I'm attaching the 3.3 patch.


Bernd

Index: ../../branches/gcc-3_3-branch/gcc/sched-ebb.c
===
--- ../../branches/gcc-3_3-branch/gcc/sched-ebb.c   (revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/sched-ebb.c   (working copy)
@@ -52,8 +52,7 @@ static int schedule_more_p PARAMS ((void
 static const char *ebb_print_insn PARAMS ((rtx, int));
 static int rank PARAMS ((rtx, rtx));
 static int contributes_to_priority PARAMS ((rtx, rtx));
-static void compute_jump_reg_dependencies PARAMS ((rtx, regset, regset,
-  regset));
+static void compute_jump_reg_dependencies PARAMS ((rtx, regset));
 static void schedule_ebb PARAMS ((rtx, rtx));
 
 /* Return nonzero if there are more insns that should be scheduled.  */
@@ -161,30 +160,22 @@ contributes_to_priority (next, insn)
   return 1;
 }
 
-/* INSN is a JUMP_INSN, COND_SET is the set of registers that are
-   conditionally set before INSN.  Store the set of registers that
-   must be considered as used by this jump in USED and that of
-   registers that must be considered as set in SET.  */
+/* INSN is a JUMP_INSN.  Store the set of registers that must be considered
+   to be set by this jump in SET.  */
 
 static void
-compute_jump_reg_dependencies (insn, cond_set, used, set)
+compute_jump_reg_dependencies (insn, set)
  rtx insn;
- regset cond_set, used, set;
+ regset set;
 {
   basic_block b = BLOCK_FOR_INSN (insn);
   edge e;
   for (e = b->succ; e; e = e->succ_next)
-    if (e->flags & EDGE_FALLTHRU)
-      /* The jump may be a by-product of a branch that has been merged
-         in the main codepath after being conditionalized.  Therefore
-         it may guard the fallthrough block from using a value that has
-         conditionally overwritten that of the main codepath.  So we
-         consider that it restores the value of the main codepath.  */
-      bitmap_operation (set, e->dest->global_live_at_start, cond_set,
-                        BITMAP_AND);
-    else
-      bitmap_operation (used, used, e->dest->global_live_at_start,
-                        BITMAP_IOR);
+    if ((e->flags & EDGE_FALLTHRU) == 0)
+      {
+        bitmap_operation (set, set, e->dest->global_live_at_start,
+                          BITMAP_IOR);
+      }
 }
 
 /* Used in schedule_insns to initialize current_sched_info for scheduling
Index: ../../branches/gcc-3_3-branch/gcc/emit-rtl.c
===
--- ../../branches/gcc-3_3-branch/gcc/emit-rtl.c(revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/emit-rtl.c(working copy)
@@ -192,6 +192,18 @@ static tree component_ref_for_mem_expr P
 static rtx gen_const_vector_0  PARAMS ((enum machine_mode));
 static void copy_rtx_if_shared_1   PARAMS ((rtx *orig));
 
+/* Generate a memory referring to non-trapping constant memory.  */
+
+rtx
+gen_const_mem (enum machine_mode mode, rtx addr)
+{
+  rtx mem = gen_rtx_MEM (mode, addr);
+  RTX_UNCHANGING_P (mem) = 1;
+  MEM_NOTRAP_P (mem) = 1;
+  return mem;
+}
+
+
 /* Probability of the conditional branch currently proceeded by try_split.
Set to -1 otherwise.  */
 int split_branch_probability = -1;
Index: ../../branches/gcc-3_3-branch/gcc/sched-deps.c
===
--- ../../branches/gcc-3_3-branch/gcc/sched-deps.c  (revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/sched-deps.c  (working copy)
@@ -981,17 +981,12 @@ sched_analyze_insn (deps, x, insn, loop_
   else
{
  rtx pending, pending_mem;
- regset_head tmp_uses, tmp_sets;
- INIT_REG_SET (tmp_uses);
- INIT_REG_SET (tmp_sets);
-
- (*current_sched_info-compute_jump_reg_dependencies)
-   (insn, deps-reg_conditional_sets, tmp_uses, tmp_sets);
- IOR_REG_SET (reg_pending_uses, tmp_uses);
- IOR_REG_SET 

Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Richard Henderson
On 07/14/2011 09:19 AM, Bernd Schmidt wrote:
 Yes, but not using the fixed got pointer in r1, but a random other
 register which can have different values in the function.

Oh, I think I see.

So if this really had been a PLUS, as implied by the LO_SUM,
we would have had garbage input, produced garbage output, but
(eventually) ignored the result.

But since this really is a load from memory, the garbage
input is immediately fatal.

Have I got that right?

If so, the patch with the use of gen_const_mem is ok.

It does raise the question of whether we ought to completely
change the way we represent the pairing of LTOFFX/LDXMOV
relocations.


r~


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Bernd Schmidt
On 07/14/11 18:39, Richard Henderson wrote:
 On 07/14/2011 09:19 AM, Bernd Schmidt wrote:
 Yes, but not using the fixed got pointer in r1, but a random other
 register which can have different values in the function.
 
 Oh, I think I see.
 
 So if this really had been a PLUS, as implied by the LO_SUM,
 we would have had garbage input, produced garbage output, but
 (eventually) ignored the result.
 
 But since this really is a load from memory, the garbage
 input is immediately fatal.
 
 Have I got that right?

This is correct.

 If so, the patch with the use of gen_const_mem is ok.

Will commit.

(Although now I wonder if we could instead use one of the speculative
load instructions? There's one that sets the NaT bit if the load would
fault, isn't there? It's been so long I can't remember.)

 It does raise the question of whether we ought to completely
 change the way we represent the pairing of LTOFFX/LDXMOV
 relocations.

This I can't answer since I don't know the definition of these.


Bernd


gimplefe patch

2011-07-14 Thread Ketaki
Hi,

This is the patch for gimple front end branch. I have written this
code as a part of Google Summer of Code program.

The patch consists of the following changes:

1. The grammar rules for variable declarations are modified to include
parameters. The code for this is in the gp_parse_var_decl function.
2. A new function, gp_parse_function_declaration, is added to recognize
function declarations.
3. A few changes in the function gp_parse_decl have been made to
accommodate the above code.
4. The comments have been updated to reflect the changes in the code.
5. The ChangeLog is updated.

Please find the attached patch file. I hope you find everything as per
the standards.

--
Ketaki Tiwatne


file.diff
Description: Binary data


Re: Correct fix for scheduler bug PR11320

2011-07-14 Thread Richard Henderson
On 07/14/2011 09:43 AM, Bernd Schmidt wrote:
 (Although now I wonder if we could instead use one of the speculative
 load instructions? There's one that sets the NaT bit if the load would
 fault, isn't there? It's been so long I can't remember.)

We could, but we also have to insert a check load to match.
It gets very difficult to tell when it's worth it.

Your example

(p7) br.cond.dptk .L5
ld8 r15 = [r14]

becomes

ld8.sa r15 = [r14]  // speculative advanced load
(p7) br.cond.dptk .L5
ld8.c.clr r15 = [r14]   // checked load (clear ALAT)

Note that the speculative load can arbitrarily fail (e.g. tlb miss)
and that the checked load can also arbitrarily re-issue the load.

Note that one can't split ld8 r14 = [r14] without additional
register allocation, because the address needs to remain live
until the check.

 It does raise the question of whether we ought to completely
 change the way we represent the pairing of LTOFFX/LDXMOV
 relocations.
 
 This I can't answer since I don't know the definition of these.

These are markers for linker relaxation.  If the symbol is
within range,

addlx = @ltoffx(foo), gp
...
ld8.mov y = [x], foo// .mov adds the LDXMOV reloc vs foo to ld8 insn

will be relaxed to

nop
...
addly = @gprel(foo), gp

The constraint in using the relocations is that every ldxmov
insn must be fed by an ltoffx reloc with the same symbol, and
that an ltoffx reloc cannot feed any other insn.  That allows
us, at link time, to not consider data flow, merely assert
that if foo is relaxed anywhere, it's relaxed everywhere.


r~


Remove NetWare support

2011-07-14 Thread Rainer Orth
I've got a preliminary NetWare removal patch ready (yet untested), but
have a couple of questions:

* Given that there's a considerable amount of NetWare support still in
  src, toplevel support has to stay.  On the other hand, the reference
  in config/elf.m4 is only used for the LTO plugin and can go, I
  believe.

* The i386 port has some code that seems to be NetWare-specific, namely
  KEEP_AGGREGATE_RETURN_POINTER, ix86_keep_aggregate_return_pointer and
  the callee_pop_aggregate_return attribute.  I'm uncertain if all this
  can go now.

* There are references to config/i386/netware.h in gcc/po/*.po.  Do I
  need to do anything about this when netware.h is removed?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Janne Blomqvist
On Thu, Jul 14, 2011 at 18:40, Daniel Carrera dcarr...@gmail.com wrote:
 And of course, here is the patch and ChangeLog.

 2011-07-14  Daniel Carrera  dcarr...@gmail.com

        * caf/mpi.c (caf_runtime_error): Remove error parameter.
        Return EXIT_FAILURE instead.

Well, this changelog entry is incorrect; you're calling
exit(EXIT_FAILURE) and not returning a value.

From the patch:

 /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */

This is unnecessary, as the call to exit() will call the libgfortran
destructor which will close all units, as explained in comment #3 in
the PR.



-- 
Janne Blomqvist


Re: [google] Backport patch r175881 from gcc-4_6-branch to google/gcc-4_6 (issue4695051)

2011-07-14 Thread Diego Novillo
On Wed, Jul 13, 2011 at 22:39, Carrot Wei car...@google.com wrote:
 Hi Diego

 The previous patch was done with svn merge.

 This new version is done with svnmerge.py. Again tested with

Great, thanks.  This simplifies regular merges, since svnmerge.py will
know that this rev was merged already.  I think svnmerge.py also reads
'svn merge' markers, but I'm not sure.

 make check-g++ RUNTESTFLAGS=--target_board=arm-sim/thumb/arch=armv7-a
 dg.exp=anon-ns1.C
 make check-g++ RUNTESTFLAGS=dg.exp=anon-ns1.C

 BTW, there are some unexpected property changes after the merge; I don't
 know how they came about or how I should deal with them.

Don't worry about those.  The branch used to merge from multiple
sources that had different attributes for those files.  It will be
gone after the next merge I'm doing next week.

The backport is OK, of course.


Diego.


Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Daniel Carrera

On 07/14/2011 09:48 PM, Janne Blomqvist wrote:

2011-07-14  Daniel Carreradcarr...@gmail.com



* caf/mpi.c (caf_runtime_error): Remove error parameter.
Return EXIT_FAILURE instead.


Well, this changelog entry is incorrect; you're calling
exit(EXIT_FAILURE) and not returning a value.


Ok. What term should I have used? I was thinking that the program itself 
returns a value, but I can see that what I wrote was not optimal.




 From the patch:

  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */

This is unnecessary, as the call to exit() will call the libgfortran
destructor which will close all units, as explained in comment #3 in
the PR.


Ok. I didn't want to mess with comments that I didn't fully understand.

--
I'm not overweight, I'm undertall.


Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Tobias Burnus

Janne Blomqvist wrote:

* caf/mpi.c (caf_runtime_error): Remove error parameter.
Return EXIT_FAILURE instead.
 From the patch:

  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */

This is unnecessary, as the call to exit() will call the libgfortran
destructor which will close all units, as explained in comment #3 in
the PR.


While I think it should be sufficient for single-processor usage, I am 
not sure that, that way, all I/O buffers get written before one calls 
MPI_Finalize - nor am I sure how one would handle ERROR STOP with 
regard to I/O.


In terms of I/O, there are three kinds of I/O, which might need to be 
treated differently:
* I/O to stdout (OUTPUT_UNIT): Here, all output should be collected by 
MPI - I am not sure whether it will still get through later in a destructor

* I/O to (local) files
* I/O via the communication library: Here, I see the greatest problems, 
but that's not in F2008's coarrays, but might well be added with the 
Technical Report.


I somehow would feel better if I could ensure that the buffers are 
flushed and the files closed before I pull the MPI plug (MPI_Finalize, 
MPI_Abort).


For reference, the comment Janne is referring to is the one at the 
bottom (comment 3) of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43849


Tobias


Re: [build] Move crtfastmath to toplevel libgcc

2011-07-14 Thread Uros Bizjak
On Thu, Jul 14, 2011 at 12:09 PM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Andreas Schwab sch...@redhat.com writes:

 Same on ia64:

 Configuration mismatch!
 Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o
 Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o 
 crtfastmath.o

Alpha needs the same fix. I need the following patch to bootstrap the compiler:

--cut here--
Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 176282)
+++ gcc/config.gcc  (working copy)
@@ -757,6 +757,7 @@
extra_options=${extra_options} alpha/elf.opt
target_cpu_default=MASK_GAS
tmake_file=${tmake_file} alpha/t-alpha alpha/t-ieee alpha/t-linux
+   extra_parts=$extra_parts crtfastmath.o
;;
 alpha*-*-freebsd*)
tm_file=${tm_file} ${fbsd_tm_file} alpha/elf.h alpha/freebsd.h
--cut here--

Uros.


Re: [PATCH] New IPA-CP with real function cloning

2011-07-14 Thread Jan Hubicka
 
 Well, technically they survive until after inlining (because of
 indirect inlining which also derives information from the lattices
 corresponding to node-inlined_to node.  Results of arithmetic
 functions are not going to be accessed during inlining when compiling
 any reasonable program but...

Hmm, this sounds bad.  We should move it to GTY then, incrementally.  I however
thought that indirect inlining looks only at jump functions, not at lattices?
  
  __attribute__ ((really_bad_attribute))
  function (int param)
  {
use param
  }
  
  and then let ipa-cp to invent param is a constant.
 
 what would be such a really_bad_attribute ? 

Well, we need to go through the attribute list and prepare a list of bad guys
instead of forbidding any attribute.
Obviously we should not give up on the pure attribute, for example.
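
Something along these lines, as a sketch (the attribute names in the list are
only placeholders for whatever we decide is actually unsafe):

/* Sketch: refuse cloning only for attributes known to be problematic,
   instead of refusing whenever DECL has any attribute at all.  */
static bool
attribute_prevents_cloning_p (tree decl)
{
  static const char *const bad_attrs[] = { "ifunc", "weakref" };
  unsigned int i;

  for (i = 0; i < ARRAY_SIZE (bad_attrs); i++)
    if (lookup_attribute (bad_attrs[i], DECL_ATTRIBUTES (decl)))
      return true;
  return false;
}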

Another class of attributes are those referring to function arguments that need
updating if they are still in use after cloning.  I think this was the original
reason for adding the check.

I am not sure if we have attributes that should prevent cloning completely.

can_change_signature will also handle apply_args.
Add VA_START code there, too.  For the other use of this flag (in i386) 
VA_START
 
 The last one already is VA_START... or do you mean a different one?

I meant the code in ipa-inline-analysis.c doing the same checks but skipping
va_start, since the i386 backend tests it by itself.
  BTW currently the edges from thunks miss any profile info.
  (i.e. it will be 0). I guess we ought to make the ipa-profile pass estimate 
  those
  (it is difficult to measure the count of an alias).
  
 
 I'm not really looking at the edges from thunks to the actual
 function.  OTOH, I assume that edges to a thunk do have a count and
 look at that.

They do have (unless they are thunk-to-thunk edges), but in any case we ought to
regularize things here; sooner or later someone will get confused by counts
missing in the callgraph.
 
   If you walk only hot edges, then you need to make your function descend into
   both alias refs and thunk calls, or the aliases of thunks will not be seen
   then.
 
 Well, looking at bits of the patch again now, aliases to thunks might
 indeed be a problem for a few pieces of it.  I'll send the patch
 nevertheless and ponder about this problem later.

Hmm, please do :)
I will look at the updated patch.

Honza
 
 Thanks,
 
 Martin


Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c

2011-07-14 Thread Janne Blomqvist
On Thu, Jul 14, 2011 at 23:34, Tobias Burnus bur...@net-b.de wrote:
 Janne Blomqvist wrote:

        * caf/mpi.c (caf_runtime_error): Remove error parameter.
        Return EXIT_FAILURE instead.
  From the patch:

  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */

 This is unnecessary, as the call to exit() will call the libgfortran
 destructor which will close all units, as explained in comment #3 in
 the PR.

 While I think it should be sufficient for single-processor usage, I am not
 sure that that way all I/O buffers gets written before one calls
 MPI_Finalize - nor am I sure how one would handle ERROR STOP with regards to
 I/O.

Ah, I read that comment from caf/single.c, but now I see it in the
caf/mpi.c part of the patch as well. Yeah, in that case it might
matter, depending on how the MPI library implements MPI_Abort(). That
is, does it call exit() (in which case the destructor will be called)
or will it call abort() or raise some other fatal signal.

At least Open MPI seems to implement it by sending SIGTERM

http://www.open-mpi.org/doc/v1.4/man3/MPI_Abort.3.php

Come to think of it, in this case, if you want to use MPI_Abort()
rather than MPI_Finalize() + exit(), you should probably reset the
fatal signal handlers to the default one unless you want the backtrace
to be printed.
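
Roughly like this, as a sketch (the function name and the exact set of
signals to reset are illustrative only):

#include <signal.h>
#include <mpi.h>

/* Sketch: abort the whole MPI job without letting the fatal signal run
   any backtrace-printing handlers that may have been installed.  */
static void
caf_abort_quietly (int exit_code)
{
  signal (SIGTERM, SIG_DFL);    /* Open MPI delivers SIGTERM on MPI_Abort */
  signal (SIGABRT, SIG_DFL);
  MPI_Abort (MPI_COMM_WORLD, exit_code);
}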


-- 
Janne Blomqvist


[patch] Fix PR target/48220 for the SPARC

2011-07-14 Thread Eric Botcazou
Hi,

this adds support for DW_OP_GNU_entry_value/DW_TAG_GNU_call_site_parameter on 
SPARC-like architectures (architectures with register windows and an explicit 
window save instruction).  The transformation OUTGOING_REGNO -> INCOMING_REGNO 
is explicit for them and not tied to the call-to-subroutine instruction, so 
this needs to be modelled if you want precise variable tracking.

Tested on SPARC/Solaris (both GCC and GDB) and x86/Linux.  OK for mainline?


2011-07-14  Eric Botcazou  ebotca...@adacore.com

PR target/48220
* doc/md.texi (Standard Names): Document window_save.
* cfgexpand.c (expand_debug_parm_decl): New function extracted from
expand_debug_expr and expand_debug_source_expr.  If the target has
a window_save instruction, adjust the ENTRY_VALUE_EXP.
(expand_debug_expr) <SSA_NAME>: Call expand_debug_parm_decl if the
SSA_NAME_VAR is a parameter.
(expand_debug_source_expr) <PARM_DECL>: Call expand_debug_parm_decl.
* var-tracking.c (parm_reg_t): New type and associated vector type.
(windowed_parm_regs): New variable.
(adjust_insn): If the target has a window_save instruction and this
is the instruction, make its effect on parameter registers explicit.
(next_non_note_insn_var_location): New function.
(emit_notes_in_bb): Use it instead of NEXT_INSN throughout.
(vt_add_function_parameter): If the target has a window_save insn,
adjust the incoming RTL and record that in windowed_parm_regs.
(vt_finalize): Free windowed_parm_regs.


-- 
Eric Botcazou
Index: cfgexpand.c
===
--- cfgexpand.c	(revision 176072)
+++ cfgexpand.c	(working copy)
@@ -2358,8 +2358,60 @@ convert_debug_memory_address (enum machi
   return x;
 }
 
-/* Return an RTX equivalent to the value of the tree expression
-   EXP.  */
+/* Return an RTX equivalent to the value of the parameter DECL.  */
+
+static rtx
+expand_debug_parm_decl (tree decl)
+{
+  rtx incoming = DECL_INCOMING_RTL (decl);
+
+  if (incoming
+   && GET_MODE (incoming) != BLKmode
+   && ((REG_P (incoming) && HARD_REGISTER_P (incoming))
+	  || (MEM_P (incoming)
+	   && REG_P (XEXP (incoming, 0))
+	   && HARD_REGISTER_P (XEXP (incoming, 0)))))
+{
+  rtx rtl = gen_rtx_ENTRY_VALUE (GET_MODE (incoming));
+
+#ifdef HAVE_window_save
+  /* DECL_INCOMING_RTL uses the INCOMING_REGNO of parameter registers.
+	 If the target machine has an explicit window save instruction, the
+	 actual entry value is the corresponding OUTGOING_REGNO instead.  */
+  if (REG_P (incoming)
+	   && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming))
+	incoming
+	  = gen_rtx_REG_offset (incoming, GET_MODE (incoming),
+OUTGOING_REGNO (REGNO (incoming)), 0);
+  else if (MEM_P (incoming))
+	{
+	  rtx reg = XEXP (incoming, 0);
+	  if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg))
+	{
+	  reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg)));
+	  incoming = replace_equiv_address_nv (incoming, reg);
+	}
+	}
+#endif
+
+  ENTRY_VALUE_EXP (rtl) = incoming;
+  return rtl;
+}
+
+  if (incoming
+   && GET_MODE (incoming) != BLKmode
+   && !TREE_ADDRESSABLE (decl)
+   && MEM_P (incoming)
+   && (XEXP (incoming, 0) == virtual_incoming_args_rtx
+	  || (GET_CODE (XEXP (incoming, 0)) == PLUS
+	   && XEXP (XEXP (incoming, 0), 0) == virtual_incoming_args_rtx
+	   && CONST_INT_P (XEXP (XEXP (incoming, 0), 1)))))
+return incoming;
+
+  return NULL_RTX;
+}
+
+/* Return an RTX equivalent to the value of the tree expression EXP.  */
 
 static rtx
 expand_debug_expr (tree exp)
@@ -3169,36 +3221,12 @@ expand_debug_expr (tree exp)
 		if (SSA_NAME_IS_DEFAULT_DEF (exp)
 		 TREE_CODE (SSA_NAME_VAR (exp)) == PARM_DECL)
 		  {
-		rtx incoming = DECL_INCOMING_RTL (SSA_NAME_VAR (exp));
-		if (incoming
-			 GET_MODE (incoming) != BLKmode
-			 ((REG_P (incoming)  HARD_REGISTER_P (incoming))
-			|| (MEM_P (incoming)
- REG_P (XEXP (incoming, 0))
- HARD_REGISTER_P (XEXP (incoming, 0)
-		  {
-			op0 = gen_rtx_ENTRY_VALUE (GET_MODE (incoming));
-			ENTRY_VALUE_EXP (op0) = incoming;
-			goto adjust_mode;
-		  }
-		if (incoming
-			 MEM_P (incoming)
-			 !TREE_ADDRESSABLE (SSA_NAME_VAR (exp))
-			 GET_MODE (incoming) != BLKmode
-			 (XEXP (incoming, 0) == virtual_incoming_args_rtx
-			|| (GET_CODE (XEXP (incoming, 0)) == PLUS
- XEXP (XEXP (incoming, 0), 0)
-   == virtual_incoming_args_rtx
- CONST_INT_P (XEXP (XEXP (incoming, 0),
-		  1)
-		  {
-			op0 = incoming;
-			goto adjust_mode;
-		  }
+		op0 = expand_debug_parm_decl (SSA_NAME_VAR (exp));
+		if (op0)
+		  goto adjust_mode;
 		op0 = expand_debug_expr (SSA_NAME_VAR (exp));
-		if (!op0)
-		  return NULL;
-		goto adjust_mode;
+		if (op0)
+		  goto adjust_mode;
 		  }
 		return NULL;
 	  }
@@ -3327,36 +3355,14 @@ 

[4.6] Fix PR tree-optimization/49725

2011-07-14 Thread Eric Botcazou
Hi,

this is the ICE at -O2 on ACATS c34005a introduced on the 4.6 branch by 
Martin's latest SRA patch.  But it's actually the same PRE issue as:
  http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02210.html

Bootstrapped/regtested on i586-suse-linux, OK for 4.6 branch?


2011-07-14  Eric Botcazou  ebotca...@adacore.com

PR tree-optimization/49725
Backport from mainline
2011-03-31  Eric Botcazou  ebotca...@adacore.com

* tree-ssa-pre.c (create_component_ref_by_pieces_1) <ARRAY_REF>: Drop
a zero minimum index only if it is redundant.



-- 
Eric Botcazou
Index: tree-ssa-pre.c
===
--- tree-ssa-pre.c	(revision 176264)
+++ tree-ssa-pre.c	(working copy)
@@ -2874,8 +2874,11 @@ create_component_ref_by_pieces_1 (basic_
 	  return NULL_TREE;
 	if (genop2)
 	  {
-	/* Drop zero minimum index.  */
-	if (tree_int_cst_equal (genop2, integer_zero_node))
+	tree domain_type = TYPE_DOMAIN (TREE_TYPE (genop0));
+	/* Drop zero minimum index if redundant.  */
+	if (integer_zerop (genop2)
+		&& (!domain_type
+		    || integer_zerop (TYPE_MIN_VALUE (domain_type))))
 	  genop2 = NULL_TREE;
 	else
 	  {


[patch] Fix PR middle-end/49732

2011-07-14 Thread Eric Botcazou
Hi,

this is a regression present on mainline and 4.6 branch.  The compiler crashes 
during gimplification because there is a COMPOUND_EXPR shared between the 
TYPE_SIZE and TYPE_SIZE_UNIT expressions of an array type.  Now this isn't 
supposed to happen because we run an unsharing pass before gimplification.

The problem here is that we have a forward declaration (DECL_EXPR) of a pointer 
type to the array type.  So, during the marking phase of the unsharing pass, 
the array type gets marked as visited through the pointer, which prevents it 
from being walked during the same phase when its own DECL_EXPR is processed.

This pointer/pointed-to type business is an old pattern.  Five years ago, 
Olivier changed gimplify_type_sizes like so:

2006-10-06  Olivier Hainque  hain...@adacore.com

* gimplify.c (gimplify_type_sizes) [POINTER_TYPE, REFERENCE_TYPE]:
Don't recurse on the pointed-to type.

because of a related problem.  It turns out that we need the same change in the 
DECL_EXPR case of walk_tree_1 to fix the bug at hand, which is sort of logical 
as there is a strong relationship between them:

case DECL_EXPR:
  /* If this is a TYPE_DECL, walk into the fields of the type that it's
 defining.  We only want to walk into these fields of a type in this
 case and not in the general case of a mere reference to the type.

 The criterion is as follows: if the field can be an expression, it
 must be walked only here.  This should be in keeping with the fields
 that are directly gimplified in gimplify_type_sizes in order for the
 mark/copy-if-shared/unmark machinery of the gimplifier to work with
 variable-sized types.

 Note that DECLs get walked as part of processing the BIND_EXPR.  */


Tested on x86_64-suse-linux, OK for mainline and 4.6 branch?


2011-07-14  Eric Botcazou  ebotca...@adacore.com

PR middle-end/49732
* tree.c (walk_tree_1) <DECL_EXPR>: Do not walk a pointed-to type.


2011-07-14  Eric Botcazou  ebotca...@adacore.com

* gnat.dg/pointer_controlled.adb: New test.


-- 
Eric Botcazou
Index: tree.c
===
--- tree.c	(revision 176261)
+++ tree.c	(working copy)
@@ -10596,9 +10596,14 @@ walk_tree_1 (tree *tp, walk_tree_fn func
 	  if (result || !walk_subtrees)
 	return result;
 
-	  result = walk_type_fields (*type_p, func, data, pset, lh);
-	  if (result)
-	return result;
+	  /* But do not walk a pointed-to type since it may itself need to
+	 be walked in the declaration case if it isn't anonymous.  */
+	  if (!POINTER_TYPE_P (*type_p))
+	{
+	  result = walk_type_fields (*type_p, func, data, pset, lh);
+	  if (result)
+		return result;
+	}
 
 	  /* If this is a record type, also walk the fields.  */
 	  if (RECORD_OR_UNION_TYPE_P (*type_p))
-- PR ada/49732
-- Testcase by Vorfeed Canal

-- { dg-do compile }
-- { dg-options "-gnato" }

with Interfaces.C; use Interfaces.C;
with Interfaces.C.Strings; use Interfaces.C.Strings;
with Interfaces.C.Pointers;

procedure Pointer_Controlled is

   function Create (Name : String) return size_t is

  type Name_String is new char_array (0 .. Name'Length);
  type Name_String_Ptr is access Name_String;
  pragma Controlled (Name_String_Ptr);

  Name_Str : constant Name_String_Ptr := new Name_String;
  Name_Len : size_t;

   begin
  To_C (Name, Name_Str.all, Name_Len);
  return 1;
   end;

   Test : size_t;

begin
   Test := Create("ABC");
end;


Re: [PATCH] New IPA-CP with real function cloning - updated version

2011-07-14 Thread Jan Hubicka
 2011-07-14  Martin Jambor  mjam...@suse.cz
 
   * ipa-prop.h: Include alloc-pool.h, all sorts of updates to general
   comments.
   (ipcp_values_pool): Declare.
   (ipcp_sources_pool): Likewise.
   (ipcp_lattice): Changed to forward declaration.
   (ipa_param_descriptor): Removed fields ipcp_lattice, types and
   cannot_devirtualize.
   (ipa_node_params): New fields descriptors, lattices, known_vals,
   clone_for_all_contexts and node dead, removed fields params and
   count_scale.
   (ipa_set_param_count): Removed.
   (ipa_get_param_count): Made to work with descriptors vector.
   (ipa_get_param): Updated.
   (ipa_param_cannot_devirtualize_p): Removed.
   (ipa_param_types_vec_empty): Likewise.
   (ipa_set_param_used): New function.
   (ipa_get_param_used): Updated to use descriptors vector.
   (ipa_func_list): Removed.
   (ipa_init_func_list): Removed declaration.
   (ipa_push_func_to_list_1): Likewise.
   (ipa_pop_func_from_list): Likewise.
   (ipa_push_func_to_list): Removed.
   (ipa_lattice_from_jfunc): Remove declaration.
   (ipa_get_jf_pass_through_result): Declare.
   (ipa_get_jf_ancestor_result): Likewise.
   (ipa_value_from_jfunc): Likewise.
   (ipa_get_lattice): Update.
   (ipa_lat_is_single_const): New function.
   * ipa-prop.c (ipa_push_func_to_list_1): Removed.
   (ipa_init_func_list): Likewise.
   (ipa_pop_func_from_list): Likewise.
   (ipa_get_param_decl_index): Fix coding style.
   (count_formal_params): Removed.
   (count_formal_params_1): Renamed to count_formal_params.
   (ipa_populate_param_decls): Update to use descriptors vector.
   (ipa_initialize_node_params): Likewise.
   (visit_ref_for_mod_analysis): Use ipa_set_param_used.
   (ipa_analyze_params_uses): Likewise.
   (ipa_free_node_params_substructures): Likewise and free also lattices
   and known values.
   (duplicate_array): Removed.
   (ipa_edge_duplication_hook): Add the new edge to the list of edge
   clones.
   (ipa_node_duplication_hook): Update to use new lattices.
   (ipa_free_all_structures_after_ipa_cp): Free alloc pools.
   (ipa_free_all_structures_after_iinln): Likewise.
   (ipa_write_node_info): Update to use new lattices.
   (ipa_read_node_info): Likewise.
   (ipa_get_jf_pass_through_result): New function.
   (ipa_get_jf_ancestor_result): Likewise.
   (ipa_value_from_jfunc): Likewise.
   (ipa_cst_from_jfunc): Reimplemented using ipa_value_from_jfunc.
   * ipa-cp.c: Reimplemented.
   * params.def (PARAM_DEVIRT_TYPE_LIST_SIZE): Removed.
   (PARAM_IPA_CP_VALUE_LIST_SIZE): New parameter.
   (PARAM_IPA_CP_EVAL_THRESHOLD): Likewise.
   * Makefile.in (IPA_PROP_H): Added alloc-pool.h to dependencies.
 
   * doc/invoke.texi (devirt-type-list-size): Removed description.
   (ipa-cp-value-list-size): Added description.
 
   * testsuite/gcc.dg/ipa/ipa-1.c: Updated testcase dump scan.
   * testsuite/gcc.dg/ipa/ipa-2.c: Likewise.
   * testsuite/gcc.dg/ipa/ipa-3.c: Likewise and made functions static.
   * testsuite/gcc.dg/ipa/ipa-4.c: Updated testcase dump scan.
   * testsuite/gcc.dg/ipa/ipa-5.c: Likewise.
   * testsuite/gcc.dg/ipa/ipa-7.c: Likewise.
   * testsuite/gcc.dg/ipa/ipa-8.c: Updated testcase dump scan.
   * testsuite/gcc.dg/ipa/ipacost-1.c: Likewise.
   * testsuite/gcc.dg/ipa/ipacost-2.c: Likewise and increased sizes
   of some functions.
   * testsuite/gcc.dg/ipa/ipcp-1.c: New test.
   * testsuite/gcc.dg/ipa/ipcp-2.c: Likewise.
   * testsuite/gcc.dg/tree-ssa/ipa-cp-1.c: Updated testcase.
 
 
 /* ipa_node_params stores information related to formal parameters of 
 functions
and some other information for interprocedural passes that operate on
parameters (such as ipa-cp).  */
 
 struct ipa_node_params
 {
   /* Information about individual formal parameters that are gathered when
  summaries are generated. */
   VEC (ipa_param_descriptor_t, heap) *descriptors;
   /* Pointer to an array of structures describing individual formal
  parameters.  */
   struct ipcp_lattice *lattices;

Hmm, I was under the impression that the plan was to break away the stuff used by
ipa-cp internally during the propagation stage (i.e. ipcp_orig_node/known_vals
and probably lattices) from the stuff used to represent parameters and jump
functions, i.e. descriptors.

But this can be handled incrementally.

   /* FIXME: Can we clobber only the first argument of thunks?  */

Well, we should know how to propagate through it.  But it is not a very
important side case, I guess, until we can devirtualize to them effectively.
   if (node->alias || node->thunk.thunk_p
   || ipa_is_called_with_var_arguments (info))
 disable = true;

The patch is OK, thanks!
It would be nice to add a testcase checking that the profile updating is done correctly.

Honza


[pph] Add symbol table - Fix remaining asm differences (issue4732043)

2011-07-14 Thread Diego Novillo

This patch fixes the remaining assembly differences between non-pph
and pph compiles.

The idea is to make the pph compiler process functions and variables
in the same order that they had been processed in the original
compile.  To do this, we intercept calls to rest_of_decl_compilation
and allocate_struct_function.  At every call, we add the declaration
to a list.

This list becomes the symbol table for the header file, which is then
written to the end of the file.  When reading the file, we read this
table and present the symbols to the middle end in that order.

The symbol table is written in pph_out_symtab and read in
pph_in_symtab.  The other changes are mostly wrapping calls to
rest_of_decl_compilation so that we can add the declarations to the
symbol table.

This simplifies the logic we use to present symbols to the middle end
as it encodes the sequence in the symbol table itself.  No need to
reverse lists of symbols or the other contortions we used to make.

Tested on x86_64.  Applied to branch.


Diego.

cp/ChangeLog.pph
2011-07-14   Diego Novillo  dnovi...@google.com

* Make-lang.in (cp/decl.o): Add dependency on CXX_PPH_STREAMER_H.
(cp/pph-streamer-out.o): Add dependency on CGRAPH_H.
(cp/pph-streamer-in.o): Add dependency on toplev.h

* cp-tree.h (cp_rest_of_decl_compilation): Declare.
* call.c (set_up_extended_ref_temp): Call cp_rest_of_decl_compilation
instead of rest_of_decl_compilation.
* class.c (build_clone): Likewise.
* decl2.c (finish_anon_union): Likewise.
(maybe_emit_vtables): Likewise.
(write_out_vars): Likewise.
* method.c (implicitly_declare_fn): Likewise.
* semantics.c (maybe_add_lambda_conv_op): Likewise.
* decl.c (make_rtl_for_nonlocal_decl): Likewise.
(cp_finish_decl): Likewise.
(start_preparsed_function): Likewise.
(cp_rest_of_decl_compilation): New.

* pph-streamer-in.c: Include toplev.h
(pph_in_language_function): Tidy.
(pph_in_struct_function): Remove parameter DECL.
Support reading shared struct functions.
Read FN-DECL.
(pph_register_decl_in_symtab): Remove.  Update all users.
(pph_register_binding_in_symtab): Remove.  Update all users.
(pph_in_symtab_marker): New.
(pph_in_symtab): New.
(pph_read_file_contents): Call it.
(pph_in_function_decl): Do not call pph_in_struct_function.
* pph-streamer-out.c: Include cgraph.h.
(decls_to_register): New.
(pph_out_chain_filtered): Remove argument REVERSE_P.  Update
all users.
(pph_out_struct_function): Write FN-DECL.
(pph_out_symtab_marker): New.
(pph_out_symtab): New.
(pph_write_file_contents): Call it.
(pph_out_function_decl): Do not call pph_out_struct_function.
(pph_add_decl_to_register): New.
* pph-streamer.h (enum pph_symtab_marker): New.
(struct pph_stream): Remove field FNS_TO_EXPAND.  Update all
users.
(pph_add_decl_to_register): Declare.

testsuite/ChangeLog.pph
* g++.dg/pph/c1pr44948-1a.cc: Adjust expected failure.
* g++.dg/pph/x0hardlookup.h: Add new symbol in global namespace.
* g++.dg/pph/c4meteor-contest.cc: Mark fixed.
* g++.dg/pph/c1attr-warn-unused.cc: Likewise.
* g++.dg/pph/x1globalref.cc: Likewise.
* g++.dg/pph/x1hardlookup.cc: Likewise.


diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index 5f3249e..9634c47 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -271,7 +271,7 @@ cp/decl.o: cp/decl.c $(CXX_TREE_H) $(TM_H) $(FLAGS_H) 
cp/decl.h \
   cp/operators.def $(TM_P_H) $(TREE_INLINE_H) $(DIAGNOSTIC_H) $(C_PRAGMA_H) \
   debug.h gt-cp-decl.h $(TIMEVAR_H) $(TARGET_H) $(PLUGIN_H) \
   intl.h tree-iterator.h pointer-set.h $(SPLAY_TREE_H) \
-  c-family/c-objc.h
+  c-family/c-objc.h $(CXX_PPH_STREAMER_H)
 cp/decl2.o: cp/decl2.c $(CXX_TREE_H) $(TM_H) $(FLAGS_H) cp/decl.h \
   output.h toplev.h $(C_COMMON_H) gt-cp-decl2.h $(CGRAPH_H) \
   $(C_PRAGMA_H) $(TREE_DUMP_H) intl.h $(TARGET_H) $(GIMPLE_H) pointer-set.h \
@@ -351,8 +351,8 @@ cp/pph-streamer.o: cp/pph-streamer.c $(CONFIG_H) 
$(SYSTEM_H) coretypes.h \
 cp/pph-streamer-out.o: cp/pph-streamer-out.c $(CONFIG_H) $(SYSTEM_H) \
coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \
$(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \
-   cppbuiltin.h tree-iterator.h
+   cppbuiltin.h tree-iterator.h $(CGRAPH_H)
 cp/pph-streamer-in.o: cp/pph-streamer-in.c $(CONFIG_H) $(SYSTEM_H) \
coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \
$(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \
-   cppbuiltin.h tree-iterator.h
+   cppbuiltin.h tree-iterator.h toplev.h
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 56f3408..6988dc7 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8567,7 +8567,7 @@ 

[PATCH 3/9] dwarf2cfi: Populate CUR_ROW-REG_SAVE.

2011-07-14 Thread Richard Henderson
To be actually used by a subsequent patch.
---
 gcc/dwarf2cfi.c |   27 +--
 1 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index eb59f28..36fa7f8 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -285,6 +285,17 @@ add_cfi (dw_cfi_ref cfi)
 VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi);
 }
 
+/* Perform ROW->REG_SAVE[COLUMN] = CFI.  CFI may be null, indicating
+   that the register column is no longer saved.  */
+
+static void
+update_row_reg_save (dw_cfi_row_ref row, unsigned column, dw_cfi_ref cfi)
+{
+  if (VEC_length (dw_cfi_ref, row->reg_save) <= column)
+    VEC_safe_grow_cleared (dw_cfi_ref, gc, row->reg_save, column + 1);
+  VEC_replace (dw_cfi_ref, row->reg_save, column, cfi);
+}
+
 /* This function fills in aa dw_cfa_location structure from a dwarf location
descriptor sequence.  */
 
@@ -574,7 +585,13 @@ reg_save (unsigned int reg, unsigned int sreg, 
HOST_WIDE_INT offset)
   cfi-dw_cfi_oprnd2.dw_cfi_offset = offset;
 }
   else if (sreg == reg)
-cfi-dw_cfi_opc = DW_CFA_same_value;
+{
+  /* While we could emit something like DW_CFA_same_value or
+DW_CFA_restore, we never expect to see something like that
+in a prologue.  This is more likely to be a bug.  A backend
+can always bypass this by using REG_CFA_RESTORE directly.  */
+  gcc_unreachable ();
+}
   else
 {
   cfi-dw_cfi_opc = DW_CFA_register;
@@ -582,6 +599,7 @@ reg_save (unsigned int reg, unsigned int sreg, 
HOST_WIDE_INT offset)
 }
 
   add_cfi (cfi);
+  update_row_reg_save (cur_row, reg, cfi);
 }
 
 /* Given a SET, calculate the amount of stack adjustment it
@@ -1337,6 +1355,7 @@ dwarf2out_frame_debug_cfa_expression (rtx set)
 {
   rtx src, dest, span;
   dw_cfi_ref cfi = new_cfi ();
+  unsigned regno;
 
   dest = SET_DEST (set);
   src = SET_SRC (set);
@@ -1347,8 +1366,10 @@ dwarf2out_frame_debug_cfa_expression (rtx set)
   span = targetm.dwarf_register_span (src);
   gcc_assert (!span);
 
+  regno = dwf_regno (src);
+
   cfi-dw_cfi_opc = DW_CFA_expression;
-  cfi-dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (src);
+  cfi-dw_cfi_oprnd1.dw_cfi_reg_num = regno;
   cfi-dw_cfi_oprnd2.dw_cfi_loc
 = mem_loc_descriptor (XEXP (dest, 0), get_address_mode (dest),
  GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED);
@@ -1356,6 +1377,7 @@ dwarf2out_frame_debug_cfa_expression (rtx set)
   /* ??? We'd like to use queue_reg_save, were the interface different,
  and, as above, we could manage flushing for epilogues.  */
   add_cfi (cfi);
+  update_row_reg_save (cur_row, regno, cfi);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note.  */
@@ -1370,6 +1392,7 @@ dwarf2out_frame_debug_cfa_restore (rtx reg)
   cfi-dw_cfi_oprnd1.dw_cfi_reg_num = regno;
 
   add_cfi (cfi);
+  update_row_reg_save (cur_row, regno, NULL);
 }
 
 /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_WINDOW_SAVE.
-- 
1.7.6



[PATCH 2/9] dwarf2cfi: Rename cfi_insn to add_cfi_insn.

2011-07-14 Thread Richard Henderson
Make it consistent with add_cfi_vec.
---
 gcc/dwarf2cfi.c |   20 +++-
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 1c1b74f..eb59f28 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -86,7 +86,7 @@ static GTY(()) dw_cfi_row_ref cie_cfi_row;
 static GTY(()) unsigned long dwarf2out_cfi_label_num;
 
 /* The insn after which a new CFI note should be emitted.  */
-static rtx cfi_insn;
+static rtx add_cfi_insn;
 
 /* When non-null, add_cfi will add the CFI to this vector.  */
 static cfi_vec *add_cfi_vec;
@@ -274,11 +274,13 @@ add_cfi (dw_cfi_ref cfi)
 }
 
   any_cfis_emitted = true;
-  if (cfi_insn != NULL)
+
+  if (add_cfi_insn != NULL)
 {
-  cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn);
-  NOTE_CFI (cfi_insn) = cfi;
+  add_cfi_insn = emit_note_after (NOTE_INSN_CFI, add_cfi_insn);
+  NOTE_CFI (add_cfi_insn) = cfi;
 }
+
   if (add_cfi_vec != NULL)
 VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi);
 }
@@ -2319,7 +2321,7 @@ create_cfi_notes (void)
 {
   rtx pat;
 
-  cfi_insn = PREV_INSN (insn);
+  add_cfi_insn = PREV_INSN (insn);
 
   if (BARRIER_P (insn))
{
@@ -2342,7 +2344,7 @@ create_cfi_notes (void)
  break;
 
case NOTE_INSN_CFA_RESTORE_STATE:
- cfi_insn = insn;
+ add_cfi_insn = insn;
  dwarf2out_frame_debug_restore_state ();
  break;
}
@@ -2372,13 +2374,13 @@ create_cfi_notes (void)
 
   /* Do not separate tablejump insns from their ADDR_DIFF_VEC.
 Putting the note after the VEC should be ok.  */
-  if (!tablejump_p (insn, NULL, cfi_insn))
-   cfi_insn = insn;
+  if (!tablejump_p (insn, NULL, add_cfi_insn))
+   add_cfi_insn = insn;
 
   dwarf2out_frame_debug (insn, true);
 }
 
-  cfi_insn = NULL;
+  add_cfi_insn = NULL;
 }
 
 /* Determine if we need to save and restore CFI information around this
-- 
1.7.6



[PATCH 7/9] dwarf2cfi: Allocate reg_saved_in_data in the heap.

2011-07-14 Thread Richard Henderson
---
 gcc/dwarf2cfi.c |   19 ++-
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index a800cb4..1d6413f 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -1109,13 +1109,13 @@ typedef struct GTY(()) reg_saved_in_data {
 } reg_saved_in_data;
 
 DEF_VEC_O (reg_saved_in_data);
-DEF_VEC_ALLOC_O (reg_saved_in_data, gc);
+DEF_VEC_ALLOC_O (reg_saved_in_data, heap);
 
 /* A set of registers saved in other registers.  This is implemented as
a flat array because it normally contains zero or 1 entry, depending
on the target.  IA-64 is the big spender here, using a maximum of
5 entries.  */
-static GTY(()) VEC(reg_saved_in_data, gc) *regs_saved_in_regs;
+static VEC(reg_saved_in_data, heap) *regs_saved_in_regs;
 
 static GTY(()) reg_saved_in_data *cie_return_save;
 
@@ -1161,7 +1161,7 @@ record_reg_saved_in_reg (rtx dest, rtx src)
   if (dest == NULL)
 return;
 
-  elt = VEC_safe_push(reg_saved_in_data, gc, regs_saved_in_regs, NULL);
+  elt = VEC_safe_push(reg_saved_in_data, heap, regs_saved_in_regs, NULL);
   elt-orig_reg = src;
   elt-saved_in_reg = dest;
 }
@@ -2699,6 +2699,9 @@ initial_return_save (rtx rtl)
 static unsigned int
 execute_dwarf2_frame (void)
 {
+  gcc_checking_assert (queued_reg_saves == NULL);
+  gcc_checking_assert (regs_saved_in_regs == NULL);
+
   /* The first time we're called, compute the incoming frame state.  */
   if (cie_cfi_vec == NULL)
 {
@@ -2737,7 +2740,7 @@ execute_dwarf2_frame (void)
  cie_return_save = ggc_alloc_reg_saved_in_data ();
  *cie_return_save = *VEC_index (reg_saved_in_data,
 regs_saved_in_regs, 0);
- regs_saved_in_regs = NULL;
+ VEC_pop (reg_saved_in_data, regs_saved_in_regs);
  break;
default:
  gcc_unreachable ();
@@ -2748,12 +2751,10 @@ execute_dwarf2_frame (void)
 }
 
   /* Set up state for generating call frame debug info.  */
-  gcc_checking_assert (queued_reg_saves == NULL);
-  gcc_checking_assert (regs_saved_in_regs == NULL);
-
   cur_row = copy_cfi_row (cie_cfi_row);
   if (cie_return_save)
-VEC_safe_push (reg_saved_in_data, gc, regs_saved_in_regs, cie_return_save);
+VEC_safe_push (reg_saved_in_data, heap,
+  regs_saved_in_regs, cie_return_save);
 
   cfa_store = cur_row-cfa;
   args_size = 0;
@@ -2770,7 +2771,7 @@ execute_dwarf2_frame (void)
   /* Reset all function-specific information, particularly for GC.  */
   XDELETEVEC (barrier_args_size);
   barrier_args_size = NULL;
-  regs_saved_in_regs = NULL;
+  VEC_free (reg_saved_in_data, heap, regs_saved_in_regs);
   VEC_free (queued_reg_save, heap, queued_reg_saves);
 
   free_cfi_row (cur_row);
-- 
1.7.6



[PATCH 6/9] dwarf2cfi: Convert queued_reg_save to a VEC.

2011-07-14 Thread Richard Henderson
Also, allocate it on the heap instead of in garbage-collected memory.
---
 gcc/dwarf2cfi.c |   51 ++-
 1 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 51fb824..a800cb4 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -1091,14 +1091,16 @@ dwarf2out_notice_stack_adjust (rtx insn, bool after_p)
of the prologue or (b) the register is clobbered.  This clusters
register saves so that there are fewer pc advances.  */
 
-struct GTY(()) queued_reg_save {
-  struct queued_reg_save *next;
+typedef struct {
   rtx reg;
-  HOST_WIDE_INT cfa_offset;
   rtx saved_reg;
-};
+  HOST_WIDE_INT cfa_offset;
+} queued_reg_save;
 
-static GTY(()) struct queued_reg_save *queued_reg_saves;
+DEF_VEC_O (queued_reg_save);
+DEF_VEC_ALLOC_O (queued_reg_save, heap);
+
+static VEC(queued_reg_save, heap) *queued_reg_saves;
 
 /* The caller's ORIG_REG is saved in SAVED_IN_REG.  */
 typedef struct GTY(()) reg_saved_in_data {
@@ -1170,24 +1172,21 @@ record_reg_saved_in_reg (rtx dest, rtx src)
 static void
 queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset)
 {
-  struct queued_reg_save *q;
+  queued_reg_save *q;
+  size_t i;
 
   /* Duplicates waste space, but it's also necessary to remove them
  for correctness, since the queue gets output in reverse order.  */
-  for (q = queued_reg_saves; q != NULL; q = q-next)
+  FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q)
 if (compare_reg_or_pc (q-reg, reg))
-  break;
+  goto found;
 
-  if (q == NULL)
-{
-  q = ggc_alloc_queued_reg_save ();
-  q-next = queued_reg_saves;
-  queued_reg_saves = q;
-}
+  q = VEC_safe_push (queued_reg_save, heap, queued_reg_saves, NULL);
 
+ found:
   q-reg = reg;
-  q-cfa_offset = offset;
   q-saved_reg = sreg;
+  q-cfa_offset = offset;
 }
 
 /* Output all the entries in QUEUED_REG_SAVES.  */
@@ -1195,9 +1194,10 @@ queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset)
 static void
 dwarf2out_flush_queued_reg_saves (void)
 {
-  struct queued_reg_save *q;
+  queued_reg_save *q;
+  size_t i;
 
-  for (q = queued_reg_saves; q; q = q-next)
+  FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q)
 {
   unsigned int reg, sreg;
 
@@ -1214,7 +1214,7 @@ dwarf2out_flush_queued_reg_saves (void)
   reg_save (reg, sreg, q-cfa_offset);
 }
 
-  queued_reg_saves = NULL;
+  VEC_truncate (queued_reg_save, queued_reg_saves, 0);
 }
 
 /* Does INSN clobber any register which QUEUED_REG_SAVES lists a saved
@@ -1225,17 +1225,18 @@ dwarf2out_flush_queued_reg_saves (void)
 static bool
 clobbers_queued_reg_save (const_rtx insn)
 {
-  struct queued_reg_save *q;
+  queued_reg_save *q;
+  size_t iq;
 
-  for (q = queued_reg_saves; q; q = q-next)
+  FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, iq, q)
 {
-  size_t i;
+  size_t ir;
   reg_saved_in_data *rir;
 
   if (modified_in_p (q-reg, insn))
return true;
 
-  FOR_EACH_VEC_ELT (reg_saved_in_data, regs_saved_in_regs, i, rir)
+  FOR_EACH_VEC_ELT (reg_saved_in_data, regs_saved_in_regs, ir, rir)
if (compare_reg_or_pc (q-reg, rir-orig_reg)
 modified_in_p (rir-saved_in_reg, insn))
  return true;
@@ -1250,11 +1251,11 @@ static rtx
 reg_saved_in (rtx reg)
 {
   unsigned int regn = REGNO (reg);
-  struct queued_reg_save *q;
+  queued_reg_save *q;
   reg_saved_in_data *rir;
   size_t i;
 
-  for (q = queued_reg_saves; q; q = q-next)
+  FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q)
 if (q-saved_reg  regn == REGNO (q-saved_reg))
   return q-reg;
 
@@ -2770,7 +2771,7 @@ execute_dwarf2_frame (void)
   XDELETEVEC (barrier_args_size);
   barrier_args_size = NULL;
   regs_saved_in_regs = NULL;
-  queued_reg_saves = NULL;
+  VEC_free (queued_reg_save, heap, queued_reg_saves);
 
   free_cfi_row (cur_row);
   cur_row = NULL;
-- 
1.7.6



[PATCH 4/9] dwarf2cfi: Implement change_cfi_row.

2011-07-14 Thread Richard Henderson
Add a generic function to adjust cfi state from one row to another.
Use this to implement text section switching.  This will also be
usable for arbitrary changes around a cfg for shrink-wrapping.
---
 gcc/dwarf2cfi.c |  376 +--
 gcc/dwarf2out.c |  159 ++--
 gcc/dwarf2out.h |3 +-
 3 files changed, 290 insertions(+), 248 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 36fa7f8..745e137 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -285,6 +285,28 @@ add_cfi (dw_cfi_ref cfi)
 VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi);
 }
 
+static void
+add_cfi_args_size (HOST_WIDE_INT size)
+{
+  dw_cfi_ref cfi = new_cfi ();
+
+  cfi-dw_cfi_opc = DW_CFA_GNU_args_size;
+  cfi-dw_cfi_oprnd1.dw_cfi_offset = size;
+
+  add_cfi (cfi);
+}
+
+static void
+add_cfi_restore (unsigned reg)
+{
+  dw_cfi_ref cfi = new_cfi ();
+
+  cfi-dw_cfi_opc = (reg  ~0x3f ? DW_CFA_restore_extended : DW_CFA_restore);
+  cfi-dw_cfi_oprnd1.dw_cfi_reg_num = reg;
+
+  add_cfi (cfi);
+}
+
 /* Perform ROW-REG_SAVE[COLUMN] = CFI.  CFI may be null, indicating
that the register column is no longer saved.  */
 
@@ -474,64 +496,109 @@ cfa_equal_p (const dw_cfa_location *loc1, const 
dw_cfa_location *loc2)
  || loc1-base_offset == loc2-base_offset));
 }
 
-/* This routine does the actual work.  The CFA is now calculated from
-   the dw_cfa_location structure.  */
+/* Determine if two CFI operands are identical.  */
 
-static void
-def_cfa_1 (dw_cfa_location *loc_p)
+static bool
+cfi_oprnd_equal_p (enum dw_cfi_oprnd_type t, dw_cfi_oprnd *a, dw_cfi_oprnd *b)
 {
-  dw_cfi_ref cfi;
-  dw_cfa_location loc = *loc_p;
+  switch (t)
+{
+case dw_cfi_oprnd_unused:
+  return true;
+case dw_cfi_oprnd_reg_num:
+  return a->dw_cfi_reg_num == b->dw_cfi_reg_num;
+case dw_cfi_oprnd_offset:
+  return a->dw_cfi_offset == b->dw_cfi_offset;
+case dw_cfi_oprnd_addr:
+  return (a->dw_cfi_addr == b->dw_cfi_addr
+	  || strcmp (a->dw_cfi_addr, b->dw_cfi_addr) == 0);
+case dw_cfi_oprnd_loc:
+  return loc_descr_equal_p (a->dw_cfi_loc, b->dw_cfi_loc);
+}
+  gcc_unreachable ();
+}
 
-  if (cfa_store.reg == loc.reg && loc.indirect == 0)
-cfa_store.offset = loc.offset;
+/* Determine if two CFI entries are identical.  */
+
+static bool
+cfi_equal_p (dw_cfi_ref a, dw_cfi_ref b)
+{
+  enum dwarf_call_frame_info opc;
+
+  /* Make things easier for our callers, including missing operands.  */
+  if (a == b)
+return true;
+  if (a == NULL || b == NULL)
+return false;
+
+  /* Obviously, the opcodes must match.  */
+  opc = a->dw_cfi_opc;
+  if (opc != b->dw_cfi_opc)
+return false;
+
+  /* Compare the two operands, re-using the type of the operands as
+ already exposed elsewhere.  */
+  return (cfi_oprnd_equal_p (dw_cfi_oprnd1_desc (opc),
+			     &a->dw_cfi_oprnd1, &b->dw_cfi_oprnd1)
+	  && cfi_oprnd_equal_p (dw_cfi_oprnd2_desc (opc),
+				&a->dw_cfi_oprnd2, &b->dw_cfi_oprnd2));
+}
+
+/* The CFA is now calculated from NEW_CFA.  Consider OLD_CFA in determining
+   what opcode to emit.  Returns the CFI opcode to effect the change, or
+   NULL if NEW_CFA == OLD_CFA.  */
+
+static dw_cfi_ref
+def_cfa_0 (dw_cfa_location *old_cfa, dw_cfa_location *new_cfa)
+{
+  dw_cfi_ref cfi;
 
   /* If nothing changed, no need to issue any call frame instructions.  */
-  if (cfa_equal_p (&loc, &cur_row->cfa))
-return;
+  if (cfa_equal_p (old_cfa, new_cfa))
+return NULL;
 
   cfi = new_cfi ();
 
-  if (loc.reg == cur_row->cfa.reg && !loc.indirect && !cur_row->cfa.indirect)
+  if (new_cfa->reg == old_cfa->reg && !new_cfa->indirect && !old_cfa->indirect)
 {
   /* Construct a DW_CFA_def_cfa_offset offset instruction, indicating
 the CFA register did not change but the offset did.  The data
 factoring for DW_CFA_def_cfa_offset_sf happens in output_cfi, or
 in the assembler via the .cfi_def_cfa_offset directive.  */
-  if (loc.offset < 0)
+  if (new_cfa->offset < 0)
	cfi->dw_cfi_opc = DW_CFA_def_cfa_offset_sf;
   else
	cfi->dw_cfi_opc = DW_CFA_def_cfa_offset;
-  cfi->dw_cfi_oprnd1.dw_cfi_offset = loc.offset;
+  cfi->dw_cfi_oprnd1.dw_cfi_offset = new_cfa->offset;
 }
 
 #ifndef MIPS_DEBUGGING_INFO  /* SGI dbx thinks this means no offset.  */
-  else if (loc.offset == cur_row->cfa.offset
-	   && cur_row->cfa.reg != INVALID_REGNUM
-	   && !loc.indirect
-	   && !cur_row->cfa.indirect)
+  else if (new_cfa->offset == old_cfa->offset
+	   && old_cfa->reg != INVALID_REGNUM
+	   && !new_cfa->indirect
+	   && !old_cfa->indirect)
 {
   /* Construct a DW_CFA_def_cfa_register register instruction,
 indicating the CFA register has changed to register but the
 offset has not changed.  */
   cfi->dw_cfi_opc = DW_CFA_def_cfa_register;
-  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = loc.reg;
+  
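
(The archive truncates the rest of this hunk.)  To make the intent of the new
helpers concrete, here is a rough, hypothetical sketch (not the patch's actual
change_cfi_row, which is cut off above) of how cfi_equal_p, def_cfa_0, add_cfi
and add_cfi_restore combine to move the CFI table from one row to the next.
It assumes the row keeps its per-column saves in the reg_save vector
introduced earlier in the series; args_size handling is omitted.

/* Hypothetical sketch only.  */
static void
sketch_change_cfi_row (dw_cfi_row *old_row, dw_cfi_row *new_row)
{
  dw_cfi_ref cfi;
  size_t i, n_old, n_new, n_max;

  /* The CFA itself: def_cfa_0 returns NULL when nothing changed.  */
  cfi = def_cfa_0 (&old_row->cfa, &new_row->cfa);
  if (cfi)
    add_cfi (cfi);

  /* Register saves: re-emit a save that changed, and emit a restore
     where a save exists in the old row but not in the new one.  */
  n_old = VEC_length (dw_cfi_ref, old_row->reg_save);
  n_new = VEC_length (dw_cfi_ref, new_row->reg_save);
  n_max = MAX (n_old, n_new);

  for (i = 0; i < n_max; ++i)
    {
      dw_cfi_ref r_old = (i < n_old
			  ? VEC_index (dw_cfi_ref, old_row->reg_save, i)
			  : NULL);
      dw_cfi_ref r_new = (i < n_new
			  ? VEC_index (dw_cfi_ref, new_row->reg_save, i)
			  : NULL);

      /* cfi_equal_p above tolerates missing operands, so NULL-vs-NULL
	 compares equal and NULL-vs-save does not.  */
      if (cfi_equal_p (r_old, r_new))
	continue;
      if (r_new)
	add_cfi (r_new);
      else
	add_cfi_restore (i);
    }
}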

[RFC PATCH 0/9] CFG aware dwarf2 cfi generation

2011-07-14 Thread Richard Henderson
This finally brings us to something that can support shrink-wrapping.
As mentioned in the description of the last patch, this is 95% of 
what Bernd had in his last 007-dw2cfg patch, except for the remember/
restore_state stuff.  And hopefully with better comments.

This is the first version of this that has actually made it into
stage3 bootstrap on x86_64, so it isn't well tested yet.  This just
happens to coincide with the end of my work day, and it's been a while
since I've shared state, so I thought I'd post for overnight review.

The complete tree is available at

  git://repo.or.cz/gcc/rth.git rth/cfi-pass


r~


Richard Henderson (9):
  dwarf2cfi: Introduce a dw_cfi_row state.
  dwarf2cfi: Rename cfi_insn to add_cfi_insn.
  dwarf2cfi: Populate CUR_ROW-REG_SAVE.
  dwarf2cfi: Implement change_cfi_row.
  dwarf2cfi: Remove dw_cfi_row_ref typedef.
  dwarf2cfi: Convert queued_reg_save to a VEC.
  dwarf2cfi: Allocate reg_saved_in_data in the heap.
  dwarf2cfi: Introduce dw_trace_info.
  dwarf2cfi: Generate and connect traces.

 gcc/dwarf2cfi.c | 1672 ++-
 gcc/dwarf2out.c |  159 --
 gcc/dwarf2out.h |5 +-
 3 files changed, 1030 insertions(+), 806 deletions(-)

-- 
1.7.6



[PATCH 8/9] dwarf2cfi: Introduce dw_trace_info.

2011-07-14 Thread Richard Henderson
This patch only introduces the structure definition and adjusts
the existing routines to use the new cur_trace global to access
the variables that were moved into the structure.
---
 gcc/dwarf2cfi.c |  440 +--
 1 files changed, 266 insertions(+), 174 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 1d6413f..9aa1208 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -74,7 +74,98 @@ typedef struct GTY(()) dw_cfi_row_struct
   HOST_WIDE_INT args_size;
 } dw_cfi_row;
 
-
+/* The caller's ORIG_REG is saved in SAVED_IN_REG.  */
+typedef struct GTY(()) reg_saved_in_data_struct {
+  rtx orig_reg;
+  rtx saved_in_reg;
+} reg_saved_in_data;
+
+DEF_VEC_O (reg_saved_in_data);
+DEF_VEC_ALLOC_O (reg_saved_in_data, heap);
+
+/* Since we no longer have a proper CFG, we're going to create a facsimile
+   of one on the fly while processing the frame-related insns.
+
+   We create dw_trace structures for each instruction trace beginning at
+   at a label following a barrier (or beginning of the function), and
+   ending at a barrier (or the end of the function).
+
+   As we encounter control transfer insns, we propagate the current
+   row state across the edges to the starts of traces.  If an edge goes
+   to a label that is not the start of a trace, we ignore it.  This
+   assumes that previous compiler transformations were correct, and that
+   we will reach the same row state from any source.  (We can perform some
+   limited validation of this assumption, but without the full CFG we
+   cannot be sure of full validation coverage.  It is expensive, so we
+   only do so with checking enabled.)
+
+   All traces are members of the TRACE_INFO array, in the order in which
+   they appear in the instruction stream.
+
+   All labels are given an LUID that indexes the LABEL_INFO array.  If
+   the label is the start of a trace, the TRACE pointer will be non-NULL
+   and point into the TRACE_INFO array.  */
+
+typedef struct
+{
+  /* The label that begins the trace.  This will be NULL for the first
+ trace beginning at function entry.  */
+  rtx label;
+
+  /* The row state at the beginning and end of the trace.  */
+  dw_cfi_row *enter_row, *exit_row;
+
+  /* The following variables contain data used in interpreting frame related
+ expressions.  These are not part of the real row state as defined by
+ Dwarf, but it seems like they need to be propagated into a trace in case
+ frame related expressions have been sunk.  */
+  /* ??? This seems fragile.  These variables are fragments of a larger
+ expression.  If we do not keep the entire expression together, we risk
+ not being able to put it together properly.  Consider forcing targets
+ to generate self-contained expressions and dropping all of the magic
+ interpretation code in this file.  Or at least refusing to shrink wrap
+ any frame related insn that doesn't contain a complete expression.  */
+
+  /* The register used for saving registers to the stack, and its offset
+ from the CFA.  */
+  dw_cfa_location cfa_store;
+
+  /* A temporary register holding an integral value used in adjusting SP
+ or setting up the store_reg.  The offset field holds the integer
+ value, not an offset.  */
+  dw_cfa_location cfa_temp;
+
+  /* A set of registers saved in other registers.  This is the inverse of
+ the row->reg_save info, if the entry is a DW_CFA_register.  This is
+ implemented as a flat array because it normally contains zero or 1
+ entry, depending on the target.  IA-64 is the big spender here, using
+ a maximum of 5 entries.  */
+  VEC(reg_saved_in_data, heap) *regs_saved_in_regs;
+
+} dw_trace_info;
+
+DEF_VEC_O (dw_trace_info);
+DEF_VEC_ALLOC_O (dw_trace_info, heap);
+
+typedef struct
+{
+  dw_trace_info *trace;
+
+#ifdef ENABLE_CHECKING
+  dw_cfi_row *check_row;
+#endif
+} dw_label_info;
+
+DEF_VEC_O (dw_label_info);
+DEF_VEC_ALLOC_O (dw_label_info, heap);
+
+/* The variables making up the pseudo-cfg, as described above.  */
+#if 0
+static VEC (int, heap) *uid_luid;
+static VEC (dw_label_info, heap) *label_info;
+static VEC (dw_trace_info, heap) *trace_info;
+#endif
+
 /* A vector of call frame insns for the CIE.  */
 cfi_vec cie_cfi_vec;
 
@@ -82,6 +173,8 @@ cfi_vec cie_cfi_vec;
state provided by the CIE.  */
 static GTY(()) dw_cfi_row *cie_cfi_row;
 
+static GTY(()) reg_saved_in_data *cie_return_save;
+
 static GTY(()) unsigned long dwarf2out_cfi_label_num;
 
 /* The insn after which a new CFI note should be emitted.  */
@@ -90,6 +183,34 @@ static rtx add_cfi_insn;
 /* When non-null, add_cfi will add the CFI to this vector.  */
 static cfi_vec *add_cfi_vec;
 
+/* The current instruction trace.  */
+static dw_trace_info *cur_trace;
+
+/* The current, i.e. most recently generated, row of the CFI table.  */
+static dw_cfi_row *cur_row;
+
+/* The row state from a preceeding DW_CFA_remember_state.  */
+static dw_cfi_row *remember_row;
+
+/* 
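
(Truncated by the archive.)  As a purely hypothetical illustration of the
LABEL_INFO scheme described in the comment above (the arrays are still #if 0
in this patch, and patch 9/9 drops them in favour of a hash table), a lookup
would go from a label's INSN_UID to its LUID and from there to its trace:

/* Hypothetical sketch only.  */
static dw_trace_info *
sketch_trace_for_label (rtx label)
{
  int luid = VEC_index (int, uid_luid, INSN_UID (label));
  dw_label_info *li = VEC_index (dw_label_info, label_info, luid);

  /* Per the comment above, TRACE is NULL unless LABEL starts a trace.  */
  return li->trace;
}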

[PATCH 9/9] dwarf2cfi: Generate and connect traces.

2011-07-14 Thread Richard Henderson
This kinda-sorta corresponds to Bernd's 007-dw2cfi patch.  Certainly
the same concepts of splitting the instruction stream into extended
basic blocks is the same.  This patch does a bit better job with the
documentation.  Also, I'm a bit more explicit about matching things
up with the similar code from the regular CFG routines.

What's missing at this point is any attempt to use DW_CFA_remember_state.
I've deferred that for the moment because it's easy to test the state
change code across epilogues, whereas the shrink-wrapping code is not
in this tree and section switching is difficult to force.
---
 gcc/dwarf2cfi.c |  792 ---
 1 files changed, 398 insertions(+), 394 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 9aa1208..10e5740 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "flags.h"
 #include "rtl.h"
 #include "function.h"
+#include "basic-block.h"
 #include "dwarf2.h"
 #include "dwarf2out.h"
 #include "dwarf2asm.h"
@@ -86,34 +87,30 @@ DEF_VEC_ALLOC_O (reg_saved_in_data, heap);
 /* Since we no longer have a proper CFG, we're going to create a facsimile
of one on the fly while processing the frame-related insns.
 
-   We create dw_trace structures for each instruction trace beginning at
-   at a label following a barrier (or beginning of the function), and
-   ending at a barrier (or the end of the function).
+   We create dw_trace_info structures for each extended basic block beginning
+   and ending at a save point.  Save points are labels, barriers, certain
+   notes, and of course the beginning and end of the function.
 
As we encounter control transfer insns, we propagate the current
-   row state across the edges to the starts of traces.  If an edge goes
-   to a label that is not the start of a trace, we ignore it.  This
-   assumes that previous compiler transformations were correct, and that
-   we will reach the same row state from any source.  (We can perform some
-   limited validation of this assumption, but without the full CFG we
-   cannot be sure of full validation coverage.  It is expensive, so we
-   only do so with checking enabled.)
+   row state across the edges to the starts of traces.  When checking is
+   enabled, we validate that we propagate the same data from all sources.
 
All traces are members of the TRACE_INFO array, in the order in which
they appear in the instruction stream.
 
-   All labels are given an LUID that indexes the LABEL_INFO array.  If
-   the label is the start of a trace, the TRACE pointer will be non-NULL
-   and point into the TRACE_INFO array.  */
+   All save points are present in the TRACE_INDEX hash, mapping the insn
+   starting a trace to the dw_trace_info describing the trace.  */
 
 typedef struct
 {
-  /* The label that begins the trace.  This will be NULL for the first
- trace beginning at function entry.  */
-  rtx label;
+  /* The insn that begins the trace.  */
+  rtx head;
 
   /* The row state at the beginning and end of the trace.  */
-  dw_cfi_row *enter_row, *exit_row;
+  dw_cfi_row *beg_row, *end_row;
+
+  /* True if this trace immediately follows NOTE_INSN_SWITCH_TEXT_SECTIONS.  */
+  bool switch_sections;
 
   /* The following variables contain data used in interpreting frame related
  expressions.  These are not part of the real row state as defined by
@@ -147,24 +144,15 @@ typedef struct
 DEF_VEC_O (dw_trace_info);
 DEF_VEC_ALLOC_O (dw_trace_info, heap);
 
-typedef struct
-{
-  dw_trace_info *trace;
-
-#ifdef ENABLE_CHECKING
-  dw_cfi_row *check_row;
-#endif
-} dw_label_info;
+typedef dw_trace_info *dw_trace_info_ref;
 
-DEF_VEC_O (dw_label_info);
-DEF_VEC_ALLOC_O (dw_label_info, heap);
+DEF_VEC_P (dw_trace_info_ref);
+DEF_VEC_ALLOC_P (dw_trace_info_ref, heap);
 
 /* The variables making up the pseudo-cfg, as described above.  */
-#if 0
-static VEC (int, heap) *uid_luid;
-static VEC (dw_label_info, heap) *label_info;
 static VEC (dw_trace_info, heap) *trace_info;
-#endif
+static VEC (dw_trace_info_ref, heap) *trace_work_list;
+static htab_t trace_index;
 
 /* A vector of call frame insns for the CIE.  */
 cfi_vec cie_cfi_vec;
@@ -189,9 +177,6 @@ static dw_trace_info *cur_trace;
 /* The current, i.e. most recently generated, row of the CFI table.  */
 static dw_cfi_row *cur_row;
 
-/* The row state from a preceeding DW_CFA_remember_state.  */
-static dw_cfi_row *remember_row;
-
 /* We delay emitting a register save until either (a) we reach the end
of the prologue or (b) the register is clobbered.  This clusters
register saves so that there are fewer pc advances.  */
@@ -211,9 +196,6 @@ static VEC(queued_reg_save, heap) *queued_reg_saves;
emitting this data, i.e. updating CUR_ROW, without async unwind.  */
 static HOST_WIDE_INT queued_args_size;
 
-/* True if remember_state should be emitted before following CFI directive.  */
-static bool emit_cfa_remember;
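
(The rest of this message is truncated by the archive.)  For readers wondering
how TRACE_INDEX is meant to be used: a plausible, hypothetical shape for the
hashtab callbacks keys each trace on the UID of its head insn, matching the
comment above, and lookups then use a dummy key.  The names below are
illustrative, not taken from the patch.

/* Hypothetical sketch only, using libiberty's hashtab.h API.  */
static hashval_t
sketch_trace_info_hash (const void *ptr)
{
  const dw_trace_info *ti = (const dw_trace_info *) ptr;
  return (hashval_t) INSN_UID (ti->head);
}

static int
sketch_trace_info_eq (const void *ptr_a, const void *ptr_b)
{
  const dw_trace_info *a = (const dw_trace_info *) ptr_a;
  const dw_trace_info *b = (const dw_trace_info *) ptr_b;
  return a->head == b->head;
}

/* The table would be created with
     trace_index = htab_create (64, sketch_trace_info_hash,
				sketch_trace_info_eq, NULL);
   and the trace starting at INSN found with a dummy key:  */
static dw_trace_info *
sketch_get_trace_info (rtx insn)
{
  dw_trace_info dummy;
  dummy.head = insn;
  return (dw_trace_info *)
    htab_find_with_hash (trace_index, &dummy, INSN_UID (insn));
}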
