Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-24 Thread David Edelsohn
On Fri, May 23, 2014 at 11:23 AM, Alan Modra amo...@gmail.com wrote:
 OK, let's start again from scratch.  This patch fixes PR61098, a
 problem caused by trying to do arithmetic on the count register.  The
 fix is to provide a new pseudo in rs6000_emit_set_long_const so
 arithmetic will be done in a gpr.

 Additionally, the patch fixes a number of other bugs and cleanup
 issues with rs6000_emit_set_{,long_}const.

 - rs6000_emit_set_long_const took two HWI constant parameters, a relic
   from the days when HWI might be 32 bits on powerpc.  We're only
   setting a 64-bit value, so remove the unnecessary parameter.

 - The !TARGET_POWERPC64 handling of DImode assumed a 32 bit HWI, and
   the insn setting the low 32-bit reg was wrongly marked with a
   reg_equiv note saying the reg contained the entire 64-bit constant.
   I hadn't spotted the bad reg_equiv when writing the previous patch.

 - The comments describing the functions are inaccurate and misleading.

 - rs6000_emit_set_long_const always returns DEST, so it's caller can
   assume this and rs6000_emit_set_long_const return void.

 - The code testing for a NULL DEST in rs6000_emit_set_const is dead.
   DEST cannot be NULL, since the three uses of the function are in
   rs6000.md splitters where DEST (operand[0]) satisfies
   gpc_reg_operand.

 - The above two points mean that rs6000_emit_set_const always returns
   DEST, which in turn would allow rs6000_emit_set_const to return
   void.  However, in view of a possible future change that might need
   to return status on whether rs6000_emit_set_const emitted anything,
   return a bool.

 - rs6000_emit_set_const parameter N is currently unused, and MODE
   always matches GET_MODE (DEST), so N and MODE can be removed.

 - The code is liberally sprinkled with copy_rtx.  DEST/TEMP is always
   used once without copy_rtx, but which insn uses copy_rtx varies.  I
   changed the code to always use a bare DEST as the last insn for
   consistency.  (Writing the code this way might allow us to omit the
   copy_rtx on DEST/TEMP entirely.  Before reload TEMP will be a new
   pseudo reg, thus doesn't need copy_rtx, and after reload we
   shouldn't have a SUBREG DEST.  I wasn't sure of exactly what might
   happen during reload, so left well enough alone.)

 Bootstrapped and regression tested powerpc64-linux.  OK to apply
 mainline?

This is a much clearer start. Thanks for the revised version.

This is okay.

Thanks, David


Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-23 Thread Alan Modra
OK, let's start again from scratch.  This patch fixes PR61098, a
problem caused by trying to do arithmetic on the count register.  The
fix is to provide a new pseudo in rs6000_emit_set_long_const so
arithmetic will be done in a gpr.

Additionally, the patch fixes a number of other bugs and cleanup
issues with rs6000_emit_set_{,long_}const.

- rs6000_emit_set_long_const took two HWI constant parameters, a relic
  from the days when HWI might be 32 bits on powerpc.  We're only
  setting a 64-bit value, so remove the unnecessary parameter.

- The !TARGET_POWERPC64 handling of DImode assumed a 32 bit HWI, and
  the insn setting the low 32-bit reg was wrongly marked with a
  reg_equiv note saying the reg contained the entire 64-bit constant.
  I hadn't spotted the bad reg_equiv when writing the previous patch.

- The comments describing the functions are inaccurate and misleading.

- rs6000_emit_set_long_const always returns DEST, so it's caller can
  assume this and rs6000_emit_set_long_const return void.

- The code testing for a NULL DEST in rs6000_emit_set_const is dead.
  DEST cannot be NULL, since the three uses of the function are in
  rs6000.md splitters where DEST (operand[0]) satisfies
  gpc_reg_operand.

- The above two points mean that rs6000_emit_set_const always returns
  DEST, which in turn would allow rs6000_emit_set_const to return
  void.  However, in view of a possible future change that might need
  to return status on whether rs6000_emit_set_const emitted anything,
  return a bool.

- rs6000_emit_set_const parameter N is currently unused, and MODE
  always matches GET_MODE (DEST), so N and MODE can be removed.

- The code is liberally sprinkled with copy_rtx.  DEST/TEMP is always
  used once without copy_rtx, but which insn uses copy_rtx varies.  I
  changed the code to always use a bare DEST as the last insn for
  consistency.  (Writing the code this way might allow us to omit the
  copy_rtx on DEST/TEMP entirely.  Before reload TEMP will be a new
  pseudo reg, thus doesn't need copy_rtx, and after reload we
  shouldn't have a SUBREG DEST.  I wasn't sure of exactly what might
  happen during reload, so left well enough alone.)

Bootstrapped and regression tested powerpc64-linux.  OK to apply
mainline?

PR target/61098
* config/rs6000/rs6000.c (rs6000_emit_set_const): Remove unneeded
params and return a bool.  Remove dead code.  Update comment.
Assert we have a const_int source.  Remove bogus code from
32-bit HWI days.  Move !TARGET_POWERPC64 handling, and correct
handling of constants  2G and reg_equal note, from..
(rs6000_emit_set_long_const): ..here.  Remove unneeded param and
return value.  Update comment.  If we can, use a new pseudo
for intermediate calculations.
* config/rs6000/rs6000-protos.h (rs6000_emit_set_const): Update
prototype.
* config/rs6000/rs6000.md (movsi_internal1_single+1): Update
call to rs6000_emit_set_const in splitter.
(movdi_internal64+2, +3): Likewise.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 210835)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1068,7 +1068,7 @@ static tree rs6000_handle_longcall_attribute (tree
 static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
-static rtx rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
+static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
 static int rs6000_memory_move_cost (enum machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, int, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, enum machine_mode, addr_space_t,
@@ -7849,53 +7849,50 @@ rs6000_conditional_register_usage (void)
 }
 
 
-/* Try to output insns to set TARGET equal to the constant C if it can
-   be done in less than N insns.  Do all computations in MODE.
-   Returns the place where the output has been placed if it can be
-   done and the insns have been emitted.  If it would take more than N
-   insns, zero is returned and no insns and emitted.  */
+/* Output insns to set DEST equal to the constant SOURCE as a series of
+   lis, ori and shl instructions and return TRUE.  */
 
-rtx
-rs6000_emit_set_const (rtx dest, enum machine_mode mode,
-  rtx source, int n ATTRIBUTE_UNUSED)
+bool
+rs6000_emit_set_const (rtx dest, rtx source)
 {
-  rtx result, insn, set;
-  HOST_WIDE_INT c0, c1;
+  enum machine_mode mode = GET_MODE (dest);
+  rtx temp, insn, set;
+  HOST_WIDE_INT c;
 
+  gcc_checking_assert (CONST_INT_P (source));
+  c = INTVAL (source);
   switch (mode)
 {
-case  QImode:
+case QImode:
 case HImode:
-  if (dest == NULL)
-   dest = gen_reg_rtx (mode);
   

Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-14 Thread Alan Modra
Hi David,
On Tue, May 13, 2014 at 11:46:20PM -0400, David Edelsohn wrote:
 Danny may have re-organized the code, but I thought that it originally
 came from Tom Rixx, if not earlier.

OK, I'm not trying to apportion blame.  My name is on plenty of dodgy
code in the rs6000 backend too.  :)

 I seem to remember problems in the past with late creation of TOC
 entries for constants causing problems, so it was easier to fall back
 to materializing all integer constants inline. I don't remember the
 PRs, but I think there were issues with creating a TOC if the late
 constant were the only TOC reference, or maybe the issue was buying a
 stack frame to materialize the TOC/GOT for a late constant.  And
 maximum 5 instruction sequence is not really bad relative to a load
 from the TOC (even with medium model / data in TOC). There are a lot
 of trade-offs with respect to I$ expansion versus the load hitting in
 the L1 D$.

Sure, but Steve will tell you that the 5 instruction sequence is both
slower due to all the dependent ops, and results in larger code+data
size.  We definitely want to avoid it if possible, and pr67836 shows a
case taken from glibc math library code where there should be no
problem in using the TOC.

 Alpha emit_set_const does limit the number of instructions, but that
 is a search for a more efficient sequence than the naive sequence. The
 Alpha splitters use alpha_split_const_mov(), which tries
 alpha_emit_set_const() for an efficient sequence and then falls back
 to alpha_emit_set_long_const() for a naive sequence.  Alpha uses PLUS
 instead of IOR because of the way its ISA works.
 alpha_emit_set_long_const() always will materialize the constant and
 does not check for a maximum number of instructions. This is why it's
 comment says fall back to straight forward decomposition.
 
 However, alpha_emit_set_long_const() and alpha_split_const_mov() can
 fail, presumably because emit_move_insn() fails, not because of
 reaching a maximum number of instructions.
 
 alpha_legitimate_constant_p() rejecs expensive constants early. Once
 the splitter is invoked, it always tries to materialize the constant,
 but the splitter apparently can fail for other reasons.

No, that is wrong.  alpha_emit_set_const does *not* always try to
materialize the constant inline.  It does so for constants that need
more than three instructions only when TARGET_BUILD_CONSTANTS.

 I don't mind exploring the benefits of tighening up
 rs6000_legitimate_const(), but I'm not sure it's an obvious win,
 especially with the potential failure corner cases.

Yes, those potential corner cases have me worried too..

 However, I want to have a better understanding about the part of the
 patch that removes the FAIL path from the splitters.

That part was really quite simple.  I was removing dead code.
rs6000_emit_set_const has never returned anything but DEST, right from
the initial commit.  It can't be called with DEST == NULL, so
dest = gen_reg_rtx (mode) is also dead code.

However, I retracted that patch because I now think
rs6000_emit_set_const should in fact sometimes result in the splitter
failing, exactly as is done in the alpha port.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-14 Thread David Edelsohn
On Wed, May 14, 2014 at 5:56 AM, Alan Modra amo...@gmail.com wrote:

 I seem to remember problems in the past with late creation of TOC
 entries for constants causing problems, so it was easier to fall back
 to materializing all integer constants inline. I don't remember the
 PRs, but I think there were issues with creating a TOC if the late
 constant were the only TOC reference, or maybe the issue was buying a
 stack frame to materialize the TOC/GOT for a late constant.  And
 maximum 5 instruction sequence is not really bad relative to a load
 from the TOC (even with medium model / data in TOC). There are a lot
 of trade-offs with respect to I$ expansion versus the load hitting in
 the L1 D$.

 Sure, but Steve will tell you that the 5 instruction sequence is both
 slower due to all the dependent ops, and results in larger code+data
 size.  We definitely want to avoid it if possible, and pr67836 shows a
 case taken from glibc math library code where there should be no
 problem in using the TOC.

I don't necessarily believe this is a win overall.  If the constant
reliably is in the L1 D$ (or maybe L2 D$) and accessed with a direct
load (data in TOC or medium model), then yes. If it's farther away in
the memory hierarchy, then it's not a win. I agree about the code
expansion concern, which has its own secondary effects.

If this is a constant in a tight loop, okay, but if it's a unique
constant, it may not occur elsewhere in the code to be shared and may
not be placed in the same cache line as other, recently accessed
constants. This would push the load to L3 or farther.

Also, remember that this same heuristic is used by AIX, which still
defaults to small TOC model. So either the constant is in the TOC
anchor constant pool, which hopefully will pre-load the anchor, or
will be a constant in the TOC, possibly putting more pressure on TOC
size and causing overflow.

I am certain that there are anecdotal examples where it is a win for
PPC64 Linux, but I would want more evidence that it's a general win.

 alpha_emit_set_long_const() always will materialize the constant and
 does not check for a maximum number of instructions. This is why it's
 comment says fall back to straight forward decomposition.

 No, that is wrong.  alpha_emit_set_const does *not* always try to
 materialize the constant inline.  It does so for constants that need
 more than three instructions only when TARGET_BUILD_CONSTANTS.

I said that alpha_emit_set_long_const() always materializes the
constant, but, as you say, it is not always called.
alpha_emit_set_const() may fail if it requires too many instructions
or the search depth is too deep. You seem to be referring to some of
the logic in alpha_split_const_mov() as well.

Again, this definitely is worth exploring. And I am confident that
there are cases where loading the constant from memory is a win. I
just don't have a good instinct if it is a win most of the time for a
broad range of real-world applications. One optimization opportunity
in GLIBC is not a general heuristic. I don't think that we know a lot
about the context of the use of the constant to apply a finer-grained
policy.

I think the original code tried to put the constant in memory if it
appeared before reload, when everything could be calculated correctly
for prologue and materializing the TOC, but tried to materialze any
constants that appeared during reload using splitters.  That can avoid
some of the problem corner cases. The code needs to handle

PPC32
PPC64
eABI
AIX

Thanks, David


Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-13 Thread Alan Modra
On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote:
 On Thu, May 8, 2014 at 10:40 PM, Alan Modra amo...@gmail.com wrote:
 
  Please do not remove all of the comments from the two functions. The
  comments should provide some documentation about the different
  purposes of the two functions other than setting DEST to a CONST.
 
  I believe my updated comment covers the complete purpose of the
  function nowadays.  The comments I removed are out-dated, and should
  have been removed a long time ago..  rs6000_emit_set_const does not
  even look at N, it always returns a non-zero result, and the return is
  only tested for non-zero.  I removed MODE too, because that is always
  the same as GET_MODE (dest).
 
 It is helpful if the comment expresses more than restating the
 information one can glean from the function name. It's useful to note
 that rs6000_emit_set_long_const is a standard decomposition with a
 bounded number of instructions.
 
  I think that the way you rearranged the invocations of copy_rtx() in
  rs6000_emit_set_long_const() is okay, but it would be good for someone
  else to double check.
 
  Yeah, that function is a bit messy.  I took the approach of always use
  a bare dest once in the last instruction emitted, with every other
  use getting hit with copy_rtx.  The previous approach was similar,
  but used the bare dest on the first instruction emitted.  Obviously
  you don't need copy_rtx anywhere with the new code when
  can_create_pseudo_p is true, but I felt it wasn't worth optimising
  that for the added source complication.
 
 Can you help clarify the removal of the code that tests if the
 splitter failed?  The splitters in the Alpha port follow mostly the
 same rhythm, with a little bit of further cleanup and consolidation
 relative to the rs6000 port. alpha_split_const_mov() falls back on
 alpha_emit_set_long_const(), but checks that the target is valid and
 allows the splitter to fail. Either the Alpha port is doing
 unnecessary work or this cleanup patch is too aggressive. Either way,
 a comment seems necessary.

OK, I've had a good look at the history of this code.

rs6000_emit_set_const and rs6000_emit_set_long_const were introduced
with revision 44516, a largish patch by Dan Berlin.  As you hint
above, it seems the functions were copied from alpha.  So the
parameters were unnecessary and the comments just plain wrong for the
rs6000 version of code right from the initial commit.  Worse, only
half of necessary infrastructure was copied from alpha..

So let me lay out what I believe should be happening with
(set (reg) (constant))

At expand time, if the above set can't be implemented in a single
instruction, then it should be decomposed to the equivalent set high
part, ori low part, and possibly shift instructions so long as the
resulting sequence is small.  I think we basically do this correctly
in rs6000_emit_move.  See the num_insns_constant call there.
Constants that can't be evaluated inline by two (or three)
instructions will be replaced with a load from the TOC.

The same thing ought to happen in the splitters that use
rs6000_emit_set_const.  rs6000_emit_set_const should refuse to expand
to too many instructions (just like alpha).  We don't do this, but if
we did, this would leave some (set (reg) (constant)) instructions in
the RTL.  Alternatively the splitters could generate loads from the
TOC, but see pr57836, which shows the loads from the TOC we crafted
at expand time being reduced back to (set (reg) (constant)).

Finally, at reload time, any remaining (set (reg) (constant))
(ie. those that result in a long inline sequence) should be forced to
the TOC.  This is the missing part of the infrastructure that wasn't
copied from alpha.  Our legitimate_constant_p needs to reject some
constants..  As it is, reload simply expands to a four or five
inline instruction sequence.

David, I'd like some help with the legitimate_constant_p
implementation.  I have something that seems to work (not yet
regression tested) but there are a number of things that I'm not clear
on (eg. the revision 20229 change) so likely will get it wrong.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-13 Thread David Edelsohn
Alan,

Danny may have re-organized the code, but I thought that it originally
came from Tom Rixx, if not earlier.

I seem to remember problems in the past with late creation of TOC
entries for constants causing problems, so it was easier to fall back
to materializing all integer constants inline. I don't remember the
PRs, but I think there were issues with creating a TOC if the late
constant were the only TOC reference, or maybe the issue was buying a
stack frame to materialize the TOC/GOT for a late constant.  And
maximum 5 instruction sequence is not really bad relative to a load
from the TOC (even with medium model / data in TOC). There are a lot
of trade-offs with respect to I$ expansion versus the load hitting in
the L1 D$.

Alpha emit_set_const does limit the number of instructions, but that
is a search for a more efficient sequence than the naive sequence. The
Alpha splitters use alpha_split_const_mov(), which tries
alpha_emit_set_const() for an efficient sequence and then falls back
to alpha_emit_set_long_const() for a naive sequence.  Alpha uses PLUS
instead of IOR because of the way its ISA works.
alpha_emit_set_long_const() always will materialize the constant and
does not check for a maximum number of instructions. This is why it's
comment says fall back to straight forward decomposition.

However, alpha_emit_set_long_const() and alpha_split_const_mov() can
fail, presumably because emit_move_insn() fails, not because of
reaching a maximum number of instructions.

alpha_legitimate_constant_p() rejecs expensive constants early. Once
the splitter is invoked, it always tries to materialize the constant,
but the splitter apparently can fail for other reasons.

I don't mind exploring the benefits of tighening up
rs6000_legitimate_const(), but I'm not sure it's an obvious win,
especially with the potential failure corner cases.

However, I want to have a better understanding about the part of the
patch that removes the FAIL path from the splitters.

Thanks, David



On Tue, May 13, 2014 at 11:04 PM, Alan Modra amo...@gmail.com wrote:
 On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote:
 On Thu, May 8, 2014 at 10:40 PM, Alan Modra amo...@gmail.com wrote:

  Please do not remove all of the comments from the two functions. The
  comments should provide some documentation about the different
  purposes of the two functions other than setting DEST to a CONST.
 
  I believe my updated comment covers the complete purpose of the
  function nowadays.  The comments I removed are out-dated, and should
  have been removed a long time ago..  rs6000_emit_set_const does not
  even look at N, it always returns a non-zero result, and the return is
  only tested for non-zero.  I removed MODE too, because that is always
  the same as GET_MODE (dest).

 It is helpful if the comment expresses more than restating the
 information one can glean from the function name. It's useful to note
 that rs6000_emit_set_long_const is a standard decomposition with a
 bounded number of instructions.

  I think that the way you rearranged the invocations of copy_rtx() in
  rs6000_emit_set_long_const() is okay, but it would be good for someone
  else to double check.
 
  Yeah, that function is a bit messy.  I took the approach of always use
  a bare dest once in the last instruction emitted, with every other
  use getting hit with copy_rtx.  The previous approach was similar,
  but used the bare dest on the first instruction emitted.  Obviously
  you don't need copy_rtx anywhere with the new code when
  can_create_pseudo_p is true, but I felt it wasn't worth optimising
  that for the added source complication.

 Can you help clarify the removal of the code that tests if the
 splitter failed?  The splitters in the Alpha port follow mostly the
 same rhythm, with a little bit of further cleanup and consolidation
 relative to the rs6000 port. alpha_split_const_mov() falls back on
 alpha_emit_set_long_const(), but checks that the target is valid and
 allows the splitter to fail. Either the Alpha port is doing
 unnecessary work or this cleanup patch is too aggressive. Either way,
 a comment seems necessary.

 OK, I've had a good look at the history of this code.

 rs6000_emit_set_const and rs6000_emit_set_long_const were introduced
 with revision 44516, a largish patch by Dan Berlin.  As you hint
 above, it seems the functions were copied from alpha.  So the
 parameters were unnecessary and the comments just plain wrong for the
 rs6000 version of code right from the initial commit.  Worse, only
 half of necessary infrastructure was copied from alpha..

 So let me lay out what I believe should be happening with
 (set (reg) (constant))

 At expand time, if the above set can't be implemented in a single
 instruction, then it should be decomposed to the equivalent set high
 part, ori low part, and possibly shift instructions so long as the
 resulting sequence is small.  I think we basically do this correctly
 in rs6000_emit_move.  See 

Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-11 Thread Alan Modra
On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote:
 On Thu, May 8, 2014 at 10:40 PM, Alan Modra amo...@gmail.com wrote:
  rs6000_emit_set_const ... always returns a non-zero result ...
 
 Can you help clarify the removal of the code that tests if the
 splitter failed?

See above.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-11 Thread Alan Modra
On Mon, May 12, 2014 at 08:23:16AM +0930, Alan Modra wrote:
 On Sat, May 10, 2014 at 10:24:34PM -0400, David Edelsohn wrote:
  On Thu, May 8, 2014 at 10:40 PM, Alan Modra amo...@gmail.com wrote:
   rs6000_emit_set_const ... always returns a non-zero result ...
  
  Can you help clarify the removal of the code that tests if the
  splitter failed?
 
 See above.

On thinking some more, let me retract the patch for the time being.
While it's true that I was only removing dead code in the splitters,
the question of why this code has become dead is worth looking into.

I suspect a previous patch to rs6000_emit_set_const was wrong, and
that we should in fact be failing before reload (but never after),
when an expansion would take too many instructions.  Too many being
a sequence that is slower than loading a 64-bit constant from the
TOC.

We try to make that sort of tradeoff in rs6000_emit_move (see
num_insn_constant call), but that is really too early.  Some
manipulations of code modify constants.  I've seen cases where
rs6000_emit_move decided to inline a constant, but later changes to
the value meant the expansion was five instructions.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-10 Thread David Edelsohn
On Thu, May 8, 2014 at 10:40 PM, Alan Modra amo...@gmail.com wrote:

 Please do not remove all of the comments from the two functions. The
 comments should provide some documentation about the different
 purposes of the two functions other than setting DEST to a CONST.

 I believe my updated comment covers the complete purpose of the
 function nowadays.  The comments I removed are out-dated, and should
 have been removed a long time ago..  rs6000_emit_set_const does not
 even look at N, it always returns a non-zero result, and the return is
 only tested for non-zero.  I removed MODE too, because that is always
 the same as GET_MODE (dest).

It is helpful if the comment expresses more than restating the
information one can glean from the function name. It's useful to note
that rs6000_emit_set_long_const is a standard decomposition with a
bounded number of instructions.

 I think that the way you rearranged the invocations of copy_rtx() in
 rs6000_emit_set_long_const() is okay, but it would be good for someone
 else to double check.

 Yeah, that function is a bit messy.  I took the approach of always use
 a bare dest once in the last instruction emitted, with every other
 use getting hit with copy_rtx.  The previous approach was similar,
 but used the bare dest on the first instruction emitted.  Obviously
 you don't need copy_rtx anywhere with the new code when
 can_create_pseudo_p is true, but I felt it wasn't worth optimising
 that for the added source complication.

Can you help clarify the removal of the code that tests if the
splitter failed?  The splitters in the Alpha port follow mostly the
same rhythm, with a little bit of further cleanup and consolidation
relative to the rs6000 port. alpha_split_const_mov() falls back on
alpha_emit_set_long_const(), but checks that the target is valid and
allows the splitter to fail. Either the Alpha port is doing
unnecessary work or this cleanup patch is too aggressive. Either way,
a comment seems necessary.

Thanks, David


Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-08 Thread David Edelsohn
On Wed, May 7, 2014 at 9:48 PM, Alan Modra amo...@gmail.com wrote:
 On powerpc64, to set a large loop count we have code like the
 following after split1:

 (insn 67 14 68 4 (set (reg:DI 160)
 (const_int 99942400 [0x5f5])) /home/amodra/unaligned_load.c:14 -1
  (nil))
 (insn 68 67 42 4 (set (reg:DI 160)
 (ior:DI (reg:DI 160)
 (const_int 57600 [0xe100]))) /home/amodra/unaligned_load.c:14 -1
  (expr_list:REG_EQUAL (const_int 1 [0x5f5e100])
 (nil)))

 and then test for loop exit with:

 (jump_insn 65 31 45 5 (parallel [
 (set (pc)
 (if_then_else (ne (reg:DI 160)
 (const_int 1 [0x1]))
 (label_ref:DI 42)
 (pc)))
 (set (reg:DI 160)
 (plus:DI (reg:DI 160)
 (const_int -1 [0x])))
 (clobber (scratch:CC))
 (clobber (scratch:DI))
 ]) /home/amodra/unaligned_load.c:15 800 {*ctrdi_internal1}
  (int_list:REG_BR_PROB 9899 (nil))
  - 42)

 The jump_insn of course is meant for use with bdnz, which implies a
 strong preference for reg 160 to live in the count register.  Trouble
 is, the count register doesn't do arithmetic.

 So, use a new psuedo for intermediate results.  On looking at this,
 I noticed the !TARGET_POWERPC64 code in rs6000_emit_set_long_const was
 broken, apparently expecting c1 and c2 to be the high and low 32 bits
 of the constant.  That's no longer true, so I've fixed that as well.
 Bootstrapped and regression tested powerpc64-linux.  OK for mainline
 and branches?

 PR target/61098
 * config/rs6000/rs6000.c (rs6000_emit_set_const): Remove unneeded
 params and return value.  Simplify.  Update comment.
 (rs6000_emit_set_long_const): Remove unneeded param and return
 value.  Correct !TARGET_POWERPC64 handling of constants  2G.
 If we can, use a new pseudo for intermediate calculations.

Alan,

The history is 32 bit HWI.

The ChangeLog does not mention the changes to rs6000.md nor rs6000-protos.h.

Please do not remove all of the comments from the two functions. The
comments should provide some documentation about the different
purposes of the two functions other than setting DEST to a CONST.

Why did you remove the test for NULL dest?

-  if (dest == NULL)
-   dest = gen_reg_rtx (mode);

That could occur, at least it used to occur.

I think that the way you rearranged the invocations of copy_rtx() in
rs6000_emit_set_long_const() is okay, but it would be good for someone
else to double check.

Thanks, David


Re: [RS6000] Fix PR61098, Poor code setting count register

2014-05-08 Thread Alan Modra
On Thu, May 08, 2014 at 09:48:35AM -0400, David Edelsohn wrote:
 The history is 32 bit HWI.

Right.

 The ChangeLog does not mention the changes to rs6000.md nor rs6000-protos.h.

Oops, added.

* config/rs6000/rs6000.md (movsi_internal1_single+1): Update
call to rs6000_emit_set_const in splitter.
(movdi_internal64+2, +3): Likewise.
* config/rs6000/rs6000-protos.h (rs6000_emit_set_const): Update
prototype.

 Please do not remove all of the comments from the two functions. The
 comments should provide some documentation about the different
 purposes of the two functions other than setting DEST to a CONST.

I believe my updated comment covers the complete purpose of the
function nowadays.  The comments I removed are out-dated, and should
have been removed a long time ago..  rs6000_emit_set_const does not
even look at N, it always returns a non-zero result, and the return is
only tested for non-zero.  I removed MODE too, because that is always
the same as GET_MODE (dest).

 Why did you remove the test for NULL dest?
 
 -  if (dest == NULL)
 -   dest = gen_reg_rtx (mode);
 
 That could occur, at least it used to occur.

I'm sure we can't get a NULL dest nowadays.  All (three) uses of
rs6000_emit_set_const occur in splitters.  They all must have passed a
gpc_reg_operand constraint on operands[0] before calling
rs6000_emit_set_const, so if NULL were possible we'd segfault in
gpc_reg_operand.

 I think that the way you rearranged the invocations of copy_rtx() in
 rs6000_emit_set_long_const() is okay, but it would be good for someone
 else to double check.

Yeah, that function is a bit messy.  I took the approach of always use
a bare dest once in the last instruction emitted, with every other
use getting hit with copy_rtx.  The previous approach was similar,
but used the bare dest on the first instruction emitted.  Obviously
you don't need copy_rtx anywhere with the new code when
can_create_pseudo_p is true, but I felt it wasn't worth optimising
that for the added source complication.

-- 
Alan Modra
Australia Development Lab, IBM


[RS6000] Fix PR61098, Poor code setting count register

2014-05-07 Thread Alan Modra
On powerpc64, to set a large loop count we have code like the
following after split1:

(insn 67 14 68 4 (set (reg:DI 160)
(const_int 99942400 [0x5f5])) /home/amodra/unaligned_load.c:14 -1
 (nil))
(insn 68 67 42 4 (set (reg:DI 160)
(ior:DI (reg:DI 160)
(const_int 57600 [0xe100]))) /home/amodra/unaligned_load.c:14 -1
 (expr_list:REG_EQUAL (const_int 1 [0x5f5e100])
(nil)))

and then test for loop exit with:

(jump_insn 65 31 45 5 (parallel [
(set (pc)
(if_then_else (ne (reg:DI 160)
(const_int 1 [0x1]))
(label_ref:DI 42)
(pc)))
(set (reg:DI 160)
(plus:DI (reg:DI 160)
(const_int -1 [0x])))
(clobber (scratch:CC))
(clobber (scratch:DI))
]) /home/amodra/unaligned_load.c:15 800 {*ctrdi_internal1}
 (int_list:REG_BR_PROB 9899 (nil))
 - 42)

The jump_insn of course is meant for use with bdnz, which implies a
strong preference for reg 160 to live in the count register.  Trouble
is, the count register doesn't do arithmetic.

So, use a new psuedo for intermediate results.  On looking at this,
I noticed the !TARGET_POWERPC64 code in rs6000_emit_set_long_const was
broken, apparently expecting c1 and c2 to be the high and low 32 bits
of the constant.  That's no longer true, so I've fixed that as well.
Bootstrapped and regression tested powerpc64-linux.  OK for mainline
and branches?

PR target/61098
* config/rs6000/rs6000.c (rs6000_emit_set_const): Remove unneeded
params and return value.  Simplify.  Update comment.
(rs6000_emit_set_long_const): Remove unneeded param and return
value.  Correct !TARGET_POWERPC64 handling of constants  2G.
If we can, use a new pseudo for intermediate calculations.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 209926)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1068,7 +1069,7 @@ static tree rs6000_handle_longcall_attribute (tree
 static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
-static rtx rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
+static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
 static int rs6000_memory_move_cost (enum machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, int, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, enum machine_mode, addr_space_t,
@@ -7826,53 +7811,36 @@ rs6000_conditional_register_usage (void)
 }
 
 
-/* Try to output insns to set TARGET equal to the constant C if it can
-   be done in less than N insns.  Do all computations in MODE.
-   Returns the place where the output has been placed if it can be
-   done and the insns have been emitted.  If it would take more than N
-   insns, zero is returned and no insns and emitted.  */
+/* Output insns to set DEST equal to the constant SOURCE.  */
 
-rtx
-rs6000_emit_set_const (rtx dest, enum machine_mode mode,
-  rtx source, int n ATTRIBUTE_UNUSED)
+void
+rs6000_emit_set_const (rtx dest, rtx source)
 {
-  rtx result, insn, set;
-  HOST_WIDE_INT c0, c1;
+  enum machine_mode mode = GET_MODE (dest);
+  rtx temp, insn, set;
+  HOST_WIDE_INT c;
 
+  gcc_checking_assert (CONST_INT_P (source));
+  c = INTVAL (source);
   switch (mode)
 {
-case  QImode:
+case QImode:
 case HImode:
-  if (dest == NULL)
-   dest = gen_reg_rtx (mode);
   emit_insn (gen_rtx_SET (VOIDmode, dest, source));
-  return dest;
+  return;
 
 case SImode:
-  result = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode);
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode);
 
-  emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (result),
- GEN_INT (INTVAL (source)
-   (~ (HOST_WIDE_INT) 0x;
+  emit_insn (gen_rtx_SET (VOIDmode, copy_rtx (temp),
+ GEN_INT (c  (~ (HOST_WIDE_INT) 0x;
   emit_insn (gen_rtx_SET (VOIDmode, dest,
- gen_rtx_IOR (SImode, copy_rtx (result),
-  GEN_INT (INTVAL (source)  
0x;
-  result = dest;
+ gen_rtx_IOR (SImode, copy_rtx (temp),
+  GEN_INT (c  0x;
   break;
 
 case DImode:
-  switch (GET_CODE (source))
-   {
-   case CONST_INT:
- c0 = INTVAL (source);
- c1 = -(c0  0);
- break;
-
-   default:
- gcc_unreachable ();
-   }
-
-  result = rs6000_emit_set_long_const (dest, c0, c1);
+