Re: A reload inheritance bug

2007-07-06 Thread Mark Shinwell

Bernd Schmidt wrote:

Mark Shinwell wrote:

Do you think it should be the case that, at the point below, _any_ reload
with reg_rtx corresponding to a hard register should have the relevant
bit set in reload_spill_index?


I think so.  I'm attaching a patch below.  It appears to have no effect
on all code I've tried - does it fix your test case?


As noted before, this does fix the test case; I've now regtested it
on arm-none-eabi and there are no regressions.  Do you want to take
care of applying it?

Mark


Re: A reload inheritance bug

2007-06-26 Thread Mark Shinwell

Mark Mitchell wrote:

Bernd Schmidt wrote:

Mark Shinwell wrote:

Do you think it should be the case that, at the point below, _any_ reload
with reg_rtx corresponding to a hard register should have the relevant
bit set in reload_spill_index?

I think so.  I'm attaching a patch below.  It appears to have no effect
on all code I've tried - does it fix your test case?


Mark, did you get a chance to try Bernd's patch?


This does indeed fix the test case.  I'll do full test runs for cross to
ARM on mainline with this patch applied.

Thanks (and sorry for the delay),
Mark


Re: A reload inheritance bug

2007-06-05 Thread Mark Shinwell

Bernd Schmidt wrote:

 My gut feeling is that this example will work as a consequence.


... note that I inserted some other insn which could conceivably use
R9 as an input reload, as the hard reg is dead.  Where would we
invalidate previous information about R9?  I assume it would be the loop
at the end of emit_reload_insns, specifically

  /* First, clear out memory of what used to be in this
spill reg.
 If consecutive registers are used, clear them all.  */

  for (k = 0; k  nr; k++)
{
CLEAR_HARD_REG_BIT (reg_reloaded_valid, i + k);
  CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
i + k);

Why isn't this triggering?


This code is guarded by

  /* I is nonneg if this reload used a register.
 If rld[r].reg_rtx is 0, this is an optional reload
 that we opted to ignore.  */

  if (i = 0  rld[r].reg_rtx != 0)

and in this [my original] case, i is negative (see my original patch).

I don't understand the I is nonneg ... comment above: the surrounding
code seems to say that i is actually non-negative if the reload used a
_spill_ register (rather than just any register).  Bernd, could you
clarify the precise meaning of spill register in this context?  I've
not yet managed to completely pin this down.

Mark


Re: A reload inheritance bug

2007-06-05 Thread Mark Shinwell

Bernd Schmidt wrote:

It appears that spill_reg_index is only set up for registers that go
through the choose_reload_regs process, not for the ones selected during
find_reloads.  That's probably a bad thing, as a reg_rtx chosen in
find_reloads could be used as a spill reg in a previous insn (as in your
example).


Yes, I think that is indeed what's going on.  I wonder if this issue
of spill_reg_index is also manifesting itself in:

  /* The following if-statement was #if 0'd in 1.34 (or before...).
 It's reenabled in 1.35 because supposedly nothing else
 deals with this problem.  */

  /* If a register gets output-reloaded from a non-spill register,
 that invalidates any previous reloaded copy of it.
 But forget_old_reloads_1 won't get to see it, because
 it thinks only about the original insn.  So invalidate it here.
 Also do the same thing for RELOAD_OTHER constraints where the
 output is discarded.  */
  if (i  0
...

and the code following in emit_reload_insns?  Perhaps if spill_reg_index
took account of registers selected during find_reloads then this could
be simplified too.

So what do you think is the best approach to fix all of this? :-)

Mark


Re: A reload inheritance bug

2007-06-05 Thread Mark Shinwell

Bernd Schmidt wrote:

Mark Shinwell wrote:


and the code following in emit_reload_insns?  Perhaps if spill_reg_index
took account of registers selected during find_reloads then this could
be simplified too.

So what do you think is the best approach to fix all of this? :-)


Sounds like you gave the answer yourself in the first paragraph.  At
least that's what I'd try.


OK, I'll have a go -- it seems like a good thing to clean up anyway,
even modulo fixing this bug.

Mark


Re: A reload inheritance bug

2007-05-30 Thread Mark Shinwell

Bernd Schmidt wrote:

insn 5301: (set (reg/f:SI 4082) (reg/f:SI 3275))

insn 5291 (set (reg:DF 4078])
  (mem/s:DF (plus:SI (reg/f:SI 3275) (reg:SI 3812
  REG_DEAD 3275

insn 5314 (set (reg:DF 4096)
  (mem/s:DF (plus:SI (reg/f:SI 4082) (reg:SI 4084



After reload we end up with the following.  I've added dividers to show
the correspondence with the insns above.

insn 5301 (set (mem/f/c:SI (plus:SI (reg/f:SI 13 sp) (const_int 12)))
 (reg/f:SI 9 r9 [3275]))
---
insn 6675 (set (reg:SI 9 r9)
   (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812])))

insn 5291 (set (reg:DF 75 s12 [4078])
   (mem/s:DF (reg:SI 9 r9)))
---
insn 6680 (set (reg:SI 1 r1) (const_int 4928))

insn 6681 (set (reg:SI 1 r1)
   (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 1 r1)))

insn 5314 (set (reg:DF 75 s12 [4096])
   (mem/s:DF (reg:SI 1 r1)))

We see here how pseudo 3275 was allocated to r9 and pseudo 4082 was
spilled to the stack.  At insn 5291, r9 has been allocated [*] as the
reload register since pseudo 3275 dies in that instruction; at insn
5314 we see the then-incorrect use of r9 (in instruction 6681)
for the value of pseudo 4082.  Note also how the dump shows that the
compiler thinks r9 still holds the value of pseudo 3275 at insn 6681.


Presumably this is one thing that is mildly unusual - R9 being chosen in
find_reloads already.  This wouldn't happen later, since it's in
reg_used_in_insn and therefore disallowed.


Sorry for the delay in replying -- hectic week!

As you say, one unusual thing about this situation must be the fact
that the reload register is getting chosen by the code in
push_reload heralded by If this is an input reload and the operand
contains a register that dies in this insn and is used nowhere else,
see if it is the right class to be used for this reload.  Use it if so.
etc.  I suspect it is the conjunction of this code and the behaviour of
reload in choosing r9 for pseudo 4082 in insn 5314 above that causes the
problem.

I don't entirely follow your example below...


Still, assume a similar sequence

insn 5301: (set (reg/f:SI 4082) (reg/f:SI 3275))

insn 5291 (set (reg:DF 4078])
   (unspec:DF (mem/s:DF (plus:SI (reg/f:SI 3275)
 (reg:SI 3812)))
  (reg:SI 3275)))
  REG_DEAD 3275

some other insn where R9 is used for an input reload

insn 5314 (set (reg:DF 4096)
  (mem/s:DF (plus:SI (reg/f:SI 4082) (reg:SI 4084

Here, we wouldn't use R9 as reload register in 5291


...since here, as far as I understand it, the clause mentioned above
in push_reload wouldn't select r9 to use as a reload register for
5291.  My gut feeling is that this example will work as a consequence.
(Perhaps you're getting at the fact that the decision procedure leading
us to choose r9 for pseudo 4082 in insn 5314 might be at fault instead?
Even if so, I still suspect it's the reuse of a hard reg in an insn with
a REG_DEAD note for the corresponding pseudo that is the real cause
because it upsets the later code -- that's what my patch was trying to
correct.)

Mark


Re: A reload inheritance bug

2007-05-15 Thread Mark Shinwell

Rask Ingemann Lambertsen wrote:

On Mon, May 14, 2007 at 10:47:13PM +0100, Mark Shinwell wrote:

[snip]

- the last use of reg2 (in B) is inside a matched input operand;

[snip]

The reload used for the instruction at B looks like this:

Reload 0: reload_in (SI) = (plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 10 sl [3812]))
GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 8
reload_in_reg: (plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 10 sl [3812]))
reload_reg_rtx: (reg:SI 9 r9)

where, in the notation from above, r9 is H and pseudo 3275 is reg2.


   I could be missing something here, but aren't matched operands given a
reload type of RELOAD_OTHER rather than RELOAD_FOR_INPUT?


I might be tripping up on the terminology here then.  The original
instruction is:

(insn:HI 5291 5282 5295 2 (set (reg:DF 4078 [ Jd+1040 ])
(mem/s:DF (plus:SI (reg/f:SI 3275)
(reg:SI 3812)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil)
(expr_list:REG_DEAD (reg/f:SI 3275)
(expr_list:REG_DEAD (reg:SI 3812)
(nil

and after reload we end up with:

(insn 6675 5282 5291 2 (set (reg:SI 9 r9)
(plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 10 sl [3812]))) 4 {*arm_addsi3} (nil)
(nil))

(insn:HI 5291 6675 5295 2 (set (reg:DF 75 s12 [orig:4078 Jd+1040 ] [4078])
(mem/s:DF (reg:SI 9 r9) [30 Jd+1040 S8 A64])) 578
{*movdf_vfp} (nil)
(nil))


   Also, which register is the inc by 8 note for? R9 or sl?


I thought the value of inc wasn't relevant here since we aren't
dealing with an autoincrement instruction (see the comment in
reload.h:struct reload).

Mark


Re: A reload inheritance bug

2007-05-15 Thread Mark Shinwell

Rask Ingemann Lambertsen wrote:

On Tue, May 15, 2007 at 09:31:10AM +0100, Mark Shinwell wrote:

Rask Ingemann Lambertsen wrote:

On Mon, May 14, 2007 at 10:47:13PM +0100, Mark Shinwell wrote:

[snip]

   - the last use of reg2 (in B) is inside a matched input operand;

[snip]

The reload used for the instruction at B looks like this:

[snip]

   GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 8

[snip]

  I could be missing something here, but aren't matched operands given a
reload type of RELOAD_OTHER rather than RELOAD_FOR_INPUT?

I might be tripping up on the terminology here then.  The original
instruction is:

(insn:HI 5291 5282 5295 2 (set (reg:DF 4078 [ Jd+1040 ])
(mem/s:DF (plus:SI (reg/f:SI 3275)
(reg:SI 3812)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil)
(expr_list:REG_DEAD (reg/f:SI 3275)
(expr_list:REG_DEAD (reg:SI 3812)
(nil


   Where is the matched input operand you're referring to? I don't see any
in the *movdf_vfp pattern.


Sorry, I am mistaken in the terminology.  Change the bullet point to:

- the last use of reg2 (in B) is inside an input operand.

Mark


Re: A reload inheritance bug

2007-05-15 Thread Mark Shinwell

Rask Ingemann Lambertsen wrote:

On Mon, May 14, 2007 at 10:47:13PM +0100, Mark Shinwell wrote:


I'm fairly certain that this is the correct approach to fix this
issue, but I'm less certain that I have correctly guarded the call
to forget_old_reloads_1, and indeed that I've done everything to
eradicate the previous reloads involving H.  For example, should I be
wiping the necessary range in reg_last_reload_reg?


   No, any use of reg_last_reload_reg should be guarded by
TEST_HARD_REG_BIT(reg_reloaded_valid, reg). It is just an optimization to
avoid a linear scan of reg_reloaded_contents.


OK, thanks.


   There are two places in reload_as_needed() where note_stores() calls
forget_old_reloads(). The first one at the top before emitting reload insns:

  else if (INSN_P (insn))
{
  regset_head regs_to_forget;
  INIT_REG_SET (regs_to_forget);
  note_stores (PATTERN (insn), forget_old_reloads_1, regs_to_forget);


This one won't catch the store to the reload register r9 that causes
the clobber because it's only scanning PATTERN (insn).  It's the same
issue as noted in the existing comment in emit_reload_insns:

  /* If a register gets output-reloaded from a non-spill register,
 that invalidates any previous reloaded copy of it.
 But forget_old_reloads_1 won't get to see it, because
 it thinks only about the original insn.  So invalidate it here.
 Also do the same thing for RELOAD_OTHER constraints where the
 output is discarded.  */


   The second one after emitting reload insns, just before the big
AUTO_INC_DEC block at the end:

  if (num_eliminable  chain-need_elim)
update_eliminable_offsets ();

   /* Any previously reloaded spilled pseudo reg, stored in this
   * insn,
 is no longer validly lying around to save a future reload.
 Note that this does not detect pseudos that were reloaded
 for this insn in order to be stored in
 (obeying register constraints).  That is correct; such reload
 registers ARE still valid.  */
  forget_marked_reloads (regs_to_forget);
  CLEAR_REG_SET (regs_to_forget);

  /* There may have been CLOBBER insns placed after INSN.  So scan
 between INSN and NEXT and use them to forget old reloads.  */
  for (x = NEXT_INSN (insn); x != old_next; x = NEXT_INSN (x))
if (NONJUMP_INSN_P (x)  GET_CODE (PATTERN (x)) == CLOBBER)
  note_stores (PATTERN (x), forget_old_reloads_1, NULL);

#ifdef AUTO_INC_DEC
  /* Likewise for regs altered by auto-increment in this insn.


This case relies on the presence of a CLOBBER though, which we don't
have.

As per Bernd's request I'll provide some more reload/RTL dumps shortly
which might clarify this.

 I'm thinking that maybe we should

include the reload insns themselves in the scan.


I'm not suitably qualified to answer this one, but it sounds more
invasive than my current patch.

Mark


Re: A reload inheritance bug

2007-05-15 Thread Mark Shinwell

Bernd Schmidt wrote:

Mark Shinwell wrote:

The bug is currently only exhibiting itself on a proprietary testcase
when compiled for an ARM target and is extremely sensitive to the
particular code involved.  It arises as follows, using the same notation
as in Richard's mail:


If you can't post the testcase, the best thing you can do to help is to
print out all the involved RTL insns before and after, as well as output
from debug_reload for all of them (possibly once after returning to
reload_as_needed from find_reloads, and again after choose_reload_regs
has run).


The relevant RTL instructions before reload are as follows.  These
correspond to points A, B and C respectively in my previous email.

(insn:HI 5301 3071 3079 2 (set (reg/f:SI 4082)
(reg/f:SI 3275)) 572 {*arm_movsi_vfp} (nil)
(expr_list:REG_EQUIV (symbol_ref:SI (*.LANCHOR0) [flags 0x182])
(nil)))
---
(insn:HI 5291 5282 5295 2 (set (reg:DF 4078 [ Jd+1040 ])
(mem/s:DF (plus:SI (reg/f:SI 3275)
(reg:SI 3812)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil)
(expr_list:REG_DEAD (reg/f:SI 3275)
(expr_list:REG_DEAD (reg:SI 3812)
(nil
---
(insn:HI 5314 5308 5318 2 (set (reg:DF 4096 [ Jd+1040 ])
(mem/s:DF (plus:SI (reg/f:SI 4082)
(reg:SI 4084)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil)
(expr_list:REG_EQUIV (mem/s:DF (plus:SI (reg/f:SI 4082)
(reg:SI 4084)) [30 Jd+1040 S8 A64])
(nil)))

After reload we end up with the following.  I've added dividers to show
the correspondence with the insns above.

(insn:HI 5301 3071 3079 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 13 sp)
(const_int 12 [0xc])) [36 S4 A32])
(reg/f:SI 9 r9 [3275])) 572 {*arm_movsi_vfp} (nil)
(expr_list:REG_EQUIV (symbol_ref:SI (*.LANCHOR0) [flags 0x182])
(nil)))
---
(insn 6675 5282 5291 2 (set (reg:SI 9 r9)
(plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 10 sl [3812]))) 4 {*arm_addsi3} (nil)
(nil))

(insn:HI 5291 6675 5295 2 (set (reg:DF 75 s12 [orig:4078 Jd+1040 ] [4078])
(mem/s:DF (reg:SI 9 r9) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil)
(nil))
---
(insn 6680 5308 6681 2 (set (reg:SI 1 r1)
(const_int 4928 [0x1340])) 572 {*arm_movsi_vfp} (nil)
(nil))

(insn 6681 6680 5314 2 (set (reg:SI 1 r1)
(plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 1 r1))) 4 {*arm_addsi3} (nil)
(nil))

(insn:HI 5314 6681 5318 2 (set (reg:DF 75 s12 [orig:4096 Jd+1040 ] [4096])
(mem/s:DF (reg:SI 1 r1) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil)
(expr_list:REG_EQUIV (mem/s:DF (reg:SI 1 r1) [30 Jd+1040 S8 A64])
(nil)))

We see here how pseudo 3275 was allocated to r9 and pseudo 4082 was
spilled to the stack.  At insn 5291, r9 has been allocated [*] as the
reload register since pseudo 3275 dies in that instruction; at insn
5314 we see the then-incorrect use of r9 (in instruction 6681)
for the value of pseudo 4082.  Note also how the dump shows that the
compiler thinks r9 still holds the value of pseudo 3275 at insn 6681.

The reloads and the values of the insn variable in reload_as_needed
are as follows.  These have been dumped at the points you suggest
(the output after choose_reload_regs being the same as after
find_reloads).

For instruction 5301:

(insn:HI 5301 3071 3079 2 (set (reg/f:SI 4082)
(reg/f:SI 9 r9 [3275])) 572 {*arm_movsi_vfp} (nil)
(expr_list:REG_EQUIV (symbol_ref:SI (*.LANCHOR0) [flags 0x182])
(nil)))
Reload 0: reload_out (SI) = (reg/f:SI 4082)
NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
reload_out_reg: (reg/f:SI 4082)

For instruction 5291:

(insn:HI 5291 5282 5295 2 (set (reg:DF 75 s12 [orig:4078 Jd+1040 ] [4078])
(mem/s:DF (plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 10 sl [3812])) [30 Jd+1040 S8 A64])) 578 
{*movdf_vfp} (nil)

(expr_list:REG_DEAD (reg/f:SI 9 r9 [3275])
(expr_list:REG_DEAD (reg:SI 10 sl [3812])
(nil
Reload 0: reload_in (SI) = (plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 10 sl [3812]))
GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 8
reload_in_reg: (plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 10 sl [3812]))
reload_reg_rtx: (reg:SI 9 r9)

For instruction 5314:

(insn:HI 5314 5308 5318 2 (set (reg:DF 75 s12 [orig:4096 Jd+1040 ] [4096])
(mem/s:DF (plus:SI (reg/f:SI 4082)
(reg:SI 4084)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil)
(expr_list:REG_EQUIV (mem/s:DF (plus:SI (reg/f:SI 4082)
(reg:SI 4084)) [30 Jd+1040 S8 A64])
(nil)))
Reload 0: reload_in (SI) = (reg/f:SI 4082)
GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1)
reload_in_reg: (reg/f:SI 4082)
Reload 1: reload_in (SI) = (const_int 4928 [0x1340])
GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1), can't combine

Re: A reload inheritance bug

2007-05-15 Thread Mark Shinwell

Rask Ingemann Lambertsen wrote:

On Mon, May 14, 2007 at 10:47:13PM +0100, Mark Shinwell wrote:

I'm fairly certain that this is the correct approach to fix this
issue, but I'm less certain that I have correctly guarded the call
to forget_old_reloads_1,

[snip]
 

Index: reload1.c
===
--- reload1.c   (revision 170932)
+++ reload1.c   (working copy)
@@ -7506,6 +7506,9 @@ emit_reload_insns (struct insn_chain *ch
}
}

+  if (i  0  rld[r].in != NULL_RTX  rld[r].reg_rtx != NULL_RTX)
+   forget_old_reloads_1 (rld[r].reg_rtx, NULL_RTX, NULL);
+
   /* The following if-statement was #if 0'd in 1.34 (or before...).
 It's reenabled in 1.35 because supposedly nothing else
 deals with this problem.  */


   It seems to me that the only special thing happening in your testcase is
that reload.in(_reg) is a PLUS rather than a REG or MEM. Does you patch not
prevent reload inheritance in many cases where it would be OK?


See below.


   Hmm, the immediately preceeding if() block begins:

  /* I is nonneg if this reload used a register.
 If rld[r].reg_rtx is 0, this is an optional reload
 that we opted to ignore.  */

  if (i = 0  rld[r].reg_rtx != 0)
{

   I'd like a similiar comment just before your code. It is the i  0 in
your case which prevents the preceeding if() block from updating
reg_reloaded_valid and so on.


I'm not as clear as I ought to be about the i  0 condition; what I
was trying to establish here was is this reload register a hard
register that was pre-existing in the subexpression being reloaded?
I believe this corresponds to is this register not a spill register
(hence the use of i  0, following the existing code below) -- but there
doesn't seem anywhere to be a concrete definition of exactly what
counts as a spill register in this context, so I may be mistaken.

Part of the reason for starting this thread was that I was concerned
about invalidating reloads that could be re-used later.  However, it
seems to me that in every circumstance where the reload register is a
hard register and the value assigned to that reload register is
different from the value held by the register during the evaluation of
the subexpression being reloaded (as we have here with r9 - r9 + r10),
we must invalidate all previous reloads involving that hard register.
I may well have encoded that logic incorrectly in the patch, though.

Mark

--

  /* If a register gets output-reloaded from a non-spill register,
 that invalidates any previous reloaded copy of it.
 But forget_old_reloads_1 won't get to see it, because
 it thinks only about the original insn.  So invalidate it here.
 Also do the same thing for RELOAD_OTHER constraints where the
 output is discarded.  */
  if (i  0
   ((rld[r].out != 0
(REG_P (rld[r].out)
   || (MEM_P (rld[r].out)
REG_P (rld[r].out_reg
  || (rld[r].out == 0  rld[r].out_reg
   REG_P (rld[r].out_reg
{
  rtx out = ((rld[r].out  REG_P (rld[r].out))
 ? rld[r].out : rld[r].out_reg);
  int nregno = REGNO (out);

  /* REG_RTX is now set or clobbered by the main instruction.
 As the comment above explains, forget_old_reloads_1 only
 sees the original instruction, and there is no guarantee
 that the original instruction also clobbered REG_RTX.
 For example, if find_reloads sees that the input side of
 a matched operand pair dies in this instruction, it may
 use the input register as the reload register.

 Calling forget_old_reloads_1 is a waste of effort if
 REG_RTX is also the output register.

 If we know that REG_RTX holds the value of a pseudo
 register, the code after the call will record that fact.  */
  if (rld[r].reg_rtx  rld[r].reg_rtx != out)
forget_old_reloads_1 (rld[r].reg_rtx, NULL_RTX, NULL);


Re: A reload inheritance bug

2007-05-15 Thread Mark Shinwell

Bernd Schmidt wrote:

Mark Shinwell wrote:

These dumps are of course taken before the application of my patch.

Hope that helps,


Thanks.  I may have missed it in the previous mails, but which piece of
code exactly decides to use R9 for reload 0 of insn 5314?


I'm afraid I'm not sure exactly -- but given that r9 is still deemed
to contain a valid reload, isn't that logic correct given the pseudo
register numbers in question?

Mark


A reload inheritance bug

2007-05-14 Thread Mark Shinwell

I have had the misfortune to discover a reload inheritance bug which,
after a long period of analysis, has turned out to be very similar to
the situation described by Richard Sandiford in
http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01977.html.

This being my first serious foray into reload, I would appreciate some
advice as to whether it looks like my current patch is going in the
right direction.  (Ian, I've copied this to you since you approved
Richard's patch linked above; any help would be appreciated.)

The bug is currently only exhibiting itself on a proprietary testcase
when compiled for an ARM target and is extremely sensitive to the
particular code involved.  It arises as follows, using the same notation
as in Richard's mail:

A: reg1 - reg2
   ...
B: last use of reg2
   ...
C: use of reg1

where:

- reg1 is not allocated a hard register, and is spilled to the stack;
- reg2 is allocated a hard register H;
- reg2 has a death note in B;
- the last use of reg2 (in B) is inside a matched input operand;
- reg1 and reg2 are not modified between A and C;
- there are no labels between A and C.

The reload used for the instruction at B looks like this:

Reload 0: reload_in (SI) = (plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 10 sl [3812]))
GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 8
reload_in_reg: (plus:SI (reg/f:SI 9 r9 [3275])
(reg:SI 10 sl [3812]))
reload_reg_rtx: (reg:SI 9 r9)

where, in the notation from above, r9 is H and pseudo 3275 is reg2.
Since 3275 dies in the corresponding instruction, reload has chosen
r9 as the reload register.  Unfortunately when we later arrive at
instruction C during inheritance, the current code believes that r9
is still holding the value that it had at A -- which it is not, because
the use of r9 as the reload register at B has clobbered it.

In order to fix this I think that reload1.c:emit_reload_insns should,
at the point of discovery of an input reload whose reload register is a
non-spill hard register H, invalidate all previous reloads involving
that hard register.  That is what the patch below does, or aims to
do.  (The patch is actually against a 4.2-based branch but the code
looks to be the same on the mainline.)  It at least fixes this
particular test case, and appears to be in line with Richard's patch.

I'm fairly certain that this is the correct approach to fix this
issue, but I'm less certain that I have correctly guarded the call
to forget_old_reloads_1, and indeed that I've done everything to
eradicate the previous reloads involving H.  For example, should I be
wiping the necessary range in reg_last_reload_reg?  On the subject of
the guard, I am unsure because the existing code involves a much
more complicated check:

  if (i  0
   ((rld[r].out != 0
(REG_P (rld[r].out)
   || (MEM_P (rld[r].out)
REG_P (rld[r].out_reg
  || (rld[r].out == 0  rld[r].out_reg
   REG_P (rld[r].out_reg

Should I perhaps be mirroring this check for the input case too?
I'm quite keen to make the fix for this bug cover all eventualities
in terms of the various varieties of reload that might arise, if
possible...

Thanks in advance for any help.  Once this is fixed, I shall add the
necessary comments, test it appropriately, and submit it for approval
on the mainline.

Mark

--


Index: reload1.c
===
--- reload1.c   (revision 170932)
+++ reload1.c   (working copy)
@@ -7506,6 +7506,9 @@ emit_reload_insns (struct insn_chain *ch
}
}

+  if (i  0  rld[r].in != NULL_RTX  rld[r].reg_rtx != NULL_RTX)
+   forget_old_reloads_1 (rld[r].reg_rtx, NULL_RTX, NULL);
+
   /* The following if-statement was #if 0'd in 1.34 (or before...).
 It's reenabled in 1.35 because supposedly nothing else
 deals with this problem.  */


libjvm.la and problems with libtool relink mode

2006-12-14 Thread Mark Shinwell

I am currently involved in building GCC toolchains by configuring them with
the prefix that is to be used on the target system (somewhere in /opt)
and then installing them in a separate working installation directory
(say somewhere in my scratch space).  The installation step into this
working installation directory is performed by invoking a command of
the form make prefix=/path/to/working/dir/install install.

Unfortunately it seems that, when using current mainline and building
for native x86_64, there is an awkward problem involving the building
of libjvm.la.  This libtool file is built from two others: one .lo file
and one .la file.  The difficulty arises at the libtool install step
for libjvm.la.  It seems that since the link command that constructed
libjvm.la built that library using another .la file (libgcj.la) then
libtool writes a relink command into the libjvm.la file.  Then, when
installing libjvm.la using libtool, it attempts to execute that relink
command.  This however fails because the installation location is not
a subdirectory of the configured prefix -- it is somewhere completely
different.  This piece of libtool code causes the error:

  # Don't allow the user to place us outside of our expected
  # location b/c this prevents finding dependent libraries that
  # are installed to the same prefix.
  # At present, this check doesn't affect windows .dll's that
  # are installed into $libdir/../bin (currently, that works fine)
  # but it's something to keep an eye on.
  if test $inst_prefix_dir = $destdir; then
$echo $modename: error: cannot install \`$file' to a directory not 
ending in $libdir 12

exit 1
  fi

This problem does not arise with various other .la files created during
the build process of gcc (such as libstdc++.la for example) because those
are directly created from a bunch of .lo files; none of them were built
from another .la as libjvm.la is.

The writing of the relink command into libjvm.la comes as a result of
need_relink=yes on line 2095 of the ltmain.sh in the toplevel directory
of mainline gcc.  I wonder if this assignment ought to be guarded
by a test involving $hardcode_into_libs, which is currently set to yes in
my generated builddir/x86_64-unknown-linux-gnu/libjava/libtool.
Perhaps I should be having $hardcode_into_libs set to no in addition to
changing the libtool code so that need_relink is only set to yes if
$hardcode_into_libs is set to yes also?  The exact behaviour of the
two modes of $hardcode_into_libs isn't clear to me, however, so I'm not
very sure.  With that solution I'm also assuming that libtool is doing
the right thing in forcing a relink in this situation (if $hardcode_into_libs
is yes); perhaps even that isn't the case.

If anyone could offer any advice as to how to proceed here, I'd be most
grateful.  I've copied this to Alexandre since a colleague suggested he
might know the answer :-)

Mark


Re: subreg transformation causes incorrect post_inc

2006-11-12 Thread Mark Shinwell

[EMAIL PROTECTED] wrote:

From: Mark Shinwell [EMAIL PROTECTED]

[EMAIL PROTECTED] wrote:

My port, based on (GCC) 4.2.0 20061002 (experimental), is producing
incorrect code for the following test case:

[snip]

I've only had a very quick look at your code, but I have a feeling 
thatthis is an instance of the kind of slip-up with 
GO_IF_MODE_DEPENDENT_ADDRESSthat my patch at 
http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00858.html is
aimed at preventing.  (This patch is currently only applied to 
addrmodesbranch.)


Mark


Hhmm.  Is the intent of your patch simply to prevent the mistake of
backends not defining GO_IF_MODE_DEPENDENT_ADDRESS properly?


That's right.  Presumably something else is wrong, then :-)

Mark


Re: subreg transformation causes incorrect post_inc

2006-11-10 Thread Mark Shinwell

[EMAIL PROTECTED] wrote:

My port, based on (GCC) 4.2.0 20061002 (experimental), is producing
incorrect code for the following test case:

[snip]

I've only had a very quick look at your code, but I have a feeling that
this is an instance of the kind of slip-up with GO_IF_MODE_DEPENDENT_ADDRESS
that my patch at http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00858.html is
aimed at preventing.  (This patch is currently only applied to addrmodes
branch.)

Mark



Re: Even stricter implicit conversions between vectors

2006-11-02 Thread Mark Shinwell

Ian Ollmann wrote:

stronger type checking seems like a good idea to me in general.


I agree, but I don't really want to break lots of code all at once,
even if that code is being slightly more slack than it perhaps ought
to be :-)

Given that no-one has really objected to stronger type-checking here
_per se_, then I see two ways forward:

1. Treat this as a regression: fix it and cause errors upon bad
conversions, but risk breaking code.

2. Emit a warning in cases of converting vector signed int to
vector unsigned int, etc., and state that the behaviour will change
to an error in a later version.

Thoughts?

Mark


Re: Even stricter implicit conversions between vectors

2006-11-02 Thread Mark Shinwell

Ian Lance Taylor wrote:

I would vote for: break the code, but provide an option to restore the
old behaviour, and mention the option in the error message.


I like this -- I shall prepare a patch and circulate it for review.

Mark


Re: Even stricter implicit conversions between vectors

2006-10-31 Thread Mark Shinwell

Ian Lance Taylor wrote:

Mark Shinwell [EMAIL PROTECTED] writes:


I would now like to propose that the check in that function be made
even stronger such that it disallows conversions between vectors
whose element types differ -- even if an implicit conversion exists
between those element types.


As far as I can see, what that amounts to is saying that there should
not be an implicit conversion between vector of int and vector of
unsigned int.  It seems to me that a cast would then be required to
call, e.g., vec_add with arguments of type __vector unsigned int.  I
don't see how that is useful.

But perhaps I am mistaken; can you give an example of the type of
thing which is currently permitted which you think should be
prohibited?


Things exactly like what you write above (an implicit conversion
between vector of int and vector of unsigned int).  My main argument
is that the gcc behaviour seems to be at odds with the semantics
envisaged by the designers of these various vector-based instruction
sets we see around -- and indeed the designers of the STL -- although
perhaps that is a misconception and there are actually many more
examples where it is believed that these implicit conversions are
reasonable.

I don't see a reason _per se_ why having the implicit conversions
is a good thing -- for things such as vec_add, presumably more
alternatives can be added (there are various ones already I see for
differing orders of vector bool short / vector signed short) etc.
I think it is open to debate whether it is a good thing to have to
make those alternatives explicit, or whether some should be filled in
by implicit conversions.

However, this...

Mike Stump wrote:
 My only concern is that we have tons of customers with tons of code and
 you don't have any

That isn't quite true :-)

 and that you break their code.

...is more of a concern, I agree, and is what I worry about most.

Mark


Implicit conversions between vectors

2006-10-12 Thread Mark Shinwell

Currently we permit implicit conversions between vectors whose total
bitsizes are equal but which are divided into differing numbers of subparts.
It seems that in some circumstances this rule is overly lax.  For example
the following code, using vector types (whose definitions I have provided
from the intrinsics header file) defined for the ARM NEON instruction set,
is accepted without error or warning:

...
typedef __builtin_neon_qi int8x8_t  __attribute__ ((__vector_size__ (8)));
typedef __builtin_neon_hi int16x4_t __attribute__ ((__vector_size__ (8)));
...

int8x8_t f (int16x4_t a)
{
  return a;
}

Here, the compiler is not complaining about an attempt to implicitly
convert a vector of four 16-bit quantities to a vector of eight
8-bit quantities.

This lack of type safety is unsettling, and I wonder if it should be fixed
with a patch along the lines of the (not yet fully tested) one below.  Does
that sound reasonable?  It seems right to try to fix the generic code here,
even though the testcase in hand is target-specific.  If this approach
is unreasonable, I guess some target-specific hooks will be needed.

Mark


--


Index: gcc/c-common.c
===
--- gcc/c-common.c  (revision 117639)
+++ gcc/c-common.c  (working copy)
@@ -1014,7 +1014,8 @@ vector_types_convertible_p (tree t1, tre
  (TREE_CODE (TREE_TYPE (t1)) != REAL_TYPE ||
 TYPE_PRECISION (t1) == TYPE_PRECISION (t2))
  INTEGRAL_TYPE_P (TREE_TYPE (t1))
-   == INTEGRAL_TYPE_P (TREE_TYPE (t2)));
+   == INTEGRAL_TYPE_P (TREE_TYPE (t2))
+ TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2));
 }

 /* Convert EXPR to TYPE, warning about conversion problems with constants.


Re: Implicit conversions between vectors

2006-10-12 Thread Mark Shinwell

Ian Lance Taylor wrote:

I believe that the problem with changing this unconditionally is that
the Altivec programming guidelines specify the rules which gcc
currently follows: you are permitted to assign one vector variable to
another, without an explicit cast, when the vectors are the same size.
So please check the Altivec programming manual.


Will do, thanks.

Mark


Re: Expansion of __builtin_frame_address

2006-06-02 Thread Mark Shinwell

Richard Earnshaw wrote:

I'm not keen on this.  On some machines a frame pointer is *very*
expensive, both in terms of the code required to set it up, and the
resulting loss of a register which affects code quality (in addition, on
Thumb, the frame pointer can access much less data on the stack than the
stack pointer can, so code quality is affected even more).


Do you have anything in mind that would be a better default?  Is there
something that could be used instead of hard_frame_pointer_rtx that
will later expand to the correct frame address, but not necessarily force
use of a frame pointer, for example?  (As far as I can tell,
frame_pointer_rtx will not do at least in the ARM case, because it doesn't
yield the same value.)

If the hard frame pointer is forced by default, then targets which are
particularly badly affected can simply define INITIAL_FRAME_ADDRESS_RTX.
Since such targets would presumably not have to force reload to keep
the frame pointer, then definitions of such macros would not need to
be side-effecting (in the way described earlier in this thread) and thus
be satisfactory.


I can see no argument for a frame pointer being *required* for getting
the return address.  We didn't have to do this in the past, so I think
it is wrong to require that we do it now.


Currently, the code does require a frame pointer in all except the
count == 0 case, and as far as that particular case goes I get the
impression that it would have been treated in the same way had this
glibc backtrace function been noticed last year.  This may be a mistaken
impression though.

Mark


Re: Expansion of __builtin_frame_address

2006-06-02 Thread Mark Shinwell

David Edelsohn wrote:

Mark Shinwell writes:


Mark If the hard frame pointer is forced by default, then targets which are
Mark particularly badly affected can simply define INITIAL_FRAME_ADDRESS_RTX.
Mark Since such targets would presumably not have to force reload to keep
Mark the frame pointer, then definitions of such macros would not need to
Mark be side-effecting (in the way described earlier in this thread) and thus
Mark be satisfactory.

PowerPC also does not need hard_frame_pointer_rtx for all cases.
It seems like a bad idea to force every port to define
INITIAL_FRAME_ADDRESS_RTX to avoid a penalty.  Why can't whatever port
needs this change define INITIAL_FRAME_ADDRESS_RTX to
hard_frame_pointer_rtx?   One could add count to the macro so that the
port can optimize further and avoid hard_frame_pointer_rtx, if possible.


OK, here is what I think is a better suggestion.  First note that
expand_builtin_return_addr is used for both __builtin_frame_address and
__builtin_return_address.  The behaviour for the return address
case seems to be for target-specific code to override the result of this
function in the case when count == 0; thus, it does indeed not matter what
we return from expand_builtin_return_addr in that case.  (I hadn't
realised this before.)  The new patch, below, thus has the same behaviour
for __builtin_return_address.

However when dealing with __builtin_frame_address, we must return the
correct value from this function no matter what the value of count.  This
patch therefore forces use of a hard FP in such situations.

Is that more satisfactory?

Mark
Index: builtins.c
===
--- builtins.c  (revision 114325)
+++ builtins.c  (working copy)
@@ -496,12 +496,16 @@ expand_builtin_return_addr (enum built_i
 #else
   rtx tem;
 
-  /* For a zero count, we don't care what frame address we return, so frame
- pointer elimination is OK, and using the soft frame pointer is OK.
- For a non-zero count, we require a stable offset from the current frame
- pointer to the previous one, so we must use the hard frame pointer, and
+  /* For a zero count with __builtin_return_address, we don't care what
+ frame address we return, because target-specific definitions will
+ override us.  Therefore frame pointer elimination is OK, and using
+ the soft frame pointer is OK.
+
+ For a non-zero count, or a zero count with __builtin_frame_address,
+ we require a stable offset from the current frame pointer to the
+ previous one, so we must use the hard frame pointer, and
  we must disable frame pointer elimination.  */
-  if (count == 0)
+  if (count == 0  fndecl_code == BUILT_IN_RETURN_ADDRESS)
 tem = frame_pointer_rtx;
   else 
 {


Re: Expansion of __builtin_frame_address

2006-06-01 Thread Mark Shinwell

Mark Mitchell wrote:

Mark Shinwell wrote:

As for the remaining problem, I suggest that we could:

(i) always return the hard frame pointer, and disable FP elimination in
the current function; or

(iii) ...the same as option (i), but allow targets to define another macro
that will cause the default code to use the soft frame pointer rather than
the hard frame pointer, and hence allow FP elimination.  (If such a macro
were set by a particular target, am I right in thinking that it would be
safe to use the soft frame pointer even in the count = 1 cases?)



I tend to think that option (iii) might be best, although perhaps it
is overkill and option (i) would do.  But I'm not entirely sure;
still being a gcc novice I have to admit to not being quite thoroughly
clear on this myself at this stage.  So any advice or comments would be
appreciated!


I agree that option (iii) is best, as it provides the ability to
optimize on platforms where that is feasible, and still provides a
working default elsewhere.  I will review and approve a suitable patch
to implement (iii), assuming that there are no objections from Jim or
others.


This having been discussed some more, and my understanding improved,
I now believe that option (i) is in fact the correct thing to do.  The
attach patch implements this, which basically amounts to the same logic
that is currently in the compiler save for the removal of the special
case when count == 0.

OK for mainline?

Mark

--

2006-06-01  Mark Shinwell  [EMAIL PROTECTED]

* gcc/builtins.c (expand_builtin_return_addr): Always use
hard_frame_pointer_rtx and prevent frame pointer elimination
if INITIAL_FRAME_ADDRESS_RTX isn't set.
Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 114277)
+++ gcc/builtins.c  (working copy)
@@ -496,20 +496,14 @@ expand_builtin_return_addr (enum built_i
 #else
   rtx tem;
 
-  /* For a zero count, we don't care what frame address we return, so frame
- pointer elimination is OK, and using the soft frame pointer is OK.
- For a non-zero count, we require a stable offset from the current frame
- pointer to the previous one, so we must use the hard frame pointer, and
- we must disable frame pointer elimination.  */
-  if (count == 0)
-tem = frame_pointer_rtx;
-  else 
-{
-  tem = hard_frame_pointer_rtx;
+  /* We require a stable offset from the current frame pointer to the
+ previous one, so we must use the hard frame pointer, and we must
+ disable frame pointer elimination.  */
 
-  /* Tell reload not to eliminate the frame pointer.  */
-  current_function_accesses_prior_frames = 1;
-}
+  tem = hard_frame_pointer_rtx;
+
+  /* Tell reload not to eliminate the frame pointer.  */
+  current_function_accesses_prior_frames = 1;
 #endif
 
   /* Some machines need special handling before we can access


Expansion of __builtin_frame_address

2006-05-25 Thread Mark Shinwell

Hi,

I'd like to gather some opinions and advice on the expansion of
__builtin_frame_address, as discussed on gcc-patches last year [1, 2].
This centres on the following comment in expand_builtin_return_addr
arising from revision 103294 last year:

/* For a zero count, we don't care what frame address we return, so frame
   pointer elimination is OK, and using the soft frame pointer is OK.
   For a non-zero count, we require a stable offset from the current frame
   pointer to the previous one, so we must use the hard frame pointer, and
   we must disable frame pointer elimination.  */

I believe that, when this function is used to expand
__builtin_frame_address (), the first sentence isn't true: in some cases,
a function does care about the value at count == 0.  A concrete
example is the glibc backtrace () function [3] which uses the expression
__builtin_frame_address (0) to determine the starting point for stack
traversal.  (It performs subsequent dereferences back down the chain
of frame pointers by itself.)

The wording of the comment in itself is unfortunately not the end of
the issue.  Due to the subsequent conditional testing count == 0, the
builtin can expand to an erroneous value when the following conditions
are met:

- the expansion function is invoked with count set to zero; and

- the target is such that frame_pointer_rtx and hard_frame_pointer_rtx
do not ultimately yield the same address.

An example of such an invocation occurs during compilation of the
aforementioned backtrace function, and an example of such a target is the
ARM.  Currently, calling backtrace () on such a target yields a failure.

Let us just consider how to fix the ARM case for a moment.  The obvious
thing to do is to define INITIAL_FRAME_ADDRESS_RTX.  However, the correct
semantics of such a macro definition would be to:

- set current_function_accesses_prior_frames to 1, so that FP is not
eliminated in this function; and

- return hard_frame_pointer_rtx.

I hope I'm not alone in thinking that such a side-effecting macro would
be in bad taste.

Let us come back to the more general case.  As identified above, when
expand_builtin_return_addr is used to expand __builtin_frame_address (),
we do care about the count == 0 case.  I believe that the comment should
be adjusted to reflect this whatever other changes, if any, are made.
As for the remaining problem, I suggest that we could:

(i) always return the hard frame pointer, and disable FP elimination in
the current function; or

(ii) remove this logic entirely (but preserve the means to communicate any
necessity to prevent FP elimination to reload) and insist that all
targets define INITIAL_FRAME_ADDRESS_RTX.  Any such solution should
probably adjust INITIAL_FRAME_ADDRESS_RTX so that it doesn't have to
cause a side-effect in order to communicate the information about the
frame pointer.  Or...

(iii) ...the same as option (i), but allow targets to define another macro
that will cause the default code to use the soft frame pointer rather than
the hard frame pointer, and hence allow FP elimination.  (If such a macro
were set by a particular target, am I right in thinking that it would be
safe to use the soft frame pointer even in the count = 1 cases?)

Option (ii) is in some ways the most satisfactory, as it would provide
more certainty (particularly if another target were to be added) that
the code is doing the right thing.  However it involves a moderate amount
of work and I would assume that the current level of enthusiasm in this
regard is approximately the same as last year :-)

Option (i), which is in all but name the solution 5 approach [1] proposed
last year, means that the count == 0 case is elevated to the same level
of importance as the count  0 cases, in line with the use in
backtrace ().  The problem with this is that on platforms where the
soft and hard FPs coincide there is going to be a slight
performance degradation, as identified previously, whenever these
builtins are used.  Someone with more experience will have to enlighten
me as to whether this penalty would actually be significant: my
intuition tells me that in fact it would not be, though perhaps there are
oft-used occurrences of these builtin expansions that I don't know about.
(On how many platforms is it the case that the soft and hard FPs actually
coincide?  If there are platforms where they do _not_ coincide, then
presumably those targets can be affected in the same way as I identify
above for ARM?  I'm confused because the current code seems to assume
(in the count == 0 case) that the soft and hard FPs coincide, yet uses
the hard rather than the soft FP for count = 1.)

Option (iii) gives the advantage of a working default and removes the
pessimization on a target-by-target basis, just when it is known to be
safe.  If I'm correct in thinking that the setting of the soft frame
pointer macro would enable the soft frame pointer to be used no matter
what the value of count, then this option actually