Re: A reload inheritance bug
Bernd Schmidt wrote: Mark Shinwell wrote: Do you think it should be the case that, at the point below, _any_ reload with reg_rtx corresponding to a hard register should have the relevant bit set in reload_spill_index? I think so. I'm attaching a patch below. It appears to have no effect on all code I've tried - does it fix your test case? As noted before, this does fix the test case; I've now regtested it on arm-none-eabi and there are no regressions. Do you want to take care of applying it? Mark
Re: A reload inheritance bug
Mark Mitchell wrote: Bernd Schmidt wrote: Mark Shinwell wrote: Do you think it should be the case that, at the point below, _any_ reload with reg_rtx corresponding to a hard register should have the relevant bit set in reload_spill_index? I think so. I'm attaching a patch below. It appears to have no effect on all code I've tried - does it fix your test case? Mark, did you get a chance to try Bernd's patch? This does indeed fix the test case. I'll do full test runs for cross to ARM on mainline with this patch applied. Thanks (and sorry for the delay), Mark
Re: A reload inheritance bug
Bernd Schmidt wrote: My gut feeling is that this example will work as a consequence. ... note that I inserted some other insn which could conceivably use R9 as an input reload, as the hard reg is dead. Where would we invalidate previous information about R9? I assume it would be the loop at the end of emit_reload_insns, specifically /* First, clear out memory of what used to be in this spill reg. If consecutive registers are used, clear them all. */ for (k = 0; k nr; k++) { CLEAR_HARD_REG_BIT (reg_reloaded_valid, i + k); CLEAR_HARD_REG_BIT (reg_reloaded_call_part_clobbered, i + k); Why isn't this triggering? This code is guarded by /* I is nonneg if this reload used a register. If rld[r].reg_rtx is 0, this is an optional reload that we opted to ignore. */ if (i = 0 rld[r].reg_rtx != 0) and in this [my original] case, i is negative (see my original patch). I don't understand the I is nonneg ... comment above: the surrounding code seems to say that i is actually non-negative if the reload used a _spill_ register (rather than just any register). Bernd, could you clarify the precise meaning of spill register in this context? I've not yet managed to completely pin this down. Mark
Re: A reload inheritance bug
Bernd Schmidt wrote: It appears that spill_reg_index is only set up for registers that go through the choose_reload_regs process, not for the ones selected during find_reloads. That's probably a bad thing, as a reg_rtx chosen in find_reloads could be used as a spill reg in a previous insn (as in your example). Yes, I think that is indeed what's going on. I wonder if this issue of spill_reg_index is also manifesting itself in: /* The following if-statement was #if 0'd in 1.34 (or before...). It's reenabled in 1.35 because supposedly nothing else deals with this problem. */ /* If a register gets output-reloaded from a non-spill register, that invalidates any previous reloaded copy of it. But forget_old_reloads_1 won't get to see it, because it thinks only about the original insn. So invalidate it here. Also do the same thing for RELOAD_OTHER constraints where the output is discarded. */ if (i 0 ... and the code following in emit_reload_insns? Perhaps if spill_reg_index took account of registers selected during find_reloads then this could be simplified too. So what do you think is the best approach to fix all of this? :-) Mark
Re: A reload inheritance bug
Bernd Schmidt wrote: Mark Shinwell wrote: and the code following in emit_reload_insns? Perhaps if spill_reg_index took account of registers selected during find_reloads then this could be simplified too. So what do you think is the best approach to fix all of this? :-) Sounds like you gave the answer yourself in the first paragraph. At least that's what I'd try. OK, I'll have a go -- it seems like a good thing to clean up anyway, even modulo fixing this bug. Mark
Re: A reload inheritance bug
Bernd Schmidt wrote: insn 5301: (set (reg/f:SI 4082) (reg/f:SI 3275)) insn 5291 (set (reg:DF 4078]) (mem/s:DF (plus:SI (reg/f:SI 3275) (reg:SI 3812 REG_DEAD 3275 insn 5314 (set (reg:DF 4096) (mem/s:DF (plus:SI (reg/f:SI 4082) (reg:SI 4084 After reload we end up with the following. I've added dividers to show the correspondence with the insns above. insn 5301 (set (mem/f/c:SI (plus:SI (reg/f:SI 13 sp) (const_int 12))) (reg/f:SI 9 r9 [3275])) --- insn 6675 (set (reg:SI 9 r9) (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812]))) insn 5291 (set (reg:DF 75 s12 [4078]) (mem/s:DF (reg:SI 9 r9))) --- insn 6680 (set (reg:SI 1 r1) (const_int 4928)) insn 6681 (set (reg:SI 1 r1) (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 1 r1))) insn 5314 (set (reg:DF 75 s12 [4096]) (mem/s:DF (reg:SI 1 r1))) We see here how pseudo 3275 was allocated to r9 and pseudo 4082 was spilled to the stack. At insn 5291, r9 has been allocated [*] as the reload register since pseudo 3275 dies in that instruction; at insn 5314 we see the then-incorrect use of r9 (in instruction 6681) for the value of pseudo 4082. Note also how the dump shows that the compiler thinks r9 still holds the value of pseudo 3275 at insn 6681. Presumably this is one thing that is mildly unusual - R9 being chosen in find_reloads already. This wouldn't happen later, since it's in reg_used_in_insn and therefore disallowed. Sorry for the delay in replying -- hectic week! As you say, one unusual thing about this situation must be the fact that the reload register is getting chosen by the code in push_reload heralded by If this is an input reload and the operand contains a register that dies in this insn and is used nowhere else, see if it is the right class to be used for this reload. Use it if so. etc. I suspect it is the conjunction of this code and the behaviour of reload in choosing r9 for pseudo 4082 in insn 5314 above that causes the problem. I don't entirely follow your example below... Still, assume a similar sequence insn 5301: (set (reg/f:SI 4082) (reg/f:SI 3275)) insn 5291 (set (reg:DF 4078]) (unspec:DF (mem/s:DF (plus:SI (reg/f:SI 3275) (reg:SI 3812))) (reg:SI 3275))) REG_DEAD 3275 some other insn where R9 is used for an input reload insn 5314 (set (reg:DF 4096) (mem/s:DF (plus:SI (reg/f:SI 4082) (reg:SI 4084 Here, we wouldn't use R9 as reload register in 5291 ...since here, as far as I understand it, the clause mentioned above in push_reload wouldn't select r9 to use as a reload register for 5291. My gut feeling is that this example will work as a consequence. (Perhaps you're getting at the fact that the decision procedure leading us to choose r9 for pseudo 4082 in insn 5314 might be at fault instead? Even if so, I still suspect it's the reuse of a hard reg in an insn with a REG_DEAD note for the corresponding pseudo that is the real cause because it upsets the later code -- that's what my patch was trying to correct.) Mark
Re: A reload inheritance bug
Rask Ingemann Lambertsen wrote: On Mon, May 14, 2007 at 10:47:13PM +0100, Mark Shinwell wrote: [snip] - the last use of reg2 (in B) is inside a matched input operand; [snip] The reload used for the instruction at B looks like this: Reload 0: reload_in (SI) = (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812])) GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 8 reload_in_reg: (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812])) reload_reg_rtx: (reg:SI 9 r9) where, in the notation from above, r9 is H and pseudo 3275 is reg2. I could be missing something here, but aren't matched operands given a reload type of RELOAD_OTHER rather than RELOAD_FOR_INPUT? I might be tripping up on the terminology here then. The original instruction is: (insn:HI 5291 5282 5295 2 (set (reg:DF 4078 [ Jd+1040 ]) (mem/s:DF (plus:SI (reg/f:SI 3275) (reg:SI 3812)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil) (expr_list:REG_DEAD (reg/f:SI 3275) (expr_list:REG_DEAD (reg:SI 3812) (nil and after reload we end up with: (insn 6675 5282 5291 2 (set (reg:SI 9 r9) (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812]))) 4 {*arm_addsi3} (nil) (nil)) (insn:HI 5291 6675 5295 2 (set (reg:DF 75 s12 [orig:4078 Jd+1040 ] [4078]) (mem/s:DF (reg:SI 9 r9) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil) (nil)) Also, which register is the inc by 8 note for? R9 or sl? I thought the value of inc wasn't relevant here since we aren't dealing with an autoincrement instruction (see the comment in reload.h:struct reload). Mark
Re: A reload inheritance bug
Rask Ingemann Lambertsen wrote: On Tue, May 15, 2007 at 09:31:10AM +0100, Mark Shinwell wrote: Rask Ingemann Lambertsen wrote: On Mon, May 14, 2007 at 10:47:13PM +0100, Mark Shinwell wrote: [snip] - the last use of reg2 (in B) is inside a matched input operand; [snip] The reload used for the instruction at B looks like this: [snip] GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 8 [snip] I could be missing something here, but aren't matched operands given a reload type of RELOAD_OTHER rather than RELOAD_FOR_INPUT? I might be tripping up on the terminology here then. The original instruction is: (insn:HI 5291 5282 5295 2 (set (reg:DF 4078 [ Jd+1040 ]) (mem/s:DF (plus:SI (reg/f:SI 3275) (reg:SI 3812)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil) (expr_list:REG_DEAD (reg/f:SI 3275) (expr_list:REG_DEAD (reg:SI 3812) (nil Where is the matched input operand you're referring to? I don't see any in the *movdf_vfp pattern. Sorry, I am mistaken in the terminology. Change the bullet point to: - the last use of reg2 (in B) is inside an input operand. Mark
Re: A reload inheritance bug
Rask Ingemann Lambertsen wrote: On Mon, May 14, 2007 at 10:47:13PM +0100, Mark Shinwell wrote: I'm fairly certain that this is the correct approach to fix this issue, but I'm less certain that I have correctly guarded the call to forget_old_reloads_1, and indeed that I've done everything to eradicate the previous reloads involving H. For example, should I be wiping the necessary range in reg_last_reload_reg? No, any use of reg_last_reload_reg should be guarded by TEST_HARD_REG_BIT(reg_reloaded_valid, reg). It is just an optimization to avoid a linear scan of reg_reloaded_contents. OK, thanks. There are two places in reload_as_needed() where note_stores() calls forget_old_reloads(). The first one at the top before emitting reload insns: else if (INSN_P (insn)) { regset_head regs_to_forget; INIT_REG_SET (regs_to_forget); note_stores (PATTERN (insn), forget_old_reloads_1, regs_to_forget); This one won't catch the store to the reload register r9 that causes the clobber because it's only scanning PATTERN (insn). It's the same issue as noted in the existing comment in emit_reload_insns: /* If a register gets output-reloaded from a non-spill register, that invalidates any previous reloaded copy of it. But forget_old_reloads_1 won't get to see it, because it thinks only about the original insn. So invalidate it here. Also do the same thing for RELOAD_OTHER constraints where the output is discarded. */ The second one after emitting reload insns, just before the big AUTO_INC_DEC block at the end: if (num_eliminable chain-need_elim) update_eliminable_offsets (); /* Any previously reloaded spilled pseudo reg, stored in this * insn, is no longer validly lying around to save a future reload. Note that this does not detect pseudos that were reloaded for this insn in order to be stored in (obeying register constraints). That is correct; such reload registers ARE still valid. */ forget_marked_reloads (regs_to_forget); CLEAR_REG_SET (regs_to_forget); /* There may have been CLOBBER insns placed after INSN. So scan between INSN and NEXT and use them to forget old reloads. */ for (x = NEXT_INSN (insn); x != old_next; x = NEXT_INSN (x)) if (NONJUMP_INSN_P (x) GET_CODE (PATTERN (x)) == CLOBBER) note_stores (PATTERN (x), forget_old_reloads_1, NULL); #ifdef AUTO_INC_DEC /* Likewise for regs altered by auto-increment in this insn. This case relies on the presence of a CLOBBER though, which we don't have. As per Bernd's request I'll provide some more reload/RTL dumps shortly which might clarify this. I'm thinking that maybe we should include the reload insns themselves in the scan. I'm not suitably qualified to answer this one, but it sounds more invasive than my current patch. Mark
Re: A reload inheritance bug
Bernd Schmidt wrote: Mark Shinwell wrote: The bug is currently only exhibiting itself on a proprietary testcase when compiled for an ARM target and is extremely sensitive to the particular code involved. It arises as follows, using the same notation as in Richard's mail: If you can't post the testcase, the best thing you can do to help is to print out all the involved RTL insns before and after, as well as output from debug_reload for all of them (possibly once after returning to reload_as_needed from find_reloads, and again after choose_reload_regs has run). The relevant RTL instructions before reload are as follows. These correspond to points A, B and C respectively in my previous email. (insn:HI 5301 3071 3079 2 (set (reg/f:SI 4082) (reg/f:SI 3275)) 572 {*arm_movsi_vfp} (nil) (expr_list:REG_EQUIV (symbol_ref:SI (*.LANCHOR0) [flags 0x182]) (nil))) --- (insn:HI 5291 5282 5295 2 (set (reg:DF 4078 [ Jd+1040 ]) (mem/s:DF (plus:SI (reg/f:SI 3275) (reg:SI 3812)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil) (expr_list:REG_DEAD (reg/f:SI 3275) (expr_list:REG_DEAD (reg:SI 3812) (nil --- (insn:HI 5314 5308 5318 2 (set (reg:DF 4096 [ Jd+1040 ]) (mem/s:DF (plus:SI (reg/f:SI 4082) (reg:SI 4084)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil) (expr_list:REG_EQUIV (mem/s:DF (plus:SI (reg/f:SI 4082) (reg:SI 4084)) [30 Jd+1040 S8 A64]) (nil))) After reload we end up with the following. I've added dividers to show the correspondence with the insns above. (insn:HI 5301 3071 3079 2 (set (mem/f/c:SI (plus:SI (reg/f:SI 13 sp) (const_int 12 [0xc])) [36 S4 A32]) (reg/f:SI 9 r9 [3275])) 572 {*arm_movsi_vfp} (nil) (expr_list:REG_EQUIV (symbol_ref:SI (*.LANCHOR0) [flags 0x182]) (nil))) --- (insn 6675 5282 5291 2 (set (reg:SI 9 r9) (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812]))) 4 {*arm_addsi3} (nil) (nil)) (insn:HI 5291 6675 5295 2 (set (reg:DF 75 s12 [orig:4078 Jd+1040 ] [4078]) (mem/s:DF (reg:SI 9 r9) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil) (nil)) --- (insn 6680 5308 6681 2 (set (reg:SI 1 r1) (const_int 4928 [0x1340])) 572 {*arm_movsi_vfp} (nil) (nil)) (insn 6681 6680 5314 2 (set (reg:SI 1 r1) (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 1 r1))) 4 {*arm_addsi3} (nil) (nil)) (insn:HI 5314 6681 5318 2 (set (reg:DF 75 s12 [orig:4096 Jd+1040 ] [4096]) (mem/s:DF (reg:SI 1 r1) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil) (expr_list:REG_EQUIV (mem/s:DF (reg:SI 1 r1) [30 Jd+1040 S8 A64]) (nil))) We see here how pseudo 3275 was allocated to r9 and pseudo 4082 was spilled to the stack. At insn 5291, r9 has been allocated [*] as the reload register since pseudo 3275 dies in that instruction; at insn 5314 we see the then-incorrect use of r9 (in instruction 6681) for the value of pseudo 4082. Note also how the dump shows that the compiler thinks r9 still holds the value of pseudo 3275 at insn 6681. The reloads and the values of the insn variable in reload_as_needed are as follows. These have been dumped at the points you suggest (the output after choose_reload_regs being the same as after find_reloads). For instruction 5301: (insn:HI 5301 3071 3079 2 (set (reg/f:SI 4082) (reg/f:SI 9 r9 [3275])) 572 {*arm_movsi_vfp} (nil) (expr_list:REG_EQUIV (symbol_ref:SI (*.LANCHOR0) [flags 0x182]) (nil))) Reload 0: reload_out (SI) = (reg/f:SI 4082) NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional reload_out_reg: (reg/f:SI 4082) For instruction 5291: (insn:HI 5291 5282 5295 2 (set (reg:DF 75 s12 [orig:4078 Jd+1040 ] [4078]) (mem/s:DF (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812])) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil) (expr_list:REG_DEAD (reg/f:SI 9 r9 [3275]) (expr_list:REG_DEAD (reg:SI 10 sl [3812]) (nil Reload 0: reload_in (SI) = (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812])) GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 8 reload_in_reg: (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812])) reload_reg_rtx: (reg:SI 9 r9) For instruction 5314: (insn:HI 5314 5308 5318 2 (set (reg:DF 75 s12 [orig:4096 Jd+1040 ] [4096]) (mem/s:DF (plus:SI (reg/f:SI 4082) (reg:SI 4084)) [30 Jd+1040 S8 A64])) 578 {*movdf_vfp} (nil) (expr_list:REG_EQUIV (mem/s:DF (plus:SI (reg/f:SI 4082) (reg:SI 4084)) [30 Jd+1040 S8 A64]) (nil))) Reload 0: reload_in (SI) = (reg/f:SI 4082) GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1) reload_in_reg: (reg/f:SI 4082) Reload 1: reload_in (SI) = (const_int 4928 [0x1340]) GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 1), can't combine
Re: A reload inheritance bug
Rask Ingemann Lambertsen wrote: On Mon, May 14, 2007 at 10:47:13PM +0100, Mark Shinwell wrote: I'm fairly certain that this is the correct approach to fix this issue, but I'm less certain that I have correctly guarded the call to forget_old_reloads_1, [snip] Index: reload1.c === --- reload1.c (revision 170932) +++ reload1.c (working copy) @@ -7506,6 +7506,9 @@ emit_reload_insns (struct insn_chain *ch } } + if (i 0 rld[r].in != NULL_RTX rld[r].reg_rtx != NULL_RTX) + forget_old_reloads_1 (rld[r].reg_rtx, NULL_RTX, NULL); + /* The following if-statement was #if 0'd in 1.34 (or before...). It's reenabled in 1.35 because supposedly nothing else deals with this problem. */ It seems to me that the only special thing happening in your testcase is that reload.in(_reg) is a PLUS rather than a REG or MEM. Does you patch not prevent reload inheritance in many cases where it would be OK? See below. Hmm, the immediately preceeding if() block begins: /* I is nonneg if this reload used a register. If rld[r].reg_rtx is 0, this is an optional reload that we opted to ignore. */ if (i = 0 rld[r].reg_rtx != 0) { I'd like a similiar comment just before your code. It is the i 0 in your case which prevents the preceeding if() block from updating reg_reloaded_valid and so on. I'm not as clear as I ought to be about the i 0 condition; what I was trying to establish here was is this reload register a hard register that was pre-existing in the subexpression being reloaded? I believe this corresponds to is this register not a spill register (hence the use of i 0, following the existing code below) -- but there doesn't seem anywhere to be a concrete definition of exactly what counts as a spill register in this context, so I may be mistaken. Part of the reason for starting this thread was that I was concerned about invalidating reloads that could be re-used later. However, it seems to me that in every circumstance where the reload register is a hard register and the value assigned to that reload register is different from the value held by the register during the evaluation of the subexpression being reloaded (as we have here with r9 - r9 + r10), we must invalidate all previous reloads involving that hard register. I may well have encoded that logic incorrectly in the patch, though. Mark -- /* If a register gets output-reloaded from a non-spill register, that invalidates any previous reloaded copy of it. But forget_old_reloads_1 won't get to see it, because it thinks only about the original insn. So invalidate it here. Also do the same thing for RELOAD_OTHER constraints where the output is discarded. */ if (i 0 ((rld[r].out != 0 (REG_P (rld[r].out) || (MEM_P (rld[r].out) REG_P (rld[r].out_reg || (rld[r].out == 0 rld[r].out_reg REG_P (rld[r].out_reg { rtx out = ((rld[r].out REG_P (rld[r].out)) ? rld[r].out : rld[r].out_reg); int nregno = REGNO (out); /* REG_RTX is now set or clobbered by the main instruction. As the comment above explains, forget_old_reloads_1 only sees the original instruction, and there is no guarantee that the original instruction also clobbered REG_RTX. For example, if find_reloads sees that the input side of a matched operand pair dies in this instruction, it may use the input register as the reload register. Calling forget_old_reloads_1 is a waste of effort if REG_RTX is also the output register. If we know that REG_RTX holds the value of a pseudo register, the code after the call will record that fact. */ if (rld[r].reg_rtx rld[r].reg_rtx != out) forget_old_reloads_1 (rld[r].reg_rtx, NULL_RTX, NULL);
Re: A reload inheritance bug
Bernd Schmidt wrote: Mark Shinwell wrote: These dumps are of course taken before the application of my patch. Hope that helps, Thanks. I may have missed it in the previous mails, but which piece of code exactly decides to use R9 for reload 0 of insn 5314? I'm afraid I'm not sure exactly -- but given that r9 is still deemed to contain a valid reload, isn't that logic correct given the pseudo register numbers in question? Mark
A reload inheritance bug
I have had the misfortune to discover a reload inheritance bug which, after a long period of analysis, has turned out to be very similar to the situation described by Richard Sandiford in http://gcc.gnu.org/ml/gcc-patches/2007-01/msg01977.html. This being my first serious foray into reload, I would appreciate some advice as to whether it looks like my current patch is going in the right direction. (Ian, I've copied this to you since you approved Richard's patch linked above; any help would be appreciated.) The bug is currently only exhibiting itself on a proprietary testcase when compiled for an ARM target and is extremely sensitive to the particular code involved. It arises as follows, using the same notation as in Richard's mail: A: reg1 - reg2 ... B: last use of reg2 ... C: use of reg1 where: - reg1 is not allocated a hard register, and is spilled to the stack; - reg2 is allocated a hard register H; - reg2 has a death note in B; - the last use of reg2 (in B) is inside a matched input operand; - reg1 and reg2 are not modified between A and C; - there are no labels between A and C. The reload used for the instruction at B looks like this: Reload 0: reload_in (SI) = (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812])) GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1), inc by 8 reload_in_reg: (plus:SI (reg/f:SI 9 r9 [3275]) (reg:SI 10 sl [3812])) reload_reg_rtx: (reg:SI 9 r9) where, in the notation from above, r9 is H and pseudo 3275 is reg2. Since 3275 dies in the corresponding instruction, reload has chosen r9 as the reload register. Unfortunately when we later arrive at instruction C during inheritance, the current code believes that r9 is still holding the value that it had at A -- which it is not, because the use of r9 as the reload register at B has clobbered it. In order to fix this I think that reload1.c:emit_reload_insns should, at the point of discovery of an input reload whose reload register is a non-spill hard register H, invalidate all previous reloads involving that hard register. That is what the patch below does, or aims to do. (The patch is actually against a 4.2-based branch but the code looks to be the same on the mainline.) It at least fixes this particular test case, and appears to be in line with Richard's patch. I'm fairly certain that this is the correct approach to fix this issue, but I'm less certain that I have correctly guarded the call to forget_old_reloads_1, and indeed that I've done everything to eradicate the previous reloads involving H. For example, should I be wiping the necessary range in reg_last_reload_reg? On the subject of the guard, I am unsure because the existing code involves a much more complicated check: if (i 0 ((rld[r].out != 0 (REG_P (rld[r].out) || (MEM_P (rld[r].out) REG_P (rld[r].out_reg || (rld[r].out == 0 rld[r].out_reg REG_P (rld[r].out_reg Should I perhaps be mirroring this check for the input case too? I'm quite keen to make the fix for this bug cover all eventualities in terms of the various varieties of reload that might arise, if possible... Thanks in advance for any help. Once this is fixed, I shall add the necessary comments, test it appropriately, and submit it for approval on the mainline. Mark -- Index: reload1.c === --- reload1.c (revision 170932) +++ reload1.c (working copy) @@ -7506,6 +7506,9 @@ emit_reload_insns (struct insn_chain *ch } } + if (i 0 rld[r].in != NULL_RTX rld[r].reg_rtx != NULL_RTX) + forget_old_reloads_1 (rld[r].reg_rtx, NULL_RTX, NULL); + /* The following if-statement was #if 0'd in 1.34 (or before...). It's reenabled in 1.35 because supposedly nothing else deals with this problem. */
libjvm.la and problems with libtool relink mode
I am currently involved in building GCC toolchains by configuring them with the prefix that is to be used on the target system (somewhere in /opt) and then installing them in a separate working installation directory (say somewhere in my scratch space). The installation step into this working installation directory is performed by invoking a command of the form make prefix=/path/to/working/dir/install install. Unfortunately it seems that, when using current mainline and building for native x86_64, there is an awkward problem involving the building of libjvm.la. This libtool file is built from two others: one .lo file and one .la file. The difficulty arises at the libtool install step for libjvm.la. It seems that since the link command that constructed libjvm.la built that library using another .la file (libgcj.la) then libtool writes a relink command into the libjvm.la file. Then, when installing libjvm.la using libtool, it attempts to execute that relink command. This however fails because the installation location is not a subdirectory of the configured prefix -- it is somewhere completely different. This piece of libtool code causes the error: # Don't allow the user to place us outside of our expected # location b/c this prevents finding dependent libraries that # are installed to the same prefix. # At present, this check doesn't affect windows .dll's that # are installed into $libdir/../bin (currently, that works fine) # but it's something to keep an eye on. if test $inst_prefix_dir = $destdir; then $echo $modename: error: cannot install \`$file' to a directory not ending in $libdir 12 exit 1 fi This problem does not arise with various other .la files created during the build process of gcc (such as libstdc++.la for example) because those are directly created from a bunch of .lo files; none of them were built from another .la as libjvm.la is. The writing of the relink command into libjvm.la comes as a result of need_relink=yes on line 2095 of the ltmain.sh in the toplevel directory of mainline gcc. I wonder if this assignment ought to be guarded by a test involving $hardcode_into_libs, which is currently set to yes in my generated builddir/x86_64-unknown-linux-gnu/libjava/libtool. Perhaps I should be having $hardcode_into_libs set to no in addition to changing the libtool code so that need_relink is only set to yes if $hardcode_into_libs is set to yes also? The exact behaviour of the two modes of $hardcode_into_libs isn't clear to me, however, so I'm not very sure. With that solution I'm also assuming that libtool is doing the right thing in forcing a relink in this situation (if $hardcode_into_libs is yes); perhaps even that isn't the case. If anyone could offer any advice as to how to proceed here, I'd be most grateful. I've copied this to Alexandre since a colleague suggested he might know the answer :-) Mark
Re: subreg transformation causes incorrect post_inc
[EMAIL PROTECTED] wrote: From: Mark Shinwell [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: My port, based on (GCC) 4.2.0 20061002 (experimental), is producing incorrect code for the following test case: [snip] I've only had a very quick look at your code, but I have a feeling thatthis is an instance of the kind of slip-up with GO_IF_MODE_DEPENDENT_ADDRESSthat my patch at http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00858.html is aimed at preventing. (This patch is currently only applied to addrmodesbranch.) Mark Hhmm. Is the intent of your patch simply to prevent the mistake of backends not defining GO_IF_MODE_DEPENDENT_ADDRESS properly? That's right. Presumably something else is wrong, then :-) Mark
Re: subreg transformation causes incorrect post_inc
[EMAIL PROTECTED] wrote: My port, based on (GCC) 4.2.0 20061002 (experimental), is producing incorrect code for the following test case: [snip] I've only had a very quick look at your code, but I have a feeling that this is an instance of the kind of slip-up with GO_IF_MODE_DEPENDENT_ADDRESS that my patch at http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00858.html is aimed at preventing. (This patch is currently only applied to addrmodes branch.) Mark
Re: Even stricter implicit conversions between vectors
Ian Ollmann wrote: stronger type checking seems like a good idea to me in general. I agree, but I don't really want to break lots of code all at once, even if that code is being slightly more slack than it perhaps ought to be :-) Given that no-one has really objected to stronger type-checking here _per se_, then I see two ways forward: 1. Treat this as a regression: fix it and cause errors upon bad conversions, but risk breaking code. 2. Emit a warning in cases of converting vector signed int to vector unsigned int, etc., and state that the behaviour will change to an error in a later version. Thoughts? Mark
Re: Even stricter implicit conversions between vectors
Ian Lance Taylor wrote: I would vote for: break the code, but provide an option to restore the old behaviour, and mention the option in the error message. I like this -- I shall prepare a patch and circulate it for review. Mark
Re: Even stricter implicit conversions between vectors
Ian Lance Taylor wrote: Mark Shinwell [EMAIL PROTECTED] writes: I would now like to propose that the check in that function be made even stronger such that it disallows conversions between vectors whose element types differ -- even if an implicit conversion exists between those element types. As far as I can see, what that amounts to is saying that there should not be an implicit conversion between vector of int and vector of unsigned int. It seems to me that a cast would then be required to call, e.g., vec_add with arguments of type __vector unsigned int. I don't see how that is useful. But perhaps I am mistaken; can you give an example of the type of thing which is currently permitted which you think should be prohibited? Things exactly like what you write above (an implicit conversion between vector of int and vector of unsigned int). My main argument is that the gcc behaviour seems to be at odds with the semantics envisaged by the designers of these various vector-based instruction sets we see around -- and indeed the designers of the STL -- although perhaps that is a misconception and there are actually many more examples where it is believed that these implicit conversions are reasonable. I don't see a reason _per se_ why having the implicit conversions is a good thing -- for things such as vec_add, presumably more alternatives can be added (there are various ones already I see for differing orders of vector bool short / vector signed short) etc. I think it is open to debate whether it is a good thing to have to make those alternatives explicit, or whether some should be filled in by implicit conversions. However, this... Mike Stump wrote: My only concern is that we have tons of customers with tons of code and you don't have any That isn't quite true :-) and that you break their code. ...is more of a concern, I agree, and is what I worry about most. Mark
Implicit conversions between vectors
Currently we permit implicit conversions between vectors whose total bitsizes are equal but which are divided into differing numbers of subparts. It seems that in some circumstances this rule is overly lax. For example the following code, using vector types (whose definitions I have provided from the intrinsics header file) defined for the ARM NEON instruction set, is accepted without error or warning: ... typedef __builtin_neon_qi int8x8_t __attribute__ ((__vector_size__ (8))); typedef __builtin_neon_hi int16x4_t __attribute__ ((__vector_size__ (8))); ... int8x8_t f (int16x4_t a) { return a; } Here, the compiler is not complaining about an attempt to implicitly convert a vector of four 16-bit quantities to a vector of eight 8-bit quantities. This lack of type safety is unsettling, and I wonder if it should be fixed with a patch along the lines of the (not yet fully tested) one below. Does that sound reasonable? It seems right to try to fix the generic code here, even though the testcase in hand is target-specific. If this approach is unreasonable, I guess some target-specific hooks will be needed. Mark -- Index: gcc/c-common.c === --- gcc/c-common.c (revision 117639) +++ gcc/c-common.c (working copy) @@ -1014,7 +1014,8 @@ vector_types_convertible_p (tree t1, tre (TREE_CODE (TREE_TYPE (t1)) != REAL_TYPE || TYPE_PRECISION (t1) == TYPE_PRECISION (t2)) INTEGRAL_TYPE_P (TREE_TYPE (t1)) - == INTEGRAL_TYPE_P (TREE_TYPE (t2))); + == INTEGRAL_TYPE_P (TREE_TYPE (t2)) + TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2)); } /* Convert EXPR to TYPE, warning about conversion problems with constants.
Re: Implicit conversions between vectors
Ian Lance Taylor wrote: I believe that the problem with changing this unconditionally is that the Altivec programming guidelines specify the rules which gcc currently follows: you are permitted to assign one vector variable to another, without an explicit cast, when the vectors are the same size. So please check the Altivec programming manual. Will do, thanks. Mark
Re: Expansion of __builtin_frame_address
Richard Earnshaw wrote: I'm not keen on this. On some machines a frame pointer is *very* expensive, both in terms of the code required to set it up, and the resulting loss of a register which affects code quality (in addition, on Thumb, the frame pointer can access much less data on the stack than the stack pointer can, so code quality is affected even more). Do you have anything in mind that would be a better default? Is there something that could be used instead of hard_frame_pointer_rtx that will later expand to the correct frame address, but not necessarily force use of a frame pointer, for example? (As far as I can tell, frame_pointer_rtx will not do at least in the ARM case, because it doesn't yield the same value.) If the hard frame pointer is forced by default, then targets which are particularly badly affected can simply define INITIAL_FRAME_ADDRESS_RTX. Since such targets would presumably not have to force reload to keep the frame pointer, then definitions of such macros would not need to be side-effecting (in the way described earlier in this thread) and thus be satisfactory. I can see no argument for a frame pointer being *required* for getting the return address. We didn't have to do this in the past, so I think it is wrong to require that we do it now. Currently, the code does require a frame pointer in all except the count == 0 case, and as far as that particular case goes I get the impression that it would have been treated in the same way had this glibc backtrace function been noticed last year. This may be a mistaken impression though. Mark
Re: Expansion of __builtin_frame_address
David Edelsohn wrote: Mark Shinwell writes: Mark If the hard frame pointer is forced by default, then targets which are Mark particularly badly affected can simply define INITIAL_FRAME_ADDRESS_RTX. Mark Since such targets would presumably not have to force reload to keep Mark the frame pointer, then definitions of such macros would not need to Mark be side-effecting (in the way described earlier in this thread) and thus Mark be satisfactory. PowerPC also does not need hard_frame_pointer_rtx for all cases. It seems like a bad idea to force every port to define INITIAL_FRAME_ADDRESS_RTX to avoid a penalty. Why can't whatever port needs this change define INITIAL_FRAME_ADDRESS_RTX to hard_frame_pointer_rtx? One could add count to the macro so that the port can optimize further and avoid hard_frame_pointer_rtx, if possible. OK, here is what I think is a better suggestion. First note that expand_builtin_return_addr is used for both __builtin_frame_address and __builtin_return_address. The behaviour for the return address case seems to be for target-specific code to override the result of this function in the case when count == 0; thus, it does indeed not matter what we return from expand_builtin_return_addr in that case. (I hadn't realised this before.) The new patch, below, thus has the same behaviour for __builtin_return_address. However when dealing with __builtin_frame_address, we must return the correct value from this function no matter what the value of count. This patch therefore forces use of a hard FP in such situations. Is that more satisfactory? Mark Index: builtins.c === --- builtins.c (revision 114325) +++ builtins.c (working copy) @@ -496,12 +496,16 @@ expand_builtin_return_addr (enum built_i #else rtx tem; - /* For a zero count, we don't care what frame address we return, so frame - pointer elimination is OK, and using the soft frame pointer is OK. - For a non-zero count, we require a stable offset from the current frame - pointer to the previous one, so we must use the hard frame pointer, and + /* For a zero count with __builtin_return_address, we don't care what + frame address we return, because target-specific definitions will + override us. Therefore frame pointer elimination is OK, and using + the soft frame pointer is OK. + + For a non-zero count, or a zero count with __builtin_frame_address, + we require a stable offset from the current frame pointer to the + previous one, so we must use the hard frame pointer, and we must disable frame pointer elimination. */ - if (count == 0) + if (count == 0 fndecl_code == BUILT_IN_RETURN_ADDRESS) tem = frame_pointer_rtx; else {
Re: Expansion of __builtin_frame_address
Mark Mitchell wrote: Mark Shinwell wrote: As for the remaining problem, I suggest that we could: (i) always return the hard frame pointer, and disable FP elimination in the current function; or (iii) ...the same as option (i), but allow targets to define another macro that will cause the default code to use the soft frame pointer rather than the hard frame pointer, and hence allow FP elimination. (If such a macro were set by a particular target, am I right in thinking that it would be safe to use the soft frame pointer even in the count = 1 cases?) I tend to think that option (iii) might be best, although perhaps it is overkill and option (i) would do. But I'm not entirely sure; still being a gcc novice I have to admit to not being quite thoroughly clear on this myself at this stage. So any advice or comments would be appreciated! I agree that option (iii) is best, as it provides the ability to optimize on platforms where that is feasible, and still provides a working default elsewhere. I will review and approve a suitable patch to implement (iii), assuming that there are no objections from Jim or others. This having been discussed some more, and my understanding improved, I now believe that option (i) is in fact the correct thing to do. The attach patch implements this, which basically amounts to the same logic that is currently in the compiler save for the removal of the special case when count == 0. OK for mainline? Mark -- 2006-06-01 Mark Shinwell [EMAIL PROTECTED] * gcc/builtins.c (expand_builtin_return_addr): Always use hard_frame_pointer_rtx and prevent frame pointer elimination if INITIAL_FRAME_ADDRESS_RTX isn't set. Index: gcc/builtins.c === --- gcc/builtins.c (revision 114277) +++ gcc/builtins.c (working copy) @@ -496,20 +496,14 @@ expand_builtin_return_addr (enum built_i #else rtx tem; - /* For a zero count, we don't care what frame address we return, so frame - pointer elimination is OK, and using the soft frame pointer is OK. - For a non-zero count, we require a stable offset from the current frame - pointer to the previous one, so we must use the hard frame pointer, and - we must disable frame pointer elimination. */ - if (count == 0) -tem = frame_pointer_rtx; - else -{ - tem = hard_frame_pointer_rtx; + /* We require a stable offset from the current frame pointer to the + previous one, so we must use the hard frame pointer, and we must + disable frame pointer elimination. */ - /* Tell reload not to eliminate the frame pointer. */ - current_function_accesses_prior_frames = 1; -} + tem = hard_frame_pointer_rtx; + + /* Tell reload not to eliminate the frame pointer. */ + current_function_accesses_prior_frames = 1; #endif /* Some machines need special handling before we can access
Expansion of __builtin_frame_address
Hi, I'd like to gather some opinions and advice on the expansion of __builtin_frame_address, as discussed on gcc-patches last year [1, 2]. This centres on the following comment in expand_builtin_return_addr arising from revision 103294 last year: /* For a zero count, we don't care what frame address we return, so frame pointer elimination is OK, and using the soft frame pointer is OK. For a non-zero count, we require a stable offset from the current frame pointer to the previous one, so we must use the hard frame pointer, and we must disable frame pointer elimination. */ I believe that, when this function is used to expand __builtin_frame_address (), the first sentence isn't true: in some cases, a function does care about the value at count == 0. A concrete example is the glibc backtrace () function [3] which uses the expression __builtin_frame_address (0) to determine the starting point for stack traversal. (It performs subsequent dereferences back down the chain of frame pointers by itself.) The wording of the comment in itself is unfortunately not the end of the issue. Due to the subsequent conditional testing count == 0, the builtin can expand to an erroneous value when the following conditions are met: - the expansion function is invoked with count set to zero; and - the target is such that frame_pointer_rtx and hard_frame_pointer_rtx do not ultimately yield the same address. An example of such an invocation occurs during compilation of the aforementioned backtrace function, and an example of such a target is the ARM. Currently, calling backtrace () on such a target yields a failure. Let us just consider how to fix the ARM case for a moment. The obvious thing to do is to define INITIAL_FRAME_ADDRESS_RTX. However, the correct semantics of such a macro definition would be to: - set current_function_accesses_prior_frames to 1, so that FP is not eliminated in this function; and - return hard_frame_pointer_rtx. I hope I'm not alone in thinking that such a side-effecting macro would be in bad taste. Let us come back to the more general case. As identified above, when expand_builtin_return_addr is used to expand __builtin_frame_address (), we do care about the count == 0 case. I believe that the comment should be adjusted to reflect this whatever other changes, if any, are made. As for the remaining problem, I suggest that we could: (i) always return the hard frame pointer, and disable FP elimination in the current function; or (ii) remove this logic entirely (but preserve the means to communicate any necessity to prevent FP elimination to reload) and insist that all targets define INITIAL_FRAME_ADDRESS_RTX. Any such solution should probably adjust INITIAL_FRAME_ADDRESS_RTX so that it doesn't have to cause a side-effect in order to communicate the information about the frame pointer. Or... (iii) ...the same as option (i), but allow targets to define another macro that will cause the default code to use the soft frame pointer rather than the hard frame pointer, and hence allow FP elimination. (If such a macro were set by a particular target, am I right in thinking that it would be safe to use the soft frame pointer even in the count = 1 cases?) Option (ii) is in some ways the most satisfactory, as it would provide more certainty (particularly if another target were to be added) that the code is doing the right thing. However it involves a moderate amount of work and I would assume that the current level of enthusiasm in this regard is approximately the same as last year :-) Option (i), which is in all but name the solution 5 approach [1] proposed last year, means that the count == 0 case is elevated to the same level of importance as the count 0 cases, in line with the use in backtrace (). The problem with this is that on platforms where the soft and hard FPs coincide there is going to be a slight performance degradation, as identified previously, whenever these builtins are used. Someone with more experience will have to enlighten me as to whether this penalty would actually be significant: my intuition tells me that in fact it would not be, though perhaps there are oft-used occurrences of these builtin expansions that I don't know about. (On how many platforms is it the case that the soft and hard FPs actually coincide? If there are platforms where they do _not_ coincide, then presumably those targets can be affected in the same way as I identify above for ARM? I'm confused because the current code seems to assume (in the count == 0 case) that the soft and hard FPs coincide, yet uses the hard rather than the soft FP for count = 1.) Option (iii) gives the advantage of a working default and removes the pessimization on a target-by-target basis, just when it is known to be safe. If I'm correct in thinking that the setting of the soft frame pointer macro would enable the soft frame pointer to be used no matter what the value of count, then this option actually