[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #10 from matz at gcc dot gnu dot org 2009-03-26 13:41 --- I must be missing something, as I don't see the connection to SSA expansion (or non-SSA expansion). AFAIU the issue is (and that often was indeed a problem in the past with mplayer) that some transformation "needlessly" uses a temporary instead of the expression directly in an operand (here happening to be in an asm). I'm not sure why expanding from SSA form directly would somehow change that (after all fwprop could still do that). If the gimplifier already allocated a temporary then the expansion (if from SSA or non-SSA) has to expand that as pseudo. There are special occasions were the expander could care to fixup such asms, of course. E.g. doing TER for the expressions in question, but that can be done with or without SSA form. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543
[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #9 from bonzini at gnu dot org 2009-03-26 08:47 --- That's good to know. As I said, I'm okay with your patch. I just wondered if for 4.5 the cse_not_expected trick fixes it. In that case I'd have no problem applying the patch to 4.4, but I would revert it after SSA expansion goes in. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543
[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #8 from jakub at gcc dot gnu dot org 2009-03-25 18:31 --- fwprop won't propagate a constant into a "r" constrained asm operand, because verify_changes (0) fails in that case (it calls check_asm_operands, which verifies the constraints of all the operands). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543
[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #7 from bonzini at gnu dot org 2009-03-25 16:49 --- > one is the rtx_cost (SET_SRC (set), SET, speed) > old_cost check > (I wonder whether it is needed for asms at all and if yes, how to actually do > it) For addresses it is already done in should_replace_address. But for everything else, there could be problems if one uses x = 0x123456789abcdef; and fwprop propagates x into the asm. If they used a "r" constraint (correctly), reload probably will fix everything, but it may cause pessimizations or I don't know what. Sorry for the handwaving---the reason I am a bit weary about propagating into asms is that we are quite careful about it on the tree level. pr39543-3.c crashes on f1 with optimization, but crashes on f2 without optimization. Which means that for f2, cse_not_expected does not work because gimplification effectively has already done the same as force_reg. In my opinion that's the root cause: we worry about making asms complex, but we're cavalier in making them simpler (hoping that something later restores the complexity). For 4.5, maybe SSA expansion fixes this? (pr39543-3.c in Jakub's patch). If so, modifying fwprop is probably not the best thing to do. -- bonzini at gnu dot org changed: What|Removed |Added CC||matz at gcc dot gnu dot org Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2009-03-25 16:49:20 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543
[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #6 from jakub at gcc dot gnu dot org 2009-03-25 15:49 --- Created an attachment (id=17540) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17540&action=view) gcc44-pr39543.patch Untested patch that cures all the 3 testcases by forward propagation into asm operands, as long as the propagation doesn't increase the number of needed registers. As I need to update a bunch of locations at once, I can't use try_fwprop_subst. There are 2 things that try_fwprop_subst does that I'm not doing, one is the rtx_cost (SET_SRC (set), SET, speed) > old_cost check (I wonder whether it is needed for asms at all and if yes, how to actually do it) and the update_df stuff (I'm never adding notes for asms, but for successful change I don't know how to inform df about the changes, if I need to do that. Paolo, any ideas? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543
[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #5 from jakub at gcc dot gnu dot org 2009-03-25 13:10 --- That can be solved easily, just compare whether new_rtx doesn't need more registers than old_rtx and only propagate into asm_noperands () >= 0 insns those that require the same number or fewer registers. Consider e.g.: int s[128]; void f1 (void) { int i; asm volatile ("# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17" : "=r" (i) : "m" (s[0]), "m" (s[2]), "m" (s[4]), "m" (s[6]), "m" (s[8]), "m" (s[10]), "m" (s[12]), "m" (s[14]), "m" (s[16]), "m" (s[18]), "m" (s[20]), "m" (s[22]), "m" (s[24]), "m" (s[26]), "m" (s[28]), "m" (s[30]), "m" (s[32])); asm volatile ("# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17" : "=r" (i) : "m" (s[0]), "m" (s[2]), "m" (s[4]), "m" (s[6]), "m" (s[8]), "m" (s[10]), "m" (s[12]), "m" (s[14]), "m" (s[16]), "m" (s[18]), "m" (s[20]), "m" (s[22]), "m" (s[24]), "m" (s[26]), "m" (s[28]), "m" (s[30]), "m" (s[32])); } void f2 (int *q) { int i; int *p = q + 32; asm volatile ("# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17" : "=r" (i) : "m" (p[0]), "m" (p[2]), "m" (p[4]), "m" (p[6]), "m" (p[8]), "m" (p[10]), "m" (p[12]), "m" (p[14]), "m" (p[16]), "m" (p[18]), "m" (p[20]), "m" (p[22]), "m" (p[24]), "m" (p[26]), "m" (p[28]), "m" (p[30]), "m" (p[32])); asm volatile ("# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16 %17" : "=r" (i) : "m" (p[0]), "m" (p[2]), "m" (p[4]), "m" (p[6]), "m" (p[8]), "m" (p[10]), "m" (p[12]), "m" (p[14]), "m" (p[16]), "m" (p[18]), "m" (p[20]), "m" (p[22]), "m" (p[24]), "m" (p[26]), "m" (p[28]), "m" (p[30]), "m" (p[32])); } at -O2, here cse_not_expected hack in stmt.c doesn't help, but I'd say fwprop could fix it up. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543
[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #4 from bonzini at gnu dot org 2009-03-24 15:03 --- The only thing to be careful is to have set_reg_equal == FALSE if the insn has multiple sets, and find which set in a multiple-set insn is actually referring to USE (a combination of note_stores and loc_mentioned_in_p will do). Such a patch is definitely backportable to 4.4 even if it is not ready for 4.4.0. However, I'm not sure it is good to have the best possible addressing mode in an asm, because I wonder if this could cause other kinds of reload failure. Think of: char p[10]; ... a = b * 4 + c; asm ("" : : "m" (p[a])) Which fwprop would likely change to a base/index/displacement address, with 2 registers gone. Setting cse_not_expected in practice would not use an addressing mode that is more complicated than the one specified by the user; for an asm, I think this is actually the least surprising behavior. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543
[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #3 from jakub at gcc dot gnu dot org 2009-03-24 14:18 --- Ok, so the reason why fwprop doesn't propagate this is: forward_propagate_and_simplify doing: rtx use_set = single_set (use_insn); ... if (!use_set) return false; ASM_OPERANDS with multiple output regs obviously isn't single_set, so nothing is propagated. Paolo, any reason to restrict this, or at least could forward_propagate_and_simplify specially propagate also to memory addresses in asms? The reason why combiner doesn't do anything is that when the pseudos initialized to constants are used multiple times (not dead on the asm insn), try_combine sets added_sets_2 and as the insn is a PARALLEL, appends the added set to that, making it invalid (asm_noperands (body) returns -1). So, to fix this bug, either we can teach fwprop to do this kind of propagation (my preference), or e.g. temporarily set cse_not_expected in expand_asm_operands. I can easily test the latter as a fallback variant if the former is deemed too unsafe for 4.4. -- jakub at gcc dot gnu dot org changed: What|Removed |Added CC||bonzini at gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543
[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #2 from jakub at gcc dot gnu dot org 2009-03-24 12:54 --- The reason for forcing the MEM addresses in "m"/"=m" into register is: /* By passing constant addresses through registers we get a chance to cse them. */ if (! cse_not_expected && CONSTANT_P (x) && CONSTANT_ADDRESS_P (x)) x = force_reg (Pmode, x); in memory_address. On a simple s0[0] = 0; s0[4] = 0; s0[8] = 0; s0[12] = 0; the addresses are propagated back into the MEMs during fwprop1 (or during combine for -fno-forward-propagate), but this for some reason doesn't happen for the asm operands. Either we need to disable this kind of forcing into register for (constant) memory operands, or better find out and fix why neither fwprop nor combiner merges the memory operands back. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543
[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN
--- Comment #1 from jakub at gcc dot gnu dot org 2009-03-24 11:27 --- Ah, forgot the testcase: float __attribute__ ((aligned (16))) s0[128]; const float s1 = 0.707; float s2[8] __attribute__ ((aligned (16))); float s3[8] __attribute__ ((aligned (16))); float s4[16] __attribute__ ((aligned (16))); float s5[16] __attribute__ ((aligned (16))); void foo (int k, float *x, float *y, const float *d, const float *z) { float *a, *b, *c, *e; a = x + 2 * k; b = a + 2 * k; c = b + 2 * k; e = y + 2 * k; __asm__ volatile ("" : "=m" (x[0]), "=m" (b[0]), "=m" (a[0]), "=m" (c[0]) : "m" (y[0]), "m" (y[k * 2]), "m" (x[0]), "m" (a[0]) : "memory"); for (;;) { __asm__ volatile ("" : : "m" (y[2]), "m" (d[2]), "m" (e[2]), "m" (z[2]) : "memory"); if (!--k) break; } __asm__ volatile ("" : "=m" (x[2]), "=m" (x[10]), "=m" (x[6]), "=m" (x[14]) : "m" (y[2]), "m" (y[6]), "m" (x[2]), "m" (x[6]), "m" (s1) : "memory"); } void bar (float *a) { foo (4, a, a + 16, s2, s3); foo (8, a, a + 32, s4, s5); } void baz (void) { bar (s0); } -- jakub at gcc dot gnu dot org changed: What|Removed |Added Target Milestone|--- |4.4.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543