[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-26 Thread bonzini at gnu dot org


--- Comment #9 from bonzini at gnu dot org  2009-03-26 08:47 ---
That's good to know.  As I said, I'm okay with your patch.  I just wondered if
for 4.5 the cse_not_expected trick fixes it.  In that case I'd have no problem
applying the patch to 4.4, but I would revert it after SSA expansion goes in.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543



[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-26 Thread matz at gcc dot gnu dot org


--- Comment #10 from matz at gcc dot gnu dot org  2009-03-26 13:41 ---
I must be missing something, as I don't see the connection to SSA expansion
(or non-SSA expansion).  AFAIU the issue is (and that often was indeed a
problem in the past with mplayer) that some transformation needlessly
uses a temporary instead of the expression directly in an operand (here
happening to be in an asm).  I'm not sure why expanding from SSA form
directly would somehow change that (after all fwprop could still do that).

If the gimplifier already allocated a temporary then the expansion (if from
SSA or non-SSA) has to expand that as pseudo.

There are special occasions were the expander could care to fixup such
asms, of course.  E.g. doing TER for the expressions in question,
but that can be done with or without SSA form.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543



[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-25 Thread jakub at gcc dot gnu dot org


--- Comment #5 from jakub at gcc dot gnu dot org  2009-03-25 13:10 ---
That can be solved easily, just compare whether new_rtx doesn't need more
registers than old_rtx and only propagate into asm_noperands () = 0 insns
those that require the same number or fewer registers.

Consider e.g.:

int s[128];

void
f1 (void)
{
  int i;
  asm volatile (# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16
%17
: =r (i)
: m (s[0]), m (s[2]), m (s[4]), m (s[6]), m (s[8]),
  m (s[10]), m (s[12]), m (s[14]), m (s[16]), m (s[18]),
  m (s[20]), m (s[22]), m (s[24]), m (s[26]), m (s[28]),
  m (s[30]), m (s[32]));
  asm volatile (# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16
%17
: =r (i)
: m (s[0]), m (s[2]), m (s[4]), m (s[6]), m (s[8]),
  m (s[10]), m (s[12]), m (s[14]), m (s[16]), m (s[18]),
  m (s[20]), m (s[22]), m (s[24]), m (s[26]), m (s[28]),
  m (s[30]), m (s[32]));
}

void
f2 (int *q)
{
  int i;
  int *p = q + 32;
  asm volatile (# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16
%17
: =r (i)
: m (p[0]), m (p[2]), m (p[4]), m (p[6]), m (p[8]),
  m (p[10]), m (p[12]), m (p[14]), m (p[16]), m (p[18]),
  m (p[20]), m (p[22]), m (p[24]), m (p[26]), m (p[28]),
  m (p[30]), m (p[32]));
  asm volatile (# %0 %1 %2 %3 %4 %5 %6 %7 %8 %9 %10 %11 %12 %13 %14 %15 %16
%17
: =r (i)
: m (p[0]), m (p[2]), m (p[4]), m (p[6]), m (p[8]),
  m (p[10]), m (p[12]), m (p[14]), m (p[16]), m (p[18]),
  m (p[20]), m (p[22]), m (p[24]), m (p[26]), m (p[28]),
  m (p[30]), m (p[32]));
}

at -O2, here cse_not_expected hack in stmt.c doesn't help, but I'd say fwprop
could fix it up.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543



[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-25 Thread jakub at gcc dot gnu dot org


--- Comment #6 from jakub at gcc dot gnu dot org  2009-03-25 15:49 ---
Created an attachment (id=17540)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17540action=view)
gcc44-pr39543.patch

Untested patch that cures all the 3 testcases by forward propagation into asm
operands, as long as the propagation doesn't increase the number of needed
registers.
As I need to update a bunch of locations at once, I can't use
try_fwprop_subst.  There are 2 things that try_fwprop_subst does that I'm not
doing, one is the rtx_cost (SET_SRC (set), SET, speed)  old_cost check
(I wonder whether it is needed for asms at all and if yes, how to actually do
it) and the update_df stuff (I'm never adding notes for asms, but for
successful
change I don't know how to inform df about the changes, if I need to do that.
Paolo, any ideas?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543



[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-25 Thread bonzini at gnu dot org


--- Comment #7 from bonzini at gnu dot org  2009-03-25 16:49 ---
 one is the rtx_cost (SET_SRC (set), SET, speed)  old_cost check
 (I wonder whether it is needed for asms at all and if yes, how to actually do
 it)

For addresses it is already done in should_replace_address.  But for everything
else, there could be problems if one uses

  x = 0x123456789abcdef;

and fwprop propagates x into the asm.  If they used a r constraint
(correctly), reload probably will fix everything, but it may cause
pessimizations or I don't know what.  Sorry for the handwaving---the reason I
am a bit weary about propagating into asms is that we are quite careful about
it on the tree level.

pr39543-3.c crashes on f1 with optimization, but crashes on f2 without
optimization.  Which means that for f2, cse_not_expected does not work because
gimplification effectively has already done the same as force_reg.  In my
opinion that's the root cause: we worry about making asms complex, but we're
cavalier in making them simpler (hoping that something later restores the
complexity).

For 4.5, maybe SSA expansion fixes this?  (pr39543-3.c in Jakub's patch).  If
so, modifying fwprop is probably not the best thing to do.


-- 

bonzini at gnu dot org changed:

   What|Removed |Added

 CC||matz at gcc dot gnu dot org
 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2009-03-25 16:49:20
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543



[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-25 Thread jakub at gcc dot gnu dot org


--- Comment #8 from jakub at gcc dot gnu dot org  2009-03-25 18:31 ---
fwprop won't propagate a constant into a r constrained asm operand, because
verify_changes (0) fails in that case (it calls check_asm_operands, which
verifies the constraints of all the operands).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543



[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-24 Thread jakub at gcc dot gnu dot org


--- Comment #1 from jakub at gcc dot gnu dot org  2009-03-24 11:27 ---
Ah, forgot the testcase:

float __attribute__ ((aligned (16))) s0[128];
const float s1 = 0.707;
float s2[8] __attribute__ ((aligned (16)));
float s3[8] __attribute__ ((aligned (16)));
float s4[16] __attribute__ ((aligned (16)));
float s5[16] __attribute__ ((aligned (16)));

void
foo (int k, float *x, float *y, const float *d, const float *z)
{
  float *a, *b, *c, *e;

  a = x + 2 * k;
  b = a + 2 * k;
  c = b + 2 * k;
  e = y + 2 * k;
  __asm__ volatile (
: =m (x[0]), =m (b[0]), =m (a[0]), =m (c[0])
: m (y[0]), m (y[k * 2]), m (x[0]), m (a[0])
: memory);
  for (;;)
{
  __asm__ volatile (
:
: m (y[2]), m (d[2]), m (e[2]), m (z[2])
: memory);
  if (!--k)
break;
}
  __asm__ volatile (
: =m (x[2]), =m (x[10]), =m (x[6]), =m (x[14])
: m (y[2]), m (y[6]), m (x[2]), m (x[6]), m (s1)
: memory);
}

void
bar (float *a)
{
  foo (4, a, a + 16, s2, s3);
  foo (8, a, a + 32, s4, s5);
}

void
baz (void)
{
  bar (s0);
}


-- 

jakub at gcc dot gnu dot org changed:

   What|Removed |Added

   Target Milestone|--- |4.4.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543



[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-24 Thread jakub at gcc dot gnu dot org


--- Comment #2 from jakub at gcc dot gnu dot org  2009-03-24 12:54 ---
The reason for forcing the MEM addresses in m/=m into register is:
  /* By passing constant addresses through registers
 we get a chance to cse them.  */
  if (! cse_not_expected  CONSTANT_P (x)  CONSTANT_ADDRESS_P (x))
x = force_reg (Pmode, x);
in memory_address.  On a simple s0[0] = 0; s0[4] = 0; s0[8] = 0; s0[12] = 0;
the addresses are propagated back into the MEMs during fwprop1 (or during
combine for -fno-forward-propagate), but this for some reason doesn't happen
for the asm operands.
Either we need to disable this kind of forcing into register for (constant)
memory operands, or better find out and fix why neither fwprop nor combiner
merges the memory operands back.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543



[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-24 Thread jakub at gcc dot gnu dot org


--- Comment #3 from jakub at gcc dot gnu dot org  2009-03-24 14:18 ---
Ok, so the reason why fwprop doesn't propagate this is:

forward_propagate_and_simplify doing:
  rtx use_set = single_set (use_insn);
...
  if (!use_set)
return false;

ASM_OPERANDS with multiple output regs obviously isn't single_set, so nothing
is propagated.  Paolo, any reason to restrict this, or at least could
forward_propagate_and_simplify specially propagate also to memory addresses in
asms?

The reason why combiner doesn't do anything is that when the pseudos
initialized to constants are used multiple times (not dead on the asm insn),
try_combine
sets added_sets_2 and as the insn is a PARALLEL, appends the added set to that,
making it invalid (asm_noperands (body) returns -1).

So, to fix this bug, either we can teach fwprop to do this kind of propagation
(my preference), or e.g. temporarily set cse_not_expected in
expand_asm_operands.  I can easily test the latter as a fallback variant if the
former is deemed too unsafe for 4.4.


-- 

jakub at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||bonzini at gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543



[Bug rtl-optimization/39543] [4.4 Regression] Reload failure on mplayer from SVN

2009-03-24 Thread bonzini at gnu dot org


--- Comment #4 from bonzini at gnu dot org  2009-03-24 15:03 ---
The only thing to be careful is to have set_reg_equal == FALSE if the insn has
multiple sets, and find which set in a multiple-set insn is actually referring
to USE (a combination of note_stores and loc_mentioned_in_p will do).  Such a
patch is definitely backportable to 4.4 even if it is not ready for 4.4.0.

However, I'm not sure it is good to have the best possible addressing mode in
an asm, because I wonder if this could cause other kinds of reload failure. 
Think of:

   char p[10];
   ...
   a = b * 4 + c;
   asm ( : : m (p[a]))

Which fwprop would likely change to a base/index/displacement address, with 2
registers gone.

Setting cse_not_expected in practice would not use an addressing mode that is
more complicated than the one specified by the user; for an asm, I think this
is actually the least surprising behavior.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39543