Hi,

On Fri, 18 Jan 2008, Johannes Schindelin wrote:

> > > > asm (" ... movzbl %b1, %%edx\n ... " : : "r" (blubb), "r" (bla) );
> > > 
> > > Okay, but this only concerns gcc4, apparently.
> > 
> > No, it's nothing to do with GCC.
> 
> But apparently it has!  With gcc < 4 I did never get the error.

As I tried to explain, this is pure luck.

> Which probably means that gcc < 4 did _not_ use ecx, and therefore it 
> does not have to be pushed and popped.

We are talking about the hunk using the "q" constraint for operand 1 in 
st[bw]_kernel.  The change in the clobber list (and the associated 
saving/restoring of %ecx around the call) is something entirely different.

> Which -- judging from how commonly glue() is called in op.c -- could 
> mean a performance hit.

glue() is a macro; the function actually called is stw_kernel (an inline 
function).

> I am all for supporting gcc > 3, but please, please not at the cost of 
> having a performance hit for _existing_ users.

Have you measured this?  This function actually calls stw_mmu, a rather 
slow and big function, so the overhead of one register store more or less 
is probably zero.

But that point is moot anyway.  When it works without the "q" constraint 
in gcc 3.4.2, it only does so because GCC happens to allocate one of the 
ax-dx registers to that operand (by luck, not by design).  As T1 comes in 
in esi there, a reg-reg move already existed anyway, so you are already 
paying that performance hit (if you like to call it such).
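
To illustrate the constraint issue, here is a minimal sketch (low_byte is 
a made-up name for this mail, not anything from the QEMU sources).  On i386 
only eax, ebx, ecx and edx have byte sub-registers, so an operand printed 
with the %b modifier needs "q".  With "r" the compiler is free to pick 
e.g. esi, and the assembler then rejects the resulting byte register name.  
The non-x86 branch is only there so that the sketch compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: return the low byte of v, zero-extended. */
static inline uint32_t low_byte(uint32_t v)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t out;
    /* "q" guarantees that operand 1 lives in a register with a byte
     * sub-register, so %b1 always expands to a valid name.  With "r"
     * this would only assemble on i386 if the allocator happened to
     * choose one of a/b/c/d. */
    __asm__ ("movzbl %b1, %0" : "=r" (out) : "q" (v));
    return out;
#else
    /* Plain C fallback for other architectures. */
    return v & 0xff;
#endif
}
```

Calling low_byte(0x1234) yields 0x34 regardless of which of the four 
registers gets allocated.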

> > Only if you want to trust your luck.  I fear I don't have gcc 3.4.2 
> > lying around anywhere, so I can't really help debugging this reload 
> > breakage in that GCC version.  It might help to introduce a temporary to 
> > guide GCC through this problematic reload case by detaching the global 
> > register variable from the asm operand.  For cases where it's no problem 
> > this should be optimized away, so doesn't inhibit a performance cost.  
> > What I mean is something like the below.  If someone with gcc 3.4.2 
> > could test that ...
> 
> I do ask myself how gcc would optimise away instructions that are 
> explicitely written in the asm() statement.  If it does so, I consider 
> this a serious bug in gcc.

My patch in the last mail introduces a copy in C (to vtmp), _that_ can be 
optimized away under the right circumstances.  Of course GCC does not 
change the asm template in any way.
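
A rough sketch of the idea, under some assumptions: T1 is declared here as 
an ordinary global standing in for the global register variable, and 
store_low_byte is a hypothetical name, not the real st[bw]_kernel code.  
The point is only that the copy to vtmp is ordinary C, which the compiler 
may eliminate, while the asm template itself stays untouched.

```c
#include <stdint.h>

/* Stand-in for the global register variable (assumption: the real one
 * is pinned to a register; a plain global keeps the sketch portable). */
static uint32_t T1;

/* Hypothetical helper: store the low byte of T1 to *p. */
static inline void store_low_byte(uint8_t *p)
{
    /* Copy in C.  This detaches the asm operand from T1; if T1's
     * register already satisfies "q", the compiler can drop the copy. */
    uint32_t vtmp = T1;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("movb %b1, %0" : "=m" (*p) : "q" (vtmp));
#else
    /* Plain C fallback for other architectures. */
    *p = (uint8_t) vtmp;
#endif
}
```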


Ciao,
Michael.