On Tue, Mar 07, 2006 at 05:27:37PM +0100, Andi Kleen wrote:
> On Wednesday 08 March 2006 00:26, Benjamin LaHaise wrote:
> > Hi Andi,
> > 
> > On x86-64 one inefficiency that shows up on profiles is the handling of 
> > struct page conversion to/from idx and addresses.  This is mostly due to 
> > the fact that struct page is currently 56 bytes on x86-64, so gcc has to 
> > emit a slow division or multiplication to convert. 
> 
> Huh? 

You used an unsigned long, but ptrdiff_t is signed.  gcc cannot use any 
shifting tricks because they round incorrectly in the signed case.
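
A minimal sketch of the rounding difference, not taken from the kernel. `struct fake_page` and all function names are hypothetical; the struct is just a 56-byte stand-in. It assumes gcc's documented behaviour that right-shifting a negative signed value sign-extends, which ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 56-byte stand-in for struct page; not the kernel's layout. */
struct fake_page {
    unsigned char pad[56];
};

/* Signed division truncates toward zero (C99 and later). */
long div8_signed(long x)
{
    return x / 8;
}

/* Arithmetic right shift rounds toward negative infinity.
 * Shifting a negative value is implementation-defined in ISO C;
 * gcc documents it as sign-extending. */
long shr3_signed(long x)
{
    return x >> 3;
}

/* Pointer subtraction yields a signed ptrdiff_t, so the implied
 * division by sizeof(struct fake_page) is a signed one. */
ptrdiff_t page_index(const struct fake_page *base,
                     const struct fake_page *page)
{
    return page - base;
}

/* The same conversion on unsigned byte offsets.
 * Only meaningful when page does not precede base. */
unsigned long page_index_unsigned(const struct fake_page *base,
                                  const struct fake_page *page)
{
    assert(page >= base);
    return (unsigned long)(((uintptr_t)page - (uintptr_t)base)
                           / sizeof(struct fake_page));
}
```

For a negative input the two operations disagree: `div8_signed(-9)` gives -1 while `shr3_signed(-9)` gives -2, so a bare shift cannot stand in for the signed divide without a fixup. For non-negative inputs they agree.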

> AFAIK mul has a latency of < 10 cycles even on P4 so I can't imagine
> it's a real problem. Something must be wrong with your measurements.

mul isn't particularly interesting in the profiles, it's the idiv.

> My guess would be that on more macro loads it would be a loss due 
> to more cache misses.

But you get less false sharing of struct page on SMP as well.  With a 56 byte 
page a single struct page can overlap two cachelines, and on this workload 
the page definitely gets transferred from one CPU to the other.
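
A rough sketch of the overlap arithmetic, assuming 64-byte cache lines and an array whose base is cache-line aligned; neither assumption is stated in this message, and `count_straddlers` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many of the first n elements of a densely packed array
 * span more than one cache line, assuming the array base is aligned
 * to line_size. */
size_t count_straddlers(size_t elem_size, size_t line_size, size_t n)
{
    size_t count = 0;

    assert(elem_size > 0 && line_size > 0);

    for (size_t i = 0; i < n; i++) {
        /* Cache line holding the element's first byte. */
        size_t first = (i * elem_size) / line_size;
        /* Cache line holding the element's last byte. */
        size_t last = (i * elem_size + elem_size - 1) / line_size;

        if (first != last)
            count++;
    }
    return count;
}
```

Under those assumptions the layout repeats every 448 bytes (8 elements of 56 bytes, 7 lines of 64 bytes), and `count_straddlers(56, 64, 8)` counts 6 elements that cross a line boundary. With a 64-byte element, `count_straddlers(64, 64, 8)` counts none.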

                -ben
