On Tue, Mar 07, 2006 at 05:27:37PM +0100, Andi Kleen wrote:
> On Wednesday 08 March 2006 00:26, Benjamin LaHaise wrote:
> > Hi Andi,
> >
> > On x86-64 one inefficiency that shows up on profiles is the handling of
> > struct page conversion to/from idx and addresses.  This is mostly due to
> > the fact that struct page is currently 56 bytes on x86-64, so gcc has to
> > emit a slow division or multiplication to convert.
>
> Huh?

You used an unsigned long, but ptrdiff_t is signed.  gcc cannot use any
shifting tricks because they round incorrectly in the signed case.

> AFAIK mul has a latency of < 10 cycles even on P4 so I can't imagine
> it's a real problem. Something must be wrong with your measurements.

mul isn't particularly interesting in the profiles; it's the idiv.

> My guess would be that on more macro loads it would be a loss due
> to more cache misses.

But you get less false sharing of struct page on SMP as well.  With a 56 byte
struct page, a single struct page can overlap two cachelines, and on this
workload the page definitely gets transferred from one CPU to the other.

		-ben
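
For anyone following along, here is a rough sketch of the two points above:
signed versus unsigned index conversion for a 56-byte structure, and how often
such a structure straddles a cache line.  Everything named here (struct
fake_page, demo_pages, the helper functions) is hypothetical and only for
illustration, and the 64-byte line size used in the note below is an
assumption, not something stated in this thread.  It says nothing about what
code gcc actually emits; it only shows the rounding difference and the layout
arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 56-byte stand-in; NOT the kernel's struct page layout. */
struct fake_page {
    unsigned char pad[56];
};

/* Small demo array so the helpers below have valid pointers to work on. */
static struct fake_page demo_pages[8];

/* Pointer subtraction yields a signed ptrdiff_t, so the implied division
 * by sizeof(struct fake_page) is a signed one. */
static ptrdiff_t page_index_signed(const struct fake_page *base,
                                   const struct fake_page *page)
{
    return page - base;
}

/* Unsigned variant: only valid when page does not precede base.
 * Assumes a flat address space for the uintptr_t arithmetic. */
static size_t page_index_unsigned(const struct fake_page *base,
                                  const struct fake_page *page)
{
    assert(page >= base);
    return ((uintptr_t)page - (uintptr_t)base) / sizeof(struct fake_page);
}

/* C99 signed division truncates toward zero. */
static long trunc_div8(long x)
{
    return x / 8;
}

/* Right-shifting a negative value is implementation-defined in ISO C;
 * GCC documents it as sign-extending, which rounds toward minus infinity. */
static long shift_div8(long x)
{
    return x >> 3;
}

/* Count how many of `count` back-to-back objects of `size` bytes, starting
 * at a line-aligned offset, touch more than one `line`-byte cache line. */
static unsigned count_straddlers(size_t size, size_t line, unsigned count)
{
    unsigned n = 0;

    for (unsigned i = 0; i < count; i++) {
        size_t first = (size_t)i * size;   /* first byte of object i */
        size_t last = first + size - 1;    /* last byte of object i */

        if (first / line != last / line)
            n++;
    }
    return n;
}
```

Under these assumptions, trunc_div8(-1) is 0 while shift_div8(-1) is -1 when
built with gcc, which is the rounding mismatch in the signed case.
count_straddlers(56, 64, 8) returns 6, against 0 for
count_straddlers(64, 64, 8), so padding to 64 bytes removes the overlap in
this simplified model.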