I've been writing a kernel module and I've noticed a measurable performance
drop between the same code compiled against linux-2.4.0-test7 and against
test8. I disassembled the code to try to work out what was going on and I saw
the following happen:

 * [test8]
   One of the atomic memory-access primitives supplied by the kernel
   was loading immediate data into a register outside of the inline-asm
   instruction group and then using that register inside it.

 * [test7]
   The immediate data was passed directly to the inline-asm instruction
   (see the sketch after this list).
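
For illustration, here is a minimal sketch of the kind of constraint change
that could produce this, assuming the primitive was something like an x86
bitop; the function names and the exact constraint letters are my guesses,
not the actual test7/test8 code:

    /* test7-style: an "Ir" constraint lets GCC encode a compile-time
     * constant directly as an immediate operand of btsl */
    static inline void set_bit_imm(int nr, volatile void *addr)
    {
            __asm__ __volatile__(
                    "lock; btsl %1,%0"
                    : "=m" (*(volatile long *)addr)
                    : "Ir" (nr));   /* immediate (0-31) or register */
    }

    /* test8-style: a bare "r" constraint forces even a constant into
     * a register before the asm block runs */
    static inline void set_bit_reg(int nr, volatile void *addr)
    {
            __asm__ __volatile__(
                    "lock; btsl %1,%0"
                    : "=m" (*(volatile long *)addr)
                    : "r" (nr));    /* register only */
    }

With the first form, a constant bit number is folded into the instruction
encoding; with the second, GCC has to allocate a scratch register just to
hold it.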

In test8, of course, this means that the compiler has to scratch up a spare
register, which is totally unnecessary, as the immediate data could be
attached directly to the instruction opcode as was done in test7.

This forced the compiler to spill the old contents of the register to a slot
on the stack (I think it held a local variable at the time), which had the
further effects of using more stack memory, introducing more register
shuffling (the code ended up longer), and burning more CPU cycles.
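
To make the knock-on cost concrete, here is a hypothetical caller (again just
a sketch, building on the functions above); compiling both variants with
"gcc -O2 -S" and diffing the output makes the extra register traffic visible:

    /* With set_bit_reg() the constant 5 has to travel through a
     * scratch register (an extra movl); in code with real register
     * pressure GCC must spill a live value to the stack to free that
     * register.  set_bit_imm() encodes the 5 as an immediate and
     * needs no scratch register at all. */
    void mark_busy(volatile unsigned long *flags)
    {
            set_bit_reg(5, flags);  /* movl $5,%reg; lock; btsl %reg,mem */
            set_bit_imm(5, flags);  /* lock; btsl $5,mem */
    }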

I can't remember exactly which primitive it was now, but I think it was
something to do with either the spinlocks or the bitops. I'll re-investigate
tonight and see if I can come back with some benchmarks/code-snippets
tomorrow.

David Howells