I don't understand why, on the receive side, MR_1 and MR_2 are loaded into
registers (EBX and EBP). In many cases they are unused, and even when they
are used, the user could easily fetch them from the UTCB (especially as the
UTCB pointer is already available in EDI).
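As a rough illustration of such a user-side fetch, here is a minimal sketch
in C. It assumes that MR_i sits at word index i relative to the UTCB pointer
the receiver finds in EDI; fetch_mr, fetch_mr_demo and the fake UTCB array
are hypothetical names for this example and are not taken from the Pistachio
sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: MR_i is stored at word index i relative to the UTCB pointer
 * (the value the receiver finds in EDI). fetch_mr is a hypothetical helper,
 * not part of the L4 API. */
static inline uint32_t fetch_mr(const volatile uint32_t *utcb, unsigned i)
{
    return utcb[i];
}

/* Stand-in for a real UTCB: a plain array playing the role of the MR area. */
static void fetch_mr_demo(void)
{
    uint32_t fake_utcb[64] = {0};
    fake_utcb[1] = 0x11111111u;   /* MR_1 as the kernel would have copied it */
    fake_utcb[2] = 0x22222222u;   /* MR_2 likewise */

    /* The receiver only pays for these loads if it actually needs the words. */
    assert(fetch_mr(fake_utcb, 1) == 0x11111111u);
    assert(fetch_mr(fake_utcb, 2) == 0x22222222u);
}
```

Each fetch is a single load through a pointer the receiver already holds, so
a receiver that ignores MR_1 and MR_2 would pay nothing for them.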
You may argue that MR_0 is loaded from the UTCB anyway, so MR_1 and MR_2
are already in the cache. But why does MR_0 have to be loaded at all if it
is already in ESI on entry?
Here is a suggestion for keeping ESI untouched in the configuration without
small spaces (l4ka-pistachio-38b2e96fc5a6\kernel\src\glue\v4-x86\x32\trap.S):
1. Use "test $0x3f, %esi" instead of "and $0x3f, %esi; test %esi, %esi" (one
byte longer, but does not modify ESI)
2. Use EDX instead of ESI for "dest" starting from label 3
3. Move "popl %ecx" to label 7 (it should be unaffected by the page table
switch, and this frees ECX for use)
4. Use "movl %cr3, %ecx; cmpl %eax, %ecx" instead of "movl %cr3, %edx; cmpl
%eax, %edx" (so that EDX is no longer clobbered)
5. Remove all "load MRs" lines (saves 9 bytes and three memory accesses)
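The effect of step 1 can be modelled in C as a rough sketch. The function
names below are hypothetical and only mimic the flag and register behaviour
of the two instruction sequences; they are not kernel code. For reference,
the encodings should be 83 E6 3F plus 85 F6 (5 bytes) for the old sequence
and F7 C6 3F 00 00 00 (6 bytes) for the new one, which matches the one-byte
difference mentioned in step 1.

```c
#include <assert.h>
#include <stdint.h>

/* Models "and $0x3f, %esi; test %esi, %esi": ESI is overwritten with the
 * masked value, and ZF reflects whether that value is zero. */
static int zf_and_then_test(uint32_t *esi)
{
    *esi &= 0x3fu;
    return *esi == 0;
}

/* Models "test $0x3f, %esi": ZF is computed from the same AND, but the
 * result is discarded, so ESI keeps its original value. */
static int zf_test_imm(const uint32_t *esi)
{
    return (*esi & 0x3fu) == 0;
}

static void step1_demo(void)
{
    uint32_t a = 0x12345640u, b = a;   /* low six bits clear */
    assert(zf_and_then_test(&a) == zf_test_imm(&b));  /* same ZF */
    assert(a == 0);                    /* ESI destroyed by the old sequence */
    assert(b == 0x12345640u);          /* ESI preserved by the new one */
}
```

Both variants take the same branch afterwards; only the second leaves the
original register contents available for later use.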
If you can also omit the copy of MR_0 to the destination UTCB, I think this
would avoid touching one more cache line when no untyped words are
transmitted. According to the L4 X.2 reference manual, MR_0 is not mapped
to memory anyway.
What do you think? Does this make sense?
It would require a change to the (not stable) specification, but I don't see
any reason for MR_1 and MR_2 to be received in registers on x32.
Btw, why does L4_Ipc store MR_1 and MR_2 back into the UTCB when they have
just been loaded from there?
Best regards
Moritz