On Sat, Aug 25, 2012 at 9:20 PM, Steven <wangwangk...@gmail.com> wrote:
> On Tue, Aug 21, 2012 at 3:18 AM, Max Filippov <jcmvb...@gmail.com> wrote:
>> On Tue, Aug 21, 2012 at 9:40 AM, Steven <wangwangk...@gmail.com> wrote:
>>> Hi, Max,
>>> I wrote a small program to verify that your patch can catch all the
>>> load instructions from the guest. However, I found some problems in
>>> the results.
>>>
>>> The guest OS and the emulated machine are both 32-bit x86. My simple
>>> program in the guest declares a 1048576-element integer array,
>>> initializes the elements, and then loads them in a loop. It looks
>>> like this:
>>>
>>>     int array[1048576];
>>>     /* initialize the array */
>>>
>>>     /* region of interest */
>>>     int i, temp;
>>>     for (i = 0; i < 1048576; i++) {
>>>         temp = array[i];
>>>     }
>>>
>>> So ideally, the patch should catch the guest virtual addresses
>>> accessed in the loop, right? For reference, the virtual addresses of
>>> the beginning and end of the array are 0xbf68b6e0 and 0xbfa8b6e0.
>>> What I got is as follows:
>>>
>>> __ldl_mmu, vaddr=bf68b6e0
>>> __ldl_mmu, vaddr=bf68b6e4
>>> __ldl_mmu, vaddr=bf68b6e8
>>> .....
>>>
>>> These should be the virtual addresses from the above loop. The
>>> results look good because the gap between successive vaddrs is
>>> 4 bytes, which is the size of each element. However, after a certain
>>> address, I got:
>>>
>>> __ldl_mmu, vaddr=bf68bffc
>>> __ldl_mmu, vaddr=bf68c000
>>> __ldl_mmu, vaddr=bf68d000
>>> __ldl_mmu, vaddr=bf68e000
>>> __ldl_mmu, vaddr=bf68f000
>>> __ldl_mmu, vaddr=bf690000
>>> __ldl_mmu, vaddr=bf691000
>>> __ldl_mmu, vaddr=bf692000
>>> __ldl_mmu, vaddr=bf693000
>>> __ldl_mmu, vaddr=bf694000
>>> ...
>>> __ldl_mmu, vaddr=bf727000
>>> __ldl_mmu, vaddr=bf728000
>>> __ldl_mmu, vaddr=bfa89000
>>> __ldl_mmu, vaddr=bfa8a000
>>>
>>> So the rest of the vaddrs differ by 4096 bytes instead of 4.
>>> I repeated the experiment several times and got the same results.
>>> Is there anything wrong? Or could you explain this? Thanks.
>>
>> I see two possibilities here:
>> - maybe there are more fast-path shortcuts in the QEMU code?
>>   In that case the output of qemu -d op,out_asm would help.
>> - maybe your compiler optimized that sample code?
>>   Could you try declaring the array in your sample as 'volatile int'?
>
> After adding the "volatile" qualifier, the results are correct now.
> So your patch can trap all guest memory data loads, no matter whether
> they take the slow path or the fast path.
>
> However, I ran into a problem when trying to understand instruction
> accesses. I ran the VM with "-d in_asm" to see the program counter of
> each piece of guest code. I got:
>
> __ldl_cmmu,ffffffff8102ff91
> __ldl_cmmu,ffffffff8102ff9a
> ----------------
> IN:
> 0xffffffff8102ff8a: mov 0x8(%rbx),%rax
> 0xffffffff8102ff8e: add 0x790(%rbx),%rax
> 0xffffffff8102ff95: xor %edx,%edx
> 0xffffffff8102ff97: mov 0x858(%rbx),%rcx
> 0xffffffff8102ff9e: cmp %rcx,%rax
> 0xffffffff8102ffa1: je 0xffffffff8102ffb0
> .....
>
> __ldl_cmmu,00000000004005a1
> __ldl_cmmu,00000000004005a6
> ----------------
> IN:
> 0x0000000000400594: push %rbp
> 0x0000000000400595: mov %rsp,%rbp
> 0x0000000000400598: sub $0x20,%rsp
> 0x000000000040059c: mov %rdi,-0x18(%rbp)
> 0x00000000004005a0: mov $0x1,%edi
> 0x00000000004005a5: callq 0x4004a0
>
> From the results, I see that the guest virtual addresses reported by
> __ldl_cmmu differ slightly from the TB's PC (below "IN:"). Could you
> help me understand this? Which one is the true PC of the memory
> access? Thanks.
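For reference, a self-contained version of the guest test with the
'volatile' fix applied might look like the sketch below. Note this is
only a sketch: the array is placed at file scope here for simplicity,
while the 0xbf... addresses above suggest the original declared it on
the stack inside main().

    #include <stdio.h>

    #define N 1048576

    /* 'volatile' forces the compiler to emit a real load for every
     * element instead of optimizing the loop away. */
    volatile int array[N];

    int main(void)
    {
        int i, temp = 0;

        for (i = 0; i < N; i++)    /* initialize the array */
            array[i] = i;

        /* region of interest: one 4-byte guest load per element */
        for (i = 0; i < N; i++)
            temp = array[i];

        printf("last element: %d\n", temp);  /* keep 'temp' live */
        return 0;
    }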
Guest code is accessed at translation time by C functions, and I guess
there are additional layers of address-translation caching on top of
that. I wouldn't try to interpret these _cmmu printouts; I would
instead instrument the [cpu_]ld{{u,s}{b,w},l,q}_code macros.

-- 
Thanks.
-- Max
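Along the lines Max suggests, one way to trace code fetches would be to
log the fetch address in a small wrapper around the code-load accessor.
This is a minimal sketch, assuming a QEMU tree of that era where
cpu_ldl_code(env, ptr) is the 32-bit code-fetch accessor and
qemu_log()/TARGET_FMT_lx are available; header names and the exact
macro layer to hook vary between versions, and ldl_code_traced is a
hypothetical helper, not an existing QEMU function.

    #include "cpu.h"       /* CPUArchState, target_ulong, cpu_ldl_code() */
    #include "qemu-log.h"  /* qemu_log() */

    /* Wrap the 32-bit code fetch and log its guest virtual address;
     * the other widths ([u]b/[u]w/q _code) would be wrapped the same
     * way to cover all fetch sizes. */
    static inline uint32_t ldl_code_traced(CPUArchState *env,
                                           target_ulong ptr)
    {
        uint32_t val = cpu_ldl_code(env, ptr);  /* the normal fetch */
        qemu_log("ldl_code, vaddr=" TARGET_FMT_lx "\n", ptr);
        return val;
    }

Call sites in the translator would then use ldl_code_traced() in place
of the plain accessor, which should report fetch addresses that line up
with the TB start PCs printed under "IN:".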