On Sat, Aug 25, 2012 at 4:41 PM, Max Filippov <jcmvb...@gmail.com> wrote:
> On Sat, Aug 25, 2012 at 9:20 PM, Steven <wangwangk...@gmail.com> wrote:
>> On Tue, Aug 21, 2012 at 3:18 AM, Max Filippov <jcmvb...@gmail.com> wrote:
>>> On Tue, Aug 21, 2012 at 9:40 AM, Steven <wangwangk...@gmail.com> wrote:
>>>> Hi, Max,
>>>> I wrote a small program to verify that your patch could catch all the
>>>> load instructions from the guest. However, I found some problems in
>>>> the results.
>>>>
>>>> The guest OS and the emulated machine are both 32-bit x86. My simple
>>>> program in the guest declares a 1048576-element integer array,
>>>> initializes the elements, and loads them in a loop. It looks like this:
>>>>
>>>> int array[1048576];
>>>> /* initialize the array */
>>>>
>>>> /* region of interest */
>>>> int temp;
>>>> for (i = 0; i < 1048576; i++) {
>>>>     temp = array[i];
>>>> }
>>>>
>>>> So ideally, the patch should catch the guest virtual addresses accessed
>>>> in the loop, right? In addition, the virtual addresses of the beginning
>>>> and end of the array are 0xbf68b6e0 and 0xbfa8b6e0.
>>>> What I got is as follows:
>>>>
>>>> __ldl_mmu, vaddr=bf68b6e0
>>>> __ldl_mmu, vaddr=bf68b6e4
>>>> __ldl_mmu, vaddr=bf68b6e8
>>>> .....
>>>>
>>>> These should be the virtual addresses of the above loop. The results
>>>> look good because the gap between consecutive vaddrs is 4 bytes, which
>>>> is the length of each element.
>>>> However, after a certain address, I got
>>>>
>>>> __ldl_mmu, vaddr=bf68bffc
>>>> __ldl_mmu, vaddr=bf68c000
>>>> __ldl_mmu, vaddr=bf68d000
>>>> __ldl_mmu, vaddr=bf68e000
>>>> __ldl_mmu, vaddr=bf68f000
>>>> __ldl_mmu, vaddr=bf690000
>>>> __ldl_mmu, vaddr=bf691000
>>>> __ldl_mmu, vaddr=bf692000
>>>> __ldl_mmu, vaddr=bf693000
>>>> __ldl_mmu, vaddr=bf694000
>>>> ...
>>>> __ldl_mmu, vaddr=bf727000
>>>> __ldl_mmu, vaddr=bf728000
>>>> __ldl_mmu, vaddr=bfa89000
>>>> __ldl_mmu, vaddr=bfa8a000
>>>>
>>>> So the rest of the vaddrs are 4096 bytes apart, instead of 4.
>>>> I repeated the experiment several times and got the same results.
>>>> Is there anything wrong? Or could you explain this? Thanks.
>>>
>>> I see two possibilities here:
>>> - maybe there are more fast path shortcuts in the QEMU code?
>>>   in that case output of qemu -d op,out_asm would help.
>>> - maybe your compiler had optimized that sample code?
>>>   could you try to declare array in your sample as 'volatile int'?
>>
>> After adding the "volatile" qualifier, the results are correct now.
>> So your patch can trap all the guest memory data load accesses, whether
>> they take the slow path or the fast path.
>>
>> However, I ran into a problem when trying to understand the instruction
>> accesses. I ran the VM with "-d in_asm" to see the program counter of
>> each piece of guest code. I got
>>
>> __ldl_cmmu,ffffffff8102ff91
>> __ldl_cmmu,ffffffff8102ff9a
>> ----------------
>> IN:
>> 0xffffffff8102ff8a: mov 0x8(%rbx),%rax
>> 0xffffffff8102ff8e: add 0x790(%rbx),%rax
>> 0xffffffff8102ff95: xor %edx,%edx
>> 0xffffffff8102ff97: mov 0x858(%rbx),%rcx
>> 0xffffffff8102ff9e: cmp %rcx,%rax
>> 0xffffffff8102ffa1: je 0xffffffff8102ffb0
>> .....
>>
>> __ldl_cmmu,00000000004005a1
>> __ldl_cmmu,00000000004005a6
>> ----------------
>> IN:
>> 0x0000000000400594: push %rbp
>> 0x0000000000400595: mov %rsp,%rbp
>> 0x0000000000400598: sub $0x20,%rsp
>> 0x000000000040059c: mov %rdi,-0x18(%rbp)
>> 0x00000000004005a0: mov $0x1,%edi
>> 0x00000000004005a5: callq 0x4004a0
>>
>> From the results, I see that the guest virtual address of the pc differs
>> slightly between the __ldl_cmmu printout and the tb's pc (below IN:).
>> Could you help me understand this? Which one is the true pc of the
>> memory access? Thanks.
>
> Guest code is accessed at translation time by C functions and
> I guess there are other layers of address translation caching. I wouldn't
> try to interpret these _cmmu printouts and would instead instrument
> [cpu_]ld{{u,s}{b,w},l,q}_code macros.

Yes, you are right.
Some ldub_code calls in the x86 guest do not go through __ldq_cmmu when the TLB hits.

By the way, when I use your patch, I see too many log events for kernel data in the _mmu helpers, i.e., addresses around 0x7fffffffffff. There are so many of these mmu events that the user-mode data accesses can hardly get through. So I had to add a condition like

if (addr < 0x800000000000)
    fprintf(stderr, "%s: %08x\n", __func__, addr);

Then my simple array access program can finish. I am wondering whether you have met a similar problem, or whether you have any suggestions on this. My final goal is to obtain the memory access trace for a particular process in the guest, so your patch really helps, except for the flood of kernel _mmu events.
steven

> --
> Thanks.
> -- Max