On Thu, Aug 23, 2012 at 9:06 PM, 陳韋任 (Wei-Ren Chen) <che...@iis.sinica.edu.tw> wrote:
>> That might be difficult. what i did was that i disabled inlined
>> translated and push the virt/phys address into 2 new fields in the cpu
>> structure in the call out lookup. because in the callout lookup we
>> have a handle to the cpu env.
>
> What you mean by "disabled inlined translated"? You mean apply Max's
> patch so that all guest memory access go through the slow path without
> looking software tlb? Since you said you're running arm on x86 host,
> I guess what you did might be,
>
> int cpu_arm_handle_mmu_fault (CPUARMState *env, target_ulong address,
>                               int access_type, int mmu_idx)
> {
>     ...
>
>     ret = get_phys_addr(env, address, access_type, is_user, &phys_addr, &prot,
>                         &page_size);
>
>     // store phys_addr into env->cpu_last_paddr
>
>     ...
> }
>
>> not too sure how much impact inlined lookup has on the performance.
>> since i disabled it, next step i would just get rid of that piece of
>> generated assembly, as it is no good for icache ( generated for every
>> memory operation).
>
> You can run a benchmark inside your guest. I guess if you run a
> long-running benchmark, you can see performance degradation. If software
> tlb hit, you can get the value of guest memory in the code cache
> with a few host instructions. Disabling software tlb lookup, every guest
> memory access will call a helper function which takes a lot of time.
> What you mean by "get rid of that piece of generated assembly"?

Every inlined TLB lookup has ~10 instructions.

Xin

>
> Regards,
> chenwj
>
> --
> Wei-Ren Chen (陳韋任)
> Computer Systems Lab, Institute of Information Science,
> Academia Sinica, Taiwan (R.O.C.)
> Tel:886-2-2788-3799 #1667
> Homepage: http://people.cs.nctu.edu.tw/~chenwj
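
For anyone following the thread, the trade-off discussed here (an inlined software TLB lookup versus always calling out to a helper) can be sketched in plain C. This is only a toy model under stated assumptions, not QEMU code: `ToyCPUState`, `toy_translate`, `toy_slow_path`, `toy_page_walk`, the direct-mapped 256-entry table and the fixed-offset "page walk" are all made up for illustration. The slow path is the only place with a natural hook for recording the last virtual/physical address pair in the CPU state, which is why recording it on every access means giving up the fast path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS   12u
#define PAGE_MASK   (~((UINT32_C(1) << PAGE_BITS) - 1))
#define TLB_SIZE    256u        /* direct-mapped; must be a power of two */
#define TLB_INVALID UINT32_MAX  /* not page-aligned, so it never matches */

typedef struct {
    uint32_t vpage;             /* guest virtual page, or TLB_INVALID */
    uint32_t ppage;             /* guest physical page */
} ToyTLBEntry;

typedef struct {
    ToyTLBEntry tlb[TLB_SIZE];
    uint32_t last_vaddr;        /* hypothetical fields, written only */
    uint32_t last_paddr;        /* by the slow path                  */
    unsigned long hits, misses;
} ToyCPUState;

static void toy_cpu_init(ToyCPUState *env)
{
    memset(env, 0, sizeof(*env));
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        env->tlb[i].vpage = TLB_INVALID;
    }
}

/* Stand-in for a real page table walk: a fixed, page-aligned offset. */
static uint32_t toy_page_walk(uint32_t vaddr)
{
    return vaddr + UINT32_C(0x40000000);
}

/* Slow path: walk, refill the TLB entry, record the address pair. */
static uint32_t toy_slow_path(ToyCPUState *env, uint32_t vaddr)
{
    uint32_t paddr = toy_page_walk(vaddr);
    ToyTLBEntry *e = &env->tlb[(vaddr >> PAGE_BITS) & (TLB_SIZE - 1)];

    /* The walk must preserve the offset within the page. */
    assert((paddr & ~PAGE_MASK) == (vaddr & ~PAGE_MASK));

    e->vpage = vaddr & PAGE_MASK;
    e->ppage = paddr & PAGE_MASK;
    env->last_vaddr = vaddr;
    env->last_paddr = paddr;
    env->misses++;
    return paddr;
}

/* With use_fast_path == 0, every access goes through the helper. */
static uint32_t toy_translate(ToyCPUState *env, uint32_t vaddr,
                              int use_fast_path)
{
    if (use_fast_path) {
        const ToyTLBEntry *e =
            &env->tlb[(vaddr >> PAGE_BITS) & (TLB_SIZE - 1)];
        if (e->vpage == (vaddr & PAGE_MASK)) {
            env->hits++;        /* hit: an index, a compare, an OR */
            return e->ppage | (vaddr & ~PAGE_MASK);
        }
    }
    return toy_slow_path(env, vaddr);
}
```

Calling `toy_translate(&env, addr, 1)` twice for addresses on the same page takes the slow path once and then hits, leaving `last_vaddr`/`last_paddr` stale on the second access. Passing `0` as the last argument forces every access through `toy_slow_path`, so the recorded pair is always current at the cost of a function call per access.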