On Wed, Jun 13, 2012 at 5:14 AM, 陳韋任 (Wei-Ren Chen) <che...@iis.sinica.edu.tw> wrote:
> Hi all,
>
> I suspect that guest memory accesses (qemu_ld/qemu_st) account for the
> majority of the time spent in system mode. I would like to know precisely
> how much (if possible).
> We have used tools like perf [1] before, but since the guest memory access
> logic is also embedded in the generated host code, not only in helper
> functions, the results cannot be relied on. The current idea is to add
> helper functions before/after the guest memory access logic. Taking an ARM
> guest on an x86_64 host as an example, should I add the helper functions
> before/after tcg_gen_qemu_{ld,st} in target-arm/translate.c or
> tcg_out_qemu_{ld,st} in tcg/i386/tcg-target.c? Or is there a better way to
> know how much time QEMU spends handling guest memory accesses?

I'm afraid there's no easy way to measure that: the ld/st fast path is only a few instructions long, so any change you make to the generated code will completely change the timing.

Another approach might be to run the same program in user mode and then in system mode and compare the run times (provided the guest OS is very light, so that the difference is mostly due to the system-mode memory access path).

As a side note, it might be interesting to gather statistics about the hit rate of the QEMU TLB.

Another thing to consider is speeding up the fast path; see YeongKyoon Lee's RFC patch:
http://www.mail-archive.com/qemu-devel@nongnu.org/msg91294.html

Laurent
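
To make the TLB hit-rate idea more concrete, here is a minimal sketch of the bookkeeping involved. It is a standalone model, not QEMU code: the names TlbModel, tlb_model_init, tlb_model_access and tlb_model_hit_rate are hypothetical, and the 4 KiB page size and 256-entry direct-mapped table are assumptions chosen for illustration. The real softmmu TLB has more state per entry, so treat this only as an outline of how the counters would work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS 12                 /* assumption: 4 KiB guest pages */
#define TLB_BITS  8                  /* assumption: 256 direct-mapped slots */
#define TLB_SIZE  (1u << TLB_BITS)

/* Hypothetical model of a direct-mapped software TLB with counters. */
typedef struct {
    uint64_t tag[TLB_SIZE];          /* guest page number + 1; 0 means empty */
    uint64_t hits;
    uint64_t misses;
} TlbModel;

static void tlb_model_init(TlbModel *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
}

/* Record one guest access. Returns 1 on a hit, 0 on a miss.
 * A miss refills the slot, evicting whatever page was there. */
static int tlb_model_access(TlbModel *t, uint64_t vaddr)
{
    uint64_t page = vaddr >> PAGE_BITS;
    size_t idx = (size_t)(page & (TLB_SIZE - 1));

    assert(t != NULL);
    if (t->tag[idx] == page + 1) {
        t->hits++;
        return 1;
    }
    t->tag[idx] = page + 1;
    t->misses++;
    return 0;
}

/* Fraction of recorded accesses that hit; 0.0 if nothing was recorded. */
static double tlb_model_hit_rate(const TlbModel *t)
{
    uint64_t total;

    assert(t != NULL);
    total = t->hits + t->misses;
    return total ? (double)t->hits / (double)total : 0.0;
}
```

Feeding such a model with a trace of guest virtual addresses avoids touching the generated fast path. Counting inside QEMU itself would only be cheap on the slow path (misses); counting hits means instrumenting the fast path, which runs into the timing problem described above.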