Re: [Qemu-devel] performance monitor
On Friday 04 January 2008 09:49:22 Rob Landley wrote:
> On Thursday 03 January 2008 15:38:02 Clemens Kolbitsch wrote:
> > Does anyone have an idea on how I can measure performance in qemu to a somewhat accurate level?
>
>   hwclock --show > time1
>   tar xvjf linux-2.6.23.tar.bz2
>   cd linux-2.6.23
>   make allnoconfig
>   make
>   cd ..
>   hwclock --show > time2
>
> Do that on host and client, and you've got a ratio of the performance of qemu to your host that should be good to within a few percent.
>
> > I have modified qemu (the memory handling) and the linux kernel and want to find out the penalty this introduced... does anyone have any comments / ideas on this?
>
> If it's something big, you can compare the result in minutes and seconds. That's probably the best you're going to do. (Although really you want hwclock --show before and after, and then do the math. That tunnels out to the host system to get its idea of the time, which doesn't get thrown off by timer interrupt delivery (as a signal) getting deferred by the host system's scheduler.)
>
> Of course the fact that hwclock _takes_ a second or so to read the clock is a bit of a downer, but anything that takes less than a minute or so to run isn't going to give you a very accurate time because the performance of qemu isn't constant, and your results are going to skew all over the place.
>
> Especially for small things, the performance varies from run to run. Start by imagining qemu as having the mother of all page fault latencies. The cost of faulting code into the L2 cache includes dynamic recompilation, which is expensive. Worse, when the dynamic recompilation buffer fills up it blanks the whole thing, and recompiles every new page it hits one at a time until the buffer fills up again. (What is it these days, 16 megs of translated code before it resets?) No LRU or anything, no cache management at _all_, just when the bucket fills up, dump it and start over. (Well, that's what it did back around the last stable release anyway. It has been almost a year since then, so maybe it's changed. I've been busy with other things and not really keeping track of changes that didn't affect what I could and couldn't get to run.)
>
> So anyway, depending on what code you run in what order, the performance can _differ_ from one run to the next due to when the cache gets blanked and stuff gets retranslated. By a lot. There's no obvious way to predict this or control it. And the software clock inside your emulated system can lie to you about it if timer interrupts get deferred.
>
> All this should pretty much average out if you do something big with lots of execs (like build a linux kernel from source). But if you do something small expect serious butterfly effects. Expect microbenchmarks to swing around wildly.
>
> Quick analogy: you know the performance difference faulting your executable in from disk vs running it out of cache? Imagine a daemon that makes random intermittent calls to echo 1 > /proc/sys/vm/drop_caches, and now try to do a sane benchmark. No matter what you use to measure, what you're measuring isn't going to be consistent from one run to the next.
>
> Performance should be better (and more stable) with kqemu or kvm. Maybe that you can benchmark sanely, I wouldn't know. Ask somebody else. :)
>
> P.S. Take the above with a large grain of salt, I'm not close to an expert in this area... :-)

Ok. What you've said pretty much covers how I've made up my mind in the last couple of hours trying to think about the problem *g*

Guess I'll have to be happy counting TLB misses and page faults, adding up executed instructions (in user/kernel mode) per process and doing some timing stuff... then running the examples a lot of times, making an average of all numbers and finally just ignoring them since I *know* that they are bogus ;-)

No, seriously... I understand the problem, but I think the above is the best I can do since I'm really only interested in the effect it has on QEMU for the moment :-)

Thanks again for your ideas!!
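
The "run it many times and average" approach discussed above can be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the function names (time_command, summarize, slowdown) are made up here, and time_command uses the local monotonic clock, which per Rob's warning is only trustworthy on the host. Durations for guest runs would have to be collected some other way (for example from hwclock readings) and passed in as plain numbers.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run argv `runs` times; return each wall-clock duration in seconds.

    Assumption: uses the local monotonic clock, so this is meant for the
    host. Inside a guest the software clock may be skewed by deferred
    timer interrupts.
    """
    durations = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.monotonic() - start)
    return durations


def summarize(durations):
    """Return mean, sample standard deviation and relative spread."""
    mean = statistics.mean(durations)
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return {
        "mean": mean,
        "stdev": stdev,
        "rel_spread": stdev / mean if mean else 0.0,
    }


def slowdown(guest_durations, host_durations):
    """Ratio of mean guest time to mean host time for the same workload."""
    return statistics.mean(guest_durations) / statistics.mean(host_durations)
```

A large rel_spread across repeated runs is a hint that the workload is too small to average out the translation-cache effects described above, in which case the slowdown ratio should not be trusted much either.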
[Qemu-devel] performance monitor
Hi!

Has anyone ever used some real performance monitoring tools (like papiex, perfex, pfmon, etc.) on qemu? I'm running a Debian Linux and would like to time some applications inside qemu, and have tried the perfmon2 kernel patch (http://perfmon2.sourceforge.net/) for testing. Sadly, it does not work... dmesg tells me that the CPU is not identified correctly (unsupported family=6).

Now I am not really sure what type of hardware support the monitor relies on (I think PMU is the correct term, but I'm not sure about that) and what CPUs are supported (dmesg tells me that qemu simulates a Pentium M, but that's probably because I've compiled the kernel on my *real* Pentium M).

... Ok, to cut a long question short: Is there any hardware support in qemu for doing monitoring (that goes deeper than using time) and has anyone ever tested something that could work?

Thanks!
Clemens
Re: [Qemu-devel] performance monitor
On Thursday 03 January 2008 22:29:06 Paul Brook wrote:
> > ... Ok, to cut a long question short: Is there any hardware support in qemu for doing monitoring (that goes deeper than using time) and has anyone ever tested something that could work?
>
> Probably your application wants the performance counters. Qemu doesn't emulate those.
>
> Besides which, qemu is not cycle accurate. Any performance measurements you make are pretty much meaningless, and bear absolutely no relationship to real hardware.

Thanks for the quick answer Paul! Not really what I wanted to hear, but probably true ;-)

Does anyone have an idea on how I can measure performance in qemu to a somewhat accurate level? I have modified qemu (the memory handling) and the linux kernel and want to find out the penalty this introduced... does anyone have any comments / ideas on this?

Thanks!
Re: [Qemu-devel] performance monitor
> ... Ok, to cut a long question short: Is there any hardware support in qemu for doing monitoring (that goes deeper than using time) and has anyone ever tested something that could work?

Probably your application wants the performance counters. Qemu doesn't emulate those.

Besides which, qemu is not cycle accurate. Any performance measurements you make are pretty much meaningless, and bear absolutely no relationship to real hardware.

Paul
Re: [Qemu-devel] performance monitor
> Does anyone have an idea on how I can measure performance in qemu to a somewhat accurate level? I have modified qemu (the memory handling) and the linux kernel and want to find out the penalty this introduced... does anyone have any comments / ideas on this?

Short answer is you probably can't. And even if you can I won't believe your results unless you've verified them on real hardware :-)

With the exception of some very small embedded cores, modern CPUs have complex out-of-order execution pipelines and multi-level cache hierarchies. It's common for performance to be dominated by these secondary factors rather than raw instruction throughput.

Exactly what features dominate performance is very application specific. Determining which factor dominates is unlikely to be something qemu can help with. However, if e.g. you know that for your application there's a good correlation between performance and L2 cache misses, you could instrument qemu to add an L1/L2 cache model. The overhead will be fairly severe (easily 10x slower), and will completely screw up any realtime measurements. However, it would produce some useful cache use statistics that you could use to guesstimate actual performance. This is similar to how cachegrind works. Obviously if your application isn't cache bound then these figures will be meaningless.

Paul
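
To make the cache-model idea a bit more concrete, here is a minimal sketch of the kind of model that could be fed with the addresses of guest memory accesses. It is not QEMU code and not how cachegrind is implemented; the class name, the LRU replacement policy and the geometry parameters are all assumptions chosen for illustration, and hooking it into qemu's memory access path is left out entirely.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """LRU set-associative cache model; tracks tags only, never data."""

    def __init__(self, size_bytes, line_bytes, ways):
        # Hypothetical geometry; the size must divide evenly into sets.
        assert size_bytes % (line_bytes * ways) == 0
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one access; return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict least recently used tag
        entries[tag] = None
        return False


def simulate(trace, l1, l2):
    """Feed an address trace through L1, sending L1 misses on to L2."""
    for addr in trace:
        if not l1.access(addr):
            l2.access(addr)
    return {"l1_misses": l1.misses, "l2_misses": l2.misses}
```

The resulting miss counts are only as meaningful as the chosen geometry and the trace, which is the same caveat Paul raises for applications that are not cache bound.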
Re: [Qemu-devel] performance monitor
On Thursday 03 January 2008 23:07:07 you wrote:
> > Does anyone have an idea on how I can measure performance in qemu to a somewhat accurate level? I have modified qemu (the memory handling) and the linux kernel and want to find out the penalty this introduced... does anyone have any comments / ideas on this?
>
> Short answer is you probably can't. And even if you can I won't believe your results unless you've verified them on real hardware :-)
>
> With the exception of some very small embedded cores, modern CPUs have complex out-of-order execution pipelines and multi-level cache hierarchies. It's common for performance to be dominated by these secondary factors rather than raw instruction throughput.
>
> Exactly what features dominate performance is very application specific. Determining which factor dominates is unlikely to be something qemu can help with. However, if e.g. you know that for your application there's a good correlation between performance and L2 cache misses, you could instrument qemu to add an L1/L2 cache model. The overhead will be fairly severe (easily 10x slower), and will completely screw up any realtime measurements. However, it would produce some useful cache use statistics that you could use to guesstimate actual performance. This is similar to how cachegrind works. Obviously if your application isn't cache bound then these figures will be meaningless.

Well, the measuring I had in mind partly concentrates on TLB misses, page faults, etc. (in addition to the cycle measuring). Guess I'll have to implement something for myself in qemu :-/

But thanks a lot for helping me out!
Re: [Qemu-devel] performance monitor
On Jan 3, 2008 11:11 PM, Clemens Kolbitsch [EMAIL PROTECTED] wrote:
> Well, the measuring I had in mind partly concentrates on TLB misses, page faults, etc. (in addition to the cycle measuring). Guess I'll have to implement something for myself in qemu :-/

There's something not clear here: do you want to measure your kernel changes or do you want to profile Qemu? As Paul clearly explained, you can't do both :)

If you want to measure kernel performance, oprofile is probably worth looking at. But you will need the real hardware.

Another option, though much more intrusive, would be to add explicit performance counters in the places you need to look at (this method can be applied to Qemu too).

And to say it again: nobody can expect to measure OS performance on a simulator, unless the simulator is directly derived from the HDL code written by designers. At least I would never trust such a result ;)

Laurent
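
As a rough picture of what "explicit performance counters" could look like, the sketch below keeps named event counts split by execution mode. It is purely illustrative: in Qemu or the kernel this would be C code at the instrumented call sites, and the class name, the event names and the user/kernel split are assumptions, not anything that exists in either code base.

```python
from collections import Counter


class EventCounters:
    """Named event counters, split by a caller-supplied mode label."""

    def __init__(self):
        self._counts = Counter()

    def bump(self, event, mode="user", amount=1):
        """Increment the counter for (event, mode); call at the probe site."""
        self._counts[(event, mode)] += amount

    def total(self, event):
        """Sum of one event across all modes."""
        return sum(n for (name, _mode), n in self._counts.items()
                   if name == event)

    def report(self):
        """Return counts as a dict keyed 'event/mode', sorted by key."""
        return {f"{event}/{mode}": n
                for (event, mode), n in sorted(self._counts.items())}
```

Counts gathered this way describe the emulated system only, so they are suited to comparing a modified build against an unmodified one under the same workload, not to predicting behaviour on real hardware.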
Re: [Qemu-devel] performance monitor
On Thursday 03 January 2008 23:18:58 Paul Brook wrote:
> > Well, the measuring I had in mind partly concentrates on TLB misses, page faults, etc. (in addition to the cycle measuring). Guess I'll have to implement something for myself in qemu :-/
>
> Be aware that the TLB qemu uses behaves very differently to a real CPU TLB. If you want to get TLB miss statistics you'll need to model a real TLB for that separately.

Sure, yes. But I don't even care what it would be like on a real CPU. I just want to know the impact it has on the emulated CPU ;-)

> Page faults should be straightforward, but any half-decent guest OS would be able to tell you those anyway.

True *g*
Re: [Qemu-devel] performance monitor
> Well, the measuring I had in mind partly concentrates on TLB misses, page faults, etc. (in addition to the cycle measuring). Guess I'll have to implement something for myself in qemu :-/

Be aware that the TLB qemu uses behaves very differently to a real CPU TLB. If you want to get TLB miss statistics you'll need to model a real TLB for that separately.

Page faults should be straightforward, but any half-decent guest OS would be able to tell you those anyway.

Paul
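
A separately modelled TLB, as Paul suggests, might be sketched like this. It is a toy model under stated assumptions: fully associative, LRU replacement, 4 KiB pages and a made-up default of 64 entries. None of these values come from the thread or from any particular CPU, and the model is unrelated to qemu's own internal TLB.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB pages


class TlbModel:
    """Fully associative LRU TLB model that counts hits and misses."""

    def __init__(self, entries=64):
        self.capacity = entries       # hypothetical number of TLB entries
        self.entries = OrderedDict()  # virtual page number -> present
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Record a lookup for vaddr; return True on a TLB hit."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[vpn] = True
        return False

    def flush(self):
        """Drop all entries, e.g. on a modelled address-space switch."""
        self.entries.clear()
```

Feeding it the guest virtual address of every memory access would give miss counts for the modelled TLB, which can then be compared between the modified and unmodified setups.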