On Thursday 03 January 2008 15:38:02 Clemens Kolbitsch wrote:
> Does anyone have an idea on how I can measure performance in qemu to a
> somewhat accurate level?
hwclock --show > time1
tar xvjf linux-2.6.23.tar.bz2 && cd linux-2.6.23 && make allnoconfig && make
cd ..
hwclock --show > time2

Do that on host and guest, and you've got a ratio of the performance of qemu
to your host that should be good to within a few percent.

> I have modified qemu (the memory handling) and the linux kernel and want
> to find out the penalty this introduced... does anyone have any comments /
> ideas on this?

If it's something big, you can compare the results in minutes and seconds.
That's probably the best you're going to do. (Although really you want
hwclock --show before and after, and then do the math. That tunnels out to
the host system to get its idea of the time, which doesn't get thrown off by
timer interrupt delivery (as a signal) getting deferred by the host system's
scheduler.)

Of course the fact that hwclock _takes_ a second or so to read the clock is
a bit of a downer, but anything that takes less than a minute or so to run
isn't going to give you a very accurate time anyway, because the performance
of qemu isn't constant and your results are going to skew all over the
place. Especially for small workloads, the performance varies from run to
run.

Start by imagining qemu as having the mother of all page fault latencies.
The cost of faulting code into the L2 cache includes dynamic recompilation,
which is expensive. Worse, when the dynamic recompilation buffer fills up,
qemu blanks the whole thing and recompiles every new page it hits, one at a
time, until the buffer fills up again. (What is it these days, 16 megs of
translated code before it resets?) No LRU or anything, no cache management
at _all_, just "when the bucket fills up, dump it and start over."

(Well, that's what it did back around the last stable release, anyway. It
has been almost a year since then, so maybe it's changed. I've been busy
with other things and not really keeping track of changes that didn't
affect what I could and couldn't get to run.)
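The "do the math" step can be sketched in shell. This is a hypothetical
helper, not anything shipped with qemu: it assumes GNU date(1) for timestamp
parsing, and the numbers in the demo lines are made up. In practice you'd
capture the two readings with "hwclock --show" (as root) before and after
the workload, on both host and guest, and then take the ratio of the two
elapsed times.

```shell
#!/bin/sh
# Hypothetical sketch of "hwclock --show before and after, then do the
# math".  elapsed() turns two clock readings into a difference in seconds;
# ratio() divides a guest time by a host time to get the slowdown factor.
# Assumes GNU date(1); raw hwclock output may need trimming before date
# can parse it.

elapsed() {
    # $1, $2: timestamp strings that date(1) can parse
    s1=$(date -u -d "$1" +%s)
    s2=$(date -u -d "$2" +%s)
    echo $((s2 - s1))
}

ratio() {
    # $1: host seconds, $2: guest seconds
    awk -v h="$1" -v g="$2" 'BEGIN { printf "%.2fx\n", g / h }'
}

# Made-up numbers, for illustration only:
elapsed "2008-01-03 15:38:02" "2008-01-03 16:02:47"   # prints 1485
ratio 300 1485                                        # prints 4.95x
```

Since both readings tunnel out to the same host clock, deferred guest timer
interrupts don't distort the subtraction.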
So anyway, depending on what code you run in what order, the performance can
_differ_ from one run to the next, by a lot, depending on when the cache
gets blanked and stuff gets retranslated. There's no obvious way to predict
this or control it. And the "software" clock inside your emulated system can
lie to you about it if timer interrupts get deferred.

All this should pretty much average out if you do something big with lots of
execs (like building a linux kernel from source). But if you do something
small, expect serious butterfly effects. Expect microbenchmarks to swing
around wildly.

Quick analogy: you know the performance difference between faulting your
executable in from disk and running it out of cache? Imagine a daemon that
makes random intermittent calls to "echo 1 > /proc/sys/vm/drop_caches", and
now try to do a sane benchmark. No matter what you use to measure, what
you're measuring isn't going to be consistent from one run to the next.

Performance should be better (and more stable) with kqemu or kvm. Maybe
those you can benchmark sanely, I wouldn't know. Ask somebody else. :)

P.S. Take the above with a large grain of salt, I'm not close to an expert
in this area...

Rob
-- 
"One of my most productive days was throwing away 1000 lines of code."
  - Ken Thompson.