Re: [Qemu-devel] performance monitor

2008-01-04 Thread Clemens Kolbitsch
On Friday 04 January 2008 09:49:22 Rob Landley wrote:
 On Thursday 03 January 2008 15:38:02 Clemens Kolbitsch wrote:
  Does anyone have an idea on how I can measure performance in qemu to a
  somewhat accurate level?

 hwclock --show > time1
 tar xvjf linux-2.6.23.tar.bz2 && cd linux-2.6.23 && make allnoconfig &&
 make && cd ..
 hwclock --show > time2

 Do that on host and client, and you've got a ratio of the performance of
 qemu to your host that should be good to within a few percent.

  I have modified qemu (the memory handling) and the
  linux kernel and want to find out the penalty this introduced... does
  anyone have any comments / ideas on this?

 If it's something big, you can compare the result in minutes and seconds.
 That's probably the best you're going to do.  Although really you want
 hwclock --show before and after, and then do the math.  That tunnels out to
 the host system to get its idea of the time, which doesn't get thrown off
 by timer interrupt delivery (as a signal) getting deferred by the host
 system's scheduler.  Of course the fact that hwclock _takes_ a second or so
 to read the clock is a bit of a downer, but anything that takes less than a
 minute or so to run isn't going to give you a very accurate time because
 the performance of qemu isn't constant, and your results are going to skew
 all over the place.

 Especially for small things, the performance varies from run to run.  Start
 by imagining qemu as having the mother of all page fault latencies.  The
 cost of faulting code into the L2 cache includes dynamic recompilation,
 which is expensive.

 Worse, when the dynamic recompilation buffer fills up it blanks the whole
 thing, and recompiles every new page it hits one at a time until the buffer
 fills up again.  (What is it these days, 16 megs of translated code before
 it resets?)  No LRU or anything, no cache management at _all_, just when
 the bucket fills up, dump it and start over.  (Well, that's what it did
 back around the last stable release anyway.  It has been almost a year
 since then, so maybe it's changed.  I've been busy with other things and
 not really keeping track of changes that didn't affect what I could and
 couldn't get to run.)

 So anyway, depending on what code you run in what order, the performance
 can _differ_ from one run to the next due to when the cache gets blanked
 and stuff gets retranslated.  By a lot.  There's no obvious way to predict
 this or control it.  And the software clock inside your emulated system
 can lie to you about it if timer interrupts get deferred.

 All this should pretty much average out if you do something big with lots
 of execs (like build a linux kernel from source).  But if you do something
 small expect serious butterfly effects.  Expect microbenchmarks to swing
 around wildly.

 Quick analogy: you know the performance difference faulting your executable
 in from disk vs running it out of cache?  Imagine a daemon that makes random
 intermittent calls to echo 1 > /proc/sys/vm/drop_caches, and now try to
 do a sane benchmark.  No matter what you use to measure, what you're
 measuring isn't going to be consistent from one run to the next.

 Performance should be better (and more stable) with kqemu or kvm.  Maybe
 that you can benchmark sanely, I wouldn't know.  Ask somebody else. :)

 P.S.  Take the above with a large grain of salt, I'm not close to an expert
 in this area...

:-)

Ok. What you've said pretty much covers how I've made up my mind in the last 
couple of hours trying to think about the problem *g*

Guess I'll have to be happy counting TLB misses and page faults, adding up 
executed instructions (in user/kernel mode) per process and doing some timing 
stuff... then running the examples a lot of times, making an average of all 
numbers and finally just ignoring them since I *know* that they are bogus ;-)

No, seriously... I understand the problem, but I think the above is the best I 
can do since I'm really only interested in the effect it has on QEMU for the 
moment :-)

Thanks again for your ideas!!
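
The "run it a lot of times and average" plan above can be sketched roughly as follows. This is only an illustration: the helper names (time_runs, summarize, slowdown) are made up here, and time.perf_counter() reads whatever clock the system it runs on provides. Inside a guest that clock can be skewed by deferred timer interrupts, as described above, so for guest runs an outside clock (hwclock, or a stopwatch on the host) is the safer source for the durations.

```python
import statistics
import time


def time_runs(workload, runs=10):
    """Run `workload` several times; return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return mean, sample standard deviation and relative spread of the timings."""
    mean = statistics.mean(durations)
    # stdev() needs at least two samples.
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    rel_spread = stdev / mean if mean else 0.0
    return {"mean": mean, "stdev": stdev, "rel_spread": rel_spread}


def slowdown(guest_seconds, host_seconds):
    """Ratio of guest time to host time for the same workload."""
    return guest_seconds / host_seconds


if __name__ == "__main__":
    # Placeholder workload; a real test would be something long-running,
    # such as the kernel build suggested above.
    samples = time_runs(lambda: sum(i * i for i in range(200_000)), runs=5)
    print(summarize(samples))
```

A large rel_spread is a hint that the individual runs are too short to say much, which matches the warning about microbenchmarks above.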






[Qemu-devel] performance monitor

2008-01-03 Thread Clemens Kolbitsch
hi!
has anyone ever used some real performance monitoring tools (like papiex, 
perfex, pfmon, etc.) on qemu? i'm running a debian linux and would like to 
time some applications inside qemu and have tried the perfmon2 kernel-patch 
(http://perfmon2.sourceforge.net/) for testing.

sadly, it does not work... dmesg tells me that the CPU is not identified 
correctly (unsupported family=6). Now i am not really sure what type of 
hardware-support the monitor relies on (i think PMU is the correct term, but 
I'm not sure about that) and what CPUs are supported (dmesg tells me that 
qemu simulates a Pentium M, but that's probably because I've compiled the 
kernel on my *real* Pentium M).

... Ok, to cut a long question short: Is there any hardware support in qemu 
for doing monitoring (that goes deeper than using time) and has anyone ever 
tested something that could work?

Thanks!
Clemens




Re: [Qemu-devel] performance monitor

2008-01-03 Thread Clemens Kolbitsch
On Thursday 03 January 2008 22:29:06 Paul Brook wrote:
  ... Ok, to cut a long question short: Is there any hardware support in
  qemu for doing monitoring (that goes deeper than using time) and has
  anyone ever tested something that could work?

 Probably your application wants the performance counters. Qemu doesn't
 emulate those.

 Besides which, qemu is not cycle accurate. Any performance measurements
 you make are pretty much meaningless, and bear absolutely no relationship
 to real hardware.

Thanks for the quick answer Paul! Not really what I wanted to hear, but 
probably true ;-)

Does anyone have an idea on how I can measure performance in qemu to a 
somewhat accurate level? I have modified qemu (the memory handling) and the 
linux kernel and want to find out the penalty this introduced... does anyone 
have any comments / ideas on this?

Thanks!




Re: [Qemu-devel] performance monitor

2008-01-03 Thread Paul Brook
 ... Ok, to cut a long question short: Is there any hardware support in qemu
 for doing monitoring (that goes deeper than using time) and has anyone
 ever tested something that could work?

Probably your application wants the performance counters. Qemu doesn't emulate 
those.

Besides which, qemu is not cycle accurate. Any performance measurements you 
make are pretty much meaningless, and bear absolutely no relationship to real 
hardware.

Paul




Re: [Qemu-devel] performance monitor

2008-01-03 Thread Paul Brook
 Does anyone have an idea on how I can measure performance in qemu to a
 somewhat accurate level? I have modified qemu (the memory handling) and the
 linux kernel and want to find out the penalty this introduced... does
 anyone have any comments / ideas on this?

Short answer is you probably can't. And even if you can I won't believe your 
results unless you've verified them on real hardware :-)

With the exception of some very small embedded cores, modern CPUs have complex 
out of order execution pipelines and multi-level cache hierarchies. It's 
common for performance to be dominated by these secondary factors rather than 
raw instruction throughput.

Exactly what features dominate performance is very application specific. 
Determining which factor dominates is unlikely to be something qemu can help 
with.

However if e.g. you know that for your application there's a good correlation 
between performance and L2 cache misses, you could instrument qemu to add 
an L1/L2 cache model. The overhead will be fairly severe (easily 10x slower), 
and completely screw up any realtime measurements. However it would produce 
some useful cache use statistics that you could use to guesstimate actual 
performance. This is similar to how cachegrind works. Obviously if your 
application isn't cache bound then these figures will be meaningless.

Paul
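
The cache-model idea can be tried out on its own before touching qemu. Below is a minimal sketch of a set-associative cache with LRU replacement that only counts hits and misses, roughly in the spirit of what cachegrind does. The class and function names, the sizes and the LRU policy are all assumptions for illustration. None of this is qemu code, and a real instrumentation would have to feed it guest addresses from qemu's memory access path.

```python
from collections import OrderedDict


class CacheModel:
    """Set-associative cache with LRU replacement; counts hits and misses only."""

    def __init__(self, size_bytes=32 * 1024, line_bytes=64, ways=8):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set; the oldest entry is the LRU line.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one access to `addr`; return True on a hit."""
        line = addr // self.line_bytes
        lines = self.sets[line % self.num_sets]
        if line in lines:
            lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[line] = True
        return False


def access_hierarchy(l1, l2, addr):
    """Send an access to L1 and, only on an L1 miss, on to L2."""
    if not l1.access(addr):
        l2.access(addr)


if __name__ == "__main__":
    l1 = CacheModel(size_bytes=32 * 1024, ways=8)
    l2 = CacheModel(size_bytes=2 * 1024 * 1024, ways=16)
    for addr in range(0, 4 * 1024 * 1024, 64):  # synthetic linear sweep
        access_hierarchy(l1, l2, addr)
    print("L1 misses:", l1.misses, "L2 misses:", l2.misses)
```

The miss counts are only as meaningful as the model's resemblance to the target CPU's caches, which is the caveat made above.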




Re: [Qemu-devel] performance monitor

2008-01-03 Thread Clemens Kolbitsch
On Thursday 03 January 2008 23:07:07 you wrote:
  Does anyone have an idea on how I can measure performance in qemu to a
  somewhat accurate level? I have modified qemu (the memory handling) and
  the linux kernel and want to find out the penalty this introduced... does
  anyone have any comments / ideas on this?

 Short answer is you probably can't. And even if you can I won't believe
 your results unless you've verified them on real hardware :-)

 With the exception of some very small embedded cores, modern CPUs have
 complex out of order execution pipelines and multi-level cache hierarchies.
 It's common for performance to be dominated by these secondary factors
 rather than raw instruction throughput.

 Exactly what features dominate performance is very application specific.
 Determining which factor dominates is unlikely to be something qemu can
 help with.

 However if e.g. you know that for your application there's a good
 correlation between performance and L2 cache misses, you could
 instrument qemu to add an L1/L2 cache model. The overhead will be fairly
 severe (easily 10x slower), and completely screw up any realtime
 measurements. However it would produce some useful cache use statistics
 that you could use to guesstimate actual performance. This is similar to
 how cachegrind works. Obviously if your application isn't cache bound then
 these figures will be meaningless.

Well, the measuring I had in mind partly concentrates on TLB misses, page 
faults, etc. (in addition to the cycle measuring). Guess I'll have to 
implement something for myself in qemu :-/

But thanks a lot for helping me out!





Re: [Qemu-devel] performance monitor

2008-01-03 Thread Laurent Desnogues
On Jan 3, 2008 11:11 PM, Clemens Kolbitsch wrote:

 Well, the measuring I had in mind partly concentrates on TLB misses, page
 faults, etc. (in addition to the cycle measuring). Guess I'll have to
 implement something for myself in qemu :-/

There's something not clear here:  do you want to measure your kernel
changes or do you want to profile Qemu?

As Paul clearly explained you can't do both :)

If you want to measure kernel performance oprofile is probably worth
looking at.  But you will need the real hardware.

Another option, though much more intrusive, would be to add explicit
performance counters in places you need to look at (this method can
be applied to both the kernel and Qemu).

And to say it again:  nobody can expect to measure OS performance
on a simulator, unless the simulator is directly derived from the HDL
code written by designers.  At least I would never trust such a
result ;)


Laurent
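
As a rough illustration of the "explicit performance counters" approach: named counters that are bumped by hand at the places of interest and dumped at the end. Everything here (perf_counters, count, touch, the event names) is hypothetical. In qemu or the kernel the same idea would be a handful of C counters incremented in the code paths being studied.

```python
from collections import Counter

# Global table of named event counters.
perf_counters = Counter()

PAGE_SHIFT = 12  # assume 4 KiB pages


def count(event, amount=1):
    """Bump the counter for `event`."""
    perf_counters[event] += amount


def touch(addr, mapped_pages):
    """Toy memory access path, instrumented with explicit counters."""
    count("mem_access")
    page = addr >> PAGE_SHIFT
    if page not in mapped_pages:
        count("page_fault")
        mapped_pages.add(page)


def report():
    """Format all counters, one per line, sorted by name."""
    return "\n".join(f"{name}: {value}" for name, value in sorted(perf_counters.items()))


if __name__ == "__main__":
    pages = set()
    for addr in (0x1000, 0x1004, 0x3000):
        touch(addr, pages)
    print(report())
```

The counters themselves cost time, so they distort wall-clock measurements taken with them enabled; the counts stay valid either way.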




Re: [Qemu-devel] performance monitor

2008-01-03 Thread Clemens Kolbitsch
On Thursday 03 January 2008 23:18:58 Paul Brook wrote:
  Well, the measuring I had in mind partly concentrates on TLB misses, page
  faults, etc. (in addition to the cycle measuring). Guess I'll have to
  implement something for myself in qemu :-/

 Be aware that the TLB qemu uses behaves very differently to a real CPU TLB.
 If you want to get TLB miss statistics you'll need to model a real TLB
 for that separately.

Sure, yes. But I don't even care what it would be like on a real CPU. I just 
want to know the impact it has on the emulated CPU ;-)

 Page faults should be straightforward, but any half-decent guest OS would
 be able to tell you those anyway.

True *g*




Re: [Qemu-devel] performance monitor

2008-01-03 Thread Paul Brook
 Well, the measuring I had in mind partly concentrates on TLB misses, page
 faults, etc. (in addition to the cycle measuring). Guess I'll have to
 implement something for myself in qemu :-/

Be aware that the TLB qemu uses behaves very differently to a real CPU TLB. If 
you want to get TLB miss statistics you'll need to model a real TLB for 
that separately.
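
A separate TLB model of that kind might look roughly like the sketch below: a small fully associative TLB with LRU replacement that is fed virtual addresses and counts misses. The entry count, page size, replacement policy and all names are assumptions chosen for illustration. They do not describe any particular CPU, and they do not describe qemu's own software TLB.

```python
from collections import OrderedDict


class TlbModel:
    """Fully associative TLB with LRU replacement; counts hits and misses."""

    def __init__(self, entries=64, page_shift=12):
        self.capacity = entries
        self.page_shift = page_shift  # 12 means 4 KiB pages
        self.entries = OrderedDict()  # virtual page number -> present
        self.hits = 0
        self.misses = 0

    def lookup(self, vaddr):
        """Record one translation of `vaddr`; return True on a TLB hit."""
        vpn = vaddr >> self.page_shift
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = True
        return False

    def flush(self):
        """Drop all entries, e.g. on an address space switch."""
        self.entries.clear()


if __name__ == "__main__":
    tlb = TlbModel(entries=64)
    for vaddr in range(0, 1024 * 4096, 4096):  # touch 1024 distinct pages
        tlb.lookup(vaddr)
    print("TLB misses:", tlb.misses)
```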

Page faults should be straightforward, but any half-decent guest OS would be 
able to tell you those anyway.

Paul
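
For the page-fault side, one way to ask the guest OS is getrusage(), which on Linux reports minor and major fault counts per process. A small sketch using Python's standard resource module (Unix only) follows. The helper names are made up, and the counts cover the whole process, so anything else the interpreter does in between is included.

```python
import resource


def page_faults():
    """Return (minor, major) page fault counts for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def faults_during(workload):
    """Run `workload` and return the (minor, major) faults it added."""
    minor_before, major_before = page_faults()
    workload()
    minor_after, major_after = page_faults()
    return minor_after - minor_before, major_after - major_before


if __name__ == "__main__":
    # Allocating and zero-filling a buffer typically causes minor faults.
    minor, major = faults_during(lambda: bytearray(64 * 1024 * 1024))
    print("minor faults:", minor, "major faults:", major)
```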