Mike Bresnahan wrote:
I can understand that I will not get as much performance out of an EC2 instance
as a dedicated server, but I don't understand why top(1) is showing 50% CPU
utilization. If it were a memory speed problem, wouldn't top(1) report 100% CPU?

A couple of points:

top is not the be-all and end-all of analysis tools. I'm sure you know that, but it bears repeating.

More importantly, in a virtualised environment the tools on the inside of the guest don't have a full picture of what's really going on. I've not done any real work with Xen; most of my experience is with zVM and KVM.

It's pretty normal on a heavily loaded server to see tools like top (and vmstat, sar, et al.) reporting less than 100% use inside the guest while the underlying box is running flat-out, leaving nothing more for the guest to get. I had this last night doing a load on a guest: 60-70% CPU at peak, with no more available. You *should* see steal and 0% idle time in this case, but I *have* seen zVM Linux guests reporting ample idle time while the zVM-level monitoring tools reported the LPAR as a whole running at 90-95% utilisation (which is when an LPAR will usually run out of steam).
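One way to look for steal from inside a Linux guest is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the deltas, which is roughly what top and vmstat do. The sketch below assumes the column order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, then guest and guest_nice); the helper names are hypothetical, and, per the caveats above, the guest's own counters may still understate what the hypervisor is doing, so treat the result as indicative only.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
# guest and guest_nice are already included in user and nice, so they are
# dropped here to avoid counting that time twice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")
BUSY = ("user", "nice", "system", "irq", "softirq")


def parse_cpu_line(line):
    """Return a dict of tick counters from the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:]]
    # Older kernels report fewer columns (no steal before 2.6.11);
    # treat any missing column as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values[: len(FIELDS)]))


def cpu_breakdown(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    a = parse_cpu_line(before)
    b = parse_cpu_line(after)
    delta = {k: b[k] - a[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("no ticks elapsed between samples")
    pct = {k: 100.0 * v / total for k, v in delta.items()}
    # Convenience total for time the guest itself spent doing work.
    pct["busy"] = sum(pct[k] for k in BUSY)
    return pct


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the all-CPU summary."""
    with open(path) as f:
        return f.readline()


def sample(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the breakdown."""
    first = read_cpu_line(path)
    time.sleep(interval)
    second = read_cpu_line(path)
    return cpu_breakdown(first, second)
```

On a Linux guest, `sample(5.0)` returns the share of the last five seconds spent in each state. A high `steal` with little `idle` is the pattern described above; a guest showing plenty of idle and no steal may still be losing time to the hypervisor in ways it cannot see.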

A secondary effect is that the scheduling of guests on and off the hypervisor can skew the guest's timekeeping; it's not uncommon in our loaded-up zVM environment to see discrepancies of 5-20% between how much CPU time the guest thinks it's getting and how much time the hypervisor knows it's getting (this is why companies like Velocity make money selling hypervisor-aware tools that auto-correct those stats).

In any case, assuming this is an EC2 memory speed thing, it is going to be
difficult to diagnose application bottlenecks when I cannot rely on top(1)
reporting meaningful CPU stats.

It's going to be even harder from inside the guests, since you're getting an incomplete view of the system as a whole.

You could try c2cbench (http://sourceforge.net/projects/c2cbench/), which is designed to benchmark memory cache performance, but it'll still be subject to the caveats I outlined above: it may give you something indicative if you suspect a cache problem, but it may also simply tell you that the virtual CPUs are fine while the real processors are starved for cache because they're running a bunch of workloads with high memory pressure.

If you were running a newer kernel you could look at perf_counters or something similar to get more detail from what the guest thinks it's doing, but, again, there are going to be inaccuracies.

Sent via pgsql-general mailing list (pgsql-general@postgresql.org)