On 10.07.2013, at 00:26, Scott Wood wrote:

> On 07/09/2013 05:00:26 PM, Alexander Graf wrote:
>> On 09.07.2013, at 23:54, Scott Wood wrote:
>> > On 07/09/2013 04:49:32 PM, Alexander Graf wrote:
>> >> Not sure I understand. The timing stats measure the time between 
>> >> [exit ... entry], right? We'd do the same thing, just all in C code. 
>> >> That means we'd become slightly less accurate, but we'd gain dynamic 
>> >> enabling of the traces and get rid of all the timing-stats asm code.
>> >
>> > Compile-time enabling bothers me less than a loss of accuracy (not just a 
>> > small loss by moving into C code, but a potential for a large loss if we 
>> > overflow the buffer).
>> Then don't overflow the buffer. Make it large enough.
> 
> How large is that?  Does the tool recognize and report when overflow happens?
> 
> How much will the overhead of running some python script on the host, 
> consuming a large volume of data, affect the results?
> 
>> IIRC ftrace improved recently to dynamically increase the buffer size too.
>> Steven, do I remember correctly here?
> 
> Yay more complexity.
> 
> So now we get to worry about possible memory allocations happening when we 
> try to log something?  Or if there is a way to do an "atomic" log, we're back 
> to the "buffer might be full" situation.
> 
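FWIW, a tool can notice overflow: ftrace exports per-CPU ring buffer 
statistics, including an overrun count of dropped events, and buffer_size_kb 
is writable at runtime. A minimal sketch of the check, assuming the usual 
debugfs mount point and stats file layout:

#!/usr/bin/env python
# Sketch: flag ftrace ring buffer overruns so a timing tool can refuse
# to report skewed results instead of silently under-counting.
# Assumes debugfs is mounted at /sys/kernel/debug.
import glob
import os
import re
import sys

TRACING = "/sys/kernel/debug/tracing"

def dropped_events():
    total = 0
    for stats in glob.glob(os.path.join(TRACING, "per_cpu", "cpu*", "stats")):
        for line in open(stats):
            m = re.match(r"overrun:\s*(\d+)", line)
            if m:
                total += int(m.group(1))
    return total

if __name__ == "__main__":
    lost = dropped_events()
    if lost:
        sys.stderr.write("warning: %d events dropped; grow %s/buffer_size_kb "
                         "and rerun\n" % (lost, TRACING))
        sys.exit(1)
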
>> > and a dependency on a userspace tool
>> We already have that for kvm_stat. It's a simple python script - and you 
>> surely have python on your rootfs, no?
>> > (both in terms of the tool needing to be written, and in the hassle of 
>> > ensuring that it's present in the root filesystem of whatever system I'm 
>> > testing).  And the whole mechanism will be more complicated.
>> It'll also be more flexible at the same time. You could take the logs and 
>> actually check what's going on, to debug issues you're encountering, for 
>> example.
>> We could even go as far as sharing the same tool with other architectures, 
>> so that we only have to learn how to debug things once.
> 
> Have you encountered an actual need for this flexibility, or is it 
> theoretical?

Yeah, the first thing I did back then to actually debug kvm failures was to 
add trace points.

> Is there common infrastructure for dealing with measuring intervals and 
> tracking statistics thereof, rather than just tracking points and letting 
> userspace connect the dots (though it could still do that as an option)?  
> Even if it must be done in userspace, it doesn't seem like something that 
> should be KVM-specific.

Would you like to have different ways of measuring mm subsystem overhead? I 
don't :). The same goes for KVM, really. If we could converge towards a single 
userspace interface for getting exit timings, it'd make debugging a lot easier.
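
To make the "connect the dots" part concrete: a tool like that needn't be 
KVM-specific at all. A rough sketch of the userspace side, assuming 
kvm_exit/kvm_entry tracepoints and a trace_pipe-style text format (the event 
names and line format here are illustrative, not a fixed ABI):

#!/usr/bin/env python
# Sketch: pair kvm_exit/kvm_entry tracepoints per task and aggregate
# interval statistics per exit reason, i.e. the timing stats computed
# in userspace. Feed it e.g. /sys/kernel/debug/tracing/trace_pipe.
import re
import sys
from collections import defaultdict

# "task-pid [cpu] ... timestamp: event: detail" -- adjust to the real format.
EVENT = re.compile(r"\s*\S+-(\d+)\s+\[\d+\].*?\s([\d.]+): (kvm_exit|kvm_entry):(.*)")

open_exit = {}                                      # pid -> (timestamp, reason)
stats = defaultdict(lambda: [0, 0.0, None, None])   # reason -> [n, sum, min, max]

for line in sys.stdin:
    m = EVENT.match(line)
    if not m:
        continue
    pid, ts, name, detail = m.group(1), float(m.group(2)), m.group(3), m.group(4)
    if name == "kvm_exit":
        toks = detail.split()
        reason = toks[1] if len(toks) > 1 else "?"  # assumes "reason FOO ..."
        open_exit[pid] = (ts, reason)
    elif pid in open_exit:                          # kvm_entry closes the interval
        start, reason = open_exit.pop(pid)
        delta = ts - start
        s = stats[reason]
        s[0] += 1
        s[1] += delta
        s[2] = delta if s[2] is None else min(s[2], delta)
        s[3] = delta if s[3] is None else max(s[3], delta)

for reason, (n, total, lo, hi) in sorted(stats.items()):
    print("%-20s count=%-8d avg=%.1fus min=%.1fus max=%.1fus"
          % (reason, n, total / n * 1e6, lo * 1e6, hi * 1e6))

Nothing in there knows about ppc or x86; the per-arch part is only which exit 
reasons exist.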

We already have this for the debugfs counters, btw. And the timing framework 
does break kvm_stat today already, because it emits textual stats rather than 
the plain numbers that all the other debugfs stats expose. But at least I can 
take the x86 kvm_stat tool and run it on ppc just fine to see exit stats.
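
And consuming those counters is trivial as long as every debugfs file holds a 
single number. A sketch of the kvm_stat-style sampling loop, assuming the 
usual /sys/kernel/debug/kvm layout:

#!/usr/bin/env python
# Sketch: kvm_stat-style rate sampling of the numeric debugfs counters.
# Assumes debugfs is mounted at /sys/kernel/debug and that each file
# under kvm/ holds a single integer -- exactly the invariant that a
# textual timing file breaks.
import os
import time

KVM = "/sys/kernel/debug/kvm"

def snapshot():
    counters = {}
    for name in os.listdir(KVM):
        try:
            counters[name] = int(open(os.path.join(KVM, name)).read())
        except (IOError, ValueError):
            pass                      # non-numeric or unreadable stat: skip
    return counters

before = snapshot()
time.sleep(1)
after = snapshot()
for name in sorted(after):
    delta = after[name] - before.get(name, 0)
    if delta:
        print("%-24s %d/s" % (name, delta))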

> 
>> > Lots of debug options are enabled at build time; why must this be 
>> > different?
>> Because I think it's valuable as a debug tool for cases where compile-time 
>> switches are not the best way of debugging things. It's not a high-profile 
>> thing for me to tackle, tbh, but I don't really think that working heavily 
>> on the timing-stats code is the correct path to walk.
> 
> Adding new exit types isn't "working heavily" on it.

No, but the fact that the first patch is a fix to add exit stats for exits 
that we missed before doesn't give me a lot of confidence that many people use 
the timing stats. And I am always very wary of #ifdef'ed code, as it blows up 
the test matrix heavily.


Alex
