Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 20:24, Scott Wood wrote:

> On 07/10/2013 05:23:36 AM, Alexander Graf wrote:
>> On 10.07.2013, at 00:26, Scott Wood wrote:
>> > On 07/09/2013 05:00:26 PM, Alexander Graf wrote:
>> >> It'll also be more flexible at the same time. You could take the logs and 
>> >> actually check what's going on to debug issues that you're encountering 
>> >> for example.
>> >> We could even go as far as sharing the same tool with other 
>> >> architectures, so that we only have to learn how to debug things once.
>> >
>> > Have you encountered an actual need for this flexibility, or is it 
>> > theoretical?
>> Yeah, first thing I did back then to actually debug kvm failures was to add 
>> trace points.
> 
> I meant specifically for handling exit timings this way.

No, but I did encounter the need for debugging exits. And once we have trace 
points for exit types, the only thing we would still need to get exit timing 
stats is trace points for guest entry, and maybe type-specific events to 
indicate what each exit is about as well.

> 
>> > Is there common infrastructure for dealing with measuring intervals and 
>> > tracking statistics thereof, rather than just tracking points and letting 
>> > userspace connect the dots (though it could still do that as an option)?  
>> > Even if it must be done in userspace, it doesn't seem like something that 
>> > should be KVM-specific.
>> Would you like to have different ways of measuring mm subsystem overhead? I 
>> don't :). The same goes for KVM really. If we could converge towards a 
>> single user space interface to get exit timings, it'd make debugging a lot 
>> easier.
> 
> I agree -- that's why I said it doesn't seem like something that should be 
> KVM-specific.  But that's orthogonal to whether it's done in kernel space or 
> user space.  The ability to get begin/end events from userspace would be nice 
> when it is specifically requested, but it would also be nice if the kernel 
> could track some basic statistics so we wouldn't have to ship so much data 
> around to arrive at the same result.
> 
> At the very least, I'd like such a tool/infrastructure to exist before we 
> start complaining about doing minor maintenance of the current mechanism.

I admit that I don't fully understand qemu/scripts/kvm/kvm_stat, but it seems 
to me as if it already does pretty much what we want. It sets up a filter so 
that only the relevant events and their time stamps come through.

On x86 it uses the normal exit trace points to replace the old debugfs-based 
stat counters, and it seems to work reasonably well for that.
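
As a point of reference, here is a minimal sketch of that filtering idea in C 
(kvm_stat itself is python): enable the "kvm" trace event group and stream the 
time-stamped records out of trace_pipe. The paths assume tracing is mounted at 
/sys/kernel/debug/tracing, and on a target where KVM trace points don't exist 
yet there is of course nothing to enable -- which is the whole point of this 
discussion.

/* Sketch: enable all "kvm" trace events and dump the time-stamped records. */
#include <stdio.h>
#include <stdlib.h>

#define TRACE_DIR "/sys/kernel/debug/tracing"

static void write_file(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                exit(1);
        }
        fputs(val, f);
        fclose(f);
}

int main(void)
{
        char line[4096];
        FILE *pipe;

        /* Turn on every event in the kvm subsystem (exits, entries, ...). */
        write_file(TRACE_DIR "/events/kvm/enable", "1");
        write_file(TRACE_DIR "/tracing_on", "1");

        /* Each line carries a timestamp, so a consumer can compute
         * exit -> entry intervals or simple per-event counts. */
        pipe = fopen(TRACE_DIR "/trace_pipe", "r");
        if (!pipe) {
                perror("trace_pipe");
                return 1;
        }
        while (fgets(line, sizeof(line), pipe))
                fputs(line, stdout);

        fclose(pipe);
        return 0;
}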

> 
>> We already have this for the debugfs counters btw. And the timing framework 
>> does break kvm_stat today already, as it emits textual stats rather than 
>> numbers which all of the other debugfs stats do. But at least I can take the 
>> x86 kvm_stat tool and run it on ppc just fine to see exit stats.
> 
> We already have what?  The last two sentences seem contradictory -- can you 
> or can't you use kvm_stat as is?  I'm not familiar with kvm_stat.

Kvm_stat back in the day used debugfs to give you an idea of which exit event 
happens most often. That mechanism was later replaced by trace points, which 
the current kvm_stat uses.

I still have a copy of the old kvm_stat that I always use to get a first 
impression of what's happening when something goes wrong. The original code 
couldn't deal with the fact that we have a debugfs file that contains text 
though, so I patched it locally. It also works just fine if you simply disable 
timing stats, since then you won't have the text file.

> What does x86 KVM expose in debugfs?

The same thing it always exposed - exit stats. I am fairly sure Avi wanted to 
completely deprecate that interface in favor of the trace point based approach, 
but I don't think he ever got around to it.
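
For the record, on x86 at least those counters are just one number per file. 
A small sketch of how a tool reads them, assuming debugfs is mounted at 
/sys/kernel/debug (anything that isn't a plain counter file is simply skipped):

/* Sketch: print every global KVM counter exposed in debugfs. */
#include <dirent.h>
#include <stdio.h>

#define KVM_DEBUGFS "/sys/kernel/debug/kvm"

int main(void)
{
        DIR *dir = opendir(KVM_DEBUGFS);
        struct dirent *de;

        if (!dir) {
                perror(KVM_DEBUGFS);
                return 1;
        }
        while ((de = readdir(dir)) != NULL) {
                char path[512];
                unsigned long long val;
                FILE *f;

                if (de->d_name[0] == '.')
                        continue;
                snprintf(path, sizeof(path), KVM_DEBUGFS "/%s", de->d_name);
                f = fopen(path, "r");
                if (!f)
                        continue;
                if (fscanf(f, "%llu", &val) == 1)   /* skip non-numeric files */
                        printf("%-25s %llu\n", de->d_name, val);
                fclose(f);
        }
        closedir(dir);
        return 0;
}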


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-10 Thread Scott Wood

On 07/10/2013 05:23:36 AM, Alexander Graf wrote:


> On 10.07.2013, at 00:26, Scott Wood wrote:
> 
>> On 07/09/2013 05:00:26 PM, Alexander Graf wrote:
>>> It'll also be more flexible at the same time. You could take the
>>> logs and actually check what's going on to debug issues that you're
>>> encountering for example.
>>> We could even go as far as sharing the same tool with other
>>> architectures, so that we only have to learn how to debug things once.
>>
>> Have you encountered an actual need for this flexibility, or is it
>> theoretical?
> 
> Yeah, first thing I did back then to actually debug kvm failures was
> to add trace points.


I meant specifically for handling exit timings this way.

>> Is there common infrastructure for dealing with measuring intervals
>> and tracking statistics thereof, rather than just tracking points and
>> letting userspace connect the dots (though it could still do that as
>> an option)?  Even if it must be done in userspace, it doesn't seem
>> like something that should be KVM-specific.
> 
> Would you like to have different ways of measuring mm subsystem
> overhead? I don't :). The same goes for KVM really. If we could
> converge towards a single user space interface to get exit timings,
> it'd make debugging a lot easier.


I agree -- that's why I said it doesn't seem like something that should  
be KVM-specific.  But that's orthogonal to whether it's done in kernel  
space or user space.  The ability to get begin/end events from  
userspace would be nice when it is specifically requested, but it would  
also be nice if the kernel could track some basic statistics so we  
wouldn't have to ship so much data around to arrive at the same result.


At the very least, I'd like such a tool/infrastructure to exist before  
we start complaining about doing minor maintenance of the current  
mechanism.


> We already have this for the debugfs counters btw. And the timing
> framework does break kvm_stat today already, as it emits textual
> stats rather than numbers which all of the other debugfs stats do.
> But at least I can take the x86 kvm_stat tool and run it on ppc just
> fine to see exit stats.


We already have what?  The last two sentences seem contradictory -- can  
you or can't you use kvm_stat as is?  I'm not familiar with kvm_stat.


What does x86 KVM expose in debugfs?

>>>> Lots of debug options are enabled at build time; why must this
>>>> be different?
>>> Because I think it's valuable as debug tool for cases where
>>> compile time switches are not the best way of debugging things. It's
>>> not a high profile thing to tackle for me tbh, but I don't really
>>> think working heavily on the timing stat thing is the correct path to
>>> walk along.
>>
>> Adding new exit types isn't "working heavily" on it.
> 
> No, but the fact that the first patch is a fix to add exit stats for
> exits that we missed out before doesn't give me a lot of confidence
> that lots of people use timing stats. And I am always very weary of
> #ifdef'ed code, as it blows up the test matrix heavily.


I used it quite a lot when I was doing KVM performance work.  It's just  
been a while since I last did that.


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-10 Thread Alexander Graf

On 10.07.2013, at 00:26, Scott Wood wrote:

> On 07/09/2013 05:00:26 PM, Alexander Graf wrote:
>> On 09.07.2013, at 23:54, Scott Wood wrote:
>> > On 07/09/2013 04:49:32 PM, Alexander Graf wrote:
>> >> Not sure I understand. What the timing stats do is that they measure the 
>> >> time between [exit ... entry], right? We'd do the same thing, just all in 
>> >> C code. That means we would become slightly less accurate, but gain 
>> >> dynamic enabling of the traces and get rid of all the timing stat asm 
>> >> code.
>> >
>> > Compile-time enabling bothers me less than a loss of accuracy (not just a 
>> > small loss by moving into C code, but a potential for a large loss if we 
>> > overflow the buffer)
>> Then don't overflow the buffer. Make it large enough.
> 
> How large is that?  Does the tool recognize and report when overflow happens?
> 
> How much will the overhead of running some python script on the host, 
> consuming a large volume of data, affect the results?
> 
>> IIRC ftrace improved recently to dynamically increase the buffer size too.
>> Steven, do I remember correctly here?
> 
> Yay more complexity.
> 
> So now we get to worry about possible memory allocations happening when we 
> try to log something?  Or if there is a way to do an "atomic" log, we're back 
> to the "buffer might be full" situation.
> 
>> > and a dependency on a userspace tool
>> We already have that for kvm_stat. It's a simple python script - and you 
>> surely have python on your rootfs, no?
>> > (both in terms of the tool needing to be written, and in the hassle of 
>> > ensuring that it's present in the root filesystem of whatever system I'm 
>> > testing).  And the whole mechanism will be more complicated.
>> It'll also be more flexible at the same time. You could take the logs and 
>> actually check what's going on to debug issues that you're encountering for 
>> example.
>> We could even go as far as sharing the same tool with other architectures, 
>> so that we only have to learn how to debug things once.
> 
> Have you encountered an actual need for this flexibility, or is it 
> theoretical?

Yeah, first thing I did back then to actually debug kvm failures was to add 
trace points.

> Is there common infrastructure for dealing with measuring intervals and 
> tracking statistics thereof, rather than just tracking points and letting 
> userspace connect the dots (though it could still do that as an option)?  
> Even if it must be done in userspace, it doesn't seem like something that 
> should be KVM-specific.

Would you like to have different ways of measuring mm subsystem overhead? I 
don't :). The same goes for KVM really. If we could converge towards a single 
user space interface to get exit timings, it'd make debugging a lot easier.

We already have this for the debugfs counters btw. And the timing framework 
does break kvm_stat today already, as it emits textual stats rather than 
numbers which all of the other debugfs stats do. But at least I can take the 
x86 kvm_stat tool and run it on ppc just fine to see exit stats.

> 
>> > Lots of debug options are enabled at build time; why must this be 
>> > different?
>> Because I think it's valuable as debug tool for cases where compile time 
>> switches are not the best way of debugging things. It's not a high profile 
>> thing to tackle for me tbh, but I don't really think working heavily on the 
>> timing stat thing is the correct path to walk along.
> 
> Adding new exit types isn't "working heavily" on it.

No, but the fact that the first patch is a fix to add exit stats for exits that 
we missed before doesn't give me a lot of confidence that lots of people use 
timing stats. And I am always very wary of #ifdef'ed code, as it blows up the 
test matrix heavily.


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-09 Thread Steven Rostedt
On Tue, 2013-07-09 at 17:26 -0500, Scott Wood wrote:
> On 07/09/2013 05:00:26 PM, Alexander Graf wrote:
> > 
> > On 09.07.2013, at 23:54, Scott Wood wrote:
> > 
> > > On 07/09/2013 04:49:32 PM, Alexander Graf wrote:
> > >> Not sure I understand. What the timing stats do is that they  
> > >> measure the time between [exit ... entry], right? We'd do the same
> > >> thing, just all in C code. That means we would become slightly less
> > >> accurate, but gain dynamic enabling of the traces and get rid of all
> > >> the timing stat asm code.
> > >
> > > Compile-time enabling bothers me less than a loss of accuracy (not  
> > > just a small loss by moving into C code, but a potential for a large
> > > loss if we overflow the buffer)
> > 
> > Then don't overflow the buffer. Make it large enough.
> 
> How large is that?  Does the tool recognize and report when overflow  
> happens?

Note, the ftrace buffers allow you to see when overflow does happen.
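
A minimal sketch of how a tool could check for that, assuming tracing is 
mounted at /sys/kernel/debug/tracing and the per_cpu/cpuN/stats files have 
their usual "key: value" layout -- the overrun counter there counts events 
lost because the ring buffer wrapped:

/* Sketch: report per-CPU ring-buffer overruns so dropped events are visible. */
#include <stdio.h>

int main(void)
{
        unsigned long cpu;

        for (cpu = 0; ; cpu++) {
                char path[256], line[256];
                FILE *f;

                snprintf(path, sizeof(path),
                         "/sys/kernel/debug/tracing/per_cpu/cpu%lu/stats", cpu);
                f = fopen(path, "r");
                if (!f)
                        break;  /* no more CPUs */

                while (fgets(line, sizeof(line), f)) {
                        unsigned long long overrun;

                        /* "overrun:" counts events lost to buffer wrap-around. */
                        if (sscanf(line, "overrun: %llu", &overrun) == 1 && overrun)
                                fprintf(stderr, "cpu%lu dropped %llu events\n",
                                        cpu, overrun);
                }
                fclose(f);
        }
        return 0;
}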

> 
> How much will the overhead of running some python script on the host,  
> consuming a large volume of data, affect the results?

This doesn't need to be python, and you can read the buffers in binary
as well. Mauro wrote a tool that uses ftrace for MCE errors. You can
probably do something similar. I need to get the code that reads ftrace
binary buffers out as a library.


> 
> > IIRC ftrace improved recently to dynamically increase the buffer size  
> > too.

What did change was that you can create buffers for your own use.

> > 
> > Steven, do I remember correctly here?
> 
> Yay more complexity.

What? Is ftrace complex? ;-)

> 
> So now we get to worry about possible memory allocations happening when  
> we try to log something?  Or if there is a way to do an "atomic" log,  
> we're back to the "buffer might be full" situation.

Nope, ftrace doesn't do dynamic allocation here.

-- Steve

> 
> > > and a dependency on a userspace tool
> > 
> > We already have that for kvm_stat. It's a simple python script - and  
> > you surely have python on your rootfs, no?
> > 
> > > (both in terms of the tool needing to be written, and in the hassle  
> > > of ensuring that it's present in the root filesystem of whatever
> > > system I'm testing).  And the whole mechanism will be more
> > > complicated.
> > 
> > It'll also be more flexible at the same time. You could take the logs  
> > and actually check what's going on to debug issues that you're  
> > encountering for example.
> > 
> > We could even go as far as sharing the same tool with other  
> > architectures, so that we only have to learn how to debug things once.
> 
> Have you encountered an actual need for this flexibility, or is it  
> theoretical?
> 
> Is there common infrastructure for dealing with measuring intervals and  
> tracking statistics thereof, rather than just tracking points and  
> letting userspace connect the dots (though it could still do that as an  
> option)?  Even if it must be done in userspace, it doesn't seem like  
> something that should be KVM-specific.
> 
> > > Lots of debug options are enabled at build time; why must this be  
> > > different?
> > 
> > Because I think it's valuable as debug tool for cases where compile  
> > time switches are not the best way of debugging things. It's not a  
> > high profile thing to tackle for me tbh, but I don't really think  
> > working heavily on the timing stat thing is the correct path to walk  
> > along.
> 
> Adding new exit types isn't "working heavily" on it.
> 
> -Scott




Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-09 Thread Steven Rostedt
On Wed, 2013-07-10 at 00:00 +0200, Alexander Graf wrote:

> Then don't overflow the buffer. Make it large enough. IIRC ftrace improved 
> recently to dynamically increase the buffer size too.
> 
> Steven, do I remember correctly here?

Not really. Ftrace only dynamically increases the buffer when the trace
is first used. Other than that, the size is static. I also wouldn't
suggest allocating the buffer when needed as that has the overhead of
allocating memory.
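
So presumably a tool built on this would just size the buffer up front, before 
the measurement run. A sketch, assuming the usual /sys/kernel/debug/tracing 
mount point (buffer_size_kb is a per-CPU size, and 16 MB here is an arbitrary 
example value, not a recommendation):

/* Sketch: grow the ftrace ring buffer before tracing a workload. */
#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/sys/kernel/debug/tracing/buffer_size_kb", "w");

        if (!f) {
                perror("buffer_size_kb");
                return 1;
        }
        fprintf(f, "%d\n", 16 * 1024);  /* 16 MB per CPU */
        fclose(f);
        return 0;
}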

-- Steve




Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-09 Thread Scott Wood

On 07/09/2013 05:00:26 PM, Alexander Graf wrote:


> On 09.07.2013, at 23:54, Scott Wood wrote:
> 
>> On 07/09/2013 04:49:32 PM, Alexander Graf wrote:
>>> Not sure I understand. What the timing stats do is that they
>>> measure the time between [exit ... entry], right? We'd do the same
>>> thing, just all in C code. That means we would become slightly less
>>> accurate, but gain dynamic enabling of the traces and get rid of all
>>> the timing stat asm code.
>>
>> Compile-time enabling bothers me less than a loss of accuracy (not
>> just a small loss by moving into C code, but a potential for a large
>> loss if we overflow the buffer)
> 
> Then don't overflow the buffer. Make it large enough.


How large is that?  Does the tool recognize and report when overflow  
happens?


How much will the overhead of running some python script on the host,  
consuming a large volume of data, affect the results?


> IIRC ftrace improved recently to dynamically increase the buffer size
> too.
> 
> Steven, do I remember correctly here?


Yay more complexity.

So now we get to worry about possible memory allocations happening when  
we try to log something?  Or if there is a way to do an "atomic" log,  
we're back to the "buffer might be full" situation.



>> and a dependency on a userspace tool
> 
> We already have that for kvm_stat. It's a simple python script - and
> you surely have python on your rootfs, no?
> 
>> (both in terms of the tool needing to be written, and in the hassle
>> of ensuring that it's present in the root filesystem of whatever
>> system I'm testing).  And the whole mechanism will be more
>> complicated.
> 
> It'll also be more flexible at the same time. You could take the logs
> and actually check what's going on to debug issues that you're
> encountering for example.
> 
> We could even go as far as sharing the same tool with other
> architectures, so that we only have to learn how to debug things once.


Have you encountered an actual need for this flexibility, or is it  
theoretical?


Is there common infrastructure for dealing with measuring intervals and  
tracking statistics thereof, rather than just tracking points and  
letting userspace connect the dots (though it could still do that as an  
option)?  Even if it must be done in userspace, it doesn't seem like  
something that should be KVM-specific.


>> Lots of debug options are enabled at build time; why must this be
>> different?
> 
> Because I think it's valuable as debug tool for cases where compile
> time switches are not the best way of debugging things. It's not a
> high profile thing to tackle for me tbh, but I don't really think
> working heavily on the timing stat thing is the correct path to walk
> along.


Adding new exit types isn't "working heavily" on it.

-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-09 Thread Alexander Graf

On 09.07.2013, at 23:54, Scott Wood wrote:

> On 07/09/2013 04:49:32 PM, Alexander Graf wrote:
>> On 09.07.2013, at 20:29, Scott Wood wrote:
>> > On 07/09/2013 12:46:32 PM, Alexander Graf wrote:
>> >> On 07/09/2013 07:16 PM, Scott Wood wrote:
>> >>> On 07/08/2013 01:45:58 PM, Alexander Graf wrote:
>> >>>> On 03.07.2013, at 15:30, Mihai Caraman wrote:
>> >>>> > Some guests are making use of return from machine check instruction
>> >>>> > to do crazy things even though the 64-bit kernel doesn't handle yet
>> >>>> > this interrupt. Emulate MCSRR0/1 SPR and rfmci instruction
>> >>>> > accordingly.
>> >>>> >
>> >>>> > Signed-off-by: Mihai Caraman 
>> >>>> > ---
>> >>>> > arch/powerpc/include/asm/kvm_host.h |1 +
>> >>>> > arch/powerpc/kvm/booke_emulate.c|   25 +
>> >>>> > arch/powerpc/kvm/timing.c   |1 +
>> >>>> > 3 files changed, 27 insertions(+), 0 deletions(-)
>> >>>> >
>> >>>> > diff --git a/arch/powerpc/include/asm/kvm_host.h
>> >>>> > b/arch/powerpc/include/asm/kvm_host.h
>> >>>> > index af326cd..0466789 100644
>> >>>> > --- a/arch/powerpc/include/asm/kvm_host.h
>> >>>> > +++ b/arch/powerpc/include/asm/kvm_host.h
>> >>>> > @@ -148,6 +148,7 @@ enum kvm_exit_types {
>> >>>> > EMULATED_TLBWE_EXITS,
>> >>>> > EMULATED_RFI_EXITS,
>> >>>> > EMULATED_RFCI_EXITS,
>> >>>> > +EMULATED_RFMCI_EXITS,
>> >>>> I would quite frankly prefer to see us abandon the whole exit timing
>> >>>> framework in the kernel and instead use trace points. Then we don't
>> >>>> have to maintain all of this randomly exercised code.
>> >>> Would this map well to tracepoints?  We're not trying to track discrete 
>> >>> events, so much as accumulated time spent in different areas.
>> >> I think so. We'd just have to emit tracepoints as soon as we enter 
>> >> handle_exit and in prepare_to_enter. Then a user space program should 
>> >> have everything it needs to create statistics out of that. It would 
>> >> certainly simplify the entry/exit path.
>> >
>> > I was hoping that wasn't going to be your answer. :-)
>> >
>> > Such a change would introduce a new dependency, more complexity, and the 
>> > possibility for bad totals to result from a ring buffer filling faster 
>> > than userspace can drain it.
>> Well, at least it would allow for optional tracing :). Today you have to 
>> change a compile flag to enable / disable timing stats.
>> >
>> > I also don't see how it would simplify entry/exit, since we'd still need 
>> > to take timestamps in the same places, in order to record a final event 
>> > that says how long a particular event took.
>> Not sure I understand. What the timing stats do is that they measure the 
>> time between [exit ... entry], right? We'd do the same thing, just all in C 
>> code. That means we would become slightly less accurate, but gain dynamic 
>> enabling of the traces and get rid of all the timing stat asm code.
> 
> Compile-time enabling bothers me less than a loss of accuracy (not just a 
> small loss by moving into C code, but a potential for a large loss if we 
> overflow the buffer)

Then don't overflow the buffer. Make it large enough. IIRC ftrace improved 
recently to dynamically increase the buffer size too.

Steven, do I remember correctly here?

> and a dependency on a userspace tool

We already have that for kvm_stat. It's a simple python script - and you surely 
have python on your rootfs, no?

> (both in terms of the tool needing to be written, and in the hassle of 
> ensuring that it's present in the root filesystem of whatever system I'm 
> testing).  And the whole mechanism will be more complicated.

It'll also be more flexible at the same time. You could take the logs and 
actually check what's going on to debug issues that you're encountering for 
example.

We could even go as far as sharing the same tool with other architectures, so 
that we only have to learn how to debug things once.

> Lots of debug options are enabled at build time; why must this be different?

Because I think it's valuable as a debug tool for cases where compile-time 
switches are not the best way of debugging things. It's not a high-profile 
thing to tackle for me tbh, but I don't really think working heavily on the 
timing stat thing is the correct path to walk along.


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-09 Thread Scott Wood

On 07/09/2013 04:49:32 PM, Alexander Graf wrote:


> On 09.07.2013, at 20:29, Scott Wood wrote:
> 
>> On 07/09/2013 12:46:32 PM, Alexander Graf wrote:
>>> On 07/09/2013 07:16 PM, Scott Wood wrote:
>>>> On 07/08/2013 01:45:58 PM, Alexander Graf wrote:
>>>>> On 03.07.2013, at 15:30, Mihai Caraman wrote:
>>>>>> Some guests are making use of return from machine check instruction
>>>>>> to do crazy things even though the 64-bit kernel doesn't handle yet
>>>>>> this interrupt. Emulate MCSRR0/1 SPR and rfmci instruction accordingly.
>>>>>>
>>>>>> Signed-off-by: Mihai Caraman 
>>>>>> ---
>>>>>> arch/powerpc/include/asm/kvm_host.h |1 +
>>>>>> arch/powerpc/kvm/booke_emulate.c|   25 +
>>>>>> arch/powerpc/kvm/timing.c   |1 +
>>>>>> 3 files changed, 27 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>>>>> index af326cd..0466789 100644
>>>>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>>>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>>>>> @@ -148,6 +148,7 @@ enum kvm_exit_types {
>>>>>> EMULATED_TLBWE_EXITS,
>>>>>> EMULATED_RFI_EXITS,
>>>>>> EMULATED_RFCI_EXITS,
>>>>>> +EMULATED_RFMCI_EXITS,
>>>>> I would quite frankly prefer to see us abandon the whole exit
>>>>> timing framework in the kernel and instead use trace points. Then we
>>>>> don't have to maintain all of this randomly exercised code.
>>>> Would this map well to tracepoints?  We're not trying to track
>>>> discrete events, so much as accumulated time spent in different areas.
>>> I think so. We'd just have to emit tracepoints as soon as we enter
>>> handle_exit and in prepare_to_enter. Then a user space program should
>>> have everything it needs to create statistics out of that. It would
>>> certainly simplify the entry/exit path.

>>
>> I was hoping that wasn't going to be your answer. :-)
>>
>> Such a change would introduce a new dependency, more complexity,
>> and the possibility for bad totals to result from a ring buffer
>> filling faster than userspace can drain it.


> Well, at least it would allow for optional tracing :). Today you have
> to change a compile flag to enable / disable timing stats.


>>
>> I also don't see how it would simplify entry/exit, since we'd still
>> need to take timestamps in the same places, in order to record a
>> final event that says how long a particular event took.


> Not sure I understand. What the timing stats do is that they measure
> the time between [exit ... entry], right? We'd do the same thing,
> just all in C code. That means we would become slightly less
> accurate, but gain dynamic enabling of the traces and get rid of all
> the timing stat asm code.


Compile-time enabling bothers me less than a loss of accuracy (not just  
a small loss by moving into C code, but a potential for a large loss if  
we overflow the buffer) and a dependency on a userspace tool (both in  
terms of the tool needing to be written, and in the hassle of ensuring  
that it's present in the root filesystem of whatever system I'm  
testing).  And the whole mechanism will be more complicated.


Lots of debug options are enabled at build time; why must this be  
different?


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-09 Thread Alexander Graf

On 09.07.2013, at 20:29, Scott Wood wrote:

> On 07/09/2013 12:46:32 PM, Alexander Graf wrote:
>> On 07/09/2013 07:16 PM, Scott Wood wrote:
>>> On 07/08/2013 01:45:58 PM, Alexander Graf wrote:
>>>> On 03.07.2013, at 15:30, Mihai Caraman wrote:
>>>> > Some guests are making use of return from machine check instruction
>>>> > to do crazy things even though the 64-bit kernel doesn't handle yet
>>>> > this interrupt. Emulate MCSRR0/1 SPR and rfmci instruction accordingly.
>>>> >
>>>> > Signed-off-by: Mihai Caraman 
>>>> > ---
>>>> > arch/powerpc/include/asm/kvm_host.h |1 +
>>>> > arch/powerpc/kvm/booke_emulate.c|   25 +
>>>> > arch/powerpc/kvm/timing.c   |1 +
>>>> > 3 files changed, 27 insertions(+), 0 deletions(-)
>>>> >
>>>> > diff --git a/arch/powerpc/include/asm/kvm_host.h
>>>> > b/arch/powerpc/include/asm/kvm_host.h
>>>> > index af326cd..0466789 100644
>>>> > --- a/arch/powerpc/include/asm/kvm_host.h
>>>> > +++ b/arch/powerpc/include/asm/kvm_host.h
>>>> > @@ -148,6 +148,7 @@ enum kvm_exit_types {
>>>> > EMULATED_TLBWE_EXITS,
>>>> > EMULATED_RFI_EXITS,
>>>> > EMULATED_RFCI_EXITS,
>>>> > +EMULATED_RFMCI_EXITS,
>>>> I would quite frankly prefer to see us abandon the whole exit timing
>>>> framework in the kernel and instead use trace points. Then we don't have
>>>> to maintain all of this randomly exercised code.
>>> Would this map well to tracepoints?  We're not trying to track discrete 
>>> events, so much as accumulated time spent in different areas.
>> I think so. We'd just have to emit tracepoints as soon as we enter 
>> handle_exit and in prepare_to_enter. Then a user space program should have 
>> everything it needs to create statistics out of that. It would certainly 
>> simplify the entry/exit path.
> 
> I was hoping that wasn't going to be your answer. :-)
> 
> Such a change would introduce a new dependency, more complexity, and the 
> possibility for bad totals to result from a ring buffer filling faster than 
> userspace can drain it.

Well, at least it would allow for optional tracing :). Today you have to change 
a compile flag to enable / disable timing stats.

> 
> I also don't see how it would simplify entry/exit, since we'd still need to 
> take timestamps in the same places, in order to record a final event that 
> says how long a particular event took.

Not sure I understand. What the timing stats do is that they measure the time 
between [exit ... entry], right? We'd do the same thing, just all in C code. 
That means we would become slightly less accurate, but gain dynamic enabling of 
the traces and get rid of all the timing stat asm code.
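
To sketch the userspace half of that idea: given time-stamped exit records 
(with an exit type) and guest-entry records from such trace points, a tool 
could accumulate the same count/min/max/sum per exit type that the in-kernel 
timing code keeps today. The feed_*() functions and the exit-type bound below 
are made up for the example, a single vcpu is assumed, and the trace-parsing 
side is left out:

/* Sketch: fold (exit, entry) timestamp pairs into per-exit-type statistics. */
#include <stdint.h>
#include <stdio.h>

#define NR_EXIT_TYPES 64                /* arbitrary bound for the sketch */

struct exit_stat {
        uint64_t count, min_ns, max_ns, sum_ns;
};

static struct exit_stat stats[NR_EXIT_TYPES];
static uint64_t pending_exit_ns;        /* timestamp of the last guest exit */
static int pending_type = -1;

/* Called for every exit trace record. */
static void feed_exit(uint64_t ts_ns, int type)
{
        pending_exit_ns = ts_ns;
        pending_type = (type >= 0 && type < NR_EXIT_TYPES) ? type : -1;
}

/* Called for every guest-entry trace record: close the open interval. */
static void feed_entry(uint64_t ts_ns)
{
        struct exit_stat *s;
        uint64_t d;

        if (pending_type < 0 || ts_ns < pending_exit_ns)
                return;                 /* nothing open, or out-of-order data */

        d = ts_ns - pending_exit_ns;
        s = &stats[pending_type];
        if (!s->count || d < s->min_ns)
                s->min_ns = d;
        if (d > s->max_ns)
                s->max_ns = d;
        s->sum_ns += d;
        s->count++;
        pending_type = -1;
}

int main(void)
{
        /* Stand-in for a parsed trace: one exit of type 3, handled in 1200 ns. */
        feed_exit(1000, 3);
        feed_entry(2200);

        for (int i = 0; i < NR_EXIT_TYPES; i++)
                if (stats[i].count)
                        printf("type %d: count %llu min %llu max %llu avg %llu ns\n",
                               i, (unsigned long long)stats[i].count,
                               (unsigned long long)stats[i].min_ns,
                               (unsigned long long)stats[i].max_ns,
                               (unsigned long long)(stats[i].sum_ns / stats[i].count));
        return 0;
}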


Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-09 Thread Scott Wood

On 07/09/2013 12:46:32 PM, Alexander Graf wrote:

> On 07/09/2013 07:16 PM, Scott Wood wrote:
>> On 07/08/2013 01:45:58 PM, Alexander Graf wrote:
>>> On 03.07.2013, at 15:30, Mihai Caraman wrote:
>>>> Some guests are making use of return from machine check instruction
>>>> to do crazy things even though the 64-bit kernel doesn't handle yet
>>>> this interrupt. Emulate MCSRR0/1 SPR and rfmci instruction accordingly.
>>>>
>>>> Signed-off-by: Mihai Caraman 
>>>> ---
>>>> arch/powerpc/include/asm/kvm_host.h |1 +
>>>> arch/powerpc/kvm/booke_emulate.c|   25 +
>>>> arch/powerpc/kvm/timing.c   |1 +
>>>> 3 files changed, 27 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>>> index af326cd..0466789 100644
>>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>>> @@ -148,6 +148,7 @@ enum kvm_exit_types {
>>>> EMULATED_TLBWE_EXITS,
>>>> EMULATED_RFI_EXITS,
>>>> EMULATED_RFCI_EXITS,
>>>> +EMULATED_RFMCI_EXITS,
>>> I would quite frankly prefer to see us abandon the whole exit
>>> timing framework in the kernel and instead use trace points. Then
>>> we don't have to maintain all of this randomly exercised code.
>> Would this map well to tracepoints?  We're not trying to track
>> discrete events, so much as accumulated time spent in different
>> areas.
> I think so. We'd just have to emit tracepoints as soon as we enter
> handle_exit and in prepare_to_enter. Then a user space program should
> have everything it needs to create statistics out of that. It would
> certainly simplify the entry/exit path.


I was hoping that wasn't going to be your answer. :-)

Such a change would introduce a new dependency, more complexity, and  
the possibility for bad totals to result from a ring buffer filling  
faster than userspace can drain it.


I also don't see how it would simplify entry/exit, since we'd still  
need to take timestamps in the same places, in order to record a final  
event that says how long a particular event took.


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-09 Thread Alexander Graf

On 07/09/2013 07:16 PM, Scott Wood wrote:

> On 07/08/2013 01:45:58 PM, Alexander Graf wrote:
>> On 03.07.2013, at 15:30, Mihai Caraman wrote:
>>> Some guests are making use of return from machine check instruction
>>> to do crazy things even though the 64-bit kernel doesn't handle yet
>>> this interrupt. Emulate MCSRR0/1 SPR and rfmci instruction accordingly.
>>>
>>> Signed-off-by: Mihai Caraman 
>>> ---
>>> arch/powerpc/include/asm/kvm_host.h |1 +
>>> arch/powerpc/kvm/booke_emulate.c|   25 +
>>> arch/powerpc/kvm/timing.c   |1 +
>>> 3 files changed, 27 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>> index af326cd..0466789 100644
>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>> @@ -148,6 +148,7 @@ enum kvm_exit_types {
>>> EMULATED_TLBWE_EXITS,
>>> EMULATED_RFI_EXITS,
>>> EMULATED_RFCI_EXITS,
>>> +EMULATED_RFMCI_EXITS,
>> I would quite frankly prefer to see us abandon the whole exit timing
>> framework in the kernel and instead use trace points. Then we don't
>> have to maintain all of this randomly exercised code.
> Would this map well to tracepoints?  We're not trying to track
> discrete events, so much as accumulated time spent in different areas.


I think so. We'd just have to emit tracepoints as soon as we enter 
handle_exit and in prepare_to_enter. Then a user space program should 
have everything it needs to create statistics out of that. It would 
certainly simplify the entry/exit path.
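
To make that concrete, here is a rough sketch of what such a pair of trace 
events could look like. The event names and fields are hypothetical, not 
existing KVM tracepoints, and the usual TRACE_INCLUDE_PATH/TRACE_INCLUDE_FILE 
boilerplate of a real trace header is omitted. handle_exit would call 
trace_kvm_ppc_guest_exit() and prepare_to_enter would call 
trace_kvm_ppc_guest_enter(); a userspace consumer would then pair each exit 
record with the next entry record to recover exactly the intervals the current 
timing code measures:

/* Hypothetical sketch of the two events -- not an existing trace header. */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM kvm_ppc

#if !defined(_TRACE_KVM_PPC_TIMING_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_KVM_PPC_TIMING_H

#include <linux/tracepoint.h>

TRACE_EVENT(kvm_ppc_guest_exit,
        TP_PROTO(unsigned long pc, unsigned int exit_nr),
        TP_ARGS(pc, exit_nr),

        TP_STRUCT__entry(
                __field(unsigned long, pc)
                __field(unsigned int, exit_nr)
        ),

        TP_fast_assign(
                __entry->pc = pc;
                __entry->exit_nr = exit_nr;
        ),

        /* The record's own timestamp carries the timing information. */
        TP_printk("exit=%u pc=0x%lx", __entry->exit_nr, __entry->pc)
);

TRACE_EVENT(kvm_ppc_guest_enter,
        TP_PROTO(unsigned long pc),
        TP_ARGS(pc),

        TP_STRUCT__entry(
                __field(unsigned long, pc)
        ),

        TP_fast_assign(
                __entry->pc = pc;
        ),

        TP_printk("pc=0x%lx", __entry->pc)
);

#endif /* _TRACE_KVM_PPC_TIMING_H */

/* This part must be outside the include guard. */
#include <trace/define_trace.h>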



Alex



Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-09 Thread Scott Wood

On 07/08/2013 01:45:58 PM, Alexander Graf wrote:


> On 03.07.2013, at 15:30, Mihai Caraman wrote:
>> Some guests are making use of return from machine check instruction
>> to do crazy things even though the 64-bit kernel doesn't handle yet
>> this interrupt. Emulate MCSRR0/1 SPR and rfmci instruction accordingly.
>>
>> Signed-off-by: Mihai Caraman 
>> ---
>> arch/powerpc/include/asm/kvm_host.h |1 +
>> arch/powerpc/kvm/booke_emulate.c|   25 +
>> arch/powerpc/kvm/timing.c   |1 +
>> 3 files changed, 27 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>> index af326cd..0466789 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -148,6 +148,7 @@ enum kvm_exit_types {
>>    EMULATED_TLBWE_EXITS,
>>    EMULATED_RFI_EXITS,
>>    EMULATED_RFCI_EXITS,
>> +  EMULATED_RFMCI_EXITS,
> I would quite frankly prefer to see us abandon the whole exit timing
> framework in the kernel and instead use trace points. Then we don't
> have to maintain all of this randomly exercised code.


Would this map well to tracepoints?  We're not trying to track discrete  
events, so much as accumulated time spent in different areas.


-Scott


Re: [PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-08 Thread Alexander Graf

On 03.07.2013, at 15:30, Mihai Caraman wrote:

> Some guests are making use of return from machine check instruction
> to do crazy things even though the 64-bit kernel doesn't handle yet
> this interrupt. Emulate MCSRR0/1 SPR and rfmci instruction accordingly.
> 
> Signed-off-by: Mihai Caraman 
> ---
> arch/powerpc/include/asm/kvm_host.h |1 +
> arch/powerpc/kvm/booke_emulate.c|   25 +
> arch/powerpc/kvm/timing.c   |1 +
> 3 files changed, 27 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> b/arch/powerpc/include/asm/kvm_host.h
> index af326cd..0466789 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -148,6 +148,7 @@ enum kvm_exit_types {
>   EMULATED_TLBWE_EXITS,
>   EMULATED_RFI_EXITS,
>   EMULATED_RFCI_EXITS,
> + EMULATED_RFMCI_EXITS,

I would quite frankly prefer to see us abandon the whole exit timing framework 
in the kernel and instead use trace points. Then we don't have to maintain all 
of this randomly exercised code.

FWIW, however, I think that in this case treating RFMCI the same as RFI or 
random "instruction emulation" shouldn't hurt. This whole table is only about 
timing measurements. If you want to know what's really going on, use trace 
points.

Otherwise looks good.


Alex

>   DEC_EXITS,
>   EXT_INTR_EXITS,
>   HALT_WAKEUP,
> diff --git a/arch/powerpc/kvm/booke_emulate.c 
> b/arch/powerpc/kvm/booke_emulate.c
> index 27a4b28..aaff1b7 100644
> --- a/arch/powerpc/kvm/booke_emulate.c
> +++ b/arch/powerpc/kvm/booke_emulate.c
> @@ -23,6 +23,7 @@
> 
> #include "booke.h"
> 
> +#define OP_19_XOP_RFMCI   38
> #define OP_19_XOP_RFI 50
> #define OP_19_XOP_RFCI51
> 
> @@ -43,6 +44,12 @@ static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu)
>   kvmppc_set_msr(vcpu, vcpu->arch.csrr1);
> }
> 
> +static void kvmppc_emul_rfmci(struct kvm_vcpu *vcpu)
> +{
> + vcpu->arch.pc = vcpu->arch.mcsrr0;
> + kvmppc_set_msr(vcpu, vcpu->arch.mcsrr1);
> +}
> +
> int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
> unsigned int inst, int *advance)
> {
> @@ -65,6 +72,12 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct 
> kvm_vcpu *vcpu,
>   *advance = 0;
>   break;
> 
> + case OP_19_XOP_RFMCI:
> + kvmppc_emul_rfmci(vcpu);
> + kvmppc_set_exit_type(vcpu, EMULATED_RFMCI_EXITS);
> + *advance = 0;
> + break;
> +
>   default:
>   emulated = EMULATE_FAIL;
>   break;
> @@ -138,6 +151,12 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, 
> int sprn, ulong spr_val)
>   case SPRN_DBCR1:
>   vcpu->arch.dbg_reg.dbcr1 = spr_val;
>   break;
> + case SPRN_MCSRR0:
> + vcpu->arch.mcsrr0 = spr_val;
> + break;
> + case SPRN_MCSRR1:
> + vcpu->arch.mcsrr1 = spr_val;
> + break;
>   case SPRN_DBSR:
>   vcpu->arch.dbsr &= ~spr_val;
>   break;
> @@ -284,6 +303,12 @@ int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, 
> int sprn, ulong *spr_val)
>   case SPRN_DBCR1:
>   *spr_val = vcpu->arch.dbg_reg.dbcr1;
>   break;
> + case SPRN_MCSRR0:
> + *spr_val = vcpu->arch.mcsrr0;
> + break;
> + case SPRN_MCSRR1:
> + *spr_val = vcpu->arch.mcsrr1;
> + break;
>   case SPRN_DBSR:
>   *spr_val = vcpu->arch.dbsr;
>   break;
> diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c
> index c392d26..670f63d 100644
> --- a/arch/powerpc/kvm/timing.c
> +++ b/arch/powerpc/kvm/timing.c
> @@ -129,6 +129,7 @@ static const char 
> *kvm_exit_names[__NUMBER_OF_KVM_EXIT_TYPES] = {
>   [EMULATED_TLBSX_EXITS] ="EMUL_TLBSX",
>   [EMULATED_TLBWE_EXITS] ="EMUL_TLBWE",
>   [EMULATED_RFI_EXITS] =  "EMUL_RFI",
> + [EMULATED_RFMCI_EXITS] ="EMUL_RFMCI",
>   [DEC_EXITS] =   "DEC",
>   [EXT_INTR_EXITS] =  "EXTINT",
>   [HALT_WAKEUP] = "HALT",
> -- 
> 1.7.3.4
> 
> 



[PATCH 2/2] KVM: PPC: Book3E: Emulate MCSRR0/1 SPR and rfmci instruction

2013-07-03 Thread Mihai Caraman
Some guests make use of the return from machine check instruction to do
crazy things even though the 64-bit kernel doesn't yet handle this
interrupt. Emulate the MCSRR0/1 SPRs and the rfmci instruction accordingly.

Signed-off-by: Mihai Caraman 
---
 arch/powerpc/include/asm/kvm_host.h |1 +
 arch/powerpc/kvm/booke_emulate.c|   25 +
 arch/powerpc/kvm/timing.c   |1 +
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index af326cd..0466789 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -148,6 +148,7 @@ enum kvm_exit_types {
EMULATED_TLBWE_EXITS,
EMULATED_RFI_EXITS,
EMULATED_RFCI_EXITS,
+   EMULATED_RFMCI_EXITS,
DEC_EXITS,
EXT_INTR_EXITS,
HALT_WAKEUP,
diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index 27a4b28..aaff1b7 100644
--- a/arch/powerpc/kvm/booke_emulate.c
+++ b/arch/powerpc/kvm/booke_emulate.c
@@ -23,6 +23,7 @@
 
 #include "booke.h"
 
+#define OP_19_XOP_RFMCI   38
 #define OP_19_XOP_RFI 50
 #define OP_19_XOP_RFCI51
 
@@ -43,6 +44,12 @@ static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu)
kvmppc_set_msr(vcpu, vcpu->arch.csrr1);
 }
 
+static void kvmppc_emul_rfmci(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.pc = vcpu->arch.mcsrr0;
+   kvmppc_set_msr(vcpu, vcpu->arch.mcsrr1);
+}
+
 int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 unsigned int inst, int *advance)
 {
@@ -65,6 +72,12 @@ int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
*advance = 0;
break;
 
+   case OP_19_XOP_RFMCI:
+   kvmppc_emul_rfmci(vcpu);
+   kvmppc_set_exit_type(vcpu, EMULATED_RFMCI_EXITS);
+   *advance = 0;
+   break;
+
default:
emulated = EMULATE_FAIL;
break;
@@ -138,6 +151,12 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
case SPRN_DBCR1:
vcpu->arch.dbg_reg.dbcr1 = spr_val;
break;
+   case SPRN_MCSRR0:
+   vcpu->arch.mcsrr0 = spr_val;
+   break;
+   case SPRN_MCSRR1:
+   vcpu->arch.mcsrr1 = spr_val;
+   break;
case SPRN_DBSR:
vcpu->arch.dbsr &= ~spr_val;
break;
@@ -284,6 +303,12 @@ int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
case SPRN_DBCR1:
*spr_val = vcpu->arch.dbg_reg.dbcr1;
break;
+   case SPRN_MCSRR0:
+   *spr_val = vcpu->arch.mcsrr0;
+   break;
+   case SPRN_MCSRR1:
+   *spr_val = vcpu->arch.mcsrr1;
+   break;
case SPRN_DBSR:
*spr_val = vcpu->arch.dbsr;
break;
diff --git a/arch/powerpc/kvm/timing.c b/arch/powerpc/kvm/timing.c
index c392d26..670f63d 100644
--- a/arch/powerpc/kvm/timing.c
+++ b/arch/powerpc/kvm/timing.c
@@ -129,6 +129,7 @@ static const char *kvm_exit_names[__NUMBER_OF_KVM_EXIT_TYPES] = {
[EMULATED_TLBSX_EXITS] ="EMUL_TLBSX",
[EMULATED_TLBWE_EXITS] ="EMUL_TLBWE",
[EMULATED_RFI_EXITS] =  "EMUL_RFI",
+   [EMULATED_RFMCI_EXITS] ="EMUL_RFMCI",
[DEC_EXITS] =   "DEC",
[EXT_INTR_EXITS] =  "EXTINT",
[HALT_WAKEUP] = "HALT",
-- 
1.7.3.4

