kvm BookE and SPRGs

2009-07-10 Thread Benjamin Herrenschmidt
Hi Hollis !

I was roaming through kernel usage of SPRGs and noticed a small detail
in kvmppc for BookE: is there any reason why, in OP_31_XOP_MTSPR, you
open-coded the emulation of SPRG0..3, while SPRG4..7 are handled
in kvmppc_core_emulate_mtspr()?

It occurs to me that, for both MTSPR and MFSPR, the code should in fact
be moved into kvmppc_core_emulate_mtspr() and
kvmppc_core_emulate_mfspr() for consistency.

Also, from looking at the FSL BookE code, it seems that there is such a
thing as SPRG9 (so I suppose there must be an SPRG8 somewhere too);
shouldn't we handle it too?
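
A minimal sketch of the consolidation suggested above, in C: every SPRG0..7 move goes through one pair of core helpers instead of being split between the generic MTSPR path and the core-specific one. The struct, the helper names (core_emulate_mtspr/core_emulate_mfspr) and the result codes are hypothetical stand-ins, not the actual kvmppc types; only the SPR numbers (SPRG0..7 = 272..279) are architected. SPRG8/SPRG9 would become further cases once their numbers for the target core are confirmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architected SPR numbers of SPRG0 and SPRG7 (supervisor access). */
enum { SPRN_SPRG0 = 272, SPRN_SPRG7 = 279 };

/* Hypothetical result codes for the emulation helpers. */
enum emulation_result { EMULATE_DONE, EMULATE_FAIL };

/* Hypothetical, cut-down vcpu holding the guest's SPRG values. */
struct vcpu {
    uint32_t sprg[8];
};

/* The one place that emulates a guest mtspr to SPRG0..7. */
enum emulation_result core_emulate_mtspr(struct vcpu *vcpu, int sprn, uint32_t val)
{
    if (sprn >= SPRN_SPRG0 && sprn <= SPRN_SPRG7) {
        vcpu->sprg[sprn - SPRN_SPRG0] = val;
        return EMULATE_DONE;
    }
    /* Further core-specific SPRs (e.g. SPRG8/SPRG9) would be added here. */
    return EMULATE_FAIL;
}

/* The matching mfspr path; user-readable aliases are not handled here. */
enum emulation_result core_emulate_mfspr(struct vcpu *vcpu, int sprn, uint32_t *val)
{
    assert(val != NULL);
    if (sprn >= SPRN_SPRG0 && sprn <= SPRN_SPRG7) {
        *val = vcpu->sprg[sprn - SPRN_SPRG0];
        return EMULATE_DONE;
    }
    return EMULATE_FAIL;
}
```

With this shape, the generic OP_31_XOP_MTSPR/MFSPR decoder would only extract the SPR number and register, and defer every SPRG case to the core helpers.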

Cheers,
Ben.


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm BookE and SPRGs

2009-07-10 Thread Benjamin Herrenschmidt
On Fri, 2009-07-10 at 16:31 +1000, Benjamin Herrenschmidt wrote:
 
 I was roaming through kernel usage of SPRGs and noticed a small detail
 in kvmppc for BookE: is there any reason why, in OP_31_XOP_MTSPR, you
 open-coded the emulation of SPRG0..3, while SPRG4..7 are handled
 in kvmppc_core_emulate_mtspr()?
 
 It occurs to me that, for both MTSPR and MFSPR, the code should in fact
 be moved into kvmppc_core_emulate_mtspr() and
 kvmppc_core_emulate_mfspr() for consistency.
 
 Also, from looking at the FSL BookE code, it seems that there is such a
 thing as SPRG9 (so I suppose there must be an SPRG8 somewhere too);
 shouldn't we handle it too?

BTW, that leads me to another question (CC'ing Avi there too): what is
the policy regarding paravirtualization? I.e., are we OK with adding
paravirt tricks to speed things up?

A prime example I have in mind, which could help a lot here, is to have
a shared page, mapped at -4K (at the top of the address space) only
while the guest is in supervisor mode, that hosts part of the current
VCPU supervisor register state.

That way, either using our existing alternate instruction patching
mechanism or maybe lazily patching them as we trap on them, we could
replace instructions such as mtsprg and mfsprg with la/sta (load
absolute/store absolute) from/to this page (absolute addresses on ppc
are 16-bit signed, so they can reach either the top or the bottom of the
address space).
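
To make the addressing argument concrete: on PowerPC, a D-form load or store with RA = 0 uses the sign-extended 16-bit displacement as the whole effective address, so displacements -4096..-1 land in the last 4K page and 0..32767 in the bottom of the address space. The sketch below assumes a 32-bit effective address space (as on 32-bit BookE) and uses plain lwz/stw (primary opcodes 32 and 36) with RA = 0 to stand in for the "la/sta" accesses; mapping that shorthand onto these specific instructions is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a D-form load/store with RA = 0:
 * EA = EXTS(d), truncated here to a 32-bit address space. */
uint32_t abs_ea(int16_t d)
{
    return (uint32_t)(int32_t)d;
}

/* Encode a D-form instruction: opcode | rT/rS | RA | 16-bit displacement. */
uint32_t encode_dform(uint32_t opcode, uint32_t rt, uint32_t ra, int16_t d)
{
    assert(opcode < 64 && rt < 32 && ra < 32);
    return (opcode << 26) | (rt << 21) | (ra << 16) | ((uint32_t)d & 0xffffu);
}

/* lwz rT,d(0) and stw rS,d(0): RA = 0 means "no base register". */
uint32_t encode_lwz_abs(uint32_t rt, int16_t d) { return encode_dform(32, rt, 0, d); }
uint32_t encode_stw_abs(uint32_t rs, int16_t d) { return encode_dform(36, rs, 0, d); }
```

For instance, abs_ea(-4096) is 0xFFFFF000, the first byte of the -4K page, and lwz r3,-4096(0) encodes as 0x8060F000; no base register has to be set up or saved to reach the page.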

We could also access the guest MSR (read-only) that way, as well as the
guest SRR0 and SRR1, and a few more things. I also have ideas for doing
soft irq disabling that way, which would eventually remove most, if not
all, of the spurious emulation traps in the exception entry/exit paths
of the guest kernel.
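
One possible layout for such a page, as a hedged sketch: the field order, the soft irq-disable flag and all names below are hypothetical, since the thread does not fix any ABI. The helper shows how a field offset turns into the negative displacement used by an RA = 0 access to the -4K page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAGIC_PAGE_EA   0xFFFFF000u  /* the page at -4K (32-bit EA) */
#define MAGIC_PAGE_SIZE 4096u

/* Hypothetical layout; the thread does not define one. */
struct magic_page {
    uint32_t sprg[8];            /* shadow SPRG0..7 */
    uint32_t srr0;               /* guest SRR0 */
    uint32_t srr1;               /* guest SRR1 */
    uint32_t msr;                /* guest MSR, read-only from the guest's side */
    uint32_t irq_soft_disabled;  /* hypothetical soft irq-disable flag */
};

_Static_assert(sizeof(struct magic_page) <= MAGIC_PAGE_SIZE,
               "magic page layout must fit in one 4K page");

/* Displacement d such that an RA = 0 access at d hits the given field:
 * EA = EXTS(d) = MAGIC_PAGE_EA + field_offset. */
int16_t magic_disp(size_t field_offset)
{
    assert(field_offset < MAGIC_PAGE_SIZE);
    return (int16_t)((int32_t)field_offset - (int32_t)MAGIC_PAGE_SIZE);
}
```

Whatever the real layout ends up being, guest and host have to agree on it, which is the fixed-layout constraint discussed later in the thread.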

(Note: this is paravirt even if we patch instructions on traps, in part
because if we use that instead of SPRGs, the values will not be
reflected in the user-readable SPRG aliases, so the guest kernel needs
to be aware of that. Typically, the current BookE code -does- use the
user-readable variants of SPRG4..7, so we must be careful here.)

The cost of course is an additional TLB entry for mapping that -4K page
(but only when running guest kernel code).

(Note: this technique would apply to KVM ppc64 from Alex as well)

Cheers,
Ben.




Re: kvm BookE and SPRGs

2009-07-10 Thread Alexander Graf

Hi Ben,

On 10.07.2009, at 10:10, Benjamin Herrenschmidt wrote:


 On Fri, 2009-07-10 at 16:31 +1000, Benjamin Herrenschmidt wrote:
 
  I was roaming through kernel usage of SPRGs and noticed a small detail
  in kvmppc for BookE: is there any reason why, in OP_31_XOP_MTSPR, you
  open-coded the emulation of SPRG0..3, while SPRG4..7 are handled
  in kvmppc_core_emulate_mtspr()?
  
  It occurs to me that, for both MTSPR and MFSPR, the code should in fact
  be moved into kvmppc_core_emulate_mtspr() and
  kvmppc_core_emulate_mfspr() for consistency.
  
  Also, from looking at the FSL BookE code, it seems that there is such a
  thing as SPRG9 (so I suppose there must be an SPRG8 somewhere too);
  shouldn't we handle it too?
 
 BTW, that leads me to another question (CC'ing Avi there too): what is
 the policy regarding paravirtualization? I.e., are we OK with adding
 paravirt tricks to speed things up?


IMHO paravirt stuff can be really useful, but should stay in the  
guest. I don't really like the idea of adding binary patching of  
guests in the hypervisor more than for dcbz where I didn't see another  
way to do it.


Linux does provide pv_ops for such purposes, or maybe you could use
the magic "kernel patches itself" hacks that already exist in the power
port today.


So then newer guests would be fast, older guests would be slow. Sounds  
like a good tradeoff to me :-).


Maybe we could also do the hacks in the hypervisor, but #ifdef them  
out by default. I always get stomachaches from patching guests by  
default ;-).


[...]


 That way, either using our existing alternate instruction patching
 mechanism or maybe lazily patching them as we trap on them, we could
 replace instructions such as mtsprg and mfsprg with la/sta (load
 absolute/store absolute) from/to this page (absolute addresses on ppc
 are 16-bit signed, so they can reach either the top or the bottom of the
 address space).


That seems to be the guest's responsibility, no?



Re: kvm BookE and SPRGs

2009-07-10 Thread Benjamin Herrenschmidt
On Fri, 2009-07-10 at 10:42 +0200, Alexander Graf wrote:
 
 IMHO paravirt stuff can be really useful, but should stay in the  
 guest. I don't really like the idea of adding binary patching of  
 guests in the hypervisor more than for dcbz where I didn't see another
 way to do it.
 
I wasn't talking about that sort of binary patching :-)

There are two ways to do it:

 - One is, when you fault on an instruction like mtsprg2, to patch
-that- instruction and replace it with a magic stwa to the shared
page. However, I prefer -real- paravirt, which is:

 - The guest can use the existing self-binary-patching facility we have
to replace its own SPR access instructions with instructions that access
the magic shared page.
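
A sketch of the second (self-patching) variant, assuming the guest walks its own kernel text and rewrites mfspr/mtspr of SPRG0..3 into lwz/stw against the magic page. The mfspr/mtspr encodings (primary opcode 31, extended opcodes 339/467, SPR number stored with its two 5-bit halves swapped) are architected; the displacement of the SPRG shadows (MAGIC_SPRG0_DISP) and the function names are hypothetical. SPRG4..7 are deliberately left alone here because of their user-readable aliases.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_31 = 31, OP_LWZ = 32, OP_STW = 36, XO_MFSPR = 339, XO_MTSPR = 467 };
enum { SPRN_SPRG0 = 272, SPRN_SPRG3 = 275 };

/* Hypothetical: shadow SPRG0 sits at the start of the -4K magic page. */
#define MAGIC_SPRG0_DISP (-4096)

/* mfspr/mtspr store the SPR number with its two 5-bit halves swapped. */
static uint32_t spr_field_to_sprn(uint32_t field)
{
    return ((field & 0x1fu) << 5) | ((field >> 5) & 0x1fu);
}

/* Return the replacement instruction, or insn unchanged if it is not an
 * mfspr/mtspr of SPRG0..3. */
uint32_t patch_sprg_access(uint32_t insn)
{
    uint32_t op = insn >> 26;
    uint32_t xo = (insn >> 1) & 0x3ffu;
    uint32_t rt = (insn >> 21) & 0x1fu;   /* rT for mfspr, rS for mtspr */
    uint32_t sprn = spr_field_to_sprn((insn >> 11) & 0x3ffu);
    int32_t disp;
    uint32_t new_op;

    if (op != OP_31 || (xo != XO_MFSPR && xo != XO_MTSPR))
        return insn;
    if (sprn < SPRN_SPRG0 || sprn > SPRN_SPRG3)
        return insn;   /* SPRG4..7 have user-readable aliases: leave them */

    disp = MAGIC_SPRG0_DISP + 4 * (int32_t)(sprn - SPRN_SPRG0);
    assert(disp >= -32768 && disp <= 32767);
    new_op = (xo == XO_MFSPR) ? OP_LWZ : OP_STW;
    /* RA = 0: the sign-extended displacement is the absolute address. */
    return (new_op << 26) | (rt << 21) | ((uint32_t)disp & 0xffffu);
}
```

For example, mfspr r3,SPRG0 (0x7C7042A6) becomes lwz r3,-4096(0) (0x8060F000), and mtspr SPRG2,r5 (0x7CB243A6) becomes stw r5,-4088(0) (0x90A0F008). Anything else comes back unchanged, so the caller can apply the function to each candidate instruction.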

 Linux does provide pv_ops for such purposes, or maybe you could use
 the magic "kernel patches itself" hacks that already exist in the power
 port today.

pv_ops are useful for higher-level things. We don't necessarily need
them anyway, as we already have various hooks for our existing
hypervisors, which are all some kind of paravirt. But the problem we
have now, running supervisor instructions in user mode, is too low-level
and performance-sensitive for something like pv_ops.

My proposed scheme would be much more efficient and remains reasonably
simple.

 So then newer guests would be fast, older guests would be slow.
 Sounds like a good tradeoff to me :-).

Right :-)

 Maybe we could also do the hacks in the hypervisor, but #ifdef them  
 out by default. I always get stomachaches from patching guests by  
 default ;-).

I don't like patching guests from the HV that much either; I prefer
paravirt for things like that. The case where we may -have- to do it
would be if we tried to run legacy non-open-source OSes like MacOS, to
handle things like cache line size issues, but then it should be a
special option that has to be explicitly enabled via some sort of
flags passed from userspace.

Thus, from the userspace tools, when creating a VM, you could enable
special MacOS 9 compatibility hacks for example.
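
The "explicitly enabled" idea can be pictured as a per-VM flag word chosen by userspace when the VM is created. This is a hypothetical sketch only: neither the flag name nor the structure corresponds to an existing KVM interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-VM compatibility flags, chosen by userspace at VM creation. */
enum vm_compat_flag {
    VM_COMPAT_MACOS9 = 1u << 0,  /* legacy-guest workarounds, incl. host-side patching */
};

struct vm_config {
    uint32_t compat_flags;  /* 0 by default: no compatibility hacks */
};

/* Host-side patching of guest instructions is off unless explicitly requested. */
bool may_patch_guest(const struct vm_config *cfg)
{
    assert(cfg != NULL);
    return (cfg->compat_flags & VM_COMPAT_MACOS9) != 0;
}
```

Keeping the default at zero matches the preference expressed in the thread: Linux guests get paravirt, and guest patching by the host only happens when someone asks for it.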

But let's deal with that later; right now, the focus is Linux on Linux.
I was just proposing a simple paravirt approach that would significantly
speed up a whole bunch of existing low-level exception entry/exit
code paths.

Another approach would be to do that at a higher level, by having more
C-like entry points for the HV to call into the guest, but that seems too
inflexible and complicated to me.

 [...]

 That seems to be the guest's responsibility, no?

Yes, mostly. The host-side KVM code would have to provide the shared
page, which contains the shadows of the SPRGs, SRRs, MSR, etc., properly
context switch and update it, and provide a way to map it at the top of
the address space (i.e., we should make it appear in pseudo real mode
too on KVM server; on existing KVM BookE, I suppose the guest can do an
explicit call to the HV to instantiate it).
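
A minimal sketch of the host-side bookkeeping described above, assuming a hypothetical vcpu structure that holds a pointer to the shared page once the guest has set it up: state is published into the page before entering the guest and read back after exiting, so guest writes that no longer trap are not lost. The names and the exact field set are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shared-page layout (must match what the guest patches against). */
struct magic_page {
    uint32_t sprg[8];
    uint32_t srr0, srr1;
    uint32_t msr;       /* the guest may read this, but must not write it */
};

/* Hypothetical host-side copy of the guest supervisor state. */
struct vcpu {
    uint32_t sprg[8];
    uint32_t srr0, srr1;
    uint32_t msr;
    struct magic_page *magic;   /* NULL until the guest has enabled the page */
};

/* Before entering the guest: publish the current state in the page. */
void magic_page_load(struct vcpu *v)
{
    if (!v->magic)
        return;
    memcpy(v->magic->sprg, v->sprg, sizeof v->sprg);
    v->magic->srr0 = v->srr0;
    v->magic->srr1 = v->srr1;
    v->magic->msr = v->msr;
}

/* After exiting the guest: pick up values it wrote without trapping. */
void magic_page_save(struct vcpu *v)
{
    if (!v->magic)
        return;
    memcpy(v->sprg, v->magic->sprg, sizeof v->sprg);
    v->srr0 = v->magic->srr0;
    v->srr1 = v->magic->srr1;
    /* MSR is not copied back: from the guest's side it is read-only. */
}
```

Treating the MSR as read-only for the guest, as suggested earlier in the thread, is why magic_page_save() leaves the host's copy untouched.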

But the actual replacement of the various instructions with accesses
to this page would be the responsibility of the guest, which patches
itself; we already have appropriate mechanisms for that, so it should
be reasonably easy.

Cheers,
Ben.





RE: kvm BookE and SPRGs

2009-07-10 Thread Benjamin Herrenschmidt
On Fri, 2009-07-10 at 17:15 +0800, Liu Yu-B13201 wrote:
 Sounds reasonable.
 
 There are some old patchsets that implemented the binary patching Ben
 described.
 
 http://marc.info/?l=kvm-ppc&m=122154653905212&w=2
 http://marc.info/?l=kvm-ppc&m=122154657905306&w=2
 

Interesting. Any reason why that wasn't merged ?

Cheers,
Ben.




RE: kvm BookE and SPRGs

2009-07-10 Thread Benjamin Herrenschmidt
On Fri, 2009-07-10 at 19:17 +1000, Benjamin Herrenschmidt wrote:
 On Fri, 2009-07-10 at 17:15 +0800, Liu Yu-B13201 wrote:
  Sounds reasonable.
  
  There are some old patchsets that implemented the binary patching Ben
  described.
  
  http://marc.info/?l=kvm-ppc&m=122154653905212&w=2
  http://marc.info/?l=kvm-ppc&m=122154657905306&w=2
  
 
 Interesting. Any reason why that wasn't merged ?

Ok, I had a look and it seems like he's rewriting the guest instructions
from the hypervisor.

I prefer having the guest rewrite its own instructions. That does mean
that the layout inside the magic page has to be fixed to a certain
extent (or we need the hypervisor to at least pass some kind of
description of where the various fields are), but that's a much better
approach, I believe.

The main reason is the user-readable SPRG4..7. Because the guest will
-not- trap when reading them, it will read the value from the real
underlying registers. However, when the writes to them are replaced by
writes to the magic page, the underlying register is not kept in sync
and things will break.

Thus I prefer having the guest itself replace those instructions with
magic page accesses in both cases (stores and loads); it becomes the
guest's responsibility to ensure it uses the magic page -only- and
doesn't trap on the actual instructions.
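
The failure mode can be shown with a toy model (plain C structs, no real SPR access; all names are hypothetical): a kernel write that still traps can be reflected into the real SPRG4 by the host, but a write patched into a magic-page store never reaches the host, so a later user-mode read through the non-trapping alias returns a stale value.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of one SPRG with a user-readable alias; not real hardware access. */
struct cpu_model {
    uint32_t hw_sprg4;     /* the real register, readable by user mode via SPRG4R */
    uint32_t magic_sprg4;  /* shadow copy in the magic page */
};

/* Kernel write that still traps: the host emulates it and can load the
 * new value into the real register before resuming the guest. */
void kernel_write_sprg4_trapped(struct cpu_model *c, uint32_t val)
{
    c->magic_sprg4 = val;
    c->hw_sprg4 = val;
}

/* Kernel write patched into a magic-page store: no trap, so nothing
 * updates the real register. */
void kernel_write_sprg4_patched(struct cpu_model *c, uint32_t val)
{
    c->magic_sprg4 = val;
}

/* User-mode read through the read-only alias: never traps, reads hardware. */
uint32_t user_read_sprg4r(const struct cpu_model *c)
{
    return c->hw_sprg4;
}
```

After a trapped write of 1 followed by a patched write of 2, user_read_sprg4r() still returns 1, which is why the guest has to move both sides of such an SPRG to the magic page together.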

We would thus continue trapping on the normal instructions and
emulating them the old way. (We can probably move that emulation to asm
code that is run before the switch back to the Linux mm, via the magic
page, and thus make the emulation much faster, but that's a different
deal.)

But still, the bulk of the patches for adding the cleaner paravirt
interfaces, the magic page etc... seems sane.

Cheers,
Ben.




Re: kvm BookE and SPRGs

2009-07-10 Thread Benjamin Herrenschmidt
On Fri, 2009-07-10 at 11:18 +0200, Alexander Graf wrote:

 The only problem I see is that the firmware lives in the high 4k, so
 we'd have to have some sort of enabling HV-call too.

What firmware, out of curiosity? The treeboot thingy? And yes, we
definitely need an enabling HV call, i.e., we stick to traps until it's
enabled. We are talking about virtual space here, so the kernel takes
over and puts what it wants up there for BookE. For server, we don't
really care much what's in the ROM after we have booted either.

But yes, my idea did involve an enabling HV call.
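
A hypothetical sketch of the enabling call: until the guest makes it, the host keeps trapping and emulating as today, and only afterwards does it map the -4K page, and only while the guest is in supervisor mode. The hypercall number, return codes and structure are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hypercall number and return codes. */
enum { HC_ENABLE_MAGIC_PAGE = 1 };
enum { HC_OK = 0, HC_UNSUPPORTED = -1 };

struct vcpu {
    bool magic_enabled;   /* false: keep trapping and emulating SPR accesses */
    uint32_t magic_ea;    /* guest effective address of the shared page */
};

long handle_hcall(struct vcpu *v, unsigned long nr)
{
    switch (nr) {
    case HC_ENABLE_MAGIC_PAGE:
        /* The guest calls this once it no longer needs firmware up there. */
        v->magic_ea = 0xFFFFF000u;
        v->magic_enabled = true;
        return HC_OK;
    default:
        return HC_UNSUPPORTED;
    }
}

/* The extra TLB entry is only needed while running guest supervisor code. */
bool need_magic_mapping(const struct vcpu *v, bool guest_supervisor)
{
    return v->magic_enabled && guest_supervisor;
}
```

Deferring the mapping until the call also sidesteps the firmware-in-the-top-4K concern raised below, since the guest decides when that region is free.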

Cheers,
Ben.




Re: kvm BookE and SPRGs

2009-07-10 Thread Alexander Graf


On 10.07.2009, at 11:39, Benjamin Herrenschmidt wrote:


 On Fri, 2009-07-10 at 11:18 +0200, Alexander Graf wrote:
 
  The only problem I see is that the firmware lives in the high 4k, so
  we'd have to have some sort of enabling HV-call too.
 
 What firmware, out of curiosity? The treeboot thingy? And yes, we
 definitely need an enabling HV call, i.e., we stick to traps until it's

On PPC32 openbios is somewhere up there. On PPC64 openbios stays where  
it was on PPC32, so it's fine.


Alex


Re: kvm BookE and SPRGs

2009-07-10 Thread Benjamin Herrenschmidt
On Fri, 2009-07-10 at 11:39 +0200, Alexander Graf wrote:
 
 
 Oh, so we could have the emulation code mapped into the guest and could
 just jump there from our trampoline code, so all page faults and other
 fun traps still work.
 
 That'd be nice :-)

We can put -some- code in there yes, but some things will still have to
do the big switch over to linux.

Again, in your case, let's get your thingy stable and merged first, but
I may toy with the magic page on the BookE KVM if I have some spare
time.

Cheers,
Ben.




Re: kvm BookE and SPRGs

2009-07-10 Thread Benjamin Herrenschmidt
On Fri, 2009-07-10 at 11:43 +0200, Alexander Graf wrote:
  What firmware, out of curiosity? The treeboot thingy? And yes, we
  definitely need an enabling HV call, i.e., we stick to traps until it's
 
 On PPC32 openbios is somewhere up there. On PPC64 openbios stays where
 it was on PPC32, so it's fine.
 
Ok, but that's no big deal, we'll only enable it once we don't need the
FW anymore.

Cheers,
Ben.




RE: kvm BookE and SPRGs

2009-07-10 Thread Hollis Blanchard
On Fri, 2009-07-10 at 19:17 +1000, Benjamin Herrenschmidt wrote:
 On Fri, 2009-07-10 at 17:15 +0800, Liu Yu-B13201 wrote:
  Sounds reasonable.
  
  There are some old patchsets that implemented the binary patching Ben
  described.
  
  http://marc.info/?l=kvm-ppc&m=122154653905212&w=2
  http://marc.info/?l=kvm-ppc&m=122154657905306&w=2
  
 
 Interesting. Any reason why that wasn't merged ?

Basically because we ran out of manpower to maintain it. We didn't want
to push PV changes into upstream Linux, useful only to us, and then
disappear.

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: kvm BookE and SPRGs

2009-07-10 Thread Hollis Blanchard
On Fri, 2009-07-10 at 18:10 +1000, Benjamin Herrenschmidt wrote:
 On Fri, 2009-07-10 at 16:31 +1000, Benjamin Herrenschmidt wrote:
  
  I was roaming through kernel usage of SPRGs and noticed a small detail
  in kvmppc for BookE: is there any reason why, in OP_31_XOP_MTSPR, you
  open-coded the emulation of SPRG0..3, while SPRG4..7 are handled
  in kvmppc_core_emulate_mtspr()?
  
  It occurs to me that, for both MTSPR and MFSPR, the code should in fact
  be moved into kvmppc_core_emulate_mtspr() and
  kvmppc_core_emulate_mfspr() for consistency.
  
  Also, from looking at the FSL BookE code, it seems that there is such a
  thing as SPRG9 (so I suppose there must be an SPRG8 somewhere too);
  shouldn't we handle it too?
 
 BTW, that leads me to another question (CC'ing Avi there too): what is
 the policy regarding paravirtualization? I.e., are we OK with adding
 paravirt tricks to speed things up?

Yes, that's fine. We would rather not *require* paravirtualization
though.

 A prime example I have in mind, which could help a lot here, is to have
 a shared page, mapped at -4K (at the top of the address space) only
 while the guest is in supervisor mode, that hosts part of the current
 VCPU supervisor register state.
...
 The cost of course is an additional TLB entry for mapping that -4K page
 (but only when running guest kernel code).

It was a net win when Christian implemented it last year. While the
first access may miss in the TLB, these register accesses tend to come
in bunches (i.e. the guest interrupt vectors).

-- 
Hollis Blanchard
IBM Linux Technology Center
