Re: re-writing on powerpc

2010-12-14 Thread Avi Kivity

On 12/13/2010 09:03 PM, Scott Wood wrote:


  The interface is a lot simpler.  The guest decides what to patch and
  where to jump.  A "please patch me" flag needs a ton of documentation on
  what "patch" means and what the constraints on the guest environment are.


The constraints need to be documented, but I think "a ton" is a bit of
an exaggeration


I guess.  It's correct for x86 (which has four processor modes, and you 
need to consider segmentation, etc.), perhaps not so much for powerpc.



  -- and having the guest do the patching itself means
that the structure of the shared page must become stable ABI.


It has to be a stable ABI in any case so you can live migrate.  Unless 
you want the hypervisor to unpatch or something.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: re-writing on powerpc

2010-12-14 Thread Avi Kivity

On 12/14/2010 02:24 AM, Alexander Graf wrote:

On 14.12.2010, at 01:18, Scott Wood wrote:

  On Tue, 14 Dec 2010 00:54:38 +0100
  Alexander Graf a...@csgraf.de wrote:

  On 13.12.2010, at 20:03, Scott Wood wrote:
  [1] Speaking of which, what happens when an interrupt is raised in the
  middle of a paravirt critical section?  KVM will hold off the
  interrupt delivery if it sees the critical flag set, but when will it
  deliver the postponed interrupt?  Seems like it will wait until the next
  time an exit happens for some other reason.

  mtmsr with IF=1 checks for pending interrupts and, if there are any,
  enables them with a real mtmsr, which in turn checks for interrupts on
  VM entry, so the interrupt gets injected immediately :).

  Right, but I'm not talking about an interrupt that happens when the
  virtual EE bit is zero.  I'm talking about an interrupt that happens
  right in the middle of the paravirt sequence -- after reading int_pending,
  but before setting critical to r2.

  It seems like the race window is just narrowed, not eliminated.

Hrm, is that window really that important? There's usually plenty of interrupts 
and mmios coming through to always have some check going on.


What about when "usually" doesn't happen?  Tickless kernel, everything's 
asleep, interrupt missed, system is dead.
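
To make the window concrete, here is a minimal Python sketch of the sequence Scott describes. It is a toy model, not KVM code: `MagicPage`, `host_raise_interrupt` and `guest_patched_mtmsr` are hypothetical names, and `critical` is modelled as a plain boolean for simplicity.

```python
from dataclasses import dataclass


@dataclass
class MagicPage:
    """Toy model of the shared page; only the two fields named in the thread."""
    critical: bool = False     # guest is inside a paravirt critical section
    int_pending: bool = False  # host has an interrupt it could not deliver yet


def host_raise_interrupt(page: MagicPage) -> bool:
    """Host side. Returns True if the interrupt is injected right away."""
    if page.critical:
        page.int_pending = True  # held off until some later exit
        return False
    return True


def guest_patched_mtmsr(page: MagicPage, irq_in_window: bool) -> bool:
    """Guest side. Returns True if the guest exits to pick up an interrupt."""
    page.critical = True
    saw_pending = page.int_pending   # read int_pending
    if irq_in_window:
        host_raise_interrupt(page)   # arrives after the read, before the end
    page.critical = False            # end of the critical section
    return saw_pending


page = MagicPage()
exited = guest_patched_mtmsr(page, irq_in_window=True)
print(exited, page.int_pending)  # False True
```

In this model the guest leaves the critical section without exiting while `int_pending` is still set, so nothing delivers the interrupt until an unrelated exit happens.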





Re: re-writing on powerpc

2010-12-14 Thread Avi Kivity

On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
Rewriting is dangerous if the guest is unaware of it.  As soon as it 
is made aware of it, it might as well actually do it in the best way 
that suits it.


Can you list some examples of dangerous scenarios?



- guest checksums own kernel pages
- clever compiler reuses code for constant pool
- guest patches itself (a la linux alternatives), surprised when it sees 
a different instruction
- guest jits own kernel code (like Singularity), gets confused when it 
reads back something it didn't write
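
The first and last items can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the instruction words are placeholders, `hypervisor_rewrite` is a hypothetical helper, and CRC32 stands in for whatever integrity check a guest might use.

```python
import zlib

# Placeholder instruction words, chosen for illustration only.
PRIVILEGED_INSN = bytes.fromhex("7c600124")
BRANCH_TO_HANDLER = bytes.fromhex("48000100")

kernel_text = bytearray(bytes.fromhex("60000000") + PRIVILEGED_INSN)
boot_checksum = zlib.crc32(kernel_text)  # guest records this at boot


def hypervisor_rewrite(text: bytearray, offset: int) -> None:
    """Replace one instruction word without telling the guest."""
    text[offset:offset + 4] = BRANCH_TO_HANDLER


hypervisor_rewrite(kernel_text, 4)

# Guest re-checks its own text later on.
checksum_ok = zlib.crc32(kernel_text) == boot_checksum
readback_ok = bytes(kernel_text[4:8]) == PRIVILEGED_INSN
print(checksum_ok, readback_ok)  # False False
```

Both checks fail only because the guest was not told about the rewrite; a guest that did the patching itself would record the checksum afterwards.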





Re: re-writing on powerpc

2010-12-14 Thread Alexander Graf

On 14.12.2010 at 09:42, Avi Kivity a...@redhat.com wrote:

 On 12/14/2010 02:24 AM, Alexander Graf wrote:
 On 14.12.2010, at 01:18, Scott Wood wrote:
 
   On Tue, 14 Dec 2010 00:54:38 +0100
   Alexander Graf a...@csgraf.de wrote:
 
   On 13.12.2010, at 20:03, Scott Wood wrote:
   [1] Speaking of which, what happens when an interrupt is raised in the
   middle of a paravirt critical section?  KVM will hold off the
   interrupt delivery if it sees the critical flag set, but when will it
   deliver the postponed interrupt?  Seems like it will wait until the next
   time an exit happens for some other reason.
 
   mtmsr with IF=1 checks for pending interrupts and, if there are any,
   enables them with a real mtmsr, which in turn checks for interrupts on
   VM entry, so the interrupt gets injected immediately :).
 
   Right, but I'm not talking about an interrupt that happens when the
   virtual EE bit is zero.  I'm talking about an interrupt that happens
   right in the middle of the paravirt sequence -- after reading int_pending,
   but before setting critical to r2.
 
   It seems like the race window is just narrowed, not eliminated.
 
 Hrm, is that window really that important? There's usually plenty of 
 interrupts and mmios coming through to always have some check going on.
 
 What about when "usually" doesn't happen?  Tickless kernel, everything's 
 asleep, interrupt missed, system is dead.

Even tickless guest kernels will get out of guest context from time to time, 
simply because if there are no interrupts on the host, the host is useless - it 
would only have a single, isolated task running that wouldn't even be able to 
use the network.

But yes, if we can go without black spots, we should :)


Alex

 


Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Tue, 14 Dec 2010 01:24:50 +0100
Alexander Graf a...@csgraf.de wrote:

 
 On 14.12.2010, at 01:18, Scott Wood wrote:
 
  Right, but I'm not talking about an interrupt that happens when the
  virtual EE bit is zero.  I'm talking about an interrupt that happens
  right in the middle of the paravirt sequence -- after reading int_pending,
  but before setting critical to r2.
  
  It seems like the race window is just narrowed, not eliminated.
 
 Hrm, is that window really that important? There's usually plenty of 
 interrupts and mmios coming through to always have some check going on.

It could be important for realtime loads, tickless systems (especially
if the Linux host eventually grows the ability to be tickless even
when things are running), etc., and it makes me nervous in general.

It's not something that's going to be causing problems all the time,
though.

 If it really is important, we could also check int_pending right after the 
 critical section and just do a nop exit.

Doesn't checking int_pending require clobbering registers, which is why
we have the critical section in the first place?

 That way we worst case waste a few cycles for the useless guest exit,
 but always fetch interrupts immediately when they occur.

What useless guest exit?  Either we exit when we see an interrupt
pending (in which case it's not useless), or we exit all the time, and
then what's the point of the paravirt?

-Scott



Re: re-writing on powerpc

2010-12-14 Thread Alexander Graf
Scott Wood wrote:
 On Tue, 14 Dec 2010 01:24:50 +0100
 Alexander Graf a...@csgraf.de wrote:

   
 On 14.12.2010, at 01:18, Scott Wood wrote:

 
 Right, but I'm not talking about an interrupt that happens when the
 virtual EE bit is zero.  I'm talking about an interrupt that happens
 right in the middle of the paravirt sequence -- after reading int_pending,
 but before setting critical to r2.

 It seems like the race window is just narrowed, not eliminated.
   
 Hrm, is that window really that important? There's usually plenty of 
 interrupts and mmios coming through to always have some check going on.
 

 It could be important for realtime loads, tickless systems (especially
 if the Linux host eventually grows the ability to be tickless even
 when things are running), etc., and it makes me nervous in general.

 It's not something that's going to be causing problems all the time,
 though.
   

I agree - it's certainly wrong.

 If it really is important, we could also check int_pending right after the 
 critical section and just do a nop exit.
 

 Doesn't checking int_pending require clobbering registers, which is why
 we have the critical section in the first place?
   

The critical section is to prevent us from overwriting the scratch
registers, yeah. And I think you're right - I had a thinko last night.

If we see that we should inject an interrupt, but we're inside of a
critical section, we could set the magic page to r/o and try to find the
critical end at which point we can just inject.

   
 That way we worst case waste a few cycles for the useless guest exit,
 but always fetch interrupts immediately when they occur.
 

 What useless guest exit?  Either we exit when we see an interrupt
 pending (in which case it's not useless), or we exit all the time, and
 then what's the point of the paravirt?
   

I was thinking of a case where we get a few false positives. But again,
I probably just had a bad thought :)


Alex



Re: re-writing on powerpc

2010-12-14 Thread Hollis Blanchard

On 12/14/2010 12:48 AM, Avi Kivity wrote:

On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
Rewriting is dangerous if the guest is unaware of it.  As soon as it 
is made aware of it, it might as well actually do it in the best way 
that suits it.


Can you list some examples of dangerous scenarios?

Perhaps I should rephrase... any real-world dangerous scenarios? :) I 
was hoping you could share some traps you've hit with Linux or Windows 
on x86.

  - guest checksums own kernel pages

For runtime intrusion detection? Such guests can simply not ask the 
hypervisor to enable the rewriting feature.

  - clever compiler reuses code for constant pool

Not sure what you mean here. Anyway, I think clever compilers are 
irrelevant, since a compiler will not ordinarily emit a supervisor-mode 
instruction. The hypervisor has no need to patch normal user-mode 
instructions.

  - guest patches itself (a la linux alternatives), surprised when it 
  sees a different instruction

PowerPC Linux does patch itself, which is a write-only operation.

  - guest jits own kernel code (like Singularity), gets confused when it 
  reads back something it didn't write

This is getting really hypothetical, but why would a JIT need to read 
the generated code?


Hollis Blanchard
Mentor Graphics, Embedded Systems Division




Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Tue, 14 Dec 2010 18:48:18 +0100
Alexander Graf ag...@suse.de wrote:

 The critical section is to prevent us from overwriting the scratch
 registers, yeah. And I think you're right - I had a thinko last night.
 
 If we see that we should inject an interrupt, but we're inside of a
 critical section, we could set the magic page to r/o and try to find the
 critical end at which point we can just inject.

Yeah, I thought of that as well -- but single stepping seemed better
than messing with MMU code (one less thing to check for on the TLB miss
path), and it shouldn't happen often enough, or for enough instructions,
to be a performance issue.

-Scott



Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Tue, 14 Dec 2010 10:40:57 +0200
Avi Kivity a...@redhat.com wrote:

 On 12/13/2010 09:03 PM, Scott Wood wrote:
  
The interface is a lot simpler.  The guest decides what to patch and
where to jump.  A "please patch me" flag needs a ton of documentation on
what "patch" means and what the constraints on the guest environment are.
  
 
  The constraints need to be documented, but I think "a ton" is a bit of
  an exaggeration
 
 I guess.  It's correct for x86 (which has four processor modes, and you 
 need to consider segmentation, etc.), perhaps not so much for powerpc.

Yeah, x86 seems like it could be a mess.  We actually already wrote up
these constraints for PowerPC for an upcoming version of ePAPR.

-- and having the guest do the patching itself means
  that the structure of the shared page must become stable ABI.
 
 It has to be a stable ABI in any case so you can live migrate.  Unless 
 you want the hypervisor to unpatch or something.

Well, there's a difference between "stable among a set of
implementations within which you can live upgrade" and "stable among
all implementations that can run a guest without further
modification".  I'm thinking of things like completely different
hypervisors (not just KVM) being able to run the same guest image with
paravirt, newly added paravirts working on a guest that doesn't need
updating beyond the initial change to permit rewriting, etc.  And if
there is a mistake made that needs to be incompatibly corrected,
breaking live migration seems less bad than requiring guest code
changes.

I think there are good arguments for both ways -- I don't see any
overwhelming reason to change from what KVM is already doing.

-Scott



Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Tue, 14 Dec 2010 12:37:32 -0600
Scott Wood scottw...@freescale.com wrote:

 On Tue, 14 Dec 2010 18:48:18 +0100
 Alexander Graf ag...@suse.de wrote:
 
  The critical section is to prevent us from overwriting the scratch
  registers, yeah. And I think you're right - I had a thinko last night.
  
  If we see that we should inject an interrupt, but we're inside of a
  critical section, we could set the magic page to r/o and try to find the
  critical end at which point we can just inject.
 
 Yeah, I thought of that as well -- but single stepping seemed better
 than messing with MMU code (one less thing to check for on the TLB miss
 path), and it shouldn't happen often enough, or for enough instructions,
 to be a performance issue.

Well, the TLB path might not be so bad if it can reuse an existing
check for mapping the magic page in the first place -- but if an
interrupt happens immediately after setting critical, but before saving
scratch registers, the critical end will not be the next magic page
write.  So you'd still have to either single-step or emulate the stores
at least.

Or I suppose we could document that all magic page stores other than
ending critical must come before checking int_pending, though that seems
a bit ugly.
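
A rough Python sketch of why the first trapped write need not be the critical end. The instruction order in `SEQUENCE` is a hypothetical layout of one patched sequence, not taken from the actual KVM code, and `next_trapping_store` is an illustrative helper.

```python
# Hypothetical instruction order inside one patched sequence.
# Each entry is (kind, target); only stores can trap on a read-only page.
SEQUENCE = [
    ("store", "critical"),     # begin critical section
    ("store", "scratch1"),     # save scratch registers
    ("store", "scratch2"),
    ("store", "msr"),
    ("load", "int_pending"),
    ("load", "scratch1"),      # restore scratch registers
    ("load", "scratch2"),
    ("store", "critical"),     # end critical section
]


def next_trapping_store(irq_after: int) -> int:
    """Index of the first store executed after step `irq_after`."""
    for i in range(irq_after + 1, len(SEQUENCE)):
        if SEQUENCE[i][0] == "store":
            return i
    raise ValueError("no store follows this step")


early = next_trapping_store(0)  # interrupt right after setting critical
late = next_trapping_store(4)   # interrupt right after reading int_pending
print(SEQUENCE[early], SEQUENCE[late])
```

With an interrupt right after step 0, the first trap is the `scratch1` save, so the host would have to emulate or single-step past it. If every other store precedes the `int_pending` load, as in this layout, an interrupt that arrives after the load can only trap on the critical end.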

-Scott



Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Wed, 15 Dec 2010 00:00:08 +0100
Alexander Graf ag...@suse.de wrote:

 
 On 14.12.2010, at 21:04, Scott Wood wrote:
 
  Well, the TLB path might not be so bad if it can reuse an existing
  check for mapping the magic page in the first place -- but if an
  interrupt happens immediately after setting critical, but before saving
  scratch registers, the critical end will not be the next magic page
  write.  So you'd still have to either single-step or emulate the stores
  at least.
 
 We could also move the critical value to its own page, so we only have to 
 trap that one :).

Stable ABI...

  Or I suppose we could document that all magic page stores other than
  ending critical must come before checking int_pending, though that seems
  a bit ugly.
 
 That one's very hard to do with live binary patching

Sorry, I was only talking about stores within a critical section -- not
unrelated stores that other patched instructions might do.

So that once KVM has an interrupt to deliver, and sees that critical is
engaged, it knows that the next magic page store will resolve things.
Either it is a store to critical, and KVM can now deliver the
interrupt -- or it is some other store (scratch or MSR itself) and thus
int_pending has not yet been checked.

I don't think it would be a problem for live patching.  It just seems a
bit icky.

-Scott



Re: re-writing on powerpc

2010-12-14 Thread Alexander Graf

On 15.12.2010, at 00:17, Scott Wood wrote:

 On Wed, 15 Dec 2010 00:00:08 +0100
 Alexander Graf ag...@suse.de wrote:
 
 
 On 14.12.2010, at 21:04, Scott Wood wrote:
 
 Well, the TLB path might not be so bad if it can reuse an existing
 check for mapping the magic page in the first place -- but if an
 interrupt happens immediately after setting critical, but before saving
 scratch registers, the critical end will not be the next magic page
 write.  So you'd still have to either single-step or emulate the stores
 at least.
 
 We could also move the critical value to its own page, so we only have to 
 trap that one :).
 
 Stable ABI...
 
 Or I suppose we could document that all magic page stores other than
 ending critical must come before checking int_pending, though that seems
 a bit ugly.
 
 That one's very hard to do with live binary patching
 
 Sorry, I was only talking about stores within a critical section -- not
 unrelated stores that other patched instructions might do.
 
 So that once KVM has an interrupt to deliver, and sees that critical is
 engaged, it knows that the next magic page store will resolve things.
 Either it is a store to critical, and KVM can now deliver the
 interrupt -- or it is some other store (scratch or MSR itself) and thus
 int_pending has not yet been checked.
 
 I don't think it would be a problem for live patching.  It just seems a
 bit icky.

Oh, because you'd only trap stores, but no writes? Yep, that would work.
The hard part here is that currently the ppc kvm emulator treats every memory 
write trap as mmio. But that's changeable.

I actually like that idea. It's probably the cleanest we can get away with 
without deep modifications of the guest. Single-step is always icky.

Thinking about the whole thing - can't we create an interrupt notification 
page? Some page that is always mapped read-only when interrupts are available, 
but read-write when they're not? Then we could just do an unconditional store 
after the crit section is done and everyone's happy.
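
A minimal Python sketch of the notification page idea, under the assumption that the host flips the mapping to read-only whenever it defers an interrupt. `NotificationPage` and its methods are hypothetical names for illustration, not an existing interface.

```python
class NotificationPage:
    """Toy model: write-protected while a deferred interrupt is waiting."""

    def __init__(self) -> None:
        self.read_only = False
        self.delivered = 0

    def host_defer_interrupt(self) -> None:
        """Host side: an interrupt arrived but could not be injected."""
        self.read_only = True

    def guest_store(self) -> bool:
        """Unconditional guest store after the critical section.

        Returns True if the store trapped to the host.
        """
        if not self.read_only:
            return False        # plain store, no exit
        self.read_only = False  # trap: host injects and remaps read-write
        self.delivered += 1
        return True


page = NotificationPage()
page.host_defer_interrupt()     # interrupt arrives mid critical section
trapped = page.guest_store()    # guest leaves the section and stores
print(trapped, page.delivered)  # True 1
```

The store costs nothing extra when no interrupt is waiting, and it always traps when one is, regardless of where in the critical section the interrupt arrived.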


Alex



Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Wed, 15 Dec 2010 00:29:40 +0100
Alexander Graf ag...@suse.de wrote:

 On 15.12.2010, at 00:17, Scott Wood wrote:
 
  So that once KVM has an interrupt to deliver, and sees that critical is
  engaged, it knows that the next magic page store will resolve things.
  Either it is a store to critical, and KVM can now deliver the
  interrupt -- or it is some other store (scratch or MSR itself) and thus
  int_pending has not yet been checked.
  
  I don't think it would be a problem for live patching.  It just seems a
  bit icky.
 
 Oh, because you'd only trap stores, but no writes? Yep, that would work.

writes or loads? :-)

 I actually like that idea. It's probably the cleanest we can get away with 
 without deep modifications of the guest. Single-step is always icky.

Well, there's another complication -- if we trap on the final store to
end the critical section, the critical section won't actually be ended
until after that instruction executes.  Which won't happen until we set
the page to read/write and let it go.  So we'd have to look at the
instruction to see what it's doing.
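
As a sketch of what that could look like, in Python rather than kernel code: the handler below applies the trapped store before testing the critical state. It assumes a critical section is active while the `critical` field equals the guest's r1, and `Vcpu` and `handle_magic_page_write_fault` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Vcpu:
    r1: int                     # guest stack pointer
    critical: int               # value currently held in the critical field
    irq_deferred: bool = False  # an interrupt is waiting for the section end
    irq_injected: bool = False


def handle_magic_page_write_fault(vcpu: Vcpu, target: str, value: int) -> None:
    """Called when a store to the write-protected magic page traps."""
    # The faulting store has not taken effect yet, so decode and apply it
    # before testing whether the critical section is over.
    if target == "critical":
        vcpu.critical = value
    still_critical = vcpu.critical == vcpu.r1
    if vcpu.irq_deferred and not still_critical:
        vcpu.irq_deferred = False
        vcpu.irq_injected = True


vcpu = Vcpu(r1=0x1000, critical=0x1000, irq_deferred=True)
handle_magic_page_write_fault(vcpu, "scratch1", 0)     # still critical
handle_magic_page_write_fault(vcpu, "critical", 0x2000)  # section ends
print(vcpu.irq_injected)  # True
```

Testing `critical` before applying the store would still see the old value and miss the end of the section.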

 Thinking about the whole thing - can't we create an interrupt notification 
 page? Some page that is always mapped read-only when interrupts are 
 available, but read-write when they're not?  Then we could just do an 
 unconditional store after the crit section is done and everyone's happy.

I'd limit it to interrupts that were deferred due to critical,
to avoid unnecessary MMU manipulation, and unnecessary traps when doing
mtmsr/wrtee if there's an interrupt pending and old EE = new EE = zero
(assuming the guest doesn't use a separate restore path for that case).

But otherwise sounds reasonable, if we're willing to change the
interface that much.  Does it even need to be read-only, or could it be
entirely unmapped when there's a pending interrupt?

-Scott
