RE: re-writing on powerpc

2010-12-17 Thread Benjamin Herrenschmidt
On Wed, 2010-12-15 at 11:32 +, Sethi Varun-B16395 wrote:

  kvm isn't aware of all guest mappings (only those that were instantiated
  in shadow tlb/pagetables).
 I am not sure I understand, but the guest would have to instantiate the
 mapping in the TLB (for BookE) before the page can be accessed.
 That's when we can set the access permissions.

But then you need to track them and add overhead to your TLB management,
which you really don't want.

Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: re-writing on powerpc

2010-12-17 Thread Benjamin Herrenschmidt
On Mon, 2010-12-13 at 09:12 -0800, Hollis Blanchard wrote:
 On 12/13/2010 12:42 AM, Alexander Graf wrote:
  Back when I implemented this, we did however have discussions on exactly 
  that distinction between patching in host or guest space and for some 
  reason I remember that you and Hollis figured that guest patching is 
  superior. I just really can't remember why and couldn't find traces of this 
  in my inbox either :).
 Don't blame me for this. :) My original patching (with Christian) was 
 done from host context, and those patches are in the list archives.
 
 As far as I remember, Ben H said he preferred patching from guest 
 context (mostly for unspecified or gut feeling reasons), and then 
 that's what you did. IIRC it was an IRC conversation, which is why it
 wouldn't be in your inbox.

I didn't want to mention rumors of patents I heard about...

Ben.




Re: re-writing on powerpc

2010-12-15 Thread Avi Kivity

On 12/14/2010 07:53 PM, Hollis Blanchard wrote:

On 12/14/2010 12:48 AM, Avi Kivity wrote:

On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
Rewriting is dangerous if the guest is unaware of it.  As soon as 
it is made aware of it, it might as well actually do it in the best 
way that suits it.


Can you list some examples of dangerous scenarios?

Perhaps I should rephrase... any real-world dangerous scenarios? :) 


That's much less fun.

I was hoping you could share some traps you've hit with Linux or 
Windows on x86.


We've hit a lot of issues with the very limited patching we do for 
Windows XP (Linux does its own patching):


- Windows hibernation saves the patched code, but not the payload, so we 
have to set up hooks to re-enable the payload when Windows resumes from 
hibernation
- We need the vcpu id in the payload code, and there is no easy way to get at
it.  After several weird hacks we settled on peeking at the Windows
processor control block, a guest-specific per-cpu data structure.
- Some patched instructions are called before the stack is set up, so 
the return doesn't work very well

- others I'm suppressing


- guest checksums own kernel pages
For runtime intrusion detection? Such guests can simply not ask the 
hypervisor to enable the rewriting feature.


Which is sad.


- clever compiler reuses code for constant pool
Not sure what you mean here. Anyways I think clever compilers are 
irrelevant, since a compiler will not ordinarily emit a 
supervisor-mode instruction. The hypervisor has no need to patch 
normal user-mode instructions.


I meant a really clever compiler.  And by using code for the constant
pool I meant using IP-relative addressing to fetch a constant using a small
offset.  If the constant happens to be a patched instruction, it won't
be so constant.


- guest patches itself (a la linux alternatives), surprised when it 
sees a different instruction

PowerPC Linux does patch itself, which is a write-only operation.


Other self-patchers might be different; say you use xor to toggle 
between two variants, reducing the amount of data you need to keep for 
patching.


- guest jits own kernel code (like Singularity), gets confused when 
it reads back something it didn't write
This is getting really hypothetical, but why would a JIT need to read 
the generated code?




Any weird hypothetical idea will be in mission-critical production use
somewhere, see Andreas's reply.


--
error compiling committee.c: too many arguments to function
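
A minimal sketch of the xor-toggle self-patcher mentioned above, assuming a guest that keeps only the xor of two instruction variants. The encodings are illustrative, and `HV_PATCH` is a hypothetical replacement instruction written by the hypervisor, not anything KVM is known to emit:

```python
# Two instruction variants the guest wants to switch between (illustrative).
VARIANT_A = 0x7C6000A6   # mfmsr r3
VARIANT_B = 0x60000000   # nop (ori r0,r0,0)

# The guest stores only the xor of the two variants to save space.
TOGGLE_MASK = VARIANT_A ^ VARIANT_B

# Hypothetical instruction the hypervisor writes over VARIANT_A.
HV_PATCH = 0x80600010    # lwz r3,16(0)


def guest_toggle(word):
    """Guest self-patching: flip between the variants by xor-ing the mask."""
    return word ^ TOGGLE_MASK


# Without hypervisor rewriting, toggling works in both directions.
unpatched_result = guest_toggle(VARIANT_A)

# With rewriting, the guest xors a word it never wrote.
memory = VARIANT_A
memory = HV_PATCH              # hypervisor patches behind the guest's back
memory = guest_toggle(memory)  # guest toggles, unaware of the patch
toggle_is_corrupt = memory not in (VARIANT_A, VARIANT_B)
```

The guest never reads the instruction in the ordinary sense, yet the read-modify-write still depends on the old contents, which is why "write-only" patching is the safe case.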



Re: re-writing on powerpc

2010-12-15 Thread Avi Kivity

On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote:


  -Original Message-
  From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-
  ow...@vger.kernel.org] On Behalf Of Avi Kivity
  Sent: Tuesday, December 14, 2010 9:18 PM
  To: Yoder Stuart-B08248
  Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org
  Subject: Re: re-writing on powerpc

  On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
   -Original Message-
   From: Avi Kivity [mailto:a...@redhat.com]
   Sent: Tuesday, December 14, 2010 2:49 AM
   To: Hollis Blanchard
   Cc: Yoder Stuart-B08248; Alexander Graf; kvm-ppc@vger.kernel.org
   Subject: Re: re-writing on powerpc

   On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
  Rewriting is dangerous if the guest is unaware of it.  As soon as it
  is made aware of it, it might as well actually do it in the best way
  that suits it.
   
  Can you list some examples of dangerous scenarios?
   

   - guest checksums own kernel pages
   - clever compiler reuses code for constant pool
   - guest patches itself (a la linux alternatives), surprised when it
     sees a different instruction
   - guest jits own kernel code (like Singularity), gets confused when
     it reads back something it didn't write
  
One possible solution to hiding rewriting from guest if it must be
hidden is to mark patched pages as execute only.  If a guest reads a
patched page, the hypervisor can fix up the read.
  

  Yes.  Something that is common to all the problems above is using code
  as data.

  However, execute only would only affect the page's mapping, not the page
  itself, yes?  So if the page has another mapping, this doesn't work.


But KVM would be aware of guest page mappings, so access permissions for any 
particular mapping
can be controlled by KVM.


kvm isn't aware of all guest mappings (only those that were instantiated 
in shadow tlb/pagetables).


--
error compiling committee.c: too many arguments to function
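
A minimal sketch of the aliasing concern raised above, assuming a toy model in which permissions belong to each mapping rather than to the physical page. The addresses, the `mappings` table and `guest_read` are all hypothetical and only illustrate why execute-only protection on one mapping does not hide a patch from a second mapping:

```python
ORIGINAL = "mfmsr r3"           # what the guest wrote
PATCHED = "lwz r3, magic(0)"    # what the hypervisor put there (illustrative)

# One physical page that now holds the patched instruction.
physical_memory = {0x1000: PATCHED}

# Two guest virtual mappings of the same physical page.
mappings = {
    0xC0001000: {"phys": 0x1000, "perms": {"x"}},       # marked execute-only
    0xD0001000: {"phys": 0x1000, "perms": {"r", "w"}},  # alias left readable
}


def guest_read(vaddr):
    """Model a guest load through one particular mapping."""
    mapping = mappings[vaddr]
    if "r" not in mapping["perms"]:
        # Read fault: the hypervisor fixes up the read with the original code.
        return ORIGINAL
    # No fault, so the guest sees the patched contents directly.
    return physical_memory[mapping["phys"]]


read_via_protected = guest_read(0xC0001000)
read_via_alias = guest_read(0xD0001000)
```

The fix-up only works if every mapping of the page is known and restricted, which is the point of the remark about mappings the hypervisor has not seen.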



RE: re-writing on powerpc

2010-12-15 Thread Sethi Varun-B16395


 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Wednesday, December 15, 2010 4:49 PM
 To: Sethi Varun-B16395
 Cc: Yoder Stuart-B08248; Hollis Blanchard; Alexander Graf;
 kvm-p...@vger.kernel.org
 Subject: Re: re-writing on powerpc
 
 On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote:
 
-Original Message-
From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-
   ow...@vger.kernel.org] On Behalf Of Avi Kivity
Sent: Tuesday, December 14, 2010 9:18 PM
To: Yoder Stuart-B08248
Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org
Subject: Re: re-writing on powerpc
  
On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Tuesday, December 14, 2010 2:49 AM
 To: Hollis Blanchard
 Cc: Yoder Stuart-B08248; Alexander Graf;
 kvm-p...@vger.kernel.org
 Subject: Re: re-writing on powerpc
  
 On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
Rewriting is dangerous if the guest is unaware of it.  As soon as it
is made aware of it, it might as well actually do it in the best way
that suits it.
 
Can you list some examples of dangerous scenarios?
 
  
 - guest checksums own kernel pages
 - clever compiler reuses code for constant pool
 - guest patches itself (a la linux alternatives), surprised when it
   sees a different instruction
 - guest jits own kernel code (like Singularity), gets confused when
   it reads back something it didn't write

 One possible solution to hiding rewriting from guest if it must be
 hidden is to mark patched pages as execute only.  If a guest reads a
 patched page, the hypervisor can fix up the read.

  
Yes.  Something that is common to all the problems above is using
code as data.

However, execute only would only affect the page's mapping, not the
page itself, yes?  So if the page has another mapping, this doesn't
work.
  
 
  But KVM would be aware of guest page mappings, so access permissions
  for any particular mapping can be controlled by KVM.
 
 kvm isn't aware of all guest mappings (only those that were instantiated
 in shadow tlb/pagetables).
I am not sure I understand, but the guest would have to instantiate the
mapping in the TLB (for BookE) before the page can be accessed.
That's when we can set the access permissions.

-Varun



Re: re-writing on powerpc

2010-12-15 Thread Avi Kivity

On 12/15/2010 01:32 PM, Sethi Varun-B16395 wrote:

  
But KVM would be aware of guest page mappings, so access permissions
for any particular mapping can be controlled by KVM.

  kvm isn't aware of all guest mappings (only those that were instantiated
  in shadow tlb/pagetables).
I am not sure I understand, but the guest would have to instantiate the
mapping in the TLB (for BookE) before the page can be accessed.
That's when we can set the access permissions.


You're right, for a shadow tlb kvm has all guest mappings at all times.

For page table models, it doesn't.

--
error compiling committee.c: too many arguments to function



Re: re-writing on powerpc

2010-12-14 Thread Avi Kivity

On 12/13/2010 09:03 PM, Scott Wood wrote:


  The interface is a lot simpler.  The guest decides what to patch and
  where to jump.  A "please patch me" flag needs a ton of documentation on
  what "patch" means and what the constraints on the guest environment are.


The constraints need to be documented, but I think "a ton" is a bit of
an exaggeration


I guess.  It's correct for x86 (which has four processor modes, and you 
need to consider segmentation, etc.), perhaps not so much for powerpc.



  -- and having the guest do the patching itself means
that the structure of the shared page must become stable ABI.


It has to be a stable ABI in any case so you can live migrate.  Unless 
you want the hypervisor to unpatch or something.


--
error compiling committee.c: too many arguments to function



Re: re-writing on powerpc

2010-12-14 Thread Avi Kivity

On 12/14/2010 02:24 AM, Alexander Graf wrote:

On 14.12.2010, at 01:18, Scott Wood wrote:

  On Tue, 14 Dec 2010 00:54:38 +0100
  Alexander Graf a...@csgraf.de wrote:

  On 13.12.2010, at 20:03, Scott Wood wrote:
  [1] Speaking of which, what happens when an interrupt is raised in the
  middle of a paravirt critical section?  KVM will hold off the
  interrupt delivery if it sees the critical flag set, but when will it
  deliver the postponed interrupt?  Seems like it will wait until the next
  time an exit happens for some other reason.

  mtmsr with IF=1 checks for pending interrupts and enables them with a real 
mtmsr then which again checks interrupts in vm entry, so it immediately gets injected 
:).

  Right, but I'm not talking about an interrupt that happens when the
  virtual EE bit is zero.  I'm talking about an interrupt that happens
  right in the middle of the paravirt sequence -- after reading int_pending,
  but before setting critical to r2.

  It seems like the race window is just narrowed, not eliminated.

Hrm, is that window really that important? There's usually plenty of interrupts 
and mmios coming through to always have some check going on.


What about when "usually" doesn't happen?  Tickless kernel, everything's
asleep, interrupt missed, system is dead.


--
error compiling committee.c: too many arguments to function
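
A minimal sketch of the race window discussed above, assuming a heavily simplified model of the paravirt sequence: the guest enters a critical section, reads `int_pending`, then leaves the critical section. The class, field and step names are hypothetical and do not reflect the real magic page layout:

```python
class Vcpu:
    def __init__(self):
        self.critical = False      # guest is inside a paravirt critical section
        self.int_pending = False   # host asks the guest to take an exit
        self.delivered = False     # interrupt has reached the guest

    def host_raise_interrupt(self):
        """Host side: hold off delivery while the critical flag is set."""
        if self.critical:
            self.int_pending = True
        else:
            self.delivered = True


def run(interrupt_before_step):
    """Run the guest sequence, raising one interrupt before the given step."""
    vcpu = Vcpu()
    saw_pending = False
    steps = ["enter_critical", "read_int_pending", "leave_critical"]
    for index, step in enumerate(steps):
        if index == interrupt_before_step:
            vcpu.host_raise_interrupt()
        if step == "enter_critical":
            vcpu.critical = True
        elif step == "read_int_pending":
            saw_pending = vcpu.int_pending
        elif step == "leave_critical":
            vcpu.critical = False
            if saw_pending:
                # Guest exits to the host, which injects the interrupt.
                vcpu.delivered = True
    return vcpu


outside_critical = run(0)    # interrupt arrives before the critical section
before_the_read = run(1)     # interrupt arrives before int_pending is read
after_the_read = run(2)      # interrupt arrives after the read: the race
```

In the last case the interrupt stays pending until some unrelated exit happens, which is the window that is narrowed but not eliminated.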



Re: re-writing on powerpc

2010-12-14 Thread Avi Kivity

On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
Rewriting is dangerous if the guest is unaware of it.  As soon as it 
is made aware of it, it might as well actually do it in the best way 
that suits it.


Can you list some examples of dangerous scenarios?



- guest checksums own kernel pages
- clever compiler reuses code for constant pool
- guest patches itself (a la linux alternatives), surprised when it sees 
a different instruction
- guest jits own kernel code (like Singularity), gets confused when it 
reads back something it didn't write


--
error compiling committee.c: too many arguments to function
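
A minimal sketch of the first scenario in the list above (a guest that checksums its own kernel pages), assuming illustrative PowerPC encodings. `hypervisor_patch`, `guest_checksum` and the replacement instruction are hypothetical, not KVM's actual interface:

```python
import struct
import zlib

NOP = 0x60000000            # ori r0,r0,0
MFMSR_R3 = 0x7C6000A6       # privileged instruction a hypervisor might rewrite
LWZ_R3_MAGIC = 0x80600010   # hypothetical replacement: lwz r3,16(0)


def pack(words):
    """Pack 32-bit big-endian instruction words into a byte string."""
    return b"".join(struct.pack(">I", word) for word in words)


def guest_checksum(text):
    """Guest-side integrity check over its own kernel text."""
    return zlib.crc32(text)


def hypervisor_patch(text):
    """Host-side rewriting: replace every mfmsr r3 with a magic page load."""
    words = struct.unpack(">%dI" % (len(text) // 4), text)
    return pack(LWZ_R3_MAGIC if word == MFMSR_R3 else word for word in words)


kernel_text = pack([NOP, MFMSR_R3, NOP, NOP])
expected = guest_checksum(kernel_text)      # recorded by the guest at boot

patched_text = hypervisor_patch(kernel_text)
checksum_matches = guest_checksum(patched_text) == expected
```

The guest did nothing wrong here, yet its integrity check fails, which is the sense in which rewriting behind the guest's back is dangerous.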



Re: re-writing on powerpc

2010-12-14 Thread Alexander Graf

Am 14.12.2010 um 09:42 schrieb Avi Kivity a...@redhat.com:

 On 12/14/2010 02:24 AM, Alexander Graf wrote:
 On 14.12.2010, at 01:18, Scott Wood wrote:
 
   On Tue, 14 Dec 2010 00:54:38 +0100
   Alexander Graf a...@csgraf.de wrote:
 
   On 13.12.2010, at 20:03, Scott Wood wrote:
   [1] Speaking of which, what happens when an interrupt is raised in the
   middle of a paravirt critical section?  KVM will hold off the
   interrupt delivery if it sees the critical flag set, but when will it
   deliver the postponed interrupt?  Seems like it will wait until the next
   time an exit happens for some other reason.
 
   mtmsr with IF=1 checks for pending interrupts and enables them with a 
  real mtmsr then which again checks interrupts in vm entry, so it 
  immediately gets injected :).
 
   Right, but I'm not talking about an interrupt that happens when the
   virtual EE bit is zero.  I'm talking about an interrupt that happens
   right in the middle of the paravirt sequence -- after reading int_pending,
   but before setting critical to r2.
 
   It seems like the race window is just narrowed, not eliminated.
 
 Hrm, is that window really that important? There's usually plenty of 
 interrupts and mmios coming through to always have some check going on.
 
 What about when "usually" doesn't happen?  Tickless kernel, everything's
 asleep, interrupt missed, system is dead.

Even tickless guest kernels will get out of guest context from time to time, 
simply because if there are no interrupts on the host, the host is useless - it 
would only have a single, isolated task running that wouldn't even be able to 
use the network.

But yes, if we can go without black spots, we should :)


Alex

 


Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Tue, 14 Dec 2010 01:24:50 +0100
Alexander Graf a...@csgraf.de wrote:

 
 On 14.12.2010, at 01:18, Scott Wood wrote:
 
  Right, but I'm not talking about an interrupt that happens when the
  virtual EE bit is zero.  I'm talking about an interrupt that happens
  right in the middle of the paravirt sequence -- after reading int_pending,
  but before setting critical to r2.
  
  It seems like the race window is just narrowed, not eliminated.
 
 Hrm, is that window really that important? There's usually plenty of 
 interrupts and mmios coming through to always have some check going on.

It could be important for realtime loads, tickless systems (especially
if the Linux host eventually grows the ability to be tickless even
when things are running), etc., and it makes me nervous in general.

It's not something that's going to be causing problems all the time,
though.

 If it really is important, we could also check int_pending right after the 
 critical section and just do a nop exit.

Doesn't checking int_pending require clobbering registers, which is why
we have the critical section in the first place?

 That way we worst case waste a few cycles for the useless guest exit,
 but always fetch interrupts immediately when they occur.

What useless guest exit?  Either we exit when we see an interrupt
pending (in which case it's not useless), or we exit all the time, and
then what's the point of the paravirt?

-Scott



Re: re-writing on powerpc

2010-12-14 Thread Alexander Graf
Scott Wood wrote:
 On Tue, 14 Dec 2010 01:24:50 +0100
 Alexander Graf a...@csgraf.de wrote:

   
 On 14.12.2010, at 01:18, Scott Wood wrote:

 
 Right, but I'm not talking about an interrupt that happens when the
 virtual EE bit is zero.  I'm talking about an interrupt that happens
 right in the middle of the paravirt sequence -- after reading int_pending,
 but before setting critical to r2.

 It seems like the race window is just narrowed, not eliminated.
   
 Hrm, is that window really that important? There's usually plenty of 
 interrupts and mmios coming through to always have some check going on.
 

 It could be important for realtime loads, tickless systems (especially
 if the Linux host eventually grows the ability to be tickless even
 when things are running), etc., and it makes me nervous in general.

 It's not something that's going to be causing problems all the time,
 though.
   

I agree - it's certainly wrong.

 If it really is important, we could also check int_pending right after the 
 critical section and just do a nop exit.
 

 Doesn't checking int_pending require clobbering registers, which is why
 we have the critical section in the first place?
   

The critical section is to prevent us from overwriting the scratch
registers, yeah. And I think you're right - I had a thinko last night.

If we see that we should inject an interrupt, but we're inside of a
critical section, we could set the magic page to r/o and try to find the
critical end at which point we can just inject.

   
 That way we worst case waste a few cycles for the useless guest exit,
 but always fetch interrupts immediately when they occur.
 

 What useless guest exit?  Either we exit when we see an interrupt
 pending (in which case it's not useless), or we exit all the time, and
 then what's the point of the paravirt?
   

I was thinking of a case where we get a few false positives. But again,
I probably just had a bad thought :)


Alex



Re: re-writing on powerpc

2010-12-14 Thread Hollis Blanchard

On 12/14/2010 12:48 AM, Avi Kivity wrote:

On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
Rewriting is dangerous if the guest is unaware of it.  As soon as it 
is made aware of it, it might as well actually do it in the best way 
that suits it.


Can you list some examples of dangerous scenarios?

Perhaps I should rephrase... any real-world dangerous scenarios? :) I 
was hoping you could share some traps you've hit with Linux or Windows 
on x86.

- guest checksums own kernel pages
For runtime intrusion detection? Such guests can simply not ask the 
hypervisor to enable the rewriting feature.

- clever compiler reuses code for constant pool
Not sure what you mean here. Anyways I think clever compilers are 
irrelevant, since a compiler will not ordinarily emit a supervisor-mode 
instruction. The hypervisor has no need to patch normal user-mode 
instructions.
- guest patches itself (a la linux alternatives), surprised when it 
sees a different instruction

PowerPC Linux does patch itself, which is a write-only operation.
- guest jits own kernel code (like Singularity), gets confused when it 
reads back something it didn't write
This is getting really hypothetical, but why would a JIT need to read 
the generated code?


Hollis Blanchard
Mentor Graphics, Embedded Systems Division




Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Tue, 14 Dec 2010 18:48:18 +0100
Alexander Graf ag...@suse.de wrote:

 The critical section is to prevent us from overwriting the scratch
 registers, yeah. And I think you're right - I had a thinko last night.
 
 If we see that we should inject an interrupt, but we're inside of a
 critical section, we could set the magic page to r/o and try to find the
 critical end at which point we can just inject.

Yeah, I thought of that as well -- but single stepping seemed better
than messing with MMU code (one less thing to check for on the TLB miss
path), and it shouldn't happen often enough, or for enough instructions,
to be a performance issue.

-Scott



Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Tue, 14 Dec 2010 10:40:57 +0200
Avi Kivity a...@redhat.com wrote:

 On 12/13/2010 09:03 PM, Scott Wood wrote:
  
The interface is a lot simpler.  The guest decides what to patch and
where to jump.  A "please patch me" flag needs a ton of documentation on
what "patch" means and what the constraints on the guest environment are.
  
 
  The constraints need to be documented, but I think "a ton" is a bit of
  an exaggeration
 
 I guess.  It's correct for x86 (which has four processor modes, and you 
 need to consider segmentation, etc.), perhaps not so much for powerpc.

Yeah, x86 seems like it could be a mess.  We actually already wrote up
these constraints for PowerPC for an upcoming version of ePAPR.

-- and having the guest do the patching itself means
  that the structure of the shared page must become stable ABI.
 
 It has to be a stable ABI in any case so you can live migrate.  Unless 
 you want the hypervisor to unpatch or something.

Well, there's a difference between "stable among a set of
implementations within which you can live upgrade" and "stable among
all implementations that can run a guest without further
modification".  I'm thinking of things like completely different
hypervisors (not just KVM) being able to run the same guest image with
paravirt, newly added paravirts working on a guest that doesn't need
updating beyond the initial change to permit rewriting, etc.  And if
there is a mistake made that needs to be incompatibly corrected,
breaking live migration seems less bad than requiring guest code
changes.

I think there are good arguments for both ways -- I don't see any
overwhelming reason to change from what KVM is already doing.

-Scott



Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Tue, 14 Dec 2010 12:37:32 -0600
Scott Wood scottw...@freescale.com wrote:

 On Tue, 14 Dec 2010 18:48:18 +0100
 Alexander Graf ag...@suse.de wrote:
 
  The critical section is to prevent us from overwriting the scratch
  registers, yeah. And I think you're right - I had a thinko last night.
  
  If we see that we should inject an interrupt, but we're inside of a
  critical section, we could set the magic page to r/o and try to find the
  critical end at which point we can just inject.
 
 Yeah, I thought of that as well -- but single stepping seemed better
 than messing with MMU code (one less thing to check for on the TLB miss
 path), and it shouldn't happen often enough, or for enough instructions,
 to be a performance issue.

Well, the TLB path might not be so bad if it can reuse an existing
check for mapping the magic page in the first place -- but if an
interrupt happens immediately after setting critical, but before saving
scratch registers, the critical end will not be the next magic page
write.  So you'd still have to either single-step or emulate the stores
at least.

Or I suppose we could document that all magic page stores other than
ending critical must come before checking int_pending, though that seems
a bit ugly.

-Scott



Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Wed, 15 Dec 2010 00:00:08 +0100
Alexander Graf ag...@suse.de wrote:

 
 On 14.12.2010, at 21:04, Scott Wood wrote:
 
  Well, the TLB path might not be so bad if it can reuse an existing
  check for mapping the magic page in the first place -- but if an
  interrupt happens immediately after setting critical, but before saving
  scratch registers, the critical end will not be the next magic page
  write.  So you'd still have to either single-step or emulate the stores
  at least.
 
 We could also move the critical value to its own page, so we only have to 
 trap that one :).

Stable ABI...

  Or I suppose we could document that all magic page stores other than
  ending critical must come before checking int_pending, though that seems
  a bit ugly.
 
 That one's very hard to do with live binary patching

Sorry, I was only talking about stores within a critical section -- not
unrelated stores that other patched instructions might do.

So that once KVM has an interrupt to deliver, and sees that critical is
engaged, it knows that the next magic page store will resolve things.
Either it is a store to critical, and KVM can now deliver the
interrupt -- or it is some other store (scratch or MSR itself) and thus
int_pending has not yet been checked.

I don't think it would be a problem for live patching.  It just seems a
bit icky.

-Scott
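
A minimal sketch of the "next magic page store resolves things" rule described above, assuming the documented ordering holds (every magic page store inside a critical section other than the one ending it comes before the `int_pending` check) and assuming the host emulates the trapped store. The state dictionary and field names are hypothetical:

```python
def new_state():
    """Toy per-vcpu state; the field names are made up for illustration."""
    return {
        "critical": True,          # guest is inside a critical section
        "int_pending": False,
        "deferred": False,         # host holds an undelivered interrupt
        "write_protected": False,  # magic page mapped read-only
    }


def defer_interrupt(state):
    """Host has an interrupt but critical is engaged: arm a store trap."""
    state["deferred"] = True
    state["int_pending"] = True
    state["write_protected"] = True


def on_magic_page_store_fault(state, field, value):
    """Handle the first magic page store after an interrupt was deferred."""
    state[field] = value               # emulate the trapped store
    state["write_protected"] = False   # one trap is enough
    if field == "critical" and not value:
        # The store ended the critical section: deliver right away.
        state["deferred"] = False
        return "inject"
    # Some other store (scratch or MSR): by the ordering rule the guest has
    # not read int_pending yet, so it will notice the interrupt by itself.
    return "resume"


ends_critical = new_state()
defer_interrupt(ends_critical)
first = on_magic_page_store_fault(ends_critical, "critical", False)

saves_scratch = new_state()
defer_interrupt(saves_scratch)
second = on_magic_page_store_fault(saves_scratch, "scratch1", 0x1234)
```

Either outcome needs only a single trap, which is why neither single-stepping nor repeated faults are required under this rule.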



Re: re-writing on powerpc

2010-12-14 Thread Alexander Graf

On 15.12.2010, at 00:17, Scott Wood wrote:

 On Wed, 15 Dec 2010 00:00:08 +0100
 Alexander Graf ag...@suse.de wrote:
 
 
 On 14.12.2010, at 21:04, Scott Wood wrote:
 
 Well, the TLB path might not be so bad if it can reuse an existing
 check for mapping the magic page in the first place -- but if an
 interrupt happens immediately after setting critical, but before saving
 scratch registers, the critical end will not be the next magic page
 write.  So you'd still have to either single-step or emulate the stores
 at least.
 
 We could also move the critical value to its own page, so we only have to 
 trap that one :).
 
 Stable ABI...
 
 Or I suppose we could document that all magic page stores other than
 ending critical must come before checking int_pending, though that seems
 a bit ugly.
 
 That one's very hard to do with live binary patching
 
 Sorry, I was only talking about stores within a critical section -- not
 unrelated stores that other patched instructions might do.
 
 So that once KVM has an interrupt to deliver, and sees that critical is
 engaged, it knows that the next magic page store will resolve things.
 Either it is a store to critical, and KVM can now deliver the
 interrupt -- or it is some other store (scratch or MSR itself) and thus
 int_pending has not yet been checked.
 
 I don't think it would be a problem for live patching.  It just seems a
 bit icky.

Oh, because you'd only trap stores, but no writes? Yep, that would work.
The hard part here is that currently the ppc kvm emulator treats every memory 
write trap as mmio. But that's changeable.

I actually like that idea. It's probably the cleanest we can get away with 
without deep modifications of the guest. Single-step is always icky.

Thinking about the whole thing - can't we create an interrupt notification 
page? Some page that is always mapped read-only when interrupts are available, 
but read-write when they're not? Then we could just do an unconditional store 
after the crit section is done and everyone's happy.


Alex



Re: re-writing on powerpc

2010-12-14 Thread Scott Wood
On Wed, 15 Dec 2010 00:29:40 +0100
Alexander Graf ag...@suse.de wrote:

 On 15.12.2010, at 00:17, Scott Wood wrote:
 
  So that once KVM has an interrupt to deliver, and sees that critical is
  engaged, it knows that the next magic page store will resolve things.
  Either it is a store to critical, and KVM can now deliver the
  interrupt -- or it is some other store (scratch or MSR itself) and thus
  int_pending has not yet been checked.
  
  I don't think it would be a problem for live patching.  It just seems a
  bit icky.
 
 Oh, because you'd only trap stores, but no writes? Yep, that would work.

writes or loads? :-)

 I actually like that idea. It's probably the cleanest we can get away with 
 without deep modifications of the guest. Single-step is always icky.

Well, there's another complication -- if we trap on the final store to
end the critical section, the critical section won't actually be ended
until after that instruction executes.  Which won't happen until we set
the page to read/write and let it go.  So we'd have to look at the
instruction to see what it's doing.

 Thinking about the whole thing - can't we create an interrupt notification 
 page? Some page that is always mapped read-only when interrupts are 
 available, but read-write when they're not?  Then we could just do an 
 unconditional store after the crit section is done and everyone's happy.

I'd limit it to interrupts that were deferred due to critical,
to avoid unnecessary MMU manipulation, and unnecessary traps when doing
mtmsr/wrtee if there's an interrupt pending and old EE = new EE = zero
(assuming the guest doesn't use a separate restore path for that case).

But otherwise sounds reasonable, if we're willing to change the
interface that much.  Does it even need to be read-only, or could it be
entirely unmapped when there's a pending interrupt?

-Scott
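
A minimal sketch of the interrupt notification page idea, including the refinement above that the page is only made unwritable for interrupts deferred due to critical. The `Hypervisor` class and its methods are hypothetical; a real implementation would change the mapping in the shadow TLB rather than a boolean:

```python
class Hypervisor:
    def __init__(self):
        self.page_writable = True   # notification page mapped read-write
        self.deferred = False       # interrupt held back by a critical section
        self.delivered = 0
        self.traps = 0

    def raise_interrupt(self, guest_in_critical):
        if guest_in_critical:
            # Defer delivery and make the guest's next store to the page trap.
            self.deferred = True
            self.page_writable = False
        else:
            self.delivered += 1

    def on_store_fault(self):
        """Guest stored to the protected page: it has left the critical section."""
        self.traps += 1
        self.page_writable = True
        if self.deferred:
            self.deferred = False
            self.delivered += 1


def guest_leave_critical(hv):
    """Guest side: after the critical section, do one unconditional store."""
    if not hv.page_writable:
        hv.on_store_fault()
    # Otherwise the store simply succeeds with no exit.


def scenario(interrupt_during_critical):
    hv = Hypervisor()
    if interrupt_during_critical:
        hv.raise_interrupt(guest_in_critical=True)
    guest_leave_critical(hv)
    return hv


with_interrupt = scenario(True)
without_interrupt = scenario(False)
```

The guest never has to read `int_pending` after the critical section, so no scratch register is clobbered, and the store costs an exit only when an interrupt was actually deferred.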



Re: re-writing on powerpc

2010-12-13 Thread Avi Kivity

On 12/13/2010 10:42 AM, Alexander Graf wrote:

On 13.12.2010, at 09:35, Avi Kivity wrote:

  On 12/13/2010 06:45 AM, Yoder Stuart-B08248 wrote:
  Avi/Hollis,

  Exchanged some emails with Alex on the topic of rewriting on
  powerpc KVM-- the current approach taken by Alex's PV patch is
  to have a guest Linux paravirt  itself, by re-writing certain
  instructions.

  The downside to this approach (guest side patching) is that every OS
  to be run on KVM has to be modified or dynamically patched.

  What were the reasons for not going down the path of doing the
  re-writing in the hypervisor?  (Alex couldn't remember the
  specifics.)  What about doing it from Qemu?


  Rewriting is dangerous if the guest is unaware of it.  As soon as it is made 
aware of it, it might as well actually do it in the best way that suits it.

Yeah, let me rephrase my exact memory on this:

If the HV just rewrites instructions in the guest, it behaves differently from 
real hw, which is bad. It could potentially break checksumming inside the guest.
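
To make the checksumming concern concrete, here is a minimal sketch in Python (used only as a modelling language). It assumes a guest that keeps a CRC32 over its own text, which is an illustrative choice and not something stated in this thread. The three instruction words and the branch used as the patch are likewise just example values:

```python
import struct
import zlib

# Example guest text as 32-bit big-endian instruction words:
# mflr r0, mtmsr r3, blr.
GUEST_TEXT = [0x7C0802A6, 0x7C600124, 0x4E800020]


def text_checksum(words):
    """CRC32 over the instruction stream, as a guest self-check might compute it."""
    return zlib.crc32(b"".join(struct.pack(">I", w) for w in words))


def hypervisor_patch(words, index, new_word):
    """Model the hypervisor replacing one instruction behind the guest's back."""
    patched = list(words)
    patched[index] = new_word
    return patched


expected = text_checksum(GUEST_TEXT)                          # recorded by the guest
patched_text = hypervisor_patch(GUEST_TEXT, 1, 0x48000100)    # mtmsr -> branch
checksum_still_valid = text_checksum(patched_text) == expected
```

The guest's later self-check fails here even though the guest did nothing wrong, which is the "behaves differently from real hw" problem in miniature.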

If, however, the guest sends a hypercall to the HV saying "please patch me", or 
there's a flag at creation time to enable patching, I have a hard time finding a reason 
to do it inside the guest context.

Back when I implemented this, we did however have discussions on exactly that 
distinction between patching in host or guest space and for some reason I 
remember that you and Hollis figured that guest patching is superior. I just 
really can't remember why and couldn't find traces of this in my inbox either 
:).


The interface is a lot simpler.  The guest decides what to patch and 
where to jump.  A "please patch me" flag needs a ton of documentation on 
what "patch" means and what the constraints on the guest environment are.


--
error compiling committee.c: too many arguments to function



Re: re-writing on powerpc

2010-12-13 Thread Avi Kivity

On 12/13/2010 07:12 PM, Hollis Blanchard wrote:

On 12/13/2010 12:42 AM, Alexander Graf wrote:
Back when I implemented this, we did however have discussions on 
exactly that distinction between patching in host or guest space and 
for some reason I remember that you and Hollis figured that guest 
patching is superior. I just really can't remember why and couldn't 
find traces of this in my inbox either :).
Don't blame me for this. :) My original patching (with Christian) was 
done from host context, and those patches are in the list archives.


As far as I remember, Ben H said he preferred patching from guest 
context (mostly for unspecified or gut feeling reasons), and then 
that's what you did. IIRC it was an IRC conversation, which is why it 
wouldn't be in your inbox.


Well, it's the right thing IMO.

--
error compiling committee.c: too many arguments to function



Re: re-writing on powerpc

2010-12-13 Thread Scott Wood
On Mon, 13 Dec 2010 10:45:30 +0200
Avi Kivity a...@redhat.com wrote:

 On 12/13/2010 10:42 AM, Alexander Graf wrote:
  Yeah, let me rephrase my exact memory on this:
 
  If the HV just rewrites instructions in the guest, it behaves differently 
  from real hw, which is bad. It could potentially break checksumming inside 
  the guest.
 
  If, however, the guest sends a hypercall to the HV saying "please patch me", 
  or there's a flag at creation time to enable patching, I have a hard time 
  finding a reason to do it inside the guest context.
 
  Back when I implemented this, we did however have discussions on exactly 
  that distinction between patching in host or guest space and for some 
  reason I remember that you and Hollis figured that guest patching is 
  superior. I just really can't remember why and couldn't find traces of this 
  in my inbox either :).
 
 The interface is a lot simpler.  The guest decides what to patch and 
 where to jump.  A "please patch me" flag needs a ton of documentation on 
 what "patch" means and what the constraints on the guest environment are.
 

The constraints need to be documented, but I think "a ton" is a bit of
an exaggeration -- and having the guest do the patching itself means
that the structure of the shared page must become stable ABI.  Having
the hypervisor do the bulk of the work also makes it easier to add
paravirt to new OSes (in the embedded world, often the reason someone
wants to do virtualization is to run some custom OS alongside Linux).

OTOH, having the guest do it makes it easier to do more complex
rewriting such as mtmsr[1].  And the fact that we've already got an
implementation makes for a compelling tie-breaker.

-Scott

[1] Speaking of which, what happens when an interrupt is raised in the
middle of a paravirt critical section?  KVM will hold off the
interrupt delivery if it sees the critical flag set, but when will it
deliver the postponed interrupt?  Seems like it will wait until the next
time an exit happens for some other reason.



Re: re-writing on powerpc

2010-12-13 Thread Alexander Graf

On 13.12.2010, at 20:03, Scott Wood wrote:

 On Mon, 13 Dec 2010 10:45:30 +0200
 Avi Kivity a...@redhat.com wrote:
 
 On 12/13/2010 10:42 AM, Alexander Graf wrote:
 Yeah, let me rephrase my exact memory on this:
 
 If the HV just rewrites instructions in the guest, it behaves differently 
 from real hw, which is bad. It could potentially break checksumming inside 
 the guest.
 
 If, however, the guest sends a hypercall to the HV saying "please patch me", 
 or there's a flag at creation time to enable patching, I have a hard time 
 finding a reason to do it inside the guest context.
 
 Back when I implemented this, we did however have discussions on exactly 
 that distinction between patching in host or guest space and for some 
 reason I remember that you and Hollis figured that guest patching is 
 superior. I just really can't remember why and couldn't find traces of this 
 in my inbox either :).
 
 The interface is a lot simpler.  The guest decides what to patch and 
 where to jump.  A "please patch me" flag needs a ton of documentation on 
 what "patch" means and what the constraints on the guest environment are.
 
 
 The constraints need to be documented, but I think "a ton" is a bit of
 an exaggeration -- and having the guest do the patching itself means
 that the structure of the shared page must become stable ABI.  Having
 the hypervisor do the bulk of the work also makes it easier to add
 paravirt to new OSes (in the embedded world, often the reason someone
 wants to do virtualization is to run some custom OS alongside Linux).
 
 OTOH, having the guest do it makes it easier to do more complex
 rewriting such as mtmsr[1].  And the fact that we've already got an
 implementation makes for a compelling tie-breaker.
 
 -Scott
 
 [1] Speaking of which, what happens when an interrupt is raised in the
 middle of a paravirt critical section?  KVM will hold off the
 interrupt delivery if it sees the critical flag set, but when will it
 deliver the postponed interrupt?  Seems like it will wait until the next
 time an exit happens for some other reason.

mtmsr with EE=1 checks for pending interrupts and, if there are any, enables 
them with a real mtmsr, which in turn checks for interrupts on VM entry, so the 
interrupt gets injected immediately :).
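
A small Python model of the path described above, as a sketch only. The class and method names are invented for illustration, and the real patched sequence is guest assembly rather than anything like this. The assumptions are that the rewritten mtmsr takes a fast path with no exit unless an interrupt is pending, and that a real mtmsr always exits to the hypervisor:

```python
class Vcpu:
    """Toy model of the guest-visible state plus hypervisor entry logic."""

    def __init__(self):
        self.virtual_ee = False    # guest's view of MSR[EE]
        self.int_pending = False   # set by the hypervisor when it defers
        self.exits = 0             # traps into the hypervisor
        self.delivered = 0         # interrupts injected

    def vm_entry(self):
        """Hypervisor: on every guest entry, inject if enabled and pending."""
        if self.virtual_ee and self.int_pending:
            self.int_pending = False
            self.delivered += 1

    def real_mtmsr(self, ee):
        """Privileged instruction: traps, is emulated, then the guest is re-entered."""
        self.exits += 1
        self.virtual_ee = ee
        self.vm_entry()

    def patched_mtmsr(self, ee):
        """Guest-side replacement: only falls back to the real instruction
        when enabling interrupts while one is pending."""
        if ee and self.int_pending:
            self.real_mtmsr(ee)
        else:
            self.virtual_ee = ee   # fast path, no exit


busy = Vcpu()
busy.int_pending = True
busy.patched_mtmsr(True)           # slow path: one exit, one injection

idle = Vcpu()
idle.patched_mtmsr(True)           # fast path: no exit at all
```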


Alex



Re: re-writing on powerpc

2010-12-13 Thread Scott Wood
On Tue, 14 Dec 2010 00:54:38 +0100
Alexander Graf a...@csgraf.de wrote:

 On 13.12.2010, at 20:03, Scott Wood wrote:
  [1] Speaking of which, what happens when an interrupt is raised in the
  middle of a paravirt critical section?  KVM will hold off the
  interrupt delivery if it sees the critical flag set, but when will it
  deliver the postponed interrupt?  Seems like it will wait until the next
  time an exit happens for some other reason.
 
 mtmsr with EE=1 checks for pending interrupts and, if there are any, enables 
 them with a real mtmsr, which in turn checks for interrupts on VM entry, so the 
 interrupt gets injected immediately :).

Right, but I'm not talking about an interrupt that happens when the
virtual EE bit is zero.  I'm talking about an interrupt that happens
right in the middle of the paravirt sequence -- after reading int_pending,
but before setting critical to r2.

It seems like the race window is just narrowed, not eliminated.

One option would be for KVM to single-step the guest until critical !=
r1.  It should only be a few instructions, and it shouldn't happen very
often.  This is probably the better option.
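
The race and the single-step fix can be modelled roughly as follows. This is a Python sketch with invented names, not KVM code. The four-step sequence is a simplification of the patched code, and `R1`/`R2` are arbitrary stand-ins for the register values the critical flag is compared against:

```python
R1, R2 = 0x1000, 0x2000   # stand-ins for the guest's r1/r2; only R1 != R2 matters


class Guest:
    def __init__(self):
        self.critical = R2        # not in a critical section
        self.int_pending = False
        self.virtual_ee = False
        self.saw_pending = False
        self.pc = 0
        # Simplified patched "enable interrupts" sequence, one entry per step.
        self.steps = [self.enter_critical, self.read_pending,
                      self.set_ee, self.leave_critical]

    def enter_critical(self):
        self.critical = R1

    def read_pending(self):
        self.saw_pending = self.int_pending

    def set_ee(self):
        # The slow path (real mtmsr when saw_pending is True) is not
        # modelled; the runs below never take it.
        self.virtual_ee = True

    def leave_critical(self):
        self.critical = R2

    def done(self):
        return self.pc == len(self.steps)

    def step(self):
        self.steps[self.pc]()
        self.pc += 1


def raise_interrupt(guest, single_step):
    """Hypervisor side; returns True if the interrupt was injected."""
    guest.int_pending = True
    if guest.critical == R1:
        if not single_step:
            return False                  # deferred until some later exit
        while guest.critical == R1:       # single-step out of the section
            guest.step()
    if not guest.virtual_ee:
        return False
    guest.int_pending = False
    return True


def run(single_step):
    guest = Guest()
    guest.step()                          # enter_critical
    guest.step()                          # read_pending: nothing pending yet
    injected = raise_interrupt(guest, single_step)
    while not guest.done():
        guest.step()
    return injected, guest.int_pending
```

`run(False)` ends with the interrupt still pending and no exit in sight, while `run(True)` injects it as soon as the guest has stepped past the store that ends the critical section.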

Another option would be to dispense with the critical section
altogether, by having the guest assume that these instructions clobber
certain registers -- though that would not be pleasant to maintain, as
you'd have to verify every place the instruction is used, now or in the
future.  A variant of this would be to use an out-of-section annotation
(similar to get_user et al) so that each instance of the instruction has
to explicitly opt-in.  The rewritten code would be faster this way,
though it may not make any practical difference.
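
A sketch of the opt-in variant, again in Python and with hypothetical names. The idea borrowed from the get_user-style tables is only that annotated sites are recorded out of line. The table format, the address-to-word dictionary and the branch word used as the replacement are all assumptions made for the example:

```python
MTMSR_OPCODE = 0x7C000124      # mtmsr with the register field cleared
REG_FIELD_MASK = 0x03E00000    # bits holding the source register


def is_mtmsr(word):
    """True if the instruction word is an mtmsr from any register."""
    return (word & ~REG_FIELD_MASK & 0xFFFFFFFF) == MTMSR_OPCODE


def patch_opted_in(text, opt_in_addrs, replacement):
    """Return a copy of `text` (address -> instruction word) in which only
    the annotated mtmsr instructions are replaced."""
    patched = dict(text)
    for addr in opt_in_addrs:
        if not is_mtmsr(text[addr]):
            raise ValueError(f"annotation at {addr:#x} is not an mtmsr")
        patched[addr] = replacement
    return patched


TEXT = {0x100: 0x7C600124,   # mtmsr r3, annotated by the author
        0x104: 0x7C0802A6,   # mflr r0
        0x108: 0x7C600124}   # mtmsr r3, not annotated: left alone
OPT_IN = [0x100]             # what the out-of-line table would contain
BRANCH = 0x48000100          # placeholder branch to the replacement code

patched = patch_opted_in(TEXT, OPT_IN, BRANCH)
```

Only sites whose surrounding code is known to tolerate the clobbered registers end up in the table, so a new use of the instruction elsewhere is simply not rewritten.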

-Scott
