RE: re-writing on powerpc
On Wed, 2010-12-15 at 11:32 +, Sethi Varun-B16395 wrote:
>> kvm isn't aware of all guest mappings (only those that were instantiated in shadow tlb/pagetables).
> I am not sure if I understand, but the guest would have to instantiate the mapping in the tlb (for BookE) before the page can be accessed. That's when we can set the access permissions.

But then you need to track them and add overhead to your TLB management, which you really don't want.

Ben.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: re-writing on powerpc
On Mon, 2010-12-13 at 09:12 -0800, Hollis Blanchard wrote:
> On 12/13/2010 12:42 AM, Alexander Graf wrote:
>> Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
> Don't blame me for this. :) My original patching (with Christian) was done from host context, and those patches are in the list archives. As far as I remember, Ben H said he preferred patching from guest context (mostly for unspecified or gut feeling reasons), and then that's what you did. IIRC it was IRC conversation, which is why it wouldn't be in your inbox.

I didn't want to mention rumors of patents I heard about...

Ben.
Re: re-writing on powerpc
On 12/14/2010 07:53 PM, Hollis Blanchard wrote:
> On 12/14/2010 12:48 AM, Avi Kivity wrote:
>> On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>>>> Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
>>> Can you list some examples of dangerous scenarios?
> Perhaps I should rephrase... any real-world dangerous scenarios? :)

That's much less fun.

> I was hoping you could share some traps you've hit with Linux or Windows on x86.

We've hit a lot of issues with the very limited patching we do for Windows XP (Linux does its own patching):
- Windows hibernation saves the patched code, but not the payload, so we have to set up hooks to re-enable the payload when Windows resumes from hibernation
- We need the vcpu id in the payload code, and have no easy way to get at it. After several weird hacks we settled on peeking at the Windows processor control block, a guest specific per-cpu data structure.
- Some patched instructions are called before the stack is set up, so the return doesn't work very well
- others I'm suppressing

>> - guest checksums own kernel pages
> For runtime intrusion detection? Such guests can simply not ask the hypervisor to enable the rewriting feature.

Which is sad.

>> - clever compiler reuses code for constant pool
> Not sure what you mean here. Anyways I think clever compilers are irrelevant, since a compiler will not ordinarily emit a supervisor-mode instruction. The hypervisor has no need to patch normal user-mode instructions.

I meant a really clever compiler. And by using code for the constant pool I mean using IP-relative addressing to fetch a constant using a small offset. If the constant happens to be a patched instruction, it won't be so constant.

>> - guest patches itself (a la linux alternatives), surprised when it sees a different instruction
> PowerPC Linux does patch itself, which is a write-only operation.

Other self-patchers might be different; say you use xor to toggle between two variants, reducing the amount of data you need to keep for patching.

>> - guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write
> This is getting really hypothetical, but why would a JIT need to read the generated code?

Any weird hypothetical idea will be in mission-critical production use somewhere, see Andreas' reply.

--
error compiling committee.c: too many arguments to function
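Editor's sketch: the hazards in this message all come from a guest treating its own code as data. The toy Python model below illustrates two of them, the checksum mismatch and the xor-toggling self-patcher. Guest kernel text is modelled as a list of 32-bit words. The helper names (`checksum`, `hypervisor_patch`, `xor_toggle`, `demo`) and the replacement word `PATCHED` are assumptions made for illustration and are not taken from KVM or from any guest.

```python
import struct
import zlib

NOP = 0x60000000        # ori 0,0,0
MFMSR_R3 = 0x7C6000A6   # mfmsr r3, a supervisor instruction a hypervisor might patch
PATCHED = 0x8060F000    # hypothetical replacement word: lwz r3,-4096(0)


def checksum(text):
    """CRC32 over the instruction words, as a guest integrity check might compute."""
    return zlib.crc32(b"".join(struct.pack(">I", word) for word in text))


def hypervisor_patch(text, index, new_word):
    """Rewrite one instruction without the guest's knowledge."""
    text[index] = new_word


def xor_toggle(text, index, delta):
    """Self-patcher that stores only variant_a ^ variant_b and flips between them."""
    text[index] ^= delta


def demo():
    text = [NOP, MFMSR_R3, NOP]
    before = checksum(text)
    delta = MFMSR_R3 ^ NOP              # the only patch data the guest keeps
    hypervisor_patch(text, 1, PATCHED)
    checksum_broken = checksum(text) != before
    xor_toggle(text, 1, delta)          # guest believes it is switching to NOP
    toggle_broken = text[1] not in (MFMSR_R3, NOP)
    return checksum_broken, toggle_broken
```

In this model `demo()` reports both problems: the checksum no longer matches, and the toggled word is neither of the two variants the guest expects.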
Re: re-writing on powerpc
On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote:
> -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Avi Kivity Sent: Tuesday, December 14, 2010 9:18 PM To: Yoder Stuart-B08248 Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org Subject: Re: re-writing on powerpc
>> On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
>>> -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Tuesday, December 14, 2010 2:49 AM To: Hollis Blanchard Cc: Yoder Stuart-B08248; Alexander Graf; kvm-ppc@vger.kernel.org Subject: Re: re-writing on powerpc
>>>> On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>>>>>> Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
>>>>> Can you list some examples of dangerous scenarios?
>>>> - guest checksums own kernel pages
>>>> - clever compiler reuses code for constant pool
>>>> - guest patches itself (a la linux alternatives), surprised when it sees a different instruction
>>>> - guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write
>>> One possible solution to hiding rewriting from guest if it must be hidden is to mark patched pages as execute only. If a guest reads a patched page, the hypervisor can fix up the read.
>> Yes. Something that is common to all the problems above is using code as data. However, execute only would only affect the page's mapping, not the page itself, yes? So if the page has another mapping, this doesn't work.
> But KVM would be aware of guest page mappings, so access permissions for any particular mapping can be controlled by KVM.

kvm isn't aware of all guest mappings (only those that were instantiated in shadow tlb/pagetables).

--
error compiling committee.c: too many arguments to function
RE: re-writing on powerpc
-Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Wednesday, December 15, 2010 4:49 PM To: Sethi Varun-B16395 Cc: Yoder Stuart-B08248; Hollis Blanchard; Alexander Graf; kvm- p...@vger.kernel.org Subject: Re: re-writing on powerpc

> On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote:
>> -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Avi Kivity Sent: Tuesday, December 14, 2010 9:18 PM To: Yoder Stuart-B08248 Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org Subject: Re: re-writing on powerpc
>>> On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
>>>> -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Tuesday, December 14, 2010 2:49 AM To: Hollis Blanchard Cc: Yoder Stuart-B08248; Alexander Graf; kvm- p...@vger.kernel.org Subject: Re: re-writing on powerpc
>>>>> On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>>>>>>> Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
>>>>>> Can you list some examples of dangerous scenarios?
>>>>> - guest checksums own kernel pages
>>>>> - clever compiler reuses code for constant pool
>>>>> - guest patches itself (a la linux alternatives), surprised when it sees a different instruction
>>>>> - guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write
>>>> One possible solution to hiding rewriting from guest if it must be hidden is to mark patched pages as execute only. If a guest reads a patched page, the hypervisor can fix up the read.
>>> Yes. Something that is common to all the problems above is using code as data. However, execute only would only affect the page's mapping, not the page itself, yes? So if the page has another mapping, this doesn't work.
>> But KVM would be aware of guest page mappings, so access permissions for any particular mapping can be controlled by KVM.
> kvm isn't aware of all guest mappings (only those that were instantiated in shadow tlb/pagetables).

I am not sure if I understand, but the guest would have to instantiate the mapping in the tlb (for BookE) before the page can be accessed. That's when we can set the access permissions.

-Varun
Re: re-writing on powerpc
On 12/15/2010 01:32 PM, Sethi Varun-B16395 wrote:
>>> But KVM would be aware of guest page mappings, so access permissions for any particular mapping can be controlled by KVM.
>> kvm isn't aware of all guest mappings (only those that were instantiated in shadow tlb/pagetables).
> I am not sure if I understand, but the guest would have to instantiate the mapping in the tlb (for BookE) before the page can be accessed. That's when we can set the access permissions.

You're right, for a shadow tlb kvm has all guest mappings at all times. For page table models, it doesn't.

--
error compiling committee.c: too many arguments to function
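Editor's sketch: the shadow-TLB case conceded above can be modelled in a few lines of Python. The point is that every guest mapping is installed through the hypervisor, so read permission can be withheld on each alias of a patched page and data reads can be fixed up with the original contents. The class `ShadowTlb` and its methods are hypothetical names for this illustration only. The model does not cover page-table MMUs, where, as noted above, the hypervisor does not see every mapping.

```python
class ShadowTlb:
    """Toy shadow TLB: every guest mapping is installed through the hypervisor."""

    def __init__(self, memory):
        self.memory = dict(memory)   # physical page number -> word stored there
        self.original = {}           # patched physical page -> pre-patch word
        self.entries = {}            # virtual page -> (physical page, permissions)

    def patch(self, phys, new_word):
        """Rewrite a page and revoke data access on mappings that already exist."""
        self.original[phys] = self.memory[phys]
        self.memory[phys] = new_word
        for virt, (page, perms) in list(self.entries.items()):
            if page == phys:
                self.entries[virt] = (page, perms - {"r", "w"})

    def guest_tlb_write(self, virt, phys, perms):
        """Guest instantiates a mapping; patched pages never get read or write."""
        perms = set(perms)
        if phys in self.original:
            perms -= {"r", "w"}
        self.entries[virt] = (phys, perms)

    def guest_fetch(self, virt):
        """Instruction fetch sees the patched word (execute checks omitted)."""
        phys, _ = self.entries[virt]
        return self.memory[phys]

    def guest_read(self, virt):
        """Data load; a permission fault is emulated with the original word."""
        phys, perms = self.entries[virt]
        if "r" in perms:
            return self.memory[phys]
        return self.original[phys]
```

In this model an alias created after the patch behaves the same as one created before it, because both pass through `guest_tlb_write` or `patch`. The cost Ben mentions is visible as the extra bookkeeping in both methods.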
Re: re-writing on powerpc
On 12/13/2010 09:03 PM, Scott Wood wrote:
>> The interface is a lot simpler. The guest decides what to patch and where to jump. A "please patch me" flag needs a ton of documentation on what patch means and what the constraints on the guest environment are.
> The constraints need to be documented, but I think "a ton" is a bit of an exaggeration

I guess. It's correct for x86 (which has four processor modes, and you need to consider segmentation, etc.), perhaps not so much for powerpc.

> -- and having the guest do the patching itself means that the structure of the shared page must become stable ABI.

It has to be a stable ABI in any case so you can live migrate. Unless you want the hypervisor to unpatch or something.

--
error compiling committee.c: too many arguments to function
Re: re-writing on powerpc
On 12/14/2010 02:24 AM, Alexander Graf wrote:
> On 14.12.2010, at 01:18, Scott Wood wrote:
>> On Tue, 14 Dec 2010 00:54:38 +0100 Alexander Graf a...@csgraf.de wrote:
>>> On 13.12.2010, at 20:03, Scott Wood wrote:
>>>> [1] Speaking of which, what happens when an interrupt is raised in the middle of a paravirt critical section? KVM will hold off the interrupt delivery if it sees the critical flag set, but when will it deliver the postponed interrupt? Seems like it will wait until the next time an exit happens for some other reason.
>>> mtmsr with IF=1 checks for pending interrupts and enables them with a real mtmsr then which again checks interrupts in vm entry, so it immediately gets injected :).
>> Right, but I'm not talking about an interrupt that happens when the virtual EE bit is zero. I'm talking about an interrupt that happens right in the middle of the paravirt sequence -- after reading int_pending, but before setting critical to r2. It seems like the race window is just narrowed, not eliminated.
> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.

What about when "usually" doesn't happen? Tickless kernel, everything's asleep, interrupt missed, system is dead.

--
error compiling committee.c: too many arguments to function
Re: re-writing on powerpc
On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>> Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
> Can you list some examples of dangerous scenarios?

- guest checksums own kernel pages
- clever compiler reuses code for constant pool
- guest patches itself (a la linux alternatives), surprised when it sees a different instruction
- guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write

--
error compiling committee.c: too many arguments to function
Re: re-writing on powerpc
On 14.12.2010 at 09:42, Avi Kivity a...@redhat.com wrote:
> On 12/14/2010 02:24 AM, Alexander Graf wrote:
>> On 14.12.2010, at 01:18, Scott Wood wrote:
>>> On Tue, 14 Dec 2010 00:54:38 +0100 Alexander Graf a...@csgraf.de wrote:
>>>> On 13.12.2010, at 20:03, Scott Wood wrote:
>>>>> [1] Speaking of which, what happens when an interrupt is raised in the middle of a paravirt critical section? KVM will hold off the interrupt delivery if it sees the critical flag set, but when will it deliver the postponed interrupt? Seems like it will wait until the next time an exit happens for some other reason.
>>>> mtmsr with IF=1 checks for pending interrupts and enables them with a real mtmsr then which again checks interrupts in vm entry, so it immediately gets injected :).
>>> Right, but I'm not talking about an interrupt that happens when the virtual EE bit is zero. I'm talking about an interrupt that happens right in the middle of the paravirt sequence -- after reading int_pending, but before setting critical to r2. It seems like the race window is just narrowed, not eliminated.
>> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.
> What about when "usually" doesn't happen? Tickless kernel, everything's asleep, interrupt missed, system is dead.

Even tickless guest kernels will get out of guest context from time to time, simply because if there are no interrupts on the host, the host is useless - it would only have a single, isolated task running that wouldn't even be able to use the network. But yes, if we can go without black spots, we should :)

Alex
Re: re-writing on powerpc
On Tue, 14 Dec 2010 01:24:50 +0100 Alexander Graf a...@csgraf.de wrote:
> On 14.12.2010, at 01:18, Scott Wood wrote:
>> Right, but I'm not talking about an interrupt that happens when the virtual EE bit is zero. I'm talking about an interrupt that happens right in the middle of the paravirt sequence -- after reading int_pending, but before setting critical to r2. It seems like the race window is just narrowed, not eliminated.
> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.

It could be important for realtime loads, tickless systems (especially if the Linux host eventually grows the ability to be tickless even when things are running), etc., and it makes me nervous in general. It's not something that's going to be causing problems all the time, though.

> If it really is important, we could also check int_pending right after the critical section and just do a nop exit.

Doesn't checking int_pending require clobbering registers, which is why we have the critical section in the first place?

> That way we worst case waste a few cycles for the useless guest exit, but always fetch interrupts immediately when they occur.

What useless guest exit? Either we exit when we see an interrupt pending (in which case it's not useless), or we exit all the time, and then what's the point of the paravirt?

-Scott
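Editor's sketch: the race window under discussion can be made concrete with a small Python model. The step list in `STEPS` is an assumed simplification of the paravirtualized MSR-write sequence as described in this thread, not the actual patched instruction sequence, and `interrupt_delivered` is a hypothetical helper. The host defers an interrupt whenever it sees `critical` set, and the guest only exits if it saw `int_pending` when it checked.

```python
STEPS = [
    "enter_critical",
    "save_scratch",
    "read_int_pending",
    "write_msr",
    "restore_scratch",
    "leave_critical",
]


def interrupt_delivered(irq_before_step):
    """Return True if an interrupt arriving just before STEPS[irq_before_step]
    is delivered by the end of the sequence, False if it stays deferred."""
    critical = False
    int_pending = False     # flag the host writes into the shared page
    saw_pending = False     # what the guest observed when it checked
    delivered = False
    for index, step in enumerate(STEPS):
        if index == irq_before_step:
            if critical:
                int_pending = True   # host defers while critical is set
            else:
                delivered = True     # host injects right away
        if step == "enter_critical":
            critical = True
        elif step == "read_int_pending":
            saw_pending = int_pending
        elif step == "leave_critical":
            critical = False
    if saw_pending:
        delivered = True             # guest traps to the host, which injects
    return delivered
```

Under these assumptions, an interrupt that arrives after the `int_pending` read but before the section ends stays deferred until some unrelated exit occurs, which is the narrowed but not eliminated window described above.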
Re: re-writing on powerpc
Scott Wood wrote:
> On Tue, 14 Dec 2010 01:24:50 +0100 Alexander Graf a...@csgraf.de wrote:
>> On 14.12.2010, at 01:18, Scott Wood wrote:
>>> Right, but I'm not talking about an interrupt that happens when the virtual EE bit is zero. I'm talking about an interrupt that happens right in the middle of the paravirt sequence -- after reading int_pending, but before setting critical to r2. It seems like the race window is just narrowed, not eliminated.
>> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.
> It could be important for realtime loads, tickless systems (especially if the Linux host eventually grows the ability to be tickless even when things are running), etc., and it makes me nervous in general. It's not something that's going to be causing problems all the time, though.

I agree - it's certainly wrong.

>> If it really is important, we could also check int_pending right after the critical section and just do a nop exit.
> Doesn't checking int_pending require clobbering registers, which is why we have the critical section in the first place?

The critical section is to prevent us from overwriting the scratch registers, yeah. And I think you're right - I had a thinko last night. If we see that we should inject an interrupt, but we're inside of a critical section, we could set the magic page to r/o and try to find the critical end at which point we can just inject.

>> That way we worst case waste a few cycles for the useless guest exit, but always fetch interrupts immediately when they occur.
> What useless guest exit? Either we exit when we see an interrupt pending (in which case it's not useless), or we exit all the time, and then what's the point of the paravirt?

I was thinking of a case where we get a few false positives. But again, I probably just had a bad thought :)

Alex
Re: re-writing on powerpc
On 12/14/2010 12:48 AM, Avi Kivity wrote:
> On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>>> Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
>> Can you list some examples of dangerous scenarios?

Perhaps I should rephrase... any real-world dangerous scenarios? :) I was hoping you could share some traps you've hit with Linux or Windows on x86.

> - guest checksums own kernel pages

For runtime intrusion detection? Such guests can simply not ask the hypervisor to enable the rewriting feature.

> - clever compiler reuses code for constant pool

Not sure what you mean here. Anyways I think clever compilers are irrelevant, since a compiler will not ordinarily emit a supervisor-mode instruction. The hypervisor has no need to patch normal user-mode instructions.

> - guest patches itself (a la linux alternatives), surprised when it sees a different instruction

PowerPC Linux does patch itself, which is a write-only operation.

> - guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write

This is getting really hypothetical, but why would a JIT need to read the generated code?

Hollis Blanchard
Mentor Graphics, Embedded Systems Division
Re: re-writing on powerpc
On Tue, 14 Dec 2010 18:48:18 +0100 Alexander Graf ag...@suse.de wrote:
> The critical section is to prevent us from overwriting the scratch registers, yeah. And I think you're right - I had a thinko last night. If we see that we should inject an interrupt, but we're inside of a critical section, we could set the magic page to r/o and try to find the critical end at which point we can just inject.

Yeah, I thought of that as well -- but single stepping seemed better than messing with MMU code (one less thing to check for on the TLB miss path), and it shouldn't happen often enough, or for enough instructions, to be a performance issue.

-Scott
Re: re-writing on powerpc
On Tue, 14 Dec 2010 10:40:57 +0200 Avi Kivity a...@redhat.com wrote:
> On 12/13/2010 09:03 PM, Scott Wood wrote:
>>> The interface is a lot simpler. The guest decides what to patch and where to jump. A "please patch me" flag needs a ton of documentation on what patch means and what the constraints on the guest environment are.
>> The constraints need to be documented, but I think "a ton" is a bit of an exaggeration
> I guess. It's correct for x86 (which has four processor modes, and you need to consider segmentation, etc.), perhaps not so much for powerpc.

Yeah, x86 seems like it could be a mess. We actually already wrote up these constraints for PowerPC for an upcoming version of ePAPR.

>> -- and having the guest do the patching itself means that the structure of the shared page must become stable ABI.
> It has to be a stable ABI in any case so you can live migrate. Unless you want the hypervisor to unpatch or something.

Well, there's a difference between "stable among a set of implementations within which you can live upgrade" and "stable among all implementations that can run a guest without further modification". I'm thinking of things like completely different hypervisors (not just KVM) being able to run the same guest image with paravirt, newly added paravirts working on a guest that doesn't need updating beyond the initial change to permit rewriting, etc. And if there is a mistake made that needs to be incompatibly corrected, breaking live migration seems less bad than requiring guest code changes.

I think there are good arguments for both ways -- I don't see any overwhelming reason to change from what KVM is already doing.

-Scott
Re: re-writing on powerpc
On Tue, 14 Dec 2010 12:37:32 -0600 Scott Wood scottw...@freescale.com wrote:
> On Tue, 14 Dec 2010 18:48:18 +0100 Alexander Graf ag...@suse.de wrote:
>> The critical section is to prevent us from overwriting the scratch registers, yeah. And I think you're right - I had a thinko last night. If we see that we should inject an interrupt, but we're inside of a critical section, we could set the magic page to r/o and try to find the critical end at which point we can just inject.
> Yeah, I thought of that as well -- but single stepping seemed better than messing with MMU code (one less thing to check for on the TLB miss path), and it shouldn't happen often enough, or for enough instructions, to be a performance issue.

Well, the TLB path might not be so bad if it can reuse an existing check for mapping the magic page in the first place -- but if an interrupt happens immediately after setting critical, but before saving scratch registers, the critical end will not be the next magic page write. So you'd still have to either single-step or emulate the stores at least.

Or I suppose we could document that all magic page stores other than ending critical must come before checking int_pending, though that seems a bit ugly.

-Scott
Re: re-writing on powerpc
On Wed, 15 Dec 2010 00:00:08 +0100 Alexander Graf ag...@suse.de wrote:
> On 14.12.2010, at 21:04, Scott Wood wrote:
>> Well, the TLB path might not be so bad if it can reuse an existing check for mapping the magic page in the first place -- but if an interrupt happens immediately after setting critical, but before saving scratch registers, the critical end will not be the next magic page write. So you'd still have to either single-step or emulate the stores at least.
> We could also move the critical value to its own page, so we only have to trap that one :).

Stable ABI...

>> Or I suppose we could document that all magic page stores other than ending critical must come before checking int_pending, though that seems a bit ugly.
> That one's very hard to do with live binary patching

Sorry, I was only talking about stores within a critical section -- not unrelated stores that other patched instructions might do. So that once KVM has an interrupt to deliver, and sees that critical is engaged, it knows that the next magic page store will resolve things. Either it is a store to critical, and KVM can now deliver the interrupt -- or it is some other store (scratch or MSR itself) and thus int_pending has not yet been checked. I don't think it would be a problem for live patching. It just seems a bit icky.

-Scott
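Editor's sketch: the "next magic page store resolves things" rule can be modelled as a small Python state machine. It assumes the ordering rule proposed above (inside a critical section, every magic page store other than the one ending the section comes before the `int_pending` check), and it assumes the host emulates the trapped store. The class `Host`, its field names, and the return strings are hypothetical and exist only for this illustration.

```python
class Host:
    """Toy host that write-protects the magic page while an interrupt is deferred."""

    def __init__(self):
        self.magic = {"critical": False, "int_pending": False, "scratch": 0}
        self.write_protected = False
        self.injected = False

    def raise_interrupt(self):
        if self.magic["critical"]:
            self.magic["int_pending"] = True
            self.write_protected = True      # trap the next magic page store
        else:
            self.injected = True

    def guest_store(self, field, value):
        if not self.write_protected:
            self.magic[field] = value
            return "no exit"
        # The store trapped. Emulate it (assumption), then decide what it means.
        self.magic[field] = value
        self.write_protected = False
        if field == "critical" and not value:
            # The section just ended, so the deferred interrupt can go in now.
            self.magic["int_pending"] = False
            self.injected = True
            return "inject"
        # Scratch or MSR store: the guest has not checked int_pending yet and
        # will see it set when it does, so just let it continue.
        return "resume"
```

Either branch clears the write protection after a single trap, so the cost is one extra exit per deferred interrupt in this model.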
Re: re-writing on powerpc
On 15.12.2010, at 00:17, Scott Wood wrote:
> On Wed, 15 Dec 2010 00:00:08 +0100 Alexander Graf ag...@suse.de wrote:
>> On 14.12.2010, at 21:04, Scott Wood wrote:
>>> Well, the TLB path might not be so bad if it can reuse an existing check for mapping the magic page in the first place -- but if an interrupt happens immediately after setting critical, but before saving scratch registers, the critical end will not be the next magic page write. So you'd still have to either single-step or emulate the stores at least.
>> We could also move the critical value to its own page, so we only have to trap that one :).
> Stable ABI...
>>> Or I suppose we could document that all magic page stores other than ending critical must come before checking int_pending, though that seems a bit ugly.
>> That one's very hard to do with live binary patching
> Sorry, I was only talking about stores within a critical section -- not unrelated stores that other patched instructions might do. So that once KVM has an interrupt to deliver, and sees that critical is engaged, it knows that the next magic page store will resolve things. Either it is a store to critical, and KVM can now deliver the interrupt -- or it is some other store (scratch or MSR itself) and thus int_pending has not yet been checked. I don't think it would be a problem for live patching. It just seems a bit icky.

Oh, because you'd only trap stores, but no writes? Yep, that would work. The hard part here is that currently the ppc kvm emulator treats every memory write trap as mmio. But that's changeable. I actually like that idea. It's probably the cleanest we can get away with without deep modifications of the guest. Single-step is always icky.

Thinking about the whole thing - can't we create an interrupt notification page? Some page that is always mapped read-only when interrupts are available, but read-write when they're not? Then we could just do an unconditional store after the crit section is done and everyone's happy.

Alex
Re: re-writing on powerpc
On Wed, 15 Dec 2010 00:29:40 +0100 Alexander Graf ag...@suse.de wrote:
> On 15.12.2010, at 00:17, Scott Wood wrote:
>> So that once KVM has an interrupt to deliver, and sees that critical is engaged, it knows that the next magic page store will resolve things. Either it is a store to critical, and KVM can now deliver the interrupt -- or it is some other store (scratch or MSR itself) and thus int_pending has not yet been checked. I don't think it would be a problem for live patching. It just seems a bit icky.
> Oh, because you'd only trap stores, but no writes? Yep, that would work.

writes or loads? :-)

> I actually like that idea. It's probably the cleanest we can get away with without deep modifications of the guest. Single-step is always icky.

Well, there's another complication -- if we trap on the final store to end the critical section, the critical section won't actually be ended until after that instruction executes. Which won't happen until we set the page to read/write and let it go. So we'd have to look at the instruction to see what it's doing.

> Thinking about the whole thing - can't we create an interrupt notification page? Some page that is always mapped read-only when interrupts are available, but read-write when they're not? Then we could just do an unconditional store after the crit section is done and everyone's happy.

I'd limit it to interrupts that were deferred due to critical, to avoid unnecessary MMU manipulation, and unnecessary traps when doing mtmsr/wrtee if there's an interrupt pending and old EE = new EE = zero (assuming the guest doesn't use a separate restore path for that case). But otherwise sounds reasonable, if we're willing to change the interface that much.

Does it even need to be read-only, or could it be entirely unmapped when there's a pending interrupt?

-Scott
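Editor's sketch: the interrupt notification page idea, with the refinement that the page is only made non-writable for interrupts deferred due to a critical section, can be modelled as below. The class `NotifyPage` and its counters are hypothetical. Whether the real page would be read-only or unmapped makes no difference to this model, since either way the guest's unconditional store faults.

```python
class NotifyPage:
    """Toy notification page: writable unless an interrupt was deferred."""

    def __init__(self):
        self.writable = True
        self.critical = False
        self.injected = 0   # interrupts delivered to the guest
        self.exits = 0      # guest exits caused by the notification store

    def enter_critical(self):
        self.critical = True

    def raise_interrupt(self):
        if self.critical:
            self.writable = False    # deferred: arm the notification page
        else:
            self.injected += 1       # no critical section: inject directly

    def leave_critical(self):
        """Guest ends the section, then does its unconditional store."""
        self.critical = False
        if not self.writable:
            # The store faults, the host regains control and injects.
            self.exits += 1
            self.writable = True
            self.injected += 1
```

In the common case with no deferred interrupt the store completes without an exit, which is the property that makes the unconditional store cheap in this model.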
Re: re-writing on powerpc
On 12/13/2010 10:42 AM, Alexander Graf wrote:
> On 13.12.2010, at 09:35, Avi Kivity wrote:
>> On 12/13/2010 06:45 AM, Yoder Stuart-B08248 wrote:
>>> Avi/Hollis, Exchanged some emails with Alex on the topic of rewriting on powerpc KVM-- the current approach taken by Alex's PV patch is to have a guest Linux paravirt itself, by re-writing certain instructions. The downside to this approach (guest side patching) is that every OS to be run on KVM has to be modified or dynamically patched. What were the reasons for not going down the path of doing the re-writing in the hypervisor? (Alex couldn't remember the specifics). What about doing it from Qemu?
>> Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
> Yeah, let me rephrase my exact memory on this: If the HV just rewrites instructions in the guest, it behaves different from real hw which is bad. It could potentially break checksumming inside the guest. If, however, the guest sends a hypercall to the HV saying "please patch me" or there's a flag on creation time to enable patching, I have a hard time finding a reason to do it inside the guest context.
> Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).

The interface is a lot simpler. The guest decides what to patch and where to jump. A "please patch me" flag needs a ton of documentation on what patch means and what the constraints on the guest environment are.

--
error compiling committee.c: too many arguments to function
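Editor's sketch: guest-side patching of the simplest kind can be illustrated in Python by rewriting `mfmsr rT` into a load of the MSR copy held in a shared page. The opcode constants follow the PowerPC encodings for `mfmsr` and `lwz` as the editor understands them. The shared page address `MAGIC_PAGE`, the field offset `MSR_OFFSET`, and the function `patch_mfmsr` are assumptions for illustration, not the layout or code used by the Linux guest.

```python
INST_MFMSR = 0x7C0000A6   # mfmsr rT with the rT field zeroed
INST_LWZ = 0x80000000     # lwz rT,D(rA) with all fields zeroed
RT_MASK = 0x03E00000      # target register field, shared by both encodings

MAGIC_PAGE = -4096        # assumed: shared page mapped in the top 4 KiB
MSR_OFFSET = 0            # hypothetical offset of the MSR copy in that page


def patch_mfmsr(text):
    """Replace every mfmsr in a list of instruction words; return the count."""
    patched = 0
    for index, inst in enumerate(text):
        if inst & ~RT_MASK == INST_MFMSR:
            rt = inst & RT_MASK
            # With rA = 0 the effective address is the sign-extended
            # 16-bit displacement, which reaches the top of the address space.
            disp = (MAGIC_PAGE + MSR_OFFSET) & 0xFFFF
            text[index] = INST_LWZ | rt | disp
            patched += 1
    return patched
```

Because the guest runs this itself, it chooses which sites to patch and can skip any code it checksums or reads back, which is the simplicity argument made above.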
Re: re-writing on powerpc
On 12/13/2010 07:12 PM, Hollis Blanchard wrote:
> On 12/13/2010 12:42 AM, Alexander Graf wrote:
>> Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
> Don't blame me for this. :) My original patching (with Christian) was done from host context, and those patches are in the list archives. As far as I remember, Ben H said he preferred patching from guest context (mostly for unspecified or gut feeling reasons), and then that's what you did. IIRC it was IRC conversation, which is why it wouldn't be in your inbox.

Well, it's the right thing IMO.

--
error compiling committee.c: too many arguments to function
Re: re-writing on powerpc
On Mon, 13 Dec 2010 10:45:30 +0200 Avi Kivity a...@redhat.com wrote:
> On 12/13/2010 10:42 AM, Alexander Graf wrote:
>> Yeah, let me rephrase my exact memory on this: If the HV just rewrites instructions in the guest, it behaves differently from real hw, which is bad. It could potentially break checksumming inside the guest. If, however, the guest sends a hypercall to the HV saying "please patch me" or there's a flag on creation time to enable patching, I have a hard time finding a reason to do it inside the guest context. Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
> The interface is a lot simpler. The guest decides what to patch and where to jump. A "please patch me" flag needs a ton of documentation on what "patch" means and what the constraints on the guest environment are.

The constraints need to be documented, but I think "a ton" is a bit of an exaggeration -- and having the guest do the patching itself means that the structure of the shared page must become stable ABI.

Having the hypervisor do the bulk of the work also makes it easier to add paravirt to new OSes (in the embedded world, often the reason someone wants to do virtualization is to run some custom OS alongside Linux).

OTOH, having the guest do it makes it easier to do more complex rewriting such as mtmsr[1]. And the fact that we've already got an implementation makes for a compelling tie-breaker.

-Scott

[1] Speaking of which, what happens when an interrupt is raised in the middle of a paravirt critical section? KVM will hold off the interrupt delivery if it sees the critical flag set, but when will it deliver the postponed interrupt? Seems like it will wait until the next time an exit happens for some other reason.
Re: re-writing on powerpc
On 13.12.2010, at 20:03, Scott Wood wrote:
> On Mon, 13 Dec 2010 10:45:30 +0200 Avi Kivity a...@redhat.com wrote:
>> On 12/13/2010 10:42 AM, Alexander Graf wrote:
>>> Yeah, let me rephrase my exact memory on this: If the HV just rewrites instructions in the guest, it behaves differently from real hw, which is bad. It could potentially break checksumming inside the guest. If, however, the guest sends a hypercall to the HV saying "please patch me" or there's a flag on creation time to enable patching, I have a hard time finding a reason to do it inside the guest context. Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
>> The interface is a lot simpler. The guest decides what to patch and where to jump. A "please patch me" flag needs a ton of documentation on what "patch" means and what the constraints on the guest environment are.
> The constraints need to be documented, but I think "a ton" is a bit of an exaggeration -- and having the guest do the patching itself means that the structure of the shared page must become stable ABI.
> Having the hypervisor do the bulk of the work also makes it easier to add paravirt to new OSes (in the embedded world, often the reason someone wants to do virtualization is to run some custom OS alongside Linux).
> OTOH, having the guest do it makes it easier to do more complex rewriting such as mtmsr[1]. And the fact that we've already got an implementation makes for a compelling tie-breaker.
> -Scott
> [1] Speaking of which, what happens when an interrupt is raised in the middle of a paravirt critical section? KVM will hold off the interrupt delivery if it sees the critical flag set, but when will it deliver the postponed interrupt? Seems like it will wait until the next time an exit happens for some other reason.

mtmsr with IF=1 checks for pending interrupts and, if there are any, enables them with a real mtmsr, which in turn checks for interrupts on VM entry, so the interrupt gets injected immediately :).

Alex
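The fast path and slow path Alex describes can be sketched as a toy model. This is Python standing in for what would really be guest assembly plus hypervisor C, and every name in it (`Vcpu`, `host_raise_interrupt`, `host_vm_entry`, `guest_enable_interrupts`) is hypothetical. The only things taken from the thread are the behaviour: the patched replacement checks a pending flag, and if it is set, falls back to a real `mtmsr`, which traps so that the host injects the interrupt on the next VM entry.

```python
# Toy model only; all names are hypothetical, not KVM's.
from dataclasses import dataclass


@dataclass
class Vcpu:
    ee: bool = False           # guest's virtual "interrupts enabled" bit
    int_pending: bool = False  # flag the host sets in the shared page
    queued: int = 0            # interrupts the host is holding back
    delivered: int = 0         # interrupts injected into the guest
    exits: int = 0             # traps to the hypervisor


def host_raise_interrupt(v: Vcpu) -> None:
    """Host side: inject now if the guest has interrupts enabled, else queue."""
    if v.ee:
        v.delivered += 1
    else:
        v.queued += 1
        v.int_pending = True


def host_vm_entry(v: Vcpu) -> None:
    """Host side: on every VM entry, inject anything that was held back."""
    if v.ee and v.queued:
        v.delivered += v.queued
        v.queued = 0
        v.int_pending = False


def guest_enable_interrupts(v: Vcpu) -> None:
    """Guest side: patched replacement for an mtmsr that enables interrupts."""
    if v.int_pending:
        # Slow path: execute the real, trapping mtmsr. The exit gives the
        # host a chance to inject on the way back in.
        v.exits += 1
        v.ee = True
        host_vm_entry(v)
    else:
        # Fast path: just flip the bit in the shared page, no exit.
        v.ee = True


# Usage: an interrupt raised while disabled is delivered as soon as the
# guest re-enables interrupts, at the cost of one exit.
v = Vcpu()
host_raise_interrupt(v)
guest_enable_interrupts(v)
```

The model treats each guest operation as atomic, which is exactly the assumption questioned in the next message.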
Re: re-writing on powerpc
On Tue, 14 Dec 2010 00:54:38 +0100 Alexander Graf a...@csgraf.de wrote:
> On 13.12.2010, at 20:03, Scott Wood wrote:
>> [1] Speaking of which, what happens when an interrupt is raised in the middle of a paravirt critical section? KVM will hold off the interrupt delivery if it sees the critical flag set, but when will it deliver the postponed interrupt? Seems like it will wait until the next time an exit happens for some other reason.
> mtmsr with IF=1 checks for pending interrupts and enables them with a real mtmsr then which again checks interrupts in vm entry, so it immediately gets injected :).

Right, but I'm not talking about an interrupt that happens when the virtual EE bit is zero. I'm talking about an interrupt that happens right in the middle of the paravirt sequence -- after reading int_pending, but before setting critical to r2. It seems like the race window is just narrowed, not eliminated.

One option would be for KVM to single-step the guest until critical != r1. It should only be a few instructions, and it shouldn't happen very often. This is probably the better option.

Another option would be to dispense with the critical section altogether, by having the guest assume that these instructions clobber certain registers -- though that would not be pleasant to maintain, as you'd have to verify every place the instruction is used, now or in the future. A variant of this would be to use an out-of-section annotation (similar to get_user et al) so that each instance of the instruction has to explicitly opt in. The rewritten code would be faster this way, though it may not make any practical difference.

-Scott
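The window Scott describes is easier to see when the paravirt sequence is broken into steps and the interrupt is allowed to land between any two of them. The following is a toy model in Python, not the actual guest assembly or KVM code: the four-step split, the `single_step` switch, and all names are assumptions for illustration. A boolean `critical` stands in for the "critical == r1" test mentioned above.

```python
# Toy model only; names and the four-step split are hypothetical.
from dataclasses import dataclass


@dataclass
class Vcpu:
    ee: bool = False
    critical: bool = False     # stands in for "critical == r1"
    int_pending: bool = False
    queued: int = 0
    delivered: int = 0


def host_interrupt(v: Vcpu) -> None:
    """Host: inject only if enabled and not inside a critical section."""
    if v.ee and not v.critical:
        v.delivered += 1
    else:
        v.queued += 1
        v.int_pending = True


def host_flush(v: Vcpu) -> None:
    """Host: inject held-back interrupts once the guest can take them."""
    if v.queued and v.ee and not v.critical:
        v.delivered += v.queued
        v.queued = 0
        v.int_pending = False


def emulated_enable(v: Vcpu, interrupt_before_step=None, single_step=False) -> None:
    """Guest's patched mtmsr, with an interrupt arriving before a chosen step."""
    seen = {}

    def enter():
        v.critical = True

    def set_ee():
        v.ee = True

    def read_pending():
        seen["pending"] = v.int_pending

    def leave():
        v.critical = False

    deferred = False
    for i, step in enumerate((enter, set_ee, read_pending, leave)):
        if i == interrupt_before_step:
            host_interrupt(v)
            deferred = v.queued > 0
        step()

    if seen["pending"]:
        host_flush(v)   # guest runs the real mtmsr, exits, host injects
    elif single_step and deferred:
        host_flush(v)   # host stepped the guest out of the section, then injects


# The racy interleaving: interrupt lands after the read, before leaving.
racy = Vcpu()
emulated_enable(racy, interrupt_before_step=3)

# Same interleaving with the proposed single-step fix.
fixed = Vcpu()
emulated_enable(fixed, interrupt_before_step=3, single_step=True)
```

In the racy run the interrupt stays queued with nothing scheduled to deliver it, which matches the concern that the window is narrowed rather than closed. In this model, single-stepping until the section ends delivers it promptly. Real single-stepping would carry per-step exit costs that the model ignores.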