Re: re-writing on powerpc
On 12/13/2010 09:03 PM, Scott Wood wrote:
>> The interface is a lot simpler. The guest decides what to patch and where to jump. A "please patch me" flag needs a ton of documentation on what "patch" means and what the constraints on the guest environment are.
>
> The constraints need to be documented, but I think "a ton" is a bit of an exaggeration

I guess. It's correct for x86 (which has four processor modes, and you need to consider segmentation, etc.), perhaps not so much for powerpc.

> -- and having the guest do the patching itself means that the structure of the shared page must become stable ABI.

It has to be a stable ABI in any case so you can live migrate. Unless you want the hypervisor to unpatch or something.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: re-writing on powerpc
On 12/14/2010 02:24 AM, Alexander Graf wrote:
> On 14.12.2010, at 01:18, Scott Wood wrote:
>> On Tue, 14 Dec 2010 00:54:38 +0100 Alexander Graf a...@csgraf.de wrote:
>>> On 13.12.2010, at 20:03, Scott Wood wrote:
>>>> [1] Speaking of which, what happens when an interrupt is raised in the middle of a paravirt critical section? KVM will hold off the interrupt delivery if it sees the critical flag set, but when will it deliver the postponed interrupt? Seems like it will wait until the next time an exit happens for some other reason.
>>>
>>> mtmsr with IF=1 checks for pending interrupts and enables them with a real mtmsr then which again checks interrupts in vm entry, so it immediately gets injected :).
>>
>> Right, but I'm not talking about an interrupt that happens when the virtual EE bit is zero. I'm talking about an interrupt that happens right in the middle of the paravirt sequence -- after reading int_pending, but before setting critical to r2. It seems like the race window is just narrowed, not eliminated.
>
> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.

What about when "usually" doesn't happen? Tickless kernel, everything's asleep, interrupt missed, system is dead.

--
error compiling committee.c: too many arguments to function
Re: re-writing on powerpc
On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>> Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
>
> Can you list some examples of dangerous scenarios?

- guest checksums own kernel pages
- clever compiler reuses code for constant pool
- guest patches itself (a la linux alternatives), surprised when it sees a different instruction
- guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write

--
error compiling committee.c: too many arguments to function
Re: re-writing on powerpc
On 14.12.2010, at 09:42, Avi Kivity a...@redhat.com wrote:
> On 12/14/2010 02:24 AM, Alexander Graf wrote:
>> On 14.12.2010, at 01:18, Scott Wood wrote:
>>> On Tue, 14 Dec 2010 00:54:38 +0100 Alexander Graf a...@csgraf.de wrote:
>>>> On 13.12.2010, at 20:03, Scott Wood wrote:
>>>>> [1] Speaking of which, what happens when an interrupt is raised in the middle of a paravirt critical section? KVM will hold off the interrupt delivery if it sees the critical flag set, but when will it deliver the postponed interrupt? Seems like it will wait until the next time an exit happens for some other reason.
>>>>
>>>> mtmsr with IF=1 checks for pending interrupts and enables them with a real mtmsr then which again checks interrupts in vm entry, so it immediately gets injected :).
>>>
>>> Right, but I'm not talking about an interrupt that happens when the virtual EE bit is zero. I'm talking about an interrupt that happens right in the middle of the paravirt sequence -- after reading int_pending, but before setting critical to r2. It seems like the race window is just narrowed, not eliminated.
>>
>> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.
>
> What about when "usually" doesn't happen? Tickless kernel, everything's asleep, interrupt missed, system is dead.

Even tickless guest kernels will get out of guest context from time to time, simply because if there are no interrupts on the host, the host is useless - it would only have a single, isolated task running that wouldn't even be able to use the network.

But yes, if we can go without black spots, we should :)

Alex
Re: re-writing on powerpc
On Tue, 14 Dec 2010 01:24:50 +0100 Alexander Graf a...@csgraf.de wrote:
> On 14.12.2010, at 01:18, Scott Wood wrote:
>> Right, but I'm not talking about an interrupt that happens when the virtual EE bit is zero. I'm talking about an interrupt that happens right in the middle of the paravirt sequence -- after reading int_pending, but before setting critical to r2. It seems like the race window is just narrowed, not eliminated.
>
> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.

It could be important for realtime loads, tickless systems (especially if the Linux host eventually grows the ability to be tickless even when things are running), etc., and it makes me nervous in general. It's not something that's going to be causing problems all the time, though.

> If it really is important, we could also check int_pending right after the critical section and just do a nop exit.

Doesn't checking int_pending require clobbering registers, which is why we have the critical section in the first place?

> That way we worst case waste a few cycles for the useless guest exit, but always fetch interrupts immediately when they occur.

What useless guest exit? Either we exit when we see an interrupt pending (in which case it's not useless), or we exit all the time, and then what's the point of the paravirt?

-Scott
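The window Scott describes can be walked through with a toy model in C. This is only a sketch under stated assumptions, not KVM code: the struct layout, the helper names and the interleaving are all made up for illustration. The only things taken from the thread are that the host holds an interrupt back while the guest is critical, that the guest reads int_pending inside the sequence, and that storing r2 to critical ends the section (assumed here to mean "critical while critical == r1").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: field names follow the thread, the layout is hypothetical. */
struct magic_page {
    uint64_t critical;     /* guest counts as critical while this equals r1 */
    uint32_t int_pending;  /* set by the host when it holds an interrupt back */
};

struct vcpu {
    struct magic_page mp;
    uint64_t r1, r2;
    bool irq_queued;       /* host has an interrupt it wants to inject */
    bool irq_delivered;
};

/* Host side: runs only when the host has control, i.e. on a guest exit. */
static void host_try_deliver(struct vcpu *v)
{
    if (!v->irq_queued)
        return;
    if (v->mp.critical == v->r1) {
        v->mp.int_pending = 1;   /* deferred: hope the guest notices */
        return;
    }
    v->irq_queued = false;
    v->mp.int_pending = 0;
    v->irq_delivered = true;
}

/* Guest side: the patched sequence, split into its individual steps. */
static void guest_enter_critical(struct vcpu *v)
{
    v->mp.critical = v->r1;
}

static uint32_t guest_read_int_pending(const struct vcpu *v)
{
    return v->mp.int_pending;
}

static void guest_leave_critical(struct vcpu *v)
{
    assert(v->r2 != v->r1);      /* otherwise the section would never end */
    v->mp.critical = v->r2;
}

/* Returns true if the interrupt stays undelivered after the sequence. */
static bool run_racy_interleaving(struct vcpu *v)
{
    guest_enter_critical(v);
    uint32_t seen = guest_read_int_pending(v);  /* guest reads 0 */

    v->irq_queued = true;        /* interrupt arrives right here ... */
    host_try_deliver(v);         /* ... and is deferred: guest is critical */

    guest_leave_critical(v);     /* guest carries on without exiting */
    if (seen)
        host_try_deliver(v);     /* the guest only traps if it saw the flag */

    return v->irq_queued && !v->irq_delivered;
}
```

In this model the interrupt is only picked up the next time host_try_deliver() runs for some unrelated reason, which is the "narrowed, not eliminated" window from the message above.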
Re: re-writing on powerpc
Scott Wood wrote:
> On Tue, 14 Dec 2010 01:24:50 +0100 Alexander Graf a...@csgraf.de wrote:
>> On 14.12.2010, at 01:18, Scott Wood wrote:
>>> Right, but I'm not talking about an interrupt that happens when the virtual EE bit is zero. I'm talking about an interrupt that happens right in the middle of the paravirt sequence -- after reading int_pending, but before setting critical to r2. It seems like the race window is just narrowed, not eliminated.
>>
>> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.
>
> It could be important for realtime loads, tickless systems (especially if the Linux host eventually grows the ability to be tickless even when things are running), etc., and it makes me nervous in general. It's not something that's going to be causing problems all the time, though.

I agree - it's certainly wrong.

>> If it really is important, we could also check int_pending right after the critical section and just do a nop exit.
>
> Doesn't checking int_pending require clobbering registers, which is why we have the critical section in the first place?

The critical section is to prevent us from overwriting the scratch registers, yeah. And I think you're right - I had a thinko last night. If we see that we should inject an interrupt, but we're inside of a critical section, we could set the magic page to r/o and try to find the critical end at which point we can just inject.

>> That way we worst case waste a few cycles for the useless guest exit, but always fetch interrupts immediately when they occur.
>
> What useless guest exit? Either we exit when we see an interrupt pending (in which case it's not useless), or we exit all the time, and then what's the point of the paravirt?

I was thinking of a case where we get a few false positives. But again, I probably just had a bad thought :)

Alex
Re: re-writing on powerpc
On 12/14/2010 12:48 AM, Avi Kivity wrote:
> On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>>> Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
>>
>> Can you list some examples of dangerous scenarios?

Perhaps I should rephrase... any real-world dangerous scenarios? :) I was hoping you could share some traps you've hit with Linux or Windows on x86.

> - guest checksums own kernel pages

For runtime intrusion detection? Such guests can simply not ask the hypervisor to enable the rewriting feature.

> - clever compiler reuses code for constant pool

Not sure what you mean here. Anyways I think clever compilers are irrelevant, since a compiler will not ordinarily emit a supervisor-mode instruction. The hypervisor has no need to patch normal user-mode instructions.

> - guest patches itself (a la linux alternatives), surprised when it sees a different instruction

PowerPC Linux does patch itself, which is a write-only operation.

> - guest jits own kernel code (like Singularity), gets confused when it reads back something it didn't write

This is getting really hypothetical, but why would a JIT need to read the generated code?

Hollis Blanchard
Mentor Graphics, Embedded Systems Division
Re: re-writing on powerpc
On Tue, 14 Dec 2010 18:48:18 +0100 Alexander Graf ag...@suse.de wrote:
> The critical section is to prevent us from overwriting the scratch registers, yeah. And I think you're right - I had a thinko last night. If we see that we should inject an interrupt, but we're inside of a critical section, we could set the magic page to r/o and try to find the critical end at which point we can just inject.

Yeah, I thought of that as well -- but single stepping seemed better than messing with MMU code (one less thing to check for on the TLB miss path), and it shouldn't happen often enough, or for enough instructions, to be a performance issue.

-Scott
Re: re-writing on powerpc
On Tue, 14 Dec 2010 10:40:57 +0200 Avi Kivity a...@redhat.com wrote:
> On 12/13/2010 09:03 PM, Scott Wood wrote:
>>> The interface is a lot simpler. The guest decides what to patch and where to jump. A "please patch me" flag needs a ton of documentation on what "patch" means and what the constraints on the guest environment are.
>>
>> The constraints need to be documented, but I think "a ton" is a bit of an exaggeration
>
> I guess. It's correct for x86 (which has four processor modes, and you need to consider segmentation, etc.), perhaps not so much for powerpc.

Yeah, x86 seems like it could be a mess. We actually already wrote up these constraints for PowerPC for an upcoming version of ePAPR.

>> -- and having the guest do the patching itself means that the structure of the shared page must become stable ABI.
>
> It has to be a stable ABI in any case so you can live migrate. Unless you want the hypervisor to unpatch or something.

Well, there's a difference between "stable among a set of implementations within which you can live upgrade" and "stable among all implementations that can run a guest without further modification". I'm thinking of things like completely different hypervisors (not just KVM) being able to run the same guest image with paravirt, newly added paravirts working on a guest that doesn't need updating beyond the initial change to permit rewriting, etc.

And if there is a mistake made that needs to be incompatibly corrected, breaking live migration seems less bad than requiring guest code changes.

I think there are good arguments for both ways -- I don't see any overwhelming reason to change from what KVM is already doing.

-Scott
Re: re-writing on powerpc
On Tue, 14 Dec 2010 12:37:32 -0600 Scott Wood scottw...@freescale.com wrote:
> On Tue, 14 Dec 2010 18:48:18 +0100 Alexander Graf ag...@suse.de wrote:
>> The critical section is to prevent us from overwriting the scratch registers, yeah. And I think you're right - I had a thinko last night. If we see that we should inject an interrupt, but we're inside of a critical section, we could set the magic page to r/o and try to find the critical end at which point we can just inject.
>
> Yeah, I thought of that as well -- but single stepping seemed better than messing with MMU code (one less thing to check for on the TLB miss path), and it shouldn't happen often enough, or for enough instructions, to be a performance issue.

Well, the TLB path might not be so bad if it can reuse an existing check for mapping the magic page in the first place -- but if an interrupt happens immediately after setting critical, but before saving scratch registers, the critical end will not be the next magic page write. So you'd still have to either single-step or emulate the stores at least.

Or I suppose we could document that all magic page stores other than ending critical must come before checking int_pending, though that seems a bit ugly.

-Scott
Re: re-writing on powerpc
On Wed, 15 Dec 2010 00:00:08 +0100 Alexander Graf ag...@suse.de wrote:
> On 14.12.2010, at 21:04, Scott Wood wrote:
>> Well, the TLB path might not be so bad if it can reuse an existing check for mapping the magic page in the first place -- but if an interrupt happens immediately after setting critical, but before saving scratch registers, the critical end will not be the next magic page write. So you'd still have to either single-step or emulate the stores at least.
>
> We could also move the critical value to its own page, so we only have to trap that one :).

Stable ABI...

>> Or I suppose we could document that all magic page stores other than ending critical must come before checking int_pending, though that seems a bit ugly.
>
> That one's very hard to do with live binary patching

Sorry, I was only talking about stores within a critical section -- not unrelated stores that other patched instructions might do.

So that once KVM has an interrupt to deliver, and sees that critical is engaged, it knows that the next magic page store will resolve things. Either it is a store to critical, and KVM can now deliver the interrupt -- or it is some other store (scratch or MSR itself) and thus int_pending has not yet been checked.

I don't think it would be a problem for live patching. It just seems a bit icky.

-Scott
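The "next magic page store resolves things" rule amounts to a two-way decision on the trapped store. A minimal sketch in C follows, assuming a hypothetical page layout and hypothetical names (classify_store, the enum values, the field offsets); none of this is the real KVM interface. It also assumes the host can see the value being stored and that "critical" means critical == r1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic page layout, for illustration only. */
struct magic_page {
    uint64_t critical;
    uint64_t scratch[3];
    uint64_t msr;
    uint32_t int_pending;
};

enum store_action {
    DELIVER_AFTER_STORE,  /* the store ends the critical section */
    GUEST_WILL_CHECK      /* int_pending has not been read yet */
};

/*
 * Called for the first magic page store that traps after an interrupt
 * was deferred. Relies on the documented ordering from the message
 * above: inside a critical section, every store except the one that
 * ends it comes before the int_pending check.
 */
static enum store_action classify_store(size_t offset, uint64_t value,
                                        uint64_t guest_r1)
{
    if (offset == offsetof(struct magic_page, critical) && value != guest_r1)
        return DELIVER_AFTER_STORE;

    /* Scratch or MSR store: the guest will still see int_pending itself. */
    return GUEST_WILL_CHECK;
}
```

In the second case the host would only need to make the page writable again and resume the guest, since the guest's own int_pending check is still ahead of it.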
Re: re-writing on powerpc
On 15.12.2010, at 00:17, Scott Wood wrote:
> On Wed, 15 Dec 2010 00:00:08 +0100 Alexander Graf ag...@suse.de wrote:
>> On 14.12.2010, at 21:04, Scott Wood wrote:
>>> Well, the TLB path might not be so bad if it can reuse an existing check for mapping the magic page in the first place -- but if an interrupt happens immediately after setting critical, but before saving scratch registers, the critical end will not be the next magic page write. So you'd still have to either single-step or emulate the stores at least.
>>
>> We could also move the critical value to its own page, so we only have to trap that one :).
>
> Stable ABI...
>
>>> Or I suppose we could document that all magic page stores other than ending critical must come before checking int_pending, though that seems a bit ugly.
>>
>> That one's very hard to do with live binary patching
>
> Sorry, I was only talking about stores within a critical section -- not unrelated stores that other patched instructions might do.
>
> So that once KVM has an interrupt to deliver, and sees that critical is engaged, it knows that the next magic page store will resolve things. Either it is a store to critical, and KVM can now deliver the interrupt -- or it is some other store (scratch or MSR itself) and thus int_pending has not yet been checked.
>
> I don't think it would be a problem for live patching. It just seems a bit icky.

Oh, because you'd only trap stores, but no writes? Yep, that would work. The hard part here is that currently the ppc kvm emulator treats every memory write trap as mmio. But that's changeable.

I actually like that idea. It's probably the cleanest we can get away with without deep modifications of the guest. Single-step is always icky.

Thinking about the whole thing - can't we create an interrupt notification page? Some page that is always mapped read-only when interrupts are available, but read-write when they're not? Then we could just do an unconditional store after the crit section is done and everyone's happy.

Alex
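The interrupt notification page idea can be sketched as a small state machine in C. Everything here is a hypothetical model: the notify_page_writable flag stands in for the host changing the page mapping, the "if not writable" branch stands in for the write fault, and all names are invented for illustration. It is assumed that the guest counts as critical while critical == r1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct vcpu {
    uint64_t critical, r1, r2;
    bool irq_queued;
    bool irq_delivered;
    bool notify_page_writable;  /* models the page mapping, host-controlled */
    unsigned traps;             /* exits caused by the notification store */
};

static void host_deliver(struct vcpu *v)
{
    v->irq_queued = false;
    v->irq_delivered = true;
    v->notify_page_writable = true;   /* nothing pending: stores are free */
}

static void host_raise_irq(struct vcpu *v)
{
    v->irq_queued = true;
    v->notify_page_writable = false;  /* pending: next store will fault */
    if (v->critical != v->r1)
        host_deliver(v);              /* not critical, inject right away */
}

/* Guest: end the critical section, then store unconditionally. */
static void guest_leave_critical(struct vcpu *v)
{
    assert(v->r2 != v->r1);
    v->critical = v->r2;

    if (!v->notify_page_writable) {   /* the store faults into the host */
        v->traps++;
        host_deliver(v);
    }
    /* Otherwise the store completes natively; its data is ignored. */
}
```

The point of the design is that the guest never has to read int_pending after the critical section, so no register is clobbered, and the common no-interrupt case costs one plain store with no exit.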
Re: re-writing on powerpc
On Wed, 15 Dec 2010 00:29:40 +0100 Alexander Graf ag...@suse.de wrote:
> On 15.12.2010, at 00:17, Scott Wood wrote:
>> So that once KVM has an interrupt to deliver, and sees that critical is engaged, it knows that the next magic page store will resolve things. Either it is a store to critical, and KVM can now deliver the interrupt -- or it is some other store (scratch or MSR itself) and thus int_pending has not yet been checked.
>>
>> I don't think it would be a problem for live patching. It just seems a bit icky.
>
> Oh, because you'd only trap stores, but no writes? Yep, that would work.

writes or loads? :-)

> I actually like that idea. It's probably the cleanest we can get away with without deep modifications of the guest. Single-step is always icky.

Well, there's another complication -- if we trap on the final store to end the critical section, the critical section won't actually be ended until after that instruction executes. Which won't happen until we set the page to read/write and let it go. So we'd have to look at the instruction to see what it's doing.

> Thinking about the whole thing - can't we create an interrupt notification page? Some page that is always mapped read-only when interrupts are available, but read-write when they're not? Then we could just do an unconditional store after the crit section is done and everyone's happy.

I'd limit it to interrupts that were deferred due to critical, to avoid unnecessary MMU manipulation, and unnecessary traps when doing mtmsr/wrtee if there's an interrupt pending and old EE = new EE = zero (assuming the guest doesn't use a separate restore path for that case). But otherwise sounds reasonable, if we're willing to change the interface that much.

Does it even need to be read-only, or could it be entirely unmapped when there's a pending interrupt?

-Scott