Re: Heads up: More user-unaccessible x86 states?
On 10/05/2009 02:42 PM, Jan Kiszka wrote: Yes, for big-endian 32/64 and s390. There are some patches floating around. Well, that's for fixing up the endianess of the bitmap itself. But the problem with void * in compat code are their different sizes. And GET_DIRTY_LOG solves this via padding: union { void __user *dirty_bitmap; __u64 padding2; }; So this should not make pointers a no-go, should it? No, it doesn't work. Big-endian will place the pointer at offset zero which is the high-end word when read by the host; and 31-bit s390 needs something unsurprisingly strange to be done to the pointer. The patches I mentioned involve creating a compat_ioctl callback, something which I tried to avoid but failed, not having considered big-endian and s390. We'll probably have to deal with both. Therefore, I'm looking for a unified solution. array of struct { __u16 type; __u16 offset; } ? For sure possible, just the setup of such data structure in user space gets a bit, well, unhandy. Yes. Fixed-offset substructures are a lot easier and less error-prone. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Heads up: More user-unaccessible x86 states?
Avi Kivity wrote: > On 10/05/2009 02:18 PM, Jan Kiszka wrote: >>> True pointers are no go since we have to deal with compat code (31/32 >>> bit userspace calling into a 64 bit kernel). Offsets into the state >>> structure are easier. >>> >> So current KVM_GET_DIRTY_LOG is broken in the compat case... >> >> > > Yes, for big-endian 32/64 and s390. There are some patches floating around. Well, that's for fixing up the endianess of the bitmap itself. But the problem with void * in compat code are their different sizes. And GET_DIRTY_LOG solves this via padding: union { void __user *dirty_bitmap; __u64 padding2; }; So this should not make pointers a no-go, should it? > >> We'll probably have to deal with both. Therefore, I'm looking for a >> unified solution. >> >> > > array of > > struct { > __u16 type; > __u16 offset; > } ? > For sure possible, just the setup of such data structure in user space gets a bit, well, unhandy. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Heads up: More user-unaccessible x86 states?
On 10/05/2009 02:18 PM, Jan Kiszka wrote: True pointers are no go since we have to deal with compat code (31/32 bit userspace calling into a 64 bit kernel). Offsets into the state structure are easier. So current KVM_GET_DIRTY_LOG is broken in the compat case... Yes, for big-endian 32/64 and s390. There are some patches floating around. We'll probably have to deal with both. Therefore, I'm looking for a unified solution. array of struct { __u16 type; __u16 offset; } ? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Heads up: More user-unaccessible x86 states?
Avi Kivity wrote: > On 10/05/2009 01:18 PM, Jan Kiszka wrote: >> Avi Kivity wrote: >> >>> On 10/05/2009 09:43 AM, Jan Kiszka wrote: >>> Avi Kivity wrote: > On 10/04/2009 09:07 PM, Jan Kiszka wrote: > > >>> btw, instead of adding a new ioctl, perhaps it makes sense to define a >>> new KVM_VCPU_STATE structure that holds all current and future state >>> (with generous reserved space), instead of separating state over a >>> dozen >>> ioctls. >>> >>> >>> >>> >> OK, makes sense. With our without lapic state? >> >> > I'm in two minds. I'm leaning towards including lapic but would welcome > arguments either way. > > The lapic is optional and, thus, typically handled in different code modules by user space. QEMU even creates a separate device that holds the state. >>> avx registers, nested vmx are optional as well. >>> >>> I'm not sure user space will benefit from a unified query/set interface with regard to this. >>> The main benefit is to avoid creating an ioctl each time we find a >>> missing bit. >>> >>> > Note we have to be careful with timers such as the tsc and lapic timer. > Maybe have a bitmask at the front specifying which elements are active. > > ...and the lapic timers are another argument. Regarding TSC, which means MSRs: I tend to include only states into the this meta state which have fixed sizes. Otherwise things will get very hairy. And the GET/SET_MSRS interface is already fairly flexible, the question would be again: What can we gain by unifying? >>> For MSRs, not much. >>> >>> Note we can make it work, by storing an offset/length at a fixed >>> location and letting userspace point it into the reserved area. >>> >> Hmm, pointers... That makes me think of a meta IOCTL like this: >> >> struct kvm_vcpu_state { >> int substates; >> void __user *substate[0]; >> }; >> >> > > True pointers are no go since we have to deal with compat code (31/32 > bit userspace calling into a 64 bit kernel). Offsets into the state > structure are easier. So current KVM_GET_DIRTY_LOG is broken in the compat case... > >> #define KVM_VCPU_STATE_REGS 0 /* i.e. substate[0] points to kvm_regs */ >> #define KVM_VCPU_STATE_SREGS 1 >> #define KVM_VCPU_STATE_LAPIC 2 >> ... >> >> We could easily extend the call with more substates just by defining new >> pointer slots. Moreover, user space could define which substates should >> be get/set by simply passing NULL or a valid pointer for substate[n] (or >> by limiting the substates field). >> >> The only ugliness I see is the missing type safety as we would have to >> deal with void pointers to the substate structures here. >> > > For fixed sized state a feature bitmap is sufficient I think. > We'll probably have to deal with both. Therefore, I'm looking for a unified solution. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Heads up: More user-unaccessible x86 states?
On 10/05/2009 01:18 PM, Jan Kiszka wrote: Avi Kivity wrote: On 10/05/2009 09:43 AM, Jan Kiszka wrote: Avi Kivity wrote: On 10/04/2009 09:07 PM, Jan Kiszka wrote: btw, instead of adding a new ioctl, perhaps it makes sense to define a new KVM_VCPU_STATE structure that holds all current and future state (with generous reserved space), instead of separating state over a dozen ioctls. OK, makes sense. With our without lapic state? I'm in two minds. I'm leaning towards including lapic but would welcome arguments either way. The lapic is optional and, thus, typically handled in different code modules by user space. QEMU even creates a separate device that holds the state. avx registers, nested vmx are optional as well. I'm not sure user space will benefit from a unified query/set interface with regard to this. The main benefit is to avoid creating an ioctl each time we find a missing bit. Note we have to be careful with timers such as the tsc and lapic timer. Maybe have a bitmask at the front specifying which elements are active. ...and the lapic timers are another argument. Regarding TSC, which means MSRs: I tend to include only states into the this meta state which have fixed sizes. Otherwise things will get very hairy. And the GET/SET_MSRS interface is already fairly flexible, the question would be again: What can we gain by unifying? For MSRs, not much. Note we can make it work, by storing an offset/length at a fixed location and letting userspace point it into the reserved area. Hmm, pointers... That makes me think of a meta IOCTL like this: struct kvm_vcpu_state { int substates; void __user *substate[0]; }; True pointers are no go since we have to deal with compat code (31/32 bit userspace calling into a 64 bit kernel). Offsets into the state structure are easier. #define KVM_VCPU_STATE_REGS 0 /* i.e. substate[0] points to kvm_regs */ #define KVM_VCPU_STATE_SREGS 1 #define KVM_VCPU_STATE_LAPIC 2 ... We could easily extend the call with more substates just by defining new pointer slots. Moreover, user space could define which substates should be get/set by simply passing NULL or a valid pointer for substate[n] (or by limiting the substates field). The only ugliness I see is the missing type safety as we would have to deal with void pointers to the substate structures here. For fixed sized state a feature bitmap is sufficient I think. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Heads up: More user-unaccessible x86 states?
Avi Kivity wrote: > On 10/05/2009 09:43 AM, Jan Kiszka wrote: >> Avi Kivity wrote: >> >>> On 10/04/2009 09:07 PM, Jan Kiszka wrote: >>> > btw, instead of adding a new ioctl, perhaps it makes sense to define a > new KVM_VCPU_STATE structure that holds all current and future state > (with generous reserved space), instead of separating state over a > dozen > ioctls. > > > OK, makes sense. With our without lapic state? >>> I'm in two minds. I'm leaning towards including lapic but would welcome >>> arguments either way. >>> >> The lapic is optional and, thus, typically handled in different code >> modules by user space. QEMU even creates a separate device that holds >> the state. > > avx registers, nested vmx are optional as well. > >> I'm not sure user space will benefit from a unified query/set >> interface with regard to this. >> > > The main benefit is to avoid creating an ioctl each time we find a > missing bit. > >>> Note we have to be careful with timers such as the tsc and lapic timer. >>> Maybe have a bitmask at the front specifying which elements are active. >>> >> ...and the lapic timers are another argument. >> >> Regarding TSC, which means MSRs: I tend to include only states into the >> this meta state which have fixed sizes. Otherwise things will get very >> hairy. And the GET/SET_MSRS interface is already fairly flexible, the >> question would be again: What can we gain by unifying? >> > > For MSRs, not much. > > Note we can make it work, by storing an offset/length at a fixed > location and letting userspace point it into the reserved area. Hmm, pointers... That makes me think of a meta IOCTL like this: struct kvm_vcpu_state { int substates; void __user *substate[0]; }; #define KVM_VCPU_STATE_REGS 0 /* i.e. substate[0] points to kvm_regs */ #define KVM_VCPU_STATE_SREGS 1 #define KVM_VCPU_STATE_LAPIC 2 ... We could easily extend the call with more substates just by defining new pointer slots. Moreover, user space could define which substates should be get/set by simply passing NULL or a valid pointer for substate[n] (or by limiting the substates field). The only ugliness I see is the missing type safety as we would have to deal with void pointers to the substate structures here. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Heads up: More user-unaccessible x86 states?
On 10/05/2009 09:43 AM, Jan Kiszka wrote: Avi Kivity wrote: On 10/04/2009 09:07 PM, Jan Kiszka wrote: btw, instead of adding a new ioctl, perhaps it makes sense to define a new KVM_VCPU_STATE structure that holds all current and future state (with generous reserved space), instead of separating state over a dozen ioctls. OK, makes sense. With our without lapic state? I'm in two minds. I'm leaning towards including lapic but would welcome arguments either way. The lapic is optional and, thus, typically handled in different code modules by user space. QEMU even creates a separate device that holds the state. avx registers, nested vmx are optional as well. I'm not sure user space will benefit from a unified query/set interface with regard to this. The main benefit is to avoid creating an ioctl each time we find a missing bit. Note we have to be careful with timers such as the tsc and lapic timer. Maybe have a bitmask at the front specifying which elements are active. ...and the lapic timers are another argument. Regarding TSC, which means MSRs: I tend to include only states into the this meta state which have fixed sizes. Otherwise things will get very hairy. And the GET/SET_MSRS interface is already fairly flexible, the question would be again: What can we gain by unifying? For MSRs, not much. Note we can make it work, by storing an offset/length at a fixed location and letting userspace point it into the reserved area. How much "future space"? avx will change the sse registers from 16x16 to 16x32, with a hint of more to come. Nested vmx needs the vmptr and some more bits. MSRs are potentially endless. Lots of space. Hmm, a some kB then (even without MSRs)... Yup, it's amazing how much state modern processors hold. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Heads up: More user-unaccessible x86 states?
Avi Kivity wrote: > On 10/04/2009 09:07 PM, Jan Kiszka wrote: >>> btw, instead of adding a new ioctl, perhaps it makes sense to define a >>> new KVM_VCPU_STATE structure that holds all current and future state >>> (with generous reserved space), instead of separating state over a dozen >>> ioctls. >>> >>> >> OK, makes sense. With our without lapic state? > > I'm in two minds. I'm leaning towards including lapic but would welcome > arguments either way. The lapic is optional and, thus, typically handled in different code modules by user space. QEMU even creates a separate device that holds the state. I'm not sure user space will benefit from a unified query/set interface with regard to this. > > Note we have to be careful with timers such as the tsc and lapic timer. > Maybe have a bitmask at the front specifying which elements are active. ...and the lapic timers are another argument. Regarding TSC, which means MSRs: I tend to include only states into the this meta state which have fixed sizes. Otherwise things will get very hairy. And the GET/SET_MSRS interface is already fairly flexible, the question would be again: What can we gain by unifying? > >> How much "future space"? >> > > avx will change the sse registers from 16x16 to 16x32, with a hint of > more to come. Nested vmx needs the vmptr and some more bits. MSRs are > potentially endless. Lots of space. > Hmm, a some kB then (even without MSRs)... Jan signature.asc Description: OpenPGP digital signature
Re: Heads up: More user-unaccessible x86 states?
On 10/04/2009 09:07 PM, Jan Kiszka wrote: btw, instead of adding a new ioctl, perhaps it makes sense to define a new KVM_VCPU_STATE structure that holds all current and future state (with generous reserved space), instead of separating state over a dozen ioctls. OK, makes sense. With our without lapic state? I'm in two minds. I'm leaning towards including lapic but would welcome arguments either way. Note we have to be careful with timers such as the tsc and lapic timer. Maybe have a bitmask at the front specifying which elements are active. How much "future space"? avx will change the sse registers from 16x16 to 16x32, with a hint of more to come. Nested vmx needs the vmptr and some more bits. MSRs are potentially endless. Lots of space. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Heads up: More user-unaccessible x86 states?
Avi Kivity wrote: > On 10/04/2009 12:50 PM, Jan Kiszka wrote: >> Avi Kivity wrote: >> >>> On 10/04/2009 10:59 AM, Jan Kiszka wrote: >>> Hi, while preparing new IOCTLs to let user space query& set the yet unaccessible NMI states (pending and masked) I also came across the interrupt shadow masks. Unless I missed something I would say that we so far break them in the rare case that a migration happens right while any of them is asserted. So I guess I should extend my interface and stuff them in as well. Do we have more of such unaccessible states on x86 that could be included, too? Would be a good chance... >>> There's some hidden state in the cpuid mechanism. I think we expose it >>> though (just don't use it in qemu). >>> >> Do you have more details on this? >> > > Some cpuid leaves return different information based on an internal > counter. This is indicated by KVM_CPUID_FLAG_STATEFUL_FUNC. > >> The PDPTRs are hidden state that we should save/restore, though no sane >> guest relies on them. >>A quick glance at SVM makes me think that those registered are not >> exposed there. So when keeping in mind that we may only help VMX guests, >> I think i makes even less sense to "fix" this, does it? >> > > Yes. With npt the PDPTRs are essentially gone. > >>> I think we can lose information if we migrate during a SIPI >>> (sipi_vector), though that might be fixable without exposing it. >>> >> Hmm, I see. But even it it's not fixable, such an extension would be an >> in-kernel irqchip thing. >> > > Yes. > >>> We'll might also lost debug traps. >>> >>> We drop pending exceptions; normally that's fine since they'll reinject >>> themselves, but MCE will not. >>> >> So would it make sense and fix those two issues when we simply save and >> restore the pending exception? >> > > Yes. > > btw, instead of adding a new ioctl, perhaps it makes sense to define a > new KVM_VCPU_STATE structure that holds all current and future state > (with generous reserved space), instead of separating state over a dozen > ioctls. > OK, makes sense. With our without lapic state? How much "future space"? Jan signature.asc Description: OpenPGP digital signature
Re: Heads up: More user-unaccessible x86 states?
On 10/04/2009 12:50 PM, Jan Kiszka wrote: Avi Kivity wrote: On 10/04/2009 10:59 AM, Jan Kiszka wrote: Hi, while preparing new IOCTLs to let user space query& set the yet unaccessible NMI states (pending and masked) I also came across the interrupt shadow masks. Unless I missed something I would say that we so far break them in the rare case that a migration happens right while any of them is asserted. So I guess I should extend my interface and stuff them in as well. Do we have more of such unaccessible states on x86 that could be included, too? Would be a good chance... There's some hidden state in the cpuid mechanism. I think we expose it though (just don't use it in qemu). Do you have more details on this? Some cpuid leaves return different information based on an internal counter. This is indicated by KVM_CPUID_FLAG_STATEFUL_FUNC. The PDPTRs are hidden state that we should save/restore, though no sane guest relies on them. A quick glance at SVM makes me think that those registered are not exposed there. So when keeping in mind that we may only help VMX guests, I think i makes even less sense to "fix" this, does it? Yes. With npt the PDPTRs are essentially gone. I think we can lose information if we migrate during a SIPI (sipi_vector), though that might be fixable without exposing it. Hmm, I see. But even it it's not fixable, such an extension would be an in-kernel irqchip thing. Yes. We'll might also lost debug traps. We drop pending exceptions; normally that's fine since they'll reinject themselves, but MCE will not. So would it make sense and fix those two issues when we simply save and restore the pending exception? Yes. btw, instead of adding a new ioctl, perhaps it makes sense to define a new KVM_VCPU_STATE structure that holds all current and future state (with generous reserved space), instead of separating state over a dozen ioctls. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Heads up: More user-unaccessible x86 states?
Avi Kivity wrote: > On 10/04/2009 10:59 AM, Jan Kiszka wrote: >> Hi, >> >> while preparing new IOCTLs to let user space query& set the yet >> unaccessible NMI states (pending and masked) I also came across the >> interrupt shadow masks. Unless I missed something I would say that we so >> far break them in the rare case that a migration happens right while any >> of them is asserted. So I guess I should extend my interface and stuff >> them in as well. >> >> Do we have more of such unaccessible states on x86 that could be >> included, too? Would be a good chance... >> > > There's some hidden state in the cpuid mechanism. I think we expose it > though (just don't use it in qemu). Do you have more details on this? > > The PDPTRs are hidden state that we should save/restore, though no sane > guest relies on them. A quick glance at SVM makes me think that those registered are not exposed there. So when keeping in mind that we may only help VMX guests, I think i makes even less sense to "fix" this, does it? > > I think we can lose information if we migrate during a SIPI > (sipi_vector), though that might be fixable without exposing it. Hmm, I see. But even it it's not fixable, such an extension would be an in-kernel irqchip thing. > > We'll might also lost debug traps. > > We drop pending exceptions; normally that's fine since they'll reinject > themselves, but MCE will not. So would it make sense and fix those two issues when we simply save and restore the pending exception? Jan signature.asc Description: OpenPGP digital signature
Re: Heads up: More user-unaccessible x86 states?
On 10/04/2009 10:59 AM, Jan Kiszka wrote: Hi, while preparing new IOCTLs to let user space query& set the yet unaccessible NMI states (pending and masked) I also came across the interrupt shadow masks. Unless I missed something I would say that we so far break them in the rare case that a migration happens right while any of them is asserted. So I guess I should extend my interface and stuff them in as well. Do we have more of such unaccessible states on x86 that could be included, too? Would be a good chance... There's some hidden state in the cpuid mechanism. I think we expose it though (just don't use it in qemu). The PDPTRs are hidden state that we should save/restore, though no sane guest relies on them. I think we can lose information if we migrate during a SIPI (sipi_vector), though that might be fixable without exposing it. We'll might also lost debug traps. We drop pending exceptions; normally that's fine since they'll reinject themselves, but MCE will not. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Heads up: More user-unaccessible x86 states?
Hi, while preparing new IOCTLs to let user space query & set the yet unaccessible NMI states (pending and masked) I also came across the interrupt shadow masks. Unless I missed something I would say that we so far break them in the rare case that a migration happens right while any of them is asserted. So I guess I should extend my interface and stuff them in as well. Do we have more of such unaccessible states on x86 that could be included, too? Would be a good chance... Jan signature.asc Description: OpenPGP digital signature