Re: Heads up: More user-unaccessible x86 states?

2009-10-05 Thread Avi Kivity

On 10/05/2009 02:42 PM, Jan Kiszka wrote:



Yes, for big-endian 32/64 and s390.  There are some patches floating around.
 

Well, that's for fixing up the endianess of the bitmap itself. But the
problem with void * in compat code are their different sizes. And
GET_DIRTY_LOG solves this via padding:

union {
void __user *dirty_bitmap;
__u64 padding2;
};

So this should not make pointers a no-go, should it?
   


No, it doesn't work.  Big-endian will place the pointer at offset zero 
which is the high-end word when read by the host; and 31-bit s390 needs 
something unsurprisingly strange to be done to the pointer.


The patches I mentioned involve creating a compat_ioctl callback, 
something which I tried to avoid but failed, not having considered 
big-endian and s390.



We'll probably have to deal with both. Therefore, I'm looking for a
unified solution.


   

array of

struct {
 __u16 type;
 __u16 offset;
} ?

 

For sure possible, just the setup of such data structure in user space
gets a bit, well, unhandy.
   


Yes.  Fixed-offset substructures are a lot easier and less error-prone.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Heads up: More user-unaccessible x86 states?

2009-10-05 Thread Jan Kiszka
Avi Kivity wrote:
> On 10/05/2009 02:18 PM, Jan Kiszka wrote:
>>> True pointers are no go since we have to deal with compat code (31/32
>>> bit userspace calling into a 64 bit kernel).  Offsets into the state
>>> structure are easier.
>>>  
>> So current KVM_GET_DIRTY_LOG is broken in the compat case...
>>
>>
> 
> Yes, for big-endian 32/64 and s390.  There are some patches floating around.

Well, that's for fixing up the endianess of the bitmap itself. But the
problem with void * in compat code are their different sizes. And
GET_DIRTY_LOG solves this via padding:

union {
void __user *dirty_bitmap;
__u64 padding2;
};

So this should not make pointers a no-go, should it?

> 
>> We'll probably have to deal with both. Therefore, I'm looking for a
>> unified solution.
>>
>>
> 
> array of
> 
> struct {
> __u16 type;
> __u16 offset;
> } ?
> 

For sure possible, just the setup of such data structure in user space
gets a bit, well, unhandy.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Heads up: More user-unaccessible x86 states?

2009-10-05 Thread Avi Kivity

On 10/05/2009 02:18 PM, Jan Kiszka wrote:



True pointers are no go since we have to deal with compat code (31/32
bit userspace calling into a 64 bit kernel).  Offsets into the state
structure are easier.
 

So current KVM_GET_DIRTY_LOG is broken in the compat case...

   


Yes, for big-endian 32/64 and s390.  There are some patches floating around.


We'll probably have to deal with both. Therefore, I'm looking for a
unified solution.

   


array of

struct {
   __u16 type;
   __u16 offset;
} ?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Heads up: More user-unaccessible x86 states?

2009-10-05 Thread Jan Kiszka
Avi Kivity wrote:
> On 10/05/2009 01:18 PM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>
>>> On 10/05/2009 09:43 AM, Jan Kiszka wrote:
>>>  
 Avi Kivity wrote:


> On 10/04/2009 09:07 PM, Jan Kiszka wrote:
>
>  
>>> btw, instead of adding a new ioctl, perhaps it makes sense to define a
>>> new KVM_VCPU_STATE structure that holds all current and future state
>>> (with generous reserved space), instead of separating state over a
>>> dozen
>>> ioctls.
>>>
>>>
>>>
>>>  
>> OK, makes sense. With our without lapic state?
>>
>>
> I'm in two minds.  I'm leaning towards including lapic but would welcome
> arguments either way.
>
>  
 The lapic is optional and, thus, typically handled in different code
 modules by user space. QEMU even creates a separate device that holds
 the state.

>>> avx registers, nested vmx are optional as well.
>>>
>>>  
 I'm not sure user space will benefit from a unified query/set
 interface with regard to this.


>>> The main benefit is to avoid creating an ioctl each time we find a
>>> missing bit.
>>>
>>>  
> Note we have to be careful with timers such as the tsc and lapic timer.
> Maybe have a bitmask at the front specifying which elements are active.
>
>  
 ...and the lapic timers are another argument.

 Regarding TSC, which means MSRs: I tend to include only states into the
 this meta state which have fixed sizes. Otherwise things will get very
 hairy. And the GET/SET_MSRS interface is already fairly flexible, the
 question would be again: What can we gain by unifying?


>>> For MSRs, not much.
>>>
>>> Note we can make it work, by storing an offset/length at a fixed
>>> location and letting userspace point it into the reserved area.
>>>  
>> Hmm, pointers... That makes me think of a meta IOCTL like this:
>>
>> struct kvm_vcpu_state {
>>  int substates;
>>  void __user *substate[0];
>> };
>>
>>
> 
> True pointers are no go since we have to deal with compat code (31/32 
> bit userspace calling into a 64 bit kernel).  Offsets into the state 
> structure are easier.

So current KVM_GET_DIRTY_LOG is broken in the compat case...

> 
>> #define KVM_VCPU_STATE_REGS  0 /* i.e. substate[0] points to kvm_regs */
>> #define KVM_VCPU_STATE_SREGS 1
>> #define KVM_VCPU_STATE_LAPIC 2
>> ...
>>
>> We could easily extend the call with more substates just by defining new
>> pointer slots. Moreover, user space could define which substates should
>> be get/set by simply passing NULL or a valid pointer for substate[n] (or
>> by limiting the substates field).
>>
>> The only ugliness I see is the missing type safety as we would have to
>> deal with void pointers to the substate structures here.
>>
> 
> For fixed sized state a feature bitmap is sufficient I think.
> 

We'll probably have to deal with both. Therefore, I'm looking for a
unified solution.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Heads up: More user-unaccessible x86 states?

2009-10-05 Thread Avi Kivity

On 10/05/2009 01:18 PM, Jan Kiszka wrote:

Avi Kivity wrote:
   

On 10/05/2009 09:43 AM, Jan Kiszka wrote:
 

Avi Kivity wrote:

   

On 10/04/2009 09:07 PM, Jan Kiszka wrote:

 

btw, instead of adding a new ioctl, perhaps it makes sense to define a
new KVM_VCPU_STATE structure that holds all current and future state
(with generous reserved space), instead of separating state over a
dozen
ioctls.



 

OK, makes sense. With our without lapic state?

   

I'm in two minds.  I'm leaning towards including lapic but would welcome
arguments either way.

 

The lapic is optional and, thus, typically handled in different code
modules by user space. QEMU even creates a separate device that holds
the state.
   

avx registers, nested vmx are optional as well.

 

I'm not sure user space will benefit from a unified query/set
interface with regard to this.

   

The main benefit is to avoid creating an ioctl each time we find a
missing bit.

 

Note we have to be careful with timers such as the tsc and lapic timer.
Maybe have a bitmask at the front specifying which elements are active.

 

...and the lapic timers are another argument.

Regarding TSC, which means MSRs: I tend to include only states into the
this meta state which have fixed sizes. Otherwise things will get very
hairy. And the GET/SET_MSRS interface is already fairly flexible, the
question would be again: What can we gain by unifying?

   

For MSRs, not much.

Note we can make it work, by storing an offset/length at a fixed
location and letting userspace point it into the reserved area.
 

Hmm, pointers... That makes me think of a meta IOCTL like this:

struct kvm_vcpu_state {
int substates;
void __user *substate[0];
};

   


True pointers are no go since we have to deal with compat code (31/32 
bit userspace calling into a 64 bit kernel).  Offsets into the state 
structure are easier.



#define KVM_VCPU_STATE_REGS  0 /* i.e. substate[0] points to kvm_regs */
#define KVM_VCPU_STATE_SREGS 1
#define KVM_VCPU_STATE_LAPIC 2
...

We could easily extend the call with more substates just by defining new
pointer slots. Moreover, user space could define which substates should
be get/set by simply passing NULL or a valid pointer for substate[n] (or
by limiting the substates field).

The only ugliness I see is the missing type safety as we would have to
deal with void pointers to the substate structures here.
   


For fixed sized state a feature bitmap is sufficient I think.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Heads up: More user-unaccessible x86 states?

2009-10-05 Thread Jan Kiszka
Avi Kivity wrote:
> On 10/05/2009 09:43 AM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>   
>>> On 10/04/2009 09:07 PM, Jan Kiszka wrote:
>>> 
> btw, instead of adding a new ioctl, perhaps it makes sense to define a
> new KVM_VCPU_STATE structure that holds all current and future state
> (with generous reserved space), instead of separating state over a
> dozen
> ioctls.
>
>
>  
 OK, makes sense. With our without lapic state?

>>> I'm in two minds.  I'm leaning towards including lapic but would welcome
>>> arguments either way.
>>>  
>> The lapic is optional and, thus, typically handled in different code
>> modules by user space. QEMU even creates a separate device that holds
>> the state.
> 
> avx registers, nested vmx are optional as well.
> 
>> I'm not sure user space will benefit from a unified query/set
>> interface with regard to this.
>>
> 
> The main benefit is to avoid creating an ioctl each time we find a
> missing bit.
> 
>>> Note we have to be careful with timers such as the tsc and lapic timer.
>>> Maybe have a bitmask at the front specifying which elements are active.
>>>  
>> ...and the lapic timers are another argument.
>>
>> Regarding TSC, which means MSRs: I tend to include only states into the
>> this meta state which have fixed sizes. Otherwise things will get very
>> hairy. And the GET/SET_MSRS interface is already fairly flexible, the
>> question would be again: What can we gain by unifying?
>>
> 
> For MSRs, not much.
> 
> Note we can make it work, by storing an offset/length at a fixed
> location and letting userspace point it into the reserved area.

Hmm, pointers... That makes me think of a meta IOCTL like this:

struct kvm_vcpu_state {
int substates;
void __user *substate[0];
};

#define KVM_VCPU_STATE_REGS  0 /* i.e. substate[0] points to kvm_regs */
#define KVM_VCPU_STATE_SREGS 1
#define KVM_VCPU_STATE_LAPIC 2
...

We could easily extend the call with more substates just by defining new
pointer slots. Moreover, user space could define which substates should
be get/set by simply passing NULL or a valid pointer for substate[n] (or
by limiting the substates field).

The only ugliness I see is the missing type safety as we would have to
deal with void pointers to the substate structures here.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Heads up: More user-unaccessible x86 states?

2009-10-05 Thread Avi Kivity

On 10/05/2009 09:43 AM, Jan Kiszka wrote:

Avi Kivity wrote:
   

On 10/04/2009 09:07 PM, Jan Kiszka wrote:
 

btw, instead of adding a new ioctl, perhaps it makes sense to define a
new KVM_VCPU_STATE structure that holds all current and future state
(with generous reserved space), instead of separating state over a dozen
ioctls.


 

OK, makes sense. With our without lapic state?
   

I'm in two minds.  I'm leaning towards including lapic but would welcome
arguments either way.
 

The lapic is optional and, thus, typically handled in different code
modules by user space. QEMU even creates a separate device that holds
the state.


avx registers, nested vmx are optional as well.


I'm not sure user space will benefit from a unified query/set
interface with regard to this.
   


The main benefit is to avoid creating an ioctl each time we find a 
missing bit.



Note we have to be careful with timers such as the tsc and lapic timer.
Maybe have a bitmask at the front specifying which elements are active.
 

...and the lapic timers are another argument.

Regarding TSC, which means MSRs: I tend to include only states into the
this meta state which have fixed sizes. Otherwise things will get very
hairy. And the GET/SET_MSRS interface is already fairly flexible, the
question would be again: What can we gain by unifying?
   


For MSRs, not much.

Note we can make it work, by storing an offset/length at a fixed 
location and letting userspace point it into the reserved area.



How much "future space"?

   

avx will change the sse registers from 16x16 to 16x32, with a hint of
more to come.  Nested vmx needs the vmptr and some more bits.  MSRs are
potentially endless.  Lots of space.

 

Hmm, a some kB then (even without MSRs)...
   


Yup, it's amazing how much state modern processors hold.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Heads up: More user-unaccessible x86 states?

2009-10-05 Thread Jan Kiszka
Avi Kivity wrote:
> On 10/04/2009 09:07 PM, Jan Kiszka wrote:
>>> btw, instead of adding a new ioctl, perhaps it makes sense to define a
>>> new KVM_VCPU_STATE structure that holds all current and future state
>>> (with generous reserved space), instead of separating state over a dozen
>>> ioctls.
>>>
>>>  
>> OK, makes sense. With our without lapic state?
> 
> I'm in two minds.  I'm leaning towards including lapic but would welcome
> arguments either way.

The lapic is optional and, thus, typically handled in different code
modules by user space. QEMU even creates a separate device that holds
the state. I'm not sure user space will benefit from a unified query/set
interface with regard to this.

> 
> Note we have to be careful with timers such as the tsc and lapic timer. 
> Maybe have a bitmask at the front specifying which elements are active.

...and the lapic timers are another argument.

Regarding TSC, which means MSRs: I tend to include only states into the
this meta state which have fixed sizes. Otherwise things will get very
hairy. And the GET/SET_MSRS interface is already fairly flexible, the
question would be again: What can we gain by unifying?

> 
>> How much "future space"?
>>
> 
> avx will change the sse registers from 16x16 to 16x32, with a hint of
> more to come.  Nested vmx needs the vmptr and some more bits.  MSRs are
> potentially endless.  Lots of space.
> 

Hmm, a some kB then (even without MSRs)...

Jan



signature.asc
Description: OpenPGP digital signature


Re: Heads up: More user-unaccessible x86 states?

2009-10-04 Thread Avi Kivity

On 10/04/2009 09:07 PM, Jan Kiszka wrote:

btw, instead of adding a new ioctl, perhaps it makes sense to define a
new KVM_VCPU_STATE structure that holds all current and future state
(with generous reserved space), instead of separating state over a dozen
ioctls.

 

OK, makes sense. With our without lapic state?


I'm in two minds.  I'm leaning towards including lapic but would welcome 
arguments either way.


Note we have to be careful with timers such as the tsc and lapic timer.  
Maybe have a bitmask at the front specifying which elements are active.



How much "future space"?
   


avx will change the sse registers from 16x16 to 16x32, with a hint of 
more to come.  Nested vmx needs the vmptr and some more bits.  MSRs are 
potentially endless.  Lots of space.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Heads up: More user-unaccessible x86 states?

2009-10-04 Thread Jan Kiszka
Avi Kivity wrote:
> On 10/04/2009 12:50 PM, Jan Kiszka wrote:
>> Avi Kivity wrote:
>>   
>>> On 10/04/2009 10:59 AM, Jan Kiszka wrote:
>>> 
 Hi,

 while preparing new IOCTLs to let user space query&   set the yet
 unaccessible NMI states (pending and masked) I also came across the
 interrupt shadow masks. Unless I missed something I would say that
 we so
 far break them in the rare case that a migration happens right while
 any
 of them is asserted. So I guess I should extend my interface and stuff
 them in as well.

 Do we have more of such unaccessible states on x86 that could be
 included, too? Would be a good chance...


>>> There's some hidden state in the cpuid mechanism.  I think we expose it
>>> though (just don't use it in qemu).
>>>  
>> Do you have more details on this?
>>
> 
> Some cpuid leaves return different information based on an internal
> counter.  This is indicated by KVM_CPUID_FLAG_STATEFUL_FUNC.
> 
>> The PDPTRs are hidden state that we should save/restore, though no sane
>> guest relies on them.
>>A quick glance at SVM makes me think that those registered are not
>> exposed there. So when keeping in mind that we may only help VMX guests,
>> I think i makes even less sense to "fix" this, does it?
>>
> 
> Yes.  With npt the PDPTRs are essentially gone.
> 
>>> I think we can lose information if we migrate during a SIPI
>>> (sipi_vector), though that might be fixable without exposing it.
>>>  
>> Hmm, I see. But even it it's not fixable, such an extension would be an
>> in-kernel irqchip thing.
>>
> 
> Yes.
> 
>>> We'll might also lost debug traps.
>>>
>>> We drop pending exceptions; normally that's fine since they'll reinject
>>> themselves, but MCE will not.
>>>  
>> So would it make sense and fix those two issues when we simply save and
>> restore the pending exception?
>>
> 
> Yes.
> 
> btw, instead of adding a new ioctl, perhaps it makes sense to define a
> new KVM_VCPU_STATE structure that holds all current and future state
> (with generous reserved space), instead of separating state over a dozen
> ioctls.
> 

OK, makes sense. With our without lapic state? How much "future space"?

Jan



signature.asc
Description: OpenPGP digital signature


Re: Heads up: More user-unaccessible x86 states?

2009-10-04 Thread Avi Kivity

On 10/04/2009 12:50 PM, Jan Kiszka wrote:

Avi Kivity wrote:
   

On 10/04/2009 10:59 AM, Jan Kiszka wrote:
 

Hi,

while preparing new IOCTLs to let user space query&   set the yet
unaccessible NMI states (pending and masked) I also came across the
interrupt shadow masks. Unless I missed something I would say that we so
far break them in the rare case that a migration happens right while any
of them is asserted. So I guess I should extend my interface and stuff
them in as well.

Do we have more of such unaccessible states on x86 that could be
included, too? Would be a good chance...

   

There's some hidden state in the cpuid mechanism.  I think we expose it
though (just don't use it in qemu).
 

Do you have more details on this?
   


Some cpuid leaves return different information based on an internal 
counter.  This is indicated by KVM_CPUID_FLAG_STATEFUL_FUNC.



The PDPTRs are hidden state that we should save/restore, though no sane
guest relies on them.
   
A quick glance at SVM makes me think that those registered are not

exposed there. So when keeping in mind that we may only help VMX guests,
I think i makes even less sense to "fix" this, does it?
   


Yes.  With npt the PDPTRs are essentially gone.


I think we can lose information if we migrate during a SIPI
(sipi_vector), though that might be fixable without exposing it.
 

Hmm, I see. But even it it's not fixable, such an extension would be an
in-kernel irqchip thing.
   


Yes.


We'll might also lost debug traps.

We drop pending exceptions; normally that's fine since they'll reinject
themselves, but MCE will not.
 

So would it make sense and fix those two issues when we simply save and
restore the pending exception?
   


Yes.

btw, instead of adding a new ioctl, perhaps it makes sense to define a 
new KVM_VCPU_STATE structure that holds all current and future state 
(with generous reserved space), instead of separating state over a dozen 
ioctls.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Heads up: More user-unaccessible x86 states?

2009-10-04 Thread Jan Kiszka
Avi Kivity wrote:
> On 10/04/2009 10:59 AM, Jan Kiszka wrote:
>> Hi,
>>
>> while preparing new IOCTLs to let user space query&  set the yet
>> unaccessible NMI states (pending and masked) I also came across the
>> interrupt shadow masks. Unless I missed something I would say that we so
>> far break them in the rare case that a migration happens right while any
>> of them is asserted. So I guess I should extend my interface and stuff
>> them in as well.
>>
>> Do we have more of such unaccessible states on x86 that could be
>> included, too? Would be a good chance...
>>
> 
> There's some hidden state in the cpuid mechanism.  I think we expose it
> though (just don't use it in qemu).

Do you have more details on this?

> 
> The PDPTRs are hidden state that we should save/restore, though no sane
> guest relies on them.

A quick glance at SVM makes me think that those registered are not
exposed there. So when keeping in mind that we may only help VMX guests,
I think i makes even less sense to "fix" this, does it?

> 
> I think we can lose information if we migrate during a SIPI
> (sipi_vector), though that might be fixable without exposing it.

Hmm, I see. But even it it's not fixable, such an extension would be an
in-kernel irqchip thing.

> 
> We'll might also lost debug traps.
> 
> We drop pending exceptions; normally that's fine since they'll reinject
> themselves, but MCE will not.

So would it make sense and fix those two issues when we simply save and
restore the pending exception?

Jan



signature.asc
Description: OpenPGP digital signature


Re: Heads up: More user-unaccessible x86 states?

2009-10-04 Thread Avi Kivity

On 10/04/2009 10:59 AM, Jan Kiszka wrote:

Hi,

while preparing new IOCTLs to let user space query&  set the yet
unaccessible NMI states (pending and masked) I also came across the
interrupt shadow masks. Unless I missed something I would say that we so
far break them in the rare case that a migration happens right while any
of them is asserted. So I guess I should extend my interface and stuff
them in as well.

Do we have more of such unaccessible states on x86 that could be
included, too? Would be a good chance...
   


There's some hidden state in the cpuid mechanism.  I think we expose it 
though (just don't use it in qemu).


The PDPTRs are hidden state that we should save/restore, though no sane 
guest relies on them.


I think we can lose information if we migrate during a SIPI 
(sipi_vector), though that might be fixable without exposing it.


We'll might also lost debug traps.

We drop pending exceptions; normally that's fine since they'll reinject 
themselves, but MCE will not.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Heads up: More user-unaccessible x86 states?

2009-10-04 Thread Jan Kiszka
Hi,

while preparing new IOCTLs to let user space query & set the yet
unaccessible NMI states (pending and masked) I also came across the
interrupt shadow masks. Unless I missed something I would say that we so
far break them in the rare case that a migration happens right while any
of them is asserted. So I guess I should extend my interface and stuff
them in as well.

Do we have more of such unaccessible states on x86 that could be
included, too? Would be a good chance...

Jan



signature.asc
Description: OpenPGP digital signature