Carsten Otte wrote:
>
> Avi Kivity wrote:
>> We'll want to keep a vcpu fd.  If the vcpu is idle we'll be asleep in 
>> poll() or the like, and we need some kind of wakeup mechanism.
> Our userspace does idle/wakeup differently:
> One cpu exits sys_s390host_sie, and the intercept code indicates a 
> halt with interrupts enabled (cpu idle loop). Now userland parks our 
> vcpu thread in pthread_cond_wait. Once we want to wake up this 
> thread, either by interprocessor signal (need_resched and such) or 
> due to an I/O interrupt, we do a pthread_cond_signal to wake the 
> thread again. 
> The thread will now enter sys_s390host_sie, and after entering the 
> vcpu context will execute the interrupt handler first.
> The advantage I see in waiting in userland is that userspace can 
> deliver interrupts to idle CPUs without kernel intervention. On the 
> other hand, my brain hurts when thinking about userland passing vcpu 
> fds to other threads/processes, and when thinking about sys_fork().

In both cases you wait in the kernel: with an fd you sleep in poll(), 
and with pthread_cond_wait you sleep in futex(FUTEX_WAIT) or a close 
relative.

Can one do the equivalent of a futex wakeup from the kernel easily?
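
For concreteness, here's a rough sketch of the userland parking scheme 
described above, in plain pthreads.  The names (struct vcpu, 
enter_sie(), vcpu_kick()) are made up for illustration and the real 
intercept handling is obviously more involved; pthread_cond_wait() is 
what ends up in futex(FUTEX_WAIT) underneath, and pthread_cond_signal() 
in futex(FUTEX_WAKE) or a close relative.

#include <pthread.h>
#include <stdbool.h>

/* hypothetical per-vcpu state; field names are illustrative only */
struct vcpu {
    pthread_mutex_t lock;
    pthread_cond_t  wakeup;            /* signalled on IPI or I/O interrupt */
    bool            interrupt_pending;
};

/* placeholder wrapper around sys_s390host_sie; returns an intercept code */
extern int enter_sie(struct vcpu *vcpu);
#define INTERCEPT_HALT 1

static void vcpu_run_loop(struct vcpu *vcpu)
{
    for (;;) {
        if (enter_sie(vcpu) == INTERCEPT_HALT) {
            /* cpu idle: park the thread until work is injected */
            pthread_mutex_lock(&vcpu->lock);
            while (!vcpu->interrupt_pending)
                pthread_cond_wait(&vcpu->wakeup, &vcpu->lock);
            vcpu->interrupt_pending = false;
            pthread_mutex_unlock(&vcpu->lock);
            /* re-entering sie runs the interrupt handler first */
        }
        /* ... handle the other intercept reasons ... */
    }
}

/* called from another thread on an interprocessor signal or I/O interrupt */
static void vcpu_kick(struct vcpu *vcpu)
{
    pthread_mutex_lock(&vcpu->lock);
    vcpu->interrupt_pending = true;
    pthread_cond_signal(&vcpu->wakeup);
    pthread_mutex_unlock(&vcpu->lock);
}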

> In the end, you make the decision and we'll follow whichever way you lead.
>

My primary concern is not to lock userspace into one way of working.  
This is really another sad side effect of the kernel providing a 
bazillion sleep/wakeup methods.

>> I guess some of the difference stems from the fact that on x86, the 
>> Linux pagetables are actually the hardware pagetables.  VT and SVM 
>> use a separate page table for the guest which cannot be shared with 
>> the host. This means that
>>
>> - we need to teach the Linux mm to look at shadow page tables when 
>> transferring dirty bits
>> - when Linux wants to write protect a page, it has to modify the 
>> shadow page tables too (and flush the guest tlbs, which is again a 
>> bit different)
>> - this means rmap has to be extended to include kvm
>>
>> I think that non-x86 architectures have purely software page 
>> tables; maybe this makes things easier.
> We do use hardware page tables too. Our hardware knows about 
> multiple levels of page translation, and does its part of maintaining 
> separate sets of dirty/reference bits for guest and host while 
> running in the virtual machine context. This process is transparent 
> to both the virtual machine and the host.

Nested page tables/extended page tables also provide this facility, with 
some caveats:

- on 32-bit hosts (or 64-bit hosts with 32-bit userspace), host 
userspace virtual address space is not enough to contain the guest 
physical address space.
- there is no way to protect the host userspace from the guest
- some annoying linker scripts are needed when building the host 
userspace to move it out of the guest userspace area, making it more 
difficult to write kvm userspace (see the sketch below)
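
To make that last caveat concrete: the scheme above seems to imply that 
guest physical address N is backed by host virtual address N, so 
anything the kvm userspace binary itself occupies in that range would 
collide with guest memory.  A toy sanity check (the __executable_start 
symbol comes from the default GNU ld script; the guest size constant is 
made up) would be:

#include <assert.h>
#include <stdint.h>

/* first address occupied by the host executable, provided by the
   default linker script */
extern char __executable_start[];

/* hypothetical size of the guest physical address space */
#define GUEST_PHYS_SIZE (3ULL << 30)    /* 3GB guest */

static void check_layout(void)
{
    /* guest physical pages sit at host virtual 0..GUEST_PHYS_SIZE, so
       the host binary must have been linked above that window -- which
       is what the linker script gymnastics are for */
    assert((uintptr_t)__executable_start >= GUEST_PHYS_SIZE);
}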

I think there's a way to work around these issues on 64-bit npt 
hardware: allocate a pgd entry (at a non-zero offset) to hold guest 
physical memory, and copy this pgd entry into a guest-only pgd at offset 
zero.
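
Very roughly, and with made-up names (kernel-side pseudo-code, not a 
real interface); both pgds would then share the lower-level page tables 
underneath that entry:

#include <asm/pgtable.h>

/* hypothetical pgd slot where the host mm keeps guest physical memory */
#define GUEST_PHYS_PGD_INDEX 1

static void sync_guest_pgd(pgd_t *host_pgd, pgd_t *guest_pgd)
{
    /* the guest-only pgd aliases the host's guest-memory entry at
       slot 0, so the guest sees its memory starting at gpa 0 while
       host userspace keeps its normal layout */
    set_pgd(&guest_pgd[0], host_pgd[GUEST_PHYS_PGD_INDEX]);

    /* all other entries stay empty, so the guest cannot reach host
       userspace mappings at all */
}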

Of course, there are many millions of non-npt/ept processors out there, 
and we can't leave them out in the cold, so we'll have to work something 
out for classical shadow page tables.

-- 
error compiling committee.c: too many arguments to function

