Carsten Otte wrote:
>
> Avi Kivity wrote:
>> We'll want to keep a vcpu fd.  If the vcpu is idle we'll be asleep in
>> poll() or the like, and we need some kind of wakeup mechanism.
> Our userspace does idle/wakeup differently:
> One cpu exits sys_s390host_sie, and the intercept code indicates a
> halt with interrupts enabled (cpu idle loop).  Now userland parks our
> vcpu thread in pthread_cond_wait.  Once we want to wake up this thread,
> either by interprocessor signal (need_resched and such) or due to an
> IO interrupt, we do a pthread_cond_signal to wake the thread again.
> The thread will now enter sys_s390host_sie, and after entering the
> vcpu context will execute the interrupt handler first.
> The advantage I see in waiting in userland is that userspace can
> deliver interrupts to idle CPUs without kernel intervention.  On the
> other hand, my brain hurts when thinking about userland passing vcpu
> fds to other threads/processes and when thinking about sys_fork().
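
For concreteness, the userland parking described above would look roughly
like the sketch below.  This is illustrative only: the vcpu struct, the
INTERCEPT_WAIT value and the C-level sys_s390host_sie() wrapper are names
invented for the sketch, not the actual s390 userspace code.

    #include <pthread.h>

    #define INTERCEPT_WAIT 1                     /* invented value */
    struct vcpu;
    extern int sys_s390host_sie(struct vcpu *v); /* assumed syscall wrapper */

    struct vcpu {
        pthread_mutex_t lock;
        pthread_cond_t  wakeup;   /* signalled on IPI or I/O interrupt */
        int             pending;  /* set by the thread doing the wakeup */
    };

    /* vcpu thread: run the guest, park in userland on a halted cpu */
    static void vcpu_loop(struct vcpu *v)
    {
        for (;;) {
            int reason = sys_s390host_sie(v);    /* run until intercept */

            if (reason == INTERCEPT_WAIT) {      /* halt, interrupts enabled */
                pthread_mutex_lock(&v->lock);
                while (!v->pending)
                    pthread_cond_wait(&v->wakeup, &v->lock);
                v->pending = 0;
                pthread_mutex_unlock(&v->lock);
                /* re-entering sie runs the interrupt handler first */
            }
        }
    }

    /* called from another thread to deliver an interrupt to an idle vcpu */
    static void vcpu_kick(struct vcpu *v)
    {
        pthread_mutex_lock(&v->lock);
        v->pending = 1;
        pthread_cond_signal(&v->wakeup);
        pthread_mutex_unlock(&v->lock);
    }

Interrupt delivery to an idle vcpu is then just a matter of queueing the
interrupt in userspace state and calling vcpu_kick().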
In both cases you wait in the kernel; with an fd you wait in the kernel
and with pthread_cond_wait you wait in futex(FUTEX_WAIT) or a close
relative.  Can one do the equivalent of a futex wakeup from the kernel
easily?

> In the end, you make the decision and we'll follow the way you lead.
> My primary concern is not to lock userspace into one way of working.

This is really another sad side effect of the kernel providing a
bazillion sleep/wakeup methods.

>> I guess some of the difference stems from the fact that on x86, the
>> Linux pagetables are actually the hardware pagetables.  VT and SVM
>> use a separate page table for the guest which cannot be shared with
>> the host.  This means that
>>
>> - we need to teach the Linux mm to look at shadow page tables when
>>   transferring dirty bits
>> - when Linux wants to write protect a page, it has to modify the
>>   shadow page tables too (and flush the guest tlbs, which is again a
>>   bit different)
>> - this means rmap has to be extended to include kvm
>>
>> I think that non-x86 architectures have purely software page tables;
>> maybe this makes things easier.
> We do use hardware page tables too.  Our hardware does know about
> multiple levels of page translation, and does its part of maintaining
> different sets of dirty/reference bits for guest and host while
> running in the virtual machine context.  This process is transparent
> to both the virtual machine and the host.

Nested page tables/extended page tables also provide this facility,
with some caveats:

- on 32-bit hosts (or 64-bit hosts with 32-bit userspace), host
  userspace virtual address space is not large enough to contain the
  guest physical address space
- there is no way to protect host userspace from the guest
- some annoying linker scripts need to be used when compiling host
  userspace, to move it out of the guest area, making it more
  difficult to write kvm userspace

I think there's a way to work around these issues on 64-bit npt
hardware: allocate a pgd entry (at a non-zero offset) to hold guest
physical memory, and copy this pgd entry into a guest-only pgd at
offset zero (a rough sketch follows at the end of this mail).

Of course, there are many millions of non-npt/ept processors out
there, and we can't leave them out in the cold, so we'll have to work
something out for classical shadow page tables.
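
To spell out the pgd idea, here is a rough sketch assuming 4-level
x86-64 paging.  GUEST_PHYS_BASE, build_guest_pgd() and the choice of
slot are assumptions made for illustration; none of this is existing
kvm code.

    #include <asm/pgtable.h>

    /* Guest physical memory lives in a single host pgd slot at a fixed,
     * non-zero userspace offset (invented value: the second 512GB slot). */
    #define GUEST_PHYS_BASE   (1UL << PGDIR_SHIFT)

    static void build_guest_pgd(pgd_t *guest_pgd, pgd_t *host_pgd)
    {
        /* Host view: the slot covering GUEST_PHYS_BASE maps guest memory. */
        unsigned int host_idx = pgd_index(GUEST_PHYS_BASE);

        /* Guest view: copy that entry to index 0, so guest physical
         * address 0 resolves through the same lower-level page tables. */
        set_pgd(&guest_pgd[0], host_pgd[host_idx]);
    }

Since the guest-only pgd contains nothing but that single copied entry,
the guest cannot reach the rest of host userspace, and host userspace
keeps its normal layout, which is presumably how this works around the
second and third caveats above.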