Ramon van Handel wrote:
> Let's say we run linux in the virtual machine. If we do it the
> economical way (the way it is now), then the linux system call interrupt
> will jump directly to the system call entry point, without any
> hassle in the monitor --> fast. Your way, such an oft-used interrupt
> is slowed down immensely. I'd rather slow things down only if
> there's no other choice...
I'm assuming you're thinking in terms of a guest app making
a sys call to the guest OS using a soft int 0x80. That would
be a cool trick if we could do it. Some comments on that, things
that were only touched on a while ago:
While running the guest OS, we have the paging
system set up much like the guest OS expects it, so the application
code can only access application pages and not kernel stuff.
Part of the transition process in the sys call, of course,
is that the CPL goes from 3 to 0 (in Linux anyway), at least
when the guest is running natively. If we push guest OS code
down to ring3, then we are forced to trap the system calls,
because we need a chance to switch over to a separate set
of page tables, which would give the effectively-ring0 guest
code access to all the pages it expects. Where
guest pages are marked supervisor, our copy would actually
be marked user. This is how we could virtualize them.
So for our current strategy anyhow, we can't allow the transition
without intervention, unless we look at using a task gate in
the IDT pointing to a TSS, since a task switch also reloads the
PDBR (CR3) from the new TSS.
Have to look into that more; currently we're not using tasking
at all. But it's good to keep some ideas cookin' on the back
burner.
A second strategy is this. We could push ring0 guest code
down to ring1, and still virtualize it there, rather than
push it down to ring3. The paging unit treats all the
system rings 0..2 as supervisor and ring3 as user, so to
the paging unit ring1 is as good as ring0. Of course, this
makes us look at our other strategies and see what we would break.
For instance, during our guest code execution, we have the
interrupt handlers mapped into the linear address space because
we have to. If guest code runs in ring1, which the paging unit
treats like ring0, then read access to this space would no
longer trap out. We have to
consider other stuff too. I think it's best to set this idea
aside for now, and see how everything else settles out. But
I certainly would like to revisit this all down the road.
Its usefulness should be much more evident in the future.
> I've been thinking of other ways to find out what interrupts are
> allocated without having access to structures that I'm not supposed
> to access (I assume that all the unused interrupts have a specific
> kind of IDT entry... we could try to loop through the IDT to
> figure out which interrupts are actually in use.) I'm still looking
> into that. OTOH, for such a little thing like exporting a symbol,
> we might as well ask the linux guys to add an extra line to their
> kernel :)
I'm not sure how many dynamic changes to the IDT happen in
various OSes, but scanning the IDT may be dangerous if there are some.
I might propose this. Mark all interrupts from 0x20 to 0xff
as needing reflection (hardware ints). Then mark known
soft ints for that guest as OK for direct guest-app-to-guest-OS
calls, provided we determine we can do this correctly
and want to add it as an optimization. But the monitor is
free to ignore these hints, as we will do up front anyway. For now,
let's not waste much time on this and just reflect everything
back. But, hold that thought...
-Kevin