OK, attached is another dose of technical ideas
for the implementation of FreeMWare.  This stuff is
important, as it relates to the next wave of
enhancements to our code.  I'd appreciate your reactions
if you're into the techie side of FreeMWare.


-Kevin
Hey,

I'm thinking in terms of going the next step with
FreeMWare and adding our own private page-table mappings,
which we'll need as part of the overall virtualization
strategy.  To this end, there are some issues worth discussing
which relate to the linear address space of the
monitor and guest OS.

BTW, currently I just use the host page tables etc. and
don't change CR3 (PDBR) upon switches to/from the
host and monitor.  However, you'll notice there's
a little code in there in preparation for paging.



MAPPING THE MONITOR'S GDT AND IDT INTO THE GUEST LINEAR SPACE
=============================================================

The first issue is dealing with mapping the monitor's IDT
and GDT into the linear address space used by both the
monitor and guest OS.

The monitor is never directly invoked by the guest
code; it is only invoked via interrupts and exceptions,
which gate through the monitor's IDT.  The gate descriptors
in the IDT contain code segment selectors which refer to
descriptors in the GDT.  So let's look at both, with respect
to where to map them.
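To make the IDT-to-GDT relationship concrete, here is a minimal C sketch of one 8-byte IA-32 interrupt gate as it sits in the IDT. The selector field is what "points into the GDT": bits 15..3 index a descriptor, and bit 2 (TI) chooses the GDT (0) or the LDT (1). The type and helper names are hypothetical, not taken from the FreeMWare source.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte IA-32 interrupt/trap gate as stored in the IDT. */
typedef struct {
    uint16_t offset_lo;  /* handler offset, bits 15..0 */
    uint16_t selector;   /* code segment selector, normally into the GDT */
    uint8_t  reserved;   /* must be zero for interrupt/trap gates */
    uint8_t  type_attr;  /* P (bit 7), DPL (bits 6..5), gate type (bits 3..0) */
    uint16_t offset_hi;  /* handler offset, bits 31..16 */
} idt_gate;

_Static_assert(sizeof(idt_gate) == 8, "IDT gates are 8 bytes");

/* Build a present 32-bit interrupt gate (type 0xE). */
idt_gate make_int_gate(uint16_t selector, uint32_t offset, unsigned dpl) {
    assert(dpl <= 3);
    idt_gate g;
    g.offset_lo = (uint16_t)(offset & 0xffffu);
    g.selector  = selector;
    g.reserved  = 0;
    g.type_attr = (uint8_t)(0x80u | (dpl << 5) | 0x0eu);
    g.offset_hi = (uint16_t)(offset >> 16);
    return g;
}

/* Reassemble the handler offset from the two split fields. */
uint32_t gate_offset(const idt_gate *g) {
    return ((uint32_t)g->offset_hi << 16) | g->offset_lo;
}

/* Which descriptor-table slot the selector refers to. */
unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Nonzero if the selector refers to the LDT rather than the GDT. */
int selector_uses_ldt(uint16_t sel) { return (sel >> 2) & 1; }
```

For example, a gate built with selector 0x10 refers to GDT entry 2, so both the IDT and that GDT entry must be reachable at the linear addresses the CPU uses whenever guest code is running.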

As our IDT and GDT must occupy the same linear address
space as the guest code which is normally executing,
we need mechanisms which allow the two to coexist.

Another point worth noting is that the SGDT and SIDT
instructions are not protected and thus ring3 (user)
code may execute them.  They each return a base address
and limit, the base address being a pure linear address
independent of the code and data segment base addresses.
To offer really precise virtualization, in the sense that
the user program will not detect us influencing the
base linear address at which we store these structures,
we could use one of the following two approaches.

(Generally, our thought process so far is to use the
 pre-scanning technique to virtualize instructions at
 ring0.  There may be some optimizations here specific
 to certain guest OSes.  But for now, I'll leave it at that.)
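Before the two approaches, it helps to pin down what SGDT and SIDT actually store. In 32-bit code each writes a 6-byte memory image: a 16-bit limit followed by the 32-bit linear base. Under approach #1 the monitor would have to produce this image itself with the guest's expected values; under approach #2 the hardware writes it directly. The sketch below, with hypothetical names, just encodes and decodes that image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the 6-byte SGDT/SIDT memory image (32-bit mode). */
typedef struct {
    uint16_t limit;  /* table size in bytes, minus one */
    uint32_t base;   /* pure linear address of the table */
} pseudo_desc;

/* Parse the little-endian image: limit first, then base. */
pseudo_desc decode_pseudo_desc(const uint8_t buf[6]) {
    assert(buf != NULL);
    pseudo_desc d;
    d.limit = (uint16_t)(buf[0] | (buf[1] << 8));
    d.base  = (uint32_t)buf[2] | ((uint32_t)buf[3] << 8) |
              ((uint32_t)buf[4] << 16) | ((uint32_t)buf[5] << 24);
    return d;
}

/* Produce the image an emulated SGDT/SIDT would store for the guest. */
void encode_pseudo_desc(uint8_t buf[6], pseudo_desc d) {
    assert(buf != NULL);
    buf[0] = (uint8_t)d.limit;
    buf[1] = (uint8_t)(d.limit >> 8);
    for (int i = 0; i < 4; i++)
        buf[2 + i] = (uint8_t)(d.base >> (8 * i));
}

/* Number of 8-byte descriptors covered by the limit. */
uint32_t descriptor_count(pseudo_desc d) {
    return ((uint32_t)d.limit + 1) / 8;
}
```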

Approach #1:

For user code, if we are performing the pre-scanning
technique, we could simply virtualize the SGDT and SIDT
instructions, and emulate them to return the values
which the guest code expects.  In this case we can place
the GDT and IDT structure in linear memory such that
they are in an area which is not currently used by
either guest-OS or guest-user code.  We do have access
to the guest page tables, so it is fairly easy to
find a free area.
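As a rough illustration of "finding a free area", the sketch below scans a guest page directory (classic non-PAE 32-bit paging, 1024 entries of 4 MiB each) for a run of not-present entries. Treating a not-present PDE as free is an assumption: the guest OS may still intend to use that range later. The function names are hypothetical, and a finer-grained version could scan page-table entries instead.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_PRESENT 0x001u
#define PD_ENTRIES  1024

/* Return the first index of `count` consecutive not-present PDEs at or
   above `start`, or -1 if there is no such run. */
int find_free_pde_run(const uint32_t pd[PD_ENTRIES], int start, int count) {
    assert(start >= 0 && count > 0);
    int run = 0;
    for (int i = start; i < PD_ENTRIES; i++) {
        if (pd[i] & PDE_PRESENT)
            run = 0;                 /* guest uses this 4 MiB slot */
        else if (++run == count)
            return i - count + 1;    /* found a long enough gap */
    }
    return -1;
}

/* Each PDE index covers a 4 MiB-aligned block of linear addresses. */
uint32_t pde_index_to_linear(int index) {
    return (uint32_t)index << 22;
}
```

For example, if entry 0x188 is the first free slot, the free block starts at linear address 0x62000000, which is plenty for a 64 KiB handler image.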

However, we would have to page protect the areas of memory
which contain what the guest-OS thinks is the real GDT and IDT,
and use the fault opportunity to update the real ones used
by the monitor.
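Here is one possible shape for that fault-driven update, as a minimal sketch. It assumes the guest's copy of the table is write-protected, that the page-fault handler has the faulting linear address (CR2) and error code, and that some hypothetical instruction decoder (not shown) has already reduced the faulting instruction to an aligned 32-bit store value. All names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PF_WRITE 0x2u  /* page-fault error code bit 1: the access was a write */

/* Hypothetical bookkeeping for one virtualized table (GDT or IDT). */
typedef struct {
    uint32_t base;          /* linear address the guest OS loaded */
    uint16_t limit;         /* guest's limit (size in bytes minus one) */
    uint8_t *guest_copy;    /* what the guest OS thinks is the real table */
    uint8_t *monitor_copy;  /* the table the CPU actually uses */
} shadowed_table;

/* Emulate an aligned 32-bit store that faulted on the guest's copy and
   mirror the affected descriptor into the monitor's copy.
   Returns 1 if handled, 0 if this is not a write into this table. */
int handle_table_write_fault(shadowed_table *t, uint32_t cr2,
                             uint32_t error_code, uint32_t value) {
    assert(t != NULL);
    if (!(error_code & PF_WRITE) || cr2 < t->base)
        return 0;
    uint32_t off = cr2 - t->base;
    if (off > t->limit || t->limit - off < 3 || (off & 3u))
        return 0;  /* outside the table, or not an aligned dword store */
    memcpy(t->guest_copy + off, &value, sizeof value);  /* little-endian host */
    uint32_t entry = off & ~7u;           /* start of the containing descriptor */
    uint32_t n = t->limit + 1u - entry;   /* never copy past the table end */
    if (n > 8)
        n = 8;
    /* A real monitor would adjust the descriptor here (e.g. its DPL, since
       guest code is pushed down to ring3) instead of copying it verbatim. */
    memcpy(t->monitor_copy + entry, t->guest_copy + entry, n);
    return 1;
}
```

Reads of the protected area, and unaligned or multi-descriptor stores, are deliberately left out of this sketch.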

Approach #2:

If we wanted to let these two instructions execute without
virtualized intervention, and still yield accurate results
with respect to the base address returned, then we could
actually place the GDT and IDT structures at their expected
linear addresses.  Since we need to page protect the GDT
and IDT from access by the guest-OS code anyway, so that
we can virtualize these structures, we might as well actually
place them where the guest OS thinks they should be.
Keep in mind that both the guest-OS and guest-app code
will be pushed down to ring3, so they will generate
a page fault upon trying to access the areas of memory
containing the GDT and IDT, which we of course protected.
This gives us a chance to do something smart with the
access.
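Since protection works on whole pages, the monitor has to work out which linear pages a table at `base .. base+limit` touches, and then check each fault against that set. A minimal sketch with hypothetical names, assuming 4 KiB pages and a table that does not wrap past 4 GiB:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages */

typedef struct {
    uint32_t first_page;  /* linear page number of the table's first byte */
    uint32_t count;       /* number of pages the table touches */
} page_span;

/* Linear pages covered by a descriptor table at base..base+limit. */
page_span table_pages(uint32_t base, uint16_t limit) {
    uint32_t last = base + limit;  /* inclusive last byte */
    assert(last >= base);          /* this sketch assumes no wraparound */
    page_span s;
    s.first_page = base >> PAGE_SHIFT;
    s.count = (last >> PAGE_SHIFT) - s.first_page + 1;
    return s;
}

/* Does a faulting linear address fall on one of the protected pages? */
int page_is_protected(page_span s, uint32_t linear) {
    uint32_t page = linear >> PAGE_SHIFT;
    return page >= s.first_page && page - s.first_page < s.count;
}
```

One design consequence: any unrelated guest data sharing those pages also faults, so a table that straddles a page boundary (for example a 16-byte table at 0x1ff8) costs two protected pages instead of one.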


MAPPING THE ACTUAL MONITOR INTERRUPT HANDLER CODE INTO
THE GUEST LINEAR SPACE
======================================================

Now that we've discussed placing the GDT and IDT in
linear memory, we need to map the actual interrupt handler
code as well.  Since we will be virtualizing the IDT and
GDT, the guest OS will not see our segment descriptors
and selectors, so we have some freedom here.  We can
place this code (by page mapping it) into an unused
linear address range, again given we have access to the
guest-OS page tables.
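The mapping itself might look like the following sketch (non-PAE 32-bit paging, hypothetical names). It fills consecutive entries of one page table with the physical frames backing the handler code and leaves the U/S bit clear, so the pages are supervisor-only. How the physical frames of the kernel module are obtained is host-OS specific and not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT  0x001u
#define PTE_WRITABLE 0x002u
#define PTE_USER     0x004u  /* deliberately left clear: supervisor-only */

/* Map `n` page-aligned physical frames at consecutive linear pages starting
   at `linear`, all within the single 1024-entry page table `pt`. */
void map_supervisor_pages(uint32_t pt[1024], uint32_t linear,
                          const uint32_t *frames, size_t n, int writable) {
    size_t first = (linear >> 12) & 0x3ffu;  /* PTE index within the table */
    assert(first + n <= 1024);
    for (size_t i = 0; i < n; i++) {
        assert((frames[i] & 0xfffu) == 0);   /* frames must be page aligned */
        pt[first + i] = frames[i] | PTE_PRESENT |
                        (writable ? PTE_WRITABLE : 0u);
    }
}
```

For a range starting at 0x62000000, this page table would then be installed in page-directory entry 0x188 (0x62000000 >> 22).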

The interrupt handler code is actually just code
linked with our host-OS kernel module.  The consideration
here is that code generated by the compiler uses offsets
relative to the code and data segment bases.  This code will not
call functions in the host-OS kernel, and when used in the
monitor/guest context it must confine its accesses to its own
code and data.

So we must set the monitor's code and data segment base addresses
such that the offsets make sense, based on the linear address
where we map in the code.  For example, let's say our host OS
normally uses a CS segment base of 0xc0000000
(like previous Linux kernels) and our kernel module lives
in the linear range 0xc2000000 .. 0xc200ffff.

Then let's say that based on empty areas in the guest OS's
page tables, we find a free range living at
0x62000000 .. 0x6200ffff.  We would give the descriptor for
our interrupt handler a base of 0x60000000, so that the offsets
the module code already uses (0x02000000 .. 0x0200ffff) remain
consistent and now resolve into the new range.
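The arithmetic in this example reduces to two subtractions: the offset the code uses is `host_linear - host_base`, and the new segment base is `new_linear - offset`. A minimal sketch (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Segment base that keeps the module's existing offsets valid after its
   pages are mapped at `new_linear` instead of `host_linear`.
     offset   = host_linear - host_base
     new_base = new_linear - offset
   Unsigned arithmetic wraps modulo 2^32, matching the CPU's base+offset. */
uint32_t relocated_segment_base(uint32_t host_base, uint32_t host_linear,
                                uint32_t new_linear) {
    assert(host_linear >= host_base);
    uint32_t offset = host_linear - host_base;
    return new_linear - offset;
}
```

With the numbers above, `relocated_segment_base(0xc0000000, 0xc2000000, 0x62000000)` yields 0x60000000, and the module's last offset 0x0200ffff then lands on 0x6200ffff.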

And of course, we mark these pages as supervisor, so that
if they are accessed by the guest OS (running at ring3), a fault
will occur.  We will also be virtualizing the guest-OS page
tables by protecting that area of memory, so we will know
whenever the guest OS updates its page tables and can
adjust our strategy accordingly.  This gives us a
perfect opportunity to detect when an area of memory
is no longer free.  If the guest OS starts using a linear
address range which conflicts with the range we are using
for our monitor code, we can simply change the segment
descriptor base addresses for code and data, and remap the
handler code to another linear address range which is
currently free.  No memory transfers occur, only remapping
of addresses.
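The descriptor half of that relocation is small. In an 8-byte segment descriptor the 32-bit base is split across bytes 2-3 (bits 15..0), byte 4 (bits 23..16) and byte 7 (bits 31..24), so rebasing means rewriting just those four bytes. The sketch below, with hypothetical names, detects a conflict with a newly used guest range and rebases the monitor's CS and DS descriptors; remapping the page tables is a separate step.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t start, end; } lin_range;  /* inclusive bounds */

int ranges_overlap(lin_range a, lin_range b) {
    return a.start <= b.end && b.start <= a.end;
}

/* Rewrite only the base fields; limit, type, DPL and flags are untouched. */
void set_descriptor_base(uint8_t d[8], uint32_t base) {
    d[2] = (uint8_t)base;
    d[3] = (uint8_t)(base >> 8);
    d[4] = (uint8_t)(base >> 16);
    d[7] = (uint8_t)(base >> 24);
}

uint32_t get_descriptor_base(const uint8_t d[8]) {
    return (uint32_t)d[2] | ((uint32_t)d[3] << 8) |
           ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
}

/* If the guest's newly used range collides with the monitor's range,
   rebase the monitor's CS and DS descriptors to `new_base`.
   Returns 1 if a relocation was needed. */
int maybe_relocate(lin_range monitor, lin_range used,
                   uint8_t cs[8], uint8_t ds[8], uint32_t new_base) {
    assert(monitor.start <= monitor.end && used.start <= used.end);
    if (!ranges_overlap(monitor, used))
        return 0;
    set_descriptor_base(cs, new_base);
    set_descriptor_base(ds, new_base);
    return 1;
}
```

Two caveats: the CPU caches descriptors in the hidden part of the segment registers, so a new base only takes effect once CS/DS are reloaded (for example on the next entry through an IDT gate), and the remapped pages need their stale TLB entries flushed. Either way the cost is a few descriptor bytes plus page-table entries, not a copy of the handler code.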

This kind of overhead will only occur each time we
find we are no longer living in free memory.  To reduce
this even further, we could start out at one of a few
well-known alternative addresses, and use the others as
part of our relocation strategy.  These could be addresses
which particular guest OSes are unlikely to use.
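A well-known-address strategy could be as simple as walking a fixed preference list and taking the first candidate that avoids every range the guest is known to use. The candidate addresses below are placeholders, not addresses known to be unused by any particular guest OS, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t start, end; } lin_range;  /* inclusive bounds */

/* Placeholder preference list; real values would be tuned per guest OS. */
static const uint32_t candidate_bases[] = {
    0x62000000u, 0x7e000000u, 0xbe000000u, 0xfe000000u
};

static int overlaps(lin_range a, lin_range b) {
    return a.start <= b.end && b.start <= a.end;
}

/* Choose the first candidate whose [c, c + size - 1] range avoids every
   range in `used`. Returns 1 and fills `out` on success, 0 otherwise. */
int pick_monitor_range(const lin_range *used, size_t n_used,
                       uint32_t size, lin_range *out) {
    assert(size > 0 && out != NULL);
    size_t n_cand = sizeof candidate_bases / sizeof candidate_bases[0];
    for (size_t c = 0; c < n_cand; c++) {
        lin_range r = { candidate_bases[c], candidate_bases[c] + (size - 1) };
        if (r.end < r.start)
            continue;  /* would wrap past 4 GiB */
        int clash = 0;
        for (size_t i = 0; i < n_used && !clash; i++)
            clash = overlaps(r, used[i]);
        if (!clash) {
            *out = r;
            return 1;
        }
    }
    return 0;
}
```

If every candidate clashes, the monitor can fall back to scanning the guest page tables for any free range.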

-Kevin
