I always like to see people thinking of stuff like this. The
true hacker spirit!
Though, here's the deal.
In guest CPL={0,1,2}, you need to monitor execution flow, because of
the RPL modification. So for those privilege levels, it's no extra
work to virtualize any arbitrary instruction, including VERR/VERW,
since you have to have this support anyways to monitor the
execution stream (SBE, dynamic translation, whatever).
In guest CPL={3}, more things trap naturally, the CPU offers tighter
page permissions, and the selector|RPL values can be used as-is.
So there is potential to run guest code at this level without
code flow monitoring. There are only a few things that prevent
100% virtualization at this level, and generally they matter only if
code is specifically trying to look for a VM.
The VERR/VERW instructions are among this set. If your goal is to
have perfect virtualization with code monitoring off at CPL3, then
you create more problems than you solve by modifying the selectors.
Letting the selector|RPL values be as expected is important when running
without code flow monitoring enabled. Otherwise PUSH CS; POP EAX breaks:
the guest reads back a selector value it never loaded.
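To make the PUSH CS; POP EAX point concrete, here is a rough sketch in Python of how a selector splits into index, table indicator, and RPL. The helper names and the example selector values are hypothetical, picked only for illustration; the bit layout itself is the standard x86 one.

```python
def decode_selector(sel):
    """Split a 16-bit x86 selector into (index, table_indicator, rpl)."""
    return (sel >> 3) & 0x1FFF, (sel >> 2) & 1, sel & 3

def make_selector(index, table_indicator, rpl):
    """Build a 16-bit selector from its three fields."""
    return ((index & 0x1FFF) << 3) | ((table_indicator & 1) << 2) | (rpl & 3)

# Hypothetical guest CPL3 code selector: GDT index 3, RPL 3.
guest_cs = make_selector(3, 0, 3)        # 0x001B, what the guest loaded

# Assumption: the monitor remaps it to one of its own descriptors (index 7680).
remapped_cs = make_selector(7680, 0, 3)  # 0xF003, what PUSH CS would expose

# The guest can see the difference just by reading the register back.
visible_to_guest = remapped_cs != guest_cs
```

With selectors left as-is at CPL3, the value read back equals the value loaded, so there is nothing for the guest to notice.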
I only skimmed your posting. But it seems, in the end, it doesn't
solve the problem. Let me know if I missed an important point.
-Kevin
Willow Schlanger wrote:
> WSRFC0
>
> BTW it is possible to make instructions like VERR/VERW fault. Here's
> how: load the GDT with a 64KB limit (when I refer to the guest's GDT, I
> will say, VGDT. Thus also VIDT, VTR, VCR0, etc.)
>
> Make the monitor use only those GDT descriptors located in the last 4KB
> of the GDT.
>
> Assume for now that the guest will never load a GDT with a limit of 64KB
> and use that last 4KB (it won't, if it's any OS I know of: Windows 98
> and 2000 included).
>
> Descriptors 0..7679 -> available for the guest
> Descriptors 7680..8191 -> available for the monitor
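
The split above follows from simple arithmetic, sketched here in Python. The constant and function names are made up for the example; the only facts used are the 8-byte descriptor size, the 64KB limit, and the 4KB reserved tail.

```python
DESCRIPTOR_SIZE = 8          # bytes per GDT descriptor on x86
GDT_BYTES = 64 * 1024        # GDT loaded with a 64KB limit
MONITOR_BYTES = 4 * 1024     # last 4KB reserved for the monitor

TOTAL = GDT_BYTES // DESCRIPTOR_SIZE              # number of descriptors
MONITOR_COUNT = MONITOR_BYTES // DESCRIPTOR_SIZE  # descriptors kept by the monitor
FIRST_MONITOR = TOTAL - MONITOR_COUNT             # first monitor-owned index

def owner(index):
    """Return which side owns a descriptor index under this split."""
    if not 0 <= index < TOTAL:
        raise ValueError("index outside the 64KB GDT")
    return "monitor" if index >= FIRST_MONITOR else "guest"
```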
>
> Now if the guest uses VERR/VERW etc. on those last 512 entries, it will
> think that those descriptors are not readable, or writable (regardless
> of the VCPL). This is "because" those descriptors are beyond the
> VGDT.limit (really, the reason is that the guest runs at a nonzero
> CPL, even when VCPL is zero. Thus, since the monitor's descriptors will
> either be not present or present, and since, if they are present, they
> will be DPL0, they are not readable or writable at the guest's CPL).
>
> Now, align the GDT like this: put all but those last 512
> descriptors in one 4MB page and put those last 512 descriptors at the
> beginning of the next 4MB page.
>
> Now, when switching (native) execution to the guest, the guest's segment
> registers will be loaded with entries in those last 512 descriptors.
> There will be 7 set aside: one each for VES, VCS, VSS, VDS, VFS, and VGS,
> and one used temporarily for LOCK if it is used in vm86 mode (e.g. if the
> user uses LOCK in vm86 mode, you have to run that instruction in 16-bit
> protected mode with TF set, and if the user writes using the CS segment
> override, you need a temporary descriptor).
>
> Now before switching execution to the guest, clear the present bit of
> the low 4MB page. You will need to invalidate that page, too.
>
> When the guest runs, if it does VERR on, for example, descriptor number
> 1 (selector 8+RPL), a PAGE FAULT will be generated (I tried it). This
> happens whenever the PROCESSOR attempts to access the descriptor (which
> it does when it sets the accessed bit of it, OR WHEN IT LOOKS AT A
> DESCRIPTOR DUE TO VERR/VERW ETC.), because the page holding it is not
> present.
>
> Nifty, huh? Thus, a page fault can be made to happen each time the CPU
> tries to access one of the guest's descriptors.
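
A small Python model of the layout trick, as I read it: the GDT is placed so the monitor's descriptors begin exactly at a 4MB boundary, and any descriptor access that lands in the not-present low page faults. The function names and the example boundary are assumptions for illustration only, and this models addresses, not real CPU behavior.

```python
PAGE_4MB = 4 << 20
DESCRIPTOR_SIZE = 8
FIRST_MONITOR = 7680   # first descriptor index kept by the monitor

def gdt_base_for(boundary):
    """Place the GDT so that descriptor 7680 starts exactly at a 4MB boundary."""
    if boundary % PAGE_4MB != 0:
        raise ValueError("boundary must be 4MB-aligned")
    return boundary - FIRST_MONITOR * DESCRIPTOR_SIZE

def descriptor_access_faults(gdt_base, index, not_present_pages):
    """True if reading descriptor `index` touches a not-present 4MB page."""
    linear = gdt_base + index * DESCRIPTOR_SIZE
    return (linear // PAGE_4MB) in not_present_pages

# Hypothetical placement: monitor descriptors start at the 8MB mark.
base = gdt_base_for(0x00800000)   # 0x007F1000
low_page = base // PAGE_4MB       # the 4MB page holding the guest's descriptors
```

With the low page marked not present, `descriptor_access_faults(base, 1, {low_page})` is true (the VERR-on-descriptor-1 case), while index 7680 and above stay accessible to the CPU.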
>
> The only caveat, aside from the upper 512 descriptors (or however many
> you choose) being unavailable, is the following:
>
> What if the user reads from a segment register? They will get a
> different value than they wrote.
>
> This problem exists already at VCPL != 3: the RPL would not be right
> anyway, since if it were right, the guest would be able to access
> monitor pages without causing a fault.
>
> Now, what about VCPL 3?
>
> The answer is, since all of the high 512 descriptors will not be allowed
> to be
>
> IMHO, it is possible to run everything 100% natively, doing tricks like
> this, with one exception: SMSW in real mode. True, there would be
> caveats if this were done since all privilege levels would be allowed
> (besides zero): the guest would be allowed to read from the monitor
> memory space at VCPL=2,1. But if we make the guest's page tables not
> present (that is, the linear memory that the guest has mapped to the
> guest's page tables) then faults will happen each time the guest tries
> to access its page tables. By analysing the instruction and CR2 one
> would then modify the ACTUAL page tables like so:
>
> - do not use 4MB paging
> - make the monitor take up 64MB and exist on a 64MB boundary
> - assume that the guest will never PURPOSELY access this area, without
> FIRST accessing the corresponding guest page tables.
> - thus if the guest wishes to use the linear memory the monitor happens
> to be using, it should make an entry in its page tables first. when it
> does that we will intercept and RELOCATE THE MONITOR. keep an LRU so we
> relocate to a place we have not recently been at (because the guest
> might access that area again). assume that the guest will not actually
> use all 4GB of linear memory at one moment.
> - make one of the monitor's page directory entries point to itself. thus
> it gets used as a page table, too.
> - the effect is that the highest 4MB of the monitor's 64MB area then
> contains a linear map of all page table entries. The place in there
> that corresponds with that 4MB area is the page directory.
> - don't make any page directory entries not present. if the guest
> requests that, make the corresponding page directory entry point to a
> 4KB page in the monitor space which is a page table whose entries are
> all not present. This allows you to traverse that 4MB linear region
> without causing a page fault. Set an AVL bit in each entry of that
> 4KB page.
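
The self-pointing page directory entry in the list above can be sketched as address arithmetic in Python. This assumes classic 32-bit non-PAE paging with 4KB pages and 4-byte entries; the function names and the example monitor base are hypothetical.

```python
PAGE_SHIFT = 12      # 4KB pages
PDE_SHIFT = 22       # each page directory entry covers 4MB
ENTRY_SIZE = 4       # bytes per entry in 32-bit non-PAE paging
MONITOR_SIZE = 64 << 20

def self_map_slot(monitor_base):
    """Directory slot covering the highest 4MB of the monitor's 64MB area."""
    if monitor_base % MONITOR_SIZE != 0:
        raise ValueError("monitor must sit on a 64MB boundary")
    return (monitor_base >> PDE_SHIFT) + 15   # 16 slots per 64MB, take the last

def pte_address(slot, linear):
    """Linear address of the page table entry that maps `linear`."""
    return (slot << PDE_SHIFT) + (linear >> PAGE_SHIFT) * ENTRY_SIZE

def pde_address(slot, linear):
    """Linear address of the page directory entry that maps `linear`."""
    return (slot << PDE_SHIFT) + (slot << PAGE_SHIFT) + (linear >> PDE_SHIFT) * ENTRY_SIZE

# Hypothetical monitor placement at the top 64MB of the address space.
slot = self_map_slot(0xFC000000)   # slot 1023, window at 0xFFC00000
```

The page directory itself shows up inside the window at offset `slot << 12`, which is the "place in there that corresponds with that 4MB area". Relocating the monitor means picking a new 64MB-aligned base and recomputing the slot.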
>
> With this, we should be able to run all versions of Windows. All
> remaining caveats can be removed by assuming self-modifying code will not
> do SMSW in real mode. Then scan the Windows binaries (automatically) for
> SMSW and make the pages that MIGHT contain an SMSW CPL0. Trace through
> those pages. Do the same for the VGA BIOS (we could call it
> pre-prescanning).
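
A minimal sketch of what such a scan might look like in Python. SMSW encodes as 0F 01 /4, so a plain byte scan for 0F 01 followed by a ModRM byte whose reg field is 4 flags every page that MIGHT hold one. It is deliberately conservative: it does no real disassembly, so it also matches data and mid-instruction bytes. The function name and the test image are made up for the example.

```python
def pages_possibly_containing_smsw(image, page_size=4096):
    """Return sorted page numbers where the byte pattern for SMSW appears."""
    pages = set()
    for i in range(len(image) - 2):
        is_group7 = image[i] == 0x0F and image[i + 1] == 0x01
        reg_field = (image[i + 2] >> 3) & 7      # /digit lives in ModRM bits 5..3
        if is_group7 and reg_field == 4:         # /4 selects SMSW
            pages.add(i // page_size)            # page where the pattern starts
            pages.add((i + 2) // page_size)      # page where it ends, if it straddles
    return sorted(pages)

# Hypothetical 8KB image: SMSW AX (0F 01 E0) straddling the first page boundary,
# plus an LGDT (0F 01 10, reg field 2) that should not be flagged.
image = bytearray(8192)
image[4094:4097] = bytes([0x0F, 0x01, 0xE0])
image[100:103] = bytes([0x0F, 0x01, 0x10])
flagged = pages_possibly_containing_smsw(image)
```

Any page in the result would then be a candidate for tracing rather than native execution.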
>
> BTW: RFC means Request for Comments. E.g. the above may or may not work;
> I just wished to share it; in case it is helpful.... what do you think?
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Kevin Lawton
MandrakeSoft, Inc. Plex86 developer
http://www.linux-mandrake.com/ http://www.plex86.org/