Jesus Cea Avion wrote:
>
> > I'll work on the pre-scanning technique for virtualizing
> > arbitrary instructions. Will need to implement parts of
> > the other items here, but probably will do so minimally
> > at first to get started.
>
> I see several problems using this approach:
>
> - You can only virtualize your own processor. That is, you can't
> execute x86 code on a 68K or PowerPC processor.
We're only doing x86 on x86. There's no intent
to make any of this work on a PPC, though these concepts likely
carry over to other capable platforms.
> - The technique works only if the breakpoints (virtualization entry
> points) are single byte.
True. Fortunately, on x86 the breakpoint opcode (INT3) is the
single byte 0xCC. :^)
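
To make the single-byte point concrete, here is a minimal sketch that models a code page as a byte buffer. The helper name and the three-byte fragment are made up for illustration. Because INT3 is one byte, planting it at the start of an instruction never spills into the next one.

```python
INT3 = 0xCC  # single-byte x86 breakpoint opcode


def plant_breakpoint(code: bytearray, offset: int) -> int:
    """Overwrite the first byte of the instruction at `offset` with INT3.

    Returns the original byte so the monitor can emulate or restore it.
    """
    saved = code[offset]
    code[offset] = INT3
    return saved


# Hypothetical fragment: PUSHF (0x9C), NOP (0x90), RET (0xC3).
fragment = bytearray([0x9C, 0x90, 0xC3])
saved_byte = plant_breakpoint(fragment, 0)  # trap on the PUSHF only
```

The NOP and RET that follow are left untouched, which would not be guaranteed with a multi-byte trap sequence.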
> - Some common instructions must be executed virtualized, so speed
> penalty would be fairly high. A clear example is "RET" (return
> from subroutine).
Good point. I'll recap some stuff talked about here before,
because it's related.
The reason we use this pre-scan technique is so that we
can intercept instructions which the x86 processor does
not otherwise trap for us. In general, these are
instructions which "look" at system state but don't
"touch" it.
Depending upon the guest OS, and the virtualization
hacks used (making the info from a "look" correct),
it's possible that ring3 code won't need
this. For instance, I think we can get Linux application
code running without pre-scanning. For that case, this
would mean raw number crunching performance would be good.
The ability to dump pre-scanning depends on the fact that
you can make the system registers "look" like the
guest expects. That way all instructions can be executed
natively. I proposed some hacks to do some of this,
for instance for letting SGDT execute natively.
Then why not do the same for guest OS code, since it's
also running at ring3? We have to look at a few issues
here. The most obvious one is that the RPL (low two
bits) of the selector gives the privilege level,
and if you're running at the wrong privilege
level, a "PUSH CS; POP EAX" will see this difference.
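
As a small illustration of what the guest would see, this sketch splits a selector into its architectural fields (bits 0-1 RPL, bit 2 table indicator, bits 3-15 index). The two selector values are hypothetical, picked only to show the same descriptor index with different low bits.

```python
def selector_fields(selector: int) -> dict:
    """Split a 16-bit x86 segment selector into its three fields."""
    return {
        "index": (selector >> 3) & 0x1FFF,  # descriptor table index
        "ti": (selector >> 2) & 1,          # 0 = GDT, 1 = LDT
        "rpl": selector & 3,                # privilege level bits
    }


# Hypothetical values: what a guest kernel expects in CS versus what
# it would read back if the monitor runs it at ring 3.
expected_cs = 0x0010
pushed_cs = 0x0013

mismatch = selector_fields(expected_cs)["rpl"] != selector_fields(pushed_cs)["rpl"]
```

Both selectors name the same descriptor index, so only a guest that inspects the low two bits notices the difference.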
Another pisser issue is the EFLAGS register. We want to be
kind to our host OS and leave interrupts on, so that
we will receive them and redirect them back to the host
in a timely fashion. So we want IF=1. When running
guest application code, this is a highly likely setting;
otherwise how would the guest OS service asynchronous
events? So upon transition to guest app code space,
it's easy enough to detect if this condition is true,
and not pre-scan accordingly. App code can't change
IF so we don't care any further. Guest OS code on the
other hand has IF activity. STI/CLI will generate
exceptions at ring3 so we don't have to worry about them.
PUSHF will store the value of IF, so this needs to be
accurate in case it's used later by the guest OS.
So you have to virtualize this instruction to coerce
the value passed on the stack to be what the guest
OS thinks it should be.
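
A minimal sketch of the PUSHF coercion, assuming the monitor tracks the guest's idea of IF in a separate flag (called `virtual_if` here, a name made up for this example). IF is bit 9 of EFLAGS; everything else in the pushed image is passed through as the processor reported it.

```python
EFLAGS_IF = 1 << 9  # interrupt enable flag, bit 9 of EFLAGS


def guest_pushf_image(native_eflags: int, virtual_if: bool) -> int:
    """Return the 32-bit value an emulated PUSHF should store.

    The real IF (kept at 1 for the host's benefit) is replaced by the
    IF value the guest OS believes it has set.
    """
    image = native_eflags & ~EFLAGS_IF
    if virtual_if:
        image |= EFLAGS_IF
    return image & 0xFFFFFFFF


# Host keeps IF=1 (0x202), but the guest last executed CLI.
image = guest_pushf_image(0x00000202, virtual_if=False)
```

A matching POPF/IRET path would have to do the reverse, updating `virtual_if` from the popped image rather than the real flag.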
Note that Intel did add a feature called Protected-mode
Virtual Interrupts (PVI) to the Pentium (and SL enhanced 486),
which if designed correctly, may be of help in virtualizing IF
in guest OS code. This is currently perhaps *the* most
undocumented x86 feature. It's not clear to me whether
we can use this, as I haven't dug into it enough.
Anyhow, here's a further thought I've been bouncing around
for a while. Let's say we have a guest OS which doesn't
care about the idiosyncrasies of the selector RPL and stuff like
that. It will spend some of its time with IF=0, and
some with IF=1. There is potential for running the guest
OS code *without* prescanning while it's expecting IF=1.
We'd have to start from scratch next time we ran code
with IF=0, since it would require pre-scanning and
instructions may have gotten written over etc.
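
The decision described above could be sketched roughly as below. This is a simplification under stated assumptions: the function and parameter names are invented, and it assumes the other virtualization hacks (system registers that "look" right, and so on) are already in place.

```python
def needs_prescan(guest_ring: int, virtual_if: bool, os_ignores_rpl: bool) -> bool:
    """Decide whether code about to run must be pre-scanned.

    guest_ring     -- privilege level the guest believes it is at (0..3)
    virtual_if     -- the IF value the guest expects to see
    os_ignores_rpl -- spoon-fed option: guest OS never inspects selector RPL
    """
    if guest_ring == 3:
        # App code can't change IF, so IF=1 on entry means run natively.
        return not virtual_if
    # Guest OS code: only safe to skip when it expects IF=1 and
    # doesn't care about the selector RPL.
    return not (os_ignores_rpl and virtual_if)
```

Whenever this flips from False back to True for a page, any previously translated copy would have to be rebuilt from scratch, as noted above.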
Back to your "RET" statement, which is a very valid
point. Let's say when you're doing something useful
on your machine, like rendering a 3-d picture, 90% of
the CPU time is application space, 10% kernel. And
in the kernel, 50% is with IF=0, 50% is with IF=1.
I just made all this up for sake of argument.
That would mean normally, you're only pre-scanning 10% of the
time. And under the conditions of the paragraph above, only
5% of the time (eliminating half the kernel time). Thus
the performance impact of virtualizing instructions such
as RET isn't quite as bad, though it could still pack some
punch.
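
The arithmetic, spelled out with the same made-up figures. The function name is hypothetical, and this only counts time spent in code that needs pre-scanning; it says nothing about the per-instruction cost.

```python
from fractions import Fraction


def prescan_share(kernel_share, kernel_if0_share, skip_when_if1):
    """Fraction of CPU time spent in code that must be pre-scanned."""
    if skip_when_if1:
        # Only kernel code running with IF=0 still needs pre-scanning.
        return kernel_share * kernel_if0_share
    # All kernel code is pre-scanned; application code is not.
    return kernel_share


kernel = Fraction(10, 100)  # 10% kernel, 90% application
if0 = Fraction(50, 100)     # half of kernel time runs with IF=0

print(float(prescan_share(kernel, if0, skip_when_if1=False)))  # 0.1
print(float(prescan_share(kernel, if0, skip_when_if1=True)))   # 0.05
```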
Now if we could get the PVI stuff to work for us, maybe
we could look into eliminating pre-scanning the other
5% of the time. That would put us closer to the realm of
what Cedric wanted to do with Linux, with the addition
of many of our other virtualization hacks.
For Windows, the assumptions above aren't valid, and we're
going to have to pre-scan.
> - You need THREE pages for each original page:
>
> - Original page, to support software instruction decoding/emulation
> when a breakpoint is found. This page is needed also to support
> code page read.
>
> - The translated page. This page is almost identical to previous
> page. The only differences are the breakpoints used to mark
> virtualized instructions. This page is the one really executed.
>
> - The attribute page. The page which Kevin explains.
True, but again, we only do all of this, if we're running
in code that requires pre-scanning. It's worth noting that
the attribute page can be eliminated if you don't care about
handling overlapping code. Without concern for that, we
could use a strategy where we always start a code page out
filled with 0xCC (breakpoints) rather than 0x00.
We'd have to figure out what to do with writes to a code
page (which we intercept), without an attribute page. The
brute force method would be to start the "translated page" out
at ground zero again with breakpoints.
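
A rough sketch of that attribute-page-free strategy, modelling pages as byte buffers. The class and method names are invented for illustration, and instruction decoding (deciding what is safe to reveal) is left out entirely.

```python
INT3 = 0xCC  # single-byte x86 breakpoint opcode


class TranslatedPage:
    """Original code page plus the copy that is actually executed."""

    def __init__(self, original: bytes):
        self.original = bytearray(original)
        # Start out all breakpoints; nothing has been pre-scanned yet.
        self.translated = bytearray([INT3]) * len(original)

    def reveal(self, offset: int, length: int) -> None:
        """Copy an instruction that pre-scanning judged safe to run natively."""
        end = offset + length
        self.translated[offset:end] = self.original[offset:end]

    def guest_write(self, offset: int, data: bytes) -> None:
        """Handle an intercepted write to the code page, brute force style."""
        self.original[offset:offset + len(data)] = data
        # No attribute page to consult, so drop everything scanned so far.
        self.translated[:] = bytes([INT3]) * len(self.translated)
```

The cost of the brute-force reset is that every write to a code page throws away all scanning work for that page, which is the trade against keeping an attribute page.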
There's a bunch of virtualization stuff that is unnecessary
given the right set of "don't cares". Some conditions can
be detected dynamically. Others we may have to spoon feed
to the virtualization via an option, to make things
perform better. For instance, how do you detect that an
OS doesn't use overlapping instructions?
What do you folks think about all of this? Maybe we should
get a handle on the PVI stuff before diving into pre-scanning.
If it works, maybe we could get the null kernel up without
pre-scanning, if not Linux. Following is the only public
source of this feature:
http://www.x86.org/articles/pvi1/pvi1.htm
-Kevin