Kevin Lawton wrote:

> ... Anyways some results.  I scaled up the macro loops count.
>
>   execution time    guest timeslice
>   14.55             500000 uS   (2Hz)
>   14.60              10000 uS (100Hz)
>
> Only an extra 0.3% overhead for a higher frequency context switch,
> on top of the overhead imposed by the extra code in the branch tcode
> talked about previously.  In other words, the factors listed in
> the previous section already include most of the overhead.  The
> extra revalidation doesn't weight that heavily.  There will be more
> overhead with a real VM system, but it's comforting.
> ...
>
> Working backwards, now it's a little easier to explain why the
> methods I proposed in the Plex86 Internals Guide (PIG) are
> page oriented - because of this context switch dynamic tcode address
> revalidation process.  It's also why there are constraints (or
> perhaps just a constraints ID) notated on the meta information
> for each page, on the included graphics.
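
As a quick check on the arithmetic in the timings Kevin quotes above, here is
a small sketch.  The helper names are my own, and the execution times are
treated as unitless since the post does not state the unit:

```python
def relative_overhead(baseline, measured):
    """Fractional slowdown of `measured` relative to `baseline`."""
    return (measured - baseline) / baseline


def switch_rate_hz(timeslice_us):
    """Context switches per second for a timeslice given in microseconds."""
    return 1_000_000 / timeslice_us


# Figures taken from the quoted table.
overhead = relative_overhead(14.55, 14.60)
print(f"extra overhead: {overhead:.2%}")
print(f"switch rates: {switch_rate_hz(500000):.0f} Hz vs {switch_rate_hz(10000):.0f} Hz")
```

So a 50x increase in context switch frequency costs roughly a third of a
percent in this test, which matches the "0.3%" Kevin reports.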

I have been reading up on the IA32 architecture and its various
implementations, and I'm *just* starting to see some of the difficulties
involved with Kevin's work.  Some minor questions:

1. Am I dead wrong, or does the IA32 MMU impose massive amounts of overhead for
the work it does?  I understand it has a fantastically flexible architecture,
but it seems nobody makes use of all its capabilities, and thus apps and
OSes seem burdened with the weight/overhead of that unused flexibility.  In
particular, the MMU makes extensive use of memory-based tables and linked
lists.  While a number of these are cached within the CPU/MMU, it would seem
any reasonably complex OS could easily start thrashing the cached entries.  Is
this the case in real life?  Is this a concern for Plex86?  (Context:  I am
used to MMUs for embedded CPUs, where memory-based tables are avoided to the
greatest extent possible.  Some processors even place the process table and
interrupt vector table in MMU/CPU hardware!)
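
To make the thrashing worry concrete, here is a toy model of what I mean.  It
assumes the classic 32-bit IA32 paging layout with 4 KB pages (10-bit
directory index, 10-bit table index, 12-bit offset).  Everything else is
hypothetical: the TinyTLB class, its capacity of 4, and the LRU eviction are
made up for illustration and are not how any real CPU or Plex86 behaves.

```python
from collections import OrderedDict

PAGE_SHIFT = 12                      # 4 KB pages
INDEX_BITS = 10                      # 1024 entries per directory / table
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def split(linear):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    linear &= 0xFFFFFFFF
    return (
        (linear >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK,
        (linear >> PAGE_SHIFT) & INDEX_MASK,
        linear & OFFSET_MASK,
    )


class TinyTLB:
    """Toy translation cache in front of a two-level, memory-based page table."""

    def __init__(self, page_directory, capacity=4):
        # page_directory: {directory index: {table index: frame number}}
        self.page_directory = page_directory
        self.capacity = capacity
        self.cache = OrderedDict()   # virtual page number -> frame number
        self.memory_reads = 0        # table entries fetched from "memory"

    def translate(self, linear):
        d, t, offset = split(linear)
        vpn = (linear & 0xFFFFFFFF) >> PAGE_SHIFT
        if vpn in self.cache:
            self.cache.move_to_end(vpn)          # mark as most recently used
            frame = self.cache[vpn]
        else:
            # Miss: one read for the directory entry, one for the table entry.
            # A missing entry raises KeyError, standing in for a page fault.
            frame = self.page_directory[d][t]
            self.memory_reads += 2
            self.cache[vpn] = frame
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used
        return (frame << PAGE_SHIFT) | offset


def reads_for_working_set(pages, passes=3, capacity=4):
    """Count table reads when cycling over `pages` distinct pages `passes` times."""
    directory = {0: {i: 100 + i for i in range(pages)}}
    tlb = TinyTLB(directory, capacity=capacity)
    for _ in range(passes):
        for page in range(pages):
            tlb.translate(page << PAGE_SHIFT)
    return tlb.memory_reads
```

In this model, cycling over a working set that fits in the cache pays for the
table reads only once, while a working set just one page larger than the cache
misses on every single access.  Compare reads_for_working_set(4) with
reads_for_working_set(5) to see the cliff.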

2. I am seriously wondering if it might not be faster and easier to emulate
IA32 on a suitable processor (a la Bochs/Transmeta) than to try to virtualize
the architecture on actual IA32 hardware (a la Plex86)!  In particular, I'm
beginning to see why AMD has enlisted Transmeta to help with the IA32 support
for the SledgeHammer 64-bit processor:  Could it be that IA32 is better as a
software architecture than it is as a hardware architecture?

Of course, the answers to my questions will have no impact at all on Plex86.  I
hope instead to understand more of what I should worry about (where efficiency
is concerned) and what can be safely ignored (as part of the price of using
IA32).


> Dynamic branches next...

Wait, wait!  Let me catch up!

It is starting to seem that many of the "fun" parts of the Plex86 internals
will be finished before I'm up to speed.  Oh well, I'll just have to see what's
left when I get there...


-BobC


