"Robert W. Cunningham" wrote:

> I have been reading up on the IA32 architecture and its various
> implementations, and I'm *just* starting to see some of the difficulties
> involved with Kevin's work.  Some minor questions:
> 
> 1. Am I dead wrong, or does the IA32 MMU impose massive amounts of overhead for
> the work it does?  I understand it has a fantastically flexible architecture,
> but it seems nobody makes use of all its capabilities, and thus apps and
> OSes seem burdened with the weight/overhead of that unused flexibility.  In
> particular, the MMU makes extensive use of memory-based tables and linked
> lists.  While a number of these are cached within the CPU/MMU, it would seem
> any reasonably complex OS could easily start thrashing the cached entries.  Is
> this the case in real life?  Is this a concern for Plex86?  (Context:  I am
> used to MMUs for embedded CPUs, where memory-based tables are avoided to the
> greatest extent possible.  Some processors even place the process table and
> interrupt vector table in MMU/CPU hardware!)

Sounds like you're talking about native use of the paging tables.  They're
the only native feature I can think of that could be referred to as a
linked list.  These, of course, define the linear -> physical address
mappings of whichever process is running at the moment.
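To make the "linked list" concrete, here is a rough Python model of the walk for 32-bit, non-PAE paging with 4KB pages.  Physical memory is faked as a dict from entry addresses to entry values, and the names (`split_linear`, `walk`, `PageFault`) are made up for this sketch; it ignores permission, accessed and dirty bits.

```python
# Toy model of 32-bit x86 (non-PAE) paging with 4 KB pages.
# phys_mem is a dict: physical address of a PDE/PTE -> its 32-bit value.
# Illustration only; not how real hardware or plex86 stores anything.

PRESENT = 0x001          # bit 0 of a PDE/PTE
FRAME_MASK = 0xFFFFF000  # bits 31:12 hold a 4 KB-aligned frame address


class PageFault(Exception):
    pass


def split_linear(lin):
    """Split a linear address into (directory index, table index, offset)."""
    return (lin >> 22) & 0x3FF, (lin >> 12) & 0x3FF, lin & 0xFFF


def walk(phys_mem, cr3, lin):
    """PDBR (CR3) -> page directory entry -> page table entry -> frame + offset."""
    dir_idx, tbl_idx, offset = split_linear(lin)
    pde = phys_mem.get((cr3 & FRAME_MASK) + dir_idx * 4, 0)
    if not pde & PRESENT:
        raise PageFault(hex(lin))
    pte = phys_mem.get((pde & FRAME_MASK) + tbl_idx * 4, 0)
    if not pte & PRESENT:
        raise PageFault(hex(lin))
    return (pte & FRAME_MASK) | offset
```

With a directory at 0x1000 whose entry 1 points to a table at 0x2000, and entry 3 of that table pointing to frame 0x7000, linear 0x00403ABC resolves to physical 0x7ABC.  That is two dependent memory reads per translation, which is exactly what the TLB exists to avoid.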

For every page of code or data that is accessed, a translation must
be done.  This would be really slow, except that the CPU has built-in,
separate instruction and data TLBs (translation lookaside buffers) to
cache these translations.  They are flushed on context switches
(actually on PDBR/CR3 reloads), so a little time is spent rebuilding
them after each switch.  But they generally have good hit rates, with
the occasional miss costing the extra PDBR->PageDir->PageTable walk.
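A toy model of the caching side, assuming a single fully associative TLB with LRU replacement (real parts use small set-associative arrays, split into I and D sides).  `TLB`, `translate` and `reload_pdbr` are hypothetical names, and the `walk` callable stands in for the full table walk.

```python
from collections import OrderedDict


class TLB:
    """Tiny fully associative, LRU-replaced translation cache (illustrative)."""

    def __init__(self, walk, capacity=64):
        self.walk = walk              # linear page number -> physical frame number
        self.capacity = capacity      # hypothetical size
        self.entries = OrderedDict()  # linear page number -> frame number
        self.hits = 0
        self.misses = 0

    def translate(self, lin):
        page, offset = lin >> 12, lin & 0xFFF
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)        # mark most recently used
        else:
            self.misses += 1                      # pay for the full table walk
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[page] = self.walk(page)
        return (self.entries[page] << 12) | offset

    def reload_pdbr(self):
        """Model a CR3 (PDBR) write; this model has no global pages,
        so every cached entry is discarded."""
        self.entries.clear()
```

After `reload_pdbr()` every page misses once more, which is the "little time spent rebuilding them" after a context switch.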

There is also support for larger page sizes (4MB pages via PSE, or 2MB
pages under PAE), so a single TLB entry can cover much more memory.
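Roughly how a large page short-circuits the walk, assuming 32-bit paging with CR4.PSE enabled and ignoring PSE-36: when bit 7 (PS) of the page directory entry is set, the PDE itself maps a 4MB region and no page table is read.  `translate_pde` and its `read_pte` accessor are hypothetical.

```python
PRESENT = 0x001
PAGE_SIZE_BIT = 0x080   # PDE bit 7 (PS); honored only when CR4.PSE is set


def translate_pde(pde, lin, read_pte):
    """Resolve `lin` given its page directory entry.

    read_pte(table_base, index) is a hypothetical accessor returning a PTE.
    """
    if not pde & PRESENT:
        raise LookupError("page fault")
    if pde & PAGE_SIZE_BIT:
        # 4 MB page: bits 31:22 of the PDE are the frame, 22 bits of offset.
        return (pde & 0xFFC00000) | (lin & 0x003FFFFF)
    # Ordinary 4 KB page: one more level of lookup.
    pte = read_pte(pde & 0xFFFFF000, (lin >> 12) & 0x3FF)
    if not pte & PRESENT:
        raise LookupError("page fault")
    return (pte & 0xFFFFF000) | (lin & 0xFFF)
```

One 4MB entry covers what would otherwise take 1024 separate 4KB TLB entries.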

Is this what you meant?  


> 2. I am seriously wondering if it might not be faster and easier to emulate
> IA32 on a suitable processor (ala Bochs/Transmeta) than to try to virtualize
> the architecture on actual IA32 hardware (ala Plex86)!  In particular, I'm
> beginning to see why AMD has enlisted Transmeta to help with the IA32 support
> for the SledgeHammer 64-bit processor:  Could it be that IA32 is better as a
> software architecture than it is as a hardware architecture?

Bochs has nowhere near the potential of plex86, since we can use
native features and execute most instructions 1:1.  Our underlying goal
here is to run multiple x86 OSes concurrently.

I realize you missed the beginnings of our project, but in a
nutshell, our primary mission is specifically x86 on x86.

To answer your question, x86 sucks as a "software architecture".  It
is essentially a software layer on the CPUs that support it, which these
days use underlying RISC/VLIW architectures.  Making that work, of course,
takes specialized processor design.  You'll never be able to generalize
it to work well on an _arbitrary_ architecture.

There is a set of features which the target architecture should
support if you want to efficiently translate x86 code to that
architecture.  But that's an off-topic discussion for another forum, or at
least another day...

You mentioned Transmeta.  That's a good example of the software approach,
yet specialized iron from AMD and Intel still rules the performance domain.

There are some interesting things that can be done when you have a
software component in your strategy.  Go read the papers on HP's
Dynamo, for example.  You might also look into IBM's Daisy project.

I don't want to get into a discussion about it here, but just
keep in mind that the strategies of user-space projects do not
necessarily carry over well to system-level dynamic translation (DT).
There are prices to pay when it comes to context switches, which may
occur at 100Hz or 1000Hz.  For example, what do you do with tcode
(translated code) which directly branches to other tcode in another
page, when that page may now have a different mapping?  And how
efficiently can you re-adapt that tcode?
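One possible answer to the chaining question, sketched in Python under loud assumptions: translated blocks are keyed by the (linear, physical) pair they were translated under, and before following a direct link into a different page, the target's current mapping is re-checked.  Same-page links skip the check, since the source block is only reached through a valid mapping.  All names here are invented for illustration; this is not a description of plex86's design.

```python
class TBlock:
    """A translated-code (tcode) block; a real one would hold host code."""

    def __init__(self, lin, phys):
        self.lin = lin      # guest linear address the block was translated at
        self.phys = phys    # guest physical address it mapped to at that time
        self.links = {}     # target linear address -> directly chained TBlock


class TCache:
    def __init__(self, translate):
        # translate(lin) -> guest physical address under the *current* page
        # tables; its answer changes whenever the guest switches contexts.
        self.translate = translate
        self.blocks = {}    # (lin, phys) -> TBlock
        self.stale_links = 0

    def lookup(self, lin):
        key = (lin, self.translate(lin))
        if key not in self.blocks:
            self.blocks[key] = TBlock(*key)   # stands in for real translation
        return self.blocks[key]

    def branch(self, src, target):
        """Follow a direct branch from block `src` to linear address `target`."""
        blk = src.links.get(target)
        if blk is not None and (target >> 12) != (src.lin >> 12):
            # Cross-page link: the target page may have been remapped since
            # the link was made, so re-check before jumping through it.
            if self.translate(target) != blk.phys:
                self.stale_links += 1
                blk = None
        if blk is None:
            blk = self.lookup(target)
            src.links[target] = blk           # (re)patch the direct link
        return blk
```

The cost shows up as an extra translation check on every cross-page link, which is exactly the kind of price a user-space translator never has to pay.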


> Wait, wait!  Let me catch up!
> 
> It is starting to seem that many of the "fun" parts of the Plex86 internals
> will be finished before I'm up to speed.  Oh well, I'll just have to see what's
> left when I get there...

Some of the good parts will be tuning, after the architecture is in place.
What table sizes to use.  Which address bits to hash on, and how much
associativity to use.  What alignments to start tcode boundaries on.  Which
code to inline versus push into a handler.  Or should we just emulate
until we hit a certain usage count, and then translate?
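A sketch of the "emulate until hot, then translate" idea, combined with a hashed, set-associative counter table.  Every constant (`HOT_THRESHOLD`, `NUM_SETS`, `WAYS`) and the choice of hash bits are hypothetical tuning knobs, which is exactly the point: they are the things to measure.

```python
HOT_THRESHOLD = 50   # hypothetical: executions before we bother translating
NUM_SETS = 256       # hypothetical table size (power of two)
WAYS = 4             # hypothetical associativity


def set_index(lin_addr):
    # Hash on address bits above a 16-byte granularity; which bits to use
    # is one of the open tuning questions.
    return (lin_addr >> 4) & (NUM_SETS - 1)


class HotCounter:
    def __init__(self):
        # Each set is a list of [address, count], most recently used first.
        self.sets = [[] for _ in range(NUM_SETS)]

    def record(self, lin_addr):
        """Count one emulated execution of the block at lin_addr; return
        True once it is hot enough to be worth translating."""
        ways = self.sets[set_index(lin_addr)]
        for i, entry in enumerate(ways):
            if entry[0] == lin_addr:
                entry[1] += 1
                ways.insert(0, ways.pop(i))   # move to most-recently-used slot
                return entry[1] >= HOT_THRESHOLD
        if len(ways) == WAYS:
            ways.pop()                        # evict LRU way, losing its count
        ways.insert(0, [lin_addr, 1])
        return HOT_THRESHOLD <= 1
```

Evicting a way throws its count away, so a table that is too small or too weakly associative can keep warm code emulated forever; that is one of the sizing trade-offs above.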

There's tons of this stuff.

-Kevin


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Kevin Lawton                        [EMAIL PROTECTED]
MandrakeSoft, Inc.                  Plex86 developer
http://www.linux-mandrake.com/      http://www.plex86.org/
