Kevin Lawton wrote:

>   - "Kevin P. Lawton" <[EMAIL PROTECTED]>: Mon Jan  1 23:26:44 EST 2001
>     Added another docs chapter (17) to the user's manual, relating to
>       dynamic translation (DT) ideas for maintaining linear to translated
>       code address mappings and some other stuff.  You can update from
>       CVS and just untar the docs/ tarball if you want.
>
> For the folks who like hash functions and DT, have a look at
> chapter 17.  Think I'll split the DocBook into 2, one user's
> manual and one techie manual.
>
> I could use some feedback.

If the tcode sequences can be laid out sparsely (that is, if large gaps between
tcode sequences are acceptable), then the location of each tcode sequence can
be determined IN ADVANCE (at the time of translation) as a "nice" location that
is "easy" to infer from the "real" 32-bit linear guest code address.

In essence, the tcode can be the contents of the hash table, rather than the
hash table containing pointers to the tcode.
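
To make the "tcode is the table" idea concrete, here is a rough sketch in C.
The slot size, slot count, and all names are my own assumptions for
illustration, not anything taken from Plex86.  It only shows that the lookup
collapses to a mask, a shift and an add; it does nothing about two guest
addresses that land in the same slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and sizes below are hypothetical, chosen only for illustration. */
#define SLOT_SHIFT      6u                    /* assumed: 64 bytes reserved per tcode slot */
#define SLOT_COUNT_LOG2 10u                   /* assumed: 1024 slots in the table */
#define SLOT_COUNT      (1u << SLOT_COUNT_LOG2)

/* The "hash table" holds the tcode itself, not pointers to it. */
static uint8_t tcode_arena[SLOT_COUNT << SLOT_SHIFT];

/* Slot index: simply the low bits of the 32-bit guest linear address. */
static uint32_t tcode_slot_index(uint32_t guest_linear)
{
    return guest_linear & (SLOT_COUNT - 1u);
}

/* Address of the tcode slot: one mask, one shift, one add. */
static uint8_t *tcode_slot_addr(uint32_t guest_linear)
{
    uint32_t idx = tcode_slot_index(guest_linear);
    assert(idx < SLOT_COUNT);                 /* holds by construction of the mask */
    return tcode_arena + ((size_t)idx << SLOT_SHIFT);
}
```

With these assumed sizes, guest addresses that differ only above bit 9 share a
slot, so something extra (a tag, or a sparser table) would still be needed to
get the collision-free behaviour discussed below.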

Is there any need to "pack" the tcode sequences?  How bad would it be if each
and every tcode sequence had its own 4k page?  (Not that that will be
necessary, but it does represent one possible extreme.)  I assume the tcode
address space is completely independent of the guest code address space, so it
may even be possible to use the identity function as our "hash" function.  (Not
likely, just vaguely possible from this first quick look.)

My initial thoughts are to place tcode sequences at exactly the "right" address
WITHIN a 4k page, and that the page address can be selectively hashed to find
which tcode page contains the desired tcode sequence.  (The 4k bound is not
mandatory - any size can be used.  But it seems handy and

This can be made extremely simple if more than one hash function can be
specified, possibly by using a few bits of the contents of the target of the
32-bit address (in guest space).
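
A sketch of what I mean by hashing only the page address and keeping the
in-page offset, with a choice between two hash functions.  The two hashes, the
number of tcode pages, and the use of the low bit of the first guest code byte
as the selector are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT       12u                       /* 4k pages, as in the text */
#define PAGE_MASK        ((1u << PAGE_SHIFT) - 1u)
#define TPAGE_COUNT_LOG2 8u                        /* assumed: 256 tcode pages */
#define TPAGE_COUNT      (1u << TPAGE_COUNT_LOG2)

/* Two hypothetical page-number hashes; any cheap functions would do. */
static uint32_t hash_low_bits(uint32_t guest_page)
{
    return guest_page & (TPAGE_COUNT - 1u);
}

static uint32_t hash_xor_fold(uint32_t guest_page)
{
    return (guest_page ^ (guest_page >> TPAGE_COUNT_LOG2)) & (TPAGE_COUNT - 1u);
}

/* Select the hash from one bit of the guest byte at the target address
 * (an assumed selector; the text only says "a few bits of the contents"). */
static uint32_t tcode_page_index(uint32_t guest_linear, uint8_t first_guest_byte)
{
    uint32_t guest_page = guest_linear >> PAGE_SHIFT;
    return (first_guest_byte & 1u) ? hash_xor_fold(guest_page)
                                   : hash_low_bits(guest_page);
}

/* Offset into the tcode area: hashed page, same in-page offset as the guest. */
static uint32_t tcode_offset(uint32_t guest_linear, uint8_t first_guest_byte)
{
    uint32_t idx = tcode_page_index(guest_linear, first_guest_byte);
    assert(idx < TPAGE_COUNT);                     /* both hashes mask to range */
    return (idx << PAGE_SHIFT) | (guest_linear & PAGE_MASK);
}
```

The translator would pick whichever hash avoids a collision when it places the
tcode; the lookup side only needs the same selector bits to repeat the choice.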

The goal is to effect a collision-free access (hash lookup) 100% of the time.
Possible downsides include:

1.  tcode sequence locality does not match guest code locality, so tight loops
with multiple targets can become expensive as the pages containing the tcode
sequences are activated in turn (rather than having all tcode for a given page
of guest code reside in a single page).  I'm not saying that all tcode for a
given guest page CAN'T reside in a single page, merely that my initial look at
the problem makes this appear difficult, and certainly non-trivial.

2.  tcode sequences may need substantial additional overhead information (and
thus grow larger) to assist in tcode sequence management (especially
invalidation and/or optimization, since locality will likely not be preserved).

There may be more downsides, but my initial hunch is that an absolute minimum
tcode lookup may be possible, that being a single address calculation performed
using the CPU hardware (and not an elaborate software algorithm).  The cost of
this fast access may be page/cache misses (though the working set should adjust
well enough), and more (possibly MUCH more) difficult tcode optimization and
management (slower, more overhead).

The addition of one more level of indirection can greatly reduce the downsides,
with the penalty being at least a doubling of the lookup cost (probably worse).
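
For comparison, the indirect version might look roughly like this: a small
index of (guest address, pointer) pairs, so the tcode itself can live anywhere
and be packed however the optimizer likes.  The entry layout, table size, and
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDEX_COUNT_LOG2 10u                   /* assumed: 1024 index entries */
#define INDEX_COUNT      (1u << INDEX_COUNT_LOG2)

/* One index entry: a tag to detect collisions, plus a pointer to the tcode. */
struct tcode_entry {
    uint32_t       guest_linear;               /* guest address this entry maps */
    const uint8_t *tcode;                      /* NULL while the entry is empty */
};

static struct tcode_entry tcode_index[INDEX_COUNT];

/* Record where the tcode for a guest address lives (overwrites any old entry). */
static void tcode_insert(uint32_t guest_linear, const uint8_t *tcode)
{
    struct tcode_entry *e = &tcode_index[guest_linear & (INDEX_COUNT - 1u)];
    assert(tcode != NULL);
    e->guest_linear = guest_linear;
    e->tcode = tcode;
}

/* Lookup: one load for the index entry, a compare, then the tcode pointer. */
static const uint8_t *tcode_lookup(uint32_t guest_linear)
{
    const struct tcode_entry *e = &tcode_index[guest_linear & (INDEX_COUNT - 1u)];
    if (e->tcode != NULL && e->guest_linear == guest_linear)
        return e->tcode;
    return NULL;                               /* miss: not translated, or evicted */
}
```

The extra memory access and the tag compare are where the "at least a
doubling" comes from, compared with computing the tcode address directly.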

Now, if some or all of the tcode can reside within unused portions of the guest
code space, then a more complex lookup algorithm may be justified by the
elimination/reduction of interaction with the VM.

Let's say we simply require at least 256 MB of DRAM for use by Plex86 1.0.
With all that space, we may be able to get away with some interesting
permutations (perversions?) of the tcode lookup process.  We may even want
tcode use to determine the maximum size of the working set of guest pages (say,
no more than 32 or 64 MB, probably orders of magnitude less), and try to
preserve tcode while we madly flush and refill guest code pages.  Just a
thought.  If we keep all tcode pages resident, then the locality problems
become far less important, possibly irrelevant.  Let's say tcode *never* gets
flushed, not even when the corresponding guest page is flushed:  It can then
only be affected by tcode optimizations and things like self-modifying code.
And these need not have severe speed requirements.


I'll keep reading and pondering:  These are merely the initial
off-the-top-of-my-head thoughts.  And they may do little more than display my
ignorance of Plex86.  If that's the case, please feed me the associated Clues,
using small words and large print.  And be sure to use a small spoon.


-BobC


