Kevin Lawton wrote:

> I'm working out some details of the quasi dynamic translation
> engine replacement for SBE.
>
> Looks like it will be important for performance to have an
> efficient way to map from a guest linear page address to
> an index of a table (array) of entries representing meta
> information about the code cached at that page.
>
> There's no correlation between a linear address and the
> associated table index.  Elements of the table are used/unused
> as code in various pages is discovered by the DT engine.
>
> So, we'd have to hash from the linear page address (upper 20 bits)
> to an array index (some value N), quickly, and ideally without
> much memory consumption for lookup tables.
>
> Ideas?

I've been working with hash functions recently, but more info is needed
before I can make any kind of recommendation:

1. How large is "N" expected to be?  Does it have a hard upper limit?
(If not, can we assign one?)

2. Can the hash use a fixed overhead (allocate and optimize for maximum
value of N), or should it grow and shrink with N?

3. I suspect it may be possible to integrate the hash into your meta
info array.  Is that array statically allocated?  Is it ever dynamically
reallocated?  How large are the entries in the meta info array?

4. It is always possible to trade compute time for space in a hash.
Just how slow is the hash allowed to be?  (How often will it be hit?)
Conversely, what is the maximum tolerable space penalty?  (May it, for
example, equal the size of your meta info array?)

5. Is the page address already part of the meta info you save?  Is the
array relatively small?  If so, you may want to consider keeping the
array sorted, then use a simple binary (or other) search of the array to
do the lookup.  It ain't elegant, but it uses no extra memory at all.
By the same token, for larger arrays an AVL tree would add only a little
overhead (a couple of pointers per entry), but greatly simplify keeping
the entries ordered as pages come and go, and a tree traversal will be
much faster than any linear search.


It looks to me like all possible mappings through the hash can't be
known in advance, so that eliminates the use of a perfect hash (absolute
minimum on both speed and space).  Without a perfect hash, you will have
one of two things:  collisions (and a collision resolution algorithm) or
gaps in the hash table(s).  Collisions save space but cost time to
resolve; gaps buy speed at the cost of unused table slots.

If all the page addresses are already mapped into x86 GDT (?)
structures, I believe there may already be some prior art on solving
this problem (or a similar one).

Finally, I'm not really clear how you want the hash to work:  Will you
want *any* address from the page to map to the right array entry, or
just the *start* address of the page?  Are the pages of uniform size
(4k), or do they vary (clusters of MMU pages)?

If you can identify where in the current Plex86 code base the hash would
go, it would sure help.  A high-level description, complete with
function prototype, would help more (to strictly identify the inputs
available, and the outputs desired).


-BobC
