[Only bother reading this if you're into internals development.]

From ChangeLog:

  - "Kevin P. Lawton" <[EMAIL PROTECTED]>: Fri Jan 12 22:57:11 EST 2001
    More enhancements to dt-testbed/proto2, and more notes in the README.

From dt-testbed/proto2/README:

The generated tcode for static out-of-page branches has a token
embedded inline which is backpatched along with the direct tcode
branch address.  When this token matches a global token, the tcode
knows the direct branch address is valid.  Up to now, the global
token was not incremented as it would be in a real VM environment.
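
In C terms, the idea looks roughly like this (a sketch only; the real
tcode is generated machine code, and these names are illustrative, not
the proto2 code):

  /* Sketch: per static out-of-page branch data and the inline check. */
  typedef struct {
    unsigned long token;       /* token embedded inline in the tcode     */
    void         *tcode_addr;  /* backpatched direct tcode branch target */
  } oop_branch_t;

  extern unsigned long global_token;                /* see below          */
  extern void *oop_branch_handler(oop_branch_t *);  /* revalidate, then
                                                       backpatch inline    */

  void *take_oop_branch(oop_branch_t *b)
  {
    if (b->token == global_token)
      return b->tcode_addr;        /* token matches: direct branch is valid */
    return oop_branch_handler(b);  /* stale: fall back to the handler       */
  }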

The idea is that the global token is incremented each time there
are changes to the page mappings, like a PDBR reload, etc.  This
lets the tcode dynamically re-adapt itself to possible page mapping
differences since the last time the code was executed.  For example,
since the last context switch, a code page could have been swapped out
or have been otherwise remapped.  We would not want to execute associated
tcode for that page until we have revalidated that the conditions
under which the tcode was generated are the same for the page.
The inline token enables this check.
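
The increment itself is trivial; the point is to hook it to every event
that can change the mappings (again, just a sketch):

  extern unsigned long global_token;

  /* Sketch: call on any page mapping change (PDBR reload, etc.). */
  void page_mappings_changed(void)
  {
    global_token++;  /* every backpatched direct branch goes stale at once */
  }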

The cost of this method is extra storage per static out-of-page
branch, and the execution time of the token compare and eflags
state management.  The upside is simplicity.  No big tables
or branch graphs need to be stored or given invalidation/revalidation
management every guest context switch.

I used the setitimer()/signal() services to simulate a context
switch and, for now, just increment the global token.  This
will force all the branches to use the handler routine for the
first time they are executed after the context switch.  The
handler routine backpatches the new token and tcode address
inline.  There will eventually be some constraints checking
on the given codepage.  For now I don't do any.
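
For reference, the simulation plumbing amounts to the usual
setitimer()/signal() calls, roughly like this (a sketch, not the
testbed code verbatim):

  #include <signal.h>
  #include <sys/time.h>

  static volatile unsigned long global_token;

  static void timeslice_handler(int sig)
  {
    (void) sig;
    global_token++;  /* simulated guest context switch */
  }

  static void start_timeslice_timer(unsigned long usec)
  {
    struct itimerval tv;

    signal(SIGALRM, timeslice_handler);
    tv.it_interval.tv_sec  = usec / 1000000;
    tv.it_interval.tv_usec = usec % 1000000;
    tv.it_value = tv.it_interval;
    setitimer(ITIMER_REAL, &tv, 0);  /* fires every 'usec' microseconds */
  }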

Anyway, some results.  I scaled up the macro loop count.

  execution time    guest timeslice
  14.55             500000 us   (2 Hz)
  14.60              10000 us (100 Hz)

Only an extra 0.3% overhead (14.60 vs 14.55) for the higher frequency
context switch, on top of the overhead imposed by the extra code in the
branch tcode talked about previously.  In other words, the factors
listed in the previous section already include most of the overhead;
the extra revalidation doesn't weigh that heavily.  There will be more
overhead with a real VM system, but it's comforting.  I think this area
is where a lot of user-space-only DT strategies that use more
complicated flow graphs would run into trouble.  User-space DT efforts
can make assumptions about the consistency of linear->physical mappings
across context switches, since the OS takes care of paging
automagically.  There are only a few system calls you have to watch out
for and dump tcode on.

But a system-oriented DT strategy cannot make these assumptions.  So
we are faced with the following choices:

  1) Use direct branches to target tcode.  Maintain branch trees of
     some sort.  There would have to be a certain amount of maintenance
     involved with fixing up the use of direct branches every context
     switch.  This overhead gets magnified with increases in the
     guest context switch frequency (e.g. 1000 Hz instead of 100 Hz),
     and with clock skewing to keep the guest time reference in sync
     with the host.  It is also more complicated and requires more
     memory.  The direct branches would be much faster, but I'm not
     sure how this will balance out with the extra context switch
     burden.  Note that with higher loads on the host, more clock
     skewing needs to be applied to the guest, which magnifies the
     effective guest context switch frequency.  Thus this method
     gets incrementally worse with higher host loads.

  2) Always generate a call to a branch handler.  This is simple,
     but slower.

  3) Generate a simple check inline, then use the direct branch most
     of the time.

I chose #3.  Admittedly, it strives for "pretty good" rather than
"great" or "excellent".  A decent balance of performance, simplicity,
and scalability to host load.

Working backwards, it's now a little easier to explain why the
methods I proposed in the Plex86 Internals Guide (PIG) are
page oriented: because of this per-context-switch dynamic tcode address
revalidation process.  It's also why there are constraints (or
perhaps just a constraints ID) noted in the meta information
for each page in the included graphics.

Dynamic branches next...

-Kevin

