Gordon Henriksen <[EMAIL PROTECTED]> wrote:
> Leopold Toetsch wrote:

>> 2) Patch the native opcodes at these places with e.g. int3

> I don't think that bytecode-modifying versions should fly; they're not
> threadsafe,

Why? The bytecode is patched by a different thread, and only *if* an event
is due (which, measured in CPU cycles, is rare). And I don't see a thread
safety problem: the (possibly different) CPU reads an opcode and runs it.
Somewhere in the meantime, the opcode at that memory position changes to
the byte 0xCC (on Intel: int3). Only one byte changes, so the CPU either
executes the trap or it doesn't (of course, changing that memory position
is assumed to be atomic, which AFAIK holds on i386 for a single aligned
byte) - but the next time through the loop the trap is honored.
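The scheme above could be sketched roughly like this (a hypothetical
sketch only - `patch_trap`/`unpatch_trap` and their signatures are made
up for illustration, and the real code would also have to toggle page
protection first):

```c
#include <stdint.h>

/* Hypothetical sketch: patch a single opcode byte so the executing
 * thread hits an int3 trap on its next pass through the loop.  A
 * one-byte aligned store is atomic on i386, so the running CPU sees
 * either the old opcode or the full 0xCC, never a torn write. */
static void patch_trap(volatile uint8_t *code, uint8_t *saved)
{
    *saved = *code;      /* remember the original opcode byte */
    *code  = 0xCC;       /* int3 on x86 */
}

static void unpatch_trap(volatile uint8_t *code, uint8_t saved)
{
    *code = saved;       /* restore the original instruction */
}
```

The trap handler would then dispatch the pending event and restore the
original byte before resuming.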

> ... and it would be nice to write-protect the instruction stream
> to avert that attack vector.

We already protect it, so we can unprotect and reprotect it as needed;
that's not the problem.

>> 1b) in places like described below under [1] c)

> I like this (1b). With the JIT, an event check could be inlined to 1
> load and 1 conditional branch to the event dispatcher, yes?

Yep. That's the plain, average, slower case :) It's a fallback if there
are no better and faster solutions.
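In C terms, the inlined check boils down to something like this (a
hypothetical sketch - `Interp`, `events_pending`, and the function names
are made up, not Parrot's actual API):

```c
/* One load of an "event ready" flag from the interpreter struct,
 * plus one conditional branch that almost always falls through. */
typedef struct {
    volatile int events_pending;   /* set asynchronously by the event thread */
} Interp;

static int checks_run;             /* instrumentation for the demo only */

static void dispatch_events(Interp *interp)
{
    interp->events_pending = 0;    /* handle and clear pending events */
}

static void check_events(Interp *interp)
{
    ++checks_run;
    if (interp->events_pending)    /* 1 load + 1 conditional branch */
        dispatch_events(interp);
}
```

The JIT would emit the equivalent of `check_events()` at blocking points
and backward branches.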

> (So long as
> &interp is already in a register.)

Arghh, the damned i386 with *zero* registers, where "zero" is around 4
(usable, general-purpose...) ;)
So no, the interpreter pointer won't stay in a register here.

It's at least ~3 cycles plus branch prediction overhead, which is a lot
compared to zero overhead...

> ... If that's done before blocking and at
> upward branches, the hit probably won't be killer for most of code. For
> REALLY tight loops (i.e., w/o branches or jumps, and w/ op count less
> than a particular threshold), maybe unroll the loop a few times and then
> still check on the upward branch.

Yep, loop unrolling would definitely help; that was the "currently very
likely working" solution in my head.

> Those branches will almost always fall straight through, so while there
> will be load in the platform's branch prediction cache and a bit of
> bloat, there shouldn't be much overhead in terms of pipeline bubbles.
> The "event ready" word (in the interpreter, presumably) will stay in the
> L1 or L2 cache, avoiding stalls.

Yep. Still I like these numbers:
$ parrot -j examples/assembly/mops.pasm
M op/s:        790.105001      # on AMD 800

> No, it's not zero-overhead, but it's simple and easy enough to do
> portably. Crazy platform-specific zero-overhead schemes can come later
> as optimizations.

s(Crazy)(Reasonable) but later is ok:)

leo
