At 01:20 PM 10/9/2001 +0200, Paolo Molaro wrote:
>On 10/07/01 Bryan C. Warnock wrote:
> >    while (*pc) {
> >        switch (*pc) {
> >        }
> >    }
>
>With the early mono interpreter I observed a 10% slowdown when I
>checked that the instruction pointer was within the method's code:
>that is not a check you need on every opcode dispatch, but only with
>branches, so it makes sense to have a simple while (1) loop.
Yeah, I really want to toss that as well for the normal case. Checking the
validity of the instruction pointer's a good thing for the safe(ish)
version of the interpreter loop, where we're explicitly asking for
assurances.

>About the goto label feature discussed elsewhere, if the dispatch
>loop is emitted at compile time, there is no compatibility problem
>with non-gcc compilers, since we know what compiler we are going to use.
>I got speedups in the 10-20% range with dispatch-intensive benchmarks in
>mono. It can also be coded in a way similar to the switch code
>with a couple of defines, if needed, so that the same code compiles
>on both gcc and strict ANSI compilers.

If anyone wants an official ruling... DO_OP can contain any code you like
as long as it:

*) Ultimately compiles, in some version, everywhere
*) Allows opcodes above a certain point (probably 256 or 512) to be
   overridden at runtime

Computed gotos, switches, function table dispatches, generated machine
code, or some other form of coding madness are all OK. I don't care which
way a particular platform goes as long as it doesn't constrain another
platform's ability to go another way.

> > seems simpler, but introduces some potential page (or at least
> > i-cache(?)) thrashing, as you've got to do a significant jump just in
> > order to jump again. The opcode comparison, followed by a small jump,
> > behaves much nicer.

FWIW (and I know this is out of order) the distance jumped really doesn't
make any difference in the time it takes to jump. The real indicator of
the time is what cache the destination is in. A 20 word jump will take
longer than a 4M jump if the destination of the first is in main memory
and the second is in L1 cache.
(L1 cache generally being loaded in 8, 16, or 32 byte chunks and L2 cache
in 512 or 1024 byte chunks, but that's terribly processor-dependent)

> > I've found [2] that the fastest solutions (on the platforms I've
> > tested) are within the family:
> >
> >    while (*pc) {
> >        if (*pc > CORE_OPCODE_NUMBER) {
> >            pc = func_table[*pc]();
> >        } else {
> >            switch (*pc) {
> >            }
> >        }
> >    }
>
>... but adds a comparison even for opcodes that don't need it.

But it removes some uncertainty that the C compiler optimizer might be
able to use to do better optimizations of the following switch. (Whether
it *will* or not depends on the smarts built into the compiler) I can
fully see this as a non-intuitive chunk of code that gets a performance
boost.

I'm really thinking that we're going to end up with benchmarked chunks of
code keyed by OS and processor here, chosen by the configure process. I'm
half-tempted to weld the benchmark runs into configure itself, but I'm
leery of it choosing the wrong thing because some cron job fired off in
the middle of its check.

>The problem here is to make sure we really need the opcode swap
>functionality, it's really something that is going to kill
>dispatch performance.

Dunno about kill, but it will have an impact. I'm fully aware of that,
but lots of things in dynamic languages in general have performance
impacts. Sometimes (and whether this is one of those times or not is up
in the air) you just have to pay it.

					Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk