On Fri, Feb 22, 2002 at 09:00:29AM -0500, Gregor N. Purdy wrote:
> I'm not surprised that find_op() is causing some distress. The "best
> way" is subject to interpretation, of course. TMTOWTDI all over again.
> I chose this way because whenever I started talking about op lookup
> by name, cries went up about "it will be too slow!". So, I wrote the
> absolutely fastest implementation I could think of. It requires only
> a single scan over the op name to determine whether or not it's valid,
> and to return its op index in the oplib in question. I don't recall
> the exact numbers, but I was getting lookups-per-second on a par
> with the ops-per-second from mops.pasm, which I think is excellent
> (and probably faster than it needs to be :).
Do you have the benchmarking code handy? [ie I'd like to see whether
what I'm thinking of comes anywhere near it, on a quantitative test. I
might not have tuits immediately, so there's no real hurry]

I agree about "best" being subject to interpretation, which is why I
chose that word, not "fastest" :-) There's some compromise needed
between code speed and code size, and I wasn't going to be so brave as
to actually say something I could be held to in the future. (Well, not
yet, until I'd had a play with things)

> (1) Split core.ops into multiple oplibs, with the *really* core
>     stuff (Parrot couldn't do anything reasonable without them)
>     left in core.ops, and other stuff moved out to loadable oplibs.
>     There might even be enough heat generated by a discussion of
>     this that we could have a nice marshmallow roast :)

Hmm. I see we have sinh, cosh, tanh in core ops. When have I ever
needed them? :-) Whereas something useful like erf, well, we don't
have that :-(

[Joking aside, in a previous life I did need to compute things
[diffusion equations] with erf. As best I can remember I've never
needed to evaluate hyperbolic trig functions. Should we put erf in?]

> (2) Switch the implementation of find_op() to something that
>     makes for smaller code (I almost said simpler, but I think
>     nested switches aren't inherently non-simple, just hard for
>     the compiler to keep in mind all at once when they are this
>     large). I had considered laying down static data structures
>     for a hash table, or a binary search tree or something, but I
>     decided to make the first implementation the fastest, all
>     other considerations be damned.

I'm curious whether it is actually the fastest. The large number of
small switch statements made me wonder what sort of code is actually
getting generated.
The switch statements have sufficiently few choices that it seems
unlikely that the compiler will elect to use jump tables, and instead
it effectively codes them as a series of if()s. A maze of twisty ifs,
all alike, means 3 CPU instructions per character:

        ldrb    r3, [r0, #6]    @ zero_extendqisi2
        cmp     r3, #115
        bne     .L1782

[yes, if it's coming from me, it's going to be ARM assembler, not some
nasty CISC thing :-)]

which, with all the potential branching, blows pipelines away. And 12
bytes of code per 1 byte of string to test is a lot of code to shunt
through RAM.

Switch statements with more possibilities cause compilers to start
generating jump tables:

.L2396:
        ldrb    r3, [r0, #1]    @ zero_extendqisi2
        sub     r3, r3, #104
        cmp     r3, #7
        ldrls   pc, [pc, r3, asl #2]
        b       .L2790
        .align  2
.L2791:
        .word   .L2398
        .word   .L1782
        .word   .L1782
        .word   .L1782
        .word   .L2436
        .word   .L2523
        .word   .L1782
        .word   .L2627

which I think (I'm guessing here) are slightly denser than multiple
ifs, but more importantly have fewer branches, and may well execute
fewer instructions. gcc 2.95 on ARM elected to use 19 jump tables in
find_op. I count 1879 switch statements.

On the other hand, perl5's Fcntl constants seem to be mostly jump
tables, but the final full string check is done with memcmp (which
isn't being inlined at -O2 on ARM), so it may turn out to be slower.
Hence why I'd be intrigued to benchmark it against the current
approach.

> I don't think (2) is a bad idea, but I'm actually in favor of (1),
> whether or not we do (2). In fact, I've even toyed with the (doomed,
> I'm sure) idea of having only one hard-coded op (wired to op zero,
> but maybe still replaceable):

> This approach means we don't have to even tinker with the packfile
> format to get bytecode-relative optables, since they are (by
> convention) built on the fly by a preamble. And, you can tinker all
> you want with the opcode table at run time, too.

Avoid the special case. All ops are equal. :-)

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html