No, I can't. Since I did it, I naturally think it's a good idea. Perhaps, instead of denigrating it without substantiating your claims, you could propose (and then implement, and then get adopted) a better idea?
Sure. My own VM will take a lot longer to get done! ;) I don't want to take away any of your credit for building a cool VM. I was just wondering why you decided on this particular implementation, which seems unobvious to me. Hence the question. I guess I should've formulated it slightly differently :) More info below.

    I can see why it would pay off for Lisp programmers to have
    closures that run like the Pharo closures, since they have O(1)
    access performance. However, this performance boost only starts
    paying off once you have more than four levels of nested
    closures, something which, unlike in Lisp, almost never happens
    in Pharo. Or at least shouldn't happen (and if it does, it's
    probably ok to punish those people with slower performance).


Slower performance than what? BTW, I think you have things backwards: I modelled the Pharo closure implementation on Lisp closures, not the other way around.
This is exactly what I meant. The closures seem like a very good idea for languages with very deeply nested closures; Lisp, with all its macros, is such a language. I don't really see this being the case in Pharo.
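To make the depth argument concrete, here's a contrived sketch (hypothetical code, not taken from either implementation):

    depthExample
        | a |
        a := 0.
        ^ [ [ [ [ a := a + 1. a ] ] ] ]

With the indirection-vector scheme the innermost block reaches a through the single vector reference copied into its closure, so the write costs the same at any depth; that's the O(1) access from above. A lookup that walks a chain of scopes instead pays one hop per enclosing level, but you only notice that at this kind of unusual nesting depth.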

    This implementation is pretty hard to understand, and it makes
    decompilation semi-impossible unless you make very strong
    assumptions about how the bytecodes are used. This in turn
    reduces the reusability of the new bytecodes, and probably of
    the decompiler, once people start actually using the
    pushNewArray: bytecode.


Um, the decompiler works, and in fact works better now than it did a couple of years ago. So how does your claim stand up?
For example, when I just use the InstructionClient I see pushNewArray: and then, later, popIntoTemp:. This combination is supposed to make it clear that a remote temp vector is being stored, but that is not what the bytecodes themselves say. The same bytecode can easily be reused for something else: what if I use it to build my own arrays? What if the array is created in a different way? I can think of many combinations of bytecodes by which the temp vector could come into existence, and from most of them I would never be able to tell that it is actually the temp vector being built. Somehow I feel there is a bigger disconnect between the bytecodes and the Smalltalk code here, and I'm unsure whether that isn't harmful.
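For reference, the bytecode shape I mean is roughly this (a hand-written sketch; actual opcodes, offsets and indices depend on the compiler):

    counter
        | x |
        x := 0.
        [ x := x + 1 ] value.
        ^ x

compiles to something like:

    pushNewArray: 1                 "allocate the temp vector"
    popIntoTemp: 0                  "stored like any other method temp"
    pushConstant: 0
    popIntoTemp: 0 inVectorAt: 0    "x := 0"
    closureNumCopied: 1 numArgs: 0  "the block copies the vector reference"
    ...

Nothing in the pushNewArray:/popIntoTemp: pair itself says 'temp vector' rather than 'ordinary array'; only the later inVectorAt: accesses give it away.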

But ok, I am of course working on the Opal decompiler. Are you building an IR with your decompiler? If so I'd like to have a look, since I've already spent the whole day trying to get the Opal compiler to do what I want... having one that works and builds a reusable IR would be useful. (I'm implementing your field-index updating through bytecode transformation, btw.)

    You might save a teeny tiny bit of memory by having stuff
    garbage collected when it's no longer needed ... but I doubt
    the whole design is based on that? Especially since it
    penalizes performance in almost every way for standard
    methods, and it even wastes memory in the general case. I
    don't get it.


What has garbage collection got to do with anything? What precisely are you talking about? Indirection vectors? To understand the rationale for indirection vectors you have to understand the rationale for implementing closures on a conventional machine stack. For Lisp that's clear: compile to a conventional stack because that's an easy model, in which case one has to store values that outlive LIFO discipline on the heap, hence indirection vectors. Why you might want to do that in a Smalltalk implementation, when you could just access the outer context directly, has a lot to do with VM internals. Basically it's the same argument: if one can map Smalltalk execution to a conventional stack organization, then the JIT can produce a more efficient execution engine. Not doing this causes significant problems in context management.
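To picture it at source level, here's a sketch of the effective transformation (not literal compiler output):

    counter
        "as written"
        | n |
        n := 0.
        ^ [ n := n + 1. n ]

becomes, in effect:

    counter
        "n outlives the frame, so it is hoisted into a
         heap-allocated indirection vector"
        | vec |
        vec := Array new: 1.
        vec at: 1 put: 0.
        ^ [ vec at: 1 put: (vec at: 1) + 1. vec at: 1 ]

The method's frame can then be discarded with strict LIFO discipline; only the little vector survives on the heap alongside the closure.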
By garbage collection I meant the fact that you can already collect parts of stack frames while leaving other parts (the remote temps) to be GCed later, when possible.

I do understand why you want to keep them on the stack as long as possible; the stack-frame marriage stuff for optimizations is very neat indeed. What worries me more is that stack frames aren't simply linked to each other, sharing memory that way. With linked frames you would have only one indirection to reach the method frame (via the home context) and one to reach the outer context, while your own frame you can access directly. So only the fourth context would need two indirections (which is what all contexts now pay for remote temps), and from the fifth on it gets worse... but I can't really see that happening in real-world situations.

Then there is the problem that, since you don't just link the frames and look values up through them, you have to copy over part of your frame at block activation. That isn't necessarily -that- slow (although it is an overhead), but it's slightly clumsy and uses more memory. And that's where my problem lies, I guess... There is such a straightforward implementation possible: just link up the stack frames (well... they are already linked up anyway) and traverse them. You'd have to do some rewriting whenever you leave a context that's still needed, but you do that anyway for the remote temps, right?
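As I understand it, what gets copied splits by how a temp is used; something like this hypothetical example:

    example
        | readOnly written |
        readOnly := 42.
        written := 0.
        ^ [ written := written + readOnly ]

Here readOnly is copied by value into the closure when it is created (one slot, no indirection afterwards), while written, being assigned after the block exists, has to live in the shared indirection vector. It's that copying at closure-creation time that strikes me as clumsy.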
The explanation is all on my blog <http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/> and in my "Context Management in VisualWorks 5i" paper <http://www.esug.org/data/Articles/misc/oopsla99-contexts.pdf>.

But does a bright guy like yourself find this /really/ hard to understand? It's not that hard a transformation, and compared to what goes on in the JIT (e.g. in bytecode-to-machine-code pc mapping) it's pretty trivial.
I guess I just like to really see what's going on by having a decent model around. When I look at the bytecodes, in the end I can reconstruct what they're doing... as long as they are arranged the way the compiler currently generates them. But I can easily see how slight permutations would throw me off completely.


    But probably I'm missing something?


It's me who's missing something. I did the simplest thing I knew could possibly work for getting an efficient JIT and a Squeak with closures (there's a huge similarity between the above scheme and the changes I made to VisualWorks, which resulted in a VM that was 2 to 3 times faster than VW 1.0, depending on platform). But you can see a far more efficient and simpler scheme. What is it?
Basically, my scheme isn't necessarily far more efficient; I just think it's more understandable. I can understand scopes that point to their outer scope, and I can follow those scopes to see how lookup works. And the fact that it does less pointer dereferencing and data copying makes me think it wouldn't be less efficient than what you have now. My problem is not that your implementation is slow, but that it's complex, and I don't really see why this complexity is needed.
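Concretely, the kind of lookup I have in mind is no more than this (a hypothetical sketch on top of the existing context protocol):

    lookupTemp: i levelsOut: n
        "conceptual only: start from the current context, walk n
         static links outward, then read the temp directly"
        | ctx |
        ctx := thisContext.
        n timesRepeat: [ ctx := ctx outerContext ].
        ^ ctx tempAt: i

One hop per lexical level, no extra allocation, and the model is just the contexts we already have.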

Obviously, playing on my ego by telling me I should be clever enough to understand it makes me say I do! But I feel it's not the easiest, and probably fewer people understand this than would understand the general model of just linking contexts together.

cheers,
Toon
