No, I can't. Since I did it, I naturally think it's a good idea.
Perhaps, instead of denigrating it without substantiating your claims,
you could propose (and then implement, and then get adopted) a better
idea?
Sure. My own VM will take a lot longer to get done! ;) I don't want to
take away any of your credit for building a cool VM. I was just
wondering why you decided to go for this particular implementation, which
seems non-obvious to me. Hence the question. I guess I should've
formulated it slightly differently :) More info below.
I can see why it would pay off for Lisp programmers to have
closures that run like the Pharo closures, since they give O(1)
access performance. However, this performance boost only starts
paying off once you have more than 4 levels of nested
closures, something which, unlike in Lisp, almost never happens in
Pharo. Or at least shouldn't happen (if it does, it's probably ok
to penalize those people with slower performance).
Slower performance than what? BTW, I think you have things backwards.
I modelled the Pharo closure implementation on Lisp closures, not the
other way around.
This is exactly what I meant. The closures seem like a very good idea
for languages with very deeply nested closures. Lisp is such a language,
with all its macros... I don't really see that being the case in Pharo.
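Just to make the nesting-depth point concrete, here is a contrived sketch (names and values made up):

    "With the indirection-vector scheme, 'total' is hoisted into a temp
     vector and each block copies a reference to that vector, so the
     innermost write costs one extra dereference regardless of depth.
     With plainly linked contexts it would instead walk three
     outerContext links to reach it."
    | total |
    total := 0.
    [ :a |
        [ :b |
            [ :c | total := total + a + b + c ] value: 3
        ] value: 2
    ] value: 1.
    total  "6"

My point is just that code shaped like this is rare in Pharo.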
This implementation is pretty hard to understand, and it makes
decompilation semi-impossible unless you make very strong
assumptions about how the bytecodes are used. This in turn
reduces the reusability of the new bytecodes, and probably of the
decompiler, once people start actually using the pushNewArray:
bytecodes.
Um, the decompiler works, and in fact works better now than it did a
couple of years ago. So how does your claim stand up?
For example, when I just use the InstructionClient I get a pushNewArray:
and then later a popIntoTemp. This combination is supposed to make clear
that you are storing a remote temp array. That is not what the bytecode itself
says, however. And this bytecode can easily be reused for something else: what
if I use the bytecode to make my own arrays? What if this array is
created in a different way? I can think of a lot of ways the temp array
could come into being, using lots of variations of bytecodes, from which I
would never (...) be able to figure out that it's actually building the
temp vector. Somehow I just feel there's a bigger disconnect between the
bytecodes and the Smalltalk code, and I'm unsure whether that isn't harmful.
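To illustrate what I mean, take a temp that is written inside a block (the method name here is made up, and the bytecode names and offsets are from memory, so treat them as approximate):

    makeCounter
        "count is written inside the block, so it gets hoisted into an
         indirection vector"
        | count |
        count := 0.
        ^ [ count := count + 1 ]

    "Roughly, the method now starts with
         pushNewArray: 1
         popIntoTemp: 0
     so temp 0 holds the vector, and every read or write of count, in the
     method and in the block, goes through a remote-temp bytecode that
     indexes into that vector.  Only that pushNewArray:/popIntoTemp: pair
     tells a decompiler that temp 0 is a temp vector and not just an
     ordinary array I built myself."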
But ok, I am working on the Opal decompiler, of course. Are you building
an IR with your decompiler? If so I'd like to have a look, since I've
already spent the whole day trying to get the Opal compiler to
do what I want... having one that works and builds a reusable
IR would be useful. (I'm implementing your field-index updating through
bytecode transformation, btw.)
You might save a teeny tiny bit of memory by having stuff garbage
collected when it's not needed anymore ... but I doubt that the
whole design is based on that? Especially since it penalizes
performance in almost all possible ways for standard methods.
And it even wastes memory in general cases. I don't get it.
What has garbage collection got to do with anything? What precisely
are you talking about? Indirection vectors? To understand the
rationale for indirection vectors you have to understand the rationale
for implementing closures on a conventional machine stack. For Lisp
that's clear: compile to a conventional stack, as that's an easy model,
in which case one has to store values that outlive LIFO discipline on
the heap, hence indirection vectors. Why you might want to do that in
a Smalltalk implementation, when you could just access the outer
context directly, has a lot to do with VM internals. Basically it's the
same argument. If one can map Smalltalk execution to a conventional
stack organization then the JIT can produce a more efficient execution
engine. Not doing this causes significant problems in context management.
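To make the LIFO point concrete, a minimal (purely illustrative) example:

    escapingBlock
        "sum is assigned inside a block that outlives this method's
         activation, so it cannot stay in a stack slot; it is moved to a
         heap-allocated indirection vector that the method and the block
         both reference."
        | sum |
        sum := 0.
        ^ [ :x | sum := sum + x ]

    "elsewhere:"
    | acc |
    acc := self escapingBlock.   "escapingBlock's frame is gone by now"
    acc value: 3.                "3"
    acc value: 4                 "7 -- sum is still reachable via the vector"

Once values like sum live in the vector, the frame itself can obey strict stack discipline, which is what lets the JIT map execution onto a conventional machine stack.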
By garbage collection I meant the fact that you can already
collect parts of the stack frames while leaving other parts (the remote
temps) to be GC'd later on, when possible.
I do understand why you want to keep them on the stack as long as
possible. The stack-frame marriage stuff for optimizations is very neat
indeed. What worries me more is the fact that stack frames
aren't just linked to each other, sharing memory that way. If they were,
you would only have 1 indirection to access the method frame (via the
homeContext), and 1 for the outer context. You can directly access
your own frame. So only the 4th-level context would have 2 indirections
(which is what all contexts now have for remotes). From the 5th level on
it gets worse... but I can't really see this happening in real-world
situations.
Then you have the problem that, since you don't just link the frames and
don't look up values via the frames, you have to copy over part of your
frame on activation. This isn't necessarily -that- slow (although it is
an overhead), but it's slightly clumsy and uses more memory. And that's
where my problem lies, I guess... There's such a straightforward
implementation possible, by just linking up stack frames (well... they
are already linked up anyway) and traversing them. You'll have to do
some rewriting whenever you leave a context that's still needed, but you
do that anyway for the remote temps, right?
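Conceptually the lookup I have in mind is nothing more than this (a hypothetical helper, just to show the shape, not how it would actually be spelled in the VM):

    tempAt: tempIndex levelsOut: n startingFrom: aContext
        "Reach a temp that lives n static levels out by chasing the chain
         of outer contexts.  The frames are already linked, so there is no
         copying into the closure and no separate temp vector."
        | ctx |
        ctx := aContext.
        n timesRepeat: [ ctx := ctx outerContext ].
        ^ ctx tempAt: tempIndex

Writing would be the same walk followed by a tempAt:put:.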
The explanation is all on my blog
<http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/>
and in my Context Management in VisualWorks 5i
<http://www.esug.org/data/Articles/misc/oopsla99-contexts.pdf> paper.
But does a bright guy like yourself find this /really/ hard to
understand? It's not that hard a transformation, and compared to what
goes on in the JIT (e.g. in bytecode to machine-code pc mapping) it's
pretty trivial.
I guess I just like to really see what's going on by having a decent
model around. When I look at the bytecodes, in the end I can reconstruct
what they're doing... as long as they are arranged in the way that the
compiler currently generates them. But I can easily see how slight
permutations would already throw me off completely.
But probably I'm missing something?
It's me who's missing something. I did the simplest thing I knew
could possibly work re getting an efficient JIT and a Squeak with
closures (there's a huge similarity between the above scheme and the
changes I made to VisualWorks that resulted in a VM that was 2 to 3
times faster than VW 1.0, depending on the platform). But you can see a
far more efficient and simpler scheme. What is it?
Basically my scheme isn't necessarily far more efficient. It's just more
understandable, I think. I can understand scopes that point to their
outer scope, and I can follow these scopes to see how the lookup works.
And the fact that it does less pointer dereferencing and copying of data
is just something that makes me think it wouldn't be less efficient
than what you have now. My problem is not that your implementation is
slow, rather that it's complex. And I don't really see why this
complexity is needed.
Obviously, playing on my ego by telling me I should be clever enough to
understand it makes me say I do! But I feel it's not the easiest, and
probably fewer people understand this than the general model of just
linking contexts together.
best,
Eliot
cheers,
Toon