No, I can't. Since I did it, I naturally think it's a good idea.
Perhaps, instead of denigrating it without substantiating your claims,
you could propose (and then implement, and then get adopted) a better
idea?
Sure. My own VM will take a lot longer to get done! ;) I don't want to
take away any of your credit for building a cool VM. I was just
wondering why you decided to go for this particular implementation, which
seems non-obvious to me. Hence the question. I guess I should've
formulated it slightly differently :) More info below.
I can see why it would pay off for Lisp programmers to have
closures that run like the Pharo closures, since they give O(1)
access performance. However, this performance boost only starts
paying off once you have more than 4 levels of nested
closures, something which, unlike in Lisp, almost never happens in
Pharo. Or at least shouldn't happen (if it does, it's probably ok
to penalize those people with slower performance).
Slower performance than what? BTW, I think you have things backwards.
I modelled the Pharo closure implementation on Lisp closures, not the
other way around.
This is exactly what I meant. The closures seem like a very good idea
for languages with very deeply nested closures. Lisp is such a language,
with all its macros... I don't really see that being the case in Pharo.
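Just to make the nesting-depth point concrete, here is a contrived sketch (names and values made up):

    "With the indirection-vector scheme, 'total' is hoisted into a temp
     vector and each block copies a reference to that vector, so the
     innermost write costs one extra dereference regardless of depth.
     With plainly linked contexts it would instead walk three
     outerContext links to reach it."
    | total |
    total := 0.
    [ :a |
        [ :b |
            [ :c | total := total + a + b + c ] value: 3
        ] value: 2
    ] value: 1.
    total  "6"

My point is just that code shaped like this is rare in Pharo.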
This implementation is pretty hard to understand, and it makes
decompilation semi-impossible unless you make very strong
assumptions about how the bytecodes are used. This in turn
reduces the reusability of the new bytecodes, and probably of the
decompiler, once people start actually using the pushNewArray:
bytecodes.
Um, the decompiler works, and in fact works better now than it did a
couple of years ago. So how does your claim stand up?
For example, when I just use the InstructionClient I get a pushNewArray:
and then later a popIntoTemp. This combination is supposed to make clear
that you are storing a remote temp array. That is not what the bytecode itself
says, however. And this bytecode can easily be reused for something else: what
if I use the bytecode to make my own arrays? What if this array is
created in a different way? I can think of a lot of ways the temp array
could come into being, using lots of variations of bytecodes, from which I
would never (...) be able to figure out that it's actually building the
temp vector. Somehow I just feel there's a bigger disconnect between the
bytecodes and the Smalltalk code, and I'm unsure whether that isn't harmful.
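To illustrate what I mean, take a temp that is written inside a block (the method name here is made up, and the bytecode names and offsets are from memory, so treat them as approximate):

    makeCounter
        "count is written inside the block, so it gets hoisted into an
         indirection vector"
        | count |
        count := 0.
        ^ [ count := count + 1 ]

    "Roughly, the method now starts with
         pushNewArray: 1
         popIntoTemp: 0
     so temp 0 holds the vector, and every read or write of count, in the
     method and in the block, goes through a remote-temp bytecode that
     indexes into that vector.  Only that pushNewArray:/popIntoTemp: pair
     tells a decompiler that temp 0 is a temp vector and not just an
     ordinary array I built myself."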
But ok, I am working on the Opal decompiler, of course. Are you building
an IR with your decompiler? If so I'd like to have a look, since I've
already spent the whole day trying to get the Opal compiler to
do what I want... having one that works and builds a reusable
IR would be useful. (I'm implementing your field-index updating through
bytecode transformation, btw.)
You might save a teeny tiny bit of memory by having stuff garbage
collected when it's not needed anymore ... but I doubt that the
whole design is based on that? Especially since it penalizes
performance in almost all possible ways for standard methods.
And it even wastes memory in general cases. I don't get it.
What has garbage collection got to do with anything? What precisely
are you talking about? Indirection vectors? To understand the
rationale for indirection vectors you have to understand the rationale
for implementing closures on a conventional machine stack. For Lisp
that's clear: compile to a conventional stack, as that's an easy model,
in which case one has to store values that outlive LIFO discipline on
the heap, hence indirection vectors. Why you might want to do that in
a Smalltalk implementation, when you could just access the outer
context directly, has a lot to do with VM internals. Basically it's the
same argument. If one can map Smalltalk execution to a conventional
stack organization then the JIT can produce a more efficient execution
engine. Not doing this causes significant problems in context management.
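To make the LIFO point concrete, a minimal (purely illustrative) example:

    escapingBlock
        "sum is assigned inside a block that outlives this method's
         activation, so it cannot stay in a stack slot; it is moved to a
         heap-allocated indirection vector that the method and the block
         both reference."
        | sum |
        sum := 0.
        ^ [ :x | sum := sum + x ]

    "elsewhere:"
    | acc |
    acc := self escapingBlock.   "escapingBlock's frame is gone by now"
    acc value: 3.                "3"
    acc value: 4                 "7 -- sum is still reachable via the vector"

Once values like sum live in the vector, the frame itself can obey strict stack discipline, which is what lets the JIT map execution onto a conventional machine stack.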
By garbage collection I meant the fact that you can already
collect parts of the stack frames while leaving other parts (the remote
temps) to be GC'd later on, when possible.
I do understand why you want to keep them on the stack as long as
possible. The stack-frame marriage stuff for optimizations is very neat
indeed. What worries me more is the fact that stack frames
aren't just linked to each other, sharing memory that way. If they were,
you would only have 1 indirection to access the method frame (via the
homeContext), and 1 for the outer context. You can directly access
your own frame. So only the 4th-level context would have 2 indirections
(which is what all contexts now have for remotes). From the 5th level on
it gets worse... but I can't really see this happening in real-world
situations.
Then you have the problem that, since you don't just link the frames and
don't look up values via the frames, you have to copy over part of your
frame on activation. This isn't necessarily -that- slow (although it is
an overhead), but it's slightly clumsy and uses more memory. And that's
where my problem lies, I guess... There's such a straightforward
implementation possible, by just linking up stack frames (well... they
are already linked up anyway) and traversing them. You'll have to do
some rewriting whenever you leave a context that's still needed, but you
do that anyway for the remote temps, right?
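Conceptually the lookup I have in mind is nothing more than this (a hypothetical helper, just to show the shape, not how it would actually be spelled in the VM):

    tempAt: tempIndex levelsOut: n startingFrom: aContext
        "Reach a temp that lives n static levels out by chasing the chain
         of outer contexts.  The frames are already linked, so there is no
         copying into the closure and no separate temp vector."
        | ctx |
        ctx := aContext.
        n timesRepeat: [ ctx := ctx outerContext ].
        ^ ctx tempAt: tempIndex

Writing would be the same walk followed by a tempAt:put:.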
The explanation is all on my blog
<http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/>
and in my Context Management in VisualWorks 5i
<http://www.esug.org/data/Articles/misc/oopsla99-contexts.pdf> paper.
But does a bright guy like yourself find this /really/ hard to
understand? It's not that hard a transformation, and compared to what
goes on in the JIT (e.g. in bytecode to machine-code pc mapping) it's
pretty trivial.
I guess I just like to really see what's going on by having a decent
model around. When I look at the bytecodes, in the end I can reconstruct
what they're doing... as long as they are arranged in the way that the
compiler currently generates them. But I can easily see how slight
permutations would already throw me off completely.
But probably I'm missing something?
It's me who's missing something. I did the simplest thing I knew
could possibly work re getting an efficient JIT and a Squeak with
closures (there's a huge similarity between the above scheme and the
changes I made to VisualWorks that resulted in a VM that was 2 to 3
times faster than VW 1.0, depending on the platform). But you can see a
far more efficient and simpler scheme. What is it?
Basically my scheme isn't necessarily far more efficient. It's just more
understandable, I think. I can understand scopes that point to their
outer scope, and I can follow these scopes to see how the lookup works.
And the fact that it does less pointer dereferencing and copying of data
is just something that makes me think it wouldn't be less efficient
than what you have now. My problem is not that your implementation is
slow, rather that it's complex. And I don't really see why this
complexity is needed.
Obviously, playing on my ego by telling me I should be clever enough to
understand it makes me say I do! But I feel it's not the easiest, and
probably fewer people understand this than the general model of just
linking contexts together.
best,
Eliot
cheers,
Toon