On Thu, Mar 24, 2011 at 5:44 PM, Toon Verwaest <[email protected]> wrote:

>
>  No I can't.  Since I did it, I naturally think it's a good idea.
>  Perhaps, instead of denigrating it without substantiating your claims you
> could propose (and then implement, and then get adopted) a better idea?
>
> Sure. My own VM will take a lot longer to get done! ;) I don't want to
> blemish any of your credit for building a cool VM. I was rather just
> wondering why you decided to go for this particular implementation which
> seems unobvious to me. Hence the question. I guess I should've formulated it
> slightly differently :) More info below.
>
>   I can see why it would pay off for Lisp programmers to have closures
>> that run like the Pharo closures, since it has O(1) access performance.
>> However, this performance boost only starts paying off once you have at
>> least more than 4 levels of nested closures, something which, unlike in
>> LISP, almost never happens in Pharo. Or at least shouldn't happen (if it
>> does, it's probably ok to punish the people by giving them slower
>> performance).
>>
>
>  Slower performance than what?  BTW, I think you have things backwards.  I
> modelled the pharo closure implementation on lisp closures, not the other
> way around.
>
> This is exactly what I meant. The closures seem like a very good idea for
> languages with very deeply nested closures. Lisp is such a language with all
> the macros ... I don't really see this being so in Pharo.
>

The issue has nothing to do with nesting and everything to do with executing
closures on a stack.   Since closures are non-LIFO (they can outlive their
dynamic extent) any non-local state they have should /not/ be held on the
stack.  Hence copying and indirection vectors.
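A minimal Python sketch of the indirection-vector idea (illustrative only, not the VM's actual code; the one-element list stands in for the heap-allocated vector):

```python
# Sketch: why non-LIFO closures can't keep mutable shared state in a
# stack frame. The "indirection vector" is a small heap array holding
# the closed-over slots; both the method and the closure reference it.

def make_counter():
    # Instead of living in a stack slot, 'count' lives in a
    # heap-allocated indirection vector (here a one-element list).
    indirection = [0]          # slot 0: count

    def increment():
        indirection[0] += 1    # closure reads/writes via the vector
        return indirection[0]

    # make_counter's frame can now be popped (LIFO discipline holds),
    # yet increment still sees the shared, up-to-date slot on the heap.
    return increment

counter = make_counter()
counter()   # -> 1
counter()   # -> 2
```

Because the mutable state is behind one level of indirection on the heap, the enclosing activation can be discarded from the stack without any write-back.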

  This implementation is pretty hard to understand, and it makes
>> decompilation semi-impossible unless you make very strong assumptions about
>> how the bytecodes are used. This then again reduces the reusability of the
>> new bytecodes and probably of the decompiler once people start actually
>> using the pushNewArray: bytecodes.
>>
>
>  Um, the decompiler works, and in fact works better now than it did a
> couple of years ago.  So how does your claim stand up?
>
> For example when I just use the InstructionClient I get in pushNewArray:
> and then later popIntoTemp. This combination is supposed to make clear that
> you are storing a remote array. This is not what the bytecode says however.
> And this bytecode can easily be reused for something else; what if I use the
> bytecode to make my own arrays? What if this array is created in a different
> way? I can think of a lot of ways the temparray could come to be using lots
> of variations of bytecodes, from which I would never (...) be able to figure
> out that it's actually making the tempvector. Somehow I just feel there's a
> bigger disconnect between the bytecodes and the Smalltalk code and I'm
> unsure if this isn't harmful.
>

I think you're setting up a straw man here.  Until you find an example which
is ambiguous you haven't really got a case.  First of all, pushNewArray:
consNewArray is /not/ used to implement Array new: N, quite rightly.  It is
currently only used for tuples and indirection vectors.  It is clearly
unnecessary (and for the decompiler extremely confusing) to use it for Array
new: N.  What other uses do you have in mind?

And how, other than vague stirrings, is this actually harmful?  Be sure :)
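A toy sketch of the kind of peephole recognition at issue (names and tuple encoding are hypothetical, not Squeak's actual decompiler): because the compiler only ever emits pushNewArray: immediately followed by popIntoTemp: when building an indirection vector, the pattern is unambiguous in compiler-generated code.

```python
# Toy peephole pass: recognise the compiler's fixed idiom for creating
# an indirection vector. Bytecodes are modelled as (opcode, argument).

def mark_indirection_vectors(bytecodes):
    annotated = []
    i = 0
    while i < len(bytecodes):
        op, arg = bytecodes[i]
        nxt = bytecodes[i + 1] if i + 1 < len(bytecodes) else None
        if op == 'pushNewArray:' and nxt and nxt[0] == 'popIntoTemp:':
            # The idiom: allocate a fresh array of n slots and store it
            # into a temp -> that temp holds an indirection vector.
            annotated.append(('createIndirectionVector', arg, nxt[1]))
            i += 2
        else:
            annotated.append((op, arg))
            i += 1
    return annotated

code = [('pushNewArray:', 1), ('popIntoTemp:', 0), ('pushTemp:', 0)]
mark_indirection_vectors(code)
# -> [('createIndirectionVector', 1, 0), ('pushTemp:', 0)]
```

Hand-written bytecode that reused pushNewArray: differently would defeat this, which is Toon's worry; Eliot's point is that the compiler never emits such variations.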



>
> But ok, I am working on the Opal decompiler of course. Are you building an
> IR out with your decompiler? If so I'd like to have a look since I'm
> spending the whole day already trying to get the Opal compiler to somehow do
> what I want... getting one that works and builds a reusable IR would be
> useful. (I'm implementing your field-index-updating through bytecode
> transformation btw).
>

I just modified the base Squeak decompiler.  The IR therein is a stack of
stacks of parse nodes.  It's a bit hairy but works well enough.
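A toy illustration of such a "stack of stacks of parse nodes" (my own sketch of the general shape, not the Squeak code): each open block/method scope gets its own stack of partially built nodes, and closing a scope folds its stack into the enclosing one.

```python
# Each open scope has its own stack of parse nodes; leaving a scope
# pops that stack and folds the result into the parent scope.

class Decomp:
    def __init__(self):
        self.scopes = [[]]            # outer list: one stack per open scope

    def push_node(self, node):
        self.scopes[-1].append(node)  # build within the current scope

    def begin_block(self):
        self.scopes.append([])        # entering a closure opens a scope

    def end_block(self):
        body = self.scopes.pop()      # leaving folds it into the parent
        self.push_node(('block', body))

d = Decomp()
d.push_node(('literal', 1))
d.begin_block()
d.push_node(('literal', 2))
d.end_block()
d.scopes
# -> [[('literal', 1), ('block', [('literal', 2)])]]
```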

  You might save a teeny tiny bit of memory by having stuff garbage
>> collected when it's not needed anymore ... but I doubt that the whole design
>> is based on that? Especially since it just penalizes the performance in
>> almost all possible ways for standard methods. And it even wastes memory in
>> general cases. I don't get it.
>>
>
>  What has garbage collection got to do with anything?  What precisely are
> you talking about?  Indirection vectors?  To understand the rationale for
> indirection vectors you have to understand the rationale for implementing
> closures on a conventional machine stack.  For lisp that's clear; compile to
> a conventional stack as that's an easy model, in which case one has to store
> values that outlive LIFO discipline on the heap, hence indirection vectors.
>  Why you might want to do that in a Smalltalk implementation when you could
> just access the outer context directly has a lot to do with VM internals.
>  Basically it's the same argument.  If one can map Smalltalk execution to a
> conventional stack organization then the JIT can produce a more efficient
> execution engine. Not doing this causes significant problems in context
> management.
>
> With the garbage collection I meant the fact that you can already collect
> part of the stack frames and leave other parts (the remote temps) and only
> get them GCd later on when possible.
>

That's not the real issue one is solving.  The real issue one is solving is
not having to write-back stack state to contexts.  Read the f-ing paper ;)


>
> I do understand why you want to keep them on the stack as long as possible.
> The stack-frame marriage stuff for optimizations is very neat indeed. What
> I'm more worried about myself is the fact that stackframes aren't just
> linked to each other and share memory that way. This means that you only
> have 1 indirection to access the method-frame (via the homeContext), and 1
> for the outer context. You can directly access yourself. So only the 4th
> context will have 2 indirections (what all contexts have now for remotes).
> From the 5th on it gets worse... but I can't really see this happening in
> real world situations.
>

The only place extra indirections are introduced is in the static chain for
nesting, i.e. non-local return being more expensive.  Variable access is
O(1).  Either you pay at variable dereference time (by walking the static
chain, a bad idea, since closed-over variables are read at least as many
times as they're written, so reduce the read time, not the close-over time)
or at closure-creation time (copying).  Yes, creating them isn't cheap, but
it's just an allocation (and that the allocator is slow isn't the closures'
fault, and is remediable), and an adaptive optimizer would eliminate them
altogether and to an extent make this entire conversation moot.
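A sketch of the tradeoff (a hypothetical cost model, not VM code): with a static chain, every read of an outer variable walks enclosing scopes, so cost grows with nesting depth; with copying, the closure pays once at creation and every read is O(1).

```python
# Static-chain strategy: each read walks parent links until found.
def chain_read(scope, name):
    hops = 0
    while name not in scope['vars']:
        scope = scope['parent']
        hops += 1
    return scope['vars'][name], hops

# Three nested scopes; 'x' is defined two levels out.
inner = {'vars': {}, 'parent':
         {'vars': {}, 'parent':
          {'vars': {'x': 42}, 'parent': None}}}
chain_read(inner, 'x')        # -> (42, 2): two hops on EVERY read

# Copying strategy: pay once at closure creation, then O(1) reads.
copied = {'x': 42}            # captured value copied into the closure
copied['x']                   # -> 42, zero hops, regardless of depth
```

Since closed-over variables are typically read at least as often as they are written, paying once at creation time optimizes the common case.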

> Then you have the problem that since you don't just link the frames and
> don't look up values via the frames, you have to copy over part of your
> frame for activation. This isn't necessarily -that- slow (although it is an
> overhead); but it's slightly clumsy and uses more memory. And that's where
> my problem lies I guess ... There's such a straightforward implementation
> possible, by just linking up stackframes (well... they are already linked up
> anyway), and traversing them. You'll have to do some rewriting whenever you
> leave a context that's still needed, but you do that anyway for the remote
> temps right?
>

Read the f-ing paper ;)


>
>  The explanation is all on my blog
> <http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/>
> and in my Context Management in VisualWorks 5i paper
> <http://www.esug.org/data/Articles/misc/oopsla99-contexts.pdf>.
>
>  But does a bright guy like yourself find this /really/ hard to
> understand?  It's not that hard a transformation, and compared to what goes
> on in the JIT (e.g. in bytecode to machine-code pc mapping) it's pretty
> trivial.
>
> I guess I just like to really see what's going on by having a decent model
> around. When I look at the bytecodes; in the end I can reconstruct what it's
> doing ... as long as they are aligned in the way that the compiler currently
> generates them. But I can easily see how slight permutations would already
> throw me off completely.
>
>
>
>> But probably I'm missing something?
>>
>
>  It's me who's missing something.  I did the simplest thing I knew could
> possibly work re getting an efficient JIT and a Squeak with closures
> (there's huge similarity between the above scheme and the changes I made to
> VisualWorks that resulted in a VM that was 2 to 3 times faster depending on
> platform than VW 1.0).  But you can see a far more efficient and simple
> scheme.  What is it?
>
> Basically my scheme isn't necessarily far more efficient. It's just more
> understandable I think. I can understand scopes that point to their outer
> scope; and I can follow these scopes to see how the lookup works. And the
> fact that it does some pointer dereferencing and copying of data less is
> just something that makes me think it wouldn't be less efficient than what
> you have now. My problem is not that your implementation is slow, rather
> that it's complex. And I don't really see why this complexity is needed.
>
> Obviously playing on my ego by telling me I should be clever enough to
> understand it makes me say I do! But I feel it's not the easiest; and
> probably fewer people understand this than the general model of just linking
> contexts together.
>

Read the f-ing paper ;)


>
>  best,
> Eliot
>
> cheers,
> Toon
>

best
Eliot
