Re: [perl #44395] Lexical scopes are too coarse-grained for loops.
From: Matt Diephouse via RT [EMAIL PROTECTED] Date: Sat, 04 Aug 2007 00:26:05 -0700 . . . Again, I believe this is what Bob was saying: it's not possible to be faithful to the original p5 source without creating a separate subroutine for every loop body. Exactly. And there are reasons why that's a bad idea (even if that's what they are in Perl 6): - It introduces extra overhead in terms of the compiled code - It's extra work for the compiler ? introducing an extra pass for lexical analysis I am discovering that this is especially cumbersome for languages that have goto. FWIW, I believe that in Lisp they are restricted in such a way that it is possible (1) to identify the scopes that might need to be broken out into separate subs, and (2) to optimize them away in most of the cases where this is possible without needing to do too much control flow analysis. It still requires an extra pass and more runtime overhead, though. But languages with less restricted goto are hairier still. Consider the test-closures-4.pl script (first attachment) which rewrites the loop to use an explicit goto, and adds a few extra lexical variables and control flow quirks. Ostensibly, one would translate this to PIR by splitting the inner block at each label, and implementing the block fragments as subs (attached as test-closures-4.pir). Because the scopes are nested, the blocks must be nested as well, i.e. the sub for push_it must reference the sub for before_c as its :outer sub, and similarly the sub for the start of the block (let's call it block_start) must have before_c as its :outer. First off, note that I have finessed the implementation of goto again by using normal sub return, and moving this goto into the outermost sub. To handle situations where we are jumping to a lexically earlier label more generally, the compiler could create a continuation in the outer sub and stuff it in a new lexical variable so that the inner sub can call it if needed. The fully general technique would be to split the again label body out into a sub and just call it. Note that tailcalling here makes little difference, because all of the various contexts are kept alive by the closures we create, and are not destroyed until Parrot exits. But the goto push_it if $b == 4; part is even more problematic, as it skips the before_c block fragment entirely, and goes straight from block_start to push_it. To my surprise, it is possible to create a closure of push_it from block_start and then call it, even though block_start is not push_it's :outer sub (isn't this a bug?). But of course the resulting closure lacks $c; it quite properly says Lexical '$c' not found when called. My reading of perlsub is that $c ought to be in scope, because we're still in the same block as the my, but the value should be undef because we jumped over the initialization code. Interestingly, this is not what Perl 5 does; it reports that $c is bound to 9, the same value as for the *next* closure, which seems very odd. Regardless, a better PIR solution would be to call an alternative skip_before_c sub that creates the necessary lexical environment before calling push_it. But after a day of goto madness, I don't have the stomach for it. In sum, this all seems pretty ugly, and unworthy of Parrot. To compile an HLL goto using explicit pad manipulation, by contrast, the compiler just has to emit code to massage the pad state so that it conforms to what the destination requires, and then emit a PIR goto. The alternative (which Bob is suggesting) is to reintroduce push_pad/pop_pad so you can create new lexical scopes without using subroutines for loop bodies. His suggestion seems reasonable to me. We won't run into this particular problem with Tcl, but I think most languages will. So we've now heard from three fans of explicit pad manipulation, but nothing from The Powers That Be. Allison? Chip? -- Bob ## Return n closures, each with lexical references to I and N'. .sub make_closures_loop .param pmc n .lex $n, n .lex $i, $P42 .local pmc result, counter .lex $counter, counter .lex @result, result result = new .FixedPMCArray $I1 = n result = $I1 counter = new .Integer counter = 0 .local pmc block_start, bs .const .Sub bs = 'block_start' newclosure block_start, bs again: $I0 = counter if $I0 $I1 goto do_block .return (result) do_block: block_start() goto again .end .sub block_start :outer ('make_closures_loop') .lex $i, $P42 find_lex $P43, $n .local pmc counter, result find_lex counter, $counter find_lex result, @result ## bind $i $I42 = counter inc counter $P42 = new .Integer $P42 = $I42 inc $P42 ##
Re: [perl #44395] Lexical scopes are too coarse-grained for loops.
On 8/3/07, Patrick R. Michaud [EMAIL PROTECTED] wrote: On Fri, Aug 03, 2007 at 03:37:39PM -0700, Bob Rogers wrote: A naive PIR implementation of this loop is included as the second attachment. It fails miserably, because PIR can't distinguish between the different scopes for $i and $n. sub make_closures_loop { # Return n closures, each with lexical references to $i and $n. my $n = shift; my @result; for (1..$n) { my $i = $_; push(@result, sub { print Called sub $i out of $n.\n; }); } return @result; } [...] ## Return n closures, each with lexical references to I and N'. .sub make_closures_loop .param pmc n .lex $n, n .lex $i, $P42 [...] .sub internal_make_closures_loop_0 :outer('make_closures_loop') find_lex $P42, $i find_lex $P43, $n [...] This seems wrong to me -- shouldn't $i be scoped to the internal loop instead of the outer one? In other words, I think the PIR code should read: .sub make_closures_loop .param pmc n .lex $n, n ... and then .sub internal_make_closures_loop_0 :outer('make_closures_loop') .lex $i, $P42 find_lex $P43, $n ... No, this is different. Your code puts the scope of $i inside the anonymous sub. In other words, it's this: for (1..$n) { push(@result, sub { my $i = ...; print Called sub $i out of $n.\n; }); } Except there's way to initialize $i in this case. (Initializing it from $_ inside the anonymous sub just moves the problem to a different variable.) Otherwise, the reason that PIR isn't able to distinguish the scopes is because in PIR you're actually defining them to be in the same scope. I believe that's exactly Bob's point. However, I must confess that I still haven't grokked all of the ramifications of lexicals and closures in PIR yet -- and I may even be interpreting the p5 translation incorrectly. It just looks to me as though the naive translation isn't being faithful to the original p5 source, and so perhaps that's causing the problem you're seeing. Again, I believe this is what Bob was saying: it's not possible to be faithful to the original p5 source without creating a separate subroutine for every loop body. And there are reasons why that's a bad idea (even if that's what they are in Perl 6): - It introduces extra overhead in terms of the compiled code - It's extra work for the compiler – introducing an extra pass for lexical analysis The alternative (which Bob is suggesting) is to reintroduce push_pad/pop_pad so you can create new lexical scopes without using subroutines for loop bodies. His suggestion seems reasonable to me. We won't run into this particular problem with Tcl, but I think most languages will. -- Matt Diephouse http://matt.diephouse.com
[perl #44395] Lexical scopes are too coarse-grained for loops.
# New Ticket Created by Bob Rogers # Please include the string: [perl #44395] # in the subject line of all future correspondence about this issue. # URL: http://rt.perl.org/rt3/Ticket/Display.html?id=44395 The problem arises when blocks with closed-over lexical bindings are executed more than once, as can happen inside loops. Here is an example in Perl 5: sub make_closures_loop { # Return $n closures, each with lexical references to $i and $n. my $n = shift; my @result; for (1..$n) { my $i = $_; push(@result, sub { print Called sub $i out of $n.\n; }); } return @result; } Here there is only one $n binding, but a new and independent $i binding is created every time through the loop, as running the first attachment clearly shows. It makes no difference if you say for my $i ... or replace the loop with map instead. A naive PIR implementation of this loop is included as the second attachment. It fails miserably, because PIR can't distinguish between the different scopes for $i and $n. Currently, the only way to get a distinct binding for each $i in PIR is to factor the loop body out into a separate lexical sub. This is far from ideal, not least because it is not transparent to the HLL user. It is also painful for compilers to optimize away: In order to decide whether the extra sub is really needed, the compiler must find all the closed-over variables first, which (for me at least) requires breaking lexical analysis into two passes. Parrot should do better, IMHO. The easiest way, it seems, would be to resurrect the push_pad and pop_pad instructions in some form. After the lexical analysis, the compiler can simply use the same is this closed-over binding inside a loop? analysis to decide whether or not it needs to emit push_pad/pop_pad at the appropriate places. It then becomes purely a question of merging lexical scopes during optimization, and doesn't affect semantic correctness. FWIW, I do remember waaay back when Leo got rid of the explicit pad manipulation instructions (r10240 on 2005-11-29 and thereabouts). At the time, it went right over my head that I might ever need these (my compiler didn't even support closures until three weeks later), for which I am now very sorry. I am hoping that others will have a similar reaction, and that push_pad etc. weren't flushed due to some fundamental problem. For the record, continuations are also broken in this respect; not surprisingly, they'll take you back to the right context, but with whatever value of $i was stored last. So whatever solution is adopted, we need to make sure that continuations capture the detailed lexical state. Any objection if I start working on this? TIA, -- Bob Rogers http://rgrjr.dyndns.org/ #! /usr/bin/perl -w use strict; use warnings; sub make_closures_loop { # Return n closures, each with lexical references to $i and $n. my $n = shift; my @result; for (1..$n) { my $i = $_; push(@result, sub { print Called sub $i out of $n.\n; }); } return @result; } # Make three closures and call them in turn. for my $c (make_closures_loop(3)) { $c-(); } ## Return n closures, each with lexical references to I and N'. .sub make_closures_loop .param pmc n .lex $n, n .lex $i, $P42 .local pmc result result = new .FixedPMCArray $I1 = n result = $I1 .const .Sub $P53 = 'internal_make_closures_loop_0' $I0 = 0 next: if $I0 = $I1 goto done $P42 = new .Integer $P42 = $I0 inc $P42 newclosure $P52, $P53 result[$I0] = $P52 inc $I0 goto next done: .return (result) .end .sub internal_make_closures_loop_0 :outer('make_closures_loop') find_lex $P42, $i find_lex $P43, $n print Called sub print $P42 print out of print $P43 print .\n .end ## Make three closures and call them in turn. .sub test_closures :main .local pmc closures $I1 = 3 closures = make_closures_loop($I1) $I0 = 0 next: if $I0 = $I1 goto done $P52 = closures[$I0] $P52() inc $I0 goto next done: .end
Re: [perl #44395] Lexical scopes are too coarse-grained for loops.
On Fri, Aug 03, 2007 at 03:37:39PM -0700, Bob Rogers wrote: A naive PIR implementation of this loop is included as the second attachment. It fails miserably, because PIR can't distinguish between the different scopes for $i and $n. sub make_closures_loop { # Return n closures, each with lexical references to $i and $n. my $n = shift; my @result; for (1..$n) { my $i = $_; push(@result, sub { print Called sub $i out of $n.\n; }); } return @result; } [...] ## Return n closures, each with lexical references to I and N'. .sub make_closures_loop .param pmc n .lex $n, n .lex $i, $P42 [...] .sub internal_make_closures_loop_0 :outer('make_closures_loop') find_lex $P42, $i find_lex $P43, $n [...] This seems wrong to me -- shouldn't $i be scoped to the internal loop instead of the outer one? In other words, I think the PIR code should read: .sub make_closures_loop .param pmc n .lex $n, n ... and then .sub internal_make_closures_loop_0 :outer('make_closures_loop') .lex $i, $P42 find_lex $P43, $n ... Otherwise, the reason that PIR isn't able to distinguish the scopes is because in PIR you're actually defining them to be in the same scope. However, I must confess that I still haven't grokked all of the ramifications of lexicals and closures in PIR yet -- and I may even be interpreting the p5 translation incorrectly. It just looks to me as though the naive translation isn't being faithful to the original p5 source, and so perhaps that's causing the problem you're seeing. Hope this helps, Pm
Re: [perl #44395] Lexical scopes are too coarse-grained for loops
Perhaps ignore my earlier message -- this one is more coherent. On Fri, Aug 03, 2007 at 03:37:39PM -0700, Bob Rogers wrote: sub make_closures_loop { # Return $n closures, each with lexical references to $i and $n. my $n = shift; my @result; for (1..$n) { my $i = $_; push(@result, sub { print Called sub $i out of $n.\n; }); } return @result; } Currently, the only way to get a distinct binding for each $i in PIR is to factor the loop body out into a separate lexical sub. This is far from ideal, not least because it is not transparent to the HLL user. Factoring the loop body out into a separate lexical sub is _exactly_ how Chip described to me that the above needed to be done. Or, phrased differently, every lexical scope ends up requiring its own parrot sub (with appropriate :outer pragmas), and the compiler can optimize out such subs when it determines that there aren't any new lexicals being declared within the loop body (which somewhat comes down to keeping track of whether the loop body contains any my declarations). I'm not sure why this separate lexical sub has to be visible to the HLL user -- the compiler ought to be able to make it appear as though the sub isn't present. (But I also bet that we can come up with an example that makes it really hard to do that.) Parrot should do better, IMHO. The easiest way, it seems, would be to resurrect the push_pad and pop_pad instructions in some form. [...] I'd like for Parrot to be able to do better, yes, but I'm not yet enough of an expert on closure handling to be able to refute Chip's reasons for designing things this way. I also found the push_pad and pop_pad model somewhat easier to grasp conceptually, but I recall Chip was fairly certain that they wouldn't handle things in the generic case. FWIW, the perl6 compiler currently treats _every_ block as a new lexical scope, and generates a separate Parrot sub for each. At some point we will add an optimization so that blocks can be inlined when the compiler determines that it's safe to do so (based in part on the absence of any lexical declarations within the block). Hope this helps somewhat more than my previous message... :-) Pm
Re: [perl #44395] Lexical scopes are too coarse-grained for loops
On Fri, Aug 03, 2007 at 09:51:53PM -0500, Patrick R. Michaud wrote: Currently, the only way to get a distinct binding for each $i in PIR is to factor the loop body out into a separate lexical sub. This is far from ideal, not least because it is not transparent to the HLL user. Factoring the loop body out into a separate lexical sub is _exactly_ how Chip described to me that the above needed to be done. Just for completeness, attached is a PIR version that I think is slightly more faithful to the original test-closures.pl program. It appears to work: $ ./parrot test-closures-3.pir Called sub 1 out of 3. Called sub 2 out of 3. Called sub 3 out of 3. $ Pm ## Return n closures, each with lexical references to I and N'. .sub make_closures_loop .param pmc n .lex $n, n .lex @result, $P0 $P0 = new 'ResizablePMCArray' $I0 = 1 $I1 = n next: if $I0 $I1 goto done 'internal_loop_body'($I0) inc $I0 goto next done: .return ($P0) .end .sub 'internal_loop_body' :outer('make_closures_loop') .param pmc topic .lex '$_', topic $P0 = clone topic .lex '$i', $P0 .const .Sub $P1 = 'internal_sub' newclosure $P2, $P1 find_lex $P3, '@result' push $P3, $P2 .return ($P3) .end .sub 'internal_sub' :outer('internal_loop_body') find_lex $P0, '$i' find_lex $P1, '$n' print Called sub print $P0 print out of print $P1 print .\n .end ## Make three closures and call them in turn. .sub test_closures :main .local pmc closures $I1 = 3 closures = make_closures_loop($I1) $I0 = 0 next: if $I0 = $I1 goto done $P52 = closures[$I0] $P52() inc $I0 goto next done: .end
Re: [perl #44395] Lexical scopes are too coarse-grained for loops
From: Patrick R. Michaud [EMAIL PROTECTED] Date: Fri, 3 Aug 2007 21:51:53 -0500 Perhaps ignore my earlier message -- this one is more coherent. I haven't gotten that one yet. (Must have been via RT. Now if I could only remember to send all my incoherent posts via RT. ;-) On Fri, Aug 03, 2007 at 03:37:39PM -0700, Bob Rogers wrote: sub make_closures_loop { # Return $n closures, each with lexical references to $i and $n. my $n = shift; my @result; for (1..$n) { my $i = $_; push(@result, sub { print Called sub $i out of $n.\n; }); } return @result; } Currently, the only way to get a distinct binding for each $i in PIR is to factor the loop body out into a separate lexical sub. This is far from ideal, not least because it is not transparent to the HLL user. Factoring the loop body out into a separate lexical sub is _exactly_ how Chip described to me that the above needed to be done. Or, phrased differently, every lexical scope ends up requiring its own parrot sub . . . Bummer. I'm not sure why this separate lexical sub has to be visible to the HLL user -- the compiler ought to be able to make it appear as though the sub isn't present. (But I also bet that we can come up with an example that makes it really hard to do that.) It will show up in backtraces, if nothing else. This may complicate the user's debugging session slightly, because what the user assumed would be one call frame is actually two, and inner functions may not have the expected compiler-assigned names, but that's hardly a show-stopper. Parrot should do better, IMHO. The easiest way, it seems, would be to resurrect the push_pad and pop_pad instructions in some form. [...] I'd like for Parrot to be able to do better, yes, but I'm not yet enough of an expert on closure handling to be able to refute Chip's reasons for designing things this way. The current model has the advantage of being more declarative, which is good; things that can be nailed down at compile time should be. But the same thing could be done for multiple loop scopes as well. Consider the following alternative: ## Return n closures, each with lexical references to I and N'. .sub make_closures_loop .param pmc n .lex $n, n .local pmc result result = new .FixedPMCArray $I1 = n result = $I1 .const .Sub $P53 = 'internal_make_closures_loop_0' $I0 = 0 next: if $I0 = $I1 goto done .push_scope .lex $i, $P42 $P42 = new .Integer $P42 = $I0 inc $P42 newclosure $P52, $P53 result[$I0] = $P52 inc $I0 .pop_scope goto next done: .return (result) .end Every .lex belongs to the immediately enclosing .push_scope/.pop_scope group (or the sub top-level scope). .push_scope needs to expand into push_pad, and .pop_scope needs to do a pop_pad, but the pseudo-ops serve to delimit scopes so that the PIR compiler can build a LexInfo object for each scope. (But the inner .lex ops may require an explicit store_lex, since they are not being stored in the Parrot_Context. (Which is a good thing.)) I also found the push_pad and pop_pad model somewhat easier to grasp conceptually, but I recall Chip was fairly certain that they wouldn't handle things in the generic case. They would certainly permit more to be handled than is handled now. And re-adding them should be monotonic in terms of Parrot functionality. Which means that Chip's statement (IYRC) seems to make no sense. ??? FWIW, the perl6 compiler currently treats _every_ block as a new lexical scope, and generates a separate Parrot sub for each . . . Hmm. The lexical scope == Parrot sub worldview seems backwards to me. But probably I'm just grumbling because I've realized I'm going to have to rewrite a fair-sized chunk of my compiler to cope with this. Oh, well; so be it. From: Patrick R. Michaud [EMAIL PROTECTED] Date: Fri, 3 Aug 2007 22:13:25 -0500 Just for completeness, attached is a PIR version that I think is slightly more faithful to the original test-closures.pl program. It appears to work: $ ./parrot test-closures-3.pir Called sub 1 out of 3. Called sub 2 out of 3. Called sub 3 out of 3. $ Pm Makes sense; looks like what map would compile into, if the block were not inlined. Thanks for explaining the situation. -- Bob