Re: [perl #44395] Lexical scopes are too coarse-grained for loops.

2007-08-05 Thread Bob Rogers
   From: Matt Diephouse via RT [EMAIL PROTECTED]
   Date: Sat, 04 Aug 2007 00:26:05 -0700

   . . .

   Again, I believe this is what Bob was saying: it's not possible to be
   faithful to the original p5 source without creating a separate
   subroutine for every loop body. 

Exactly.

   And there are reasons why that's a bad idea (even if that's what they
   are in Perl 6):

   - It introduces extra overhead in terms of the compiled code
   - It's extra work for the compiler ? introducing an extra pass
 for lexical analysis

I am discovering that this is especially cumbersome for languages that
have goto.  FWIW, I believe that in Lisp they are restricted in such a
way that it is possible (1) to identify the scopes that might need to be
broken out into separate subs, and (2) to optimize them away in most of
the cases where this is possible without needing to do too much control
flow analysis.  It still requires an extra pass and more runtime
overhead, though.

   But languages with less restricted goto are hairier still.
Consider the test-closures-4.pl script (first attachment) which rewrites
the loop to use an explicit goto, and adds a few extra lexical
variables and control flow quirks.  Ostensibly, one would translate this
to PIR by splitting the inner block at each label, and implementing the
block fragments as subs (attached as test-closures-4.pir).  Because the
scopes are nested, the blocks must be nested as well, i.e. the sub for
push_it must reference the sub for before_c as its :outer sub, and
similarly the sub for the start of the block (let's call it
block_start) must have before_c as its :outer.

   First off, note that I have finessed the implementation of goto
again by using normal sub return, and moving this goto into the
outermost sub.  To handle situations where we are jumping to a lexically
earlier label more generally, the compiler could create a continuation
in the outer sub and stuff it in a new lexical variable so that the
inner sub can call it if needed.  The fully general technique would be
to split the again label body out into a sub and just call it.  Note
that tailcalling here makes little difference, because all of the
various contexts are kept alive by the closures we create, and are not
destroyed until Parrot exits.

   But the goto push_it if $b == 4; part is even more problematic, as
it skips the before_c block fragment entirely, and goes straight from
block_start to push_it.  To my surprise, it is possible to create a
closure of push_it from block_start and then call it, even though
block_start is not push_it's :outer sub (isn't this a bug?).  But of
course the resulting closure lacks $c; it quite properly says Lexical
'$c' not found when called.  My reading of perlsub is that $c ought to
be in scope, because we're still in the same block as the my, but the
value should be undef because we jumped over the initialization code.
Interestingly, this is not what Perl 5 does; it reports that $c is bound
to 9, the same value as for the *next* closure, which seems very odd.
Regardless, a better PIR solution would be to call an alternative
skip_before_c sub that creates the necessary lexical environment before
calling push_it.  But after a day of goto madness, I don't have the
stomach for it.

   In sum, this all seems pretty ugly, and unworthy of Parrot.  To
compile an HLL goto using explicit pad manipulation, by contrast, the
compiler just has to emit code to massage the pad state so that it
conforms to what the destination requires, and then emit a PIR goto.

   The alternative (which Bob is suggesting) is to reintroduce
   push_pad/pop_pad so you can create new lexical scopes without using
   subroutines for loop bodies.

   His suggestion seems reasonable to me. We won't run into this
   particular problem with Tcl, but I think most languages will.

So we've now heard from three fans of explicit pad manipulation, but
nothing from The Powers That Be.  Allison?  Chip?

-- Bob


## Return n closures, each with lexical references to I and N'.
.sub make_closures_loop
.param pmc n
.lex $n, n
.lex $i, $P42
.local pmc result, counter
.lex $counter, counter
.lex @result, result
result = new .FixedPMCArray
$I1 = n
result = $I1
counter = new .Integer
counter = 0
.local pmc block_start, bs
.const .Sub bs = 'block_start'
newclosure block_start, bs

again:
$I0 = counter
if $I0  $I1 goto do_block
.return (result)
do_block:
block_start()
goto again
.end

.sub block_start :outer ('make_closures_loop')
.lex $i, $P42
find_lex $P43, $n
.local pmc counter, result
find_lex counter, $counter
find_lex result, @result
## bind $i
$I42 = counter
inc counter
$P42 = new .Integer
$P42 = $I42
inc $P42
## 

Re: [perl #44395] Lexical scopes are too coarse-grained for loops.

2007-08-04 Thread Matt Diephouse
On 8/3/07, Patrick R. Michaud [EMAIL PROTECTED] wrote:
 On Fri, Aug 03, 2007 at 03:37:39PM -0700, Bob Rogers wrote:
 A naive PIR implementation of this loop is included as the second
  attachment.  It fails miserably, because PIR can't distinguish between
  the different scopes for $i and $n.
 
  sub make_closures_loop {
  # Return n closures, each with lexical references to $i and $n.
  my $n = shift;
 
  my @result;
  for (1..$n) {
my $i = $_;
push(@result, sub { print Called sub $i out of $n.\n; });
  }
  return @result;
  }
 
  [...]
 
  ## Return n closures, each with lexical references to I and N'.
  .sub make_closures_loop
.param pmc n
.lex $n, n
.lex $i, $P42
  [...]
 
  .sub internal_make_closures_loop_0 :outer('make_closures_loop')
find_lex $P42, $i
find_lex $P43, $n
  [...]

 This seems wrong to me -- shouldn't $i be scoped to the internal
 loop instead of the outer one?  In other words, I think the PIR
 code should read:

 .sub make_closures_loop
 .param pmc n
 .lex $n, n
 ...

 and then

 .sub internal_make_closures_loop_0 :outer('make_closures_loop')
 .lex $i, $P42
 find_lex $P43, $n
 ...

No, this is different. Your code puts the scope of $i inside the
anonymous sub. In other words, it's this:

for (1..$n) {
push(@result, sub { my $i = ...; print Called sub $i out of $n.\n; });
}

Except there's way to initialize $i in this case. (Initializing it
from $_ inside the anonymous sub just moves the problem to a different
variable.)

 Otherwise, the reason that PIR isn't able to distinguish the
 scopes is because in PIR you're actually defining them to be
 in the same scope.

I believe that's exactly Bob's point.

 However, I must confess that I still haven't grokked all of the
 ramifications of lexicals and closures in PIR yet -- and I may
 even be interpreting the p5 translation incorrectly.  It just
 looks to me as though the naive translation isn't being faithful
 to the original p5 source, and so perhaps that's causing the
 problem you're seeing.

Again, I believe this is what Bob was saying: it's not possible to be
faithful to the original p5 source without creating a separate
subroutine for every loop body. And there are reasons why that's a bad
idea (even if that's what they are in Perl 6):

- It introduces extra overhead in terms of the compiled code
- It's extra work for the compiler – introducing an extra pass for
lexical analysis

The alternative (which Bob is suggesting) is to reintroduce
push_pad/pop_pad so you can create new lexical scopes without using
subroutines for loop bodies.

His suggestion seems reasonable to me. We won't run into this
particular problem with Tcl, but I think most languages will.

-- 
Matt Diephouse
http://matt.diephouse.com


[perl #44395] Lexical scopes are too coarse-grained for loops.

2007-08-03 Thread via RT
# New Ticket Created by  Bob Rogers 
# Please include the string:  [perl #44395]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org/rt3/Ticket/Display.html?id=44395 


The problem arises when blocks with closed-over lexical bindings are
executed more than once, as can happen inside loops.  Here is an example
in Perl 5:

sub make_closures_loop {
# Return $n closures, each with lexical references to $i and $n.
my $n = shift;

my @result;
for (1..$n) {
my $i = $_;
push(@result, sub { print Called sub $i out of $n.\n; });
}
return @result;
}

Here there is only one $n binding, but a new and independent $i binding
is created every time through the loop, as running the first attachment
clearly shows.  It makes no difference if you say for my $i ... or
replace the loop with map instead.

   A naive PIR implementation of this loop is included as the second
attachment.  It fails miserably, because PIR can't distinguish between
the different scopes for $i and $n.

   Currently, the only way to get a distinct binding for each $i in
PIR is to factor the loop body out into a separate lexical sub.  This is
far from ideal, not least because it is not transparent to the HLL user.
It is also painful for compilers to optimize away:  In order to decide
whether the extra sub is really needed, the compiler must find all the
closed-over variables first, which (for me at least) requires breaking
lexical analysis into two passes.

   Parrot should do better, IMHO.  The easiest way, it seems, would be
to resurrect the push_pad and pop_pad instructions in some form.  After
the lexical analysis, the compiler can simply use the same is this
closed-over binding inside a loop? analysis to decide whether or not it
needs to emit push_pad/pop_pad at the appropriate places.  It then
becomes purely a question of merging lexical scopes during optimization,
and doesn't affect semantic correctness.

   FWIW, I do remember waaay back when Leo got rid of the explicit pad
manipulation instructions (r10240 on 2005-11-29 and thereabouts).  At
the time, it went right over my head that I might ever need these (my
compiler didn't even support closures until three weeks later), for
which I am now very sorry.  I am hoping that others will have a similar
reaction, and that push_pad etc. weren't flushed due to some fundamental
problem.

   For the record, continuations are also broken in this respect; not
surprisingly, they'll take you back to the right context, but with
whatever value of $i was stored last.  So whatever solution is
adopted, we need to make sure that continuations capture the detailed
lexical state.

   Any objection if I start working on this?

   TIA,

-- Bob Rogers
   http://rgrjr.dyndns.org/

#! /usr/bin/perl -w

use strict;
use warnings;

sub make_closures_loop {
# Return n closures, each with lexical references to $i and $n.
my $n = shift;

my @result;
for (1..$n) {
my $i = $_;
push(@result, sub { print Called sub $i out of $n.\n; });
}
return @result;
}

# Make three closures and call them in turn.
for my $c (make_closures_loop(3)) {
$c-();
}

## Return n closures, each with lexical references to I and N'.
.sub make_closures_loop
.param pmc n
.lex $n, n
.lex $i, $P42
.local pmc result
result = new .FixedPMCArray
$I1 = n
result = $I1
.const .Sub $P53 = 'internal_make_closures_loop_0'
$I0 = 0
next:
if $I0 = $I1 goto done
$P42 = new .Integer
$P42 = $I0
inc $P42
newclosure $P52, $P53
result[$I0] = $P52
inc $I0
goto next
done:
.return (result)
.end

.sub internal_make_closures_loop_0 :outer('make_closures_loop')
find_lex $P42, $i
find_lex $P43, $n
print Called sub 
print $P42
print  out of 
print $P43
print .\n
.end

## Make three closures and call them in turn.
.sub test_closures :main
.local pmc closures
$I1 = 3
closures = make_closures_loop($I1)
$I0 = 0
next:
if $I0 = $I1 goto done
$P52 = closures[$I0]
$P52()
inc $I0
goto next
done:
.end


Re: [perl #44395] Lexical scopes are too coarse-grained for loops.

2007-08-03 Thread Patrick R. Michaud
On Fri, Aug 03, 2007 at 03:37:39PM -0700, Bob Rogers wrote:
A naive PIR implementation of this loop is included as the second
 attachment.  It fails miserably, because PIR can't distinguish between
 the different scopes for $i and $n.
 
 sub make_closures_loop {
 # Return n closures, each with lexical references to $i and $n.
 my $n = shift;
 
 my @result;
 for (1..$n) {
   my $i = $_;
   push(@result, sub { print Called sub $i out of $n.\n; });
 }
 return @result;
 }

 [...]

 ## Return n closures, each with lexical references to I and N'.
 .sub make_closures_loop
   .param pmc n
   .lex $n, n
   .lex $i, $P42
 [...]
 
 .sub internal_make_closures_loop_0 :outer('make_closures_loop')
   find_lex $P42, $i
   find_lex $P43, $n
 [...]

This seems wrong to me -- shouldn't $i be scoped to the internal
loop instead of the outer one?  In other words, I think the PIR
code should read:

.sub make_closures_loop
.param pmc n
.lex $n, n
...

and then

.sub internal_make_closures_loop_0 :outer('make_closures_loop')
.lex $i, $P42
find_lex $P43, $n
...

Otherwise, the reason that PIR isn't able to distinguish the
scopes is because in PIR you're actually defining them to be
in the same scope.

However, I must confess that I still haven't grokked all of the
ramifications of lexicals and closures in PIR yet -- and I may
even be interpreting the p5 translation incorrectly.  It just
looks to me as though the naive translation isn't being faithful
to the original p5 source, and so perhaps that's causing the
problem you're seeing.

Hope this helps,

Pm


Re: [perl #44395] Lexical scopes are too coarse-grained for loops

2007-08-03 Thread Patrick R. Michaud
Perhaps ignore my earlier message -- this one is more coherent.

On Fri, Aug 03, 2007 at 03:37:39PM -0700, Bob Rogers wrote:
   sub make_closures_loop {
   # Return $n closures, each with lexical references to $i and $n.
   my $n = shift;
 
   my @result;
   for (1..$n) {
   my $i = $_;
   push(@result, sub { print Called sub $i out of $n.\n; });
   }
   return @result;
   }
 
Currently, the only way to get a distinct binding for each $i in
 PIR is to factor the loop body out into a separate lexical sub.  This is
 far from ideal, not least because it is not transparent to the HLL user.

Factoring the loop body out into a separate lexical sub is _exactly_
how Chip described to me that the above needed to be done.
Or, phrased differently, every lexical scope ends up requiring
its own parrot sub (with appropriate :outer pragmas), and the
compiler can optimize out such subs when it determines that there
aren't any new lexicals being declared within the loop body
(which somewhat comes down to keeping track of whether the loop
body contains any my declarations).

I'm not sure why this separate lexical sub has to be visible
to the HLL user -- the compiler ought to be able to make it appear
as though the sub isn't present.  (But I also bet that we can
come up with an example that makes it really hard to do that.)

Parrot should do better, IMHO.  The easiest way, it seems, would be
 to resurrect the push_pad and pop_pad instructions in some form.  [...]

I'd like for Parrot to be able to do better, yes, but I'm not
yet enough of an expert on closure handling to be able to
refute Chip's reasons for designing things this way.  I also
found the push_pad and pop_pad model somewhat easier to grasp
conceptually, but I recall Chip was fairly certain that they
wouldn't handle things in the generic case.

FWIW, the perl6 compiler currently treats _every_ block as a new 
lexical scope, and generates a separate Parrot sub for each.
At some point we will add an optimization so that blocks can
be inlined when the compiler determines that it's safe to do
so (based in part on the absence of any lexical declarations
within the block).

Hope this helps somewhat more than my previous message... :-)

Pm


Re: [perl #44395] Lexical scopes are too coarse-grained for loops

2007-08-03 Thread Patrick R. Michaud
On Fri, Aug 03, 2007 at 09:51:53PM -0500, Patrick R. Michaud wrote:
 Currently, the only way to get a distinct binding for each $i in
  PIR is to factor the loop body out into a separate lexical sub.  This is
  far from ideal, not least because it is not transparent to the HLL user.
 
 Factoring the loop body out into a separate lexical sub is _exactly_
 how Chip described to me that the above needed to be done.

Just for completeness, attached is a PIR version that I think
is slightly more faithful to the original test-closures.pl 
program.  It appears to work:

  $ ./parrot test-closures-3.pir
  Called sub 1 out of 3.
  Called sub 2 out of 3.
  Called sub 3 out of 3.
  $

Pm



## Return n closures, each with lexical references to I and N'.
.sub make_closures_loop
.param pmc n
.lex $n, n
.lex @result, $P0
$P0 = new 'ResizablePMCArray'
$I0 = 1
$I1 = n
  next:
if $I0  $I1 goto done
'internal_loop_body'($I0)
inc $I0
goto next
  done:
.return ($P0)
.end


.sub 'internal_loop_body' :outer('make_closures_loop')
.param pmc topic
.lex '$_', topic
$P0 = clone topic
.lex '$i', $P0
.const .Sub $P1 = 'internal_sub'
newclosure $P2, $P1
find_lex $P3, '@result'
push $P3, $P2
.return ($P3)
.end

.sub 'internal_sub' :outer('internal_loop_body')
find_lex $P0, '$i'
find_lex $P1, '$n'
print Called sub 
print $P0
print  out of 
print $P1
print .\n
.end


## Make three closures and call them in turn.
.sub test_closures :main
.local pmc closures
$I1 = 3
closures = make_closures_loop($I1)
$I0 = 0
  next:
if $I0 = $I1 goto done
$P52 = closures[$I0]
$P52()
inc $I0
goto next
  done:
.end


Re: [perl #44395] Lexical scopes are too coarse-grained for loops

2007-08-03 Thread Bob Rogers
   From: Patrick R. Michaud [EMAIL PROTECTED]
   Date: Fri, 3 Aug 2007 21:51:53 -0500

   Perhaps ignore my earlier message -- this one is more coherent.

I haven't gotten that one yet.  (Must have been via RT.  Now if I could
only remember to send all my incoherent posts via RT.  ;-)

   On Fri, Aug 03, 2007 at 03:37:39PM -0700, Bob Rogers wrote:
   sub make_closures_loop {
   # Return $n closures, each with lexical references to $i and $n.
   my $n = shift;

   my @result;
   for (1..$n) {
   my $i = $_;
   push(@result, sub { print Called sub $i out of $n.\n; });
   }
   return @result;
   }

   Currently, the only way to get a distinct binding for each $i in
PIR is to factor the loop body out into a separate lexical sub.  This is
far from ideal, not least because it is not transparent to the HLL user.

   Factoring the loop body out into a separate lexical sub is _exactly_
   how Chip described to me that the above needed to be done.
   Or, phrased differently, every lexical scope ends up requiring
   its own parrot sub . . .

Bummer.

   I'm not sure why this separate lexical sub has to be visible
   to the HLL user -- the compiler ought to be able to make it appear
   as though the sub isn't present.  (But I also bet that we can
   come up with an example that makes it really hard to do that.)

It will show up in backtraces, if nothing else.  This may complicate the
user's debugging session slightly, because what the user assumed would
be one call frame is actually two, and inner functions may not have the
expected compiler-assigned names, but that's hardly a show-stopper.

   Parrot should do better, IMHO.  The easiest way, it seems, would be
to resurrect the push_pad and pop_pad instructions in some form.  [...]

   I'd like for Parrot to be able to do better, yes, but I'm not
   yet enough of an expert on closure handling to be able to
   refute Chip's reasons for designing things this way.

The current model has the advantage of being more declarative, which is
good; things that can be nailed down at compile time should be.  But the
same thing could be done for multiple loop scopes as well.  Consider the
following alternative:

## Return n closures, each with lexical references to I and N'.
.sub make_closures_loop
.param pmc n
.lex $n, n
.local pmc result
result = new .FixedPMCArray
$I1 = n
result = $I1
.const .Sub $P53 = 'internal_make_closures_loop_0'
$I0 = 0
next:
if $I0 = $I1 goto done
.push_scope
.lex $i, $P42
$P42 = new .Integer
$P42 = $I0
inc $P42
newclosure $P52, $P53
result[$I0] = $P52
inc $I0
.pop_scope
goto next
done:
.return (result)
.end

Every .lex belongs to the immediately enclosing .push_scope/.pop_scope
group (or the sub top-level scope).  .push_scope needs to expand into
push_pad, and .pop_scope needs to do a pop_pad, but the pseudo-ops serve
to delimit scopes so that the PIR compiler can build a LexInfo object
for each scope.  (But the inner .lex ops may require an explicit
store_lex, since they are not being stored in the Parrot_Context.
(Which is a good thing.))

   I also found the push_pad and pop_pad model somewhat easier to grasp
   conceptually, but I recall Chip was fairly certain that they wouldn't
   handle things in the generic case.

They would certainly permit more to be handled than is handled now.  And
re-adding them should be monotonic in terms of Parrot functionality.
Which means that Chip's statement (IYRC) seems to make no sense.  ???

   FWIW, the perl6 compiler currently treats _every_ block as a new 
   lexical scope, and generates a separate Parrot sub for each . . .

Hmm.  The lexical scope == Parrot sub worldview seems backwards to me.
But probably I'm just grumbling because I've realized I'm going to have
to rewrite a fair-sized chunk of my compiler to cope with this.  Oh,
well; so be it.

   
   From: Patrick R. Michaud [EMAIL PROTECTED]
   Date: Fri, 3 Aug 2007 22:13:25 -0500

   Just for completeness, attached is a PIR version that I think
   is slightly more faithful to the original test-closures.pl 
   program.  It appears to work:

 $ ./parrot test-closures-3.pir
 Called sub 1 out of 3.
 Called sub 2 out of 3.
 Called sub 3 out of 3.
 $

   Pm

Makes sense; looks like what map would compile into, if the block were
not inlined.

   Thanks for explaining the situation.

-- Bob