In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/65169990ec2fa183dd798b11e833db0f15b2dc24?hp=1fdb5498519a40e7ce6b5adced61c49638141e25>
- Log -----------------------------------------------------------------
commit 65169990ec2fa183dd798b11e833db0f15b2dc24
Author: Father Chrysostomos <spr...@cpan.org>
Date:   Wed Aug 10 23:43:34 2016 -0700

    perlinterp.pod: Expand the op tree section

    based on things that came up in the thread starting at
    <20160808225325.79944...@shy.leonerd.org.uk>.
-----------------------------------------------------------------------

Summary of changes:
 pod/perlinterp.pod | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 84 insertions(+), 5 deletions(-)

diff --git a/pod/perlinterp.pod b/pod/perlinterp.pod
index 5c41e29..e1af333 100644
--- a/pod/perlinterp.pod
+++ b/pod/perlinterp.pod
@@ -531,8 +531,45 @@ statement. Get the values of C<$b> and C<$c>, and add them together.
 Find C<$a>, and assign one to the other. Then leave.
 
 The way Perl builds up these op trees in the parsing process can be
-unravelled by examining F<perly.y>, the YACC grammar. Let's take the
-piece we need to construct the tree for C<$a = $b + $c>
+unravelled by examining F<toke.c>, the lexer, and F<perly.y>, the YACC
+grammar. Let's look at the code that constructs the tree for C<$a = $b +
+$c>.
+
+First, we'll look at the C<Perl_yylex> function in the lexer. We want to
+look for C<case 'x'>, where x is the first character of the operator.
+(Incidentally, when looking for the code that handles a keyword, you'll
+want to search for C<KEY_foo> where "foo" is the keyword.) Here is the code
+that handles assignment (there are quite a few operators beginning with
+C<=>, so most of it is omitted for brevity):
+
+     1 case '=':
+     2     s++;
+           ... code that handles == => etc. and pod ...
+     3     pl_yylval.ival = 0;
+     4     OPERATOR(ASSIGNOP);
+
+We can see on line 4 that our token type is C<ASSIGNOP> (C<OPERATOR> is a
+macro, defined in F<toke.c>, that returns the token type, among other
+things). And C<+>:
+
+     1 case '+':
+     2     {
+     3         const char tmp = *s++;
+               ... code for ++ ...
+     4         if (PL_expect == XOPERATOR) {
+                   ...
+     5             Aop(OP_ADD);
+     6         }
+               ...
+     7     }
+
+Line 4 checks what type of token we are expecting. C<Aop> returns a token.
+If you search for C<Aop> elsewhere in F<toke.c>, you will see that it
+returns an C<ADDOP> token.
+
+Now that we know the two token types we want to look for in the parser,
+let's take the piece of F<perly.y> we need to construct the tree for
+C<$a = $b + $c>
 
     1 term    :   term ASSIGNOP term
     2                { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
@@ -541,9 +578,8 @@ piece we need to construct the tree for C<$a = $b + $c>
 
 If you're not used to reading BNF grammars, this is how it works:
 You're fed certain things by the tokeniser, which generally end up in
-upper case. Here, C<ADDOP>, is provided when the tokeniser sees C<+> in
-your code. C<ASSIGNOP> is provided when C<=> is used for assigning.
-These are "terminal symbols", because you can't get any simpler than
+upper case. C<ADDOP> and C<ASSIGNOP> are examples of "terminal symbols",
+because you can't get any simpler than
 them.
 
 The grammar, lines one and three of the snippet above, tells you how to
@@ -580,6 +616,49 @@ use C<$2>. The second parameter is the op's flags: 0 means "nothing
 special". Then the things to add: the left and right hand side of our
 expression, in scalar context.
 
+The functions that create ops, which have names like C<newUNOP> and
+C<newBINOP>, call a "check" function associated with each op type, before
+returning the op. The check functions can mangle the op as they see fit,
+and even replace it with an entirely new one. These functions are defined
+in F<op.c>, and have a C<Perl_ck_> prefix. You can find out which
+check function is used for a particular op type by looking in
+F<regen/opcodes>. Take C<OP_ADD>, for example. (C<OP_ADD> is the token
+value from the C<Aop(OP_ADD)> in F<toke.c> which the parser passes to
+C<newBINOP> as its first argument.) Here is the relevant line:
+
+    add             addition (+)            ck_null         IfsT2   S S
+
+The check function in this case is C<Perl_ck_null>, which does nothing.
+Let's look at a more interesting case:
+
+    readline        <HANDLE>                ck_readline     t%      F?
+
+And here is the function from F<op.c>:
+
+     1 OP *
+     2 Perl_ck_readline(pTHX_ OP *o)
+     3 {
+     4     PERL_ARGS_ASSERT_CK_READLINE;
+     5
+     6     if (o->op_flags & OPf_KIDS) {
+     7          OP *kid = cLISTOPo->op_first;
+     8          if (kid->op_type == OP_RV2GV)
+     9              kid->op_private |= OPpALLOW_FAKE;
+    10     }
+    11     else {
+    12         OP * const newop
+    13             = newUNOP(OP_READLINE, 0, newGVOP(OP_GV, 0,
+    14                                               PL_argvgv));
+    15         op_free(o);
+    16         return newop;
+    17     }
+    18     return o;
+    19 }
+
+One particularly interesting aspect is that if the op has no kids (i.e.,
+C<readline()> or C<< <> >>) the op is freed and replaced with an entirely
+new one that references C<*ARGV> (lines 12-16).
+
 =head1 STACKS
 
 When perl executes something like C<addop>, how does it pass on its

--
Perl5 Master Repository