Author: lwall
Date: 2009-03-18 19:24:25 +0100 (Wed, 18 Mar 2009)
New Revision: 25889
Modified: docs/Perl6/Spec/S05-regex.pod
Log:
Destroy the term "result object" in favor of "abstract object" and AST-Think.

Modified: docs/Perl6/Spec/S05-regex.pod
===================================================================
--- docs/Perl6/Spec/S05-regex.pod 2009-03-18 18:14:09 UTC (rev 25888)
+++ docs/Perl6/Spec/S05-regex.pod 2009-03-18 18:24:25 UTC (rev 25889)
@@ -14,9 +14,9 @@
     Maintainer: Patrick Michaud <pmich...@pobox.com> and
                 Larry Wall <la...@wall.org>
     Date: 24 Jun 2002
-    Last Modified: 11 Mar 2009
+    Last Modified: 18 Mar 2009
     Number: 5
-    Version: 91
+    Version: 92
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax. We now try to call them I<regex> rather than "regular
@@ -774,15 +774,21 @@
         \s+ { print "but does contain whitespace\n" }
     /
 
-An B<explicit> reduction using the C<make> function sets the I<result object>
+An B<explicit> reduction using the C<make> function generates the
+I<abstract syntax tree> object (I<abstract object> or I<ast> for short)
 for this match:
 
     / (\d) { make $0.sqrt } Remainder /;
 
-This has the effect of capturing the square root of the numified string,
-instead of the string. The C<Remainder> part is matched but is not returned
-as part of the result object unless the first C<make> is later overridden by another C<make>.
+This has the effect of capturing the square root of the numified
+string, instead of the string. The C<Remainder> part is matched and
+returned as part of the C<Match> object but is not returned
+as part of the abstract object. Since the abstract object usually
+represents the top node of an abstract syntax tree, the abstract object
+may be extracted from the C<Match> object by use if the C<.ast> method.
+A second call to C<make> overrides any previous call to C<make>.
+
 
 These closures are invoked with a topic (C<$_>) of the current match
 state (a C<Cursor> object). Within a closure, the instantaneous
 position within the search is denoted by the C<.pos> method on
@@ -1331,7 +1337,7 @@
 time you use it unless the string changes. (Any external lexical
 variable names must be rebound each time though.) Subrules may not be
 interpolated with unbalanced bracketing. An interpolated subrule
-keeps its own inner match result as a single item, so its parentheses never count toward the
+keeps its own inner match results as a single item, so its parentheses never count toward the
 outer regexes groupings. (In other words, parenthesis numbering is always
 lexically scoped.)
 
@@ -1585,7 +1591,7 @@
 
 =item *
 
-A C<< <( >> token indicates the start of a result capture, while the
+A C<< <( >> token indicates the start of the match's overall capture, while the
 corresponding C<< )> >> token indicates its endpoint. When matched,
 these behave as assertions that are always true, but have the side
 effect of setting the C<.from> and C<.to> attributes of the match
@@ -1600,8 +1606,9 @@
 except that the scan for "C<foo>" can be done in the forward direction,
 while a lookbehind assertion would presumably scan for C<\d+> and then
 match "C<foo>" backwards. The use of C<< <(...)> >> affects only the
-meaning of the I<result object> and the positions of the beginning and
-ending of the match. That is, after the match above, C<$()> contains
+meaning the positions of the beginning and
+ending of the match, and anything calculated based on those positions.
+For instance, after the match above, C<$()> contains
 only the digits matched, and C<$/.to> is pointing to after the digits.
 Other captures (named or numbered) are unaffected and may be accessed
 through C<$/>.
@@ -2389,8 +2396,9 @@
 =item *
 
 Notionally, a match object contains (among other things) a boolean
-success value, a scalar I<result object>, an array of ordered submatch
-objects, and a hash of named submatch objects. To provide convenient
+success value, an array of ordered submatch objects, and a hash of named
+submatch objects. (It also optionally carries an I<abstract object> normally
+used to build up an abstract syntax tree,) To provide convenient
 access to these various values, the match object evaluates differently
 in different contexts:
 
@@ -2433,10 +2441,12 @@
 
 When used as a scalar, a C<Match> object evaluates to itself.
 
-However, sometimes you would like an alternate scalar value to ride
-along with the match. This is called a I<result> object, and it rides
-along is an attribute of the C<Match> object.
-C<$()> is a shorthand for C<$($/.rob)>.
+However, sometimes you would like an alternate scalar value to
+ride along with the match. The C<Match> object itself describes
+a concrete parse tree, so this extra value is called an I<abstract>
+object; it rides along as an attribute of the C<Match> object. C<$()>
+is a shorthand for C<$($/.ast)>. The C<.ast> method by default just
+returns the string between the C<$/.from> and C<$/.to> positions.
 Therefore C<$()> is usually just the entire match string, but
 you can override that by calling C<make> inside a regex:
 
@@ -2447,19 +2457,19 @@
         # match succeeds -- ignore the rest of the regex
     });
 
-This puts the result object into C<$/.rob>. If a result object is
+This puts the new abstract node into C<$/.ast>. If the abstract object is
 returned that way, it may be of any type, not just a string.
 This makes it convenient to build up an abstract syntax tree of
 arbitrary node types.
 
-You may also capture a subset of the match as the result object using
+You may also capture a subset of the match as the abstract object using
 the C<< <(...)> >> construct:
 
     "foo123bar" ~~ / foo <( \d+ )> bar /
     say $(); # says 123
 
-In this case the result object is always a string when doing string
-matching, and a list of one or more elements when doing array matching.
+In this case the abstract object is always a string when doing string
+matching, and a list of one or more elements when doing list matching.
 
 =item *
 
@@ -2564,15 +2574,15 @@
 
 =item *
 
-Inside a regex, the C<$_> variable holds the current regex's incomplete
-C<Match> object, known as a match state. Generally this should not
+Inside a regex, the C<$¢> variable holds the current regex's incomplete
+C<Match> object, known as a match state (of type C<Cursor>). Generally this should not
 be modified unless you know how to create and propagate match states.
 All regexes actually return match states even when you think they're
 returning something else, because the match states keep track of
 the success and failures of the pattern for you.
 
-Fortunately, when you just want to return a different result object instead
-of the default C<Match> object, you may associate your return value with
+Fortunately, when you just want to return a different abstract result along with
+the default concrete C<Match> object, you may associate your return value with
 the current match state using the C<make> function, which works
 something like a C<return>, but doesn't clobber the match state:
 
@@ -2581,7 +2591,7 @@
     /;
     say $(); # says 'bar'
 
-The result object is available in the C<Match> object via a C<< .rob >> lookup.
+The abstract object of any C<Match> object is available via the C<< .ast >> method.
 
 =back
 
@@ -3942,8 +3952,9 @@
 method call.)
 
 You'll note from the last example that substitutions only happen on
-the "official" string result of the match, that is, the C<$()> value.
-(Here we captured C<$()> using the C<< <(...)> >> pair; otherwise we
+the "official" string result of the match, that is, the portion of
+the string between the C<$/.from> and C<$/.to> positions.
+(Here we set those explicitly using the C<< <(...)> >> pair; otherwise we
 would have had to use lookbehind to match the C<$>.)
 
 =head1 Positional matching, fixed width types