Author: larry
Date: Tue Jul 10 17:39:45 2007
New Revision: 14428

Modified:
   doc/trunk/design/syn/S05.pod

Log:
The ** form is now syntactically independent of the following token.
This allows us to distinguish literal counts and ranges from indirect ones
specified via closure.  It also allows a notational simplification for
infix repetition suggested by Morrie Siegel++.  (As a consequence, the ?
character to specify minimal matching now attaches to the ** directly.)


Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod        (original)
+++ doc/trunk/design/syn/S05.pod        Tue Jul 10 17:39:45 2007
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
                Larry Wall <[EMAIL PROTECTED]>
    Date: 24 Jun 2002
-   Last Modified: 9 Jul 2007
+   Last Modified: 10 Jul 2007
    Number: 5
-   Version: 60
+   Version: 61
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -676,28 +676,60 @@
 
 =item *
 
-The repetition specifier is now C<**{...}> for maximal matching,
-with a corresponding C<**{...}?> for minimal matching.  Space is
-allowed on either side of the asterisks.  The curlies are taken to
-be a closure returning an Int or a Range object.
+The general repetition specifier is now C<**> for maximal matching,
+with a corresponding C<**?> for minimal matching.  Space is
+allowed on either side.  The next token will determine what kind of
+repetition is desired:
 
-     / value was (\d ** {1..6}?) with ([\w]**{$m..$n}) /
+If the next thing is an integer, then it is parsed as either an exact
+count or a range:
+
+    . ** 42                  # match exactly 42 times
+    <item> ** 3..*           # match 3 or more times
+
+This form is considered declarational.
+
+If you supply a closure, it should return either an C<Int> or a C<Range> object.
+
+    'x' ** {$m}              # exact count returned from closure
+    <foo> ** {$m..$n}        # range returned from closure
+
+    / value was (\d **? {1..6}) with ([ <alpha>\w* ]**{$m..$n}) /
 
 It is illegal to return a list, so this easy mistake fails:
 
-     / [foo]**{1,3} /
+    / [foo] ** {1,3} /
+
+The closure form is always considered procedural, so the item it is
+modifying is never considered part of the longest token.
+
+If you supply any other atom (which may not be quantified), it is
+interpreted as a separator (such as an infix operator), and the
+initial item is quantified by the number of times the separator is
+seen between items:
+
+    <alt> ** '|'            # repetition controlled by presence of separator
+    <addend> ** <addop>     # repetition controlled by presence of separator
+    <item> ** [ \!?'==' ]   # repetition controlled by presence of separator
+
+A successful match of such a quantifier always ends "in the middle",
+that is, after the initial item but before the next separator.
+(The separator never matches independently of the next item; if the
+separator matches but the next item fails, it backtracks all the way
+back through the separator.)  Therefore
+
+    / <ident> ** ',' /
+
+can match
+
+    foo
+    foo,bar
+    foo,bar,baz
 
-(At least, it fails in the absence of C<use rx :listquantifier>,
-which is likely to be unimplemented in Perl 6.0.0 anyway.)
+but never
 
-The optimizer will likely optimize away things like C<**{1..*}>
-so that the closure is never actually run in that case.  But it's
-a closure that must be run in the general case, so you can use
-it to generate a range on the fly based on the earlier matching.
-(Of course, bear in mind the closure must be run I<before> attempting to
-match whatever it quantifies.)  A closure that must be run is considered
-procedural, but a closure that recognizably returns the same thing every
-time is considered declarative.
+    foo,
+    foo,bar,
 
 =item *
 
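Illustration (not part of the commit): the separator form described in the
hunk above can be approximated with an ordinary regex. The sketch below is
Python, not Perl 6. It assumes, based only on the examples in the diff, that
at least one item is required, and it uses a hypothetical IDENT pattern as a
stand-in for <ident>. The helper names separated and matched_prefix are
invented for this example.

```python
import re

def separated(item: str, sep: str) -> re.Pattern:
    """Emulate `item ** sep`: one item, then zero or more (sep, item) pairs.

    The separator is only consumed together with a following item, so a
    match always ends after an item and never after a dangling separator.
    """
    return re.compile(rf"(?:{item})(?:(?:{sep})(?:{item}))*")

# Assumption: a simple stand-in for the <ident> rule.
IDENT = r"[A-Za-z_]\w*"
ident_list = separated(IDENT, ",")

def matched_prefix(text: str):
    """Return the part of text matched at position 0, or None."""
    m = ident_list.match(text)
    return m.group(0) if m else None

if __name__ == "__main__":
    for sample in ("foo", "foo,bar", "foo,bar,baz", "foo,", "foo,bar,"):
        print(repr(sample), "->", repr(matched_prefix(sample)))
```

With the trailing-comma inputs the match stops before the final comma, which
corresponds to the "ends in the middle" behavior described above: the
separator is given back when no further item follows it.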
