Author: larry
Date: Sat May 17 14:37:37 2008
New Revision: 14542
Modified:
doc/trunk/design/syn/S05.pod
Log:
Clarifications to how tied longest tokens are handled under LTM
Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod (original)
+++ doc/trunk/design/syn/S05.pod Sat May 17 14:37:37 2008
@@ -14,9 +14,9 @@
Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
Larry Wall <[EMAIL PROTECTED]>
Date: 24 Jun 2002
- Last Modified: 7 May 2008
+ Last Modified: 18 May 2008
Number: 5
- Version: 78
+ Version: 79
This document summarizes Apocalypse 5, which is about the new regex
syntax. We now try to call them I<regex> rather than "regular
@@ -2094,8 +2094,14 @@
expressions). A logical alternation using C<|> then takes two or
more of these lists and dispatches to the alternative that matches
the longest token prefix. This may or may not be the alternative
-that comes first lexically. (However, in the case of a tie between
-alternatives, the textually earlier alternative does take precedence.)
+that comes first lexically.
+
+However, if two alternatives match at the same length, the tie is
+broken by one of two methods. If the alternatives are in different
+grammars, standard MRO (method resolution order) determines which
+one to try first. If the alternatives are in the same grammar, the
+textually earlier alternative takes precedence. (If a grammar's rules
+are defined in more than one file, the results are undefined.)
This longest token prefix corresponds roughly to the notion of "token"
in other parsing systems that use a lexer, but in the case of Perl
@@ -2150,6 +2156,11 @@
Greedy quantifiers and character classes do not terminate a token pattern.
Zero-width assertions such as word boundaries are also okay.
+Because such assertions can be part of the token, the lexer engine must
+be able to recover from the failure of such an assertion and backtrack
+to the next best token candidate, which might be the same length or shorter,
+but can never be longer than the current candidate.
+
For a pattern that starts with a positive lookahead assertion,
the assertion is assumed to be more specific than the subsequent
pattern, so the lookahead's pattern is treated as the longest token;
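
A rough illustration of the two changes in this revision, written as a toy Python model rather than Perl 6: candidates are ordered longest first, ties are broken by grammar order and then textual order, and a failed zero-width assertion makes the matcher fall back to the next candidate, which is never longer. Everything named here (Alt, grammar_rank, check, ltm_match, at_word_boundary) is hypothetical and invented for the sketch. It assumes that a lower grammar_rank means "earlier in MRO", and it simplifies an assertion inside a token to a check run at the token's end. It is not how any real Perl 6 implementation works.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Alt:
    """One alternative of a logical alternation (hypothetical model)."""
    name: str
    token: str                 # regex standing in for the declarative token prefix
    grammar_rank: int = 0      # assumed: position of the owning grammar in MRO, 0 first
    check: Optional[Callable[[str, int], bool]] = None  # trailing zero-width assertion


def at_word_boundary(text: str, end: int) -> bool:
    """Zero-width assertion: is there a word boundary at offset `end`?"""
    return re.compile(r"\b").match(text, end) is not None


def ltm_candidates(alts: List[Alt], text: str, pos: int = 0) -> List[Tuple[Alt, int]]:
    """Return matching alternatives ordered best-first.

    Sort key: longest token first, then grammar order (MRO stand-in),
    then textual order within the alternation.
    """
    found = []
    for order, alt in enumerate(alts):
        m = re.compile(alt.token).match(text, pos)
        if m:
            found.append((-(m.end() - pos), alt.grammar_rank, order, alt, m.end()))
    found.sort(key=lambda t: t[:3])
    return [(alt, end) for _, _, _, alt, end in found]


def ltm_match(alts: List[Alt], text: str, pos: int = 0) -> Optional[Tuple[str, str]]:
    """Try candidates best-first; a failed assertion falls through to the
    next candidate, which is the same length or shorter, never longer."""
    for alt, end in ltm_candidates(alts, text, pos):
        if alt.check is None or alt.check(text, end):
            return alt.name, text[pos:end]
    return None


# Tie within one grammar: the textually earlier alternative wins.
same_grammar = [Alt("first", r"[a-z]+"), Alt("second", r"\w+")]

# Tie across grammars: the grammar earlier in MRO wins, even if listed later.
two_grammars = [Alt("base", r"[a-z]+", grammar_rank=1),
                Alt("derived", r"\w+", grammar_rank=0)]

# Longest candidate carries an assertion that can fail.
with_assertion = [Alt("keyword", r"for", check=at_word_boundary),
                  Alt("prefix", r"fo")]

print(ltm_match(same_grammar, "abc"))       # ('first', 'abc')
print(ltm_match(two_grammars, "abc"))       # ('derived', 'abc')
print(ltm_match(with_assertion, "for x"))   # ('keyword', 'for')
print(ltm_match(with_assertion, "format"))  # ('prefix', 'fo')
```

In the last call the three-character candidate fails its word-boundary assertion, so the matcher recovers by taking the two-character candidate. The sketch leaves out the "rules defined in more than one file" case, which the new text declares undefined.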