Author: larry
Date: Thu Jan 17 10:22:06 2008
New Revision: 14490

Modified:
   doc/trunk/design/syn/S05.pod

Log:
Clarifications suggested by moritz++ and rhr++


Modified: doc/trunk/design/syn/S05.pod
==============================================================================
--- doc/trunk/design/syn/S05.pod        (original)
+++ doc/trunk/design/syn/S05.pod        Thu Jan 17 10:22:06 2008
@@ -14,9 +14,9 @@
    Maintainer: Patrick Michaud <[EMAIL PROTECTED]> and
                Larry Wall <[EMAIL PROTECTED]>
    Date: 24 Jun 2002
-   Last Modified: 10 Jan 2008
+   Last Modified: 17 Jan 2008
    Number: 5
-   Version: 70
+   Version: 71
 
 This document summarizes Apocalypse 5, which is about the new regex
 syntax.  We now try to call them I<regex> rather than "regular
@@ -192,7 +192,9 @@
 
 The C<:ii> variant may be used on a substitution to change the
 substituted string to the same case pattern as the matched string.
-Case info is carried across on a character by character basis.  If the
+
+If the pattern is matched without the C<:sigspace> modifier, case
+info is carried across on a character by character basis.  If the
 right string is longer than the left one, the case of the final
 character is replicated.  Titlecase is carried across if possible
 regardless of whether the resulting letter is at the beginning of
@@ -200,7 +202,28 @@
 corresponding uppercase character is used.  (This policy can be
 modified within a lexical scope by a language-dependent Unicode
 declaration to substitute titlecase according to the orthographic
-rules of the specified language.)
+rules of the specified language.)  Characters that carry no case
+information leave their corresponding replacement character unchanged.
+
+If the pattern is matched with C<:sigspace>, then a slightly smarter
+algorithm is used which attempts to determine if there is a uniform
+capitalization policy over each matched word, and applies the same
+policy to each replacement word.  If there doesn't seem to be a uniform
+policy on the left, the policy for each word is carried over word by
+word, with the last pattern word replicated if necessary.  If a word
+does not appear to have a recognizable policy, the replacement word
+is translated character for character as in the non-sigspace case.
+Recognized policies include:
+
+    lc()
+    uc()
+    ucfirst(lc())
+    lcfirst(uc())
+    capitalize()
+
+In any case, only the officially matched string part of the pattern
+match counts, so any sort of lookahead or contextual matching is not
+included in the analysis.
 
 =item *
 
@@ -220,6 +243,7 @@
 the right string is longer than the left one, the remaining characters
 are substituted without any modification.  (Note that NFD/NFC distinctions
 are usually immaterial, since Perl encapsulates that in grapheme mode.)
+Under C<:sigspace> the preceding rules are applied word by word.
 
 =item *
 
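Note on the C<:ii> change above: the character-by-character rule that
applies without C<:sigspace> can be sketched in a few lines. This is
only an illustration under stated assumptions, not the specified or
actual implementation. The function name carry_case is hypothetical,
Python's str.isupper/str.islower stand in for "carries case
information", and the titlecase handling and the word-level policies
used under C<:sigspace> are deliberately left out.

```python
def carry_case(matched: str, replacement: str) -> str:
    """Copy the case pattern of `matched` onto `replacement`, one character at a time.

    Hypothetical helper: covers only the non-sigspace rule, with no
    titlecase handling and no word-level capitalization policies.
    """
    if not matched:
        # Nothing to copy a case pattern from.
        return replacement

    out = []
    last = len(matched) - 1
    for i, ch in enumerate(replacement):
        # If the replacement is longer than the match, the case of the
        # final matched character is replicated.
        src = matched[min(i, last)]
        if src.isupper():
            out.append(ch.upper())
        elif src.islower():
            out.append(ch.lower())
        else:
            # Source character carries no case information (digit,
            # punctuation, space): leave the replacement character unchanged.
            out.append(ch)
    return "".join(out)
```

For example, carry_case("FOO", "quux") upcases the fourth character
because the final "O" of the match is replicated, and
carry_case("a-B", "XyZ") leaves the middle "y" alone because "-"
carries no case information.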