didou Mon Dec 1 18:44:09 2003 EDT
Modified files:
/phpdoc/en/reference/pcre/functions pcre.pattern.syntax.xml
Log:
Removed hyphenation marks that had been left inside words (Jakub Vrana)
Index: phpdoc/en/reference/pcre/functions/pcre.pattern.syntax.xml
diff -u phpdoc/en/reference/pcre/functions/pcre.pattern.syntax.xml:1.5 phpdoc/en/reference/pcre/functions/pcre.pattern.syntax.xml:1.6
--- phpdoc/en/reference/pcre/functions/pcre.pattern.syntax.xml:1.5 Tue Aug 6 16:04:34 2002
+++ phpdoc/en/reference/pcre/functions/pcre.pattern.syntax.xml Mon Dec 1 18:44:09 2003
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.5 $ -->
+<!-- $Revision: 1.6 $ -->
<!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
<refentry id="pcre.pattern.syntax">
<refnamediv>
@@ -47,10 +47,10 @@
</listitem>
<listitem>
<simpara>
- Capturing subpatterns that occur inside negative looka-
- head assertions are counted, but their entries in the
- offsets vector are never set. Perl sets its numerical vari-
- ables from any such patterns that are matched before the
+ Capturing subpatterns that occur inside negative
+ lookahead assertions are counted, but their entries in the
+ offsets vector are never set. Perl sets its numerical
+ variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but
only if the negative lookahead assertion contains just one
branch.
@@ -68,8 +68,8 @@
<simpara>
The following Perl escape sequences are not supported:
\l, \u, \L, \U, \E, \Q. In fact these are implemented by
- Perl's general string-handling and are not part of its pat-
- tern matching engine.
+ Perl's general string-handling and are not part of its
+ pattern matching engine.
</simpara>
</listitem>
<listitem>
@@ -123,7 +123,7 @@
<simpara>
If <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link> is set and
<link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> is not
- set, the $ meta- character matches only at the very end of
+ set, the $ meta-character matches only at the very end of
the string.
</simpara>
</listitem>
@@ -135,8 +135,8 @@
</listitem>
<listitem>
<simpara>
- If <link linkend="pcre.pattern.modifiers">PCRE_UNGREEDY</link> is set, the greediness of the repeti-
- tion quantifiers is inverted, that is, by default they are
+ If <link linkend="pcre.pattern.modifiers">PCRE_UNGREEDY</link> is set, the greediness of the
+ repetition quantifiers is inverted, that is, by default they are
not greedy, but if followed by a question mark they are.
</simpara>
</listitem>
@@ -152,8 +152,8 @@
<refsect2 id="regexp.introduction">
<title>Introduction</title>
<para>
- The syntax and semantics of the regular expressions sup-
- ported by PCRE are described below. Regular expressions are
+ The syntax and semantics of the regular expressions
+ supported by PCRE are described below. Regular expressions are
also described in the Perl documentation and in a number of
other books, some of which have copious examples. Jeffrey
Friedl's "Mastering Regular Expressions", published by
@@ -162,8 +162,8 @@
A regular expression is a pattern that is matched against a
subject string from left to right. Most characters stand for
- themselves in a pattern, and match the corresponding charac-
- ters in the subject. As a trivial example, the pattern
+ themselves in a pattern, and match the corresponding
+ characters in the subject. As a trivial example, the pattern
<literal>The quick brown fox</literal>
matches a portion of a subject string that is identical to
itself.
@@ -173,9 +173,9 @@
<title>Meta-characters</title>
<para>
The power of regular expressions comes from the
- ability to include alternatives and repetitions in the pat-
- tern. These are encoded in the pattern by the use of <emphasis>meta</emphasis>-
- <emphasis>characters</emphasis>, which do not stand for themselves but instead
+ ability to include alternatives and repetitions in the
+ pattern. These are encoded in the pattern by the use of
+ <emphasis>meta-characters</emphasis>, which do not stand for themselves but instead
are interpreted in some special way.
</para>
<para>
@@ -299,8 +299,8 @@
</variablelist>
Part of a pattern that is in square brackets is called a
- "character class". In a character class the only meta-
- characters are:
+ "character class". In a character class the only
+ meta-characters are:
<variablelist>
<varlistentry>
<term><emphasis>\</emphasis></term>
@@ -350,23 +350,23 @@
</para>
<para>
For example, if you want to match a "*" character, you write
- "\*" in the pattern. This applies whether or not the follow-
- ing character would otherwise be interpreted as a meta-
- character, so it is always safe to precede a non-alphanumeric
- with "\" to specify that it stands for itself. In particu-
- lar, if you want to match a backslash, you write "\\".
+ "\*" in the pattern. This applies whether or not the
+ following character would otherwise be interpreted as a
+ meta-character, so it is always safe to precede a non-alphanumeric
+ with "\" to specify that it stands for itself. In
+ particular, if you want to match a backslash, you write "\\".
</para>
<para>
- If a pattern is compiled with the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option, whi-
- tespace in the pattern (other than in a character class) and
+ If a pattern is compiled with the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option,
+ whitespace in the pattern (other than in a character class) and
characters between a "#" outside a character class and the
next newline character are ignored. An escaping backslash
can be used to include a whitespace or "#" character as part
of the pattern.
</para>
<para>
- A second use of backslash provides a way of encoding non-
- printing characters in patterns in a visible manner. There
+ A second use of backslash provides a way of encoding
+ non-printing characters in patterns in a visible manner. There
is no restriction on the appearance of non-printing characters,
apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is
@@ -569,8 +569,8 @@
</variablelist>
</para>
<para>
- Note that octal values of 100 or greater must not be intro-
- duced by a leading zero, because no more than three octal
+ Note that octal values of 100 or greater must not be
+ introduced by a leading zero, because no more than three octal
digits are ever read.
</para>
<para>
@@ -581,8 +581,8 @@
class it has a different meaning (see below).
</para>
<para>
- The third use of backslash is for specifying generic charac-
- ter types:
+ The third use of backslash is for specifying generic
+ character types:
</para>
<para>
<variablelist>
@@ -647,8 +647,8 @@
Perl "<literal>word</literal>". The definition of letters and digits is
controlled by PCRE's character tables, and may vary if locale-specific
matching is taking place (see "Locale support"
- above). For example, in the "fr" (French) locale, some char-
- acter codes greater than 128 are used for accented letters,
+ above). For example, in the "fr" (French) locale, some
+ character codes greater than 128 are used for accented letters,
and these are matched by <literal>\w</literal>.
</para>
<para>
@@ -659,8 +659,8 @@
is no character to match.
</para>
<para>
- The fourth use of backslash is for certain simple asser-
- tions. An assertion specifies a condition that has to be met
+ The fourth use of backslash is for certain simple
+ assertions. An assertion specifies a condition that has to be met
at a particular point in a match, without consuming any
characters from the subject string. The use of subpatterns
for more complicated assertions is described below. The
@@ -752,11 +752,11 @@
Circumflex need not be the first character of the pattern if
a number of alternatives are involved, but it should be the
first thing in each alternative in which it appears if the
- pattern is ever to match that branch. If all possible alter-
- natives start with a circumflex, that is, if the pattern is
+ pattern is ever to match that branch. If all possible
+ alternatives start with a circumflex, that is, if the pattern is
constrained to match only at the start of the subject, it is
- said to be an "anchored" pattern. (There are also other con-
- structs that can cause a pattern to be anchored.)
+ said to be an "anchored" pattern. (There are also other
+ constructs that can cause a pattern to be anchored.)
A dollar character is an assertion which is &true; only if the
current matching point is at the end of the subject string,
@@ -779,10 +779,10 @@
before an internal "\n" character, respectively, in addition
to matching at the start and end of the subject string. For
example, the pattern /^abc$/ matches the subject string
- "def\nabc" in multiline mode, but not otherwise. Conse-
- quently, patterns that are anchored in single line mode
- because all branches start with "^" are not anchored in mul-
- tiline mode. The <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link> option is ignored if
+ "def\nabc" in multiline mode, but not otherwise.
+ Consequently, patterns that are anchored in single line mode
+ because all branches start with "^" are not anchored in
+ multiline mode. The <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link> option is ignored if
<link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> is set.
Note that the sequences \A, \Z, and \z can be used to match
@@ -798,9 +798,9 @@
Outside a character class, a dot in the pattern matches any
one character in the subject, including a non-printing
character, but not (by default) newline. If the <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>
- option is set, then dots match newlines as well. The han-
- dling of dot is entirely independent of the handling of cir-
- cumflex and dollar, the only relationship being that they
+ option is set, then dots match newlines as well. The
+ handling of dot is entirely independent of the handling of
+ circumflex and dollar, the only relationship being that they
both involve newline characters. Dot has no special meaning
in a character class.
</literallayout>
@@ -809,25 +809,25 @@
<refsect2 id="regexp.reference.squarebrackets">
<title>Square brackets</title>
<literallayout>
- An opening square bracket introduces a character class, ter-
- minated by a closing square bracket. A closing square
+ An opening square bracket introduces a character class,
+ terminated by a closing square bracket. A closing square
bracket on its own is not special. If a closing square
bracket is required as a member of the class, it should be
- the first data character in the class (after an initial cir-
- cumflex, if present) or escaped with a backslash.
+ the first data character in the class (after an initial
+ circumflex, if present) or escaped with a backslash.
A character class matches a single character in the subject;
the character must be in the set of characters defined by
- the class, unless the first character in the class is a cir-
- cumflex, in which case the subject character must not be in
+ the class, unless the first character in the class is a
+ circumflex, in which case the subject character must not be in
the set defined by the class. If a circumflex is actually
required as a member of the class, ensure it is not the
first character, or escape it with a backslash.
For example, the character class [aeiou] matches any lower
case vowel, while [^aeiou] matches any character that is not
- a lower case vowel. Note that a circumflex is just a con-
- venient notation for specifying the characters which are in
+ a lower case vowel. Note that a circumflex is just a
+ convenient notation for specifying the characters which are in
the class by enumerating those that are not. It is not an
assertion: it still consumes a character from the subject
string, and fails if the current pointer is at the end of
@@ -836,8 +836,8 @@
When caseless matching is set, any letters in a class
represent both their upper case and lower case versions, so
for example, a caseless [aeiou] matches "A" as well as "a",
- and a caseless [^aeiou] does not match "A", whereas a case-
- ful version would.
+ and a caseless [^aeiou] does not match "A", whereas a
+ caseful version would.
The newline character is never treated in any special way in
character classes, whatever the setting of the <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>
@@ -848,17 +848,17 @@
of characters in a character class. For example, [d-m]
matches any letter between d and m, inclusive. If a minus
character is required in a class, it must be escaped with a
- backslash or appear in a position where it cannot be inter-
- preted as indicating a range, typically as the first or last
+ backslash or appear in a position where it cannot be
+ interpreted as indicating a range, typically as the first or last
character in the class.
It is not possible to have the literal character "]" as the
end character of a range. A pattern such as [W-]46] is
- interpreted as a class of two characters ("W" and "-") fol-
- lowed by a literal string "46]", so it would match "W46]" or
+ interpreted as a class of two characters ("W" and "-")
+ followed by a literal string "46]", so it would match "W46]" or
"-46]". However, if the "]" is escaped with a backslash it
- is interpreted as the end of range, so [W-\]46] is inter-
- preted as a single class containing a range followed by two
+ is interpreted as the end of range, so [W-\]46] is
+ interpreted as a single class containing a range followed by two
separate characters. The octal or hexadecimal representation
of "]" can also be used to end a range.
@@ -875,8 +875,8 @@
appear in a character class, and add the characters that
they match to the class. For example, [\dABCDEF] matches any
hexadecimal digit. A circumflex can conveniently be used
- with the upper case character types to specify a more res-
- tricted set of characters than the matching lower case type.
+ with the upper case character types to specify a more
+ restricted set of characters than the matching lower case type.
For example, the class [^\W_] matches any letter or digit,
but not underscore.
@@ -984,8 +984,8 @@
which can be nested. Marking part of a pattern as a subpattern
does two things:
- 1. It localizes a set of alternatives. For example, the pat-
- tern
+ 1. It localizes a set of alternatives. For example, the
+ pattern
cat(aract|erpillar|)
@@ -1131,8 +1131,8 @@
does the right thing with the C comments. The meaning of the
various quantifiers is not otherwise changed, just the preferred
- number of matches. Do not confuse this use of ques-
- tion mark with its use as a quantifier in its own right.
+ number of matches. Do not confuse this use of
+ question mark with its use as a quantifier in its own right.
Because it has two uses, it can sometimes appear doubled, as
in
@@ -1374,8 +1374,8 @@
<title>Once-only subpatterns</title>
<literallayout>
With both maximizing and minimizing repetition, failure of
- what follows normally causes the repeated item to be re-
- evaluated to see if a different number of repeats allows the
+ what follows normally causes the repeated item to be
+ re-evaluated to see if a different number of repeats allows the
rest of the pattern to match. Sometimes it is useful to
prevent this, either to change the nature of the match, or
to cause it fail earlier than it otherwise might, when the
@@ -1401,8 +1401,8 @@
This kind of parenthesis "locks up" the part of the pattern
it contains once it has matched, and a failure further into
- the pattern is prevented from backtracking into it. Back-
- tracking past it to previous items, however, works as normal.
+ the pattern is prevented from backtracking into it.
+ Backtracking past it to previous items, however, works as normal.
An alternative description is that a subpattern of this type
matches the string of characters that an identical standalone
@@ -1419,8 +1419,8 @@
This construction can of course contain arbitrarily complicated
subpatterns, and it can be nested.
- Once-only subpatterns can be used in conjunction with look-
- behind assertions to specify efficient matching at the end
+ Once-only subpatterns can be used in conjunction with
+ look-behind assertions to specify efficient matching at the end
of the subject string. Consider a simple pattern such as
abcd$
@@ -1547,8 +1547,8 @@
comment play no part in the pattern matching at all.
If the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, an unescaped # character
- outside a character class introduces a comment that contin-
- ues up to the next newline character in the pattern.
+ outside a character class introduces a comment that
+ continues up to the next newline character in the pattern.
</literallayout>
</refsect2>
@@ -1571,8 +1571,8 @@
\( ( (?>[^()]+) | (?R) )* \)
First it matches an opening parenthesis. Then it matches any
- number of substrings which can either be a sequence of non-
- parentheses, or a recursive match of the pattern itself
+ number of substrings which can either be a sequence of
+ non-parentheses, or a recursive match of the pattern itself
(i.e. a correctly parenthesized substring). Finally there is
a closing parenthesis.