On Wed, 8 May 2013, [email protected] wrote:

> Thank you for helping.
> 
> However, when a file only contains a single term: "hello", the regex
> (?=.*\p{Any}hello.*) does NOT match it.

It wouldn't. That pattern looks for "zero or more of any character
(except newline)", followed by "any one character", followed by "hello",
followed by "zero or more of any character (except newline)". So it has 
to match at least 6 characters; "hello" is only 5 characters.
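
As a rough illustration of the length argument, here is a minimal sketch using Python's re module rather than PCRE itself. Python's re has no \p{Any}, so [\s\S] is used as a stand-in for "any one character"; that substitution and the helper name matches() are assumptions made for this example only.

```python
import re

# [\s\S] is a stand-in for PCRE's \p{Any} (assumption: Python's re
# module does not support \p{...}, so this is not the original pattern).
WITH_ANY = re.compile(r"(?=.*[\s\S]hello.*)")
WITHOUT_ANY = re.compile(r"(?=.*hello.*)")


def matches(pattern, subject):
    """Return True if the compiled pattern matches anywhere in subject."""
    return pattern.search(subject) is not None


# "hello" has only 5 characters, so there is nothing left over for the
# mandatory "any one character" that must precede it.
print(matches(WITH_ANY, "hello"))      # False
# One extra leading character satisfies the 6-character minimum.
print(matches(WITH_ANY, "xhello"))     # True
# Without the extra element, a bare "hello" is enough.
print(matches(WITHOUT_ANY, "hello"))   # True
```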

> Also, \p is slow. Since (?=.*hello.*) is working for Unicode (which I
> need) and seems fast my question ... I guess ... is related to why NOT
> use PCRE_DOTALL and why does PCRE not treat a subject string as a
> single line with no regard for line endings as stated in the
> documentation?

ALL that PCRE_DOTALL does is make "." match newline characters. By
default "." does not match them, for Perl compatibility. If that's what
you want, then use it. Note, however, the caveat about CRLF newlines in
the spec:

  PCRE_DOTALL                                                            

  If this bit is set, a dot metacharacter in the pattern matches a
  character of any value, including one that indicates a newline.
  However, it only ever matches one character, even if newlines are
  coded as CRLF. Without this option, a dot does not match when the
  current position is at a newline. This option is equivalent to Perl's
  /s option, and it can be changed within a pattern by a (?s) option
  setting. A negative class such as [^a] always matches newline
  characters, independent of the setting of this option.
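
The points in that paragraph can be sketched with Python's re module, whose re.DOTALL flag and (?s) setting behave analogously on these specific points. This is an assumption-laden illustration, not PCRE itself: Python has no configurable newline convention, and the helper name dot_matches() is made up for the example.

```python
import re


def dot_matches(subject, flags=0):
    """Return True if 'a', any single character, then 'b' is found."""
    return re.search(r"a.b", subject, flags) is not None


# By default the dot does not match a newline.
print(dot_matches("a\nb"))                  # False
# With DOTALL the dot matches the newline as well.
print(dot_matches("a\nb", re.DOTALL))       # True
# The dot still consumes exactly one character, so a two-character
# CRLF between 'a' and 'b' is not matched by a single dot.
print(dot_matches("a\r\nb", re.DOTALL))     # False
# (?s) inside the pattern has the same effect as the DOTALL flag.
inline = re.search(r"(?s)a.b", "a\nb") is not None
print(inline)                               # True
# A negated class matches a newline regardless of the option.
negated = re.search(r"a[^x]b", "a\nb") is not None
print(negated)                              # True
```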

> Perhaps it's an issue with this bug: http://bugs.exim.org/show_bug.cgi?id=1351

No; that is a pcregrep bug. 

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
