On Wed, 8 May 2013, [email protected] wrote:

> Thank you for helping.
>
> However, when a file only contains a single term: "hello", the regex
> (?=.*\p{Any}hello.*) does NOT match it.

It wouldn't. That pattern looks for "zero or more of any character
(except newline)", followed by "any one character", followed by "hello",
followed by "zero or more of any character (except newline)". So it has
to match at least 6 characters; "hello" is only 5 characters.

> Also, \p is slow. Since (?=.*hello.*) is working for Unicode (which I
> need) and seems fast my question ... I guess ... is related to why NOT
> use PCRE_DOTALL and why does PCRE not treat a subject string as a
> single line with no regard for line endings as stated in the
> documentation?

ALL that PCRE_DOTALL does is make "." match newline characters. By
default it does not, for Perl compatibility. If that's what you want,
then use it. Note, however, the caveat about CRLF newlines in the spec:

  PCRE_DOTALL

  If this bit is set, a dot metacharacter in the pattern matches a
  character of any value, including one that indicates a newline.
  However, it only ever matches one character, even if newlines are
  coded as CRLF. Without this option, a dot does not match when the
  current position is at a newline. This option is equivalent to Perl's
  /s option, and it can be changed within a pattern by a (?s) option
  setting. A negative class such as [^a] always matches newline
  characters, independent of the setting of this option.

> Perhaps it's an issue with this bug: http://bugs.exim.org/show_bug.cgi?id=1351

No; that is a pcregrep bug.

Philip

--
Philip Hazel
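
For anyone who wants to try the points in the reply above, here is a minimal sketch in Python. It rests on assumptions: Python's re module is not PCRE and has no \p{Any}, so (?s:.) stands in for "any one character", and re.DOTALL plays the role of PCRE_DOTALL. Python's dot also treats only "\n" as a newline, which may differ from how a given PCRE build is configured. The helper name lookahead_matches is made up for the illustration.

```python
import re

# Assumption: Python's re stands in for PCRE here. It has no \p{Any},
# so (?s:.) ("any one character, newline included") is used instead.
NEEDS_PREFIX = re.compile(r"(?=.*(?s:.)hello.*)")
PLAIN = re.compile(r"(?=.*hello.*)")


def lookahead_matches(pattern, subject):
    """Hypothetical helper: True if the pattern matches anywhere in subject."""
    return pattern.search(subject) is not None


# "hello" is 5 characters, but the first pattern needs one more before it.
only_hello = lookahead_matches(NEEDS_PREFIX, "hello")      # False
with_prefix = lookahead_matches(NEEDS_PREFIX, "xhello")    # True
plain_hello = lookahead_matches(PLAIN, "hello")            # True

# Without DOTALL, "." stops at a newline; with it, "." crosses newlines.
subject = "first line\nhello"
default_dot = re.search(r"first.*hello", subject) is not None              # False
dotall_dot = re.search(r"first.*hello", subject, re.DOTALL) is not None    # True

# Even with DOTALL, one dot consumes one character, so CRLF needs two dots.
crlf_one_dot = re.fullmatch(r".", "\r\n", re.DOTALL) is not None    # False
crlf_two_dots = re.fullmatch(r"..", "\r\n", re.DOTALL) is not None  # True

# A negated class matches a newline regardless of DOTALL.
negated_class = re.fullmatch(r"[^a]", "\n") is not None  # True
```

The same checks could be run against PCRE itself (for example with pcretest), where the original \p{Any} pattern can be used unchanged.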
