ID:               41216
User updated by:  DPP <paul dot dovbush at gmail dot com>
Reported By:      DPP <paul dot dovbush at gmail dot com>
Status:           Open
Bug Type:         PCRE related
Operating System: WinXPsp2
PHP Version:      5.2.1
New Comment:

Forgot to mention: the file contains Russian text encoded in UTF-8.
Without the PCRE_UTF8 modifier, the regexp fails on the Russian letter "R".


Previous Comments:
------------------------------------------------------------------------

[2007-04-27 17:26:59] DPP <paul dot dovbush at gmail dot com>

Description:
------------
Parsing a file of 10000 lines in the following format:

level + delim + [@ + xref_id + @ + delim +] tag + [delim + line_value +] terminator

level           digit
delim           space
xref_id         alphanum
tag             alpha (english)
line_value      any (except terminator)
terminator      \r\n
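To make the grammar concrete, here is a minimal sketch that assembles one record from its parts. It assumes the optional cross-reference part is "@" + xref_id + "@" followed by a delimiter, as the (@(\S+)@\s+)? group in the regexp below suggests. The function build_line() and the sample values are hypothetical.

```php
<?php
// Sketch: assemble one record following the grammar described above.
// build_line() and the sample values are hypothetical illustrations.
function build_line($level, $xrefId, $tag, $lineValue)
{
    $delim = ' ';          // delim: a single space
    $terminator = "\r\n";  // terminator: CR LF

    $line = (string) $level . $delim;
    if ($xrefId !== null) {
        // Optional cross-reference id, wrapped in @...@ and followed by a delimiter.
        $line .= '@' . $xrefId . '@' . $delim;
    }
    $line .= $tag;
    if ($lineValue !== null) {
        // Optional value: anything except the terminator.
        $line .= $delim . $lineValue;
    }
    return $line . $terminator;
}

echo build_line(0, 'I1', 'INDI', null);             // "0 @I1@ INDI\r\n"
echo build_line(1, null, 'NAME', 'Ivan /Petrov/');  // "1 NAME Ivan /Petrov/\r\n"
```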

With the following regexp:

$c=preg_match_all("/^\s*(\d+)\s+(@(\S+)@\s+)?(\w+)(\s+@(\S+)@\s*|.*)?$/Sm",$fp,$m,PREG_PATTERN_ORDER);
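As a sketch of what this regexp captures, here it is run on three made-up sample lines. It assumes $fp holds the whole file as a string (preg_match_all() takes a subject string, not a file handle); the sample lines are hypothetical, not taken from the real file.

```php
<?php
// Sketch: show which capture group of the reporter's regexp holds which field.
// Assumes $fp is the file's contents as a string; the sample lines are made up.
$fp = "0 @I1@ INDI\r\n"
    . "1 FAMS @F1@\r\n"
    . "2 DATE 1 JAN 1900\r\n";

$c = preg_match_all('/^\s*(\d+)\s+(@(\S+)@\s+)?(\w+)(\s+@(\S+)@\s*|.*)?$/Sm',
                    $fp, $m, PREG_PATTERN_ORDER);

// With PREG_PATTERN_ORDER, $m[N] lists capture group N across all matches:
//   $m[1] = level, $m[3] = xref_id, $m[4] = tag,
//   $m[5] = everything after the tag up to the \n (keeps the leading space and the \r),
//   $m[6] = the @...@ id when line_value is a pointer.
// Groups that did not take part in a match come back as empty strings.
echo "$c lines matched\n";
print_r($m[4]);
```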

Setting the PCRE_UTF8 modifier (/u) slows the whole script down by a
factor of 30 (from 300 ms to 9000 ms).
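A minimal reproduction sketch for the slowdown: it builds 10000 synthetic lines in the format above and times the same regexp with and without the u modifier, which is how PHP exposes PCRE_UTF8. The records are hypothetical (the real file is not attached), with one Russian value to mimic it; build_sample() and time_match() are illustrative names. Timings depend on the machine and the PHP/PCRE version, so no figures are claimed here.

```php
<?php
// Reproduction sketch (hypothetical data): time the reporter's regexp on
// 10000 synthetic lines, once without and once with the /u (PCRE_UTF8) modifier.

// Build $lines records in the reported format, each ending in \r\n.
// The templates are made up; the Russian value mimics the real file's content.
function build_sample($lines)
{
    $templates = array(
        "0 @I1@ INDI",
        "1 NAME Иван /Петров/",
        "2 DATE 1 JAN 1900",
        "1 FAMS @F1@"
    );
    $out = '';
    for ($i = 0; $i < $lines; $i++) {
        $out .= $templates[$i % count($templates)] . "\r\n";
    }
    return $out;
}

// Run preg_match_all() once; store the match count in $count and
// return the elapsed wall-clock time in milliseconds.
function time_match($pattern, $subject, &$count)
{
    $start = microtime(true);
    $count = preg_match_all($pattern, $subject, $m, PREG_PATTERN_ORDER);
    return (microtime(true) - $start) * 1000.0;
}

$data = build_sample(10000);
$pattern = '/^\s*(\d+)\s+(@(\S+)@\s+)?(\w+)(\s+@(\S+)@\s*|.*)?$/Sm';

$msPlain = time_match($pattern, $data, $countPlain);
$msUtf8  = time_match($pattern . 'u', $data, $countUtf8); // same pattern plus /u

printf("without /u: %d matches, %.1f ms\n", $countPlain, $msPlain);
printf("with /u:    %d matches, %.1f ms\n", $countUtf8, $msUtf8);
```

Both calls should report the same match count; only the timing is of interest.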

A more precise regexp here might be:
$c=preg_match_all("/^ *(\d+) +(@([EMAIL PROTECTED])@ +)?([^ \\n]+)( +@([EMAIL PROTECTED])@ *| +[^\\n]*)?$/m",$fp,$m,PREG_PATTERN_ORDER);
It makes no difference, though.



------------------------------------------------------------------------

