ID: 41216
User updated by: DPP <paul dot dovbush at gmail dot com>
Reported By: DPP <paul dot dovbush at gmail dot com>
Status: Open
Bug Type: PCRE related
Operating System: WinXPsp2
PHP Version: 5.2.1
New Comment:

Forgot to say: the file contains Russian text encoded in UTF-8. Without the PCRE_UTF8 modifier, the regexp fails on the Russian letter "R".

Previous Comments:
------------------------------------------------------------------------

[2007-04-27 17:26:59] DPP <paul dot dovbush at gmail dot com>

Description:
------------
Parsing a file with 10000 lines of the following format:

level + delim + [EMAIL PROTECTED]@ + delim +] tag + [delim + line_value +] terminator

level       digit
delim       space
xref_id     alphanum
tag         alpha (English)
line_value  any (except terminator)
terminator  \r\n

With the regexp:

$c=preg_match_all("/^\s*(\d+)\s+(@(\S+)@\s+)?(\w+)(\s+@(\S+)@\s*|.*)?$/Sm",$fp,$m,PREG_PATTERN_ORDER);

Setting the PCRE_UTF8 modifier slows the whole script down 30 times (from 300ms to 9000ms).

A more accurate regexp here may be:

$c=preg_match_all("/^ *(\d+) +(@([EMAIL PROTECTED])@ +)?([^ \\n]+)( +@([EMAIL PROTECTED])@ *| +[^\\n]*)?$/m",$fp,$m,PREG_PATTERN_ORDER);

But this changes nothing: the slowdown is the same.

------------------------------------------------------------------------
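
The original input file is not attached to this report, so the following is only a rough sketch of how the timing difference might be reproduced. It uses the first regexp quoted above, with and without the `u` (PCRE_UTF8) modifier. The sample data and the helper names `build_sample` and `time_match` are hypothetical and were made up for illustration; real data will likely behave differently, and no particular timings are implied.

```php
<?php
// Save this file as UTF-8 so the Cyrillic literals below are valid UTF-8.

// Hypothetical sample data: lines in the "level [xref] tag [value]\r\n"
// shape described in the report, alternating ASCII and Russian content.
function build_sample($lines)
{
    $out = '';
    for ($i = 0; $i < $lines; $i++) {
        $out .= ($i % 2 === 0)
            ? "0 @I{$i}@ INDI\r\n"
            : "1 NAME Иван /Петров/\r\n";
    }
    return $out;
}

// Run preg_match_all once and return array(match count, elapsed milliseconds).
function time_match($pattern, $subject)
{
    $start = microtime(true);
    $count = preg_match_all($pattern, $subject, $m, PREG_PATTERN_ORDER);
    $elapsed = (microtime(true) - $start) * 1000.0;
    return array($count, $elapsed);
}

$fp   = build_sample(10000);
$base = '/^\s*(\d+)\s+(@(\S+)@\s+)?(\w+)(\s+@(\S+)@\s*|.*)?$/';

// Compare the same pattern without and with the u (PCRE_UTF8) modifier.
foreach (array('Sm', 'Smu') as $mods) {
    list($count, $ms) = time_match($base . $mods, $fp);
    printf("%-4s matches=%d time=%.1f ms\n", $mods, $count, $ms);
}
```

Running the script on the affected PHP build and on a newer one, and swapping in a slice of the real file for `build_sample()`, should show whether the slowdown depends on the data or only on the modifier.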