ID:               40871
Updated by:       [EMAIL PROTECTED]
Reported By:      ismith at motorola dot com
Status:           Assigned
Bug Type:         PCRE related
Operating System: Windows Server 2003 SP1
PHP Version:      5.2.1
Assigned To:      andrei
New Comment:

Nuno, Andrei, wake up. Is it worth doing (or even possible to do) something about this, or should I mark it as "won't fix"?


Previous Comments:
------------------------------------------------------------------------

[2007-03-22 23:03:41] [EMAIL PROTECTED]

In PHP 6, PHP always passes well-formed UTF-8 strings to PCRE, because the strings have already been processed by ICU.
In PHP 4/5, well... It's hard to leave it up to the userland app to deal with these kinds of complex things, but should we really interfere with the string? I dunno, but my point is that maintaining BC is more important at this time.

------------------------------------------------------------------------

[2007-03-22 00:29:24] [EMAIL PROTECTED]

Did you see this:

http://us3.php.net/manual/en/function.preg-last-error.php

The error is not getting lost. There's just not much we can do about it aside from returning it to the user.

------------------------------------------------------------------------

[2007-03-21 22:47:02] [EMAIL PROTECTED]

Andrei, do you think there is something we can do about it?

------------------------------------------------------------------------

[2007-03-21 17:45:27] ismith at motorola dot com

Further info: I emailed the PCRE maintainer, and he said that since PCRE doesn't do the replacement part, PCRE itself isn't dumping the text. Apparently when PCRE sees bad UTF-8, it returns an error code (I believe PCRE_ERROR_BADUTF8).

I think the text is getting lost in php_pcre_replace_impl. If pcre_exec returns PCRE_ERROR_NOMATCH, it saves all the unmatched text in the result; but if pcre_exec returns some other error code, it looks to me like it's dumping the result (which matches what I'm seeing).

I don't see how PHP can do much other than what it's doing; without a match count back from pcre_exec, it can't process the replacements in any case. My feeling is that PCRE should not return an error code in this case, but should work around the bad UTF-8 character, which would be more in keeping with the Unicode standard. I'll discuss this further with the PCRE folks.

OTOH, maybe MediaWiki should do UTF-8 cleanup on the string before giving it to PHP.

------------------------------------------------------------------------

[2007-03-20 20:16:57] [EMAIL PROTECTED]

>Where do I report this? How do I get it fixed?

See http://pcre.org. I don't know any further details myself.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at
    http://bugs.php.net/40871
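
For readers following the preg_last_error() pointer in the thread, here is a minimal sketch of how userland code might notice a failed /u replacement and keep the original text instead of losing it. The helper name replace_checked() and the shape of its return array are hypothetical, chosen only for illustration; preg_replace(), preg_last_error(), PREG_NO_ERROR and PREG_BAD_UTF8_ERROR are the documented PHP names. It checks preg_last_error() rather than the return value, since the thread does not say exactly what preg_replace() returns on failure in 5.2.1.

```php
<?php
// Hypothetical helper (not part of PHP): run a replacement and fall back
// to the untouched subject when PCRE reports an error such as bad UTF-8.
function replace_checked($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    $error  = preg_last_error();

    if ($error !== PREG_NO_ERROR) {
        // The replacement did not run; return the original text so
        // nothing is silently dropped, along with the error code.
        return array('ok' => false, 'error' => $error, 'text' => $subject);
    }
    return array('ok' => true, 'error' => PREG_NO_ERROR, 'text' => $result);
}

// "\xC3" starts a two-byte UTF-8 sequence, but "(" is not a valid
// continuation byte, so this subject is malformed UTF-8.
$bad = "a\xC3(b";
$r = replace_checked('/b/u', 'X', $bad);

if (!$r['ok'] && $r['error'] === PREG_BAD_UTF8_ERROR) {
    echo "bad UTF-8 in subject; original text kept\n";
}
```

A caller that wants the replacement to succeed would still need to clean up the malformed bytes first, as suggested in the thread for MediaWiki; this sketch only makes the failure visible.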