 ID:                 52592
 Updated by:         ahar...@php.net
 Reported by:        pj at ezgr dot net
 Summary:            mb_ereg_replace and the Greek capital Pi
-Status:             Open
+Status:             Bogus
 Type:               Bug
 Package:            mbstring related
 Operating System:   CentOS 5.5 x64
 PHP Version:        5.2.14
 Block user comment: N

 New Comment:

You also need to call mb_regex_encoding('UTF-8') before using a UTF-8
regular expression.

Previous Comments:
------------------------------------------------------------------------
[2010-08-12 14:36:15] pj at ezgr dot net

Description:
------------
PHP: 5.2.14, Apache 2.2.15, mod_php

Although \s is supposed to match only whitespace, it also matches part of
the Greek capital letter Pi (Π, U+03A0), whose UTF-8 encoding is 0xCE 0xA0.
When the match is replaced, the character is stripped of its second byte
(0xA0), leaving a lone, invalid 0xCE byte.

Test script:
---------------
<?php
mb_internal_encoding('UTF-8');
$testStr = 'Π  Π  Π!';
$newStr = mb_ereg_replace('\s+','_',$testStr);
echo $testStr;
echo $newStr;
echo urlencode($testStr);
echo urlencode($newStr);
?>

Expected result:
----------------
Π  Π  Π!
Π__Π__Π!
%CE%A0++%CE%A0++%CE%A0%21
%CE%A0__%CE%A0__%CE%A0%21

Actual result:
--------------
Π  Π  Π!
[non-printable character]_[non-printable character]_[non-printable character]!
%CE%A0++%CE%A0++%CE%A0%21
%CE_%CE_%CE_%21

------------------------------------------------------------------------
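A minimal sketch of the reporter's test script with the developer's suggested fix applied might look like the following. It assumes a PHP build with the mbstring extension and its regex support enabled; the `"\xCE\xA0"` escapes (spelling out Pi's UTF-8 bytes so the result does not depend on the source file's encoding) and the `$pi` variable are illustrative choices, not part of the original report. With the regex encoding also set to UTF-8, 0xCE 0xA0 is read as a single character, so only the real spaces are replaced. Note that `\s+` matches each run of two spaces once, so each run becomes a single underscore.

```php
<?php
// Sketch of the suggested fix: set the regex encoding as well as the
// internal encoding before calling any mb_ereg_* function.
// Assumes PHP with the mbstring extension and its regex support enabled.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// "\xCE\xA0" is GREEK CAPITAL LETTER PI (U+03A0) encoded as UTF-8.
$pi      = "\xCE\xA0";
$testStr = $pi . '  ' . $pi . '  ' . $pi . '!';

// \s+ matches each run of two spaces once, so each run becomes one '_'.
$newStr = mb_ereg_replace('\s+', '_', $testStr);

echo urlencode($testStr), PHP_EOL; // %CE%A0++%CE%A0++%CE%A0%21
echo urlencode($newStr), PHP_EOL;  // %CE%A0_%CE%A0_%CE%A0%21
```

Comparing the urlencoded output is a convenient way to confirm that the 0xA0 bytes survive the replacement.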