ID:               37794
User updated by:  jdespatis at yahoo dot fr
Reported By:      jdespatis at yahoo dot fr
Status:           Bogus
Bug Type:         PCRE related
Operating System: Linux 2.6.15 Debian Testing
PHP Version:      5.1.4
New Comment:

Ok. However, I've read the documentation again:
http://fr.php.net/manual/en/reference.pcre.pattern.syntax.php

I don't see it explicitly said "in utf-8 mode don't use \w". I can only see:

"Since PHP 4.4.0 and 5.1.0, three additional escape sequences to match generic character types are available when UTF-8 mode is selected."

So a reader understands this as: \w works, AND in utf8 I also have \p{}.

Would it be possible to update the documentation? (For example, I now have a doubt about \d: does it work on utf8? I don't know...)

One more thing: I've found that ucwords() and ucfirst() are not utf8 aware; the documentation should be updated, I think.

Thanks


Previous Comments:
------------------------------------------------------------------------

[2006-06-13 21:08:46] [EMAIL PROTECTED]

sorry, my last comment is incorrect. in utf mode you should use the property escapes (\p{..}) instead of non utf8-aware escapes like \W.

------------------------------------------------------------------------

[2006-06-13 18:35:27] [EMAIL PROTECTED]

/\W/ means match any non-whitespace. you probably want to use \w (lower case)

------------------------------------------------------------------------

[2006-06-13 11:53:50] jdespatis at yahoo dot fr

Description:
------------
preg_split("/\W/u", $utf8_string) cuts the words!

Reproduce code:
---------------
print_r(preg_split("/(\W)/u", "этот", -1, PREG_SPLIT_DELIM_CAPTURE));

(Watch out: I've put in a utf8 string, so you need to translate the html code into utf8. It's a Russian string; when you see the characters, you can read "etot", with the "e" looking like an inverted epsilon.)

For now, I've succeeded in making my code work by using \P{L} instead of \W.

Expected result:
----------------
Array
(
    [0] => этот
)

Actual result:
--------------
Array
(
    [0] => 
    [1] => э
    [2] => 
    [3] => т
    [4] => 
    [5] => о
    [6] => 
    [7] => т
    [8] => 
)

------------------------------------------------------------------------
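
The workaround mentioned in the report (\P{L} instead of \W) can be sketched as below. This is only an illustration, not code from the bug report: the helper name split_on_non_letters() is hypothetical, and the file is assumed to be saved as UTF-8. Whether \W and \d themselves are Unicode-aware under /u depends on the PHP and PCRE version in use, so the sketch sticks to the explicit property escapes (\P{L} for non-letters, \p{Nd} for decimal digits).

```php
<?php
// Hypothetical helper (not part of the bug report): split a UTF-8 string on
// every character that is not a Unicode letter, keeping the delimiters.
function split_on_non_letters($text)
{
    // \P{L} matches any code point that is NOT a letter.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_split('/(\P{L})/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
}

// The word from the report contains only letters, so it stays in one piece.
print_r(split_on_non_letters("этот"));

// With a space in the subject, the space comes back as a captured delimiter
// between the two words.
print_r(split_on_non_letters("этот мир"));

// For the \d doubt: \p{Nd} matches decimal digits from any script,
// here three Arabic-Indic digits.
var_dump(preg_match('/^\p{Nd}+$/u', "١٢٣"));
```

The same idea applies to other character classes: where the plain escape is in doubt, a \p{..} or \P{..} property escape states the intent directly.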