ID:               37794
 User updated by:  jdespatis at yahoo dot fr
 Reported By:      jdespatis at yahoo dot fr
 Status:           Bogus
 Bug Type:         PCRE related
 Operating System: Linux 2.6.15 Debian Testing
 PHP Version:      5.1.4
 New Comment:

OK.
However, I've read the documentation again:
http://fr.php.net/manual/en/reference.pcre.pattern.syntax.php

And I don't see it explicitly said "in UTF-8 mode don't use \w".
I can only see: "Since PHP 4.4.0 and 5.1.0, three additional escape
sequences to match generic character types are available when UTF-8
mode is selected. "

So a reader understands this as: \w works, AND in UTF-8 mode I also have \p{}

Would it be possible to update the documentation? (For example, I now
have a doubt about \d: does it work in UTF-8 mode? I don't know...)
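
For digits, the property-escape counterpart would be \p{Nd} (Unicode
decimal digits). A minimal sketch, assuming PCRE is built with Unicode
property support; the sample strings are illustrative and not taken from
the manual:

```php
<?php
// Assumption: this file and both sample strings are UTF-8 encoded.
$arabicIndic = "٣٤"; // U+0663 U+0664, illustrative input
$ascii       = "42";

// \p{Nd} matches any Unicode decimal digit, whatever the script.
$matchesArabicIndic = preg_match('/^\p{Nd}+$/u', $arabicIndic);
$matchesAscii       = preg_match('/^\p{Nd}+$/u', $ascii);

var_dump($matchesArabicIndic); // int(1)
var_dump($matchesAscii);       // int(1)
```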

One more thing: I've found that ucwords() and ucfirst() are not UTF-8
aware; I think the documentation should be updated.
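
A possible multibyte-aware substitute for ucwords() is mb_convert_case()
with MB_CASE_TITLE. This is only a sketch and assumes the mbstring
extension is loaded; the sample phrase is illustrative:

```php
<?php
// Assumption: mbstring is available and the input is UTF-8 encoded.
$text = "этот тест"; // illustrative phrase

// MB_CASE_TITLE upper-cases the first letter of each word.
$title = mb_convert_case($text, MB_CASE_TITLE, "UTF-8");

echo $title, "\n"; // Этот Тест
```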

Thanks


Previous Comments:
------------------------------------------------------------------------

[2006-06-13 21:08:46] [EMAIL PROTECTED]

Sorry, my last comment is incorrect. In UTF-8 mode you should use the
property escapes (\p{..}) instead of non-UTF-8-aware escapes like \W.

------------------------------------------------------------------------

[2006-06-13 18:35:27] [EMAIL PROTECTED]

/\W/ means match any non-whitespace. you probably want to use \w (lower
case)

------------------------------------------------------------------------

[2006-06-13 11:53:50] jdespatis at yahoo dot fr

Description:
------------
preg_split("/\W/u", $utf8_string) splits words into individual characters!

Reproduce code:
---------------
print_r(preg_split("/(\W)/u", "этот", -1,
PREG_SPLIT_DELIM_CAPTURE));

(Note: the input is a UTF-8 string. If it is displayed as HTML entities,
convert them to UTF-8 first. It is a Russian word that reads "etot",
where the first letter looks like a reversed epsilon.)

For now, I have made my code work by using
\P{L} instead of \W.
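
The \P{L} workaround can be sketched as follows, assuming the script file
and the input are UTF-8 encoded. The second call uses an illustrative
two-word phrase that is not part of the reproduce code:

```php
<?php
// Assumption: this file and the strings below are UTF-8 encoded.
$word = "этот";

// \P{L} matches any character that is NOT a Unicode letter, so a string
// made only of letters contains no delimiter and comes back whole.
$parts = preg_split('/(\P{L})/u', $word, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts); // one element: "этот"

// Illustrative phrase: split on runs of non-letters, dropping empty pieces.
$words = preg_split('/\P{L}+/u', "этот тест", -1, PREG_SPLIT_NO_EMPTY);
print_r($words); // two elements: "этот", "тест"
```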

Expected result:
----------------
Array
(
    [0] => этот
)

Actual result:
--------------
Array
(
    [0] =>
    [1] => э
    [2] =>
    [3] => т
    [4] =>
    [5] => о
    [6] =>
    [7] => т
    [8] =>
)


------------------------------------------------------------------------

