ID: 44418
Updated by: [EMAIL PROTECTED]
Reported By: yarodin at gmail dot com
-Status: Open
+Status: Bogus
Bug Type: PCRE related
Operating System: Windows XP PRO/5.1.2600
PHP Version: 5.2.5
New Comment:
If the input is UTF-8, you need to use the 'u' modifier (e.g.
'#(\s)#u').
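A minimal sketch of the difference, assuming the words are separated by
single spaces (the sample string and the variable names are illustrative,
not taken from the report). Without 'u' the pattern is matched byte by
byte, so on some platforms and locales the A0h byte of 'Р' (D0h A0h) may
be treated as whitespace by \s. With 'u' the subject is matched as UTF-8
characters, so 'Р' stays intact:

```php
<?php
// Illustrative input: three Russian words separated by single spaces (assumption).
$value = "Расширенные поля пользователей";
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

// Byte mode: \s is tested against single bytes, so the result can
// depend on the PCRE build and the current locale.
$bytewise = preg_split('#(\s)#', $value, -1, $flags);

// UTF-8 mode: the 'u' modifier makes PCRE treat pattern and subject as UTF-8.
$utf8 = preg_split('#(\s)#u', $value, -1, $flags);

print_r($bytewise);
print_r($utf8);
```

With the 'u' modifier this should give the 5 elements the reporter
expects (three words and two captured spaces). The result without it
depends on the environment.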
Previous Comments:
------------------------------------------------------------------------
[2008-03-12 16:00:19] yarodin at gmail dot com
Description:
------------
$split = preg_split('#(\s)#', $value, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
splits sentences into words incorrectly when the sentence is Russian
UTF-8 text and begins with the Russian letter 'Р' (hex D0h A0h). For
example, the Russian string
"Расширенные
поля
пользователей"
is split by PHP 5.2.5 into 7(!) elements, but PHP 4 splits it correctly
into 5. I think the problem is the Russian letter 'Р', which is split
off as a single word.
Reproduce code:
---------------
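<?php
// Russian UTF-8 sample; the first letter 'Р' is the bytes D0h A0h.
$value="Расширенные
поля
пользователей";
header('Content-type: text/html; charset=utf-8');
print_r($value."<BR><BR><BR>");
// Split on whitespace and keep the delimiters in the result.
$split = preg_split('#(\s)#', $value, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
print_r($split);
?>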
Expected result:
----------------
Array ( [0] =>
Расширенные
[1] => [2] => поля [3] => [4] =>
пользователей
)
Actual result:
--------------
Array ( [0] => Р [1] => [2] =>
асширенные
[3] => [4] => поля [5] => [6] =>
пользователей
)
------------------------------------------------------------------------