Edit report at https://bugs.php.net/bug.php?id=62562&edit=1

 ID:                 62562
 User updated by:    magog dot the dot ogre at gmail dot com
 Reported by:        magog dot the dot ogre at gmail dot com
 Summary:            preg_replace mangles UTF8 string - Windows only
-Status:             Feedback
+Status:             Open
 Type:               Bug
 Package:            *Regular Expressions
 Operating System:   Windows x86
 PHP Version:        5.3.14
 Block user comment: N
 Private report:     N

 New Comment:

pcretest doesn't actually perform replacements; it only performs matches. I'm not
sure how I would run pcretest on this.
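pcretest cannot do replacements, but the replacement itself is not really the question. The bug is about which bytes `\s` matches, and a match-only test can show that. The following is a minimal PHP sketch of one way to check this (an assumed approach, not a procedure from this report). It lists every `\s+` match in a shortened excerpt of the test string, together with its byte offset and raw bytes. On an unaffected platform, only ASCII whitespace should appear.

```php
<?php
// Diagnostic sketch: report what \s+ matches instead of replacing it.
// $text is a shortened excerpt of the test string from this report.
$text = "{{ინფორმაცია | აღწერა   = საზღვარი";
// PREG_OFFSET_CAPTURE stores each match as array(matched text, byte offset).
preg_match_all('/\s+/', $text, $matches, PREG_OFFSET_CAPTURE);
foreach ($matches[0] as $m) {
    // bin2hex exposes the raw bytes; only ASCII whitespace (20, 09, 0a, ...)
    // should show up here, never a byte such as a0 from inside a letter.
    printf("offset %d: %s\n", $m[1], bin2hex($m[0]));
}
```

The same pattern and subject could then be given to pcretest as a plain match, which would show whether libpcre alone reproduces the extra matches.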


Previous Comments:
------------------------------------------------------------------------
[2012-07-14 02:44:58] ras...@php.net

This is unlikely to be a native PHP issue. Can you perform a similar test using 
the pcretest program from pcre.org? If you can reproduce it with that, it takes 
PHP completely out of the picture and you would need to file it against libpcre.

------------------------------------------------------------------------
[2012-07-14 01:44:35] magog dot the dot ogre at gmail dot com

Please note that I am aware that applying a regex without the "u" modifier to 
non-ASCII text is discouraged. HOWEVER, the behavior should still not differ 
between Windows and Unix.
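For comparison, here is a minimal sketch of the test script with the "u" modifier added. With "u", PCRE treats the pattern and subject as UTF-8 and matches whole characters. As a result, `\s` cannot match a single byte from inside a multibyte Georgian letter. This is an assumed workaround that has not been verified on the reporter's Windows build. Note also that with "u", preg_replace() returns NULL if the subject is not valid UTF-8.

```php
<?php
// Same pattern as the test script plus the "u" modifier, so PCRE works on
// UTF-8 characters rather than on individual bytes.
$text = "{{ინფორმაცია | აღწერა   = საზღვარი";
$result = preg_replace('/\s+/u', ' ', $text);
// Runs of ASCII whitespace collapse to one space; Georgian letters stay intact.
echo $result, "\n";
```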

------------------------------------------------------------------------
[2012-07-14 01:42:23] magog dot the dot ogre at gmail dot com

Description:
------------
In limited circumstances, PHP mangles certain UTF-8 strings on Windows. The 
same issue does not appear on SunOS, and probably not on Linux either (I would 
have to reboot to double-check, but I have never seen it in the many times I've 
run the script on Ubuntu).

Test script:
---------------
<?php
$text = "{{ინფორმაცია | აღწერა   = საზღვარი განარჯიის მუხურთან | წყარო    =  | თარიღი   =  | ავტორი    = [[მომხმარებელი:lika";
echo preg_replace("/\s+/", " ", $text);

Expected result:
----------------
Expected result, observed on SunOS i386 with PHP 5.3.8 (the surrounding quotes 
are not part of the output):
"{{ინფორმაცია | აღწერა = საზღვარი განარჯიის მუხურთან | წყარო = | თარიღი = | ავტორი = [[მომხმარებელი:lika"

Actual result:
--------------
Observed on Windows 7 (WOW64) with PHP 5.3.14 (the surrounding quotes are not 
part of the output):
"{{ინფო▒ მაცია | აღწე▒ ა = საზღვა▒ ი განა▒ ჯიის მუხუ▒ თან | წყა▒ ო = | თა▒ იღი = | ავტო▒ ი = [[მომხმა▒ ებელი:lika"
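Every mangled position in the actual result is a spot where the letter რ (U+10E0) appeared in the input. In UTF-8, that letter is the three bytes E1 83 A0, and 0xA0 is the no-break space in Latin-1/Windows-1252. One plausible explanation, which this report does not confirm, is that the Windows build's byte-level `\s` class includes 0xA0 when the "u" modifier is absent. In that case, `\s+` would swallow the last byte of რ and leave the truncated E1 83 (shown as ▒) followed by a space. The sketch below simulates that hypothesis on any platform by adding `\xA0` to the class explicitly. The `[\s\xA0]` pattern is an assumption used purely for illustration.

```php
<?php
// Hypothetical simulation: treat byte 0xA0 as whitespace, as the Windows
// build is suspected (not confirmed) to do without the "u" modifier.
$word = "ინფორმაცია";
// The Georgian letter "რ" (U+10E0) is encoded as E1 83 A0 in UTF-8.
echo bin2hex("რ"), "\n"; // e183a0
// Single-quoted pattern: PCRE itself interprets \xA0 as the byte 0xA0.
$mangled = preg_replace('/[\s\xA0]+/', ' ', $word);
// Prints "ინფო", the dangling bytes E1 83, a space, then "მაცია",
// matching the "ინფო▒ მაცია" seen in the Windows output.
echo $mangled, "\n";
```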



------------------------------------------------------------------------


