Edit report at https://bugs.php.net/bug.php?id=62562&edit=1
ID: 62562
User updated by: magog dot the dot ogre at gmail dot com
Reported by: magog dot the dot ogre at gmail dot com
Summary: preg_replace mangles UTF8 string - Windows only
-Status: Feedback
+Status: Open
Type: Bug
Package: *Regular Expressions
Operating System: Windows x86
PHP Version: 5.3.14
Block user comment: N
Private report: N
New Comment:
I have Perl itself installed; does it use PCRE? Sorry for my n00b questions. If
so, I will run a test there shortly.
Previous Comments:
[2012-07-14 03:12:27] ras...@php.net
hrm.. how about finding something else that links against pcre and runs on
Windows that might be able to do a replace? Like Python perhaps?
I still doubt this has anything to do with PHP. We don't mangle anything going
into or coming out of pcre.
[2012-07-14 03:08:15] magog dot the dot ogre at gmail dot com
pcretest doesn't actually perform replacements: it only does matches. I'm not
sure how I would run pcretest on this.
[2012-07-14 02:44:58] ras...@php.net
This is unlikely to be a native PHP issue. Can you perform a similar test using
the pcretest program from pcre.org? If you can reproduce it with that then it
takes PHP completely out of the picture and you would need to file it against
libpcre.
[2012-07-14 01:44:35] magog dot the dot ogre at gmail dot com
Please note that I am aware that using a regex without the "u" modifier with
non-standard characters is discouraged. HOWEVER, it is still bad for there to be
different behavior in Windows than in Unix.
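For reference, a minimal sketch of the same whitespace collapse with the "u" modifier, which makes PCRE treat the pattern and subject as UTF-8, so \s is matched against whole characters rather than individual bytes of a multibyte sequence. The helper name collapse_whitespace_utf8 and the sample string are made up for illustration and are not part of the report. Whether this avoids the Windows behavior described here is an assumption; nothing in this thread confirms it.

```php
<?php
// Hypothetical helper (not from the report): collapse runs of whitespace
// in a UTF-8 string using the "u" modifier.
function collapse_whitespace_utf8($text)
{
    // With /u, pattern and subject are interpreted as UTF-8.
    // Note: preg_replace() returns NULL if the subject is not valid UTF-8.
    return preg_replace('/\s+/u', ' ', $text);
}

// Illustrative input, not the reporter's string: U+10E0 (GEORGIAN LETTER RAE)
// is encoded in UTF-8 as the bytes E1 83 A0, so it contains the byte 0xA0.
$sample = "\xE1\x83\xA0  \xE1\x83\xA0";

// Print the result as hex so the check does not depend on console encoding.
echo bin2hex(collapse_whitespace_utf8($sample)), "\n";
```

Dumping the result with bin2hex() keeps the comparison independent of how the terminal or browser renders the Georgian text.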
[2012-07-14 01:42:23] magog dot the dot ogre at gmail dot com
Description:
In limited circumstances, PHP is mangling certain UTF8 strings in Windows. The
same issue is not appearing in SunOS, and probably not in Linux either (I would
have to reboot to double check that, but I've never seen the issue in the many
times I've run the script in Ubuntu).
Test script:
---
$text = "{{ááá¤áá áááªáá | áá¦á¬áá á =
á¡ááá¦ááá á ááááá á¯ááá¡ áá£á®á£á ááá |
á¬á§áá á= | ááá áá¦á = | ááá¢áá á=
[[áááá®ááá ááááá:lika";
echo preg_replace("/\s+/", " ", $text);
Expected result:
Expected result, observed on a SunOS, i386, PHP 5.3.8 (without quotes):
"{{ááá¤áá áááªáá | áá¦á¬áá á =
á¡ááá¦ááá á ááááá á¯ááá¡ áá£á®á£á ááá |
á¬á§áá á = | ááá áá¦á = | ááá¢áá á =
[[áááá®ááá ááááá:lika"
Actual result:
--
Observed result in Windows 7, WOW64, PHP 5.3.14 (without quotes):
"{{ááá¤áâ áááªáá |
áá¦á¬áâ á = á¡ááá¦ááâ á ááááâ á¯ááá¡
áá£á®á£â ááá | á¬á§áâ á = | ááâ áá¦á = |
ááá¢áâ á = [[áááá®ááâ
ááááá:lika"
--
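One way to narrow this down without relying on how the console renders the text is to compare the raw bytes before and after the replacement. The sketch below is only a diagnostic aid under stated assumptions: the helper is_valid_utf8 and the sample input are hypothetical, and the idea that a byte such as 0xA0 is being classified as whitespace by locale-dependent character tables is a hypothesis, not something established in this thread.

```php
<?php
// Hypothetical helper (not from the report): true if $s is well-formed UTF-8.
function is_valid_utf8($s)
{
    // An empty pattern with /u matches any well-formed UTF-8 subject;
    // preg_match() returns false when the subject is not valid UTF-8.
    return preg_match('//u', $s) === 1;
}

// Illustrative input: U+10E0 (bytes E1 83 A0), a space, U+10D0 (bytes E1 83 90).
$in  = "\xE1\x83\xA0 \xE1\x83\x90";

// Same pattern as in the test script, deliberately without the "u" modifier.
$out = preg_replace('/\s+/', ' ', $in);

// Passing "0" leaves the locale unchanged and returns the current setting.
echo 'LC_CTYPE: ', setlocale(LC_CTYPE, "0"), "\n";
echo 'input:    ', bin2hex($in), "\n";
echo 'output:   ', bin2hex($out), "\n";
echo 'valid:    ', var_export(is_valid_utf8($out), true), "\n";
```

If the input and output hex dumps differ on one platform but not the other, the differing bytes show exactly what \s matched there, and the LC_CTYPE line shows which locale was in effect at the time.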