Edit report at https://bugs.php.net/bug.php?id=62562&edit=1

 ID:                 62562
 Updated by:         paj...@php.net
 Reported by:        magog dot the dot ogre at gmail dot com
 Summary:            preg_replace mangles UTF8 string - Windows only
 Status:             Analyzed
 Type:               Bug
 Package:            *Regular Expressions
 Operating System:   Windows x86
 PHP Version:        5.3.14
 Block user comment: N
 Private report:     N

 New Comment:

It is set as analyzed, not resolved.

Can you try to compile PHP using the bundled PCRE instead of the system one,
please?


Previous Comments:
------------------------------------------------------------------------
[2012-07-22 20:28:38] magog dot the dot ogre at gmail dot com

Just curious: why was this marked as solved?

------------------------------------------------------------------------
[2012-07-16 15:38:10] a...@php.net

By the way, the PCRE version reported by PHP is 8.12, but the current release
is 8.30. Maybe a simple upgrade would solve this.

------------------------------------------------------------------------
[2012-07-16 15:19:54] a...@php.net

I've tested your PHP snippet on Windows 7, but it's probably the same on any
Windows version. The behaviour is as you describe. But there is another point:
the string to be matched is hardcoded into the script as UTF-8. If you open
that file in a single-byte (ASCII) mode, you'll see each byte; see here (saved
to a file because the bug tracker ruins the display):
http://belsky.info/phpz/bugz/62562/62562_3.txt

Switch the encoding to UTF-8 in your browser and then to a non-multibyte one.
Another way to do that is to open the file under Linux with

vim -c 'set encoding=latin1' 62562_3.txt

In both cases one can see that one byte is interpreted as a space. Since the
pattern has no UTF-8 modifier, this behaviour is expected; furthermore, Windows
seems to do it right :)
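
To make the byte-level effect concrete, here is a minimal sketch in Python
(not PHP, and not the reporter's snippet). It assumes a byte-oriented matcher
whose whitespace class includes 0xA0, the Latin-1 non-breaking space; that
class is spelled out by hand here rather than taken from PCRE's tables. The
sample string and the helper names are made up for illustration.

```python
import re

# "à" is U+00E0, which UTF-8 encodes as the two bytes 0xC3 0xA0.
text = "voilà tout"
raw = text.encode("utf-8")

# Assumption: byte-wise whitespace class that also contains 0xA0.
BYTE_SPACE = re.compile(rb"[ \t\n\r\f\v\xa0]+")

# Character-wise matching on the decoded string, for comparison.
CHAR_SPACE = re.compile(r"\s+")


def collapse_bytes(data: bytes) -> bytes:
    """Collapse whitespace runs byte by byte (no UTF-8 awareness)."""
    return BYTE_SPACE.sub(b" ", data)


def collapse_chars(s: str) -> str:
    """Collapse whitespace runs character by character."""
    return CHAR_SPACE.sub(" ", s)


mangled = collapse_bytes(raw)   # the 0xA0 half of "à" is swallowed
intact = collapse_chars(text)   # "à" is left alone

try:
    mangled.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
```

In the byte-wise case the trailing byte of "à" is consumed as whitespace, so
the result is no longer valid UTF-8; the character-wise case leaves the string
unchanged.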

I've also debugged this under VS, and it is definitely something coming back
from PCRE itself. At http://lxr.php.net/xref/PHP_5_4/ext/pcre/php_pcre.c#621
count is > 0, so matched is incremented and eventually returned. Nevertheless,
it could be a locale thing forcing PCRE to do UTF-8, but I actually don't see
any locale-dependent places in PCRE. Booting Linux with the C locale might
reproduce this there as well, but I have no such machines.

------------------------------------------------------------------------
[2012-07-16 01:39:06] magog dot the dot ogre at gmail dot com

Yeah, it works on SunOS and Ubuntu for me too.

Well if/when you get access to a Windows distro or another developer who has 
one comes along, then I guess you can work on this bug. :)

------------------------------------------------------------------------
[2012-07-15 22:43:01] ras...@php.net

Well, I have looked at the code. We take the raw binary string and pass it 
straight to PCRE both on Windows and UNIX. So something along the way isn't the 
same. But I am not a Windows guy, so I can't help you on the Windows side of 
things. It works fine on my Linux box here.

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=62562

