Once again, you're trying to work with bytes inside Unicode strings,
which just does not make sense. What do you propose we do, somehow
automatically detect that you used \x inside a Unicode string and
turn it into a binary one? Or simply allow one to stick any byte
sequence inside what is supposed to be a valid UTF-16 string?
If you're trying to generate a UTF-8 string on a byte by byte basis,
then it needs to be a binary string, I'm sorry. Whether you do this
by running in unicode.semantics=off mode or by using the b"" prefix is up
to you.
-Andrei
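A rough illustration of the byte-by-byte case described above. This is only a sketch: it assumes a released PHP (5, 7 or 8), where every string is already a sequence of bytes, so no b"" prefix is needed. Under unicode.semantics=on the literal would presumably have to be written with the b"" prefix, which is not tested here. The variable name is illustrative.

```php
<?php
// Sketch only: assumes a PHP build where strings are byte strings.

// Build the UTF-8 encoding of U+0105 (ą) one byte at a time.
$bytes = "";
$bytes .= "\xC4";   // lead byte
$bytes .= "\x85";   // continuation byte

var_dump(strlen($bytes));                // int(2): strlen counts bytes
var_dump(bin2hex($bytes));               // string(4) "c485"
var_dump(preg_match('/^.$/u', $bytes));  // int(1): with /u the two bytes decode to one code point
```

The point of the sketch is that the two bytes only become a single character once something (here the /u modifier) interprets them as UTF-8.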
unicode.fallback_encoding => 'utf-8' => 'utf-8'
unicode.filesystem_encoding => no value => no value
unicode.http_input_encoding => 'utf-8' => 'utf-8'
unicode.output_encoding => 'utf-8' => 'utf-8'
unicode.runtime_encoding => 'utf-8' => 'utf-8'
unicode.script_encoding => 'utf-8' => 'utf-8'
unicode.semantics => On => On
unicode.stream_encoding => UTF-8 => UTF-8
--- test.php ---
<?php
$string1 = "ą";
$string2 = "\xC4\x85";
var_dump($string1 == $string2);
var_dump(preg_match("/[\240-\377]/",$string1));
var_dump(preg_match("/[\240-\377]/",$string2));
?>
---
ą is encoded in UTF-8 (LATIN SMALL LETTER A WITH OGONEK, U+0105, Latin
Extended-A range).
It consists of two bytes, 0xC4 0x85.
Expected result, and actual result for PHP 5.2.0:
---
bool(true)
int(1)
int(1)
---
The "/[\240-\377]/" range (bytes 0xA0-0xFF) should match the 0xC4 byte.
Actual result (PHP6):
---
bool(false)
int(0)
int(1)
---
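One plausible reading of the difference between the two results, sketched with PCRE in a released PHP: without the /u modifier the character class is a range of bytes, while with /u it is a range of code points. This only emulates the two interpretations and is not the PHP 6 implementation. The variable names and the second test string are assumptions made for illustration.

```php
<?php
// Sketch only: /u switches PCRE from byte matching to code point matching.

$ogonek   = "\xC4\x85";          // UTF-8 bytes of U+0105 (ą)
$twoChars = "\xC3\x84\xC2\x85";  // UTF-8 bytes of U+00C4 followed by U+0085

// Byte semantics: the class covers bytes 0xA0-0xFF, so the 0xC4 lead byte matches.
var_dump(preg_match("/[\240-\377]/", $ogonek));          // int(1)

// Code point semantics: the class covers U+00A0-U+00FF.
var_dump(preg_match('/[\x{A0}-\x{FF}]/u', $ogonek));     // int(0): U+0105 is outside the range
var_dump(preg_match('/[\x{A0}-\x{FF}]/u', $twoChars));   // int(1): U+00C4 is inside the range

// The two strings are different sequences, so they do not compare equal.
var_dump($ogonek === $twoChars);                         // bool(false)
```

If "\xC4\x85" inside a Unicode string is taken as the two code points U+00C4 and U+0085 rather than as two bytes, the last three lines reproduce the bool(false), int(0), int(1) pattern reported above.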
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php