Once again, you're trying to work with bytes inside Unicode strings, which just does not make sense. What do you propose we do: automatically detect that you used \x inside a Unicode string and turn it into a binary one? Or simply allow any byte sequence to be stuck inside what is supposed to be a valid UTF-16 string?

If you're trying to generate a UTF-8 string on a byte-by-byte basis, then it needs to be a binary string, I'm sorry. Whether you do this by running with unicode.semantics=off or by using the b"" prefix is up to you.

-Andrei

unicode.fallback_encoding => 'utf-8' => 'utf-8'
unicode.filesystem_encoding => no value => no value
unicode.http_input_encoding => 'utf-8' => 'utf-8'
unicode.output_encoding => 'utf-8' => 'utf-8'
unicode.runtime_encoding => 'utf-8' => 'utf-8'
unicode.script_encoding => 'utf-8' => 'utf-8'
unicode.semantics => On => On
unicode.stream_encoding => UTF-8 => UTF-8

--- test.php ---
<?php
$string1 = "ą";
$string2 = "\xC4\x85";
var_dump($string1 == $string2);
var_dump(preg_match("/[\240-\377]/",$string1));
var_dump(preg_match("/[\240-\377]/",$string2));
?>
---

ą is encoded in UTF-8 (LATIN SMALL LETTER A WITH OGONEK, U+0105, Latin Extended-A block).
It consists of two bytes: 0xC4 0x85.
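The two bytes follow from the UTF-8 rule for code points U+0080 to U+07FF: the pattern is 110xxxxx 10xxxxxx, with the top five bits of the 11-bit code point in the first byte and the low six bits in the second. A minimal sketch of that arithmetic; utf8TwoByte() is a hypothetical helper written for illustration, not a PHP built-in:

```php
<?php
// Hypothetical helper: encode a code point in the two-byte UTF-8 range
// (U+0080..U+07FF) as a raw byte string.
function utf8TwoByte(int $cp): string
{
    if ($cp < 0x80 || $cp > 0x7FF) {
        throw new InvalidArgumentException('only U+0080..U+07FF is handled here');
    }
    // First byte: 110 marker plus the top 5 bits (0x105 >> 6 == 0x04 -> 0xC4).
    // Second byte: 10 marker plus the low 6 bits (0x105 & 0x3F == 0x05 -> 0x85).
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
}

$enc = utf8TwoByte(0x105);      // U+0105, ą
echo bin2hex($enc), "\n";       // prints c485
var_dump($enc === "\xC4\x85");  // bool(true)
```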

Expected result (also the actual result with PHP 5.2.0):
---
bool(true)
int(1)
int(1)
---
The range "/[\240-\377]/" (octal, i.e. bytes 0xA0 to 0xFF) should match the 0xC4 byte.

Actual result (PHP 6):
---
bool(false)
int(0)
int(1)
---
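One way to read the PHP 6 output, assuming (as Andrei's reply suggests) that in Unicode mode "\xC4\x85" denotes the two code points U+00C4 U+0085 rather than two raw bytes. Because unicode.semantics does not exist in current PHP, the sketch below emulates both readings with PHP 7.0+ escapes (\x for bytes, \u{} for code points) and uses the PCRE /u modifier to switch matching from bytes to code points. The variable names are illustrative only:

```php
<?php
$a          = "\u{105}";       // ą as UTF-8: bytes C4 85
$bytes      = "\xC4\x85";      // two raw bytes (the PHP 5 / binary-string reading)
$codepoints = "\u{C4}\u{85}";  // code points U+00C4 U+0085 (the PHP 6 Unicode reading)

var_dump($bytes === $a);       // bool(true)
var_dump($codepoints === $a);  // bool(false), like the PHP 6 comparison

// Byte-oriented match (no /u): byte 0xC4 lies within 0xA0..0xFF.
var_dump(preg_match("/[\240-\377]/", $a));                 // int(1)

// Code-point-oriented match (/u): U+0105 lies outside U+00A0..U+00FF...
var_dump(preg_match('/[\x{A0}-\x{FF}]/u', $a));            // int(0)
// ...while U+00C4 lies inside it.
var_dump(preg_match('/[\x{A0}-\x{FF}]/u', $codepoints));   // int(1)
```

The last two calls mirror the reported PHP 6 results for $string1 and $string2.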

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
