Re: [PHP-DEV] What is the use of "unicode.semantics" in PHP 6?

Andrei Zmievski Mon, 09 Jul 2007 11:42:33 -0700

Once again, you're trying to work with bytes inside Unicode strings, which just does not make sense. What do you propose we do, somehow automatically detect that you used \x inside a Unicode string and turn it into a binary one? Or simply allow one to stick any byte sequence inside what is supposed to be a valid UTF-16 string?

If you're trying to generate a UTF-8 string on a byte-by-byte basis, then it needs to be a binary string, I'm sorry. Whether you do this via being in unicode.semantics=off mode or via using the b"" prefix is up to you.
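For illustration, a minimal sketch of the b"" route. As far as I know, the b prefix is accepted (and has no effect) on PHP 5.2.1 and later, so the snippet below also runs on releases without unicode.semantics; the outputs in the comments are for such a release, not for a PHP 6 build.

```php
<?php
// Build the UTF-8 encoding of U+0105 (a with ogonek) byte by byte.
// The b prefix marks the literal as a binary string.
$bytes = b"\xC4\x85";

var_dump(strlen($bytes));   // int(2): two bytes
var_dump(bin2hex($bytes));  // string(4) "c485"

// A byte-range class applied to a binary subject sees the 0xC4 byte.
var_dump(preg_match(b"/[\240-\377]/", $bytes));  // int(1)
```

The point is only that both the pattern and the subject are treated as byte sequences here, so a byte range such as \240-\377 is meaningful.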


-Andrei

unicode.fallback_encoding => 'utf-8' => 'utf-8'
unicode.filesystem_encoding => no value => no value
unicode.http_input_encoding => 'utf-8' => 'utf-8'
unicode.output_encoding => 'utf-8' => 'utf-8'
unicode.runtime_encoding => 'utf-8' => 'utf-8'
unicode.script_encoding => 'utf-8' => 'utf-8'
unicode.semantics => On => On
unicode.stream_encoding => UTF-8 => UTF-8

--- test.php ---
<?php
$string1 = "ą";
$string2 = "\xC4\x85";
var_dump($string1 == $string2);
var_dump(preg_match("/[\240-\377]/",$string1));
var_dump(preg_match("/[\240-\377]/",$string2));
?>
---

ą is in UTF-8 (Latin small letter a with ogonek, Latin Extended-A range).

It consists of two bytes, 0xC4 0x85.

Expected result, and actual result for PHP 5.2.0:
---
bool(true)
int(1)
int(1)
---
The "/[\240-\377]/" range should match the 0xC4 byte.

Actual result (PHP6):
---
bool(false)
int(0)
int(1)
---
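One possible reading of the PHP 6 result, sketched so that it runs on PHP 7.0 or later: this assumes that, with unicode.semantics on, \xC4\x85 in a Unicode string denotes the two code points U+00C4 and U+0085 rather than two bytes, and that the character class is compared against code points. The /u modifier and the \u{...} escapes below stand in for that behaviour; they are not the PHP 6 mechanism itself.

```php
<?php
// Requires PHP 7.0+ for the \u{...} escape syntax.
$string1 = "\u{0105}";          // a with ogonek: one code point, U+0105
$string2 = "\u{00C4}\u{0085}";  // assumed meaning of "\xC4\x85" as code points

// Two different code point sequences, so they are not equal.
var_dump($string1 === $string2);                        // bool(false)

// With /u the class is a code point range, U+00A0..U+00FF.
var_dump(preg_match('/[\x{A0}-\x{FF}]/u', $string1));   // int(0): U+0105 is outside
var_dump(preg_match('/[\x{A0}-\x{FF}]/u', $string2));   // int(1): U+00C4 is inside
```

Under those assumptions the three values line up with the PHP 6 output quoted above.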

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

