On Nov 28, 2023, at 11:12, Claude Pache <[email protected]> wrote:
> On Nov 28, 2023, at 19:57, Hans Henrik Bergan <[email protected]> wrote:
>> With the dominance of UTF-8 (a fixed-endian encoding), surely no new
>> code should utilize any of declare(encoding='...') / zend.multibyte /
>> zend.script_encoding / zend.detect_unicode.
>> I propose we deprecate all 4.
>
> What is the migration path for legacy code that uses those directives?

Convert your PHP source files to UTF-8. These directives are only required for
code written in legacy multibyte encodings like Shift-JIS, Big5, or EUC-CN.
(These encodings are primarily used for Chinese and Japanese text.)

These directives are not required for scripts which *process* text in these
encodings. They're only required if the source code itself is in a legacy
multibyte encoding, as encodings such as Shift-JIS and Big5 can contain octets
in the printable ASCII range (0x20-0x7E) within multibyte sequences. For
example, the character "ボ" (U+30DC KATAKANA LETTER BO) is encoded in Shift-JIS
as 83 7B, whose second octet would ordinarily represent the ASCII character
"{". If this character appeared in a variable name, for instance, PHP would
need to recognize that the "7B" does not represent an opening brace.
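
To make the Shift-JIS example concrete, here is a small Python sketch (Python
only because its standard codecs make the octets easy to inspect; this is not
how PHP's lexer works). The helper name ascii_lookalikes and the sample source
line are made up for illustration.

```python
# Illustrative sketch only; PHP's lexer is not implemented this way.
BO = "\u30dc"  # KATAKANA LETTER BO

sjis = BO.encode("shift_jis")
print(sjis.hex(" "))  # 83 7b


def ascii_lookalikes(text, encoding):
    """Return the octets below 0x80 produced by non-ASCII characters.

    Hypothetical helper: any octet returned here is one that an
    encoding-unaware scanner could mistake for an ASCII character.
    """
    found = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # real ASCII characters are not lookalikes
        found.extend(b for b in ch.encode(encoding) if b < 0x80)
    return bytes(found)


# Hypothetical PHP-like line that uses the character in a variable name.
source = "$" + BO + " = 1;"

print(ascii_lookalikes(source, "shift_jis"))  # b'{'
print(ascii_lookalikes(source, "utf-8"))      # b''
```

In Shift-JIS the variable name contributes a stray 0x7B octet; in UTF-8 the
same line contributes none.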

>> With the dominance of UTF-8 (a fixed-endian encoding)

I'll add that what's special about UTF-8 isn't that it's "fixed-endian". It's
that UTF-8 encodes characters outside the ASCII range using only octets of 0x80
and above, so an octet below 0x80 always stands for the ASCII character it
appears to be. The parser therefore doesn't have to be specifically aware of
UTF-8 when processing source text.
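
That property of UTF-8 can be checked exhaustively. The Python sketch below
does so, and then counts "{" octets the way an encoding-unaware byte scanner
would. The function names and the sample line are assumptions made for
illustration, not anything taken from PHP itself.

```python
def utf8_is_ascii_transparent():
    """Check that no non-ASCII code point yields a UTF-8 octet below 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded in UTF-8
        if min(chr(cp).encode("utf-8")) < 0x80:
            return False
    return True


def count_brace_octets(source, encoding):
    """Count 0x7B octets as a scanner unaware of the encoding would see them."""
    return source.encode(encoding).count(b"{")


# Hypothetical source line containing exactly one real opening brace.
line = "function \u30dc() {"

print(utf8_is_ascii_transparent())            # True
print(count_brace_octets(line, "utf-8"))      # 1
print(count_brace_octets(line, "shift_jis"))  # 2
```

The byte-level count matches the real brace count in UTF-8, while the
Shift-JIS bytes contain one extra 0x7B that comes from the character "ボ".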