Maybe strings should be UTF-8 until declared otherwise or something,
because this just won't fly...

UTF-8 would not help you with bit fields (nobody guarantees that incoming data is valid UTF-8), and it's impractical to do any Unicode processing directly on UTF-8: you'd have to convert it to UTF-16 and back at every step.

I dunno.  Aren't there headers to indicate what kind of data is coming
in?

I know of no headers that can tell you "parameter 'foo' in a form is a bitmask so please do not try to see it as text".

If there aren't, or can't be, then you have to let ME tell you what it
is.

You can. Use binary strings and explicit conversions.
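
A minimal sketch of "binary strings and explicit conversions", written in Python 3 because its bytes/str split follows a similar model. The function names and sample values are hypothetical and are not PHP API:

```python
def parse_flags(raw: bytes) -> int:
    """Treat the raw bytes as a big-endian bit mask; never decode them as text."""
    return int.from_bytes(raw, "big")


def parse_text(raw: bytes, encoding: str) -> str:
    """Convert raw bytes to text only when the caller states the encoding."""
    return raw.decode(encoding)


flags = parse_flags(b"\xf0")              # binary data stays binary
name = parse_text(b"caf\xe9", "latin-1")  # explicit conversion to text

print(bin(flags))          # the bit pattern, untouched
print(bool(flags & 0x80))  # test the high bit
print(ascii(name))         # escaped so it prints on any terminal
```

The point is only that the caller, not the runtime, decides which values are text and which are bits.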

You can't just go assuming I've got UTF-16 data coming in --
especially not when the entire Internet has been built and subsisted
on ASCII (more or less) for over a decade.

Actually, there's an INI parameter that says which encoding the incoming data is in. That is not the problem. The problem is that PHP can't know that you are passing bit fields inside textual information (and in HTTP all parameters are textual), so you have to handle it manually.
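
To illustrate why a bit field inside a textual parameter needs manual handling, here is a rough Python 3 sketch. The query string and the parameter names `foo` and `name` are made up for illustration; only `urllib.parse.unquote` and `urllib.parse.unquote_to_bytes` are real library calls:

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical request: 'foo' carries a bit mask, 'name' carries text.
query = "foo=%F0&name=Stas"
pairs = dict(item.split("=", 1) for item in query.split("&"))

# Decoded as text (UTF-8 by default), the lone byte 0xF0 is invalid
# and gets replaced, so the bits are lost.
as_text = unquote(pairs["foo"])

# Decoded as bytes, the value survives unchanged.
as_bytes = unquote_to_bytes(pairs["foo"])

print(ascii(as_text))
print(as_bytes)
print(unquote(pairs["name"]))
```

Nothing in the request says which of the two decodings is right for `foo`, so the application has to choose.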

Anybody who actually NEEDS Unicode ought to be the ones who have to
type a new keyword or something, not the bazillion users who have no
need for Unicode and likely never will...

If they have no need for Unicode, why run Unicode-enabled PHP? Turn it off and all your strings stay untouched.

It's just an ASCII string, same as it's always been.

IS_STRING

If you need some new-fangled UTF-16 datatype stringie, then go ahead
and give yourself one.

IS_UNICODE

But don't change all MY data to UTF-16 when it isn't UTF-16!!!

Then you can't use Unicode mode, because in Unicode mode a text string is UTF-16. If something is not a text string, you have to say so; PHP doesn't have any way to know.

In what sane world do you suddenly declare all that data isn't ASCII
any more and claim that it's UTF-16 when UTF-16 isn't backwards
compatible with ASCII?

Python tried that. With Python 3 they are moving to the model PHP 6 uses. It must not be that silly an idea, I guess.
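
For reference, a small Python 3 sketch of the two facts being argued here: UTF-16 bytes do differ from ASCII bytes, and a text type kept separate from a byte type refuses to mix the two silently. The sample string is arbitrary:

```python
text = "abc"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

print(utf8_bytes == ascii_bytes)   # UTF-8 matches ASCII for these characters
print(utf16_bytes == ascii_bytes)  # UTF-16 does not
print(utf16_bytes)

# Text and binary data are distinct types, so mixing them is an error
# rather than a silent reinterpretation.
try:
    text + ascii_bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(mixed_ok)
```

The internal representation only matters at the boundaries, where bytes are converted to text and back.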

But now \xF0 isn't going to be ASCII 128 anymore, is it?

ASCII doesn't have any characters beyond 0x7F AFAIK, but it doesn't matter; I get what you mean. In Unicode mode, \xF0 would of course be U+00F0. How preg_match should handle it is up to preg_match.
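
A quick way to check those code point facts, sketched in Python 3 (this shows Unicode itself, not how PHP's preg_match behaves):

```python
import unicodedata

ch = "\xf0"  # in a text string this escape means code point U+00F0

print(hex(ord(ch)))
print(unicodedata.name(ch))

latin1_bytes = ch.encode("latin-1")  # one byte: 0xF0
utf8_bytes = ch.encode("utf-8")      # two bytes: 0xC3 0xB0

# ASCII stops at 0x7F, so this character cannot be encoded as ASCII.
try:
    ch.encode("ascii")
    fits_ascii = True
except UnicodeEncodeError:
    fits_ascii = False

print(latin1_bytes, utf8_bytes, fits_ascii)
```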
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
