On Wed, July 11, 2007 9:14 pm, Rasmus Lerdorf wrote:
> Richard, you are rather confused on this Unicode stuff.

I'm 100% certain we can all agree on that point. :-)

> The fact that
> PHP and ICU uses UTF-16 internally has absolutely nothing to do with
> what is exposed at the scripting level.

But somebody has just said that it will, didn't they?

That GPC data will be Unicode, and trying to use it as ASCII will break?

> The only things that will break in a standard application is stuff that
> relies on strings being binary.  Normal text passing back and forth
> between the browser and the server will work just fine.
>
> The breakages, apart from various bugs at this early stage, are limited
> to places where the code is expecting to see a binary string and PHP
> hasn't been able to determine this automatically.  And hopefully we can
> come up with ways to automatically determine when something should
> default to a binary string.
>
> But if you write:
>
> $a = "マニュアル";
> echo $a[1];

Whoa.

That was weird...

It was just a bunch of question marks when I read it, and now it's a
bunch of symbols (variants on afz mostly) in my reply...

> and you expect to have that spew out 0xe3, then yes, it will break
> because it will result in ニ which is what it really should do.
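
To make the byte-versus-character distinction concrete, here is a minimal sketch that runs on a released PHP, where strings are byte strings. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; mb_substr() only stands in for the character-level indexing described above and is not the proposed PHP 6 mechanism itself.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext/mbstring is available.
$a = "マニュアル";

// Byte-string semantics: offsets count bytes, so these are raw UTF-8 bytes.
$firstByte  = bin2hex($a[0]);
$secondByte = bin2hex($a[1]);

// Character semantics: offset 1 is the second character of the string.
$secondChar = mb_substr($a, 1, 1, 'UTF-8');

echo $firstByte, " ", $secondByte, "\n"; // e3 83
echo $secondChar, "\n";                  // ニ
```

Under byte semantics the offsets land inside the three-byte UTF-8 encoding of the first character; under character semantics offset 1 is the second character.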

You have me beat at the "...if you write" part, because I have no idea
how to make my keyboard make those symbols... :-v

My only concern is that:

http://example.com/?foo=bar
echo $_GET['foo'][1];
should still print out 'a' just like it always has.
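
A small sketch of why this case is the harmless one: for a pure ASCII value, every character is exactly one byte, so a byte offset and a character offset pick the same thing. parse_str() is used here only as a hypothetical stand-in for the request data, so the snippet can run outside a web server.

```php
<?php
// Hypothetical stand-in for $_GET on a request carrying the query string foo=bar.
parse_str('foo=bar', $params);

$value  = $params['foo'];
$first  = $value[0]; // 'b'
$second = $value[1]; // 'a'
$third  = $value[2]; // 'r'

echo $first, $second, $third, "\n"; // bar
```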

And:
http://example.com/?mask=100110
echo $_GET['mask'] & '110010';
should print out 100010 just like it always has.
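
For reference, the 100010 result depends on both operands being strings. The sketch below, with $mask as a hypothetical stand-in for $_GET['mask'], shows the two behaviours side by side on a released PHP: string & string works byte by byte on the character codes, while string & integer converts both sides to decimal integers first.

```php
<?php
$mask = '100110'; // hypothetical stand-in for $_GET['mask']

// Both operands are strings: AND is applied to each pair of bytes,
// and '0' (0x30) & '1' (0x31) is '0', so the digits behave like bits.
$asStrings = $mask & '110010';

// One operand is an integer: both sides are treated as decimal numbers.
$asIntegers = $mask & 110010;

echo $asStrings, "\n";  // 100010
echo $asIntegers, "\n"; // 99594
```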

Folks keep saying that bit-string manipulation makes no sense in
Unicode, and that's fine, I guess...

If a scripter is trying to do that, then see if the string is ASCII
[01]* and typecast it to binary string or whatever and just move on
with life in the old way.
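
In userland terms, the fallback suggested here might look roughly like the sketch below. bitstring_and() is a hypothetical helper name, not anything in PHP, and this says nothing about how the engine itself would detect the case; it only illustrates the "check for ASCII [01]*, then treat as a binary string" idea.

```php
<?php
/**
 * Hypothetical helper: AND two values as bit strings when both consist
 * only of the ASCII characters 0 and 1; return null otherwise so the
 * caller can decide how to handle non-bit-string input.
 */
function bitstring_and(string $left, string $right): ?string
{
    // The D modifier keeps $ from matching before a trailing newline.
    $isBits = '/^[01]*$/D';
    if (preg_match($isBits, $left) !== 1 || preg_match($isBits, $right) !== 1) {
        return null;
    }

    // Two PHP strings ANDed together are combined byte by byte.
    return $left & $right;
}

var_dump(bitstring_and('100110', '110010'));
var_dump(bitstring_and('マニュアル', '110010'));
```

The first call takes the old byte-wise path; the second is rejected because the left operand is not a bit string.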

> And yes, I know a lot of people reading this list don't care much for
> other charsets, but people reading an english mailing list are rather
> self-selecting.

I love the idea of users being able to write things in their own
language, and somehow it magically all just "looks right" when I slam
it into the database with mysql_real_escape_string and spew it back
out to the browser with htmlentities!

But it never quite seems to work out, in my limited experience,
because some software somewhere always manages to mangle it...

And I realize the whole point of Unicode in PHP 6 is to make PHP 6 not
be that piece of software that mangles it, and I'm sure you guys are
getting that bit right.  Well, I hope so anyway. :-)

I especially hope so, because if you don't get it right, I'll never be
able to tell, as I wouldn't notice the difference if it's broken or
not just by looking at the text in anything other than English.

I just get real concerned when it seems to me like a lot of scripts
are going to break, based on what folks who should know post here...

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
