On Fri, July 6, 2007 1:23 am, Stanislav Malyshev wrote:
>> You mean this will break:
>>
>> <?php
>>   $mask = 0xf0;
>>   $value = $_POST['foo'] & $mask;
>> ?>
>>
>> because of Unicode?
>
> I'd say it won't do what it did before. Though I'm not sure bit
> operations on unicode make any sense at all... The problem here is the
> requirement conflict - how PHP can possibly know if $_POST['foo'] is a
> bit field or unicode string?
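
To make the ambiguity concrete, here is a minimal sketch, assuming present-day PHP semantics where a string is a plain sequence of bytes. The literals stand in for $_POST['foo'], and the variable names are made up for illustration. The point is that `&` already behaves differently depending on whether the operands are numbers or strings, so the bytes inside the string matter.

```php
<?php
// Hypothetical stand-ins for $_POST['foo']; form input always arrives as a string.
$mask = 0xf0;

// A numeric string is converted to an integer before the AND.
$fromNumericString = "255" & $mask;    // 255 & 240

// When BOTH operands are strings, PHP ANDs them byte by byte instead.
$fromByteString = "\x61" & "\xf0";     // 0x61 & 0xf0

var_dump($fromNumericString);          // int(240)
var_dump(bin2hex($fromByteString));    // string(2) "60"
```

If the byte representation of the string changes, only the second form is affected directly, which is the case being argued about here.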

I'm starting to be quite concerned about PHP 6 Unicode, then...

Maybe strings should be UTF-8 until declared otherwise or something,
because this just won't fly...

As for how it knows?

I dunno.  Aren't there headers to indicate what kind of data is coming
in?

Should there be?

If there aren't, or can't be, then you have to let ME tell you what it
is.

You can't just go assuming I've got UTF-16 data coming in --
especially not when the entire Internet has been built and subsisted
on ASCII (more or less) for over a decade.

>> But if I haven't done something new-fangled to make a string be some
>> new-fangled Unicode thingie, then it's just plain old ASCII, no?
>>
>> Or PHP can just assume that anyway...
>
> It can't if we want to keep UTF-16. UTF-16 unlike UTF-8 is not
> compatible with ascii. We could have some "smart downgrade" attempt -
> Python 2 currently does something like this - but it won't work in all
> situations.

This is nuts.

Anybody who actually NEEDS Unicode ought to be the one who has to
type a new keyword or something, not the bazillion users who have no
need for Unicode and likely never will...

>> But an old script ought to just work...
>
> Sometimes it's not possible - if you use the same variable as string
> and bitfield, and bit representation of the string changes, it can't
> just work anymore, something needs to be done to bring them together.

It's just an ASCII string, same as it's always been.

Don't go changing that out from under users for the zillion lines of
code already written.

If you need some new-fangled UTF-16 datatype stringie, then go ahead
and give yourself one.

But don't change all MY data to UTF-16 when it isn't UTF-16!!!

You've got 10 YEARS of legacy data built up being managed by billions
of scripts.

In what sane world do you suddenly declare all that data isn't ASCII
any more and claim that it's UTF-16 when UTF-16 isn't backwards
compatible with ASCII?
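
A rough sketch of what "not backwards compatible" means at the byte level, assuming PHP 7 or later. The helper ascii_to_utf16le() is hypothetical and written by hand so that it needs no extension; it only handles ASCII input and is not how any PHP 6 build actually converts strings.

```php
<?php
// Hypothetical helper: encode an ASCII-only string as UTF-16LE by hand.
// Each input byte becomes one 16-bit little-endian code unit.
function ascii_to_utf16le(string $ascii): string
{
    if ($ascii === '') {
        return '';
    }
    $out = '';
    foreach (str_split($ascii) as $byte) {
        $out .= pack('v', ord($byte)); // 'v' = unsigned 16-bit, little endian
    }
    return $out;
}

$ascii = "foo";
$utf16 = ascii_to_utf16le($ascii);

echo bin2hex($ascii), "\n"; // 666f6f
echo bin2hex($utf16), "\n"; // 66006f006f00
```

The same text is three bytes in ASCII (and in UTF-8) but six bytes in UTF-16, with a zero byte after every character, so any code that looks at raw bytes sees different data.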

>>> Unicode code points can be defined with \u, but PHP6 breaks
>>> existing octal and hex escape sequences.
>
> I don't understand what this means...

I think I know...

I have code like this, somewhere:

if (preg_match("|[\xF0-\xFF]|", $data)){
  $data = un_microsuck($data);
}

un_microsuck() basically detects and converts any of the goof-ball
extended ASCII from MS products (Word, Outlook, etc) to an HTML
equivalent character.

But now \xF0 isn't going to be byte 240 anymore, is it?

Or maybe \xF0 will "work" but the octal \360 won't?
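
For reference, a small sketch of how those escapes behave when strings are plain bytes, which is the behaviour the code above relies on. The sample inputs are made up, and this says nothing about what a Unicode-enabled build would do.

```php
<?php
// Hex \xF0 and octal \360 are two spellings of the same single byte (240).
$hex   = "\xF0";
$octal = "\360";

var_dump($hex === $octal); // bool(true)
var_dump(ord($hex));       // int(240)

// Byte-oriented match (no /u modifier): any byte in 0xF0-0xFF triggers it.
$pattern = "|[\xF0-\xFF]|";
var_dump(preg_match($pattern, "plain ascii")); // int(0)
var_dump(preg_match($pattern, "caf\xF1"));     // int(1)
```

If either escape stopped producing a single byte, the character class would no longer describe the range it does today.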

Yikes.

You think PHP 5 adoption rate was slow?

PHP 6 will be GLACIAL if you're changing that much out from under people.

Changing the definition of a string, arguably the most basic data type
in PHP, is not a Good Idea.

I'm sorry not to have spoken up earlier -- I simply failed to
understand what it was anybody was talking about before. :-(

Cripes, now I have to be the curmudgeon who won't let go of PHP 5. :-(

-- 
Some people have a "gift" link here.
Know what I want?
I want you to buy a CD from some indie artist.
http://cdbaby.com/browse/from/lynch
Yeah, I get a buck. So?
