On 12 Mar 2010, at 20:15, Philip Olson wrote:
On Mar 12, 2010, at 10:46 AM, Stanislav Malyshev wrote:
Hi!
Yeah.
We tried it, and it simply didn't pan out (performance, bc, lost
interest, ..).
I think it is a bit premature to declare the death of Unicode in
PHP. Yes, we know there are problems, and yes, it was harder than
initially thought, so we may want to take a step back and rethink
it. Also we may want to get Unicode out of the way of other PHP
development, since it's taking longer than planned. But that
doesn't mean we should bury it.
How have other languages progressed down the Unicode road? Is there
anything we can learn from their progress over these past few years?
Of all the languages that I've had dealings with, only Python has
attempted anything like the previous PHP 6 attempt. Ruby's move to a
certain level of Unicode support in 1.9 is interesting, though I'm not
entirely sure it has been out for long enough to draw any real
conclusions about its uptake.
I think the most important thing learnt from the Python case is that
backwards compatibility is paramount. Even a break in backwards
compatibility that comes with programmatic conversion to the new
language version struggles to gain uptake, let alone what happened
with the old PHP6 branch, which would've broken large numbers of
applications with no way to programmatically convert code to it.
Python 2 had no problem getting uptake where Unicode strings need to
be specifically marked (e.g., u"foo" as opposed to "foo"), yet Python
3 (which can mostly be programmatically converted from Python 2) has
had comparatively little uptake due to its incompatibility.
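
To make the marking concrete, here is a minimal sketch. It assumes
Python 3.3 or later, where the u"" prefix is still accepted and b""
plays the role that an unmarked "foo" played in Python 2; the
describe() helper is hypothetical and exists only for illustration.

```python
# Sketch only: assumes Python 3.3+, where u"..." is still accepted.
# b"..." stands in here for Python 2's unmarked "..." byte string.

text = u"caf\u00e9"          # explicitly marked Unicode string
raw = text.encode("utf-8")   # the same text as a UTF-8 byte string


def describe(value):
    """Hypothetical helper: return (kind, length) for a string value."""
    kind = "unicode" if isinstance(value, str) else "bytes"
    return kind, len(value)


# The Unicode string is counted in code points, the byte string in bytes.
print(describe(text))
print(describe(raw))
```

The point is that the two kinds of string coexist, and code opts in to
Unicode one literal at a time.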
So, let me start with what I want to be true of PHP 6: anything that
runs under PHP 5.3 and does not throw any errors (with E_ALL |
E_DEPRECATED) must behave identically under PHP 6.
That single statement has quite a lot of consequences but, with
regard to Unicode, one matters more than any other: Unicode strings
cannot be the default. I have plenty of code that uses UTF-8 in some
strings and arbitrary binary data in others. I want to be able to move
to PHP 6 gradually: I shouldn't have to wait for every library I rely
upon to be modified for PHP 6 compatibility. I should just be able to
move to PHP 6, and look over my own code and change whichever strings
I want into Unicode strings.
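
As a rough illustration of why strings holding arbitrary binary data
cannot simply be treated as Unicode, here is a sketch in Python (not
PHP, and not a proposal for any PHP API). The as_text() helper and the
sample byte values are assumptions made up for the example.

```python
def as_text(data):
    """Hypothetical helper: decode bytes as UTF-8, or return None if invalid."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# A byte string that happens to hold UTF-8 encoded text.
utf8_bytes = "na\u00efve".encode("utf-8")

# Arbitrary binary data; 0x89 cannot start a UTF-8 sequence.
binary_blob = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0xFE])

for sample in (utf8_bytes, binary_blob):
    decoded = as_text(sample)
    if decoded is None:
        print("not valid UTF-8, must stay a byte string")
    else:
        print("valid UTF-8:", decoded)
```

Code that mixes both kinds of data needs byte strings to keep working
unchanged, with conversion to Unicode happening only where the author
asks for it.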
To point out what should be obvious to everyone here: one of the
biggest strengths of PHP is the large number of libraries and
applications already written for it. Making a large, backwards
incompatible change such as making Unicode strings the default would
not only limit adoption to those who have entirely new code, but also
alienate most shared-hosting providers, who cannot afford to break
their clients' applications.
If there's one thing I've learnt from working on browsers for the past
few years it's that backwards compatibility is more valuable than
something new and shiny. I have no doubt PHP needs Unicode support,
but I don't think that breaking backwards compatibility for it is the
right solution. The fact that PHP is deployed as it is, often in
shared hosting setups, should very much be a reason to be concerned
about backwards compatibility. A browser would get almost no market
share if it broke a large percentage of existing websites; I believe
the same to be true of PHP with the websites it powers.
--
Geoffrey Sneddon
<http://gsnedders.com/>
--
PHP Internals - PHP Runtime Development Mailing List