Hi Zeev,

On Wed, 2007-07-18 at 01:58 -0700, Zeev Suraski wrote:
> >Regarding the unicode on/off modes, I don't think you put yourself in
> >the developer's view at all. Users are not going to be better off having
> >to deal with both modes.
> 
> Well, I tend to agree with you that they shouldn't have to handle 
> BOTH modes (write code that works with both settings).  But they will 
> definitely be better off if they can choose one of these modes and 
> develop/deploy for it.
> 
> For someone for whom PHP 6 is a non-item (no interest in Unicode), 
> moving to PHP 6 and being forced to audit his code will be a 
> completely unreasonable cost of migration.  A clear 'not worth it' situation.

In my opinion the question here is: how much harm should we do to users
who develop new things in order to make life simpler for those who need
BC?

The first thing I see is that having these two modes is a pita for
everybody who wants to write portable code. Some parts of PHP behave
quite differently depending on that switch. Some of these differences
can be worked around quite simply, others are harder but still
manageable (since the engine still knows Unicode, you can still make it
all think there's some more Unicode in there). But for a new application
it's imo bad to need such compatibility hacks.

If you want clean code you might concentrate on one of these two modes -
but which? The faster or the better? Well, that depends on what hosters
will configure, but how should they know?

For hosters it's hard to decide which road to take. Offer both?
Complexity-wise, offering both is the same as hosting PHP 5 and PHP 6,
since unicode.semantics is PHP_INI_SYSTEM, meaning you need independent
PHP instances (FastCGI, individual hosts, whatever). Another possibility
is offering just PHP 6 with unicode.semantics Off. In my opinion a hoster
doing that can't really advertise PHP 6 support, since with that mode
off it's only offering half of PHP 6 (namespaces, goto, maybe LSB, ...).
The third option is offering PHP 6 with Unicode plus PHP 5 for BC. To me
this feels like the sanest way in terms of BC: on the one hand you have
full BC by using PHP 5, on the other hand you're offering full PHP 6 for
those who need this feature.

Talking about BC: apart from the Unicode stuff, PHP 6 will most likely
have around the same amount of BC breaks as PHP 5 had compared to PHP 4
(there are already a few tiny ones, like not being able to name your
functions "goto" anymore). PHP 5 offers a compatibility mode for PHP 4;
the benefit of that mode compared to PHP 6's BC mode is that it can even
be changed at runtime, which can help during the migration (while making
the code ugly, but hopefully such hacks are only temporary).

Another argument I read for that setting was performance. I haven't done
proper benchmarks comparing both modes, so I don't know how relevant the
impact is. But if the performance of the Unicode mode really is a
problem for most users, we are going to have a big problem, since then
we'd have to keep the switch forever. And I, who can really live with
ISO-8859-1, am wondering whether it makes sense to change half the
engine for a mode which is too slow for most cases and only needed by a
minority of users (some people in these discussions mentioned numbers
like 10% with Unicode mode on, 90% off ...), and whether it wouldn't be
better to concentrate on the intl and mbstring extensions to improve the
tools for those needing better support in this area, without harming
most users. But well, I'm only wondering here after reading the previous
discussions.

All this leads me to the conclusion that we really should consider
removing the switch, but well, that's just my opinion.

> > > As for ereg - especially in light of the discontinuation of PHP 4 we
> > > shouldn't even consider removing it in PHP 5.
> >
> >I don't think anybody wanted to remove it in PHP 5 - just make it
> >possible to disable as an extension.
> 
> Great, I misunderstood.

This gives me the opportunity to come back to the original topic of this
thread, which wasn't the unicode.semantics mode: since I think we should
remove that setting, I also think we should disable ereg in PHP 6,
because as far as I know ereg won't work with Unicode data. Regular
expression functions that don't work on the main string type are
pointless in my opinion.

Besides that, I see two other reasons:
- the ereg functions have been marked as deprecated for ages, so users
  should be prepared
- the ereg functions aren't binary safe. Most of the uses I've seen were
  most likely insecure, since people didn't know you can bypass
  ereg-based input checking by inserting null bytes, so removing them
  helps people write more secure code
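
A minimal sketch of the null byte issue, assuming a typical whitelist
check that would have been written as ereg('^[a-z0-9]+$', $input);
is_simple_identifier() is a made-up name for illustration. ereg() stops
reading the subject at the first NUL byte, so everything after "\0"
escapes the check, while preg_match() is binary safe and sees the whole
string:

```php
<?php
// Hypothetical helper: the preg version of a check that used to be
// written as ereg('^[a-z0-9]+$', $input).
// preg_match() is binary safe, so bytes after a NUL are validated too.
// The D modifier makes $ match only at the very end of the subject
// (without it, PCRE's $ also matches before a trailing newline).
function is_simple_identifier($input)
{
    return preg_match('/^[a-z0-9]+$/D', $input) === 1;
}

var_dump(is_simple_identifier("abc123"));        // bool(true)
var_dump(is_simple_identifier("abc\0<script>")); // bool(false)
```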

In most cases a workaround can be offered, via PHP_Compat or something
similar, by escaping slashes in the pattern, adding slashes as
delimiters and passing the result to preg. This won't work in all cases,
but I'm sure it works in most.
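
Roughly what such a shim could look like; ereg_to_preg_pattern() is a
hypothetical name, not an existing PHP_Compat function, and the comment
lists some of the cases where the simple escaping isn't enough:

```php
<?php
// Hypothetical helper (not part of PHP or PHP_Compat): convert an
// ereg()/eregi() pattern into a preg pattern by escaping '/' and
// wrapping the result in '/' delimiters. Known gaps:
//  - POSIX picks the leftmost-longest match, PCRE the leftmost-first one
//  - a backslash inside [...] is literal in POSIX but an escape in PCRE
//  - a pattern that already contains "\/" becomes "\\/" and breaks
function ereg_to_preg_pattern($pattern, $caseInsensitive = false)
{
    $escaped = str_replace('/', '\/', $pattern);
    return '/' . $escaped . '/' . ($caseInsensitive ? 'i' : '');
}

// ereg('^/usr/([a-z]+)$', '/usr/local', $regs) becomes:
$pattern = ereg_to_preg_pattern('^/usr/([a-z]+)$'); // '/^\/usr\/([a-z]+)$/'
preg_match($pattern, '/usr/local', $regs);
echo $regs[1], "\n"; // local

// eregi() maps to the 'i' modifier:
var_dump(preg_match(ereg_to_preg_pattern('^abc$', true), 'ABC')); // int(1)
```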


Ah, another thing kind of related to this thread: we really need a
proper way of declaring decisions as made. Recently it has happened
quite often that many developers thought some decision was made (for
example from reading the Paris meeting notes) and then some developers
came along and said nothing had been finally decided yet. But imo it's
important to decide some things (like the removal of possibly
often-used functionality) soon, so users can be informed and prepare
their code, and developers here can spend time on these tasks knowing
that they are following decisions. Maybe this should be discussed
separately from this thread, but it's a good example of the need...
(There might still be reasons to change a decision, but that shouldn't
happen too often.)

johannes

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php