Hi Zeev,

On Wed, 2007-07-18 at 01:58 -0700, Zeev Suraski wrote:
> >Regarding the unicode on/off modes, I don't think you put yourself in
> >the developer's view at all. Users are not going to be better of having
> >to deal with both modes.
>
> Well, I tend to agree with you that they shouldn't have to handle
> BOTH modes (write code that works with both settings). But they will
> definitely be better off if they can choose one of these modes and
> develop/deploy for it.
>
> For someone for whom PHP 6 is a non-item (no interest in Unicode),
> moving to PHP 6 and being forced to audit his code will be a
> completely unreasonable cost of migration. A clear 'not worth it' situation.
The question here, in my opinion, is: how much harm should we do to users who develop new things in order to make life simpler for those who need BC?

The first thing I see is that having these two modes is a PITA for everybody who wants to write portable code. PHP behaves differently depending on that switch, and some parts of it work quite differently. Some of these differences can be worked around quite simply, others not as easily, but it's still possible (since the engine still knows about Unicode, and you can still make it behave as if there were more Unicode in there). But for a new application it's, IMO, bad to need such compatibility hacks. If you want clean code you'll concentrate on one of the two modes, but which one? The faster or the better? Well, that depends on what hosters will configure, but how should they know?

For hosters it's hard to decide which road to take. Offer both? Offering both is, complexity-wise, the same as hosting PHP 5 and PHP 6, since unicode.semantics is PHP_INI_SYSTEM, meaning you need independent PHP instances (FastCGI, individual hosts, whatever). Another possibility is offering just PHP 6 with unicode.semantics Off. In my opinion, a hoster doing that arguably shouldn't advertise it as PHP 6, since with that mode off it's only offering half of PHP 6 (namespaces, goto, maybe LSB, ...). Or they offer PHP 6 with Unicode plus PHP 5 for BC. To me this feels like the sanest way in terms of BC: on the one hand you get full BC by using PHP 5, on the other hand you're offering full PHP 6 to those who need this feature.

Talking about BC: except for the Unicode stuff, PHP 6 will most likely have about the same amount of BC breaks as PHP 5 had compared to PHP 4 (there are already a few tiny ones, like no longer being able to name your functions "goto", and such). PHP 5 offers a compatibility mode for PHP 4; the benefit of that mode compared to PHP 6's BC mode was that one could change it even at runtime.
That might help with the migration (while making the code ugly, but hopefully such hacks are only temporary).

Another argument for that setting that I've read was performance. I haven't done proper benchmarks comparing both modes, so I don't know how relevant the impact is. But if the performance of the Unicode mode really is a big problem for most users, we are going to have a big problem, since then we'd have to keep the mode forever. And I, who can really live with using ISO-8859-1, am wondering whether it really makes sense to change half the engine for a mode which is too slow for most cases and only needed by a minority of users (some people in these discussions mentioned numbers like 10% Unicode mode on, 90% off ...), and whether it wouldn't be better to concentrate on the intl and mbstring extensions to improve the tools for those who need better support in this area, without harming most users. But well, as I said: here I'm just wondering after reading the previous discussions. All this leads me to the conclusion that we really should consider removing the mode, but well, that's my opinion.

> > > As for ereg - especially in light of the discontinuation of PHP 4 we
> > > shouldn't even consider removing it in PHP 5.
> >
> >I don't think anybody wanted to remove it in PHP 5 - just make it
> >possible to disable as an extension.
>
> Great, I misunderstood.

This gives me the opportunity to come back to the original topic of this thread, which wasn't the unicode.semantics mode: since I think we should remove that setting, I think we should disable ereg in PHP 6, because as far as I know ereg won't work with Unicode data. Regular expressions that don't work on the main data type are pointless, in my opinion.
Besides that, I see two other reasons:
- the ereg functions have been marked as deprecated for ages, so users should be prepared
- the ereg functions aren't binary safe; most of the uses I've seen were most likely insecure, since people didn't know you can bypass ereg-based input checking by inserting null bytes, so removing them helps people write more secure code

In most cases a workaround (via PHP_Compat or something similar) can be offered by escaping slashes in the pattern, adding slashes as delimiters and passing the result to preg. This won't work in all cases, but I'm sure it works in most.

Ah, another thing somewhat related to this thread: we really need a proper way of declaring decisions as made. Recently it has happened quite often that many developers thought some decision had been made (for example from reading the Paris meeting notes), and then other developers came and said nothing had been finally decided yet. But IMO it's important to decide some things (like the removal of possibly often-used functionality) soon, so users can be informed and prepare their code, and developers here can spend time on these tasks knowing that they are following decisions. Maybe this should be discussed separately from this thread, but it's a good example of the need... (There might be reasons to change a decision, but that shouldn't happen too often.)

johannes

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php