If Unicode were the solution, the PHP project was on the right page with 6.0. Sure, there remained work to do, but...
How long did it take to realize UTF-16 wasn't the end of the story? UCS-4 is the minimum to solve this, and we all agree that nobody in the Western world is going to spend 32 bits storing a single char, no way, no how.

The UTF-8 solution is probably the right answer... you keep 95% of existing char * behavior, and you gain international character representation. The only Unicode OS I can think of offhand is NT, and of course they hit the UCS-4 problem early. They found this out 15+ years ago.

Sure, UTF-8 doesn't appear as atomic, one Xword per char, but the existing library frameworks already contain most of the string processing that is required. There is no 16-bit network transmission API that I can think of, so you are still devolving to UTF-8 for client results.

Accepting, and preferring, UTF-8 as the representation of characters throughout PHP, recognizing UTF-8 for char-length representations, and so forth, would do wonders to move things forward. And 8-bit octet data can be kept in the same data structures. It is the straightforward answer, which is probably why Linux did not repeat Windows NT's decision, and adopted UTF-8.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
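
For anyone who wants to poke at the length and byte-processing points above, here is a rough sketch. It is in Python only for brevity, not because it reflects anything in the PHP source, and the helper names `describe` and `split_on_slash` are made up for illustration. It counts code points, UTF-8 bytes and UTF-16 code units for a string, then shows that a plain byte-oriented split on an ASCII delimiter still works on UTF-8 data.

```python
def describe(text):
    """Return (code points, UTF-8 bytes, UTF-16 code units) for text."""
    code_points = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    # Every UTF-16 code unit is 2 bytes; "utf-16-le" avoids adding a BOM.
    utf16_units = len(text.encode("utf-16-le")) // 2
    return code_points, utf8_bytes, utf16_units


def split_on_slash(raw):
    """Split raw UTF-8 bytes on '/' without decoding them first.

    This is safe because bytes below 0x80 never occur inside a
    multi-byte UTF-8 sequence.
    """
    return raw.split(b"/")


if __name__ == "__main__":
    # Escapes are used so the sample text survives any mail re-encoding.
    samples = ["PHP", "na\u00efve", "\u65e5\u672c\u8a9e", "\U0001F600"]
    for sample in samples:
        print(ascii(sample), describe(sample))

    raw = "caf\u00e9/\u65e5\u672c\u8a9e".encode("utf-8")
    print([part.decode("utf-8") for part in split_on_slash(raw)])
```

The last sample, U+1F600, sits outside the Basic Multilingual Plane, so it takes two UTF-16 code units (a surrogate pair) and four UTF-8 bytes. Neither encoding gives one fixed-size unit per character, which is why only UCS-4 offers that property.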