On Wed, Aug 25, 2004 at 03:45:24PM -0500, Jeff Licquia wrote:
As I understood it, this was the major objection. However, I've also heard that the patch works this way because of some performance considerations when handling unibyte with the multibyte code.
I've heard that too. Bottom line is that nobody who has to maintain the code in question wants to maintain two code paths with logic that should be identical except in character width.
It's my understanding that a proper multibyte implementation still uses fixed-width characters, just wider. Specifically, most people told me that it's futile to use UTF-8 Unicode internally; instead, UTF-8 input should be converted to UCS-2 for internal use and then manipulated as multibyte.
Interesting, since UCS-2 doesn't cover the whole Unicode space. I assume the patches are handling the necessary mapping?
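To make the coverage gap concrete, here is a minimal Python sketch (not code from the patch under discussion). The helper names fits_in_ucs2 and utf16_units are made up for illustration. It assumes UCS-2 means one 16-bit unit per character, so anything above U+FFFF cannot be stored without surrogate pairs (UTF-16) or a wider unit (UCS-4).

```python
def fits_in_ucs2(ch: str) -> bool:
    """Return True if a single code point fits in one 16-bit UCS-2 unit.

    Hypothetical helper: code points above U+FFFF are outside the Basic
    Multilingual Plane, and U+D800..U+DFFF are reserved for surrogates.
    """
    cp = ord(ch)
    return cp <= 0xFFFF and not (0xD800 <= cp <= 0xDFFF)


def utf16_units(ch: str) -> int:
    """Number of 16-bit units needed to store the code point in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2


# ASCII letter, accented Latin letter, euro sign, musical G clef (U+1D11E).
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        f"utf8_bytes={len(ch.encode('utf-8'))}",
        f"utf16_units={utf16_units(ch)}",
        f"fits_in_ucs2={fits_in_ucs2(ch)}",
    )
```

The last sample takes four bytes in UTF-8 and two 16-bit units in UTF-16, so a strictly fixed-width 16-bit representation has to either reject it or map it somewhere.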
Obviously, the question is: what to do with UTF-8 in external files?
Not just files; consider command-line arguments too.
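A rough Python sketch of the boundary conversion being described, under the assumption that external data (file contents or an argument) arrives as UTF-8 bytes and is turned into fixed-width code points for internal use. The names decode_external and encode_external are hypothetical and are not taken from any patch.

```python
def decode_external(raw: bytes, encoding: str = "utf-8") -> list[int]:
    """Decode external bytes into a list of code points (fixed-width internally).

    Raises UnicodeDecodeError if the bytes are not valid in the given encoding;
    a real tool would have to decide what to do with such input.
    """
    return [ord(ch) for ch in raw.decode(encoding)]


def encode_external(code_points: list[int], encoding: str = "utf-8") -> bytes:
    """Convert internal code points back to external bytes for output."""
    return "".join(chr(cp) for cp in code_points).encode(encoding)


# Stand-in for an argument as the program would receive it: raw bytes.
arg = "na\u00efve".encode("utf-8")

cps = decode_external(arg)
print(len(arg), "bytes ->", len(cps), "characters")

# Round trip back to the external representation.
assert encode_external(cps) == arg
```

With this split, character-level logic only ever sees the list of code points, and the variable-width encoding is confined to the two conversion functions.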
It may not be "the" patch, but it is "a" patch, and the lack of any other makes it "the" patch by default. Certainly the other distributions have been taking that approach.
It may be the patch by default, but that doesn't mean it will be included. Each distribution has to decide which ugly hacks it is willing to support. This isn't the first time that the choices have diverged.

Mike Stone

