Freddie Unpenstein wrote:
> My assertion here is basically this; ASCII text (defined here as
> characters 1-127) encode into UTF-8 as-is. Anything else in the 0-255
> set is considered binary, and should be encoded in its shortest
> multi-byte UTF-8 form. No more, and no less. Call it Glib encoding.

So, go do it and try to get it into glib if you wish. I don't see what
that has to do with g_utf8_* though, when it's apparently not UTF-8, nor
called that.

> I believe, that differs from the UTF-8 specification ONLY in the
> handling of the NULL byte, but then I've been avoiding dealing with
> UTF-8 for the most part for exactly this reason. When UTF-8 is a strict
> issue, I've been using higher-level scripted languages instead, that
> already deal with it natively. (And I'm not 100% certain, but I think
> that's essentially what they all do.)

False. XML doesn't do such nasty stuff. HTTP doesn't either. HTML
doesn't either. *Only* Java does. There's a reason standards are good,
and there's a reason people use standards.

> A "convert to UTF-8" function given a UTF-8 input with a 6-byte
> representation of the character 'A' would store the regular single-byte
> representation.

False. It fails with an error on an overlong representation of 'A'. If
it doesn't, it's a bug.

> I know it's a bit of a mind-bend from where Glib/GTK is right now with
> encodings, Glib/GTK developers don't like hearing from us lowly humans,
> and there's always resistance to change, but specifications often change
> when needed to meet practical requirements (no one has ever written a
> 100% perfect specification), and personally, changing the platform and
> established behaviour (much harder and more dangerous to attempt to do)
> to suit the UTF-8 specification in this rather trivial issue seems far
> more wrong than breaking the UTF-8 specification slightly for internal
> use only.
> (The key being the "for internal use only", all "convert to
> UTF-8" functions would still produce the strict interpretation with
> \0's) It seems furthermore to be more correct in this day and age to
> bend a rule like this that makes it SAFER by allowing the old
> NULL-terminated string handling to function, and not force programmers
> to deal specially with length specifiers, which happens to all too
> frequently be a great source of coding mistakes. This would also make it
> easier to migrate, for example, to UTF-16 at some point in time -
> everything will already be converting between UTF-8 to Glib-8, so
> transitioning to Glib-16 would be an entirely internal affair.

You're totally missing the point. Allowing an alternate nul
representation opens up security problems that are much harder to track
down than a crashing application. There's a lot of literature to read
about this already. I'm not going to continue this here.

behdad

> Fredderic

_______________________________________________
gtk-devel-list mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/gtk-devel-list
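
To make the overlong-form point concrete, here is a minimal sketch in plain C of what a strict decoder has to check. It is only an illustration under stated assumptions: `strict_utf8_decode` is a hypothetical helper, not a GLib function and not GLib's actual implementation. It accepts only the shortest encoding of each code point, so both an overlong 'A' and the two-byte alternate nul (0xC0 0x80) are reported as errors rather than silently normalized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of GLib): strictly decode one UTF-8
 * sequence from s[0..len). Returns the number of bytes consumed and
 * stores the code point in *out, or returns 0 if the input is invalid. */
static size_t strict_utf8_decode(const unsigned char *s, size_t len,
                                 uint32_t *out)
{
    /* Smallest code point that legitimately needs n bytes (index = n). */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    uint32_t cp;

    if (len == 0)
        return 0;

    if (s[0] < 0x80)                { n = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return 0;   /* stray continuation byte, or a 5/6-byte lead byte */

    if (n > len)
        return 0;   /* truncated sequence */

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min_cp[n])
        return 0;   /* overlong form, e.g. C1 81 for 'A' or C0 80 for nul */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;   /* outside Unicode range, or a surrogate */

    *out = cp;
    return n;
}
```

With this sketch, the single byte 0x41 decodes to U+0041 and a real nul byte decodes to U+0000, while 0xC1 0x81, 0xC0 0x80, and a 6-byte sequence starting with 0xFC all return 0. The security concern is that a decoder which accepts those forms lets the same character slip past any check that was done on the raw bytes.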
