> On Thu, Jan 10, 2002 at 01:28:53PM -0800, Edward Cherlin wrote:
> > > Hmm. Looks like Unicode language tags are a much better solution.
> >
> > Unicode language tags are heavily deprecated. Language tagging is
> > markup, and there is no point pretending you have plain text when
> > you mark languages.
>
> Heavily deprecated? They were only added to the main body of the
> standard in Unicode 3.1, which isn't a year old.

Their use is, and was, "strongly discouraged" from the very beginning,
some years ago. As somebody wrote, the UTC approved them "while
holding their noses" ;-)

> > If you want tagging in plain text, use a standard. As far as I can
> > tell, the best available standard for such things is XML, which
> > defines Unicode as its preferred character set.
>
> The reason these characters *exist* is for specifying the
> language where

NO! The ONLY reason these "plane 14" language tag characters exist is
political: some people involved with an IETF working group "threatened"
to put forward a particularly bad variant of UTF-8 in which illegal
UTF-8 sequences were used to mark the language. The "plane 14" language
tags proposal was made ONLY in order to kill that deviant form of UTF-8.
Nobody (except this IETF group) involved WANTED these "plane 14"
characters for any other reason.

The reason this IETF group "wanted" a "plain text" solution for this is
beyond me. The context was a new protocol, and it would make perfect
sense to use syntax in that protocol itself to mark the language (if
needed), as is done in some other Internet protocols.

As far as I am aware, the "plane 14" language tags are still
(fortunately) unused by any IETF group... (But I haven't followed what
they are doing.)

		/kent k

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/