Re: UTF-16 problems
Lisa Moore wrote:

> Jianping wrote:
>
> > only Oracle provides fully UTF-8 and UTF-16 support for RDBMS
>
> Whoa...let me interject, DB2 for OS/390 supports UTF-8 and UTF-16. And DB2
> for Intel, Unix, supported both much earlier. I cannot speak to Jianping's
> interpretation of "fully"

"Fully" here means following the UTF-8 and UTF-16 standards, including supplementary character support. From this point of view, I don't think DB2 supports them: I checked its latest online documentation, and it only supports UCS-2, with UTF-8 up to three-byte encoding. Checking other vendors' implementations, Microsoft only supports UTF-16, and Sybase supports UTF-16 but only UTF-8 up to three-byte encoding. That is how I came up with my claim. I have not done a full study of other database vendors, but I welcome anyone to challenge me on this.

> Shouldn't a war about UTF-8 be discussed on Unicore?

It should not be a war but rather a technical discussion. But as some people on this list took a wrong position to make personal or company attacks, I and Oracle have to defend against them.

Regards,
Jianping.

> Lisa

begin:vcard n:Yang;Jianping tel;fax:650-506-7225 tel;work:650-506-4865 x-mozilla-html:FALSE org:Server Gobalization Technology;Server Technology version:2.1 email;internet:[EMAIL PROTECTED] title:Senior Development Manager adr;quoted-printable:;;500 Oracle Packway=0D=0AM/S 659407;Redwood Shores;CA;94065; fn:Jianping Yang end:vcard
RE: UTF-16 problems
Toby,

I agree that there is a need to preserve standards. Oracle did not support surrogates. If you passed it a UTF-16 data stream, it would not be converted into proper UTF-8 encoding. At this juncture it should have fixed UTF8. This would have worked with the old data because it had no non-plane-0 codes. You would have had backwards compatibility.

This is the documentation:

"UTF8
The UTF8 character set encodes characters in one to three bytes. Surrogate pairs require six bytes.

AL32UTF8
The AL32UTF8 character set encodes characters in one to three bytes. Surrogate pairs require four bytes."

If asked to build a database for UTF-8 support, which do you think a DBA would use? Do they know what surrogates are, or whether they should be encoded with 4 or 6 bytes?

>I equate this issue identically to the Unicode Consortium's refusal
>to change UCD names even when they are incredibly misleading, as
>is the case with U+20A0 EURO CURRENCY SIGN.

Your point is well taken. I agree that the impact of changing the name to "OBSOLETE EURO CURRENCY SIGN" or something similar is far less than that of keeping it and confusing users.

The same applies to Oracle. The question is how to recover from a bad decision.

1) First explain the implications in the documentation. For example:

UTF8
The UTF8 character set encodes the first 65535 Unicode characters in one to three bytes as standard UTF-8 characters. Higher Unicode characters that use UTF-16 surrogate pairs require six bytes. This is a non-standard UTF-8 encoding that is used to produce data that sorts in the same sequence as UTF-16.

AL32UTF8
The AL32UTF8 character set encodes all Unicode characters in one to four bytes. This uses standard UTF-8 encoding for all Unicode characters. This sorts in UTF-32 (Standard Unicode code point) sort order.

2) In future releases change the name UTF8 to AL16UTF8. This should only affect the DBAs who build and maintain the databases. This will at least set the two on equal footing. This name change should not be a major compatibility impact. They could even make UTF8 an alias of AL16UTF8 for a few releases.

Carl

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
Sent: Monday, June 11, 2001 8:41 PM
To: Michael (michka) Kaplan
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: UTF-16 problems

Jianping Yang <[EMAIL PROTECTED]> wrote:

>>So far, I can claim that only Oracle provides fully UTF-8 and
>> UTF-16 support for RDBMS, but unfortunately, as we cannot change the existing
>> utf8 definition from Oracle 8i as backward compatibility, we have to use a new
>> character set name for it as AL32UTF8.

Michael (michka) Kaplan <[EMAIL PROTECTED]> wrote:

>As many have pointed out, THIS will cause more confusion than just about
>anything else. Tex is the only one who said anything but he is not the only
>one to believe you are seriously undermining the standard with this
>decision. It certainly does a lot to hurt interoperability.

Yes, it will cause confusion, however stability, and 100% backwards compatibility is an overriding concern. I'd choose a little confusion anytime if given the choice between confusion and breaking products that depend on you.

Just like systems build dependence on UCD character names, users of database systems build dependence on vendor naming conventions. Changing core API name references is not something that any responsible vendor would do without overwhelming support from their customer base, and since the database character set is chosen once per database installation, and is not visible to the average user, I see no overwhelming reason for Oracle to change this.

I admit, it is confusing at first, however they do have it well documented (and I can only assume it will be documented with even greater clarity in their 9i release where many additional Unicode features have been added), and they also support the true, correct UTF-8 definition as per ISO 10646 and TUS 3.0.

I equate this issue identically to the Unicode Consortium's refusal to change UCD names even when they are incredibly misleading, as is the case with U+20A0 EURO CURRENCY SIGN. This is obviously not the "Euro currency sign" regardless of its name. The description points to the appropriate character for the real sign. Oracle's had to do the same thing with their UTF8 character set to ensure backwards compatibility and stability - leave it as-is, but document very clearly that it may not be what the user expects, and point them to an alternative character set setting (AL32UTF8).

Toby.
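[Editorial illustration] The four-byte versus six-byte difference in the documentation Carl quotes can be made concrete with a short Python sketch. This is only an illustration: the function names are invented for this note, and the six-byte form is produced by encoding each UTF-16 surrogate separately as a three-byte sequence, which is an assumption based on the descriptions in this thread rather than on Oracle's actual code.

```python
def encode_standard_utf8(cp):
    """Standard UTF-8: one to four bytes per code point (the AL32UTF8 behavior described above)."""
    return chr(cp).encode("utf-8")


def encode_via_surrogates(cp):
    """Illustrative six-byte form: split a supplementary code point into its
    UTF-16 surrogate pair and encode each surrogate as a 3-byte sequence.
    ASSUMPTION: this mirrors the 'UTF8' behavior only as described in the thread."""
    if cp < 0x10000:
        # Plane-0 code points are encoded identically in both forms.
        return chr(cp).encode("utf-8")
    v = cp - 0x10000
    high = 0xD800 | (v >> 10)      # leading (high) surrogate
    low = 0xDC00 | (v & 0x3FF)     # trailing (low) surrogate
    # 'surrogatepass' lets Python emit the otherwise invalid surrogate bytes.
    return (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


if __name__ == "__main__":
    for cp in (0x20AC, 0x10000):
        std = encode_standard_utf8(cp)
        six = encode_via_surrogates(cp)
        print(f"U+{cp:04X}: standard={std.hex()} ({len(std)} bytes), "
              f"via surrogates={six.hex()} ({len(six)} bytes)")
```

For a plane-0 character such as U+20AC the two forms are byte-for-byte identical, which is why data containing no supplementary characters is unaffected by the choice.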
RE: UTF-16 problems
At Mon, 11 Jun 2001 15:43:42 -0700,
Carl W. Brown <[EMAIL PROTECTED]> wrote:

> At first I thought the same thing but I have changed my mind. There are
> problems but the problems are with UTF-16 not UTF-8.

I don't think your new UTF-16 proposal solves any problem. It's yet another encoding. It won't replace the existing UTF-16.

The right thing to do is to sort in order of Unicode scalar value regardless of the encoding. Period. An encoding (such as UTF-8S) whose only reason for existence is to produce the same binary sort result as another encoding seems so silly.

- Shigemichi Yazawa
[EMAIL PROTECTED]
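[Editorial illustration] The sort-order mismatch behind this whole thread can be shown in a few lines of Python. The sketch below sorts the same three strings by Unicode scalar value, by standard UTF-8 bytes, and by UTF-16 code units; the sample characters and the helper names are chosen for this note and are not taken from any message.

```python
def scalar_key(s):
    """Sort key: the sequence of Unicode scalar values."""
    return [ord(c) for c in s]


def utf8_key(s):
    """Sort key: raw bytes of the standard UTF-8 encoding."""
    return s.encode("utf-8")


def utf16_key(s):
    """Sort key: raw big-endian UTF-16 bytes (same order as comparing 16-bit code units)."""
    return s.encode("utf-16-be")


# 'a', U+FF71 (a half-width katakana letter, plane 0), U+10400 (a supplementary character)
samples = ["\U00010400", "\uff71", "a"]

by_scalar = sorted(samples, key=scalar_key)
by_utf8 = sorted(samples, key=utf8_key)
by_utf16 = sorted(samples, key=utf16_key)

if __name__ == "__main__":
    show = lambda seq: [" ".join(f"U+{ord(c):04X}" for c in s) for s in seq]
    print("scalar value:", show(by_scalar))
    print("UTF-8 binary:", show(by_utf8))
    print("UTF-16 binary:", show(by_utf16))
```

Standard UTF-8 binary order agrees with scalar-value order, while UTF-16 binary order places the supplementary character (whose leading code unit is 0xD801) before U+FF71.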
Re: UTF-16 problems
In a message dated 2001-06-11 21:46:38 Pacific Daylight Time, [EMAIL PROTECTED] writes: > Shouldn't a war about UTF-8 be discussed on Unicore? Please, don't excommunicate us non-members from the discussion by restricting it to the members-only unicoRe list. We have something to contribute too. I would think this war should be fought, sorry, discussed on both fronts... in fact, on as many fronts as possible -Doug Ewell Fullerton, California
Re: UTF-16 problems
At Mon, 11 Jun 2001 20:40:41 -0700,
[EMAIL PROTECTED] wrote:

> Yes, it will cause confusion, however stability, and 100% backwards
> compatibility is an overriding concern. I'd choose a little confusion

It's a BIG confusion.

> Oracle's had to do the same thing with their
> UTF8 character set to ensure backwards compatibility and stability - leave
> it as-is, but document very clearly that it may not be what the user
> expects, and point them to an alternative character set setting
> (AL32UTF8).

What backward compatibility? When 8i was released, there were no supplementary characters defined. Even in 9i, Oracle only supports Unicode 3.0. They haven't officially supported supplementary characters yet. Who suffers inconvenience? Does PeopleSoft use supplementary characters in 8i or 9i? Too bad, you are using unsupported functionality.

- Shigemichi Yazawa
[EMAIL PROTECTED]
Re: UTF-16 problems
Lisa asked...

> Shouldn't a war about UTF-8 be discussed on Unicore?

Well, theoretically perhaps, but personally speaking I believe that this UTF-8 business is so choice and has such far-reaching implications for every user and so many other standards that, like presidential private lives, it's best discussed _everywhere_ with great relish. Maybe someday they'll write a song about it... ;-)

Rick
(not affiliated with any combatant)

---
Welcome to MySig.com
Presidential Diversion of the Day: http://artists.mp3s.com/artist_song/175/175723.html
Visit The Surrealism Server: http://www.madsci.org/~lynn/juju/surr/surrealism.html
Re: UTF-16 problems
Jianping wrote:

> only Oracle provides fully UTF-8 and UTF-16 support for RDBMS

Whoa...let me interject, DB2 for OS/390 supports UTF-8 and UTF-16. And DB2 for Intel, Unix, supported both much earlier. I cannot speak to Jianping's interpretation of "fully".

Shouldn't a war about UTF-8 be discussed on Unicore?

Lisa
Re: UTF-16 problems
Jianping Yang <[EMAIL PROTECTED]> wrote:

>>So far, I can claim that only Oracle provides fully UTF-8 and
>> UTF-16 support for RDBMS, but unfortunately, as we cannot change the existing
>> utf8 definition from Oracle 8i as backward compatibility, we have to use a new
>> character set name for it as AL32UTF8.

Michael (michka) Kaplan <[EMAIL PROTECTED]> wrote:

>As many have pointed out, THIS will cause more confusion than just about
>anything else. Tex is the only one who said anything but he is not the only
>one to believe you are seriously undermining the standard with this
>decision. It certainly does a lot to hurt interoperability.

Yes, it will cause confusion, however stability, and 100% backwards compatibility is an overriding concern. I'd choose a little confusion anytime if given the choice between confusion and breaking products that depend on you.

Just like systems build dependence on UCD character names, users of database systems build dependence on vendor naming conventions. Changing core API name references is not something that any responsible vendor would do without overwhelming support from their customer base, and since the database character set is chosen once per database installation, and is not visible to the average user, I see no overwhelming reason for Oracle to change this.

I admit, it is confusing at first, however they do have it well documented (and I can only assume it will be documented with even greater clarity in their 9i release where many additional Unicode features have been added), and they also support the true, correct UTF-8 definition as per ISO 10646 and TUS 3.0.

I equate this issue identically to the Unicode Consortium's refusal to change UCD names even when they are incredibly misleading, as is the case with U+20A0 EURO CURRENCY SIGN. This is obviously not the "Euro currency sign" regardless of its name. The description points to the appropriate character for the real sign. Oracle's had to do the same thing with their UTF8 character set to ensure backwards compatibility and stability - leave it as-is, but document very clearly that it may not be what the user expects, and point them to an alternative character set setting (AL32UTF8).

Toby.
RE: UTF-16 problems
Michka,

I am exploring the concept. I would prefer this to UTF-8s. I am not sure that the merits balance the problems. In any case, I think that if UTF-8s is accepted, they also have to accept UTF-32s.

The advantage is that eventually the plane-0 code points would be phased out and we would end up with a stable solution. With UTF-8s we will be fighting the problem forever.

Carl

-----Original Message-----
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 11, 2001 6:14 PM
To: Carl W. Brown; unicode
Subject: Re: UTF-16 problems

From: "Carl W. Brown" <[EMAIL PROTECTED]>

> I am proposing that we fix UTF-16.

Are you formally proposing this? For the next UTC meeting?

michka
Re: UTF-16 problems
"Michael (michka) Kaplan" wrote:

> From: "Jianping Yang" <[EMAIL PROTECTED]>
>
> > > If UTF-8S were to by some miracle be accepted by
> > > the UTC, implementers will be put out and offended
> > > for most of the next decade.
> >
> > If it is, that is rule of law from UTC.
>
> Very true.
>
> And if they vote against it, will you do the right thing
> in THAT case as well -- never emitting this invalid form of UTF-8 again?

This is already achievable in Oracle 9i by setting the Oracle client character set to AL32UTF8 or by using the UTF-16 interface.

> Or will Oracle et al. choose to ignore the law if the decision does not go
> their way?

This will depend on the type of application. If the database is part of an application, the application has its own choice of the character set it sends to and receives from the database, provided that it only sends and receives standard UTF-8 to/from you.

> Just trying to help folks determine if all of this is being done for the
> good of the standard (as has been claimed here many times).

Oracle is promoting and following the standard. Same as most other database vendors, our database does not fully support supplementary character in Oracle 8i and Oracle 7. But as we see the need to support it, we extend this support in Oracle 9i. So far, I can claim that only Oracle provides fully UTF-8 and UTF-16 support for RDBMS, but unfortunately, as we cannot change the existing utf8 definition from Oracle 8i as backward compatibility, we have to use a new character set name for it as AL32UTF8.

J.P.

> michka
Re: UTF-16 problems
From: "Jianping Yang" <[EMAIL PROTECTED]>

> Oracle is promoting and following the standard. Same as most other database
> vendors, our database does not fully support supplementary character in Oracle
> 8i and Oracle 7. But as we see the need to support it, we extend this support
> in Oracle 9i. So far, I can claim that only Oracle provides fully UTF-8 and
> UTF-16 support for RDBMS, but unfortunately, as we cannot change the existing
> utf8 definition from Oracle 8i as backward compatibility, we have to use a new
> character set name for it as AL32UTF8.

As many have pointed out, THIS will cause more confusion than just about anything else. Tex is the only one who said anything but he is not the only one to believe you are seriously undermining the standard with this decision. It certainly does a lot to hurt interoperability.

Anyway, we are talking in circles at this point; it's pretty clear what Oracle's position is here.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
Re: UTF-16 problems
From: "Jianping Yang" <[EMAIL PROTECTED]>

> > If UTF-8S were to by some miracle be accepted by
> > the UTC, implementers will be put out and offended
> > for most of the next decade.
>
> If it is, that is rule of law from UTC.

Very true.

And if they vote against it, will you do the right thing in THAT case as well -- never emitting this invalid form of UTF-8 again? Or will Oracle et al. choose to ignore the law if the decision does not go their way?

Just trying to help folks determine if all of this is being done for the good of the standard (as has been claimed here many times).

michka
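[Editorial illustration] For readers wondering what "never emitting this invalid form of UTF-8" could look like at an interface boundary, here is a minimal Python sketch. It is not anything Oracle or the UTC specified; the function names are hypothetical, and it relies only on Python's built-in codecs, whose strict UTF-8 decoder rejects encoded surrogates.

```python
def is_standard_utf8(data):
    """Return True if data is well-formed standard UTF-8.
    Python's strict decoder rejects 3-byte encoded surrogates (ED A0..BF xx)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_standard_utf8(data):
    """Re-encode six-byte surrogate-pair sequences as standard four-byte UTF-8.
    Raises UnicodeDecodeError if the input contains an unpaired surrogate
    or is otherwise malformed."""
    # Step 1: accept the encoded surrogates as lone surrogate code points.
    text = data.decode("utf-8", "surrogatepass")
    # Step 2: a round trip through UTF-16 joins each surrogate pair
    # into the single supplementary code point it represents.
    text = text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    # Step 3: emit standard UTF-8.
    return text.encode("utf-8")


if __name__ == "__main__":
    six_byte = b"a\xed\xa0\x81\xed\xb0\x80"   # 'a' followed by U+10400 as two encoded surrogates
    print(is_standard_utf8(six_byte))          # False
    print(to_standard_utf8(six_byte).hex())    # 61f0909080
```

A check like `is_standard_utf8` only detects the six-byte form; whether to reject or repair it is a policy decision that this sketch leaves open.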
Re: UTF-16 problems
"Michael (michka) Kaplan" wrote:

> From: "Jianping Yang" <[EMAIL PROTECTED]>
>
> > Is this the language that should be used in a professional way? I wonder
> > how could this happen to the Unicode mail list!
>
> So many linguists afoot, and we will get bogged down in my attempts to
> provide a little spice to the subject?
>
> The difference, of course, is that those who are offended will only be
> offended for a short time.

You know what I say in my word, but not to emit it out to follow professional way here to this mail list.

> If UTF-8S were to by some miracle be accepted by
> the UTC, implementers will be put out and offended for most of the next
> decade.

If it is, that is rule of law from UTC.

> Trigeminal Software votes for simplicity.

Every vote will be counted in UTC. Sorry we may get a dimple vote from you this time.

J.P.

> michka
Re: UTF-16 problems
From: "Jianping Yang" <[EMAIL PROTECTED]> > Is this the language that should be used in a professional way? I wonder > how could this happen to the Unicode mail list! So many linguists afoot, and we will get bogged down in my attempts to provide a little spice to the subject? The difference, of course, is that those who are offended will only be offended for a short time. If UTF-8S were to by some miracle be accepted by the UTC, implementers will be put out and offended for most of the next decade. Trigeminal Software votes for simplicity. michka
Re: UTF-16 problems
From: "Carl W. Brown" <[EMAIL PROTECTED]> > I am proposing that we fix UTF-16. Are you formally proposing this? For the next UTC meeting? michka
Re: UTF-16 problems
(whoops, sent too soon!) From: "Carl W. Brown" <[EMAIL PROTECTED]> > I am proposing that we fix UTF-16. Are you formally proposing this? For the next UTC meeting? Without an actual customer that is wanting it for an implementation I am pretty sure this will be voted down pretty loudly. michka
Re: UTF-16 problems
Is this the language that should be used in a professional way? I wonder how could this happen to the Unicode mail list!

"Michael (michka) Kaplan" wrote:

> From: "Rick McGowan" <[EMAIL PROTECTED]>
>
> > > ... asking for a lavicious license to be lecherously lazy
> >
> > Parse error at "lavicious". No such word appears in any English
> > dictionary I own, not even the OED.
>
> Sorry, that was to be lascivious.
>
> Glad someone is still parsing in this thread.
>
> michka
Re: UTF-16 problems
From: "Rick McGowan" <[EMAIL PROTECTED]> > > ... asking for a lavicious license to be lecherously lazy > > Parse error at "lavicious". No such word appears in any English > dictionary I own, not even the OED. Sorry, that was to be lascivious. Glad someone is still parsing in this thread. michka
Re: UTF-16 problems
Michael Kaplan <[EMAIL PROTECTED]> wrote: > ... asking for a lavicious license to be lecherously lazy Parse error at "lavicious". No such word appears in any English dictionary I own, not even the OED. Rick
RE: UTF-16 problems
Michka,

I guess that we can agree to disagree. I can see that, if nothing else, having UTF-16 sort in Unicode code point order with a simple binary search has real performance advantages. You don't see it much in C code, but some assembler implementations can really benefit. For example, on an IBM 370/390 you can compare up to 16MB with a single machine instruction. Having to adjust for code point sequences for each character will add significant overhead.

I am proposing that we fix UTF-16. Probably the most common use of these code points is for hankata (half-width katakana). If the application does not have UTF-16 support, it will work as it does normally. UTF-16 applications will translate either code point to the code page katakana character. Going back to UTF-16, it will use the high-end surrogate character. If it sends the data to a system that does not have UTF-16 support, it will have to convert by shifting the characters to the alternate code points (UTF-16 to UCS-2). The UTF-8 representation will be 4 bytes rather than 3, and UTF-16 will require 2 positions rather than 1. UTF-16 fonts will have to map both code points. In other words, it will be a little more overhead. But it will eliminate the need for either UTF-8s or UTF-32s.

Providing two code positions for the same character is a requirement of UTF-32s anyway, so the impact of this change is far less than that of splitting UTF-8 into two incompatible systems. This one is at least interchangeable.

The real beauty of this system is that even when converting from UTF-16 to UCS-2, the UCS-2 applications will still sort in the same relative order as the UTF-16, UTF-8 and UTF-32 applications.

You will produce different UTF-8 codes because those will correspond to the new UTF-16 code points. So there will be some minor changes to the code that converts from UCS-2 to UTF-8. But even if the code is not adjusted, you will still get valid UTF-8 encoding; it will just not sort properly. That is certainly preferable to broken encoding.

Carl

-----Original Message-----
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 11, 2001 3:47 PM
To: Carl W. Brown; unicode
Subject: Re: UTF-16 problems

From: "Carl W. Brown" <[EMAIL PROTECTED]>

> At first I thought the same thing but I have changed my mind. There are
> problems but the problems are with UTF-16 not UTF-8. I don't think that I
> am the only one who thinks that UTF-8s will create more problems than it
> fixes.
>
> Worse yet they will also have to "fix" UTF-32 as well.
>
> The point of this message is to fix UTF-16 which is the source of the
> problem. These changes are no more of a stretch than UTF-32s. The UTF-32s
> proposal that I heard involves replicating the same code points to get these
> code points to sort high like UTF-16.
>
> What this does is legitimize the code point shift for UTF-16, UTF-8,
> and UTF-32 so that the transforms all work and all sort the same and that
> the binary sort and Unicode sort orders are the same.
>
> It does involve a minor normalization transform but you have to do that for
> UTF-32s anyway and UTF-32s is required if you allow support of UTF-8s. The
> big difference is that you don't change any UTF protocols or develop two
> mutually exclusive transforms that are so similar that they might be
> confused. Besides this transform keeps UTF-8 to 4 bytes not 6 and will work
> with the existing UTF-8 software.
>
> The beauty of this proposal is that UCS-2 (plane 0 only) codes will sort in
> the same order as the post-transformed UTF-16 codes.

Carl,

I would agree with you except for one thing: no one needs this to solve their implementation issues!

Why would everyone want to turn around and have to change all their implementations around, including the lazy folks who are asking others to change for their sake, to support something that no one wants to do?

The whole UTF-8S mess is a bunch of people asking for a lavicious license to be lecherously lazy (they should have called it UTF-8L in effigy). No one is interested in doing a bunch of work here:

1) There is the group of people who took responsibility for their implementations at some point in the last seven years to properly support supplementary characters. They do not want to do any extra work since they work just fine.

2) There is the group of people who are scrambling around trying to get their laziness canonized as the forward looking savior of a solution that all of us were too foolish to realize is vital -- they do not want to do any work either (except marketing work to convince everyone how right they are).

3) There is the group of people who can't believe how far this has come.

I understand the technical merit of the suggestion, and it is technically superior to the UTF-8S plan (this is of course not saying much, but your plan is well t
Re: UTF-16 problems
From: "Carl W. Brown" <[EMAIL PROTECTED]>

> At first I thought the same thing but I have changed my mind. There are
> problems but the problems are with UTF-16 not UTF-8. I don't think that I
> am the only one who thinks that UTF-8s will create more problems than it
> fixes.
>
> Worse yet they will also have to "fix" UTF-32 as well.
>
> The point of this message is to fix UTF-16 which is the source of the
> problem. These changes are no more of a stretch than UTF-32s. The UTF-32s
> proposal that I heard involves replicating the same code points to get these
> code points to sort high like UTF-16.
>
> What this does is legitimize the code point shift for UTF-16, UTF-8,
> and UTF-32 so that the transforms all work and all sort the same and that
> the binary sort and Unicode sort orders are the same.
>
> It does involve a minor normalization transform but you have to do that for
> UTF-32s anyway and UTF-32s is required if you allow support of UTF-8s. The
> big difference is that you don't change any UTF protocols or develop two
> mutually exclusive transforms that are so similar that they might be
> confused. Besides this transform keeps UTF-8 to 4 bytes not 6 and will work
> with the existing UTF-8 software.
>
> The beauty of this proposal is that UCS-2 (plane 0 only) codes will sort in
> the same order as the post-transformed UTF-16 codes.

Carl,

I would agree with you except for one thing: no one needs this to solve their implementation issues!

Why would everyone want to turn around and have to change all their implementations around, including the lazy folks who are asking others to change for their sake, to support something that no one wants to do?

The whole UTF-8S mess is a bunch of people asking for a lavicious license to be lecherously lazy (they should have called it UTF-8L in effigy). No one is interested in doing a bunch of work here:

1) There is the group of people who took responsibility for their implementations at some point in the last seven years to properly support supplementary characters. They do not want to do any extra work since they work just fine.

2) There is the group of people who are scrambling around trying to get their laziness canonized as the forward looking savior of a solution that all of us were too foolish to realize is vital -- they do not want to do any work either (except marketing work to convince everyone how right they are).

3) There is the group of people who can't believe how far this has come.

I understand the technical merit of the suggestion, and it is technically superior to the UTF-8S plan (this is of course not saying much, but your plan is well thought out!).

The problem is that this is a solution that is looking for a problem. The only people who have the problem are the ones who were not thinking ahead, and they do not want to throw away their current solution, they are too in love with it.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
RE: UTF-16 problems
Michka,

At first I thought the same thing but I have changed my mind. There are problems, but the problems are with UTF-16, not UTF-8. I don't think that I am the only one who thinks that UTF-8s will create more problems than it fixes.

Worse yet, they will also have to "fix" UTF-32 as well.

The point of this message is to fix UTF-16, which is the source of the problem. These changes are no more of a stretch than UTF-32s. The UTF-32s proposal that I heard involves replicating the same code points to get these code points to sort high like UTF-16.

What this does is legitimize the code point shift for UTF-16, UTF-8, and UTF-32 so that the transforms all work and all sort the same, and so that the binary sort and Unicode sort orders are the same.

It does involve a minor normalization transform, but you have to do that for UTF-32s anyway, and UTF-32s is required if you allow support of UTF-8s. The big difference is that you don't change any UTF protocols or develop two mutually exclusive transforms that are so similar that they might be confused. Besides, this transform keeps UTF-8 to 4 bytes, not 6, and will work with the existing UTF-8 software.

The beauty of this proposal is that UCS-2 (plane 0 only) codes will sort in the same order as the post-transformed UTF-16 codes.

Carl

-----Original Message-----
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 11, 2001 1:22 PM
To: Carl W. Brown; unicode
Subject: Re: UTF-16 problems

From: "Carl W. Brown" <[EMAIL PROTECTED]>

> I think that UTF-16x would be a better approach than UTF-8s. I am sure that
> I have missed some issues feel free to comment. In any case UTF-16s would
> naturally be in Unicode code point order. It would be easy to transform to
> UCS-2 for applications that do not support UTF-16.

Carl, you are missing the central point of the UTF-8S movement -- they do not want to change anything. Hell, they do not even want to change the *name*, they are so disinterested in changing anything!

They want the Unicode standard to embrace their format and support their bug, and not change a bleeding thing. They are distorting the truth (companies who only care about the whole mess for the sake of compatibility with Oracle are being quoted as being "intensely supportive of UTF-8S", and I'm sorry but distortion is the only word for it).

Revisionist history and revisionist present/future at its finest, all you need is suspension of disbelief and you can vote for UTF-8S knowing that you are saving the standard from oblivion!

Where are all these conspiracy buffs when you need them? They can have a field day with this little adventure we have been having.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
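[Editorial illustration] The goal Carl describes, making the binary order of UTF-16 agree with Unicode code point order, can also be reached at comparison time without changing any encoding. The sketch below uses a well-known code-unit remapping for this; it is not Carl's proposal and not something any vendor in this thread is known to use, and the function names are invented for this note. It assumes well-formed UTF-16 input.

```python
import struct


def fixup_unit(u):
    """Remap one 16-bit UTF-16 code unit so that plain numeric comparison
    of remapped units yields code point order."""
    if u < 0xD800:
        return u              # below the surrogate block: unchanged
    if u <= 0xDFFF:
        return u + 0x2000     # surrogates move to the top: F800..FFFF
    return u - 0x0800         # E000..FFFF slide down to D800..F7FF


def utf16_units(s):
    """Return the UTF-16 code units of a string as a list of integers."""
    raw = s.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(raw) // 2), raw))


def code_point_order_key(s):
    """Sort key built from UTF-16 code units that sorts in code point order."""
    return [fixup_unit(u) for u in utf16_units(s)]


if __name__ == "__main__":
    words = ["\uff71", "\U00010400", "a", "\ue000", "\ud7ff"]
    ordered = sorted(words, key=code_point_order_key)
    print([" ".join("U+%04X" % ord(c) for c in w) for w in ordered])
```

This is the kind of per-unit adjustment that Carl's messages argue adds overhead compared with a single block compare; the sketch shows only that the adjustment itself is a fixed remapping of three ranges.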
Re: UTF-16 problems
From: "Carl W. Brown" <[EMAIL PROTECTED]>

> I think that UTF-16x would be a better approach than UTF-8s. I am sure that
> I have missed some issues feel free to comment. In any case UTF-16s would
> naturally be in Unicode code point order. It would be easy to transform to
> UCS-2 for applications that do not support UTF-16.

Carl, you are missing the central point of the UTF-8S movement -- they do not want to change anything. Hell, they do not even want to change the *name*, they are so disinterested in changing anything!

They want the Unicode standard to embrace their format and support their bug, and not change a bleeding thing. They are distorting the truth (companies who only care about the whole mess for the sake of compatibility with Oracle are being quoted as being "intensely supportive of UTF-8S", and I'm sorry but distortion is the only word for it).

Revisionist history and revisionist present/future at its finest, all you need is suspension of disbelief and you can vote for UTF-8S knowing that you are saving the standard from oblivion!

Where are all these conspiracy buffs when you need them? They can have a field day with this little adventure we have been having.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/