Re: UTF-16 problems

2001-06-12 Thread Jianping Yang



Lisa Moore wrote:

> Jianping wrote:
>
> only Oracle provides fully UTF-8 and
> UTF-16 support for RDBMS
>
> Whoa...let me interject, DB2 for OS/390 supports UTF-8 and UTF-16.  And DB2
> for Intel, Unix, supported both much earlier. I cannot speak to Jianping's
> interpretation of "fully"
>

"Fully" here means following the UTF-8 and UTF-16 standards, including
supplementary character support. From this point of view, I don't think DB2
supports them: I checked its latest online documentation, and it only supports
UCS-2, with UTF-8 up to three-byte encoding.

Having checked other vendors' implementations (Microsoft supports only UTF-16,
and Sybase supports UTF-16 but UTF-8 only up to three-byte encoding), I arrived
at my claim. I have not fully studied every other database vendor, but I
welcome anyone to challenge me on this.

>
> Shouldn't a war about UTF-8 be discussed on Unicore?
>

It should not be a war but rather a technical discussion. But since some people
on this list have taken the wrong approach of making personal or company
attacks, Oracle and I have to defend ourselves.

Regards,
Jianping.

>
> Lisa


begin:vcard 
n:Yang;Jianping
tel;fax:650-506-7225
tel;work:650-506-4865
x-mozilla-html:FALSE
org:Server Globalization Technology;Server Technology
version:2.1
email;internet:[EMAIL PROTECTED]
title:Senior Development Manager
adr;quoted-printable:;;500 Oracle Parkway=0D=0AM/S 659407;Redwood Shores;CA;94065;
fn:Jianping Yang
end:vcard



RE: UTF-16 problems

2001-06-12 Thread Carl W. Brown

Toby,

I agree that there is a need to preserve standards.  Oracle did not support
surrogates.  If you passed it a UTF-16 data stream it would not be converted
into proper UTF-8 encoding.  At this juncture it should have fixed UTF8.
This would have worked with the old data because it had no non-plane0 codes.
You would have had backwards compatibility.

This is the documentation

"
  UTF8

  The UTF8 character set encodes characters in one to three bytes.
  Surrogate pairs require six bytes.


  AL32UTF8

  The AL32UTF8 character set encodes characters in one to three bytes.
  Surrogate pairs require four bytes.
"

If asked to build a database for UTF-8 support which do you think a DBA
would use?  Do they know what surrogates are or if they should be encoded
with 4 or 6 bytes?
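
To make the two byte counts in the quoted documentation concrete, here is a
minimal Python sketch. It is not Oracle code, and the function names are
made up for illustration. It assumes the six-byte form comes from encoding
each UTF-16 surrogate as if it were a separate three-byte character, which
is how this thread describes the old UTF8 character set:

```python
def utf8_standard(ch: str) -> bytes:
    """Standard UTF-8: one to four bytes per code point."""
    return ch.encode("utf-8")


def utf8_via_surrogates(ch: str) -> bytes:
    """Six-byte form (assumption): each UTF-16 code unit encoded on its own."""
    utf16 = ch.encode("utf-16-be")
    units = [int.from_bytes(utf16[i:i + 2], "big")
             for i in range(0, len(utf16), 2)]
    # 'surrogatepass' lets Python emit the three-byte pattern for a surrogate,
    # which strict UTF-8 encoding would reject.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


if __name__ == "__main__":
    ch = "\U00010400"  # a supplementary (non-plane-0) character
    for name, data in (("standard", utf8_standard(ch)),
                       ("via surrogates", utf8_via_surrogates(ch))):
        print(f"{name}: {len(data)} bytes, {data.hex()}")
```

For plane-0 characters the two functions return identical bytes, which is why
old data containing no supplementary characters would look the same under
either definition.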

>I see this issue as identical to the Unicode Consortium's refusal
>to change UCD names even when they are incredibly misleading, as
>is the case with U+20A0 EURO-CURRENCY SIGN.

Your point is well taken.  I agree that the impact of changing the name to
"OBSOLETE EURO CURRENCY SIGN" or something similar is far less than keeping
it and confusing users.  The same applies to Oracle.  The question is how to
recover from a bad decision.

1) First explain the implications in the documentation.  For example:

  UTF8

  The UTF8 character set encodes the first 65535 Unicode characters in
  one to three bytes as standard UTF-8 characters.  Higher Unicode
  characters that use UTF-16 surrogate pairs require six bytes.  This is
  a non-standard UTF-8 encoding that is used to produce data that sorts
  in the same sequence as UTF-16.

  AL32UTF8

  The AL32UTF8 character set encodes all Unicode characters in one to
  four bytes.  This uses standard UTF-8 encoding for all Unicode
  characters.  This sorts in UTF-32 (Standard Unicode code point) sort
  order.

2) In future releases change the name UTF8 to AL16UTF8.  This should only
affect the DBAs who build and maintain the databases.  This will at least
set the two on equal footing.  This name change should not be a major
compatibility impact.  They could even make UTF8 an alias of AL16UTF8 for a
few releases.

Carl



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of [EMAIL PROTECTED]
Sent: Monday, June 11, 2001 8:41 PM
To: Michael (michka) Kaplan
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: UTF-16 problems



Jianping Yang <[EMAIL PROTECTED]> wrote:
>> So far, I can claim that only Oracle provides fully UTF-8 and
>> UTF-16 support for RDBMS, but unfortunately, as we cannot change the
>> existing utf8 definition from Oracle 8i for backward compatibility, we
>> have to use a new character set name for it, AL32UTF8.

Michael (michka) Kaplan <[EMAIL PROTECTED]> wrote:
>As many have pointed out, THIS will cause more confusion than just about
>anything else. Tex is the only one who said anything but he is not the
>only one to believe you are seriously undermining the standard with this
>decision. It certainly does a lot to hurt interoperability.

Yes, it will cause confusion; however, stability and 100% backwards
compatibility are an overriding concern.  I'd choose a little confusion
anytime if given the choice between confusion and breaking products that
depend on you.

Just like systems build dependence on UCD character names, users of
database systems build dependence on vendor naming conventions.  Changing
core API name references is not something that any responsible vendor would
do without overwhelming support from their customer base, and since the
database character set is chosen once per database installation, and is not
visible to the average user, I see no overwhelming reason for Oracle to
change this.  I admit it is confusing at first; however, they do have it
well documented (and I can only assume it will be documented with even
greater clarity in their 9i release where many additional Unicode features
have been added), and they also support the true, correct UTF-8 definition
as per ISO 10646 and TUS 3.0.

I see this issue as identical to the Unicode Consortium's refusal to
change UCD names even when they are incredibly misleading, as is the case
with U+20A0 EURO-CURRENCY SIGN.  This is obviously not the "Euro currency
sign" regardless of its name.  The description points to the appropriate
character for the real sign.  Oracle has had to do the same thing with their
UTF8 character set to ensure backwards compatibility and stability - leave
it as-is, but document very clearly that it may not be what the user
expects, and point them to an alternative character set setting
(AL32UTF8).

Toby.







RE: UTF-16 problems

2001-06-12 Thread Shigemichi Yazawa

At Mon, 11 Jun 2001 15:43:42 -0700,
Carl W. Brown <[EMAIL PROTECTED]> wrote:
> At first I thought the same thing but I have changed my mind.  There are
> problems but the problems are with UTF-16 not UTF-8.

I don't think your new UTF-16 proposal solves any problem. It's yet
another encoding. It won't replace the existing UTF-16. The right
thing to do is to sort in order of Unicode scalar value regardless of
the encodings. Period. An encoding (such as UTF-8S) whose only reason
for existence is to produce the same binary sort result as another
encoding seems so silly.
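
As a small illustration of the sort-order mismatch under discussion, the
Python sketch below sorts three sample characters by their raw bytes in each
encoding and by scalar value. The sample characters are arbitrary choices for
illustration; Python compares str values by code point, so the built-in sort
stands in for "Unicode scalar value order":

```python
samples = [
    "\U00010400",  # supplementary character (surrogate pair in UTF-16)
    "\uFF71",      # halfwidth katakana, plane 0, above the surrogate range
    "A",
]

# Python compares strings code point by code point.
by_scalar_value = sorted(samples)

# Binary (bytewise) order of each encoded form.
by_utf8_bytes = sorted(samples, key=lambda c: c.encode("utf-8"))
by_utf16_bytes = sorted(samples, key=lambda c: c.encode("utf-16-be"))


def show(label, chars):
    print(label, [f"U+{ord(c):04X}" for c in chars])


if __name__ == "__main__":
    show("scalar value:", by_scalar_value)
    show("UTF-8 bytes: ", by_utf8_bytes)
    show("UTF-16 bytes:", by_utf16_bytes)
```

Standard UTF-8 bytes already sort in scalar value order; only the UTF-16
binary order puts the supplementary character before U+FF71.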

-
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: UTF-16 problems

2001-06-11 Thread DougEwell2

In a message dated 2001-06-11 21:46:38 Pacific Daylight Time, 
[EMAIL PROTECTED] writes:

>  Shouldn't a war about UTF-8 be discussed on Unicore?

Please, don't excommunicate us non-members from the discussion by restricting 
it to the members-only unicoRe list.  We have something to contribute too.

I would think this war should be fought, sorry, discussed on both fronts... 
in fact, on as many fronts as possible.

-Doug Ewell
 Fullerton, California




Re: UTF-16 problems

2001-06-11 Thread Shigemichi Yazawa

At Mon, 11 Jun 2001 20:40:41 -0700,
[EMAIL PROTECTED] wrote:
> Yes, it will cause confusion; however, stability and 100% backwards
> compatibility are an overriding concern.  I'd choose a little confusion

It's a BIG confusion.

> Oracle's had to do the same thing with their
> UTF8 character set to ensure backwards compatibility and stability - leave
> it as-is, but document very clearly that it may not be what the user
> expects, and point them to an alternative character set setting
> (AL32UTF8).

What backward compatibility? When 8i was released, there were no
supplementary characters defined. Even in 9i, Oracle only supports
Unicode 3.0. They haven't officially supported supplementary
characters yet. Who suffers inconvenience? Does PeopleSoft use
supplementary characters in 8i or 9i? Too bad, you are using
unsupported functionality.

-
Shigemichi Yazawa
[EMAIL PROTECTED]




Re: UTF-16 problems

2001-06-11 Thread Rick McGowan

Lisa asked...

> Shouldn't a war about UTF-8 be discussed on Unicore?

Well, theoretically perhaps, but personally speaking I believe that this  
UTF-8 business is so choice and has such far-reaching implications for  
every user and so many other standards that, like presidential private  
lives, it's best discussed _everywhere_ with great relish.

Maybe someday they'll write a song about it... ;-)

Rick (not affiliated with any combatant)



--- Welcome to MySig.com 
Presidential Diversion of the Day:
http://artists.mp3s.com/artist_song/175/175723.html
Visit The Surrealism Server:
http://www.madsci.org/~lynn/juju/surr/surrealism.html
-




Re: UTF-16 problems

2001-06-11 Thread Lisa Moore


Jianping wrote:

only Oracle provides fully UTF-8 and
UTF-16 support for RDBMS

Whoa...let me interject, DB2 for OS/390 supports UTF-8 and UTF-16.  And DB2
for Intel, Unix, supported both much earlier. I cannot speak to Jianping's
interpretation of "fully"

Shouldn't a war about UTF-8 be discussed on Unicore?

Lisa






Re: UTF-16 problems

2001-06-11 Thread toby_phipps


Jianping Yang <[EMAIL PROTECTED]> wrote:
>> So far, I can claim that only Oracle provides fully UTF-8 and
>> UTF-16 support for RDBMS, but unfortunately, as we cannot change the
>> existing utf8 definition from Oracle 8i for backward compatibility, we
>> have to use a new character set name for it, AL32UTF8.

Michael (michka) Kaplan <[EMAIL PROTECTED]> wrote:
>As many have pointed out, THIS will cause more confusion than just about
>anything else. Tex is the only one who said anything but he is not the
>only one to believe you are seriously undermining the standard with this
>decision. It certainly does a lot to hurt interoperability.

Yes, it will cause confusion; however, stability and 100% backwards
compatibility are an overriding concern.  I'd choose a little confusion
anytime if given the choice between confusion and breaking products that
depend on you.

Just like systems build dependence on UCD character names, users of
database systems build dependence on vendor naming conventions.  Changing
core API name references is not something that any responsible vendor would
do without overwhelming support from their customer base, and since the
database character set is chosen once per database installation, and is not
visible to the average user, I see no overwhelming reason for Oracle to
change this.  I admit it is confusing at first; however, they do have it
well documented (and I can only assume it will be documented with even
greater clarity in their 9i release where many additional Unicode features
have been added), and they also support the true, correct UTF-8 definition
as per ISO 10646 and TUS 3.0.

I see this issue as identical to the Unicode Consortium's refusal to
change UCD names even when they are incredibly misleading, as is the case
with U+20A0 EURO-CURRENCY SIGN.  This is obviously not the "Euro currency
sign" regardless of its name.  The description points to the appropriate
character for the real sign.  Oracle has had to do the same thing with their
UTF8 character set to ensure backwards compatibility and stability - leave
it as-is, but document very clearly that it may not be what the user
expects, and point them to an alternative character set setting
(AL32UTF8).

Toby.






RE: UTF-16 problems

2001-06-11 Thread Carl W. Brown

Michka,

I am exploring the concept.  I would prefer this to UTF-8s.  I am not sure
that the merits balance the problems.  In any case I think that if UTF-8s is
accepted, they also have to accept UTF-32s.

The advantage is that eventually the plane-0 code points would be phased
out and we would end up with a stable solution.  With UTF-8s we will be
fighting the problem forever.

Carl

-Original Message-
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 11, 2001 6:14 PM
To: Carl W. Brown; unicode
Subject: Re: UTF-16 problems


From: "Carl W. Brown" <[EMAIL PROTECTED]>

> I am proposing that we fix UTF-16.

Are you formally proposing this? For the next UTC meeting?

michka





Re: UTF-16 problems

2001-06-11 Thread Jianping Yang



"Michael (michka) Kaplan" wrote:

> From: "Jianping Yang" <[EMAIL PROTECTED]>
>
> > > If UTF-8S were to by some miracle be accepted by
> > > the UTC, implementers will be put out and offended
> > > for most of the next decade.
> > >
> >
> > If it is, that  is rule of law from UTC.
>
> Very true.
>
>  And if they vote against it, will you do the right thing
> in THAT case as well -- never emitting this invalid form of UTF-8 again?

This is already achievable in Oracle 9i by setting the Oracle client character
set to AL32UTF8 or by using the UTF-16 interface.

> Or
> will Oracle et al. choose to ignore the law if the decision does not go
> their way?
>

This will depend on the type of application. If the database is part of an
application, the application has its own choice of character set it can receive
and send to the database, provided it only sends and receives the standard
UTF-8 to/from you.

>
> Just trying to help folks determine if all of this is being done for the
> good of the standard (as has been claimed here many times).
>

Oracle is promoting and following the standard. Like most other database
vendors, our database does not fully support supplementary characters in Oracle
8i and Oracle 7. But as we see the need to support them, we have extended this
support in Oracle 9i. So far, I can claim that only Oracle provides fully UTF-8
and UTF-16 support for RDBMS, but unfortunately, as we cannot change the
existing utf8 definition from Oracle 8i for backward compatibility, we have to
use a new character set name for it, AL32UTF8.

J.P.

>
> michka





Re: UTF-16 problems

2001-06-11 Thread Michael \(michka\) Kaplan

From: "Jianping Yang" <[EMAIL PROTECTED]>

> Oracle is promoting and following the standard. Like most other
> database vendors, our database does not fully support supplementary
> characters in Oracle 8i and Oracle 7. But as we see the need to support
> them, we have extended this support in Oracle 9i. So far, I can claim
> that only Oracle provides fully UTF-8 and UTF-16 support for RDBMS, but
> unfortunately, as we cannot change the existing utf8 definition from
> Oracle 8i for backward compatibility, we have to use a new character set
> name for it, AL32UTF8.

As many have pointed out, THIS will cause more confusion than just about
anything else. Tex is the only one who said anything but he is not the only
one to believe you are seriously undermining the standard with this
decision. It certainly does a lot to hurt interoperability.

Anyway, we are talking in circles at this point; it's pretty clear what
Oracle's position is here.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/






Re: UTF-16 problems

2001-06-11 Thread Michael \(michka\) Kaplan

From: "Jianping Yang" <[EMAIL PROTECTED]>

> > If UTF-8S were to by some miracle be accepted by
> > the UTC, implementers will be put out and offended
> > for most of the next decade.
> >
>
> If it is, that  is rule of law from UTC.

Very true.

And if they vote against it, will you do the right thing
in THAT case as well -- never emitting this invalid form of UTF-8 again? Or
will Oracle et al. choose to ignore the law if the decision does not go
their way?

Just trying to help folks determine if all of this is being done for the
good of the standard (as has been claimed here many times).

michka





Re: UTF-16 problems

2001-06-11 Thread Jianping Yang



"Michael (michka) Kaplan" wrote:

> From: "Jianping Yang" <[EMAIL PROTECTED]>
>
> > Is this the kind of language that should be used in a professional
> > setting? I wonder how this could happen on the Unicode mailing list!
>
> So many linguists afoot, and we will get bogged down in my attempts to
> provide a little spice to the subject?
>
> The difference, of course, is that those who are offended will only be
> offended for a short time.

You know what I would say in my own words, but I will not say it here, so as
to stay professional on this mailing list.

> If UTF-8S were to by some miracle be accepted by
> the UTC, implementers will be put out and offended for most of the next
> decade.
>

If it is, that is the rule of law from the UTC.

>
> Trigeminal Software votes for simplicity.

Every vote will be counted in the UTC. Sorry we may get a dimple vote from you this
time.

J.P.

>
> michka





Re: UTF-16 problems

2001-06-11 Thread Michael \(michka\) Kaplan

From: "Jianping Yang" <[EMAIL PROTECTED]>

> Is this the kind of language that should be used in a professional
> setting? I wonder how this could happen on the Unicode mailing list!

So many linguists afoot, and we will get bogged down in my attempts to
provide a little spice to the subject?

The difference, of course, is that those who are offended will only be
offended for a short time. If UTF-8S were to by some miracle be accepted by
the UTC, implementers will be put out and offended for most of the next
decade.

Trigeminal Software votes for simplicity.

michka





Re: UTF-16 problems

2001-06-11 Thread Michael \(michka\) Kaplan

From: "Carl W. Brown" <[EMAIL PROTECTED]>

> I am proposing that we fix UTF-16.  

Are you formally proposing this? For the next UTC meeting?

michka





Re: UTF-16 problems

2001-06-11 Thread Michael \(michka\) Kaplan

(whoops, sent too soon!)

From: "Carl W. Brown" <[EMAIL PROTECTED]>

> I am proposing that we fix UTF-16.

Are you formally proposing this? For the next UTC meeting? Without an actual
customer that wants it for an implementation, I am pretty sure this will
be voted down pretty loudly.

michka





Re: UTF-16 problems

2001-06-11 Thread Jianping Yang

Is this the kind of language that should be used in a professional setting? I
wonder how this could happen on the Unicode mailing list!

"Michael (michka) Kaplan" wrote:

> From: "Rick McGowan" <[EMAIL PROTECTED]>
>
> > > ... asking for a lavicious license to be lecherously lazy
> >
> > Parse error at "lavicious".  No such word appears in any English
> > dictionary I own, not even the OED.
>
> Sorry, that was to be lascivious.
>
> Glad someone is still parsing in this thread.
>
> michka





Re: UTF-16 problems

2001-06-11 Thread Michael \(michka\) Kaplan

From: "Rick McGowan" <[EMAIL PROTECTED]>

> > ... asking for a lavicious license to be lecherously lazy
> 
> Parse error at "lavicious".  No such word appears in any English  
> dictionary I own, not even the OED.

Sorry, that was to be lascivious.

Glad someone is still parsing in this thread.

michka






Re: UTF-16 problems

2001-06-11 Thread Rick McGowan

Michael Kaplan <[EMAIL PROTECTED]> wrote:

> ... asking for a lavicious license to be lecherously lazy

Parse error at "lavicious".  No such word appears in any English  
dictionary I own, not even the OED.

Rick






RE: UTF-16 problems

2001-06-11 Thread Carl W. Brown

Michka,

I guess that we can agree to disagree.  I can see that, if nothing else,
having UTF-16 sort in Unicode code point order with a simple binary comparison
has real performance advantages.  You don't see it much in C code but some
assembler implementations can really benefit.  For example on an IBM 370/390
you can compare up to 16MB with a single machine instruction.  Having to
adjust for code point sequences for each character will add significant
overhead.
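
The per-character adjustment mentioned above can be sketched in Python as
follows. This is only an illustration of one commonly described fix-up for
comparing UTF-16 code units in code point order; the function names are
hypothetical and it is not taken from any product discussed in this thread.
The idea is to remap code units so that surrogates sort above the rest of
plane 0 before doing an ordinary comparison:

```python
def utf16_units(s: str) -> list[int]:
    """Split a string into big-endian UTF-16 code units."""
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def fixup(unit: int) -> int:
    """Remap one code unit so unit order matches code point order."""
    if unit >= 0xE000:   # plane-0 characters above the surrogate range
        return unit - 0x0800
    if unit >= 0xD800:   # surrogates: move them to the top of the range
        return unit + 0x2000
    return unit


def code_point_key(s: str) -> list[int]:
    """Sort key giving code point order from UTF-16 code units."""
    return [fixup(u) for u in utf16_units(s)]


if __name__ == "__main__":
    samples = ["\uFF71", "\U00010400", "A", "\uE000x", "\uD7FF"]
    ordered = sorted(samples, key=code_point_key)
    print([" ".join(f"U+{ord(c):04X}" for c in s) for s in ordered])
```

The remapping has to be applied to every code unit at or above 0xD800, which
is the extra work a single block-compare instruction cannot do for you.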

I am proposing that we fix UTF-16.  Probably the most common use of these
code points is for hankata (half width katakana).  If the application does
not have UTF-16 support it will work as it does normally.  UTF-16
applications will translate either code point to the code page katakana
character.  Going back to UTF-16 it will use the high end surrogate
character.  If it sends the data to a system that does not have UTF-16
support it will have to convert by shifting the characters to the alternate
code points (UTF-16 to UCS-2).  UTF-8 representation will be 4 bytes
rather than 3 and UTF-16 will require 2 positions rather than 1.

UTF-16 fonts will have to map both code points.  In other words it will be a
little more overhead.  But it will eliminate the need for either UTF-8s or
UTF-32s.  Providing two code positions for the same character is a requirement
of UTF-32s anyway so the impact of this change is far less than splitting
UTF-8 into two incompatible systems.  This one is at least interchangeable.

The real beauty of this system is that even when converting from UTF-16 to
UCS-2 the UCS-2 applications will still sort in the same relative order as
the UTF-16, UTF-8 and UTF-32 applications.  You will produce different UTF-8
codes because those will correspond to the new UTF-16 code points.  So there
will be some minor changes to the code that converts from UCS-2 to UTF-8.
But even if the code is not adjusted you will still get valid UTF-8
encoding, it will just not sort properly.  That is certainly preferable to
broken encoding.

Carl


-Original Message-
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 11, 2001 3:47 PM
To: Carl W. Brown; unicode
Subject: Re: UTF-16 problems


From: "Carl W. Brown" <[EMAIL PROTECTED]>

> At first I thought the same thing but I have changed my mind.  There are
> problems but the problems are with UTF-16 not UTF-8.  I don't think that I
> am the only one who thinks that UTF-8s will create more problems than it
> fixes.
>
> Worse yet they will also have to "fix" UTF-32 as well.
>
> The point of this message is to fix UTF-16 which is the source of the
> problem.  These changes are no more of a stretch than UTF-32s.  The
> UTF-32s proposal that I heard involves replicating the same code points
> to get these code points to sort high like UTF-16.
>
> What this does is legitimize the code point shift for UTF-16, UTF-8,
> and UTF-32 so that the transforms all work and all sort the same and that
> the binary sort and Unicode sort orders are the same.
>
> It does involve a minor normalization transform but you have to do that
> for UTF-32s anyway and UTF-32s is required if you allow support of
> UTF-8s.  The big difference is that you don't change any UTF protocols
> or develop two mutually exclusive transforms that are so similar that
> they might be confused.  Besides this transform keeps UTF-8 to 4 bytes
> not 6 and will work with the existing UTF-8 software.
>
> The beauty of this proposal is that UCS-2 (plane 0 only) codes will sort
> in the same order as the post transformed UTF-16 codes.


Carl,

I would agree with you except for one thing: no one needs this to solve
their implementation issues! Why would everyone want to turn around and have
to change all their implementations around, including the lazy folks who are
asking others to change for their sake, to support something that no one
wants to do?

The whole UTF-8S mess is a bunch of people asking for a lavicious license to
be lecherously lazy (they should have called it UTF-8L in effigy). No one is
interested in doing a bunch of work here:

1) There is the group of people who took responsibility for their
implementations at some point in the last seven years to properly support
supplementary characters. They do not want to do any extra work since they
work just fine.

2) There is the group of people who are scrambling around trying to get
their laziness canonized as the forward looking savior of a solution that
all of us were too foolish to realize is vital -- they do not want to do any
work either (except marketing work to convince everyone how right they are).

3) There is the group of people who can't believe how far this has come.

I understand the technical merit of the suggestion, and it is technically
superior to the UTF-8S plan (this is of course not saying much, but your
plan is well t

Re: UTF-16 problems

2001-06-11 Thread Michael \(michka\) Kaplan

From: "Carl W. Brown" <[EMAIL PROTECTED]>

> At first I thought the same thing but I have changed my mind.  There are
> problems but the problems are with UTF-16 not UTF-8.  I don't think that I
> am the only one who thinks that UTF-8s will create more problems than it
> fixes.
>
> Worse yet they will also have to "fix" UTF-32 as well.
>
> The point of this message is to fix UTF-16 which is the source of the
> problem.  These changes are no more of a stretch than UTF-32s.  The
> UTF-32s proposal that I heard involves replicating the same code points
> to get these code points to sort high like UTF-16.
>
> What this does is legitimize the code point shift for UTF-16, UTF-8,
> and UTF-32 so that the transforms all work and all sort the same and that
> the binary sort and Unicode sort orders are the same.
>
> It does involve a minor normalization transform but you have to do that
> for UTF-32s anyway and UTF-32s is required if you allow support of
> UTF-8s.  The big difference is that you don't change any UTF protocols
> or develop two mutually exclusive transforms that are so similar that
> they might be confused.  Besides this transform keeps UTF-8 to 4 bytes
> not 6 and will work with the existing UTF-8 software.
>
> The beauty of this proposal is that UCS-2 (plane 0 only) codes will sort
> in the same order as the post transformed UTF-16 codes.


Carl,

I would agree with you except for one thing: no one needs this to solve
their implementation issues! Why would everyone want to turn around and have
to change all their implementations around, including the lazy folks who are
asking others to change for their sake, to support something that no one
wants to do?

The whole UTF-8S mess is a bunch of people asking for a lavicious license to
be lecherously lazy (they should have called it UTF-8L in effigy). No one is
interested in doing a bunch of work here:

1) There is the group of people who took responsibility for their
implementations at some point in the last seven years to properly support
supplementary characters. They do not want to do any extra work since they
work just fine.

2) There is the group of people who are scrambling around trying to get
their laziness canonized as the forward looking savior of a solution that
all of us were too foolish to realize is vital -- they do not want to do any
work either (except marketing work to convince everyone how right they are).

3) There is the group of people who can't believe how far this has come.

I understand the technical merit of the suggestion, and it is technically
superior to the UTF-8S plan (this is of course not saying much, but your
plan is well thought out!). The problem is that this is a solution that is
looking for a problem.

The only people who have the problem are the ones who were not thinking
ahead, and they do not want to throw away their current solution, they are
too in love with it.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/






RE: UTF-16 problems

2001-06-11 Thread Carl W. Brown

Michka,

At first I thought the same thing but I have changed my mind.  There are
problems but the problems are with UTF-16 not UTF-8.  I don't think that I
am the only one who thinks that UTF-8s will create more problems than it
fixes.

Worse yet they will also have to "fix" UTF-32 as well.

The point of this message is to fix UTF-16 which is the source of the
problem.  These changes are no more of a stretch than UTF-32s.  The UTF-32s
proposal that I heard involves replicating the same code points to get these
code points to sort high like UTF-16.

What this does is legitimize the code point shift for UTF-16, UTF-8,
and UTF-32 so that the transforms all work and all sort the same and that
the binary sort and Unicode sort orders are the same.

It does involve a minor normalization transform but you have to do that for
UTF-32s anyway and UTF-32s is required if you allow support of UTF-8s.  The
big difference is that you don't change any UTF protocols or develop two
mutually exclusive transforms that are so similar that they might be
confused.  Besides this transform keeps UTF-8 to 4 bytes not 6 and will work
with the existing UTF-8 software.

The beauty of this proposal is that UCS-2 (plane 0 only) codes will sort in
the same order as the post transformed UTF-16 codes.

Carl

-Original Message-
From: Michael (michka) Kaplan [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 11, 2001 1:22 PM
To: Carl W. Brown; unicode
Subject: Re: UTF-16 problems


From: "Carl W. Brown" <[EMAIL PROTECTED]>

> I think that UTF-16x would be a better approach than UTF-8s.  I am sure
> that I have missed some issues; feel free to comment.  In any case
> UTF-16s would naturally be in Unicode code point order.  It would be easy
> to transform to UCS-2 for applications that do not support UTF-16.

Carl, you are missing the central point of the UTF-8S movement -- they do
not want to change anything. Hell, they do not even want to change the
*name* they are so disinterested in changing anything! They want the Unicode
standard to embrace their format and support their bug, and not change a
bleeding thing.

They are distorting the truth (companies who only care about the whole mess
for the sake of compatibility with Oracle are being quoted as being
"intensely supportive of UTF-8S", and I'm sorry but distortion is the only
word for it). Revisionist history and revisionist present/future at its
finest; all you need is suspension of disbelief and you can vote for UTF-8S
knowing that you are saving the standard from oblivion!

Where are all these conspiracy buffs when you need them? They can have a
field day with this little adventure we have been having.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/






Re: UTF-16 problems

2001-06-11 Thread Michael \(michka\) Kaplan

From: "Carl W. Brown" <[EMAIL PROTECTED]>

> I think that UTF-16x would be a better approach than UTF-8s.  I am sure
> that I have missed some issues; feel free to comment.  In any case
> UTF-16s would naturally be in Unicode code point order.  It would be easy
> to transform to UCS-2 for applications that do not support UTF-16.

Carl, you are missing the central point of the UTF-8S movement -- they do
not want to change anything. Hell, they do not even want to change the
*name* they are so disinterested in changing anything! They want the Unicode
standard to embrace their format and support their bug, and not change a
bleeding thing.

They are distorting the truth (companies who only care about the whole mess
for the sake of compatibility with Oracle are being quoted as being
"intensely supportive of UTF-8S", and I'm sorry but distortion is the only
word for it). Revisionist history and revisionist present/future at its
finest; all you need is suspension of disbelief and you can vote for UTF-8S
knowing that you are saving the standard from oblivion!

Where are all these conspiracy buffs when you need them? They can have a
field day with this little adventure we have been having.

MichKa

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/