Steven Bennett wrote:

Christian M. Cepel wrote:



It was my understanding that all unicode character sets contain English
characters mapped to the same values they're mapped to in other sets.



Close -- Unicode is a *single* character set. For convenience, you'll frequently run into references to Unicode code pages (the standard itself calls them blocks), but each one is just a range within the overall character set. Every character from every encoding that Unicode supports exists somewhere in that character set.

So with a Unicode (UTF-8 or UTF-16) encoded text file you could easily have
English, Chinese, Korean, Russian, and Symbol characters all in the same
sentence.
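
A minimal sketch of why this works, assuming the file is UTF-8: every character is just a code point in the one Unicode character set, and UTF-8 writes each code point as one to four bytes. The helper name utf8_encode is made up for illustration and is not from any library mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes 1 to 4 bytes into out and returns the byte count,
   or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* ASCII range: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* e.g. Cyrillic: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* e.g. Chinese, Korean: three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* supplementary planes: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling it in turn on U+004B (K), U+0416 (Cyrillic Zhe), U+4E2D (a Chinese character) and U+D55C (a Korean syllable) and appending the results to one buffer gives a single valid UTF-8 string with all four scripts side by side.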



You know, about 3 years ago while in a SoftEng class, I started Thistledowne, and voiced my intentions to make it unicode16 native. I was SUPER rudely kicked in the nuts by some on this list and then thrown in the doghouse while a major flamewar resulted... well, not a flamewar exactly. I was the target, and was bombed without mercy. I was told "Hey stupid, ABC is strictly 7-bit ASCII, and there's damn good reasons why it's that way, so wanting to use Unicode is stupid and you should kill yourself for even thinking of it."

God, people were mean and rude and nasty, along with the typical "Oh... Yet another abc project... And you're excited... Tell me what's gonna make your project shine over the hundreds of projects done by people who are probably better than you. Go jump in a lake." response.

Oh, I continued my project, and got an A, and scrapped it, and will use what I learned there for my new project. Boy, I learned never to tell people on the list I had a project going. Sure, a few were encouraging, but who could hear their voice over the nastiness.


//Christian

Another convenient item is that the first Unicode range, 0x0000 - 0x007f,
is the ASCII code.  So if you're using wchar_t instead of char as your string
character type, then comparisons like:

   if (str[0] == 'K')

...will work the same when using Unicode or ASCII.  The only difference is
that str now points to an array of wide values (16 bits where wchar_t is
16 bits, as on Windows; often 32 bits elsewhere) instead of 8 bit ones.
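
To make that concrete, here is a rough sketch with the same test written once for char and once for wchar_t. The function names are invented for this example, and it assumes the usual case where the compiler's wide character set matches ASCII in the low range.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: first-character test on a narrow (char) string. */
int starts_with_K_narrow(const char *str)
{
    return str[0] == 'K';
}

/* Hypothetical helper: the identical test on a wide (wchar_t) string.
   'K' is promoted to the wide value 0x4B before the comparison, so the
   source line does not change; L'K' is the more idiomatic spelling. */
int starts_with_K_wide(const wchar_t *str)
{
    return str[0] == 'K';
}

/* Size of one wide element in bits, assuming 8-bit bytes; the result is
   implementation-defined (commonly 16 or 32). */
size_t wide_char_bits(void)
{
    return sizeof(wchar_t) * 8;
}
```

starts_with_K_narrow("Kite") and starts_with_K_wide(L"Kite") both return 1, while the element width reported by wide_char_bits() depends on the platform.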

-->Steve Bennett

To subscribe/unsubscribe, point your browser to: http://www.tullochgorm.com/lists.html





--

 //Christian

Christian Marcus Cepel            | And the wrens have returned &
[EMAIL PROTECTED] icq:12384980  | are nesting; In the hollow of
371 Crown Point, Columbia, MO     | that oak where his heart once
65203-2202 573.999.2370           | had been; And he lifts up his
Computer Support Specialist, Sr.  | arms in a blessing; For being
University of Missouri - Columbia | born again.    --Rich Mullins
