Re: Japanese Charset

Kirk Samuelson Sat, 28 Sep 2002 13:39:46 -0700


On Friday, September 27, 2002, at 01:12  PM, Michael T. Babcock wrote:


> Dawn Friedland wrote:
>
>> Prior to my client requesting that I add Japanese content to the
>> content tool & database, I had zero experience with character sets
>> other than Latin. I always used Notepad to filter out any weird MS
>> Word formatting and left the default as ANSI.
>
> I had that problem a year ago too, prior to doing Japanese database
> work.
>
>> Many people have recommended I use UTF-8. I interpreted that to mean
>> that when I have the Japanese text in Notepad, I choose File, Save As,
>> and then choose the encoding as UTF-8. When I do that, and then
>> copy/paste to insert using the DOS prompt, I get the same problematic
>> results. Is there something I am missing or not understanding when
>> people tell me to "use UTF-8"? Am I supposed to configure the table
>> or database somehow to use it, or should I be running the text
>> through a UTF-8 converter other than Notepad?
>
> I wouldn't rely on your command prompt to be UTF-8 compliant; I'd 
> recommend inserting data using a web interface if nothing else (or 
> your own Unicode-compatible client) to a BINARY field (not TEXT) 
> unless you have MySQL with Unicode support.  Treat the data as binary 
> _everywhere_; pretend you can't translate it, etc. except using safe 
> tools (like the iconv library on *nix).  UTF-8 is just an encoding of 
> Unicode; you may get more mileage in Windows using 16-bit Unicode.
>
Is there such a thing as MySQL with Unicode support? I'm fairly new to
MySQL, but all my research has led me to believe that this is still a
to-do item.
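
For what it's worth, here is a minimal Python sketch of the "treat the data as binary" advice quoted above. The sample string and variable names are my own assumptions, and `stored` merely stands in for a BINARY/BLOB column; no MySQL call is made.

```python
# Sketch only: sample text and names are illustrative assumptions.
text = "日本語"  # three Japanese characters

utf8_bytes = text.encode("utf-8")       # 3 bytes per character for this text
utf16_bytes = text.encode("utf-16-le")  # 2 bytes per character for this text

# "Treat the data as binary": keep the bytes untouched in storage and
# decode only at the edge where the encoding is known for certain.
stored = bytes(utf8_bytes)              # stand-in for a BINARY/BLOB column
round_tripped = stored.decode("utf-8")

print(len(utf8_bytes), len(utf16_bytes), round_tripped == text)
```

The point is only that the same characters become different byte sequences depending on the encoding, and that nothing is lost as long as the bytes pass through unchanged.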

> See: http://www.unicode.org/ for reference, especially 
> http://www.unicode.org/unicode/faq/basic_q.html.
>
> To best deal with UTF-8 in a program, use dynamically-allocated 
> strings and never assume things like the 4th char in a string is 
> string[3] or anything.  "Pass-through" is the best way to deal with 
> UTF-8 until you actually have to handle processing of it (doing 
> something to a Unicode/UTF-8 string) -- read it from a 
> Unicode-compliant program / field / widget and write it straight to 
> the DB without translations, then read it when you need it and compare 
> it against something if necessary and display it.  Just because it 
> looks like garbage when it's raw doesn't mean it _is_ garbage.
>
I've read lots of similar posts in the archives at
<http://lists.mysql.com/>, many of them suggesting a BLOB instead of a
text field. But MySQL supports double-byte languages. Why not use an
encoding it supports (SJIS or UJIS for Japanese) instead of this
kludge? If I compile MySQL to support UJIS with --with-charset=ujis,
won't text fields then store UJIS-encoded text properly? I'd like to
use Unicode too, but if it's not supported yet...
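
To make the SJIS/UJIS distinction concrete, here is a small Python sketch. It assumes Python's standard `shift_jis` and `euc_jp` codecs (MySQL's "ujis" name refers to EUC-JP); the sample string is mine, and nothing here touches MySQL itself.

```python
# Sketch only: sample text is an illustrative assumption.
text = "日本語"

sjis = text.encode("shift_jis")
ujis = text.encode("euc_jp")   # MySQL's "ujis" charset name refers to EUC-JP

# Same characters, two bytes each in both encodings, but different values,
# so the bytes sent by the client must match the charset the server expects.
print(sjis.hex(), ujis.hex())

# Converting between the two means decoding with the source encoding first.
converted = sjis.decode("shift_jis").encode("euc_jp")
assert converted == ujis
```

Whichever charset the server is built with, the text would presumably have to arrive already in that encoding, or be converted this way before the INSERT.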

Thanks,

-Kirk


