On Friday, September 27, 2002, at 01:12 PM, Michael T. Babcock wrote:
> Dawn Friedland wrote:
>
>> Prior to my client requesting that I add Japanese content to the
>> content tool & database, I had zero experience with character sets
>> other than Latin. I always used Notepad to filter out any weird MS
>> Word formatting and left the default as ANSI.
>
> I had that problem a year ago too, prior to doing Japanese database
> work.
>
>> Many people have recommended I use UTF-8. I interpreted that to mean
>> that when I have the Japanese text in Notepad, I choose File, Save As,
>> and then choose the encoding as UTF-8. When I do that, and then
>> copy/paste to insert using the DOS prompt, I get the same problematic
>> results. Is there something I am missing or not understanding when
>> people tell me to "use UTF-8" .... Am I supposed to configure the
>> table or database somehow to use it or should I be running the text
>> through a UTF-8 converter other than Notepad?
>
> I wouldn't rely on your command prompt to be UTF-8 compliant; I'd
> recommend inserting data using a web interface if nothing else (or
> your own Unicode-compatible client) to a BINARY field (not TEXT)
> unless you have MySQL with Unicode support. Treat the data as binary
> _everywhere_; pretend you can't translate it, etc. except using safe
> tools (like the iconv library on *nix). UTF-8 is just an encoding of
> Unicode; you may get more mileage in Windows using 16-bit Unicode.

Is there such a thing as MySQL with Unicode support? I'm fairly new to
MySQL but all my research has led me to believe that this is still a
to-do item.

> See: http://www.unicode.org/ for reference, especially
> http://www.unicode.org/unicode/faq/basic_q.html.
>
> To best deal with UTF-8 in a program, use dynamically-allocated
> strings and never assume things like the 4th char in a string is
> string[3] or anything. "Pass-through" is the best way to deal with
> UTF-8 until you actually have to handle processing of it (doing
> something to a Unicode/UTF-8 string) -- read it from a
> Unicode-compliant program / field / widget and write it straight to
> the DB without translations, then read it when you need it and compare
> it against something if necessary and display it. Just because it
> looks like garbage when it's raw doesn't mean it _is_ garbage.

I've read lots of similar posts in the archives at
<http://lists.mysql.com/>. Many suggest using a BLOB instead of a text
field. But MySQL supports double-byte languages. Why not use an
encoding it supports (SJIS or UJIS for Japanese) instead of this
kludge? If I compile MySQL to support UJIS with --with-charset=ujis,
won't text fields then store UJIS-encoded text properly? I'd like to
use Unicode too, but if it's not supported yet...

Thanks,
-Kirk
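
For what it's worth, the quoted point about not assuming the 4th char is string[3] can be sketched in a few lines of Python. This is only an illustration, not anything from the thread: the sample string and the helper names fourth_char and reencode are made up, and "euc_jp" is Python's codec name for what MySQL calls UJIS.

```python
# Hypothetical sample text; any Japanese string would behave the same way.
text = "日本語のテキスト"

# The raw bytes are what a BLOB/BINARY column would store in pass-through mode.
utf8_bytes = text.encode("utf-8")


def fourth_char(raw: bytes) -> str:
    """Decode first, then index by character rather than by byte."""
    return raw.decode("utf-8")[3]


def reencode(raw: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one encoding to another, roughly what iconv does."""
    return raw.decode(src).encode(dst)


# Character count and byte count differ, so byte offsets are not
# character offsets.
print(len(text), len(utf8_bytes))

# Indexing the raw bytes yields a single byte (an int), not a character.
print(utf8_bytes[3], fourth_char(utf8_bytes))

# A round trip through EUC-JP returns the original UTF-8 bytes unchanged.
ujis_bytes = reencode(utf8_bytes, "utf-8", "euc_jp")
print(reencode(ujis_bytes, "euc_jp", "utf-8") == utf8_bytes)
```

The bytes go in and out untouched unless something deliberately decodes them, which is all "pass-through" amounts to.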