Here's a brief summary of where we are: 
I am trying to store Japanese text (Shift_JIS) in MySQL and view it from a
web page. The content is provided to me in Word format. I convert it to
plain text and copy/paste it into a web form in an ASP-based CMS on a
Windows box. When the text is viewed from a web page, seemingly random
characters are turned into other characters. Most of the database
contains rows of Latin-script text, and MySQL supports Japanese and Latin
text in the same table. Other people are able to do this without the
morphing problem. My Regional and Language settings in Windows are set to
Shift_JIS so that I can view Shift_JIS characters in Notepad and at the
DOS prompt. If I bypass the CMS and copy/paste from Notepad directly into
MySQL at the DOS prompt, the results are the same (although fewer
characters are broken when viewed through DOS).

For a fuller explanation, see the web page I set up for this problem:
http://commworks01.barklouder.com/japan/press/broken_chars.asp

I conclude that one of two things may be happening:
1. Characters are being corrupted because the source text was copied
from Word, despite the conversion to plain text. (At this point I do not
have a plain text file with content typed directly into Notepad, i.e.
with Word bypassed. I am at the mercy of the client's PR department.)
2. Characters are being corrupted by MySQL.

If option 1 were true, why would the same characters display correctly
in a static HTML document? (See below.)

In Response to Joel Rees:
> I checked the text you gave me, and I found what's getting 
> clobbered. It's the latter half of characters like the katakana 'so'.
> 
> Although the byte that is getting walked on here is 0x5c, 
> this is _not_ the escape character. It is preceded (in the 
> case of katakana 'so') by a byte of 0x83. The entire 
> character is '0x835c', and the 0x5c is being treated as if it 
> were a backslash. There are other characters that will get 
> hit by this, by the way.
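To see the effect Joel describes, here is a minimal sketch in Python. It encodes katakana 'so' as Shift_JIS and then runs the bytes through a simplified model of a parser that reads one byte at a time and treats every 0x5C as a backslash escape (drop the backslash, keep the next byte). The function names are hypothetical, and the model deliberately omits MySQL's special sequences such as \n and \0; it is meant to illustrate the mechanism, not to reproduce MySQL's parser exactly.

```python
def sjis_bytes(text: str) -> bytes:
    """Encode text as Shift_JIS bytes."""
    return text.encode("shift_jis")


def strip_backslash_escapes(data: bytes) -> bytes:
    """Simplified model of a single-byte parser that treats 0x5C as an
    escape character: the backslash is dropped and the byte after it is
    kept as-is. (Real MySQL also maps sequences like \\n and \\0 to
    control characters; that detail is left out here.)"""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0x5C and i + 1 < len(data):
            out.append(data[i + 1])  # keep the escaped byte, lose the 0x5C
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(sjis_bytes("ソ").hex())  # 835c: lead byte 0x83, trail byte 0x5C
    original = sjis_bytes("ソフト")  # katakana "soft"
    mangled = strip_backslash_escapes(original)
    print(original.hex(), "->", mangled.hex())
    # The orphaned 0x83 now pairs with the next character's lead byte,
    # so a different character appears: the "morphing" seen on the page.
    print(mangled.decode("shift_jis", errors="replace"))
```

Because the lost 0x5C shifts the byte pairing, one damaged character can also change the character that follows it, which may explain why more characters look broken than the ones that actually contain 0x5C.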

Question 1: It seems that many more characters are getting hit than just
0x835c. How do I map a code such as 0x835c to the character it
represents? I don't know what 0x835c is.
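One way to answer this is to let Python's standard `shift_jis` codec do the lookup. The sketch below (function names are hypothetical) decodes a two-byte code such as 0x835C and reports its Unicode name, and it also lists every character whose second byte is 0x5C, i.e. every character exposed to the backslash problem. It assumes plain Shift_JIS; Windows actually uses the `cp932` variant, which adds some vendor characters, so swapping the codec name to `"cp932"` should give a slightly longer list.

```python
import unicodedata


def describe_sjis(code: int) -> str:
    """Decode a two-byte Shift_JIS code such as 0x835C and return the
    character with its Unicode code point and name."""
    ch = code.to_bytes(2, "big").decode("shift_jis")
    return f"{ch} U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def chars_with_trail_5c(codec: str = "shift_jis") -> list[str]:
    """Return every character in the codec whose second byte is 0x5C."""
    # Shift_JIS lead bytes fall in 0x81-0x9F and 0xE0-0xFC.
    leads = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    found = []
    for lead in leads:
        try:
            ch = bytes([lead, 0x5C]).decode(codec)
        except UnicodeDecodeError:
            continue  # no character assigned at this position
        if len(ch) == 1:
            found.append(ch)
    return found


if __name__ == "__main__":
    print(describe_sjis(0x835C))
    print("".join(chars_with_trail_5c()))
```

Running the second function shows that common kanji such as 表 and 能 are affected as well as katakana 'so', which is consistent with many more characters breaking than just 0x835c.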
Question 2: How do I handle MySQL's character escape mechanism correctly
for this text?
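A minimal sketch of one answer, assuming the server parses the string literal byte by byte as a single-byte character set (e.g. latin1), which matches the symptoms above: escape the raw Shift_JIS bytes exactly the way MySQL's documented escaping does (backslash, both quotes, NUL, \n, \r and Control-Z), treating every 0x5C as a backslash, including trail bytes. The server's unescaping then restores the original bytes. The function name is hypothetical. If the connection is instead declared as `sjis` (for example with `SET NAMES sjis` in MySQL versions that support it), this byte-wise approach breaks, because the server would see 0x835C as one character and the extra 0x5C as a real escape. In that case, use a charset-aware routine such as the C API's `mysql_real_escape_string()` or the client library's parameter placeholders instead.

```python
def escape_for_single_byte_literal(data: bytes) -> bytes:
    """Escape raw bytes for a single-quoted MySQL string literal, treating
    the input one byte at a time, as a server does when it believes the
    connection character set is single-byte (assumption: e.g. latin1).

    Every 0x5C is doubled, including the second byte of Shift_JIS
    characters such as 0x835C, so the server's unescaping restores it."""
    replacements = {
        0x00: b"\\0",    # NUL
        0x0A: b"\\n",    # newline
        0x0D: b"\\r",    # carriage return
        0x1A: b"\\Z",    # Control-Z
        0x22: b'\\"',    # double quote
        0x27: b"\\'",    # single quote
        0x5C: b"\\\\",   # backslash (or Shift_JIS trail byte 0x5C)
    }
    out = bytearray()
    for b in data:
        out += replacements.get(b, bytes([b]))
    return bytes(out)


if __name__ == "__main__":
    raw = "ソフト".encode("shift_jis")
    print(raw.hex(), "->", escape_for_single_byte_literal(raw).hex())
```

The key design point is that the escaping must match how the server will interpret the bytes: byte-wise escaping for a single-byte connection, character-wise escaping for an `sjis` connection. Mixing the two is what produces the corruption.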
