utility for looking at hex values in strings from the command line (was Re: Japanese Charset)

2002-10-02 Thread Joel Rees

Enjoy.

 Joel, can I get a copy of that hex convert utility too? I am working on a
 Korean version of MySQL yet I currently speak only English. That utility
 would help me gobs. 
 
 -Original Message-
 From: Joel Rees [mailto:[EMAIL PROTECTED]]
 Sent: Sunday, September 29, 2002 8:24 PM
 To: Dawn Friedland
 Cc: [EMAIL PROTECTED]
 Subject: Re: Japanese Charset
...
 Say, do you want a little utility program in Java or C that will print
 the hexadecimal values of the characters in a string? Basically, it
 would be a command-line utility, so you would copy the text and then
 paste it into the command line, after the name of the utility. You'd
 need a compiler, of course. 

The C would be even more transparent. But I don't have a C compiler
loaded right now. Besides, Java is so international.

source---
/**
 * Get a look at the numeric values of characters.
 *
 * Use from the command line as (for example):
 * C:\> java ShowHex paste or type strings here
 *
 * @author Joel Rees, Altech Corporation, Esaka, Japan
 * Copyright September 2002
 * May be copied, modified, and otherwise used freely.
 * (I mean, really, this isn't very long or complicated. 8-)
 * No warranty. Use at your own risk.
 *
 * @version 0.1
 * Works in Java 1.4.
 */


public class ShowHex
{
    public static void main( String[] args )
    {
        if ( args.length < 1 )
        {
            System.out.println( "Usage: "
                + ShowHex.class.getName() /* Okay, this is ridiculous. */
                + " string [, string ... ]" );
        }
        else
        {
            for ( int arg = 0; arg < args.length; ++arg )
            {
                String input = args[ arg ];
                // getBytes() with no argument uses the platform default
                // encoding, so the hex shown depends on the machine's locale.
                byte[] myBytes = input.getBytes();
                System.out.println( input );
                for ( int i = 0; i < myBytes.length; ++i )
                {
                    // Mask with 0xff so bytes above 0x7f are not sign-extended.
                    System.out.print( Integer.toHexString( myBytes[ i ] & 0xff ) );
                    if ( i < ( myBytes.length - 1 ) )
                    {
                        System.out.print( ' ' );
                    }
                }
                System.out.println();
            }
        }
    }
}
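
For anyone following along: the listing above takes whatever encoding the platform defaults to, so the same pasted text can show different hex on different machines. Here is a minimal sketch of a variant that names the encoding explicitly. The class name HexDump and the helper hexOf are made up for this illustration, and StandardCharsets needs Java 7 or later rather than the 1.4 mentioned above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical variant of ShowHex that takes the encoding explicitly. */
class HexDump
{
    /** Returns the bytes of text in the given charset as space-separated hex. */
    static String hexOf( String text, Charset charset )
    {
        byte[] bytes = text.getBytes( charset );
        StringBuilder out = new StringBuilder();
        for ( int i = 0; i < bytes.length; ++i )
        {
            if ( i > 0 )
            {
                out.append( ' ' );
            }
            // Mask to 0..255 and pad to two digits so 0x05 prints as "05".
            out.append( String.format( "%02x", bytes[ i ] & 0xff ) );
        }
        return out.toString();
    }

    public static void main( String[] args )
    {
        for ( String arg : args )
        {
            System.out.println( arg );
            System.out.println( hexOf( arg, StandardCharsets.UTF_8 ) );
        }
    }
}
```

To look at Shift-JIS bytes instead, pass Charset.forName( "Shift_JIS" ), assuming the Java runtime in use ships that charset.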


-- 
Joel Rees [EMAIL PROTECTED]


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Japanese Charset

2002-10-01 Thread gerald_clark

I was not aware that you were running VB.
In Perl and PHP we have a function to do this for us.
In Perl:

$qtext = $dbh->quote($text);
$dbh->do("insert into mytable set myvariable = $qtext");

The above quote() function will put a \ in front of all the special 
characters listed in the manual.
These include ' " \ and the hex 00 character.

If  the VB library you are using does not have a similar function, you 
should write one.

You are welcome.

Dawn Friedland wrote:

I found a solution!!! ***Replace all backslashes with two backslashes.***
(The hex value of the backslash is 0x5c, see Joel Rees' previous emails
for an explanation on how multibyte Japanese characters contain the hex
value 0x5c and that MySQL uses that value as an escape character.)

VBScript used prior to submitting data to database: 


   szJapaneseText = Replace(szJapaneseText, "\", "\\")

  








Re: Japanese Charset

2002-10-01 Thread Michael T. Babcock

gerald_clark wrote:

 $qtext = $dbh->quote($text);
 $dbh->do("insert into mytable set myvariable = $qtext");

 The above quote() function will put a \ in front of all the special 
 characters listed in the manual.
 These include ' " \ and the hex 00 character.

FWIW, before anyone copies and pastes that, you should use (as I'm sure 
Gerald actually does):
do("insert into mytable set myvariable = \"$qtext\""); so that 
there are quotes around your variable when you insert it into the DB and, 
if there are spaces within your data (even accidentally), it won't try 
to parse the data as part of the query.

UPDATE MyTable SET Name = Michael Babcock WHERE ID = 4; will get you 
some errors, for the people who like examples.  I've also had the 
occasional UPDATE MyTable SET Name = WHERE ID = 4; which is also 
avoided by always quoting variables.

-- 
Michael T. Babcock
C.T.O., FibreSpeed Ltd.
http://www.fibrespeed.net/~mbabcock







Re: Japanese Charset

2002-10-01 Thread gerald_clark

It is already quoted.  That's the point.






Re: Japanese Charset

2002-09-30 Thread Michael T. Babcock

Joel Rees wrote:

If I compile MySQL using --with-charset=sjis , how will it handle the
Latin, Chinese, and Korean characters? 



Multiple databases on multiple servers?


Try this one on for size:

CREATE TABLE Customers ( Name VARCHAR(100) );

Now ... if your customers have names in Japanese, Russian and German, 
how do you compile MySQL so it can store them all in Customers?  You use 
Unicode with a binary field and do post-processing work (like ORDER BY) 
yourself.

-- 
Michael T. Babcock
C.T.O., FibreSpeed Ltd.
http://www.fibrespeed.net/~mbabcock







RE: Japanese Charset

2002-09-30 Thread Dawn Friedland

I welcome the pedantics!

 Multiple databases on multiple servers?
That is what I thought...which I assume means multiple machines (not an
option). And I'm glad you pointed it out that it still wouldn't solve
the problem. 

 Go re-read the section of the manual on string literals. Ask 
 yourself what is going to happen when you try to insert the text 
 
 Our network switches are 100% standard!
 
 or 
 
 You should store your preferences at c:\WINNT\bloopers\prefs.txt
 
 in your database. How would you set up the database queries 
 to store those strings?
Prior to sending the string to the database, I would look for \ and
replace it with \\. It's difficult for me to see the \ in "Our
network switches are 100% standard!" because, like you said, the 0x5c
may be only part of the entire hex value of the character (as Japanese
characters are multibyte). I would need to first convert the Japanese
character to hex, then look for the 0x5c, then replace it with
0x5c0x5c. Problem solved. 
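
The byte-level replacement described above could be sketched roughly as follows. This assumes the server treats every 0x5c byte as an escape character no matter where it falls inside a multibyte character; a server that parses Shift-JIS itself would need different handling. The class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: doubles every 0x5c byte in already-encoded text. */
class BackslashDoubler
{
    static byte[] doubleBackslashes( byte[] in )
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream( in.length + 8 );
        for ( byte b : in )
        {
            out.write( b );
            if ( b == 0x5c )
            {
                // Emit the byte a second time so a byte-wise parser
                // sees an escaped backslash and keeps one copy.
                out.write( b );
            }
        }
        return out.toByteArray();
    }

    public static void main( String[] args )
    {
        // 0x83 0x5c is the Shift-JIS encoding of the katakana "so".
        byte[] so = { (byte) 0x83, 0x5c };
        for ( byte b : doubleBackslashes( so ) )
        {
            System.out.print( Integer.toHexString( b & 0xff ) + " " );
        }
        System.out.println();
    }
}
```

Working on the encoded bytes rather than on characters is the point of the sketch: the 0x5c that causes trouble is not visible as a backslash character in the text itself.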

 Say, do you want a little utility program in Java or C that 
 will print the hexadecimal values of the characters in a 
 string? 
I was avoiding the 0x5c byte *precisely* because I don't know how to convert
my Japanese text (squiggly pictures I see in Notepad) into their hex
values. I would LOVE such a utility!

This issue is so baffling to me because:
1. People on this list have said they have successfully stored multiple
charsets, including Japanese, in the same database. 
2. I store Korean and Chinese (simplified) in this same database with no
errors. (Of course, I am dependent on the Korean- and Chinese-speaking
colleagues of my client who were tasked with proofreading.) 
3. A fellow from MySQL with whom I was recently in contact off-list
said the following: "Right now, MySQL does not enable you to store
multiple charsets per database." (This same fellow said the 0x5c was
probably causing my problem.)

Thanks for all your help. I feel much less hopeless than I did several
weeks ago.
Dawn Friedland
[EMAIL PROTECTED]






RE: Japanese Charset

2002-09-30 Thread Dawn Friedland

 CREATE TABLE Customers ( Name VARCHAR(100) );
 
 Now ... if your customers have names in Japanese, Russian and German, 
 how do you compile MySQL so it can store them all in 
 Customers?  You use 
 Unicode with a binary field and do post-processing work (like 
 ORDER BY) 
 yourself.

I have tried your suggestion. 

If I am using a binary field, wouldn't it look like this: 
CREATE TABLE Customers ( Name varchar(100) binary );

I have saved my notepad file as Unicode. I copy/paste to a web form and
submit to a field of type varchar binary. (I also try with varchar,
char, blob, mediumtext). The characters still break. I copy/paste
directly to command prompt. Characters still break. 

Thanks for your input. I welcome more ideas. Perhaps I am
missing something (likely). 
Dawn






Re: Japanese Charset

2002-09-30 Thread gerald_clark

Your webform application must call the appropriate MySQL escape-string 
function for the language in which it is written.





RE: Japanese Charset

2002-09-30 Thread Dawn Friedland

I found a solution!!! ***Replace all backslashes with two backslashes.***
(The hex value of the backslash is 0x5c, see Joel Rees' previous emails
for an explanation on how multibyte Japanese characters contain the hex
value 0x5c and that MySQL uses that value as an escape character.)

VBScript used prior to submitting data to database: 


   szJapaneseText = Replace(szJapaneseText, "\", "\\")


Pulling from the database and displaying to a web page requires no
additional processing. Simply display. 


Thank you so much for all your time and effort reading and responding to
my emails. 
Joel Rees - You insisted it was 0x5c from the beginning. I was so slow
to understand because of everyone's talk of UTF-8, was it Word, was it
my content tool, is MySQL capable of storing multiple charsets per
database, etc. Thank you immensely for your patience and all your help. 
Gerald Clark - It was your simple, brief message that made me think, "Is
it that simple?" It is. 
Michael Babcock, Shashank Tripathi, Jon Frisby, Brian Duke, kayamboo,
Kirk Samuelson - Thank you for just being there and responding with hope
that there is a solution - without which I would not have persisted. 

I know this is all so sappy, but this was the most lengthy, most
frustrating, and most hopeless problem-solving event of my life. I am glad
it is over (and now maybe I'll get paid). The most unsurprising thing is
that it was so simple: one line of code. 

Sincerely, 
Dawn Friedland
[EMAIL PROTECTED]






Re: Japanese Charset

2002-09-30 Thread Joel Rees

  Multiple databases on multiple servers?
 That is what I thought...which I assume means multiple machines (not an
 option).

For future reference, MySQL can actually run multiple servers with
different configurations on a single machine in Linux. (*BSD and Mac OS
X, too, I think.) They are working on methods for doing this in
MSWindows, but the different user model in MSWindows requires a
different approach than they are using for *NIX.

-- 
Joel Rees [EMAIL PROTECTED]






Re: Japanese Charset

2002-09-30 Thread Joel Rees

 I found a solution!!!

Hooray!

 ***Replace all backslashes with two backslashes.***

You probably want to consider whether you want to replace all quotes
with backslash-quote, as well. Backslashes in the English text would
only show up for something like file paths on MS OSes, but your
customer might want to put single or double quotes around something on a
page, and the way it is right now MySQL should be throwing syntax errors
when it gets embedded quotes.
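
To make that concrete, here is a rough sketch of escaping backslashes, both kinds of quotes, and the NUL character before building a string literal. LiteralEscaper and escape are hypothetical names used only for illustration; in practice the escape function that ships with your database library is the safer choice, as Gerald says.

```java
/** Hypothetical helper: escapes characters that are special in a MySQL string literal. */
class LiteralEscaper
{
    static String escape( String text )
    {
        StringBuilder out = new StringBuilder( text.length() + 16 );
        for ( int i = 0; i < text.length(); ++i )
        {
            char c = text.charAt( i );
            switch ( c )
            {
                case '\\': out.append( "\\\\" ); break; // backslash becomes two backslashes
                case '\'': out.append( "\\'" );  break; // single quote
                case '"':  out.append( "\\\"" ); break; // double quote
                case '\0': out.append( "\\0" );  break; // the hex 00 character
                default:   out.append( c );
            }
        }
        return out.toString();
    }

    public static void main( String[] args )
    {
        String path = "c:\\WINNT\\bloopers\\prefs.txt";
        System.out.println(
            "INSERT INTO mytable SET myvariable = '" + escape( path ) + "'" );
    }
}
```

Note that this works on characters, not bytes, so on its own it does nothing about a 0x5c hiding in the second byte of a Shift-JIS character once the text is encoded.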

 (The hex value of the backslash is 0x5c, see Joel Rees' previous emails
 for an explanation on how multibyte Japanese characters contain the hex
 value 0x5c and that MySQL uses that value as an escape character.)
 
 VBScript used prior to submitting data to database: 
 
 
 szJapaneseText = Replace(szJapaneseText, "\", "\\")
 

Sorry about talking first about regular expressions. I'm thinking VB
ought to have full RE by now, but maybe it's still only got the text
replacement function.

 Pulling from the database and displaying to a web page requires no
 additional processing. Simply display. 

You might want to turn off the post-processing for all non-English pages.
Even though the Chinese seems to work, I suspect there will be some
characters that you just haven't seen yet which get morphed by this
little trick. I think I'd warn the customer, though, and explain that it
will cost them extra to have it in foreign language pages.

Case munging is just one example of post-processing that only makes
sense in certain contexts. The time to _research_ which kinds of
post-processing make sense in which languages, ethnic groups, etc., is
not trivial. Could be worth doing if they have the money and the need.
But there aren't really any packaged solutions for those problems yet.

If they really think they want post-processing, you definitely want to
make sure the text is converted to Unicode by the time it hits
post-processing.

 Gerald Clark - It was your simple, brief message that made me think, "Is
 it that simple?" It is. 

:)

(Thanks, Gerald.)

-- 
Joel Rees [EMAIL PROTECTED]






Re: Japanese Charset

2002-09-29 Thread Joel Rees

Dawn, I'm going to give in to the temptation to be pedantic. Apologies
in advance.

 Kirk Samuelson wrote:
  I've read lots of similar posts in the archives at 
  http://lists.mysql.com/. Many suggestions to use a BLOB 
  instead of a 
  text field. But MySQL supports double-byte languages. Why not use an 
  encoding it supports (SJIS or UJIS for Japanese) instead of this 
  kludge? If I compile MySQL to support UJIS with  --with-charset=sjis 
  won't text fields then store ujis encoded text properly? I'd like to 
  use Unicode too but if it's not supported yet...
 
 The idea is to be able to store Latin and Japanese in the same database
 (as well as Chinese & Korean). Isn't that supported by MySQL? People on
 this list say they've done it successfully. 
 
 If I compile MySQL using --with-charset=sjis , how will it handle the
 Latin, Chinese, and Korean characters? 

Multiple databases on multiple servers?

But you still have the problem of needing to handle the escape
characters correctly.

(Sorry I wasn't able to get that page up over the weekend.)

Go re-read the section of the manual on string literals. Ask yourself
what is going to happen when you try to insert the text 

Our network switches are 100% standard!

or 

You should store your preferences at c:\WINNT\bloopers\prefs.txt

in your database. How would you set up the database queries to store
those strings?

Say, do you want a little utility program in Java or C that will print
the hexadecimal values of the characters in a string? Basically, it
would be a command-line utility, so you would copy the text and then
paste it into the command line, after the name of the utility. You'd
need a compiler, of course. 

-- 
Joel Rees [EMAIL PROTECTED]






Re: Japanese Charset

2002-09-28 Thread Kirk Samuelson


On Friday, September 27, 2002, at 01:12  PM, Michael T. Babcock wrote:

 Dawn Friedland wrote:

 Prior to my client requesting that I add Japanese content to the 
 content tool & database, I had zero experience with character sets
 other than Latin. I always used notepad to filter out any weird MS Word 
 formattings and left the default as ANSI.

 I had that problem a year ago too, prior to doing Japanese database 
 work.

 Many people have recommended I use UTF-8. I interpreted that to mean
 that when I have the Japanese text in notepad, I choose file, save as,
 and then choose the encoding as UTF-8. When I do that, and then
 copy/paste to insert using the DOS prompt, I get the same problematic
 results. Is there something I am missing or not understanding when
 people tell me to use UTF-8? Am I supposed to configure the table
 or database somehow to use it or should I be running the text through 
 a UTF-8 converter other than notepad?

 I wouldn't rely on your command prompt to be UTF-8 compliant; I'd 
 recommend inserting data using a web interface if nothing else (or 
 your own Unicode-compatible client) to a BINARY field (not TEXT) 
 unless you have MySQL with Unicode support.  Treat the data as binary 
 _everywhere_; pretend you can't translate it, etc. except using safe 
 tools (like the iconv library on *nix).  UTF-8 is just an encoding of 
 Unicode; you may get more mileage in Windows using 16-bit Unicode.

Is there such a thing as MySQL with Unicode support? I'm fairly new to 
MySQL but all my research has led me to believe that this is still a
"to do" item.

 See: http://www.unicode.org/ for reference, especially 
 http://www.unicode.org/unicode/faq/basic_q.html.

 To best deal with UTF-8 in a program, use dynamically-allocated 
 strings and never assume things like the 4th char in a string is 
 string[3] or anything.  Pass-through is the best way to deal with 
 UTF-8 until you actually have to handle processing of it (doing 
 something to a Unicode/UTF-8 string) -- read it from a 
 Unicode-compliant program / field / widget and write it straight to 
 the DB without translations, then read it when you need it and compare 
 it against something if necessary and display it.  Just because it 
 looks like garbage when it's raw doesn't mean it _is_ garbage.
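
A small illustration of why the 4th character is not necessarily at index 3 of the raw data: in UTF-8 the number of bytes and the number of characters differ. The class and method names below are invented for the example, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: byte count versus character count under UTF-8. */
class Utf8Lengths
{
    /** Number of bytes the string occupies once encoded as UTF-8. */
    static int byteLength( String s )
    {
        return s.getBytes( StandardCharsets.UTF_8 ).length;
    }

    /** Number of Unicode code points in the string. */
    static int codePointLength( String s )
    {
        return s.codePointCount( 0, s.length() );
    }

    public static void main( String[] args )
    {
        // "Nihongo" (the word for the Japanese language), written with
        // Unicode escapes so the source file encoding does not matter.
        String nihongo = "\u65e5\u672c\u8a9e";
        System.out.println( codePointLength( nihongo ) ); // prints 3
        System.out.println( byteLength( nihongo ) );      // prints 9
    }
}
```

Each of those three characters takes three bytes in UTF-8, so indexing into the byte array by character position lands in the middle of a character.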

I've read lots of similar posts in the archives at 
http://lists.mysql.com/. Many suggestions to use a BLOB instead of a 
text field. But MySQL supports double-byte languages. Why not use an 
encoding it supports (SJIS or UJIS for Japanese) instead of this 
kludge? If I compile MySQL to support UJIS with  --with-charset=sjis 
won't text fields then store ujis encoded text properly? I'd like to 
use Unicode too but if it's not supported yet...

Thanks,

-Kirk






RE: Japanese Charset

2002-09-28 Thread Dawn Friedland

Kirk Samuelson wrote:
 I've read lots of similar posts in the archives at 
 http://lists.mysql.com/. Many suggestions to use a BLOB 
 instead of a 
 text field. But MySQL supports double-byte languages. Why not use an 
 encoding it supports (SJIS or UJIS for Japanese) instead of this 
 kludge? If I compile MySQL to support UJIS with  --with-charset=sjis 
 won't text fields then store ujis encoded text properly? I'd like to 
 use Unicode too but if it's not supported yet...

The idea is to be able to store Latin and Japanese in the same database
(as well as Chinese & Korean). Isn't that supported by MySQL? People on
this list say they've done it successfully. 

If I compile MySQL using --with-charset=sjis , how will it handle the
Latin, Chinese, and Korean characters? 

Dawn


-
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail [EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php




Re: Japanese Charset

2002-09-27 Thread Joel Rees

 Lastly, I am convinced that this is a problem with MySQL.

I have some time again, let me try again. It's a tortuous tale, but bear
with me.

Background in brief:

Shift-JIS is a real pain in the neck to parse. When you look at a byte,
it's often impossible to tell whether you're looking at the first byte
or the second byte of the character.

Shift-JIS has this almost-neat feature where they shoe-horned US-ASCII,
with the two exceptions I've mentioned before, into the lowest 128 codes.
_If_ _you_ _parse_ _from_ _the_ _beginning_, you can deterministically
pick off each one-byte character and each two byte character. 

Unfortunately, if line-noise causes you to miss a byte, or if you start
anywhere but the beginning of a string, you can run into a lot of
situations where you can't tell whether you're in the middle of a
character or not.
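
The difference between parsing from the beginning and starting somewhere in the middle can be sketched in Java. This is only an illustration: the class and method names are made up, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are the ones commonly given for Windows-style shift-JIS, assumed here rather than taken from this thread.

```java
final class SjisScan {
    /** Assumed shift-JIS lead-byte ranges: 0x81-0x9F and 0xE0-0xFC. */
    static boolean isLeadByte(int b) {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }

    /**
     * Walks the bytes from the start and labels each one:
     * S = single-byte character, L = lead byte, T = trail byte.
     * A lead byte with nothing after it is labelled S.
     */
    static String classify(byte[] text) {
        StringBuilder kinds = new StringBuilder();
        int i = 0;
        while (i < text.length) {
            int b = text[i] & 0xFF;
            if (isLeadByte(b) && i + 1 < text.length) {
                kinds.append("LT"); // consume both bytes of the character
                i += 2;
            } else {
                kinds.append('S');
                i += 1;
            }
        }
        return kinds.toString();
    }

    public static void main(String[] args) {
        // katakana 'so' (0x835c), a real backslash, then 'A'
        byte[] whole = { (byte) 0x83, 0x5C, 0x5C, 0x41 };
        // the same text with the first byte lost
        byte[] damaged = { 0x5C, 0x5C, 0x41 };
        System.out.println(classify(whole));
        System.out.println(classify(damaged));
    }
}
```

Scanning 83 5c 5c 41 from the first byte labels the first 0x5c as a trail byte; with the first byte missing, the same 0x5c is read as a one-byte backslash.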

The byte value of 0x5c, which is that wonderful backslash which MySQL
(and C, PHP, Java, et al.) uses as an escape character, is one case in
point. It can be either a single byte of backslash, or it can be the end
byte of one of several valid Japanese characters.

*** moderately important stuff ***

I believe, if you set up MySQL to operate in shift-JIS, it parses the
characters correctly, in addition to handling sort and collation
correctly. But that's kind of beside the point here. If my memory here
is correct, you can set your whole database (well, your instance of
MySQL) to assume the shift-JIS character set, and MySQL will properly
ignore any 0x5c that is not a backslash.

*** end moderately important stuff ***

(unimportant stuff)
(The fact that the backslash shows up as the yen symbol in your browser
when looking at a shift-JIS page is a red herring. I shouldn't have
brought it up. Ignore it. Another piece of information you didn't need,
and should ignore, is that shift-JIS double-wide backslash is 0x815f. It
is not used as an escape character, so just forget I told you its value.
Don't bother looking for it in your text. It has nothing to do with
what's going on. Incidentally, its end byte is the same value as
US-ASCII underscore, which you also don't need to know.)
(end unimportant stuff)

But, you are trying to use multiple languages in a single database. It
so happens that, if you play by the rules (which are a little difficult
to keep straight the first time) you can get away with this. Why?
Because you are not sorting or collating, just getting the data out by
keys that have nothing to do with shift-JIS.

In either the current MySQL 4.0 or the next version, I don't remember
which, I am told you will be able to set the language for each table.
This will be marvelous. A lot of peace of mind, and great help, in fact.
But not necessary for your current project, since you aren't collating
or sorting on Japanese.

** extremely important stuff **

The issue you face is how to avoid the 0x5c being treated as an escape
character.

But it doesn't matter whether the text is Japanese or not.

** end extremely important stuff **

Let me explain that. You are taking text from an ordinary source and
pasting it into MySQL (through the command line, shall we say). You
should never have any escapes in that text. If you see a \, it is merely
a backslash. \t, if it were in your text, would not mean the tab
character. \n would not mean a newline. A quote would not be the
beginning of a string, and a \" would be just that, a backslash followed
by a double quote. See

http://www.mysql.com/doc/en/String_syntax.html 

Note carefully that this page tells you exactly what characters will
require escaping. (Sorry I can't provide the code. It shouldn't be that
hard to write, or maybe someone else on the list who uses .asp will
volunteer. As a hint, I'll give you a bit of untested C below.)

If MySQL allows you to turn off character escaping, you would want to
simply turn it off. As I understand, it doesn't. So you need to catch
NULs, backslashes, and single and double quotes, and stick a(nother)
backslash in front of them before you pass them to the database.
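
A minimal sketch of such a routine in Java, working on raw bytes. The class name is hypothetical, and the list of escaped bytes is just the four named above; the String_syntax page has the full list. NUL is written as a backslash followed by the digit 0, which is how MySQL spells that escape.

```java
import java.io.ByteArrayOutputStream;

final class ByteEscape {
    /**
     * Puts a backslash in front of every backslash, single quote and
     * double quote, and turns NUL into backslash + '0'. It looks at
     * single bytes only and knows nothing about multi-byte characters.
     */
    static byte[] escape(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(raw.length + 8);
        for (byte b : raw) {
            switch (b) {
                case 0x00:              // NUL
                    out.write('\\');
                    out.write('0');
                    break;
                case 0x5C:              // backslash
                case 0x27:              // single quote
                case 0x22:              // double quote
                    out.write('\\');
                    out.write(b);
                    break;
                default:
                    out.write(b);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] escaped = escape(new byte[] { 'i', 't', '\'', 's' });
        System.out.println(new String(escaped, java.nio.charset.StandardCharsets.US_ASCII));
    }
}
```

Because it works byte by byte, the routine also doubles a 0x5c that is really the second half of a two-byte character, so 83 5c goes out as 83 5c 5c.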

(This should feel a lot like what you do when picking up text from a
form on the web to go into an html page. But it is done at a different
time, and the characters to escape are different.)

** crux-of-the-matter **

This is crude, it feels like a real kludge, but it works. 

Since your database is not set up to parse shift-JIS, it is absolutely
going to think that every 0x5c it sees is a backslash, even when preceded
by 0x83 or some other lead byte of a valid shift-JIS character. It
doesn't know about shift-JIS. Because you haven't set it up to care, it
doesn't care at all, and just basically ignores the whole multi-byte
character business.

So, you just treat the 0x5c end-bytes like backslashes. Escape them like
they were real backslashes, and the escaping backslash gets eaten on its
way into the database.

** end crux-of-the-matter **

0x835c, for instance: Your escaping routine doesn't need 

Re: Japanese Charset

2002-09-27 Thread Michael T. Babcock

Joel Rees wrote:

Shift-JIS is a real pain in the neck to parse. When you look at a byte,
it's often impossible to tell whether you're looking at the first byte
or the second byte of the character.

Can I make a minor recommendation that doesn't help your current 
situation at all?

Use UTF-8.  You can _know_ which bytes are the first or middle bytes of
a character, and in Japanese, kana and the common kanji are always three
bytes per character (even though as an encoding, it's variable length).
It's also sortable; but I haven't tried UTF-8 encoding in MySQL (nor do
I know if it is actually supported).
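
For comparison, here is a small Java sketch of how a UTF-8 byte announces its own role through its top bits, with no need to scan from the start of the string. The class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Kinds {
    /** S = one-byte (ASCII) character, L = lead byte, C = continuation byte. */
    static char kind(byte b) {
        int v = b & 0xFF;
        if (v < 0x80) {
            return 'S';           // 0xxxxxxx
        }
        if ((v & 0xC0) == 0x80) {
            return 'C';           // 10xxxxxx
        }
        return 'L';               // 11xxxxxx starts a multi-byte character
    }

    /** Labels every byte independently of its neighbours. */
    static String kinds(byte[] utf8) {
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            sb.append(kind(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A' followed by katakana 'so' (U+30BD), written as an escape
        byte[] bytes = "A\u30BD".getBytes(StandardCharsets.UTF_8);
        System.out.println(kinds(bytes));
    }
}
```

Katakana 'so' comes out as one lead byte and two continuation bytes, and none of the three can be mistaken for 0x5c, since every byte of a multi-byte UTF-8 character is 0x80 or above.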

-- 
Michael T. Babcock
C.T.O., FibreSpeed Ltd.
http://www.fibrespeed.net/~mbabcock







Re: Japanese Charset

2002-09-27 Thread Michael T. Babcock

Dawn Friedland wrote:

Prior to my client requesting that I add Japanese content to the content
tool & database, I had zero experience with character sets other than
Latin. I always used notepad to filter out any weird MS Word formatting
and left the default as ANSI.

I had that problem a year ago too, prior to doing Japanese database work.

Many people have recommended I use UTF-8. I interpreted that to mean
that when I have the Japanese text in notepad, I choose file, save as,
and then choose the encoding as UTF-8. When I do that, and then
copy/paste to insert using the DOS prompt, I get the same problematic
results. Is there something I am missing or not understanding when
people tell me to use UTF-8? Am I supposed to configure the table
or database somehow to use it or should I be running the text through a
UTF-8 converter other than notepad?
  

I wouldn't rely on your command prompt to be UTF-8 compliant; I'd 
recommend inserting data using a web interface if nothing else (or your 
own Unicode-compatible client) to a BINARY field (not TEXT) unless you 
have MySQL with Unicode support.  Treat the data as binary _everywhere_; 
pretend you can't translate it, etc. except using safe tools (like the 
iconv library on *nix).  UTF-8 is just an encoding of Unicode; you may
get more mileage in Windows using 16-bit Unicode.

See: http://www.unicode.org/ for reference, especially 
http://www.unicode.org/unicode/faq/basic_q.html.

To best deal with UTF-8 in a program, use dynamically-allocated strings 
and never assume things like the 4th char in a string is string[3] or 
anything.  Pass-through is the best way to deal with UTF-8 until you 
actually have to handle processing of it (doing something to a 
Unicode/UTF-8 string) -- read it from a Unicode-compliant program / 
field / widget and write it straight to the DB without translations, 
then read it when you need it and compare it against something if 
necessary and display it.  Just because it looks like garbage when it's
raw doesn't mean it _is_ garbage.
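
The string[3] point can be made concrete with a short Java sketch. The helper name is hypothetical; it finds where the n-th character starts by skipping UTF-8 continuation bytes (those of the form 10xxxxxx) instead of indexing the byte array directly.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Index {
    /**
     * Returns the byte offset at which the character with zero-based
     * index n begins, or -1 if the text has fewer characters than that.
     */
    static int byteOffsetOfChar(byte[] utf8, int n) {
        int seen = 0;
        for (int i = 0; i < utf8.length; i++) {
            boolean startsChar = (utf8[i] & 0xC0) != 0x80; // not a continuation byte
            if (startsChar) {
                if (seen == n) {
                    return i;
                }
                seen++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // 'a', katakana 'so' (U+30BD), 'b', 'c': bytes 61 e3 82 bd 62 63
        byte[] bytes = "a\u30BDbc".getBytes(StandardCharsets.UTF_8);
        System.out.println(byteOffsetOfChar(bytes, 3));
    }
}
```

In that sample the 4th character is 'c' at byte offset 5, while the byte at index 3 is 0xbd, the tail end of the katakana.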

Others may have other tips ...

-- 
Michael T. Babcock
C.T.O., FibreSpeed Ltd.
http://www.fibrespeed.net/~mbabcock







RE: Japanese Charset

2002-09-26 Thread Dawn Friedland

Here's an update on this issue: The problem is when copy/pasting
Japanese characters into MySQL, certain characters are being corrupted. 
I've ruled out Word as the problem. 
I've ruled out my content tool as the problem.
I've ruled out the ASP display page as the problem. 

I've configured my Regional & Language settings to Japanese so I can see
the Japanese text in both the DOS Prompt and notepad.  I received
Japanese text which had been typed directly into notepad (saved as ANSI,
Unicode, UTF-8, or UTF-8 big endian - doesn't make a difference). I
inserted the text directly into the database by copy/pasting into the
DOS Prompt. A select statement will then display the data I just
entered, and the characters are broken! Here is a screen shot:
http://commworks01.barklouder.com/japan/press/370.jpg

For details see
http://commworks01.barklouder.com/japan/press/broken_chars.asp (You'll
notice that there aren't any backslashes present.)

Several people have suggested that I encode as UTF-8, then prior to
inserting the characters to the database, I should convert each
character to its HTML entity in the form &#xNNNN; where NNNN is the
hexadecimal representation of the UTF-16 value for the character.  My
question: Does anyone know of a table that maps Japanese characters to
such HTML entities? (or know of a converter for that matter?)
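
No lookup table should be needed for this, since a hexadecimal numeric character reference is just the character's Unicode number written in hex. A minimal Java sketch follows; the class and method names are made up, and it assumes the text has already been decoded correctly into a Java String. For the kana and kanji in ordinary text the code point is the same number as the 16-bit Unicode value.

```java
import java.util.Locale;

final class HtmlEntities {
    /**
     * Leaves ASCII alone and rewrites every other character as a
     * hexadecimal numeric character reference. It does not escape
     * the HTML-special ASCII characters.
     */
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // katakana 'so', 'fu', 'to' written as escapes
        System.out.println(toNumericReferences("\u30BD\u30D5\u30C8 soft"));
    }
}
```

The output is pure ASCII with no 0x5c bytes in it, at the price of storing six to eight bytes per Japanese character.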

Lastly, I am convinced that this is a problem with MySQL. I am somewhat
of a newbie - should I encourage my client to buy a support package from
MySQL AB?

Sincerely,
Dawn Friedland
[EMAIL PROTECTED]





RE: Japanese Charset

2002-09-24 Thread Dawn Friedland

Here's a brief summary of where we are: 
I am trying to store Japanese text (Shift_Jis) in MySQL and view it from a
web page.  The content is provided to me in Word format. I convert it to
plain text, copy/paste into a web form in an ASP-based CMS on a Windows
box. When viewed from a web page, seemingly random characters are
morphed into other characters. The majority of the database contains
rows in Latin. MySQL supports Japanese and Latin in the same table.
Other people are able to do this without the morphing problem. My
Regional & Language settings in Windows are set to Shift_Jis in order
to view Shift_Jis characters in notepad and the DOS prompt. If I
circumvent the CMS and copy/paste from notepad directly to MySQL in the
DOS Prompt, the results are the same (although fewer characters are
broken when viewed through DOS).

For a good explanation visit this problem's web site:
http://commworks01.barklouder.com/japan/press/broken_chars.asp

I conclude that one of two things may be happening:
1. Characters are being corrupted because they were originally copied
from Word, despite the conversion to plain text. (At this point I do not
have a plain text file with content typed directly into notepad, i.e.
with Word circumvented. I am at the mercy of the client's PR department.)
2. Characters are being corrupted by MySQL. 

If option 1 were true, then why do the characters show up fine when in a
static HTML document? (see below). 

In Response to Joel Rees:
 I checked the text you gave me, and I found what's getting 
 clobbered. It's the latter half of characters like the katakana 'so'.
 
 Although the byte that is getting walked on here is 0x5c, 
 this is _not_ the escape character. It is preceded (in the 
 case of katakana 'so') by a byte of 0x83. The entire 
 character is '0x835c', and the 0x5c is being treated as if it 
 were a backslash. There are other characters that will get 
 hit by this, by the way.

Question 1: It seems like a lot more characters are getting hit than
just '0x835c'. How do I map the 0x835c to what the character looks like?
I don't know what 0x835c is. 
Question 2: How  do I handle the character escape mechanism correctly
according to MySQL? 







Re: Japanese Charset

2002-09-24 Thread kayamboo

Hello friend,
I am not an expert in MySQL; I am also a newbie.
But as far as I can tell, MySQL has nothing to do with this problem.
I am using MySQL with JSP to store and retrieve Japanese characters.
My O/S is also Japanese, Windows NT 4.0 SP6a. Despite that, I got errors in
my browser in the beginning, so I wrote a small bean component.
This component reads the characters (Shift_JIS, a, b, or any) and converts
them into (8859_1) and puts them into the database.
While retrieving the data, I do the reverse.
Also make sure the right charset is set in your HTML page.
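
The bean itself is not shown in this thread, so the following Java is only a guess at the kind of conversion described: the raw bytes are wrapped in a String as if they were ISO-8859-1, where every byte value maps to exactly one character, and unwrapped again on the way out. The class and method names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    /** Treats each raw byte as one ISO-8859-1 character. */
    static String wrap(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the original bytes from a wrapped String. */
    static byte[] unwrap(String stored) {
        return stored.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // shift-JIS bytes for katakana 'so' and 'fu'
        byte[] sjis = { (byte) 0x83, 0x5C, (byte) 0x83, 0x74 };
        String stored = wrap(sjis);
        System.out.println(stored.length());
        System.out.println(Arrays.equals(unwrap(stored), sjis));
    }
}
```

The round trip keeps every byte, but the wrapped string still contains a backslash character wherever the original had a 0x5c byte, so it still has to be escaped before it goes into a query.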

My knowledge of ASP is a big zero. So if you are using JSP, I can send you
the bean.

Best of luck






Re: Japanese Charset

2002-09-24 Thread Joel Rees

Dawn, 

Look at chapter 6.1.1.1 in the MySQL docs:

http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#Literals

Practically the first odd thing mentioned is escape sequences. Keep in
mind that the ASCII code for the backslash character, which is used to
initiate escape sequences, is 0x5c.

(I work with php and perl. Anyone have a good sample of input filters
from .asp files to MySQL?)

 Here's a brief summary of where we are: 
 I am trying to store Japanese text (Shift_Jis) in MySQL and view it from a
 web page.  The content is provided to me in Word format. I convert it to
 plain text, copy/paste into a web form in an ASP-based CMS on a Windows
 box. When viewed from a web page, seemingly random characters are
 morphed into other characters. The majority of the database contains
 rows in Latin. MySQL supports Japanese and Latin in the same table.
 Other people are able to do this without the morphing problem. My
 Regional & Language settings in Windows are set to Shift_Jis in order
 to view Shift_Jis characters in notepad and the DOS prompt. If I
 circumvent the CMS and copy/paste from notepad directly to MySQL in the
 DOS Prompt, the results are the same (although fewer characters are
 broken when viewed through DOS).
 
 For a good explanation visit this problem's web site:
 http://commworks01.barklouder.com/japan/press/broken_chars.asp

Hmm. The characters you pasted directly into the .asp file did not
survive intact. I can't read that very first sample, can't even guess
what that's saying. The first few words are "fourth quarter soft
(something) market", but enough falls apart after that that it's hard to
tell what else is missing. It looks like something about functionality
being scheduled for development.

If you understand what MySQL is doing to two-byte characters which have
a second byte of 0x5c, then you are ready to dig into ASP and find out
if ASP wants the text escaped somehow. (Or take that question to an ASP
mailing list.)

Can you post that page as pure, unserved, html and send me the link
off-list? (Text, with the .htm extension, unless your server forces html
through asp, too.) If I can make sense of it as pure html text, you'll
be able to completely rule MSWord and the OS out.

 I conclude that one of two things may be happening:
 1. Characters are being corrupted because they were originally copied
 from Word, despite the conversion to plain text. (At this point I do not
 have a plain text file with content typed directly into notepad, i.e.
 with Word circumvented. I am at the mercy of the client's PR department.)

No real need to worry about MSWord, I think. Anyway, if, as you say below,
pasting the characters in static HTML is okay, you can be sure that
MSWord is giving you no problems now.

 2. Characters are being corrupted by MySQL. 

Well, sort of. Except that MySQL is not really the culprit, because the
behavior in question is part of the spec, and has been for quite a while.

(At least one user of MySQL wanted MySQL to change their spec to conform
with Oracle's spec, but since the state of the SQL standard is a mess,
it's a hard point to argue right now. But the escape sequence _is_ part
of MySQL's spec.)

 If option 1 were true, then why do the characters show up fine when in a
 static HTML document? (see below). 

I want to see that static HTML.

 In Response to Joel Rees:
  I checked the text you gave me, and I found what's getting 
  clobbered. It's the latter half of characters like the katakana 'so'.
  
  Although the byte that is getting walked on here is 0x5c, 
  this is _not_ the escape character. It is preceded (in the 
  case of katakana 'so') by a byte of 0x83. The entire 
  character is '0x835c', and the 0x5c is being treated as if it 
  were a backslash. There are other characters that will get 
  hit by this, by the way.
 
 Question 1: It seems like a lot more characters are getting hit than
 just '0x835c'. How do I map the 0x835c to what the character looks like?

I said "like". I suppose I was not clear about how they would be similar.

Two-byte characters with a final byte of 0x5c are going to be caught by
MySQL, interpreted as something followed by an escape character,
followed by the next byte (first byte of the next character) as a
literal. In some odd situations, you might end up with a control
character, in others, the 0x5c just simply disappears, leaving the
character stream corrupted. Only one byte lost, but the final app will
think that characters are starting on what is really the second byte.

Once you lose one character, a whole bunch get out of sync.
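
The loss can be reproduced with a toy un-escaper in Java. This is a simplified stand-in for what the server does with backslash sequences, not MySQL's actual code: it only knows \n, \t, \r and \0 and passes any other escaped byte through as a literal. All names are made up.

```java
import java.io.ByteArrayOutputStream;

final class UnescapeDemo {
    /** Byte-wise un-escaping that knows nothing about two-byte characters. */
    static byte[] unescape(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            int b = in[i] & 0xFF;
            if (b == 0x5C && i + 1 < in.length) {
                int next = in[++i] & 0xFF;   // the byte after the "backslash"
                switch (next) {
                    case 'n': out.write('\n'); break;
                    case 't': out.write('\t'); break;
                    case 'r': out.write('\r'); break;
                    case '0': out.write(0);    break;
                    default:  out.write(next); // taken as a literal
                }
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    /** Formats bytes as space-separated two-digit hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // shift-JIS bytes 0x835c 0x8374 0x8367, stored without escaping
        byte[] sjis = { (byte) 0x83, 0x5C, (byte) 0x83, 0x74, (byte) 0x83, 0x67 };
        System.out.println(hex(unescape(sjis)));
    }
}
```

Fed the bytes 83 5c 83 74 83 67, it returns 83 83 74 83 67: one byte shorter, with the 0x5c gone. If the byte after the 0x5c had been an ASCII t instead, the pair would have turned into a tab, which is the control-character case.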

Example, the sequence for "sofuto" (modern Japanese word imported from
the English "soft") is 0x835c 0x8374 0x8367. If you let MySQL try to
interpret the 0x5c as a backslash, it thinks that you're just telling it
that it should not do anything out of the ordinary with the 0x83 which
follows. The result is 0x8383 0x74 0x8367, which is not too bad. 0x8383

Re: Japanese Charset

2002-09-20 Thread Joel Rees

...

 The problem characters are the ASCII backslash and the ASCII tilde -
 Good to know, I will eliminate those, although there are still many more
 problem characters. 

Well, actually, the one-byte backslash and tilde you can leave alone.
They survive intact, they just display differently over here.
Everybody's used to it, so no problem. Even programming in C, when we
write something like '\t', the backslash (0x5c) shows up in our editors
as the yen symbol, and we pretend that the escape character is the yen
symbol, because, for us, when the encoding is shift-JIS, it is. So don't
worry about the one-byte characters.

...

I checked the text you gave me, and I found what's getting clobbered.
It's the latter half of characters like the katakana 'so'.

Although the byte that is getting walked on here is 0x5c, this is _not_
the escape character. It is preceded (in the case of katakana 'so') by a
byte of 0x83. The entire character is '0x835c', and the 0x5c is being
treated as if it were a backslash. There are other characters that will
get hit by this, by the way.

Bells ringing all over in my head. I think your content tool is
mishandling backslashes, but it could be that MySQL or the driver is
doing something the tool doesn't expect. (Well, really, the tool is
probably mis-handling the backslashes.) 

This is actually independent of the language issues. I'm pretty sure
I've seen this subject come up before on the list, just can't remember
which way the turkey rolled. But your content tool will need to do
something slightly different with the input.

Could you search the archives about escape sequences or the backslash
character?

(Maybe someone who remembers could chime in here?)

-- 
Joel Rees [EMAIL PROTECTED]






Re: Japanese Charset

2002-09-20 Thread xuefer tinys

I'm sure MySQL 4.x uses a wrong algorithm to escape/unescape multibyte
characters. Either a multibyte character is escaped, but when the server
reads it, it may be taken as a pair of single bytes; or a pair of single
bytes is escaped, but when the server reads it, it appears to be a
multibyte character. Both situations make the server unescape incorrectly.
I've posted the problem, but no one paid attention to it. At least, those
who are not using multibyte characters will never care about this problem.

I don't know whether your problem is really caused by this wrong
multibyte escape algorithm.





RE: Japanese Charset

2002-09-20 Thread Shashank Tripathi

Hi Xuefer,

What are you talking about? A lot of people are using MySQL without any
problems with multibyte characters. Please post a reference URL, with
perhaps a detailed explanation of the problem -- you mentioned you have
brought this to the attention of people already, please post a relevant
URL? 

Thanks,
Shashank



| i'm sure mysql4.x have wrong algorithm to escape/unescape 
| multibyte chars a multibyte is escaped while server read 
| it, it maybe a pair of single byte or a pair of single 
| byte escaped while server read it, it appears as 
| multibyte
| both of these two situation make server unescape 
| incorrectly i've post the problem, no one take attention 
| to it. at least, those who not 
| using multibyte will never care about this problem.
| 
| dunno weather your problem really cause by this wrong 
| multi-byte-escape-algorithm
| 








Re: Japanese Charset

2002-09-20 Thread Paul DuBois

At 9:09 + 9/20/02, xuefer tinys wrote:
i'm sure mysql4.x have wrong algorithm to escape/unescape multibyte chars
a multibyte is escaped while server read it, it maybe a pair of single byte
or a pair of single byte escaped while server read it, it appears as multibyte
both of these two situation make server unescape incorrectly
i've post the problem, no one take attention to it. at least, those
who not using multibyte will never care about this problem.

You'll probably find that the issue receives more developer attention
if you can provide hard evidence that there is a problem, preferably
accompanied by a repeatable test case.  The assertion "I'm sure MySQL
has a problem" just doesn't carry much weight otherwise.  There are plenty
of assertions like that on this mailing list, the vast majority of which turn
out to be misunderstandings on the user end.

I'm not saying you haven't uncovered a real bug, just that a better
demonstration that there *is* a bug would be more helpful than just making
a claim.


dunno weather your problem really cause by this wrong
multi-byte-escape-algorithm

From: Joel Rees [EMAIL PROTECTED]
To: Dawn Friedland [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: Japanese Charset
Date: Fri, 20 Sep 2002 16:25:29 +0900

...

  The problem characters are the ASCII backslash and the ASCII tilde -
   Good to know, I will eliminate those, although there are still many
more
   problem characters.

Well, actually, the one-byte backslash and tilde you can leave alone.
They survive intact, they just display differently over here.
Everybody's used to it, so no problem. Even programming in C, when we
write something like '\t', the backslash (0x5c) shows up in our editors
as the yen symbol, and we pretend that the escape character is the yen
symbol, because, for us, when the encoding is shift-JIS, it is. So don't
worry about the one-byte characters.

...

I checked the text you gave me, and I found what's getting clobbered.
It's the latter half of characters like the katakana 'so'.

Although the byte that is getting walked on here is 0x5c, this is _not_
the escape character. It is preceded (in the case of katakana 'so') by a
byte of 0x83. The entire character is '0x835c', and the 0x5c is being
treated as if it were a backslash. There are other characters that will
get hit by this, by the way.

Bells ringing all over in my head. I think your content tool is
mishandling backslashes, but it could be that MySQL or the driver is
doing something the tool doesn't expect. (Well, really, the tool is
probably mis-handling the backslashes.)

This is actually independent of the language issues. I'm pretty sure
I've seen this subject come up before on the list, just can't remember
which way the turkey rolled. But your content tool will need to do
something slightly different with the input.

Could you search the archives about escape sequences or the backslash
character?

(Maybe someone who remembers could chime in here?)

--
Joel Rees [EMAIL PROTECTED]


-
Before posting, please check:
http://www.mysql.com/manual.php   (the manual)
http://lists.mysql.com/   (the list archive)

To request this thread, e-mail [EMAIL PROTECTED]
To unsubscribe, e-mail
[EMAIL PROTECTED]
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php








Re: Japanese Charset

2002-09-20 Thread xuefer tinys

Yes, when I posted a problem, someone told me to give more info.
When I posted the whole PHP script, there was no response.

mysql-4.x is unable to handle binary data when using a multibyte charset
(maybe the old mysql-3.x escape/unescape is fine).

file: mysql-4.0.1-alpha/libmysql_r/libmysql.c
in function mysql_sub_escape_string

...
#ifdef USE_MB
    int l;
    if (use_mb_flag && (l = my_ismbchar(charset_info, from, end)))
    {
      while (l--)
        *to++ = *from++;
      from--;
      continue;
    }
#endif

what the hell is that while (l--) *to++ = *from++; ??
it has never been in old mysql-3.x
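
For readers following along, here is a rough Python paraphrase of what that loop appears to do: when a multibyte character starts at the current position, all of its bytes are copied without escaping, so a trail byte of 0x5c is left alone. This is a sketch, not the MySQL source. gbk_char_len is a hypothetical stand-in for my_ismbchar(), and the GBK byte ranges (lead 0x81-0xFE, trail 0x40-0xFE except 0x7F) are an assumption made for illustration.

```python
def gbk_char_len(data: bytes, i: int) -> int:
    """Return 2 if a GBK two-byte character starts at data[i], else 0.

    Hypothetical stand-in for my_ismbchar(); byte ranges are assumed.
    """
    if i + 1 < len(data) and 0x81 <= data[i] <= 0xFE:
        trail = data[i + 1]
        if 0x40 <= trail <= 0xFE and trail != 0x7F:
            return 2
    return 0


# Bytes that get a backslash escape: NUL, newline, CR, backslash,
# single quote, double quote, Ctrl-Z.
ESCAPES = {0x00: b"\\0", 0x0A: b"\\n", 0x0D: b"\\r", 0x5C: b"\\\\",
           0x27: b"\\'", 0x22: b'\\"', 0x1A: b"\\Z"}


def escape(data: bytes, multibyte: bool) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        n = gbk_char_len(data, i) if multibyte else 0
        if n:
            out += data[i:i + n]  # copy the whole character unescaped
            i += n
            continue
        out += ESCAPES.get(data[i], data[i:i + 1])
        i += 1
    return bytes(out)


s = bytes([200]) + b"\\'"  # the third string from the test script
print(escape(s, multibyte=False).hex(" "))  # c8 5c 5c 5c 27
print(escape(s, multibyte=True).hex(" "))   # c8 5c 5c 27
```

The two outputs match the "escaped" lines reported for the Windows and Linux cases respectively, which suggests the Windows client was escaping byte by byte while the Linux client was charset-aware.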

and now, re-post my mail here


ok, finally have a kind man take a look at my problem :)
now before pasting a long example-script
here is the test result:

**
case 1
php4.2, mysql4.0.2 using GBK charset, windows
mysqlclient-gbk is not supported by php4.2 win build, but i made gbk.conf 
in C:\mysql\share\charsets

result:


string -> È" ( [ È ] [ " ] )
escaped -> È\" ( [ È ] [ \ ] [ " ] )
result -> È\" ( [ È ] [ \ ] [ " ] )
*** damn!


string -> È' ( [ È ] [ ' ] )
escaped -> È\' ( [ È ] [ \ ] [ ' ] )
cant query, error:#1064 You have an error in your SQL syntax near ''È\''' at line 1

string -> È\' ( [ È ] [ \ ] [ ' ] )
escaped -> È\\\' ( [ È ] [ \ ] [ \ ] [ \ ] [ ' ] )
cant query, error:#1064 You have an error in your SQL syntax near ''È\\\''' at line 1

**
case 2
php4.2, mysql4.0.2 using GBK charset, linux
php4.2 compiled with lib mysql, GBK supported

result:


string -> È" ( [ È ] [ " ] )
escaped -> È\" ( [ È ] [ \ ] [ " ] )
result -> È\" ( [ È ] [ \ ] [ " ] )
*** damn!


string -> È' ( [ È ] [ ' ] )
escaped -> È\' ( [ È ] [ \ ] [ ' ] )
cant query, error:#1064 You have an error in your SQL syntax near ''È\''' at line 1

string -> È\' ( [ È ] [ \ ] [ ' ] )
escaped -> È\\' ( [ È ] [ \ ] [ \ ] [ ' ] )
result -> È\' ( [ È ] [ \ ] [ ' ] )
* fine


and the php test script
**
<?php
error_reporting(E_ALL);
$conn = mysql_connect('localhost', 'user', 'pass') or die('cant connect');

// chr(200) is byte 0xC8; each test appends bytes that need escaping
test(chr(200) . '"', $conn);
test(chr(200) . "'", $conn);
test(chr(200) . "\\'", $conn);

function test($str, $conn)
{
    echo "<br><br>";
    dump_str('string', $str);
    $q_str = mysql_escape_string($str); // you may also try mysql_escape_string() (php cvs only)
    dump_str('escaped', $q_str);
    $res = mysql_query("SELECT '$q_str'");
    if (!$res) {
        print('<font color=red>cant query</font>, error:#'
            . mysql_errno()
            . ' '
            . mysql_error());
        return;
    }
    $row = mysql_fetch_row($res) or die('empty result');
    dump_str('result', $row[0]);
    // the round trip is fine only if the server returns the original bytes
    echo $row[0] === $str ? "* fine<br>" : "*** damn!<br>";
}
function dump_str($name, $str)
{
    // print the string, then each byte in its own brackets
    echo "$name -&gt; $str (";
    for ($i = 0; $i < strlen($str); $i ++)
    {
        echo ' [ ' , $str{$i}, ' ] ';
    }
    echo ")<br>";
}
?>
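
One way to read the results reported above is to model the receiving side as well. The Python sketch below is a hypothetical model of a GBK-aware reader for a single-quoted literal; it is not MySQL's parser, the GBK byte ranges are assumed, and it deliberately ignores special escapes such as \n or \0 because the test strings do not use them.

```python
def gbk_pair(data: bytes, i: int) -> bool:
    """True if data[i:i+2] looks like a GBK two-byte character (assumed ranges)."""
    return (i + 1 < len(data)
            and 0x81 <= data[i] <= 0xFE
            and 0x40 <= data[i + 1] <= 0xFE
            and data[i + 1] != 0x7F)


def read_literal(body: bytes):
    """Read the contents of '...' the way a GBK-aware reader might.

    `body` is everything after the opening quote, closing quote included.
    Returns the unescaped value, or None when the quotes do not balance
    (which a real server would report as a syntax error).
    """
    out = bytearray()
    i = 0
    while i < len(body):
        if gbk_pair(body, i):
            out += body[i:i + 2]     # two-byte character, taken as-is
            i += 2
        elif body[i] == 0x5C and i + 1 < len(body):
            out.append(body[i + 1])  # backslash escape: keep the next byte
            i += 2
        elif body[i] == 0x27:
            # closing quote: anything left over means unbalanced quotes
            return bytes(out) if i == len(body) - 1 else None
        else:
            out.append(body[i])
            i += 1
    return None  # ran off the end without finding a closing quote


# Escaped forms from the reports above, each followed by the closing quote.
print(read_literal(bytes.fromhex("c85c22") + b"'"))      # 3 bytes back, not the original 2
print(read_literal(bytes.fromhex("c85c27") + b"'"))      # None (unbalanced quotes)
print(read_literal(bytes.fromhex("c85c5c5c27") + b"'"))  # None (byte-wise escape, case 1)
print(read_literal(bytes.fromhex("c85c5c27") + b"'"))    # original 3 bytes (case 2)
```

Under these assumptions the model reproduces the pattern in both cases: the first string comes back with an extra backslash, the second fails to parse, and the third round-trips only when the client escaped with the charset in mind.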


From: Paul DuBois [EMAIL PROTECTED]
To: xuefer tinys [EMAIL PROTECTED], 
[EMAIL PROTECTED],[EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: Japanese Charset
Date: Fri, 20 Sep 2002 08:56:22 -0500

At 9:09 + 9/20/02, xuefer tinys wrote:
I'm sure MySQL 4.x has a wrong algorithm for escaping/unescaping multibyte
chars. Either a multibyte char is escaped and, when the server reads it, it
may look like a pair of single bytes, or a pair of single bytes is escaped
and, when the server reads it, it appears as a multibyte char.
Both situations make the server unescape incorrectly.
I've posted the problem, but no one paid attention to it. At least, those
who are not using multibyte will never care about this problem.

You'll probably find that the issue receives more developer attention
if you can provide hard evidence that there is a problem, preferably
accompanied by a repeatable test case.  The assertion "I'm sure MySQL
has a problem" just doesn't carry much weight otherwise.  There are plenty
of assertions like that on this mailing list, the vast majority of which turn
out to be misunderstanding on the user end.

I'm not saying you haven't uncovered a real bug, just that a better
demonstration that there *is* a bug would be more helpful than just making
a claim.


Dunno whether your problem is really caused by this wrong
multi-byte escape algorithm.

From: Joel Rees [EMAIL PROTECTED]
To: Dawn Friedland [EMAIL PROTECTED]
CC: [EMAIL PROTECTED]
Subject: Re: Japanese Charset
Date: Fri, 20 Sep 2002 16:25:29 +0900

...

  The problem characters are the ASCII backslash and the ASCII tilde -
   Good to know, I will eliminate those, although there are still many
   more problem characters.

Well, actually, the one-byte backslash and tilde you can leave alone.
They survive intact, they just display differently over here.
Everybody's used to it, so no problem. Even programming in C, when we
write something like '\t', the backslash (0x5c) shows up in our editors
as the yen symbol, and we pretend that the escape character is the yen
symbol

RE: Japanese Charset

2002-09-20 Thread Shashank Tripathi

I hope you will recognize that what you stated as a problem is _not_
what this thread was about. 

You are talking about --

  (a) A new (and as yet non-universal) version of MySQL 
  (b) Only binary data 
  (c) Conf done by you, which is not standard 

Whereas the thread was about simple Japanese multibyte display. You'll
notice that being more specific and relevant within the context of a
mailing list thread is more fruitful as Paul indicated in an earlier
note. 

Anyway, hope you sort out your problem (if there is one). 

Cheers,
Shanx


Sql, query 





RE: Japanese Charset

2002-09-20 Thread xuefer tinys

I'm sorry that I had not read through or understood the whole thread.


From: Shashank Tripathi [EMAIL PROTECTED]
To: 'xuefer tinys' 
[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED]

CC: [EMAIL PROTECTED]
Subject: RE: Japanese Charset
Date: Sat, 21 Sep 2002 00:22:04 +0900

I hope you will recognize that what you stated as a problem is _not_
what this thread was about.

You are talking about --

   (a) A new (and as yet non-universal) version of MySQL
   (b) Only binary data
   (c) Conf done by you, which is not standard

Whereas the thread was about simple Japanese multibyte display. You'll
notice that being more specific and relevant within the context of a
mailing list thread is more fruitful as Paul indicated in an earlier
note.

Anyway, hope you sort out your problem (if there is one).

Cheers,
Shanx


Sql, query





RE: Japanese Charset

2002-09-19 Thread Dawn Friedland

Thank you for your time and response. 

I have checked if the text is surviving the paste buffer - it is. (I did
a character by character comparison of the text once pasted in the web
form, but before hitting submit.)

The problem characters are the ASCII backslash and the ASCII tilde -
Good to know, I will eliminate those, although there are still many more
problem characters. 

How do I catch what's going into IIS and what's coming out? Also, what's
going into MyODBC and what's coming out of that? - I will post a
separate message to the list with this as the only question. 

I declare the doctype with <meta http-equiv="Content-Type"
content="text/html; charset=Shift_JIS">. My code is exactly what
Shashank Tripathi posted in another response to my post - which he says
works perfectly for him on Windows/ASP/MySQL.

I am not displaying Japanese, Chinese, Korean, and English on the same
page: each page is intended for a single language audience. However,
both Shashank Tripathi and you suggested using UTF-8 anyway. I will
reserve this as a possible solution. Thank you for the idea. Good to
know I have options. I'm sure I'll be posting questions on that later if
it comes to it. 

What do you mean by "Installed, but not selected"?

So MySQL is using Latin...but with support for multi-byte charsets.
Thank you - I wasn't sure how to interpret the manual. Here is the
documentation from http://www.mysql.com/doc/en/Character_sets.html to
which I assume you are referring: "All standard MySQL binaries are
compiled with --with-extra-charsets=complex. This will add code to all
standard programs to be able to handle latin1 and all multi-byte
character sets within the binary."

I'll be sending you and Shashank Tripathi links to the problem pages and
also the source content  shortly. Anyone else interested in this thread
should email me and I'll include them on the links. 

Again, thank you so much for your time and effort in responding to my
post. 
Dawn Friedland
[EMAIL PROTECTED]







Re: Japanese Charset

2002-09-18 Thread Joel Rees

 This is for anyone out there storing Japanese characters along with
 English characters. 

Hi.

 SUMMARY: 
 The client recently requested that Japanese be stored in an otherwise
 standard English (Latin) MySQL database. Whereas all rows in the table
 used to be Latin only, now some rows store Latin and some store
 Japanese.

This should create no problems for MySQL. Drivers and other software
between may do funny things. Hmm. I wonder if the text is surviving your
paste buffer, if you aren't running the system in Japanese. In other
words, I'm wondering if your text survives the paste from Word to your
publishing tool.

 I do not mix English with Japanese in the same row.

Actually, with one or two exceptions, shift-JIS and euc-JIS should allow
you to mix Japanese and English with no problems. Straight JIS would
have problems, however, because it is two-byte-only.

The problem characters are the ASCII backslash, which is the (half-width)
yen symbol in shift-JIS, and the ASCII tilde, which is sometimes the
(half-width) overbar in shift-JIS. I think euc-JIS does the same
substitutions, but it's been several months since I messed with that.
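
To make the point concrete: the bytes do not change, only the glyph drawn for them does. The Python sketch below is illustrative only. as_jis_roman is a hypothetical helper that renders single-byte values the way a JIS X 0201 Roman display would, where 0x5c is the yen sign and 0x7e is the overline; it assumes every input byte is below 0x80.

```python
# The two positions where JIS X 0201 Roman differs from ASCII.
JIS_ROMAN = {0x5C: "\u00a5",   # yen sign instead of backslash
             0x7E: "\u203e"}   # overline instead of tilde


def as_jis_roman(data: bytes) -> str:
    """Render one-byte characters as a JIS Roman display would show them."""
    return "".join(JIS_ROMAN.get(b, chr(b)) for b in data)


line = b"'\\t' ~"
print(line.hex(" "))       # 27 5c 74 27 20 7e  (bytes are untouched)
print(as_jis_roman(line))  # the backslash shows as a yen sign, the tilde as an overline
```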

 Upon
 writing Japanese data to the database (web form -> ASP -> MyODBC), and
 then viewing the record on a web page (Shift-Jis), I discover that
 random Japanese characters are being 'morphed' into other, seemingly
 random, Japanese characters, and very occasionally, 'morphed' into a
 Latin character (so far just the letter t).

Can you catch what's going into IIS and what's coming out? Also, what's
going into MyODBC and what's coming out of that?

I vaguely recall that MyODBC sometimes coughs if not set up right.

Say, how are you declaring your doctype? You know, the Content-Type
header or meta-tag, or the XML doctype. See:

HTML: http://www.w3.org/TR/html4/charset.html#h-5.2.2
Header:
Content-Type: text/html; charset=EUC-JP
or Meta-tag:
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">

XML: http://www.w3c.org/TR/2000/REC-xml-20001006#sec-prolog-dtd
<?xml version="1.0" encoding="shift-jis" ?>

I think the driver may throw fits if you don't have the document type
declared right. (See above about mixing character sets.)

Hmm. You may find it easiest, if you are trying to display Japanese,
Chinese, Korean, and English on the same page, to use Unicode UTF-8
throughout.

 With the exception of
 these few, random characters, all the Japanese data looks fine *when
 displayed on a web page*.

Worst comes to worst, post the problem characters and what they're
supposed to be. (Or mail them to me direct.) I can take a look and see
what bit patterns are causing problems, and that should yield some clues.
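
Looking at the bit patterns can be done with a few lines of code. The Python sketch below is one possible way, not a tool from this thread: show_hex and first_difference are hypothetical helper names, and the sample bytes are made up for illustration. It prints the hex value of each byte and reports where the stored text first departs from what was submitted.

```python
def show_hex(data: bytes) -> str:
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)


def first_difference(expected: bytes, actual: bytes):
    """Return the index of the first differing byte, or None if identical."""
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))  # one is a prefix of the other
    return None


# Hypothetical sample: what was submitted versus what came back.
submitted = bytes.fromhex("83 5c 41")
stored = bytes.fromhex("83 41")

print(show_hex(submitted))                  # 83 5c 41
print(show_hex(stored))                     # 83 41
print(first_difference(submitted, stored))  # 1
```

Running the same comparison at each hop (form input, after ASP, after MyODBC, after reading back) would narrow down which layer changes the bytes.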

 This is a standard install of MySQL version
 3.23.38-nt (on Windows 2000 SP2) - support for Japanese characters is
 installed by default, I assume. 

Installed, but not selected.

 I also store Chinese and Korean
 characters in the same table, and those character sets are displayed
 without error. 

I would expect errors there too, unless you're using Unicode (UTF-8) for
those.

 Question 1. If I were to pull the Japanese rows out and put them in a
 separate table - what do I do to the table to 'configure' it as storing
 sjis characters without setting the default character set to the entire
 database?

Your version of MySQL does not support that. It's set in the my.ini or
my.cnf configuration file for the whole database when you start MySQL up.
So changing the settings for Japanese won't solve your problems unless
you want to set up another instance (MSWindows, so that probably means
another machine) of MySQL just for the Japanese. You shouldn't need to
do that, however.

The settings in my.cnf/my.ini are primarily for sort and collation order.

(And error messages. Ick. There's a back-burner project I'd forgotten.
Has the pan melted yet?)

 Question 2. How do I view Japanese records in the command line *in
 Japanese* to eliminate the possibility that the culprit is somewhere
 outside of MySQL, for example: Microsoft IIS or ASP or MyODBC?

Sorry. You'll need to set up a machine running in Japanese to do that,
as far as I know. Well, if you know how to redirect to a file, and if
you have a text editor capable of displaying Japanese, that might get
you a look at the text. But it might introduce some other unknowns, as
well.

I think you mentioned a colleague who can read Japanese? It might be
worth your while to, oh, wait, the MSW2k box is your server, so you
don't want to mess with that. It would be handy if the box your
publishing tool runs on could be set up to boot the OS into either
English or Japanese. (Mac OS X can set the language for the OS at log-in
time, seems like MSW ought to be able to at least switch on boot.)

 Question 3. How do I tell which charset MySQL is using, euc-jis or
 s-jis? 

It's Latin, unless you've set the language in my.cnf/my.ini. It's in the
manual, section 4.6.