Just so I understand... and I think I understood UNICODE BEFORE I started reading all
the literature that seemed to confuse the matter. :)
UNICODE is a character encoding that can handle any character irrespective of language.
When I output to the web I will need to convert UNICODE to some appropriate
character-set based upon the language selection.
Is this correct? Or can this be done automatically... or at least, can I just avoid
it and send the UNICODE data directly to a web browser and let the browser do whatever
is necessary? As I intend to develop a system that can handle an arbitrary number of
languages, I want to let the code handle any language without me necessarily having to
add more and more code to support it. I would love it if I could just choose one
flavor, UNICODE, and have that be it. But hey, I know I do not live in an ideal world... ;)
I do appreciate your help.
Thanks,
Ward
-----Original Message-----
From: Andrew McNaughton [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 01, 2001 9:27 PM
To: Vuillemot, Ward W
Subject: Re: DBD:mysql and UNICODE
On Wed, 1 Aug 2001, Vuillemot, Ward W wrote:
> Date: Wed, 1 Aug 2001 15:57:16 -0700
> From: "Vuillemot, Ward W" <[EMAIL PROTECTED]>
> To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> Subject: DBD:mysql and UNICODE
>
> I am looking to develop a set of databases that can handle
> international character sets. For example, I want to have menu items
> that can be changed on the fly from, say, English to Japanese to
> German to Chinese.
>
> Should I create a table that correlates each language with a UNICODE
> set? And then create a table where each row is for a specific
> language, with the columns being the individual entries? After that,
> can I use a lookup into the first table based on the key of the second
> table to determine what type of UNICODE character-set it is? (Sorry,
> I am typing out loud, as it were ;) ).
Your character set in the database *is* Unicode. There's only one Unicode
character set. All other common to medium-rare character sets are subsets
of that one big set. Keep things simple and store nothing in your
database that's not in Unicode.
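
To make the "subsets of one big set" point concrete, here is a minimal Python sketch (not part of the original exchange). The sample strings and the helper name `can_encode` are made up for illustration. It checks that each string survives a round trip through its own legacy encoding, and that the mixed-language text only fits in a Unicode encoding such as UTF-8.

```python
# Illustrative sample text; each legacy encoding below covers only part of Unicode.
samples = {
    "iso-8859-1": "Grüße",   # German
    "shift_jis": "日本語",     # Japanese
    "gb2312": "中文",         # Chinese
}


def can_encode(text, encoding):
    """Return True if every character in text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Each string round-trips through its own legacy encoding without loss.
round_trips = {
    enc: text.encode(enc).decode(enc) == text for enc, text in samples.items()
}

# The combined text mixes all three languages in one string.
combined = " / ".join(samples.values())
coverage = {
    enc: can_encode(combined, enc)
    for enc in ["iso-8859-1", "shift_jis", "gb2312", "utf-8"]
}

print(round_trips)
print(coverage)
```

If everything is stored as Unicode, the legacy encodings only matter at the edges, when data is read in or written out.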
You could store your strings as you say, but I'd be inclined to have every
string in its own row, and have a column which identifies the language.
For a given language (e.g. English), there might be multiple possible
character encodings (e.g. iso-8859-1, cp1252, utf-8), and you might choose
to support more than one in your web output. You might store
language/character encoding combinations in your database, but character
encoding and character set are not to be confused.
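
A rough sketch of the row-per-string layout, with the encoding chosen only at output time. This is an assumption-laden illustration, not a tested DBD::mysql setup: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name `ui_strings`, its columns, and the helpers `lookup` and `render` are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ui_strings ("
    " string_key TEXT NOT NULL,"    # which menu item this is
    " lang TEXT NOT NULL,"          # language code, e.g. 'en', 'de', 'ja'
    " translation TEXT NOT NULL,"   # the string itself, stored as Unicode
    " PRIMARY KEY (string_key, lang))"
)
conn.executemany(
    "INSERT INTO ui_strings VALUES (?, ?, ?)",
    [
        ("file.open", "en", "Open"),
        ("file.open", "de", "Öffnen"),
        ("file.open", "ja", "開く"),
    ],
)


def lookup(string_key, lang):
    """Fetch one string for one language; raise KeyError if it is missing."""
    row = conn.execute(
        "SELECT translation FROM ui_strings WHERE string_key = ? AND lang = ?",
        (string_key, lang),
    ).fetchone()
    if row is None:
        raise KeyError((string_key, lang))
    return row[0]


def render(string_key, lang, encoding="utf-8"):
    """Encode the stored Unicode string into bytes for the chosen output encoding."""
    return lookup(string_key, lang).encode(encoding)


# Same characters, different bytes depending on the output encoding.
print(render("file.open", "de", "iso-8859-1"))
print(render("file.open", "de", "utf-8"))
print(render("file.open", "ja", "utf-8"))
```

The language column selects which string to show, and the encoding argument only decides how those characters become bytes on the wire. Adding a language means adding rows, not code, as long as the output encoding can represent its characters (UTF-8 always can).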