Hello Sven,

usually we do not encourage storing anything other than
ASCII/ISO-8859-1 in ASCII columns, because the charset itself is not
stored in the database. Our experience is that, for example, a wrongly
or differently set locale can confuse lots of people with unwanted
byte-string conversions, and you may end up with data no one is really
able to read or decipher, because the originally used encoding/charset
is lost.
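
To illustrate (a minimal, self-contained Java sketch; the name and the
two charsets are just examples): the database stores only bytes, so
nothing stops a reader from decoding them with a different charset than
the writer used, and the result is silently garbled.

    import java.nio.charset.Charset;

    public class LostCharsetDemo {
        public static void main(String[] args) {
            String original = "Köhler";

            // One client encodes the string with UTF-8 (e.g. because of
            // its locale) and stores the bytes in an ASCII column ...
            byte[] stored = original.getBytes(Charset.forName("UTF-8"));

            // ... another client reads the same bytes assuming ISO-8859-1.
            // No error is raised, because only bytes were stored.
            String readBack = new String(stored, Charset.forName("ISO-8859-1"));

            System.out.println(readBack); // prints "KÃ¶hler", not "Köhler"
        }
    }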

If you had tried to connect with unicode=yes to a pure ASCII database,
you would have noticed that it returns an error (i.e. that case isn't
possible).
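
For example, a connection attempt like the following (host, database
name and credentials are placeholders; here the property is passed as a
URL parameter, a java.util.Properties object works as well) ends in a
SQLException against an ASCII-only database:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class UnicodeConnectTest {
        public static void main(String[] args) throws Exception {
            Class.forName("com.sap.dbtech.jdbc.DriverSapDB");
            // unicode=yes requests the unicode protocol from the driver.
            String url = "jdbc:sapdb://localhost/TESTDB?unicode=yes";
            try {
                Connection con = DriverManager.getConnection(url, "user", "password");
                System.out.println("connected - database is UNICODE");
                con.close();
            } catch (SQLException e) {
                // This is what you get against a pure ASCII database.
                System.out.println("refused: " + e.getMessage());
            }
        }
    }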

So, for JDBC, we suggest using a UNICODE database if you store anything
other than ASCII. (You can still create ASCII columns in a UNICODE
database if you really need them.)
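
For instance (a sketch; the table and its columns are invented), in a
UNICODE database the code attribute can still be set per column:

    import java.sql.Connection;
    import java.sql.Statement;

    public class MixedColumns {
        // Assumes "con" connects to a UNICODE database.
        static void createTable(Connection con) throws Exception {
            Statement stmt = con.createStatement();
            stmt.executeUpdate(
                "CREATE TABLE customers ("
                + " id       INTEGER PRIMARY KEY,"
                + " code     CHAR(10) ASCII,"      // ASCII only where really needed
                + " fullname VARCHAR(100) UNICODE" // arbitrary text
                + ")");
            stmt.close();
        }
    }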

Regards
Alexander Schröder
SAP Labs Berlin

-----Original Message-----
From: news [mailto:[EMAIL PROTECTED] On Behalf Of Sven Köhler
Sent: Sunday, 4 September 2005 18:50
To: [email protected]
Subject: Client-APIs, unicode and charsets

Hi,

I always wondered what the JDBC driver does if it isn't accessing a
unicode database. What it does is quite inconvenient: AFAIK it assumes
ISO-8859-1. IMHO, the charset used for binary data _should_ be a
parameter of the connection URL, and it should default to Java's default
charset - well, or to ISO-8859-1 if you like that better.
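
Something like this is what I mean (purely hypothetical - the current
driver accepts no such "charset" parameter; the method below just
sketches what the driver could do internally):

    import java.nio.charset.Charset;

    public class CharsetParameterSketch {
        // Hypothetical URL: jdbc:sapdb://localhost/TESTDB?charset=UTF-8
        static String decodeColumn(byte[] raw, String charsetFromUrl) {
            // Fall back to Java's default charset if the URL sets none.
            Charset cs = (charsetFromUrl != null)
                    ? Charset.forName(charsetFromUrl)
                    : Charset.defaultCharset();
            return new String(raw, cs);
        }

        public static void main(String[] args) {
            byte[] raw = {(byte) 0x4B, (byte) 0xC3, (byte) 0xB6}; // "Kö" in UTF-8
            System.out.println(decodeColumn(raw, "UTF-8"));      // Kö
            System.out.println(decodeColumn(raw, "ISO-8859-1")); // KÃ¶
        }
    }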

On the other hand, there's another problem:
What if I connect with "unicode=yes" to a non-unicode database? I guess
the MaxDB kernel will convert the unicode strings back to byte strings -
but which charset is used for that? I guess this question also applies
to writing strings to the database.

IMHO the JDBC driver should default to "unicode=yes", but with an
adjustable charset for all conversions from unicode strings to byte
strings - even those conversions that take place in the MaxDB kernel.

Non-unicode databases are therefore currently unusable for JDBC clients
if the applications that write into the database (non-JDBC clients)
don't use ISO-8859-1. In most cases, the charset these applications use
will depend on their current environment (i.e. the locale).
The main point is: I guess that byte strings are copied into the
database unchecked - I mean, you cannot assume that these strings are
ISO-8859-1 or anything else. They are just byte strings.
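
That locale dependence is easy to see (a minimal sketch; the output
differs from machine to machine):

    import java.nio.charset.Charset;

    public class LocaleCharset {
        public static void main(String[] args) {
            // Whatever a non-JDBC client writes, it is encoded with
            // something like this locale-dependent default charset -
            // not necessarily ISO-8859-1.
            System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
            System.out.println("default charset = " + Charset.defaultCharset());
        }
    }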

On the other hand, unicode databases are currently unusable for
clients like DBD::MaxDB, ODBC (using the byte-string API), ...

All that I've said also applies to ODBC (if the ODBC unicode API is
used). The ODBC driver doesn't accept a charset parameter either. So any
conversion that takes place will again be based on some charset that's
either forced by the current locale settings or hardcoded.


Well, are you aware of all the problems?

When will that change?


Thanks
  Sven
