On 7/2/06, Agent M <[EMAIL PROTECTED]> wrote:

Certain Japanese characters cannot make a reliable round-trip through
Unicode. ICU uses UTF-16 as its store, so the Japanese folks won't be
happy with an ICU-only solution. However, it would still be of great

Could you explain what you mean and what's special with those characters?

benefit to allow ICU to handle as much as possible, leaving the string
encodings to the encoding experts.

At the very least, it would be great to have ICU to handle encoding on
a per-column basis (perhaps extending the text datatype with encoding
info). Perhaps this would be a decent stopgap solution? The backend
protocol would also need a version bump- currently, it converts all
strings to a single encoding.

Could you give an example of what that would look like in your opinion?
I was thinking more along the lines of a setting in pg_hba.conf where
the server uses or does not use something like ICU...at least as an
intermediate solution.
Adding a "LOCALE" clause to a column definition (similar to the
"ENCODING" clause of the "CREATE DATABASE" statement) would solve most
(not all) problems with a default locale.
There still might be some non-deterministic behaviour with operations
between strings in different locales but it's far from a showstopper.

t.n.a.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to