This is mixing a lot of things up. I also may use the wrong
terminology here.
Character set encodings really only come into play with tools like
ij and import, which have to get strings from the environment into Derby.
The more standard interaction is using JDBC to load a Java string into
Derby. At that level we don't do anything with encodings.
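To illustrate the point about where encodings matter, here is a minimal sketch using only the JDK (no Derby classes). The class and method names are made up for illustration. It shows that the encoding is only relevant while decoding bytes from the outside world; once you have a Java String, which is what you would hand to JDBC through something like PreparedStatement.setString, the original encoding is gone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of Derby.
public class EncodingBoundary {

    // Decode raw bytes using the charset the external source was written in.
    // This is the only step where a character set encoding is involved.
    static String decode(byte[] raw, Charset sourceCharset) {
        return new String(raw, sourceCharset);
    }

    public static void main(String[] args) {
        // The character U+00E9 as it would arrive from two differently
        // encoded sources (for example, two files read by a tool).
        byte[] latin1Bytes = {(byte) 0xE9};
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA9};

        String fromLatin1 = decode(latin1Bytes, StandardCharsets.ISO_8859_1);
        String fromUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);

        // Both are now the same Java String; a JDBC call would see no
        // difference between them.
        System.out.println(fromLatin1.equals(fromUtf8));   // prints "true"
        System.out.println((int) fromLatin1.charAt(0));    // prints "233"
    }
}
```

If the wrong charset is used in the decode step, the resulting String is wrong before the database ever sees it, which is why this is a concern for the tools and not for the JDBC layer.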
We happen to use a modified UTF-8 to store strings on disk, and this is
not configurable. But no user interface should depend on this encoding,
and Derby could change this storage format in the future.
Logically, all strings at runtime are converted to standard Java chars.
Before 10.3 we always used the standard Java string compare, which does a
numerical comparison of the Unicode values of the chars to arrive at an
ordering. That is still the default. In 10.3 an option was added to
set territory-based collation when the database is created, so that
comparison depends on the territory of the database. For this, the
standard Java rule-based Collator interfaces are used. This is documented
in the latest Derby release.
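A small sketch of the difference between the two orderings, again using only the JDK. The class and method names are hypothetical, and this is not Derby's internal code; it just shows the two kinds of comparison described above. The connection URL in the comment assumes the territory and collation attributes as documented for 10.3, with a made-up database name.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical illustration class, not part of Derby.
public class CollationDemo {

    // Default ordering: numerical comparison of the Unicode values of the chars.
    static int unicodeOrder(String x, String y) {
        return x.compareTo(y);
    }

    // Territory-based ordering: delegate to the JDK Collator for that locale.
    static int territoryOrder(String x, String y, Locale territory) {
        return Collator.getInstance(territory).compare(x, y);
    }

    public static void main(String[] args) {
        // Assumed example of requesting territory-based collation at creation:
        //   jdbc:derby:myDB;create=true;territory=en_US;collation=TERRITORY_BASED

        // 'a' is U+0061 and 'B' is U+0042, so "a" sorts after "B" numerically.
        System.out.println(unicodeOrder("a", "B") > 0);               // prints "true"

        // With English collation rules, "a" sorts before "B".
        System.out.println(territoryOrder("a", "B", Locale.US) < 0);  // prints "true"
    }
}
```

So an ORDER BY over the same data can come back in a different order depending on which collation the database was created with.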
David Van Couvering wrote:
Hi, all. I am getting some questions from Ken Frank of the NetBeans
internationalization quality team about Java DB and character set
encodings. Rather than try and play go-between, I'm including him
here so he can directly ask any follow-on questions.
Ken would like to understand how Derby makes use of character
encodings, and how it is affected by various settings. How does
Derby handle things if the encoding is set to something different from
our default of UTF-8? Are we impacted, or do we rely on Java routines
such as the Collator and Comparator classes to handle this?
Sorry if I'm talking out my ear, i18n is not one of my fortes.
Thanks,
David