Roy Lyseng wrote:
> Daniel John Debrunner wrote:
>> Thus Derby could have two character sets:
>> - USER - UCS repertoire with default collation of UCS_BASIC or
>>   UNICODE depending on value of collation JDBC attribute at create
>>   database time
>> - SYSTEM - UCS repertoire with default collation of UCS_BASIC
>
> I think that you should carefully consider the implications of using two
> character sets. Among other things, it means that two strings with
> different character sets are not immediately comparable. And as far as I
> know, this applies to literals as well. What this means (I think) is
> that if columns in system tables are defined with character set SYSTEM,
> columns in user-defined tables are defined with character set USER, and
> literals are of type USER, then you cannot immediately compare literals
> with the character columns in the system tables.

Note I'm using "character set" as the SQL Standard defines it (section
4.2.7) and different character sets are comparable if they have a
collation in common (section 4.2.2).
I think the SQL Standard also mandates multiple character sets if one
wants different default collations. The expression CURRENT_USER has a
mandated character set of SQL_IDENTIFIER, thus Derby must support that,
and it is required that SQL identifiers have UCS_BASIC collation. Then a
CREATE TABLE picks up its collation from its default *character set*,
which comes from its schema (11.4 SR10b), so to have a default collation
different from that of SQL_IDENTIFIER a different character set is needed.

> Another option is to use one character set, but use different collations
> for different types of tables. You may define that character columns in
> system tables are created using collation UCS_BASIC, while all user
> tables are created with a user-defined collation. Because all columns
> are defined using the same character set, all columns and literals will
> be comparable.

Is that correct? So far the discussion has assumed columns with
different implicit collations are not comparable (see 9.3 SR3e).
I don't think it's a goal to have columns in system tables be comparable
with user columns since if they have different collations the standard
says they are not. [assuming no implementation of a <collate clause>]
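
To illustrate why the collation matters for comparisons, here is a minimal
Java sketch (not Derby code). It assumes that UCS_BASIC can be approximated
by String.compareTo, which orders by UTF-16 code unit and so matches code
point order only for BMP characters, and that a locale-sensitive collation
can be modelled by java.text.Collator. The class and method names are
hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationSketch {

    // Approximation of UCS_BASIC (assumption): order by code values.
    // String.compareTo compares UTF-16 code units, not full code points.
    public static String[] sortByCodeUnits(String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Stand-in for a locale-sensitive collation (assumption):
    // order by the rules of java.text.Collator for the given locale.
    public static String[] sortByLocale(Locale locale, String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] names = {"apple", "Banana"};
        // Code value order puts 'B' (66) before 'a' (97).
        System.out.println(Arrays.toString(sortByCodeUnits(names)));
        // The English collator compares letters before case,
        // so "apple" sorts ahead of "Banana".
        System.out.println(Arrays.toString(sortByLocale(Locale.ENGLISH, names)));
    }
}
```

The same two strings order differently under the two rules, which is why a
comparison between columns needs a single agreed collation.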
A goal is to have the SQL queries used for JDBC metadata continue to
work, which is currently the discussion around literals. The standard
seems to poorly define the character set of a string literal.

> Just remember that when comparing two strings with different defined
> collations, you need to consider the collation rules defined by the SQL
> standard.

Right, I think we are trying to understand those rules, how they apply
to Derby and the proposed changes for DERBY-1478.
Thanks,
Dan.