Roy Lyseng wrote:


Daniel John Debrunner wrote:

Thus Derby could have two character sets:
- USER - UCS repertoire with default collation of UCS_BASIC or UNICODE depending on value of collation JDBC attribute at create database time
- SYSTEM - UCS repertoire with default collation of UCS_BASIC
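
As a rough illustration of how the two default collations could order the same strings differently, here is a minimal Java sketch. It rests on two assumptions that are not taken from Derby's code: that UCS_BASIC amounts to ordering by character code value, and that the UNICODE collation can be approximated by the JDK's locale-neutral java.text.Collator. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only; this is not Derby source code.
class CollationSketch {

    // Assumption: UCS_BASIC orders strings by character code value.
    // String.compareTo compares UTF-16 code units, which matches code point
    // order for characters in the Basic Multilingual Plane.
    static int compareUcsBasic(String a, String b) {
        return a.compareTo(b);
    }

    // Assumption: the UNICODE collation is approximated here by the JDK's
    // locale-neutral Collator, which applies linguistic ordering rules.
    static int compareUnicode(String a, String b) {
        return Collator.getInstance(Locale.ROOT).compare(a, b);
    }

    public static void main(String[] args) {
        String[] names = {"b", "a", "B", "A"};

        // Sort a copy under each ordering and print the results side by side.
        String[] basic = names.clone();
        Arrays.sort(basic, CollationSketch::compareUcsBasic);
        System.out.println("UCS_BASIC: " + Arrays.toString(basic));

        String[] unicode = names.clone();
        Arrays.sort(unicode, CollationSketch::compareUnicode);
        System.out.println("UNICODE:   " + Arrays.toString(unicode));
    }
}
```

Under these assumptions the code-value ordering places every upper-case letter before every lower-case one, while the Collator interleaves them alphabetically, so the same comparison can give different answers depending on which collation applies.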

I think that you should carefully consider the implications of using two character sets. Among other things, it means that two strings with different character sets are not immediately comparable. And as far as I know, this applies to literals as well. What this means (I think) is that if columns in system tables are defined with character set SYSTEM, columns in user-defined tables are defined with character set USER, and literals are of type USER, then you cannot immediately compare literals with the character columns in the system tables.

Note that I'm using "character set" as the SQL Standard defines it (section 4.2.7), and strings with different character sets are comparable if the character sets have a collation in common (section 4.2.2).

I think the SQL Standard also mandates multiple character sets if one wants different default collations. The expression CURRENT_USER has a mandated character set of SQL_IDENTIFIER, thus Derby must support that, and it is required that SQL identifiers have UCS_BASIC collation. A CREATE TABLE then picks up its collation from its default *character set*, which comes from its schema (11.4 SR10b), so to have a default collation different from that of SQL_IDENTIFIER, a different character set is needed.

Another option is to use one character set, but use different collations for different types of tables. You may define that character columns in system tables are created using collation UCS_BASIC, while all user tables are created with a user-defined collation. Because all columns are defined using the same character set, all columns and literals will be comparable.

Is that correct? So far the discussion has assumed that columns with different implicit collations are not comparable (see 9.3 SR3e).
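
To make that assumption concrete, here is a toy Java model of the rule as it has been read in this thread. It is only a sketch: it assumes every operand carries just an implicit collation (no <collate clause>), and it treats two operands as comparable only when their collation names match. The class, constants, and method are hypothetical and are not part of Derby or the standard's text.

```java
// Hypothetical model of the comparability rule as discussed in this thread;
// this is not Derby source code and not the wording of the SQL Standard.
class ComparabilitySketch {

    static final String UCS_BASIC = "UCS_BASIC";
    static final String UNICODE = "UNICODE";

    // Assumption: both operands have only an implicit collation, and they
    // are comparable only if those implicit collations are the same.
    static boolean comparable(String leftCollation, String rightCollation) {
        return leftCollation.equals(rightCollation);
    }

    public static void main(String[] args) {
        // A system column compared with another system column.
        System.out.println(comparable(UCS_BASIC, UCS_BASIC));

        // A system column compared with a user column in a database that
        // was created with the UNICODE collation.
        System.out.println(comparable(UCS_BASIC, UNICODE));
    }
}
```

In this model a single character set does not by itself make a UCS_BASIC system column comparable with a UNICODE user column; what matters is the pair of implicit collations.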

I don't think it's a goal to have columns in system tables be comparable with user columns since if they have different collations the standard says they are not. [assuming no implementation of a <collate clause>]

A goal is to have the SQL queries used for JDBC metadata continue to work, which is what the current discussion around literals is about. The standard seems to define the character set of a string literal poorly.

Just remember that when comparing two strings with different defined collations, you need to consider the collation rules defined by the SQL standard.

Right, I think we are trying to understand those rules, how they apply to Derby and the proposed changes for DERBY-1478.

Thanks,
Dan.
