Daniel John Debrunner wrote:
Mike Matrigali wrote:


Daniel John Debrunner wrote:
Rick Hillegas wrote:

Daniel John Debrunner wrote:

Rick Hillegas wrote:

Daniel John Debrunner wrote:

...

- The collation type (the integer) is written into the meta-data for an index just as ascending/descending is today (including the btree control row, thus making the information available for recovery). Collation type applies to all character columns in the index.

This suggests that all of the columns in the index must have the same collation? I don't think that is powerful enough to support the full-blown SQL collation language, which allows you to mix differently collated columns in an ORDER BY clause. Why can't the collation type be an array of ints just as the sort direction is an array of booleans in IndexDescriptor?


That would be more flexible, but is it required?

I believe so. I don't see any rule which requires one collation for all of the character expressions in an ORDER BY clause. There does seem to be a rule requiring one collation for both sides of a comparison, e.g., for both sides of a < operator.


I understand ORDER BY with different collations is required, but I meant are multiple collations required in a single BTREE index, which is where this meta data would be stored. With the plans for DERBY-1478 it isn't, with new collations it isn't, with collation per-schema it isn't, so I was wondering what would trigger it? If it's not in the foreseeable future or an option through SQL then I would say a simple single collation will work. Future expansion could change it to be per-column when required.

This is where I get confused. Are multiple collations required in a single database? With plans for DERBY-1478 it isn't. With new collations it isn't.
With collation per-schema it is, but should we pay overhead now for a
possible future, as long as we have a design that supports an upgrade
path to it?
I am not seeing the value in the argument for storing it once per
table vs. once per database today.

That's true. If the store used this per-database value when setting up the row arrays for an index then the path forward to per-schema collations would be clear. In that case in the future the store could override the per-database value with the value from the index meta data.

This would then mean that the default collation type must be stored in service.properties and thus be available to the store when it boots. The reading of the attribute logic could continue to be in DataValueFactory and should work similar to the code Mamta posted earlier.

Actually I think the above won't work, since 10.3 will have *two* collations per database. UCS_BASIC for system tables and locale based for user tables. Thus I think the information is needed at the index level unless store has a mechanism to figure out which are system indexes.

The different format identifier for the same type is what worries me, it seems there is ample opportunity to end up with the wrong CHAR data type and thus lead to bugs. Today's code handles this ok and it more obvious since there is a 1-1 mapping between:
       SQL type name (e.g. INTEGER) and derby internal type
       JDBC type name (e.g. Types.CHAR) and derby internal type

breaking that to be a 1-N mapping seems to be a huge risk. There is a lot of places in the code where a character datatype is created, can we guarantee to catch all of them? Limiting the exposure to collation related activity seems safer to me.

Dan.

Reply via email to