Re: Collation implementation WAS Re: Should COLLATION attribute related code go in BasicDatabase?

Daniel John Debrunner Thu, 15 Mar 2007 11:30:16 -0800

Daniel John Debrunner wrote:

Mike Matrigali wrote:
Daniel John Debrunner wrote:
Rick Hillegas wrote:
Daniel John Debrunner wrote:
Rick Hillegas wrote:
Daniel John Debrunner wrote:
...
- The collation type (the integer) is written into the meta-data for an index just as ascending/descending is today (including the btree control row, thus making the information available for recovery). Collation type applies to all character columns in the index.
This suggests that all of the columns in the index must have the same collation? I don't think that is powerful enough to support the full-blown SQL collation language, which allows you to mix differently collated columns in an ORDER BY clause. Why can't the collation type be an array of ints just as the sort direction is an array of booleans in IndexDescriptor?
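A minimal sketch of the per-column alternative being proposed, assuming a hypothetical metadata class (not Derby's actual IndexDescriptor API) that stores one collation type per key column next to the per-column sort direction. The single-collation design from the original proposal becomes the special case where every entry is the same; the integer codes are illustrative assumptions.

```java
import java.util.Arrays;

// Hypothetical index key metadata, not Derby code: one sort direction and
// one collation type per key column, stored side by side.
final class IndexKeyMetadata {
    // Hypothetical integer collation codes.
    static final int COLLATION_UCS_BASIC = 0;
    static final int COLLATION_TERRITORY_BASED = 1;

    private final boolean[] ascending;   // sort direction per key column
    private final int[] collationTypes;  // collation per key column (assumed ignored for non-character columns)

    IndexKeyMetadata(boolean[] ascending, int[] collationTypes) {
        if (ascending.length != collationTypes.length) {
            throw new IllegalArgumentException("one entry per key column is required");
        }
        this.ascending = ascending.clone();
        this.collationTypes = collationTypes.clone();
    }

    // The single-collation design: every key column shares one collation type.
    static IndexKeyMetadata withSingleCollation(boolean[] ascending, int collationType) {
        int[] types = new int[ascending.length];
        Arrays.fill(types, collationType);
        return new IndexKeyMetadata(ascending, types);
    }

    boolean isAscending(int column) { return ascending[column]; }

    int collationType(int column) { return collationTypes[column]; }

    // True when all key columns use the same collation.
    boolean hasUniformCollation() {
        return Arrays.stream(collationTypes).distinct().count() <= 1;
    }
}
```

Storing an array costs a few bytes per index, while a single int keeps the format simpler; the sketch shows that the array form can always represent the single-collation case, which is the "future expansion" path discussed below.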
That would be more flexible, but is it required?
I believe so. I don't see any rule which requires one collation for all of the character expressions in an ORDER BY clause. There does seem to be a rule requiring one collation for both sides of a comparison, e.g., for both sides of a < operator.
I understand ORDER BY with different collations is required, but I meant: are multiple collations required in a single BTREE index, which is where this meta data would be stored? With the plans for DERBY-1478 it isn't, with new collations it isn't, with collation per-schema it isn't, so I was wondering what would trigger it. If it's not in the foreseeable future or an option through SQL, then I would say a simple single collation will work. Future expansion could change it to be per-column when required.
This is where I get confused. Are multiple collations required in a single database? With plans for DERBY-1478 it isn't. With new collations it isn't.
With collation per-schema it is, but should we pay overhead now for a
possible future, as long as we have a design that supports an upgrade
path to it?
I am not seeing the value in the argument for storing it once per
table vs. once per database today.
That's true. If the store used this per-database value when setting up the row arrays for an index, then the path forward to per-schema collations would be clear. In that case, in the future the store could override the per-database value with the value from the index meta data.
This would then mean that the default collation type must be stored in service.properties and thus be available to the store when it boots. The logic that reads the attribute could remain in DataValueFactory and should work similarly to the code Mamta posted earlier.
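A minimal sketch of the lookup order just described, assuming hypothetical names throughout: the property key, the integer codes, and the fallback to UCS_BASIC when nothing is set are illustrative assumptions, not Derby's actual service.properties contents. The store reads a per-database default once at boot, and a collation recorded in an index's own meta data, when present, overrides it.

```java
import java.util.OptionalInt;
import java.util.Properties;

// Hypothetical sketch, not Derby code: a per-database default collation read
// at boot, which a value stored in an index's own meta data can override.
final class CollationResolver {
    // Hypothetical property key and integer collation codes.
    static final String DEFAULT_COLLATION_KEY = "hypothetical.default.collation";
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    private final int databaseDefault;

    CollationResolver(Properties serviceProperties) {
        // Assumed fallback: UCS_BASIC when the property is absent.
        this.databaseDefault = parse(
            serviceProperties.getProperty(DEFAULT_COLLATION_KEY, "UCS_BASIC"));
    }

    private static int parse(String name) {
        switch (name) {
            case "UCS_BASIC": return UCS_BASIC;
            case "TERRITORY_BASED": return TERRITORY_BASED;
            default: throw new IllegalArgumentException("unknown collation: " + name);
        }
    }

    // An index-level value wins; otherwise use the per-database default.
    int resolve(OptionalInt indexCollation) {
        return indexCollation.orElse(databaseDefault);
    }
}
```

With per-database collation only, every index passes `OptionalInt.empty()`; per-schema collation later would only require indexes to start recording their own value.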

Actually, I think the above won't work, since 10.3 will have *two* collations per database: UCS_BASIC for system tables and locale-based for user tables. Thus I think the information is needed at the index level, unless the store has a mechanism to figure out which indexes are system indexes.
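A minimal sketch of why a single per-database value is not enough here, assuming hypothetical names and a simplified rule that schemas whose names begin with "SYS" hold system tables. The collation is chosen when the index is created and would then be recorded with the index, so the store never needs to work out on its own which indexes are system indexes.

```java
import java.util.Locale;

// Hypothetical sketch, not Derby code: with two collations in one database,
// the collation is picked at index creation time and stored with the index.
final class IndexCollationChooser {
    // Hypothetical integer collation codes.
    static final int UCS_BASIC = 0;
    static final int TERRITORY_BASED = 1;

    private final int userDefault; // collation for user tables in this database

    IndexCollationChooser(int userDefault) {
        this.userDefault = userDefault;
    }

    // Simplifying assumption: a "SYS" prefix marks a system schema.
    int collationForNewIndex(String schemaName) {
        boolean systemSchema = schemaName.toUpperCase(Locale.ROOT).startsWith("SYS");
        return systemSchema ? UCS_BASIC : userDefault;
    }
}
```

The returned value is what would be written into the index meta data (and the btree control row), so recovery can read it back without consulting the schema.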

The different format identifier for the same type is what worries me; it seems there is ample opportunity to end up with the wrong CHAR data type and thus lead to bugs. Today's code handles this OK, and it is more obvious since there is a 1-1 mapping between:

       SQL type name (e.g. INTEGER) and derby internal type
       JDBC type name (e.g. Types.CHAR) and derby internal type

Breaking that into a 1-N mapping seems to be a huge risk. There are a lot of places in the code where a character datatype is created; can we guarantee to catch all of them? Limiting the exposure to collation-related activity seems safer to me.
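A minimal sketch of the alternative favored here, assuming hypothetical class names and format id values: each JDBC character type keeps exactly one internal format id (the existing 1-1 mapping), and collation travels as a separate attribute that only collation-related code needs to read. Two CHAR descriptors with different collations still share one format id, so code that creates character types without knowing about collation cannot pick the "wrong" CHAR.

```java
import java.sql.Types;
import java.util.Map;

// Hypothetical sketch, not Derby code: collation is an attribute of the type
// descriptor rather than part of the format identifier.
final class CharTypeDescriptor {
    // Hypothetical format ids: one per JDBC type, independent of collation.
    static final Map<Integer, Integer> FORMAT_ID_BY_JDBC_TYPE = Map.of(
        Types.CHAR, 1,
        Types.VARCHAR, 2,
        Types.LONGVARCHAR, 3,
        Types.CLOB, 4);

    final int jdbcType;
    final int collationType; // carried alongside, never folded into the format id

    CharTypeDescriptor(int jdbcType, int collationType) {
        if (!FORMAT_ID_BY_JDBC_TYPE.containsKey(jdbcType)) {
            throw new IllegalArgumentException("not a character type: " + jdbcType);
        }
        this.jdbcType = jdbcType;
        this.collationType = collationType;
    }

    // The mapping stays 1-1: the format id depends only on the JDBC type.
    int formatId() {
        return FORMAT_ID_BY_JDBC_TYPE.get(jdbcType);
    }
}
```

The 1-N alternative would instead need a distinct format id for each (type, collation) pair, and every site that creates a character type would have to choose among them.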


Dan.
