Daniel John Debrunner wrote:
Mike Matrigali wrote:
Daniel John Debrunner wrote:
Rick Hillegas wrote:
Daniel John Debrunner wrote:
Rick Hillegas wrote:
Daniel John Debrunner wrote:
...
- The collation type (the integer) is written into the meta-data
for an index just as ascending/descending is today (including the
btree control row, thus making the information available for
recovery). Collation type applies to all character columns in the
index.
This suggests that all of the columns in the index must have the
same collation? I don't think that is powerful enough to support
the full-blown SQL collation language, which allows you to mix
differently collated columns in an ORDER BY clause. Why can't the
collation type be an array of ints just as the sort direction is
an array of booleans in IndexDescriptor?
That would be more flexible, but is it required?
I believe so. I don't see any rule which requires one collation for
all of the character expressions in an ORDER BY clause. There does
seem to be a rule requiring one collation for both sides of a
comparison, e.g., for both sides of a < operator.
I understand ORDER BY with different collations is required, but I
meant are multiple collations required in a single BTREE index,
which is where this meta data would be stored. With the plans for
DERBY-1478 it isn't, with new collations it isn't, with collation
per-schema it isn't, so I was wondering what would trigger it? If
it's not in the foreseeable future or an option through SQL then I
would say a simple single collation will work. Future expansion could
change it to be per-column when required.
This is where I get confused. Are multiple collations required in a
single database? With plans for DERBY-1478 it isn't. With new
collations it isn't.
With collation per-schema it is, but should we pay overhead now for a
possible future, as long as we have a design that supports an upgrade
path to it?
I am not seeing the value in the argument for storing it once per
table vs. once per database today.
That's true. If the store used this per-database value when setting up
the row arrays for an index then the path forward to per-schema
collations would be clear. In that case in the future the store could
override the per-database value with the value from the index meta data.
This would then mean that the default collation type must be stored in
service.properties and thus be available to the store when it boots. The
reading of the attribute logic could continue to be in DataValueFactory
and should work similar to the code Mamta posted earlier.
Actually I think the above won't work, since 10.3 will have *two*
collations per database. UCS_BASIC for system tables and locale based
for user tables. Thus I think the information is needed at the index
level unless store has a mechanism to figure out which are system indexes.
The different format identifier for the same type is what worries me, it
seems there is ample opportunity to end up with the wrong CHAR data type
and thus lead to bugs. Today's code handles this ok and it more obvious
since there is a 1-1 mapping between:
SQL type name (e.g. INTEGER) and derby internal type
JDBC type name (e.g. Types.CHAR) and derby internal type
breaking that to be a 1-N mapping seems to be a huge risk. There is a
lot of places in the code where a character datatype is created, can we
guarantee to catch all of them? Limiting the exposure to collation
related activity seems safer to me.
Dan.