Jay Pipes wrote:
I neglected to mention that I'm dithering on whether to translate index
keys to byte strings that can be compared naturally or to invoke
collation specific comparisons during index traversal.

Please expand on this for me.  The way I see it, a byte string cannot be
compared correctly without a collation (unless of course the collation
is binary...).  Otherwise, two sets of string characters stored in a
binary format could be sorted differently depending on whether a
specific locale determines char 0xXXXX to be before 0xXXXX where XXXX is
some arbitrary utf8 character code.

I see a couple options:

1) Store all index keys as binary strings and do all lookups and
comparisons at runtime
2) Have collation set at create/alter time and store keys in collation
order, with ability to pull into a filesort if needed collation is
different from the stored index collation.

The collation must be declared or implicit for the index. Either we can apply the collation code to the UTF-8 (or whatever) string at comparison time, or we can expand the UTF-8 (or whatever) string at key creation time with something like the MySQL strnxfrm collation method. The first consumes a lot of CPU cycles for comparisons when walking index buckets, while the second fluffs up the keys, wasting memory and reducing index bucket fan-out. A choice between a rock and a hard place.
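
To make the two options concrete, here is a minimal sketch using a made-up two-level collation (letters first, then case, lowercase before uppercase) over non-NUL ASCII text. The names weights, collate_cmp, and sort_key are hypothetical and this is not the MySQL strnxfrm code; it only shows that a collation-aware compare and a byte compare of expanded keys give the same order, and what the expanded key costs in size.

```python
from functools import cmp_to_key

def weights(s):
    """Toy two-level collation; assumes non-NUL ASCII input."""
    primary = bytes(ord(c.lower()) for c in s)             # level 1: case-folded letter
    secondary = bytes(1 if c.isupper() else 0 for c in s)  # level 2: lowercase sorts first
    return primary, secondary

def collate_cmp(a, b):
    """Option 1: run the collation on every comparison (CPU cost per compare)."""
    wa, wb = weights(a), weights(b)
    return (wa > wb) - (wa < wb)

def sort_key(s):
    """Option 2: expand once at key creation; plain byte comparison afterwards.
    The 0x00 separator sorts below every primary weight, so shorter strings
    still sort before their extensions."""
    primary, secondary = weights(s)
    return primary + b"\x00" + secondary

words = ["b", "a", "B", "A", "ab", "Ab"]
by_compare = sorted(words, key=cmp_to_key(collate_cmp))
by_key = sorted(words, key=sort_key)
raw = sorted(words, key=lambda w: w.encode("ascii"))  # no collation at all

print(by_compare)           # collation order via comparisons
print(by_key)               # same order via expanded keys
print(raw)                  # plain binary order differs
print(len(sort_key("ab")))  # expanded key length for a 2-byte string
```

In this toy the expanded key for a 2-byte string is 5 bytes, which is the "fluffing up" that cuts into fan-out.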

If the code that generated naturally comparable keys were a little smarter and didn't use a full byte for each level of a multi-level collation, the trade-off wouldn't be so painful. My desire to rewrite every collation known to mankind, however, is distinctly limited.
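
As a rough illustration of what "a little smarter" could look like, the sketch below packs a made-up secondary (case) level into one bit per character instead of one byte. The collation, the function names, and the packing scheme are all assumptions for illustration, not how any existing collation generates keys. It assumes non-NUL ASCII input.

```python
def byte_per_level_key(s):
    """Baseline: one full byte per character for each of two levels."""
    primary = bytes(ord(c.lower()) for c in s)
    secondary = bytes(1 if c.isupper() else 0 for c in s)
    return primary + b"\x00" + secondary

def packed_sort_key(s):
    """Same toy collation, but the case level takes one bit per character.
    Bits are packed most-significant first and zero padded to a whole byte,
    so byte order still matches the order of the case flags."""
    primary = bytes(ord(c.lower()) for c in s)
    bits = 0
    for c in s:
        bits = (bits << 1) | (1 if c.isupper() else 0)
    pad = (-len(s)) % 8              # zero bits needed to fill the last byte
    bits <<= pad
    nbytes = (len(s) + pad) // 8
    return primary + b"\x00" + bits.to_bytes(nbytes, "big")

words = ["b", "a", "B", "A", "ab", "Ab"]
same_order = sorted(words, key=packed_sort_key) == sorted(words, key=byte_per_level_key)

print(same_order)
print(len(byte_per_level_key("abcdefgh")), len(packed_sort_key("abcdefgh")))
```

Keys with equal primary weights always have the same length here, so the packed bits compare the same way as the byte-per-character flags did.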

But maybe this is what you meant...

For Falcon, we
use expanded keys for our btrees on the assumption that collation
specific compares would be too expensive in performance.  Nimbus uses
AVL (balanced) trees, so the trade-off is quite different.  I may punt
and let the user make the CPU/memory trade-off at index creation time
(for which I will expect -- and deserve -- a great deal of heckling).

Nothing is better for an engineer's soul than whacking out obsoleted
code.  Go, Brian, go!



--
Jim Starkey
President, NimbusDB, Inc.
978 526-1376


_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
