If, in Firebird 4, you're going to look at indexes, I have four suggestions.


1) Indexes with large keys

An index key that's a significant fraction of the page size leads to
inefficient indexes (or infinite depth), so rules about index size aren't
just the revenge of database management system developers on database
application designers.  There's a real problem with large keys.

Back sometime around Interbase 3 - mid-1980s - we had an engineering
showdown about indexes.  Previously, the size of an index key was checked
at runtime, so a long string or concatenated index could cause a fatal
runtime error.  After much discussion, we decided to compute the largest
possible size for an index key at definition time and reject oversized
keys.  That decision was wrong.  Not that throwing an error at runtime was
right, but we didn't think about a possible third way.

Now UTF8 makes the problem much worse because the possible size of a key is
vastly larger than the probable size of the key.  Here's a possible third
way:  Store as much of the key as fits in a size appropriate for the given
page size.  Let prefix compression take care of the unfortunate case where
the first eighty or ninety characters are the same.  When inserting, if
there's a previous entry that matches for the full stored key length, read
the record to decide whether to insert before or after.

Yes, that's going to mean reading more records - but only when the
alternative is not indexing the field at all, or indexing a substring
function of it - which performs worse.  Most UTF8 characters (or whatever
the correct name for them is ... glyphs?)  are only two bytes, so mostly
Firebird is refusing to create indexes that would never be a problem.  The
problem is worse in compound indexes, which are treated as if each field
must be included at its full length, rather than recognizing that many have
trailing spaces that are not included in the key.
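To make the idea concrete, here's a rough sketch in Python of the
truncate-and-fall-back insertion logic.  The names, the absurdly small
key limit, and the list-of-entries model are mine for illustration, not
Firebird's actual B-tree code:

```python
MAX_KEY = 4  # tiny per-page key limit, just for illustration

def make_key(value: bytes) -> bytes:
    """Store as much of the key as fits; the rest stays in the record."""
    return value[:MAX_KEY]

def insert_position(new_value: bytes, entries, fetch_record):
    """Find where new_value belongs among sorted (key, record_id) entries.

    Only when a stored key is truncated AND compares equal do we pay
    the extra cost of reading the record for the full value.
    """
    new_key = make_key(new_value)
    pos = 0
    for key, record_id in entries:
        if key < new_key:
            pos += 1
        elif key == new_key and len(key) == MAX_KEY:
            # Truncated tie: read the record to decide before/after.
            if fetch_record(record_id) < new_value:
                pos += 1
            else:
                break
        else:
            break
    return pos
```

The point of the sketch is the middle branch: the record read happens only
on a truncated-key tie, not on every comparison.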


2) Compound keys

The advantage of Firebird's algorithm is that individual values can be
suffix compressed, meaning that trailing spaces and zeros (in double
precision) are removed.  There's at least one algorithm for separating the
columns in compound keys that's more efficient than the one Firebird uses.
The one I know about puts a binary zero byte between each column.  When a
column contains a byte of zero, replace it with binary bytes 0,1.  When a
column contains binary bytes 0,1, replace them with 0,1,1.  That's enough
to stop all confusion.  There are probably better algorithms.
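Here's a runnable sketch of that family of encodings.  I've used a close
variant - escape an embedded zero as 0,1 and end each column with a 0,0
pair - because it's easy to show that it both round-trips and preserves
bytewise ordering; the helper names are my own, not anything in Firebird:

```python
def encode_key(columns):
    """Concatenate columns so that bytewise order equals column-wise order.

    Each embedded 0x00 becomes 0x00 0x01, and each column ends with a
    0x00 0x00 pair, so a 0,0 pair can only mean "end of column".
    """
    out = bytearray()
    for col in columns:
        out += col.replace(b"\x00", b"\x00\x01")
        out += b"\x00\x00"
    return bytes(out)

def decode_key(key):
    """Undo encode_key: split on terminators, unescape embedded zeros."""
    columns, current, i = [], bytearray(), 0
    while i < len(key):
        if key[i] == 0 and key[i + 1] == 1:    # escaped zero byte
            current.append(0)
            i += 2
        elif key[i] == 0 and key[i + 1] == 0:  # column terminator
            columns.append(bytes(current))
            current = bytearray()
            i += 2
        else:
            current.append(key[i])
            i += 1
    return columns
```

The escape costs a byte only where a column actually contains a zero,
rather than padding every column to its full declared length.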

3) Transaction IDs in index keys

Someone mentioned including transaction ids in the index key to avoid
reading record data when using an index that provides all needed
information.  Those situations are real - count of customers in Mexico,
junction records for M to M relationships, etc.  In some cases, two
transaction ids are required - one for the transaction that created the
entry and one for the transaction that superseded it.  That's potentially
16 bytes subtracted from the key size.  OK, maybe not so big a problem,
but it also means that when a key changes, the change must be made in two
places.  But you know all those arguments.

Jim's going to hate this.  Might it be possible to have a second sort of
index specifically for those cases where the read efficiency outweighs the
storage and management overhead?  Yes, one more place where the application
designer can be blamed for poor performance.


4) Numeric indexes

In my opinion, the decision to use 64 bit integers in indexes on numerics
larger than 9 digits and quads was boneheaded.  The advantage of double
precision keys is that the scale of an integer or numeric/decimal field can
be changed without invalidating the index.  That's a significant advantage.

Interbase supported quads on VAXen from the beginning using the same sort
of approximate logic I described above for the rare case when a key
actually turns out to be its full declared size and can't be stored.  Fine,
if you've got 18 digits of precision, use double precision and check the
record data for the details beyond digit 16.

But that's not the best you can do.  When creating a numeric key which
represents a column having more than 16 digits, tack the remaining digits
onto the end of the key, following the double precision number.  Yes, that
will make reconstructing the value from the key slightly more complicated,
but it will sort correctly bytewise.
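The trailing-digits trick needs the double itself stored in a
byte-comparable form first.  Here's a minimal sketch using the standard
IEEE 754 transform (flip every bit of a negative, flip only the sign bit
otherwise); the function names, and the simplification to non-negative
values with equal digit counts for the tail, are my assumptions, not
Firebird's actual key format:

```python
import struct

def double_to_sortable(value: float) -> bytes:
    """Encode a double so unsigned bytewise order matches numeric order."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF   # negative: flip every bit
    else:
        bits ^= 1 << 63              # non-negative: flip the sign bit
    return struct.pack(">Q", bits)

def numeric_key(digits: str) -> bytes:
    """Double approximation first, then any digits past the 16th, so
    values that collide in the double part still sort exactly.
    (Sketch only: non-negative values, equal digit counts.)"""
    return double_to_sortable(float(digits)) + digits[16:].encode()
```

With that, two 18-digit values that collapse to the same double still
compare correctly on the appended digits.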


Cheers,

Ann
------------------------------------------------------------------------------
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
