The compound key scheme isn't quite right.  Binary 0 bytes delimit keys,
binary 0s in a key are replaced with the sequence 1,0 and binary 1s in the
key are replaced with the sequence 1,1.  So delimiter < 0 < 1 < everything
else.  I wish I had invented it, but I only stole it.

On Monday, October 27, 2014, Ann Harrison <a...@qbeast.net> wrote:

> If, in Firebird 4, you're going to look at indexes, I have four
> suggestions.
>
>
> 1) Indexes with large keys
>
> An index key that's a significant fraction of the page size leads to
> inefficient indexes (or infinite depth), so rules about index size aren't
> just the revenge of database management system developers on database
> application designers.  There's a real problem with large keys.
>
> Back sometime around Interbase 3 -  mid-1980's - we had an engineering
> showdown about indexes.  Previously, the size of an index key was checked
> at runtime, so a long string or concatenated index could cause a fatal
> runtime error.  After much discussion, we decided to compute the largest
> possible size for an index key at definition time and reject oversized
> keys.  That decision was wrong.  Not that throwing an error at runtime was
> right, but we didn't think about a possible third way.
>
> Now UTF8 makes the problem much worse because the possible size of a key
> is vastly larger than the probably size of the key.  Here's a possible
> third way:  Store as much of the key as fits in a size appropriate for the
> given page size.  Let prefix compression take care of the unfortunate case
> where the first eighty or ninety characters are the same.  When inserting,
> if there's a previous versions that matches for the key length, read the
> record to decide whether to insert before or after.
>
> Yes, that's going to mean reading more records - but only when the option
> is not indexing the field at all, or indexing a substring function of it -
> which performs worse.  Most UTF8 characters (or whatever the correct name
> for them is ... glyphs?)  are only two bytes, so mostly Firebird is
> refusing to create indexes that would never be a problem.  The problem
> worse in compound indexes which are treated as if each field must be
> included at its full length, rather than recognizing that many have
> trailing spaces that are not included in the key.
>
>
> 2) Compound keys
>
> The advantage of Firebird's algorithm is that individual values can be
> suffix compressed, meaning that trailing spaces and zeros (in double
> precision) are removed. . There's at least one algorithm for separating the
> columns in compound keys that's more efficient than the one Firebird uses.
>   The one I know about puts a binary zero byte between each column.  When
> the column contains a byte of zero, replace it with binary bytes 0,1.  When
> the column contains binary bytes 0,1, replace it with 0,1,1.  That's enough
> to stop all confusion.  There are probably better algorithms.
>
> 3) Transaction IDs in index keys
>
>  Someone mentioned including transaction ids in the index key to avoid
> reading record data when using an index that provides all needed
> information.  Those situations are real - count of customers in Mexico,
> junction records for M to M relationships, etc.  In some cases, two
> transaction ids are required - one for the transaction that created entry
> and the transaction superseded it.  That's potentially 16 bytes subtracted
> from the key size.  OK, maybe not so big a problem, but it also means that
> when a key changes, the change must be made in two places.  But you know
> all those arguments.
>
> Jim's going to hate this.  Might it be possible to have a second sort of
> index specifically for those cases where the read efficiency outweighs the
> storage and management overhead?  Yes, one more place where the application
> designer can be blamed for poor performance.
>
>
>  4) Numeric indexes.
>
> In my opinion, the decision to use 64 bit integers in indexes on numerics
> larger than 9 digits and quads was boneheaded.  The advantage of double
> precision keys is that the scale of an integer or numeric/decimal field can
> be changed without invalidating the index.  That's a significant advantage.
>
> Interbase supported quads on VAXen from the beginning using the same sort
> of approximate logic I described above for the rare case when a key
> actually turns out to be its full declared size and can't be stored.  Fine,
> if you've got 18 digits of precision, use double precision and check the
> record data for the details.beyond digit 16.
>
> But that's not the best you can do.  When creating a numeric key which
> represents a column having more than 16 digits, tack the remaining digits
> onto the end of the key, following the double precision number.  Yes, that
> will make reconstructing the value from the key slightly more complicated,
> but it will sort correctly bytewise.
>
>
> Cheers,
>
> Ann
>
>

-- 
Jim Starkey
------------------------------------------------------------------------------
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to