The compound key scheme isn't quite right. Binary 0 bytes delimit keys, binary 0s in a key are replaced with the sequence 1,0 and binary 1s in the key are replaced with the sequence 1,1. So delimiter < 0 < 1 < everything else. I wish I had invented it, but I only stole it.
On Monday, October 27, 2014, Ann Harrison <a...@qbeast.net> wrote: > If, in Firebird 4, you're going to look at indexes, I have four > suggestions. > > > 1) Indexes with large keys > > An index key that's a significant fraction of the page size leads to > inefficient indexes (or infinite depth), so rules about index size aren't > just the revenge of database management system developers on database > application designers. There's a real problem with large keys. > > Back sometime around Interbase 3 - mid-1980's - we had an engineering > showdown about indexes. Previously, the size of an index key was checked > at runtime, so a long string or concatenated index could cause a fatal > runtime error. After much discussion, we decided to compute the largest > possible size for an index key at definition time and reject oversized > keys. That decision was wrong. Not that throwing an error at runtime was > right, but we didn't think about a possible third way. > > Now UTF8 makes the problem much worse because the possible size of a key > is vastly larger than the probably size of the key. Here's a possible > third way: Store as much of the key as fits in a size appropriate for the > given page size. Let prefix compression take care of the unfortunate case > where the first eighty or ninety characters are the same. When inserting, > if there's a previous versions that matches for the key length, read the > record to decide whether to insert before or after. > > Yes, that's going to mean reading more records - but only when the option > is not indexing the field at all, or indexing a substring function of it - > which performs worse. Most UTF8 characters (or whatever the correct name > for them is ... glyphs?) are only two bytes, so mostly Firebird is > refusing to create indexes that would never be a problem. The problem > worse in compound indexes which are treated as if each field must be > included at its full length, rather than recognizing that many have > trailing spaces that are not included in the key. > > > 2) Compound keys > > The advantage of Firebird's algorithm is that individual values can be > suffix compressed, meaning that trailing spaces and zeros (in double > precision) are removed. . There's at least one algorithm for separating the > columns in compound keys that's more efficient than the one Firebird uses. > The one I know about puts a binary zero byte between each column. When > the column contains a byte of zero, replace it with binary bytes 0,1. When > the column contains binary bytes 0,1, replace it with 0,1,1. That's enough > to stop all confusion. There are probably better algorithms. > > 3) Transaction IDs in index keys > > Someone mentioned including transaction ids in the index key to avoid > reading record data when using an index that provides all needed > information. Those situations are real - count of customers in Mexico, > junction records for M to M relationships, etc. In some cases, two > transaction ids are required - one for the transaction that created entry > and the transaction superseded it. That's potentially 16 bytes subtracted > from the key size. OK, maybe not so big a problem, but it also means that > when a key changes, the change must be made in two places. But you know > all those arguments. > > Jim's going to hate this. Might it be possible to have a second sort of > index specifically for those cases where the read efficiency outweighs the > storage and management overhead? Yes, one more place where the application > designer can be blamed for poor performance. > > > 4) Numeric indexes. > > In my opinion, the decision to use 64 bit integers in indexes on numerics > larger than 9 digits and quads was boneheaded. The advantage of double > precision keys is that the scale of an integer or numeric/decimal field can > be changed without invalidating the index. That's a significant advantage. > > Interbase supported quads on VAXen from the beginning using the same sort > of approximate logic I described above for the rare case when a key > actually turns out to be its full declared size and can't be stored. Fine, > if you've got 18 digits of precision, use double precision and check the > record data for the details.beyond digit 16. > > But that's not the best you can do. When creating a numeric key which > represents a column having more than 16 digits, tack the remaining digits > onto the end of the key, following the double precision number. Yes, that > will make reconstructing the value from the key slightly more complicated, > but it will sort correctly bytewise. > > > Cheers, > > Ann > > -- Jim Starkey
------------------------------------------------------------------------------
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel