If, in Firebird 4, you're going to look at indexes, I have four suggestions.
1) Indexes with large keys An index key that's a significant fraction of the page size leads to inefficient indexes (or infinite depth), so rules about index size aren't just the revenge of database management system developers on database application designers. There's a real problem with large keys. Back sometime around Interbase 3 - mid-1980's - we had an engineering showdown about indexes. Previously, the size of an index key was checked at runtime, so a long string or concatenated index could cause a fatal runtime error. After much discussion, we decided to compute the largest possible size for an index key at definition time and reject oversized keys. That decision was wrong. Not that throwing an error at runtime was right, but we didn't think about a possible third way. Now UTF8 makes the problem much worse because the possible size of a key is vastly larger than the probably size of the key. Here's a possible third way: Store as much of the key as fits in a size appropriate for the given page size. Let prefix compression take care of the unfortunate case where the first eighty or ninety characters are the same. When inserting, if there's a previous versions that matches for the key length, read the record to decide whether to insert before or after. Yes, that's going to mean reading more records - but only when the option is not indexing the field at all, or indexing a substring function of it - which performs worse. Most UTF8 characters (or whatever the correct name for them is ... glyphs?) are only two bytes, so mostly Firebird is refusing to create indexes that would never be a problem. The problem worse in compound indexes which are treated as if each field must be included at its full length, rather than recognizing that many have trailing spaces that are not included in the key. 2) Compound keys The advantage of Firebird's algorithm is that individual values can be suffix compressed, meaning that trailing spaces and zeros (in double precision) are removed. . There's at least one algorithm for separating the columns in compound keys that's more efficient than the one Firebird uses. The one I know about puts a binary zero byte between each column. When the column contains a byte of zero, replace it with binary bytes 0,1. When the column contains binary bytes 0,1, replace it with 0,1,1. That's enough to stop all confusion. There are probably better algorithms. 3) Transaction IDs in index keys Someone mentioned including transaction ids in the index key to avoid reading record data when using an index that provides all needed information. Those situations are real - count of customers in Mexico, junction records for M to M relationships, etc. In some cases, two transaction ids are required - one for the transaction that created entry and the transaction superseded it. That's potentially 16 bytes subtracted from the key size. OK, maybe not so big a problem, but it also means that when a key changes, the change must be made in two places. But you know all those arguments. Jim's going to hate this. Might it be possible to have a second sort of index specifically for those cases where the read efficiency outweighs the storage and management overhead? Yes, one more place where the application designer can be blamed for poor performance. 4) Numeric indexes. In my opinion, the decision to use 64 bit integers in indexes on numerics larger than 9 digits and quads was boneheaded. The advantage of double precision keys is that the scale of an integer or numeric/decimal field can be changed without invalidating the index. That's a significant advantage. Interbase supported quads on VAXen from the beginning using the same sort of approximate logic I described above for the rare case when a key actually turns out to be its full declared size and can't be stored. Fine, if you've got 18 digits of precision, use double precision and check the record data for the details.beyond digit 16. But that's not the best you can do. When creating a numeric key which represents a column having more than 16 digits, tack the remaining digits onto the end of the key, following the double precision number. Yes, that will make reconstructing the value from the key slightly more complicated, but it will sort correctly bytewise. Cheers, Ann
------------------------------------------------------------------------------
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel