On Wed, Sep 16, 2015 at 8:53 AM, Nicolas Barbier <nicolas.barb...@gmail.com> wrote: > After thinking about it a bit more, it indeed seems never useful to > have f3 in the internal nodes if it is not part of the columns that > determine the UNIQUE property. It could as well be pushed out of the > internal nodes and only appear in the leaf nodes.
Correct. That's a standard technique in B-Tree implementations like our own; suffix truncation can remove unneeded information from the end of values, possibly including entire attributes, which can be truncated in a way that is similar to this patch. The difference here is only that this patch does not dynamically determine which attributes can be removed while re-finding the parent downlink in the second phase of a page split (the usual place it happens with standard suffix truncation). Rather, this patch knows "a priori" that it can truncate attributes that are merely "included" attributes. That means that this patch has as much to do with increasing B-Tree fan-out for these indexes as it does for making unique indexes more usable for index-only scans. Both of those goals are important, IMV. This patch seems pretty cool. I noticed some issues following a quick read though the patch including_columns_6.0.patch that Anastasia should look into: * You truncate (remove suffix attributes -- the "included" attributes) within _bt_insertonpg(): - right_item = CopyIndexTuple(item); + indnatts = IndexRelationGetNumberOfAttributes(rel); + indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel); + + if (indnatts != indnkeyatts) + { + right_item = index_reform_tuple(rel, item, indnatts, indnkeyatts); + right_item_sz = IndexTupleDSize(*right_item); + right_item_sz = MAXALIGN(right_item_sz); + } + else + right_item = CopyIndexTuple(item); ItemPointerSet(&(right_item->t_tid), rbkno, P_HIKEY); I suggest that you do this within _bt_insert_parent(), instead, iff the original target page is know to be a leaf page. That's where it needs to happen for conventional suffix truncation, which has special considerations when determining which attributes are safe to truncate (or even which byte in the first distinguishing attribute it is okay to truncate past). Conventional suffix truncation (not this patch) could truncate, say, "C" collation text past the first distinguishing byte, where the byte distinguishes the new downlink being inserted into the parent page compared to both the left downlink and right downlink in the parent page-- the minimum amount of information to correctly guide later index scans is only stored. But it isn't correct (again, with conventional suffix truncation) to do this passed the leaf level. It must end there. It isn't safe past the leaf level (by which I mean when inserting a downlink into its parent, one level up) because applying suffix truncation again for the next level up might guide a search to the highest node in the left sub-tree rather than to the lowest node in the right sub-tree, or vice versa. In general, we must be careful about "cousin" nodes, that are beside each other but are not "siblings" due to not sharing the same parent. It doesn't really matter that this restriction exists, because you get almost all the benefit at the leaf -> immediate parent level anyway. Higher levels will reuse already truncated Index Tuples, that are typically "truncated enough". So, this should work in a similar way to conventional suffix truncation (BTW, you should work on that later). And so, it should just do it there. Besides, checking it only where it could possibly help is clearer, since as written the code in _bt_insertonpg() will never need to truncate following a non-leaf/internal page split. * I think the comparison logic may have a bug. Does this work with amcheck? Maybe it works with bt_index_check(), but not bt_index_parent_check()? I think that you need to make sure that _bt_compare() knows about this, too. That's because it isn't good enough to let a truncated internal IndexTuple compare equal to a scankey when non-truncated attributes are equal. I think you need to have an imaginary "minus infinity" attribute past the first non-truncated attribute (i.e. "minus infinity value" for the first *truncated* attribute). That way, the downlinks will always be lower bounds when the non-"included"/truncated attributes are involved. This seems necessary. No? It's necessary because you aren't storing any attributes, so it's not acceptable to even attempt a comparison -- I think that will segfault (doesn't matter that the index scan wouldn't have returned anything anyway). I think it's also necessary because of issues with "cousin" nodes making index scans lose their way. That's all I have right now. Nice work. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers