Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> I did some experimenting with that. I used the same test case Alexander
> did, with geonames data, and compared unpatched version, the original
> patch, and the attached patch that biases the first "best" tuple found,
> but still sometimes chooses the other equally good ones.
>     testname    | initsize | finalsize | idx_blks_read | idx_blks_hit
> ----------------+----------+-----------+---------------+--------------
>  patched-10-4mb | 75497472 |  90202112 |       5853604 |     10178331
>  unpatched-4mb  | 75145216 |  94863360 |       5880676 |     10185647
>  unpatched-4mb  | 75587584 |  97165312 |       5903107 |     10183759
>  patched-2-4mb  | 74768384 |  81403904 |       5768124 |     10193738
>  origpatch-4mb  | 74883072 |  82182144 |       5783412 |     10185373

> I think the conclusion is that all of these patches are effective. The
> 1/10 variant is less effective, as expected, as it's closer in behavior
> to the unpatched behavior than the others. The 1/2 variant seems as good
> as the original patch. At least on this example, it seems a tad better,
> if you look at index size.

> A table full of duplicates isn't very realistic, but overall, I'm
> leaning towards my version of this patch (gistchoose-2.patch). It has
> less potential for causing a regression in existing applications, but is
> just as effective in the original scenario of repeated delete+insert.

+1 for this patch, but I think the comments could use more work.  I was
convinced it was wrong on first examination, mainly because it's hard to
follow the underdocumented look_further_on_equal logic.  I propose the
attached, which is the same logic with better comments (I also chose to
rename and invert the sense of the state variable, because it seemed
easier to follow this way ... YMMV on that though).

			regards, tom lane
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index 8b60774f50349222ee04fef13613e3592120973b..a127627334e80704edeed38d3522e4000a695abb 100644
*** a/src/backend/access/gist/gistutil.c
--- b/src/backend/access/gist/gistutil.c
*************** gistchoose(Relation r, Page p, IndexTupl
*** 379,384 ****
--- 379,385 ----
  	GISTENTRY	entry,
  				identry[INDEX_MAX_KEYS];
  	bool		isnull[INDEX_MAX_KEYS];
+ 	int			keep_current_best;
  
  	Assert(!GistPageIsLeaf(p));
*************** gistchoose(Relation r, Page p, IndexTupl
*** 402,407 ****
--- 403,433 ----
  	best_penalty[0] = -1;
  
  	/*
+ 	 * If we find a tuple that's exactly as good as the currently best one, we
+ 	 * could use either one.  When inserting a lot of tuples with the same or
+ 	 * similar keys, it's preferable to descend down the same path when
+ 	 * possible, as that's more cache-friendly.  On the other hand, if all
+ 	 * inserts land on the same leaf page after a split, we're never going to
+ 	 * insert anything to the other half of the split, and will end up using
+ 	 * only 50% of the available space.  Distributing the inserts evenly would
+ 	 * lead to better space usage, but that hurts cache-locality during
+ 	 * insertion.  To get the best of both worlds, when we find a tuple that's
+ 	 * exactly as good as the previous best, choose randomly whether to stick
+ 	 * to the old best, or use the new one.  Once we decide to stick to the
+ 	 * old best, we keep sticking to it for any subsequent equally good tuples
+ 	 * we might find.  This favors tuples with low offsets, but still allows
+ 	 * some inserts to go to other equally-good subtrees.
+ 	 *
+ 	 * keep_current_best is -1 if we haven't yet had to make a random choice
+ 	 * whether to keep the current best tuple.  If we have done so, and
+ 	 * decided to keep it, keep_current_best is 1; if we've decided to
+ 	 * replace, keep_current_best is 0.  (This state will be reset to -1 as
+ 	 * soon as we've made the replacement, but sometimes we make the choice in
+ 	 * advance of actually finding a replacement best tuple.)
+ 	 */
+ 	keep_current_best = -1;
+ 
+ 	/*
  	 * Loop over tuples on page.
  	 */
  	maxoff = PageGetMaxOffsetNumber(p);
*************** gistchoose(Relation r, Page p, IndexTupl
*** 446,451 ****
--- 472,480 ----
  			if (j < r->rd_att->natts - 1)
  				best_penalty[j + 1] = -1;
+ 
+ 			/* we have new best, so reset keep-it decision */
+ 			keep_current_best = -1;
  		}
  		else if (best_penalty[j] == usize)
  		{
*************** gistchoose(Relation r, Page p, IndexTupl
*** 468,479 ****
  		}
  
  		/*
! 		 * If we find a tuple with zero penalty for all columns, there's no
! 		 * need to examine remaining tuples; just break out of the loop and
! 		 * return it.
  		 */
  		if (zero_penalty)
! 			break;
  	}
  
  	return result;
--- 497,537 ----
  		}
  
  		/*
! 		 * If we looped past the last column, and did not update "result",
! 		 * then this tuple is exactly as good as the prior best tuple.
! 		 */
! 		if (j == r->rd_att->natts && result != i)
! 		{
! 			if (keep_current_best == -1)
! 			{
! 				/* we didn't make the random choice yet for this old best */
! 				keep_current_best = (random() <= (MAX_RANDOM_VALUE / 2)) ? 1 : 0;
! 			}
! 			if (keep_current_best == 0)
! 			{
! 				/* we choose to use the new tuple */
! 				result = i;
! 				/* choose again if there are even more exactly-as-good ones */
! 				keep_current_best = -1;
! 			}
! 		}
! 
! 		/*
! 		 * If we find a tuple with zero penalty for all columns, and we've
! 		 * decided we don't want to search for another tuple with equal
! 		 * penalty, there's no need to examine remaining tuples; just break
! 		 * out of the loop and return it.
  		 */
  		if (zero_penalty)
! 		{
! 			if (keep_current_best == -1)
! 			{
! 				/* we didn't make the random choice yet for this old best */
! 				keep_current_best = (random() <= (MAX_RANDOM_VALUE / 2)) ? 1 : 0;
! 			}
! 			if (keep_current_best == 1)
! 				break;
! 		}
  	}
  
  	return result;
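For anyone who wants to poke at the tie-breaking rule outside the tree, here's a throwaway standalone sketch of the same decision logic.  Everything in it is hypothetical scaffolding, not the patch itself: a made-up choose_index() helper, plain rand() standing in for random()/MAX_RANDOM_VALUE, a single int penalty per tuple standing in for the per-column penalty arrays, and the zero-penalty early exit omitted for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical standalone sketch of the keep_current_best logic:
 * scan penalties[], track the lowest value seen, and on an exact tie
 * decide randomly -- once per reigning best -- whether to keep the
 * current best or switch to the newcomer.
 */
static int
choose_index(const int *penalties, int n)
{
	int		result = -1;
	int		best = -1;
	int		keep_current_best = -1; /* -1 undecided, 1 keep, 0 replace */
	int		i;

	for (i = 0; i < n; i++)
	{
		if (best < 0 || penalties[i] < best)
		{
			/* strictly better: take it, and reset the keep-it decision */
			best = penalties[i];
			result = i;
			keep_current_best = -1;
		}
		else if (penalties[i] == best && result != i)
		{
			if (keep_current_best == -1)
			{
				/* haven't made the random choice for this old best yet */
				keep_current_best = (rand() & 1) ? 1 : 0;
			}
			if (keep_current_best == 0)
			{
				/* switch to the equally good newcomer */
				result = i;
				/* decide afresh if yet more exactly-as-good ones appear */
				keep_current_best = -1;
			}
		}
	}
	return result;
}
```

Run over, say, {3, 1, 2, 1, 1} it always returns some index holding the minimal penalty (1, 3, or 4 here); which one depends on the random choices, with the bias toward low offsets that the patch comment describes.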
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers