I wrote: > Tomas Vondra <t...@fuzzy.cz> writes: >> I've managed to further simplify the test-case, and I've verified that >> it's reproducible on current 9.2 and 9.3 branches.
> It seems that > (1) gistfindgroup decides that SimpleTestString is "equiv" to something. > It's not too clear what; for sure there is no other leaf key of the same > value. The code in gistfindgroup looks a bit like it ought to think > that all the items are "equiv" to one of the union datums or the other, > but that's not the case --- no other item gets marked. After looking closer at this, I see that it's marking left-side tuples as "equiv" if they could be put in the right-side page for zero penalty, and similarly marking right-side tuples as "equiv" if they could be put in the left-side page for zero penalty. IOW it's finding the tuples for which it doesn't matter which side they go to. So this is not quite as insane as I thought, although the name "equiv" for the flag does not seem like le mot juste, and the function name and comment are rather misleading IMO. > (After further > investigation it seems that gbt_var_penalty is returning a negative > penalty at the critical spot here, which might have something to > do with it --- the business with the tmp[4] array seems many degrees > away from trustworthy.) This is a bug, but the core gist code ought not generate invalid indexes just because the type-specific penalty function is doing something stupid. > (2) cleanupOffsets removes the item for SimpleTestString from > sv->spl_right[], evidently because it's "equiv". > (3) placeOne() sticks the entry into the spl_left[] array, which is > an absolutely horrid decision: it means the left-side page will now > need a range spanning the right-side page's range, when a moment > ago they had been disjoint. The above behaviors are fine, really, because they are driven by the penalty function's claim that there's zero/negative cost to moving this tuple to the other side. > (4) gistunionsubkey(), which apparently is supposed to fix the union > datums to reflect this rearrangement, does no such thing because it > only processes the index columns to the right of the one where the > damage has been done. (It's not clear to me that it could ever > be allowed to skip any columns, but that's what it's doing.) This is a bug. It's also a bug that gistunionsubkey() fails to null out the existing union keys for each side before calling gistMakeUnionItVec --- that can result in the recomputed union keys being larger than necessary. Furthermore, there's also a bug in gbt_var_bin_union(), the same one I saw previously that it does the wrong thing when both sides of the existing range need to be enlarged. The attached patch fixes these things, but not the buggy penalty function, because we ought to try to verify that these changes are enough to prevent creation of an incorrect index. I can't reproduce any misbehavior anymore with this patch applied. However, I'm troubled by your report that the behavior is unstable, because I get the same result each time if I start from an empty (truncated) table, with or without the patch. You may be seeing some further bug(s). Could you test this fix against your own test cases? There's a lot of other things I don't much like in gistsplit.c, but they're in the nature of cosmetic and documentation fixes. regards, tom lane
diff --git a/contrib/btree_gist/btree_utils_var.c b/contrib/btree_gist/btree_utils_var.c index 9f8a132..691b103 100644 *** a/contrib/btree_gist/btree_utils_var.c --- b/contrib/btree_gist/btree_utils_var.c *************** void *** 225,237 **** gbt_var_bin_union(Datum *u, GBT_VARKEY *e, Oid collation, const gbtree_vinfo *tinfo) { - GBT_VARKEY *nk = NULL; - GBT_VARKEY *tmp = NULL; - GBT_VARKEY_R nr; GBT_VARKEY_R eo = gbt_var_key_readable(e); if (eo.lower == eo.upper) /* leaf */ { tmp = gbt_var_leaf2node(e, tinfo); if (tmp != e) eo = gbt_var_key_readable(tmp); --- 225,237 ---- gbt_var_bin_union(Datum *u, GBT_VARKEY *e, Oid collation, const gbtree_vinfo *tinfo) { GBT_VARKEY_R eo = gbt_var_key_readable(e); + GBT_VARKEY_R nr; if (eo.lower == eo.upper) /* leaf */ { + GBT_VARKEY *tmp; + tmp = gbt_var_leaf2node(e, tinfo); if (tmp != e) eo = gbt_var_key_readable(tmp); *************** gbt_var_bin_union(Datum *u, GBT_VARKEY * *** 239,263 **** if (DatumGetPointer(*u)) { - GBT_VARKEY_R ro = gbt_var_key_readable((GBT_VARKEY *) DatumGetPointer(*u)); if ((*tinfo->f_cmp) (ro.lower, eo.lower, collation) > 0) { nr.lower = eo.lower; ! nr.upper = ro.upper; ! nk = gbt_var_key_copy(&nr, TRUE); } if ((*tinfo->f_cmp) (ro.upper, eo.upper, collation) < 0) { nr.upper = eo.upper; ! nr.lower = ro.lower; ! nk = gbt_var_key_copy(&nr, TRUE); } ! if (nk) ! *u = PointerGetDatum(nk); } else { --- 239,264 ---- if (DatumGetPointer(*u)) { GBT_VARKEY_R ro = gbt_var_key_readable((GBT_VARKEY *) DatumGetPointer(*u)); + bool update = false; + + nr.lower = ro.lower; + nr.upper = ro.upper; if ((*tinfo->f_cmp) (ro.lower, eo.lower, collation) > 0) { nr.lower = eo.lower; ! update = true; } if ((*tinfo->f_cmp) (ro.upper, eo.upper, collation) < 0) { nr.upper = eo.upper; ! update = true; } ! if (update) ! *u = PointerGetDatum(gbt_var_key_copy(&nr, TRUE)); } else { diff --git a/src/backend/access/gist/gistsplit.c b/src/backend/access/gist/gistsplit.c index a96b881..eac4977 100644 *** a/src/backend/access/gist/gistsplit.c --- b/src/backend/access/gist/gistsplit.c *************** gistunionsubkeyvec(GISTSTATE *giststate, *** 49,55 **** cleanedItVec[cleanedLen++] = itvec[gsvp->entries[i] - 1]; } ! gistMakeUnionItVec(giststate, cleanedItVec, cleanedLen, startkey, gsvp->attr, gsvp->isnull); pfree(cleanedItVec); --- 49,57 ---- cleanedItVec[cleanedLen++] = itvec[gsvp->entries[i] - 1]; } ! memset(gsvp->isnull, TRUE, sizeof(bool) * giststate->tupdesc->natts); ! ! gistMakeUnionItVec(giststate, cleanedItVec, cleanedLen, 0, gsvp->attr, gsvp->isnull); pfree(cleanedItVec);
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers