Amit Kapila wrote:
> On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera
> <alvhe...@2ndquadrant.com> wrote:
> > One thing still to tackle is when to mark ranges as unsummarized.
> > Right now, any new tuple on a page range would cause a new index
> > entry to be created and a new revmap update.  This would cause huge
> > index bloat if, say, a page is emptied and vacuumed and filled with
> > new tuples with increasing values outside the original range; each
> > new tuple would create a new index tuple.  I have two ideas about
> > this (1. mark range as unsummarized if 3rd time we touch the same
> > page range;

> Why only at 3rd time?
> Doesn't it need to be precise, like if someone inserts a row having
> value greater than max value of corresponding index tuple,
> then that index tuple's corresponding max value needs to be updated
> and I think its updated with the help of validity map.

Of course.  Note I no longer have the concept of a validity map; I have
switched things to use a "reverse range map", or revmap for short.  The
revmap is responsible for mapping each page number to an individual
index TID.  If the TID stored in the revmap is InvalidTid, that means
the range is not summarized.  Unsummarized ranges are always considered
to "match the query quals", and thus all tuples in them are returned in
the bitmap for heap recheck.

The way it works currently is that any tuple insert (outside the bounds
of the current index tuple) causes a new index tuple to be created, and
the revmap is updated to point to the new index tuple.  The old index
tuple is orphaned and will be deleted at the next vacuum.  This works
fine.  However, the problem is excess orphaned tuples; I don't want a
long series of updates to create many orphaned dead tuples.  Instead I
would like the system to, at some point, stop creating new index tuples
and instead set the revmap entry to InvalidTid.  That would stop the
index bloat.
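To make that concrete, here is a deliberately simplified, standalone C
sketch of the maintenance logic described above.  None of these names
exist in the patch (the real revmap lives in the index relation, not in
static arrays), and the WIDEN_LIMIT heuristic is only the "give up
after a few touches" idea, not settled behaviour:

#define INVALID_TID		(-1)	/* stands in for InvalidTid in the revmap */
#define WIDEN_LIMIT		3		/* "mark unsummarized on repeated touches" idea */

typedef struct RangeSummary
{
	int		min;				/* min() across all tuples in the range */
	int		max;				/* max() across all tuples in the range */
} RangeSummary;

/* toy revmap: one slot per page range, holding an index tuple id or INVALID_TID;
 * assume the slots were filled in when each range was summarized */
static int			revmap[1024];
static int			widen_count[1024];
static RangeSummary	index_tuples[4096];
static int			next_index_tuple;

/* called for every heap insert into page range "range" with indexed value "value" */
static void
minmax_on_insert(int range, int value)
{
	int		tid = revmap[range];

	if (tid == INVALID_TID)
		return;					/* range is unsummarized: nothing to maintain */

	if (value >= index_tuples[tid].min && value <= index_tuples[tid].max)
		return;					/* existing summary already covers the new value */

	if (++widen_count[range] >= WIDEN_LIMIT)
	{
		/* stop creating replacement tuples; mark the range unsummarized */
		revmap[range] = INVALID_TID;
		return;
	}

	/*
	 * Create a replacement index tuple with widened bounds and repoint the
	 * revmap at it; the old index tuple is orphaned until the next vacuum.
	 */
	index_tuples[next_index_tuple].min =
		(value < index_tuples[tid].min) ? value : index_tuples[tid].min;
	index_tuples[next_index_tuple].max =
		(value > index_tuples[tid].max) ? value : index_tuples[tid].max;
	revmap[range] = next_index_tuple++;
}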
> For example:
> considering we need to store below info for each index tuple:
> In each index tuple (corresponding to one page range), we store:
> - first block this tuple applies to
> - last block this tuple applies to
> - for each indexed column:
>   * min() value across all tuples in the range
>   * max() value across all tuples in the range
>
> Assume first and last block for index tuple is same (assume block
> no. 'x') and min value is 5 and max is 10.
> Now user insert/update value in block 'x' such that max value of
> index col. is 11, if we don't update corresponding index tuple or
> at least invalidate it, won't it lead to wrong results?

Sure, that would result in wrong results.  Fortunately that's not how I
am suggesting to do it.  I note you're reading an old version of the
design.  I realize now that this is my mistake, because instead of
posting the new design in the cover letter for the patch, I only put it
in the "minmax-proposal" file.  Please give that file a read to see how
the design differs from the design I originally posted in the old
thread.

> > The "amcostestimate" routine is completely bogus; right now it
> > returns constant 0, meaning the index is always chosen if it exists.

> I think for first version, you might want to keep things simple, but
> there should be some way for optimizer to select this index.
> So rather than choose if it is present, we can make optimizer choose
> when some-one says set enable_minmax index to true.

Well, enable_bitmapscan already disables minmax indexes, just like it
disables other indexes.

> How about keeping this up-to-date during foreground operations.
> Vacuum/Maintainer task maintaining things usually have problems of
> bloat and then we need optimize/workaround issues.
> Lot of people have raised this or similar point previously and what
> I read you are of opinion that it seems to be slow.

Well, the current code does keep the index up to date -- I did choose
to implement what people suggested :-)

> Now it can so happen that min and max values are sometimes not right
> because later the operation is rolled back, but I think such cases
> will be less and we can find some way to handle such cases may be
> maintainer task only, but the handling will be quite simpler.

Agreed.

> On Windows, patch gives below compilation errors:
> src\backend\access\minmax\mmtuple.c(96): error C2057: expected
> constant expression

I have fixed all these compile errors (fix attached).  Thanks for
reporting them.  I'll post a new version shortly.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/access/minmax/mmtuple.c
--- b/src/backend/access/minmax/mmtuple.c
***************
*** 93,117 ****
  MMTuple *
  minmax_form_tuple(TupleDesc idxDsc, TupleDesc diskDsc, DeformedMMTuple *tuple,
  				  Size *size)
  {
! 	Datum		values[diskDsc->natts];
! 	bool		nulls[diskDsc->natts];
  	bool		anynulls = false;
  	MMTuple    *rettuple;
  	int			keyno;
  	uint16		phony_infomask;
! 	bits8		phony_nullbitmap[BITMAPLEN(diskDsc->natts)];
  	Size		len,
  				hoff,
  				data_len;
  
  	/*
  	 * Set up the values/nulls arrays for heap_fill_tuple
  	 */
- 	MemSet(nulls, 0, sizeof(nulls));
  	for (keyno = 0; keyno < idxDsc->natts; keyno++)
  	{
! 		AttrNumber	idxattno = keyno * 2;
  
  		if (tuple->values[keyno].allnulls)
  		{
  			nulls[idxattno] = true;
--- 93,126 ----
  minmax_form_tuple(TupleDesc idxDsc, TupleDesc diskDsc, DeformedMMTuple *tuple,
  				  Size *size)
  {
! 	Datum	   *values;
! 	bool	   *nulls;
  	bool		anynulls = false;
  	MMTuple    *rettuple;
  	int			keyno;
  	uint16		phony_infomask;
! 	bits8	   *phony_nullbitmap;
  	Size		len,
  				hoff,
  				data_len;
  
+ 	Assert(diskDsc->natts > 0);
+ 
+ 	values = palloc(sizeof(Datum) * diskDsc->natts);
+ 	nulls = palloc0(sizeof(bool) * diskDsc->natts);
+ 	phony_nullbitmap = palloc(sizeof(bits8) * BITMAPLEN(diskDsc->natts));
+ 
  	/*
  	 * Set up the values/nulls arrays for heap_fill_tuple
  	 */
  	for (keyno = 0; keyno < idxDsc->natts; keyno++)
  	{
! 		int			idxattno = keyno * 2;
  
+ 		/*
+ 		 * "allnulls" is set when there's no nonnull value in any row in
+ 		 * the column; set the nullable bits for both min and max attrs.
+ 		 */
  		if (tuple->values[keyno].allnulls)
  		{
  			nulls[idxattno] = true;
***************
*** 168,173 **** minmax_form_tuple(TupleDesc idxDsc, TupleDesc diskDsc, DeformedMMTuple *tuple,
--- 177,187 ----
  						  &phony_infomask, phony_nullbitmap);
  
+ 	/* done with these */
+ 	pfree(values);
+ 	pfree(nulls);
+ 	pfree(phony_nullbitmap);
+ 
  	/*
  	 * Now fill in the real null bitmasks.  allnulls first.
  	 */
***************
*** 243,251 ****
  DeformedMMTuple *
  minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
  {
  	DeformedMMTuple *dtup;
! 	Datum		values[tupdesc->natts * 2];
! 	bool		allnulls[tupdesc->natts];
! 	bool		hasnulls[tupdesc->natts];
  	char	   *tp;
  	bits8	   *nullbits = NULL;
  	int			keyno;
--- 257,265 ----
  minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
  {
  	DeformedMMTuple *dtup;
! 	Datum	   *values;
! 	bool	   *allnulls;
! 	bool	   *hasnulls;
  	char	   *tp;
  	bits8	   *nullbits = NULL;
  	int			keyno;
***************
*** 253,258 **** minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
--- 267,276 ----
  	dtup = palloc(offsetof(DeformedMMTuple, values) +
  				  sizeof(MMValues) * tupdesc->natts);
  
+ 	values = palloc(sizeof(Datum) * tupdesc->natts * 2);
+ 	allnulls = palloc(sizeof(bool) * tupdesc->natts);
+ 	hasnulls = palloc(sizeof(bool) * tupdesc->natts);
+ 
  	tp = (char *) tuple + MMTupleDataOffset(tuple);
  
  	if (MMTupleHasNulls(tuple))
***************
*** 277,282 **** minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
--- 295,304 ----
  		dtup->values[keyno].allnulls = false;
  	}
  
+ 	pfree(values);
+ 	pfree(allnulls);
+ 	pfree(hasnulls);
+ 
  	return dtup;
  }
  
***************
*** 293,298 **** minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
--- 315,322 ----
   * values	output values, size 2 * natts (alternates min and max)
   * allnulls	output "allnulls", size natts
   * hasnulls	output "hasnulls", size natts
+  *
+  * Output arrays are allocated by caller.
   */
  static inline void
  mm_deconstruct_tuple(char *tp, bits8 *nullbits, bool nulls,