Amit Kapila wrote:
> On Sun, Sep 15, 2013 at 5:44 AM, Alvaro Herrera
> <alvhe...@2ndquadrant.com> wrote:
> > One thing still to tackle is when to mark ranges as unsummarized.
> > Right now, any new tuple on a page range would cause a new index
> > entry to be created and a new revmap update.  This would cause huge
> > index bloat if, say, a page is emptied and vacuumed and filled with
> > new tuples with increasing values outside the original range; each
> > new tuple would create a new index tuple.  I have two ideas about
> > this (1. mark range as unsummarized if 3rd time we touch the same
> > page range;

> Why only at 3rd time?
> Doesn't it need to be precise, like if someone inserts a row having
> value greater than max value of corresponding index tuple,
> then that index tuple's corresponding max value needs to be updated
> and I think its updated with the help of validity map.

Of course.  Note I no longer have the concept of a validity map; I have
switched things to use a "reverse range map", or revmap for short.  The
revmap is responsible for mapping each page number to an individual
index TID.  If the TID stored in the revmap is InvalidTid, that means
the range is not summarized.  Unsummarized ranges are always considered
to "match the query quals", and thus all tuples in them are returned in
the bitmap for heap recheck.

The way it works currently is that any tuple insert (outside the bounds
of the current index tuple) causes a new index tuple to be created, and
the revmap is updated to point to the new index tuple.  The old index
tuple is orphaned and will be deleted at the next vacuum.  This works
fine.  However, the problem is excess orphaned tuples; I don't want a
long series of updates to create many orphaned dead tuples.  Instead I
would like the system to, at some point, stop creating new index tuples
and instead set the revmap entry to InvalidTid.  That would stop the
index bloat.
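To make that concrete, here is a deliberately simplified, standalone C
sketch of the maintenance logic described above.  None of these names
exist in the patch (the real revmap lives in the index relation, not in
static arrays), and the WIDEN_LIMIT heuristic is only the "give up
after a few touches" idea, not settled behaviour:

#define INVALID_TID		(-1)	/* stands in for InvalidTid in the revmap */
#define WIDEN_LIMIT		3		/* "mark unsummarized on repeated touches" idea */

typedef struct RangeSummary
{
	int		min;				/* min() across all tuples in the range */
	int		max;				/* max() across all tuples in the range */
} RangeSummary;

/* toy revmap: one slot per page range, holding an index tuple id or INVALID_TID;
 * assume the slots were filled in when each range was summarized */
static int			revmap[1024];
static int			widen_count[1024];
static RangeSummary	index_tuples[4096];
static int			next_index_tuple;

/* called for every heap insert into page range "range" with indexed value "value" */
static void
minmax_on_insert(int range, int value)
{
	int		tid = revmap[range];

	if (tid == INVALID_TID)
		return;					/* range is unsummarized: nothing to maintain */

	if (value >= index_tuples[tid].min && value <= index_tuples[tid].max)
		return;					/* existing summary already covers the new value */

	if (++widen_count[range] >= WIDEN_LIMIT)
	{
		/* stop creating replacement tuples; mark the range unsummarized */
		revmap[range] = INVALID_TID;
		return;
	}

	/*
	 * Create a replacement index tuple with widened bounds and repoint the
	 * revmap at it; the old index tuple is orphaned until the next vacuum.
	 */
	index_tuples[next_index_tuple].min =
		(value < index_tuples[tid].min) ? value : index_tuples[tid].min;
	index_tuples[next_index_tuple].max =
		(value > index_tuples[tid].max) ? value : index_tuples[tid].max;
	revmap[range] = next_index_tuple++;
}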
> For example:
> considering we need to store below info for each index tuple:
> In each index tuple (corresponding to one page range), we store:
> - first block this tuple applies to
> - last block this tuple applies to
> - for each indexed column:
>   * min() value across all tuples in the range
>   * max() value across all tuples in the range
>
> Assume first and last block for index tuple is same (assume block
> no. 'x') and min value is 5 and max is 10.
> Now user insert/update value in block 'x' such that max value of
> index col. is 11, if we don't update corresponding index tuple or
> at least invalidate it, won't it lead to wrong results?

Sure, that would result in wrong results.  Fortunately that's not how I
am suggesting to do it.  I note you're reading an old version of the
design.  I realize now that this is my mistake, because instead of
posting the new design in the cover letter for the patch, I only put it
in the "minmax-proposal" file.  Please give that file a read to see how
the design differs from the design I originally posted in the old
thread.

> > The "amcostestimate" routine is completely bogus; right now it
> > returns constant 0, meaning the index is always chosen if it exists.

> I think for first version, you might want to keep things simple, but
> there should be some way for optimizer to select this index.
> So rather than choose if it is present, we can make optimizer choose
> when some-one says set enable_minmax index to true.

Well, enable_bitmapscan already disables minmax indexes, just like it
disables other indexes.

> How about keeping this up-to-date during foreground operations.
> Vacuum/Maintainer task maintaining things usually have problems of
> bloat and then we need optimize/workaround issues.
> Lot of people have raised this or similar point previously and what
> I read you are of opinion that it seems to be slow.

Well, the current code does keep the index up to date -- I did choose
to implement what people suggested :-)

> Now it can so happen that min and max values are sometimes not right
> because later the operation is rolled back, but I think such cases
> will be less and we can find some way to handle such cases may be
> maintainer task only, but the handling will be quite simpler.

Agreed.

> On Windows, patch gives below compilation errors:
> src\backend\access\minmax\mmtuple.c(96): error C2057: expected
> constant expression

I have fixed all these compile errors (fix attached).  Thanks for
reporting them.  I'll post a new version shortly.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
*** a/src/backend/access/minmax/mmtuple.c
--- b/src/backend/access/minmax/mmtuple.c
***************
*** 93,117 ****
  MMTuple *
  minmax_form_tuple(TupleDesc idxDsc, TupleDesc diskDsc, DeformedMMTuple *tuple,
  				  Size *size)
  {
! 	Datum		values[diskDsc->natts];
! 	bool		nulls[diskDsc->natts];
  	bool		anynulls = false;
  	MMTuple    *rettuple;
  	int			keyno;
  	uint16		phony_infomask;
! 	bits8		phony_nullbitmap[BITMAPLEN(diskDsc->natts)];
  	Size		len,
  				hoff,
  				data_len;
  
  	/*
  	 * Set up the values/nulls arrays for heap_fill_tuple
  	 */
- 	MemSet(nulls, 0, sizeof(nulls));
  	for (keyno = 0; keyno < idxDsc->natts; keyno++)
  	{
! 		AttrNumber	idxattno = keyno * 2;
  
  		if (tuple->values[keyno].allnulls)
  		{
  			nulls[idxattno] = true;
--- 93,126 ----
  minmax_form_tuple(TupleDesc idxDsc, TupleDesc diskDsc, DeformedMMTuple *tuple,
  				  Size *size)
  {
! 	Datum	   *values;
! 	bool	   *nulls;
  	bool		anynulls = false;
  	MMTuple    *rettuple;
  	int			keyno;
  	uint16		phony_infomask;
! 	bits8	   *phony_nullbitmap;
  	Size		len,
  				hoff,
  				data_len;
  
+ 	Assert(diskDsc->natts > 0);
+ 
+ 	values = palloc(sizeof(Datum) * diskDsc->natts);
+ 	nulls = palloc0(sizeof(bool) * diskDsc->natts);
+ 	phony_nullbitmap = palloc(sizeof(bits8) * BITMAPLEN(diskDsc->natts));
+ 
  	/*
  	 * Set up the values/nulls arrays for heap_fill_tuple
  	 */
  	for (keyno = 0; keyno < idxDsc->natts; keyno++)
  	{
! 		int			idxattno = keyno * 2;
  
+ 		/*
+ 		 * "allnulls" is set when there's no nonnull value in any row in
+ 		 * the column; set the nullable bits for both min and max attrs.
+ 		 */
  		if (tuple->values[keyno].allnulls)
  		{
  			nulls[idxattno] = true;
***************
*** 168,173 **** minmax_form_tuple(TupleDesc idxDsc, TupleDesc diskDsc, DeformedMMTuple *tuple,
--- 177,187 ----
  						  &phony_infomask, phony_nullbitmap);
  
+ 	/* done with these */
+ 	pfree(values);
+ 	pfree(nulls);
+ 	pfree(phony_nullbitmap);
+ 
  	/*
  	 * Now fill in the real null bitmasks.  allnulls first.
  	 */
***************
*** 243,251 ****
  DeformedMMTuple *
  minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
  {
  	DeformedMMTuple *dtup;
! 	Datum		values[tupdesc->natts * 2];
! 	bool		allnulls[tupdesc->natts];
! 	bool		hasnulls[tupdesc->natts];
  	char	   *tp;
  	bits8	   *nullbits = NULL;
  	int			keyno;
--- 257,265 ----
  minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
  {
  	DeformedMMTuple *dtup;
! 	Datum	   *values;
! 	bool	   *allnulls;
! 	bool	   *hasnulls;
  	char	   *tp;
  	bits8	   *nullbits = NULL;
  	int			keyno;
***************
*** 253,258 **** minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
--- 267,276 ----
  	dtup = palloc(offsetof(DeformedMMTuple, values) +
  				  sizeof(MMValues) * tupdesc->natts);
  
+ 	values = palloc(sizeof(Datum) * tupdesc->natts * 2);
+ 	allnulls = palloc(sizeof(bool) * tupdesc->natts);
+ 	hasnulls = palloc(sizeof(bool) * tupdesc->natts);
+ 
  	tp = (char *) tuple + MMTupleDataOffset(tuple);
  
  	if (MMTupleHasNulls(tuple))
***************
*** 277,282 **** minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
--- 295,304 ----
  		dtup->values[keyno].allnulls = false;
  	}
  
+ 	pfree(values);
+ 	pfree(allnulls);
+ 	pfree(hasnulls);
+ 
  	return dtup;
  }
  
***************
*** 293,298 **** minmax_deform_tuple(TupleDesc tupdesc, MMTuple *tuple)
--- 315,322 ----
   * values	output values, size 2 * natts (alternates min and max)
   * allnulls	output "allnulls", size natts
   * hasnulls	output "hasnulls", size natts
+  *
+  * Output arrays are allocated by caller.
   */
  static inline void
  mm_deconstruct_tuple(char *tp, bits8 *nullbits, bool nulls,