Re: POC, WIP: OR-clause support for indexes

Alena Rybakina Thu, 29 Jun 2023 02:33:20 -0700

Hi! I'm sorry I didn't answer you right away; I was too busy with work.


On 27.06.2023 22:50, Peter Geoghegan wrote:

On Tue, Jun 27, 2023 at 6:19 AM Alena Rybakina<[email protected]>  wrote:

I learned something new from your letter, thank you very much for that!

Cool. The MDAM paper is also worth a read:

https://vldb.org/conf/1995/P710.PDF

Some of the techniques it describes are already in Postgres. With
varying degrees of maturity.

The paper actually mentions OR optimization at one point, under
"Duplicate Elimination". The general idea is that ScalarArrayOpExpr
execution can "eliminate duplicates before the data is read". The
important underlying principle is that it can be really useful to give
the B-Tree code the context it requires to be clever about stuff like
that. We can do this by (say) using one ScalarArrayOpExpr, rather than
using two or more index scans that the B-Tree code will treat as
independent things. So a lot of the value in your patch comes from the
way that it can enable other optimizations (the immediate benefits are
also nice).

In the past, OR optimizations have been prototyped that were later
withdrawn/rejected because the duplicate elimination aspect was...too
scary [1]. It's very easy to see that ScalarArrayOpExpr index scans
don't really have the same problem. "Giving the B-Tree code the
required context" helps here too.
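To make the "eliminate duplicates before the data is read" idea concrete, here is a minimal standalone sketch in C. It is only an illustration under assumptions: the function name sort_unique_keys and the use of plain int keys are hypothetical, and this is not the nbtree code. It shows that once all constants for one column sit in a single array, sorting and de-duplicating them up front means each distinct key is looked up only once.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for int keys; avoids the overflow risk of "a - b". */
static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper (not a PostgreSQL function): sort the keys in place
 * and squeeze out duplicates.  Returns the number of distinct keys, which
 * end up at the front of the array in ascending order.
 */
int
sort_unique_keys(int *keys, int nkeys)
{
	int			i;
	int			last = 0;

	assert(nkeys >= 0);
	assert(nkeys == 0 || keys != NULL);

	if (nkeys <= 1)
		return nkeys;

	qsort(keys, (size_t) nkeys, sizeof(int), cmp_int);

	for (i = 1; i < nkeys; i++)
	{
		/* Keep a key only if it differs from the last one we kept. */
		if (keys[i] != keys[last])
			keys[++last] = keys[i];
	}
	return last + 1;
}
```

With the constants of x = 1 OR x = 2 OR x = 1 collected into one array, the helper leaves two keys, so the index would be probed twice instead of three times. Separate index scans OR'd together have no shared array to do this on.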

Thank you for the explanation and the material provided. Unfortunately, I am still studying the article and cannot write more at the moment. To be honest, it had not occurred to me before that my optimization could help eliminate duplicates before the data is read.

I am still only in the process of familiarizing myself with the thread [1] (referenced in your letter), but I have already seen that there are problems related, for example, to cases where OR expressions refer to the parent element.

I think I would face similar problems if I complicated the current code so that, for example, not only OR expressions on the same level are rewritten into ANY, but also those on different levels, without violating operator precedence.


For example, this query works now:

postgres=# EXPLAIN (analyze, COSTS OFF)
SELECT oid,relname FROM pg_class
WHERE
  (oid = 13779 OR oid = 2) OR (oid = 4 OR oid = 5) OR
  relname = 'pg_extension'
;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on pg_class (actual time=0.086..0.140 rows=1 loops=1)
   Filter: ((oid = ANY ('{4,5}'::oid[])) OR (oid = ANY ('{13779,2}'::oid[])) OR (relname = 'pg_extension'::name))
   Rows Removed by Filter: 412
 Planning Time: 2.135 ms
 Execution Time: 0.160 ms
(5 rows)

But I would like it to work like this:

                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Seq Scan on pg_class (actual time=0.279..0.496 rows=1 loops=1)
   Filter: ((oid = ANY ('{13779,2,4,5}'::oid[])) OR (relname = 'pg_extension'::name))
   Rows Removed by Filter: 412
 Planning Time: 0.266 ms
 Execution Time: 0.536 ms
(5 rows)
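The merged filter above depends on OR being associative: nested OR nodes can be walked as if they were one flat list, with equality constants collected per column. Below is a rough standalone sketch of that walk in C. Everything in it is an assumption made for illustration (the ToyNode, ToyGroup and ToyGroups types, the fixed-size arrays, integer-only constants, and the function collect_or_args); it does not use PostgreSQL node types and is not the patch's code. A real implementation would also have to stop at AND/NOT nodes and at clauses it cannot transform, which this toy model does not represent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical types, not PostgreSQL nodes. */
typedef enum { NODE_EQ, NODE_OR } NodeKind;

typedef struct ToyNode
{
	NodeKind	kind;
	const char *column;			/* NODE_EQ: column being compared */
	int			value;			/* NODE_EQ: constant on the other side */
	const struct ToyNode *left;		/* NODE_OR: first argument */
	const struct ToyNode *right;	/* NODE_OR: second argument */
} ToyNode;

#define MAX_GROUPS 8
#define MAX_CONSTS 16

typedef struct ToyGroup
{
	const char *column;
	int			consts[MAX_CONSTS];
	int			nconsts;
} ToyGroup;

typedef struct ToyGroups
{
	ToyGroup	groups[MAX_GROUPS];
	int			ngroups;
} ToyGroups;

/*
 * Walk an OR tree of any depth and append each "column = const" leaf to
 * the group for its column, creating the group on first sight.  The caller
 * must set out->ngroups to 0 before the first call.
 */
void
collect_or_args(const ToyNode *node, ToyGroups *out)
{
	int			i;
	ToyGroup   *g;

	assert(node != NULL && out != NULL);

	if (node->kind == NODE_OR)
	{
		/* OR is associative, so nesting depth does not matter. */
		collect_or_args(node->left, out);
		collect_or_args(node->right, out);
		return;
	}

	/* Linear search for an existing group on the same column. */
	for (i = 0; i < out->ngroups; i++)
	{
		if (strcmp(out->groups[i].column, node->column) == 0)
			break;
	}

	if (i == out->ngroups)
	{
		/* No group yet for this column: start a new one. */
		assert(out->ngroups < MAX_GROUPS);
		out->groups[i].column = node->column;
		out->groups[i].nconsts = 0;
		out->ngroups++;
	}

	g = &out->groups[i];
	assert(g->nconsts < MAX_CONSTS);
	g->consts[g->nconsts++] = node->value;
}
```

Applied to a tree shaped like (oid = 13779 OR oid = 2) OR (oid = 4 OR oid = 5) OR <other column> = <const>, the walk yields one group for oid holding all four constants and a single-constant group for the other column, which stays an ordinary OR argument.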

I analyzed buffer usage when running the regression tests with my patch.
As far as I can see, the diff shows no difference in the number of buffer
block scans with and without my patch (regression.diffs).

To be clear, I wasn't expecting that there'd be any regressions from
your patch. Intuitively, it seems like this optimization should make
the query plan do almost the same thing at execution time -- just
slightly more efficiently on average, and much more efficiently in
some individual cases.

It would probably be very hard for the optimizer to model/predict how
much work it can save by using a ScalarArrayOpExpr instead of an
"equivalent" set of bitmap index scans, OR'd together. But it doesn't
necessarily matter -- the only truly critical detail is understanding
the worst case for the transformation optimization.

Yes, I agree with you and I have yet to analyze this.

It cannot be too
bad (maybe it's ~zero added runtime overhead relative to not doing the
transformation, even?).

I haven't seen a major performance degradation so far, but to be honest, I have not done a detailed analysis of query types other than x=1 or x=2 or x=1 or y=2, etc. As soon as I have results, I will share the data; this is very interesting to me.

At the same time, nbtree can be clever about
ScalarArrayOpExpr execution at runtime (once that's implemented),
without ever needing to make any kind of up-front commitment to
navigating through the index in any particular way. It's all dynamic,
and can be driven by the actual observed characteristics of the index
structure.

In other words, we don't really need to gamble (in the planner, or at
execution time). We're just keeping our options open in more cases.
(My thinking on these topics was influenced by Goetz Graefe -- "choice
is confusion" [2]).

Unfortunately, when I tried to do the transformation at the stage of index path formation, I got a badly wrong selectivity estimate for the relation, which led to incorrect cost and cardinality calculations. I couldn't solve this problem.


With my diff (transform_or_v0.diff), I got this result:

CREATE TABLE tenk1 (unique1 int, unique2 int, ten int, hundred int);
insert into tenk1 SELECT x,x,x,x FROM generate_series(1,50000) as x;
CREATE INDEX a_idx1 ON tenk1(unique1);
CREATE INDEX a_idx2 ON tenk1(unique2);
CREATE INDEX a_hundred ON tenk1(hundred);

postgres=# explain analyze
select * from tenk1 a join tenk1 b on
  a.unique2 = 3 or a.unique2 = 7 or a.unique1 = 1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..15627479.50 rows=1250050000 width=32) (actual time=0.040..75.531 rows=150000 loops=1)
   ->  Seq Scan on tenk1 b  (cost=0.00..771.00 rows=50000 width=16) (actual time=0.022..5.467 rows=50000 loops=1)
   ->  Materialize  (cost=0.00..1146.01 rows=25001 width=16) (actual time=0.000..0.001 rows=3 loops=50000)
         ->  Seq Scan on tenk1 a  (cost=0.00..1021.00 rows=25001 width=16) (actual time=0.011..22.789 rows=3 loops=1)
               Filter: ((unique2 = ANY (ARRAY[3, 7])) OR (unique1 = 1))
               Rows Removed by Filter: 49997
 Planning Time: 0.427 ms
 Execution Time: 80.027 ms
(8 rows)

The current patch's result:

postgres=# set enable_bitmapscan ='off';
SET
postgres=# explain analyze
select * from tenk1 a join tenk1 b on
  a.unique2 = 3 or a.unique2 = 7 or a.unique1 = 1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..22247.02 rows=1350000 width=32) (actual time=0.094..373.627 rows=1350000 loops=1)
   ->  Seq Scan on tenk1 b  (cost=0.00..2311.00 rows=150000 width=16) (actual time=0.051..14.667 rows=150000 loops=1)
   ->  Materialize  (cost=0.00..3061.05 rows=9 width=16) (actual time=0.000..0.001 rows=9 loops=150000)
         ->  Seq Scan on tenk1 a  (cost=0.00..3061.00 rows=9 width=16) (actual time=0.026..42.389 rows=9 loops=1)
               Filter: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1))
               Rows Removed by Filter: 149991
 Planning Time: 0.414 ms
 Execution Time: 409.154 ms
(8 rows)

[1] https://www.postgresql.org/message-id/flat/1397.1486598083%40sss.pgh.pa.us#310f974a8dc84478d6d3c70f336807bb
[2] https://sigmodrecord.org/publications/sigmodRecord/2009/pdfs/05_Profiles_Graefe.pdf

Thank you again for the explanations and the material provided. I will study everything carefully as soon as possible and will write if I have any thoughts or ideas about my patch.


--

Regards,
Alena Rybakina
Postgres Professional

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 0065c8992bd..8ef3438d78c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -193,6 +193,273 @@ static bool ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
 									   EquivalenceClass *ec, EquivalenceMember *em,
 									   void *arg);
 
+typedef struct OrClauseGroupEntry
+{
+	Node		   *node;
+	List		   *consts;
+	Oid				collation;
+	Oid				opno;
+	RestrictInfo   *rinfo;
+} OrClauseGroupEntry;
+
+/*
+ * Pass through baserestrictinfo clauses and try to convert OR clauses into IN.
+ * Return a modified clause list, or the same baserestrictinfo if no
+ * changes have been made.
+ * XXX: do not change source list of clauses at all.
+ */
+static List *
+transform_ors(PlannerInfo *root, List *baserestrictinfo)
+{
+	ListCell   *lc;
+	ListCell   *lc_cp;
+	List	   *modified_rinfo = NIL;
+	bool		something_changed = false;
+	List	   *baserestrictinfo_origin = list_copy(baserestrictinfo);
+
+	/*
+	 * Complexity of a clause could be arbitrarily sophisticated. Here, we will
+	 * look up only on the top level of clause list.
+	 * XXX: Is this substantiated? Could we change something here?
+	 */
+	forboth (lc, baserestrictinfo, lc_cp, baserestrictinfo_origin)
+	{
+		RestrictInfo   *rinfo = lfirst_node(RestrictInfo, lc);
+		RestrictInfo   *rinfo_base = lfirst_node(RestrictInfo, lc_cp);
+		List		   *or_list = NIL;
+		ListCell	   *lc_eargs,
+					   *lc_rargs,
+					   *lc_args;
+		List		   *groups_list = NIL;
+		bool			change_apply = false;
+
+		if (!restriction_is_or_clause(rinfo))
+		{
+			/* Add a clause without changes */
+			modified_rinfo = lappend(modified_rinfo, rinfo);
+			continue;
+		}
+
+		/*
+		 * NOTE:
+		 * It is an OR-clause. So, rinfo->orclause is a BoolExpr node, contains
+		 * a list of sub-restrictinfo args, and rinfo->clause - which is the
+		 * same expression, made from bare clauses. To not break selectivity
+		 * caches and other optimizations, use both:
+		 * - use rinfos from orclause if no transformation needed
+		 * - use  bare quals from rinfo->clause in the case of transformation,
+		 * to create new RestrictInfo: in this case we have no options to avoid
+		 * selectivity estimation procedure.
+		 */
+		forboth(lc_eargs, ((BoolExpr *) rinfo->clause)->args,
+				lc_rargs, ((BoolExpr *) rinfo->orclause)->args)
+		{
+			Expr			   *bare_orarg = (Expr *) lfirst(lc_eargs);
+			RestrictInfo	   *sub_rinfo;
+			Node			   *const_expr;
+			Node			   *non_const_expr;
+			ListCell		   *lc_groups;
+			OrClauseGroupEntry *gentry;
+			Oid					opno;
+
+			/* It may be one more boolean expression, skip it for now */
+			if (!IsA(lfirst(lc_rargs), RestrictInfo))
+			{
+				or_list = lappend(or_list, (void *) bare_orarg);
+				continue;
+			}
+
+			sub_rinfo = lfirst_node(RestrictInfo, lc_rargs);
+
+			/* Check: it is an expr of the form 'F(x) oper ConstExpr' */
+			if (!IsA(bare_orarg, OpExpr) ||
+				!(bms_is_empty(sub_rinfo->left_relids) ^
+				bms_is_empty(sub_rinfo->right_relids)) ||
+				contain_volatile_functions((Node *) bare_orarg))
+			{
+				/* Again, it's not the expr we can transform */
+				or_list = lappend(or_list, (void *) bare_orarg);
+				continue;
+			}
+
+			/* Get pointers to constant and expression sides of the clause */
+			const_expr = bms_is_empty(sub_rinfo->left_relids) ?
+												get_leftop(sub_rinfo->clause) :
+												get_rightop(sub_rinfo->clause);
+			non_const_expr = bms_is_empty(sub_rinfo->left_relids) ?
+												get_rightop(sub_rinfo->clause) :
+												get_leftop(sub_rinfo->clause);
+
+			opno = ((OpExpr *) sub_rinfo->clause)->opno;
+			if (!op_mergejoinable(opno, exprType(non_const_expr)))
+			{
+				/* And again, filter out non-equality operators */
+				or_list = lappend(or_list, (void *) bare_orarg);
+				continue;
+			}
+
+			/*
+			 * At this point we definitely have a transformable clause.
+			 * Classify it and add into specific group of clauses, or create new
+			 * group.
+			 * TODO: to manage complexity in the case of many different clauses
+			 * (X1=C1) OR (X2=C2) OR ... (XN=CN) we could invent something
+			 * like a hash table (htab key ???).
+			 */
+			foreach(lc_groups, groups_list)
+			{
+				OrClauseGroupEntry *v = (OrClauseGroupEntry *) lfirst(lc_groups);
+
+				Assert(v->node != NULL);
+
+				if (equal(v->node, non_const_expr))
+				{
+					v->consts = lappend(v->consts, const_expr);
+					non_const_expr = NULL;
+					break;
+				}
+			}
+
+			if (non_const_expr == NULL)
+				/*
+				 * The clause was classified successfully and added to an
+				 * existing clause group.
+				 */
+				continue;
+
+			/* New clause group needed */
+			gentry = palloc(sizeof(OrClauseGroupEntry));
+			gentry->node = non_const_expr;
+			gentry->consts = list_make1(const_expr);
+			gentry->collation = exprInputCollation((Node *)sub_rinfo->clause);
+			gentry->opno = opno;
+			gentry->rinfo = sub_rinfo;
+			groups_list = lappend(groups_list,  (void *) gentry);
+		}
+
+		if (groups_list == NIL)
+		{
+			/*
+			 * No transformations are possible with this rinfo; just add it
+			 * to the list and go further.
+			 */
+			modified_rinfo = lappend(modified_rinfo, rinfo);
+			continue;
+		}
+
+		/* Let's convert each group of clauses to an IN operation. */
+
+		/*
+		 * Go through the list of groups and convert each one that has more
+		 * than one constant. Trivial groups are moved back to the OR list.
+		 */
+
+		foreach(lc_args, groups_list)
+		{
+			OrClauseGroupEntry *gentry = (OrClauseGroupEntry *) lfirst(lc_args);
+			ScalarArrayOpExpr  *saopexpr;
+			ArrayExpr		   *newa;
+
+			Assert(list_length(gentry->consts) > 0);
+
+			if (list_length(gentry->consts) == 1)
+			{
+				/*
+				 * Only one element in the class. Return rinfo into the BoolExpr
+				 * args list unchanged.
+				 */
+				list_free(gentry->consts);
+				or_list = lappend(or_list, (void *) gentry->rinfo->clause);
+				continue;
+			}
+
+			/*
+			 * Do the transformation. It's been a long way ;)
+			 *
+			 * As a source of insides, use make_scalar_array_op()
+			 */
+
+			newa = makeNode(ArrayExpr);
+			newa->element_typeid = exprType(gentry->node);
+			newa->array_typeid = newa->element_typeid; /* don't used in the case of one dimension, but exprType returns this value */
+			newa->multidims = false;
+			newa->elements = gentry->consts;
+			newa->location = -1; /* Position of the new clause is undefined */
+
+			saopexpr = makeNode(ScalarArrayOpExpr);
+			saopexpr->opno = gentry->opno;
+			saopexpr->opfuncid = get_opcode(gentry->opno);
+			saopexpr->useOr = true;
+			saopexpr->inputcollid = gentry->collation;
+			saopexpr->args = list_make2(gentry->node, newa);
+			saopexpr->location = -1;
+
+			/* TODO: study on these parameters. */
+			saopexpr->hashfuncid = InvalidOid;
+			saopexpr->negfuncid = InvalidOid;
+
+			/*
+			 * TODO: here we can try to coerce the array to a Const and find
+			 * hash func instead of linear search (see 50e17ad281b).
+			 * convert_saop_to_hashed_saop((Node *) saopexpr);
+			 */
+
+			or_list = lappend(or_list, (void *) saopexpr);
+			change_apply = true;
+		}
+
+		if (!change_apply)
+		{
+			/*
+			 * Each group contains only one element - use rinfo as is.
+			 */
+			modified_rinfo = lappend(modified_rinfo, rinfo);
+			list_free(or_list);
+			list_free_deep(groups_list);
+			continue;
+		}
+
+		/*
+		 * Make a new version of the restriction. Remember source restriction
+		 * can be used in another path (SeqScan, for example).
+		 */
+
+		/* One more trick: assemble correct clause */
+		rinfo = make_restrictinfo(root,
+				  list_length(or_list) > 1 ? make_orclause(or_list) :
+											 (Expr *) linitial(or_list),
+				  rinfo->is_pushed_down,
+				  rinfo->has_clone,
+				  rinfo->is_clone,
+				  rinfo->pseudoconstant,
+				  rinfo->security_level,
+				  rinfo->required_relids,
+				  rinfo->incompatible_relids,
+				  rinfo->outer_relids);
+		rinfo->eval_cost = rinfo_base->eval_cost;
+		rinfo->norm_selec = rinfo_base->norm_selec;
+		rinfo->outer_selec = rinfo_base->outer_selec;
+		rinfo->left_bucketsize = rinfo_base->left_bucketsize;
+		rinfo->right_bucketsize = rinfo_base->right_bucketsize;
+		rinfo->left_mcvfreq = rinfo_base->left_mcvfreq;
+		rinfo->right_mcvfreq = rinfo_base->right_mcvfreq;
+		modified_rinfo = lappend(modified_rinfo, rinfo);
+		list_free_deep(groups_list);
+		something_changed = true;
+	}
+
+	/*
+	 * Check whether any transformation has been made. If nothing changed,
+	 * return baserestrictinfo as is.
+	 */
+	if (something_changed)
+	{
+		return modified_rinfo;
+	}
+
+	list_free(modified_rinfo);
+	return baserestrictinfo;
+}
 
 /*
  * create_index_paths()
@@ -3309,11 +3576,16 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
 	 * not, this is all we need to do.
 	 */
 	have_partial = false;
+
+	if (rel->reloptkind == RELOPT_BASEREL)
+		rel->baserestrictinfo = transform_ors(root, rel->baserestrictinfo);
+
 	foreach(lc, rel->indexlist)
 	{
 		IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
 
 		index->indrestrictinfo = rel->baserestrictinfo;
+
 		if (index->indpred)
 			have_partial = true;
 	}
