Hi!

    At the moment, I'm not sure that the constant is the right number
    for applying transformations, so I'm in search of it, to be
    honest. I will post my observations on this issue later. If you
    don't mind, I'll leave the constant equal to 15 for now.

It's hard to predict. Perhaps accounting for time on each benchmark could help decide.

I will try to test on JOB [1] (because queries are difficult for the optimizer due to the significant number of joins and correlations contained in the dataset) and tpcds [2] (the benchmark I noticed contains a sufficient number of queries with "or" expressions).

    Sorry, I don't understand well enough what is meant by points
    "Eliminate unnecessary variables" and "Eliminate unnecessary
    expressions". Can you explain in more detail?

One example is array_type.
As you can see in v2 and v3 it no longer exists.

I get it. Honestly, I was guided by the example of converting "IN" to "ANY" (transformAExprIn), at least the part of the code when we specifically convert the expression to ScalarArrayOpExpr.

Both there and here, we first look for a common type for the collected constants, and if there is one, then we try to find the type for the array structure.

Only I think in my current patch it is also worth returning to the original version in this place, since if it is not found, the ScalarArrayOpExpr generation function will be processed incorrectly and the request may not be executed at all, referring to the error that it is impossible to determine the type of node (ERROR: unrecognized node type. )

At the same time we are trying to do this transformation for each group. The group here implies that these are combined "or" expressions on the common left side, and at the same time we consider
only expressions that contain a constant and only equality.

What else should be taken into account is that we are trying to do this processing before forming a BoolExpr expression (if you notice, then after any outcome we call the makeBoolExpr function, which just forms the "Or" expression, as in the original version, regardless of what type of expressions it combines.


As I said earlier, these are just suggestions.
But thinking about it now, I think they can be classified as bad early optimizations.
Thank you again for your interest in this problem and help. Yes, I think so too)


1. https://github.com/gregrahn/join-order-benchmark

2. https://github.com/Alena0704/s64da-benchmark-toolkit/tree/master/benchmarks/tpcds

--
Regards,
Alena Rybakina
Postgres Professional
From f9f5958707bc1d7931323df05d51a60fc9d6cd38 Mon Sep 17 00:00:00 2001
From: Alena Rybakina <a.rybak...@postgrespro.ru>
Date: Thu, 29 Jun 2023 17:46:58 +0300
Subject: [PATCH] Replace OR clause to ANY expressions. Replace (X=N1) OR
 (X=N2) ... with X = ANY(N1, N2) on the stage of the optimiser when we are
 still working with a tree expression. Firstly, we do not try to make a
 transformation for "non-or" expressions or inequalities and the creation of a
 relation with "or" expressions occurs according to the same scenario.
 Secondly, we do not make transformations if there are less than 15 or
 expressions. Thirdly, it is worth considering that we consider "or"
 expressions only at the current level.

---
 src/backend/parser/parse_expr.c  | 290 ++++++++++++++++++++++++++++++-
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 290 insertions(+), 1 deletion(-)

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 346fd272b6d..3d395fd6459 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -95,6 +95,294 @@ static Expr *make_distinct_op(ParseState *pstate, List *opname,
 static Node *make_nulltest_from_distinct(ParseState *pstate,
 										 A_Expr *distincta, Node *arg);
 
+typedef struct OrClauseGroupEntry
+{
+	Node		   *node;
+	List		   *consts;
+	Oid				scalar_type;
+	Oid				opno;
+	Expr 		   *expr;
+} OrClauseGroupEntry;
+
+static int const_transform_or_limit = 15;
+
+static Node *
+transformBoolExprOr(ParseState *pstate, Expr *expr_orig)
+{
+	List		   *or_list = NIL;
+	List		   *groups_list = NIL;
+	ListCell	   *lc_eargs;
+	Node 		   *result;
+	BoolExpr 	   *expr = (BoolExpr *)copyObject(expr_orig);
+	const char 	   *opname;
+	bool			change_apply = false;
+	bool			or_statement = false;
+
+	Assert(IsA(expr, BoolExpr));
+
+	/* If this is not expression "Or", then will do it the old way. */
+	switch (expr->boolop)
+	{
+		case AND_EXPR:
+			opname = "AND";
+			break;
+		case OR_EXPR:
+			opname = "OR";
+			or_statement = true;
+			break;
+		case NOT_EXPR:
+			opname = "NOT";
+			break;
+		default:
+			elog(ERROR, "unrecognized boolop: %d", (int) expr->boolop);
+			opname = NULL;		/* keep compiler quiet */
+			break;
+	}
+
+	if (!or_statement || list_length(expr->args) < const_transform_or_limit)
+		return transformBoolExpr(pstate, (BoolExpr *)expr_orig);
+
+	/*
+		* NOTE:
+		* It is an OR-clause. So, rinfo->orclause is a BoolExpr node, contains
+		* a list of sub-restrictinfo args, and rinfo->clause - which is the
+		* same expression, made from bare clauses. To not break selectivity
+		* caches and other optimizations, use both:
+		* - use rinfos from orclause if no transformation needed
+		* - use  bare quals from rinfo->clause in the case of transformation,
+		* to create new RestrictInfo: in this case we have no options to avoid
+		* selectivity estimation procedure.
+		*/
+	foreach(lc_eargs, expr->args)
+	{
+		A_Expr			   *arg = (A_Expr *) lfirst(lc_eargs);
+		Node			   *bare_orarg;
+		Node			   *const_expr;
+		Node			   *non_const_expr;
+		ListCell		   *lc_groups;
+		OrClauseGroupEntry *gentry;
+		bool 				allow_transformation;
+
+		/*
+		 * The first step: checking that the expression consists only of equality.
+		 * We can only do this here, while arg is still row data type, namely A_Expr.
+		 * After applying transformExprRecurce, we already know that it will be OpExr type,
+		 * but checking the expression for equality is already becoming impossible for us.
+		 * Sometimes we have the chance to devide expression into the groups on
+		 * equality and inequality. This is possible if all list items are not at the
+		 * same level of a single BoolExpr expression, otherwise all of them cannot be converted.
+		 */
+
+		if (!arg)
+			break;
+
+		allow_transformation = (
+								or_statement &&
+		                        arg->type == T_A_Expr && (arg)->kind == AEXPR_OP &&
+							    list_length((arg)->name) >=1 && strcmp(strVal(linitial((arg)->name)), "=") == 0
+							   );
+
+
+		bare_orarg = transformExprRecurse(pstate, (Node *)arg);
+		bare_orarg = coerce_to_boolean(pstate, bare_orarg, opname);
+
+		/*
+		 * The next step: transform all the inputs, and detect whether any contain
+	 	 * Vars.
+		 */
+		if (!allow_transformation || !bare_orarg || !IsA(bare_orarg, OpExpr) || !contain_vars_of_level(bare_orarg, 0))
+		{
+			/* Again, it's not the expr we can transform */
+			or_list = lappend(or_list, bare_orarg);
+			continue;
+		}
+
+		/*
+		 * Get pointers to constant and expression sides of the clause
+		 */
+		non_const_expr = get_leftop(bare_orarg);
+		const_expr = get_rightop(bare_orarg);
+
+		/*
+			* At this point we definitely have a transformable clause.
+			* Classify it and add into specific group of clauses, or create new
+			* group.
+			* TODO: to manage complexity in the case of many different clauses
+			* (X1=C1) OR (X2=C2 OR) ... (XN = CN) we could invent something
+			* like a hash table (htab key ???).
+			*/
+		foreach(lc_groups, groups_list)
+		{
+			OrClauseGroupEntry *v = (OrClauseGroupEntry *) lfirst(lc_groups);
+
+			Assert(v->node != NULL);
+
+			if (equal(v->node, non_const_expr))
+			{
+				v->consts = lappend(v->consts, const_expr);
+				non_const_expr = NULL;
+				break;
+			}
+		}
+
+		if (non_const_expr == NULL)
+			/*
+				* The clause classified successfully and added into existed
+				* clause group.
+				*/
+			continue;
+
+		/* New clause group needed */
+		gentry = palloc(sizeof(OrClauseGroupEntry));
+		gentry->node = non_const_expr;
+		gentry->consts = list_make1(const_expr);
+		gentry->opno = ((OpExpr *)bare_orarg)->opno;
+		gentry->expr = (Expr *)bare_orarg;
+		groups_list = lappend(groups_list,  (void *) gentry);
+	}
+
+	if (groups_list == NIL)
+	{
+		/*
+		* No any transformations possible with this rinfo, just add itself
+		* to the list and go further.
+		*/
+		list_free(or_list);
+
+		return transformBoolExpr(pstate, (BoolExpr *)expr_orig);
+	}
+	else
+	{
+		ListCell	   *lc_args;
+
+		/* Let's convert each group of clauses to an IN operation. */
+
+		/*
+		* Go through the list of groups and convert each, where number of
+		* consts more than 1. trivial groups move to OR-list again
+		*/
+
+		foreach(lc_args, groups_list)
+		{
+			OrClauseGroupEntry *gentry = (OrClauseGroupEntry *) lfirst(lc_args);
+			List			   *allexprs;
+			Oid				    scalar_type;
+			Oid					array_type;
+
+			Assert(list_length(gentry->consts) > 0);
+
+			if (list_length(gentry->consts) == 1)
+			{
+				/*
+				 * Only one element in the class. Return rinfo into the BoolExpr
+				 * args list unchanged.
+				 */
+				list_free(gentry->consts);
+				or_list = lappend(or_list, gentry->expr);
+				continue;
+			}
+
+			/*
+			 * Do the transformation. It's been a long way ;)
+			 *
+			 * First of all, try to select a common type for the array elements.  Note that
+			 * since the LHS' type is first in the list, it will be preferred when
+			 * there is doubt (eg, when all the RHS items are unknown literals).
+			 *
+			 * Note: use list_concat here not lcons, to avoid damaging rnonvars.
+			 *
+			 * As a source of insides, use make_scalar_array_op()
+			 */
+			allexprs = list_concat(list_make1(gentry->node), gentry->consts);
+			scalar_type = select_common_type(NULL, allexprs, NULL, NULL);
+
+			if (OidIsValid(scalar_type) && scalar_type != RECORDOID)
+				array_type = get_array_type(scalar_type);
+			else
+				array_type = InvalidOid;
+
+			if (array_type != InvalidOid)
+			{
+				/*
+			 	 * OK: coerce all the right-hand non-Var inputs to the common type
+			 	 * and build an ArrayExpr for them.
+				 */
+				List	   *aexprs;
+				ArrayExpr  *newa;
+				ScalarArrayOpExpr *saopexpr;
+				ListCell *l;
+
+				aexprs = NIL;
+
+				foreach(l, gentry->consts)
+				{
+					Node	   *rexpr = (Node *) lfirst(l);
+
+					rexpr = coerce_to_common_type(pstate, rexpr,
+												scalar_type,
+												"IN");
+					aexprs = lappend(aexprs, rexpr);
+				}
+
+				newa = makeNode(ArrayExpr);
+				/* array_collid will be set by parse_collate.c */
+				newa->element_typeid = scalar_type;
+				newa->array_typeid = get_array_type(scalar_type);
+				newa->multidims = false;
+				newa->elements = aexprs;
+				newa->location = -1; /* Position of the new clause is undefined */
+
+				saopexpr = (ScalarArrayOpExpr *)make_scalar_array_op(pstate,
+												   list_make1(makeString((char *) "=")),
+												   true,
+												   gentry->node,
+												   (Node *) newa,
+												   -1); /* Position of the new clause is undefined */
+
+				/*
+				* TODO: here we can try to coerce the array to a Const and find
+				* hash func instead of linear search (see 50e17ad281b).
+				* convert_saop_to_hashed_saop((Node *) saopexpr);
+				* We don't have to do this anymore, do we?
+				*/
+
+				or_list = lappend(or_list, (void *) saopexpr);
+				change_apply = true;
+			}
+			else
+			{
+				list_free(gentry->consts);
+				or_list = lappend(or_list, gentry->expr);
+				continue;
+			}
+		}
+
+		if (!change_apply)
+		{
+			or_statement = false;
+		}
+		list_free_deep(groups_list);
+	}
+
+	if (or_statement)
+	{
+		/* One more trick: assemble correct clause */
+		expr = list_length(or_list) > 1 ? makeBoolExpr(OR_EXPR, list_copy(or_list), -1) : linitial(or_list);
+		result = (Node *)expr;
+	}
+	/*
+	 * There was no reasons to create a new expresion, so
+	 * run the original BoolExpr conversion with using
+	 * transformBoolExpr function
+	 */
+	else
+	{
+		result = transformBoolExpr(pstate, (BoolExpr *)expr_orig);
+	}
+	list_free(or_list);
+
+	return result;
+}
 
 /*
  * transformExpr -
@@ -208,7 +496,7 @@ transformExprRecurse(ParseState *pstate, Node *expr)
 			}
 
 		case T_BoolExpr:
-			result = transformBoolExpr(pstate, (BoolExpr *) expr);
+			result = (Node *)transformBoolExprOr(pstate, (Expr *)expr);
 			break;
 
 		case T_FuncCall:
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 260854747b4..cc1b676d200 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1631,6 +1631,7 @@ NumericVar
 OM_uint32
 OP
 OSAPerGroupState
+OrClauseGroupEntry
 OSAPerQueryState
 OSInfo
 OSSLCipher
-- 
2.34.1

Reply via email to