Re: POC, WIP: OR-clause support for indexes

2024-07-28 Thread Alexander Korotkov
On Sun, Jul 28, 2024 at 12:59 PM Alena Rybakina
 wrote:
> On 27.07.2024 13:56, Alexander Korotkov wrote:
> > On Thu, Jul 25, 2024 at 5:04 PM Alena Rybakina
> >   wrote:
> >> To be honest, I have found a big problem in this patch - we try to perform 
> >> the transformation every time we examime a column:
> >>
> >> for (indexcol = 0; indexcol < index->nkeycolumns; indexcol++) { ...
> >>
> >> }
> >>
> >> I have fixed it and moved the transformation before going through the loop.
> > What makes you think there is a problem?
>
> To be honest, I was bothered by the fact that we need to go through
> expressions several times that obviously will not fit under other
> conditions.
> Just yesterday I thought that it would be worthwhile to create a list of
> candidates - expressions that did not fit because the column did not
> match the index (!match_index_to_operand(nconst_expr, indexcol, index)).

I admit that this area probably could use some optimization,
especially for case of many clauses and many indexes.  But in the
scope of this patch, I think this is enough to not make things worse
in this area.

> Another problem that is related to the first one that the boolexpr could
> contain expressions referring to different operands, for example, both x
> and y. that is, we may have the problem that the optimal "ANY"
> expression may not be used, because the expression with x may come
> earlier and the loop may end earlier.
>
> alena@postgres=# create table b (x int, y int);
> CREATE TABLE
> alena@postgres=# insert into b select id, id from
> generate_series(1,1000) as id;
> INSERT 0 1000
> alena@postgres=# create index x_idx on b(x);
> CREATE INDEX
> alena@postgres=# analyze;
> ANALYZE
> alena@postgres=# explain select * from b where y =3 or x =4 or x=5 or
> x=6 or x = 7 or x=8 or x=9;
>QUERY PLAN
> ---
>   Seq Scan on b  (cost=0.00..32.50 rows=7 width=8)
> Filter: ((y = 3) OR (x = 4) OR (x = 5) OR (x = 6) OR (x = 7) OR (x =
> 8) OR (x = 9))
> (2 rows)
> alena@postgres=# explain select * from b where x =4 or x=5 or x=6 or x =
> 7 or x=8 or x=9 or y=1;
>QUERY PLAN
> ---
>   Seq Scan on b  (cost=0.00..32.50 rows=7 width=8)
> Filter: ((x = 4) OR (x = 5) OR (x = 6) OR (x = 7) OR (x = 8) OR (x =
> 9) OR (y = 1))
> (2 rows)
> alena@postgres=# explain select * from b where x =4 or x=5 or x=6 or x =
> 7 or x=8 or x=9;
> QUERY PLAN
> 
>   Index Scan using x_idx on b  (cost=0.28..12.75 rows=6 width=8)
> Index Cond: (x = ANY ('{4,5,6,7,8,9}'::integer[]))
> (2 rows)
>
> Furthermore expressions can be stored in a different order.
> For example, first comes "AND" expr, and then group of "OR" expr, which
> we can convert to "ANY" expr, but we won't do this due to the fact that
> we will exit the loop early, according to this condition:
>
> if (!IsA(sub_rinfo->clause, OpExpr))
> return NULL;
>
> or it may occur due to other conditions.
>
> alena@postgres=# create index x_y_idx on b(x,y);
> CREATE INDEX
> alena@postgres=# analyze;
> ANALYZE
>
> alena@postgres=# explain select * from b where (x = 2 and y =3) or x =4
> or x=5 or x=6 or x = 7 or x=8 or x=9;
>   QUERY PLAN
> -
>   Seq Scan on b  (cost=0.00..35.00 rows=6 width=8)
> Filter: (((x = 2) AND (y = 3)) OR (x = 4) OR (x = 5) OR (x = 6) OR
> (x = 7) OR (x = 8) OR (x = 9))
> (2 rows)
>
> Because of these reasons, I tried to save this and that transformation
> together for each column and try to analyze for each expr separately
> which method would be optimal.

Yes, with v27 of the patch, optimization wouldn't work in these cases.
However, you are using quite small table.  If you will use larger
table or disable sequential scans, there would be bitmap plans to
handle these queries.  So, v27 doesn't make the situation worse. It
just doesn't optimize all that it could potentially optimize and
that's OK.

I've written a separate 0002 patch to address this.  Now, before
generation of paths for bitmap OR, similar OR entries are grouped
together.  When considering a group of similar entries, they are
considered both together and one-by-one.  Ideally we could consider
more sophisticated grouping, but that seems fine for now.  You can
check how this patch handles the cases of above.

Also, 0002 address issue of duplicated bitmap scan conditions in
different forms. During generate_bitmap_or_paths() we need to exclude
considered condition for other clauses.  It couldn't be as normal
filtered out in the latter stage, because could reach the index in
another form.

> > Do you have a test 

Re: POC, WIP: OR-clause support for indexes

2024-07-28 Thread Alena Rybakina

On 27.07.2024 13:56, Alexander Korotkov wrote:

On Thu, Jul 25, 2024 at 5:04 PM Alena Rybakina
  wrote:

To be honest, I have found a big problem in this patch - we try to perform the 
transformation every time we examime a column:

for (indexcol = 0; indexcol < index->nkeycolumns; indexcol++) { ...

}

I have fixed it and moved the transformation before going through the loop.

What makes you think there is a problem?


To be honest, I was bothered by the fact that we need to go through 
expressions several times that obviously will not fit under other 
conditions.
Just yesterday I thought that it would be worthwhile to create a list of 
candidates - expressions that did not fit because the column did not 
match the index (!match_index_to_operand(nconst_expr, indexcol, index)).


Another problem that is related to the first one that the boolexpr could 
contain expressions referring to different operands, for example, both x 
and y. that is, we may have the problem that the optimal "ANY" 
expression may not be used, because the expression with x may come 
earlier and the loop may end earlier.


alena@postgres=# create table b (x int, y int);
CREATE TABLE
alena@postgres=# insert into b select id, id from 
generate_series(1,1000) as id;

INSERT 0 1000
alena@postgres=# create index x_idx on b(x);
CREATE INDEX
alena@postgres=# analyze;
ANALYZE
alena@postgres=# explain select * from b where y =3 or x =4 or x=5 or 
x=6 or x = 7 or x=8 or x=9;

  QUERY PLAN
---
 Seq Scan on b  (cost=0.00..32.50 rows=7 width=8)
   Filter: ((y = 3) OR (x = 4) OR (x = 5) OR (x = 6) OR (x = 7) OR (x = 
8) OR (x = 9))

(2 rows)
alena@postgres=# explain select * from b where x =4 or x=5 or x=6 or x = 
7 or x=8 or x=9 or y=1;

  QUERY PLAN
---
 Seq Scan on b  (cost=0.00..32.50 rows=7 width=8)
   Filter: ((x = 4) OR (x = 5) OR (x = 6) OR (x = 7) OR (x = 8) OR (x = 
9) OR (y = 1))

(2 rows)
alena@postgres=# explain select * from b where x =4 or x=5 or x=6 or x = 
7 or x=8 or x=9;

   QUERY PLAN

 Index Scan using x_idx on b  (cost=0.28..12.75 rows=6 width=8)
   Index Cond: (x = ANY ('{4,5,6,7,8,9}'::integer[]))
(2 rows)

Furthermore expressions can be stored in a different order.
For example, first comes "AND" expr, and then group of "OR" expr, which 
we can convert to "ANY" expr, but we won't do this due to the fact that 
we will exit the loop early, according to this condition:


if (!IsA(sub_rinfo->clause, OpExpr))
           return NULL;

or it may occur due to other conditions.

alena@postgres=# create index x_y_idx on b(x,y);
CREATE INDEX
alena@postgres=# analyze;
ANALYZE

alena@postgres=# explain select * from b where (x = 2 and y =3) or x =4 
or x=5 or x=6 or x = 7 or x=8 or x=9;

 QUERY PLAN
-
 Seq Scan on b  (cost=0.00..35.00 rows=6 width=8)
   Filter: (((x = 2) AND (y = 3)) OR (x = 4) OR (x = 5) OR (x = 6) OR 
(x = 7) OR (x = 8) OR (x = 9))

(2 rows)

Because of these reasons, I tried to save this and that transformation 
together for each column and try to analyze for each expr separately 
which method would be optimal.



Do you have a test case
illustrating a slow planning time?

No, I didn't have time to measure it and sorry for that. I'll do it.

When v27 performs transformation for a particular column, it just
stops facing the first unmatched OR entry.  So,
match_orclause_to_indexcol() examines just the first OR entry for all
the columns excepts at most one.  So, the check
match_orclause_to_indexcol() does is not much slower than other
match_*_to_indexcol() do.

I actually think this could help performance in many cases, not hurt
it.  At least we get rid of O(n^2) complexity over the number of OR
entries, which could be very many.


I agree with you that there is an overhead and your patch fixes this 
problem, but optimizer needs to have a good ordering of expressions for 
application.


I think we can try to move the transformation to another place where 
there is already a loop pass, and also save two options "OR" expr and 
"ANY" expr in one place (through BoolExpr) (like find_duplicate_ors 
function) and teach the optimizer to determine which option is better, 
for example, like now in match_orclause_to_indexcol() function.


What do you thing about it?

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company





Re: POC, WIP: OR-clause support for indexes

2024-07-27 Thread Alexander Korotkov
On Thu, Jul 25, 2024 at 5:04 PM Alena Rybakina
 wrote:
> To be honest, I have found a big problem in this patch - we try to perform 
> the transformation every time we examime a column:
>
> for (indexcol = 0; indexcol < index->nkeycolumns; indexcol++) { ...
>
> }
>
> I have fixed it and moved the transformation before going through the loop.

What makes you think there is a problem?  Do you have a test case
illustrating a slow planning time?

When v27 performs transformation for a particular column, it just
stops facing the first unmatched OR entry.  So,
match_orclause_to_indexcol() examines just the first OR entry for all
the columns excepts at most one.  So, the check
match_orclause_to_indexcol() does is not much slower than other
match_*_to_indexcol() do.

I actually think this could help performance in many cases, not hurt
it.  At least we get rid of O(n^2) complexity over the number of OR
entries, which could be very many.

--
Regards,
Alexander Korotkov
Supabase




Re: POC, WIP: OR-clause support for indexes

2024-07-25 Thread Alena Rybakina

On 22.07.2024 03:52, Alexander Korotkov wrote:

Hi, Alena!

Let me answer to some of your findings.

On Mon, Jul 22, 2024 at 12:53 AM Alena Rybakina
  wrote:

To be honest,I saw a larger problem. Look at the query bellow:

master:

alena@postgres=# create table t (a int not null, b int not null, c int not 
null);
insert into t (select 1, 1, i from generate_series(1,1) i);
insert into t (select i, 2, 2 from generate_series(1,1) i);
create index t_a_b_idx on t (a, b);

Just a side note.  As I mention in [1], there is missing statement
create index t_a_b_idx on t (a, b);
to get same plan as in [2].


create statistics t_a_b_stat (mcv) on a, b from t;
create statistics t_b_c_stat (mcv) on b, c from t;
vacuum analyze t;
CREATE TABLE
INSERT 0 1
INSERT 0 1
CREATE INDEX
CREATE STATISTICS
CREATE STATISTICS
VACUUM
alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and c 
= 2;
   QUERY PLAN
--
  Bitmap Heap Scan on t  (cost=156.55..465.57 rows=5001 width=12)
Recheck Cond: (a = 1)
Filter: ((c = 2) AND ((b = 1) OR (b = 2)))
->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..155.29 rows=10001 width=0)
  Index Cond: (a = 1)
(5 rows)


The query plan if v26[0] and v27[1] versions are equal and wrong in my opinion 
-where is c=2 expression?

v27 [1]
alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and c 
= 2;
   QUERY PLAN
--
  Bitmap Heap Scan on t  (cost=165.85..474.87 rows=5001 width=12)
Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
Filter: (c = 2)
->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001 width=0)
  Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
(5 rows)
v26 [0]
alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and c 
= 2;
   QUERY PLAN
--
  Bitmap Heap Scan on t  (cost=165.85..449.86 rows=5001 width=12)
Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
Filter: (c = 2)
->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001 width=0)
  Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
(5 rows)

I think both v26 and v27 are correct here.  The c = 2 condition is in
the Filter.

Yes, I see it and agree with that.

In addition, I noticed that the ANY expression will be formed only for first 
group and ignore for others, like in the sample bellow:

v26 version [0]:

alena@postgres=# explain select * from t where (b = 1 or b = 2) and (a = 2 or 
a=3);
 QUERY PLAN
---
  Index Scan using t_a_b_idx on t  (cost=0.29..24.75 rows=2 width=12)
Index Cond: ((a = ANY ('{2,3}'::integer[])) AND (b = ANY 
('{1,2}'::integer[])))
(2 rows)

v27 version [1]:

alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or a=3);
QUERY PLAN

  Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)
Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
(2 rows)

Did you notice you're running different queries on v26 and v27 here?
If you will run ton v27 the same query you run on v26, the plan also
will be the same.


alena@postgres=# create index a_idx on t(a);
CREATE INDEX
alena@postgres=# create index b_idx on t(b);
CREATE INDEX
alena@postgres=# analyze;
ANALYZE

v26:

alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or a=3);
  QUERY PLAN

  Bitmap Heap Scan on t  (cost=17.18..30.94 rows=4 width=12)
Recheck Cond: ((a = ANY ('{2,3}'::integer[])) OR (a = ANY 
('{2,3}'::integer[])))
->  BitmapOr  (cost=17.18..17.18 rows=4 width=0)
  ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)
Index Cond: (a = ANY ('{2,3}'::integer[]))
  ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)
Index Cond: (a = ANY ('{2,3}'::integer[]))
(7 rows)

v27:

alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or a=3);
QUERY PLAN

  Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)
Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
(2 rows)

The behavior in version 26 is incorrect, but in version 27, it does not select 
anything other than seqscan

Please, check that there is still possibility to the generate BitmapOr plan.
It is fine, I think. The transformation works, but due to the fact that 
index 

Re: POC, WIP: OR-clause support for indexes

2024-07-21 Thread Alexander Korotkov
On Mon, Jul 22, 2024 at 3:52 AM Alexander Korotkov  wrote:
> Please, check that there is still possibility to the generate BitmapOr plan.
>
> # explain select * from t where (b = 1 or b = 2 or a = 2 or a = 3);
>  QUERY PLAN
> 
>  Bitmap Heap Scan on t  (cost=326.16..835.16 rows=14999 width=12)
>Recheck Cond: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
>->  BitmapOr  (cost=326.16..326.16 rows=2 width=0)
>  ->  Bitmap Index Scan on t_b_c_idx  (cost=0.00..151.29
> rows=1 width=0)
>Index Cond: (b = 1)
>  ->  Bitmap Index Scan on t_b_c_idx  (cost=0.00..151.29
> rows=1 width=0)
>Index Cond: (b = 2)
>  ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..4.29 rows=1 width=0)
>Index Cond: (a = 2)
>  ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..4.29 rows=1 width=0)
>Index Cond: (a = 3)

Forgot to mention that I have to
# set enable_seqscan = off;
to get this plan.

--
Regards,
Alexander Korotkov
Supabase




Re: POC, WIP: OR-clause support for indexes

2024-07-21 Thread Alexander Korotkov
Hi, Alena!

Let me answer to some of your findings.

On Mon, Jul 22, 2024 at 12:53 AM Alena Rybakina
 wrote:
> To be honest,I saw a larger problem. Look at the query bellow:
>
> master:
>
> alena@postgres=# create table t (a int not null, b int not null, c int not 
> null);
> insert into t (select 1, 1, i from generate_series(1,1) i);
> insert into t (select i, 2, 2 from generate_series(1,1) i);
> create index t_a_b_idx on t (a, b);

Just a side note.  As I mention in [1], there is missing statement
create index t_a_b_idx on t (a, b);
to get same plan as in [2].

> create statistics t_a_b_stat (mcv) on a, b from t;
> create statistics t_b_c_stat (mcv) on b, c from t;
> vacuum analyze t;
> CREATE TABLE
> INSERT 0 1
> INSERT 0 1
> CREATE INDEX
> CREATE STATISTICS
> CREATE STATISTICS
> VACUUM
> alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and 
> c = 2;
>   QUERY PLAN
> --
>  Bitmap Heap Scan on t  (cost=156.55..465.57 rows=5001 width=12)
>Recheck Cond: (a = 1)
>Filter: ((c = 2) AND ((b = 1) OR (b = 2)))
>->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..155.29 rows=10001 width=0)
>  Index Cond: (a = 1)
> (5 rows)
>
>
> The query plan if v26[0] and v27[1] versions are equal and wrong in my 
> opinion -where is c=2 expression?
>
> v27 [1]
> alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and 
> c = 2;
>   QUERY PLAN
> --
>  Bitmap Heap Scan on t  (cost=165.85..474.87 rows=5001 width=12)
>Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
>Filter: (c = 2)
>->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001 width=0)
>  Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
> (5 rows)
> v26 [0]
> alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 2) and 
> c = 2;
>   QUERY PLAN
> --
>  Bitmap Heap Scan on t  (cost=165.85..449.86 rows=5001 width=12)
>Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
>Filter: (c = 2)
>->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001 width=0)
>  Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
> (5 rows)

I think both v26 and v27 are correct here.  The c = 2 condition is in
the Filter.

> In addition, I noticed that the ANY expression will be formed only for first 
> group and ignore for others, like in the sample bellow:
>
> v26 version [0]:
>
> alena@postgres=# explain select * from t where (b = 1 or b = 2) and (a = 2 or 
> a=3);
> QUERY PLAN
> ---
>  Index Scan using t_a_b_idx on t  (cost=0.29..24.75 rows=2 width=12)
>Index Cond: ((a = ANY ('{2,3}'::integer[])) AND (b = ANY 
> ('{1,2}'::integer[])))
> (2 rows)
>
> v27 version [1]:
>
> alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or 
> a=3);
>QUERY PLAN
> 
>  Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)
>Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
> (2 rows)

Did you notice you're running different queries on v26 and v27 here?
If you will run ton v27 the same query you run on v26, the plan also
will be the same.

> alena@postgres=# create index a_idx on t(a);
> CREATE INDEX
> alena@postgres=# create index b_idx on t(b);
> CREATE INDEX
> alena@postgres=# analyze;
> ANALYZE
>
> v26:
>
> alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or 
> a=3);
>  QUERY PLAN
> 
>  Bitmap Heap Scan on t  (cost=17.18..30.94 rows=4 width=12)
>Recheck Cond: ((a = ANY ('{2,3}'::integer[])) OR (a = ANY 
> ('{2,3}'::integer[])))
>->  BitmapOr  (cost=17.18..17.18 rows=4 width=0)
>  ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)
>Index Cond: (a = ANY ('{2,3}'::integer[]))
>  ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)
>Index Cond: (a = ANY ('{2,3}'::integer[]))
> (7 rows)
>
> v27:
>
> alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 or 
> a=3);
>QUERY PLAN
> 
>  Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)
>Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
> (2 rows)
>
> The behavior in version 26 is incorrect, but in version 27, it does not 
> select anything other than seqscan

Please, check that there is still possibility to the generate 

Re: POC, WIP: OR-clause support for indexes

2024-07-21 Thread Alena Rybakina

Hi! Thank you for your contribution to this thread!

To be honest,I saw a larger problem. Look at the query bellow:

master:

alena@postgres=# create table t (a int not null, b int not null, c int 
not null);

insert into t (select 1, 1, i from generate_series(1,1) i);
insert into t (select i, 2, 2 from generate_series(1,1) i);
create index t_a_b_idx on t (a, b);
create statistics t_a_b_stat (mcv) on a, b from t;
create statistics t_b_c_stat (mcv) on b, c from t;
vacuum analyze t;
CREATE TABLE
INSERT 0 1
INSERT 0 1
CREATE INDEX
CREATE STATISTICS
CREATE STATISTICS
VACUUM
alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 
2) and c = 2;

  QUERY PLAN
--
 Bitmap Heap Scan on t  (cost=156.55..465.57 rows=5001 width=12)
   Recheck Cond: (a = 1)
   Filter: ((c = 2) AND ((b = 1) OR (b = 2)))
   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..155.29 rows=10001 
width=0)

 Index Cond: (a = 1)
(5 rows)


The query plan if v26[0] and v27[1] versions are equal and wrong in my 
opinion -where is c=2 expression?


v27 [1]
alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 
2) and c = 2;

  QUERY PLAN
--
 Bitmap Heap Scan on t  (cost=165.85..474.87 rows=5001 width=12)
   Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
   Filter: (c = 2)
   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001 
width=0)

 Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
(5 rows)
v26 [0]
alena@postgres=# explain select * from t where a = 1 and (b = 1 or b = 
2) and c = 2;

  QUERY PLAN
--
 Bitmap Heap Scan on t  (cost=165.85..449.86 rows=5001 width=12)
   Recheck Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
   Filter: (c = 2)
   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..164.59 rows=10001 
width=0)

 Index Cond: ((a = 1) AND (b = ANY ('{1,2}'::integer[])))
(5 rows)


In addition, I noticed that the ANY expression will be formed only for 
first group and ignore for others, like in the sample bellow:


v26 version [0]:

alena@postgres=# explain select * from t where (b = 1 or b = 2) and (a = 
2 or a=3);

    QUERY PLAN
---
 Index Scan using t_a_b_idx on t  (cost=0.29..24.75 rows=2 width=12)
   Index Cond: ((a = ANY ('{2,3}'::integer[])) AND (b = ANY 
('{1,2}'::integer[])))

(2 rows)

v27 version [1]:

alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 
or a=3);

   QUERY PLAN

 Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)
   Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
(2 rows)


alena@postgres=# create index a_idx on t(a);
CREATE INDEX
alena@postgres=# create index b_idx on t(b);
CREATE INDEX
alena@postgres=# analyze;
ANALYZE

v26:

alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 
or a=3);

 QUERY PLAN

 Bitmap Heap Scan on t  (cost=17.18..30.94 rows=4 width=12)
   Recheck Cond: ((a = ANY ('{2,3}'::integer[])) OR (a = ANY 
('{2,3}'::integer[])))

   ->  BitmapOr  (cost=17.18..17.18 rows=4 width=0)
 ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)
   Index Cond: (a = ANY ('{2,3}'::integer[]))
 ->  Bitmap Index Scan on a_idx  (cost=0.00..8.59 rows=2 width=0)
   Index Cond: (a = ANY ('{2,3}'::integer[]))
(7 rows)

v27:

alena@postgres=# explain select * from t where (b = 1 or b = 2 or a = 2 
or a=3);

   QUERY PLAN

 Seq Scan on t  (cost=0.00..509.00 rows=14999 width=12)
   Filter: ((b = 1) OR (b = 2) OR (a = 2) OR (a = 3))
(2 rows)

The behavior in version 26 is incorrect, but in version 27, it does not 
select anything other than seqscan


Since Thursday I have been trying to add the code forming groups of 
identical "OR" expressions, as in version 26. I'm currently debugging 
errors.


On 21.07.2024 11:17, Nikolay Shaplov wrote:

В письме от среда, 17 июля 2024 г. 22:36:19 MSK пользователь Alexander
Korotkov написал:

Hi All!

I am continue reading the patch, now it's newer version

First main question:

As far a I can get, the entry point for OR->ANY convertation have been moved
to match_clause_to_indexcol funtion, that checks if some restriction can use
index for performance.

The thing I do not understand what match_clause_to_indexcol actually received
as arguments. Should this be set of expressions  with 

Re: POC, WIP: OR-clause support for indexes

2024-07-21 Thread Nikolay Shaplov
В письме от среда, 17 июля 2024 г. 22:36:19 MSK пользователь Alexander 
Korotkov написал:

Hi All!

I am continue reading the patch, now it's newer version

First main question:

As far a I can get, the entry point for OR->ANY convertation have been moved 
to match_clause_to_indexcol funtion, that checks if some restriction can use 
index for performance.

The thing I do not understand what match_clause_to_indexcol actually received 
as arguments. Should this be set of expressions  with OR in between grouped by 
one of the expression argument?

If not I do not understand how this ever should work.

The rest is about code readability

> + if (bms_is_member(index->rel->relid, rinfo->right_relids))
> + return NULL;

This check it totally not obvious for person who is not deep into postgres 
code. There should go comment explaining what are we checking for, and why it 
does not suit our purposes


> + foreach(lc, orclause->args)
> + {
Being no great expert in postgres code, I am confused what are we iterating on 
here? Two arguments of OR statement? (a>1) OR (b>2) those in brackets? Or 
what? Comment explaining that would be a great help here.


> +if (sub_rinfo->is_pushed_down != rinfo->is_pushed_down ||
> + sub_rinfo->is_clone != rinfo->is_clone ||
> + sub_rinfo->security_level != rinfo->security_level ||
> + !bms_equal(sub_rinfo->required_relids, rinfo->required_relids) ||
> + !bms_equal(sub_rinfo->incompatible_relids, rinfo- 
incompatible_relids) ||
> + !bms_equal(sub_rinfo->outer_relids, rinfo->outer_relids))
> + { 

This check it totally mind-blowing... What in the name of existence is going 
on here?

I would suggest  to split these checks into parts (compiler optimizer should 
take care about overhead)  and give each part a sane explanation.

-- 
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su


signature.asc
Description: This is a digitally signed message part.


Re: POC, WIP: OR-clause support for indexes

2024-07-17 Thread Alexander Korotkov
Hi, Alena!

On Wed, Jul 17, 2024 at 3:53 PM Alena Rybakina 
wrote:

> On 17.07.2024 03:03, Alexander Korotkov wrote:
> > Hi, Alena!
> >
> > On Thu, Jul 11, 2024 at 7:17 PM Alena Rybakina
> >  wrote:
> >> I have finished patch and processed almost your suggestions (from [0],
> [1], [2]). It remains only to add tests where the conversion should work,
> but I will add this in the next version.
> >>
> >> [0]
> https://www.postgresql.org/message-id/3381819.e9J7NaK4W3%40thinkpad-pgpro
> >>
> >> [1]
> https://www.postgresql.org/message-id/9736220.CDJkKcVGEf%40thinkpad-pgpro
> >>
> >> [2]
> https://www.postgresql.org/message-id/2193851.QkHrqEjB74%40thinkpad-pgpro
> > I dare making another revision of this patch.  In this version I moved
> > the transformation to match_clause_to_indexcol().  Therefore, this
> > allows to successfully construct index scans with SAOP, but has no
> > degradation in generation of bitmap scans which I observed in [1] and
> > [2].  BTW, I found that my description in [2] lacks of t_b_c_idx index
> > definition. Sorry for that.
> >
> > Given that now we're doing OR-to-ANY transformation solely to match an
> > index we don't need complex analysis of OR-list, which potentially
> > could take quadratic time.  Instead, we're trying to match every OR
> > element to an index and quit immediately on failure.
>
> Yes I see that. I will look at this in detail, but so far I have not
> found any unpleasant side effects indicating that the patch should be
> moved to another place and this is very good)
>
> The only thing that worries me so far is that most likely we will need
> to analyze the changes in rinfo and distribute them to others places
> where links about them are used.
> But I need to look at this in more detail separately before discussing it.
>

I'm not sure if would need to distribute changes of RestrictInfo's, because
we're modifying anything in-place.  Instead we create a new RestrictInfo
for IndexOptInfo.  I think this is what Robert proposed at [1].  The side
effect of this I yet see is redundancy of clauses in [2] test case.

   QUERY PLAN


 Bitmap Heap Scan on t  (cost=19.70..26.93 rows=5001 width=12)
   Recheck Cond: (((b = 1) AND (b = ANY ('{1,2}'::integer[])) AND (c = 2))
OR ((a = 1) AND (b = 2) AND (b = ANY ('{1,2}'::integer[]
   Filter: ((a = 1) AND (c = 2))
   ->  BitmapOr  (cost=19.70..19.70 rows=2 width=0)
 ->  Bitmap Index Scan on t_b_c_idx  (cost=0.00..8.60 rows=1
width=0)
   Index Cond: ((b = 1) AND (b = ANY ('{1,2}'::integer[])) AND
(c = 2))
 ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..8.60 rows=1
width=0)
   Index Cond: ((a = 1) AND (b = 2) AND (b = ANY
('{1,2}'::integer[])))
(8 rows)

You can see that index conds and recheck conds contain both SAOP clauses
and equality clauses.  I this this happens because bitmap scan planning
code doesn't understands equivalency of original and transformed
RestrictInfo's.  I'm not yet sure what to do about this.  We probably need
to teach bitmap scan planning code to understand this equivalency.  Or,
otherwise, just allow this redundancy given that this is quite rare case I
believe.


> Yes, I am ready to agree that there was no degradation in tests [1] and
> [2]. But just in case, I will do a review to rule out any other problems.
> > I'd like to head a feedback on the new place to apply the
> > transformation.  It looks like significant simplification for me and
> > the way to go.
> >
> > Also, I have addressed some of notes by Robert Haas [3].  In v27 we
> > don't use expression evaluation, but directly construct an array
> > constant when possible.  Also we don't transform operator id to string
> > and back, but directly construct SAOP instead.
> >
> > Links.
> > 1.
> https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com
> > 2.
> https://www.postgresql.org/message-id/CAPpHfdtSXxhdv3mLOLjEewGeXJ%2BFtfhjqodn1WWuq5JLsKx48g%40mail.gmail.com
> > 3.
> https://www.postgresql.org/message-id/CA%2BTgmobu0DUFCTF28DuAi975mEc4xYqX3xyt8RA0WbnyrYg%2BFw%40mail.gmail.com
>
> Thanks for your effort and any help is welcome)
>
> Yesterday I finished a big project in my work and now I'm ready to
> continue working on this thread. I'll write the results one of these days.
>

Great, thank you. I would appreciate your further work on this patch.
Apart from general feedback on approach, the last patch requires comments,
code beautification etc.

Links.
1.
https://www.postgresql.org/message-id/CA%2BTgmoarYLO6PL%2BFEnXJ6A-57KsVsotpvHnB771M-wXQOGNy9w%40mail.gmail.com
2.
https://www.postgresql.org/message-id/CAPpHfdtSXxhdv3mLOLjEewGeXJ%2BFtfhjqodn1WWuq5JLsKx48g%40mail.gmail.com

--
Regards,
Alexander Korotkov
Supabase


Re: POC, WIP: OR-clause support for indexes

2024-07-17 Thread Alena Rybakina

Hi! Thanks for your contribution to this topic!

On 17.07.2024 03:03, Alexander Korotkov wrote:

Hi, Alena!

On Thu, Jul 11, 2024 at 7:17 PM Alena Rybakina
 wrote:

I have finished patch and processed almost your suggestions (from [0], [1], 
[2]). It remains only to add tests where the conversion should work, but I will 
add this in the next version.

[0] https://www.postgresql.org/message-id/3381819.e9J7NaK4W3%40thinkpad-pgpro

[1] https://www.postgresql.org/message-id/9736220.CDJkKcVGEf%40thinkpad-pgpro

[2] https://www.postgresql.org/message-id/2193851.QkHrqEjB74%40thinkpad-pgpro

I dare making another revision of this patch.  In this version I moved
the transformation to match_clause_to_indexcol().  Therefore, this
allows to successfully construct index scans with SAOP, but has no
degradation in generation of bitmap scans which I observed in [1] and
[2].  BTW, I found that my description in [2] lacks of t_b_c_idx index
definition. Sorry for that.

Given that now we're doing OR-to-ANY transformation solely to match an
index we don't need complex analysis of OR-list, which potentially
could take quadratic time.  Instead, we're trying to match every OR
element to an index and quit immediately on failure.


Yes I see that. I will look at this in detail, but so far I have not 
found any unpleasant side effects indicating that the patch should be 
moved to another place and this is very good)


The only thing that worries me so far is that most likely we will need 
to analyze the changes in rinfo and distribute them to others places 
where links about them are used.

But I need to look at this in more detail separately before discussing it.

Yes, I am ready to agree that there was no degradation in tests [1] and 
[2]. But just in case, I will do a review to rule out any other problems.

I'd like to head a feedback on the new place to apply the
transformation.  It looks like significant simplification for me and
the way to go.

Also, I have addressed some of notes by Robert Haas [3].  In v27 we
don't use expression evaluation, but directly construct an array
constant when possible.  Also we don't transform operator id to string
and back, but directly construct SAOP instead.

Links.
1. 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAPpHfdtSXxhdv3mLOLjEewGeXJ%2BFtfhjqodn1WWuq5JLsKx48g%40mail.gmail.com
3. 
https://www.postgresql.org/message-id/CA%2BTgmobu0DUFCTF28DuAi975mEc4xYqX3xyt8RA0WbnyrYg%2BFw%40mail.gmail.com


Thanks for your effort and any help is welcome)

Yesterday I finished a big project in my work and now I'm ready to 
continue working on this thread. I'll write the results one of these days.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: POC, WIP: OR-clause support for indexes

2024-07-16 Thread Alexander Korotkov
Hi, Alena!

On Thu, Jul 11, 2024 at 7:17 PM Alena Rybakina
 wrote:
> I have finished patch and processed almost your suggestions (from [0], [1], 
> [2]). It remains only to add tests where the conversion should work, but I 
> will add this in the next version.
>
> [0] https://www.postgresql.org/message-id/3381819.e9J7NaK4W3%40thinkpad-pgpro
>
> [1] https://www.postgresql.org/message-id/9736220.CDJkKcVGEf%40thinkpad-pgpro
>
> [2] https://www.postgresql.org/message-id/2193851.QkHrqEjB74%40thinkpad-pgpro

I dare making another revision of this patch.  In this version I moved
the transformation to match_clause_to_indexcol().  Therefore, this
allows to successfully construct index scans with SAOP, but has no
degradation in generation of bitmap scans which I observed in [1] and
[2].  BTW, I found that my description in [2] lacks of t_b_c_idx index
definition. Sorry for that.

Given that now we're doing OR-to-ANY transformation solely to match an
index we don't need complex analysis of OR-list, which potentially
could take quadratic time.  Instead, we're trying to match every OR
element to an index and quit immediately on failure.

I'd like to head a feedback on the new place to apply the
transformation.  It looks like significant simplification for me and
the way to go.

Also, I have addressed some of notes by Robert Haas [3].  In v27 we
don't use expression evaluation, but directly construct an array
constant when possible.  Also we don't transform operator id to string
and back, but directly construct SAOP instead.

Links.
1. 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAPpHfdtSXxhdv3mLOLjEewGeXJ%2BFtfhjqodn1WWuq5JLsKx48g%40mail.gmail.com
3. 
https://www.postgresql.org/message-id/CA%2BTgmobu0DUFCTF28DuAi975mEc4xYqX3xyt8RA0WbnyrYg%2BFw%40mail.gmail.com

--
Regards,
Alexander Korotkov
Supabase


v27-0001-Transform-OR-clauses-to-ANY-expression.patch
Description: Binary data


Re: POC, WIP: OR-clause support for indexes

2024-07-11 Thread Alena Rybakina
Sorry for repeating, but I have noticed that this message displays 
incorrectly and just in case I'll duplicate it.


On 11.07.2024 19:17, Alena Rybakina wrote:
The errorwascausedby the specificsof storingthe "OR"clausesinthe 
RestrictInfostructure.Scanning the orclauses list of the RestrictInfo 
variable, wecouldfacenotonlytheitem with RestrictInfo 
type,butalsotheBoolExpr type.


For example, when we have both or clauses and "AND" clauses together, 
like x = 1 and (y =1 or y=2 or y=3 and z = 1). The structure looks like:


RestrictInfo->orclauses = [RestrictInfo [x=1],
RestrictInfo->orclauses = [RestrictInfo[y=1],
                    RestrictInfo [y=2],
                    BoolExpr = [Restrictinfo [y=3], RestrictInfo [z=1]
                   ]
                                     ]

It'sworkingfinenow.

The error was caused by the specifics of storing the "OR" clauses in the 
RestrictInfo structure. When viewing the list of or offers, we could 
encounter not only the RestrictInfo type, but also the BoolExpr type. 
It's working fine now.


For example, when we have both or clauses and "AND" clauses together, 
like x = 1 and (y =1 or y=2 or y=3 and z = 1). The structure looks like:


RestrictInfo->orclauses = [RestrictInfo [x=1],
RestrictInfo->orclauses = [RestrictInfo[y=1],
RestrictInfo [y=2],
BoolExpr = [Restrictinfo [y=3], RestrictInfo [z=1]
]
 ]

It's working fine now.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-07-11 Thread Alena Rybakina

Hi, again!

I have finished patch and processed almost your suggestions (from [0], 
[1], [2]). It remainsonlyto addtestswherethe conversionshouldwork,butI 
willaddthis inthe nextversion.


[0] 
https://www.postgresql.org/message-id/3381819.e9J7NaK4W3%40thinkpad-pgpro


[1] 
https://www.postgresql.org/message-id/9736220.CDJkKcVGEf%40thinkpad-pgpro


[2] 
https://www.postgresql.org/message-id/2193851.QkHrqEjB74%40thinkpad-pgpro


On 09.07.2024 04:57, Alena Rybakina wrote:


Hi! Thank you for your review! Sorryforthe delayin responding.

Irewrotethe patchasyourequested,butnowI'm facedwiththe problemof 
processingthe elementsof the or_entries list.For somereason, 
thepointerto thelistis cleared and I couldn't find the place where it 
happened.MaybeI'mmissingsomethingsimpleinviewof the heavyworkloadright 
now,butmaybeyou'll seea problem?Ihave displayedpart of stackbelow.


#5 0x5b0f6d9f6a6a in ExceptionalCondition 
(conditionName=0x5b0f6dbb74f7 "IsPointerList(list)", 
fileName=0x5b0f6dbb7418 "list.c", lineNumber=341) at assert.c:66 #6 
0x5b0f6d5dc3ba in lappend (list=0x5b0f6eec5ca0, 
datum=0x5b0f6eec0d90) at list.c:341 #7 0x5b0f6d69230c in 
transform_or_to_any (root=0x5b0f6eeb13c8, orlist=0x5b0f6eec57c0) at 
initsplan.c:2818 #8 0x5b0f6d692958 in add_base_clause_to_rel 
(root=0x5b0f6eeb13c8, relid=1, restrictinfo=0x5b0f6eec5990) at 
initsplan.c:2982 #9 0x5b0f6d692e5f in 
distribute_restrictinfo_to_rels (root=0x5b0f6eeb13c8, 
restrictinfo=0x5b0f6eec5990) at initsplan.c:3175 #10 
0x5b0f6d691bf2 in distribute_qual_to_rels (root=0x5b0f6eeb13c8, 
clause=0x5b0f6eec0fc0, jtitem=0x5b0f6eec4330, sjinfo=0x0, 
security_level=0, qualscope=0x5b0f6eec4730, ojscope=0x0, 
outerjoin_nonnullable=0x0, incompatible_relids=0x0, 
allow_equivalence=true, has_clone=false, is_clone=false, 
postponed_oj_qual_list=0x0) at initsplan.c:2576 #11 0x5b0f6d69146f 
in distribute_quals_to_rels (root=0x5b0f6eeb13c8, 
clauses=0x5b0f6eec0bb0, jtitem=0x5b0f6eec4330, sjinfo=0x0, 
security_level=0, qualscope=0x5b0f6eec4730, ojscope=0x0, 
outerjoin_nonnullable=0x0, incompatible_relids=0x0, 
allow_equivalence=true, has_clone=false, is_clone=false, 
postponed_oj_qual_list=0x0) at initsplan.c:2144


Thisis stillthe firstiterationof the fixesyouhave proposed,soI have 
attachedthe patchindiffformat.I rewroteit,asyousuggestedinthe 
firstletter[0].Icreateda separatefunctionthattriesto forman 
OrClauseGroup node,butifit failsinthis, it returnsfalse,otherwiseit 
processesthe generatedelementaccordingtowhat it found-eitheraddsit to 
thelistasnew,oraddsa constantto anexistingone.


Ialsodividedonegenerallistof 
suitableforconversionandunsuitableintotwodifferentones:appropriate_entriesandor_entries.Nowweare 
onlylookinginthe listof suitableelementstoformANYexpr.


Thishelpsusto get ridofrepetitionsinthe codeyoumentioned. 
Pleasewriteifthisis notthelogicthatyouhave seenbefore.


[0] 
https://www.postgresql.org/message-id/3381819.e9J7NaK4W3%40thinkpad-pgpro


The errorwascausedby the specificsof storingthe "OR"clausesinthe 
RestrictInfostructure.Scanning the orclauses list of the RestrictInfo 
variable, wecouldfacenotonlytheitem with RestrictInfo 
type,butalsotheBoolExpr type.


For example, when we have both or clauses and "AND" clauses together, 
like x = 1 and (y =1 or y=2 or y=3 and z = 1). The structure looks like:


RestrictInfo->orclauses = [RestrictInfo [x=1],
RestrictInfo->orclauses = [RestrictInfo[y=1],
                    RestrictInfo [y=2],
                    BoolExpr = [Restrictinfo [y=3], RestrictInfo [z=1]
                   ]
                                     ]

It'sworkingfinenow.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
From 26eca98229749b20ad0c82a9fa55f4f37fd34d29 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Thu, 11 Jul 2024 19:01:10 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...])
during adding restrictinfo's to the base relation.

Here Cn is a n-th constant or parameters expression, 'expr' is non-constant
expression, 'op' is an operator which returns boolean result and has a commuter
(for the case of reverse order of constant and non-constant parts of the
expression, like 'Cn op expr').

Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru
Author: Alena Rybakina 
Author: Andrey Lepikhov 
Reviewed-by: Peter Geoghegan 
Reviewed-by: Ranier Vilela 
Reviewed-by: Alexander Korotkov 
Reviewed-by: Robert Haas 
Reviewed-by: Jian He 
Reviewed-by: Tom Lane 
Reviewed-by: Nikolay Shaplov 
---
 src/backend/optimizer/plan/initsplan.c| 335 ++
 src/include/nodes/pathnodes.h |  31 ++
 src/test/regress/expected/create_index.out|  32 +-
 src/test/regress/expected/partition_prune.out |  46 +--
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/tidscan.out |   6 +-
 

Re: POC, WIP: OR-clause support for indexes

2024-07-10 Thread Alena Rybakina

On 27.06.2024 23:06, Alena Rybakina wrote:
Tobe honest,I've alreadystartedwritingcodetodothis,butI'm facedwitha 
misunderstandingof howto correctlycreatea 
conditionfor"OR"expressionsthatare notsubjectto transformation.


For example,the expressions b=1in the query below:

alena@postgres=# explain select * from x where ( (a =5 or a=4) and a 
= ANY(ARRAY[5,4])) or (b=1); QUERY PLAN 
-- 
Seq Scan on x (cost=0.00..123.00 rows=1 width=8) Filter: a = 5) 
OR (a = 4)) AND (a = ANY ('{5,4}'::integer[]))) OR (b = 1)) (2 rows)


I see that two expressions have remained unchanged and it only works 
for "AND" binary operations.


But I think it might be worth applying this together, where does the 
optimizer generate indexes (build_paths_for_OR function)?


Iimplementedsuchcode,butatthe 
analysisstageinplanner,anditwasn'tfullyreadyyet,butIwas ableto 
drawsomeimportantconclusions.Firstof all,Ifacedtheproblemof the 
inequalityof the numberof columnsinthe expressionwiththe 
requiredone,atleastsomeextracolumnappeared,judgingby the 
crust.Ihaven'tfullyrealizedityet andhaven'tfixedit.


#0  __pthread_kill_implementation (no_tid=0, signo=6, 
threadid=134300960061248)

    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=134300960061248) at 
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=134300960061248, signo=signo@entry=6) 
at ./nptl/pthread_kill.c:89
#3  0x7a2560042476 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/posix/raise.c:26

#4  0x7a25600287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x5573f9df62a8 in ExceptionalCondition (
    conditionName=0x5573f9fec4c8 
"AttrNumberIsForUserDefinedAttr(list_attnums[i]) || 
!bms_is_member(attnum, clauses_attnums)", fileName=0x5573f9fec11c 
"dependencies.c", lineNumber=1525) at assert.c:66
#6  0x5573f9b8b85f in dependencies_clauselist_selectivity 
(root=0x5573fad534e8,
    clauses=0x5573fad0b2d8, varRelid=0, jointype=JOIN_INNER, 
sjinfo=0x0, rel=0x5573fad54b38,

    estimatedclauses=0x7ffe2e43f178) at dependencies.c:1525
#7  0x5573f9b8fed9 in statext_clauselist_selectivity 
(root=0x5573fad534e8, clauses=0x5573fad0b2d8,
    varRelid=0, jointype=JOIN_INNER, sjinfo=0x0, rel=0x5573fad54b38, 
estimatedclauses=0x7ffe2e43f178,

    is_or=false) at extended_stats.c:2035
--Type  for more, q to quit, c to continue without paging--
#8  0x5573f9a57f88 in clauselist_selectivity_ext 
(root=0x5573fad534e8, clauses=0x5573fad0b2d8,
    varRelid=0, jointype=JOIN_INNER, sjinfo=0x0, 
use_extended_stats=true) at clausesel.c:153
#9  0x5573f9a57e30 in clauselist_selectivity (root=0x5573fad534e8, 
clauses=0x5573fad0b2d8,

    varRelid=0, jointype=JOIN_INNER, sjinfo=0x0) at clausesel.c:106
#10 0x5573f9a62e03 in set_baserel_size_estimates 
(root=0x5573fad534e8, rel=0x5573fad54b38)

    at costsize.c:5247
#11 0x5573f9a51aa5 in set_plain_rel_size (root=0x5573fad534e8, 
rel=0x5573fad54b38,

    rte=0x5573fad0ec58) at allpaths.c:581
#12 0x5573f9a516ce in set_rel_size (root=0x5573fad534e8, 
rel=0x5573fad54b38, rti=1,

    rte=0x5573fad0ec58) at allpaths.c:411
#13 0x5573f9a514c7 in set_base_rel_sizes (root=0x5573fad534e8) at 
allpaths.c:322
#14 0x5573f9a5119d in make_one_rel (root=0x5573fad534e8, 
joinlist=0x5573fad0adf8) at allpaths.c:183

#15 0x5573f9a94d45 in query_planner (root=0x5573fad534e8,
    qp_callback=0x5573f9a9b59e , 
qp_extra=0x7ffe2e43f540) at planmain.c:280
#16 0x5573f9a977a8 in grouping_planner (root=0x5573fad534e8, 
tuple_fraction=0, setops=0x0)

    at planner.c:1520
#17 0x5573f9a96e47 in subquery_planner (glob=0x5573fad533d8, 
parse=0x5573fad0ea48, parent_root=0x0,

    hasRecursion=false, tuple_fraction=0, setops=0x0) at planner.c:1089
#18 0x5573f9a954aa in standard_planner (parse=0x5573fad0ea48,
    query_string=0x5573fad8b3b0 "explain analyze SELECT * FROM 
functional_dependencies WHERE ((a * 2) = 2 OR (a * 2) = 102) AND 
upper(b) = '1'", cursorOptions=2048, boundParams=0x0) at planner.c:415

#19 0x5573f9a951d4 in planner (parse=0x5573fad0ea48,
--Type  for more, q to quit, c to continue without paging--
    query_string=0x5573fad8b3b0 "explain analyze SELECT * FROM 
functional_dependencies WHERE ((a * 2) = 2 OR (a * 2) = 102) AND 
upper(b) = '1'", cursorOptions=2048, boundParams=0x0) at planner.c:282

#20 0x5573f9bf4e2e in pg_plan_query (querytree=0x5573fad0ea48,
    query_string=0x5573fad8b3b0 "explain analyze SELECT * FROM 
functional_dependencies WHERE ((a * 2) = 2 OR (a * 2) = 102) AND 
upper(b) = '1'", cursorOptions=2048, boundParams=0x0) at postgres.c:904
#21 0x5573f98613e7 in standard_ExplainOneQuery 
(query=0x5573fad0ea48, cursorOptions=2048, into=0x0,

    es=0x5573fad57da0,
    queryString=0x5573fad8b3b0 "explain analyze SELECT * FROM 
functional_dependencies WHERE ((a * 2) = 2 OR (a * 2) = 102) AND 
upper(b) = '1'", params=0x0, queryEnv=0x0) at explain.c:489
#22 

Re: POC, WIP: OR-clause support for indexes

2024-07-08 Thread Alena Rybakina

Hi! Thank you for your review! Sorryforthe delayin responding.

Irewrotethe patchasyourequested,butnowI'm facedwiththe problemof 
processingthe elementsof the or_entries list.For somereason, 
thepointerto thelistis cleared and I couldn't find the place where it 
happened.MaybeI'mmissingsomethingsimpleinviewof the heavyworkloadright 
now,butmaybeyou'll seea problem?Ihave displayedpart of stackbelow.


#5 0x5b0f6d9f6a6a in ExceptionalCondition 
(conditionName=0x5b0f6dbb74f7 "IsPointerList(list)", 
fileName=0x5b0f6dbb7418 "list.c", lineNumber=341) at assert.c:66 #6 
0x5b0f6d5dc3ba in lappend (list=0x5b0f6eec5ca0, 
datum=0x5b0f6eec0d90) at list.c:341 #7 0x5b0f6d69230c in 
transform_or_to_any (root=0x5b0f6eeb13c8, orlist=0x5b0f6eec57c0) at 
initsplan.c:2818 #8 0x5b0f6d692958 in add_base_clause_to_rel 
(root=0x5b0f6eeb13c8, relid=1, restrictinfo=0x5b0f6eec5990) at 
initsplan.c:2982 #9 0x5b0f6d692e5f in 
distribute_restrictinfo_to_rels (root=0x5b0f6eeb13c8, 
restrictinfo=0x5b0f6eec5990) at initsplan.c:3175 #10 0x5b0f6d691bf2 
in distribute_qual_to_rels (root=0x5b0f6eeb13c8, clause=0x5b0f6eec0fc0, 
jtitem=0x5b0f6eec4330, sjinfo=0x0, security_level=0, 
qualscope=0x5b0f6eec4730, ojscope=0x0, outerjoin_nonnullable=0x0, 
incompatible_relids=0x0, allow_equivalence=true, has_clone=false, 
is_clone=false, postponed_oj_qual_list=0x0) at initsplan.c:2576 #11 
0x5b0f6d69146f in distribute_quals_to_rels (root=0x5b0f6eeb13c8, 
clauses=0x5b0f6eec0bb0, jtitem=0x5b0f6eec4330, sjinfo=0x0, 
security_level=0, qualscope=0x5b0f6eec4730, ojscope=0x0, 
outerjoin_nonnullable=0x0, incompatible_relids=0x0, 
allow_equivalence=true, has_clone=false, is_clone=false, 
postponed_oj_qual_list=0x0) at initsplan.c:2144


Thisis stillthe firstiterationof the fixesyouhave proposed,soI have 
attachedthe patchindiffformat.I rewroteit,asyousuggestedinthe 
firstletter[0].Icreateda separatefunctionthattriesto forman 
OrClauseGroup node,butifit failsinthis, it returnsfalse,otherwiseit 
processesthe generatedelementaccordingtowhat it found-eitheraddsit to 
thelistasnew,oraddsa constantto anexistingone.


Ialsodividedonegenerallistof 
suitableforconversionandunsuitableintotwodifferentones:appropriate_entriesandor_entries.Nowweare 
onlylookinginthe listof suitableelementstoformANYexpr.


Thishelpsusto get ridofrepetitionsinthe codeyoumentioned. 
Pleasewriteifthisis notthelogicthatyouhave seenbefore.


[0] 
https://www.postgresql.org/message-id/3381819.e9J7NaK4W3%40thinkpad-pgpro


--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f99..1fdada54e23 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,9 +14,13 @@
  */
 #include "postgres.h"
 
+#include "catalog/namespace.h"
+#include "catalog/pg_operator.h"
 #include "catalog/pg_type.h"
+#include "common/hashfn.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/queryjumble.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
 #include "optimizer/inherit.h"
@@ -29,8 +33,11 @@
 #include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "parser/analyze.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_oper.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
+#include "utils/syscache.h"
 #include "utils/rel.h"
 #include "utils/typcache.h"
 
@@ -38,7 +45,6 @@
 int			from_collapse_limit;
 int			join_collapse_limit;
 
-
 /*
  * deconstruct_jointree requires multiple passes over the join tree, because we
  * need to finish computing JoinDomains before we start distributing quals.
@@ -2617,6 +2623,287 @@ check_redundant_nullability_qual(PlannerInfo *root, Node *clause)
 	return false;
 }
 
+static bool
+try_get_appropriable_group(RestrictInfo *rinfo, List **appropriate_group)
+{
+	OpExpr	   *orqual;
+	Node	   *const_expr;
+	Node	   *nconst_expr;
+	Oid			opno;
+	Oid			consttype;
+	Node	   *leftop,
+*rightop;
+	ListCell   *lc2;
+	bool found = false;
+	OrClauseGroup *entry;
+
+	if (!IsA(rinfo, RestrictInfo) || !IsA(rinfo->clause, OpExpr))
+	{
+		return false;
+	}
+
+	orqual = (OpExpr *) rinfo->clause;
+	opno = orqual->opno;
+	if (get_op_rettype(opno) != BOOLOID)
+	{
+		/* Only operator returning boolean suits OR -> ANY transformation */
+		return false;
+	}
+
+	/*
+		* Detect the constant side of the clause. Recall non-constant
+		* expression can be made not only with Vars, but also with Params,
+		* which is not bonded with any relation. Thus, we detect the const
+		* side - if another side is constant too, the orqual couldn't be an
+		* OpExpr.  Get pointers to constant and expression sides of the qual.
+		*/
+	leftop = get_leftop(orqual);
+	if (IsA(leftop, RelabelType))
+		leftop = (Node *) ((RelabelType *) leftop)->arg;
+
+	rightop = get_rightop(orqual);
+	if (IsA(rightop, RelabelType))
+		

Re: POC, WIP: OR-clause support for indexes

2024-06-27 Thread Alena Rybakina
Tobe honest,I've alreadystartedwritingcodetodothis,butI'm facedwitha 
misunderstandingof howto correctlycreatea 
conditionfor"OR"expressionsthatare notsubjectto transformation.


For example,the expressions b=1in the query below:

alena@postgres=# explain select * from x where ( (a =5 or a=4) and a = 
ANY(ARRAY[5,4])) or (b=1); QUERY PLAN 
-- 
Seq Scan on x (cost=0.00..123.00 rows=1 width=8) Filter: a = 5) OR 
(a = 4)) AND (a = ANY ('{5,4}'::integer[]))) OR (b = 1)) (2 rows)


I see that two expressions have remained unchanged and it only works 
for "AND" binary operations.


But I think it might be worth applying this together, where does the 
optimizer generate indexes (build_paths_for_OR function)?




Sorry, it works) I needed to create one more index for b column.

Just in case, I gave an example of a complete case, otherwise it might 
not be entirely clear:


alena@postgres=# create table x (a int, b int);
CREATE TABLE
alena@postgres=# create index a_idx on x(a);
    insert into x select id,id from 
generate_series(1, 5000) as id;

CREATE INDEX
INSERT 0 5000
alena@postgres=# analyze;
ANALYZE

alena@postgres=# explain select * from x where ( (a =5 or a=4) and a = 
ANY(ARRAY[5,4])) or (b=1); QUERY PLAN 
-- 
Seq Scan on x (cost=0.00..123.00 rows=1 width=8) Filter: a = 5) OR 
(a = 4)) AND (a = ANY ('{5,4}'::integer[]))) OR (b = 1)) (2 rows)


alena@postgres=# create index b_idx on x(b);

CREATE INDEX

alena@postgres=# explain select * from x where ( (a =5 or a=4) and a = 
ANY(ARRAY[5,4]))  or (b=1);

    QUERY PLAN
--
 Bitmap Heap Scan on x  (cost=12.87..21.68 rows=1 width=8)
   Recheck Cond: ((a = ANY ('{5,4}'::integer[])) OR (b = 1))
   ->  BitmapOr  (cost=12.87..12.87 rows=3 width=0)
 ->  Bitmap Index Scan on a_idx  (cost=0.00..8.58 rows=2 width=0)
   Index Cond: (a = ANY ('{5,4}'::integer[]))
 ->  Bitmap Index Scan on b_idx  (cost=0.00..4.29 rows=1 width=0)
   Index Cond: (b = 1)
(7 rows)

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-06-27 Thread Alena Rybakina

On 26.06.2024 23:19, Robert Haas wrote:

I think maybe replying to multiple emails with a single email is
something you'd be better off doing less often.
Ok, I won't do this in the future. After thinkingit over,I 
realizedthatit turnedout to be somekindof messinthe end.

On Tue, Jun 25, 2024 at 7:14 PM Alena Rybakina
  wrote:

Sorry, you are right and I'll try to explain more precisely. The first approach is the 
first part of the patch, where we made "Or" expressions into an SAOP at an 
early stage of plan generation [0], the second one was the one proposed by A.Korotkov [1].

[0] isn't doing anything "at an early stage of plan generation".  It's
changing something in *the parser*. The parser and planner are VERY
different stages of query parsing, and it's really important to keep
them separate mentally and in discussions.


Thanks for the detailed explanation, I'm always glad to learn new things 
for myself)


To be honest, I had an intuitive feeling that the transformation was 
called in the analyzer stage, but I wasn't sure about it, so I tried to 
summarize it.


As for the fact that in general all this can be divided into two large 
stages, parsing and planner is a little new to me. I have reread the 
documentation [0] andI foundinformationaboutitthere.


Beforethat, Iwas guidedbyinformationfromthe 
CarnegieMellonUniversitylecture andthe BruceMamjian report[1],whichwas 
wrongonmypart.


By the way,it turnsout that the queryrewritingstagereferstoan 
independentstage,whichis locatedbetweenthe 
parserstageandtheplanner/optimizer. I found it from the documentation [2].


[0] https://www.postgresql.org/docs/current/planner-optimizer.html

[1] https://momjian.us/main/writings/pgsql/optimizer.pdf

[2] https://www.postgresql.org/docs/16/rule-system.html


We should not be changing
anything about the query in the parser, because that will, as
Alexander also pointed out, change what gets stored if the user does
something like CREATE VIEW whatever AS SELECT ...; and we should, for
the most part, be storing the query as the user entered it, not some
transformed version of it. Further, at the parser stage, we do not
know the cost of anything, so we can only transform things when the
transformed version is always - or practically always - going to be
cheaper than the untransformed version.


Thank you, now it has become clear to me why it is so important to leave 
the transformation at the planner stage.



On 24.06.2024 18:28, Robert Haas wrote:
Andrei mentioned the problem, which might be caused by the transformation on 
the later stage of optimization is updating references to expressions in 
RestrictInfo [3] lists, because they can be used in different parts during the 
formation of the query plan. As the practice of Self-join removal [4] has 
shown, this can be expensive, but feasible. By applying the transformation at 
the analysis stage [0], because no links were created, so we did not encounter 
such problems, so this approach was more suitable than the others.

The link you provided for [3] doesn't show me exactly what code you're
talking about, but I can see why mutating a RestrictInfo after
creating it could be problematic. However, I'm not proposing that, and
I don't think it's a good idea. Instead of mutating an existing data
structure after it's been created, we want to get each data structure
correct at the moment that it is created. What that means is that at
each stage of processing, whenever we create a new in-memory data
structure, we have an opportunity to make changes along the way.

For instance, let's say we have a RestrictInfo and we are creating a
Path, perhaps via create_index_path(). One argument to that function
is a list of indexclauses. The indexclauses are derived from the
RestrictInfo list associated with the RelOptInfo. We take some subset
of those quals that are deemed to be indexable and we reorder them and
maybe change a few things and we build this new list of indexclauses
that is then passed to create_index_path(). The RelOptInfo's list of
RestrictInfos is not changed -- only the new list of clauses derived
from it is being built up here, without any mutation of the original
structure.

This is the kind of thing that this patch can and probably should do.
Join removal is quite awkward, as you rightly point out, because we
end up modifying existing data structures after they've been created,
and that requires us to run around and fix up a bunch of stuff, and
that can have bugs. Whenever possible, we don't want to do it that
way. Instead, we want to pick points in the processing when we're
anyway constructing some new structure and use that as an opportunity
to do transformations when building the new structure that incorporate
optimizations that make sense.


Thanks for the idea! I hadn't thought in this direction before, but it 
really might just work and solve all our original problems.
By the way, I saw that the optimizer is smart enough to eliminate 
duplicates. Below I 

Re: POC, WIP: OR-clause support for indexes

2024-06-26 Thread Robert Haas
I think maybe replying to multiple emails with a single email is
something you'd be better off doing less often.

On Tue, Jun 25, 2024 at 7:14 PM Alena Rybakina
 wrote:
> Sorry, you are right and I'll try to explain more precisely. The first 
> approach is the first part of the patch, where we made "Or" expressions into 
> an SAOP at an early stage of plan generation [0], the second one was the one 
> proposed by A.Korotkov [1].

[0] isn't doing anything "at an early stage of plan generation".  It's
changing something in *the parser*. The parser and planner are VERY
different stages of query parsing, and it's really important to keep
them separate mentally and in discussions. We should not be changing
anything about the query in the parser, because that will, as
Alexander also pointed out, change what gets stored if the user does
something like CREATE VIEW whatever AS SELECT ...; and we should, for
the most part, be storing the query as the user entered it, not some
transformed version of it. Further, at the parser stage, we do not
know the cost of anything, so we can only transform things when the
transformed version is always - or practically always - going to be
cheaper than the untransformed version.

> On 24.06.2024 18:28, Robert Haas wrote:
> Andrei mentioned the problem, which might be caused by the transformation on 
> the later stage of optimization is updating references to expressions in 
> RestrictInfo [3] lists, because they can be used in different parts during 
> the formation of the query plan. As the practice of Self-join removal [4] has 
> shown, this can be expensive, but feasible. By applying the transformation at 
> the analysis stage [0], because no links were created, so we did not 
> encounter such problems, so this approach was more suitable than the others.

The link you provided for [3] doesn't show me exactly what code you're
talking about, but I can see why mutating a RestrictInfo after
creating it could be problematic. However, I'm not proposing that, and
I don't think it's a good idea. Instead of mutating an existing data
structure after it's been created, we want to get each data structure
correct at the moment that it is created. What that means is that at
each stage of processing, whenever we create a new in-memory data
structure, we have an opportunity to make changes along the way.

For instance, let's say we have a RestrictInfo and we are creating a
Path, perhaps via create_index_path(). One argument to that function
is a list of indexclauses. The indexclauses are derived from the
RestrictInfo list associated with the RelOptInfo. We take some subset
of those quals that are deemed to be indexable and we reorder them and
maybe change a few things and we build this new list of indexclauses
that is then passed to create_index_path(). The RelOptInfo's list of
RestrictInfos is not changed -- only the new list of clauses derived
from it is being built up here, without any mutation of the original
structure.

This is the kind of thing that this patch can and probably should do.
Join removal is quite awkward, as you rightly point out, because we
end up modifying existing data structures after they've been created,
and that requires us to run around and fix up a bunch of stuff, and
that can have bugs. Whenever possible, we don't want to do it that
way. Instead, we want to pick points in the processing when we're
anyway constructing some new structure and use that as an opportunity
to do transformations when building the new structure that incorporate
optimizations that make sense.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: POC, WIP: OR-clause support for indexes

2024-06-26 Thread Nikolay Shaplov
В письме от понедельник, 24 июня 2024 г. 23:51:56 MSK пользователь Nikolay 
Shaplov написал:

So,  I continue reading the patch.

I see there is `entries` variable in the code, that is the list of 
`RestrictInfo` objects and `entry` that is `OrClauseGroup` object.

This naming is quite misguiding (at least for me).

 `entries` variable name can be used, as we deal only with RestrictInfo 
entries here. It is kind of "generic" type. Though naming it 
`restric_info_entry` might make te code more readable.

But when we come to an `entry` variable, it is very specific entry, it should 
be `OrClauseGroup` entry, not just any entry. So I would suggest to name this 
variable `or_clause_group_entry`, or even `or_clause_group` , so when we meet 
this variable in the middle of the code, we can get the idea what we are 
dealing with, without scrolling code up.

Variable naming is very important thing. It can drastically improve (or ruin) 
code readability.



Also I see some changes in the tests int this patch. There are should be tests 
that check that this new feature works well. And there are test whose behavior 
have been just accidentally affected.

I whould suggest to split these tests into two patches, as they should be 
reviewed in different ways. Functionality tests should be thoroughly checked 
that all stuff we added is properly tested, and affected tests should be 
checked 
that nothing important is not broken. It would be more easy to check if these 
are two different patches.

I would also suggest to add to the commit message of affected tests changes 
some explanation why this changes does not really breaks anything. This will 
do the checking more simple.

To be continued.

-- 
Nikolay Shaplov aka Nataraj
Fuzzing Engineer at Postgres Professional
Matrix IM: @dhyan:nataraj.su


signature.asc
Description: This is a digitally signed message part.


Re: POC, WIP: OR-clause support for indexes

2024-06-25 Thread Alena Rybakina

On 24.06.2024 18:28, Robert Haas wrote:

On Fri, Jun 21, 2024 at 6:52 PM Alena Rybakina
  wrote:

It's hard to tell, but I think it might be one of the good places to apply 
transformation. Let me describe a brief conclusion on the two approaches.

This explanation is somewhat difficult for me to follow. For example:


In the first approach, we definitely did not process the extra "OR" expressions 
in the first approach, since they were packaged as an Array. It could lead to the fact 
that less planning time would be spent on the optimizer.

I don't know what the "first approach" refers to, or what processing
the extra "OR" expressions means, or what it would mean to package OR
expressions as an array. If you made them into an SAOP then you'd have
an array*instead of*  OR expressions, not OR expressions "packaged as
an array" but even then, they'd still be processed somewhere, unless
the patch was just wrong.

I think you should try writing this summary again and see if you can
make it a lot clearer and more precise.

I'm suspicious based that we should actually be postponing the
transformation even further. If, for example, the transformation is
advantageous for index scans and disadvantageous for bitmap scans, or
the other way around, then this approach can't help much: it either
does the transformation and all scan types are affected, or it doesn't
do it and no scan types are affected. But if you decided for each scan
whether to transform the quals, then you could handle that. Against
that, there might be increased planning cost. But, perhaps that could
be avoided somehow.


Sorry, you are right and I'll try to explain more precisely. The 
firstapproach isthefirstpartof the patch,wherewemade "Or" expressions 
into an SAOPatan earlystageof plangeneration[0],the secondonewasthe one 
proposedby A.Korotkov[1].


So, when we made "OR" expressions into an SAOPat the post-parsing stage 
of the plan generation [0], we definitely did not process the 
redundantexpressions"OR" expressions there (for example,duplicates), 
since they were transformed to SAOP expression. Furthermore, the list of 
OR expressions can be significantly reduced, since constants belonging 
to the same predicate will already be converted into an SAOP expression. 
I assume this may reduce planning time, as I know several places in the 
optimizer where these lists of "OR" expressions are scanned several times.
Also the selectivity for SAOP expressions is estimated better, which 
could lead to the generation of a more optimal plan, but, to be honest, 
this is just an observation from changes in regression tests and, in 
general, how the process of calculating the selectivity of a complex 
expression works. And I think it needs further consideration. SELECT * 
FROM check_estimated_rows('SELECT * FROM functional_dependencies WHERE 
(a = 1 OR a = 51) AND b = ''1'''); estimated | actual 
---+ - 99 | 100 + 100 | 100 (1 row) SELECT * FROM 
check_estimated_rows('SELECT * FROM functional_dependencies WHERE (a = 1 
OR a = 51) AND (b = ''1'' OR b = ''2'')'); estimated | actual 
---+ - 99 | 100 + 100 | 100 (1 row) SELECT * FROM 
check_estimated_rows('SELECT * FROM functional_dependencies WHERE (a = 1 
OR a = 2 OR a = 51 OR a = 52) AND (b = ''1'' OR b = ''2'')'); estimated 
| actual ---+ - 197 | 200 + 200 | 200 Speaking of the 
main disadvantages, we do not give the optimizer the opportunity to 
generate a plan using BitmapScan, which can lead to the generation of a 
suboptimal plan, but in the current approach the same thing happens [2]. 
And you mentioned about it before:


On 24.06.2024 18:28, Robert Haas wrote:

I'm suspicious based that we should actually be postponing the
transformation even further. If, for example, the transformation is
advantageous for index scans and disadvantageous for bitmap scans, or
the other way around, then this approach can't help much: it either
does the transformation and all scan types are affected, or it doesn't
do it and no scan types are affected. But if you decided for each scan
whether to transform the quals, then you could handle that. Against
that, there might be increased planning cost. But, perhaps that could
be avoided somehow.


Andrei mentioned the problem, which might be caused by the 
transformation on the later stage of optimization is updating references 
to expressions in RestrictInfo [3] lists, because they can be used in 
different parts during the formation of the query plan. As the practice 
of Self-join removal [4] has shown, this can be expensive, but feasible. 
By applying the transformation at the analysis stage [0], because no 
links were created, so we did not encounter such problems, so this 
approach was more suitable than the others.


If some things were not clear enough, let me know.

[0] 
https://www.postgresql.org/message-id/attachment/156971/v21-0001-Transform-OR-clauses-to-ANY-expression.patch 
[1] 

Re: POC, WIP: OR-clause support for indexes

2024-06-24 Thread Nikolay Shaplov
Hi!

Let me join the review process.

I am no expert in execution plans, so there would not be much help in doing 
even better optimization. But I can read the code, as a person who is not 
familiar 
with this area and help making it clear even to a person like me.

So, I am reading v25-0001-Transform-OR-clauses-to-ANY-expression.patch that 
have been posted some time ago, and especially transform_or_to_any function.

> @@ -38,7 +45,6 @@
>  int  from_collapse_limit;
>  int  join_collapse_limit;
>
> - 
>  /*
>   * deconstruct_jointree requires multiple passes over the join tree,  
because we 
>   * need to finish computing JoinDomains before we start  distributing quals.

Do not think that removing empty line should be part of the patch

> + /*
> +  * If the const node's (right side of operator 
expression) type
> +  * don't have “true” array type, then we cannnot 
do the
> +  * transformation. We simply concatenate the 
expression node.
> +  */
Guess using unicode double quotes is not the best idea here... 
   
Now to the first part of  `transform_or_to_any`:

> + List   *entries = NIL;
I guess the idea of entries should be explained from the start. What kind of 
entries are accomulated there... I see they are added there all around the 
code, but what is the purpose of that is not quite clear when you read it.

At the first part of `transform_or_to_any` function, you costanly repeat two 
lines, like a mantra:

> + entries = lappend(entries, rinfo);
> + continue;

"If something is wrong -- do that mantra"

From my perspective, if you have to repeat same code again and again, then 
most probably you have some issues with architecture of the code. If you 
repeat some code again and again, you need to try to rewrite the code, the 
way, that part is repeated only once.

In that case I would try to move the most of the first loop of 
`transform_or_to_any`  to a separate function (let's say its name is 
prepare_single_or), that will do all the checks, if this or is good for us; 
return NULL if it does not suits our purposes (and in this case we do "entries 
= lappend(entries, rinfo); continue" in the main code, but only once) or 
return pointer to some useful data if this or clause is good for our purposes. 

This, I guess will make that part more clear and easy to read, without 
repeating same "lappend mantra" again and again.

Will continue digging into the code tomorrow.

P.S. Sorry for sending partly finished email. Pressed Ctrl+Enter 
accidentally... With no way to make it back :-((( 

signature.asc
Description: This is a digitally signed message part.


Re: POC, WIP: OR-clause support for indexes

2024-06-24 Thread Nikolay Shaplov
Hi!

Let me join the review process.

I am no expert in execution plans, so there would not be much help in doing 
even better optimization. But I can read the code, as a person who is not 
familiar 
with this area and help making it clear even to a person like me.

So, I am reading v25-0001-Transform-OR-clauses-to-ANY-expression.patch that 
have been posted some time ago, and especially transform_or_to_any function.

> @@ -38,7 +45,6 @@
>  int  from_collapse_limit;
>  int  join_collapse_limit;
>
> - 
>  /*
>   * deconstruct_jointree requires multiple passes over the join tree,  
> because we 
>   * need to finish computing JoinDomains before we start  distributing quals.

Do not think that removing empty line should be part of the patch

> + /*
> +  * If the const node's (right side of operator 
> expression) type
> +  * don't have “true” array type, then we cannnot do 
> the
> +  * transformation. We simply concatenate the expression 
> node.
> +  */
Guess using unicode double quotes is not the best idea here... 
   
Now to the first part of  `transform_or_to_any`


signature.asc
Description: This is a digitally signed message part.


Re: POC, WIP: OR-clause support for indexes

2024-06-24 Thread Peter Geoghegan
On Mon, Jun 24, 2024 at 2:28 PM Robert Haas  wrote:
> On Mon, Jun 24, 2024 at 1:47 PM Peter Geoghegan  wrote:
> > I agree, with the proviso that "avoid gratuitous failures" should
> > include cases where a query that got the optimization suddenly fails
> > to get the optimization, due only to some very innocuous looking
> > change. Such as a change from using a constant 1_000_000_000 to a
> > constant 5_000_000_000 in the query text. That is a POLA violation.
>
> Nope, I don't agree with that at all. If you imagine that we can
> either have the optimization apply to one of those cases on the other,
> or on the other hand we can have some cases that outright fail, I
> think it's entirely clear that the former is better.

I'm just saying that not having the optimization apply to a query very
similar to one where it does apply is a POLA violation. That's another
kind of failure, for all practical purposes. Weird performance cliffs
like that are bad. It's very easy to imagine code that generates a
query text, that at some point randomly and mysteriously gets a
sequential scan. Or a much less efficient index scan.

> I was assuming this patch shouldn't be changing the way indexes work
> at all, just making use of the facilities that we have today. More
> could be done, but that might make it harder to get anything
> committed.

I was just pointing out that there is currently no good way to make
nbtree efficiently execute a qual "WHERE a = 5 OR a IS NULL", which is
almost entirely (though not quite entirely) due to a lack of any way
of expressing that idea through SQL, in a way that'll get pushed down
to the index scan node. You can write "WHERE a = any('{5,NULL')", of
course, but that doesn't treat NULL as just another array element to
match against via an IS NULL qual (due to NULL semantics).

Yeah, this is nonessential. But it's quite a nice optimization, and
seems entirely doable within the framework of the patch. It would be a
natural follow-up.

All that I'd need on the nbtree side is to get an input scan key that
directly embodies "WHERE mycol = 5 OR mycol IS NULL". That would
probably just be a scan key with sk_flags "SK_SEARCHARRAY |
SK_SEARCHNULL", that was otherwise identical to the current
SK_SEARCHARRAY scan keys.

Adopting the nbtree array index scan code to work with this would be
straightforward. SK_SEARCHNULL scan keys basically already work like
regular equality scan keys at execution time, so all that this
optimization requires on the nbtree side is teaching
_bt_advance_array_keys to treat NULL as a distinct array condition
(evaluated as IS NULL, not as = NULL).

> It's even possible, in my mind at least, that the patch is already
> doing exactly the right things here. Even if it isn't, the problem
> doesn't seem to be fundamental, because if this example can work (and
> it does) then what the patch is trying to do should be workable, too.
> We just have to make sure we're plugging all the pieces properly
> together, and that we have comments adequately explain what is
> happening and test cases that verify it. My feeling is that the patch
> doesn't meet that standard today, but I think that just means it needs
> some more work. I'm not arguing we have to throw the whole thing out,
> or invent a lot of new infrastructure, or anything like that.

I feel like my point about the potential for POLA violations is pretty
much just common sense. I'm not particular about the exact mechanism
by which we avoid it; only that we avoid it.


--
Peter Geoghegan




Re: POC, WIP: OR-clause support for indexes

2024-06-24 Thread Robert Haas
On Mon, Jun 24, 2024 at 1:47 PM Peter Geoghegan  wrote:
> I agree, with the proviso that "avoid gratuitous failures" should
> include cases where a query that got the optimization suddenly fails
> to get the optimization, due only to some very innocuous looking
> change. Such as a change from using a constant 1_000_000_000 to a
> constant 5_000_000_000 in the query text. That is a POLA violation.

Nope, I don't agree with that at all. If you imagine that we can
either have the optimization apply to one of those cases on the other,
or on the other hand we can have some cases that outright fail, I
think it's entirely clear that the former is better.

> Maybe it doesn't. My point was only that the B-Tree code doesn't
> necessarily need to use just one rhs type for the same column input
> opclass. The definition of SOAP works (or could work) in basically the
> same way, provided the "OR condition" were provably disjunct. We could
> for example mix different operators for the same nbtree scan key (with
> some work in nbtutils.c), just as we could support "where mycol =5 OR
> mycol IS NULL" with much effort.
>
> BTW, did you know MySQL has long supported the latter? It has a <=>
> operator, which is basically a non-standard spelling of IS NOT
> DISTINCT FROM. Importantly, it is indexable, whereas right now
> Postgres doesn't support indexing IS NOT DISTINCT FROM. If you're
> interested in working on this problem within the scope of this patch,
> or some follow-up patch, I can take care of the nbtree side of things.

I was assuming this patch shouldn't be changing the way indexes work
at all, just making use of the facilities that we have today. More
could be done, but that might make it harder to get anything
committed.

Before we get too deep into arguing about hypotheticals, I don't think
there's any problem here that we can't solve with the infrastructure
we already have. For instance, consider this:

robert.haas=# explain select * from foo where a in (1, 1000);
QUERY PLAN
---
 Seq Scan on foo1 foo  (cost=0.00..25.88 rows=13 width=36)
   Filter: (a = ANY ('{1,1000}'::bigint[]))
(2 rows)

I don't know exactly what's happening here, but it seems very similar
to what we need to have happen for this patch to work. pg_typeof(1) is
integer, and pg_typeof(1000) is bigint, and we're able to
figure out that it's OK to put both of those in an array of a single
type and without having any type conversion failures. If you replace
1000 with 2, then the array ends up being of type
integer[] rather than type bigint[], so. clearly the system is able to
reason its way through these kinds of scenarios already.

It's even possible, in my mind at least, that the patch is already
doing exactly the right things here. Even if it isn't, the problem
doesn't seem to be fundamental, because if this example can work (and
it does) then what the patch is trying to do should be workable, too.
We just have to make sure we're plugging all the pieces properly
together, and that we have comments adequately explain what is
happening and test cases that verify it. My feeling is that the patch
doesn't meet that standard today, but I think that just means it needs
some more work. I'm not arguing we have to throw the whole thing out,
or invent a lot of new infrastructure, or anything like that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: POC, WIP: OR-clause support for indexes

2024-06-24 Thread Peter Geoghegan
On Mon, Jun 24, 2024 at 1:46 PM Peter Geoghegan  wrote:
> BTW, did you know MySQL has long supported the latter? It has a <=>
> operator, which is basically a non-standard spelling of IS NOT
> DISTINCT FROM. Importantly, it is indexable, whereas right now
> Postgres doesn't support indexing IS NOT DISTINCT FROM. If you're
> interested in working on this problem within the scope of this patch,
> or some follow-up patch, I can take care of the nbtree side of things.

To be clear, I meant that we could easily support "where mycol = 5 OR
mycol IS NULL" and have nbtree handle that efficiently, by making it a
SAOP internally. Separately, we could also make IS NOT DISTINCT FROM
indexable, though that probably wouldn't need any work in nbtree.

-- 
Peter Geoghegan




Re: POC, WIP: OR-clause support for indexes

2024-06-24 Thread Peter Geoghegan
On Mon, Jun 24, 2024 at 1:29 PM Robert Haas  wrote:
> I am not against handling this kind of case if we can do it, but it's
> more important that the patch doesn't cause gratuitous failures than
> that it handles more cases.

I agree, with the proviso that "avoid gratuitous failures" should
include cases where a query that got the optimization suddenly fails
to get the optimization, due only to some very innocuous looking
change. Such as a change from using a constant 1_000_000_000 to a
constant 5_000_000_000 in the query text. That is a POLA violation.

> > Maybe it would be practical to do something with the B-Tree operator
> > class for each of the types involved in the optimization. You could
> > probably find a way for a SAOP to work against a
> > "heterogeneously-typed array" while still getting B-Tree index scans
> > -- provided the types all came from the same operator family. I'm
> > assuming that the index has an index column whose input opclass was a
> > member of that same family. That would necessitate changing the
> > general definition of SAOP, and adding new code to nbtree that worked
> > with that. But that seems doable.
>
> I agree that something based on operator families might be viable. Why
> would that require changing the definition of an SAOP?

Maybe it doesn't. My point was only that the B-Tree code doesn't
necessarily need to use just one rhs type for the same column input
opclass. The definition of SOAP works (or could work) in basically the
same way, provided the "OR condition" were provably disjunct. We could
for example mix different operators for the same nbtree scan key (with
some work in nbtutils.c), just as we could support "where mycol =5 OR
mycol IS NULL" with much effort.

BTW, did you know MySQL has long supported the latter? It has a <=>
operator, which is basically a non-standard spelling of IS NOT
DISTINCT FROM. Importantly, it is indexable, whereas right now
Postgres doesn't support indexing IS NOT DISTINCT FROM. If you're
interested in working on this problem within the scope of this patch,
or some follow-up patch, I can take care of the nbtree side of things.

> > Admittedly I'm glossing over a lot of important details here. Does it
> > just work for the default opclass for the type, or can we expect it to
> > work with a non-default opclass when that's the salient opclass (the
> > one used by our index)? I don't know what you'd do about stuff like
> > that.
>
> It seems to me that it just depends on the opclasses in the query. If
> the user says
>
> WHERE column op1 const1 AND column op2 const2
>
> ...then if op1 and op2 are in the same operator family and if we can
> convert one of const1 and const2 to the type of the other without risk
> of failure, then we can rewrite this as an SAOP with whichever of the
> two operators pertains to the target type, e.g.
>
> column1 op1 ANY[const1,converted_const2]
>
> I don't think the default opclass matters here, or the index properties 
> either.

Okay, good.

The docs do say "Another requirement for a multiple-data-type family
is that any implicit or binary-coercion casts that are defined between
data types included in the operator family must not change the
associated sort ordering" [1]. There must be precedent for this sort
of thing. Probably for merge joins.

[1] https://www.postgresql.org/docs/devel/btree.html#BTREE-BEHAVIOR
-- 
Peter Geoghegan




Re: POC, WIP: OR-clause support for indexes

2024-06-24 Thread Robert Haas
On Mon, Jun 24, 2024 at 12:09 PM Peter Geoghegan  wrote:
> But what about cases like this:
>
> SELECT * FROM mytable WHERE columna = 1_000_000_000 or columna =
> 5_000_000_000; -- columna is int4
>
> This is using two types, of course. 1_000_000_000 is int4, while
> 5_000_000_000 is bigint. If the transformation suddenly failed to work
> when a constant above INT_MAX was used for the first time, then I'd
> say that that's pretty surprising. That's what happens currently if
> you write the same query as "WHERE columna =
> any('{1_000_000_000,5_000_000_000}')", due to the way the coercion
> works. That seems less surprising to me, because the user is required
> to construct their own array, and users expect arrays to always have
> one element type.

I am not against handling this kind of case if we can do it, but it's
more important that the patch doesn't cause gratuitous failures than
that it handles more cases.

> Maybe it would be practical to do something with the B-Tree operator
> class for each of the types involved in the optimization. You could
> probably find a way for a SAOP to work against a
> "heterogeneously-typed array" while still getting B-Tree index scans
> -- provided the types all came from the same operator family. I'm
> assuming that the index has an index column whose input opclass was a
> member of that same family. That would necessitate changing the
> general definition of SAOP, and adding new code to nbtree that worked
> with that. But that seems doable.

I agree that something based on operator families might be viable. Why
would that require changing the definition of an SAOP?

> Admittedly I'm glossing over a lot of important details here. Does it
> just work for the default opclass for the type, or can we expect it to
> work with a non-default opclass when that's the salient opclass (the
> one used by our index)? I don't know what you'd do about stuff like
> that.

It seems to me that it just depends on the opclasses in the query. If
the user says

WHERE column op1 const1 AND column op2 const2

...then if op1 and op2 are in the same operator family and if we can
convert one of const1 and const2 to the type of the other without risk
of failure, then we can rewrite this as an SAOP with whichever of the
two operators pertains to the target type, e.g.

column1 op1 ANY[const1,converted_const2]

I don't think the default opclass matters here, or the index properties either.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: POC, WIP: OR-clause support for indexes

2024-06-24 Thread Peter Geoghegan
On Mon, Jun 24, 2024 at 11:28 AM Robert Haas  wrote:
> > It needs to transform all similar constants to one type, because some 
> > constants of "OR" expressions can belong others, like the numeric and int 
> > types. Due to the fact that array structure demands that all types must be 
> > belonged to one type, so for this reason we applied this procedure.
>
> The alternative that should be considered is not combining things if
> the types don't match. If we're going to combine such things, we need
> to be absolutely certain that type conversion cannot fail.

But what about cases like this:

SELECT * FROM mytable WHERE columna = 1_000_000_000 or columna =
5_000_000_000; -- columna is int4

This is using two types, of course. 1_000_000_000 is int4, while
5_000_000_000 is bigint. If the transformation suddenly failed to work
when a constant above INT_MAX was used for the first time, then I'd
say that that's pretty surprising. That's what happens currently if
you write the same query as "WHERE columna =
any('{1_000_000_000,5_000_000_000}')", due to the way the coercion
works. That seems less surprising to me, because the user is required
to construct their own array, and users expect arrays to always have
one element type.

It would probably be okay to make the optimization not combine
things/not apply when the user gratuitously mixes different syntaxes.
For example, if a numeric constant was used, rather than an integer
constant.

Maybe it would be practical to do something with the B-Tree operator
class for each of the types involved in the optimization. You could
probably find a way for a SAOP to work against a
"heterogeneously-typed array" while still getting B-Tree index scans
-- provided the types all came from the same operator family. I'm
assuming that the index has an index column whose input opclass was a
member of that same family. That would necessitate changing the
general definition of SAOP, and adding new code to nbtree that worked
with that. But that seems doable.

I was already thinking about doing something like this, to support
index scans for "IS NOT DISTINCT FROM", or on constructs like "columna
= 5 OR columna IS NULL". That is more or less a SAOP with two values,
except that one of the values in the value NULL. I've already
implemented "nbtree SAOPs where one of the elements is a NULL" for
skip scan, which could be generalized to support these other cases.

Admittedly I'm glossing over a lot of important details here. Does it
just work for the default opclass for the type, or can we expect it to
work with a non-default opclass when that's the salient opclass (the
one used by our index)? I don't know what you'd do about stuff like
that.

-- 
Peter Geoghegan




Re: POC, WIP: OR-clause support for indexes

2024-06-24 Thread Robert Haas
On Fri, Jun 21, 2024 at 6:52 PM Alena Rybakina
 wrote:
> It's hard to tell, but I think it might be one of the good places to apply 
> transformation. Let me describe a brief conclusion on the two approaches.

This explanation is somewhat difficult for me to follow. For example:

> In the first approach, we definitely did not process the extra "OR" 
> expressions in the first approach, since they were packaged as an Array. It 
> could lead to the fact that less planning time would be spent on the 
> optimizer.

I don't know what the "first approach" refers to, or what processing
the extra "OR" expressions means, or what it would mean to package OR
expressions as an array. If you made them into an SAOP then you'd have
an array *instead of* OR expressions, not OR expressions "packaged as
an array" but even then, they'd still be processed somewhere, unless
the patch was just wrong.

I think you should try writing this summary again and see if you can
make it a lot clearer and more precise.

I'm suspicious based that we should actually be postponing the
transformation even further. If, for example, the transformation is
advantageous for index scans and disadvantageous for bitmap scans, or
the other way around, then this approach can't help much: it either
does the transformation and all scan types are affected, or it doesn't
do it and no scan types are affected. But if you decided for each scan
whether to transform the quals, then you could handle that. Against
that, there might be increased planning cost. But, perhaps that could
be avoided somehow.

> What exactly is the strategy around OR-clauses with type differences?
> If I'm reading the code correctly, the first loop requires an exact
> opno match, which presumably implies that the constant-type elements
> are of the same type. But then why does the second loop need to use
> coerce_to_common_type?
>
> It needs to transform all similar constants to one type, because some 
> constants of "OR" expressions can belong others, like the numeric and int 
> types. Due to the fact that array structure demands that all types must be 
> belonged to one type, so for this reason we applied this procedure.

The alternative that should be considered is not combining things if
the types don't match. If we're going to combine such things, we need
to be absolutely certain that type conversion cannot fail.

> I do not think this is acceptable. We should find a way to get the
> right operator into the ScalarArrayOpExpr without translating the OID
> back into a name and then back into an OID.
>
> I don’t really understand the reason why it’s better not to do this. Can you 
> explain please?

One reason is that it is extra work to convert things to a name and
then back to an OID. It's got to be slower than using the OID you
already have.

The other reason is that it's error-prone. If somehow the second
lookup doesn't produce the same OID as the first lookup, bad things
will happen, possibly including security vulnerabilities. I see you've
taken steps to avoid that, like nailing down the schema, and that's
good, but it's not a good enough reason to do it like this. If we
don't have a function that can construct the node we need with the OID
rather than the name as an argument, we should invent one, not do this
sort of thing.

> Also, why is the array built with eval_const_expressions instead of
> something like makeArrayResult? There should be no need for general
> expression evaluation here if we are just dealing with constants.
>
> I'm not ready to answer this question right now, I need time to study the use 
> of the makeArrayResult function in the optimizer.

OK. An important consideration here is that eval_const_expressions()
is prone to *fail* because it can call user-defined functions. We
really don't want this optimization to cause planner failure (or
queries to error out at any other stage, either). We also don't want
to end up with any security problems, which is another possible danger
when we call a function that can execute arbitrary code. It's better
to keep it simple and only do things that we know are simple and safe,
like assembling a bunch of datums that we already have into an array.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: POC, WIP: OR-clause support for indexes

2024-06-21 Thread Alena Rybakina

On 21.06.2024 23:35, Robert Haas wrote:

On Fri, Jun 21, 2024 at 1:05 PM Alena Rybakina
  wrote:

I'm confused, I have seen that we have two threads [1] and [2] about this 
thread and I haven't found any explanation for how they differ.

And I don't understand, why am I not listed as the author of the patch? I was 
developing the first part of the patch before Andrey came to review it [3] and 
his first part hasn't changed much since then.

v25 still lists you as an author (in fact, the first author) but I
can't say why we have two CommitFest entries. Surely that's a mistake.


Sorry, maybe I was overreacting.

Thank you very much for taking the time to do a detailed review!

On 22.06.2024 00:07, Alexander Korotkov wrote:

Sorry, I didn't notice that the [1] commitfest entry exists and
created the [2] commitfest entry.  I'm removed [2].

Thank you!


On the patch itself, I'm really glad we got to a design where this is
part of planning, not parsing. I'm not sure yet whether we're doing it
at the right time within the planner, but I think this *might* be
right, whereas the old way was definitely wrong.


It's hard to tell, but I think it might be one of the good places to 
apply transformation. Let me describe a brief conclusion on the two 
approaches.


In the first approach, we definitely did not process the extra "OR" 
expressions in the first approach, since they were packaged as an Array. 
It could lead to the fact that less planning time would be spent on the 
optimizer.


Also the selectivity for Array expressions is estimated better, which 
could lead to the generation of a more optimal plan, but, to be honest, 
this is just an observation from changes in regression tests and, in 
general, how the process of calculating the selectivity of a complex 
expression works.


 SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies WHERE (a = 1 OR a = 51) AND b = ''1''');


  estimated | actual

 ---+

-    99 |    100

+   100 |    100

 (1 row)

 SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies WHERE (a = 1 OR a = 51) AND (b = ''1'' OR b = 
''2'')');


  estimated | actual

 ---+

-    99 |    100

+   100 |    100

 (1 row)

 SELECT * FROM check_estimated_rows('SELECT * FROM 
functional_dependencies WHERE (a = 1 OR a = 2 OR a = 51 OR a = 52) AND 
(b = ''1'' OR b = ''2'')');


  estimated | actual

 ---+

-   197 |    200

+   200 |    200


In addition, we do not have new equivalence classes, since some “Or” 
expressions are not available for their formation. This can result in 
reduced memory and time spent generating the query plan, especially in 
partitions.


Speaking of the main disadvantages, we do not give the optimizer the 
opportunity to generate a plan using BitmapScan, which can lead to the 
generation of a suboptimal plan, but in the current approach the same 
thing happens [0].


And the second one might be related the lack of generation Equivalence 
Classes and generation of useful pathkeysas a result, so we could miss 
an optimal plan again. But I haven't caught something like this on 
practice. I see we won't have such problems if we apply the 
transformation later.


Overall, I have not yet noticed any very different parts from what was 
in the first approach: I didn’t see any significant degradation or 
improvement, which is good, but so far the main problem with the 
degradation of the plan has not yet been solved, that is, we have not 
escaped from the main problems.


Andrei mentioned the problem in the second approach about updating 
references to expressions in RestrictInfo [1] lists, because the can be 
used in different variables during the formation of the query plan. As 
the practice of Self-join removal [2] has shown, this can be expensive, 
but feasible.


By applying the transformation at the analysis stage in the first 
approach, because no links were created, so we did not encounter such 
problems, so this approach was more suitable than the others.


[0] 
https://www.postgresql.org/message-id/7d5aed92-d4cc-4b76-8ae0-051d182c9eec%40postgrespro.ru


[1] 
https://www.postgresql.org/message-id/6850c306-4e9d-40b7-8096-1f3c7d29cd9e%40postgrespro.ru


[2] https://commitfest.postgresql.org/48/5043/


What exactly is the strategy around OR-clauses with type differences?
If I'm reading the code correctly, the first loop requires an exact
opno match, which presumably implies that the constant-type elements
are of the same type. But then why does the second loop need to use
coerce_to_common_type?


It needs to transform all similar constants to one type, because some 
constants of "OR" expressions can belong others, like the numeric and 
int types. Due to the fact that array structure demands that all types 
must be belonged to one type, so for this reason we applied this procedure.


You can find the similar strategy in transformAExprIn function, when we 
transform 

Re: POC, WIP: OR-clause support for indexes

2024-06-21 Thread Alexander Korotkov
Hi, Alena.

On Fri, Jun 21, 2024 at 8:05 PM Alena Rybakina
 wrote:
>
> I'm confused, I have seen that we have two threads [1] and [2] about this 
> thread and I haven't found any explanation for how they differ.
>
> And I don't understand, why am I not listed as the author of the patch? I was 
> developing the first part of the patch before Andrey came to review it [3] 
> and his first part hasn't changed much since then.
>
> If I wrote to the wrong person about it, then please tell me where.
>
> [1] https://commitfest.postgresql.org/48/4450/
>
> [2] https://commitfest.postgresql.org/48/5037/
>
> [3] 
> https://www.postgresql.org/message-id/b301dce1-09fd-72b1-834a-527ca428db5e%40yandex.ru

Sorry, I didn't notice that the [1] commitfest entry exists and
created the [2] commitfest entry.  I'm removed [2].

--
Regards,
Alexander Korotkov
Supabase




Re: POC, WIP: OR-clause support for indexes

2024-06-21 Thread Robert Haas
On Fri, Jun 21, 2024 at 1:05 PM Alena Rybakina
 wrote:
> I'm confused, I have seen that we have two threads [1] and [2] about this 
> thread and I haven't found any explanation for how they differ.
>
> And I don't understand, why am I not listed as the author of the patch? I was 
> developing the first part of the patch before Andrey came to review it [3] 
> and his first part hasn't changed much since then.

v25 still lists you as an author (in fact, the first author) but I
can't say why we have two CommitFest entries. Surely that's a mistake.

On the patch itself, I'm really glad we got to a design where this is
part of planning, not parsing. I'm not sure yet whether we're doing it
at the right time within the planner, but I think this *might* be
right, whereas the old way was definitely wrong.

What exactly is the strategy around OR-clauses with type differences?
If I'm reading the code correctly, the first loop requires an exact
opno match, which presumably implies that the constant-type elements
are of the same type. But then why does the second loop need to use
coerce_to_common_type?

Also, why is the array built with eval_const_expressions instead of
something like makeArrayResult? There should be no need for general
expression evaluation here if we are just dealing with constants.

+ foreach(lc2, entry->exprs)
+ {
+ RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc2);
+
+ is_pushed_down = is_pushed_down || rinfo->is_pushed_down;
+ has_clone = has_clone || rinfo->is_pushed_down;
+ security_level = Max(security_level, rinfo->security_level);
+ required_relids = bms_union(required_relids, rinfo->required_relids);
+ incompatible_relids = bms_union(incompatible_relids,
rinfo->incompatible_relids);
+ outer_relids = bms_union(outer_relids, rinfo->outer_relids);
+ }

This seems like an extremely bad idea. Smushing together things with
different security levels (or a bunch of these other properties) seems
like it will break things. Presumably we wouldn't track these
properties on a per-RelOptInfo basis unless we needed an accurate idea
of the property value for each RelOptInfo. If the values are
guaranteed to match, then it's fine, but then we don't need this code
to merge possibly-different values. If they're not guaranteed to
match, then presumably we shouldn't merge into a single OR clause
unless they do.

On a related note, it looks to me like the tests focus too much on
simple cases. It seems like it's mostly testing cases where there are
no security quals, no weird operator classes, no type mismatches, and
few joins. In the cases where there are joins, it's an inner join and
there's no distinction between an ON-qual and a WHERE-qual. I strongly
suggest adding some test cases for weirder scenarios.

+ if (!OperatorIsVisible(entry->opno))
+ namelist = lappend(namelist,
makeString(get_namespace_name(operform->oprnamespace)));
+
+ namelist = lappend(namelist, makeString(pstrdup(NameStr(operform->oprname;
+ ReleaseSysCache(opertup);
+
+ saopexpr =
+ (ScalarArrayOpExpr *)
+ make_scalar_array_op(NULL,
+ namelist,
+ true,
+ (Node *) entry->expr,
+ (Node *) newa,
+ -1);

I do not think this is acceptable. We should find a way to get the
right operator into the ScalarArrayOpExpr without translating the OID
back into a name and then back into an OID.

+ /* One more trick: assemble correct clause */

This comment doesn't seem to make much sense. Some other comments
contain spelling mistakes. The patch should have comments in more
places explaining key design decisions.

+extern JumbleState *JumbleExpr(Expr *expr, uint64 *exprId);

This is no longer needed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: POC, WIP: OR-clause support for indexes

2024-06-21 Thread Alena Rybakina
I'm confused, I have seen that we have two threads [1] and [2] about 
this thread andIhaven'tfoundanyexplanationfor howtheydiffer.


And I don't understand, whyam Inotlistedas the authorof the patch? 
Iwasdevelopingthe firstpartof the patchbeforeAndreycameto review it[3] 
andhisfirstparthasn'tchangedmuchsincethen.


IfIwroteto the wrongpersonaboutit,thenpleasetellme where.

[1] https://commitfest.postgresql.org/48/4450/

[2] https://commitfest.postgresql.org/48/5037/

[3] 
https://www.postgresql.org/message-id/b301dce1-09fd-72b1-834a-527ca428db5e%40yandex.ru 



--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-06-17 Thread Alena Rybakina

On 17.06.2024 15:11, Alexander Korotkov wrote:

On Mon, Jun 17, 2024 at 1:33 PM Alena Rybakina
  wrote:

I noticed that 7 libraries have been added to 
src/backend/optimizer/plan/initsplan.c, and as far as I remember, Tom Lane has 
already expressed doubts about the approach that requires adding a large number 
of libraries [0], but I'm afraid I'm out of ideas about alternative approach.

Thank you for pointing.  Right, the number of extra headers included
was one of points for criticism on this patch.  I'll look to move this
functionality elsewhere, while the stage of transformation could
probably be the same.

Yes, I thing so.

In addition, I checked the fix in the previous cases that you wrote earlier [1] 
and noticed that SeqScan continues to generate, unfortunately, without 
converting expressions:

I've rechecked and see I made wrong conclusion about this.  The plan
regression is still here.  But I'm still looking to workaround this
without extra GUC.

I think we need to additionally do something like [1], but take
further steps to avoid planning overhead when not necessary.


Iagreewithyoutoreconsiderthisplacein detailonceagain,becauseotherwiseit 
lookslike we're likelyto runinto aperformanceissue.

In
particular, I think we should only consider splitting SAOP for bitmap
OR in the following cases:
1. There are partial indexes with predicates over target column.

Frankly, I see that we will need to split SAOP anyway to check it, right?

2. There are multiple indexes covering target column and different
subsets of other columns presented in restrictions.
I see two cases in one. First, we need to check whether there is an 
index for the columns specified in the restrictlist, and secondly, the 
index ranges for which the conditions fall into the "OR" expressions.

3. There are indexes covreing target column without support of SAOP
(amsearcharray == false).
Hopefully this should skip generation of useless bitmap paths in
majority cases. Thoughts?
I'm notsureIfullyunderstandhowusefulthiscanbe.Couldyouexplainit to mein 
more detail?

Links.
1.https://www.postgresql.org/message-id/67bd918d-285e-44d2-a207-f52d9a4c35e6%40postgrespro.ru


--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-06-17 Thread Alexander Korotkov
On Mon, Jun 17, 2024 at 7:02 AM Andrei Lepikhov
 wrote:
> On 6/14/24 19:00, Alexander Korotkov wrote:
> > This patch could use some polishing, but I'd like to first hear some
> > feedback on general design.
> Thanks for your time and efforts. I have skimmed through the code—there
> is a minor fix in the attachment.
> First and foremost, I think this approach can survive.
> But generally, I'm not happy with manipulations over a restrictinfo clause:
> 1. While doing that, we should remember the fields of the RestrictInfo
> clause. It may need to be changed, too, or it can require such a change
> in the future if someone adds new logic.
> 2. We should remember the link to the RestrictInfo: see how the caller
> of the  distribute_restrictinfo_to_rels routine manipulates its fields
> right after the distribution.
> 3. Remember caches and cached decisions inside the RestrictInfo
> structure: replacing the clause should we change these fields too?
>
> These were the key reasons why we shifted the code to the earlier stages
> in the previous incarnation. So, going this way we should recheck all
> the  fields of this structure and analyse how the transformation can
> [potentially] affect their values.

I see your points.  Making this at the stage of restrictinfos seems
harder, and there are open questions in the patch.

I'd like to hear how Tom feels about this.  Is this the right
direction, or should we try another way?

--
Regards,
Alexander Korotkov
Supabase




Re: POC, WIP: OR-clause support for indexes

2024-06-17 Thread Alexander Korotkov
On Mon, Jun 17, 2024 at 1:33 PM Alena Rybakina
 wrote:
> I noticed that 7 libraries have been added to 
> src/backend/optimizer/plan/initsplan.c, and as far as I remember, Tom Lane 
> has already expressed doubts about the approach that requires adding a large 
> number of libraries [0], but I'm afraid I'm out of ideas about alternative 
> approach.

Thank you for pointing.  Right, the number of extra headers included
was one of points for criticism on this patch.  I'll look to move this
functionality elsewhere, while the stage of transformation could
probably be the same.

> In addition, I checked the fix in the previous cases that you wrote earlier 
> [1] and noticed that SeqScan continues to generate, unfortunately, without 
> converting expressions:

I've rechecked and see I made wrong conclusion about this.  The plan
regression is still here.  But I'm still looking to workaround this
without extra GUC.

I think we need to additionally do something like [1], but take
further steps to avoid planning overhead when not necessary.  In
particular, I think we should only consider splitting SAOP for bitmap
OR in the following cases:
1. There are partial indexes with predicates over target column.
2. There are multiple indexes covering target column and different
subsets of other columns presented in restrictions.
3. There are indexes covreing target column without support of SAOP
(amsearcharray == false).
Hopefully this should skip generation of useless bitmap paths in
majority cases. Thoughts?

Links.
1. 
https://www.postgresql.org/message-id/67bd918d-285e-44d2-a207-f52d9a4c35e6%40postgrespro.ru

--
Regards,
Alexander Korotkov
Supabase




Re: POC, WIP: OR-clause support for indexes

2024-06-17 Thread Alena Rybakina

Hi, thank you for your work with this subject!

On 14.06.2024 15:00, Alexander Korotkov wrote:

On Mon, Apr 8, 2024 at 1:34 AM Alexander Korotkov  wrote:

I've revised the patch.  Did some beautification, improvements for
documentation, commit messages etc.

I've pushed the 0001 patch without 0002.  I think 0001 is good by
itself given that there is the or_to_any_transform_limit GUC option.
The more similar OR clauses are here the more likely grouping them
into SOAP will be a win.  But I've changed the default value to 5.
This will make it less invasive and affect only queries with obvious
repeating patterns.  That also reduced the changes in the regression
tests expected outputs.

Regarding 0002, it seems questionable since it could cause a planning
slowdown for SAOP's with large arrays.  Also, it might reduce the win
of transformation made by 0001.  So, I think we should skip it for
now.

The patch has been reverted from pg17.  Let me propose a new version
for pg18 based on the valuable feedback from Tom Lane [1][2].

  * The transformation is moved to the stage of adding restrictinfos to
the base relation (in particular add_base_clause_to_rel()).  This
leads to interesting consequences.  While this allows IndexScans to
use transformed clauses, BitmapScans and SeqScans seem unaffected.
Therefore, I wasn't able to find a planning regression.
  * As soon as there is no planning regression anymore, I've removed
or_to_any_transform_limit GUC, which was a source of critics.
  * Now, not only Consts allowed in the SAOP's list, but also Params.
  * The criticized hash based on expression jumbling has been removed.
Now, the plain list is used instead.
  * OrClauseGroup now gets a legal node tag.  That allows to mix it in
the list with other nodes without hacks.

I think this patch shouldn't be as good as before for optimizing
performance of large OR lists, given that BitmapScans and SeqScans
still deal with ORs.  However, it allows IndexScans to handle more,
doesn't seem to cause planning regression and therefore introduce no
extra GUC.  Overall, this seems like a good compromise.

This patch could use some polishing, but I'd like to first hear some
feedback on general design.

Links
1.https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us
2.https://www.postgresql.org/message-id/3649287.1712642139%40sss.pgh.pa.us


Inoticedthat7librarieshave 
beenaddedtosrc/backend/optimizer/plan/initsplan.c,andas faras 
Iremember,TomLanehas alreadyexpresseddoubtsaboutthe 
approachthatrequiresaddinga largenumberof libraries[0], but I'm afraid 
I'm out of ideas about alternative approach.


In addition,Icheckedthe fixinthe 
previouscasesthatyouwroteearlier[1]andnoticedthatSeqScancontinuesto 
generate,unfortunately,withoutconvertingexpressions:


with patch:

create table test as (select (random()*10)::int x, (random()*1000) y 
from generate_series(1,100) i); create index test_x_1_y on test (y) 
where x = 1; create index test_x_2_y on test (y) where x = 2; vacuum 
analyze test; SELECT 100 CREATE INDEX CREATE INDEX VACUUM 
alena@postgres=# explain select * from test where (x = 1 or x = 2) and y 
= 100; QUERY PLAN 
-- 
Gather (cost=1000.00..12690.10 rows=1 width=12) Workers Planned: 2 -> 
Parallel Seq Scan on test (cost=0.00..11690.00 rows=1 width=12) Filter: 
(((x = 1) OR (x = 2)) AND (y = '100'::double precision)) (4 rows) 
alena@postgres=# set enable_seqscan =off; SET alena@postgres=# explain 
select * from test where (x = 1 or x = 2) and y = 100; QUERY PLAN 
- 
Seq Scan on test (cost=100.00..1020440.00 rows=1 width=12) 
Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision)) (2 rows)


without patch:

-- 
Bitmap Heap Scan on test (cost=8.60..12.62 rows=1 width=12) Recheck 
Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = 
'100'::double precision) AND (x = 2))) -> BitmapOr (cost=8.60..8.60 
rows=1 width=0) -> Bitmap Index Scan on test_x_1_y (cost=0.00..4.30 
rows=1 width=0) Index Cond: (y = '100'::double precision) -> Bitmap 
Index Scan on test_x_2_y (cost=0.00..4.30 rows=1 width=0) Index Cond: (y 
= '100'::double precision) (7 rows)


[0] https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us

[1] 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com 



--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-06-16 Thread Andrei Lepikhov

On 6/14/24 19:00, Alexander Korotkov wrote:

This patch could use some polishing, but I'd like to first hear some
feedback on general design.
Thanks for your time and efforts. I have skimmed through the code—there 
is a minor fix in the attachment.

First and foremost, I think this approach can survive.
But generally, I'm not happy with manipulations over a restrictinfo clause:
1. While doing that, we should remember the fields of the RestrictInfo 
clause. It may need to be changed, too, or it can require such a change 
in the future if someone adds new logic.
2. We should remember the link to the RestrictInfo: see how the caller 
of the  distribute_restrictinfo_to_rels routine manipulates its fields 
right after the distribution.
3. Remember caches and cached decisions inside the RestrictInfo 
structure: replacing the clause should we change these fields too?


These were the key reasons why we shifted the code to the earlier stages 
in the previous incarnation. So, going this way we should recheck all 
the  fields of this structure and analyse how the transformation can 
[potentially] affect their values.


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 0022535318..3b3249b075 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -2981,7 +2981,15 @@ add_base_clause_to_rel(PlannerInfo *root, Index relid,
 
 		if (list_length(orExpr->args) >= 2)
 		{
-			orExpr->args = transform_or_to_any(root, orExpr->args);
+			List *args = transform_or_to_any(root, orExpr->args);
+
+			if (list_length(args) > 1)
+orExpr->args = args;
+			else
+			{
+restrictinfo->orclause = NULL;
+restrictinfo->clause = (Expr*)((RestrictInfo *)linitial(args))->clause;
+			}
 		}
 	}
 


Re: POC, WIP: OR-clause support for indexes

2024-06-14 Thread Alexander Korotkov
On Mon, Apr 8, 2024 at 1:34 AM Alexander Korotkov  wrote:
>
> I've revised the patch.  Did some beautification, improvements for
> documentation, commit messages etc.
>
> I've pushed the 0001 patch without 0002.  I think 0001 is good by
> itself given that there is the or_to_any_transform_limit GUC option.
> The more similar OR clauses are here the more likely grouping them
> into SOAP will be a win.  But I've changed the default value to 5.
> This will make it less invasive and affect only queries with obvious
> repeating patterns.  That also reduced the changes in the regression
> tests expected outputs.
>
> Regarding 0002, it seems questionable since it could cause a planning
> slowdown for SAOP's with large arrays.  Also, it might reduce the win
> of transformation made by 0001.  So, I think we should skip it for
> now.

The patch has been reverted from pg17.  Let me propose a new version
for pg18 based on the valuable feedback from Tom Lane [1][2].

 * The transformation is moved to the stage of adding restrictinfos to
the base relation (in particular add_base_clause_to_rel()).  This
leads to interesting consequences.  While this allows IndexScans to
use transformed clauses, BitmapScans and SeqScans seem unaffected.
Therefore, I wasn't able to find a planning regression.
 * As soon as there is no planning regression anymore, I've removed
or_to_any_transform_limit GUC, which was a source of critics.
 * Now, not only Consts allowed in the SAOP's list, but also Params.
 * The criticized hash based on expression jumbling has been removed.
Now, the plain list is used instead.
 * OrClauseGroup now gets a legal node tag.  That allows to mix it in
the list with other nodes without hacks.

I think this patch shouldn't be as good as before for optimizing
performance of large OR lists, given that BitmapScans and SeqScans
still deal with ORs.  However, it allows IndexScans to handle more,
doesn't seem to cause planning regression and therefore introduce no
extra GUC.  Overall, this seems like a good compromise.

This patch could use some polishing, but I'd like to first hear some
feedback on general design.

Links
1. https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us
2. https://www.postgresql.org/message-id/3649287.1712642139%40sss.pgh.pa.us

--
Regards,
Alexander Korotkov
Supabase


v25-0001-Transform-OR-clauses-to-ANY-expression.patch
Description: Binary data


Re: POC, WIP: OR-clause support for indexes

2024-04-07 Thread Justin Pryzby
On Mon, Apr 08, 2024 at 01:34:37AM +0300, Alexander Korotkov wrote:
> Hi!
> 
> On Mon, Apr 1, 2024 at 9:38 AM Andrei Lepikhov
>  wrote:
> > On 28/3/2024 16:54, Alexander Korotkov wrote:
> > > The current patch has a boolean guc enable_or_transformation.
> > > However, when we have just a few ORs to be transformated, then we
> > > should get less performance gain from the transformation and higher
> > > chances to lose a good bitmap scan plan from that.  When there is a
> > > huge list of ORs to be transformed, then the performance gain is
> > > greater and it is less likely we could lose a good bitmap scan plan.
> > >
> > > What do you think about introducing a GUC threshold value: the minimum
> > > size of list to do OR-to-ANY transformation?
> > > min_list_or_transformation or something.
> > I labelled it or_transformation_limit (see in attachment). Feel free to
> > rename it.
> > It's important to note that the limiting GUC doesn't operate
> > symmetrically for forward, OR -> SAOP, and backward SAOP -> OR
> > operations. In the forward case, it functions as you've proposed.
> > However, in the backward case, we only check whether the feature is
> > enabled or not. This is due to our existing limitation,
> > MAX_SAOP_ARRAY_SIZE, and the fact that we can't match the length of the
> > original OR list with the sizes of the resulting SAOPs. For instance, a
> > lengthy OR list with 100 elements can be transformed into 3 SAOPs, each
> > with a size of around 30 elements.
> > One aspect that requires attention is the potential inefficiency of our
> > OR -> ANY transformation when we have a number of elements less than
> > MAX_SAOP_ARRAY_SIZE. This is because we perform a reverse transformation
> > ANY -> OR at the stage of generating bitmap scans. If the BitmapScan
> > path dominates, we may have done unnecessary work. Is this an occurrence
> > that we should address?
> > But the concern above may just be a point of improvement later: We can
> > add one more strategy to the optimizer: testing each array element as an
> > OR clause; we can also provide a BitmapOr path, where SAOP is covered
> > with a minimal number of partial indexes (likewise, previous version).
> 
> I've revised the patch.  Did some beautification, improvements for
> documentation, commit messages etc.
> 
> I've pushed the 0001 patch without 0002.  I think 0001 is good by
> itself given that there is the or_to_any_transform_limit GUC option.
> The more similar OR clauses are here the more likely grouping them
> into SOAP will be a win.  But I've changed the default value to 5.

The sample config file has the wrong default

+#or_to_any_transform_limit = 0

We had a patch to catch this kind of error, but it was closed (which IMO
was itself an error).

-- 
Justin




Re: POC, WIP: OR-clause support for indexes

2024-04-07 Thread Alexander Korotkov
Hi!

On Mon, Apr 1, 2024 at 9:38 AM Andrei Lepikhov
 wrote:
> On 28/3/2024 16:54, Alexander Korotkov wrote:
> > The current patch has a boolean guc enable_or_transformation.
> > However, when we have just a few ORs to be transformated, then we
> > should get less performance gain from the transformation and higher
> > chances to lose a good bitmap scan plan from that.  When there is a
> > huge list of ORs to be transformed, then the performance gain is
> > greater and it is less likely we could lose a good bitmap scan plan.
> >
> > What do you think about introducing a GUC threshold value: the minimum
> > size of list to do OR-to-ANY transformation?
> > min_list_or_transformation or something.
> I labelled it or_transformation_limit (see in attachment). Feel free to
> rename it.
> It's important to note that the limiting GUC doesn't operate
> symmetrically for forward, OR -> SAOP, and backward SAOP -> OR
> operations. In the forward case, it functions as you've proposed.
> However, in the backward case, we only check whether the feature is
> enabled or not. This is due to our existing limitation,
> MAX_SAOP_ARRAY_SIZE, and the fact that we can't match the length of the
> original OR list with the sizes of the resulting SAOPs. For instance, a
> lengthy OR list with 100 elements can be transformed into 3 SAOPs, each
> with a size of around 30 elements.
> One aspect that requires attention is the potential inefficiency of our
> OR -> ANY transformation when we have a number of elements less than
> MAX_SAOP_ARRAY_SIZE. This is because we perform a reverse transformation
> ANY -> OR at the stage of generating bitmap scans. If the BitmapScan
> path dominates, we may have done unnecessary work. Is this an occurrence
> that we should address?
> But the concern above may just be a point of improvement later: We can
> add one more strategy to the optimizer: testing each array element as an
> OR clause; we can also provide a BitmapOr path, where SAOP is covered
> with a minimal number of partial indexes (likewise, previous version).

I've revised the patch.  Did some beautification, improvements for
documentation, commit messages etc.

I've pushed the 0001 patch without 0002.  I think 0001 is good by
itself given that there is the or_to_any_transform_limit GUC option.
The more similar OR clauses are here the more likely grouping them
into SOAP will be a win.  But I've changed the default value to 5.
This will make it less invasive and affect only queries with obvious
repeating patterns.  That also reduced the changes in the regression
tests expected outputs.

Regarding 0002, it seems questionable since it could cause a planning
slowdown for SAOP's with large arrays.  Also, it might reduce the win
of transformation made by 0001.  So, I think we should skip it for
now.

--
Regards,
Alexander Korotkov




Re: POC, WIP: OR-clause support for indexes

2024-04-01 Thread Andrei Lepikhov

On 28/3/2024 16:54, Alexander Korotkov wrote:

The current patch has a boolean guc enable_or_transformation.
However, when we have just a few ORs to be transformated, then we
should get less performance gain from the transformation and higher
chances to lose a good bitmap scan plan from that.  When there is a
huge list of ORs to be transformed, then the performance gain is
greater and it is less likely we could lose a good bitmap scan plan.

What do you think about introducing a GUC threshold value: the minimum
size of list to do OR-to-ANY transformation?
min_list_or_transformation or something.
I labelled it or_transformation_limit (see in attachment). Feel free to 
rename it.
It's important to note that the limiting GUC doesn't operate 
symmetrically for forward, OR -> SAOP, and backward SAOP -> OR 
operations. In the forward case, it functions as you've proposed. 
However, in the backward case, we only check whether the feature is 
enabled or not. This is due to our existing limitation, 
MAX_SAOP_ARRAY_SIZE, and the fact that we can't match the length of the 
original OR list with the sizes of the resulting SAOPs. For instance, a 
lengthy OR list with 100 elements can be transformed into 3 SAOPs, each 
with a size of around 30 elements.
One aspect that requires attention is the potential inefficiency of our 
OR -> ANY transformation when we have a number of elements less than 
MAX_SAOP_ARRAY_SIZE. This is because we perform a reverse transformation 
ANY -> OR at the stage of generating bitmap scans. If the BitmapScan 
path dominates, we may have done unnecessary work. Is this an occurrence 
that we should address?
But the concern above may just be a point of improvement later: We can 
add one more strategy to the optimizer: testing each array element as an 
OR clause; we can also provide a BitmapOr path, where SAOP is covered 
with a minimal number of partial indexes (likewise, previous version).


--
regards,
Andrei Lepikhov
Postgres Professional
From e42a7111a12ef82eecdb2e692d65e7ba6e43ad79 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) 
on the
preliminary stage of optimization when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 doc/src/sgml/config.sgml  |  18 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/optimizer/prep/prepqual.c | 379 +-
 src/backend/utils/misc/guc_tables.c   |  13 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   2 +
 src/test/regress/expected/create_index.out| 172 +++-
 src/test/regress/expected/join.out|  60 ++-
 src/test/regress/expected/partition_prune.out | 211 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/tidscan.out |  21 +-
 src/test/regress/sql/create_index.sql |  44 ++
 src/test/regress/sql/join.sql |   8 +
 src/test/regress/sql/partition_prune.sql  |  18 +
 src/test/regress/sql/tidscan.sql  |   4 +
 src/tools/pgindent/typedefs.list  |   2 +
 18 files changed, 950 insertions(+), 51 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index b7af86d351..277ef3f385 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8853,18 +8853,18 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
- QUERY PLAN
 
-
+  QUERY PLAN   
   
+--
  Update on public.utrtest

Re: POC, WIP: OR-clause support for indexes

2024-03-28 Thread Alexander Korotkov
On Tue, Mar 19, 2024 at 7:17 AM Andrei Lepikhov
 wrote:
> On 14/3/2024 16:31, Alexander Korotkov wrote:
> > On Wed, Mar 13, 2024 at 2:16 PM Andrei Lepikhov
> > As you can see this case is not related to partial indexes.  Just no
> > index selective for the whole query.  However, splitting scan by the OR
> > qual lets use a combination of two selective indexes.
> I have rewritten the 0002-* patch according to your concern. A candidate
> and some thoughts are attached.
> As I see, we have a problem here: expanding each array and trying to
> apply an element to each index can result in a lengthy planning stage.
> Also, an index scan with the SAOP may potentially be more effective than
> with the list of OR clauses.
> Originally, the transformation's purpose was to reduce a query's
> complexity and the number of optimization ways to speed up planning and
> (sometimes) execution. Here, we reduce planning complexity only in the
> case of an array size larger than MAX_SAOP_ARRAY_SIZE.
> Maybe we can fall back to the previous version of the second patch,
> keeping in mind that someone who wants to get maximum profit from the
> BitmapOr scan of multiple indexes can just disable this optimization,
> enabling deep search of the most optimal scanning way?
> As a compromise solution, I propose adding one more option to the
> previous version: if an element doesn't fit any partial index, try to
> cover it with a plain index.
> In this case, we still do not guarantee the most optimal fit of elements
> to the set of indexes, but we speed up planning. Does that make sense?

Thank you for your research Andrei.  Now things get more clear on the
advantages and disadvantages of this transformation.

The current patch has a boolean guc enable_or_transformation.
However, when we have just a few ORs to be transformated, then we
should get less performance gain from the transformation and higher
chances to lose a good bitmap scan plan from that.  When there is a
huge list of ORs to be transformed, then the performance gain is
greater and it is less likely we could lose a good bitmap scan plan.

What do you think about introducing a GUC threshold value: the minimum
size of list to do OR-to-ANY transformation?
min_list_or_transformation or something.

--
Regards,
Alexander Korotkov




Re: POC, WIP: OR-clause support for indexes

2024-03-18 Thread Andrei Lepikhov

On 14/3/2024 16:31, Alexander Korotkov wrote:
On Wed, Mar 13, 2024 at 2:16 PM Andrei Lepikhov 
As you can see this case is not related to partial indexes.  Just no 
index selective for the whole query.  However, splitting scan by the OR 
qual lets use a combination of two selective indexes.
I have rewritten the 0002-* patch according to your concern. A candidate 
and some thoughts are attached.
As I see, we have a problem here: expanding each array and trying to 
apply an element to each index can result in a lengthy planning stage.
Also, an index scan with the SAOP may potentially be more effective than 
with the list of OR clauses.
Originally, the transformation's purpose was to reduce a query's 
complexity and the number of optimization ways to speed up planning and 
(sometimes) execution. Here, we reduce planning complexity only in the 
case of an array size larger than MAX_SAOP_ARRAY_SIZE.
Maybe we can fall back to the previous version of the second patch, 
keeping in mind that someone who wants to get maximum profit from the 
BitmapOr scan of multiple indexes can just disable this optimization, 
enabling deep search of the most optimal scanning way?
As a compromise solution, I propose adding one more option to the 
previous version: if an element doesn't fit any partial index, try to 
cover it with a plain index.
In this case, we still do not guarantee the most optimal fit of elements 
to the set of indexes, but we speed up planning. Does that make sense?


--
regards,
Andrei Lepikhov
Postgres Professional
From d2d8944fc83ccd090653c1b15703a2c3ba096fa9 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 13 Mar 2024 12:26:02 +0700
Subject: [PATCH 2/2] Teach generate_bitmap_or_paths to build BitmapOr paths
 over SAOP clauses.

Likewise OR clauses, discover SAOP array and try to split its elements
between smaller sized arrays to fit a set of partial indexes.
---
 doc/src/sgml/config.sgml   |   3 +
 src/backend/optimizer/path/indxpath.c  |  74 +-
 src/backend/optimizer/util/predtest.c  |  37 +++
 src/backend/optimizer/util/restrictinfo.c  |  13 +
 src/include/optimizer/optimizer.h  |   3 +
 src/include/optimizer/restrictinfo.h   |   1 +
 src/test/regress/expected/create_index.out |  24 +-
 src/test/regress/expected/select.out   | 280 +
 src/test/regress/sql/select.sql|  82 ++
 9 files changed, 500 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2de6ae301a..0df56f44e3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5485,6 +5485,9 @@ ANY num_sync 
( orclause)->args;
+
+   if (!enable_or_transformation)
+   return orlist;
+
+   if (restriction_is_saop_clause(rinfo))
+   {
+   result = transform_saop_to_ors(root, rinfo);
+   return (result == NIL) ? list_make1(rinfo) : result;
+   }
+
+   foreach(lc, orlist)
+   {
+   Expr *expr = (Expr *) lfirst(lc);
+
+   if (IsA(expr, RestrictInfo) && 
restriction_is_saop_clause((RestrictInfo *) expr))
+   {
+   List *sublist;
+
+   sublist = extract_saop_ors(root, (RestrictInfo *) 
lfirst(lc));
+   if (sublist != NIL)
+   {
+   result = list_concat(result, sublist);
+   continue;
+   }
+
+   /* Need to return expr to the result list */
+   }
+
+   result = lappend(result, expr);
+   }
+
+   return result;
+}
+
 /*
  * generate_bitmap_or_paths
- * Look through the list of clauses to find OR clauses, and 
generate
- * a BitmapOrPath for each one we can handle that way.  Return a 
list
- * of the generated BitmapOrPaths.
+ * Look through the list of clauses to find OR and SAOP clauses, 
and
+ * Each saop clause are splitted to be covered by partial indexes.
+ * generate a BitmapOrPath for each one we can handle that way.
+ * Return a list of the generated BitmapOrPaths.
  *
  * other_clauses is a list of additional clauses that can be assumed true
  * for the purpose of generating indexquals, but are not to be searched for
@@ -1247,20 +1303,24 @@ generate_bitmap_or_paths(PlannerInfo *root, RelOptInfo 
*rel,
foreach(lc, clauses)
{
RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
-   List   *pathlist;
+   List   *pathlist = NIL;
Path   *bitmapqual;
ListCell   *j;
+   List   *orlist = NIL;
 
-   /* Ignore RestrictInfos that aren't ORs */
-   if (!restriction_is_or_clause(rinfo))
+   orlist = extract_saop_ors(root, rinfo);
+   if (orlist == NIL)
+ 

Re: POC, WIP: OR-clause support for indexes

2024-03-14 Thread Andrei Lepikhov

On 14/3/2024 17:39, Alexander Korotkov wrote:

Thank you, Andrei.  Looks like a very undesirable side effect.  Do you
have any idea why it happens?  Partition pruning should work correctly
for both transformed and non-transformed quals, why does
transformation hurt it?
Now we have the v23-0001-* patch with all issues resolved. The last one 
which caused execution stage pruning was about necessity to evaluate 
SAOP expression right after transformation. In previous version the core 
executed it on transformed expressions.


> As you can see this case is not related to partial indexes.  Just no
> index selective for the whole query.  However, splitting scan by the
> OR qual lets use a combination of two selective indexes.
Thanks for the case. I will try to resolve it.

--
regards,
Andrei Lepikhov
Postgres Professional
From 156c00c820a38e5e1856f07363af87b3109b5d77 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) 
on the
preliminary stage of optimization when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 doc/src/sgml/config.sgml  |  17 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/optimizer/prep/prepqual.c | 374 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   2 +
 src/test/regress/expected/create_index.out| 156 +++-
 src/test/regress/expected/join.out|  62 ++-
 src/test/regress/expected/partition_prune.out | 215 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 19 files changed, 929 insertions(+), 58 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 58a603ac56..a965b43cc6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8838,18 +8838,18 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
- QUERY PLAN
 
-
+  QUERY PLAN   
   
+--
  Update on public.utrtest
Output: utrtest_1.a, utrtest_1.b
Foreign Update on public.remp utrtest_1
Update on public.locp utrtest_2
->  Append
  ->  Foreign Update on public.remp utrtest_1
-   Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a 
= 2))) RETURNING a, b
+   Remote SQL: UPDATE public.loct SET a = 1 WHERE ((a = ANY 
('{1,2}'::integer[]))) RETURNING a, b
  ->  Seq Scan on public.locp utrtest_2
Output: 1, utrtest_2.tableoid, utrtest_2.ctid, NULL::record
-   Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
+   Filter: (utrtest_2.a = ANY ('{1,2}'::integer[]))
 (10 rows)
 
 -- The new values are concatenated with ' triggered !'
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 65a6e6c408..2de6ae301a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5472,6 +5472,23 @@ ANY num_sync ( 
+  enable_or_transformation (boolean)
+   
+enable_or_transformation configuration 
parameter
+   
+  
+  
+   
+Enables or disables the query 

Re: POC, WIP: OR-clause support for indexes

2024-03-14 Thread Alexander Korotkov
On Thu, Mar 14, 2024 at 12:11 PM Andrei Lepikhov
 wrote:
>
> On 14/3/2024 16:31, Alexander Korotkov wrote:
> > On Wed, Mar 13, 2024 at 2:16 PM Andrei Lepikhov
> > mailto:a.lepik...@postgrespro.ru>> wrote:
> >  > On 13/3/2024 18:05, Alexander Korotkov wrote:
> >  > > On Wed, Mar 13, 2024 at 7:52 AM Andrei Lepikhov
> >  > > Given all of the above, I think moving transformation to the
> >  > > canonicalize_qual() would be the right way to go.
> >  > Ok, I will try to move the code.
> >  > I have no idea about the timings so far. I recall the last time I got
> >  > bogged down in tons of duplicated code. I hope with an almost-ready
> >  > sketch, it will be easier.
> >
> > Thank you!  I'll be looking forward to the updated patch.
> Okay, I moved the 0001-* patch to the prepqual.c module. See it in the
> attachment. I treat it as a transient patch.
> It has positive outcomes as well as negative ones.
> The most damaging result you can see in the partition_prune test:
> partition pruning, in some cases, moved to the executor initialization
> stage. I guess, we should avoid it somehow in the next version.

Thank you, Andrei.  Looks like a very undesirable side effect.  Do you
have any idea why it happens?  Partition pruning should work correctly
for both transformed and non-transformed quals, why does
transformation hurt it?

--
Regards,
Alexander Korotkov




Re: POC, WIP: OR-clause support for indexes

2024-03-14 Thread Andrei Lepikhov

On 14/3/2024 16:31, Alexander Korotkov wrote:
On Wed, Mar 13, 2024 at 2:16 PM Andrei Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:

 > On 13/3/2024 18:05, Alexander Korotkov wrote:
 > > On Wed, Mar 13, 2024 at 7:52 AM Andrei Lepikhov
 > > Given all of the above, I think moving transformation to the
 > > canonicalize_qual() would be the right way to go.
 > Ok, I will try to move the code.
 > I have no idea about the timings so far. I recall the last time I got
 > bogged down in tons of duplicated code. I hope with an almost-ready
 > sketch, it will be easier.

Thank you!  I'll be looking forward to the updated patch.
Okay, I moved the 0001-* patch to the prepqual.c module. See it in the 
attachment. I treat it as a transient patch.

It has positive outcomes as well as negative ones.
The most damaging result you can see in the partition_prune test:
partition pruning, in some cases, moved to the executor initialization 
stage. I guess, we should avoid it somehow in the next version.


--
regards,
Andrei Lepikhov
Postgres Professional
From 170f6871540025d0d1683750442e7af902b11a40 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) 
on the
preliminary stage of optimization when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 doc/src/sgml/config.sgml  |  17 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/optimizer/prep/prepqual.c | 371 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   2 +
 src/test/regress/expected/create_index.out| 156 +++-
 src/test/regress/expected/join.out|  62 ++-
 src/test/regress/expected/partition_prune.out | 235 +--
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  19 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 19 files changed, 939 insertions(+), 61 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 58a603ac56..a965b43cc6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8838,18 +8838,18 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrtest set a = 1 where a = 1 or a = 2 returning *;
- QUERY PLAN
 
-
+  QUERY PLAN   
   
+--
  Update on public.utrtest
Output: utrtest_1.a, utrtest_1.b
Foreign Update on public.remp utrtest_1
Update on public.locp utrtest_2
->  Append
  ->  Foreign Update on public.remp utrtest_1
-   Remote SQL: UPDATE public.loct SET a = 1 WHERE (((a = 1) OR (a 
= 2))) RETURNING a, b
+   Remote SQL: UPDATE public.loct SET a = 1 WHERE ((a = ANY 
('{1,2}'::integer[]))) RETURNING a, b
  ->  Seq Scan on public.locp utrtest_2
Output: 1, utrtest_2.tableoid, utrtest_2.ctid, NULL::record
-   Filter: ((utrtest_2.a = 1) OR (utrtest_2.a = 2))
+   Filter: (utrtest_2.a = ANY ('{1,2}'::integer[]))
 (10 rows)
 
 -- The new values are concatenated with ' triggered !'
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 65a6e6c408..2de6ae301a 100644
--- a/doc/src/sgml/config.sgml
+++ 

Re: POC, WIP: OR-clause support for indexes

2024-03-14 Thread Alexander Korotkov
On Wed, Mar 13, 2024 at 2:16 PM Andrei Lepikhov 
wrote:
> On 13/3/2024 18:05, Alexander Korotkov wrote:
> > On Wed, Mar 13, 2024 at 7:52 AM Andrei Lepikhov
> > Given all of the above, I think moving transformation to the
> > canonicalize_qual() would be the right way to go.
> Ok, I will try to move the code.
> I have no idea about the timings so far. I recall the last time I got
> bogged down in tons of duplicated code. I hope with an almost-ready
> sketch, it will be easier.

Thank you!  I'll be looking forward to the updated patch.

I also have notes about the bitmap patch.

/*
 * Building index paths over SAOP clause differs from the logic of OR
clauses.
 * Here we iterate across all the array elements and split them to SAOPs,
 * corresponding to different indexes. We must match each element to an
index.
 */

This covers the case I posted before.  But in order to fix all possible
cases we probably need to handle the SAOP clause in the same way as OR
clauses.  Check also this case.

Setup
create table t (a int not null, b int not null, c int not null);
insert into t (select 1, 1, i from generate_series(1,1) i);
insert into t (select i, 2, 2 from generate_series(1,1) i);
create index t_a_b_idx on t (a, b);
create statistics t_a_b_stat (mcv) on a, b from t;
create statistics t_b_c_stat (mcv) on b, c from t;
vacuum analyze t;

Plan with enable_or_transformation = on:
# explain select * from t where a = 1 and (b = 1 or b = 2) and c = 2;
  QUERY PLAN
--
 Bitmap Heap Scan on t  (cost=156.55..440.56 rows=5001 width=12)
   Recheck Cond: (a = 1)
   Filter: ((b = ANY ('{1,2}'::integer[])) AND (c = 2))
   ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..155.29 rows=10001
width=0)
 Index Cond: (a = 1)
(5 rows)

Plan with enable_or_transformation = off:
# explain select * from t where a = 1 and (b = 1 or b = 2) and c = 2;
  QUERY PLAN
--
 Bitmap Heap Scan on t  (cost=11.10..18.32 rows=5001 width=12)
   Recheck Cond: (((b = 1) AND (c = 2)) OR ((a = 1) AND (b = 2)))
   Filter: ((a = 1) AND (c = 2))
   ->  BitmapOr  (cost=11.10..11.10 rows=2 width=0)
 ->  Bitmap Index Scan on t_b_c_idx  (cost=0.00..4.30 rows=1
width=0)
   Index Cond: ((b = 1) AND (c = 2))
 ->  Bitmap Index Scan on t_a_b_idx  (cost=0.00..4.30 rows=1
width=0)
   Index Cond: ((a = 1) AND (b = 2))
(8 rows)

As you can see this case is not related to partial indexes.  Just no index
selective for the whole query.  However, splitting scan by the OR qual lets
use a combination of two selective indexes.

--
Regards,
Alexander Korotkov


Re: POC, WIP: OR-clause support for indexes

2024-03-13 Thread Andrei Lepikhov

On 13/3/2024 18:05, Alexander Korotkov wrote:

On Wed, Mar 13, 2024 at 7:52 AM Andrei Lepikhov
Given all of the above, I think moving transformation to the
canonicalize_qual() would be the right way to go.

Ok, I will try to move the code.
I have no idea about the timings so far. I recall the last time I got 
bogged down in tons of duplicated code. I hope with an almost-ready 
sketch, it will be easier.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-03-13 Thread Alexander Korotkov
On Wed, Mar 13, 2024 at 7:52 AM Andrei Lepikhov
 wrote:
> On 12/3/2024 22:20, Alexander Korotkov wrote:
> > On Mon, Mar 11, 2024 at 2:43 PM Andrei Lepikhov
> >> I think you are right. It is probably a better place than any other to
> >> remove duplicates in an array. I just think we should sort and remove
> >> duplicates from entry->consts in one pass. Thus, this optimisation
> >> should be applied to sortable constants.
> >
> > Ok.
> New version of the patch set implemented all we have agreed on for now.
> We can return MAX_SAOP_ARRAY_SIZE constraint and Alena's approach to
> duplicates deletion for non-sortable cases at the end.
> >
> >> Hmm, we already tried to do it at that point. I vaguely recall some
> >> issues caused by this approach. Anyway, it should be done as quickly as
> >> possible to increase the effect of the optimization.
> >
> > I think there were provided quite strong reasons why this shouldn't be
> > implemented at the parse analysis stage [1], [2], [3].  The
> > canonicalize_qual() looks quite appropriate place for that since it
> > does similar transformations.
> Ok. Let's discuss these reasons. In Robert's opinion [1,3], we should do
> the transformation based on the cost model. But in the canonicalize_qual
> routine, we still make the transformation blindly. Moreover, the second
> patch reduces the weight of this reason, doesn't it? Maybe we shouldn't
> think about that as about optimisation but some 'general form of
> expression'?
> Peter [2] worries about the possible transformation outcomes at this
> stage. But remember, we already transform clauses like ROW() IN (...) to
> a series of ORs here, so it is not an issue. Am I wrong?
> Why did we discard the attempt with canonicalize_qual on the previous
> iteration? - The stage of parsing is much more native for building SAOP
> quals. We can reuse make_scalar_array_op and other stuff, for example.
> During the optimisation stage, the only list partitioning machinery
> creates SAOP based on a list of constants. So, in theory, it is possible
> to implement. But do we really need to make the code more complex?

As we currently do OR-to-ANY transformation at the parse stage, the
system catalog (including views, inheritance clauses, partial and
expression indexes, and others) would have a form depending on
enable_or_transformation at the moment of DDL execution.  I think this
is rather wrong.  The enable_or_transformation should be run-time
optimization which affects the resulting query plan, its result
shouldn't be persistent.

Regarding the ROW() IN (...) precedent.

1. AFAICS, this is not exactly an optimization.  This transformation
allows us to perform type matching individually for every value.
Therefore it allows the execute some queries which otherwise would end
up with error.
2. I don't think this is a sample of good design.  This is rather
hack, which is historically here, but we don't want to replicate this
experience.

Given all of the above, I think moving transformation to the
canonicalize_qual() would be the right way to go.

--
Regards,
Alexander Korotkov




Re: POC, WIP: OR-clause support for indexes

2024-03-12 Thread Andrei Lepikhov

On 12/3/2024 22:20, Alexander Korotkov wrote:

On Mon, Mar 11, 2024 at 2:43 PM Andrei Lepikhov

I think you are right. It is probably a better place than any other to
remove duplicates in an array. I just think we should sort and remove
duplicates from entry->consts in one pass. Thus, this optimisation
should be applied to sortable constants.


Ok.
New version of the patch set implemented all we have agreed on for now. 
We can return MAX_SAOP_ARRAY_SIZE constraint and Alena's approach to 
duplicates deletion for non-sortable cases at the end.



Hmm, we already tried to do it at that point. I vaguely recall some
issues caused by this approach. Anyway, it should be done as quickly as
possible to increase the effect of the optimization.


I think there were provided quite strong reasons why this shouldn't be
implemented at the parse analysis stage [1], [2], [3].  The
canonicalize_qual() looks quite appropriate place for that since it
does similar transformations.
Ok. Let's discuss these reasons. In Robert's opinion [1,3], we should do 
the transformation based on the cost model. But in the canonicalize_qual 
routine, we still make the transformation blindly. Moreover, the second 
patch reduces the weight of this reason, doesn't it? Maybe we shouldn't 
think about that as about optimisation but some 'general form of 
expression'?
Peter [2] worries about the possible transformation outcomes at this 
stage. But remember, we already transform clauses like ROW() IN (...) to 
a series of ORs here, so it is not an issue. Am I wrong?
Why did we discard the attempt with canonicalize_qual on the previous 
iteration? - The stage of parsing is much more native for building SAOP 
quals. We can reuse make_scalar_array_op and other stuff, for example.
During the optimisation stage, the only list partitioning machinery 
creates SAOP based on a list of constants. So, in theory, it is possible 
to implement. But do we really need to make the code more complex?


Links.
1. 
https://www.postgresql.org/message-id/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAH2-Wzm2%3Dnf_JhiM3A2yetxRs8Nd2NuN3JqH%3Dfm_YWYd1oYoPg%40mail.gmail.com
3. 
https://www.postgresql.org/message-id/CA%2BTgmoaOiwMXBBTYknczepoZzKTp-Zgk5ss1%2BCuVQE-eFTqBmA%40mail.gmail.com


--
regards,
Andrei Lepikhov
Postgres Professional
From 86d969f2598a03b1eba84c0c064707de34ee60ac Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) 
on the
preliminary stage of optimization when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 doc/src/sgml/config.sgml  |  17 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 416 ++
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 ++-
 src/test/regress/expected/join.out|  62 ++-
 src/test/regress/expected/partition_prune.out | 215 -
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 +
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 20 files changed, 974 insertions(+), 60 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 58a603ac56..a965b43cc6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8838,18 +8838,18 @@ insert into utrtest values (2, 'qux');
 -- Check case where the foreign partition is a subplan target rel
 explain (verbose, costs off)
 update utrtest 

Re: POC, WIP: OR-clause support for indexes

2024-03-12 Thread Alexander Korotkov
On Mon, Mar 11, 2024 at 2:43 PM Andrei Lepikhov
 wrote:
> On 11/3/2024 18:31, Alexander Korotkov wrote:
> >> I'm not convinced about this limit. The initial reason was to combine
> >> long lists of ORs into the array because such a transformation made at
> >> an early stage increases efficiency.
> >> I understand the necessity of this limit in the array decomposition
> >> routine but not in the creation one.
> >
> > The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because
> > N^2 algorithms could be applied to arrays.  Are you sure that's not
> > true for our case?
> When you operate an array, indeed. But when we transform ORs to an
> array, not. Just check all the places in the optimizer and even the
> executor where we would pass along the list of ORs. This is why I think
> we should use this optimization even more intensively for huge numbers
> of ORs in an attempt to speed up the overall query.

Ok.

> >>> 3) Better save the original order of clauses by putting hash entries and
> >>> untransformable clauses to the same list.  A lot of differences in
> >>> regression tests output have gone.
> >> I agree that reducing the number of changes in regression tests looks
> >> better. But to achieve this, you introduced a hack that increases the
> >> complexity of the code. Is it worth it? Maybe it would be better to make
> >> one-time changes in tests instead of getting this burden on board. Or
> >> have you meant something more introducing the node type?
> >
> > For me the reason is not just a regression test.  The current code
> > keeps the original order of quals as much as possible.  The OR
> > transformation code reorders quals even in cases when it doesn't
> > eventually apply any optimization.  I don't think that's acceptable.
> > However, less hackery ways for this is welcome for sure.
> Why is it unacceptable? Can the user implement some order-dependent
> logic with clauses, and will it be correct?
> Otherwise, it is a matter of taste, and generally, this decision is up
> to you.

I think this is an important property that the user sees the quals in
the plan in the same order as they were in the query.  And if some
transformations are applied, then the order is saved as much as
possible.  I don't think we should sacrifice this property without
strong reasons.  A bit of code complexity is definitely not that
reason for me.

> >>> We don't make array values unique.  That might make query execution
> >>> performance somewhat worse, and also makes selectivity estimation
> >>> worse.  I suggest Andrei and/or Alena should implement making array
> >>> values unique.
> >> The fix Alena has made looks correct. But I urge you to think twice:
> >> The optimizer doesn't care about duplicates, so why do we do it?
> >> What's more, this optimization is intended to speed up queries with long
> >> OR lists. Using the list_append_unique() comparator on such lists could
> >> impact performance. I suggest sticking to the common rule and leaving
> >> the responsibility on the user's shoulders.
> >
> > I don't see why the optimizer doesn't care about duplicates for OR
> > lists.  As I showed before in an example, it successfully removes the
> > duplicate.  So, currently OR transformation clearly introduces a
> > regression in terms of selectivity estimation.  I think we should
> > evade that.
> I think you are right. It is probably a better place than any other to
> remove duplicates in an array. I just think we should sort and remove
> duplicates from entry->consts in one pass. Thus, this optimisation
> should be applied to sortable constants.

Ok.

> >> At least, we should do this optimization later, in one pass, with
> >> sorting elements before building the array. But what if we don't have a
> >> sort operator for the type?
> >
> > It was probably discussed before, but can we do our work later?  There
> > is a canonicalize_qual() which calls find_duplicate_ors().  This is
> > the place where currently duplicate OR clauses are removed.  Could our
> > OR-to-ANY transformation be just another call from
> > canonicalize_qual()?
> Hmm, we already tried to do it at that point. I vaguely recall some
> issues caused by this approach. Anyway, it should be done as quickly as
> possible to increase the effect of the optimization.

I think there were provided quite strong reasons why this shouldn't be
implemented at the parse analysis stage [1], [2], [3].  The
canonicalize_qual() looks quite appropriate place for that since it
does similar transformations.

Links.
1. 
https://www.postgresql.org/message-id/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAH2-Wzm2%3Dnf_JhiM3A2yetxRs8Nd2NuN3JqH%3Dfm_YWYd1oYoPg%40mail.gmail.com
3. 
https://www.postgresql.org/message-id/CA%2BTgmoaOiwMXBBTYknczepoZzKTp-Zgk5ss1%2BCuVQE-eFTqBmA%40mail.gmail.com

--
Regards,
Alexander Korotkov




Re: POC, WIP: OR-clause support for indexes

2024-03-11 Thread Andrei Lepikhov

On 11/3/2024 18:31, Alexander Korotkov wrote:

I'm not convinced about this limit. The initial reason was to combine
long lists of ORs into the array because such a transformation made at
an early stage increases efficiency.
I understand the necessity of this limit in the array decomposition
routine but not in the creation one.


The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because
N^2 algorithms could be applied to arrays.  Are you sure that's not
true for our case?
When you operate an array, indeed. But when we transform ORs to an 
array, not. Just check all the places in the optimizer and even the 
executor where we would pass along the list of ORs. This is why I think 
we should use this optimization even more intensively for huge numbers 
of ORs in an attempt to speed up the overall query.



3) Better save the original order of clauses by putting hash entries and
untransformable clauses to the same list.  A lot of differences in
regression tests output have gone.

I agree that reducing the number of changes in regression tests looks
better. But to achieve this, you introduced a hack that increases the
complexity of the code. Is it worth it? Maybe it would be better to make
one-time changes in tests instead of getting this burden on board. Or
have you meant something more introducing the node type?


For me the reason is not just a regression test.  The current code
keeps the original order of quals as much as possible.  The OR
transformation code reorders quals even in cases when it doesn't
eventually apply any optimization.  I don't think that's acceptable.
However, less hackery ways for this is welcome for sure.
Why is it unacceptable? Can the user implement some order-dependent 
logic with clauses, and will it be correct?
Otherwise, it is a matter of taste, and generally, this decision is up 
to you.



We don't make array values unique.  That might make query execution
performance somewhat worse, and also makes selectivity estimation
worse.  I suggest Andrei and/or Alena should implement making array
values unique.

The fix Alena has made looks correct. But I urge you to think twice:
The optimizer doesn't care about duplicates, so why do we do it?
What's more, this optimization is intended to speed up queries with long
OR lists. Using the list_append_unique() comparator on such lists could
impact performance. I suggest sticking to the common rule and leaving
the responsibility on the user's shoulders.


I don't see why the optimizer doesn't care about duplicates for OR
lists.  As I showed before in an example, it successfully removes the
duplicate.  So, currently OR transformation clearly introduces a
regression in terms of selectivity estimation.  I think we should
evade that.
I think you are right. It is probably a better place than any other to 
remove duplicates in an array. I just think we should sort and remove 
duplicates from entry->consts in one pass. Thus, this optimisation 
should be applied to sortable constants.





At least, we should do this optimization later, in one pass, with
sorting elements before building the array. But what if we don't have a
sort operator for the type?


It was probably discussed before, but can we do our work later?  There
is a canonicalize_qual() which calls find_duplicate_ors().  This is
the place where currently duplicate OR clauses are removed.  Could our
OR-to-ANY transformation be just another call from
canonicalize_qual()?
Hmm, we already tried to do it at that point. I vaguely recall some 
issues caused by this approach. Anyway, it should be done as quickly as 
possible to increase the effect of the optimization.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-03-11 Thread Alexander Korotkov
Hi Andrei,

Thank you for your response.

On Mon, Mar 11, 2024 at 7:13 AM Andrei Lepikhov
 wrote:
> On 7/3/2024 21:51, Alexander Korotkov wrote:
> > Hi!
> >
> > On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov
> > mailto:a.lepik...@postgrespro.ru>> wrote:
> >  > On 5/3/2024 12:30, Andrei Lepikhov wrote:
> >  > > On 4/3/2024 09:26, jian he wrote:
> >  > ... and the new version of the patchset is attached.
> >
> > I made some revisions for the patchset.
> Great!
> > 1) Use hash_combine() to combine hash values.
> Looks better
> > 2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE.
>
> I'm not convinced about this limit. The initial reason was to combine
> long lists of ORs into the array because such a transformation made at
> an early stage increases efficiency.
> I understand the necessity of this limit in the array decomposition
> routine but not in the creation one.

The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because
N^2 algorithms could be applied to arrays.  Are you sure that's not
true for our case?

> > 3) Better save the original order of clauses by putting hash entries and
> > untransformable clauses to the same list.  A lot of differences in
> > regression tests output have gone.
> I agree that reducing the number of changes in regression tests looks
> better. But to achieve this, you introduced a hack that increases the
> complexity of the code. Is it worth it? Maybe it would be better to make
> one-time changes in tests instead of getting this burden on board. Or
> have you meant something more introducing the node type?

For me the reason is not just a regression test.  The current code
keeps the original order of quals as much as possible.  The OR
transformation code reorders quals even in cases when it doesn't
eventually apply any optimization.  I don't think that's acceptable.
However, less hackery ways for this is welcome for sure.

> > We don't make array values unique.  That might make query execution
> > performance somewhat worse, and also makes selectivity estimation
> > worse.  I suggest Andrei and/or Alena should implement making array
> > values unique.
> The fix Alena has made looks correct. But I urge you to think twice:
> The optimizer doesn't care about duplicates, so why do we do it?
> What's more, this optimization is intended to speed up queries with long
> OR lists. Using the list_append_unique() comparator on such lists could
> impact performance. I suggest sticking to the common rule and leaving
> the responsibility on the user's shoulders.

I don't see why the optimizer doesn't care about duplicates for OR
lists.  As I showed before in an example, it successfully removes the
duplicate.  So, currently OR transformation clearly introduces a
regression in terms of selectivity estimation.  I think we should
evade that.

> At least, we should do this optimization later, in one pass, with
> sorting elements before building the array. But what if we don't have a
> sort operator for the type?

It was probably discussed before, but can we do our work later?  There
is a canonicalize_qual() which calls find_duplicate_ors().  This is
the place where currently duplicate OR clauses are removed.  Could our
OR-to-ANY transformation be just another call from
canonicalize_qual()?

--
Regards,
Alexander Korotkov




Re: POC, WIP: OR-clause support for indexes

2024-03-10 Thread Andrei Lepikhov

On 7/3/2024 21:51, Alexander Korotkov wrote:

Hi!

On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:

 > On 5/3/2024 12:30, Andrei Lepikhov wrote:
 > > On 4/3/2024 09:26, jian he wrote:
 > ... and the new version of the patchset is attached.

I made some revisions for the patchset.

Great!

1) Use hash_combine() to combine hash values.

Looks better

2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE.


I'm not convinced about this limit. The initial reason was to combine 
long lists of ORs into the array because such a transformation made at 
an early stage increases efficiency.
I understand the necessity of this limit in the array decomposition 
routine but not in the creation one.


3) Better save the original order of clauses by putting hash entries and 
untransformable clauses to the same list.  A lot of differences in 
regression tests output have gone.
I agree that reducing the number of changes in regression tests looks 
better. But to achieve this, you introduced a hack that increases the 
complexity of the code. Is it worth it? Maybe it would be better to make 
one-time changes in tests instead of getting this burden on board. Or 
have you meant something more introducing the node type?


We don't make array values unique.  That might make query execution 
performance somewhat worse, and also makes selectivity estimation 
worse.  I suggest Andrei and/or Alena should implement making array 
values unique.

The fix Alena has made looks correct. But I urge you to think twice:
The optimizer doesn't care about duplicates, so why do we do it?
What's more, this optimization is intended to speed up queries with long 
OR lists. Using the list_append_unique() comparator on such lists could 
impact performance. I suggest sticking to the common rule and leaving 
the responsibility on the user's shoulders.
At least, we should do this optimization later, in one pass, with 
sorting elements before building the array. But what if we don't have a 
sort operator for the type?


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-03-08 Thread jian he
+ if (!IsA(lfirst(lc), Invalid))
+ {
+ or_list = lappend(or_list, lfirst(lc));
+ continue;
+ }
 Currently `IsA(lfirst(lc)` works.
but is this generally OK? I didn't find any other examples.
do you need do cast, like `(Node *) lfirst(lc);`

If I understand the logic correctly:
In `foreach(lc, args) ` if everything goes well, it will reach
`hashkey.type = T_Invalid;`
which will make `IsA(lfirst(lc), Invalid)` be true.
adding some comments to the above code would be great.




Re: POC, WIP: OR-clause support for indexes

2024-03-07 Thread Alena Rybakina

Hi!

On 07.03.2024 17:51, Alexander Korotkov wrote:

Hi!

On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov 
 wrote:

> On 5/3/2024 12:30, Andrei Lepikhov wrote:
> > On 4/3/2024 09:26, jian he wrote:
> ... and the new version of the patchset is attached.

I made some revisions for the patchset.
1) Use hash_combine() to combine hash values.
2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE.
3) Better save the original order of clauses by putting hash entries 
and untransformable clauses to the same list.  A lot of differences in 
regression tests output have gone.

Thank you for your changes. I agree with them.


One important issue I found.

# create table t as (select i::int%100 i from generate_series(1,1) i);
# analyze t;
# explain select * from t where i = 1 or i = 1;
                     QUERY PLAN
-
 Seq Scan on t  (cost=0.00..189.00 rows=200 width=4)
   Filter: (i = ANY ('{1,1}'::integer[]))
(2 rows)

# set enable_or_transformation = false;
SET
# explain select * from t where i = 1 or i = 1;
                     QUERY PLAN
-
 Seq Scan on t  (cost=0.00..189.00 rows=100 width=4)
   Filter: (i = 1)
(2 rows)

We don't make array values unique.  That might make query execution 
performance somewhat worse, and also makes selectivity estimation 
worse.  I suggest Andrei and/or Alena should implement making array 
values unique.



I have corrected this and some spelling mistakes. The 
unique_any_elements_change.no-cfbot file contains changes.


While I was correcting the test results caused by such changes, I 
noticed that the same behavior was when converting the IN expression,

and this can be seen in the result of the regression test:

 EXPLAIN (COSTS OFF)
 SELECT unique2 FROM onek2
 WHERE stringu1 IN ('A', 'A') AND (stringu1 = 'A' OR stringu1 = 'A');
    QUERY PLAN
---
  Bitmap Heap Scan on onek2
    Recheck Cond: (stringu1 < 'B'::name)
   Filter: ((stringu1 = ANY ('{A,A}'::name[])) AND (stringu1 = 'A'::name))
    ->  Bitmap Index Scan on onek2_u2_prtl
 (4 rows)

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index baeb83e58b..e9813ef6ab 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -312,8 +312,8 @@ TransformOrExprToANY(ParseState *pstate, List *args, int 
location)
 
if (unlikely(found))
{
-   entry->consts = lappend(entry->consts, const_expr);
-   entry->exprs = lappend(entry->exprs, orqual);
+   entry->consts = list_append_unique(entry->consts, 
const_expr);
+   entry->exprs = list_append_unique(entry->exprs, orqual);
}
else
{
@@ -352,7 +352,6 @@ TransformOrExprToANY(ParseState *pstate, List *args, int 
location)
entry = (OrClauseGroupEntry *) lfirst(lc);
 
Assert(list_length(entry->consts) > 0);
-   Assert(list_length(entry->exprs) == list_length(entry->consts));
 
if (list_length(entry->consts) == 1 ||
list_length(entry->consts) > MAX_SAOP_ARRAY_SIZE)
diff --git a/src/test/regress/expected/create_index.out 
b/src/test/regress/expected/create_index.out
index 7b721bac71..606a4399f9 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1915,16 +1915,16 @@ SELECT count(*) FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand < 42 OR thousand < 99 OR 43 > thousand OR 
42 > thousand);
-QUERY PLAN 
   
---
+  QUERY PLAN   

+---
  Aggregate
->  Bitmap Heap Scan on tenk1
- Recheck Cond: ((hundred = 42) AND (thousand < ANY 
('{42,99,43,42}'::integer[])))
+ Recheck Cond: ((hundred = 42) AND (thousand < ANY 
('{42,99,43}'::integer[])))
  ->  BitmapAnd
->  Bitmap Index Scan on tenk1_hundred
  Index Cond: (hundred = 42)
->  Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: (thousand < ANY ('{42,99,43,42}'::integer[]))
+ Index Cond: (thousand < ANY ('{42,99,43}'::integer[]))
 (8 rows)
 
 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/expected/select.out 
b/src/test/regress/expected/select.out
index 0ebaf002e8..f4f5493c43 100644
--- 

Re: POC, WIP: OR-clause support for indexes

2024-03-07 Thread Alexander Korotkov
Hi!

On Tue, Mar 5, 2024 at 9:59 AM Andrei Lepikhov 
wrote:
> On 5/3/2024 12:30, Andrei Lepikhov wrote:
> > On 4/3/2024 09:26, jian he wrote:
> ... and the new version of the patchset is attached.

I made some revisions for the patchset.
1) Use hash_combine() to combine hash values.
2) Upper limit the number of array elements by MAX_SAOP_ARRAY_SIZE.
3) Better save the original order of clauses by putting hash entries and
untransformable clauses to the same list.  A lot of differences in
regression tests output have gone.

One important issue I found.

# create table t as (select i::int%100 i from generate_series(1,1) i);
# analyze t;
# explain select * from t where i = 1 or i = 1;
 QUERY PLAN
-
 Seq Scan on t  (cost=0.00..189.00 rows=200 width=4)
   Filter: (i = ANY ('{1,1}'::integer[]))
(2 rows)

# set enable_or_transformation = false;
SET
# explain select * from t where i = 1 or i = 1;
 QUERY PLAN
-
 Seq Scan on t  (cost=0.00..189.00 rows=100 width=4)
   Filter: (i = 1)
(2 rows)

We don't make array values unique.  That might make query execution
performance somewhat worse, and also makes selectivity estimation worse.  I
suggest Andrei and/or Alena should implement making array values unique.

--
Regards,
Alexander Korotkov


v20-0002-Teach-generate_bitmap_or_paths-to-build-BitmapOr.patch
Description: Binary data


v20-0001-Transform-OR-clauses-to-ANY-expression.patch
Description: Binary data


Re: POC, WIP: OR-clause support for indexes

2024-03-04 Thread Andrei Lepikhov

On 5/3/2024 12:30, Andrei Lepikhov wrote:

On 4/3/2024 09:26, jian he wrote:

... and the new version of the patchset is attached.

--
regards,
Andrei Lepikhov
Postgres Professional
From 1c3ac3e006cd66ff40f1ddaaa09e3fc0f3a75ca5 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clauses to ANY expression.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) 
on the
preliminary stage of optimization when we are still working with the
expression tree.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 doc/src/sgml/config.sgml  |  17 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 339 ++
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 +++-
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 +--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 22 files changed, 906 insertions(+), 69 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index c355e8f3f7..0523bbd8f7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN 
(SELECT * FROM ft5 WHERE
  Foreign Scan
Output: t1.c1, t1.c2, ft5.c1, ft5.c2
Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 < 
10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 
IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 
WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3105,7 +3105,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full 
join ft5 t2 on (t1.c1 = t2
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE (((r1.c1 < 
20) OR ((r1.c1 IS NULL) AND (r2.c1 < 5 GROUP BY 2 ORDER BY 
array_agg(DISTINCT (r1.c1 % 5)) ASC NULLS LAST
+   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE r1.c1 IS 
NULL) AND (r2.c1 < 5)) OR (r1.c1 < 20))) GROUP BY 2 ORDER BY array_agg(DISTINCT 
(r1.c1 % 5)) ASC NULLS LAST
 (4 rows)
 
 select array_agg(distinct (t1.c1)%5) from ft4 t1 full join ft5 t2 on (t1.c1 = 
t2.c1) where t1.c1 < 20 or (t1.c1 is null and t2.c1 < 5) group by (t2.c1)%3 
order by 1;
@@ -3123,7 +3123,7 @@ select array_agg(distinct (t1.c1)%5 order by (t1.c1)%5) 
from ft4 t1 full join ft
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5) ORDER BY (t1.c1 % 5))), ((t2.c1 % 
3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5) ORDER BY ((r1.c1 % 5)) 
ASC NULLS LAST), (r2.c1 % 3) FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON 
(((r1.c1 = r2.c1 WHERE (((r1.c1 < 20) OR ((r1.c1 IS NULL) AND (r2.c1 < 
5 GROUP BY 2 ORDER BY 

Re: POC, WIP: OR-clause support for indexes

2024-03-04 Thread Andrei Lepikhov

On 4/3/2024 09:26, jian he wrote:

On Thu, Feb 29, 2024 at 4:59 PM Andrei Lepikhov

Feel free to add, change or totally rewrite these changes.

On replacement of static ScalarArrayOpExpr dest with dynamic allocated one:
After discussion [1] I agree with that replacement.

Some style (and language) changes in comments I haven't applied because 
it looks debatable for me.



I think it should be something like:
+ gettext_noop("Transform a sequence of OR expressions to an array
expression."),
+ gettext_noop("The planner will replace expression like 'x=c1 OR x=c2 "
+ "to the expression 'x = ANY( ARRAY[c1,c2])''"

Fixed


queryId may not be a good variable name here?
Not sure. QueryId is a concept, part of queryjumble technique and can be 
used by other tools. It just tells the developer what it is the same 
thing as Query Jumbling but for a separate expression.
At least you don't insist on removing of JumbleState return pointer that 
looks strange for a simple hash ...


comment `/* Compute query ID */`
seems not correct, here we are just hashing the expression?

The same as above.

+/*
+ * Dynahash match function to use in guc_hashtab
the above comments seem not correct?

Yes, fixed.


` It applies to equality expressions only.` seems not correct?
`select * from tenk1 where unique1 < 1 or unique1 < 2; ` can also do
the transformation.

Yes, I forgot it.

`similarity of variable sides.` seems not correct,
should  it be 'sameness of the variable sides`?

The term 'equivalence' looks better *).


in [2], we can get:
SOME is a synonym for ANY. IN is equivalent to = ANY.

but still transforming OR to ANY is not intuitive.
a normal user may not know what is "transforming OR to ANY".
so maybe adding a simple example at

would be great. which, I did at previous thread.
Not sure. Examples in that section are unusual things. What's more, 
should a user who doesn't know what it means to change this setting? 
Let's wait for other opinions.


[1] https://www.postgresql.org/message-id/2157387.1709068...@sss.pgh.pa.us

--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-03-03 Thread Andrei Lepikhov

On 1/3/2024 22:33, Alexander Korotkov wrote:

I'm going to review and revise the patch.

Nice!


One question I have yet.

 >        /*
 >         * Transformation only works with both side type is not
 >         * { array | composite | domain | record }.

Why do we limit transformation for these types?  Also, it doesn't seem 
the current code restricts anything except composite/record.

Answer can be a bit long. Let's try to see comment a4424c5 at first.

We forbid record types although they can have typarray. It is because of 
the RowExpr comparison specific. Although we have the record_eq() 
routine, all base types in comparing records must be strictly the same. 
Let me show:


explain analyze
SELECT * FROM
(SELECT ROW(relpages,relnatts) AS x FROM pg_class LIMIT 10) AS q1,
(SELECT ROW(relpages,relallvisible) AS x FROM pg_class LIMIT 10) AS q2
WHERE q1.x=q2.x;
ERROR:  cannot compare dissimilar column types smallint and integer at 
record column 2


As you can see, we have smallint and integer in the second position of 
RowExpr and it causes the ERROR. It is the reason, why PostgreSQL 
transforms ROW expressions to the series of ORs, Look:


explain (costs off)
SELECT oid,relname FROM pg_class
WHERE (oid,relname) IN ((1, 'a'), (2,'b'));

 Bitmap Heap Scan on pg_class
   Recheck Cond: ((relname = 'a'::name) OR (relname = 'b'::name))
   Filter: (((oid = '1'::oid) AND (relname = 'a'::name)) OR ((oid = 
'2'::oid) AND (relname = 'b'::name)))

   ->  BitmapOr
...

So, transforming composite types to the ScalarArrayOpExpr expression 
doesn't make sense. Am I wrong?


The same with domain. If it have composite base type we reject the 
transformation according to the logic above.


What about arrays? As I see, arrays don't have typarray and we can avoid 
to spend more cycles after detection of TYPCATEGORY_ARRAY. I haven't 
done it yet because have a second thought: what if to combine arrays 
into the larger one? I'm unsure on that, so we can forbid it too.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-03-03 Thread jian he
On Thu, Feb 29, 2024 at 4:59 PM Andrei Lepikhov
 wrote:
>
> On 28/2/2024 17:27, Alena Rybakina wrote:
> > Maybe like that:
> >
> > It also considers the way to generate a path using BitmapScan indexes,
> > converting the transformed expression into expressions separated by "OR"
> > operations, and if it turns out to be the best and finally selects the
> > best one.
> Thanks,
> I spent some time describing the feature with documentation.
> A condensed description of the GUC is in the runtime-config file.
> Feature description has spread between TransformOrExprToANY and
> generate_saop_pathlist routines.
> Also, I've made tiny changes in the code to look more smoothly.
> All modifications are integrated into the two new patches.
>
> Feel free to add, change or totally rewrite these changes.

diff --git a/src/backend/utils/misc/guc_tables.c
b/src/backend/utils/misc/guc_tables.c
index 93ded31ed9..7d3a1ca238 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1026,6 +1026,17 @@ struct config_bool ConfigureNamesBool[] =
  true,
  NULL, NULL, NULL
  },
+ {
+ {"enable_or_transformation", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Transform a sequence of OR clauses to an IN expression."),
+ gettext_noop("The planner will replace clauses like 'x=c1 OR x=c2 .."
+ "to the clause 'x IN (c1,c2,...)'"),
+ GUC_EXPLAIN
+ },
+ _or_transformation,
+ true,
+ NULL, NULL, NULL
+ },

I think it should be something like:
+ gettext_noop("Transform a sequence of OR expressions to an array
expression."),
+ gettext_noop("The planner will replace expression like 'x=c1 OR x=c2 "
+ "to the expression 'x = ANY( ARRAY[c1,c2])''"

+JumbleState *
+JumbleExpr(Expr *expr, uint64 *queryId)
+{
+ JumbleState *jstate = NULL;
+
+ Assert(queryId != NULL);
+
+ jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+ /* Set up workspace for query jumbling */
+ jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+ jstate->jumble_len = 0;
+ jstate->clocations_buf_size = 32;
+ jstate->clocations = (LocationLen *)
+ palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+ jstate->clocations_count = 0;
+ jstate->highest_extern_param_id = 0;
+
+ /* Compute query ID */
+ _jumbleNode(jstate, (Node *) expr);
+ *queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+ jstate->jumble_len,
+ 0));
+
+ return jstate;
+}
queryId may not be a good variable name here?

comment `/* Compute query ID */`
seems not correct, here we are just hashing the expression?


+/*
+ * Dynahash match function to use in guc_hashtab
+ */
+static int
+orclause_match(const void *data1, const void *data2, Size keysize)
+{
+ OrClauseGroupKey   *key1 = (OrClauseGroupKey *) data1;
+ OrClauseGroupKey   *key2 = (OrClauseGroupKey *) data2;
+
+ Assert(sizeof(OrClauseGroupKey) == keysize);
+
+ if (key1->opno == key2->opno && key1->exprtype == key2->exprtype &&
+ equal(key1->expr, key2->expr))
+ return 0;
+
+ return 1;
+}
the above comments seem not correct?

   
Enables or disables the query planner's ability to lookup and
group multiple
similar OR expressions to ANY () expressions.
The grouping technique of this transformation is based on the
similarity of variable sides.
It applies to equality expressions only. One side of such an expression
must be a constant clause, and the other must contain a variable clause.
The default is on.
Also, during BitmapScan paths generation it enables analysis of elements
of IN or ANY constant arrays to cover such clause with BitmapOr set of
partial index scans.

` It applies to equality expressions only.` seems not correct?
`select * from tenk1 where unique1 < 1 or unique1 < 2; ` can also do
the transformation.
`similarity of variable sides.` seems not correct,
should  it be 'sameness of the variable sides`?

in [1], we can get:
expression IN (value [, ...])
is equivalent to
expression = value1
OR
expression = value2
OR

in [2], we can get:
SOME is a synonym for ANY. IN is equivalent to = ANY.

but still transforming OR to ANY is not intuitive.
a normal user may not know what is "transforming OR to ANY".
so maybe adding a simple example at

would be great. which, I did at previous thread.

I also did some refactoring based on v18, attached.

[1] 
https://www.postgresql.org/docs/current/functions-comparisons.html#FUNCTIONS-COMPARISONS-IN-SCALAR
[2] 
https://www.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-ANY-SOME


v18-0001-Minor-miscellaneous-refactor-based-on-v18.no-cfbot
Description: Binary data


Re: POC, WIP: OR-clause support for indexes

2024-03-03 Thread Alena Rybakina
Sorry, I found explanation - 
https://www.postgresql.org/message-id/CACJufxFS-xcjaWq2Du2OyJUjRAyqCk12Q_zGOPxv61sgrdpw9w%40mail.gmail.com


On 03.03.2024 12:26, Alena Rybakina wrote:


I found that it was mentioned here - 
https://www.postgresql.org/message-id/CACJufxFrZS07oBHMk1_c8P3A84VZ3ysXiZV8NeU6gAnvu%2BHsVA%40mail.gmail.com.


To be honest, I couldn't find any explanation for that.

On 01.03.2024 18:33, Alexander Korotkov wrote:

Hi, Andrei,
Hi, Alena!

On Thu, Feb 29, 2024 at 10:59 AM Andrei Lepikhov 
 wrote:


On 28/2/2024 17:27, Alena Rybakina wrote:
> Maybe like that:
>
> It also considers the way to generate a path using BitmapScan
indexes,
> converting the transformed expression into expressions
separated by "OR"
> operations, and if it turns out to be the best and finally
selects the
> best one.
Thanks,
I spent some time describing the feature with documentation.
A condensed description of the GUC is in the runtime-config file.
Feature description has spread between TransformOrExprToANY and
generate_saop_pathlist routines.
Also, I've made tiny changes in the code to look more smoothly.
All modifications are integrated into the two new patches.

Feel free to add, change or totally rewrite these changes.


I'm going to review and revise the patch.

One question I have yet.

>        /*
>         * Transformation only works with both side type is not
>         * { array | composite | domain | record }.

Why do we limit transformation for these types?  Also, it doesn't 
seem the current code restricts anything except composite/record.


--
Regards,
Alexander Korotkov

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-03-03 Thread Alena Rybakina
I found that it was mentioned here - 
https://www.postgresql.org/message-id/CACJufxFrZS07oBHMk1_c8P3A84VZ3ysXiZV8NeU6gAnvu%2BHsVA%40mail.gmail.com.


To be honest, I couldn't find any explanation for that.

On 01.03.2024 18:33, Alexander Korotkov wrote:

Hi, Andrei,
Hi, Alena!

On Thu, Feb 29, 2024 at 10:59 AM Andrei Lepikhov 
 wrote:


On 28/2/2024 17:27, Alena Rybakina wrote:
> Maybe like that:
>
> It also considers the way to generate a path using BitmapScan
indexes,
> converting the transformed expression into expressions separated
by "OR"
> operations, and if it turns out to be the best and finally
selects the
> best one.
Thanks,
I spent some time describing the feature with documentation.
A condensed description of the GUC is in the runtime-config file.
Feature description has spread between TransformOrExprToANY and
generate_saop_pathlist routines.
Also, I've made tiny changes in the code to look more smoothly.
All modifications are integrated into the two new patches.

Feel free to add, change or totally rewrite these changes.


I'm going to review and revise the patch.

One question I have yet.

>        /*
>         * Transformation only works with both side type is not
>         * { array | composite | domain | record }.

Why do we limit transformation for these types?  Also, it doesn't seem 
the current code restricts anything except composite/record.


--
Regards,
Alexander Korotkov


--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-03-01 Thread Alexander Korotkov
Hi, Andrei,
Hi, Alena!

On Thu, Feb 29, 2024 at 10:59 AM Andrei Lepikhov 
wrote:

> On 28/2/2024 17:27, Alena Rybakina wrote:
> > Maybe like that:
> >
> > It also considers the way to generate a path using BitmapScan indexes,
> > converting the transformed expression into expressions separated by "OR"
> > operations, and if it turns out to be the best and finally selects the
> > best one.
> Thanks,
> I spent some time describing the feature with documentation.
> A condensed description of the GUC is in the runtime-config file.
> Feature description has spread between TransformOrExprToANY and
> generate_saop_pathlist routines.
> Also, I've made tiny changes in the code to look more smoothly.
> All modifications are integrated into the two new patches.
>
> Feel free to add, change or totally rewrite these changes.
>

I'm going to review and revise the patch.

One question I have yet.

>/*
> * Transformation only works with both side type is not
> * { array | composite | domain | record }.

Why do we limit transformation for these types?  Also, it doesn't seem the
current code restricts anything except composite/record.

--
Regards,
Alexander Korotkov


Re: POC, WIP: OR-clause support for indexes

2024-02-29 Thread Andrei Lepikhov

On 28/2/2024 17:27, Alena Rybakina wrote:

Maybe like that:

It also considers the way to generate a path using BitmapScan indexes, 
converting the transformed expression into expressions separated by "OR" 
operations, and if it turns out to be the best and finally selects the 
best one.

Thanks,
I spent some time describing the feature with documentation.
A condensed description of the GUC is in the runtime-config file.
Feature description has spread between TransformOrExprToANY and 
generate_saop_pathlist routines.

Also, I've made tiny changes in the code to look more smoothly.
All modifications are integrated into the two new patches.

Feel free to add, change or totally rewrite these changes.

--
regards,
Andrei Lepikhov
Postgres Professional
From 015a564cc784139c806a7004f25bf5f7a4b4a29d Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the
preliminary stage of optimization when we are still working with the tree
expression.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 doc/src/sgml/config.sgml  |  18 +
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 340 ++
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 +++-
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 +--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 22 files changed, 908 insertions(+), 69 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index c355e8f3f7..0523bbd8f7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN 
(SELECT * FROM ft5 WHERE
  Foreign Scan
Output: t1.c1, t1.c2, ft5.c1, ft5.c2
Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 < 
10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 
IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 
WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3105,7 +3105,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full 
join ft5 t2 on (t1.c1 = t2
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE (((r1.c1 < 
20) OR ((r1.c1 IS NULL) AND (r2.c1 < 5 GROUP BY 2 ORDER BY 
array_agg(DISTINCT (r1.c1 % 5)) ASC NULLS LAST
+   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE r1.c1 IS 
NULL) AND (r2.c1 < 5)) OR (r1.c1 < 20))) GROUP BY 2 ORDER BY array_agg(DISTINCT 
(r1.c1 % 5)) ASC NULLS LAST
 (4 rows)
 
 select array_agg(distinct (t1.c1)%5) from ft4 t1 full join ft5 t2 on (t1.c1 = 
t2.c1) where t1.c1 < 20 or (t1.c1 is null and t2.c1 < 5) group by 

Re: POC, WIP: OR-clause support for indexes

2024-02-28 Thread Andrei Lepikhov

On 28/2/2024 17:07, jian he wrote:

doc/src/sgml/array.sgml corresponds to
https://www.postgresql.org/docs/current/arrays.html.
  this GUC is related to parser|optimzier.
adding a GUC to array.sgml seems strange. (I think).
Maybe. In that case, I suggest adding extended comments to functions 
transformBoolExprOr and generate_saop_pathlist (including 
cross-referencing each other). These are starting points to understand 
the transformation and, therefore, a good place for a detailed explanation.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-28 Thread Alena Rybakina



On 28.02.2024 13:07, jian he wrote:

On Wed, Feb 28, 2024 at 12:19 PM Andrei Lepikhov
 wrote:

On 26/2/2024 11:10, Alena Rybakina wrote:

On 24.02.2024 14:28, jian he wrote:

Hi.
I wrote the first draft patch of the documentation.
it's under the section: Planner Method Configuration
(runtime-config-query.html)
but this feature's main meat is in src/backend/parser/parse_expr.c
so it may be slightly inconsistent, as mentioned by others.

You can further furnish it.

Thank you for your work. I found a few spelling mistakes - I fixed this
and added some information about generating a partial index plan. I
attached it.

Thanks Jian and Alena,
It is a good start for the documentation. But I think the runtime-config
section needs only a condensed description of a feature underlying the
GUC. The explanations in this section look a bit awkward.
Having looked through the documentation for a better place for a
detailed explanation, I found array.sgml as a candidate. Also, we have
the parser's short overview section. I'm unsure about the best place but
it is better when the server config section.

doc/src/sgml/array.sgml corresponds to
https://www.postgresql.org/docs/current/arrays.html.
  this GUC is related to parser|optimzier.
adding a GUC to array.sgml seems strange. (I think).


I suggest describing our feature in array.sgml and mentioning a GUC there.

We can describe a GUC in config.sgml.


2. We should describe the second part of the feature, where the
optimiser can split an array to fit the optimal BitmapOr scan path.

we can add a sentence explaining that:
it may not do the expression transformation when the original
expression can be utilized by index mechanism.
I am not sure how to rephrase it.


Maybe like that:

It also considers the way to generate a path using BitmapScan indexes, 
converting the transformed expression into expressions separated by "OR" 
operations, and if it turns out to be the best and finally selects the 
best one.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: POC, WIP: OR-clause support for indexes

2024-02-28 Thread jian he
On Wed, Feb 28, 2024 at 12:19 PM Andrei Lepikhov
 wrote:
>
> On 26/2/2024 11:10, Alena Rybakina wrote:
> > On 24.02.2024 14:28, jian he wrote:
> >> Hi.
> >> I wrote the first draft patch of the documentation.
> >> it's under the section: Planner Method Configuration
> >> (runtime-config-query.html)
> >> but this feature's main meat is in src/backend/parser/parse_expr.c
> >> so it may be slightly inconsistent, as mentioned by others.
> >>
> >> You can further furnish it.
> >
> > Thank you for your work. I found a few spelling mistakes - I fixed this
> > and added some information about generating a partial index plan. I
> > attached it.
> Thanks Jian and Alena,
> It is a good start for the documentation. But I think the runtime-config
> section needs only a condensed description of a feature underlying the
> GUC. The explanations in this section look a bit awkward.
> Having looked through the documentation for a better place for a
> detailed explanation, I found array.sgml as a candidate. Also, we have
> the parser's short overview section. I'm unsure about the best place but
> it is better when the server config section.

doc/src/sgml/array.sgml corresponds to
https://www.postgresql.org/docs/current/arrays.html.
 this GUC is related to parser|optimzier.
adding a GUC to array.sgml seems strange. (I think).

> 2. We should describe the second part of the feature, where the
> optimiser can split an array to fit the optimal BitmapOr scan path.

we can add a sentence explaining that:
it may not do the expression transformation when the original
expression can be utilized by index mechanism.
I am not sure how to rephrase it.




Re: POC, WIP: OR-clause support for indexes

2024-02-27 Thread Andrei Lepikhov

On 26/2/2024 11:10, Alena Rybakina wrote:

On 24.02.2024 14:28, jian he wrote:

Hi.
I wrote the first draft patch of the documentation.
it's under the section: Planner Method Configuration 
(runtime-config-query.html)

but this feature's main meat is in src/backend/parser/parse_expr.c
so it may be slightly inconsistent, as mentioned by others.

You can further furnish it.


Thank you for your work. I found a few spelling mistakes - I fixed this 
and added some information about generating a partial index plan. I 
attached it.

Thanks Jian and Alena,
It is a good start for the documentation. But I think the runtime-config 
section needs only a condensed description of a feature underlying the 
GUC. The explanations in this section look a bit awkward.
Having looked through the documentation for a better place for a 
detailed explanation, I found array.sgml as a candidate. Also, we have 
the parser's short overview section. I'm unsure about the best place but 
it is better when the server config section.

What's more, there are some weak points in the documentation:
1. We choose constant and variable parts of an expression and don't 
require the constant to be on the right side.
2. We should describe the second part of the feature, where the 
optimiser can split an array to fit the optimal BitmapOr scan path.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-25 Thread Alena Rybakina

On 24.02.2024 14:28, jian he wrote:

Hi.
I wrote the first draft patch of the documentation.
it's under the section: Planner Method Configuration (runtime-config-query.html)
but this feature's main meat is in src/backend/parser/parse_expr.c
so it may be slightly inconsistent, as mentioned by others.

You can further furnish it.


Thank you for your work. I found a few spelling mistakes - I fixed this 
and added some information about generating a partial index plan. I 
attached it.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From e3a0e01c43a70099f6870a468d0cc3a8bdcb2775 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Mon, 26 Feb 2024 06:37:36 +0300
Subject: [PATCH] doc1

---
 doc/src/sgml/config.sgml | 74 
 1 file changed, 74 insertions(+)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 36a2a5ce431..47f82ca2dc9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5294,6 +5294,80 @@ ANY num_sync ( 
+  enable_or_transformation (boolean)
+   
+enable_or_transformation configuration 
parameter
+   
+  
+  
+   
+Enables or disables in query planner's ability to transform multiple 
expressions in ()
+ to a ANY expression ().
+This transformations only apply under the following condition:
+
+  
+  
+Each expression should be formed as: 
expression operator  
(expression).
+The right-hand side of the operator should be just a plain 
constant.
+The left-hand side of these expressions should remain unchanged.
+  
+  
+
+
+  
+  
+At the stage of index formation, a check is made on the 
restructuring of the plan using partial indexes or the formation of expressions 
combined by the "OR" operator.
+  
+  
+
+
+  
+
+  Each expression form should return Boolean (true/false) 
result.
+
+  
+
+
+  
+
+These expressions are logically linked in a OR condition.
+
+  
+
+  The default is on.
+  
+
+  For example, the following query without setting 
enable_or_transformation is usually applied to the three 
filtering conditions separately,
+  but if we set enable_or_transformation, we 
combine the three expressions into only one expression: unique1 = ANY 
('{1,2,3}'::integer[]) .
+  
+  EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 1 or unique1 = 2 or 
unique1 = 3;
+
+  QUERY PLAN
+  -
+  Seq Scan on tenk1  (cost=0.00..482.50 rows=3 width=244)
+Filter: (unique1 = ANY ('{1,2,3}'::integer[]))
+  
+
+
+  Another example is the following query with a given 
enable_or_transformation value, but we have generated a plan 
with partial indexes.
+  
+  EXPLAIN SELECT unique2, stringu1 FROM onek2 WHERE unique1 = 1 OR 
unique1 = PI()::integer;
+  QUERY PLAN
+  --
+  Bitmap Heap Scan on onek2
+Recheck Cond: ((unique1 = 3) OR (unique1 = 1))
+->  BitmapOr
+  ->  Bitmap Index Scan on onek2_u1_prtl
+Index Cond: (unique1 = 3)
+  ->  Bitmap Index Scan on onek2_u1_prtl
+Index Cond: (unique1 = 1)
+  (7 rows)
+  
+
+  
+ 
+
  
   enable_parallel_append (boolean)
   
-- 
2.34.1



Re: POC, WIP: OR-clause support for indexes

2024-02-24 Thread jian he
Hi.
I wrote the first draft patch of the documentation.
it's under the section: Planner Method Configuration (runtime-config-query.html)
but this feature's main meat is in src/backend/parser/parse_expr.c
so it may be slightly inconsistent, as mentioned by others.

You can further furnish it.


v1-0001-Add-enable_or_transformation-doc-entry.no-cfbot
Description: Binary data


Re: POC, WIP: OR-clause support for indexes

2024-02-20 Thread Ranier Vilela
Em ter., 20 de fev. de 2024 às 00:18, Andrei Lepikhov <
a.lepik...@postgrespro.ru> escreveu:

> On 19/2/2024 19:53, Ranier Vilela wrote:
> > v17-0002
> > 1) move the vars *arrayconst and *dest, to after if, to avoid makeNode
> > (palloc).
> > + Const   *arrayconst;
> > + ScalarArrayOpExpr  *dest;
> > +
> > + pd = (PredicatesData *) lfirst(lc);
> > + if (pd->elems == NIL)
> > + /* The index doesn't participate in this operation */
> > + continue;
> >
> > + arrayconst = lsecond_node(Const, saop->args);
> > + dest = makeNode(ScalarArrayOpExpr);
> Thanks for the review!
> I'm not sure I understand you clearly. Does the patch in attachment fix
> the issue you raised?
>
Sorry if I wasn't clear.
What I meant is to move the initializations of the variables *arrayconst*
and *dest*
for after the test (if (pd->elems == NIL)
To avoid unnecessary initialization if the test fails.

best regards,
Ranier Vilela


Re: POC, WIP: OR-clause support for indexes

2024-02-19 Thread Andrei Lepikhov

On 20/2/2024 11:03, jian he wrote:

Neither the code comments nor the commit message really explain the
design idea here. That's unfortunate, principally because it makes
review difficult.

I'm very skeptical about the idea of using JumbleExpr for any part of
this. It seems fairly expensive, and it might produce false matches.
If expensive is OK, then why not just use equal()? If it's not, then
this probably isn't really OK either. But in any case there should be
comments explaining why this strategy was chosen.


The above message
(https://postgr.es/m/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com)
seems still not answered.
How can we evaluate whether JumbleExpr is expensive or not?
I used this naive script to test, but didn't find a big difference
when enable_or_transformation is ON or OFF.
First, I am open to discussion here. But IMO, equal() operation is quite 
expensive by itself. We should use the hash table approach to avoid 
quadratic behaviour when looking for similar clauses in the 'OR' list.
Moreover, we use equal() in many places: selectivity estimations, 
proving of fitting the index, predtest, etc. So, by reducing the clause 
list, we eliminate many calls of the equal() routine, too.



`leftop operator rightop`
the operator can also be volatile.
Do we need to check (op_volatile(opno) == PROVOLATILE_VOLATILE) within
transformBoolExprOr?

As usual, could you provide a test case to discuss it more objectively?

--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-19 Thread jian he
On Mon, Feb 19, 2024 at 4:35 PM Andrei Lepikhov
 wrote:
>
> In attachment - v17 for both patches. As I see it, the only general
> explanation of the idea is not addressed. I'm not sure how deeply we
> should explain it.


> On Tue, Nov 28, 2023 at 5:04 AM Robert Haas  wrote:
>
> On Mon, Nov 27, 2023 at 3:02 AM Andrei Lepikhov
>  wrote:
> > On 25/11/2023 08:23, Alexander Korotkov wrote:
> > > I think patch certainly gets better in this aspect.  One thing I can't
> > > understand is why do we use home-grown code for resolving
> > > hash-collisions.  You can just define custom hash and match functions
> > > in HASHCTL.  Even if we need to avoid repeated JumbleExpr calls, we
> > > still can save pre-calculated hash value into hash entry and use
> > > custom hash and match.  This doesn't imply us to write our own
> > > collision-resolving code.
> >
> > Thanks, it was an insightful suggestion.
> > I implemented it, and the code has become shorter (see attachment).
>
> Neither the code comments nor the commit message really explain the
> design idea here. That's unfortunate, principally because it makes
> review difficult.
>
> I'm very skeptical about the idea of using JumbleExpr for any part of
> this. It seems fairly expensive, and it might produce false matches.
> If expensive is OK, then why not just use equal()? If it's not, then
> this probably isn't really OK either. But in any case there should be
> comments explaining why this strategy was chosen.

The above message
(https://postgr.es/m/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com)
seems still not answered.
How can we evaluate whether JumbleExpr is expensive or not?
I used this naive script to test, but didn't find a big difference
when enable_or_transformation is ON or OFF.

`
create table test_1_100 as (select (random()*1000)::int x,
(random()*1000) y from generate_series(1,1_000_000) i);
explain(costs off, analyze)
select * from test
where x = 1 or x + 2= 3 or x + 3= 4 or x + 4= 5
or x + 5= 6 or x + 6= 7 or x + 7= 8 or x + 8= 9 or x + 9=10
or x + 10= 11 or x + 11= 12 or x + 12= 13 or x + 13= 14
or x + 14= 15 or x + 15= 16 or x + 16= 17 or x + 17= 18
or x + 18=19 or x + 19= 20 or x + 20= 21 or x + 21= 22
or x + 22= 23 or x + 23= 24 or x + 24= 25 or x + 25= 26
or x + 26= 27 or x + 27=28 or x + 28= 29 or x + 29= 30
or x + 30= 31 \watch i=0.1 c=10
`

`leftop operator rightop`
the operator can also be volatile.
Do we need to check (op_volatile(opno) == PROVOLATILE_VOLATILE) within
transformBoolExprOr?




Re: POC, WIP: OR-clause support for indexes

2024-02-19 Thread Andrei Lepikhov

On 19/2/2024 19:53, Ranier Vilela wrote:

v17-0002
1) move the vars *arrayconst and *dest, to after if, to avoid makeNode 
(palloc).

+ Const   *arrayconst;
+ ScalarArrayOpExpr  *dest;
+
+ pd = (PredicatesData *) lfirst(lc);
+ if (pd->elems == NIL)
+ /* The index doesn't participate in this operation */
+ continue;

+ arrayconst = lsecond_node(Const, saop->args);
+ dest = makeNode(ScalarArrayOpExpr);

Thanks for the review!
I'm not sure I understand you clearly. Does the patch in attachment fix 
the issue you raised?


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 56b04541db..1545876e2f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -1284,7 +1284,7 @@ build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, 
RestrictInfo *rinfo,
ArrayType  *arrayval = NULL;
ArrayExpr  *arr = NULL;
Const  *arrayconst = lsecond_node(Const, 
saop->args);
-   ScalarArrayOpExpr  *dest = makeNode(ScalarArrayOpExpr);
+   ScalarArrayOpExpr   dest;
 
pd = (PredicatesData *) lfirst(lc);
if (pd->elems == NIL)
@@ -1302,10 +1302,10 @@ build_paths_for_SAOP(PlannerInfo *root, RelOptInfo 
*rel, RestrictInfo *rinfo,
arr->multidims = false;
 
/* Compose new SAOP, partially covering the source one */
-   memcpy(dest, saop, sizeof(ScalarArrayOpExpr));
-   dest->args = list_make2(linitial(saop->args), arr);
+   memcpy(, saop, sizeof(ScalarArrayOpExpr));
+   dest.args = list_make2(linitial(saop->args), arr);
 
-   clause = (Expr *) estimate_expression_value(root, (Node *) 
dest);
+   clause = (Expr *) estimate_expression_value(root, (Node *) 
);
 
/*
 * Create new RestrictInfo. It maybe more heavy than just copy 
node,


Re: POC, WIP: OR-clause support for indexes

2024-02-19 Thread Ranier Vilela
Em seg., 19 de fev. de 2024 às 05:35, Andrei Lepikhov <
a.lepik...@postgrespro.ru> escreveu:

> On 16/2/2024 19:54, jian he wrote:
> > After setting these parameters, overall enable_or_transformation ON is
> > performance better.
> > sorry for the noise.
> Don't worry, at least we know a weak point of partial paths estimation.
> > so now I didn't find any corner case where enable_or_transformation is
> > ON peforms worse than when it's OFF.
> >
> > +typedef struct OrClauseGroupEntry
> > +{
> > + OrClauseGroupKey key;
> > +
> > + Node   *node;
> > + List   *consts;
> > + Oid scalar_type;
> > + List   *exprs;
> > +} OrClauseGroupEntry;
> >
> > I found that the field `scalar_type` was never used.
> Thanks, fixed.
>
Not that it will make a big difference, but it would be good to avoid, I
think.

v17-0002
1) move the vars *arrayconst and *dest, to after if, to avoid makeNode
(palloc).
+ Const   *arrayconst;
+ ScalarArrayOpExpr  *dest;
+
+ pd = (PredicatesData *) lfirst(lc);
+ if (pd->elems == NIL)
+ /* The index doesn't participate in this operation */
+ continue;

+ arrayconst = lsecond_node(Const, saop->args);
+ dest = makeNode(ScalarArrayOpExpr);

best regards,
Ranier Vilela


Re: POC, WIP: OR-clause support for indexes

2024-02-19 Thread Andrei Lepikhov

On 16/2/2024 19:54, jian he wrote:

After setting these parameters, overall enable_or_transformation ON is
performance better.
sorry for the noise.

Don't worry, at least we know a weak point of partial paths estimation.

so now I didn't find any corner case where enable_or_transformation is
ON peforms worse than when it's OFF.

+typedef struct OrClauseGroupEntry
+{
+ OrClauseGroupKey key;
+
+ Node   *node;
+ List   *consts;
+ Oid scalar_type;
+ List   *exprs;
+} OrClauseGroupEntry;

I found that the field `scalar_type` was never used.

Thanks, fixed.
In attachment - v17 for both patches. As I see it, the only general 
explanation of the idea is not addressed. I'm not sure how deeply we 
should explain it.


--
regards,
Andrei Lepikhov
Postgres Professional
From 3a3b6aa36320a06b64f2f608e3526255e53ed655 Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the
preliminary stage of optimization when we are still working with the tree
expression.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 326 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 -
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 ++--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 21 files changed, 875 insertions(+), 70 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index c355e8f3f7..0523bbd8f7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN 
(SELECT * FROM ft5 WHERE
  Foreign Scan
Output: t1.c1, t1.c2, ft5.c1, ft5.c2
Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 < 
10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 
IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 
WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3105,7 +3105,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full 
join ft5 t2 on (t1.c1 = t2
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE (((r1.c1 < 
20) OR ((r1.c1 IS NULL) AND (r2.c1 < 5 GROUP BY 2 ORDER BY 
array_agg(DISTINCT (r1.c1 % 5)) ASC NULLS LAST
+   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE r1.c1 IS 
NULL) AND (r2.c1 < 5)) OR (r1.c1 < 20))) GROUP BY 2 ORDER BY array_agg(DISTINCT 
(r1.c1 % 5)) ASC NULLS LAST
 (4 rows)
 
 select array_agg(distinct (t1.c1)%5) from ft4 t1 full join ft5 t2 on (t1.c1 = 
t2.c1) where t1.c1 < 20 or (t1.c1 is null and t2.c1 < 5) group by (t2.c1)%3 
order by 1;
@@ -3123,7 +3123,7 @@ 

Re: POC, WIP: OR-clause support for indexes

2024-02-16 Thread jian he
On Fri, Feb 16, 2024 at 1:32 PM Andrei Lepikhov
 wrote:
>
> On 16/2/2024 07:00, jian he wrote:
> > On Wed, Feb 14, 2024 at 11:21 AM Andrei Lepikhov
> >  wrote:
> > My OS: Ubuntu 22.04.3 LTS
> > I already set the max_parallel_workers_per_gather to 10.
> > So for all cases, it should use parallelism first?
> >
> > a better question would be:
> > how to make the number of OR less than 29 still faster when
> > enable_or_transformation is ON by only set parameters?
> In my test environment this example gives some subtle supremacy to ORs
> over ANY with only 3 ors and less.
> Please, provide next EXPLAIN ANALYZE results for the case you want to
> discuss here:
> 1. with enable_or_transformation enabled
> 2. with enable_or_transformation disabled
> 3. with enable_or_transformation disabled but with manual transformation
> OR -> ANY done, to check the overhead of this optimization.
>

you previously mentioned playing with parallel_tuple_cost and
parallel_setup_cost.
(https://www.postgresql.org/message-id/e3338e82-a28d-4631-9eec-b9c0984b32d5%40postgrespro.ru)

So I did by
`
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
`

After setting these parameters, overall enable_or_transformation ON is
performance better.
sorry for the noise.
so now I didn't find any corner case where enable_or_transformation is
ON peforms worse than when it's OFF.

+typedef struct OrClauseGroupEntry
+{
+ OrClauseGroupKey key;
+
+ Node   *node;
+ List   *consts;
+ Oid scalar_type;
+ List   *exprs;
+} OrClauseGroupEntry;

I found that the field `scalar_type` was never used.




Re: POC, WIP: OR-clause support for indexes

2024-02-15 Thread Andrei Lepikhov

On 16/2/2024 07:00, jian he wrote:

On Wed, Feb 14, 2024 at 11:21 AM Andrei Lepikhov
 wrote:
My OS: Ubuntu 22.04.3 LTS
I already set the max_parallel_workers_per_gather to 10.
So for all cases, it should use parallelism first?

a better question would be:
how to make the number of OR less than 29 still faster when
enable_or_transformation is ON by only set parameters?
In my test environment this example gives some subtle supremacy to ORs 
over ANY with only 3 ors and less.
Please, provide next EXPLAIN ANALYZE results for the case you want to 
discuss here:

1. with enable_or_transformation enabled
2. with enable_or_transformation disabled
3. with enable_or_transformation disabled but with manual transformation 
OR -> ANY done, to check the overhead of this optimization.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-15 Thread jian he
On Wed, Feb 14, 2024 at 11:21 AM Andrei Lepikhov
 wrote:
>
> So, this example is more about the subtle balance between
> parallel/sequential execution, which can vary from one platform to another.
>

Hi, here I attached two files, expression_num_or_1_100.sql,
expression_num_or_1_1.sql
it has step by step test cases, also with the tests output.

For both sql files, I already set the max_parallel_workers_per_gather to
10, work_mem to 4GB.
I think the parameters setting should be fine.

in expression_num_or_1_100.sql:
main test table:
create table test_1_100 as (select
(random()*1000)::int x, (random()*1000) y from
generate_series(1,1_000_000) i);

if the number of OR exceeds 29,
the performance with enable_or_transformation (ON) begins to outpace
enable_or_transformation (OFF).

if the number of OR less than 29,
the performance with enable_or_transformation (OFF) is better than
enable_or_transformation (ON).

expression_num_or_1_1.sql
enable_or_transformation (ON) is always better than
enable_or_transformation (OFF).

My OS: Ubuntu 22.04.3 LTS
I already set the max_parallel_workers_per_gather to 10.
So for all cases, it should use parallelism first?

a better question would be:
how to make the number of OR less than 29 still faster when
enable_or_transformation is ON by only set parameters?


expression_num_or_1_100.sql
Description: application/sql


expression_num_or_1_1.sql
Description: application/sql


Re: POC, WIP: OR-clause support for indexes

2024-02-13 Thread Andrei Lepikhov

On 13/2/2024 17:03, Andrei Lepikhov wrote:

On 13/2/2024 07:00, jian he wrote:

The time is the last result of the 10 iterations.
I'm not sure about the origins of such behavior, but it seems to be an 
issue of parallel workers, not this specific optimization.
Having written that, I'd got a backburner. And to close that issue, I 
explored get_restriction_qual_cost(). A close look shows us that "x IN 
(..)" cheaper than its equivalent "x=N1 OR ...". Just numbers:


ANY: startup_cost = 0.0225; total_cost = 0.005
OR: startup_cost==0; total_cost = 0.0225

Expression total_cost is calculated per tuple. In your example, we have 
many tuples, so the low cost of expression per tuple dominates over the 
significant startup cost.


According to the above, SAOP adds 6250 to the cost of SeqScan; OR - 
13541. So, the total cost of the query with SAOP is less than with OR, 
and the optimizer doesn't choose heavy parallel workers. And it is the 
answer.


So, this example is more about the subtle balance between 
parallel/sequential execution, which can vary from one platform to another.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-13 Thread Andrei Lepikhov

On 12/2/2024 17:51, Alena Rybakina wrote:

On 12.02.2024 12:01, Andrei Lepikhov wrote:

On 12/2/2024 15:55, Alena Rybakina wrote:
As I understand it, match_clauses_to_index is necessary if you have a 
RestrictInfo (rinfo1) variable, so maybe we should run it after the 
make_restrictonfo procedure, otherwise calling it, I think, is useless.
I think you must explain your note in more detail. Before this call, 
we already called make_restrictinfo() and built rinfo1, haven't we?


I got it, I think, I was confused by the “else“ block when we can 
process the index, otherwise we move on to the next element.


I think maybe “else“ block of creating restrictinfo with the indexpaths 
list creation should be moved to a separate function or just remove "else"?
IMO, it is a matter of taste. But if you are really confused, maybe it 
will make understanding for someone else simpler. So, changed.
I think we need to check that rinfo->clause is not empty, because if it 
is we can miss calling build_paths_for_OR function. We should add it there:


restriction_is_saop_clause(RestrictInfo *restrictinfo)
{
     if (IsA(restrictinfo->clause, ScalarArrayOpExpr))
I wonder if we should add here assertion, not NULL check. In what case 
we could get NULL clause here? But added for surety.


By the way, I think we need to add a check that the clauseset is not 
empty (if (!clauseset.nonempty)) otherwise we could get an error. The 
same check I noticed in build_paths_for_OR function.

I don't. Feel free to provide counterexample.

--
regards,
Andrei Lepikhov
Postgres Professional
From 2b088a1bd662c614ee491a797d2fcb67e2fb2391 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 24 Jan 2024 14:07:17 +0700
Subject: [PATCH 2/2] Teach generate_bitmap_or_paths to build BitmapOr paths
 over SAOP clauses.

Likewise OR clauses, discover SAOP array and try to split its elements
between smaller sized arrays to fit a set of partial indexes.
---
 src/backend/optimizer/path/indxpath.c | 301 ++
 src/backend/optimizer/util/predtest.c |  46 
 src/backend/optimizer/util/restrictinfo.c |  13 +
 src/include/optimizer/optimizer.h |  16 ++
 src/include/optimizer/restrictinfo.h  |   1 +
 src/test/regress/expected/select.out  | 282 
 src/test/regress/sql/select.sql   |  82 ++
 src/tools/pgindent/typedefs.list  |   1 +
 8 files changed, 689 insertions(+), 53 deletions(-)

diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 32c6a8bbdc..56b04541db 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -32,6 +32,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
+#include "utils/array.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
 
@@ -1220,11 +1221,174 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
return result;
 }
 
+/*
+ * Building index paths over SAOP clause differs from the logic of OR clauses.
+ * Here we iterate across all the array elements and split them to SAOPs,
+ * corresponding to different indexes. We must match each element to an index.
+ */
+static List *
+build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
+List *other_clauses)
+{
+   List   *result = NIL;
+   List   *predicate_lists = NIL;
+   ListCell   *lc;
+   PredicatesData *pd;
+   ScalarArrayOpExpr  *saop = (ScalarArrayOpExpr *) rinfo->clause;
+
+   Assert(IsA(saop, ScalarArrayOpExpr) && saop->useOr);
+
+   if (!IsA(lsecond(saop->args), Const))
+   /*
+* Has it practical outcome to merge arrays which couldn't 
constantified
+* before that step?
+*/
+   return NIL;
+
+   /* Collect predicates */
+   foreach(lc, rel->indexlist)
+   {
+   IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+
+   /* Take into consideration partial indexes supporting bitmap 
scans */
+   if (!index->amhasgetbitmap || index->indpred == NIL || 
index->predOK)
+   continue;
+
+   pd = palloc0(sizeof(PredicatesData));
+   pd->id = foreach_current_index(lc);
+   /* The trick with reducing recursion is stolen from 
predicate_implied_by */
+   pd->predicate = list_length(index->indpred) == 1 ?
+   
(Node *) linitial(index->indpred) :
+   
(Node *) index->indpred;
+   predicate_lists = lappend(predicate_lists, (void *) pd);
+   }
+
+   /* Split the array data according to index predicates. */
+   if (predicate_lists == NIL ||
+ 

Re: POC, WIP: OR-clause support for indexes

2024-02-13 Thread Andrei Lepikhov

On 13/2/2024 07:00, jian he wrote:

+ newa = makeNode(ArrayExpr);
+ /* array_collid will be set by parse_collate.c */
+ newa->element_typeid = scalar_type;
+ newa->array_typeid = array_type;
+ newa->multidims = false;
+ newa->elements = aexprs;
+ newa->location = -1;

I am confused by the comments `array_collid will be set by
parse_collate.c`, can you further explain it?
I wonder if the second paragraph of comments on commit b310b6e will be 
enough to dive into details.



if OR expression right arm is not plain Const, but with collation
specification, eg.
`where a  = 'a' collate "C" or a = 'b' collate "C";`

then the rightop is not Const, it will be CollateExpr, it will not be
used in transformation.
Yes, it is done for simplicity right now. I'm not sure about corner 
cases of merging such expressions.




set enable_or_transformation to on;
explain(timing off, analyze, costs off)
select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
= 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
35.376 ms

The time is the last result of the 10 iterations.

The reason here - parallel workers.
If you see into the plan you will find parallel workers without 
optimization and absence of them in the case of optimization:


Gather  (cost=1000.00..28685.37 rows=87037 width=12)
(actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
 Filter: ((x = 1) OR (x = 2) OR (x = 3) OR (x = 4) OR (x = 5) 
OR (x = 6) OR (x = 7) OR (x = 8) OR (x = 9))


Seq Scan on test  (cost=0.02..20440.02 rows=90600 width=12)
  (actual rows=90363 loops=1)
   Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))

Having 90600 tuples returned we estimate it into 87000 (less precisely) 
without transformation and 90363 (more precisely) with the transformation.
But if you play with parallel_tuple_cost and parallel_setup_cost, you 
will end up having these parallel workers:


 Gather  (cost=0.12..11691.03 rows=90600 width=12)
 (actual rows=90363 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test
 Filter: (x = ANY ('{1,2,3,4,5,6,7,8,9}'::integer[]))
 Rows Removed by Filter: 303212

And some profit about 25%, on my laptop.
I'm not sure about the origins of such behavior, but it seems to be an 
issue of parallel workers, not this specific optimization.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-12 Thread jian he
On Thu, Feb 8, 2024 at 1:34 PM Andrei Lepikhov
 wrote:
> A couple of questions:
> 1. As I see, transformAExprIn uses the same logic as we invented but
> allows composite and domain types. Could you add a comment explaining
> why we forbid row types in general, in contrast to the transformAExprIn
> routine?
> 2. Could you provide the tests to check issues covered by the recent (in
> v.15) changes?
>
> Patch 0001-* in the attachment incorporates changes induced by Jian's
> notes from [1].
> Patch 0002-* contains a transformation of the SAOP clause, which allows
> the optimizer to utilize partial indexes if they cover all values in
> this array. Also, it is an answer to Alexander's note [2] on performance
> degradation. This first version may be a bit raw, but I need your
> opinion: Does it resolve the issue?
>

+ newa = makeNode(ArrayExpr);
+ /* array_collid will be set by parse_collate.c */
+ newa->element_typeid = scalar_type;
+ newa->array_typeid = array_type;
+ newa->multidims = false;
+ newa->elements = aexprs;
+ newa->location = -1;

I am confused by the comments `array_collid will be set by
parse_collate.c`, can you further explain it?
if OR expression right arm is not plain Const, but with collation
specification, eg.
`where a  = 'a' collate "C" or a = 'b' collate "C";`

then the rightop is not Const, it will be CollateExpr, it will not be
used in transformation.
-
Maybe the previous thread mentioned it, but this thread is very long.
after apply
v16-0001-Transform-OR-clause-to-ANY-expressions.patch
and 0002-Teach-generate_bitmap_or_paths-to-build-BitmapOr-pat-20240212.patch
I found a performance degradation case:

drop table if exists test;
create table test as (select (random()*100)::int x, (random()*1000) y
from generate_series(1,100) i);
vacuum analyze test;

set enable_or_transformation to off;
explain(timing off, analyze, costs off)
select * from test where (x = 1 or x = 2 or x = 3 or x = 4 or x = 5 or
x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
50.887 ms

set enable_or_transformation to on;
explain(timing off, analyze, costs off)
select * from test where (x = 1 or x = 2 or x = 3 or x = 4 or x = 5 or
x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
92.001 ms

-
but for aggregate count(*), it indeed increased the performance:

set enable_or_transformation to off;
explain(timing off, analyze, costs off)
select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
= 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
46.818 ms

set enable_or_transformation to on;
explain(timing off, analyze, costs off)
select count(*) from test where (x = 1 or x = 2 or x = 3 or x = 4 or x
= 5 or x = 6 or x = 7 or x = 8 or x = 9 ) \watch i=0.1 c=10
35.376 ms

The time is the last result of the 10 iterations.




Re: POC, WIP: OR-clause support for indexes

2024-02-12 Thread Alena Rybakina

On 12.02.2024 12:01, Andrei Lepikhov wrote:

On 12/2/2024 15:55, Alena Rybakina wrote:

Hi!

I can't unnderstand this part of code:

/* Time to generate index paths */

MemSet(, 0, sizeof(clauseset));
match_clauses_to_index(root, list_make1(rinfo1), index, );

As I understand it, match_clauses_to_index is necessary if you have a 
RestrictInfo (rinfo1) variable, so maybe we should run it after the 
make_restrictonfo procedure, otherwise calling it, I think, is useless.
I think you must explain your note in more detail. Before this call, 
we already called make_restrictinfo() and built rinfo1, haven't we?


I got it, I think, I was confused by the “else“ block when we can 
process the index, otherwise we move on to the next element.


I think maybe “else“ block of creating restrictinfo with the indexpaths 
list creation should be moved to a separate function or just remove "else"?



I think we need to check that rinfo->clause is not empty, because if it 
is we can miss calling build_paths_for_OR function. We should add it there:


restriction_is_saop_clause(RestrictInfo *restrictinfo)
{
    if (IsA(restrictinfo->clause, ScalarArrayOpExpr))
...


By the way, I think we need to add a check that the clauseset is not 
empty (if (!clauseset.nonempty)) otherwise we could get an error. The 
same check I noticed in build_paths_for_OR function.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: POC, WIP: OR-clause support for indexes

2024-02-12 Thread Andrei Lepikhov

On 12/2/2024 15:55, Alena Rybakina wrote:

Hi!

I can't unnderstand this part of code:

/* Time to generate index paths */

MemSet(, 0, sizeof(clauseset));
match_clauses_to_index(root, list_make1(rinfo1), index, );

As I understand it, match_clauses_to_index is necessary if you have a 
RestrictInfo (rinfo1) variable, so maybe we should run it after the 
make_restrictonfo procedure, otherwise calling it, I think, is useless.
I think you must explain your note in more detail. Before this call, we 
already called make_restrictinfo() and built rinfo1, haven't we?


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC, WIP: OR-clause support for indexes

2024-02-12 Thread Alena Rybakina

Hi!

I can't unnderstand this part of code:

/* Time to generate index paths */

MemSet(, 0, sizeof(clauseset));
match_clauses_to_index(root, list_make1(rinfo1), index, );

As I understand it, match_clauses_to_index is necessary if you have a 
RestrictInfo (rinfo1) variable, so maybe we should run it after the 
make_restrictonfo procedure, otherwise calling it, I think, is useless.


--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-02-11 Thread Andrei Lepikhov

Thanks for the review!
It was the first version for discussion. Of course, refactoring and 
polishing cycles will be needed, even so we can discuss the general idea 
earlier.


On 10/2/2024 12:00, jian he wrote:

On Thu, Feb 8, 2024 at 1:34 PM Andrei Lepikhov
  1235 | PredicatesData *pd;

Thanks


+ if (!predicate_implied_by(index->indpred, list_make1(rinfo1), true))
+ elog(ERROR, "Logical mistake in OR <-> ANY transformation code");
the error message seems not clear?

Yeah, have rewritten


static List *
build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
List *other_clauses)
I am not sure what's `other_clauses`, and `rinfo` refers to? adding
some comments would be great.

struct PredicatesData needs some comments, I think.

Added, not so much though


+bool
+saop_covered_by_predicates(ScalarArrayOpExpr *saop, List *predicate_lists)
+{
+ ListCell   *lc;
+ PredIterInfoData clause_info;
+ bool result = false;
+ bool isConstArray;
+
+ Assert(IsA(saop, ScalarArrayOpExpr));
is this Assert necessary?

Not sure. Moved it into another routine.


For the function build_paths_for_SAOP, I think I understand the first
part of the code.
But I am not 100% sure of the second part of the `foreach(lc,
predicate_lists)` code.
more comments in `foreach(lc, predicate_lists)` would be helpful.

Done


do you need to add `PredicatesData` to src/tools/pgindent/typedefs.list?

Done


I also did some minor refactoring of generate_saop_pathlist.

Partially agree


instead of let it go to `foreach (lc, entries)`,
we can reject the Const array at `foreach(lc, expr->args)`
Yeah, I just think we can go further and transform two const arrays into 
a new one if we have the same clause and operator. In that case, we 
should allow it to pass through this cycle down to the classification stage.


also `foreach(lc, expr->args)` do we need to reject cases like
`contain_subplans((Node *) nconst_expr)`?
maybe let the nconst_expr be a Var node would be far more easier.
It's contradictory. On the one hand, we simplify the comparison 
procedure and can avoid expr jumbling at all. On the other hand - we 
restrict the feature. IMO, it would be better to unite such clauses

complex_clause1 IN (..) OR complex_clause1 IN (..)
into
complex_clause1 IN (.., ..)
and don't do duplicated work computing complex clauses.
In the attachment - the next version of the second patch.

--
regards,
Andrei Lepikhov
Postgres Professional
From c1e5fc35778acd01cca67e63fdf80c5605af03f2 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 24 Jan 2024 14:07:17 +0700
Subject: [PATCH 2/2] Teach generate_bitmap_or_paths to build BitmapOr paths
 over SAOP clauses.

Likewise OR clauses, discover SAOP array and try to split its elements
between smaller sized arrays to fit a set of partial indexes.
---
 src/backend/optimizer/path/indxpath.c | 304 ++
 src/backend/optimizer/util/predtest.c |  46 
 src/backend/optimizer/util/restrictinfo.c |  13 +
 src/include/optimizer/optimizer.h |  16 ++
 src/include/optimizer/restrictinfo.h  |   1 +
 src/test/regress/expected/select.out  | 282 
 src/test/regress/sql/select.sql   |  82 ++
 src/tools/pgindent/typedefs.list  |   1 +
 8 files changed, 692 insertions(+), 53 deletions(-)

diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 32c6a8bbdc..5383cb76dc 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -32,6 +32,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
+#include "utils/array.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
 
@@ -1220,11 +1221,177 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
return result;
 }
 
+/*
+ * Building index paths over SAOP clause differs from the logic of OR clauses.
+ * Here we iterate across all the array elements and split them to SAOPs,
+ * corresponding to different indexes. We must match each element to an index.
+ */
+static List *
+build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
+List *other_clauses)
+{
+   List   *result = NIL;
+   List   *predicate_lists = NIL;
+   ListCell   *lc;
+   PredicatesData *pd;
+   ScalarArrayOpExpr  *saop = (ScalarArrayOpExpr *) rinfo->clause;
+
+   Assert(IsA(saop, ScalarArrayOpExpr) && saop->useOr);
+
+   if (!IsA(lsecond(saop->args), Const))
+   /*
+* Has it practical outcome to merge arrays which couldn't 
constantified
+* before that step?
+*/
+   return NIL;
+
+   /* Collect predicates */
+   foreach(lc, rel->indexlist)
+   {
+   IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);
+
+   /* Take 

Re: POC, WIP: OR-clause support for indexes

2024-02-09 Thread jian he
On Thu, Feb 8, 2024 at 1:34 PM Andrei Lepikhov
 wrote:
>
> On 3/2/2024 02:06, Alena Rybakina wrote:
> > On 01.02.2024 08:00, jian he wrote:
> > I added your code to the patch.
> Thanks Alena and Jian for the detailed scrutiny!
>
> A couple of questions:
> 1. As I see, transformAExprIn uses the same logic as we invented but
> allows composite and domain types. Could you add a comment explaining
> why we forbid row types in general, in contrast to the transformAExprIn
> routine?
> 2. Could you provide the tests to check issues covered by the recent (in
> v.15) changes?
>
> Patch 0001-* in the attachment incorporates changes induced by Jian's
> notes from [1].
> Patch 0002-* contains a transformation of the SAOP clause, which allows
> the optimizer to utilize partial indexes if they cover all values in
> this array. Also, it is an answer to Alexander's note [2] on performance
> degradation. This first version may be a bit raw, but I need your
> opinion: Does it resolve the issue?
yes. It resolved the partial index performance degradation issue.
The v16, 0002 extra code overhead is limited.

Here is how I test it.
drop table if exists test;
create table test as (select (random()*100)::int x, (random()*1000) y
from generate_series(1,100) i);
create index test_x_1_y on test (y) where x = 1;
create index test_x_2_y on test (y) where x = 2;
create index test_x_3_y on test (y) where x = 3;
create index test_x_4_y on test (y) where x = 4;
create index test_x_5_y on test (y) where x = 5;
create index test_x_6_y on test (y) where x = 6;
create index test_x_7_y on test (y) where x = 7;
create index test_x_8_y on test (y) where x = 8;
create index test_x_9_y on test (y) where x = 9;
create index test_x_10_y on test (y) where x = 10;

set enable_or_transformation to on;
explain(analyze, costs off)
select * from test
where (x = 1 or x = 2 or x = 3 or x = 4 or x = 5 or x = 6 or x = 7 or
x = 8 or x = 9 or x = 10);

set enable_or_transformation to off;
explain(analyze, costs off)
select * from test
where (x = 1 or x = 2 or x = 3 or x = 4 or x = 5 or x = 6 or x = 7 or
x = 8 or x = 9 or x = 10);



FAILED: src/backend/postgres_lib.a.p/optimizer_path_indxpath.c.o
ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include
-I../../Desktop/pg_src/src8/postgres/src/include
-I/usr/include/libxml2 -fdiagnostics-color=always --coverage
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O0 -g
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE
-Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3
-Wcast-function-type -Wshadow=compatible-local -Wformat-security
-Wdeclaration-after-statement -Wno-format-truncation
-Wno-stringop-truncation -Wunused-variable -Wuninitialized
-Werror=maybe-uninitialized -Wreturn-type
-DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES
-DREALLOCATE_BITMAPSETS -DRAW_EXPRESSION_COVERAGE_TEST
-fno-omit-frame-pointer -fPIC -pthread -DBUILDING_DLL -MD -MQ
src/backend/postgres_lib.a.p/optimizer_path_indxpath.c.o -MF
src/backend/postgres_lib.a.p/optimizer_path_indxpath.c.o.d -o
src/backend/postgres_lib.a.p/optimizer_path_indxpath.c.o -c
../../Desktop/pg_src/src8/postgres/src/backend/optimizer/path/indxpath.c
../../Desktop/pg_src/src8/postgres/src/backend/optimizer/path/indxpath.c:
In function ‘build_paths_for_SAOP’:
../../Desktop/pg_src/src8/postgres/src/backend/optimizer/path/indxpath.c:1267:33:
error: declaration of ‘pd’ shadows a previous local
[-Werror=shadow=compatible-local]
 1267 | PredicatesData *pd = (PredicatesData *) lfirst(lc);
  | ^~
../../Desktop/pg_src/src8/postgres/src/backend/optimizer/path/indxpath.c:1235:29:
note: shadowed declaration is here
 1235 | PredicatesData *pd;
  | ^~
cc1: all warnings being treated as errors
[32/126] Compiling C object src/backend/postgres_lib.a.p/utils_adt_ruleutils.c.o
ninja: build stopped: subcommand failed.

+ if (!predicate_implied_by(index->indpred, list_make1(rinfo1), true))
+ elog(ERROR, "Logical mistake in OR <-> ANY transformation code");
the error message seems not clear?
What is a "Logical mistake"?

static List *
build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
List *other_clauses)
I am not sure what's `other_clauses`, and `rinfo` refers to? adding
some comments would be great.

struct PredicatesData needs some comments, I think.

+bool
+saop_covered_by_predicates(ScalarArrayOpExpr *saop, List *predicate_lists)
+{
+ ListCell   *lc;
+ PredIterInfoData clause_info;
+ bool result = false;
+ bool isConstArray;
+
+ Assert(IsA(saop, ScalarArrayOpExpr));
is this Assert necessary?

For the function build_paths_for_SAOP, I think I understand the first
part of the code.
But I am not 100% sure of the second part of the `foreach(lc,
predicate_lists)` code.
more comments in `foreach(lc, predicate_lists)` would be helpful.

do you need to add `PredicatesData` to 

Re: POC, WIP: OR-clause support for indexes

2024-02-07 Thread Andrei Lepikhov

On 3/2/2024 02:06, Alena Rybakina wrote:

On 01.02.2024 08:00, jian he wrote:
I added your code to the patch.

Thanks Alena and Jian for the detailed scrutiny!

A couple of questions:
1. As I see, transformAExprIn uses the same logic as we invented but 
allows composite and domain types. Could you add a comment explaining 
why we forbid row types in general, in contrast to the transformAExprIn 
routine?
2. Could you provide the tests to check issues covered by the recent (in 
v.15) changes?


Patch 0001-* in the attachment incorporates changes induced by Jian's 
notes from [1].
Patch 0002-* contains a transformation of the SAOP clause, which allows 
the optimizer to utilize partial indexes if they cover all values in 
this array. Also, it is an answer to Alexander's note [2] on performance 
degradation. This first version may be a bit raw, but I need your 
opinion: Does it resolve the issue?


Skimming through the thread, I see that, in general, all issues have 
been covered for now. Only Robert's note on a lack of documentation is 
still needs to be resolved.


[1] 
https://www.postgresql.org/message-id/CACJufxGXhJ823cdAdp2Ho7qC-HZ3_-dtdj-myaAi_u9RQLn45g%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com


--
regards,
Andrei Lepikhov
Postgres Professional
From 0ac511ab94959e87d1525d5ea8c4855643a6ccbe Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH 1/2] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the
preliminary stage of optimization when we are still working with the tree
expression.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
Authors: Alena Rybakina , Andrey Lepikhov 

Reviewed-by: Peter Geoghegan , Ranier Vilela 
Reviewed-by: Alexander Korotkov , Robert Haas 

Reviewed-by: jian he 
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 src/backend/nodes/queryjumblefuncs.c  |  27 ++
 src/backend/parser/parse_expr.c   | 327 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/optimizer/optimizer.h |   1 +
 src/test/regress/expected/create_index.out| 156 -
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 ++--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 21 files changed, 876 insertions(+), 70 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index b5a38aeb21..a07aefc9e5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN 
(SELECT * FROM ft5 WHERE
  Foreign Scan
Output: t1.c1, t1.c2, ft5.c1, ft5.c2
Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 < 
10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 
IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 
WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3105,7 +3105,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full 
join ft5 t2 on (t1.c1 = t2
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE (((r1.c1 

Re: POC, WIP: OR-clause support for indexes

2024-02-02 Thread Alena Rybakina


On 01.02.2024 08:00, jian he wrote:

On Wed, Jan 31, 2024 at 7:10 PM Alena Rybakina
  wrote:

Hi, thank you for your review and interest in this subject.

On 31.01.2024 13:15, jian he wrote:

On Wed, Jan 31, 2024 at 10:55 AM jian he  wrote:

based on my understanding of
https://www.postgresql.org/docs/current/xoper-optimization.html#XOPER-COMMUTATOR
I think you need move commutator check right after the `if
(get_op_rettype(opno) != BOOLOID)` branch

I was wrong about this part. sorry for the noise.


I have made some changes (attachment).
* if the operator expression left or right side type category is
{array | domain | composite}, then don't do the transformation.
(i am not 10% sure with composite)

To be honest, I'm not sure about this check, because we check the type of 
variable there:

if (!IsA(orqual, OpExpr))
 {
 or_list = lappend(or_list, orqual);
 continue;
 }
And below:
if (IsA(leftop, Const))
 {
 opno = get_commutator(opno);

 if (!OidIsValid(opno))
 {
 /* Commuter doesn't exist, we can't reverse the order */
 or_list = lappend(or_list, orqual);
 continue;
 }

 nconst_expr = get_rightop(orqual);
 const_expr = get_leftop(orqual);
 }
 else if (IsA(rightop, Const))
 {
 const_expr = get_rightop(orqual);
 nconst_expr = get_leftop(orqual);
 }
 else
 {
 or_list = lappend(or_list, orqual);
 continue;
 }

Isn't that enough?

alter table tenk1 add column arr int[];
set enable_or_transformation to on;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE arr = '{1,2,3}' or arr = '{1,2}';

the above query will not do the OR transformation. because array type
doesn't have array type.
`
scalar_type = entry->key.exprtype;
if (scalar_type != RECORDOID && OidIsValid(scalar_type))
array_type = get_array_type(scalar_type);
else
array_type = InvalidOid;
`

If either side of the operator expression is array or array related type,
we can be sure it cannot do the transformation
(get_array_type will return InvalidOid for anyarray type).
we can check it earlier, so hash related code will not be invoked for
array related types.

Agree.

Besides, some of examples (with ARRAY) works fine:

postgres=# CREATE TABLE sal_emp (
 pay_by_quarter  integer[],
 pay_by_quater1 integer[]
);
CREATE TABLE
postgres=# INSERT INTO sal_emp
 VALUES (
 '{1, 1, 1, 1}',
 '{1,2,3,4}');
INSERT 0 1
postgres=# select * from sal_emp where pay_by_quarter[1] = 1 or 
pay_by_quarter[1]=2;
   pay_by_quarter   | pay_by_quater1
---+
  {1,1,1,1} | {1,2,3,4}
(1 row)

postgres=# explain select * from sal_emp where pay_by_quarter[1] = 1 or 
pay_by_quarter[1]=2;
   QUERY PLAN
--
  Seq Scan on sal_emp  (cost=0.00..21.00 rows=9 width=64)
Filter: (pay_by_quarter[1] = ANY ('{1,2}'::integer[]))
(2 rows)

* if the left side of the operator expression node contains volatile
functions, then don't do the transformation.

I'm also not sure about the volatility check function, because we perform such 
a conversion at the parsing stage, and at this stage we don't have a RelOptInfo 
variable and especially a RestictInfo such as PathTarget.


  see the example in here:
https://www.postgresql.org/message-id/CACJufxGXhJ823cdAdp2Ho7qC-HZ3_-dtdj-myaAi_u9RQLn45g%40mail.gmail.com

set enable_or_transformation to on;
create or replace function retint(int) returns int as
$func$
begin raise notice 'hello';
 return $1 + round(10 * random()); end
$func$ LANGUAGE plpgsql;

SELECT count(*) FROM tenk1 WHERE thousand = 42;
will return 10 rows.

SELECT count(*) FROM tenk1 WHERE thousand = 42 AND (retint(1) = 4 OR
retint(1) = 3);
this query I should return 20 notices 'hello', but now only 10.

EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE thousand = 42 AND (retint(1) = 4 OR  retint(1) = 3);
   QUERY PLAN
--
  Aggregate
->  Seq Scan on tenk1
  Filter: ((thousand = 42) AND (retint(1) = ANY ('{4,3}'::integer[])))
(3 rows)


Agree.

I added your code to the patch.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
From 0fcad72153c9eaaf436e5b417c803a70fbeaf8ac Mon Sep 17 00:00:00 2001
From: Alena Rybakina 
Date: Fri, 2 Feb 2024 22:01:09 +0300
Subject: [PATCH] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the
preliminary stage of optimization when we are still working with the tree
expression.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an 

Re: POC, WIP: OR-clause support for indexes

2024-01-31 Thread jian he
On Wed, Jan 31, 2024 at 7:10 PM Alena Rybakina
 wrote:
>
> Hi, thank you for your review and interest in this subject.
>
> On 31.01.2024 13:15, jian he wrote:
>
> On Wed, Jan 31, 2024 at 10:55 AM jian he  wrote:
>
> based on my understanding of
> https://www.postgresql.org/docs/current/xoper-optimization.html#XOPER-COMMUTATOR
> I think you need move commutator check right after the `if
> (get_op_rettype(opno) != BOOLOID)` branch
>
> I was wrong about this part. sorry for the noise.
>
>
> I have made some changes (attachment).
> * if the operator expression left or right side type category is
> {array | domain | composite}, then don't do the transformation.
> (i am not 10% sure with composite)
>
> To be honest, I'm not sure about this check, because we check the type of 
> variable there:
>
> if (!IsA(orqual, OpExpr))
> {
> or_list = lappend(or_list, orqual);
> continue;
> }
> And below:
> if (IsA(leftop, Const))
> {
> opno = get_commutator(opno);
>
> if (!OidIsValid(opno))
> {
> /* Commuter doesn't exist, we can't reverse the order */
> or_list = lappend(or_list, orqual);
> continue;
> }
>
> nconst_expr = get_rightop(orqual);
> const_expr = get_leftop(orqual);
> }
> else if (IsA(rightop, Const))
> {
> const_expr = get_rightop(orqual);
> nconst_expr = get_leftop(orqual);
> }
> else
> {
> or_list = lappend(or_list, orqual);
> continue;
> }
>
> Isn't that enough?

alter table tenk1 add column arr int[];
set enable_or_transformation to on;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE arr = '{1,2,3}' or arr = '{1,2}';

the above query will not do the OR transformation. because array type
doesn't have array type.
`
scalar_type = entry->key.exprtype;
if (scalar_type != RECORDOID && OidIsValid(scalar_type))
array_type = get_array_type(scalar_type);
else
array_type = InvalidOid;
`

If either side of the operator expression is array or array related type,
we can be sure it cannot do the transformation
(get_array_type will return InvalidOid for anyarray type).
we can check it earlier, so hash related code will not be invoked for
array related types.


> Besides, some of examples (with ARRAY) works fine:
>
> postgres=# CREATE TABLE sal_emp (
> pay_by_quarter  integer[],
> pay_by_quater1 integer[]
> );
> CREATE TABLE
> postgres=# INSERT INTO sal_emp
> VALUES (
> '{1, 1, 1, 1}',
> '{1,2,3,4}');
> INSERT 0 1
> postgres=# select * from sal_emp where pay_by_quarter[1] = 1 or 
> pay_by_quarter[1]=2;
>   pay_by_quarter   | pay_by_quater1
> ---+
>  {1,1,1,1} | {1,2,3,4}
> (1 row)
>
> postgres=# explain select * from sal_emp where pay_by_quarter[1] = 1 or 
> pay_by_quarter[1]=2;
>   QUERY PLAN
> --
>  Seq Scan on sal_emp  (cost=0.00..21.00 rows=9 width=64)
>Filter: (pay_by_quarter[1] = ANY ('{1,2}'::integer[]))
> (2 rows)
>
> * if the left side of the operator expression node contains volatile
> functions, then don't do the transformation.
>
> I'm also not sure about the volatility check function, because we perform 
> such a conversion at the parsing stage, and at this stage we don't have a 
> RelOptInfo variable and especially a RestictInfo such as PathTarget.
>
 see the example in here:
https://www.postgresql.org/message-id/CACJufxGXhJ823cdAdp2Ho7qC-HZ3_-dtdj-myaAi_u9RQLn45g%40mail.gmail.com

set enable_or_transformation to on;
create or replace function retint(int) returns int as
$func$
begin raise notice 'hello';
return $1 + round(10 * random()); end
$func$ LANGUAGE plpgsql;

SELECT count(*) FROM tenk1 WHERE thousand = 42;
will return 10 rows.

SELECT count(*) FROM tenk1 WHERE thousand = 42 AND (retint(1) = 4 OR
retint(1) = 3);
this query I should return 20 notices 'hello', but now only 10.

EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE thousand = 42 AND (retint(1) = 4 OR  retint(1) = 3);
  QUERY PLAN
--
 Aggregate
   ->  Seq Scan on tenk1
 Filter: ((thousand = 42) AND (retint(1) = ANY ('{4,3}'::integer[])))
(3 rows)




Re: POC, WIP: OR-clause support for indexes

2024-01-31 Thread Alena Rybakina

Hi, thank you for your review and interest in this subject.

On 31.01.2024 13:15, jian he wrote:

On Wed, Jan 31, 2024 at 10:55 AM jian he  wrote:

based on my understanding of
https://www.postgresql.org/docs/current/xoper-optimization.html#XOPER-COMMUTATOR
I think you need move commutator check right after the `if
(get_op_rettype(opno) != BOOLOID)` branch


I was wrong about this part. sorry for the noise.


I have made some changes (attachment).
* if the operator expression left or right side type category is
{array | domain | composite}, then don't do the transformation.
(i am not 10% sure with composite)


To be honest, I'm not sure about this check, because we check the type 
of variable there:


if (!IsA(orqual, OpExpr))
    {
    or_list = lappend(or_list, orqual);
    continue;
    }
And below:
if (IsA(leftop, Const))
    {
        opno = get_commutator(opno);

        if (!OidIsValid(opno))
        {
            /* Commuter doesn't exist, we can't reverse the order */
            or_list = lappend(or_list, orqual);
            continue;
        }

        nconst_expr = get_rightop(orqual);
        const_expr = get_leftop(orqual);
    }
    else if (IsA(rightop, Const))
    {
        const_expr = get_rightop(orqual);
        nconst_expr = get_leftop(orqual);
    }
    else
    {
        or_list = lappend(or_list, orqual);
        continue;
    }

Isn't that enough?

Besides, some of examples (with ARRAY) works fine:

postgres=# CREATE TABLE sal_emp (
    pay_by_quarter  integer[],
    pay_by_quater1 integer[]
);
CREATE TABLE
postgres=# INSERT INTO sal_emp
    VALUES (
    '{1, 1, 1, 1}',
    '{1,2,3,4}');
INSERT 0 1
postgres=# select * from sal_emp where pay_by_quarter[1] = 1 or 
pay_by_quarter[1]=2;

  pay_by_quarter   | pay_by_quater1
---+
 {1,1,1,1} | {1,2,3,4}
(1 row)

postgres=# explain select * from sal_emp where pay_by_quarter[1] = 1 
or pay_by_quarter[1]=2;

  QUERY PLAN
--
 Seq Scan on sal_emp  (cost=0.00..21.00 rows=9 width=64)
   Filter: (pay_by_quarter[1] = ANY ('{1,2}'::integer[]))
(2 rows)


* if the left side of the operator expression node contains volatile
functions, then don't do the transformation.


I'm also not sure about the volatility check function, because we 
perform such a conversion at the parsing stage, and at this stage we 
don't have a RelOptInfo variable and especially a RestictInfo such as 
PathTarget.


Speaking of NextValueExpr, I couldn't find any examples where the 
current patch wouldn't work. I wrote one of them below:


postgres=# create table foo (f1 int, f2 int generated always as identity);
CREATE TABLE
postgres=# insert into foo values(1);
INSERT 0 1

postgres=# explain verbose update foo set f1 = 2 where f1=1 or f1=2 ;
    QUERY PLAN
---
 Update on public.foo  (cost=0.00..38.25 rows=0 width=0)
   ->  Seq Scan on public.foo  (cost=0.00..38.25 rows=23 width=10)
 Output: 2, ctid
 Filter: (foo.f1 = ANY ('{1,2}'::integer[]))
(4 rows)

Maybe I missed something. Do you have any examples?


* some other minor  cosmetic changes.

Thank you, I agree with them.

--
Regards,
Alena Rybakina
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


Re: POC, WIP: OR-clause support for indexes

2024-01-31 Thread jian he
On Wed, Jan 31, 2024 at 10:55 AM jian he  wrote:
>
> based on my understanding of
> https://www.postgresql.org/docs/current/xoper-optimization.html#XOPER-COMMUTATOR
> I think you need move commutator check right after the `if
> (get_op_rettype(opno) != BOOLOID)` branch
>
I was wrong about this part. sorry for the noise.


I have made some changes (attachment).
* if the operator expression left or right side type category is
{array | domain | composite}, then don't do the transformation.
(i am not 10% sure with composite)

* if the left side of the operator expression node contains volatile
functions, then don't do the transformation.

* some other minor  cosmetic changes.


v14_comments.no-cfbot
Description: Binary data


Re: POC, WIP: OR-clause support for indexes

2024-01-30 Thread jian he
+/*
+ * Hash function that's compatible with guc_name_compare
+ */
+static uint32
+orclause_hash(const void *data, Size keysize)
+{
+ OrClauseGroupKey   *key = (OrClauseGroupKey *) data;
+ uint64 hash;
+
+ (void) JumbleExpr(key->expr, );
+ hash += ((uint64) key->opno + (uint64) key->exprtype) % UINT64_MAX;
+ return hash;
+}

looks strange. `hash` is uint64, but here you return uint32.

based on my understanding of
https://www.postgresql.org/docs/current/xoper-optimization.html#XOPER-COMMUTATOR
I think you need move commutator check right after the `if
(get_op_rettype(opno) != BOOLOID)` branch

+ opno = ((OpExpr *) orqual)->opno;
+ if (get_op_rettype(opno) != BOOLOID)
+ {
+ /* Only operator returning boolean suits OR -> ANY transformation */
+ or_list = lappend(or_list, orqual);
+ continue;
+ }

select  
po.oprname,po.oprkind,po.oprcanhash,po.oprleft::regtype,po.oprright,po.oprresult,
po1.oprname
frompg_operator po join pg_operator po1
on  po.oprcom = po1.oid
where   po.oprresult = 16;

I am wondering, are all these types as long as the return type is bool
suitable for this transformation?




Re: POC, WIP: OR-clause support for indexes

2024-01-30 Thread jian he
On Tue, Dec 5, 2023 at 6:55 PM Andrei Lepikhov
 wrote:
>
> Here is fresh version with the pg_dump.pl regex fixed. Now it must pass
> buildfarm.

+JumbleState *
+JumbleExpr(Expr *expr, uint64 *queryId)
+{
+ JumbleState *jstate = NULL;
+
+ Assert(queryId != NULL);
+
+ jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+ /* Set up workspace for query jumbling */
+ jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+ jstate->jumble_len = 0;
+ jstate->clocations_buf_size = 32;
+ jstate->clocations = (LocationLen *)
+ palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+ jstate->clocations_count = 0;
+ jstate->highest_extern_param_id = 0;
+
+ /* Compute query ID */
+ _jumbleNode(jstate, (Node *) expr);
+ *queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+ jstate->jumble_len,
+ 0));
+
+ if (*queryId == UINT64CONST(0))
+ *queryId = UINT64CONST(1);
+
+ return jstate;
+}

+/*
+ * Hash function that's compatible with guc_name_compare
+ */
+static uint32
+orclause_hash(const void *data, Size keysize)
+{
+ OrClauseGroupKey   *key = (OrClauseGroupKey *) data;
+ uint64 hash;
+
+ (void) JumbleExpr(key->expr, );
+ hash += ((uint64) key->opno + (uint64) key->exprtype) % UINT64_MAX;
+ return hash;
+}

correct me if i am wrong:
in orclause_hash, you just want to return a uint32, then why does the
JumbleExpr function return struct JumbleState.
here JumbleExpr, we just simply hash part of a Query struct,
so JumbleExpr's queryId would be confused with JumbleQuery function's queryId.

not sure the purpose of the following:
+ if (*queryId == UINT64CONST(0))
+ *queryId = UINT64CONST(1);

even if  *queryId is 0
`hash += ((uint64) key->opno + (uint64) key->exprtype) % UINT64_MAX;`
will make the hash return non-zero?

+ MemSet(, 0, sizeof(info));
i am not sure this is necessary.

Some comments on OrClauseGroupEntry would be great.

seems there is no doc.

create or replace function retint(int) returns int as
$func$
begin return $1 + round(10 * random()); end
$func$ LANGUAGE plpgsql;

set enable_or_transformation to on;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE thousand = 42 AND (tenthous * retint(1) = NULL OR tenthous *
retint(1) = 3) OR thousand = 41;

returns:
QUERY PLAN
---
 Aggregate
   ->  Seq Scan on tenk1
 Filter: (((thousand = 42) AND ((tenthous * retint(1)) = ANY
('{NULL,3}'::integer[]))) OR (thousand = 41))
(3 rows)

Based on the query plan, retint executed once, but here it should be
executed twice?
maybe we need to use contain_volatile_functions to check through the
other part of the operator expression.

+ if (IsA(leftop, Const))
+ {
+ opno = get_commutator(opno);
+
+ if (!OidIsValid(opno))
+ {
+ /* Commuter doesn't exist, we can't reverse the order */
+ or_list = lappend(or_list, orqual);
+ continue;
+ }
+
+ nconst_expr = get_rightop(orqual);
+ const_expr = get_leftop(orqual);
+ }
+ else if (IsA(rightop, Const))
+ {
+ const_expr = get_rightop(orqual);
+ nconst_expr = get_leftop(orqual);
+ }
+ else
+ {
+ or_list = lappend(or_list, orqual);
+ continue;
+ }
do we need to skip this transformation for the const type is anyarray?




Re: POC, WIP: OR-clause support for indexes

2024-01-26 Thread vignesh C
On Tue, 5 Dec 2023 at 16:25, Andrei Lepikhov  wrote:
>
> Here is fresh version with the pg_dump.pl regex fixed. Now it must pass
> buildfarm.
>
> Under development:
> 1. Explanation of the general idea in comments (Robert's note)
> 2. Issue with hiding some optimizations (Alexander's note and example
> with overlapping clauses on two partial indexes)

CFBot shows that the patch does not apply anymore as in [1]:
=== Applying patches on top of PostgreSQL commit ID
6ce071f6b04d3fc836f436fa08108a6d11e2 ===
=== applying patch ./v14-1-0001-Transform-OR-clause-to-ANY-expressions.patch

patching file src/test/regress/expected/sysviews.out
Hunk #1 succeeded at 124 (offset 1 line).
Hunk #2 FAILED at 134.
1 out of 2 hunks FAILED -- saving rejects to file
src/test/regress/expected/sysviews.out.rej

Please post an updated version for the same.

[1] - http://cfbot.cputube.org/patch_46_4450.log

Regards,
Vignesh




Re: POC, WIP: OR-clause support for indexes

2023-12-05 Thread Andrei Lepikhov
Here is fresh version with the pg_dump.pl regex fixed. Now it must pass 
buildfarm.


Under development:
1. Explanation of the general idea in comments (Robert's note)
2. Issue with hiding some optimizations (Alexander's note and example 
with overlapping clauses on two partial indexes)


--
regards,
Andrei Lepikhov
Postgres Professional
From 71746caae3eb0c771792b563fd53244fc1ac575b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 23 Nov 2023 16:00:13 +0700
Subject: [PATCH] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the
preliminary stage of optimization when we are still working with the tree
expression.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 src/backend/nodes/queryjumblefuncs.c  |  30 ++
 src/backend/parser/parse_expr.c   | 310 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl  |   6 +-
 src/include/nodes/queryjumble.h   |   1 +
 src/include/parser/parse_expr.h   |   1 +
 src/test/regress/expected/create_index.out| 156 -
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 +++--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 21 files changed, 862 insertions(+), 70 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0a5bdf8bcc..ff69b2cd3b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN 
(SELECT * FROM ft5 WHERE
  Foreign Scan
Output: t1.c1, t1.c2, ft5.c1, ft5.c2
Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 < 
10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 
IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 
WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3112,7 +3112,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full 
join ft5 t2 on (t1.c1 = t2
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE (((r1.c1 < 
20) OR ((r1.c1 IS NULL) AND (r2.c1 < 5 GROUP BY 2 ORDER BY 
array_agg(DISTINCT (r1.c1 % 5)) ASC NULLS LAST
+   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE r1.c1 IS 
NULL) AND (r2.c1 < 5)) OR (r1.c1 < 20))) GROUP BY 2 ORDER BY array_agg(DISTINCT 
(r1.c1 % 5)) ASC NULLS LAST
 (4 rows)
 
 select array_agg(distinct (t1.c1)%5) from ft4 t1 full join ft5 t2 on (t1.c1 = 
t2.c1) where t1.c1 < 20 or (t1.c1 is null and t2.c1 < 5) group by (t2.c1)%3 
order by 1;
@@ -3130,7 +3130,7 @@ select array_agg(distinct (t1.c1)%5 order by (t1.c1)%5) 
from ft4 t1 full join ft
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5) ORDER BY (t1.c1 % 5))), ((t2.c1 % 
3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5) ORDER BY ((r1.c1 % 5)) 
ASC NULLS LAST), (r2.c1 % 3) FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON 
(((r1.c1 = r2.c1 WHERE (((r1.c1 < 20) OR ((r1.c1 IS NULL) AND (r2.c1 < 
5 GROUP BY 2 ORDER BY array_agg(DISTINCT (r1.c1 % 5) ORDER BY ((r1.c1 % 5)) 
ASC NULLS LAST) ASC 

Re: POC, WIP: OR-clause support for indexes

2023-12-03 Thread Andrei Lepikhov

Hi,

Here is the next version of the patch where, I think, all of Roberts's 
claims related to the code have been fixed.

For example, expression 'x < 1 OR x < 2' is transformed to
'x < ANY (1,2)'.

Here, we still need to deal with the architectural issues. I like the 
approach mentioned by Peter: try to transform the expression tree to 
some 'normal' form, which is more laconic and simple; delay the search 
for any optimization ways to the following stages.


Also, it doesn't pass pg_dump test. At first glance, it is a problem of 
regex expression, which should be corrected further.


--
regards,
Andrei Lepikhov
Postgres Professional
From 73031b7acae68494ddd0f9b1faf4c94aae3bd6b0 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 23 Nov 2023 16:00:13 +0700
Subject: [PATCH] Transform OR clause to ANY expressions.

Replace (expr op C1) OR (expr op C2) ... with expr op ANY(C1, C2, ...) on the
preliminary stage of optimization when we are still working with the tree
expression.
Here C is a constant expression, 'expr' is non-constant expression, 'op' is
an operator which returns boolean result and has a commuter (for the case of
reverse order of constant and non-constant parts of the expression,
like 'CX op expr').
Sometimes it can lead to not optimal plan. But we think it is better to have
array of elements instead of a lot of OR clauses. Here is a room for further
optimizations on decomposing that array into more optimal parts.
---
 .../postgres_fdw/expected/postgres_fdw.out|  16 +-
 src/backend/nodes/queryjumblefuncs.c  |  30 ++
 src/backend/parser/parse_expr.c   | 310 +-
 src/backend/utils/misc/guc_tables.c   |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/queryjumble.h   |   1 +
 src/include/parser/parse_expr.h   |   1 +
 src/test/regress/expected/create_index.out| 156 -
 src/test/regress/expected/inherit.out |   2 +-
 src/test/regress/expected/join.out|  62 +++-
 src/test/regress/expected/partition_prune.out | 219 +++--
 src/test/regress/expected/rules.out   |   4 +-
 src/test/regress/expected/stats_ext.out   |  12 +-
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/expected/tidscan.out |  23 +-
 src/test/regress/sql/create_index.sql |  35 ++
 src/test/regress/sql/join.sql |  10 +
 src/test/regress/sql/partition_prune.sql  |  22 ++
 src/test/regress/sql/tidscan.sql  |   6 +
 src/tools/pgindent/typedefs.list  |   2 +
 20 files changed, 859 insertions(+), 67 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0a5bdf8bcc..ff69b2cd3b 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1349,7 +1349,7 @@ SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN 
(SELECT * FROM ft5 WHERE
  Foreign Scan
Output: t1.c1, t1.c2, ft5.c1, ft5.c2
Relations: (public.ft4 t1) LEFT JOIN (public.ft5)
-   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 < 
10) OR (r4.c1 IS NULL))) AND ((r1.c1 < 10))
+   Remote SQL: SELECT r1.c1, r1.c2, r4.c1, r4.c2 FROM ("S 1"."T 3" r1 LEFT 
JOIN "S 1"."T 4" r4 ON (((r1.c1 = r4.c1)) AND ((r4.c1 < 10 WHERE (((r4.c1 
IS NULL) OR (r4.c1 < 10))) AND ((r1.c1 < 10))
 (4 rows)
 
 SELECT t1.c1, t1.c2, t2.c1, t2.c2 FROM ft4 t1 LEFT JOIN (SELECT * FROM ft5 
WHERE c1 < 10) t2 ON (t1.c1 = t2.c1)
@@ -3112,7 +3112,7 @@ select array_agg(distinct (t1.c1)%5) from ft4 t1 full 
join ft5 t2 on (t1.c1 = t2
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5))), ((t2.c1 % 3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE (((r1.c1 < 
20) OR ((r1.c1 IS NULL) AND (r2.c1 < 5 GROUP BY 2 ORDER BY 
array_agg(DISTINCT (r1.c1 % 5)) ASC NULLS LAST
+   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5)), (r2.c1 % 3) FROM ("S 
1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1 WHERE r1.c1 IS 
NULL) AND (r2.c1 < 5)) OR (r1.c1 < 20))) GROUP BY 2 ORDER BY array_agg(DISTINCT 
(r1.c1 % 5)) ASC NULLS LAST
 (4 rows)
 
 select array_agg(distinct (t1.c1)%5) from ft4 t1 full join ft5 t2 on (t1.c1 = 
t2.c1) where t1.c1 < 20 or (t1.c1 is null and t2.c1 < 5) group by (t2.c1)%3 
order by 1;
@@ -3130,7 +3130,7 @@ select array_agg(distinct (t1.c1)%5 order by (t1.c1)%5) 
from ft4 t1 full join ft
  Foreign Scan
Output: (array_agg(DISTINCT (t1.c1 % 5) ORDER BY (t1.c1 % 5))), ((t2.c1 % 
3))
Relations: Aggregate on ((public.ft4 t1) FULL JOIN (public.ft5 t2))
-   Remote SQL: SELECT array_agg(DISTINCT (r1.c1 % 5) ORDER BY ((r1.c1 % 5)) 
ASC NULLS 

  1   2   >