bt Scankey in another contradictory case

2024-08-30 Thread bigbro...@hotmail.com
Hi,
when i read source code of the part of nbtree, i detected another kind of 
contradictory case postgresql has not eliminated 
(eg. x >4 and x <3) in the function _bt_preprocess_keys in nbtutils.c. this 
cause next unnecessary index scan and qual evaluation.
it seems no one will write SQL like that, but maybe it generated from the ec or 
degenerate qual. 
the function just invoked in function _bt_first during initializing index scan, 
so it’s cheap to eliminate this case in the function.
we just pay attention on the opposite operator, so there are four detailed 
cases (ie. x >4 and x <3 , x >4 and x <=3, x >=4 and x <3, x >=4 and x <=3).
when test the case is contradictory or not, it's need to use the more 
restrictive operator when less and more restrictive operator both appeared, 
case like x >4 and x <=4, if we choose <=, 4 <= 4 is satisfied, actually x >4 
and x <=4 is contradictory. 
when >= with <= or > with <, it's ok to pick any one of them.
i have added this feature in my local environment and tested. but i am not sure 
it's worth to spend more effort on it. 


bigbro...@hotmail.com


Re: Re: bt Scankey in another contradictory case

2024-09-01 Thread bigbro...@hotmail.com
Hi,
I have reanalysed the code of function _bt_first. I notice that using a 
multi-attribute index
if we can't identify the starting boundaries and the following attributes 
markded not required ,
that means we need start at first or last page in the index to examine every 
tuple to satisfy the 
qual or not, in the meantime the scan will be stopped while the first attribute 
evaluated failed.

For instance:
create table c_s( x int, y int);
insert into c_s select generate_series(1, 2), generate_series(1, 2);
create index my_c_s_idx on c_s using btree(x,y);
explain (analyze, buffers) select * from c_s where x >4000 and y >10 and y 
<10 order by x desc;

 Index Only Scan Backward using my_c_s_idx on c_s  (cost=0.29..384.31 rows=1 
width=8) (actual time=1.302..1.304 rows=0 loops=1)
   Index Cond: ((x > 4000) AND (y > 10) AND (y < 10))
   Heap Fetches: 0
   Buffers: shared read=46
 Planning:
   Buffers: shared hit=51 read=15
 Planning Time: 1.311 ms
 Execution Time: 1.435 ms
(8 rows)

The instance is a little different for description above due to the implies NOT 
NULL Scankey, 
but it has no effect on the whole situation. 

What's more, if i change the number 4000 to 1000.
-
 Sort  (cost=441.01..441.01 rows=1 width=8) (actual time=2.974..2.975 rows=0 
loops=1)
   Sort Key: x DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=89
   ->  Seq Scan on c_s  (cost=0.00..441.00 rows=1 width=8) (actual 
time=2.971..2.971 rows=0 loops=1)
 Filter: ((x > 1000) AND (y > 10) AND (y < 10))
 Rows Removed by Filter: 2
 Buffers: shared hit=89
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.113 ms
 Execution Time: 2.990 ms
(12 rows)

The planner choosed the Seq Scan, and the executor have done the unnecessary 
jobs 2 times.

Let's don't confine to the plain attributes or row comparison and Seq Scan or 
Index Scan . 
We can pretend row-comparison as multi-attributes comparison. The qual is 
implicit-AND format, 
that means once one attribute is self-contradictory, we can abandon the qual 
immediately.

Maybe we can do more efficient jobs during planning time. Like at the phase of 
function deconstruct_recurse 
invoked, we can identify qual that is self-contradictory then flag it. With 
this information we know who is
a dummy relation named arbitrarily.

For instance:

explain (analyze, buffers) select * from c_s a , c_s b where a.y >10 and a.y<10 
and a.x=b.x;
  QUERY PLAN
   
---
 Nested Loop  (cost=0.29..393.31 rows=1 width=16) (actual time=1.858..1.858 
rows=0 loops=1)
   Buffers: shared hit=89
   ->  Seq Scan on c_s a  (cost=0.00..389.00 rows=1 width=8) (actual 
time=1.857..1.857 rows=0 loops=1)
 Filter: ((y > 10) AND (y < 10))
 Rows Removed by Filter: 2
 Buffers: shared hit=89
   ->  Index Only Scan using my_c_s_idx on c_s b  (cost=0.29..4.30 rows=1 
width=8) (never executed)
 Index Cond: (x = a.x)
 Heap Fetches: 0
 Planning:
   Buffers: shared hit=12
 Planning Time: 0.236 ms
 Execution Time: 1.880 ms
(13 rows)

After deconstruct_recurse invoked, qual(a.y >10 and a.y<10) was distributed to 
relation "a" and relation "a" is 
a dummy relation. When "a" and "b" make inner join, "a" will supply nothing. 
That means the inner-join rel is 
a dummy relation too. If there is a outer join above the inner join and some 
one who can commute with it,
we can do more efficient jobs and so on.
It also has benefit on path-competing phase with this feature due to it doesn't 
cost anything.

It's need to discuss the idea whether is feasible or not and it will takes a 
lot of effort to complete this feature. 
Anyway we can fix these issues what we had encountered first.



bigbro...@hotmail.com
 
From: Peter Geoghegan
Date: 2024-08-30 22:32
To: b ro
CC: pgsql-hackers
Subject: Re: bt Scankey in another contradictory case
On Fri, Aug 30, 2024 at 7:36 AM b ro  wrote:
>   this is the patch attachment.
 
We discussed this recently:
 
https://www.postgresql.org/message-id/80384.1715458896%40sss.pgh.pa.us
 
I think that we should do this.
 
It doesn't make a huge difference in practice, because we'll still end
the scan once the leaf level is reached. But it matters more when
array keys are involved, where there might be more than one descent to
the leaf level. Plus we might as well just be thorough about this
stuff.
 
-- 
Peter Geoghegan


Repeat the condition check twice in function distribute_qual_to_rels

2024-11-02 Thread bigbro...@hotmail.com
Hi,
I notice that check the condtion again when after invoked process_equivalence 
and return failed.

if (restrictinfo->mergeopfamilies)
{
if (maybe_equivalence)
{
if (process_equivalence(root, &restrictinfo, jtitem->jdomain))
{
/* distribute to base  */
int relid;
if (bms_get_singleton_member(restrictinfo->required_relids, &relid))
{
restrictinfo->fromec = true;
distribute_restrictinfo_to_rels(root, restrictinfo);
}
return;
}
/* EC rejected it, so set left_ec/right_ec the hard way ... */
if (restrictinfo->mergeopfamilies) /* EC might have changed this */
initialize_mergeclause_eclasses(root, restrictinfo);
/* ... and fall through to distribute_restrictinfo_to_rels */
}
---
Above the codes, check the condition again when invoked process_equivalence 
failed,
but it's already under the positive condition "if 
(restrictinfo->mergeopfamilies)".
It's necessary ?
bigbro...@hotmail.com