On 29 April 2014 13:28, Hadi Moshayedi Wrote,

>This looks like a great improvement. Repeating Nicolas's question, do you have 
>a real-world example of such joins?

I can think of some scenario where, user need to self-join and find the 
comparison  with other tuples, For example, list down all the employee which 
has less salary compare to others employees and count of the employees who are 
earning more than that emp. Like query given below

“select ta.emp_name, count(*) from t1 as ta, t1 as tb where 
ta.emp_salary<tb.emp_salary group by ta.emp_name;”

>In my experience, I see more queries like "self-join table A and table B where 
>A.time BETWEEN B.time - '1 week' and B.time", similar to what Nicolas and Tom 
>mentioned. As an example, "count users who placed   an order in the week 
>following their registration".

Currently I have implemented very basic POC which can work only for a < b 
query, I think actual patch can be enhanced for these type of queries also.

>Can you send a patch so we can also try it?

Patch is attached in the mail, but for testing we need to take care of some 
points

1.      Patch is implemented only for a<b type of queries (only for table with 
one integer field, this can be modified in create_nestloop_plan if needed, I 
have written for basic test with integer).



2.      What changes are done

There is no changes done in planner cost calculation, so hack is put while 
generating the plan.

IF planner has selected NLJ plan, and enable material is set to off (this is 
the hint to select special Merge Join)

Then add sort node above left and right tree for NLJ.



3.      So if you want to test with normal NLJ no need to change anything, and 
if you want to test using this merge join just run ‘set enable_material=off’;


postgres=# explain select count(*) from t1 as ta, t1 as tb where ta.a<tb.a;
                                QUERY PLAN
---------------------------------------------------------------------------
Aggregate  (cost=396625.51..396625.52 rows=1 width=0)
   ->  Nested Loop  (cost=0.00..375758.83 rows=8346672 width=0)
         Join Filter: (ta.a < tb.a)
         ->  Seq Scan on t1 ta  (cost=0.00..73.04 rows=5004 width=4)
         ->  Materialize  (cost=0.00..98.06 rows=5004 width=4)
               ->  Seq Scan on t1 tb  (cost=0.00..73.04 rows=5004 width=4)
Planning time: 0.291 ms
(7 rows)

Now For enabling this merge Join
postgres=# set enable_material=off;
SET
postgres=# explain select count(*) from t1 as ta, t1 as tb where ta.a<tb.a;
                                QUERY PLAN
---------------------------------------------------------------------------
Aggregate  (cost=699432.08..699432.09 rows=1 width=0)
   ->  Nested Loop  (cost=0.00..678565.40 rows=8346672 width=0)
         Join Filter: (ta.a < tb.a)
         ->  Sort  (cost=380.51..393.02 rows=5004 width=4)
               Sort Key: ta.a
               ->  Seq Scan on t1 ta  (cost=0.00..73.04 rows=5004 width=4)
         ->  Sort  (cost=380.51..393.02 rows=5004 width=4)
               Sort Key: tb.a
               ->  Seq Scan on t1 tb  (cost=0.00..73.04 rows=5004 width=4)
Planning time: 0.286 ms
(10 rows)


Thanks & Regards,
 Dilip Kumar


Attachment: merge_join_nonequal.patch
Description: merge_join_nonequal.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to