[ 
https://issues.apache.org/jira/browse/CALCITE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Solimando updated CALCITE-7029:
------------------------------------------
    Description: 
Add conflict detection algorithm CD-C to DPhyp so that it can handle various 
join types. 

DpHyp algorithm is a join reorder algorithm based on dynamic programming. It 
can enumerate all possibilities of the query graph without duplication, and it 
can theoretically handle complex join predicates and various types of joins 
(outer, semi, anti, etc.).

CALCITE-6846 completed the basic DpHyp algorithm, defined the hypergraph 
structure and enumeration process. It can enumerate the query graph without 
duplication and handle complex join predicates, but is limited to inner join. 
The ability to handle various types of joins requires conflict detection.

This pr implements the CD-C conflict detection algorithm based on the paper 
{_}On the correct and complete enumeration of the core search space{_}. The 
conflict detection algorithm does not change the enumeration process of DpHyp. 
It calculates the conflict rules for each join operator in the process of 
constructing the hypergraph from the plan tree, and verifies the applicability 
of csg-cmp according to the conflict rules when DpHyp enumerates csg-cmp.

 

The following is an example of sql and the expected plan:

sql
{code:java}
select emp.empno from 
emp_address inner join emp on emp_address.empno = emp.empno 
left join dept on emp.deptno = dept.deptno 
inner join dept_nested on dept.deptno = dept_nested.deptno {code}
initial plan
{code:java}
LogicalProject(EMPNO=[$3])
  LogicalJoin(condition=[=($12, $14)], joinType=[inner])
    LogicalJoin(condition=[=($10, $12)], joinType=[left])
      LogicalJoin(condition=[=($0, $3)], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
build hypergraph
{code:java}
LogicalProject(EMPNO=[$3])
  HyperGraph(edges=[{0}——[INNER, =(vertex(0)_field(0), 
vertex(1)_field(0))]——{1},{1}——[LEFT, =(vertex(1)_field(7), 
vertex(2)_field(0))]——{2},{1, 2}——[INNER, =(vertex(2)_field(0), 
vertex(3)_field(0))]——{3}])
    LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
after dphyp
{code:java}
LogicalProject(EMPNO=[$4])
  LogicalJoin(condition=[=($15, $4)], joinType=[inner])
    LogicalJoin(condition=[=($13, $0)], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
      LogicalJoin(condition=[=($7, $9)], joinType=[left])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
    LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code}

  was:
Add conflict detection algorithm CD-C to DPhyp so that it can handle various 
join types. 

DpHyp algorithm is a join reorder algorithm based on dynamic programming. It 
can enumerate all possibilities of the query graph without duplication, and it 
can theoretically handle complex join predicates and various types of joins 
(outer, semi, anti, etc.).

Calcite-6846 completed the basic DpHyp algorithm, defined the hypergraph 
structure and enumeration process. It can enumerate the query graph without 
duplication and handle complex join predicates, but is limited to inner join. 
The ability to handle various types of joins requires conflict detection.

This pr implements the CD-C conflict detection algorithm based on the paper 
{_}On the correct and complete enumeration of the core search space{_}. The 
conflict detection algorithm does not change the enumeration process of DpHyp. 
It calculates the conflict rules for each join operator in the process of 
constructing the hypergraph from the plan tree, and verifies the applicability 
of csg-cmp according to the conflict rules when DpHyp enumerates csg-cmp.

 

The following is an example of sql and the expected plan:

sql
{code:java}
select emp.empno from 
emp_address inner join emp on emp_address.empno = emp.empno 
left join dept on emp.deptno = dept.deptno 
inner join dept_nested on dept.deptno = dept_nested.deptno {code}
initial plan
{code:java}
LogicalProject(EMPNO=[$3])
  LogicalJoin(condition=[=($12, $14)], joinType=[inner])
    LogicalJoin(condition=[=($10, $12)], joinType=[left])
      LogicalJoin(condition=[=($0, $3)], joinType=[inner])
        LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
build hypergraph
{code:java}
LogicalProject(EMPNO=[$3])
  HyperGraph(edges=[{0}——[INNER, =(vertex(0)_field(0), 
vertex(1)_field(0))]——{1},{1}——[LEFT, =(vertex(1)_field(7), 
vertex(2)_field(0))]——{2},{1, 2}——[INNER, =(vertex(2)_field(0), 
vertex(3)_field(0))]——{3}])
    LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
    LogicalTableScan(table=[[CATALOG, SALES, EMP]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
after dphyp
{code:java}
LogicalProject(EMPNO=[$4])
  LogicalJoin(condition=[=($15, $4)], joinType=[inner])
    LogicalJoin(condition=[=($13, $0)], joinType=[inner])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
      LogicalJoin(condition=[=($7, $9)], joinType=[left])
        LogicalTableScan(table=[[CATALOG, SALES, EMP]])
        LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
    LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code}


> Support DPhyp to handle various join types
> ------------------------------------------
>
>                 Key: CALCITE-7029
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7029
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.39.0
>            Reporter: Silun Dong
>            Assignee: Silun Dong
>            Priority: Major
>              Labels: pull-request-available
>
> Add conflict detection algorithm CD-C to DPhyp so that it can handle various 
> join types. 
> DpHyp algorithm is a join reorder algorithm based on dynamic programming. It 
> can enumerate all possibilities of the query graph without duplication, and 
> it can theoretically handle complex join predicates and various types of 
> joins (outer, semi, anti, etc.).
> CALCITE-6846 completed the basic DpHyp algorithm, defined the hypergraph 
> structure and enumeration process. It can enumerate the query graph without 
> duplication and handle complex join predicates, but is limited to inner join. 
> The ability to handle various types of joins requires conflict detection.
> This pr implements the CD-C conflict detection algorithm based on the paper 
> {_}On the correct and complete enumeration of the core search space{_}. The 
> conflict detection algorithm does not change the enumeration process of 
> DpHyp. It calculates the conflict rules for each join operator in the process 
> of constructing the hypergraph from the plan tree, and verifies the 
> applicability of csg-cmp according to the conflict rules when DpHyp 
> enumerates csg-cmp.
>  
> The following is an example of sql and the expected plan:
> sql
> {code:java}
> select emp.empno from 
> emp_address inner join emp on emp_address.empno = emp.empno 
> left join dept on emp.deptno = dept.deptno 
> inner join dept_nested on dept.deptno = dept_nested.deptno {code}
> initial plan
> {code:java}
> LogicalProject(EMPNO=[$3])
>   LogicalJoin(condition=[=($12, $14)], joinType=[inner])
>     LogicalJoin(condition=[=($10, $12)], joinType=[left])
>       LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> build hypergraph
> {code:java}
> LogicalProject(EMPNO=[$3])
>   HyperGraph(edges=[{0}——[INNER, =(vertex(0)_field(0), 
> vertex(1)_field(0))]——{1},{1}——[LEFT, =(vertex(1)_field(7), 
> vertex(2)_field(0))]——{2},{1, 2}——[INNER, =(vertex(2)_field(0), 
> vertex(3)_field(0))]——{3}])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> after dphyp
> {code:java}
> LogicalProject(EMPNO=[$4])
>   LogicalJoin(condition=[=($15, $4)], joinType=[inner])
>     LogicalJoin(condition=[=($13, $0)], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
>       LogicalJoin(condition=[=($7, $9)], joinType=[left])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to