[jira] [Updated] (IGNITE-22328) Improve test coverage for SQL planner optimization for JOIN

Iurii Gerzhedovich (Jira) Fri, 24 May 2024 07:58:04 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-22328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Iurii Gerzhedovich updated IGNITE-22328:
----------------------------------------
    Description: 
During implementation  planner optimization for JOIN in SQL ( IGNITE-18749 ) 
were added set of tests. However added set of tests is insufficient. 
Let's add the following test scenarios:
h2. Performance Testing
 # Check bound intersection timings, i.e. between  MAX_SIZE_OF_JOIN_TO_OPTIMIZE 
up to MAX_SIZE_OF_JOIN_TO_OPTIMIZE + 1 tables joining. Take into account that 
this approach is applies only to the planning *phase* of execution engine thus 
statement for N joins (with empty data rows in table) will need to consume 
equal time in comparison with an equal query but with N+1 joins instead.
 # Check there is no sufficient difference between involved N and N+1 tables. 
We need to store (somehow) performance results of such a checks from build to 
build

h2. Functional and End-to-End Testing:

pre requisites:

Current implementation uses MAX_SIZE_OF_JOIN_TO_OPTIMIZE = 5; constant as a 
threshold for disabling JOIN COMMUTE rules, thus all sql statements need to 
have joins with more than 5 tables\sources.

 
 # All possible joins INNER, OUTER(LEFT, RIGHT), NATURAL, SELF need to be 
checked.
 # Not only tables can be used as a sources for JOIN operations, but 
subqueries(with and without table sources), *system_range* function and system 
views.
 # Due to non optimal plans for some statements are raised - fill distributed 
table (more than 3 nodes) with data step by step (10k 100k and so on rows) and 
run all from p1.
 # Mutate statements to change sequence of join order, i.e. for: ON T1.custId = 
T2.custId also need to be checked: ON T2.custId = T1.custId. Self check: you 
need to obtain two different plans (explain plan for sql statement):

 #   Ignite.*Join

  ...

      Ignite.*Scan(table=[[PUBLIC, T1]]

    ...

      Ignite.*Scan(table=[[PUBLIC, T2]]

 
 #   Ignite.*Join

  ...

      Ignite.*Scan(table=[[PUBLIC, T2]]

    ...

      Ignite.*Scan(table=[[PUBLIC, T1]]

 
 # Check over > 1000 tables

 

Functional tests complete successfully if no timeout or any other exceptions 
are defined in a log and all statements (up to 50 different tables\sources) are 
passed.

Example of SELF join:

SELECT _column_name(s)_

FROM _table1 T1, table1 T2_

WHERE {_}condition{_};

 

Example of join with subqueries:

SELECT t1.a, t2.b from t1, (SELECT 1 as b) as t2 where t1.a=t2.b

SELECT t1.a, t2.b from t1, (SELECT b as b from integers1 where b>1) as t2 where 
t1.a=t2.b

  was:
During implementation  planner optimization for JOIN in SQL ( IGNITE-18749 ) 
were added set of tests. However added set of tests is insufficient.
Let's add the following test scenarios:
h2. Performance Testing
 # Check bound intersection timings, i.e. between  MAX_SIZE_OF_JOIN_TO_OPTIMIZE 
up to MAX_SIZE_OF_JOIN_TO_OPTIMIZE + 1 tables joining. Take into account that 
this approach is applies only to the planning *phase* of execution engine thus 
statement for N joins (with empty data rows in table) will need to consume 
equal time in comparison with an equal query but with N+1 joins instead.
 # Check there is no sufficient difference between involved N and N+1 tables. 
We need to store (somehow) performance results of such a checks from build to 
build

h2. Functional and End-to-End Testing:

pre requisites:

Current implementation uses MAX_SIZE_OF_JOIN_TO_OPTIMIZE = 5; constant as a 
threshold for disabling JOIN COMMUTE rules, thus all sql statements need to 
have joins with more than 5 tables\sources.

 
 # All possible joins INNER, OUTER(LEFT, RIGHT), NATURAL, SELF need to be 
checked.
 # Not only tables can be used as a sources for JOIN operations, but 
subqueries(with and without table sources), *system_range* function and system 
views.
 # Due to non optimal plans for some statements are raised - fill distributed 
table (more than 3 nodes) with data step by step (10k 100k and so on rows) and 
run all from p1.
 # Mutate statements to change sequence of join order, i.e. for: ON T1.custId = 
T2.custId also need to be checked: ON T2.custId = T1.custId. Self check: you 
need to obtain two different plans (explain plan for sql statement):

 #   Ignite.*Join

  ...

      Ignite.*Scan(table=[[PUBLIC, T1]]

    ...

      Ignite.*Scan(table=[[PUBLIC, T2]]

 
 #   Ignite.*Join

  ...

      Ignite.*Scan(table=[[PUBLIC, T2]]

    ...

      Ignite.*Scan(table=[[PUBLIC, T1]]

 
 #  Check over > 1000 tables

 

Functional tests complete successfully if no timeout or any other exceptions 
are defined in a log and all statements (up to 50 different tables\sources) are 
passed.





Example of SELF join:

SELECT _column_name(s)_

FROM _table1 T1, table1 T2_

WHERE {_}condition{_};

 

Example of join with subqueries:

SELECT t1.a, t2.b from t1, (SELECT 1 as b) as t2 where t1.a=t2.b

SELECT t1.a, t2.b from t1, (SELECT b as b from integers1 where b>1) as t2 where 
t1.a=t2.b


> Improve test coverage for SQL planner optimization for JOIN
> -----------------------------------------------------------
>
>                 Key: IGNITE-22328
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22328
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Iurii Gerzhedovich
>            Priority: Major
>              Labels: ignite-3
>
> During implementation  planner optimization for JOIN in SQL ( IGNITE-18749 ) 
> were added set of tests. However added set of tests is insufficient. 
> Let's add the following test scenarios:
> h2. Performance Testing
>  # Check bound intersection timings, i.e. between  
> MAX_SIZE_OF_JOIN_TO_OPTIMIZE up to MAX_SIZE_OF_JOIN_TO_OPTIMIZE + 1 tables 
> joining. Take into account that this approach is applies only to the planning 
> *phase* of execution engine thus statement for N joins (with empty data rows 
> in table) will need to consume equal time in comparison with an equal query 
> but with N+1 joins instead.
>  # Check there is no sufficient difference between involved N and N+1 tables. 
> We need to store (somehow) performance results of such a checks from build to 
> build
> h2. Functional and End-to-End Testing:
> pre requisites:
> Current implementation uses MAX_SIZE_OF_JOIN_TO_OPTIMIZE = 5; constant as a 
> threshold for disabling JOIN COMMUTE rules, thus all sql statements need to 
> have joins with more than 5 tables\sources.
>  
>  # All possible joins INNER, OUTER(LEFT, RIGHT), NATURAL, SELF need to be 
> checked.
>  # Not only tables can be used as a sources for JOIN operations, but 
> subqueries(with and without table sources), *system_range* function and 
> system views.
>  # Due to non optimal plans for some statements are raised - fill distributed 
> table (more than 3 nodes) with data step by step (10k 100k and so on rows) 
> and run all from p1.
>  # Mutate statements to change sequence of join order, i.e. for: ON T1.custId 
> = T2.custId also need to be checked: ON T2.custId = T1.custId. Self check: 
> you need to obtain two different plans (explain plan for sql statement):
>  #   Ignite.*Join
>   ...
>       Ignite.*Scan(table=[[PUBLIC, T1]]
>     ...
>       Ignite.*Scan(table=[[PUBLIC, T2]]
>  
>  #   Ignite.*Join
>   ...
>       Ignite.*Scan(table=[[PUBLIC, T2]]
>     ...
>       Ignite.*Scan(table=[[PUBLIC, T1]]
>  
>  # Check over > 1000 tables
>  
> Functional tests complete successfully if no timeout or any other exceptions 
> are defined in a log and all statements (up to 50 different tables\sources) 
> are passed.
> Example of SELF join:
> SELECT _column_name(s)_
> FROM _table1 T1, table1 T2_
> WHERE {_}condition{_};
>  
> Example of join with subqueries:
> SELECT t1.a, t2.b from t1, (SELECT 1 as b) as t2 where t1.a=t2.b
> SELECT t1.a, t2.b from t1, (SELECT b as b from integers1 where b>1) as t2 
> where t1.a=t2.b



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (IGNITE-22328) Improve test coverage for SQL planner optimization for JOIN

Reply via email to