[ https://issues.apache.org/jira/browse/IGNITE-22328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Iurii Gerzhedovich updated IGNITE-22328: ---------------------------------------- Description: During implementation planner optimization for JOIN in SQL ( IGNITE-18749 ) were added set of tests. However added set of tests is insufficient. Let's add the following test scenarios: h2. Performance Testing # Check bound intersection timings, i.e. between MAX_SIZE_OF_JOIN_TO_OPTIMIZE up to MAX_SIZE_OF_JOIN_TO_OPTIMIZE + 1 tables joining. Take into account that this approach is applies only to the planning *phase* of execution engine thus statement for N joins (with empty data rows in table) will need to consume equal time in comparison with an equal query but with N+1 joins instead. # Check there is no sufficient difference between involved N and N+1 tables. We need to store (somehow) performance results of such a checks from build to build h2. Functional and End-to-End Testing: pre requisites: Current implementation uses MAX_SIZE_OF_JOIN_TO_OPTIMIZE = 5; constant as a threshold for disabling JOIN COMMUTE rules, thus all sql statements need to have joins with more than 5 tables\sources. # All possible joins INNER, OUTER(LEFT, RIGHT), NATURAL, SELF need to be checked. # Not only tables can be used as a sources for JOIN operations, but subqueries(with and without table sources), *system_range* function and system views. # Due to non optimal plans for some statements are raised - fill distributed table (more than 3 nodes) with data step by step (10k 100k and so on rows) and run all from p1. # Mutate statements to change sequence of join order, i.e. for: ON T1.custId = T2.custId also need to be checked: ON T2.custId = T1.custId. Self check: you need to obtain two different plans (explain plan for sql statement): # Ignite.*Join ... Ignite.*Scan(table=[[PUBLIC, T1]] ... Ignite.*Scan(table=[[PUBLIC, T2]] # Ignite.*Join ... Ignite.*Scan(table=[[PUBLIC, T2]] ... Ignite.*Scan(table=[[PUBLIC, T1]] # Check over > 1000 tables Functional tests complete successfully if no timeout or any other exceptions are defined in a log and all statements (up to 50 different tables\sources) are passed. Example of SELF join: SELECT _column_name(s)_ FROM _table1 T1, table1 T2_ WHERE {_}condition{_}; Example of join with subqueries: SELECT t1.a, t2.b from t1, (SELECT 1 as b) as t2 where t1.a=t2.b SELECT t1.a, t2.b from t1, (SELECT b as b from integers1 where b>1) as t2 where t1.a=t2.b was: During implementation planner optimization for JOIN in SQL ( IGNITE-18749 ) were added set of tests. However added set of tests is insufficient. Let's add the following test scenarios: h2. Performance Testing # Check bound intersection timings, i.e. between MAX_SIZE_OF_JOIN_TO_OPTIMIZE up to MAX_SIZE_OF_JOIN_TO_OPTIMIZE + 1 tables joining. Take into account that this approach is applies only to the planning *phase* of execution engine thus statement for N joins (with empty data rows in table) will need to consume equal time in comparison with an equal query but with N+1 joins instead. # Check there is no sufficient difference between involved N and N+1 tables. We need to store (somehow) performance results of such a checks from build to build h2. Functional and End-to-End Testing: pre requisites: Current implementation uses MAX_SIZE_OF_JOIN_TO_OPTIMIZE = 5; constant as a threshold for disabling JOIN COMMUTE rules, thus all sql statements need to have joins with more than 5 tables\sources. # All possible joins INNER, OUTER(LEFT, RIGHT), NATURAL, SELF need to be checked. # Not only tables can be used as a sources for JOIN operations, but subqueries(with and without table sources), *system_range* function and system views. # Due to non optimal plans for some statements are raised - fill distributed table (more than 3 nodes) with data step by step (10k 100k and so on rows) and run all from p1. # Mutate statements to change sequence of join order, i.e. for: ON T1.custId = T2.custId also need to be checked: ON T2.custId = T1.custId. Self check: you need to obtain two different plans (explain plan for sql statement): # Ignite.*Join ... Ignite.*Scan(table=[[PUBLIC, T1]] ... Ignite.*Scan(table=[[PUBLIC, T2]] # Ignite.*Join ... Ignite.*Scan(table=[[PUBLIC, T2]] ... Ignite.*Scan(table=[[PUBLIC, T1]] # Check over > 1000 tables Functional tests complete successfully if no timeout or any other exceptions are defined in a log and all statements (up to 50 different tables\sources) are passed. Example of SELF join: SELECT _column_name(s)_ FROM _table1 T1, table1 T2_ WHERE {_}condition{_}; Example of join with subqueries: SELECT t1.a, t2.b from t1, (SELECT 1 as b) as t2 where t1.a=t2.b SELECT t1.a, t2.b from t1, (SELECT b as b from integers1 where b>1) as t2 where t1.a=t2.b > Improve test coverage for SQL planner optimization for JOIN > ----------------------------------------------------------- > > Key: IGNITE-22328 > URL: https://issues.apache.org/jira/browse/IGNITE-22328 > Project: Ignite > Issue Type: Improvement > Components: sql > Reporter: Iurii Gerzhedovich > Priority: Major > Labels: ignite-3 > > During implementation planner optimization for JOIN in SQL ( IGNITE-18749 ) > were added set of tests. However added set of tests is insufficient. > Let's add the following test scenarios: > h2. Performance Testing > # Check bound intersection timings, i.e. between > MAX_SIZE_OF_JOIN_TO_OPTIMIZE up to MAX_SIZE_OF_JOIN_TO_OPTIMIZE + 1 tables > joining. Take into account that this approach is applies only to the planning > *phase* of execution engine thus statement for N joins (with empty data rows > in table) will need to consume equal time in comparison with an equal query > but with N+1 joins instead. > # Check there is no sufficient difference between involved N and N+1 tables. > We need to store (somehow) performance results of such a checks from build to > build > h2. Functional and End-to-End Testing: > pre requisites: > Current implementation uses MAX_SIZE_OF_JOIN_TO_OPTIMIZE = 5; constant as a > threshold for disabling JOIN COMMUTE rules, thus all sql statements need to > have joins with more than 5 tables\sources. > > # All possible joins INNER, OUTER(LEFT, RIGHT), NATURAL, SELF need to be > checked. > # Not only tables can be used as a sources for JOIN operations, but > subqueries(with and without table sources), *system_range* function and > system views. > # Due to non optimal plans for some statements are raised - fill distributed > table (more than 3 nodes) with data step by step (10k 100k and so on rows) > and run all from p1. > # Mutate statements to change sequence of join order, i.e. for: ON T1.custId > = T2.custId also need to be checked: ON T2.custId = T1.custId. Self check: > you need to obtain two different plans (explain plan for sql statement): > # Ignite.*Join > ... > Ignite.*Scan(table=[[PUBLIC, T1]] > ... > Ignite.*Scan(table=[[PUBLIC, T2]] > > # Ignite.*Join > ... > Ignite.*Scan(table=[[PUBLIC, T2]] > ... > Ignite.*Scan(table=[[PUBLIC, T1]] > > # Check over > 1000 tables > > Functional tests complete successfully if no timeout or any other exceptions > are defined in a log and all statements (up to 50 different tables\sources) > are passed. > Example of SELF join: > SELECT _column_name(s)_ > FROM _table1 T1, table1 T2_ > WHERE {_}condition{_}; > > Example of join with subqueries: > SELECT t1.a, t2.b from t1, (SELECT 1 as b) as t2 where t1.a=t2.b > SELECT t1.a, t2.b from t1, (SELECT b as b from integers1 where b>1) as t2 > where t1.a=t2.b -- This message was sent by Atlassian Jira (v8.20.10#820010)