[jira] Updated: (DERBY-47) Some possible improvements to IN optimization

A B (JIRA) Fri, 16 Mar 2007 16:07:29 -0800

     [ 
https://issues.apache.org/jira/browse/DERBY-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


A B updated DERBY-47:
---------------------

    Attachment: d47_mp_junitTest_v1.patch

Thank you for the review of the cleanup patch, Bryan!  I will continue with my 
plan to commit that patch before the end of day on Monday if no other comments 
come in.

I'm also attaching here a (final?) patch for this issue: 
d47_mp_junitTest_v1.patch, which creates a new JUnit test based on the repro 
program attached to this issue (thanks James Synge!).  The test creates the 
same kind of table and data that Derby47PerformanceTest.java creates, and then 
runs three types of queries with larger and larger IN lists.  The three types 
of queries are:

  1 - "Markers" : same as in James' program
  2 - "Literals" : same as in James' program
  3 - 'MixedIds": IN list has a combination of parameter markers and literals.

For each query we check to make sure the results are correct and then we look 
at the query plan to determine whether or not the optimizer chose to do 
multi-probing.  If the results are incorrect or if the optimizer did *not* 
choose multi-probing then the test will fail.

The test determines that "multi-probing" was in effect by looking at the query 
plan and verifying two things:

  1. We did an index scan on the target table AND

  2. The number of rows that "qualified" is equal to the number of rows that 
were actually returned for the query.  If we did *not* do multi-probing then we 
would scan all or part of the index and then apply the IN-list restriction 
after reading the rows.  That means that the number of rows "qualified" for the 
scan would be greater than the number of rows returned from the query.  But if 
we do multi-probing we will just probe for rows that we know satsify the 
restriction, thus the number of rows that we "fetch" for the scan (i.e. "rows 
qualified") should exactly match the number of rows in the result set.

I ran the new test using ibm142, ibm15, jdk142, jdk15, jdk16, and weme6.1 and 
it passed in all cases.

I also ran the new test against a build that I created before any of the 
DERBY-47 changes were committed; as expected, the test failed because even 
though the optimizer did chose to use an index, it scanned a lot (thousands) of 
extra rows for that index.

Any reviews/comments on the this new JUnit test are very much welcomed.  In the 
absence of any feedback to the contrary, I'm thinking I'll commit this new test 
by the end of Monday, as well.  And of course, comments/suggestions can still 
be made after that, if needed...

> Some possible improvements to IN optimization
> ---------------------------------------------
>
>                 Key: DERBY-47
>                 URL: https://issues.apache.org/jira/browse/DERBY-47
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.0.2.0
>         Environment: all
>            Reporter: Sunitha Kambhampati
>         Assigned To: A B
>         Attachments: d47_engine_doNotCommit_v1.patch, 
> d47_engine_doNotCommit_v1.stat, d47_mp_addlTestCases.patch, 
> d47_mp_CBO_MoAP_v1.patch, d47_mp_CBO_MoAP_v1.stat, d47_mp_cleanup_v1.patch, 
> d47_mp_cleanup_v1.stat, d47_mp_codeGen_v1.patch, d47_mp_codeGen_v1.stat, 
> d47_mp_exec_v1.patch, d47_mp_exec_v1.stat, d47_mp_junitTest_v1.patch, 
> d47_mp_masters_v1.patch, d47_mp_preprocess_v1.patch, 
> d47_mp_preprocess_v1.stat, d47_mp_preprocess_v2.patch, 
> d47_mp_relOpPredCheck_v1.patch, d47_mp_relOpPredCheck_v1.stat, 
> derby-47-performance-data.txt, derby-47-performance-data.txt, 
> Derby47PerformanceTest.java, Derby47PerformanceTest.java, 
> Derby47PerformanceTest.java, InListOperatorNode.java, 
> QueryPlanUniqueIndexAndWordIndexOneTerm.txt, 
> QueryPlanUniqueIndexAndWordIndexTwoTerms.txt, 
> QueryPlanUniqueIndexOnlyOneTerm.txt, QueryPlanUniqueIndexOnlyTwoTerms.txt, 
> readlocks.diff, readlocks_withContext.diff, readlocks_withContext.diff
>
>
> Consider a simple case of  - 
> A table tbl has 10000 rows, there is a primary key index on i1
> and the query in question is 
>  select * from tbl where i1 in (-1,100000)
> derby does a table scan of the entire table even though the "IN" list has 
> only two values and the comparison is on a field that has an index.
> Briefly looking at the code, it seems like we insert a between and use the IN 
> list to get the start and stop values for the scan. Thus the range of the 
> values in the "IN" list here plays an important role. 
> Thus if the query was changed to select * from tbl where i1 in (-1, 1), an 
> index scan would be chosen.
> It would be nice if we could do something clever in this case where there is 
> clearly an index on the field and the number of values in the IN list is 
> known. Maybe use the rowcount estimate and the IN list size to do some 
> optimizations.  
> - consider the length of the "IN" list to do searches on the table.  ie use 
> the IN list values to do index key searches on the table,
> -or try to convert it to a join. Use the "IN" list values to create a 
> temporary table and do a join. It is most likely that the optimizer will 
> choose the table with "IN" list here as the outer table in the join and thus 
> will do key searches on the larger table. 
> -------------------------------------------------------------------
> some query plans that I logged using derby.language.logQueryPlan=true for 
> some similar queries:
> Table has ascending values from 0 - 9999 for i1. primary key index on i1.
> GMT Thread[UT0,5,main] (XID = 19941), (SESSIONID = 0), select * from 
> scanfixed where i1 in (-1,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) 
> ******* Project-Restrict ResultSet (2):
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 9990
> restriction = true
> projection = false
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count:          750.38
>       optimizer estimated cost:         8579.46
> Source result set:
>       Table Scan ResultSet for SCANFIXED at read committed isolation level 
> using instantaneous share row locking chosen by the optimizer
>       Number of opens = 1
>       Rows seen = 10000
>       Rows filtered = 0
>       Fetch Size = 16
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               next time in milliseconds/row = 0
>       scan information: 
>               Bit set of columns fetched=All
>               Number of columns fetched=9
>               Number of pages visited=417
>               Number of rows qualified=10000
>               Number of rows visited=10000
>               Scan type=heap
>               start position: 
> null          stop position: 
> null          qualifiers:
> Column[0][0] Id: 0
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> Column[0][1] Id: 0
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
>               optimizer estimated row count:          750.38
>               optimizer estimated cost:         8579.46
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> l
> 2004-10-14 18:59:47.577 GMT Thread[UT0,5,main] (XID = 19216), (SESSIONID = 
> 0), select * from scanfixed where i1 in 
> (9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict 
> ResultSet (3):
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> restriction = true
> projection = true
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count:            4.80
>       optimizer estimated cost:           39.53
> Source result set:
>       Index Row to Base Row ResultSet for SCANFIXED:
>       Number of opens = 1
>       Rows seen = 10
>       Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6, 7, 8}
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               optimizer estimated row count:            4.80
>               optimizer estimated cost:           39.53
>               Index Scan ResultSet for SCANFIXED using index SCANFIXEDX at 
> read committed isolation level using instantaneous share row locking chosen 
> by the optimizer
>               Number of opens = 1
>               Rows seen = 10
>               Rows filtered = 0
>               Fetch Size = 16
>                       constructor time (milliseconds) = 0
>                       open time (milliseconds) = 0
>                       next time (milliseconds) = 0
>                       close time (milliseconds) = 0
>                       next time in milliseconds/row = 0
>               scan information: 
>                       Bit set of columns fetched=All
>                       Number of columns fetched=2
>                       Number of deleted rows visited=0
>                       Number of pages visited=2
>                       Number of rows qualified=10
>                       Number of rows visited=10
>                       Scan type=btree
>                       Tree height=2
>                       start position: 
>       >= on first 1 column(s).
>       Ordered null semantics on the following columns: 
>                       stop position: 
>       > on first 1 column(s).
>       Ordered null semantics on the following columns: 
>                       qualifiers:
> None
>                       optimizer estimated row count:            4.80
>                       optimizer estimated cost:           39.53

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-47) Some possible improvements to IN optimization

Reply via email to