[ https://issues.apache.org/jira/browse/DERBY-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474895 ]

A B commented on DERBY-47:
--------------------------

In one of my previous comments I mentioned that d47_engine_doNOTCommit_v1.patch 
has a problem where the addition of an InListResultSet to the FromList causes 
all of the SUR (Scrollable Updatable Result set) tests that use an IN-clause to 
fail. More specifically, the presence of the InListResultSet causes Derby to 
downgrade the cursor scrollability to "CONCUR_READ_ONLY". 

I was eventually able to track down the cause of this behavior: whether or not 
Derby downgrades a result set to CONCUR_READ_ONLY depends on the value returned 
by the execution time result set in the "isForUpdate()" method.  The default 
(as defined in NoPutResultSetImpl) is to return false.

In the case of the SUR tests (prior to my changes) we were getting a 
ScrollInsensitiveResultSet on top of a TableScanResultSet.  The former gets its 
"isForUpdate()" value from the latter, and the latter correctly returns "true" 
to indicate that the result set is for update and thus no downgrade is needed. 
With d47_engine_doNotCommit_v1.patch applied, though, we add an InListResultSet 
to the FromList, which ultimately gives us a JoinResultSet at execution time.  
The JoinResultSet class does not define an "isForUpdate()" method, so we just 
return the default--which is "false".  That causes the cursor concurrency to be 
downgraded to CONCUR_READ_ONLY.  

To see what would happen I forced JoinResultSet to return "true" and then there 
was an ASSERT failure because JoinResultSets are not expected to be used for 
update/delete.  The relevant code is in execute/JoinResultSet.java; in the 
"getRowLocation()" method we see the following comment:

   * A join is combining rows from two sources, so it has no
   * single row location to return; just return a null.

My guess is that, since the decision to disallow update/delete on a 
JoinResultSet was intentional, trying to code around that restriction is a bad 
idea.  Or at the very least, it would require a lot more investigation and/or 
work.

Instead of pursuing that potentially treacherous path, I was able to come up 
with some logic that checks to see if a result set is updatable at compile time 
and, if so, to skip the IN-list rewrite.  Early testing suggests that this is 
a viable solution.

HOWEVER, as the list of "exceptions" to the IN-list (_v1.patch) rewrite grew, I 
started to wonder if there wasn't some other solution to DERBY-47 that would 
accomplish the same thing, minus all of the exception cases.

A few days later I came across some queries involving multiple IN-list 
predicates for a single SELECT statement.  It turns out that many of those 
queries return incorrect results (duplicate rows) and/or run much more slowly 
with the _v1 patch than without.

The fact that there are so many "exceptions to the rule" combined with the 
undesirable behavior in the face of multiple IN-lists prompted me to abandon my 
initial idea of internally adding an InListResultSetNode to the user's FROM 
list (d47_engine_doNOTCommit_v1.patch).

Instead I have been working on an alternate approach to the problem.  This 
second approach, like the first, is based on a comment made by someone else on 
this issue.  This time, though, the comment in question is from James Synge and 
is as follows:

> The goal of the re-write is to avoid the source of DERBY-47,
> which is that we don't have a multi-probe index scan, only
> range index scans, and worse still, when there are parameters
> involved, the range is the entire index.

In short, I decided to work with the execution-time result sets to see if it is 
possible to enforce some kind of "multi-probing" for indexes.  That is to say, 
instead of scanning a range of values on the index, we want to make it so that 
Derby probes the index N times, where N is the size of the IN-list.  For each 
probe, then, we'll get all rows for which the target column's value equals the 
N-th value in the IN-list.

Effectively, this comes down to the "Marker" strategy in 
Derby47Performance.java, where we evaluate an equality predicate, "column = ?", 
N times.  Except of course that, unlike with the Marker strategy, we do the N 
evaluations internally instead of making the user do it.  
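To make the idea concrete, here is a minimal sketch (NOT Derby code) of the Marker-style evaluation: a TreeMap stands in for an index on the target column, and each IN-list value triggers one equality lookup, i.e. one execution of "column = ?".  The class name, method name, and row representation are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class MarkerProbe {

    /** Probe the "index" once per IN-list value: N equality lookups,
     *  instead of one range scan spanning min(values)..max(values). */
    static List<String> probeEach(NavigableMap<Integer, List<String>> index,
                                  int[] inListValues) {
        List<String> result = new ArrayList<>();
        for (int v : inListValues) {
            // Equivalent of executing "column = ?" with ? bound to v.
            List<String> rows = index.get(v);
            if (rows != null) {
                result.addAll(rows);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, List<String>> index = new TreeMap<>();
        index.put(1, List.of("row-1"));
        index.put(500, List.of("row-500"));
        index.put(20000, List.of("row-20000"));

        // IN (1, 20000): two probes; row-500 is never even visited,
        // unlike a range scan over [1, 20000].
        System.out.println(probeEach(index, new int[] {1, 20000}));
    }
}
```

The point of the sketch is only the shape of the loop: the number of lookups is the size of the IN-list, not the width of the value range.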

A high-level description of how this approach will work is as follows:

  - If any value in the IN-list is not a constant AND is not a parameter,
    then do processing as usual (no optimizations, no rewrite).  Notice
    that with this approach we *do* still perform the optimization if
    IN-list values are parameters.  That is not the case for the current
    Derby rewrite (i.e. without any changes for DERBY-47).

  Otherwise:

  - During preprocessing, replace the IN-list with an equality predicate of
    the form "column = ?".  I.e. the right operand will be a parameter node,
    the left operand will be whatever column reference belongs to the IN-list
    (hereafter referred to as the "target column").  We call this new, internal
    predicate an IN-list "probe predicate" because it will be the basis of
    the multi-probing logic at execution time.

  - During costing, the equality predicate "column = ?" will be treated as
    a start/stop key by normal optimizer processing (no changes necessary).
    If the predicate is deemed a start/stop key for the *first* column in
    an index, then we'll multiply the estimated cost of "column = ?" by
    the size of the IN-list, to account for the fact that we're actually
    evaluating the predicate N times (not just once).

    If the predicate is not a start/stop key for the first column in
    the index, then we do not adjust the cost.  See below for why.

  - After we have found the best plan (including a best conglomerate choice),
    check to see if the IN-list probe predicate is a start/stop key on the
    first column in the chosen conglomerate.  In order for that to be the
    case the conglomerate must be an index (because we don't have start/stop
    keys on table conglomerates).  If the probe predicate is not a
    start/stop key for the *first* column in the index, then we "revert"
    the probe predicate back to its original IN-list form and evaluate it
    as a "normal" InListOperatorNode.  In this way we are effectively
    "giving up" on the multi-probing approach.  This is why we don't
    change the cost for such probe predicates (mentioned above).

    If the probe predicate *is* a start/stop key on the first column in
    the index conglomerate, then leave it as it is.

  - When it comes time to generate the byte code, look to see if we have
    a probe predicate that is a start/stop key on the first column in the
    chosen conglomerate.  If so, generate an array of DataValueDescriptors
    to hold the values from the corresponding IN-list and pass that array
    to the underlying execution-time result set (i.e. to TableScanResultSet).
    Then generate the probe predicate as a "normal" start/stop key for the
    scan.  This will serve as a "place-holder" for the IN-list values
    at execution time.

  - Finally, at execution time, instead of using a single start key and a
    single stop key to define a scan range, we iterate through the IN-list
    values and for each non-duplicate value V[i] (0 <= i < N) we treat V[i]
    as both a start key *and* a stop key.  Or put another way, we plug
    V[i] into the "column = ?" predicate and retrieve all matching rows.
    In this way we are "probing" the index for all rows having value V[i]
    in the target column.  Once all rows matching V[i] have been returned,
    we then grab the next IN-list value, V[i+1], and we reposition the
    scan (by calling "reopenCore()"), this time using V[i+1] as the start
    and stop key (i.e. plugging V[i+1] into "column = ?").  This will
    return all rows having value V[i+1] in the target column.

    Continue this process until all values in the IN-list have been
    exhausted.  When we reach that point, we are done.
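The execution-time loop in the last two steps can be sketched roughly as follows.  This is NOT actual Derby code: a NavigableMap stands in for the index conglomerate, and "reopening" the scan with V[i] as both start and stop key is simulated by asking the map for the entries equal to V[i].  All names here are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

public class MultiProbeScan {

    static List<String> multiProbe(NavigableMap<Integer, List<String>> index,
                                   int[] inListValues) {
        List<String> rows = new ArrayList<>();
        // Skip duplicate probe values; the TreeSet also sorts them, so
        // the index is probed in key order.
        TreeSet<Integer> probeValues = new TreeSet<>();
        for (int v : inListValues) probeValues.add(v);

        for (int v : probeValues) {
            // Each probe uses v as *both* start and stop key, i.e. it
            // plays the role of reopenCore() repositioning the scan on
            // the one-value range [v, v].
            for (List<String> matching
                     : index.subMap(v, true, v, true).values()) {
                rows.addAll(matching);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, List<String>> index = new TreeMap<>();
        index.put(1, List.of("id=1"));
        index.put(9999, List.of("id=9999a", "id=9999b"));
        index.put(20000, List.of("id=20000"));

        // "where id in (20000, 1, 1)" -> two distinct probes: 1, then 20000.
        System.out.println(multiProbe(index, new int[] {20000, 1, 1}));
    }
}
```

Result sets above the scan just see a single stream of qualifying rows; they have no idea the scan was repositioned N times underneath them.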

As a simple example, assume our query is of the form:

  select ... from admin.changes where id in (1, 20000)

During preprocessing we will effectively change this to be:

  select ... from admin.changes where id = ?

where "id = ?" is our IN-list "probe predicate".  Note that we must make sure 
the optimizer recognizes "id = ?" as a disguised IN-list operator, as opposed 
to a true relational operator.  The reason is that, while we do treat the probe 
predicate as a "fake" relational operator so that it can be seen as a 
start/stop key for the relevant indexes, there are many operations (ex. 
transitive closure) that should *not* be done on a probe predicate.  So we have 
to be able to distinguish a probe predicate from other relational predicates.
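As a purely hypothetical sketch of that distinction (not Derby's real class hierarchy), imagine predicates carrying a flag marking them as internal probe predicates; an operation like transitive closure would then filter them out before doing its work, while start/stop-key matching would treat them like any other relational predicate.  The class, record, and method names below are made up for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ProbePredicateDemo {

    /** Minimal stand-in for a relational predicate. */
    record Predicate(String text, boolean isInListProbe) { }

    /** Transitive closure (and similar operations) should only consider
     *  genuine relational predicates, never disguised IN-list probes. */
    static List<Predicate> eligibleForTransitiveClosure(List<Predicate> preds) {
        return preds.stream()
                    .filter(p -> !p.isInListProbe())
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Predicate> preds = List.of(
            new Predicate("t1.id = t2.id", false),
            new Predicate("id = ?", true));  // internal probe predicate
        System.out.println(eligibleForTransitiveClosure(preds));
    }
}
```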

With the probe predicate in place the optimizer will determine that it is a 
start/stop key for any index having column "ID" as the first column, and thus 
the optimizer is more likely to choose one of those indexes over a table scan.  
If we assume the optimizer decides to use an index on ID for the query, then 
we'll generate the IN-list values (1, 20000) and we will pass them to the 
underlying index scan. We'll then generate the probe predicate "column = ?" as 
a start and stop key for the scan.  

At execution, then, the scan will first open up the index by using "1" as the 
start and stop key for the scan (or put another way, by plugging "1" into the 
probe predicate, "column = ?").  That will return all rows having ID equal to 
"1". Then when that scan ends, we'll reopen the scan a second time, this time 
using "20000" as the start and stop key, therefore returning all the 20000's.  
Meanwhile, all result sets higher up in the result set tree will just see a 
stream of rows from the index where ID only equals 1 or 20000.   

This works for IN-lists with parameters, as well, because by the time execution 
starts we know what values the parameters hold.

The first draft of code for implementing all of this is pretty much done, but I 
plan to post it in increments for ease of review.  If all goes well, nothing 
should change functionally until the final patch, which will be the 
preprocessing patch that does the actual creation of the "probe predicate".  At 
that point all of the machinery should be in place to recognize the probe 
predicate and function accordingly.

I ran Derby47PerformanceTest with this "multi-probing" approach and the numbers 
were similar to what I saw with the _v1 approach (more details coming later).  
Unlike the _v1 approach, though, the multi-probing approach works with 
subqueries, left outer joins, and scrollable updatable result sets.  In 
addition, all of the test queries that I came up with involving multiple 
IN-lists run correctly with the multi-probing approach, and in many cases run 
quite a bit more quickly--neither of which was true for the _v1 approach.  And 
yes, I do hope to add those test cases to the nightlies as part of my work on 
this issue.

Incremental patches are forthcoming.  In the meantime, if anyone has any 
comments/suggestions on the approach outlined above, I would appreciate the 
feedback...

> Some possible improvements to IN optimization
> ---------------------------------------------
>
>                 Key: DERBY-47
>                 URL: https://issues.apache.org/jira/browse/DERBY-47
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.0.2.0
>         Environment: all
>            Reporter: Sunitha Kambhampati
>         Assigned To: A B
>         Attachments: d47_engine_doNotCommit_v1.patch, 
> d47_engine_doNotCommit_v1.stat, derby-47-performance-data.txt, 
> derby-47-performance-data.txt, Derby47PerformanceTest.java, 
> Derby47PerformanceTest.java, InListOperatorNode.java, 
> QueryPlanUniqueIndexAndWordIndexOneTerm.txt, 
> QueryPlanUniqueIndexAndWordIndexTwoTerms.txt, 
> QueryPlanUniqueIndexOnlyOneTerm.txt, QueryPlanUniqueIndexOnlyTwoTerms.txt, 
> readlocks.diff, readlocks_withContext.diff
>
>
> Consider a simple case of  - 
> A table tbl has 10000 rows, there is a primary key index on i1
> and the query in question is 
>  select * from tbl where i1 in (-1,100000)
> derby does a table scan of the entire table even though the "IN" list has 
> only two values and the comparison is on a field that has an index.
> Briefly looking at the code, it seems like we insert a between and use the IN 
> list to get the start and stop values for the scan. Thus the range of the 
> values in the "IN" list here plays an important role. 
> Thus if the query was changed to select * from tbl where i1 in (-1, 1), an 
> index scan would be chosen.
> It would be nice if we could do something clever in this case where there is 
> clearly an index on the field and the number of values in the IN list is 
> known. Maybe use the rowcount estimate and the IN list size to do some 
> optimizations.  
> - consider the length of the "IN" list to do searches on the table.  ie use 
> the IN list values to do index key searches on the table,
> -or try to convert it to a join. Use the "IN" list values to create a 
> temporary table and do a join. It is most likely that the optimizer will 
> choose the table with "IN" list here as the outer table in the join and thus 
> will do key searches on the larger table. 
> -------------------------------------------------------------------
> some query plans that I logged using derby.language.logQueryPlan=true for 
> some similar queries:
> Table has ascending values from 0 - 9999 for i1. primary key index on i1.
> GMT Thread[UT0,5,main] (XID = 19941), (SESSIONID = 0), select * from 
> scanfixed where i1 in (-1,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) 
> ******* Project-Restrict ResultSet (2):
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 9990
> restriction = true
> projection = false
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count:          750.38
>       optimizer estimated cost:         8579.46
> Source result set:
>       Table Scan ResultSet for SCANFIXED at read committed isolation level 
> using instantaneous share row locking chosen by the optimizer
>       Number of opens = 1
>       Rows seen = 10000
>       Rows filtered = 0
>       Fetch Size = 16
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               next time in milliseconds/row = 0
>       scan information: 
>               Bit set of columns fetched=All
>               Number of columns fetched=9
>               Number of pages visited=417
>               Number of rows qualified=10000
>               Number of rows visited=10000
>               Scan type=heap
>               start position: 
> null          stop position: 
> null          qualifiers:
> Column[0][0] Id: 0
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> Column[0][1] Id: 0
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
>               optimizer estimated row count:          750.38
>               optimizer estimated cost:         8579.46
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2004-10-14 18:59:47.577 GMT Thread[UT0,5,main] (XID = 19216), (SESSIONID = 
> 0), select * from scanfixed where i1 in 
> (9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict 
> ResultSet (3):
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> restriction = true
> projection = true
>       constructor time (milliseconds) = 0
>       open time (milliseconds) = 0
>       next time (milliseconds) = 0
>       close time (milliseconds) = 0
>       restriction time (milliseconds) = 0
>       projection time (milliseconds) = 0
>       optimizer estimated row count:            4.80
>       optimizer estimated cost:           39.53
> Source result set:
>       Index Row to Base Row ResultSet for SCANFIXED:
>       Number of opens = 1
>       Rows seen = 10
>       Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6, 7, 8}
>               constructor time (milliseconds) = 0
>               open time (milliseconds) = 0
>               next time (milliseconds) = 0
>               close time (milliseconds) = 0
>               optimizer estimated row count:            4.80
>               optimizer estimated cost:           39.53
>               Index Scan ResultSet for SCANFIXED using index SCANFIXEDX at 
> read committed isolation level using instantaneous share row locking chosen 
> by the optimizer
>               Number of opens = 1
>               Rows seen = 10
>               Rows filtered = 0
>               Fetch Size = 16
>                       constructor time (milliseconds) = 0
>                       open time (milliseconds) = 0
>                       next time (milliseconds) = 0
>                       close time (milliseconds) = 0
>                       next time in milliseconds/row = 0
>               scan information: 
>                       Bit set of columns fetched=All
>                       Number of columns fetched=2
>                       Number of deleted rows visited=0
>                       Number of pages visited=2
>                       Number of rows qualified=10
>                       Number of rows visited=10
>                       Scan type=btree
>                       Tree height=2
>                       start position: 
>       >= on first 1 column(s).
>       Ordered null semantics on the following columns: 
>                       stop position: 
>       > on first 1 column(s).
>       Ordered null semantics on the following columns: 
>                       qualifiers:
> None
>                       optimizer estimated row count:            4.80
>                       optimizer estimated cost:           39.53

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
