[ https://issues.apache.org/jira/browse/DERBY-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804635#comment-13804635 ]
Mike Matrigali edited comment on DERBY-6301 at 1/22/14 6:26 PM: ---------------------------------------------------------------- I believe multi-probe is only picked when internally we have an in-list. We will get internal in-list either in the case of a user specified IN-list or the code may (maybe always - i am not sure) convert a set of equality OR's to an internal IN-list. I did not mean to say there is a current bound. I just remember that a long time ago there were derby compiler problems with very long IN-lists, and Dan did work in this area to avoid them. So whatever we do in this area we should think about how it might affect the size of a compiled query plan with 1000's of terms in an IN-list. There is a point when the optimizer will not pick multi-probe on an IN-list even if a good index exists, but it is more of a costing issue. The cost to probe each term is more than a normal scan so as the number of terms gets close to or exceeds the number of rows in the table then multi-probe will not be picked. In general I think the more we can get the sql layer to push predicates into store the better. Read committed locking works best when this is the case, and I assume it also fixes issues with virtual tables and predicates. was (Author: mikem): I believe multi-probe is only picked when internally we have an in-list. We will get internal in-list either in the case of a user specified IN-list or the code may (maybe always - i am not sure) convert a set of equality OR's to an internal IN-list. I did not mean to say there is a current bound. I just remember that a long time ago there were derby compiler problems with very long IN-lists, and Dan did work in this area to avoid them. So whatever we do in this area we should think about how it might affect the size of a compiled query plan with 1000's of terms in an IN-list. There is a point when the optimizer will not pick multi-probe on an IN-list even if a good index exists, but it is more of a costing issue. The cost to probe each term is more than a normal scan so as the number of terms gets close to or exceeds the number of rows in the table then multi-probe will not be picked. In general I think the more we can get the sql layer to push predicates into store the better. Read committed locking worst best when this is the case, and I assume it also fixes issues with virtual tables and predicates. > SQL layer should push down IN list predicates to store when doing a scan > ------------------------------------------------------------------------ > > Key: DERBY-6301 > URL: https://issues.apache.org/jira/browse/DERBY-6301 > Project: Derby > Issue Type: Improvement > Components: SQL > Affects Versions: 10.10.1.1 > Reporter: Mike Matrigali > > The store interface allows for OR and AND qualifiers to be passed down to > store as part of either > a heap or btree scan. It is more efficient to qualify the rows at the lowest > levels. The SQL level > does not seem to push any qualifier in the case of IN lists. > This does not matter if the optimizer choses the multi-probe execution > strategy for the IN list as that also > qualifies the row at the lowest level. > The problem arises when the optimizer chooses not to do multi-probe, for > instance if it determines there > are too many terms in the in-list relative to the size of the table and the > cardinality of the terms. In this > case it chooses a scan with no qualifiers which results in all rows being > returned to the sql layer and qualified there. > In addition to performance considerations this presents a locking problem > with respect to the repeatable read isolation level. It is optimal in > repeatable read to not maintain locks on those > rows that do not qualify. Currently this locking optimization only takes > place for those rows that > are qualified in the store vs. those qualified in the upper SQL layer. So in > the case of a non-multi-probe IN-LIST plan all non-qualified rows looked at > as part of the execution will remain locked in repeatable > read. -- This message was sent by Atlassian JIRA (v6.1.5#6160)