Nuno Santos created OAK-11764:
---------------------------------

             Summary: Query planner: when multiple indexes have same cost, 
planner should choose index deterministically
                 Key: OAK-11764
                 URL: https://issues.apache.org/jira/browse/OAK-11764
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: indexing
            Reporter: Nuno Santos


The query planner chooses the index with the lowest cost. But if multiple 
indexes evaluate to the same cost, the planner currently takes the first 
index that is evaluated with that lowest cost. The evaluation order is not 
deterministic, so the same query may sometimes be evaluated by one index and 
other times by a different index. Although the results should be the same 
regardless of the index being used, there can be cases where customers are 
(wrongly) relying on the query being evaluated by the same index every time. 
For instance, if the query does not fully specify the order of the results, 
different indexes may return them in different orders. This does not violate 
the contract of the query specification, but users may rely on the results 
always being in the same order (for instance, to implement pagination).

The indexes are evaluated in the class 
[FulltextIndex|https://github.com/apache/jackrabbit-oak/blob/ff1080185d21bd10c017d8a8eb94236ef99c7e87/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/query/FulltextIndex.java#L116-L133] 
and are taken from a Set with all the indexes, which is created in 
[IndexLookup|https://github.com/apache/jackrabbit-oak/blob/ff1080185d21bd10c017d8a8eb94236ef99c7e87/oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/IndexLookup.java#L55-L70]. 
IndexLookup uses a JDK HashSet to store the indexes. The iteration order of a 
JDK HashSet is unspecified and can differ from run to run of the JVM. 
Previously, this class was using a Guava HashSet, which always iterates in the 
order in which the elements were added. This 
[PR|https://github.com/apache/jackrabbit-oak/pull/1678] replaced the Guava set 
with the JDK HashSet.
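
As a rough illustration of the ordering issue described above (not the actual 
IndexLookup code), the sketch below collects index paths into JDK sets with a 
defined iteration order. The class name, method names and index paths are 
hypothetical and only serve the example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class IndexPathOrdering {

    /** Keeps the order in which the index paths were discovered. */
    static Set<String> insertionOrdered(List<String> discovered) {
        return new LinkedHashSet<>(discovered);
    }

    /** Orders the index paths lexicographically, independent of discovery order. */
    static Set<String> sorted(List<String> discovered) {
        return new TreeSet<>(discovered);
    }

    public static void main(String[] args) {
        // Hypothetical index paths, listed in the order they were found.
        List<String> discovered = List.of(
                "/oak:index/lucene-b",
                "/oak:index/lucene-a",
                "/oak:index/lucene-c");

        // A plain java.util.HashSet gives no ordering guarantee here.
        System.out.println(insertionOrdered(discovered));
        // [/oak:index/lucene-b, /oak:index/lucene-a, /oak:index/lucene-c]
        System.out.println(sorted(discovered));
        // [/oak:index/lucene-a, /oak:index/lucene-b, /oak:index/lucene-c]
    }
}
```

A LinkedHashSet only helps if the discovery order itself is stable. A sorted 
set does not depend on it.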

We should ensure that the indexes are selected in a deterministic order, even 
if they have the same cost. 
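
One possible way to do this, sketched under assumptions, is to break cost ties 
on a stable key such as the index path, so the choice no longer depends on the 
iteration order. The Candidate class, the comparator and the paths below are 
hypothetical and are not taken from the Oak code base. The tie-break key 
actually chosen for the fix may differ.

```java
import java.util.Comparator;
import java.util.List;

class DeterministicIndexChoice {

    /** Hypothetical stand-in for a plan candidate; not an Oak class. */
    static final class Candidate {
        private final String indexPath;
        private final double cost;

        Candidate(String indexPath, double cost) {
            this.indexPath = indexPath;
            this.cost = cost;
        }

        String getIndexPath() { return indexPath; }
        double getCost() { return cost; }
    }

    /** Lowest cost first; equal costs are resolved by index path. */
    static final Comparator<Candidate> BY_COST_THEN_PATH =
            Comparator.comparingDouble(Candidate::getCost)
                    .thenComparing(Candidate::getIndexPath);

    /** Returns the best candidate, or null if there are none. */
    static Candidate selectBest(Iterable<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null || BY_COST_THEN_PATH.compare(c, best) < 0) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same candidates, evaluated in two different orders.
        List<Candidate> oneOrder = List.of(
                new Candidate("/oak:index/lucene-b", 10.0),
                new Candidate("/oak:index/lucene-a", 10.0));
        List<Candidate> otherOrder = List.of(
                new Candidate("/oak:index/lucene-a", 10.0),
                new Candidate("/oak:index/lucene-b", 10.0));

        System.out.println(selectBest(oneOrder).getIndexPath());   // /oak:index/lucene-a
        System.out.println(selectBest(otherOrder).getIndexPath()); // /oak:index/lucene-a
    }
}
```

With the tie-break in place, both evaluation orders select the same index.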



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
