[ 
https://issues.apache.org/jira/browse/OAK-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583037#comment-13583037
 ] 

Thomas Mueller commented on OAK-622:
------------------------------------

> put the estimated nodes count as a cost measure as calculating that may have 
> a heavier impact 

Calculating the plan(s) and estimating the cost should normally not do any disk 
or network operations. When using getters (getCost()), it is tempting for the 
index implementation to do far too much. So here is my first draft, which uses 
classes and fields instead of interfaces / getters (but this could be changed):

{code}
public interface QueryIndex {
    
    /**
     * Get the unique index name.
     *
     * @return the index name
     */
    String getIndexName();

    /**
     * Return the possible index plans for the given filter and sort order.
     * Please note this method is supposed to run quickly. That means it should
     * usually not read any data from the storage.
     *
     * @param filter the filter
     * @param sortOrder the sort order or null if no sorting is required
     * @param rootState root state of the current repository snapshot
     * @return the list of index plans (null if none)
     */
    List<IndexPlan> getPlans(Filter filter, List<Order> sortOrder,
            NodeState rootState);
    
    /**
     * Get the query plan description (for logging purposes).
     *
     * @param plan the index plan
     * @return the query plan description
     */
    String getPlanDescription(IndexPlan plan);
    
    /**
     * Start a query. The filter and sort order of the index plan are to be used.
     *
     * @param plan the index plan to use
     * @param rootState root state of the current repository snapshot
     * @return a cursor to iterate over the result
     */
    Cursor query(IndexPlan plan, NodeState rootState);
    
    /**
     * An index plan.
     */
    public static class IndexPlan {
        
        /**
         * The cost to execute the query once. The returned value should
         * approximately match the number of disk read operations plus the
         * number of network roundtrips.
         */
        double costPerExecution;
        
        /**
         * The cost to read one entry from the cursor. The returned value should
         * approximately match the number of disk read operations plus the
         * number of network roundtrips.
         */
        double costPerEntry;
        
        /**
         * The estimated number of entries. This value does not have to be
         * accurate.
         */
        long estimatedEntryCount;
        
        /**
         * The filter to use.
         */
        Filter filter;

        /**
         * Whether transient (unsaved) changes are included.
         */
        boolean includeTransient;

        /**
         * Whether the index is not always up-to-date.
         */
        boolean isDelayed;
        
        /**
         * Whether the fulltext part of the filter is evaluated (possibly with
         * an extended syntax). If set, the fulltext part of the filter is not
         * evaluated any more within the query engine.
         */
        boolean isFulltextIndex;
        
        /**
         * Whether the cursor is able to read all properties from a node.
         */
        boolean includesNodeData;
        
        /**
         * The sort order of the returned entries, or null if unsorted.
         */
        List<Order> sortOrder;
        
    }
    
    /**
     * A sort order entry.
     */
    static class Order {
        
        /**
         * The name of the property to sort on.
         */
        String propertyName;

        /**
         * True for descending, false for ascending.
         */
        boolean descending;
        
    }

}
{code}
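
To illustrate how the three cost fields could be combined, here is a minimal sketch. It is not part of the draft above: the class PlanCostSketch, the simplified Plan stand-in, the methods estimateCost and cheapest, and the "maxEntries" limit are all hypothetical names, and the formula (costPerExecution + costPerEntry * entries read) is just one plausible way the query engine might use the fields.

```java
import java.util.Arrays;
import java.util.List;

public class PlanCostSketch {

    /** Hypothetical stand-in holding only the three cost fields of IndexPlan. */
    static final class Plan {
        final String name;
        final double costPerExecution;
        final double costPerEntry;
        final long estimatedEntryCount;

        Plan(String name, double costPerExecution, double costPerEntry,
                long estimatedEntryCount) {
            this.name = name;
            this.costPerExecution = costPerExecution;
            this.costPerEntry = costPerEntry;
            this.estimatedEntryCount = estimatedEntryCount;
        }
    }

    /**
     * Assumed formula: the fixed cost to start the query, plus the per-entry
     * cost for every entry that is expected to be read (at most maxEntries).
     */
    static double estimateCost(Plan plan, long maxEntries) {
        long entries = Math.min(plan.estimatedEntryCount, maxEntries);
        return plan.costPerExecution + plan.costPerEntry * entries;
    }

    /** Pick the plan with the lowest estimated cost (null if the list is empty). */
    static Plan cheapest(List<Plan> plans, long maxEntries) {
        Plan best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Plan plan : plans) {
            double cost = estimateCost(plan, maxEntries);
            if (cost < bestCost) {
                best = plan;
                bestCost = cost;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up numbers: one plan is expensive to start but cheap per entry,
        // the other is cheap to start but expensive per entry.
        List<Plan> plans = Arrays.asList(
                new Plan("sortedIndex", 50, 1, 10_000),
                new Plan("traversal", 1, 10, 10_000));

        // Only the first 5 entries are needed: 55 versus 51.
        System.out.println(cheapest(plans, 5).name); // prints "traversal"

        // All entries are needed: 10050 versus 100001.
        System.out.println(cheapest(plans, Long.MAX_VALUE).name); // prints "sortedIndex"
    }
}
```

With separate values, the same two plans rank differently depending on whether only the first few entries are needed or the whole result, which a single cost value cannot express. Note that nothing here reads from storage; it only does arithmetic on the estimates.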

                
> Improve QueryIndex interface
> ----------------------------
>
>                 Key: OAK-622
>                 URL: https://issues.apache.org/jira/browse/OAK-622
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Minor
>
> The current QueryIndex interface is quite simple, but doesn't address some of 
> the required features and more advanced optimizations that are possible:
> - For fulltext queries, it doesn't address the case where the index 
> implementation has a different understanding of the fulltext condition than 
> what is described in the JCR spec (the basic features).
> - For queries with "order by" it would be good to know if the index supports 
> returning the data in sorted order, and if yes, how much slower that would be 
> (if it is slower). So an index might have multiple strategies with different 
> costs.
> - It's quite easy to misunderstand what getCost is supposed to do exactly. 
> The new API should have a clearer solution here.
> - Even if the query doesn't have "order by", the index might return the data 
> in a sorted way, which might help improving query performance (using a merge 
> join)
> - The cost is currently a single value, it might be better to estimate the 
> number of nodes, the cost to run a query, and the cost per node. That way we 
> could optimize to quickly return the first few nodes (versus optimize for 
> throughput).
