Hi, >>QueryIndex.getCost
>my doubt is what >this heuristic function to estimate the "traversed entries" should look >like in general Relational databases typically know the number of entries in the index (total indexed entries), plus the selectivity of a column. See also http://www.akadia.com/services/ora_index_selectivity.html . That way it's quite easy to estimate the number of returned values, for a given filter. I think Lucene and Solr have a way to get those numbers, but I didn't look into that yet. I guess the problem might be, the cost method needs to be very fast, otherwise all the queries would be affected. Relational databases typically have those index statistics fully in memory. Some databases additionally have a histogram that describes the distribution of the values. See also http://www.dba-oracle.com/t_histograms.htm >, but that may be implementation specific The cost returned by an index should not be implementation specific, otherwise the cost between different index implementations can't be compared. >, and how that >estimate should be used, should we just return the number of estimated >entries for the cost? So that cost is 1 when we estimate to return 1 >entry, >2 when we estimate to return 2 entries, 1000 when we estimate to return >1000 entries? Generally yes, but if you keep everything in memory or know things are cached, then you could return less (as documented). >My other concern on this point is that it's not granted, in my opinion, >that the index returning less entries would be the faster. Yes, it's not that much about less entries or more entries, it's about lower or higher cost. If there are more entries, but they are all in memory, then that's better than less entries, but you need a disk read for each entry. That's what the javadocs talk about. >it would be ok for me to either deprecate it or improve the semantics of >the cost calculation (e.g. explicitly introduce other metrics to be taken >into account in the cost calculation: local / remote index, The QueryIndex will be deprated, sure. Until the AdvancedQueryIndex is used, the documented behavior should be used: "...and if there could in theory be one network roundtrip or disk read operation per node (this method may return a lower number if the data is known to be fully in memory)." >since the FullTextQueryIndex seems to be a marker interface, could we >"merge" it some way into the AdvancedQueryIndex? Just to make things >simpler. Yes. The IndexPlan of the AdvancedQueryIndex has a method "isFulltextIndex". Once we can switch to the AdvancedQueryIndex, the FullTextQueryIndex is no longer needed. >In the end my very generic take on this discussion is that we should try >to >have a clearer API/documentation for the part that involves picking the >index via cost / index plans (but maybe that's only me), implementations >can be then adjusted accordingly if needed. Yes, sure. Regards, Thomas