[ https://issues.apache.org/jira/browse/LUCENE-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
AMIRAULT Martin closed LUCENE-7482.
-----------------------------------
    Resolution: Invalid

Sorry, I just realized that my implementation assumed that all documents
match, which most of the time is not the case.

> Faster sorted index search for reverse order search
> ---------------------------------------------------
>
>                 Key: LUCENE-7482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7482
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: AMIRAULT Martin
>            Priority: Minor
>
> We are currently using Lucene at my company for our main product.
> Our search functionality is quite basic, and the results are always sorted
> on a predefined field. The user is only able to choose the sort order
> (Asc/Desc).
> I am currently investigating using the index sort feature with
> EarlyTerminationSortingCollector.
> It is a shame that searching a sorted index in reverse order does not
> benefit from any optimization, and I was wondering whether it would be
> possible to make it faster by creating a special "ReverseSortingCollector"
> for this purpose.
> I am aware that the posting list is designed to always be iterated in the
> same order, so this is not about early-terminating the search but about
> filtering out unneeded documents more efficiently.
> If a segment is sorted in reverse order, we can easily work out the docId
> from which documents should be collected.
> Here is some quick sample code:
> {code:title=ReverseSortingCollector.java|borderStyle=solid}
> public class ReverseSortingCollector extends FilterCollector {
>     /** Sort used to sort the search results */
>     protected final Sort sort;
>     /** Number of documents to collect in each segment */
>     protected final int numDocsToCollect;
>
>     [...]
>     @Override
>     public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
>         LeafReader reader = context.reader();
>         Sort segmentSort = reader.getIndexSort();
>         if (isReverseOrder(sort, segmentSort)) { // segment is sorted in the reverse of the search sort order
>
>             // Here we can easily work out the docNum from which we should collect
>             long collectFrom = context.reader().numDocs() - numDocsToCollect;
>
>             return new FilterLeafCollector(in.getLeafCollector(context)) {
>                 @Override
>                 public void collect(int doc) throws IOException {
>                     if (doc >= collectFrom) { // only delegate docs at or past the cutoff
>                         super.collect(doc);
>                     }
>                 }
>             };
>         } else {
>             return in.getLeafCollector(context);
>         }
>     }
>
> }
> {code}
> This is especially efficient when used along with TopFieldCollector, as many
> docValue lookups do not take place.
> In my experiment it reduced search time by 90%.
> However, I was wondering whether it is correct, as my knowledge of Lucene is
> still quite limited.
> In particular, is it correct to assume that LeafReader docIds always span
> from 0 to LeafReader.numDocs()?
> Note: Does not support paging. It could eventually be implemented by
> providing a way to look up the docId to match from the last document
> collected (e.g. for LongPoint, querying the docId closest to the previously
> returned value...)
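
The reason given for closing the issue (the cutoff assumes every document matches) can be illustrated with a small, Lucene-free sketch. Everything in it is hypothetical and for illustration only: the class `ReverseCutoffSketch`, its two methods, and the sample docIds are not part of the original patch or of Lucene. It models a segment as a sorted array of matching docIds and compares the docId cutoff from the issue with the set of documents that should actually be collected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Lucene code: a segment is modelled as a
// sorted array of the docIds that match the query.
public class ReverseCutoffSketch {

    /** Shortcut from the issue: keep only matches whose docId >= maxDoc - n. */
    public static List<Integer> cutoffByDocId(int[] matchingDocs, int maxDoc, int n) {
        int collectFrom = Math.max(0, maxDoc - n);
        List<Integer> collected = new ArrayList<>();
        for (int doc : matchingDocs) {
            if (doc >= collectFrom) {
                collected.add(doc);
            }
        }
        return collected;
    }

    /** Expected result: the last n matching docIds, wherever they sit in the segment. */
    public static List<Integer> lastMatching(int[] matchingDocs, int n) {
        List<Integer> collected = new ArrayList<>();
        int start = Math.max(0, matchingDocs.length - n);
        for (int i = start; i < matchingDocs.length; i++) {
            collected.add(matchingDocs[i]);
        }
        return collected;
    }

    public static void main(String[] args) {
        int maxDoc = 10; // assumed segment size
        int n = 3;       // assumed number of hits wanted per segment

        int[] allMatch = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] someMatch = {1, 4, 5, 8};

        // When every document matches, both approaches agree.
        System.out.println(cutoffByDocId(allMatch, maxDoc, n)); // [7, 8, 9]
        System.out.println(lastMatching(allMatch, n));          // [7, 8, 9]

        // When only some documents match, the cutoff drops valid hits.
        System.out.println(cutoffByDocId(someMatch, maxDoc, n)); // [8]
        System.out.println(lastMatching(someMatch, n));          // [4, 5, 8]
    }
}
```

With a selective query the cutoff returns a single hit where three were expected, because the last three matching documents are not necessarily among the last three docIds of the segment.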