Somehow you need to get the sorting server-side ... that's really the only way to do your use case efficiently.
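For example, something like this untested sketch (it assumes your time field was indexed as a sortable long field named "time", and searcher/query/pageSize are placeholders) pages through hits in time order, so the client never holds more than one page at a time:

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;

    // Untested sketch: let Lucene sort by "time" and page with searchAfter,
    // so the client never holds more than pageSize hits at once.
    void pageInTimeOrder(IndexSearcher searcher, Query query, int pageSize) throws IOException {
        Sort byTime = new Sort(new SortField("time", SortField.Type.LONG));
        ScoreDoc after = null;
        while (true) {
            TopDocs page = (after == null)
                ? searcher.search(query, pageSize, byTime)
                : searcher.searchAfter(after, query, pageSize, byTime);
            if (page.scoreDocs.length == 0) {
                break;  // no more hits
            }
            for (ScoreDoc sd : page.scoreDocs) {
                // load/process searcher.doc(sd.doc) here, one page at a time
            }
            after = page.scoreDocs[page.scoreDocs.length - 1];  // resume point for the next page
        }
    }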
Why can't you have each of your N shards sort its own hits, and then do a merge sort on the client side to get the top hits?
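I believe since 4.0, TopDocs.merge can do that merge for you, as long as every shard ran the same query with the same Sort. Rough, untested sketch (again assuming a sortable long field named "time"):

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;

    // Rough sketch: each shard returns its own sorted top hits,
    // then TopDocs.merge picks the global top N on the client.
    TopDocs topAcrossShards(IndexSearcher[] searchers, Query query, int topN) throws IOException {
        Sort byTime = new Sort(new SortField("time", SortField.Type.LONG));
        TopDocs[] shardHits = new TopDocs[searchers.length];
        for (int i = 0; i < searchers.length; i++) {
            shardHits[i] = searchers[i].search(query, topN, byTime);  // each shard sorts its own hits
        }
        return TopDocs.merge(byTime, topN, shardHits);  // merge-sort the per-shard results
    }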
Mike McCandless

http://blog.mikemccandless.com

On Thu, Jul 7, 2016 at 5:48 AM, Tarun Kumar <ta...@sumologic.com> wrote:

> Any suggestions pls?
>
> On Mon, Jul 4, 2016 at 3:37 PM, Tarun Kumar <ta...@sumologic.com> wrote:
>
>> Hey Michael,
>>
>> docIds from multiple indices (on multiple machines) need to be
>> aggregated and sorted, and the first few thousand need to be queried.
>> These few thousand docs can be distributed among multiple machines,
>> and each machine will fetch the docs that are in its own indices. So
>> pulling the sorting onto the server side alone won't cover the use
>> case. Is there an alternative to get documents for given docIds
>> faster?
>>
>> Thanks
>> Tarun
>>
>> On Mon, Jul 4, 2016 at 3:17 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>
>>> Why not ask Lucene to do the sort on your time field, instead of
>>> pulling millions of docIds to the client and having it sort? You
>>> could even do index-time sorting by the time field if you want,
>>> which makes early termination possible (faster sorted searches).
>>>
>>> But if, even with Lucene doing the sort, you still need to load
>>> millions of documents per search request, you are in trouble: you
>>> need to re-formulate your use case somehow to take advantage of
>>> what Lucene is good at (getting top results for a search).
>>>
>>> Maybe you can use faceting to do whatever aggregation you are
>>> currently doing after retrieving those millions of documents.
>>>
>>> Maybe you could make a custom collector, and use doc values, to do
>>> your own custom aggregation.
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Mon, Jul 4, 2016 at 1:39 AM, Tarun Kumar <ta...@sumologic.com> wrote:
>>>
>>>> Thanks for the reply, Michael! In my application, I need to get
>>>> millions of documents per search.
>>>>
>>>> The use case is the following: return documents in increasing
>>>> order of the time field. The client (caller) can't hold more than
>>>> a few thousand docs at a time, so it gets all docIds and the
>>>> corresponding time field for each doc, sorts them on time, and
>>>> fetches n docs at a time. To support this use case, I am:
>>>>
>>>> - getting all docIds first,
>>>> - sorting the docIds on the time field,
>>>> - querying n docIds at a time from the client, which makes an
>>>> indexReader.document(docId) call for each of the n docs on the
>>>> server, combines these docs, and returns them.
>>>>
>>>> indexReader.document(docId) is the bottleneck. What alternatives
>>>> do you suggest?
>>>>
>>>> On Wed, Jun 29, 2016 at 4:00 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>>>
>>>>> Are you maybe trying to load too many documents for each search
>>>>> request?
>>>>>
>>>>> The IR.document API is designed to be used to load just a few
>>>>> hits, like a page's worth or ~10 documents, per search.
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>> On Tue, Jun 28, 2016 at 7:05 AM, Tarun Kumar <ta...@sumologic.com> wrote:
>>>>>
>>>>>> I am running Lucene 4.6.1. I am trying to get documents
>>>>>> corresponding to docIds. All threads get stuck (not stuck
>>>>>> exactly, but they spend a LOT of time) at:
>>>>>>
>>>>>> java.lang.Thread.State: RUNNABLE
>>>>>>     at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>>>>>>     at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
>>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>>>>     at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:731)
>>>>>>     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:716)
>>>>>>     at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:169)
>>>>>>     at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:271)
>>>>>>     at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
>>>>>>     at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
>>>>>>     at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:218)
>>>>>>     at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:232)
>>>>>>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:277)
>>>>>>     at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
>>>>>>     at org.apache.lucene.index.IndexReader.document(IndexReader.java:440)
>>>>>>
>>>>>> There is no disk throttling. What can cause this?
>>>>>>
>>>>>> Thanks
>>>>>> Tarun
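P.S.: the CompressingStoredFieldsReader frames in that trace are the cost of decompressing stored fields on every IndexReader.document call. If your aggregation only needs the time field, the "custom collector + doc values" idea above skips stored fields entirely. A minimal, untested sketch against the 4.6 Collector API, assuming "time" was indexed as a NumericDocValuesField:

    import java.io.IOException;
    import org.apache.lucene.index.AtomicReaderContext;
    import org.apache.lucene.index.NumericDocValues;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.Scorer;

    // Untested sketch: aggregate over the "time" doc values during collection,
    // never touching stored fields (no IndexReader.document calls at all).
    class TimeSumCollector extends Collector {
        private NumericDocValues times;
        long sum;  // stand-in for whatever aggregate you actually need

        @Override
        public void setScorer(Scorer scorer) {
            // scores are not needed for this aggregation
        }

        @Override
        public void setNextReader(AtomicReaderContext context) throws IOException {
            times = context.reader().getNumericDocValues("time");  // column-stride values for this segment
        }

        @Override
        public void collect(int doc) {
            if (times != null) {
                sum += times.get(doc);  // per-hit value with no stored-fields I/O
            }
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true;  // hit order doesn't matter for a sum
        }
    }

You'd run it with searcher.search(query, new TimeSumCollector()) and read the aggregate off the collector afterwards.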