Re: Order docIds to reduce disk seeks

Michael McCandless Tue, 18 Nov 2014 09:43:29 -0800

Even if you sort all hits by docID it's likely too slow to visit every
single one and load the stored document ...


Try to find another way to solve your problem, making use of the inverted index?

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J <[email protected]> wrote:
> Hi Vijay,
>
> ...sorting the documents you need to retrieve by docID order first...
>
> means sorting them by their 'document number' which is the value in the 
> 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve' the 
> document from the index. If you write a comparator to sort the elements in 
> the ScoreDoc[] by their doc field then that will put them in 'docID order' 
> and the reader will always be skipping forward to the next doc which will 
> probably reduce its seek time.
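
A minimal sketch of the comparator described above, assuming Java 8 or later. To keep it runnable without Lucene on the classpath, it uses a hypothetical stand-in class `Hit` with the same public `doc` and `score` fields as Lucene's `ScoreDoc`; with Lucene, the same `Comparator.comparingInt` call should apply to a `ScoreDoc[]` directly. The class and method names here are illustrative only.

```java
import java.util.Arrays;
import java.util.Comparator;

public class DocIdOrder {
    /** Hypothetical stand-in for Lucene's ScoreDoc (public doc and score fields). */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Returns a copy of hits sorted by ascending docID; the input keeps its rank order. */
    static Hit[] inDocIdOrder(Hit[] hits) {
        Hit[] sorted = Arrays.copyOf(hits, hits.length);
        Arrays.sort(sorted, Comparator.comparingInt((Hit h) -> h.doc));
        return sorted;
    }

    public static void main(String[] args) {
        // Hits as a search would return them: ordered by score, not by docID.
        Hit[] byScore = { new Hit(42, 3.1f), new Hit(7, 2.5f), new Hit(19, 1.8f) };
        for (Hit h : inDocIdOrder(byScore)) {
            // With Lucene, this is where indexSearcher.doc(h.doc) would be called.
            System.out.println(h.doc);
        }
    }
}
```

Sorting a copy rather than the original array keeps the relevance ranking available in case it is still needed after the stored fields have been loaded.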
>
> Regards,
> Stuart
>
>
>
> -----Original Message-----
> From: Vijay B [mailto:[email protected]]
> Sent: Monday, November 17, 2014 9:16 AM
> To: [email protected]
> Subject: Order docIds to reduce disk seeks
>
> Could someone point me to how to order docIds as per
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed ?
>
> "Limit usage of stored fields and term vectors. Retrieving these from the
> index is quite costly. Typically you should only retrieve these for the
> current "page" the user will see, not for all documents in the full result
> set. For each document retrieved, Lucene must seek to a different location in
> various files. Try sorting the documents you need to retrieve by docID order
> first."
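
A rough sketch of the wiki's two suggestions combined: take only the hits for the current page, then put that slice in docID order before loading stored fields. It works on plain `int` docIDs so it runs without Lucene; the class name, method name, and zero-based `page` parameter are assumptions made for illustration, not part of any Lucene API.

```java
import java.util.Arrays;

public class PageFetchOrder {
    /**
     * Given docIDs in rank order, returns the docIDs of one page (zero-based),
     * sorted ascending so that stored-field reads move forward through the index.
     */
    static int[] docIdsForPage(int[] rankedDocIds, int page, int pageSize) {
        int from = Math.min(page * pageSize, rankedDocIds.length);
        int to = Math.min(from + pageSize, rankedDocIds.length);
        int[] slice = Arrays.copyOfRange(rankedDocIds, from, to);
        Arrays.sort(slice);
        return slice;
    }

    public static void main(String[] args) {
        int[] ranked = { 42, 7, 19, 3, 88 };  // docIDs in relevance order
        // Only these documents would need their stored fields loaded for page 0.
        System.out.println(Arrays.toString(docIdsForPage(ranked, 0, 3)));
    }
}
```

If the page has to be displayed in relevance order, the loaded documents would need to be mapped back to their original rank afterwards, which this sketch does not do.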
>
> To give some background:
>
> We are using plain vanilla Lucene (version 4.2.1) for our application. We
> index our documents using stored fields. We add two fields related to our
> documents: UUID, a 9-digit number that represents an internal id, and
> doc_text, the document text (approx. 7K to 20K in size). In our search code,
> we use a BooleanQuery to retrieve by UUID, fetch the document text, and use it
> for other processing. We are noticing slow response times with the searches. I
> understand that stored field retrieval is slow and should be limited, but
> this is mandatory for our app.
>
>
> Current code:
>
> TopScoreDocCollector collector =
>     TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);
>
> dirReader = DirectoryReader.open(FSDirectory.open(......));
> IndexSearcher indexSearcher = new IndexSearcher(dirReader);
> indexSearcher.search(query, collector);
> ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
>
> for (ScoreDoc scoreDoc : scoreDocs) {
>     Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
>     String text = luceneDoc.get("doc_text"); // these calls take a lot of time
>
>     // process text
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
