RE: Order docIds to reduce disk seeks

Rose, Stuart J Mon, 17 Nov 2014 15:08:08 -0800

Hi Vijay, 

...sorting the documents you need to retrieve by docID order first...


means sorting them by their 'document number', which is the value in the 
'scoreDoc.doc' field and is the value that the reader uses to 'retrieve' the 
document from the index. If you write a comparator to sort the elements in the 
ScoreDoc[] by their doc field, that will put them in 'docID order', and the 
reader will always be skipping forward to the next doc, which will probably 
reduce its seek time. 
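
As a rough sketch of such a comparator (not run against a real index): the Hit 
class below is a made-up stand-in for Lucene's ScoreDoc, which has the same 
public 'doc' and 'score' fields, so that the snippet compiles without Lucene. 
DocIdOrder, BY_DOC_ID and sortByDocId are names invented for illustration. With 
Lucene you would sort the ScoreDoc[] itself in the same way.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for org.apache.lucene.search.ScoreDoc, which exposes
// the same public 'doc' and 'score' fields; used here so the sketch runs
// without Lucene on the classpath.
class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

class DocIdOrder {
    // Orders hits by ascending document number.
    static final Comparator<Hit> BY_DOC_ID = new Comparator<Hit>() {
        @Override
        public int compare(Hit a, Hit b) {
            return a.doc < b.doc ? -1 : (a.doc == b.doc ? 0 : 1);
        }
    };

    // Sorts the array in place so later lookups walk the index front to back.
    static void sortByDocId(Hit[] hits) {
        Arrays.sort(hits, BY_DOC_ID);
    }

    public static void main(String[] args) {
        // Hits as a search would return them: ranked by score, not by docID.
        Hit[] hits = { new Hit(42, 3.1f), new Hit(7, 2.4f), new Hit(19, 1.8f) };
        sortByDocId(hits);
        for (Hit hit : hits) {
            // With Lucene this is where indexSearcher.doc(hit.doc) would go.
            System.out.println(hit.doc);
        }
    }
}
```

The sort is in place and discards the relevance ordering, so keep a copy of the 
original array if the ranking is still needed afterwards.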

Regards, 
Stuart



-----Original Message-----
From: Vijay B [mailto:[email protected]] 
Sent: Monday, November 17, 2014 9:16 AM
To: [email protected]
Subject: Order docIds to reduce disk seeks

Could someone point me to how to order docIds as per 
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

"Limit usage of stored fields and term vectors. Retrieving these from the 
index is quite costly. Typically you should only retrieve these for the current 
"page" the user will see, not for all documents in the full result set. For 
each document retrieved, Lucene must seek to a different location in various 
files. Try sorting the documents you need to retrieve by docID order first."

To give some background:

We are using plain vanilla Lucene (version 4.2.1) for our application. We 
index our documents using stored fields. We add two fields related to our 
documents: UUID, a 9-digit number that represents an internal id, and 
doc_text, the document text (approx. 7K to 20K in size). In our search code, we 
use a BooleanQuery to retrieve documents by UUID and fetch the document text to 
use it for other processing. We are noticing slow response times with the 
searches. I understand that stored field retrieval is slow and should be 
limited, but this is mandatory for our app.


Current code:

TopScoreDocCollector collector =
    TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);

dirReader = DirectoryReader.open(FSDirectory.open(......));
IndexSearcher indexSearcher = new IndexSearcher(dirReader);
indexSearcher.search(query, collector);
ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;

for (ScoreDoc scoreDoc : scoreDocs) {
    Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
    String text = luceneDoc.get("doc_text"); // these calls take a lot of time

    // process text
}

