Re: Order docIds to reduce disk seeks

Vijay B Tue, 18 Nov 2014 11:42:07 -0800

Hi Mike, could you provide some pointers on using the inverted index? Are
there any examples, or particular API classes I should use to accomplish this?


On Tue, Nov 18, 2014 at 12:40 PM, Michael McCandless <
[email protected]> wrote:

> Even if you sort all hits by docID it's likely too slow to visit every
> single one and load the stored document ...
>
> Try to find another way to solve your problem, making use of the inverted
> index?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J <[email protected]>
> wrote:
> > Hi Vijay,
> >
> > ...sorting the documents you need to retrieve by docID order first...
> >
> > means sorting them by their 'document number' which is the value in the
> > 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve'
> > the document from the index. If you write a comparator to sort the elements
> > in the ScoreDoc[] by their doc field then that will put them in 'docID
> > order' and the reader will always be skipping forward to the next doc which
> > will probably reduce its seek time.
> >
> > Regards,
> > Stuart
> >
> >
> >
> > -----Original Message-----
> > From: Vijay B [mailto:[email protected]]
> > Sent: Monday, November 17, 2014 9:16 AM
> > To: [email protected]
> > Subject: Order docIds to reduce disk seeks
> >
> > Could someone point me to how to order docIds as per
> > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed ?
> >
> > "Limit usage of stored fields and term vectors. Retrieving these from
> > the index is quite costly. Typically you should only retrieve these for the
> > current "page" the user will see, not for all documents in the full result
> > set. For each document retrieved, Lucene must seek to a different location
> > in various files. Try sorting the documents you need to retrieve by docID
> > order first."
> >
> > To give some background:
> >
> > We are using plain vanilla Lucene (version 4.2.1) for our application.
> > We index our documents using stored fields. We add two fields related to
> > our documents: UUID, a 9-digit number that represents the internal id, and
> > doc_text, the document text (approx. 7K to 20K in size). In our search code,
> > we use a BooleanQuery to retrieve by UUID and fetch the document text to use
> > it for other processing. We are noticing slow response times with the
> > searches. I understand that stored field retrieval is slower and should be
> > limited, but this is mandatory for our app.
> >
> >
> > Current code:
> >
> > TopScoreDocCollector collector =
> >     TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);
> >
> > dirReader = DirectoryReader.open(FSDirectory.open(......));
> > IndexSearcher indexSearcher = new IndexSearcher(dirReader);
> > indexSearcher.search(query, collector);
> > ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> >
> > for (ScoreDoc scoreDoc : scoreDocs) {
> >     Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
> >     String text = luceneDoc.get("doc_text"); // these calls take a lot of time
> >
> >     // process text
> > }
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
>
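
For reference, Stuart's suggestion in the quoted thread (sort the ScoreDoc[]
by its doc field before loading the stored documents) could look roughly like
the sketch below. It is only a sketch under stated assumptions: the nested
ScoreDoc class is a stand-in for org.apache.lucene.search.ScoreDoc so that the
example compiles without Lucene on the classpath, and the class name
DocIdOrder, the helper sortByDocId, and the sample docIDs are made up for
illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class DocIdOrder {

    // Stand-in for org.apache.lucene.search.ScoreDoc (assumption made so the
    // sketch is self-contained); the real class also has public 'doc' and
    // 'score' fields.
    static class ScoreDoc {
        final int doc;
        final float score;

        ScoreDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Returns a copy of the hits sorted by ascending docID. The input array,
    // which is normally in score order, is left untouched.
    static ScoreDoc[] sortByDocId(ScoreDoc[] hits) {
        ScoreDoc[] sorted = hits.clone();
        Arrays.sort(sorted, new Comparator<ScoreDoc>() {
            @Override
            public int compare(ScoreDoc a, ScoreDoc b) {
                return Integer.compare(a.doc, b.doc);
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical hits, in the score order a collector might return.
        ScoreDoc[] hits = {
            new ScoreDoc(42, 3.1f),
            new ScoreDoc(7, 2.8f),
            new ScoreDoc(19, 1.5f)
        };
        for (ScoreDoc sd : sortByDocId(hits)) {
            // With Lucene, indexSearcher.doc(sd.doc) would be called here, so
            // the stored documents are read in increasing docID order.
            System.out.println(sd.doc); // prints 7, then 19, then 42
        }
    }
}
```

With the real Lucene classes, the same comparator would be applied to
collector.topDocs().scoreDocs before the retrieval loop. As Mike points out
in the thread, this only reduces seeking; visiting every hit and loading its
stored document is likely still too slow for large result sets.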
