Hi Mike, could you provide some pointers on using the inverted index? Any examples, or which API classes should I use to accomplish this?
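For context on the question: an inverted index maps each term to the sorted list of IDs of the documents that contain it (its postings). A query such as "which documents contain X" is then answered from the index alone, without loading any stored document text. The class below is a minimal in-memory sketch of that idea under simplifying assumptions (lower-casing, splitting on non-word characters, doc IDs added in increasing order). `ToyInvertedIndex` and its methods are hypothetical names, not Lucene API. In Lucene 4.x the real postings are read through classes such as `TermsEnum` and `DocsEnum`, and queries such as `TermQuery` and `BooleanQuery` walk them for you.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of an inverted index; not Lucene's implementation.
public class ToyInvertedIndex {

    // term -> ascending list of doc IDs containing that term (its postings list)
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Indexes one document. Assumes documents are added with increasing doc IDs,
    // which keeps every postings list sorted without an explicit sort.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() yields a leading empty token if text starts with a separator
            }
            List<Integer> docs = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Record each document at most once per term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    // Answers "which docs contain this term?" from the postings alone.
    public List<Integer> docsContaining(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyList());
    }

    // Conjunctive (AND) lookup: intersects two sorted postings lists in one forward pass.
    public List<Integer> docsContainingBoth(String a, String b) {
        List<Integer> x = docsContaining(a);
        List<Integer> y = docsContaining(b);
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < x.size() && j < y.size()) {
            int cmp = Integer.compare(x.get(i), y.get(j));
            if (cmp == 0) {
                out.add(x.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.addDocument(0, "Order docIds to reduce disk seeks");
        idx.addDocument(1, "Stored fields are costly to retrieve");
        idx.addDocument(2, "Sort hits by docID to reduce seeks");
        System.out.println(idx.docsContaining("seeks"));              // prints [0, 2]
        System.out.println(idx.docsContainingBoth("reduce", "docid")); // prints [2]
    }
}
```

The intersection walks both lists strictly forward, which is the same access pattern that makes reading in docID order cheap on disk.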
On Tue, Nov 18, 2014 at 12:40 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Even if you sort all hits by docID it's likely too slow to visit every
> single one and load the stored document ...
>
> Try to find another way to solve your problem, making use of the inverted
> index?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J <stuart.r...@pnnl.gov> wrote:
>
> > Hi Vijay,
> >
> > ...sorting the documents you need to retrieve by docID order first...
> >
> > means sorting them by their 'document number', which is the value in the
> > 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve'
> > the document from the index. If you write a comparator to sort the elements
> > in the ScoreDoc[] by their doc field, that will put them in 'docID
> > order' and the reader will always be skipping forward to the next doc, which
> > will probably reduce its seek time.
> >
> > Regards,
> > Stuart
> >
> >
> > -----Original Message-----
> > From: Vijay B [mailto:vijay.nip...@gmail.com]
> > Sent: Monday, November 17, 2014 9:16 AM
> > To: java-user@lucene.apache.org
> > Subject: Order docIds to reduce disk seeks
> >
> > Could someone point me to how to order docIds as per
> > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed :
> >
> > "Limit usage of stored fields and term vectors. Retrieving these from
> > the index is quite costly. Typically you should only retrieve these for the
> > current "page" the user will see, not for all documents in the full result
> > set. For each document retrieved, Lucene must seek to a different location
> > in various files. Try sorting the documents you need to retrieve by docID
> > order first."
> >
> > To give some background:
> >
> > We are using plain vanilla Lucene (version 4.2.1) for our application.
> > We index our documents using stored fields. We add two fields related to
> > our documents:
> >
> >   UUID: a 9-digit number representing the internal id
> >   doc_text: the document text (approx. 7K to 20K in size)
> >
> > In our search code, we use a BooleanQuery to retrieve by UUID, fetch the
> > document text, and use it for other processing. We are noticing slow
> > response times with the searches. I understand that stored field retrieval
> > is slower and should be limited, but this is mandatory for our app.
> >
> > Current code:
> >
> >   TopScoreDocCollector collector =
> >       TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);
> >
> >   DirectoryReader dirReader = DirectoryReader.open(FSDirectory.open(......));
> >   IndexSearcher indexSearcher = new IndexSearcher(dirReader);
> >   indexSearcher.search(query, collector);
> >   ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> >
> >   for (ScoreDoc scoreDoc : scoreDocs) {
> >       Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
> >       String text = luceneDoc.get("doc_text"); // these calls take a lot of time
> >
> >       // process text
> >   }
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
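Stuart's comparator suggestion, combined with the wiki's advice to load stored fields only for the current page, can be sketched roughly as follows. To keep the example compilable without Lucene on the classpath, a hypothetical `Hit` class stands in for `ScoreDoc` and keeps only its `doc` and `score` fields; `DocIdOrderSketch` and `pageInDocIdOrder` are likewise illustrative names. With Lucene, the same comparator would be applied to a slice of the `ScoreDoc[]` from `collector.topDocs().scoreDocs`, and `indexSearcher.doc(...)` would be called inside the loop.

```java
import java.util.Arrays;

public class DocIdOrderSketch {

    // Hypothetical stand-in for Lucene's ScoreDoc, keeping only the doc and
    // score fields so this sketch compiles without Lucene on the classpath.
    public static final class Hit {
        public final int doc;
        public final float score;

        public Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Takes one result page (pageSize hits starting at pageStart) from hits that
    // are in score order, and returns a copy sorted by ascending doc ID so that
    // stored-field reads move forward through the index files.
    public static Hit[] pageInDocIdOrder(Hit[] rankedHits, int pageStart, int pageSize) {
        int from = Math.min(pageStart, rankedHits.length);
        int to = Math.min(from + pageSize, rankedHits.length);
        Hit[] page = Arrays.copyOfRange(rankedHits, from, to);
        Arrays.sort(page, (a, b) -> Integer.compare(a.doc, b.doc));
        return page;
    }

    public static void main(String[] args) {
        // Hits as a collector returns them: best score first, doc IDs scattered.
        Hit[] ranked = {
            new Hit(907, 3.1f), new Hit(12, 2.8f), new Hit(455, 2.5f),
            new Hit(3, 1.9f), new Hit(610, 1.2f)
        };
        for (Hit h : pageInDocIdOrder(ranked, 0, 4)) {
            // With Lucene, this is where indexSearcher.doc(h.doc) would be called.
            System.out.println(h.doc); // prints 3, 12, 455, 907 on separate lines
        }
    }
}
```

Sorting changes only the order of the stored-field reads, not their number; as Mike points out, if every hit still has to be loaded, this may remain too slow.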