Hi Vijay,

I'm guessing Michael means that perhaps your text processing step could be
better solved by using Lucene features. The use case of Lucene you describe
in your post is better suited to a key-value store or a relational database.

Can you give more details on what your text processing step does?

Barry

On Nov 18, 2014 7:41 PM, "Vijay B" <vijay.nip...@gmail.com> wrote:
>
> Hi Mike, could you provide some pointers on using the inverted index? Any
> examples, or which API classes to use to accomplish this?
>
> On Tue, Nov 18, 2014 at 12:40 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
> > Even if you sort all hits by docID it's likely too slow to visit every
> > single one and load the stored document ...
> >
> > Try to find another way to solve your problem, making use of the inverted
> > index?
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Mon, Nov 17, 2014 at 6:05 PM, Rose, Stuart J <stuart.r...@pnnl.gov>
> > wrote:
> > > Hi Vijay,
> > >
> > > ...sorting the documents you need to retrieve by docID order first...
> > >
> > > means sorting them by their 'document number' which is the value in the
> > > 'scoreDoc.doc' field and is the value that the reader takes to 'retrieve'
> > > the document from the index. If you write a comparator to sort the elements
> > > in the ScoreDoc[] by their doc field then that will put them in 'docID
> > > order' and the reader will always be skipping forward to the next doc which
> > > will probably reduce its seek time.
> > >
> > > Regards,
> > > Stuart
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Vijay B [mailto:vijay.nip...@gmail.com]
> > > Sent: Monday, November 17, 2014 9:16 AM
> > > To: java-user@lucene.apache.org
> > > Subject: Order docIds to reduce disk seeks
> > >
> > > Could someone point me to how to order docIds as per
> > > http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
> > >
> > > "Limit usage of stored fields and term vectors. Retrieving these from
> > > the index is quite costly. Typically you should only retrieve these for the
> > > current "page" the user will see, not for all documents in the full result
> > > set. For each document retrieved, Lucene must seek to a different location
> > > in various files. Try sorting the documents you need to retrieve by docID
> > > order first."
> > >
> > > To give some background:
> > >
> > > We are using plain vanilla Lucene (version 4.2.1) for our application. We
> > > index our documents using stored fields. We add two fields for each
> > > document: UUID, a 9-digit number that represents the internal id, and
> > > doc_text, the document text (approx. 7K to 20K in size). In our search code,
> > > we use a boolean query to retrieve by UUID, fetch the document text, and use
> > > it for other processing. We are noticing slow response times with the
> > > searches. I understand that stored field retrieval is slow and should be
> > > limited, but this is mandatory for our app.
> > >
> > >
> > > Current code:
> > >
> > > TopScoreDocCollector collector =
> > >     TopScoreDocCollector.create(BooleanQuery.getMaxClauseCount(), true);
> > >
> > > dirReader = DirectoryReader.open(FSDirectory.open(......));
> > > IndexSearcher indexSearcher = new IndexSearcher(dirReader);
> > > indexSearcher.search(query, collector);
> > > ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> > >
> > > for (ScoreDoc scoreDoc : scoreDocs) {
> > >     Document luceneDoc = indexSearcher.doc(scoreDoc.doc);
> > >     String text = luceneDoc.get("doc_text"); // these calls take a lot of time
> > >
> > >     // process text
> > > }
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >
> >
> >
> >
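
For reference, Stuart's suggestion in the thread above (sort the ScoreDoc[] by its doc field before fetching) might look roughly like the sketch below. It is only an illustration: the nested Hit class is a hypothetical stand-in for Lucene's ScoreDoc so that the example runs without Lucene on the classpath, and the names DocIdOrder and sortByDocId are made up for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

class DocIdOrder {
    // Hypothetical stand-in for Lucene's ScoreDoc; like ScoreDoc, it exposes
    // a public int field named doc and a float field named score.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Returns a copy of the hits sorted by ascending docID, leaving the
    // original (score-ordered) array untouched.
    static Hit[] sortByDocId(Hit[] hits) {
        Hit[] sorted = Arrays.copyOf(hits, hits.length);
        Arrays.sort(sorted, new Comparator<Hit>() {
            @Override
            public int compare(Hit a, Hit b) {
                return Integer.compare(a.doc, b.doc);
            }
        });
        return sorted;
    }

    public static void main(String[] args) {
        Hit[] byScore = { new Hit(42, 3.1f), new Hit(7, 2.4f), new Hit(19, 1.8f) };
        for (Hit hit : sortByDocId(byScore)) {
            // With real Lucene classes, indexSearcher.doc(hit.doc) would be called here.
            System.out.println(hit.doc);
        }
    }
}
```

With the real classes, the same comparator would be applied to the ScoreDoc[] from collector.topDocs().scoreDocs before the loop that calls indexSearcher.doc(scoreDoc.doc). As Mike notes, this may still be too slow if every hit has to be visited and its stored document loaded.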
