Re: A fast way to get real docID from large indexes?

Bissan AUDEH Wed, 12 Dec 2012 14:46:17 -0800

Thanks David, I'll give that a try for sure, because this "time" issue is
driving me crazy. It is useless to be very fast at searching the index if you
need a lot of time to present what you've found in a meaningful way!


On Wednesday, 12 December 2012 23:19 CET, "Smiley, David W." <[email protected]>
wrote:

> I suggest you load your unique key field into memory via the FieldCache,
> then reference it that way.  See LUCENE-4541 for a "ValueSourceAccessor"
> proposal.  There are FieldCache based ValueSources.
>
> ~ David Smiley
>
> On 12/12/12 5:00 PM, "Bissan AUDEH" <[email protected]> wrote:
>
> >Thank you Carsten,
> >What I mean by the document's real name is any stored field in the index
> >that represents the document (e.g. document title, document file name in
> >the file system, document location, ...), or anything that you stored as a
> >field at index time and that you wish to present to the user as a search
> >result, because presenting the Lucene docID means nothing to the user.
> >
> >What I'm currently doing is something like this:
> >
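> >IndexSearcher searcher;  // opened on the index elsewhere
> >TopDocs results = searcher.search(query, numTotalHits);
> >ScoreDoc[] hits = results.scoreDocs;
> >// scoreDocs may hold fewer than numTotalHits entries, so loop on its length
> >for (int i = 0; i < hits.length; i++)
> >{
> >   // loads the stored fields of this hit
> >   Document doc = searcher.doc(hits[i].doc);
> >   System.out.println(hits[i].doc + " : " + hits[i].score);
> >}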
> >
> >Unless I'm doing it wrong, the call "searcher.doc(hits[i].doc)" seems to
> >be time-consuming for large indexes.
> >
> >I'll take a look at the AllDocCollector that you mentioned in your mail,
> >hoping it will resolve my problem.
> >
> >On Wednesday, 12 December 2012 13:30 CET, Carsten Schnober
> ><[email protected]> wrote:
> >
> >> On 07.12.2012 15:12, Bissan Audeh wrote:
> >>
> >> > I'm doing some experiments with Lucene where I run many queries and
> >> > keep the top 1500 results of each query. I recently switched to
> >> > Lucene 4.0, but in all cases I find that it takes a lot of time to get
> >> > the REAL document id using ScoreDoc and IndexSearcher, especially
> >> > since I have very large indexes.
> >> > Does anyone know a faster way?
> >> > It would be more efficient to have the document's real name as an
> >> > attribute of the class ScoreDoc in addition to its Lucene ID and its
> >> > score, because this information is always needed to show the
> >> > retrieved documents.
> >>
> >>
> >> By "real" name, do you mean something like the input document title as
> >> opposed to the id assigned by Lucene during indexing? I've resolved this
> >> by storing document name in a dedicated field so that I can use it in a
> >> query or filter.
> >> If you refer to the Lucene index ids, you might be interested in using a
> >> Collector; the example "AllDocCollector" given in the textbook "Lucene
> >> in Action" (McCandless, Hatcher, Gospodnetić, 2nd ed., ch. 6) is
> >> probably helpful.
> >> Best,
> >> Carsten
> >>
> >> --
> >> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> >> Projekt KorAP                 | http://korap.ids-mannheim.de
> >> Tel. +49-(0)621-43740789      | [email protected]
> >> Korpusanalyseplattform der nächsten Generation
> >> Next Generation Corpus Analysis Platform
> >
> >
> >
> >
> >
>
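
For readers of this thread: the idea behind the FieldCache suggestion quoted
above (read the unique key of every document into memory once, then look each
hit up by its internal docID) can be sketched roughly as follows. This is a
plain-Java model only. It does not call Lucene, and DocKeyCache, keyFor,
resolve, and the slowLookup function are hypothetical names invented for this
illustration; slowLookup merely stands in for an expensive per-document
stored-field read such as searcher.doc(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Small demo driver; every name in this file is hypothetical, not Lucene API.
final class DocKeyCacheDemo {
    public static void main(String[] args) {
        String[] storedNames = {"a.txt", "b.txt", "c.txt", "d.txt"};
        int[] slowReads = {0}; // counts simulated stored-field reads

        // Loading the cache performs one slow read per document.
        DocKeyCache cache = new DocKeyCache(storedNames.length, docId -> {
            slowReads[0]++;
            return storedNames[docId];
        });

        // Two "queries": resolving their hits triggers no further slow reads.
        System.out.println(cache.resolve(new int[] {2, 0}));
        System.out.println(cache.resolve(new int[] {3, 2, 1}));
        System.out.println("slow reads: " + slowReads[0]);
    }
}

// In-memory table mapping an internal docID to the document's external key.
final class DocKeyCache {
    private final String[] keys; // keys[docId] = external key of that document

    // Reads the key of every document exactly once, up front.
    DocKeyCache(int maxDoc, IntFunction<String> slowLookup) {
        keys = new String[maxDoc];
        for (int docId = 0; docId < maxDoc; docId++) {
            keys[docId] = slowLookup.apply(docId);
        }
    }

    // Constant-time array access; no stored-field read is involved.
    String keyFor(int docId) {
        return keys[docId];
    }

    // Maps a ranked list of internal docIDs to their external keys, in order.
    List<String> resolve(int[] rankedDocIds) {
        List<String> out = new ArrayList<>(rankedDocIds.length);
        for (int docId : rankedDocIds) {
            out.add(keyFor(docId));
        }
        return out;
    }
}
```

The trade-off in this sketch is memory for time: the table holds one entry per
document in the index, which pays off when many queries each need the keys of
many hits, as in the 1500-results-per-query experiments described above.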
