Re: HitCollector or Hits

Erick Erickson Thu, 24 May 2007 09:35:37 -0700

You're on the right track. That said, access to anything that's
indexed (stored or not) should be pretty quick. Things that are
stored but not indexed are costlier to get at. This might drive your
decision on what to index vs. store.


By "loading the document" I mean any call like IndexReader.document()
or Hits.doc().

Part of the difference is that if you load the document, you get
all the fields, whether you need them or not.

Also, you can use your own TermEnum/TermDocs lookup for
this kind of thing if the terms you're interested in are indexed...
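
As a rough sketch of the id-to-store_id table you mention: the code
below is plain Java with no Lucene calls, the class and method names
are made up for illustration, and it assumes store ids are small
non-negative ints and that the array has already been filled somehow
(for example from a one-time pass over the index).

```java
import java.util.Arrays;
import java.util.BitSet;

public class StoreFacetSketch {
    // Hypothetical table: storeIdByDoc[docId] is the store_id of that document.
    private final int[] storeIdByDoc;

    public StoreFacetSketch(int[] storeIdByDoc) {
        this.storeIdByDoc = storeIdByDoc.clone();
    }

    /**
     * Returns the distinct store ids, in ascending order, found among the
     * first `limit` entries of hitDocIds (doc ids in ranked order).
     * Each lookup is a plain array access, so no document is loaded.
     */
    public int[] storesInTopHits(int[] hitDocIds, int limit) {
        BitSet seen = new BitSet(); // assumes non-negative store ids
        int n = Math.min(limit, hitDocIds.length);
        for (int i = 0; i < n; i++) {
            seen.set(storeIdByDoc[hitDocIds[i]]);
        }
        return seen.stream().toArray();
    }

    public static void main(String[] args) {
        // Six documents (doc ids 0..5) with made-up store ids.
        StoreFacetSketch table = new StoreFacetSketch(new int[] {7, 7, 3, 9, 3, 5});
        // Ranked hits as doc ids; only the first 3 are inspected here.
        int[] stores = table.storesInTopHits(new int[] {4, 0, 3, 1}, 3);
        System.out.println(Arrays.toString(stores)); // prints [3, 7, 9]
    }
}
```

In your case the limit would be 1000 and the doc ids would come from
whatever collector you end up using. Keep in mind that document numbers
are only stable for a given IndexReader, so a table like this would
have to be rebuilt whenever the reader is reopened.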

Some time ago I wrote a mail detailing my experience, in my
situation and with my peculiar data set, that you may want to read.
See:

Lucene 2.1, using FieldSelector speeds up my app by a factor of 10+


As I mentioned in that message, I suspect that my improvement was
*highly* dependent upon how the index is structured.....

All that said, your notion of benchmarking is a very good one. It led
me to use FieldSelector in the first place...

Best
Erick

On 5/24/07, Carlos Pita <[EMAIL PROTECTED]> wrote:


Hi Erick,

thank you for your prompt answer. What do you mean by loading the
document? Accessing one of the stored fields? In that case I'm afraid I
would need to do it. For example, in the aforementioned case of a result
of products, I have to look at each product's store_id, which is stored
along with the document. Is this a performance killer? Maybe I should
keep some tables in memory, for example an array mapping from id to
store_id in O(1). I will do some benchmarking first anyway.

Cheers,
Carlos

On 5/24/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
>
> I know of no way to alter the Hits behavior, I recommend using
> a TopDocs/TopDocCollector.
>
> But be aware that if you load the document for each one, you may incur
> a significant penalty, although the lazy-loading helped me a lot, see
> FieldSelector.....
>
> On 5/23/07, Carlos Pita <[EMAIL PROTECTED]> wrote:
> >
> > Hi folks,
> >
> > I need to collect some global information from my first 1000 search
> > results in order to build up some search refining components containing
> > only relevant values (those which correspond to at least one of the
> > first 1000 hits). For example, the results are products and there is a
> > store filter component that shows only the stores that sell a product
> > among the first 1000 hits. So even if the user sees just the first 20,
> > I would have to inspect the first 1000. I've read that Hits maintains a
> > cache of about 100 or 200 hits. Is this configurable? If I could set
> > this cache to 1000 I would then use Hits to browse the search results.
> > Otherwise, I should use HitCollector. What's your advice?
> >
> > TIA
> > Cheers,
> > Carlos
> >
>
