> But if re-creating the entire file on each reopen isn't a problem for
> you then there's no need to change this :)

It's actually created after IndexWriter.commit(), but it's the same
idea. If we needed real-time indexing, or if disk I/O became excessive,
I'd go with a separate file per segment.
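
In case it's useful, here's roughly what I'd expect that to look like.
This is only a sketch against the 2.9-era API: writeSegmentFiles,
externalDir, allRecords, and recordLength are our own names (the in-RAM
fixed-length records and the directory the per-segment files would live
in), and the SegmentReader cast assumes the sub-readers are plain
segment readers:

import java.io.*;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.SegmentReader;
import org.apache.lucene.store.Directory;

// Sketch: one external file per segment, so a reopen only has to
// write files for segments it hasn't seen before.
void writeSegmentFiles(Directory dir, File externalDir,
                       byte[] allRecords, int recordLength)
    throws IOException {
  IndexReader top = IndexReader.open(dir, true);  // read-only reader
  try {
    int docBase = 0;
    for (IndexReader sub : top.getSequentialSubReaders()) {
      SegmentReader sr = (SegmentReader) sub;
      File f = new File(externalDir, sr.getSegmentName() + ".ext");
      if (!f.exists()) {                          // only new segments
        OutputStream out =
            new BufferedOutputStream(new FileOutputStream(f));
        try {
          for (int doc = 0; doc < sr.maxDoc(); doc++) {
            // (docBase + doc) is the index-wide docId, which is how
            // the records are laid out in the single big array today
            out.write(allRecords,
                      (docBase + doc) * recordLength, recordLength);
          }
        } finally {
          out.close();
        }
      }
      docBase += sr.maxDoc();
    }
  } finally {
    top.close();
  }
}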

>Hmm -- if you are extending HitCollector and passing that to search(),
>then the docIDs fed to it should already be top-level docIDs, not
>segment relative.

I had assumed the collector was also passed segment-relative docIDs, but
you're right: HitCollector receives top-level docIDs. The incorrect
sorting I'm seeing must be caused by something else.
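
For anyone following along, this is the shape of what I'm doing,
sketched against the 2.9-era deprecated APIs (MyComparatorSource and
"myField" are stand-ins for our custom comparator and sort field, not
Lucene classes):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.*;

void sortedSearch(Searcher searcher, Query query, IndexReader topReader)
    throws IOException {
  // Per Mike's point below, FSHQ is initialized with the top-level
  // reader
  final FieldSortedHitQueue hq = new FieldSortedHitQueue(
      topReader,
      new SortField[] { new SortField("myField",
                                      new MyComparatorSource()) },
      100);  // queue size

  // With the deprecated HitCollector API, collect() is fed top-level
  // docIDs, so no docBase bookkeeping is needed before inserting
  searcher.search(query, new HitCollector() {
    public void collect(int doc, float score) {
      hq.insertWithOverflow(new FieldDoc(doc, score));
    }
  });
}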

Thanks,
Peter
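
P.S. For reference, my understanding of the per-segment flow Mike
mentions below ("Lucene asks the Query's weight for a new scorer one
segment at a time"), as a conceptual sketch rather than the actual 2.9
IndexSearcher source:

// given: IndexReader[] subReaders, int[] docStarts, Weight weight,
// Collector collector
for (int i = 0; i < subReaders.length; i++) {
  // the collector is told each segment's docBase so it can rebase
  // segment-local docIDs to top-level ones
  collector.setNextReader(subReaders[i], docStarts[i]);
  Scorer scorer = weight.scorer(subReaders[i], true, false);
  if (scorer != null) {
    scorer.score(collector);  // scorer sees segment-local docIDs
  }
}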

On Tue, Nov 17, 2009 at 11:51 AM, Michael McCandless
<luc...@mikemccandless.com> wrote:

> On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan <peterlkee...@gmail.com>
> wrote:
> > The external data is just an array of fixed-length records, one for each
> > Lucene document. Indexes are updated at regular intervals in one JVM. A
> > searcher JVM opens the index and reads all the fixed-length records into
> > RAM. Given an index-wide docId, the custom scorer can quickly access the
> > corresponding fixed-length external data.
> >
> > Could you explain a bit more about how mapping the external data to be
> > per segment would work? As I said, rebuilding the whole file isn't a
> > big deal and the single file keeps the Searcher's use of it simple.
>
> Well, you could use IndexReader.getSequentialSubReaders(), then step
> through that array of SegmentReaders, making a separate external file
> for each?
>
> This way, when you reopen your readers, you would only need to make a
> new external file for those segments that are new.
>
> But if re-creating the entire file on each reopen isn't a problem for
> you then there's no need to change this :)
>
> > With or without a SegmentReader->docBase map (which does sound like a
> > huge performance hit), I still don't see how the custom scorer gets
> > the segment number. Btw, the custom scorer usually becomes part of a
> > ConjunctionScorer (if that matters)
>
> Looks like you already answered this (Lucene asks the Query's weight
> for a new scorer one segment at a time).
>
> >> FSHQ expects you to init it with the top-level reader, and then
> >> insert using top docIDs.
> > For sorting, I'm using FSHQ directly with a custom collector that inserts
> > docs into the FSHQ. But the custom collector is passed the segment-relative
> > docId and the custom comparator needs the index-wide docId. The custom
> > collector extends HitCollector. I'm missing where this type of collector
> > finds the docBase.
>
> Hmm -- if you are extending HitCollector and passing that to search(),
> then the docIDs fed to it should already be top-level docIDs, not
> segment relative.
>
> Mike
>