> But if re-creating the entire file on each reopen isn't a problem for
> you then there's no need to change this :)

It's actually created after IndexWriter.commit(), but same idea. If we
needed real-time indexing, or if disk I/O gets excessive, I'd go with
separate files per segment.

> Hmm -- if you are extending HitCollector and passing that to search(),
> then the docIDs fed to it should already be top-level docIDs, not
> segment relative.

I just assumed the same was true for the collector, but you're right. The
incorrect sorting I see must be due to something else.

Thanks,
Peter

On Tue, Nov 17, 2009 at 11:51 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

> On Tue, Nov 17, 2009 at 8:58 AM, Peter Keegan <peterlkee...@gmail.com>
> wrote:
> > The external data is just an array of fixed-length records, one for each
> > Lucene document. Indexes are updated at regular intervals in one jvm. A
> > searcher jvm opens the index and reads all the fixed-length records into
> > RAM. Given an index-wide docId, the custom scorer can quickly access the
> > corresponding fixed-length external data.
> >
> > Could you explain a bit more about how mapping the external data to be
> > per segment would work? As I said, rebuilding the whole file isn't a big
> > deal and the single file keeps the Searcher's use of it simple.
>
> Well, you could use IndexReader.getSequentialSubReaders(), then step
> through that array of SegmentReaders, making a separate external file
> for each?
>
> This way, when you reopen your readers, you would only need to make a
> new external file for those segments that are new.
>
> But if re-creating the entire file on each reopen isn't a problem for
> you then there's no need to change this :)
>
> > With or without a SegmentReader->docBase map (which does sound like a
> > huge performance hit), I still don't see how the custom scorer gets the
> > segment number. Btw, the custom scorer usually becomes part of a
> > ConjunctionScorer (if that matters)
>
> Looks like you already answered this (Lucene asks the Query's weight
> for a new scorer one segment at a time).
>
> >> FSHQ expects you to init it with the top-level reader, and then insert
> >> using top docIDs.
> > For sorting, I'm using FSHQ directly with a custom collector that
> > inserts docs to the FSHQ. But the custom collector is passed the
> > segment-relative docId and the custom comparator needs the index-wide
> > docId. The custom collector extends HitCollector. I'm missing where this
> > type of collector finds the docBase.
>
> Hmm -- if you are extending HitCollector and passing that to search(),
> then the docIDs fed to it should already be top-level docIDs, not
> segment relative.
>
> Mike
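
For reference, a rough sketch of the per-segment file idea Mike describes
above, using the 2.9-era IndexReader.getSequentialSubReaders() call. The
dataDir field and the segmentKey()/writeRecordsForSegment() helpers are
hypothetical placeholders, not anything from the thread or from Lucene:

import java.io.File;
import java.io.IOException;

import org.apache.lucene.index.IndexReader;

// Minimal sketch: keep one external file of fixed-length records per segment,
// and regenerate files only for segments that are new after a reopen.
public abstract class PerSegmentExternalData {

  private final File dataDir;

  protected PerSegmentExternalData(File dataDir) {
    this.dataDir = dataDir;
  }

  // Call after (re)opening the top-level reader.
  public void update(IndexReader topReader) throws IOException {
    IndexReader[] subReaders = topReader.getSequentialSubReaders();
    if (subReaders == null) {
      // Not a composite reader; treat the whole reader as one "segment".
      subReaders = new IndexReader[] { topReader };
    }
    for (IndexReader sub : subReaders) {
      File f = new File(dataDir, segmentKey(sub) + ".ext");
      if (!f.exists()) {
        // Only segments that appeared since the last open need a new file.
        writeRecordsForSegment(f, sub);
      }
    }
  }

  // Hypothetical helper: derive a stable identifier for the segment
  // (e.g. its segment name).
  protected abstract String segmentKey(IndexReader segmentReader);

  // Hypothetical helper: write one fixed-length record per doc in this segment.
  protected abstract void writeRecordsForSegment(File file, IndexReader segmentReader)
      throws IOException;
}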
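And on the docBase question: with the newer Collector API in 2.9 (as opposed
to HitCollector), the searcher hands each segment's starting docID to
setNextReader, so a collector can re-base segment-relative docIDs itself
before inserting into a queue. A minimal sketch, with the actual FSHQ
insertion left as a hypothetical commented-out call:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Sketch of a per-segment Collector (Lucene 2.9 API) that converts
// segment-relative docIDs to index-wide docIDs using docBase.
public class TopLevelDocIdCollector extends Collector {

  private Scorer scorer;
  private int docBase; // starting index-wide docID of the current segment

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    this.scorer = scorer;
  }

  @Override
  public void setNextReader(IndexReader reader, int docBase) throws IOException {
    this.docBase = docBase;
  }

  @Override
  public void collect(int doc) throws IOException {
    int topLevelDoc = docBase + doc; // index-wide docID
    float score = scorer.score();
    // insert(topLevelDoc, score);   // hypothetical: push into the FSHQ or other queue
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true; // re-basing by docBase does not depend on collection order
  }
}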