RE: No subsearcher in Lucene 3.3?

Uwe Schindler Tue, 30 Aug 2011 12:52:49 -0700

Hi,

Use ReaderUtil from the o.a.l.util package, which does the recursive
traversal of the reader tree. It has methods to solve this problem. You can
cache the int[] start array that contains the starting document id of each
subreader. This makes it possible to remap the document ids from a standard
TopDocs-based search, without Collectors (which should not be required for
your case).
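
To illustrate the cached start array, here is a minimal, Lucene-free sketch.
DocStartLookup, buildStarts and subIndex are hypothetical names made up for
this example, not ReaderUtil's API; it assumes you have already collected the
maxDoc() value of each subreader into an int[].

```java
import java.util.Arrays;

// Hypothetical helper (not part of Lucene): maps a global document id to the
// index of the subreader that contains it, using cached start offsets.
class DocStartLookup {

    // starts[i] is the first global doc id of subreader i, i.e. the sum of
    // maxDoc() over all subreaders before it.
    static int[] buildStarts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length];
        int sum = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = sum;
            sum += maxDocs[i];
        }
        return starts;
    }

    // Returns the index of the subreader owning docId, or -1 if docId < 0.
    static int subIndex(int docId, int[] starts) {
        int pos = Arrays.binarySearch(starts, docId);
        if (pos < 0) {
            // Not an exact start: take the entry just before the insertion point.
            return -pos - 2;
        }
        // Empty subreaders share a start with their successor; the document
        // belongs to the last subreader with this start.
        while (pos + 1 < starts.length && starts[pos + 1] == docId) {
            pos++;
        }
        return pos;
    }

    public static void main(String[] args) {
        int[] starts = buildStarts(new int[] {100, 50, 200});
        int docId = 120;
        int sub = subIndex(docId, starts);
        // The id local to that subreader is the global id minus its start.
        System.out.println("subreader " + sub + ", local doc " + (docId - starts[sub]));
    }
}
```

Build the array once per (re)opened reader and reuse it for every search; it
is only valid for the reader it was built from.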


For this issue you are not interested in stepping recursively into the
reader tree down to the lowest level (non-optimized subindexes also expand
to multiple readers). The only thing you need to know is which direct
subreader of the MultiReader a hit came from. For a quick lookup, one
approach is to iterate *once*, before searching, over the direct subreaders
of the MultiReader (without recursion) and sum up the maxDoc() (not
numDocs()!) return values. For each subreader, put the running sum before
adding its own maxDoc() (so 0 for the first subreader) into a TreeMap (!!!)
as the key, with the target index name or whatever you need to identify the
subreader as the value. You can then look up the docid from the TopDocs
object using TreeMap.floorEntry(docId).getValue() (requires Java 6 or later).
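
A minimal sketch of the TreeMap lookup, kept free of Lucene classes so it
runs on its own. SubReaderLookup, buildStartMap and nameFor are hypothetical
names; the names and maxDocs arrays are assumed to be filled by you, in
subreader order, from the direct subreaders of your MultiReader.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Lucene): maps a global doc id from
// TopDocs back to the name of the direct subreader it came from.
class SubReaderLookup {

    // Key: first global doc id of a subreader; value: its identifying name.
    static TreeMap<Integer, String> buildStartMap(String[] names, int[] maxDocs) {
        TreeMap<Integer, String> startToName = new TreeMap<Integer, String>();
        int start = 0;
        for (int i = 0; i < names.length; i++) {
            // An empty subreader shares its start with the next one and is
            // simply overwritten by it, which is what we want.
            startToName.put(start, names[i]);
            start += maxDocs[i]; // maxDoc(), not numDocs(): deleted docs still occupy ids
        }
        return startToName;
    }

    // floorEntry returns the entry with the greatest key <= docId.
    static String nameFor(TreeMap<Integer, String> startToName, int docId) {
        Map.Entry<Integer, String> entry = startToName.floorEntry(docId);
        if (entry == null) {
            throw new IllegalArgumentException("no subreader for doc " + docId);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = buildStartMap(
                new String[] {"collectionA", "collectionB"}, new int[] {100, 50});
        System.out.println(nameFor(map, 120));
    }
}
```

The value can just as well be the base_path itself or a database key. As
with any structure derived from the reader, rebuild the map whenever the
indexes are reopened.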

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Devon H. O'Dell [mailto:[email protected]]
> Sent: Tuesday, August 30, 2011 8:04 PM
> To: [email protected]
> Subject: Re: No subsearcher in Lucene 3.3?
> 
> 2011/8/30 Joe MA <[email protected]>:
> > When searching a single collection, no problem.  But if I want to
> > search the two collections at the same time, I need to know which
> > collection the hit came from so I can retrieve the base_path from the
> > database.  These base_paths can be different.  As mentioned, this was
> > trivial in Lucene 1.x and 2.x as I just grabbed the subsearcher from
> > the result, which would for example return a 1 or 2 indicating which
> > of the two collections the result came from.  Then I can build the
> > path to the file.  In other words, subsearcher gave me the foreign key
> > I needed to map to additional external information associated with
> > each index during a multisearch.  That is now gone in Lucene 3.3.
> 
> You could use the suggestion I made of doing the loop over the
> IndexReader subReaders (recursively until you get to the
> SegmentReaders) and use a HashMap<SegmentReader, String> (or similar
> container structure) to associate the segments to a path. It sounds like
> your application doesn't reopen indexes with much frequency, which is
> good: you will need to regenerate this map any time you reopen your
> indexes.
>
> When collector.setNextReader is called, you can simply get (at that
> point) the String associated with the particular SegmentReader you're
> working with. Then, every time Collector.collect is called, you can tack
> that on to whatever data structure you're using to get at your
> documents. It doesn't have to be high memory overhead if you make sure
> the strings are interned.
>
> Perhaps Uwe or other Lucene devs have better ideas for approaching this;
> they often do :)
> 
> --dho
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

