Thanks for the reply. Our metadata is not stored in a single field; it is a collection of fields, so matching on it requires a boolean search that spans multiple fields. My understanding is that termDocs() cannot be used to iterate efficiently over the matching documents when the search involves multiple terms and/or multiple fields. Is that right?
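To make the concern concrete, here is a toy sketch in plain Java (not the Lucene API) of what I assume a two-term AND has to do: each term yields its own sorted list of document IDs, much like one termDocs() enumeration per term, and those lists must then be merged. The class name, method name, and sample doc IDs are all hypothetical, and real posting lists would be far longer.

```java
import java.util.Arrays;

/** Toy model only: sorted int arrays stand in for per-term posting lists. */
final class PostingIntersection {

    /**
     * Returns the doc IDs present in both lists (a boolean AND).
     * Assumes both inputs are sorted ascending with no duplicates.
     */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                 // a is behind, advance it
            } else if (a[i] > b[j]) {
                j++;                 // b is behind, advance it
            } else {
                out[n++] = a[i];     // same doc ID in both lists
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        // Hypothetical doc IDs matching one term in each of two fields.
        int[] fieldOneMatches = {1, 3, 5, 9};
        int[] fieldTwoMatches = {3, 4, 9, 12};
        System.out.println(Arrays.toString(intersect(fieldOneMatches, fieldTwoMatches)));
        // prints [3, 9]
    }
}
```

With a single unique-ID term there is only one list to walk, which is why the lookup you describe is cheap. With several terms across several fields, this merge step has to happen somewhere, and that is the part I am worried about.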
/Jong

On Mon, Apr 23, 2012 at 11:58 AM, Earl Hood <[email protected]> wrote:

> On Mon, Apr 23, 2012 at 10:31 AM, Jong Kim wrote:
>
> > Is there any good way to solve this design problem? Obviously, an
> > alternative design would be to split the index into two, and maintain
> > static (and large) data in one index and the other dynamic part in the
> > other index. However, this approach is not acceptable due to our data
> > pattern where the match on the first index yields very large result set,
> > and filtering them against the second index is very inefficient due to
> > high ratio of disjoint data. In other words, while the alternate approach
> > significantly reduces the indexing-time overhead, resulting search is
> > unacceptably expensive.
>
> Have you tested to verify it is expensive? If the meta document is
> identified with a unique ID (that can be stored with the main document
> so you know which meta document to retrieve), accessing the meta
> document should be fairly efficient.
>
> In the project I'm on (we are using Lucene 3.0.3), we just use
> IndexReader.termDocs() to retrieve a document based on a unique ID we
> store in one of the document's fields.
>
> --ewh
