I'm looking through the trunk code on various implementations of 
Filter.getDocIdSet(IndexReader).  It is often needed to get an instance of 
Terms and then do other work from there.  Looking at 
MultiTermQueryWrapperFilter, the first set of lines to do this is:

  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
    final Fields fields = MultiFields.getFields(reader);
    if (fields == null) {
      // reader has no fields
      return DocIdSet.EMPTY_DOCIDSET;
    }

    final Terms terms = fields.terms(query.field);
    if (terms == null) {
      // field does not exist
      return DocIdSet.EMPTY_DOCIDSET;
    }
....

When I look at the javadoc for MultiFields.getFields(reader), I see some 
Javadoc (apparently written by Michael McCandless, CC'ed), with the following 
javadoc snippet :  
   *  <p><b>NOTE</b>: this is a slow way to access postings.
   *  It's better to get the sub-readers (using {...@link
   *  Gather}) and iterate through them
   *  yourself.

If this is the case, then why is MultiFields.getFields(reader) used 43 times 
across Lucene/Solr whereas ReaderUtil.Gather is only used 5 times?  If it's a 
TODO then perhaps a JIRA issue needs to be created.  I don't find helpful 
examples of how to use ReaderUtil.Gather... the existing 5 uses are all within 
MultiFields & ReaderUtil.

FWIW, in a Lucene Filter I wrote, I've been using this code snippet 
successfully:

        Terms terms = reader.fields().terms(fieldName);

On a related topic, I think that if Filter.getDocIdSet() is documented that it 
may return null, then it's better code design to consequently return null in 
appropriate circumstances instead of DocIdSet.EMPTY_DOCIDSET.  That said, FWIW, 
I prefer API design that favors non-null when you can get away with it, like 
this case.  So I'm in favor of making getDocIdSet() be documented to not return 
null (and follow through throughout the codebase).  Admittedly some callers 
might have short-circuit logic.

~ David Smiley
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to