Re: Use of MultiFields.getFields() bad practice?

Michael McCandless Wed, 05 Jan 2011 09:24:30 -0800

I just cleaned up a few more unnecessary ones.

Really, we need to break out atomic vs composite IndexReaders.
Effectively, we already have, it's just that it's dynamically typed
(you hit exc's at runtime) not statically typed.  I'd like to make it
statically typed so it's clear which APIs take what readers.  EG
Filter.getDocIDSet always takes an atomic reader.


I do agree we should be returning null not EMPTY_DOCIDSET since
Filter.getDocIDSet's jdoc clearly states a null return means no docs
are accepted.

And I think I'm on the opposite side of the fence on null returns, at
least for advanced APIs -- I'd prefer not to hide information (by
returning an empty instance) in this case.  But other cases I think
should never return null -- eg once you have a non-null DocIdSet, then
its .iterator() should never return null.

Mike

On Tue, Jan 4, 2011 at 5:38 PM, Smiley, David W. <dsmi...@mitre.org> wrote:
> I'm looking through the trunk code on various implementations of 
> Filter.getDocIdSet(IndexReader).  It is often needed to get an instance of 
> Terms and then do other work from there.  Looking at 
> MultiTermQueryWrapperFilter, the first set of lines to do this is:
>
>  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
>    final Fields fields = MultiFields.getFields(reader);
>    if (fields == null) {
>      // reader has no fields
>      return DocIdSet.EMPTY_DOCIDSET;
>    }
>
>    final Terms terms = fields.terms(query.field);
>    if (terms == null) {
>      // field does not exist
>      return DocIdSet.EMPTY_DOCIDSET;
>    }
> ....
>
> When I look at the javadoc for MultiFields.getFields(reader), I see some 
> Javadoc (apparently written by Michael McCandless, CC'ed), with the following 
> javadoc snippet :
>   *  <p><b>NOTE</b>: this is a slow way to access postings.
>   *  It's better to get the sub-readers (using {...@link
>   *  Gather}) and iterate through them
>   *  yourself.
>
> If this is the case, then why is MultiFields.getFields(reader) used 43 times 
> across Lucene/Solr whereas ReaderUtil.Gather is only used 5 times?  If it's a 
> TODO then perhaps a JIRA issue needs to be created.  I don't find helpful 
> examples of how to use ReaderUtil.Gather... the existing 5 uses are all 
> within MultiFields & ReaderUtil.
>
> FWIW, in a Lucene Filter I wrote, I've been using this code snippet 
> successfully:
>
>        Terms terms = reader.fields().terms(fieldName);
>
> On a related topic, I think that if Filter.getDocIdSet() is documented that 
> it may return null, then it's better code design to consequently return null 
> in appropriate circumstances instead of DocIdSet.EMPTY_DOCIDSET.  That said, 
> FWIW, I prefer API design that favors non-null when you can get away with it, 
> like this case.  So I'm in favor of making getDocIdSet() be documented to not 
> return null (and follow through throughout the codebase).  Admittedly some 
> callers might have short-circuit logic.
>
> ~ David Smiley

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Use of MultiFields.getFields() bad practice?

Reply via email to