RE: Use of MultiFields.getFields() bad practice?

Uwe Schindler Tue, 04 Jan 2011 14:49:53 -0800

Hi David,

As all Filters and Scorers now work solely on segments, MultiFields is no
longer needed for those cases. Using MultiFields there is not a problem,
because every index reader passed into these parts of Lucene is atomic, so
MultiFields is a no-op. As far as I know there is already an issue open to
remove it (see also https://issues.apache.org/jira/browse/LUCENE-2771 for
norms), but this is not important at the moment. In all those places you
can simply replace it with the direct call to IndexReader; this has just
not been done yet.
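
To illustrate why the wrapper is a no-op on an atomic reader, here is a
minimal, self-contained sketch. It does not use the real Lucene classes:
Terms, Fields, Reader, MapFields, SegmentReaderSketch and both helper
methods are hypothetical stand-ins invented for this example, and the
pass-through behaviour is an assumption based on the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class AtomicReaderFieldsSketch {
    // Hypothetical stand-ins, NOT the real Lucene types.
    interface Terms {}

    interface Fields {
        Terms terms(String field); // null if the field does not exist
    }

    interface Reader {
        Fields fields();       // per-segment view of the fields
        Reader[] subReaders(); // null for an atomic (single-segment) reader
    }

    // Simple in-memory Fields implementation used only for the demo.
    static final class MapFields implements Fields {
        private final Map<String, Terms> byField = new HashMap<>();

        MapFields with(String field) {
            byField.put(field, new Terms() {});
            return this;
        }

        @Override
        public Terms terms(String field) {
            return byField.get(field);
        }
    }

    // An atomic reader: it has fields of its own and no sub-readers.
    static final class SegmentReaderSketch implements Reader {
        private final Fields fields;

        SegmentReaderSketch(Fields fields) {
            this.fields = fields;
        }

        @Override
        public Fields fields() {
            return fields;
        }

        @Override
        public Reader[] subReaders() {
            return null;
        }
    }

    // Assumed wrapper behaviour: an atomic reader is passed straight through.
    static Fields getFieldsViaWrapper(Reader reader) {
        if (reader.subReaders() == null) {
            return reader.fields();
        }
        throw new UnsupportedOperationException(
            "merging sub-readers is out of scope for this sketch");
    }

    // The direct call, with the same null checks as the quoted filter code.
    static Terms termsDirect(Reader reader, String field) {
        Fields fields = reader.fields();
        return fields == null ? null : fields.terms(field);
    }

    public static void main(String[] args) {
        Reader segment = new SegmentReaderSketch(new MapFields().with("title"));
        // Both routes hand back the very same Fields instance.
        System.out.println(getFieldsViaWrapper(segment) == segment.fields());
        System.out.println(termsDirect(segment, "title") != null);
        System.out.println(termsDirect(segment, "body") == null);
    }
}
```

In this model both routes return the same Fields object for a segment
reader, so going through the wrapper adds nothing there.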


We currently have MultiFields in there for backwards compatibility (to
support executing a filter on a composite reader). But as trunk no longer
needs backwards compatibility, it is no longer needed. It does not hurt;
it is currently just inconsistent.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Smiley, David W. [mailto:[email protected]]
> Sent: Tuesday, January 04, 2011 11:38 PM
> To: [email protected] Dev
> Cc: Michael McCandless
> Subject: Use of MultiFields.getFields() bad practice?
> 
> I'm looking through the trunk code on various implementations of
> Filter.getDocIdSet(IndexReader).  It is often needed to get an instance of
> Terms and then do other work from there.  Looking at
> MultiTermQueryWrapperFilter, the first set of lines to do this is:
> 
>   public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
>     final Fields fields = MultiFields.getFields(reader);
>     if (fields == null) {
>       // reader has no fields
>       return DocIdSet.EMPTY_DOCIDSET;
>     }
> 
>     final Terms terms = fields.terms(query.field);
>     if (terms == null) {
>       // field does not exist
>       return DocIdSet.EMPTY_DOCIDSET;
>     }
> ....
> 
> When I look at the javadoc for MultiFields.getFields(reader), I see some
> Javadoc (apparently written by Michael McCandless, CC'ed), with the
> following javadoc snippet :
>    *  <p><b>NOTE</b>: this is a slow way to access postings.
>    *  It's better to get the sub-readers (using {@link
>    *  Gather}) and iterate through them
>    *  yourself.
> 
> If this is the case, then why is MultiFields.getFields(reader) used 43 times
> across Lucene/Solr whereas ReaderUtil.Gather is only used 5 times?  If it's a
> TODO then perhaps a JIRA issue needs to be created.  I don't find helpful
> examples of how to use ReaderUtil.Gather... the existing 5 uses are all within
> MultiFields & ReaderUtil.
> 
> FWIW, in a Lucene Filter I wrote, I've been using this code snippet
> successfully:
> 
>       Terms terms = reader.fields().terms(fieldName);
> 
> On a related topic, I think that if Filter.getDocIdSet() is documented that it
> may return null, then it's better code design to consequently return null in
> appropriate circumstances instead of DocIdSet.EMPTY_DOCIDSET.  That said,
> FWIW, I prefer API design that favors non-null when you can get away with
> it, like this case.  So I'm in favor of making getDocIdSet() be documented to
> not return null (and follow through throughout the codebase).  Admittedly
> some callers might have short-circuit logic.
> 
> ~ David Smiley
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected] For additional
> commands, e-mail: [email protected]


