How do these two go together?

> I think for DOCS_ONLY it makes sense that we lie (say freq=1 when we
> don't know): lots of places would otherwise have to be special cased
> for when they consume DOCS_ONLY vs DOCS_AND_POSITIONS.


and

> I'm also not sure that
> all codecs return 1 today if the field was indexed with DOCS_ONLY ...


That just makes it even worse, right? I.e., we have code today that relies
on that behavior, but we're not sure it works w/ all Codecs?

Remember that DocIdSetIterator.nextDoc() was loosely specified? It was very
hard to write a decent DISI consumer. Sometimes calling nextDoc() returned
MAX_VAL, sometimes -1, sometimes who knows. When we hardened the spec, it
actually made consumers' life easier, I think?
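
To make the DISI point concrete, here is a minimal sketch of what a consumer
loop looks like once nextDoc() has a single, well-defined end-of-iteration
value. ArrayDisi and collect are hypothetical stand-ins written for this
example, not Lucene classes; the only detail taken from Lucene is that
NO_MORE_DOCS is Integer.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DocIdSetIterator; not the Lucene class itself.
class DisiContractSketch {
    // Same sentinel value that Lucene uses for DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Iterates a sorted array of doc IDs under the hardened contract. */
    static final class ArrayDisi {
        private final int[] docs;
        private int idx = -1;

        ArrayDisi(int... docs) {
            this.docs = docs;
        }

        /** Returns the next doc ID, or NO_MORE_DOCS once exhausted (and on every later call). */
        int nextDoc() {
            if (idx < docs.length) {
                idx++;
            }
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
    }

    /** A consumer needs exactly one termination check, with no guessing about -1 or other values. */
    static List<Integer> collect(ArrayDisi it) {
        List<Integer> out = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(new ArrayDisi(2, 5, 9))); // prints [2, 5, 9]
    }
}
```

With a loose spec, the loop condition above would need to guard against
several possible "end" values; with a strict one, a single comparison is
enough.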

It's ok if we say that for DOCS_ONLY you have to return 1. That is even the
correct value 99.9% of the time (unless someone adds e.g. the same
StringField twice to the document).

And it's also ok to say that if you passed FLAG_NONE, freq()'s value is
unspecified. I think it would be wrong to "lie" here .. not sure if the
consumer always knows how DocsEnum was requested. Not sure if this happens
in real life though (consuming a DocsEnum that you didn't obtain yourself),
so I'm willing to ignore that case.

These two together sound like a reasonable "spec" to me?
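
As a rough illustration of the two rules, here is a small self-contained
sketch. All of the types in it (IndexMode, Posting, and the freq, totalFreq
and countDocs helpers) are hypothetical and only model the proposed contract;
they are not Lucene classes, and real codecs may well behave differently
today.

```java
// Hypothetical model of the proposed contract; none of these types are Lucene classes.
class FreqSpecSketch {
    enum IndexMode { DOCS_ONLY, DOCS_AND_FREQS }

    /** One posting: a doc ID plus how many times the term actually occurred in that doc. */
    static final class Posting {
        final int doc;
        final int occurrences;

        Posting(int doc, int occurrences) {
            this.doc = doc;
            this.occurrences = occurrences;
        }
    }

    /** Rule 1: a field indexed as DOCS_ONLY has no stored count, so freq is defined to be 1. */
    static int freq(IndexMode mode, Posting p) {
        return mode == IndexMode.DOCS_ONLY ? 1 : p.occurrences;
    }

    /** A consumer that uses freq and needs no special case for DOCS_ONLY. */
    static long totalFreq(IndexMode mode, Posting[] postings) {
        long sum = 0;
        for (Posting p : postings) {
            sum += freq(mode, p);
        }
        return sum;
    }

    /** Rule 2: a consumer that asked for no freqs (FLAG_NONE) simply never reads them. */
    static int countDocs(Posting[] postings) {
        return postings.length;
    }

    public static void main(String[] args) {
        // Doc 3 contains the term twice, e.g. the same field value added twice.
        Posting[] postings = { new Posting(0, 1), new Posting(3, 2), new Posting(7, 1) };
        System.out.println(totalFreq(IndexMode.DOCS_AND_FREQS, postings)); // prints 4
        System.out.println(totalFreq(IndexMode.DOCS_ONLY, postings));      // prints 3
        System.out.println(countDocs(postings));                           // prints 3
    }
}
```

The doc with two occurrences is the rare case where returning 1 for DOCS_ONLY
is not the true count, which is the only place the two modes diverge in this
model.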

Shai


On Mon, Dec 17, 2012 at 7:16 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> I think the FLAG_NONE ("I don't need/want freqs when reading the
> index") and the DOCS_ONLY ("Do not index freqs") are two different
> cases?
>
> I think for DOCS_ONLY it makes sense that we lie (say freq=1 when we
> don't know): lots of places would otherwise have to be special cased
> for when they consume DOCS_ONLY vs DOCS_AND_POSITIONS.
>
> But, for FLAG_NONE, when the caller passes this it means they have no
> intention of using/calling freq() right?  Eg
> MultiTermQueryWrapperFilter would pass this.  For that case I'm not
> sure we should promise / require that codecs return 1 always?  EG what
> if the index does have freqs?  I think in that case the codec shouldn't
> be required to go out of its way and return 1?  I'm also not sure that
> all codecs return 1 today if the field was indexed with DOCS_ONLY ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Dec 17, 2012 at 11:24 AM, Shai Erera <ser...@gmail.com> wrote:
> > Hi
> >
> > While migrating code to Lucene 4.0, I noticed that I have an assert on a
> > field that is indexed with DOCS_ONLY that DocsEnum.freq() == 1. This got me
> > thinking ... why?
> >
> > If you index w/ DOCS_ONLY, or ask for DocsEnum with FLAG_NONE, why do we
> > "lie" to the consumer? Rather, we could just return 0 or -1?
> >
> > I personally don't mind if we continue to return 1, if there's a real reason
> > to. I don't think that anyone should call freq() if he asked for DocsEnum
> > with FLAG_NONE. But if we do keep the current behavior, can we at least
> > document it?
> >
> > E.g., something like this patch:
> >
> > Index: lucene/core/src/java/org/apache/lucene/index/DocsEnum.java
> > ===================================================================
> > --- lucene/core/src/java/org/apache/lucene/index/DocsEnum.java  (revision 1422804)
> > +++ lucene/core/src/java/org/apache/lucene/index/DocsEnum.java  (working copy)
> > @@ -47,10 +47,16 @@
> >    protected DocsEnum() {
> >    }
> >
> > -  /** Returns term frequency in the current document.  Do
> > -   *  not call this before {@link #nextDoc} is first called,
> > -   *  nor after {@link #nextDoc} returns NO_MORE_DOCS.
> > -   **/
> > +  /**
> > +   * Returns term frequency in the current document, or 1 if the
> > +   * {@link DocsEnum} was obtained with {@link #FLAG_NONE}. Do not call this
> > +   * before {@link #nextDoc} is first called, nor after {@link #nextDoc} returns
> > +   * {@link DocIdSetIterator#NO_MORE_DOCS}.
> > +   *
> > +   * <p>
> > +   * <b>NOTE:</b> if the {@link DocsEnum} was obtained with {@link #FLAG_NONE},
> > +   * this method returns 1.
> > +   */
> >    public abstract int freq() throws IOException;
> >
> >    /** Returns the related attributes. */
> >
> > Shai