I think the FLAG_NONE ("I don't need/want freqs when reading the
index") and the DOCS_ONLY ("Do not index freqs") are two different
cases?

I think for DOCS_ONLY it makes sense that we lie (say freq=1 when we
don't know): lots of places would otherwise have to be special cased
for when they consume DOCS_ONLY vs DOCS_AND_POSITIONS.

But for FLAG_NONE, when the caller passes this it means they have no
intention of calling freq(), right?  E.g.,
MultiTermQueryWrapperFilter would pass this.  For that case I'm not
sure we should promise / require that codecs always return 1.  E.g., what
if the index does have freqs?  I think in that case the codec shouldn't
be required to go out of its way to return 1.  I'm also not sure that
all codecs return 1 today if the field was indexed with DOCS_ONLY ...
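
To make the two cases concrete, here is a minimal sketch in plain Java.
Everything in it is a hypothetical stand-in, not the Lucene classes:
ToyDocsEnum, sumFreqs and collectDocs are made up for illustration, and
FLAG_FREQS and DOCS_AND_FREQS are assumed names for "caller wants freqs" and
"field was indexed with freqs". It only models the contract being discussed:
a DOCS_ONLY field reports freq=1, while a FLAG_NONE consumer never calls
freq() at all, so the value there is left unspecified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the contract discussed above; NOT Lucene code.
class FreqContractSketch {

    // How the field was indexed (DOCS_AND_FREQS is an assumed name).
    enum IndexOptions { DOCS_ONLY, DOCS_AND_FREQS }

    static final int FLAG_NONE = 0;   // caller does not intend to call freq()
    static final int FLAG_FREQS = 1;  // assumed name: caller wants real freqs
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Toy postings enum backed by parallel docs/freqs arrays. */
    static final class ToyDocsEnum {
        private final IndexOptions options;
        private final int flags;   // kept only to mirror the API shape
        private final int[] docs;
        private final int[] freqs; // may be null for DOCS_ONLY
        private int pos = -1;

        ToyDocsEnum(IndexOptions options, int flags, int[] docs, int[] freqs) {
            this.options = options;
            this.flags = flags;
            this.docs = docs;
            this.freqs = freqs;
        }

        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int freq() {
            if (options == IndexOptions.DOCS_ONLY) {
                // Freqs were never indexed: report 1 so consumers need no special case.
                return 1;
            }
            // Freqs exist. This toy codec does not consult flags and simply
            // returns what it has; under FLAG_NONE the value is unspecified.
            return freqs[pos];
        }
    }

    /** Scorer-like consumer: works unchanged on both kinds of field. */
    static int sumFreqs(ToyDocsEnum e) {
        int total = 0;
        while (e.nextDoc() != NO_MORE_DOCS) {
            total += e.freq();
        }
        return total;
    }

    /** Filter-like consumer: would pass FLAG_NONE and never calls freq(). */
    static List<Integer> collectDocs(ToyDocsEnum e) {
        List<Integer> hits = new ArrayList<>();
        int doc;
        while ((doc = e.nextDoc()) != NO_MORE_DOCS) {
            hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 7};
        int[] freqs = {2, 5, 1};
        System.out.println(sumFreqs(
            new ToyDocsEnum(IndexOptions.DOCS_ONLY, FLAG_FREQS, docs, null)));
        System.out.println(sumFreqs(
            new ToyDocsEnum(IndexOptions.DOCS_AND_FREQS, FLAG_FREQS, docs, freqs)));
        System.out.println(collectDocs(
            new ToyDocsEnum(IndexOptions.DOCS_AND_FREQS, FLAG_NONE, docs, freqs)));
    }
}
```

The point of the sketch is that sumFreqs needs no branch on the index
options, which is the argument for returning 1 on DOCS_ONLY, while
collectDocs shows why nothing needs to be promised for FLAG_NONE.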

Mike McCandless

http://blog.mikemccandless.com

On Mon, Dec 17, 2012 at 11:24 AM, Shai Erera <ser...@gmail.com> wrote:
> Hi
>
> While migrating code to Lucene 4.0, I noticed that I have an assert on a
> field that is indexed with DOCS_ONLY that DocsEnum.freq() == 1. This got me
> thinking ... why?
>
> If you index w/ DOCS_ONLY, or ask for DocsEnum with FLAG_NONE, why do we
> "lie" to the consumer? Rather, we could just return 0 or -1?
>
> I personally don't mind if we continue to return 1, if there's a real reason
> to. I don't think that anyone should call freq() if he asked for DocsEnum
> with FLAG_NONE. But if we do keep the current behavior, can we at least
> document it?
>
> E.g., something like this patch:
>
> Index: lucene/core/src/java/org/apache/lucene/index/DocsEnum.java
> ===================================================================
> --- lucene/core/src/java/org/apache/lucene/index/DocsEnum.java  (revision 1422804)
> +++ lucene/core/src/java/org/apache/lucene/index/DocsEnum.java  (working copy)
> @@ -47,10 +47,16 @@
>    protected DocsEnum() {
>    }
>
> -  /** Returns term frequency in the current document.  Do
> -   *  not call this before {@link #nextDoc} is first called,
> -   *  nor after {@link #nextDoc} returns NO_MORE_DOCS.
> -   **/
> +  /**
> +   * Returns term frequency in the current document, or 1 if the
> +   * {@link DocsEnum} was obtained with {@link #FLAG_NONE}. Do not call this
> +   * before {@link #nextDoc} is first called, nor after {@link #nextDoc} returns
> +   * {@link DocIdSetIterator#NO_MORE_DOCS}.
> +   *
> +   * <p>
> +   * <b>NOTE:</b> if the {@link DocsEnum} was obtained with {@link #FLAG_NONE},
> +   * this method returns 1.
> +   */
>    public abstract int freq() throws IOException;
>
>    /** Returns the related attributes. */
>
> Shai

