How do these two go together?

> I think for DOCS_ONLY it makes sense that we lie (say freq=1 when we
> don't know): lots of places would otherwise have to be special cased
> for when they consume DOCS_ONLY vs DOCS_AND_POSITIONS.

and

> I'm also not sure that
> all codecs return 1 today if the field was indexed with DOCS_ONLY ...

That just makes it even worse, right? I.e., we have code today that relies
on that behavior, but we're not sure it works w/ all Codecs?

Remember when DocIdSetIterator.nextDoc() was loosely specified? It was very
hard to write a decent DISI consumer. Sometimes calling nextDoc() returned
MAX_VAL, sometimes -1, sometimes who knows. When we hardened the spec, it
actually made consumers' lives easier, I think?

It's ok if we say that for DOCS_ONLY you have to return 1. That's even
99.9% of the time the correct value to return (unless someone adds e.g. the
same StringField twice to the document).

And it's also ok to say that if you passed FLAG_NONE, freq()'s value is
unspecified. I think it would be wrong to "lie" here ... not sure if the
consumer always knows how the DocsEnum was requested. Not sure if this
happens in real life though (consuming a DocsEnum that you didn't obtain
yourself), so I'm willing to ignore that case.

These two together sound like a reasonable "spec" to me?

Shai

On Mon, Dec 17, 2012 at 7:16 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> I think the FLAG_NONE ("I don't need/want freqs when reading the
> index") and the DOCS_ONLY ("Do not index freqs") are two different
> cases?
>
> I think for DOCS_ONLY it makes sense that we lie (say freq=1 when we
> don't know): lots of places would otherwise have to be special cased
> for when they consume DOCS_ONLY vs DOCS_AND_POSITIONS.
>
> But, for FLAG_NONE, when the caller passes this it means they have no
> intention of using/calling freq() right? E.g.
> MultiTermQueryWrapperFilter would pass this. For that case I'm not
> sure we should promise / require that codecs return 1 always? E.g. what
> if the index does have freqs? I think in that case the codec shouldn't
> be required to go out of its way and return 1? I'm also not sure that
> all codecs return 1 today if the field was indexed with DOCS_ONLY ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Dec 17, 2012 at 11:24 AM, Shai Erera <ser...@gmail.com> wrote:
> > Hi
> >
> > While migrating code to Lucene 4.0, I noticed that I have an assert on a
> > field that is indexed with DOCS_ONLY that DocsEnum.freq() == 1. This got me
> > thinking ... why?
> >
> > If you index w/ DOCS_ONLY, or ask for DocsEnum with FLAG_NONE, why do we
> > "lie" to the consumer? Rather, we could just return 0 or -1?
> >
> > I personally don't mind if we continue to return 1, if there's a real reason
> > to. I don't think that anyone should call freq() if he asked for DocsEnum
> > with FLAG_NONE. But if we do keep the current behavior, can we at least
> > document it?
> >
> > E.g., something like this patch:
> >
> > Index: lucene/core/src/java/org/apache/lucene/index/DocsEnum.java
> > ===================================================================
> > --- lucene/core/src/java/org/apache/lucene/index/DocsEnum.java (revision 1422804)
> > +++ lucene/core/src/java/org/apache/lucene/index/DocsEnum.java (working copy)
> > @@ -47,10 +47,16 @@
> >    protected DocsEnum() {
> >    }
> >
> > -  /** Returns term frequency in the current document. Do
> > -   * not call this before {@link #nextDoc} is first called,
> > -   * nor after {@link #nextDoc} returns NO_MORE_DOCS.
> > -   **/
> > +  /**
> > +   * Returns term frequency in the current document, or 1 if the
> > +   * {@link DocsEnum} was obtained with {@link #FLAG_NONE}. Do not call this
> > +   * before {@link #nextDoc} is first called, nor after {@link #nextDoc} returns
> > +   * {@link DocIdSetIterator#NO_MORE_DOCS}.
> > +   *
> > +   * <p>
> > +   * <b>NOTE:</b> if the {@link DocsEnum} was obtained with {@link #FLAG_NONE},
> > +   * this method returns 1.
> > +   */
> >    public abstract int freq() throws IOException;
> >
> >    /** Returns the related attributes. */
> >
> > Shai
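
The two rules proposed at the top of this thread (DOCS_ONLY must report 1; FLAG_NONE leaves freq() unspecified) can be read as a small decision table. The Java sketch below is a hypothetical model only, not Lucene code: the Indexed and Requested enums, the guaranteedFreq method, and the choice to let FLAG_NONE take precedence when both rules apply are assumptions made for illustration. The thread itself does not settle that last point.

```java
import java.util.OptionalInt;

/** Hypothetical model of the freq() contract discussed in this thread; not Lucene code. */
final class FreqContractSketch {

  /** How the field was indexed (assumed two-way split for illustration). */
  enum Indexed { DOCS_ONLY, WITH_FREQS }

  /** What the caller asked for when it requested the enum (assumed names). */
  enum Requested { FLAG_NONE, FLAG_FREQS }

  /**
   * Returns the value a consumer could rely on from freq() under the proposed
   * rules, or an empty OptionalInt when the value would be unspecified.
   *
   * Assumption: the FLAG_NONE rule is checked first, so a DOCS_ONLY field read
   * with FLAG_NONE is treated as unspecified.
   */
  static OptionalInt guaranteedFreq(Indexed indexed, Requested requested, int storedFreq) {
    if (requested == Requested.FLAG_NONE) {
      // Caller declared it will not use freq(): nothing is promised.
      return OptionalInt.empty();
    }
    if (indexed == Indexed.DOCS_ONLY) {
      // No freqs were indexed: the codec must report 1.
      return OptionalInt.of(1);
    }
    // Freqs were indexed and requested: report the stored value.
    return OptionalInt.of(storedFreq);
  }

  public static void main(String[] args) {
    // Walk every combination, using 3 as an arbitrary stored frequency.
    for (Indexed indexed : Indexed.values()) {
      for (Requested requested : Requested.values()) {
        System.out.println(indexed + " / " + requested + " -> "
            + guaranteedFreq(indexed, requested, 3));
      }
    }
  }
}
```

Under this reading, an assert such as freq() == 1 on a DOCS_ONLY field is only safe when the enum was requested with freqs; with FLAG_NONE the consumer should not call freq() at all.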