RE: Iterating TermsEnum for Long field produces zero values at the end

Uwe Schindler Mon, 17 Nov 2014 10:52:48 -0800

Hi,

> It is expected: those are the "prefix" terms, which come after all the full-
> precision numeric terms.
> 
> But I'm not sure why you see 0s ... the bytes should be unique for every term
> you get back from the TermsEnum.


That's easy to explain:

The lower-precision terms at the end each match more than one document, but 
your code only reads the first entry of each DocsEnum (Lucene docid 0) and never 
iterates over the others. A prefix-coded term with a shift value > 0 has its 
low-order bits stripped from the right, so small long values decode to 0L.
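
A simplified, self-contained model reproduces the output from the question. This 
is hypothetical illustration code, not Lucene's API. It only mimics the value 
arithmetic of NumericUtils (sign-bit flip plus right shift), not the byte 
encoding of the terms. It assumes precisionStep=16 (shifts 0, 16, 32 and 48, 
which gives exactly three lower-precision terms) and ids 1..5 stored in docs 0..4:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the terms a long numeric field produces (assumes precisionStep = 16).
public class PrefixTermsModel {

    // Flip the sign bit (so values sort correctly as unsigned), then drop the low 'shift' bits.
    static long encode(long value, int shift) {
        return (value ^ 0x8000000000000000L) >>> shift;
    }

    // Shift the prefix back (the dropped bits come back as zeros) and undo the sign flip.
    static long decode(long prefix, int shift) {
        return (prefix << shift) ^ 0x8000000000000000L;
    }

    // Emulates the loop in the question: one "firstDoc decodedValue" line per term.
    static List<String> enumerate(long[] idsByDoc, int precisionStep) {
        List<String> out = new ArrayList<>();
        // Terms sort by shift first (shift 0 = full precision comes first), then by prefix.
        for (int shift = 0; shift < 64; shift += precisionStep) {
            Map<Long, List<Integer>> postings = new TreeMap<>(Long::compareUnsigned);
            for (int doc = 0; doc < idsByDoc.length; doc++) {
                postings.computeIfAbsent(encode(idsByDoc[doc], shift), k -> new ArrayList<>()).add(doc);
            }
            for (Map.Entry<Long, List<Integer>> term : postings.entrySet()) {
                // Like calling docsEnum.nextDoc() once: only the first doc of each term is read.
                int firstDoc = term.getValue().get(0);
                out.add(firstDoc + " " + decode(term.getKey(), shift));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] idsByDoc = {1, 2, 3, 4, 5};
        enumerate(idsByDoc, 16).forEach(System.out::println);
    }
}
```

For ids 1..5 every shift > 0 collapses all five values into one prefix term, 
whose first doc is 0 and whose decoded value is 0. In the original loop each of 
those terms also executes ourIds[0] = 0, overwriting the real id of doc 0. A 
term-based workaround would have to skip terms whose shift is non-zero (Lucene 
4.x exposes it via NumericUtils.getPrefixCodedLongShift(BytesRef)) and walk every 
doc of each term, but DocValues avoid depending on this encoding at all.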

In general, to build this kind of cache I would not use terms at all, but 
numeric DocValues instead. An alternative is FieldCache, which does the right 
thing automatically. Relying on the internal encoding of numeric terms is not 
a good idea.

Uwe

> On Mon, Nov 17, 2014 at 10:39 AM, Barry Coughlan
> <b.coughl...@gmail.com> wrote:
> > Hi all,
> >
> > I'm using 4.10.2. I have a Long "id" field. Each document has one "id"
> > value. I am creating a look-up between Lucene's internal document id
> > and my "id" values by enumerating the inverted index:
> >
> >     private long[] cacheDocIds() throws IOException {
> >         long[] ourIds = new long[reader.maxDoc()];
> >
> >         Bits liveDocs = MultiFields.getLiveDocs(reader);
> >         Fields fields = MultiFields.getFields(reader);
> >         Terms terms = fields.terms("id");
> >
> >         TermsEnum iterator = terms.iterator(null);
> >         BytesRef bytesRef = null;
> >         while ((bytesRef = iterator.next()) != null) {
> >             DocsEnum docsEnum = iterator.docs(liveDocs, null, DocsEnum.FLAG_NONE);
> >
> >             int luceneId = docsEnum.nextDoc();
> >             long ourId = NumericUtils.prefixCodedToLong(bytesRef);
> >             System.out.println(luceneId + " " + ourId);
> >             ourIds[luceneId] = ourId;
> >         }
> >
> >         return ourIds;
> >     }
> >
> > With 5 documents (1, 2, 3, 4, 5) I get this output from the above code:
> >
> > 0 1
> > 1 2
> > 2 3
> > 3 4
> > 4 5
> > 0 0
> > 0 0
> > 0 0
> >
> > I don't understand why there are three zeroes at the end.
> >
> > - reader.maxDoc is 5 and no documents have been deleted.
> > - I have tried this with a varying number of documents and there are
> > always three zeroes at the end.
> > - I tried changing version to Lucene 4.10.0 and Lucene 4.9 and the
> > same behavior occurs.
> >
> > I can work around this, but I'm just curious: is this behavior
> > expected?
> >
> > Regards,
> > Barry
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
