Makes sense, thanks. I switched the implementation to a FieldCache with no noticeable performance difference:
private Longs cacheDocIds() throws IOException {
    AtomicReader wrapped = SlowCompositeReaderWrapper.wrap(reader);
    Longs vals = FieldCache.DEFAULT.getLongs(wrapped, "id", false);
    return vals;
}

Regards,
Barry

On Mon, Nov 17, 2014 at 6:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> > It is expected: those are the "prefix" terms, which come after all the
> > full-precision numeric terms.
> >
> > But I'm not sure why you see 0s ... the bytes should be unique for every
> > term you get back from the TermsEnum.
>
> That's easy to explain:
>
> The lower precision terms at the end have more than one doc in the
> DocsEnum, you always return only the first (Lucene docid 0, you never list
> all other entries in DocsEnum). The prefixcoded term has a shift value > 0
> and because bits are stripped from the right, the small long values will
> therefore return 0L after decoding.
>
> In general to have such a type of cache, I would not use terms and instead
> use numeric docvalues. An alternative is to use FieldCache, which does the
> right thing automatically. Relying on the internal implementation of
> numeric terms is not a good idea.
>
> Uwe
>
> > On Mon, Nov 17, 2014 at 10:39 AM, Barry Coughlan
> > <b.coughl...@gmail.com> wrote:
> > > Hi all,
> > >
> > > I'm using 4.10.2. I have a Long "id" field. Each document has one "id"
> > > value. I am creating a look-up between Lucene's internal document id
> > > and my "id" values by enumerating the inverted index:
> > >
> > > private long[] cacheDocIds() throws IOException {
> > >     long[] ourIds = new long[reader.maxDoc()];
> > >
> > >     Bits liveDocs = MultiFields.getLiveDocs(reader);
> > >     Fields fields = MultiFields.getFields(reader);
> > >     Terms terms = fields.terms("id");
> > >
> > >     TermsEnum iterator = terms.iterator(null);
> > >     BytesRef bytesRef = null;
> > >     while ((bytesRef = iterator.next()) != null) {
> > >         DocsEnum docsEnum = iterator.docs(liveDocs, null,
> > >                 DocsEnum.FLAG_NONE);
> > >
> > >         int luceneId = docsEnum.nextDoc();
> > >         long ourId = NumericUtils.prefixCodedToLong(bytesRef);
> > >         System.out.println(luceneId + " " + ourId);
> > >         ourIds[luceneId] = ourId;
> > >     }
> > >
> > >     return ourIds;
> > > }
> > >
> > > With 5 documents (1, 2, 3, 4, 5) I get this output from the above code:
> > >
> > > 0 1
> > > 1 2
> > > 2 3
> > > 3 4
> > > 4 5
> > > 0 0
> > > 0 0
> > > 0 0
> > >
> > > I don't understand why there are three zeroes at the end.
> > >
> > > - reader.maxDoc is 5 and no documents have been deleted.
> > > - I have tried this with a varying number of documents and there are
> > >   always three zeroes at the end.
> > > - I tried changing version to Lucene 4.10.0 and Lucene 4.9 and the
> > >   same behavior occurs.
> > >
> > > I can work around this with but I'm just curious if this behavior is
> > > expected?
> > >
> > > Regards,
> > > Barry
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
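
P.S. For anyone who finds this thread later, here is a rough sketch of Uwe's explanation of why the extra terms decode to 0. It does not call Lucene at all; it is a simplified model of prefix coding in plain Java. The class name and the truncate helper are made up for illustration, and the precision step of 16 is an assumption (I believe it is the default for long fields in 4.x, but check your own field type).

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class PrefixTermSketch {
    // Assumption: precision step of 16 bits, so shifts are 0, 16, 32 and 48.
    static final int PRECISION_STEP = 16;

    // Simplified model of a prefix-coded long, NOT Lucene's actual code:
    // flip the sign bit to get a sortable value, strip `shift` bits from
    // the right, then undo both steps the way a decoder would.
    static long truncate(long value, int shift) {
        long sortable = value ^ 0x8000000000000000L;
        long stripped = (sortable >>> shift) << shift;
        return stripped ^ 0x8000000000000000L;
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 3, 4, 5};
        for (int shift = 0; shift < 64; shift += PRECISION_STEP) {
            // Collect the distinct decoded values at this precision level.
            Set<Long> distinct = new LinkedHashSet<>();
            for (long id : ids) {
                distinct.add(truncate(id, shift));
            }
            System.out.println("shift=" + shift + " -> " + distinct);
        }
    }
}
```

At shift 0 the five ids stay distinct; at shifts 16, 32 and 48 they all collapse to the single value 0. Under this model that gives three extra lower-precision terms, each shared by every document, which would match the three "0 0" lines when only the first doc of each term is read.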