It's better to use doc values than field cache, if you can.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Nov 17, 2014 at 2:55 PM, Barry Coughlan <b.coughl...@gmail.com> wrote:
> Makes sense, thanks. I switched the implementation to a FieldCache with no
> noticeable performance difference:
>
> private Longs cacheDocIds() throws IOException {
>     AtomicReader wrapped = SlowCompositeReaderWrapper.wrap(reader);
>     Longs vals = FieldCache.DEFAULT.getLongs(wrapped, "id", false);
>     return vals;
> }
>
> Regards,
> Barry
>
> On Mon, Nov 17, 2014 at 6:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Hi,
>>
>> > It is expected: those are the "prefix" terms, which come after all the
>> > full-precision numeric terms.
>> >
>> > But I'm not sure why you see 0s ... the bytes should be unique for
>> > every term you get back from the TermsEnum.
>>
>> That's easy to explain:
>>
>> The lower precision terms at the end have more than one doc in the
>> DocsEnum, you always return only the first (Lucene docid 0, you never list
>> all other entries in DocsEnum). The prefixcoded term has a shift value > 0
>> and because bits are stripped from the right, the small long values will
>> therefore return 0L after decoding.
>>
>> In general to have such a type of cache, I would not use terms and instead
>> use numeric docvalues. An alternative is to use FieldCache, which does the
>> right thing automatically. Relying on the internal implementation of
>> numeric terms is not a good idea.
>>
>> Uwe
>>
>> > On Mon, Nov 17, 2014 at 10:39 AM, Barry Coughlan
>> > <b.coughl...@gmail.com> wrote:
>> > > Hi all,
>> > >
>> > > I'm using 4.10.2. I have a Long "id" field. Each document has one "id"
>> > > value. I am creating a look-up between Lucene's internal document id
>> > > and my "id" values by enumerating the inverted index:
>> > >
>> > > private long[] cacheDocIds() throws IOException {
>> > >     long[] ourIds = new long[reader.maxDoc()];
>> > >
>> > >     Bits liveDocs = MultiFields.getLiveDocs(reader);
>> > >     Fields fields = MultiFields.getFields(reader);
>> > >     Terms terms = fields.terms("id");
>> > >
>> > >     TermsEnum iterator = terms.iterator(null);
>> > >     BytesRef bytesRef = null;
>> > >     while ((bytesRef = iterator.next()) != null) {
>> > >         DocsEnum docsEnum = iterator.docs(liveDocs, null,
>> > >                 DocsEnum.FLAG_NONE);
>> > >
>> > >         int luceneId = docsEnum.nextDoc();
>> > >         long ourId = NumericUtils.prefixCodedToLong(bytesRef);
>> > >         System.out.println(luceneId + " " + ourId);
>> > >         ourIds[luceneId] = ourId;
>> > >     }
>> > >
>> > >     return ourIds;
>> > > }
>> > >
>> > > With 5 documents (1, 2, 3, 4, 5) I get this output from the above code:
>> > >
>> > > 0 1
>> > > 1 2
>> > > 2 3
>> > > 3 4
>> > > 4 5
>> > > 0 0
>> > > 0 0
>> > > 0 0
>> > >
>> > > I don't understand why there are three zeroes at the end.
>> > >
>> > > - reader.maxDoc is 5 and no documents have been deleted.
>> > > - I have tried this with a varying number of documents and there are
>> > >   always three zeroes at the end.
>> > > - I tried changing version to Lucene 4.10.0 and Lucene 4.9 and the
>> > >   same behavior occurs.
>> > >
>> > > I can work around this, but I'm just curious if this behavior is
>> > > expected?
>> > >
>> > > Regards,
>> > > Barry

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
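
For anyone reading this thread in the archive: the explanation quoted above (bits stripped from the right, so small values decode to 0L) can be reproduced with plain bit arithmetic. The sketch below is not Lucene code. The class and method names are hypothetical, the byte-level term encoding is left out, and the precision step of 16 is an assumption, chosen because it yields exactly three lower-precision shifts for a 64-bit value.

```java
import java.util.Arrays;

/**
 * Plain-Java simulation of the bit arithmetic behind prefix-coded long terms.
 * NOT Lucene's implementation: names are hypothetical and the precision
 * step of 16 is an assumption.
 */
class PrefixShiftDemo {
    // Assumed precision step: a 64-bit value then has shifts 0, 16, 32 and 48.
    static final int PRECISION_STEP = 16;

    // Flip the sign bit so signed longs keep their order as raw bits, then
    // strip `shift` bits from the right, as a lower-precision term would.
    static long encode(long value, int shift) {
        return (value ^ Long.MIN_VALUE) >>> shift;
    }

    // Reverse of encode; the stripped bits come back as zeros.
    static long decode(long prefix, int shift) {
        return (prefix << shift) ^ Long.MIN_VALUE;
    }

    // Distinct "terms" for the given ids at one shift, decoded back to longs.
    static long[] decodedTerms(long[] ids, int shift) {
        return Arrays.stream(ids)
                .map(id -> encode(id, shift))
                .distinct()
                .sorted()
                .map(prefix -> decode(prefix, shift))
                .toArray();
    }

    public static void main(String[] args) {
        long[] ids = {1, 2, 3, 4, 5};
        for (int shift = 0; shift < 64; shift += PRECISION_STEP) {
            for (long decoded : decodedTerms(ids, shift)) {
                System.out.println("shift=" + shift + " decoded=" + decoded);
            }
        }
    }
}
```

Under these assumptions the five ids give five distinct terms at shift 0 and a single shared term at each of the three higher shifts, each decoding to 0. Every document matches those shared terms, so taking only the first entry of the DocsEnum always yields Lucene docid 0.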