Never mind, I got it: MultiDocValues.getNumericValues(final IndexReader r, final String field)
Barry

On Tue, Nov 18, 2014 at 12:05 PM, Barry Coughlan <b.coughl...@gmail.com> wrote:

> Hi Michael,
>
> Indexing:
>
>     private NumericDocValuesField idField =
>         new NumericDocValuesField("id", 0);
>
> Reading:
>
>     private NumericDocValues cacheDocIds() throws IOException {
>         AtomicReader wrapped = SlowCompositeReaderWrapper.wrap(reader);
>         return DocValues.getNumeric(wrapped, "id");
>     }
>
> I'm just putting this here for others because it's hard to find up-to-date
> examples of using DocValues.
>
> Two quick questions:
>
> 1. Do you suggest I use DocValues because they are intended to eventually
>    replace FieldCache?
> 2. Is it preferable to use reader.leaves() instead of
>    SlowCompositeReaderWrapper here and somehow merge the segments?
>
> Thanks for all your help.
>
> Barry
>
> On Mon, Nov 17, 2014 at 8:37 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>
>> It's better to use doc values than field cache, if you can.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Mon, Nov 17, 2014 at 2:55 PM, Barry Coughlan <b.coughl...@gmail.com>
>> wrote:
>> > Makes sense, thanks. I switched the implementation to a FieldCache with
>> > no noticeable performance difference:
>> >
>> >     private Longs cacheDocIds() throws IOException {
>> >         AtomicReader wrapped = SlowCompositeReaderWrapper.wrap(reader);
>> >         Longs vals = FieldCache.DEFAULT.getLongs(wrapped, "id", false);
>> >         return vals;
>> >     }
>> >
>> > Regards,
>> > Barry
>> >
>> > On Mon, Nov 17, 2014 at 6:50 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>> >
>> >> Hi,
>> >>
>> >> > It is expected: those are the "prefix" terms, which come after all the
>> >> > full-precision numeric terms.
>> >> >
>> >> > But I'm not sure why you see 0s ... the bytes should be unique for
>> >> > every term you get back from the TermsEnum.
>> >>
>> >> That's easy to explain:
>> >>
>> >> The lower-precision terms at the end have more than one doc in the
>> >> DocsEnum, but you always return only the first (Lucene docid 0; you never
>> >> list the other entries in the DocsEnum). The prefix-coded term has a
>> >> shift value > 0, and because bits are stripped from the right, the small
>> >> long values therefore return 0L after decoding.
>> >>
>> >> In general, for this type of cache I would not use terms and would
>> >> instead use numeric docvalues. An alternative is to use FieldCache, which
>> >> does the right thing automatically. Relying on the internal
>> >> implementation of numeric terms is not a good idea.
>> >>
>> >> Uwe
>> >>
>> >> > On Mon, Nov 17, 2014 at 10:39 AM, Barry Coughlan
>> >> > <b.coughl...@gmail.com> wrote:
>> >> > > Hi all,
>> >> > >
>> >> > > I'm using 4.10.2. I have a Long "id" field. Each document has one
>> >> > > "id" value. I am creating a look-up between Lucene's internal
>> >> > > document id and my "id" values by enumerating the inverted index:
>> >> > >
>> >> > >     private long[] cacheDocIds() throws IOException {
>> >> > >         long[] ourIds = new long[reader.maxDoc()];
>> >> > >
>> >> > >         Bits liveDocs = MultiFields.getLiveDocs(reader);
>> >> > >         Fields fields = MultiFields.getFields(reader);
>> >> > >         Terms terms = fields.terms("id");
>> >> > >
>> >> > >         TermsEnum iterator = terms.iterator(null);
>> >> > >         BytesRef bytesRef = null;
>> >> > >         while ((bytesRef = iterator.next()) != null) {
>> >> > >             DocsEnum docsEnum = iterator.docs(liveDocs, null,
>> >> > >                     DocsEnum.FLAG_NONE);
>> >> > >
>> >> > >             int luceneId = docsEnum.nextDoc();
>> >> > >             long ourId = NumericUtils.prefixCodedToLong(bytesRef);
>> >> > >             System.out.println(luceneId + " " + ourId);
>> >> > >             ourIds[luceneId] = ourId;
>> >> > >         }
>> >> > >
>> >> > >         return ourIds;
>> >> > >     }
>> >> > >
>> >> > > With 5 documents (1, 2, 3, 4, 5) I get this output from the above
>> >> > > code:
>> >> > >
>> >> > > 0 1
>> >> > > 1 2
>> >> > > 2 3
>> >> > > 3 4
>> >> > > 4 5
>> >> > > 0 0
>> >> > > 0 0
>> >> > > 0 0
>> >> > >
>> >> > > I don't understand why there are three zeroes at the end.
>> >> > >
>> >> > > - reader.maxDoc is 5 and no documents have been deleted.
>> >> > > - I have tried this with a varying number of documents and there are
>> >> > >   always three zeroes at the end.
>> >> > > - I tried changing version to Lucene 4.10.0 and Lucene 4.9 and the
>> >> > >   same behavior occurs.
>> >> > >
>> >> > > I can work around this, but I'm just curious whether this behavior
>> >> > > is expected.
>> >> > >
>> >> > > Regards,
>> >> > > Barry
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org
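
For anyone finding this thread later, Uwe's explanation of the three trailing zeroes can be reproduced with a small sketch in plain Java, with no Lucene dependency. This is not Lucene's implementation: the class PrefixTermSketch and its methods are hypothetical, the encoding only approximates what a prefix-coded long term stores, and the precision step of 16 is an assumption, chosen because 64 / 16 gives one full-precision term plus three lower-precision ones per value, which would be consistent with the three extra lines Barry saw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PrefixTermSketch {
    // Assumed precision step (not taken from the thread).
    static final int PRECISION_STEP = 16;

    // Approximates a prefix-coded term: flip the sign bit so values sort
    // correctly, then strip `shift` bits from the right.
    static long encode(long value, int shift) {
        return (value ^ Long.MIN_VALUE) >>> shift;
    }

    // Approximates decoding: the stripped low bits come back as zeroes.
    static long decode(long stored, int shift) {
        return (stored << shift) ^ Long.MIN_VALUE;
    }

    // Mimics the loop from the original post: for every distinct term,
    // report only the FIRST document that contains it.
    static List<String> enumerate(long[] idsByDoc) {
        List<String> lines = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += PRECISION_STEP) {
            // Terms of one shift level, in unsigned order, mapped to first doc.
            Map<Long, Integer> firstDocByTerm =
                    new TreeMap<Long, Integer>(Long::compareUnsigned);
            for (int doc = 0; doc < idsByDoc.length; doc++) {
                firstDocByTerm.putIfAbsent(encode(idsByDoc[doc], shift), doc);
            }
            for (Map.Entry<Long, Integer> e : firstDocByTerm.entrySet()) {
                lines.add(e.getValue() + " " + decode(e.getKey(), shift));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : enumerate(new long[] {1, 2, 3, 4, 5})) {
            System.out.println(line);
        }
    }
}
```

With ids 1 to 5 this prints the five full-precision lines ("0 1" through "4 5") followed by three "0 0" lines: at shifts 16, 32 and 48 all five small values collapse into a single term whose first document is 0 and whose decoded value is 0. That is why reading per-document values (doc values or FieldCache) is the safer route than walking the numeric terms.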