Re: DocValues memory usage

Peter Keegan Thu, 28 Mar 2013 05:43:50 -0700

This is weird. I indexed using DiskDocValuesFormat as the default codec and
observed 16K qps with BinaryDocValuesField. But with a simple StoredField,
I observed a much higher 30K qps. When I added both fields
(BinaryDocValuesField and StoredField) to the index, I observed only 100
qps on each field.


Peter

On Tue, Mar 26, 2013 at 12:30 PM, Michael McCandless <
[email protected]> wrote:

> DiskDocValuesFormat is the right thing to use: it loads certain things
> into RAM, eg the compressed bits that tell it the addresses of the
> bytes on disk, but then leaves the actual bytes on disk.
>
> I believe the old DirectSource was more extreme as it left the
> addresses on disk too, so there were 2 seeks to load a value.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Mar 26, 2013 at 11:55 AM, Duke <[email protected]> wrote:
> > I made the same experiment and got the same result. Then I used a per-field
> > codec with DiskDocValuesFormat; it works like DirectSource in 4.0.0, but
> > I'm not feeling confident with this usage. Can anyone say more about
> > removing the DirectSource API?
> >
> >
> >
> > On 2013-3-26, at 22:59, Peter Keegan <[email protected]> wrote:
> >
> >> Inspired by this presentation of DocValues:
> >>
> >> http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
> >> I decided to try them out in 4.2. I created a 1M document index with one
> >> DocValues field:
> >>
> >> BinaryDocValuesField conceptsDV =
> >>     new BinaryDocValuesField("concepts", new BytesRef(byteArray(4000)));
> >> d.add(conceptsDV);
> >> writer.addDocument(d);
> >>
> >> I searched the index and fetched the DocValues field:
> >>
> >> TopDocs docs = searcher.search(new TermQuery(new Term("guid", val)), 1);
> >> int docId = docs.scoreDocs[0].doc;
> >> BinaryDocValues conceptValues =
> >> MultiDocValues.getBinaryValues(r,"concepts");
> >> BytesRef result = new BytesRef();
> >> conceptValues.get(docId,result);
> >>
> >> However, the first call to MultiDocValues.getBinaryValues reads in the
> >> values for the entire index:
> >>
> >> Lucene42DocValuesProducer.loadBinary // loads DocValues for entire index
> >>
> >> My hope was to take advantage of faster disk access than stored fields and
> >> less RAM than FieldCache, but this is using too much memory. Are my
> >> assumptions and my usage correct?
> >>
> >> Thanks,
> >> Peter
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>

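The layout Mike describes for DiskDocValuesFormat (addresses held in RAM, value bytes left on disk, so one seek per lookup) can be illustrated with a toy sketch. This is not Lucene code and does not use any Lucene API: the class name DiskBackedValues and its methods are hypothetical, the address table here is a plain uncompressed long[] rather than the compressed structure Lucene uses, and it relies only on java.io from the JDK.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

/** Toy model only (hypothetical, not Lucene): value bytes on disk, start offsets in RAM. */
final class DiskBackedValues implements AutoCloseable {
    // offsets[i] .. offsets[i + 1] bound the bytes of document i; this table lives on the heap.
    private final long[] offsets;
    // The value bytes themselves are never loaded wholesale; they are read on demand.
    private final RandomAccessFile data;

    private DiskBackedValues(long[] offsets, RandomAccessFile data) {
        this.offsets = offsets;
        this.data = data;
    }

    /** Writes all values back to back into one file and records where each one starts. */
    static DiskBackedValues write(Path file, byte[][] values) throws IOException {
        long[] offsets = new long[values.length + 1];
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            out.setLength(0); // start from an empty file
            for (int i = 0; i < values.length; i++) {
                offsets[i] = out.getFilePointer();
                out.write(values[i]);
            }
            offsets[values.length] = out.getFilePointer(); // end marker for the last value
        }
        return new DiskBackedValues(offsets, new RandomAccessFile(file.toFile(), "r"));
    }

    /** One seek and one read per lookup; the address comes from the in-memory table. */
    byte[] get(int docId) throws IOException {
        int length = (int) (offsets[docId + 1] - offsets[docId]);
        byte[] result = new byte[length];
        data.seek(offsets[docId]);
        data.readFully(result);
        return result;
    }

    /** Heap used by the address table: 8 bytes per entry, regardless of how large the values are. */
    long addressTableBytes() {
        return 8L * offsets.length;
    }

    @Override
    public void close() throws IOException {
        data.close();
    }
}
```

In this model, 1M documents cost roughly 8 MB of heap for the offsets no matter how large each value is, whereas loading 1M values of 4000 bytes each would need about 4 GB. Moving the offsets to disk as well, as the old DirectSource is said to have done, would remove the table from the heap at the price of a second seek per lookup.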