Re: Question about Payloads in Lucene 4.5

Rohit Banga Sat, 22 Mar 2014 02:19:13 -0700

Awesome, BinaryDocValues sounds nice!
I saw that NumericDocValues did not inherit from a base class, hence I
thought there was no StringDocValues :).


Can I expect that SearcherManager will invoke
SearcherFactory.newSearcher at most once between SearcherManager
refreshes? I believe IndexSearcher is thread-safe. Is my assumption that
newSearcher is invoked only once per refresh correct?

If BinaryDocValues didn't exist, I was thinking of using a custom SearcherFactory
that would return an instance of a custom subclass of
IndexSearcher. This subclass could encapsulate a map from numeric doc value
to string. I was thinking SearcherManager.acquire could then be used to
fetch the instance of this subclass while permitting concurrent updates to,
and reads from, both the index and the HashMap.
Is using SearcherManager in this way appropriate? Just want to make sure my
understanding of how SearcherManager works is correct.
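The idea above can be sketched in plain Java, without Lucene: build the searcher and its side map together as one immutable snapshot, then publish the whole snapshot atomically on refresh, so a reader can never pair a searcher with a map from a different generation. This is a minimal sketch under stated assumptions: `Snapshot`, `SnapshotManager`, and the `String` stand-in for an IndexSearcher are hypothetical names, and the reference counting that Lucene's real SearcherManager does through acquire()/release() is deliberately left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical stand-in for "IndexSearcher subclass + side HashMap":
// both belong to one index generation and are published together.
final class Snapshot {
    final String searcher;             // stand-in for an IndexSearcher over one reader generation
    final Map<Long, String> idByValue; // side map built for that same generation

    Snapshot(String searcher, Map<Long, String> idByValue) {
        this.searcher = searcher;
        this.idByValue = Collections.unmodifiableMap(new HashMap<>(idByValue));
    }
}

// Hypothetical manager: the factory runs once per refresh() call, and
// readers always get a consistent (searcher, map) pair from one reference.
final class SnapshotManager {
    private final AtomicReference<Snapshot> current = new AtomicReference<>();
    private final Function<Integer, Snapshot> factory; // generation -> snapshot
    private int generation = 0;
    private int factoryCalls = 0;

    SnapshotManager(Function<Integer, Snapshot> factory) {
        this.factory = factory;
        refresh(); // build the initial snapshot
    }

    // Build the new snapshot off to the side, then swap it in atomically.
    synchronized void refresh() {
        generation++;
        factoryCalls++;
        current.set(factory.apply(generation));
    }

    // Readers never block on refresh; they just read the current reference.
    Snapshot acquire() {
        return current.get();
    }

    synchronized int factoryCalls() {
        return factoryCalls;
    }
}
```

A reader that acquired a snapshot before a refresh keeps seeing that generation's searcher and map together; only later acquire() calls see the new pair.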

Thanks
Rohit
On Mar 22, 2014 1:29 AM, "Michael McCandless" <luc...@mikemccandless.com>
wrote:

> On Fri, Mar 21, 2014 at 10:25 PM, Rohit Banga <iamrohitba...@gmail.com>
> wrote:
> > Thanks Michael for your response.
>
> You're welcome!
>
> > A few questions:
> >
> > 1. Can I expect better performance when retrieving a single
> > NumericDocValue for all hits vs. when I retrieve the documents for all
> > hits to fetch the field value? As far as I understand, retrieving n
> > documents from the index requires n disk reads. How many disk reads do
> > I do when using NumericDocValues? How are they stored?
>
> It should be faster; doc values are stored "column stride", where all
> values across all docs for that one field are stored together, vs "row
> stride" of a stored document, where all fields for each document are
> stored together.
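To make the row-stride vs. column-stride distinction concrete, here is a toy in-memory sketch in plain Java (no Lucene; `RowStore` and `ColumnStore` are hypothetical names). Reading one field for many hits walks a single array in the column layout, while the row layout has to go through each document's whole record first. It only illustrates the layout idea and says nothing about Lucene's actual on-disk encoding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Row stride: each document keeps all of its fields together (like stored fields).
final class RowStore {
    private final List<Map<String, Long>> docs = new ArrayList<>();

    void add(Map<String, Long> fields) {
        docs.add(new HashMap<>(fields));
    }

    // Fetching one field means loading the hit's entire record first.
    long get(int docId, String field) {
        return docs.get(docId).get(field);
    }
}

// Column stride: each field keeps the values of all documents together (like doc values).
final class ColumnStore {
    private final Map<String, long[]> columns = new HashMap<>();

    ColumnStore(Map<String, long[]> columns) {
        this.columns.putAll(columns);
    }

    // One field for one document is a direct array index.
    long get(int docId, String field) {
        return columns.get(field)[docId];
    }

    // One field for many hits touches only that field's array.
    long[] getAll(int[] hits, String field) {
        long[] column = columns.get(field);
        long[] out = new long[hits.length];
        for (int i = 0; i < hits.length; i++) {
            out[i] = column[hits[i]];
        }
        return out;
    }
}
```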
>
> The default DV format is Lucene45DocValuesFormat; it tries to compress
> the values, then leaves the compressed form on disk and seeks for
> each lookup. Often, though, the OS will cache those pages in RAM if your
> application keeps them hot.
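As a rough illustration of why compressing numeric values still allows one seek per lookup, here is a generic "subtract the minimum, then bit-pack" scheme in plain Java. It is a hypothetical sketch of one common technique, not Lucene45DocValuesFormat's actual encoding; `PackedColumn` is a made-up name. The point is that get(docId) computes the bit offset directly, so random access stays O(1) even though each value uses only as many bits as the value range needs.

```java
// Hypothetical packed column: stores (value - min) in just enough bits per value.
// A generic illustration of numeric compression, not Lucene's on-disk format.
final class PackedColumn {
    private final long min;
    private final int bitsPerValue;
    private final long[] blocks;

    PackedColumn(long[] values) {
        long lo = 0, hi = 0;
        if (values.length > 0) {
            lo = Long.MAX_VALUE;
            hi = Long.MIN_VALUE;
            for (long v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
        }
        min = lo;
        // If hi - lo overflows it turns negative and we fall back to 64 bits;
        // decoding still works because min + (v - min) == v in two's complement.
        long range = hi - lo;
        bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(range));
        blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            write(i, values[i] - min);
        }
    }

    // Write one delta at bit position index * bitsPerValue, possibly spanning two longs.
    private void write(int index, long delta) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= delta << shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            blocks[block + 1] |= delta >>> (64 - shift);
        }
    }

    // Random access: compute the bit offset, read at most two longs, mask, add min back.
    long get(int docId) {
        long bitPos = (long) docId * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (64 - shift);
        }
        return min + (value & mask);
    }

    int bitsPerValue() {
        return bitsPerValue;
    }
}
```

For example, values between 1000 and 1007 need only 3 bits each instead of 64.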
>
> You should test that first; if it's still too slow and you're willing
> to use RAM, then swap in a different DVFormat for your field. For example,
> DirectDocValuesFormat is the most RAM-consuming (it stores a native Java
> array under the hood) but should be the fastest.
>
> Swapping in a custom DVFormat for a field is easy: just make your own
> codec by subclassing the default Lucene46Codec and overriding the
> method getDocValuesFormatForField.
>
> > 2. I tried looking for examples on how to use numeric doc values. I found
> > that in newer versions of Lucene we have to use "AtomicReader".
> > I found this:
> > http://www.gossamer-threads.com/lists/lucene/java-user/182641
> >
> > So is this the code I am looking for:
> > long getNumericDocValueForDocument(IndexSearcher searcher, String field,
> >                                    int docId) throws IOException {
> >      IndexReader reader = searcher.getIndexReader();
> >      long docVal = 0;
> >      for (AtomicReaderContext rc : reader.leaves()) {
> >         AtomicReader ar = rc.reader();
> >         docVal = ar.getNumericDocValues(field).get(docId);
> >      }
> >      return docVal;
> > }
> >
> > How do I know which docVal to return? It appears that each AtomicReader
> > (every iteration of the loop) may return a docVal?
>
> Looks like you solved this already ...
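For the record, the usual fix for the loop in question 2 is to locate the single leaf whose doc-id range contains the top-level docId, then subtract that leaf's docBase before calling get(). The plain-Java sketch below reproduces just that lookup (a binary search over leaf start offsets) so it runs without Lucene; `LeafResolver` and its `int[] docBases` input are hypothetical, and the sketch assumes the array is ascending and starts at 0.

```java
// Hypothetical helper mirroring the per-segment lookup:
// docBases[i] is the first top-level docId of leaf i (ascending, docBases[0] == 0).
final class LeafResolver {
    private final int[] docBases;

    LeafResolver(int[] docBases) {
        this.docBases = docBases.clone();
    }

    // Binary search for the last leaf whose docBase <= docId
    // (picking the last one skips over empty leaves that share a docBase).
    int leafIndex(int docId) {
        int lo = 0, hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so lo always advances
            if (docBases[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Leaf-relative docId: what a per-segment doc values lookup expects.
    int localDocId(int docId) {
        return docId - docBases[leafIndex(docId)];
    }
}
```

In Lucene 4.x the equivalent would be something like `leaves.get(ReaderUtil.subIndex(docId, leaves))` followed by `getNumericDocValues(field).get(docId - ctx.docBase)`, checking for a null return in case that segment has no values for the field.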
>
> > 3. Can I only store NumericDocValues? Can I get something like
> > StringDocValues? I have a string "id". I guess I could keep a mapping from
> > numeric doc value (Long) to String, but I want to avoid keeping two sources
> > of information (the Lucene index and a HashMap). I can use SearcherManager
> > to deal with concurrent searches and index updates
> > (http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html),
> > but how about managing two data sources, the Lucene index and a
> > HashMap<Long, String>, with SearcherManager? Is there a way to achieve this
> > using a custom SearcherFactory?
>
> There are also binary doc values, maybe that helps?
>
> You may also want LiveFieldValues if you need precise (real-time)
> lookup of the id for all docs, including ones that were just indexed.
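Binary doc values hold one byte[] per document, so a string "id" can live in the index as its UTF-8 bytes instead of in a separate HashMap. The toy column below shows that idea in plain Java (all documents' bytes in one buffer plus per-document offsets); `BinaryColumn` is a hypothetical name and this is not Lucene's on-disk format. In Lucene 4.x the real pieces would be a BinaryDocValuesField at index time and AtomicReader.getBinaryDocValues(field) at search time.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical column of variable-length values, one per document.
final class BinaryColumn {
    private final byte[] data;   // all documents' bytes, concatenated
    private final int[] offsets; // document d occupies data[offsets[d] .. offsets[d + 1])

    BinaryColumn(List<String> idsByDoc) {
        byte[][] encoded = new byte[idsByDoc.size()][];
        offsets = new int[idsByDoc.size() + 1];
        for (int d = 0; d < encoded.length; d++) {
            encoded[d] = idsByDoc.get(d).getBytes(StandardCharsets.UTF_8);
            offsets[d + 1] = offsets[d] + encoded[d].length;
        }
        data = new byte[offsets[encoded.length]];
        for (int d = 0; d < encoded.length; d++) {
            System.arraycopy(encoded[d], 0, data, offsets[d], encoded[d].length);
        }
    }

    // Raw bytes for one document, analogous to a binary doc values lookup.
    byte[] getBytes(int docId) {
        return Arrays.copyOfRange(data, offsets[docId], offsets[docId + 1]);
    }

    // Decode the stored bytes back into the string id.
    String getId(int docId) {
        int start = offsets[docId];
        return new String(data, start, offsets[docId + 1] - start, StandardCharsets.UTF_8);
    }
}
```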
>
> Mike McCandless
>
> http://blog.mikemccandless.com