Re: Question about Payloads in Lucene 4.5

Rohit Banga Fri, 21 Mar 2014 19:40:09 -0700

Just saw the implementation of MultiDocValues.getNumericValues(). It
returns an anonymous inner class that gets the doc value from the
appropriate index reader. Very cool implementation!
I guess that answers my question on how to get a docVal from multiple
atomic readers.
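
To make the lookup concrete, here is a minimal sketch of the idea in plain Java. It is a toy model, not Lucene code: the class SegmentedDocValues and its methods are hypothetical names made up for illustration. It assumes each segment stores its values by segment-local doc ID and that a per-segment docBase offset maps top-level doc IDs onto segments, which is roughly what AtomicReaderContext.docBase and ReaderUtil.subIndex() provide in Lucene 4.x.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (not Lucene code) of doc values spread over several segments.
 * Each segment stores its values by segment-local doc ID; a docBase offset
 * per segment maps top-level doc IDs onto segments.
 */
final class SegmentedDocValues {
    private final List<long[]> segments = new ArrayList<>();
    private final List<Integer> docBases = new ArrayList<>();
    private int maxDoc = 0;

    /** Appends a segment; its docBase is the number of docs added before it. */
    void addSegment(long[] values) {
        docBases.add(maxDoc);
        segments.add(values.clone());
        maxDoc += values.length;
    }

    /** Returns the index of the segment that contains the top-level doc ID. */
    int subIndex(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("docId=" + docId);
        }
        int lo = 0;
        int hi = docBases.size() - 1;
        // Binary search for the last segment whose docBase is <= docId.
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases.get(mid) <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Looks up the value for a top-level doc ID in the owning segment. */
    long get(int docId) {
        int i = subIndex(docId);
        // Translate the top-level ID into a segment-local ID.
        return segments.get(i)[docId - docBases.get(i)];
    }
}
```

With two segments holding {10, 11, 12} and {20, 21}, top-level doc 3 falls in the second segment at local ID 0, so only one segment answers for any given doc ID.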


It would be nice if you could help me with the other two questions though.

Thanks
Rohit Banga
http://iamrohitbanga.com/


On Fri, Mar 21, 2014 at 7:25 PM, Rohit Banga <iamrohitba...@gmail.com> wrote:

> Thanks Michael for your response.
>
> A few questions:
>
> 1. Can I expect better performance when retrieving a single
> NumericDocValue for all hits vs when I retrieve documents for all hits to
> fetch the field value? As far as I understand retrieving n documents from
> the index requires n disk reads. How many disk reads do I do when using
> NumericDocValues? How are they stored?
>
> 2. I tried looking for examples of how to use numeric doc values. I found
> that in new versions of Lucene we have to use "AtomicReader".
> Found this: http://www.gossamer-threads.com/lists/lucene/java-user/182641
>
> So is this the code I am looking for:
> long getNumericDocValueForDocument(IndexSearcher searcher, int docId) {
>      IndexReader reader = searcher.getIndexReader();
>      long docVal = 0;
>      for (AtomicReaderContext rc : reader.leaves()) {
>         AtomicReader ar = rc.reader();
>         docVal = ar.getNumericDocValues().get(docId);
>      }
>      return docVal;
> }
>
> How do I know which docVal to return? It appears that each AtomicReader
> (every iteration of the loop) may return a docVal?
>
> 3. Can I only store NumericDocValues? Can I get something like
> StringDocValues? I have a string "id". I guess I could keep a mapping from
> numeric doc value (Long) to String but I want to avoid keeping two sources
> of information (Lucene Index and a HashMap). I can use SearcherManager to
> deal with concurrent searches and index updates (
> http://blog.mikemccandless.com/2011/09/lucenes-searchermanager-simplifies.html),
> but how about managing two data sources Lucene index and HashMap<Long,
> String> with SearcherManager? Is there a way to achieve this using a custom
> SearcherFactory?
>
>
> Thanks
> Rohit Banga
> http://iamrohitbanga.com/
>
>
> On Fri, Mar 21, 2014 at 3:26 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> DocValues are better than payloads.
>>
>> E.g. index a NumericDocValuesField with each doc, holding your id.
>>
>> Then at search time you can use MultiDocValues.getNumericValues.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Fri, Mar 21, 2014 at 4:35 PM, Rohit Banga <iamrohitba...@gmail.com>
>> wrote:
>> > Hi everyone
>> >
>> > When I query a Lucene index, I get back a list of document ids. This index
>> > search is fast. Now for all documents matching the result I need a unique
>> > String field called "id" which is stored in the document. From the
>> > documentation I gather that document ids are internal and I should not use
>> > them for referencing my own data structures. Currently I iterate over all
>> > the hits matching the query and then for each one I get the document to
>> > read the field using IndexReader.document().
>> >
>> > http://lucene.apache.org/core/4_5_0/core/org/apache/lucene/index/IndexReader.html
>> >
>> > I read the "id" field from the document and then use it further in my
>> > processing logic.
>> > The problem is that reading all documents to get all the "id"s is turning out
>> > to be very slow. It is the bottleneck in my application. It would be nice
>> > if Lucene could return some metadata along with the internal
>> > document id when I did a search. I do not want to read all documents just
>> > to retrieve this metadata.
>> >
>> > The best solution I have come across searching on the net is to use
>> > payloads which will be returned by the fast index search query along with
>> > the document ids.
>> >
>> > Is my understanding correct that using payloads I can get the "id" string field
>> > for all my documents faster than reading my entire document?
>> >
>> > I am not able to find a good example of how to store and retrieve payloads.
>> > Can you please point me to a good resource to learn how to use payloads and
>> > how they will impact performance?
>> > I am using Lucene 4.5.
>> >
>> > Thanks
>> > Rohit Banga
>> > http://iamrohitbanga.com/
