Re: How to access DocValues inside a customized collector?
Thanks very much, Uwe and Mikhail! Your points are all very well taken. So far it seems to work well; I will test more to verify the details.

Lisheng

On Fri, Sep 21, 2018 at 3:54 AM Uwe Schindler wrote:
RE: How to access DocValues inside a customized collector?
Hi,

In general your approach is right, but you have to do it correctly. It depends on the Collector subclass you are using. The simplest is to subclass SimpleCollector:
https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/search/SimpleCollector.html

There you have to override 2 methods:

doSetNextReader(LeafReaderContext context): Here you call context.reader().getBinaryDocValues(String field) *once* and save the result in a private, non-final member field "actReaderdocValues" of the collector.

collect(int docId): Here you can call actReaderdocValues.advanceExact(docId) and retrieve the value. As collect() is always called "in order", it's safe to use advanceExact().

Important: don't get a new DocValues instance on each call to collect() and then advanceExact() on it! That is only needed when documents are visited out of order. So in combination with a collector like the one above you get maximum performance, as everything is per leaf reader and in order.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Lisheng Zhang
> Sent: Friday, September 21, 2018 3:23 AM
> To: java-user@lucene.apache.org
> Subject: How to access DocValues inside a customized collector?
>
> We need to use binary DocValues, added during indexing, in a customized
> collector. I first tested in the standard TopScoreDocCollector, and it
> seems that we need to:
>
> LeafReaderContext => reader() => get binary iterator => advance to the
> correct location
>
> Is this the correct way, or do we actually have a better API? Since we
> are already at that docId, it seems to me that the binary DocValues
> should be readily available.
>
> Also, do we have a way to see the indexed data directly? (Luke seems
> obsolete, and Marple does not work with Lucene 7.4.0 yet.)
>
> Thanks very much for the help,
> Lisheng

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
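A minimal sketch of the SimpleCollector approach Uwe describes, assuming lucene-core 7.4 (the version discussed in this thread) is on the classpath. The class name BinaryFieldCollector is hypothetical. Note that in Lucene 7.x a SimpleCollector subclass must also implement needsScores(); Lucene 8 replaced that method with scoreMode().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.SimpleCollector;
import org.apache.lucene.util.BytesRef;

// Hypothetical collector: gathers the binary DocValue of each hit (Lucene 7.x API).
public class BinaryFieldCollector extends SimpleCollector {
  private final String field;
  private final List<BytesRef> values = new ArrayList<>();
  // Per-segment iterator; replaced in doSetNextReader, so it cannot be final.
  private BinaryDocValues actReaderDocValues;

  public BinaryFieldCollector(String field) {
    this.field = field;
  }

  @Override
  protected void doSetNextReader(LeafReaderContext context) throws IOException {
    // Fetch the iterator once per segment, not once per hit.
    // Returns null if this segment has no binary DocValues for the field.
    actReaderDocValues = context.reader().getBinaryDocValues(field);
  }

  @Override
  public void collect(int doc) throws IOException {
    // doc is segment-relative and arrives in increasing order, so advanceExact() is safe.
    if (actReaderDocValues != null && actReaderDocValues.advanceExact(doc)) {
      // binaryValue() may reuse its buffer, so copy before keeping the value.
      values.add(BytesRef.deepCopyOf(actReaderDocValues.binaryValue()));
    }
  }

  @Override
  public boolean needsScores() {
    // Lucene 7.x; in Lucene 8+ override scoreMode() instead.
    return false;
  }

  public List<BytesRef> values() {
    return values;
  }
}
```

It is run like any other collector, e.g. searcher.search(query, new BinaryFieldCollector("payload")), where "payload" stands for whatever binary DocValues field you indexed. Hits without a value for the field are simply skipped.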
Re: How to access DocValues inside a customized collector?
Not sure why you are looking for something better; it's already the best API. You can check sample usage in .FastTaxonomyFacetCounts.countAll(IndexReader). Also notice FastTaxonomyFacetCounts.count(List), where the DV iterator is driven by the enclosing intersection. SolrDocumentFetcher.decodeDVField(int, LeafReader, String) also does exactly this.

--
Sincerely yours
Mikhail Khludnev
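A rough sketch of the two iteration styles Mikhail points to, again assuming the lucene-core 7.4 API. scanAll walks the DocValues iterator itself and skips deleted documents, roughly as countAll does; scanHits lets a conjunction built with ConjunctionDISI.intersectIterators drive the DocValues iterator over a set of hits, roughly as count(List) does. The class DocValuesScans and its method names are hypothetical, not taken from the Lucene source.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.function.ObjIntConsumer;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.search.ConjunctionDISI;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.BytesRef;

// Hypothetical helpers showing two ways to drive a BinaryDocValues iterator (Lucene 7.x).
public final class DocValuesScans {
  private DocValuesScans() {}

  // Visit every live document of one segment that has a value (countAll style).
  public static void scanAll(LeafReader reader, String field, ObjIntConsumer<BytesRef> visitor)
      throws IOException {
    BinaryDocValues dv = reader.getBinaryDocValues(field);
    if (dv == null) {
      return; // no binary DocValues for this field in this segment
    }
    Bits liveDocs = reader.getLiveDocs(); // null when the segment has no deletions
    for (int doc = dv.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = dv.nextDoc()) {
      if (liveDocs == null || liveDocs.get(doc)) {
        visitor.accept(dv.binaryValue(), doc);
      }
    }
  }

  // Visit only hits that also have a value; the conjunction advances dv (count(List) style).
  public static void scanHits(LeafReader reader, String field, DocIdSetIterator hits,
      ObjIntConsumer<BytesRef> visitor) throws IOException {
    BinaryDocValues dv = reader.getBinaryDocValues(field);
    if (dv == null) {
      return;
    }
    DocIdSetIterator both = ConjunctionDISI.intersectIterators(Arrays.asList(hits, dv));
    for (int doc = both.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = both.nextDoc()) {
      // dv is positioned on doc here, so binaryValue() belongs to this hit.
      visitor.accept(dv.binaryValue(), doc);
    }
  }
}
```

In both cases the BytesRef passed to the visitor may be reused by the iterator, so copy it with BytesRef.deepCopyOf if you need to keep it.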
Re: How to access DocValues inside a customized collector?
Erick,

Thanks very much for the quick help. The Luke you referred to worked well (I found that the binary DocValues were indexed correctly).

However, I am still not sure how to efficiently access DocValues in a collector. Regarding "The Terms component directly accesses the indexed data and can be used to poke around in the indexed data": could you elaborate a little, or roughly point me to source code where DocValues are accessed inside a collector (Lucene or Solr source code would be fine)?

Thanks again for the help!

On Thu, Sep 20, 2018 at 7:39 PM Erick Erickson wrote:
Re: How to access DocValues inside a customized collector?
Which Luke are you using? I think this one is being maintained:
https://github.com/DmitryKey/luke

The Terms component directly accesses the indexed data and can be used to poke around in it.

I'll skip the question about accessing DocValues, as I have to go back and look it up every time.
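Erick's Terms component is a Solr feature. As a rough Lucene-level analogue, assuming the lucene-core 7.4 API, the hypothetical helper below lists the indexed terms of one field with their document frequencies using MultiFields.getTerms and a TermsEnum (in Lucene 8 and later, MultiTerms.getTerms takes the place of MultiFields.getTerms). It only shows inverted-index terms; binary DocValues are stored separately and will not appear in this listing.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical inspection helper (Lucene 7.x API).
public final class TermDump {
  private TermDump() {}

  // Returns each indexed term of the field, in index order, mapped to its document frequency.
  public static Map<String, Integer> terms(IndexReader reader, String field) throws IOException {
    Map<String, Integer> out = new LinkedHashMap<>();
    Terms terms = MultiFields.getTerms(reader, field); // null if the field has no indexed terms
    if (terms == null) {
      return out;
    }
    TermsEnum te = terms.iterator();
    for (BytesRef term = te.next(); term != null; term = te.next()) {
      out.put(term.utf8ToString(), te.docFreq());
    }
    return out;
  }
}
```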