Re: Row summary

Aaron McCurry Fri, 26 Oct 2012 17:46:47 -0700

I think that changing the fetch to be a search rather than a termDoc
iterator is something for the lucene-4.0.0 branch.  However, you are
correct: by adding a summary to the prime doc you should get the
maximum performance in the current paradigm (0.1.x).  Btw, there should
be a method on the BlurReducer that you can override that gives you a
last chance to change the docs before indexing (documentsToIndex).
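
To make the idea concrete, here is a rough sketch of the two halves
discussed in this thread: putting the summary doc first at reduce time,
and stopping after the first doc at fetch time.  It is only an
illustration.  Plain strings stand in for Lucene documents, a List
stands in for a row's TermDocs, and the names SummaryFetchSketch,
summaryFirst, fetch and Result are all made up for this example; none
of them are Blur or Lucene APIs, and the real signature of
documentsToIndex is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: strings stand in for Lucene documents and a List
// stands in for the TermDocs of one row. None of this is Blur's API.
class SummaryFetchSketch {

    // Result of a fetch: the docs returned plus how many were iterated over.
    static final class Result {
        final List<String> docs;
        final int visited;

        Result(List<String> docs, int visited) {
            this.docs = docs;
            this.visited = visited;
        }
    }

    // Reducer-side step: place the summary doc ahead of the records so
    // that it becomes the first (prime) doc of the row.
    static List<String> summaryFirst(List<String> records, String summary) {
        List<String> ordered = new ArrayList<>(records.size() + 1);
        ordered.add(summary);
        ordered.addAll(records);
        return ordered;
    }

    // Fetch-side step: with summaryOnly set, stop after the first doc
    // instead of walking every doc in the row.
    static Result fetch(List<String> rowDocs, boolean summaryOnly) {
        List<String> out = new ArrayList<>();
        int visited = 0;
        for (String doc : rowDocs) {
            visited++;
            out.add(doc);
            if (summaryOnly) {
                break;
            }
        }
        return new Result(out, visited);
    }

    public static void main(String[] args) {
        List<String> row = summaryFirst(Arrays.asList("rec1", "rec2", "rec3"), "summary");
        Result r = fetch(row, true);
        System.out.println(r.docs + " visited=" + r.visited);
    }
}
```

Note that the early exit is exactly why row.recordCount is lost in the
summaryOnly case: the count was a by-product of the full iteration,
which no longer happens.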


Aaron

On Thu, Oct 25, 2012 at 3:38 PM, Tim Williams <[email protected]> wrote:
> On Thu, Oct 25, 2012 at 8:49 AM, Tim Williams <[email protected]> wrote:
>> I need a row summary type of record that should be the only record
>> returned on a super query.  I've tried to just add it as a new
>> "summary" columnfamily, then add "summary" in "columnFamiliesToFetch"
>> on the Selector.  I am seeing some benefit over just pulling back all
>> the records, but it's not as big as I'd hoped or as compared to a null
>> selector. I think the problem is that I'm still paying the cost of
>> iterating over all the TermDocs even for my single record.
>>
>> Are there any other ways I could use the current API to achieve this
>> effect in a more performant way?
>>
>> The alternative I've come up with is to...
>> ... add a (summarize()) hook in BlurReducer (just after
>> fetchOldRecords) that builds my summary document and ensures that it
>> becomes the prime doc.
>> ... then add a summaryOnly flag in the selector that stops after
>> reading the first termDoc.
>
> FWIW, this path looks promising, without the termDoc iterations it's
> obviously significantly faster.  The downside is that row.recordCount
> was calculated inside the iteration. Not sure yet how to solve that
> one...
>
> Thanks,
> --tim
