I think that changing the fetch to be a search instead of the termDoc iterator is something for the lucene-4.0.0 branch. However, you are correct that by adding a summary to the prime doc you should get the maximum performance in the current paradigm (0.1.x). Btw, there should be a method on the BlurReducer that you can override that gives you a last chance to change the docs before indexing (documentsToIndex).
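
Roughly, the summary-as-prime-doc idea could look like the sketch below. It is plain Java with made-up names (SummarySketch, record, withSummaryFirst, fetch, and summaryOnly as a boolean parameter); it does not use the real Blur or Lucene classes. Storing recordCount on the summary is only an assumption about how the count might be kept without iterating.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; none of these are real Blur or Lucene types.
final class SummarySketch {

    // Helper to build a toy record as a field -> value map.
    static Map<String, String> record(String family, String value) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("family", family);
        doc.put("value", value);
        return doc;
    }

    // Build the documents for one row with a summary document in the first
    // (prime doc) position. The record count is stored on the summary so it
    // does not have to be computed by iterating at fetch time (an assumption).
    static List<Map<String, String>> withSummaryFirst(List<Map<String, String>> records) {
        Map<String, String> summary = new LinkedHashMap<>();
        summary.put("family", "summary");
        summary.put("recordCount", Integer.toString(records.size()));
        List<Map<String, String>> docs = new ArrayList<>(records.size() + 1);
        docs.add(summary);
        docs.addAll(records);
        return docs;
    }

    // Fetch for one row: with summaryOnly set, stop after the first document
    // instead of walking every document in the row.
    static List<Map<String, String>> fetch(List<Map<String, String>> rowDocs, boolean summaryOnly) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : rowDocs) {
            out.add(doc);
            if (summaryOnly) {
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = new ArrayList<>();
        records.add(record("fam0", "a"));
        records.add(record("fam0", "b"));
        records.add(record("fam1", "c"));

        List<Map<String, String>> rowDocs = withSummaryFirst(records);
        List<Map<String, String>> result = fetch(rowDocs, true);
        System.out.println(result.size());
        System.out.println(result.get(0).get("recordCount"));
    }
}
```

The point of the sketch is only that the reader touches one document per row when summaryOnly is set, and that the count travels with the summary rather than being derived from the iteration.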

Aaron

On Thu, Oct 25, 2012 at 3:38 PM, Tim Williams <[email protected]> wrote:
> On Thu, Oct 25, 2012 at 8:49 AM, Tim Williams <[email protected]> wrote:
>> I need a row summary type of record that should be the only record
>> returned on a super query. I've tried to just add it as a new
>> "summary" columnfamily, then add "summary" in "columnFamiliesToFetch"
>> on the Selector. I am seeing some benefit over just pulling back all
>> the records, but it's not as big as I'd hoped or as compared to a null
>> selector. I think the problem is that I'm still paying the cost of
>> iterating over all the TermDocs even for my single record.
>>
>> Are there any other ways I could use the current API to achieve this
>> effect in a more performant way?
>>
>> The alternative I've come up with is to...
>> ... add a (summarize()) hook in BlurReducer (just after
>> fetchOldRecords) that builds my summary document and ensures that it
>> becomes the prime doc.
>> ... then add a summaryOnly flag in the selector that stops after
>> reading the first termDoc.
>
> FWIW, this path looks promising, without the termDoc iterations it's
> obviously significantly faster. The downside is that row.recordCount
> was calculated inside the iteration. Not sure yet how to solve that
> one...
>
> Thanks,
> --tim
