Andrzej Bialecki wrote:
> Further input into this: after replacing the ConjunctionScorer with the
> fixed version from JIRA, now the bottleneck seems to be ... in
> Summarizer, of all things. :-)

While making the summarizer faster would of course be good, keep in mind that the cost of summarizing ten hits stays constant as the collection grows. With search running on a single node, that node computes ten summaries per query. On ten nodes, each node computes about one summary per query. On 100 nodes, each node computes about one summary per ten queries.
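The arithmetic above can be sketched as follows. This is a minimal illustration, not Nutch code: the function name is hypothetical, and it assumes the ten displayed hits are spread evenly across the nodes.

```python
def summaries_per_node_per_query(hits_shown: int, nodes: int) -> float:
    """Average number of summaries each node computes per query.

    Hypothetical helper; assumes the displayed hits are spread
    evenly across the search nodes.
    """
    return hits_shown / nodes


for nodes in (1, 10, 100):
    per_node = summaries_per_node_per_query(10, nodes)
    print(f"{nodes:>3} nodes: {per_node:g} summaries per node per query")
```

The total stays at ten summaries per query no matter how many nodes there are, so the per-node summarizing load shrinks as the cluster grows.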

Also note that we must save the raw text in order to form the text snippets of the summary. We might store the token stream as well, but I think we'd still have to store the raw text too.
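Here is a toy sketch of why the token stream alone is not enough. It is not Nutch's Summarizer. The regex tokenizer, the `Token` type and the `snippet` function are all hypothetical stand-ins. Tokens keep only normalized terms plus character offsets (Lucene tokens likewise carry start and end offsets), and those offsets point back into the raw text, which is where the readable snippet comes from.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    term: str   # normalized (lowercased) term, as an index would store it
    start: int  # start character offset in the raw text
    end: int    # end character offset (exclusive) in the raw text


def tokenize(raw: str) -> List[Token]:
    """Toy analyzer: lowercased word tokens with character offsets."""
    return [Token(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", raw)]


def snippet(raw: str, tokens: List[Token], query_term: str, window: int = 2) -> str:
    """Return the raw text around the first token matching query_term,
    spanning up to `window` tokens on each side ("" if there is no match)."""
    for i, tok in enumerate(tokens):
        if tok.term == query_term.lower():
            lo = tokens[max(0, i - window)].start
            hi = tokens[min(len(tokens) - 1, i + window)].end
            return raw[lo:hi]  # slicing the raw text keeps case and punctuation
    return ""


raw = "The quick, brown Fox jumps over the lazy dog."
tokens = tokenize(raw)
print(snippet(raw, tokens, "fox"))              # snippet built from raw text
print(" ".join(t.term for t in tokens[1:6]))    # same span from tokens alone
```

The first line prints `quick, brown Fox jumps over`, while rebuilding the same span from the tokens gives `quick brown fox jumps over`, with the comma and capitalization gone.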

Doug
