Andrzej Bialecki created SOLR-14470:
---------------------------------------

             Summary: Add streaming expressions to /export handler
                 Key: SOLR-14470
                 URL: https://issues.apache.org/jira/browse/SOLR-14470
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Export Writer, streaming expressions
            Reporter: Andrzej Bialecki
            Assignee: Andrzej Bialecki


Many streaming scenarios would greatly benefit from the ability to perform 
partial rollups (or other transformations) as early as possible, in order to 
minimize the amount of data that has to be sent from shards to the aggregating 
node.

This can be implemented as a subset of streaming expressions that processes the 
data directly inside each local {{ExportHandler}} and outputs only the records 
from the resulting stream.

Conceptually it would be similar to the way a Hadoop {{Combiner}} works. As is 
the case with a {{Combiner}}, because the input data is processed in batches, 
there would be no guarantee that only one record per unique combination of 
sort values is emitted; in fact, in most cases multiple partial aggregations 
would be emitted for the same values. Still, in many scenarios this could 
reduce the amount of data to be sent by several orders of magnitude.
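
The combiner-like behavior can be illustrated with a small, self-contained 
sketch. This is not Solr code and does not reflect any actual {{ExportHandler}} 
API: the function names ({{partial_rollup}}, {{final_rollup}}), the field names 
and the batch size are all hypothetical and chosen only for illustration. Each 
"shard" aggregates its sorted documents one batch at a time, so the same key 
may be emitted more than once, and the aggregating node merges the partial 
results.

```python
from collections import defaultdict
from itertools import islice


def partial_rollup(sorted_docs, key_field, value_field, batch_size):
    """Shard side (hypothetical): sum value_field per key within each batch.

    Batches are aggregated independently, so a key that spans two batches
    produces two partial records.
    """
    it = iter(sorted_docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        sums = {}
        for doc in batch:
            key = doc[key_field]
            sums[key] = sums.get(key, 0) + doc[value_field]
        for key, total in sums.items():
            yield {key_field: key, "sum": total}


def final_rollup(partials, key_field):
    """Aggregating node (hypothetical): merge partial sums into final totals."""
    totals = defaultdict(int)
    for rec in partials:
        totals[rec[key_field]] += rec["sum"]
    return dict(totals)


# Six documents, already sorted by the rollup key, as an export would return them.
docs = [
    {"cat": "a", "price": 1},
    {"cat": "a", "price": 2},
    {"cat": "a", "price": 3},
    {"cat": "a", "price": 4},
    {"cat": "b", "price": 5},
    {"cat": "b", "price": 6},
]

# With a batch size of 3, key "a" straddles the batch boundary.
partials = list(partial_rollup(docs, "cat", "price", batch_size=3))
totals = final_rollup(partials, "cat")
```

Here six input documents become three partial records, two of them for key 
"a", and the final merge still yields the correct totals. With realistic 
batch sizes and low-cardinality keys the reduction would be far larger.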



