[jira] [Comment Edited] (SOLR-8281) Add RollupMergeStream to Streaming API

Joel Bernstein (JIRA) Wed, 18 Nov 2015 20:14:10 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012755#comment-15012755
 ]


Joel Bernstein edited comment on SOLR-8281 at 11/19/15 4:12 AM:
----------------------------------------------------------------

Early versions of the ParallelStream handled the merging of Rollups. But I 
pulled it out because I felt this needed more thought.

The nice thing about adding operations to the ReducerStream is that it makes 
the ReducerStream much more useful. So even if we don't use it to merge Rollups, 
it's worth doing.

But this construct seems nice:

{code}
reduce (...
    parallel (...
        rollup (...
            hashJoin (
                search(...),
                search(...),
                on="fieldA"
            )
        )
    )
)
{code}

Actually, this is even nicer:

{code}
reduce (...
    parallel (...
        reduce (...
            hashJoin (
                search(...),
                search(...),
                on="fieldA"
            )
        )
    )
)
{code}

In this case the ReducerStream replaces the RollupStream. 

To support this, we would need an Operation to roll up the Metrics.
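
As a rough illustration of what such an Operation would have to do, here is a minimal Java sketch of merging partial rollups coming back from the workers. Everything in it is hypothetical: the class and field names are not from the Solr code base, the partials are assumed to carry only a sum and a count per group, and a plain map is used instead of a sorted stream.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from the Solr code base.
class RollupMergeSketch {

    // One partially rolled-up group as a worker might emit it (assumed shape).
    static final class Partial {
        final String group;   // value of the group-by field
        final double sum;     // sum over the worker's share of the tuples
        final long count;     // count over the worker's share of the tuples

        Partial(String group, double sum, long count) {
            this.group = group;
            this.sum = sum;
            this.count = count;
        }

        // Combine two partials that belong to the same group.
        Partial combine(Partial other) {
            return new Partial(group, sum + other.sum, count + other.count);
        }

        // The average is derived from the merged sum and count,
        // never by averaging the per-worker averages.
        double avg() {
            return sum / count;
        }
    }

    // Merge the partial rollups of all workers, keyed by group.
    static Map<String, Partial> merge(List<List<Partial>> workerOutputs) {
        Map<String, Partial> merged = new TreeMap<>();
        for (List<Partial> worker : workerOutputs) {
            for (Partial p : worker) {
                merged.merge(p.group, p, Partial::combine);
            }
        }
        return merged;
    }
}
```

If one worker emits (a, sum=10.0, count=2) and another emits (a, sum=30.0, count=3), the merged group has sum 40.0, count 5 and an average of 8.0. Averaging the two worker averages (5.0 and 10.0) would give 7.5, which is why the workers would need to pass along mergeable metrics rather than final values.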



> Add RollupMergeStream to Streaming API
> --------------------------------------
>
>                 Key: SOLR-8281
>                 URL: https://issues.apache.org/jira/browse/SOLR-8281
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>
> The RollupMergeStream merges the aggregate results emitted by the 
> RollupStream on *worker* nodes.
> This is designed to be used in conjunction with the HashJoinStream to perform 
> rollup Aggregations on the joined Tuples. The HashJoinStream will require the 
> tuples to be partitioned on the Join keys. To avoid needing to repartition on 
> the *group by* fields for the RollupStream, we can perform a merge of the 
> rolled up Tuples coming from the workers.
> The construct would look like this:
> {code}
> mergeRollup (...
>     parallel (...
>         rollup (...
>             hashJoin (
>                 search(...),
>                 search(...),
>                 on="fieldA"
>             )
>         )
>     )
> )
> {code}
> The pseudo code above would push the *hashJoin* and *rollup* to the *worker* 
> nodes. The emitted rolled up tuples would be merged by the mergeRollup.
>  



