[ 
https://issues.apache.org/jira/browse/MRQL-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leonidas Fegaras updated MRQL-56:
---------------------------------
    Attachment: MRQL-56-2.patch

Improved MapAggregateReduce and CrossAggregateProduct.

> Improve total aggregations and repetitions with shared results in Flink mode
> ----------------------------------------------------------------------------
>
>                 Key: MRQL-56
>                 URL: https://issues.apache.org/jira/browse/MRQL-56
>             Project: MRQL
>          Issue Type: Improvement
>          Components: Run-Time/Flink
>    Affects Versions: 0.9.4
>            Reporter: Leonidas Fegaras
>            Assignee: Leonidas Fegaras
>         Attachments: MRQL-56-2.patch, MRQL-56.patch
>
>
> The following patch improves the Flink evaluation mode in two cases:
> 1. Improves total aggregations by allowing MapAggregateReduce and 
> MapAggregateReduce2 operations to do the aggregations at the groupBy stage, 
> thus generating one aggregation result per node. Previously, the total 
> aggregation was performed after the groupBy, which was inefficient. This 
> problem was reported by Eldon Carman.
> 2. Re-implements repetitions whose result must be shared by all nodes, such 
> as the centroids in the kmeans algorithm. Previously, it used a loop to 
> evaluate the repetition body as a detached query. Now, it uses Flink 
> iterations, as described in the KMeans.java example in the Flink codebase 
> (by broadcasting the shared results to nodes). The kmeans query is now 7 
> times faster than before, and about 2.5 times faster than Spark. 
> Unfortunately, due to a Flink bug, this loop ignores the stopping condition.
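
The first case above can be illustrated with a minimal sketch in plain Java. It does not use the Flink or MRQL APIs; the class and method names (PartialAggregation, aggregatePartition, totalAggregate) are hypothetical, and each inner list stands in for the data held by one node. The point is only that each node folds its own data into a single partial result, so the final step combines one value per node.

```java
import java.util.Arrays;
import java.util.List;

public class PartialAggregation {
    // Hypothetical stand-in for one node's work: fold the local
    // partition into a single partial sum.
    static long aggregatePartition(List<Integer> partition) {
        long sum = 0;
        for (int v : partition) {
            sum += v;
        }
        return sum;
    }

    // Combine the per-node partial results into the total aggregate.
    // Only one value per partition reaches this final step.
    static long totalAggregate(List<List<Integer>> partitions) {
        long total = 0;
        for (List<Integer> partition : partitions) {
            total += aggregatePartition(partition);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 2, 3),
            Arrays.asList(4, 5),
            Arrays.asList(6));
        System.out.println(totalAggregate(partitions));
    }
}
```

This works for any aggregation whose partial results can be merged (sum, count, min, max); the sum here is just one example.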

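The second case, a repetition whose result is shared by all nodes, can be sketched the same way. This is again plain Java rather than the patch's Flink code: SharedCentroidIteration, step and iterate are hypothetical names, the points are one-dimensional for brevity, and the loop runs a fixed number of steps because the description notes that the stopping condition is currently ignored.

```java
public class SharedCentroidIteration {
    // One iteration step. Every partition (node) reads the same shared
    // centroids, assigns each of its points to the nearest centroid, and
    // contributes to per-centroid sums and counts. The merged sums and
    // counts yield the centroids shared in the next step.
    static double[] step(double[][] partitions, double[] centroids) {
        double[] sum = new double[centroids.length];
        long[] count = new long[centroids.length];
        for (double[] partition : partitions) {
            for (double p : partition) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) {
                        best = c;
                    }
                }
                sum[best] += p;
                count[best]++;
            }
        }
        double[] next = new double[centroids.length];
        for (int c = 0; c < centroids.length; c++) {
            // Keep the old centroid if no point was assigned to it.
            next[c] = count[c] == 0 ? centroids[c] : sum[c] / count[c];
        }
        return next;
    }

    // Run a fixed number of steps, passing the shared result forward.
    static double[] iterate(double[][] partitions, double[] initial, int steps) {
        double[] centroids = initial.clone();
        for (int i = 0; i < steps; i++) {
            centroids = step(partitions, centroids);
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] partitions = { {1, 2}, {3, 10}, {11, 12} };
        double[] result = iterate(partitions, new double[] {0, 15}, 5);
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

In a distributed setting the centroids array would be the value broadcast to the nodes at each step, while the points stay where they are.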


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
