[ https://issues.apache.org/jira/browse/MRQL-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Leonidas Fegaras updated MRQL-56:
---------------------------------
Attachment: MRQL-56-2.patch
Improved MapAggregateReduce and CrossAggregateProduct.
> Improve total aggregations and repetitions with shared results in Flink mode
> ----------------------------------------------------------------------------
>
> Key: MRQL-56
> URL: https://issues.apache.org/jira/browse/MRQL-56
> Project: MRQL
> Issue Type: Improvement
> Components: Run-Time/Flink
> Affects Versions: 0.9.4
> Reporter: Leonidas Fegaras
> Assignee: Leonidas Fegaras
> Attachments: MRQL-56-2.patch, MRQL-56.patch
>
>
> The following patch improves the Flink evaluation mode in two cases:
> 1. Improves total aggregations by allowing MapAggregateReduce and
> MapAggregateReduce2 operations to do the aggregations at the groupBy stage,
> thus generating one aggregation result per node. Previously, the total
> aggregation was performed after the groupBy, which was inefficient. This
> problem was reported by Eldon Carman.
> 2. Re-implements repetitions whose result must be shared by all nodes, such
> as the centroids in the kmeans algorithm. Previously, the Flink mode used a
> loop to evaluate the repetition body as a detached query. Now, it uses Flink
> iterations, as described in the KMeans.java example in the Flink codebase
> (by broadcasting the shared results to nodes). The kmeans query is now 7
> times faster than before, and about 2.5 times faster than Spark.
> Unfortunately, due to a Flink bug, this loop ignores the stopping condition.
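
As a rough illustration of the first case above (one aggregation result per
node, combined afterwards), here is a minimal sketch in plain Java. It is not
taken from the patch and does not use the Flink or MRQL APIs: ordinary lists
stand in for the per-node partitions, and the names PartialAggregation,
partialSums and combine are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: plain Java lists stand in for per-node partitions.
// This is NOT the MRQL or Flink implementation, only the general idea.
final class PartialAggregation {

    // Stage 1: each "node" folds its own partition into one partial sum.
    static List<Long> partialSums(List<List<Long>> partitions) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> partition : partitions) {
            long acc = 0L;
            for (long value : partition) {
                acc += value;
            }
            partials.add(acc); // exactly one result per partition
        }
        return partials;
    }

    // Stage 2: combine the few per-node partials into the total.
    static long combine(List<Long> partials) {
        long total = 0L;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Long>> partitions = Arrays.asList(
            Arrays.asList(1L, 2L, 3L),
            Arrays.asList(4L, 5L),
            Arrays.asList(6L));
        List<Long> partials = partialSums(partitions);
        System.out.println(partials + " -> " + combine(partials));
    }
}
```

In this sketch only one value per partition reaches the final combine step,
which is the presumed source of the saving compared with aggregating after
all grouped data has been collected.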
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)