Leonidas Fegaras created MRQL-56:
------------------------------------

             Summary: Improve total aggregations and repetitions with shared
                      results in Flink mode
                 Key: MRQL-56
                 URL: https://issues.apache.org/jira/browse/MRQL-56
             Project: MRQL
          Issue Type: Improvement
          Components: Run-Time/Flink
    Affects Versions: 0.9.4
            Reporter: Leonidas Fegaras
            Assignee: Leonidas Fegaras
         Attachments: MRQL-56.patch

The following patch improves the Flink evaluation mode in two cases:
1. Improves total aggregations by allowing the MapAggregateReduce and 
MapAggregateReduce2 operations to do the aggregations at the groupBy stage, 
thus generating one aggregation result per node. Previously, the total 
aggregation was performed after the groupBy, which was inefficient. This 
problem was reported by Eldon Carman.
2. Re-implements repetitions whose result must be shared by all nodes, such as 
the centroids in the kmeans algorithm. Previously, it used a loop to evaluate 
the repetition body as a detached query. Now, it uses Flink iterations, as 
described in the KMeans.java example in the Flink codebase (by broadcasting the 
shared results to nodes). The kmeans query is now 7 times faster than before, 
and about 2.5 times faster than Spark. Unfortunately, due to a Flink bug, this 
loop ignores the stopping condition.
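
As a rough illustration of the first case, the sketch below shows the general
idea of per-node partial aggregation in plain Java. It is not the code in the
patch and does not use the Flink or MRQL APIs. The class and method names
(PartialAggregation, localSum, totalSum) are hypothetical, and each inner list
stands in for the data held by one node.

```java
import java.util.Arrays;
import java.util.List;

public class PartialAggregation {
    // Aggregate one partition locally, producing one partial result per "node".
    // (Hypothetical helper; a sum stands in for any total aggregation.)
    static long localSum(List<Integer> partition) {
        long acc = 0;
        for (int v : partition) {
            acc += v;
        }
        return acc;
    }

    // Combine the per-node partial results into the final total.
    // Only one value per partition reaches this step.
    static long totalSum(List<List<Integer>> partitions) {
        long total = 0;
        for (List<Integer> partition : partitions) {
            total += localSum(partition);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 2, 3),
            Arrays.asList(4, 5),
            Arrays.asList(6));
        System.out.println(totalSum(partitions)); // prints 21
    }
}
```

The point of the design is that the final combining step sees one value per
node rather than every grouped record.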

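The second case can be pictured with a small plain-Java simulation of a
repetition whose result (the centroids) is shared by all nodes. This is only a
sketch under stated assumptions: it does not use Flink iterations or broadcast
sets, it works on one-dimensional points, and the names (SharedCentroids,
partialStats, iterate) are hypothetical. It runs a fixed number of steps, with
no stopping condition.

```java
import java.util.Arrays;
import java.util.List;

public class SharedCentroids {
    // Work done on one "node": given the shared centroids, return a
    // {sum, count} pair per centroid for the points in this partition.
    static double[][] partialStats(double[] partition, double[] centroids) {
        double[][] stats = new double[centroids.length][2];
        for (double p : partition) {
            int best = 0;
            for (int c = 1; c < centroids.length; c++) {
                if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) {
                    best = c;
                }
            }
            stats[best][0] += p;
            stats[best][1] += 1;
        }
        return stats;
    }

    // Repeat for a fixed number of steps: share the centroids with every
    // partition, merge the partial results, and compute the new centroids.
    static double[] iterate(List<double[]> partitions, double[] initial, int steps) {
        double[] centroids = initial.clone();
        for (int s = 0; s < steps; s++) {
            double[][] merged = new double[centroids.length][2];
            for (double[] partition : partitions) {
                double[][] stats = partialStats(partition, centroids);
                for (int c = 0; c < centroids.length; c++) {
                    merged[c][0] += stats[c][0];
                    merged[c][1] += stats[c][1];
                }
            }
            double[] next = new double[centroids.length];
            for (int c = 0; c < centroids.length; c++) {
                // Keep the old centroid if no point was assigned to it.
                next[c] = merged[c][1] == 0 ? centroids[c] : merged[c][0] / merged[c][1];
            }
            centroids = next;
        }
        return centroids;
    }

    public static void main(String[] args) {
        List<double[]> partitions = Arrays.asList(
            new double[] {1.0, 2.0, 9.0},
            new double[] {3.0, 10.0, 11.0});
        double[] result = iterate(partitions, new double[] {0.0, 5.0}, 5);
        System.out.println(Arrays.toString(result)); // prints [2.0, 10.0]
    }
}
```
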


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
