[
https://issues.apache.org/jira/browse/MRQL-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176330#comment-14176330
]
Leonidas Fegaras commented on MRQL-55:
--------------------------------------
Here are some performance results (in seconds) on a small YARN cluster with 12
nodes (48 cores). The row numbers in the table correspond to these tests:
# PageRank (6 steps), 1M nodes, 10M edges
# K-means clustering (5 steps), 10M points
# DBLP XML PageRank (12 steps), 1.5GB
# Matrix multiplication, 500x500
{noformat}
Test   Map-Reduce    Spark    Flink
-----------------------------------
1           591.8    145.1    145.3
2          1068.1    184.0    516.4
3           994.2    149.4    181.6
4            78.7     83.2     94.9
{noformat}
K-means is slower in Flink mode than in Spark mode because MRQL doesn't use
Flink iterations for k-means (it does use Flink iterations for PageRank).
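To make the comparison easier to read, the timings in the table above can be turned into speedups relative to Map-Reduce mode. This is only a sketch: the numbers are copied from the table, while the dictionary layout and the helper name speedup_over_mapreduce are illustrative and not part of MRQL.

```python
# Timings in seconds, copied from the table above (tests 1-4).
timings = {
    "PageRank (6 steps)": {"Map-Reduce": 591.8, "Spark": 145.1, "Flink": 145.3},
    "K-means (5 steps)": {"Map-Reduce": 1068.1, "Spark": 184.0, "Flink": 516.4},
    "DBLP XML PageRank (12 steps)": {"Map-Reduce": 994.2, "Spark": 149.4, "Flink": 181.6},
    "Matrix multiplication 500x500": {"Map-Reduce": 78.7, "Spark": 83.2, "Flink": 94.9},
}


def speedup_over_mapreduce(row):
    """Return Map-Reduce time divided by each other mode's time.

    A value above 1 means the mode was faster than Map-Reduce;
    a value below 1 means it was slower.
    """
    base = row["Map-Reduce"]
    return {mode: base / secs for mode, secs in row.items() if mode != "Map-Reduce"}


if __name__ == "__main__":
    for name, row in timings.items():
        ratios = speedup_over_mapreduce(row)
        summary = ", ".join(f"{mode} {ratio:.2f}x" for mode, ratio in ratios.items())
        print(f"{name}: {summary}")
```

Ratios below 1 (as for the matrix multiplication test) mean the mode was slower than Map-Reduce on that run.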
> Add support for Hadoop Sequence input format in flink mode
> ----------------------------------------------------------
>
> Key: MRQL-55
> URL: https://issues.apache.org/jira/browse/MRQL-55
> Project: MRQL
> Issue Type: Improvement
> Components: Run-Time/Flink
> Affects Versions: 0.9.4
> Reporter: Leonidas Fegaras
> Assignee: Leonidas Fegaras
> Priority: Minor
> Attachments: MRQL-55.patch
>
>
> The following patch adds support for the Hadoop Sequence input format in Flink
> mode. Before this, we used the Flink binary input format to read/write binary
> files, which was not compatible with other MRQL evaluation modes. The patch
> also fixes the mrql.flink script to get the Flink job manager from
> conf/.yarn-properties instead of conf/.yarn-jobmanager.