[ 
https://issues.apache.org/jira/browse/MRQL-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176330#comment-14176330
 ] 

Leonidas Fegaras commented on MRQL-55:
--------------------------------------

Here are some performance results (in seconds) on a small YARN cluster with 
12 nodes (48 cores). The row numbers in the table refer to these tests:

# PageRank (6 steps), 1M nodes, 10M edges
# K-means clustering (5 steps), 10M points
# DBLP XML PageRank (12 steps), 1.5GB
# Matrix multiplication, 500x500

{noformat}
Test   Map-Reduce (s)   Spark (s)   Flink (s)
---------------------------------------------
1           591.8          145.1       145.3
2          1068.1          184.0       516.4
3           994.2          149.4       181.6
4            78.7           83.2        94.9
{noformat}

K-means is slower in Flink mode than in Spark mode because MRQL doesn't use 
Flink iterations for k-means, although it does use them for PageRank.
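
To put the table in relative terms, here is a minimal Python sketch that computes how much faster each mode is than Map-Reduce, using only the timings reported above. It is not part of MRQL; the names `timings` and `speedup` are invented here for illustration.

```python
# Timings in seconds, copied from the results table:
# (Map-Reduce, Spark, Flink) for each test.
timings = {
    "PageRank (6 steps)": (591.8, 145.1, 145.3),
    "K-means clustering (5 steps)": (1068.1, 184.0, 516.4),
    "DBLP XML PageRank (12 steps)": (994.2, 149.4, 181.6),
    "Matrix multiplication 500x500": (78.7, 83.2, 94.9),
}


def speedup(baseline_secs, mode_secs):
    """Return how many times faster a mode is than the baseline.

    A value below 1 means the mode is slower than the baseline.
    """
    return baseline_secs / mode_secs


for test, (map_reduce, spark, flink) in timings.items():
    print(
        f"{test}: "
        f"Spark {speedup(map_reduce, spark):.2f}x, "
        f"Flink {speedup(map_reduce, flink):.2f}x vs Map-Reduce; "
        f"Flink/Spark time ratio {flink / spark:.2f}"
    )
```

On these numbers, matrix multiplication is the only test where both ratios fall below 1, meaning Map-Reduce was the fastest mode there, and the k-means run takes roughly 2.8 times as long in Flink mode as in Spark mode.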

> Add support for Hadoop Sequence input format in flink mode
> ----------------------------------------------------------
>
>                 Key: MRQL-55
>                 URL: https://issues.apache.org/jira/browse/MRQL-55
>             Project: MRQL
>          Issue Type: Improvement
>          Components: Run-Time/Flink
>    Affects Versions: 0.9.4
>            Reporter: Leonidas Fegaras
>            Assignee: Leonidas Fegaras
>            Priority: Minor
>         Attachments: MRQL-55.patch
>
>
> The following patch adds support for the Hadoop Sequence input format in 
> Flink mode. Before this, we used the Flink binary input format to read/write 
> binary files, which was not compatible with other MRQL evaluation modes. The 
> patch also fixes the mrql.flink script to get the Flink job manager from 
> conf/.yarn-properties instead of conf/.yarn-jobmanager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
