[
https://issues.apache.org/jira/browse/FLINK-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175285#comment-14175285
]
Robert Metzger commented on FLINK-1170:
---------------------------------------
I found the issue while running a very simple "distributed grep" job that just
reads a lot of data and filters it for a certain string.
I had 1 TB of input data on a 24-node cluster.
With the issue, the runtime was very bad (~1 hour); after the fix, I got it
down to less than 4 minutes.
Flink and HDFS seem to use different hostname representations. While HDFS was
just using "worker1", Flink was using the fully qualified hostname
("worker1.hdcluster.company.com"). Because the two names never matched, the
input splits were assigned randomly instead of local to the actual data.
After the fix, the data was read locally most of the time (without costly
network I/O).
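To illustrate why the hostname mismatch defeats locality, here is a minimal
sketch of a locality-aware split assigner, assuming hosts are compared by their
lowercase short name. This is not Flink's actual code or the fix from this
issue; `normalize_host`, `pick_local_split` and `assign` are hypothetical names
used only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def normalize_host(hostname: str) -> str:
    """Hypothetical helper: reduce a hostname to its lowercase short form,
    e.g. "worker1.hdcluster.company.com" -> "worker1"."""
    return hostname.strip().lower().split(".", 1)[0]


def pick_local_split(task_host: str, pending: Dict[str, List[str]],
                     normalize: bool = True) -> Optional[str]:
    """Return the id of a pending split with a replica on task_host, or None."""
    key = normalize_host if normalize else (lambda h: h)
    me = key(task_host)
    for split_id, replica_hosts in pending.items():
        if any(key(h) == me for h in replica_hosts):
            return split_id
    return None


def assign(task_host: str, pending: Dict[str, List[str]],
           normalize: bool = True) -> Tuple[Optional[str], bool]:
    """Hand out a local split if one exists, otherwise any remaining split
    (which then has to be read over the network). Removes the split from
    `pending` and returns (split_id, is_local)."""
    split_id = pick_local_split(task_host, pending, normalize)
    is_local = split_id is not None
    if split_id is None and pending:
        split_id = next(iter(pending))  # fallback: remote read
    if split_id is not None:
        del pending[split_id]
    return split_id, is_local


if __name__ == "__main__":
    # Block locations as HDFS might report them (short names).
    splits = {"split-0": ["worker2"], "split-1": ["worker1"]}
    task = "worker1.hdcluster.company.com"  # name as the task manager sees it
    print(assign(task, dict(splits), normalize=False))  # ('split-0', False)
    print(assign(task, dict(splits), normalize=True))   # ('split-1', True)
```

Without normalization no replica host ever equals the fully qualified name, so
every request falls through to the remote fallback, matching the symptom in
the issue. Comparing only short names is an assumption that holds within one
cluster domain; it could collide if hosts from different domains share a
short name.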
> Localization of InputSplits is not working properly
> ---------------------------------------------------
>
> Key: FLINK-1170
> URL: https://issues.apache.org/jira/browse/FLINK-1170
> Project: Flink
> Issue Type: Bug
> Components: Distributed Runtime
> Reporter: Robert Metzger
> Assignee: Robert Metzger
>
> While running some benchmarks, I found that Flink is not properly assigning
> the InputSplits.
> On my testing cluster, ALL splits were assigned to remote HDFS DataNodes,
> which causes a lot of network I/O.