[
https://issues.apache.org/jira/browse/FLINK-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244135#comment-14244135
]
ASF GitHub Bot commented on FLINK-1287:
---------------------------------------
Github user uce commented on a diff in the pull request:
https://github.com/apache/incubator-flink/pull/258#discussion_r21743890
--- Diff: flink-core/src/main/java/org/apache/flink/api/common/io/LocatableInputSplitAssigner.java ---
@@ -36,89 +33,111 @@
/**
* The locatable input split assigner assigns to each host splits that are local, before assigning
- * splits that are not local.
+ * splits that are not local.
*/
public final class LocatableInputSplitAssigner implements InputSplitAssigner {
private static final Logger LOG = LoggerFactory.getLogger(LocatableInputSplitAssigner.class);
+ // unassigned input splits
+ private final Set<LocatableInputSplitWithCount> unassigned = new HashSet<LocatableInputSplitWithCount>();
+
+ // input splits indexed by host for local assignment
+ private final ConcurrentHashMap<String, LocatableInputSplitChooser> localPerHost = new ConcurrentHashMap<String, LocatableInputSplitChooser>();
+
+ // unassigned splits for remote assignment
--- End diff --
indentation is off
> Improve File Input Split assignment
> -----------------------------------
>
> Key: FLINK-1287
> URL: https://issues.apache.org/jira/browse/FLINK-1287
> Project: Flink
> Issue Type: Improvement
> Components: Local Runtime
> Reporter: Robert Metzger
> Assignee: Fabian Hueske
>
> While running some DFS read-intensive benchmarks, I found that the assignment
> of input splits is not optimal, in particular when numWorker != numDataNodes
> and the replication factor is low (in my case it was 1).
> In this particular example, the input had 40960 splits, of which 4694 were
> read remotely. Spark performed only 2056 remote reads for the same dataset.
> With the replication factor increased to 2, Flink performed only 290 remote
> reads, so users are usually not affected by this issue.
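
To illustrate the kind of local-first assignment discussed in this issue, here is a minimal sketch in Java. It is not the code from the pull request: the class LocalFirstAssigner, its nested Split type, and the heuristic of preferring the local split with the fewest replica hosts are all assumptions made for this example. The fields unassigned and localPerHost only borrow their names from the diff above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not Flink's LocatableInputSplitAssigner. */
final class LocalFirstAssigner {

    /** A split together with the hosts that store a replica (hypothetical type). */
    public static final class Split {
        public final int id;
        public final Set<String> hosts;

        public Split(int id, String... hosts) {
            this.id = id;
            this.hosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    // Splits that have not been handed out yet (insertion order is kept).
    private final Set<Split> unassigned = new LinkedHashSet<>();
    // Splits indexed by every host that holds a replica.
    private final Map<String, List<Split>> localPerHost = new HashMap<>();
    // Number of splits handed to a host without a local replica.
    private int remoteAssignments = 0;

    public LocalFirstAssigner(List<Split> splits) {
        for (Split s : splits) {
            unassigned.add(s);
            for (String h : s.hosts) {
                localPerHost.computeIfAbsent(h, k -> new ArrayList<>()).add(s);
            }
        }
    }

    /** Returns a local split if one is left, otherwise any remaining split, or null. */
    public synchronized Split getNextSplit(String host) {
        Split best = null;
        // Assumed heuristic: among local splits, prefer the one with the fewest
        // replica hosts, since few other hosts could still read it locally.
        for (Split s : localPerHost.getOrDefault(host, Collections.<Split>emptyList())) {
            if (unassigned.contains(s)
                    && (best == null || s.hosts.size() < best.hosts.size())) {
                best = s;
            }
        }
        if (best == null) {
            // No local split left for this host: fall back to a remote read.
            Iterator<Split> it = unassigned.iterator();
            if (!it.hasNext()) {
                return null;
            }
            best = it.next();
            remoteAssignments++;
        }
        unassigned.remove(best);
        return best;
    }

    public synchronized int getRemoteAssignments() {
        return remoteAssignments;
    }
}
```

With a replication factor of 1 every split has exactly one local host, so any host that runs out of local splits has to read remotely. That matches the situation described in the issue, where numWorker != numDataNodes.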