[ https://issues.apache.org/jira/browse/FLINK-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Waury updated FLINK-1141:
--------------------------------
Description:
As soon as a DataSet exceeds a certain size (1000000 tuples in my example) a
Selfjoin with a FlatJoinFunction no longer works. After around a second the
Join, DataSource and DataSink threads are all in Wait and don't perform any
work (no output files are created) and the job never finishes.
If I cut the input size in half it works fine.
My current workaround is to create the DataSet twice and join the two identical
DataSets.
was:
As soon as a DataSet exceeds a certain size (1000000 tuples in my example) a
Selfjoin with a FlatJoinFunction no longer works. After around a second the
Join, DataSource and DataSink threads are all in Wait and don't perform any
work (no output files are created) and the job never finishes.
If I cut the input size in half it works fine.
My current workaround is to create the DataSet twice and join the two identical
DataSets.
> Selfjoin fails after DataSet exceeds certain size
> -------------------------------------------------
>
> Key: FLINK-1141
> URL: https://issues.apache.org/jira/browse/FLINK-1141
> Project: Flink
> Issue Type: Bug
> Components: Distributed Runtime, Local Runtime
> Affects Versions: 0.6.1-incubating, 0.7-incubating
> Environment: LocalExecutionEnvironment (dop=4)
> Reporter: Robert Waury
> Priority: Minor
> Attachments: LargeSelfJoin.java
>
>
> As soon as a DataSet exceeds a certain size (1000000 tuples in my example)
> a Selfjoin with a FlatJoinFunction no longer works. After around a second the
> Join, DataSource and DataSink threads are all in Wait and don't perform any
> work (no output files are created) and the job never finishes.
> If I cut the input size in half it works fine.
> My current workaround is to create the DataSet twice and join the two
> identical DataSets.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)