Re: PartitionNotFoundException when running in yarn-session.

2017-10-13 Thread Niels Basjes
Hi, I did some tests, and it turns out I was really overloading the cluster, which caused the problems. I tried the timeout setting, but that didn't help. Simply 'not overloading' the system did help. Thanks. Niels On Thu, Oct 12, 2017 at 10:42 AM, Ufuk Celebi wrote: > Hey

Re: PartitionNotFoundException when running in yarn-session.

2017-10-12 Thread Ufuk Celebi
Hey Niels, Flink currently restarts the complete job if you have a restart strategy configured: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/restart_strategies.html. I agree that only restarting the required parts of the pipeline is an important optimization. Flink has not

Re: PartitionNotFoundException when running in yarn-session.

2017-10-11 Thread Ufuk Celebi
Hey Niels, any update on this? – Ufuk On Mon, Oct 9, 2017 at 10:16 PM, Ufuk Celebi wrote: > Hey Niels, > > thanks for the detailed report. I don't think that it is related to > the Hadoop or Scala version. I think the following happens: > > - Occasionally, one of your tasks

Re: PartitionNotFoundException when running in yarn-session.

2017-10-09 Thread Ufuk Celebi
Hey Niels, thanks for the detailed report. I don't think that it is related to the Hadoop or Scala version. I think the following happens: - Occasionally, one of your tasks seems to be extremely slow in registering its produced intermediate result (the data shuffled between TaskManagers) -

PartitionNotFoundException when running in yarn-session.

2017-10-09 Thread Niels Basjes
Hi, I'm having some trouble running a Java-based Flink job in a yarn-session. The job itself reads a set of files into a DataStream (I use DataStream because in the future I intend to replace the files with a Kafka feed), then does some parsing and eventually writes the data