Hi Ken,
Changing the parallelism can affect the generation of input splits.
I had a look at BinaryInputFormat, and it adds a bunch of empty input
splits if the number of generated splits is less than the minimum number of
splits (which is equal to the parallelism).
See -->
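The padding behavior described above can be sketched roughly like this (a simplified illustration, not Flink's actual BinaryInputFormat code; the class, record, and method names here are made up):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: if fewer splits were generated than the requested
// minimum (which equals the job parallelism), pad the list with empty
// placeholder splits that read nothing.
class SplitPadding {
    record Split(long start, long length) {}

    static List<Split> padSplits(List<Split> generated, int minNumSplits) {
        List<Split> splits = new ArrayList<>(generated);
        while (splits.size() < minNumSplits) {
            splits.add(new Split(0, 0)); // empty split
        }
        return splits;
    }
}
```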
Good to hear that some of your problems have been solved Ken. For the
UTFDataFormatException it is hard to tell. Usually it says that the input
has been produced using `writeUTF`. Could you maybe provide an example
program which reproduces the problem? Moreover, it would be helpful to see
how the
Hi Till,
I tried out 1.9.0 with my workflow, and I am no longer running into the errors
I described below, which is great!
Just to recap, this is batch, per-job mode on YARN/EMR.
Though I did run into a new issue, related to my previous problem when reading
files written via
Thanks for the update Ken. The input splits seem to
be org.apache.hadoop.mapred.FileSplit. Nothing too fancy jumps out at me.
Internally they use org.apache.hadoop.mapreduce.lib.input.FileSplit which
stores a Path, two long pointers and two string arrays with hosts and host
infos. I would assume
Hi Stephan,
Thanks for responding, comments inline below…
Regards,
— Ken
> On Jun 26, 2019, at 7:50 AM, Stephan Ewen wrote:
>
> Hi Ken!
>
> Sorry to hear you are going through this experience. The major focus on
> streaming so far means that the DataSet API has stability issues at scale.
>
Hi Till,
Thanks for following up.
I’ve got answers to other emails on this thread pending, but wanted to respond
to this one now.
> On Jul 1, 2019, at 7:20 AM, Till Rohrmann wrote:
>
> Quick addition for problem (1): The AkkaRpcActor should serialize the
> response if it is a remote RPC and
Quick addition for problem (1): The AkkaRpcActor should serialize the
response if it is a remote RPC and send an AkkaRpcException if the
response's size exceeds the maximum frame size. This should be visible on
the call site since the future should be completed with this exception. I'm
wondering
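The frame-size limit mentioned here is controlled by `akka.framesize` in flink-conf.yaml; the value below is illustrative, not a recommendation:

```yaml
# flink-conf.yaml
# Default is 10485760b (10 MB); raising it is only a stopgap while the
# cause of the oversized RPC response is investigated.
akka.framesize: 52428800b
```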
Hi Ken,
in order to further debug your problems it would be helpful if you could
share the log files on DEBUG level with us.
For problem (2), I suspect that it has been caused by Flink releasing TMs
too early. This should be fixed with FLINK-10941 which is part of Flink
1.8.1. The 1.8.1 release
Hi Ken again,
In regard to TimeoutException, I just realized that there is no
akka.remote.OversizedPayloadException in your log file. There might be some
other reason causing this.
1. Have you ever tried increasing the configuration "akka.ask.timeout"?
2. Have you ever checked the garbage
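For reference, the setting named in question 1 lives in flink-conf.yaml (the value here is illustrative, not a recommendation):

```yaml
# flink-conf.yaml
# Default is 10 s; a larger value gives slow RPCs (e.g. input split
# requests) more time before the ask future times out.
akka.ask.timeout: 60 s
```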
Hi Ken,
In regard to oversized input splits, it seems to be a rare case beyond my
expectation. However, it should definitely be fixed, since input splits can
be user-defined; we should not assume they must be small.
I agree with Stephan that maybe there is something unexpectedly involved in
the input
Hi Stephan,
We have met similar issues to those Ken described. Will all these issues
hopefully be fixed in 1.9?
Thanks,
Qi
> On Jun 26, 2019, at 10:50 PM, Stephan Ewen wrote:
>
> Hi Ken!
>
> Sorry to hear you are going through this experience. The major focus on
> streaming so far means that
Hi Ken!
Sorry to hear you are going through this experience. The major focus on
streaming so far means that the DataSet API has stability issues at scale.
So, yes, batch mode in the current Flink version can be somewhat tricky.
It is a big focus of Flink 1.9 to fix the batch mode, finally, and by
Hi all,
I’ve been running a somewhat complex batch job (in EMR/YARN) with Flink 1.8.0,
and it regularly fails, but for varying reasons.
Has anyone else achieved stability with 1.8.0 in batch mode and non-trivial
workflows?
Thanks,
— Ken
1. TimeoutException getting input splits
The batch job