Hello wonderful Spark people,

We are testing AWS Kinesis/Spark Streaming (1.5) failover on Hadoop/YARN 2.6
and 2.7.1 and want to understand the expected behavior.

When I manually kill the YARN application master/driver with a Linux kill -9,
YARN automatically relaunches another application master, which successfully
reads in the previous checkpoint.
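
For context, the driver follows the standard StreamingContext.getOrCreate
checkpoint-recovery pattern with the Spark 1.5 Kinesis receiver. A minimal
sketch of the kind of setup we are testing is below; the app name, stream
name, endpoint, region, and checkpoint path are placeholders rather than our
actual values:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

object KinesisFailoverTest {
  // Placeholder values -- substitute your own stream, region, and checkpoint path.
  val checkpointDir = "hdfs:///checkpoints/kinesis-failover-test"
  val batchInterval = Seconds(5)

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("kinesis-failover-test")
    val ssc = new StreamingContext(conf, batchInterval)
    ssc.checkpoint(checkpointDir)

    val stream = KinesisUtils.createStream(
      ssc,
      "kinesis-failover-test",                    // KCL application name
      "my-stream",                                // Kinesis stream name
      "https://kinesis.us-east-1.amazonaws.com",  // endpoint URL
      "us-east-1",                                // region
      InitialPositionInStream.LATEST,
      batchInterval,                              // Kinesis checkpoint interval
      StorageLevel.MEMORY_AND_DISK_2)

    stream.count().print()                        // trivial action per batch
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from the checkpoint if one exists, otherwise build a new context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}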

However, more than half the time the Kinesis executors (5-second batches)
don't resume processing immediately: batches of 0 events queue up for 5-9
minutes before the stream starts being reprocessed again. When I drill down
into the current job that appears to be hanging, it shows all stages/tasks as
complete. I would expect the automatically relaunched application to behave
the same as a manual resubmit with spark-submit, where stream processing
resumes within a minute of launch.

Any input is highly appreciated.

Thanks much,
Heji
