Hello Wonderful Sparks Peoples,

We are testing AWS Kinesis/Spark Streaming (1.5) failover behavior with Hadoop/YARN 2.6 and 2.7.1 and want to understand the expected behavior.
When I manually kill a YARN application master/driver with a Linux kill -9, YARN automatically relaunches another master, which successfully reads in the previous checkpoint. However, more than half the time, the Kinesis executors (5-second batches) don't continue processing immediately; i.e., batches of 0 events are queued for 5-9 minutes before it starts reprocessing the stream again. When I drill down into the current job, which appears to be hanging, it shows all stages/tasks as complete.

I would expect the automatically relaunched application to behave as if I had manually resubmitted it with spark-submit, in which case stream processing continues within a minute of launch.

Any input is highly appreciated.

Thanks much,
Heji