Re: kinesis batches hang after YARN automatic driver restart

2015-11-03 Thread Hster Geguri
Hello Tathagata, Thank you for responding. I have read your excellent article on Zero Data Loss many, many times. The Spark Streaming screen shows KCL consistently pulling events from the stream after half a minute as usual, and those events get queued up. It's always the first two batches (0 events

Re: kinesis batches hang after YARN automatic driver restart

2015-11-03 Thread Tathagata Das
The Kinesis integration uses the KCL libraries underneath, which sometimes take a minute or so to spin up their threads and start getting data from Kinesis. That is under normal conditions. In your case, it could be that, because of the kill and restart, the restarted KCL may be

kinesis batches hang after YARN automatic driver restart

2015-11-02 Thread Hster Geguri
Hello Wonderful Sparks Peoples, We are testing AWS Kinesis/Spark Streaming (1.5) failover behavior with Hadoop/YARN 2.6 and 2.7.1 and want to understand the expected behavior. When I manually kill a YARN application master/driver with a Linux kill -9, YARN will automatically relaunch another master