Hi Spark users,

I have been seeing an issue where receivers enter a "stuck" state after they encounter the following exception: "Error in block pushing thread - java.util.concurrent.TimeoutException: Futures timed out". I am running the application on spark-1.4.1 with kinesis-asl-1.4.
When this happens, the Kinesis.ProcessTask.shardxxxx.MillisBehindLatest metric stops being published to CloudWatch, which indicates that the workers associated with the receiver are no longer checkpointing for the shards they were reading from.

This looks like a bug in the BlockGenerator code here - https://github.com/apache/spark/blob/branch-1.4/streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala#L171 - when pushBlock encounters an exception, in this case the TimeoutException, the block pushing thread exits and no further blocks are pushed. Is this really the expected behavior?

Has anyone else seen this error, and have you also seen receivers stop receiving records afterwards? I'm also trying to find the root cause of the TimeoutException; if anyone has an idea on this, please share.

Thanks,
Bharath
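To make the failure mode I'm describing concrete, here is a minimal, hypothetical sketch (plain Java, not Spark's actual BlockGenerator code - the queue, failure, and counter here are all made up for illustration) of a push loop with no per-iteration error handling, so the first exception terminates the thread and everything queued after it is never pushed:

```java
// Illustrative sketch only; not Spark's real code. It mimics a block
// pushing loop that dies permanently on the first exception, which is
// the "stuck receiver" behavior described above.
import java.util.concurrent.atomic.AtomicInteger;

public class StuckPusherDemo {
    static final AtomicInteger pushed = new AtomicInteger();

    // Hypothetical pushBlock: fails on block 3, standing in for the
    // "Futures timed out" TimeoutException seen in the logs.
    static void pushBlock(int block) {
        if (block == 3) {
            throw new RuntimeException("Futures timed out");
        }
        pushed.incrementAndGet();
    }

    public static void main(String[] args) {
        try {
            // No try/catch inside the loop: one failure ends the loop,
            // so blocks 4 and 5 are never pushed.
            for (int block = 1; block <= 5; block++) {
                pushBlock(block);
            }
        } catch (RuntimeException e) {
            System.out.println("Block pushing thread died: " + e.getMessage());
        }
        System.out.println("Blocks pushed before failure: " + pushed.get());
    }
}
```

Running this prints that only 2 of the 5 blocks were pushed, which matches what I observe: after the one TimeoutException, the receiver's data stops flowing even though the process stays alive.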