As we increase the throughput on our Spark streaming application, we're finding we hit errors with the WriteAheadLog, with errors like this:
16/05/21 20:42:21 WARN scheduler.ReceivedBlockTracker: Exception thrown while writing record: BlockAdditionEvent(ReceivedBlockInfo(0,Some(10),None,WriteAheadLogBasedStoreResult(input-0-1463850002991,Some(10),FileBasedWriteAheadLogSegment(hdfs://x.x.x.x:8020/checkpoint/receivedData/0/log-1463863286930-1463863346930,625283,39790)))) to the WriteAheadLog. java.util.concurrent.TimeoutException: Futures timed out after [5000 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.result(package.scala:107) at org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:81) at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:232) at org.apache.spark.streaming.scheduler.ReceivedBlockTracker.addBlock(ReceivedBlockTracker.scala:87) at org.apache.spark.streaming.scheduler.ReceiverTracker.org$apache$spark$streaming$scheduler$ReceiverTracker$$addBlock(ReceiverTracker.scala:321) at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1$$anonfun$run$1.apply$mcV$sp(ReceiverTracker.scala:500) at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229) at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1.run(ReceiverTracker.scala:498) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 16/05/21 20:42:26 WARN scheduler.ReceivedBlockTracker: Exception thrown while writing record: BlockAdditionEvent(ReceivedBlockInfo(1,Some(10),None,WriteAheadLogBasedStoreResult(input-1-1462971836350,Some(10),FileBasedWriteAheadLogSegment(hdfs://x.x.x.x:8020/checkpoint/receivedData/1/log-1463863313080-1463863373080,455191,60798)))) to the WriteAheadLog. I've found someone else on StackOverflow with the same issue, who's suggested increasing the spark.streaming.driver.writeAheadLog.batchingTimeout setting, but we're not actually seeing significant performance issues on HDFS when the issue occurs. http://stackoverflow.com/questions/34879092/reliability-issues-with-checkpointing-wal-in-spark-streaming-1-6-0 Has anyone else come across this, and any suggested areas we can look at? Thanks, Ewan