A few months ago, as part of an empirical research project, I started investigating several stream processing engines, including but not limited to Spark Streaming.
Since the benchmark should extend beyond performance metrics such as throughput and latency, I have also focused on fault tolerance, in particular the rate of data items lost due to various faults. In this context, I have the following questions:

1. If the Driver fails, all Executors and the blocks they keep in memory are lost. State is maintained in HDFS, for example through checkpointing or synchronous Write Ahead Logs. But what happens to blocks that have been received by the Receiver but not yet processed? Assume an unreliable input data source.

2. To simulate this scenario and determine whether, and how many, data items are lost, how exactly can I trigger a Driver failure after the application has been running for a certain amount of time?

3. What other fault tolerance scenarios might result in data items being lost? I have also looked into back-pressure, but unlike Storm, which uses a fail-fast back-pressure strategy, Spark Streaming handles back-pressure gracefully.

Thanks a lot in advance!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-fault-tolerance-benchmark-tp27528.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
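P.S. To make the setup behind question 1 concrete, here is a minimal sketch of the recovery configuration I am referring to: checkpointing plus the receiver write-ahead log. The checkpoint path and application name are assumptions for illustration; the config key and `StreamingContext.getOrCreate` are the standard Spark Streaming APIs.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FaultToleranceBenchmark {
  // Assumed HDFS path for this sketch
  val checkpointDir = "hdfs:///checkpoints/fault-tolerance-benchmark"

  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("fault-tolerance-benchmark")
      // Persist received blocks to the write-ahead log on HDFS before
      // acknowledging them, so they survive Driver/Executor loss
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint(checkpointDir)
    // ... define input DStreams and transformations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from an existing checkpoint if one is present,
    // otherwise build a fresh context
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```

My understanding is that even with this in place, blocks held only in the Receiver's memory (unreliable source, WAL write not yet completed) are the ones at risk, which is exactly what I want to measure.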
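For question 2, the approach I am currently considering is simply killing the driver JVM from the outside after a fixed delay. This is a hypothetical sketch for client mode; the application name, the 600-second delay, and the `pgrep` pattern are assumptions:

```shell
#!/bin/sh
# Kill the Spark driver JVM after the application has run for a while.
# Run on the machine hosting the driver (client deploy mode).
APP_NAME="fault-tolerance-benchmark"   # assumed application name

sleep 600  # let the application process data for 10 minutes first

# Locate the driver (SparkSubmit) process for this application and kill it
# with SIGKILL so no graceful shutdown hooks can run.
DRIVER_PID=$(pgrep -f "SparkSubmit.*${APP_NAME}" | head -n 1)
kill -9 "$DRIVER_PID"
```

In cluster deploy mode, submitting with `--supervise` (standalone cluster manager) should then restart the driver automatically, which would let me compare the items received before the kill against the items processed after recovery. Corrections welcome if there is a more controlled way to inject this fault.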