Vladimir Feinberg created SPARK-16899:
-----------------------------------------

Summary: Structured Streaming Checkpointing Example invalid
Key: SPARK-16899
URL: https://issues.apache.org/jira/browse/SPARK-16899
Project: Spark
Issue Type: Bug
Components: Documentation
Reporter: Vladimir Feinberg
Priority: Critical

The structured streaming checkpointing example at the bottom of the page (https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing) has the following excerpt:

```
aggDF
  .writeStream
  .outputMode("complete")
  .option("checkpointLocation", "path/to/HDFS/dir")
  .format("memory")
  .start()
```

But memory sinks are not fault-tolerant. Indeed, trying this out, I get the following error:

```
This query does not support recovering from checkpoint location. Delete /tmp/streaming.metadata-625631e5-baee-41da-acd1-f16c82f68a40/offsets to start over.;
```

The documentation should be changed to demonstrate checkpointing for a non-aggregation streaming task, and explicitly mention there is no way to checkpoint aggregates.
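
For illustration only, here is a rough sketch of the kind of non-aggregation example the report asks for. It is not taken from the Spark documentation or from any fix for this issue. It assumes a text file source and a Parquet file sink in append mode, and it assumes Spark 2.0 or later is on the classpath. The object name and all three directory paths are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, length}

// Hypothetical example object; names and paths are placeholders, not from the docs.
object CheckpointedFileSinkExample {
  val inputDir = "path/to/HDFS/input/dir"            // assumed: directory of text files
  val outputDir = "path/to/HDFS/output/dir"          // assumed: Parquet output location
  val checkpointDir = "path/to/HDFS/checkpoint/dir"  // assumed: checkpoint location

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CheckpointedFileSinkExample")
      .getOrCreate()

    // Text file source: yields a streaming DataFrame with one string column, "value".
    val lines = spark.readStream.text(inputDir)

    // A plain filter, with no aggregation, so append output mode applies.
    val nonEmptyLines = lines.filter(length(col("value")) > 0)

    // File sink with a checkpoint location, instead of the memory sink.
    val query = nonEmptyLines.writeStream
      .outputMode("append")
      .format("parquet")
      .option("checkpointLocation", checkpointDir)
      .option("path", outputDir)
      .start()

    query.awaitTermination()
  }
}
```

Restarting the same query with the same `checkpointLocation` is what lets it pick up from the recorded offsets. The memory sink in the quoted excerpt is what triggered the error above.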