Vladimir Feinberg created SPARK-16899:
-----------------------------------------

             Summary: Structured Streaming Checkpointing Example invalid
                 Key: SPARK-16899
                 URL: https://issues.apache.org/jira/browse/SPARK-16899
             Project: Spark
          Issue Type: Bug
          Components: Documentation
            Reporter: Vladimir Feinberg
            Priority: Critical


The structured streaming checkpointing example at the bottom of the page 
(https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing)
 has the following excerpt:
```
aggDF
   .writeStream
   .outputMode("complete")
   .option("checkpointLocation", "path/to/HDFS/dir")
   .format("memory")
   .start()
```

But memory sinks are not fault-tolerant. Indeed, trying this out, I get the 
following error: 
```
This query does not support recovering from checkpoint location. Delete 
/tmp/streaming.metadata-625631e5-baee-41da-acd1-f16c82f68a40/offsets to start 
over.;
```

The documentation should be changed to demonstrate checkpointing for a 
non-aggregation streaming task, and explicitly mention there is no way to 
checkpoint aggregates.
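
As a rough illustration of what such a non-aggregation example might look like, here is a minimal sketch. It assumes Spark 2.0's `SparkSession` API, a text file source, and the Parquet file sink, which the programming guide lists as fault-tolerant. The object name and the input and output directories are hypothetical placeholders, and I have not checked this against every Spark version.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; name and paths are placeholders, not from the guide.
object CheckpointedFileSinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("CheckpointedFileSinkSketch")
      .getOrCreate()

    // Stream text files from a directory; each row has a single "value" column.
    val lines = spark.readStream.text("path/to/input/dir")

    // A plain filter: no aggregation is involved in this query.
    val nonEmpty = lines.filter("length(value) > 0")

    // Write to a file sink in append mode, with a checkpoint location
    // so the query can be restarted after a failure.
    val query = nonEmpty.writeStream
      .outputMode("append")
      .format("parquet")
      .option("checkpointLocation", "path/to/HDFS/dir")
      .option("path", "path/to/output/dir")
      .start()

    query.awaitTermination()
  }
}
```

The only differences from the excerpt above are the sink (`parquet` instead of `memory`) and the output mode (`append` instead of `complete`), since the query no longer aggregates.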


