Hello, I have an issue with my PySpark job related to checkpointing. The job fails with the following error:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 16997.0 failed 4 times, most recent failure: Lost task 3.3 in stage 16997.0 (TID 206609, 10.XXX, executor 4): java.lang.IllegalStateException: Error reading delta file file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta of HDFSStateStoreProvider[id = (op=0,part=3),dir = file:/opt/spark/workdir/query6/checkpointlocation/state/0/3]: *file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta does not exist*

The job is based on Spark 3.0.1 and Structured Streaming. The Spark cluster (1 driver and 6 executors) runs without HDFS, and we would prefer not to manage an HDFS cluster if possible.

Is a distributed filesystem necessary? What are the possible solutions or workarounds?

Thanks in advance,
David