Does it work with standard AWS S3 and its new strong read-after-write consistency model <https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/>?
On Wed, Dec 23, 2020 at 18:48, David Morin <morin.david....@gmail.com> wrote:

> Thanks.
> My Spark applications run on nodes based on Docker images, but in
> standalone mode (1 driver, n workers).
> Can we use S3 directly with a consistency add-on such as S3Guard (s3a) or
> AWS consistent view
> <https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html>?
>
> On Wed, Dec 23, 2020 at 17:48, Lalwani, Jayesh <jlalw...@amazon.com> wrote:
>
>> Yes. It is necessary to have a distributed file system because all the
>> workers need to read/write to the checkpoint. The distributed file system
>> has to be immediately consistent: when one node writes to it, the other
>> nodes should be able to read it immediately.
>>
>> The solutions/workarounds depend on where you are hosting your Spark
>> application.
>>
>> *From: *David Morin <morin.david....@gmail.com>
>> *Date: *Wednesday, December 23, 2020 at 11:08 AM
>> *To: *"user@spark.apache.org" <user@spark.apache.org>
>> *Subject: *[EXTERNAL] Spark 3.0.1 Structured streaming - checkpoints fail
>>
>> Hello,
>>
>> I have an issue with my PySpark job related to checkpoint.
>>
>> Caused by: org.apache.spark.SparkException: Job aborted due to stage
>> failure: Task 3 in stage 16997.0 failed 4 times, most recent failure: Lost
>> task 3.3 in stage 16997.0 (TID 206609, 10.XXX, executor 4):
>> java.lang.IllegalStateException: Error reading delta file
>> file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta of
>> HDFSStateStoreProvider[id = (op=0,part=3),dir =
>> file:/opt/spark/workdir/query6/checkpointlocation/state/0/3]:
>> *file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta
>> does not exist*
>>
>> This job is based on Spark 3.0.1 and Structured Streaming.
>>
>> This Spark cluster (1 driver and 6 executors) works without HDFS, and we
>> don't want to manage an HDFS cluster if possible.
>>
>> Is it necessary to have a distributed filesystem? What are the different
>> solutions/workarounds?
>>
>> Thanks in advance
>>
>> David
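
A side note on the stack trace quoted above: the failing path uses the `file:` scheme, which normally means each executor's own local disk, so a state file written on one node would not be visible to the others. Below is a minimal sketch in plain Python (not a Spark API) of a pre-flight check on the value passed as `checkpointLocation`. The helper name, the set of schemes treated as node-local, and the bucket name are assumptions made for illustration only.

```python
from urllib.parse import urlparse

# Assumption: schemes treated here as "private to one node".
# A file: path may still be shared if it sits on a common mount (e.g. NFS);
# this check cannot see that and only looks at the URI scheme.
NODE_LOCAL_SCHEMES = {"", "file"}


def is_node_local(checkpoint_location: str) -> bool:
    """Return True if the URI scheme suggests a filesystem local to one node."""
    scheme = urlparse(checkpoint_location).scheme.lower()
    return scheme in NODE_LOCAL_SCHEMES


if __name__ == "__main__":
    candidates = [
        # Path taken from the stack trace in this thread.
        "file:/opt/spark/workdir/query6/checkpointlocation",
        # Hypothetical bucket name, for illustration only.
        "s3a://hypothetical-bucket/checkpoints/query6",
    ]
    for location in candidates:
        verdict = "node-local, not shared" if is_node_local(location) else "shared store"
        print(f"{location}: {verdict}")
```

Such a check only flags the likely cause seen here. Whether a given shared store is consistent enough for checkpoints is the separate question discussed in this thread.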