Thanks Jungtaek. OK, got it. I'll test it and check whether the loss of efficiency is acceptable.
On Wed, Dec 23, 2020 at 11:29 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Please refer to my previous answer:
> https://lists.apache.org/thread.html/r7dfc9e47cd9651fb974f97dde756013fd0b90e49d4f6382d7a3d68f7%40%3Cuser.spark.apache.org%3E
> We should probably add this to the Structured Streaming guide doc. We didn't need it before, as it simply didn't work with the eventually consistent model; now it works, but it is very inefficient.
>
> On Thu, Dec 24, 2020 at 6:16 AM David Morin <morin.david....@gmail.com> wrote:
>
>> Does it work with the standard AWS S3 solution and its new consistency model
>> <https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/>?
>>
>> On Wed, Dec 23, 2020 at 6:48 PM, David Morin <morin.david....@gmail.com> wrote:
>>
>>> Thanks.
>>> My Spark applications run on nodes based on Docker images, but in standalone mode (1 driver, n workers).
>>> Can we use S3 directly with a consistency add-on like S3Guard (s3a) or AWS Consistent View
>>> <https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html>?
>>>
>>> On Wed, Dec 23, 2020 at 5:48 PM, Lalwani, Jayesh <jlalw...@amazon.com> wrote:
>>>
>>>> Yes, it is necessary to have a distributed file system, because all the workers need to read from and write to the checkpoint. The distributed file system has to be immediately consistent: when one node writes to it, the other nodes should be able to read it immediately.
>>>>
>>>> The solutions/workarounds depend on where you are hosting your Spark application.
>>>>
>>>> From: David Morin <morin.david....@gmail.com>
>>>> Date: Wednesday, December 23, 2020 at 11:08 AM
>>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: [EXTERNAL] Spark 3.0.1 Structured streaming - checkpoints fail
>>>>
>>>> Hello,
>>>>
>>>> I have an issue with my PySpark job related to checkpoints:
>>>>
>>>> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 16997.0 failed 4 times, most recent failure: Lost task 3.3 in stage 16997.0 (TID 206609, 10.XXX, executor 4): java.lang.IllegalStateException: Error reading delta file file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta of HDFSStateStoreProvider[id = (op=0,part=3),dir = file:/opt/spark/workdir/query6/checkpointlocation/state/0/3]: file:/opt/spark/workdir/query6/checkpointlocation/state/0/3/1.delta does not exist
>>>>
>>>> This job is based on Spark 3.0.1 and Structured Streaming.
>>>> This Spark cluster (1 driver and 6 executors) works without HDFS, and we don't want to manage an HDFS cluster if possible.
>>>> Is it necessary to have a distributed filesystem? What are the different solutions/workarounds?
>>>>
>>>> Thanks in advance,
>>>> David
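[Editor's note] To make the workaround discussed in this thread concrete, here is a minimal, non-authoritative sketch of pointing a Structured Streaming checkpoint at S3 via the s3a connector, rather than at a node-local `file:/` path (which is what triggers the `1.delta does not exist` error above, since each executor sees only its own local disk). The bucket name, source, and sink below are hypothetical placeholders, and the sketch assumes a Spark 3.0.1 cluster with the `hadoop-aws` and AWS SDK jars on the classpath:

```python
# Sketch only: assumes a running Spark cluster and a hypothetical
# bucket "my-bucket" reachable by the driver and every executor.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("checkpoint-on-s3a-sketch")
    .getOrCreate()
)

# Placeholder source; in the thread this would be the real input stream.
stream = spark.readStream.format("rate").load()

query = (
    stream.writeStream
    .format("console")  # placeholder sink
    # The key point from the thread: the checkpoint location must be a
    # shared, immediately consistent filesystem visible to all workers,
    # not a local file:/ path. With S3's strong read-after-write
    # consistency (announced Dec 2020), plain s3a works for correctness,
    # though as noted above it can be inefficient.
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/query6")
    .start()
)
```

Before S3 became strongly consistent, an add-on such as S3Guard or EMR Consistent View was needed on top of s3a; after the change, those layers are no longer required for correctness, which matches Jungtaek's comment that "now it works anyway but is very inefficient."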