Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Jungtaek Lim
One more thing I missed: the commit metadata for batch N must be written "after" all other parts of the checkpoint have been successfully written for batch N. So you seem to find a way to do asynchronous commit on "custom state store provider" - as I commented before, it's being tied to the task

Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Rohit Agrawal
Thank you for the reply. For our use case, it's okay not to have exactly-once semantics. Given that we don't need exactly-once: a) Are there any negative implications if one were to use a custom state store provider that committed asynchronously under the hood? b) Is there any other option

Re: Checkpointing in Spark Structured Streaming

2021-03-22 Thread Jungtaek Lim
I see some points that make async checkpointing tricky to add to micro-batch; one example is "end to end exactly-once", as the commit phase in the sink for batch N can run "after" batch N + 1 has started, and the write for batch N + 1 can happen before batch N is committed. state store

Checkpointing in Spark Structured Streaming

2021-03-22 Thread Rohit Agrawal
Hi, I have been experimenting with Continuous mode and Micro-batch mode in Spark Structured Streaming. When checkpointing to S3 instead of the local file system, we see that Continuous mode has no change in latency (expected due to async checkpointing); however, the Micro-batch mode

Re: K8s Integration test is unable to run because of the unavailable libs

2021-03-22 Thread Yikun Jiang
Hey Yi Wu, it looks like it's just an apt installation problem; we should run apt update to refresh the local package cache before we install "gnupg". I opened an issue on JIRA [1] and am trying to fix it in [2]. Hope this helps. [1] https://issues.apache.org/jira/browse/SPARK-34820 [2]

K8s Integration test is unable to run because of the unavailable libs

2021-03-22 Thread Yi Wu
Hi devs, It seems the K8s integration test has been unable to run recently because of unavailable libs: Err:20 http://security.debian.org/debian-security buster/updates/main amd64 libldap-common all 2.4.47+dfsg-3+deb10u4 404 Not Found [IP: 151.101.194.132 80] Err:21