Re: Missing data in spark output

Emil Ejbyfeldt Tue, 18 Oct 2022 22:40:44 -0700

Hi,

We have observed similar behavior in older versions of Spark, but we are currently using 3.3.0, where we have not seen such issues.


Which version of Spark and Hadoop are you using?

On 18/10/2022 19:48, Sandeep Vinayak wrote:

Hello Everyone,
We have recently been observing intermittent data loss in Spark with output to GCS (Google Cloud Storage). When there are missing rows, they are accompanied by duplicate rows. A re-run of the job doesn't have any duplicate or missing rows. Since it's hard to debug, we are first trying to understand the potential theoretical root cause of this issue. Could this be a GCS-specific issue where GCS might not be handling consistency well? Any tips will be super helpful.
Thanks,
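
One way to pin down the symptom described above (missing rows accompanied by duplicate rows, with a clean re-run) is to compare the suspect output against the re-run as multisets of rows. The following is only a minimal sketch in plain Python: the function name compare_outputs and the sample rows are hypothetical, and it assumes both outputs are small enough to load into memory as hashable tuples.

```python
from collections import Counter


def compare_outputs(good_rows, suspect_rows):
    """Compare two job outputs as multisets of rows.

    Returns (missing, extra):
      missing: rows that occur fewer times in the suspect run than in the good run
      extra:   rows that occur more times in the suspect run than in the good run
    """
    good = Counter(good_rows)
    suspect = Counter(suspect_rows)
    # Counter subtraction keeps only positive counts, which is the
    # multiset difference we want in each direction.
    missing = good - suspect
    extra = suspect - good
    return missing, extra


# Hypothetical sample data: the re-run is treated as the reference output.
good_run = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
suspect_run = [("a", 1), ("b", 2), ("b", 2), ("d", 4)]

missing, extra = compare_outputs(good_run, suspect_run)
print("missing:", dict(missing))
print("duplicated:", dict(extra))
```

Knowing exactly which rows are missing and which are duplicated, and whether they group by output file, may help narrow down where in the write path the problem occurs. That interpretation is a guess, not something confirmed in this thread.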


---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
