Missing data in spark output

Sandeep Vinayak Tue, 18 Oct 2022 10:49:00 -0700

Hello Everyone,

We have recently been observing intermittent data loss in Spark jobs that
write their output to GCS (Google Cloud Storage). When rows are missing, the
output also contains duplicate rows. Re-running the same job produces output
with no duplicate or missing rows. Since this is hard to debug, we are first
trying to understand the potential root cause in theory. Could this be a
GCS-specific issue, where GCS is not handling consistency well? Any tips would
be super helpful.
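
To make the symptom concrete, here is a minimal sketch of how the output of an affected run could be compared against a clean re-run. It assumes every row carries a unique key and that the keys from both runs fit in memory. The function name `diff_runs` and the sample keys are hypothetical, not taken from the actual job.

```python
from collections import Counter


def diff_runs(affected_keys, rerun_keys):
    """Compare row keys from an affected run against a clean re-run.

    Returns (missing, duplicated):
      missing    - keys the re-run has but the affected run lacks
      duplicated - keys the affected run has more often than the re-run
    """
    affected = Counter(affected_keys)
    rerun = Counter(rerun_keys)
    # Counter subtraction keeps only positive counts.
    missing = sorted((rerun - affected).elements())
    duplicated = sorted((affected - rerun).elements())
    return missing, duplicated


# Hypothetical sample: "c" was lost and "b" was written twice.
missing, duplicated = diff_runs(["a", "b", "b", "d"], ["a", "b", "c", "d"])
print("missing:", missing)
print("duplicated:", duplicated)
```

A comparison like this would at least show whether the number of missing rows matches the number of duplicate rows in a given run.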


Thanks,
