Re: Missing data in spark output

2022-10-25 Thread Steve Loughran
> On Wed, Oct 19, 2022 at 8:18 AM Martin Andersson <martin.anders...@kambi.com> wrote: >> Is your spark job batch or streaming? >> -- >> *From:* Sandeep Vinayak >> *Sent:* Tuesday, October 18, 2022 19:48

Re: Missing data in spark output

2022-10-21 Thread Chris Nauroth
> *To:* dev@spark.apache.org > *Subject:* Missing data in spark output > > Hello Everyone, > > We are

Re: Missing data in spark output

2022-10-19 Thread Martin Andersson
Is your spark job batch or streaming? From: Sandeep Vinayak Sent: Tuesday, October 18, 2022 19:48 To: dev@spark.apache.org Subject: Missing data in spark output

Re: Missing data in spark output

2022-10-18 Thread Emil Ejbyfeldt
Hi, We have observed similar behavior in older versions of Spark, but we are currently using 3.3.0, where we have not seen such issues. Which versions of Spark and Hadoop are you using? On 18/10/2022 19:48, Sandeep Vinayak wrote: Hello Everyone, We are recently observing an intermittent

Missing data in spark output

2022-10-18 Thread Sandeep Vinayak
Hello Everyone, We have recently been observing intermittent data loss in Spark when writing output to GCS (Google Cloud Storage). When rows are missing, they are accompanied by duplicate rows. A re-run of the job produces no duplicate or missing rows. Since it's hard to debug, we are first