Re: Parallelising JDBC reads in spark

2020-05-25 Thread Manjunath Shetty H
om: Georg Heiler <georg.kf.hei...@gmail.com>
Sent: Monday, May 25, 2020 11:52 AM
To: Manjunath Shetty H <manjunathshe...@live.com>
Cc: Mike Artz <michaelea...@gmail.com>; user <user@spark.apache.org>
Subject: Re: Parallelising JDBC reads in spark

Re: Parallelising JDBC reads in spark

2020-05-25 Thread Manjunath Shetty H
Well, you seem to have performance and consistency problems. With a CDC tool that fits your database, you might be able to fix both. However, streaming the change events from the database log might be a bit more complicated. Tools like https://debezium.io
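To make the change data capture suggestion concrete, here is a minimal sketch of replaying change events onto a previously loaded snapshot. It assumes events shaped like the Debezium envelope (`op`, `before`, `after`); the function name `apply_change_events`, the `id` key column, and the sample rows are hypothetical and not taken from the thread.

```python
def apply_change_events(snapshot, events, key="id"):
    """Replay change events, in order, on top of a snapshot keyed by `key`.

    Assumes Debezium-style envelopes: op is "c" (create), "u" (update),
    "d" (delete) or "r" (snapshot read); `after` holds the new row and
    `before` holds the old row.
    """
    state = dict(snapshot)
    for event in events:
        op = event["op"]
        if op in ("c", "u", "r"):
            row = event["after"]
            state[row[key]] = row          # upsert the latest row image
        elif op == "d":
            state.pop(event["before"][key], None)  # drop the deleted row
        else:
            raise ValueError(f"unknown op: {op!r}")
    return state


# Hypothetical sample data, for illustration only.
snapshot = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
events = [
    {"op": "c", "before": None, "after": {"id": 3, "name": "c"}},
    {"op": "u", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "a2"}},
    {"op": "d", "before": {"id": 2, "name": "b"}, "after": None},
]
result = apply_change_events(snapshot, events)
```

Because every change arrives as an ordered event from the database log, no row can be skipped just because it was modified between two batch reads.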

Re: Parallelising JDBC reads in spark

2020-05-25 Thread Georg Heiler
---
> *From:* Georg Heiler
> *Sent:* Monday, May 25, 2020 11:14 AM
> *To:* Manjunath Shetty H
> *Cc:* Mike Artz; user
> *Subject:* Re: Parallelising JDBC reads in spark
>
> Why don't you apply proper change data capture?
> This will be more complex though.

Re: Parallelising JDBC reads in spark

2020-05-25 Thread Manjunath Shetty H
Hi Georg,

Thanks for the response. Can you please elaborate on what you mean by change data capture?

Thanks,
Manjunath

From: Georg Heiler
Sent: Monday, May 25, 2020 11:14 AM
To: Manjunath Shetty H
Cc: Mike Artz; user
Subject: Re: Parallelising JDBC reads in spark

Re: Parallelising JDBC reads in spark

2020-05-24 Thread Georg Heiler
k
> - by the time second task starts, e has been updated, so the row order changes
> - As f moves up, it will completely get missed in the fetch
>
> Thanks
> Manjunath
>
> --
> *From:* Mike Artz
> *Sent:* Monday, May 25, 2
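The race in the quoted bullets can be reproduced without a database. The sketch below is a plain-Python simulation, not Spark code. The row names e and f come from the thread; the ascending `updated_at`-style ordering, the ten-row table, the page size of 5, and the helper names are assumptions made for illustration.

```python
ROW_IDS = "abcdefghij"  # hypothetical table of ten rows


def fetch_page(table, page, page_size):
    """Emulate ROW_NUMBER() paging over rows ordered by a mutable timestamp."""
    ordered = sorted(table.items(), key=lambda kv: kv[1])
    start = page * page_size
    return [row_id for row_id, _ in ordered[start:start + page_size]]


def run_two_tasks(update_between):
    """Run two paged fetches; optionally update row e between them."""
    # Assumption: each row's timestamp initially equals its position.
    table = {row_id: ts for ts, row_id in enumerate(ROW_IDS)}
    first = fetch_page(table, 0, 5)       # task 1 reads row numbers 1-5
    if update_between:
        table["e"] = 100                  # e is updated and moves to the end
    second = fetch_page(table, 1, 5)      # task 2 reads row numbers 6-10
    return first, second


first, second = run_two_tasks(update_between=True)
missed = sorted(set(ROW_IDS) - set(first) - set(second))
duplicated = sorted(set(first) & set(second))
```

In this run, f slides into the range task 1 has already read, so neither task returns it, while e is returned by both. Any paging scheme whose row numbers depend on a column that changes during the batch has this weakness.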

Re: Parallelising JDBC reads in spark

2020-05-24 Thread Manjunath Shetty H
Shetty H
Cc: user
Subject: Re: Parallelising JDBC reads in spark

Does anything different happen when you set the isolationLevel to do dirty reads, i.e. "READ_UNCOMMITTED"?

On Sun, May 24, 2020 at 7:50 PM Manjunath Shetty H <manjunathshe...@live.com> wrote: Hi, We a

Re: Parallelising JDBC reads in spark

2020-05-24 Thread Mike Artz
Does anything different happen when you set the isolationLevel to do dirty reads, i.e. "READ_UNCOMMITTED"?

On Sun, May 24, 2020 at 7:50 PM Manjunath Shetty H wrote:
> Hi,
>
> We are writing an ETL pipeline using Spark that fetches data from SQL
> Server in batch mode (every 15 minutes). Problem

Parallelising JDBC reads in spark

2020-05-24 Thread Manjunath Shetty H
Hi,

We are writing an ETL pipeline using Spark that fetches data from SQL Server in batch mode (every 15 minutes). The problem we are facing is how to parallelise a single-table read into multiple tasks without missing any data.

We have tried this:

* Use `ROW_NUMBER` window function in
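One common alternative to row-number paging is to split the read on a key that never changes, so a row cannot move between partitions while the tasks run. The sketch below only builds the WHERE clauses. It assumes the table has an immutable integer key, here called `id`, which the thread does not confirm; the function name and the bounds are hypothetical.

```python
def key_range_predicates(lower, upper, num_partitions, column="id"):
    """Split the half-open key range [lower, upper) into contiguous,
    non-overlapping WHERE clauses (at most num_partitions of them)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    step = -(-(upper - lower) // num_partitions)  # ceiling division
    predicates = []
    start = lower
    while start < upper:
        end = min(start + step, upper)
        predicates.append(f"{column} >= {start} AND {column} < {end}")
        start = end
    return predicates


predicates = key_range_predicates(0, 10, 3)

# With PySpark, a list like this can be passed to DataFrameReader.jdbc,
# which runs one task per predicate (url, table and props are placeholders):
#   df = spark.read.jdbc(url, table, predicates=predicates, properties=props)
```

Each predicate is still executed as a separate query, so this does not give a single consistent snapshot across tasks. It only removes the failure mode where a row changes position between fetches.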