Re: Control Sqoop job from Spark job

2019-09-02 Thread Chris Teoh
Hey Chetan,

How many database connections are you anticipating in this job? Is this for every row in the dataframe?

Kind regards
Chris

On Mon., 2 Sep. 2019, 9:11 pm Chetan Khatri, wrote:
> Hi Chris, Thanks for the email. You're right. but it's like Sqoop job gets
> launched based on

Re: Control Sqoop job from Spark job

2019-09-02 Thread Mich Talebzadeh
Hi,

Just to clarify, the JDBC connection to an RDBMS from Spark is slow? This one reads from an Oracle table with 4 connections in parallel, assuming there is a primary key on the Oracle table:

//
// Get maxID first
//
val minID = HiveContext.read.format("jdbc").options(Map("url" ->
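The snippet above is cut off in the archive. A minimal self-contained sketch of the same idea, using the current SparkSession API rather than HiveContext, might look like the following. The connection URL, table name, `id` column, and credentials are all placeholders, not values from the thread:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parallel-jdbc-read").getOrCreate()

val url = "jdbc:oracle:thin:@//dbhost:1521/ORCL"  // placeholder URL

// First fetch the bounds of the partition column with a single connection.
val bounds = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", "(SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM mytable)")
  .option("user", "someuser")
  .option("password", "***")
  .load()
  .collect()(0)

val minID = bounds.getDecimal(0).longValue
val maxID = bounds.getDecimal(1).longValue

// Then read the table itself with 4 partitions, i.e. 4 parallel JDBC
// connections, split on the (numeric) primary key.
val df = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", "mytable")
  .option("partitionColumn", "id")  // must be numeric, date, or timestamp
  .option("lowerBound", minID)
  .option("upperBound", maxID)
  .option("numPartitions", 4)
  .option("user", "someuser")
  .option("password", "***")
  .load()
```

Each of the 4 partitions issues its own range query (`id BETWEEN ... AND ...`), which is what gives the parallelism Mich refers to; without `partitionColumn`/`numPartitions`, Spark's JDBC source reads through a single connection.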

Re: Control Sqoop job from Spark job

2019-09-02 Thread Chetan Khatri
Hi Mich,

A JDBC connection, which is similar to Sqoop, takes time and could not do parallelism.

On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh wrote:
> Spark is an excellent ETL tool to lift data from source and put it in
> target. Spark uses JDBC connection similar to Sqoop. I don't see the need

Re: Control Sqoop job from Spark job

2019-09-02 Thread Chetan Khatri
Hi Chris,

Thanks for the email. You're right, but the Sqoop job gets launched based on dataframe values in the Spark job. Certainly it can be isolated and broken out.

On Sat, Aug 31, 2019 at 8:07 AM Chris Teoh wrote:
> I'd say this is an uncommon approach, could you use a workflow/scheduling
>

Re: [ANNOUNCE] Announcing Apache Spark 2.4.4

2019-09-02 Thread Wesley Peng
on 2019/9/2 5:54, Dongjoon Hyun wrote:

We are happy to announce the availability of Spark 2.4.4!

Spark 2.4.4 is a maintenance release containing stability fixes. This release is based on the branch-2.4 maintenance branch of Spark. We strongly recommend all 2.4 users to upgrade to this stable