Hey Chetan,
How many database connections are you anticipating in this job? Is this for
every row in the dataframe?
Kind regards
Chris
On Mon., 2 Sep. 2019, 9:11 pm Chetan Khatri wrote:
> Hi Chris, thanks for the email. You're right, but it's like a Sqoop job gets
> launched based on
Hi,
Just to clarify, are you saying that a JDBC connection to an RDBMS from Spark
is slow?
The example below reads from an Oracle table with 4 connections in parallel,
assuming there is a primary key on the Oracle table (the connection values are
placeholders):
//
// Get minID first (url, user, and password values are placeholders)
//
val minID = HiveContext.read.format("jdbc").options(Map(
  "url" -> jdbcUrl, "user" -> dbUser, "password" -> dbPassword,
  "dbtable" -> "(SELECT MIN(id) AS minID FROM mytable)"))
  .load().collect()(0).getDecimal(0).toString
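For reference, here is a minimal sketch of the parallel read pattern being
described, using SparkSession rather than the older HiveContext; the URL,
credentials, table name, and key column are placeholder assumptions, not
values from this thread. Spark splits the read into numPartitions range
queries on partitionColumn between lowerBound and upperBound, so 4 partitions
means 4 concurrent connections to Oracle:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ParallelJdbcRead").getOrCreate()

// Placeholder connection details -- assumptions, not from this thread.
val opts = Map(
  "url"      -> "jdbc:oracle:thin:@//dbhost:1521/ORCL",
  "user"     -> "dbUser",
  "password" -> "dbPassword")

// Fetch one bound of the primary key with a one-row JDBC query.
def bound(fn: String): String = spark.read.format("jdbc")
  .options(opts + ("dbtable" -> s"(SELECT $fn(id) AS b FROM mytable)"))
  .load().collect()(0).getDecimal(0).longValue.toString

// Split the read into 4 range scans on "id", i.e. 4 parallel connections.
val df = spark.read.format("jdbc")
  .options(opts ++ Map(
    "dbtable"         -> "mytable",
    "partitionColumn" -> "id",
    "lowerBound"      -> bound("MIN"),
    "upperBound"      -> bound("MAX"),
    "numPartitions"   -> "4"))
  .load()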
Hi Mich, a JDBC connection, which is similar to Sqoop, takes time, and I could
not get parallelism working.
On Sat, Aug 31, 2019 at 12:17 PM Mich Talebzadeh
wrote:
> Spark is an excellent ETL tool to lift data from source and put it in
> target. Spark uses JDBC connection similar to Sqoop. I don't see the need
Hi Chris, thanks for the email. You're right, but it's like a Sqoop job gets
launched based on dataframe values in the Spark job, as sketched below.
Certainly it can be isolated and broken out.
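As a rough sketch of what launching a JDBC read per dataframe value might
look like (reusing spark and opts from the sketch above; controlDf and its
table_name column are hypothetical):

// Hypothetical control dataframe whose rows name the source tables to pull.
val tables = controlDf.select("table_name").collect().map(_.getString(0))

// One JDBC read launched per collected value; each load() opens its own
// connection(s) against the source database.
val frames = tables.map(t =>
  spark.read.format("jdbc").options(opts + ("dbtable" -> t)).load())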
On Sat, Aug 31, 2019 at 8:07 AM Chris Teoh wrote:
> I'd say this is an uncommon approach, could you use a workflow/scheduling
On 2019/9/2 5:54, Dongjoon Hyun wrote:
We are happy to announce the availability of Spark 2.4.4!
Spark 2.4.4 is a maintenance release containing stability fixes. This
release is based on the branch-2.4 maintenance branch of Spark. We strongly
recommend all 2.4 users to upgrade to this stable release.