Hi,

I have a Spark cluster with one master and 3 worker nodes. I have written the
code below to fetch records from Oracle using Spark SQL:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val employees = sqlContext.read.format("jdbc").options(
  Map(
    "url"      -> "jdbc:oracle:thin:@xxxx:1525:SID",
    "dbtable"  -> "(select * from employee where name like '%18%')",
    "user"     -> "username",
    "password" -> "password")).load()

I submitted this job to the Spark cluster using the spark-submit command.

It looks like all 3 workers are executing the same query and fetching the same
data; that is, 3 JDBC calls are being made to Oracle.

How can I make this code issue a single JDBC call to Oracle when there is more
than one worker?
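For reference, here is a sketch of what I have tried so far, based on the same
Spark 1.x SQLContext API as above (the connection details are placeholders).
My understanding is that without the partitioning options, the read should use
a single partition, and caching should keep later actions from re-running the
query; please correct me if that is wrong:

// Sketch, assuming the Spark 1.x SQLContext API used above;
// url/user/password are placeholders.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val employees = sqlContext.read.format("jdbc").options(
  Map(
    "url"      -> "jdbc:oracle:thin:@xxxx:1525:SID",
    // Without partitionColumn/lowerBound/upperBound/numPartitions,
    // Spark reads this query through a single partition, i.e. one
    // JDBC connection per job execution.
    "dbtable"  -> "(select * from employee where name like '%18%')",
    "user"     -> "username",
    "password" -> "password")).load()

// Cache the result so subsequent actions reuse the fetched rows
// instead of re-running the query against Oracle.
employees.cache()
employees.count()   // first action: executes the JDBC query
employees.show(10)  // should be served from the cache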

Please help me resolve this issue.

Regards,
Rajesh
