This all depends on whether you provide the driver with information about the
underlying RDBMS table; assuming there is a unique ID column on that table,
you can use it to partition the load.
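To make the mechanism concrete: when you give the JDBC source a partition column plus lower/upper bounds and numPartitions, the driver derives one range predicate per partition, and each predicate becomes the WHERE clause of its own concurrent JDBC query. A minimal sketch of that stride logic (the column name `id` and the bounds are hypothetical, and this is a plain-Python approximation of what Spark does internally, not Spark's actual code):

```python
# Sketch of how Spark's JDBC source splits a read into numPartitions
# range predicates on the partition column. Each predicate drives one
# concurrent JDBC query (one task, one connection).

def jdbc_partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Approximate Spark's stride-based partitioning of [lower, upper)."""
    stride = (upper_bound - lower_bound) // num_partitions
    preds = []
    current = lower_bound
    for i in range(num_partitions):
        if i == 0:
            # First partition also picks up NULLs, as Spark does.
            preds.append(f"{column} < {current + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition is open-ended to catch rows above upper_bound.
            preds.append(f"{column} >= {current}")
        else:
            preds.append(f"{column} >= {current} AND {column} < {current + stride}")
        current += stride
    return preds

# With numPartitions=10, Spark issues 10 queries in parallel, e.g.:
for pred in jdbc_partition_predicates("id", 0, 1_000_000, 10):
    print(pred)
```

So with numPartitions = 10 you get up to 10 concurrent reader tasks; the corresponding DataFrameReader options are `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`.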
Have a look at this
http://metricbrew.com/get-data-from-databases-with-apache-spark-jdbc/
HTH
Dr Mich Talebzadeh
Hi All,
I want to read MySQL from Spark. Please let me know how many threads will be
used to read from the RDBMS after setting numPartitions = 10 in Spark JDBC.
What is the best practice for reading a large dataset from an RDBMS into Spark?
Thanks,
Jingyu