Actually, Spark SQL already provides such a data source. From the documentation:
JDBC To Other Databases
Spark SQL also includes a data source that can read data from other
databases using JDBC. This functionality should be preferred over using
JdbcRDD.
If your use case is mostly about querying the RDBMS and then bringing the
results into Spark for analysis, then the Spark SQL JDBC data source API
http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/
is the best fit. If your use case is to bring the entire data to
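For the "query the RDBMS, analyze in Spark" case, the JDBC data source boils down to a handful of connection options. A minimal sketch follows; the host, database, table, and credentials are hypothetical placeholders, and the Spark calls (which need a live SparkContext and a JDBC driver on the classpath) are shown as comments:

```python
# Options for Spark SQL's JDBC data source (Spark 1.3+ era API).
# Everything here is a hypothetical placeholder, not a real endpoint.
jdbc_options = {
    "url": "jdbc:mysql://dbhost:3306/mydb",  # hypothetical MySQL instance
    "dbtable": "employees",                  # hypothetical table name
    "user": "spark",
    "password": "secret",
}

# With a running SparkContext `sc`, the load itself would look like:
#   from pyspark.sql import SQLContext
#   sqlContext = SQLContext(sc)
#   df = sqlContext.load(source="jdbc", **jdbc_options)
#   df.registerTempTable("employees")
#   sqlContext.sql("SELECT COUNT(*) FROM employees").show()
print(sorted(jdbc_options))
```

The key point is that no Sqoop-style export step is needed: Spark pulls rows over JDBC directly into a DataFrame it can query.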
If I run Spark in standalone mode (not YARN mode), is there any tool like
Sqoop that is able to transfer data from an RDBMS into Spark storage?
Thanks
What is the specific use case? I can think of a couple of ways (write to HDFS
and then read it from Spark, or stream the data to Spark). I have also seen
people use MySQL JDBC jars to bring data in. Essentially you want to simulate
the creation of an RDD.
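That "simulate creation of an RDD" pattern is just: query the database with a plain driver, collect the rows locally, then hand them to Spark. A small sketch, using Python's built-in sqlite3 as a stand-in for MySQL so the query half is self-contained (the Spark half, which needs a live SparkContext, is shown as comments):

```python
import sqlite3

# Stand-in RDBMS: an in-memory SQLite table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])

# Pull the query results into a local Python list.
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()

# With a running SparkContext `sc`, these rows become an RDD:
#   rdd = sc.parallelize(rows)
#   rdd.filter(lambda r: r[0] > 1).count()
print(rows)
```

This works for modest result sets; for large tables the JDBC data source (or a partitioned JdbcRDD) avoids funneling everything through the driver.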
On 24 Apr 2015 18:15, sequoiadb mailing-list-r...@sequoiadb.com