Re: what is the best way to transfer data from RDBMS to spark?

2015-04-25 Thread ayan guha
Actually, Spark SQL provides a data source for this. From the documentation, under "JDBC To Other Databases": Spark SQL also includes a data source that can read data from other databases using JDBC. This functionality should be preferred over using JdbcRDD
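
To make the JDBC data source concrete, here is a minimal Python sketch of how a table might be loaded through it. It uses the `spark.read.format("jdbc")` reader found in later Spark releases, which is an assumption and not the API of the Spark version discussed in this thread. The helper names `jdbc_options` and `load_table`, and the URL, table, and user shown in the usage lines, are hypothetical. The option keys (`url`, `dbtable`, `user`, `password`, `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) are the ones the JDBC data source documents.

```python
def jdbc_options(url, table, user=None, password=None,
                 partition_column=None, lower_bound=None,
                 upper_bound=None, num_partitions=None):
    """Build the option map passed to Spark's JDBC data source.

    All helper and parameter names here are illustrative, not part of Spark.
    """
    opts = {"url": url, "dbtable": table}
    if user is not None:
        opts["user"] = user
    if password is not None:
        opts["password"] = password

    # The four partitioning options only make sense as a group, so
    # reject a partial set instead of passing it on to Spark.
    partitioning = (partition_column, lower_bound, upper_bound, num_partitions)
    if any(v is not None for v in partitioning):
        if any(v is None for v in partitioning):
            raise ValueError(
                "partition_column, lower_bound, upper_bound and "
                "num_partitions must be set together"
            )
        opts.update({
            "partitionColumn": partition_column,
            "lowerBound": str(lower_bound),
            "upperBound": str(upper_bound),
            "numPartitions": str(num_partitions),
        })
    return opts


def load_table(spark, options):
    """Read the table into a DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical usage (needs a running Spark session and the JDBC driver jar
# for the target database on the classpath):
#   opts = jdbc_options("jdbc:postgresql://dbhost:5432/sales", "public.orders",
#                       user="reader", partition_column="id",
#                       lower_bound=1, upper_bound=1000, num_partitions=4)
#   orders = load_table(spark, opts)
```

Setting the partitioning options lets Spark split the read into several parallel queries over ranges of the chosen column, which may matter when the goal is to pull a whole table.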

Re: what is the best way to transfer data from RDBMS to spark?

2015-04-25 Thread Sujeevan
If your use case is mostly about querying an RDBMS and then bringing the results into Spark for some analysis, then the Spark SQL JDBC data source API http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/ is the best option. If your use case is to bring entire data to

what is the best way to transfer data from RDBMS to spark?

2015-04-24 Thread sequoiadb
If I run Spark in standalone mode (not YARN mode), is there any tool like Sqoop that is able to transfer data from an RDBMS to Spark storage? Thanks

Re: what is the best way to transfer data from RDBMS to spark?

2015-04-24 Thread ayan guha
What is the specific use case? I can think of a couple of ways (write to HDFS and then read from Spark, or stream data to Spark). Also, I have seen people using MySQL jars to bring data in. Essentially, you want to simulate the creation of an RDD. On 24 Apr 2015 18:15, sequoiadb mailing-list-r...@sequoiadb.com
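
The "write to a file store, then read from Spark" route mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: it dumps a query result to a CSV file through a Python DB-API connection, and the file would then be placed somewhere Spark can reach (HDFS or a shared path). The helper names `export_query_to_csv` and `read_export` are hypothetical, and `spark.read.csv` belongs to later Spark releases than the one discussed in this thread.

```python
import csv


def export_query_to_csv(conn, query, path):
    """Run `query` on a DB-API connection and write the rows to `path`.

    The first line of the file holds the column names. Returns the
    number of data rows written.
    """
    cur = conn.cursor()
    cur.execute(query)
    header = [col[0] for col in cur.description]
    rows_written = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cur:
            writer.writerow(row)
            rows_written += 1
    return rows_written


def read_export(spark, path):
    """Load the exported file as a DataFrame; `spark` is a SparkSession."""
    return spark.read.csv(path, header=True, inferSchema=True)


# Hypothetical usage, with sqlite3 standing in for the RDBMS:
#   import sqlite3
#   conn = sqlite3.connect("shop.db")
#   export_query_to_csv(conn, "SELECT id, total FROM orders", "/data/orders.csv")
#   orders = read_export(spark, "/data/orders.csv")
```

Compared with reading over JDBC, this adds an extra copy of the data but keeps the load off the database while Spark jobs run.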