RE: Save an RDD to a SQL Database

2014-08-27 Thread bdev
I have a similar requirement to export data to MySQL. Just wanted to know what the best approach is so far, after the research you guys have done. I'm currently thinking of saving to HDFS and using Sqoop to handle the export. Is that the best approach, or is there any other way to write to MySQL? Thanks!

Re: Save an RDD to a SQL Database

2014-08-07 Thread 诺铁
I haven't seen people write directly to a SQL database, mainly because it's difficult to deal with failure. What if the network breaks halfway through the process? Should we drop all the data in the database and restart from the beginning? If the process is appending data to the database, things become even more complex.
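
A minimal sketch of one way to limit the damage from a mid-write failure: wrap the append in a single transaction so that either every row lands or none does. This is an illustration only, not something proposed in the thread. It uses Python's built-in sqlite3 as a stand-in for a real SQL database, and the table name `results` and the function `append_all_or_nothing` are hypothetical.

```python
import sqlite3


def append_all_or_nothing(conn, rows):
    """Append rows in one transaction; any error rolls the whole append back."""
    # Used as a context manager, a sqlite3 connection commits when the block
    # succeeds and rolls back when the block raises.
    with conn:
        conn.executemany("INSERT INTO results (k, v) VALUES (?, ?)", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (k TEXT, v INTEGER)")
    append_all_or_nothing(conn, [("a", 1), ("b", 2)])
    print(conn.execute("SELECT COUNT(*) FROM results").fetchone()[0])
    conn.close()
```

Note that this only covers a single connection. A Spark job typically writes each partition over its own connection, so one transaction per partition does not make the job as a whole atomic, which is the harder problem raised here.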

Re: Save an RDD to a SQL Database

2014-08-07 Thread Cheng Lian
Maybe a little off topic, but would you mind sharing your motivation for saving the RDD into a SQL DB? If you’re just trying to do further transformations/queries with SQL for convenience, then you may just use Spark SQL directly within your Spark application without saving them into a DB:

Re: Save an RDD to a SQL Database

2014-08-07 Thread Nicholas Chammas
On Thu, Aug 7, 2014 at 11:08 AM, 诺铁 noty...@gmail.com wrote: What if the network breaks halfway through the process? Should we drop all the data in the database and restart from the beginning? The best way to deal with this -- which, unfortunately, is not commonly supported -- is with a two-phase commit that can

Re: Save an RDD to a SQL Database

2014-08-07 Thread Nicholas Chammas
On Thu, Aug 7, 2014 at 11:25 AM, Cheng Lian lian.cs@gmail.com wrote: Maybe a little off topic, but would you mind sharing your motivation for saving the RDD into a SQL DB? Many possible reasons (Vida, please chime in with yours!): - You have an existing database you want to load new

Re: Save an RDD to a SQL Database

2014-08-07 Thread chutium
Right, Spark acts more like an OLAP system; I believe no one will use Spark for OLTP. So there is always the question of how to share data between these two platforms efficiently. More importantly, most enterprise BI tools rely on an RDBMS, or at least a JDBC/ODBC interface

Re: Save an RDD to a SQL Database

2014-08-07 Thread Vida Ha
The use case I was thinking of was outputting calculations made in Spark into a SQL database for the presentation layer to access. So in other words, having a Spark backend in Java that writes to a SQL database and then having a Rails front-end that can display the data nicely. On Thu, Aug 7,

Re: Save an RDD to a SQL Database

2014-08-07 Thread Nicholas Chammas
Vida, what kind of database are you trying to write to? For example, I found that for loading into Redshift, by far the easiest thing to do was to save my output from Spark as a CSV to S3, and then load it from there into Redshift. This is not as slow as you might think, because Spark can write the
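
A rough sketch of the write-files-then-load pattern described above, under stated assumptions: a local directory stands in for S3, each partition gets its own CSV part file, and a separate step reads the files back for the database's bulk loader. The function names and the `part-NNNNN.csv` naming are hypothetical, chosen only to resemble Spark's per-partition output files. With Redshift, the second step would be its COPY command rather than a Python reader.

```python
import csv
import os


def write_part_file(rows, out_dir, partition_index):
    """Write one partition's rows to its own CSV file and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "part-%05d.csv" % partition_index)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def read_part_files(out_dir):
    """Yield every row from every part file, in file-name order."""
    for name in sorted(os.listdir(out_dir)):
        if name.startswith("part-"):
            with open(os.path.join(out_dir, name), newline="") as f:
                # csv.reader returns every field as a string.
                yield from csv.reader(f)
```

Because each partition writes an independent file, a failed write only requires regenerating the files, and the database is untouched until the load step runs.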

Re: Save an RDD to a SQL Database

2014-08-07 Thread Flavio Pompermaier
Isn't sqoop export meant for that? http://hadooped.blogspot.it/2013/06/apache-sqoop-part-3-data-transfer.html?m=1 On Aug 7, 2014 7:59 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Vida, What kind of database are you trying to write to? For example, I found that for loading into

Re: Save an RDD to a SQL Database

2014-08-07 Thread Vida Ha
That's a good idea - to write to files first and then load. Thanks. On Thu, Aug 7, 2014 at 11:26 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Isn't sqoop export meant for that? http://hadooped.blogspot.it/2013/06/apache-sqoop-part-3-data-transfer.html?m=1 On Aug 7, 2014 7:59 PM,

RE: Save an RDD to a SQL Database

2014-08-07 Thread Jim Donahue
=C7gWtxelYNMfeature=youtu.be. Jim Donahue Adobe -Original Message- From: Ron Gonzalez [mailto:zlgonza...@yahoo.com.INVALID] Sent: Wednesday, August 06, 2014 7:18 AM To: Vida Ha Cc: u...@spark.incubator.apache.org Subject: Re: Save an RDD to a SQL Database Hi Vida, It's possible to save an RDD

Re: Save an RDD to a SQL Database

2014-08-06 Thread Ron Gonzalez
Hi Vida, It's possible to save an RDD as a Hadoop file using Hadoop output formats. It might be worthwhile to investigate using DBOutputFormat and see if this will work for you. I haven't personally written to a DB, but I'd imagine this would be one way to do it. Thanks, Ron Sent from my

Re: Save an RDD to a SQL Database

2014-08-06 Thread Yana
Hi Vida, I am writing to a DB -- or trying to :). I believe the best practice for this (you can search the mailing list archives) is to do a combination of mapPartitions and use a grouped iterator. Look at this thread, esp. the comment from A. Boisvert and Matei's comment above it:
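
As a rough illustration of the per-partition, grouped-iterator idea mentioned above (this is not the code from the thread Yana links to), the sketch below opens one connection per partition and inserts rows in fixed-size batches. It uses Python's built-in sqlite3 as a stand-in for a real database; the table `results`, the helper `grouped` (modeled on Scala's `Iterator.grouped`), and `write_partition` are all hypothetical names.

```python
import sqlite3
from itertools import islice


def grouped(iterator, size):
    """Yield lists of up to `size` items from `iterator`."""
    iterator = iter(iterator)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_partition(rows, db_path, batch_size=1000):
    """Write one partition's rows over a single connection, batch by batch."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS results (k TEXT, v INTEGER)")
        for batch in grouped(rows, batch_size):
            # One round trip and one commit per batch, not per row.
            conn.executemany("INSERT INTO results (k, v) VALUES (?, ?)", batch)
            conn.commit()
    finally:
        conn.close()
```

In a PySpark job, a function like this would presumably be passed to `rdd.foreachPartition`, so that the connection is created on the executor once per partition rather than once per record.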