I have a similar requirement to export data to MySQL. I just wanted to know
what the best approach is, given the research you have done.
I'm currently thinking of saving to HDFS and using Sqoop to handle the export. Is that
the best approach, or is there another way to write to MySQL? Thanks!
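For reference, a Sqoop export from HDFS into MySQL looks roughly like this (the connection string, table name, and paths below are placeholders, not details from this thread):

```shell
sqoop export \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table my_table \
  --export-dir /user/spark/output \
  --input-fields-terminated-by ',' \
  --batch
```

The `--export-dir` would point at the directory of files Spark wrote to HDFS.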
I haven't seen people write directly to a SQL database,
mainly because it's difficult to deal with failure.
What if the network breaks halfway through the process? Should we drop all data in the
database and restart from the beginning? If the process is appending data to the
database, then things become even more complex.
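To make the failure scenario concrete: most relational databases let you wrap a batch of inserts in a transaction, so a mid-stream failure rolls back cleanly instead of leaving half the rows behind. A minimal illustration, with sqlite3 standing in for MySQL (the table and data are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (k TEXT, v INTEGER)")
conn.commit()

rows = [("a", 1), ("b", 2), ("bad", None), ("c", 3)]

try:
    # The connection context manager commits on success
    # and rolls back if the block raises.
    with conn:
        for k, v in rows:
            if v is None:
                raise ValueError("simulated failure halfway through the batch")
            conn.execute("INSERT INTO results VALUES (?, ?)", (k, v))
except ValueError:
    pass

# The partial batch was rolled back, so the table is still empty.
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count)  # 0
```

Without the transaction, the first two rows would have been left behind, which is exactly the "restart from the beginning?" problem above.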
Maybe a little off topic, but would you mind sharing your motivation for saving
the RDD into an SQL DB?
If you’re just trying to do further transformations/queries with SQL for
convenience, then you may just use Spark SQL directly within your Spark
application without saving the data into a DB:
On Thu, Aug 7, 2014 at 11:08 AM, 诺铁 noty...@gmail.com wrote:
What if the network breaks halfway through the process? Should we drop all data in the
database and restart from the beginning?
The best way to deal with this -- which, unfortunately, is not commonly
supported -- is with a two-phase commit that can
commit every writer's output atomically, or roll it all back on failure.
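Short of true two-phase commit, a common approximation is to load into a staging table and only swap it into place once every writer has finished; a failed job then leaves the live table untouched. A toy sketch of the swap, again with sqlite3 standing in for a real database (all names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live (k TEXT, v INTEGER)")
conn.execute("INSERT INTO live VALUES ('old', 0)")
conn.commit()

# Step 1: every writer loads into a staging table; a failure here
# never touches the live table.
conn.execute("CREATE TABLE staging (k TEXT, v INTEGER)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", [("a", 1), ("b", 2)])
conn.commit()

# Step 2: once all writers succeed, swap staging into place in one transaction.
with conn:
    conn.execute("DROP TABLE live")
    conn.execute("ALTER TABLE staging RENAME TO live")

rows = conn.execute("SELECT k, v FROM live ORDER BY k").fetchall()
print(rows)  # [('a', 1), ('b', 2)]
```

Readers see either the old table or the complete new one, never a half-written mix.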
On Thu, Aug 7, 2014 at 11:25 AM, Cheng Lian lian.cs@gmail.com wrote:
Maybe a little off topic, but would you mind sharing your motivation for
saving the RDD into an SQL DB?
Many possible reasons (Vida, please chime in with yours!):
- You have an existing database you want to load new data into …
Right, Spark acts more like an OLAP system; I believe no one would use Spark
for OLTP, so there is always the question of how to share data
between these two platforms efficiently.
More importantly, most enterprise BI tools rely on an RDBMS, or at
least on a JDBC/ODBC interface.
The use case I was thinking of was outputting calculations made in Spark
into a SQL database for the presentation layer to access. So in other
words, having a Spark backend in Java that writes to a SQL database and
then having a Rails front-end that can display the data nicely.
On Thu, Aug 7, …
Vida,
What kind of database are you trying to write to?
For example, I found that for loading into Redshift, by far the easiest
thing to do was to save my output from Spark as a CSV to S3, and then load
it from there into Redshift. This is not as slow as you might think, because Spark
can write the …
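The write-files-then-bulk-load pattern above can be sketched locally: the compute side dumps rows to a CSV file, and the database side loads the whole file in one pass. Here sqlite3 stands in for Redshift's COPY, and the schema and data are invented for the example:

```python
import csv
import os
import sqlite3
import tempfile

# Step 1: the compute side (Spark, in this thread) writes plain CSV files.
rows = [("2014-08-07", "clicks", 123), ("2014-08-07", "views", 456)]
path = os.path.join(tempfile.mkdtemp(), "part-00000.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Step 2: the database side bulk-loads the file in one pass
# (Redshift would do this with a COPY from S3).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (day TEXT, name TEXT, value INTEGER)")
with open(path, newline="") as f:
    conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", csv.reader(f))
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
print(count)  # 2
```

Decoupling the two steps also means a failed load can simply be retried from the files, without rerunning the Spark job.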
Isn't sqoop export meant for that?
http://hadooped.blogspot.it/2013/06/apache-sqoop-part-3-data-transfer.html?m=1
On Aug 7, 2014 7:59 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Vida,
What kind of database are you trying to write to?
For example, I found that for loading into …
That's a good idea - to write to files first and then load. Thanks.
On Thu, Aug 7, 2014 at 11:26 AM, Flavio Pompermaier pomperma...@okkam.it
wrote:
Isn't sqoop export meant for that?
http://hadooped.blogspot.it/2013/06/apache-sqoop-part-3-data-transfer.html?m=1
On Aug 7, 2014 7:59 PM, …
Jim Donahue
Adobe
-Original Message-
From: Ron Gonzalez [mailto:zlgonza...@yahoo.com.INVALID]
Sent: Wednesday, August 06, 2014 7:18 AM
To: Vida Ha
Cc: u...@spark.incubator.apache.org
Subject: Re: Save an RDD to a SQL Database
Hi Vida,
It's possible to save an RDD as a Hadoop file using Hadoop output formats. It
might be worthwhile to investigate DBOutputFormat and see if this will
work for you.
I haven't personally written to a db, but I'd imagine this would be one way
to do it.
Thanks,
Ron
Hi Vida,
I am writing to a DB -- or trying to :).
I believe the best practice for this (you can search the mailing list
archives) is to use a combination of mapPartitions and a grouped
iterator.
Look at this thread, esp. the comment from A. Boisvert and Matei's comment
above it:
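The mapPartitions-plus-grouped-iterator idea boils down to: open one connection per partition and insert in fixed-size batches rather than row by row. A standalone sketch of the batching half, using plain Python and sqlite3 as stand-ins (in Spark, a body like `write_partition` would run inside `rdd.foreachPartition`):

```python
import sqlite3
from itertools import islice

def grouped(it, size):
    """Yield successive lists of up to `size` items from an iterator."""
    it = iter(it)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def write_partition(rows, batch_size=2):
    # In Spark this would open one real JDBC/MySQL connection per partition.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v INTEGER)")
    batches = 0
    for batch in grouped(rows, batch_size):
        conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in batch])
        conn.commit()  # one commit per batch, not one per row
        batches += 1
    total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return batches, total

batches, total = write_partition(iter(range(5)))
print(batches, total)  # 3 5
```

Batching amortizes the per-round-trip cost, and opening the connection inside the partition function avoids trying to serialize a connection object from the driver to the workers.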