Re: Using a Database to persist and load data from

2014-10-31 Thread Sonal Goyal
I think you can try the Hadoop DBOutputFormat.
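A minimal sketch of that route with saveAsNewAPIHadoopDataset; the people(name, age) table, the Vertica driver class, URL and credentials below are placeholders I made up, so adjust them to your setup:

import java.sql.{PreparedStatement, ResultSet}

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.db.{DBConfiguration, DBOutputFormat, DBWritable}
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // pair RDD implicits on Spark 1.x

// Record type mirroring the target table's columns; DBOutputFormat writes the key.
class PersonRecord(var name: String = "", var age: Int = 0)
  extends DBWritable with Serializable {
  override def write(stmt: PreparedStatement): Unit = {
    stmt.setString(1, name)
    stmt.setInt(2, age)
  }
  override def readFields(rs: ResultSet): Unit = {
    name = rs.getString(1)
    age = rs.getInt(2)
  }
}

def saveToDb(sc: SparkContext): Unit = {
  val job = Job.getInstance()   // new Job() on older Hadoop versions
  DBConfiguration.configureDB(job.getConfiguration,
    "com.vertica.jdbc.Driver", "jdbc:vertica://host:5433/db", "user", "pass")
  // setOutput also registers DBOutputFormat as the job's output format.
  DBOutputFormat.setOutput(job, "people", "name", "age")

  sc.parallelize(Seq(("alice", 30), ("bob", 25)))
    .map { case (n, a) => (new PersonRecord(n, a), NullWritable.get()) }
    .saveAsNewAPIHadoopDataset(job.getConfiguration)
}

Each task then opens its own JDBC connection through DBConfiguration and inserts its partition of records.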

Best Regards,
Sonal
Nube Technologies


Re: Using a Database to persist and load data from

2014-10-31 Thread Kamal Banga
You can also use PairRDDFunctions' saveAsNewAPIHadoopFile, which takes an OutputFormat class.
So you will have to write a custom OutputFormat subclass that implements getRecordWriter and
returns a custom RecordWriter, whose write method is what actually writes to the DB.
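A rough, untested sketch of that approach; the people(name, age) table and the jdbc.url/jdbc.user/jdbc.pass config keys are made up for illustration:

import java.sql.{Connection, DriverManager}

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce._
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
import org.apache.spark.SparkContext._   // pair RDD implicits on Spark 1.x
import org.apache.spark.rdd.RDD

// Writes each (name, age) pair straight into the table; one connection per task.
class JdbcRecordWriter(url: String, user: String, pass: String)
  extends RecordWriter[(String, Int), NullWritable] {

  private val conn: Connection = DriverManager.getConnection(url, user, pass)
  private val stmt =
    conn.prepareStatement("INSERT INTO people (name, age) VALUES (?, ?)")

  override def write(key: (String, Int), value: NullWritable): Unit = {
    stmt.setString(1, key._1)
    stmt.setInt(2, key._2)
    stmt.executeUpdate()   // could be batched with addBatch/executeBatch
  }

  override def close(context: TaskAttemptContext): Unit = {
    stmt.close()
    conn.close()
  }
}

// Hands a JdbcRecordWriter to every task; connection settings come from the job conf.
class JdbcOutputFormat extends OutputFormat[(String, Int), NullWritable] {

  override def getRecordWriter(context: TaskAttemptContext): RecordWriter[(String, Int), NullWritable] = {
    val conf = context.getConfiguration
    new JdbcRecordWriter(conf.get("jdbc.url"), conf.get("jdbc.user"), conf.get("jdbc.pass"))
  }

  // Nothing to validate: there is no output path when writing to a DB.
  override def checkOutputSpecs(context: JobContext): Unit = {}

  // Borrow the no-op committer from NullOutputFormat; the DB handles durability.
  override def getOutputCommitter(context: TaskAttemptContext): OutputCommitter =
    new NullOutputFormat[(String, Int), NullWritable]().getOutputCommitter(context)
}

def writeToDb(rdd: RDD[(String, Int)]): Unit = {
  val conf = new Configuration()
  conf.set("jdbc.url", "jdbc:vertica://host:5433/db")
  conf.set("jdbc.user", "user")
  conf.set("jdbc.pass", "pass")
  // The path argument is required by the API but never used by JdbcOutputFormat.
  rdd.map(r => (r, NullWritable.get()))
    .saveAsNewAPIHadoopFile("/ignored", classOf[(String, Int)], classOf[NullWritable],
      classOf[JdbcOutputFormat], conf)
}

saveAsNewAPIHadoopDataset with a Job that has setOutputFormatClass called on it is an alternative that avoids the dummy path.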



Re: Using a Database to persist and load data from

2014-10-30 Thread Yanbo Liang
AFAIK, you can read data from a DB with JdbcRDD, but there is no built-in interface for
writing to a DB.
JdbcRDD also has some restrictions: for example, the SQL query must contain a WHERE clause
with two '?' placeholders that bound each partition's key range.
For writing to a DB, you can use mapPartitions or foreachPartition.
You can refer to this example:
http://stackoverflow.com/questions/24916852/how-can-i-connect-to-a-postgresql-database-into-apache-spark-using-scala
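For instance, a minimal sketch along those lines (the people and people_copy tables, their id/name/age columns and the connection details are placeholders):

import java.sql.DriverManager

import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

def copyTable(sc: SparkContext): Unit = {
  val url = "jdbc:vertica://host:5433/db"

  // Read: the query needs two '?' placeholders that JdbcRDD fills with the
  // lower/upper bounds of each partition's id range.
  val rows = new JdbcRDD(
    sc,
    () => DriverManager.getConnection(url, "user", "pass"),
    "SELECT name, age FROM people WHERE id >= ? AND id <= ?",
    lowerBound = 1, upperBound = 1000000, numPartitions = 10,
    mapRow = rs => (rs.getString(1), rs.getInt(2)))

  // Write: one connection per partition, inserts run on the executors.
  rows.foreachPartition { iter =>
    val conn = DriverManager.getConnection(url, "user", "pass")
    val stmt = conn.prepareStatement("INSERT INTO people_copy (name, age) VALUES (?, ?)")
    iter.foreach { case (name, age) =>
      stmt.setString(1, name)
      stmt.setInt(2, age)
      stmt.executeUpdate()
    }
    stmt.close()
    conn.close()
  }
}

In practice you would batch the inserts and read the credentials from configuration instead of hard-coding them.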



Using a Database to persist and load data from

2014-10-30 Thread Asaf Lahav
Hi Ladies and Gents,
I would like to know what options I have for leveraging Spark code I have already written to
use a DB (Vertica) as its store/data source.
The data is tabular in nature, so essentially any relational DB could be used.

Do I need to develop a context? If so, how, and where can I find a good example?


Thank you,
Asaf