Totally depends on your database. If it's a NoSQL store like
MongoDB or HBase, you can use the native saveAsNewAPIHadoopFile or
saveAsHadoopDataset APIs to write from the executors directly.
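
For HBase, for example, something like the sketch below should work.
It's just a rough outline: the table "my_table", column family "cf",
qualifier "col", and the (String, String) pair RDD are all placeholders,
and Put.add is the pre-1.0 HBase client method (addColumn on newer ones).

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

// Point the output format at the target table (names are placeholders).
val conf = HBaseConfiguration.create()
conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")
val job = Job.getInstance(conf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

// Turn each record into an HBase Put and write from the executors;
// the output format handles connections per partition, so nothing
// has to funnel through the driver.
rdd.map { case (key, value) =>
  val put = new Put(Bytes.toBytes(key))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
  (new ImmutableBytesWritable, put)
}.saveAsNewAPIHadoopDataset(job.getConfiguration)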

For SQL databases, I think people usually put the overhead on the
driver like you did.
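
If you do collect everything on the driver like in your snippet,
batching the inserts over a single connection keeps the overhead down.
A rough sketch follows; the JDBC URL, the "results" table, and the
(name, score) schema are placeholders:

import java.sql.DriverManager

// Rows gathered from the executors, mirroring your pattern
// (e.g. accumulator.value or rdd.collect()).
val rows: Seq[(String, Int)] = accumulator.value

// One connection on the driver; a prepared statement also avoids
// interpolating values straight into the SQL string.
val conn = DriverManager.getConnection("jdbc:postgresql://host/db", "user", "pass")
try {
  val stmt = conn.prepareStatement("insert into results (name, score) values (?, ?)")
  rows.foreach { case (name, score) =>
    stmt.setString(1, name)
    stmt.setInt(2, score)
    stmt.addBatch()
  }
  stmt.executeBatch() // one round trip for the whole batch
} finally {
  conn.close()
}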

Thanks
Best Regards

On Wed, Mar 18, 2015 at 10:52 PM, Praveen Balaji <prav...@soundhound.com>
wrote:

> I was wondering what people generally do about doing database operations
> from executor nodes. I’m (at least for now) avoiding doing database updates
> from executor nodes to avoid proliferation of database connections on the
> cluster. The general pattern I adopt is to collect queries (or tuples) on
> the executors and write to the database on the driver.
>
> // Executes on the executor
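> // ("accumulator" is assumed to be a collection-style accumulator
> // created on the driver, e.g. via sc.accumulableCollection)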
> rdd.foreach(s => {
>   val query = s"insert into .... ${s}";
>   accumulator += query;
> });
>
> // Executes on the driver
> accumulator.value.foreach(query => {
>     // get connection
>     // update database
> });
>
> I’m obviously trading database connections for driver heap. How do other
> Spark users do it?
>
> Cheers
> Praveen
