I was wondering how people generally handle database operations from 
executor nodes. For now I’m avoiding database updates from the executors so 
that database connections don’t proliferate across the cluster. The pattern 
I’ve adopted is to collect the queries (or tuples) on the executors and write 
them to the database from the driver.

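// Executes on the executors: only builds the statements, no database access.
// `accumulator` is assumed to be a collection accumulator created on the driver.
rdd.foreach(s => {
  val query = s"insert into .... ${s}"
  accumulator += query
})

// Executes on the driver
accumulator.value.foreach(query => {
  // get connection
  // update database
})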
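
For comparison, the executor-side approach I’m avoiding would look roughly 
like the sketch below: one connection per partition via foreachPartition, 
with the rows sent as a JDBC batch. This is only an illustration, not code 
from my job; the object name, the table and column, and the connection 
parameters are all made up.

```scala
import java.sql.DriverManager
import org.apache.spark.rdd.RDD

object ExecutorWrites {
  // jdbcUrl, user and password are hypothetical placeholders.
  def writeFromExecutors(rdd: RDD[String],
                         jdbcUrl: String,
                         user: String,
                         password: String): Unit =
    rdd.foreachPartition { rows =>
      // Runs on an executor: one connection per partition, not per record.
      val conn = DriverManager.getConnection(jdbcUrl, user, password)
      try {
        // Hypothetical table and column, for illustration only.
        val stmt = conn.prepareStatement("insert into events (payload) values (?)")
        try {
          rows.foreach { s =>
            stmt.setString(1, s)
            stmt.addBatch()
          }
          // Send everything queued for this partition in one batch.
          stmt.executeBatch()
        } finally stmt.close()
      } finally conn.close()
    }
}
```

With this approach the number of open connections follows the number of 
tasks running at the same time, rather than the number of records.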

I’m obviously trading database connections for driver heap, since every 
collected query has to fit in the driver’s memory. How do other Spark users 
do it?

Cheers
Praveen