Re: Connection pooling in spark jobs

2015-04-03 Thread Charles Feduke
Out of curiosity I wanted to see what JBoss supports in terms of clustering and database connection pooling, since its implementation should suffice for your use case. I found: *Note:* JBoss does not recommend using this feature in a production environment. It requires accessing a connection pool

Re: Connection pooling in spark jobs

2015-04-02 Thread Ted Yu
http://docs.oracle.com/cd/B10500_01/java.920/a96654/connpoca.htm The question doesn't seem to be Spark specific, btw On Apr 2, 2015, at 4:45 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Hi, We have a case that we will have to run concurrent jobs (for the same algorithm) on

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
Right, I am aware of how to use connection pooling with Oracle, but the specific question is how to use it in the context of Spark job execution. On 2 Apr 2015 17:41, Ted Yu yuzhih...@gmail.com wrote: http://docs.oracle.com/cd/B10500_01/java.920/a96654/connpoca.htm The question doesn't seem to

Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
Hi, We have a case where we will have to run concurrent jobs (for the same algorithm) on different data sets. These jobs can run in parallel, and each one of them would fetch its data from the database. We would like to optimize the database connections by making use of connection
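To make the idea concrete, here is a minimal, self-contained sketch of what a bounded connection pool does: connections are opened once up front and then borrowed and returned, instead of being opened and closed per job. The names (`MiniPool`, `FakeConn`) are illustrative stand-ins, not a real library; in practice you would use an existing pool implementation rather than hand-rolling one.

```scala
import java.util.concurrent.ArrayBlockingQueue

object MiniPool {
  // FakeConn is a stand-in for java.sql.Connection so the sketch runs anywhere.
  final case class FakeConn(id: Int)

  // A bounded pool: all connections are opened once up front and handed out on demand.
  class Pool(size: Int) {
    private val idle = new ArrayBlockingQueue[FakeConn](size)
    (1 to size).foreach(i => idle.put(FakeConn(i))) // pre-open the connections

    def borrow(): FakeConn = idle.take()            // blocks when the pool is exhausted
    def giveBack(c: FakeConn): Unit = idle.put(c)   // return the connection for reuse
  }

  def main(args: Array[String]): Unit = {
    val pool = new Pool(1)
    val first = pool.borrow()
    pool.giveBack(first)
    val second = pool.borrow() // no new connection is opened; the old one is reused
    assert(first == second)
    println("reused " + second)
  }
}
```

A real pool would also validate connections and handle timeouts, which this sketch deliberately omits.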

Re: Connection pooling in spark jobs

2015-04-02 Thread Charles Feduke
How long does each executor keep the connection open for? How many connections does each executor open? Are you certain that connection pooling is a performant and suitable solution? Are you running out of resources on the database server and cannot tolerate each executor having a single

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
But this basically means that the pool is confined to the job (of a single app) in question, but is not sharable across multiple apps? The setup we have is a job server (the spark-jobserver) that creates jobs. Currently, we have each job opening and closing a connection to the database. What we

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
Each executor runs for about 5 secs, during which time the db connection can potentially be open. Each executor will have 1 connection open. Connection pooling surely has its advantages of performance and of not hitting the dbserver with an open/close for every operation. The database in question is not just used by the

Re: Connection pooling in spark jobs

2015-04-02 Thread Cody Koeninger
Connection pools aren't serializable, so you generally need to set them up inside of a closure. Doing that for every item is wasteful, so you typically want to use mapPartitions or foreachPartition: rdd.mapPartitions { part => setupPool; part.map { ... See Design Patterns for using foreachRDD in
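The per-partition pattern Cody describes can be sketched as below. `Conn`, `Pool`, and `perPartition` are illustrative names, and the fake query just doubles its input; on a cluster the `perPartition` body would be passed to `rdd.mapPartitions` so the pool is created on the executor, once per partition, rather than serialized from the driver or re-created per element.

```scala
object PartitionPattern {
  final case class Conn(id: Int) { def lookup(x: Int): Int = x * 2 } // fake query
  class Pool { def borrow(): Conn = Conn(1); def shutdown(): Unit = () }

  // The body you would hand to rdd.mapPartitions: one pool per partition,
  // one borrowed connection reused for every element of that partition.
  def perPartition(part: Iterator[Int]): Iterator[Int] = {
    val pool = new Pool              // runs once per partition, on the executor
    val conn = pool.borrow()
    part.map(x => conn.lookup(x))    // reuse the same connection for each element
    // in real code, shut the pool down after the iterator is fully consumed
  }

  def main(args: Array[String]): Unit = {
    // simulate two partitions locally; on a cluster this would be
    // rdd.mapPartitions(perPartition)
    val out = (1 to 6).grouped(3).flatMap(p => perPartition(p.iterator)).toList
    assert(out == List(2, 4, 6, 8, 10, 12))
    println(out)
  }
}
```

With foreachPartition the shape is the same, except the pool can be shut down in a try/finally because the partition is consumed eagerly inside the closure.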