Re: Connection pooling in spark jobs

2015-04-03 Thread Charles Feduke
Out of curiosity I wanted to see what JBoss supports in terms of clustering and database connection pooling, since its implementation should suffice for your use case. I found: *Note:* JBoss does not recommend using this feature in a production environment. It requires accessing a connection pool

Re: Connection pooling in spark jobs

2015-04-02 Thread Ted Yu
http://docs.oracle.com/cd/B10500_01/java.920/a96654/connpoca.htm The question doesn't seem to be Spark specific, btw On Apr 2, 2015, at 4:45 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Hi, We have a case that we will have to run concurrent jobs (for the same algorithm) on

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
Right, I am aware of how to use connection pooling with Oracle, but the specific question is how to use it in the context of Spark job execution. On 2 Apr 2015 17:41, Ted Yu yuzhih...@gmail.com wrote: http://docs.oracle.com/cd/B10500_01/java.920/a96654/connpoca.htm The question doesn't seem to

Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
Hi, We have a case where we will have to run concurrent jobs (for the same algorithm) on different data sets. These jobs can run in parallel, and each one of them would fetch its data from the database. We would like to optimize the database connections by making use of connection
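To make the idea concrete, here is a minimal, self-contained sketch of what a bounded connection pool does: connections are opened once up front and then borrowed and returned, instead of being opened and closed per job. The names (`MiniPool`, `FakeConn`) are illustrative stand-ins, not a real library; in practice you would use an existing pool implementation rather than hand-rolling one.

```scala
import java.util.concurrent.ArrayBlockingQueue

object MiniPool {
  // FakeConn is a stand-in for java.sql.Connection so the sketch runs anywhere.
  final case class FakeConn(id: Int)

  // A bounded pool: all connections are opened once up front and handed out on demand.
  class Pool(size: Int) {
    private val idle = new ArrayBlockingQueue[FakeConn](size)
    (1 to size).foreach(i => idle.put(FakeConn(i))) // pre-open the connections

    def borrow(): FakeConn = idle.take()            // blocks when the pool is exhausted
    def giveBack(c: FakeConn): Unit = idle.put(c)   // return the connection for reuse
  }

  def main(args: Array[String]): Unit = {
    val pool = new Pool(1)
    val first = pool.borrow()
    pool.giveBack(first)
    val second = pool.borrow() // no new connection is opened; the old one is reused
    assert(first == second)
    println("reused " + second)
  }
}
```

A real pool would also validate connections and handle timeouts, which this sketch deliberately omits.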

Re: Connection pooling in spark jobs

2015-04-02 Thread Charles Feduke
How long does each executor keep the connection open for? How many connections does each executor open? Are you certain that connection pooling is a performant and suitable solution? Are you running out of resources on the database server and cannot tolerate each executor having a single

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
But this basically means that the pool is confined to the job (of a single app) in question, but is not sharable across multiple apps? The setup we have is a job server (the spark-jobserver) that creates jobs. Currently, we have each job opening and closing a connection to the database. What we

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
Each executor runs for about 5 secs, during which time the db connection can potentially be open. Each executor will have 1 connection open. Connection pooling surely has its advantages of performance and of not hitting the dbserver with an open/close for every operation. The database in question is not just used by the

Re: Connection pooling in spark jobs

2015-04-02 Thread Cody Koeninger
Connection pools aren't serializable, so you generally need to set them up inside of a closure. Doing that for every item is wasteful, so you typically want to use mapPartitions or foreachPartition: rdd.mapPartitions { part => setupPool; part.map { ... See Design Patterns for using foreachRDD in
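The per-partition pattern Cody describes can be sketched as below. `Conn`, `Pool`, and `perPartition` are illustrative names, and the fake query just doubles its input; on a cluster the `perPartition` body would be passed to `rdd.mapPartitions` so the pool is created on the executor, once per partition, rather than serialized from the driver or re-created per element.

```scala
object PartitionPattern {
  final case class Conn(id: Int) { def lookup(x: Int): Int = x * 2 } // fake query
  class Pool { def borrow(): Conn = Conn(1); def shutdown(): Unit = () }

  // The body you would hand to rdd.mapPartitions: one pool per partition,
  // one borrowed connection reused for every element of that partition.
  def perPartition(part: Iterator[Int]): Iterator[Int] = {
    val pool = new Pool              // runs once per partition, on the executor
    val conn = pool.borrow()
    part.map(x => conn.lookup(x))    // reuse the same connection for each element
    // in real code, shut the pool down after the iterator is fully consumed
  }

  def main(args: Array[String]): Unit = {
    // simulate two partitions locally; on a cluster this would be
    // rdd.mapPartitions(perPartition)
    val out = (1 to 6).grouped(3).flatMap(p => perPartition(p.iterator)).toList
    assert(out == List(2, 4, 6, 8, 10, 12))
    println(out)
  }
}
```

With foreachPartition the shape is the same, except the pool can be shut down in a try/finally because the partition is consumed eagerly inside the closure.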