But doesn't this mean the pool is confined to the single job (and app) in question, and is not sharable across multiple apps? Our setup is a job server (spark-jobserver) that creates jobs. Currently, each job opens and closes its own connection to the database. What we would like to achieve is for each job to obtain a connection from a shared database pool.
Any directions on how this can be achieved?

-- Sateesh

On Thu, Apr 2, 2015 at 7:00 PM, Cody Koeninger <c...@koeninger.org> wrote:

> Connection pools aren't serializable, so you generally need to set them up
> inside of a closure. Doing that for every item is wasteful, so you
> typically want to use mapPartitions or foreachPartition:
>
> rdd.mapPartitions { part =>
>   setupPool
>   part.map { ...
>
> See "Design Patterns for using foreachRDD" in
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
>
> On Thu, Apr 2, 2015 at 7:52 AM, Sateesh Kavuri <sateesh.kav...@gmail.com> wrote:
>
>> Right, I am aware of how to use connection pooling with Oracle, but the
>> specific question is how to use it in the context of Spark job execution.
>> On 2 Apr 2015 17:41, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>
>>> http://docs.oracle.com/cd/B10500_01/java.920/a96654/connpoca.htm
>>>
>>> The question doesn't seem to be Spark specific, btw
>>>
>>> > On Apr 2, 2015, at 4:45 AM, Sateesh Kavuri <sateesh.kav...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > We have a case where we will have to run concurrent jobs (for the same
>>> > algorithm) on different data sets. These jobs can run in parallel, and
>>> > each of them will fetch its data from the database.
>>> > We would like to optimize the database connections by making use of
>>> > connection pooling. Any suggestions / best known ways on how to achieve
>>> > this? The database in question is Oracle.
>>> >
>>> > Thanks,
>>> > Sateesh
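For reference, here is a minimal sketch of the pattern Cody describes, assuming HikariCP as the pooling library and hypothetical Oracle connection details. Because a Scala object is initialized once per executor JVM and is never serialized from the driver, every task running on that executor draws from the same pool:

    import java.sql.Connection
    import com.zaxxer.hikari.{HikariConfig, HikariDataSource}
    import org.apache.spark.{SparkConf, SparkContext}

    // One pool per executor JVM: a Scala object is initialized lazily on
    // each executor, so all tasks running in that JVM (including tasks
    // from concurrent jobs on a shared SparkContext) share the pool.
    object ConnectionPool {
      lazy val dataSource: HikariDataSource = {
        val config = new HikariConfig()
        config.setJdbcUrl("jdbc:oracle:thin:@//dbhost:1521/ORCL") // hypothetical host/service
        config.setUsername("app_user")                            // hypothetical credentials
        config.setPassword("secret")
        config.setMaximumPoolSize(10)
        new HikariDataSource(config)
      }
      def getConnection(): Connection = dataSource.getConnection()
    }

    object PooledJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("pooled-job"))
        val rdd = sc.parallelize(1 to 1000)

        // One connection checkout per partition, not per record.
        rdd.foreachPartition { partition =>
          val conn = ConnectionPool.getConnection()
          try {
            partition.foreach { record =>
              // ... use conn to read/write the record ...
            }
          } finally {
            conn.close() // returns the connection to the pool, does not destroy it
          }
        }
        sc.stop()
      }
    }

Note that this sharing is per executor JVM: with spark-jobserver, jobs submitted to the same shared SparkContext reuse the pools on its executors, but separate Spark applications run in separate JVMs and cannot share a pool.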