But this basically means that the pool is confined to the job (of a single
app) in question and is not sharable across multiple apps?
The setup we have is a job server (the spark-jobserver) that creates jobs.
Currently, each job opens and closes its own connection to the database.
What we would like to achieve is for each job to obtain a connection from a
shared db pool.

Any directions on how this can be achieved?
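
For concreteness, the shape we are hoping for is roughly the following
(a sketch only; SharedOraclePool is a made-up name, and Commons DBCP is just
an example pooling library, not something we have settled on):

import java.sql.Connection
import org.apache.commons.dbcp2.BasicDataSource

// Lazily created once per job-server JVM, ideally reused by every job it runs.
object SharedOraclePool {
  lazy val dataSource: BasicDataSource = {
    val ds = new BasicDataSource()
    ds.setDriverClassName("oracle.jdbc.OracleDriver")
    ds.setUrl("jdbc:oracle:thin:@//dbhost:1521/SERVICE")  // placeholder URL
    ds.setUsername("user")                                // placeholder credentials
    ds.setPassword("password")
    ds.setMaxTotal(20)                                    // cap connections across all jobs
    ds
  }

  def withConnection[T](f: Connection => T): T = {
    val conn = dataSource.getConnection()   // borrow from the pool
    try f(conn) finally conn.close()        // close() hands the connection back to the pool
  }
}

The part we are unsure about is how to make a pool like this visible to all
the jobs the job server spawns, rather than each job ending up with its own.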

--
Sateesh

On Thu, Apr 2, 2015 at 7:00 PM, Cody Koeninger <c...@koeninger.org> wrote:

> Connection pools aren't serializable, so you generally need to set them up
> inside of a closure.  Doing that for every item is wasteful, so you
> typically want to use mapPartitions or foreachPartition
>
> rdd.mapPartitions { part =>
>   val pool = setupPool()            // once per partition, not per record
>   part.map { item => ... }          // use the pool for each record
> }
>
>
>
> See "Design Patterns for using foreachRDD" in
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
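>
> A fuller version of that design pattern, roughly as the guide shows it
> (ConnectionPool and sendToDb below are placeholders for whatever pooling
> library and write logic you actually use):
>
> dstream.foreachRDD { rdd =>
>   rdd.foreachPartition { partitionOfRecords =>
>     // ConnectionPool: a static, lazily initialized pool living on each executor
>     val connection = ConnectionPool.getConnection()
>     partitionOfRecords.foreach(record => sendToDb(connection, record))
>     ConnectionPool.returnConnection(connection)  // return to the pool for reuse
>   }
> }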
>
> On Thu, Apr 2, 2015 at 7:52 AM, Sateesh Kavuri <sateesh.kav...@gmail.com>
> wrote:
>
>> Right, I am aware of how to use connection pooling with Oracle, but the
>> specific question is how to use it in the context of Spark job execution.
>> On 2 Apr 2015 17:41, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>
>>> http://docs.oracle.com/cd/B10500_01/java.920/a96654/connpoca.htm
>>>
>>> The question doesn't seem to be Spark specific, btw
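>>>
>>> For the Oracle side, the doc above boils down to something like this
>>> (URL and credentials are placeholders):
>>>
>>> import oracle.jdbc.pool.OracleConnectionPoolDataSource
>>>
>>> val ds = new OracleConnectionPoolDataSource()
>>> ds.setURL("jdbc:oracle:thin:@//dbhost:1521/SERVICE")  // placeholder
>>> ds.setUser("user")
>>> ds.setPassword("password")
>>> val pooled = ds.getPooledConnection()  // physical connection held by the pool
>>> val conn = pooled.getConnection()      // logical handle; closing it returns it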
>>>
>>>
>>>
>>>
>>> > On Apr 2, 2015, at 4:45 AM, Sateesh Kavuri <sateesh.kav...@gmail.com>
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > We have a case where we will have to run concurrent jobs (for the same
>>> > algorithm) on different data sets. These jobs can run in parallel, and
>>> > each of them would be fetching its data from the database.
>>> > We would like to optimize the database connections by making use of
>>> > connection pooling. Any suggestions / best known ways to achieve this?
>>> > The database in question is Oracle.
>>> >
>>> > Thanks,
>>> > Sateesh
>>>
>>
>
