Looks like the way to go. 

Quick question regarding the connection pool approach - if I have a connection 
that gets lazily instantiated, will it automatically die when I kill the driver 
application? In my scenario, I can keep a connection open for the duration of 
the app, and I'm not that concerned about having idle connections as long as 
the app is running. For this specific scenario, do I still need to think about 
the timeout, or would the connection be shut down when the driver stops? 
(Using a standalone cluster, btw.)

Regards,
Ashic.

> From: tathagata.das1...@gmail.com
> Date: Thu, 11 Dec 2014 06:33:49 -0800
> Subject: Re: "Session" for connections?
> To: as...@live.com
> CC: user@spark.apache.org
> 
> Also, this is covered in the streaming programming guide in bits and pieces.
> http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
> 
> On Thu, Dec 11, 2014 at 4:55 AM, Ashic Mahtab <as...@live.com> wrote:
> > That makes sense. I'll try that.
> >
> > Thanks :)
> >
> >> From: tathagata.das1...@gmail.com
> >> Date: Thu, 11 Dec 2014 04:53:01 -0800
> >> Subject: Re: "Session" for connections?
> >> To: as...@live.com
> >> CC: user@spark.apache.org
> >
> >>
> >> You could create a lazily initialized singleton factory and connection
> >> pool. Whenever an executor starts running the first task that needs to
> >> push out data, it will create the connection pool as a singleton, and
> >> subsequent tasks running on the executor will use that connection
> >> pool. You will also have to shut the connections down intelligently,
> >> because there is no obvious point at which to shut them down. You
> >> could use a usage timeout - shut a connection down after it has not
> >> been used for 10 x the batch interval.
> >>
> >> TD
> >>
> >> On Thu, Dec 11, 2014 at 4:28 AM, Ashic Mahtab <as...@live.com> wrote:
> >> > Hi,
> >> > I was wondering if there's any way of having long-running, session-like
> >> > behaviour in Spark. For example, let's say we're using Spark Streaming
> >> > to listen to a stream of events. Upon receiving an event, we process
> >> > it, and if certain conditions are met, we wish to send a message to
> >> > RabbitMQ. Now, RabbitMQ clients have the concept of a connection
> >> > factory, from which you create a connection, from which you create a
> >> > channel. You use the channel to get a queue, and finally the queue is
> >> > what you publish messages on.
> >> >
> >> > Currently, what I'm doing can be summarised as:
> >> >
> >> > dstream.foreachRDD(x => x.foreachPartition(y => {
> >> >   val factory = ...
> >> >   val connection = ...
> >> >   val channel = ...
> >> >   val queue = channel.declareQueue(...)
> >> >
> >> >   y.foreach(z => Processor.Process(z, queue))
> >> >
> >> >   // clean up the queue/channel/connection
> >> > }))
> >> >
> >> > I'm doing the same thing for Cassandra, etc. Now in these cases, the
> >> > session initiation is expensive, so doing it per message is not a good
> >> > idea. However, I can't find a way to say "hey... do this per worker
> >> > once and only once".
> >> >
> >> > Is there a better pattern to do this?
> >> >
> >> > Regards,
> >> > Ashic.
> >>