Re: [Spark Streaming] Connect to Database only once at the start of Streaming job

2015-10-28 Thread Tathagata Das
Yeah, of course. Just create an RDD from JDBC, call cache()/persist(), then force it to be evaluated using something like count(). Once it is cached, you can use it in a StreamingContext; because of the cache, it should not access JDBC any more.
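A minimal sketch of that recipe, assuming a hypothetical MySQL table `lookup` with integer `id` and string `name` columns (the JDBC URL, credentials, and query are placeholders, not from the thread):

```scala
import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.storage.StorageLevel

// Build an RDD from a JDBC query. JdbcRDD requires the SQL to contain two
// '?' placeholders, bound per partition from lowerBound/upperBound.
def loadReferenceData(sc: SparkContext): JdbcRDD[(Int, String)] = {
  val rdd = new JdbcRDD(
    sc,
    () => DriverManager.getConnection("jdbc:mysql://dbhost/mydb", "user", "pass"),
    "SELECT id, name FROM lookup WHERE id >= ? AND id <= ?",
    1L,          // lowerBound
    1000L,       // upperBound
    4,           // numPartitions
    rs => (rs.getInt(1), rs.getString(2)))
  rdd.persist(StorageLevel.MEMORY_ONLY)
  rdd
}

// At job startup, before the streaming computation begins:
// val reference = loadReferenceData(sc)
// reference.count()  // action forces evaluation, populating the cache once
// Later, inside the streaming job, e.g. join each batch against it:
// stream.transform(batch => batch.join(reference))
```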

Re: [Spark Streaming] Connect to Database only once at the start of Streaming job

2015-10-28 Thread Tathagata Das
However, if an executor dies, Spark may reconnect to JDBC to reconstruct the RDD partitions that were lost. To prevent that, you can checkpoint the RDD to an HDFS-like filesystem (using rdd.checkpoint()). Then you are safe: it won't reconnect to JDBC.
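A sketch of the checkpointing step, assuming the JDBC-backed RDD from earlier (a `parallelize` stand-in is used here so the snippet is self-contained; the checkpoint directory path is hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))

// Must be set before checkpointing; should be a fault-tolerant filesystem.
sc.setCheckpointDir("hdfs:///tmp/rdd-checkpoints")

val jdbcData = sc.parallelize(Seq((1, "a"), (2, "b")))  // stand-in for the JdbcRDD
jdbcData.cache()       // keep it in memory for reuse across batches
jdbcData.checkpoint()  // mark for checkpointing; must precede the first action
jdbcData.count()       // triggers the job: data is cached and written to HDFS
// Lost partitions are now rebuilt from the checkpoint files, not from JDBC.
```

Calling cache() before checkpoint() is the usual pattern: without it, the RDD is computed twice, once for the action and once more to write the checkpoint.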

[Spark Streaming] Connect to Database only once at the start of Streaming job

2015-10-27 Thread Uthayan Suthakar
Hello all, What I wanted to do is configure the spark streaming job to read the database using JdbcRDD and cache the results. This should occur only once at the start of the job. It should not make any further connection to DB afterwards. Is it possible to do that?

Re: [Spark Streaming] Connect to Database only once at the start of Streaming job

2015-10-27 Thread diplomatic Guru
I know it uses a lazy evaluation model, which is why I was wondering. On 27 October 2015 at 19:02, Uthayan Suthakar wrote: