Yeah, of course. Just create an RDD from JDBC, call cache()/persist(), then
force it to be evaluated using something like count(). Once it is cached,
you can use it in a StreamingContext; because of the cache, it should not
access JDBC any more.
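A minimal sketch of that approach, assuming a hypothetical MySQL URL, credentials, and `lookup` table (all placeholders) and an existing SparkContext:

```scala
import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

val sc: SparkContext = ??? // your existing SparkContext

// Load the table once at job startup. Connection details, query, and key
// bounds below are illustrative placeholders, not from the original thread.
val jdbcRdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass"),
  "SELECT id, value FROM lookup WHERE id >= ? AND id <= ?",
  lowerBound = 1L, upperBound = 1000L, numPartitions = 4,
  rs => (rs.getLong(1), rs.getString(2)))

jdbcRdd.cache()  // mark the RDD for in-memory caching
jdbcRdd.count()  // force evaluation now, so the cache is populated up front

// Later, inside the streaming job, the cached RDD can be reused per batch,
// e.g.: stream.transform(batch => batch.join(jdbcRdd))
```

Because `count()` runs before the streaming job starts, the database is read exactly once; subsequent uses hit the cached partitions.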
On Tue, Oct 27, 2015 at 12:04 PM, diplomatic Guru
However, if an executor dies, then Spark may reconnect to JDBC to
reconstruct the RDD partitions that were lost. To prevent that, you can
checkpoint the RDD to an HDFS-like filesystem (using rdd.checkpoint()). Then
you are safe: it won't reconnect to JDBC.
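A sketch of the checkpointing step, continuing the example above (the checkpoint directory is a placeholder; `jdbcRdd` is the cached JdbcRDD from the previous reply):

```scala
// Write the RDD's data to reliable storage so that lost partitions are
// re-read from HDFS rather than recomputed via a new JDBC connection.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints") // placeholder path

jdbcRdd.cache()       // keep it in memory for fast access
jdbcRdd.checkpoint()  // must be called before the first action on the RDD
jdbcRdd.count()       // triggers caching and materializes the checkpoint
```

Note that `checkpoint()` only marks the RDD; the actual write to the checkpoint directory happens when the next action (here `count()`) runs.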
On Tue, Oct 27, 2015 at 11:17 PM, Tathagata
Hello all,
What I wanted to do is configure the spark streaming job to read the
database using JdbcRDD and cache the results. This should occur only once
at the start of the job. It should not make any further connection to DB
afterwards. Is it possible to do that?
I know Spark uses a lazy evaluation model, which is why I was wondering.
On 27 October 2015 at 19:02, Uthayan Suthakar
wrote: