Hi guys,
I am new to Spark, and we are running a small project that collects data from
Kinesis and inserts it into Mongo.
I would like to share a high-level view of how it is done and would love your
input on it.

I am fetching Kinesis data, and for each RDD:
  -> Parsing the string data
  -> Inserting it into Mongo
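Roughly, the code looks like the sketch below (simplified; it assumes the
stream yields raw byte arrays of JSON, and the URI, database, and collection
names are placeholders, not our real ones):

  import org.apache.spark.streaming.dstream.DStream
  import org.bson.Document
  import com.mongodb.client.MongoClients

  def save(stream: DStream[Array[Byte]]): Unit = {
    stream.foreachRDD { rdd =>
      rdd.foreach { bytes =>
        // Parsing happens on the worker
        val doc = Document.parse(new String(bytes, "UTF-8"))
        // A new client for every write -- this is the expensive part
        val client = MongoClients.create("mongodb://localhost:27017")
        try client.getDatabase("mydb").getCollection("events").insertOne(doc)
        finally client.close()
      }
    }
  }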

My understanding is that the parsing closure we run on each RDD is serialized
and sent to the workers. So when I want to write to Mongo, each worker ends up
creating a new connection for every write.

Is there any way I can use a connection pool here? By the way, I am using
Scala and Spark Streaming.
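For example, is something like the pattern below the right direction? The idea
is one shared client per executor JVM (via a singleton object), writing per
partition rather than per record; since the Mongo driver's client keeps an
internal connection pool, sharing it should give pooled connections. The
MongoConnector object and the URI/database/collection names are just
illustrative, not working code from our project:

  import org.apache.spark.streaming.dstream.DStream
  import org.bson.Document
  import com.mongodb.client.{MongoClient, MongoClients, MongoCollection}

  object MongoConnector {
    // Initialized lazily, once per executor JVM; the driver pools
    // connections to Mongo internally
    lazy val client: MongoClient =
      MongoClients.create("mongodb://localhost:27017")
    def collection: MongoCollection[Document] =
      client.getDatabase("mydb").getCollection("events")
  }

  def save(stream: DStream[Array[Byte]]): Unit = {
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One collection handle per partition, reused for every record in it
        val coll = MongoConnector.collection
        records.foreach { bytes =>
          coll.insertOne(Document.parse(new String(bytes, "UTF-8")))
        }
      }
    }
  }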


A.K.M. Ashrafuzzaman
Lead Software Engineer
NewsCred

(M) 880-175-5592433

