Hi, I have some questions regarding usage patterns and debugging in Spark / Spark Streaming.
1. What are some common design patterns for using broadcast variables? In my application I created a few, along with a scheduled task that periodically refreshes them. How do people generally achieve this efficiently and in a modular way?

2. Sometimes an uncaught exception in the driver program or a worker does not get traced anywhere. How can we debug this?

3. In our use case we read from Kafka, do some mapping, and finally persist the data to Cassandra as well as push it over a remote actor for real-time updates in a dashboard. I tried the approaches below:
   - First, a very naive way: stream.map(...).foreachRDD(/* push to actor */). This did not work; the stage failed with an Akka exception.
   - Second, the akka.serialization.JavaSerializer.withSystem(system) {...} approach. This did not work either; the stage failed, but without any trace anywhere in the logs.
   - Finally, rdd.collect to bring the output into the driver and push it to the actor from there. This worked.
   Is there a more efficient way of achieving this sort of use case?

4. Sometimes I see failed stages, but when I open the stage details it says the stage did not start. What does this mean?

Looking forward to some interesting responses :)

Thanks,
--
Sourav Chandra
Senior Software Engineer
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
sourav.chan...@livestream.com
o: +91 80 4121 8723
m: +91 988 699 3746
skype: sourav.chandra
Livestream
"Ajmera Summit", First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main, 3rd Block, Koramangala Industrial Area, Bangalore 560034
www.livestream.com
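P.S. For context on question 1, here is roughly the refresh pattern I have now, as a simplified sketch (class and method names are illustrative, not from any library). Since a Broadcast is immutable, the scheduled task creates a new broadcast and swaps the reference that the streaming jobs read:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Wraps a broadcast variable so it can be periodically rebuilt from a
// user-supplied loader function. `load` is whatever fetches fresh data
// (e.g. a lookup table from a DB); it runs on the driver.
class RefreshableBroadcast[T](sc: SparkContext, load: () => T, periodSec: Long) {
  private val ref = new AtomicReference[Broadcast[T]](sc.broadcast(load()))

  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = {
      // Broadcast a fresh copy, then release the old one on the executors.
      val old = ref.getAndSet(sc.broadcast(load()))
      old.unpersist(blocking = false)
    }
  }, periodSec, periodSec, TimeUnit.SECONDS)

  // Each batch should call get at job-submission time (on the driver)
  // so closures capture the current Broadcast, not a stale one.
  def get: Broadcast[T] = ref.get()
}
```

Is something along these lines what people do, or is there a cleaner idiom?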
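P.P.S. For question 3, this is roughly the shape of the collect-based approach that worked for me, plus a commented foreachPartition variant I am considering to avoid closing over a non-serializable ActorRef (identifiers such as `dashboardActor`, `transform` and `DashboardClient` are illustrative placeholders, not real APIs):

```scala
// `dashboardActor` lives in the driver's ActorSystem; `transform` is our
// mapping function; both are placeholders for this sketch.
stream.map(transform).foreachRDD { rdd =>
  // Option A (what I do today): small per-batch output, so collect on the
  // driver and push from there. No executor-side actor serialization needed.
  val records = rdd.collect()
  records.foreach(record => dashboardActor ! record)

  // Option B (untried): for larger output, open a connection per partition
  // on the executors instead of shipping an ActorRef in the closure.
  // rdd.foreachPartition { iter =>
  //   val conn = DashboardClient.connect() // created on the executor itself
  //   iter.foreach(conn.send)
  //   conn.close()
  // }
}
```

Would Option B be the recommended direction, or is there a better pattern for pushing stream output to an external actor?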