Hi, I have some questions regarding usage patterns and debugging in Spark / Spark Streaming.
1. What are some common design patterns for using broadcast variables? In my application I created a few, along with a scheduled task that periodically refreshes them. How do people generally achieve this efficiently and in a modular way?

2. Sometimes an uncaught exception in the driver program or a worker does not get traced anywhere. How can we debug this?

3. In our use case we read from Kafka, do some mapping, and finally persist the data to Cassandra as well as push it over a remote actor for real-time updates in a dashboard. I tried the approaches below:
   - First, a very naive way: stream.map(...).foreachRDD(/* push to actor */). This did not work; the stage failed with an Akka exception.
   - Second, the akka.serialization.JavaSerializer.withSystem(system) {...} approach. This did not work either; the stage failed, but without any trace anywhere in the logs.
   - Finally, rdd.collect to bring the output into the driver and push it to the actor from there. This worked.
   Is there a more efficient way of achieving this sort of use case?

4. Sometimes I see failed stages, but when I open the stage details it says the stage did not start. What does this mean?

Looking forward to some interesting responses :)

Thanks,
--
Sourav Chandra
Senior Software Engineer
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
sourav.chan...@livestream.com
o: +91 80 4121 8723
m: +91 988 699 3746
skype: sourav.chandra
Livestream
"Ajmera Summit", First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main, 3rd Block, Koramangala Industrial Area, Bangalore 560034
www.livestream.com
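P.S. For context on question 1, here is roughly the refresh pattern I have now, as a simplified sketch (class and method names are illustrative, not from any library). Since a Broadcast is immutable, the scheduled task creates a new broadcast and swaps the reference that the streaming jobs read:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Wraps a broadcast variable so it can be periodically rebuilt from a
// user-supplied loader function. `load` is whatever fetches fresh data
// (e.g. a lookup table from a DB); it runs on the driver.
class RefreshableBroadcast[T](sc: SparkContext, load: () => T, periodSec: Long) {
  private val ref = new AtomicReference[Broadcast[T]](sc.broadcast(load()))

  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = {
      // Broadcast a fresh copy, then release the old one on the executors.
      val old = ref.getAndSet(sc.broadcast(load()))
      old.unpersist(blocking = false)
    }
  }, periodSec, periodSec, TimeUnit.SECONDS)

  // Each batch should call get at job-submission time (on the driver)
  // so closures capture the current Broadcast, not a stale one.
  def get: Broadcast[T] = ref.get()
}
```

Is something along these lines what people do, or is there a cleaner idiom?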
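P.P.S. For question 3, this is roughly the shape of the collect-based approach that worked for me, plus a commented foreachPartition variant I am considering to avoid closing over a non-serializable ActorRef (identifiers such as `dashboardActor`, `transform` and `DashboardClient` are illustrative placeholders, not real APIs):

```scala
// `dashboardActor` lives in the driver's ActorSystem; `transform` is our
// mapping function; both are placeholders for this sketch.
stream.map(transform).foreachRDD { rdd =>
  // Option A (what I do today): small per-batch output, so collect on the
  // driver and push from there. No executor-side actor serialization needed.
  val records = rdd.collect()
  records.foreach(record => dashboardActor ! record)

  // Option B (untried): for larger output, open a connection per partition
  // on the executors instead of shipping an ActorRef in the closure.
  // rdd.foreachPartition { iter =>
  //   val conn = DashboardClient.connect() // created on the executor itself
  //   iter.foreach(conn.send)
  //   conn.close()
  // }
}
```

Would Option B be the recommended direction, or is there a better pattern for pushing stream output to an external actor?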