Hi guys,

My Spark Streaming application hits "java.lang.OutOfMemoryError: GC overhead limit exceeded" in the driver program. I have done the following to debug it:
1. Increased the driver memory from 1GB to 2GB: the error then appeared after 22 hours instead of 10 hours, so I suspect a memory leak.

2. A few hours after starting the application, I killed all the workers. The driver program kept running and kept filling up memory. I had thought the cause was too many batches queuing up, but apparently it is not: otherwise, after killing the workers (and therefore the receiver), the memory usage should have gone down.

3. Took a heap dump and ran the Leak Suspects report of the Eclipse Memory Analyzer, which found:

"One instance of "org.apache.spark.storage.BlockManager" loaded by "sun.misc.Launcher$AppClassLoader @ 0x6c002fb90" occupies 1,477,177,296 (72.70%) bytes. The memory is accumulated in one instance of "java.util.LinkedHashMap" loaded by "<system class loader>".
Keywords: sun.misc.Launcher$AppClassLoader @ 0x6c002fb90, java.util.LinkedHashMap, org.apache.spark.storage.BlockManager"

What my application mainly does is:

1. calculate the sum/count in a batch
2. get the average in the batch
3. store the result in the DB
4. calculate the sum/count in a window
5. get the average/min/max in the window
6. store the result in the DB
7. compare the current batch value with the previous batch value using updateStateByKey

(A stripped-down sketch of this pipeline is included after my signature.)

Any hint on what causes this leak? Thank you.

Cheers,

Fang, Yan
yanfang...@gmail.com
+1 (206) 849-4108
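P.S. In case the shape of the job helps: below is a minimal sketch of the pipeline described above, not the actual code. The input source, batch/window durations, key/value types, checkpoint path, and the saveToDb sink are all hypothetical stand-ins.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

object PipelineSketch {
  // hypothetical sink standing in for the real DB write
  def saveToDb[T](rows: Iterator[T]): Unit = rows.foreach(println)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PipelineSketch")
    val ssc  = new StreamingContext(conf, Seconds(10))  // batch interval is a guess
    ssc.checkpoint("/tmp/spark-checkpoint")             // required by updateStateByKey

    // hypothetical receiver input: "key,value" lines
    val events = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(k, v) = line.split(",")
      (k, v.toDouble)
    }

    // 1-2. sum/count per batch, then the batch average
    val batchAvg = events.mapValues(v => (v, 1L))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      .mapValues { case (sum, cnt) => sum / cnt }

    // 3. store the batch result in the DB
    batchAvg.foreachRDD(_.foreachPartition(saveToDb))

    // 4-6. sum/count/min/max over a sliding window, then avg/min/max
    val windowStats = events.mapValues(v => (v, 1L, v, v))
      .reduceByKeyAndWindow(
        (a, b) => (a._1 + b._1, a._2 + b._2, math.min(a._3, b._3), math.max(a._4, b._4)),
        Seconds(60), Seconds(10))                       // window/slide are guesses
      .mapValues { case (sum, cnt, mn, mx) => (sum / cnt, mn, mx) }
    windowStats.foreachRDD(_.foreachPartition(saveToDb))

    // 7. carry the previous batch average per key so it can be compared
    //    with the current one (the actual comparison is done downstream)
    val withPrev = batchAvg.updateStateByKey { (cur: Seq[Double], prev: Option[Double]) =>
      cur.lastOption.orElse(prev)
    }
    withPrev.foreachRDD(_.foreachPartition(saveToDb))

    ssc.start()
    ssc.awaitTermination()
  }
}

One thing worth noting about step 7: updateStateByKey keeps a key's state across batches until the update function returns None for that key, so state for keys that are never dropped accumulates over time.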