Hello everybody,

I am running a Spark Streaming app that I plan to use as a long-running
service. However, while trying the app in an RC environment, I got this
exception in the master daemon after 1 hour of running:

Exception in thread "master-rebuild-ui-thread" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.regex.Pattern.compile(Pattern.java:1667)
        at java.util.regex.Pattern.<init>(Pattern.java:1351)
        at java.util.regex.Pattern.compile(Pattern.java:1054)
        at java.lang.String.replace(String.java:2239)
        at org.apache.spark.util.Utils$.getFormattedClassName(Utils.scala:1632)
        at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:486)
        at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
        at org.apache.spark.deploy.master.Master$$anonfun$17.apply(Master.scala:972)
        at org.apache.spark.deploy.master.Master$$anonfun$17.apply(Master.scala:952)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


As a stopgap measure I've increased the master's memory to 1.5 GB.
My job is running with a batch interval of 5 seconds.
I'm using Spark version 1.6.2.
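
For context, the driver side is set up roughly like this (a simplified
sketch; the app name and the elided transformations are illustrative, not
my actual code):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("long-running-stream")
    // 5-second batch interval; the job is meant to run indefinitely
    val ssc = new StreamingContext(conf, Seconds(5))
    // ... input DStreams and transformations ...
    ssc.start()
    ssc.awaitTermination()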

I think it might be related to these issues:

https://issues.apache.org/jira/browse/SPARK-6270
https://issues.apache.org/jira/browse/SPARK-12062
https://issues.apache.org/jira/browse/SPARK-12299
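
From the stack trace, the master appears to run out of memory while
replaying this application's event log to rebuild its web UI
(ReplayListenerBus.replay), and a streaming job that runs for hours
produces a very large event log. If that reading is right, one mitigation
short of upgrading might be to disable event logging for this app. This is
only a sketch of that assumption; I haven't verified that it avoids the
OOM:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("long-running-stream")
      // With no event log, the standalone master has nothing large to
      // replay when the app finishes; the trade-off is losing the
      // post-mortem application UI.
      .set("spark.eventLog.enabled", "false")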

But apart from upgrading Spark (or giving up event logging, as sketched
above), I don't see a clear path to a fix.
What would you recommend?


Thanks in advance
Mariano
