Simon King created SPARK-19814:
----------------------------------

             Summary: Spark History Server Out Of Memory / Extreme GC
                 Key: SPARK-19814
                 URL: https://issues.apache.org/jira/browse/SPARK-19814
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 2.1.0, 2.0.0, 1.6.1
         Environment: Spark History Server (we've run it on several different 
Hadoop distributions)
            Reporter: Simon King


Spark History Server runs out of memory, goes into GC thrashing, and eventually 
becomes unresponsive. This seems to happen more quickly under heavy use of the 
REST API. We've seen this with several versions of Spark.
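
For context, the REST usage in our case is basically periodic polling of the v1 API: list recent applications, then fetch per-application details. A minimal sketch of that access pattern is below; the host, port, query parameters, and polling interval are placeholders, not our exact client.
{code:scala}
// Rough sketch of the polling pattern that seems to accelerate the problem.
// The endpoint paths are the standard History Server v1 REST API; host, port,
// query parameters, and interval here are placeholders.
import scala.io.Source

object HistoryServerPoller {
  private val base = "http://history-server:18080/api/v1"

  def main(args: Array[String]): Unit = {
    while (true) {
      // List recently completed applications.
      val appsJson = Source.fromURL(s"$base/applications?status=completed&limit=50").mkString
      // For each application id parsed out of appsJson, fetch job details, e.g.:
      //   Source.fromURL(s"$base/applications/$appId/jobs").mkString
      // Each such request can force the History Server to load that app's UI into its cache.
      Thread.sleep(60 * 1000)
    }
  }
}
{code}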

Running with the following settings (Spark 2.1):
{code}
spark.history.fs.cleaner.enabled    true
spark.history.fs.cleaner.interval   1d
spark.history.fs.cleaner.maxAge     7d
spark.history.retainedApplications  500
{code}
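
Note that {{spark.history.retainedApplications}} bounds the number of application UIs kept in the History Server's cache, not how much heap they occupy, so 500 retained UIs can still overrun a 4G heap if individual applications are large. A rough illustration of a count-bounded cache of that kind (assuming a Guava-style cache, which is what the History Server appears to use) is:
{code:scala}
// Illustration only: a cache bounded by entry count evicts by number of
// entries, not by how much heap each entry holds, so its memory footprint
// is effectively unbounded. Guava CacheBuilder is used here as an example.
import com.google.common.cache.CacheBuilder

object CountBoundedCache {
  def main(args: Array[String]): Unit = {
    val cache = CacheBuilder.newBuilder()
      .maximumSize(500) // analogous to spark.history.retainedApplications
      .build[String, Array[Byte]]()

    // 500 small entries are harmless; 500 entries of tens of MB each are not,
    // even though the cache never exceeds its configured (entry-count) size.
    (1 to 500).foreach(i => cache.put(s"app-$i", new Array[Byte](1024)))
    println(s"entries cached: ${cache.size()}")
  }
}
{code}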

We will eventually get errors like:
{code}
17/02/25 05:02:19 WARN ServletHandler:
javax.servlet.ServletException: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
  at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
  at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
  at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
  at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
  at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
  at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
  at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
  at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
  at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
  at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
  at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:529)
  at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
  at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.spark_project.jetty.server.Server.handle(Server.java:499)
  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
  at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
  at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
  at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
  at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
  at java.lang.Thread.run(Thread.java:745)

Caused by: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
  at org.apache.spark.deploy.history.ApplicationCache.getSparkUI(ApplicationCache.scala:148)
  at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:110)
  at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:244)
  at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:49)
  at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
  at sun.reflect.GeneratedMethodAccessor102.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter$1.run(SubResourceLocatorRouter.java:158)
  at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.getResource(SubResourceLocatorRouter.java:178)
  at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.apply(SubResourceLocatorRouter.java:109)
  at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:109)
  at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
  at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
  at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
  at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
  at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:92)
  at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:61)
  at org.glassfish.jersey.process.internal.Stages.process(Stages.java:197)
  at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:318)
  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
  at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
  at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
  at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
  at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
  at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
  at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
  at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
  at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
{code}

In our case, memory usage increases gradually over roughly two days and then 
levels off near the max heap size (4G for us); within another 12-24 hours GC 
activity starts to climb, and errors like the stack trace above become more 
and more frequent.
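
The scala.MatchError itself looks like a secondary symptom: the frame at ApplicationCache.getSparkUI suggests a failure-handling match that only expects Exception causes, so an OutOfMemoryError (an Error, not an Exception) falls through every case. A minimal sketch of how that shape produces a MatchError is below; it is an illustration of the failure mode, not the actual Spark source.
{code:scala}
// Illustration (not the actual Spark code): matching only on Exception
// subclasses turns an OutOfMemoryError cause into a scala.MatchError.
object MatchErrorDemo {
  def handleLoadFailure(wrapped: Throwable): Option[String] =
    wrapped.getCause match {
      case _: java.util.NoSuchElementException => None    // "application not found"
      case e: Exception                        => throw e // rethrow real failures
      // No case covers java.lang.Error, so an OutOfMemoryError cause escapes
      // as scala.MatchError instead of being reported as an OOM.
    }

  def main(args: Array[String]): Unit = {
    val wrapped = new RuntimeException(new OutOfMemoryError("GC overhead limit exceeded"))
    handleLoadFailure(wrapped) // throws scala.MatchError: java.lang.OutOfMemoryError: ...
  }
}
{code}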


