[ https://issues.apache.org/jira/browse/SPARK-19814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon King closed SPARK-19814.
------------------------------
    Resolution: Duplicate

It looks like it was wrong to characterize this as a bug -- we couldn't identify an actual memory leak. It seems we will have to wait for the major overhaul proposed in https://issues.apache.org/jira/browse/SPARK-18085

> Spark History Server Out Of Memory / Extreme GC
> -----------------------------------------------
>
>                 Key: SPARK-19814
>                 URL: https://issues.apache.org/jira/browse/SPARK-19814
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 1.6.1, 2.0.0, 2.1.0
>         Environment: Spark History Server (we've run it on several different Hadoop distributions)
>            Reporter: Simon King
>         Attachments: SparkHistoryCPUandRAM.png
>
> The Spark History Server runs out of memory, gets into GC thrash, and eventually becomes unresponsive. This seems to happen more quickly with heavy use of the REST API. We've seen this with several versions of Spark.
> We are running with the following settings (Spark 2.1):
> spark.history.fs.cleaner.enabled true
> spark.history.fs.cleaner.interval 1d
> spark.history.fs.cleaner.maxAge 7d
> spark.history.retainedApplications 500
> We will eventually get errors like:
> 17/02/25 05:02:19 WARN ServletHandler:
> javax.servlet.ServletException: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
>   at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:489)
>   at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
>   at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>   at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
>   at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
>   at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
>   at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
>   at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at org.spark_project.jetty.servlets.gzip.GzipHandler.handle(GzipHandler.java:529)
>   at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.spark_project.jetty.server.Server.handle(Server.java:499)
>   at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>   at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
>   at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
>   at org.apache.spark.deploy.history.ApplicationCache.getSparkUI(ApplicationCache.scala:148)
>   at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:110)
>   at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:244)
>   at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:49)
>   at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
>   at sun.reflect.GeneratedMethodAccessor102.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter$1.run(SubResourceLocatorRouter.java:158)
>   at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.getResource(SubResourceLocatorRouter.java:178)
>   at org.glassfish.jersey.server.internal.routing.SubResourceLocatorRouter.apply(SubResourceLocatorRouter.java:109)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:109)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage._apply(RoutingStage.java:112)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:92)
>   at org.glassfish.jersey.server.internal.routing.RoutingStage.apply(RoutingStage.java:61)
>   at org.glassfish.jersey.process.internal.Stages.process(Stages.java:197)
>   at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:318)
>   at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
>   at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
>   at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
>   at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
>   at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
>   at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
>   at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
>   at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
>   at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
> In our case, memory usage gradually increases over roughly two days, then levels off near the max heap size (4 GB in our case); often within another 12-24 hours GC activity starts to climb, producing more and more frequent errors like the stack trace above.
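
A note on the error's shape (an editorial aside, not part of the original report): a "scala.MatchError: ... (of class java.lang.OutOfMemoryError)" means a match expression received a value none of its cases covered. The Scala sketch below is a hypothetical reconstruction of that failure mode, not Spark's actual ApplicationCache code: a cache-lookup wrapper whose match on the exception cause handles only Exception subclasses, so a fatal java.lang.Error such as OutOfMemoryError resurfaces as a MatchError, which is why the servlet layer reports a MatchError instead of the underlying OOM.

    import java.util.concurrent.ExecutionException

    object MatchErrorSketch {
      // Stand-in for a cache lookup whose loader died with an OOM; the
      // cache wraps whatever the loader threw in an ExecutionException.
      private def cachedLookup(appId: String): String =
        throw new ExecutionException(
          new OutOfMemoryError("GC overhead limit exceeded"))

      def getSparkUI(appId: String): Option[String] =
        try Some(cachedLookup(appId))
        catch {
          case e: ExecutionException =>
            e.getCause match {
              case _: NoSuchElementException => None // app not found
              case cause: Exception          => throw cause
              // No case covers java.lang.Error, so an OutOfMemoryError
              // cause escapes here as scala.MatchError(...).
            }
        }

      def main(args: Array[String]): Unit =
        getSparkUI("app-0001") // throws scala.MatchError: java.lang.OutOfMemoryError: ...
    }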
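
On the memory profile itself (again an editorial sketch with illustrative names, assuming Guava's cache on the classpath; this is not Spark's actual implementation): the cleaner settings above only prune event-log files on disk, while spark.history.retainedApplications bounds an in-memory cache of fully rebuilt application UIs by entry count, not by bytes. A count-bounded cache along these lines shows why heap usage climbs as the REST API touches more applications, and why 500 retained UIs can fill a 4 GB heap:

    import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

    // Illustrative stand-in for a rebuilt SparkUI: replaying one event log
    // can materialize a large object graph per application.
    final case class LoadedUI(appId: String, events: Seq[String])

    object RetainedUiCacheSketch {
      private def rebuildUI(appId: String): LoadedUI =
        LoadedUI(appId, Seq.fill(100000)(s"event-for-$appId"))

      // maximumSize counts entries, not bytes, so a "bounded" cache can
      // still pin up to 500 arbitrarily large UIs on the heap at once.
      val uiCache: LoadingCache[String, LoadedUI] =
        CacheBuilder.newBuilder()
          .maximumSize(500) // plays the role of spark.history.retainedApplications
          .build(new CacheLoader[String, LoadedUI] {
            override def load(appId: String): LoadedUI = rebuildUI(appId)
          })

      def main(args: Array[String]): Unit = {
        // Each request for an uncached app triggers a full rebuild and
        // keeps the result resident until LRU eviction.
        println(uiCache.get("app-0001").events.size)
      }
    }

Until the SPARK-18085 overhaul (which proposes moving this state out of the heap), the practical levers are a smaller spark.history.retainedApplications or a larger daemon heap via SPARK_DAEMON_MEMORY.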