[ https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357627#comment-14357627 ]
Josh Rosen commented on SPARK-6270:
-----------------------------------

In the long run, my preference is to remove HistoryServer-like responsibilities from the Master: the standalone Master is typically configured with a small amount of memory and risks OOMing when loading UIs, even if the UI loading is done asynchronously (right now it blocks the main event processing thread).

We might consider adding lazy loading as an intermediate stepping stone to properly fixing this issue, but I'd like to argue against that approach: lazy loading inside of the Master is going to require mechanisms similar to what we have in the HistoryServer's loaderServlet, so we're either going to have to duplicate a bunch of code or change the HistoryServer code to be more modular so that we can reuse its components inside of the Master.

Another consideration is firewall / port issues: currently, the master web UI and the Spark web UIs that it loads are served on the same port. If we set up a new Jetty server for the UIs, whether in the same Master JVM or in a separate HistoryServer process, then the Spark UIs will be served on a different port, potentially breaking those links in environments where only the master web UI port is exposed. I think it's going to be really painful to avoid this, though, and I don't think we should resort to solutions where we proxy the Spark UI through the master UI, since the responses could be huge and lead to OOMs in the proxy.

I think we should introduce a new configuration which completely disables the master's Spark UI serving feature, backport this to all maintenance branches, and mention this feature in the release notes.

For Spark 1.4, I think we should completely remove the web UI serving from the Master and provide the ability to configure the master with a HistoryServer address which will be used to generate links to UIs.
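
To make the link-generation proposal concrete, here is a minimal, language-neutral sketch written in Python. It is not Spark code: `MasterUiSettings`, `history_server_url`, and `completed_app_link` are hypothetical names, and the `/history/<application id>` URL layout is an assumption about how the HistoryServer exposes application UIs. The point is only that the Master builds a URL and never parses an event log itself.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class MasterUiSettings:
    # Hypothetical setting; this is not an actual Spark configuration key.
    history_server_url: Optional[str] = None


def completed_app_link(settings: MasterUiSettings, app_id: str) -> Optional[str]:
    """Return a link to a finished application's UI, or None if unavailable.

    The Master does no event-log loading here; it only points the browser
    at a separately running HistoryServer.
    """
    if settings.history_server_url is None:
        # No HistoryServer configured: show the application without a UI link.
        return None
    base = settings.history_server_url.rstrip("/")
    # Assumed URL layout: <history server>/history/<application id>
    return f"{base}/history/{quote(app_id, safe='')}"
```

With no address configured, the function returns `None`, which corresponds to the "UI serving disabled" behaviour proposed for the maintenance branches.
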
This runs into its own set of problems, though: the current HistoryServer FSHistoryProvider assumes that all applications' event logs are located in the same directory, whereas the Master can load event logs from any directory that is specified in the application description. This means that we'll need a way to instruct the HistoryServer to load logs from an arbitrary path. Therefore, maybe we should extend the HistoryServer's HTTP interface to allow requests to specify the event log location (falling back to the history server's default event log directory if no alternate log location is specified). This could have security implications, though; we'd have to be careful to ensure that this doesn't allow arbitrary file reads.

> Standalone Master hangs when streaming job completes
> ----------------------------------------------------
>
>                 Key: SPARK-6270
>                 URL: https://issues.apache.org/jira/browse/SPARK-6270
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Streaming
>    Affects Versions: 1.2.0, 1.3.0, 1.2.1
>            Reporter: Tathagata Das
>            Priority: Critical
>
> If the event logging is enabled, the Spark Standalone Master tries to
> recreate the web UI of a completed Spark application from its event logs.
> However, if this event log is huge (e.g. for a Spark Streaming application),
> then the master hangs in its attempt to read and recreate the web UI. This
> hang causes the whole standalone cluster to be unusable.
> Workaround is to disable the event logging.
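
The arbitrary-file-read concern raised in the comment can be illustrated with a small sketch, again in Python rather than Spark code. It assumes event logs live on a local filesystem (real deployments often use HDFS, where the check would look different), and `resolve_event_log_dir`, `default_dir`, and `allowed_roots` are hypothetical names. A requested location is accepted only if, after normalization, it sits inside an explicitly allowed root; otherwise the request is rejected.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_event_log_dir(
    requested: Optional[str],
    default_dir: str,
    allowed_roots: Iterable[str],
) -> Path:
    """Pick the directory to load event logs from for one request.

    Falls back to the default directory when the request names no location,
    and refuses any location outside the allowed roots.
    """
    if requested is None:
        return Path(default_dir).resolve()

    # resolve() collapses ".." segments and follows symlinks, so a request
    # such as "<root>/../secrets" is compared by its real target.
    candidate = Path(requested).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        if candidate == root_path or root_path in candidate.parents:
            return candidate

    raise PermissionError(f"event log location not allowed: {requested}")
```

Comparing resolved paths rather than raw strings is the design choice that matters here: a plain prefix check on the request string would let `..` segments or symlinks escape the allowed directory.
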