[ https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053512#comment-15053512 ]
Steve Loughran commented on SPARK-6270:
---------------------------------------

Replay time itself is going to be steep. Given that the summary metadata doesn't really need a full replay, that is a large amount of wasted IO, codec CPU, and JSON deserialization. Now, if someone were to add a protobuf or Avro event format, you'd get really good compression in exchange for the suffering of developers.

What could boost startup is what comes in the YARN timeline integration: extraction of summary data (times, finished flag) without having to do the replay. A summary file alongside the main one would work there, perhaps with the file length of the real log listed in the summary so as to prove that the summary is in sync with the saved log (mismatch == fall back to replay, then save the summary for next time).

There's one more thing to consider with those standalone logs: if the destination is an object store, should the flush/commit logic be different? You'd want to make sure that an s3a destination had multipart upload enabled, then have a partial upload trigger on a flush-class event rather than wait until the end of the run. Today you don't get those guarantees, and hence run the risk that a failed app could lose its history.

> Standalone Master hangs when streaming job completes and event logging is enabled
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-6270
>                 URL: https://issues.apache.org/jira/browse/SPARK-6270
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Streaming
>    Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.5.1
>            Reporter: Tathagata Das
>            Priority: Critical
>
> If the event logging is enabled, the Spark Standalone Master tries to
> recreate the web UI of a completed Spark application from its event logs.
> However if this event log is huge (e.g. for a Spark Streaming application),
> then the master hangs in its attempt to read and recreate the web ui. This
> hang causes the whole standalone cluster to be unusable.
> Workaround is to disable the event logging.
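
The summary-file idea from the comment above (a small file next to the event log that records the log's length, with a fallback to replay on mismatch) could look roughly like the sketch below. It is only an illustration, not Spark's implementation: the `.summary` suffix, the `log_length` and `summary` keys, and the function names `replay_summary` and `load_summary` are all hypothetical, and the event parsing is simplified to one JSON object per line with `Event` and `Timestamp` fields.

```python
import json
import os


def replay_summary(log_path):
    """Slow path: read every event in the log to derive the summary fields."""
    start = end = None
    finished = False
    with open(log_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)
            name = event.get("Event")
            if name == "SparkListenerApplicationStart":
                start = event.get("Timestamp")
            elif name == "SparkListenerApplicationEnd":
                end = event.get("Timestamp")
                finished = True
    return {"start": start, "end": end, "finished": finished}


def load_summary(log_path):
    """Return (summary, replayed).

    The cached summary is trusted only if the log length it recorded matches
    the current length of the log; otherwise the log is replayed and the
    summary file is rewritten for next time.
    """
    summary_path = log_path + ".summary"  # hypothetical naming convention
    log_length = os.path.getsize(log_path)
    try:
        with open(summary_path, "r", encoding="utf-8") as f:
            cached = json.load(f)
        if cached.get("log_length") == log_length:
            return cached["summary"], False
    except (OSError, ValueError, KeyError, AttributeError):
        pass  # missing or unreadable summary: fall through to replay

    summary = replay_summary(log_path)
    with open(summary_path, "w", encoding="utf-8") as f:
        json.dump({"log_length": log_length, "summary": summary}, f)
    return summary, True
```

A length check is cheap but weak: it detects a log that grew after the summary was written, not one that was rewritten to the same size. A checksum would be stronger, at the cost of reading the whole log again, which is the IO the summary is meant to avoid.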