GitHub user andrewor14 opened a pull request:

    https://github.com/apache/spark/pull/204

    [SPARK-1276] Add a HistoryServer to render persisted UI

    Currently, a persisted UI can only be rendered through the standalone 
Master. This greatly limits the usefulness of the new event logging feature, 
which records the details of a Spark application as events, since many people 
also run Spark on YARN or Mesos.
    
    This PR introduces a new entity called the HistoryServer, which, given a 
log directory, keeps track of all completed applications independently of a 
Spark Master. Unlike the Master, the HistoryServer need not be running while 
the application is still running. It is relatively lightweight in that it only 
maintains static information about applications after the fact.
    
    To quickly test it out, generate event logs with 
```spark.eventLog.enabled=true``` and run ```sbin/start-history-server.sh 
<log-dir-path>```. Your HistoryServer awaits on port 18080.
    
    A few other changes introduced in this PR include refactoring the WebUI 
interface, which has begun to accumulate duplicate code as we add more 
functionality to it. Two new SparkListenerEvents have been introduced 
(SparkListenerApplicationStart/End) to keep track of the application name and 
start/finish times. This PR also clarifies the semantics of the 
ReplayListenerBus introduced in #42.
    
    A potential TODO for the future (not part of this PR) is to render 
applications that are still logging events, in addition to completed ones. 
This would be useful when an application fails, in which case the current 
HistoryServer does not render the associated UI unless the user manually 
signals application completion. Processing the event logs in this case becomes 
significantly more complicated, however, because we must deal with multiple 
levels of streams, each of which may behave arbitrarily, if we want to avoid 
processing the entire file over and over again.
    
    Comments and feedback are most welcome.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/204.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #204
    
----
commit c086bd5c6837a98d3c989c43f2b75aeaa0e5eff0
Author: Andrew Or <andrewo...@gmail.com>
Date:   2014-03-20T19:43:16Z

    Add HistoryServer and scripts ++ Refactor WebUI interface
    
    HistoryServer can be launched with ./sbin/start-history-server.sh <log-dir>
    and stopped with ./sbin/stop-history-server.sh. This commit also involves
    refactoring all the UIs to avoid duplicate code.

commit 8aac16355329809b11c76430fa8737d328f2e962
Author: Andrew Or <andrewo...@gmail.com>
Date:   2014-03-20T21:34:34Z

    Add basic application table

commit 758441890dc86c8ed069e6c684b21528038f2ff7
Author: Andrew Or <andrewo...@gmail.com>
Date:   2014-03-21T04:59:34Z

    Report application start/end times to HistoryServer
    
    This involves adding application start and end events. This also
    allows us to record the actual app name instead of simply using
    the name of the directory.

commit 60bc6d57577742e861d62c183ec56d9893e3ea6a
Author: Andrew Or <andrewo...@gmail.com>
Date:   2014-03-22T01:17:43Z

    First complete implementation of HistoryServer (only for finished apps)
    
    This involves a change in Spark's event log format. All event logs are
    now prefixed with EVENT_LOG_. If compression is used, the logger creates
    a special empty file prefixed with COMPRESSION_CODEC_ that indicates which
    codec is used. After the application finishes, the logger logs a special
    empty file named APPLICATION_COMPLETE.
    
    The ReplayListenerBus is now responsible for parsing all of the above
    file formats. In this commit, we establish a one-to-one mapping between
    ReplayListenerBus and event logging applications. The semantics of the
    ReplayListenerBus are further clarified (e.g. replay is not allowed
    before starting, and can only be called once).
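    
    As a rough illustration of the directory layout described above, the
    following Python sketch classifies the files in one application's log
    directory. It is not the ReplayListenerBus implementation: the helper
    name describe_app_dir is hypothetical, and reading the codec name from
    the text after the COMPRESSION_CODEC_ prefix is an assumption.

```python
import os

# File name conventions taken from the commit message above.
EVENT_LOG_PREFIX = "EVENT_LOG_"
COMPRESSION_CODEC_PREFIX = "COMPRESSION_CODEC_"
APPLICATION_COMPLETE = "APPLICATION_COMPLETE"


def describe_app_dir(app_dir):
    """Summarize one application's log directory (hypothetical helper)."""
    names = sorted(os.listdir(app_dir))
    event_logs = [n for n in names if n.startswith(EVENT_LOG_PREFIX)]
    # Assumption: the codec is named by whatever follows the prefix.
    codecs = [n[len(COMPRESSION_CODEC_PREFIX):] for n in names
              if n.startswith(COMPRESSION_CODEC_PREFIX)]
    return {
        "event_logs": event_logs,
        "codec": codecs[0] if codecs else None,
        "complete": APPLICATION_COMPLETE in names,
    }
```

    Under these assumptions, a directory without the APPLICATION_COMPLETE
    marker is reported as not complete, which corresponds to the case the
    HistoryServer does not yet render.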
    
    This commit also adds a control mechanism for the frequency at which
    HistoryServer accesses the disk to check for log updates. This enforces
    a minimum interval of N seconds between two checks, where N is arbitrarily
    chosen to be 5.
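    
    The minimum-interval control described above can be sketched in Python
    as follows. This is only an illustration of the idea, not the code in
    this commit: the class name ThrottledChecker, its parameters, and the
    injectable clock are all hypothetical.

```python
import time


class ThrottledChecker:
    """Runs `check` at most once per `min_interval_s` seconds (hypothetical)."""

    def __init__(self, check, min_interval_s=5.0, clock=time.monotonic):
        self._check = check
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_check = None  # time of the last check that actually ran

    def maybe_check(self):
        """Run the check if enough time has passed; return True if it ran."""
        now = self._clock()
        too_soon = (self._last_check is not None
                    and now - self._last_check < self._min_interval_s)
        if too_soon:
            return False  # skip: the previous check was too recent
        self._last_check = now
        self._check()
        return True
```

    Each incoming request would call maybe_check(), so a burst of requests
    triggers at most one disk access per interval.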

commit 5dbfbb47826ea2edbf8cf2100228bddb5be473f8
Author: Andrew Or <andrewo...@gmail.com>
Date:   2014-03-22T01:54:28Z

    Merge branch 'master' of github.com:apache/spark
    
    Conflicts:
        core/src/main/scala/org/apache/spark/deploy/DeployWebUI.scala
        core/src/main/scala/org/apache/spark/deploy/WebUI.scala
        core/src/main/scala/org/apache/spark/deploy/master/Master.scala
        core/src/main/scala/org/apache/spark/ui/WebUI.scala

----


