The listeners in SparkUI that update the counters can trigger saves along the way. Each save can run on a 500ms delay after the last update, so that bursts of changes are batched. This solution would not require a save on stop().
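A minimal sketch of that debounced save, assuming a hypothetical DebouncedSaver helper wired into the listener (SparkUI has no such hook today, and saveFn stands in for whatever persists the UI state):

import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}

// Hypothetical helper: batches bursts of listener updates into a single
// save that fires once things have been quiet for `delayMs`.
class DebouncedSaver(saveFn: () => Unit, delayMs: Long = 500) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  private var pending: Option[ScheduledFuture[_]] = None

  // Call from the SparkListener every time a counter changes.
  def onUpdate(): Unit = synchronized {
    pending.foreach(_.cancel(false)) // supersede the previously queued save
    pending = Some(scheduler.schedule(new Runnable {
      def run(): Unit = saveFn()     // persist the current UI state
    }, delayMs, TimeUnit.MILLISECONDS))
  }

  def shutdown(): Unit = scheduler.shutdown()
}

Only the last update in a burst actually reaches saveFn, so a storm of task events still produces a single write.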
On Mon, Jan 13, 2014 at 6:15 AM, Tom Graves <[email protected]> wrote:

> So the downside to just saving stuff at the end is that if the app
> crashes or exits badly you don't have anything. Hadoop has taken the
> approach of saving events along the way. But Hadoop also uses that
> history file to start where it left off if something bad happens and it
> gets restarted. I don't think the latter really applies to Spark, though.
>
> Does Mesos have a history server?
>
> Tom
>
>
> On Sunday, January 12, 2014 9:22 PM, Pillis Work <[email protected]> wrote:
>
> IMHO, from a pure Spark standpoint, I don't know if having a dedicated
> history service makes sense as of now, considering that cluster managers
> have their own history servers. Just showing a UI for past runs might be
> too thin a requirement for a full service. Spark should store history
> information that can later be exposed in whatever ways are required.
>
> Since each SparkContext is the logical entry and exit point for doing
> something useful in Spark, during its stop() it should serialize that
> run's statistics into a JSON file, named like
> "sc_run_[name]_[start-time].json". When SparkUI.stop() is called, it in
> turn asks its UI objects (which should implement a trait) to provide
> either a flat or hierarchical Map of String key/value pairs. This map
> (flat or hierarchical) is then serialized to a configured path (the
> default being "var/history").
>
> With regard to Mesos or YARN, their applications can have an API to
> import this Spark history into their history servers during shutdown, by
> making API calls, etc.
>
> This way Spark's history information is persisted independently of the
> cluster framework, and cluster frameworks can import the history when and
> as needed.
>
> Hope this helps.
> Regards,
> pillis
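A rough sketch of the trait-based serialization Pillis describes above, under stated assumptions: UIHistorySource and HistoryWriter are hypothetical names, not existing Spark APIs, and the hand-rolled JSON (no escaping) stands in for a real JSON library.

import java.io.{File, PrintWriter}

// Hypothetical trait each UI component (stages, executors, storage,
// environment) would implement to expose its state for persistence.
trait UIHistorySource {
  def name: String
  // A flat Map of String key/value pairs, as Pillis describes; a
  // hierarchical variant could nest further maps.
  def historySnapshot: Map[String, String]
}

// Sketch of the serialization step SparkUI.stop() would drive.
object HistoryWriter {
  private def quote(s: String) = "\"" + s + "\""

  def save(appName: String, startTime: Long,
           sources: Seq[UIHistorySource],
           dir: String = "var/history"): Unit = {
    // One JSON object per UI component, keyed by its name.
    val json = sources.map { src =>
      val fields = src.historySnapshot
        .map { case (k, v) => quote(k) + ": " + quote(v) }
        .mkString(", ")
      quote(src.name) + ": {" + fields + "}"
    }.mkString("{", ", ", "}")

    new File(dir).mkdirs()
    val out = new PrintWriter(new File(dir, s"sc_run_${appName}_$startTime.json"))
    try out.write(json) finally out.close()
  }
}

Keeping the snapshot a plain Map makes the trait trivial for each UI tab to implement; the writer alone decides the on-disk layout.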
> On Thu, Jan 9, 2014 at 6:13 AM, Tom Graves <[email protected]> wrote:
>
> > Note that it looks like we are planning on adding support for
> > application-specific frameworks to YARN sooner rather than later. There
> > is an initial design up here:
> > https://issues.apache.org/jira/browse/YARN-1530. Note this has not been
> > reviewed yet, so changes are likely, but it gives an idea of the
> > general direction. If anyone has comments on how that might work with
> > Spark, I encourage you to post to the JIRA.
> >
> > As Sandy mentioned, it would be very nice if the solution could be
> > compatible with that.
> >
> > Tom
> >
> >
> > On Wednesday, January 8, 2014 12:44 AM, Sandy Ryza <[email protected]> wrote:
> >
> > Hey,
> >
> > YARN-321 is targeted for Hadoop 2.4. The minimum feature set doesn't
> > include application-specific data, so that probably won't be part of
> > 2.4 unless other things delay the release for a while. There are no
> > APIs for it yet, and pluggable UIs have been discussed but not agreed
> > upon. I think requirements from Spark could be useful in helping shape
> > what gets done there.
> >
> > -Sandy
> >
> >
> > On Tue, Jan 7, 2014 at 4:13 PM, Patrick Wendell <[email protected]> wrote:
> >
> > > Hey Sandy,
> > >
> > > Do you know what the status is for YARN-321 and what version of YARN
> > > it's targeted for? Also, is there any kind of documentation or API
> > > for this? Does it control the presentation of the data itself (e.g.,
> > > does it actually have its own UI)?
> > >
> > > @Tom - having an optional history server sounds like a good idea.
> > >
> > > One question is what format to use for storing the data and how the
> > > persisted format relates to XML/HTML generation in the live UI. One
> > > idea would be to add JSON as an intermediate format inside of the
> > > current WebUI; then any JSON page could be persisted and rendered by
> > > the history server using the same code. Once a SparkContext exits, it
> > > could dump a series of named paths, each with a JSON file. Then the
> > > history server could load those paths and pass them through the
> > > second rendering stage (JSON => XML) to create each page.
> > >
> > > It would be good if SPARK-969 had a good design doc before anyone
> > > starts working on it.
> > >
> > > - Patrick
> > >
> > > On Tue, Jan 7, 2014 at 3:18 PM, Sandy Ryza <[email protected]> wrote:
> > >
> > > > As a sidenote, it would be nice to make sure that whatever is done
> > > > here will work with the YARN Application History Server (YARN-321),
> > > > a generic history server that functions similarly to MapReduce's
> > > > JobHistoryServer. It will eventually have the ability to store
> > > > application-specific data.
> > > >
> > > > -Sandy
> > > >
> > > >
> > > > On Tue, Jan 7, 2014 at 2:51 PM, Tom Graves <[email protected]> wrote:
> > > >
> > > >> I don't think you want to save the html/xml files. I would rather
> > > >> see the info saved into a history file in something like a JSON
> > > >> format that could then be re-read and displayed by the web UI,
> > > >> hopefully without much change to the UI parts. For instance,
> > > >> perhaps the history server could read the file and populate the
> > > >> appropriate Spark data structures that the web UI already uses.
> > > >>
> > > >> I would suggest making the history server an optional server that
> > > >> could be run on any node. That way, if the load on a particular
> > > >> node becomes too much, it could be moved, but you could also run
> > > >> it on the same node as the Master. All it really needs to know is
> > > >> where to get the history files from, and it needs access to that
> > > >> location.
> > > >>
> > > >> Hadoop actually has a history server for MapReduce which works
> > > >> very similarly to what I mention above. One thing to keep in mind
> > > >> here is security. You want to make sure that the history files can
> > > >> only be read by users who have the appropriate permissions. The
> > > >> history server itself could run as a superuser who has permission
> > > >> to serve up the files based on the ACLs.
> > > >>
> > > >>
> > > >> On Tuesday, January 7, 2014 8:06 AM, "Xia, Junluan" <[email protected]> wrote:
> > > >>
> > > >> Hi all,
> > > >>
> > > >> The Spark job web UI is no longer available once a job is over,
> > > >> but it would be convenient for developers to debug against a
> > > >> persisted job web UI. I have come up with a draft for this issue.
> > > >>
> > > >> 1. We could simply save the web pages in html/xml format
> > > >> (stages/executors/storages/environment) to a certain location when
> > > >> the job finishes.
> > > >>
> > > >> 2. But it is not easy for users to review the job info with #1, so
> > > >> we could build an extra job history service for developers.
> > > >>
> > > >> 3. But where would we build this history service? On the Driver
> > > >> node or the Master node?
> > > >>
> > > >> Any suggestions about this improvement?
> > > >>
> > > >> regards,
> > > >> Andrew
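To make Patrick's two-stage idea above concrete, here is a minimal sketch under stated assumptions: JsonBackedPage, persist, and replay are hypothetical names, not anything in the current WebUI.

// Hypothetical sketch of two-stage rendering: the live UI and the
// history server share stage 2, so persisted JSON replays through
// exactly the same page-generation code.
trait JsonBackedPage {
  def path: String                          // e.g. "stages", "executors"
  def toJson: String                        // stage 1: in-memory state => JSON
  def render(json: String): scala.xml.Node  // stage 2: JSON => XML/HTML
}

object HistoryReplay {
  import java.nio.file.{Files, Paths}

  // On SparkContext exit: dump one JSON file per named page.
  def persist(pages: Seq[JsonBackedPage], dir: String): Unit = {
    Files.createDirectories(Paths.get(dir))
    pages.foreach { p =>
      Files.write(Paths.get(dir, p.path + ".json"),
                  p.toJson.getBytes("UTF-8"))
    }
  }

  // In the history server: reload a page's file and run only stage 2.
  def replay(page: JsonBackedPage, dir: String): scala.xml.Node = {
    val json = new String(
      Files.readAllBytes(Paths.get(dir, page.path + ".json")), "UTF-8")
    page.render(json)
  }
}

The live UI would call render(toJson) directly, while the history server calls replay on the persisted files, so a single stage-2 implementation serves both.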
