[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

Marcelo Vanzin (JIRA) Fri, 20 Feb 2015 12:02:33 -0800

    [ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329460#comment-14329460 ]


Marcelo Vanzin commented on SPARK-1537:
---------------------------------------

Hi [~zzhan], thanks for uploading the document.

Reading through it, I don't see anything that is really that much different 
from my initial proof-of-concept. The points I'd like to highlight are:

- It still depends on YARN-2423, or at least on some effort to write a REST 
client that does not depend on internal Yarn classes.
- What about the overhead of the read path? Large jobs with lots of tasks, or 
really long jobs such as Spark Streaming jobs, will have a very large number 
of events. Fetching them all in one batch would require a lot of memory for 
serializing the data on both sides (ATS and History Server).
- Any security considerations? I haven't really kept up-to-date with the 
security changes in the ATS after I ran into issues with my p.o.c.; but mainly, 
does the Spark job need any special tokens to talk to the ATS when security is 
enabled? Does the ATS guarantee that only the job itself (or someone with the 
right credentials) can add events to its timeline? Or is that all handled 
transparently, somehow, by the client library?
- Does YARN-2928 affect the design in any way? I took a quick look at the data 
model, and hopefully they'll keep things backwards compatible. But it would 
kinda suck to add support for an API with a limited shelf life if that's not 
the case.
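
To make the read-overhead point concrete, here is a minimal sketch of what 
fetching events in bounded pages, rather than in one batch, could look like on 
the reader side. Everything in it is an assumption for illustration: 
`fetch_page`, `iter_events`, `make_fake_store` and the `id` cursor field are 
hypothetical names, not part of the ATS client or of the design document, and 
the in-memory store only stands in for a real server.

```python
from typing import Callable, Dict, Iterator, List, Optional

Event = Dict[str, object]

# Hypothetical signature: fetch_page(after_id, limit) returns up to `limit`
# events whose id is strictly greater than `after_id` (None means "from start").
FetchPage = Callable[[Optional[int], int], List[Event]]


def iter_events(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Event]:
    """Yield events one at a time, holding at most one page in memory."""
    cursor: Optional[int] = None
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return  # nothing left to read
        for event in page:
            yield event
        if len(page) < page_size:
            return  # a short page means the store is exhausted
        # Resume after the last event seen; assumes ids are ordered.
        cursor = int(page[-1]["id"])


def make_fake_store(n: int) -> FetchPage:
    """In-memory stand-in for a timeline store holding n events (assumption)."""
    events: List[Event] = [{"id": i, "type": "TaskEnd"} for i in range(n)]

    def fetch_page(after_id: Optional[int], limit: int) -> List[Event]:
        start = 0 if after_id is None else after_id + 1
        return events[start:start + limit]

    return fetch_page


if __name__ == "__main__":
    count = sum(1 for _ in iter_events(make_fake_store(250), page_size=100))
    print(count)
```

With this shape the reader's memory use is tied to `page_size` rather than to 
the total number of events, at the cost of more round trips. Whether the ATS 
can serve such pages efficiently is exactly the open question above.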


> Add integration with Yarn's Application Timeline Server
> -------------------------------------------------------
>
>                 Key: SPARK-1537
>                 URL: https://issues.apache.org/jira/browse/SPARK-1537
>             Project: Spark
>          Issue Type: New Feature
>          Components: YARN
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>         Attachments: SPARK-1537.txt, spark-1573.patch
>
>
> It would be nice to have Spark integrate with Yarn's Application Timeline 
> Server (see YARN-321, YARN-1530). This would allow users running Spark on 
> Yarn to have a single place to go for all their history needs, and avoid 
> having to manage a separate service (Spark's built-in server).
> At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, 
> although there is still some ongoing work. But the basics are there, and I 
> wouldn't expect them to change (much) at this point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
