[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

Marcelo Vanzin (JIRA) Fri, 20 Feb 2015 16:38:49 -0800

    [ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329852#comment-14329852 ]


Marcelo Vanzin commented on SPARK-1537:
---------------------------------------

Hi [~zhzhan],

bq. But It is hard to comment or review patch given a hyper-link. 

Perhaps you're not familiar with all of GitHub's features, but you can click on 
each individual commit and comment on the code right there, just as you can 
on a PR created from those commits. If that doesn't appeal to you, 
it's not hard to copy and paste the code and comment here instead. 
Or you can generate a downloadable diff from the commits (just add ".diff" to 
the end of the commit URL, e.g. 
https://github.com/vanzin/spark/commit/c1365e0de264daa015c61a2248c80dfdea705786.diff).
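
For anyone who prefers to script the ".diff" trick, here is a minimal sketch. The function names are made up for illustration, and the download helper assumes network access and a public repository.

```python
from urllib.request import urlopen


def commit_diff_url(commit_url: str) -> str:
    """Return the plain-text diff URL for a GitHub commit page URL."""
    url = commit_url.rstrip("/")
    # Appending ".diff" is the whole trick; avoid appending it twice.
    return url if url.endswith(".diff") else url + ".diff"


def fetch_commit_diff(commit_url: str, timeout: float = 30.0) -> str:
    """Download the diff text (hypothetical helper, needs network access)."""
    with urlopen(commit_diff_url(commit_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(commit_diff_url(
        "https://github.com/vanzin/spark/commit/"
        "c1365e0de264daa015c61a2248c80dfdea705786"))
```

The resulting text can then be pasted into a comment or opened in any diff viewer.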

bq. REST client: Currently Timeline client does not provide retrieve API.

That's the main reason why this feature hasn't moved forward. Using internal 
APIs to achieve that is something we're not willing to do in Spark, because it 
exposes us to future breakages and makes compatibility harder to maintain (just 
look at what has been done for Hive). So we either need the new API in Yarn, or 
we need to invest time to create a client API that does not use Yarn's classes.
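
To give a rough idea of what a client that avoids Yarn's classes could look like, here is a sketch that talks to the ATS over plain HTTP. It assumes the v1 REST layout (/ws/v1/timeline/{entityType}/{entityId}); the function names, host, and entity type below are placeholders, and it ignores authentication entirely, so treat it as an illustration rather than a proposal.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def timeline_entity_url(base_url: str, entity_type: str, entity_id: str) -> str:
    """Build the REST URL for one timeline entity (assumed v1 layout)."""
    return "{}/ws/v1/timeline/{}/{}".format(
        base_url.rstrip("/"),
        quote(entity_type, safe=""),  # escape anything unsafe in a path segment
        quote(entity_id, safe=""),
    )


def get_timeline_entity(base_url: str, entity_type: str, entity_id: str,
                        timeout: float = 30.0) -> dict:
    """Fetch and parse one entity as JSON (no Kerberos/ACL handling here)."""
    url = timeline_entity_url(base_url, entity_type, entity_id)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder host and entity names, for illustration only.
    print(timeline_entity_url("http://ats.example.com:8188",
                              "some_entity_type", "some_entity_id"))
```

A real client would also have to deal with security and with compatibility across ATS versions, which is where the time investment mentioned above comes in.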

bq. ACL: Timeline has ACL control as in hadoop-2.6

I'll take your word for it, since I haven't looked at that code yet. But it 
seems to require work on the client side, which your spec does not currently 
cover.

bq. Read overhead and scalability: The effort is in the roadmap in yarn 
timeline service. This is a critical feature to use timeline service. Current 
HDFS approach in spark may not scalable due to similar reason

I think we're talking about different things. What I'm referring to is that the 
current code that reads from the ATS reads all events of a particular entity at 
once. If that entity has a large number of events, that requires a lot of 
memory on the ATS side to serialize the data, and a lot of memory on the 
Spark History Server side to deserialize it. This is orthogonal to whether the 
backing store is scalable or not.
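
To illustrate the difference, here is a toy sketch that contrasts loading every event of an entity in one shot with iterating over them in bounded pages. Nothing here is an ATS API: fetch_page is a hypothetical callback of the form (offset, limit) -> list of events, and the in-memory store only exists to make the example runnable.

```python
from typing import Callable, Iterator, List

# Hypothetical page fetcher: (offset, limit) -> list of event dicts.
FetchPage = Callable[[int, int], List[dict]]


def read_all_events(fetch_page: FetchPage, total: int) -> List[dict]:
    """All-at-once read: the whole event list is held in memory."""
    return fetch_page(0, total)


def iter_events(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[dict]:
    """Paged read: at most page_size events are held in memory at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left to read
        offset += len(page)


def make_fake_store(n: int) -> FetchPage:
    """In-memory stand-in for a server holding n events of one entity."""
    events = [{"id": i} for i in range(n)]

    def fetch_page(offset: int, limit: int) -> List[dict]:
        return events[offset:offset + limit]

    return fetch_page


if __name__ == "__main__":
    store = make_fake_store(2500)
    print(sum(1 for _ in iter_events(store, page_size=1000)))
```

With the paged version, peak memory on both sides depends on the page size rather than on the total number of events, regardless of which backing store sits behind the server.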

> Add integration with Yarn's Application Timeline Server
> -------------------------------------------------------
>
>                 Key: SPARK-1537
>                 URL: https://issues.apache.org/jira/browse/SPARK-1537
>             Project: Spark
>          Issue Type: New Feature
>          Components: YARN
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>         Attachments: SPARK-1537.txt, spark-1573.patch
>
>
> It would be nice to have Spark integrate with Yarn's Application Timeline 
> Server (see YARN-321, YARN-1530). This would allow users running Spark on 
> Yarn to have a single place to go for all their history needs, and avoid 
> having to manage a separate service (Spark's built-in server).
> At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, 
> although there is still some ongoing work. But the basics are there, and I 
> wouldn't expect them to change (much) at this point.



