[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329852#comment-14329852 ]
Marcelo Vanzin commented on SPARK-1537:
---------------------------------------

Hi [~zhzhan],

bq. But It is hard to comment or review patch given a hyper-link.

Perhaps you're not familiar with all of Github's features, but you can click on each individual commit and comment on the code right there, just like you can on a PR created from those commits. Even if that doesn't sound very appealing, it's not hard to copy & paste the code and comment here if you really want to. Or generate a downloadable diff from the commits (just add ".diff" at the end of the commit URL, e.g. https://github.com/vanzin/spark/commit/c1365e0de264daa015c61a2248c80dfdea705786.diff).

bq. REST client: Currently Timeline client does not provide retrieve API.

That's the main reason why this feature hasn't moved forward. Using internal APIs to achieve that is something we're not willing to do in Spark, because it exposes us to future breakages and makes compatibility harder to maintain (just look at what has been done for Hive). So we either need the new API in Yarn, or we need to invest time to create a client API that does not use Yarn's classes.

bq. ACL: Timeline has ACL control as in hadoop-2.6

I'll believe you here since I haven't looked at that code yet. But it seems like it requires work on the client side, which is not currently covered in your spec.

bq. Read overhead and scalability: The effort is in the roadmap in yarn timeline service. This is a critical feature to use timeline service. Current HDFS approach in spark may not scalable due to similar reason

I think we're talking about different things. What I'm referring to is that the current code that reads from the ATS reads all events of a particular entity at the same time. If that entity has a large number of events, that will require a lot of memory on the ATS side to serialize the data, and a lot of memory on the Spark History Server side to deserialize it. It's orthogonal to whether the backing store is scalable or not.
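
To make the read-overhead point concrete, here is a minimal sketch of reading an entity's events in bounded pages instead of all at once. It is not ATS or Spark code: `fetch_page(offset, limit)` is a hypothetical retrieval call, the in-memory store only stands in for a server, and whether the ATS can serve events this way is an assumption. The point is only that neither side needs to hold more than one page in memory at a time.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Event = Dict[str, int]
# Hypothetical retrieval call: returns at most `limit` events starting at `offset`.
FetchPage = Callable[[int, int], List[Event]]


def iter_events(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Event]:
    """Yield events one at a time, requesting at most `page_size` per call."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return  # no more events
        yield from page
        if len(page) < page_size:
            return  # short page: this was the last one
        offset += len(page)


def make_fake_store(num_events: int) -> Tuple[FetchPage, Dict[str, int]]:
    """In-memory stand-in for a server; records the largest page it served."""
    events = [{"id": i} for i in range(num_events)]
    stats = {"largest_page": 0, "calls": 0}

    def fetch_page(offset: int, limit: int) -> List[Event]:
        page = events[offset:offset + limit]
        stats["largest_page"] = max(stats["largest_page"], len(page))
        stats["calls"] += 1
        return page

    return fetch_page, stats


if __name__ == "__main__":
    fetch, stats = make_fake_store(250)
    count = sum(1 for _ in iter_events(fetch, page_size=100))
    print(count, stats)
```

With a reader like this, peak memory depends on the page size rather than on how many events the entity has, which is independent of how scalable the backing store is.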

> Add integration with Yarn's Application Timeline Server
> -------------------------------------------------------
>
>                 Key: SPARK-1537
>                 URL: https://issues.apache.org/jira/browse/SPARK-1537
>             Project: Spark
>          Issue Type: New Feature
>          Components: YARN
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>         Attachments: SPARK-1537.txt, spark-1573.patch
>
>
> It would be nice to have Spark integrate with Yarn's Application Timeline
> Server (see YARN-321, YARN-1530). This would allow users running Spark on
> Yarn to have a single place to go for all their history needs, and avoid
> having to manage a separate service (Spark's built-in server).
> At the moment, there's a working version of the ATS in the Hadoop 2.4 branch,
> although there is still some ongoing work. But the basics are there, and I
> wouldn't expect them to change (much) at this point.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)