[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089190#comment-17089190 ]
Marcelo Masiero Vanzin commented on SPARK-1537:

Well, the only thing to start with is the existing SHS code. EventLoggingListener + FsHistoryProvider.

> Add integration with Yarn's Application Timeline Server
> ---
>
> Key: SPARK-1537
> URL: https://issues.apache.org/jira/browse/SPARK-1537
> Project: Spark
> Issue Type: New Feature
> Components: YARN
> Reporter: Marcelo Masiero Vanzin
> Priority: Major
> Attachments: SPARK-1537.txt, spark-1573.patch
>
> It would be nice to have Spark integrate with Yarn's Application Timeline
> Server (see YARN-321, YARN-1530). This would allow users running Spark on
> Yarn to have a single place to go for all their history needs, and avoid
> having to manage a separate service (Spark's built-in server).
> At the moment, there's a working version of the ATS in the Hadoop 2.4 branch,
> although there is still some ongoing work. But the basics are there, and I
> wouldn't expect them to change (much) at this point.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
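For anyone following that pointer: EventLoggingListener writes listener events out as one JSON object per line, and FsHistoryProvider replays those lines to rebuild the UI. Below is a minimal sketch of reading such a log outside Spark. It assumes an uncompressed, newline-delimited JSON file in which each object carries an "Event" field naming the event class. The sample lines and the helper name count_event_types are illustrative only and are not taken from a real run.

```python
import io
import json
from collections import Counter


def count_event_types(lines):
    """Count events by their "Event" field in a JSON-lines event log.

    Blank lines are skipped. Lines that are not valid JSON, or that lack an
    "Event" field, are counted under "<unparsed>" instead of raising.
    """
    counts = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            counts["<unparsed>"] += 1
            continue
        if isinstance(event, dict) and "Event" in event:
            counts[event["Event"]] += 1
        else:
            counts["<unparsed>"] += 1
    return counts


# Illustrative sample only; real logs carry many more fields per event.
SAMPLE_LOG = (
    '{"Event":"SparkListenerApplicationStart","App Name":"Spark Pi"}\n'
    '{"Event":"SparkListenerJobStart","Job ID":0}\n'
    '{"Event":"SparkListenerJobEnd","Job ID":0}\n'
    '{"Event":"SparkListenerApplicationEnd","Timestamp":1433865800000}\n'
)

if __name__ == "__main__":
    for name, n in sorted(count_event_types(io.StringIO(SAMPLE_LOG)).items()):
        print(name, n)
```

A reader like this is a reasonable first step before deciding how events would map onto timeline entities, since it shows which event types actually dominate a given log.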
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089159#comment-17089159 ]
Daniel Templeton commented on SPARK-1537:

Thanks for the response, [~vanzin]. Yeah, I think we would be interested in exploring the work involved to do the integration. We're in the process of introducing Spark into Hadoop clusters that primarily run Scalding today. We're using ATSv2 as the store for all of the Scalding metrics, so it would make sense for us to do the same with Spark. Any required reading that we should do as we decide how best to tackle this? Pointers, tips, tricks, potholes, or any other info would be welcome. Thanks!
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089137#comment-17089137 ]
Marcelo Masiero Vanzin commented on SPARK-1537:

[~templedf] sorry, forgot to reply. ATSv1 wasn't a good match for this, and by the time ATSv2 was developed, this feature had long since lost traction in the Spark community. So this was closed. Also, you can probably do this without requiring the code to live in Spark. But if you actually want to contribute the integration, there's nothing preventing you from opening a new bug and posting a PR.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17085922#comment-17085922 ]
Daniel Templeton commented on SPARK-1537:

[~vanzin], is that because the community has invested instead in making SHS that central metrics store and UI? What about clusters with mixed workloads?
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076344#comment-15076344 ]
Apache Spark commented on SPARK-1537:

User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/10545
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965230#comment-14965230 ]
Apache Spark commented on SPARK-1537:

User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/9182
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743421#comment-14743421 ]
Apache Spark commented on SPARK-1537:

User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/8744
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579152#comment-14579152 ]
Steve Loughran commented on SPARK-1537:

Full application log. Application hasn't actually stopped, which is interesting.

{code}
$ dist/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --properties-file ../clusterconfigs/clusters/devix/spark/spark-defaults.conf \
  --master yarn-client \
  --executor-memory 128m \
  --num-executors 1 \
  --executor-cores 1 \
  --driver-memory 128m \
  dist/lib/spark-examples-1.5.0-SNAPSHOT-hadoop2.6.0.jar 12
2015-06-09 17:01:59,596 [main] INFO spark.SparkContext (Logging.scala:logInfo(59)) - Running Spark version 1.5.0-SNAPSHOT
2015-06-09 17:02:01,309 [sparkDriver-akka.actor.default-dispatcher-2] INFO slf4j.Slf4jLogger (Slf4jLogger.scala:applyOrElse(80)) - Slf4jLogger started
2015-06-09 17:02:01,359 [sparkDriver-akka.actor.default-dispatcher-2] INFO Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Starting remoting
2015-06-09 17:02:01,542 [sparkDriver-akka.actor.default-dispatcher-2] INFO Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.86:51476]
2015-06-09 17:02:01,549 [main] INFO util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'sparkDriver' on port 51476.
2015-06-09 17:02:01,568 [main] INFO spark.SparkEnv (Logging.scala:logInfo(59)) - Registering MapOutputTracker
2015-06-09 17:02:01,587 [main] INFO spark.SparkEnv (Logging.scala:logInfo(59)) - Registering BlockManagerMaster
2015-06-09 17:02:01,831 [main] INFO spark.HttpServer (Logging.scala:logInfo(59)) - Starting HTTP Server
2015-06-09 17:02:01,891 [main] INFO util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'HTTP file server' on port 51477.
2015-06-09 17:02:01,905 [main] INFO spark.SparkEnv (Logging.scala:logInfo(59)) - Registering OutputCommitCoordinator
2015-06-09 17:02:02,038 [main] INFO util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'SparkUI' on port 4040.
2015-06-09 17:02:02,039 [main] INFO ui.SparkUI (Logging.scala:logInfo(59)) - Started SparkUI at http://192.168.1.86:4040
2015-06-09 17:02:03,071 [main] INFO spark.SparkContext (Logging.scala:logInfo(59)) - Added JAR file:/Users/stevel/Projects/Hortonworks/Projects/sparkwork/spark/dist/lib/spark-examples-1.5.0-SNAPSHOT-hadoop2.6.0.jar at http://192.168.1.86:51477/jars/spark-examples-1.5.0-SNAPSHOT-hadoop2.6.0.jar with timestamp 1433865723062
2015-06-09 17:02:03,691 [main] INFO impl.TimelineClientImpl (TimelineClientImpl.java:serviceInit(285)) - Timeline service address: http://devix.cotham.uk:8188/ws/v1/timeline/
2015-06-09 17:02:03,808 [main] INFO client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at devix.cotham.uk/192.168.1.134:8050
2015-06-09 17:02:04,577 [main] INFO yarn.Client (Logging.scala:logInfo(59)) - Requesting a new application from cluster with 1 NodeManagers
2015-06-09 17:02:04,637 [main] INFO yarn.Client (Logging.scala:logInfo(59)) - Verifying our application has not requested more than the maximum memory capability of the cluster (2048 MB per container)
2015-06-09 17:02:04,637 [main] INFO yarn.Client (Logging.scala:logInfo(59)) - Will allocate AM container, with 896 MB memory including 384 MB overhead
2015-06-09 17:02:04,638 [main] INFO yarn.Client (Logging.scala:logInfo(59)) - Setting up container launch context for our AM
2015-06-09 17:02:04,643 [main] INFO yarn.Client (Logging.scala:logInfo(59)) - Preparing resources for our AM container
2015-06-09 17:02:05,096 [main] WARN shortcircuit.DomainSocketFactory (DomainSocketFactory.java:(116)) - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2015-06-09 17:02:05,106 [main] DEBUG yarn.YarnSparkHadoopUtil (Logging.scala:logDebug(63)) - delegation token renewer is: rm/devix.cotham.uk@COTHAM
2015-06-09 17:02:05,107 [main] INFO yarn.YarnSparkHadoopUtil (Logging.scala:logInfo(59)) - getting token for namenode: hdfs://devix.cotham.uk:8020/user/stevel/.sparkStaging/application_1433777033372_0005
2015-06-09 17:02:06,129 [main] DEBUG yarn.Client (Logging.scala:logDebug(63)) - HiveMetaStore configured in localmode
2015-06-09 17:02:06,130 [main] DEBUG yarn.Client (Logging.scala:logDebug(63)) - HBase Class not found: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
2015-06-09 17:02:06,225 [main] IN
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541589#comment-14541589 ]
Steve Loughran commented on SPARK-1537:

+ YARN-3539 is resolved; the [v1 timeline |https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md#Timeline_Server_REST_API_v1] is now defined and declared one of the supported REST APIs. I'm also removing YARN-2423 as a dependency; the latest patch does this itself
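To make the v1 REST API mentioned above a little more concrete, here is a rough sketch of the JSON body a client might publish. The field names ("entities", "entitytype", "entity", "starttime", "events", "timestamp", "eventtype", "eventinfo") are written from memory of the v1 documentation and should be treated as assumptions to check against TimelineServer.md. The entity type "spark_event_example" and the helper to_timeline_entity are hypothetical and do not come from any patch on this issue.

```python
import json


def to_timeline_entity(entity_type, entity_id, start_ms, events):
    """Wrap (timestamp_ms, event_type, info) tuples in a v1-style entity dict.

    Field names are assumptions based on the v1 REST documentation.
    """
    return {
        "entitytype": entity_type,
        "entity": entity_id,
        "starttime": start_ms,
        "events": [
            {"timestamp": ts, "eventtype": etype, "eventinfo": info}
            for ts, etype, info in events
        ],
    }


def to_put_body(entities):
    """Serialize a list of entity dicts as the top-level request body."""
    return json.dumps({"entities": list(entities)}, sort_keys=True)


if __name__ == "__main__":
    entity = to_timeline_entity(
        "spark_event_example",                      # hypothetical entity type
        "application_1433777033372_0005",
        1433865723062,
        [(1433865723062, "SparkListenerApplicationStart", {"App Name": "Spark Pi"})],
    )
    print(to_put_body([entity]))
```

Nothing here talks to a server; it only shows the shape of the data, which is the part a stable REST definition pins down.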
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534810#comment-14534810 ]
Steve Loughran commented on SPARK-1537:

For people who've not been tracking the WiP:
# the timeline API is pretty thoroughly documented with examples; very close to going in [Latest TimelineServer.md|https://github.com/steveloughran/hadoop-trunk/blob/stevel/YARN-3539-ATS-compatibility/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md]
# the timeline server integration is in sync with trunk, especially the SPARK-4705 changes
# it has lots of tests. This includes: generating events from a spark context and verifying that they are served up by an in-VM timeline server instance and are retrievable by a REST client; bringing up a Spark History server and making GET requests against it to verify it hooks up to the server; and other cross-system tests. That's about as much as you can do in a standalone unit test suite.
# those tests all run happily on unix and windows, provided you set the {{-Phadoop-2.6 -Pyarn}} flags to request a Hadoop 2.6 profile.
# and I've tested against hadoop 2.6.0, 2.7.0 & branch-2; everything compiles and runs

Can I get some reviews?
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492303#comment-14492303 ]
Steve Loughran commented on SPARK-1537:

HADOOP-11826 patches the hadoop compatibility document to add timeline server to the list of stable APIs.

> Add integration with Yarn's Application Timeline Server
> ---
>
> Key: SPARK-1537
> URL: https://issues.apache.org/jira/browse/SPARK-1537
> Project: Spark
> Issue Type: New Feature
> Components: YARN
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
> Attachments: SPARK-1537.txt, spark-1573.patch
>
> It would be nice to have Spark integrate with Yarn's Application Timeline
> Server (see YARN-321, YARN-1530). This would allow users running Spark on
> Yarn to have a single place to go for all their history needs, and avoid
> having to manage a separate service (Spark's built-in server).
> At the moment, there's a working version of the ATS in the Hadoop 2.4 branch,
> although there is still some ongoing work. But the basics are there, and I
> wouldn't expect them to change (much) at this point.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485258#comment-14485258 ]
Apache Spark commented on SPARK-1537:

User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/5423
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385890#comment-14385890 ]
Steve Loughran commented on SPARK-1537:

# I've just tried to see where YARN-2444 stands; I can't replicate it in trunk but I've submitted the tests to verify that it isn't there.
# for YARN-2423 Spark seems kind of trapped. It needs an API tagged as public/stable; Robert's patch has the API, except it's being rejected on the basis that "ATSv2 will break it". So it can't be tagged as stable. So there's no API for GET operations until some undefined time {{t1 > now()}}, and then only for Hadoop versions with it. Which implies it won't get picked up by Spark for a long time.

I think we need to talk to the YARN dev team and see what can be done here. Even if there's no API client bundled into YARN, unless the v1 API and its paths beginning {{/ws/v1/timeline/}} are going to go away, a REST client is possible; it may just have to be done spark-side, where at least it can be made resilient to hadoop versions.
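As a rough illustration of what "done spark-side" could mean, the sketch below builds GET URLs under the {{/ws/v1/timeline/}} prefix using nothing but the standard library, so it depends on no YARN classes. The path layout (entity type, then an optional entity id) and the "limit" query parameter are assumptions based on the v1 REST documentation. The helper name timeline_url and the entity type shown are hypothetical.

```python
from urllib.parse import quote, urlencode


def timeline_url(base, entity_type, entity_id=None, **params):
    """Build a GET URL below a timeline service base address.

    base        e.g. "http://devix.cotham.uk:8188/ws/v1/timeline/"
    entity_type path segment naming the entity type
    entity_id   optional path segment naming one entity
    params      optional query parameters, emitted in sorted order
    """
    parts = [base.rstrip("/"), quote(entity_type, safe="")]
    if entity_id is not None:
        parts.append(quote(entity_id, safe=""))
    url = "/".join(parts)
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


if __name__ == "__main__":
    base = "http://devix.cotham.uk:8188/ws/v1/timeline/"
    # Hypothetical entity type; list a bounded number of entities.
    print(timeline_url(base, "spark_event_example", limit=10))
    # Fetch a single entity by id.
    print(timeline_url(base, "spark_event_example", "application_1433777033372_0005"))
```

Keeping URL construction in one small function is what would let such a client adapt if paths differ between Hadoop versions; the actual HTTP and authentication handling is deliberately left out.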
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329885#comment-14329885 ]
Zhan Zhang commented on SPARK-1537:

[~vanzin] We should centralize all comments and reviews in one place, instead of going to different links. Also, we want the reviewed code to be up to date, instead of based on some old version.

On to the technical points:
1. We all agree on this one about the timeline client, and this is why it is an alpha feature. Hive is a good example, but nobody can deny its importance in Spark.
2. ACL is included in the patch, but not in the spec.
3. I understand your question, but the scope of my response may be too big. To solve this, more work is needed on the entity design.

Let's keep an eye on these issues.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329852#comment-14329852 ]
Marcelo Vanzin commented on SPARK-1537:

Hi [~zhzhan],

bq. But it is hard to comment or review patch given a hyper-link.

Perhaps you're not familiar with all of Github's features, but you can click on each individual commit and comment on the code right there, just like you can on a PR created from those commits. Even if that doesn't sound very appealing, it's not hard to copy & paste the code and comment here if you really want to. Or generate a downloadable diff from the commits (just add ".diff" at the end of the commit URL, e.g. https://github.com/vanzin/spark/commit/c1365e0de264daa015c61a2248c80dfdea705786.diff).

bq. REST client: Currently Timeline client does not provide retrieve API.

That's the main reason why this feature hasn't moved forward. Using internal APIs to achieve that is something we're not willing to do in Spark, because it exposes us to future breakages and makes compatibility harder to maintain (just look at what has been done for Hive). So we either need the new API in Yarn, or we need to invest time to create a client API that does not use Yarn's classes.

bq. ACL: Timeline has ACL control as in hadoop-2.6

I'll believe you here since I haven't looked at that code yet. But it seems like it requires work on the client side, which is not currently covered in your spec.

bq. Read overhead and scalability: The effort is in the roadmap in yarn timeline service. This is a critical feature to use timeline service. Current HDFS approach in spark may not be scalable due to similar reason

I think we're talking about different things. What I'm referring to is that the current code that reads from the ATS reads all events of a particular entity at the same time. If that entity has a large number of events, that will require a lot of memory on the ATS side to serialize the data, and a lot of memory on the Spark History Server side to deserialize it. It's orthogonal to whether the backing store is scalable or not.
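The memory concern above is about fetching every event of an entity in one response. One generic way to bound that is to read in fixed-size pages and hand events to the consumer as they arrive. The sketch below shows only the iteration pattern: fetch_page(offset, limit) is a hypothetical callable standing in for whatever paging the server actually offers, and nothing here is claimed about the ATS query parameters or about the code in any patch on this issue.

```python
def iter_events(fetch_page, page_size=100):
    """Yield events one at a time while holding at most one page in memory.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    of at most `limit` events starting at position `offset`. An empty or
    short page ends the iteration.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return
        offset += len(page)


if __name__ == "__main__":
    # In-memory stand-in for a remote store of 250 events.
    stored = [{"seq": i} for i in range(250)]

    def fake_fetch(offset, limit):
        return stored[offset:offset + limit]

    total = sum(1 for _ in iter_events(fake_fetch, page_size=100))
    print(total)
```

With this shape, peak memory on the reading side scales with page_size rather than with the number of events in the entity, at the cost of more round trips.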
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329828#comment-14329828 ]
Zhan Zhang commented on SPARK-1537:

[~sowen] In JIRA, we share the code so that other people can comment and review. I am not waiting for a patch. But it is hard to comment or review patch given a hyper-link. I never intended to make my change alone. Actually, from the beginning I acknowledged his contribution, and I don't mind at all closing my PR and helping to review his, as you can see if you follow the PR record. Do you agree? You mention you sense some insinuation and conspiracy. I didn't sense it. Can you please educate me if you figure it out?

Let's go back to the technical side. Overall, this is early adoption of the timeline service. It is an alpha feature, but most functionality is working, although with some workarounds.

REST client: Currently Timeline client does not provide retrieve API. So we work around it with an approach similar to the timeline client's own implementation. This needs to be changed once the timeline component provides a more mature API.

Read overhead and scalability: The effort is in the roadmap in yarn timeline service. This is a critical feature to use timeline service. Current HDFS approach in spark may not be scalable due to similar reason (correct me if I am wrong), and the timeline service may be more promising, although it is not there yet.

Security: Security is handled transparently in the timeline client.

ACL: Timeline has ACL control as in hadoop-2.6, and the client can create and set a domain with R/W access to control permissions.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329739#comment-14329739 ]
Sean Owen commented on SPARK-1537:

[~zzhan] You have provided a patch as a PR, right? Anyone can try it. Request granted.

Given the YARN JIRAs already referenced here, some of which have patches ready to go too, I think it has been discussed in YARN too? What isn't happening with YARN that should be, and can you help with it? I'm not sure if that's where you are saying the waiting is. That is: hasn't this been blocked on YARN changes for a long time? I get it, one person's 'outstanding bug' is another's 'will not fix', but that's the give and take of OSS. If you want this feature in Spark, and people are asking that it should depend on some YARN changes, then what do you think about lobbying for those YARN changes? Or do you disagree that they're necessary, and can you argue that here please?

I don't understand your second reply. Yes, it sounds like two people have a similar solution with a similar problem with YARN APIs. You say you're not waiting on code now, but have repeatedly asked Marcelo to share some (other?) code. It's odd since, yes, it's very clear you acknowledge you've already seen his code and reused a bit, which is entirely fine. I hope we're done with that exchange. I sense some insinuation that code is being 'hidden' in bad faith, but I can't figure out the conspiracy.

I see every willingness to make *your* change alone here, if you propose something that addresses the YARN issues raised here. You are *not* blocked on anyone else's patch. However, all of us are 'blocked' on the consensus of the community / committers that care about this issue, and it looks like the response is clear so far: not until the YARN API stuff is sorted out one way or the other. Are you suggesting this patch should be committed without the YARN changes? Or that you're working on the YARN changes? What do you want to take over and do next?
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329704#comment-14329704 ] Zhan Zhang commented on SPARK-1537: --- [~sowen] By the way, I am not waiting for someone to give me the patch. It is because someone declared the patch was almost ready half a year ago. After I submitted mine, someone keeps saying my patch is not much different from his.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329700#comment-14329700 ] Zhan Zhang commented on SPARK-1537: --- [~sowen] From the whole context, I believe you understand what happened here. Let's be professional. My request is "if someone wants to try this alpha feature, we can provide a patch at least so that people can give it a try, even if it cannot go upstream due to various reasons." As for the Yarn blocker, we should discuss it with the Yarn community instead of filing a bug and waiting forever.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329691#comment-14329691 ] Sean Owen commented on SPARK-1537: -- [~zzhan] I also can't figure out what you are suggesting here. You have proposed a patch, and you've been given feedback with specific reasons it shouldn't be committed to Spark. I agree with those, FWIW, though I think they can be overcome soon. I assume others agree, given the silence (?). You haven't responded to these specific points. As it stands I think that's your answer: these YARN issues need to be addressed -- either fixed or agreed to be not an issue. Nobody needs to 'take over'. I'm not clear why you think you have been waiting on something or someone to give you code. Right now the only thing this is waiting on is for you or [~zjshen] or anyone to address the YARN API issues. Rather than keep the broken record going, why not address the YARN API issues highlighted here? Sorry, the answer may be that you can't commit this patch you want to by yourself, but that's just how OSS works.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329681#comment-14329681 ] Marcelo Vanzin commented on SPARK-1537: --- It's impossible to submit a patch when the implementation is currently blocked on a feature that doesn't exist in Yarn. Please check the "is blocked by" link at the top of this bug. If you're willing to write the code to work around that missing feature, please include that in your spec and patch. I am not and would rather wait for Yarn instead.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329678#comment-14329678 ] Zhan Zhang commented on SPARK-1537: --- [~vanzin] I have declared "integrate your code" since the first submission of the PR. Do you want to count how many times you keep saying this? "Here's the link to the comment with the link to my code, dated August '14". Now Spark is under vote for 1.3, and today is 2/20/2015. Is it so difficult to submit a workable patch and design doc?
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329664#comment-14329664 ] Zhan Zhang commented on SPARK-1537: --- [~vanzin] If you don't have bandwidth, or don't know how to move forward with this JIRA after a long time, I don't mind taking it over.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329660#comment-14329660 ] Marcelo Vanzin commented on SPARK-1537: --- Hi [~zzhan], I already posted the link to my code in this bug several times. The reason why I haven't sent a PR is the exact reason I raised about your spec and your patch: it uses private Yarn APIs. I've said this several times, and I really don't understand what part of it you don't understand. Pardon me if I haven't been clear about it. Also note how there's a Yarn bug in the list of blocker bugs for this one. That's because my p.o.c. code depends on that bug being fixed before it can move forward. If you have a design that is not blocked by that code, and does not use internal APIs, feel free to remove the link and post it. Here's the link to the comment with the link to my code, dated August '14: https://issues.apache.org/jira/browse/SPARK-1537?focusedCommentId=14088438&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14088438 A link you have already seen, since you used parts of that code in your patch. So please, can you reply to my actual comments instead of going back to this issue? My comments have nothing to do with the fact that I've written a p.o.c. for this feature. They're issues that exist in your spec and your code independent of anything I've done.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329649#comment-14329649 ] Zhan Zhang commented on SPARK-1537: --- [~vanzin] Thanks for the comments. I don't understand why you keep saying "my code does not have many differences from your code." We are working on an Apache project, and we all follow Apache policy. Here is the link for the Apache license details: http://www.apache.org/licenses/LICENSE-2.0. As I have requested several times, why not post your workable patch and design? I will explain to you clearly "what's the major difference of the core design of my code from yours". The patch size is small, and the design is not so complicated, but I will be sure to show you where the core design comes from. After you post your design and code, we can start from there. Thanks. Zhan Zhang
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329460#comment-14329460 ] Marcelo Vanzin commented on SPARK-1537: --- Hi [~zzhan], thanks for uploading the document. Reading through it, I don't see anything that is really that much different from my initial proof-of-concept. The points I'd like to highlight are:
- It still depends on YARN-2423, or at least on some effort to write a REST client that does not depend on internal Yarn classes.
- What about overhead of the read code? Large jobs with lots of tasks, or really long jobs such as Spark Streaming jobs, will have a really large amount of events. Fetching them all in one batch would require a lot of memory for serializing the data on both sides (ATS and History Server).
- Any security considerations? I haven't really kept up-to-date with the security changes in the ATS after I ran into issues with my p.o.c.; but mainly, does the Spark job need any special tokens to talk to the ATS when security is enabled? Does the ATS guarantee that only the job itself (or someone with the right credentials) can add events to its timeline? Or is that all handled transparently, somehow, by the client library?
- Does YARN-2928 affect the design in any way? I took a quick look at the data model, so hopefully they'll keep things backwards compatible. But it would kinda suck to add support for an API with a limited shelf life if that's not the case.
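The memory concern raised in the comment above (fetching every event of a large or long-running job in one batch) can be illustrated with a minimal Python sketch of paged reading. This is not taken from any patch on this issue: `fetch_page`, `make_fake_store`, and the `id` cursor field are hypothetical names, and an in-memory list stands in for the timeline server. The only point is that a reader which pulls a bounded page at a time never holds more than `page_size` events at once.

```python
from typing import Callable, Dict, Iterator, List, Optional

Event = Dict[str, object]
# Hypothetical paged query: (last id already seen, max events) -> next events.
FetchPage = Callable[[Optional[int], int], List[Event]]


def iter_events(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Event]:
    """Yield events page by page so at most page_size events are buffered."""
    cursor: Optional[int] = None
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return  # nothing left to read
        for event in page:
            yield event
        if len(page) < page_size:
            return  # a short page means the store is exhausted
        cursor = int(page[-1]["id"])  # resume after the last event seen


def make_fake_store(n: int) -> FetchPage:
    """In-memory stand-in for a timeline store holding n events (assumption)."""
    events: List[Event] = [{"id": i, "type": "TaskEnd"} for i in range(n)]

    def fetch_page(after_id: Optional[int], limit: int) -> List[Event]:
        start = 0 if after_id is None else after_id + 1
        return events[start:start + limit]

    return fetch_page


if __name__ == "__main__":
    count = sum(1 for _ in iter_events(make_fake_store(250), page_size=100))
    print(count)  # prints 250
```

A real reader would replace `make_fake_store` with calls to whatever paged query the store exposes; whether the ATS offers a suitable cursor is exactly the kind of question the comment asks.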
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327038#comment-14327038 ] Apache Spark commented on SPARK-1537: - User 'zhzhan' has created a pull request for this issue: https://github.com/apache/spark/pull/4683
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326778#comment-14326778 ] Zhan Zhang commented on SPARK-1537: --- I have sent a PR with WIP for people who are interested. https://github.com/apache/spark/pull/4683/files
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198852#comment-14198852 ] Marcelo Vanzin commented on SPARK-1537: --- I believe with YARN-2033 and YARN-2423 I can work around YARN-2444 even if it's still an issue, so I'll add the dependency accordingly.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192561#comment-14192561 ] Marcelo Vanzin commented on SPARK-1537: --- bq. It's again a vague statement. I don't know what is vague about wanting to read the data you write. bq. Can you share your design detail I already did way better than that, way earlier in this bug: I shared the actual code. For this particular question, here it is: https://github.com/vanzin/spark/blob/yarn-timeline/yarn/timeline/src/main/scala/org/apache/spark/deploy/yarn/timeline/YarnTimelineProvider.scala See how it reads data from the ATS? It feeds it into the Spark history server, where the data can be visualized. It's using Yarn internal APIs, which is generally bad practice. bq. If you don't agree on it, please post your investigation on YARN-2444, YARN folks will help you on this issue. I posted the error and the code to reproduce it. I don't know what else you expect from me. If you think it's an authorization issue, test it with 2.6 and close the bug if you believe it's fixed. bq. No matter the integration with timeline service, Spark on YARN is picking Hadoop versions now. It doesn't make sense to ask for a feature by using an early version that doesn't have it. I'm not sure I really understood what you're trying to say here. Yes, we have to pick versions. We need a version that supports the features we need. Even if the API in 2.5 didn't change in 2.6, it seems to have bugs that prevent my current code from working, so there is no point in trying to integrate with 2.5 as far as I'm concerned. And as far as I know, 2.6 hasn't been released yet. (BTW, my code used to work with 2.4.)
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192538#comment-14192538 ] Zhijie Shen commented on SPARK-1537: bq. Spark needs both to put and read data It's again a vague statement. Can you share your design detail, so that we can evaluate whether it is really necessary? And what is the actual way of visualizing data? Integration work is not just a single bug-fix patch; we can divide the work into a sequence of sub-tasks, and the first step is to enable Spark jobs to put data into the timeline server. By doing this, not only can Spark's own web front end visualize job history, but third-party tools can do Spark job analysis too. bq. I'm not sure why you say it's security-related since there's nothing security-related in the example code I posted. I said "According to the exception, the user doesn't pass the authorization for some reason." If you don't agree on it, please post your investigation on YARN-2444, YARN folks will help you on this issue. bq. if something doesn't work in 2.5 but works in 2.6, No matter the integration with timeline service, Spark on YARN is picking Hadoop versions now. It doesn't make sense to ask for a feature by using an early version that doesn't have it.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192502#comment-14192502 ] Marcelo Vanzin commented on SPARK-1537: --- bq. This is proposed to improve the Java libs by adding GET APIs. They are used to query data, NOT to put data. Spark needs both to put and read data, otherwise the ATS is useless for Spark. The current goal of Spark is to use the ATS as a store for its history data, since the data itself is not considered public and stable. So there is no point in integration if you can only write data. (I know you can read data through other means, but I don't want to write a custom REST client just to get ATS support in.) bq. It is reported for 2.5, and is probably no longer valid after we fixed a bunch of security issues for 2.6. I'm not sure why you say it's security-related since there's nothing security-related in the example code I posted. And if something doesn't work in 2.5 but works in 2.6, it means we (and by that I mean Spark) have to restrict our support to the versions where things work - even if the underlying API is exactly the same.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14192491#comment-14192491 ] Zhijie Shen commented on SPARK-1537: bq. BTW, if you want a list of things I think are important for Spark, here are some quick ones: Thanks for sharing the details, which are more helpful for clearing up the puzzles than some big but vague statement. Let me go through the aforementioned Jiras:
* YARN-2521: I'd like to keep it open for some further client improvement, such as local timeline data caching, while YARN-2673 already made the client retry when the server temporarily doesn't respond. Please note that "I think it's pretty critical when you can't upload your data because the server is down" is *no longer true* after YARN-2673. On the other hand, from the point of view of the API, it should remain stable.
* YARN-2423: This is proposed to improve the Java libs by adding GET APIs. They are used to query data, NOT to put data. We do this to help the use case where developers write Java code to implement a UI that analyzes the timeline data. Framework integration mainly deals with PUT APIs, and the Java client libs are already there. Taking one step back, apart from the client libs, the RESTful APIs are always there; they are programming-language neutral and useful to non-Java developers.
* YARN-2444: It may be a bug or an improper use case. According to the exception, the user doesn't pass the authorization for some reason. It is reported for 2.5, and is probably no longer valid after we fixed a bunch of security issues for 2.6. We need to do more validation for this issue before reaching a conclusion. Anyway, it's obviously an internal issue happening in secure mode only, which should not require API changes.
bq. I understand it doesn't affect the client API and we can still have the code in, It seems that we agree that the current timeline service offering is not blocking the Spark integration work.
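The retry behaviour mentioned in the comment above (a client that retries when the server temporarily does not respond) can be sketched generically in Python. This is only an illustration of retry with exponential backoff, not the YARN-2673 implementation; `post_with_retry`, `send`, and the default values are hypothetical, and `ConnectionError` stands in for whatever failure a real client would see.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def post_with_retry(
    send: Callable[[dict], T],
    payload: dict,
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(payload), retrying on ConnectionError with doubling delays.

    send is a hypothetical function that posts one batch of events.
    The last failure is re-raised once max_retries retries are used up.
    """
    attempt = 0
    while True:
        try:
            return send(payload)
        except ConnectionError:
            if attempt >= max_retries:
                raise  # give up: the server stayed unreachable
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
            attempt += 1
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. Note that retrying only covers short outages; events produced while the server is down for longer would still need local buffering, which is the caching idea the comment leaves open under YARN-2521.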
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191060#comment-14191060 ] Marcelo Vanzin commented on SPARK-1537: I think it's pretty critical when you can't upload your data because the server is down; it means we can't really recommend using the current ATS because it's not reliable. I understand it doesn't affect the client API and we can still have the code in, but it's an important feature that seems to be missing. YARN-2423, though, is really something that can't be done today without poking into private Yarn classes or writing a bunch of extra code. I really wouldn't want to have to support either of those two options in Spark.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191052#comment-14191052 ] Zhan Zhang commented on SPARK-1537: YARN-2521 would make the client easier to use, but it is not critical. Some application logic makes it difficult to build a generic client cache. YARN-2444 may already be obsolete.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191007#comment-14191007 ] Marcelo Vanzin commented on SPARK-1537: BTW, if you want a list of things I think are important for Spark, here are some quick ones: * YARN-2521 (I've sort of implemented this in my code, but would really like to not have to care about it) * YARN-2423 (note how this is a new API) * YARN-2444 YARN-2521 might be the same as YARN-2673, no? YARN-2513 is sort of interesting but not necessary.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190956#comment-14190956 ] Marcelo Vanzin commented on SPARK-1537: bq. Please elaborate the changes in case I've missed some discussion That's part of why I'm waiting on YARN-1530. There's been no activity in a while; I've been told there have been offline discussions but I don't see any updates on the actual issue itself, so that's the main reason why I've been holding off on this work: I don't feel it's a good investment of time to go forward with something that might change in the near future. It would be great if you could update that bug with a concrete plan for the post-2.6 updates related to reliability and other features. If they really don't affect the client API, then great, I can continue my Spark-side work without worries. But again, I've mainly been waiting because of the radio silence from the ATS side w.r.t. the issues that I think are important to Spark.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190946#comment-14190946 ] Zhijie Shen commented on SPARK-1537:
bq. That's exactly my point about the ATS not being production-level quality yet. The current plans I'm aware of would require changes in the ATS API.
Setting aside the definition of production-ready (which differs from community to community, e.g., Tez and MapReduce), I'm curious about the API changes you think the timeline server requires. Please elaborate on the *changes* in case I've missed some discussion. Meanwhile, as I understand the timeline server, the ongoing and future improvements are:
1) Security is coming with Hadoop 2.6, which doesn't affect the use of the existing APIs in an insecure mode. AFAIK, Spark is working with Hadoop 2.3/2.4. It should be okay to ride on the timeline server in insecure mode. When you upgrade to Hadoop 2.6, you just need to turn on the security switch.
2) Timeline availability and scalability are going to be server-side improvements that don't affect the user-facing API. Within YARN, we have already successfully enhanced the RM with the HA feature while keeping it transparent to the user. I'm not aware of any major blocker that prevents the timeline server from achieving the same goal.
3) For the client libs, we're trying to help users use the timeline service more easily (e.g., YARN-2517, YARN-2673); these changes are either transparent or additions. As I've mentioned before, we're careful about any proposed changes that would break compatibility.
I'm commenting on this Jira to share more insight into the timeline server with the Spark folks, in case they are interested in this YARN offering. It's up to the Spark folks to decide whether and when to make use of it.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190863#comment-14190863 ] Marcelo Vanzin commented on SPARK-1537: bq. ...security (coming 2.6), high availability, scalability, better client libs and so on... That's exactly my point about the ATS not being production-level quality yet. The current plans I'm aware of would require changes in the ATS API. Since Spark does not support the ATS at the moment, I'd rather have it support the new-and-secure-and-scalable-and-available API than the current one. Otherwise you'll get into the mess of having to conditionally compile code for both APIs, or implement part of those features into your own client code (something I've done in my proof-of-concept but I'd really like to avoid, because it's really just trying to work around limitations in the current ATS design). So, short version of what I'm trying to say: yes, you can build something that talks to the current ATS. But given that it currently has shortcomings, and the fix for those will, as far as I know, affect the client API, I don't see the point in trying to push that integration at this moment when Spark already has a working solution for job history, just so that you'll ship code that will be immediately deprecated by the new ATS...
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190849#comment-14190849 ] Zhijie Shen commented on SPARK-1537:
[~vanzin], thanks for introducing the YARN timeline server to Spark. Let me briefly summarize the current status of the timeline server and answer some of the concerns here. Spark folks who are interested in this monitoring service offered by YARN can go to YARN-1530 to read the design doc and follow the latest progress.
1. The essential functions of the timeline service have been available since Hadoop 2.4. Basically, the user can organize the app's history or metrics according to the timeline data model and post it to the timeline server. Later on, the user or an admin can come back and query this information to analyze how the app behaved. The essential APIs remain unchanged from 2.4 to the coming 2.6. There should *NOT* be any incompatible API changes that will block this work. Moreover, keeping compatibility is always a consideration when we come up with new features in subsequent Hadoop releases.
2. It's *NOT* accurate to say that the timeline server is not production-ready. In fact, Apache Tez has already integrated with the timeline server for logging history information. In the coming Hadoop 2.6, MapReduce can also publish its history information to the timeline server. Moreover, within YARN, a built-in generic history service on top of the timeline service is available for YARN users to view all kinds of apps. Hence, with several successful pioneers, Spark should be confident enough to take advantage of this new YARN capability.
3. The YARN community is progressing quickly to improve the timeline server in terms of security (coming in 2.6), high availability, scalability, better client libs and so on. That work should not disturb an initial attempt by Spark to adopt the timeline server, and it will offer a better experience once Spark is riding on it.
If you have other high-priority issues to work on, I think [~zhazhan] will be able to help with this integration. Thanks!
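To make "organize the app's history according to the timeline data model and post it to the timeline server" concrete, here is a minimal sketch of building such a request against the RESTful interface. It assumes the v1 timeline endpoint path `/ws/v1/timeline` and the JSON field names (`entity`, `entitytype`, `starttime`, `events`, `primaryfilters`, `otherinfo`) described in the Hadoop timeline server documentation. The entity type `SPARK_APPLICATION`, the event type names, and the host and port are hypothetical, and nothing here reflects the code in the Spark prototype.

```python
import json
import urllib.request


def build_entity(entity_id, entity_type, start_ms, events, other_info=None):
    """Build one timeline entity as a plain dict.

    `events` is a list of (timestamp_ms, event_type, info_dict) tuples.
    """
    return {
        "entity": entity_id,
        "entitytype": entity_type,
        "starttime": start_ms,
        "events": [
            {"timestamp": ts, "eventtype": etype, "eventinfo": info}
            for ts, etype, info in events
        ],
        "primaryfilters": {},
        "otherinfo": other_info or {},
    }


def build_request(base_url, entities):
    """Wrap entities in a POST request for the assumed v1 endpoint."""
    body = json.dumps({"entities": entities}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/ws/v1/timeline",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage; the request is built but not sent here.
entity = build_entity(
    "application_0001", "SPARK_APPLICATION", 1000,
    [(1000, "SparkListenerApplicationStart", {"user": "alice"})],
)
request = build_request("http://localhost:8188", [entity])
```

Passing `request` to `urllib.request.urlopen` would send it. In secure mode the server would also require authentication, which this sketch does not cover.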
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190396#comment-14190396 ] Marcelo Vanzin commented on SPARK-1537: Hi Zhan, As I mentioned, I'm waiting for issues being discussed in YARN-1530 to be resolved first. The current plans, as far as I am aware, would result in incompatible API changes in the timeline server API, so I'd rather wait for that before pushing any solution in Spark. You're free to come up with your own solution if you want, but I would seriously recommend waiting for the timeline server to actually reach production-level quality before going with integration, especially as far as its API goes.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189651#comment-14189651 ] Zhan Zhang commented on SPARK-1537: Hi Marcelo, Do you have an update on this? If you don't mind, I can work on your branch to get this done asap. Please let me know what you think.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139402#comment-14139402 ] Marcelo Vanzin commented on SPARK-1537: No set schedule as of now. The current code "works", but it's blocked by at least one bug I filed against Yarn (YARN-2444). Also, I'm not comfortable with the current ATS design. There's discussion on YARN-1530 about making it better and I want to wait until that work at least starts, in case it causes changes in the API. While it's possible to submit the code without the Yarn changes in, I'm loth to add support for something that just isn't production-ready yet.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139396#comment-14139396 ] Zhan Zhang commented on SPARK-1537: Do you have any update on this, or any schedule in mind yet?
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106164#comment-14106164 ] Marcelo Vanzin commented on SPARK-1537: No concrete timeline at the moment. I'm just starting to look at the 2.5.0 version of ATS so I can incorporate things into my patch.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104881#comment-14104881 ] Zhan Zhang commented on SPARK-1537: Thanks for sharing this. Do you have a concrete plan or timeline for this Jira?
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088438#comment-14088438 ] Marcelo Vanzin commented on SPARK-1537: Current code is here: https://github.com/vanzin/spark/tree/yarn-timeline Very much WIP at this point.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086817#comment-14086817 ] Marcelo Vanzin commented on SPARK-1537: Currently busy with other more urgent tasks, but I'll push to my repo and post a link when I get some time.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086808#comment-14086808 ] Zhan Zhang commented on SPARK-1537: Do you mind sharing your thoughts, design document or prototype code? Thanks.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086663#comment-14086663 ] Marcelo Vanzin commented on SPARK-1537: I have a prototype ready. But I'm still investigating some issues with the Yarn side of things (mostly around security and scalability). Given that I have some code pretty much ready, you're welcome to spend time on it, but you'd be duplicating work I already have done.
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086657#comment-14086657 ] Zhan Zhang commented on SPARK-1537: I am also interested in this and am trying to integrate Spark with the YARN timeline server. Do you have any concrete plan in mind? I can start prototyping it and then we can work together on this topic. What do you think?
[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server
[ https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081465#comment-14081465 ] Marcelo Vanzin commented on SPARK-1537: I'm working on this but this all sort of depends on progress being made on the Yarn side, so at this moment I'm not yet ready to send any PRs.