[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353825#comment-14353825 ] Allen Wittenauer commented on YARN-321: --- Looks like this should get closed out w/a fix ver of 2.4.0? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076080#comment-14076080 ] Patrick Morton commented on YARN-321: - Compared to wrists well is less available status amphetamines but higher investigations of withdrawal. adderall 20 mg http://www.surveyanalytics.com//userimages/sub-2/2007589/3153260/29851520/7787428-29851520-stopadd3.html Areas also document any reasons they have surprisingly been using in the information. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887745#comment-13887745 ] Hudson commented on YARN-321: - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1659 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1659/]) Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562950) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887731#comment-13887731 ] Hudson commented on YARN-321: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1684 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1684/]) Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562950) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887647#comment-13887647 ] Hudson commented on YARN-321: - SUCCESS: Integrated in Hadoop-Yarn-trunk #467 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/467/]) Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562950) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887036#comment-13887036 ] Hudson commented on YARN-321: - SUCCESS: Integrated in Hadoop-trunk-Commit #5074 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5074/]) Updating trunk's YARN CHANGES.txt after YARN-321 merge. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562950) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882302#comment-13882302 ] Hudson commented on YARN-321: - SUCCESS: Integrated in Hadoop-Hdfs-trunk #1654 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1654/]) YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi Yamashita. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561458) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml YARN-321. Merging YARN-321 branch to trunk. svn merge ../branches/YARN-321 (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561452) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/trunk/hadoop-mapreduce-project * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoo
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882297#comment-13882297 ] Hudson commented on YARN-321: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1679 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1679/]) YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi Yamashita. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561458) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml YARN-321. Merging YARN-321 branch to trunk. svn merge ../branches/YARN-321 (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561452) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/trunk/hadoop-mapreduce-project * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-pro
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882274#comment-13882274 ] Hudson commented on YARN-321: - SUCCESS: Integrated in Hadoop-Yarn-trunk #462 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/462/]) YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi Yamashita. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561458) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml YARN-321. Merging YARN-321 branch to trunk. svn merge ../branches/YARN-321 (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561452) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/trunk/hadoop-mapreduce-project * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882206#comment-13882206 ] Hudson commented on YARN-321: - SUCCESS: Integrated in Hadoop-trunk-Commit #5040 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5040/]) YARN-1625. Fixed RAT warnings after YARN-321 merge. Contributed by Shinichi Yamashita. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561458) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/pom.xml > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882176#comment-13882176 ] Hudson commented on YARN-321: - SUCCESS: Integrated in Hadoop-trunk-Commit #5039 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5039/]) YARN-321. Merging YARN-321 branch to trunk. svn merge ../branches/YARN-321 (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561452) * /hadoop/common/trunk * /hadoop/common/trunk/hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/trunk/hadoop-mapreduce-project * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ResourceMgrDelegate.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn.cmd * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/pom.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationHistoryProtocol.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptReportResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetApplicationAttemptsResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainerReportResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/GetContainersResponse.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerReport.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/YarnApplicationAttemptState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ContainerNotFoundException.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/application_history_client.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/application_history_server.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AHSClient.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hado
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868043#comment-13868043 ] Zhijie Shen commented on YARN-321: -- bq. 1. Does it provide a function to set maximum files and maximum retention period of AppicationHistory to store in HDFS? No, currently the FS implementation doesn't discard the historic data of the applications completed before sometime, answer users' requests based on all the stored applications. However, via REST API, users are able to filter the applications outside a start/finish time window. bq. 2. When there are many AppilicationHistory in HDFS, does it not limit the number of the reading of ApplicationHistory? As to REST API, the users are able to limit the number of applications that AHS should return. As to HDFS access, the current implementation is going to load all the stored applications and filtering them one-by-one, which is not a efficient way given a big application collection. YARN-925 is reopened to discuss pushing the filtering into the implementation of the history store, where we can prevent loading all the applications. Meanwhile, caching (YARN-1322) is another way to reduce I/O. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865140#comment-13865140 ] Shinichi Yamashita commented on YARN-321: - I confirmed attached design document. And I have two questions about FileSystemApplicationHistoryStore. 1. Does it provide a function to set maximum files and maximum retention period of AppicationHistory to store in HDFS? 2. When there are many AppilicationHistory in HDFS, does it not limit the number of the reading of ApplicationHistory? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856021#comment-13856021 ] Vinod Kumar Vavilapalli commented on YARN-321: -- Tx Zhijie for answering most of Sandy's questions, you are spot on. I'll update the design doc to clarify things where it isn't clear. bq. What is the jira for app specific history data? I just filed YARN-1530, will post more information soon. bq. Could you describe the security requirements a bit further. Its not clear to everyone how everything works currently. To be clear, what exactly needs to be done to make apps write and read history data. The data covered in this JIRA is generic and only RM gets to write it. The consumers of this data are *both* the cluster admins for historical analyses as well as individual apps that chose to not use features that come out of YARN-1530. As such, we cannot let apps write history data. bq. How is the shared bus different from writing to a file. I would think one would cover the other. Yes, writing to a file is one example of a shared-bus. I'll fix it if the doc is confusing w.r.t this. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855707#comment-13855707 ] Robert Joseph Evans commented on YARN-321: -- The way it currently works is based off of group permissions on a directory (this is from memory from a while ago so I could be off on a few things). In HDFS when you create a file the group of the file is the group of the directory the file is a part of, similar to the sticky bit on a directory in Linux. When an MR job completes it will copy it's history log file, along with a few other files, to a drop box like location called intermediate done and atomically rename it from a temp name to the final name. The directory is world writable, but only readable by a special group that the history server is a part of, but general users are not. The history server then wakes up periodically and will scan that directory for new files, when it sees new files it will move them to a final location that is owned by the headless history server user. If a query comes in for a job that the history server is not aware of, it will also scan the intermediate done directory before failing. Reading history data is done through RPC to the history server, or through the web interface, including RESTful APIs. There is no supported way for an app to read history data directly though the file system. I hope this helps. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853830#comment-13853830 ] Bikas Saha commented on YARN-321: - What is the jira for app specific history data? Could you describe the security requirements a bit further. Its not clear to everyone how everything works currently. To be clear, what exactly needs to be done to make apps write and read history data. How is the shared bus different from writing to a file. I would think one would cover the other. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853492#comment-13853492 ] Zhijie Shen commented on YARN-321: -- AFAIK, bq. Will the AHS serve logs? It will serve the aggregated logs. bq. Is my understanding correct that for the first version, the RM will not do any RPCs to the AHS. Yes, RM and AHS don't communicate via RPC, but the shared bus. bq. Will APIs be Public / Stable? Assume you're talking about RPC interface. It will be Public but Unstable. bq. In the first version, what will be provided in terms of web UI? * The page of a list of completed applications (FINISHED, FAILED, KILLED) in HistoryStorage * The page of one application and a list of its attempts * The page of one attempt and a list of its containers * The page of one container * The page of the container logs bq. How are ACLs enforced between the AHS and RM? Is your question how the ACLs enforced on one application when it is running by RM is inherited by AHS when the application is finished and stored in HistoryStorage? My rough idea is to persist the ACLs info into HistoryStorage as well. For ApplicationACLs, it seems to be much obvious. For QueueACLs, I'm not quite sure, may be complex and depend on scheduler. bq. Eventually we want the AHS to interact well with long-running services, right? I think so. bq. For the file-based bus, have we though about an approach for how it might store info for applications in non-terminal states? Currently, HistoryStorage is receiving a series of events which notifies the updates about an application/attempt/container. The info is accumulated incrementally. IMO, the strategy can be applied to a long-running app with defining more intermediate events (now we only defines start and finish events) to report the progress. Then, instead of holding the events until app gets finished, we need to flush them in advance/periodically/..., such that users can get informed of latest progress of the app. Hopefully the answers can address you questions.:-) > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853439#comment-13853439 ] Sandy Ryza commented on YARN-321: - Thanks Vinod for the design doc. A few additional questions: Will the AHS serve logs? Is my understanding correct that for the first version, the RM will not do any RPCs to the AHS. Will APIs be Public / Stable? In the first version, what will be provided in terms of web UI? How are ACLs enforced between the AHS and RM? Eventually we want the AHS to interact well with long-running services, right? For the file-based bus, have we though about an approach for how it might store info for applications in non-terminal states? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838599#comment-13838599 ] Zhijie Shen commented on YARN-321: -- bq. Thanks Shen. Is there any estimation for the release of 2.4 ? You can trace the release plan via dev mailing list > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838492#comment-13838492 ] Jeff Zhang commented on YARN-321: - Thanks Shen. Is there any estimation for the release of 2.4 ? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838096#comment-13838096 ] Zhijie Shen commented on YARN-321: -- bq. Will this jira been included in the next release ? Yes, AHS is included in the plan of 2.4 bq. Another question about this jira. I found that the container logURL is hard-coded there, user still could not see the logs of each container ( stdout, stderror ). Is it on the roadmap that allow user to see the logs ? And which jira is tracking this ? Are you talking about showing aggregated logs via web interfaces? If it is, you may want to keep an eye on YARN-1413, which is tracking this issue. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837281#comment-13837281 ] Jeff Zhang commented on YARN-321: - Another question about this jira. I found that the container logURL is hard-coded there, user still could not see the logs of each container ( stdout, stderror ). Is it on the roadmap that allow user to see the logs ? And which jira is tracking this ? Thanks . > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837276#comment-13837276 ] Jeff Zhang commented on YARN-321: - Will this jira been included in the next release ? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790604#comment-13790604 ] Zhijie Shen commented on YARN-321: -- bq. I like the diagrams, but I want to understand if the generic application history service is intended to replace the job history server, or to just augment it? Yes, there's some previous discussion on recording the per application type history data, but we plan to exclude it from the initial version of AHS. Eventually, we'd like to integrate JHS into AHS in some way, the details of which we could discuss in the follow jiras. To me, it would be better if we can design a common per application type plugin framework, such that we can easily integrate the HS of other applications on YARN. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790555#comment-13790555 ] Robert Joseph Evans commented on YARN-321: -- I like the diagrams, but I want to understand if the generic application history service is intended to replace the job history server, or to just augment it? I would prefer it if we could replace the current server. Perhaps not in the first release, but eventually. To make that work we would have to provide a way for MR specific code to come up and run inside the service, exposing both the current restful web service, an application specific UI, and the RPC server that we currently run. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790047#comment-13790047 ] Sandy Ryza commented on YARN-321: - Thanks Vinod and Zhijie. Didn't see the comment. I'm going to attach your outline as a pdf to make it a little easier for passers-by to learn about. Here's the google doc it came from if you want to edit: https://docs.google.com/document/d/1cNsdGyLuagR8lzfeQrAclOAd-AdkVwgST6OG8Zzp43M/edit#heading=h.15p1lkmmm9g8 > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789951#comment-13789951 ] Vinod Kumar Vavilapalli commented on YARN-321: -- I assumed this is a well known problem so didn't try to eloborate on the WHY's. My comment above (https://issues.apache.org/jira/browse/YARN-321?focusedCommentId=13706553&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13706553) kind of captures the core design. Though code level design doc isn't there yet. We are about to wrap up stuff on YARN-321 branch - will post a more comprehensive doc covering design and implementation details. In the mean while, if you have specific questions, fire them away. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789876#comment-13789876 ] Sandy Ryza commented on YARN-321: - Was a design doc ever written up for this? The HistoryStorageDemo.java is a good start for understanding some of the interfaces, but it would be helpful to have something that explains things like what the Application History Service's role is, how it interacts with the RM, and key differences and similarities with the Job History Server. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13713204#comment-13713204 ] Bikas Saha commented on YARN-321: - Sounds right. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712796#comment-13712796 ] Zhijie Shen commented on YARN-321: -- bq. Running as service: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. IIUC, to be independent, ApplicationHistoryService should have its own event dispatcher, shouldn't it? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712546#comment-13712546 ] Karthik Kambatla commented on YARN-321: --- bq. Folks, it would be great if we have a consolidated document that describes the design and some details. +1 > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711345#comment-13711345 ] Bikas Saha commented on YARN-321: - Folks, it would be great if we have a consolidated document that describes the design and some details. We can keep posting revisions of the document to this jira as things changes significantly. HDFS-2802 is an example. Its a little difficult to piece it together from multiple comments and clarifications. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709991#comment-13709991 ] Vinod Kumar Vavilapalli commented on YARN-321: -- I just created YARN-321 branch and filing sub-tasks. Let's move ahead. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709989#comment-13709989 ] Vinod Kumar Vavilapalli commented on YARN-321: -- bq. How about starting with an in-memory implementation, which is the easiest to do and is useful for testing. We can have one for testing, but a file-system based impl should exist from the beginning. bq. Here's more ellaboration about the per-application data. There should be three objects to record: RMApp, RMAppAttempt and RMContainer. . Tx for the detailed analysis, [~zjshen]! bq. In addition, HistoryStorage APIs may involve a lot of I/O operations such that the response of an API will be long. Therefore, it is likely to be good to make the API non-blocking. +1 bq. Are we moving aggregated log management(i.e deletion after expiry) responsibility to AHS? bq. Right now its not clear what needs to be done for log aggregation? bq. Sorry for misunderstanding your previous question. IMHO, in the recent future, we're not moving the aggregated log management, but duplicate it, which Both AHS and JHS can serve the same aggregated logs. However, AHS and JHS see the same logs from different point of views. AHS simply considers them as container logs, no matter what application it is, while JHS know they are the MR job logs. Vinod Kumar Vavilapalli, would you please confirm it? That's an interesting point about JHS knowing that they are MR job logs. But we don't do anything special today. I'm thinking of just pull out log-handling and use it in both AHS and JHS for the time being. bq. >>>ResourceManager will push the data to HistoryStorage after an application finishes in a separate thread. bq. Is it per application or only one thread in RM? I foresee a single thread. bq. Isn't it be a good idea that as soon as application starts we send the information to AHS and let AHS write all the data published by RM for that application. In that case it would be very less overhead for RM. Like I mentioned in my proposal, this could be one implementation of HistoryStorage. bq. What about in the cases where RM restart or crashes in those cases RM has to republish all the running applications to AHS or just forget about the previous running apps? This was covered. In the first cut, we'll make best efforts. Losing an app's data in such a corner case is bearable. bq. (Thinking out loud) Are we decided on who all can write to HistoryStorage? I thought this was clear. Only RM. No per-framework data or data from AMs to AHS directly. Yet, at the least. bq. I think History storage should be written by AHS not RM in that case RM will have less load and will be better to scale. This is an implementation detail. Details on the impl whether the RM writes files directly or pushes to AHS. Let's do the APIs first. Overall, I repeat that we should separate the design for per-framework data. Fundamentally because the AHS is a shared service. It needs a radically different design from my initial thoughts. Will file a separate JIRA and post my thoughts there. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709966#comment-13709966 ] Mayank Bansal commented on YARN-321: I think History storage should be written by AHS not RM in that case RM will have less load and will be better to scale. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709940#comment-13709940 ] Karthik Kambatla commented on YARN-321: --- The approach in HistoryStorageDemo looks good, like the fact that the schema goes into the tuple. (Thinking out loud) Are we decided on who all can write to HistoryStorage? # Single writer - RM? If so, the AMs should pass all the information to the RM. Need to carefully handle the HA scenarios, may be as part of the HA work. # Both AMs and RM write to HistoryStorage directly? Writes synchronized at the tuple level? I am thinking of long-running services here - they might want AM/RM to write tuples every so often. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > Attachments: HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709262#comment-13709262 ] Jason Lowe commented on YARN-321: - bq. Is there a reason to embed this inside the RM? I don't know if there were reasons for the JHS to be separate, other than it being MR-specific. IIRC the history server was embedded in the JT back in 1.x and was only split out as a separate daemon to keep the RM from having a dependency on MR. bq. That said, I agree it will be easier for the user if AHS starts along with the RM. May be, that should be configurable and turned on by default? That'd be my preference, and the proxyserver is already done this way. One can run it either as part of the RM (default) or setup some configs and launch it separately via {{yarn proxyserver}}. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709192#comment-13709192 ] Karthik Kambatla commented on YARN-321: --- Few other considerations: bq. Running as service: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. Is there a reason to embed this inside the RM? I don't know if there were reasons for the JHS to be separate, other than it being MR-specific. If there were, this would be against those. No? That said, I agree it will be easier for the user if AHS starts along with the RM. May be, that should be configurable and turned on by default? bq. Hosting/serving per-framework data is out of scope for this JIRA. Understand and agree it makes sense to not complicate it. However, during the design, it would be nice to outline (at least at a high-level) how the "plugins" can work. For the plugins to serve application-specific information, I suspect the RM should write this information in addition to generic YARN information about that application (e.g. MapReduce counters). On completion, can we leave a provision for the AM to write a json blob (may be, via RM) to {{HistoryStorage}}. In the AHS, can we leave a provision for app-"plugins" to access/use this information to render application specifics. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709098#comment-13709098 ] Zhijie Shen commented on YARN-321: -- bq. Is it per application or only one thread in RM? I think it should be one thread in RM. bq. Isn't it be a good idea that as soon as application starts we send the information to AHS and let AHS write all the data published by RM for that application. I'm afraid a number of metrics cannot be determined when an application has just been started, such as the finish time and the final status. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709030#comment-13709030 ] Mayank Bansal commented on YARN-321: Overall Looks good, However some points to consider >>>ResourceManager will push the data to HistoryStorage after an application >>>finishes in a separate thread. Is it per application or only one thread in RM? Isn't it be a good idea that as soon as application starts we send the information to AHS and let AHS write all the data published by RM for that application. In that case it would be very less overhead for RM. What about in the cases where RM restart or crashes in those cases RM has to republish all the running applications to AHS or just forget about the previous running apps? Right now its not clear what needs to be done for log aggregation? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706773#comment-13706773 ] Zhijie Shen commented on YARN-321: -- bq. Are we moving aggregated log management(i.e deletion after expiry) responsibility to AHS? Sorry for misunderstanding your previous question. IMHO, in the recent future, we're not moving the aggregated log management, but duplicate it, which Both AHS and JHS can serve the same aggregated logs. However, AHS and JHS see the same logs from different point of views. AHS simply considers them as container logs, no matter what application it is, while JHS know they are the MR job logs. [~vinodkv], would you please confirm it? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706753#comment-13706753 ] Devaraj K commented on YARN-321: I agree Zhijie Shen. In the above comment, I mean JHS currently supports serving Aggregated logs for MR jobs and managing the Aggregated logs(i.e deleting after 'yarn.log-aggregation.retain-seconds'). Are we moving aggregated log management(i.e deletion after expiry) responsibility to AHS? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706730#comment-13706730 ] Zhijie Shen commented on YARN-321: -- bq. How about the Job History Server which is currently handling this for MR Jobs? Are we moving this responsibility from JHS to AHS to handle for all types of applications? IMHO, AHS is used to record the history of some generic data of RM, such as RMApp, RMAppAttempt and RMContainer, regardless what kind of application it is. Currently, JHS of MR can coexist with AHS, recording the MR specific data, such as Job, Task and TaskAttempt. Perhaps in the future, AHS can be enhanced to record the specific data of a certain type of application by loading the per-framework "plugin", and JHS may be turned into such a plugin. Agree with [~vinodkv] on keeping hosting/serving per-framework data out of the scope now. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706699#comment-13706699 ] Devaraj K commented on YARN-321: bq. *Aggregated logs:* Logs will be served and potentially log management (expiry etc.) by ApplicationHistoryService via an abstract LogService component. How about the Job History Server which is currently handling this for MR Jobs? Are we moving this responsibility from JHS to AHS to handle for all types of applications? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706686#comment-13706686 ] Zhijie Shen commented on YARN-321: -- bq. ResoureManager will write per-application data to a (hopefully very) thin HistoryStorage layer. Here's more ellaboration about the per-application data. There should be three objects to record: RMApp, RMAppAttempt and RMContainer. Bellow are properties of each object: Completed Application: * Application ID * Application Name * Application Type * User * Queue * Submit Time * Start Time * Finish Time * Diagnostics Info * Final Application Status * Num of Application Attempts Completed Application Attempt: * Application Attempt ID * Application ID * Host * RPC Port * Tracking URL * Original Tracking URL (not sure it is necessary) * Diagnostics Info * Final Application Status * Master Container ID * Num of Containers Completed Container: * Container ID * Application Attempt ID * Final Container Status * Resource * Priority * Node ID * Log URL Application has one-to-many relationship with Application Attempt, while Application Attempt has one-to-one relationship with Container. WRT the concrete information to record, here's more idea about the interface of HistoryStorage. The follow APIs should be useful for RM to persist application history and for AHS to query it: * Iterable getApplications([conditions...]) * CompletedApplication getApplication(ApplicationId) * Iterable getApplicationAttempts(ApplicationId) * CompletedApplicationAttempt getApplicationAttempt(ApplicationAttemptId) * CompletedContainer getContainer(ApplicationAttemptId) * CompletedContainer getContainer(ContainerId) * void addApplication(CompletedApplication) * void addApplicationAttempt(CompletedApplicationAttempt) * void addContainer(CompletedContainer) In addition, HistoryStorage APIs may involve a lot of I/O operations such that the response of an API will be long. Therefore, it is likely to be good to make the API non-blocking. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706673#comment-13706673 ] Zhijie Shen commented on YARN-321: -- bq. To start with, we will have an implementation with per-app HDFS file. How about starting with an in-memory implementation, which is the easiest to do and is useful for testing. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706654#comment-13706654 ] Vinod Kumar Vavilapalli commented on YARN-321: -- Like I mentioned: bq. Querying list of apps based on user-name, queue-name etc. To start with, we will imitate what JHS does, throw up list of all apps and do the filtering client side. But we need a better server side solution. So for both the CLI and web UI, we will start with a client side basic filtering, perhaps coupled with paging on the results. More advanced analytics needs a more robust server side solution. I can already imagine file-based indices, but a more query friendly storage will be needed - a table view via HCat/HBase over HDFS will be a good start. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706645#comment-13706645 ] Hitesh Shah commented on YARN-321: -- {quote} To start with, we will have an implementation with per-app HDFS file. {quote} [~vinodkv] Based on the above, it seems like this will address allowing someone to analyse only one job at a time. Based on a per-app file, it will be non-trivial to search for applications that match a certain criteria? All jobs that run on a certain day? All jobs of a certain type? All jobs that took longer than 10 mins to run? All jobs that use over 100 containers? Sure, a directory hierarchy based on dates may solve the very basic use-cases but it looks like anyone needing to do any slightly more complex analysis on cluster utilization will need to build an indexing layer on top of the file-based store? > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706553#comment-13706553 ] Vinod Kumar Vavilapalli commented on YARN-321: -- Fundamentally, this JIRA is to track the management of data related to finished applications via a new server called ApplicationHistoryService (AHS). Some important design points: h4. Basics - ResoureManager will write per-application data to a (hopefully very) thin {{HistoryStorage}} layer. - ResourceManager will push the data to HistoryStorage after an application finishes in a separate thread. - HistoryStorage is different from the current RMStateStore and so unlike JobHistory, HistoryStorage isn't used for state-tracking or as a transaction log. ResourceManager will try to publish information about completed apps in a best-case manner but there will be edge cases during RM restart where we may not be flushing some data. Fixing it to be consistent and complete over an RM restart will be a future step. - HistoryStorage will have publish app-info, retrieve app-info and list apps APIs and can have various implementations -- A file based implementation where RM writes per-app files to DFS, HistoryStorage will take care of file management like we do today in JobHistoryServer (JHS) and serve users by reading the data in files -- A shared bus implementation where RM directly writes to AHS and AHS persists them in a storage that it controls - Files/DB etc. - To start with, we will have an implementation with per-app HDFS file. h4. Miscellaneous - *Running as service*: By default, ApplicationHistoryService will be embedded inside ResourceManager but will be independent enough to run as a separate service for scaling purposes. - *User interfaces*: Command line clients and/or web-clients will have RPC and web and REST interfaces to interact with ApplicationHistoryService to get info about finished applications. Fundamentally, we'll have two types of interfaces -- Per-app info -- List of all apps -- Querying list of apps based on user-name, queue-name etc. To start with, we will imitate what JHS does, throw up list of all apps and do the filtering client side. But we need a better server side solution. - *Aggregated logs*: Logs will be served and potentially log management (expiry etc.) by ApplicationHistoryService via an abstract LogService component. - *Retention*: ApplicationHistoryService will have components to take care of retention - expiring very old apps. - *Security*: ApplicationHistoryService will have security from start, will use tokens similar to JHS. h4. Out of scope - Hosting/serving per-framework data is out of scope for this JIRA. It is related to ApplicationHistoryService but I am keeping focus on generic data for now on this JIRA, will file a separate ticket for ApplicationHistoryService or a related service to work with per-framework or app data. I see a transition phase where we would continue to run AHS and JHS run at the same time till the other JIRA is resolved. - *Long running services*: We won't be having any special support for long running services yet. We should track this with other long running services' support. Feedback apprecitated. I am going kickstarting this right now. I am creating a branch for faster progress. > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu >Assignee: Vinod Kumar Vavilapalli > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira