[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138124#comment-16138124 ]

zhangyubiao commented on MAPREDUCE-5641:

Although the timeline server shows the application history, the staging directory of the killed job, hdfs://user/readuser//.staging/job_1496915015540_, is still left in the folder. Can we deal with this in the application history? [~zjshen]

> History for failed Application Masters should be made available to the Job History Server
> -----------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5641
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, jobhistoryserver
> Affects Versions: 2.2.0
> Reporter: Robert Kanter
> Attachments: MAPREDUCE-5641.patch, MAPREDUCE-5641.patch
>
> Currently, the JHS has no information about jobs whose AMs have failed. This
> is because the History is written by the AM to the intermediate folder just
> before finishing, so when it fails for any reason, this information isn't
> copied there. However, it is not lost as it's in the AM's staging directory.
> To make the History available in the JHS, all we need to do is have another
> mechanism to move the History from the staging directory to the intermediate
> directory. The AM also writes a "Summary" file before exiting normally,
> which is also unavailable when the AM fails.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
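As an illustration of the mechanism the issue description asks for (moving the History from the staging directory to the intermediate directory), here is a minimal Python sketch. It uses the local filesystem as a stand-in for HDFS. The function name, the file-suffix filter and the directory arguments are assumptions made for illustration; this is not the content of MAPREDUCE-5641.patch, and it ignores the HDFS permission question raised elsewhere in this thread.

```python
import shutil
from pathlib import Path
from typing import List


def move_history_files(staging_dir: Path, intermediate_dir: Path, job_id: str) -> List[Path]:
    """Move a job's history-related files out of its staging directory.

    Hypothetical helper: the local filesystem stands in for HDFS, and the
    suffix filter below is an assumption, not the real Hadoop naming scheme.
    """
    intermediate_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(staging_dir.glob(f"{job_id}*")):
        # Only history (.jhist) and configuration (.xml) files are of interest.
        if src.suffix in {".jhist", ".xml"}:
            dst = intermediate_dir / src.name
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved
```

Anything else in the staging directory is left in place, so a separate cleanup step would still be needed for the leftover directory mentioned in the comment above.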
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061476#comment-16061476 ]

Ajay Babu Kakani commented on MAPREDUCE-5641:

Oozie workflows fail with "Unable to find job job_X_" because the job logs of applications that failed at the AM launch stage are not moved to the JHS web UI. The cause of the failure can probably be seen only on the NodeManager side, which means we need the log of the NodeManager that ran the job below: https://JHS_node/jobhistory/job/job_X_. It is evident that the AM failed to launch the job. In which version is this fixed, or going to be fixed?
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812195#comment-15812195 ]

Jason Lowe commented on MAPREDUCE-5641:

An even simpler approach, until we're ready to have the JHS perform REST queries to the AHS, is to have the JHS UI link to the AHS UI. For example, when the YARN AHS is enabled, we could make the various app attempt numbers in the MapReduce UI clickable links that go to the specific attempt page on the AHS UI. Not as nice as placing the diagnostics directly on the MapReduce UI, but at least the user can navigate to the information in the interim.

> History for failed Application Masters should be made available to the Job History Server
> -----------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5641
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, jobhistoryserver
> Affects Versions: 2.2.0
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: MAPREDUCE-5641.patch, MAPREDUCE-5641.patch
>
> Currently, the JHS has no information about jobs whose AMs have failed. This
> is because the History is written by the AM to the intermediate folder just
> before finishing, so when it fails for any reason, this information isn't
> copied there. However, it is not lost as it's in the AM's staging directory.
> To make the History available in the JHS, all we need to do is have another
> mechanism to move the History from the staging directory to the intermediate
> directory. The AM also writes a "Summary" file before exiting normally,
> which is also unavailable when the AM fails.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909246#comment-13909246 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5641:

I still don't understand why my proposal [here|https://issues.apache.org/jira/browse/MAPREDUCE-5641?focusedCommentId=13906448&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906448] of making the JHS talk to the RM about the application information is not enough to begin with. It can be extended in the future to talk to the AHS.

To your question about scale, Jason did answer that it can be done on demand, only for those apps which don't have history files.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909059#comment-13909059 ]

Karthik Kambatla commented on MAPREDUCE-5641:

Thinking more about this, I am slightly wary of using AHSClient or the store directly for this before we iron out any rough edges and mark them stable.

[~vinodkv], [~zjshen] - do you think it is reasonable to let this go through for now, even though it is not the cleanest approach and adds duplicate code? Once the AHS is stable, we can follow up by removing the flag-file parts in YARN and updating the JHS parts of the code to use the AHS instead of the flag file.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907656#comment-13907656 ]

Zhijie Shen commented on MAPREDUCE-5641:

bq. So it sounds like instead of doing YARN-1731 to make the RM write a little flag file that the JHS can check for, we can have the JHS check this store just like the AHS is doing. That should be cleaner.

It could be an option, but it depends on what information you want. According to my previous understanding, you plan to inspect the jhist file and probably look for MR-specific information, such as map, reduce, shuffle, merge and so on. That cannot be obtained from the AHS. In contrast, other generic information, such as start time, finish time, host and so on, can be obtained from the AHS. Perhaps you can choose to recover part of the information for a failed MR AM now, and make a complete recovery once MR reports its specific information to the timeline service.

bq. What is the store that its using? And where can I find out more about it or its API so I can update this patch to use it.

The suggested way to access the information is not to read from the store directly but to use AHSClient or the web services, supposing you are going to do this programmatically.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907614#comment-13907614 ]

Robert Kanter commented on MAPREDUCE-5641:

So it sounds like instead of doing YARN-1731 to make the RM write a little flag file that the JHS can check for, we can have the JHS check this store just like the AHS is doing. That should be cleaner.

What is the store that it's using? And where can I find out more about it or its API so I can update this patch to use it?
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907603#comment-13907603 ]

Zhijie Shen commented on MAPREDUCE-5641:

Ah, sorry, I used the wrong word. It should be finished, *failed*, killed. If the AM crashes and there are no more retries, the application will be marked failed, right? The AHS records the information from the RM's point of view.

bq. Please excuse my ignorance about AHS. What is the source of applications for the AHS? Does it periodically poll the RM? Or, does the RM trigger something on the completion of an app or its attempts?

The AHS doesn't query the RM. Instead, the RM pushes the information to a store that the AHS can read. The information is pushed as events before the application life cycle completes, no matter whether it completes as finished, failed or killed.
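The push model described in this comment can be pictured with a small Python sketch. It is only an in-memory illustration: the class name, method names and event strings are hypothetical and do not reflect the actual YARN history store interface, which this thread does not show.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Terminal outcomes named in the comment above.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


@dataclass
class HistoryStore:
    """Hypothetical in-memory stand-in for the store the RM writes to."""

    events: Dict[str, List[str]] = field(default_factory=dict)

    def push(self, app_id: str, event: str) -> None:
        """Writer side (the RM's role): append an event for an application."""
        self.events.setdefault(app_id, []).append(event)

    def final_state(self, app_id: str) -> Optional[str]:
        """Reader side (the AHS's role): read the store, never query the writer."""
        for event in reversed(self.events.get(app_id, [])):
            if event in TERMINAL_STATES:
                return event
        return None
```

Because the writer records the terminal event itself, a reader can learn that an application failed even when the AM crashed without leaving anything behind.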
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907594#comment-13907594 ]

Karthik Kambatla commented on MAPREDUCE-5641:

This JIRA is not aimed at applications that have finished, been removed or been killed. I guess the issue here is those AMs that crash, so that the AMs don't leave any information about their existence. In this case, the JHS wouldn't know about them and hence won't show them.

Please excuse my ignorance about AHS. What is the source of applications for the AHS? Does it periodically poll the RM? Or, does the RM trigger something on the completion of an app or its attempts?
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907574#comment-13907574 ]

Zhijie Shen commented on MAPREDUCE-5641:

bq. could you point us to how the AHS gets this information for AMs that crash. We might be able to re-use some of that if the RM side of things for doing this is stable.

Whether an application is finished, removed or killed, it is supposed to be recorded by the AHS. However, it depends on what you need. If you're looking for the generic information, the AHS should meet your requirement. Otherwise, you still need a workaround until the per-framework information of MR can be recorded.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907559#comment-13907559 ]

Jason Lowe commented on MAPREDUCE-5641:

I originally thought that as well but then wondered if the query was to be lazily performed. It would query the RM when asked for a particular job for which it could not find the jhist in either done or done_intermediate. That would solve the issue for providing a specific job's history but not the use-case of browsing for it.
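The lazy lookup described in this comment could look roughly like the following Python sketch. It assumes the local filesystem as a stand-in for HDFS and a caller-supplied `query_rm` function; both, along with the function name, the returned dictionary shape and the `.jhist` glob pattern, are illustrative assumptions rather than JHS code.

```python
from pathlib import Path
from typing import Callable, Optional


def find_job_history(
    job_id: str,
    done_dir: Path,
    intermediate_dir: Path,
    query_rm: Callable[[str], Optional[dict]],
) -> dict:
    """Look for a job's .jhist file first; ask the RM only if none is found."""
    for directory in (done_dir, intermediate_dir):
        # Search recursively, since history files may sit in subdirectories.
        matches = sorted(directory.rglob(f"{job_id}*.jhist"))
        if matches:
            return {"source": "jhist", "path": matches[0]}
    # Lazy fallback: only jobs without a history file ever reach the RM.
    report = query_rm(job_id)
    if report is not None:
        return {"source": "rm", "report": report}
    return {"source": "none"}
```

As the comment notes, this only helps when a specific job is requested; a job that has no history file would still not appear when browsing the job list.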
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907552#comment-13907552 ]

Karthik Kambatla commented on MAPREDUCE-5641:

[~vinodkv] - could you point us to how the AHS gets this information for AMs that crash? We might be able to re-use some of that if the RM side of things for doing this is stable.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907547#comment-13907547 ]

Karthik Kambatla commented on MAPREDUCE-5641:

bq. Instead of adding new functionality, can JHS simply ask RM about the application-status. Why would that not work?

That would work, but on a cluster with say 10,000 running apps, the JHS would query the status of each app or fetch all apps every so often. It is nicer to avoid the poll model, no?
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906448#comment-13906448 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5641:

bq. Vinod Kumar Vavilapalli, I'm a bit reluctant to get the JHS to depend on the AHS at this point as the AHS is not fully cooked. I would prefer dropping the JHS altogether in favor of the AHS when the AHS is ready for prime time with AM extensions.

The problem is that, as I understand it, this JIRA requires corresponding changes in YARN via YARN-1731. It doesn't make sense to add duplicate functionality in YARN.

Instead of adding new functionality, can the JHS simply ask the RM about the application status? Why would that not work? Clearly, if the RM goes down and comes back up, it may lose history, but for that you need to enable the state-store anyway. Otherwise, it should work for the most part.

Thoughts?
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901985#comment-13901985 ]

Alejandro Abdelnur commented on MAPREDUCE-5641:

Yep, that is what I meant. The JHS is trusted code; no user code runs there. The doAs with the proxy user would be used only for this case. Also, all this would go away when the AHS is ready to take over.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901903#comment-13901903 ]

Jason Lowe commented on MAPREDUCE-5641:

In theory you could make the JHS able to proxy as users in HDFS so it can read the necessary files in the staging directory, if that's what you intended to suggest. Not sure I'm thrilled with the JHS having the ability to do anything in HDFS, but it should work.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901900#comment-13901900 ]

Jason Lowe commented on MAPREDUCE-5641:

bq. how about not touching the current permissions of staging and making the RM a proxy user in HDFS. Then the files would be written as the user.

The issue is not the permissions of the proposed file the RM would write, but rather the permissions of the .jhist and job.xml files written by the job. Those are already owned by the user, and the RM isn't involved at all. The issue with the originally proposed approach is that the JHS is not the user and therefore cannot access the necessary files to place them in the proper locations after the job completes (something the AM normally does).
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901887#comment-13901887 ]

Alejandro Abdelnur commented on MAPREDUCE-5641:

[~rkanter], [~jlowe], how about not touching the current permissions of staging and making the RM a proxy user in HDFS? Then the files would be written as the user.

[~vinodkv], I'm a bit reluctant to get the JHS to depend on the AHS at this point, as the AHS is not fully cooked. I would prefer dropping the JHS altogether in favor of the AHS when the AHS is ready for prime time with AM extensions.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901870#comment-13901870 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-5641:

Haven't yet read the discussion, but overall, we don't need yet another solution for this. YARN-321 is already enabling generic history and so has a record of killed/failed applications. If we need a fix at all:
- For the short term, we should make the JHS invoke web-services on the RM and/or AHS to obtain this information.
- Medium/longer term, the generic data and timeline data (YARN-1530) will merge to expose all information about apps via web-services, and the JHS (if it still exists by that time) should just use them.
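As a rough illustration of the short-term suggestion above (looking up a failed application through the RM web services), here is a minimal sketch. It assumes the ResourceManager REST endpoint `/ws/v1/cluster/apps/{appid}` and a response shaped like `{"app": {...}}`; the helper names, the host, and the IDs are hypothetical, and no network call is made.

```python
# Hypothetical helpers; not part of Hadoop or of the attached patch.

def job_id_to_app_id(job_id: str) -> str:
    """Assumes the usual convention that job_<ts>_<seq> maps to application_<ts>_<seq>."""
    return job_id.replace("job_", "application_", 1)

def rm_app_url(rm_address: str, app_id: str) -> str:
    """Build the (assumed) RM REST URL for a single application."""
    return f"http://{rm_address}/ws/v1/cluster/apps/{app_id}"

def summarize(app_json: dict) -> dict:
    """Keep only the fields a history UI would need for a failed or killed app."""
    app = app_json["app"]
    return {key: app.get(key) for key in ("id", "state", "finalStatus", "diagnostics")}

# Made-up response body, shown only to demonstrate the shape being assumed.
sample = {"app": {"id": "application_0000000000000_0001",
                  "state": "FAILED",
                  "finalStatus": "FAILED",
                  "diagnostics": "AM container exited"}}
print(rm_app_url("rm.example.com:8088", job_id_to_app_id("job_0000000000000_0001")))
print(summarize(sample))
```

A lookup like this would only yield generic application state and diagnostics, not the job counters stored in the jhist file.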
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901821#comment-13901821 ]

Jason Lowe commented on MAPREDUCE-5641:

bq. Do you have any alternatives on how to allow the JHS to have access to those files?

Outside of imposing new restrictions on where the staging directory can be and how it has to be configured, no, I don't know of an easy way to do that. To allow the JHS to access these files, we'd minimally have to require the user directories in the staging area to have their group set to the "hadoop" group (see http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html#Running_Hadoop_in_Secure_Mode for details on that group) and have permissions of 0750 all the way down to the specific staging directory for a job. Read permission is required so the history server can scan for the proper jhist file to grab: a job with multiple AM attempts means the JHS can't know the name of the correct jhist file in advance -- it would have to scan to see which is the latest. That would relax the permissions on a user's staging files to include the hadoop group. That's probably OK and far better than letting everyone in, but I haven't thought through all of the security ramifications of doing so.

bq. Or to somehow get those files into the done_intermediate dir?

A proper way to do this would be to have something run by the user of the job do it, as that doesn't require any additional security beyond what's already done today. However, that probably involves adding the ability in YARN for a specified task to run when an application fails or is killed, to clean up after the unsuccessful run. It's a non-trivial task, but it would also help solve the problem we have today where staging directories are leaked for applications that are killed before the AM launches.
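The two requirements in the comment above (group read and execute on every directory down to the job's staging directory, and a scan for the latest AM attempt's jhist file) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the file names are made up, and the `_<attempt>.jhist` suffix is assumed from the `_1.jhist` name mentioned elsewhere in this thread.

```python
import re

def group_can_scan(dir_modes):
    """True if a group member can both list (r) and traverse (x) every directory."""
    return all((mode >> 3) & 0o5 == 0o5 for mode in dir_modes)

def latest_jhist(filenames):
    """Return the jhist file with the highest attempt number, or None if there is none.

    Assumes names end in _<attempt>.jhist; this suffix is an assumption.
    """
    pattern = re.compile(r"_(\d+)\.jhist$")
    attempts = []
    for name in filenames:
        match = pattern.search(name)
        if match:
            attempts.append((int(match.group(1)), name))
    return max(attempts)[1] if attempts else None

# 0750 at every level lets the group scan; a single 0700 level blocks it.
print(group_can_scan([0o750, 0o750, 0o750]))   # True
print(group_can_scan([0o750, 0o700, 0o750]))   # False

# With two AM attempts, the scan has to pick the later file.
print(latest_jhist(["job.xml", "job_X_1.jhist", "job_X_2.jhist"]))  # job_X_2.jhist
```

The scan is the reason execute-only access is not enough: without read permission on the directory, the JHS could not list the files to discover which attempt is the latest.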
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901789#comment-13901789 ]

Robert Kanter commented on MAPREDUCE-5641:

hmm... I hadn't thought about the security of those files. Do you have any alternatives on how to allow the JHS to have access to those files? Or to somehow get those files into the done_intermediate dir?
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901635#comment-13901635 ]

Jason Lowe commented on MAPREDUCE-5641:

I should also point out that the staging directory itself may not be publicly accessible. The staging area is configurable, and our current setup places the staging area at /user. That puts each user's .staging directory under their home directory, and the home directory of most users is locked down to 700.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901600#comment-13901600 ]

Jason Lowe commented on MAPREDUCE-5641:

bq. I don't believe that will work either, since the job history and job.xml files are 0600

Sorry, this is incorrect -- I was looking at the wrong files on one of our clusters. The job conf and jhist files are 644 by default, so it will work, but insecurely.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901592#comment-13901592 ]

Jason Lowe commented on MAPREDUCE-5641:

bq. I modified the permissions from 0700 to 0701.

I don't believe that will work either, since the job history and job.xml files are 0600. So even if the history server can reach them via the execute bit, it won't be able to copy them. If we allow it to copy them, then it's not secure. With those permissions, anyone who knows the job ID of an active job, the job's user, and the job's staging directory can obtain the job configuration (via job.xml) and job counters (via _1.jhist). The information needed to pull this off is trivially available, as the first two are on the front page of the RM and the latter is in public cluster configs.
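The permission argument in this comment can be checked with a small model of owner/group/other mode bits. This is a sketch only, assuming POSIX-style semantics for directories and files; the `allowed` helper is hypothetical and is not Hadoop code.

```python
# Hypothetical model of owner/group/other mode bits; not Hadoop code.
READ, WRITE, EXECUTE = 4, 2, 1

def allowed(mode: int, who: str, perm: int) -> bool:
    """Return True if class `who` ('owner', 'group', 'other') has `perm` under `mode`."""
    shift = {"owner": 6, "group": 3, "other": 0}[who]
    return bool((mode >> shift) & perm)

# A staging directory at 0701: other users may traverse it but cannot list it.
print(allowed(0o701, "other", EXECUTE))  # True
print(allowed(0o701, "other", READ))     # False

# A file at 0600 stays unreadable to others even when its full path is known,
# while a file at 0644 is readable by anyone who can traverse to it.
print(allowed(0o600, "other", READ))     # False
print(allowed(0o644, "other", READ))     # True
```

In other words, an execute-only directory hides file names but does not protect files whose names can be guessed and whose own mode grants read access to others.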
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847554#comment-13847554 ]

Jason Lowe commented on MAPREDUCE-5641:

How will the JHS copy the file to the intermediate directory? It likely won't have access to the staging directory containing the jhist file.
[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846967#comment-13846967 ]

Karthik Kambatla commented on MAPREDUCE-5641:

Proposal makes sense to me. Do you want to open a YARN JIRA for the YARN-specific changes?