[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2017-08-23 Thread zhangyubiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138124#comment-16138124
 ] 

zhangyubiao commented on MAPREDUCE-5641:


Although the timeline server shows the application history, the staging 
directory of the killed job, hdfs://user/readuser//.staging/job_1496915015540_, 
is still left in the folder. Can we deal with this in the application history? 
[~zjshen]

> History for failed Application Masters should be made available to the Job 
> History Server
> -
>
> Key: MAPREDUCE-5641
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, jobhistoryserver
> Affects Versions: 2.2.0
> Reporter: Robert Kanter
> Attachments: MAPREDUCE-5641.patch, MAPREDUCE-5641.patch
>
>
> Currently, the JHS has no information about jobs whose AMs have failed.  This 
> is because the History is written by the AM to the intermediate folder just 
> before finishing, so when it fails for any reason, this information isn't 
> copied there.  However, it is not lost, as it's in the AM's staging directory.  
> To make the History available in the JHS, all we need to do is have another 
> mechanism to move the History from the staging directory to the intermediate 
> directory.  The AM also writes a "Summary" file before exiting normally, 
> which is also unavailable when the AM fails.  
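
The mechanism proposed in the issue description above can be sketched roughly as follows. This is only an illustration under stated assumptions: local directories stand in for the HDFS staging and intermediate directories, the function name `recover_history` is hypothetical, and the choice of file suffixes is a guess rather than the behavior of the attached patch.

```python
import shutil
import tempfile
from pathlib import Path


def recover_history(staging_dir: Path, intermediate_dir: Path) -> list:
    """Move history-related files left behind by a failed AM.

    Returns the sorted names of the files that were moved.
    """
    intermediate_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(staging_dir.iterdir(), key=lambda p: p.name):
        # Assumption: only the .jhist file and the job configuration matter here.
        if src.is_file() and src.suffix in {".jhist", ".xml"}:
            shutil.move(str(src), str(intermediate_dir / src.name))
            moved.append(src.name)
    return sorted(moved)


def demo() -> list:
    """Build a throwaway staging directory and run the recovery on it."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "staging" / "job_0001"
        staging.mkdir(parents=True)
        (staging / "job_0001.jhist").write_text("events")
        (staging / "job.xml").write_text("<configuration/>")
        (staging / "job.jar").write_text("binary")
        return recover_history(staging, Path(tmp) / "done_intermediate")
```

The sketch ignores permissions entirely, which is the main obstacle raised later in the discussion.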



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2017-06-23 Thread Ajay Babu Kakani (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061476#comment-16061476
 ] 

Ajay Babu Kakani commented on MAPREDUCE-5641:
-

Oozie workflows fail with "Unable to find job job_X_" because the job logs of 
applications that failed at the AM launch stage are not moved to the JHS web 
UI.

The cause of the job failure can likely be seen only on the NodeManager that 
was running it, which means we need the NodeManager log from the NM that ran 
the job below: https://JHS_node/jobhistory/job/job_X_. It is evident that the 
AM failed to launch the job. 

In which version is this fixed, or going to be fixed?




[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2017-01-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812195#comment-15812195
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

An even simpler approach until we're ready to have the JHS perform REST queries 
to the AHS is to have the JHS UI link to the AHS UI.  For example, when the 
YARN AHS is enabled then we could make the various app attempt numbers in the 
MapReduce UI clickable links that go to the specific attempt page on the AHS 
UI.  Not as nice as placing the diagnostics directly on the MapReduce UI, but 
at least the user can navigate to the information in the interim.

> History for failed Application Masters should be made available to the Job 
> History Server
> -
>
> Key: MAPREDUCE-5641
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, jobhistoryserver
> Affects Versions: 2.2.0
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: MAPREDUCE-5641.patch, MAPREDUCE-5641.patch



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909246#comment-13909246
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5641:


I still don't understand why my proposal 
[here|https://issues.apache.org/jira/browse/MAPREDUCE-5641?focusedCommentId=13906448&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13906448]
 of making the JHS talk to the RM about the application information is not 
enough to begin with. It can be extended in the future to talk to the AHS. As 
for your question about scale, Jason did answer that it can be done on demand, 
only for those apps that don't have history files.



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-21 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909059#comment-13909059
 ] 

Karthik Kambatla commented on MAPREDUCE-5641:
-

Thinking more about this, I am slightly wary of using AHSClient or the store 
directly for this, before we iron out any rough edges and mark them stable. 

[~vinodkv], [~zjshen] - do you think it is reasonable to let this go through 
for now, even though it is not the cleanest approach and adds duplicate code? 
Once AHS is stable, we can follow up by removing the flag-file parts in YARN 
and updating the JHS parts of the code to use AHS instead of the flag file. 



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907656#comment-13907656
 ] 

Zhijie Shen commented on MAPREDUCE-5641:


bq. So it sounds like instead of doing YARN-1731 to make the RM write a little 
flag file that the JHS can check for, we can have the JHS check this store just 
like the AHS is doing. That should be cleaner.

It could be an option, but it depends on what information you want. As I 
understood it, you plan to inspect the jhist file and probably look for 
MR-specific information such as map, reduce, shuffle, and merge details. That 
cannot be obtained from the AHS. In contrast, generic information such as start 
time, finish time, and host can be obtained from the AHS. Perhaps you can 
choose to recover part of the information for a failed MR AM now, and make a 
complete recovery once MR reports its specific information to the timeline 
service.
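
A partial recovery of this kind can be pictured with the small sketch below. It is only an illustration: the class and field names (`GenericAppInfo`, `RecoveredJob`, `maps_completed`, and so on) are hypothetical and do not correspond to AHS or JHS types. The point is that generic fields are filled in while MR-specific fields stay empty until another source supplies them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenericAppInfo:
    """Generic, framework-independent fields (illustrative names only)."""
    start_time: int
    finish_time: int
    host: str


@dataclass
class RecoveredJob:
    """Job record rebuilt for a failed AM; MR-specific fields may be missing."""
    start_time: int
    finish_time: int
    host: str
    # None means "not recoverable from generic information alone".
    maps_completed: Optional[int] = None
    reduces_completed: Optional[int] = None

    @property
    def is_partial(self) -> bool:
        return self.maps_completed is None or self.reduces_completed is None


def recover_from_generic(info: GenericAppInfo) -> RecoveredJob:
    """Fill in only what the generic record provides."""
    return RecoveredJob(info.start_time, info.finish_time, info.host)
```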

bq. What is the store that its using? And where can I find out more about it or 
its API so I can update this patch to use it.

The suggested way to access the information is not to read from the store 
directly, but to use AHSClient or the web services, assuming you are going to 
do this programmatically.




[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-20 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907614#comment-13907614
 ] 

Robert Kanter commented on MAPREDUCE-5641:
--

So it sounds like instead of doing YARN-1731 to make the RM write a little flag 
file that the JHS can check for, we can have the JHS check this store just like 
the AHS is doing.  That should be cleaner.  

What is the store that it's using?  And where can I find out more about it or 
its API so I can update this patch to use it?



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907603#comment-13907603
 ] 

Zhijie Shen commented on MAPREDUCE-5641:


Ah, sorry, I used the wrong word. It should be finished, *failed*, or killed. 
If the AM crashes and no retries remain, the application is marked as failed, 
right? The AHS records the information from the RM's point of view.

bq. Please excuse my ignorance about AHS. What is the source of applications 
for the AHS? Does it periodically poll the RM? Or, does the RM trigger 
something on the completion of an app or its attempts?

The AHS doesn't query the RM. Instead, the RM pushes the information to a store 
that the AHS can read. The information is pushed as events before the 
application life cycle completes, regardless of whether it ends as finished, 
failed, or killed.
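
The push model described above can be sketched as a writer appending events to a store and a reader consulting that store without ever calling the writer. This is a minimal in-memory illustration; `HistoryStore`, its methods, and the event names are hypothetical and are not the actual AHS store API.

```python
from collections import defaultdict
from typing import Optional


class HistoryStore:
    """In-memory stand-in for a store that one side writes and another reads."""

    TERMINAL = {"FINISHED", "FAILED", "KILLED"}

    def __init__(self) -> None:
        self._events = defaultdict(list)

    def push(self, app_id: str, event: str) -> None:
        # Writer side (the RM in the comment above): record events as they occur.
        self._events[app_id].append(event)

    def final_state(self, app_id: str) -> Optional[str]:
        # Reader side: only the store is consulted, the writer is never queried.
        for event in reversed(self._events.get(app_id, [])):
            if event in self.TERMINAL:
                return event
        return None
```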



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907594#comment-13907594
 ] 

Karthik Kambatla commented on MAPREDUCE-5641:
-

This JIRA is not aimed at applications that have finished, removed or killed. I 
guess the issue here is those AMs that crash and therefore don't leave any 
information about their existence. In this case, the JHS wouldn't know about 
them and hence won't show them.

Please excuse my ignorance about AHS. What is the source of applications for 
the AHS? Does it periodically poll the RM? Or, does the RM trigger something on 
the completion of an app or its attempts? 



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-20 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907574#comment-13907574
 ] 

Zhijie Shen commented on MAPREDUCE-5641:


bq. could you point us to how the AHS gets this information for AMs that crash. 
We might be able to re-use some of that if the RM side of things for doing this 
is stable.

Whether an application is finished, removed or killed, it is supposed to be 
recorded by the AHS. However, it depends on what you need. If you're looking 
for the generic information, the AHS should meet your requirement. Otherwise, 
you still need a workaround until the per-framework information of MR can be 
recorded.



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907559#comment-13907559
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

I originally thought that as well, but then wondered if the query was meant to 
be performed lazily.  The JHS would query the RM when asked for a particular 
job for which it could not find the jhist in either done or done_intermediate.  
That would solve the issue of providing a specific job's history, but not the 
use case of browsing for it.
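
The lazy lookup described above might look like the following sketch. It is an illustration only: dictionaries stand in for the done and done_intermediate directories, and `query_rm` is a hypothetical callback, not a real ResourceManager client call. It also shows the limitation noted above, since listing jobs never reaches the RM.

```python
from typing import Callable, Optional


def find_job(job_id: str,
             done: dict,
             done_intermediate: dict,
             query_rm: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Look up one job, falling back to the RM only when no jhist is found."""
    for index in (done, done_intermediate):
        if job_id in index:
            return {"source": "jhist", **index[job_id]}
    # Lazy path: the RM is queried only for this one missing job.
    report = query_rm(job_id)
    if report is None:
        return None
    return {"source": "rm", **report}


def list_jobs(done: dict, done_intermediate: dict) -> list:
    """Browsing uses only the history directories, so RM-only jobs are absent."""
    return sorted(set(done) | set(done_intermediate))
```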



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907552#comment-13907552
 ] 

Karthik Kambatla commented on MAPREDUCE-5641:
-

[~vinodkv] - could you point us to how the AHS gets this information for AMs 
that crash? We might be able to re-use some of that if the RM side of things 
for doing this is stable. 



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907547#comment-13907547
 ] 

Karthik Kambatla commented on MAPREDUCE-5641:
-

bq. Instead of adding new functionality, can JHS simply ask RM about the 
application-status. Why would that not work? 

That would work, but on a cluster with, say, 10,000 running apps, the JHS would 
have to query the status of each app or fetch all apps every so often. It is 
nicer to avoid the poll model, no? 



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906448#comment-13906448
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5641:


bq. Vinod Kumar Vavilapalli, I'm a bit reluctant to get the JHS to depend on 
the AHS at this point as the AHS is not fully cooked. I would prefer dropping 
the JHS altogether in favor of the AHS when the AHS is ready for prime time 
with AM extensions.

The problem is that, as I understand it, this JIRA requires corresponding 
changes in YARN via YARN-1731. It doesn't make sense to add duplicate 
functionality in YARN.

Instead of adding new functionality, can the JHS simply ask the RM about the 
application status? Why would that not work? Clearly, if the RM goes down and 
comes back up, it may lose history, but for that you need to enable the 
state-store anyway. Otherwise, it should work for the most part. Thoughts?





[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901985#comment-13901985
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5641:
---

Yep, that is what I meant. The JHS is trusted code, with no user code running 
there. The doAs with the proxy user would be used only for this case. Also, all 
this would go away when the AHS is ready to take over. 



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901903#comment-13901903
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

In theory you could make the JHS able to proxy as users in HDFS so it can read 
the necessary files in the staging directory, if that's what you intended to 
suggest.  Not sure I'm thrilled with the JHS having the ability to do anything 
in HDFS, but it should work.



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901900#comment-13901900
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

bq. how about not touching the current permissions of staging and making the RM 
a proxy user in HDFS. Then the files would be written as the user.

The issue is not the permissions of the proposed file the RM would write, but 
rather the permissions of the .jhist and job.xml files written by the job.  
Those are already owned by the user, and the RM isn't involved at all.  The 
issue with the originally proposed approach is that the JHS is not the user and 
therefore cannot access the necessary files to place them in the proper 
locations after the job completes (something the AM normally does).



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901887#comment-13901887
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5641:
---

[~rkanter], [~jlowe], how about not touching the current permissions of staging 
and making the RM a proxy user in HDFS. Then the files would be written as the 
user.

[~vinodkv], I'm a bit reluctant to get the JHS to depend on the AHS at this 
point, as the AHS is not fully cooked. I would prefer dropping the JHS 
altogether in favor of the AHS when the AHS is ready for prime time with AM 
extensions.



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901870#comment-13901870
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5641:


Haven't yet read the discussion, but overall, we don't need yet another 
solution for this. YARN-321 is already enabling generic history and so has a 
record of killed/failed applications. If we need a fix at all,
 - For the short term, we should make JHS invoke web-services on RM and/or AHS 
to obtain this information.
 - Medium/longer term, the generic data and timeline data (YARN-1530) will 
merge to expose all information about apps via web-services. And JHS (if it 
still exists by that time) should just use them.
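
To make the short-term suggestion above concrete, here is a rough sketch of how a client might ask the RM web services for failed or killed applications. It is only an illustration under stated assumptions: the host name is a placeholder, the sample response is made up, and the endpoint path, the "states" query parameter and the "apps"/"app" response layout should be checked against the ResourceManager REST documentation for the Hadoop version in use. Note that this only yields generic application data, not the MapReduce-level history the JHS normally serves.

```python
import json
from urllib.parse import urlencode


def failed_apps_url(rm_base, states=("FAILED", "KILLED")):
    """Build the RM REST query URL for applications in the given states."""
    # safe="," keeps the comma-separated state list readable in the URL.
    query = urlencode({"states": ",".join(states)}, safe=",")
    return f"{rm_base.rstrip('/')}/ws/v1/cluster/apps?{query}"


def summarize_apps(payload):
    """Return (id, user, state) tuples from a cluster-apps JSON response."""
    # "apps" can be null when nothing matches, so fall back to an empty dict.
    apps = (json.loads(payload).get("apps") or {}).get("app", [])
    return [(a["id"], a["user"], a["state"]) for a in apps]


# Made-up sample response, used here instead of a live HTTP call.
SAMPLE_RESPONSE = json.dumps({"apps": {"app": [
    {"id": "application_0000000000001_0001", "user": "alice", "state": "FAILED"},
    {"id": "application_0000000000001_0002", "user": "bob", "state": "KILLED"},
]}})

if __name__ == "__main__":
    print(failed_apps_url("http://rm.example.com:8088"))
    for app in summarize_apps(SAMPLE_RESPONSE):
        print(app)
```

In a real deployment the URL would be fetched with an HTTP client (and with authentication on a secure cluster); that part is left out to keep the sketch self-contained.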



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901821#comment-13901821
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

bq. Do you have any alternatives on how to allow the JHS to have access to 
those files?

Outside of imposing new restrictions on where the staging directory can be and 
how it has to be configured, no I don't know of an easy way to do that.  To 
allow the JHS to access these files, we'd minimally have to require the user 
directories in the staging area to have their group set to the "hadoop" group 
(see 
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
 for details on that group) and have permissions of 0750 all the way down to 
the specific staging directory for a job.  Read permission is required so the 
history server can scan for the proper jhist file to grab, since a job with 
multiple AM attempts means the JHS can't know the name of the correct jhist 
file in advance -- it would have to scan to see which is the latest.  That would 
relax the permissions on a user's staging files to include the hadoop group.  
That's probably OK and far better than letting everyone in, but I haven't 
thought through all of the security ramifications of doing so.
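
As an illustration of the scan described above, here is a minimal sketch of picking the jhist file of the latest AM attempt from a directory listing. It assumes the staging files are named <jobId>_<attempt>.jhist; that pattern, the function name and the sample job IDs are assumptions made for the example, not something taken from the patch.

```python
import re

# Assumed naming pattern: <jobId>_<attemptNumber>.jhist
JHIST_RE = re.compile(r"^(?P<job>job_\d+_\d+)_(?P<attempt>\d+)\.jhist$")


def latest_jhist(filenames, job_id):
    """Return the jhist file name with the highest attempt number, or None."""
    best = None  # (attempt number, file name)
    for name in filenames:
        match = JHIST_RE.match(name)
        if match and match.group("job") == job_id:
            # Compare attempts numerically so that attempt 10 beats attempt 2.
            attempt = int(match.group("attempt"))
            if best is None or attempt > best[0]:
                best = (attempt, name)
    return best[1] if best else None


# Made-up listing of one job's staging directory plus an unrelated file.
LISTING = [
    "job.xml",
    "job_1392000000000_0007_1.jhist",
    "job_1392000000000_0007_2.jhist",
    "job_1392000000000_0007_10.jhist",
    "job_1392000000000_0008_3.jhist",
]

if __name__ == "__main__":
    print(latest_jhist(LISTING, "job_1392000000000_0007"))
```

Producing the listing in the first place is exactly the step that needs read permission on the directory, which is why execute-only access is not enough for this approach.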

bq. Or to somehow get those files into the done_intermediate dir?

A proper way to do this would be to have something run by the user of the job 
do this, as that doesn't require any additional security beyond what's already 
done today.  However, that probably involves adding the ability in YARN for a 
specified task to run when an application is failed/killed to clean up after the 
unsuccessful run.  It's a non-trivial task, but it would also help solve the 
problem we have today where staging directories are leaked for applications 
that are killed before the AM launches.



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901789#comment-13901789
 ] 

Robert Kanter commented on MAPREDUCE-5641:
--

hmm... I hadn't thought about the security of those files.  Do you have any 
alternatives on how to allow the JHS to have access to those files?  Or to 
somehow get those files into the done_intermediate dir?



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901635#comment-13901635
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

I should also point out that the staging directory itself may not be publicly 
accessible.  The staging area is configurable, and our current setup places 
the staging area at /user.  That puts each user's .staging directory under 
their home directory, and the home directory of most users is locked down to 
700.



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901600#comment-13901600
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

bq. I don't believe that will work either, since the job history and job.xml 
files are 0600

Sorry, this is incorrect -- I was looking at the wrong files on one of our 
clusters.  The job conf and jhist files are 644 by default, so it will work but 
insecurely.



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901592#comment-13901592
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

bq. I modified the permissions from 0700 to 0701.

I don't believe that will work either, since the job history and job.xml files 
are 0600.  So even if the history server can see it via the execute bit it 
won't be able to copy it.  If we allow it to copy it then it's not secure.  
With those permissions, anyone with a job ID of an active job, the job's user, 
and the job's staging directory can obtain the job configuration (via job.xml) 
and job counters (via _1.jhist).  The information needed to pull this 
off is trivially available, as the first two are on the front page of the RM 
and the latter is in public cluster configs.
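
The trade-off can be sketched with a small model of POSIX-style permission checks. This is a simplification and not HDFS code: it ignores superusers and ACLs, and the user names (alice, mapred, mallory), the group names and the 0644 file mode are hypothetical values chosen for the example.

```python
def allowed(mode, owner, group, user, user_groups, want):
    """Check one permission bit ('r', 'w' or 'x') using POSIX class selection."""
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == owner:
        shift = 6  # owner bits
    elif group in user_groups:
        shift = 3  # group bits
    else:
        shift = 0  # other bits
    return bool((mode >> shift) & bit)


def can_read_file(entries, user, user_groups):
    """entries: (mode, owner, group) for each directory on the path, then the file."""
    *dirs, leaf = entries
    # Every directory on the path needs the execute bit for traversal.
    can_traverse = all(allowed(m, o, g, user, user_groups, "x") for (m, o, g) in dirs)
    return can_traverse and allowed(leaf[0], leaf[1], leaf[2], user, user_groups, "r")


def staging_path(dir_mode, file_mode):
    """A staging directory and one history file, both owned by alice:alice."""
    return [(dir_mode, "alice", "alice"), (file_mode, "alice", "alice")]


if __name__ == "__main__":
    # 0701 directory with 0600 files: the history server cannot read the file.
    print(can_read_file(staging_path(0o701, 0o600), "mapred", {"hadoop"}))
    # 0701 directory with world-readable files: the history server can read it,
    # and so can any other user who knows the path.
    print(can_read_file(staging_path(0o701, 0o644), "mapred", {"hadoop"}))
    print(can_read_file(staging_path(0o701, 0o644), "mallory", {"users"}))
```

The execute-only directory hides the listing but not the files themselves, so once the file names are predictable the only remaining protection is the file mode.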



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2013-12-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847554#comment-13847554
 ] 

Jason Lowe commented on MAPREDUCE-5641:
---

How will the JHS copy the file to the intermediate directory?  It likely won't 
have access to the staging directory containing the jhist file.



[jira] [Commented] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2013-12-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846967#comment-13846967
 ] 

Karthik Kambatla commented on MAPREDUCE-5641:
-

Proposal makes sense to me. Do you want to open a YARN JIRA for the 
YARN-specific changes? 
