Re: Job end notification does not always work (Hadoop 2.x)

2013-06-25 Thread Prashant Kommireddi
Thanks, everyone. I have opened a JIRA and added a link to this discussion:
https://issues.apache.org/jira/browse/MAPREDUCE-5353


On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k devara...@huawei.com wrote:

  It is not mandatory to have a running HS in the cluster. The user can still
 submit the job without an HS in the cluster, and may expect the
 Job/App End Notification.


 Thanks

 Devaraj k


 *From:* Alejandro Abdelnur [mailto:t...@cloudera.com]
 *Sent:* 24 June 2013 21:42
 *To:* user@hadoop.apache.org
 *Cc:* user@hadoop.apache.org

 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)


 if we ought to do this in a yarn service it
 should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would
 be a good choice if we are concerned about the extra work this would cause
 in the RM. the problem with the current HS is that it is MR specific, we
 should generalize it for diff AM types. 


 thx


 Alejandro

 (phone typing)


 On Jun 23, 2013, at 23:28, Devaraj k devara...@huawei.com wrote:

  Even if we handle all the failure cases in AM for Job End Notification,
 we may miss cases like abrupt kill of AM when it is in last retry. If we
 choose NM to give the notification, again RM needs to identify which NM
 should give the end-notification as we don't have any direct protocol
 between AM and NM.

  

 I feel it would be better to move End-Notification responsibility to RM as
 Yarn Service because it ensures 100% notification and also useful for other
 types of applications as well. 

  

  

 Thanks

 Devaraj K

  

 *From:* Ravi Prakash [mailto:ravi...@ymail.com]
 *Sent:* 23 June 2013 19:01
 *To:* user@hadoop.apache.org
 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)

  

 Hi Alejandro,

 Thanks for your reply! I was thinking more along the lines Prashant
 suggested i.e. a failure during init() should still trigger an attempt to
 notify (by the AM). But now that you mention it, maybe we would be better
 off including this as a YARN feature after all (especially with all the new
 AMs being written). We could let the NM of the AM handle the notification
 burden, so that the RM doesn't get unduly taxed. Thoughts?

 Thanks
 Ravi

  

  
--

 *From:* Alejandro Abdelnur t...@cloudera.com
 *To:* common-u...@hadoop.apache.org user@hadoop.apache.org
 *Sent:* Saturday, June 22, 2013 7:37 PM
 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)

  

 If the AM fails before doing the job end notification, at any stage of the
 execution for whatever reason, the job end notification will never be
 delivered. There is no way to fix this unless the notification is done by a
 Yarn service. The 2 'candidate' services for doing this would be the RM and
 the HS. The job notification URL is in the job conf. The RM never sees the
 job conf, which rules out the RM unless we add, at AM registration time,
 the possibility to specify a callback URL. The HS has access to the job
 conf, but the HS is currently a 'passive' service.


 thx

  

 On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy a...@hortonworks.com
 wrote:

 Prashanth, 

  

  Please file a jira.

  

  One thing to be aware of - AMs get restarted a certain number of times
 for fault-tolerance - which means we can't just assume that failure of a
 single AM is equivalent to failure of the job.

  

  Only the ResourceManager is in the appropriate position to judge failure
 of AM v/s failure-of-job.

  

 hth,

 Arun

  

 On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi prash1...@gmail.com
 wrote:




 

 Thanks Ravi.

 Well, in this case it's a no-effort :) A failure of AM init should be
 considered as failure of the job? I looked at the code and best-effort
 makes sense with respect to retry logic etc. You make a good point that
 there would be no notification in case AM OOMs, but I do feel AM init
 failure should send a notification by other means.

  

 On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash ravi...@ymail.com wrote:

 Hi Prashant,

 I would tend to agree with you. Although job-end notification is only a
 best-effort mechanism (i.e. we cannot always guarantee notification for
 example when the AM OOMs), I agree with you that we can do more. If you
 feel strongly about this, please create a JIRA and possibly upload a patch.

 Thanks
 Ravi

  

  
--

 *From:* Prashant Kommireddi prash1...@gmail.com
 *To:* user@hadoop.apache.org user@hadoop.apache.org
 *Sent:* Thursday, June 20, 2013 9:45 PM
 *Subject:* Job end notification does not always work (Hadoop 2.x)

  

 Hello,

 I came across an issue that occurs with the job notification callbacks in
 MR2. It works fine if the Application master has

Re: Job end notification does not always work (Hadoop 2.x)

2013-06-25 Thread Alejandro Abdelnur
Devaraj,

If you don't run the HS, then once your jobs have finished you cannot retrieve
status/counters from it, from the Java API or the Web UI. So I'd say, for any
practical usage, you need it.

thx


On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k devara...@huawei.com wrote:

  It is not mandatory to have a running HS in the cluster. The user can still
 submit the job without an HS in the cluster, and may expect the
 Job/App End Notification.


 Thanks

 Devaraj k


 *From:* Alejandro Abdelnur [mailto:t...@cloudera.com]
 *Sent:* 24 June 2013 21:42
 *To:* user@hadoop.apache.org
 *Cc:* user@hadoop.apache.org

 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)


 if we ought to do this in a yarn service it
 should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would
 be a good choice if we are concerned about the extra work this would cause
 in the RM. the problem with the current HS is that it is MR specific, we
 should generalize it for diff AM types. 


 thx


 Alejandro

 (phone typing)


 On Jun 23, 2013, at 23:28, Devaraj k devara...@huawei.com wrote:

  Even if we handle all the failure cases in AM for Job End Notification,
 we may miss cases like abrupt kill of AM when it is in last retry. If we
 choose NM to give the notification, again RM needs to identify which NM
 should give the end-notification as we don't have any direct protocol
 between AM and NM.

  

 I feel it would be better to move End-Notification responsibility to RM as
 Yarn Service because it ensures 100% notification and also useful for other
 types of applications as well. 

  

  

 Thanks

 Devaraj K

  

 *From:* Ravi Prakash [mailto:ravi...@ymail.com]
 *Sent:* 23 June 2013 19:01
 *To:* user@hadoop.apache.org
 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)

  

 Hi Alejandro,

 Thanks for your reply! I was thinking more along the lines Prashant
 suggested i.e. a failure during init() should still trigger an attempt to
 notify (by the AM). But now that you mention it, maybe we would be better
 off including this as a YARN feature after all (especially with all the new
 AMs being written). We could let the NM of the AM handle the notification
 burden, so that the RM doesn't get unduly taxed. Thoughts?

 Thanks
 Ravi

  

  
--

 *From:* Alejandro Abdelnur t...@cloudera.com
 *To:* common-u...@hadoop.apache.org user@hadoop.apache.org
 *Sent:* Saturday, June 22, 2013 7:37 PM
 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)

  

 If the AM fails before doing the job end notification, at any stage of the
 execution for whatever reason, the job end notification will never be
 delivered. There is no way to fix this unless the notification is done by a
 Yarn service. The 2 'candidate' services for doing this would be the RM and
 the HS. The job notification URL is in the job conf. The RM never sees the
 job conf, which rules out the RM unless we add, at AM registration time,
 the possibility to specify a callback URL. The HS has access to the job
 conf, but the HS is currently a 'passive' service.


 thx

  

 On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy a...@hortonworks.com
 wrote:

 Prashanth, 

  

  Please file a jira.

  

  One thing to be aware of - AMs get restarted a certain number of times
 for fault-tolerance - which means we can't just assume that failure of a
 single AM is equivalent to failure of the job.

  

  Only the ResourceManager is in the appropriate position to judge failure
 of AM v/s failure-of-job.

  

 hth,

 Arun

  

 On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi prash1...@gmail.com
 wrote:




 

 Thanks Ravi.

 Well, in this case it's a no-effort :) A failure of AM init should be
 considered as failure of the job? I looked at the code and best-effort
 makes sense with respect to retry logic etc. You make a good point that
 there would be no notification in case AM OOMs, but I do feel AM init
 failure should send a notification by other means.

  

 On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash ravi...@ymail.com wrote:

 Hi Prashant,

 I would tend to agree with you. Although job-end notification is only a
 best-effort mechanism (i.e. we cannot always guarantee notification for
 example when the AM OOMs), I agree with you that we can do more. If you
 feel strongly about this, please create a JIRA and possibly upload a patch.

 Thanks
 Ravi

  

  
--

 *From:* Prashant Kommireddi prash1...@gmail.com
 *To:* user@hadoop.apache.org user@hadoop.apache.org
 *Sent:* Thursday, June 20, 2013 9:45 PM
 *Subject:* Job end notification does not always work (Hadoop 2.x)

  

 Hello,

 I came across an issue that occurs with the job notification callbacks

RE: Job end notification does not always work (Hadoop 2.x)

2013-06-25 Thread Devaraj k
I agree, for getting status/counters we need HS. I mean Job can finish without 
HS also.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:t...@cloudera.com]
Sent: 25 June 2013 18:05
To: common-u...@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Devaraj,

If you don't run the HS, then once your jobs have finished you cannot retrieve
status/counters from it, from the Java API or the Web UI. So I'd say, for any
practical usage, you need it.

thx

On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k devara...@huawei.com wrote:
It is not mandatory to have a running HS in the cluster. The user can still
submit the job without an HS in the cluster, and may expect the Job/App End
Notification.

Thanks
Devaraj k

From: Alejandro Abdelnur [mailto:t...@cloudera.com]
Sent: 24 June 2013 21:42
To: user@hadoop.apache.org
Cc: user@hadoop.apache.org

Subject: Re: Job end notification does not always work (Hadoop 2.x)

if we ought to do this in a yarn service it
should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would be a 
good choice if we are concerned about the extra work this would cause in the 
RM. the problem with the current HS is that it is MR specific, we should 
generalize it for diff AM types.

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k devara...@huawei.com wrote:
Even if we handle all the failure cases in AM for Job End Notification, we may 
miss cases like abrupt kill of AM when it is in last retry. If we choose NM to 
give the notification, again RM needs to identify which NM should give the 
end-notification as we don't have any direct protocol between AM and NM.

I feel it would be better to move End-Notification responsibility to RM as Yarn 
Service because it ensures 100% notification and also useful for other types of 
applications as well.


Thanks
Devaraj K

From: Ravi Prakash [mailto:ravi...@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested 
i.e. a failure during init() should still trigger an attempt to notify (by the 
AM). But now that you mention it, maybe we would be better off including this as
a YARN feature after all (especially with all the new AMs being written). We
could let the NM of the AM handle the notification burden, so that the RM 
doesn't get unduly taxed. Thoughts?

Thanks
Ravi



From: Alejandro Abdelnur t...@cloudera.com
To: common-u...@hadoop.apache.org user@hadoop.apache.org
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the 
execution for whatever reason, the job end notification will never be delivered.
There is no way to fix this unless the notification is done by a Yarn service.
The 2 'candidate' services for doing this would be the RM and the HS. The job
notification URL is in the job conf. The RM never sees the job conf, which rules
out the RM unless we add, at AM registration time, the possibility to
specify a callback URL. The HS has access to the job conf, but the HS is 
currently a 'passive' service.

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy a...@hortonworks.com wrote:
Prashanth,

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for 
fault-tolerance - which means we can't just assume that failure of a single AM 
is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM 
v/s failure-of-job.

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi prash1...@gmail.com wrote:


Thanks Ravi.

Well, in this case it's a no-effort :) A failure of AM init should be considered
as failure of the job? I looked at the code and best-effort makes sense with 
respect to retry logic etc. You make a good point that there would be no 
notification in case AM OOMs, but I do feel AM init failure should send a 
notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash ravi...@ymail.com wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a 
best-effort mechanism (i.e. we cannot always guarantee notification for 
example when the AM OOMs), I agree with you that we can do more. If you feel 
strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi



From: Prashant Kommireddi prash1...@gmail.com
To: user

Re: Job end notification does not always work (Hadoop 2.x)

2013-06-25 Thread Alejandro Abdelnur
Devaraj,

if a job can finish but you cannot determine its status after it has ended, then
the system is not usable. Thus, HS is a required component.

thx


On Tue, Jun 25, 2013 at 6:11 AM, Devaraj k devara...@huawei.com wrote:

  I agree, for getting status/counters we need HS. I mean Job can finish
 without HS also.  


 Thanks

 Devaraj k


 *From:* Alejandro Abdelnur [mailto:t...@cloudera.com]
 *Sent:* 25 June 2013 18:05
 *To:* common-u...@hadoop.apache.org

 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)


 Devaraj,


 If you don't run the HS, then once your jobs have finished you cannot retrieve
 status/counters from it, from the Java API or the Web UI. So I'd say, for any
 practical usage, you need it.


 thx


 On Mon, Jun 24, 2013 at 8:42 PM, Devaraj k devara...@huawei.com wrote:

 It is not mandatory to have a running HS in the cluster. The user can still
 submit the job without an HS in the cluster, and may expect the Job/App
 End Notification.

  

 Thanks

 Devaraj k

  

 *From:* Alejandro Abdelnur [mailto:t...@cloudera.com]
 *Sent:* 24 June 2013 21:42
 *To:* user@hadoop.apache.org
 *Cc:* user@hadoop.apache.org


 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)

  

 if we ought to do this in a yarn service it
 should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would
 be a good choice if we are concerned about the extra work this would cause
 in the RM. the problem with the current HS is that it is MR specific, we
 should generalize it for diff AM types. 

  

 thx


 Alejandro

 (phone typing)


 On Jun 23, 2013, at 23:28, Devaraj k devara...@huawei.com wrote:

  Even if we handle all the failure cases in AM for Job End Notification,
 we may miss cases like abrupt kill of AM when it is in last retry. If we
 choose NM to give the notification, again RM needs to identify which NM
 should give the end-notification as we don't have any direct protocol
 between AM and NM.

  

 I feel it would be better to move End-Notification responsibility to RM as
 Yarn Service because it ensures 100% notification and also useful for other
 types of applications as well. 

  

  

 Thanks

 Devaraj K

  

 *From:* Ravi Prakash [mailto:ravi...@ymail.com]
 *Sent:* 23 June 2013 19:01
 *To:* user@hadoop.apache.org
 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)

  

 Hi Alejandro,

 Thanks for your reply! I was thinking more along the lines Prashant
 suggested i.e. a failure during init() should still trigger an attempt to
 notify (by the AM). But now that you mention it, maybe we would be better
 off including this as a YARN feature after all (especially with all the new
 AMs being written). We could let the NM of the AM handle the notification
 burden, so that the RM doesn't get unduly taxed. Thoughts?

 Thanks
 Ravi

  

  
--

 *From:* Alejandro Abdelnur t...@cloudera.com
 *To:* common-u...@hadoop.apache.org user@hadoop.apache.org
 *Sent:* Saturday, June 22, 2013 7:37 PM
 *Subject:* Re: Job end notification does not always work (Hadoop 2.x)

  

 If the AM fails before doing the job end notification, at any stage of the
 execution for whatever reason, the job end notification will never be
 delivered. There is no way to fix this unless the notification is done by a
 Yarn service. The 2 'candidate' services for doing this would be the RM and
 the HS. The job notification URL is in the job conf. The RM never sees the
 job conf, which rules out the RM unless we add, at AM registration time,
 the possibility to specify a callback URL. The HS has access to the job
 conf, but the HS is currently a 'passive' service.


 thx

  

 On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy a...@hortonworks.com
 wrote:

 Prashanth, 

  

  Please file a jira.

  

  One thing to be aware of - AMs get restarted a certain number of times
 for fault-tolerance - which means we can't just assume that failure of a
 single AM is equivalent to failure of the job.

  

  Only the ResourceManager is in the appropriate position to judge failure
 of AM v/s failure-of-job.

  

 hth,

 Arun

  

 On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi prash1...@gmail.com
 wrote:



 

 Thanks Ravi.

 Well, in this case it's a no-effort :) A failure of AM init should be
 considered as failure of the job? I looked at the code and best-effort
 makes sense with respect to retry logic etc. You make a good point that
 there would be no notification in case AM OOMs, but I do feel AM init
 failure should send a notification by other means.

  

 On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash ravi...@ymail.com wrote:

 Hi Prashant,

 I would tend to agree with you. Although job-end

RE: Job end notification does not always work (Hadoop 2.x)

2013-06-24 Thread Devaraj k
Even if we handle all the failure cases in AM for Job End Notification, we may 
miss cases like abrupt kill of AM when it is in last retry. If we choose NM to 
give the notification, again RM needs to identify which NM should give the 
end-notification as we don't have any direct protocol between AM and NM.

I feel it would be better to move End-Notification responsibility to RM as Yarn 
Service because it ensures 100% notification and also useful for other types of 
applications as well.


Thanks
Devaraj K
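
A minimal sketch of the RM-side idea above, under stated assumptions: the class RmEndNotificationSketch and all of its methods are hypothetical and are not YARN APIs. It keeps a callback URL per application and sends the notification only once the application reaches a final state, so the failure of a single AM attempt is not reported as a job failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical RM-side end notifier; none of these names exist in YARN. */
class RmEndNotificationSketch {
    private final int maxAttempts;                       // assumed AM retry limit
    private final BiConsumer<String, String> sender;     // (callbackUrl, finalStatus)
    private final Map<String, String> callbackUrls = new HashMap<>();
    private final Map<String, Integer> failedAttempts = new HashMap<>();

    RmEndNotificationSketch(int maxAttempts, BiConsumer<String, String> sender) {
        this.maxAttempts = maxAttempts;
        this.sender = sender;
    }

    /** Records the callback URL for an application (for example at AM registration). */
    void registerCallback(String appId, String url) {
        callbackUrls.put(appId, url);
    }

    /** One AM attempt failed; only the last allowed attempt fails the application. */
    void attemptFailed(String appId) {
        int failures = failedAttempts.merge(appId, 1, Integer::sum);
        if (failures >= maxAttempts) {
            finish(appId, "FAILED");
        }
    }

    /** The application completed successfully. */
    void appSucceeded(String appId) {
        finish(appId, "SUCCEEDED");
    }

    /** Sends at most one notification per application, then forgets its state. */
    private void finish(String appId, String status) {
        String url = callbackUrls.remove(appId);
        failedAttempts.remove(appId);
        if (url != null) {
            sender.accept(url, status);
        }
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        RmEndNotificationSketch rm =
            new RmEndNotificationSketch(2, (url, status) -> sent.add(url + " " + status));
        rm.registerCallback("app_1", "http://example.invalid/notify");
        rm.attemptFailed("app_1");   // first attempt: no notification yet
        rm.attemptFailed("app_1");   // last attempt: application is FAILED
        System.out.println(sent);    // prints [http://example.invalid/notify FAILED]
    }
}
```

The sender is passed in as a function so that the actual HTTP call can be swapped out. The sketch assumes the URL has reached the RM before the final attempt fails.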

From: Ravi Prakash [mailto:ravi...@ymail.com]
Sent: 23 June 2013 19:01
To: user@hadoop.apache.org
Subject: Re: Job end notification does not always work (Hadoop 2.x)

Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested 
i.e. a failure during init() should still trigger an attempt to notify (by the 
AM). But now that you mention it, maybe we would be better off including this as
a YARN feature after all (especially with all the new AMs being written). We
could let the NM of the AM handle the notification burden, so that the RM 
doesn't get unduly taxed. Thoughts?

Thanks
Ravi



From: Alejandro Abdelnur t...@cloudera.com
To: common-u...@hadoop.apache.org user@hadoop.apache.org
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)

If the AM fails before doing the job end notification, at any stage of the 
execution for whatever reason, the job end notification will never be delivered.
There is no way to fix this unless the notification is done by a Yarn service.
The 2 'candidate' services for doing this would be the RM and the HS. The job
notification URL is in the job conf. The RM never sees the job conf, which rules
out the RM unless we add, at AM registration time, the possibility to
specify a callback URL. The HS has access to the job conf, but the HS is 
currently a 'passive' service.

thx

On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy a...@hortonworks.com wrote:
Prashanth,

 Please file a jira.

 One thing to be aware of - AMs get restarted a certain number of times for 
fault-tolerance - which means we can't just assume that failure of a single AM 
is equivalent to failure of the job.

 Only the ResourceManager is in the appropriate position to judge failure of AM 
v/s failure-of-job.

hth,
Arun

On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi prash1...@gmail.com wrote:


Thanks Ravi.

Well, in this case it's a no-effort :) A failure of AM init should be considered
as failure of the job? I looked at the code and best-effort makes sense with 
respect to retry logic etc. You make a good point that there would be no 
notification in case AM OOMs, but I do feel AM init failure should send a 
notification by other means.

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash ravi...@ymail.com wrote:
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a 
best-effort mechanism (i.e. we cannot always guarantee notification for 
example when the AM OOMs), I agree with you that we can do more. If you feel 
strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi



From: Prashant Kommireddi prash1...@gmail.com
To: user@hadoop.apache.org user@hadoop.apache.org
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,
I came across an issue that occurs with the job notification callbacks in MR2. 
It works fine if the Application master has started, but does not send a 
callback if the initializing of AM fails.
Here is the code from MRAppMaster.java

.
...
      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

  protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if (appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler which is 
responsible for sending a HTTP callback

Re: Job end notification does not always work (Hadoop 2.x)

2013-06-24 Thread Alejandro Abdelnur
if we ought to do this in a yarn service it 
should be the RM or the HS. the RM is, IMO, the natural fit. the HS, would be a 
good choice if we are concerned about the extra work this would cause in the 
RM. the problem with the current HS is that it is MR specific, we should 
generalize it for diff AM types. 

thx

Alejandro
(phone typing)

On Jun 23, 2013, at 23:28, Devaraj k devara...@huawei.com wrote:

 Even if we handle all the failure cases in AM for Job End Notification, we 
 may miss cases like abrupt kill of AM when it is in last retry. If we choose 
 NM to give the notification, again RM needs to identify which NM should give 
 the end-notification as we don't have any direct protocol between AM and NM.
  
 I feel it would be better to move End-Notification responsibility to RM as 
 Yarn Service because it ensures 100% notification and also useful for other 
 types of applications as well.
  
  
 Thanks
 Devaraj K
  
 From: Ravi Prakash [mailto:ravi...@ymail.com] 
 Sent: 23 June 2013 19:01
 To: user@hadoop.apache.org
 Subject: Re: Job end notification does not always work (Hadoop 2.x)
  
 Hi Alejandro,
 
 Thanks for your reply! I was thinking more along the lines Prashant suggested 
 i.e. a failure during init() should still trigger an attempt to notify (by 
 the AM). But now that you mention it, maybe we would be better off including
 this as a YARN feature after all (especially with all the new AMs being
 written). We could let the NM of the AM handle the notification burden, so 
 that the RM doesn't get unduly taxed. Thoughts?
 
 Thanks
 Ravi
  
  
 From: Alejandro Abdelnur t...@cloudera.com
 To: common-u...@hadoop.apache.org user@hadoop.apache.org 
 Sent: Saturday, June 22, 2013 7:37 PM
 Subject: Re: Job end notification does not always work (Hadoop 2.x)
  
 If the AM fails before doing the job end notification, at any stage of the 
 execution for whatever reason, the job end notification will never be 
 delivered. There is no way to fix this unless the notification is done by a
 Yarn service. The 2 'candidate' services for doing this would be the RM and
 the HS. The job notification URL is in the job conf. The RM never sees the
 job conf, which rules out the RM unless we add, at AM registration time,
 the possibility to specify a callback URL. The HS has access to the job conf, 
 but the HS is currently a 'passive' service.
 
 thx
  
 On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy a...@hortonworks.com wrote:
 Prashanth, 
  
  Please file a jira.
  
  One thing to be aware of - AMs get restarted a certain number of times for 
 fault-tolerance - which means we can't just assume that failure of a single 
 AM is equivalent to failure of the job.
  
  Only the ResourceManager is in the appropriate position to judge failure of 
 AM v/s failure-of-job.
  
 hth,
 Arun
  
 On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi prash1...@gmail.com wrote:
 
 
 Thanks Ravi.
 
 Well, in this case it's a no-effort :) A failure of AM init should be
 considered as failure of the job? I looked at the code and best-effort makes 
 sense with respect to retry logic etc. You make a good point that there would 
 be no notification in case AM OOMs, but I do feel AM init failure should send 
 a notification by other means.
 
  
 
 On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash ravi...@ymail.com wrote:
 Hi Prashant,
 
 I would tend to agree with you. Although job-end notification is only a 
 best-effort mechanism (i.e. we cannot always guarantee notification for 
 example when the AM OOMs), I agree with you that we can do more. If you feel 
 strongly about this, please create a JIRA and possibly upload a patch.
 
 Thanks
 Ravi
  
  
 From: Prashant Kommireddi prash1...@gmail.com
 To: user@hadoop.apache.org user@hadoop.apache.org 
 Sent: Thursday, June 20, 2013 9:45 PM
 Subject: Job end notification does not always work (Hadoop 2.x)
  
 Hello,
 
 I came across an issue that occurs with the job notification callbacks in 
 MR2. It works fine if the Application master has started, but does not send a 
 callback if the initializing of AM fails.
 
 Here is the code from MRAppMaster.java
 
 .
 ...
       // set job classloader if configured
       MRApps.setJobClassLoader(conf);
       initAndStartAppMaster(appMaster, conf, jobUserName);
     } catch (Throwable t) {
       LOG.fatal("Error starting MRAppMaster", t);
       System.exit(1);
     }
   }

   protected static void initAndStartAppMaster(final MRAppMaster appMaster,
       final YarnConfiguration conf, String jobUserName) throws IOException,
       InterruptedException {
     UserGroupInformation.setConfiguration(conf);
     UserGroupInformation appMasterUgi = UserGroupInformation
         .createRemoteUser(jobUserName);
     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
       @Override
       public Object run() throws Exception {
         appMaster.init(conf);
         appMaster.start();
         if(appMaster.errorHappenedShutDown

Re: Job end notification does not always work (Hadoop 2.x)

2013-06-23 Thread Ravi Prakash
Hi Alejandro,

Thanks for your reply! I was thinking more along the lines Prashant suggested 
i.e. a failure during init() should still trigger an attempt to notify (by the 
AM). But now that you mention it, maybe we would be better off including this as
a YARN feature after all (especially with all the new AMs being written). We
could let the NM of the AM handle the notification burden, so that the RM 
doesn't get unduly taxed. Thoughts?

Thanks
Ravi





 From: Alejandro Abdelnur t...@cloudera.com
To: common-u...@hadoop.apache.org user@hadoop.apache.org 
Sent: Saturday, June 22, 2013 7:37 PM
Subject: Re: Job end notification does not always work (Hadoop 2.x)
 


If the AM fails before doing the job end notification, at any stage of the 
execution for whatever reason, the job end notification will never be delivered.
There is no way to fix this unless the notification is done by a Yarn service.
The 2 'candidate' services for doing this would be the RM and the HS. The job
notification URL is in the job conf. The RM never sees the job conf, which rules
out the RM unless we add, at AM registration time, the possibility to
specify a callback URL. The HS has access to the job conf, but the HS is 
currently a 'passive' service.

thx


On Sat, Jun 22, 2013 at 3:48 PM, Arun C Murthy a...@hortonworks.com wrote:

Prashanth, 


 Please file a jira.


 One thing to be aware of - AMs get restarted a certain number of times for 
fault-tolerance - which means we can't just assume that failure of a single AM 
is equivalent to failure of the job.


 Only the ResourceManager is in the appropriate position to judge failure of 
AM v/s failure-of-job.


hth,
Arun


On Jun 22, 2013, at 2:44 PM, Prashant Kommireddi prash1...@gmail.com wrote:

Thanks Ravi.

Well, in this case it's a no-effort :) A failure of AM init should be
considered as failure of the job? I looked at the code and best-effort makes 
sense with respect to retry logic etc. You make a good point that there would 
be no notification in case AM OOMs, but I do feel AM init failure should send 
a notification by other means.





On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash ravi...@ymail.com wrote:

Hi Prashant,

I would tend to agree with you. Although job-end notification is only a 
best-effort mechanism (i.e. we cannot always guarantee notification for 
example when the AM OOMs), I agree with you that we can do more. If you feel 
strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi







 From: Prashant Kommireddi prash1...@gmail.com
To: user@hadoop.apache.org user@hadoop.apache.org 
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)
 


Hello,

I came across an issue that occurs with the job notification callbacks in 
MR2. It works fine if the Application master has started, but does not send 
a callback if the initializing of AM fails.

Here is the code from MRAppMaster.java

.
...

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

  protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if (appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler, which is 
responsible for sending an HTTP callback (via shutDownJob()). If there was an 
exception at this time, the process would simply terminate (via 
System.exit(1)).

appMaster.start() however rightly uses the JobFinishEventHandler and things 
work fine.

Shouldn't a failure on init(..) also send a callback suggesting the job 
failed?
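
A minimal sketch of what a fallback on startup failure might look like, assuming the notification URL is already known at that point. InitFailureNotifier, Startup, startOrNotify and notifyQuietly are hypothetical names and not part of MRAppMaster; the sketch also ignores AM retries, so a real fix would still need to decide whether a failed attempt means a failed job.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical sketch only: not part of MRAppMaster. */
final class InitFailureNotifier {

    /** A startup step that may throw, such as init() followed by start(). */
    interface Startup {
        void run() throws Exception;
    }

    /**
     * Runs the startup step. If it throws, invokes the failure callback
     * before returning false, instead of exiting silently.
     */
    static boolean startOrNotify(Startup startup, Runnable failureCallback) {
        try {
            startup.run();
            return true;
        } catch (Throwable t) {
            try {
                failureCallback.run();
            } catch (RuntimeException ignored) {
                // Best effort: a failing callback must not mask the original error.
            }
            return false;
        }
    }

    /** Best-effort HTTP GET. Returns true on HTTP 200 and never throws. */
    static boolean notifyQuietly(String url, int timeoutMillis) {
        if (url == null || url.isEmpty()) {
            return false;
        }
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try {
                return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                conn.disconnect();
            }
        } catch (IOException | RuntimeException e) {
            // Malformed URL, non-HTTP scheme, or network error.
            return false;
        }
    }
}
```

The catch block in the code above would then call something like notifyQuietly(...) before System.exit(1). This still cannot cover an abrupt kill or an OOM of the process.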

Thanks,

Prashant







--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

 



-- 
Alejandro 

Re: Job end notification does not always work (Hadoop 2.x)

2013-06-22 Thread Prashant Kommireddi
Following-up on this. Please let me know if this is expected/bug and if you
would like me to file a JIRA


On Thu, Jun 20, 2013 at 9:45 PM, Prashant Kommireddi prash1...@gmail.com wrote:

 Hello,

 I came across an issue that occurs with the job notification callbacks in
 MR2. It works fine if the Application master has started, but does not send
 a callback if the initializing of AM fails.

 Here is the code from MRAppMaster.java

 .
 ...

       // set job classloader if configured
       MRApps.setJobClassLoader(conf);
       initAndStartAppMaster(appMaster, conf, jobUserName);
     } catch (Throwable t) {
       LOG.fatal("Error starting MRAppMaster", t);
       System.exit(1);
     }
   }

   protected static void initAndStartAppMaster(final MRAppMaster appMaster,
       final YarnConfiguration conf, String jobUserName) throws IOException,
       InterruptedException {
     UserGroupInformation.setConfiguration(conf);
     UserGroupInformation appMasterUgi = UserGroupInformation
         .createRemoteUser(jobUserName);
     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
       @Override
       public Object run() throws Exception {
         appMaster.init(conf);
         appMaster.start();
         if (appMaster.errorHappenedShutDown) {
           throw new IOException("Was asked to shut down.");
         }
         return null;
       }
     });
   }

 appMaster.init(conf) does not dispatch JobFinishEventHandler, which is
 responsible for sending an HTTP callback (via shutDownJob()). If there was
 an exception at this time, the process would simply terminate (via
 System.exit(1)).

 appMaster.start() however rightly uses the JobFinishEventHandler and
 things work fine.

 Shouldn't a failure on init(..) also send a callback suggesting the job
 failed?

 Thanks,
 Prashant




Re: Job end notification does not always work (Hadoop 2.x)

2013-06-22 Thread Ravi Prakash
Hi Prashant,

I would tend to agree with you. Although job-end notification is only a 
best-effort mechanism (i.e. we cannot always guarantee notification for 
example when the AM OOMs), I agree with you that we can do more. If you feel 
strongly about this, please create a JIRA and possibly upload a patch.

Thanks
Ravi





 From: Prashant Kommireddi prash1...@gmail.com
To: user@hadoop.apache.org user@hadoop.apache.org 
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)
 


Hello,

I came across an issue that occurs with the job notification callbacks in MR2. 
It works fine if the Application master has started, but does not send a 
callback if the initializing of AM fails.

Here is the code from MRAppMaster.java

.
...

      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

  protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if (appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) does not dispatch JobFinishEventHandler, which is 
responsible for sending an HTTP callback (via shutDownJob()). If there was an 
exception at this time, the process would simply terminate (via System.exit(1)).

appMaster.start() however rightly uses the JobFinishEventHandler and things 
work fine.

Shouldn't a failure on init(..) also send a callback suggesting the job failed?

Thanks,

Prashant

Re: Job end notification does not always work (Hadoop 2.x)

2013-06-22 Thread Prashant Kommireddi
Thanks Ravi.

Well, in this case it's a no-effort :) A failure of AM init should be
considered as failure of the job? I looked at the code and best-effort
makes sense with respect to retry logic etc. You make a good point that
there would be no notification in case AM OOMs, but I do feel AM init
failure should send a notification by other means.



On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash ravi...@ymail.com wrote:

 Hi Prashant,

 I would tend to agree with you. Although job-end notification is only a
 best-effort mechanism (i.e. we cannot always guarantee notification for
 example when the AM OOMs), I agree with you that we can do more. If you
 feel strongly about this, please create a JIRA and possibly upload a patch.

 Thanks
 Ravi


   --
  *From:* Prashant Kommireddi prash1...@gmail.com
 *To:* user@hadoop.apache.org user@hadoop.apache.org
 *Sent:* Thursday, June 20, 2013 9:45 PM
 *Subject:* Job end notification does not always work (Hadoop 2.x)

 Hello,

 I came across an issue that occurs with the job notification callbacks in
 MR2. It works fine if the Application master has started, but does not send
 a callback if the initializing of AM fails.

 Here is the code from MRAppMaster.java

 .
 ...

       // set job classloader if configured
       MRApps.setJobClassLoader(conf);
       initAndStartAppMaster(appMaster, conf, jobUserName);
     } catch (Throwable t) {
       LOG.fatal("Error starting MRAppMaster", t);
       System.exit(1);
     }
   }

   protected static void initAndStartAppMaster(final MRAppMaster appMaster,
       final YarnConfiguration conf, String jobUserName) throws IOException,
       InterruptedException {
     UserGroupInformation.setConfiguration(conf);
     UserGroupInformation appMasterUgi = UserGroupInformation
         .createRemoteUser(jobUserName);
     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
       @Override
       public Object run() throws Exception {
         appMaster.init(conf);
         appMaster.start();
         if (appMaster.errorHappenedShutDown) {
           throw new IOException("Was asked to shut down.");
         }
         return null;
       }
     });
   }

 appMaster.init(conf) does not dispatch JobFinishEventHandler, which is
 responsible for sending an HTTP callback (via shutDownJob()). If there was
 an exception at this time, the process would simply terminate (via
 System.exit(1)).

 appMaster.start() however rightly uses the JobFinishEventHandler and
 things work fine.

 Shouldn't a failure on init(..) also send a callback suggesting the job
 failed?

 Thanks,
 Prashant