[ https://issues.apache.org/jira/browse/OOZIE-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Purshotam Shah updated OOZIE-1864:
----------------------------------
    Description:
Current child job ID aggregation logic

Once the launcher job finishes submitting its child job (or jobs, in the case of Pig), it writes the child job ID to a file.

On the Oozie server side, we collect child IDs in two ways:
1. As soon as we submit the launcher job, we check whether it has terminated. If it has, we read the child IDs from the file and populate the DB. Once a kill command is issued, we kill all child jobs.
2. A timer task (ActionCheckerService) keeps checking the status of all running actions; if a launcher job has terminated, it updates the DB with the child IDs.

A job-end notification is rejected if the action is not running.

Assume that the launcher is killed after it has submitted a child job. The child job will never be killed.

To fix this, we should do the following:
1. If Oozie receives a job-end notification and the launcher job was killed, collect all child jobs and kill any that have not already been killed.
2. Have better logic for collecting child job IDs. The launcher job can call callbackServlet (maybe periodically) to update the child job IDs. This could be useful for Pig jobs. Currently, we report child jobs only when the launcher job completes.

> Improve child job id aggregation logic
> --------------------------------------
>
>                 Key: OOZIE-1864
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1864
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Purshotam Shah
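
To make the two proposed fixes in the description concrete, here is a rough sketch in Python. It is not Oozie code: `FakeCluster`, `Action`, `report_child_ids` and `on_job_end_notification` are hypothetical names invented for illustration, and the real server would talk to Hadoop and the Oozie DB instead of an in-memory dict. It assumes that a child which already finished successfully does not need to be killed.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    KILLED = "KILLED"


@dataclass
class FakeCluster:
    """Hypothetical stand-in for the Hadoop cluster: maps job ID to state."""
    jobs: dict = field(default_factory=dict)

    def state(self, job_id):
        return self.jobs[job_id]

    def kill(self, job_id):
        self.jobs[job_id] = JobState.KILLED


@dataclass
class Action:
    """Hypothetical record of one workflow action as stored in the DB."""
    launcher_id: str
    child_ids: set = field(default_factory=set)


def report_child_ids(action, new_child_ids):
    """Proposed fix 2: the launcher reports child IDs while it is still
    running (for example through a callback), so the server knows about
    them even if the launcher never completes normally."""
    action.child_ids.update(new_child_ids)


def on_job_end_notification(action, cluster):
    """Proposed fix 1: when the notification arrives and the launcher was
    killed, kill every known child that is still running.
    Returns the IDs of the children killed by this call."""
    if cluster.state(action.launcher_id) is not JobState.KILLED:
        return []
    killed = []
    for child_id in sorted(action.child_ids):
        # Assumption: only running children need killing.
        if cluster.state(child_id) is JobState.RUNNING:
            cluster.kill(child_id)
            killed.append(child_id)
    return killed


if __name__ == "__main__":
    cluster = FakeCluster({
        "launcher_1": JobState.KILLED,
        "child_a": JobState.RUNNING,
    })
    action = Action("launcher_1")
    report_child_ids(action, ["child_a"])  # reported before the launcher died
    print(on_job_end_notification(action, cluster))
```

The point of the sketch is that fix 1 only helps if the server already knows the child IDs when the launcher dies, which is why incremental reporting (fix 2) matters for launchers such as Pig that submit several jobs over time.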