[jira] [Comment Edited] (OOZIE-3717) When fork actions are submitted in parallel, ForkedActionStartXCommand can be lost because it has the same name as ActionStartXCommand, causing a deadlock

[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751270#comment-17751270 ]

chenhaodan edited comment on OOZIE-3717 at 8/7/23 8:20 AM:
---

[~dionusos] I am sorry for that. I had fixed in [^OOZIE-3717-003.patch] Thanks for your time.

was (Author: chenhd):
[~dionusos] I am sorry for that. I had fixed them in [^OOZIE-3717-003.patch] Thanks for your time.

> When fork actions are submitted in parallel, ForkedActionStartXCommand can be lost because it has the same name as ActionStartXCommand, causing a deadlock
> --
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
> Issue Type: Bug
> Components: action
> Affects Versions: 5.2.1
> Reporter: chenhaodan
> Assignee: chenhaodan
> Priority: Major
> Attachments: OOZIE-3717-001.patch, OOZIE-3717-002.patch, OOZIE-3717-003.patch
>
>
> When fork actions are submitted in parallel, a ForkedActionStartXCommand is queued for each forked action. At the same time, RecoveryService checks pending actions and may queue an ActionStartXCommand for the same action. If the ForkedActionStartXCommand is queued while an ActionStartXCommand for that action is already in the queue, the ForkedActionStartXCommand is dropped. The thread that submitted the fork actions then blocks in CallableQueueService.blockingWait(), waiting for a ForkedActionStartXCommand that will never run, which causes a deadlock.
> {code:java}
> Thread 1                            Thread 2
> (ForkedActionStartXCommand)         (ActionStartXCommand)
> +----------------------------+      +----------------------------+
> | removeFromUniqueCallables  |      | ...                        |
> +----------------------------+      +----------------------------+
> | ...                        |      | queue                      |  enqueue succeeded, now in uniqueCallables
> +----------------------------+      +----------------------------+
> | queue                      |
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> {code}
> The two threads interleave in the following order:
> 1. Thread 1 calls removeFromUniqueCallables(), removing the ForkedActionStartXCommand's entry from uniqueCallables;
> 2. Thread 2 queues an ActionStartXCommand, which is added to the queue and to uniqueCallables;
> 3. Thread 1 calls queue() to add the ForkedActionStartXCommand back into the queue, but filterDuplicates() finds an XCommand with the same name in uniqueCallables, so the ForkedActionStartXCommand is not added.
>
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name, and Thread 2 enqueues its ActionStartXCommand before Thread 1 requeues, the ForkedActionStartXCommand is lost (it never executes), and the thread that submitted the fork actions in parallel blocks in CallableQueueService.blockingWait().
>
> *CallableWrapper's code*
> {code:java}
> public class CallableWrapper extends PriorityDelayQueue.QueueElement implements Runnable, Callable {
>     private Instrumentation.Cron cron;
>
>     public void run() {
>         XCallable callable = null;
>         try {
>             // Removes this callable's name from uniqueCallables. Until queue(this, true) below,
>             // another thread can enqueue a callable with the same name.
>             removeFromUniqueCallables();
>             if (Services.get().getSystemMode() == SYSTEM_MODE.SAFEMODE) {
>                 log.info("Oozie is in SAFEMODE, requeuing callable [{0}] with [{1}]ms delay",
>                         getElement().getType(), SAFE_MODE_DELAY);
>                 setDelay(SAFE_MODE_DELAY, TimeUnit.MILLISECONDS);
>                 queue(this, true);
>                 return;
>             }
>             callable = getElement();
>             if (callableBegin(callable)) {
>                 cron.stop();
>                 addInQueueCron(cron);
>                 XLog log = XLog.getLog(getClass());
>                 log.trace("executing callable [{0}]", callable.getName());
>                 try {
>                     // FutureTask.run() will invoke callable.call()
>                     super.run();
>                     incrCounter(INSTR_EXECUTED_COUNTER, 1);
>                     log.trace("executed callable [{0}]", callable.getName());
>                 }
>                 catch (Exception ex) {
>                     incrCounter(INSTR_FAILED_COUNTER, 1);
>                     log.warn("exception callable [{0}], {1}", callable.getName(), ex.getMessage(), ex);
>                 }
>             }
>             else {
>                 log.warn("max concurrency for callable [{0}] exceeded, requeueing with [{1}]ms delay",
>                         callable.getType(), CONCURRENCY_DELAY);
>                 setDelay(CONCURRENCY_DELAY, TimeUnit.MILLISECONDS);
>                 // Requeue: per the description above, this is filtered as a duplicate
>                 // if a same-named callable was queued in the meantime.
>                 queue(this, true);
>                 incrCounter(callable.getType() + "#exceeded.concurrency", 1);
>             }
>         }
>         catch (Throwable t) {
>             incrCounter(INSTR_FAILED_COUNTER, 1);
>             log.warn("exception callable [{0}], {1}", callable == null ? "N/A" : callable.getName(),
>                     t.getMessage(), t);
>         }
>         finally {
>             if (callable != null) {
>                 callableEnd(callable);
>             }
>         }
>     }
> }
> {code}
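The three-step interleaving can be replayed deterministically in a toy model. The Java sketch below is a minimal illustration under stated assumptions, not Oozie code: `DedupQueue`, `Task`, `Result`, and `replay()` are hypothetical names; the queue deduplicates purely by a shared name (modelled loosely on `filterDuplicates()` and `removeFromUniqueCallables()`); the steps run in order on a single thread; and a `CountDownLatch` stands in for the wait in `CallableQueueService.blockingWait()`. The latch wait gets a 100 ms timeout only so the demo terminates; in the scenario above the waiting thread simply blocks.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Toy model of the lost-command race (hypothetical; NOT Oozie's CallableQueueService).
class LostForkedCommandDemo {

    // A queued command; "name" is the dedup key shared by the forked and recovery commands.
    static final class Task {
        final String name;
        final boolean forked;

        Task(String name, boolean forked) {
            this.name = name;
            this.forked = forked;
        }
    }

    // Hypothetical queue that rejects a task whose name is already tracked.
    static final class DedupQueue {
        private final Queue<Task> items = new ArrayDeque<>();
        private final Set<String> uniqueNames = new HashSet<>();

        synchronized boolean queue(Task t) {
            if (!uniqueNames.add(t.name)) {
                return false; // duplicate name: the task is silently dropped
            }
            items.add(t);
            return true;
        }

        synchronized Task poll() {
            return items.poll();
        }

        // Counterpart of removeFromUniqueCallables().
        synchronized void removeFromUnique(Task t) {
            uniqueNames.remove(t.name);
        }
    }

    static final class Result {
        final boolean recoveryQueued, forkedRequeued, forkedFinished;

        Result(boolean recoveryQueued, boolean forkedRequeued, boolean forkedFinished) {
            this.recoveryQueued = recoveryQueued;
            this.forkedRequeued = forkedRequeued;
            this.forkedFinished = forkedFinished;
        }
    }

    static Result replay() {
        DedupQueue q = new DedupQueue();
        CountDownLatch forkedDone = new CountDownLatch(1); // what the fork-submitting thread waits on

        q.queue(new Task("action-1", true));     // fork submission queues the forked command
        Task running = q.poll();                 // Thread 1 picks it up
        q.removeFromUnique(running);             // step 1: removeFromUniqueCallables()
        boolean recoveryQueued = q.queue(new Task("action-1", false)); // step 2: Thread 2 queues
        boolean forkedRequeued = q.queue(running); // step 3: Thread 1 requeues, filtered as duplicate

        // Drain the queue: only tasks actually in it ever run.
        for (Task t = q.poll(); t != null; t = q.poll()) {
            q.removeFromUnique(t);
            if (t.forked) {
                forkedDone.countDown();
            }
        }

        boolean finished;
        try {
            finished = forkedDone.await(100, TimeUnit.MILLISECONDS); // timeout only so the demo ends
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        }
        return new Result(recoveryQueued, forkedRequeued, finished);
    }

    public static void main(String[] args) {
        Result r = replay();
        System.out.println("recovery queued: " + r.recoveryQueued);
        System.out.println("forked requeued: " + r.forkedRequeued);
        System.out.println("forked finished: " + r.forkedFinished);
    }
}
```

Running `main` prints `recovery queued: true`, `forked requeued: false`, and `forked finished: false`, each on its own line: the same-named command claims the name in the window between steps 1 and 3, so the forked command is filtered out and never runs.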
[jira] [Comment Edited] (OOZIE-3717) When fork actions are submitted in parallel, ForkedActionStartXCommand can be lost because it has the same name as ActionStartXCommand, causing a deadlock

[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751270#comment-17751270 ]

chenhaodan edited comment on OOZIE-3717 at 8/7/23 8:20 AM:
---

[~dionusos] I am sorry for that. I had fixed them in [^OOZIE-3717-003.patch] Thanks for your time.

was (Author: chenhd):
[~dionusos] I am sorry for that. I has fixed them in [^OOZIE-3717-003.patch] Thanks for your time.
[jira] [Commented] (OOZIE-3717) When fork actions are submitted in parallel, ForkedActionStartXCommand can be lost because it has the same name as ActionStartXCommand, causing a deadlock

[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751270#comment-17751270 ]

chenhaodan commented on OOZIE-3717:
---

[~dionusos] I am sorry for that. I has fixed them in [^OOZIE-3717-003.patch] Thanks for your time.
[jira] [Updated] (OOZIE-3717) When fork actions are submitted in parallel, ForkedActionStartXCommand can be lost because it has the same name as ActionStartXCommand, causing a deadlock

[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3717:
--
Attachment: OOZIE-3717-003.patch
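The contents of OOZIE-3717-003.patch are not reproduced in this thread, so the following is not taken from it. One hypothetical way to close the window between removeFromUniqueCallables() and the requeue is for a worker that must requeue a command to keep its name reserved, releasing the name only when the command actually executes. The Java sketch below models that idea with a toy name-deduplicating queue; `DedupQueue`, `requeueKeepingName()`, `release()`, and `replay()` are hypothetical and are not Oozie APIs.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical mitigation in a toy model; NOT Oozie code and NOT the OOZIE-3717 patch.
class ReservedSlotDemo {

    static final class Task {
        final String name;
        final boolean forked;

        Task(String name, boolean forked) {
            this.name = name;
            this.forked = forked;
        }
    }

    static final class DedupQueue {
        private final Queue<Task> items = new ArrayDeque<>();
        private final Set<String> uniqueNames = new HashSet<>();

        // New submissions are still filtered by name.
        synchronized boolean queue(Task t) {
            if (!uniqueNames.add(t.name)) {
                return false;
            }
            items.add(t);
            return true;
        }

        // Requeue a task that already owns its name: the name is never released,
        // so no other task with the same name can be accepted in between.
        synchronized void requeueKeepingName(Task t) {
            items.add(t);
        }

        synchronized Task poll() {
            return items.poll();
        }

        // Release the name only when the task is really about to execute.
        synchronized void release(Task t) {
            uniqueNames.remove(t.name);
        }
    }

    // Returns {recoveryQueued, forkedRan}.
    static boolean[] replay() {
        DedupQueue q = new DedupQueue();
        q.queue(new Task("action-1", true));   // fork submission queues the forked command
        Task running = q.poll();               // Thread 1 picks it up
        q.requeueKeepingName(running);         // e.g. concurrency limit hit: requeue, name stays reserved
        boolean recoveryQueued = q.queue(new Task("action-1", false)); // Thread 2's duplicate is filtered

        boolean forkedRan = false;
        for (Task t = q.poll(); t != null; t = q.poll()) {
            q.release(t);                      // about to execute
            if (t.forked) {
                forkedRan = true;
            }
        }
        return new boolean[] {recoveryQueued, forkedRan};
    }

    public static void main(String[] args) {
        boolean[] r = replay();
        System.out.println("recovery queued: " + r[0] + ", forked ran: " + r[1]);
    }
}
```

In this model the same-named command from the second thread is the one that gets filtered, while the forked command still runs, so `main` prints `recovery queued: false, forked ran: true`. Whether discarding the RecoveryService duplicate is safe in Oozie (for example, whether recovery re-checks the action on a later pass) is an assumption this sketch does not verify.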
[jira] [Updated] (OOZIE-3717) When fork actions are submitted in parallel, ForkedActionStartXCommand can be lost because it has the same name as ActionStartXCommand, causing a deadlock

[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated the description of OOZIE-3717.
[jira] [Updated] (OOZIE-3717) When fork actions are submitted in parallel, ForkedActionStartXCommand can be lost because it has the same name as ActionStartXCommand, causing a deadlock

[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3717:
--
Attachment: OOZIE-3717-002.patch

> Fix For: trunk
> Attachments: OOZIE-3717-001.patch, OOZIE-3717-002.patch
[jira] [Updated] (OOZIE-3717) When fork actions are submitted in parallel, ForkedActionStartXCommand can be lost because it has the same name as ActionStartXCommand, causing a deadlock

[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated the description of OOZIE-3717.
[jira] [Updated] (OOZIE-3717) When fork actions are submitted in parallel, ForkedActionStartXCommand can be lost because it has the same name as ActionStartXCommand, causing a deadlock

[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated the description of OOZIE-3717.
[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead
[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3717: -- Description: Fork actions parallel submit, so will add ForkedActionStartXCommand and RecoveryService will check pending action may add ActionStartXCommand, if ForkedActionStartXCommand enqueue and there is a ActionStartXCommand(the same action) in queue, it would be lose. The thread parallel submit actions block at CallableQueueService.blockingWait() wait for ForkedActionStartXCommand to finish, but ForkedActionStartXCommand had lost and cause deadlock. {code:java} Thread 1 Thread 2 (ForkedActionStartXCommand) (ActionStartXCommand) ++ +-+ | removeFromUniqueCallables | | . | ++ +-+ | .. | | queue | ++ +-+ | queue| enqueue successed, in uniqueCallables ++ | wrapper.filterDuplicates() | ++ Thread 1 and Thread 2 execute CallableWrapper's execute function order : 1. Thread 1 execute removeFromUniqueCallables; 2. Thread 2 execute queue add ActionStartXCommand into queue and add to uniqueCallables; 3. Thread 1 execute queue add ForkedActionStartXCommand into queue, but filterDuplicates() function found a same name XCommand in uniqueCallables, so skip add to queue; Becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, Thread 2 add ActionStartXCommand enqueue before Thread 1, so ForkedActionStartXCommand would be lost(never execute), and the thread that fork actions parallel submit block at CallableQueueService.blockingWait(). 
{code}
[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead
[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3717: --
Description:
Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued for each action, while RecoveryService, when it checks pending actions, may queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the same action is already in the queue, the ForkedActionStartXCommand is lost. The thread that submits the actions in parallel blocks in CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand to finish; since that command was lost, it never finishes, which causes a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +-------+
| removeFromUniqueCallables  |    | .     |
+----------------------------+    +-------+
| ..                         |    | queue |
+----------------------------+    +-------+
| queue                      |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name and Thread 2 enqueues the ActionStartXCommand before Thread 1, the ForkedActionStartXCommand is lost, and the thread that submitted the fork actions in parallel blocks at CallableQueueService.blockingWait().
{code}
was:
Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued for each action, while RecoveryService, when it checks pending actions, may queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the same action is already in the queue, the ForkedActionStartXCommand is lost. The thread that submits the actions in parallel blocks in CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand to finish; since that command was lost, it never finishes, which causes a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +-------+
| removeFromUniqueCallables  |    | .     |
+----------------------------+    +-------+
| ..                         |    | queue |
+----------------------------+    +-------+
| queue                      |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, the ForkedActionStartXCommand is lost, and the submitting thread blocks at CallableQueueService.blockingWait().
{code}

> When fork actions parallel submit, becasue ForkedActionStartXCommand and
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be
> lost, and cause deadlock
> --
>
>                 Key: OOZIE-3717
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3717
>             Project: Oozie
>          Issue Type: Bug
>          Components: action
>    Affects Versions: 5.2.1
>            Reporter: chenhaodan
>            Assignee: chenhaodan
>            Priority: Major
>             Fix For: trunk
>
>         Attachments: OOZIE-3717-001.patch
>
>
> Fork actions are submitted in parallel, so a ForkedActionStartXCommand is
> queued for each action, while RecoveryService, when it checks pending
> actions, may queue an ActionStartXCommand for the same action. If a
> ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the
> same action is already in the queue, the ForkedActionStartXCommand is lost.
> The thread that submits the actions in parallel blocks in
> CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand
> to finish; since that command was lost, it never finishes, which causes a
> deadlock.
> {code:java}
> Thread 1                          Thread 2
> (ForkedActionStartXCommand)       (ActionStartXCommand)
> +----------------------------+    +-------+
> | removeFromUniqueCallables  |    | .     |
> +----------------------------+    +-------+
> | ..                         |    | queue |
> +----------------------------+    +-------+
> | queue                      |    enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name
> and Thread 2 enqueues the ActionStartXCommand before Thread 1, the
> ForkedActionStartXCommand is lost, and the thread that submitted the fork
> actions in parallel blocks at CallableQueueService.blockingWait(). {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
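The race described in this issue can be replayed deterministically on a toy model. The sketch below is not Oozie's CallableQueueService: `Cmd`, `enqueue()` and `replay()` are hypothetical names, and it assumes, as the issue reports, that the ForkedActionStartXCommand and the ActionStartXCommand for one action share a single de-duplication key. It runs the three steps on one thread so the outcome is reproducible.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy model of a callable queue that de-duplicates by key (NOT Oozie's CallableQueueService). */
public class UniqueQueueRace {

    /** Hypothetical stand-in for an XCallable: a command type plus its de-duplication key. */
    static final class Cmd {
        final String type;
        final String key;

        Cmd(String type, String key) {
            this.type = type;
            this.key = key;
        }

        @Override
        public String toString() {
            return type + "(" + key + ")";
        }
    }

    private final ConcurrentMap<String, Boolean> uniqueCallables = new ConcurrentHashMap<>();
    private final Queue<Cmd> queue = new ArrayDeque<>();

    /** Same idea as filterDuplicates(): refuse a command whose key is already registered. */
    boolean enqueue(Cmd cmd) {
        if (uniqueCallables.putIfAbsent(cmd.key, Boolean.TRUE) != null) {
            return false; // dropped as a "duplicate"; only the return value says so
        }
        queue.add(cmd);
        return true;
    }

    /** Replays the three steps from the issue on one thread and returns what is left queued. */
    Queue<Cmd> replay(Cmd forked, Cmd recovery) {
        enqueue(forked);                     // fork submission; the submitter now waits for it
        Cmd running = queue.poll();          // a worker picks it up and enters run()
        uniqueCallables.remove(running.key); // 1. removeFromUniqueCallables()
        enqueue(recovery);                   // 2. RecoveryService queues ActionStartXCommand
        enqueue(running);                    // 3. run() requeues it (e.g. max concurrency): filtered
        return queue;
    }

    public static void main(String[] args) {
        String key = "0000001-W@A"; // hypothetical key, shared by both commands as the issue reports
        Queue<Cmd> left = new UniqueQueueRace().replay(
                new Cmd("ForkedActionStartXCommand", key),
                new Cmd("ActionStartXCommand", key));
        System.out.println(left); // prints [ActionStartXCommand(0000001-W@A)]
    }
}
```

Only the ActionStartXCommand survives. In the CallableWrapper code quoted in the issue, the return value of `queue(this, true)` is likewise not checked, so nothing re-submits the forked command.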
[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead
[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3717: -- Summary: When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock (was: Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock)
[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock
[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3717: -- Attachment: (was: OOZIE-3717-001.patch)
[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock
[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3717: -- Attachment: OOZIE-3717-001.patch
[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock
[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3717: -- Attachment: (was: OOZIE-3717-001.patch)
[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock
[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3717: --
Description:
Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued for each action, while RecoveryService, when it checks pending actions, may queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the same action is already in the queue, the ForkedActionStartXCommand is lost. The thread that submits the actions in parallel blocks in CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand to finish; since that command was lost, it never finishes, which causes a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +-------+
| removeFromUniqueCallables  |    | .     |
+----------------------------+    +-------+
| ..                         |    | queue |
+----------------------------+    +-------+
| queue                      |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, the ForkedActionStartXCommand is lost, and the submitting thread blocks at CallableQueueService.blockingWait().
{code}
was:
Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued for each action, while RecoveryService, when it checks pending actions, may queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the same action is already in the queue, the ForkedActionStartXCommand is lost. The thread that submits the actions in parallel blocks in CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand to finish; since that command was lost, it never finishes, which causes a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +-------+
| removeFromUniqueCallables  |    | .     |
+----------------------------+    +-------+
| ..                         |    | queue |
+----------------------------+    +-------+
| queue                      |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, the ForkedActionStartXCommand is lost, and the submitting thread blocks at CallableQueueService.blockingWait().
{code}
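The issue states that the submitting thread blocks in CallableQueueService.blockingWait(). Assuming that wait ultimately depends on each submitted task completing (an assumption about Oozie's internals, not something the issue shows), a task that was silently dropped leaves the waiter hanging. This JDK-only sketch shows the underlying effect with `java.util.concurrent.FutureTask`: a task that is never run never completes its Future, so only a timed wait returns. `waitTimesOut()` is a hypothetical helper.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Shows why a silently dropped task hangs whoever waits on it: its Future never completes. */
public class LostFutureDemo {

    /** Returns true if waiting on a task that is never run times out. */
    static boolean waitTimesOut(long millis) {
        // The task was handed to a queue that dropped it, so run() is never called on it.
        FutureTask<String> dropped = new FutureTask<>(() -> "started");
        try {
            dropped.get(millis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // an untimed get() would block here forever
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(waitTimesOut(100)); // prints true
    }
}
```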
[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock
[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3717: --
Description:
Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued for each action, while RecoveryService, when it checks pending actions, may queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the same action is already in the queue, the ForkedActionStartXCommand is lost. The thread that submits the actions in parallel blocks in CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand to finish; since that command was lost, it never finishes, which causes a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +-------+
| removeFromUniqueCallables  |    | .     |
+----------------------------+    +-------+
| ..                         |    | queue |
+----------------------------+    +-------+
| queue                      |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, the ForkedActionStartXCommand is lost, and the submitting thread blocks at CallableQueueService.blockingWait().
{code}
was:
Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued for each action, while RecoveryService, when it checks pending actions, may queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the same action is already in the queue, the ForkedActionStartXCommand is lost. The thread that submits the actions in parallel blocks in CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand to finish; since that command was lost, it never finishes, which causes a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +-------+
| removeFromUniqueCallables  |    | .     |
+----------------------------+    +-------+
| ..                         |    | queue |
+----------------------------+    +-------+
| queue                      |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, the ForkedActionStartXCommand is lost, and the submitting thread blocks at CallableQueueService.blockingWait().
{code}
[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock
[ https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3717: --
Description:
Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued for each action, while RecoveryService, when it checks pending actions, may queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the same action is already in the queue, the ForkedActionStartXCommand is lost. The thread that submits the actions in parallel blocks in CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand to finish; since that command was lost, it never finishes, which causes a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +-------+
| removeFromUniqueCallables  |    | .     |
+----------------------------+    +-------+
| ..                         |    | queue |
+----------------------------+    +-------+
| queue                      |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, the ForkedActionStartXCommand is lost, and the submitting thread blocks at CallableQueueService.blockingWait().
{code}
was:
Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued for each action, while RecoveryService, when it checks pending actions, may queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the same action is already in the queue, the ForkedActionStartXCommand is lost. The thread that submits the actions in parallel blocks in CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand to finish; since that command was lost, it never finishes, which causes a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +-------+
| removeFromUniqueCallables  |    | .     |
+----------------------------+    +-------+
| ..                         |    | queue |
+----------------------------+    +-------+
| queue                      |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, the ForkedActionStartXCommand is lost, and CallableQueueService blocks at CallableQueueService.blockingWait().
{code}
[jira] [Created] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock
chenhaodan created OOZIE-3717:
-
             Summary: Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock
                 Key: OOZIE-3717
                 URL: https://issues.apache.org/jira/browse/OOZIE-3717
             Project: Oozie
          Issue Type: Bug
          Components: action
    Affects Versions: 5.2.1
            Reporter: chenhaodan
            Assignee: chenhaodan
             Fix For: trunk


Fork actions are submitted in parallel, so a ForkedActionStartXCommand is queued for each action, while RecoveryService, when it checks pending actions, may queue an ActionStartXCommand for the same action. If a ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the same action is already in the queue, the ForkedActionStartXCommand is lost. The thread that submits the actions in parallel blocks in CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand to finish; since that command was lost, it never finishes, which causes a deadlock.
{code:java}
Thread 1                          Thread 2
(ForkedActionStartXCommand)       (ActionStartXCommand)
+----------------------------+    +-------+
| removeFromUniqueCallables  |    | .     |
+----------------------------+    +-------+
| ..                         |    | queue |
+----------------------------+    +-------+
| queue                      |    enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, the ForkedActionStartXCommand is lost, and CallableQueueService blocks at CallableQueueService.blockingWait().
{code}
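One hypothetical direction for avoiding the collision, not necessarily what the attached OOZIE-3717 patches do, is to make the command type part of the de-duplication key. The sketch applies that idea to a toy queue; `dedupKey()`, the `type#id` key format and the class name are assumptions for illustration, not Oozie APIs.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical mitigation on a toy de-duplicating queue (NOT Oozie code). */
public class TypedKeyQueue {

    private final ConcurrentMap<String, Boolean> uniqueCallables = new ConcurrentHashMap<>();
    private final Queue<String> queue = new ArrayDeque<>();

    /** De-duplication key = command type + entity key, so different command types never collide. */
    static String dedupKey(String type, String entityKey) {
        return type + "#" + entityKey;
    }

    boolean enqueue(String type, String entityKey) {
        String key = dedupKey(type, entityKey);
        if (uniqueCallables.putIfAbsent(key, Boolean.TRUE) != null) {
            return false; // a genuine duplicate of the same command type
        }
        queue.add(key);
        return true;
    }

    void removeFromUniqueCallables(String type, String entityKey) {
        uniqueCallables.remove(dedupKey(type, entityKey));
    }

    /** Same interleaving as in the issue; returns whether the forked command was requeued. */
    boolean replay(String actionId) {
        enqueue("ForkedActionStartXCommand", actionId);
        queue.poll();                                                     // worker picks it up
        removeFromUniqueCallables("ForkedActionStartXCommand", actionId); // 1.
        enqueue("ActionStartXCommand", actionId);                         // 2.
        return enqueue("ForkedActionStartXCommand", actionId);            // 3. no longer filtered
    }

    public static void main(String[] args) {
        TypedKeyQueue q = new TypedKeyQueue();
        System.out.println(q.replay("0000001-W@A")); // prints true
        System.out.println(q.queue);                 // both commands are now queued
    }
}
```

The trade-off is that both commands can now be queued for the same action, so starting an action would have to tolerate that, for example by re-checking the action's status before doing any work.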
[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741800#comment-17741800 ] chenhaodan commented on OOZIE-3715: ---
[~dionusos] Thank you very much!

> Fix fork out more than one transitions submit , one transition submit fail
> can't execute KillXCommand
> -
>
>                 Key: OOZIE-3715
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3715
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.2.1
>            Reporter: chenhaodan
>            Assignee: chenhaodan
>            Priority: Major
>              Labels: patch
>             Fix For: 5.3.0
>
>         Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch,
> OOZIE-3715-003.patch, OOZIE-3715-004.patch, OOZIE-3715-005.patch,
> OOZIE-3715-006.patch, forkSubmitFail_issue.txt, status.png
>
>
> When a fork submits two transitions (A and B) and transition A fails,
> transition B keeps running, because KillXCommand is never executed.
> In SignalXCommand.startForkedActions, when one transition fails to submit,
> a new ActionStartXCommand is created and its failJob() is invoked. failJob()
> adds WorkflowNotificationXCommand and KillXCommand to the
> {color:#ff}*commandQueue*{color}, which is only flushed in XCommand.call().
> However, the commands are added to the new ActionStartXCommand's
> {color:#ff}*commandQueue*{color}, not to the SignalXCommand's, and call() is
> never invoked on that ActionStartXCommand, so KillXCommand is never executed.
> The code is as follows:
>
> {code:java}
> public void startForkedActions(List<WorkflowActionBean> workflowActionBeanListForForked)
>         throws CommandException {
>     ..
>     for (Future<ActionExecutorContext> result : futures) {
>         ..
>         if (context.getJobStatus() != null && context.getJobStatus().equals(Job.Status.FAILED)) {
>             new ActionStartXCommand(context.getAction().getId(), null).failJob(context);
>             ..
>         }
>     ..
>     }
> {code}
>
> {code:java}
> public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
>     WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();
>     if (!handleUserRetry(context, action)) {
>         incrActionErrorCounter(action.getType(), "failed", 1);
>         LOG.warn("Failing Job due to failed action [{0}]", action.getName());
>         try {
>             workflow.getWorkflowInstance().fail(action.getName());
>             WorkflowInstance wfInstance = workflow.getWorkflowInstance();
>             ((LiteWorkflowInstance) wfInstance).setStatus(WorkflowInstance.Status.FAILED);
>             workflow.setWorkflowInstance(wfInstance);
>             workflow.setStatus(WorkflowJob.Status.FAILED);
>             action.setStatus(WorkflowAction.Status.FAILED);
>             action.resetPending();
>             queue(new WorkflowNotificationXCommand(workflow, action));
>             queue(new KillXCommand(workflow.getId()));
>             InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, getInstrumentation());
>         }
>         catch (WorkflowException ex) {
>             throw new CommandException(ex);
>         }
>     }
> }
> {code}
>
> {code:java}
> public final T call() throws CommandException {
>     if (commandQueue != null) {
>         for (Map.Entry<Long, List<XCommand<?>>> entry : commandQueue.entrySet()) {
>             LOG.debug("Queuing [{0}] commands with delay [{1}]ms", entry.getValue().size(), entry.getKey());
>             if (!callableQueueService.queueSerial(entry.getValue(), entry.getKey())) {
>                 LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, queue full",
>                         entry.getValue().size(), entry.getKey());
>             }
>         }
>     }
> }
> {code}
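The mechanism described in OOZIE-3715 can be reduced to a toy model: follow-up commands queued by failJob() sit in a per-instance commandQueue that only call() hands to the queue service, so invoking failJob() on a fresh instance without ever calling call() loses them. The class and method names below are simplified stand-ins, not Oozie's actual XCommand API.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT Oozie's XCommand) of commands whose follow-up work is deferred until call(). */
public class DeferredQueueModel {

    /** Stands in for the CallableQueueService: records what was actually submitted. */
    static final List<String> submitted = new ArrayList<>();

    static class Command {
        private final List<String> commandQueue = new ArrayList<>(); // deferred follow-ups

        /** Like XCommand.queue(...): only remembers the command for later. */
        protected void queue(String command) {
            commandQueue.add(command);
        }

        /** Like XCommand.call(): runs the command, then hands deferred commands to the service. */
        public final void call() {
            execute();
            submitted.addAll(commandQueue);
            commandQueue.clear();
        }

        protected void execute() {
        }
    }

    /** Simplified ActionStartXCommand: failJob() defers the notification and the kill. */
    static class ActionStart extends Command {
        void failJob(String jobId) {
            queue("WorkflowNotificationXCommand(" + jobId + ")");
            queue("KillXCommand(" + jobId + ")");
        }
    }

    public static void main(String[] args) {
        // Mirrors startForkedActions(): a fresh instance, failJob() only, call() never invoked.
        new ActionStart().failJob("job-1");
        System.out.println(submitted); // prints [] : KillXCommand never reaches the service
    }
}
```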
[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741311#comment-17741311 ] chenhaodan commented on OOZIE-3715: --- [~dionusos] Thanks for your patient guidance. I have changed the code; please let me know if there are any other remarks. Thank you again.
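One hypothetical way to get the KillXCommand submitted, not necessarily what the OOZIE-3715 patches do, is for the outer command (standing in for SignalXCommand) to adopt the helper command's deferred work so that its own call() submits it. `drainQueued()`, `Signal` and the other names are illustrative assumptions on a toy model, not Oozie APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fix on a toy model: the outer command re-queues the helper's deferred commands. */
public class AdoptDeferredCommands {

    /** Stands in for the queue service: records what was actually submitted. */
    static final List<String> submitted = new ArrayList<>();

    static class Command {
        private final List<String> commandQueue = new ArrayList<>();

        protected void queue(String command) {
            commandQueue.add(command);
        }

        /** Hands over (and forgets) the deferred commands so a caller can re-queue them. */
        List<String> drainQueued() {
            List<String> drained = new ArrayList<>(commandQueue);
            commandQueue.clear();
            return drained;
        }

        /** Runs the command, then submits whatever it deferred. */
        public final void call() {
            execute();
            submitted.addAll(drainQueued());
        }

        protected void execute() {
        }
    }

    /** Simplified ActionStartXCommand: failJob() only defers work. */
    static class ActionStart extends Command {
        void failJob(String jobId) {
            queue("WorkflowNotificationXCommand(" + jobId + ")");
            queue("KillXCommand(" + jobId + ")");
        }
    }

    /** Simplified SignalXCommand: a forked action failed, so fail the job via a helper command. */
    static class Signal extends Command {
        @Override
        protected void execute() {
            ActionStart helper = new ActionStart();
            helper.failJob("job-1");
            // Adopt the helper's deferred commands; this command's call() will submit them.
            for (String cmd : helper.drainQueued()) {
                queue(cmd);
            }
        }
    }

    public static void main(String[] args) {
        new Signal().call();
        System.out.println(submitted); // prints [WorkflowNotificationXCommand(job-1), KillXCommand(job-1)]
    }
}
```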
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Attachment: OOZIE-3715-006.patch > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, > OOZIE-3715-003.patch, OOZIE-3715-004.patch, OOZIE-3715-005.patch, > OOZIE-3715-006.patch, forkSubmitFail_issue.txt, status.png
[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17729267#comment-17729267 ] chenhaodan commented on OOZIE-3715: --- Hi [~dionusos], do you have any other feedback or remarks regarding this change ({*}OOZIE-3715-005.patch{*})? Thank you! > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, > OOZIE-3715-003.patch, OOZIE-3715-004.patch, OOZIE-3715-005.patch, > forkSubmitFail_issue.txt, status.png
[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17717285#comment-17717285 ] chenhaodan commented on OOZIE-3715: --- [~dionusos] OK, thank you for your patient guidance. > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, > OOZIE-3715-003.patch, OOZIE-3715-004.patch, OOZIE-3715-005.patch, > forkSubmitFail_issue.txt, status.png
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Attachment: OOZIE-3715-005.patch > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, > OOZIE-3715-003.patch, OOZIE-3715-004.patch, OOZIE-3715-005.patch, > forkSubmitFail_issue.txt, status.png
[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17706796#comment-17706796 ] chenhaodan commented on OOZIE-3715: --- [~jmakai] OK, thank you for your patience. I found this in the live environment: one sub-workflow failed during submission because of an error in its XML, while the other had already been submitted to YARN and stayed RUNNING, as shown in the attachment _*status.png*_. Thanks for your time. > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, > OOZIE-3715-003.patch, OOZIE-3715-004.patch, forkSubmitFail_issue.txt, > status.png
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Attachment: status.png > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, > OOZIE-3715-003.patch, OOZIE-3715-004.patch, forkSubmitFail_issue.txt, > status.png
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Attachment: (was: 错误.png) > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, > OOZIE-3715-003.patch, OOZIE-3715-004.patch, forkSubmitFail_issue.txt
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Attachment: 错误.png > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, > OOZIE-3715-003.patch, OOZIE-3715-004.patch, forkSubmitFail_issue.txt, 错误.png
[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17704888#comment-17704888 ] chenhaodan commented on OOZIE-3715: --- [~jmakai] Thank you for your help. > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Attachment: (was: OOZIE-3715-001.patch)

> Fix fork out more than one transitions submit , one transition submit fail
> can't execute KillXCommand
> -
>
> Key: OOZIE-3715
> URL: https://issues.apache.org/jira/browse/OOZIE-3715
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.1
> Reporter: chenhaodan
> Assignee: chenhaodan
> Priority: Major
> Labels: patch
> Fix For: 5.3.0
>
>
> When a fork submits two transitions (A and B) and transition A fails,
> transition B keeps running because KillXCommand is never executed.
> In SignalXCommand.startForkedActions, when one transition fails to submit, a
> new ActionStartXCommand is created and its failJob method is invoked. failJob
> adds WorkflowNotificationXCommand and KillXCommand to
> {color:#ff}*commandQueue*{color}, whose contents are submitted to the callable
> queue in the XCommand.call method. However, these commands are added to the
> ActionStartXCommand's {color:#ff}*commandQueue*{color} rather than the
> SignalXCommand's, so they are never submitted and KillXCommand never runs.
> The code is as follows:
>
> {code:java}
> public void startForkedActions(List<WorkflowActionBean> workflowActionBeanListForForked) throws CommandException {
>     ..
>     for (Future<ActionExecutorContext> result : futures) {
>         ..
>         if (context.getJobStatus() != null && context.getJobStatus().equals(Job.Status.FAILED)) {
>             new ActionStartXCommand(context.getAction().getId(), null).failJob(context);
>             ..
>         }
>         ..
>     }
> }
> {code}
>
> {code:java}
> public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
>     WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();
>     if (!handleUserRetry(context, action)) {
>         incrActionErrorCounter(action.getType(), "failed", 1);
>         LOG.warn("Failing Job due to failed action [{0}]", action.getName());
>         try {
>             workflow.getWorkflowInstance().fail(action.getName());
>             WorkflowInstance wfInstance = workflow.getWorkflowInstance();
>             ((LiteWorkflowInstance) wfInstance).setStatus(WorkflowInstance.Status.FAILED);
>             workflow.setWorkflowInstance(wfInstance);
>             workflow.setStatus(WorkflowJob.Status.FAILED);
>             action.setStatus(WorkflowAction.Status.FAILED);
>             action.resetPending();
>             queue(new WorkflowNotificationXCommand(workflow, action));
>             queue(new KillXCommand(workflow.getId()));
>             InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, getInstrumentation());
>         }
>         catch (WorkflowException ex) {
>             throw new CommandException(ex);
>         }
>     }
> }
> {code}
>
> {code:java}
> public final T call() throws CommandException {
>     if (commandQueue != null) {
>         for (Map.Entry<Long, List<XCommand<?>>> entry : commandQueue.entrySet()) {
>             LOG.debug("Queuing [{0}] commands with delay [{1}]ms", entry.getValue().size(), entry.getKey());
>             if (!callableQueueService.queueSerial(entry.getValue(), entry.getKey())) {
>                 LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, queue full",
>                         entry.getValue().size(), entry.getKey());
>             }
>         }
>     }
> }
> {code}
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
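The failure mode described in this issue can be reproduced in a toy model. The sketch below assumes simplified stand-ins (`LostKillDemo`, `Command`, `ActionStart` and the shared `service` list are hypothetical, not Oozie APIs): a parent command creates a throwaway helper, calls `failJob()` on it, and the helper's deferred commands are never submitted because the helper's own `call()` never runs. The second half shows one possible remedy, moving the helper's pending commands into the caller's queue; this is an assumption for illustration and not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the OOZIE-3715 scenario; names echo Oozie but are not Oozie APIs.
public class LostKillDemo {

    // Stand-in for XCommand: commands queued during execution are deferred.
    static class Command {
        final List<String> pending = new ArrayList<>();
        final List<String> service; // shared stand-in for CallableQueueService

        Command(List<String> service) {
            this.service = service;
        }

        void queue(String cmd) {
            pending.add(cmd);
        }

        // Like XCommand.call(): deferred commands are submitted only after execution.
        void call(Runnable execute) {
            execute.run();
            service.addAll(pending);
            pending.clear();
        }
    }

    // Stand-in for ActionStartXCommand: failJob() queues on *this* instance.
    static class ActionStart extends Command {
        ActionStart(List<String> service) {
            super(service);
        }

        void failJob() {
            queue("WorkflowNotificationXCommand");
            queue("KillXCommand");
        }
    }

    public static void main(String[] args) {
        // Buggy flow: the helper queues the kill, but nothing ever calls the helper.
        List<String> service = new ArrayList<>();
        Command signal = new Command(service);
        signal.call(() -> new ActionStart(service).failJob());
        System.out.println("buggy: " + service);

        // Assumed remedy: hand the helper's pending commands to the caller's queue,
        // so the caller's own call() submits them.
        List<String> service2 = new ArrayList<>();
        Command signal2 = new Command(service2);
        signal2.call(() -> {
            ActionStart helper = new ActionStart(service2);
            helper.failJob();
            helper.pending.forEach(signal2::queue);
        });
        System.out.println("fixed: " + service2);
    }
}
```

It prints `buggy: []` followed by `fixed: [WorkflowNotificationXCommand, KillXCommand]`.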
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Attachment: (was: OOZIE-3715-1-1.patch) > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: 5.3.0
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Attachment: (was: OOZIE-3715-1.patch) > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: 5.3.0
[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703151#comment-17703151 ] chenhaodan commented on OOZIE-3715: --- [~jmakai] Thanks for your help. > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Fix For: 5.3.0 > > Attachments: OOZIE-3715-001.patch, OOZIE-3715-1-1.patch, > OOZIE-3715-1.patch
[jira] [Comment Edited] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702971#comment-17702971 ] chenhaodan edited comment on OOZIE-3715 at 3/21/23 4:20 AM: [~dionusos] Yes, that's strange. I've created the change on top of the master branch, and I don't know why the patch fails to apply. The error messages were "No such file or directory" and "error: while searching for:". was (Author: chenhd): [~dionusos] Yes, that's strange. I've created your change on top of the master branch, and I don't know why the patch fails to apply. The error messages were "No such file or directory" and "error: while searching for:". > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Attachments: OOZIE-3715-1-1.patch, OOZIE-3715-1.patch
[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17702971#comment-17702971 ] chenhaodan commented on OOZIE-3715: --- [~dionusos] Yes, that's strange. I've created your change on top of the master branch, and I don't know why the patch fails to apply. The error messages were "No such file or directory" and "error: while searching for:". > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Assignee: chenhaodan >Priority: Major > Labels: patch > Attachments: OOZIE-3715-1-1.patch, OOZIE-3715-1.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Affects Version/s: (was: 5.3.0) > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Reporter: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-1.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Fix Version/s: trunk (was: 5.3.0) > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.3.0 >Reporter: chenhaodan >Priority: Major > Labels: patch > Fix For: trunk > > Attachments: OOZIE-3715-1.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Attachment: (was: OOZIE-3715-1-1.patch) > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.3.0 >Reporter: chenhaodan >Priority: Major > Labels: patch > Fix For: 5.3.0 > > Attachments: OOZIE-3715-1.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Affects Version/s: (was: 6.0) > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.3.0 >Reporter: chenhaodan >Priority: Major > Labels: patch > Fix For: 6.0, 5.3.0 > > Attachments: OOZIE-3715-1.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Fix Version/s: (was: 6.0)

> Fix fork out more than one transitions submit , one transition submit fail
> can't execute KillXCommand
> -
>
> Key: OOZIE-3715
> URL: https://issues.apache.org/jira/browse/OOZIE-3715
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.3.0
> Reporter: chenhaodan
> Priority: Major
> Labels: patch
> Fix For: 5.3.0
>
> Attachments: OOZIE-3715-1.patch
>
>
> When I fork 2 transitions (A and B) and submit them, and transition A fails,
> transition B stays Running because KillXCommand is never executed.
> In SignalXCommand.startForkedActions, when the submission of one transition
> fails, a new ActionStartXCommand is created and its failJob method is invoked.
> failJob adds WorkflowNotificationXCommand and KillXCommand to *commandQueue*,
> which is only flushed later, in the XCommand.call method. However, both
> commands are added to the *commandQueue* of that new ActionStartXCommand, not
> to the one of SignalXCommand, and that ActionStartXCommand is never run through
> its own call(), so KillXCommand is never executed.
> The code is as follows:
>
> {code:java}
> public void startForkedActions(List<WorkflowActionBean> workflowActionBeanListForForked) throws CommandException {
>     ..
>     for (Future<ActionExecutorContext> result : futures) {
>         ..
>         if (context.getJobStatus() != null && context.getJobStatus().equals(Job.Status.FAILED)) {
>             // failJob() queues its follow-up commands on this new, never-called instance
>             new ActionStartXCommand(context.getAction().getId(), null).failJob(context);
>             ..
>         }
>         ..
>     }
> }
> {code}
>
> {code:java}
> public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
>     WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();
>     if (!handleUserRetry(context, action)) {
>         incrActionErrorCounter(action.getType(), "failed", 1);
>         LOG.warn("Failing Job due to failed action [{0}]", action.getName());
>         try {
>             workflow.getWorkflowInstance().fail(action.getName());
>             WorkflowInstance wfInstance = workflow.getWorkflowInstance();
>             ((LiteWorkflowInstance) wfInstance).setStatus(WorkflowInstance.Status.FAILED);
>             workflow.setWorkflowInstance(wfInstance);
>             workflow.setStatus(WorkflowJob.Status.FAILED);
>             action.setStatus(WorkflowAction.Status.FAILED);
>             action.resetPending();
>             // Both commands go to this ActionStartXCommand's commandQueue,
>             // which is flushed only by this instance's call()
>             queue(new WorkflowNotificationXCommand(workflow, action));
>             queue(new KillXCommand(workflow.getId()));
>             InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, getInstrumentation());
>         }
>         catch (WorkflowException ex) {
>             throw new CommandException(ex);
>         }
>     }
> }
> {code}
>
> {code:java}
> public final T call() throws CommandException {
>     // Flush the commands queued by this command, grouped by delay in ms
>     if (commandQueue != null) {
>         for (Map.Entry<Long, List<XCommand<?>>> entry : commandQueue.entrySet()) {
>             LOG.debug("Queuing [{0}] commands with delay [{1}]ms", entry.getValue().size(), entry.getKey());
>             if (!callableQueueService.queueSerial(entry.getValue(), entry.getKey())) {
>                 LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, queue full",
>                         entry.getValue().size(), entry.getKey());
>             }
>         }
>     }
> }
> {code}
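To make the failure mode in the OOZIE-3715 description concrete, here is a stripped-down, hypothetical model of the pattern. It is not Oozie code: Command, ActionStart, Signal and EXECUTED are illustrative stand-ins for XCommand, ActionStartXCommand, SignalXCommand and the callable queue. Follow-up commands are buffered per instance and only flushed by that instance's call(), so anything queued on a throwaway instance is silently dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down model of the OOZIE-3715 pattern.
// Command ~ XCommand, ActionStart ~ ActionStartXCommand, Signal ~ SignalXCommand;
// EXECUTED stands in for the callable queue. None of these are real Oozie classes.
public class LostQueueDemo {

    /** Everything that actually gets handed to the "queue service" ends up here. */
    public static final List<String> EXECUTED = new ArrayList<>();

    /** Follow-ups are buffered per instance and flushed only by that instance's call(). */
    public abstract static class Command {
        private final List<String> commandQueue = new ArrayList<>();

        protected void queue(String followUp) {
            commandQueue.add(followUp);
        }

        public final void call() {
            execute();
            // Flush buffered follow-ups after this command's own work is done.
            EXECUTED.addAll(commandQueue);
            commandQueue.clear();
        }

        protected abstract void execute();
    }

    /** failJob() buffers the notification and the kill on *this* instance. */
    public static class ActionStart extends Command {
        public void failJob(String jobId) {
            queue("notify:" + jobId);
            queue("kill:" + jobId);
        }

        @Override
        protected void execute() {
            // Starting the action itself is irrelevant to the bug and omitted.
        }
    }

    /** Plays the role of SignalXCommand.startForkedActions() hitting a failed fork branch. */
    public static class Signal extends Command {
        private final String jobId;

        public Signal(String jobId) {
            this.jobId = jobId;
        }

        @Override
        protected void execute() {
            // The helper instance is never call()ed, so its buffered follow-ups are dropped.
            new ActionStart().failJob(jobId);
        }
    }

    public static void main(String[] args) {
        new Signal("wf-1").call();
        System.out.println(EXECUTED); // prints []
    }
}
```

Calling call() on the ActionStart instance itself would flush both entries; the forked-failure path in this model simply never does that.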
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Attachment: (was: OOZIE-3715-1.patch)

> Key: OOZIE-3715
> Affects Versions: 5.3.0
> Labels: patch
> Fix For: trunk
> Attachments: OOZIE-3715-1.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Affects Version/s: 5.3.0 (was: trunk)

> Key: OOZIE-3715
> Affects Versions: 5.3.0
> Labels: patch
> Fix For: trunk
> Attachments: OOZIE-3715-1.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Attachment: (was: OOZIE-3715-1.patch)

> Key: OOZIE-3715
> Affects Versions: 5.2.1
> Labels: patch
> Fix For: trunk
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Fix Version/s: trunk (was: 5.2.1)

> Key: OOZIE-3715
> Affects Versions: 5.2.1
> Labels: patch
> Fix For: trunk
> Attachments: OOZIE-3715-1.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Attachment: (was: OOZIE-3715.patch)

> Key: OOZIE-3715
> Affects Versions: 5.2.1
> Labels: patch
> Fix For: 5.2.1
> Attachments: OOZIE-3715-1.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Description:
When I fork 2 transitions (A and B) and submit them, and transition A fails, transition B stays Running because KillXCommand is never executed.
In SignalXCommand.startForkedActions, when the submission of one transition fails, a new ActionStartXCommand is created and its failJob method is invoked. failJob adds WorkflowNotificationXCommand and KillXCommand to *commandQueue*, which is only flushed later, in the XCommand.call method. However, both commands are added to the *commandQueue* of that new ActionStartXCommand, not to the one of SignalXCommand, so KillXCommand is never executed.
The code is as follows:
{code:java}
public void startForkedActions(List<WorkflowActionBean> workflowActionBeanListForForked) throws CommandException {
    ..
    for (Future<ActionExecutorContext> result : futures) {
        ..
        if (context.getJobStatus() != null && context.getJobStatus().equals(Job.Status.FAILED)) {
            new ActionStartXCommand(context.getAction().getId(), null).failJob(context);
            ..
        }
        ..
    }
}
{code}
{code:java}
public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
    ..
    queue(new WorkflowNotificationXCommand(workflow, action));
    queue(new KillXCommand(workflow.getId()));
    ..
}
{code}
{code:java}
public final T call() throws CommandException {
    if (commandQueue != null) {
        for (Map.Entry<Long, List<XCommand<?>>> entry : commandQueue.entrySet()) {
            LOG.debug("Queuing [{0}] commands with delay [{1}]ms", entry.getValue().size(), entry.getKey());
            if (!callableQueueService.queueSerial(entry.getValue(), entry.getKey())) {
                LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, queue full",
                        entry.getValue().size(), entry.getKey());
            }
        }
    }
}
{code}

was:
When I fork 2 transitions (A and B) and submit them, and A fails, B is still Running because KillXCommand can't be executed.
ActionXCommand executes failJob and adds KillXCommand to commandQueue, but that commandQueue belongs to the new ActionXCommand bean, not to SignalXCommand, so KillXCommand can't be executed. The code is as follows:
{code:java}
new ActionStartXCommand(context.getAction().getId(), null).failJob(context)

public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
    ..
    queue(new WorkflowNotificationXCommand(workflow, action));
    queue(new KillXCommand(workflow.getId()));
    ..
}
{code}
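One possible way to keep the follow-up commands is sketched below with the same kind of hypothetical model (it is not the attached OOZIE-3715 patch, whose contents are not shown here): let the caller pass in the queue that failJob writes to, so the KillXCommand equivalent lands on the SignalXCommand equivalent's own commandQueue, which its call() does flush. The Consumer<String> sink parameter and all class names are illustrative assumptions, not real Oozie signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of one possible fix, using a simplified model:
// failJob() writes follow-ups into a caller-supplied sink instead of its own queue.
// Not Oozie code and not the OOZIE-3715 patch.
public class ForwardedQueueDemo {

    /** Everything that actually gets handed to the "queue service" ends up here. */
    public static final List<String> EXECUTED = new ArrayList<>();

    public abstract static class Command {
        private final List<String> commandQueue = new ArrayList<>();

        protected void queue(String followUp) {
            commandQueue.add(followUp);
        }

        public final void call() {
            execute();
            // Flush buffered follow-ups after this command's own work is done.
            EXECUTED.addAll(commandQueue);
            commandQueue.clear();
        }

        protected abstract void execute();
    }

    public static class ActionStart extends Command {
        /** Hypothetical signature: the caller decides whose queue receives the follow-ups. */
        public void failJob(String jobId, Consumer<String> sink) {
            sink.accept("notify:" + jobId);
            sink.accept("kill:" + jobId);
        }

        @Override
        protected void execute() {
            // Starting the action itself is irrelevant here and omitted.
        }
    }

    public static class Signal extends Command {
        private final String jobId;

        public Signal(String jobId) {
            this.jobId = jobId;
        }

        @Override
        protected void execute() {
            // Follow-ups land on this Signal's own queue, which call() flushes.
            new ActionStart().failJob(jobId, this::queue);
        }
    }

    public static void main(String[] args) {
        new Signal("wf-1").call();
        System.out.println(EXECUTED); // prints [notify:wf-1, kill:wf-1]
    }
}
```

Passing a sink keeps the failing logic reusable from both call sites. An alternative with the same effect in this model would be to invoke call() on the helper command, but in a real command framework that would also run the helper's full lifecycle, which is probably not wanted here.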
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Attachment: (was: OOZIE-3715.patch)

> Key: OOZIE-3715
> Affects Versions: 5.2.1
> Fix For: 5.2.1
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenhaodan updated OOZIE-3715:
--
Description:
When I fork 2 transitions (A and B) and submit them, and A fails, B is still Running because KillXCommand can't be executed.
ActionXCommand executes failJob and adds KillXCommand to commandQueue, but that commandQueue belongs to the new ActionXCommand bean, not to SignalXCommand, so KillXCommand can't be executed. The code is as follows:
{code:java}
new ActionStartXCommand(context.getAction().getId(), null).failJob(context)

public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
    ..
    queue(new WorkflowNotificationXCommand(workflow, action));
    queue(new KillXCommand(workflow.getId()));
    ..
}
{code}

was:
When I fork 2 transitions (A and B) and submit them, and A fails, B is still Running because KillXCommand can't be executed.
ActionXCommand executes failJob and adds KillXCommand to commandQueue, but that commandQueue belongs to the new ActionXCommand bean, not to SignalXCommand, so KillXCommand can't be executed. The code is as follows:
{code:java}
new ActionStartXCommand(context.getAction().getId(), null).failJob(context)
public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
    ..
    queue(new WorkflowNotificationXCommand(workflow, action));
    queue(new KillXCommand(workflow.getId()));
    ..
}
{code}

> Key: OOZIE-3715
> Affects Versions: 5.2.1
> Fix For: 5.2.1
> Attachments: OOZIE-3715.patch
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715: -- Description: When I fork 2 transitions( A and B) to submit , when A fail , B still Running , because can't execute KillXCommand. ActionXCommand execute failJob and add KillXCommand to commandQueue , but the commandQueue is the new Bean ActionXCommand not the SignalXCommand , so can't execute KillXCommand. The code is as follows : {code:java} new ActionStartXCommand(context.getAction().getId(), null).failJob(context)public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException { WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow(); if (!handleUserRetry(context, action)) { incrActionErrorCounter(action.getType(), "failed", 1); LOG.warn("Failing Job due to failed action [{0}]", action.getName()); try { workflow.getWorkflowInstance().fail(action.getName()); WorkflowInstance wfInstance = workflow.getWorkflowInstance(); ((LiteWorkflowInstance) wfInstance).setStatus(WorkflowInstance.Status.FAILED); workflow.setWorkflowInstance(wfInstance); workflow.setStatus(WorkflowJob.Status.FAILED); action.setStatus(WorkflowAction.Status.FAILED); action.resetPending(); queue(new WorkflowNotificationXCommand(workflow, action)); queue(new KillXCommand(workflow.getId())); InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, getInstrumentation()); } catch (WorkflowException ex) { throw new CommandException(ex); } } }{code} was: When I fork 2 transitions( A and B) to submit , when A fail , B still Running , because can't execute KillXCommand. ActionXCommand execute failJob and add KillXCommand to commandQueue , but the commandQueue is the new Bean ActionXCommand not the SignalXCommand , so can't execute KillXCommand. 
The code is as follows : new ActionStartXCommand(context.getAction().getId(), null).failJob(context)public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();if (!handleUserRetry(context, action)) { incrActionErrorCounter(action.getType(), "failed", 1); LOG.warn("Failing Job due to failed action [\{0}]", action.getName()); try {workflow.getWorkflowInstance().fail(action.getName()); WorkflowInstance wfInstance = workflow.getWorkflowInstance(); ((LiteWorkflowInstance) wfInstance).setStatus(WorkflowInstance.Status.FAILED); workflow.setWorkflowInstance(wfInstance); workflow.setStatus(WorkflowJob.Status.FAILED); action.setStatus(WorkflowAction.Status.FAILED); action.resetPending();queue(new WorkflowNotificationXCommand(workflow, action));queue(new KillXCommand(workflow.getId())); InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, getInstrumentation()); }catch (WorkflowException ex) {throw new CommandException(ex); } } } > Fix fork out more than one transitions submit , one transition submit fail > can't execute KillXCommand > - > > Key: OOZIE-3715 > URL: https://issues.apache.org/jira/browse/OOZIE-3715 > Project: Oozie > Issue Type: Bug > Components: core >Affects Versions: 5.2.1 >Reporter: chenhaodan >Priority: Major > Fix For: 5.2.1 > > Attachments: OOZIE-3715.patch > > > When I fork 2 transitions( A and B) to submit , when A fail , B still Running > , because can't execute KillXCommand. > ActionXCommand execute failJob and add KillXCommand to commandQueue , but the > commandQueue is the new Bean ActionXCommand not the SignalXCommand , so can't > execute KillXCommand. 
> The code is as follows:
>
> {code:java}
> new ActionStartXCommand(context.getAction().getId(), null).failJob(context);
>
> public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
>     WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();
>     if (!handleUserRetry(context, action)) {
>         incrActionErrorCounter(action.getType(), "failed", 1);
>         LOG.warn("Failing Job due to failed action [{0}]", action.getName());
>         try {
[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
[ https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenhaodan updated OOZIE-3715:
--
Attachment: OOZIE-3715.patch

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
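The failure described in OOZIE-3715 can be modeled in a few lines of Java. This is a minimal sketch, not Oozie code: the classes `Command`, `ActionStartCommand`, `SignalCommand` and `KillCommand` are hypothetical stand-ins, and the model assumes, as the description states, that each command keeps its own `commandQueue` that is dispatched only when that same instance is run. Because `failJob()` is called on a freshly constructed command that is never run, the kill it queues is never dispatched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Oozie's XCommand API): each command keeps a private
// deferred queue that is dispatched only when that same instance runs.
public class LostKillDemo {

    // Records which commands actually ran, in order.
    static final List<String> executed = new ArrayList<>();

    abstract static class Command {
        // Per-instance deferred queue, analogous to commandQueue in the report.
        private final List<Command> commandQueue = new ArrayList<>();

        abstract String name();

        protected abstract void execute();

        // Defers a follow-up command until this instance finishes running.
        protected void queue(Command followUp) {
            commandQueue.add(followUp);
        }

        // Runs this command, then dispatches whatever it queued.
        // Dispatch is synchronous here for brevity; a real service would enqueue.
        final void call() {
            executed.add(name());
            execute();
            for (Command c : commandQueue) {
                c.call();
            }
            commandQueue.clear();
        }
    }

    static class KillCommand extends Command {
        String name() { return "Kill"; }
        protected void execute() { }
    }

    static class ActionStartCommand extends Command {
        String name() { return "ActionStart"; }
        protected void execute() { }

        // Mirrors failJob(): the kill is queued on THIS instance's queue.
        void failJob() {
            queue(new KillCommand());
        }
    }

    static class SignalCommand extends Command {
        String name() { return "Signal"; }
        protected void execute() {
            // The reported pattern: failJob() runs on a throwaway instance that is
            // never executed via call(), so its queue is never dispatched.
            new ActionStartCommand().failJob();
        }
    }

    public static void main(String[] args) {
        new SignalCommand().call();
        System.out.println(executed); // prints [Signal]; Kill never runs
    }
}
```

In this model only `Signal` runs: the `KillCommand` sits in the throwaway instance's queue and is garbage-collected with it, which is the "B still running" symptom.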
[jira] [Created] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
chenhaodan created OOZIE-3715:
-

Summary: Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand
Key: OOZIE-3715
URL: https://issues.apache.org/jira/browse/OOZIE-3715
Project: Oozie
Issue Type: Bug
Components: core
Affects Versions: 5.2.1
Reporter: chenhaodan
Fix For: 5.2.1


When a fork submits two transitions (A and B) and A fails, B keeps running because KillXCommand is never executed.
ActionXCommand.failJob() adds KillXCommand to commandQueue, but that commandQueue belongs to the newly created ActionStartXCommand instance rather than to the SignalXCommand, so KillXCommand is never executed.
The code is as follows:
new ActionStartXCommand(context.getAction().getId(), null).failJob(context);
public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
    WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();
    if (!handleUserRetry(context, action)) {
        incrActionErrorCounter(action.getType(), "failed", 1);
        LOG.warn("Failing Job due to failed action [{0}]", action.getName());
        try {
            workflow.getWorkflowInstance().fail(action.getName());
            WorkflowInstance wfInstance = workflow.getWorkflowInstance();
            ((LiteWorkflowInstance) wfInstance).setStatus(WorkflowInstance.Status.FAILED);
            workflow.setWorkflowInstance(wfInstance);
            workflow.setStatus(WorkflowJob.Status.FAILED);
            action.setStatus(WorkflowAction.Status.FAILED);
            action.resetPending();
            queue(new WorkflowNotificationXCommand(workflow, action));
            queue(new KillXCommand(workflow.getId()));
            InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, getInstrumentation());
        }
        catch (WorkflowException ex) {
            throw new CommandException(ex);
        }
    }
}
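One general way to keep such a follow-up from being lost is to let the command that is actually running own it. The sketch below uses a hypothetical model, not Oozie's API, and is not necessarily what OOZIE-3715.patch does: a hypothetical `failJob()` helper only decides which commands should follow the failure and returns them, and the running `SignalCommand` queues them on its own `commandQueue`, which is dispatched after it runs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not Oozie's XCommand API). Follow-up commands are handed
// back to the running command, so they land in a queue that is dispatched.
public class OwnedKillDemo {

    // Records which commands actually ran, in order.
    static final List<String> executed = new ArrayList<>();

    abstract static class Command {
        // Per-instance deferred queue, dispatched only after this instance runs.
        private final List<Command> commandQueue = new ArrayList<>();

        abstract String name();

        protected abstract void execute();

        protected void queue(Command followUp) {
            commandQueue.add(followUp);
        }

        // Runs this command, then dispatches its queued follow-ups
        // (synchronously here for brevity).
        final void call() {
            executed.add(name());
            execute();
            for (Command c : commandQueue) {
                c.call();
            }
            commandQueue.clear();
        }
    }

    static class KillCommand extends Command {
        String name() { return "Kill"; }
        protected void execute() { }
    }

    // Hypothetical helper: decides what should follow a failure but queues
    // nothing itself, so no command is stranded on an unexecuted instance.
    static List<Command> failJob() {
        List<Command> followUps = new ArrayList<>();
        followUps.add(new KillCommand());
        return followUps;
    }

    static class SignalCommand extends Command {
        String name() { return "Signal"; }
        protected void execute() {
            // The running command queues the follow-ups on its own queue.
            for (Command c : failJob()) {
                queue(c);
            }
        }
    }

    public static void main(String[] args) {
        new SignalCommand().call();
        System.out.println(executed); // prints [Signal, Kill]
    }
}
```

The design choice is simply about ownership: whichever object's queue receives `KillCommand` must be one that is guaranteed to be run, so here the failure logic returns commands instead of calling `queue()` on an object nobody executes.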