[jira] [Updated] (OOZIE-2324) A syntax error in the kill node causes the workflow to get stuck and other problems
[ https://issues.apache.org/jira/browse/OOZIE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

abhishek bafna updated OOZIE-2324:
--
    Fix Version/s: (was: trunk)
                   4.3.0

> A syntax error in the kill node causes the workflow to get stuck and other problems
> ---
>
>                 Key: OOZIE-2324
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2324
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>             Fix For: 4.3.0
>
>         Attachments: OOZIE-2324.001.patch, OOZIE-2324.002.patch, OOZIE-2324.003.patch
>
>
> A syntax error normally causes the action to go to the "fail to" transition,
> which is typically a kill node, which kills the workflow. Unfortunately, in
> the kill action, we don't have that behavior, so if you get a syntax error in
> the kill node, it looks like Oozie gets stuck and might be requeueing the
> command to retry it. This can then clog up the callable queue, and cause
> other jobs to not get processed. Oozie should better handle an error in the
> kill node.
> In the log, we see this:
> {noformat}
> 2015-07-30 16:49:23,610 WARN SignalXCommand:523 - SERVER[rkanter-MBP.local] USER[rkanter] GROUP[-] TOKEN[] APP[shell-wf] JOB[004-150730164245830-oozie-rkan-W] ACTION[004-150730164245830-oozie-rkan-W@fail-output] Exception in SignalXCommand
> javax.servlet.jsp.el.ELException: Encountered "{", expected one of ["}", ".", ">", "gt", "<", "lt", "==", "eq", "<=", "le", ">=", "ge", "!=", "ne", "[", "+", "-", "*", "/", "div", "%", "mod", "and", "&&", "or", "||", ":", "(", "?"]
>     at org.apache.commons.el.ExpressionEvaluatorImpl.parseExpressionString(ExpressionEvaluatorImpl.java:320)
>     at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:250)
>     at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:190)
>     at org.apache.oozie.util.ELEvaluator.evaluate(ELEvaluator.java:204)
>     at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:300)
>     at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:356)
>     at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
>     at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:61)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:356)
>     at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:352)
>     at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:356)
>     at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:434)
>     at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:356)
>     at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
>     at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:61)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-07-30 16:49:23,612 ERROR SignalXCommand:517 - SERVER[rkanter-MBP.local] USER[rkanter] GROUP[-] TOKEN[] APP[shell-wf] JOB[004-150730164245830-oozie-rkan-W] ACTION[004-150730164245830-oozie-rkan-W@fail-output] XException,
> org.apache.oozie.command.CommandException: E0729: Kill node message [fail-output]
>     at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:315)
>     at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:286)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:356)
>     at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
>     at org.apache.o
[jira] [Updated] (OOZIE-2324) A syntax error in the kill node causes the workflow to get stuck and other problems
[ https://issues.apache.org/jira/browse/OOZIE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-2324:
-
    Attachment: OOZIE-2324.003.patch

003 patch fixes trailing whitespace. Not sure what happened to the tests run; it simply says "killed" partway through the tests in Jenkins. I'm guessing it's just a weird Jenkins thing and will work next time.
[jira] [Updated] (OOZIE-2324) A syntax error in the kill node causes the workflow to get stuck and other problems
[ https://issues.apache.org/jira/browse/OOZIE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-2324:
-
    Attachment: OOZIE-2324.002.patch

The 002 patch adds some missing test files. The line that is too long is a query.
[jira] [Updated] (OOZIE-2324) A syntax error in the kill node causes the workflow to get stuck and other problems
[ https://issues.apache.org/jira/browse/OOZIE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-2324:
-
    Attachment: OOZIE-2324.001.patch

With this patch, when there's an error processing the message in the kill node, Oozie no longer throws a CommandException. Instead, it sets a new error code and sets the error message to whatever the exception was. It also sets the kill node's status to ERROR to indicate that there was a problem with the kill node itself. Otherwise, it continues normally, so the workflow is able to finish. It still logs an error message, but only once, and the message is clearer:
{noformat}
2015-07-30 17:47:25,376 WARN SignalXCommand:523 - SERVER[rkanter-MBP.local] USER[rkanter] GROUP[-] TOKEN[] APP[shell-wf] JOB[000-150730174245156-oozie-rkan-W] ACTION[000-150730174245156-oozie-rkan-W@fail-output] Exception in SignalXCommand when processing Kill node message: Encountered "{", expected one of ["}", ".", ">", "gt", "<", "lt", "==", "eq", "<=", "le", ">=", "ge", "!=", "ne", "[", "+", "-", "*", "/", "div", "%", "mod", "and", "&&", "or", "||", ":", "(", "?"]
javax.servlet.jsp.el.ELException: Encountered "{", expected one of ["}", ".", ">", "gt", "<", "lt", "==", "eq", "<=", "le", ">=", "ge", "!=", "ne", "[", "+", "-", "*", "/", "div", "%", "mod", "and", "&&", "or", "||", ":", "(", "?"]
    at org.apache.commons.el.ExpressionEvaluatorImpl.parseExpressionString(ExpressionEvaluatorImpl.java:320)
    at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:250)
    at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:190)
    at org.apache.oozie.util.ELEvaluator.evaluate(ELEvaluator.java:204)
    at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:300)
    at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
    at org.apache.oozie.command.XCommand.call(XCommand.java:286)
    at org.apache.oozie.command.XCommand.call(XCommand.java:356)
    at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
    at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:61)
    at org.apache.oozie.command.XCommand.call(XCommand.java:286)
    at org.apache.oozie.command.XCommand.call(XCommand.java:356)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:352)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
    at org.apache.oozie.command.XCommand.call(XCommand.java:286)
    at org.apache.oozie.command.XCommand.call(XCommand.java:356)
    at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:435)
    at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
    at org.apache.oozie.command.XCommand.call(XCommand.java:286)
    at org.apache.oozie.command.XCommand.call(XCommand.java:356)
    at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
    at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:61)
    at org.apache.oozie.command.XCommand.call(XCommand.java:286)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
Here's an example of what a failed kill node looks like:
{noformat}$ oozie job -info 000-150730174245156-oozie-rkan-W@fail-output
ID : 000-150730174245156-oozie-rkan-W@fail-output
Console URL : -
Error Code : E0756
Error Message : E0756: Exception parsing Kill node message [Encountered "{", expected one of ["}", ".", ">", "gt", "<", "lt", "==", "eq", "<=", "le", ">=", "ge", "!=", "ne", "[", "+", "-", "*", "/", "div", "%", "mod", "and", "&&", "or", "||", ":", "(", "?"]]
External ID : -
External Status : OK
Name : fail-output
Retries : 0
Tracker URI : -
Type : :KILL:
Started : 2015-07-31 00:47 GMT
Status : ERROR
Ended : 2015-07-31 00:47 GMT
{noformat}
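
For readers who don't want to open the patch, here is a rough, self-contained Java sketch of the behavior described for the 001 patch: catch the failure while evaluating the kill node message, record an error code and the exception text, mark the node as ERROR, and let processing continue instead of rethrowing. This is not the actual Oozie code. `KillNodeSketch`, `Evaluator`, `KillResult`, `resolveKillMessage`, and `toyEvaluate` are hypothetical names invented for illustration; only the E0756 code and the message format are taken from the `oozie job -info` output above.

```java
public class KillNodeSketch {

    /** Hypothetical stand-in for the kill node's status; not Oozie's real enum. */
    public enum Status { OK, ERROR }

    /** Hypothetical evaluator interface; the real code goes through Oozie's ELEvaluator. */
    public interface Evaluator {
        String evaluate(String expression) throws Exception;
    }

    /** Hypothetical holder for what ends up recorded on the kill node. */
    public static final class KillResult {
        public final Status status;
        public final String errorCode;
        public final String errorMessage;

        KillResult(Status status, String errorCode, String errorMessage) {
            this.status = status;
            this.errorCode = errorCode;
            this.errorMessage = errorMessage;
        }
    }

    /** Error code shown in the job -info output for a kill node whose message failed to parse. */
    public static final String PARSE_ERROR_CODE = "E0756";

    /**
     * Evaluates the kill node message. On failure, the exception is not rethrown:
     * the node is marked ERROR and the exception text becomes the error message,
     * so the caller can carry on and finish the workflow.
     */
    public static KillResult resolveKillMessage(String rawMessage, Evaluator evaluator) {
        try {
            // Assumption: the success path is simplified here; no error code is modeled.
            return new KillResult(Status.OK, null, evaluator.evaluate(rawMessage));
        } catch (Exception e) {
            String message = PARSE_ERROR_CODE
                    + ": Exception parsing Kill node message [" + e.getMessage() + "]";
            return new KillResult(Status.ERROR, PARSE_ERROR_CODE, message);
        }
    }

    /**
     * Toy evaluator (an assumption, not commons-el): rejects a '{' nested inside
     * an unclosed "${" and otherwise returns the text unchanged, without substitution.
     */
    public static String toyEvaluate(String expression) throws Exception {
        int start = expression.indexOf("${");
        if (start < 0) {
            return expression;
        }
        int close = expression.indexOf('}', start);
        int nested = expression.indexOf('{', start + 2);
        if (nested >= 0 && (close < 0 || nested < close)) {
            throw new Exception("Encountered \"{\"");
        }
        return expression;
    }

    public static void main(String[] args) {
        KillResult bad = resolveKillMessage("failed: ${${oops}", KillNodeSketch::toyEvaluate);
        System.out.println(bad.status + " " + bad.errorMessage);
    }
}
```

The point of returning a result object rather than throwing is that nothing propagates back to the command queue, which is what, per the issue description, appeared to lead to the requeue-and-retry loop.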