[jira] [Updated] (OOZIE-2324) A syntax error in the kill node causes the workflow to get stuck and other problems

2016-08-03 Thread abhishek bafna (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

abhishek bafna updated OOZIE-2324:
--
Fix Version/s: 4.3.0 (was: trunk)

> A syntax error in the kill node causes the workflow to get stuck and other 
> problems
> ---
>
> Key: OOZIE-2324
> URL: https://issues.apache.org/jira/browse/OOZIE-2324
> Project: Oozie
> Issue Type: Bug
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Fix For: 4.3.0
>
> Attachments: OOZIE-2324.001.patch, OOZIE-2324.002.patch, 
> OOZIE-2324.003.patch
>
>
> A syntax error normally causes the action to go to the "fail to" transition, 
> which is typically a kill node, which kills the workflow. Unfortunately, in 
> the kill action, we don't have that behavior, so if you get a syntax error in 
> the kill node, it looks like Oozie gets stuck and might be requeueing the 
> command to retry it. This can then clog up the callable queue, and cause 
> other jobs to not get processed.  Oozie should better handle an error in the 
> kill node.
> In the log, we see this:
> {noformat}
> 2015-07-30 16:49:23,610  WARN SignalXCommand:523 - SERVER[rkanter-MBP.local] USER[rkanter] GROUP[-] TOKEN[] APP[shell-wf] JOB[004-150730164245830-oozie-rkan-W] ACTION[004-150730164245830-oozie-rkan-W@fail-output] Exception in SignalXCommand
> javax.servlet.jsp.el.ELException: Encountered "{", expected one of ["}", ".", ">", "gt", "<", "lt", "==", "eq", "<=", "le", ">=", "ge", "!=", "ne", "[", "+", "-", "*", "/", "div", "%", "mod", "and", "&&", "or", "||", ":", "(", "?"]
> at org.apache.commons.el.ExpressionEvaluatorImpl.parseExpressionString(ExpressionEvaluatorImpl.java:320)
> at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:250)
> at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:190)
> at org.apache.oozie.util.ELEvaluator.evaluate(ELEvaluator.java:204)
> at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:300)
> at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
> at org.apache.oozie.command.XCommand.call(XCommand.java:286)
> at org.apache.oozie.command.XCommand.call(XCommand.java:356)
> at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
> at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:61)
> at org.apache.oozie.command.XCommand.call(XCommand.java:286)
> at org.apache.oozie.command.XCommand.call(XCommand.java:356)
> at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:352)
> at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
> at org.apache.oozie.command.XCommand.call(XCommand.java:286)
> at org.apache.oozie.command.XCommand.call(XCommand.java:356)
> at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:434)
> at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
> at org.apache.oozie.command.XCommand.call(XCommand.java:286)
> at org.apache.oozie.command.XCommand.call(XCommand.java:356)
> at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
> at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:61)
> at org.apache.oozie.command.XCommand.call(XCommand.java:286)
> at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2015-07-30 16:49:23,612 ERROR SignalXCommand:517 - SERVER[rkanter-MBP.local] USER[rkanter] GROUP[-] TOKEN[] APP[shell-wf] JOB[004-150730164245830-oozie-rkan-W] ACTION[004-150730164245830-oozie-rkan-W@fail-output] XException,
> org.apache.oozie.command.CommandException: E0729: Kill node message [fail-output]
> at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:315)
> at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
> at org.apache.oozie.command.XCommand.call(XCommand.java:286)
> at org.apache.oozie.command.XCommand.call(XCommand.java:356)
> at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
> at org.apache.o

[jira] [Updated] (OOZIE-2324) A syntax error in the kill node causes the workflow to get stuck and other problems

2015-08-10 Thread Robert Kanter (JIRA)


Robert Kanter updated OOZIE-2324:
-
Attachment: OOZIE-2324.003.patch

The 003 patch fixes trailing whitespace.  I'm not sure what happened to the 
test run; Jenkins simply says "killed" partway through the tests.  I'm guessing 
it's just a transient Jenkins problem and the run will work next time.


[jira] [Updated] (OOZIE-2324) A syntax error in the kill node causes the workflow to get stuck and other problems

2015-07-30 Thread Robert Kanter (JIRA)


Robert Kanter updated OOZIE-2324:
-
Attachment: OOZIE-2324.002.patch

The 002 patch adds some missing test files. 

The line that is too long is a query.


[jira] [Updated] (OOZIE-2324) A syntax error in the kill node causes the workflow to get stuck and other problems

2015-07-30 Thread Robert Kanter (JIRA)


Robert Kanter updated OOZIE-2324:
-
Attachment: OOZIE-2324.001.patch

The patch changes how an error in processing the kill node's error message is 
handled: instead of throwing a CommandException, Oozie now sets a new error 
code and sets the message to whatever the exception was.  It also sets the 
kill node's status to ERROR to indicate that there was a problem with the 
kill node itself.  Otherwise, it continues normally, so the workflow is able to 
finish.

It still logs an error message, but only once, and the message is clearer:
{noformat}
2015-07-30 17:47:25,376  WARN SignalXCommand:523 - SERVER[rkanter-MBP.local] USER[rkanter] GROUP[-] TOKEN[] APP[shell-wf] JOB[000-150730174245156-oozie-rkan-W] ACTION[000-150730174245156-oozie-rkan-W@fail-output] Exception in SignalXCommand when processing Kill node message: Encountered "{", expected one of ["}", ".", ">", "gt", "<", "lt", "==", "eq", "<=", "le", ">=", "ge", "!=", "ne", "[", "+", "-", "*", "/", "div", "%", "mod", "and", "&&", "or", "||", ":", "(", "?"]
javax.servlet.jsp.el.ELException: Encountered "{", expected one of ["}", ".", ">", "gt", "<", "lt", "==", "eq", "<=", "le", ">=", "ge", "!=", "ne", "[", "+", "-", "*", "/", "div", "%", "mod", "and", "&&", "or", "||", ":", "(", "?"]
at org.apache.commons.el.ExpressionEvaluatorImpl.parseExpressionString(ExpressionEvaluatorImpl.java:320)
at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:250)
at org.apache.commons.el.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:190)
at org.apache.oozie.util.ELEvaluator.evaluate(ELEvaluator.java:204)
at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:300)
at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.command.XCommand.call(XCommand.java:356)
at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:61)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.command.XCommand.call(XCommand.java:356)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:352)
at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:64)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.command.XCommand.call(XCommand.java:356)
at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:435)
at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:76)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.command.XCommand.call(XCommand.java:356)
at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:280)
at org.apache.oozie.command.wf.ActionEndXCommand.execute(ActionEndXCommand.java:61)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Here's an example of what a failed kill node looks like:
{noformat}
$ oozie job -info 000-150730174245156-oozie-rkan-W@fail-output
ID : 000-150730174245156-oozie-rkan-W@fail-output

Console URL       : -
Error Code        : E0756
Error Message     : E0756: Exception parsing Kill node message [Encountered "{", expected one of ["}", ".", ">", "gt", "<", "lt", "==", "eq", "<=", "le", ">=", "ge", "!=", "ne", "[", "+", "-", "*", "/", "div", "%", "mod", "and", "&&", "or", "||", ":", "(", "?"]]
External ID       : -
External Status   : OK
Name              : fail-output
Retries           : 0
Tracker URI       : -
Type              : :KILL:
Started           : 2015-07-31 00:47 GMT
Status            : ERROR
Ended             : 2015-07-31 00:47 GMT

{noformat}
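
For readers who want the gist without opening the patch, here is a minimal sketch of the behavior described above: catch the failure while evaluating the kill node's message, record it on the node, and carry on instead of throwing.  This is not the code from the patch.  The class KillNodeMessageSketch, the KillNode record, and the toy evaluate() method are all hypothetical stand-ins (the real evaluation goes through ELEvaluator, as the stack trace shows); only the E0756 code, the message wording, and the ERROR status are taken from the output above.

```java
public class KillNodeMessageSketch {

    /** Hypothetical stand-in for the kill node's action record. */
    public static final class KillNode {
        public final String name;
        public String status = "OK";
        public String errorCode;     // stays null when the message parses cleanly
        public String message;

        KillNode(String name) {
            this.name = name;
        }
    }

    /**
     * Toy evaluator (an assumption, not Oozie's EL parser): it only rejects
     * a "{" that appears while another "{...}" is still open.
     */
    public static String evaluate(String template) {
        int depth = 0;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c == '{') {
                if (depth > 0) {
                    throw new IllegalArgumentException("Encountered \"{\"");
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
            }
        }
        return template;
    }

    /**
     * Evaluates the kill node message. On failure the node is marked ERROR
     * and the exception text is stored, rather than propagating an exception
     * that would cause the command to fail (and possibly be retried).
     */
    public static KillNode processKillNode(String name, String messageTemplate) {
        KillNode node = new KillNode(name);
        try {
            node.message = evaluate(messageTemplate);
        } catch (RuntimeException ex) {
            node.status = "ERROR";
            node.errorCode = "E0756";
            node.message = "E0756: Exception parsing Kill node message ["
                    + ex.getMessage() + "]";
        }
        return node; // the caller proceeds to kill the workflow either way
    }

    public static void main(String[] args) {
        KillNode good = processKillNode("fail-output",
                "Shell action failed: ${wf:errorMessage(wf:lastErrorNode())}");
        KillNode bad = processKillNode("fail-output",
                "Shell action failed: ${wf:errorMessage(${wf:lastErrorNode()})}");
        System.out.println(good.status + " " + good.errorCode);
        System.out.println(bad.status + " " + bad.errorCode + " " + bad.message);
    }
}
```

The point of the design is that a broken kill message is reported on the kill node itself while the workflow still reaches a terminal state, so nothing is left behind to clog the callable queue.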
