[jira] [Updated] (OOZIE-3723) Oozie service permanently caches workflow-supplied FileSystem connectivity configuration properties for obtaining HDFS Credentials until restarted

2024-08-28 Thread Andrew Olson (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated OOZIE-3723:

Description: 
We recently encountered two separate issues that both ultimately required 
an Oozie service restart to resolve after extensive troubleshooting. In both 
situations it was apparent that certain workflow-supplied configuration 
properties related to remote FileSystem connectivity, used to obtain HDFS 
credentials for remote clusters (via {{mapreduce.job.hdfs-servers}}), are 
being retained permanently within some kind of cache in the Oozie service or 
underlying Hadoop code (cached global FileSystem instances, perhaps). The 
previously "poisoned" cached values supersede corrected values after the 
workflow configuration is fixed, giving us no known way to fix the problem 
without restarting the Oozie service. We confirmed that the {{hdfs-site.xml}} 
and {{oozie-site.xml}} files where Oozie is running had not been updated since 
the prior restart, so this is not a basic case of stale server-side 
configuration.
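
The suspected mechanism can be illustrated with a toy model. This is only a 
sketch of the hypothesis, not Hadoop or Oozie code: FakeFileSystem and 
FsCacheSketch are hypothetical names, and the remote NameNode URI is made up. 
It assumes a cache keyed by scheme, authority, and user that does not look at 
the configuration, so the first caller's settings stick to the cached instance. 
Whether this is what actually happens inside the Oozie service is unverified.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy stand-in for a file system client (hypothetical, not a Hadoop class).
final class FakeFileSystem {
    // Configuration captured once, when the instance is created.
    private final Map<String, String> conf;

    FakeFileSystem(Map<String, String> conf) {
        this.conf = new HashMap<>(conf);
    }

    String principalPattern() {
        // "*" stands in for the system default mentioned in the report.
        return conf.getOrDefault("dfs.namenode.kerberos.principal.pattern", "*");
    }
}

// Toy cache (hypothetical) whose key deliberately omits the configuration.
final class FsCacheSketch {
    private final Map<String, FakeFileSystem> cache = new ConcurrentHashMap<>();

    FakeFileSystem get(URI uri, String user, Map<String, String> conf) {
        String key = uri.getScheme() + "://" + uri.getAuthority() + "@" + user;
        // On a cache hit the supplied conf is ignored entirely.
        return cache.computeIfAbsent(key, k -> new FakeFileSystem(conf));
    }

    public static void main(String[] args) {
        FsCacheSketch cache = new FsCacheSketch();
        URI remote = URI.create("hdfs://remote-nn:8020"); // made-up remote cluster

        // First workflow run carries the incorrect override.
        Map<String, String> bad = new HashMap<>();
        bad.put("dfs.namenode.kerberos.principal.pattern", "*/*@*");
        cache.get(remote, "oozie", bad);

        // Corrected workflow run no longer sets the property.
        FakeFileSystem fs = cache.get(remote, "oozie", new HashMap<>());

        // Still reports the pattern from the first run, not the default.
        System.out.println(fs.principalPattern());
    }
}
```

In this model the only ways to get the corrected value are to evict the cached 
entry or to restart the process, which matches the behavior we observed.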

We are running Oozie version 5.2.0 in this environment.

Complete stack traces are provided below.

Issue 1:

A workflow incorrectly set {{dfs.namenode.kerberos.principal.pattern}} to 
{{*/*@*}}, but our system default is {{*}}.
{noformat}
org.apache.oozie.action.ActionExecutorException: JA009: Couldn't set up IO 
streams: java.lang.IllegalArgumentException: Server has invalid Kerberos 
principal: nn/hostname.some.domain@kerberos.realm.com, doesn't match the 
pattern: '*/*@*'
at 
org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
at 
org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
at 
org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1134)
at 
org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1644)
at 
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:243)
at 
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:68)
at org.apache.oozie.command.XCommand.call(XCommand.java:290)
at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:363)
at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:210)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Couldn't set up IO streams: 
java.lang.IllegalArgumentException: Server has invalid Kerberos principal: 
nn/hostname.some.domain@kerberos.realm.com, doesn't match the pattern: 
'*/*@*'
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:894)
at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1677)
at org.apache.hadoop.ipc.Client.call(Client.java:1502)
at org.apache.hadoop.ipc.Client.call(Client.java:1455)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
at com.sun.proxy.$Proxy35.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy36.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getDelegatio

[jira] [Updated] (OOZIE-3723) Oozie service permanently caches workflow-supplied FileSystem connectivity configuration properties for obtaining HDFS Credentials until restarted

2024-08-28 Thread Andrew Olson (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated OOZIE-3723:

Description: 
We recently encountered two separate issues that both required an Oozie 
service restart to resolve. In both situations it was apparent that incorrect 
workflow-supplied configuration properties related to remote FileSystem 
connectivity, used to obtain HDFS credentials for remote clusters (via 
{{mapreduce.job.hdfs-servers}}), are being retained permanently within some 
kind of cache in the Oozie service or underlying Hadoop code. These cached 
values supersede corrected values after the workflow configuration is 
fixed, giving us no known way to fix the problem without restarting the Oozie 
service. We confirmed that the {{hdfs-site.xml}} and {{oozie-site.xml}} files 
where Oozie is running had not been updated since the prior restart, so this is 
not a basic case of stale configuration.

We are running Oozie version 5.2.0 in this environment.

Complete stack traces are provided below.

Issue 1:

A workflow incorrectly set {{dfs.namenode.kerberos.principal.pattern}} to 
{{*/*@*}}, but our system default is {{*}}.
{noformat}
org.apache.oozie.action.ActionExecutorException: JA009: Couldn't set up IO 
streams: java.lang.IllegalArgumentException: Server has invalid Kerberos 
principal: nn/hostname.some.domain@kerberos.realm.com, doesn't match the 
pattern: '*/*@*'
at 
org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
at 
org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
at 
org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1134)
at 
org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1644)
at 
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:243)
at 
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:68)
at org.apache.oozie.command.XCommand.call(XCommand.java:290)
at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:363)
at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:210)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Couldn't set up IO streams: 
java.lang.IllegalArgumentException: Server has invalid Kerberos principal: 
nn/hostname.some.domain@kerberos.realm.com, doesn't match the pattern: 
'*/*@*'
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:894)
at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1677)
at org.apache.hadoop.ipc.Client.call(Client.java:1502)
at org.apache.hadoop.ipc.Client.call(Client.java:1455)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
at com.sun.proxy.$Proxy35.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy36.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996)
at 
org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(D

[jira] [Created] (OOZIE-3723) Oozie service permanently caches workflow-supplied FileSystem connectivity configuration properties for obtaining HDFS Credentials until restarted

2024-08-28 Thread Andrew Olson (Jira)
Andrew Olson created OOZIE-3723:
---

 Summary: Oozie service permanently caches workflow-supplied 
FileSystem connectivity configuration properties for obtaining HDFS Credentials 
until restarted
 Key: OOZIE-3723
 URL: https://issues.apache.org/jira/browse/OOZIE-3723
 Project: Oozie
  Issue Type: Bug
  Components: workflow
Reporter: Andrew Olson


We recently encountered two separate issues that both required an Oozie 
service restart to resolve. In both situations it was apparent that incorrect 
workflow-supplied configuration properties related to remote FileSystem 
connectivity, used to obtain HDFS credentials for remote clusters (via 
{{mapreduce.job.hdfs-servers}}), are being retained permanently within some 
kind of cache in the Oozie service or underlying Hadoop code. These cached 
values supersede corrected values after the workflow configuration is 
fixed, giving us no known way to fix the problem without restarting the Oozie 
service. We confirmed that the {{hdfs-site.xml}} and {{oozie-site.xml}} files 
where Oozie is running had not been updated since the prior restart, so this is 
not a basic case of stale configuration.

We are running Oozie version 5.2.0 in this environment.

Complete stack traces are provided below.

Issue 1:

A workflow incorrectly set {{dfs.namenode.kerberos.principal.pattern}} to 
{{*/*@*}}, but our system default is {{*}}.
{noformat}
org.apache.oozie.action.ActionExecutorException: JA009: Couldn't set up IO 
streams: java.lang.IllegalArgumentException: Server has invalid Kerberos 
principal: nn/hostname.some.domain@kerberos.realm.com, doesn't match the 
pattern: '*/*@*'
at 
org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
at 
org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
at 
org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1134)
at 
org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1644)
at 
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:243)
at 
org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:68)
at org.apache.oozie.command.XCommand.call(XCommand.java:290)
at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:363)
at 
org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:210)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Couldn't set up IO streams: 
java.lang.IllegalArgumentException: Server has invalid Kerberos principal: 
nn/hostname.some.domain@kerberos.realm.com, doesn't match the pattern: 
'*/*@*'
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:894)
at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1677)
at org.apache.hadoop.ipc.Client.call(Client.java:1502)
at org.apache.hadoop.ipc.Client.call(Client.java:1455)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
at com.sun.proxy.$Proxy35.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy36.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.getDelega

[jira] [Commented] (OOZIE-3719) Improve coordinator scope range checking

2024-08-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870681#comment-17870681
 ] 

Dénes Bodó commented on OOZIE-3719:
---

I was not able to fix the build issues in Jenkins, nor will I have the 
bandwidth to work on them in the near future. I asked for help on 
[priv...@oozie.apache.org|mailto:private@oozie.apache]; no response so far. 
On a best-effort basis, I will try to find someone who can do the proper 
testing or fix the build issues.

> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, OOZIE-3719-005.patch, OOZIE-3719-006.patch, 
> OOZIE-3719-007.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch
>
>
> !image-2023-09-15-02-47-52-819.png!
>  
> Looking further into the code, focusing on the action and type query strings:
> We can see that the filter variable gets its value from the 
> requestsParameters.
> Once the filter parameter is populated, an if statement checks whether 
> scope and type are not null, and next
> the code checks whether logRetrievalType is equal to JOB_LOG_ACTION (which is 
> the action query string).
>  
> Next, the value of logRetrievalScope is split by "," and the code enters the 
> if block.
> In the block where ranges of actions are processed ( if (s.contains("-")) \{ 
> ... } ), an attacker could potentially
> send a specially crafted request with a massive range, such as "1-100". 
> This would create a for loop
> iterating and adding that many actions to the actionSet, consuming CPU and 
> memory resources.
> Though there is a subsequent check against maxNumActionsForLog, this check 
> only happens after all the iterations,
> allowing an attacker to consume resources before this check is made:
>  
> !image-2023-09-15-02-52-09-320.png!
>  
>  
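
For illustration, the kind of early range check the issue asks for could look 
roughly like the sketch below. This is not the actual Oozie code and not the 
attached patch: ScopeRangeSketch and parseScope are hypothetical names, and 
only the identifiers actionSet and maxNumActionsForLog are borrowed from the 
description above. The idea is to compute the size of a range arithmetically 
and reject it before the for loop runs.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of validating a scope range before iterating over it.
final class ScopeRangeSketch {

    static Set<Integer> parseScope(String scope, int maxNumActionsForLog) {
        Set<Integer> actionSet = new LinkedHashSet<>();
        for (String part : scope.split(",")) {
            String s = part.trim();
            if (s.contains("-")) {
                String[] bounds = s.split("-");
                if (bounds.length != 2) {
                    throw new IllegalArgumentException("Malformed range: " + s);
                }
                int start = Integer.parseInt(bounds[0].trim());
                int end = Integer.parseInt(bounds[1].trim());
                if (start > end) {
                    throw new IllegalArgumentException("Reversed range: " + s);
                }
                // Size is computed in long arithmetic to avoid int overflow,
                // and checked BEFORE any element is added. The check is
                // conservative: overlapping ranges are counted twice.
                long rangeSize = (long) end - start + 1;
                if (actionSet.size() + rangeSize > maxNumActionsForLog) {
                    throw new IllegalArgumentException(
                            "Scope exceeds limit of " + maxNumActionsForLog);
                }
                for (int i = start; i <= end; i++) {
                    actionSet.add(i);
                }
            } else {
                actionSet.add(Integer.parseInt(s));
                if (actionSet.size() > maxNumActionsForLog) {
                    throw new IllegalArgumentException(
                            "Scope exceeds limit of " + maxNumActionsForLog);
                }
            }
        }
        return actionSet;
    }

    public static void main(String[] args) {
        // A small scope is accepted and expanded.
        System.out.println(parseScope("1-3,7", 10));
        // An oversized range is rejected without iterating over it.
        try {
            parseScope("1-2000000000", 50);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a check of this shape the cost of rejecting an oversized request no longer 
depends on the size of the range the client asks for.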



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (OOZIE-3719) Improve coordinator scope range checking

2024-07-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866680#comment-17866680
 ] 

János Makai commented on OOZIE-3719:


The latest patch looks good to me, thanks [~dionusos]!

> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, OOZIE-3719-005.patch, OOZIE-3719-006.patch, 
> OOZIE-3719-007.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch


[jira] [Commented] (OOZIE-3719) Improve coordinator scope range checking

2024-07-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866382#comment-17866382
 ] 

Hadoop QA commented on OOZIE-3719:
--

PreCommit-OOZIE-Build started


> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, OOZIE-3719-005.patch, OOZIE-3719-006.patch, 
> OOZIE-3719-007.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch


[jira] [Updated] (OOZIE-3719) Improve coordinator scope range checking

2024-07-16 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó updated OOZIE-3719:
--
Attachment: OOZIE-3719-007.patch

> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, OOZIE-3719-005.patch, OOZIE-3719-006.patch, 
> OOZIE-3719-007.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch


[jira] [Commented] (OOZIE-3719) Improve coordinator scope range checking

2024-07-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866243#comment-17866243
 ] 

Hadoop QA commented on OOZIE-3719:
--

PreCommit-OOZIE-Build started


> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, OOZIE-3719-005.patch, OOZIE-3719-006.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch


[jira] [Updated] (OOZIE-3719) Improve coordinator scope range checking

2024-07-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó updated OOZIE-3719:
--
Attachment: OOZIE-3719-006.patch

> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, OOZIE-3719-005.patch, OOZIE-3719-006.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch


[jira] [Commented] (OOZIE-3719) Improve coordinator scope range checking

2024-07-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866174#comment-17866174
 ] 

Hadoop QA commented on OOZIE-3719:
--


Testing JIRA OOZIE-3719

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch




> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, OOZIE-3719-005.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch


[jira] [Commented] (OOZIE-3719) Improve coordinator scope range checking

2024-07-15 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866172#comment-17866172
 ] 

Hadoop QA commented on OOZIE-3719:
--

PreCommit-OOZIE-Build started


> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, OOZIE-3719-005.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch


[jira] [Updated] (OOZIE-3719) Improve coordinator scope range checking

2024-07-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó updated OOZIE-3719:
--
Attachment: OOZIE-3719-005.patch

> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, OOZIE-3719-005.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch


[jira] [Updated] (OOZIE-3719) Improve coordinator scope range checking

2024-07-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó updated OOZIE-3719:
--
Summary: Improve coordinator scope range checking  (was: Apache Oozie Regex 
Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting 
Access for Intended Users)

> Improve coordinator scope range checking
> 
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch


[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-07-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864260#comment-17864260
 ] 

Hadoop QA commented on OOZIE-3719:
--

PreCommit-OOZIE-Build started


> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch


[jira] [Updated] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-07-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó updated OOZIE-3719:
--
Attachment: OOZIE-3719-003.patch

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> OOZIE-3719-003.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch


[jira] [Comment Edited] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-07-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864062#comment-17864062
 ] 

János Makai edited comment on OOZIE-3719 at 7/9/24 8:30 AM:


Looks like the *PreCommit-OOZIE-Build* is failing for the recent patch(es) but 
this seems like an +unrelated issue+ to the change:
[https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/219/consoleFull]

Other than this the patch looks good so far, I'm waiting for the corresponding 
unit tests to be created.

Thanks [~dionusos] 


was (Author: jmakai):
Looks like the *PreCommit-OOZIE-Build* is failing for the recent patch(es) but 
this seems like an +unrelated issue+ to the change{+}
[https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/219/consoleFull]

{+}Other than this the patch looks good so far, I'm waiting for the 
corresponding unit tests to be created.

Thanks [~dionusos] 

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch


[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-07-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864062#comment-17864062
 ] 

János Makai commented on OOZIE-3719:


Looks like the *PreCommit-OOZIE-Build* is failing for the recent patch(es) but 
this seems like an +unrelated issue+ to the change:
[https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/219/consoleFull]

Other than this the patch looks good so far, I'm waiting for the 
corresponding unit tests to be created.

Thanks [~dionusos] 

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch


[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-07-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17864035#comment-17864035
 ] 

Hadoop QA commented on OOZIE-3719:
--

PreCommit-OOZIE-Build started


> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch


[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-07-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863825#comment-17863825
 ] 

Hadoop QA commented on OOZIE-3719:
--

PreCommit-OOZIE-Build started


> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch


[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-07-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863824#comment-17863824
 ] 

Hadoop QA commented on OOZIE-3719:
--

PreCommit-OOZIE-Build started


> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch
>
>


[jira] [Updated] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-07-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó updated OOZIE-3719:
--
Attachment: OOZIE-3719-002.patch

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, OOZIE-3719-002.patch, 
> image-2023-09-15-02-47-52-819.png, image-2023-09-15-02-49-14-531.png, 
> image-2023-09-15-02-52-09-320.png, oozie3719.patch
>
>


[jira] [Updated] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-02-14 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó updated OOZIE-3719:
--
Fix Version/s: (was: 5.3.0)

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: OOZIE-3719-001.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch
>
>


[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-02-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817263#comment-17817263
 ] 

Dénes Bodó commented on OOZIE-3719:
---

[~SanjayKumarSahu] 

The [Jenkins job|https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/216/] 
cannot apply the uploaded patch using this command:
{code:bash}
git apply --check -v -p0 < OOZIE-3719-001.patch{code}
 

Could you please format your patch according to this description?

[https://cwiki.apache.org/confluence/display/OOZIE/How+To+Contribute]
{code:bash}
git diff --no-prefix {code}
Thank you

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Fix For: 5.3.0
>
> Attachments: OOZIE-3719-001.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch
>
>


[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-02-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817246#comment-17817246
 ] 

Hadoop QA commented on OOZIE-3719:
--


Testing JIRA OOZIE-3719

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch




> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Fix For: 5.3.0
>
> Attachments: OOZIE-3719-001.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch
>
>


[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2024-02-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817240#comment-17817240
 ] 

Hadoop QA commented on OOZIE-3719:
--

PreCommit-OOZIE-Build started


> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Fix For: 5.3.0
>
> Attachments: OOZIE-3719-001.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch
>
>


[jira] [Commented] (OOZIE-3722) Workflow actions can stuck in RUNNING state when DB connections are killed on the DB side

2024-02-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17817237#comment-17817237
 ] 

Dénes Bodó commented on OOZIE-3722:
---

If anybody who finds this ticket has a suggestion, solution, or question, 
please do not hesitate to post it here or on any of the Oozie mailing lists.

> Workflow actions can stuck in RUNNING state when DB connections are killed on 
> the DB side
> -
>
> Key: OOZIE-3722
> URL: https://issues.apache.org/jira/browse/OOZIE-3722
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Dénes Bodó
>Assignee: Dénes Bodó
>Priority: Critical
>


[jira] [Created] (OOZIE-3722) Workflow actions can stuck in RUNNING state when DB connections are killed on the DB side

2024-02-13 Thread Jira
Dénes Bodó created OOZIE-3722:
-

 Summary: Workflow actions can stuck in RUNNING state when DB 
connections are killed on the DB side
 Key: OOZIE-3722
 URL: https://issues.apache.org/jira/browse/OOZIE-3722
 Project: Oozie
  Issue Type: Bug
  Components: core
Affects Versions: 5.2.1
Reporter: Dénes Bodó
Assignee: Dénes Bodó


Apache Oozie 5.2.1 uses OpenJPA 2.4.2, commons-dbcp 1.4, and commons-pool 
1.5.4. These are ancient versions, I know.
h1. Description

When network issues or "maintenance work" on the DB side (especially with 
PostgreSQL) cause DB connections to be closed, the connection pool on the 
client side becomes exhausted. Many threads are then waiting at this point:
{noformat}
"pool-2-thread-4" #20 prio=5 os_prio=31 tid=0x7faf7903b800 nid=0x8603 waiting on condition [0x00030f3e7000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00066aca8e70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324)
{noformat}
According to my observation, this is because neither the JDBC driver's 
connection nor the abstract DBCP connection 
_org.apache.commons.dbcp2.PoolableConnection_ gets closed on the client side.

This issue can leave workflow actions stuck in RUNNING state: the thread that 
would update the DB after XActionExecutor.check() never gets a connection, so 
it stays blocked indefinitely.
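
The waiting threads in the dump above are parked in takeFirst(), which blocks without a timeout. As an illustration only, the following sketch uses java.util.concurrent.LinkedBlockingDeque from the JDK as a toy stand-in for the pool's idle-object queue to show how a timed pollFirst lets a borrower give up instead of waiting forever. BoundedBorrowDemo and its methods are hypothetical names, not commons-pool or DBCP API, and the sketch does not address the underlying problem of dead connections never being returned to the pool.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Toy stand-in for a connection pool; hypothetical, not commons-pool or DBCP code.
final class BoundedBorrowDemo {

    // Idle "connections" are modelled as plain strings.
    private final LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();

    BoundedBorrowDemo(int size) {
        for (int i = 0; i < size; i++) {
            idle.add("conn-" + i);
        }
    }

    /**
     * Waits at most timeoutMillis for an idle connection.
     * Returns null when the pool stays exhausted, where takeFirst()
     * would keep the calling thread parked indefinitely.
     */
    String borrow(long timeoutMillis) {
        try {
            return idle.pollFirst(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Returns a connection to the front of the idle queue. */
    void giveBack(String conn) {
        idle.addFirst(conn);
    }
}
```

A caller that receives null can log the exhaustion and retry or fail the operation, rather than leaving the action in RUNNING state with no visible error.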

 
h1. Workaround

Restart Oozie and/or fix the DB/network issue.
h1. Repro

(Un)fortunately, I can also reproduce the issue using the latest and greatest 
commons-dbcp 2.11.0 and commons-pool 2.12.0 along with OpenJPA 3.2.2.

I've created a Java application to reproduce the issue: 
[https://github.com/dionusos/pool_exhausted_repro]. See README.md for detailed 
repro steps.

DBCP-595 was created to ask the DBCP/Pool teams for help. I am working on 
providing them with the necessary information.





[jira] [Updated] (OOZIE-3721) Subsidiaries freeze in the status of "RUNNING" during a high load on the cluster

2024-01-29 Thread Cecily Myles (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cecily Myles updated OOZIE-3721:

Description: 
When my cluster is under heavy load, subworkflows hang in the "RUNNING" 
status. I see this problem when working with Hive tables, but I also managed to 
reproduce it by launching the usual pi calculation in many subworkflows to 
imitate the load.

I launch an Oozie workflow with the following structure:
{code:java}
-- Oozie workflow
   --> subworkflow_1
       -- fork_1
       -- fork_2
       -- ...
       -- fork_n
   --> subworkflow_2
       -- fork_1
       -- fork_2
       -- ...
       -- fork_n
{code}
One of the forks has status "RUNNING", but if you open this fork, it has 
"SUCCESS" status.

Parent workflow:
{code:java}
Job ID : 0061971-240125161152217-oozie-oozi-W

Workflow Name : test-subworkflow
App Path      : hdfs://mycluster:8020/user/cecyl/subwf/job
Status        : RUNNING
Run           : 0
User          : cecyl
Group         : -
Created       : 2024-01-25 15:55 GMT
Started       : 2024-01-25 15:55 GMT
Last Modified : 2024-01-30 06:24 GMT
Ended         : -
CoordAction ID: -

Actions
-----------------------------------------------------------------------------------------------------------------
ID                                            Status   Ext ID                                Ext Status  Err Code
-----------------------------------------------------------------------------------------------------------------
0061971-240125161152217-oozie-oozi-W@:start:  OK       -                                     OK          -
0061971-240125161152217-oozie-oozi-W@fork     OK       -                                     OK          -
0061971-240125161152217-oozie-oozi-W@fork7    OK       0067643-240125161152217-oozie-oozi-W  SUCCEEDED   -
0061971-240125161152217-oozie-oozi-W@fork9    OK       0067640-240125161152217-oozie-oozi-W  SUCCEEDED   -
0061971-240125161152217-oozie-oozi-W@fork10   RUNNING  0067641-240125161152217-oozie-oozi-W  RUNNING     -
0061971-240125161152217-oozie-oozi-W@fork5    OK       0067645-240125161152217-oozie-oozi-W  SUCCEEDED   -
-----------------------------------------------------------------------------------------------------------------
{code}
Running subworkflow:
{code:java}
Job ID : 0067641-240125161152217-oozie-oozi-W

Workflow Name : test-subworkflow
App Path      : hdfs://mycluster:8020/user/cecyl/subwf
Status        : RUNNING
Run           : 0
User          : cecyl
Group         : -
Created       : 2024-01-26 04:20 GMT
Started       : 2024-01-26 04:20 GMT
Last Modified : 2024-01-26 08:23 GMT
Ended         : -
CoordAction ID: 0061971-240125161152217-oozie-oozi-W

Actions
-----------------------------------------------------------------------------------------------------------------
ID                                            Status   Ext ID                                Ext Status  Err Code
-----------------------------------------------------------------------------------------------------------------
0067641-240125161152217-oozie-oozi-W@:start:  OK       -                                     OK          -
0067641-240125161152217-oozie-oozi-W@fork     OK       -                                     OK          -
0067641-240125161152217-oozie-oozi-W@fork21   RUNNING  application_1706187939089_147514      RUNNING     -
0067641-240125161152217-oozie-oozi-W@fork22   RUNNING  app

[jira] [Created] (OOZIE-3721) Subsidiaries freeze in the status of "RUNNING" during a high load on the cluster

2024-01-29 Thread Cecily Myles (Jira)
Cecily Myles created OOZIE-3721:
---

 Summary: Subsidiaries freeze in the status of "RUNNING" during a 
high load on the cluster
 Key: OOZIE-3721
 URL: https://issues.apache.org/jira/browse/OOZIE-3721
 Project: Oozie
  Issue Type: Bug
  Components: core
Affects Versions: 5.2.0
Reporter: Cecily Myles


When my cluster is under heavy load, subworkflows hang in the "RUNNING" 
status. I see this problem when working with Hive tables, but I also managed to 
reproduce it by launching the usual pi calculation in many subworkflows to 
imitate the load.

I launch an Oozie workflow with the following structure:
{code:java}
-- Oozie workflow
   --> subworkflow_1
       -- fork_1
       -- fork_2
       -- ...
       -- fork_n
   --> subworkflow_2
       -- fork_1
       -- fork_2
       -- ...
       -- fork_n
{code}
One of the forks has status "RUNNING", but if you open this fork, it has 
"SUCCESS" status.

Parent workflow:
{code:java}
Job ID : 0061971-240125161152217-oozie-oozi-W

Workflow Name : test-subworkflow
App Path      : hdfs://mycluster:8020/user/cecyl/subwf/job
Status        : RUNNING
Run           : 0
User          : cecyl
Group         : -
Created       : 2024-01-25 15:55 GMT
Started       : 2024-01-25 15:55 GMT
Last Modified : 2024-01-30 06:24 GMT
Ended         : -
CoordAction ID: -

Actions
-----------------------------------------------------------------------------------------------------------------
ID                                            Status   Ext ID                                Ext Status  Err Code
-----------------------------------------------------------------------------------------------------------------
0061971-240125161152217-oozie-oozi-W@:start:  OK       -                                     OK          -
0061971-240125161152217-oozie-oozi-W@fork     OK       -                                     OK          -
0061971-240125161152217-oozie-oozi-W@fork7    OK       0067643-240125161152217-oozie-oozi-W  SUCCEEDED   -
0061971-240125161152217-oozie-oozi-W@fork9    OK       0067640-240125161152217-oozie-oozi-W  SUCCEEDED   -
0061971-240125161152217-oozie-oozi-W@fork10   RUNNING  0067641-240125161152217-oozie-oozi-W  RUNNING     -
0061971-240125161152217-oozie-oozi-W@fork5    OK       0067645-240125161152217-oozie-oozi-W  SUCCEEDED   -
-----------------------------------------------------------------------------------------------------------------
{code}
Running subworkflow:
{code:java}
Job ID : 0067641-240125161152217-oozie-oozi-W

Workflow Name : test-subworkflow
App Path      : hdfs://mycluster:8020/user/cecyl/subwf
Status        : RUNNING
Run           : 0
User          : cecyl
Group         : -
Created       : 2024-01-26 04:20 GMT
Started       : 2024-01-26 04:20 GMT
Last Modified : 2024-01-26 08:23 GMT
Ended         : -
CoordAction ID: 0061971-240125161152217-oozie-oozi-W

Actions
-----------------------------------------------------------------------------------------------------------------
ID                                            Status   Ext ID                                Ext Status  Err Code
-----------------------------------------------------------------------------------------------------------------
0067641-240125161152217-oozie-oozi-W@:start:  OK       -                                     OK          -
0067641-240125161152217-oozie-oozi-W@fork

[jira] [Commented] (OOZIE-3720) Upgrade jetty to 9.4.53 due to CVE-2023-44487

2024-01-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811849#comment-17811849
 ] 

Hadoop QA commented on OOZIE-3720:
--


Testing JIRA OOZIE-3720

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch




> Upgrade jetty to 9.4.53 due to CVE-2023-44487
> -
>
> Key: OOZIE-3720
> URL: https://issues.apache.org/jira/browse/OOZIE-3720
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Anmol Sundaram
>Priority: Major
> Attachments: OOZIE-3720-001.patch
>
>
> The latest version of Jetty 9.x is 9.4.53.v20231009, which also fixes 
> CVE-2023-44487. As such, we should see if we can upgrade Jetty to 
> 9.4.53.v20231009 in Oozie.
>  
> PR - https://github.com/apache/oozie/pull/93/files





[jira] [Commented] (OOZIE-3720) Upgrade jetty to 9.4.53 due to CVE-2023-44487

2024-01-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811844#comment-17811844
 ] 

Hadoop QA commented on OOZIE-3720:
--

PreCommit-OOZIE-Build started


> Upgrade jetty to 9.4.53 due to CVE-2023-44487
> -
>
> Key: OOZIE-3720
> URL: https://issues.apache.org/jira/browse/OOZIE-3720
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Anmol Sundaram
>Priority: Major
> Attachments: OOZIE-3720-001.patch
>
>


[jira] [Updated] (OOZIE-3720) Upgrade jetty to 9.4.53 due to CVE-2023-44487

2024-01-29 Thread Anmol Sundaram (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anmol Sundaram updated OOZIE-3720:
--
Attachment: (was: OOZIE-3720-001.patch)

> Upgrade jetty to 9.4.53 due to CVE-2023-44487
> -
>
> Key: OOZIE-3720
> URL: https://issues.apache.org/jira/browse/OOZIE-3720
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Anmol Sundaram
>Priority: Major
> Attachments: OOZIE-3720-001.patch
>
>

[jira] [Updated] (OOZIE-3720) Upgrade jetty to 9.4.53 due to CVE-2023-44487

2024-01-29 Thread Anmol Sundaram (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anmol Sundaram updated OOZIE-3720:
--
Attachment: (was: OOZIE-3720-001-1.patch)

> Upgrade jetty to 9.4.53 due to CVE-2023-44487
> -
>
> Key: OOZIE-3720
> URL: https://issues.apache.org/jira/browse/OOZIE-3720
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Anmol Sundaram
>Priority: Major
> Attachments: OOZIE-3720-001.patch
>
>

[jira] [Updated] (OOZIE-3720) Upgrade jetty to 9.4.53 due to CVE-2023-44487

2024-01-29 Thread Anmol Sundaram (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anmol Sundaram updated OOZIE-3720:
--
Attachment: OOZIE-3720-001.patch

> Upgrade jetty to 9.4.53 due to CVE-2023-44487
> -
>
> Key: OOZIE-3720
> URL: https://issues.apache.org/jira/browse/OOZIE-3720
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Anmol Sundaram
>Priority: Major
> Attachments: OOZIE-3720-001.patch
>
>

[jira] [Updated] (OOZIE-3720) Upgrade jetty to 9.4.53 due to CVE-2023-44487

2024-01-29 Thread Anmol Sundaram (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anmol Sundaram updated OOZIE-3720:
--
Description: 
The latest version for Jetty 9.x is 9.4.53.v20231009. This also fixes the 
CVE-2023-44487. As such, we should see if we can upgrade jetty to 
9.4.53.v20231009 in Oozie.

 

PR - https://github.com/apache/oozie/pull/93/files

  was:The latest version for Jetty 9.x is 9.4.53.v20231009. This also fixes the 
CVE-2023-44487. As such, we should see if we can upgrade jetty to 
9.4.53.v20231009 in Oozie.


> Upgrade jetty to 9.4.53 due to CVE-2023-44487
> -
>
> Key: OOZIE-3720
> URL: https://issues.apache.org/jira/browse/OOZIE-3720
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Anmol Sundaram
>Priority: Major
>
> The latest version for Jetty 9.x is 9.4.53.v20231009. This also fixes the 
> CVE-2023-44487. As such, we should see if we can upgrade jetty to 
> 9.4.53.v20231009 in Oozie.
>  
> PR - https://github.com/apache/oozie/pull/93/files





[jira] [Created] (OOZIE-3720) Upgrade jetty to 9.4.53 due to CVE-2023-44487

2024-01-29 Thread Anmol Sundaram (Jira)
Anmol Sundaram created OOZIE-3720:
-

 Summary: Upgrade jetty to 9.4.53 due to CVE-2023-44487
 Key: OOZIE-3720
 URL: https://issues.apache.org/jira/browse/OOZIE-3720
 Project: Oozie
  Issue Type: Improvement
Reporter: Anmol Sundaram


The latest version for Jetty 9.x is 9.4.53.v20231009. This also fixes the 
CVE-2023-44487. As such, we should see if we can upgrade jetty to 
9.4.53.v20231009 in Oozie.





[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2023-12-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794659#comment-17794659
 ] 

Hadoop QA commented on OOZIE-3719:
--


Testing JIRA OOZIE-3719

Cleaning local git workspace



{color:red}-1{color} Patch failed to apply to head of branch




> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Fix For: 5.3.0
>
> Attachments: OOZIE-3719-001.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch
>
>
> !image-2023-09-15-02-47-52-819.png!
>  
> Looking further into the code, focusing on the action and type query strings,
> we can see that the filter variable gets its value from requestsParameters.
> Once the filter parameter is populated, an if statement checks that scope and
> type are not null, and the code then checks whether logRetrievalType equals
> JOB_LOG_ACTION (which is the action query string).
>  
> Next, the value of logRetrievalScope is split by "," and the code enters the
> if statement. In the block where ranges of actions are processed
> ( if (s.contains("-")) { ... } ), an attacker could potentially send a
> specially crafted request with a massive range, such as "1-100". This would
> run a for loop iterating and adding that many actions to the actionSet,
> consuming CPU and memory resources. Though there is a subsequent check against
> maxNumActionsForLog, this check only happens after all the iterations,
> allowing an attacker to consume resources before this check is made -
>  
> !image-2023-09-15-02-52-09-320.png!
>  
>  





[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2023-12-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794655#comment-17794655
 ] 

Hadoop QA commented on OOZIE-3719:
--

PreCommit-OOZIE-Build started


> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Fix For: 5.3.0
>
> Attachments: OOZIE-3719-001.patch, image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch
>
>
> !image-2023-09-15-02-47-52-819.png!
>  
> Looking further into the code, focusing on the action and type query strings,
> we can see that the filter variable gets its value from requestsParameters.
> Once the filter parameter is populated, an if statement checks that scope and
> type are not null, and the code then checks whether logRetrievalType equals
> JOB_LOG_ACTION (which is the action query string).
>  
> Next, the value of logRetrievalScope is split by "," and the code enters the
> if statement. In the block where ranges of actions are processed
> ( if (s.contains("-")) { ... } ), an attacker could potentially send a
> specially crafted request with a massive range, such as "1-100". This would
> run a for loop iterating and adding that many actions to the actionSet,
> consuming CPU and memory resources. Though there is a subsequent check against
> maxNumActionsForLog, this check only happens after all the iterations,
> allowing an attacker to consume resources before this check is made -
>  
> !image-2023-09-15-02-52-09-320.png!
>  
>  





[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2023-12-08 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794647#comment-17794647
 ] 

Dénes Bodó commented on OOZIE-3719:
---

[~SanjayKumarSahu] Please upload your patch with the following name 
"OOZIE-3719-001.patch" and then push the "Submit Patch" button to start the 
automated build and tests.

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
>     URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch
>
>
> !image-2023-09-15-02-47-52-819.png!
>  
> Looking further into the code, focusing on the action and type query strings,
> we can see that the filter variable gets its value from requestsParameters.
> Once the filter parameter is populated, an if statement checks that scope and
> type are not null, and the code then checks whether logRetrievalType equals
> JOB_LOG_ACTION (which is the action query string).
>  
> Next, the value of logRetrievalScope is split by "," and the code enters the
> if statement. In the block where ranges of actions are processed
> ( if (s.contains("-")) { ... } ), an attacker could potentially send a
> specially crafted request with a massive range, such as "1-100". This would
> run a for loop iterating and adding that many actions to the actionSet,
> consuming CPU and memory resources. Though there is a subsequent check against
> maxNumActionsForLog, this check only happens after all the iterations,
> allowing an attacker to consume resources before this check is made -
>  
> !image-2023-09-15-02-52-09-320.png!
>  
>  





[jira] [Updated] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2023-12-06 Thread Sanjay Kumar Sahu (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Kumar Sahu updated OOZIE-3719:
-
Attachment: oozie3719.patch

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png, 
> oozie3719.patch
>
>
> !image-2023-09-15-02-47-52-819.png!
>  
> Looking further into the code, focusing on the action and type query strings,
> we can see that the filter variable gets its value from requestsParameters.
> Once the filter parameter is populated, an if statement checks that scope and
> type are not null, and the code then checks whether logRetrievalType equals
> JOB_LOG_ACTION (which is the action query string).
>  
> Next, the value of logRetrievalScope is split by "," and the code enters the
> if statement. In the block where ranges of actions are processed
> ( if (s.contains("-")) { ... } ), an attacker could potentially send a
> specially crafted request with a massive range, such as "1-100". This would
> run a for loop iterating and adding that many actions to the actionSet,
> consuming CPU and memory resources. Though there is a subsequent check against
> maxNumActionsForLog, this check only happens after all the iterations,
> allowing an attacker to consume resources before this check is made -
>  
> !image-2023-09-15-02-52-09-320.png!
>  
>  





[jira] [Commented] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2023-12-06 Thread Sanjay Kumar Sahu (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17793748#comment-17793748
 ] 

Sanjay Kumar Sahu commented on OOZIE-3719:
--

PR link : https://github.com/apache/oozie/pull/92

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png
>
>
> !image-2023-09-15-02-47-52-819.png!
>  
> Looking further into the code, focusing on the action and type query strings,
> we can see that the filter variable gets its value from requestsParameters.
> Once the filter parameter is populated, an if statement checks that scope and
> type are not null, and the code then checks whether logRetrievalType equals
> JOB_LOG_ACTION (which is the action query string).
>  
> Next, the value of logRetrievalScope is split by "," and the code enters the
> if statement. In the block where ranges of actions are processed
> ( if (s.contains("-")) { ... } ), an attacker could potentially send a
> specially crafted request with a massive range, such as "1-100". This would
> run a for loop iterating and adding that many actions to the actionSet,
> consuming CPU and memory resources. Though there is a subsequent check against
> maxNumActionsForLog, this check only happens after all the iterations,
> allowing an attacker to consume resources before this check is made -
>  
> !image-2023-09-15-02-52-09-320.png!
>  
>  





[jira] [Assigned] (OOZIE-3718) Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability

2023-10-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó reassigned OOZIE-3718:
-

Assignee: Dénes Bodó

> Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability
> 
>
> Key: OOZIE-3718
> URL: https://issues.apache.org/jira/browse/OOZIE-3718
> Project: Oozie
>  Issue Type: Bug
>Reporter: Nikhil Daf
>Assignee: Dénes Bodó
>Priority: Major
>
> [CVE-2023-36877 - Security Update Guide - Microsoft - Azure Apache Oozie 
> Spoofing 
> Vulnerability|https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-36877]





[jira] [Assigned] (OOZIE-3718) Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability

2023-10-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó reassigned OOZIE-3718:
-

Assignee: (was: Dénes Bodó)

> Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability
> 
>
> Key: OOZIE-3718
> URL: https://issues.apache.org/jira/browse/OOZIE-3718
> Project: Oozie
>  Issue Type: Bug
>Reporter: Nikhil Daf
>Priority: Major
>
> [CVE-2023-36877 - Security Update Guide - Microsoft - Azure Apache Oozie 
> Spoofing 
> Vulnerability|https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-36877]





[jira] [Resolved] (OOZIE-3718) Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability

2023-10-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó resolved OOZIE-3718.
---
Fix Version/s: 5.3.0
   Resolution: Fixed

Thanks [~NikhilDaf] for the fix. Your change is committed to master.

> Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability
> 
>
> Key: OOZIE-3718
> URL: https://issues.apache.org/jira/browse/OOZIE-3718
> Project: Oozie
>  Issue Type: Bug
>Reporter: Nikhil Daf
>Priority: Major
> Fix For: 5.3.0
>
>
> [CVE-2023-36877 - Security Update Guide - Microsoft - Azure Apache Oozie 
> Spoofing 
> Vulnerability|https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-36877]





[jira] [Commented] (OOZIE-3718) Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability

2023-10-25 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17779374#comment-17779374
 ] 

ASF subversion and git services commented on OOZIE-3718:


Commit 318fac5391eb1b7e9b868ee6fb64f4e9c49850cb in oozie's branch 
refs/heads/master from Denes Bodo
[ https://gitbox.apache.org/repos/asf?p=oozie.git;h=318fac539 ]

OOZIE-3718 Improve Oozie Web UI filtering (NikhilDaf via dionusos)


> Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability
> 
>
> Key: OOZIE-3718
> URL: https://issues.apache.org/jira/browse/OOZIE-3718
> Project: Oozie
>  Issue Type: Bug
>Reporter: Nikhil Daf
>Priority: Major
>
> [CVE-2023-36877 - Security Update Guide - Microsoft - Azure Apache Oozie 
> Spoofing 
> Vulnerability|https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-36877]





[jira] [Commented] (OOZIE-3718) Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability

2023-10-23 Thread Nikhil Daf (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778581#comment-17778581
 ] 

Nikhil Daf commented on OOZIE-3718:
---

[~dionusos] As this is a security issue, I have been informed not to add the 
details here. I have communicated the info like the repro steps and the patch 
to the apache security team. They will release the fix soon.  

> Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability
> 
>
> Key: OOZIE-3718
> URL: https://issues.apache.org/jira/browse/OOZIE-3718
> Project: Oozie
>  Issue Type: Bug
>Reporter: Nikhil Daf
>Priority: Major
>
> [CVE-2023-36877 - Security Update Guide - Microsoft - Azure Apache Oozie 
> Spoofing 
> Vulnerability|https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-36877]





[jira] [Commented] (OOZIE-3718) Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability

2023-10-17 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776227#comment-17776227
 ] 

Dénes Bodó commented on OOZIE-3718:
---

[~NikhilDaf] could you please attach your patch to the Jira?

Thank you.

> Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability
> 
>
> Key: OOZIE-3718
> URL: https://issues.apache.org/jira/browse/OOZIE-3718
> Project: Oozie
>  Issue Type: Bug
>Reporter: Nikhil Daf
>Priority: Major
>
> [CVE-2023-36877 - Security Update Guide - Microsoft - Azure Apache Oozie 
> Spoofing 
> Vulnerability|https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-36877]





[jira] [Assigned] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2023-09-18 Thread Kinga Marton (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kinga Marton reassigned OOZIE-3719:
---

Assignee: Sanjay Kumar Sahu

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Assignee: Sanjay Kumar Sahu
>Priority: Major
> Attachments: image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png
>
>
> !image-2023-09-15-02-47-52-819.png!
>  
> Looking further into the code, focusing on the action and type query strings,
> we can see that the filter variable gets its value from requestsParameters.
> Once the filter parameter is populated, an if statement checks that scope and
> type are not null, and the code then checks whether logRetrievalType equals
> JOB_LOG_ACTION (which is the action query string).
>  
> Next, the value of logRetrievalScope is split by "," and the code enters the
> if statement. In the block where ranges of actions are processed
> ( if (s.contains("-")) { ... } ), an attacker could potentially send a
> specially crafted request with a massive range, such as "1-100". This would
> run a for loop iterating and adding that many actions to the actionSet,
> consuming CPU and memory resources. Though there is a subsequent check against
> maxNumActionsForLog, this check only happens after all the iterations,
> allowing an attacker to consume resources before this check is made -
>  
> !image-2023-09-15-02-52-09-320.png!
>  
>  





[jira] [Updated] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2023-09-14 Thread Sanjay Kumar Sahu (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Kumar Sahu updated OOZIE-3719:
-
Description: 
!image-2023-09-15-02-47-52-819.png!

 

Looking further into the code, focusing on the action and type query strings,
we can see that the filter variable gets its value from requestsParameters.
Once the filter parameter is populated, an if statement checks that scope and
type are not null, and the code then checks whether logRetrievalType equals
JOB_LOG_ACTION (which is the action query string).

Next, the value of logRetrievalScope is split by "," and the code enters the
if statement. In the block where ranges of actions are processed
( if (s.contains("-")) { ... } ), an attacker could potentially send a
specially crafted request with a massive range, such as "1-100". This would
run a for loop iterating and adding that many actions to the actionSet,
consuming CPU and memory resources. Though there is a subsequent check against
maxNumActionsForLog, this check only happens after all the iterations,
allowing an attacker to consume resources before this check is made -

 

!image-2023-09-15-02-52-09-320.png!

 

 

  was:
!image-2023-09-15-02-47-52-819.png!

 

Looking further into the code, focusing on the action and type query strings,
we can see that the filter variable gets its value from requestsParameters.
Once the filter parameter is populated, an if statement checks that scope and
type are not null, and the code then checks whether logRetrievalType equals
JOB_LOG_ACTION (which is the action query string).

Next, the value of logRetrievalScope is split by "," and the code enters the
if statement. In the block where ranges of actions are processed
( if (s.contains("-")) { ... } ), an attacker could potentially send a
specially crafted request with a massive range, such as "1-100". This would
run a for loop iterating and adding that many actions to the actionSet,
consuming CPU and memory resources. Though there is a subsequent check against
maxNumActionsForLog, this check only happens after all the iterations,
allowing an attacker to consume resources before this check is made -

 

!image-2023-09-15-02-50-26-331.png!

 

 


> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Priority: Major
> Attachments: image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png
>
>
> !image-2023-09-15-02-47-52-819.png!
>  
> Looking further into the code, focusing on the action and type query strings,
> we can see that the filter variable gets its value from requestsParameters.
> Once the filter parameter is populated, an if statement checks that scope and
> type are not null, and the code then checks whether logRetrievalType equals
> JOB_LOG_ACTION (which is the action query string).
>  
> Next, the value of logRetrievalScope is split by "," and the code enters the
> if statement. In the block where ranges of actions are processed
> ( if (s.contains("-")) { ... } ), an attacker could potentially send a
> specially crafted request with a massive range, such as "1-100". This would
> run a for loop iterating and adding that many actions to the actionSet,
> consuming CPU and memory resources. Though there is a subsequent check against
> maxNumActionsForLog, this check only happens after all the iterations,
> allowing an attacker to consume resources before this check is made -
>  
> !image-2023-09-15-02-52-09-320.png!
>  
>  





[jira] [Updated] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2023-09-14 Thread Sanjay Kumar Sahu (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sanjay Kumar Sahu updated OOZIE-3719:
-
Attachment: image-2023-09-15-02-52-09-320.png

> Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege 
> Users Disrupting Access for Intended Users
> --
>
> Key: OOZIE-3719
> URL: https://issues.apache.org/jira/browse/OOZIE-3719
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: Sanjay Kumar Sahu
>Priority: Major
> Attachments: image-2023-09-15-02-47-52-819.png, 
> image-2023-09-15-02-49-14-531.png, image-2023-09-15-02-52-09-320.png
>
>
> !image-2023-09-15-02-47-52-819.png!
>  
> Looking further into the code, focusing on the action and type query strings,
> we can see that the filter variable gets its value from requestsParameters.
> Once the filter parameter is populated, an if statement checks that scope and
> type are not null, and the code then checks whether logRetrievalType equals
> JOB_LOG_ACTION (which is the action query string).
>  
> Next, the value of logRetrievalScope is split by "," and the code enters the
> if statement. In the block where ranges of actions are processed
> ( if (s.contains("-")) { ... } ), an attacker could potentially send a
> specially crafted request with a massive range, such as "1-100". This would
> run a for loop iterating and adding that many actions to the actionSet,
> consuming CPU and memory resources. Though there is a subsequent check against
> maxNumActionsForLog, this check only happens after all the iterations,
> allowing an attacker to consume resources before this check is made -
>  
> !image-2023-09-15-02-50-26-331.png!
>  
>  





[jira] [Created] (OOZIE-3719) Apache Oozie Regex Denial of Service (ReDoS) Vulnerability by Low Privilege Users Disrupting Access for Intended Users

2023-09-14 Thread Sanjay Kumar Sahu (Jira)
Sanjay Kumar Sahu created OOZIE-3719:


 Summary: Apache Oozie Regex Denial of Service (ReDoS) 
Vulnerability by Low Privilege Users Disrupting Access for Intended Users
 Key: OOZIE-3719
 URL: https://issues.apache.org/jira/browse/OOZIE-3719
 Project: Oozie
  Issue Type: Bug
  Components: core
Affects Versions: 5.2.1
Reporter: Sanjay Kumar Sahu
 Attachments: image-2023-09-15-02-47-52-819.png, 
image-2023-09-15-02-49-14-531.png

!image-2023-09-15-02-47-52-819.png!

 

Looking further into the code, focusing on the action and type query strings,
we can see that the filter variable gets its value from requestsParameters.
Once the filter parameter is populated, an if statement checks that scope and
type are not null, and the code then checks whether logRetrievalType equals
JOB_LOG_ACTION (which is the action query string).

Next, the value of logRetrievalScope is split by "," and the code enters the
if statement. In the block where ranges of actions are processed
( if (s.contains("-")) { ... } ), an attacker could potentially send a
specially crafted request with a massive range, such as "1-100". This would
run a for loop iterating and adding that many actions to the actionSet,
consuming CPU and memory resources. Though there is a subsequent check against
maxNumActionsForLog, this check only happens after all the iterations,
allowing an attacker to consume resources before this check is made -

 

!image-2023-09-15-02-50-26-331.png!
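
A minimal Java sketch of the ordering problem described above, offered only as
an illustration. The class BoundedActionScope, its parse method and the
maxActions parameter are hypothetical names, not Oozie code, and this is not
the patch attached to the issue. It compares the size of each range against
the limit before iterating, so an oversized range is rejected without running
the loop.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper (not Oozie code): expands scopes such as "1-3,7" with an upfront bound. */
final class BoundedActionScope {
    private BoundedActionScope() { }

    /**
     * Parses a comma-separated scope of action numbers and ranges.
     * Throws IllegalArgumentException before any range is expanded if the
     * result could exceed maxActions (assumed to play the role of maxNumActionsForLog).
     */
    static Set<Integer> parse(String scope, int maxActions) {
        Set<Integer> actions = new LinkedHashSet<>();
        for (String part : scope.split(",")) {
            String s = part.trim();
            if (s.isEmpty()) {
                continue;
            }
            int dash = s.indexOf('-');
            if (dash > 0) {
                int start = Integer.parseInt(s.substring(0, dash).trim());
                int end = Integer.parseInt(s.substring(dash + 1).trim());
                if (start > end) {
                    throw new IllegalArgumentException("Invalid range: " + s);
                }
                // Size is computed in long arithmetic so a huge range cannot overflow.
                long size = (long) end - start + 1;
                // The limit is checked BEFORE the loop; overlapping ranges are counted
                // twice, which is conservative but never expands too much.
                if (actions.size() + size > maxActions) {
                    throw new IllegalArgumentException("Scope exceeds limit of " + maxActions);
                }
                for (long i = start; i <= end; i++) {
                    actions.add((int) i);
                }
            } else {
                if (actions.size() + 1L > maxActions) {
                    throw new IllegalArgumentException("Scope exceeds limit of " + maxActions);
                }
                actions.add(Integer.parseInt(s));
            }
        }
        return actions;
    }
}
```

With this ordering, a call such as BoundedActionScope.parse("1-1000000000", 50)
fails immediately, while BoundedActionScope.parse("1-3,7", 10) returns the four
requested action numbers.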

 

 





[jira] [Created] (OOZIE-3718) Fix CVE-2023-36877 Azure Apache Oozie Spoofing Vulnerability

2023-09-12 Thread Nikhil Daf (Jira)
Nikhil Daf created OOZIE-3718:
-

 Summary: Fix CVE-2023-36877 Azure Apache Oozie Spoofing 
Vulnerability
 Key: OOZIE-3718
 URL: https://issues.apache.org/jira/browse/OOZIE-3718
 Project: Oozie
  Issue Type: Bug
Reporter: Nikhil Daf


[CVE-2023-36877 - Security Update Guide - Microsoft - Azure Apache Oozie 
Spoofing 
Vulnerability|https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-36877]





[jira] [Commented] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause de

2023-08-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751691#comment-17751691
 ] 

ASF subversion and git services commented on OOZIE-3717:


Commit 3c614c74cb5cb8897a2f95334b5e467227edf740 in oozie's branch 
refs/heads/master from Denes Bodo
[ https://gitbox.apache.org/repos/asf?p=oozie.git;h=3c614c74c ]

OOZIE-3717 When fork actions parallel submit, becasue ForkedActionStartXCommand 
and ActionStartXCommand has the same name, so ForkedActionStartXCommand would 
be lost, and cause deadlock (chenhd via dionusos)
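
The lost-command scenario named in this commit message can be replayed in a
few lines. The following Java sketch is a simplified model and not Oozie code:
UniqueQueueModel, its methods and the key strings are hypothetical, and the
use of distinct keys is only one conceivable remedy; whether the committed
patch works this way is not stated in this thread. The model rejects any
element whose key is already registered, which is the behaviour the issue
attributes to filterDuplicates().

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Simplified, hypothetical model of a queue that rejects elements whose key is already registered. */
final class UniqueQueueModel {
    private final Set<String> uniqueKeys = new HashSet<>();
    private final Queue<String> queue = new ArrayDeque<>();

    /** Returns false, and drops the command, if the key is already registered. */
    boolean enqueue(String key, String command) {
        if (!uniqueKeys.add(key)) {
            return false; // a caller blocked waiting for this command would wait forever
        }
        queue.add(command);
        return true;
    }

    /** Models the key being released when an element starts running. */
    void release(String key) {
        uniqueKeys.remove(key);
    }

    /** Replays the three steps from the issue; returns whether the forked command was accepted. */
    static boolean replay(boolean distinctKeys) {
        UniqueQueueModel model = new UniqueQueueModel();
        String actionKey = "action.start_A"; // hypothetical key for ActionStartXCommand
        String forkedKey = distinctKeys ? "action.forked.start_A" : actionKey;

        // Step 1: thread 1 releases the key of the forked command.
        model.release(forkedKey);
        // Step 2: thread 2 enqueues the plain start command for the same action.
        model.enqueue(actionKey, "ActionStartXCommand");
        // Step 3: thread 1 tries to enqueue the forked command.
        return model.enqueue(forkedKey, "ForkedActionStartXCommand");
    }
}
```

In this model, UniqueQueueModel.replay(false) returns false (the forked command
is dropped because both commands share a key) and UniqueQueueModel.replay(true)
returns true.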


> When fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> --
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Attachments: OOZIE-3717-001.patch, OOZIE-3717-002.patch, 
> OOZIE-3717-003.patch
>
>
> When fork actions are submitted in parallel, a ForkedActionStartXCommand is 
> added, while RecoveryService, when checking pending actions, may add an 
> ActionStartXCommand. If a ForkedActionStartXCommand is enqueued while an 
> ActionStartXCommand for the same action is in the queue, it is lost. The 
> thread that submits the actions in parallel blocks at 
> CallableQueueService.blockingWait(), waiting for the ForkedActionStartXCommand 
> to finish, but that command has been lost, which causes a deadlock.
> {code:java}
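>          Thread 1                           Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |  .....  |
> +----------------------------+          +---------+
> |            ....            |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+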
> Thread 1 and Thread 2 execute in the following order:
>   1. Thread 1 executes removeFromUniqueCallables();
>   2. Thread 2 calls queue(), which adds ActionStartXCommand to the queue and 
> to uniqueCallables;
>   3. Thread 1 calls queue() to add ForkedActionStartXCommand, but 
> filterDuplicates() finds an XCommand with the same name in uniqueCallables, 
> so the command is not added to the queue.
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name 
> and Thread 2 enqueues ActionStartXCommand before Thread 1, 
> ForkedActionStartXCommand is lost (never executed), and the thread that 
> submits the fork actions in parallel blocks at 
> CallableQueueService.blockingWait().
> {code}
>  
> *CallableWrapper's code*
> {code:java}
> public class CallableWrapper extends PriorityDelayQueue.QueueElement
>         implements Runnable, Callable {
>     private Instrumentation.Cron cron;
>
>     public void run() {
>         XCallable callable = null;
>         try {
>             removeFromUniqueCallables();
>             if (Services.get().getSystemMode() == SYSTEM_MODE.SAFEMODE) {
>                 log.info("Oozie is in SAFEMODE, requeuing callable [{0}] with [{1}]ms delay",
>                         getElement().getType(), SAFE_MODE_DELAY);
>                 setDelay(SAFE_MODE_DELAY, TimeUnit.MILLISECONDS);
>                 queue(this, true);
>                 return;
>             }
>             callable = getElement();
>             if (callableBegin(callable)) {
>                 cron.stop();
>                 addInQueueCron(cron);
>                 XLog log = XLog.getLog(getClass());
>                 log.trace("executing callable [{0}]", callable.getName());
>                 try {
>                     // FutureTask.run() will invoke callable.call()
>                     super.run();
>                     incrCounter(INSTR_EXECUTED_COUNTER, 1);
>                     log.trace("executed callable [{0}]", callable.getName());
>                 }
>                 catch (Exception ex) {
>                     incrCounter(INSTR_FAILED_COUNTER, 1);
>                     log.warn("exception callable [{0}], {1}",
>                             callable.getName(), ex.getMessage(),

[jira] [Comment Edited] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cau

2023-08-07 Thread chenhaodan (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751270#comment-17751270
 ] 

chenhaodan edited comment on OOZIE-3717 at 8/7/23 8:20 AM:
---

[~dionusos] I am sorry for that. I have fixed them in [^OOZIE-3717-003.patch]

Thanks for your time.


was (Author: chenhd):
[~dionusos] I am sorry for that. I had fixed them in [^OOZIE-3717-003.patch]

Thanks for your time.

> When fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> --
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Attachments: OOZIE-3717-001.patch, OOZIE-3717-002.patch, 
> OOZIE-3717-003.patch
>
>
> When fork actions are submitted in parallel, a ForkedActionStartXCommand is
> added for each action, while RecoveryService checks pending actions and may
> add an ActionStartXCommand for the same action. If the
> ForkedActionStartXCommand is enqueued while an ActionStartXCommand for the
> same action is already in the queue, the ForkedActionStartXCommand is lost.
> The thread that submits the actions in parallel blocks at
> CallableQueueService.blockingWait(), waiting for the
> ForkedActionStartXCommand to finish; because that command was lost, the wait
> never completes, which causes a deadlock.
> {code:java}
>            Thread 1                      Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |   ...   |
> +----------------------------+          +---------+
> |            ...             |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Thread 1 and Thread 2 run the CallableWrapper logic in the following order:
>   1. Thread 1 calls removeFromUniqueCallables();
>   2. Thread 2 calls queue(), which adds ActionStartXCommand to the queue and
>      to uniqueCallables;
>   3. Thread 1 calls queue() to add ForkedActionStartXCommand to the queue,
>      but filterDuplicates() finds an XCommand with the same name in
>      uniqueCallables, so the add is skipped.
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name
> and Thread 2 enqueues ActionStartXCommand before Thread 1 enqueues its own
> command, ForkedActionStartXCommand is lost (never executed), and the thread
> that submits the fork actions in parallel blocks at
> CallableQueueService.blockingWait().
> {code}
>  
> *CallableWrapper's code*
> {code:java}
> public class CallableWrapper extends PriorityDelayQueue.QueueElement
>         implements Runnable, Callable {
>     private Instrumentation.Cron cron;
>
>     public void run() {
>         XCallable callable = null;
>         try {
>             removeFromUniqueCallables();
>             if (Services.get().getSystemMode() == SYSTEM_MODE.SAFEMODE) {
>                 log.info("Oozie is in SAFEMODE, requeuing callable [{0}] with [{1}]ms delay",
>                         getElement().getType(), SAFE_MODE_DELAY);
>                 setDelay(SAFE_MODE_DELAY, TimeUnit.MILLISECONDS);
>                 queue(this, true);
>                 return;
>             }
>             callable = getElement();
>             if (callableBegin(callable)) {
>                 cron.stop();
>                 addInQueueCron(cron);
>                 XLog log = XLog.getLog(getClass());
>                 log.trace("executing callable [{0}]", callable.getName());
>                 try {
>                     // FutureTask.run() will invoke callable.call()
>                     super.run();
>                     incrCounter(INSTR_EXECUTED_COUNTER, 1);
>                     log.trace("executed callable [{0}]", callable.getName());
>                 }
>                 catch (Exception ex) {
>                     incrCounter(INSTR_FAILED_COUNTER, 1);
>                     log.warn("exception callable [{0}], {1}",
>                             callable.getName(), ex.getMessage(), ex);
>                 }
>             }
>             else {
>                 log.warn("max concurrency for callable [{0}] exceeded, requeuein

[jira] [Comment Edited] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cau

2023-08-07 Thread chenhaodan (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751270#comment-17751270
 ] 

chenhaodan edited comment on OOZIE-3717 at 8/7/23 8:20 AM:
---

[~dionusos] I am sorry for that. I have fixed them in [^OOZIE-3717-003.patch]

Thanks for your time.


was (Author: chenhd):
[~dionusos] I am sorry for that. I has fixed them in [^OOZIE-3717-003.patch] 
[|https://issues.apache.org/jira/secure/DeleteAttachment!default.jspa?id=13545401&deleteAttachmentId=13061935&from=issue]

Thanks for your time.


[jira] [Commented] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause de

2023-08-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751281#comment-17751281
 ] 

Hadoop QA commented on OOZIE-3717:
--


Testing JIRA OOZIE-3717

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:green}+1{color} the patch does not introduce any line longer than 132
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:red}-1{color} There are [3] new bugs found below threshold in total that must be fixed.
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:red}-1{color} There are [3] new bugs found below threshold in [core] that must be fixed.
.You can find the SpotBugs diff here (look for the red and orange ones): core/findbugs-new.html
.The most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query; can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.Unsafe comparison of hash that are susceptible to timing attack: At BulkJPAExecutor.java:[line 206]
.At ShareLibService.java:[line 689]: At ShareLibService.java:[line 695]
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3262
.{color:orange}Tests failed at first run:{color}
TestCoordActionInputCheckXCommand#testCoordActionInputCheckXCommandUniqueness
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 
{color:green}+1 MODERNIZER{color}


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/213/
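
One of the SpotBugs findings listed above concerns a hash comparison that may be susceptible to a timing attack. As a general illustration of what that category of warning usually asks for, and not a statement about BulkJPAExecutor or about this patch, the sketch below compares two digests with the JDK's MessageDigest.isEqual. The class name, method names and sample tokens are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, unrelated to Oozie's classes; it only illustrates the warning category.
class ConstantTimeDigestCompare {

    // Hash a string with SHA-256 and return the raw digest bytes.
    static byte[] sha256(String input) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // MessageDigest.isEqual is the JDK method intended for comparing digests;
    // a plain equals() on strings or arrays typically stops at the first mismatch.
    static boolean digestsMatch(byte[] expected, byte[] actual) {
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        byte[] expected = sha256("sample-token");
        System.out.println(digestsMatch(expected, sha256("sample-token")));
        System.out.println(digestsMatch(expected, sha256("other-token")));
    }
}
```

Whether the flagged line actually compares secrets can only be judged from the SpotBugs diff referenced in the report.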




[jira] [Commented] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause de

2023-08-04 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751273#comment-17751273
 ] 

Hadoop QA commented on OOZIE-3717:
--

PreCommit-OOZIE-Build started



[jira] [Commented] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause de

2023-08-04 Thread chenhaodan (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751270#comment-17751270
 ] 

chenhaodan commented on OOZIE-3717:
---

[~dionusos] I am sorry for that. I have fixed them in [^OOZIE-3717-003.patch]

Thanks for your time.


[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead

2023-08-04 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Attachment: OOZIE-3717-003.patch


[jira] [Commented] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause de

2023-08-04 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751026#comment-17751026
 ] 

Dénes Bodó commented on OOZIE-3717:
---

[~chenhd] Your change looks good to me overall.

However, could you please fix these errors reported by Jenkins, plus one minor 
typo?
{noformat}
. -1 the patch contains 1 star import(s)
. -1 the patch contains 4 line(s) longer than 132 characters 

it would be lose. => it would be lost.
private XLog log => can be final{noformat}
Removing the unused imports from the affected classes is also very welcome.

 

Let me start additional tests on your change (~ 1 day). If those pass and the 
above issues are fixed, I think we are good to merge.

> When fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> --
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
> Attachments: OOZIE-3717-001.patch, OOZIE-3717-002.patch
>
>

[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead

2023-08-04 Thread Jira


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dénes Bodó updated OOZIE-3717:
--
Fix Version/s: (was: trunk)


[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead

2023-07-31 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Description: 
When fork actions are submitted in parallel, a ForkedActionStartXCommand is 
added to the queue for each action. RecoveryService also checks pending actions 
and may add an ActionStartXCommand. If a ForkedActionStartXCommand is enqueued 
while an ActionStartXCommand for the same action is already in the queue, the 
ForkedActionStartXCommand is dropped. The thread that submits the actions in 
parallel blocks in CallableQueueService.blockingWait(), waiting for the 
ForkedActionStartXCommand to finish, but that command has been lost, so the 
thread never resumes (deadlock).
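
The effect on the waiting thread can be illustrated with a small stand-alone sketch. It is written under assumptions and is not Oozie code: {{LostCommandWaitSketch}} and {{waitForCommand}} are hypothetical names, and a {{CountDownLatch}} with a timeout stands in for whatever CallableQueueService.blockingWait() does internally. The timeout exists only so that the example terminates. The description above implies that the real wait does not return.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not how Oozie implements blockingWait().
class LostCommandWaitSketch {

    // Waits for a "command" to finish. Returns true if it finished before the timeout.
    // commandRuns == false models a command that was dropped and therefore never executes.
    static boolean waitForCommand(boolean commandRuns, long timeoutMillis) {
        CountDownLatch finished = new CountDownLatch(1);
        if (commandRuns) {
            // The command executes on a worker thread and signals completion.
            new Thread(finished::countDown).start();
        }
        try {
            // Without the timeout, a dropped command would leave this thread blocked forever.
            return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("command ran:  finished=" + waitForCommand(true, 2000));
        System.out.println("command lost: finished=" + waitForCommand(false, 200));
    }
}
```

In the second call nothing ever counts the latch down, which is the situation the submitting thread is in once its ForkedActionStartXCommand has been dropped.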
{code:java}
          Thread 1                           Thread 2
 (ForkedActionStartXCommand)           (ActionStartXCommand)
+----------------------------+          +---------+
| removeFromUniqueCallables  |          |   ...   |
+----------------------------+          +---------+
|            ...             |          |  queue  |
+----------------------------+          +---------+
|           queue            |       enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Thread 1 and Thread 2 run CallableWrapper's code in this order:
  1. Thread 1 executes removeFromUniqueCallables();
  2. Thread 2 executes queue(), adding the ActionStartXCommand to the queue and 
to uniqueCallables;
  3. Thread 1 executes queue() to add the ForkedActionStartXCommand, but 
filterDuplicates() finds an XCommand with the same name in uniqueCallables, so 
it skips adding the command to the queue.

Because ForkedActionStartXCommand and ActionStartXCommand have the same name 
and Thread 2 enqueues the ActionStartXCommand before Thread 1, the 
ForkedActionStartXCommand is lost (never executed), and the thread that 
submits the fork actions in parallel blocks in CallableQueueService.blockingWait(). 
{code}
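
The three steps above can be replayed deterministically in a small model. This is a minimal sketch under stated assumptions, not Oozie's code: {{DedupRaceSketch}}, its map, its pending queue, and the key string are hypothetical stand-ins, and only the method names {{removeFromUniqueCallables}} and {{queue}} echo the description. The steps run on a single thread in the problematic order, so the sketch shows the outcome of the interleaving rather than the race itself.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical, simplified model of the behaviour described above; not Oozie's classes.
class DedupRaceSketch {
    // Key -> command type for every command currently counted as queued.
    private final Map<String, String> uniqueCallables = new HashMap<>();
    private final Queue<String> pending = new ArrayDeque<>();

    // Stand-in for removeFromUniqueCallables(): a wrapper that starts running frees its key.
    void removeFromUniqueCallables(String key) {
        uniqueCallables.remove(key);
    }

    // Stand-in for queue() plus filterDuplicates(): a command whose key is taken is dropped.
    boolean queue(String key, String commandType) {
        if (uniqueCallables.putIfAbsent(key, commandType) != null) {
            return false; // duplicate key, nothing is added
        }
        pending.add(commandType);
        return true;
    }

    // Replays the steps in the problematic order.
    // Returns true only if the forked command made it back into the queue.
    static boolean replayInterleaving() {
        DedupRaceSketch q = new DedupRaceSketch();
        String sharedKey = "start-action-1"; // assumed: both command types share this key

        // Setup: the fork submit queued the forked command and a worker picked it up.
        q.queue(sharedKey, "ForkedActionStartXCommand");
        q.pending.poll();

        // Step 1 (Thread 1): the running wrapper removes its key.
        q.removeFromUniqueCallables(sharedKey);
        // Step 2 (Thread 2): the recovery path queues a command with the same key.
        q.queue(sharedKey, "ActionStartXCommand");
        // Step 3 (Thread 1): the wrapper tries to re-queue itself and is filtered out.
        return q.queue(sharedKey, "ForkedActionStartXCommand");
    }

    public static void main(String[] args) {
        System.out.println("ForkedActionStartXCommand re-queued: " + replayInterleaving());
        // prints "ForkedActionStartXCommand re-queued: false"
    }
}
```

If step 2 is skipped in this model, the re-queue in step 3 succeeds, which matches the report that the loss only occurs when the ActionStartXCommand is enqueued between the two Thread 1 steps.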
 
*CallableWrapper's code*
{code:java}
public class CallableWrapper extends PriorityDelayQueue.QueueElement implements Runnable, Callable {
    private Instrumentation.Cron cron;

    public void run() {
        XCallable callable = null;
        try {
            // Step 1 of the race: this wrapper leaves uniqueCallables here.
            removeFromUniqueCallables();
            if (Services.get().getSystemMode() == SYSTEM_MODE.SAFEMODE) {
                log.info("Oozie is in SAFEMODE, requeuing callable [{0}] with [{1}]ms delay",
                        getElement().getType(), SAFE_MODE_DELAY);
                setDelay(SAFE_MODE_DELAY, TimeUnit.MILLISECONDS);
                // Re-queue path: the wrapper can be filtered out as a duplicate here.
                queue(this, true);
                return;
            }
            callable = getElement();
            if (callableBegin(callable)) {
                cron.stop();
                addInQueueCron(cron);
                XLog log = XLog.getLog(getClass());
                log.trace("executing callable [{0}]", callable.getName());

                try {
                    // FutureTask.run() will invoke callable.call()
                    super.run();
                    incrCounter(INSTR_EXECUTED_COUNTER, 1);
                    log.trace("executed callable [{0}]", callable.getName());
                }
                catch (Exception ex) {
                    incrCounter(INSTR_FAILED_COUNTER, 1);
                    log.warn("exception callable [{0}], {1}", callable.getName(), ex.getMessage(), ex);
                }
            }
            else {
                log.warn("max concurrency for callable [{0}] exceeded, requeueing with [{1}]ms delay",
                        callable.getType(), CONCURRENCY_DELAY);
                setDelay(CONCURRENCY_DELAY, TimeUnit.MILLISECONDS);
                // Re-queue path: the wrapper can be filtered out as a duplicate here.
                queue(this, true);
                incrCounter(callable.getType() + "#exceeded.concurrency", 1);
            }
        }
        catch (Throwable t) {
            incrCounter(INSTR_FAILED_COUNTER, 1);
            log.warn("exception callable [{0}], {1}", callable == null ? "N/A" : callable.getName(),
                    t.getMessage(), t);
        }
        finally {
            if (callable != null) {
                callableEnd(callable);
            }
        }
    }
}
{code}
 

 


[jira] [Commented] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause de

2023-07-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748911#comment-17748911
 ] 

Hadoop QA commented on OOZIE-3717:
--


Testing JIRA OOZIE-3717

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 1 star import(s)
.{color:red}-1{color} the patch contains 4 line(s) longer than 132 
characters
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [3] new bugs found below threshold in total that 
must be fixed.
.{color:red}-1{color} There are [3] new bugs found below threshold in 
[core] that must be fixed.
.You can find the SpotBugs diff here (look for the red and orange ones): 
core/findbugs-new.html
.The most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.Unsafe comparison of hash that are susceptible to timing attack: At 
BulkJPAExecutor.java:[line 206]
.At ShareLibService.java:[line 689]: At ShareLibService.java:[line 695]
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3262
.{color:orange}Tests failed at first run:{color}
TestSignalXCommand#testDeadlockForForkParallelSubmit
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 
{color:green}+1 MODERNIZER{color}


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/212/



> When fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> --
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assign

[jira] [Commented] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause de

2023-07-30 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748891#comment-17748891
 ] 

Hadoop QA commented on OOZIE-3717:
--

PreCommit-OOZIE-Build started


> When fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> --
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
> Attachments: OOZIE-3717-001.patch, OOZIE-3717-002.patch
>
>

[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead

2023-07-30 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Attachment: OOZIE-3717-002.patch


[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead

2023-07-30 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Description: 

[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead

2023-07-30 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Description: 

[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead

2023-07-30 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Description: 

[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead

2023-07-29 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Description: 
When fork actions are submitted in parallel, a ForkedActionStartXCommand is 
added to the queue for each action. RecoveryService also checks pending actions 
and may add an ActionStartXCommand. If a ForkedActionStartXCommand is enqueued 
while an ActionStartXCommand for the same action is already in the queue, the 
ForkedActionStartXCommand is dropped. The thread that submits the actions in 
parallel blocks in CallableQueueService.blockingWait(), waiting for the 
ForkedActionStartXCommand to finish, but that command has been lost, so the 
thread never resumes (deadlock).
{code:java}
          Thread 1                           Thread 2
 (ForkedActionStartXCommand)           (ActionStartXCommand)
+----------------------------+          +---------+
| removeFromUniqueCallables  |          |   ...   |
+----------------------------+          +---------+
|            ...             |          |  queue  |
+----------------------------+          +---------+
|           queue            |       enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name 
and Thread 2 enqueues the ActionStartXCommand before Thread 1, the 
ForkedActionStartXCommand is lost, and the thread that submits the fork actions 
in parallel blocks in CallableQueueService.blockingWait().
{code}



> When fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> --
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
> Attachments: OOZIE-3717-001.patch
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (OOZIE-3717) When fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause dead

2023-07-29 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Summary: When fork actions parallel submit, becasue 
ForkedActionStartXCommand and ActionStartXCommand has the same name, so 
ForkedActionStartXCommand would be lost, and cause deadlock  (was: Fork actions 
parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has 
the same name, so ForkedActionStartXCommand would be lost, and cause deadlock)

> When fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> --
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
> Attachments: OOZIE-3717-001.patch
>
>
> Forked actions are submitted in parallel, which adds a 
> ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
> pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
> for the same action is already in the queue when the ForkedActionStartXCommand 
> is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
> forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
> waiting for the ForkedActionStartXCommand to finish; because that command was 
> lost, the result is a deadlock.
> {code:java}
>            Thread 1                        Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |   ...   |
> +----------------------------+          +---------+
> |            ...             |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
> ForkedActionStartXCommand is lost, and the waiting thread blocks at 
> CallableQueueService.blockingWait(). {code}





[jira] [Commented] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadloc

2023-07-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748883#comment-17748883
 ] 

Hadoop QA commented on OOZIE-3717:
--


Testing JIRA OOZIE-3717

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:red}-1{color} the patch contains 1 star import(s)
.{color:red}-1{color} the patch contains 4 line(s) longer than 132 
characters
.{color:green}+1{color} the patch adds/modifies 1 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [3] new bugs found below threshold in total that 
must be fixed.
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:red}-1{color} There are [3] new bugs found below threshold in 
[core] that must be fixed.
.You can find the SpotBugs diff here (look for the red and orange ones): 
core/findbugs-new.html
.The most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.Unsafe comparison of hash that are susceptible to timing attack: At 
BulkJPAExecutor.java:[line 206]
.At ShareLibService.java:[line 689]: At ShareLibService.java:[line 695]
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3262
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 
{color:green}+1 MODERNIZER{color}


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/211/



> Fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> -
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
> Attachments: OOZIE-3717-001.patch
>
>
> Fork actions parallel submit, so will add ForkedActi

[jira] [Commented] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadloc

2023-07-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748871#comment-17748871
 ] 

Hadoop QA commented on OOZIE-3717:
--

PreCommit-OOZIE-Build started


> Fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> -
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
> Attachments: OOZIE-3717-001.patch
>
>
> Forked actions are submitted in parallel, which adds a 
> ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
> pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
> for the same action is already in the queue when the ForkedActionStartXCommand 
> is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
> forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
> waiting for the ForkedActionStartXCommand to finish; because that command was 
> lost, the result is a deadlock.
> {code:java}
>            Thread 1                        Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |   ...   |
> +----------------------------+          +---------+
> |            ...             |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
> ForkedActionStartXCommand is lost, and the waiting thread blocks at 
> CallableQueueService.blockingWait(). {code}





[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock

2023-07-29 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Attachment: (was: OOZIE-3717-001.patch)

> Fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> -
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
>
> Forked actions are submitted in parallel, which adds a 
> ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
> pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
> for the same action is already in the queue when the ForkedActionStartXCommand 
> is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
> forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
> waiting for the ForkedActionStartXCommand to finish; because that command was 
> lost, the result is a deadlock.
> {code:java}
>            Thread 1                        Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |   ...   |
> +----------------------------+          +---------+
> |            ...             |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
> ForkedActionStartXCommand is lost, and the waiting thread blocks at 
> CallableQueueService.blockingWait(). {code}





[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock

2023-07-29 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Attachment: OOZIE-3717-001.patch

> Fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> -
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
>
> Forked actions are submitted in parallel, which adds a 
> ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
> pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
> for the same action is already in the queue when the ForkedActionStartXCommand 
> is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
> forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
> waiting for the ForkedActionStartXCommand to finish; because that command was 
> lost, the result is a deadlock.
> {code:java}
>            Thread 1                        Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |   ...   |
> +----------------------------+          +---------+
> |            ...             |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
> ForkedActionStartXCommand is lost, and the waiting thread blocks at 
> CallableQueueService.blockingWait(). {code}





[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock

2023-07-29 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Attachment: (was: OOZIE-3717-001.patch)

> Fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> -
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
>
> Forked actions are submitted in parallel, which adds a 
> ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
> pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
> for the same action is already in the queue when the ForkedActionStartXCommand 
> is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
> forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
> waiting for the ForkedActionStartXCommand to finish; because that command was 
> lost, the result is a deadlock.
> {code:java}
>            Thread 1                        Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |   ...   |
> +----------------------------+          +---------+
> |            ...             |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
> ForkedActionStartXCommand is lost, and the waiting thread blocks at 
> CallableQueueService.blockingWait(). {code}





[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock

2023-07-29 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Description: 
Forked actions are submitted in parallel, which adds a 
ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
for the same action is already in the queue when the ForkedActionStartXCommand 
is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
waiting for the ForkedActionStartXCommand to finish; because that command was 
lost, the result is a deadlock.
{code:java}
           Thread 1                        Thread 2
  (ForkedActionStartXCommand)      (ActionStartXCommand)
+----------------------------+          +---------+
| removeFromUniqueCallables  |          |   ...   |
+----------------------------+          +---------+
|            ...             |          |  queue  |
+----------------------------+          +---------+
|           queue            |       enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
ForkedActionStartXCommand is lost, and the waiting thread blocks at 
CallableQueueService.blockingWait(). {code}

  was:
Forked actions are submitted in parallel, which adds a 
ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
for the same action is already in the queue when the ForkedActionStartXCommand 
is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
waiting for the ForkedActionStartXCommand to finish; because that command was 
lost, the result is a deadlock.
{code:java}
           Thread 1                        Thread 2
  (ForkedActionStartXCommand)      (ActionStartXCommand)
+----------------------------+          +---------+
| removeFromUniqueCallables  |          |   ...   |
+----------------------------+          +---------+
|            ...             |          |  queue  |
+----------------------------+          +---------+
|           queue            |       enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
ForkedActionStartXCommand is lost, and the waiting thread blocks at 
CallableQueueService.blockingWait(). {code}


> Fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> -
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
>
> Forked actions are submitted in parallel, which adds a 
> ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
> pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
> for the same action is already in the queue when the ForkedActionStartXCommand 
> is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
> forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
> waiting for the ForkedActionStartXCommand to finish; because that command was 
> lost, the result is a deadlock.
> {code:java}
>            Thread 1                        Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |   ...   |
> +----------------------------+          +---------+
> |            ...             |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
> ForkedActionStartXCommand is lost, and the waiting thread blocks at 
> CallableQueueService.blockingWait(). {code}





[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock

2023-07-29 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Description: 
Forked actions are submitted in parallel, which adds a 
ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
for the same action is already in the queue when the ForkedActionStartXCommand 
is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
waiting for the ForkedActionStartXCommand to finish; because that command was 
lost, the result is a deadlock.
{code:java}
           Thread 1                        Thread 2
  (ForkedActionStartXCommand)      (ActionStartXCommand)
+----------------------------+          +---------+
| removeFromUniqueCallables  |          |   ...   |
+----------------------------+          +---------+
|            ...             |          |  queue  |
+----------------------------+          +---------+
|           queue            |       enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
ForkedActionStartXCommand is lost, and the waiting thread blocks at 
CallableQueueService.blockingWait(). {code}

  was:
Forked actions are submitted in parallel, which adds a 
ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
for the same action is already in the queue when the ForkedActionStartXCommand 
is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
waiting for the ForkedActionStartXCommand to finish; because that command was 
lost, the result is a deadlock.
{code:java}
           Thread 1                        Thread 2
  (ForkedActionStartXCommand)      (ActionStartXCommand)
+----------------------------+          +---------+
| removeFromUniqueCallables  |          |   ...   |
+----------------------------+          +---------+
|            ...             |          |  queue  |
+----------------------------+          +---------+
|           queue            |       enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
ForkedActionStartXCommand is lost, and the waiting thread blocks at 
CallableQueueService.blockingWait(). {code}


> Fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> -
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
>
> Forked actions are submitted in parallel, which adds a 
> ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
> pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
> for the same action is already in the queue when the ForkedActionStartXCommand 
> is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
> forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
> waiting for the ForkedActionStartXCommand to finish; because that command was 
> lost, the result is a deadlock.
> {code:java}
>            Thread 1                        Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |   ...   |
> +----------------------------+          +---------+
> |            ...             |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
> ForkedActionStartXCommand is lost, and the waiting thread blocks at 
> CallableQueueService.blockingWait(). {code}





[jira] [Updated] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock

2023-07-29 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3717:
--
Description: 
Forked actions are submitted in parallel, which adds a 
ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
for the same action is already in the queue when the ForkedActionStartXCommand 
is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
waiting for the ForkedActionStartXCommand to finish; because that command was 
lost, the result is a deadlock.
{code:java}
           Thread 1                        Thread 2
  (ForkedActionStartXCommand)      (ActionStartXCommand)
+----------------------------+          +---------+
| removeFromUniqueCallables  |          |   ...   |
+----------------------------+          +---------+
|            ...             |          |  queue  |
+----------------------------+          +---------+
|           queue            |       enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
ForkedActionStartXCommand is lost, and the waiting thread blocks at 
CallableQueueService.blockingWait(). {code}

  was:
Forked actions are submitted in parallel, which adds a 
ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
for the same action is already in the queue when the ForkedActionStartXCommand 
is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
waiting for the ForkedActionStartXCommand to finish; because that command was 
lost, the result is a deadlock.
{code:java}
           Thread 1                        Thread 2
  (ForkedActionStartXCommand)      (ActionStartXCommand)
+----------------------------+          +---------+
| removeFromUniqueCallables  |          |   ...   |
+----------------------------+          +---------+
|            ...             |          |  queue  |
+----------------------------+          +---------+
|           queue            |       enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
ForkedActionStartXCommand is lost, and CallableQueueService blocks at 
CallableQueueService.blockingWait(). {code}


> Fork actions parallel submit, becasue ForkedActionStartXCommand and 
> ActionStartXCommand has the same name, so ForkedActionStartXCommand would be 
> lost, and cause deadlock
> -
>
> Key: OOZIE-3717
> URL: https://issues.apache.org/jira/browse/OOZIE-3717
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
> Fix For: trunk
>
>
> Forked actions are submitted in parallel, which adds a 
> ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
> pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
> for the same action is already in the queue when the ForkedActionStartXCommand 
> is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
> forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
> waiting for the ForkedActionStartXCommand to finish; because that command was 
> lost, the result is a deadlock.
> {code:java}
>            Thread 1                        Thread 2
>   (ForkedActionStartXCommand)      (ActionStartXCommand)
> +----------------------------+          +---------+
> | removeFromUniqueCallables  |          |   ...   |
> +----------------------------+          +---------+
> |            ...             |          |  queue  |
> +----------------------------+          +---------+
> |           queue            |       enqueue succeeded, in uniqueCallables
> +----------------------------+
> | wrapper.filterDuplicates() |
> +----------------------------+
> Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
> ForkedActionStartXCommand is lost, and the waiting thread blocks at 
> CallableQueueService.blockingWait(). {code}





[jira] [Created] (OOZIE-3717) Fork actions parallel submit, becasue ForkedActionStartXCommand and ActionStartXCommand has the same name, so ForkedActionStartXCommand would be lost, and cause deadlock

2023-07-29 Thread chenhaodan (Jira)
chenhaodan created OOZIE-3717:
-

 Summary: Fork actions parallel submit, becasue 
ForkedActionStartXCommand and ActionStartXCommand has the same name, so 
ForkedActionStartXCommand would be lost, and cause deadlock
 Key: OOZIE-3717
 URL: https://issues.apache.org/jira/browse/OOZIE-3717
 Project: Oozie
  Issue Type: Bug
  Components: action
Affects Versions: 5.2.1
Reporter: chenhaodan
Assignee: chenhaodan
 Fix For: trunk


Forked actions are submitted in parallel, which adds a 
ForkedActionStartXCommand for each of them. Meanwhile, RecoveryService checks 
pending actions and may add an ActionStartXCommand. If an ActionStartXCommand 
for the same action is already in the queue when the ForkedActionStartXCommand 
is enqueued, the ForkedActionStartXCommand is lost. The thread that submits the 
forked actions in parallel then blocks at CallableQueueService.blockingWait(), 
waiting for the ForkedActionStartXCommand to finish; because that command was 
lost, the result is a deadlock.
{code:java}
           Thread 1                        Thread 2
  (ForkedActionStartXCommand)      (ActionStartXCommand)
+----------------------------+          +---------+
| removeFromUniqueCallables  |          |   ...   |
+----------------------------+          +---------+
|            ...             |          |  queue  |
+----------------------------+          +---------+
|           queue            |       enqueue succeeded, in uniqueCallables
+----------------------------+
| wrapper.filterDuplicates() |
+----------------------------+

Because ForkedActionStartXCommand and ActionStartXCommand have the same name, 
ForkedActionStartXCommand is lost, and CallableQueueService blocks at 
CallableQueueService.blockingWait(). {code}
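
The race reported in this issue can be illustrated in miniature. What follows is a minimal sketch and not Oozie code: UniqueQueue, submitForked, the key string, and the 200 ms timeout are all hypothetical stand-ins. It assumes only what the description states, namely that the queue drops a callable whose name is already pending, and it uses a timed wait so the sketch terminates where a real blocking wait would hang.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class DuplicateKeyDeadlockSketch {

    /** Hypothetical queue that rejects a callable whose key is already pending. */
    static final class UniqueQueue {
        private final Map<String, Runnable> pending = new ConcurrentHashMap<>();

        /** Returns false (and drops the task) when the key is already queued. */
        boolean queue(String key, Runnable task) {
            return pending.putIfAbsent(key, task) == null;
        }

        /** Removes and runs whatever task is registered under the key, if any. */
        void runPending(String key) {
            Runnable task = pending.remove(key);
            if (task != null) {
                task.run();
            }
        }
    }

    /** Returns true if the "forked" task signalled completion before the timeout. */
    static boolean submitForked(boolean recoveryQueuedFirst) {
        UniqueQueue queue = new UniqueQueue();
        String key = "action-start:A"; // hypothetical name shared by both commands
        CountDownLatch forkedDone = new CountDownLatch(1);

        if (recoveryQueuedFirst) {
            // Stand-in for the recovery-side command: it runs, but never signals the latch.
            queue.queue(key, () -> { });
        }
        // Stand-in for the forked command: dropped if the key is already present.
        queue.queue(key, forkedDone::countDown);
        queue.runPending(key);

        try {
            // A wait without a timeout would block forever in the dropped case.
            return forkedDone.await(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("no competing command:  completed = " + submitForked(false)); // true
        System.out.println("recovery queued first: completed = " + submitForked(true));  // false
    }
}
```

In the second call the forked task is silently discarded, so the latch is never counted down; this is the situation in which the submitting thread would wait indefinitely.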





[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-07-10 Thread chenhaodan (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741800#comment-17741800
 ] 

chenhaodan commented on OOZIE-3715:
---

[~dionusos] Thank you very much!

> Fix fork out more than one transitions submit , one transition submit fail 
> can't execute KillXCommand
> -
>
> Key: OOZIE-3715
> URL: https://issues.apache.org/jira/browse/OOZIE-3715
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
>  Labels: patch
> Fix For: 5.3.0
>
> Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, 
> OOZIE-3715-003.patch, OOZIE-3715-004.patch, OOZIE-3715-005.patch, 
> OOZIE-3715-006.patch, forkSubmitFail_issue.txt, status.png
>
>
> When I fork two transitions (A and B) and the submission of A fails, B 
> keeps running because the KillXCommand is never executed.
> In SignalXCommand.startForkedActions, when one transition fails to submit, a 
> new ActionStartXCommand is created and failJob is invoked on it. failJob adds 
> a WorkflowNotificationXCommand and a KillXCommand to 
> {color:#ff}*commandQueue*{color}, which is processed in the XCommand.call 
> method. However, the two commands are added to the 
> {color:#ff}*commandQueue*{color} of the new ActionStartXCommand rather than 
> to that of SignalXCommand, so the KillXCommand is never executed. 
> The code is as follows:
>  
> {code:java}
> public void startForkedActions(List<WorkflowActionBean> workflowActionBeanListForForked)
>         throws CommandException {
>     ..
>     for (Future<ActionExecutorContext> result : futures) {
>         ..
>         if (context.getJobStatus() != null
>                 && context.getJobStatus().equals(Job.Status.FAILED)) {
>             // failJob() is invoked on a freshly created ActionStartXCommand,
>             // not on this SignalXCommand
>             new ActionStartXCommand(context.getAction().getId(), null).failJob(context);
>             ..
>         }
>         ..
>     }
> }
> {code}
>  
> {code:java}
> public void failJob(ActionExecutor.Context context, WorkflowActionBean action)
>         throws CommandException {
>     WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();
>     if (!handleUserRetry(context, action)) {
>         incrActionErrorCounter(action.getType(), "failed", 1);
>         LOG.warn("Failing Job due to failed action [{0}]", action.getName());
>         try {
>             workflow.getWorkflowInstance().fail(action.getName());
>             WorkflowInstance wfInstance = workflow.getWorkflowInstance();
>             ((LiteWorkflowInstance) wfInstance).setStatus(WorkflowInstance.Status.FAILED);
>             workflow.setWorkflowInstance(wfInstance);
>             workflow.setStatus(WorkflowJob.Status.FAILED);
>             action.setStatus(WorkflowAction.Status.FAILED);
>             action.resetPending();
>             queue(new WorkflowNotificationXCommand(workflow, action));
>             queue(new KillXCommand(workflow.getId()));
>             InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1,
>                     getInstrumentation());
>         }
>         catch (WorkflowException ex) {
>             throw new CommandException(ex);
>         }
>     }
> }
> {code}
>  
> {code:java}
> public final T call() throws CommandException {
>     if (commandQueue != null) {
>         for (Map.Entry<Long, List<XCommand<?>>> entry : commandQueue.entrySet()) {
>             LOG.debug("Queuing [{0}] commands with delay [{1}]ms",
>                     entry.getValue().size(), entry.getKey());
>             if (!callableQueueService.queueSerial(entry.getValue(), entry.getKey())) {
>                 LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, queue full",
>                         entry.getValue().size(), entry.getKey());
>             }
>         }
>     }
> }
> {code}
>  
>  
>  
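
The ownership problem described in the quoted issue can be reduced to a few lines. The sketch below is an illustration under stated assumptions, not Oozie code and not necessarily what the committed patch does: Command, failJob, run, and the string command names are hypothetical. It assumes, as the description says, that follow-up commands are buffered per command object and only dispatched when that same object's call method runs.

```java
import java.util.ArrayList;
import java.util.List;

public class CommandQueueOwnershipSketch {

    /** Hypothetical miniature command: buffers follow-ups and dispatches them in call(). */
    static class Command {
        final List<String> commandQueue = new ArrayList<>();

        void queue(String followUp) {
            commandQueue.add(followUp);
        }

        /** Only a command whose call() actually runs hands its follow-ups to the executor. */
        void call(List<String> executor) {
            executor.addAll(commandQueue);
            commandQueue.clear();
        }
    }

    /** Stand-in for failJob: buffers the follow-ups on whichever command it is invoked on. */
    static void failJob(Command owner) {
        owner.queue("notify");
        owner.queue("kill");
    }

    /** Returns the commands that reached the executor after the signalling command ran. */
    static List<String> run(boolean handBackToCaller) {
        List<String> executor = new ArrayList<>();
        Command signal = new Command();
        Command helper = new Command(); // created only to reuse failJob; call() is never invoked on it

        failJob(helper);
        if (handBackToCaller) {
            // One possible remedy (an assumption): move the buffered follow-ups to the running command.
            signal.commandQueue.addAll(helper.commandQueue);
        }
        signal.call(executor);
        return executor;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints []
        System.out.println(run(true));  // prints [notify, kill]
    }
}
```

With handBackToCaller set to false, the kill stays buffered on the helper object and never reaches the executor, which mirrors the reported behavior of transition B continuing to run.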





[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-07-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741695#comment-17741695
 ] 

ASF subversion and git services commented on OOZIE-3715:


Commit 603900735d8682bad3d0d1a62f7ca1db9fa1d2b3 in oozie's branch 
refs/heads/master from Denes Bodo
[ https://gitbox.apache.org/repos/asf?p=oozie.git;h=603900735 ]

OOZIE-3715 Fix fork out more than one transitions submit , one transition 
submit fail can't execute KillXCommand (chenhd via dionusos)


> Fix fork out more than one transitions submit , one transition submit fail 
> can't execute KillXCommand
> -
>
> Key: OOZIE-3715
> URL: https://issues.apache.org/jira/browse/OOZIE-3715
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 5.2.1
> Reporter: chenhaodan
> Assignee: chenhaodan
> Priority: Major
> Labels: patch
> Fix For: trunk
>
> Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, 
> OOZIE-3715-003.patch, OOZIE-3715-004.patch, OOZIE-3715-005.patch, 
> OOZIE-3715-006.patch, forkSubmitFail_issue.txt, status.png
>
>
> When I fork two transitions (A and B) and the submit of transition A fails, 
> transition B keeps running because the KillXCommand is never executed.
> In SignalXCommand.startForkedActions, a failed submit creates a new 
> ActionStartXCommand and invokes failJob on it. failJob adds a 
> WorkflowNotificationXCommand and a KillXCommand to 
> {color:#ff}*commandQueue*{color}, which is processed in the XCommand.call 
> method. However, the two commands are added to the new ActionStartXCommand's 
> {color:#ff}*commandQueue*{color}, not to the SignalXCommand's, so the 
> KillXCommand is never executed.
> The code is as follows:
>  
> {code:java}
> public void startForkedActions(List<WorkflowActionBean> workflowActionBeanListForForked) throws CommandException {
>     ..
>     for (Future<ActionExecutorContext> result : futures) {
>         ..
>         if (context.getJobStatus() != null && context.getJobStatus().equals(Job.Status.FAILED)) {
>             new ActionStartXCommand(context.getAction().getId(), null).failJob(context);
>             ..
>         }
>         ..
>     }
> {code}
>  
> {code:java}
> public void failJob(ActionExecutor.Context context, WorkflowActionBean action) throws CommandException {
>     WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();
>     if (!handleUserRetry(context, action)) {
>         incrActionErrorCounter(action.getType(), "failed", 1);
>         LOG.warn("Failing Job due to failed action [{0}]", action.getName());
>         try {
>             workflow.getWorkflowInstance().fail(action.getName());
>             WorkflowInstance wfInstance = workflow.getWorkflowInstance();
>             ((LiteWorkflowInstance) wfInstance).setStatus(WorkflowInstance.Status.FAILED);
>             workflow.setWorkflowInstance(wfInstance);
>             workflow.setStatus(WorkflowJob.Status.FAILED);
>             action.setStatus(WorkflowAction.Status.FAILED);
>             action.resetPending();
>             queue(new WorkflowNotificationXCommand(workflow, action));
>             queue(new KillXCommand(workflow.getId()));
>             InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1, getInstrumentation());
>         }
>         catch (WorkflowException ex) {
>             throw new CommandException(ex);
>         }
>     }
> }
> {code}
>  
> {code:java}
> public final T call() throws CommandException {
>     if (commandQueue != null) {
>         for (Map.Entry<Long, List<XCommand<?>>> entry : commandQueue.entrySet()) {
>             LOG.debug("Queuing [{0}] commands with delay [{1}]ms", entry.getValue().size(), entry.getKey());
>             if (!callableQueueService.queueSerial(entry.getValue(), entry.getKey())) {
>                 LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, queue full",
>                         entry.getValue().size(), entry.getKey());
>             }
>         }
>     }
> }
> {code}
>  
>  
>  
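
The failure mode described in the issue can be reproduced in isolation. The following is a minimal sketch, not Oozie code: ForkFailSketch, Command, buggySignal and fixedSignal are hypothetical names, and the "fixed" variant is only one possible remedy, assumed here for illustration rather than taken from the attached patches. It shows that follow-up commands queued on a throwaway command object are lost, because only the command whose call() actually runs drains its own queue.

```java
import java.util.ArrayList;
import java.util.List;

class ForkFailSketch {

    // Hypothetical stand-in for a command that owns a private queue of follow-ups.
    static class Command {
        private final List<String> commandQueue = new ArrayList<>();

        // Follow-ups are stored on this instance only.
        void queue(String followUp) {
            commandQueue.add(followUp);
        }

        // Mirrors the role of failJob(): enqueue the notification and the kill.
        void failJob() {
            queue("WorkflowNotification");
            queue("Kill");
        }

        // Mirrors the tail of call(): only this instance's queue is drained.
        List<String> call() {
            List<String> dispatched = new ArrayList<>(commandQueue);
            commandQueue.clear();
            return dispatched;
        }
    }

    // Behaviour described in the report: failJob() runs on a new command whose
    // call() is never invoked, so the kill is never dispatched.
    static List<String> buggySignal() {
        Command signal = new Command();
        new Command().failJob();
        return signal.call();
    }

    // One possible remedy (an assumption): queue the follow-ups on the command
    // whose call() is actually running.
    static List<String> fixedSignal() {
        Command signal = new Command();
        signal.failJob();
        return signal.call();
    }

    public static void main(String[] args) {
        System.out.println("buggy dispatches: " + buggySignal());
        System.out.println("fixed dispatches: " + fixedSignal());
    }
}
```

In the buggy variant nothing is dispatched, which matches the observation that transition B keeps running after transition A fails.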





[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-07-08 Thread chenhaodan (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741311#comment-17741311
 ] 

chenhaodan commented on OOZIE-3715:
---

[~dionusos] Thank you for your patient guidance. I have changed the code. Do 
you have any other remarks? Thank you again.



[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-07-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741268#comment-17741268
 ] 

Hadoop QA commented on OOZIE-3715:
--


Testing JIRA OOZIE-3715

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 2 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [5] new bugs found below threshold in total that 
must be fixed.
.{color:red}-1{color} There are [5] new bugs found below threshold in 
[core] that must be fixed.
.You can find the SpotBugs diff here (look for the red and orange ones): 
core/findbugs-new.html
.The most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection: At BulkJPAExecutor.java:[line 206]
.At CoordJobGetActionsSubsetJPAExecutor.java:[line 76]: At 
CoordJobGetActionsSubsetJPAExecutor.java:[line 111]
.{color:green}+1{color} There are no new bugs found in [client].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3261
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 
{color:green}+1 MODERNIZER{color}


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/210/




[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-07-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741253#comment-17741253
 ] 

Hadoop QA commented on OOZIE-3715:
--

PreCommit-OOZIE-Build started




[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-07-08 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3715:
--
Attachment: OOZIE-3715-006.patch



[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-06-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17733884#comment-17733884
 ] 

Dénes Bodó commented on OOZIE-3715:
---

[~chenhd] 

In TestSignalXCommand, please don't use an infinite loop:
{code:java}
while (!WorkflowJob.Status.FAILED.equals( engine.getJob(jobId).getStatus() )){
Thread.sleep(500);
} {code}
Instead, invoke the *waitFor* method, as other test cases do.

Also, please use "uri:oozie:workflow:1.0" instead of "uri:oozie:workflow:0.4" in 
the same file, and remove the space after '(' and before ')'.

 

Always compare the constant to the variable, not the other way around:
{code:java}
"somestring".equals(action.getName()) {code}
 
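
The reason for putting the constant first can be shown with a small standalone sketch. ConstantFirstEquals and its two methods are hypothetical names used only for illustration: with the constant on the left a null value simply compares unequal, while with the variable on the left it throws a NullPointerException.

```java
class ConstantFirstEquals {

    // Constant on the left: a null argument simply compares unequal.
    static boolean constantFirst(String name) {
        return "somestring".equals(name);
    }

    // Variable on the left: a null argument throws NullPointerException.
    static boolean variableFirst(String name) {
        return name.equals("somestring");
    }

    public static void main(String[] args) {
        System.out.println(constantFirst(null));
        System.out.println(constantFirst("somestring"));
        try {
            variableFirst(null);
        } catch (NullPointerException e) {
            System.out.println("variable-first threw NullPointerException");
        }
    }
}
```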

Is there a way to replace
{code:java}
// wait for execute KillXCommand
Thread.sleep(500);{code}
with a safer implementation? Waiting 500 ms may not be enough in some cases, so 
this could become an intermittently failing test case, which we must avoid. 
Isn't there an indicator we could check in a waitFor method, such as the size of 
the list returned by an SQL query executed via JPAExecutor?

Thanks
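
To illustrate the difference between a fixed sleep and a bounded wait on a condition, here is a minimal standalone sketch. It is not the waitFor method from Oozie's test base class: PollingWaitSketch, its waitFor signature, and the AtomicBoolean flag standing in for "KillXCommand has executed" are all assumptions made for illustration. The helper polls a condition until it holds or a timeout elapses, so the test neither loops forever nor depends on a single 500 ms guess.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class PollingWaitSketch {

    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition became true in time, false otherwise
    // (including when the waiting thread is interrupted).
    static boolean waitFor(long timeoutMs, long pollMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical indicator standing in for "the kill has been executed".
        AtomicBoolean killed = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            killed.set(true);
        });
        worker.start();
        boolean met = waitFor(5_000, 20, killed::get);
        worker.join();
        System.out.println("condition met: " + met);
    }
}
```

In a real test the condition would check something observable, for example the job status or the result of a query, instead of the flag used here.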


[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-06-05 Thread chenhaodan (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17729267#comment-17729267
 ] 

chenhaodan commented on OOZIE-3715:
---

Hi [~dionusos], do you have any other feedback or remarks regarding this 
change ({*}OOZIE-3715-005.patch{*})? Thank you!



[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-04-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717329#comment-17717329
 ] 

Hadoop QA commented on OOZIE-3715:
--


Testing JIRA OOZIE-3715

Cleaning local git workspace



{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.{color:green}+1{color} the patch does not introduce any @author tags
.{color:green}+1{color} the patch does not introduce any tabs
.{color:green}+1{color} the patch does not introduce any trailing spaces
.{color:green}+1{color} the patch does not introduce any star imports
.{color:green}+1{color} the patch does not introduce any line longer than 
132
.{color:green}+1{color} the patch adds/modifies 2 testcase(s)
{color:green}+1 RAT{color}
.{color:green}+1{color} the patch does not seem to introduce new RAT 
warnings
{color:green}+1 JAVADOC{color}
.{color:green}+1{color} Javadoc generation succeeded with the patch
.{color:green}+1{color} the patch does not seem to introduce new Javadoc 
warning(s)
{color:green}+1 COMPILE{color}
.{color:green}+1{color} HEAD compiles
.{color:green}+1{color} patch compiles
.{color:green}+1{color} the patch does not seem to introduce new javac 
warnings
{color:red}-1{color} There are [5] new bugs found below threshold in total that 
must be fixed.
.{color:green}+1{color} There are no new bugs found in [examples].
.{color:green}+1{color} There are no new bugs found in 
[fluent-job/fluent-job-api].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive].
.{color:green}+1{color} There are no new bugs found in [sharelib/hive2].
.{color:green}+1{color} There are no new bugs found in [sharelib/git].
.{color:green}+1{color} There are no new bugs found in [sharelib/distcp].
.{color:green}+1{color} There are no new bugs found in [sharelib/hcatalog].
.{color:green}+1{color} There are no new bugs found in [sharelib/sqoop].
.{color:green}+1{color} There are no new bugs found in [sharelib/spark].
.{color:green}+1{color} There are no new bugs found in [sharelib/oozie].
.{color:green}+1{color} There are no new bugs found in [sharelib/pig].
.{color:green}+1{color} There are no new bugs found in [sharelib/streaming].
.{color:green}+1{color} There are no new bugs found in [server].
.{color:green}+1{color} There are no new bugs found in [docs].
.{color:green}+1{color} There are no new bugs found in [webapp].
.{color:red}-1{color} There are [5] new bugs found below threshold in 
[core] that must be fixed.
.You can find the SpotBugs diff here (look for the red and orange ones): 
core/findbugs-new.html
.The most important SpotBugs errors are:
.At BulkJPAExecutor.java:[line 206]: This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection
.At BulkJPAExecutor.java:[line 176]: At BulkJPAExecutor.java:[line 175]
.At BulkJPAExecutor.java:[line 205]: At BulkJPAExecutor.java:[line 199]
.This use of 
javax/persistence/EntityManager.createQuery(Ljava/lang/String;)Ljavax/persistence/Query;
 can be vulnerable to SQL/JPQL injection: At BulkJPAExecutor.java:[line 206]
.At CoordJobGetActionsSubsetJPAExecutor.java:[line 76]: At 
CoordJobGetActionsSubsetJPAExecutor.java:[line 111]
.{color:green}+1{color} There are no new bugs found in [tools].
.{color:green}+1{color} There are no new bugs found in [client].
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.{color:green}+1{color} the patch does not change any JPA 
Entity/Colum/Basic/Lob/Transient annotations
.{color:green}+1{color} the patch does not modify JPA files
{color:green}+1 TESTS{color}
.Tests run: 3261
.{color:orange}Tests failed at first run:{color}
TestCLIParser#testCommandParserShowHelp
TestBundleRerunXCommand#testBundleRerunInPausedWithError
TestCoordActionsKillXCommand#testActionKillCommandDate
.For the complete list of flaky tests, see TEST-SUMMARY-FULL files.
{color:green}+1 DISTRO{color}
.{color:green}+1{color} distro tarball builds with the patch 
{color:green}+1 MODERNIZER{color}


{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

. https://ci-hadoop.apache.org/job/PreCommit-OOZIE-Build/209/
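
The SpotBugs findings above flag calls to EntityManager.createQuery(String) as possibly vulnerable to SQL/JPQL injection. The usual way to address this class of warning is to keep user-supplied values out of the query text and bind them as named parameters instead. The following is a minimal sketch of that idea only, not the code in BulkJPAExecutor: the class JpqlFilterSketch, the entity name Job, and the fields j.status and j.user are all hypothetical. In real JPA code each entry of the returned map would be passed to Query.setParameter(name, value).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Oozie code: keeps user-supplied values out of the JPQL text.
class JpqlFilterSketch {

    /** JPQL text plus the values to bind to its named placeholders. */
    static final class BoundQuery {
        final String jpql;
        final Map<String, Object> params;

        BoundQuery(String jpql, Map<String, Object> params) {
            this.jpql = jpql;
            this.params = params;
        }
    }

    // Whitelist of accepted filter keys and the (made-up) entity fields they map to.
    private static final Map<String, String> ALLOWED_FIELDS = new LinkedHashMap<>();
    static {
        ALLOWED_FIELDS.put("status", "j.status");
        ALLOWED_FIELDS.put("user", "j.user");
    }

    static BoundQuery build(String baseJpql, Map<String, String> filters) {
        StringBuilder jpql = new StringBuilder(baseJpql);
        Map<String, Object> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            // Field names come only from the whitelist, never from user input.
            String field = ALLOWED_FIELDS.get(filter.getKey());
            if (field == null) {
                throw new IllegalArgumentException("Unsupported filter: " + filter.getKey());
            }
            String placeholder = "p" + params.size();
            jpql.append(params.isEmpty() ? " WHERE " : " AND ")
                .append(field).append(" = :").append(placeholder);
            // The value is stored for binding; it never becomes part of the query text.
            params.put(placeholder, filter.getValue());
        }
        return new BoundQuery(jpql.toString(), params);
    }

    public static void main(String[] args) {
        Map<String, String> filters = new LinkedHashMap<>();
        filters.put("status", "RUNNING");
        filters.put("user", "x' OR '1'='1");
        BoundQuery q = build("SELECT j FROM Job j", filters);
        System.out.println(q.jpql);
        System.out.println(q.params);
    }
}
```

Even with a hostile value for user, the generated text contains only the placeholders :p0 and :p1, so the value cannot change the structure of the query.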



> Fix fork out more than one transitions submit , one transition submit fail 
> can't execute KillXCommand
> -
>
> Key: OOZIE-3715
> URL: https://issues.apache.org/jira/browse/OOZIE-3715
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affect

[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-04-27 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717288#comment-17717288
 ] 

Hadoop QA commented on OOZIE-3715:
--

PreCommit-OOZIE-Build started


> Fix fork out more than one transitions submit , one transition submit fail 
> can't execute KillXCommand
> -
>
> Key: OOZIE-3715
> URL: https://issues.apache.org/jira/browse/OOZIE-3715
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.1
>Reporter: chenhaodan
>Assignee: chenhaodan
>Priority: Major
>  Labels: patch
> Fix For: trunk
>
> Attachments: OOZIE-3715-001.patch, OOZIE-3715-002.patch, 
> OOZIE-3715-003.patch, OOZIE-3715-004.patch, OOZIE-3715-005.patch, 
> forkSubmitFail_issue.txt, status.png
>
>
> When a workflow forks two transitions (A and B) and the submission of A 
> fails, B keeps running because the KillXCommand is never executed.
> When the submission of one transition fails, 
> SignalXCommand.startForkedActions creates a new ActionStartXCommand and 
> invokes failJob on it. failJob adds a WorkflowNotificationXCommand and a 
> KillXCommand to {color:#ff}*commandQueue*{color}, which is dispatched in the 
> XCommand.call method. Because the two commands are added to the new 
> ActionStartXCommand's {color:#ff}*commandQueue*{color} rather than to that of 
> the running SignalXCommand, the KillXCommand is never executed.
> The code is as follows:
>  
> {code:java}
> public void startForkedActions(List<WorkflowActionBean> workflowActionBeanListForForked)
>         throws CommandException {
>     ..
>     for (Future<ActionExecutorContext> result : futures) {
>         ..
>         if (context.getJobStatus() != null
>                 && context.getJobStatus().equals(Job.Status.FAILED)) {
>             // failJob queues its follow-up commands on this new ActionStartXCommand
>             new ActionStartXCommand(context.getAction().getId(), null).failJob(context);
>             ..
>         }
>         ..
>     }
> {code}
>  
> {code:java}
> public void failJob(ActionExecutor.Context context, WorkflowActionBean action)
>         throws CommandException {
>     WorkflowJobBean workflow = (WorkflowJobBean) context.getWorkflow();
>     if (!handleUserRetry(context, action)) {
>         incrActionErrorCounter(action.getType(), "failed", 1);
>         LOG.warn("Failing Job due to failed action [{0}]", action.getName());
>         try {
>             workflow.getWorkflowInstance().fail(action.getName());
>             WorkflowInstance wfInstance = workflow.getWorkflowInstance();
>             ((LiteWorkflowInstance) wfInstance).setStatus(WorkflowInstance.Status.FAILED);
>             workflow.setWorkflowInstance(wfInstance);
>             workflow.setStatus(WorkflowJob.Status.FAILED);
>             action.setStatus(WorkflowAction.Status.FAILED);
>             action.resetPending();
>             queue(new WorkflowNotificationXCommand(workflow, action));
>             queue(new KillXCommand(workflow.getId()));
>             InstrumentUtils.incrJobCounter(INSTR_FAILED_JOBS_COUNTER_NAME, 1,
>                     getInstrumentation());
>         }
>         catch (WorkflowException ex) {
>             throw new CommandException(ex);
>         }
>     }
> }
> {code}
>  
> {code:java}
> public final T call() throws CommandException {
>     if (commandQueue != null) {
>         for (Map.Entry<Long, List<XCommand<?>>> entry : commandQueue.entrySet()) {
>             LOG.debug("Queuing [{0}] commands with delay [{1}]ms",
>                     entry.getValue().size(), entry.getKey());
>             if (!callableQueueService.queueSerial(entry.getValue(), entry.getKey())) {
>                 LOG.warn("Could not queue [{0}] commands with delay [{1}]ms, queue full",
>                         entry.getValue().size(), entry.getKey());
>             }
>         }
>     }
> }
> {code}
>  
>  
>  
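
The mechanism described in the quoted report can be reproduced with a small stand-alone model. This is a sketch under simplifying assumptions, not Oozie code: ForkQueueSketch and Command are hypothetical stand-ins for the XCommand hierarchy, dispatch is synchronous instead of going through a queue service, and the "queue on the caller" variant is only one conceivable remedy, not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a per-instance command queue. All names are hypothetical.
class ForkQueueSketch {

    /** Stand-in for a command whose queued follow-ups are dispatched only by its own call(). */
    static class Command {
        final String name;
        final List<Command> commandQueue = new ArrayList<>();

        Command(String name) {
            this.name = name;
        }

        void queue(Command followUp) {
            commandQueue.add(followUp);
        }

        // The command's own work; here it just records that it ran.
        void execute(List<String> executed) {
            executed.add(name);
        }

        // Runs the command, then dispatches whatever sits in *this* instance's queue.
        final void call(List<String> executed) {
            execute(executed);
            for (Command next : commandQueue) {
                next.call(executed);
            }
        }
    }

    static List<String> run(final boolean queueOnCaller) {
        List<String> executed = new ArrayList<>();
        Command signal = new Command("signal") {
            @Override
            void execute(List<String> log) {
                log.add(name);
                // A helper command is created only to reuse its failure handling.
                Command helper = new Command("action-start");
                Command kill = new Command("kill");
                if (queueOnCaller) {
                    queue(kill);        // lands in signal's queue, dispatched by signal.call()
                } else {
                    helper.queue(kill); // lands in helper's queue; helper.call() never runs
                }
            }
        };
        signal.call(executed);
        return executed;
    }

    public static void main(String[] args) {
        System.out.println("queued on helper: " + run(false));
        System.out.println("queued on caller: " + run(true));
    }
}
```

With the follow-up queued on the helper, only "signal" is recorded as executed; with it queued on the calling command, "kill" runs as well.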



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-04-27 Thread chenhaodan (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717285#comment-17717285
 ] 

chenhaodan commented on OOZIE-3715:
---

[~dionusos] OK, thank you for your patience and guidance.



[jira] [Updated] (OOZIE-3715) Fix fork out more than one transitions submit , one transition submit fail can't execute KillXCommand

2023-04-27 Thread chenhaodan (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenhaodan updated OOZIE-3715:
--
Attachment: OOZIE-3715-005.patch



[jira] Subscription: Oozie Patch Available

2023-04-24 Thread jira
Issue Subscription
Filter: Oozie Patch Available (103 issues)

Subscriber: ooziedaily

Key Summary
OOZIE-3715  Fix fork out more than one transitions submit , one transition 
submit fail can't execute KillXCommand
https://issues.apache.org/jira/browse/OOZIE-3715
OOZIE-3687  Fix Oozie client always using the current system username instead 
the one specified by the user (e.g.: via kerberos or via explicit basic 
authentication)
https://issues.apache.org/jira/browse/OOZIE-3687
OOZIE-3680  Add default value to custom configuration of all the supported file 
systems in Oozie
https://issues.apache.org/jira/browse/OOZIE-3680
OOZIE-3663  Upgrade Apache Xerces Java to 2.12.2
https://issues.apache.org/jira/browse/OOZIE-3663
OOZIE-3654  update to httpclient 4.5.13
https://issues.apache.org/jira/browse/OOZIE-3654
OOZIE-3635  Reduce nest of code in RecoveryService
https://issues.apache.org/jira/browse/OOZIE-3635
OOZIE-3623  Fix Typos in Distro Assembly
https://issues.apache.org/jira/browse/OOZIE-3623
OOZIE-3621  Make TestECPolicyDisabler work with Hadoop 3
https://issues.apache.org/jira/browse/OOZIE-3621
OOZIE-3620  hadoopId is not sent to eventHandlerService (listener) for workflow 
action events
https://issues.apache.org/jira/browse/OOZIE-3620
OOZIE-3609  Zookeeper SSL/TLS support
https://issues.apache.org/jira/browse/OOZIE-3609
OOZIE-3596  When the SSH action is killed, it must be changed to the kill 
command that can terminate the related subprocess.
https://issues.apache.org/jira/browse/OOZIE-3596
OOZIE-3568  Have large amount of log information “WARN messages [main] 
openjpa.MetaData” in jetty.log need to clean
https://issues.apache.org/jira/browse/OOZIE-3568
OOZIE-3567  Oozie ShellAction should support absolute bash file path
https://issues.apache.org/jira/browse/OOZIE-3567
OOZIE-3560  IDEA shows have some error  in index.jsp
https://issues.apache.org/jira/browse/OOZIE-3560
OOZIE-3554  Add asf.yaml to git repo
https://issues.apache.org/jira/browse/OOZIE-3554
OOZIE-3545  Upgrade jQuery
https://issues.apache.org/jira/browse/OOZIE-3545
OOZIE-3482  Fix bug in CoordSubmitXCommand#validateCoordinatorJob
https://issues.apache.org/jira/browse/OOZIE-3482
OOZIE-3480  Add windowactionstatus metrics in DBLiteWorkflowStoreService
https://issues.apache.org/jira/browse/OOZIE-3480
OOZIE-3461  CoordMaterializeTriggerService code cleanup
https://issues.apache.org/jira/browse/OOZIE-3461
OOZIE-3449  Make spark-2 as the default profile
https://issues.apache.org/jira/browse/OOZIE-3449
OOZIE-3447  Run test case in local : It shows oozie-hsqldb-orm.xml exception
https://issues.apache.org/jira/browse/OOZIE-3447
OOZIE-3434  Filtering for invalid jobtype should give error message
https://issues.apache.org/jira/browse/OOZIE-3434
OOZIE-3418  Upgrade to Guava 27
https://issues.apache.org/jira/browse/OOZIE-3418
OOZIE-3404  The env variable of SPARK_HOME needs to be set when running pySpark
https://issues.apache.org/jira/browse/OOZIE-3404
OOZIE-3375  Can't use empty  in coordinator
https://issues.apache.org/jira/browse/OOZIE-3375
OOZIE-3367  Using && in EL expressions in oozie bundle.xml files generates 
parse errors
https://issues.apache.org/jira/browse/OOZIE-3367
OOZIE-3366  Update workflow status and subworkflow status on suspend command
https://issues.apache.org/jira/browse/OOZIE-3366
OOZIE-3364  Rerunning Oozie bundle jobs starts the coordinators in 
indeterminate order
https://issues.apache.org/jira/browse/OOZIE-3364
OOZIE-3362  When killed, SSH action should kill the spawned processes on target 
host
https://issues.apache.org/jira/browse/OOZIE-3362
OOZIE-3335  Cleanup parseFilter methods
https://issues.apache.org/jira/browse/OOZIE-3335
OOZIE-3328  Create Hive compatibility action executor to run hive actions using 
beeline
https://issues.apache.org/jira/browse/OOZIE-3328
OOZIE-3319  Log SSH action callback error output
https://issues.apache.org/jira/browse/OOZIE-3319
OOZIE-3301  Update NOTICE file
https://issues.apache.org/jira/browse/OOZIE-3301
OOZIE-3274  Remove slf4j
https://issues.apache.org/jira/browse/OOZIE-3274
OOZIE-3266  Coord action rerun support RERUN_SKIP_NODES option
https://issues.apache.org/jira/browse/OOZIE-3266
OOZIE-3256  refactor OozieCLI class
https://issues.apache.org/jira/browse/OOZIE-3256
OOZIE-3196  Authorization: restrict world readability by user
https://issues.apache.org/jira/browse/OOZIE-3196
OOZIE-3170  Oozie Diagnostic Bundle tool fails with NPE due to missing service 
class
https://issues.apache.org/jira/browse/OOZIE-3170
OOZIE-314

[jira] Subscription: Oozie Patch Available

2023-04-23 Thread jira
Issue Subscription
Filter: Oozie Patch Available (103 issues)

Subscriber: ooziedaily


[jira] Subscription: Oozie Patch Available

2023-04-22 Thread jira
Issue Subscription
Filter: Oozie Patch Available (103 issues)

Subscriber: ooziedaily


[jira] Subscription: Oozie Patch Available

2023-04-21 Thread jira
Issue Subscription
Filter: Oozie Patch Available (103 issues)

Subscriber: ooziedaily


[jira] Subscription: Oozie Patch Available

2023-04-20 Thread jira
Issue Subscription
Filter: Oozie Patch Available (103 issues)

Subscriber: ooziedaily

Key Summary
OOZIE-3715  Fix fork out more than one transitions submit , one transition 
submit fail can't execute KillXCommand
https://issues.apache.org/jira/browse/OOZIE-3715
OOZIE-3687  Fix Oozie client always using the current system username instead 
the one specified by the user (e.g.: via kerberos or via explicit basic 
authentication)
https://issues.apache.org/jira/browse/OOZIE-3687
OOZIE-3680  Add default value to custom configuration of all the supported file 
systems in Oozie
https://issues.apache.org/jira/browse/OOZIE-3680
OOZIE-3663  Upgrade Apache Xerces Java to 2.12.2
https://issues.apache.org/jira/browse/OOZIE-3663
OOZIE-3654  update to httpclient 4.5.13
https://issues.apache.org/jira/browse/OOZIE-3654
OOZIE-3635  Reduce nest of code in RecoveryService
https://issues.apache.org/jira/browse/OOZIE-3635
OOZIE-3623  Fix Typos in Distro Assembly
https://issues.apache.org/jira/browse/OOZIE-3623
OOZIE-3621  Make TestECPolicyDisabler work with Hadoop 3
https://issues.apache.org/jira/browse/OOZIE-3621
OOZIE-3620  hadoopId is not sent to eventHandlerService (listener) for workflow 
action events
https://issues.apache.org/jira/browse/OOZIE-3620
OOZIE-3609  Zookeeper SSL/TLS support
https://issues.apache.org/jira/browse/OOZIE-3609
OOZIE-3596  When the SSH action is killed, it must be changed to the kill 
command that can terminate the related subprocess.
https://issues.apache.org/jira/browse/OOZIE-3596
OOZIE-3568  Have large amount of log information “WARN messages [main] 
openjpa.MetaData” in jetty.log need to clean
https://issues.apache.org/jira/browse/OOZIE-3568
OOZIE-3567  Oozie ShellAction should support absolute bash file path
https://issues.apache.org/jira/browse/OOZIE-3567
OOZIE-3560  IDEA shows have some error  in index.jsp
https://issues.apache.org/jira/browse/OOZIE-3560
OOZIE-3554  Add asf.yaml to git repo
https://issues.apache.org/jira/browse/OOZIE-3554
OOZIE-3545  Upgrade jQuery
https://issues.apache.org/jira/browse/OOZIE-3545
OOZIE-3482  Fix bug in CoordSubmitXCommand#validateCoordinatorJob
https://issues.apache.org/jira/browse/OOZIE-3482
OOZIE-3480  Add windowactionstatus metrics in DBLiteWorkflowStoreService
https://issues.apache.org/jira/browse/OOZIE-3480
OOZIE-3461  CoordMaterializeTriggerService code cleanup
https://issues.apache.org/jira/browse/OOZIE-3461
OOZIE-3449  Make spark-2 as the default profile
https://issues.apache.org/jira/browse/OOZIE-3449
OOZIE-3447  Run test case in local : It shows oozie-hsqldb-orm.xml exception
https://issues.apache.org/jira/browse/OOZIE-3447
OOZIE-3434  Filtering for invalid jobtype should give error message
https://issues.apache.org/jira/browse/OOZIE-3434
OOZIE-3418  Upgrade to Guava 27
https://issues.apache.org/jira/browse/OOZIE-3418
OOZIE-3404  The env variable of SPARK_HOME needs to be set when running pySpark
https://issues.apache.org/jira/browse/OOZIE-3404
OOZIE-3375  Can't use empty  in coordinator
https://issues.apache.org/jira/browse/OOZIE-3375
OOZIE-3367  Using && in EL expressions in oozie bundle.xml files generates 
parse errors
https://issues.apache.org/jira/browse/OOZIE-3367
OOZIE-3366  Update workflow status and subworkflow status on suspend command
https://issues.apache.org/jira/browse/OOZIE-3366
OOZIE-3364  Rerunning Oozie bundle jobs starts the coordinators in 
indeterminate order
https://issues.apache.org/jira/browse/OOZIE-3364
OOZIE-3362  When killed, SSH action should kill the spawned processes on target 
host
https://issues.apache.org/jira/browse/OOZIE-3362
OOZIE-3335  Cleanup parseFilter methods
https://issues.apache.org/jira/browse/OOZIE-3335
OOZIE-3328  Create Hive compatibility action executor to run hive actions using 
beeline
https://issues.apache.org/jira/browse/OOZIE-3328
OOZIE-3319  Log SSH action callback error output
https://issues.apache.org/jira/browse/OOZIE-3319
OOZIE-3301  Update NOTICE file
https://issues.apache.org/jira/browse/OOZIE-3301
OOZIE-3274  Remove slf4j
https://issues.apache.org/jira/browse/OOZIE-3274
OOZIE-3266  Coord action rerun support RERUN_SKIP_NODES option
https://issues.apache.org/jira/browse/OOZIE-3266
OOZIE-3256  refactor OozieCLI class
https://issues.apache.org/jira/browse/OOZIE-3256
OOZIE-3196  Authorization: restrict world readability by user
https://issues.apache.org/jira/browse/OOZIE-3196
OOZIE-3170  Oozie Diagnostic Bundle tool fails with NPE due to missing service 
class
https://issues.apache.org/jira/browse/OOZIE-3170
OOZIE-314
