[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-02-10 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490321#comment-17490321
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Thanks for your patient review. [~dionusos]  [~asalamon74] :)

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Fix For: 5.3.0
>
> Attachments: OOZIE-3646-002.patch, OOZIE-3646-003.patch, 
> OOZIE-3646.patch-1, OOZIE-3646.patch-2, a1.png
>
>
> The limited thread execution mechanism aims to prevent the dead-lock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the dead-lock happens
> Assume that the Oozie CallableQueue thread pool size is 120. When all threads 
> are executing the {{SignalXCommand.startForkedActions}} method, a dead-lock 
> occurs.
> This is because, in {{SignalXCommand.startForkedActions}}, the code 
> {code:java}
> List<Future<ActionExecutorContext>> futures =
>         Services.get().get(CallableQueueService.class)
>                 .invokeAll(tasks);
> {code}
> executes synchronously and waits for the forked tasks, but all callableQueue 
> threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Do not call invokeAll directly when the number of remaining (idle) threads 
> is less than the number of tasks.
> 2. To obtain the correct number of active threads in the callableQueue, the 
> SignalXCommand.class lock is needed.
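
The starvation described above can be reproduced outside Oozie with a plain java.util.concurrent pool. The following is a minimal sketch, not Oozie code: PoolStarvationDemo and parentsFinish are hypothetical names, a fixed thread pool stands in for the CallableQueue, and the timeout exists only so that the demo reports the hang instead of blocking forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; it does not use any Oozie code.
final class PoolStarvationDemo {

    // Submits `parents` tasks to a pool of `poolSize` threads. Each parent calls
    // invokeAll on the SAME pool and blocks until its single child task is done.
    // Returns true if every parent finishes within timeoutMs, false otherwise.
    static boolean parentsFinish(int poolSize, int parents, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> parentFutures = new ArrayList<>();
            for (int i = 0; i < parents; i++) {
                parentFutures.add(pool.submit(() -> {
                    List<Callable<Integer>> children = new ArrayList<>();
                    children.add(() -> 1);
                    int sum = 0;
                    // Blocks this pool thread until a pool thread has run the child.
                    for (Future<Integer> child : pool.invokeAll(children)) {
                        sum += child.get();
                    }
                    return sum;
                }));
            }
            for (Future<Integer> parent : parentFutures) {
                parent.get(timeoutMs, TimeUnit.MILLISECONDS);
            }
            return true;
        } catch (TimeoutException | ExecutionException e) {
            // A timeout here means the parents are stuck waiting for children
            // that no thread is free to run.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupts any parent that is still blocked so the JVM can exit.
            pool.shutdownNow();
        }
    }
}
```

With a pool of 2 threads, parentsFinish(2, 2, 300) is expected to return false, because both threads are held by parents waiting for queued children. parentsFinish(2, 1, 2000) is expected to return true, because one thread stays free for the child.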



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-02-08 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488732#comment-17488732
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Fixed, [~dionusos].

Thanks for your review.

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646-002.patch, OOZIE-3646-003.patch, 
> OOZIE-3646.patch-1, OOZIE-3646.patch-2, a1.png
>
>





[jira] [Updated] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-02-08 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3646:

Attachment: OOZIE-3646-003.patch

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646-002.patch, OOZIE-3646-003.patch, 
> OOZIE-3646.patch-1, OOZIE-3646.patch-2, a1.png
>
>





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-01-24 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17481580#comment-17481580
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Gentle ping [~dionusos] 

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, OOZIE-3646.patch-2, a1.png
>
>





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-01-18 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477782#comment-17477782
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Gentle ping [~dionusos] :)

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, OOZIE-3646.patch-2, a1.png
>
>





[jira] [Updated] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-01-17 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3646:

Attachment: OOZIE-3646.patch-2

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, OOZIE-3646.patch-2, a1.png
>
>





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-01-17 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477067#comment-17477067
 ] 

Junfan Zhang commented on OOZIE-3646:
-

[~dionusos] I have uploaded a new patch. Please recheck it. Thanks.

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, OOZIE-3646.patch-2, a1.png
>
>





[jira] [Comment Edited] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-01-12 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475065#comment-17475065
 ] 

Junfan Zhang edited comment on OOZIE-3646 at 1/13/22, 2:52 AM:
---

Yep, [~dionusos]. Maybe you should remove the startup of 
{{HCatalogServer}} in {{XHCatTestCase}}. On my Mac, the test also hangs unless 
it is removed. It should look like this:
{code:java}
public abstract class XHCatTestCase extends XFsTestCase {

    private MiniHCatServer hcatServer;

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        // HCatalog server startup is commented out so the test does not hang.
        //super.setupHCatalogServer();
        //hcatServer = super.getHCatalogServer();
    }
{code}
Please check it. To reproduce this bug, you can run the test case in 
[https://github.com/apache/oozie/pull/65].

 

Looking forward to your reply.


was (Author: zuston):
Yep, [~dionusos]. Maybe you should remove the startup of 
{{HCatalogServer}} in {{XHCatTestCase}}. On my Mac, the test also hangs unless 
it is removed. It should look like this:


{code:java}
public abstract class XHCatTestCase extends XFsTestCase {

    private MiniHCatServer hcatServer;

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        // HCatalog server startup is commented out so the test does not hang.
        //super.setupHCatalogServer();
        //hcatServer = super.getHCatalogServer();
    }
{code}

Please check it. To reproduce this bug, you can run the test case in 
https://github.com/apache/oozie/pull/65

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, a1.png
>
>





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-01-12 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475065#comment-17475065
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Yep, [~dionusos]. Maybe you should remove the startup of 
{{HCatalogServer}} in {{XHCatTestCase}}. On my Mac, the test also hangs unless 
it is removed. It should look like this:


{code:java}
public abstract class XHCatTestCase extends XFsTestCase {

    private MiniHCatServer hcatServer;

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        // HCatalog server startup is commented out so the test does not hang.
        //super.setupHCatalogServer();
        //hcatServer = super.getHCatalogServer();
    }
{code}

Please check it. To reproduce this bug, you can run the test case in 
https://github.com/apache/oozie/pull/65

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, a1.png
>
>





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2022-01-09 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471692#comment-17471692
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Gentle ping [~dionusos] [~asalamon74]

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, a1.png
>
>





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-20 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463011#comment-17463011
 ] 

Junfan Zhang commented on OOZIE-3646:
-

[~dionusos] Take it easy :). Thanks

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, a1.png
>
>





[jira] [Commented] (OOZIE-3647) Oozie should use the ConfigurationService.getInt instead of Services.get().getConf().getInt

2021-12-16 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460516#comment-17460516
 ] 

Junfan Zhang commented on OOZIE-3647:
-

Doing it in one step may be a good idea.

I could take over OOZIE-3462, [~asalamon74].

Besides, could you help review OOZIE-3646?

> Oozie should use the ConfigurationService.getInt instead of 
> Services.get().getConf().getInt
> ---
>
> Key: OOZIE-3647
> URL: https://issues.apache.org/jira/browse/OOZIE-3647
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> -Now in {{ActionCheckerService}} and {{CoordMaterializeTriggerService}}, 
> the callables are queued using 
> {{CallableQueueService.queueSerial}}. In the internal implementation of 
> {{queueSerial}}, all callables are combined into a single 
> {{CompositeCallable}}, which means they are executed serially in a single 
> thread.-
> -To speed up execution, we should set the default number of 
> callables in a batch in {{ActionCheckerService}} and 
> {{CoordMaterializeTriggerService}}, like the- 
> [-RecoveryService-|https://github.com/apache/oozie/blob/e010fbda91bd78cccb227fc872b3ddd317a5ce6a/core/src/main/java/org/apache/oozie/service/RecoveryService.java#L453]
>  
> Oozie should use {{ConfigurationService.getInt}} instead of 
> {{Services.get().getConf().getInt}}, and the default value should be added to 
> oozie-default.xml.
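
The motivation for keeping defaults in one file rather than at each call site can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not Oozie's ConfigurationService: the class CentralDefaultsSketch, its methods, and the key example.batch.size are all hypothetical, and two in-memory maps play the roles of oozie-default.xml and oozie-site.xml.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration service; not Oozie's implementation.
final class CentralDefaultsSketch {

    // Plays the role of oozie-default.xml: one declared default per key.
    private final Map<String, String> defaults = new HashMap<>();
    // Plays the role of oozie-site.xml: site-specific overrides.
    private final Map<String, String> site = new HashMap<>();

    void setDefault(String key, int value) {
        defaults.put(key, Integer.toString(value));
    }

    void setSite(String key, int value) {
        site.put(key, Integer.toString(value));
    }

    // Call-site default, in the style of conf.getInt(name, fallback):
    // every caller may pass a different fallback for the same key.
    int getInt(String key, int fallback) {
        String value = site.get(key);
        return value == null ? fallback : Integer.parseInt(value);
    }

    // Central default: the site override wins, otherwise the declared default.
    // Throwing on an undeclared key is a choice of this sketch only.
    int getInt(String key) {
        String value = site.containsKey(key) ? site.get(key) : defaults.get(key);
        if (value == null) {
            throw new IllegalArgumentException("No default declared for " + key);
        }
        return Integer.parseInt(value);
    }
}
```

In this model, a caller that passes its own fallback can silently disagree with the declared default, while the single-argument lookup always returns the same value for every caller.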





[jira] [Comment Edited] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-15 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458913#comment-17458913
 ] 

Junfan Zhang edited comment on OOZIE-3646 at 12/16/21, 2:44 AM:


Thanks, [~dionusos].


The test case is now attached 
([link|https://github.com/apache/oozie/pull/65]); you can run it to 
reproduce the dead-lock.

In addition, the stack of the stuck threads is in the ticket's attachments.

If you run the {{testPossibleDeadLock}} method, it will fail.


But if you set 
{{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, 
false);}}, everything is OK.


This is because of the synchronous invocation in {{SignalXCommand}}:
{code:java}
List<Future<ActionExecutorContext>> futures =
        Services.get().get(CallableQueueService.class)
                .invokeAll(tasks);
{code}
 

Please check it and let me know what you think, [~dionusos].


was (Author: zuston):
Thanks, [~dionusos].
The test case is now attached 
([link|https://github.com/apache/oozie/pull/65]). In addition, the stack of 
the stuck threads is in the ticket's attachments.

If you run the {{testPossibleDeadLock}} method, it will fail. 
But if you set 
{{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, 
false);}}, everything is OK. 
This is because of the synchronous invocation in {{SignalXCommand}}:
{code:java}
List<Future<ActionExecutorContext>> futures =
        Services.get().get(CallableQueueService.class)
                .invokeAll(tasks);
{code}

Please check it and let me know what you think, [~dionusos].

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, a1.png
>
>





[jira] [Updated] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-15 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3646:

Attachment: (was: OOZIE-3646.patch-1)

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: a1.png
>
>





[jira] [Updated] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-15 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3646:

Attachment: OOZIE-3646.patch-1

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3646.patch-1, a1.png
>
>





[jira] [Commented] (OOZIE-3647) Oozie should use the ConfigurationService.getInt instead of Services.get().getConf().getInt

2021-12-15 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460361#comment-17460361
 ] 

Junfan Zhang commented on OOZIE-3647:
-

Linked PR: https://github.com/apache/oozie/pull/66

> Oozie should use the ConfigurationService.getInt instead of 
> Services.get().getConf().getInt
> ---
>
> Key: OOZIE-3647
> URL: https://issues.apache.org/jira/browse/OOZIE-3647
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-15 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459850#comment-17459850
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Gentle ping [~dionusos]. Could you help check it and give some suggestions?

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: a1.png
>
>





[jira] [Updated] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-14 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3646:

Description: 
The limited thread execution mechanism aims to prevent the dead-lock that 
occurs when all active threads are executing SignalXCommand's invokeAll call.

h2. When the dead-lock happens
Assume that the Oozie CallableQueue thread pool size is 120. When all threads are 
executing the {{SignalXCommand.startForkedActions}} method, a dead-lock occurs.
This is because, in {{SignalXCommand.startForkedActions}}, the code 
{code:java}
List<Future<ActionExecutorContext>> futures =
        Services.get().get(CallableQueueService.class)
                .invokeAll(tasks);
{code}
executes synchronously and waits for the forked tasks, but all callableQueue 
threads are already busy, so no thread is left to run them.

h2. Solution
1. Do not call invokeAll directly when the number of remaining (idle) threads is 
less than the number of tasks.
2. To obtain the correct number of active threads in the callableQueue, the 
SignalXCommand.class lock is needed.

  was:
The limited thread execution mechanism aims to prevent the dead-lock that 
occurs when all active threads are executing SignalXCommand's invokeAll call.

h2. When the dead-lock happens
Assume that the Oozie CallableQueue thread pool size is 120. When all threads are 
executing the {{SignalXCommand.startForkedActions}} method, a dead-lock occurs.
This is because, in {{SignalXCommand.startForkedActions}}, the code 
{{List<Future<ActionExecutorContext>> futures = 
Services.get().get(CallableQueueService.class)
.invokeAll(tasks);}} executes synchronously and waits for the forked tasks, but 
all callableQueue threads are already busy, so no thread is left to run them.

h2. Solution
1. Do not call invokeAll directly when the number of remaining (idle) threads is 
less than the number of tasks.
2. To obtain the correct number of active threads in the callableQueue, the 
SignalXCommand.class lock is needed.
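
The two points in the Solution section can be sketched with a plain ThreadPoolExecutor. This is a minimal sketch under stated assumptions, not the attached patch: GuardedInvokeAll, run and allParentsFinish are hypothetical names, a private LOCK object stands in for the SignalXCommand.class lock, and running the tasks in the caller's thread when too few threads are idle is a fallback assumed for illustration only. The fallback chosen by the actual patch may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Oozie.
final class GuardedInvokeAll {

    // Stands in for the SignalXCommand.class lock mentioned in the description.
    private static final Object LOCK = new Object();

    // Runs the tasks in the pool only when enough threads are idle; otherwise
    // runs them one by one in the calling thread (assumed fallback).
    static <T> List<T> run(ThreadPoolExecutor pool, List<Callable<T>> tasks) throws Exception {
        List<Future<T>> futures = null;
        synchronized (LOCK) {
            // Serialize the check-and-submit step across callers.
            // getActiveCount() is documented as an approximate value.
            int idle = pool.getMaximumPoolSize() - pool.getActiveCount();
            if (idle >= tasks.size()) {
                futures = new ArrayList<>();
                for (Callable<T> task : tasks) {
                    futures.add(pool.submit(task)); // submit() enqueues, it does not wait
                }
            }
        }
        List<T> results = new ArrayList<>();
        if (futures != null) {
            for (Future<T> future : futures) {
                results.add(future.get()); // wait outside the lock
            }
        } else {
            for (Callable<T> task : tasks) {
                results.add(task.call());
            }
        }
        return results;
    }

    // Fills the pool with parent tasks that each fan out through run(), then
    // reports whether all of them finish within timeoutMs.
    static boolean allParentsFinish(int poolSize, int parents, int childrenPerParent, long timeoutMs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        try {
            List<Future<List<Integer>>> parentFutures = new ArrayList<>();
            for (int p = 0; p < parents; p++) {
                parentFutures.add(pool.submit(() -> {
                    List<Callable<Integer>> children = new ArrayList<>();
                    for (int c = 0; c < childrenPerParent; c++) {
                        children.add(() -> 1);
                    }
                    return run(pool, children);
                }));
            }
            for (Future<List<Integer>> parent : parentFutures) {
                if (parent.get(timeoutMs, TimeUnit.MILLISECONDS).size() != childrenPerParent) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            return false; // timeout, interruption, or a failed task
        } finally {
            pool.shutdownNow();
        }
    }
}
```

In this sketch a parent blocks on the pool only if it saw enough idle threads at the time of the check. A parent that finds every thread busy does its work inline and releases its own thread, so the pool cannot fill up with parents that are all waiting.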


> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: a1.png
>
>





[jira] [Assigned] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang reassigned OOZIE-3646:
---

Assignee: Junfan Zhang

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: a1.png
>
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Assigned] (OOZIE-3647) Oozie should use the ConfigurationService.getInt instead of Services.get().getConf().getInt

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang reassigned OOZIE-3647:
---

Assignee: Junfan Zhang

> Oozie should use the ConfigurationService.getInt instead of 
> Services.get().getConf().getInt
> ---
>
> Key: OOZIE-3647
> URL: https://issues.apache.org/jira/browse/OOZIE-3647
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> -Currently, {{ActionCheckerService}} and {{CoordMaterializeTriggerService}} 
> queue their callables using {{CallableQueueService.queueSerial}}. Internally, 
> {{queueSerial}} combines all callables into a single {{CompositeCallable}}, 
> which means they are executed serially in a single thread.-
> -To speed up execution, we should set a default number of callables per batch 
> in {{ActionCheckerService}} and {{CoordMaterializeTriggerService}}, like the- 
> [-RecoveryService-|https://github.com/apache/oozie/blob/e010fbda91bd78cccb227fc872b3ddd317a5ce6a/core/src/main/java/org/apache/oozie/service/RecoveryService.java#L453]
>  
> Oozie should use {{ConfigurationService.getInt}} instead of 
> {{Services.get().getConf().getInt}}, and the default value should be added to 
> oozie-default.xml.





[jira] [Updated] (OOZIE-3647) Oozie should use the ConfigurationService.getInt instead of Services.get().getConf().getInt

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3647:

Summary: Oozie should use the ConfigurationService.getInt instead of 
Services.get().getConf().getInt  (was: Oozie should use the 
{{ConfigurationService.getInt}} instead of {{Services.get().getConf().getInt}})

> Oozie should use the ConfigurationService.getInt instead of 
> Services.get().getConf().getInt
> ---
>
> Key: OOZIE-3647
> URL: https://issues.apache.org/jira/browse/OOZIE-3647
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>
> -Currently, {{ActionCheckerService}} and {{CoordMaterializeTriggerService}} 
> queue their callables using {{CallableQueueService.queueSerial}}. Internally, 
> {{queueSerial}} combines all callables into a single {{CompositeCallable}}, 
> which means they are executed serially in a single thread.-
> -To speed up execution, we should set a default number of callables per batch 
> in {{ActionCheckerService}} and {{CoordMaterializeTriggerService}}, like the- 
> [-RecoveryService-|https://github.com/apache/oozie/blob/e010fbda91bd78cccb227fc872b3ddd317a5ce6a/core/src/main/java/org/apache/oozie/service/RecoveryService.java#L453]
>  
> Oozie should use {{ConfigurationService.getInt}} instead of 
> {{Services.get().getConf().getInt}}, and the default value should be added to 
> oozie-default.xml.





[jira] [Updated] (OOZIE-3647) Set the default number of callables to be queued in a batch to speed up

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3647:

Description: 
-Currently, {{ActionCheckerService}} and {{CoordMaterializeTriggerService}} 
queue their callables using {{CallableQueueService.queueSerial}}. Internally, 
{{queueSerial}} combines all callables into a single {{CompositeCallable}}, 
which means they are executed serially in a single thread.-

-To speed up execution, we should set a default number of callables per batch 
in {{ActionCheckerService}} and {{CoordMaterializeTriggerService}}, like the- 
[-RecoveryService-|https://github.com/apache/oozie/blob/e010fbda91bd78cccb227fc872b3ddd317a5ce6a/core/src/main/java/org/apache/oozie/service/RecoveryService.java#L453]

Oozie should use {{ConfigurationService.getInt}} instead of 
{{Services.get().getConf().getInt}}, and the default value should be added to 
oozie-default.xml.
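
To illustrate the motivation for a single lookup path with declared defaults, 
here is a self-contained sketch that does not use any Oozie classes. It is a 
loose model only: {{DeclaredDefaultsConfig}}, the key {{example.batch.size}} 
and all values are hypothetical, the {{declared}} map merely plays the role of 
oozie-default.xml plus site overrides, and throwing on an undeclared key is a 
choice made for this sketch, not a statement about how Oozie behaves.

```java
import java.util.HashMap;
import java.util.Map;

public class DeclaredDefaultsConfig {
    // Hypothetical stand-in for the merged configuration (defaults file, then overrides).
    private final Map<String, String> declared = new HashMap<>();

    /** Declares a value, as an entry in a defaults file or a site override would. */
    public void declare(String key, int value) {
        declared.put(key, Integer.toString(value));
    }

    /** Inline-default style: every call site brings its own fallback value. */
    public int getInt(String key, int inlineFallback) {
        String value = declared.get(key);
        return value == null ? inlineFallback : Integer.parseInt(value);
    }

    /** Central style: the value must be declared once, in one place. */
    public int getInt(String key) {
        String value = declared.get(key);
        if (value == null) {
            throw new IllegalArgumentException("No default declared for " + key);
        }
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        DeclaredDefaultsConfig conf = new DeclaredDefaultsConfig();
        // Two call sites can silently disagree about the fallback.
        System.out.println(conf.getInt("example.batch.size", 10));
        System.out.println(conf.getInt("example.batch.size", 50));
        // Once the default is declared centrally, every caller sees the same value.
        conf.declare("example.batch.size", 10);
        System.out.println(conf.getInt("example.batch.size"));
    }
}
```

With the inline style the effective default lives in the code of each caller; 
with the central style it lives in one declared place and can be documented 
there.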

  was:
Currently, {{ActionCheckerService}} and {{CoordMaterializeTriggerService}} 
queue their callables using {{CallableQueueService.queueSerial}}. Internally, 
{{queueSerial}} combines all callables into a single {{CompositeCallable}}, 
which means they are executed serially in a single thread.

To speed up execution, we should set a default number of callables per batch 
in {{ActionCheckerService}} and {{CoordMaterializeTriggerService}}, like the 
[RecoveryService|https://github.com/apache/oozie/blob/e010fbda91bd78cccb227fc872b3ddd317a5ce6a/core/src/main/java/org/apache/oozie/service/RecoveryService.java#L453]


> Set the default number of callables to be queued in a batch to speed up
> ---
>
> Key: OOZIE-3647
> URL: https://issues.apache.org/jira/browse/OOZIE-3647
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>
> -Currently, {{ActionCheckerService}} and {{CoordMaterializeTriggerService}} 
> queue their callables using {{CallableQueueService.queueSerial}}. Internally, 
> {{queueSerial}} combines all callables into a single {{CompositeCallable}}, 
> which means they are executed serially in a single thread.-
> -To speed up execution, we should set a default number of callables per batch 
> in {{ActionCheckerService}} and {{CoordMaterializeTriggerService}}, like the- 
> [-RecoveryService-|https://github.com/apache/oozie/blob/e010fbda91bd78cccb227fc872b3ddd317a5ce6a/core/src/main/java/org/apache/oozie/service/RecoveryService.java#L453]
>  
> Oozie should use {{ConfigurationService.getInt}} instead of 
> {{Services.get().getConf().getInt}}, and the default value should be added to 
> oozie-default.xml.





[jira] [Updated] (OOZIE-3647) Oozie should use the {{ConfigurationService.getInt}} instead of {{Services.get().getConf().getInt}}

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3647:

Summary: Oozie should use the {{ConfigurationService.getInt}} instead of 
{{Services.get().getConf().getInt}}  (was: Set the default number of callables 
to be queued in a batch to speed up)

> Oozie should use the {{ConfigurationService.getInt}} instead of 
> {{Services.get().getConf().getInt}}
> ---
>
> Key: OOZIE-3647
> URL: https://issues.apache.org/jira/browse/OOZIE-3647
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>
> -Currently, {{ActionCheckerService}} and {{CoordMaterializeTriggerService}} 
> queue their callables using {{CallableQueueService.queueSerial}}. Internally, 
> {{queueSerial}} combines all callables into a single {{CompositeCallable}}, 
> which means they are executed serially in a single thread.-
> -To speed up execution, we should set a default number of callables per batch 
> in {{ActionCheckerService}} and {{CoordMaterializeTriggerService}}, like the- 
> [-RecoveryService-|https://github.com/apache/oozie/blob/e010fbda91bd78cccb227fc872b3ddd317a5ce6a/core/src/main/java/org/apache/oozie/service/RecoveryService.java#L453]
>  
> Oozie should use {{ConfigurationService.getInt}} instead of 
> {{Services.get().getConf().getInt}}, and the default value should be added to 
> oozie-default.xml.





[jira] [Updated] (OOZIE-3647) Set the default number of callables to be queued in a batch to speed up

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3647:

Description: 
Currently, {{ActionCheckerService}} and {{CoordMaterializeTriggerService}} 
queue their callables using {{CallableQueueService.queueSerial}}. Internally, 
{{queueSerial}} combines all callables into a single {{CompositeCallable}}, 
which means they are executed serially in a single thread.

To speed up execution, we should set a default number of callables per batch 
in {{ActionCheckerService}} and {{CoordMaterializeTriggerService}}, like the 
[RecoveryService|https://github.com/apache/oozie/blob/e010fbda91bd78cccb227fc872b3ddd317a5ce6a/core/src/main/java/org/apache/oozie/service/RecoveryService.java#L453]
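
The batching idea described above can be sketched with plain 
{{java.util.concurrent}} types. This is an illustration under assumptions, not 
Oozie code: {{BatchedSerialQueue}}, {{partition}}, {{composite}} and 
{{runInBatches}} are hypothetical names, and the linked RecoveryService code 
may be structured differently. Each batch is wrapped in one composite callable 
that runs its members one after another, while separate batches can run on 
different pool threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedSerialQueue {

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<T>(items.subList(i, end)));
        }
        return batches;
    }

    /** Wraps one batch as a single callable that runs its members serially. */
    public static <T> Callable<List<T>> composite(List<Callable<T>> batch) {
        return () -> {
            List<T> results = new ArrayList<>();
            for (Callable<T> callable : batch) {
                results.add(callable.call());
            }
            return results;
        };
    }

    /** Submits one composite per batch and collects results in submission order. */
    public static <T> List<T> runInBatches(ExecutorService pool,
                                           List<Callable<T>> callables,
                                           int batchSize) throws Exception {
        List<Future<List<T>>> futures = new ArrayList<>();
        for (List<Callable<T>> batch : partition(callables, batchSize)) {
            futures.add(pool.submit(composite(batch)));
        }
        List<T> all = new ArrayList<>();
        for (Future<List<T>> future : futures) {
            all.addAll(future.get());
        }
        return all;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<Integer>> callables = new ArrayList<>();
            for (int i = 0; i < 5; i++) {
                final int n = i;
                callables.add(() -> n * n);
            }
            System.out.println(runInBatches(pool, callables, 2));
        } finally {
            pool.shutdown();
        }
    }
}
```

With a batch size equal to the number of callables this degenerates to the 
single-composite behaviour described above; smaller batches trade ordering 
across batches for parallelism.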

  was:Now in {{ActionCheckerService 


> Set the default number of callables to be queued in a batch to speed up
> ---
>
> Key: OOZIE-3647
> URL: https://issues.apache.org/jira/browse/OOZIE-3647
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>
> Currently, {{ActionCheckerService}} and {{CoordMaterializeTriggerService}} 
> queue their callables using {{CallableQueueService.queueSerial}}. Internally, 
> {{queueSerial}} combines all callables into a single {{CompositeCallable}}, 
> which means they are executed serially in a single thread.
> To speed up execution, we should set a default number of callables per batch 
> in {{ActionCheckerService}} and {{CoordMaterializeTriggerService}}, like the 
> [RecoveryService|https://github.com/apache/oozie/blob/e010fbda91bd78cccb227fc872b3ddd317a5ce6a/core/src/main/java/org/apache/oozie/service/RecoveryService.java#L453]





[jira] [Updated] (OOZIE-3647) Set the default number of callables to be queued in a batch to speed up

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3647:

Description: Now in {{ActionCheckerService 

> Set the default number of callables to be queued in a batch to speed up
> ---
>
> Key: OOZIE-3647
> URL: https://issues.apache.org/jira/browse/OOZIE-3647
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Priority: Major
>
> Now in {{ActionCheckerService 





[jira] [Created] (OOZIE-3647) Set the default number of callables to be queued in a batch to speed up

2021-12-13 Thread Junfan Zhang (Jira)
Junfan Zhang created OOZIE-3647:
---

 Summary: Set the default number of callables to be queued in a 
batch to speed up
 Key: OOZIE-3647
 URL: https://issues.apache.org/jira/browse/OOZIE-3647
 Project: Oozie
  Issue Type: Improvement
Reporter: Junfan Zhang








[jira] [Updated] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3646:

Attachment: a1.png

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
> Attachments: a1.png
>
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Updated] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3646:

Attachment: (was: Screen Shot 2021-12-14 at 2.24.10 PM.png)

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Comment Edited] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458913#comment-17458913
 ] 

Junfan Zhang edited comment on OOZIE-3646 at 12/14/21, 6:26 AM:


Thanks [~dionusos].
The test case is now attached ([link|https://github.com/apache/oozie/pull/65]). 
In addition, the stack of the stuck threads is in the ticket's attachment.

If you run the {{testPossibleDeadLock}} method, it will fail. 
But if you set 
{{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, 
false);}}, everything works. 
The failure is caused by the synchronous call in {{SignalXCommand}}: 
{code:java}
List<Future<ActionExecutorContext>> futures = 
Services.get().get(CallableQueueService.class)
.invokeAll(tasks);
{code}

Please check it and let me know what you think [~dionusos]


was (Author: zuston):
Thanks [~dionusos].
The test case is now attached: https://github.com/apache/oozie/pull/65 
In addition, the stack of the stuck threads is in the ticket's attachment.

If you run the {{testPossibleDeadLock}} method, it will fail. 
But if you set 
{{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, 
false);}}, everything works. 
The failure is caused by the synchronous call in {{SignalXCommand}}: 
{code:java}
List<Future<ActionExecutorContext>> futures = 
Services.get().get(CallableQueueService.class)
.invokeAll(tasks);
{code}

Please check it and let me know what you think [~dionusos]

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
> Attachments: Screen Shot 2021-12-14 at 2.24.10 PM.png
>
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Updated] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3646:

Attachment: Screen Shot 2021-12-14 at 2.24.10 PM.png

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
> Attachments: Screen Shot 2021-12-14 at 2.24.10 PM.png
>
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Comment Edited] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458913#comment-17458913
 ] 

Junfan Zhang edited comment on OOZIE-3646 at 12/14/21, 6:25 AM:


Thanks [~dionusos].
The test case is now attached: https://github.com/apache/oozie/pull/65 
In addition, the stack of the stuck threads is in the ticket's attachment.

If you run the {{testPossibleDeadLock}} method, it will fail. 
But if you set 
{{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, 
false);}}, everything works. 
The failure is caused by the synchronous call in {{SignalXCommand}}: 
{code:java}
List<Future<ActionExecutorContext>> futures = 
Services.get().get(CallableQueueService.class)
.invokeAll(tasks);
{code}

Please check it and let me know what you think [~dionusos]


was (Author: zuston):
Thanks [~dionusos].
The test case is now attached: https://github.com/apache/oozie/pull/65

If you run the {{testPossibleDeadLock}} method, it will fail. 
But if you set 
{{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, 
false);}}, everything works. 
The failure is caused by the synchronous call in {{SignalXCommand}}: 
{code:java}
List<Future<ActionExecutorContext>> futures = 
Services.get().get(CallableQueueService.class)
.invokeAll(tasks);
{code}

Please check it and let me know what you think [~dionusos]

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Comment Edited] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458913#comment-17458913
 ] 

Junfan Zhang edited comment on OOZIE-3646 at 12/14/21, 5:45 AM:


Thanks [~dionusos].
The test case is now attached: https://github.com/apache/oozie/pull/65

If you run the {{testPossibleDeadLock}} method, it will fail. 
But if you set 
{{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, 
false);}}, everything works. 
The failure is caused by the synchronous call in {{SignalXCommand}}: 
{code:java}
List<Future<ActionExecutorContext>> futures = 
Services.get().get(CallableQueueService.class)
.invokeAll(tasks);
{code}

Please check it and let me know what you think [~dionusos]


was (Author: zuston):
Thanks [~dionusos].
The test case is now attached: https://github.com/apache/oozie/pull/65

If you run the {{testPossibleDeadLock}} method, it will fail. 
But if you set 
{{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, 
false);}}, everything works. 
The failure is caused by the synchronous call in {{SignalXCommand}}: 
{code:java}
List<Future<ActionExecutorContext>> futures = 
Services.get().get(CallableQueueService.class)
.invokeAll(tasks);
{code}


> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458913#comment-17458913
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Thanks [~dionusos].
The test case is now attached: https://github.com/apache/oozie/pull/65

If you run the {{testPossibleDeadLock}} method, it will fail. 
But if you set 
{{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, 
false);}}, everything works. 
The failure is caused by the synchronous call in {{SignalXCommand}}: 
{code:java}
List<Future<ActionExecutorContext>> futures = 
Services.get().get(CallableQueueService.class)
.invokeAll(tasks);
{code}


> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458337#comment-17458337
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Hi [~dionusos], could you tell me how to run the test cases in 
TestSignalXCommand from the shell, for example with {{mvn test}}? Thanks

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458322#comment-17458322
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Got it [~dionusos]. A test case will be provided in the next few days.

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458223#comment-17458223
 ] 

Junfan Zhang commented on OOZIE-3646:
-

Github PR link: https://github.com/apache/oozie/pull/64

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Commented] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458222#comment-17458222
 ] 

Junfan Zhang commented on OOZIE-3646:
-

This is a serious bug. Please check it. [~dionusos] [~asalamon74] 

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> The limited thread execution mechanism aims to prevent the deadlock that 
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume the Oozie CallableQueue thread pool size is 120. If all of these 
> threads are executing the {{SignalXCommand.startForkedActions}} method at the 
> same time, a deadlock occurs.
> This is because in {{SignalXCommand.startForkedActions}}, the call 
> {{List<Future<ActionExecutorContext>> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} runs synchronously and waits for the forked tasks, but 
> all CallableQueue threads are already busy, so no thread is left to run them.
> h2. Solution
> 1. Avoid calling invokeAll directly when the number of remaining threads is 
> less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Updated] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3646:

Description: 
The limited thread execution mechanism aims to prevent the deadlock that occurs when all 
active threads are executing SignalXCommand's invokeAll call.

h2. When the deadlock happens
Assume that the Oozie CallableQueue thread pool size is 120. When all threads are 
executing the {{SignalXCommand.startForkedActions}} method, a deadlock occurs.
This is because, in {{SignalXCommand.startForkedActions}}, the call 
{{List> futures = 
Services.get().get(CallableQueueService.class)
.invokeAll(tasks);}} executes synchronously, while all 
callableQueue threads are already busy.

h2. Solution
1. Limit direct invokeAll calls when the number of remaining idle threads is less 
than the number of tasks.
2. To obtain the correct number of active threads in callableQueue, the 
SignalXCommand.class lock is needed.
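
The two solution steps can be sketched roughly as follows. This is only an illustration using a plain java.util.concurrent.ThreadPoolExecutor, not the attached patch: the class GuardedInvoker, the method runAll and the LOCK object are hypothetical names, LOCK stands in for the SignalXCommand.class lock, and getActiveCount() only gives an approximate count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;

/** Hypothetical helper, not part of Oozie: illustrates the guard described above. */
final class GuardedInvoker {
    // Stand-in for the SignalXCommand.class lock mentioned in the ticket.
    private static final Object LOCK = new Object();

    private GuardedInvoker() { }

    /**
     * Runs all tasks and returns their results in order. Tasks are handed to the
     * pool only when enough idle threads remain; otherwise they run in the caller.
     */
    static <T> List<T> runAll(ThreadPoolExecutor pool, List<Callable<T>> tasks) {
        List<Future<T>> futures = null;
        synchronized (LOCK) {
            // Check and submit under one lock so concurrent callers see a
            // consistent view; getActiveCount() is approximate (best effort).
            int idle = pool.getMaximumPoolSize() - pool.getActiveCount();
            if (idle >= tasks.size()) {
                futures = new ArrayList<>();
                for (Callable<T> task : tasks) {
                    futures.add(pool.submit(task));
                }
            }
        }
        List<T> results = new ArrayList<>();
        try {
            if (futures != null) {
                // Wait outside the lock so other callers are not blocked.
                for (Future<T> future : futures) {
                    results.add(future.get());
                }
            } else {
                // Not enough idle threads: run inline instead of waiting on the pool.
                for (Callable<T> task : tasks) {
                    results.add(task.call());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } catch (Exception e) {
            throw new IllegalStateException("task failed", e);
        }
        return results;
    }
}
```

Running the tasks in the calling thread is one possible fallback; it trades parallelism for the guarantee that a caller never blocks on a pool that has no free thread left.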

> Possible dead-lock in SignalXCommand
> 
>
> Key: OOZIE-3646
> URL: https://issues.apache.org/jira/browse/OOZIE-3646
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Priority: Major
>
> The limited thread execution mechanism aims to prevent the deadlock that occurs when all 
> active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume that the Oozie CallableQueue thread pool size is 120. When all threads 
> are executing the {{SignalXCommand.startForkedActions}} method, a deadlock 
> occurs.
> This is because, in {{SignalXCommand.startForkedActions}}, the call 
> {{List> futures = 
> Services.get().get(CallableQueueService.class)
> .invokeAll(tasks);}} executes synchronously, while 
> all callableQueue threads are already busy.
> h2. Solution
> 1. Limit direct invokeAll calls when the number of remaining idle threads is less 
> than the number of tasks.
> 2. To obtain the correct number of active threads in callableQueue, the 
> SignalXCommand.class lock is needed.





[jira] [Created] (OOZIE-3646) Possible dead-lock in SignalXCommand

2021-12-13 Thread Junfan Zhang (Jira)
Junfan Zhang created OOZIE-3646:
---

 Summary: Possible dead-lock in SignalXCommand
 Key: OOZIE-3646
 URL: https://issues.apache.org/jira/browse/OOZIE-3646
 Project: Oozie
  Issue Type: Bug
Reporter: Junfan Zhang








[jira] [Commented] (OOZIE-3635) Reduce nest of code in RecoveryService

2021-12-08 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456125#comment-17456125
 ] 

Junfan Zhang commented on OOZIE-3635:
-

Is a GitHub PR OK?
I'm not familiar with Review Board. [~dionusos]

> Reduce nest of code in RecoveryService
> --
>
> Key: OOZIE-3635
> URL: https://issues.apache.org/jira/browse/OOZIE-3635
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Minor
> Attachments: OOZIE-3635-1.patch, OOZIE-3635-2.patch
>
>
> RecoveryService has too much nested code; this ticket reduces it.





[jira] [Commented] (OOZIE-3635) Reduce nest of code in RecoveryService

2021-09-26 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420505#comment-17420505
 ] 

Junfan Zhang commented on OOZIE-3635:
-

[~dionusos] Take it easy.

A new patch has been uploaded. Please check it.

> Reduce nest of code in RecoveryService
> --
>
> Key: OOZIE-3635
> URL: https://issues.apache.org/jira/browse/OOZIE-3635
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Minor
> Attachments: OOZIE-3635-1.patch, OOZIE-3635-2.patch
>
>
> RecoveryService has too much nested code; this ticket reduces it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (OOZIE-3635) Reduce nest of code in RecoveryService

2021-09-26 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3635:

Attachment: OOZIE-3635-2.patch

> Reduce nest of code in RecoveryService
> --
>
> Key: OOZIE-3635
> URL: https://issues.apache.org/jira/browse/OOZIE-3635
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Minor
> Attachments: OOZIE-3635-1.patch, OOZIE-3635-2.patch
>
>
> RecoveryService has too much nested code; this ticket reduces it.





[jira] [Updated] (OOZIE-3635) Reduce nest of code in RecoveryService

2021-09-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3635:

Attachment: OOZIE-3635-1.patch

> Reduce nest of code in RecoveryService
> --
>
> Key: OOZIE-3635
> URL: https://issues.apache.org/jira/browse/OOZIE-3635
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Minor
> Attachments: OOZIE-3635-1.patch
>
>
> RecoveryService has too much nested code; this ticket reduces it.





[jira] [Updated] (OOZIE-3635) Reduce nest of code in RecoveryService

2021-09-13 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3635:

Description: RecoveryService has too much nested code; this ticket reduces it.

> Reduce nest of code in RecoveryService
> --
>
> Key: OOZIE-3635
> URL: https://issues.apache.org/jira/browse/OOZIE-3635
> Project: Oozie
>  Issue Type: Improvement
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Minor
>
> RecoveryService has too much nested code; this ticket reduces it.





[jira] [Created] (OOZIE-3635) Reduce nest of code in RecoveryService

2021-09-13 Thread Junfan Zhang (Jira)
Junfan Zhang created OOZIE-3635:
---

 Summary: Reduce nest of code in RecoveryService
 Key: OOZIE-3635
 URL: https://issues.apache.org/jira/browse/OOZIE-3635
 Project: Oozie
  Issue Type: Improvement
Reporter: Junfan Zhang
Assignee: Junfan Zhang








[jira] [Comment Edited] (OOZIE-3594) Support Flink batch action

2021-07-06 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375664#comment-17375664
 ] 

Junfan Zhang edited comment on OOZIE-3594 at 7/6/21, 12:01 PM:
---

Agreed, [~asalamon74] [~dionusos].
I will submit a design proposal doc later, and then we can discuss the details. 
It's a big task, but fortunately we have already run this Flink batch 
action on our internal Oozie version and proved that it is feasible in our 
production environment.

Besides, do you have any plan to remove the Sqoop or Pig actions in a newer Oozie 
version? They may block Oozie's compatibility with Hadoop 3.x.


was (Author: zuston):
Agreed, [~asalamon74] [~dionusos].
I will submit a design proposal doc later, and then we can discuss the details. 
It's a big task, but fortunately we have already run this Flink batch 
action online and proved that it is feasible.

Besides, do you have any plan to remove the Sqoop or Pig actions in a newer Oozie 
version? They may block Oozie's compatibility with Hadoop 3.x.

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Spark and Hive actions have been supported on Oozie since a long time ago. Now 
> Flink batch is production-ready, and I think it's necessary to integrate Flink 
> batch action on Oozie as a workflow node.





[jira] [Comment Edited] (OOZIE-3594) Support Flink batch action

2021-07-06 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375664#comment-17375664
 ] 

Junfan Zhang edited comment on OOZIE-3594 at 7/6/21, 12:00 PM:
---

Agreed, [~asalamon74] [~dionusos].
I will submit a design proposal doc later, and then we can discuss the details. 
It's a big task, but fortunately we have already run this Flink batch 
action online and proved that it is feasible.

Besides, do you have any plan to remove the Sqoop or Pig actions in a newer Oozie 
version? They may block Oozie's compatibility with Hadoop 3.x.


was (Author: zuston):
Agreed, [~asalamon74] [~dionusos].
I will submit a design proposal doc later, and then we can discuss the details.

Besides, do you have any plan to remove the Sqoop or Pig actions in a newer Oozie 
version? They may block Oozie's compatibility with Hadoop 3.x.

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Spark and Hive actions have been supported on Oozie since a long time ago. Now 
> Flink batch is production-ready, and I think it's necessary to integrate Flink 
> batch action on Oozie as a workflow node.





[jira] [Commented] (OOZIE-3594) Support Flink batch action

2021-07-06 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375664#comment-17375664
 ] 

Junfan Zhang commented on OOZIE-3594:
-

Agreed, [~asalamon74] [~dionusos].
I will submit a design proposal doc later, and then we can discuss the details.

Besides, do you have any plan to remove the Sqoop or Pig actions in a newer Oozie 
version? They may block Oozie's compatibility with Hadoop 3.x.

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Spark and Hive actions have been supported on Oozie since a long time ago. Now 
> Flink batch is production-ready, and I think it's necessary to integrate Flink 
> batch action on Oozie as a workflow node.





[jira] [Commented] (OOZIE-3629) Fix the log print format error in CredentialsProviderFactory.java

2021-07-06 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375659#comment-17375659
 ] 

Junfan Zhang commented on OOZIE-3629:
-

[~dionusos] Thanks for your quick reply. Sorry for not giving a detailed reason for 
this change in the description.

The single quotation mark causes this log message's EL params (like \{n\}) 
to be assigned incorrectly.
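
A small illustration of the effect, assuming the log message is formatted with java.text.MessageFormat-style patterns (the QuoteDemo class, the render helper and the sample message are hypothetical, not the code touched by the patch):

```java
import java.text.MessageFormat;

/** Hypothetical demo class: shows how single quotes affect {n} placeholders. */
final class QuoteDemo {
    /** Formats a pattern with java.text.MessageFormat, as a stand-in for a log call. */
    static String render(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        // A single quote starts a quoted section, so {0} is kept as literal text.
        System.out.println(render("Provider '{0}' loaded", "hcat"));
        // A doubled quote yields a literal ' and lets {0} be substituted.
        System.out.println(render("Provider ''{0}'' loaded", "hcat"));
    }
}
```

The first call prints the placeholder itself instead of the argument; the second prints the argument wrapped in single quotes.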

> Fix the log print format error in CredentialsProviderFactory.java
> -
>
> Key: OOZIE-3629
> URL: https://issues.apache.org/jira/browse/OOZIE-3629
> Project: Oozie
>  Issue Type: Bug
>Reporter: yang liu
>Priority: Major
> Attachments: OOZIE-3629-1.patch
>
>






[jira] [Commented] (OOZIE-3629) Fix the log print format error in CredentialsProviderFactory.java

2021-07-06 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375396#comment-17375396
 ] 

Junfan Zhang commented on OOZIE-3629:
-

[~dionusos] Please check it. Thanks

> Fix the log print format error in CredentialsProviderFactory.java
> -
>
> Key: OOZIE-3629
> URL: https://issues.apache.org/jira/browse/OOZIE-3629
> Project: Oozie
>  Issue Type: Bug
>Reporter: yang liu
>Priority: Major
> Attachments: OOZIE-3629-1.patch
>
>






[jira] [Updated] (OOZIE-3594) Support Flink batch action

2021-06-30 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3594:

Description: Spark and Hive actions have been all supported on Oozie a long 
time ago. Now Flink batch is production-ready, and I think it's necessary to 
integrate Flink batch action on Oozie as a workflow node.  (was: Spark and 
Hive actions are all supported on Oozie a long time ago. Now Flink batch is 
production-ready, and I think it's necessary to integrate Flink batch action on 
Oozie as a workflow node.)

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Spark and Hive actions have been all supported on Oozie a long time ago. Now 
> Flink batch is production-ready, and I think it's necessary to integrate Flink 
> batch action on Oozie as a workflow node.





[jira] [Comment Edited] (OOZIE-3594) Support Flink batch action

2021-06-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371889#comment-17371889
 ] 

Junfan Zhang edited comment on OOZIE-3594 at 6/30/21, 8:33 AM:
---

Thanks for your quick reply. [~dionusos]
To support the authentication mechanism with delegation tokens, I have submitted 
several PRs, which have all been merged. The prerequisites for Flink integration 
into Oozie have been met.

If the community wants to support a Flink batch action, I will submit the 
proposal for it later.



was (Author: zuston):
Thanks for your quick reply. [~dionusos]
To support the authentication mechanism with delegation tokens, I have submitted 
several PR, which have all been merged. The prerequisites for Flink integration 
into Oozie have been met.

If the community wants to support a Flink batch action, I will submit the 
proposal for it later.


> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Spark and Hive actions have been supported on Oozie since a long time ago. Now 
> Flink batch is production-ready, and I think it's necessary to integrate Flink 
> batch action on Oozie as a workflow node.





[jira] [Commented] (OOZIE-3594) Support Flink batch action

2021-06-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371889#comment-17371889
 ] 

Junfan Zhang commented on OOZIE-3594:
-

Thanks for your quick reply. [~dionusos]
To support the authentication mechanism with delegation tokens, I have submitted 
several PRs, which have all been merged. The prerequisites for Flink integration 
into Oozie have been met.

If the community wants to support a Flink batch action, I will submit the 
proposal for it later.


> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Spark and Hive actions have been supported on Oozie since a long time ago. Now 
> Flink batch is production-ready, and I think it's necessary to integrate Flink 
> batch action on Oozie as a workflow node.





[jira] [Updated] (OOZIE-3594) Support Flink batch action

2021-06-30 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3594:

Description: Spark and Hive actions have been supported on Oozie since a 
long time ago. Now Flink batch is production-ready, and I think it's necessary to 
integrate Flink batch action on Oozie as a workflow node.  (was: Spark and 
Hive actions have been all supported on Oozie a long time ago. Now Flink batch 
is production-ready, and I think it's necessary to integrate Flink batch action on 
Oozie as a workflow node.)

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Spark and Hive actions have been supported on Oozie since a long time ago. Now 
> Flink batch is production-ready, and I think it's necessary to integrate Flink 
> batch action on Oozie as a workflow node.





[jira] [Updated] (OOZIE-3594) Support Flink batch action

2021-06-30 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3594:

Description: Spark and Hive actions are all supported on Oozie a long time 
ago. Now Flink batch is production-ready, and I think it's necessary to integrate 
Flink batch action on Oozie as a workflow node.  (was: Spark and Hive actions 
are all supported on Oozie for a long time. Now Flink batch is production-ready, and 
I think it's necessary to integrate Flink batch action on Oozie.)

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Spark and Hive actions are all supported on Oozie a long time ago. Now Flink 
> batch is production-ready, and I think it's necessary to integrate Flink batch 
> action on Oozie as a workflow node.





[jira] [Updated] (OOZIE-3594) Support Flink batch action

2021-06-30 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3594:

Description: Spark and Hive actions are all supported on Oozie for a long 
time. Now Flink batch is production-ready, and I think it's necessary to integrate 
Flink batch action on Oozie.  (was: Now Spark. Hive action are all supported )

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Spark and Hive actions are all supported on Oozie for a long time. Now Flink 
> batch is production-ready, and I think it's necessary to integrate Flink batch 
> action on Oozie.





[jira] [Updated] (OOZIE-3594) Support Flink batch action

2021-06-30 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3594:

Description: Now Spark. Hive action are all supported   (was: Plan to do 
it?)

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Affects Versions: 5.1.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Now Spark. Hive action are all supported 





[jira] [Commented] (OOZIE-3594) Support Flink batch action

2021-06-28 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17371028#comment-17371028
 ] 

Junfan Zhang commented on OOZIE-3594:
-

Update:
Another Flink & Hive PR has been merged, 
https://issues.apache.org/jira/browse/FLINK-22329

Now that Flink supports delegation tokens, we have added a Flink batch action in 
our production environment.

Any thoughts on it? If it's OK, I will submit a proposal to support it. 
[~dionusos] [~asalamon74]

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Plan to do it?





[jira] [Commented] (OOZIE-3620) hadoopId is not sent to eventHandlerService (listener) for workflow action events

2021-05-17 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345978#comment-17345978
 ] 

Junfan Zhang commented on OOZIE-3620:
-

Hi [~akkidx]. You need to attach a patch in Jira. Oozie CI doesn't support GitHub 
PRs.


> hadoopId is not sent to eventHandlerService (listener) for workflow action 
> events
> -
>
> Key: OOZIE-3620
> URL: https://issues.apache.org/jira/browse/OOZIE-3620
> Project: Oozie
>  Issue Type: Bug
>  Components: action, workflow
>Affects Versions: 4.3.0
>Reporter: Akshesh Doshi
>Priority: Major
> Attachments: oozie-3620.patch
>
>
> For some reason `hadoopId` is not set to `externalId` here - 
> https://github.com/apache/oozie/blob/76143a11ac765d786644e49c34f760c89a364d88/core/src/main/java/org/apache/oozie/command/wf/WorkflowXCommand.java#L86





[jira] [Commented] (OOZIE-3622) No mapreduce jars in the classpath for hadoop3 mapreduce job

2021-05-17 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345974#comment-17345974
 ] 

Junfan Zhang commented on OOZIE-3622:
-

Could you add more description to it? [~zjffdu]

> No mapreduce jars in the classpath  for hadoop3 mapreduce job
> -
>
> Key: OOZIE-3622
> URL: https://issues.apache.org/jira/browse/OOZIE-3622
> Project: Oozie
>  Issue Type: Bug
>  Components: action
>Affects Versions: 5.2.1
>Reporter: Jeff Zhang
>Priority: Major
>






[jira] [Comment Edited] (OOZIE-3594) Support Flink batch action

2021-05-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344291#comment-17344291
 ] 

Junfan Zhang edited comment on OOZIE-3594 at 5/14/21, 3:55 AM:
---

[~gezapeti]
Update:
To support a Flink batch action on Oozie, Flink needs to support submission with 
delegation token credentials.
So I submitted some PRs to Flink, which have been merged.

1. https://issues.apache.org/jira/browse/FLINK-21700#
2. https://issues.apache.org/jira/browse/FLINK-22534
3. https://issues.apache.org/jira/browse/FLINK-21768


was (Author: zuston):
[~gezapeti]
Update:
To support a Flink batch action on Oozie, Flink needs to support submission with 
delegation token credentials.
So I submitted some PR to Flink, which have beed merged.

1. https://issues.apache.org/jira/browse/FLINK-21700#
2. https://issues.apache.org/jira/browse/FLINK-22534
3. https://issues.apache.org/jira/browse/FLINK-21768

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Plan to do it?





[jira] [Commented] (OOZIE-3594) Support Flink batch action

2021-05-13 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17344291#comment-17344291
 ] 

Junfan Zhang commented on OOZIE-3594:
-

[~gezapeti]
Update:
To support a Flink batch action on Oozie, Flink needs to support submission with 
delegation token credentials.
So I submitted some PRs to Flink, which have been merged.

1. https://issues.apache.org/jira/browse/FLINK-21700#
2. https://issues.apache.org/jira/browse/FLINK-22534
3. https://issues.apache.org/jira/browse/FLINK-21768

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Plan to do it?





[jira] [Commented] (OOZIE-3619) maxHistory default value set to 720 days assuming it to be hours which is not correct

2021-04-22 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329923#comment-17329923
 ] 

Junfan Zhang commented on OOZIE-3619:
-

Nice catch. But I think {{oozie-log4j.properties}} should also be changed.

> maxHistory default value set to 720 days assuming it to be hours which is not 
> correct
> -
>
> Key: OOZIE-3619
> URL: https://issues.apache.org/jira/browse/OOZIE-3619
> Project: Oozie
>  Issue Type: Bug
>Affects Versions: 5.2.0
>Reporter: Bimalendu Choudhary
>Priority: Minor
> Fix For: trunk
>
> Attachments: oozie-3619.patch
>
>
> The default value of maxHistory in OozieRollingPolicy.java is set to 720, and 
> the comment describes it as hours. However, maxHistory is the maximum number 
> of rolled files that are kept; the rest are deleted. For days, 720 is a 
> huge number.
>  
> We should change the default value to 30, which is equivalent to 30 days, and 
> change the comments to describe it as the number of rolled files and hours.





[jira] [Assigned] (OOZIE-3594) Support flink batch action

2021-01-19 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang reassigned OOZIE-3594:
---

Assignee: Junfan Zhang

> Support flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Plan to do it?





[jira] [Updated] (OOZIE-3594) Support Flink batch action

2021-01-19 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3594:

Summary: Support Flink batch action  (was: Support flink batch action)

> Support Flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Plan to do it?





[jira] [Commented] (OOZIE-3581) Callback does not applied in Oozie server, workflows stuk in RUNNING states.

2020-11-09 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228640#comment-17228640
 ] 

Junfan Zhang commented on OOZIE-3581:
-

Of course. Please send your WeChat account to my email, 
{{junfan.zh...@outlook.com}}. Thanks

> Callback does not applied in Oozie server, workflows stuk in RUNNING states.
> 
>
> Key: OOZIE-3581
> URL: https://issues.apache.org/jira/browse/OOZIE-3581
> Project: Oozie
>  Issue Type: Bug
>  Components: action, workflow
>Affects Versions: 4.3.1
>Reporter: Kotsubinsky Victor
>Priority: Critical
>
> oozie version 4.3.1.3.1.0.0-78
> with HDP3.10 stack , release provides Oozie 4.3.1 and the additional Apache 
> patches listed here: 
> [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_oozie.html]
> I use a Kerberized Hadoop cluster and run YARN MR jobs through Oozie.
> 1. Oozie runs an MR job via YARN.
> 2. After the YARN MR job completes, it successfully sends a callback 
> request to Oozie.
> 3. In the Oozie server logs I can see this request, but Oozie does not apply 
> it, so the workflow action still shows the RUNNING state (until the 
> action.check process checks the workflow IDs and switches the action to the SUCCESS state).
> LOGS in YARN:
> 2020-01-27 12:16:39,749 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification trying 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification to 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
>  succeeded
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification succeeded for job_1579778851579_31505
>  
> Oozie logs about this event:
> 2020-01-27 12:16:39,770 DEBUG CallbackServlet:526 - SERVER[hdp3-oo-2] USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Received a CallbackServlet.doGet() with query string 
> id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Execute command [callback] key [null]
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Queuing [1] commands with delay [0]ms





[jira] [Comment Edited] (OOZIE-3581) Callback does not applied in Oozie server, workflows stuk in RUNNING states.

2020-11-09 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228611#comment-17228611
 ] 

Junfan Zhang edited comment on OOZIE-3581 at 11/9/20, 2:01 PM:
---

Oozie uses two methods to detect the status of tasks: active and passive. 
The active method is the checker service, which polls the status of the task. 
The passive method is the callback, which notifies Oozie and then triggers the 
detection. So if there is a delay, it means that the queue is backlogged.

[~gaofeng6]


was (Author: zuston):
Oozie uses two methods to detect the status of tasks: active and passive. 
The active method is the checker service, which polls the status of the task. 
The passive method is the callback, which notifies Oozie and then triggers the 
detection. So if there is a delay, it means that the queue is backlogged.

> Callback does not applied in Oozie server, workflows stuk in RUNNING states.
> 
>
> Key: OOZIE-3581
> URL: https://issues.apache.org/jira/browse/OOZIE-3581
> Project: Oozie
>  Issue Type: Bug
>  Components: action, workflow
>Affects Versions: 4.3.1
>Reporter: Kotsubinsky Victor
>Priority: Critical
>
> Oozie version 4.3.1.3.1.0.0-78, 
> from the HDP 3.1.0 stack; the release provides Oozie 4.3.1 and the additional Apache 
> patches listed here: 
> [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_oozie.html]
> I run MapReduce jobs through Oozie and YARN on a Kerberized Hadoop cluster.
> 1. Oozie runs an MR job via YARN.
> 2. After the YARN MR job completes, it successfully sends a callback 
> request to Oozie.
> 3. In the Oozie server logs I can see this request, but Oozie does not apply 
> it, so the workflow action stays in the RUNNING state (until 
> the action.check process checks the workflow IDs and switches the action to the SUCCESS state).
> LOGS in YARN:
> 2020-01-27 12:16:39,749 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification trying 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification to 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
>  succeeded
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification succeeded for job_1579778851579_31505
>  
> Oozie logs about this event:
> 2020-01-27 12:16:39,770 DEBUG CallbackServlet:526 - SERVER[hdp3-oo-2] USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Received a CallbackServlet.doGet() with query string 
> id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Execute command [callback] key [null]
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Queuing [1] commands with delay [0]ms





[jira] [Commented] (OOZIE-3581) Callback does not applied in Oozie server, workflows stuk in RUNNING states.

2020-11-09 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228611#comment-17228611
 ] 

Junfan Zhang commented on OOZIE-3581:
-

Oozie detects task status in two ways, active and passive. 
In the active path, the checker service polls the status of the task. In the passive path, 
the callback notifies Oozie, which then triggers the status check. So if there is a delay, it means 
that the queue is backlogged.
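
As a rough illustration of why a backlog delays callback handling, here is a toy model. It is not Oozie's actual queue: it assumes a FIFO queue drained by a fixed number of workers that are all idle at time zero, with every command taking the same time. The class, method, and parameter names are hypothetical.

```java
public class CallbackBacklogSketch {
    /**
     * Toy model (an assumption, not Oozie code): a FIFO queue drained by
     * `workers` threads, every queued command taking `commandMs` to run.
     * Returns how long a newly queued callback command waits before a
     * worker picks it up when `queuedAhead` commands are already waiting.
     */
    public static long waitBeforeStartMs(int queuedAhead, int workers, long commandMs) {
        if (queuedAhead < 0 || workers <= 0 || commandMs < 0) {
            throw new IllegalArgumentException("invalid queue parameters");
        }
        // Commands start in waves of `workers`; the new command starts in
        // wave number queuedAhead / workers (integer division).
        return (long) (queuedAhead / workers) * commandMs;
    }

    public static void main(String[] args) {
        // Empty queue: the callback is processed immediately.
        System.out.println(waitBeforeStartMs(0, 10, 200));    // prints 0
        // 1000 commands ahead, 10 workers, 200 ms each: 100 waves of 200 ms.
        System.out.println(waitBeforeStartMs(1000, 10, 200)); // prints 20000
    }
}
```

Under these assumptions the callback itself arrives on time; what grows with the backlog is the wait before the queued command is executed.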

> Callback does not applied in Oozie server, workflows stuk in RUNNING states.
> 
>
> Key: OOZIE-3581
> URL: https://issues.apache.org/jira/browse/OOZIE-3581
> Project: Oozie
>  Issue Type: Bug
>  Components: action, workflow
>Affects Versions: 4.3.1
>Reporter: Kotsubinsky Victor
>Priority: Critical
>
> Oozie version 4.3.1.3.1.0.0-78, 
> from the HDP 3.1.0 stack; the release provides Oozie 4.3.1 and the additional Apache 
> patches listed here: 
> [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_oozie.html]
> I run MapReduce jobs through Oozie and YARN on a Kerberized Hadoop cluster.
> 1. Oozie runs an MR job via YARN.
> 2. After the YARN MR job completes, it successfully sends a callback 
> request to Oozie.
> 3. In the Oozie server logs I can see this request, but Oozie does not apply 
> it, so the workflow action stays in the RUNNING state (until 
> the action.check process checks the workflow IDs and switches the action to the SUCCESS state).
> LOGS in YARN:
> 2020-01-27 12:16:39,749 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification trying 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification to 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
>  succeeded
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification succeeded for job_1579778851579_31505
>  
> Oozie logs about this event:
> 2020-01-27 12:16:39,770 DEBUG CallbackServlet:526 - SERVER[hdp3-oo-2] USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Received a CallbackServlet.doGet() with query string 
> id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Execute command [callback] key [null]
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Queuing [1] commands with delay [0]ms





[jira] [Commented] (OOZIE-3594) Support flink batch action

2020-04-22 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089791#comment-17089791
 ] 

Junfan Zhang commented on OOZIE-3594:
-

My company plans to do this, and we will post progress updates here. We 
would be very happy to work with the community on Flink batch support.

> Support flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Junfan Zhang
>Priority: Major
>
> Plan to do it?





[jira] [Commented] (OOZIE-3596) When the SSH action is killed, it must be changed to the kill command that can terminate the related subprocess.

2020-04-02 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073619#comment-17073619
 ] 

Junfan Zhang commented on OOZIE-3596:
-

Oozie does have this problem; we have made the same kind of change in our 
production environment.

> When the SSH action is killed, it must be changed to the kill command that 
> can terminate the related subprocess.
> 
>
> Key: OOZIE-3596
> URL: https://issues.apache.org/jira/browse/OOZIE-3596
> Project: Oozie
>  Issue Type: Improvement
>  Components: core
>Reporter: Suekyoung Lee
>Priority: Major
> Attachments: OOZIE-3596-001.patch, OOZIE-3596-002.patch, 
> OOZIE-3596-003.patch
>
>
> When the SSH action is terminated via the kill API, only the ssh-wrapper.sh 
> process (created by Oozie for SSH execution) is terminated, and the 
> child processes it spawned keep running.
> For example, if a shell script that runs spark-shell (e.g. called 
> "run-spark.sh") is started as an SSH action and then killed, only ssh-wrapper.sh 
> exits and the child processes (run-spark.sh and spark-shell) keep 
> running.
> Therefore, the kill command used when the SSH action is killed must be changed to 
> one that also terminates the related subprocesses.





[jira] [Updated] (OOZIE-3594) Support flink batch action

2020-03-12 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3594:

Issue Type: New Feature  (was: Bug)

> Support flink batch action
> --
>
> Key: OOZIE-3594
> URL: https://issues.apache.org/jira/browse/OOZIE-3594
> Project: Oozie
>  Issue Type: New Feature
>Reporter: Junfan Zhang
>Priority: Major
>
> Plan to do it?





[jira] [Created] (OOZIE-3594) Support flink batch action

2020-03-12 Thread Junfan Zhang (Jira)
Junfan Zhang created OOZIE-3594:
---

 Summary: Support flink batch action
 Key: OOZIE-3594
 URL: https://issues.apache.org/jira/browse/OOZIE-3594
 Project: Oozie
  Issue Type: Bug
Reporter: Junfan Zhang


Plan to do it?





[jira] [Commented] (OOZIE-3589) Wrong using copyActionData method in ReRunXCommand

2020-03-01 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048756#comment-17048756
 ] 

Junfan Zhang commented on OOZIE-3589:
-

[~asalamon74], I uploaded the latest patch, which keeps the logic consistent with 
the original code, although I think calling {{copyActionData}} is 
harmless.

>  Wrong using copyActionData method  in ReRunXCommand
> 
>
> Key: OOZIE-3589
> URL: https://issues.apache.org/jira/browse/OOZIE-3589
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3589-V1.patch
>
>
> Code is 
> [here|https://github.com/apache/oozie/blob/a40ab5361372aa73c9e4aa386a9c81bd21742aa4/core/src/main/java/org/apache/oozie/command/wf/ReRunXCommand.java#L213].
>  
> I don't think {{copyActionData}} should be called inside the {{for}} loop. 
> Just call it once, outside the loop.
> Suggested fix:
> {code:java}
> for (int i = 0; i < actions.size(); i++) {
>     // Skipping to delete the sub workflow when rerun failed node option has been provided. As same
>     // action will be used to rerun the job.
>     if (!nodesToSkip.contains(actions.get(i).getName()) &&
>             !(conf.getBoolean(OozieClient.RERUN_FAIL_NODES, false) &&
>             SubWorkflowActionExecutor.ACTION_TYPE.equals(actions.get(i).getType()))) {
>         deleteList.add(actions.get(i));
>         LOG.info("Deleting Action[{0}] for re-run", actions.get(i).getId());
>     }
> }
> copyActionData(newWfInstance, oldWfInstance);
> {code}





[jira] [Updated] (OOZIE-3589) Wrong using copyActionData method in ReRunXCommand

2020-03-01 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3589:

Attachment: OOZIE-3589-V1.patch

>  Wrong using copyActionData method  in ReRunXCommand
> 
>
> Key: OOZIE-3589
> URL: https://issues.apache.org/jira/browse/OOZIE-3589
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3589-V1.patch
>
>
> Code is 
> [here|https://github.com/apache/oozie/blob/a40ab5361372aa73c9e4aa386a9c81bd21742aa4/core/src/main/java/org/apache/oozie/command/wf/ReRunXCommand.java#L213].
>  
> I don't think {{copyActionData}} should be called inside the {{for}} loop. 
> Just call it once, outside the loop.
> Suggested fix:
> {code:java}
> for (int i = 0; i < actions.size(); i++) {
>     // Skipping to delete the sub workflow when rerun failed node option has been provided. As same
>     // action will be used to rerun the job.
>     if (!nodesToSkip.contains(actions.get(i).getName()) &&
>             !(conf.getBoolean(OozieClient.RERUN_FAIL_NODES, false) &&
>             SubWorkflowActionExecutor.ACTION_TYPE.equals(actions.get(i).getType()))) {
>         deleteList.add(actions.get(i));
>         LOG.info("Deleting Action[{0}] for re-run", actions.get(i).getId());
>     }
> }
> copyActionData(newWfInstance, oldWfInstance);
> {code}





[jira] [Assigned] (OOZIE-3423) [docs] Missing files from 5.1.0 docs

2020-03-01 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang reassigned OOZIE-3423:
---

Assignee: (was: Junfan Zhang)

> [docs] Missing files from 5.1.0 docs
> 
>
> Key: OOZIE-3423
> URL: https://issues.apache.org/jira/browse/OOZIE-3423
> Project: Oozie
>  Issue Type: Task
>  Components: docs
>Affects Versions: 5.1.0
>Reporter: Kinga Marton
>Priority: Major
>
> Several files are missing from the 5.1.0 documentation, which were present in 
> the 5.0.0 documentation:
>  * oozie-default.xml
>  * release-log.txt
>  * configuration.xsl
> Links:
> [https://oozie.apache.org/docs/5.1.0/oozie-default.xml]
> [https://oozie.apache.org/docs/5.0.0/oozie-default.xml]
> [https://oozie.apache.org/docs/5.1.0/release-log.txt]





[jira] [Commented] (OOZIE-3479) Build pre-commit pipeline for pull requests.

2020-02-25 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044519#comment-17044519
 ] 

Junfan Zhang commented on OOZIE-3479:
-

We just discussed how to support CI for pull requests in OOZIE-3590.

What can I do for this patch? I'm also looking forward to your patch.

> Build pre-commit pipeline for pull requests.
> 
>
> Key: OOZIE-3479
> URL: https://issues.apache.org/jira/browse/OOZIE-3479
> Project: Oozie
>  Issue Type: Task
>Reporter: Gézapeti
>Priority: Major
>
> I think it would be great to accept pull requests for Oozie as well.
> We should build the pre-commit system out first to see how it works for us 
> before updating the site and changing the rules.





[jira] [Comment Edited] (OOZIE-3588) Support oozie.wf.rerun.skip.nodes in subWorkflow?

2020-02-25 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044489#comment-17044489
 ] 

Junfan Zhang edited comment on OOZIE-3588 at 2/25/20 2:26 PM:
--

The faulty code may be 
[here|https://github.com/apache/oozie/blob/90b9f1077d63ce9dc255f8db170b68dcb8935910/core/src/main/java/org/apache/oozie/action/oozie/SubWorkflowActionExecutor.java#L177].

The scenario: the retryMax parameter is set, but the sub-workflow does not 
rerun. You can reproduce this error. The real problem is that the sub-workflow does not 
rerun, not that the job ID stays unchanged.


was (Author: zuston):
The faulty code may be 
[here|https://github.com/apache/oozie/blob/90b9f1077d63ce9dc255f8db170b68dcb8935910/core/src/main/java/org/apache/oozie/action/oozie/SubWorkflowActionExecutor.java#L177].

The scenario: the retryMax parameter is set, but the sub-workflow does not 
rerun. You can reproduce this error.

> Support oozie.wf.rerun.skip.nodes in subWorkflow?
> -
>
> Key: OOZIE-3588
> URL: https://issues.apache.org/jira/browse/OOZIE-3588
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Reporter: Junfan Zhang
>Priority: Major
>
> In the Oozie code, sub-workflows support oozie.wf.rerun.failed.nodes but do not 
> support oozie.wf.rerun.skip.nodes.





[jira] [Commented] (OOZIE-3588) Support oozie.wf.rerun.skip.nodes in subWorkflow?

2020-02-25 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044489#comment-17044489
 ] 

Junfan Zhang commented on OOZIE-3588:
-

The faulty code may be 
[here|https://github.com/apache/oozie/blob/90b9f1077d63ce9dc255f8db170b68dcb8935910/core/src/main/java/org/apache/oozie/action/oozie/SubWorkflowActionExecutor.java#L177].

The scenario: the retryMax parameter is set, but the sub-workflow does not 
rerun. You can reproduce this error.

> Support oozie.wf.rerun.skip.nodes in subWorkflow?
> -
>
> Key: OOZIE-3588
> URL: https://issues.apache.org/jira/browse/OOZIE-3588
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Reporter: Junfan Zhang
>Priority: Major
>
> In the Oozie code, sub-workflows support oozie.wf.rerun.failed.nodes but do not 
> support oozie.wf.rerun.skip.nodes.





[jira] [Commented] (OOZIE-3589) Wrong using copyActionData method in ReRunXCommand

2020-02-25 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044476#comment-17044476
 ] 

Junfan Zhang commented on OOZIE-3589:
-

If at least one action is not deleted during the rerun, {{copyActionData}} has to be 
called, but only once, so there is no need to call it repeatedly. 
What do you think?
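
A minimal sketch of that idea, under stated assumptions: the {{Action}} class, the call counter, and the no-argument {{copyActionData}} below are hypothetical stand-ins for illustration, not the Oozie types. The loop only records whether any action was kept, and the copy runs at most once afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RerunCopySketch {
    /** Hypothetical stand-in for a workflow action; not an Oozie class. */
    public static final class Action {
        final String name;
        final boolean reusedSubWorkflow;

        public Action(String name, boolean reusedSubWorkflow) {
            this.name = name;
            this.reusedSubWorkflow = reusedSubWorkflow;
        }
    }

    /** Counts invocations so the effect of hoisting the call is visible. */
    public static int copyCalls = 0;

    // Stand-in for copyActionData(newWfInstance, oldWfInstance).
    static void copyActionData() {
        copyCalls++;
    }

    /** Returns the actions to delete; copies action data once if any action is kept. */
    public static List<Action> selectForDeletion(List<Action> actions, Set<String> nodesToSkip) {
        List<Action> deleteList = new ArrayList<>();
        boolean anyKept = false;
        for (Action action : actions) {
            if (!nodesToSkip.contains(action.name) && !action.reusedSubWorkflow) {
                deleteList.add(action);
            } else {
                anyKept = true; // remember it, but do not copy inside the loop
            }
        }
        if (anyKept) {
            copyActionData(); // at most one call, however many actions were kept
        }
        return deleteList;
    }
}
```

This keeps the original condition (copy only when some action survives) while avoiding one call per kept action.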

>  Wrong using copyActionData method  in ReRunXCommand
> 
>
> Key: OOZIE-3589
> URL: https://issues.apache.org/jira/browse/OOZIE-3589
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
>
> Code is 
> [here|https://github.com/apache/oozie/blob/a40ab5361372aa73c9e4aa386a9c81bd21742aa4/core/src/main/java/org/apache/oozie/command/wf/ReRunXCommand.java#L213].
>  
> I don't think {{copyActionData}} should be called inside the {{for}} loop. 
> Just call it once, outside the loop.
> Suggested fix:
> {code:java}
> for (int i = 0; i < actions.size(); i++) {
>     // Skipping to delete the sub workflow when rerun failed node option has been provided. As same
>     // action will be used to rerun the job.
>     if (!nodesToSkip.contains(actions.get(i).getName()) &&
>             !(conf.getBoolean(OozieClient.RERUN_FAIL_NODES, false) &&
>             SubWorkflowActionExecutor.ACTION_TYPE.equals(actions.get(i).getType()))) {
>         deleteList.add(actions.get(i));
>         LOG.info("Deleting Action[{0}] for re-run", actions.get(i).getId());
>     }
> }
> copyActionData(newWfInstance, oldWfInstance);
> {code}





[jira] [Commented] (OOZIE-3479) Build pre-commit pipeline for pull requests.

2020-02-25 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044462#comment-17044462
 ] 

Junfan Zhang commented on OOZIE-3479:
-

This issue is a more suitable place to discuss pull requests. Could we integrate Jenkins builds 
into GitHub pull requests, including tests and other CI operations, along the lines of this guideline: 
[link|https://medium.com/@mreigen/integrate-jenkins-builds-into-github-pull-requests-33bc053d6210]?
 [~asalamon74]

> Build pre-commit pipeline for pull requests.
> 
>
> Key: OOZIE-3479
> URL: https://issues.apache.org/jira/browse/OOZIE-3479
> Project: Oozie
>  Issue Type: Task
>Reporter: Gézapeti
>Priority: Major
>
> I think it would be great to accept pull requests for Oozie as well.
> We should build the pre-commit system out first to see how it works for us 
> before updating the site and changing the rules.





[jira] [Commented] (OOZIE-3590) Fix missing log expression parameters in SLACalculatorMemory

2020-02-25 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1709#comment-1709
 ] 

Junfan Zhang commented on OOZIE-3590:
-

Of course.

I think GitHub pull requests would make code review more convenient. We may need to 
change the Jenkins configuration to support them.

> Fix missing log expression parameters in SLACalculatorMemory
> 
>
> Key: OOZIE-3590
> URL: https://issues.apache.org/jira/browse/OOZIE-3590
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3590-v1.patch
>
>
> Fix missing log expression parameters in SLACalculatorMemory
>  
> Please check the linked github pull request.





[jira] [Updated] (OOZIE-3590) Fix missing log expression parameters in SLACalculatorMemory

2020-02-25 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3590:

Attachment: OOZIE-3590-v1.patch

> Fix missing log expression parameters in SLACalculatorMemory
> 
>
> Key: OOZIE-3590
> URL: https://issues.apache.org/jira/browse/OOZIE-3590
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.0
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3590-v1.patch
>
>
> Fix missing log expression parameters in SLACalculatorMemory
>  
> Please check the linked github pull request.





[jira] [Comment Edited] (OOZIE-3588) Support oozie.wf.rerun.skip.nodes in subWorkflow?

2020-02-25 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044234#comment-17044234
 ] 

Junfan Zhang edited comment on OOZIE-3588 at 2/25/20 8:26 AM:
--

It seems that the sub-workflow also does not support the {{retryMax}} 
parameter. Although the sub-workflow is rerun when this parameter is set, 
the job ID does not change, which makes the rerun meaningless. This is also not stated in the 
documentation. [~asalamon74], please check it. Thanks.


was (Author: zuston):
It seems that the sub-workflow does not support the {{retryMax}} parameter. 
Although the sub-workflow is rerun, the job ID does not change, which makes the rerun 
meaningless. This is also not stated in the documentation. [~asalamon74], 
please check it. Thanks.

> Support oozie.wf.rerun.skip.nodes in subWorkflow?
> -
>
> Key: OOZIE-3588
> URL: https://issues.apache.org/jira/browse/OOZIE-3588
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Reporter: Junfan Zhang
>Priority: Major
>
> In the Oozie code, sub-workflows support oozie.wf.rerun.failed.nodes but do not 
> support oozie.wf.rerun.skip.nodes.





[jira] [Commented] (OOZIE-3588) Support oozie.wf.rerun.skip.nodes in subWorkflow?

2020-02-25 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044234#comment-17044234
 ] 

Junfan Zhang commented on OOZIE-3588:
-

It seems that the sub-workflow does not support the {{retryMax}} parameter. 
Although the sub-workflow is rerun, the job ID does not change, which makes the rerun 
meaningless. This is also not stated in the documentation. [~asalamon74], 
please check it. Thanks.

> Support oozie.wf.rerun.skip.nodes in subWorkflow?
> -
>
> Key: OOZIE-3588
> URL: https://issues.apache.org/jira/browse/OOZIE-3588
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Reporter: Junfan Zhang
>Priority: Major
>
> In the Oozie code, sub-workflows support oozie.wf.rerun.failed.nodes but do not 
> support oozie.wf.rerun.skip.nodes.





[jira] [Created] (OOZIE-3590) Fix missing log expression parameters in SLACalculatorMemory

2020-02-24 Thread Junfan Zhang (Jira)
Junfan Zhang created OOZIE-3590:
---

 Summary: Fix missing log expression parameters in 
SLACalculatorMemory
 Key: OOZIE-3590
 URL: https://issues.apache.org/jira/browse/OOZIE-3590
 Project: Oozie
  Issue Type: Bug
  Components: core
Affects Versions: 5.2.0
Reporter: Junfan Zhang
Assignee: Junfan Zhang


Fix missing log expression parameters in SLACalculatorMemory

 

Please check the linked github pull request.





[jira] [Created] (OOZIE-3589) Wrong using copyActionData method in ReRunXCommand

2020-02-21 Thread Junfan Zhang (Jira)
Junfan Zhang created OOZIE-3589:
---

 Summary:  Wrong using copyActionData method  in ReRunXCommand
 Key: OOZIE-3589
 URL: https://issues.apache.org/jira/browse/OOZIE-3589
 Project: Oozie
  Issue Type: Bug
  Components: core
Reporter: Junfan Zhang
Assignee: Junfan Zhang


Code is 
[here|https://github.com/apache/oozie/blob/a40ab5361372aa73c9e4aa386a9c81bd21742aa4/core/src/main/java/org/apache/oozie/command/wf/ReRunXCommand.java#L213].
 

I don't think {{copyActionData}} should be called inside the {{for}} loop. 
Just call it once, outside the loop.

Suggested fix:

{code:java}
for (int i = 0; i < actions.size(); i++) {
    // Skipping to delete the sub workflow when rerun failed node option has been provided. As same
    // action will be used to rerun the job.
    if (!nodesToSkip.contains(actions.get(i).getName()) &&
            !(conf.getBoolean(OozieClient.RERUN_FAIL_NODES, false) &&
            SubWorkflowActionExecutor.ACTION_TYPE.equals(actions.get(i).getType()))) {
        deleteList.add(actions.get(i));
        LOG.info("Deleting Action[{0}] for re-run", actions.get(i).getId());
    }
}
copyActionData(newWfInstance, oldWfInstance);
{code}






[jira] [Updated] (OOZIE-3588) Support oozie.wf.rerun.skip.nodes in subWorkflow?

2020-02-20 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3588:

Description: In the Oozie code, sub-workflows support 
oozie.wf.rerun.failed.nodes but do not support oozie.wf.rerun.skip.nodes.  
(was: In Oozie code, Subworkflow support oozie.wf.rerun.failed.nodes, but  )

> Support oozie.wf.rerun.skip.nodes in subWorkflow?
> -
>
> Key: OOZIE-3588
> URL: https://issues.apache.org/jira/browse/OOZIE-3588
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Reporter: Junfan Zhang
>Priority: Major
>
> In the Oozie code, sub-workflows support oozie.wf.rerun.failed.nodes but do not 
> support oozie.wf.rerun.skip.nodes.





[jira] [Created] (OOZIE-3588) Support oozie.wf.rerun.skip.nodes in subWorkflow?

2020-02-20 Thread Junfan Zhang (Jira)
Junfan Zhang created OOZIE-3588:
---

 Summary: Support oozie.wf.rerun.skip.nodes in subWorkflow?
 Key: OOZIE-3588
 URL: https://issues.apache.org/jira/browse/OOZIE-3588
 Project: Oozie
  Issue Type: Bug
  Components: core
Reporter: Junfan Zhang








[jira] [Updated] (OOZIE-3588) Support oozie.wf.rerun.skip.nodes in subWorkflow?

2020-02-20 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3588:

Description: In Oozie code, Subworkflow support 
oozie.wf.rerun.failed.nodes, but  

> Support oozie.wf.rerun.skip.nodes in subWorkflow?
> -
>
> Key: OOZIE-3588
> URL: https://issues.apache.org/jira/browse/OOZIE-3588
> Project: Oozie
>  Issue Type: Bug
>  Components: core
>Reporter: Junfan Zhang
>Priority: Major
>
> In Oozie code, Subworkflow support oozie.wf.rerun.failed.nodes, but  





[jira] [Commented] (OOZIE-3582) Rerun end point returning 401 error

2020-02-06 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031563#comment-17031563
 ] 

Junfan Zhang commented on OOZIE-3582:
-

Have you enabled Kerberos? [~mdbilaly2k]

> Rerun end point returning 401 error
> ---
>
> Key: OOZIE-3582
> URL: https://issues.apache.org/jira/browse/OOZIE-3582
> Project: Oozie
>  Issue Type: Bug
>  Components: security
>Affects Versions: 4.2.0
>Reporter: Mohammed Bilal
>Priority: Critical
> Fix For: 4.2.0
>
>
> Hi,
> I am making an HTTP PUT request to rerun an Oozie job, but I am getting an 
> authorization error.
> I added basic authorization, but no luck.
> I referred to the web services API documentation and followed all the steps mentioned there.
> [https://oozie.apache.org/docs/4.2.0/WebServicesAPI.html#Re-Running_a_Workflow_Job]
> It gives no information about any security mechanism that we 
> need to follow before calling the rerun endpoint. I was able to invoke the POST 
> method to start an Oozie job without any authentication.
> Could you please provide more details on how I can fix this issue?
> Error Log:
> org.springframework.web.client.HttpClientErrorException: 401 Unauthorized
> org.springframework.web.client.HttpClientErrorException: 401 Unauthorized
>  at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:94) ~[spring-web-5.0.5.RELEASE.jar:5.0.5.RELEASE]
>  at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:79) ~[spring-web-5.0.5.RELEASE.jar:5.0.5.RELEASE]
>  at org.springframework.web.client.ResponseErrorHandler.handleError(ResponseErrorHandler.java:63) ~[spring-web-5.0.5.RELEASE.jar:5.0.5.RELEASE]
>  at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:775) ~[spring-web-5.0.5.RELEASE.jar:5.0.5.RELEASE]
>  at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:728) ~[spring-web-5.0.5.RELEASE.jar:5.0.5.RELEASE]
>  at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:694) ~[spring-web-5.0.5.RELEASE.jar:5.0.5.RELEASE]
>  at org.springframework.web.client.RestTemplate.put(RestTemplate.java:503) ~[spring-web-5.0.5.RELEASE.jar:5.0.5.RELEASE]
> Thank you!
>  





[jira] [Commented] (OOZIE-3254) [coordinator] LAST_ONLY and NONE execution modes: possible OutOfMemoryError when there are too many coordinator actions to materialize

2020-02-05 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030596#comment-17030596
 ] 

Junfan Zhang commented on OOZIE-3254:
-

[~asalamon74] Could you submit the patch as a GitHub {{pull request}}? That seems 
more readable.

> [coordinator] LAST_ONLY and NONE execution modes: possible OutOfMemoryError 
> when there are too many coordinator actions to materialize
> --
>
> Key: OOZIE-3254
> URL: https://issues.apache.org/jira/browse/OOZIE-3254
> Project: Oozie
>  Issue Type: Bug
>  Components: coordinator
>Affects Versions: 5.0.0
>Reporter: Andras Piros
>Assignee: Andras Salamon
>Priority: Major
> Attachments: OOZIE-3254-01-wip.patch
>
>
> If there is a coordinator job defined with a {{frequency}} by the minute 
> (e.g. {{frequency="* * * * *"}}), and {{start-time}} lies well in the past, 
> and the coordinator job's {{execution-mode}} is {{LAST_ONLY}} or {{NONE}}, it 
> can happen that too many {{CoordinatorActionBean}} instances are kept on JVM 
> heap within {{CoordMaterializeTransitionXCommand#insertList}} as those 
> execution modes [*omit the check for the {{throttle}} 
> value*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java#L439-L443].
> As a consequence, we can see as many as multiple hundred thousands of log 
> entries [*trying to increase 
> {{CoordMaterializeTransitionXCommand#insertList}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/coord/CoordMaterializeTransitionXCommand.java#L560-L566]:
> {noformat}
> [user@host ~]$ grep 'In storeToDB() coord action id' 
> /var/log/oozie/oozie-HOSTNAME.log.out | wc -l
> 478408
> {noformat}
> A much worse consequence is that those {{CoordinatorActionBean}} instances 
> stay reachable from a GC root (through {{insertList}} itself), so the JVM is 
> unable to free them until a subsequent call to {{insertList.clear()}}. In the 
> worst case this results in an {{OutOfMemoryError}}.
> {{CoordMaterializeTransitionXCommand#insertList}} should be checked against a 
> configurable limit parameter (default value something like 1000), and 
> persisted / cleared when that limit is reached.
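
The proposal in the last paragraph (persist and clear the list once a configurable limit is reached) can be sketched as follows. This is a minimal illustration only, not the Oozie patch: the class name BoundedInsertBuffer, the sink callback, and the limit handling are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers items and hands them to a sink in bounded batches. */
final class BoundedInsertBuffer<T> {
    private final int limit;               // assumed configurable; the issue suggests something like 1000
    private final Consumer<List<T>> sink;  // stands in for whatever persists a batch
    private final List<T> pending = new ArrayList<>();
    private int flushCount = 0;

    BoundedInsertBuffer(int limit, Consumer<List<T>> sink) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.sink = sink;
    }

    /** Adds one item and flushes as soon as the limit is reached. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= limit) {
            flush();
        }
    }

    /** Hands the pending items to the sink and clears the buffer so they can be collected. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // pass a copy; the buffer itself is cleared next
        pending.clear();
        flushCount++;
    }

    int pendingSize() { return pending.size(); }

    int flushCount() { return flushCount; }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BoundedInsertBuffer<String> buffer =
                new BoundedInsertBuffer<>(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            buffer.add("action-" + i);
        }
        buffer.flush(); // persist the remainder
        System.out.println(batchSizes); // prints [1000, 1000, 500]
    }
}
```

With this shape the buffer never holds more than the limit, so heap use no longer grows with the number of actions to materialize.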



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (OOZIE-2425) LIFO executes action 1 and then does the LIFO behavior

2020-02-05 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030593#comment-17030593
 ] 

Junfan Zhang commented on OOZIE-2425:
-

[~asalamon74] In fact, this {{LIFO}} problem still exists, and we may 
need to rewrite the {{LIFO}} execution rule to support catch-up tasks, 
solving the problem along the lines [~rkanter] described.

In addition, I think the coordinator does not support minute-level tasks 
very well: materialization is too slow. Do you have any good ideas? Maybe 
we can work on this together.

> LIFO executes action 1 and then does the LIFO behavior
> --
>
> Key: OOZIE-2425
> URL: https://issues.apache.org/jira/browse/OOZIE-2425
> Project: Oozie
>  Issue Type: Bug
>  Components: coordinator
>Affects Versions: trunk
>Reporter: Robert Kanter
>Priority: Major
>
> When using LIFO execution order, Oozie should run the latest READY action 
> first.  This includes when you start your Coordinator in the past, and Oozie 
> plays "catch-up".  However, Oozie will materialize 12 actions (the default), 
> but will do @1 and then do the LIFO behavior (i.e. @12, @11, etc).  This is 
> likely due to a race condition about when the other actions are 
> materialized/READY vs when Oozie checks to run an action, as the actions are 
> materialized in FIFO order.
> {noformat}
> ID                                   Status     Ext ID                            Err Code  Created               Nominal Time
> 113-151009113906602-oozie-oozi-C@1   SUCCEEDED  114-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 11:30 GMT
> 113-151009113906602-oozie-oozi-C@2   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:35 GMT
> 113-151009113906602-oozie-oozi-C@3   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:40 GMT
> 113-151009113906602-oozie-oozi-C@4   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:45 GMT
> 113-151009113906602-oozie-oozi-C@5   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:50 GMT
> 113-151009113906602-oozie-oozi-C@6   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:55 GMT
> 113-151009113906602-oozie-oozi-C@7   RUNNING    132-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:00 GMT
> 113-151009113906602-oozie-oozi-C@8   SUCCEEDED  119-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:05 GMT
> 113-151009113906602-oozie-oozi-C@9   SUCCEEDED  118-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:10 GMT
> 113-151009113906602-oozie-oozi-C@10  SUCCEEDED  117-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:15 GMT
> 113-151009113906602-oozie-oozi-C@11  SUCCEEDED  116-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:20 GMT
> 113-151009113906602-oozie-oozi-C@12  SUCCEEDED  115-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:25 GMT
> {noformat}

[jira] [Commented] (OOZIE-2425) LIFO executes action 1 and then does the LIFO behavior

2020-02-05 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030582#comment-17030582
 ] 

Junfan Zhang commented on OOZIE-2425:
-

Do you have any ideas on this? [~asalamon74]

> LIFO executes action 1 and then does the LIFO behavior
> --
>
> Key: OOZIE-2425
> URL: https://issues.apache.org/jira/browse/OOZIE-2425
> Project: Oozie
>  Issue Type: Bug
>  Components: coordinator
>Affects Versions: trunk
>Reporter: Robert Kanter
>Priority: Major
>
> When using LIFO execution order, Oozie should run the latest READY action 
> first.  This includes when you start your Coordinator in the past, and Oozie 
> plays "catch-up".  However, Oozie will materialize 12 actions (the default), 
> but will do @1 and then do the LIFO behavior (i.e. @12, @11, etc).  This is 
> likely due to a race condition about when the other actions are 
> materialized/READY vs when Oozie checks to run an action, as the actions are 
> materialized in FIFO order.
> {noformat}
> ID                                   Status     Ext ID                            Err Code  Created               Nominal Time
> 113-151009113906602-oozie-oozi-C@1   SUCCEEDED  114-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 11:30 GMT
> 113-151009113906602-oozie-oozi-C@2   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:35 GMT
> 113-151009113906602-oozie-oozi-C@3   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:40 GMT
> 113-151009113906602-oozie-oozi-C@4   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:45 GMT
> 113-151009113906602-oozie-oozi-C@5   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:50 GMT
> 113-151009113906602-oozie-oozi-C@6   READY      -                                 -         2015-10-16 20:42 GMT  2015-10-15 11:55 GMT
> 113-151009113906602-oozie-oozi-C@7   RUNNING    132-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:00 GMT
> 113-151009113906602-oozie-oozi-C@8   SUCCEEDED  119-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:05 GMT
> 113-151009113906602-oozie-oozi-C@9   SUCCEEDED  118-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:10 GMT
> 113-151009113906602-oozie-oozi-C@10  SUCCEEDED  117-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:15 GMT
> 113-151009113906602-oozie-oozi-C@11  SUCCEEDED  116-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:20 GMT
> 113-151009113906602-oozie-oozi-C@12  SUCCEEDED  115-151009113906602-oozie-oozi-W  -         2015-10-16 20:42 GMT  2015-10-15 12:25 GMT
> {noformat}
> The Coordinator was super simple. It was set to LIFO, and was running a 
> workflow with a Shell action that would run sleep with an argument of 10 
> seconds.
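
The suspected race in the description can be illustrated with a small sketch. It assumes, hypothetically, that LIFO means "pick the READY action with the highest action number" and that the pick may run while only some actions have been materialized; the class and method names below are invented for the example and are not Oozie code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class LifoPickSketch {
    /** Hypothetical stand-in for a coordinator action: its number (@N) and status. */
    static final class Action {
        final int number;
        final String status;

        Action(int number, String status) {
            this.number = number;
            this.status = status;
        }
    }

    /** Assumed LIFO rule: among READY actions, pick the one with the highest number. */
    static Optional<Action> pickLifo(List<Action> materialized) {
        return materialized.stream()
                .filter(a -> "READY".equals(a.status))
                .max(Comparator.comparingInt((Action a) -> a.number));
    }

    /**
     * Returns the action number picked when only the first {@code visible} of
     * {@code total} actions have been materialized (FIFO order), or -1 if none.
     */
    static int pickWhenVisible(int visible, int total) {
        List<Action> materialized = new ArrayList<>();
        for (int n = 1; n <= Math.min(visible, total); n++) {
            materialized.add(new Action(n, "READY"));
        }
        return pickLifo(materialized).map(a -> a.number).orElse(-1);
    }

    public static void main(String[] args) {
        // The pick runs right after the first action is materialized: @1 wins.
        System.out.println(pickWhenVisible(1, 12));  // prints 1
        // The pick runs after all 12 are materialized: @12 wins, as LIFO intends.
        System.out.println(pickWhenVisible(12, 12)); // prints 12
    }
}
```

Under these assumptions the selection rule itself is correct; the @1-first behaviour comes only from when the rule runs relative to materialization, which matches the race the reporter suspects.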





[jira] [Commented] (OOZIE-3569) SSH Action should add checking success file

2020-01-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027194#comment-17027194
 ] 

Junfan Zhang commented on OOZIE-3569:
-

[~asalamon74] Thanks~

> SSH Action should add checking success file
> ---
>
> Key: OOZIE-3569
> URL: https://issues.apache.org/jira/browse/OOZIE-3569
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Fix For: 5.3.0
>
> Attachments: OOZIE-3569-v1.patch, OOZIE-3569-v2.patch, 
> OOZIE-3569-v3.patch, OOZIE-3569-v4.patch, OOZIE-3569-v5.patch
>
>
> *Phenomenon* 
> Currently, the {{SSH Action}} check works as follows: 
> First, the check looks up the {{Oozie}} ppid. When the pgid does not 
> exist, it checks whether there is an error file. If the error file does not 
> exist, {{Oozie}} sets the action status to {{OK}}.
> However, when the {{Oozie}} pgid is killed externally, the action will be 
> incorrectly determined to be successful.
> *Solution*
> In ssh-wrapper.sh, when command execution is OK, {{Oozie}} should touch an 
> empty success file, just as it touches the error file.
> In the {{SshActionExecutor}} check method, Oozie should also check that the 
> success file exists.
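
The difference between the current check and the proposed one can be sketched as a pure decision function. This is an illustration under assumptions, not the actual {{SshActionExecutor}} code: the three boolean inputs, the Outcome enum, and both method names are hypothetical.

```java
final class SshCheckSketch {
    enum Outcome { RUNNING, OK, ERROR }

    /** Behaviour described under Phenomenon: once the process is gone, no error file means OK. */
    static Outcome decideWithoutSuccessFile(boolean processAlive, boolean errorFileExists) {
        if (processAlive) {
            return Outcome.RUNNING;
        }
        return errorFileExists ? Outcome.ERROR : Outcome.OK;
    }

    /** Behaviour described under Solution: OK additionally requires the success file. */
    static Outcome decide(boolean processAlive, boolean errorFileExists, boolean successFileExists) {
        if (processAlive) {
            return Outcome.RUNNING;
        }
        if (errorFileExists) {
            return Outcome.ERROR;
        }
        // Neither file present: the wrapper never finished, e.g. it was killed externally.
        return successFileExists ? Outcome.OK : Outcome.ERROR;
    }

    public static void main(String[] args) {
        // Process killed externally: not alive, and neither marker file was written.
        System.out.println(decideWithoutSuccessFile(false, false)); // prints OK
        System.out.println(decide(false, false, false));            // prints ERROR
    }
}
```

Requiring a positive success marker means that a missing error file alone is no longer treated as evidence of success.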





[jira] [Comment Edited] (OOZIE-3581) Callback does not applied in Oozie server, workflows stuk in RUNNING states.

2020-01-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027193#comment-17027193
 ] 

Junfan Zhang edited comment on OOZIE-3581 at 1/31/20 4:59 AM:
--

[~vkotsubinskiy] Yes, this callback triggers an event immediately: first 
{{CompletedActionXCommand}}, then {{ActionCheckXCommand}}.

If there is a delay, there may be a bug in that path. The code is 
[here.|https://github.com/apache/oozie/blob/5b55169c6743194223a0756756fc7247eb673aca/core/src/main/java/org/apache/oozie/servlet/CallbackServlet.java#L121]


was (Author: zuston):
Yes, this callback triggers an event immediately: first 
{{CompletedActionXCommand}}, then {{ActionCheckXCommand}}. 

If there is a delay, there may be a bug in that path. The code is 
[here|https://github.com/apache/oozie/blob/5b55169c6743194223a0756756fc7247eb673aca/core/src/main/java/org/apache/oozie/servlet/CallbackServlet.java#L121].

> Callback does not applied in Oozie server, workflows stuk in RUNNING states.
> 
>
> Key: OOZIE-3581
> URL: https://issues.apache.org/jira/browse/OOZIE-3581
> Project: Oozie
>  Issue Type: Bug
>  Components: action, workflow
>Affects Versions: 4.3.1
>Reporter: Kotsubinsky Victor
>Priority: Critical
>
> Oozie version 4.3.1.3.1.0.0-78
> with the HDP 3.1.0 stack; the release provides Oozie 4.3.1 and the additional 
> Apache patches listed here: 
> [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_oozie.html]
> I use a Kerberized Hadoop cluster and run YARN MR jobs through Oozie.
> 1. Oozie runs an MR job via YARN.
> 2. After the YARN MR job completes, it successfully sends a callback 
> request to Oozie.
> 3. In the Oozie server logs I can see this request, but Oozie does not apply 
> it, so the workflow action still shows the RUNNING state (until the 
> action.check process checks the workflow IDs and switches the action to the 
> SUCCESS state).
> LOGS in YARN:
> 2020-01-27 12:16:39,749 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification trying 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification to 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
>  succeeded
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification succeeded for job_1579778851579_31505
>  
> Oozie logs about this event:
> 2020-01-27 12:16:39,770 DEBUG CallbackServlet:526 - SERVER[hdp3-oo-2] USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Received a CallbackServlet.doGet() with query string 
> id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Execute command [callback] key [null]
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Queuing [1] commands with delay [0]ms





[jira] [Commented] (OOZIE-3581) Callback does not applied in Oozie server, workflows stuk in RUNNING states.

2020-01-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027193#comment-17027193
 ] 

Junfan Zhang commented on OOZIE-3581:
-

Yes, this callback triggers an event immediately: first 
{{CompletedActionXCommand}}, then {{ActionCheckXCommand}}. 

If there is a delay, there may be a bug in that path. The code is 
[here|https://github.com/apache/oozie/blob/5b55169c6743194223a0756756fc7247eb673aca/core/src/main/java/org/apache/oozie/servlet/CallbackServlet.java#L121].

> Callback does not applied in Oozie server, workflows stuk in RUNNING states.
> 
>
> Key: OOZIE-3581
> URL: https://issues.apache.org/jira/browse/OOZIE-3581
> Project: Oozie
>  Issue Type: Bug
>  Components: action, workflow
>Affects Versions: 4.3.1
>Reporter: Kotsubinsky Victor
>Priority: Critical
>
> Oozie version 4.3.1.3.1.0.0-78
> with the HDP 3.1.0 stack; the release provides Oozie 4.3.1 and the additional 
> Apache patches listed here: 
> [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_oozie.html]
> I use a Kerberized Hadoop cluster and run YARN MR jobs through Oozie.
> 1. Oozie runs an MR job via YARN.
> 2. After the YARN MR job completes, it successfully sends a callback 
> request to Oozie.
> 3. In the Oozie server logs I can see this request, but Oozie does not apply 
> it, so the workflow action still shows the RUNNING state (until the 
> action.check process checks the workflow IDs and switches the action to the 
> SUCCESS state).
> LOGS in YARN:
> 2020-01-27 12:16:39,749 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification trying 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification to 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
>  succeeded
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification succeeded for job_1579778851579_31505
>  
> Oozie logs about this event:
> 2020-01-27 12:16:39,770 DEBUG CallbackServlet:526 - SERVER[hdp3-oo-2] USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Received a CallbackServlet.doGet() with query string 
> id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Execute command [callback] key [null]
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Queuing [1] commands with delay [0]ms





[jira] [Updated] (OOZIE-3569) SSH Action should add checking success file

2020-01-30 Thread Junfan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/OOZIE-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junfan Zhang updated OOZIE-3569:

Attachment: OOZIE-3569-v5.patch

> SSH Action should add checking success file
> ---
>
> Key: OOZIE-3569
> URL: https://issues.apache.org/jira/browse/OOZIE-3569
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3569-v1.patch, OOZIE-3569-v2.patch, 
> OOZIE-3569-v3.patch, OOZIE-3569-v4.patch, OOZIE-3569-v5.patch
>
>
> *Phenomenon* 
> Currently, the {{SSH Action}} check works as follows: 
> First, the check looks up the {{Oozie}} ppid. When the pgid does not 
> exist, it checks whether there is an error file. If the error file does not 
> exist, {{Oozie}} sets the action status to {{OK}}.
> However, when the {{Oozie}} pgid is killed externally, the action will be 
> incorrectly determined to be successful.
> *Solution*
> In ssh-wrapper.sh, when command execution is OK, {{Oozie}} should touch an 
> empty success file, just as it touches the error file.
> In the {{SshActionExecutor}} check method, Oozie should also check that the 
> success file exists.





[jira] [Commented] (OOZIE-3581) Callback does not applied in Oozie server, workflows stuk in RUNNING states.

2020-01-30 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026653#comment-17026653
 ] 

Junfan Zhang commented on OOZIE-3581:
-

Hey [~vkotsubinskiy], {{callback}} will trigger {{checkCommand}} instead of 
directly determining action status based on {{callback}}.
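
A minimal sketch of that idea, assuming a simplified model in which the callback only enqueues a check and the check later reads the status from the external system. The class, the queue, and the lookup function are hypothetical and do not mirror Oozie's real classes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

final class CallbackCheckSketch {
    private final Map<String, String> actionStatus = new HashMap<>();
    private final Queue<String> checkQueue = new ArrayDeque<>();
    private final Function<String, String> externalStatusLookup; // assumed: asks the external system

    CallbackCheckSketch(Function<String, String> externalStatusLookup) {
        this.externalStatusLookup = externalStatusLookup;
    }

    void register(String actionId) {
        actionStatus.put(actionId, "RUNNING");
    }

    /** The callback does not write the reported status; it only schedules a check. */
    void onCallback(String actionId, String reportedStatus) {
        checkQueue.add(actionId);
    }

    /** Runs the queued checks; each one sets the status from the external lookup. */
    int drainChecks() {
        int processed = 0;
        String actionId;
        while ((actionId = checkQueue.poll()) != null) {
            actionStatus.put(actionId, externalStatusLookup.apply(actionId));
            processed++;
        }
        return processed;
    }

    String status(String actionId) {
        return actionStatus.get(actionId);
    }

    public static void main(String[] args) {
        CallbackCheckSketch server = new CallbackCheckSketch(id -> "SUCCEEDED");
        server.register("wf@action");
        server.onCallback("wf@action", "SUCCEEDED");
        System.out.println(server.status("wf@action")); // prints RUNNING (check not run yet)
        server.drainChecks();
        System.out.println(server.status("wf@action")); // prints SUCCEEDED
    }
}
```

In this model, any delay between the callback and the queued check shows up as the action staying in RUNNING, which is the symptom reported in the issue.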

> Callback does not applied in Oozie server, workflows stuk in RUNNING states.
> 
>
> Key: OOZIE-3581
> URL: https://issues.apache.org/jira/browse/OOZIE-3581
> Project: Oozie
>  Issue Type: Bug
>  Components: action, workflow
>Affects Versions: 4.3.1
>Reporter: Kotsubinsky Victor
>Priority: Critical
>
> Oozie version 4.3.1.3.1.0.0-78
> with the HDP 3.1.0 stack; the release provides Oozie 4.3.1 and the additional 
> Apache patches listed here: 
> [https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/release-notes/content/patch_oozie.html]
> I use a Kerberized Hadoop cluster and run YARN MR jobs through Oozie.
> 1. Oozie runs an MR job via YARN.
> 2. After the YARN MR job completes, it successfully sends a callback 
> request to Oozie.
> 3. In the Oozie server logs I can see this request, but Oozie does not apply 
> it, so the workflow action still shows the RUNNING state (until the 
> action.check process checks the workflow IDs and switches the action to the 
> SUCCESS state).
> LOGS in YARN:
> 2020-01-27 12:16:39,749 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification trying 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification to 
> http://hdp3-oozie:11000/oozie/callback?id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
>  succeeded
> 2020-01-27 12:16:39,772 INFO [Thread-78] org.eclipse.jetty.util.log: Job end 
> notification succeeded for job_1579778851579_31505
>  
> Oozie logs about this event:
> 2020-01-27 12:16:39,770 DEBUG CallbackServlet:526 - SERVER[hdp3-oo-2] USER[-] 
> GROUP[-] TOKEN[-] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Received a CallbackServlet.doGet() with query string 
> id=0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java&status=SUCCEEDED
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Execute command [callback] key [null]
> 2020-01-27 12:16:39,776 DEBUG CompletedActionXCommand:526 - SERVER[hdp3-oo-2] 
> USER[-] GROUP[-] TOKEN[] APP[-] JOB[0005607-200123121357414-oozie-oozi-W] 
> ACTION[0005607-200123121357414-oozie-oozi-W@rdb-full-table-extract-java] 
> Queuing [1] commands with delay [0]ms





[jira] [Commented] (OOZIE-3574) JavaAction create incorrect fileSystem instance in addActionLibs method

2020-01-27 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024927#comment-17024927
 ] 

Junfan Zhang commented on OOZIE-3574:
-

Rebased. Unit tests look OK. [~asalamon74]

> JavaAction create incorrect fileSystem instance in addActionLibs method
> ---
>
> Key: OOZIE-3574
> URL: https://issues.apache.org/jira/browse/OOZIE-3574
> Project: Oozie
>  Issue Type: Sub-task
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3574-v1.patch, OOZIE-3574-v2.patch, 
> OOZIE-3574-v3.patch, OOZIE-3574-v4.patch
>
>
> Code is 
> [here|https://github.com/apache/oozie/blob/9c288fe5cea6f2fbbae76f720b9e215acdd07709/core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java#L734].
> If the actionLibPath scheme differs from the appPath scheme (for example, 
> actionLibPath uses s3a but appPath uses hdfs), the call to 
> {{fs.exists(actionLibsPath)}} will fail. So I think Oozie should create the 
> fileSystem from actionLibsPath.
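
The scheme mismatch can be shown with plain java.net.URI, without Hadoop on the classpath. The helper names and sample paths below are assumptions for illustration. In Hadoop itself, Path.getFileSystem(Configuration) resolves a file system from the path's own URI; whether the attached patches use that call is not stated here.

```java
import java.net.URI;

final class FsSchemeSketch {
    /** Returns the scheme of a path, or the default scheme when the path has none. */
    static String schemeOf(String path, String defaultScheme) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? defaultScheme : scheme;
    }

    /** True when the action lib path lives on a different file system type than the app path. */
    static boolean needsOwnFileSystem(String appPath, String actionLibPath, String defaultScheme) {
        return !schemeOf(appPath, defaultScheme)
                .equalsIgnoreCase(schemeOf(actionLibPath, defaultScheme));
    }

    public static void main(String[] args) {
        // Hypothetical paths: the app lives on HDFS, the action libs on S3.
        System.out.println(needsOwnFileSystem(
                "hdfs://nn:8020/user/app", "s3a://bucket/libs", "hdfs")); // prints true
        // A scheme-less lib path falls back to the default file system.
        System.out.println(needsOwnFileSystem(
                "hdfs://nn:8020/user/app", "/user/libs", "hdfs"));        // prints false
    }
}
```

When the check returns true, a file system object built from appPath cannot answer questions about actionLibPath, which is the failure the issue describes.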





[jira] [Comment Edited] (OOZIE-3569) SSH Action should add checking success file

2020-01-27 Thread Junfan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/OOZIE-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022769#comment-17022769
 ] 

Junfan Zhang edited comment on OOZIE-3569 at 1/28/20 7:47 AM:
--

Sorry, I fixed it. [~asalamon74]


was (Author: zuston):
Sorry, I fixed it.

> SSH Action should add checking success file
> ---
>
> Key: OOZIE-3569
> URL: https://issues.apache.org/jira/browse/OOZIE-3569
> Project: Oozie
>  Issue Type: Bug
>Reporter: Junfan Zhang
>Assignee: Junfan Zhang
>Priority: Major
> Attachments: OOZIE-3569-v1.patch, OOZIE-3569-v2.patch, 
> OOZIE-3569-v3.patch, OOZIE-3569-v4.patch
>
>
> *Phenomenon* 
> Currently, the {{SSH Action}} check works as follows: 
> First, the check looks up the {{Oozie}} ppid. When the pgid does not 
> exist, it checks whether there is an error file. If the error file does not 
> exist, {{Oozie}} sets the action status to {{OK}}.
> However, when the {{Oozie}} pgid is killed externally, the action will be 
> incorrectly determined to be successful.
> *Solution*
> In ssh-wrapper.sh, when command execution is OK, {{Oozie}} should touch an 
> empty success file, just as it touches the error file.
> In the {{SshActionExecutor}} check method, Oozie should also check that the 
> success file exists.




