[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117685#comment-15117685 ] Purshotam Shah commented on OOZIE-2394: --- {quote} On second thoughts, I think it would be good to retain the execute synchronous code in XCommand. It can be enhanced to avoid loadState() and the objects passed in constructor itself further saving time and being more performant. So can we just set synchronous to false when requeuing? {quote} We should leave it for command to decide. If an object is passed, then simple check in loadState can exclude loading of that object. Lot of time loadState loads multiple objects. Skipping loadState won't work for those commands. Example. {code:title=ActionStartXCommand.java} protected void loadState() throws CommandException { try { jpaService = Services.get().get(JPAService.class); if (jpaService != null) { if (wfJob == null) { this.wfJob = WorkflowJobQueryExecutor.getInstance().get(WorkflowJobQuery.GET_WORKFLOW, jobId); } this.wfAction = WorkflowActionQueryExecutor.getInstance().get(WorkflowActionQuery.GET_ACTION, actionId); LogUtils.setLogInfo( wfJob); LogUtils.setLogInfo(wfAction); } else { throw new CommandException(ErrorCode.E0610); } } catch (XException ex) { throw new CommandException(ex); } } > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117699#comment-15117699 ] Rohini Palaniswamy commented on OOZIE-2394: --- Ok. +1. If needed in future it would not be hard to add it back. > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073127#comment-15073127 ] Rohini Palaniswamy commented on OOZIE-2394: --- bq. Sounds good. But I would like to see the stress test run and confirmed that there is no performance regression before giving +1 for this patch. On second thoughts, I think it would be good to retain the execute synchronous code in XCommand. It can be enhanced to avoid loadState() and the objects passed in constructor itself further saving time and being more performant. So can we just set synchronous to false when requeuing? > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067116#comment-15067116 ] Purshotam Shah commented on OOZIE-2394: --- {quote} Removing that will impact performance very badly and we can't do that. {quote} None of wfstart,end command uses eager and eager verify. So, I don't think that there will be any performance issue. In fact, wherever I saw eager and eager verify, they have incorrect logic or there are multiple verify/load Ex. BundleJobChangeXCommand does prevalidate in eagerVerifyPrecondition which is incorrect. eagerVerifyPrecondition happens before acquiring lock. So if try to pause a bundle, which has SUCCEDED, it possible. {code BundleJobChangeXCommand.java} @Override protected void verifyPrecondition() throws CommandException, PreconditionException { } @Override protected void eagerLoadState() throws CommandException { try { this.bundleJob = BundleJobQueryExecutor.getInstance().get(BundleJobQuery.GET_BUNDLE_JOB_STATUS, jobId); LogUtils.setLogInfo(bundleJob); } catch (JPAExecutorException ex) { throw new CommandException(ex); } } @Override protected void eagerVerifyPrecondition() throws CommandException, PreconditionException { validateChangeValue(changeValue); if (bundleJob == null) { LOG.info("BundleChangeCommand not succeeded - " + "job " + jobId + " does not exist"); throw new PreconditionException(ErrorCode.E1314, jobId); } if (isChangePauseTime) { if (bundleJob.getStatus() == Job.Status.SUCCEEDED || bundleJob.getStatus() == Job.Status.FAILED || bundleJob.getStatus() == Job.Status.KILLED || bundleJob.getStatus() == Job.Status.DONEWITHERROR) { LOG.info("BundleChangeCommand not succeeded for changing pausetime- " + "job " + jobId + " finished, status is " + bundleJob.getStatusStr()); throw new PreconditionException(ErrorCode.E1312, jobId, bundleJob.getStatus().toString()); } } else if(isChangeEndTime){ if (bundleJob.getStatus() == Job.Status.KILLED) { LOG.info("BundleChangeCommand not succeeded for changing endtime- " + "job " + jobId + " finished, status is " + bundleJob.getStatusStr()); throw new PreconditionException(ErrorCode.E1312, jobId, bundleJob.getStatus().toString()); } } } {code} bq. Even if locks are re-entrant, acquiring them has minor cost associated with it and so it was skipping that as well which is good. No, I don't think so. This information is done at client side. I guess curator framework don't even have to call the ZK to check that. Custom doing this, which is already done as part of java/curator framework might introduce more bug. Only issue is see is https://issues.apache.org/jira/browse/OOZIE-1922. Will fix this as well. > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067134#comment-15067134 ] Rohini Palaniswamy commented on OOZIE-2394: --- bq. None of wfstart,end command uses eager and eager verify. So, I don't think that there will be any performance issue. bq. No, I don't think so. This information is done at client side. I guess curator framework don't even have to call the ZK to check that. Sounds good. But I would like to see the stress test run and confirmed that there is no performance regression before giving +1 for this patch. bq. Only issue is see is https://issues.apache.org/jira/browse/OOZIE-1922. Will fix this as well. OOZIE-1922 will have to be committed before this patch can have +1 as this change without that patch will totally break for MemoryLocksService. Also you will have to verify if curator lock also has any such issues. > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067137#comment-15067137 ] Purshotam Shah commented on OOZIE-2394: --- {quote} 3. It would be good to make at least a simple test to show that XCommands and the locking is reentrant. Perhaps just create a DummyXCommand that synchronously calls a Dummy2XCommand and both have the same entity key or something like that? The test should probably have some kind of timeout in case this ever breaks. {quote} We may not need extra testcase for this. All testcases which submit WF and wait for WF to complete or any cord action which goes from waiting to running uses this behaviour. > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063077#comment-15063077 ] Rohini Palaniswamy commented on OOZIE-2394: --- bq. All our commands are reentrant, we don't have to have set synchronized flag to run multiple command from same thread. Synchronous mode was mainly to avoid eager loading and checking pre-conditions which wasted lot of the time. Removing that will impact performance very badly and we can't do that. Even if locks are re-entrant, acquiring them has minor cost associated with it and so it was skipping that as well which is good. Can't we just set synchronous to false before re-queueing? > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057442#comment-15057442 ] Hadoop QA commented on OOZIE-2394: -- Testing JIRA OOZIE-2394 Cleaning local git workspace {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:red}-1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:green}+1{color} the patch does not introduce any line longer than 132 .{color:red}-1{color} the patch does not add/modify any testcase {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:green}+1 BACKWARDS_COMPATIBILITY{color} .{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations .{color:green}+1{color} the patch does not modify JPA files {color:red}-1 TESTS{color} .Tests run: 1702 .Tests failed: 2 .Tests errors: 0 .The patch failed the following testcases: . testForNoDuplicates(org.apache.oozie.event.TestEventGeneration) . testSamplers(org.apache.oozie.util.TestMetricsInstrumentation) {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:red}*-1 Overall result, please check the reported -1(s)*{color} The full output of the test-patch run is available at . https://builds.apache.org/job/oozie-trunk-precommit-build/2633/ > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027635#comment-15027635 ] Robert Kanter commented on OOZIE-2394: -- A few minor things: # XCommand has an else statement that now has a new line {code} } else { {code} became {code} } else { {code} Can you put that back? # There's a lot of formatting changes in areas of the code that weren't changed. Don't we generally not fix those unless actually changing that part of the code? # It would be good to make at least a simple test to show that XCommands and the locking is reentrant. Perhaps just create a DummyXCommand that synchronously calls a Dummy2XCommand and both have the same entity key or something like that? The test should probably have some kind of timeout in case this ever breaks. > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (OOZIE-2394) Oozie can execute command without holding lock
[ https://issues.apache.org/jira/browse/OOZIE-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026088#comment-15026088 ] Hadoop QA commented on OOZIE-2394: -- Testing JIRA OOZIE-2394 Cleaning local git workspace {color:green}+1 PATCH_APPLIES{color} {color:green}+1 CLEAN{color} {color:red}-1 RAW_PATCH_ANALYSIS{color} .{color:green}+1{color} the patch does not introduce any @author tags .{color:green}+1{color} the patch does not introduce any tabs .{color:green}+1{color} the patch does not introduce any trailing spaces .{color:green}+1{color} the patch does not introduce any line longer than 132 .{color:red}-1{color} the patch does not add/modify any testcase {color:green}+1 RAT{color} .{color:green}+1{color} the patch does not seem to introduce new RAT warnings {color:green}+1 JAVADOC{color} .{color:green}+1{color} the patch does not seem to introduce new Javadoc warnings {color:green}+1 COMPILE{color} .{color:green}+1{color} HEAD compiles .{color:green}+1{color} patch compiles .{color:green}+1{color} the patch does not seem to introduce new javac warnings {color:green}+1 BACKWARDS_COMPATIBILITY{color} .{color:green}+1{color} the patch does not change any JPA Entity/Colum/Basic/Lob/Transient annotations .{color:green}+1{color} the patch does not modify JPA files {color:red}-1 TESTS{color} .Tests run: 1698 .Tests failed: 7 .Tests errors: 0 .The patch failed the following testcases: . testActionCheckTransientDuringLauncher(org.apache.oozie.command.wf.TestActionCheckXCommand) . testPauseBundleAndCoordinator(org.apache.oozie.service.TestPauseTransitService) . testBundlePauseExtendMaterializesCoordinator(org.apache.oozie.command.bundle.TestBundleChangeXCommand) . testbulkWfKillSuccess(org.apache.oozie.command.wf.TestBulkWorkflowXCommand) . testForNoDuplicates(org.apache.oozie.event.TestEventGeneration) . testRerunVariableSub(org.apache.oozie.command.wf.TestReRunXCommand) . testSamplers(org.apache.oozie.util.TestMetricsInstrumentation) {color:green}+1 DISTRO{color} .{color:green}+1{color} distro tarball builds with the patch {color:red}*-1 Overall result, please check the reported -1(s)*{color} The full output of the test-patch run is available at . https://builds.apache.org/job/oozie-trunk-precommit-build/2600/ > Oozie can execute command without holding lock > -- > > Key: OOZIE-2394 > URL: https://issues.apache.org/jira/browse/OOZIE-2394 > Project: Oozie > Issue Type: Bug >Reporter: Purshotam Shah >Assignee: Purshotam Shah >Priority: Critical > Attachments: OOZIE-2394-V1.patch > > > To speedup job submission ( not the the forked actions) we create workflow > actions synchronously. We call ActionStartXCommand from SignalXCommand by > setting isSynchronous = true. This will bypass lock acquiring, which is Ok, > SignalXCommand will have the job lock. > If there is transient error. Same command is requeued which will have > isSynchronous flag set to true. > Requeued command will wake-up and started executing without acquiring lock. > If the job submission takes more than 2 min, then we might have issue. > Action recovery is set to 2 min ( default), Recovery service will run and > submitted new the command. since the first command didn't acquire any lock. > Recovery will be able to run the new command. > We will have two same command running parallely. > All our commands are reentrant, we don't have to have set synchronized flag > to run multiple command from same thread. > Because of reentrant, command running in same thread should be able to > acquire same lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)