Hi Junkai,

We have not implemented any logic that kills workflows manually, but as you suggested, I need to verify that. Unfortunately, all the Helix log levels on our servers are set to WARN, because Helix prints a large volume of logs at INFO level, so the logs do not contain much useful information. Could you tell me which class prints the logs associated with workflow termination? I will enable DEBUG level for that class and observe further.

Thanks
Dimuthu

On Fri, Nov 9, 2018 at 9:18 PM Xue Junkai <[email protected]> wrote:

> Hmm, that's very strange. The user content store znode is only deleted
> when the workflow is gone. From the log, it shows the znode is gone.
> Could you please dig through the log to find whether the workflow has
> been manually killed? If that's the case, then it is possible you have
> the problem.
>
> On Fri, Nov 9, 2018 at 12:13 PM DImuthu Upeksha <[email protected]> wrote:
>
> > Hi Junkai,
> >
> > Thanks for your suggestion. You have captured most of the parts
> > correctly. There are two jobs, job1 and job2, and job2 depends on
> > job1. Until job1 is completed, job2 should not be scheduled. Task 1
> > in job1 is calling that method, and it is not updating anyone's
> > content. It's just putting a value at the workflow level. What do you
> > mean by keeping a key-value store at the workflow level? I already use
> > the key-value store given by Helix by calling the putUserContent
> > method.
> >
> > public void sendNextJob(String jobId) {
> >     putUserContent(WORKFLOW_STARTED, "TRUE", Scope.WORKFLOW);
> >     if (jobId != null) {
> >         putUserContent(NEXT_JOB, jobId, Scope.WORKFLOW);
> >     }
> > }
> >
> > Dimuthu
> >
> > On Fri, Nov 9, 2018 at 2:48 PM Xue Junkai <[email protected]> wrote:
> >
> > > In my understanding, it could be that you have job1 and job2. The
> > > task running in job1 tries to update content for job2. Then there
> > > could be a race condition here in which job2 is not yet scheduled.
> > >
> > > If that's the case, I suggest you put the key-value store at the
> > > workflow level, since this is a cross-job operation.
> > >
> > > Best,
> > >
> > > Junkai
> > >
> > > On Fri, Nov 9, 2018 at 11:45 AM DImuthu Upeksha <[email protected]> wrote:
> > >
> > > > Hi Junkai,
> > > >
> > > > This method is being called inside a running task, and it works
> > > > most of the time. I have only seen this on 2 occasions in the last
> > > > few months, and both of them happened today and yesterday.
> > > >
> > > > Thanks
> > > > Dimuthu
> > > >
> > > > On Fri, Nov 9, 2018 at 2:40 PM Xue Junkai <[email protected]> wrote:
> > > >
> > > > > The user content store node will be created once the job has
> > > > > been scheduled. In your case, I think the job is not scheduled.
> > > > > This method is usually used in a running task.
> > > > >
> > > > > Best,
> > > > >
> > > > > Junkai
> > > > >
> > > > > On Fri, Nov 9, 2018 at 8:19 AM DImuthu Upeksha <[email protected]> wrote:
> > > > >
> > > > > > Hi Helix Folks,
> > > > > >
> > > > > > I'm having this sporadic issue in some tasks of our workflows
> > > > > > when we try to store a value in the workflow context. I have
> > > > > > added both the code section and the error message below. Do
> > > > > > you have an idea what's causing this? Please let me know if
> > > > > > you need further information. We are using Helix 0.8.2.
> > > > > >
> > > > > > public void sendNextJob(String jobId) {
> > > > > >     putUserContent(WORKFLOW_STARTED, "TRUE", Scope.WORKFLOW);
> > > > > >     if (jobId != null) {
> > > > > >         putUserContent(NEXT_JOB, jobId, Scope.WORKFLOW);
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > Failed to setup environment of task TASK_55096de4-2cb6-4b09-84fd-7fdddba93435
> > > > > > java.lang.NullPointerException: null
> > > > > >     at org.apache.helix.task.TaskUtil$1.update(TaskUtil.java:358)
> > > > > >     at org.apache.helix.task.TaskUtil$1.update(TaskUtil.java:356)
> > > > > >     at org.apache.helix.manager.zk.HelixGroupCommit.commit(HelixGroupCommit.java:126)
> > > > > >     at org.apache.helix.manager.zk.ZkCacheBaseDataAccessor.update(ZkCacheBaseDataAccessor.java:306)
> > > > > >     at org.apache.helix.store.zk.AutoFallbackPropertyStore.update(AutoFallbackPropertyStore.java:61)
> > > > > >     at org.apache.helix.task.TaskUtil.addWorkflowJobUserContent(TaskUtil.java:356)
> > > > > >     at org.apache.helix.task.UserContentStore.putUserContent(UserContentStore.java:78)
> > > > > >     at org.apache.airavata.helix.core.AbstractTask.sendNextJob(AbstractTask.java:136)
> > > > > >     at org.apache.airavata.helix.core.OutPort.invoke(OutPort.java:42)
> > > > > >     at org.apache.airavata.helix.core.AbstractTask.onSuccess(AbstractTask.java:123)
> > > > > >     at org.apache.airavata.helix.impl.task.AiravataTask.onSuccess(AiravataTask.java:97)
> > > > > >     at org.apache.airavata.helix.impl.task.env.EnvSetupTask.onRun(EnvSetupTask.java:52)
> > > > > >     at org.apache.airavata.helix.impl.task.AiravataTask.onRun(AiravataTask.java:349)
> > > > > >     at org.apache.airavata.helix.core.AbstractTask.run(AbstractTask.java:92)
> > > > > >     at org.apache.helix.task.TaskRunner.run(TaskRunner.java:71)
> > > > > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > > > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> > > > > >     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> > > > > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > > > > >     at java.lang.Thread.run(Thread.java:748)
> > > > > >
> > > > > > Thanks
> > > > > > Dimuthu
> > > > >
> > > > > --
> > > > > Junkai Xue
> > >
> > > --
> > > Junkai Xue
>
> --
> Junkai Xue
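
The failure mode discussed in this thread can be illustrated with a small, self-contained sketch. It assumes (this is not confirmed from the Helix source) that the updater passed to the property store receives a null record when the user content znode no longer exists, which would be consistent with a NullPointerException thrown inside TaskUtil$1.update. Everything below is hypothetical: MissingNodeSketch, its in-memory store, and the putUnguarded/putGuarded helpers are stand-ins for illustration, not Helix classes or APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class MissingNodeSketch {
    // Hypothetical in-memory stand-in for a property store (path -> record); not a Helix class.
    static final Map<String, Map<String, String>> store = new HashMap<>();

    // Applies the updater to the record stored at the path.
    // The updater receives null when no record exists at that path.
    static boolean update(String path, UnaryOperator<Map<String, String>> updater) {
        Map<String, String> updated = updater.apply(store.get(path));
        if (updated == null) {
            return false; // nothing to write back
        }
        store.put(path, updated);
        return true;
    }

    // Unguarded updater: dereferences the record directly,
    // so it throws NullPointerException when the node is missing.
    static boolean putUnguarded(String path, String key, String value) {
        return update(path, record -> {
            record.put(key, value);
            return record;
        });
    }

    // Guarded updater: reports failure instead of throwing when the node is missing.
    static boolean putGuarded(String path, String key, String value) {
        return update(path, record -> {
            if (record == null) {
                return null;
            }
            record.put(key, value);
            return record;
        });
    }

    public static void main(String[] args) {
        String path = "/hypothetical/workflow/UserContent"; // illustrative path only

        // Node missing: the guarded call fails cleanly, the unguarded one throws.
        System.out.println("guarded, node missing: " + putGuarded(path, "NEXT_JOB", "job2"));
        try {
            putUnguarded(path, "NEXT_JOB", "job2");
        } catch (NullPointerException e) {
            System.out.println("unguarded, node missing: NullPointerException");
        }

        // Node present: the write succeeds.
        store.put(path, new HashMap<>());
        System.out.println("guarded, node present: " + putGuarded(path, "NEXT_JOB", "job2"));
    }
}
```

On the application side, a comparable guard would be to catch the exception around putUserContent and log the workflow state at that moment, which may help confirm whether the workflow had already been removed when the task ran.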
