Hi Junkai,

I'm CCing the Airavata dev list as this is directly related to the project.
I went through ZooKeeper paths such as /<Cluster Name>/EXTERNALVIEW and
/<Cluster Name>/CONFIGS/RESOURCE, because I noticed that the Helix controller
periodically monitors the children of those paths even after all the
workflows have moved into a terminal state such as COMPLETED or STOPPED. In
our case, a lot of completed workflows had piled up in those paths. I believe
Helix clears those resources after some TTL. What I did was write an
external spectator [1] that continuously monitors for workflows in a terminal
state and clears their resources before the controller does so after the TTL.
After that, we no longer saw such delays in workflow execution, and
everything seems to be running smoothly. However, we are continuously
monitoring our deployments for any adverse effects introduced by this change.

Please let us know if we are doing something wrong here, or whether there is
a better way to achieve this directly through the Helix task framework.

[1] https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java

Thanks
Dimuthu

On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai <[email protected]> wrote:

> Could you please check the log of how long for each pipeline stage takes?
>
> Also, did you set expiry for workflows? Are they piled up for long time?
> How long for each workflow completes?
>
> best,
>
> Junkai
>
> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <
> [email protected]>
> wrote:
>
> > Hi Junkai,
> >
> > Average load is around 10 - 20 workflows per minute. In some cases it's
> > less than that. However, based on the observations, I feel like it does
> > not depend on the load and it is sporadic. Are there particular log lines
> > that I can filter in the controller and participant to capture the
> > timeline of a workflow, so that I can figure out which component is
> > malfunctioning? We use Helix v 0.8.1.
> >
> > Thanks
> > Dimuthu
> >
> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai <[email protected]> wrote:
> >
> > > Hi Dimuthu,
> > >
> > > At what rate are you submitting workflows? Usually, workflow
> > > scheduling is very fast. And which version of Helix are you using?
> > >
> > > Best,
> > >
> > > Junkai
> > >
> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <
> > > [email protected]>
> > > wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > We have noticed some delays between workflow submission and the
> > > > actual pickup by participants, and it seems that the delay is
> > > > somewhat constant at around 2-3 minutes. We continuously submit
> > > > workflows, and after 2-3 minutes a bulk of workflows is picked up by
> > > > the participant and executed. Then it remains silent for the next
> > > > 2-3 minutes even if we submit more workflows. It's like the
> > > > participant is picking up workflows at discrete time intervals. I'm
> > > > not sure whether this is an issue with the controller or the
> > > > participant. Do you have any experience with this sort of behavior?
> > > >
> > > > Thanks
> > > > Dimuthu
> > > >
> > >
> > >
> > > --
> > > Junkai Xue
> > >
> >
>
> --
> Junkai Xue
>
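
For readers of the archive, the cleanup idea described at the top of this
thread (periodically find workflows that have reached a terminal state and
delete their resources) can be sketched roughly as below. This is only an
illustration and is not the code of the linked WorkflowCleanupAgent: the
WorkflowStore interface, the State enum and the InMemoryStore class are
hypothetical stand-ins introduced here so the example runs without Helix or
ZooKeeper, and a real agent would presumably back them with the Helix task
framework instead.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WorkflowCleanupSketch {

    /** Simplified workflow states for this sketch (names taken from the email). */
    enum State { IN_PROGRESS, COMPLETED, STOPPED }

    /** Hypothetical abstraction over wherever workflow metadata is kept. */
    interface WorkflowStore {
        Map<String, State> listWorkflowStates();
        void delete(String workflowName);
    }

    /** States after which a workflow is assumed to need no further scheduling. */
    static final Set<State> TERMINAL = EnumSet.of(State.COMPLETED, State.STOPPED);

    /** Runs one cleanup pass and returns the names of the deleted workflows. */
    static List<String> cleanupOnce(WorkflowStore store) {
        List<String> deleted = new ArrayList<>();
        // Iterate over a snapshot so that deleting does not disturb the iteration.
        Map<String, State> snapshot = new LinkedHashMap<>(store.listWorkflowStates());
        for (Map.Entry<String, State> entry : snapshot.entrySet()) {
            if (TERMINAL.contains(entry.getValue())) {
                store.delete(entry.getKey());
                deleted.add(entry.getKey());
            }
        }
        return deleted;
    }

    /** In-memory stand-in for the real store, used only for demonstration. */
    static final class InMemoryStore implements WorkflowStore {
        private final Map<String, State> workflows = new LinkedHashMap<>();

        void put(String name, State state) { workflows.put(name, state); }

        @Override
        public Map<String, State> listWorkflowStates() { return workflows; }

        @Override
        public void delete(String workflowName) { workflows.remove(workflowName); }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("wf-1", State.COMPLETED);
        store.put("wf-2", State.IN_PROGRESS);
        store.put("wf-3", State.STOPPED);

        List<String> deleted = cleanupOnce(store);
        System.out.println("deleted=" + deleted
                + " remaining=" + store.listWorkflowStates().keySet());
    }
}
```

To make the pass run continuously, as the email describes, cleanupOnce could
be scheduled with a java.util.concurrent.ScheduledExecutorService via
scheduleWithFixedDelay. The polling interval would be a deployment choice;
the thread does not state one.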
