Hi Junkai,

I'm CCing the Airavata dev list, as this is directly related to the project.

I went through the ZooKeeper paths /<Cluster Name>/EXTERNALVIEW and
/<Cluster Name>/CONFIGS/RESOURCE, because I noticed that the Helix controller
periodically monitors the children of those paths even after all the
workflows have moved into a terminal state such as COMPLETED or STOPPED. In
our case, a lot of completed workflows had piled up in those paths. I
believe Helix clears those resources after some TTL. What I did was write
an external spectator [1] that continuously monitors for workflows in a
terminal state and clears their resources before the controller does so
after the TTL. Since then, we haven't seen such delays in workflow execution
and everything seems to be running smoothly. However, we are still
monitoring our deployments for any adverse effects introduced by this
change.
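
For readers on the list, here is a minimal sketch of the kind of cleanup pass described above. It is not the actual WorkflowCleanupAgent code in [1], and it deliberately avoids Helix API calls: the WorkflowStore interface, the InMemoryStore class, and the State enum are hypothetical stand-ins for wherever the workflow state lives. The only things taken from this thread are the COMPLETED and STOPPED states and the idea of deleting such workflows ahead of the controller's TTL.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WorkflowCleanupSketch {

    // Hypothetical state enum: the two terminal states named in this thread
    // plus one active state for contrast. Not the Helix state type.
    enum State { IN_PROGRESS, COMPLETED, STOPPED }

    // States after which a workflow no longer needs its resources.
    static final Set<State> TERMINAL = EnumSet.of(State.COMPLETED, State.STOPPED);

    // Hypothetical abstraction over the workflow registry (an assumption,
    // not a Helix interface).
    interface WorkflowStore {
        Map<String, State> listWorkflows();
        void delete(String workflowName);
    }

    // In-memory stand-in so the sketch runs without a cluster.
    static class InMemoryStore implements WorkflowStore {
        private final Map<String, State> workflows = new LinkedHashMap<>();

        void put(String name, State state) {
            workflows.put(name, state);
        }

        @Override
        public Map<String, State> listWorkflows() {
            // Return a copy so callers can delete while iterating.
            return new LinkedHashMap<>(workflows);
        }

        @Override
        public void delete(String workflowName) {
            workflows.remove(workflowName);
        }
    }

    // One cleanup pass: delete every workflow in a terminal state and
    // return the names that were removed, in listing order.
    static List<String> cleanupOnce(WorkflowStore store) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, State> entry : store.listWorkflows().entrySet()) {
            if (TERMINAL.contains(entry.getValue())) {
                store.delete(entry.getKey());
                removed.add(entry.getKey());
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.put("wf-1", State.COMPLETED);
        store.put("wf-2", State.IN_PROGRESS);
        store.put("wf-3", State.STOPPED);

        System.out.println("Removed: " + cleanupOnce(store));
        System.out.println("Remaining: " + store.listWorkflows().keySet());
    }
}
```

To monitor continuously, a pass like cleanupOnce could be scheduled at a fixed interval, for example with a ScheduledExecutorService. The right interval, and whether a workflow should be kept for some time after it finishes, depends on the deployment.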

Please let us know if we are doing something wrong with this approach, or
if there is a better way to achieve this directly through the Helix task
framework.

[1]
https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java

Thanks
Dimuthu

On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai <[email protected]> wrote:

> Could you please check the logs for how long each pipeline stage takes?
>
> Also, did you set an expiry for the workflows? Have they been piled up for
> a long time? How long does each workflow take to complete?
>
> best,
>
> Junkai
>
> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <
> [email protected]>
> wrote:
>
> > Hi Junkai,
> >
> > Average load is around 10 - 20 workflows per minute. In some cases it's
> > less than that. However, based on the observations, I feel like it does
> > not depend on the load and it is sporadic. Are there particular log lines
> > that I can filter in the controller and participant to capture the
> > timeline of a workflow, so that I can figure out which component is
> > malfunctioning? We use Helix v0.8.1.
> >
> > Thanks
> > Dimuthu
> >
> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai <[email protected]> wrote:
> >
> > > Hi Dimuthu,
> > >
> > > At what rate are you submitting workflows? Usually, workflow
> > > scheduling is very fast. And which version of Helix are you using?
> > >
> > > Best,
> > >
> > > Junkai
> > >
> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <
> > > [email protected]>
> > > wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > We have noticed some delays between workflow submission and the
> > > > actual pickup by participants, and the delay seems to be fairly
> > > > constant at around 2 - 3 minutes. We continuously submit workflows,
> > > > and after 2 - 3 minutes a batch of workflows is picked up by the
> > > > participant and executed. Then it remains silent for the next 2 - 3
> > > > minutes even if we submit more workflows. It's like the participant
> > > > is picking up workflows at discrete time intervals. I'm not sure
> > > > whether this is an issue with the controller or the participant. Do
> > > > you have any experience with this sort of behavior?
> > > >
> > > > Thanks
> > > > Dimuthu
> > > >
> > >
> > >
> > > --
> > > Junkai Xue
> > >
> >
>
>
> --
> Junkai Xue
>
