Hi Dimuthu,

What Junkai meant by touching the IdealState is this:

1) Use ZooInspector to log into ZK
2) Locate the IDEALSTATES/ path
3) Grab any ZNode under that path, modify it (just add a whitespace) and save
4) This will trigger a ZK callback which should tell the Helix Controller to rebalance/schedule things
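If clicking around in ZooInspector is not convenient, the same "touch" can also be done programmatically with the Helix admin API: read the IdealState of a resource and write it straight back, which rewrites the ZNode and should fire the same callback that makes the controller re-run its pipelines. This is only a rough sketch; the ZK address, cluster name and resource (workflow) name below are placeholders for your own setup, and I have not verified it against 0.8.1 specifically.

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class TouchIdealState {
    public static void main(String[] args) {
        // Placeholders: point these at your own ZK ensemble, cluster and resource name
        String zkAddress = "localhost:2181";
        String clusterName = "AiravataDemoCluster";
        String resourceName = "SomeWorkflowResource";

        HelixAdmin admin = new ZKHelixAdmin(zkAddress);
        try {
            // Read the current IdealState of the resource ...
            IdealState idealState = admin.getResourceIdealState(clusterName, resourceName);
            // ... and write it back unchanged. The write bumps the ZNode version, which
            // should be enough to trigger the IdealState change callback on the controller.
            admin.setResourceIdealState(clusterName, resourceName, idealState);
        } finally {
            admin.close();
        }
    }
}
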
On Thu, Mar 21, 2019 at 11:30 AM DImuthu Upeksha <[email protected]> wrote:

> Hi Junkai,
>
> What do you mean by touching ideal state to trigger an event? I didn't quite get what you said. Is that like creating some path in zookeeper? Workflows are eventually scheduled but the problem is, it is very slow due to that 30s freeze.
>
> Thanks
> Dimuthu
>
> On Thu, Mar 21, 2019 at 2:26 PM Xue Junkai <[email protected]> wrote:
>
> > Can you try one thing? Touch the ideal state to trigger an event. If workflows are not scheduled, it should scheduling has problem.
> >
> > Best,
> >
> > Junkai
> >
> > On Wed, Mar 20, 2019 at 10:31 PM DImuthu Upeksha <[email protected]> wrote:
> >
> >> Hi Junkai,
> >>
> >> We are using 0.8.1
> >>
> >> Dimuthu
> >>
> >> On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai <[email protected]> wrote:
> >>
> >> > Hi Dimuthu,
> >> >
> >> > What's the version of Helix you are using?
> >> >
> >> > Best,
> >> >
> >> > Junkai
> >> >
> >> > On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha <[email protected]> wrote:
> >> >
> >> > > Hi Helix Dev,
> >> > >
> >> > > We are again seeing this delay in task execution. Please have a look at the screencast [1] of logs printed in participant (top shell) and controller (bottom shell). When I record this, there were about 90 - 100 workflows pending to be executed. As you can see some tasks were suddenly executed and then participant freezed for about 30 seconds before executing next set of tasks. I can see some WARN logs on controller log. I feel like this 30 second delay is some sort of a pattern. What do you think as the reason for this? I can provide you more information by turning on verbose logs on controller if you want.
> >> > >
> >> > > [1] https://youtu.be/3EUdSxnIxVw
> >> > >
> >> > > Thanks
> >> > > Dimuthu
> >> > >
> >> > > On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha <[email protected]> wrote:
> >> > >
> >> > > > Hi Junkai,
> >> > > >
> >> > > > I'm CCing Airavata dev list as this is directly related to the project.
> >> > > >
> >> > > > I just went through the zookeeper path like /<Cluster Name>/EXTERNALVIEW, /<Cluster Name>/CONFIGS/RESOURCE as I have noticed that helix controller is periodically monitoring for the children of those paths even though all the Workflows have moved into a saturated state like COMPLETED and STOPPED. In our case, we have a lot of completed workflows piled up in those paths. I believe that helix is clearing up those resources after some TTL. What I did was writing an external spectator [1] that continuously monitors for saturated workflows and clearing up resources before controller does that after a TTL. After that, we didn't see such delays in workflow execution and everything seems to be running smoothly. However we are continuously monitoring our deployments for any form of adverse effect introduced by that improvement.
> >> > > >
> >> > > > Please let us know if we are doing something wrong in this improvement or is there any better way to achieve this directly through helix task framework.
> >> > > >
> >> > > > [1] https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java
> >> > > >
> >> > > > Thanks
> >> > > > Dimuthu
> >> > > >
> >> > > > On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai <[email protected]> wrote:
> >> > > >
> >> > > >> Could you please check the log of how long for each pipeline stage takes?
> >> > > >>
> >> > > >> Also, did you set expiry for workflows? Are they piled up for long time? How long for each workflow completes?
> >> > > >>
> >> > > >> best,
> >> > > >>
> >> > > >> Junkai
> >> > > >>
> >> > > >> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <[email protected]> wrote:
> >> > > >>
> >> > > >> > Hi Junkai,
> >> > > >> >
> >> > > >> > Average load is like 10 - 20 workflows per minutes. In some cases it's less than that However based on the observations, I feel like it does not depend on the load and it is sporadic. Is there a particular log lines that I can filter in controller and participant to capture the timeline of workflow so that I can figure out which which component is malfunctioning? We use helix v 0.8.1.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > Dimuthu
> >> > > >> >
> >> > > >> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai <[email protected]> wrote:
> >> > > >> >
> >> > > >> > > Hi Dimuthu,
> >> > > >> > >
> >> > > >> > > At which rate, you are keep submitting workflows? Usually, Workflow scheduling is very fast. And which version of Helix you are using?
> >> > > >> > >
> >> > > >> > > Best,
> >> > > >> > >
> >> > > >> > > Junkai
> >> > > >> > >
> >> > > >> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <[email protected]> wrote:
> >> > > >> > >
> >> > > >> > > > Hi Folks,
> >> > > >> > > >
> >> > > >> > > > We have noticed some delays between workflow submission and actual picking up by participants and seems like that delay is somewhat constant around 2 - 3 minutes. We used to continuously submit workflows and after 2 - 3 minutes, a bulk of workflows are picked by participant and execute them. Then it remain silent for next 2 - 3 minutes event we submit more workflows. It's like participant picking up workflows in discrete time intervals. I'm not sure whether this is an issue of controller or the participant. Do you have any experience with this sort of behavior?
> >> > > >> > > >
> >> > > >> > > > Thanks
> >> > > >> > > > Dimuthu
> >> > > >> > >
> >> > > >> > > --
> >> > > >> > > Junkai Xue
> >> > > >>
> >> > > >> --
> >> > > >> Junkai Xue
> >> >
> >> > --
> >> > Junkai Xue
> >
> > --
> > Junkai Xue
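
For anyone who finds this thread later: the external cleanup spectator Dimuthu describes in the quoted October mail (the WorkflowCleanupAgent linked above) essentially deletes workflows that are already in a terminal ("saturated") state instead of waiting for Helix to expire them. A stripped-down sketch of that idea using the Helix TaskDriver could look like the following; the connection details are placeholders and the real agent in the Airavata repo does more filtering than this.

import java.util.Map;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.task.TaskDriver;
import org.apache.helix.task.TaskState;
import org.apache.helix.task.WorkflowConfig;
import org.apache.helix.task.WorkflowContext;

public class WorkflowCleanupSketch {
    public static void main(String[] args) throws Exception {
        // Placeholders for the real deployment values
        String zkAddress = "localhost:2181";
        String clusterName = "AiravataDemoCluster";

        // A spectator connection is enough; we only read contexts and issue deletes
        HelixManager manager = HelixManagerFactory.getZKHelixManager(
                clusterName, "workflow-cleanup-agent", InstanceType.SPECTATOR, zkAddress);
        manager.connect();
        try {
            TaskDriver driver = new TaskDriver(manager);
            Map<String, WorkflowConfig> workflows = driver.getWorkflows();
            for (String workflow : workflows.keySet()) {
                WorkflowContext context = driver.getWorkflowContext(workflow);
                if (context == null) {
                    continue; // never started, nothing to clean up here
                }
                TaskState state = context.getWorkflowState();
                // Remove workflows that have already reached a terminal state
                if (state == TaskState.COMPLETED || state == TaskState.STOPPED
                        || state == TaskState.FAILED) {
                    driver.delete(workflow);
                }
            }
        } finally {
            manager.disconnect();
        }
    }
}

The other option Junkai hints at in the quoted mails is setting an expiry on the workflow at submission time (e.g. WorkflowConfig.Builder#setExpiry) so the task framework reaps terminal workflows on its own, without an external agent.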
