Hi Dimuthu,

What Junkai meant by touching the IdealState is this:

1) Use ZooInspector to log into ZK
2) Locate the IDEALSTATES/ path
3) Grab any ZNode under that path, modify it (just add a whitespace) and save
4) This will trigger a ZK callback which should tell the Helix Controller to rebalance/schedule things
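If clicking around in ZooInspector is not convenient, the same "touch" can also be done programmatically with the Helix admin API: read the IdealState of a resource and write it straight back, which rewrites the ZNode and should fire the same callback that makes the controller re-run its pipelines. This is only a rough sketch; the ZK address, cluster name and resource (workflow) name below are placeholders for your own setup, and I have not verified it against 0.8.1 specifically.

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;

public class TouchIdealState {
    public static void main(String[] args) {
        // Placeholders: point these at your own ZK ensemble, cluster and resource name
        String zkAddress = "localhost:2181";
        String clusterName = "AiravataDemoCluster";
        String resourceName = "SomeWorkflowResource";

        HelixAdmin admin = new ZKHelixAdmin(zkAddress);
        try {
            // Read the current IdealState of the resource ...
            IdealState idealState = admin.getResourceIdealState(clusterName, resourceName);
            // ... and write it back unchanged. The write bumps the ZNode version, which
            // should be enough to trigger the IdealState change callback on the controller.
            admin.setResourceIdealState(clusterName, resourceName, idealState);
        } finally {
            admin.close();
        }
    }
}
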
On Thu, Mar 21, 2019 at 11:30 AM DImuthu Upeksha <[email protected]> wrote:

> Hi Junkai,
>
> What do you mean by touching ideal state to trigger an event? I didn't quite get what you said. Is that like creating some path in zookeeper? Workflows are eventually scheduled but the problem is, it is very slow due to that 30s freeze.
>
> Thanks
> Dimuthu
>
> On Thu, Mar 21, 2019 at 2:26 PM Xue Junkai <[email protected]> wrote:
>
> > Can you try one thing? Touch the ideal state to trigger an event. If workflows are not scheduled, it should scheduling has problem.
> >
> > Best,
> >
> > Junkai
> >
> > On Wed, Mar 20, 2019 at 10:31 PM DImuthu Upeksha <[email protected]> wrote:
> >
> >> Hi Junkai,
> >>
> >> We are using 0.8.1
> >>
> >> Dimuthu
> >>
> >> On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai <[email protected]> wrote:
> >>
> >> > Hi Dimuthu,
> >> >
> >> > What's the version of Helix you are using?
> >> >
> >> > Best,
> >> >
> >> > Junkai
> >> >
> >> > On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha <[email protected]> wrote:
> >> >
> >> > > Hi Helix Dev,
> >> > >
> >> > > We are again seeing this delay in task execution. Please have a look at the screencast [1] of logs printed in participant (top shell) and controller (bottom shell). When I record this, there were about 90 - 100 workflows pending to be executed. As you can see some tasks were suddenly executed and then participant freezed for about 30 seconds before executing next set of tasks. I can see some WARN logs on controller log. I feel like this 30 second delay is some sort of a pattern. What do you think as the reason for this? I can provide you more information by turning on verbose logs on controller if you want.
> >> > >
> >> > > [1] https://youtu.be/3EUdSxnIxVw
> >> > >
> >> > > Thanks
> >> > > Dimuthu
> >> > >
> >> > > On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha <[email protected]> wrote:
> >> > >
> >> > > > Hi Junkai,
> >> > > >
> >> > > > I'm CCing Airavata dev list as this is directly related to the project.
> >> > > >
> >> > > > I just went through the zookeeper path like /<Cluster Name>/EXTERNALVIEW, /<Cluster Name>/CONFIGS/RESOURCE as I have noticed that helix controller is periodically monitoring for the children of those paths even though all the Workflows have moved into a saturated state like COMPLETED and STOPPED. In our case, we have a lot of completed workflows piled up in those paths. I believe that helix is clearing up those resources after some TTL. What I did was writing an external spectator [1] that continuously monitors for saturated workflows and clearing up resources before controller does that after a TTL. After that, we didn't see such delays in workflow execution and everything seems to be running smoothly. However we are continuously monitoring our deployments for any form of adverse effect introduced by that improvement.
> >> > > >
> >> > > > Please let us know if we are doing something wrong in this improvement or is there any better way to achieve this directly through helix task framework.
> >> > > >
> >> > > > [1] https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java
> >> > > >
> >> > > > Thanks
> >> > > > Dimuthu
> >> > > >
> >> > > > On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai <[email protected]> wrote:
> >> > > >
> >> > > >> Could you please check the log of how long for each pipeline stage takes?
> >> > > >>
> >> > > >> Also, did you set expiry for workflows? Are they piled up for long time? How long for each workflow completes?
> >> > > >>
> >> > > >> best,
> >> > > >>
> >> > > >> Junkai
> >> > > >>
> >> > > >> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <[email protected]> wrote:
> >> > > >>
> >> > > >> > Hi Junkai,
> >> > > >> >
> >> > > >> > Average load is like 10 - 20 workflows per minutes. In some cases it's less than that However based on the observations, I feel like it does not depend on the load and it is sporadic. Is there a particular log lines that I can filter in controller and participant to capture the timeline of workflow so that I can figure out which which component is malfunctioning? We use helix v 0.8.1.
> >> > > >> >
> >> > > >> > Thanks
> >> > > >> > Dimuthu
> >> > > >> >
> >> > > >> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai <[email protected]> wrote:
> >> > > >> >
> >> > > >> > > Hi Dimuthu,
> >> > > >> > >
> >> > > >> > > At which rate, you are keep submitting workflows? Usually, Workflow scheduling is very fast. And which version of Helix you are using?
> >> > > >> > >
> >> > > >> > > Best,
> >> > > >> > >
> >> > > >> > > Junkai
> >> > > >> > >
> >> > > >> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <[email protected]> wrote:
> >> > > >> > >
> >> > > >> > > > Hi Folks,
> >> > > >> > > >
> >> > > >> > > > We have noticed some delays between workflow submission and actual picking up by participants and seems like that delay is somewhat constant around 2 - 3 minutes. We used to continuously submit workflows and after 2 - 3 minutes, a bulk of workflows are picked by participant and execute them. Then it remain silent for next 2 - 3 minutes event we submit more workflows. It's like participant picking up workflows in discrete time intervals. I'm not sure whether this is an issue of controller or the participant. Do you have any experience with this sort of behavior?
> >> > > >> > > >
> >> > > >> > > > Thanks
> >> > > >> > > > Dimuthu
> >> > > >> > >
> >> > > >> > > --
> >> > > >> > > Junkai Xue
> >> > > >>
> >> > > >> --
> >> > > >> Junkai Xue
> >> >
> >> > --
> >> > Junkai Xue
> >
> > --
> > Junkai Xue
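
For anyone who finds this thread later: the external cleanup spectator Dimuthu describes in the quoted October mail (the WorkflowCleanupAgent linked above) essentially deletes workflows that are already in a terminal ("saturated") state instead of waiting for Helix to expire them. A stripped-down sketch of that idea using the Helix TaskDriver could look like the following; the connection details are placeholders and the real agent in the Airavata repo does more filtering than this.

import java.util.Map;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.task.TaskDriver;
import org.apache.helix.task.TaskState;
import org.apache.helix.task.WorkflowConfig;
import org.apache.helix.task.WorkflowContext;

public class WorkflowCleanupSketch {
    public static void main(String[] args) throws Exception {
        // Placeholders for the real deployment values
        String zkAddress = "localhost:2181";
        String clusterName = "AiravataDemoCluster";

        // A spectator connection is enough; we only read contexts and issue deletes
        HelixManager manager = HelixManagerFactory.getZKHelixManager(
                clusterName, "workflow-cleanup-agent", InstanceType.SPECTATOR, zkAddress);
        manager.connect();
        try {
            TaskDriver driver = new TaskDriver(manager);
            Map<String, WorkflowConfig> workflows = driver.getWorkflows();
            for (String workflow : workflows.keySet()) {
                WorkflowContext context = driver.getWorkflowContext(workflow);
                if (context == null) {
                    continue; // never started, nothing to clean up here
                }
                TaskState state = context.getWorkflowState();
                // Remove workflows that have already reached a terminal state
                if (state == TaskState.COMPLETED || state == TaskState.STOPPED
                        || state == TaskState.FAILED) {
                    driver.delete(workflow);
                }
            }
        } finally {
            manager.disconnect();
        }
    }
}

The other option Junkai hints at in the quoted mails is setting an expiry on the workflow at submission time (e.g. WorkflowConfig.Builder#setExpiry) so the task framework reaps terminal workflows on its own, without an external agent.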
