Let me add a caveat to my previous email. Although it comes with scalability improvements, there are currently a few known issues with the latest version. We'd encourage you to check back to make sure your current usage isn't affected.
Hunter

On Fri, Mar 22, 2019 at 12:35 PM Hunter Lee <[email protected]> wrote:

> No problem. If you have further questions, let us know what kind of load
> you're putting on Helix as well. The newest version of Helix contains Task
> Framework 2.0, and has greater scalability in scheduling tasks, so you
> might want to consider using the newest version as well.
>
> Hunter
>
> On Fri, Mar 22, 2019 at 8:59 AM DImuthu Upeksha <[email protected]> wrote:
>
>> Hi Lee,
>>
>> Thanks for the trick. I didn't know that we can poke the controller like
>> that :) However, now we can see that tasks are moving smoothly in our
>> staging setup. This behavior can be seen from time to time and gets
>> resolved automatically in a few hours. I can't find a particular pattern;
>> however, my best guess is that this happens when the load is high. I will
>> put some load on the testing setup, see if I can reproduce this issue,
>> try your instructions, and then get back to you.
>>
>> Thanks
>> Dimuthu
>>
>> On Thu, Mar 21, 2019 at 5:27 PM Hunter Lee <[email protected]> wrote:
>>
>> > Hi Dimuthu,
>> >
>> > What Junkai meant by touching the IdealState is this:
>> >
>> > 1) use ZooInspector to log into ZK
>> > 2) Locate the IDEALSTATES/ path
>> > 3) grab any ZNode under that path and try to modify (just add a
>> > whitespace) and save
>> > 4) This will trigger a ZK callback which should tell Helix Controller to
>> > rebalance/schedule things
>> >
>> > On Thu, Mar 21, 2019 at 11:30 AM DImuthu Upeksha <[email protected]> wrote:
>> >
>> >> Hi Junkai,
>> >>
>> >> What do you mean by touching ideal state to trigger an event? I didn't
>> >> quite get what you said. Is that like creating some path in zookeeper?
>> >> Workflows are eventually scheduled, but the problem is that it is very
>> >> slow due to that 30s freeze.
>> >>
>> >> Thanks
>> >> Dimuthu
>> >>
>> >> On Thu, Mar 21, 2019 at 2:26 PM Xue Junkai <[email protected]> wrote:
>> >>
>> >> > Can you try one thing?
>> >> > Touch the ideal state to trigger an event. If workflows are not
>> >> > scheduled, it should mean that scheduling has a problem.
>> >> >
>> >> > Best,
>> >> >
>> >> > Junkai
>> >> >
>> >> > On Wed, Mar 20, 2019 at 10:31 PM DImuthu Upeksha <[email protected]> wrote:
>> >> >
>> >> >> Hi Junkai,
>> >> >>
>> >> >> We are using 0.8.1
>> >> >>
>> >> >> Dimuthu
>> >> >>
>> >> >> On Thu, Mar 21, 2019 at 12:14 AM Xue Junkai <[email protected]> wrote:
>> >> >>
>> >> >> > Hi Dimuthu,
>> >> >> >
>> >> >> > What's the version of Helix you are using?
>> >> >> >
>> >> >> > Best,
>> >> >> >
>> >> >> > Junkai
>> >> >> >
>> >> >> > On Wed, Mar 20, 2019 at 8:54 PM DImuthu Upeksha <[email protected]> wrote:
>> >> >> >
>> >> >> > > Hi Helix Dev,
>> >> >> > >
>> >> >> > > We are again seeing this delay in task execution. Please have a
>> >> >> > > look at the screencast [1] of logs printed in the participant
>> >> >> > > (top shell) and controller (bottom shell). When I recorded this,
>> >> >> > > there were about 90 - 100 workflows pending to be executed. As
>> >> >> > > you can see, some tasks were suddenly executed and then the
>> >> >> > > participant froze for about 30 seconds before executing the next
>> >> >> > > set of tasks. I can see some WARN logs in the controller log. I
>> >> >> > > feel like this 30 second delay is some sort of a pattern. What do
>> >> >> > > you think is the reason for this? I can provide you more
>> >> >> > > information by turning on verbose logs on the controller if you
>> >> >> > > want.
>> >> >> > >
>> >> >> > > [1] https://youtu.be/3EUdSxnIxVw
>> >> >> > >
>> >> >> > > Thanks
>> >> >> > > Dimuthu
>> >> >> > >
>> >> >> > > On Thu, Oct 4, 2018 at 4:46 PM DImuthu Upeksha <[email protected]> wrote:
>> >> >> > >
>> >> >> > > > Hi Junkai,
>> >> >> > > >
>> >> >> > > > I'm CCing Airavata dev list as this is directly related to the
>> >> >> > > > project.
>> >> >> > > >
>> >> >> > > > I just went through the zookeeper paths like
>> >> >> > > > /<Cluster Name>/EXTERNALVIEW and /<Cluster Name>/CONFIGS/RESOURCE,
>> >> >> > > > as I have noticed that the helix controller is periodically
>> >> >> > > > monitoring the children of those paths even though all the
>> >> >> > > > Workflows have moved into a saturated state like COMPLETED and
>> >> >> > > > STOPPED. In our case, we have a lot of completed workflows piled
>> >> >> > > > up in those paths. I believe that helix is clearing up those
>> >> >> > > > resources after some TTL. What I did was write an external
>> >> >> > > > spectator [1] that continuously monitors for saturated workflows
>> >> >> > > > and clears up their resources before the controller does that
>> >> >> > > > after a TTL. After that, we didn't see such delays in workflow
>> >> >> > > > execution and everything seems to be running smoothly. However,
>> >> >> > > > we are continuously monitoring our deployments for any form of
>> >> >> > > > adverse effect introduced by that improvement.
>> >> >> > > >
>> >> >> > > > Please let us know if we are doing something wrong in this
>> >> >> > > > improvement, or if there is any better way to achieve this
>> >> >> > > > directly through the helix task framework.
>> >> >> > > >
>> >> >> > > > [1]
>> >> >> > > > https://github.com/apache/airavata/blob/staging/modules/airavata-helix/helix-spectator/src/main/java/org/apache/airavata/helix/impl/controller/WorkflowCleanupAgent.java
>> >> >> > > >
>> >> >> > > > Thanks
>> >> >> > > > Dimuthu
>> >> >> > > >
>> >> >> > > > On Tue, Oct 2, 2018 at 1:12 PM Xue Junkai <[email protected]> wrote:
>> >> >> > > >
>> >> >> > > >> Could you please check the log to see how long each pipeline
>> >> >> > > >> stage takes?
>> >> >> > > >>
>> >> >> > > >> Also, did you set an expiry for workflows? Have they been piled
>> >> >> > > >> up for a long time? How long does each workflow take to
>> >> >> > > >> complete?
>> >> >> > > >>
>> >> >> > > >> best,
>> >> >> > > >>
>> >> >> > > >> Junkai
>> >> >> > > >>
>> >> >> > > >> On Wed, Sep 26, 2018 at 8:52 AM DImuthu Upeksha <[email protected]> wrote:
>> >> >> > > >>
>> >> >> > > >> > Hi Junkai,
>> >> >> > > >> >
>> >> >> > > >> > Average load is like 10 - 20 workflows per minute. In some
>> >> >> > > >> > cases it's less than that. However, based on the
>> >> >> > > >> > observations, I feel like it does not depend on the load and
>> >> >> > > >> > it is sporadic. Are there particular log lines that I can
>> >> >> > > >> > filter in the controller and participant to capture the
>> >> >> > > >> > timeline of a workflow, so that I can figure out which
>> >> >> > > >> > component is malfunctioning? We use helix v 0.8.1.
>> >> >> > > >> >
>> >> >> > > >> > Thanks
>> >> >> > > >> > Dimuthu
>> >> >> > > >> >
>> >> >> > > >> > On Tue, Sep 25, 2018 at 5:19 PM Xue Junkai <[email protected]> wrote:
>> >> >> > > >> >
>> >> >> > > >> > > Hi Dimuthu,
>> >> >> > > >> > >
>> >> >> > > >> > > At what rate do you keep submitting workflows? Usually,
>> >> >> > > >> > > workflow scheduling is very fast. And which version of
>> >> >> > > >> > > Helix are you using?
>> >> >> > > >> > >
>> >> >> > > >> > > Best,
>> >> >> > > >> > >
>> >> >> > > >> > > Junkai
>> >> >> > > >> > >
>> >> >> > > >> > > On Tue, Sep 25, 2018 at 8:58 AM DImuthu Upeksha <[email protected]> wrote:
>> >> >> > > >> > >
>> >> >> > > >> > > > Hi Folks,
>> >> >> > > >> > > >
>> >> >> > > >> > > > We have noticed some delays between workflow submission
>> >> >> > > >> > > > and the actual pickup by participants, and it seems
>> >> >> > > >> > > > that the delay is somewhat constant at around 2-3
>> >> >> > > >> > > > minutes. We used to continuously submit workflows, and
>> >> >> > > >> > > > after 2-3 minutes a bulk of workflows is picked up by
>> >> >> > > >> > > > the participant and executed. Then it remains silent
>> >> >> > > >> > > > for the next 2-3 minutes even if we submit more
>> >> >> > > >> > > > workflows. It's like the participant is picking up
>> >> >> > > >> > > > workflows in discrete time intervals. I'm not sure
>> >> >> > > >> > > > whether this is an issue with the controller or the
>> >> >> > > >> > > > participant. Do you have any experience with this sort
>> >> >> > > >> > > > of behavior?
>> >> >> > > >> > > >
>> >> >> > > >> > > > Thanks
>> >> >> > > >> > > > Dimuthu
>> >> >> > > >> > > >
>> >> >> > > >> > >
>> >> >> > > >> > > --
>> >> >> > > >> > > Junkai Xue
>> >> >> > > >> > >
>> >> >> > > >> >
>> >> >> > > >>
>> >> >> > > >> --
>> >> >> > > >> Junkai Xue
>> >> >> > > >>
>> >> >> > > >
>> >> >> > >
>> >> >> >
>> >> >> > --
>> >> >> > Junkai Xue
>> >> >> >
>> >> >>
>> >> >
>> >> > --
>> >> > Junkai Xue
>> >> >
>> >>
>> >
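
For reference, the "touch the IdealState" steps quoted in this thread can also be expressed in code. The sketch below is only an illustration under stated assumptions: `ZNodeStore`, `InMemoryStore`, and `IdealStateTouch` are hypothetical names, not Helix or ZooKeeper classes, and the in-memory store stands in for a real ZK connection so the example runs without a server. It assumes the znode layout /<Cluster Name>/IDEALSTATES/<resource>, and it writes the existing bytes back unchanged on the assumption that any successful write bumps the znode version and notifies data watchers. The cluster and resource names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal view of a ZooKeeper-like store (not a real Helix/ZK API). */
interface ZNodeStore {
    byte[] getData(String path);
    int getVersion(String path);
    /** Conditional write: succeeds only if expectedVersion matches the current version. */
    void setData(String path, byte[] data, int expectedVersion);
}

/** In-memory stand-in so the sketch runs without a ZooKeeper server. */
class InMemoryStore implements ZNodeStore {
    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();
    /** Paths for which a "data changed" notification would have fired. */
    final List<String> firedWatches = new ArrayList<>();

    void create(String path, byte[] bytes) {
        data.put(path, bytes.clone());
        versions.put(path, 0);
    }

    @Override
    public byte[] getData(String path) {
        byte[] bytes = data.get(path);
        if (bytes == null) {
            throw new IllegalArgumentException("no such znode: " + path);
        }
        return bytes.clone();
    }

    @Override
    public int getVersion(String path) {
        Integer version = versions.get(path);
        if (version == null) {
            throw new IllegalArgumentException("no such znode: " + path);
        }
        return version;
    }

    @Override
    public void setData(String path, byte[] bytes, int expectedVersion) {
        if (getVersion(path) != expectedVersion) {
            throw new IllegalStateException("version mismatch on " + path);
        }
        data.put(path, bytes.clone());
        versions.put(path, expectedVersion + 1);
        firedWatches.add(path); // models the callback the controller would receive
    }
}

class IdealStateTouch {
    /** Assumed layout: /<cluster>/IDEALSTATES/<resource>. */
    static String idealStatePath(String cluster, String resource) {
        return "/" + cluster + "/IDEALSTATES/" + resource;
    }

    /** Rewrites the znode with its current content and returns the new version. */
    static int touch(ZNodeStore store, String cluster, String resource) {
        String path = idealStatePath(cluster, resource);
        int version = store.getVersion(path);
        byte[] current = store.getData(path);
        // The version check guards against overwriting a concurrent controller update.
        store.setData(path, current, version);
        return version + 1;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        String path = idealStatePath("MyCluster", "Workflow_1"); // made-up names
        store.create(path, "{}".getBytes(StandardCharsets.UTF_8));
        int newVersion = touch(store, "MyCluster", "Workflow_1");
        System.out.println("touched " + path + ", new version: " + newVersion);
    }
}
```

Against a live cluster, the same read-then-conditional-write would go through a real ZooKeeper client instead of `InMemoryStore`. Writing the content back unchanged avoids the "add a whitespace" edit, so the IdealState payload itself is never altered.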
