We have seen large numbers of people running and deploying. I have opened a PR with the fix. The scheduler should not get deleted unless it is scaled down on purpose. It should not get evicted either; it should run as a high-priority pod, unless we missed that. Crashing of the scheduler is a bug.
We should let v0.12.2 go through as normal. In the release announcement we should have a section that points to known issues, and we can reference the Jira there with the workaround. The workaround is as simple as a scale down and scale up. As long as the admission controller is running, all pods will be pushed towards the YuniKorn scheduler.

We can start on a next release on the branch v0.12. We should add this case to our e2e tests.

Wilfred

On Fri, 21 Jan 2022 at 17:15, Weiwei Yang <w...@apache.org> wrote:
>
> Agree, this needs to be fixed.
> Likely we need to revoke 0.12.2 and get out a 0.12.3.
>
> On Thu, Jan 20, 2022 at 9:56 PM Chaoran Yu <yuchaoran2...@gmail.com> wrote:
>
> > Yes, Helm install and upgrade both work.
> > The failure scenario is as follows:
> >
> > 1. Both the admission controller and the scheduler pods are running
> > 2. The scheduler pod is restarted for some reason (e.g. deleted, evicted,
> >    or crashed)
> > 3. The new scheduler pod will be stuck in the pending state because it’s
> >    intercepted by the admission controller (the schedulerName field is
> >    yunikorn).
> >
> > I think this bug is critical because if the scheduler pod fails for any
> > reason, someone has to manually redeploy the whole thing.
> >
> > > On Jan 20, 2022, at 21:45, Weiwei Yang <w...@apache.org> wrote:
> > >
> > > Hmmm, that is a bug. But during the release verification, I tried the
> > > helm install, and that works as expected. I am guessing that is because the
> > > scheduler always gets started first. Maybe the same for the upgrade? In
> > > this case, maybe this can work as long as people are using helm charts to
> > > deploy yunikorn? Craig, could you please look into this and let us know if
> > > we need to revoke the vote for 0.12.2 and have a 0.12.3?
> > >
> > > Thank you, Chaoran, for raising this. Much appreciated!
> > >
> > > On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu <yuchaoran2...@gmail.com> wrote:
> > >
> > >> I just spotted a bug, https://issues.apache.org/jira/browse/YUNIKORN-1038,
> > >> which is critical and worth porting back into branch 0.12
> > >>
> > >> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan <sun...@apache.org> wrote:
> > >>
> > >>> A late +1 (binding) from me.
> > >>>
> > >>> I built this from source
> > >>> - Ran basic spark job
> > >>> - Verified UI
> > >>> - Checked signature.
> > >>> - Checked the images.
> > >>>
> > >>> Thanks
> > >>> Sunil
> > >>>
> > >>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit <apa...@craigcondit.com> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
> > >>>> with 3 binding +1 votes and 3 non-binding +1 votes.
> > >>>>
> > >>>> Vote thread:
> > >>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
> > >>>>
> > >>>> Thank you to all the members who helped verify this release. We will
> > >>>> move to IPMC voting shortly.
> > >>>>
> > >>>> Thanks,
> > >>>> Craig
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> > For additional commands, e-mail: dev-h...@yunikorn.apache.org