We have seen large numbers of people running and deploying. I have
opened a PR with the fix.
The scheduler should not get deleted unless it is scaled down on purpose.
It should not get evicted either: it should run as a high-priority pod,
unless we missed that.
Crashing of the scheduler is a bug.

We should let v0.12.2 go through as normal. In the release
announcement we should have a section that points to known issues, and
we can reference the Jira there along with the workaround.

The workaround is as simple as a scale down and scale up. As long as
the admission controller is running, all pods will be pushed towards
the YuniKorn scheduler. We can start on the next release on the v0.12
branch. We should also add this case to our e2e tests.
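
For anyone who wants to script the workaround, here is a minimal sketch of a
scale down followed by a scale up using kubectl. It assumes kubectl is on the
PATH and pointed at the right cluster. The helper names are made up for
illustration, and the deployment and namespace in the usage line are
placeholders: pick whichever deployment you need to cycle in your install.

```python
import subprocess


def scale_command(deployment, namespace, replicas):
    """Build the kubectl command that sets a deployment's replica count."""
    return [
        "kubectl", "scale", f"deployment/{deployment}",
        f"--replicas={replicas}", "--namespace", namespace,
    ]


def scale_down_and_up(deployment, namespace, run=subprocess.run):
    """Scale the deployment to zero replicas, then back up to one."""
    for replicas in (0, 1):
        # check=True raises CalledProcessError if kubectl exits non-zero.
        run(scale_command(deployment, namespace, replicas), check=True)
```

Usage would look like scale_down_and_up("my-deployment", "my-namespace"),
with both arguments replaced by the real names from your cluster.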

Wilfred

On Fri, 21 Jan 2022 at 17:15, Weiwei Yang <w...@apache.org> wrote:
>
> Agree, this needs to be fixed.
> Likely we need to revoke 0.12.2 and get out a 0.12.3.
>
> On Thu, Jan 20, 2022 at 9:56 PM Chaoran Yu <yuchaoran2...@gmail.com> wrote:
>
> > Yes, Helm install and upgrade both work.
> > The failure scenario is as follows:
> >
> > 1. Both the admission controller and the scheduler pods are running
> > 2. The scheduler pod is restarted for some reason (e.g. deleted, evicted,
> > or crashed)
> > 3. The new scheduler pod will be stuck in the pending state because it’s
> > intercepted by the admission controller (The schedulerName field is
> > yunikorn).
> >
> > I think this bug is critical because if the scheduler pod fails for any
> > reason, someone has to manually redeploy the whole thing.
> >
> >
> > > On Jan 20, 2022, at 21:45, Weiwei Yang <w...@apache.org> wrote:
> > >
> > > Hmmm. that is a bug. But during the release verification, I have tried
> > > the helm install, and that works as expected. I am guessing that is
> > > because the scheduler always gets started first. Maybe the same for the
> > > upgrade? In this case, maybe this can work as long as people are using
> > > helm charts to deploy yunikorn? Craig, could you please look into this
> > > and let us know if we need to revoke the vote for 0.12.2 and have a
> > > 0.12.3?
> > >
> > > Thank you, Chaoran, for raising this. Much appreciated!
> > >
> > > On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu <yuchaoran2...@gmail.com> wrote:
> > >
> > >> I just spotted a bug, https://issues.apache.org/jira/browse/YUNIKORN-1038 ,
> > >> which is critical and worth porting back into branch 0.12.
> > >>
> > >> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan <sun...@apache.org> wrote:
> > >>
> > >>> A late +1 (binding) from me.
> > >>>
> > >>> I built this from source:
> > >>> - Ran basic spark job
> > >>> - Verified UI
> > >>> - Checked signature.
> > >>> - Checked the images.
> > >>>
> > >>> Thanks
> > >>> Sunil
> > >>>
> > >>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit <apa...@craigcondit.com> wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
> > >>>> with 3 binding +1 votes and 3 non-binding +1 votes.
> > >>>>
> > >>>> Vote thread:
> > >>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
> > >>>>
> > >>>> Thank you to all the members who helped verify this release. We will
> > >>>> move to IPMC voting shortly.
> > >>>>
> > >>>>
> > >>>> Thanks,
> > >>>> Craig
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> > For additional commands, e-mail: dev-h...@yunikorn.apache.org
> >
> >
