Chaoran, nice catch on this one. Unfortunate that we didn’t find it before 
cutting 0.12.2.

I agree with Wilfred that we can add this to the release notes on the website,
but we should also backport the fix to 0.12.3. I can RM that release too,
unless someone else wants to volunteer.

- Craig



> On Jan 21, 2022, at 12:44 AM, Wilfred Spiegelenburg <wilfr...@apache.org> 
> wrote:
> 
> We have seen large numbers of people running and deploying YuniKorn. I have
> opened a PR with the fix.
> The scheduler should not get deleted unless it is scaled down on purpose.
> It should not get evicted either: it should run as a high-priority pod,
> unless we missed that.
> Crashing of the scheduler is a bug.
> 
> We should let v0.12.2 go through as normal. In the release
> announcement we should have a section that points to known issues, and
> we can reference the Jira there along with the workaround.
> 
> The workaround is as simple as a scale down and scale up. As long as
> the admission controller is running, all pods will be pushed towards
> the YuniKorn scheduler. We can start on the next release on the v0.12
> branch. We should also add this case to our e2e tests.
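
As a rough sketch of the scale-down and scale-up workaround described above, the snippet below only builds the two kubectl scale invocations as argument lists. The message does not say which deployment to scale, so the deployment name, the yunikorn namespace, the replica count, and the helper names are all assumptions to adjust for the actual install.

```python
def scale_command(deployment, replicas, namespace):
    """Build a `kubectl scale` argument list for one deployment."""
    return [
        "kubectl", "scale", f"deployment/{deployment}",
        f"--replicas={replicas}", "--namespace", namespace,
    ]


def workaround_commands(deployment, namespace="yunikorn", replicas=1):
    """Scale the given deployment down to zero, then back up."""
    return [
        scale_command(deployment, 0, namespace),
        scale_command(deployment, replicas, namespace),
    ]


if __name__ == "__main__":
    # The deployment name below is a placeholder (assumption), not taken from the thread.
    for cmd in workaround_commands("yunikorn-admission-controller"):
        print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the names have been checked against the cluster.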
> 
> Wilfred
> 
> On Fri, 21 Jan 2022 at 17:15, Weiwei Yang <w...@apache.org> wrote:
>> 
>> Agree, this needs to be fixed.
>> Likely we need to revoke 0.12.2 and get out a 0.12.3.
>> 
>> On Thu, Jan 20, 2022 at 9:56 PM Chaoran Yu <yuchaoran2...@gmail.com> wrote:
>> 
>>> Yes, Helm install and upgrade both work.
>>> The failure scenario is as follows:
>>> 
>>> 1. Both the admission controller and the scheduler pods are running
>>> 2. The scheduler pod is restarted for some reason (e.g. deleted, evicted,
>>> or crashed)
>>> 3. The new scheduler pod will be stuck in the Pending state because it is
>>> intercepted by the admission controller (its schedulerName field is
>>> yunikorn).
>>> 
>>> I think this bug is critical because if the scheduler pod fails for any
>>> reason, someone has to manually redeploy the whole thing.
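
The circular dependency in the scenario above can be illustrated with a toy model. This is not YuniKorn or Kubernetes code: the Pod class, the admit and phase helpers, and the default-scheduler name are assumptions made for illustration only. It shows that a replacement scheduler pod routed to yunikorn stays Pending while no yunikorn scheduler is running.

```python
from dataclasses import dataclass

YUNIKORN = "yunikorn"  # schedulerName value quoted in the report


@dataclass
class Pod:
    name: str
    scheduler_name: str = "default-scheduler"  # assumed default, for illustration


def admit(pod, controller_running):
    """Toy admission step: while the controller runs, every pod is routed to YuniKorn."""
    if controller_running:
        pod.scheduler_name = YUNIKORN
    return pod


def phase(pod, running_schedulers):
    """A pod leaves Pending only if the scheduler named in its spec is running."""
    return "Scheduled" if pod.scheduler_name in running_schedulers else "Pending"


if __name__ == "__main__":
    # The scheduler pod was restarted, so only the default scheduler is running.
    running = {"default-scheduler"}
    stuck = admit(Pod("scheduler-replacement"), controller_running=True)
    print(stuck.name, phase(stuck, running))
    # With the admission controller scaled down, the pod is not redirected.
    ok = admit(Pod("scheduler-replacement"), controller_running=False)
    print(ok.name, phase(ok, running))
```

In this model the pod only gets scheduled when the admission step does not redirect it, which is why a scale down and scale up can break the deadlock.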
>>> 
>>> 
>>>> On Jan 20, 2022, at 21:45, Weiwei Yang <w...@apache.org> wrote:
>>>> 
>>>> Hmm, that is a bug. But during the release verification I tried the
>>>> helm install, and it worked as expected. I am guessing that is because
>>>> the scheduler always gets started first. Maybe the same holds for the
>>>> upgrade? In that case, maybe this can work as long as people are using
>>>> helm charts to deploy yunikorn? Craig, could you please look into this
>>>> and let us know if we need to revoke the vote for 0.12.2 and have a
>>>> 0.12.3?
>>>> 
>>>> Thank you, Chaoran, for raising this. Much appreciated!
>>>> 
>>>> On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu <yuchaoran2...@gmail.com>
>>>> wrote:
>>>> 
>>>>> I just spotted a bug, https://issues.apache.org/jira/browse/YUNIKORN-1038,
>>>>> which is critical and worth porting back into branch 0.12.
>>>>> 
>>>>> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan <sun...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> A late +1 (binding) from me.
>>>>>> 
>>>>>> I built this from source and:
>>>>>> - Ran a basic Spark job
>>>>>> - Verified the UI
>>>>>> - Checked the signature
>>>>>> - Checked the images
>>>>>> 
>>>>>> Thanks
>>>>>> Sunil
>>>>>> 
>>>>>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit <apa...@craigcondit.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
>>>>>>> with 3 binding +1 votes and 3 non-binding +1 votes.
>>>>>>> 
>>>>>>> Vote thread:
>>>>>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
>>>>>>> 
>>>>>>> Thank you to all the members who helped verify this release. We will
>>>>>>> move to IPMC voting shortly.
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Craig
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
>>> For additional commands, e-mail: dev-h...@yunikorn.apache.org
>>> 
>>> 
> 
> 

