Re: [RESULT] [VOTE] Release Apache YuniKorn (incubating) 0.12.2 RC2

2022-01-20 Thread Wilfred Spiegelenburg
We have seen large numbers of people running and deploying. I have
opened a PR with the fix.
The scheduler should not get deleted unless it is scaled down on purpose.
It should not get evicted either: it should run as a high priority pod,
unless we missed that.
Crashing of the scheduler is a bug.

We should let v0.12.2 go through as normal. In the release
announcement we should have a section that points to known issues and
we can reference the jira there with the workaround.

The workaround is as simple as a scale down and scale up. As long as
the admission controller is running all pods will be pushed towards
the YuniKorn scheduler. We can start on the next release on branch
v0.12. We should also add this case to our e2e tests.

Wilfred

On Fri, 21 Jan 2022 at 17:15, Weiwei Yang  wrote:
>
> Agree, this needs to be fixed.
> Likely we need to revoke 0.12.2 and get out a 0.12.3.
>
> On Thu, Jan 20, 2022 at 9:56 PM Chaoran Yu  wrote:
>
> > Yes, Helm install and upgrade both work.
> > The failure scenario is as follows:
> >
> > 1. Both the admission controller and the scheduler pods are running
> > 2. The scheduler pod is restarted for some reason (e.g. deleted, evicted,
> > or crashed)
> > 3. The new scheduler pod will be stuck in the pending state because it’s
> > intercepted by the admission controller (The schedulerName field is
> > yunikorn).
> >
> > I think this bug is critical because if the scheduler pod fails for any
> > reason, someone has to manually redeploy the whole thing.
> >
> >
> > > On Jan 20, 2022, at 21:45, Weiwei Yang  wrote:
> > >
> > > Hmmm. that is a bug. But during the release verification, I have tried the
> > > helm install, and that works as expected. I am guessing that is because the
> > > scheduler always gets started first. Maybe the same for the upgrade? In
> > > this case, maybe this can work as long as people are using helm charts to
> > > deploy yunikorn? Craig, could you please look into this and let us know if
> > > we need to revoke the vote for 0.12.2 and have a 0.12.3?
> > >
> > > Thank you Chaoran to raise this up. Much appreciated!
> > >
> > > On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu wrote:
> > >
> > >> I just spotted a bug https://issues.apache.org/jira/browse/YUNIKORN-1038.
> > >> which is critical and worth porting back into branch 0.12
> > >>
> > >> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan wrote:
> > >>
> > >>> A late +1 (binding) from me.
> > >>>
> > >>> I build this from source
> > >>> - Ran basic spark job
> > >>> - Verified UI
> > >>> - Checked signature.
> > >>> - Checked the images.
> > >>>
> > >>> Thanks
> > >>> Sunil
> > >>>
> > >>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit wrote:
> > >>>
> > >>>> Hi all,
> > >>>>
> > >>>> The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
> > >>>> with 3 binding +1 votes and 3 non-binding +1 votes.
> > >>>>
> > >>>> Vote thread:
> > >>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
> > >>>>
> > >>>> Thank you to all the members who helped verify this release. We will move
> > >>>> to IPMC voting shortly.
> > >>>>
> > >>>> Thanks,
> > >>>> Craig
> > >>>>
> > >>>
> > >>
> >
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> > For additional commands, e-mail: dev-h...@yunikorn.apache.org
> >
> >




Re: [RESULT] [VOTE] Release Apache YuniKorn (incubating) 0.12.2 RC2

2022-01-20 Thread Weiwei Yang
Agree, this needs to be fixed.
Likely we need to revoke 0.12.2 and get out a 0.12.3.

On Thu, Jan 20, 2022 at 9:56 PM Chaoran Yu  wrote:

> Yes, Helm install and upgrade both work.
> The failure scenario is as follows:
>
> 1. Both the admission controller and the scheduler pods are running
> 2. The scheduler pod is restarted for some reason (e.g. deleted, evicted,
> or crashed)
> 3. The new scheduler pod will be stuck in the pending state because it’s
> intercepted by the admission controller (The schedulerName field is
> yunikorn).
>
> I think this bug is critical because if the scheduler pod fails for any
> reason, someone has to manually redeploy the whole thing.
>
>
> > On Jan 20, 2022, at 21:45, Weiwei Yang  wrote:
> >
> > Hmmm. that is a bug. But during the release verification, I have tried the
> > helm install, and that works as expected. I am guessing that is because the
> > scheduler always gets started first. Maybe the same for the upgrade? In
> > this case, maybe this can work as long as people are using helm charts to
> > deploy yunikorn? Craig, could you please look into this and let us know if
> > we need to revoke the vote for 0.12.2 and have a 0.12.3?
> >
> > Thank you Chaoran to raise this up. Much appreciated!
> >
> > On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu wrote:
> >
> >> I just spotted a bug https://issues.apache.org/jira/browse/YUNIKORN-1038.
> >> which is critical and worth porting back into branch 0.12
> >>
> >> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan wrote:
> >>
> >>> A late +1 (binding) from me.
> >>>
> >>> I build this from source
> >>> - Ran basic spark job
> >>> - Verified UI
> >>> - Checked signature.
> >>> - Checked the images.
> >>>
> >>> Thanks
> >>> Sunil
> >>>
> >>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
> >>>> with 3 binding +1 votes and 3 non-binding +1 votes.
> >>>>
> >>>> Vote thread:
> >>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
> >>>>
> >>>> Thank you to all the members who helped verify this release. We will move
> >>>> to IPMC voting shortly.
> >>>>
> >>>> Thanks,
> >>>> Craig
> >>>>
> >>>
> >>
>
>
>
>


Re: Apache YuniKorn (Incubating) - Community Graduation Vote

2022-01-20 Thread Weiwei Yang
Hi all,

Most issues under the graduation preparation JIRA YUNIKORN-1005 are fixed.
The remaining one is the who-are-we web page. I am currently collecting
info for that, and it should be done by next week.
Shall we start the vote now? I can start a new thread for the community
vote if nobody has objections.

On Tue, Jan 11, 2022 at 11:02 AM Wilfred Spiegelenburg 
wrote:

> None of the security lists mentioned in the security page [1] are
> moderated. They are private lists, i.e. not openly available for
> browsing in an archive, but not moderated. Using the private@ for
> YuniKorn does not seem to line up with what other projects do either.
> None of the recently graduated projects mention anything like using
> the private@ mailing list on their sites. They all have just used the
> general security link mentioned on their site unless they have a
> specific security@ list. YuniKorn would be the one standing out from
> what seems to be the norm.
> Examples from the last 2 years of graduated projects using a simple
> link or a text pointing to [1]: Pinot, Dolphinscheduler, Ratis,
> Echarts, Gobblin, TVM, Superset and Datasketches. There are more but I
> think this provides an overview of what is expected on graduation.
>
> Wilfred
>
> [1] https://www.apache.org/security/
>
> On Tue, 11 Jan 2022 at 18:21, Weiwei Yang  wrote:
> >
> > Hi Wilfred
> >
> > Adding a security@ mailing list sounds like a good idea, but I do not
> > think that is required in the current stage.
> > We can do that post-graduate. For now, the Apache security doc said
> >
> > > We strongly encourage you to report potential security vulnerabilities
> > > to one of our private security mailing lists first, before disclosing them
> > > in a public forum.
> >
> > I do not see any issue if we use our private@ mailing list for this
> > purpose.
> >
> > On Mon, Jan 10, 2022 at 11:01 PM Wilfred Spiegelenburg <wilfr...@apache.org> wrote:
> >>
> >> The private@ is a moderated list. This has two issues: a moderator
> >> needs to approve any message not sent by a PMC member. This will slow
> >> down the process of interaction with the reporter. It would also not
> >> reach the YuniKorn committers group as not all committers are part of
> >> the PMC. Security issues should be handled and worked on by all
> >> committers not just by the PMC members.
> >>
> >> The security notification update made to the website I think does not
> >> line up with the security guidelines referenced in the link provided
> >> in the dropdown menu of the YuniKorn site [1]. In that link there is a
> >> well defined way to report security issues. If we need to enhance and
> >> extend what we do we either establish a security@ mailing list and
> >> provide a static page with security related information on our site or
> >> we leave it as is. My preference would be to establish a security@
> >> list and make all committers a member of that list.
> >>
> >> I think we need to roll back the website changes part of YUNIKORN-1006
> >> [2] in PR [3] for the website.
> >>
> >> Wilfred
> >>
> >> [1] https://www.apache.org/security/
> >> [2] https://issues.apache.org/jira/browse/YUNIKORN-1006
> >> [3] https://github.com/apache/incubator-yunikorn-site/pull/105
> >>
> >> On Tue, 11 Jan 2022 at 04:45, Holden Karau wrote:
> >> >
> >> > For "The project provides a well-documented, secure and private
> >> > channel to report security issues, along with a documented way of
> >> > responding to them." the standard that I've seen used is to tell people to
> >> > e-mail private@ when they think they might have a security related issue.
> >> > I think that would probably work well for Yunikorn too.
> >> >
> >> >
> >> > On Mon, Jan 10, 2022 at 7:04 AM Chenya Zhang <chenyazhangche...@gmail.com> wrote:
> >> >>
> >> >> Hi Weiwei,
> >> >>
> >> >> Thanks for driving this! The evaluation is quite comprehensive
> >> >> overall. I checked our Apache project maturity guidelines and noticed the
> >> >> below three items. Not sure if we already have them but they are not
> >> >> blockers to our graduation. We could think more about them along the way.
> >> >>
> >> >> QU30
> >> >>
> >> >> The project provides a well-documented, secure and private channel
> >> >> to report security issues, along with a documented way of responding to
> >> >> them.
> >> >>
> >> >> QU40
> >> >>
> >> >> The project puts a high priority on backwards compatibility and aims
> >> >> to document any incompatible changes and provide tools and documentation to
> >> >> help users transition to new features.
> >> >>
> >> >> CO50
> >> >>
> >> >> The project documents how contributors can earn more rights such as
> >> >> commit access or decision power, and applies these principles consistently.
> >> >>
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Chenya
> >> >>
> >> >>
> >> >>
> >> >> On Mon, Jan 10, 2022 at 12:00 AM Weiwei Yang 
> wrote:
> >> >>>
> >> >>> Hi YuniKorn community and mentors
> >> >>>
> >> >>> Based on the discussion thread [1], after 2 years time of
> 

Re: [RESULT] [VOTE] Release Apache YuniKorn (incubating) 0.12.2 RC2

2022-01-20 Thread Chaoran Yu
Yes, Helm install and upgrade both work.
The failure scenario is as follows:

1. Both the admission controller and the scheduler pods are running
2. The scheduler pod is restarted for some reason (e.g. deleted, evicted, or 
crashed)
3. The new scheduler pod will be stuck in the pending state because it’s 
intercepted by the admission controller (The schedulerName field is yunikorn).

I think this bug is critical because if the scheduler pod fails for any reason, 
someone has to manually redeploy the whole thing.
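
To make the three steps concrete, here is a minimal toy model in Go. It is only a sketch and not YuniKorn or Kubernetes code: the Cluster type, its fields, and RestartScheduler are hypothetical names made up for illustration, and the IgnoreSchedulerPod flag stands in for the behaviour I would expect from the admission controller.

```go
package main

import "fmt"

// Cluster is a toy model for illustration only; it is not Kubernetes or
// YuniKorn code, and every name here is hypothetical.
type Cluster struct {
	AdmissionControllerRunning bool
	SchedulerRunning           bool
	// IgnoreSchedulerPod models the expected behaviour: the admission
	// controller leaves the scheduler pod alone.
	IgnoreSchedulerPod bool
}

// RestartScheduler models steps 2 and 3: the scheduler pod goes away and a
// replacement pod is created. It returns the phase the new pod ends up in.
func (c *Cluster) RestartScheduler() string {
	c.SchedulerRunning = false // step 2: deleted, evicted, or crashed

	// Step 3: the replacement pod passes through the admission controller.
	schedulerName := "default-scheduler"
	if c.AdmissionControllerRunning && !c.IgnoreSchedulerPod {
		schedulerName = "yunikorn"
	}

	// A pod assigned to yunikorn can only be placed by a running YuniKorn.
	if schedulerName == "yunikorn" && !c.SchedulerRunning {
		return "Pending"
	}
	c.SchedulerRunning = true
	return "Running"
}

func main() {
	buggy := &Cluster{AdmissionControllerRunning: true, SchedulerRunning: true}
	fmt.Println("scheduler pod intercepted:", buggy.RestartScheduler())

	fixed := &Cluster{AdmissionControllerRunning: true, SchedulerRunning: true, IgnoreSchedulerPod: true}
	fmt.Println("scheduler pod ignored:", fixed.RestartScheduler())
}
```

In the first case the replacement pod waits on a scheduler that is not running, which is the stuck state described above.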


> On Jan 20, 2022, at 21:45, Weiwei Yang  wrote:
> 
> Hmmm. that is a bug. But during the release verification, I have tried the
> helm install, and that works as expected. I am guessing that is because the
> scheduler always gets started first. Maybe the same for the upgrade? In
> this case, maybe this can work as long as people are using helm charts to
> deploy yunikorn? Craig, could you please look into this and let us know if
> we need to revoke the vote for 0.12.2 and have a 0.12.3?
> 
> Thank you Chaoran to raise this up. Much appreciated!
> 
> On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu  wrote:
> 
>> I just spotted a bug https://issues.apache.org/jira/browse/YUNIKORN-1038.
>> which is critical and worth porting back into branch 0.12
>> 
>> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan  wrote:
>> 
>>> A late +1 (binding) from me.
>>> 
>>> I build this from source
>>> - Ran basic spark job
>>> - Verified UI
>>> - Checked signature.
>>> - Checked the images.
>>> 
>>> Thanks
>>> Sunil
>>> 
>>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit wrote:
>>>
>>>> Hi all,
>>>>
>>>> The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
>>>> with 3 binding +1 votes and 3 non-binding +1 votes.
>>>>
>>>> Vote thread:
>>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
>>>>
>>>> Thank you to all the members who helped verify this release. We will move
>>>> to IPMC voting shortly.
>>>>
>>>> Thanks,
>>>> Craig
>>>>
>>>
>>





Re: [RESULT] [VOTE] Release Apache YuniKorn (incubating) 0.12.2 RC2

2022-01-20 Thread Weiwei Yang
Hmmm, that is a bug. But during the release verification, I tried the
helm install, and that worked as expected. I am guessing that is because the
scheduler always gets started first. Maybe the same holds for the upgrade? In
that case, maybe this can work as long as people are using helm charts to
deploy yunikorn? Craig, could you please look into this and let us know if
we need to revoke the vote for 0.12.2 and have a 0.12.3?

Thank you, Chaoran, for raising this. Much appreciated!

On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu  wrote:

> I just spotted a bug https://issues.apache.org/jira/browse/YUNIKORN-1038.
> which is critical and worth porting back into branch 0.12
>
> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan  wrote:
>
> > A late +1 (binding) from me.
> >
> > I build this from source
> > - Ran basic spark job
> > - Verified UI
> > - Checked signature.
> > - Checked the images.
> >
> > Thanks
> > Sunil
> >
> > On Wed, Jan 19, 2022 at 8:44 AM Craig Condit 
> > wrote:
> >
> > > Hi all,
> > >
> > > The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
> > > with 3 binding +1 votes and 3 non-binding +1 votes.
> > >
> > > Vote thread:
> > > https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j <
> > > https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j>
> > >
> > > Thank you to all the members who helped verify this release. We will move
> > > to IPMC voting shortly.
> > >
> > >
> > > Thanks,
> > > Craig
> > >
> > >
> > >
> >
>


Re: [RESULT] [VOTE] Release Apache YuniKorn (incubating) 0.12.2 RC2

2022-01-20 Thread Chaoran Yu
I just spotted a bug, https://issues.apache.org/jira/browse/YUNIKORN-1038,
which is critical and worth backporting to branch 0.12.

On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan  wrote:

> A late +1 (binding) from me.
>
> I build this from source
> - Ran basic spark job
> - Verified UI
> - Checked signature.
> - Checked the images.
>
> Thanks
> Sunil
>
> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit 
> wrote:
>
> > Hi all,
> >
> > The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
> > with 3 binding +1 votes and 3 non-binding +1 votes.
> >
> > Vote thread:
> > https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j <
> > https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j>
> >
> > Thank you to all the members who helped verify this release. We will move
> > to IPMC voting shortly.
> >
> >
> > Thanks,
> > Craig
> >
> >
> >
>


[jira] [Created] (YUNIKORN-1038) Admission controller does not ignore the YuniKorn scheduler pod

2022-01-20 Thread Chaoran Yu (Jira)
Chaoran Yu created YUNIKORN-1038:


         Summary: Admission controller does not ignore the YuniKorn scheduler pod
             Key: YUNIKORN-1038
             URL: https://issues.apache.org/jira/browse/YUNIKORN-1038
         Project: Apache YuniKorn
      Issue Type: Bug
      Components: shim - kubernetes
Affects Versions: 0.12.2
        Reporter: Chaoran Yu
         Fix For: 1.0.0


The admission controller currently intercepts the YuniKorn scheduler pod just
like other pods. This shouldn't happen because YuniKorn won't be there to
schedule itself when it's restarting. This is caused by not returning the
value at this line:
[https://github.com/apache/incubator-yunikorn-k8shim/blob/v0.12.2-1/pkg/plugin/admissioncontrollers/webhook/admission_controller.go#L127]
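
A minimal sketch in Go of the intended control flow, assuming a pared-down pod type. This is not the k8shim source: Pod, isSchedulerPod, mutate, and the "app" label check are hypothetical stand-ins chosen for illustration. The point is only that the check for the scheduler pod has to return immediately, otherwise execution falls through and the pod is rewritten like any other.

```go
package main

import "fmt"

// Pod is a hypothetical, pared-down stand-in for a Kubernetes pod.
type Pod struct {
	Name          string
	Labels        map[string]string
	SchedulerName string
}

const yunikornSchedulerName = "yunikorn"

// isSchedulerPod reports whether the pod is the YuniKorn scheduler itself.
// The label key and value used here are assumptions for illustration.
func isSchedulerPod(p Pod) bool {
	return p.Labels["app"] == "yunikorn"
}

// mutate returns the pod as the webhook would admit it.
func mutate(p Pod) Pod {
	if isSchedulerPod(p) {
		// Return right away so the scheduler pod is left untouched.
		// Without this return the code below would also run for it.
		return p
	}
	p.SchedulerName = yunikornSchedulerName
	return p
}

func main() {
	scheduler := Pod{
		Name:          "yunikorn-scheduler",
		Labels:        map[string]string{"app": "yunikorn"},
		SchedulerName: "default-scheduler",
	}
	workload := Pod{Name: "spark-driver", SchedulerName: "default-scheduler"}

	fmt.Println(scheduler.Name, "->", mutate(scheduler).SchedulerName)
	fmt.Println(workload.Name, "->", mutate(workload).SchedulerName)
}
```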
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




Re: [RESULT] [VOTE] Release Apache YuniKorn (incubating) 0.12.2 RC2

2022-01-20 Thread Sunil Govindan
A late +1 (binding) from me.

I built this from source:
- Ran a basic Spark job
- Verified the UI
- Checked the signature
- Checked the images

Thanks
Sunil

On Wed, Jan 19, 2022 at 8:44 AM Craig Condit  wrote:

> Hi all,
>
> The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
> with 3 binding +1 votes and 3 non-binding +1 votes.
>
> Vote thread:
> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j <
> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j>
>
> Thank you to all the members who helped verify this release. We will move
> to IPMC voting shortly.
>
>
> Thanks,
> Craig
>
>
>


[jira] [Resolved] (YUNIKORN-1037) Update community conf links

2022-01-20 Thread Weiwei Yang (Jira)


 [ https://issues.apache.org/jira/browse/YUNIKORN-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiwei Yang resolved YUNIKORN-1037.
---
Fix Version/s: 1.0.0
   Resolution: Fixed

> Update community conf links
> ---
>
> Key: YUNIKORN-1037
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1037
> Project: Apache YuniKorn
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> Entering 2022, the previous link doesn't work anymore.
> Updating the info on the website to the latest.






[jira] [Created] (YUNIKORN-1037) Update community conf links

2022-01-20 Thread Weiwei Yang (Jira)
Weiwei Yang created YUNIKORN-1037:
-

   Summary: Update community conf links
       Key: YUNIKORN-1037
       URL: https://issues.apache.org/jira/browse/YUNIKORN-1037
   Project: Apache YuniKorn
Issue Type: Bug
  Reporter: Weiwei Yang
  Assignee: Weiwei Yang


Entering 2022, the previous link doesn't work anymore.

Updating the info on the website to the latest.


