[jira] [Created] (YUNIKORN-2630) Release context lock in shim when processing config in the core

2024-05-16 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2630:
---

 Summary: Release context lock in shim when processing config in 
the core
 Key: YUNIKORN-2630
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2630
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


When a change comes in for the configmaps we process the change under a 
context lock as we need to merge the two configmaps.

We keep this lock even after all the work is done in the shim and processing has 
been transferred to the core. This is unneeded as the core has its own locking 
and serialisation of the changes.
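
A minimal sketch of the narrower lock scope described above, not the actual shim code. The names shimContext, mergeConfigMaps, updateConfig and the sendToCore callback are hypothetical, and the sketch assumes the core does its own locking once the merged config is handed over:

```go
package main

import (
	"fmt"
	"sync"
)

// shimContext is a hypothetical stand-in for the shim context.
type shimContext struct {
	lock       sync.Mutex
	configMaps [2]map[string]string // index 0: defaults, index 1: overrides
}

// mergeConfigMaps flattens the maps into a new map; later maps win.
func mergeConfigMaps(maps [2]map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, m := range maps {
		for k, v := range m {
			merged[k] = v
		}
	}
	return merged
}

// updateConfig stores the changed configmap and merges under the lock,
// then releases the lock before handing the result to the core.
func (c *shimContext) updateConfig(index int, data map[string]string, sendToCore func(map[string]string)) {
	c.lock.Lock()
	c.configMaps[index] = data
	merged := mergeConfigMaps(c.configMaps)
	c.lock.Unlock()

	// Assumption: the core serialises config changes itself,
	// so the context lock is no longer needed here.
	sendToCore(merged)
}

func main() {
	ctx := &shimContext{}
	ctx.updateConfig(0, map[string]string{"log.level": "INFO", "queue": "root"}, func(map[string]string) {})
	ctx.updateConfig(1, map[string]string{"log.level": "DEBUG"}, func(merged map[string]string) {
		// TryLock succeeds only if the context lock is free at this point.
		free := ctx.lock.TryLock()
		if free {
			ctx.lock.Unlock()
		}
		fmt.Println(merged["log.level"], merged["queue"], free)
	})
}
```

The merge produces a fresh map, so the callback can use it without holding the context lock.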



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2628) fix release announcement links

2024-05-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2628.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Links are fixed after removing the {{..}} from the path.

> fix release announcement links
> --
>
> Key: YUNIKORN-2628
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2628
> Project: Apache YuniKorn
>  Issue Type: Task
>  Components: website
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> In YUNIKORN-2595 a regression snuck in breaking the links to the release 
> announcements.
> Need to reverse that path change for the release announcements.






Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1

2024-05-16 Thread Desai, Mit
Thanks for confirming. I think we should go for 1.5.2 and leave 1.5.1 as is. As 
Craig said, tags should remain immutable.

-Mit

From: Peter Bacsko 
Date: Thursday, May 16, 2024 at 12:35 PM
To: dev@yunikorn.apache.org 
Subject: Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1
Yes, that is correct.

On Thu, May 16, 2024 at 8:54 PM Desai, Mit  wrote:

> This issue could also be faced by non-autoscaled clusters that still get a
> node added at some point. Right?
>
> -Mit
>
> From: Peter Bacsko 
> Date: Thursday, May 16, 2024 at 11:23 AM
> To: dev@yunikorn.apache.org 
> Subject: Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1
> I'm fine with either approach. If it's too late, then let's go ahead with
> 1.5.1.
>
> Maybe it's better this way because we can do a more thorough verification.
>
> Peter
>
> On Thu, May 16, 2024 at 8:15 PM Craig Condit  wrote:
>
> > IMO, it’s too late to update 1.5.1. We’ve already cut the tags, and those
> > must remain immutable. Our best bet would probably be to continue with
> > 1.5.1 as-is; the new issue is unlikely to affect non-autoscaled clusters
> > and it’s better than 1.5.0. We should, I think, get this latest issue
> fixed
> > and go for 1.5.2.
> >
> > Do we have a fix yet? If so, we could probably push for 1.5.2 alone. But
> > either way, 1.5.1 is already baked.
> >
> >
> > Craig
> >
> >
> > > On May 16, 2024, at 1:06 PM, Peter Bacsko  wrote:
> > >
> > > Dear community,
> > >
> > > I've been working together with Jacob Salway on an issue and we found
> out
> > > that there's one more deadlock in the shim which can be triggered when
> a
> > > new node is added. This means that an autoscaler setup is prone to a
> > > deadlock.
> > >
> > > I filed a JIRA which explains the problem:
> > >
> https://issues.apache.org/jira/browse/YUNIKORN-2629
> >
> > >
> > > I already published the release artifacts to the release repo, GitHub
> and
> > > dockerhub, however no announcement has been made. I think we need an
> RC2
> > > and re-run the voting and delete the artifacts.
> > >
> > > Thoughts, opinions?
> > >
> > > Peter
> > >
> > > On Thu, May 16, 2024 at 11:34 AM Peter Bacsko 
> wrote:
> > >
> > >> +1 binding
> > >>
> > >> - Built images from source (amd64) on Ubuntu 22.04
> > >> - Run make test && make image
> > >> - Run it on a local cluster
> > >> - Checked some REST API endpoints
> > >> - Ran sample jobs
> > >>
> > >> Thank you all for the voting on the RC1 for 1.5.1.
> > >>
> > >> Voting for the release has passed with:
> > >> 5 binding +1
> > >> 3 non binding +1
> > >>
> > >> no 0 or -1 votes.
> > >>
> > >> As the next step, I'll publish the release, images and update the
> > website.
> > >> After that is done I will send an announcement email.
> > >>
> > >> Thank you,
> > >> Peter
> > >>
> > >>
> > >> On Wed, May 15, 2024 at 4:45 PM Manikandan R 
> > wrote:
> > >>
> > >>> +1 (Binding)
> > >>>
> > >>> - Built images from source on Mac M1 MacOS Monterey (arm64) with go
> > 1.21.8
> > >>> - Verified the signatures
> > >>> - Verified the licences and checksums
> > >>> - Run the scheduler with a local kind cluster (version 1.29.0)
> > >>> - Ran simple sleep jobs
> > >>> - Verified REST APIs outputs, Web UI
> > >>>
> > >>> Thanks,
> > >>> Mani
> > >>>
> > >>> On Tue, May 14, 2024 at 9:41 PM Desai, Mit 
> > >>> wrote:
> > >>>
> >  +1 (non-binding)
> > 
> > 
> >   *   Built release on MacOS Sonoma (arm64)
> >   *   Installed locally on Kind Cluster (1.28)
> >   *   Successfully ran make test
> >   *   Ran sample sleep jobs
> > 
> >  Thank you, Peter, for your efforts in driving the release.
> > 
> >  - Mit Desai
> > 
> >  From: Peter Bacsko 
> >  Date: Friday, May 10, 2024 at 1:41 AM
> >  To: dev@yunikorn.apache.org 
> >  Subject: [VOTE] Release Apache YuniKorn 1.5.1 RC1
> >  Hello everyone,
> > 
> >  I would like to call a vote for releasing Apache YuniKorn 1.5.1 RC1.
> >  This is a minor release which contains only bugfixes.
> > 
> >  The release artefacts have been uploaded here:
> > 
> > 
> > >>>
> >
> 

Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1

2024-05-16 Thread Peter Bacsko
Yes, that is correct.

On Thu, May 16, 2024 at 8:54 PM Desai, Mit  wrote:

> This issue could also be faced by non-autoscaled clusters that still get a
> node added at some point. Right?
>
> -Mit
>
> From: Peter Bacsko 
> Date: Thursday, May 16, 2024 at 11:23 AM
> To: dev@yunikorn.apache.org 
> Subject: Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1
> I'm fine with either approach. If it's too late, then let's go ahead with
> 1.5.1.
>
> Maybe it's better this way because we can do a more thorough verification.
>
> Peter
>
> On Thu, May 16, 2024 at 8:15 PM Craig Condit  wrote:
>
> > IMO, it’s too late to update 1.5.1. We’ve already cut the tags, and those
> > must remain immutable. Our best bet would probably be to continue with
> > 1.5.1 as-is; the new issue is unlikely to affect non-autoscaled clusters
> > and it’s better than 1.5.0. We should, I think, get this latest issue
> fixed
> > and go for 1.5.2.
> >
> > Do we have a fix yet? If so, we could probably push for 1.5.2 alone. But
> > either way, 1.5.1 is already baked.
> >
> >
> > Craig
> >
> >
> > > On May 16, 2024, at 1:06 PM, Peter Bacsko  wrote:
> > >
> > > Dear community,
> > >
> > > I've been working together with Jacob Salway on an issue and we found
> out
> > > that there's one more deadlock in the shim which can be triggered when
> a
> > > new node is added. This means that an autoscaler setup is prone to a
> > > deadlock.
> > >
> > > I filed a JIRA which explains the problem:
> > >
> https://issues.apache.org/jira/browse/YUNIKORN-2629
> 
> > >
> > > I already published the release artifacts to the release repo, GitHub
> and
> > > dockerhub, however no announcement has been made. I think we need an
> RC2
> > > and re-run the voting and delete the artifacts.
> > >
> > > Thoughts, opinions?
> > >
> > > Peter
> > >
> > > On Thu, May 16, 2024 at 11:34 AM Peter Bacsko 
> wrote:
> > >
> > >> +1 binding
> > >>
> > >> - Built images from source (amd64) on Ubuntu 22.04
> > >> - Run make test && make image
> > >> - Run it on a local cluster
> > >> - Checked some REST API endpoints
> > >> - Ran sample jobs
> > >>
> > >> Thank you all for the voting on the RC1 for 1.5.1.
> > >>
> > >> Voting for the release has passed with:
> > >> 5 binding +1
> > >> 3 non binding +1
> > >>
> > >> no 0 or -1 votes.
> > >>
> > >> As the next step, I'll publish the release, images and update the
> > website.
> > >> After that is done I will send an announcement email.
> > >>
> > >> Thank you,
> > >> Peter
> > >>
> > >>
> > >> On Wed, May 15, 2024 at 4:45 PM Manikandan R 
> > wrote:
> > >>
> > >>> +1 (Binding)
> > >>>
> > >>> - Built images from source on Mac M1 MacOS Monterey (arm64) with go
> > 1.21.8
> > >>> - Verified the signatures
> > >>> - Verified the licences and checksums
> > >>> - Run the scheduler with a local kind cluster (version 1.29.0)
> > >>> - Ran simple sleep jobs
> > >>> - Verified REST APIs outputs, Web UI
> > >>>
> > >>> Thanks,
> > >>> Mani
> > >>>
> > >>> On Tue, May 14, 2024 at 9:41 PM Desai, Mit 
> > >>> wrote:
> > >>>
> >  +1 (non-binding)
> > 
> > 
> >   *   Built release on MacOS Sonoma (arm64)
> >   *   Installed locally on Kind Cluster (1.28)
> >   *   Successfully ran make test
> >   *   Ran sample sleep jobs
> > 
> >  Thank you, Peter, for your efforts in driving the release.
> > 
> >  - Mit Desai
> > 
> >  From: Peter Bacsko 
> >  Date: Friday, May 10, 2024 at 1:41 AM
> >  To: dev@yunikorn.apache.org 
> >  Subject: [VOTE] Release Apache YuniKorn 1.5.1 RC1
> >  Hello everyone,
> > 
> >  I would like to call a vote for releasing Apache YuniKorn 1.5.1 RC1.
> >  This is a minor release which contains only bugfixes.
> > 
> >  The release artefacts have been uploaded here:
> > 
> > 
> > >>>
> >
> https://dist.apache.org/repos/dist/dev/yunikorn/1.5.1-RC1/
> 
> >  <
> 

Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1

2024-05-16 Thread Desai, Mit
This issue could also be faced by non-autoscaled clusters that still get a node 
added at some point. Right?

-Mit

From: Peter Bacsko 
Date: Thursday, May 16, 2024 at 11:23 AM
To: dev@yunikorn.apache.org 
Subject: Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1
I'm fine with either approach. If it's too late, then let's go ahead with
1.5.1.

Maybe it's better this way because we can do a more thorough verification.

Peter

On Thu, May 16, 2024 at 8:15 PM Craig Condit  wrote:

> IMO, it’s too late to update 1.5.1. We’ve already cut the tags, and those
> must remain immutable. Our best bet would probably be to continue with
> 1.5.1 as-is; the new issue is unlikely to affect non-autoscaled clusters
> and it’s better than 1.5.0. We should, I think, get this latest issue fixed
> and go for 1.5.2.
>
> Do we have a fix yet? If so, we could probably push for 1.5.2 alone. But
> either way, 1.5.1 is already baked.
>
>
> Craig
>
>
> > On May 16, 2024, at 1:06 PM, Peter Bacsko  wrote:
> >
> > Dear community,
> >
> > I've been working together with Jacob Salway on an issue and we found out
> > that there's one more deadlock in the shim which can be triggered when a
> > new node is added. This means that an autoscaler setup is prone to a
> > deadlock.
> >
> > I filed a JIRA which explains the problem:
> > https://issues.apache.org/jira/browse/YUNIKORN-2629
> >
> > I already published the release artifacts to the release repo, GitHub and
> > dockerhub, however no announcement has been made. I think we need an RC2
> > and re-run the voting and delete the artifacts.
> >
> > Thoughts, opinions?
> >
> > Peter
> >
> > On Thu, May 16, 2024 at 11:34 AM Peter Bacsko  wrote:
> >
> >> +1 binding
> >>
> >> - Built images from source (amd64) on Ubuntu 22.04
> >> - Run make test && make image
> >> - Run it on a local cluster
> >> - Checked some REST API endpoints
> >> - Ran sample jobs
> >>
> >> Thank you all for the voting on the RC1 for 1.5.1.
> >>
> >> Voting for the release has passed with:
> >> 5 binding +1
> >> 3 non binding +1
> >>
> >> no 0 or -1 votes.
> >>
> >> As the next step, I'll publish the release, images and update the
> website.
> >> After that is done I will send an announcement email.
> >>
> >> Thank you,
> >> Peter
> >>
> >>
> >> On Wed, May 15, 2024 at 4:45 PM Manikandan R 
> wrote:
> >>
> >>> +1 (Binding)
> >>>
> >>> - Built images from source on Mac M1 MacOS Monterey (arm64) with go
> 1.21.8
> >>> - Verified the signatures
> >>> - Verified the licences and checksums
> >>> - Run the scheduler with a local kind cluster (version 1.29.0)
> >>> - Ran simple sleep jobs
> >>> - Verified REST APIs outputs, Web UI
> >>>
> >>> Thanks,
> >>> Mani
> >>>
> >>> On Tue, May 14, 2024 at 9:41 PM Desai, Mit 
> >>> wrote:
> >>>
>  +1 (non-binding)
> 
> 
>   *   Built release on MacOS Sonoma (arm64)
>   *   Installed locally on Kind Cluster (1.28)
>   *   Successfully ran make test
>   *   Ran sample sleep jobs
> 
>  Thank you, Peter, for your efforts in driving the release.
> 
>  - Mit Desai
> 
>  From: Peter Bacsko 
>  Date: Friday, May 10, 2024 at 1:41 AM
>  To: dev@yunikorn.apache.org 
>  Subject: [VOTE] Release Apache YuniKorn 1.5.1 RC1
>  Hello everyone,
> 
>  I would like to call a vote for releasing Apache YuniKorn 1.5.1 RC1.
>  This is a minor release which contains only bugfixes.
> 
>  The release artefacts have been uploaded here:
> 
> 
> >>>
> https://dist.apache.org/repos/dist/dev/yunikorn/1.5.1-RC1/
>  >
> 
>  My public key is located in the KEYS file:
> 
> 
> >>>
> 

Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1

2024-05-16 Thread Peter Bacsko
I'm fine with either approach. If it's too late, then let's go ahead with
1.5.1.

Maybe it's better this way because we can do a more thorough verification.

Peter

On Thu, May 16, 2024 at 8:15 PM Craig Condit  wrote:

> IMO, it’s too late to update 1.5.1. We’ve already cut the tags, and those
> must remain immutable. Our best bet would probably be to continue with
> 1.5.1 as-is; the new issue is unlikely to affect non-autoscaled clusters
> and it’s better than 1.5.0. We should, I think, get this latest issue fixed
> and go for 1.5.2.
>
> Do we have a fix yet? If so, we could probably push for 1.5.2 alone. But
> either way, 1.5.1 is already baked.
>
>
> Craig
>
>
> > On May 16, 2024, at 1:06 PM, Peter Bacsko  wrote:
> >
> > Dear community,
> >
> > I've been working together with Jacob Salway on an issue and we found out
> > that there's one more deadlock in the shim which can be triggered when a
> > new node is added. This means that an autoscaler setup is prone to a
> > deadlock.
> >
> > I filed a JIRA which explains the problem:
> > https://issues.apache.org/jira/browse/YUNIKORN-2629
> >
> > I already published the release artifacts to the release repo, GitHub and
> > dockerhub, however no announcement has been made. I think we need an RC2
> > and re-run the voting and delete the artifacts.
> >
> > Thoughts, opinions?
> >
> > Peter
> >
> > On Thu, May 16, 2024 at 11:34 AM Peter Bacsko  wrote:
> >
> >> +1 binding
> >>
> >> - Built images from source (amd64) on Ubuntu 22.04
> >> - Run make test && make image
> >> - Run it on a local cluster
> >> - Checked some REST API endpoints
> >> - Ran sample jobs
> >>
> >> Thank you all for the voting on the RC1 for 1.5.1.
> >>
> >> Voting for the release has passed with:
> >> 5 binding +1
> >> 3 non binding +1
> >>
> >> no 0 or -1 votes.
> >>
> >> As the next step, I'll publish the release, images and update the
> website.
> >> After that is done I will send an announcement email.
> >>
> >> Thank you,
> >> Peter
> >>
> >>
> >> On Wed, May 15, 2024 at 4:45 PM Manikandan R 
> wrote:
> >>
> >>> +1 (Binding)
> >>>
> >>> - Built images from source on Mac M1 MacOS Monterey (arm64) with go
> 1.21.8
> >>> - Verified the signatures
> >>> - Verified the licences and checksums
> >>> - Run the scheduler with a local kind cluster (version 1.29.0)
> >>> - Ran simple sleep jobs
> >>> - Verified REST APIs outputs, Web UI
> >>>
> >>> Thanks,
> >>> Mani
> >>>
> >>> On Tue, May 14, 2024 at 9:41 PM Desai, Mit 
> >>> wrote:
> >>>
>  +1 (non-binding)
> 
> 
>   *   Built release on MacOS Sonoma (arm64)
>   *   Installed locally on Kind Cluster (1.28)
>   *   Successfully ran make test
>   *   Ran sample sleep jobs
> 
>  Thank you, Peter, for your efforts in driving the release.
> 
>  - Mit Desai
> 
>  From: Peter Bacsko 
>  Date: Friday, May 10, 2024 at 1:41 AM
>  To: dev@yunikorn.apache.org 
>  Subject: [VOTE] Release Apache YuniKorn 1.5.1 RC1
>  Hello everyone,
> 
>  I would like to call a vote for releasing Apache YuniKorn 1.5.1 RC1.
>  This is a minor release which contains only bugfixes.
> 
>  The release artefacts have been uploaded here:
> 
> 
> >>>
> https://dist.apache.org/repos/dist/dev/yunikorn/1.5.1-RC1/
>  
> 
>  My public key is located in the KEYS file:
> 
> 
> >>>
> https://downloads.apache.org//yunikorn/KEYS
>  
> 
>  JIRA issues that have been resolved in this release:
> 
> 
> >>>
> https://issues.apache.org/jira/issues/?filter=12353383
>  
> 
>  The release solves a deadlock issue. If possible, test Yunikorn with
>  workloads that put Yunikorn under stress (ie. thousands/tens of
> >>> thousands
>  of pods).
> 
>  Git 

Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1

2024-05-16 Thread Craig Condit
IMO, it’s too late to update 1.5.1. We’ve already cut the tags, and those must 
remain immutable. Our best bet would probably be to continue with 1.5.1 as-is; 
the new issue is unlikely to affect non-autoscaled clusters and it’s better 
than 1.5.0. We should, I think, get this latest issue fixed and go for 1.5.2.

Do we have a fix yet? If so, we could probably push for 1.5.2 alone. But either 
way, 1.5.1 is already baked.


Craig


> On May 16, 2024, at 1:06 PM, Peter Bacsko  wrote:
> 
> Dear community,
> 
> I've been working together with Jacob Salway on an issue and we found out
> that there's one more deadlock in the shim which can be triggered when a
> new node is added. This means that an autoscaler setup is prone to a
> deadlock.
> 
> I filed a JIRA which explains the problem:
> https://issues.apache.org/jira/browse/YUNIKORN-2629
> 
> I already published the release artifacts to the release repo, GitHub and
> dockerhub, however no announcement has been made. I think we need an RC2
> and re-run the voting and delete the artifacts.
> 
> Thoughts, opinions?
> 
> Peter
> 
> On Thu, May 16, 2024 at 11:34 AM Peter Bacsko  wrote:
> 
>> +1 binding
>> 
>> - Built images from source (amd64) on Ubuntu 22.04
>> - Run make test && make image
>> - Run it on a local cluster
>> - Checked some REST API endpoints
>> - Ran sample jobs
>> 
>> Thank you all for the voting on the RC1 for 1.5.1.
>> 
>> Voting for the release has passed with:
>> 5 binding +1
>> 3 non binding +1
>> 
>> no 0 or -1 votes.
>> 
>> As the next step, I'll publish the release, images and update the website.
>> After that is done I will send an announcement email.
>> 
>> Thank you,
>> Peter
>> 
>> 
>> On Wed, May 15, 2024 at 4:45 PM Manikandan R  wrote:
>> 
>>> +1 (Binding)
>>> 
>>> - Built images from source on Mac M1 MacOS Monterey (arm64) with go 1.21.8
>>> - Verified the signatures
>>> - Verified the licences and checksums
>>> - Run the scheduler with a local kind cluster (version 1.29.0)
>>> - Ran simple sleep jobs
>>> - Verified REST APIs outputs, Web UI
>>> 
>>> Thanks,
>>> Mani
>>> 
>>> On Tue, May 14, 2024 at 9:41 PM Desai, Mit 
>>> wrote:
>>> 
 +1 (non-binding)
 
 
  *   Built release on MacOS Sonoma (arm64)
  *   Installed locally on Kind Cluster (1.28)
  *   Successfully ran make test
  *   Ran sample sleep jobs
 
 Thank you, Peter, for your efforts in driving the release.
 
 - Mit Desai
 
 From: Peter Bacsko 
 Date: Friday, May 10, 2024 at 1:41 AM
 To: dev@yunikorn.apache.org 
 Subject: [VOTE] Release Apache YuniKorn 1.5.1 RC1
 Hello everyone,
 
 I would like to call a vote for releasing Apache YuniKorn 1.5.1 RC1.
 This is a minor release which contains only bugfixes.
 
 The release artefacts have been uploaded here:
 
 
>>> https://dist.apache.org/repos/dist/dev/yunikorn/1.5.1-RC1/
 
 
 My public key is located in the KEYS file:
 
 
>>> https://downloads.apache.org//yunikorn/KEYS
 
 
 JIRA issues that have been resolved in this release:
 
 
>>> https://issues.apache.org/jira/issues/?filter=12353383
 
 
 The release solves a deadlock issue. If possible, test Yunikorn with
 workloads that put Yunikorn under stress (ie. thousands/tens of
>>> thousands
 of pods).
 
 Git tags for each component are as follows:
 yunikorn-scheduler-interface: v1.5.1-1
 yunikorn-core: v1.5.1-1
 yunikorn-k8shim: v1.5.1-1
 yunikorn-web: v1.5.1-1
 yunikorn-release: v1.5.1-1
 
 Once the release is voted on and approved, all repos will be tagged
 1.5.1 for consistency.
 
 Please review and vote. The vote will be open for at least 96 hours
 and closes on Tuesday 14 May 

Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1

2024-05-16 Thread Peter Bacsko
Dear community,

I've been working together with Jacob Salway on an issue and we found out
that there's one more deadlock in the shim which can be triggered when a
new node is added. This means that an autoscaler setup is prone to a
deadlock.

I filed a JIRA which explains the problem:
https://issues.apache.org/jira/browse/YUNIKORN-2629

I already published the release artifacts to the release repo, GitHub and
dockerhub, however no announcement has been made. I think we need an RC2
and re-run the voting and delete the artifacts.

Thoughts, opinions?

Peter

On Thu, May 16, 2024 at 11:34 AM Peter Bacsko  wrote:

> +1 binding
>
> - Built images from source (amd64) on Ubuntu 22.04
> - Run make test && make image
> - Run it on a local cluster
> - Checked some REST API endpoints
> - Ran sample jobs
>
> Thank you all for the voting on the RC1 for 1.5.1.
>
> Voting for the release has passed with:
> 5 binding +1
> 3 non binding +1
>
> no 0 or -1 votes.
>
> As the next step, I'll publish the release, images and update the website.
> After that is done I will send an announcement email.
>
> Thank you,
> Peter
>
>
> On Wed, May 15, 2024 at 4:45 PM Manikandan R  wrote:
>
>> +1 (Binding)
>>
>> - Built images from source on Mac M1 MacOS Monterey (arm64) with go 1.21.8
>> - Verified the signatures
>> - Verified the licences and checksums
>> - Run the scheduler with a local kind cluster (version 1.29.0)
>> - Ran simple sleep jobs
>> - Verified REST APIs outputs, Web UI
>>
>> Thanks,
>> Mani
>>
>> On Tue, May 14, 2024 at 9:41 PM Desai, Mit 
>> wrote:
>>
>> > +1 (non-binding)
>> >
>> >
>> >   *   Built release on MacOS Sonoma (arm64)
>> >   *   Installed locally on Kind Cluster (1.28)
>> >   *   Successfully ran make test
>> >   *   Ran sample sleep jobs
>> >
>> > Thank you, Peter, for your efforts in driving the release.
>> >
>> > - Mit Desai
>> >
>> > From: Peter Bacsko 
>> > Date: Friday, May 10, 2024 at 1:41 AM
>> > To: dev@yunikorn.apache.org 
>> > Subject: [VOTE] Release Apache YuniKorn 1.5.1 RC1
>> > Hello everyone,
>> >
>> > I would like to call a vote for releasing Apache YuniKorn 1.5.1 RC1.
>> > This is a minor release which contains only bugfixes.
>> >
>> > The release artefacts have been uploaded here:
>> >
>> >
>> https://dist.apache.org/repos/dist/dev/yunikorn/1.5.1-RC1/
>> > 
>> >
>> > My public key is located in the KEYS file:
>> >
>> >
>> https://downloads.apache.org//yunikorn/KEYS
>> > 
>> >
>> > JIRA issues that have been resolved in this release:
>> >
>> >
>> https://issues.apache.org/jira/issues/?filter=12353383
>> > 
>> >
>> > The release solves a deadlock issue. If possible, test Yunikorn with
>> > workloads that put Yunikorn under stress (ie. thousands/tens of
>> thousands
>> > of pods).
>> >
>> > Git tags for each component are as follows:
>> > yunikorn-scheduler-interface: v1.5.1-1
>> > yunikorn-core: v1.5.1-1
>> > yunikorn-k8shim: v1.5.1-1
>> > yunikorn-web: v1.5.1-1
>> > yunikorn-release: v1.5.1-1
>> >
>> > Once the release is voted on and approved, all repos will be tagged
>> > 1.5.1 for consistency.
>> >
>> > Please review and vote. The vote will be open for at least 96 hours
>> > and closes on Tuesday 14 May 2024, 20:00:00 CEST.
>> >
>> > [ ] +1 Approve
>> > [ ] +0 No opinion
>> > [ ] -1 Disapprove (and the reason why)
>> >
>> >
>> > Thank you,
>> > Peter
>> >
>>
>


[jira] [Created] (YUNIKORN-2629) Adding a node can result in a deadlock

2024-05-16 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2629:
--

 Summary: Adding a node can result in a deadlock
 Key: YUNIKORN-2629
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2629
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: shim - kubernetes
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Adding a new node after YuniKorn state initialization can result in a deadlock.

The problem is that {{Context.addNode()}} holds a lock while we're waiting for 
the {{NodeAccepted}} event:
{noformat}
dispatcher.RegisterEventHandler(handlerID, dispatcher.EventTypeNode, func(event interface{}) {
    nodeEvent, ok := event.(CachedSchedulerNodeEvent)
    if !ok {
        return
    }
    [...] removed for clarity
    wg.Done()
})
defer dispatcher.UnregisterEventHandler(handlerID, dispatcher.EventTypeNode)
api := ctx.apiProvider.GetAPIs().SchedulerAPI
if err := api.UpdateNode(&si.NodeRequest{
    Nodes: nodesToRegister,
    RmID:  schedulerconf.GetSchedulerConf().ClusterID,
}); err != nil {
    log.Log(log.ShimContext).Error("Failed to register nodes", zap.Error(err))
    return nil, err
}

// wait for all responses to accumulate
wg.Wait()  <--- shim gets stuck here
{noformat}
If tasks are being processed, then the dispatcher will try to retrieve the 
event handler, which is returned from Context:
{noformat}
go func() {
    for {
        select {
        case event := <-getDispatcher().eventChan:
            switch v := event.(type) {
            case events.TaskEvent:
                getEventHandler(EventTypeTask)(v)  <--- eventually calls Context.getTask()
            case events.ApplicationEvent:
                getEventHandler(EventTypeApp)(v)
            case events.SchedulerNodeEvent:
                getEventHandler(EventTypeNode)(v)
{noformat}

Since {{addNode()}} is holding a write lock, the event processing loop gets 
stuck.
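
The pattern can be reproduced in isolation. The following is a minimal sketch, not the shim code: nodeContext, hasNode, addNode and run are hypothetical names, the event loop is reduced to a single goroutine, and a timeout stands in for "stuck forever". It also shows one possible way out (releasing the write lock before waiting), which is an assumption and not necessarily the fix chosen for this issue:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nodeContext is a hypothetical stand-in for the shim Context.
type nodeContext struct {
	lock  sync.RWMutex
	nodes map[string]bool
}

// hasNode mimics a lookup made from the event loop; it needs the read lock.
func (c *nodeContext) hasNode(name string) bool {
	c.lock.RLock()
	defer c.lock.RUnlock()
	return c.nodes[name]
}

// addNode registers a node and waits for the "accepted" signal.
// With holdLock set, the write lock is kept while waiting (the reported pattern).
func (c *nodeContext) addNode(name string, events chan<- string, accepted <-chan struct{}, holdLock bool) {
	c.lock.Lock()
	c.nodes[name] = true
	if holdLock {
		defer c.lock.Unlock()
	} else {
		c.lock.Unlock()
	}
	events <- name
	<-accepted // wait for the event loop to confirm the node
}

// run reports whether addNode finished before the timeout.
func run(holdLock bool) bool {
	ctx := &nodeContext{nodes: make(map[string]bool)}
	events := make(chan string, 1)
	accepted := make(chan struct{})

	// Simplified event loop: needs the read lock before it can confirm.
	go func() {
		name := <-events
		if ctx.hasNode(name) {
			close(accepted)
		}
	}()

	done := make(chan struct{})
	go func() {
		ctx.addNode("node-1", events, accepted, holdLock)
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("write lock held while waiting, completed:", run(true))
	fmt.Println("write lock released before waiting, completed:", run(false))
}
```

In the first case the event loop blocks on the read lock and the wait never ends. In the second case the lookup succeeds and the wait completes.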






[jira] [Resolved] (YUNIKORN-2612) Tagging for 1.5.1

2024-05-16 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2612.

Fix Version/s: 1.5.1
   Resolution: Fixed

> Tagging for 1.5.1
> -
>
> Key: YUNIKORN-2612
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2612
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: release
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 1.5.1
>
>
> Tagging for updating dependencies (SI/core/k8shim).
> No branching is needed because we'll deliver the release from branch-1.5 
> directly as we did with incubator minor releases.






[jira] [Resolved] (YUNIKORN-2602) Fix spelling/grammar in configvalidator

2024-05-16 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved YUNIKORN-2602.
--
Fix Version/s: 1.6.0
   Resolution: Fixed

> Fix spelling/grammar in configvalidator
> ---
>
> Key: YUNIKORN-2602
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2602
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - common
>Reporter: Peter Bacsko
>Assignee: Yun Sun
>Priority: Trivial
>  Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
>
> Let's fix some minor grammar issues in configvalidator.go.
> E.g. "existed" -> "existing", but there could be other mistakes.






Re: [VOTE] Release Apache YuniKorn 1.5.1 RC1

2024-05-16 Thread Peter Bacsko
+1 binding

- Built images from source (amd64) on Ubuntu 22.04
- Ran make test && make image
- Ran it on a local cluster
- Checked some REST API endpoints
- Ran sample jobs

Thank you all for voting on RC1 for 1.5.1.

Voting for the release has passed with:
5 binding +1
3 non-binding +1

No 0 or -1 votes.

As the next step, I'll publish the release and images and update the website.
After that is done, I will send an announcement email.

Thank you,
Peter


On Wed, May 15, 2024 at 4:45 PM Manikandan R wrote:

> +1 (Binding)
>
> - Built images from source on Mac M1 MacOS Monterey (arm64) with go 1.21.8
> - Verified the signatures
> - Verified the licences and checksums
> - Ran the scheduler with a local kind cluster (version 1.29.0)
> - Ran simple sleep jobs
> - Verified REST API outputs and the Web UI
>
> Thanks,
> Mani
>
> On Tue, May 14, 2024 at 9:41 PM Desai, Mit wrote:
>
> > +1 (non-binding)
> >
> >
> >   *   Built release on MacOS Sonoma (arm64)
> >   *   Installed locally on Kind Cluster (1.28)
> >   *   Successfully ran make test
> >   *   Ran sample sleep jobs
> >
> > Thank you, Peter, for your efforts in driving the release.
> >
> > - Mit Desai
> >
> > From: Peter Bacsko 
> > Date: Friday, May 10, 2024 at 1:41 AM
> > To: dev@yunikorn.apache.org 
> > Subject: [VOTE] Release Apache YuniKorn 1.5.1 RC1
> > Hello everyone,
> >
> > I would like to call a vote for releasing Apache YuniKorn 1.5.1 RC1.
> > This is a minor release which contains only bugfixes.
> >
> > The release artefacts have been uploaded here:
> >
> > https://dist.apache.org/repos/dist/dev/yunikorn/1.5.1-RC1/
> >
> > My public key is located in the KEYS file:
> >
> > https://downloads.apache.org//yunikorn/KEYS
> >
> > JIRA issues that have been resolved in this release:
> >
> > https://issues.apache.org/jira/issues/?filter=12353383
> >
> > The release solves a deadlock issue. If possible, test YuniKorn with
> > workloads that put it under stress (i.e. thousands or tens of thousands
> > of pods).
> >
> > Git tags for each component are as follows:
> > yunikorn-scheduler-interface: v1.5.1-1
> > yunikorn-core: v1.5.1-1
> > yunikorn-k8shim: v1.5.1-1
> > yunikorn-web: v1.5.1-1
> > yunikorn-release: v1.5.1-1
> >
> > Once the release is voted on and approved, all repos will be tagged
> > 1.5.1 for consistency.
> >
> > Please review and vote. The vote will be open for at least 96 hours
> > and closes on Tuesday 14 May 2024, 20:00:00 CEST.
> >
> > [ ] +1 Approve
> > [ ] +0 No opinion
> > [ ] -1 Disapprove (and the reason why)
> >
> >
> > Thank you,
> > Peter
> >
>


[jira] [Resolved] (YUNIKORN-2627) Add K8s 1.30 to the e2e matrix

2024-05-16 Thread Wilfred Spiegelenburg (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2627.
-
Fix Version/s: 1.6.0
   Resolution: Fixed

Upgraded kind to version 0.23 and added 1.30 as a new version to test with.

> Add K8s 1.30 to the e2e matrix
> --
>
> Key: YUNIKORN-2627
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2627
> Project: Apache YuniKorn
>  Issue Type: Improvement
>Reporter: Wilfred Spiegelenburg
>Assignee: Tseng Hsi-Huang
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 1.6.0
>
>
> k8s 1.30 support in kind is now available as part of the [0.23 release|https://github.com/kubernetes-sigs/kind/releases/tag/v0.23.0].
> We need to add 1.30 to the matrix for the next release.


