I'd like to start the release process. The following items will be delivered as part of 1.5.1: https://issues.apache.org/jira/issues/?filter=12353383
No features in this release, only bugfixes. No further items are considered (unless something critical is found).

Planned schedule:
RC1 out: 10th May
Voting: from 10th May to early next week, 13th-14th May
Release: 15-16th May

Thanks,
Peter

On Thu, May 2, 2024 at 2:11 AM Shravan Achar <shravan.ac...@apple.com.invalid> wrote:

> Have been helping Peter with YUNIKORN-2526, and it has been a tricky
> problem to reproduce and resolve. It makes sense to continue to make
> progress on it without blocking the 1.5.1 patch release as it has
> considerable fixes already (re: deadlock)
>
> Shravan
>
> On 2024/04/29 15:20:27 Peter Bacsko wrote:
> > Hey Wilfred,
> >
> > Yes, I'm taking the role of release manager.
> > I cherry-picked YUNIKORN-2520 to branch-1.5.
> >
> > Regarding the remaining JIRAs, I asked PoAn Yang on Slack to take a look at
> > YUNIKORN-2057 as he originally volunteered to solve it. I told him that it
> > was not urgent, but depending on how quickly he makes progress, we might
> > re-consider our position later.
> >
> > Peter
> >
> > On Mon, Apr 29, 2024 at 5:00 AM Wilfred Spiegelenburg <wi...@apache.org>
> > wrote:
> >
> > > Peter,
> > >
> > > Thank you for starting this discussion. See inline for further comments.
> > >
> > > > Hi all,
> > > >
> > > > Due to the number of problems that we have discovered since the release of
> > > > 1.5.0, I believe it makes sense to create a new Yunikorn release which
> > > > consists of bug fixes only. If I'm not mistaken we haven't done this before
> > > > (at least since leaving the ASF incubator), so this would be the first
> > > > minor Yunikorn release.
> > >
> > > +1
> > > I am totally for releasing YuniKorn 1.5.1 with the lock fixes.
> > > Looking at all the work you have done for this release: would you be
> > > willing to also step up as a release manager for the 1.5.1 release?
> > >
> > > > There are a bunch of fixes that are already on branch-1.5:
> > > >
> > > > - YUNIKORN-2521 Scheduler deadlock (resolved indirectly by YUNIKORN-2544)
> > > > - YUNIKORN-2539 Add optional deadlock detection
> > > > - YUNIKORN-2544 [UMBRELLA] Fix Yunikorn potential locking issues
> > > > - YUNIKORN-2543 Fix locking in RMProxy
> > > > - YUNIKORN-2545 Eliminate multiple lock calls from Queue
> > > > - YUNIKORN-2548 Potential deadlock during concurrent
> > > >   bottom-up/top-down queue traversal
> > > > - YUNIKORN-2550 Fix locking in PartitionContext
> > > > - YUNIKORN-2552 Recursive locking when sending remove queue event
> > > > - YUNIKORN-2553 [core] Enable deadlock detection during unit tests
> > > > - YUNIKORN-2563 [shim] Enable deadlock detection during unit tests
> > > > - YUNIKORN-2574 totalPartitionResource should not be mutated with
> > > >   AddTo/SubFrom
> > > > - YUNIKORN-2562 Nil pointer panic in Application.ReplaceAllocation()
> > >
> > > Yes for all the above.
> > >
> > > > The following is In Progress for 1.5.1:
> > > >
> > > > - YUNIKORN-2526 Discrepancy between shim cache and core app/task list
> > > >   after scheduler restart
> > >
> > > This would be a good one to get in if we have some progress on this.
> > > Do we understand what is going on yet? I looked at the jira and am not
> > > sure if we understand the root cause.
> > >
> > > > Candidates:
> > > >
> > > > - YUNIKORN-2520 PVC errors in AssumePod() are not handled properly -
> > > >   Resolved, only cherry-picking is needed
> > >
> > > Yes, this could be added.
> > >
> > > I also think we need to check if we have any CVE fixes that need to be
> > > added.
> > > Quick check shows these two:
> > > * golang.org/x/net 0.23 (CVE-2023-45288 or GO-2024-2687 via YUNIKORN-2541)
> > > * google.golang.org/protobuf to v1.33.0 (CVE-2024-24786 via YUNIKORN-2469)
> > > * build with golang 1.21.9
> > >
> > > To satisfy the scanners, although we are not affected:
> > > * K8s 1.29.4 (CVE-2024-3177)
> > >
> > > > - YUNIKORN-2057 FindQueueByAppID is slow - Critical priority, "In
> > > >   progress" since Oct 2023
> > > > - YUNIKORN-1089 Application handling with invalid task group annotations
> > > >   - Critical priority, no progress
> > > > - YUNIKORN-1988 Preemption happens when a queue lower than its
> > > >   guaranteed capacity - Critical priority, "In progress" since Sep 2023
> > >
> > > No for the last 3 mentioned. We did not block the 1.5.0 release on
> > > these and they have not made enough progress since then.
> > > I would not consider them as a possible candidate for 1.5.1
> > >
> > > Wilfred
> > >
> > > > Thoughts, opinions? What should be the scope of 1.5.1?
> > > >
> > > > Thanks,
> > > > Peter