[jira] [Resolved] (YUNIKORN-2738) Only check failure reason once not for every pod
[ https://issues.apache.org/jira/browse/YUNIKORN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2738. - Fix Version/s: 1.6.0 Resolution: Fixed > Only check failure reason once not for every pod > > > Key: YUNIKORN-2738 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2738 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > The reason for an application failure does not change and can be pre-calculated for all pods when a failure is handled. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2738) Only check failure reason once not for every pod
Wilfred Spiegelenburg created YUNIKORN-2738: --- Summary: Only check failure reason once not for every pod Key: YUNIKORN-2738 URL: https://issues.apache.org/jira/browse/YUNIKORN-2738 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The reason for an application failure does not change and can be pre-calculated for all pods when a failure is handled.
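A minimal Go sketch of the idea, assuming the failure reason is derived from substrings of the error message sent by the core. The reason strings and the helpers `failureReason` and `failUnassignedPods` are hypothetical, not the shim's actual constants or functions; the point is only that the string checks run once, outside the per-pod loop.

```go
package main

import "strings"

// Hypothetical reason strings; the real constants in the k8shim may differ.
const (
	reasonInsufficientResources = "ResourceReservationTimeout"
	reasonRejected              = "ApplicationRejected"
	reasonDefault               = "ApplicationFailed"
)

// failureReason inspects the error message a single time.
func failureReason(errMsg string) string {
	switch {
	case strings.Contains(errMsg, reasonInsufficientResources):
		return reasonInsufficientResources
	case strings.Contains(errMsg, reasonRejected):
		return reasonRejected
	default:
		return reasonDefault
	}
}

// failUnassignedPods derives the reason once and reuses it for every pod,
// instead of repeating the string checks inside the loop.
func failUnassignedPods(errMsg string, podNames []string) map[string]string {
	reason := failureReason(errMsg)
	failed := make(map[string]string, len(podNames))
	for _, name := range podNames {
		failed[name] = reason
	}
	return failed
}
```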
[jira] [Created] (YUNIKORN-2737) Cleanup handleFailApplicationEvent handling
Wilfred Spiegelenburg created YUNIKORN-2737: --- Summary: Cleanup handleFailApplicationEvent handling Key: YUNIKORN-2737 URL: https://issues.apache.org/jira/browse/YUNIKORN-2737 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg When we handle a failed application in the shim in {{handleFailApplicationEvent()}} we call the placeholder cleanup. Three issues: * The cleanup needs the app lock after it takes the manager lock. The app lock is already held when we process the event, so the cleanup should be placed last to avoid holding the manager lock longer than needed. * Failing an application is triggered by the core, which should already do the cleanup, so this call might be redundant to start with. * The failure handling also marks unassigned pods as failed, which means there is an overlap between the failure handling and the placeholder cleanup that we should remove. Either ignore all placeholders in the failure handling or only clean up assigned placeholders.
[jira] [Resolved] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2734. - Fix Version/s: 1.6.0 Resolution: Delivered The TODO was removed as part of the changes in YUNIKORN-2729. Since we do not want to make this configurable, that is all we need; closing again with a link to the Jira that has the change. > make configurable for pods in k8shim pkg/client/kubeclient.go > - > > Key: YUNIKORN-2734 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2734 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Huang Guan Hao >Priority: Trivial > Labels: newbie > Fix For: 1.6.0 > > > Remove the //TODO comment in pkg/client/kubeclient.go > https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141 > Make the grace period for pod deletion configurable. Currently, the grace period for deleting pods is hardcoded to 3 seconds. This might not be suitable for all use cases, as some pods might require more time to gracefully shut down. In the future, this value should be made configurable, either through a function parameter, configuration file, or environment variable, to provide more flexibility and accommodate different scenarios.
[jira] [Reopened] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg reopened YUNIKORN-2734: - > make configurable for pods in k8shim pkg/client/kubeclient.go > - > > Key: YUNIKORN-2734 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2734 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Huang Guan Hao >Priority: Minor > > Remove the //TODO comment in pkg/client/kubeclient.go > https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141 > Make the grace period for pod deletion configurable. Currently, the grace period for deleting pods is hardcoded to 3 seconds. This might not be suitable for all use cases, as some pods might require more time to gracefully shut down. In the future, this value should be made configurable, either through a function parameter, configuration file, or environment variable, to provide more flexibility and accommodate different scenarios.
[jira] [Resolved] (YUNIKORN-2705) DOAP is malformed
[ https://issues.apache.org/jira/browse/YUNIKORN-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2705. - Fix Version/s: 1.5.2 Resolution: Fixed Committed the change. The online validator can be found [here|https://www.w3.org/RDF/Validator/]. The validator fails before the PR is applied and passes after. The format is defined [here|https://www.w3.org/RDF/]; we could leverage the Python libraries in our release tool to check the content, or even automate the update of the DOAP file. Leaving that open for others to pursue if considered relevant. > DOAP is malformed > - > > Key: YUNIKORN-2705 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2705 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Arnout Engelen >Assignee: Arnout Engelen >Priority: Major > Fix For: 1.5.2 > > > You cannot have multiple 'Version' nodes under one 'release' property in RDF/XML.
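For an automated check of the rule, a hedged Go sketch using only `encoding/xml`. It assumes the DOAP file nests `Version` nodes inside `release` properties under a `Project` node, matches on local element names only (so any namespace prefix is accepted), and flags any `release` that does not hold exactly one `Version`. This is not the release tool, which the comment notes is Python based; it only illustrates the rule.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Field tags carry no namespace, so elements match on their local name only.
type doapVersion struct {
	Revision string `xml:"revision"`
}

type doapRelease struct {
	Versions []doapVersion `xml:"Version"`
}

type doapProject struct {
	Releases []doapRelease `xml:"release"`
}

// rdfDoc has no XMLName field, so the rdf:RDF root element is accepted as-is.
type rdfDoc struct {
	Projects []doapProject `xml:"Project"`
}

// checkReleases returns an error if any release property holds zero or more than one Version node.
func checkReleases(data []byte) error {
	var doc rdfDoc
	if err := xml.Unmarshal(data, &doc); err != nil {
		return fmt.Errorf("parse DOAP: %w", err)
	}
	for p, project := range doc.Projects {
		for r, rel := range project.Releases {
			if n := len(rel.Versions); n != 1 {
				return fmt.Errorf("project %d release %d: expected 1 Version node, found %d", p, r, n)
			}
		}
	}
	return nil
}
```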
[jira] [Resolved] (YUNIKORN-2665) Gang app originator pod changes after restart
[ https://issues.apache.org/jira/browse/YUNIKORN-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2665. - Fix Version/s: 1.6.0 1.5.2 Resolution: Fixed Changes have been committed and backported into the 1.5 branch, closing. > Gang app originator pod changes after restart > - > > Key: YUNIKORN-2665 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2665 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1 >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Critical > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > > > A gang app chooses the first pod (the one that created the app) as the originator pod, which later becomes the real driver pod. While a gang app is being processed, specifically after placeholder creation and during replacement, a restart can lead to the incorrect behaviour described below: During restore, there is no guarantee on the ordering of pods coming from the K8s lister, especially when all the pods were created with the same second timestamp. K8s uses second-based timestamps, which means all pods created within the same second have the same timestamp. In this situation, whichever pod comes first from the lister is designated by YK as the originator pod. So any placeholder could become the originator pod and the actual originator pod is lost. This change could cause rippling effects leading to weird behaviour and needs to be fixed.
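One possible way to make the choice independent of lister order, sketched in Go under stated assumptions: `podInfo` is a hypothetical trimmed-down pod, and placeholders are assumed to be recognisable (for example by a label or annotation). Placeholders are never eligible, so a placeholder can no longer become the originator. This is an illustration, not the committed fix.

```go
package main

import "sort"

// podInfo is a hypothetical, trimmed-down view of a pod as seen during recovery.
type podInfo struct {
	name          string
	createdSecond int64 // K8s creation timestamps have one-second resolution
	placeholder   bool  // assumption: placeholders can be recognised, e.g. by label or annotation
}

// pickOriginator chooses the earliest non-placeholder pod. Placeholders are never
// eligible, so a placeholder that happens to come first from the lister cannot
// be picked. It returns false when only placeholders are present.
func pickOriginator(pods []podInfo) (string, bool) {
	candidates := make([]podInfo, 0, len(pods))
	for _, p := range pods {
		if !p.placeholder {
			candidates = append(candidates, p)
		}
	}
	if len(candidates) == 0 {
		return "", false
	}
	// A stable sort keeps the lister order only among real pods with equal timestamps.
	sort.SliceStable(candidates, func(i, j int) bool {
		return candidates[i].createdSecond < candidates[j].createdSecond
	})
	return candidates[0].name, true
}
```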
[jira] [Created] (YUNIKORN-2672) Upgrade to K8s 1.29.6
Wilfred Spiegelenburg created YUNIKORN-2672: --- Summary: Upgrade to K8s 1.29.6 Key: YUNIKORN-2672 URL: https://issues.apache.org/jira/browse/YUNIKORN-2672 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg A major performance regression was fixed in K8s; on analysis it mainly impacts the plugin implementation. The regression is part of release 1.29.4, which we currently build against. See [https://github.com/kubernetes/kubernetes/pull/125197] for details.
[jira] [Created] (YUNIKORN-2655) Cleanup REST API documentation
Wilfred Spiegelenburg created YUNIKORN-2655: --- Summary: Cleanup REST API documentation Key: YUNIKORN-2655 URL: https://issues.apache.org/jira/browse/YUNIKORN-2655 Project: Apache YuniKorn Issue Type: Task Components: documentation Reporter: Wilfred Spiegelenburg The REST API documentation is not up to date with the current behaviour: it does not show any 400 or 404 errors returned by a number of API calls. The error response only shows a 500 code with the same message for each call. We should move to a simple list for each call showing the applicable errors, like this: {code:java} ### Error responses **Code** : `400 Bad Request` (URL query is invalid, missing partition name) **Code** : `404 Not Found` (Partition not found) **Code** : `500 Internal Server Error` {code} Remove the error examples as they do not add any required detail.
[jira] [Created] (YUNIKORN-2654) Remove unused code in k8shim context
Wilfred Spiegelenburg created YUNIKORN-2654: --- Summary: Remove unused code in k8shim context Key: YUNIKORN-2654 URL: https://issues.apache.org/jira/browse/YUNIKORN-2654 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg The NotifyApplicationComplete and NotifyApplicationFail functions are not called by anything and are unused code. The K8shim does not trigger application completion or failure; this is triggered by the core when the application no longer has any activity registered.
[jira] [Created] (YUNIKORN-2653) Gang scheduling K8s event formatting compliance
Wilfred Spiegelenburg created YUNIKORN-2653: --- Summary: Gang scheduling K8s event formatting compliance Key: YUNIKORN-2653 URL: https://issues.apache.org/jira/browse/YUNIKORN-2653 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The K8s events provide definitions and rules around the content of the fields within the event. Adjust the content of gang scheduling related events to comply with the rules, focusing on the reason and action fields only. * 'reason' is the reason this event is generated. 'reason' should be short and unique; it should be in UpperCamelCase format (starting with a capital letter). * 'action' explains what happened with regarding/ what action did the ReportingController take in objects name; it should be in UpperCamelCase format (starting with a capital letter). No space or long text.
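A small Go sketch of a check for the format rules quoted above, which a unit test could run over the gang scheduling event reasons and actions. `isUpperCamelCase` is a hypothetical helper; it accepts letters and digits only and requires an upper-case first letter, which is one reading of the K8s rules.

```go
package main

import "unicode"

// isUpperCamelCase reports whether s is non-empty, starts with an upper-case
// letter and contains only letters and digits (no spaces or punctuation).
func isUpperCamelCase(s string) bool {
	for i, r := range s {
		if i == 0 && !unicode.IsUpper(r) {
			return false
		}
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			return false
		}
	}
	return s != ""
}
```

Example values such as `GangScheduling` below are illustrative, not the actual event reasons.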
[jira] [Created] (YUNIKORN-2648) Add deadlock detection config to the configmap
Wilfred Spiegelenburg created YUNIKORN-2648: --- Summary: Add deadlock detection config to the configmap Key: YUNIKORN-2648 URL: https://issues.apache.org/jira/browse/YUNIKORN-2648 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg The current deadlock detection is configured using environment variables. That requires a change of the image and a restart of the scheduler to take effect, and is not easy to maintain. We should use the yunikorn-defaults configmap for the settings. We want a default set, turned off, for production use cases. However, making the configs loadable from the configmap makes turning detection on easier: update the configmap and restart the scheduler to turn the detection on or off.
[jira] [Created] (YUNIKORN-2647) Flaky test TestUpdateNodeCapacity
Wilfred Spiegelenburg created YUNIKORN-2647: --- Summary: Flaky test TestUpdateNodeCapacity Key: YUNIKORN-2647 URL: https://issues.apache.org/jira/browse/YUNIKORN-2647 Project: Apache YuniKorn Issue Type: Bug Components: test - unit Reporter: Wilfred Spiegelenburg Same as we saw in YUNIKORN-2573, the single node update test might fail: {code:java} --- FAIL: TestUpdateNodeCapacity (0.03s) operation_test.go:446: Expected partition resource map[memory:1 vcore:2], doesn't match with actual partition resource map[memory:1 vcore:2]{code} When updating node capacity we calculate the delta resources and use that delta to update the partition resources. As with multiple nodes, the test fails when the calls happen in the following order: node.SetCapacity() -> waitForAvailableNodeResource() -> partitionInfo.GetTotalPartitionResource() -> partition.updatePartitionResource()
[jira] [Created] (YUNIKORN-2638) Simplify finalizeNodes and finalizePods
Wilfred Spiegelenburg created YUNIKORN-2638: --- Summary: Simplify finalizeNodes and finalizePods Key: YUNIKORN-2638 URL: https://issues.apache.org/jira/browse/YUNIKORN-2638 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg In finalizeNodes and finalizePods a map is created to store the newly retrieved nodes and pods. The map is only used as a reference; the pod and node objects themselves are not used. Instead of storing the objects, the maps could store a boolean value. This also simplifies the later existence check for the node or pod to a single map lookup. We should also set the size of the map to the length of the retrieved node or pod list, to prevent any re-allocation while the map is filled.
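A minimal Go sketch of the proposed shape, assuming the keys are plain name strings (the real code may key on UID instead): the map stores a `bool`, is sized from the retrieved list, and the existence check is a single lookup. Both function names are hypothetical.

```go
package main

// knownNames builds a set from the freshly retrieved list. The size hint means
// the map does not need to grow while it is being filled, and storing a bool
// instead of the object keeps only what the later check needs.
func knownNames(names []string) map[string]bool {
	seen := make(map[string]bool, len(names))
	for _, n := range names {
		seen[n] = true
	}
	return seen
}

// staleNames returns the cached names that are no longer in the fresh listing.
// A missing key reads as false, so the check is a single map lookup.
func staleNames(cached []string, fresh map[string]bool) []string {
	var stale []string
	for _, n := range cached {
		if !fresh[n] {
			stale = append(stale, n)
		}
	}
	return stale
}
```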
[jira] [Created] (YUNIKORN-2637) finalizePods should ignore pods like registerPods does
Wilfred Spiegelenburg created YUNIKORN-2637: --- Summary: finalizePods should ignore pods like registerPods does Key: YUNIKORN-2637 URL: https://issues.apache.org/jira/browse/YUNIKORN-2637 Project: Apache YuniKorn Issue Type: Bug Components: shim - kubernetes Reporter: Wilfred Spiegelenburg The initialisation code is a two-step process for pods. The first step lists all pods and adds them to the system in registerPods(), which returns a list of the pods processed. The second step happens after the event handlers are turned on and nodes have been cleaned up etc. During the second step the pods from the first step are checked and removed. However, pods that were already in a terminated state in step one get removed again. Although the step should be idempotent, this is unneeded. When iterating over the existing pods, any pod in a terminal state should be skipped.
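A Go sketch of the skip, assuming the terminal check is the usual Kubernetes one (phase `Succeeded` or `Failed`). The local `podPhase` constants mirror `v1.PodSucceeded` and `v1.PodFailed` from `k8s.io/api/core/v1` only to keep the example self-contained; `pod` and `podsToCheck` are hypothetical names.

```go
package main

// podPhase mirrors the Kubernetes PodPhase values; the real code would use the
// constants from k8s.io/api/core/v1 instead.
type podPhase string

const (
	phaseRunning   podPhase = "Running"
	phaseSucceeded podPhase = "Succeeded"
	phaseFailed    podPhase = "Failed"
)

// pod is a hypothetical, trimmed-down pod.
type pod struct {
	uid   string
	phase podPhase
}

func isTerminated(p pod) bool {
	return p.phase == phaseSucceeded || p.phase == phaseFailed
}

// podsToCheck returns the pods from registerPods() that finalizePods still needs
// to look at; pods that were already terminal in step one are skipped.
func podsToCheck(registered []pod) []pod {
	out := make([]pod, 0, len(registered))
	for _, p := range registered {
		if isTerminated(p) {
			continue
		}
		out = append(out, p)
	}
	return out
}
```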
[jira] [Created] (YUNIKORN-2630) Release context lock in shim when processing config in the core
Wilfred Spiegelenburg created YUNIKORN-2630: --- Summary: Release context lock in shim when processing config in the core Key: YUNIKORN-2630 URL: https://issues.apache.org/jira/browse/YUNIKORN-2630 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg When a change comes in for the configmaps we process the change under a context lock, as we need to merge the two configmaps. We keep this lock even after all the work in the shim is done and processing has been transferred to the core. This is unneeded, as the core has its own locking and serialisation of the changes.
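A minimal Go sketch of the proposed lock scope. `shimContext` and the `sendToCore` callback are hypothetical stand-ins, and letting the override configmap win on duplicate keys is an illustrative assumption; the point is that the lock covers only the merge and is released before the core is called.

```go
package main

import "sync"

// shimContext is a hypothetical stand-in for the k8shim context.
type shimContext struct {
	lock   sync.Mutex
	config map[string]string
}

// updateConfig merges the two configmaps under the context lock, then releases
// the lock before handing the result to the core, which has its own locking and
// serialisation. sendToCore is assumed to treat the map as read-only.
func (c *shimContext) updateConfig(defaults, override map[string]string, sendToCore func(map[string]string) error) error {
	c.lock.Lock()
	merged := make(map[string]string, len(defaults)+len(override))
	for k, v := range defaults {
		merged[k] = v
	}
	for k, v := range override {
		merged[k] = v // assumption: the override configmap wins on duplicate keys
	}
	c.config = merged
	c.lock.Unlock()

	return sendToCore(merged)
}
```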
[jira] [Resolved] (YUNIKORN-2628) fix release announcement links
[ https://issues.apache.org/jira/browse/YUNIKORN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2628. - Fix Version/s: 1.6.0 Resolution: Fixed links are fixed after removing the {{..}} from the path > fix release announcement links > -- > > Key: YUNIKORN-2628 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2628 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > Labels: pull-request-available > Fix For: 1.6.0 > > > In YUNIKORN-2595 a regression snuck in breaking the links to the release > announcements. > Need to reverse that path change for the release announcements.
[jira] [Resolved] (YUNIKORN-2627) Add K8s 1.30 to the e2e matrix
[ https://issues.apache.org/jira/browse/YUNIKORN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2627. - Fix Version/s: 1.6.0 Resolution: Fixed Upgraded kind to version 0.23 and added 1.30 as a new version to test with. > Add K8s 1.30 to the e2e matrix > -- > > Key: YUNIKORN-2627 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2627 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Wilfred Spiegelenburg >Assignee: Tseng Hsi-Huang >Priority: Major > Labels: newbie, pull-request-available > Fix For: 1.6.0 > > > K8s 1.30 support in kind is now available as part of the [0.23 release|https://github.com/kubernetes-sigs/kind/releases/tag/v0.23.0]. Need to add 1.30 to the matrix for the next release.
[jira] [Created] (YUNIKORN-2628) fix release announcement links
Wilfred Spiegelenburg created YUNIKORN-2628: --- Summary: fix release announcement links Key: YUNIKORN-2628 URL: https://issues.apache.org/jira/browse/YUNIKORN-2628 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg In YUNIKORN-2596 a regression snuck in breaking the links to the release announcements. Need to reverse that path change for the release announcements.
[jira] [Created] (YUNIKORN-2627) Add K8s 1.30 to the e2e matrix
Wilfred Spiegelenburg created YUNIKORN-2627: --- Summary: Add K8s 1.30 to the e2e matrix Key: YUNIKORN-2627 URL: https://issues.apache.org/jira/browse/YUNIKORN-2627 Project: Apache YuniKorn Issue Type: Improvement Reporter: Wilfred Spiegelenburg k8s 1.30 support in kind is now available as part of the [0.23 release|https://github.com/kubernetes-sigs/kind/releases/tag/v0.23.0] Need to add 1.30 to the matrix for the next release
[jira] [Resolved] (YUNIKORN-2531) Create unit tests for AsyncRMCallback
[ https://issues.apache.org/jira/browse/YUNIKORN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2531. - Fix Version/s: 1.6.0 Resolution: Fixed new tests added to the system to improve coverage > Create unit tests for AsyncRMCallback > - > > Key: YUNIKORN-2531 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2531 > Project: Apache YuniKorn > Issue Type: Test > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > There are no unit tests for the {{AsyncRMCallback}} type in the shim > (scheduler_callback.go). It's tested indirectly but we have no idea about the > coverage or how it behaves in rare scenarios. > At least longer methods such as {{UpdateApplication()}}, > {{UpdateAllocation()}} and {{UpdateNode()}} should be covered.
[jira] [Resolved] (YUNIKORN-2615) Remove named returns from predicate_manager.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2615. - Fix Version/s: 1.6.0 Resolution: Fixed refactor committed to master for 1.6.0 > Remove named returns from predicate_manager.go > -- > > Key: YUNIKORN-2615 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2615 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > Predicate manager has defined named returns on some functions but does not > use them. They should be removed as the way they are used can cause issues > that are hard to debug. >
[jira] [Created] (YUNIKORN-2618) Streamline AsyncRMCallback UpdateAllocation
Wilfred Spiegelenburg created YUNIKORN-2618: --- Summary: Streamline AsyncRMCallback UpdateAllocation Key: YUNIKORN-2618 URL: https://issues.apache.org/jira/browse/YUNIKORN-2618 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg If the task is not found, {{context.getTask}} returns nil; for {{response.New}} processing we should just log that fact and proceed to the next allocation. This simplifies the flow, as we never need to check for a nil task. We should never have a pod in the cache that does not exist as a task on an application. We retrieve the application using the application ID from the response but never use the object: we only use the application ID to pass into an event, and the context event handler then does the exact same lookup again to process the event on the app. We need to become much smarter in this area: there are double or triple lookups, and async events are generated that just change the state of the app or task or kick off another event.
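A hedged Go sketch of the simplified loop for new allocations. `allocation`, `task`, `getTask` and `bind` are hypothetical stand-ins for the shim types and calls; a missing task is logged and skipped, so the rest of the loop never has to deal with a nil task.

```go
package main

import "log"

// allocation and task are hypothetical, trimmed-down stand-ins for the shim types.
type allocation struct {
	applicationID string
	taskID        string
}

type task struct {
	id string
}

// processNewAllocations looks up the task for each new allocation. getTask is
// assumed to return nil for an unknown task; that case is logged and skipped.
// It returns the number of allocations handed to bind.
func processNewAllocations(allocs []allocation, getTask func(appID, taskID string) *task, bind func(*task)) int {
	processed := 0
	for _, a := range allocs {
		t := getTask(a.applicationID, a.taskID)
		if t == nil {
			log.Printf("task %s of application %s not found, skipping allocation", a.taskID, a.applicationID)
			continue
		}
		bind(t)
		processed++
	}
	return processed
}
```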
[jira] [Created] (YUNIKORN-2616) Remove unused bool return from PreemptionPredicates()
Wilfred Spiegelenburg created YUNIKORN-2616: --- Summary: Remove unused bool return from PreemptionPredicates() Key: YUNIKORN-2616 URL: https://issues.apache.org/jira/browse/YUNIKORN-2616 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg The predicate manager method {{PreemptionPredicates()}} returns two values: an int and a boolean. The boolean is false if the integer is -1 and true for 0 or larger. There is no need for the boolean, as the -1 already indicates the same.
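A Go sketch of the single-return shape, using a hypothetical `fitsAt` function rather than the real predicate code: the index is returned directly and -1 means no fit, so callers test `idx >= 0` where they previously checked the boolean.

```go
package main

// fitsAt returns the index of the first victim after whose removal the pod
// fits, or -1 if it never fits. The -1 sentinel carries the same information
// an extra bool return would, so no second return value is needed.
func fitsAt(freeAfterVictim []int, need int) int {
	for i, free := range freeAfterVictim {
		if free >= need {
			return i
		}
	}
	return -1
}
```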
[jira] [Created] (YUNIKORN-2615) Remove named returns from predicate_manager.go
Wilfred Spiegelenburg created YUNIKORN-2615: --- Summary: Remove named returns from predicate_manager.go Key: YUNIKORN-2615 URL: https://issues.apache.org/jira/browse/YUNIKORN-2615 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Predicate manager has defined named returns on some functions but does not use them. They should be removed as the way they are used can cause issues that are hard to debug.
[jira] [Resolved] (YUNIKORN-2601) Update kindest/node: v1.29.1 to v1.29.2, v1.28.6 to v1.28.7, v1.27.10 to v1.27.11, v1.26.13 -> v1.26.14
[ https://issues.apache.org/jira/browse/YUNIKORN-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2601. - Fix Version/s: 1.6.0 Resolution: Fixed Changes committed. No kind release for 1.30 is available yet; we should log a new Jira to add it later. > Update kindest/node: v1.29.1 to v1.29.2, v1.28.6 to v1.28.7, v1.27.10 to v1.27.11, v1.26.13 -> v1.26.14 > > > Key: YUNIKORN-2601 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2601 > Project: Apache YuniKorn > Issue Type: Improvement > Components: test - e2e >Reporter: Chia-Ping Tsai >Assignee: Hsien-Cheng(Ryan) Huang >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > as title
[jira] [Resolved] (YUNIKORN-2591) Document placement rules always
[ https://issues.apache.org/jira/browse/YUNIKORN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2591. - Fix Version/s: 1.5.1 1.5.0 1.4.0 Resolution: Fixed Change made to the docs going back to 1.4.0 and 1.5.0. Will also be part of the 1.5.1 release. > Document placement rules always > --- > > Key: YUNIKORN-2591 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2591 > Project: Apache YuniKorn > Issue Type: Improvement > Components: documentation >Reporter: Wilfred Spiegelenburg >Assignee: Hsien-Cheng(Ryan) Huang >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.1, 1.5.0, 1.4.0 > > > The current [doc says|https://yunikorn.apache.org/docs/user_guide/queue_config#placement-rules]: > {quote}If no rules are defined the placement manager is not started and each application _must_ have a queue set on submit. > {quote} > This is not correct; we moved to placement rules always in YUNIKORN-1793 in YuniKorn 1.4. The documentation needs to be updated to reflect that.
[jira] [Resolved] (YUNIKORN-2596) Enhance layout for release announcements
[ https://issues.apache.org/jira/browse/YUNIKORN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2596. - Fix Version/s: 1.5.1 Resolution: Fixed Fixed and published: changes applied to the 1.5.0 layout before the 1.5.1 release. Marking as fixed in 1.5.1 > Enhance layout for release announcements > > > Key: YUNIKORN-2596 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2596 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.5.1 > > Attachments: release_announce.png, releasee_announce_updated.png > > > The current release announcements page lacks a decent layout. The page is generated during the build based on the directory content. Some simple updates would make the page more readable.
[jira] [Resolved] (YUNIKORN-2595) Fix download page links
[ https://issues.apache.org/jira/browse/YUNIKORN-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2595. - Fix Version/s: 1.5.1 Resolution: Fixed Download page fixed for 1.5.0, deployed before the 1.5.1 release. Marking as fixed in 1.5.1 > Fix download page links > --- > > Key: YUNIKORN-2595 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2595 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.5.1 > > > The download links must follow a specific set of rules as specified [here|https://infra.apache.org/release-download-pages.html]. We currently do not set the correct download link for the source package: we dropped the closer.lua resolution for the content network in one of the releases. With the next release, 1.5.1, coming up we need to fix this.
[jira] [Created] (YUNIKORN-2595) Fix download page links
Wilfred Spiegelenburg created YUNIKORN-2595: --- Summary: Fix download page links Key: YUNIKORN-2595 URL: https://issues.apache.org/jira/browse/YUNIKORN-2595 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The download links must follow a specific set of rules as specified [here|https://infra.apache.org/release-download-pages.html]. We currently do not set the correct download link for the source package: we dropped the closer.lua resolution for the content network in one of the releases. With the next release, 1.5.1, coming up we need to fix this.
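A Go sketch of the link rules as we understand them from the ASF page: the artifact is linked through `closer.lua` so the download is resolved to the content network, while the checksum and signature come straight from `downloads.apache.org`. The `apache-yunikorn-<version>-src.tar.gz` file name and the `.sha512` and `.asc` suffixes are assumptions and should be checked against the actual release directory.

```go
package main

import "fmt"

type downloadLinks struct {
	Source    string
	Checksum  string
	Signature string
}

// releaseLinks builds the links for a source release. The artifact path and
// file naming are assumed for illustration.
func releaseLinks(version string) downloadLinks {
	file := fmt.Sprintf("yunikorn/%s/apache-yunikorn-%s-src.tar.gz", version, version)
	return downloadLinks{
		Source:    "https://www.apache.org/dyn/closer.lua/" + file,
		Checksum:  "https://downloads.apache.org/" + file + ".sha512",
		Signature: "https://downloads.apache.org/" + file + ".asc",
	}
}
```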
[jira] [Created] (YUNIKORN-2591) Document placement rules always
Wilfred Spiegelenburg created YUNIKORN-2591: --- Summary: Document placement rules always Key: YUNIKORN-2591 URL: https://issues.apache.org/jira/browse/YUNIKORN-2591 Project: Apache YuniKorn Issue Type: Improvement Components: documentation Reporter: Wilfred Spiegelenburg The current [doc says|https://yunikorn.apache.org/docs/user_guide/queue_config#placement-rules]: {quote}If no rules are defined the placement manager is not started and each application _must_ have a queue set on submit. {quote} This is not correct: we moved to always using placement rules in YUNIKORN-1793 (YuniKorn 1.4). The documentation needs to be updated to reflect that.
[jira] [Created] (YUNIKORN-2590) Handler tests should check for nil request on create
Wilfred Spiegelenburg created YUNIKORN-2590: --- Summary: Handler tests should check for nil request on create Key: YUNIKORN-2590 URL: https://issues.apache.org/jira/browse/YUNIKORN-2590 Project: Apache YuniKorn Issue Type: Improvement Components: core - common, test - unit Reporter: Wilfred Spiegelenburg In the handler_test.go file we have an anti-pattern that triggers a large number (40+) of warnings in an IDE: {quote}'req' might have 'nil' or other unexpected value as its corresponding error variable might be not 'nil' {quote} The warnings are due to the following pattern: {code:java} req, err = http.NewRequest("GET", "path", strings.NewReader("")) req = req.WithContext(context.WithValue(req.Context(), httprouter.ParamsKey, httprouter.Params{})){code} There is no error assertion after the request is created. We should insert a simple {{assert.NilError(t, err, "HTTP request create failed")}} between creating and using the request.
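A minimal sketch of the pattern the issue asks for, using only the standard library: the error from `http.NewRequest` is checked before the request is used. The `paramsKey` type is a hypothetical stand-in for `httprouter.ParamsKey`, and returning an error stands in for the `assert.NilError` call used in the real tests.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"strings"
)

// paramsKey is a hypothetical stand-in for httprouter.ParamsKey so that this
// sketch depends only on the standard library.
type paramsKey struct{}

// buildRequest checks the error from http.NewRequest before the request is
// dereferenced. In handler_test.go the check would be
// assert.NilError(t, err, "HTTP request create failed").
func buildRequest(method, target string) (*http.Request, error) {
	req, err := http.NewRequest(method, target, strings.NewReader(""))
	if err != nil {
		return nil, fmt.Errorf("HTTP request create failed: %w", err)
	}
	// Only reached with a non-nil req, so WithContext cannot panic.
	req = req.WithContext(context.WithValue(req.Context(), paramsKey{}, map[string]string{}))
	return req, nil
}

func main() {
	req, err := buildRequest(http.MethodGet, "/ws/v1/partitions")
	fmt.Println(req != nil, err == nil) // true true

	// A method containing a space is rejected by http.NewRequest.
	req, err = buildRequest("BAD METHOD", "/ws/v1/partitions")
	fmt.Println(req == nil, err != nil) // true true
}
```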
[jira] [Created] (YUNIKORN-2581) Expose running placement rules in REST
Wilfred Spiegelenburg created YUNIKORN-2581: --- Summary: Expose running placement rules in REST Key: YUNIKORN-2581 URL: https://issues.apache.org/jira/browse/YUNIKORN-2581 Project: Apache YuniKorn Issue Type: New Feature Components: core - common Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Since we introduced always using placement rules and the recovery rule, the queue config no longer correctly shows the running rules. Also, if a config update has been rejected for any reason, the rules shown would not be correct. Exposing the configured rules from the placement manager works around all these issues.
[jira] [Resolved] (YUNIKORN-2575) Make logging for IsPodFitNode clear
[ https://issues.apache.org/jira/browse/YUNIKORN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2575. - Fix Version/s: 1.6.0 Resolution: Fixed Unique errors are now returned for all failure cases; at DEBUG level they show exactly why the failure occurred. > Make logging for IsPodFitNode clear > --- > > Key: YUNIKORN-2575 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2575 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > The logging in {{IsPodFitNode()}} logs the same message for a missing pod and > a missing node. It should clearly log which one is missing: the node or the pod.
[jira] [Resolved] (YUNIKORN-2580) Remove executionTimeoutMilliSeconds
[ https://issues.apache.org/jira/browse/YUNIKORN-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2580. - Resolution: Won't Fix This is used for the placeholder timeout and cannot be removed. > Remove executionTimeoutMilliSeconds > --- > > Key: YUNIKORN-2580 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2580 > Project: Apache YuniKorn > Issue Type: Improvement > Components: scheduler-interface >Reporter: Chia-Ping Tsai >Priority: Minor > > [https://github.com/apache/yunikorn-scheduler-interface/blob/b70081933c38018fd7f01c82635f5b186c4ef394/si.proto#L211] > It is not actually used, so we should either remove it or add support > for it. Personally, I'd like to remove it to simplify the interface.
[jira] [Created] (YUNIKORN-2578) Refactor SchedulerCache.GetPod() remove bool return
Wilfred Spiegelenburg created YUNIKORN-2578: --- Summary: Refactor SchedulerCache.GetPod() remove bool return Key: YUNIKORN-2578 URL: https://issues.apache.org/jira/browse/YUNIKORN-2578 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg SchedulerCache {{GetPod()}} and {{GetPodNoLock()}} return two values: # *v1.Pod # bool The boolean value is redundant: it is false if the pod is not found, in which case nil is returned for the pod, and true if the pod has a value. Testing for a nil pod gives the same result, because we never store a nil pod in the cache for a pod UID.
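A reduced sketch of the proposed single-value API, assuming a map-backed cache keyed by pod UID. `Pod` and `SchedulerCache` here are hypothetical simplifications, not the shim's real types; the point is that a map miss already yields nil, so the extra bool carries no information as long as nil pods are never stored.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a simplified, hypothetical stand-in for *v1.Pod.
type Pod struct {
	UID  string
	Name string
}

// SchedulerCache is a hypothetical, reduced version of the shim cache.
type SchedulerCache struct {
	lock sync.RWMutex
	pods map[string]*Pod
}

func NewSchedulerCache() *SchedulerCache {
	return &SchedulerCache{pods: make(map[string]*Pod)}
}

// AddPod never stores a nil pod, which is what makes a bool return redundant.
func (c *SchedulerCache) AddPod(pod *Pod) {
	if pod == nil {
		return
	}
	c.lock.Lock()
	defer c.lock.Unlock()
	c.pods[pod.UID] = pod
}

// GetPod returns the cached pod, or nil if it is not cached.
func (c *SchedulerCache) GetPod(uid string) *Pod {
	c.lock.RLock()
	defer c.lock.RUnlock()
	return c.GetPodNoLock(uid)
}

// GetPodNoLock expects the caller to hold the lock; a missing key yields nil.
func (c *SchedulerCache) GetPodNoLock(uid string) *Pod {
	return c.pods[uid]
}

func main() {
	cache := NewSchedulerCache()
	cache.AddPod(&Pod{UID: "uid-1", Name: "pod-1"})
	if pod := cache.GetPod("uid-1"); pod != nil {
		fmt.Println("found", pod.Name) // found pod-1
	}
	fmt.Println("missing pod is nil:", cache.GetPod("uid-2") == nil) // missing pod is nil: true
}
```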
[jira] [Created] (YUNIKORN-2577) Remove named returns from IsPodFitNodeViaPreemption
Wilfred Spiegelenburg created YUNIKORN-2577: --- Summary: Remove named returns from IsPodFitNodeViaPreemption Key: YUNIKORN-2577 URL: https://issues.apache.org/jira/browse/YUNIKORN-2577 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg IsPodFitNodeViaPreemption defines named returns but does not use them. They should be removed, as the way they are used can cause issues that are hard to debug. As part of this change we also need to clean up the following: * The variable {{ok}} gets shadowed multiple times, not just by the named return declaration. * The if construct around {{GetPodNoLock()}} is not needed, as it returns nil for the pod when it returns false. Always adding the returned pod has the same effect.
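A small illustration, with hypothetical function names, of why named returns combined with a shadowed `ok` are hard to debug: the bare `return` in `findNamed` reports the named result, which the inner `ok` never updated. The second version drops the named returns and states each result explicitly.

```go
package main

import "fmt"

// findNamed shows the pattern to avoid: the "ok" declared in the if statement
// shadows the named result "ok", so the bare return always reports false.
func findNamed(m map[string]int, key string) (val int, ok bool) {
	if v, ok := m[key]; ok {
		val = v
	}
	return
}

// find uses unnamed results and explicit returns, so nothing is shadowed and
// every return path states exactly what it returns.
func find(m map[string]int, key string) (int, bool) {
	v, ok := m[key]
	if !ok {
		return 0, false
	}
	return v, true
}

func main() {
	m := map[string]int{"node-1": 4}
	v1, ok1 := findNamed(m, "node-1")
	v2, ok2 := find(m, "node-1")
	fmt.Println(v1, ok1) // 4 false
	fmt.Println(v2, ok2) // 4 true
}
```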
[jira] [Created] (YUNIKORN-2575) Make logging for IsPodFitNode clear
Wilfred Spiegelenburg created YUNIKORN-2575: --- Summary: Make logging for IsPodFitNode clear Key: YUNIKORN-2575 URL: https://issues.apache.org/jira/browse/YUNIKORN-2575 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The logging in {{IsPodFitNode()}} logs the same message for a missing pod and a missing node. It should clearly log which one is missing: the node or the pod.
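One way to make the two failure cases distinguishable, sketched with hypothetical sentinel errors and map-based lookups rather than the shim's real cache: each missing object gets its own wrapped error, so a DEBUG log of the error says exactly which one was missing, and callers can still test for it with `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the real shim may word or structure these differently.
var (
	ErrPodNotFound  = errors.New("pod not found")
	ErrNodeNotFound = errors.New("node not found")
)

// isPodFitNode is a hypothetical, simplified check that reports which object is missing.
func isPodFitNode(pods, nodes map[string]bool, podUID, nodeName string) error {
	if !pods[podUID] {
		return fmt.Errorf("%w: %s", ErrPodNotFound, podUID)
	}
	if !nodes[nodeName] {
		return fmt.Errorf("%w: %s", ErrNodeNotFound, nodeName)
	}
	return nil
}

func main() {
	pods := map[string]bool{"pod-uid-1": true}
	nodes := map[string]bool{"node-1": true}

	err := isPodFitNode(pods, nodes, "pod-uid-2", "node-1")
	fmt.Println(err, errors.Is(err, ErrPodNotFound)) // pod not found: pod-uid-2 true

	err = isPodFitNode(pods, nodes, "pod-uid-1", "node-2")
	fmt.Println(err, errors.Is(err, ErrNodeNotFound)) // node not found: node-2 true
}
```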
[jira] [Created] (YUNIKORN-2556) Remove getResourceUsageDAOInfo from test code
Wilfred Spiegelenburg created YUNIKORN-2556: --- Summary: Remove getResourceUsageDAOInfo from test code Key: YUNIKORN-2556 URL: https://issues.apache.org/jira/browse/YUNIKORN-2556 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg Remove the {{getResourceUsageDAOInfo()}} call from the test code. If we need to retrieve the usage for the whole queueTracker hierarchy, we should add that to the test code separately instead of using the DAO and converting it back. The DAO object should also not contain a pointer to the resource object. It should contain the DAOMap for the resource object, similar to all other DAO definitions.
[jira] [Created] (YUNIKORN-2555) Cleanup placement rules in partition
Wilfred Spiegelenburg created YUNIKORN-2555: --- Summary: Cleanup placement rules in partition Key: YUNIKORN-2555 URL: https://issues.apache.org/jira/browse/YUNIKORN-2555 Project: Apache YuniKorn Issue Type: Improvement Components: core - scheduler Reporter: Wilfred Spiegelenburg The placement rule config is tracked in the partition in the object {{partition.rules}}. This object contains the config with which the placement manager is initialised. It was needed before the move to always use placement rules; since that change it no longer serves a purpose. The config is now also out of sync with the rules used in the placement manager. There is no need to keep this object in the partition.
[jira] [Created] (YUNIKORN-2540) clean up constants in pkg/cache/context_test.go
Wilfred Spiegelenburg created YUNIKORN-2540: --- Summary: clean up constants in pkg/cache/context_test.go Key: YUNIKORN-2540 URL: https://issues.apache.org/jira/browse/YUNIKORN-2540 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Constants are duplicated in {{pkg/cache/context_test.go}}; for example, {{fakeNodeName}} is defined multiple times in the file. We should define the constants for the tests in one central place at the top of the file.
[jira] [Resolved] (YUNIKORN-2520) PVC errors in AssumePod() are not handled properly
[ https://issues.apache.org/jira/browse/YUNIKORN-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2520. - Fix Version/s: 1.6.0 Resolution: Fixed Changes merged to master. Volume issues should now be handled correctly. > PVC errors in AssumePod() are not handled properly > -- > > Key: YUNIKORN-2520 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2520 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > When there is an error caused by a volume operation in > {{Context.AssumePod()}}, the allocation on the core side will not be removed. > Although we check the result from {{UpdateAllocation}}, the error handling is > just logging: > {noformat} > if err := callback.UpdateAllocation(response); err != nil { > rmp.handleUpdateResponseError(rmID, err) > } > ... > func (rmp *RMProxy) handleUpdateResponseError(rmID string, err error) { > log.Log(log.RMProxy).Error("failed to handle response", >zap.String("rmID", rmID), >zap.Error(err)) > }{noformat} > I suggest moving volume-related code to {{Task.postTaskAllocated()}}. In > this case, the task will transition to "Failed" state and we'll have > allocationID available, so we can release both the ask and the allocation: > {noformat} > func (task *Task) releaseAllocation() { > ... > var releaseRequest *si.AllocationRequest > s := TaskStates() > switch task.GetTaskState() { > case s.New, s.Pending, s.Scheduling, s.Rejected: > releaseRequest = common.CreateReleaseAskRequestForTask( > task.applicationID, task.taskID, > task.application.partition) <-- release ask + allocation if possible > default: > if task.allocationID == "" { > ... log error ... > return > } > releaseRequest = > common.CreateReleaseAllocationRequestForTask( > task.applicationID, task.taskID, > task.allocationID, task.application.partition, task.terminationType) > } > ...{noformat}
[jira] [Created] (YUNIKORN-2538) Shim cache context pre-allocate slice
Wilfred Spiegelenburg created YUNIKORN-2538: --- Summary: Shim cache context pre-allocate slice Key: YUNIKORN-2538 URL: https://issues.apache.org/jira/browse/YUNIKORN-2538 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg When building the reason string from all volume failure reasons, we should allocate the slice once, sized to the number of entries in the reasons object we get returned. See [review comment|https://github.com/apache/yunikorn-k8shim/pull/810#discussion_r1550882867]
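A minimal sketch of the suggested allocation, assuming the reasons arrive as a slice of string-typed values (`volumeFailureReason` is a hypothetical name). The slice is created once with `make` and a capacity equal to the number of reasons, so the appends never reallocate.

```go
package main

import (
	"fmt"
	"strings"
)

// volumeFailureReason is a hypothetical stand-in for the reason values
// returned by the volume binder.
type volumeFailureReason string

// buildReason allocates the slice once, with capacity equal to the number of
// reasons, instead of growing it with repeated appends.
func buildReason(reasons []volumeFailureReason) string {
	parts := make([]string, 0, len(reasons))
	for _, r := range reasons {
		parts = append(parts, string(r))
	}
	return strings.Join(parts, ", ")
}

func main() {
	fmt.Println(buildReason([]volumeFailureReason{
		"node(s) had volume node affinity conflict",
		"node(s) did not have enough free storage",
	}))
}
```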
[jira] [Created] (YUNIKORN-2537) cleanup UpdateAllocation in callback
Wilfred Spiegelenburg created YUNIKORN-2537: --- Summary: cleanup UpdateAllocation in callback Key: YUNIKORN-2537 URL: https://issues.apache.org/jira/browse/YUNIKORN-2537 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg UpdateAllocation needs a cleanup: {{getTask()}} already checks for the application, so there is no need to retrieve the application when we process response.New. Sending an event should be linked to the existence of the task, not of the application. On top of that, the task already holds the appID, so we do not need to get it from the app. The same logic needs to be applied to the whole function; we already do this for the release.* handling.
[jira] [Created] (YUNIKORN-2533) Implement String() for TrackedResource
Wilfred Spiegelenburg created YUNIKORN-2533: --- Summary: Implement String() for TrackedResource Key: YUNIKORN-2533 URL: https://issues.apache.org/jira/browse/YUNIKORN-2533 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg To fix the way TrackedResource objects are logged, TrackedResource should implement the String() function.
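A sketch of a `String()` method, assuming a simplified, hypothetical `TrackedResource` layout (instance type mapped to per-resource usage, guarded by an embedded `sync.RWMutex`); the real core type stores its values differently. Implementing `fmt.Stringer` lets fmt, and loggers that honour `fmt.Stringer`, print the contents rather than the raw struct, and sorting the keys keeps log output stable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// TrackedResource is a simplified, hypothetical version of the core type:
// instance type -> resource name -> aggregated usage.
type TrackedResource struct {
	sync.RWMutex
	TrackedResourceMap map[string]map[string]int64
}

// String implements fmt.Stringer. Keys are sorted so the output is stable.
func (tr *TrackedResource) String() string {
	if tr == nil {
		return "TrackedResource{}"
	}
	tr.RLock()
	defer tr.RUnlock()
	instTypes := make([]string, 0, len(tr.TrackedResourceMap))
	for k := range tr.TrackedResourceMap {
		instTypes = append(instTypes, k)
	}
	sort.Strings(instTypes)
	parts := make([]string, 0, len(instTypes))
	for _, it := range instTypes {
		res := tr.TrackedResourceMap[it]
		names := make([]string, 0, len(res))
		for n := range res {
			names = append(names, n)
		}
		sort.Strings(names)
		kv := make([]string, 0, len(names))
		for _, n := range names {
			kv = append(kv, fmt.Sprintf("%s:%d", n, res[n]))
		}
		parts = append(parts, fmt.Sprintf("%s:map[%s]", it, strings.Join(kv, " ")))
	}
	return "TrackedResource{" + strings.Join(parts, ",") + "}"
}

func main() {
	tr := &TrackedResource{TrackedResourceMap: map[string]map[string]int64{
		"small": {"vcore": 2000, "memory": 1024},
	}}
	fmt.Println(tr) // TrackedResource{small:map[memory:1024 vcore:2000]}
}
```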
[jira] [Resolved] (YUNIKORN-2527) Allow remove and re-add configured queue within cleanup time
[ https://issues.apache.org/jira/browse/YUNIKORN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2527. - Fix Version/s: 1.6.0 Resolution: Fixed Queues can now be removed and added back again within a cleanup cycle. > Allow remove and re-add configured queue within cleanup time > - > > Key: YUNIKORN-2527 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2527 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > When we remove a queue from the config, it is marked for cleanup. If we re-add > the same queue to the config before the cleanup is executed, the queue > still gets removed. > Reproduction: > * edit the configmap, remove a queue, save > * immediately edit the configmap, add the same queue back, save > * wait for the cleanup to happen; the queue should still exist after the fix
[jira] [Resolved] (YUNIKORN-2519) Remove bypass ACL check from placement rules
[ https://issues.apache.org/jira/browse/YUNIKORN-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2519. - Fix Version/s: 1.6.0 Resolution: Fixed Refactor committed to master for 1.6.0. > Remove bypass ACL check from placement rules > > > Key: YUNIKORN-2519 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2519 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Instead of every rule except the recovery rule returning a flag to not bypass > the ACL check, special-case the recovery rule to bypass the checks. > The recovery queue is created without ACLs or quota and is always a leaf queue. > The only rule that can return the recovery queue is the recovery rule, which > is the last one in the list. > Use all these facts to simplify the placement processing.
[jira] [Created] (YUNIKORN-2527) Allow remove and re-add configured queue within cleanup time
Wilfred Spiegelenburg created YUNIKORN-2527: --- Summary: Allow remove and re-add configured queue within cleanup time Key: YUNIKORN-2527 URL: https://issues.apache.org/jira/browse/YUNIKORN-2527 Project: Apache YuniKorn Issue Type: Bug Components: core - common Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg When we remove a queue from the config, it is marked for cleanup. If we re-add the same queue to the config before the cleanup is executed, the queue still gets removed. Reproduction: * edit the configmap, remove a queue, save * immediately edit the configmap, add the same queue back, save * wait for the cleanup to happen
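A simplified model of the expected behaviour, assuming queues are tracked in a flat map with a hypothetical `markedForRemoval` flag (the real core keeps a queue hierarchy): applying a config that contains the queue again must clear the mark, so the later cleanup only removes queues that are still marked when it runs.

```go
package main

import (
	"fmt"
	"sort"
)

// queue is a hypothetical, minimal model of a configured queue.
type queue struct {
	name             string
	markedForRemoval bool
}

// partition holds queues keyed by full name (hypothetical, flat).
type partition struct {
	queues map[string]*queue
}

// applyConfig marks queues missing from the new config for removal and
// clears the mark on queues that are present again.
func (p *partition) applyConfig(names []string) {
	inConfig := make(map[string]bool, len(names))
	for _, n := range names {
		inConfig[n] = true
		if q, ok := p.queues[n]; ok {
			q.markedForRemoval = false // re-added before the cleanup ran
		} else {
			p.queues[n] = &queue{name: n}
		}
	}
	for n, q := range p.queues {
		if !inConfig[n] {
			q.markedForRemoval = true
		}
	}
}

// cleanup removes only the queues that are still marked when it runs.
func (p *partition) cleanup() {
	for n, q := range p.queues {
		if q.markedForRemoval {
			delete(p.queues, n) // deleting during range is safe in Go
		}
	}
}

// names returns the queue names in sorted order.
func (p *partition) names() []string {
	out := make([]string, 0, len(p.queues))
	for n := range p.queues {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	p := &partition{queues: map[string]*queue{}}
	p.applyConfig([]string{"root.a", "root.b"})
	p.applyConfig([]string{"root.a"})           // remove root.b
	p.applyConfig([]string{"root.a", "root.b"}) // add it back before cleanup
	p.cleanup()
	fmt.Println(p.names()) // [root.a root.b]
}
```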
[jira] [Resolved] (YUNIKORN-2498) Implement force create flag in k8shim for recovery queue
[ https://issues.apache.org/jira/browse/YUNIKORN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2498. - Fix Version/s: 1.6.0 Resolution: Fixed > Implement force create flag in k8shim for recovery queue > > > Key: YUNIKORN-2498 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2498 > Project: Apache YuniKorn > Issue Type: Task > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > As part of the initialisation changes, a new recovery queue was added to allow > already running allocations to be restored even if the queue config was > changed. The implementation on the k8shim side needs to be added to leverage > the forced create flag from YUNIKORN-1887. > Without that, the changes added for the recovery queue will not be used.
[jira] [Resolved] (YUNIKORN-2494) Revisit IsAtorAbove, WithIn, GetRemaining Guaranteed resources calculation
[ https://issues.apache.org/jira/browse/YUNIKORN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2494. - Fix Version/s: 1.6.0 Resolution: Fixed Functions added to the master code; they are not actively used yet. > Revisit IsAtorAbove, WithIn, GetRemaining Guaranteed resources calculation > -- > > Key: YUNIKORN-2494 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2494 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - common >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > These 3 methods don't expose the actual guaranteed values and return > a boolean value based on the calculation. There are cases where these boolean > values are not correct, and there is also a need to know the actual guaranteed > values, for example: how much of the guaranteed resource remains? How much can be > preempted?
[jira] [Created] (YUNIKORN-2519) Remove bypass ACL check from placement rules
Wilfred Spiegelenburg created YUNIKORN-2519: --- Summary: Remove bypass ACL check from placement rules Key: YUNIKORN-2519 URL: https://issues.apache.org/jira/browse/YUNIKORN-2519 Project: Apache YuniKorn Issue Type: Improvement Components: core - scheduler Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Instead of every rule except the recovery rule returning a flag to not bypass the ACL check, special-case the recovery rule to bypass the checks. The recovery queue is created without ACLs or quota and is always a leaf queue. The only rule that can return the recovery queue is the recovery rule, which is the last one in the list. Use all these facts to simplify the placement processing.
[jira] [Created] (YUNIKORN-2518) Allow recovery queue in REST requests
Wilfred Spiegelenburg created YUNIKORN-2518: --- Summary: Allow recovery queue in REST requests Key: YUNIKORN-2518 URL: https://issues.apache.org/jira/browse/YUNIKORN-2518 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg The current checks for REST requests that require a queue path prevent looking at the {{root.@recover@}} queue. The validator filters the queue names, which makes it impossible to use the REST requests to check whether the queue has any running applications or pods after initialisation.
[jira] [Created] (YUNIKORN-2506) fix
Wilfred Spiegelenburg created YUNIKORN-2506: --- Summary: fix Key: YUNIKORN-2506 URL: https://issues.apache.org/jira/browse/YUNIKORN-2506 Project: Apache YuniKorn Issue Type: Improvement Components: webapp Reporter: Wilfred Spiegelenburg When running make on the web UI project, a deprecation warning is printed for the fonts we include: {code:java} WARN deprecated fontsource-roboto@4.0.0: Package relocated. Please install and migrate to @fontsource/roboto. {code} Move to {{@fontsource/roboto}} to fix the warning.
[jira] [Created] (YUNIKORN-2498) Implement force create flag in k8shim for recovery queue
Wilfred Spiegelenburg created YUNIKORN-2498: --- Summary: Implement force create flag in k8shim for recovery queue Key: YUNIKORN-2498 URL: https://issues.apache.org/jira/browse/YUNIKORN-2498 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg As part of the initialisation changes, a new recovery queue was added to allow already running allocations to be restored even if the queue config was changed. The implementation on the k8shim side needs to be added to leverage the forced create flag from YUNIKORN-1887. Without that, the changes added for the recovery queue will not be used.
[jira] [Created] (YUNIKORN-2497) Update node.js to 18.19.1
Wilfred Spiegelenburg created YUNIKORN-2497: --- Summary: Update node.js to 18.19.1 Key: YUNIKORN-2497 URL: https://issues.apache.org/jira/browse/YUNIKORN-2497 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Node 18.x is an LTS version. Version 18.17 has been superseded by two other releases, 18.18 and 18.19. Both contain CVE fixes, which we should include for stability. Move the build to 18.19 (currently 18.19.1).
[jira] [Resolved] (YUNIKORN-2496) Fix security issues in website javascript
[ https://issues.apache.org/jira/browse/YUNIKORN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2496. - Fix Version/s: 1.6.0 Resolution: Fixed Change committed; all dependabot alerts closed. > Fix security issues in website javascript > - > > Key: YUNIKORN-2496 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2496 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > The change to pnpm triggered a large number of security alerts from > dependabot. > 7 could be fixed directly by the 4 PRs opened by dependabot; 6 need manual > intervention. > The change also included an upgrade of the Algolia search component to 3.x. > That change prevents running {{pnpm audit}}. > Docusaurus 3.x also contains a large number of backward-incompatible changes > and an upgrade is planned separately. The Algolia 3.x dependency > already pulls in some of these changes and should be reverted to Algolia 2.x, > the same as the rest of the Docusaurus environment.
[jira] [Created] (YUNIKORN-2496) Fix security issues in website javascript
Wilfred Spiegelenburg created YUNIKORN-2496: --- Summary: Fix security issues in website javascript Key: YUNIKORN-2496 URL: https://issues.apache.org/jira/browse/YUNIKORN-2496 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The change to pnpm triggered a large number of security alerts from dependabot. 7 could be fixed directly by the 4 PRs opened by dependabot; 6 need manual intervention. The change also included an upgrade of the Algolia search component to 3.x. That change prevents running {{pnpm audit}}. Docusaurus 3.x also contains a large number of backward-incompatible changes and an upgrade is planned separately. The Algolia 3.x dependency already pulls in some of these changes and should be reverted to Algolia 2.x, the same as the rest of the Docusaurus environment.
[jira] [Resolved] (YUNIKORN-2490) Add new PMC and committer members
[ https://issues.apache.org/jira/browse/YUNIKORN-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2490. - Fix Version/s: 1.6.0 Resolution: Fixed The website has been updated with the new details after checks. Deployment of the new site should take about 30 min. > Add new PMC and committer members > - > > Key: YUNIKORN-2490 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2490 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Trivial > Labels: pull-request-available > Fix For: 1.6.0 > > > We have elected a new PMC member and some committers. Now that they have > accepted, we should add them to the website.
[jira] [Created] (YUNIKORN-2490) Add new PMC and committer members
Wilfred Spiegelenburg created YUNIKORN-2490: --- Summary: Add new PMC and committer members Key: YUNIKORN-2490 URL: https://issues.apache.org/jira/browse/YUNIKORN-2490 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg We have elected a new PMC member and some committers. Now that they have accepted we should add them to the website.
[jira] [Created] (YUNIKORN-2482) Failure to set template does not return error
Wilfred Spiegelenburg created YUNIKORN-2482: --- Summary: Failure to set template does not return error Key: YUNIKORN-2482 URL: https://issues.apache.org/jira/browse/YUNIKORN-2482 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Reporter: Wilfred Spiegelenburg Setting a template on a parent queue can fail if the template is not correct. The error is swallowed and success is returned, even though the update of the queue has not finished correctly: *Queue.applyConf() {code:java} if !sq.isLeaf { if err = sq.setTemplate(conf.ChildTemplate); err != nil { return nil } } {code} We need to add tests to make sure we do not regress.
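A cut-down sketch of the fix, with hypothetical `Queue` and `childTemplate` types and a made-up validation rule: the error from `setTemplate` is returned instead of `nil`, so the caller learns that the update did not complete.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, cut-down types for illustration only.
type childTemplate struct{ maxApps int }

type Queue struct {
	isLeaf   bool
	template *childTemplate
}

var errBadTemplate = errors.New("invalid child template")

// setTemplate rejects an invalid template (made-up rule: negative maxApps).
func (sq *Queue) setTemplate(conf childTemplate) error {
	if conf.maxApps < 0 {
		return errBadTemplate
	}
	sq.template = &conf
	return nil
}

// applyConf propagates the setTemplate error instead of returning nil,
// so callers see that the queue update did not complete.
func (sq *Queue) applyConf(conf childTemplate) error {
	if !sq.isLeaf {
		if err := sq.setTemplate(conf); err != nil {
			return err // previously: return nil
		}
	}
	return nil
}

func main() {
	parent := &Queue{isLeaf: false}
	fmt.Println(parent.applyConf(childTemplate{maxApps: -1})) // invalid child template
	fmt.Println(parent.applyConf(childTemplate{maxApps: 10})) // <nil>
}
```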
[jira] [Created] (YUNIKORN-2472) REST API returns subtree by default
Wilfred Spiegelenburg created YUNIKORN-2472: --- Summary: REST API returns subtree by default Key: YUNIKORN-2472 URL: https://issues.apache.org/jira/browse/YUNIKORN-2472 Project: Apache YuniKorn Issue Type: Bug Components: core - common Affects Versions: 1.5.0 Reporter: Wilfred Spiegelenburg The subtree query parameter is interpreted the opposite way to what would be expected. If you call {{/ws/v1/partition/default/queue/root?subtree}}, you do not get the subtree. If you call {{/ws/v1/partition/default/queue/root}}, you get the whole tree rooted at root. We have not documented the new API yet, so before we add it to the docs we should fix the behaviour: * subtree given: return the whole tree * subtree missing: return one level The code fix is as simple as adding a ! in a single call and inverting the test cases to pass or not pass {{?subtree}}.
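A sketch of the intended semantics, assuming presence alone should enable the subtree (so `?subtree` with no value counts as given). `wantSubtree` and `newGetRequest` are hypothetical helpers, not the core's handler code; `url.Values.Has` requires Go 1.17 or later.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// wantSubtree reports whether the caller asked for the full subtree: the
// presence of "subtree" in the query (with or without a value) turns it on,
// and its absence means only one level is returned.
func wantSubtree(r *http.Request) bool {
	return r.URL.Query().Has("subtree")
}

// newGetRequest builds a test GET request for the given target.
func newGetRequest(target string) *http.Request {
	return httptest.NewRequest(http.MethodGet, target, nil)
}

func main() {
	for _, target := range []string{
		"/ws/v1/partition/default/queue/root?subtree",
		"/ws/v1/partition/default/queue/root",
	} {
		fmt.Printf("%s -> subtree=%v\n", target, wantSubtree(newGetRequest(target)))
	}
}
```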
[jira] [Created] (YUNIKORN-2462) incorrect gang annotations in example
Wilfred Spiegelenburg created YUNIKORN-2462: --- Summary: incorrect gang annotations in example Key: YUNIKORN-2462 URL: https://issues.apache.org/jira/browse/YUNIKORN-2462 Project: Apache YuniKorn Issue Type: Bug Components: documentation Reporter: Wilfred Spiegelenburg The example for turning on gang scheduling with Spark is incorrect. [https://yunikorn.apache.org/docs/next/user_guide/gang_scheduling/#enable-gang-scheduling-for-spark-jobs] The example shows: {code:java} yunikorn.apache.org/taskGroupName: “spark-driver” yunikorn.apache.org/taskGroup: “ TaskGroups: [ {code} The {{taskGroupName}} annotation should be {{task-group-name}} and {{taskGroup}} should be {{task-groups}}.
[jira] [Resolved] (YUNIKORN-2456) Remove weak ciphers from TLS
[ https://issues.apache.org/jira/browse/YUNIKORN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2456. - Fix Version/s: 1.5.0 Resolution: Fixed Committed to master and cherry-picked into branch-1.5, resolving. > Remove weak ciphers from TLS > > > Key: YUNIKORN-2456 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2456 > Project: Apache YuniKorn > Issue Type: Bug > Components: security, shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.0 > > > The TLS connection for the admission controller allows ciphers that are > considered weak. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2456) Remove weak ciphers from TLS
Wilfred Spiegelenburg created YUNIKORN-2456: --- Summary: Remove weak ciphers from TLS Key: YUNIKORN-2456 URL: https://issues.apache.org/jira/browse/YUNIKORN-2456 Project: Apache YuniKorn Issue Type: Bug Components: security, shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The TLS connection for the admission controller allows ciphers that are considered weak. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2042) REST API for specific queue
[ https://issues.apache.org/jira/browse/YUNIKORN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2042. - Fix Version/s: 1.5.0 Target Version: 1.5.0 (was: 1.6.0) Resolution: Fixed change committed and cherry-picked into branch-1.5 > REST API for specific queue > --- > > Key: YUNIKORN-2042 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2042 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - common >Reporter: Ted Lin >Assignee: Ted Lin >Priority: Major > Labels: pull-request-available > Fix For: 1.5.0 > > > Expose a REST API for a specific queue: > /ws/v1/partition/%s/queue/%s/ -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
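A small client-side sketch that fills in the /ws/v1/partition/%s/queue/%s/ template from YUNIKORN-2042. The helper queueURL is hypothetical, and the base address http://localhost:9080 is an assumption for a local deployment; path segments are escaped with net/url so unusual names cannot alter the path.

```go
package main

import (
	"fmt"
	"net/url"
)

// queueURL builds the per-queue REST endpoint for a partition and queue.
func queueURL(base, partition, queue string) string {
	return fmt.Sprintf("%s/ws/v1/partition/%s/queue/%s/",
		base, url.PathEscape(partition), url.PathEscape(queue))
}

func main() {
	// Base address is an assumption for a local deployment.
	fmt.Println(queueURL("http://localhost:9080", "default", "root.sandbox"))
}
```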
[jira] [Resolved] (YUNIKORN-2030) Need to check headroom when trying other nodes for reserved allocations
[ https://issues.apache.org/jira/browse/YUNIKORN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2030. - Fix Version/s: 1.5.0 Resolution: Fixed change committed and cherry-picked into branch-1.5 thank you for the analysis and change. > Need to check headroom when trying other nodes for reserved allocations > --- > > Key: YUNIKORN-2030 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2030 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Blocker > Labels: pull-request-available > Fix For: 1.5.0 > > > As reported in YUNIKORN-1996, we are seeing many messages like the one below from > time to time: > {code:java} > WARN objects/application.go:1504 queue update failed unexpectedly > {"error": "allocation (map[memory:37580963840 pods:1 vcore:2000]) puts > queue 'root.test-queue' over maximum allocation (map[memory:3300011278336 > vcore:390584]), current usage (map[memory:3291983380480 pods:91 > vcore:186000])"}{code} > Restarting YuniKorn makes the messages stop. Creating this Jira to investigate > why it happened: it is not supposed to happen, because we check whether there is > enough resource headroom before calling > > {code:java} > func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation > {code} > (which printed the above message) and only call it when there is enough > headroom. > There may be a bug in the headroom check. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
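A simplified Go sketch of a headroom check, using the numbers from the warning above. The resource type, headroom and fitsIn are hypothetical stand-ins, not yunikorn-core's resources package; the sketch assumes that a resource without a configured maximum (pods here) is unlimited. With these values the memory headroom is 3300011278336 - 3291983380480 = 8027897856, which is smaller than the 37580963840 requested, so the ask should be rejected before tryNode is ever called.

```go
package main

import "fmt"

// resource maps a resource name to a quantity (hypothetical stand-in).
type resource map[string]int64

// headroom returns limit - used for every resource that has a limit.
// Resources without a limit are treated as unlimited and left out.
func headroom(limits, used resource) resource {
	h := resource{}
	for name, limit := range limits {
		h[name] = limit - used[name]
	}
	return h
}

// fitsIn reports whether ask fits in room; names missing from room are
// unlimited and never block the ask.
func fitsIn(ask, room resource) bool {
	for name, qty := range ask {
		if limit, ok := room[name]; ok && qty > limit {
			return false
		}
	}
	return true
}

func main() {
	limits := resource{"memory": 3300011278336, "vcore": 390584}
	used := resource{"memory": 3291983380480, "pods": 91, "vcore": 186000}
	ask := resource{"memory": 37580963840, "pods": 1, "vcore": 2000}
	room := headroom(limits, used)
	fmt.Printf("headroom=%v fits=%v\n", room, fitsIn(ask, room))
}
```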
[jira] [Created] (YUNIKORN-2448) Expose 3rd party licenses in the web UI
Wilfred Spiegelenburg created YUNIKORN-2448: --- Summary: Expose 3rd party licenses in the web UI Key: YUNIKORN-2448 URL: https://issues.apache.org/jira/browse/YUNIKORN-2448 Project: Apache YuniKorn Issue Type: Improvement Components: webapp Reporter: Wilfred Spiegelenburg We have a 3rd party license file that is generated and included in the web UI deployment. Currently the file is only accessible if you know its name. To comply with attribution requirements, we should expose it as part of the web UI, similar to how Jira exposes it in its About Jira pop-up. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2413) Variables that are initialisms or acronyms should have a consistent case
[ https://issues.apache.org/jira/browse/YUNIKORN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2413. - Fix Version/s: 1.5.0 Resolution: Fixed Two refactors left for later: function names should be updated: [master/pkg/events/event_ringbuffer.go#L206|https://github.com/apache/yunikorn-core/blob/master/pkg/events/event_ringbuffer.go?rgh-link-date=2024-02-19T17%3A21%3A31Z#L206] [master/pkg/log/logger_test.go#L38|https://github.com/apache/yunikorn-core/blob/master/pkg/log/logger_test.go?rgh-link-date=2024-02-19T17%3A21%3A31Z#L38] thank you [~priyansh] > Variables that are initialisms or acronyms should have a consistent case > > > Key: YUNIKORN-2413 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2413 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Ryan Lo >Assignee: Priyansh Choudhary >Priority: Major > Labels: newbie, pull-request-available > Fix For: 1.5.0 > > > Discussed in YUNIKORN-2405 > We mixed up "Id" and "ID" in our code base, and it's better to standardize > the use of acronyms and initialisms according to [this > doc.|https://go.dev/wiki/CodeReviewComments#initialisms] > An example: > current: allocationId > target: allocationID -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2115) [Umbrella] YuniKorn application traceability - phase II
[ https://issues.apache.org/jira/browse/YUNIKORN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2115. - Fix Version/s: 1.5.0 Resolution: Fixed > [Umbrella] YuniKorn application traceability - phase II > --- > > Key: YUNIKORN-2115 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2115 > Project: Apache YuniKorn > Issue Type: New Feature > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 1.5.0 > > > This is a follow-up on YUNIKORN-1628. > This ticket focuses on streaming and user/group events. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2116) Track user/group events
[ https://issues.apache.org/jira/browse/YUNIKORN-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2116. - Fix Version/s: 1.5.0 Resolution: Fixed Core changes committed. The changes to the SI were committed last week. Both PRs are done, closing. > Track user/group events > --- > > Key: YUNIKORN-2116 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2116 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2441) Wildcard limits are not applied to the root tracker during creation
[ https://issues.apache.org/jira/browse/YUNIKORN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2441. - Fix Version/s: 1.5.0 Resolution: Fixed Change committed > Wildcard limits are not applied to the root tracker during creation > --- > > Key: YUNIKORN-2441 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2441 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Blocker > Labels: pull-request-available > Fix For: 1.5.0 > > > When a queue tracker is created with {{newQueueTracker()}}, the appropriate > wildcard limits are applied if the tracking type is "user". > The problem is this call: > {noformat} > if trackType == user { > if config := m.getUserWildCardLimitsConfig(queuePath + "." + > queueName); config != nil { > {noformat} > For "root", the lookup uses "root." (with a trailing dot) instead of "root", so the root tracker does not get the wildcard limits. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
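One way to avoid the trailing-dot key, sketched in Go: join queue path segments while skipping empty ones, so the root resolves to "root" rather than "root.". The helper joinQueuePath is hypothetical and not the fix committed for YUNIKORN-2441; it assumes the stray dot comes from concatenating with an empty segment.

```go
package main

import (
	"fmt"
	"strings"
)

// joinQueuePath joins queue path segments with "." and skips empty
// segments, so joining "root" with "" yields "root", not "root.".
func joinQueuePath(parts ...string) string {
	nonEmpty := make([]string, 0, len(parts))
	for _, p := range parts {
		if p != "" {
			nonEmpty = append(nonEmpty, p)
		}
	}
	return strings.Join(nonEmpty, ".")
}

func main() {
	fmt.Println(joinQueuePath("root", ""))      // root
	fmt.Println(joinQueuePath("root", "child")) // root.child
}
```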
[jira] [Created] (YUNIKORN-2445) Add comments around locking setup in tracker code
Wilfred Spiegelenburg created YUNIKORN-2445: --- Summary: Add comments around locking setup in tracker code Key: YUNIKORN-2445 URL: https://issues.apache.org/jira/browse/YUNIKORN-2445 Project: Apache YuniKorn Issue Type: Task Components: core - scheduler Reporter: Wilfred Spiegelenburg The QueueTracker code is lock free and should stay lock free. Each queue tracker object is always only linked to one UserTracker or GroupTracker. Locking is thus handled from those objects. This does mean that calls to the user or group trackers that can modify the underlying queue tracker structure must take a write lock. This specifically impacts the {{canRunApp()}} and {{headroom()}} calls as they add new entries in the queue hierarchy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2440) [UMBRELLA] Remove stateaware scheduling
Wilfred Spiegelenburg created YUNIKORN-2440: --- Summary: [UMBRELLA] Remove stateaware scheduling Key: YUNIKORN-2440 URL: https://issues.apache.org/jira/browse/YUNIKORN-2440 Project: Apache YuniKorn Issue Type: Task Components: core - scheduler Reporter: Wilfred Spiegelenburg Umbrella Jira to track all the work to remove state aware scheduling: * remove scheduling code * remove documentation * remove configuration options * document a way to achieve similar behaviour (FIFO with max applications) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2439) Announce deprecation of state aware scheduling
Wilfred Spiegelenburg created YUNIKORN-2439: --- Summary: Announce deprecation of state aware scheduling Key: YUNIKORN-2439 URL: https://issues.apache.org/jira/browse/YUNIKORN-2439 Project: Apache YuniKorn Issue Type: Task Components: release-notes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg State aware scheduling was a simple scheduling algorithm that provided a stop-gap until gang scheduling was implemented. Gang scheduling and state aware scheduling do not work together. Gang scheduling is a more generic way of achieving almost the same behaviour. State aware scheduling has a number of drawbacks and could be used as an attack vector to slow down overall scheduling performance. We should deprecate it and remove it in an upcoming release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Closed] (YUNIKORN-2026) Update features document in Chinese translation
[ https://issues.apache.org/jira/browse/YUNIKORN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-2026. --- > Update features document in Chinese translation > --- > > Key: YUNIKORN-2026 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2026 > Project: Apache YuniKorn > Issue Type: Task > Components: documentation >Reporter: JiaChi Wang >Assignee: JiaChi Wang >Priority: Minor > Labels: pull-request-available > > Some parts are missing in the Chinese translation of the features document. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Closed] (YUNIKORN-1511) Adding Chinese translation of Deploy to Kubernetes
[ https://issues.apache.org/jira/browse/YUNIKORN-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-1511. --- > Adding Chinese translation of Deploy to Kubernetes > -- > > Key: YUNIKORN-1511 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1511 > Project: Apache YuniKorn > Issue Type: Task >Reporter: Chen Yu Teng >Assignee: Chenchen Lai >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Closed] (YUNIKORN-2220) pod.DeepCopy() is called twice in Task
[ https://issues.apache.org/jira/browse/YUNIKORN-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-2220. --- > pod.DeepCopy() is called twice in Task > -- > > Key: YUNIKORN-2220 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2220 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > > A small improvement is possible in {{task.go}}. > In {{handleSubmitTaskEvent()}} and {{postTaskAllocated()}}, we call > {{pod.DeepCopy()}} twice to avoid possible race conditions, but a single copy > is enough. Once we have a copy, it's local to the method. > {noformat} > events.GetRecorder().Eventf(task.pod.DeepCopy(), nil, v1.EventTypeNormal, > "Scheduling", "Scheduling", > "%s is queued and waiting for allocation", task.alias) > // if this task belongs to a task group, that means the app has gang > scheduling enabled > // in this case, post an event to indicate the task is being gang > scheduled > if !task.placeholder && task.taskGroupName != "" { > events.GetRecorder().Eventf(task.pod.DeepCopy(), nil, > v1.EventTypeNormal, "GangScheduling", "GangScheduling", > "Pod belongs to the taskGroup %s, it will be scheduled > as a gang member", task.taskGroupName) <-- second copy if GS is used > } > {noformat} > {noformat} > events.GetRecorder().Eventf(task.pod.DeepCopy(), > nil, v1.EventTypeNormal, "Scheduled", "Scheduled", > "Successfully assigned %s to node %s", task.alias, task.nodeName) > ... > events.GetRecorder().Eventf(task.pod.DeepCopy(), nil, > v1.EventTypeNormal, "PodBindSuccessful", "PodBindSuccessful", > "Pod %s is successfully bound to node %s", task.alias, task.nodeName) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
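A self-contained Go sketch of the "copy once, reuse locally" pattern suggested in YUNIKORN-2220. To stay runnable without Kubernetes dependencies, pod is a hypothetical stand-in for *v1.Pod that counts DeepCopy calls, and recordEvent stands in for events.GetRecorder().Eventf(...); neither is the shim's real API.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for *v1.Pod.
type pod struct{ name string }

// copies counts DeepCopy calls so the saving is visible.
var copies int

func (p *pod) DeepCopy() *pod {
	copies++
	c := *p
	return &c
}

// recordEvent stands in for events.GetRecorder().Eventf(...).
func recordEvent(p *pod, reason, msg string) {
	fmt.Printf("%s %s: %s\n", p.name, reason, msg)
}

// handleSubmit copies the pod once; the copy is local to the method, so
// it is safe to reuse for every event posted here.
func handleSubmit(p *pod, taskGroupName string, placeholder bool) {
	podCopy := p.DeepCopy()
	recordEvent(podCopy, "Scheduling", "queued and waiting for allocation")
	if !placeholder && taskGroupName != "" {
		recordEvent(podCopy, "GangScheduling", "scheduled as a gang member of "+taskGroupName)
	}
}

func main() {
	handleSubmit(&pod{name: "driver"}, "spark-driver", false)
	fmt.Println("copies:", copies)
}
```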
[jira] [Closed] (YUNIKORN-803) Improve coverage of partition.go
[ https://issues.apache.org/jira/browse/YUNIKORN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-803. -- > Improve coverage of partition.go > > > Key: YUNIKORN-803 > URL: https://issues.apache.org/jira/browse/YUNIKORN-803 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Chen Yu Teng >Assignee: Cliff Su >Priority: Minor > Attachments: list.png, partition.go coverage.png > > > Based on the coverage report, add tests to improve the coverage of > partition.go -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Closed] (YUNIKORN-1691) Adding Chinese translation of User Based Resource Usage Tracking
[ https://issues.apache.org/jira/browse/YUNIKORN-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-1691. --- > Adding Chinese translation of User Based Resource Usage Tracking > - > > Key: YUNIKORN-1691 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1691 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Chen Yu Teng >Assignee: Chenchen Lai >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Closed] (YUNIKORN-1692) Adding Chinese translation of User Based Resource Usage Tracking
[ https://issues.apache.org/jira/browse/YUNIKORN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-1692. --- > Adding Chinese translation of User Based Resource Usage Tracking > > > Key: YUNIKORN-1692 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1692 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Chen Yu Teng >Assignee: Huang Guan Hao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Closed] (YUNIKORN-2223) Eliminate separate mutex variables
[ https://issues.apache.org/jira/browse/YUNIKORN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-2223. --- > Eliminate separate mutex variables > -- > > Key: YUNIKORN-2223 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2223 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Peter Bacsko >Priority: Minor > > In {{cache.Task}}, the lock variable is defined as: > {noformat} > type Task struct { > ... > schedulingState TaskSchedulingState > sm *fsm.FSM > lock *sync.RWMutex > } {noformat} > This also applies to {{cache.Application}} and {{cache.Context}}. > In other parts of the code, we simply embed {{sync.RWMutex}}. There's no need > to have a separate variable. Locking and unlocking become simpler. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-1033) Add Chinese translation for developer guide documents
[ https://issues.apache.org/jira/browse/YUNIKORN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1033. - Resolution: Won't Do With the changes from YUNIKORN-2411 this is no longer relevant. > Add Chinese translation for developer guide documents > - > > Key: YUNIKORN-1033 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1033 > Project: Apache YuniKorn > Issue Type: Task > Components: documentation >Reporter: cdmikechen >Assignee: Chen Yu Teng >Priority: Major > > Add Chinese translation for developer guide documents, this is a sub task on > https://issues.apache.org/jira/browse/YUNIKORN-1029 > This issue include YuniKorn site developer guide documents. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-1691) Adding Chinese translation of User Based Resource Usage Tracking
[ https://issues.apache.org/jira/browse/YUNIKORN-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1691. - Resolution: Won't Do With the changes from YUNIKORN-2411 this is no longer relevant. > Adding Chinese translation of User Based Resource Usage Tracking > - > > Key: YUNIKORN-1691 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1691 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Chen Yu Teng >Assignee: Chenchen Lai >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-1692) Adding Chinese translation of User Based Resource Usage Tracking
[ https://issues.apache.org/jira/browse/YUNIKORN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1692. - Resolution: Won't Do With the changes from YUNIKORN-2411 this is no longer relevant. > Adding Chinese translation of User Based Resource Usage Tracking > > > Key: YUNIKORN-1692 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1692 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Chen Yu Teng >Assignee: Huang Guan Hao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-1511) Adding Chinese translation of Deploy to Kubernetes
[ https://issues.apache.org/jira/browse/YUNIKORN-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1511. - Resolution: Won't Do With the changes from YUNIKORN-2411 this is no longer relevant. > Adding Chinese translation of Deploy to Kubernetes > -- > > Key: YUNIKORN-1511 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1511 > Project: Apache YuniKorn > Issue Type: Task >Reporter: Chen Yu Teng >Assignee: Chenchen Lai >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2026) Update features document in Chinese translation
[ https://issues.apache.org/jira/browse/YUNIKORN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2026. - Resolution: Won't Do With the changes from YUNIKORN-2411 this is no longer relevant. > Update features document in Chinese translation > --- > > Key: YUNIKORN-2026 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2026 > Project: Apache YuniKorn > Issue Type: Task > Components: documentation >Reporter: JiaChi Wang >Assignee: JiaChi Wang >Priority: Minor > Labels: pull-request-available > > Some parts are missing in the Chinese translation of the features document. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2337) Update documentation about event streaming
[ https://issues.apache.org/jira/browse/YUNIKORN-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2337. - Fix Version/s: 1.5.0 Resolution: Fixed New REST API end point added to the docs > Update documentation about event streaming > -- > > Key: YUNIKORN-2337 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2337 > Project: Apache YuniKorn > Issue Type: Sub-task >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.5.0 > > > Update the docs about the new REST endpoint and possible config entries > (concurrent streaming limits). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2425) Release build script should use "go mod" instead of manual replacements
[ https://issues.apache.org/jira/browse/YUNIKORN-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2425. - Fix Version/s: 1.5.0 Resolution: Fixed The script now uses go mod edit instead of adding lines to the go.mod file. > Release build script should use "go mod" instead of manual replacements > --- > > Key: YUNIKORN-2425 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2425 > Project: Apache YuniKorn > Issue Type: Improvement > Components: release >Reporter: Craig Condit >Assignee: Craig Condit >Priority: Major > Labels: pull-request-available > Fix For: 1.5.0 > > > The tools/build-release.py script included in yunikorn-release uses manual > file editing to perform module replacements. This is fragile, and can fail in > a number of cases (including when a replace directive already exists). Go > provides native tooling to script this via the "go mod edit" command. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
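The actual release script is Python; as a language-neutral illustration, this Go sketch shows how the "go mod edit -replace=old=new" invocation can be assembled and run with os/exec instead of appending lines to go.mod by hand. The module directory and replacement path in main are hypothetical examples, and replaceArgs and replaceCmd are illustrative names. Because go mod edit rewrites the replace directive itself, an existing directive for the same module does not lead to a duplicate entry.

```go
package main

import (
	"fmt"
	"os/exec"
)

// replaceArgs builds the arguments for "go mod edit -replace=old=new".
func replaceArgs(module, localPath string) []string {
	return []string{"mod", "edit", "-replace=" + module + "=" + localPath}
}

// replaceCmd prepares (but does not run) the command in moduleDir.
func replaceCmd(moduleDir, module, localPath string) *exec.Cmd {
	cmd := exec.Command("go", replaceArgs(module, localPath)...)
	cmd.Dir = moduleDir
	return cmd
}

func main() {
	// Hypothetical layout: shim checkout next to a local core checkout.
	cmd := replaceCmd("./yunikorn-k8shim", "github.com/apache/yunikorn-core", "../yunikorn-core")
	fmt.Println(cmd.String()) // call cmd.Run() to actually apply the edit
}
```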
[jira] [Created] (YUNIKORN-2407) Update review guidelines link
Wilfred Spiegelenburg created YUNIKORN-2407: --- Summary: Update review guidelines link Key: YUNIKORN-2407 URL: https://issues.apache.org/jira/browse/YUNIKORN-2407 Project: Apache YuniKorn Issue Type: Improvement Components: website Reporter: Wilfred Spiegelenburg The coding guidelines link in the contribution guide points to the old location. Update the link at https://yunikorn.apache.org/community/coding_guidelines#the-basics to point to the Go wiki instead of the GitHub page: https://go.dev/wiki/CodeReviewComments -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Closed] (YUNIKORN-1333) [webapp] Expose current user quota usage details
[ https://issues.apache.org/jira/browse/YUNIKORN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-1333. --- > [webapp] Expose current user quota usage details > > > Key: YUNIKORN-1333 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1333 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: webapp >Reporter: Manikandan R >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-1333) [webapp] Expose current user quota usage details
[ https://issues.apache.org/jira/browse/YUNIKORN-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1333. - Target Version: (was: 1.5.0) Resolution: Duplicate Resolving as duplicate in favour of new Jira with a lot of history and discussion. > [webapp] Expose current user quota usage details > > > Key: YUNIKORN-1333 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1333 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: webapp >Reporter: Manikandan R >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2400) Upgrade docusaurus to 3.x
Wilfred Spiegelenburg created YUNIKORN-2400: --- Summary: Upgrade docusaurus to 3.x Key: YUNIKORN-2400 URL: https://issues.apache.org/jira/browse/YUNIKORN-2400 Project: Apache YuniKorn Issue Type: Improvement Components: website Reporter: Wilfred Spiegelenburg Docusaurus has released a major version update with updated dependencies. There are a lot of breaking changes in v3 and we need to follow the upgrade guide as things will most likely break: [https://docusaurus.io/docs/migration/v3] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2393) Upgrade codecov/codecov-action to v4
[ https://issues.apache.org/jira/browse/YUNIKORN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2393. - Fix Version/s: 1.5.0 Resolution: Fixed Token has been added to all 3 repos; pushes to master now upload coverage files. Thank you for the improvement. > Upgrade codecov/codecov-action to v4 > > > Key: YUNIKORN-2393 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2393 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - common, shim - kubernetes, webapp >Reporter: Yu-Lin Chen >Assignee: Yu-Lin Chen >Priority: Major > Labels: pull-request-available > Fix For: 1.5.0 > > > codecov/codecov-action@v4 was > [released|https://github.com/codecov/codecov-action/releases] on 2024/01/31. > The workaround change to support Node.js 20 in v3 was reverted on 2024/01/31. > ([v3 commits|https://github.com/codecov/codecov-action/commits/v3/]) > We should update codecov/codecov-action from v3 to v4 to suppress the > warning message below in the GitHub Actions workflow: > {code:json} > Node.js 16 actions are deprecated. Please update the following actions to use > Node.js 20: codecov/codecov-action@v3. For more information see: > https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Resolved] (YUNIKORN-2392) Concurrent map write in SchedulerCache
[ https://issues.apache.org/jira/browse/YUNIKORN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2392. - Fix Version/s: 1.5.0 Resolution: Fixed change committed > Concurrent map write in SchedulerCache > -- > > Key: YUNIKORN-2392 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2392 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Craig Condit >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.0 > > > While running the simple performance test with MockScheduler, the following > error was detected: > {noformat} > fatal error: concurrent map writes > goroutine 16 [running]: > github.com/apache/yunikorn-k8shim/pkg/cache/external.(*SchedulerCache).removeSchedulingTask(0xc0004261e0, > {0xc00d6be600?, 0xc00d6be600?}) > > /home/bacskop/repos/yunikorn-k8shim/pkg/cache/external/scheduler_cache.go:593 > +0x45 > github.com/apache/yunikorn-k8shim/pkg/cache/external.(*SchedulerCache).updatePod(0xc0004261e0, > 0xc034a96900) > > /home/bacskop/repos/yunikorn-k8shim/pkg/cache/external/scheduler_cache.go:511 > +0x8d1 > github.com/apache/yunikorn-k8shim/pkg/cache/external.(*SchedulerCache).UpdatePod(0xc0004261e0, > 0x1a?) > > /home/bacskop/repos/yunikorn-k8shim/pkg/cache/external/scheduler_cache.go:468 > +0xba > github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).updateYuniKornPod(0xc00012a090, > 0xc034a96900) > /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:306 +0x285 > github.com/apache/yunikorn-k8shim/pkg/cache.(*Context).UpdatePod(0xc00012a090, > {0xc0001da060?, 0xc0001da120?}, {0x1f92940?, 0xc034a96900?}) > /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:288 +0x2b9 > k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...) 
> > /home/bacskop/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/controller.go:246 > github.com/apache/yunikorn-k8shim/pkg/client.(*MockedAPIProvider).RunEventHandler.func1() > /home/bacskop/repos/yunikorn-k8shim/pkg/client/apifactory_mock.go:284 > +0x5da > created by > github.com/apache/yunikorn-k8shim/pkg/client.(*MockedAPIProvider).RunEventHandler > in goroutine 13 > /home/bacskop/repos/yunikorn-k8shim/pkg/client/apifactory_mock.go:240 > +0xda > {noformat} > We need locking inside {{SchedulerCache.addSchedulingTask()}}.
[jira] [Resolved] (YUNIKORN-150) Add a link on queue’s detail info page that links to the apps page to show running in this queue
[ https://issues.apache.org/jira/browse/YUNIKORN-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-150. Fix Version/s: 1.0.0 Resolution: Fixed This was delivered in YUNIKORN-955. > Add a link on queue’s detail info page that links to the apps page to show > running in this queue > > > Key: YUNIKORN-150 > URL: https://issues.apache.org/jira/browse/YUNIKORN-150 > Project: Apache YuniKorn > Issue Type: Bug > Components: webapp >Reporter: Weiwei Yang >Assignee: Akhil PB >Priority: Major > Fix For: 1.0.0 > > > A problem we often have: when we look at queues, we don't know which apps > are using the queue resources. > We could go back to the apps page and find the apps by going over all > apps, but that is time-consuming. It would be good if we could add a quick link > for this.
[jira] [Resolved] (YUNIKORN-474) Remove direct dependency of core internals from shim
[ https://issues.apache.org/jira/browse/YUNIKORN-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-474. Resolution: Abandoned > Remove direct dependency of core internals from shim > > > Key: YUNIKORN-474 > URL: https://issues.apache.org/jira/browse/YUNIKORN-474 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Priority: Major > > Internal core implementations are used in the shim for unit tests. > Unit tests should not depend on internal implementations of the core > structures. The parts of the core that are needed should be mocked rather than > called directly by the unit tests.
[jira] [Resolved] (YUNIKORN-690) [Umbrella] UI usability enhancements 2
[ https://issues.apache.org/jira/browse/YUNIKORN-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-690. Resolution: Delivered All tasks are done, closing. > [Umbrella] UI usability enhancements 2 > -- > > Key: YUNIKORN-690 > URL: https://issues.apache.org/jira/browse/YUNIKORN-690 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler, webapp >Reporter: Weiwei Yang >Assignee: Wen-Chien,Juan >Priority: Critical > > Continuous effort to improve the UI.
[jira] [Resolved] (YUNIKORN-1154) Add YuniKorn Release Procedure translation zh-cn
[ https://issues.apache.org/jira/browse/YUNIKORN-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1154. - Fix Version/s: 1.3.0 Resolution: Fixed This was done a long time ago, closing > Add YuniKorn Release Procedure translation zh-cn > > > Key: YUNIKORN-1154 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1154 > Project: Apache YuniKorn > Issue Type: Improvement > Components: documentation >Reporter: cdmikechen >Assignee: Xiang Chen >Priority: Major > Labels: pull-request-available > Fix For: 1.3.0 > >
[jira] [Resolved] (YUNIKORN-1688) Receiving the users and groups from core
[ https://issues.apache.org/jira/browse/YUNIKORN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1688. - Resolution: Duplicate Marking this as a duplicate of the newer Jira, which has far more detail in it. > Receiving the users and groups from core > > > Key: YUNIKORN-1688 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1688 > Project: Apache YuniKorn > Issue Type: Task > Components: webapp >Reporter: Chen Yu Teng >Assignee: Chen Yu Teng >Priority: Major > > Create the struct to receive the JSON from the following routes: > 1. "/ws/v1/partition/{partition}/usage/users" > 2. "/ws/v1/partition/{partition}/usage/groups"
[jira] [Resolved] (YUNIKORN-1752) Update karma as it prevents engine.io updates
[ https://issues.apache.org/jira/browse/YUNIKORN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1752. - Fix Version/s: 1.5.0 Resolution: Duplicate Done via YUNIKORN-2081 > Update karma as it prevents engine.io updates > - > > Key: YUNIKORN-1752 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1752 > Project: Apache YuniKorn > Issue Type: Task > Components: webapp >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Fix For: 1.5.0 > > > From the dependabot alert: > {code:java} > karma@6.3.20 requires engine.io@~6.1.0 via a transitive dependency on > socket.io@4.4.1{code} > Karma 6.3.20 is the latest in the 6.3 range; we need to move to 6.4 to fix this.