[jira] [Created] (YUNIKORN-2907) Queue config processing log spew
Wilfred Spiegelenburg created YUNIKORN-2907: --- Summary: Queue config processing log spew Key: YUNIKORN-2907 URL: https://issues.apache.org/jira/browse/YUNIKORN-2907 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg During configuration updates a shadow queue structure is built from the new configuration. The shadow structure is then walked and compared to the existing queue structure, and actions are taken based on that comparison: queues that exist only in the new structure are added, queues that exist only in the existing structure are removed, and queues that exist in both are updated if differences are found. While the shadow structure is being built, every queue creation is logged, so the creation of the whole queue structure ends up in the log. The log does not make clear that these queues are not really added and that only the shadow structure is being created. For large queue structures this causes log spew and makes the log difficult to read. The actions taken based on the comparison are logged clearly. We need to be able to distinguish between a real create and a shadow create in the log; the same code is executed when we create the "real" queue. The creation of the shadow queue structure should not log, should log only at debug level, and/or should log with a clear message that it is the shadow structure being created. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org For additional commands, e-mail: dev-h...@yunikorn.apache.org
[jira] [Created] (YUNIKORN-2886) update Spark operator documentation for YuniKorn integration
Wilfred Spiegelenburg created YUNIKORN-2886: --- Summary: update Spark operator documentation for YuniKorn integration Key: YUNIKORN-2886 URL: https://issues.apache.org/jira/browse/YUNIKORN-2886 Project: Apache YuniKorn Issue Type: New Feature Components: documentation Reporter: Wilfred Spiegelenburg Spark Operator 2.0 has been released with full YuniKorn support. We need to update the website and publicise this information. Spark Operator with YuniKorn details: * Support gang scheduling with YuniKorn * Set schedulerName to YuniKorn * Account for spark.executor.pyspark.memory in YuniKorn gang scheduling See the [Spark Operator v2.0.0|https://github.com/kubeflow/spark-operator/releases/tag/v2.0.0] tag for details.
[jira] [Created] (YUNIKORN-2840) sortQueues: fair max performance and correctness change
Wilfred Spiegelenburg created YUNIKORN-2840: --- Summary: sortQueues: fair max performance and correctness change Key: YUNIKORN-2840 URL: https://issues.apache.org/jira/browse/YUNIKORN-2840 Project: Apache YuniKorn Issue Type: Improvement Components: core - scheduler Reporter: Wilfred Spiegelenburg In YUNIKORN-2678 the fair queue sorting was improved to take guaranteed quota into account correctly. During the review two minor points were left over that need improving: * performance * correctness on change Currently {{GetFairMaxResource()}} gets called for each child, and each call does a recursive walk back up the queue hierarchy. This is a performance loss, especially when sorting a deep hierarchy or a large number of children. For a real fair comparison between the children, the parent details should also not change during the sort. When they do, as in the current implementation, two children might be sorted using different inputs.
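The performance and correctness points can both be addressed by resolving the parent's fair max once per sort and reusing it for every child. The sketch below assumes a hypothetical single-resource `Queue` (max of 0 means "not set") and sorts on usage share only; the real fair sort also weighs guaranteed quota and multiple resource types.

```go
package main

import "sort"

// Queue is a hypothetical single-resource model, not the yunikorn-core type.
// Max == 0 means no maximum is set on this queue.
type Queue struct {
	Name   string
	Parent *Queue
	Max    int64
	Used   int64
}

// lookups counts hierarchy walks so the saving is visible.
var lookups int

// fairMax walks up the hierarchy until it finds a queue with a maximum set.
func (q *Queue) fairMax() int64 {
	lookups++
	for cur := q; cur != nil; cur = cur.Parent {
		if cur.Max > 0 {
			return cur.Max
		}
	}
	return 0
}

// sortChildren orders children by usage share. The parent's fair max is
// computed once, so the hierarchy is walked once per sort and every child
// is compared against the same parent input.
func sortChildren(parent *Queue, children []*Queue) {
	parentMax := parent.fairMax()
	share := func(q *Queue) float64 {
		limit := parentMax
		if q.Max > 0 && (limit == 0 || q.Max < limit) {
			limit = q.Max // a smaller child max narrows the limit
		}
		if limit == 0 {
			return 0 // nothing to compare against
		}
		return float64(q.Used) / float64(limit)
	}
	sort.SliceStable(children, func(i, j int) bool {
		return share(children[i]) < share(children[j])
	})
}
```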
[jira] [Resolved] (YUNIKORN-2809) Fix layout of node transition diagram
[ https://issues.apache.org/jira/browse/YUNIKORN-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2809. - Fix Version/s: 1.6.0 Resolution: Fixed Thank you [~blue.tzuhua] for your first contribution to the SI repo. Committed the change. > Fix layout of node transition diagram > Key: YUNIKORN-2809 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2809 > Project: Apache YuniKorn > Issue Type: Improvement > Components: scheduler-interface > Reporter: Wilfred Spiegelenburg > Assignee: Tzu-Hua Lan > Priority: Trivial > Labels: pull-request-available > Fix For: 1.6.0 > Attachments: image-2024-08-16-15-57-12-928.png > Fix formatting of the node state transition diagram. It is missing white space and the diagram is not readable at the moment. Screenshot taken from the file after the [commit|https://github.com/apache/yunikorn-scheduler-interface/blob/38a38685cd4ee2d108f28f6e749ce06cf5db96ce/scheduler-interface-spec.md] > !image-2024-08-16-15-57-12-928.png|width=321,height=184!
[jira] [Resolved] (YUNIKORN-2838) SI: Update protobuf dependencies
[ https://issues.apache.org/jira/browse/YUNIKORN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2838. - Fix Version/s: 1.6.0 Resolution: Fixed protobuf and grpc updated to the current latest versions. > SI: Update protobuf dependencies > Key: YUNIKORN-2838 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2838 > Project: Apache YuniKorn > Issue Type: Task > Components: scheduler-interface > Reporter: Craig Condit > Assignee: Craig Condit > Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > Kubernetes 1.31.0 has moved to grpc v1.65.0 and protobuf v1.34.2 upstream. We should update our own dependencies in the scheduler interface to match.
[jira] [Created] (YUNIKORN-2809) Fix layout of node transition diagram
Wilfred Spiegelenburg created YUNIKORN-2809: --- Summary: Fix layout of node transition diagram Key: YUNIKORN-2809 URL: https://issues.apache.org/jira/browse/YUNIKORN-2809 Project: Apache YuniKorn Issue Type: Improvement Components: scheduler-interface Reporter: Wilfred Spiegelenburg Attachments: image-2024-08-16-15-57-12-928.png Fix formatting of the node state transition diagram. It is missing white space and the diagram is not readable at the moment. Screenshot taken from the file after the [commit|https://github.com/apache/yunikorn-scheduler-interface/blob/38a38685cd4ee2d108f28f6e749ce06cf5db96ce/scheduler-interface-spec.md] !image-2024-08-16-15-57-12-928.png|width=321,height=184!
[jira] [Resolved] (YUNIKORN-2098) Change go lint SHA detection (following)
[ https://issues.apache.org/jira/browse/YUNIKORN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2098. - Fix Version/s: 1.6.0 Resolution: Delivered As a side effect of all the clean-up work around the linter, we no longer use SHA detection because the code is now lint clean. The lint command has been updated as part of other changes to remove all SHA detection code. > Change go lint SHA detection (following) > Key: YUNIKORN-2098 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2098 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes > Reporter: Dong-Lin Hsieh > Assignee: Dong-Lin Hsieh > Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > Following https://issues.apache.org/jira/browse/YUNIKORN-285 > Currently, we always use the "ORIGIN/HEAD" ref. Fall back to "HEAD^" when "ORIGIN/HEAD" doesn't exist. This will avoid the 'fatal: Needed a single revision' error in forked repos.
[jira] [Created] (YUNIKORN-2796) Root queue and partition should not have resource types with 0 values
Wilfred Spiegelenburg created YUNIKORN-2796: --- Summary: Root queue and partition should not have resource types with 0 values Key: YUNIKORN-2796 URL: https://issues.apache.org/jira/browse/YUNIKORN-2796 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg When we register a node, the node's available resources get added to the partition and root queue. When we remove the node, the resources get removed again. Updates do something similar. When we no longer have any node that exposes a specific resource, we leave that resource type in the root queue and partition with a value of 0. It looks strange to have a maximum of 0 set for the partition or root, and it contradicts the documented quota interpretation. A resource we do not have at a certain point in time should not have a quota of 0 assigned in the root or partition.
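One way to read the request is: after subtracting a removed node's resources, drop any type whose total reaches zero instead of keeping it with a 0 value. A minimal sketch, assuming a hypothetical map-based `Resource` type and a hypothetical `removeNodeResources` helper (the yunikorn-core resource type and partition code differ):

```go
package main

// Resource is a hypothetical stand-in for a resource map (type -> quantity).
type Resource map[string]int64

// removeNodeResources subtracts a removed node's capacity from the
// partition/root total and drops every type whose quantity reaches zero,
// so no "0" maximum is left behind for a type no node exposes.
func removeNodeResources(total, node Resource) Resource {
	out := make(Resource, len(total))
	for name, qty := range total {
		if left := qty - node[name]; left != 0 {
			out[name] = left
		}
	}
	return out
}
```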
[jira] [Created] (YUNIKORN-2794) Resource: Change SubOnlyExisting() to same signature as AddOnlyExisting()
Wilfred Spiegelenburg created YUNIKORN-2794: --- Summary: Resource: Change SubOnlyExisting() to same signature as AddOnlyExisting() Key: YUNIKORN-2794 URL: https://issues.apache.org/jira/browse/YUNIKORN-2794 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg The AddOnlyExisting function takes two resource objects and returns a new object. The SubOnlyExisting method is called on a resource receiver and modifies the receiver object. These two should use the same kind of signature: take two resource objects and return a new object. In most use cases of SubOnlyExisting we clone the resource before the call, inside a locked method on the object that contains the resource. This clone becomes obsolete when we make the change.
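A sketch of the symmetric signatures the issue asks for, using a hypothetical map-based `Resource` rather than the yunikorn-core struct. It assumes "only existing" means types not present in the first argument are ignored; because both functions return a new object, callers no longer need to clone before subtracting.

```go
package main

// Resource is a hypothetical simplified resource map (type -> quantity).
type Resource map[string]int64

// AddOnlyExisting returns base plus delta as a new object, ignoring any type
// in delta that base does not define. Neither input is modified.
func AddOnlyExisting(base, delta Resource) Resource {
	out := make(Resource, len(base))
	for name, qty := range base {
		out[name] = qty + delta[name]
	}
	return out
}

// SubOnlyExisting mirrors AddOnlyExisting: two inputs, one new result, so the
// caller's resource is never mutated and no defensive clone is needed.
func SubOnlyExisting(base, delta Resource) Resource {
	out := make(Resource, len(base))
	for name, qty := range base {
		out[name] = qty - delta[name]
	}
	return out
}
```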
[jira] [Created] (YUNIKORN-2790) GPU node restart could leave root queue always out of quota
Wilfred Spiegelenburg created YUNIKORN-2790: --- Summary: GPU node restart could leave root queue always out of quota Key: YUNIKORN-2790 URL: https://issues.apache.org/jira/browse/YUNIKORN-2790 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg On a node restart, the pods assigned to and running on the node are not checked against the quota of the queue(s) they run in. There are multiple reasons for this: pods on a node that were scheduled by YuniKorn and are already running must not be rejected, as rejecting them could cause lots of side effects. The combination of a node restart and the reconfiguration of a GPU driver could, however, cause a secondary issue. On restart the node might not expose the GPU resource yet, while pods that ran before the restart can be using it. After those pods are added, ignoring quotas, the root queue shows usage for a resource that has not been registered yet. This prevents all scheduling from progressing, even for pods that do not request the GPU resource: each scheduling action checks the root queue quota and fails. That in turn prevents the GPU driver pods from being placed and the GPU from being registered by the node.
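The failure mode can be reproduced with a small model. The sketch assumes a hypothetical root check, `fitsQuota`, that requires every type in the current usage plus the ask to fit in the quota and treats a type missing from the quota as 0. It only illustrates the problem described above, not the fix that was committed.

```go
package main

// Resource is a hypothetical simplified resource map (type -> quantity).
type Resource map[string]int64

// fitsQuota is a hypothetical root queue check: every type in usage plus
// ask must fit, and a type missing from the quota counts as a quota of 0.
func fitsQuota(quota, used, ask Resource) bool {
	types := make(map[string]bool, len(used)+len(ask))
	for name := range used {
		types[name] = true
	}
	for name := range ask {
		types[name] = true
	}
	for name := range types {
		// Missing keys read as 0, so restored GPU usage with no registered
		// GPU capacity fails here for every ask, GPU or not.
		if used[name]+ask[name] > quota[name] {
			return false
		}
	}
	return true
}
```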
[jira] [Created] (YUNIKORN-2789) Queue internalGetMax should not use permissive calculator
Wilfred Spiegelenburg created YUNIKORN-2789: --- Summary: Queue internalGetMax should not use permissive calculator Key: YUNIKORN-2789 URL: https://issues.apache.org/jira/browse/YUNIKORN-2789 Project: Apache YuniKorn Issue Type: Bug Components: core - common Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg We have documented for queue resources that: {quote}Resources that are not specified in the list are not limited, for max resources, or guaranteed in the case of guaranteed resources. {quote} However, the queue implementation, internalGetMax, calls resources.ComponentWiseMin(). This returns a 0 value for each type that is missing from either of the two resources passed in. That does not line up with the documentation. Example of getting the maximum resources of a queue using GetMaxQueueSet, and what I would expect based on the documentation: {code:java} parent: max{memory: 100G} parent.child: max{vcore: 100} => result child max{memory: 100G, vcore: 100}{code} Currently we get: {code:java} parent: max{memory: 100G} parent.child: max{vcore: 100} => result child max{memory: 0, vcore: 0}{code} Similarly, when we add the root and call GetMaxResource: {code:java} root: max{memory: 100G, vcore: 200} root.parent: max{vcore: 100} root.parent.child: max{nvidia.com/gpu: 10} => result parent max{memory: 0, vcore: 100} => result child max{memory: 0, vcore: 0, nvidia.com/gpu: 0}{code} The fact that a resource type does not exist, even in the root, should not mean it is set to zero. The nodes that expose the specific resource might not have been registered or scaled up yet.
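The difference between the current and the documented behaviour can be expressed as two minimum functions over a hypothetical map-based `Resource`. `componentWiseMinZero` models the behaviour the issue reports for ComponentWiseMin (a type missing on either side becomes 0); `componentWiseMinUnset` follows the documentation (missing means "not limited", so the other side's value is kept). Both names are made up for this sketch, and the tests replay the issue's own examples.

```go
package main

// Resource is a hypothetical simplified resource map (type -> quantity).
type Resource map[string]int64

func min64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

// componentWiseMinZero models the reported behaviour: every type from either
// side is present in the result, and a type missing on one side yields 0.
func componentWiseMinZero(a, b Resource) Resource {
	out := make(Resource, len(a)+len(b))
	for name := range a {
		out[name] = 0
	}
	for name := range b {
		out[name] = 0
	}
	for name := range out {
		// A missing key reads as 0, and min(0, x) is 0 for any x >= 0.
		out[name] = min64(a[name], b[name])
	}
	return out
}

// componentWiseMinUnset follows the documented semantics: a type missing on
// one side is not limited there, so the other side's value is kept.
func componentWiseMinUnset(a, b Resource) Resource {
	out := make(Resource, len(a)+len(b))
	for name, qty := range a {
		out[name] = qty
	}
	for name, qty := range b {
		if cur, ok := out[name]; !ok || qty < cur {
			out[name] = qty
		}
	}
	return out
}
```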
[jira] [Created] (YUNIKORN-2782) Cleanup dead code in cache/context
Wilfred Spiegelenburg created YUNIKORN-2782: --- Summary: Cleanup dead code in cache/context Key: YUNIKORN-2782 URL: https://issues.apache.org/jira/browse/YUNIKORN-2782 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg In the cache context we have a number of functions that are only called from tests, or not called at all. We need to clean up and keep only one version: * RemoveApplication & RemoveApplicationInternal: we should only have RemoveApplication, but the internal version is used everywhere * UpdateApplication is not used at all
[jira] [Resolved] (YUNIKORN-2708) Release notes for 1.5.2
[ https://issues.apache.org/jira/browse/YUNIKORN-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2708. - Fix Version/s: 1.5.2 Resolution: Fixed Release is done. > Release notes for 1.5.2 > Key: YUNIKORN-2708 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2708 > Project: Apache YuniKorn > Issue Type: Sub-task > Reporter: Peter Bacsko > Assignee: Peter Bacsko > Priority: Major > Labels: pull-request-available, release > Fix For: 1.5.2
[jira] [Resolved] (YUNIKORN-2709) Update website for 1.5.2
[ https://issues.apache.org/jira/browse/YUNIKORN-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2709. - Fix Version/s: 1.5.2 Resolution: Fixed Release is done. > Update website for 1.5.2 > Key: YUNIKORN-2709 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2709 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: release > Reporter: Peter Bacsko > Assignee: Peter Bacsko > Priority: Major > Labels: pull-request-available > Fix For: 1.5.2
[jira] [Resolved] (YUNIKORN-2738) Only check failure reason once not for every pod
[ https://issues.apache.org/jira/browse/YUNIKORN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2738. - Fix Version/s: 1.6.0 Resolution: Fixed > Only check failure reason once not for every pod > Key: YUNIKORN-2738 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2738 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes > Reporter: Wilfred Spiegelenburg > Assignee: Wilfred Spiegelenburg > Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > The reason for an application failure does not change and can be pre-calculated for all pods when a failure is handled.
[jira] [Created] (YUNIKORN-2738) Only check failure reason once not for every pod
Wilfred Spiegelenburg created YUNIKORN-2738: --- Summary: Only check failure reason once not for every pod Key: YUNIKORN-2738 URL: https://issues.apache.org/jira/browse/YUNIKORN-2738 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The reason for an application failure does not change and can be pre-calculated for all pods when a failure is handled.
[jira] [Created] (YUNIKORN-2737) Cleanup handleFailApplicationEvent handling
Wilfred Spiegelenburg created YUNIKORN-2737: --- Summary: Cleanup handleFailApplicationEvent handling Key: YUNIKORN-2737 URL: https://issues.apache.org/jira/browse/YUNIKORN-2737 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg When we handle a failed application in the shim in {{handleFailApplicationEvent()}}, we call the placeholder cleanup. There are three issues: * The cleanup needs the app lock after it takes the manager lock, and the app lock is already held when we process the event. The cleanup should be placed last so the manager lock is not held for longer than needed. * Failing an application is triggered by the core, which should already do the cleanup, so this call might be redundant to start with. * The failure handling also marks unassigned pods as failed, which means there is an overlap between the failure handling and the placeholder cleanup that we should remove: either ignore all placeholders in the failure handling or only clean up assigned placeholders.
[jira] [Resolved] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2734. - Fix Version/s: 1.6.0 Resolution: Delivered The TODO was removed as part of the changes in YUNIKORN-2729. Since we do not want to make this configurable, that is all we need; closing again with a link to the Jira that has the change. > make configurable for pods in k8shim pkg/client/kubeclient.go > Key: YUNIKORN-2734 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2734 > Project: Apache YuniKorn > Issue Type: Improvement > Reporter: Huang Guan Hao > Priority: Trivial > Labels: newbie > Fix For: 1.6.0 > Remove the //TODO comment in pkg/client/kubeclient.go: https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141 > Make the grace period for pod deletion configurable. Currently, the grace period for deleting pods is hardcoded to 3 seconds. This might not be suitable for all use cases, as some pods might require more time to shut down gracefully. In the future, this value should be made configurable, either through a function parameter, a configuration file, or an environment variable, to provide more flexibility and accommodate different scenarios.
[jira] [Reopened] (YUNIKORN-2734) make configurable for pods in k8shim pkg/client/kubeclient.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg reopened YUNIKORN-2734: > make configurable for pods in k8shim pkg/client/kubeclient.go > Key: YUNIKORN-2734 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2734 > Project: Apache YuniKorn > Issue Type: Improvement > Reporter: Huang Guan Hao > Priority: Minor > Remove the //TODO comment in pkg/client/kubeclient.go: https://github.com/rich7420/yunikorn-k8shim/blob/a5e875086fc8fd7e2bf4cdddf532d768d7a5f54c/pkg/client/kubeclient.go#L141 > Make the grace period for pod deletion configurable. Currently, the grace period for deleting pods is hardcoded to 3 seconds. This might not be suitable for all use cases, as some pods might require more time to shut down gracefully. In the future, this value should be made configurable, either through a function parameter, a configuration file, or an environment variable, to provide more flexibility and accommodate different scenarios.
[jira] [Resolved] (YUNIKORN-2705) DOAP is malformed
[ https://issues.apache.org/jira/browse/YUNIKORN-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2705. - Fix Version/s: 1.5.2 Resolution: Fixed Committed the change. An online validator can be found [here|https://www.w3.org/RDF/Validator/]. The validator fails before the PR is applied and passes after. The format is defined [here|https://www.w3.org/RDF/]; we could leverage the Python libraries in our release tool to check the content, or even automate the update of the DOAP file. Leaving that open for others to pursue if considered relevant. > DOAP is malformed > Key: YUNIKORN-2705 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2705 > Project: Apache YuniKorn > Issue Type: Improvement > Reporter: Arnout Engelen > Assignee: Arnout Engelen > Priority: Major > Fix For: 1.5.2 > You cannot have multiple 'Version' nodes under one 'release' property in RDF/XML.
[jira] [Resolved] (YUNIKORN-2665) Gang app originator pod changes after restart
[ https://issues.apache.org/jira/browse/YUNIKORN-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2665. - Fix Version/s: 1.6.0, 1.5.2 Resolution: Fixed Changes have been committed and backported into the 1.5 branch; closing. > Gang app originator pod changes after restart > Key: YUNIKORN-2665 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2665 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes > Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1 > Reporter: Manikandan R > Assignee: Manikandan R > Priority: Critical > Labels: pull-request-available > Fix For: 1.6.0, 1.5.2 > A gang app chooses the first pod (the one that created the app) as the originator pod, which later becomes the real driver pod. While a gang app is being processed, specifically after placeholder creation and during replacement, a restart can lead to the incorrect behaviour described below. During restore there is no guarantee on the ordering of pods coming from the K8s lister, especially when all the pods were created with the same timestamp: K8s uses second-based timestamps, so all pods created within the same second have the same timestamp. In this situation, YK designates whichever pod comes first from the lister as the originator pod. Any placeholder could therefore become the originator pod, and the actual originator pod is lost. This change could cause rippling effects leading to weird behaviour and needs to be fixed.
[jira] [Created] (YUNIKORN-2672) Upgrade to K8s 1.29.6
Wilfred Spiegelenburg created YUNIKORN-2672: --- Summary: Upgrade to K8s 1.29.6 Key: YUNIKORN-2672 URL: https://issues.apache.org/jira/browse/YUNIKORN-2672 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg A major performance regression was fixed in K8s that, on analysis, mainly impacts the plugin implementation. The regression is part of release 1.29.4, which we currently build against. See [https://github.com/kubernetes/kubernetes/pull/125197] for details.
[jira] [Created] (YUNIKORN-2655) Cleanup REST API documentation
Wilfred Spiegelenburg created YUNIKORN-2655: --- Summary: Cleanup REST API documentation Key: YUNIKORN-2655 URL: https://issues.apache.org/jira/browse/YUNIKORN-2655 Project: Apache YuniKorn Issue Type: Task Components: documentation Reporter: Wilfred Spiegelenburg The REST API documentation is not up to date with the current behaviour: it does not show any of the 400 or 404 errors returned by a number of API calls. The error response section only shows a 500 code with the same message for each call. We should move to a simple list for each call showing the applicable errors, like this: {code:java} ### Error responses **Code** : `400 Bad Request` (URL query is invalid, missing partition name) **Code** : `404 Not Found` (Partition not found) **Code** : `500 Internal Server Error` {code} Remove the error examples, as they do not add any required detail.
[jira] [Created] (YUNIKORN-2654) Remove unused code in k8shim context
Wilfred Spiegelenburg created YUNIKORN-2654: --- Summary: Remove unused code in k8shim context Key: YUNIKORN-2654 URL: https://issues.apache.org/jira/browse/YUNIKORN-2654 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg The NotifyApplicationComplete and NotifyApplicationFail functions are not called by anything and are unused code. The K8shim does not trigger application completion or failure; that is triggered by the core when the application no longer has any activity registered.
[jira] [Created] (YUNIKORN-2653) Gang scheduling K8s event formatting compliance
Wilfred Spiegelenburg created YUNIKORN-2653: --- Summary: Gang scheduling K8s event formatting compliance Key: YUNIKORN-2653 URL: https://issues.apache.org/jira/browse/YUNIKORN-2653 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg K8s provides definitions and rules for the content of the fields within an event. Adjust the content of gang scheduling related events to comply with these rules, focusing on the reason and action fields only: * 'reason' is the reason this event is generated. 'reason' should be short and unique; it should be in UpperCamelCase format (starting with a capital letter). * 'action' explains what happened with regarding/what action did the ReportingController take in regarding's name; it should be in UpperCamelCase format (starting with a capital letter). No spaces or long text.
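A small check like the one below could be used in unit tests to keep gang scheduling event reasons and actions compliant. `isUpperCamelCase` is a hypothetical helper (not part of client-go), and it only encodes the quoted rule: starts with a capital letter, letters and digits only, no spaces. The example values in the test are made up.

```go
package main

import "unicode"

// isUpperCamelCase reports whether s starts with an upper-case letter and
// contains only letters and digits (so no spaces or punctuation).
func isUpperCamelCase(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if i == 0 && !unicode.IsUpper(r) {
			return false
		}
		if !unicode.IsLetter(r) && !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}
```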
[jira] [Created] (YUNIKORN-2648) Add deadlock detection config to the configmap
Wilfred Spiegelenburg created YUNIKORN-2648: --- Summary: Add deadlock detection config to the configmap Key: YUNIKORN-2648 URL: https://issues.apache.org/jira/browse/YUNIKORN-2648 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg The current deadlock detection is configured using environment variables. That requires a change of the image and a restart of the scheduler to take effect, and it is not easy to maintain. We should be using the yunikorn-defaults configmap for the settings. We want a default set, turned off, for production use cases. Making the settings loadable from the configmap, however, makes turning detection on easier: update the configmap and restart the scheduler to turn the detection on or off.
[jira] [Created] (YUNIKORN-2647) Flaky test TestUpdateNodeCapacity
Wilfred Spiegelenburg created YUNIKORN-2647: --- Summary: Flaky test TestUpdateNodeCapacity Key: YUNIKORN-2647 URL: https://issues.apache.org/jira/browse/YUNIKORN-2647 Project: Apache YuniKorn Issue Type: Bug Components: test - unit Reporter: Wilfred Spiegelenburg As we saw in YUNIKORN-2573, the single node update test might fail: {code:java} --- FAIL: TestUpdateNodeCapacity (0.03s) operation_test.go:446: Expected partition resource map[memory:1 vcore:2], doesn't match with actual partition resource map[memory:1 vcore:2]{code} When updating the node capacity we calculate the delta resources and use that delta to update the partition resources. As with multiple nodes, the test fails when the calls happen in the following order: node.SetCapacity() -> waitForAvailableNodeResource() -> partitionInfo.GetTotalPartitionResource() -> partition.updatePartitionResource()
[jira] [Created] (YUNIKORN-2638) Simplify finalizeNodes and finalizePods
Wilfred Spiegelenburg created YUNIKORN-2638: --- Summary: Simplify finalizeNodes and finalizePods Key: YUNIKORN-2638 URL: https://issues.apache.org/jira/browse/YUNIKORN-2638 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg In finalizeNodes and finalizePods a map is created to store the newly retrieved pods and nodes. The map is only used for lookups; the pod and node objects themselves are not used. Instead of storing the objects, the maps could store a boolean value. This also simplifies the later existence check for a node or pod to a single map lookup. We should also size the map to the length of the retrieved node or pod list, to prevent any re-allocation while the map is filled.
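The suggested shape for finalizePods (and, analogously, finalizeNodes) can be sketched with a presence map of booleans, sized up front. The `Pod` struct keyed by a UID string and both helper names are hypothetical simplifications; the shim works with Kubernetes objects.

```go
package main

// Pod is a hypothetical stand-in carrying only the key the lookup needs.
type Pod struct {
	UID string
}

// existingPodUIDs builds a presence set from the freshly listed pods. It
// stores a bool instead of the pod object and is sized to the list length,
// so the map does not re-allocate while it is filled.
func existingPodUIDs(pods []*Pod) map[string]bool {
	exists := make(map[string]bool, len(pods))
	for _, pod := range pods {
		exists[pod.UID] = true
	}
	return exists
}

// stalePods returns previously known pods that are no longer listed; each
// existence check is a single map lookup.
func stalePods(known, listed []*Pod) []*Pod {
	exists := existingPodUIDs(listed)
	var stale []*Pod
	for _, pod := range known {
		if !exists[pod.UID] {
			stale = append(stale, pod)
		}
	}
	return stale
}
```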
[jira] [Created] (YUNIKORN-2637) finalizePods should ignore pods like registerPods does
Wilfred Spiegelenburg created YUNIKORN-2637: --- Summary: finalizePods should ignore pods like registerPods does Key: YUNIKORN-2637 URL: https://issues.apache.org/jira/browse/YUNIKORN-2637 Project: Apache YuniKorn Issue Type: Bug Components: shim - kubernetes Reporter: Wilfred Spiegelenburg The initialisation code is a two-step process for pods. The first step lists all pods and adds them to the system in registerPods(), which returns the list of processed pods. The second step happens after the event handlers are turned on and the nodes have been cleaned up. During the second step the pods from the first step are checked and removed. However, pods that were already in a terminated state in step 1 get removed again. Although the step should be idempotent, this is unneeded. When iterating over the existing pods, any pod in a terminal state should be skipped.
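This sketch of the skip logic assumes a simplified `Pod` type with a phase string. `isTerminated` and `podsToFinalize` are hypothetical names, not the shim's functions. Pods in the `Succeeded` or `Failed` phase are treated as terminal, matching the Kubernetes pod phases.

```go
package main

// PodPhase mirrors the Kubernetes pod phase strings (simplified, assumption).
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// Pod is a simplified stand-in for v1.Pod.
type Pod struct {
	Name  string
	Phase PodPhase
}

// isTerminated reports whether a pod has reached a terminal phase.
func isTerminated(p *Pod) bool {
	return p.Phase == PodSucceeded || p.Phase == PodFailed
}

// podsToFinalize returns the names of pods from the first step that still need
// checking in the second step. Terminal pods were already handled, so skip them.
func podsToFinalize(existing []*Pod) []string {
	names := make([]string, 0, len(existing))
	for _, p := range existing {
		if isTerminated(p) {
			continue
		}
		names = append(names, p.Name)
	}
	return names
}
```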
[jira] [Created] (YUNIKORN-2630) Release context lock in shim when processing config in the core
Wilfred Spiegelenburg created YUNIKORN-2630: --- Summary: Release context lock in shim when processing config in the core Key: YUNIKORN-2630 URL: https://issues.apache.org/jira/browse/YUNIKORN-2630 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg When a change comes in for the configmaps, we process the change under the context lock because we need to merge the two configmaps. We keep holding this lock even after all the work in the shim is done and processing has been transferred to the core. This is unneeded, as the core has its own locking and serialisation of the changes.
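This sketch narrows the lock scope. It assumes a hypothetical `shimContext` with a `configs` map and an `updateCore` callback standing in for the call into the core. The merge runs under the lock. The core call happens only after the lock is released and relies on the core's own serialisation.

```go
package main

import "sync"

// shimContext is a hypothetical stand-in for the shim context.
type shimContext struct {
	lock    sync.Mutex
	configs map[string]string
	// updateCore stands in for handing the merged config to the core,
	// which does its own locking.
	updateCore func(merged string) error
}

// updateConfig merges under the context lock, then calls the core without it.
func (c *shimContext) updateConfig(name, data string) error {
	merged := c.mergeLocked(name, data)
	return c.updateCore(merged)
}

// mergeLocked holds the lock only for the shim-side merge.
func (c *shimContext) mergeLocked(name, data string) string {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.configs[name] = data
	// hypothetical merge: combine the two configmaps in a fixed order
	return c.configs["default"] + "\n" + c.configs["override"]
}
```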
[jira] [Resolved] (YUNIKORN-2628) fix release announcement links
[ https://issues.apache.org/jira/browse/YUNIKORN-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2628. - Fix Version/s: 1.6.0 Resolution: Fixed links are fixed after removing the {{..}} from the path > fix release announcement links > -- > > Key: YUNIKORN-2628 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2628 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > Labels: pull-request-available > Fix For: 1.6.0 > > > In YUNIKORN-2595 a regression snuck in breaking the links to the release > announcements. > Need to reverse that path change for the release announcements.
[jira] [Resolved] (YUNIKORN-2627) Add K8s 1.30 to the e2e matrix
[ https://issues.apache.org/jira/browse/YUNIKORN-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2627. - Fix Version/s: 1.6.0 Resolution: Fixed Upgraded kind to version 0.23 and added 1.30 as a new version to test with. > Add K8s 1.30 to the e2e matrix > -- > > Key: YUNIKORN-2627 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2627 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Wilfred Spiegelenburg >Assignee: Tseng Hsi-Huang >Priority: Major > Labels: newbie, pull-request-available > Fix For: 1.6.0 > > > k8s 1.30 support in kind is now available as part of the [0.23 > release|https://github.com/kubernetes-sigs/kind/releases/tag/v0.23.0] > Need to add 1.30 to the matrix for the next release
[jira] [Created] (YUNIKORN-2628) fix release announcement links
Wilfred Spiegelenburg created YUNIKORN-2628: --- Summary: fix release announcement links Key: YUNIKORN-2628 URL: https://issues.apache.org/jira/browse/YUNIKORN-2628 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg In YUNIKORN-2596 a regression snuck in breaking the links to the release announcements. Need to reverse that path change for the release announcements.
[jira] [Created] (YUNIKORN-2627) Add K8s 1.30 to the e2e matrix
Wilfred Spiegelenburg created YUNIKORN-2627: --- Summary: Add K8s 1.30 to the e2e matrix Key: YUNIKORN-2627 URL: https://issues.apache.org/jira/browse/YUNIKORN-2627 Project: Apache YuniKorn Issue Type: Improvement Reporter: Wilfred Spiegelenburg k8s 1.30 support in kind is now available as part of the [0.23 release|https://github.com/kubernetes-sigs/kind/releases/tag/v0.23.0] Need to add 1.30 to the matrix for the next release
[jira] [Resolved] (YUNIKORN-2531) Create unit tests for AsyncRMCallback
[ https://issues.apache.org/jira/browse/YUNIKORN-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2531. - Fix Version/s: 1.6.0 Resolution: Fixed new tests added to the system to improve coverage > Create unit tests for AsyncRMCallback > - > > Key: YUNIKORN-2531 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2531 > Project: Apache YuniKorn > Issue Type: Test > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > There are no unit tests for the {{AsyncRMCallback}} type in the shim > (scheduler_callback.go). It's tested indirectly but we have no idea about the > coverage or how it behaves in rare scenarios. > At least longer methods such as {{UpdateApplication()}}, > {{UpdateAllocation()}} and {{UpdateNode()}} should be covered.
[jira] [Resolved] (YUNIKORN-2615) Remove named returns from predicate_manager.go
[ https://issues.apache.org/jira/browse/YUNIKORN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2615. - Fix Version/s: 1.6.0 Resolution: Fixed refactor committed to master for 1.6.0 > Remove named returns from predicate_manager.go > -- > > Key: YUNIKORN-2615 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2615 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > Predicate manager has defined named returns on some functions but does not > use them. They should be removed as the way they are used can cause issues > that are hard to debug.
[jira] [Created] (YUNIKORN-2618) Streamline AsyncRMCallback UpdateAllocation
Wilfred Spiegelenburg created YUNIKORN-2618: --- Summary: Streamline AsyncRMCallback UpdateAllocation Key: YUNIKORN-2618 URL: https://issues.apache.org/jira/browse/YUNIKORN-2618 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg If a task is not found, {{context.getTask}} returns nil. In the {{response.New}} processing we should just log that fact and proceed to the next allocation. This simplifies the flow, as we never need to check for a nil task afterwards. We should never have a pod in the cache that does not exist as a task on an application. We retrieve the application using the application ID from the response but never use the object; we only use the application ID to pass into an event. The context event handler then does the exact same lookup again to process the event on the app. We need to become much smarter in this area: we do double or triple lookups and generate async events that just change the state of the app or task, or kick off another event.
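This sketch shows the simplified loop. It assumes simplified `Task` and `Allocation` types and a `getTask` callback that returns nil for unknown tasks; all of these are hypothetical stand-ins for the shim's types. The nil case is logged and skipped. The application ID is taken from the task, so no separate application lookup is needed.

```go
package main

import "log"

// Task is a simplified stand-in for the shim task (assumption).
type Task struct {
	ID    string
	AppID string
}

// Allocation is a simplified stand-in for an entry in response.New.
type Allocation struct {
	AllocationKey string
}

// processNew handles new allocations. A nil task is logged and skipped, so the
// rest of the loop body never has to check for nil. It returns "appID/taskID"
// for every handled allocation, to make the behaviour observable.
func processNew(allocs []Allocation, getTask func(key string) *Task) []string {
	var handled []string
	for _, alloc := range allocs {
		task := getTask(alloc.AllocationKey)
		if task == nil {
			log.Printf("task not found for allocation %s, skipping", alloc.AllocationKey)
			continue
		}
		// the application ID comes from the task itself
		handled = append(handled, task.AppID+"/"+task.ID)
	}
	return handled
}
```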
[jira] [Created] (YUNIKORN-2616) Remove unused bool return from PreemptionPredicates()
Wilfred Spiegelenburg created YUNIKORN-2616: --- Summary: Remove unused bool return from PreemptionPredicates() Key: YUNIKORN-2616 URL: https://issues.apache.org/jira/browse/YUNIKORN-2616 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg The predicate manager method {{PreemptionPredicates()}} returns two values: an int and a boolean. The boolean is false if the integer is -1 and true for 0 or larger. There is no need for the boolean, as the -1 already indicates the same.
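This sketch shows the single-return form. It uses a hypothetical `preemptionIndex` function in place of `PreemptionPredicates()`. The victim-size logic is made up purely to produce an index and is not how the shim evaluates predicates. Callers test `idx < 0` instead of a separate boolean.

```go
package main

// preemptionIndex returns the index of the last victim needed to free enough
// room, or -1 when even all victims together are not enough. The -1 sentinel
// carries the same information the extra bool used to carry.
func preemptionIndex(victimSizes []int, needed int) int {
	freed := 0
	for i, size := range victimSizes {
		freed += size
		if freed >= needed {
			return i
		}
	}
	return -1
}
```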
[jira] [Created] (YUNIKORN-2615) Remove named returns from predicate_manager.go
Wilfred Spiegelenburg created YUNIKORN-2615: --- Summary: Remove named returns from predicate_manager.go Key: YUNIKORN-2615 URL: https://issues.apache.org/jira/browse/YUNIKORN-2615 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Predicate manager has defined named returns on some functions but does not use them. They should be removed as the way they are used can cause issues that are hard to debug.
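This small example shows one way unused named returns can cause hard-to-debug issues. The functions are hypothetical and do not come from `predicate_manager.go`. A `:=` inside a nested block silently shadows the named results, and the bare `return` then returns the zero values.

```go
package main

import "strconv"

// parseNamed uses named results. Inside the if block, := declares new
// variables that shadow both named results, so the parsed value is lost and
// the bare return at the end returns the zero values.
func parseNamed(s string) (value int, err error) {
	if s != "" {
		value, err := strconv.Atoi(s)
		if err != nil {
			return 0, err
		}
		_ = value // only the shadowing copy holds the parsed value
	}
	return
}

// parseExplicit returns its results explicitly, so nothing can be lost.
func parseExplicit(s string) (int, error) {
	if s == "" {
		return 0, nil
	}
	return strconv.Atoi(s)
}
```

`parseNamed("42")` compiles without complaint but returns `0`, while `parseExplicit("42")` returns `42`.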
[jira] [Resolved] (YUNIKORN-2601) Update kindest/node: v1.29.1 to v1.29.2, v1.28.6 to v1.28.7, v1.27.10 to v1.27.11, v1.26.13 -> v1.26.14
[ https://issues.apache.org/jira/browse/YUNIKORN-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2601. - Fix Version/s: 1.6.0 Resolution: Fixed Changes committed. No kind support for 1.30 is available yet; we should log a new Jira to add it later. > Update kindest/node: v1.29.1 to v1.29.2, v1.28.6 to v1.28.7, v1.27.10 to > v1.27.11, v1.26.13 -> v1.26.14 > > > Key: YUNIKORN-2601 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2601 > Project: Apache YuniKorn > Issue Type: Improvement > Components: test - e2e >Reporter: Chia-Ping Tsai >Assignee: Hsien-Cheng(Ryan) Huang >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > as title
[jira] [Resolved] (YUNIKORN-2591) Document placement rules always
[ https://issues.apache.org/jira/browse/YUNIKORN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2591. - Fix Version/s: 1.5.1 1.5.0 1.4.0 Resolution: Fixed Change made to the docs going back to 1.4.0 and 1.5.0. It will also be part of the 1.5.1 release. > Document placement rules always > --- > > Key: YUNIKORN-2591 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2591 > Project: Apache YuniKorn > Issue Type: Improvement > Components: documentation >Reporter: Wilfred Spiegelenburg >Assignee: Hsien-Cheng(Ryan) Huang >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.1, 1.5.0, 1.4.0 > > > The current [doc > says|https://yunikorn.apache.org/docs/user_guide/queue_config#placement-rules]: > {quote}If no rules are defined the placement manager is not started and each > application _must_ have a queue set on submit. > {quote} > This is not correct: we moved to placement rules always in YUNIKORN-1793 in > YuniKorn 1.4. The documentation needs to be updated to reflect that.
[jira] [Resolved] (YUNIKORN-2596) Enhance layout for release announcements
[ https://issues.apache.org/jira/browse/YUNIKORN-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2596. - Fix Version/s: 1.5.1 Resolution: Fixed Fixed and published: the changes were applied to the 1.5.0 layout before the 1.5.1 release. Marking as fixed in 1.5.1. > Enhance layout for release announcements > > > Key: YUNIKORN-2596 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2596 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.5.1 > > Attachments: release_announce.png, releasee_announce_updated.png > > > The current release announcements page lacks a decent layout. The page is > generated during the build based on the directory content. > Some simple updates would make the page more readable.
[jira] [Resolved] (YUNIKORN-2595) Fix download page links
[ https://issues.apache.org/jira/browse/YUNIKORN-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2595. - Fix Version/s: 1.5.1 Resolution: Fixed Download page fixed for 1.5.0 and deployed before the 1.5.1 release. Marking as fixed in 1.5.1. > Fix download page links > --- > > Key: YUNIKORN-2595 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2595 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.5.1 > > > The download links must follow a specific set of rules as specified > [here|https://infra.apache.org/release-download-pages.html]. > We currently do not set the correct download link for the source package. We > dropped the closer.lua resolution for the content network in one of the > releases. With the next release, 1.5.1, coming up we need to fix this.
[jira] [Created] (YUNIKORN-2595) Fix download page links
Wilfred Spiegelenburg created YUNIKORN-2595: --- Summary: Fix download page links Key: YUNIKORN-2595 URL: https://issues.apache.org/jira/browse/YUNIKORN-2595 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The download links must follow a specific set of rules as specified [here|https://infra.apache.org/release-download-pages.html]. We currently do not set the correct download link for the source package. We dropped the closer.lua resolution for the content network in one of the releases. With the next release, 1.5.1, coming up we need to fix this.
[jira] [Created] (YUNIKORN-2591) Document placement rules always
Wilfred Spiegelenburg created YUNIKORN-2591: --- Summary: Document placement rules always Key: YUNIKORN-2591 URL: https://issues.apache.org/jira/browse/YUNIKORN-2591 Project: Apache YuniKorn Issue Type: Improvement Components: documentation Reporter: Wilfred Spiegelenburg The current [doc says|https://yunikorn.apache.org/docs/user_guide/queue_config#placement-rules]: {quote}If no rules are defined the placement manager is not started and each application _must_ have a queue set on submit. {quote} This is not correct: we moved to placement rules always in YUNIKORN-1793 in YuniKorn 1.4. The documentation needs to be updated to reflect that.
[jira] [Created] (YUNIKORN-2590) Handler tests should check for nil request on create
Wilfred Spiegelenburg created YUNIKORN-2590: --- Summary: Handler tests should check for nil request on create Key: YUNIKORN-2590 URL: https://issues.apache.org/jira/browse/YUNIKORN-2590 Project: Apache YuniKorn Issue Type: Improvement Components: core - common, test - unit Reporter: Wilfred Spiegelenburg In the handler_test.go file we have an anti-pattern that shows a large number (40+) of warnings in an IDE: {quote}'req' might have 'nil' or other unexpected value as its corresponding error variable might be not 'nil' {quote} The warnings are caused by the following pattern: {code:java} req, err = http.NewRequest("GET", "path", strings.NewReader("")) req = req.WithContext(context.WithValue(req.Context(), httprouter.ParamsKey, httprouter.Params{})){code} There is no error assertion after the request creation. We should insert a simple {{assert.NilError(t, err, "HTTP request create failed")}} between creating and using the request.
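This sketch of the proposed check uses only the Go standard library. The error from `http.NewRequest` is checked before the request is used. `paramsKey` and `newTestRequest` are hypothetical. The real tests use `httprouter.ParamsKey` and `assert.NilError` from their own test dependencies.

```go
package main

import (
	"context"
	"net/http"
	"strings"
)

// paramsKey is a hypothetical context key standing in for httprouter.ParamsKey.
type paramsKey struct{}

// newTestRequest checks the creation error before touching the request,
// mirroring the assert.NilError call proposed in the issue.
func newTestRequest(method, path string) (*http.Request, error) {
	req, err := http.NewRequest(method, path, strings.NewReader(""))
	if err != nil {
		// req is nil here; calling req.WithContext would panic
		return nil, err
	}
	return req.WithContext(context.WithValue(req.Context(), paramsKey{}, map[string]string{})), nil
}
```

In a test the `if err != nil` branch becomes the single `assert.NilError(t, err, ...)` line, which stops the test before the nil request is dereferenced.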
[jira] [Created] (YUNIKORN-2581) Expose running placement rules in REST
Wilfred Spiegelenburg created YUNIKORN-2581: --- Summary: Expose running placement rules in REST Key: YUNIKORN-2581 URL: https://issues.apache.org/jira/browse/YUNIKORN-2581 Project: Apache YuniKorn Issue Type: New Feature Components: core - common Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Since we introduced always using placement rules and the recovery rule, the queue config does not correctly show the running rules. Also, if a config update has been rejected for any reason, the rules shown would not be correct. Exposing the configured rules from the placement manager works around all these issues.
[jira] [Resolved] (YUNIKORN-2575) Make logging for IsPodFitNode clear
[ https://issues.apache.org/jira/browse/YUNIKORN-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2575. - Fix Version/s: 1.6.0 Resolution: Fixed Unique errors are returned for all failure cases, which at DEBUG level show exactly why the failure occurred. > Make logging for IsPodFitNode clear > --- > > Key: YUNIKORN-2575 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2575 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Minor > Labels: pull-request-available > Fix For: 1.6.0 > > > The logging in {{IsPodFitNode()}} logs the same message for a missing pod and > node. We should log clearly which thing is missing: the node or the pod.
[jira] [Resolved] (YUNIKORN-2580) Remove executionTimeoutMilliSeconds
[ https://issues.apache.org/jira/browse/YUNIKORN-2580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2580. - Resolution: Won't Fix This is used for the placeholder timeout and cannot be removed. > Remove executionTimeoutMilliSeconds > --- > > Key: YUNIKORN-2580 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2580 > Project: Apache YuniKorn > Issue Type: Improvement > Components: scheduler-interface >Reporter: Chia-Ping Tsai >Priority: Minor > > [https://github.com/apache/yunikorn-scheduler-interface/blob/b70081933c38018fd7f01c82635f5b186c4ef394/si.proto#L211] > It is not used actually, and hence we should either remove it or add facility > for it. Personally, I'd like to remove it to simplify the interface.
[jira] [Created] (YUNIKORN-2578) Refactor SchedulerCache.GetPod() remove bool return
Wilfred Spiegelenburg created YUNIKORN-2578: --- Summary: Refactor SchedulerCache.GetPod() remove bool return Key: YUNIKORN-2578 URL: https://issues.apache.org/jira/browse/YUNIKORN-2578 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg SchedulerCache {{GetPod()}} and {{GetPodNoLock()}} return two values: a {{*v1.Pod}} and a {{bool}}. The boolean value is redundant. It is false if the pod is not found, in which case nil is returned for the pod, and true if the pod has a value. Testing for a nil pod gives the same result, as we never cache a nil pod for a pod UID.
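This sketch shows the single-return form. It assumes a simplified `podCache` in place of `SchedulerCache` and a simplified `Pod` type; both are hypothetical. Because `AddPod` refuses nil, a nil result from `GetPod` can only mean the pod is not cached.

```go
package main

import "sync"

// Pod is a simplified stand-in for v1.Pod (assumption).
type Pod struct{ UID string }

// podCache is a simplified stand-in for SchedulerCache.
type podCache struct {
	lock sync.RWMutex
	pods map[string]*Pod
}

// GetPod returns the cached pod, or nil when it is not cached. A missing map
// key yields the zero value (nil), so no extra bool is needed.
func (c *podCache) GetPod(uid string) *Pod {
	c.lock.RLock()
	defer c.lock.RUnlock()
	return c.pods[uid]
}

// AddPod never stores a nil pod, which keeps the nil check in GetPod sound.
func (c *podCache) AddPod(p *Pod) {
	if p == nil {
		return
	}
	c.lock.Lock()
	defer c.lock.Unlock()
	c.pods[p.UID] = p
}
```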
[jira] [Created] (YUNIKORN-2577) Remove named returns from IsPodFitNodeViaPreemption
Wilfred Spiegelenburg created YUNIKORN-2577: --- Summary: Remove named returns from IsPodFitNodeViaPreemption Key: YUNIKORN-2577 URL: https://issues.apache.org/jira/browse/YUNIKORN-2577 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg IsPodFitNodeViaPreemption has defined named returns but does not use them. They should be removed as the way they are used can cause issues that are hard to debug. As part of this change we need to clean up further: * The variable {{ok}} also gets shadowed multiple times, not just from the named return declaration. * The if construct around {{GetPodNoLock()}} is not needed, as it returns nil for the pod when it returns false. Always adding the result for the pod has the same effect.
[jira] [Created] (YUNIKORN-2575) Make logging for IsPodFitNode clear
Wilfred Spiegelenburg created YUNIKORN-2575: --- Summary: Make logging for IsPodFitNode clear Key: YUNIKORN-2575 URL: https://issues.apache.org/jira/browse/YUNIKORN-2575 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The logging in {{IsPodFitNode()}} logs the same message for a missing pod and node. We should log clearly which thing is missing: the node or the pod.
[jira] [Created] (YUNIKORN-2556) Remove getResourceUsageDAOInfo from test code
Wilfred Spiegelenburg created YUNIKORN-2556: --- Summary: Remove getResourceUsageDAOInfo from test code Key: YUNIKORN-2556 URL: https://issues.apache.org/jira/browse/YUNIKORN-2556 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg Remove the {{getResourceUsageDAOInfo()}} call from the test code. If we need to retrieve the usage for the whole queueTracker hierarchy, we should add that to the test code separately instead of using the DAO and converting it back. The DAO object should also not contain the pointer to the resource object. It should contain the DAOMap for the resource object, similar to all other DAO definitions.
[jira] [Created] (YUNIKORN-2555) Cleanup placement rules in partition
Wilfred Spiegelenburg created YUNIKORN-2555: --- Summary: Cleanup placement rules in partition Key: YUNIKORN-2555 URL: https://issues.apache.org/jira/browse/YUNIKORN-2555 Project: Apache YuniKorn Issue Type: Improvement Components: core - scheduler Reporter: Wilfred Spiegelenburg The placement rule config is tracked in the partition in the object {{partition.rules}}. This object contains the config with which the placement manager is initialised. It was needed before the move to always use placement rules. Since that change it no longer has a function, and the config is now also out of sync with the rules used in the placement manager. There is no need to keep this object in the partition.
[jira] [Created] (YUNIKORN-2540) clean up constants in pkg/cache/context_test.go
Wilfred Spiegelenburg created YUNIKORN-2540: --- Summary: clean up constants in pkg/cache/context_test.go Key: YUNIKORN-2540 URL: https://issues.apache.org/jira/browse/YUNIKORN-2540 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Constants are duplicated in {{pkg/cache/context_test.go}}. For example, {{fakeNodeName}} is defined multiple times in the file. We should define the constants for the test in one central place at the top of the file.
[jira] [Resolved] (YUNIKORN-2520) PVC errors in AssumePod() are not handled properly
[ https://issues.apache.org/jira/browse/YUNIKORN-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2520. - Fix Version/s: 1.6.0 Resolution: Fixed Changes merged to master. Volume issues should be handled correctly now. > PVC errors in AssumePod() are not handled properly > -- > > Key: YUNIKORN-2520 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2520 > Project: Apache YuniKorn > Issue Type: Bug > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > When there is an error caused by a volume operation in > {{Context.AssumePod()}}, the allocation on core side will not be removed. > Although we check the result from {{UpdateAllocation}}, the error handling is > just logging: > {noformat} > if err := callback.UpdateAllocation(response); err != nil { > rmp.handleUpdateResponseError(rmID, err) > } > ... > func (rmp *RMProxy) handleUpdateResponseError(rmID string, err error) { > log.Log(log.RMProxy).Error("failed to handle response", >zap.String("rmID", rmID), >zap.Error(err)) > }{noformat} > I suggest moving volume-related code to {{Task.postTaskAllocated()}}. In > this case, the task will transition to "Failed" state and we'll have > allocationID available, so we can release both the ask and the allocation: > {noformat} > func (task *Task) releaseAllocation() { > ... > var releaseRequest *si.AllocationRequest > s := TaskStates() > switch task.GetTaskState() { > case s.New, s.Pending, s.Scheduling, s.Rejected: > releaseRequest = common.CreateReleaseAskRequestForTask( > task.applicationID, task.taskID, > task.application.partition) <-- release ask + allocation if possible > default: > if task.allocationID == "" { > ... log error ...
> return > } > releaseRequest = > common.CreateReleaseAllocationRequestForTask( > task.applicationID, task.taskID, > task.allocationID, task.application.partition, task.terminationType) > } > ...{noformat}
[jira] [Created] (YUNIKORN-2538) Shim cache context pre-allocate slice
Wilfred Spiegelenburg created YUNIKORN-2538: --- Summary: Shim cache context pre-allocate slice Key: YUNIKORN-2538 URL: https://issues.apache.org/jira/browse/YUNIKORN-2538 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg When building the reason string from all volume failure reasons, we should allocate the slice once, sized to the number of reasons returned. See [review comment|https://github.com/apache/yunikorn-k8shim/pull/810#discussion_r1550882867]
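This sketch shows the pre-allocation. It assumes the reasons arrive as a slice of a string-based type; `conflictReason` and `buildReason` are hypothetical names, not the shim's code. The slice is created once with its final capacity, so `append` never has to reallocate.

```go
package main

import "strings"

// conflictReason is a hypothetical string-based reason type.
type conflictReason string

// buildReason joins all reasons into one message. The backing array is
// allocated once with capacity len(reasons).
func buildReason(reasons []conflictReason) string {
	msgs := make([]string, 0, len(reasons))
	for _, r := range reasons {
		msgs = append(msgs, string(r))
	}
	return strings.Join(msgs, ", ")
}
```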
[jira] [Created] (YUNIKORN-2537) cleanup UpdateAllocation in callback
Wilfred Spiegelenburg created YUNIKORN-2537: --- Summary: cleanup UpdateAllocation in callback Key: YUNIKORN-2537 URL: https://issues.apache.org/jira/browse/YUNIKORN-2537 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg UpdateAllocation needs a cleanup. {{getTask()}} already checks for the application, so there is no need to retrieve the application when we process response.New. Sending an event should be linked to the existence of the task, not of the application. On top of that, we already have the appID in the task, so we do not need to get it from the app. The same logic needs to be applied to the whole function; we already do it for the release.* handling.
[jira] [Created] (YUNIKORN-2533) Implement String() for TrackedResource
Wilfred Spiegelenburg created YUNIKORN-2533: --- Summary: Implement String() for TrackedResource Key: YUNIKORN-2533 URL: https://issues.apache.org/jira/browse/YUNIKORN-2533 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg To fix the way a TrackedResource is logged, it should implement the {{String()}} function.
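This sketch shows a `String()` implementation. It assumes a simplified `TrackedResource` that maps an instance type to per-resource usage; that field layout is an assumption, not the core's definition. The outer keys are sorted so log output is stable. For the inner maps, `fmt` already prints keys in sorted order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// TrackedResource is a simplified stand-in (assumed layout): instance type ->
// resource name -> accumulated usage.
type TrackedResource struct {
	TrackedResourceMap map[string]map[string]int64
}

// String implements fmt.Stringer, so loggers and %v print a readable value.
func (tr *TrackedResource) String() string {
	if tr == nil {
		return "TrackedResource{}"
	}
	types := make([]string, 0, len(tr.TrackedResourceMap))
	for t := range tr.TrackedResourceMap {
		types = append(types, t)
	}
	sort.Strings(types) // stable output regardless of map iteration order
	parts := make([]string, 0, len(types))
	for _, t := range types {
		parts = append(parts, fmt.Sprintf("%s:%v", t, tr.TrackedResourceMap[t]))
	}
	return "TrackedResource{" + strings.Join(parts, ",") + "}"
}
```

With a pointer receiver that handles nil, logging an unset tracker prints `TrackedResource{}` instead of panicking.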
[jira] [Resolved] (YUNIKORN-2527) Allow remove and re-add configured queue within cleanup time
[ https://issues.apache.org/jira/browse/YUNIKORN-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2527. - Fix Version/s: 1.6.0 Resolution: Fixed Queues can now be removed and added back again within a cleanup cycle > Allow remove and re-add configured queue within cleanup time > - > > Key: YUNIKORN-2527 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2527 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - common >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > When we remove a queue from the config it is marked for cleanup. If we re-add > the same queue in the config again before the cleanup gets executed the queue > still gets removed. > reproduction: > * edit config map remove a queue, save > * immediately edit configmap add the same queue back, save > * wait for the cleanup to happen, queue should still exist after the fix
[jira] [Resolved] (YUNIKORN-2519) Remove bypass ACL check from placement rules
[ https://issues.apache.org/jira/browse/YUNIKORN-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2519. - Fix Version/s: 1.6.0 Resolution: Fixed Refactor committed to master for 1.6.0. > Remove bypass ACL check from placement rules > > > Key: YUNIKORN-2519 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2519 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > Instead of having every rule except the recovery rule return a flag saying > the ACL check must not be bypassed, special-case the recovery rule to bypass the checks. > The recovery queue is created without ACLs or quota and is always a leaf queue. > The only rule that can return the recovery queue is the recovery rule, which > is the last one in the list. > Use all these facts to simplify the placement processing.
[jira] [Created] (YUNIKORN-2527) Allow remove and re-add configured queue within cleanup time
Wilfred Spiegelenburg created YUNIKORN-2527: --- Summary: Allow remove and re-add configured queue within cleanup time Key: YUNIKORN-2527 URL: https://issues.apache.org/jira/browse/YUNIKORN-2527 Project: Apache YuniKorn Issue Type: Bug Components: core - common Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg When we remove a queue from the config it is marked for cleanup. If we re-add the same queue to the config before the cleanup runs, the queue still gets removed. reproduction: * edit the configmap, remove a queue, save * immediately edit the configmap, add the same queue back, save * wait for the cleanup to happen
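The expected behaviour can be sketched with a two-state lifecycle: a config update marks queues missing from the config as draining, and moves queues that reappear back to active, so a later cleanup only deletes queues that are still draining. The `queue`, `queueState`, `applyConfig` and `cleanup` names are hypothetical; the real queue objects and cleanup loop in yunikorn-core are more involved.

```go
package main

import "fmt"

// queueState is a hypothetical, simplified queue lifecycle.
type queueState int

const (
	active queueState = iota
	draining // removed from config, waiting for cleanup
)

type queue struct {
	name  string
	state queueState
}

// applyConfig marks queues missing from the new config as draining and moves
// queues that reappear back to active, so cleanup leaves them alone.
func applyConfig(queues map[string]*queue, configured []string) {
	inConfig := make(map[string]bool, len(configured))
	for _, name := range configured {
		inConfig[name] = true
		if q, ok := queues[name]; ok {
			q.state = active // re-added before the cleanup ran
		} else {
			queues[name] = &queue{name: name, state: active}
		}
	}
	for name, q := range queues {
		if !inConfig[name] {
			q.state = draining
		}
	}
}

// cleanup removes only the queues that are still draining.
func cleanup(queues map[string]*queue) {
	for name, q := range queues {
		if q.state == draining {
			delete(queues, name) // deleting during range is safe in Go
		}
	}
}

func main() {
	queues := map[string]*queue{}
	applyConfig(queues, []string{"root.a", "root.b"})
	applyConfig(queues, []string{"root.a"})           // remove root.b
	applyConfig(queues, []string{"root.a", "root.b"}) // re-add before cleanup
	cleanup(queues)
	_, ok := queues["root.b"]
	fmt.Println("root.b exists:", ok)
}
```

This follows the reproduction steps and prints `root.b exists: true`.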
[jira] [Resolved] (YUNIKORN-2498) Implement force create flag in k8shim for recovery queue
[ https://issues.apache.org/jira/browse/YUNIKORN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2498. - Fix Version/s: 1.6.0 Resolution: Fixed > Implement force create flag in k8shim for recovery queue > > > Key: YUNIKORN-2498 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2498 > Project: Apache YuniKorn > Issue Type: Task > Components: shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > As part of the initialisation changes a new recovery queue was added to allow > already running allocations to be restored even if the queue config was > changed. The implementation on the k8shim side needs to be added to leverage > the forced create flag from YUNIKORN-1887. > Without that, the changes added for the recovery queue will not be used.
[jira] [Resolved] (YUNIKORN-2494) Revisit IsAtorAbove, WithIn, GetRemaining Guaranteed resources calculation
[ https://issues.apache.org/jira/browse/YUNIKORN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2494. - Fix Version/s: 1.6.0 Resolution: Fixed Functions added to the master code but not actively used yet. > Revisit IsAtorAbove, WithIn, GetRemaining Guaranteed resources calculation > -- > > Key: YUNIKORN-2494 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2494 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - common >Reporter: Manikandan R >Assignee: Manikandan R >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > These 3 methods don't expose the actual guaranteed values; they return a > boolean based on the calculation. There are cases where these boolean > values are not correct, and there is also a need to know the actual guaranteed > values, for example: how much of the guaranteed resources remains? How much can be > preempted?
[jira] [Created] (YUNIKORN-2519) Remove bypass ACL check from placement rules
Wilfred Spiegelenburg created YUNIKORN-2519: --- Summary: Remove bypass ACL check from placement rules Key: YUNIKORN-2519 URL: https://issues.apache.org/jira/browse/YUNIKORN-2519 Project: Apache YuniKorn Issue Type: Improvement Components: core - scheduler Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Instead of having every rule except the recovery rule return a flag saying the ACL check must not be bypassed, special-case the recovery rule to bypass the checks. The recovery queue is created without ACLs or quota and is always a leaf queue. The only rule that can return the recovery queue is the recovery rule, which is the last one in the list. Use all these facts to simplify the placement processing.
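A minimal sketch of the simplification, under stated assumptions: `rule`, `place` and the `allowed` callback are hypothetical, and the recovery rule is stubbed to always return the recovery queue (the real rule presumably only applies to applications that need recovery). Because only the recovery queue is ever exempt, the loop checks the queue name instead of carrying a per-rule bypass flag. Whether a denied rule falls through to the next rule is also an assumption here.

```go
package main

import (
	"errors"
	"fmt"
)

// recoveryQueue is the queue path named in YUNIKORN-2518.
const recoveryQueue = "root.@recover@"

// rule is a hypothetical placement rule: it returns a queue path, or "" if it does not match.
type rule func(app string) string

// place tries the rules in order; the ACL check is skipped only for the
// recovery queue, which has no ACLs and is returned only by the last rule.
func place(app, user string, rules []rule, allowed func(queue, user string) bool) (string, error) {
	for _, r := range rules {
		queue := r(app)
		if queue == "" {
			continue
		}
		if queue == recoveryQueue || allowed(queue, user) {
			return queue, nil
		}
	}
	return "", errors.New("application rejected: no rule placed it")
}

func main() {
	rules := []rule{
		func(app string) string { return "root.restricted" },
		func(app string) string { return recoveryQueue }, // stubbed recovery rule, always last
	}
	denyAll := func(queue, user string) bool { return false }
	q, err := place("app-1", "alice", rules, denyAll)
	fmt.Println(q, err)
}
```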
[jira] [Created] (YUNIKORN-2518) Allow recovery queue in REST requests
Wilfred Spiegelenburg created YUNIKORN-2518: --- Summary: Allow recovery queue in REST requests Key: YUNIKORN-2518 URL: https://issues.apache.org/jira/browse/YUNIKORN-2518 Project: Apache YuniKorn Issue Type: Improvement Components: core - common Reporter: Wilfred Spiegelenburg The current checks for REST requests that require a queue path prevent looking at the {{root.@recover@}} queue. The validator filters the queue names, which makes it impossible to use the REST requests to check whether the queue has any running applications or pods after initialisation.
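One way to let the validator through for this single queue is to accept the recovery path explicitly before applying the generic per-part name check. The regular expression and `validQueuePath` below are hypothetical and deliberately conservative; they are not the character set yunikorn-core actually validates against.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// queueNameRegExp is a hypothetical rule for regular queue names; note it rejects '@'.
var queueNameRegExp = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,64}$`)

const recoveryQueuePath = "root.@recover@"

// validQueuePath accepts the recovery queue explicitly, then validates each
// dot-separated part of any other path.
func validQueuePath(path string) bool {
	if path == recoveryQueuePath {
		return true
	}
	for _, part := range strings.Split(path, ".") {
		if !queueNameRegExp.MatchString(part) {
			return false
		}
	}
	return true
}

func main() {
	for _, p := range []string{"root.default", "root.@recover@", "root.@bad@"} {
		fmt.Println(p, validQueuePath(p))
	}
}
```

Only the exact recovery path is exempt, so other names containing `@` are still rejected.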
[jira] [Created] (YUNIKORN-2506) fix
Wilfred Spiegelenburg created YUNIKORN-2506: --- Summary: fix Key: YUNIKORN-2506 URL: https://issues.apache.org/jira/browse/YUNIKORN-2506 Project: Apache YuniKorn Issue Type: Improvement Components: webapp Reporter: Wilfred Spiegelenburg When running make on the web UI project, a deprecation warning is printed for the fonts we include: {code:java} WARN deprecated fontsource-roboto@4.0.0: Package relocated. Please install and migrate to @fontsource/roboto. {code} Move to {{@fontsource/roboto}} to fix the warning.
[jira] [Created] (YUNIKORN-2498) Implement force create flag in k8shim for recovery queue
Wilfred Spiegelenburg created YUNIKORN-2498: --- Summary: Implement force create flag in k8shim for recovery queue Key: YUNIKORN-2498 URL: https://issues.apache.org/jira/browse/YUNIKORN-2498 Project: Apache YuniKorn Issue Type: Task Components: shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg As part of the initialisation changes a new recovery queue was added to allow already running allocations to be restored even if the queue config was changed. The implementation on the k8shim side needs to be added to leverage the forced create flag from YUNIKORN-1887. Without that, the changes added for the recovery queue will not be used.
[jira] [Created] (YUNIKORN-2497) Update node.js to 18.19.1
Wilfred Spiegelenburg created YUNIKORN-2497: --- Summary: Update node.js to 18.19.1 Key: YUNIKORN-2497 URL: https://issues.apache.org/jira/browse/YUNIKORN-2497 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Node 18.x is an LTS version. Version 18.17 has been superseded by two newer releases, 18.18 and 18.19. Both contain CVE fixes that we should include for stability. Moving the build to 18.19 (currently 18.19.1).
[jira] [Resolved] (YUNIKORN-2496) Fix security issues in website javascript
[ https://issues.apache.org/jira/browse/YUNIKORN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2496. - Fix Version/s: 1.6.0 Resolution: Fixed Change committed; all dependabot alerts closed. > Fix security issues in website javascript > - > > Key: YUNIKORN-2496 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2496 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > The change to pnpm triggered a large number of security alerts from > dependabot. > 7 could be fixed directly by the 4 PRs opened by dependabot; 6 need manual > intervention. > The change also included an upgrade of the Algolia search component to 3.x. > That upgrade prevents running {{pnpm audit}}. > Docusaurus 3.x also contains a large number of backward-incompatible changes, > and an upgrade is planned separately. The Algolia 3.x dependency > already brings in some of these changes, so it should be reverted to Algolia 2.x > to match the rest of the Docusaurus environment.
[jira] [Created] (YUNIKORN-2496) Fix security issues in website javascript
Wilfred Spiegelenburg created YUNIKORN-2496: --- Summary: Fix security issues in website javascript Key: YUNIKORN-2496 URL: https://issues.apache.org/jira/browse/YUNIKORN-2496 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The change to pnpm triggered a large number of security alerts from dependabot. 7 could be fixed directly by the 4 PRs opened by dependabot; 6 need manual intervention. The change also included an upgrade of the Algolia search component to 3.x. That upgrade prevents running {{pnpm audit}}. Docusaurus 3.x also contains a large number of backward-incompatible changes, and an upgrade is planned separately. The Algolia 3.x dependency already brings in some of these changes, so it should be reverted to Algolia 2.x to match the rest of the Docusaurus environment.
[jira] [Resolved] (YUNIKORN-2490) Add new PMC and committer members
[ https://issues.apache.org/jira/browse/YUNIKORN-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2490. - Fix Version/s: 1.6.0 Resolution: Fixed The website has been updated with the new details after checks. Deployment of the new site should take about 30 minutes. > Add new PMC and committer members > - > > Key: YUNIKORN-2490 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2490 > Project: Apache YuniKorn > Issue Type: Task > Components: website >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Trivial > Labels: pull-request-available > Fix For: 1.6.0 > > > We have elected a new PMC member and some committers. Now that they have > accepted, we should add them to the website.
[jira] [Created] (YUNIKORN-2490) Add new PMC and committer members
Wilfred Spiegelenburg created YUNIKORN-2490: --- Summary: Add new PMC and committer members Key: YUNIKORN-2490 URL: https://issues.apache.org/jira/browse/YUNIKORN-2490 Project: Apache YuniKorn Issue Type: Task Components: website Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg We have elected a new PMC member and some committers. Now that they have accepted we should add them to the website.
[jira] [Created] (YUNIKORN-2482) Failure to set template does not return error
Wilfred Spiegelenburg created YUNIKORN-2482: --- Summary: Failure to set template does not return error Key: YUNIKORN-2482 URL: https://issues.apache.org/jira/browse/YUNIKORN-2482 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Reporter: Wilfred Spiegelenburg Setting a template on a parent queue can fail if the template is not correct. The error is swallowed and success is returned, even though the queue update has not finished correctly: *Queue.applyConf() {code:java} if !sq.isLeaf { if err = sq.setTemplate(conf.ChildTemplate); err != nil { return nil } } {code} We need to add tests to make sure we do not regress.
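The fix amounts to returning the error instead of `nil`. The sketch below uses hypothetical, reduced `Queue`, `QueueConfig` and `ChildTemplate` types, and a made-up validation rule (negative `MaxApplications` is invalid) so that `setTemplate` has something to reject; the real config structs and validation differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, reduced types standing in for the real queue and config objects.
type ChildTemplate struct {
	MaxApplications int64
}

type QueueConfig struct {
	ChildTemplate ChildTemplate
}

type Queue struct {
	isLeaf   bool
	template *ChildTemplate
}

// setTemplate uses an illustrative validation rule only.
func (sq *Queue) setTemplate(t ChildTemplate) error {
	if t.MaxApplications < 0 {
		return errors.New("invalid child template: negative MaxApplications")
	}
	sq.template = &t
	return nil
}

// applyConf propagates the setTemplate error so callers learn the update failed.
func (sq *Queue) applyConf(conf QueueConfig) error {
	if !sq.isLeaf {
		if err := sq.setTemplate(conf.ChildTemplate); err != nil {
			return fmt.Errorf("applying child template: %w", err)
		}
	}
	return nil
}

func main() {
	q := &Queue{}
	err := q.applyConf(QueueConfig{ChildTemplate: ChildTemplate{MaxApplications: -1}})
	fmt.Println(err)
}
```

The assertions below double as the kind of regression test the issue asks for: a bad template must surface an error, a good one must be stored, and leaf queues ignore the template.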
[jira] [Created] (YUNIKORN-2472) REST API returns subtree by default
Wilfred Spiegelenburg created YUNIKORN-2472: --- Summary: REST API returns subtree by default Key: YUNIKORN-2472 URL: https://issues.apache.org/jira/browse/YUNIKORN-2472 Project: Apache YuniKorn Issue Type: Bug Components: core - common Affects Versions: 1.5.0 Reporter: Wilfred Spiegelenburg The subtree query parameter is interpreted the opposite way from what would be expected. If you call {{/ws/v1/partition/default/queue/root?subtree}} you do not get the subtree. If you call {{/ws/v1/partition/default/queue/root}} you get the whole tree rooted at root. We have not documented the new API yet, so before we add it to the docs we should fix the behaviour: * subtree given: return the whole tree * subtree missing: return one level The code fix is as simple as adding a ! in a single call and inverting the test cases to pass or not pass {{?subtree}}.
[jira] [Created] (YUNIKORN-2462) incorrect gang annotations in example
Wilfred Spiegelenburg created YUNIKORN-2462: --- Summary: incorrect gang annotations in example Key: YUNIKORN-2462 URL: https://issues.apache.org/jira/browse/YUNIKORN-2462 Project: Apache YuniKorn Issue Type: Bug Components: documentation Reporter: Wilfred Spiegelenburg The example for turning on gang scheduling with Spark is incorrect. [https://yunikorn.apache.org/docs/next/user_guide/gang_scheduling/#enable-gang-scheduling-for-spark-jobs] The example shows: {code:java} yunikorn.apache.org/taskGroupName: “spark-driver” yunikorn.apache.org/taskGroup: “ TaskGroups: [ {code} The {{taskGroupName}} should be {{task-group-name}} and {{taskGroup}} should be {{task-groups}}
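For comparison, the corrected keys can be shown by building the annotation map in Go. The `taskGroup` struct mirrors the `name`/`minMember`/`minResource` JSON shape used in the YuniKorn gang scheduling guide, but treat the exact field set as an assumption; `gangAnnotations` is a hypothetical helper. Note the straight ASCII quotes: the typographic quotes in the broken example would also not parse as JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskGroup mirrors the documented task group JSON shape (assumed field set).
type taskGroup struct {
	Name        string            `json:"name"`
	MinMember   int32             `json:"minMember"`
	MinResource map[string]string `json:"minResource"`
}

// gangAnnotations uses the corrected keys: task-group-name and task-groups.
func gangAnnotations(groupName string, groups []taskGroup) (map[string]string, error) {
	raw, err := json.Marshal(groups)
	if err != nil {
		return nil, err
	}
	return map[string]string{
		"yunikorn.apache.org/task-group-name": groupName,
		"yunikorn.apache.org/task-groups":     string(raw),
	}, nil
}

func main() {
	ann, err := gangAnnotations("spark-driver", []taskGroup{
		{Name: "spark-driver", MinMember: 1, MinResource: map[string]string{"cpu": "1", "memory": "2Gi"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(ann["yunikorn.apache.org/task-group-name"])
	fmt.Println(ann["yunikorn.apache.org/task-groups"])
}
```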
[jira] [Resolved] (YUNIKORN-2456) Remove weak ciphers from TLS
[ https://issues.apache.org/jira/browse/YUNIKORN-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2456. - Fix Version/s: 1.5.0 Resolution: Fixed Committed to master and cherry-picked into branch-1.5, resolving. > Remove weak ciphers from TLS > > > Key: YUNIKORN-2456 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2456 > Project: Apache YuniKorn > Issue Type: Bug > Components: security, shim - kubernetes >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > Labels: pull-request-available > Fix For: 1.5.0 > > > The TLS connection for the admission controller allows ciphers that are > considered weak.
[jira] [Created] (YUNIKORN-2456) Remove weak ciphers from TLS
Wilfred Spiegelenburg created YUNIKORN-2456: --- Summary: Remove weak ciphers from TLS Key: YUNIKORN-2456 URL: https://issues.apache.org/jira/browse/YUNIKORN-2456 Project: Apache YuniKorn Issue Type: Bug Components: security, shim - kubernetes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg The TLS connection for the admission controller allows ciphers that are considered weak.
[jira] [Resolved] (YUNIKORN-2042) REST API for specific queue
[ https://issues.apache.org/jira/browse/YUNIKORN-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2042. - Fix Version/s: 1.5.0 Target Version: 1.5.0 (was: 1.6.0) Resolution: Fixed Change committed and cherry-picked into branch-1.5. > REST API for specific queue > --- > > Key: YUNIKORN-2042 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2042 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - common >Reporter: Ted Lin >Assignee: Ted Lin >Priority: Major > Labels: pull-request-available > Fix For: 1.5.0 > > > Expose a REST API for a specific queue: > /ws/v1/partition/%s/queue/%s/
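A client filling in the `/ws/v1/partition/%s/queue/%s/` template might escape each segment so that unusual names cannot break the path. `queueURL` is a hypothetical helper, and whether the server expects escaped segments is an assumption. `url.PathEscape` leaves `.` and `@` untouched, so paths such as `root.@recover@` pass through unchanged.

```go
package main

import (
	"fmt"
	"net/url"
)

// queueURL fills the partition and queue segments of the REST path template.
func queueURL(partition, queue string) string {
	return fmt.Sprintf("/ws/v1/partition/%s/queue/%s/", url.PathEscape(partition), url.PathEscape(queue))
}

func main() {
	fmt.Println(queueURL("default", "root.default"))
}
```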
[jira] [Resolved] (YUNIKORN-2030) Need to check headroom when trying other nodes for reserved allocations
[ https://issues.apache.org/jira/browse/YUNIKORN-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2030. - Fix Version/s: 1.5.0 Resolution: Fixed Change committed and cherry-picked into branch-1.5; thank you for the analysis and change. > Need to check headroom when trying other nodes for reserved allocations > --- > > Key: YUNIKORN-2030 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2030 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Blocker > Labels: pull-request-available > Fix For: 1.5.0 > > > As reported in YUNIKORN-1996, we are seeing many messages like the one below from > time to time: > {code:java} > WARN objects/application.go:1504 queue update failed unexpectedly > {“error”: “allocation (map[memory:37580963840 pods:1 vcore:2000]) puts > queue ‘root.test-queue’ over maximum allocation (map[memory:3300011278336 > vcore:390584]), current usage (map[memory:3291983380480 pods:91 > vcore:186000])“}{code} > Restarting YuniKorn makes the messages stop. This Jira was created to investigate > why it happened, because it is not supposed to happen: we check whether there is > enough resource headroom before calling > > {code:java} > func (sa *Application) tryNode(node *Node, ask *AllocationAsk) *Allocation > {code} > which printed the above message, and only call it when there is > enough headroom. > Could there be a bug in the headroom checking?
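Plugging the numbers from the WARN message into a per-resource headroom check shows why the queue update is rejected: memory usage plus ask is 3,329,564,344,320, which exceeds the maximum of 3,300,011,278,336 by 29,553,065,984 bytes, while vcore (188,000 of 390,584) still fits. `Resource` and `fitsInMax` are a simplified, hypothetical sketch rather than the scheduler's resources package; treating resource types absent from the maximum (here `pods`) as unlimited is an assumption.

```go
package main

import "fmt"

// Resource is a simplified stand-in: resource name -> quantity.
type Resource map[string]int64

// fitsInMax reports whether usage+ask stays within limits for every resource
// type that limits defines; types missing from limits are treated as unlimited.
func fitsInMax(usage, ask, limits Resource) bool {
	for name, limit := range limits {
		if usage[name]+ask[name] > limit {
			return false
		}
	}
	return true
}

func main() {
	// Values copied from the WARN message in the issue.
	ask := Resource{"memory": 37580963840, "pods": 1, "vcore": 2000}
	limits := Resource{"memory": 3300011278336, "vcore": 390584}
	usage := Resource{"memory": 3291983380480, "pods": 91, "vcore": 186000}
	fmt.Println("fits:", fitsInMax(usage, ask, limits))
}
```

This prints `fits: false`, so a headroom check performed against the same usage should have prevented the `tryNode` call.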
[jira] [Created] (YUNIKORN-2448) Expose 3rd party licenses in the web UI
Wilfred Spiegelenburg created YUNIKORN-2448: --- Summary: Expose 3rd party licenses in the web UI Key: YUNIKORN-2448 URL: https://issues.apache.org/jira/browse/YUNIKORN-2448 Project: Apache YuniKorn Issue Type: Improvement Components: webapp Reporter: Wilfred Spiegelenburg We have a 3rd party license file that gets generated and included in the deployment for the web UI. This file is only accessible if you know its name. We should expose it as part of the web UI to comply with attribution requirements, similar to how Jira exposes it in its About Jira pop-up.
[jira] [Resolved] (YUNIKORN-2413) Variables that are initialisms or acronyms should have a consistent case
[ https://issues.apache.org/jira/browse/YUNIKORN-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2413. - Fix Version/s: 1.5.0 Resolution: Fixed Two refactors left for later; these function names should be updated: [{{master}}/pkg/events/event_ringbuffer.go#L206|https://github.com/apache/yunikorn-core/blob/master/pkg/events/event_ringbuffer.go#L206] [{{master}}/pkg/log/logger_test.go#L38|https://github.com/apache/yunikorn-core/blob/master/pkg/log/logger_test.go#L38] thank you [~priyansh] > Variables that are initialisms or acronyms should have a consistent case > > > Key: YUNIKORN-2413 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2413 > Project: Apache YuniKorn > Issue Type: Improvement >Reporter: Ryan Lo >Assignee: Priyansh Choudhary >Priority: Major > Labels: newbie, pull-request-available > Fix For: 1.5.0 > > > Discussed in YUNIKORN-2405 > We mix up "Id" and "ID" in our code base, and it's better to standardize > the use of acronyms and initialisms according to [this > doc.|https://go.dev/wiki/CodeReviewComments#initialisms] > An example: > current: allocationId > target: allocationID
[jira] [Resolved] (YUNIKORN-2115) [Umbrella] YuniKorn application traceability - phase II
[ https://issues.apache.org/jira/browse/YUNIKORN-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2115. - Fix Version/s: 1.5.0 Resolution: Fixed > [Umbrella] YuniKorn application traceability - phase II > --- > > Key: YUNIKORN-2115 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2115 > Project: Apache YuniKorn > Issue Type: New Feature > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 1.5.0 > > > This is a follow-up on YUNIKORN-1628. > This ticket focuses on streaming and user/group events.
[jira] [Resolved] (YUNIKORN-2116) Track user/group events
[ https://issues.apache.org/jira/browse/YUNIKORN-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2116. - Fix Version/s: 1.5.0 Resolution: Fixed Core changes committed. The changes to the SI were committed last week. Both PRs are done, closing. > Track user/group events > --- > > Key: YUNIKORN-2116 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2116 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Labels: pull-request-available > Fix For: 1.5.0
[jira] [Resolved] (YUNIKORN-2441) Wildcard limits are not applied to the root tracker during creation
[ https://issues.apache.org/jira/browse/YUNIKORN-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2441. - Fix Version/s: 1.5.0 Resolution: Fixed Change committed > Wildcard limits are not applied to the root tracker during creation > --- > > Key: YUNIKORN-2441 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2441 > Project: Apache YuniKorn > Issue Type: Bug > Components: core - scheduler >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Blocker > Labels: pull-request-available > Fix For: 1.5.0 > > > When a queue tracker is created with {{newQueueTracker()}}, the appropriate > wildcard limits are applied if the tracking type is "user". > The problem is this call: > {noformat} > if trackType == user { > if config := m.getUserWildCardLimitsConfig(queuePath + "." + > queueName); config != nil { > {noformat} > For "root", this looks up "root." (with a trailing dot) instead of "root".
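The bug comes from unconditionally concatenating `queuePath + "." + queueName`. A guarded join avoids the stray dot for the root tracker. `joinQueuePath` is a hypothetical helper; because the issue does not show which argument is empty for root, the sketch handles an empty parent and an empty name.

```go
package main

import "fmt"

// joinQueuePath builds a full queue path without leading or trailing dots.
func joinQueuePath(parent, name string) string {
	switch {
	case parent == "":
		return name
	case name == "":
		return parent
	default:
		return parent + "." + name
	}
}

func main() {
	fmt.Println(joinQueuePath("root", ""))        // root, not "root."
	fmt.Println(joinQueuePath("root", "default")) // root.default
}
```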
[jira] [Created] (YUNIKORN-2445) Add comments around locking setup in tracker code
Wilfred Spiegelenburg created YUNIKORN-2445: --- Summary: Add comments around locking setup in tracker code Key: YUNIKORN-2445 URL: https://issues.apache.org/jira/browse/YUNIKORN-2445 Project: Apache YuniKorn Issue Type: Task Components: core - scheduler Reporter: Wilfred Spiegelenburg The QueueTracker code is lock free and should stay lock free. Each queue tracker object is only ever linked to one UserTracker or GroupTracker, so locking is handled by those objects. This does mean that calls to the user or group trackers that can modify the underlying queue tracker structure must take a write lock. This specifically impacts the {{canRunApp()}} and {{headroom()}} calls, as they add new entries to the queue hierarchy.
[jira] [Created] (YUNIKORN-2440) [UMBRELLA] Remove stateaware scheduling
Wilfred Spiegelenburg created YUNIKORN-2440: --- Summary: [UMBRELLA] Remove stateaware scheduling Key: YUNIKORN-2440 URL: https://issues.apache.org/jira/browse/YUNIKORN-2440 Project: Apache YuniKorn Issue Type: Task Components: core - scheduler Reporter: Wilfred Spiegelenburg Umbrella jira to track all the work to remove state aware scheduling: * remove scheduling code * remove documentation * remove configuration options * document a way to achieve similar behaviour (FIFO with max applications)
[jira] [Created] (YUNIKORN-2439) Announce deprecation of state aware scheduling
Wilfred Spiegelenburg created YUNIKORN-2439: --- Summary: Announce deprecation of state aware scheduling Key: YUNIKORN-2439 URL: https://issues.apache.org/jira/browse/YUNIKORN-2439 Project: Apache YuniKorn Issue Type: Task Components: release-notes Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg State aware scheduling was a simple scheduling algorithm that provided a stopgap until gang scheduling was implemented. Gang scheduling and state aware scheduling do not work together. Gang scheduling is a more generic way of achieving almost the same behaviour. State aware scheduling has a number of drawbacks and could be used as an attack vector to slow down overall scheduling performance. We should deprecate it and remove it in an upcoming release.
[jira] [Closed] (YUNIKORN-2026) Update features document in Chinese translation
[ https://issues.apache.org/jira/browse/YUNIKORN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-2026. --- > Update features document in Chinese translation > --- > > Key: YUNIKORN-2026 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2026 > Project: Apache YuniKorn > Issue Type: Task > Components: documentation >Reporter: JiaChi Wang >Assignee: JiaChi Wang >Priority: Minor > Labels: pull-request-available > > Some parts are missing in the Chinese translation of the features document.
[jira] [Closed] (YUNIKORN-1511) Adding Chinese translation of Deploy to Kubernetes
[ https://issues.apache.org/jira/browse/YUNIKORN-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-1511. --- > Adding Chinese translation of Deploy to Kubernetes > -- > > Key: YUNIKORN-1511 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1511 > Project: Apache YuniKorn > Issue Type: Task >Reporter: Chen Yu Teng >Assignee: Chenchen Lai >Priority: Major > Labels: pull-request-available
[jira] [Closed] (YUNIKORN-2220) pod.DeepCopy() is called twice in Task
[ https://issues.apache.org/jira/browse/YUNIKORN-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-2220. --- > pod.DeepCopy() is called twice in Task > -- > > Key: YUNIKORN-2220 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2220 > Project: Apache YuniKorn > Issue Type: Sub-task > Components: shim - kubernetes >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Minor > > A small improvement is possible in {{task.go}}. > In {{handleSubmitTaskEvent()}} and {{postTaskAllocated()}}, we call > {{pod.DeepCopy()}} twice to avoid possible race conditions, but a single copy > is enough. Once we have a copy, it's local to the method. > {noformat} > events.GetRecorder().Eventf(task.pod.DeepCopy(), nil, v1.EventTypeNormal, > "Scheduling", "Scheduling", > "%s is queued and waiting for allocation", task.alias) > // if this task belongs to a task group, that means the app has gang > scheduling enabled > // in this case, post an event to indicate the task is being gang > scheduled > if !task.placeholder && task.taskGroupName != "" { > events.GetRecorder().Eventf(task.pod.DeepCopy(), nil, > v1.EventTypeNormal, "GangScheduling", "GangScheduling", > "Pod belongs to the taskGroup %s, it will be scheduled > as a gang member", task.taskGroupName) <-- second copy if GS is used > } > {noformat} > {noformat} > events.GetRecorder().Eventf(task.pod.DeepCopy(), > nil, v1.EventTypeNormal, "Scheduled", "Scheduled", > "Successfully assigned %s to node %s", task.alias, task.nodeName) > ... > events.GetRecorder().Eventf(task.pod.DeepCopy(), nil, > v1.EventTypeNormal, "PodBindSuccessful", "PodBindSuccessful", > "Pod %s is successfully bound to node %s", task.alias, task.nodeName) > {noformat}
[jira] [Closed] (YUNIKORN-803) Improve coverage of partition.go
[ https://issues.apache.org/jira/browse/YUNIKORN-803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-803. > Improve coverage of partition.go > Key: YUNIKORN-803 > URL: https://issues.apache.org/jira/browse/YUNIKORN-803 > Project: Apache YuniKorn > Issue Type: Improvement > Components: core - scheduler > Reporter: Chen Yu Teng > Assignee: Cliff Su > Priority: Minor > Attachments: list.png, partition.go coverage.png > Based on the feedback from the coverage file, add tests to improve the coverage of partition.go.
[jira] [Closed] (YUNIKORN-1691) Adding Chinese translation of User Based Resource Usage Tracking
[ https://issues.apache.org/jira/browse/YUNIKORN-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-1691. > Adding Chinese translation of User Based Resource Usage Tracking > Key: YUNIKORN-1691 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1691 > Project: Apache YuniKorn > Issue Type: Task > Components: website > Reporter: Chen Yu Teng > Assignee: Chenchen Lai > Priority: Major
[jira] [Closed] (YUNIKORN-1692) Adding Chinese translation of User Based Resource Usage Tracking
[ https://issues.apache.org/jira/browse/YUNIKORN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-1692. > Adding Chinese translation of User Based Resource Usage Tracking > Key: YUNIKORN-1692 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1692 > Project: Apache YuniKorn > Issue Type: Task > Components: website > Reporter: Chen Yu Teng > Assignee: Huang Guan Hao > Priority: Major
[jira] [Closed] (YUNIKORN-2223) Eliminate separate mutex variables
[ https://issues.apache.org/jira/browse/YUNIKORN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg closed YUNIKORN-2223. > Eliminate separate mutex variables > Key: YUNIKORN-2223 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2223 > Project: Apache YuniKorn > Issue Type: Improvement > Components: shim - kubernetes > Reporter: Peter Bacsko > Priority: Minor
> In {{cache.Task}}, the lock variable is defined as:
> {noformat}
> type Task struct {
>     ...
>     schedulingState TaskSchedulingState
>     sm              *fsm.FSM
>     lock            *sync.RWMutex
> }
> {noformat}
> This also applies to {{cache.Application}} and {{cache.Context}}.
> In other parts of the code, we simply embed {{sync.RWMutex}}. There is no need for a separate variable, and locking and unlocking become simpler.
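A minimal sketch of the embedded-mutex style the issue refers to. `Task`, `SetState`, and `State` here are hypothetical simplifications, not the shim's actual `cache.Task`; the point is only that embedding `sync.RWMutex` removes the separate `lock *sync.RWMutex` field and its allocation.

```go
package main

import (
	"fmt"
	"sync"
)

// Task embeds sync.RWMutex directly instead of holding a separate
// lock *sync.RWMutex field; the zero value of the mutex is ready to use.
type Task struct {
	sync.RWMutex
	schedulingState string
}

// SetState takes the write lock via the promoted Lock/Unlock methods.
func (t *Task) SetState(s string) {
	t.Lock()
	defer t.Unlock()
	t.schedulingState = s
}

// State takes the read lock via the promoted RLock/RUnlock methods.
func (t *Task) State() string {
	t.RLock()
	defer t.RUnlock()
	return t.schedulingState
}

func main() {
	task := &Task{} // no separate mutex allocation needed
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			task.SetState("Scheduled")
		}()
	}
	wg.Wait()
	fmt.Println(task.State())
}
```

One trade-off of embedding is that `Lock`, `RLock` and friends are promoted onto the `Task` method set, so code outside the type can lock it too. As with any struct containing a mutex, a `Task` must not be copied after first use; `go vet`'s copylocks check flags such copies.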
[jira] [Resolved] (YUNIKORN-1033) Add Chinese translation for developer guide documents
[ https://issues.apache.org/jira/browse/YUNIKORN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1033. Resolution: Won't Do. With the changes from YUNIKORN-2411, this is no longer relevant. > Add Chinese translation for developer guide documents > Key: YUNIKORN-1033 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1033 > Project: Apache YuniKorn > Issue Type: Task > Components: documentation > Reporter: cdmikechen > Assignee: Chen Yu Teng > Priority: Major > Add Chinese translation for developer guide documents; this is a sub-task of https://issues.apache.org/jira/browse/YUNIKORN-1029 > This issue includes the YuniKorn site developer guide documents.
[jira] [Resolved] (YUNIKORN-1691) Adding Chinese translation of User Based Resource Usage Tracking
[ https://issues.apache.org/jira/browse/YUNIKORN-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1691. Resolution: Won't Do. With the changes from YUNIKORN-2411, this is no longer relevant. > Adding Chinese translation of User Based Resource Usage Tracking > Key: YUNIKORN-1691 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1691 > Project: Apache YuniKorn > Issue Type: Task > Components: website > Reporter: Chen Yu Teng > Assignee: Chenchen Lai > Priority: Major
[jira] [Resolved] (YUNIKORN-1692) Adding Chinese translation of User Based Resource Usage Tracking
[ https://issues.apache.org/jira/browse/YUNIKORN-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1692. Resolution: Won't Do. With the changes from YUNIKORN-2411, this is no longer relevant. > Adding Chinese translation of User Based Resource Usage Tracking > Key: YUNIKORN-1692 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1692 > Project: Apache YuniKorn > Issue Type: Task > Components: website > Reporter: Chen Yu Teng > Assignee: Huang Guan Hao > Priority: Major
[jira] [Resolved] (YUNIKORN-1511) Adding Chinese translation of Deploy to Kubernetes
[ https://issues.apache.org/jira/browse/YUNIKORN-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-1511. Resolution: Won't Do. With the changes from YUNIKORN-2411, this is no longer relevant. > Adding Chinese translation of Deploy to Kubernetes > Key: YUNIKORN-1511 > URL: https://issues.apache.org/jira/browse/YUNIKORN-1511 > Project: Apache YuniKorn > Issue Type: Task > Reporter: Chen Yu Teng > Assignee: Chenchen Lai > Priority: Major > Labels: pull-request-available
[jira] [Resolved] (YUNIKORN-2026) Update features document in Chinese translation
[ https://issues.apache.org/jira/browse/YUNIKORN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wilfred Spiegelenburg resolved YUNIKORN-2026. Resolution: Won't Do. With the changes from YUNIKORN-2411, this is no longer relevant. > Update features document in Chinese translation > Key: YUNIKORN-2026 > URL: https://issues.apache.org/jira/browse/YUNIKORN-2026 > Project: Apache YuniKorn > Issue Type: Task > Components: documentation > Reporter: JiaChi Wang > Assignee: JiaChi Wang > Priority: Minor > Labels: pull-request-available > Some parts are missing in the Chinese translation of the features document.