[jira] [Created] (YUNIKORN-2575) Make logging for IsPodFitNode clear

2024-04-22 Thread Wilfred Spiegelenburg (Jira)
Wilfred Spiegelenburg created YUNIKORN-2575:
---

 Summary: Make logging for IsPodFitNode clear
 Key: YUNIKORN-2575
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2575
 Project: Apache YuniKorn
  Issue Type: Improvement
  Components: shim - kubernetes
Reporter: Wilfred Spiegelenburg
Assignee: Wilfred Spiegelenburg


The logging in {{IsPodFitNode()}} emits the same message whether the pod or 
the node is missing. We should log clearly which one is missing: the node or the pod.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org



[jira] [Resolved] (YUNIKORN-2544) [UMBRELLA] Fix Yunikorn potential locking issues

2024-04-22 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2544.

Resolution: Fixed

All subtasks have been resolved, closing ticket.

> [UMBRELLA] Fix Yunikorn potential locking issues
> 
>
> Key: YUNIKORN-2544
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2544
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>
> Go tool [go-deadlock|https://github.com/sasha-s/go-deadlock/] identified 
> several potential deadlocks in Yunikorn.
> Some of these do not cause problems right now, but a lock-related change in 
> the future can trigger a deadlock.
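
For illustration, a small sketch of the kind of pattern that is harmless today but can deadlock after a later change: two code paths that take the same pair of locks in opposite order. The type {{pair}} and its methods are hypothetical and are not taken from the YuniKorn code base.

```go
package main

import (
	"fmt"
	"sync"
)

// pair is a hypothetical type that only illustrates inconsistent lock ordering.
type pair struct {
	a, b  sync.Mutex
	count int
}

// lockAB takes lock a before lock b.
func (p *pair) lockAB() {
	p.a.Lock()
	defer p.a.Unlock()
	p.b.Lock()
	defer p.b.Unlock()
	p.count++
}

// lockBA takes lock b before lock a, the reverse order.
func (p *pair) lockBA() {
	p.b.Lock()
	defer p.b.Unlock()
	p.a.Lock()
	defer p.a.Unlock()
	p.count++
}

func main() {
	p := &pair{}
	// Called from a single goroutine the two orders never overlap, so
	// nothing blocks. If a later change ran them concurrently, each
	// goroutine could hold one lock while waiting for the other.
	p.lockAB()
	p.lockBA()
	fmt.Println(p.count)
}
```

Keeping a single acquisition order for every pair of locks removes this class of problem.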






[jira] [Resolved] (YUNIKORN-2563) [shim] Enable deadlock detection during unit tests

2024-04-22 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved YUNIKORN-2563.
--
Fix Version/s: 1.6.0
   Resolution: Fixed

> [shim] Enable deadlock detection during unit tests
> --
>
> Key: YUNIKORN-2563
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2563
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: shim - kubernetes, test - unit
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>







[jira] [Created] (YUNIKORN-2574) totalPartitionResource should not be mutated with AddTo/SubFrom

2024-04-22 Thread Peter Bacsko (Jira)
Peter Bacsko created YUNIKORN-2574:
--

 Summary: totalPartitionResource should not be mutated with AddTo/SubFrom
 Key: YUNIKORN-2574
 URL: https://issues.apache.org/jira/browse/YUNIKORN-2574
 Project: Apache YuniKorn
  Issue Type: Bug
  Components: core - scheduler
Affects Versions: 1.5.0, 1.4.0
Reporter: Peter Bacsko
Assignee: Peter Bacsko


There is a potential data race in PartitionContext: the field 
"totalPartitionResource" is mutated in place. The problem is that the method 
{{GetTotalPartitionResource()}} does not clone it, so callers keep a reference 
to the same object after the read lock is released.

{noformat}
func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource {
	pc.RLock()
	defer pc.RUnlock()

	return pc.totalPartitionResource
}
{noformat}

In general, we should prefer the immutable approach for variables like this, 
just like in {{objects.Queue}}:
{noformat}
func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported bool) error {
	// check this queue: failure stops checks if the allocation is not part of a node addition
	newAllocated := resources.Add(sq.allocatedResource, alloc)
	[ ... removed ... ]
	sq.Lock()
	defer sq.Unlock()
	// all OK update this queue
	sq.allocatedResource = newAllocated
	sq.updateAllocatedResourceMetrics()
	return nil
}

// incPendingResource increments pending resource of this queue and its parents.
func (sq *Queue) incPendingResource(delta *resources.Resource) {
	// update the parent
	if sq.parent != nil {
		sq.parent.incPendingResource(delta)
	}
	// update this queue
	sq.Lock()
	defer sq.Unlock()
	sq.pending = resources.Add(sq.pending, delta)
	sq.updatePendingResourceMetrics()
}
{noformat}
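
A minimal sketch of the immutable approach applied to a partition-like type. The names {{resource}}, {{add}}, {{partition}}, {{getTotal}} and {{addNode}} are simplified, hypothetical stand-ins and not the actual YuniKorn types; the point is only that the field is replaced with a new value instead of being changed in place.

```go
package main

import (
	"fmt"
	"sync"
)

// resource is a simplified, hypothetical stand-in for resources.Resource.
type resource map[string]int64

// add returns a new resource and leaves both inputs untouched.
func add(a, b resource) resource {
	out := make(resource, len(a))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] += v
	}
	return out
}

// partition is a hypothetical stand-in for PartitionContext.
type partition struct {
	sync.RWMutex
	total resource
}

// getTotal can hand out the stored value without cloning, because
// addNode never modifies a value that has already been published.
func (p *partition) getTotal() resource {
	p.RLock()
	defer p.RUnlock()
	return p.total
}

// addNode replaces the field with a freshly built value under the write lock.
func (p *partition) addNode(delta resource) {
	p.Lock()
	defer p.Unlock()
	p.total = add(p.total, delta)
}

func main() {
	p := &partition{total: resource{"vcore": 4}}
	before := p.getTotal()
	p.addNode(resource{"vcore": 2})
	// The earlier snapshot is unchanged; a new read sees the update.
	fmt.Println(before["vcore"], p.getTotal()["vcore"])
}
```

A value obtained before the update stays at 4 while a later read returns 6, so readers never observe a resource that is being modified. The alternative is to clone in the getter, at the cost of a copy on every read.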








[jira] [Resolved] (YUNIKORN-2547) Queue: clean up logic when adding application

2024-04-22 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved YUNIKORN-2547.
--
Fix Version/s: 1.6.0
   Resolution: Fixed

> Queue: clean up logic when adding application
> -
>
> Key: YUNIKORN-2547
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2547
> Project: Apache YuniKorn
>  Issue Type: Sub-task
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We found two issues when adding an application to a queue:
> # Inside {{Queue.AddApplication()}}, we parse and process "quota" and 
> "guaranteed" from the application tags, then we set them on the queue if they 
> have a valid value. We shouldn't be doing this inside {{AddApplication()}}, 
> but rather when we're constructing the application object. That way, they're 
> already available when the app is being added.
> # We add an application to the queue, but this can be reverted immediately if 
> the placeholder doesn't fit or the "sortType" is not FIFO.
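
A rough sketch of the second point: run the checks first and register the application only when they pass, so nothing has to be reverted. The types {{queue}} and {{app}}, their fields, and the assumption that the checks apply only to applications that request placeholders are all hypothetical simplifications, not the core's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// queue and app are hypothetical, simplified stand-ins for the core types.
type queue struct {
	sortType string
	maxRes   int64
	apps     map[string]*app
}

type app struct {
	id             string
	placeholderAsk int64
}

// addApplication validates everything first and only then registers the
// application, so there is no add-then-revert window.
func (q *queue) addApplication(a *app) error {
	// Assumption: the checks only matter when placeholders are requested.
	if a.placeholderAsk > 0 {
		if q.sortType != "fifo" {
			return errors.New("placeholders require a FIFO queue")
		}
		if a.placeholderAsk > q.maxRes {
			return errors.New("placeholder request does not fit in the queue")
		}
	}
	q.apps[a.id] = a
	return nil
}

func main() {
	q := &queue{sortType: "fair", maxRes: 10, apps: map[string]*app{}}
	err := q.addApplication(&app{id: "app-1", placeholderAsk: 5})
	// The rejected application was never added to the queue.
	fmt.Println(err, len(q.apps))
}
```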






[jira] [Resolved] (YUNIKORN-2541) Fix CVE-2023-45288

2024-04-22 Thread Yu-Lin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Lin Chen resolved YUNIKORN-2541.
---
Fix Version/s: 1.6.0
   Resolution: Fixed

Merged to master. Thanks to [~targetoee] for the contribution.

> Fix CVE-2023-45288
> --
>
> Key: YUNIKORN-2541
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2541
> Project: Apache YuniKorn
>  Issue Type: Improvement
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: JiaChi Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0
>
>
> Update golang.org/x/net to 0.23.0.






[jira] [Resolved] (YUNIKORN-2562) Nil pointer panic in Application.ReplaceAllocation()

2024-04-22 Thread Peter Bacsko (Jira)


 [ 
https://issues.apache.org/jira/browse/YUNIKORN-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko resolved YUNIKORN-2562.

Fix Version/s: 1.6.0
   1.5.1
   Resolution: Fixed

> Nil pointer panic in Application.ReplaceAllocation()
> 
>
> Key: YUNIKORN-2562
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2562
> Project: Apache YuniKorn
>  Issue Type: Bug
>  Components: core - scheduler
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.6.0, 1.5.1
>
>
> The following panic was generated during placeholder replacement:
> {noformat}
> 2024-04-16T13:46:58.583Z  INFO  shim.cache.task  cache/task.go:542  releasing allocations  {"numOfAsksToRelease": 1, "numOfAllocationsToRelease": 1}
> 2024-04-16T13:46:58.583Z  INFO  shim.fsm  cache/task_state.go:380  Task state transition  {"app": "application-spark-abrdrsmo8no2", "task": "cd73be15-af61-4248-89e1-d3296e72214e", "taskAlias": "obem-spark/tg-application-spark-abrdrsmo8n-spark-driver-y71h0amzo5", "source": "Bound", "destination": "Completed", "event": "CompleteTask"}
> 2024-04-16T13:46:58.584Z  INFO  core.scheduler.application  objects/application.go:616  ask removed successfully from application  {"appID": "application-spark-abrdrsmo8no2", "ask": "cd73be15-af61-4248-89e1-d3296e72214e", "pendingDelta": "map[]"}
> 2024-04-16T13:46:58.584Z  INFO  core.scheduler.partition  scheduler/partition.go:1281  replacing placeholder allocation  {"appID": "application-spark-abrdrsmo8no2", "allocationID": "cd73be15-af61-4248-89e1-d3296e72214e"}
> panic: runtime error: invalid memory address or nil pointer dereference
> [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x17e1255]
> goroutine 117 [running]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Application).ReplaceAllocation(0xc008c46600, {0xc007710cf0, 0x24})
>   github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/objects/application.go:1745 +0x615
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation(0x?, 0xc009786700)
>   github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/partition.go:1284 +0x28b
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases(0xc00be64ba0?, {0xc00bb1af90, 0x1, 0x40a0fa?}, {0x1e0d902, 0x9})
>   github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/context.go:870 +0x9e
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent(0xc0005f5f58?, 0xc0071a3f10?)
>   github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/context.go:750 +0xa5
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent(0xc000700540)
>   github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/scheduler.go:133 +0x1c5
> created by github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService in goroutine 1
>   github.com/apache/yunikorn-core@v1.5.0-3/pkg/scheduler/scheduler.go:60 +0x9c
> {noformat}


