[ 
https://issues.apache.org/jira/browse/YUNIKORN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved YUNIKORN-2599.
--------------------------------------
    Fix Version/s: 1.6.0
       Resolution: Fixed

> Certain shim events are not handled by the state machine
> --------------------------------------------------------
>
>                 Key: YUNIKORN-2599
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2599
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - yarn
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.6.0
>
>
> After YUNIKORN-2597 was merged, it became clear that we keep sending an 
> {{AppStateChange}} event which cannot be handled by the state machine: no 
> state transition is triggered by it.
> {{AppTaskCompleted}} is very similar: it is only processed in the 
> {{Resuming}} state, but it is sent whenever a task is completed.
> Running the test case {{TestApplicationScheduling}} displays the following 
> errors:
> {noformat}
> [...]
> 2024-05-02T18:08:14.856+0200  ERROR   shim.context    cache/context.go:1316   
> application event cannot be handled in the current state        
> {"applicationID": "app0001", "event": "AppStateChange", "state": "Running"}
> github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
> 2024-05-02T18:08:14.856+0200  INFO    core.scheduler.application      
> [...] 
> 2024-05-02T18:08:14.857+0200  INFO    core.scheduler.partition        
> scheduler/partition.go:928      scheduler allocation processed  {"appID": 
> "app0001", "allocationKey": "task0002", "allocatedResource": 
> "map[memory:10000000 pods:1 vcore:1]", "placeholder": false, "targetNode": 
> "test.host.02"}
> 2024-05-02T18:08:14.857+0200  ERROR   shim.context    cache/context.go:1316   
> application event cannot be handled in the current state        
> {"applicationID": "app0001", "event": "AppStateChange", "state": "Running"}
> github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
> [...]
> 2024-05-02T18:08:15.856+0200  INFO    shim.fsm        cache/task_state.go:380 
> Task state transition   {"app": "app0001", "task": "task0001", "taskAlias": 
> "default/task0001", "source": "Bound", "destination": "Completed", "event": 
> "CompleteTask"}
> 2024-05-02T18:08:15.856+0200  ERROR   shim.context    cache/context.go:1316   
> application event cannot be handled in the current state        
> {"applicationID": "app0001", "event": "AppTaskCompleted", "state": "Running"}
> github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
> [...]
> 2024-05-02T18:08:16.858+0200  INFO    shim.fsm        cache/task_state.go:380 
> Task state transition   {"app": "app0001", "task": "task0002", "taskAlias": 
> "default/task0002", "source": "Bound", "destination": "Completed", "event": 
> "CompleteTask"}
> 2024-05-02T18:08:16.858+0200  ERROR   shim.context    cache/context.go:1316   
> application event cannot be handled in the current state        
> {"applicationID": "app0001", "event": "AppTaskCompleted", "state": "Running"}
> github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
> [...]
> 2024-05-02T18:08:16.859+0200  ERROR   shim.context    cache/context.go:1316   
> application event cannot be handled in the current state        
> {"applicationID": "app0001", "event": "AppStateChange", "state": "Running"}
> github.com/apache/yunikorn-k8shim/pkg/shim.newShimSchedulerInternal.(*Context).ApplicationEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/cache/context.go:1316
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.getEventHandler.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:123
> github.com/apache/yunikorn-k8shim/pkg/dispatcher.Start.func1
>       /home/bacskop/repos/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:225
> 2024-05-02T18:08:17.859+0200  INFO    shim.cache.application  
> cache/application.go:243        task removed    {"appID": "app0001", 
> "taskID": "task0002"}
> {noformat}
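
The failure mode described above can be illustrated with a minimal, hand-rolled state machine. This is only a sketch under stated assumptions, not the k8shim implementation: the types `appMachine` and `transitionKey` and the functions `newAppMachine`, `CanHandle`, and `Handle` are hypothetical names, and the destination state of the `AppTaskCompleted` transition is chosen purely for illustration. The state and event names are taken from the log output. An event that has no transition registered for the current state is rejected, which is what produces the "cannot be handled in the current state" errors.

```go
package main

import "fmt"

// transitionKey identifies a transition by event name and source state.
// Hypothetical type, used only for this illustration.
type transitionKey struct {
	event string
	state string
}

// appMachine is a toy application state machine (not the k8shim one).
type appMachine struct {
	state       string
	transitions map[transitionKey]string
}

// newAppMachine registers a single transition: AppTaskCompleted is only
// accepted in the Resuming state, as the issue describes. The destination
// state ("Resuming") is an assumption made for this sketch.
// No transition is registered for AppStateChange at all.
func newAppMachine(initial string) *appMachine {
	return &appMachine{
		state: initial,
		transitions: map[transitionKey]string{
			{event: "AppTaskCompleted", state: "Resuming"}: "Resuming",
		},
	}
}

// CanHandle reports whether the event has a transition from the current state.
func (m *appMachine) CanHandle(event string) bool {
	_, ok := m.transitions[transitionKey{event: event, state: m.state}]
	return ok
}

// Handle applies the event, or returns an error if no transition matches.
func (m *appMachine) Handle(event string) error {
	dst, ok := m.transitions[transitionKey{event: event, state: m.state}]
	if !ok {
		return fmt.Errorf("event %s cannot be handled in state %s", event, m.state)
	}
	m.state = dst
	return nil
}

func main() {
	running := newAppMachine("Running")
	for _, ev := range []string{"AppStateChange", "AppTaskCompleted"} {
		if err := running.Handle(ev); err != nil {
			fmt.Println("rejected:", err)
		}
	}

	resuming := newAppMachine("Resuming")
	fmt.Println("Resuming accepts AppTaskCompleted:", resuming.CanHandle("AppTaskCompleted"))
}
```

One way to avoid such errors in a design like this is for the sender to check `CanHandle` before dispatching, or to stop sending events that have no registered transition. Which approach the actual fix takes is not stated in this message.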



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

