joaopamaral opened a new pull request, #6722:
URL: https://github.com/apache/kyuubi/pull/6722

   # :mag: Description
   ## Issue References 🔗
   
   This issue has been observed several times: the batch `state` was set to `ERROR`, but the `appState` stayed in a non-terminal state (e.g. `RUNNING`) indefinitely, even though the application (in this case a YARN application) had already finished.
   
   ```json
   {
   "id": "********",
   "user": "****",
   "batchType": "SPARK",
   "name": "*********",
   "appStartTime": 0,
   "appId": "********",
   "appUrl": "********",
   "appState": "RUNNING",
   "appDiagnostic": "",
   "kyuubiInstance": "*********",
   "state": "ERROR",
   "createTime": 1725343207318,
   "endTime": 1725343300986,
   "batchInfo": {}
   }
   ```
   
   This appears to happen when an intermittent failure occurs during the monitoring step: the batch ends with `ERROR`, but the application metadata is never updated. Anyone reading the metadata may then conclude that the application is still running. To avoid this misinterpretation, the `appState` should be set to `UNKNOWN` in this case.
   
   ## Describe Your Solution 🔧
   
   This is a simple fix: during the batch metadata update, if the batch state is `ERROR` and the `appState` is not a terminal state, the `appState` is changed to `UNKNOWN`.
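A minimal sketch of the check described above, in Scala (the language Kyuubi is written in). The `AppState` enumeration, its set of terminal states, and `reconcileAppState` are hypothetical names chosen for illustration, not the code in this PR; the batch state is modeled as a plain string, as it appears in the JSON above.

```scala
// Hypothetical stand-in for the application states stored in batch metadata.
object AppState extends Enumeration {
  val PENDING, RUNNING, FINISHED, KILLED, FAILED, NOT_FOUND, UNKNOWN = Value

  // Assumption: which states count as terminal in this sketch.
  private val terminal = Set(FINISHED, KILLED, FAILED, NOT_FOUND)

  def isTerminated(state: Value): Boolean = terminal.contains(state)
}

object BatchMetadataFix {
  // Returns the appState to persist, given the final batch state and the
  // last known appState. A non-terminal appState on an ERROR batch is
  // replaced with UNKNOWN; every other combination is left unchanged.
  def reconcileAppState(batchState: String, appState: AppState.Value): AppState.Value =
    if (batchState == "ERROR" && !AppState.isTerminated(appState)) AppState.UNKNOWN
    else appState
}
```

In this sketch the check runs only when the metadata is written, so a batch that ends in `ERROR` gets a consistent `appState` no matter where in the monitoring step the failure happened.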
   
   ## Types of changes :bookmark:
   - [x] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing 
functionality to change)
   
   ## Test Plan 🧪
   
   #### Behavior Without This Pull Request :coffin:
   
   If the request Kyuubi makes for the application status fails (e.g. through the YARN client), the batch finishes with the `ERROR` state and the application keeps its last known state (e.g. `RUNNING`).
   
   #### Behavior With This Pull Request :tada:
   
   If the request Kyuubi makes for the application status fails (e.g. through the YARN client), the batch finishes with the `ERROR` state, and if the application is in a non-terminal state, its `appState` is forced to `UNKNOWN`.
   
   #### Related Unit Tests
   
   I tried to implement a unit test that reproduces this behavior, but did not succeed. The test would need to force an exception in the engine request (e.g. `YarnClient.getApplication`), but only after the application has reached the `RUNNING` state; alternatively, it could block the connection between Kyuubi and the engine.
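One possible way to reproduce the scenario without a real YARN cluster is sketched below, assuming the status request can be hidden behind an injectable client. `AppStatusClient`, `FlakyClient`, and `MonitorSketch` are hypothetical and do not correspond to Kyuubi's actual classes, and the single-threaded polling loop is a simplification of the real monitoring step. The fake client returns `RUNNING` once and then throws, which covers the "wait for RUNNING, then fail" requirement.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical interface standing in for the engine/application request
// (e.g. a wrapper around the YARN client in the real code).
trait AppStatusClient {
  def getAppState(appId: String): String
}

// Fake client: reports RUNNING on the first call, then fails on every
// later call, imitating an intermittent failure after the app has started.
final class FlakyClient extends AppStatusClient {
  private var calls = 0
  override def getAppState(appId: String): String = {
    calls += 1
    if (calls == 1) "RUNNING"
    else throw new RuntimeException("simulated connection failure")
  }
}

object MonitorSketch {
  // Assumption: terminal application states for this sketch.
  private val terminal = Set("FINISHED", "KILLED", "FAILED", "NOT_FOUND")

  // Hypothetical monitor loop: polls until the app reaches a terminal state
  // or a request fails. Returns (batchState, appState) as they would be
  // written to the batch metadata.
  def monitor(client: AppStatusClient, appId: String): (String, String) = {
    var lastAppState = "PENDING"
    var batchState = "RUNNING"
    while (batchState == "RUNNING") {
      Try(client.getAppState(appId)) match {
        case Success(state) =>
          lastAppState = state
          if (terminal.contains(state)) batchState = "FINISHED"
        case Failure(_) =>
          batchState = "ERROR"
      }
    }
    // The fix: never persist a non-terminal appState for an ERROR batch.
    val appState =
      if (batchState == "ERROR" && !terminal.contains(lastAppState)) "UNKNOWN"
      else lastAppState
    (batchState, appState)
  }
}
```

A real test would likely inject the failure into the actual client used by the batch monitoring code and wait (for example, by polling) until the application is `RUNNING` before enabling it.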
   
   ---
   
   # Checklist 📝
   
   - [ ] This patch was not authored or co-authored using [Generative 
Tooling](https://www.apache.org/legal/generative-tooling.html)
   
   **Be nice. Be informative.**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
