This is an automated email from the ASF dual-hosted git repository.
feiwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kyuubi.git
The following commit(s) were added to refs/heads/master by this push:
new f8431da7a [KYUUBI #6686] Ignore Spark pod container state if pod is
terminated
f8431da7a is described below
commit f8431da7acc4530929ffbddf9fdaa8e6cec9f15c
Author: Wang, Fei <[email protected]>
AuthorDate: Sat Sep 14 12:28:28 2024 -0700
[KYUUBI #6686] Ignore Spark pod container state if pod is terminated
# :mag: Description
## Issue References
To close #6686

The pod is already in the failed state, while the driver container is still in
the waiting state.
In this case, the application should be marked as terminated and the container
state ignored.
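
The rule described above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the Kyuubi code itself: `AppStateSketch`, `AppState`, and `resolve` are hypothetical names, and the set of states is reduced to four for brevity.

```scala
// Illustrative stand-ins only; Kyuubi's real ApplicationState has more values.
object AppStateSketch {
  sealed trait AppState
  case object Pending extends AppState
  case object Running extends AppState
  case object Finished extends AppState
  case object Failed extends AppState

  // Assumption for this sketch: only Finished and Failed count as terminated.
  def isTerminated(state: AppState): Boolean = state match {
    case Finished | Failed => true
    case _ => false
  }

  // Once the pod has terminated, its state wins and the container state is ignored.
  // Otherwise use the container state when there is one, else the pod state.
  def resolve(podState: AppState, containerState: Option[AppState]): AppState =
    if (isTerminated(podState)) podState
    else containerState.getOrElse(podState)
}
```

With these definitions, `AppStateSketch.resolve(AppStateSketch.Failed, Some(AppStateSketch.Pending))` yields `Failed`, which corresponds to the failed pod with a waiting driver container reported in the issue.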
## Describe Your Solution
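When the pod state maps to a terminated application state, the pod state is
used and the container state is ignored. Otherwise, the container state is used
when a matching container exists, falling back to the pod state. The error
details of a failed application now include the pod status alongside the
container status, and the documentation of
`kyuubi.kubernetes.application.state.source` is updated accordingly.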
## Types of changes :bookmark:
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
## Test Plan
#### Behavior Without This Pull Request :coffin:
#### Behavior With This Pull Request :tada:
#### Related Unit Tests
---
# Checklist
- [x] This patch was not authored or co-authored using [Generative
Tooling](https://www.apache.org/legal/generative-tooling.html)
Closes #6690 from turboFei/pod_state.
Closes #6686
0d4c8a255 [Wang, Fei] comments
d60b901c1 [Wang, Fei] check pod terminated
Authored-by: Wang, Fei <[email protected]>
Signed-off-by: Wang, Fei <[email protected]>
---
docs/configuration/settings.md | 42 +++++++++++-----------
.../org/apache/kyuubi/config/KyuubiConf.scala | 4 ++-
.../engine/KubernetesApplicationOperation.scala | 33 +++++++++++------
3 files changed, 47 insertions(+), 32 deletions(-)
diff --git a/docs/configuration/settings.md b/docs/configuration/settings.md
index 8cfa2659f..b7ecef116 100644
--- a/docs/configuration/settings.md
+++ b/docs/configuration/settings.md
@@ -344,27 +344,27 @@ You can configure the Kyuubi properties in
`$KYUUBI_HOME/conf/kyuubi-defaults.co
### Kubernetes
-| Key |
Default |
Meaning
[...]
-|----------------------------------------------------------------------|----------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
-| kyuubi.kubernetes.application.state.container |
spark-kubernetes-driver |
The container name to retrieve the application state from.
[...]
-| kyuubi.kubernetes.application.state.source | POD
| The
source to retrieve the application state from. The valid values are pod and
container. If the source is container and there is container inside the pod
with the name of kyuubi.kubernetes.application.state.container, the application
state will be from the matched container state. Otherwise, the application
state will be from the pod stat [...]
-| kyuubi.kubernetes.authenticate.caCertFile |
<undefined> |
Path to the CA cert file for connecting to the Kubernetes API server over TLS
from the kyuubi. Specify this as a path as opposed to a URI (i.e. do not
provide a scheme)
[...]
-| kyuubi.kubernetes.authenticate.clientCertFile |
<undefined> |
Path to the client cert file for connecting to the Kubernetes API server over
TLS from the kyuubi. Specify this as a path as opposed to a URI (i.e. do not
provide a scheme)
[...]
-| kyuubi.kubernetes.authenticate.clientKeyFile |
<undefined> |
Path to the client key file for connecting to the Kubernetes API server over
TLS from the kyuubi. Specify this as a path as opposed to a URI (i.e. do not
provide a scheme)
[...]
-| kyuubi.kubernetes.authenticate.oauthToken |
<undefined> |
The OAuth token to use when authenticating against the Kubernetes API server.
Note that unlike, the other authentication options, this must be the exact
string value of the token to use for the authentication.
[...]
-| kyuubi.kubernetes.authenticate.oauthTokenFile |
<undefined> |
Path to the file containing the OAuth token to use when authenticating against
the Kubernetes API server. Specify this as a path as opposed to a URI (i.e. do
not provide a scheme)
[...]
-| kyuubi.kubernetes.context |
<undefined> |
The desired context from your kubernetes config file used to configure the K8s
client for interacting with the cluster.
[...]
-| kyuubi.kubernetes.context.allow.list
|| The
allowed kubernetes context list, if it is empty, there is no kubernetes context
limitation.
[...]
-| kyuubi.kubernetes.master.address |
<undefined> |
The internal Kubernetes master (API server) address to be used for kyuubi.
[...]
-| kyuubi.kubernetes.namespace |
default |
The namespace that will be used for running the kyuubi pods and find engines.
[...]
-| kyuubi.kubernetes.namespace.allow.list
|| The
allowed kubernetes namespace list, if it is empty, there is no kubernetes
namespace limitation.
[...]
-| kyuubi.kubernetes.spark.appUrlPattern |
http://{{SPARK_DRIVER_SVC}}.{{KUBERNETES_NAMESPACE}}.svc:{{SPARK_UI_PORT}} |
The pattern to generate the spark on kubernetes application UI URL. The pattern
should contain placeholders for the application variables. Available
placeholders are `{{SPARK_APP_ID}}`, `{{SPARK_DRIVER_SVC}}`,
`{{KUBERNETES_NAMESPACE}}`, `{{KUBERNETES_CONTEXT}}` and `{{SPARK_UI_PORT}}`.
[...]
-| kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.checkInterval | PT1M
| Kyuubi
server use guava cache as the cleanup trigger with time-based eviction, but the
eviction would not happened until any get/put operation happened. This option
schedule a daemon thread evict cache periodically.
[...]
-| kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind | NONE
| Kyuubi
server will delete the spark driver pod after the application terminates for
kyuubi.kubernetes.terminatedApplicationRetainPeriod. Available options are
NONE, ALL, COMPLETED and default value is None which means none of the pod will
be deleted
[...]
-| kyuubi.kubernetes.spark.forciblyRewriteDriverPodName.enabled | false
| Whether
to forcibly rewrite Spark driver pod name with 'kyuubi-<uuid>-driver'. If
disabled, Kyuubi will try to preserve the application name while satisfying
K8s' pod name policy, but some vendors may have stricter pod name policies,
thus the generated name may become illegal.
[...]
-| kyuubi.kubernetes.spark.forciblyRewriteExecutorPodNamePrefix.enabled | false
| Whether
to forcibly rewrite Spark executor pod name prefix with 'kyuubi-<uuid>'. If
disabled, Kyuubi will try to preserve the application name while satisfying
K8s' pod name policy, but some vendors may have stricter Pod name policies,
thus the generated name may become illegal.
[...]
-| kyuubi.kubernetes.terminatedApplicationRetainPeriod | PT5M
| The
period for which the Kyuubi server retains application information after the
application terminates.
[...]
-| kyuubi.kubernetes.trust.certificates | false
| If set
to true then client can submit to kubernetes cluster only with token
[...]
+| Key |
Default |
Meaning
[...]
+|----------------------------------------------------------------------|----------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
+| kyuubi.kubernetes.application.state.container |
spark-kubernetes-driver |
The container name to retrieve the application state from.
[...]
+| kyuubi.kubernetes.application.state.source | POD
| The
source to retrieve the application state from. The valid values are pod and
container. When the pod is in a terminated state, the container state will be
ignored, and the application state will be determined based on the pod state.
If the source is container and there is container inside the pod with the name
of kyuubi.kubernetes.applic [...]
+| kyuubi.kubernetes.authenticate.caCertFile |
<undefined> |
Path to the CA cert file for connecting to the Kubernetes API server over TLS
from the kyuubi. Specify this as a path as opposed to a URI (i.e. do not
provide a scheme)
[...]
+| kyuubi.kubernetes.authenticate.clientCertFile |
<undefined> |
Path to the client cert file for connecting to the Kubernetes API server over
TLS from the kyuubi. Specify this as a path as opposed to a URI (i.e. do not
provide a scheme)
[...]
+| kyuubi.kubernetes.authenticate.clientKeyFile |
<undefined> |
Path to the client key file for connecting to the Kubernetes API server over
TLS from the kyuubi. Specify this as a path as opposed to a URI (i.e. do not
provide a scheme)
[...]
+| kyuubi.kubernetes.authenticate.oauthToken |
<undefined> |
The OAuth token to use when authenticating against the Kubernetes API server.
Note that unlike, the other authentication options, this must be the exact
string value of the token to use for the authentication.
[...]
+| kyuubi.kubernetes.authenticate.oauthTokenFile |
<undefined> |
Path to the file containing the OAuth token to use when authenticating against
the Kubernetes API server. Specify this as a path as opposed to a URI (i.e. do
not provide a scheme)
[...]
+| kyuubi.kubernetes.context |
<undefined> |
The desired context from your kubernetes config file used to configure the K8s
client for interacting with the cluster.
[...]
+| kyuubi.kubernetes.context.allow.list
|| The
allowed kubernetes context list, if it is empty, there is no kubernetes context
limitation.
[...]
+| kyuubi.kubernetes.master.address |
<undefined> |
The internal Kubernetes master (API server) address to be used for kyuubi.
[...]
+| kyuubi.kubernetes.namespace |
default |
The namespace that will be used for running the kyuubi pods and find engines.
[...]
+| kyuubi.kubernetes.namespace.allow.list
|| The
allowed kubernetes namespace list, if it is empty, there is no kubernetes
namespace limitation.
[...]
+| kyuubi.kubernetes.spark.appUrlPattern |
http://{{SPARK_DRIVER_SVC}}.{{KUBERNETES_NAMESPACE}}.svc:{{SPARK_UI_PORT}} |
The pattern to generate the spark on kubernetes application UI URL. The pattern
should contain placeholders for the application variables. Available
placeholders are `{{SPARK_APP_ID}}`, `{{SPARK_DRIVER_SVC}}`,
`{{KUBERNETES_NAMESPACE}}`, `{{KUBERNETES_CONTEXT}}` and `{{SPARK_UI_PORT}}`.
[...]
+| kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.checkInterval | PT1M
| Kyuubi
server use guava cache as the cleanup trigger with time-based eviction, but the
eviction would not happened until any get/put operation happened. This option
schedule a daemon thread evict cache periodically.
[...]
+| kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind | NONE
| Kyuubi
server will delete the spark driver pod after the application terminates for
kyuubi.kubernetes.terminatedApplicationRetainPeriod. Available options are
NONE, ALL, COMPLETED and default value is None which means none of the pod will
be deleted
[...]
+| kyuubi.kubernetes.spark.forciblyRewriteDriverPodName.enabled | false
| Whether
to forcibly rewrite Spark driver pod name with 'kyuubi-<uuid>-driver'. If
disabled, Kyuubi will try to preserve the application name while satisfying
K8s' pod name policy, but some vendors may have stricter pod name policies,
thus the generated name may become illegal.
[...]
+| kyuubi.kubernetes.spark.forciblyRewriteExecutorPodNamePrefix.enabled | false
| Whether
to forcibly rewrite Spark executor pod name prefix with 'kyuubi-<uuid>'. If
disabled, Kyuubi will try to preserve the application name while satisfying
K8s' pod name policy, but some vendors may have stricter Pod name policies,
thus the generated name may become illegal.
[...]
+| kyuubi.kubernetes.terminatedApplicationRetainPeriod | PT5M
| The
period for which the Kyuubi server retains application information after the
application terminates.
[...]
+| kyuubi.kubernetes.trust.certificates | false
| If set
to true then client can submit to kubernetes cluster only with token
[...]
### Lineage
diff --git
a/kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
b/kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
index 999f1844e..e4b1b8c21 100644
--- a/kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
+++ b/kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
@@ -1349,7 +1349,9 @@ object KyuubiConf {
val KUBERNETES_APPLICATION_STATE_SOURCE: ConfigEntry[String] =
buildConf("kyuubi.kubernetes.application.state.source")
.doc("The source to retrieve the application state from. The valid
values are " +
- "pod and container. If the source is container and there is container
inside the pod " +
+ "pod and container. When the pod is in a terminated state, the
container state" +
+ " will be ignored, and the application state will be determined based
on the pod state." +
+ " If the source is container and there is container inside the pod " +
s"with the name of ${KUBERNETES_APPLICATION_STATE_CONTAINER.key}, the
application state " +
s"will be from the matched container state. " +
s"Otherwise, the application state will be from the pod state.")
diff --git
a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
index c4d3c93ba..41b4d4481 100644
---
a/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
+++
b/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
@@ -484,20 +484,33 @@ object KubernetesApplicationOperation extends Logging {
.find(cs => appStateContainer.equalsIgnoreCase(cs.getName))
case KubernetesApplicationStateSource.POD => None
}
- val applicationState = containerStatusToBuildAppState
+
+ val podAppState = podStateToApplicationState(pod.getStatus.getPhase)
+ val containerAppState = containerStatusToBuildAppState
.map(_.getState)
.map(containerStateToApplicationState)
- .getOrElse(podStateToApplicationState(pod.getStatus.getPhase))
- val applicationError = if (ApplicationState.isFailed(applicationState)) {
- val errorMap = containerStatusToBuildAppState.map { cs =>
- Map("Pod" -> podName, "Container" -> appStateContainer,
"ContainerStatus" -> cs)
- }.getOrElse {
- Map("Pod" -> podName, "PodStatus" -> pod.getStatus)
- }
- Some(JsonUtils.toPrettyJson(errorMap.asJava))
+
+ // When the pod app state is terminated, the container app state will be
ignored
+ val applicationState = if (ApplicationState.isTerminated(podAppState)) {
+ podAppState
} else {
- None
+ containerAppState.getOrElse(podAppState)
}
+ val applicationError =
+ if (ApplicationState.isFailed(applicationState)) {
+ val errorMap = containerStatusToBuildAppState.map { cs =>
+ Map(
+ "Pod" -> podName,
+ "PodStatus" -> pod.getStatus,
+ "Container" -> appStateContainer,
+ "ContainerStatus" -> cs)
+ }.getOrElse {
+ Map("Pod" -> podName, "PodStatus" -> pod.getStatus)
+ }
+ Some(JsonUtils.toPrettyJson(errorMap.asJava))
+ } else {
+ None
+ }
applicationState -> applicationError
}