JoonPark1 opened a new issue, #7226:
URL: https://github.com/apache/kyuubi/issues/7226

   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [x] I have searched in the 
[issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Describe the bug
   
   This occurs in Kubernetes cluster deployment mode, with Kyuubi running on a Kubernetes cluster. When a large number of Spark batch jobs are submitted to Kyuubi at once, Kyuubi attempts to spin up one Spark driver per batch job. Under heavy load, however, the drivers are never created, so Kyuubi stores a record for each batch job via the MetadataManager with state "PENDING" and keeps polling each job's status until it runs out of memory. On the next restart of the Kyuubi pod, the same polling repeats, because the records are never updated: the Spark drivers that would handle those batch jobs were never created or scheduled in the first place. The batch-job records in the metadata store therefore persist with a "state" field of "PENDING" and an "engine_state" field of "UNKNOWN"; they can never be resolved, and the repeated polling causes every subsequent restart of Kyuubi to run out of memory as well.
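
   For reference, the stuck records can be listed directly in the metadata store. Below is a minimal sketch (Python with psycopg2) against the Postgres metadata database from my configuration; it assumes the JDBC metadata store's default `metadata` table and the `identifier`/`create_time` column names, so adjust those if your schema differs. The `state` and `engine_state` values are the ones described above.

   ```python
   # Sketch: list batch records stuck in PENDING with an UNKNOWN engine state.
   # The table name "metadata" and the identifier/create_time columns are
   # assumptions based on the default JDBC metadata store schema; state and
   # engine_state are the fields observed above. The password placeholder
   # matches the server config.
   import psycopg2

   conn = psycopg2.connect(
       host="kyuubipoc.datalakehouse.dv.prw.cloud.geico.net",
       port=5432,
       dbname="kyuubidb",
       user="srv-kyuubi-user-dv",
       password="_KYUUBI_DB_PWD_",
   )
   with conn, conn.cursor() as cur:
       cur.execute(
           """
           SELECT identifier, create_time, state, engine_state
           FROM metadata
           WHERE state = 'PENDING' AND engine_state = 'UNKNOWN'
           ORDER BY create_time
           """
       )
       for row in cur.fetchall():
           print(row)
   conn.close()
   ```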
   
   ### Affects Version(s)
   
   v1.10.2
   
   ### Kyuubi Server Log Output
   
   ```logtalk
   2025-10-21 16:03:48.837 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=406b8bc1-457e-4115-ae55-50a0d39c061c to be 
created, elapsed time: 92106ms, return UNKNOWN status
   2025-10-21 16:03:48.929 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=6d439666-0484-4902-9dab-39ad39f96b3e to be 
created, elapsed time: 92291ms, return UNKNOWN status
   2025-10-21 16:03:48.929 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=fc5979a0-5599-4088-90ee-c5e995e0fca7 to be 
created, elapsed time: 92365ms, return UNKNOWN status
   2025-10-21 16:03:48.930 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=6aacf45b-21b3-44a5-bdd1-1f05eaaec393 to be 
created, elapsed time: 92365ms, return UNKNOWN status
   2025-10-21 16:03:48.930 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=2db1998a-e5e2-4a31-bc97-40f5f1b31345 to be 
created, elapsed time: 92258ms, return UNKNOWN status
   2025-10-21 16:03:48.931 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=20a84034-3c6b-47b0-916b-199b0e0750da to be 
created, elapsed time: 92250ms, return UNKNOWN status
   2025-10-21 16:03:48.932 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=3aa6e64e-a4af-443e-97c3-4296befd050a to be 
created, elapsed time: 92202ms, return UNKNOWN status
   2025-10-21 16:03:48.937 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=d259e70a-ef06-41f3-8e2e-fe661552fae1 to be 
created, elapsed time: 92199ms, return UNKNOWN status
   2025-10-21 16:03:48.938 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=f4e45523-f18a-4f69-b163-07b96666cee0 to be 
created, elapsed time: 92271ms, return UNKNOWN status
   2025-10-21 16:03:49.135 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=171cacb5-4b47-4db4-a384-6f2925f927e6 to be 
created, elapsed time: 92505ms, return UNKNOWN status
   2025-10-21 16:03:50.638 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=d98b10c6-e56a-429a-b86c-79f353ac18bb to be 
created, elapsed time: 94073ms, return UNKNOWN status
   2025-10-21 16:03:50.640 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=a1e6f809-c1b7-41a7-b21f-007eaa2eaaf0 to be 
created, elapsed time: 94076ms, return UNKNOWN status
   2025-10-21 16:03:50.729 WARN 
org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod 
with label: kyuubi-unique-tag=0d71979b-3f34-485f-8088-9bada0308133 to be 
created, elapsed time: 94086ms, return UNKNOWN status
   ```
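
   To confirm that the driver pods polled above were never actually created, the same label can be checked directly against the cluster. Here is a minimal sketch with the official Kubernetes Python client, using the `kyuubi-poc` namespace from the configuration and one of the `kyuubi-unique-tag` values from the log above; nothing else is assumed.

   ```python
   # Sketch: check whether a driver pod carrying the polled label ever appeared.
   from kubernetes import client, config

   config.load_kube_config()  # or config.load_incluster_config() when run in-cluster
   v1 = client.CoreV1Api()

   tag = "406b8bc1-457e-4115-ae55-50a0d39c061c"  # one of the tags from the log above
   pods = v1.list_namespaced_pod(
       "kyuubi-poc",
       label_selector=f"kyuubi-unique-tag={tag}",
   )
   if not pods.items:
       print("no driver pod was ever created for this batch")
   for pod in pods.items:
       print(pod.metadata.name, pod.status.phase)
   ```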
   
   ### Kyuubi Engine Log Output
   
   ```logtalk
   No engine log output... it seems the Kyuubi server pod crashes before it even has a chance to write the engine logs for each batch-job submission for my user...
   ```
   
   ### Kyuubi Server Configurations
   
   ```yaml
   ################################################## kyuubi server settings 
#############################################
               
kyuubi.kubernetes.master.address=https://azure-eastus2-st-085-dv-dl-001-9ejkzfkk.hcp.eastus2.azmk8s.io:443
               kyuubi.kubernetes.namespace=kyuubi-poc
               
kyuubi.kubernetes.authenticate.driver.serviceAccountName=kyuubi-poc
               kyuubi.kubernetes.trust.certificates=true
               # defaults to POD; we ran into an edge case where ImagePullBackOff left the job stuck in PENDING
               kyuubi.kubernetes.application.state.source=CONTAINER
               
kyuubi.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token
               kyuubi.engine.kubernetes.submit.timeout=PT300S
               # enable arrow configuration
               kyuubi.operation.result.format=arrow
               # kyuubi.operation.incremental.collect=true
               ################################################## kyuubi engine 
settings #############################################
               kyuubi.engine.share.level=USER
               # kyuubi.engine.share.level=SERVER
               kyuubi.server.name=superset-poc-server
               ################################################## Very experimental stuff ############################################
               # kyuubi.engine.deregister.exception.messages=Error getting 
policies,serviceName=spark,httpStatusCode:400
               # kyuubi.engine.deregister.job.max.failures=1
               # kyuubi.engine.deregister.exception.ttl=PT10M
               ################################################## kyuubi admin 
list ##################################################
               kyuubi.server.administrators=A242528,A295378,A270054
               ################################################## kyuubi 
profile settings ############################################
               
kyuubi.session.conf.advisor=org.apache.kyuubi.session.FileSessionConfAdvisor
               ##################################kyuubi engine kill disable 
settings #################################################
               # kyuubi.engine.ui.stop.enabled=false
               ################################################## kyuubi engine 
clean up settings ####################################
               kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind=COMPLETED
               kyuubi.kubernetes.terminatedApplicationRetainPeriod=PT5M
               ################################################## User specific 
defaults #############################################
               # ___srv-spark-dbt-np___.kyuubi.session.engine.idle.timeout=PT30S
               # ___srv-spark-dbt-np___.kyuubi.session.idle.timeout=PT30S
               
___srv-spark-dbt-np___.kyuubi.session.engine.initialize.timeout=PT10M
               kyuubi.session.idle.timeout=PT15M
               kyuubi.batch.session.idle.timeout=PT15M
               kyuubi.engine.user.isolated.spark.session.idle.timeout=PT15M
               ################################################## Trino Engine 
#######################################################
               kyuubi.frontend.protocols=REST,THRIFT_BINARY,TRINO
               kyuubi.frontend.trino.bind.host=0.0.0.0
               kyuubi.frontend.trino.bind.port=10011
               ################################################## kyuubi ldap 
auth ###################################################
               kyuubi.authentication=LDAP
               # 
kyuubi.authentication.ldap.url=ldaps://GEICO-LDAPS-FR-IH.geico.corp.net:636
               
kyuubi.authentication.ldap.url=ldaps://GEICO-LDAPS-FR-IH.geico.corp.net:636 
ldaps://GEICO-LDAPS-PL-IH.geico.corp.net:636 
ldaps://GEICO-LDAPS-PD-WL.geico.corp.net:636
               
kyuubi.authentication.ldap.binddn=CN=SRV-DESPT-RGR-LDP-NP,OU=Service 
Accounts,OU=Admin,DC=GEICO,DC=corp,DC=net
               kyuubi.authentication.ldap.bindpw=_SYNC_LDAP_BIND_PASSWORD_
               
kyuubi.authentication.ldap.baseDN=OU=Admin,DC=GEICO,DC=corp,DC=net
               
kyuubi.authentication.ldap.userDNPattern=sAMAccountName=%s,OU=Admin,DC=GEICO,DC=corp,DC=net
               kyuubi.authentication.ldap.userMembershipKey=memberOf
               
kyuubi.authentication.ldap.groupDNPattern=CN=%s,OU=Admin,DC=GEICO,DC=corp,DC=net
               kyuubi.authentication.ldap.guidKey=sAMAccountName
               kyuubi.authentication.ldap.groupClassKey=group
               
kyuubi.authentication.ldap.groupFilter=ENT-ASG-DATALAKEHOUSE-COMPUTE-PLATFORM-NP-USER,ENT-SBR-DATALAKEHOUSE_CONTRIBUTOR-NP-ASSIGNED,ENT-ASG-ADB-EDPCOR-SB-DED-CONTRIBUTOR,ENT-ASG-AZURE-DATAOPS-PLATFORM-NP-ADMIN
               ################################################## kyuubi enable 
UI ##################################################
               kyuubi.frontend.rest.bind.host=0.0.0.0
               ################################################## kyuubi 
session settings ###########################################
               # 
kyuubi.session.conf.restrict.list=spark.sql.optimizer.excludedRules,spark.kubernetes.driver.node.selector.label,spark.kubernetes.executor.node.selector.label,spark.master,spark.submit.deployMode,spark.kubernetes.namespace,spark.kubernetes.authenticate.driver.serviceAccountName,spark.kubernetes.driver.podTemplateFile,spark.kubernetes.executor.podTemplateFile,spark.ui.killEnabled,spark.redaction.regex,spark.sql.redaction.string.regex
               
spark.kyuubi.conf.restricted.list=spark.sql.optimizer.excludedRules,spark.kubernetes.driver.node.selector.label,spark.kubernetes.executor.node.selector.label,spark.master,spark.submit.deployMode,spark.kubernetes.namespace,spark.kubernetes.authenticate.driver.serviceAccountName,spark.kubernetes.driver.podTemplateFile,spark.kubernetes.executor.podTemplateFile,spark.ui.killEnabled,spark.redaction.regex,spark.sql.redaction.string.regex
               
kyuubi.session.conf.ignore.list=spark.sql.optimizer.excludedRules,spark.kubernetes.driver.node.selector.label,spark.kubernetes.executor.node.selector.label,spark.master,spark.submit.deployMode,spark.kubernetes.namespace,spark.kubernetes.authenticate.driver.serviceAccountName,spark.kubernetes.driver.podTemplateFile,spark.kubernetes.executor.podTemplateFile,spark.ui.killEnabled,spark.redaction.regex,spark.sql.redaction.string.regex
               
kyuubi.batch.conf.ignore.list=spark.kubernetes.driver.node.selector.label,spark.kubernetes.executor.node.selector.label,spark.master,spark.submit.deployMode,spark.kubernetes.namespace,spark.kubernetes.authenticate.driver.serviceAccountName,spark.kubernetes.driver.podTemplateFile,spark.kubernetes.executor.podTemplateFile,spark.ui.killEnabled,spark.redaction.regex,spark.sql.redaction.string.regex
               ################################################## kyuubi 
zookeeper settings #########################################
               kyuubi.ha.addresses=http://etcd:2379
               
kyuubi.ha.client.class=org.apache.kyuubi.ha.client.etcd.EtcdDiscoveryClient
               kyuubi.ha.namespace=kyuubi
               ################################################## Database 
configurations for metadata store ########################
               kyuubi.metadata.store.jdbc.database.type=POSTGRESQL
               kyuubi.metadata.store.jdbc.driver=org.postgresql.Driver
               
kyuubi.metadata.store.jdbc.url=jdbc:postgresql://kyuubipoc.datalakehouse.dv.prw.cloud.geico.net:5432/kyuubidb?tcpKeepAlive=true&logUnclosedConnections=true&prepareThreshold=0
               kyuubi.metadata.store.jdbc.user=srv-kyuubi-user-dv
               kyuubi.metadata.store.jdbc.password=_KYUUBI_DB_PWD_
               kyuubi.metadata.store.jdbc.datasource.maxLifetime=180000
               # NOTE: the database allows at most 200 connections, so set the max pool size to 200 divided by the Kyuubi pod count; a smaller value is also fine
               kyuubi.metadata.store.jdbc.datasource.maximumPoolSize=20
               kyuubi.metadata.store.jdbc.datasource.connectionTimeout=30000
               
kyuubi.metadata.store.jdbc.datasource.leakDetectionThreshold=150000
               ################################################## Batch 
Defaults #####################################################
               
kyuubi.batchConf.spark.spark.master=k8s://azure-eastus2-st-085-dv-dl-001-9ejkzfkk.hcp.eastus2.azmk8s.io:443
               kyuubi.batchConf.spark.spark.kubernetes.namespace=kyuubi-poc
               
kyuubi.batchConf.spark.spark.kubernetes.authenticate.driver.serviceAccountName=kyuubi-poc
               
kyuubi.batchConf.spark.spark.hadoop.fs.AbstractFileSystem.abfss.impl=org.apache.hadoop.fs.azurebfs.Abfss
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto002.dfs.core.windows.net=OAuth
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto002.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto002.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto002.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto002.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
               
kyuubi.batchConf.spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto002.dfs.core.windows.net=SharedKey
               
kyuubi.batchConf.spark.hadoop.fs.azure.account.key.gzfedpcordv1sto002.dfs.core.windows.net=_GZFEDPCORDV1STO002_ADLS_KEY_
 
               
kyuubi.batchConf.spark.hadoop.fs.azure.account.auth.type.gzfdlhdrsdv1sto001.dfs.core.windows.net=SharedKey
               
kyuubi.batchConf.spark.hadoop.fs.azure.account.key.gzfdlhdrsdv1sto001.dfs.core.windows.net=_GZFDLHDRSDV1STO001_ADLS_KEY_
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfdlhingdv1sto001.dfs.core.windows.net=OAuth
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfdlhingdv1sto001.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfdlhingdv1sto001.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfdlhingdv1sto001.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfdlhingdv1sto001.dfs.core.windows.net=_DLH_ADLS_SECRET_
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto003.dfs.core.windows.net=OAuth
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto003.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto003.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto003.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto003.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto003.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto003.dfs.core.windows.net=_DLH_ADLS_SECRET_
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto004.dfs.core.windows.net=OAuth
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto004.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto004.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto004.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto004.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto004.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto004.dfs.core.windows.net=_DLH_ADLS_SECRET_
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfhststgdv1sto001.dfs.core.windows.net=OAuth
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfhststgdv1sto001.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfhststgdv1sto001.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfhststgdv1sto001.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
               # 
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfhststgdv1sto001.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfhststgdv1sto001.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfhststgdv1sto001.dfs.core.windows.net=_DLH_ADLS_SECRET_
               kyuubi.batchConf.spark.spark.eventLog.enabled=true
               kyuubi.batchConf.spark.spark.eventLog.compress=true
               kyuubi.batchConf.spark.spark.eventLog.compression.codec=zstd
               
kyuubi.batchConf.spark.spark.hadoop.fs.azure.write.request.size=33554432
               
kyuubi.batchConf.spark.spark.eventLog.dir=abfss://[email protected]/eventlogs
            
               
#kyuubi.batchConf.spark.spark.hadoop.hive.metastore.client.connect.retry.delay=5
               
#kyuubi.batchConf.spark.spark.hadoop.hive.metastore.client.socket.timeout=1800
               
#kyuubi.batchConf.spark.spark.hadoop.hive.metastore.uris=thrift://10.29.27.118:443
               
kyuubi.batchConf.spark.spark.hadoop.hive.server2.thrift.http.port=10002
               
kyuubi.batchConf.spark.spark.hadoop.hive.server2.thrift.port=10000
               
kyuubi.batchConf.spark.spark.hadoop.hive.server2.transport.mode=binary
               
#kyuubi.batchConf.spark.spark.hadoop.metastore.catalog.default=hive
               kyuubi.batchConf.spark.spark.hadoop.hive.execution.engine=spark
               
kyuubi.batchConf.spark.spark.hadoop.hive.input.format=io.delta.hive.HiveInputFormat
               
kyuubi.batchConf.spark.spark.hadoop.hive.tez.input.format=io.delta.hive.HiveInputFormat
               kyuubi.frontend.rest.proxy.jetty.client.responseBufferSize=16384
               # 
kyuubi.batchConf.spark.spark.redaction.regex="(?i)secret|password|passwd|token|\.account\.key|credential|credentials|\.client\.secret\|_secret|appMgrInfo|pwd"
               
kyuubi.server.redaction.regex='(?i)(secret|password|passwd|token|\.account\.key|credential|credentials|\.client\.secret\|_secret|pwd|appMgrInfo)'
               
kyuubi.batchConf.spark.sql.redaction.string.regex=(?i)\bselect\b[\s\S]+?\bfrom\b[\s\S]+?(;|$)
               ######################################################### 
Optimizations ################################################
               kyuubi.batchConf.spark.spark.sql.adaptive.enabled=true
               kyuubi.batchConf.spark.spark.sql.adaptive.forceApply=false
               kyuubi.batchConf.spark.spark.sql.adaptive.logLevel=info
               
kyuubi.batchConf.spark.spark.sql.adaptive.advisoryPartitionSizeInBytes=128m
               
kyuubi.batchConf.spark.spark.sql.adaptive.coalescePartitions.enabled=true
               
kyuubi.batchConf.spark.spark.sql.adaptive.coalescePartitions.minPartitionNum=1
               
kyuubi.batchConf.spark.spark.sql.adaptive.coalescePartitions.initialPartitionNum=1024
               
kyuubi.batchConf.spark.spark.sql.adaptive.fetchShuffleBlocksInBatch=true
               
kyuubi.batchConf.spark.spark.sql.adaptive.localShuffleReader.enabled=true
               kyuubi.batchConf.spark.spark.sql.adaptive.skewJoin.enabled=true
               
kyuubi.batchConf.spark.spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
               
kyuubi.batchConf.spark.spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=400m
               
kyuubi.batchConf.spark.spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0.2
               # DRA (shuffle tracking) defaults for batch engines
               kyuubi.batchConf.spark.spark.dynamicAllocation.enabled=true
               
kyuubi.batchConf.spark.spark.dynamicAllocation.shuffleTracking.enabled=true
               kyuubi.batchConf.spark.spark.dynamicAllocation.initialExecutors=2
               kyuubi.batchConf.spark.spark.dynamicAllocation.minExecutors=2
               kyuubi.batchConf.spark.spark.dynamicAllocation.maxExecutors=64
               
kyuubi.batchConf.spark.spark.dynamicAllocation.executorAllocationRatio=0.5
               
kyuubi.batchConf.spark.spark.dynamicAllocation.executorIdleTimeout=60s
               
kyuubi.batchConf.spark.spark.dynamicAllocation.cachedExecutorIdleTimeout=30min
               kyuubi.batchConf.spark.spark.cleaner.periodicGC.interval=5min
               kyuubi.batchConf.spark.spark.sql.autoBroadcastJoinThreshold=-1
               kyuubi.operation.getTables.ignoreTableProperties=true
               # spark executor ADLS Variables
               # 
kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_CLIENT_ID=1d680742-02be-4b8c-969f-afafeccdcc0e
               
kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_CLIENT_ID=dba6925b-465b-436b-b99c-f1b963988e48
               
kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_TENANT_ID=7389d8c0-3607-465c-a69f-7d4426502912
 
               # 
kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_CLIENT_SECRET=_HADOOP_ADLS_SECRET_
               
kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_CLIENT_SECRET=_DLH_ADLS_SECRET_
               # 
kyuubi.batchConf.spark.spark.executorEnv.AZURE_CLIENT_ID=1d680742-02be-4b8c-969f-afafeccdcc0e
               
kyuubi.batchConf.spark.spark.executorEnv.AZURE_CLIENT_ID=dba6925b-465b-436b-b99c-f1b963988e48
               
kyuubi.batchConf.spark.spark.executorEnv.AZURE_TENANT_ID=7389d8c0-3607-465c-a69f-7d4426502912
 
               # 
kyuubi.batchConf.spark.spark.executorEnv.AZURE_CLIENT_SECRET=_HADOOP_ADLS_SECRET_
               
kyuubi.batchConf.spark.spark.executorEnv.AZURE_CLIENT_SECRET=_DLH_ADLS_SECRET_
               # default resource configs
               kyuubi.batchConf.spark.spark.executor.memory=20G
               kyuubi.batchConf.spark.spark.executor.cores=6
               kyuubi.batchConf.spark.spark.driver.memory=20G
               kyuubi.batchConf.spark.spark.driver.cores=6
   ```
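
   For context on how the load is generated: the batch jobs are submitted through the REST frontend, roughly as in the sketch below, and the `kyuubi.batchConf.spark.*` defaults above are, as I understand it, applied to each submission on the server side. The host, port, credentials, resource path and class name here are placeholders, not my actual values.

   ```python
   # Sketch: submit a Spark batch through the Kyuubi REST API (v1 batches endpoint).
   # Host/port, credentials, resource and class name are placeholders.
   import requests

   resp = requests.post(
       "http://kyuubi-server:10099/api/v1/batches",
       auth=("ldap-user", "ldap-password"),  # LDAP basic auth, per the server config
       json={
           "batchType": "SPARK",
           "name": "example-batch",
           "resource": "abfss://container@account.dfs.core.windows.net/jobs/example.jar",
           "className": "com.example.ExampleJob",
           "conf": {"spark.executor.instances": "2"},
           "args": [],
       },
       timeout=60,
   )
   resp.raise_for_status()
   print(resp.json())  # the returned batch metadata, including its id and state
   ```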
   
   ### Kyuubi Engine Configurations
   
   ```yaml
   
spark.master=k8s://azure-eastus2-st-085-dv-dl-001-9ejkzfkk.hcp.eastus2.azmk8s.io:443
               spark.submit.deployMode=cluster
               spark.kubernetes.namespace=kyuubi-poc
               
spark.kubernetes.authenticate.driver.serviceAccountName=kyuubi-poc
               # testing image with the spark-hadoop-cloud dependency
               
spark.kubernetes.container.image=geiconp.azurecr.io/edposs/edpcor/spark/datalakehouse-spark-3.5.1:2025-10-09T17-23-02.1.9477612
               
#spark.kubernetes.container.image=geiconp.azurecr.io/edposs/edpcor/dlh-apache/spark-3.5.6-s2.12-j17-py3-dlh:2025-08-19T20-05-36.1.8886835
               spark.hadoop.hive.server2.transport.mode=binary
               spark.hadoop.hive.execution.engine=spark
               spark.hadoop.hive.input.format=io.delta.hive.HiveInputFormat
               spark.hadoop.hive.tez.input.format=io.delta.hive.HiveInputFormat
               
spark.sql.warehouse.dir=abfss://[email protected]/warehouse
               
spark.hadoop.fs.defaultFS=abfss://[email protected]
               
spark.hadoop.fs.AbstractFileSystem.abfss.impl=org.apache.hadoop.fs.azurebfs.Abfss
               # 
spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto002.dfs.core.windows.net=OAuth
               # 
spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto002.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               # 
spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto002.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               # 
spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto002.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
               # 
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto002.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
               
spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto002.dfs.core.windows.net=SharedKey
               
spark.hadoop.fs.azure.account.key.gzfedpcordv1sto002.dfs.core.windows.net=_GZFEDPCORDV1STO002_ADLS_KEY_
 
               
spark.hadoop.fs.azure.account.auth.type.gzfdlhdrsdv1sto001.dfs.core.windows.net=SharedKey
               
spark.hadoop.fs.azure.account.key.gzfdlhdrsdv1sto001.dfs.core.windows.net=_GZFDLHDRSDV1STO001_ADLS_KEY_
               
spark.hadoop.fs.azure.account.auth.type.gzfdlhingdv1sto001.dfs.core.windows.net=OAuth
               
spark.hadoop.fs.azure.account.oauth.provider.type.gzfdlhingdv1sto001.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               
spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfdlhingdv1sto001.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               
spark.hadoop.fs.azure.account.oauth2.client.id.gzfdlhingdv1sto001.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
               
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfdlhingdv1sto001.dfs.core.windows.net=_DLH_ADLS_SECRET_
               
spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto003.dfs.core.windows.net=OAuth
               spark.eventLog.compress=true
               spark.eventLog.compression.codec=zstd
               spark.hadoop.fs.azure.write.request.size=33554432
               
spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto003.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               
spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto003.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               # 
spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto003.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
               # 
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto003.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
               
spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto003.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
               
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto003.dfs.core.windows.net=_DLH_ADLS_SECRET_
               
spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto004.dfs.core.windows.net=OAuth
               
spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto004.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               
spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto004.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               # 
spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto004.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
               # 
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto004.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
               
spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto004.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
               
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto004.dfs.core.windows.net=_DLH_ADLS_SECRET_
               
spark.hadoop.fs.azure.account.auth.type.gzfhststgdv1sto001.dfs.core.windows.net=OAuth
               
spark.hadoop.fs.azure.account.oauth.provider.type.gzfhststgdv1sto001.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
               
spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfhststgdv1sto001.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
               # 
spark.hadoop.fs.azure.account.oauth2.client.id.gzfhststgdv1sto001.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
               # 
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfhststgdv1sto001.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
               
spark.hadoop.fs.azure.account.oauth2.client.id.gzfhststgdv1sto001.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
               
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfhststgdv1sto001.dfs.core.windows.net=_DLH_ADLS_SECRET_
               
spark.kubernetes.file.upload.path=abfss://[email protected]/fileupload
               spark.executor.memory=16G
               spark.executor.cores=8
               spark.driver.memory=200G
               spark.driver.cores=40
               spark.driver.maxResultSize=20g
               spark.scheduler.mode=FAIR
               
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
               spark.sql.adaptive.enabled=true
               spark.decommission.enabled=true
               spark.dynamicAllocation.enabled=true
               spark.dynamicAllocation.minExecutors=16
               spark.dynamicAllocation.maxExecutors=64
               spark.dynamicAllocation.executorAllocationRatio=0.5
               spark.kubernetes.driver.node.selector.label=nodepool2
               spark.kubernetes.executor.node.selector.label=nodepool3
               spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp 
-Divy.home=/tmp 
-javaagent:/opt/spark/jars/jmx_prometheus_javaagent-1.0.1.jar=7778:/opt/spark/conf/config.yaml
 -Djava.security.manager=allow -Dio.netty.tryReflectionSetAccessible=true 
-XX:+UseG1GC
               spark.executor.extraJavaOptions=-Divy.cache.dir=/tmp 
-Divy.home=/tmp 
-javaagent:/opt/spark/jars/jmx_prometheus_javaagent-1.0.1.jar=7778:/opt/spark/conf/config.yaml
 -Djava.security.manager=allow -Dio.netty.tryReflectionSetAccessible=true 
-XX:+UseG1GC
               spark.kubernetes.executor.annotation.prometheus.io/port=7778
               spark.kubernetes.executor.annotation.prometheus.io/scrape=true
               spark.kubernetes.executor.annotation.prometheus.io/path=/metrics
               spark.kubernetes.driver.annotation.prometheus.io/scrape=true
               spark.kubernetes.driver.annotation.prometheus.io/port=7778
               spark.kubernetes.driver.annotation.prometheus.io/path=/metrics
               spark.kubernetes.executor.annotation.k8s.grafana.com/scrape=true
               
spark.kubernetes.executor.annotation.k8s.grafana.com/metrics.path=/metrics
               
spark.kubernetes.executor.annotation.k8s.grafana.com/metrics.portNumber=7778
               spark.kubernetes.driver.annotation.k8s.grafana.com/scrape=true
               
spark.kubernetes.driver.annotation.k8s.grafana.com/metrics.path=/metrics
               
spark.kubernetes.driver.annotation.k8s.grafana.com/metrics.portNumber=7778
               ## Gang Scheduling Configs
               # # spark.kubernetes.scheduler.name=yunikorn
               # spark.kubernetes.driver.label.queue=root.kyuubi-poc
               # spark.kubernetes.executor.label.queue=root.kyuubi-poc
               # 
spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters="placeholderTimeoutInSeconds=30
 gangSchedulingStyle=Hard"
               # 
spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name="spark-driver"
               # 
spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name="spark-executor"
               # # 
spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups=[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"40","memory":"200Gi"}},{"name":"spark-executor","minMember":2,"minResource":{"cpu":"40","memory":"180Gi"}}]
               # 
spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups=[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"40","memory":"200Gi"}},{"name":"spark-executor","minMember":2,"minResource":{"cpu":"40","memory":"180Gi"},"tolerations":[{"key":"kubernetes.azure.com/scalesetpriority","operator":"Equal","value":"spot","effect":"NoSchedule"}]}]
               spark.excludeOnFailure.enabled=true
               spark.metrics.conf=/opt/spark/conf/metrics.properties
               spark.metrics.namespace=${spark.app.name}
               spark.eventLog.enabled=true
               
spark.eventLog.dir=abfss://[email protected]/eventlogs
               
spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
               # 
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
               
spark.kubernetes.executor.podTemplateFile=/opt/kyuubi/conf/spotTemplate.yml
               
spark.kubernetes.driver.podTemplateFile=/opt/kyuubi/conf/driverTemplate.yml
               # Optimizations
               
spark.sql.redaction.string.regex=(?i)\bselect\b[\s\S]+?\bfrom\b[\s\S]+?(;|$)
               # 
spark.redaction.regex=(?i)secret|password|passwd|token|key|credential|credentials|pwd
               # 
spark.redaction.regex="(?i)secret|password|passwd|token|\.account\.key|credential|credentials|\.client\.secret\|_secret|pwd"
               # test new redaction
               
spark.redaction.regex=(?i)secret|password|passwd|token|\.account\.key|credential|credentials|pwd|appMgrInfo
               spark.sql.adaptive.enabled=true
               spark.sql.adaptive.forceApply=false
               spark.sql.adaptive.logLevel=info
               spark.sql.adaptive.advisoryPartitionSizeInBytes=256m
               spark.sql.adaptive.coalescePartitions.enabled=true
               spark.sql.adaptive.coalescePartitions.minPartitionNum=1
               spark.sql.adaptive.coalescePartitions.initialPartitionNum=1024
               spark.sql.adaptive.fetchShuffleBlocksInBatch=true
               spark.sql.adaptive.localShuffleReader.enabled=true
               spark.sql.adaptive.skewJoin.enabled=true
               spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
               spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=400m
               spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0.2
               spark.sql.autoBroadcastJoinThreshold=-1
               # Plugins (disable Gluten globally; enable only in Gluten 
profile)
               spark.plugins=io.dataflint.spark.SparkDataflintPlugin
               # TPCDS catalog configs
               
spark.sql.catalog.tpcds=org.apache.kyuubi.spark.connector.tpcds.TPCDSCatalog
               # spark.sql.catalog.tpcds.excludeDatabases=sf30000
               spark.sql.catalog.tpcds.useAnsiStringType=false
               spark.sql.catalog.tpcds.useTableSchema_2_6=true
               spark.sql.catalog.tpcds.read.maxPartitionBytes=128m
               # Polaris
               spark.sql.defaultCatalog=polaris
               spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog
               spark.sql.catalog.polaris.warehouse=dv-polaris
               
#spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials
               
spark.sql.catalog.polaris.catalog-impl=org.apache.iceberg.rest.RESTCatalog
               
spark.sql.catalog.polaris.uri=http://10.16.188.108:8181/api/catalog
               
spark.sql.catalog.polaris.credential=0853b716c1ffaad3:_POLARIS_CRED_
               spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL
               spark.sql.catalog.polaris.token-refresh-enabled=true
               
spark.sql.catalog.polaris.oauth2-server-uri=http://10.16.188.108:8181/api/catalog/v1/oauth/tokens
               # spark executor ADLS Variables
               # 
spark.kubernetes.driverEnv.AZURE_CLIENT_ID=1d680742-02be-4b8c-969f-afafeccdcc0e
               
spark.kubernetes.driverEnv.AZURE_CLIENT_ID=dba6925b-465b-436b-b99c-f1b963988e48
               
spark.kubernetes.driverEnv.AZURE_TENANT_ID=7389d8c0-3607-465c-a69f-7d4426502912
               # 
spark.kubernetes.driverEnv.AZURE_CLIENT_SECRET=_HADOOP_ADLS_SECRET_
               spark.kubernetes.driverEnv.AZURE_CLIENT_SECRET=_DLH_ADLS_SECRET_
               # 
spark.executorEnv.AZURE_CLIENT_ID=1d680742-02be-4b8c-969f-afafeccdcc0e
               
spark.executorEnv.AZURE_CLIENT_ID=dba6925b-465b-436b-b99c-f1b963988e48
               
spark.executorEnv.AZURE_TENANT_ID=7389d8c0-3607-465c-a69f-7d4426502912
               # spark.executorEnv.AZURE_CLIENT_SECRET=_HADOOP_ADLS_SECRET_
               spark.executorEnv.AZURE_CLIENT_SECRET=_DLH_ADLS_SECRET_
               # impersonation settings
               hive.server2.enable.doAs=true
               # Spark UI TITAN integration:
               
spark.executorEnv.SPARK_EXECUTOR_ATTRIBUTE_APP_ID='$(SPARK_APPLICATION_ID)'
               
spark.executorEnv.SPARK_EXECUTOR_ATTRIBUTE_EXECUTOR_ID='$(SPARK_EXECUTOR_ID)'
               # enable additional metrics:
               spark.executor.metrics.fileSystemSchemes=hdfs,file,abfss,abfs,s3a
               spark.metrics.appStatusSource.enabled=true
               spark.sql.streaming.metricsEnabled=true
               spark.metrics.executorMetricsSource.enabled=true
               # Testing the intermediate manifest committer for Azure: use v1 of the FileOutputCommitter algorithm to handle file renames/merges to the destination
               spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1
               # enable the intermediate manifest committer by binding it to Spark
               
#spark.hadoop.mapreduce.outputcommitter.factory.scheme.abfs=org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory
               
#spark.hadoop.mapreduce.outputcommitter.factory.scheme.abfss=org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory
               
#spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
               
#spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
               
#spark.hadoop.mapreduce.manifest.committer.summary.report.directory=abfss://[email protected]/dv-spark-commit-report
               # DataFlint
               # spark.plugins=io.dataflint.spark.SparkDataflintPlugin
               # ivy settings for debug
               spark.jars.ivy.log.level=DEBUG
               # disable killing jobs/stages from the Spark UI
               spark.ui.killEnabled=false
   ```
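
   As a side note on the redaction settings above: Spark matches `spark.redaction.regex` against configuration property keys (and values), so the active pattern can be sanity-checked in isolation. A purely illustrative sketch; the sample keys are taken from the configuration above.

   ```python
   # Sketch: check which conf keys the active spark.redaction.regex would redact.
   import re

   pattern = re.compile(
       r"(?i)secret|password|passwd|token|\.account\.key|credential|credentials|pwd|appMgrInfo"
   )
   sample_keys = [
       "spark.hadoop.fs.azure.account.key.gzfedpcordv1sto002.dfs.core.windows.net",
       "spark.kubernetes.driverEnv.AZURE_CLIENT_SECRET",
       "spark.executor.memory",
   ]
   for key in sample_keys:
       print(key, "-> redacted" if pattern.search(key) else "-> kept")
   ```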
   
   ### Additional context
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes. I would be willing to submit a PR with guidance from the Kyuubi 
community to fix.
   - [ ] No. I cannot submit a PR at this time.

