beat4ocean opened a new issue, #5759:
URL: https://github.com/apache/kyuubi/issues/5759

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the bug
   
   My environment: Hadoop 3.3.6, Spark 3.3.3, Kyuubi 1.8.0/1.7.3.

   I set the Iceberg conf from the official website:
   spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.spark_catalog.type=hive
   spark.sql.catalog.spark_catalog.uri=thrift://metastore-host:port
   spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

   I also put iceberg-spark-runtime-3.3_2.12-1.4.2.jar into $SPARK_HOME/jars and restarted Kyuubi.
   After that, querying any existing table triggers this bug:
   Caused by: org.apache.spark.sql.AnalysisException: Table or view not found: t1; line 1 pos 21;
   'Aggregate [unresolvedalias(count(1), None)]
   +- 'UnresolvedRelation [t1], [], false

        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:131)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:102)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:367)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:366)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:366)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:366)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:102)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:97)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:214)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
        at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:86)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.kyuubi.engine.spark.operation.SparkOperation.$anonfun$withLocalProperties$1(SparkOperation.scala:155)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
        at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:139)
        at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.executeStatement(ExecuteStatement.scala:81)
        ... 6 more

        at org.apache.kyuubi.KyuubiSQLException$.apply(KyuubiSQLException.scala:70)
        at org.apache.kyuubi.operation.ExecuteStatement.waitStatementComplete(ExecuteStatement.scala:135)
        at org.apache.kyuubi.operation.ExecuteStatement.$anonfun$runInternal$1(ExecuteStatement.scala:173)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750) (state=,code=0)
   
   ### Affects Version(s)
   
   1.8.0/1.7.3
   
   ### Kyuubi Server Log Output
   
   ```logtalk
   2023-11-23 20:53:01.514 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.operation.LaunchEngine: Processing bigdata's query[3fb0109a-a1dc-40b9-aa34-eb9b65ccb790]: PENDING_STATE -> RUNNING_STATE, statement:
   LaunchEngine
   2023-11-23 20:53:01.517 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: Starting
   2023-11-23 20:53:01.517 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Initiating client connection, connectString=hadoop202:2181,hadoop203:2181,hadoop204:2181 sessionTimeout=60000 watcher=org.apache.kyuubi.shaded.curator.ConnectionState@b8c1e4
   2023-11-23 20:53:01.519 INFO KyuubiSessionManager-exec-pool: Thread-71-SendThread(hadoop202:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Opening socket connection to server hadoop202/192.168.10.202:2181. Will not attempt to authenticate using SASL (unknown error)
   2023-11-23 20:53:01.522 INFO KyuubiSessionManager-exec-pool: Thread-71-SendThread(hadoop202:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Socket connection established to hadoop202/192.168.10.202:2181, initiating session
   2023-11-23 20:53:01.528 INFO KyuubiSessionManager-exec-pool: Thread-71-SendThread(hadoop202:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Session establishment complete on server hadoop202/192.168.10.202:2181, sessionid = 0xca00000206af0007, negotiated timeout = 40000
   2023-11-23 20:53:01.528 INFO KyuubiSessionManager-exec-pool: Thread-71-EventThread org.apache.kyuubi.shaded.curator.framework.state.ConnectionStateManager: State change: CONNECTED
   2023-11-23 20:53:01.579 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient: Get service instance:hadoop203:42141 engine id:application_1700742232452_0003 and version:1.8.0 under /kyuubi_1.8.0_GROUP_SPARK_SQL/bigdata/default
   2023-11-23 20:53:01.614 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.session.KyuubiSessionImpl: [bigdata:192.168.10.203] SessionHandle [74a56112-661b-47c5-a4ef-a3b351b2e9d4] - Connected to engine [hadoop203:42141]/[application_1700742232452_0003] with SessionHandle [74a56112-661b-47c5-a4ef-a3b351b2e9d4]]
   2023-11-23 20:53:01.616 INFO Curator-Framework-0 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: backgroundOperationsLoop exiting
   2023-11-23 20:53:01.624 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Session: 0xca00000206af0007 closed
   2023-11-23 20:53:01.624 INFO KyuubiSessionManager-exec-pool: Thread-71-EventThread org.apache.kyuubi.shaded.zookeeper.ClientCnxn: EventThread shut down for session: 0xca00000206af0007
   2023-11-23 20:53:01.625 INFO KyuubiSessionManager-exec-pool: Thread-71 org.apache.kyuubi.operation.LaunchEngine: Processing bigdata's query[3fb0109a-a1dc-40b9-aa34-eb9b65ccb790]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.109 seconds
   ```
   
   
   ### Kyuubi Engine Log Output
   
   ```logtalk
   no output
   ```
   
   
   ### Kyuubi Server Configurations
   
   ```yaml
   # Z-Ordering Support
   spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension
   
   # Auxiliary Optimization Rules
   spark.sql.optimizer.insertZorderBeforeWriting.enabled=true
   spark.sql.optimizer.zorderGlobalSort.enabled=true
   spark.sql.optimizer.dropIgnoreNonExistent=false
   spark.sql.optimizer.rebalanceBeforeZorder.enabled=false
   spark.sql.optimizer.rebalanceZorderColumns.enabled=false
   spark.sql.optimizer.twoPhaseRebalanceBeforeZorder.enabled=false
   spark.sql.optimizer.zorderUsingOriginalOrdering.enabled=false
   spark.sql.optimizer.inferRebalanceAndSortOrders.enabled=false
   spark.sql.optimizer.inferRebalanceAndSortOrdersMaxColumns=3
   spark.sql.optimizer.insertRepartitionBeforeWriteIfNoShuffle.enabled=false
   spark.sql.optimizer.finalStageConfigIsolationWriteOnly.enabled=false
   
   # Spark Dynamic Resource Allocation (DRA)
   spark.dynamicAllocation.enabled=true
   # false if you prefer shuffle tracking over ESS
   spark.dynamicAllocation.initialExecutors=1
   spark.dynamicAllocation.minExecutors=1
   spark.dynamicAllocation.maxExecutors=500
   spark.dynamicAllocation.executorAllocationRatio=0.5
   spark.dynamicAllocation.executorIdleTimeout=60s
   spark.dynamicAllocation.cachedExecutorIdleTimeout=30min
   # true if you prefer shuffle tracking over ESS
   spark.dynamicAllocation.shuffleTracking.enabled=false
   spark.dynamicAllocation.shuffleTracking.timeout=30min
   spark.dynamicAllocation.schedulerBacklogTimeout=1s
   spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=1s
   spark.cleaner.periodicGC.interval=5min
   
   # Spark Adaptive Query Execution (AQE)
   spark.sql.adaptive.enabled=true
   spark.sql.adaptive.forceApply=false
   spark.sql.adaptive.logLevel=info
   spark.sql.adaptive.advisoryPartitionSizeInBytes=256m
   spark.sql.adaptive.coalescePartitions.enabled=true
   spark.sql.adaptive.coalescePartitions.minPartitionSize=256m
   spark.sql.adaptive.coalescePartitions.initialPartitionNum=8192
   spark.sql.adaptive.fetchShuffleBlocksInBatch=true
   spark.sql.adaptive.localShuffleReader.enabled=true
   spark.sql.adaptive.skewJoin.enabled=true
   spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
   spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=400m
   spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0.2
   spark.sql.adaptive.optimizer.excludedRules
   spark.sql.autoBroadcastJoinThreshold=-1
   
   # SPARK Paimon
   spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog
   spark.sql.catalog.paimon.warehouse=hdfs://hadoop202:8020/kyuubi_spark_paimon
   
   # SPARK hudi
   spark.serializer=org.apache.spark.serializer.KryoSerializer
   spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
   spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
   
   # SPARK iceberg
   spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.spark_catalog.type=hive
   spark.sql.catalog.spark_catalog.uri=thrift://hadoop203:9083
   spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   
   # SPARK lineage
   spark.sql.queryExecutionListeners=org.apache.kyuubi.plugin.lineage.SparkOperationLineageQueryExecutionListener
   ```
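   Note that in the block above `spark.sql.extensions` is assigned three times (Kyuubi, Hudi, Iceberg) and `spark.sql.catalog.spark_catalog` twice (Hudi, Iceberg). If kyuubi-defaults.conf follows the usual properties semantics, only the last assignment of each key takes effect. A hedged sketch of a merged form, assuming all three extension sets are wanted and Iceberg should back the session catalog (Hudi tables would then need a separate named catalog):

   ```properties
   # spark.sql.extensions accepts a comma-separated list of extension classes,
   # so the three separate assignments can be collapsed into one line.
   spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension,org.apache.spark.sql.hudi.HoodieSparkSessionExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   # Only one implementation can own spark_catalog; here Iceberg takes it.
   spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.spark_catalog.type=hive
   spark.sql.catalog.spark_catalog.uri=thrift://hadoop203:9083
   ```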
   
   
   ### Kyuubi Engine Configurations
   
   ```yaml
   # spark default conf
   spark.master=yarn
   spark.shuffle.service.enabled=true
   ```
   
   
   ### Additional context
   
   If I remove the Iceberg config from kyuubi-defaults.conf, the bug disappears.
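   One way to confirm which of the duplicated values actually won at runtime is a plain Spark SQL `SET` lookup through any Kyuubi connection (a hedged check, not a fix):

   ```sql
   -- SET <key>; returns the effective value of that conf key, so a key
   -- duplicated in kyuubi-defaults.conf should show only its last assignment.
   SET spark.sql.extensions;
   SET spark.sql.catalog.spark_catalog;
   ```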
   
   ### Are you willing to submit PR?
   
   - [X] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
   - [ ] No. I cannot submit a PR at this time.

