[jira] [Updated] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog on spark-sql or spark-shell
[ https://issues.apache.org/jira/browse/SPARK-47197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47197:
-----------------------------------
    Labels: pull-request-available  (was: )

> Key: SPARK-47197
> URL: https://issues.apache.org/jira/browse/SPARK-47197
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell, SQL
> Affects Versions: 3.2.3, 3.5.1
> Reporter: YUBI LEE
> Priority: Major
> Labels: pull-request-available
>
> I can't connect to a kerberized HiveMetastore when using Iceberg with HiveCatalog on spark-sql or spark-shell.
> I think this issue is caused by the fact that there is no way to obtain a HIVE_DELEGATION_TOKEN when using spark-sql or spark-shell
> ([https://github.com/apache/spark/blob/v3.5.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/security/HiveDelegationTokenProvider.scala#L78-L83]):
>
> {code:java}
> val currentToken =
>   UserGroupInformation.getCurrentUser().getCredentials().getToken(tokenAlias)
> currentToken == null && UserGroupInformation.isSecurityEnabled &&
>   hiveConf(hadoopConf).getTrimmed("hive.metastore.uris", "").nonEmpty &&
>   (SparkHadoopUtil.get.isProxyUser(UserGroupInformation.getCurrentUser()) ||
>     (!Utils.isClientMode(sparkConf) && !sparkConf.contains(KEYTAB)))
> {code}
>
> There should be a way to force obtaining a HIVE_DELEGATION_TOKEN even when using spark-sql or spark-shell.
> A possible approach is to obtain the HIVE_DELEGATION_TOKEN when the configuration below is set:
>
> {code:java}
> spark.security.credentials.hive.enabled true
> {code}
>
> {code:java}
> 24/02/28 07:42:04 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) (machine1.example.com executor 2): org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
> ...
> Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: GSS initiate failed
> {code}
>
> {code:java}
> spark-sql> select * from temp.test_hive_catalog;
> ...
> 24/02/28 07:42:04 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) (machine1.example.com executor 2): org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive Metastore
>         at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84)
>         at org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34)
>         at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125)
>         at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56)
>         at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
>         at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
>         at org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158)
>         at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
>         at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
>         at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
>         at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:124)
>         at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:111)
>         at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.overlayTableProperties(HiveIcebergStorageHandler.java:276)
>         at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.configureInputJobProperties(HiveIcebergStorageHandler.java:86)
>         at org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:426)
>         at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:456)
>         at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1(TableReader.scala:342)
>         at org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1$adapted(TableReader.scala:342)
>         at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8(HadoopRDD.scala:181)
>         at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8$adapted(HadoopRDD.scala:181)
>         at scala.Option.foreach(Option.scala:407)
>         at org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$6(HadoopRDD.scala:181)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:178)
>         at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:247)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:243)
>         at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         ...
> {code}
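For spark-sql and spark-shell (client mode, no keytab, not a proxy user), the quoted `delegationTokensRequired`-style condition evaluates to false, so no Hive delegation token is ever requested. The decision can be seen by modeling the predicate as a pure function over booleans, with an extra opt-in input reflecting the reporter's suggestion that `spark.security.credentials.hive.enabled true` should force token retrieval. This is a minimal sketch of the logic only; the object, method, and parameter names here (`TokenDecision`, `requiredWithOptIn`, etc.) are hypothetical and not part of Spark's actual API:

```scala
// Hypothetical model of the token-requirement predicate from
// HiveDelegationTokenProvider. The real code consults UserGroupInformation,
// HiveConf, and SparkConf; here each input is a plain boolean so the
// decision logic can be inspected in isolation.
object TokenDecision {
  // Current behavior, mirroring the snippet quoted in the issue.
  def required(hasToken: Boolean,
               securityEnabled: Boolean,
               metastoreUrisSet: Boolean,
               isProxyUser: Boolean,
               isClientMode: Boolean,
               hasKeytab: Boolean): Boolean =
    !hasToken && securityEnabled && metastoreUrisSet &&
      (isProxyUser || (!isClientMode && !hasKeytab))

  // Sketch of the proposed variant: an explicit opt-in
  // (e.g. spark.security.credentials.hive.enabled=true) forces token
  // retrieval even in client mode without a keytab.
  def requiredWithOptIn(hasToken: Boolean,
                        securityEnabled: Boolean,
                        metastoreUrisSet: Boolean,
                        isProxyUser: Boolean,
                        isClientMode: Boolean,
                        hasKeytab: Boolean,
                        hiveCredsExplicitlyEnabled: Boolean): Boolean =
    !hasToken && securityEnabled && metastoreUrisSet &&
      (hiveCredsExplicitlyEnabled || isProxyUser ||
        (!isClientMode && !hasKeytab))

  def main(args: Array[String]): Unit = {
    // spark-sql / spark-shell case: no cached token, Kerberos on,
    // metastore URIs set, client mode, no keytab, not a proxy user.
    val before = required(false, true, true, false, true, false)
    val after  = requiredWithOptIn(false, true, true, false, true, false, true)
    println(s"before=$before after=$after") // prints before=false after=true
  }
}
```

Under this model, the only behavioral change is the extra disjunct: deployments that already get tokens (cluster mode, proxy users) are unaffected, while client-mode shells gain a way to opt in.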
[jira] [Updated] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog on spark-sql or spark-shell
[ https://issues.apache.org/jira/browse/SPARK-47197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

YUBI LEE updated SPARK-47197:
-----------------------------
    Description: (edited; the full description and stack trace are quoted in the first update above)
[jira] [Updated] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog on spark-sql or spark-shell
[ https://issues.apache.org/jira/browse/SPARK-47197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

YUBI LEE updated SPARK-47197:
-----------------------------
    Summary: Failed to connect HiveMetastore when using iceberg with HiveCatalog on spark-sql or spark-shell  (was: Failed to connect HiveMetastore when using iceberg with HiveCatalog by spark-sql or spark-shell)

(The quoted issue body is identical to the first update above.)