[jira] [Resolved] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog on spark-sql or spark-shell
[ https://issues.apache.org/jira/browse/SPARK-47197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE resolved SPARK-47197. -- Resolution: Not A Problem (see https://github.com/apache/spark/pull/45309#issuecomment-1969269354) > Failed to connect HiveMetastore when using iceberg with HiveCatalog on > spark-sql or spark-shell > --- > > Key: SPARK-47197 > URL: https://issues.apache.org/jira/browse/SPARK-47197 > Project: Spark > Issue Type: Bug > Components: Spark Shell, SQL > Affects Versions: 3.2.3, 3.5.1 > Reporter: YUBI LEE > Priority: Major > Labels: pull-request-available > > I can't connect to a kerberized HiveMetastore when using Iceberg with > HiveCatalog on spark-sql or spark-shell. > I think this issue is caused by the fact that there is no way to obtain a > HIVE_DELEGATION_TOKEN when using spark-sql or spark-shell. > (https://github.com/apache/spark/blob/v3.5.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/security/HiveDelegationTokenProvider.scala#L78-L83) > > {code:java} > val currentToken = > UserGroupInformation.getCurrentUser().getCredentials().getToken(tokenAlias) > currentToken == null && UserGroupInformation.isSecurityEnabled && > hiveConf(hadoopConf).getTrimmed("hive.metastore.uris", "").nonEmpty && > (SparkHadoopUtil.get.isProxyUser(UserGroupInformation.getCurrentUser()) > || > (!Utils.isClientMode(sparkConf) && !sparkConf.contains(KEYTAB))) > {code} > There should be a way to force fetching a HIVE_DELEGATION_TOKEN even when > using spark-sql or spark-shell. One possible approach is to fetch the token > whenever the configuration below is set: > {code:java} > spark.security.credentials.hive.enabled true {code} > > {code:java} > 24/02/28 07:42:04 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) > (machine1.example.com executor 2): > org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive > Metastore > ... > Caused by: MetaException(message:Could not connect to meta store using any of > the URIs provided. 
Most recent failure: > org.apache.thrift.transport.TTransportException: GSS initiate failed {code} > > > {code:java} > spark-sql> select * from temp.test_hive_catalog; > ... > ... > 24/02/28 07:42:04 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) > (machine1.example.com executor 2): > org.apache.iceberg.hive.RuntimeMetaException: Failed to connect to Hive > Metastore > at > org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:84) > at > org.apache.iceberg.hive.HiveClientPool.newClient(HiveClientPool.java:34) > at org.apache.iceberg.ClientPoolImpl.get(ClientPoolImpl.java:125) > at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:56) > at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51) > at > org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122) > at > org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:158) > at > org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97) > at > org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80) > at > org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47) > at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:124) > at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:111) > at > org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.overlayTableProperties(HiveIcebergStorageHandler.java:276) > at > org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.configureInputJobProperties(HiveIcebergStorageHandler.java:86) > at > org.apache.spark.sql.hive.HiveTableUtil$.configureJobPropertiesForStorageHandler(TableReader.scala:426) > at > org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:456) > at > org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1(TableReader.scala:342) > at > 
org.apache.spark.sql.hive.HadoopTableReader.$anonfun$createOldHadoopRDD$1$adapted(TableReader.scala:342) > at > org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8(HadoopRDD.scala:181) > at > org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$8$adapted(HadoopRDD.scala:181) > at scala.Option.foreach(Option.scala:407) > at > org.apache.spark.rdd.HadoopRDD.$anonfun$getJobConf$6(HadoopRDD.scala:181) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:178) > at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:247) > at
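To make the quoted HiveDelegationTokenProvider condition concrete, here is a minimal, hypothetical sketch in plain Java. It is not Spark's actual API: all names are illustrative, and the extra `forceFetch` flag stands in for the behavior the reporter proposes for `spark.security.credentials.hive.enabled`.

```java
// Hypothetical model of the delegationTokensRequired check quoted above.
// In client mode without a keytab (the spark-sql / spark-shell case),
// the current Spark logic evaluates to false, so no token is fetched.
public class TokenCheckSketch {
    static boolean tokensRequired(boolean hasCurrentToken,
                                  boolean securityEnabled,
                                  boolean metastoreUriSet,
                                  boolean isProxyUser,
                                  boolean clientMode,
                                  boolean keytabProvided,
                                  boolean forceFetch) {
        // Guard clauses mirror the first three conjuncts of the Scala expression.
        if (hasCurrentToken || !securityEnabled || !metastoreUriSet) {
            return false;
        }
        // Current logic: proxy user, or cluster mode without a keytab.
        boolean current = isProxyUser || (!clientMode && !keytabProvided);
        // Proposed addition: an explicit config forces the fetch anyway.
        return current || forceFetch;
    }

    public static void main(String[] args) {
        // spark-shell in client mode, no keytab: false today, true with the flag.
        System.out.println(tokensRequired(false, true, true, false, true, false, false));
        System.out.println(tokensRequired(false, true, true, false, true, false, true));
    }
}
```

With all inputs as in `main`, the first call models today's behavior (no token fetched) and the second shows how the proposed configuration would override it; an existing token still short-circuits to `false` either way.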
[jira] [Updated] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog on spark-sql or spark-shell
[ https://issues.apache.org/jira/browse/SPARK-47197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-47197: - Description: (edited; the full description and stack trace are identical to the text quoted in the resolution notification above)
[jira] [Updated] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog on spark-sql or spark-shell
[ https://issues.apache.org/jira/browse/SPARK-47197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-47197: - Summary: Failed to connect HiveMetastore when using iceberg with HiveCatalog on spark-sql or spark-shell (was: Failed to connect HiveMetastore when using iceberg with HiveCatalog by spark-sql or spark-shell) (the quoted description and stack trace are identical to the text in the resolution notification above)
[jira] [Updated] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog by spark-sql or spark-shell
[ https://issues.apache.org/jira/browse/SPARK-47197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-47197: - Component/s: SQL (the quoted description and stack trace are identical to the text in the resolution notification above)
[jira] [Updated] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog by spark-sql or spark-shell
[ https://issues.apache.org/jira/browse/SPARK-47197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-47197: - Description: (edited; the full description and stack trace are identical to the text quoted in the resolution notification above)
[jira] [Updated] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog by spark-sql or spark-shell
[ https://issues.apache.org/jira/browse/SPARK-47197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-47197: - Description: (edited; the description and stack trace duplicate the text quoted in the resolution notification above)
[jira] [Created] (SPARK-47197) Failed to connect HiveMetastore when using iceberg with HiveCatalog by spark-sql or spark-shell
YUBI LEE created SPARK-47197: Summary: Failed to connect HiveMetastore when using iceberg with HiveCatalog by spark-sql or spark-shell Key: SPARK-47197 URL: https://issues.apache.org/jira/browse/SPARK-47197 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 3.5.1, 3.2.3 Reporter: YUBI LEE (the original description and stack trace are identical to the text quoted in the resolution notification above)
[jira] [Comment Edited] (SPARK-44976) Preserve full principal user name on executor side
[ https://issues.apache.org/jira/browse/SPARK-44976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759201#comment-17759201 ] YUBI LEE edited comment on SPARK-44976 at 12/8/23 12:34 AM:
[https://github.com/apache/spark/pull/44244]

was (Author: eub):
[https://github.com/apache/spark/pull/44244]

> Preserve full principal user name on executor side
> --
>
> Key: SPARK-44976
> URL: https://issues.apache.org/jira/browse/SPARK-44976
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.3, 3.3.3, 3.4.1
> Reporter: YUBI LEE
> Priority: Major
> Labels: pull-request-available
>
> SPARK-6558 changed the behavior of {{Utils.getCurrentUserName()}} to use the short name instead of the full principal name.
> Because of this, the {{hadoop.security.auth_to_local}} rule is not respected on the side of the non-kerberized HDFS namenode.
> For example, I use two HDFS clusters: one is kerberized, the other is not.
> I set a rule that adds a prefix to the username on the non-kerberized cluster when someone accesses it from the kerberized cluster.
> {code}
> <property>
>   <name>hadoop.security.auth_to_local</name>
>   <value>
>     RULE:[1:$1@$0](.*@EXAMPLE.COM)s/(.+)@.*/_ex_$1/
>     RULE:[2:$1@$0](.*@EXAMPLE.COM)s/(.+)@.*/_ex_$1/
>     DEFAULT
>   </value>
> </property>
> {code}
> However, when I submit a Spark job with the keytab & principal options, the ownership of the HDFS directories and files is not coherent.
> (I changed some words for privacy.)
> {code}
> $ hdfs dfs -ls hdfs:///user/eub/some/path/20230510/23
> Found 52 items
> -rw-rw-rw-   3 _ex_eub hdfs         0 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/_SUCCESS
> -rw-r--r--   3 eub     hdfs 134418857 2023-05-11 00:15 hdfs:///user/eub/some/path/20230510/23/part-0-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
> -rw-r--r--   3 eub     hdfs 153410049 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/part-1-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
> -rw-r--r--   3 eub     hdfs 157260989 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/part-2-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
> -rw-r--r--   3 eub     hdfs 156222760 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/part-3-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz
> {code}
> Another interesting point: if I submit a Spark job without the keytab and principal options but authenticate to Kerberos with {{kinit}}, it does not follow the {{hadoop.security.auth_to_local}} rule at all.
> {code}
> $ hdfs dfs -ls hdfs:///user/eub/output/
> Found 3 items
> -rw-rw-r--+  3 eub hdfs   0 2023-08-25 12:31 hdfs:///user/eub/output/_SUCCESS
> -rw-rw-r--+  3 eub hdfs 512 2023-08-25 12:31 hdfs:///user/eub/output/part-0.gz
> -rw-rw-r--+  3 eub hdfs 574 2023-08-25 12:31 hdfs:///user/eub/output/part-1.gz
> {code}
> I finally found that if I submit a Spark job with the {{--principal}} and {{--keytab}} options, the UGI will be different.
> (Refer to https://github.com/apache/spark/blob/2583bd2c16a335747895c0843f438d0966f47ecd/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L905.)
> Only the {{_SUCCESS}} file and the output directory created by the driver (application master side) respect {{hadoop.security.auth_to_local}} on the non-kerberized namenode, and only when the {{--principal}} and {{--keytab}} options are provided.
> Whether HDFS files and directories are created by the executor or by the driver, they should respect the {{hadoop.security.auth_to_local}} rule in the same way.
> A workaround is to pass an additional argument that changes {{SPARK_USER}} on the executor side, e.g. {{--conf spark.executorEnv.SPARK_USER=_ex_eub}}.
> Note that {{--conf spark.yarn.appMasterEnv.SPARK_USER=_ex_eub}} will cause an error: there is logic that appends environment values with {{:}} (colon) as a separator.
> - https://github.com/apache/spark/blob/4748d858b4478ea7503b792050d4735eae83b3cd/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L893
> - https://github.com/apache/spark/blob/4748d858b4478ea7503b792050d4735eae83b3cd/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala#L52

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
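The `auth_to_local` rule above rewrites a full principal into a prefixed short name. A minimal Python sketch of that sed-style substitution (a hypothetical illustration, not Hadoop's actual `KerberosName` implementation) makes the ownership mismatch concrete: the rule can only fire when the full principal is presented, which is exactly what the executor side stops doing after SPARK-6558.

```python
import re

# Hypothetical sketch of the rule
#   RULE:[1:$1@$0](.*@EXAMPLE.COM)s/(.+)@.*/_ex_$1/
# as applied by the non-kerberized namenode. Hadoop's real
# KerberosName parsing is richer; this models only the sed part.
def map_principal(principal):
    if re.fullmatch(r".*@EXAMPLE\.COM", principal):
        # Rule matched: strip the realm and add the "_ex_" prefix.
        return re.sub(r"(.+)@.*", r"_ex_\1", principal)
    return principal  # DEFAULT: pass the name through unchanged

# Driver presents the full principal -> rule fires.
print(map_principal("eub@EXAMPLE.COM"))  # _ex_eub
# Executor presents only the short name -> rule cannot fire.
print(map_principal("eub"))              # eub
```

This matches the listings above: files written by the driver are owned by `_ex_eub`, while files written by executors stay owned by `eub`.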
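The reason `spark.yarn.appMasterEnv.SPARK_USER` errors out is the colon-append logic linked above. A rough Python model (hypothetical, loosely mirroring the append-with-separator behavior in `Client.scala` / `YarnSparkHadoopUtil.scala`; names are illustrative) shows how a second `SPARK_USER` value gets joined with `:` instead of replacing the first:

```python
# Rough, hypothetical model of the YARN env handling referenced above:
# when a key is already present, the new value is appended with ":"
# (the path separator) rather than overwriting the old value.
def add_path_to_environment(env, key, value, sep=":"):
    env[key] = env[key] + sep + value if key in env else value

env = {}
add_path_to_environment(env, "SPARK_USER", "eub")      # set by Spark itself
add_path_to_environment(env, "SPARK_USER", "_ex_eub")  # user-supplied appMasterEnv
print(env["SPARK_USER"])  # eub:_ex_eub  -> not a valid user name
```

A colon-joined value like `eub:_ex_eub` is not a valid user name, which is why only the executor-side `spark.executorEnv.SPARK_USER` workaround is usable.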
[jira] [Comment Edited] (SPARK-44976) Preserve full principal user name on executor side
[ https://issues.apache.org/jira/browse/SPARK-44976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759201#comment-17759201 ] YUBI LEE edited comment on SPARK-44976 at 12/8/23 12:33 AM:
-https://github.com/apache/spark/pull/42690- https://github.com/apache/spark/pull/44244

was (Author: eub):
https://github.com/apache/spark/pull/42690
[jira] [Comment Edited] (SPARK-44976) Preserve full principal user name on executor side
[ https://issues.apache.org/jira/browse/SPARK-44976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759201#comment-17759201 ] YUBI LEE edited comment on SPARK-44976 at 12/8/23 12:33 AM:
[https://github.com/apache/spark/pull/44244]

was (Author: eub):
-https://github.com/apache/spark/pull/42690- https://github.com/apache/spark/pull/44244
[jira] [Updated] (SPARK-44976) Preserve full principal user name on executor side
[ https://issues.apache.org/jira/browse/SPARK-44976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-44976:
-
Summary: Preserve full principal user name on executor side (was: Utils.getCurrentUserName should return the full principal name)
[jira] [Commented] (SPARK-44976) Utils.getCurrentUserName should return the full principal name
[ https://issues.apache.org/jira/browse/SPARK-44976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759255#comment-17759255 ] YUBI LEE commented on SPARK-44976:
--
I think it is also related to https://issues.apache.org/jira/browse/SPARK-31551.
[jira] [Commented] (SPARK-44976) Utils.getCurrentUserName should return the full principal name
[ https://issues.apache.org/jira/browse/SPARK-44976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759201#comment-17759201 ] YUBI LEE commented on SPARK-44976:
--
https://github.com/apache/spark/pull/42690
[jira] [Updated] (SPARK-44976) Utils.getCurrentUserName should return the full principal name
[ https://issues.apache.org/jira/browse/SPARK-44976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-44976:
-
Description: updated
[jira] [Updated] (SPARK-44976) Utils.getCurrentUserName should return the full principal name
[ https://issues.apache.org/jira/browse/SPARK-44976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-44976: - Description: SPARK-6558 changes the behavior of {{Utils.getCurrentUserName()}} to use shortname instead of full principal name. Due to this, it doesn't respect {{hadoop.security.auth_to_local}} rule on the side of non-kerberized hdfs namenode. For example, I use 2 hdfs cluster. One is kerberized, the other one is not kerberized. I make a rule to add some prefix to username on the non-kerberized cluster if some one access it from the kerberized cluster. {code} hadoop.security.auth_to_local RULE:[1:$1@$0](.*@EXAMPLE.COM)s/(.+)@.*/_ex_$1/ RULE:[2:$1@$0](.*@EXAMPLE.COM)s/(.+)@.*/_ex_$1/ DEFAULT {code} However, if I submit spark job with keytab & principal option, hdfs directory and files ownership is not coherent. (I change some words for privacy.) {code} $ hdfs dfs -ls hdfs:///user/eub/some/path/20230510/23 Found 52 items -rw-rw-rw- 3 _ex_eub hdfs 0 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/_SUCCESS -rw-r--r-- 3 eub hdfs 134418857 2023-05-11 00:15 hdfs:///user/eub/some/path/20230510/23/part-0-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz -rw-r--r-- 3 eub hdfs 153410049 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/part-1-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz -rw-r--r-- 3 eub hdfs 157260989 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/part-2-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz -rw-r--r-- 3 eub hdfs 156222760 2023-05-11 00:16 hdfs:///user/eub/some/path/20230510/23/part-3-b781be38-9dbc-41da-8d0e-597a7f343649-c000.txt.gz {code} Another interesting point is that if I submit spark job without keytab and principal option but with kerberos authentication with {{kinit}}, it will not follow {{hadoop.security.auth_to_local}} rule completely. 
{code}
$ hdfs dfs -ls hdfs:///user/eub/output/
Found 3 items
-rw-rw-r--+  3 eub hdfs   0 2023-08-25 12:31 hdfs:///user/eub/output/_SUCCESS
-rw-rw-r--+  3 eub hdfs 512 2023-08-25 12:31 hdfs:///user/eub/output/part-0.gz
-rw-rw-r--+  3 eub hdfs 574 2023-08-25 12:31 hdfs:///user/eub/output/part-1.gz
{code}

I finally found that if I submit the Spark job with the {{--principal}} and {{--keytab}} options, the UGI will be different (refer to https://github.com/apache/spark/blob/2583bd2c16a335747895c0843f438d0966f47ecd/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L905). Only the file ({{_SUCCESS}}) and the output directory created by the driver (application master side) respect {{hadoop.security.auth_to_local}} on the non-kerberized NameNode, and only if the {{--principal}} and {{--keytab}} options are provided. No matter whether HDFS files or directories are created by an executor or by the driver, they should respect the {{hadoop.security.auth_to_local}} rules in the same way. This issue is related to https://issues.apache.org/jira/browse/SPARK-6558.
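The inconsistency above can be sketched in miniature. Below is a simplified simulation (an illustration only, not Hadoop's actual KerberosName rule engine; the names come from this report) of what the non-kerberized NameNode does with the incoming user string: the rewrite rule only fires on a full principal, so a client that sends just the short name — the post-SPARK-6558 behavior — bypasses it.

```python
import re

def remote_auth_to_local(user: str) -> str:
    """Simplified stand-in for the NameNode-side auth_to_local rules above."""
    # RULE:[1:$1@$0](.*@EXAMPLE.COM)s/(.+)@.*/_ex_$1/
    m = re.fullmatch(r"(.+)@EXAMPLE\.COM", user)
    if m:
        return f"_ex_{m.group(1)}"
    return user  # DEFAULT / already a short name: left untouched

# Full principal (what this report argues Utils.getCurrentUserName should return):
print(remote_auth_to_local("eub@EXAMPLE.COM"))  # -> _ex_eub
# Short name (what SPARK-6558 made it return): the rule never fires
print(remote_auth_to_local("eub"))              # -> eub
```

Depending on which identity a given writer (driver vs. executor) presents, the same job produces files owned by `_ex_eub` or `eub`, which matches the mixed ownership in the listings above.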
[jira] [Created] (SPARK-44976) Utils.getCurrentUserName should return the full principal name
YUBI LEE created SPARK-44976:
Summary: Utils.getCurrentUserName should return the full principal name
Key: SPARK-44976
URL: https://issues.apache.org/jira/browse/SPARK-44976
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.4.1, 3.3.3, 3.2.3
Reporter: YUBI LEE

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40964) Cannot run spark history server with shaded hadoop jar
[ https://issues.apache.org/jira/browse/SPARK-40964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-40964: - Description: Since SPARK-33212, Spark uses the shaded client jars from Hadoop 3.x+. If you try to start the Spark History Server with the shaded client jars and enable security using org.apache.hadoop.security.authentication.server.AuthenticationFilter, you will see the following exception.

{code}
# spark-env.sh
export SPARK_HISTORY_OPTS='-Dspark.ui.filters=org.apache.hadoop.security.authentication.server.AuthenticationFilter -Dspark.org.apache.hadoop.security.authentication.server.AuthenticationFilter.params="type=kerberos,kerberos.principal=HTTP/some.example@example.com,kerberos.keytab=/etc/security/keytabs/spnego.service.keytab"'
{code}

{code}
# spark history server's out file
22/10/27 15:29:48 INFO AbstractConnector: Started ServerConnector@5ca1f591{HTTP/1.1, (http/1.1)}{0.0.0.0:18081}
22/10/27 15:29:48 INFO Utils: Successfully started service 'HistoryServerUI' on port 18081.
22/10/27 15:29:48 INFO ServerInfo: Adding filter to /: org.apache.hadoop.security.authentication.server.AuthenticationFilter
22/10/27 15:29:48 ERROR HistoryServer: Failed to bind HistoryServer
java.lang.IllegalStateException: class org.apache.hadoop.security.authentication.server.AuthenticationFilter is not a javax.servlet.Filter
	at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:103)
	at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
	at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:730)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
	at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:755)
	at org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:379)
	at org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:910)
	at org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:288)
	at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
	at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:491)
	at org.apache.spark.ui.WebUI.$anonfun$bind$3(WebUI.scala:148)
	at org.apache.spark.ui.WebUI.$anonfun$bind$3$adapted(WebUI.scala:148)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.ui.WebUI.bind(WebUI.scala:148)
	at org.apache.spark.deploy.history.HistoryServer.bind(HistoryServer.scala:164)
	at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:310)
	at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
{code}

I think "AuthenticationFilter" in the shaded jar references the relocated "org.apache.hadoop.shaded.javax.servlet.Filter", not "javax.servlet.Filter".

{code}
❯ grep -r org.apache.hadoop.shaded.javax.servlet.Filter *
Binary file hadoop-client-runtime-3.3.1.jar matches
{code}

That causes the exception above. I'm not sure what the best answer is. A workaround is to avoid the Spark distribution pre-built for Apache Hadoop and instead specify `HADOOP_HOME` or `SPARK_DIST_CLASSPATH` in spark-env.sh for the Spark History Server. Possible options are:
- Do not shade "javax.servlet.Filter" in the Hadoop shaded jar
- Or shade "javax.servlet.Filter" in Jetty as well.
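The type-identity failure behind "is not a javax.servlet.Filter" can be mimicked outside the JVM. A rough Python analogy (hypothetical class names; the real mechanism is Java classloading of the relocated interface inside hadoop-client-runtime): a "Filter" relocated into a different package is a different type, so the check Jetty performs fails even though the class implements a filter interface.

```python
class Filter:
    """Stands in for javax.servlet.Filter (the interface Jetty checks against)."""

class ShadedFilter:
    """Stands in for the relocated org.apache.hadoop.shaded.javax.servlet.Filter."""

class AuthenticationFilter(ShadedFilter):
    """Built against the shaded copy of the interface, as in hadoop-client-runtime."""

f = AuthenticationFilter()
# Jetty's FilterHolder effectively performs this check and raises
# IllegalStateException when it fails:
print(isinstance(f, Filter))  # -> False: "... is not a javax.servlet.Filter"
```

The two proposed options amount to making both sides agree on a single `Filter` type again, either by not relocating it in Hadoop's shaded jar or by relocating it identically in Spark's Jetty.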
[jira] [Created] (SPARK-40964) Cannot run spark history server with shaded hadoop jar
YUBI LEE created SPARK-40964:

             Summary: Cannot run spark history server with shaded hadoop jar
                 Key: SPARK-40964
                 URL: https://issues.apache.org/jira/browse/SPARK-40964
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 3.2.2
            Reporter: YUBI LEE

Since SPARK-33212, Spark uses the shaded client jars from Hadoop 3.x+. In this situation, if you start the Spark History Server with the shaded client jars and enable security using org.apache.hadoop.security.authentication.server.AuthenticationFilter, you will encounter the following exception.

{code}
22/10/27 15:29:48 INFO AbstractConnector: Started ServerConnector@5ca1f591{HTTP/1.1, (http/1.1)}{0.0.0.0:18081}
22/10/27 15:29:48 INFO Utils: Successfully started service 'HistoryServerUI' on port 18081.
22/10/27 15:29:48 INFO ServerInfo: Adding filter to /: org.apache.hadoop.security.authentication.server.AuthenticationFilter
22/10/27 15:29:48 ERROR HistoryServer: Failed to bind HistoryServer
java.lang.IllegalStateException: class org.apache.hadoop.security.authentication.server.AuthenticationFilter is not a javax.servlet.Filter
	at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:103)
	at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
	at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$0(ServletHandler.java:730)
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
	at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
	at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
	at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:755)
	at org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:379)
	at org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:910)
	at org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:288)
	at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
	at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:491)
	at org.apache.spark.ui.WebUI.$anonfun$bind$3(WebUI.scala:148)
	at org.apache.spark.ui.WebUI.$anonfun$bind$3$adapted(WebUI.scala:148)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.ui.WebUI.bind(WebUI.scala:148)
	at org.apache.spark.deploy.history.HistoryServer.bind(HistoryServer.scala:164)
	at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:310)
	at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
{code}

I think "AuthenticationFilter" in the shaded jar implements "org.apache.hadoop.shaded.javax.servlet.Filter", not "javax.servlet.Filter". That causes the exception above. I'm not sure what the best answer is. A workaround is not to use a Spark distribution pre-built for Apache Hadoop; instead, specify `HADOOP_HOME` or `SPARK_DIST_CLASSPATH` in spark-env.sh for the Spark History Server.

Possible options are:
- Do not shade "javax.servlet.Filter" in the Hadoop shaded jar.
- Or, shade "javax.servlet.Filter" in Jetty as well.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
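The class-identity mismatch behind this exception can be sketched in plain Java: a class compiled against a relocated (shaded) copy of an interface does not pass an `instanceof` check against the original interface, which is essentially the check Jetty's FilterHolder performs before starting a filter. The nested types below are illustrative stand-ins for `javax.servlet.Filter`, its shaded copy, and `AuthenticationFilter`, not the real classes.

```java
// Minimal sketch of the shading problem. Two textually identical interfaces in
// different "packages" are distinct types to the JVM, so a class implementing
// the relocated copy is NOT an instance of the original.
public class ShadedFilterDemo {
    // Stand-in for javax.servlet.Filter (the type Jetty checks against).
    interface Filter { void doFilter(); }

    // Stand-in for org.apache.hadoop.shaded.javax.servlet.Filter.
    interface ShadedFilter { void doFilter(); }

    // Stand-in for AuthenticationFilter, compiled against the shaded interface.
    static class AuthenticationFilter implements ShadedFilter {
        public void doFilter() { }
    }

    public static void main(String[] args) {
        Object filter = new AuthenticationFilter();
        // Jetty's FilterHolder does essentially this check and throws
        // IllegalStateException ("... is not a javax.servlet.Filter") on failure.
        System.out.println("implements Filter?       " + (filter instanceof Filter));       // false
        System.out.println("implements ShadedFilter? " + (filter instanceof ShadedFilter)); // true
    }
}
```

This is why the workaround of supplying unshaded Hadoop jars works: the filter class is then compiled against the same `javax.servlet.Filter` that Jetty checks for.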
[jira] [Updated] (SPARK-40072) MAVEN_OPTS in make-distributions.sh is different from one specified in pom.xml
[ https://issues.apache.org/jira/browse/SPARK-40072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YUBI LEE updated SPARK-40072:
-
    Description:

Building Spark with make-distribution.sh fails with the default settings because the default MAVEN_OPTS differs from the one specified in pom.xml.

It is related to [SPARK-35825|https://issues.apache.org/jira/browse/SPARK-35825].

PR: https://github.com/apache/spark/pull/37510

> MAVEN_OPTS in make-distributions.sh is different from one specified in pom.xml
> --
>
>                 Key: SPARK-40072
>                 URL: https://issues.apache.org/jira/browse/SPARK-40072
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.2.2
>            Reporter: YUBI LEE
>            Priority: Minor
>
> Building Spark with make-distribution.sh fails with the default settings
> because the default MAVEN_OPTS differs from the one specified in pom.xml.
> It is related to
> [SPARK-35825|https://issues.apache.org/jira/browse/SPARK-35825].
> PR: https://github.com/apache/spark/pull/37510
[jira] [Created] (SPARK-40072) MAVEN_OPTS in make-distributions.sh is different from one specified in pom.xml
YUBI LEE created SPARK-40072:

             Summary: MAVEN_OPTS in make-distributions.sh is different from one specified in pom.xml
                 Key: SPARK-40072
                 URL: https://issues.apache.org/jira/browse/SPARK-40072
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 3.2.2
            Reporter: YUBI LEE

Building Spark with make-distribution.sh fails with the default settings because the default MAVEN_OPTS differs from the one specified in pom.xml.

It is related to [SPARK-35825|https://issues.apache.org/jira/browse/SPARK-35825].
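Until the script itself is fixed, a build-side workaround is to export MAVEN_OPTS explicitly so make-distribution.sh does not fall back to its own, different default. This is a hedged sketch: the JVM flag values below are illustrative only; the authoritative values are whatever the MAVEN_OPTS property in the checked-out pom.xml specifies.

```shell
# Illustrative workaround sketch: align MAVEN_OPTS with pom.xml before
# building, so the distribution build uses the same JVM settings as a
# plain Maven build. Check pom.xml for the actual flag values.
export MAVEN_OPTS="-Xss64m -Xmx2g -XX:ReservedCodeCacheSize=1g"
./dev/make-distribution.sh --tgz
```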