[jira] [Updated] (HIVE-27862) Map propertyContent to a wrong column in package.jdo
[ https://issues.apache.org/jira/browse/HIVE-27862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27862: -- Labels: pull-request-available (was: ) > Map propertyContent to a wrong column in package.jdo > > > Key: HIVE-27862 > URL: https://issues.apache.org/jira/browse/HIVE-27862 > Project: Hive > Issue Type: Bug >Reporter: Zhihua Deng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27555) Upgrade issues with Kudu table on backend db
[ https://issues.apache.org/jira/browse/HIVE-27555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27555: -- Labels: pull-request-available (was: ) > Upgrade issues with Kudu table on backend db > > > Key: HIVE-27555 > URL: https://issues.apache.org/jira/browse/HIVE-27555 > Project: Hive > Issue Type: Bug >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > > In HIVE-27457, we tried to update the serde lib and (input/output) format of the > Kudu table in the backend db. In the upgrade scripts, we join "SDS"."SD_ID" > with "TABLE_PARAMS"."TBL_ID", > https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-4.0.0-alpha-2-to-4.0.0-beta-1.mysql.sql#L37-L39 > As "SD_ID" is the primary key of SDS and "TBL_ID" is the primary key of > TBLS, we can't join the two tables on these two columns. -- This message was sent by Atlassian Jira (v8.20.10#820010)
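The wrong-column join can be made concrete with a toy sketch (Python + sqlite3; the schema and data below are invented and far simpler than the real metastore DDL): "SD_ID" and "TBL_ID" are independent ID sequences, so joining them directly can match a storage descriptor to the wrong table.

```python
import sqlite3

# Toy illustration of the HIVE-27555 join bug. NOT the real metastore schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, INPUT_FORMAT TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, SD_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE TABLE_PARAMS (TBL_ID INTEGER, PARAM_KEY TEXT, PARAM_VALUE TEXT);
-- Crossed IDs on purpose: the Kudu table (TBL_ID=1) owns SD_ID=2, and vice versa.
INSERT INTO SDS VALUES (1, 'OrcInputFormat'), (2, 'KuduInputFormat');
INSERT INTO TBLS VALUES (1, 2, 'kudu_tbl'), (2, 1, 'orc_tbl');
INSERT INTO TABLE_PARAMS VALUES (1, 'storage_handler', 'KuduStorageHandler');
""")

# Broken join from the upgrade script: SD_ID compared to TBL_ID picks SD_ID=1,
# which actually belongs to orc_tbl.
broken = cur.execute("""
    SELECT S.SD_ID FROM SDS S
    JOIN TABLE_PARAMS P ON S.SD_ID = P.TBL_ID
""").fetchall()

# Correct join: go through TBLS, which carries both foreign keys.
fixed = cur.execute("""
    SELECT S.SD_ID FROM SDS S
    JOIN TBLS T ON T.SD_ID = S.SD_ID
    JOIN TABLE_PARAMS P ON P.TBL_ID = T.TBL_ID
""").fetchall()

print(broken)  # [(1,)] -- the wrong storage descriptor would be updated
print(fixed)   # [(2,)] -- the Kudu table's actual storage descriptor
```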
[jira] [Resolved] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.2
[ https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko resolved HIVE-27819. --- Fix Version/s: 4.0.0 Resolution: Fixed > Iceberg: Upgrade iceberg version to 1.4.2 > - > > Key: HIVE-27819 > URL: https://issues.apache.org/jira/browse/HIVE-27819 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Iceberg 1.4.2 has been released. We need to upgrade the Iceberg > dependency from 1.3.0 to 1.4.2. Meanwhile, we should port some Hive catalog > changes from the Iceberg repo to the Hive repo. > [https://iceberg.apache.org/releases/#142-release] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.2
[ https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785051#comment-17785051 ] Denys Kuzmenko commented on HIVE-27819: --- Merged to master. Thanks for the patch [~zhangbutao], and [~ayushsaxena], [~kkasa] for the review! > Iceberg: Upgrade iceberg version to 1.4.2 > - > > Key: HIVE-27819 > URL: https://issues.apache.org/jira/browse/HIVE-27819 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > > Iceberg 1.4.2 has been released. We need to upgrade the Iceberg > dependency from 1.3.0 to 1.4.2. Meanwhile, we should port some Hive catalog > changes from the Iceberg repo to the Hive repo. > [https://iceberg.apache.org/releases/#142-release] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27762) Don't fall back to jdo query in ObjectStore if direct sql throws unrecoverable exception
[ https://issues.apache.org/jira/browse/HIVE-27762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko resolved HIVE-27762. --- Fix Version/s: 4.0.0 Resolution: Fixed > Don't fall back to jdo query in ObjectStore if direct sql throws > unrecoverable exception > > > Key: HIVE-27762 > URL: https://issues.apache.org/jira/browse/HIVE-27762 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Wechar >Assignee: Wechar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently {{GetHelper}} will call {{getJdoResult()}} if {{getSqlResult()}} > throws an exception; this can be avoided when the exception is unrecoverable, > improving performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
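The improvement described above can be sketched as a generic fallback pattern (illustrative Python, not the actual ObjectStore/GetHelper code; the exception class and function names below are invented): fall back to the slower JDO path only when the direct-SQL failure is recoverable, and re-raise otherwise.

```python
# Sketch of the HIVE-27762 change. Old behavior: any direct-SQL failure
# triggered the slower JDO fallback. New behavior: unrecoverable failures
# are re-raised immediately instead of retrying a query that cannot succeed.

class UnrecoverableException(Exception):
    """A failure the JDO path could not fix either (e.g. a malformed request)."""

def get_with_fallback(get_sql_result, get_jdo_result):
    try:
        return get_sql_result()          # fast direct-SQL path
    except UnrecoverableException:
        raise                            # falling back would only waste time
    except Exception:
        return get_jdo_result()          # recoverable: retry via JDO

calls = []

def sql_recoverable():
    calls.append("sql")
    raise RuntimeError("direct SQL hiccup")

def jdo():
    calls.append("jdo")
    return "rows-from-jdo"

result = get_with_fallback(sql_recoverable, jdo)
print(result, calls)  # rows-from-jdo ['sql', 'jdo']

def sql_unrecoverable():
    raise UnrecoverableException("malformed filter")

try:
    get_with_fallback(sql_unrecoverable, jdo)
except UnrecoverableException:
    print("re-raised without JDO fallback")  # jdo() is never invoked here
```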
[jira] [Commented] (HIVE-27762) Don't fall back to jdo query in ObjectStore if direct sql throws unrecoverable exception
[ https://issues.apache.org/jira/browse/HIVE-27762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785046#comment-17785046 ] Denys Kuzmenko commented on HIVE-27762: --- Merged to master. Thanks [~wechar] for the patch and [~gsaihemanth] for the review! > Don't fall back to jdo query in ObjectStore if direct sql throws > unrecoverable exception > > > Key: HIVE-27762 > URL: https://issues.apache.org/jira/browse/HIVE-27762 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Wechar >Assignee: Wechar >Priority: Major > Labels: pull-request-available > > Currently {{GetHelper}} will call {{getJdoResult()}} if {{getSqlResult()}} > throws an exception; this can be avoided when the exception is unrecoverable, > improving performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27762) Don't fall back to jdo query in ObjectStore if direct sql throws unrecoverable exception
[ https://issues.apache.org/jira/browse/HIVE-27762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Denys Kuzmenko updated HIVE-27762: -- Summary: Don't fall back to jdo query in ObjectStore if direct sql throws unrecoverable exception (was: ObjectStore GetHelper do not need run jdo query if direct sql exception is unrecoverable) > Don't fall back to jdo query in ObjectStore if direct sql throws > unrecoverable exception > > > Key: HIVE-27762 > URL: https://issues.apache.org/jira/browse/HIVE-27762 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Wechar >Assignee: Wechar >Priority: Major > Labels: pull-request-available > > Currently {{GetHelper}} will call {{getJdoResult()}} if {{getSqlResult()}} > throws an exception; this can be avoided when the exception is unrecoverable, > improving performance. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.
[ https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784956#comment-17784956 ] Krisztian Kasa commented on HIVE-26986: --- [~seonggon] 1. It is not clear why adding an extra concentrator RS leads to a data correctness issue. Could you please share a simple repro on a small dataset that contains only the necessary records? It can also be added to the PR to extend the test coverage of SWO and ParallelEdgeFixer. 2. IIUC parallel edge support can be controlled via a config setting. Could you please verify whether the correctness issue still stands when {code:java} set hive.optimize.shared.work.parallel.edge.support=false; {code} > A DAG created by OperatorGraph is not equal to the Tez DAG. > --- > > Key: HIVE-26986 > URL: https://issues.apache.org/jira/browse/HIVE-26986 > Project: Hive > Issue Type: Sub-task >Affects Versions: 4.0.0-alpha-2 >Reporter: Seonggon Namgung >Assignee: Seonggon Namgung >Priority: Major > Labels: hive-4.0.0-must, pull-request-available > Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png > > Time Spent: 50m > Remaining Estimate: 0h > > A DAG created by OperatorGraph is not equal to the corresponding DAG that is > submitted to Tez. > Because of this problem, ParallelEdgeFixer reports a pair of normal edges as > a parallel edge. > We observed this problem by comparing the OperatorGraph and the Tez DAG when running > TPC-DS query 71 on a 1TB ORC-format managed table. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27869: -- Labels: pull-request-available (was: ) > Iceberg: Select HadoopTables will fail at > HiveIcebergStorageHandler::canProvideColStats > > > Key: HIVE-27869 > URL: https://issues.apache.org/jira/browse/HIVE-27869 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > Labels: pull-request-available > > Step to reproduce (latest master code): > 1) Create path-based HadoopTable by Spark: > > {code:java} > ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client > \--conf > spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions > \--conf > spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog > \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf > spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; > create table ice_test_001(id int) using iceberg; > insert into ice_test_001(id) values(1),(2),(3);{code} > > 2) Create iceberg table based on the HadoopTable by Hive: > {code:java} > CREATE EXTERNAL TABLE ice_test_001 STORED BY > 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION > 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES > ('iceberg.catalog'='location_based_table'); {code} > 3) Select the HadoopTable by Hive > // launch tez task to scan data > *set hive.fetch.task.conversion=none;* > {code:java} > jdbc:hive2://localhost:1/default> select * from ice_test_001; > Error: Error while compiling statement: FAILED: IllegalArgumentException > Pathname > /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > from >
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > is not a valid DFS filename. (state=42000,code=4) {code} > Full stacktrace: > {code:java} > Caused by: java.lang.IllegalArgumentException: Pathname > /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > from > hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > is not a valid DFS filename. > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > ~[hadoop-common-3.3.1.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) > ~[hadoop-common-3.3.1.jar:?] 
> at > org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) > ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) > ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) >
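The malformed pathname in the error suggests a fully qualified URI was appended beneath the stats directory instead of a relative file name. A minimal sketch of that failure mode (a hypothetical reconstruction; the real logic lives in HiveIcebergStorageHandler::canProvideColStats, and the variable names below are invented):

```python
from urllib.parse import urlparse

# Hypothetical reconstruction: a component that is itself a fully qualified
# URI gets appended under the stats directory, nesting a second "hdfs:"
# scheme inside the path, which DistributedFileSystem then rejects as
# "not a valid DFS filename".
stats_dir = "hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats"
component = "hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001"

bad_path = stats_dir + "/" + component
# -> ".../stats/hdfs://localhost:8028/..." : two schemes in one pathname

def is_valid_component(name: str) -> bool:
    # A name joined under a directory must be relative: no scheme, no leading slash.
    return urlparse(name).scheme == "" and not name.startswith("/")

print(is_valid_component(component))       # False: strip scheme/authority first
print(is_valid_component("ice_test_001"))  # True
```

A fix along these lines would append only a relative identifier (or strip the scheme and authority before joining).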
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27869: -- Description: Step to reproduce (latest master code): 1) Create path-based HadoopTable by Spark: {code:java} ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client \--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; create table ice_test_001(id int) using iceberg; insert into ice_test_001(id) values(1),(2),(3);{code} 2) Create iceberg table based on the HadoopTable by Hive: {code:java} CREATE EXTERNAL TABLE ice_test_001 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); {code} 3) Select the HadoopTable by Hive // launch tez task to scan data *set hive.fetch.task.conversion=none;* {code:java} jdbc:hive2://localhost:1/default> select * from ice_test_001; Error: Error while compiling statement: FAILED: IllegalArgumentException Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename.
(state=42000,code=4) {code} Full stacktrace: {code:java} Caused by: java.lang.IllegalArgumentException: Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) ~[hadoop-common-3.3.1.jar:?] 
at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27869: -- Description: Step to reproduce: 1) Create path-based HadoopTable by Spark: {code:java} ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client \--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; create table ice_test_001(id int) using iceberg; insert into ice_test_001(id) values(1),(2),(3);{code} 2) Create iceberg table based on the HadoopTable by Hive: {code:java} CREATE EXTERNAL TABLE ice_test_001 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); {code} 3) Select the HadoopTable by Hive // launch tez task to scan data *set hive.fetch.task.conversion=none;* {code:java} jdbc:hive2://localhost:10004/default> select * from ice_test_001; Error: Error while compiling statement: FAILED: IllegalArgumentException Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename.
(state=42000,code=4) {code} Full stacktrace: {code:java} Caused by: java.lang.IllegalArgumentException: Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) ~[hadoop-common-3.3.1.jar:?] 
at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27869: -- Description: Step to reproduce: 1) Create path-based HadoopTable by Spark: {code:java} ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client \--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; create table ice_test_001(id int) using iceberg; insert into ice_test_001(id) values(1),(2),(3);{code} 2) Create iceberg table based on the HadoopTable by Hive: {code:java} CREATE EXTERNAL TABLE ice_test_001 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES ('iceberg.catalog'='location_based_table'); {code} 3) Select the HadoopTable by Hive *set hive.fetch.task.conversion=none;* {code:java} jdbc:hive2://localhost:10004/default> select * from ice_test_001; Error: Error while compiling statement: FAILED: IllegalArgumentException Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename. (state=42000,code=4) {code} Full stacktrace: {code:java} Caused by: java.lang.IllegalArgumentException: Pathname /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 from hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) ~[hadoop-common-3.3.1.jar:?] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125) ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:84)
[jira] [Assigned] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao reassigned HIVE-27869: - Assignee: zhangbutao > Iceberg: Select HadoopTables will fail at > HiveIcebergStorageHandler::canProvideColStats > > > Key: HIVE-27869 > URL: https://issues.apache.org/jira/browse/HIVE-27869 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: zhangbutao >Assignee: zhangbutao >Priority: Major > > Step to reproduce: > 1) Create path-based HadoopTable by Spark: > > {code:java} > ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client > \--conf > spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions > \--conf > spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog > \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf > spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg; > create table ice_test_001(id int) using iceberg; > insert into ice_test_001(id) values(1),(2),(3);{code} > > 2) Create iceberg table based on the HadoopTable by Hive: > {code:java} > CREATE EXTERNAL TABLE ice_test_001 STORED BY > 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION > 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES > ('iceberg.catalog'='location_based_table'); {code} > 3) Select the HadoopTable by Hive > *set hive.fetch.task.conversion=none;* > {code:java} > jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001; > Error: Error while compiling statement: FAILED: IllegalArgumentException > Pathname > /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > from > hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > is not a valid DFS filename.
(state=42000,code=4) {code} > Full stacktrace: > {code:java} > Caused by: java.lang.IllegalArgumentException: Pathname > /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > from > hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610 > is not a valid DFS filename. > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > ~[hadoop-common-3.3.1.jar:?] > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764) > ~[hadoop-hdfs-client-3.3.1.jar:?] > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) > ~[hadoop-common-3.3.1.jar:?] 
> at > org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540) > ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533) > ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) > ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT] > at >
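The doubled scheme inside the failing pathname (`.../stats/hdfs:/localhost:8028/...`) suggests the stats file name is assembled by joining a stats directory with the table's fully-qualified location string rather than a relative name; POSIX-style path normalization then collapses the `//` after `hdfs:`. A minimal stdlib sketch (the class name `StatsPathMangle` and the use of `java.nio.file` are illustrative assumptions, not Hive's actual code) reproduces the mangling:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class StatsPathMangle {
    public static void main(String[] args) {
        // Directory where column-stats files are looked up (taken from the error message).
        Path statsDir = Paths.get("/tmp/testiceberg/default/ice_test_001/stats");
        // Fully-qualified table location, as a HadoopCatalog table stores it.
        String location = "hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001";
        // Treating the whole URI as a file-name component: normalization collapses
        // the "//" after the scheme, producing ".../stats/hdfs:/localhost:8028/..."
        // -- the same shape as the invalid DFS filename in the stack trace.
        Path mangled = statsDir.resolve(location);
        System.out.println(mangled);
        // prints /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_001
    }
}
```

One way to avoid the collapse would be to build the stats file name from only the path component of the table location (or an encoded form of it) before handing it to `FileSystem.exists`, which is where the invalid name blows up in `canProvideColStats`.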
[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
[ https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhangbutao updated HIVE-27869: -- Description: (reproduction steps and stack trace identical to those quoted above)
[jira] [Created] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats
zhangbutao created HIVE-27869: - Summary: Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats Key: HIVE-27869 URL: https://issues.apache.org/jira/browse/HIVE-27869 Project: Hive Issue Type: Improvement Components: Iceberg integration Reporter: zhangbutao
(description identical to the reproduction steps and stack trace quoted above)