[jira] [Updated] (HIVE-27862) Map propertyContent to a wrong column in package.jdo

2023-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27862:
--
Labels: pull-request-available  (was: )

> Map propertyContent to a wrong column in package.jdo
> 
>
> Key: HIVE-27862
> URL: https://issues.apache.org/jira/browse/HIVE-27862
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27555) Upgrade issues with Kudu table on backend db

2023-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27555:
--
Labels: pull-request-available  (was: )

> Upgrade issues with Kudu table on backend db
> 
>
> Key: HIVE-27555
> URL: https://issues.apache.org/jira/browse/HIVE-27555
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>
> In HIVE-27457, we tried to update the serde lib and (input/output)format of 
> Kudu tables in the backend db. In the upgrade scripts, we join "SDS"."SD_ID" 
> with "TABLE_PARAMS"."TBL_ID": 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/sql/mysql/upgrade-4.0.0-alpha-2-to-4.0.0-beta-1.mysql.sql#L37-L39
> Because "SD_ID" is the primary key of SDS and "TBL_ID" is the primary key of 
> TBLS, the two tables cannot be joined on these two columns.
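
To illustrate why joining on these two columns cannot work, here is a minimal in-memory sketch in Java. It assumes the usual metastore layout in which each TBLS row carries both its own TBL_ID and the SD_ID of its storage descriptor. The class name, method names, and sample ids are hypothetical and are not taken from the upgrade script.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the metastore tables; not the real upgrade script. */
final class KuduSdLookup {

    /**
     * Resolves storage descriptor ids by going through TBLS.
     * tblIdToSdId models TBLS (TBL_ID -> SD_ID); kuduTblIds models the
     * TBL_IDs selected from TABLE_PARAMS for Kudu tables.
     */
    static List<Long> sdIdsViaTbls(Map<Long, Long> tblIdToSdId, List<Long> kuduTblIds) {
        List<Long> sdIds = new ArrayList<>();
        for (Long tblId : kuduTblIds) {
            Long sdId = tblIdToSdId.get(tblId); // TABLE_PARAMS.TBL_ID = TBLS.TBL_ID
            if (sdId != null) {
                sdIds.add(sdId);                // TBLS.SD_ID = SDS.SD_ID
            }
        }
        return sdIds;
    }

    /** The mistaken join: treats a TBL_ID as if it were an SD_ID. */
    static List<Long> sdIdsDirect(List<Long> kuduTblIds) {
        return new ArrayList<>(kuduTblIds);
    }

    public static void main(String[] args) {
        Map<Long, Long> tbls = Map.of(1L, 100L, 2L, 200L); // sample TBL_ID -> SD_ID rows
        List<Long> kuduTblIds = List.of(2L);
        System.out.println(sdIdsViaTbls(tbls, kuduTblIds)); // [200]
        System.out.println(sdIdsDirect(kuduTblIds));        // [2], an unrelated SDS row if one exists
    }
}
```

With the sample data, the lookup through TBLS yields storage descriptor 200, while the direct comparison would touch whichever SDS row happens to have id 2.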





[jira] [Resolved] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.2

2023-11-10 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-27819.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Iceberg: Upgrade iceberg version to 1.4.2
> -
>
> Key: HIVE-27819
> URL: https://issues.apache.org/jira/browse/HIVE-27819
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The latest Iceberg version, 1.4.2, has been released. We need to upgrade the 
> Iceberg dependency from 1.3.0 to 1.4.2. At the same time, we should port some 
> Hive catalog changes from the Iceberg repo to the Hive repo.
> [https://iceberg.apache.org/releases/#142-release]
>  





[jira] [Commented] (HIVE-27819) Iceberg: Upgrade iceberg version to 1.4.2

2023-11-10 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785051#comment-17785051
 ] 

Denys Kuzmenko commented on HIVE-27819:
---

Merged to master
Thanks for the patch [~zhangbutao], and [~ayushsaxena], [~kkasa] for the review!

> Iceberg: Upgrade iceberg version to 1.4.2
> -
>
> Key: HIVE-27819
> URL: https://issues.apache.org/jira/browse/HIVE-27819
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> The latest Iceberg version, 1.4.2, has been released. We need to upgrade the 
> Iceberg dependency from 1.3.0 to 1.4.2. At the same time, we should port some 
> Hive catalog changes from the Iceberg repo to the Hive repo.
> [https://iceberg.apache.org/releases/#142-release]
>  





[jira] [Resolved] (HIVE-27762) Don't fall back to jdo query in ObjectStore if direct sql throws unrecoverable exception

2023-11-10 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-27762.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

> Don't fall back to jdo query in ObjectStore if direct sql throws 
> unrecoverable exception
> 
>
> Key: HIVE-27762
> URL: https://issues.apache.org/jira/browse/HIVE-27762
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently {{GetHelper}} calls {{getJdoResult()}} whenever {{getSqlResult()}} 
> throws an exception. This fallback can be avoided when the exception is 
> unrecoverable, which improves performance.
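
The idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Hive's actual GetHelper: the class FallbackSketch, the UnrecoverableException marker type, and the use of Supplier for the two query paths are all hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a direct-SQL-then-JDO helper; not Hive's actual GetHelper. */
final class FallbackSketch {

    /** Marker for failures that a JDO retry cannot fix (assumed type, for illustration). */
    static final class UnrecoverableException extends RuntimeException {
        UnrecoverableException(String message) {
            super(message);
        }
    }

    /** Tries the direct SQL path first and falls back to JDO only for recoverable failures. */
    static <T> T run(Supplier<T> sqlQuery, Supplier<T> jdoQuery) {
        try {
            return sqlQuery.get();
        } catch (UnrecoverableException e) {
            throw e;               // a JDO retry would only repeat the failure
        } catch (RuntimeException e) {
            return jdoQuery.get(); // recoverable: fall back to the JDO path
        }
    }

    public static void main(String[] args) {
        String result = run(
            () -> { throw new IllegalStateException("direct SQL failed"); },
            () -> "jdo result");
        System.out.println(result); // jdo result
    }
}
```

The order of the catch clauses matters in this sketch: the unrecoverable case is tested first, so it is rethrown before the generic fallback can run.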





[jira] [Commented] (HIVE-27762) Don't fall back to jdo query in ObjectStore if direct sql throws unrecoverable exception

2023-11-10 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17785046#comment-17785046
 ] 

Denys Kuzmenko commented on HIVE-27762:
---

Merged to master.
Thanks [~wechar] for the patch and [~gsaihemanth] for the review!

> Don't fall back to jdo query in ObjectStore if direct sql throws 
> unrecoverable exception
> 
>
> Key: HIVE-27762
> URL: https://issues.apache.org/jira/browse/HIVE-27762
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>
> Currently {{GetHelper}} calls {{getJdoResult()}} whenever {{getSqlResult()}} 
> throws an exception. This fallback can be avoided when the exception is 
> unrecoverable, which improves performance.





[jira] [Updated] (HIVE-27762) Don't fall back to jdo query in ObjectStore if direct sql throws unrecoverable exception

2023-11-10 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-27762:
--
Summary: Don't fall back to jdo query in ObjectStore if direct sql throws 
unrecoverable exception  (was: ObjectStore GetHelper do not need run jdo query 
if direct sql exception is unrecoverable)

> Don't fall back to jdo query in ObjectStore if direct sql throws 
> unrecoverable exception
> 
>
> Key: HIVE-27762
> URL: https://issues.apache.org/jira/browse/HIVE-27762
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>
> Currently {{GetHelper}} calls {{getJdoResult()}} whenever {{getSqlResult()}} 
> throws an exception. This fallback can be avoided when the exception is 
> unrecoverable, which improves performance.





[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.

2023-11-10 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784956#comment-17784956
 ] 

Krisztian Kasa commented on HIVE-26986:
---

[~seonggon] 
1. It is not clear why adding an extra concentrator RS leads to a data 
correctness issue.
Could you please share a simple repro on a small dataset that contains only 
the necessary records? It could also be added to the PR to extend the test 
coverage of SWO and ParallelEdgeFixer.
2. IIUC, parallel edge support can be controlled via a config setting. Could 
you please verify whether the correctness issue still occurs with
{code:java}
set hive.optimize.shared.work.parallel.edge.support=false;
{code}

> A DAG created by OperatorGraph is not equal to the Tez DAG.
> ---
>
> Key: HIVE-26986
> URL: https://issues.apache.org/jira/browse/HIVE-26986
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 4.0.0-alpha-2
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: hive-4.0.0-must, pull-request-available
> Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is 
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as 
> a parallel edge.
> We observed this problem by comparing the OperatorGraph and the Tez DAG when 
> running TPC-DS query 71 on a 1TB ORC-format managed table.
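
For reference, a parallel edge here is taken to mean two edges that connect the same pair of vertices. The sketch below shows a generic way to detect such pairs in an edge list. It is an illustration only and not the ParallelEdgeFixer implementation; the class name and the vertex names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: find vertex pairs joined by more than one edge in a DAG. */
final class ParallelEdgeSketch {

    /** Each edge is a two-element list {source, target}; the names are illustrative. */
    static Set<List<String>> parallelEdges(List<List<String>> edges) {
        Set<List<String>> seen = new HashSet<>();
        Set<List<String>> parallel = new HashSet<>();
        for (List<String> edge : edges) {
            if (!seen.add(edge)) { // add() returns false when the pair was already seen
                parallel.add(edge);
            }
        }
        return parallel;
    }

    public static void main(String[] args) {
        List<List<String>> edges = List.of(
            List.of("Map 1", "Reducer 2"),
            List.of("Map 1", "Reducer 2"),
            List.of("Map 3", "Reducer 2"));
        System.out.println(parallelEdges(edges)); // [[Map 1, Reducer 2]]
    }
}
```

Because the result depends entirely on which vertex each edge endpoint is assigned to, two graphs that group operators into vertices differently can disagree on whether a given pair of edges is parallel.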





[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27869:
--
Labels: pull-request-available  (was: )

> Iceberg: Select  HadoopTables will fail at 
> HiveIcebergStorageHandler::canProvideColStats
> 
>
> Key: HIVE-27869
> URL: https://issues.apache.org/jira/browse/HIVE-27869
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>  Labels: pull-request-available
>
> Steps to reproduce (latest master code):
> 1) Create path-based HadoopTable by Spark:
>  
> {code:java}
> ./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client 
> \--conf 
> spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
>  \--conf 
> spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog 
> \--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf 
> spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;
> create table ice_test_001(id int) using iceberg;
> insert into ice_test_001(id) values(1),(2),(3);{code}
>  
> 2) Create iceberg table based on the HadoopTable by Hive:
> {code:java}
> CREATE EXTERNAL TABLE ice_test_001 STORED BY 
> 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
> 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table'); {code}
> 3) Select the HadoopTable by Hive
> // launch tez task to scan data
> *set hive.fetch.task.conversion=none;*
> {code:java}
> jdbc:hive2://localhost:1/default> select * from ice_test_001;
> Error: Error while compiling statement: FAILED: IllegalArgumentException 
> Pathname 
> /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  from 
> hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  is not a valid DFS filename. (state=42000,code=4) {code}
> Full stacktrace:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Pathname 
> /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  from 
> hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  is not a valid DFS filename.
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-3.3.1.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
> ~[hadoop-common-3.3.1.jar:?]
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
>  ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
>  ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  

[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce (latest master code):

1) Create path-based HadoopTable by Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client 
\--conf 
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
 \--conf 
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog 
\--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf 
spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create iceberg table based on the HadoopTable by Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Select the HadoopTable by Hive

// launch tez task to scan data
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:1/default> select * from ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 

[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce:

1) Create path-based HadoopTable by Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client 
\--conf 
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
 \--conf 
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog 
\--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf 
spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create iceberg table based on the HadoopTable by Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Select the HadoopTable by Hive

// launch tez task to scan data
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:1/default> select * from ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 

[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce:

1) Create path-based HadoopTable by Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client 
\--conf 
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
 \--conf 
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog 
\--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf 
spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create iceberg table based on the HadoopTable by Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Select the HadoopTable by Hive

// launch tez task to scan data
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 

[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce:

1) Create path-based HadoopTable by Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql \--master local \--deploy-mode client 
\--conf 
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
 \--conf 
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog 
\--conf spark.sql.catalog.spark_catalog.type=hadoop \--conf 
spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;


create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create iceberg table based on the HadoopTable by Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001 STORED BY 
'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION 
'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001' TBLPROPERTIES 
('iceberg.catalog'='location_based_table'); {code}
3) Select the HadoopTable by Hive
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
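
One possible reading of the path in this error is that the table's full location URI, followed by a numeric id, was used as a file name under the stats directory. That reading is an inference from the message, not something confirmed here. The sketch below only reproduces the string under that assumption; the class, its methods, and the slash-collapsing and colon checks are simplified, hypothetical stand-ins rather than Hadoop or Hive code.

```java
/** Hypothetical sketch that rebuilds the path from the error message; not Hive or Hadoop code. */
final class StatsPathSketch {

    /** Joins parent and child, collapsing repeated slashes in the child (simplified assumption). */
    static String child(String parent, String name) {
        return parent + "/" + name.replaceAll("/+", "/");
    }

    /** Simplified stand-in for a filename check: no path component may contain ':'. */
    static boolean hasColonComponent(String path) {
        for (String component : path.split("/")) {
            if (component.contains(":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String location = "hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001";
        String id = "8020750642632422610"; // trailing digits copied from the error message
        String statsFile = child(location + "/stats", location + id);
        System.out.println(statsFile);
        // Check only the part after the scheme and authority, as the error message does.
        String pathPart = statsFile.substring("hdfs://localhost:8028".length());
        System.out.println(hasColonComponent(pathPart)); // true
    }
}
```

Under these assumptions the first line printed matches the second path quoted in the error, and the embedded "hdfs:" and "localhost:8028" components are what the simplified check flags.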
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:84)
 

[jira] [Assigned] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao reassigned HIVE-27869:
-

Assignee: zhangbutao

> Iceberg: Select  HadoopTables will fail at 
> HiveIcebergStorageHandler::canProvideColStats
> 
>
> Key: HIVE-27869
> URL: https://issues.apache.org/jira/browse/HIVE-27869
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: zhangbutao
>Assignee: zhangbutao
>Priority: Major
>
> Steps to reproduce:
> 1) Create path-based HadoopTable by Spark:
>  
> {code:java}
> ./spark-3.3.1-bin-hadoop3/bin/spark-sql --master local --deploy-mode client \
>   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
>   --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
>   --conf spark.sql.catalog.spark_catalog.type=hadoop \
>   --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;
> create table ice_test_001(id int) using iceberg;
> insert into ice_test_001(id) values(1),(2),(3);{code}
>  
> 2) Create iceberg table based on the HadoopTable by Hive:
> {code:java}
> CREATE EXTERNAL TABLE ice_test_001
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001'
> TBLPROPERTIES ('iceberg.catalog'='location_based_table');{code}
> 3) Select the HadoopTable by Hive:
> *set hive.fetch.task.conversion=none;*
> {code:java}
> jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001;
> Error: Error while compiling statement: FAILED: IllegalArgumentException 
> Pathname 
> /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  from 
> hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  is not a valid DFS filename. (state=42000,code=4) {code}
> Full stacktrace:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Pathname 
> /tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  from 
> hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
>  is not a valid DFS filename.
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-3.3.1.jar:?]
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
>  ~[hadoop-hdfs-client-3.3.1.jar:?]
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
> ~[hadoop-common-3.3.1.jar:?]
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
>  ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
>  ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
>         at 
> 

[jira] [Updated] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhangbutao updated HIVE-27869:
--
Description: 
Steps to reproduce:

1) Create path-based HadoopTable by Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql --master local --deploy-mode client \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hadoop \
  --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;

create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create iceberg table based on the HadoopTable by Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001'
TBLPROPERTIES ('iceberg.catalog'='location_based_table');{code}
3) Select the HadoopTable by Hive:
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
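The rejected pathname hints at how it was assembled: the table location, then /stats/, then the table location a second time (with hdfs:// collapsed to hdfs:/), then a run of digits that looks like a snapshot ID. The following is a minimal Java sketch of that composition, not Hive's actual code. It assumes that a path-based HadoopTable reports its location as its name, and that HDFS rejects any path component containing ':'. The helpers buildStatsPath, toPathname and isValidDfsPathname are hypothetical names written for illustration only.

```java
import java.net.URI;

public class StatsPathSketch {

    // Hypothetical helper, not Hive's actual code: composes
    // <table location>/stats/<table name><snapshot id>, the shape seen in the error.
    static String buildStatsPath(String tableLocation, String tableName, long snapshotId) {
        return tableLocation + "/stats/" + tableName + snapshotId;
    }

    // Path portion of the URI with repeated slashes collapsed. The collapsing is an
    // assumption made to match the "hdfs:/" seen in the reported pathname.
    static String toPathname(String fullUri) {
        return URI.create(fullUri).getPath().replaceAll("/{2,}", "/");
    }

    // Simplified stand-in for the HDFS filename check (assumption): the pathname must
    // be absolute, and no component may be "." or ".." or contain ':'.
    static boolean isValidDfsPathname(String pathname) {
        if (!pathname.startsWith("/")) {
            return false;
        }
        for (String component : pathname.split("/")) {
            if (component.equals(".") || component.equals("..") || component.contains(":")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String location = "hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001";
        // Trailing digits of the reported path, assumed here to be a snapshot ID.
        long snapshotId = 8020750642632422610L;

        // Assumption: for a path-based HadoopTable the table name equals its location.
        String pathname = toPathname(buildStatsPath(location, location, snapshotId));
        System.out.println(pathname);
        System.out.println("valid: " + isValidDfsPathname(pathname));
    }
}
```

Under these assumptions the composed pathname contains the component "hdfs:", which is what makes it invalid. With a plain identifier as the table name (for example a hypothetical default.ice_test_001), the same composition passes the simplified check.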
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:125)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 

[jira] [Created] (HIVE-27869) Iceberg: Select HadoopTables will fail at HiveIcebergStorageHandler::canProvideColStats

2023-11-10 Thread zhangbutao (Jira)
zhangbutao created HIVE-27869:
-

 Summary: Iceberg: Select  HadoopTables will fail at 
HiveIcebergStorageHandler::canProvideColStats
 Key: HIVE-27869
 URL: https://issues.apache.org/jira/browse/HIVE-27869
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


Steps to reproduce:

1) Create path-based HadoopTable by Spark:
 
{code:java}
./spark-3.3.1-bin-hadoop3/bin/spark-sql --master local --deploy-mode client \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hadoop \
  --conf spark.sql.catalog.spark_catalog.warehouse=hdfs://localhost:8028/tmp/testiceberg;

create table ice_test_001(id int) using iceberg;
insert into ice_test_001(id) values(1),(2),(3);{code}
 
2) Create iceberg table based on the HadoopTable by Hive:
{code:java}
CREATE EXTERNAL TABLE ice_test_001
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001'
TBLPROPERTIES ('iceberg.catalog'='location_based_table');{code}
3) Select the HadoopTable by Hive:
*set hive.fetch.task.conversion=none;*
{code:java}
jdbc:hive2://localhost:10004/default> select * from testicedb118.ice_test_001;
Error: Error while compiling statement: FAILED: IllegalArgumentException 
Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename. (state=42000,code=4) {code}
Full stacktrace:
{code:java}
Caused by: java.lang.IllegalArgumentException: Pathname 
/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 from 
hdfs://localhost:8028/tmp/testiceberg/default/ice_test_001/stats/hdfs:/localhost:8028/tmp/testiceberg/default/ice_test_0018020750642632422610
 is not a valid DFS filename.
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:256)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1752)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1749)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 ~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1764)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760) 
~[hadoop-common-3.3.1.jar:?]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStats(HiveIcebergStorageHandler.java:540)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.canProvideColStatistics(HiveIcebergStorageHandler.java:533)
 ~[hive-iceberg-handler-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:1073)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:302)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:193)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:181)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:173)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.lib.LevelOrderWalker.walk(LevelOrderWalker.java:148) 
~[hive-exec-4.0.0-beta-2-SNAPSHOT.jar:4.0.0-beta-2-SNAPSHOT]
        at