[jira] [Assigned] (HIVE-25383) Make TestMarkPartitionRemote more stable

2023-11-27 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-25383:
--

Assignee: (was: Zhihua Deng)

> Make TestMarkPartitionRemote more stable
> 
>
> Key: HIVE-25383
> URL: https://issues.apache.org/jira/browse/HIVE-25383
> Project: Hive
>  Issue Type: Test
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Sometimes TestMarkPartitionRemote fails with
> {noformat}
> org.apache.hadoop.hive.metastore.api.MetaException: Exception determining 
> external table location:Default location is not available for table: 
> file:/path/to/table
> at 
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transformCreateTable(MetastoreDefaultTransformer.java:660)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.metastore.HMSHandler.create_table_core(HMSHandler.java:2325)
>  ~[classes/:?]
> at 
> org.apache.hadoop.hive.metastore.HMSHandler.create_table_req(HMSHandler.java:2578)
>  [classes/:?]{noformat}
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2441/15/tests]
>  
> [http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2473/3/tests]
>   
>  The cause is that the table path already exists before the test executes, and 
> the TableLocationStrategy value "prohibit" does not allow alternate locations. 





[jira] [Assigned] (HIVE-24422) Throw SemanticException when CTE alias is conflicted with table name

2023-11-27 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-24422:
--

Assignee: (was: Zhihua Deng)

> Throw SemanticException when CTE alias is conflicted with table name
> 
>
> Key: HIVE-24422
> URL: https://issues.apache.org/jira/browse/HIVE-24422
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If the alias of a CTE conflicts with a table name, we use the alias to fetch 
> the table rather than replacing it with the CTE's ASTNode tree, which may 
> cause some confusing problems. For example:
> {noformat}
> create table game_info (game_name string);
> with game_info as (
> select distinct ext_id, dev_app_id, game_name
> from game_info_extend )
> select count(game_name) from game_info;{noformat}
> The query returns the number of rows of the table game_info instead of 
> game_info_extend. It would be better to throw an exception to avoid such 
> cases.
>  
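> A minimal workaround sketch (hypothetical, not part of this ticket's patch): 
> give the CTE a name that cannot collide with an existing table, so the 
> reference is unambiguous under either resolution rule:
> {code:sql}
> -- hypothetical rename; game_info_extend is assumed to exist as in the example above
> with game_info_cte as (
> select distinct ext_id, dev_app_id, game_name
> from game_info_extend )
> select count(game_name) from game_info_cte;
> {code}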





[jira] [Assigned] (HIVE-24511) Potential classloader leak in SerDeStorageSchemaReader and add JsonSerde to managed serde

2023-11-27 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-24511:
--

Assignee: (was: Zhihua Deng)

> Potential classloader leak in SerDeStorageSchemaReader and add JsonSerde to 
> managed serde
> -
>
> Key: HIVE-24511
> URL: https://issues.apache.org/jira/browse/HIVE-24511
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Zhihua Deng
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> 1. Close the created classloader to release its resources.
> 2. Add more detailed error messages to MetaException when throwing.
> 3. Skip JsonSerDe/RegexSerDe creation when getting the columns/schemas of such 
> tables.





[jira] [Updated] (HIVE-27892) Hive "insert overwrite table" for multiple partition table issue

2023-11-27 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar updated HIVE-27892:
-
Status: Patch Available  (was: Open)

> Hive "insert overwrite table" for multiple partition table issue
> 
>
> Key: HIVE-27892
> URL: https://issues.apache.org/jira/browse/HIVE-27892
> Project: Hive
>  Issue Type: Bug
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: pull-request-available
>
> Authorization is not working for Hive "insert overwrite table" on tables with 
> multiple partition columns.
> Steps to reproduce the issue:
> 1) CREATE EXTERNAL TABLE Part (eid int, name int)
> PARTITIONED BY (position int, dept int);
> 2) SET hive.exec.dynamic.partition.mode=nonstrict;
> 3) INSERT INTO TABLE PART PARTITION (position,DEPT)
> SELECT 1,1,1,1;
> 4) select * from part;
> Create a test user test123, and grant test123 only Select permission on db 
> default, table Part, and column *.
> 1) insert overwrite table part partition(position=2,DEPT=2) select 2,2;
> This fails, as expected.
> 2) insert overwrite table part partition(position,DEPT) select 2,2,2,2;
> This fails, as expected.
> 3) insert overwrite table part partition(position=2,DEPT) select 2,2,2;
> But this succeeds with no audit in Ranger, which means no authorization 
> happened when this query was executed.
>  





[jira] [Assigned] (HIVE-19818) SessionState getQueryId returns an empty string

2023-11-27 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-19818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-19818:
--

Assignee: (was: Zhihua Deng)

> SessionState getQueryId returns an empty string
> ---
>
> Key: HIVE-19818
> URL: https://issues.apache.org/jira/browse/HIVE-19818
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.2
>Reporter: Zhihua Deng
>Priority: Minor
> Attachments: HIVE-19818.patch
>
>
> When we execute SQL asynchronously, a new configuration based on the one the 
> session holds is created and passed to the driver instance, which results in 
> SessionState#getQueryId returning an empty string when called later on. This 
> problem can be seen in HadoopJobExecHelper.java.





[jira] [Commented] (HIVE-26505) Case When Some result data is lost when there are common column conditions and partitioned column conditions

2023-11-27 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790021#comment-17790021
 ] 

Krisztian Kasa commented on HIVE-26505:
---

This is the expression sent by {{PartitionPruner}} to HMS 
get_partitions_spec_by_expr
{code}
(true and (cp = '2022-08-23')) is not true
{code}
This removes partition cp = '2022-08-23' from the partitions which should be 
scanned.

CBO plan
{code}
HiveProject(id=[$0], cp=[$1])
  HiveFilter(condition=[IS NOT TRUE(AND(=($0, _UTF-16LE'a'), =($1, 
_UTF-16LE'2022-08-23')))])
HiveTableScan(table=[[default, test0831]], table:alias=[test0831])
{code}

Expression 
{code}
IS NOT TRUE(AND(=($0, _UTF-16LE'a'), =($1, _UTF-16LE'2022-08-23')))
{code}
is converted to
{code}
GenericUDFOPNotTrue(GenericUDFOPAnd(Const boolean true, 
GenericUDFOPEqual(Column[cp], Const string 2022-08-23)))
{code}
in RelOptHiveTable.computePartitionList
https://github.com/apache/hive/blob/f4c4d6b2ec1555abb844b6f348f879719b0a67c3/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/RelOptHiveTable.java#L476-L478
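A minimal illustration (assuming standard three-valued boolean semantics) of why 
pruning on this expression is unsound: substituting true for the non-partition 
conjunct id = 'a' makes the pruning predicate false for the partition 
cp = '2022-08-23', yet rows in that partition with id <> 'a' still satisfy the 
original filter:
{code:sql}
-- pruning predicate evaluated for partition cp = '2022-08-23':
select (true and ('2022-08-23' = '2022-08-23')) is not true;    -- false => partition dropped
-- original filter for the real row ('c', '2022-08-23') in that partition:
select ('c' = 'a' and '2022-08-23' = '2022-08-23') is not true; -- true => row expected
{code}
A sound substitution has to be conservative: the pruning predicate must stay 
true whenever any row of the partition might pass the original filter.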


> Case When Some result data is lost when there are common column conditions 
> and partitioned column conditions 
> -
>
> Key: HIVE-26505
> URL: https://issues.apache.org/jira/browse/HIVE-26505
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.0, 4.0.0-alpha-1
>Reporter: GuangMing Lu
>Priority: Critical
>  Labels: check, hive-4.0.0-must
>
> {code:sql}
> create table test0831 (id string) partitioned by (cp string);
> insert into test0831 values ('a', '2022-08-23'),('c', '2022-08-23'),('d', 
> '2022-08-23');
> insert into test0831 values ('a', '2022-08-24'),('b', '2022-08-24');
> select * from test0831;
> +--+--+
> | test0831.id  | test0831.cp  |
> +--+--+
> | a            | 2022-08-23   |
> | c            | 2022-08-23   |
> | d            | 2022-08-23   |
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--+--+
> select * from test0831 where (case when id='a' and cp='2022-08-23' then 1 
> else 0 end)=0;  
> +--+--+
> | test0831.id  | test0831.cp  |
> +--+--+
> | a            | 2022-08-24   |
> | b            | 2022-08-24   |
> +--+--+
> {code}
>  





[jira] [Commented] (HIVE-27903) TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work

2023-11-27 Thread JK Pasimuthu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790127#comment-17790127
 ] 

JK Pasimuthu commented on HIVE-27903:
-

RETAIN LAST is count based, while the use case involving max-snapshot-age is 
time based, so the use cases are different.

The documentation about older_than and retain_last is specific to the Spark 
procedure expire_snapshots. I don't think we have a similar procedure for Hive.

Is there a way one can specify ALTER TABLE <table> expire_snapshot without any 
arguments against Hive?

> TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work
> 
>
> Key: HIVE-27903
> URL: https://issues.apache.org/jira/browse/HIVE-27903
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: JK Pasimuthu
>Priority: Major
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); -- the elapsed 
> time in ms between the second insert and the current time
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT(*) FROM default.test5d78b6.snapshots;
> Output: 1. It should be 2 rows. The default 1 is retained and all other 
> snapshots are expired as usual, so setting the property has no effect.
> Additional info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> While writing and running the tests, expiring snapshots worked fine within a 
> few seconds of the snapshots being created, so I'm assuming this option has 
> no effect right now. Having said that, I'm thinking about the implications a 
> fix would have on end users.
> The end user may not know about this option at all and will have a tough time 
> figuring out why the snapshots are not getting expired. One option could be 
> to set the default to 0 ms.





[jira] [Commented] (HIVE-27903) TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work

2023-11-27 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790142#comment-17790142
 ] 

Ayush Saxena commented on HIVE-27903:
-

The Spark procedure also invokes the same Iceberg API, and Hive does the same: 
the Iceberg layer is the same for Hive, Spark, or any other engine.

When you use the count-based expiration, it also uses the table property for 
time; when you use the time-based expiration, it also uses the count-based 
table property.

That is how Iceberg is designed.
{quote}Is there a way one can specify ALTER TABLE <table> expire_snapshot 
without any arguments against Hive?
{quote}
Nope.

> TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work
> 
>
> Key: HIVE-27903
> URL: https://issues.apache.org/jira/browse/HIVE-27903
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: JK Pasimuthu
>Priority: Major
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); -- the elapsed 
> time in ms between the second insert and the current time
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT(*) FROM default.test5d78b6.snapshots;
> Output: 1. It should be 2 rows. The default 1 is retained and all other 
> snapshots are expired as usual, so setting the property has no effect.
> Additional info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> While writing and running the tests, expiring snapshots worked fine within a 
> few seconds of the snapshots being created, so I'm assuming this option has 
> no effect right now. Having said that, I'm thinking about the implications a 
> fix would have on end users.
> The end user may not know about this option at all and will have a tough time 
> figuring out why the snapshots are not getting expired. One option could be 
> to set the default to 0 ms.





[jira] [Resolved] (HIVE-27843) Add QueryOperation to Hive proto logger for post execution hook information

2023-11-27 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan resolved HIVE-27843.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Add QueryOperation to Hive proto logger for post execution hook information
> ---
>
> Key: HIVE-27843
> URL: https://issues.apache.org/jira/browse/HIVE-27843
> Project: Hive
>  Issue Type: Task
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently the query operation type is missing in the proto logger.
> Add QueryOperation to the Hive proto logger for post-execution hook information.





[jira] [Work started] (HIVE-27843) Add QueryOperation to Hive proto logger for post execution hook information

2023-11-27 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27843 started by Ramesh Kumar Thangarajan.
---
> Add QueryOperation to Hive proto logger for post execution hook information
> ---
>
> Key: HIVE-27843
> URL: https://issues.apache.org/jira/browse/HIVE-27843
> Project: Hive
>  Issue Type: Task
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Currently the query operation type is missing in the proto logger.
> Add QueryOperation to the Hive proto logger for post-execution hook information.





[jira] [Updated] (HIVE-27658) dynamic hash join when use left semi get "SemanticException Error resolving join keys (state=42000,code=40000)"

2023-11-27 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27658:
---
Component/s: Query Planning
 (was: hpl/sql)

> dynamic hash join when use left semi get  "SemanticException Error resolving 
> join keys (state=42000,code=40000)"
> 
>
> Key: HIVE-27658
> URL: https://issues.apache.org/jira/browse/HIVE-27658
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.3
>Reporter: xiaojunxiang
>Priority: Major
>
> Occurs in some LEFT SEMI JOIN cases in CBO dynamic-hash-join-optimized 
> queries. The HiveServer2 log is as follows:
> 2023-08-31T10:22:21,738 WARN  [HiveServer2-Handler-Pool: Thread-100]: 
> thrift.ThriftCLIService (()) - Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Error resolving join keys
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) 
> ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
>  ~[hive-exec-100.jar:?]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-100.jar:?]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-100.jar:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_312]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_312]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Error resolving 
> join keys
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.getMapJoinDesc(MapJoinProcessor.java:1105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertJoinOpMapJoinOp(MapJoinProcessor.java:372)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinMapJoin(ConvertJoinMapJoin.java:1056)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin(ConvertJoinMapJoin.java:1280)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.fallbackToReduceSideJoin(ConvertJoinMapJoin.java:1312)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.checkAndConvertSMBJoin(ConvertJoinMapJoin.java:371)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.process(ConvertJoinMapJoin.java:151)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74) 
> ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.Defaul

[jira] [Updated] (HIVE-27658) dynamic hash join when use left semi get "SemanticException Error resolving join keys (state=42000,code=40000)"

2023-11-27 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27658:
---
Affects Version/s: 4.0.0-beta-1

> dynamic hash join when use left semi get  "SemanticException Error resolving 
> join keys (state=42000,code=40000)"
> 
>
> Key: HIVE-27658
> URL: https://issues.apache.org/jira/browse/HIVE-27658
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.3, 4.0.0-beta-1
>Reporter: xiaojunxiang
>Priority: Major
>
> Occurs in some LEFT SEMI JOIN cases in CBO dynamic-hash-join-optimized 
> queries. The HiveServer2 log is as follows:
> 2023-08-31T10:22:21,738 WARN  [HiveServer2-Handler-Pool: Thread-100]: 
> thrift.ThriftCLIService (()) - Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Error resolving join keys
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) 
> ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
>  ~[hive-exec-100.jar:?]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-100.jar:?]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-100.jar:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_312]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_312]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Error resolving 
> join keys
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.getMapJoinDesc(MapJoinProcessor.java:1105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertJoinOpMapJoinOp(MapJoinProcessor.java:372)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinMapJoin(ConvertJoinMapJoin.java:1056)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin(ConvertJoinMapJoin.java:1280)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.fallbackToReduceSideJoin(ConvertJoinMapJoin.java:1312)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.checkAndConvertSMBJoin(ConvertJoinMapJoin.java:371)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.process(ConvertJoinMapJoin.java:151)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74) 
> ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.start

[jira] [Updated] (HIVE-27658) Error resolving join keys during conversion to dynamic partition hashjoin

2023-11-27 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27658:
---
Summary: Error resolving join keys during conversion to dynamic partition 
hashjoin  (was: dynamic hash join when use left semi get  "SemanticException 
Error resolving join keys (state=42000,code=40000)")

> Error resolving join keys during conversion to dynamic partition hashjoin
> -
>
> Key: HIVE-27658
> URL: https://issues.apache.org/jira/browse/HIVE-27658
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.3, 4.0.0-beta-1
>Reporter: xiaojunxiang
>Priority: Major
>
> Occurs in some LEFT SEMI JOIN cases in CBO dynamic-hash-join-optimized 
> queries. The HiveServer2 log is as follows:
> 2023-08-31T10:22:21,738 WARN  [HiveServer2-Handler-Pool: Thread-100]: 
> thrift.ThriftCLIService (()) - Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Error resolving join keys
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) 
> ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
>  ~[hive-exec-100.jar:?]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-100.jar:?]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-100.jar:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_312]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_312]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Error resolving 
> join keys
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.getMapJoinDesc(MapJoinProcessor.java:1105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertJoinOpMapJoinOp(MapJoinProcessor.java:372)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinMapJoin(ConvertJoinMapJoin.java:1056)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin(ConvertJoinMapJoin.java:1280)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.fallbackToReduceSideJoin(ConvertJoinMapJoin.java:1312)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.checkAndConvertSMBJoin(ConvertJoinMapJoin.java:371)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.process(ConvertJoinMapJoin.java:151)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74) 

[jira] [Commented] (HIVE-27903) TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work

2023-11-27 Thread JK Pasimuthu (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790151#comment-17790151
 ] 

JK Pasimuthu commented on HIVE-27903:
-

So if I specify RETAIN LAST 10, will max-snapshot-age come into play, so that 
the actual number retained is based on that property and not 10?

> TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work
> 
>
> Key: HIVE-27903
> URL: https://issues.apache.org/jira/browse/HIVE-27903
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: JK Pasimuthu
>Priority: Major
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); -- the elapsed 
> time in ms between the second insert and the current time
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT(*) FROM default.test5d78b6.snapshots;
> Output: 1. It should be 2 rows. The default 1 is retained and all other 
> snapshots are expired as usual, so setting the property has no effect.
> Additional info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> While writing and running the tests, expiring snapshots worked fine within a 
> few seconds of the snapshots being created, so I'm assuming this option has 
> no effect right now. Having said that, I'm thinking about the implications a 
> fix would have on end users.
> The end user may not know about this option at all and will have a tough time 
> figuring out why the snapshots are not getting expired. One option could be 
> to set the default to 0 ms.





[jira] [Updated] (HIVE-27658) Error resolving join keys during conversion to dynamic partition hashjoin

2023-11-27 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27658:
---
Description: 
In certain cases the compilation of queries fails during the conversion to a 
dynamic partition hash join, with a stacktrace similar to the one shown below.

{noformat}
2023-08-31T10:22:21,738 WARN  [HiveServer2-Handler-Pool: Thread-100]: 
thrift.ThriftCLIService (()) - Error executing statement: 
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: 
FAILED: SemanticException Error resolving join keys
at 
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
 ~[hive-service-100.jar:?]
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
 ~[hive-service-100.jar:?]
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
 ~[hive-service-100.jar:?]
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) 
~[hive-service-100.jar:?]
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
 ~[hive-service-100.jar:?]
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
 ~[hive-service-100.jar:?]
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
 ~[hive-service-100.jar:?]
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
 ~[hive-service-100.jar:?]
at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
 ~[hive-exec-100.jar:?]
at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
 ~[hive-exec-100.jar:?]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
~[hive-exec-100.jar:?]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
 ~[hive-exec-100.jar:?]
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
 ~[hive-exec-100.jar:?]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_312]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_312]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Error resolving 
join keys
at 
org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.getMapJoinDesc(MapJoinProcessor.java:1105)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertJoinOpMapJoinOp(MapJoinProcessor.java:372)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinMapJoin(ConvertJoinMapJoin.java:1056)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin(ConvertJoinMapJoin.java:1280)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.fallbackToReduceSideJoin(ConvertJoinMapJoin.java:1312)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.checkAndConvertSMBJoin(ConvertJoinMapJoin.java:371)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.process(ConvertJoinMapJoin.java:151)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74) 
~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:447)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:160)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:144) 
~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12320)
 ~[hive-exec-100.jar:?]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInter

[jira] [Updated] (HIVE-27658) Error resolving join keys during conversion to dynamic partition hashjoin

2023-11-27 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-27658:
---
Attachment: hive27658.q

> Error resolving join keys during conversion to dynamic partition hashjoin
> -
>
> Key: HIVE-27658
> URL: https://issues.apache.org/jira/browse/HIVE-27658
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 3.1.3, 4.0.0-beta-1
>Reporter: xiaojunxiang
>Priority: Major
> Attachments: hive27658.q
>
>
> In certain cases the compilation of queries fails during the conversion to a 
> dynamic partition hash join, with a stacktrace similar to the one shown 
> below.
> {noformat}
> 2023-08-31T10:22:21,738 WARN  [HiveServer2-Handler-Pool: Thread-100]: 
> thrift.ThriftCLIService (()) - Error executing statement: 
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: SemanticException Error resolving join keys
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:260)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:247) 
> ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
>  ~[hive-service-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
>  ~[hive-exec-100.jar:?]
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-100.jar:?]
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-100.jar:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_312]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_312]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Error resolving 
> join keys
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.getMapJoinDesc(MapJoinProcessor.java:1105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertJoinOpMapJoinOp(MapJoinProcessor.java:372)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinMapJoin(ConvertJoinMapJoin.java:1056)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin(ConvertJoinMapJoin.java:1280)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.fallbackToReduceSideJoin(ConvertJoinMapJoin.java:1312)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.checkAndConvertSMBJoin(ConvertJoinMapJoin.java:371)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.process(ConvertJoinMapJoin.java:151)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>  ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74) 
> ~[hive-exec-100.jar:?]
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalke

[jira] [Commented] (HIVE-27658) Error resolving join keys during conversion to dynamic partition hashjoin

2023-11-27 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790159#comment-17790159
 ] 

Stamatis Zampetakis commented on HIVE-27658:


The LEFT SEMI JOIN scenario described in this ticket's description compiles 
successfully in [current 
master|https://github.com/apache/hive/commit/9b4ea7affa4902fc2849f1a88b68103940fc9866].
 However, the same exception can now be reproduced on master using the scenario 
attached in [^hive27658.q], using only INNER JOINs and fine-tuning the table 
statistics. The reproduction command and the stacktrace on current master are shown below:

{noformat}
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=hive27658.q 
-Dtest.output.overwrite
{noformat}

{noformat}
org.apache.hadoop.hive.ql.parse.SemanticException: Error resolving join keys
at 
org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.getMapJoinDesc(MapJoinProcessor.java:1309)
at 
org.apache.hadoop.hive.ql.optimizer.MapJoinProcessor.convertJoinOpMapJoinOp(MapJoinProcessor.java:556)
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinMapJoin(ConvertJoinMapJoin.java:1306)
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.convertJoinDynamicPartitionedHashJoin(ConvertJoinMapJoin.java:1540)
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.fallbackToReduceSideJoin(ConvertJoinMapJoin.java:1573)
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.checkAndConvertSMBJoin(ConvertJoinMapJoin.java:494)
at 
org.apache.hadoop.hive.ql.optimizer.ConvertJoinMapJoin.process(ConvertJoinMapJoin.java:161)
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at 
org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsDependentOptimizations(TezCompiler.java:496)
at 
org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:229)
at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:181)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:13053)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13271)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:180)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:107)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
at 
org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:733)
at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:703)
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:115)
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157)
at 
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
{noformat}

> Error resolving join keys during conversion to dynamic partition hashjoin
> -
>
> Key: HIVE-27658
> URL: https://issues.apache.org/jira/browse/HIVE-27658
> Project: Hive
>  Issue Type: Bug
>  Components: Que

[jira] [Commented] (HIVE-27903) TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work

2023-11-27 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790160#comment-17790160
 ] 

Ayush Saxena commented on HIVE-27903:
-

Both will work. Say you have 20 snapshots and you say retain 10: if only 8 
snapshots are older than the time in the TBLPROPERTIES, it will wipe out just 
those 8 and you will be left with 12; it won't wipe the remaining 2, because of 
the table property.
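
A hedged sketch of the two expiration forms discussed in this thread, next to 
the table property in question (the timestamp form appears in the repro steps 
above; the RETAIN LAST form is the one referenced earlier in the thread, so 
treat its exact syntax as an assumption):
{code:sql}
-- the table property whose effect is under discussion (example value: 1 day)
ALTER TABLE test5d78b6 SET TBLPROPERTIES ('history.expire.max-snapshot-age-ms'='86400000');
-- time-based form: expire snapshots older than the given timestamp
ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
-- count-based form: per the comment above, with 20 snapshots and only 8 older
-- than the configured age, RETAIN LAST 10 expires just those 8, leaving 12
ALTER TABLE test5d78b6 EXECUTE expire_snapshots RETAIN LAST 10;
{code}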

> TBLPROPERTIES('history.expire.max-snapshot-age-ms') doesn't work
> 
>
> Key: HIVE-27903
> URL: https://issues.apache.org/jira/browse/HIVE-27903
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0-alpha-2
>Reporter: JK Pasimuthu
>Priority: Major
>
> [https://github.com/apache/iceberg/issues/9123]
> The 'history.expire.max-snapshot-age-ms' option doesn't have any effect while 
> expiring snapshots.
>  #  
> CREATE TABLE IF NOT EXISTS test5d78b6 (
> id INT, random1 STRING
> )
> PARTITIONED BY (random2 STRING)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'write.format.default'='orc',
> 'format-version'='2',
> 'write.orc.compression-codec'='none'
> )
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SLEEP for 30 seconds
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # INSERT INTO test5d78b6 SELECT if(isnull(MAX(id)) ,0 , MAX(id) ) +1, 
> uuid(), uuid() FROM test5d78b6
>  # SELECT (UNIX_TIMESTAMP(CURRENT_TIMESTAMP) - UNIX_TIMESTAMP('2023-10-09 
> 13:23:54.455')) * 1000;
>  # ALTER TABLE test5d78b6 SET 
> tblproperties('history.expire.max-snapshot-age-ms'='54000'); -- the elapsed 
> time in ms between the second insert and the current time
>  # ALTER TABLE test5d78b6 EXECUTE expire_snapshots('2200-10-10');
>  # SELECT COUNT(*) FROM default.test5d78b6.snapshots;
> Output: 1. It should be 2 rows. The default 1 is retained and all other 
> snapshots are expired as usual, so setting the property has no effect.
> Additional info: the default value for 'history.expire.max-snapshot-age-ms' 
> is 5 days per this link: 
> [https://iceberg.apache.org/docs/1.3.1/configuration/]
> While writing and running the tests, expiring snapshots worked fine within a 
> few seconds of the snapshots being created, so I'm assuming this option has 
> no effect right now. Having said that, I'm thinking about the implications a 
> fix would have on end users.
> The end user may not know about this option at all and will have a tough time 
> figuring out why the snapshots are not getting expired. One option could be 
> to set the default to 0 ms.





[jira] [Commented] (HIVE-22447) Update HBase Version to GA

2023-11-27 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790237#comment-17790237
 ] 

Istvan Toth commented on HIVE-22447:


This is already solved by HIVE-24473, [~belugabehr]. Can we close this one?

> Update HBase Version to GA
> --
>
> Key: HIVE-22447
> URL: https://issues.apache.org/jira/browse/HIVE-22447
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-22447.1.patch, HIVE-22447.2.patch
>
>
> Currently at:
> {code:none}
> 2.0.0-alpha4
> {code}
> Upgrade to a GA release (2.2.2)





[jira] [Commented] (HIVE-22447) Update HBase Version to GA

2023-11-27 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790336#comment-17790336
 ] 

David Mollitor commented on HIVE-22447:
---

[~stoty] Sounds good. Thanks.

> Update HBase Version to GA
> --
>
> Key: HIVE-22447
> URL: https://issues.apache.org/jira/browse/HIVE-22447
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Attachments: HIVE-22447.1.patch, HIVE-22447.2.patch
>
>
> Currently at:
> {code:none}
> 2.0.0-alpha4
> {code}
> Upgrade to a GA release (2.2.2)





[jira] [Updated] (HIVE-22447) Update HBase Version to GA

2023-11-27 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor updated HIVE-22447:
--
Fix Version/s: 4.0.0-alpha-1
   Resolution: Duplicate
   Status: Resolved  (was: Patch Available)

> Update HBase Version to GA
> --
>
> Key: HIVE-22447
> URL: https://issues.apache.org/jira/browse/HIVE-22447
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0-alpha-1
>
> Attachments: HIVE-22447.1.patch, HIVE-22447.2.patch
>
>
> Currently at:
> {code:none}
> 2.0.0-alpha4
> {code}
> Upgrade to a GA release (2.2.2)





[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790350#comment-17790350
 ] 

yongzhi.shao edited comment on HIVE-27898 at 11/28/23 3:19 AM:
---

Hi [~zhangbutao],

I reproduced the problem as follows:

 
{code:java}
spark3.4.1+iceberg 1.4.2
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b'); 


--hive
 CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;   -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all 
rows are null


{code}
 

 


was (Author: lisoda):
Hi [~zhangbutao],

I reproduced the problem as follows:

 
{code:java}
spark3.4.1+iceberg 1.4.2
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

STORED AS iceberg;insert into datacenter.test.test_data_02(id,name) 
values('1','a'),('2','b'); 


--hive
 CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;   -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all 
rows are null


{code}
 

 

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an ICEBERG
> table in a subquery, we cannot get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRS

[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790350#comment-17790350
 ] 

yongzhi.shao commented on HIVE-27898:
-

Hi [~zhangbutao],

I reproduced the problem as follows:

 
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

STORED AS iceberg;insert into datacenter.test.test_data_02(id,name) 
values('1','a'),('2','b'); 


--hive
 CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;   -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null


{code}
 

 

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an ICEBERG
> table in a subquery, we cannot get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
>

[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790350#comment-17790350
 ] 

yongzhi.shao edited comment on HIVE-27898 at 11/28/23 3:20 AM:
---

Hi [~zhangbutao],

I reproduced the problem as follows:

It seems that HIVE cannot correctly parse ICEBERG partitioned tables in subqueries.

 
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b'); 


--hive
 CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;   -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null


{code}
 

 


was (Author: lisoda):
Hi [~zhangbutao],

I reproduced the problem as follows:

 
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b'); 


--hive
 CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;   -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null


{code}
 

 

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an ICEBERG
> table in a subquery, we cannot get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created A

[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790350#comment-17790350
 ] 

yongzhi.shao edited comment on HIVE-27898 at 11/28/23 3:20 AM:
---

Hi [~zhangbutao],

I reproduced the problem as follows:

It seems that HIVE cannot correctly parse ICEBERG partitioned tables in subqueries.

 
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b'); 


--hive
 CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;   -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null


{code}
 

 


was (Author: lisoda):
Hi [~zhangbutao],

I reproduced the problem as follows:

It seems that HIVE cannot correctly parse ICEBERG partitioned tables in subqueries.

 
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b'); 


--hive
 CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;   -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null


{code}
 

 

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an ICEBERG
> table in a subquery, we cannot get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectori

[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790350#comment-17790350
 ] 

yongzhi.shao edited comment on HIVE-27898 at 11/28/23 3:22 AM:
---

Hi [~zhangbutao],

I reproduced the problem as follows:

It seems that HIVE cannot correctly parse ICEBERG partitioned tables in subqueries.

When I create the ICEBERG table without partitions, the query works correctly (a sketch of the unpartitioned variant follows the repro below).

 
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b'); 


--hive
 CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;   -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null


{code}
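For contrast, a minimal sketch of the unpartitioned variant that reportedly works (the table name test_data_03 and its location are hypothetical, mirroring the partitioned repro above):

{code:java}
--spark3.4.1+iceberg 1.4.2: same DDL, but without a PARTITIONED BY clause
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_03 (
id string,name string
)
TBLPROPERTIES ('write.format.default'='orc')
STORED AS iceberg;

insert into datacenter.test.test_data_03(id,name) values('1','a'),('2','b');

--hive: the same location-based external table mapping
CREATE EXTERNAL TABLE iceberg_dwd.test_data_03
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_03'
TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_03 limit 10) s1;   -- expected: 2 rows, no NULLs
{code}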
 

 


was (Author: lisoda):
Hi [~zhangbutao],

I reproduced the problem as follows:

It seems that HIVE cannot correctly parse ICEBERG partitioned tables in subqueries.

 
{code:java}
--spark3.4.1+iceberg 1.4.2
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b'); 


--hive
 CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;   -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null


{code}
 

 

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an ICEBERG
> table in a subquery, we cannot get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '72178194727037

[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790355#comment-17790355
 ] 

yongzhi.shao commented on HIVE-27898:
-

But strangely enough, I cannot reproduce the previously mentioned problem at the moment.

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an ICEBERG
> table in a subquery, we cannot get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread yongzhi.shao (Jira)


[ https://issues.apache.org/jira/browse/HIVE-27898 ]


yongzhi.shao deleted comment on HIVE-27898:
-

was (Author: lisoda):
But strangely enough, I cannot reproduce the previously mentioned problem at the moment.

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an ICEBERG
> table in a subquery, we cannot get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790357#comment-17790357
 ] 

yongzhi.shao commented on HIVE-27898:
-

But strangely enough, I cannot reproduce the previously mentioned problem at the moment.

Previously, the query returned no records at all; now the column values come back as NULL after a nested subquery.

Either way, the results are incorrect.
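Illustratively, the two failure modes described would render like this in beeline (a sketch of the symptom, not captured output; only two rows were ever inserted into test_data_02):

{code:none}
-- earlier behaviour: no records at all
select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;
+-----+
| id  |
+-----+
+-----+

-- current behaviour: rows come back, but every column is NULL
select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;
+-------+
|  id   |
+-------+
| NULL  |
| NULL  |
+-------+
{code}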

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an ICEBERG
> table in a subquery, we cannot get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27906) Iceberg: Implement Delete Orphan Files

2023-11-27 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790372#comment-17790372
 ] 

Ayush Saxena commented on HIVE-27906:
-

Committed to master.

Thanx [~zhangbutao] for the review!!!

> Iceberg: Implement Delete Orphan Files
> --
>
> Key: HIVE-27906
> URL: https://issues.apache.org/jira/browse/HIVE-27906
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Implement delete orphan files for iceberg tables
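For comparison, Iceberg already exposes orphan-file cleanup to Spark as a stored procedure; a sketch of that prior art (the catalog and table names are assumptions, and the new Hive-side syntax itself is not shown in this thread):

{code:java}
-- Iceberg's Spark procedure for the same maintenance task (names are illustrative)
CALL hadoop_prod.system.remove_orphan_files(
  table => 'default.test_data_04',
  older_than => TIMESTAMP '2023-11-20 00:00:00',
  dry_run => true  -- list deletion candidates without removing them
);
{code}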



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27906) Iceberg: Implement Delete Orphan Files

2023-11-27 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27906.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Iceberg: Implement Delete Orphan Files
> --
>
> Key: HIVE-27906
> URL: https://issues.apache.org/jira/browse/HIVE-27906
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Implement delete orphan files for iceberg tables



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2023-11-27 Thread zhangbutao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790407#comment-17790407
 ] 

zhangbutao commented on HIVE-27898:
---

I still cannot reproduce your issue. My environment:

Spark3.5.0

Hadoop3.3.1

Tez 0.10.2

Hive4 master code

Iceberg 1.4.2

 
{code:java}
//Spark side 

/data/spark-3.5.0-bin-hadoop3/bin/spark-sql \
--master local \
--deploy-mode client \
--conf spark.sql.catalog.hadoop_prod=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.hadoop_prod.type=hadoop \
--conf 
spark.sql.catalog.hadoop_prod.warehouse=hdfs://localhost:8028/tmp/testiceberg;


CREATE TABLE IF NOT EXISTS hadoop_prod.default.test_data_04 (
id string,name string
)
using iceberg
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');


insert into hadoop_prod.default.test_data_04(id,name) 
values('1','a'),('2','b');{code}
 

 
{code:java}
// HS2 side

CREATE EXTERNAL TABLE test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://localhost:8028/tmp/testiceberg/default/test_data_04'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

 
select id from (select * from test_data_04 limit 10) s1;
+-----+
| id  |
+-----+
| 1   |
| 2   |
+-----+

select id from (select * from test_data_04) s1;
+-----+
| id  |
+-----+
| 1   |
| 2   |
+-----+
{code}
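When one environment reproduces the bug and another does not, comparing the optimized plans for the two query shapes can help narrow it down; a minimal check (same tables as above; EXPLAIN is standard HiveQL, and this step is a suggestion rather than part of the original report):

{code:java}
-- Compare the plans with and without the LIMIT inside the subquery;
-- in the failing case, the column projection applied after LIMIT is the suspect.
explain select id from (select * from test_data_04) s1;
explain select id from (select * from test_data_04 limit 10) s1;
{code}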
 

 

 

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using the HIVE4-BETA1 version, if we use an ICEBERG
> table in a subquery, we cannot get any data in the end.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceber