[jira] [Created] (HIVE-27952) Hive fails to create SslContextFactory when KeyStore has multiple certificates

2023-12-12 Thread Seonggon Namgung (Jira)
Seonggon Namgung created HIVE-27952:
---

 Summary: Hive fails to create SslContextFactory when KeyStore has 
multiple certificates
 Key: HIVE-27952
 URL: https://issues.apache.org/jira/browse/HIVE-27952
 Project: Hive
  Issue Type: Bug
Reporter: Seonggon Namgung
Assignee: Seonggon Namgung


With Jetty 9.4.40, we should create the factory with new 
SslContextFactory.Server() instead of new SslContextFactory(). Otherwise we get 
the following error when using a KeyStore that contains multiple certificates.
{code:java}
Caused by: java.lang.IllegalStateException: KeyStores with multiple 
certificates are not supported on the base class 
org.eclipse.jetty.util.ssl.SslContextFactory. (Use 
org.eclipse.jetty.util.ssl.SslContextFactory$Server or 
org.eclipse.jetty.util.ssl.SslContextFactory$Client instead) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27950) STACK UDTF returns wrong results when # of argument is not a multiple of N

2023-12-12 Thread okumin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

okumin updated HIVE-27950:
--
Status: Patch Available  (was: Open)

> STACK UDTF returns wrong results when # of argument is not a multiple of N
> --
>
> Key: HIVE-27950
> URL: https://issues.apache.org/jira/browse/HIVE-27950
> Project: Hive
>  Issue Type: Bug
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>  Labels: pull-request-available
>
> GenericUDTFStack nullifies the wrong cell when the number of values is not 
> divisible by the number of rows. In the following case, the `col2` column of 
> the last row should be `NULL`, but `col1` is NULL instead. 
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> select stack(2, 'a', 'b', 'c', 
> 'd', 'e');
> +-------+-------+-------+
> | col0  | col1  | col2  |
> +-------+-------+-------+
> | a     | b     | c     |
> | d     | NULL  | c     |
> +-------+-------+-------+{code}
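
For reference, the expected behaviour can be sketched in plain Java. This is only an illustration of the semantics described above, not the GenericUDTFStack implementation; the names StackSketch and stack are made up for this example. It lays the values out row by row over ceil(k/n) columns and pads the tail of the last row with null:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: models the expected STACK(n, v1..vk) output.
final class StackSketch {

    /** Splits the values into n rows and pads the tail of the last row with null. */
    static List<List<String>> stack(int n, String... values) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        int cols = (values.length + n - 1) / n; // ceil(values.length / n)
        List<List<String>> rows = new ArrayList<>();
        for (int r = 0; r < n; r++) {
            List<String> row = new ArrayList<>();
            for (int c = 0; c < cols; c++) {
                int i = r * cols + c; // position of this cell in the flat argument list
                row.add(i < values.length ? values[i] : null);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // prints [[a, b, c], [d, e, null]]
        System.out.println(stack(2, "a", "b", "c", "d", "e"));
    }
}
```

With five values and two rows, only the last cell of the last row (col2) is padded.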





[jira] [Updated] (HIVE-21213) Acid table bootstrap replication needs to handle directory created by compaction with txn id

2023-12-12 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-21213:

Release Note:   (was: Merged. Thanks.)

> Acid table bootstrap replication needs to handle directory created by 
> compaction with txn id
> 
>
> Key: HIVE-21213
> URL: https://issues.apache.org/jira/browse/HIVE-21213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2, repl
>Reporter: mahesh kumar behera
>Assignee: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21213.01.patch, HIVE-21213.02.patch, 
> HIVE-21213.03.patch, HIVE-21213.04.patch, HIVE-21213.05.patch
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The current implementation of compaction uses the txn id in the directory 
> name. This isolates queries from reading the directory until compaction has 
> finished and avoids the compactor marking used earlier. During bootstrap 
> replication, a directory is copied as is, with the same name, from the source 
> to the destination cluster. But a directory created by compaction with a txn 
> id cannot be copied this way, because the txn list at the target may differ 
> from the source: a txn id that is valid at the source may be an aborted txn 
> at the target. Conversion logic is therefore required to create a new 
> directory with a valid txn at the target and dump the data into it.





[jira] [Resolved] (HIVE-27446) Exception when rebuild materialized view incrementally in presence of delete operations

2023-12-12 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa resolved HIVE-27446.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Merged to master. Thanks [~lvegh] for review.

> Exception when rebuild materialized view incrementally in presence of delete 
> operations
> ---
>
> Key: HIVE-27446
> URL: https://issues.apache.org/jira/browse/HIVE-27446
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_n6 values
>  (1, 'alfred', 10.30, 2),
>  (2, 'bob', 3.14, 3),
>  (2, 'bonnie', 172342.2, 3),
>  (3, 'calvin', 978.76, 3),
>  (3, 'charlie', 9.8, 1);
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into cmv_basetable_2_n3 values
>  (1, 'alfred', 10.30, 2),
>  (3, 'calvin', 978.76, 3);
> CREATE MATERIALIZED VIEW cmv_mat_view_n6
>   TBLPROPERTIES ('transactional'='true') AS
>   SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
>   FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
>   WHERE cmv_basetable_2_n3.c > 10.0;
> DELETE from cmv_basetable_2_n3 WHERE a=1;
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> DELETE FROM cmv_basetable_n6 WHERE a=1;
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> {code}
> The second rebuild fails with the following exception:
> {code}
>  org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Reducer 3, vertexId=vertex_1686925588164_0001_7_06, 
> diagnostics=[Task failed, taskId=task_1686925588164_0001_7_06_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1686925588164_0001_7_06_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:313)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:291)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:387)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:303)
>   ... 17 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableIntObjectInspector.get(WritableIntObjectInspector.java:36)
>   at 
> 

[jira] [Resolved] (HIVE-27803) Bump org.apache.avro:avro from 1.11.1 to 1.11.3

2023-12-12 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-27803.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

> Bump org.apache.avro:avro from 1.11.1 to 1.11.3
> ---
>
> Key: HIVE-27803
> URL: https://issues.apache.org/jira/browse/HIVE-27803
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Priority: Major
> Fix For: 4.0.0
>
>
> PR from *[dependabot|https://github.com/apps/dependabot]*
> https://github.com/apache/hive/pull/4764





[jira] [Commented] (HIVE-27803) Bump org.apache.avro:avro from 1.11.1 to 1.11.3

2023-12-12 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795887#comment-17795887
 ] 

Ayush Saxena commented on HIVE-27803:
-

Resolved. Thanks, Akshat, for the reminder!

> Bump org.apache.avro:avro from 1.11.1 to 1.11.3
> ---
>
> Key: HIVE-27803
> URL: https://issues.apache.org/jira/browse/HIVE-27803
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Priority: Major
> Fix For: 4.0.0
>
>
> PR from *[dependabot|https://github.com/apps/dependabot]*
> https://github.com/apache/hive/pull/4764





[jira] [Updated] (HIVE-27672) Iceberg: Truncate partition support

2023-12-12 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27672:

Issue Type: Improvement  (was: New Feature)

> Iceberg: Truncate partition support
> ---
>
> Key: HIVE-27672
> URL: https://issues.apache.org/jira/browse/HIVE-27672
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support the following truncate operations on a partition level - 
> {code:java}
> TRUNCATE TABLE tableName PARTITION (partCol1 = partValue1, partCol2 = 
> partValue2);{code}
> Truncate is not supported for partition transforms.





[jira] [Updated] (HIVE-27627) Iceberg: Insert into/overwrite partition support

2023-12-12 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27627:

Issue Type: Improvement  (was: New Feature)

> Iceberg: Insert into/overwrite partition support
> 
>
> Key: HIVE-27627
> URL: https://issues.apache.org/jira/browse/HIVE-27627
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support inserting data in the following query types -
> Inserting data via static partition -
> {code:java}
> INSERT INTO|OVERWRITE TABLE tableName PARTITION(pCol = pColValue) VALUES 
> (...);
> INSERT INTO|OVERWRITE TABLE tableName PARTITION(pCol = pColValue) SELECT 
> query;{code}
> Inserting data via dynamic partitioning - 
> {code:java}
> INSERT INTO|OVERWRITE TABLE tableName PARTITION(pCol) VALUES (...); 
> INSERT INTO|OVERWRITE TABLE tableName PARTITION(pCol) SELECT query; {code}
> Inserting data via static and dynamic partitioning with static partitioning 
> coming at the beginning - 
> {code:java}
> INSERT INTO|OVERWRITE TABLE tableName PARTITION(pCol1 = pColValue, pCol2) 
> VALUES (...); 
> INSERT INTO|OVERWRITE TABLE tableName PARTITION(pCol1 = pColValue, pCol2) 
> SELECT query;{code}





[jira] [Updated] (HIVE-27324) Hive query with NOT IN condition is giving incorrect results when the sub query table contains the null value.

2023-12-12 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-27324:

Flags:   (was: Important)

> Hive query with NOT IN condition is giving incorrect results when the sub 
> query table contains the null value.
> --
>
> Key: HIVE-27324
> URL: https://issues.apache.org/jira/browse/HIVE-27324
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 3.1.3
>Reporter: Shobika Selvaraj
>Assignee: Diksha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: sql_queries.png
>
>
> A Hive query gives empty results when the subquery table contains a null 
> value. 
> We encountered two issues here. 
> 1) The query - "select * from t3 where age not in (select distinct(age) age 
> from t1);" gives empty results when the table t1 contains a null value. 
> Disabling CBO didn't help here.
> 2) Consider the case where table t3 has a null value and table t1 doesn't 
> have any null values. If we now run the above query, it returns the other 
> rows but not the row with the null value from t3. If we disable CBO, the 
> result includes the row with the null value. 
>  
> *REPRO STEPS WITH DETAILED EXPLANATION AS BELOW:*
> *FIRST ISSUE:*
> Create two tables and insert data with null values as below:
> ---
> create table t3 (id int,name string, age int);
>  
> insert into t3 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23),(9,'ron',3),(10,'Sam',22),(11,'nick',19),(12,'fed',18),(13,'kong',13),(14,'hela',45);
>  
> create table t1 (id int,name string, age int);
> insert into t1 
> values(1,'Sagar',23),(2,'Sultan',NULL),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> ---
>  
> Then executed the below query: 
> --
> select * from t3
> where age not in (select distinct(age) age from t1);
> --
>  
> The result should be as below:
> {code:java}
> +--------+----------+---------+
> | t3.id  | t3.name  | t3.age  |
> +--------+----------+---------+
> | 9      | ron      | 3       |
> | 10     | Sam      | 22      |
> | 11     | nick     | 19      |
> | 12     | fed      | 18      |
> | 13     | kong     | 13      |
> +--------+----------+---------+
> 5 rows selected (35.897 seconds) {code}
> But when we run the above query, it returns zero records:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t1);
> INFO  : Compiling 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96): 
> select * from t3
> where age not in (select distinct(age) age from t1)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> .
> INFO  : Completed executing 
> command(queryId=hive_20230427164202_e25b671a-f3bd-41e4-b364-844466305d96); 
> Time taken: 10.191 seconds
> INFO  : OK
> +--------+----------+---------+
> | t3.id  | t3.name  | t3.age  |
> +--------+----------+---------+
> +--------+----------+---------+
> No rows selected (12.17 seconds) {code}
> The query works fine when we use the nvl function or an IS NOT NULL condition. 
>  
> So, as a workaround, we can use the nvl function in both the main query and 
> the subquery, as below:
> {code:java}
> select * from t3 where nvl(age,'-') not in (select distinct(nvl(age,'-')) age 
> from t1); {code}
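
For comparison, standard SQL evaluates NOT IN with three-valued logic. The following Java sketch models that rule only; it is not Hive code, the names NotInSketch and notIn are hypothetical, and a Java null stands in both for a NULL column value and for the SQL result UNKNOWN. A WHERE clause keeps a row only when the predicate is TRUE:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of standard SQL "value NOT IN (set)"; null means NULL / UNKNOWN.
final class NotInSketch {

    /** Returns TRUE, FALSE, or null (UNKNOWN) for "value NOT IN (set)". */
    static Boolean notIn(Integer value, List<Integer> set) {
        if (set.isEmpty()) {
            return Boolean.TRUE;          // NOT IN over an empty set is always TRUE
        }
        if (value == null) {
            return null;                  // NULL compared with anything is UNKNOWN
        }
        boolean sawNull = false;
        for (Integer candidate : set) {
            if (candidate == null) {
                sawNull = true;           // this comparison is UNKNOWN, keep looking
            } else if (candidate.equals(value)) {
                return Boolean.FALSE;     // a definite match decides the predicate
            }
        }
        return sawNull ? null : Boolean.TRUE;
    }

    public static void main(String[] args) {
        List<Integer> withNull = Arrays.asList(23, null, 45, 5);
        List<Integer> withoutNull = Arrays.asList(23, 45, 5);
        System.out.println(notIn(3, withNull));       // prints null
        System.out.println(notIn(3, withoutNull));    // prints true
        System.out.println(notIn(null, withoutNull)); // prints null
    }
}
```

Under this rule, replacing NULL with a sentinel through nvl on both sides, as in the workaround above, turns every comparison into a definite TRUE or FALSE.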
>  
> *SECOND ISSUE:* 
> While testing multiple scenarios, I found one more issue.
> When the subquery table (t1) doesn't contain any null values, the query 
> returns a result but ignores the null values of the main table (t3).
>  
> For example: I created another table t4 and inserted data without any 
> null values:
> create table t4 (id int,name string, age int);
> insert into t4 
> values(1,'Sagar',23),(3,'Surya',23),(4,'Raman',45),(5,'Scott',23),(6,'Ramya',5),(7,'',23),(8,'',23);
> I then tested with the query below, and it returns 5 records. The count 
> should be six; the row with the null value in table t3 was omitted:
> {code:java}
> 0: jdbc:hive2://zk0-shobia.lhlexkfu3vfebcezzj> select * from t3
> . . . . . . . . . . . . . . . . . . . . . . .> where age not in (select 
> distinct(age) age from t4);
> INFO  : Compiling 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c): 
> select * from t3
> where age not in (select distinct(age) age from t4)
> INFO  : Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross 
> product
> ..
> ..
> INFO  : Completed executing 
> command(queryId=hive_20230427164745_f20f47ce-614d-493d-8910-99a118de089c); 
> Time taken: 17.724 

[jira] [Updated] (HIVE-27800) Metastore: Make database installation tests running on ARM chipset

2023-12-12 Thread Akshat Mathur (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshat Mathur updated HIVE-27800:
-
Labels: Arm64  (was: )

> Metastore: Make database installation tests running on ARM chipset
> --
>
> Key: HIVE-27800
> URL: https://issues.apache.org/jira/browse/HIVE-27800
> Project: Hive
>  Issue Type: Task
>  Components: Standalone Metastore
>Reporter: Zsolt Miskolczi
>Priority: Major
>  Labels: Arm64
>
> These tests run Docker containers for the linux/x86_64 platform, and they 
> cannot start a Docker image at all on an ARM-based processor. 





[jira] [Commented] (HIVE-27803) Bump org.apache.avro:avro from 1.11.1 to 1.11.3

2023-12-12 Thread Akshat Mathur (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795853#comment-17795853
 ] 

Akshat Mathur commented on HIVE-27803:
--

[~ayushtkn] as the PR is merged along with the qtests 
[https://github.com/apache/hive/pull/4918]

should we resolve this ticket?

> Bump org.apache.avro:avro from 1.11.1 to 1.11.3
> ---
>
> Key: HIVE-27803
> URL: https://issues.apache.org/jira/browse/HIVE-27803
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Priority: Major
>
> PR from *[dependabot|https://github.com/apps/dependabot]*
> https://github.com/apache/hive/pull/4764





[jira] [Updated] (HIVE-27937) Clarifying comments and xml configs around tez container size

2023-12-12 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27937:

Summary: Clarifying comments and xml configs around tez container size  
(was: Clarifying comments around tez container size)

> Clarifying comments and xml configs around tez container size
> -
>
> Key: HIVE-27937
> URL: https://issues.apache.org/jira/browse/HIVE-27937
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> the comment in HiveConf about hive.tez.container.size is useless, let's 
> improve it:
> https://github.com/apache/hive/blob/1e4f488394d19ea51766e0633a605e078d8558c3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2463





[jira] [Updated] (HIVE-27937) Clarifying comments around tez container size

2023-12-12 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27937:

Fix Version/s: 4.0.0

> Clarifying comments around tez container size
> -
>
> Key: HIVE-27937
> URL: https://issues.apache.org/jira/browse/HIVE-27937
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> the comment in HiveConf about hive.tez.container.size is useless, let's 
> improve it:
> https://github.com/apache/hive/blob/1e4f488394d19ea51766e0633a605e078d8558c3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2463





[jira] [Updated] (HIVE-27937) Clarifying comments around tez container size

2023-12-12 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-27937:

Description: 
the comment in HiveConf about hive.tez.container.size is useless, let's improve 
it:
https://github.com/apache/hive/blob/1e4f488394d19ea51766e0633a605e078d8558c3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2463

  was:
the comment in HiveConf about hive.tez.container.size is totally useless:
https://github.com/apache/hive/blob/1e4f488394d19ea51766e0633a605e078d8558c3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2463


> Clarifying comments around tez container size
> -
>
> Key: HIVE-27937
> URL: https://issues.apache.org/jira/browse/HIVE-27937
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>
> the comment in HiveConf about hive.tez.container.size is useless, let's 
> improve it:
> https://github.com/apache/hive/blob/1e4f488394d19ea51766e0633a605e078d8558c3/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L2463





[jira] [Commented] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-12-12 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795740#comment-17795740
 ] 

Krisztian Kasa commented on HIVE-27924:
---

A draft patch has been created to address the issue when the MV definition has 
an aggregate. I'm working on the part that handles the non-aggregate case.

> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: bug, hive, hive-4.0.0-must, known_issue, 
> materializedviews, pull-request-available
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
>
> h1. Summary
> The incremental rebuild plan and execution output are incorrect when one side 
> of the table join has inserted/deleted join keys that the other side has 
> deleted/inserted (note the order).
> The argument is that tuples that have never been present simultaneously 
> should not interact with one another, i.e., one's inserts should not join the 
> other's deletes.
> h1. Related Test Case
> The bug was discovered during replication of the test case:
> ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
> h1. Steps to Reproduce the Issue
>  # Configurations:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;{code}
>  # 
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2); {code}
>  # 
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);{code}
>  # 
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES 
> ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;{code}
>  # 
> {code:sql}
> show tables; {code}
> !截图.PNG!
>  # Select tuples, including deleted ones and with virtual columns, from the MV 
> and source tables. We see that the MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图1.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图2.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图3.PNG!
>  # Now make an insert to the LHS and a delete to the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
>  # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图4.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图5.PNG!
>  # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, 
> which is incorrect already:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
> !截图6.PNG!
>  # Rebuild MV and see (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图7.PNG!
>  # Run MV definition directly, which outputs incorrect results because the MV 
> is enabled for MV-based query rewrite, i.e., the following query will output 
> what's in the MV for the time being:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0; 

[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2023-12-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27924:
--
Labels: bug hive hive-4.0.0-must known_issue materializedviews 
pull-request-available  (was: bug hive hive-4.0.0-must known_issue 
materializedviews)

> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: bug, hive, hive-4.0.0-must, known_issue, 
> materializedviews, pull-request-available
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
>
> h1. Summary
> The incremental rebuild plan and execution output are incorrect when one side 
> of the table join has inserted/deleted join keys that the other side has 
> deleted/inserted (note the order).
> The argument is that tuples that have never been present simultaneously 
> should not interact with one another, i.e., one's inserts should not join the 
> other's deletes.
> h1. Related Test Case
> The bug was discovered during replication of the test case:
> ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
> h1. Steps to Reproduce the Issue
>  # Configurations:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;{code}
>  # 
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2); {code}
>  # 
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);{code}
>  # 
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES 
> ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;{code}
>  # 
> {code:sql}
> show tables; {code}
> !截图.PNG!
>  # Select tuples, including deleted ones and with virtual columns, from the MV 
> and source tables. We see that the MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图1.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图2.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图3.PNG!
>  # Now make an insert to the LHS and a delete to the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
>  # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图4.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图5.PNG!
>  # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV, 
> which is incorrect already:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
> !截图6.PNG!
>  # Rebuild MV and see (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图7.PNG!
>  # Run MV definition directly, which outputs incorrect results because the MV 
> is enabled for MV-based query rewrite, i.e., the following query will output 
> what's in the MV for the time being:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0; {code}
> !截图8.PNG!
>  # Disable 

[jira] [Updated] (HIVE-24730) Shims classes override values from hive-site.xml and tez-site.xml silently

2023-12-12 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-24730:

Fix Version/s: 4.0.0

> Shims classes override values from hive-site.xml and tez-site.xml silently
> --
>
> Key: HIVE-24730
> URL: https://issues.apache.org/jira/browse/HIVE-24730
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Since HIVE-14887, 
> [Hadoop23Shims|https://github.com/apache/hive/blob/master/shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java]
>  silently overrides e.g. hive.tez.container.size which is defined in 
> data/conf/hive/llap/hive-site.xml. This way, the developer will have no idea 
> about what happened after setting those values in the xml.
> My proposal: 
> 1. don't set those values, unless they contain the default value (e.g.: -1 
> for hive.tez.container.size)
> 2. put an INFO level log message about the override
> OR:
> put a comment in hive-site.xml and tez-site.xml files that shims override it 
> while creating a tez mini cluster
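
Points 1 and 2 of the proposal could look roughly like the sketch below. It is an illustration under assumptions, not the Hadoop23Shims code: a plain Map stands in for the Hadoop Configuration object, java.util.logging stands in for Hive's logger, and ShimOverrideSketch, overrideIfDefault and the value 128 are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical sketch: override a setting only if the user left it at its default.
final class ShimOverrideSketch {
    private static final Logger LOG =
        Logger.getLogger(ShimOverrideSketch.class.getName());

    /**
     * Applies shimValue only when the key is unset or still holds defaultValue.
     * Logs the decision at INFO level and returns true if the override was applied.
     */
    static boolean overrideIfDefault(Map<String, String> conf, String key,
                                     String defaultValue, String shimValue) {
        String current = conf.get(key);
        if (current != null && !current.equals(defaultValue)) {
            LOG.info("Keeping user-defined " + key + "=" + current
                + "; not overriding with " + shimValue);
            return false;
        }
        conf.put(key, shimValue);
        LOG.info("Overriding " + key + " (was " + current + ") with " + shimValue);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.tez.container.size", "-1"); // default value named in the ticket
        overrideIfDefault(conf, "hive.tez.container.size", "-1", "128");
        System.out.println(conf.get("hive.tez.container.size")); // prints 128
    }
}
```

Either branch leaves a log line, so a developer who set the value in the xml can see why it was or was not kept.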





[jira] [Resolved] (HIVE-24730) Shims classes override values from hive-site.xml and tez-site.xml silently

2023-12-12 Thread László Bodor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor resolved HIVE-24730.
-
Resolution: Fixed

> Shims classes override values from hive-site.xml and tez-site.xml silently
> --
>
> Key: HIVE-24730
> URL: https://issues.apache.org/jira/browse/HIVE-24730
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Since HIVE-14887, 
> [Hadoop23Shims|https://github.com/apache/hive/blob/master/shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java]
>  silently overrides e.g. hive.tez.container.size which is defined in 
> data/conf/hive/llap/hive-site.xml. This way, the developer will have no idea 
> about what happened after setting those values in the xml.
> My proposal: 
> 1. don't set those values, unless they contain the default value (e.g.: -1 
> for hive.tez.container.size)
> 2. put an INFO level log message about the override
> OR:
> put a comment in hive-site.xml and tez-site.xml files that shims override it 
> while creating a tez mini cluster


