[jira] [Created] (HIVE-27228) Add missing upgrade SQL statements after CQ_NUMBER_OF_BUCKETS column being introduced in HIVE-26719

2023-04-06 Thread Sourabh Badhya (Jira)
Sourabh Badhya created HIVE-27228:
-

 Summary: Add missing upgrade SQL statements after 
CQ_NUMBER_OF_BUCKETS column being introduced in HIVE-26719
 Key: HIVE-27228
 URL: https://issues.apache.org/jira/browse/HIVE-27228
 Project: Hive
  Issue Type: Bug
Reporter: Sourabh Badhya
Assignee: Sourabh Badhya


HIVE-26719 introduced the CQ_NUMBER_OF_BUCKETS column in the COMPACTION_QUEUE 
and COMPLETED_COMPACTIONS tables. However, the corresponding upgrade SQL 
statements are missing for these columns. Additionally, CQ_NUMBER_OF_BUCKETS is 
not reflected in the COMPACTIONS view in the information schema.
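
A minimal sketch of the kind of upgrade statements that are missing (illustrative only: the exact syntax and column types vary per backing RDBMS, and the COMPLETED_COMPACTIONS column may follow that table's own prefix convention):
{code:sql}
-- Hypothetical shape of the missing upgrade statements; real upgrade
-- scripts are maintained per-RDBMS (Derby, MySQL, Postgres, Oracle, MSSQL).
ALTER TABLE COMPACTION_QUEUE ADD CQ_NUMBER_OF_BUCKETS INTEGER;
ALTER TABLE COMPLETED_COMPACTIONS ADD CC_NUMBER_OF_BUCKETS INTEGER;
{code}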



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27227) Provide config to re-enable partitions discovery on external tables

2023-04-06 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-27227:
-

 Summary: Provide config to re-enable partitions discovery on 
external tables
 Key: HIVE-27227
 URL: https://issues.apache.org/jira/browse/HIVE-27227
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Taraka Rama Rao Lethavadla


HIVE-25039 disabled the discovery.partitions config for external tables by 
default. Now, if someone wants to turn the feature on (knowing the risk), they 
have to set the config to true for every newly created table.

Another use case: if a user wants to enable this feature (knowing the risk) for 
all existing tables, they have to execute an ALTER TABLE command for every 
table, which is very difficult if they have a lot of tables.
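
For reference, the per-table workaround today looks like this (standard Hive DDL; the table name is illustrative):
{code:sql}
-- Re-enable partition discovery explicitly, one table at a time
ALTER TABLE my_ext_table SET TBLPROPERTIES ('discovery.partitions' = 'true');
{code}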





[jira] [Created] (HIVE-27226) FullOuterJoin with filter expressions is not computed correctly

2023-04-06 Thread Seonggon Namgung (Jira)
Seonggon Namgung created HIVE-27226:
---

 Summary: FullOuterJoin with filter expressions is not computed 
correctly
 Key: HIVE-27226
 URL: https://issues.apache.org/jira/browse/HIVE-27226
 Project: Hive
  Issue Type: Bug
Reporter: Seonggon Namgung


I tested many OuterJoin queries as an extension of HIVE-27138, and I found that 
Hive returns incorrect results for a query containing FullOuterJoin with filter 
expressions. In a nutshell, all JoinOperators that run on the Tez engine return 
incorrect results for OuterJoin queries, and one of the reasons for the 
incorrect computation comes from CommonJoinOperator, which is the base of all 
JoinOperators. I attached the queries and configuration that I used at the 
bottom of the document. I am still inspecting these problems, and I will share 
an update once I find another cause. Any comments and opinions would be 
appreciated.


First of all, I observed that current Hive ignores filter expressions contained 
in MapJoinOperator. For example, the attached result of query1 shows that 
MapJoinOperator performs an inner join, not a full outer join. This problem 
stems from the removal of filterMap. When converting a JoinOperator to a 
MapJoinOperator, ConvertJoinMapJoin#convertJoinDynamicPartitionedHashJoin() 
removes the filterMap of the MapJoinOperator. Because MapJoinOperator does not 
evaluate filter expressions if filterMap is null, this change makes 
MapJoinOperator ignore filter expressions, and it always joins tables 
regardless of whether they satisfy the filter expressions. To solve this 
problem, I disabled FullOuterMapJoinOptimization and applied the patch for 
HIVE-27138, which prevents an NPE. (The patch is available at the following 
link: LINK.) The rest of this document uses this modified Hive, but most of the 
problems also occur with current Hive.


The second problem I found is that Hive returns the same left-null or 
right-null rows multiple times when it uses MapJoinOperator or 
CommonMergeJoinOperator. This is caused by the logic of the current 
CommonJoinOperator. Both JoinOperators join tables in two steps. First, they 
create RowContainers, each of which groups rows from one table that share the 
same key. Second, they call CommonJoinOperator#checkAndGenObject() with the 
created RowContainers. This method checks the filterTag of each row in the 
RowContainers and forwards a joined row if the rows meet all filter conditions. 
For OuterJoin, checkAndGenObject() forwards non-matching rows if there is no 
matching row in a RowContainer. The problem happens when there are multiple 
RowContainers for the same key and table. For example, suppose there are two 
left RowContainers and one right RowContainer. If none of the rows in the two 
left RowContainers satisfies the filter condition, then checkAndGenObject() 
will forward a Left-Null row for each right row. Because checkAndGenObject() is 
called once per left RowContainer, there will be two duplicated Left-Null rows 
for every right row.


In the case of MapJoinOperator, it always creates a singleton RowContainer for 
the big table. Therefore, it always produces duplicated non-matching rows. 
CommonMergeJoinOperator also creates multiple RowContainers for the big table, 
whose size is hive.join.emit.interval. In the experiment below, I also set 
hive.join.shortcut.unmatched.rows=false and hive.exec.reducers.max=1 to disable 
the specialized algorithm for an OuterJoin of two tables and to force 
checkAndGenObject() to be called before all rows with the same key are 
gathered. I did not observe this problem when using VectorMapJoinOperator, and 
I will inspect whether the problem can be reproduced with it.


I think the second problem is not limited to FullOuterJoin, but I have not 
found such a query as of now. I will add one to this issue if I can write a 
query that reproduces the second problem without FullOuterJoin.


I also found that Hive returns a wrong result for query2 even when I used 
VectorMapJoinOperator. I am still inspecting this problem and will add an 
update once I find the reason.

 

Experiment:

 
{code:java}
 Configuration
set hive.optimize.shared.work=false;

-- Std MapJoin
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=false;

-- Vec MapJoin
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=true;

-- MergeJoin
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled=false;
set hive.join.shortcut.unmatched.rows=false;
set hive.join.emit.interval=1;
set hive.exec.reducers.max=1;
 
 Queries
-- Query 1
DROP TABLE IF EXISTS a;
CREATE TABLE a (key string, value string);
INSERT INTO a VALUES (1, 1), (1, 2), (2, 1);
SELECT * FROM a FULL OUTER JOIN a b ON a.key = b.key AND a.key < 0;

-- Query 2
DROP TABLE IF EXISTS b;
CREATE TABLE b (key string, value string);
INSERT INTO b VALUES (1, 0), (1, 1);
SELECT * FROM b FULL OUTE

[jira] [Created] (HIVE-27225) Speedup build by skipping SBOM generation by default

2023-04-06 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27225:
--

 Summary: Speedup build by skipping SBOM generation by default
 Key: HIVE-27225
 URL: https://issues.apache.org/jira/browse/HIVE-27225
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


A full build of Hive locally in my environment takes ~15 minutes.
{noformat}
mvn clean install -DskipTests -Pitests
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  14:15 min
{noformat}

Profiling the build shows that roughly 30% of CPU time is spent in the 
org.cyclonedx.maven plugin, which is used to generate SBOM artifacts 
(HIVE-26912).

SBOM generation does not need to run in every single build and probably only 
needs to be active during the release build. To speed up everyday builds, I 
propose activating the cyclonedx plugin only in the dist (release) profile.

After this change, the default build drops from 14 minutes to 8.
{noformat}
mvn clean install -DskipTests -Pitests
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  08:19 min
{noformat}





[jira] [Created] (HIVE-27224) Enhance drop table/partition command

2023-04-06 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-27224:
-

 Summary: Enhance drop table/partition command
 Key: HIVE-27224
 URL: https://issues.apache.org/jira/browse/HIVE-27224
 Project: Hive
  Issue Type: Improvement
  Components: Hive, Standalone Metastore
Reporter: Taraka Rama Rao Lethavadla


*Problem Statement:*

If a table has a large number of partitions, the drop table command takes a 
long time to finish. To improve the command, we have the following proposals:
 * Perform all the queries (HMS -> DB) in drop table in batches (not just on 
the partitions table) so that queries will not fail with exceptions like 
"transaction id not found" or other timeout issues, since this is directly 
proportional to backend database performance.
 * Display what action is happening as part of drop table, so that the user 
knows which step is taking more time and how many steps have completed so far. 
We should have loggers (DEBUG at least) in clients reporting how many 
partitions/batches are being processed and the current iteration, to estimate 
an approximate timeout for such a large HMS operation.
 * Support a retry option: if the drop table command fails partway through, the 
next run should proceed with the remaining operations instead of failing due to 
missing/stale entries.
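
As a rough illustration of the batching idea, partitions can already be dropped manually in smaller chunks before the final drop table (hypothetical table and partition-key names):
{code:sql}
-- Hypothetical manual batching: drop partitions range by range, then the table
ALTER TABLE sales DROP IF EXISTS PARTITION (ds < '2020-01-01');
ALTER TABLE sales DROP IF EXISTS PARTITION (ds < '2021-01-01');
DROP TABLE sales;
{code}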





[jira] [Created] (HIVE-27223) Show Compactions failing with NPE

2023-04-05 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27223:
---

 Summary: Show Compactions failing with NPE
 Key: HIVE-27223
 URL: https://issues.apache.org/jira/browse/HIVE-27223
 Project: Hive
  Issue Type: Bug
Reporter: Ayush Saxena
Assignee: Ayush Saxena


{noformat}
java.lang.NullPointerException: null
at java.io.DataOutputStream.writeBytes(DataOutputStream.java:274) ~[?:?]
at 
org.apache.hadoop.hive.ql.ddl.process.show.compactions.ShowCompactionsOperation.writeRow(ShowCompactionsOperation.java:135)
 
at 
org.apache.hadoop.hive.ql.ddl.process.show.compactions.ShowCompactionsOperation.execute(ShowCompactionsOperation.java:57)
 
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:360) 
{noformat}






[jira] [Created] (HIVE-27222) New functionality to show compactions information for a specific table/partition of a given database

2023-04-05 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-27222:
-

 Summary: New functionality to show compactions information for a 
specific table/partition of a given database
 Key: HIVE-27222
 URL: https://issues.apache.org/jira/browse/HIVE-27222
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Taraka Rama Rao Lethavadla


As per the current implementation, the show compactions command lists 
compaction details of all partitions and tables of all databases in a single 
output.

If a user happens to have hundreds or even thousands of 
databases/tables/partitions, parsing the show compactions output to check the 
details of a specific table/partition is difficult.

So the proposal is to support something like
{noformat}
show compactions `db`.`table`[.`partition`]{noformat}
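
With the proposed syntax, a user could then narrow the output to one table or partition, e.g. (names are illustrative):
{noformat}
show compactions `mydb`.`mytable`;
show compactions `mydb`.`mytable`.`ds=2023-04-01`;{noformat}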





[jira] [Created] (HIVE-27221) Backport of HIVE-25726: Upgrade velocity to 2.3 due to CVE-2020-13936

2023-04-05 Thread Apoorva Aggarwal (Jira)
Apoorva Aggarwal created HIVE-27221:
---

 Summary: Backport of HIVE-25726: Upgrade velocity to 2.3 due to 
CVE-2020-13936
 Key: HIVE-27221
 URL: https://issues.apache.org/jira/browse/HIVE-27221
 Project: Hive
  Issue Type: Sub-task
Reporter: Apoorva Aggarwal
Assignee: Apoorva Aggarwal
 Fix For: 3.2.0








[jira] [Created] (HIVE-27220) Backport Upgrade commons,httpclient,jackson,jetty,log4j binaries from branch-3.1

2023-04-05 Thread Apoorva Aggarwal (Jira)
Apoorva Aggarwal created HIVE-27220:
---

 Summary:  Backport Upgrade commons,httpclient,jackson,jetty,log4j 
binaries from branch-3.1
 Key: HIVE-27220
 URL: https://issues.apache.org/jira/browse/HIVE-27220
 Project: Hive
  Issue Type: Sub-task
Reporter: Apoorva Aggarwal
 Fix For: 3.2.0








[jira] [Created] (HIVE-27219) Backport of HIVE-25616: Hive-24741 backport to 3.1

2023-04-05 Thread Apoorva Aggarwal (Jira)
Apoorva Aggarwal created HIVE-27219:
---

 Summary: Backport of HIVE-25616: Hive-24741 backport to 3.1
 Key: HIVE-27219
 URL: https://issues.apache.org/jira/browse/HIVE-27219
 Project: Hive
  Issue Type: Sub-task
Reporter: Apoorva Aggarwal
Assignee: Apoorva Aggarwal
 Fix For: 3.2.0








[jira] [Created] (HIVE-27218) Hive-3 set hive.materializedview.rewriting default to false

2023-04-04 Thread Yi Zhang (Jira)
Yi Zhang created HIVE-27218:
---

 Summary: Hive-3 set hive.materializedview.rewriting default to 
false
 Key: HIVE-27218
 URL: https://issues.apache.org/jira/browse/HIVE-27218
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Affects Versions: 3.1.3, 3.1.2, 3.1.0
Reporter: Yi Zhang


https://issues.apache.org/jira/browse/HIVE-19973 switched the 
hive.materializedview.rewriting default from false to true. However, users with 
a large number of databases (5k) have observed high latency at query 
compilation: each call to a remote metastore DB adds latency, pushing each 
query's compilation time to minutes.

Since the Hive 4 improvements in HIVE-21631 and HIVE-21344 are unlikely to be 
backported, I suggest turning this off by default in Hive 3.
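
Until the default changes, affected users can disable the rewriting per session (or set it in hive-site.xml):
{code:sql}
-- Session-level workaround for the compile-time latency
set hive.materializedview.rewriting=false;
{code}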

 





[jira] [Created] (HIVE-27217) addWriteNotificationLogInBatch can silently fail

2023-04-04 Thread John Sherman (Jira)
John Sherman created HIVE-27217:
---

 Summary: addWriteNotificationLogInBatch can silently fail
 Key: HIVE-27217
 URL: https://issues.apache.org/jira/browse/HIVE-27217
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: John Sherman
Assignee: John Sherman


While debugging an issue, I noticed that addWriteNotificationLogInBatch in 
Hive.java can fail silently if the TApplicationException thrown is not 
TApplicationException.UNKNOWN_METHOD or TApplicationException.WRONG_METHOD_NAME.

https://github.com/apache/hive/blob/40a7d689e51d02fa9b324553fd1810d0ad043080/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3359-L3381

Failures to write to the notification log can be very difficult to debug; we 
should rethrow the exception so that the failure is clearly visible.





[jira] [Created] (HIVE-27216) Upgrade postgresql to 42.5.1 from 9.x

2023-04-03 Thread Aman Raj (Jira)
Aman Raj created HIVE-27216:
---

 Summary: Upgrade postgresql to 42.5.1 from 9.x
 Key: HIVE-27216
 URL: https://issues.apache.org/jira/browse/HIVE-27216
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj


This ticket involves a partial cherry-pick of HIVE-23965 and complete 
cherry-picks of HIVE-26253 and HIVE-26914.





[jira] [Created] (HIVE-27215) On DB with defaultTableType property, create external table with transactional property as true creates a managed table

2023-04-03 Thread Venugopal Reddy K (Jira)
Venugopal Reddy K created HIVE-27215:


 Summary: On DB with defaultTableType property, create external 
table with transactional property as true creates a managed table
 Key: HIVE-27215
 URL: https://issues.apache.org/jira/browse/HIVE-27215
 Project: Hive
  Issue Type: Bug
Reporter: Venugopal Reddy K


*Description:*

On a database created with the defaultTableType property, creating an external 
table with the transactional property set to true creates a managed table.

*Steps to reproduce:*

Create a database with the db property defaultTableType set to either external 
or acid. Then create an external table with transactional set to true, or with 
transactional set to true and transactional_properties=insert_only. The table 
is created as a managed table.
{code:java}
0: jdbc:hive2://localhost:1> create database mydbext with 
dbproperties('defaultTableType'='external');
0: jdbc:hive2://localhost:1> use mydbext;
0: jdbc:hive2://localhost:1> create external table test_ext_txn(i string) 
stored as orc tblproperties('transactional'='true');
0: jdbc:hive2://localhost:1> desc formatted test_ext_txn;

+-------------------------------+-----------------------------------------------------+------------------------------------------------------------+
|           col_name            |                      data_type                      |                           comment                          |
+-------------------------------+-----------------------------------------------------+------------------------------------------------------------+
| i                             | string                                              |                                                            |
|                               | NULL                                                | NULL                                                       |
| # Detailed Table Information  | NULL                                                | NULL                                                       |
| Database:                     | mydbext                                             | NULL                                                       |
| OwnerType:                    | USER                                                | NULL                                                       |
| Owner:                        | hive                                                | NULL                                                       |
| CreateTime:                   | Mon Apr 03 23:24:07 IST 2023                        | NULL                                                       |
| LastAccessTime:               | UNKNOWN                                             | NULL                                                       |
| Retention:                    | 0                                                   | NULL                                                       |
| Location:                     | file:/tmp/warehouse/managed/mydbext.db/test_ext_txn | NULL                                                       |
| Table Type:                   | MANAGED_TABLE                                       | NULL                                                       |
| Table Parameters:             | NULL                                                | NULL                                                       |
|                               | COLUMN_STATS_ACCURATE                               | {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"i\":\"true\"}} |
|                               | bucketing_version                                   | 2                                                          |
|                               | numFiles                                            | 0                                                          |
|                               | numRows                                             | 0                                                          |
|                               | rawDataSize                                         | 0                                                          |
|                               | totalSize                                           | 0                                                          |
|                               | transactional                                       | true                                                       |
|                               | transactional_properties                            | default                                                    |
|                               | transient_lastDdlTime                               | 168057                                                     |
|                               | NULL                                                | NULL                                                       |
| # Storage Information         | NULL                                                | NULL

[jira] [Created] (HIVE-27214) Backport of HIVE-24414: Backport HIVE-19662 to branch-3.1

2023-04-02 Thread Diksha (Jira)
Diksha created HIVE-27214:
-

 Summary: Backport of HIVE-24414: Backport HIVE-19662 to branch-3.1
 Key: HIVE-27214
 URL: https://issues.apache.org/jira/browse/HIVE-27214
 Project: Hive
  Issue Type: Sub-task
Reporter: Diksha








[jira] [Created] (HIVE-27213) parquet logical decimal type to INT32 is not working while computing statistics

2023-04-02 Thread KIRTI RUGE (Jira)
KIRTI RUGE created HIVE-27213:
-

 Summary: parquet logical decimal type to INT32 is not working 
while computing statistics
 Key: HIVE-27213
 URL: https://issues.apache.org/jira/browse/HIVE-27213
 Project: Hive
  Issue Type: Improvement
Reporter: KIRTI RUGE
 Attachments: test.parquet

[^test.parquet]

Steps to reproduce:

dfs ${system:test.dfs.mkdir} hdfs:///tmp/dwxtest/ws_sold_date_sk=2451825;
dfs -copyFromLocal ../../data/files/dwxtest.parquet 
hdfs:///tmp/dwxtest/ws_sold_date_sk=2451825;
dfs -ls hdfs:///tmp/dwxtest/ws_sold_date_sk=2451825/;


CREATE EXTERNAL TABLE `web_sales`(
`ws_sold_time_sk` int,
`ws_ship_date_sk` int,
`ws_item_sk` int,
`ws_bill_customer_sk` int,
`ws_bill_cdemo_sk` int,
`ws_bill_hdemo_sk` int,
`ws_bill_addr_sk` int,
`ws_ship_customer_sk` int,
`ws_ship_cdemo_sk` int,
`ws_ship_hdemo_sk` int,
`ws_ship_addr_sk` int,
`ws_web_page_sk` int,
`ws_web_site_sk` int,
`ws_ship_mode_sk` int,
`ws_warehouse_sk` int,
`ws_promo_sk` int,
`ws_order_number` bigint,
`ws_quantity` int,
`ws_wholesale_cost` decimal(7,2),
`ws_list_price` decimal(7,2),
`ws_sales_price` decimal(7,2),
`ws_ext_discount_amt` decimal(7,2),
`ws_ext_sales_price` decimal(7,2),
`ws_ext_wholesale_cost` decimal(7,2),
`ws_ext_list_price` decimal(7,2),
`ws_ext_tax` decimal(7,2),
`ws_coupon_amt` decimal(7,2),
`ws_ext_ship_cost` decimal(7,2),
`ws_net_paid` decimal(7,2),
`ws_net_paid_inc_tax` decimal(7,2),
`ws_net_paid_inc_ship` decimal(7,2),
`ws_net_paid_inc_ship_tax` decimal(7,2),
`ws_net_profit` decimal(7,2))
PARTITIONED BY (
`ws_sold_date_sk` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS PARQUET LOCATION 'hdfs:///tmp/dwxtest/';


MSCK REPAIR TABLE web_sales;


analyze table web_sales compute statistics for columns;

 

Error Stack:

 
{noformat}
analyze table web_sales compute statistics for columns;

], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : 
attempt_1678779198717__2_00_52_3:java.lang.RuntimeException: 
java.lang.RuntimeException: java.io.IOException: 
org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in 
block -1 in file 
s3a://nfqe-tpcds-test/spark-tpcds/sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/web_sales/ws_sold_date_sk=2451825/part-00796-788bef86-2748-4e21-a464-b34c7e646c94-cfcafd2c-2abd-4067-8aea-f58cb1021b35.c000.snappy.parquet
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:351)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: java.io.IOException: 
org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in 
block -1 in file 
s3a://nfqe-tpcds-test/spark-tpcds/sf1000-parquet/useDecimal=true,useDate=true,filterNull=false/web_sales/ws_sold_date_sk=2451825/part-00796-788bef86-2748-4e21-a464-b34c7e646c94-cfcafd2c-2abd-4067-8aea-f58cb1021b35.c000.snappy.parquet
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.(TezGroupedSplitsInputFormat.java:145)
at 
org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:111)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:164)
at 
org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(MR

[jira] [Created] (HIVE-27212) Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1

2023-04-02 Thread Diksha (Jira)
Diksha created HIVE-27212:
-

 Summary: Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 
in branch-3.1
 Key: HIVE-27212
 URL: https://issues.apache.org/jira/browse/HIVE-27212
 Project: Hive
  Issue Type: Task
Reporter: Diksha


Backport of HIVE-24316: Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1





[jira] [Created] (HIVE-27211) Backport HIVE-22453: Describe table unnecessarily fetches partitions

2023-04-02 Thread Nikhil Gupta (Jira)
Nikhil Gupta created HIVE-27211:
---

 Summary: Backport HIVE-22453: Describe table unnecessarily fetches 
partitions
 Key: HIVE-27211
 URL: https://issues.apache.org/jira/browse/HIVE-27211
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 3.1.2
Reporter: Nikhil Gupta
 Fix For: 3.2.0








[jira] [Created] (HIVE-27210) Backport HIVE-23338: Bump jackson version to 2.10.0

2023-04-02 Thread Nikhil Gupta (Jira)
Nikhil Gupta created HIVE-27210:
---

 Summary: Backport HIVE-23338: Bump jackson version to 2.10.0
 Key: HIVE-27210
 URL: https://issues.apache.org/jira/browse/HIVE-27210
 Project: Hive
  Issue Type: Sub-task
Reporter: Nikhil Gupta
 Fix For: 3.2.0








[jira] [Created] (HIVE-27209) Backport HIVE-24569: LLAP daemon leaks file descriptors/log4j appenders

2023-04-02 Thread Nikhil Gupta (Jira)
Nikhil Gupta created HIVE-27209:
---

 Summary: Backport HIVE-24569: LLAP daemon leaks file 
descriptors/log4j appenders
 Key: HIVE-27209
 URL: https://issues.apache.org/jira/browse/HIVE-27209
 Project: Hive
  Issue Type: Sub-task
  Components: llap
Affects Versions: 2.2.0
Reporter: Nikhil Gupta








[jira] [Created] (HIVE-27208) Iceberg: Add support for rename table

2023-04-01 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27208:
---

 Summary: Iceberg: Add support for rename table
 Key: HIVE-27208
 URL: https://issues.apache.org/jira/browse/HIVE-27208
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Add support for renaming iceberg tables.
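
The expected user-facing syntax would presumably be the standard Hive rename DDL, applied to an Iceberg-backed table (table names are illustrative):
{code:sql}
-- Standard Hive rename DDL, to be supported for Iceberg tables as well
ALTER TABLE ice_orders RENAME TO ice_orders_v2;
{code}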





[jira] [Created] (HIVE-27207) Backport of HIVE-26530: HS2 OOM-OperationManager.queryIdOperation does not properly clean up multiple queryIds

2023-03-31 Thread Aman Raj (Jira)
Aman Raj created HIVE-27207:
---

 Summary: Backport of HIVE-26530: HS2 
OOM-OperationManager.queryIdOperation does not properly clean up multiple 
queryIds
 Key: HIVE-27207
 URL: https://issues.apache.org/jira/browse/HIVE-27207
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj


HIVE-26530 was already part of the Hive 3.1.3 release, so it should be 
backported to branch-3.





[jira] [Created] (HIVE-27206) Backport of HIVE-20179

2023-03-31 Thread Aman Raj (Jira)
Aman Raj created HIVE-27206:
---

 Summary: Backport of HIVE-20179
 Key: HIVE-27206
 URL: https://issues.apache.org/jira/browse/HIVE-27206
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj


HIVE-20179 was already part of the Hive 3.1.3 release, so it makes sense to 
backport it to branch-3.





[jira] [Created] (HIVE-27205) Update jackson-databind for CVE fix for CVE-2022-42003

2023-03-31 Thread Diksha (Jira)
Diksha created HIVE-27205:
-

 Summary: Update jackson-databind for CVE fix for CVE-2022-42003
 Key: HIVE-27205
 URL: https://issues.apache.org/jira/browse/HIVE-27205
 Project: Hive
  Issue Type: Task
Reporter: Diksha
Assignee: Diksha


Update jackson-databind for CVE fix for CVE-2022-42003





[jira] [Created] (HIVE-27204) Upgrade jettison to 1.5.2 to fix CVE-2022-45685

2023-03-31 Thread Aman Raj (Jira)
Aman Raj created HIVE-27204:
---

 Summary: Upgrade jettison to 1.5.2 to fix CVE-2022-45685
 Key: HIVE-27204
 URL: https://issues.apache.org/jira/browse/HIVE-27204
 Project: Hive
  Issue Type: Bug
Reporter: Aman Raj
Assignee: Aman Raj


Upgrade jettison to 1.5.2 to fix CVE-2022-45685





[jira] [Created] (HIVE-27203) Add compaction pending Qtest for Insert-only, Partitioned, Clustered ACID, and combination Tables

2023-03-31 Thread Akshat Mathur (Jira)
Akshat Mathur created HIVE-27203:


 Summary: Add compaction pending Qtest for Insert-only, 
Partitioned, Clustered ACID, and combination Tables 
 Key: HIVE-27203
 URL: https://issues.apache.org/jira/browse/HIVE-27203
 Project: Hive
  Issue Type: Test
Reporter: Akshat Mathur
Assignee: Akshat Mathur


Improve Qtest coverage for compaction use cases for ACID tables:
 # Partitioned Tables (Major & Minor)
 # Insert-Only Clustered (Major & Minor)
 # Insert-Only Partitioned (Major & Minor)
 # Insert-Only Clustered and Partitioned (Major & Minor)





[jira] [Created] (HIVE-27202) Disable flaky test TestJdbcWithMiniLlapRow#testComplexQuery

2023-03-31 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27202:
--

 Summary: Disable flaky test 
TestJdbcWithMiniLlapRow#testComplexQuery
 Key: HIVE-27202
 URL: https://issues.apache.org/jira/browse/HIVE-27202
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


TestJdbcWithMiniLlapRow#testComplexQuery is flaky and should be disabled.

 

http://ci.hive.apache.org/job/hive-flaky-check/634/





[jira] [Created] (HIVE-27201) Inconsistency between session Hive and thread-local Hive may cause HS2 deadlock

2023-03-31 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-27201:
--

 Summary: Inconsistency between session Hive and thread-local Hive 
may cause HS2 deadlock
 Key: HIVE-27201
 URL: https://issues.apache.org/jira/browse/HIVE-27201
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Zhihua Deng
Assignee: Zhihua Deng


The HiveServer2 server handler can switch to processing an operation from 
another session; in that case, the Hive object cached in the ThreadLocal is not 
the same as the Hive in the SessionState and can be referenced by another 
session.

If two handlers swap their sessions to process DatabaseMetaData requests, and 
the HiveMetastoreClientFactory obtains the Hive via Hive.get(), then there is a 
chance of a deadlock.





[jira] [Created] (HIVE-27200) Backport HIVE-24928 In case of non-native tables use basic statistics from HiveStorageHandler

2023-03-30 Thread Yi Zhang (Jira)
Yi Zhang created HIVE-27200:
---

 Summary: Backport HIVE-24928 In case of non-native tables use 
basic statistics from HiveStorageHandler
 Key: HIVE-27200
 URL: https://issues.apache.org/jira/browse/HIVE-27200
 Project: Hive
  Issue Type: Improvement
  Components: StorageHandler
Reporter: Yi Zhang


This is to backport HIVE-24928 so that, for HiveStorageHandler tables, 'ANALYZE 
TABLE ... COMPUTE STATISTICS' can use the storage handler to provide basic stats 
via BasicStatsNoJobTask.





[jira] [Created] (HIVE-27199) Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats

2023-03-30 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27199:
--

 Summary: Read TIMESTAMP WITH LOCAL TIME ZONE columns from text 
files using custom formats
 Key: HIVE-27199
 URL: https://issues.apache.org/jira/browse/HIVE-27199
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Timestamp values come in many flavors and formats, and there is no single 
representation that can satisfy everyone, especially when such values are stored 
in plain text/csv files.

HIVE-9298 added a special SERDE property, {{timestamp.formats}}, that 
allows users to provide custom timestamp patterns so that TIMESTAMP values 
coming from files are parsed correctly.

However, when the column type is TIMESTAMP WITH LOCAL TIME ZONE (LTZ), it is not 
possible to use a custom pattern, so when the built-in Hive parser does not 
match the expected format, a NULL value is returned.

Consider a text file, F1, with the following values:
{noformat}
2016-05-03 12:26:34
2016-05-03T12:26:34
{noformat}
and a table with a column declared as LTZ.
{code:sql}
CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE);
LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;

SELECT * FROM ts_table;
2016-05-03 12:26:34.0 US/Pacific
NULL
{code}
In order to give more flexibility to users relying on the TIMESTAMP WITH 
LOCAL TIME ZONE datatype, and to align its behavior with the TIMESTAMP type, 
this JIRA aims to reuse the {{timestamp.formats}} property for both TIMESTAMP 
types.

The work here focuses exclusively on simple text files, but the same could be 
done for other SerDes such as JSON.
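A sketch of what the extended behavior could look like, reusing the existing {{timestamp.formats}} property from HIVE-9298 (the property and its pattern syntax are real; honoring it for an LTZ column is the behavior this JIRA proposes, so the statement below is illustrative):
{code:sql}
-- Illustrative only: timestamp.formats currently applies to TIMESTAMP;
-- this ticket proposes honoring it for TIMESTAMP WITH LOCAL TIME ZONE too.
CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss");
LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;
-- With the proposed change, both rows of F1 would parse
-- instead of the second one becoming NULL.
{code}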





[jira] [Created] (HIVE-27198) Delete directly aborted transactions instead of select and loading ids

2023-03-30 Thread Mahesh Raju Somalaraju (Jira)
Mahesh Raju Somalaraju created HIVE-27198:
-

 Summary: Delete directly aborted transactions instead of select 
and loading ids
 Key: HIVE-27198
 URL: https://issues.apache.org/jira/browse/HIVE-27198
 Project: Hive
  Issue Type: Improvement
Reporter: Mahesh Raju Somalaraju
Assignee: Mahesh Raju Somalaraju


When cleaning aborted transactions, we can delete the txns directly instead 
of selecting them and processing the ids.

Method name:

cleanEmptyAbortedAndCommittedTxns

Current code:

String s = "SELECT \"TXN_ID\" FROM \"TXNS\" WHERE " +
"\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND " +
" (\"TXN_STATE\" = " + TxnStatus.ABORTED + " OR \"TXN_STATE\" = " + 
TxnStatus.COMMITTED + ") AND "
+ " \"TXN_ID\" < " + lowWaterMark;

Proposed code:

String s = "DELETE FROM \"TXNS\" WHERE " +
"\"TXN_ID\" NOT IN (SELECT \"TC_TXNID\" FROM \"TXN_COMPONENTS\") AND " +
" (\"TXN_STATE\" = " + TxnStatus.ABORTED + " OR \"TXN_STATE\" = " + 
TxnStatus.COMMITTED + ") AND "
+ " \"TXN_ID\" < " + lowWaterMark;

The SELECT should be eliminated and the DELETE should work with the WHERE 
clause directly instead of a built IN clause: there is no reason to load the 
ids into memory and then generate a huge SQL statement.

Batching is also not necessary here; we can delete the records directly.





[jira] [Created] (HIVE-27197) Iceberg: Support Iceberg version travel by reference name

2023-03-30 Thread zhangbutao (Jira)
zhangbutao created HIVE-27197:
-

 Summary: Iceberg:  Support Iceberg version travel by reference 
name 
 Key: HIVE-27197
 URL: https://issues.apache.org/jira/browse/HIVE-27197
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: zhangbutao


This ticket is inspired by https://github.com/apache/iceberg/pull/6575





[jira] [Created] (HIVE-27196) Upgrade jettision version to 1.5.4 due to CVEs

2023-03-30 Thread Mahesh Raju Somalaraju (Jira)
Mahesh Raju Somalaraju created HIVE-27196:
-

 Summary: Upgrade jettision version to 1.5.4 due to CVEs
 Key: HIVE-27196
 URL: https://issues.apache.org/jira/browse/HIVE-27196
 Project: Hive
  Issue Type: Improvement
Reporter: Mahesh Raju Somalaraju
Assignee: Mahesh Raju Somalaraju


[CVE-2023-1436|https://www.cve.org/CVERecord?id=CVE-2023-1436]
[CWE-400|https://cwe.mitre.org/data/definitions/400.html]
Need to update jettison version to 1.5.4 version due to above CVE issues.
version 1.5.4 has no CVE issues.





[jira] [Created] (HIVE-27195) Drop table if Exists . fails during authorization for temporary tables

2023-03-29 Thread Riju Trivedi (Jira)
Riju Trivedi created HIVE-27195:
---

 Summary: Drop table if Exists . fails during 
authorization for temporary tables
 Key: HIVE-27195
 URL: https://issues.apache.org/jira/browse/HIVE-27195
 Project: Hive
  Issue Type: Bug
Reporter: Riju Trivedi
Assignee: Riju Trivedi


https://issues.apache.org/jira/browse/HIVE-20051 handles skipping authorization 
for temporary tables. However, drop table if exists still fails with 
HiveAccessControlException.

Steps to Repro:
{code:java}
use test; CREATE TEMPORARY TABLE temp_table (id int);
drop table if exists test.temp_table;
Error: Error while compiling statement: FAILED: HiveAccessControlException 
Permission denied: user [rtrivedi] does not have [DROP] privilege on 
[test/temp_table] (state=42000,code=4) {code}





[jira] [Created] (HIVE-27194) Support expression in limit and offset clauses

2023-03-29 Thread vamshi kolanu (Jira)
vamshi kolanu created HIVE-27194:


 Summary: Support expression in limit and offset clauses
 Key: HIVE-27194
 URL: https://issues.apache.org/jira/browse/HIVE-27194
 Project: Hive
  Issue Type: Task
  Components: Hive
Reporter: vamshi kolanu
Assignee: vamshi kolanu


As part of this task, support expressions in both the limit and offset clauses. 
Currently, these clauses only support integer literals.

For example: The following expressions will be supported after this change.
1. select key from (select * from src limit (1+2*3)) q1;
2. select key from (select * from src limit (1+2*3) offset (3*4*5)) q1;





[jira] [Created] (HIVE-27193) Database names starting with '@' cause error during ALTER/DROP table.

2023-03-29 Thread Oliver Schiller (Jira)
Oliver Schiller created HIVE-27193:
--

 Summary: Database names starting with '@' cause error during 
ALTER/DROP table.
 Key: HIVE-27193
 URL: https://issues.apache.org/jira/browse/HIVE-27193
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Standalone Metastore
Affects Versions: 4.0.0-alpha-2
Reporter: Oliver Schiller


The creation of a database whose name starts with '@' is supported:

 
{code:java}
create database `@test`;{code}
 

The creation of a table in this database works:

 
{code:java}
create table `@test`.testtable (c1 integer);{code}
However, dropping or altering the table results in an error:

 
{code:java}
drop table `@test`.testtable;
FAILED: SemanticException Unable to fetch table testtable. @test is prepended 
with the catalog marker but does not appear to have a catalog name in it
Error: Error while compiling statement: FAILED: SemanticException Unable to 
fetch table testtable. @test is prepended with the catalog marker but does not 
appear to have a catalog name in it (state=42000,code=4)

alter table `@test`.testtable add columns (c2 integer);
FAILED: SemanticException Unable to fetch table testtable. @test is prepended 
with the catalog marker but does not appear to have a catalog name in it
Error: Error while compiling statement: FAILED: SemanticException Unable to 
fetch table testtable. @test is prepended with the catalog marker but does not 
appear to have a catalog name in it (state=42000,code=4)

{code}
 

Relevant snippet of stack trace:

 

{code:java}
org.apache.hadoop.hive.metastore.api.MetaException: @TEST is prepended with the 
catalog marker but does not appear to have a catalog name in it at 
org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.parseDbName(MetaStoreUtils.java:1031
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTempTable(SessionHiveMetaStoreClient.java:651)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:279)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:273)
at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:258)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:1982)org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:1957)
...{code}

 

My suspicion is that this is caused by the implementation of getTempTable and 
how it is called. The method getTempTable calls parseDbName, assuming that the 
given dbname might be prefixed with a catalog name. I'm wondering whether this 
is correct at this layer. From poking around a bit, it appears to me that the 
catalog name is typically prepended when making the actual thrift call.

 

 





[jira] [Created] (HIVE-27192) Use normal import instead of shaded import in TestSchemaToolCatalogOps.java

2023-03-29 Thread Jira
Zoltán Rátkai created HIVE-27192:


 Summary: Use normal import instead of shaded import in 
TestSchemaToolCatalogOps.java
 Key: HIVE-27192
 URL: https://issues.apache.org/jira/browse/HIVE-27192
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltán Rátkai
Assignee: Zoltán Rátkai








[jira] [Created] (HIVE-27191) Cleaner is blocked by orphaned entries in MHL table

2023-03-29 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27191:
--

 Summary: Cleaner is blocked by orphaned entries in MHL table
 Key: HIVE-27191
 URL: https://issues.apache.org/jira/browse/HIVE-27191
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa


The following mhl_txnids do not exist in the TXNS table; as a result, the 
cleaner gets blocked and many entries are stuck in the ready-for-cleaning state.

The cleaner should periodically check for such entries and remove them from the 
MHL table to prevent the cleaner from being blocked.
{noformat}
postgres=# select mhl_txnid from min_history_level where not exists (select 1 
from txns where txn_id = mhl_txnid);
 mhl_txnid
---
  43708080
  43708088
  43679962
  43680464
  43680352
  43680392
  43680424
  43680436
  43680471
  43680475
  43680483
  43622677
  43708083
  43708084
  43678157
  43680482
  43680484
  43622745
  43622750
  43706829
  43707261
(21 rows){noformat}
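A sketch of the periodic cleanup this ticket proposes, derived from the diagnostic query above (assuming a direct anti-join DELETE is acceptable on the backing RDBMS):
{code:sql}
-- Remove orphaned MIN_HISTORY_LEVEL rows whose transaction no longer
-- exists in TXNS, unblocking the cleaner.
DELETE FROM MIN_HISTORY_LEVEL
WHERE NOT EXISTS (SELECT 1 FROM TXNS WHERE TXN_ID = MHL_TXNID);
{code}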





[jira] [Created] (HIVE-27190) Implement col stats cache for hive iceberg table

2023-03-29 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27190:
--

 Summary: Implement  col stats cache for hive iceberg table
 Key: HIVE-27190
 URL: https://issues.apache.org/jira/browse/HIVE-27190
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa








[jira] [Created] (HIVE-27189) Remove duplicate info log in Hive.isSubDIr

2023-03-29 Thread shuyouZZ (Jira)
shuyouZZ created HIVE-27189:
---

 Summary: Remove duplicate info log in Hive.isSubDIr
 Key: HIVE-27189
 URL: https://issues.apache.org/jira/browse/HIVE-27189
 Project: Hive
  Issue Type: Improvement
Reporter: shuyouZZ








[jira] [Created] (HIVE-27188) Explore usage of FilterApi.in(C column, Set values) in Parquet instead of nested OR

2023-03-28 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27188:
---

 Summary: Explore usage of FilterApi.in(C column, Set values) in 
Parquet instead of nested OR
 Key: HIVE-27188
 URL: https://issues.apache.org/jira/browse/HIVE-27188
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


The following query can throw a stack overflow exception with "-Xss256k".

Currently it generates a nested OR filter:

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/FilterPredicateLeafBuilder.java#L43-L52]

Instead, we need to explore the possibility of using FilterApi.in(C column, 
Set values) in Parquet.

 
{noformat}
drop table if exists test;

create external table test (i int) stored as parquet;

insert into test values (1),(2),(3);

select count(*) from test where i in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 
73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 
93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 
110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 
126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 
142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 
158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 
174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 
190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 
206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 
222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 
238, 239, 240, 241, 242, 243);

 {noformat}
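A sketch of what the flat IN predicate could look like (assuming Parquet's FilterApi.in, introduced by PARQUET-1968 in Parquet 1.12; the column name "i" matches the repro above):
{code:java}
// Sketch only: requires Parquet >= 1.12.0 (PARQUET-1968) on the classpath.
import java.util.HashSet;
import java.util.Set;

import org.apache.parquet.filter2.predicate.FilterApi;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.filter2.predicate.Operators.IntColumn;

public class InFilterSketch {
  public static FilterPredicate buildInPredicate() {
    IntColumn col = FilterApi.intColumn("i");
    Set<Integer> values = new HashSet<>();
    for (int v = 1; v <= 243; v++) {
      values.add(v);
    }
    // One flat IN predicate instead of a 243-deep nested chain of
    // FilterApi.or(...), which is what overflows a 256k stack today.
    return FilterApi.in(col, values);
  }
}
{code}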





[jira] [Created] (HIVE-27187) Incremental rebuild of materialized view stored by iceberg

2023-03-28 Thread Krisztian Kasa (Jira)
Krisztian Kasa created HIVE-27187:
-

 Summary: Incremental rebuild of materialized view stored by iceberg
 Key: HIVE-27187
 URL: https://issues.apache.org/jira/browse/HIVE-27187
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration, Materialized views
Reporter: Krisztian Kasa
Assignee: Krisztian Kasa


Currently, the incremental rebuild of a materialized view stored by Iceberg 
whose definition query contains an aggregate operator is transformed into an 
insert overwrite statement containing a union operator, provided the source 
tables contain insert operations only. One branch of the union scans the view, 
the other produces the delta.

This can be improved further: transform the statement into a multi insert 
statement representing a merge statement, to insert new aggregations and update 
existing ones.





[jira] [Created] (HIVE-27186) A persistent property store

2023-03-28 Thread Henri Biestro (Jira)
Henri Biestro created HIVE-27186:


 Summary: A persistent property store 
 Key: HIVE-27186
 URL: https://issues.apache.org/jira/browse/HIVE-27186
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 4.0.0-alpha-2
Reporter: Henri Biestro


WHAT
A persistent property store usable as a support facility for any metadata 
augmentation feature.

WHY
When adding new meta-data oriented features, we usually need to persist 
information linking the feature data to the HiveMetaStore objects it applies 
to. Any information related to a database, a table, or the cluster (statistics, 
for example, or any operational state or data, such as a rolling backup) falls 
into this use case.
Typically, accommodating such a feature requires modifying the Metastore 
database schema by adding or altering a table. It also usually implies 
modifying the thrift APIs to expose such meta-data to consumers.
The proposed feature aims to solve the persistence and query/transport for 
these types of use cases by exposing a 'key/(meta)value' store presented as a 
property system.

HOW
A property-value model is the simple and generic exposed API.
To provision for several usage scenarios, the model entry point is a 
'namespace' that qualifies the feature-component property manager. For example, 
'stats' could be the namespace for all properties related to the 'statistics' 
feature.
The namespace identifies a manager that handles property-groups persisted as 
property-maps. For instance, all statistics pertaining to a given table would 
be collocated in the same property-group. As such, all properties (say the 
number of 'unique_values' per column) for a given HMS table 'relation0' would 
all be stored and persisted in the same property-map instance.
Property-maps may be decorated by an (optional) schema that may declare the 
name and value-type of allowed properties (and their optional default value). 
Each property is addressed by a name, a path uniquely identifying the property 
in a given property map.
The manager also handles transforming property-map names to the property-map 
keys used to persist them in the DB.

The API provides bulk, transactional insertion and updating of properties. It 
also provides selection/projection to help reduce the volume of exchange 
between client and server; selection can use (JEXL expression) predicates to 
filter maps.
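A purely hypothetical usage sketch of such an API (every type and method name below is illustrative, not an existing Hive class):
{code:java}
// Hypothetical API: PropertyManager, PropertyMap and their methods are
// illustrative names only, not actual Hive classes.
PropertyManager stats = propertyStore.namespace("stats");
// All statistics for HMS table 'relation0' live in one property-map.
PropertyMap relation0 = stats.getMap("db0.relation0");
relation0.put("column.c1.unique_values", 42L);
stats.commit(); // bulk, transactional persistence
// A JEXL predicate narrows what is exchanged between client and server.
List<PropertyMap> selected = stats.select("map.'column.c1.unique_values' > 10");
{code}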






[jira] [Created] (HIVE-27185) Iceberg: Cache iceberg table while loading for stats

2023-03-27 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27185:
---

 Summary: Iceberg: Cache iceberg table while loading for stats
 Key: HIVE-27185
 URL: https://issues.apache.org/jira/browse/HIVE-27185
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Presently, for stats, Hive loads the Iceberg table multiple times via different 
routes.
Cache it to avoid reading/loading the Iceberg table multiple times.





[jira] [Created] (HIVE-27184) Add class name profiling option in ProfileServlet

2023-03-27 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27184:
---

 Summary: Add class name profiling option in ProfileServlet
 Key: HIVE-27184
 URL: https://issues.apache.org/jira/browse/HIVE-27184
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Rajesh Balamohan


With async-profiler "-e classname.method", it is possible to profile specific 
events. Currently ProfileServlet supports events like cpu, alloc, lock, etc. It 
would be good to enhance it to support method-name profiling as well.





[jira] [Created] (HIVE-27183) Iceberg: Table information is loaded multiple times

2023-03-27 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27183:
---

 Summary: Iceberg: Table information is loaded multiple times
 Key: HIVE-27183
 URL: https://issues.apache.org/jira/browse/HIVE-27183
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


HMS::getTable invokes "HiveIcebergMetaHook::postGetTable", which internally 
loads the Iceberg table again.

If this isn't needed, or is needed only for show-create-table, do not load the 
table again.
{noformat}
    at jdk.internal.misc.Unsafe.park(java.base@11.0.18/Native Method)
    - parking to wait for  <0x00066f84eef0> (a 
java.util.concurrent.CompletableFuture$Signaller)
    at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.18/LockSupport.java:194)
    at 
java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.18/CompletableFuture.java:1796)
    at 
java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.18/ForkJoinPool.java:3128)
    at 
java.util.concurrent.CompletableFuture.waitingGet(java.base@11.0.18/CompletableFuture.java:1823)
    at 
java.util.concurrent.CompletableFuture.get(java.base@11.0.18/CompletableFuture.java:1998)
    at org.apache.hadoop.util.functional.FutureIO.awaitFuture(FutureIO.java:77)
    at 
org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:196)
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:263)
    at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:258)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$0(BaseMetastoreTableOperations.java:177)
    at 
org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$609/0x000840e18040.apply(Unknown
 Source)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.lambda$refreshFromMetadataLocation$1(BaseMetastoreTableOperations.java:191)
    at 
org.apache.iceberg.BaseMetastoreTableOperations$$Lambda$610/0x000840e18440.run(Unknown
 Source)
    at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
    at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
    at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:191)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:176)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.refreshFromMetadataLocation(BaseMetastoreTableOperations.java:171)
    at 
org.apache.iceberg.hive.HiveTableOperations.doRefresh(HiveTableOperations.java:153)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:96)
    at 
org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:79)
    at 
org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:44)
    at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:115)
    at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:105)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$1(IcebergTableUtil.java:99)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$552/0x000840d59840.apply(Unknown
 Source)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.lambda$getTable$4(IcebergTableUtil.java:111)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil$$Lambda$557/0x000840d58c40.get(Unknown
 Source)
    at java.util.Optional.orElseGet(java.base@11.0.18/Optional.java:369)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:108)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:69)
    at 
org.apache.iceberg.mr.hive.IcebergTableUtil.getTable(IcebergTableUtil.java:73)
    at 
org.apache.iceberg.mr.hive.HiveIcebergMetaHook.postGetTable(HiveIcebergMetaHook.java:931)
    at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.executePostGetTableHook(HiveMetaStoreClient.java:2638)
    at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:2624)
    at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:267)
    at jdk.internal.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
    at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:216)
    at com.sun.proxy.$Proxy56.getTable(Unknown Source)
    at jdk.internal.reflect.GeneratedMethodAccessor137.invoke(Unknown Source)
    at 
jdk.internal.reflect.DelegatingMethodAccessorImpl.inv

[jira] [Created] (HIVE-27182) tez_union_with_udf.q with TestMiniTezCliDriver is flaky

2023-03-27 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27182:
---

 Summary: tez_union_with_udf.q with TestMiniTezCliDriver is flaky
 Key: HIVE-27182
 URL: https://issues.apache.org/jira/browse/HIVE-27182
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena


Looks like a memory issue:

{noformat}
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
< colExprMap (org.apache.hadoop.hive.ql.plan.SelectDesc)
< conf (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
{noformat}






[jira] [Created] (HIVE-27181) Remove RegexSerDe from hive-contrib, Upgrade should update changed FQN for RegexSerDe in HMS DB

2023-03-27 Thread Riju Trivedi (Jira)
Riju Trivedi created HIVE-27181:
---

 Summary: Remove RegexSerDe from hive-contrib, Upgrade should 
update changed FQN for RegexSerDe in HMS DB
 Key: HIVE-27181
 URL: https://issues.apache.org/jira/browse/HIVE-27181
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Reporter: Riju Trivedi








[jira] [Created] (HIVE-27180) Remove JsonSerde from hcatalog, Upgrade should update changed FQN for JsonSerDe in HMS DB

2023-03-27 Thread Riju Trivedi (Jira)
Riju Trivedi created HIVE-27180:
---

 Summary: Remove JsonSerde from hcatalog, Upgrade should update 
changed FQN for JsonSerDe in HMS DB 
 Key: HIVE-27180
 URL: https://issues.apache.org/jira/browse/HIVE-27180
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Reporter: Riju Trivedi
Assignee: Riju Trivedi








[jira] [Created] (HIVE-27179) HS2 WebUI throws NPE when JspFactory loaded from jetty-runner

2023-03-27 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-27179:
--

 Summary: HS2 WebUI throws NPE when JspFactory loaded from 
jetty-runner
 Key: HIVE-27179
 URL: https://issues.apache.org/jira/browse/HIVE-27179
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Zhihua Deng


In HIVE-17088, we resolved an NPE thrown from the HS2 WebUI by introducing 
javax.servlet.jsp-api. It works as expected when the javax.servlet.jsp-api jar 
takes precedence over the jetty-runner jar, but things can be different in some 
environments: it still throws an NPE when opening the HS2 web UI:
{noformat}
java.lang.NullPointerException at 
org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:286)
 at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:71) at 
javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
org.eclipse.jetty.servlet.ServletHolder$NotAsync.service(ServletHolder.java:1443)
 at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) at 
org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
...{noformat}
The jetty-runner JspFactory.getDefaultFactory() just returns null.





[jira] [Created] (HIVE-27178) Backport of HIVE-23321 to branch-3

2023-03-27 Thread Aman Raj (Jira)
Aman Raj created HIVE-27178:
---

 Summary: Backport of HIVE-23321 to branch-3
 Key: HIVE-27178
 URL: https://issues.apache.org/jira/browse/HIVE-27178
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj


Current branch-3 fails with a diff in select count(*) from skewed_string_list 
and select count(*) from skewed_string_list_values. Jenkins run: [jenkins / 
hive-precommit / PR-4156 / #1 
(apache.org)|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4156/1/tests/]

Diff : 
Client Execution succeeded but contained differences (error code = 1) after 
executing sysdb.q 
3740d3739
< hdfs://### HDFS PATH ### default public ROLE
4036c4035
< 3
---
> 6
4045c4044
< 3
---
> 6
 
This ticket tries to fix this diff. Please read the description of this ticket 
for the exact reason.





[jira] [Created] (HIVE-27177) Add alter table...Convert to Iceberg command

2023-03-26 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-27177:
---

 Summary: Add alter table...Convert to Iceberg command
 Key: HIVE-27177
 URL: https://issues.apache.org/jira/browse/HIVE-27177
 Project: Hive
  Issue Type: Improvement
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Add an alter table  convert to Iceberg [TBLPROPERTIES('','')] command to 
convert existing external tables to Iceberg tables.





[jira] [Created] (HIVE-27176) EXPLAIN SKEW

2023-03-26 Thread Jira
László Bodor created HIVE-27176:
---

 Summary: EXPLAIN SKEW 
 Key: HIVE-27176
 URL: https://issues.apache.org/jira/browse/HIVE-27176
 Project: Hive
  Issue Type: Improvement
Reporter: László Bodor


Thinking about a new explain feature, which is actually not an explain but a 
set of analytical queries: consider a very complicated and large SQL statement 
(the one below is simple, just for example's sake):
{code}
SELECT a FROM (SELECT b ... JOIN c on b.x = c.y) d JOIN e ON d.v = e.w
{code}

EXPLAIN skew should run a query like:
{code}
SELECT "b", "x", x, count distinct(b.x) as count order by count desc limit 50
UNION ALL
SELECT "c", "y", y, count distinct(c.y) as count order by count desc limit 50
UNION ALL
SELECT "d", "v", v, count distinct(d.v) as count order by count desc limit 50
UNION ALL
SELECT "e", "w", w, count distinct(e.w) as count order by count desc limit 50
{code}

collecting some cardinality info about all the join columns found in the query, 
so result might be like:

{code}
table_name column_name column_value count
b "x" x_skew_value1 100431234
b "x" x_skew_value2 234
c "y" y_skew_value1 35
c "y" y_skew_value2 45
c "y" y_skew_value3 42
...
{code}
This doesn't solve the problem itself; instead, it shows data skew immediately 
for further analysis.

 





[jira] [Created] (HIVE-27175) Fix TestJdbcDriver2#testSelectExecAsync2

2023-03-25 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27175:
--

 Summary: Fix TestJdbcDriver2#testSelectExecAsync2
 Key: HIVE-27175
 URL: https://issues.apache.org/jira/browse/HIVE-27175
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


TestJdbcDriver2#testSelectExecAsync2 is failing on branch-3. We need to 
backport HIVE-20897 to fix it.





[jira] [Created] (HIVE-27174) Disable sysdb.q test

2023-03-24 Thread Aman Raj (Jira)
Aman Raj created HIVE-27174:
---

 Summary: Disable sysdb.q test
 Key: HIVE-27174
 URL: https://issues.apache.org/jira/browse/HIVE-27174
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj


h3. What changes were proposed in this pull request?

Disabled the sysdb.q test. The test is failing because of a diff in the 
BASIC_COLUMN_STATS json string:
Client Execution succeeded but contained differences (error code = 1) after 
executing sysdb.q
3803,3807c3803,3807
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@125b285b
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@471246f3
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@57c013
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@59f1d7ac
< COLUMN_STATS_ACCURATE org.apache.derby.impl.jdbc.EmbedClob@71a0
---
{quote}COLUMN_STATS_ACCURATE 
\{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
COLUMN_STATS_ACCURATE 
\{"BASIC_STATS":"true","COLUMN_STATS":{"c_boolean":"true","c_float":"true","c_int":"true","key":"true","value":"true"}}
COLUMN_STATS_ACCURATE 
\{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
COLUMN_STATS_ACCURATE 
\{"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":
{quote}
h3. Why are the changes needed?

There is no issue in the test itself. The current code prints the COL_STATS as an 
Object instead of a json string. It is not clear why this is the case; after 
trying a lot of approaches, it seems this is not fixable at the moment, so the 
test is disabled for now. Note that this test was also disabled in the Hive 3.1.3 
release, so there should not be any issue in disabling it here.

 

 

Created a followup ticket to fix this test that can be taken up later - 





[jira] [Created] (HIVE-27173) Add method for Spark to be able to trigger DML events

2023-03-24 Thread Naveen Gangam (Jira)
Naveen Gangam created HIVE-27173:


 Summary: Add method for Spark to be able to trigger DML events
 Key: HIVE-27173
 URL: https://issues.apache.org/jira/browse/HIVE-27173
 Project: Hive
  Issue Type: Improvement
Reporter: Naveen Gangam


Spark currently uses Hive.java from Hive as a convenient way to avoid having to 
deal with the HMS client and the thrift objects. Currently, Hive has support 
for DML events (it can generate events on DML operations) but does not expose a 
public method to do so. It has a private method that takes in Hive objects like 
Table etc. It would be nice if we could have something with more primitive 
datatypes.






[jira] [Created] (HIVE-27172) Add the HMS client connection timeout config

2023-03-24 Thread Wechar (Jira)
Wechar created HIVE-27172:
-

 Summary: Add the HMS client connection timeout config
 Key: HIVE-27172
 URL: https://issues.apache.org/jira/browse/HIVE-27172
 Project: Hive
  Issue Type: New Feature
  Components: Hive
Reporter: Wechar
Assignee: Wechar


Currently {{HiveMetaStoreClient}} uses {{CLIENT_SOCKET_TIMEOUT}} as both the 
socket timeout and the connection timeout, which makes it inconvenient for users 
to set a smaller connection timeout.
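The two timeouts play different roles; a conceptual Python sketch (function name, host, and port are illustrative, not the HMS client's actual API):

```python
import socket

def connect_with_timeouts(host, port, connect_timeout, read_timeout):
    # The connection attempt fails fast with the (short) connect timeout...
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    # ...while subsequent reads use a separate, usually longer, socket timeout,
    # analogous to the socket timeout the HMS client already exposes.
    sock.settimeout(read_timeout)
    return sock
```

With a single knob, lowering the value to fail fast on connect would also abort long-running metastore calls; a separate connection timeout avoids that trade-off.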





[jira] [Created] (HIVE-27171) Backport HIVE-20680 to branch-3

2023-03-24 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27171:
--

 Summary: Backport HIVE-20680 to branch-3
 Key: HIVE-27171
 URL: https://issues.apache.org/jira/browse/HIVE-27171
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


We need to backport HIVE-26836 to fix the 
TestReplicationScenariosAcrossInstances on branch-3





[jira] [Created] (HIVE-27170) facing issues while using tez 0.9.2 as execution engine to hive 2.3.9

2023-03-23 Thread vikran (Jira)
vikran created HIVE-27170:
-

 Summary: facing issues while using tez 0.9.2 as execution engine 
to hive 2.3.9
 Key: HIVE-27170
 URL: https://issues.apache.org/jira/browse/HIVE-27170
 Project: Hive
  Issue Type: Bug
  Components: Hive, Tez
Affects Versions: 2.3.9
Reporter: vikran
 Fix For: 2.3.9
 Attachments: hive-site.txt, hive_error_in_yarn.txt, tez-site.txt

Hi Team,

I am using the below versions:

hive 2.3.9

tez 0.9.2

spark 3.3.2

 hive-site.xml(attached)

 tez-site.xml(attached)

I have added the Tez jars and files as well as the Hive jars into the /apps/tez 
directory in HDFS.

When I run a query from the Hive CLI, I get the below error:

hive> INSERT INTO emp1.employee values(7,'scott',23,'M');
Query ID = azureuser_20230324061903_97928963-410d-44a0-aa47-a83cdc24ce88
Total jobs = 1
Launching Job 1 out of 1
*FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask*

I have also attached the complete error log from the ApplicationMaster.

 

 





[jira] [Created] (HIVE-27169) New Locked List to prevent configuration change at runtime without throwing error

2023-03-23 Thread Raghav Aggarwal (Jira)
Raghav Aggarwal created HIVE-27169:
--

 Summary: New Locked List to prevent configuration change at 
runtime without throwing error
 Key: HIVE-27169
 URL: https://issues.apache.org/jira/browse/HIVE-27169
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0-alpha-2
Reporter: Raghav Aggarwal
Assignee: Raghav Aggarwal


_*AIM*_

Create a new locked list, {{hive.conf.locked.list}}, which contains a 
comma-separated list of configurations that cannot be changed at runtime. If 
someone tries to change one of them at runtime, a WARN log is printed in Beeline 
itself.

 

_*How is it different from Restricted List?*_

When running an hql file (or at runtime), if a configuration present in the 
restricted list gets updated, Hive throws an error and does not proceed with 
further execution of the hql file.

With the locked list, an attempt to update such a configuration prints a WARN 
log in Beeline and execution of the hql file continues.

 

_*Why is it required?*_

In organisations, admins want to enforce some configs which users shouldn't be 
able to change at runtime, without affecting users' existing hql scripts. The 
locked list is useful here: it does not allow users to change the values of 
particular configs, yet it also does not stop the execution of hql scripts.

 

{_}*NOTE*{_}: {{hive.conf.locked.list}} can only be set at the cluster level, 
and the Hive service needs to be restarted afterwards.
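A hypothetical sketch of the behavioural difference (class and method names are invented for illustration; this is not Hive's actual configuration API):

```python
import logging

class ConfSketch:
    """Restricted keys reject runtime changes with an error; locked keys
    only log a warning and keep their configured value."""

    def __init__(self, values, restricted=(), locked=()):
        self.values = dict(values)
        self.restricted = set(restricted)
        self.locked = set(locked)

    def set(self, key, value):
        if key in self.restricted:
            # Restricted list: hard failure, script execution stops.
            raise ValueError(f"Cannot modify {key} at runtime")
        if key in self.locked:
            # Locked list: warn and ignore, script execution continues.
            logging.warning("%s is locked; keeping %r", key, self.values.get(key))
            return
        self.values[key] = value

conf = ConfSketch({"a": 1, "b": 2, "c": 3}, restricted={"a"}, locked={"b"})
conf.set("b", 99)  # warns, value stays 2, and the script keeps running
```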

 

 

 





[jira] [Created] (HIVE-27168) Use basename of the datatype when fetching partition metadata using partition filters

2023-03-23 Thread Sourabh Badhya (Jira)
Sourabh Badhya created HIVE-27168:
-

 Summary: Use basename of the datatype when fetching partition 
metadata using partition filters
 Key: HIVE-27168
 URL: https://issues.apache.org/jira/browse/HIVE-27168
 Project: Hive
  Issue Type: Bug
Reporter: Sourabh Badhya
Assignee: Sourabh Badhya


While fetching partition metadata using partition filters, we use the column 
type of the table directly. However, char/varchar types can carry extra 
information, such as the length of the char/varchar column, and because of this 
extra information the partition metadata fetch is skipped.

Solution: Use the basename of the column type when deciding whether partition 
pruning can be done on the partitioned column.
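The idea can be sketched as follows (function name is illustrative; the real change lives in the metastore's partition-filter code path):

```python
import re

def base_type_name(col_type):
    """Strip type parameters so parameterized and bare type names compare
    equal, e.g. 'varchar(10)' and 'varchar'."""
    return re.sub(r"\s*\(.*\)\s*$", "", col_type.strip().lower())

print(base_type_name("varchar(10)"))   # varchar
print(base_type_name("decimal(10,2)")) # decimal
```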





[jira] [Created] (HIVE-27167) Upgrade guava version in standalone-metastore and storage-api module

2023-03-23 Thread Raghav Aggarwal (Jira)
Raghav Aggarwal created HIVE-27167:
--

 Summary: Upgrade guava version in standalone-metastore and 
storage-api module
 Key: HIVE-27167
 URL: https://issues.apache.org/jira/browse/HIVE-27167
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore, storage-api
Affects Versions: 4.0.0-alpha-2
Reporter: Raghav Aggarwal
Assignee: Raghav Aggarwal


The guava version in standalone-metastore and storage-api (i.e. 19.0) is not in 
sync with the parent pom.xml (i.e. 22.0).





[jira] [Created] (HIVE-27166) Introduce Apache Commons DBUtils to handle boilerplate code

2023-03-22 Thread KIRTI RUGE (Jira)
KIRTI RUGE created HIVE-27166:
-

 Summary: Introduce Apache Commons DBUtils to handle boilerplate 
code
 Key: HIVE-27166
 URL: https://issues.apache.org/jira/browse/HIVE-27166
 Project: Hive
  Issue Type: Improvement
Reporter: KIRTI RUGE


Apache Commons DbUtils is a small library that makes working with JDBC a lot 
easier.

The current scope of this Jira is to introduce the latest version of Apache 
Commons DbUtils for the applicable methods in the TxnHandler and 
CompactionTxnHandler classes.





[jira] [Created] (HIVE-27165) PART_COL_STATS metastore query not hitting the index

2023-03-21 Thread Hongdan Zhu (Jira)
Hongdan Zhu created HIVE-27165:
--

 Summary: PART_COL_STATS metastore query not hitting the index
 Key: HIVE-27165
 URL: https://issues.apache.org/jira/browse/HIVE-27165
 Project: Hive
  Issue Type: Improvement
Reporter: Hongdan Zhu


The query located here:
[https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java#L1029-L1032]

is not hitting an index.  The index contains CAT_NAME whereas this query does 
not. This was a change made in Hive 3.0, I think.





[jira] [Created] (HIVE-27164) Create Temp Txn Table As Select is failing at tablePath validation

2023-03-21 Thread Naresh P R (Jira)
Naresh P R created HIVE-27164:
-

 Summary: Create Temp Txn Table As Select is failing at tablePath 
validation
 Key: HIVE-27164
 URL: https://issues.apache.org/jira/browse/HIVE-27164
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, Metastore
Reporter: Naresh P R
 Attachments: mm_cttas.q

After HIVE-25303, every CTAS makes a 
HiveMetaStore$HMSHandler#translate_table_dryrun() call to fetch the table 
location, which fails with the following exception for temp tables if 
MetastoreDefaultTransformer is set.
{code:java}
2023-03-17 16:41:23,390 INFO  
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer: 
[pool-6-thread-196]: Starting translation for CreateTable for processor 
HMSClient-@localhost with [EXTWRITE, EXTREAD, HIVEBUCKET2, HIVEFULLACIDREAD, 
HIVEFULLACIDWRITE, HIVECACHEINVALIDATE, HIVEMANAGESTATS, 
HIVEMANAGEDINSERTWRITE, HIVEMANAGEDINSERTREAD, HIVESQL, HIVEMQT, 
HIVEONLYMQTWRITE] on table test_temp
2023-03-17 16:41:23,392 ERROR 
org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-6-thread-196]: 
MetaException(message:Illegal location for managed table, it has to be within 
database's managed location)
        at 
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.validateTablePaths(MetastoreDefaultTransformer.java:886)
        at 
org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer.transformCreateTable(MetastoreDefaultTransformer.java:666)
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.translate_table_dryrun(HiveMetaStore.java:2164)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {code}
I am able to reproduce this issue on apache upstream using the attached testcase.

 

There are multiple ways to fix this issue:
 * Have the temp txn table path under the db's managed location path. This will 
help with encryption zone tables as well.
 * Skip the location check for temp tables in 
MetastoreDefaultTransformer#validateTablePaths().





[jira] [Created] (HIVE-27163) Column stats not getting published after an insert query into an external table with custom location

2023-03-21 Thread Taraka Rama Rao Lethavadla (Jira)
Taraka Rama Rao Lethavadla created HIVE-27163:
-

 Summary: Column stats not getting published after an insert query 
into an external table with custom location
 Key: HIVE-27163
 URL: https://issues.apache.org/jira/browse/HIVE-27163
 Project: Hive
  Issue Type: Bug
  Components: Hive
Reporter: Taraka Rama Rao Lethavadla


Test case details are below


*test.q*
{noformat}
set hive.stats.column.autogather=true;
set hive.stats.autogather=true;
dfs ${system:test.dfs.mkdir} ${system:test.tmp.dir}/test;
create external table test_custom(age int, name string) stored as orc location 
'/tmp/test';
insert into test_custom select 1, 'test';
desc formatted test_custom age;{noformat}

*test.q.out*

 

 
{noformat}
 A masked pattern was here 
PREHOOK: type: CREATETABLE
 A masked pattern was here 
PREHOOK: Output: database:default
PREHOOK: Output: default@test_custom
 A masked pattern was here 
POSTHOOK: type: CREATETABLE
 A masked pattern was here 
POSTHOOK: Output: database:default
POSTHOOK: Output: default@test_custom
PREHOOK: query: insert into test_custom select 1, 'test'
PREHOOK: type: QUERY
PREHOOK: Input: _dummy_database@_dummy_table
PREHOOK: Output: default@test_custom
POSTHOOK: query: insert into test_custom select 1, 'test'
POSTHOOK: type: QUERY
POSTHOOK: Input: _dummy_database@_dummy_table
POSTHOOK: Output: default@test_custom
POSTHOOK: Lineage: test_custom.age SIMPLE []
POSTHOOK: Lineage: test_custom.name SIMPLE []
PREHOOK: query: desc formatted test_custom age
PREHOOK: type: DESCTABLE
PREHOOK: Input: default@test_custom
POSTHOOK: query: desc formatted test_custom age
POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@test_custom
col_name                age
data_type               int
min
max
num_nulls
distinct_count
avg_col_len
max_col_len
num_trues
num_falses
bit_vector
comment                 from deserializer{noformat}
As we can see from the desc formatted output, the column stats were not populated.

 





[jira] [Created] (HIVE-27162) Unify HiveUnixTimestampSqlOperator and HiveToUnixTimestampSqlOperator

2023-03-21 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27162:
--

 Summary: Unify HiveUnixTimestampSqlOperator and 
HiveToUnixTimestampSqlOperator
 Key: HIVE-27162
 URL: https://issues.apache.org/jira/browse/HIVE-27162
 Project: Hive
  Issue Type: Task
  Components: CBO
Reporter: Stamatis Zampetakis


The two classes below both represent the {{unix_timestamp}} operator and have 
identical implementations.
* 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java
* 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java

Probably there is a way to use one or the other and not both; having two ways 
of representing the same thing can cause various problems in query planning, 
and it also leads to code duplication.





[jira] [Created] (HIVE-27161) MetaException when executing CTAS query in Druid storage handler

2023-03-21 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27161:
--

 Summary: MetaException when executing CTAS query in Druid storage 
handler
 Key: HIVE-27161
 URL: https://issues.apache.org/jira/browse/HIVE-27161
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis


Any kind of CTAS query targeting the Druid storage handler fails with the 
following exception:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:LOCATION may not be specified for Druid)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1347) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1352) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:158)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:116)
 ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:228) 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) 
~[hive-cli-4.0.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) 
~[hive-cli-4.0.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) 
~[hive-cli-4.0.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) 
~[hive-cli-4.0.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) 
~[hive-cli-4.0.0-SNAPSHOT.jar:?]
at 
org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.initDataset(QTestDatasetHandler.java:86)
 ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.beforeTest(QTestDatasetHandler.java:190)
 ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79)
 ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.QTestUtil.cliInit(QTestUtil.java:607) 
~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:112)
 ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) 
~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at 
org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:60)
 ~[test-classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_261]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_261]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_261]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 ~[junit-4.13.2.jar:4.13.2]
at

[jira] [Created] (HIVE-27160) Iceberg: Optimise delete (entire) data from table

2023-03-21 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-27160:
-

 Summary: Iceberg: Optimise delete (entire) data from table
 Key: HIVE-27160
 URL: https://issues.apache.org/jira/browse/HIVE-27160
 Project: Hive
  Issue Type: Task
Reporter: Denys Kuzmenko


Currently, in MOR mode, Hive creates "positional delete" files during deletes. 
With "delete from", the entire dataset in the table or partition is written as 
a "positional delete" file.

During the read operation, all these files are read again, causing a huge delay.

Proposal: apply the "truncate" optimization in the case of "delete *".
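The proposed rule can be sketched as follows (function name and return values are illustrative, not Hive's or Iceberg's actual API):

```python
def plan_delete(predicate, partition_filter=None):
    """Hypothetical planner rule: an unfiltered DELETE (no WHERE clause and
    no partition filter) becomes a metadata-only truncate instead of writing
    positional delete files covering every row."""
    if predicate is None and partition_filter is None:
        return "TRUNCATE"           # drop data files via a metadata operation
    return "POSITIONAL_DELETE"      # MOR: write positional delete files as today

print(plan_delete(None))      # TRUNCATE
print(plan_delete("id = 5"))  # POSITIONAL_DELETE
```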





[jira] [Created] (HIVE-27159) Filters are not pushed down for decimal format in Parquet

2023-03-20 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27159:
---

 Summary: Filters are not pushed down for decimal format in Parquet
 Key: HIVE-27159
 URL: https://issues.apache.org/jira/browse/HIVE-27159
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


Decimal filters are not created and pushed down in the Parquet readers. This 
causes latency delays and unwanted row processing during query execution.

It throws an exception at runtime and processes more rows.

E.g. Q13:

{noformat}

Parquet: (Map 1)

INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   
INPUT_RECORDS   OUTPUT_RECORDS
INFO  : 
--
INFO  :  Map 1  31254.00  0  0 
549,181,950  133
INFO  :  Map 3  0.00  0  0  
73,049  365
INFO  :  Map 4   2027.00  0  0   
6,000,0001,689,919
INFO  :  Map 5  0.00  0  0   
7,2001,440
INFO  :  Map 6517.00  0  0   
1,920,800  493,920
INFO  :  Map 7  0.00  0  0   
1,0021,002
INFO  :  Reducer 2  18716.00  0  0 
1330
INFO  : 
--

ORC:


INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES  DURATION(ms)   CPU_TIME(ms)GC_TIME(ms)   
INPUT_RECORDS   OUTPUT_RECORDS
INFO  : 
--
INFO  :  Map 1   6556.00  0  0 
267,146,063  152
INFO  :  Map 3  0.00  0  0  
10,000  365
INFO  :  Map 4   2014.00  0  0   
6,000,0001,689,919
INFO  :  Map 5  0.00  0  0   
7,2001,440
INFO  :  Map 6504.00  0  0   
1,920,800  493,920
INFO  :  Reducer 2   3159.00  0  0 
1520
INFO  : 
--

{noformat}




{noformat}
 Map 1
Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: (ss_hdemo_sk is not null and ss_addr_sk is not 
null and ss_cdemo_sk is not null and ss_store_sk is not null and 
((ss_sales_price >= 100) or (ss_sales_price <= 150) or (ss_sales_price >= 50) 
or (ss_sales_price <= 100) or (ss_sales_price >= 150) or (ss_sales_price <= 
200)) and ((ss_net_profit >= 100) or (ss_net_profit <= 200) or (ss_net_profit 
>= 150) or (ss_net_profit <= 300) or (ss_net_profit >= 50) or (ss_net_profit <= 
250))) (type: boolean)
  probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_112_container, 
bigKeyColName:ss_hdemo_sk, smallTablePos:1, keyRatio:5.042575832290721E-6
  Statistics: Num rows: 2750380056 Data size: 1321831086472 
Basic stats: COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: (ss_hdemo_sk is not null and ss_addr_sk is not 
null and ss_cdemo_sk is not null and ss_store_sk is not null and 
((ss_sales_price >= 100) or (ss_sales_price <= 150) or (ss_sales_price >= 50) 
or (ss_sales_price <= 100) or (ss_sales_price >= 150) or (ss_sales_price <= 
200)) and ((ss_net_profit >= 100) or (ss_net_profit <= 200) or (ss_net_profit 
>= 150) or (ss_net_profit <= 300) or (ss_net_profit >= 50) or (ss_net_profit <= 
250))) (type: boolean)
Statistics: Num rows: 2500252205 Data size: 1201619783884 
Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
  expressions: ss_cdemo_sk (type: bigint), ss_hdemo_sk 
(type: bigint), ss_addr_sk (type: bigint), ss_store_sk (type: bigint), 
ss_quantity (type: int), ss_ext_sales_price (type: decimal(7,2)), 
ss_ext_wholesale_cost (type: decimal(7,2)), ss_sold_date_sk (type: bigint), 
ss_net_profit BETWEEN 100 AND 200 (type: boolean), ss_net_profit BETWEEN 150 
AND 300 (type: boolean), ss_net_profit BETWEEN 50 AND 250 (type: boolean), 
ss_sales_price BETWEEN 100 AND 150 (type: boolean), ss_sales_price BETWEEN 50 
AN

[jira] [Created] (HIVE-27158) Store hive columns stats in puffin files for iceberg tables

2023-03-20 Thread Simhadri Govindappa (Jira)
Simhadri Govindappa created HIVE-27158:
--

 Summary: Store hive columns stats in puffin files for iceberg 
tables
 Key: HIVE-27158
 URL: https://issues.apache.org/jira/browse/HIVE-27158
 Project: Hive
  Issue Type: Improvement
Reporter: Simhadri Govindappa
Assignee: Simhadri Govindappa








[jira] [Created] (HIVE-27157) AssertionError when inferring return type for unix_timestamp function

2023-03-20 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27157:
--

 Summary: AssertionError when inferring return type for 
unix_timestamp function
 Key: HIVE-27157
 URL: https://issues.apache.org/jira/browse/HIVE-27157
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Any attempt to derive the return data type for the {{unix_timestamp}} function 
results in the following assertion error.
{noformat}
java.lang.AssertionError: typeName.allowsPrecScale(true, false): BIGINT
at 
org.apache.calcite.sql.type.BasicSqlType.checkPrecScale(BasicSqlType.java:65)
at org.apache.calcite.sql.type.BasicSqlType.(BasicSqlType.java:81)
at 
org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:67)
at 
org.apache.calcite.sql.fun.SqlAbstractTimeFunction.inferReturnType(SqlAbstractTimeFunction.java:78)
at 
org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:278)
{noformat}
due to a faulty implementation of type inference for the respective operators:
 * 
[https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java]
 * 
[https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java]

Although at this stage in master it is not possible to reproduce the problem 
with an actual SQL query, the buggy implementation must be fixed, since slight 
changes in the code/CBO rules may lead to code paths relying on 
{{SqlOperator.inferReturnType}}.

Note that in older versions of Hive it is possible to hit the AssertionError in 
various ways. For example in Hive 3.1.3 (and older), the error may come from 
[HiveRelDecorrelator|https://github.com/apache/hive/blob/4df4d75bf1e16fe0af75aad0b4179c34c07fc975/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1933]
 in the presence of sub-queries.





[jira] [Created] (HIVE-27156) Wrong results when CAST timestamp literal with timezone to TIMESTAMP

2023-03-20 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27156:
--

 Summary: Wrong results when CAST timestamp literal with timezone 
to TIMESTAMP
 Key: HIVE-27156
 URL: https://issues.apache.org/jira/browse/HIVE-27156
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Casting a timestamp literal with an invalid timezone to the TIMESTAMP datatype 
results in a timestamp with the time part truncated to midnight (00:00:00).

*Case I*
{code:sql}
select cast('2020-06-28 22:17:33.123456 Europe/Amsterd' as timestamp);
{code}

+Actual+
|2020-06-28 00:00:00|

+Expected+
|NULL/ERROR/2020-06-28 22:17:33.123456|

*Case II*
{code:sql}
select cast('2020-06-28 22:17:33.123456 Invalid/Zone' as timestamp);
{code}

+Actual+
|2020-06-28 00:00:00|

+Expected+
|NULL/ERROR/2020-06-28 22:17:33.123456|

The existing documentation does not cover what the output should be in the 
cases above:
* 
https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-TimestampstimestampTimestamps
* https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types

*Case III*
Another subtle but important case is the following, where the timestamp literal 
has a valid timezone but we are attempting a cast to a datatype that does not 
store the timezone.

{code:sql}
select cast('2020-06-28 22:17:33.123456 Europe/Amsterdam' as timestamp);
{code}

+Actual+
|2020-06-28 22:17:33.123456|

The correctness of the last result is debatable, since one might expect a 
NULL or an ERROR instead.
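For illustration, a Python sketch of zone validation that yields NULL instead of midnight truncation (the parsing format, the function name, and the None-for-NULL convention are assumptions, not Hive's implementation):

```python
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def parse_ts_with_zone(text):
    """Parse '<timestamp> <zone-id>'; return None (i.e. NULL) when the zone
    id does not resolve, instead of silently truncating to midnight."""
    ts_part, _, zone = text.rpartition(" ")
    try:
        tz = ZoneInfo(zone)
    except (ZoneInfoNotFoundError, ValueError, KeyError):
        return None  # invalid zone -> NULL rather than '... 00:00:00'
    return datetime.strptime(ts_part, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=tz)

print(parse_ts_with_zone("2020-06-28 22:17:33.123456 Invalid/Zone"))  # None
```

Validating the zone explicitly makes Case I and Case II detectable instead of silently producing a truncated value.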





[jira] [Created] (HIVE-27155) Iceberg: Vectorize virtual columns

2023-03-20 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-27155:
-

 Summary: Iceberg: Vectorize virtual columns
 Key: HIVE-27155
 URL: https://issues.apache.org/jira/browse/HIVE-27155
 Project: Hive
  Issue Type: Task
Reporter: Denys Kuzmenko


Vectorization gets disabled at runtime with the following reason: 
{code}
Select expression for SELECT operator: Virtual column PARTITION__SPEC__ID is 
not supported
{code}





[jira] [Created] (HIVE-27154) Fix testBootstrapReplLoadRetryAfterFailureForPartitions

2023-03-19 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27154:
--

 Summary: Fix testBootstrapReplLoadRetryAfterFailureForPartitions
 Key: HIVE-27154
 URL: https://issues.apache.org/jira/browse/HIVE-27154
 Project: Hive
  Issue Type: Sub-task
Reporter: Vihang Karajgaonkar
Assignee: Vihang Karajgaonkar


`testBootstrapReplLoadRetryAfterFailureForPartitions` has been failing on 
branch-3

 

http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4067/12/tests





[jira] [Created] (HIVE-27153) Revert "HIVE-20182: Backport HIVE-20067 to branch-3"

2023-03-19 Thread Aman Raj (Jira)
Aman Raj created HIVE-27153:
---

 Summary: Revert "HIVE-20182: Backport HIVE-20067 to branch-3"
 Key: HIVE-27153
 URL: https://issues.apache.org/jira/browse/HIVE-27153
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj


The mm_all.q test is failing because of this commit, which was not validated 
before being merged.

There is no stack trace for this exception. Link to the exception : 
[http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4126/2/tests]

 
{code:java}
java.lang.AssertionError: Client execution failed with error code = 1 running 
"insert into table part_mm_n0 partition(key_mm=455) select key from 
intermediate_n0" fname=mm_all.q See ./ql/target/tmp/log/hive.log or 
./itests/qtest/target/tmp/log/hive.log, or check ./ql/target/surefire-reports 
or ./itests/qtest/target/surefire-reports/ for specific test cases logs.at 
org.junit.Assert.fail(Assert.java:88)at 
org.apache.hadoop.hive.ql.QTestUtil.failed(QTestUtil.java:2232)  at 
org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:180)
 at 
org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)   at 
org.apache.hadoop.hive.cli.split1.TestMiniLlapCliDriver.testCliDriver(TestMiniLlapCliDriver.java:62)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)   
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) {code}





[jira] [Created] (HIVE-27152) Revert "Constant UDF is not pushed to JDBCStorage Handler"

2023-03-19 Thread Aman Raj (Jira)
Aman Raj created HIVE-27152:
---

 Summary: Revert "Constant UDF is not pushed to JDBCStorage Handler"
 Key: HIVE-27152
 URL: https://issues.apache.org/jira/browse/HIVE-27152
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj


current_date_timestamp.q - This change was committed in HIVE-21388 without 
validation.
The failure is, again, because Hive is not able to parse 
explain cbo select current_timestamp() from alltypesorc

 

Exception stack trace :
{code:java}
2023-03-16 04:06:17 Completed running task attempt: attempt_1678964507586_0001_175_01_00_0
2023-03-16 04:06:17 Completed Dag: dag_1678964507586_0001_175
TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.exec.Operator
TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.stats.fs.FSStatsPublisher
TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.stats.fs.FSStatsAggregator
NoViableAltException(24@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.explainStatement(HiveParser.java:1512)
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1407)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:230)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:79)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:72)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:617)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1854)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1801)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1796)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1474)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1448)
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177)
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
	at org.apache.hadoop.hive.cli.split12.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92)
	at org.junit.rules.RunRules.evaluate(RunRules.java:20) {code}





[jira] [Created] (HIVE-27151) Revert "HIVE-21685 Wrong simplification in query with multiple IN clauses"

2023-03-19 Thread Aman Raj (Jira)
Aman Raj created HIVE-27151:
---

 Summary: Revert "HIVE-21685 Wrong simplification in query with 
multiple IN clauses"
 Key: HIVE-27151
 URL: https://issues.apache.org/jira/browse/HIVE-27151
 Project: Hive
  Issue Type: Sub-task
Reporter: Aman Raj
Assignee: Aman Raj


The multi_in_clause.q test fails because Hive is not able to parse:
explain cbo
select * from very_simple_table_for_in_test where name IN('g','r') AND name 
IN('a','b')

If we want this to work, I was able to do it locally. We have 2 options:
a. Revert HIVE-21685, since this scenario was not validated back then before 
the test was added.
b. Cherry-pick the fix from https://issues.apache.org/jira/browse/HIVE-20718, 
but to do that we also need to cherry-pick 
https://issues.apache.org/jira/browse/HIVE-17040, since HIVE-20718 has a lot of 
merge conflicts with HIVE-17040. Even after cherry-picking these, there are 
other failures to fix.

I am reverting this ticket for now.

Exception stacktrace :

{code:java}
2023-03-16 12:33:11 Completed running task attempt: attempt_1678994907903_0001_185_01_00_0
2023-03-16 12:33:11 Completed Dag: dag_1678994907903_0001_185
TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.exec.Operator
TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.stats.fs.FSStatsPublisher
TRACE StatusLogger Log4jLoggerFactory.getContext() found anchor class org.apache.hadoop.hive.ql.stats.fs.FSStatsAggregator
NoViableAltException(24@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.explainStatement(HiveParser.java:1512)
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1407)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:230)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:79)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:72)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:617)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1854)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1801)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1796)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:214)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1474)
	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1448)
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:177)
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
	at org.apache.hadoop.hive.cli.split12.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:62)
 {code}





[jira] [Created] (HIVE-27150) Drop single partition can also support direct sql

2023-03-19 Thread Wechar (Jira)
Wechar created HIVE-27150:
-

 Summary: Drop single partition can also support direct sql
 Key: HIVE-27150
 URL: https://issues.apache.org/jira/browse/HIVE-27150
 Project: Hive
  Issue Type: Improvement
  Components: Hive
Reporter: Wechar
Assignee: Wechar


*Background:*
[HIVE-6980|https://issues.apache.org/jira/browse/HIVE-6980] added direct SQL 
support for drop_partitions; we can reuse this huge improvement in 
drop_partition as well.





[jira] [Created] (HIVE-27149) StorageHandler PPD query planning statistics not adjusted for residualPredicate

2023-03-16 Thread Yi Zhang (Jira)
Yi Zhang created HIVE-27149:
---

 Summary: StorageHandler PPD query planning statistics not adjusted 
for residualPredicate
 Key: HIVE-27149
 URL: https://issues.apache.org/jira/browse/HIVE-27149
 Project: Hive
  Issue Type: Bug
  Components: StorageHandler
Affects Versions: 4.0.0-alpha-2
Reporter: Yi Zhang


In StorageHandler PPD, filter predicates can be pushed down to storage, leaving 
only a subset behind as the residualPredicate. However, the query-planning 
statistics consider only the 'final' residual predicates, when in fact the 
pushed predicates should also be taken into account. This affects reducer 
parallelism (more reducers are launched than needed).





[jira] [Created] (HIVE-27148) Disable TestJdbcGenericUDTFGetSplits

2023-03-16 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27148:
--

 Summary: Disable TestJdbcGenericUDTFGetSplits
 Key: HIVE-27148
 URL: https://issues.apache.org/jira/browse/HIVE-27148
 Project: Hive
  Issue Type: Sub-task
  Components: Tests
Reporter: Vihang Karajgaonkar


TestJdbcGenericUDTFGetSplits is flaky and intermittently fails.

http://ci.hive.apache.org/job/hive-flaky-check/614/





[jira] [Created] (HIVE-27147) HS2 is not accessible to clients via zookeeper when hostname used is not FQDN

2023-03-16 Thread Venugopal Reddy K (Jira)
Venugopal Reddy K created HIVE-27147:


 Summary: HS2 is not accessible to clients via zookeeper when 
hostname used is not FQDN
 Key: HIVE-27147
 URL: https://issues.apache.org/jira/browse/HIVE-27147
 Project: Hive
  Issue Type: Bug
Reporter: Venugopal Reddy K


HS2 is not accessible to clients via ZooKeeper when the hostname used during 
registration comes from InetAddress.getHostName() on JDK 11. This happens due 
to a behavior change in JDK 11:

[https://stackoverflow.com/questions/61898627/inetaddress-getlocalhost-gethostname-different-behavior-between-jdk-11-and-j]





[jira] [Created] (HIVE-27146) Re-enable orc_merge*.q tests for TestMiniSparkOnYarnCliDriver

2023-03-16 Thread Vihang Karajgaonkar (Jira)
Vihang Karajgaonkar created HIVE-27146:
--

 Summary: Re-enable orc_merge*.q tests for 
TestMiniSparkOnYarnCliDriver
 Key: HIVE-27146
 URL: https://issues.apache.org/jira/browse/HIVE-27146
 Project: Hive
  Issue Type: Test
Reporter: Vihang Karajgaonkar


It was found that the q.out files for these tests fail with a diff in the 
replication factor of the files. The tests only fail on the CI job, so it is 
possibly due to test environment issues. The tests also fail on the 
3.1.3 release.

E.g orc_merge4.q fails with the error. Similarly the other tests fail with the 
same difference in replication factor.
{code:java}
40c40
< -rw-r--r--   1 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### 
hdfs://### HDFS PATH ###
---
> -rw-r--r--   3 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### 
> hdfs://### HDFS PATH ###
66c66
< -rw-r--r--   1 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### 
hdfs://### HDFS PATH ###
---
> -rw-r--r--   3 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### 
> hdfs://### HDFS PATH ###
68c68
< -rw-r--r--   1 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### 
hdfs://### HDFS PATH ###
---
> -rw-r--r--   3 ### USER ### ### GROUP ###   2530 ### HDFS DATE ### 
> hdfs://### HDFS PATH ###
{code}





[jira] [Created] (HIVE-27145) Use StrictMath for remaining Math functions as followup of HIVE-23133

2023-03-16 Thread Himanshu Mishra (Jira)
Himanshu Mishra created HIVE-27145:
--

 Summary: Use StrictMath for remaining Math functions as followup 
of HIVE-23133
 Key: HIVE-27145
 URL: https://issues.apache.org/jira/browse/HIVE-27145
 Project: Hive
  Issue Type: Task
  Components: UDF
Reporter: Himanshu Mishra
Assignee: Himanshu Mishra


[HIVE-23133|https://issues.apache.org/jira/browse/HIVE-23133] started using 
{{StrictMath}} for the {{cos, exp, log}} UDFs to fix qtests that failed because 
Math library results vary with the underlying hardware.

Follow this up by using {{StrictMath}} for the other Math functions with the 
same hardware dependence, namely {{sin, tan, asin, acos, atan, sqrt, 
pow, cbrt}}.

[JDK-4477961|https://bugs.openjdk.org/browse/JDK-4477961] (in Java 9) changed 
the radians and degrees calculation, leading to qtest failures when the tests 
run on Java 9+; fix such tests as well.
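For illustration only (not Hive code; the class and helper below are hypothetical), a minimal sketch of why StrictMath matters here: StrictMath delegates to the fdlibm algorithms, so results are bit-identical on every JVM and CPU, while most Math methods are only required to be within a small ulp error and may differ across platforms.

```java
public class StrictMathDemo {
    // Hypothetical helper: because every call goes through StrictMath, the
    // returned bit pattern is identical on any hardware; the same expression
    // written with Math could differ in the last bits across platforms.
    static long reproducibleBits(double x) {
        double v = StrictMath.pow(StrictMath.sin(x), 2)
                 + StrictMath.pow(StrictMath.cos(x), 2);
        return Double.doubleToLongBits(v);
    }

    public static void main(String[] args) {
        // sqrt is required to be correctly rounded in both Math and StrictMath,
        // so these always agree; that guarantee does NOT extend to sin, tan,
        // asin, acos, atan, pow, or cbrt, which is why qtest outputs can drift.
        System.out.println(Math.sqrt(2.0) == StrictMath.sqrt(2.0)); // true
        System.out.println(Long.toHexString(reproducibleBits(1.0)));
    }
}
```

This is why pinning the listed UDFs to StrictMath makes q.out files stable across test machines.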





[jira] [Created] (HIVE-27144) Alter table partitions need not DBNotificationListener for external tables

2023-03-15 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-27144:
---

 Summary: Alter table partitions need not DBNotificationListener 
for external tables
 Key: HIVE-27144
 URL: https://issues.apache.org/jira/browse/HIVE-27144
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Rajesh Balamohan


DBNotificationListener may not be needed for external tables.

Even "analyze table blah compute statistics for columns" on an external 
partitioned table invokes DBNotificationListener for all partitions.


{noformat}
at org.datanucleus.store.query.Query.execute(Query.java:1726)
  at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)
  at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
  at org.apache.hadoop.hive.metastore.ObjectStore.addNotificationEvent(ObjectStore.java:11774)
  at jdk.internal.reflect.GeneratedMethodAccessor135.invoke(Unknown Source)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566)
  at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
  at com.sun.proxy.$Proxy33.addNotificationEvent(Unknown Source)
  at org.apache.hive.hcatalog.listener.DbNotificationListener.process(DbNotificationListener.java:1308)
  at org.apache.hive.hcatalog.listener.DbNotificationListener.onAlterPartition(DbNotificationListener.java:458)
  at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$14.notify(MetaStoreListenerNotifier.java:161)
  at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:328)
  at org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:390)
  at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:863)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:6253)
  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_req(HiveMetaStore.java:6201)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.18/Native Method)
  at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.18/NativeMethodAccessorImpl.java:62)
  at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.18/DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(java.base@11.0.18/Method.java:566)
  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:160)
  at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:121)
  at com.sun.proxy.$Proxy34.alter_partitions_req(Unknown Source)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:21532)
  at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_partitions_req.getResult(ThriftHiveMetastore.java:21511)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:652)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:647)
  at java.security.AccessController.doPrivileged(java.base@11.0.18/Native Method)
{noformat}





[jira] [Created] (HIVE-27143) Improve HCatStorer move task

2023-03-15 Thread Yi Zhang (Jira)
Yi Zhang created HIVE-27143:
---

 Summary: Improve HCatStorer move task
 Key: HIVE-27143
 URL: https://issues.apache.org/jira/browse/HIVE-27143
 Project: Hive
  Issue Type: Improvement
  Components: HCatalog
Affects Versions: 3.1.3
Reporter: Yi Zhang


moveTask in HCatalog is inefficient: it makes two passes (a dry run and the 
actual execution), and it runs sequentially. This can be improved.





[jira] [Created] (HIVE-27142) Map Join not working as expected when joining non-native tables with native tables

2023-03-15 Thread Syed Shameerur Rahman (Jira)
Syed Shameerur Rahman created HIVE-27142:


 Summary:  Map Join not working as expected when joining non-native 
tables with native tables
 Key: HIVE-27142
 URL: https://issues.apache.org/jira/browse/HIVE-27142
 Project: Hive
  Issue Type: Bug
Affects Versions: All Versions
Reporter: Syed Shameerur Rahman
Assignee: Syed Shameerur Rahman
 Fix For: 4.0.0


*1. Issue :*

When *_hive.auto.convert.join=true_* and the query joins a large non-native 
Hive table with a small native Hive table, the map join is placed on the wrong 
side, i.e., in the map tasks that process the small native table. This can lead 
to OOM when the non-native table is very large and only a few map tasks are 
spawned to scan the small native table.

 

*2. Why is this happening ?*

This happens due to improper stats collection/computation for non-native Hive 
tables. The data of a non-native table is stored in a location Hive does not 
know about, and the temporary path visible to Hive when the table is created 
does not hold the actual data. The stats collection logic therefore tends to 
underestimate the data size/row count, which causes the map join to be placed 
on the wrong side.

 

*3. Potential Solutions*

 3.1  Turn off *_hive.auto.convert.join._* This can hurt queries that perform 
multiple joins, i.e., one join involving a non-native table and another where 
both tables are native.

 3.2 Compute stats for the non-native table by running the ANALYZE TABLE <> 
command before joining native and non-native tables. The user may or may not 
choose to do this.

 3.3 Don't collect/estimate stats for non-native Hive tables by default 
(preferred solution)





[jira] [Created] (HIVE-27141) Iceberg: Add more iceberg table metadata

2023-03-14 Thread zhangbutao (Jira)
zhangbutao created HIVE-27141:
-

 Summary: Iceberg:  Add more iceberg table metadata
 Key: HIVE-27141
 URL: https://issues.apache.org/jira/browse/HIVE-27141
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Affects Versions: 4.0.0-alpha-2
Reporter: zhangbutao








[jira] [Created] (HIVE-27140) Set HADOOP_PROXY_USER cause hiveMetaStoreClient close everytime

2023-03-14 Thread chenruotao (Jira)
chenruotao created HIVE-27140:
-

 Summary: Set HADOOP_PROXY_USER cause  hiveMetaStoreClient close 
everytime
 Key: HIVE-27140
 URL: https://issues.apache.org/jira/browse/HIVE-27140
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 3.1.2, 2.3.8
Reporter: chenruotao
Assignee: chenruotao








[jira] [Created] (HIVE-27139) Log details when hiveserver2.sh doing sanity check with the process id

2023-03-14 Thread Zhihua Deng (Jira)
Zhihua Deng created HIVE-27139:
--

 Summary: Log details when hiveserver2.sh doing sanity check with 
the process id
 Key: HIVE-27139
 URL: https://issues.apache.org/jira/browse/HIVE-27139
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Zhihua Deng


HiveServer2 always persists its process id into a file after HIVE-22193. When 
some other process reuses the same pid, restarting HiveServer2 fails. Print the 
details of the offending process in that case, and delete the old pid file when 
HiveServer2 is decommissioned.





[jira] [Created] (HIVE-27138) MapJoinOperator throws NPE when computing OuterJoin with filter expressions on small table

2023-03-14 Thread Seonggon Namgung (Jira)
Seonggon Namgung created HIVE-27138:
---

 Summary: MapJoinOperator throws NPE when computing OuterJoin with 
filter expressions on small table
 Key: HIVE-27138
 URL: https://issues.apache.org/jira/browse/HIVE-27138
 Project: Hive
  Issue Type: Bug
Reporter: Seonggon Namgung
Assignee: Seonggon Namgung


Hive throws an NPE when running mapjoin_filter_on_outerjoin.q on the Tez engine 
(observed with TestMiniLlapCliDriver).
The NPE is thrown by CommonJoinOperator.getFilterTag(), which simply retrieves 
the last object from the given list.
To the best of my knowledge, if Hive selects a map join to perform the join, 
the filterTag should be computed and appended to a row before the row is passed 
to MapJoinOperator.
With the MapReduce engine this is done by HashTableSinkOperator.
However, I cannot find any logic preparing the filterTag for small tables when 
Hive uses the Tez engine.

I think there are 2 available options:
1. Don't use MapJoinOperator if a small table has a filter expression.
2. Add new logic that computes and passes the filterTag to MapJoinOperator.

I am working on the second option and am ready to discuss it.
I would be grateful for any opinions on this issue.





[jira] [Created] (HIVE-27137) Remove HIVE_IN_TEST_ICEBERG flag

2023-03-14 Thread Zsolt Miskolczi (Jira)
Zsolt Miskolczi created HIVE-27137:
--

 Summary: Remove HIVE_IN_TEST_ICEBERG flag
 Key: HIVE-27137
 URL: https://issues.apache.org/jira/browse/HIVE-27137
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: Zsolt Miskolczi


Remove the HIVE_IN_TEST_ICEBERG flag from the production code.

Remove the code snippet from TxnHandler and update the unit tests that expect 
the exception:
{code:java}
if (lc.isSetOperationType() && lc.getOperationType() == DataOperationType.UNSET &&
    ((MetastoreConf.getBoolVar(conf, ConfVars.HIVE_IN_TEST) ||
      MetastoreConf.getBoolVar(conf, ConfVars.HIVE_IN_TEZ_TEST)) &&
     !MetastoreConf.getBoolVar(conf, ConfVars.HIVE_IN_TEST_ICEBERG))) {
{code}





[jira] [Created] (HIVE-27136) Backport HIVE-27129 to branch-3

2023-03-13 Thread Junlin Zeng (Jira)
Junlin Zeng created HIVE-27136:
--

 Summary: Backport HIVE-27129 to branch-3
 Key: HIVE-27136
 URL: https://issues.apache.org/jira/browse/HIVE-27136
 Project: Hive
  Issue Type: Improvement
Reporter: Junlin Zeng
Assignee: Junlin Zeng








[jira] [Created] (HIVE-27135) Cleaner fails with FileNotFoundException

2023-03-13 Thread Dayakar M (Jira)
Dayakar M created HIVE-27135:


 Summary: Cleaner fails with FileNotFoundException
 Key: HIVE-27135
 URL: https://issues.apache.org/jira/browse/HIVE-27135
 Project: Hive
  Issue Type: Bug
Reporter: Dayakar M
Assignee: Dayakar M


The compaction fails when the Cleaner tries to remove a missing directory from 
HDFS.
{code:java}
2023-03-06 07:45:48,331 ERROR org.apache.hadoop.hive.ql.txn.compactor.Cleaner: [Cleaner-executor-thread-12]: Caught exception when cleaning, unable to complete cleaning of id:39762523,dbname:ramas04_hk_ch,tableName:wsinvoicepart,partName:null,state:,type:MINOR,enqueueTime:0,start:0,properties:null,runAs:hive,tooManyAborts:false,hasOldAbort:false,highestWriteId:989,errorMessage:null,workerId: null,initiatorId: null
java.io.FileNotFoundException: File hdfs://OnPrem-P-Se-DL-01/warehouse/tablespace/managed/hive/ramas04_hk_ch.db/wsinvoicepart/.hive-staging_hive_2023-03-06_07-45-23_120_4659605113266849995-73550 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144)
    at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2332)
    at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2309)
    at org.apache.hadoop.util.functional.RemoteIterators$WrappingRemoteIterator.sourceHasNext(RemoteIterators.java:432)
    at org.apache.hadoop.util.functional.RemoteIterators$FilteringRemoteIterator.fetch(RemoteIterators.java:581)
    at org.apache.hadoop.util.functional.RemoteIterators$FilteringRemoteIterator.hasNext(RemoteIterators.java:602)
    at org.apache.hadoop.hive.ql.io.AcidUtils.getHdfsDirSnapshots(AcidUtils.java:1435)
    at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.removeFiles(Cleaner.java:287)
    at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.clean(Cleaner.java:214)
    at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.lambda$run$0(Cleaner.java:114)
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil$ThrowingRunnable.lambda$unchecked$0(CompactorUtil.java:54)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750){code}

This issue was partly fixed as part of HIVE-26481, but not completely. 
[Here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java#L1541]
 the FileUtils.listFiles() API is used, which returns a RemoteIterator.
While iterating, if the iterator encounters a directory and recursive listing 
is enabled, it tries to list the files in that directory; but if that directory 
has been removed by another thread/task in the meantime, it throws 
FileNotFoundException. Here the removed directory is the .staging directory, 
which should have been excluded by the passed filter.

 

So we can instead use 
_*org.apache.hadoop.hive.common.FileUtils#listStatusRecursively()*_ 
[API|https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/FileUtils.java#L372],
 which applies the filter before listing the files of a directory and thereby 
avoids the FileNotFoundException.
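To illustrate the ordering point (plain java.nio is used here as an analogy; the class, method names, and directory layout are hypothetical, and the real Hadoop FileSystem API differs), applying the exclusion filter before descending means a concurrently-deleted .hive-staging directory is never entered:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class StagingFilterDemo {
    // Filter applied BEFORE recursing: staging dirs are skipped entirely,
    // so a racing delete of one can never surface as FileNotFoundException.
    static boolean notStaging(Path p) {
        return !p.getFileName().toString().startsWith(".hive-staging");
    }

    static List<Path> listRecursively(Path root) throws IOException {
        List<Path> out = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(root)) {
            for (Path p : ds) {
                if (!notStaging(p)) continue;            // filter first
                if (Files.isDirectory(p)) out.addAll(listRecursively(p));
                else out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("demo");
        Files.createFile(root.resolve("data.orc"));
        Path staging = Files.createDirectory(root.resolve(".hive-staging_hive_x"));
        Files.createFile(staging.resolve("tmp"));
        // Only data.orc is listed; the staging dir and its contents are skipped.
        System.out.println(listRecursively(root).size()); // prints 1
    }
}
```

The listFiles()-style iterator instead filters results after fetching them, which is why it can still descend into a staging directory that disappears mid-listing.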





[jira] [Created] (HIVE-27134) SharedWorkOptimizer merges TableScan operators that have different DPP parents

2023-03-11 Thread Sungwoo Park (Jira)
Sungwoo Park created HIVE-27134:
---

 Summary: SharedWorkOptimizer merges TableScan operators that have 
different DPP parents
 Key: HIVE-27134
 URL: https://issues.apache.org/jira/browse/HIVE-27134
 Project: Hive
  Issue Type: Sub-task
Reporter: Sungwoo Park








[jira] [Created] (HIVE-27133) Round off limit value greater than int_max to int_max;

2023-03-10 Thread vamshi kolanu (Jira)
vamshi kolanu created HIVE-27133:


 Summary: Round off limit value greater than int_max to int_max;
 Key: HIVE-27133
 URL: https://issues.apache.org/jira/browse/HIVE-27133
 Project: Hive
  Issue Type: Task
Reporter: vamshi kolanu
Assignee: vamshi kolanu


Currently, when the LIMIT clause has a bigint value, the query fails with the 
following error. As part of this task, any limit value greater than int_max 
will be rounded down to int_max.
select string_col from alltypes order by 1 limit 9223372036854775807
 
java.lang.NumberFormatException: For input string: "9223372036854775807"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:583)
at java.lang.Integer.<init>(Integer.java:867)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1803)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1911)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1911)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12616)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12718)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:450)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:299)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:650)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1503)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1450)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1445)
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
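The proposed rounding can be sketched as follows (illustration only; the class and method names are hypothetical, not the actual SemanticAnalyzer change): parse the LIMIT literal as a long and clamp anything above Integer.MAX_VALUE instead of letting Integer.parseInt throw.

```java
public class LimitClamp {
    // Hypothetical helper sketching the fix: values above int_max are
    // rounded down to Integer.MAX_VALUE rather than failing to parse.
    static int parseLimit(String literal) {
        long v = Long.parseLong(literal);       // bigint-range parse succeeds
        if (v < 0) {
            throw new IllegalArgumentException("LIMIT must be non-negative: " + v);
        }
        return (int) Math.min(v, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(parseLimit("9223372036854775807")); // prints 2147483647
        System.out.println(parseLimit("100"));                 // prints 100
    }
}
```

Clamping is semantically safe because no result set can exceed Integer.MAX_VALUE rows through a single LIMIT anyway.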





[jira] [Created] (HIVE-27132) backport HIVE-12371 to branch-3 hive-jdbc use global Driver loginTimeout

2023-03-10 Thread shalk (Jira)
shalk created HIVE-27132:


 Summary: backport HIVE-12371 to branch-3 hive-jdbc use global 
Driver loginTimeout
 Key: HIVE-27132
 URL: https://issues.apache.org/jira/browse/HIVE-27132
 Project: Hive
  Issue Type: Improvement
Reporter: shalk








[jira] [Created] (HIVE-27131) Remove empty module shims/scheduler

2023-03-09 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27131:
--

 Summary: Remove empty module shims/scheduler 
 Key: HIVE-27131
 URL: https://issues.apache.org/jira/browse/HIVE-27131
 Project: Hive
  Issue Type: Task
  Components: Shims
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


The module contains nothing more than a plain pom.xml, which does not seem to 
do anything special apart from bundling together some optional dependencies.

There is no source code, no tests, and no reason for the module to exist.

At some point it used to contain a few classes but these were removed 
progressively (e.g., HIVE-22398) leaving back an empty module.





[jira] [Created] (HIVE-27130) Add metrics to report the size of data replicated/copied to target

2023-03-08 Thread Amit Saonerkar (Jira)
Amit Saonerkar created HIVE-27130:
-

 Summary: Add metrics to report the size of data replicated/copied 
to target
 Key: HIVE-27130
 URL: https://issues.apache.org/jira/browse/HIVE-27130
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 3.2.0
Reporter: Amit Saonerkar


The corresponding CDPD Jira is 
[CDPD-45872|https://jira.cloudera.com/browse/CDPD-45872]





[jira] [Created] (HIVE-27129) Enhanced support to Hive Client http support

2023-03-08 Thread Junlin Zeng (Jira)
Junlin Zeng created HIVE-27129:
--

 Summary: Enhanced support to Hive Client http support
 Key: HIVE-27129
 URL: https://issues.apache.org/jira/browse/HIVE-27129
 Project: Hive
  Issue Type: Improvement
Reporter: Junlin Zeng
Assignee: Junlin Zeng


Currently we support HTTP for the Hive metastore connection. However, we do not 
support custom headers or a default trust store. This ticket tracks the work to 
improve the HTTP support.




