[jira] [Updated] (HIVE-28543) Previous snapshotId is stored in backend database for iceberg tables

2024-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28543:
--
Labels: pull-request-available  (was: )

> Previous snapshotId is stored in backend database for iceberg tables
> 
>
> Key: HIVE-28543
> URL: https://issues.apache.org/jira/browse/HIVE-28543
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: Iceberg integration
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28540) Special characters in user DN should be escaped when querying LDAP

2024-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28540:
--
Labels: pull-request-available  (was: )

> Special characters in user DN should be escaped when querying LDAP
> --
>
> Key: HIVE-28540
> URL: https://issues.apache.org/jira/browse/HIVE-28540
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Minor
>  Labels: pull-request-available
>
> When the user name has a comma in it and it is not escaped properly by Hive 
> when querying LDAP, the query fails.
> For example, if the given user DN is "user , name", the correctly escaped LDAP 
> query should contain: "user \5c, name".
> More details here:
> https://datatracker.ietf.org/doc/html/rfc4515
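
A minimal sketch, assuming a standalone escaper rather than Hive's actual LDAP code, of RFC 4515-style escaping for values embedded in a search filter: the reserved characters backslash, asterisk, parentheses and NUL are replaced with their \XX hex forms, which is how a backslash-escaped comma in a DN ends up as "\5c," inside the filter.

```java
// Illustrative escaper; class and method names are not Hive's actual API.
class LdapFilterEscaper {
    // Escape the characters RFC 4515 reserves in filter assertion values.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break;
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A DN such as "user \, name" has its backslash escaped for the filter.
        System.out.println(escape("user \\, name")); // user \5c, name
    }
}
```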





[jira] [Updated] (HIVE-27262) Hive metastore changelog for Authorization operations

2024-09-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27262:
--
Labels: pull-request-available  (was: )

> Hive metastore changelog for Authorization operations
> -
>
> Key: HIVE-27262
> URL: https://issues.apache.org/jira/browse/HIVE-27262
> Project: Hive
>  Issue Type: New Feature
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0-alpha-2
>Reporter: Bharath Krishna
>Priority: Major
>  Labels: pull-request-available
>
> IIUC, the Hive metastore doesn't provide a changelog (NOTIFICATION_LOG) for 
> authorization operations like GRANT, REVOKE, etc.
> I also assume that, in this case, Hive Replication doesn't replicate these 
> events, as they are not recorded as metastore events.
>  
> Is there any reason these events are not captured, or is it just a missing 
> feature?





[jira] [Updated] (HIVE-28542) OTEL: Implement OTEL Exporter to expose JVM details of HiveServer2

2024-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28542:
--
Labels: pull-request-available  (was: )

> OTEL: Implement OTEL Exporter to expose JVM details of HiveServer2
> --
>
> Key: HIVE-28542
> URL: https://issues.apache.org/jira/browse/HIVE-28542
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HIVE-28541) Incorrectly treating materialized CTE as Table when privilege checking

2024-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28541:
--
Labels: pull-request-available  (was: )

> Incorrectly treating materialized CTE as Table when privilege checking
> --
>
> Key: HIVE-28541
> URL: https://issues.apache.org/jira/browse/HIVE-28541
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: Logical Optimizer, Parser, Query Planning
>Affects Versions: 2.3.5, 3.1.2
>Reporter: shuaiqi.guo
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-28541.patch
>
>
> When generating a ReadEntity for doAuthorization(), the materialized 
> CTE is parsed as a Table, as with the following SQL:
> {code:java}
> -- hive.security.authorization.enabled should be set to true.
> set hive.optimize.cte.materialize.threshold=1;
> with aaa as ( select 1) select * from aaa; {code}
> then we will get the following Exception:
> {code:java}
> Error: Error while compiling statement: FAILED: HiveAuthzPluginException 
> Error getting object from metastore for Object [type=TABLE_OR_VIEW, 
> name=test_db.aaa] (state=42000,code=4) {code}





[jira] [Updated] (HIVE-21481) MERGE correctness issues with null safe equality

2024-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21481:
--
Labels: pull-request-available  (was: )

> MERGE correctness issues with null safe equality
> 
>
> Key: HIVE-21481
> URL: https://issues.apache.org/jira/browse/HIVE-21481
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>
> The way Hive currently generates the plan for a MERGE statement can lead to 
> wrong results with null safe equality.
> To illustrate, consider the following reproducer:
> {code:sql}
> create table ttarget(s string, j int, flag string) stored as orc 
> tblproperties("transactional"="true");
> truncate table ttarget;
> insert into ttarget values('not_null', 1, 'dont update'), (null,2, 'update');
> create table tsource (i int);
> insert into tsource values(null),(2);
> {code}
> Let's say you have the following MERGE statement
> {code:sql}
> explain merge into ttarget using tsource on i<=>j
>  when matched THEN
>   UPDATE set flag='updated'
>  when not matched THEN
>   INSERT VALUES('new', 1999, 'true');
> {code}
> With this MERGE, {{*ONLY ONE*}} row should match in the target and be 
> updated. But currently, due to the plan Hive generates, it ends up matching 
> both rows.
> This is because the MERGE statement is rewritten into a RIGHT OUTER JOIN + 
> FILTER corresponding to all branches.
> The part of the plan Hive generates for this statement consists of:
> {noformat}
> Map 2
> Map Operator Tree:
> TableScan
>   alias: tsource
>   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
> Column stats: NONE
>   Map Join Operator
> condition map:
>  Right Outer Join 0 to 1
> keys:
>   0 j (type: int)
>   1 i (type: int)
> nullSafes: [true]
> outputColumnNames: _col0, _col1, _col5, _col6
> input vertices:
>   0 Map 1
> Statistics: Num rows: 1 Data size: 206 Basic stats: 
> COMPLETE Column stats: NONE
> HybridGraceHashJoin: true
> Filter Operator
>   predicate: (_col6 IS NOT DISTINCT FROM _col1) (type: 
> boolean)
>   Statistics: Num rows: 1 Data size: 206 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: _col5 (type: 
> struct), _col0 (type: string), 
> _col1 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 1 Data size: 206 Basic stats: 
> COMPLETE Column stats: NONE
> Reduce Output Operator
>   key expressions: _col0 (type: 
> struct)
>   sort order: +
>   Map-reduce partition columns: UDFToInteger(_col0) 
> (type: int)
>   Statistics: Num rows: 1 Data size: 206 Basic stats: 
> COMPLETE Column stats: NONE
>   value expressions: _col1 (type: string), _col2 
> (type: int)
> {noformat}
> The result after the JOIN will be:
> {code:sql}
> select s,j,i from ttarget right outer join tsource on i<=>j ;
> NULL  NULLNULL
> NULL  NULL2
> {code}
> On this result set, the predicate {{(_col6 IS NOT DISTINCT FROM _col1)}} will 
> be true for both rows, resulting in both rows matching.
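
A minimal sketch, in plain Java rather than Hive internals, of the null-safe equality (`<=>` / IS NOT DISTINCT FROM) semantics at play: NULL <=> NULL evaluates to TRUE, so a matched-branch filter built on IS NOT DISTINCT FROM cannot distinguish a null-padded row produced by the outer join from a genuine match on NULL keys.

```java
// Demonstrates null-safe equality semantics; not Hive's actual evaluator.
class NullSafeEq {
    static boolean isNotDistinctFrom(Integer a, Integer b) {
        if (a == null && b == null) {
            return true;   // unlike plain =, two NULLs compare as equal
        }
        if (a == null || b == null) {
            return false;
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(isNotDistinctFrom(null, null)); // true
        System.out.println(isNotDistinctFrom(2, 2));       // true
        System.out.println(isNotDistinctFrom(null, 2));    // false
    }
}
```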





[jira] [Updated] (HIVE-28533) Fix compaction with custom pools

2024-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28533:
--
Labels: pull-request-available  (was: )

> Fix compaction with custom pools
> 
>
> Key: HIVE-28533
> URL: https://issues.apache.org/jira/browse/HIVE-28533
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: Hive
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>
> Hive has the feature of assigning compaction requests and workers to pools, 
> as described here: 
> [https://cwiki.apache.org/confluence/display/Hive/Compaction+pooling.]
> However, there is a bug because of which this feature doesn't work with 
> non-default pools: requests remain stuck forever in the Initiating state.





[jira] [Updated] (HIVE-28534) Improve HMS Client Exception Handling for Hive-3

2024-09-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28534:
--
Labels: pull-request-available  (was: )

> Improve HMS Client Exception Handling for Hive-3
> 
>
> Key: HIVE-28534
> URL: https://issues.apache.org/jira/browse/HIVE-28534
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.3
>Reporter: Sercan Tekin
>Assignee: Sercan Tekin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.4
>
>
> When the HMS client fails to connect to the server due to a 
> *TTransportException*, there is no issue with error reporting. 
> However, when the failure is caused by an *IOException*, the exception 
> object, which is used for reporting purposes, remains null. As a result, it 
> does not properly capture the root cause, and end-users encounter an 
> unrelated NPE, masking the actual issue.
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Unable to connect to HMS!
>   at TestHMS.main(TestHMS.java:20)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.util.StringUtils.stringifyException(StringUtils.java:90)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:613)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:233)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:145)
>   at TestHMS.main(TestHMS.java:13)
> {code}
> The testing code that I used is as below:
> {code:java}
> import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
> import org.apache.hadoop.hive.conf.HiveConf;
> import java.util.List;
> public class TestHMS {
>     public static void main(String[] args) {
>         String HOSTNAME = "";
>         HiveConf hiveConf = new HiveConf();
>         hiveConf.setVar(HiveConf.ConfVars.METASTOREURIS, "thrift://" + HOSTNAME + ":9083");
>         hiveConf.setBoolVar(HiveConf.ConfVars.METASTORE_USE_THRIFT_SASL, true);
>         hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_METASTORE_USE_SSL, true);
>         try (HiveMetaStoreClient client = new HiveMetaStoreClient(hiveConf)) {
>             List<String> databases = client.getAllDatabases();
>             System.out.println("Available databases:");
>             for (String db : databases) {
>                 System.out.println(db);
>             }
>         } catch (Exception e) {
>             throw new AssertionError("Unable to connect to HMS!", e);
>         }
>     }
> }
> {code}
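
A hedged sketch of the defensive pattern the report implies: record the last underlying failure (TTransportException or IOException) and stringify it only when it is non-null, so the caller sees the root cause instead of an NPE. The class and method names here are illustrative, not Hive's actual API.

```java
import java.io.IOException;

class ConnectFailureSketch {
    static String describeFailure(Exception lastException) {
        if (lastException == null) {
            // Guard: nothing was recorded, so say so rather than dereference null.
            return "Could not connect to the metastore (no underlying exception recorded)";
        }
        return lastException.getClass().getSimpleName() + ": " + lastException.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(describeFailure(new IOException("connection reset")));
        System.out.println(describeFailure(null));
    }
}
```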





[jira] [Updated] (HIVE-28532) Map Join Reuse cache allows to share hashtables for different join types

2024-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28532:
--
Labels: pull-request-available  (was: )

> Map Join Reuse cache allows to share hashtables for different join types
> 
>
> Key: HIVE-28532
> URL: https://issues.apache.org/jira/browse/HIVE-28532
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: Logical Optimizer
>Affects Versions: 4.0.0
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> The Map Join reuse cache allows hashtables to be shared across different join 
> types.
> For example, take an outer join and an inner join: a hash table cannot be 
> reused between a non-outer join and an outer join, because an outer join 
> cannot accept any hash table kind other than HASHMAP, while other kinds such 
> as HASHSET and HASH_MULTISET exist. Below is the exception thrown when a hash 
> table is shared between an outer join and an inner join. In certain cases we 
> might even produce wrong results, since we expect the hash table to be of one 
> kind but get a hashtable of another kind.
> {code:java}
> Caused by: java.lang.ClassCastException: class 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMultiSetContainer
>  cannot be cast to class 
> org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashMap
> {code}
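
An illustrative sketch (not Hive's actual cache logic) of the guard the issue calls for: before reusing a cached map-join hashtable, check that its kind is compatible with the consuming join. Per the description, outer joins can only consume a HASHMAP; HASHSET and HASH_MULTISET tables must not be shared with them.

```java
class HashTableReuseGuard {
    enum HashTableKind { HASHMAP, HASHSET, HASH_MULTISET }

    static boolean canReuse(HashTableKind cachedKind, boolean consumerIsOuterJoin) {
        // Non-outer joins tolerate any kind; outer joins require HASHMAP.
        return !consumerIsOuterJoin || cachedKind == HashTableKind.HASHMAP;
    }

    public static void main(String[] args) {
        System.out.println(canReuse(HashTableKind.HASH_MULTISET, true)); // false
        System.out.println(canReuse(HashTableKind.HASHMAP, true));       // true
    }
}
```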





[jira] [Updated] (HIVE-28530) Fetched result from another query

2024-09-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28530:
--
Labels: pull-request-available  (was: )

> Fetched result from another query
> -
>
> Key: HIVE-28530
> URL: https://issues.apache.org/jira/browse/HIVE-28530
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Xiaomin Zhang
>Priority: Major
>  Labels: pull-request-available
>
> When running Hive load tests, we observed that Beeline can fetch the wrong 
> query result, namely one from another query running at the same time. We 
> ruled out a load balancing issue, because it happened on a single HiveServer2, 
> and we found that this issue only happens when 
> *hive.query.results.cache.enabled is false.*
> All test queries are in the same format as below: 
> {code:java}
> select concat('total record (test_$PID)=',count(*)) as count_record from t1t
> {code}
> We randomized the query by replacing $PID with the Beeline PID, and the 
> test driver ran 10 Beeline instances concurrently. The table t1t is static and 
> has a few rows, so the test driver can check whether the query result equals: 
> total record (test_recon_mock_$PID)=2
> When the query result cache is disabled, we can see that a query randomly gets 
> a wrong result, and this can always be reproduced. For example, the two 
> queries below were running in parallel:
> {code:java}
> queryId=hive_20240701103742_ff1adb2d-e9eb-448d-990e-00ab371e9db6): select 
> concat('total record (test_21535)=',count(*)) as count_record from t1t
> queryId=hive_20240701103742_9bdfff92-89e1-4bcd-88ea-bf73ba5fd93d): select 
> concat('total record (test_21566)=',count(*)) as count_record from t1t
> {code}
> The second query was supposed to get this result:
> *total record (test_21566)=2*
> But Beeline actually got:
> *total record (test_21535)=2*
> There is no error in the HS2 log.





[jira] [Updated] (HIVE-28512) CREATE TABLE x LIKE retain whitelisted table properties

2024-09-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28512:
--
Labels: pull-request-available  (was: )

> CREATE TABLE x LIKE retain whitelisted table properties
> ---
>
> Key: HIVE-28512
> URL: https://issues.apache.org/jira/browse/HIVE-28512
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>
> It would be good to retain the properties in 
> HiveConf.ConfVars.DDL_CTL_PARAMETERS_WHITELIST for CTLT queries. This is 
> particularly useful for Avro-backed tables, as the schema can evolve over time 
> and the Avro schema is referenced in the avro.schema.url table property.





[jira] [Updated] (HIVE-28524) Iceberg: Major QB Compaction add sort order support

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28524:
--
Labels: pull-request-available  (was: )

> Iceberg: Major QB Compaction add sort order support
> ---
>
> Key: HIVE-28524
> URL: https://issues.apache.org/jira/browse/HIVE-28524
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Hive
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HIVE-28343) Iceberg: Major QB Compaction support filter in compaction request in OPTIMIZE TABLE command

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28343:
--
Labels: hive iceberg pull-request-available  (was: hive iceberg)

> Iceberg: Major QB Compaction support filter in compaction request in OPTIMIZE 
> TABLE command
> ---
>
> Key: HIVE-28343
> URL: https://issues.apache.org/jira/browse/HIVE-28343
> Project: Hive
>  Issue Type: Task
>  Components: Hive, Iceberg integration
>Reporter: Dmitriy Fingerman
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: hive, iceberg, pull-request-available
>
> Depends on this: [HIVE-28342|https://issues.apache.org/jira/browse/HIVE-28342]





[jira] [Updated] (HIVE-28526) may produce null pointer when struct type value is null

2024-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28526:
--
Labels: pull-request-available  (was: )

> may produce null pointer when struct type value is null
> ---
>
> Key: HIVE-28526
> URL: https://issues.apache.org/jira/browse/HIVE-28526
> Project: Hive
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>Reporter: terrytlu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-09-18-18-38-53-494.png
>
>
> Reproducer:
> create table test_struct
> (
> f1 string,
> demo_struct struct,
> datestr string
> );
> insert into test_struct(f1, datestr) select 'test_f1', 'datestr_1';
>  
> !image-2024-09-18-18-38-53-494.png|width=933,height=145!





[jira] [Updated] (HIVE-28523) Performance issues that may occur when tables or partitions are deleted

2024-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28523:
--
Labels: pull-request-available  (was: )

> Performance issues that may occur when  tables or partitions are deleted
> 
>
> Key: HIVE-28523
> URL: https://issues.apache.org/jira/browse/HIVE-28523
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>  Components: Standalone Metastore
>Reporter: liux
>Assignee: liux
>Priority: Major
>  Labels: pull-request-available
> Attachments: ME1726238367718.jpg
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> 1. The traversal performed when deleting a table or partitions may have 
> performance problems.
> Location: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
> for (String partName : partNames) {
>     Path partPath = wh.getDnsPath(new Path(pathString));
> }
> Assuming that wh.getDnsPath takes about 10 ms per call, traversing 200,000 
> partitions takes about 33 minutes, which can make dropping a large table or 
> its partitions time out.
> 2. It is not necessary to execute wh.getDnsPath(new Path(pathString)) for 
> every partition name; it only needs to be executed when the partition is not 
> a subdirectory of the table.
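
A sketch of the optimization suggested in point 2 above (illustrative, not the real HMSHandler code): resolve paths only for partitions whose location is not a subdirectory of the table location, instead of calling wh.getDnsPath() once per partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartitionDropSketch {
    static List<String> locationsNeedingResolution(String tableLocation, List<String> partLocations) {
        List<String> external = new ArrayList<>();
        String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
        for (String loc : partLocations) {
            // Partitions under the table directory are removed together with
            // the table directory itself; no per-partition path resolution needed.
            if (!loc.startsWith(prefix)) {
                external.add(loc);
            }
        }
        return external;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList(
                "/warehouse/t1/p=1",      // under the table dir: skip
                "/elsewhere/custom/p=2"); // external location: resolve
        System.out.println(locationsNeedingResolution("/warehouse/t1", parts));
    }
}
```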





[jira] [Updated] (HIVE-28510) Iceberg: FanoutPositionOnlyDeleteWriter support

2024-09-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28510:
--
Labels: pull-request-available  (was: )

> Iceberg: FanoutPositionOnlyDeleteWriter support
> ---
>
> Key: HIVE-28510
> URL: https://issues.apache.org/jira/browse/HIVE-28510
> Project: Hive
>  Issue Type: Improvement
>Reporter: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>
>  A position delete writer capable of writing to multiple specs and partitions 
> when the incoming stream of deletes is not ordered.





[jira] [Updated] (HIVE-28522) Fix actions/upload-artifact

2024-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28522:
--
Labels: pull-request-available  (was: )

> Fix actions/upload-artifact
> ---
>
> Key: HIVE-28522
> URL: https://issues.apache.org/jira/browse/HIVE-28522
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>
> [https://github.com/apache/hive/actions/runs/10827752595/job/30041603070]
> [https://github.com/apache/hive/actions/runs/10830075344/job/30049009968?pr=5444]
>  
> {code:java}
> Error: This request has been automatically failed because it uses a 
> deprecated version of `actions/upload-artifact: v2`. Learn more: 
> https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/{code}
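
A minimal workflow fragment showing the kind of change the error message calls for. This is a hypothetical excerpt, not Hive's actual workflow file: the step name and artifact path are illustrative, and the exact version to pin (v4 at the time of writing) is a choice.

```yaml
# Hypothetical GitHub Actions job step; only the version bump is the point.
- name: Upload test logs
  uses: actions/upload-artifact@v4   # was: actions/upload-artifact@v2 (deprecated)
  with:
    name: test-logs
    path: target/surefire-reports/
```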





[jira] [Updated] (HIVE-28520) Upgrade to datasketches 2.0.0

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28520:
--
Labels: pull-request-available  (was: )

> Upgrade to datasketches 2.0.0
> -
>
> Key: HIVE-28520
> URL: https://issues.apache.org/jira/browse/HIVE-28520
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Minor
>  Labels: pull-request-available
>
> [https://datasketches.apache.org/docs/Community/Downloads.html] 
> apache-datasketches-hive-2.0.0 is released.





[jira] [Updated] (HIVE-28519) Upgrade Maven SureFire Plugin to latest version 3.5.0

2024-09-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28519:
--
Labels: pull-request-available  (was: )

> Upgrade Maven SureFire Plugin to latest version 3.5.0
> -
>
> Key: HIVE-28519
> URL: https://issues.apache.org/jira/browse/HIVE-28519
> Project: Hive
>  Issue Type: Improvement
>  Security Level: Public(Viewable by anyone) 
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Labels: pull-request-available
>






[jira] [Updated] (HIVE-28517) Roaringbit version should be in sync with iceberg dependency required version

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28517:
--
Labels: pull-request-available  (was: )

> Roaringbit version should be in sync with iceberg dependency required version
> -
>
> Key: HIVE-28517
> URL: https://issues.apache.org/jira/browse/HIVE-28517
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
>
> Currently, we have RoaringBitmap version 0.9.22 defined in multiple places in 
> pom.xml, while Iceberg 1.5.2 requires 1.0.1 at runtime. It is better to keep 
> the versions in sync to prevent classpath issues. Also, whenever we upgrade 
> the Iceberg version, the RoaringBitmap version should be upgraded in the 
> parent pom.xml as well.





[jira] [Updated] (HIVE-28515) Iceberg: Concurrent queries fail during commit with ValidationException

2024-09-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28515:
--
Labels: pull-request-available  (was: )

> Iceberg: Concurrent queries fail during commit with ValidationException
> ---
>
> Key: HIVE-28515
> URL: https://issues.apache.org/jira/browse/HIVE-28515
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> {noformat}
> Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot commit, 
> missing data files: 
> [file:/Users/ayushsaxena/code/hive/iceberg/iceberg-handler/target/tmp/hive7073916777566968859/external/customers/data/0-0-data-ayushsaxena_20240909232021_99fd025f-1e27-4541-ab3e-77c6f9905eb7-job_17259492220180_0001-6-1.parquet]
>         at 
> org.apache.iceberg.MergingSnapshotProducer.validateDataFilesExist(MergingSnapshotProducer.java:751)
>         at org.apache.iceberg.BaseRowDelta.validate(BaseRowDelta.java:116)
>         at 
> org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:233)
>         at 
> org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:384)
>         at 
> org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
>         at 
> org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
>         at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
>         at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
>         at 
> org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:382)
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitWrite(HiveIcebergOutputCommitter.java:580)
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:494)
>         at 
> org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJobs$4(HiveIcebergOutputCommitter.java:291){noformat}
> Queries fail with {{ValidationException}} during commit even with a retry 
> strategy configured for {{write_conflict}}.





[jira] [Updated] (HIVE-28495) Iceberg: Upgrade iceberg version to 1.6.1

2024-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28495:
--
Labels: pull-request-available  (was: )

> Iceberg: Upgrade iceberg version to 1.6.1
> -
>
> Key: HIVE-28495
> URL: https://issues.apache.org/jira/browse/HIVE-28495
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Upgrade the Iceberg version to the latest release, 1.6.1.





[jira] [Updated] (HIVE-28502) Refactor method names that start with capital letters in PasswdAuthenticationProvider class

2024-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28502:
--
Labels: pull-request-available  (was: )

> Refactor method names that start with capital letters in 
> PasswdAuthenticationProvider class
> ---
>
> Key: HIVE-28502
> URL: https://issues.apache.org/jira/browse/HIVE-28502
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28503) Wrong results(NULL) when string concat operation with || operator for ORC file format when vectorization enabled

2024-09-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28503:
--
Labels: pull-request-available  (was: )

> Wrong results(NULL) when string concat operation with || operator for ORC 
> file format when vectorization enabled
> 
>
> Key: HIVE-28503
> URL: https://issues.apache.org/jira/browse/HIVE-28503
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Mahesh Raju Somalaraju
>Assignee: Mahesh Raju Somalaraju
>Priority: Major
>  Labels: pull-request-available
>
> Wrong results(NULL) when string concat operation with || operator for ORC 
> file format when vectorization enabled.
> set hive.query.results.cache.enabled=false;
> set hive.fetch.task.conversion=none;
> set hive.vectorized.execution.enabled=true;
> The result is NULL when we do a concat operation with the || operator. It 
> could not be reproduced locally; it reproduces on a cluster with more records. 
> The input data should be a mix of NULL and NOT NULL values, something like 
> the table below.
> Create a table with the ORC file format and 3 string columns, and insert data 
> such that it has a mix of NULL and NOT NULL values.
>  
> |column1|column2|column3|count|
> |NULL      |NULL     |NULL      |18000  |
> |G         |L        |A1        |123932 |
> with above configuration, perform concat() operation with || operator and 
> insert new row with the concat() results.
> select * from (select t1.column1, t1.column2, t1.column3,  *t1.column1 || 
> t1.column2 || t1.column3 as VEH_MODEL_ID*
> from test_table t1 )t where VEH_MODEL_ID is NULL and if(column1 is 
> null,0,1)=1 AND if(column2 is null,0,1)=1 AND if(column3 is null,0,1)=1 limit 
> 1;
> in above query, *t1.column1 || t1.column2 || t1.column3 as VEH_MODEL_ID*  
> operation is returning the NULL result eventhough the input string values are 
> not null.
> |t.VEH_MODEL_ID|t.column1|t.column2|t.column3|
> |NULL|G|L|A2|
>  
> +Proposed solution as per code review:+
> +*Root cause:*+
> During the concat() operation in the *StringGroupConcatColCol* class, if the
> input batch vectors contain a mix of NULL and NOT NULL values, the output
> vector's NULL-related flags are not set correctly. Each value in a vector
> carries its own flag saying whether it is NULL, but the vector-level flag for
> the whole output vector (outV.noNulls) is not set correctly. Parquet happens
> to work without this flag, presumably because that path checks each value
> individually instead of consulting the vector-level flag.
> +*code snippet:*+
> *StringGroupConcatColCol->evaluate() method:*
> {code:java}
> if (inV1.noNulls && !inV2.noNulls) {
>   // one input has NULLs, so the output may contain NULLs
>   outV.noNulls = false;
>   ...
> } else if (!inV1.noNulls && inV2.noNulls) {
>   // one input has NULLs, so the output may contain NULLs
>   outV.noNulls = false;
>   ...
> } else if (!inV1.noNulls && !inV2.noNulls) {
>   // both inputs have NULLs, so the output may contain NULLs
>   outV.noNulls = false;
>   ...
> } else {
>   // there are no NULLs in either input vector
>   outV.noNulls = true; // this assignment is currently missing
>   // perform data operation
>   ...
> }
> {code}
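The flag semantics above can be illustrated with a minimal sketch (simplified stand-in types, not Hive's real ColumnVector/VectorExpression classes): the output vector must be marked null-free exactly when both inputs are null-free.

```java
public class NoNullsFlagSketch {

    // Simplified stand-in for Hive's column vector: per-row null flags plus a
    // vector-level noNulls summary flag.
    static final class Vec {
        final String[] values;
        final boolean[] isNull;
        boolean noNulls;
        Vec(String[] values, boolean[] isNull, boolean noNulls) {
            this.values = values;
            this.isNull = isNull;
            this.noNulls = noNulls;
        }
    }

    // Element-wise concatenation. The key point from the report: the branch
    // where neither input has NULLs must set outV.noNulls = true, otherwise
    // readers that trust the vector-level flag see the whole output as NULL.
    static Vec concat(Vec inV1, Vec inV2) {
        int n = inV1.values.length;
        Vec outV = new Vec(new String[n], new boolean[n], false);
        if (inV1.noNulls && inV2.noNulls) {
            outV.noNulls = true; // the assignment the report says is missing
            for (int i = 0; i < n; i++) {
                outV.values[i] = inV1.values[i] + inV2.values[i];
            }
        } else {
            outV.noNulls = false; // output may contain NULLs
            for (int i = 0; i < n; i++) {
                outV.isNull[i] = inV1.isNull[i] || inV2.isNull[i];
                if (!outV.isNull[i]) {
                    outV.values[i] = inV1.values[i] + inV2.values[i];
                }
            }
        }
        return outV;
    }

    public static void main(String[] args) {
        Vec a = new Vec(new String[]{"G", null}, new boolean[]{false, true}, false);
        Vec b = new Vec(new String[]{"A1", "A2"}, new boolean[]{false, false}, true);
        Vec out = concat(a, b);
        System.out.println(out.noNulls + " " + out.values[0] + " " + out.isNull[1]);
    }
}
```

Without the `outV.noNulls = true` assignment, a consumer that checks only the vector-level flag would report every concatenated value as NULL, matching the behavior described above.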



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28500) fix: alterSchemaVersion

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28500:
--
Labels: pull-request-available  (was: )

> fix: alterSchemaVersion
> ---
>
> Key: HIVE-28500
> URL: https://issues.apache.org/jira/browse/HIVE-28500
> Project: Hive
>  Issue Type: Improvement
>Reporter: KIM JI HYE
>Priority: Major
>  Labels: pull-request-available
>
> Hello,
> The alterSchemaVersion method is the only one in ObjectStore that does not
> perform rollbackTransaction() in its finally block when committed is false.
>  
> It currently calls commitTransaction() there, but it seems appropriate to
> change it to rollbackTransaction().
> https://github.com/apache/hive/blob/3f6f940af3f60cc28834268e5d7f5612e3b13c30/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L13372
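The fix can be sketched with a hypothetical transaction interface (not the real ObjectStore API): when `committed` is still false in `finally`, the method should roll back instead of committing again.

```java
public class TxnPatternSketch {

    // Hypothetical stand-in for ObjectStore's transaction methods.
    interface Txn {
        boolean commitTransaction();
        void rollbackTransaction();
    }

    // Mirrors the shape of ObjectStore methods: returns whether we committed.
    static boolean alterSomething(Txn txn, Runnable work) {
        boolean committed = false;
        try {
            work.run();
            committed = txn.commitTransaction();
        } finally {
            if (!committed) {
                // the proposed fix: roll back, not a second commit
                txn.rollbackTransaction();
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        final boolean[] rolledBack = {false};
        Txn txn = new Txn() {
            public boolean commitTransaction() { return false; } // commit failed
            public void rollbackTransaction() { rolledBack[0] = true; }
        };
        boolean committed = alterSomething(txn, () -> {});
        System.out.println("committed=" + committed + " rolledBack=" + rolledBack[0]);
    }
}
```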



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28496) Address CVE-2020-28487 due to 4.20.0 version of vis.js

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28496:
--
Labels: pull-request-available  (was: )

> Address CVE-2020-28487 due to 4.20.0 version of vis.js
> --
>
> Key: HIVE-28496
> URL: https://issues.apache.org/jira/browse/HIVE-28496
> Project: Hive
>  Issue Type: Improvement
>Reporter: Kiran Velumuri
>Assignee: Kiran Velumuri
>Priority: Major
>  Labels: pull-request-available
>
> This is to address CVE-2020-28487 coming from 4.20.0 version of vis.js from 
> the file vis.min.js. This file is being used in the recently added Query plan 
> tab in the HiveServer2 web UI.
>  
> The vis.js project has been split up into sub-projects (from version 5.0.0),
> of which we only require the Network sub-project. This sub-project contains
> both vis.Network and vis.Dataset, which are what we require from vis.min.js.
>  
> Link to CVE-2020-28487: 
> http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2020-28487



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28497) Address CVE due to commons-codec:commons-codec:jar:1.11

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28497:
--
Labels: pull-request-available  (was: )

> Address CVE due to commons-codec:commons-codec:jar:1.11
> ---
>
> Key: HIVE-28497
> URL: https://issues.apache.org/jira/browse/HIVE-28497
> Project: Hive
>  Issue Type: Improvement
>Reporter: Kiran Velumuri
>Assignee: Kiran Velumuri
>Priority: Major
>  Labels: pull-request-available
>
> The vulnerability sonatype-2012-0050 comes from 
> commons-codec:commons-codec:jar:1.11 dependency in the 
> hive-standalone-metastore-server module.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28494) Iceberg: mvn build enables iceberg module by default

2024-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28494:
--
Labels: pull-request-available  (was: )

> Iceberg: mvn build enables iceberg module by default
> 
>
> Key: HIVE-28494
> URL: https://issues.apache.org/jira/browse/HIVE-28494
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>
> HIVE-25027 hid the iceberg module by default. IMO, we have put lots of
> effort into the iceberg module and it is more stable than before. We should
> enable the iceberg module by default in the mvn build.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28492) Upgrade Janino version to 3.1.12

2024-09-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28492:
--
Labels: pull-request-available  (was: )

> Upgrade Janino version to 3.1.12
> 
>
> Key: HIVE-28492
> URL: https://issues.apache.org/jira/browse/HIVE-28492
> Project: Hive
>  Issue Type: Improvement
>Reporter: shivangi
>Assignee: shivangi
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28490) SharedWorkOptimizer sometimes removes useful DPP sources.

2024-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28490:
--
Labels: pull-request-available  (was: )

> SharedWorkOptimizer sometimes removes useful DPP sources.
> -
>
> Key: HIVE-28490
> URL: https://issues.apache.org/jira/browse/HIVE-28490
> Project: Hive
>  Issue Type: Improvement
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
> Attachments: 3.StopRemovingRetainableDPP.pptx
>
>
> The current SharedWorkOptimizer sometimes removes DPP sources that are not
> invalidated. I found that findAscendantWorkOperators() returns a superset of
> the ascendant operators, which causes wrong DPP source removal.
> Please check out the attached slides for a detailed explanation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28489) Partitioning the input data of Grouping Set GroupBy operator

2024-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28489:
--
Labels: pull-request-available  (was: )

> Partitioning the input data of Grouping Set GroupBy operator
> 
>
> Key: HIVE-28489
> URL: https://issues.apache.org/jira/browse/HIVE-28489
> Project: Hive
>  Issue Type: New Feature
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
> Attachments: 2.PartitionDataBeforeGroupingSet.pptx
>
>
> GroupBy operator with grouping sets often emits too many rows, which becomes
> the bottleneck of query execution. To reduce the number of output rows, this
> JIRA proposes partitioning the input data of such a GroupBy operator.
> Please check out the attached slides for a detailed explanation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28488) Merge adjacent union distinct

2024-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28488:
--
Labels: pull-request-available  (was: )

> Merge adjacent union distinct
> -
>
> Key: HIVE-28488
> URL: https://issues.apache.org/jira/browse/HIVE-28488
> Project: Hive
>  Issue Type: Improvement
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1.MergeAdjacentUnionDistinct.pptx
>
>
> Current Hive compiles
> "SELECT * FROM TBL1 UNION SELECT * FROM TBL2 UNION SELECT * FROM TBL3"
> to
> {code:java}
> TS - GBY - RS
> TS - GBY - RS - GBY - RS
>            TS - GBY - RS - GBY {code}
> This can be optimized as follows:
> {code:java}
> TS - GBY - RS
> TS - GBY - RS
> TS - GBY - RS - GBY {code}
> Please check out the attached slides for a detailed explanation and feel free
> to ask any questions or share suggestions. Also, we would be glad if anyone
> could suggest a better location for this optimization (e.g. SemanticAnalyzer,
> Calcite, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28483) String date cast giving wrong result

2024-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28483:
--
Labels: pull-request-available  (was: )

> String date cast giving wrong result
> 
>
> Key: HIVE-28483
> URL: https://issues.apache.org/jira/browse/HIVE-28483
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Rátkai
>Assignee: Zoltán Rátkai
>Priority: Minor
>  Labels: pull-request-available
>
> Date conversion gives a wrong result. For example:
> select to_date('03-08-2024');
> Result:
> +-------------+
> |     _c0     |
> +-------------+
> | 0003-08-20  |
> +-------------+
> or:
> select to_date(last_day(add_months(last_day('03-08-2024'), -1))) ;
> Result:
> +-------------+
> |     _c0     |
> +-------------+
> | 0003-07-31  |
> +-------------+



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28487) Outdated MetastoreSchemaTool class reference in schemaTool.sh

2024-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28487:
--
Labels: pull-request-available  (was: )

> Outdated MetastoreSchemaTool class reference in schemaTool.sh
> -
>
> Key: HIVE-28487
> URL: https://issues.apache.org/jira/browse/HIVE-28487
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Sebastian Bernauer
>Assignee: Sebastian Bernauer
>Priority: Minor
>  Labels: pull-request-available
>
> In HIVE-21298 {{MetastoreSchemaTool}} was moved from
> {{org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool}} to
> {{org.apache.hadoop.hive.metastore.tools.schematool.MetastoreSchemaTool}},
> but it seems like {{schemaTool.sh}} was not updated.
>  
> This results in the following error being raised when invoking the shell 
> script:
> {code:java}
> /stackable/apache-hive-metastore-4.0.0-bin $ bin/base --service schemaTool
> Exception in thread "main" java.lang.ClassNotFoundException: 
> org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool
> at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594)
> at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
> at java.base/java.lang.Class.forName0(Native Method)
> at java.base/java.lang.Class.forName(Class.java:398)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:321)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:241){code}
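The failure mode can be sketched generically: a launcher script that names a fully-qualified class which no longer exists at that location gets a ClassNotFoundException at startup. (The lookup of the old Hive class below fails in any JVM without Hive on the classpath; it is only illustrative.)

```java
public class StaleClassNameSketch {

    // Returns whether a fully-qualified class name resolves on the current
    // classpath, which is exactly what RunJar's Class.forName call checks.
    static boolean classExists(String fqcn) {
        try {
            Class.forName(fqcn);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A class that exists everywhere resolves fine...
        System.out.println(classExists("java.lang.String"));
        // ...while the pre-HIVE-21298 name fails, which is what schemaTool.sh hits.
        System.out.println(classExists(
            "org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool"));
    }
}
```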



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-12203) CBO (Calcite Return Path): groupby_grouping_id2.q returns wrong results

2024-08-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-12203:
--
Labels: pull-request-available  (was: )

> CBO (Calcite Return Path): groupby_grouping_id2.q returns wrong results
> ---
>
> Key: HIVE-12203
> URL: https://issues.apache.org/jira/browse/HIVE-12203
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 2.0.0
>Reporter: Jesús Camacho Rodríguez
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-12203.patch
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-21298) Move Hive Schema Tool classes to their own package to have cleaner structure

2024-08-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-21298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-21298:
--
Labels: pull-request-available  (was: )

> Move Hive Schema Tool classes to their own package to have  cleaner structure
> -
>
> Key: HIVE-21298
> URL: https://issues.apache.org/jira/browse/HIVE-21298
> Project: Hive
>  Issue Type: Improvement
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
> Attachments: HIVE-21298.01.patch, HIVE-21298.02.patch
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28484) SharedWorkOptimizer leaves residual unused operator tree that send DPP events to unknown operators

2024-08-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28484:
--
Labels: pull-request-available  (was: )

> SharedWorkOptimizer leaves residual unused operator tree that send DPP events 
> to unknown operators
> --
>
> Key: HIVE-28484
> URL: https://issues.apache.org/jira/browse/HIVE-28484
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Physical Optimizer
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28482) Iceberg: CTAS query failure while fetching URI for authorization

2024-08-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28482:
--
Labels: pull-request-available  (was: )

> Iceberg: CTAS query failure while fetching URI for authorization
> 
>
> Key: HIVE-28482
> URL: https://issues.apache.org/jira/browse/HIVE-28482
> Project: Hive
>  Issue Type: Bug
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>
> When we perform CTAS query with the following configs set to true - 
> {code:java}
> set hive.security.authorization.enabled=true;
> set hive.security.authorization.tables.on.storagehandlers=true;
> create table ctas_source stored by iceberg stored as orc as select * from 
> src;{code}
> The following error trace is seen - 
> {code:java}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception 
> occurred while getting the URI from storage handler: null
>         at 
> org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.addHivePrivObject(CommandAuthorizerV2.java:213)
>         at 
> org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.getHivePrivObjects(CommandAuthorizerV2.java:152)
>         at 
> org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.doAuthorization(CommandAuthorizerV2.java:77)
>         at 
> org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizer.doAuthorization(CommandAuthorizer.java:58)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28411) Bucket Map Join on Iceberg tables

2024-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28411:
--
Labels: pull-request-available  (was: )

> Bucket Map Join on Iceberg tables
> -
>
> Key: HIVE-28411
> URL: https://issues.apache.org/jira/browse/HIVE-28411
> Project: Hive
>  Issue Type: Sub-task
>  Components: Iceberg integration, StorageHandler
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: pull-request-available
>
> Allow HiveIcebergStorageHandler or any other non-native tables to declare how 
> to physically bucket records so that Hive can enable Bucket Map Join for them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28473) INSERT OVERWRITE LOCAL DIRECTORY writes staging files to wrong hdfs directory

2024-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28473:
--
Labels: pull-request-available  (was: )

> INSERT OVERWRITE LOCAL DIRECTORY writes staging files to wrong hdfs directory
> -
>
> Key: HIVE-28473
> URL: https://issues.apache.org/jira/browse/HIVE-28473
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
> Environment: Hadoop 3.3.4
> HIVE 3.1.3
> mapreduce engine
>Reporter: liang yu
>Priority: Major
>  Labels: pull-request-available
>
> using HIVE 3.1.3 ; mr engine; HADOOP 3.3.4
>  
> *Description*
> When I try to insert data into the local directory "/path/to/local", Hive 
> usually first creates an intermediate HDFS directory like 
> "hdfs:/session/execution/.staging-hive-xx", which is based on sessionId and 
> executionId. After that, it moves the results to the local filesystem at 
> "/path/to/local".
> However, it's currently trying to create an intermediate HDFS directory at
> "hdfs:/path/to/local/.staging-hive-xx", which incorrectly uses the local
> filesystem path. This causes an error because it's attempting to create a new
> path starting from {{/root}}, where we don't have sufficient permissions.
>  
> It can be reproduced by:
> {code:java}
> INSERT OVERWRITE LOCAL DIRECTORY "/path/to/local/dir"
> select a 
> from table 
> group by a; {code}
>  
> StackTrace:
> {code:java}
> RuntimeException: cannot create staging directory 
> "hdfs:/path/to/local/dir/.hive-staging-xx":
> Permission denied: user=aaa, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x 
> {code}
>  
> *ANALYSIS*
>  
> In the function
> _org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_ we take
> the same path for both _QBMetaData.DEST_LOCAL_FILE_ and
> _QBMetaData.DEST_DFS_FILE_, and then set the value
> _ctx.getTempDirForInterimJobPath(dest_path).toString()_ to _statsTmpLoc_.
> But for the local filesystem, dest_path is always completely different from
> the paths on the Hadoop filesystem, so we get an exception that we cannot
> create an HDFS directory because we don't have sufficient permissions.
>  
> *SOLUTION*
>  
> We should modify the function
> _org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_ to
> treat _QBMetaData.DEST_LOCAL_FILE_ and _QBMetaData.DEST_DFS_FILE_ differently
> by assigning _ctx.getMRTmpPath().toString()_ to _statsTmpLoc_, avoiding the
> creation of a wrong intermediate directory.
>  
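The proposed behavior can be sketched with a hypothetical helper (not the actual SemanticAnalyzer code): choose the staging location by destination scheme, so a local destination uses the scratch directory instead of an HDFS path derived from the local path.

```java
import java.net.URI;

public class StagingDirSketch {

    // Hypothetical helper: pick a staging dir for the write destination.
    // file:/ (or schemeless local) destinations use the MR scratch dir
    // (DEST_LOCAL_FILE); remote filesystems stage next to the destination
    // (DEST_DFS_FILE).
    static String stagingDirFor(URI dest, String mrTmpPath) {
        String scheme = dest.getScheme();
        if (scheme == null || "file".equals(scheme)) {
            return mrTmpPath;
        }
        return dest.toString() + "/.hive-staging";
    }

    public static void main(String[] args) {
        System.out.println(stagingDirFor(
            URI.create("file:/path/to/local/dir"), "/tmp/hive-scratch"));
        System.out.println(stagingDirFor(
            URI.create("hdfs://nn/warehouse/t"), "/tmp/hive-scratch"));
    }
}
```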



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28480) Disable SMB on partition hash generator mismatch across join branches in previous RS

2024-08-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28480:
--
Labels: pull-request-available  (was: )

> Disable SMB on partition hash generator mismatch across join branches in 
> previous RS
> 
>
> Key: HIVE-28480
> URL: https://issues.apache.org/jira/browse/HIVE-28480
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Himanshu Mishra
>Assignee: Himanshu Mishra
>Priority: Major
>  Labels: pull-request-available
>
> As SMB replaces the last RS op of the joining branches and the JOIN op with
> MERGEJOIN, we need to ensure that the RSs before these RSs, in both branches,
> partition using the same hash generator.
> The hash code generator differs based on ReducerTraits.UNIFORM, i.e.
> [ReduceSinkOperator#computeMurmurHash() or
> ReduceSinkOperator#computeHashCode()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L340-L344],
> leading to different hash codes for the same value.
> We should skip the SMB join in such cases.
> h3. Replication:
> Consider following query, where join would get converted to SMB. Auto reducer 
> is enabled which ensures more than 1 reducer task.
>  
> {code:java}
> CREATE TABLE t_asj_18 (k STRING, v INT);
> INSERT INTO t_asj_18 values ('a', 10), ('a', 10);
> set hive.auto.convert.join=false;
> set hive.tez.auto.reducer.parallelism=true;
> EXPLAIN SELECT * FROM (
> SELECT k, COUNT(DISTINCT v), SUM(v)
> FROM t_asj_18 GROUP BY k
> ) a LEFT JOIN (
> SELECT k, COUNT(v)
> FROM t_asj_18 GROUP BY k
> ) b ON a.k = b.k; {code}
>  
>  
> Expected result is:
>  
> {code:java}
> a   1   20  a   2 {code}
> but on master branch, it results in
>  
>  
> {code:java}
> a   1   20  NULLNULL {code}
>  
>  
> Here, for COUNT(DISTINCT), the RS key is k, v while the partition key is
> still k. In such a scenario the [reducer trait UNIFORM is not
> set|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SetReducerParallelism.java#L99-L104].
> The hash code for "a" from the 2nd subquery is generated using murmurHash
> (270516725) while the 1st is generated using bucketHash (1086686554),
> resulting in rows with key "a" reaching different reducer tasks.
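The mismatch can be sketched with two illustrative hash functions (neither is Hive's actual murmur or bucket-hash implementation): as soon as the two branches hash the same key differently, its rows can be routed to different reducers, so the merge join never sees them together.

```java
import java.nio.charset.StandardCharsets;

public class HashMismatchSketch {

    // Stand-in for one partitioning hash (Object.hashCode based).
    static int bucketHash(String key) {
        return key.hashCode();
    }

    // Stand-in for a murmur-style hash: any different mixing function shows
    // the effect; this is NOT Hive's murmur implementation.
    static int otherHash(String key) {
        int h = 1;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h = h * 0x5bd1e995 + b;
        }
        return h;
    }

    // Map a hash code to a reducer task.
    static int reducerFor(int hash, int numReducers) {
        return Math.floorMod(hash, numReducers);
    }

    public static void main(String[] args) {
        int r1 = reducerFor(bucketHash("a"), 2);
        int r2 = reducerFor(otherHash("a"), 2);
        // Same key, different hash functions: the rows land on different
        // reducers, so matching rows never meet.
        System.out.println("reducer " + r1 + " vs reducer " + r2);
    }
}
```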



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28451) JDBC: TableName matcher fix in GenericJdbcDatabaseAccessor#addBoundaryToQuery

2024-08-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28451:
--
Labels: pull-request-available  (was: )

> JDBC: TableName matcher fix in GenericJdbcDatabaseAccessor#addBoundaryToQuery
> -
>
> Key: HIVE-28451
> URL: https://issues.apache.org/jira/browse/HIVE-28451
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC storage handler
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> Caught exception while trying to execute query\rjava.lang.RuntimeException: 
> Cannot find . in sql query SELECT … FROM 
> "".""\rat...
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28264) OOM/slow compilation when query contains SELECT clauses with nested expressions

2024-08-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28264:
--
Labels: pull-request-available  (was: )

> OOM/slow compilation when query contains SELECT clauses with nested 
> expressions
> ---
>
> Key: HIVE-28264
> URL: https://issues.apache.org/jira/browse/HIVE-28264
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, HiveServer2
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>
> {code:sql}
> CREATE TABLE t0 (`title` string);
> SELECT x10 from
> (SELECT concat_ws('L10',x9, x9, x9, x9) as x10 from
> (SELECT concat_ws('L9',x8, x8, x8, x8) as x9 from
> (SELECT concat_ws('L8',x7, x7, x7, x7) as x8 from
> (SELECT concat_ws('L7',x6, x6, x6, x6) as x7 from
> (SELECT concat_ws('L6',x5, x5, x5, x5) as x6 from
> (SELECT concat_ws('L5',x4, x4, x4, x4) as x5 from
> (SELECT concat_ws('L4',x3, x3, x3, x3) as x4 from
> (SELECT concat_ws('L3',x2, x2, x2, x2) as x3 
> from
> (SELECT concat_ws('L2',x1, x1, x1, x1) as 
> x2 from
> (SELECT concat_ws('L1',x0, x0, x0, 
> x0) as x1 from
> (SELECT concat_ws('L0',title, 
> title, title, title) as x0 from t0) t1) t2) t3) t4) t5) t6) t7) t8) t9) t10) t
> WHERE x10 = 'Something';
> {code}
> The query above fails with OOM when run with the TestMiniLlapLocalCliDriver 
> and the default max heap size configuration effective for tests (-Xmx2048m).
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>   at java.util.Arrays.copyOf(Arrays.java:3332)
>   at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
>   at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
>   at java.lang.StringBuilder.append(StringBuilder.java:136)
>   at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:152)
>   at org.apache.calcite.rex.RexCall.toString(RexCall.java:165)
>   at org.apache.calcite.rex.RexCall.appendOperands(RexCall.java:105)
>   at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:151)
>   at org.apache.calcite.rex.RexCall.toString(RexCall.java:165)
>   at java.lang.String.valueOf(String.java:2994)
>   at java.lang.StringBuilder.append(StringBuilder.java:131)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:90)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144)
>   at 
> org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explainInputs(RelWriterImpl.java:122)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:116)
>   at 
> org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144)
>   at 
> org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246)
>   at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2308)
>   at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2292)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.RuleEventLogger.ruleProductionSucceeded(RuleEventLogger.java:73)
>   at 
> org.apache.calcite.plan.MulticastRelOptListener.ruleProductionSucceeded(MulticastRelOptListener.java:68)
>   at 
> org.apache.calcite.plan.AbstractRelOptPlanner.notifyTransformation(AbstractRelOptPlanner.java:370)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyTransformationResults(HepPlanner.java:702)
>   at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:545)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:271)
>   at 
> org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
>   at 
> org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2452)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2411)
> {noformat}
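A back-of-the-envelope model (illustrative constants, not Calcite's exact digest format) shows why rendering the nested plan's expression digests blows up: each level of the query above embeds four copies of its child expression, so the flattened digest string grows roughly as 4^depth.

```java
public class DigestGrowthSketch {

    // Approximate length of the flattened digest of the nested concat_ws
    // expression: each level embeds four copies of its child plus a small
    // per-level constant (the assumed template is illustrative).
    static long digestLength(int depth) {
        long len = "title".length();
        for (int i = 0; i < depth; i++) {
            len = 4 * len + "concat_ws('Lk',,,,)".length();
        }
        return len;
    }

    public static void main(String[] args) {
        for (int d = 0; d <= 10; d += 5) {
            System.out.println("depth " + d + ": ~" + digestLength(d) + " chars");
        }
    }
}
```

At depth 10 the model already exceeds ten million characters for a single digest, and the planner materializes such strings repeatedly while logging rule applications, which explains the OOM in `RexCall.computeDigest`.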



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28460) Cleanup some dangling codes around the Metastore

2024-08-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28460:
--
Labels: pull-request-available  (was: )

> Cleanup some dangling codes around the Metastore
> 
>
> Key: HIVE-28460
> URL: https://issues.apache.org/jira/browse/HIVE-28460
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>
> This jira is to track the work of:
> 1. Determine the database product once instead of on every ObjectStore
> initialization;
> 2. Extract the ObjectStore methods designed for HiveMetaTool into a
> standalone class;
> 3. Remove some dangling code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27080) Support project pushdown in JDBC storage handler even when filters are not pushed

2024-08-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27080:
--
Labels: pull-request-available  (was: )

> Support project pushdown in JDBC storage handler even when filters are not 
> pushed
> -
>
> Key: HIVE-27080
> URL: https://issues.apache.org/jira/browse/HIVE-27080
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 4.0.0-alpha-2
>Reporter: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: jdbc_project_pushdown.q
>
>
> {code:sql}
> CREATE EXTERNAL TABLE book
> (
> id int,
> title varchar(20),
> author int
> )
> STORED BY  
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
> TBLPROPERTIES (
> "hive.sql.database.type" = "POSTGRES",
> "hive.sql.jdbc.driver" = "org.postgresql.Driver",
> "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB",
> "hive.sql.dbcp.username" = "qtestuser",
> "hive.sql.dbcp.password" = "qtestpassword",
> "hive.sql.table" = "book"
> );
> {code}
> {code:sql}
> explain cbo select id from book where title = 'Les Miserables';
> {code}
> {noformat}
> CBO PLAN:
> HiveJdbcConverter(convention=[JDBC.POSTGRES])
>   JdbcProject(id=[$0])
> JdbcFilter(condition=[=($1, _UTF-16LE'Les Miserables')])
>   JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
> {noformat}
> +Good case:+ Only the id column is fetched from the underlying database (see 
> JdbcProject) since it is necessary for the result.
> {code:sql}
> explain cbo select id from book where UPPER(title) = 'LES MISERABLES';
> {code}
> {noformat}
> CBO PLAN:
> HiveProject(id=[$0])
>   HiveFilter(condition=[=(CAST(UPPER($1)):VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE", _UTF-16LE'LES MISERABLES')])
> HiveProject(id=[$0], title=[$1], author=[$2])
>   HiveJdbcConverter(convention=[JDBC.POSTGRES])
> JdbcHiveTableScan(table=[[default, book]], table:alias=[book])
> {noformat}
> +Bad case:+ All table columns are fetched from the database although only id
> and title are necessary; id is in the result so it cannot be dropped, and
> title is needed by HiveFilter since the UPPER operation was not pushed to the
> DBMS. The author column is not needed at all, so the plan should have a
> JdbcProject with id and title on top of the JdbcHiveTableScan.
> Although it doesn't seem like a big deal here, in some cases tables are
> pretty wide (more than 100 columns) while queries rarely return all of their
> columns. Improving project pushdown to handle such cases can give a major
> performance boost.
> Pushing the filter with UPPER to JDBC storage handler is also a relevant 
> improvement but this should be tracked under another ticket.
> The problem can be reproduced by running:
> {noformat}
> mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=jdbc_project_pushdown.q 
> -Dtest.output.overwrite
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28457) HS2 WEBUI: LDAP authorization

2024-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28457:
--
Labels: pull-request-available  (was: )

> HS2 WEBUI: LDAP authorization
> -
>
> Key: HIVE-28457
> URL: https://issues.apache.org/jira/browse/HIVE-28457
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, HiveServer2
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28456) ObjectStore updatePartitionColumnStatisticsInBatch can cause connection starvation

2024-08-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28456:
--
Labels: pull-request-available  (was: )

> ObjectStore updatePartitionColumnStatisticsInBatch can cause connection 
> starvation 
> ---
>
> Key: HIVE-28456
> URL: https://issues.apache.org/jira/browse/HIVE-28456
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>
> Since HIVE-26419, we have a secondary connection pool for schema generation 
> and value generation operations; the size of this pool is 2. However, per the 
> DataNucleus documentation on datanucleus.ConnectionFactory2:
> [https://www.datanucleus.org/products/accessplatform_5_0/jdo/persistence.html]
> the secondary pool also serves nontransactional connections, which makes 
> ObjectStore updatePartitionColumnStatisticsInBatch request its connection 
> from this pool, as it doesn't open a transaction explicitly. If inserting or 
> updating the column statistics is slow, the pool quickly becomes unavailable 
> (it reaches its maximum size), and the ObjectStore could see "Connection is 
> not available, request timed out" in such a situation.
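The starvation mode described above can be sketched with a toy fixed-size pool (plain Python, standing in for the DataNucleus secondary pool; names are illustrative):

```python
import queue

# Toy fixed-size pool (size 2, like the secondary DataNucleus pool).
# When slow nontransactional work holds both connections, the next
# request times out instead of being served. Illustration only.
pool = queue.Queue(maxsize=2)
for conn in ("conn-1", "conn-2"):
    pool.put(conn)

held = [pool.get(), pool.get()]    # slow column-stats updates hold both

try:
    pool.get(timeout=0.1)          # third caller: pool is exhausted
except queue.Empty:
    print("Connection is not available, request timed out")
```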



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28452) Iceberg: Cache delete files on executors

2024-08-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28452:
--
Labels: pull-request-available  (was: )

> Iceberg: Cache delete files on executors
> 
>
> Key: HIVE-28452
> URL: https://issues.apache.org/jira/browse/HIVE-28452
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28443) Add ifExists field to dropCatalogRequest

2024-08-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28443:
--
Labels: pull-request-available  (was: )

> Add ifExists field to dropCatalogRequest
> 
>
> Key: HIVE-28443
> URL: https://issues.apache.org/jira/browse/HIVE-28443
> Project: Hive
>  Issue Type: Bug
>Reporter: Jintong Jiang
>Assignee: Jintong Jiang
>Priority: Major
>  Labels: pull-request-available
>
> Add an ifExists field to dropCatalogRequest so that, when requested, the 
> server does not throw an exception for a missing catalog.
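The requested semantics mirror DROP ... IF EXISTS. A toy sketch of the intended behavior (hypothetical in-memory registry and exception class, not the actual HMS Thrift change):

```python
# Toy sketch of the requested ifExists semantics for dropCatalog.
# The registry and exception are hypothetical stand-ins for the
# actual HMS Thrift API change.
class NoSuchCatalogError(Exception):
    pass

catalogs = {"hive": object()}

def drop_catalog(name, if_exists=False):
    if name not in catalogs:
        if if_exists:
            return              # swallow, like DROP ... IF EXISTS
        raise NoSuchCatalogError(name)
    del catalogs[name]

drop_catalog("missing", if_exists=True)   # no server-side exception
drop_catalog("hive")                      # normal drop
print(catalogs)                           # {}
```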



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28446) Convert some reserved words to non-reserved words

2024-08-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28446:
--
Labels: pull-request-available  (was: )

> Convert some reserved words to non-reserved words
> -
>
> Key: HIVE-28446
> URL: https://issues.apache.org/jira/browse/HIVE-28446
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: pull-request-available
>
> We've missed listing some new keywords in the non-reserved list.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> create table test (application 
> int);
> Error: Error while compiling statement: FAILED: ParseException line 1:19 
> cannot recognize input near 'application' 'int' ')' in column name or 
> constraint (state=42000,code=4) {code}
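The distinction at play can be sketched as follows; the keyword sets here are hypothetical stand-ins, not Hive's actual grammar lists:

```python
# Toy model of reserved vs non-reserved keywords: a non-reserved
# keyword is still a keyword, but the grammar also accepts it as an
# identifier (e.g. a column name). These sets are hypothetical
# stand-ins, not Hive's actual keyword lists.
RESERVED = {"SELECT", "FROM", "WHERE"}
NON_RESERVED = {"APPLICATION", "COMPACT"}
KEYWORDS = RESERVED | NON_RESERVED

def valid_column_name(token):
    # Only *reserved* keywords are barred from identifier positions.
    return token.upper() not in RESERVED

print(valid_column_name("application"))  # True: keyword, but non-reserved
print(valid_column_name("select"))       # False: still reserved
```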



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28342) Iceberg: Major QB Compaction support filter in compaction request

2024-08-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28342:
--
Labels: hive iceberg pull-request-available  (was: hive iceberg)

> Iceberg: Major QB Compaction support filter in compaction request
> -
>
> Key: HIVE-28342
> URL: https://issues.apache.org/jira/browse/HIVE-28342
> Project: Hive
>  Issue Type: Task
>  Components: Hive, Iceberg integration
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: hive, iceberg, pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28442) Missing Column `ENGINE` in tables SYSDB.TAB_COL_STATS and SYSDB.PART_COL_STATS after upgrade from 3.1.0 to 4.1.0

2024-08-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28442:
--
Labels: pull-request-available  (was: )

> Missing Column `ENGINE` in tables SYSDB.TAB_COL_STATS and 
> SYSDB.PART_COL_STATS after upgrade from 3.1.0 to 4.1.0
> 
>
> Key: HIVE-28442
> URL: https://issues.apache.org/jira/browse/HIVE-28442
> Project: Hive
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-08-12-13-22-25-517.png, 
> image-2024-08-12-13-23-29-221.png, image-2024-08-12-13-24-29-328.png, 
> image-2024-08-12-13-25-25-731.png
>
>
> Fresh Install: InitSchema
> Tables SYSDB.TAB_COL_STATS and SYSDB.PART_COL_STATS
> !image-2024-08-12-13-22-25-517.png|width=399,height=241!
> Select * from the above tables:
> !image-2024-08-12-13-23-29-221.png|width=439,height=265!
>  
> Issue:
> Upgrade Hive Schema from 3.2.0 to 4.1.0:
> !image-2024-08-12-13-24-29-328.png|width=466,height=285!
> Select * on the tables fails post upgrade
> !image-2024-08-12-13-25-25-731.png|width=416,height=185!
>  
> Looks like missed from HIVE-22046



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28441) NPE in ORC tables when hive.orc.splits.include.file.footer is enabled

2024-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28441:
--
Labels: pull-request-available  (was: )

> NPE in ORC tables when hive.orc.splits.include.file.footer is enabled
> -
>
> Key: HIVE-28441
> URL: https://issues.apache.org/jira/browse/HIVE-28441
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Affects Versions: 4.0.0
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Major
>  Labels: pull-request-available
>
> Steps to reproduce (tested on hive4 docker image):
> {code:java}
> set hive.orc.splits.include.file.footer=true;
> set hive.fetch.task.conversion=none;
> CREATE TABLE tbl (id INT, name STRING) STORED AS ORC;
> INSERT INTO tbl VALUES (1, 'abc');
> SELECT * FROM tbl;{code}
> Stacktrace:
> {code:java}
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>     at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>     at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
>     at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
>     at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.NullPointerException
>     at org.apache.orc.impl.BufferChunk.<init>(BufferChunk.java:41)
>     at org.apache.orc.impl.OrcTail.<init>(OrcTail.java:56)
>     at org.apache.orc.impl.OrcTail.<init>(OrcTail.java:50)
>     at org.apache.hadoop.hive.ql.io.orc.OrcSplit.readFields(OrcSplit.java:230)
>     at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:223)
>     at 
> org.apache.hadoop.mapred.split.TezGroupedSplit.readWrappedSplit(TezGroupedSplit.java:161)
>     at 
> org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:132)
>     at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
>     at 
> org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
>     at 
> org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:176)
>     at 
> org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:132)
>     at 
> org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:693)
>     at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:664)
>     at 
> org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150)
>     at 
> org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:520)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:173)
>     at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:292)
>     ... 16 more {code}
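The failing pattern is deserializing an optional cached footer without handling the absent case. A toy sketch of the defensive version (the wire format here is invented for illustration, not Hive's actual OrcSplit layout):

```python
import io
import struct

# Toy version of "split with an optional cached footer": the writer
# records a sentinel when no footer is attached, and the reader checks
# it instead of blindly building a buffer from a missing payload.

def write_split(out, footer):
    if footer is None:
        out.write(struct.pack(">i", -1))         # sentinel: footer absent
    else:
        out.write(struct.pack(">i", len(footer)))
        out.write(footer)

def read_split(inp):
    (n,) = struct.unpack(">i", inp.read(4))
    return None if n < 0 else inp.read(n)

buf = io.BytesIO()
write_split(buf, None)                           # footer not cached
buf.seek(0)
print(read_split(buf))                           # None -- no crash
```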



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28440) unblock hcatalog parquet project pushdown

2024-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28440:
--
Labels: pull-request-available  (was: )

> unblock hcatalog parquet project pushdown
> -
>
> Key: HIVE-28440
> URL: https://issues.apache.org/jira/browse/HIVE-28440
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 4.0.0
>Reporter: Yi Zhang
>Priority: Minor
>  Labels: pull-request-available
>
> For Pig jobs that use HCatLoader, project pushdown is not in effect when 
> loading Parquet tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28436) Incorrect syntax in Hive schema file for table MIN_HISTORY_LEVEL

2024-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28436:
--
Labels: pull-request-available  (was: )

> Incorrect syntax in Hive schema file for table MIN_HISTORY_LEVEL
> 
>
> Key: HIVE-28436
> URL: https://issues.apache.org/jira/browse/HIVE-28436
> Project: Hive
>  Issue Type: Bug
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> CREATE EXTERNAL TABLE IF NOT EXISTS `MIN_HISTORY_LEVEL` ( `MHL_TXNID` bigint, 
> `MHL_MIN_OPEN_TXNID` bigint ) STORED BY 
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( 
> "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT 
> `MHL_TXNID`, `MHL_MIN_OPEN_TXNID`, FROM `MIN_HISTORY_LEVEL`" )
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: 
> MetaException(message:org.apache.hadoop.hive.serde2.SerDeException
> org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error 
> while trying to get column names: You have an error in your SQL syntax; check 
> the manual that corresponds to your MySQL server version for the right syntax 
> to use near 'FROM `MIN_HISTORY_LEVEL` LIMIT 1' at line 1)
> INFO  : Compiling 
> command(queryId=hive_20240805174353_0c743113-4f53-4174-916f-4a2d2085888e): 
> CREATE EXTERNAL TABLE IF NOT EXISTS `MIN_HISTORY_LEVEL` ( `MHL_TXNID` bigint, 
> `MHL_MIN_OPEN_TXNID` bigint ) STORED BY 
> 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( 
> "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT 
> `MHL_TXNID`, `MHL_MIN_OPEN_TXNID`, FROM `MIN_HISTORY_LEVEL`" )
> INFO  : Concurrency mode is disabled, not creating a lock manager 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28439) Iceberg: Bucket partition transform with DECIMAL can throw NPE

2024-08-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28439:
--
Labels: pull-request-available  (was: )

> Iceberg: Bucket partition transform with DECIMAL can throw NPE
> --
>
> Key: HIVE-28439
> URL: https://issues.apache.org/jira/browse/HIVE-28439
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: pull-request-available
>
> Hive can fail when we bucket records by decimal columns.
> {code:java}
> CREATE TABLE test (c_decimal DECIMAL(38, 0)) PARTITIONED BY SPEC (bucket(8, 
> c_decimal)) STORED BY ICEBERG;
> INSERT INTO test VALUES (CAST('5000441610525' AS DECIMAL(38, 
> 0))); {code}
> Stacktrace
> {code:java}
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1722775255811_0004_1_00, diagnostics=[Task failed, 
> taskId=task_1722775255811_0004_1_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Node: 
> yarn-nodemanager-2.yarn-nodemanager.zookage.svc.cluster.local/10.1.5.93 : 
> Error while running task ( failure ) : 
> attempt_1722775255811_0004_1_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing writable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
>   at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75)
>   at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing writable
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
>   ... 16 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing writable
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:569)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
>   ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:384)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888)
>   at 
> org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:133)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:110)
>   at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDTFInline.process(GenericUDTFInline.java:64)
>   at 
> org.a

[jira] [Updated] (HIVE-28366) Iceberg: Concurrent Insert and IOW produce incorrect result

2024-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28366:
--
Labels: pull-request-available  (was: )

> Iceberg: Concurrent Insert and IOW produce incorrect result 
> 
>
> Key: HIVE-28366
> URL: https://issues.apache.org/jira/browse/HIVE-28366
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>
> 1. create a table and insert some data:
> {code}
> create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) 
> stored by iceberg;
> insert into ice_t values (1, 1), (2, 2);
> insert into ice_t values (10, 10), (20, 20);
> insert into ice_t values (40, 40), (30, 30);
> {code}
> Then concurrently execute the following jobs:
> Job 1:
> {code}
> insert into ice_t select i*100, p*100 from ice_t;
> {code}
> Job 2:
> {code}
> insert overwrite ice_t select i+1, p+1 from ice_t;
> {code}
> If Job 1 finishes first, Job 2 still succeeds for me, and after that the 
> table content will be the following:
> {code}
> 2  2
> 3  3
> 11 11
> 21 21
> 31 31
> 41 41
> 100100
> 200200
> 1000   1000
> 2000   2000
> 3000   3000
> 4000   4000
> {code}
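The anomaly is consistent with the overwrite being computed against a stale snapshot: it replaces only the files it read, so files committed concurrently by Job 1 survive untransformed. A toy model (plain Python; an assumed simplification of Iceberg's snapshot-based commits, for illustration only):

```python
# Toy snapshot model: a table is a set of immutable data files.
# INSERT appends a file; INSERT OVERWRITE deletes the files visible
# in the snapshot it read and appends its output.
table = {"f1": [1, 2], "f2": [10, 20], "f3": [40, 30]}

snapshot = set(table)                       # Job 2 starts reading here
rows = [r for f in snapshot for r in table[f]]

table["f4"] = [r * 100 for r in rows]       # Job 1 commits first

for f in snapshot:                          # Job 2 commits: replaces only
    del table[f]                            # the files from ITS snapshot
table["f5"] = [r + 1 for r in rows]

rows = sorted(r for f in table for r in table[f])
print(rows)   # [2, 3, 11, 21, 31, 41, 100, 200, 1000, 2000, 3000, 4000]
```

The final row set reproduces the table content listed above: i+1 of the original rows mixed with Job 1's untouched i*100 rows.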



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28438) Upgrade commons-dbcp2 and commons-pool2 to 2.12.0 to fix CVEs

2024-08-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28438:
--
Labels: pull-request-available  (was: )

> Upgrade commons-dbcp2 and commons-pool2 to 2.12.0 to fix CVEs
> -
>
> Key: HIVE-28438
> URL: https://issues.apache.org/jira/browse/HIVE-28438
> Project: Hive
>  Issue Type: Improvement
>Reporter: tanishqchugh
>Assignee: tanishqchugh
>Priority: Major
>  Labels: pull-request-available
>
> In the master branch, we are currently using commons-dbcp2 v2.9.0, which is 
> affected by the following CVEs:
> [CVE-2022-45868|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-45868]
> [CVE-2022-23221|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-23221]
> [CVE-2021-42392|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42392]
> [CVE-2021-23463|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-23463]
> Also, there are two different versions (2.10.0 & 2.11.1) of commons-pool2 
> across the project, as commons-dbcp2 v2.9.0 requires commons-pool2 v2.10.0 as 
> a compile-time dependency. 
> Upgrading both dependencies to v2.12.0 resolves the CVEs and makes the 
> commons-pool2 version consistent across the project, preventing potential 
> conflicts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28431) Fix RexLiteral to ExprNode conversion if the literal is an empty string

2024-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28431:
--
Labels: pull-request-available  (was: )

> Fix RexLiteral to ExprNode conversion if the literal is an empty string
> ---
>
> Key: HIVE-28431
> URL: https://issues.apache.org/jira/browse/HIVE-28431
> Project: Hive
>  Issue Type: Bug
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Currently, conversion from RexLiteral to ExprNode fails if the literal is an 
> empty string. This was introduced by 
> https://issues.apache.org/jira/browse/HIVE-23892 and causes the CBO to fail.
> The RexLiteral node will not be null, but the value within the RexLiteral can 
> still be empty.
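The bug class can be illustrated in miniature: an empty string is a present value, and only a truly missing literal should be rejected (toy Python, not Hive's RexLiteral-to-ExprNode converter):

```python
# Miniature version of the bug class: "" is a present value; only a
# truly missing literal should fail conversion. Toy code, not Hive's
# RexLiteral-to-ExprNode converter.

def convert(literal):
    if literal is None:                    # reject only missing values
        raise ValueError("no literal to convert")
    return {"type": "string", "value": literal}

print(convert(""))    # {'type': 'string', 'value': ''} -- empty is valid
```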



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28435) Upgrade cron-utils to 9.2.1

2024-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28435:
--
Labels: pull-request-available  (was: )

> Upgrade cron-utils to 9.2.1
> ---
>
> Key: HIVE-28435
> URL: https://issues.apache.org/jira/browse/HIVE-28435
> Project: Hive
>  Issue Type: Task
>Reporter: tanishqchugh
>Assignee: tanishqchugh
>Priority: Major
>  Labels: pull-request-available
>
> Cron-utils v9.1.6 requires org.glassfish:javax.el v3.0.0 as a compile-time 
> dependency. The javax.el artifact was moved to jakarta.el, and all versions 
> of the jakarta.el artifact up to and including 3.0.3 are affected by 
> [CVE-2021-28170|https://nvd.nist.gov/vuln/detail/CVE-2021-28170].
> Upgrading cron-utils to 9.2.1 gets rid of CVE-2021-28170, as the upgrade 
> removes the transitive usage of javax.el.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28434) Upgrade to tez 0.10.4

2024-08-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28434:
--
Labels: pull-request-available  (was: )

> Upgrade to tez 0.10.4
> -
>
> Key: HIVE-28434
> URL: https://issues.apache.org/jira/browse/HIVE-28434
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28428) Map hash aggregation performance degradation

2024-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28428:
--
Labels: pull-request-available  (was: )

>  Map hash aggregation performance degradation
> -
>
> Key: HIVE-28428
> URL: https://issues.apache.org/jira/browse/HIVE-28428
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Major
>  Labels: pull-request-available
> Attachments: 2024-08-02 14.35.46.png, 
> image-2024-08-02-14-37-01-824.png, image-2024-08-02-14-38-45-459.png
>
>
> The following ticket enabled map hash aggregation, but performance is worse 
> than when it is disabled.
> https://issues.apache.org/jira/browse/HIVE-23356
> I found a few reasons for this. If there are a large number of keys, the 
> following log lines are output in large volume, affecting performance. This 
> can also cause an OOM.
> {code:java}
> 2024-08-02 05:21:53,675 [INFO] [TezChild] |exec.GroupByOperator|: Hash Tbl 
> flush: #hash table = 171000
> 2024-08-02 05:21:53,713 [INFO] [TezChild] |exec.GroupByOperator|: Hash Table 
> flushed: new size = 153900
> {code}
> By fixing this, we can improve performance as follows.
> Before:
> !image-2024-08-02-14-37-01-824.png!
> After:
> !2024-08-02 14.35.46.png!
> Also, the flush size is currently fixed, but performance can be improved by 
> adapting it to the data:
> !image-2024-08-02-14-38-45-459.png!
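The flush behavior, and the one-log-line-per-flush cost, can be modeled with a toy size-capped aggregation map; the cap and the 10% flush fraction here are illustrative values, not GroupByOperator's actual configuration:

```python
# Toy size-capped hash aggregation, mimicking GroupByOperator's
# flush-on-full behavior. The cap (1000) and the 10% flush fraction
# are illustrative values, not Hive's actual defaults.

def aggregate(keys, max_entries=1000, flush_fraction=0.1):
    table, flushes = {}, 0
    for k in keys:
        table[k] = table.get(k, 0) + 1
        if len(table) > max_entries:
            drop = int(len(table) * flush_fraction)
            for victim in list(table)[:drop]:
                del table[victim]          # emit downstream + evict
            flushes += 1                   # each flush logs a line
    return flushes

# 10,000 distinct keys against a 1,000-entry cap: 90 flushes,
# i.e. 90 pairs of "Hash Tbl flush" log lines.
print(aggregate(range(10_000)))
```

With millions of distinct keys, flush counts (and the log lines they emit) grow proportionally, which is where the slowdown shows up.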



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28427) HivePreFilteringRule gets applied multiple times

2024-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28427:
--
Labels: pull-request-available  (was: )

> HivePreFilteringRule gets applied multiple times
> 
>
> Key: HIVE-28427
> URL: https://issues.apache.org/jira/browse/HIVE-28427
> Project: Hive
>  Issue Type: Bug
>Reporter: Soumyakanti Das
>Assignee: Soumyakanti Das
>Priority: Major
>  Labels: pull-request-available
>
> In the {{matches}} method of {{HivePreFilteringRule}}, we check whether a 
> node has already been visited using the {{HiveRulesRegistry}}. This is done 
> with a {{SetMultimap}}. Currently, we don't get the same hash value for 
> equivalent RelNodes, and because of this we visit equivalent nodes multiple 
> times even when they are present in the registry. Sometimes we can also see 
> infinite matching.
>  
> Instead, we can use a {{SetMultimap}} and store Strings.
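Why string keys help can be shown in miniature: without value-based equality, two structurally identical nodes hash by identity, so the visited-set misses the second one (toy Python stand-in for RelNode; the digest string is illustrative):

```python
# Toy stand-in for RelNode: no __eq__/__hash__, so Python (like Java
# without overridden hashCode/equals) hashes by object identity.
class PlanNode:
    def __init__(self, digest):
        self.digest = digest   # stable string form of the plan

a = PlanNode("HiveFilter(condition=[=($0, 1)])")
b = PlanNode("HiveFilter(condition=[=($0, 1)])")  # equivalent, new object

visited_by_object = {a}
print(b in visited_by_object)         # False: registry misses it, rule re-fires

visited_by_digest = {a.digest}
print(b.digest in visited_by_digest)  # True: matched exactly once
```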



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28426) mysql schema upgrade fails

2024-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28426:
--
Labels: pull-request-available  (was: )

> mysql schema upgrade fails
> --
>
> Key: HIVE-28426
> URL: https://issues.apache.org/jira/browse/HIVE-28426
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28422) Iceberg: Added missing awaitility dependency in iceberg-catalog

2024-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28422:
--
Labels: pull-request-available  (was: )

> Iceberg: Added missing awaitility dependency in iceberg-catalog
> ---
>
> Key: HIVE-28422
> URL: https://issues.apache.org/jira/browse/HIVE-28422
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Trivial
>  Labels: pull-request-available
>
> HIVE-28364 only added the *org.awaitility* dependency in the iceberg root 
> pom, but this dependency is used by iceberg-catalog, so we need to add it in 
> the iceberg-catalog module. Otherwise, the IDE will regard *Awaitility* as 
> unrecognized code in *TestHiveTableConcurrency*, which may confuse people, 
> though we can still build the iceberg project successfully.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28421) Iceberg: mvn test can not run UTs in iceberg-catalog

2024-07-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28421:
--
Labels: pull-request-available  (was: )

> Iceberg: mvn test can not run UTs in iceberg-catalog
> -
>
> Key: HIVE-28421
> URL: https://issues.apache.org/jira/browse/HIVE-28421
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>
> mvn clean test -Dtest=TestHiveCommits -pl iceberg/iceberg-catalog -Piceberg
> will print {*}Tests run: 0, Failures: 0, Errors: 0, Skipped: 0{*}, and 
> actually the tests won't be run.
>  
> {code:java}
> For more information see 
> https://gradle.com/help/maven-extension-compile-avoidance.
> [INFO] Loaded from the build cache, saving 1.185s
> [INFO]
> [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ 
> hive-iceberg-catalog ---
> [INFO]
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO]
> [INFO] Results:
> [INFO]
> [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0
> [INFO]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time:  9.513 s
> [INFO] Finished at: 2024-08-01T13:32:26+08:00
> [INFO] 
> 
> [INFO] 17 goals, 14 executed, 3 from cache, saving at least 2s
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28417) Bump log4j2 to 2.23.1 to facilitate the use of HiveServer2 JDBC Driver under GraalVM Native Image

2024-07-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28417:
--
Labels: pull-request-available  (was: )

> Bump log4j2 to 2.23.1 to facilitate the use of HiveServer2 JDBC Driver under 
> GraalVM Native Image
> -
>
> Key: HIVE-28417
> URL: https://issues.apache.org/jira/browse/HIVE-28417
> Project: Hive
>  Issue Type: Improvement
>Reporter: Qiheng He
>Priority: Major
>  Labels: pull-request-available
>
> * Bump log4j2 to 2.23.1 to facilitate the use of the HiveServer2 JDBC Driver 
> under GraalVM Native Image.
>  * apache/logging-log4j2 has, since `2.24.0`, eliminated its use of various 
> removed old JDK APIs that prevented it from being used under GraalVM Native 
> Image. See [https://github.com/apache/logging-log4j2/issues/1539] .
>  - But apache/hive:4.0.0 is still using the old version of 
> apache/logging-log4j2, which means that in PRs such as 
> [https://github.com/apache/shardingsphere/pull/31526] , in order to execute 
> unit tests related to the HiveServer2 JDBC Driver under GraalVM Native Image, 
> I have to manually exclude the Log4j2 dependency. It looks like this:
> {code:xml}
> <dependencies>
>    <dependency>
>       <groupId>org.apache.hive</groupId>
>       <artifactId>hive-jdbc</artifactId>
>       <version>4.0.0</version>
>    </dependency>
>    <dependency>
>       <groupId>org.apache.hive</groupId>
>       <artifactId>hive-service</artifactId>
>       <version>4.0.0</version>
>       <exclusions>
>          <exclusion>
>             <groupId>org.apache.hadoop</groupId>
>             <artifactId>hadoop-client-api</artifactId>
>          </exclusion>
>          <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-api</artifactId>
>          </exclusion>
>          <exclusion>
>             <groupId>org.apache.logging.log4j</groupId>
>             <artifactId>log4j-slf4j-impl</artifactId>
>          </exclusion>
>          <exclusion>
>             <groupId>org.slf4j</groupId>
>             <artifactId>slf4j-log4j12</artifactId>
>          </exclusion>
>       </exclusions>
>    </dependency>
>    <dependency>
>       <groupId>org.apache.hadoop</groupId>
>       <artifactId>hadoop-client-api</artifactId>
>       <version>3.3.6</version>
>    </dependency>
> </dependencies>
> {code}
>  - If `org.apache.logging.log4j:log4j-api` is not excluded, HiveServer2 JDBC 
> Driver cannot be used under GraalVM Native Image, and the log is similar to 
> the following.
> {code:bash}
> [INFO] Executing: 
> /home/linghengqian/TwinklingLiftWorks/git/public/shardingsphere/test/native/target/native-tests
>  --xml-output-dir 
> /home/linghengqian/TwinklingLiftWorks/git/public/shardingsphere/test/native/target/native-test-reports
>  
> -Djunit.platform.listeners.uid.tracking.output.dir=/home/linghengqian/TwinklingLiftWorks/git/public/shardingsphere/test/native/target/test-ids
> JUnit Platform on Native Image - report
> 
> Failures (1):
>   JUnit Jupiter:HiveTest:assertShardingInLocalTransactions()
> MethodSource [className = 
> 'org.apache.shardingsphere.test.natived.jdbc.databases.HiveTest', methodName 
> = 'assertShardingInLocalTransactions', methodParameterTypes = '']
> => java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.logging.log4j.LogManager
>
> org.apache.commons.logging.LogAdapter$Log4jLog.<init>(LogAdapter.java:155)
>
> org.apache.commons.logging.LogAdapter$Log4jAdapter.createLog(LogAdapter.java:122)
>org.apache.commons.logging.LogAdapter.createLog(LogAdapter.java:89)
>org.apache.commons.logging.LogFactory.getLog(LogFactory.java:67)
>org.apache.commons.logging.LogFactory.getLog(LogFactory.java:59)
>org.apache.hadoop.fs.FileSystem.<clinit>(FileSystem.java:135)
>java.base@22.0.2/java.lang.Class.ensureInitialized(DynamicHub.java:599)
>java.base@22.0.2/java.lang.Class.ensureInitialized(DynamicHub.java:599)
>java.base@22.0.2/java.lang.Class.ensureInitialized(DynamicHub.java:599)
>
> org.apache.hadoop.hive.conf.valcoersion.JavaIOTmpdirVariableCoercion.<clinit>(JavaIOTmpdirVariableCoercion.java:37)
>[...]
> {code}
> - If the Apache/Hive side upgrades its log4j2 version, then to use the 
> HiveServer2 JDBC Driver under GraalVM Native Image I only need to provide the 
> GraalVM Reachability Metadata for Log4j2 in the downstream project.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-23964) SemanticException in query 30 while generating logical plan

2024-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23964:
--
Labels: pull-request-available  (was: )

> SemanticException in query 30 while generating logical plan
> ---
>
> Key: HIVE-23964
> URL: https://issues.apache.org/jira/browse/HIVE-23964
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: cbo_query30_stacktrace.txt
>
>
> "Invalid table alias or column reference 'c_last_review_date'" is thrown when 
> running TPC-DS query 30 (cbo_query30.q, query30.q) on the metastore with the 
> partitioned TPC-DS 30TB dataset. 
> The respective stacktrace is attached to this case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28402) Precommit tests fail with OOM when running split-19

2024-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28402:
--
Labels: pull-request-available  (was: )

> Precommit tests fail with OOM when running split-19
> ---
>
> Key: HIVE-28402
> URL: https://issues.apache.org/jira/browse/HIVE-28402
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>
> The last 3 runs in master all fail with OOM when running split-19:
>  * 
> [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2233/pipeline]
>  * 
> [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2234/pipeline]
>  * 
> [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2235/pipeline]
> {noformat}
> [2024-07-25T05:57:46.816Z] [INFO] Running 
> org.apache.hadoop.hive.metastore.client.TestGetPartitions
> [2024-07-25T06:00:23.926Z] Exception in thread "Thread-46" 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> [2024-07-25T06:00:23.926Z]at 
> java.util.Arrays.copyOfRange(Arrays.java:3664)
> [2024-07-25T06:00:23.926Z]at java.lang.String.(String.java:207)
> [2024-07-25T06:00:23.926Z]at 
> java.io.BufferedReader.readLine(BufferedReader.java:356)
> [2024-07-25T06:00:24.907Z]at 
> java.io.BufferedReader.readLine(BufferedReader.java:389)
> [2024-07-25T06:00:24.907Z]at 
> org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:89)
> [2024-07-25T06:01:46.664Z] [WARNING] ForkStarter IOException: GC overhead 
> limit exceeded. See the dump file 
> /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream
> [2024-07-25T06:01:55.003Z] [INFO] Running 
> org.apache.hadoop.hive.metastore.TestFilterHooks
> [2024-07-25T06:02:21.747Z] 
> [2024-07-25T06:02:21.748Z] Exception: java.lang.OutOfMemoryError thrown from 
> the UncaughtExceptionHandler in thread "Thread-49"
> [2024-07-25T06:03:08.707Z] [WARNING] ForkStarter IOException: GC overhead 
> limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded
> [2024-07-25T06:03:08.707Z] GC overhead limit exceeded. See the dump file 
> /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream
> [2024-07-25T06:03:15.362Z] [ERROR] Error closing test event listener:
> [2024-07-25T06:03:15.362Z] java.util.concurrent.CompletionException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.CompletableFuture.encodeThrowable 
> (CompletableFuture.java:273)
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.CompletableFuture.completeThrowable 
> (CompletableFuture.java:280)
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.CompletableFuture$AsyncRun.run 
> (CompletableFuture.java:1643)
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.ThreadPoolExecutor.runWorker 
> (ThreadPoolExecutor.java:1149)
> [2024-07-25T06:03:15.362Z] at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run 
> (ThreadPoolExecutor.java:624)
> [2024-07-25T06:03:15.362Z] at java.lang.Thread.run (Thread.java:748)
> [2024-07-25T06:03:15.362Z] Caused by: java.lang.OutOfMemoryError: GC overhead 
> limit exceeded
> [2024-07-25T06:03:15.363Z] [ERROR] GC overhead limit exceeded -> [Help 1]
> {noformat}
> The OOM is also affecting PR runs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28409) Column lineage when creating view is missing if atlas HiveHook is set

2024-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28409:
--
Labels: pull-request-available  (was: )

> Column lineage when creating view is missing if atlas HiveHook is set
> -
>
> Key: HIVE-28409
> URL: https://issues.apache.org/jira/browse/HIVE-28409
> Project: Hive
>  Issue Type: Bug
>  Components: lineage
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>
> Column lineage info is collected by 
> {{{}org.apache.hadoop.hive.ql.optimizer.lineage.Generator{}}}. This is called 
> during Hive optimizations and view creation if one of these conditions is met:
> {code:java}
> hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_LINEAGE_INFO)
> || 
> postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.PostExecutePrinter")
> || 
> postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.LineageLogger")
> || postExecHooks.contains("org.apache.atlas.hive.hook.HiveHook")
> {code}
> [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java#L78-L81]
> and 
> [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L13226-L13228]
> However, HIVE-17125 introduced more conditions which affect only the 
> {{org.apache.atlas.hive.hook.HiveHook}}:
> [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java#L75-L86]
>  
> Later, HIVE-23244 changed the code that handles view creation. Since there 
> are no tests at all for view creation when 
> {{org.apache.atlas.hive.hook.HiveHook}} is specified, the new code skips 
> column lineage info collection.
> The tests we have for testing column lineage info collection are using 
> [LineageLogger.java|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageLogger.java]
>  which doesn't have any restriction in the Generator so column lineage info 
> is always collected when LineageLogger is set.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28407) Alter Table Rename should not require create database privilege

2024-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28407:
--
Labels: pull-request-available  (was: )

> Alter Table Rename should not require create database privilege
> ---
>
> Key: HIVE-28407
> URL: https://issues.apache.org/jira/browse/HIVE-28407
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
>
> Currently we add the database object to the list of privilege objects 
> required for authorization in the alter table rename set of queries. Ideally 
> we only need a create table permission on the database and not a create 
> database permission. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28404) Fix typo Overriden to Overridden

2024-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28404:
--
Labels: pull-request-available  (was: )

> Fix typo Overriden to Overridden
> 
>
> Key: HIVE-28404
> URL: https://issues.apache.org/jira/browse/HIVE-28404
> Project: Hive
>  Issue Type: Bug
>Reporter: Caican Cai
>Priority: Minor
>  Labels: pull-request-available
>
> Fix the typo "Overriden" to "Overridden".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28405) Set default TimeUnit for hive.repl.cm.retain to DAYS in metastore configs

2024-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28405:
--
Labels: pull-request-available  (was: )

> Set default TimeUnit for hive.repl.cm.retain to DAYS in metastore configs
> -
>
> Key: HIVE-28405
> URL: https://issues.apache.org/jira/browse/HIVE-28405
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Standalone Metastore
>Reporter: Smruti Biswal
>Assignee: Smruti Biswal
>Priority: Minor
>  Labels: pull-request-available
>
> In HiveConf.java the default time unit for hive.repl.cm.retain is DAYS.
> One could very easily get confused and set the value assuming DAYS under the 
> metastore configuration as well. It would be a good idea to keep the units in 
> sync.
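One way to sidestep the ambiguity is to spell the unit out in the value itself; Hive's time-valued properties generally accept a unit suffix (illustrative values below, assuming suffix support applies to this property):

```xml
<!-- hive-site.xml / metastore-site.xml: make the unit explicit -->
<property>
  <name>hive.repl.cm.retain</name>
  <!-- "7d" = 7 days; a bare "7" would be read in whatever default unit
       the config defines, which is where the confusion comes from -->
  <value>7d</value>
</property>
```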



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28403) Delete redundant Javadoc for Hive

2024-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28403:
--
Labels: pull-request-available  (was: )

> Delete redundant Javadoc for Hive
> -
>
> Key: HIVE-28403
> URL: https://issues.apache.org/jira/browse/HIVE-28403
> Project: Hive
>  Issue Type: Wish
>Reporter: Caican Cai
>Priority: Minor
>  Labels: pull-request-available
>
> Hive has some redundant Javadoc blocks that contain no comments at all. I 
> think such Javadoc can be deleted, for example:
> {code:java}
> // Some comments here
>   /**
>*
>*/
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline

2024-07-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28401:
--
Labels: pull-request-available  (was: )

> Drop redundant XML test report post-processing from CI pipeline
> ---
>
> Key: HIVE-28401
> URL: https://issues.apache.org/jira/browse/HIVE-28401
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>
> The [Maven Surefire 
> plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin]
>  generates an XML report containing various information regarding the 
> execution of tests. In case of failures the system-out and system-err output 
> from the test is saved in the XML file.
> The Jenkins pipeline has a post-processing 
> [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380]
>  that attempts to remove the system-out and system-err entries from the XML 
> files generated by Surefire for all tests that passed as an attempt to save 
> disk space in the Jenkins node.
> {code:bash}
> # removes all stdout and err for passed tests
> xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d 
> 'testsuite/testcase/system-err[count(../failure)=0]' 
> {code}
> This cleanup step is not necessary since Surefire (as of 3.0.0-M4) no longer 
> stores system-out and system-err for tests that passed. 
> Moreover, when the XML report file is large xmlstarlet chokes and throws a 
> "Huge input lookup" error that skips the remaining post-processing steps and 
> makes the build fail.
> {noformat}
> [2024-07-23T16:11:26.052Z] 
> ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2:
>  internal error: Huge input lookup
> [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799  INFO 
> [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio
> [2024-07-23T16:11:26.053Z]  ^
> [2024-07-23T16:11:43.133Z] Recording test results
> [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found.
> script returned exit code 3
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28399) Improve the fetch size in HiveConnection

2024-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28399:
--
Labels: pull-request-available  (was: )

> Improve the fetch size in HiveConnection
> 
>
> Key: HIVE-28399
> URL: https://issues.apache.org/jira/browse/HIVE-28399
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>
> If the 4.x Hive JDBC client connects to an older HS2 or another Thrift 
> implementation, it might throw an IllegalStateException: 
> [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L1253-L1258],
>  as the remote might not have set the property 
> hive.server2.thrift.resultset.default.fetch.size back in the response to the 
> OpenSession request. It also causes confusion about what the connection's 
> real fetch size is: we have both initFetchSize and defaultFetchSize in 
> HiveConnection, and HiveStatement checks initFetchSize, defaultFetchSize, and 
> HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.defaultIntVal to obtain the 
> real fetch size. We can make them one in HiveConnection, so every statement 
> created from the connection uses this new fetch size.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28360) Upgrade jersey to version 1.19.4,

2024-07-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28360:
--
Labels: hive-4.0.1-must pull-request-available  (was: hive-4.0.1-must)

> Upgrade jersey to version 1.19.4,
> -
>
> Key: HIVE-28360
> URL: https://issues.apache.org/jira/browse/HIVE-28360
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.3
>Reporter: lvyankui
>Assignee: lvyankui
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Attachments: HIVE-28360.patch
>
>
> Hive version: 3.1.3
> Hadoop version: 3.3.6
> After upgrading to Hadoop 3.3.6, the Hive WebHCat server fails to start 
> because of inconsistent versions of the Jersey JAR package. Hive HCat lacks 
> the jersey-server-1.19 jar.
>  
> After upgrading to Hadoop 3.3.5+, Hadoop updates Jersey to version 1.19.4, 
> which is inconsistent with the Jersey version in the Hive WebHCat server. As 
> a result, the startup fails. To work around this, one has to manually 
> download the jar and place it in 
> /usr/lib/hive-hcatalog/share/webhcat/svr/lib/.
> Therefore, when packaging Hive, we need to pin the version of Jersey in the 
> Hive POM file to match the version of Jersey in Hadoop, to avoid version 
> conflicts.
>  
> Here is the error log:
> INFO  | 18 Jul 2024 14:37:13,237 | org.eclipse.jetty.server.Server | 
> jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: 
> 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 1.8.0_412-b08
> WARN  | 18 Jul 2024 14:37:13,326 | 
> org.eclipse.jetty.server.handler.ContextHandler.ROOT | unavailable
> com.sun.jersey.api.container.ContainerException: No WebApplication provider 
> is present
>         at 
> com.sun.jersey.spi.container.WebApplicationFactory.createWebApplication(WebApplicationFactory.java:69)
>  ~[jersey-server-1.19.4.jar:1.19.4]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.create(ServletContainer.java:412)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.create(ServletContainer.java:327)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:603) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:207) 
> ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:394)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:577)
>  ~[jersey-servlet-1.19.jar:1.19]
>         at javax.servlet.GenericServlet.init(GenericServlet.java:244) 
> ~[javax.servlet-api-3.1.0.jar:3.1.0]
>  
>  
>  
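Pinning could look like the following illustrative dependencyManagement fragment (coordinates from the com.sun.jersey 1.x line; the exact artifacts Hive needs may differ):

```xml
<dependencyManagement>
  <dependencies>
    <!-- Keep Jersey aligned with the version shipped by Hadoop 3.3.5+ -->
    <dependency>
      <groupId>com.sun.jersey</groupId>
      <artifactId>jersey-server</artifactId>
      <version>1.19.4</version>
    </dependency>
    <dependency>
      <groupId>com.sun.jersey</groupId>
      <artifactId>jersey-servlet</artifactId>
      <version>1.19.4</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```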



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28347) Make a UDAF 'collect_set' work with complex types, even when map-side aggregation is disabled.

2024-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28347:
--
Labels: pull-request-available  (was: )

> Make a UDAF 'collect_set' work with complex types, even when map-side 
> aggregation is disabled.
> --
>
> Key: HIVE-28347
> URL: https://issues.apache.org/jira/browse/HIVE-28347
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.4.0, 3.1.3, 4.0.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
>  Labels: pull-request-available
>
> collect_set() (and collect_list()) don't work with complex types when 
> map-side aggregation is disabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26473) Upgrade to Java17

2024-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26473:
--
Labels: pull-request-available  (was: )

> Upgrade to Java17
> -
>
> Key: HIVE-26473
> URL: https://issues.apache.org/jira/browse/HIVE-26473
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: dingwei2019
>Assignee: Akshat Mathur
>Priority: Major
>  Labels: pull-request-available
>
> We know that JDK 11 is an LTS version, but its technical support will end in 
> September 2023. JDK 17 is the next-generation LTS version and will be 
> supported at least until 2026. 
> For G1GC, Java 17 is about 8.66% faster than Java 11; for ParallelGC, the 
> improvement is about 6.54%. If we upgrade to Java 17, we will get more 
> performance improvement than with Java 11.
>  
> I suggest we upgrade Hive to support Java 17.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28375) Upgrade Nimbus-JOSE-JWT to 9.37.3 due to CVE-2023-52428

2024-07-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28375:
--
Labels: pull-request-available  (was: )

> Upgrade Nimbus-JOSE-JWT to 9.37.3 due to CVE-2023-52428
> ---
>
> Key: HIVE-28375
> URL: https://issues.apache.org/jira/browse/HIVE-28375
> Project: Hive
>  Issue Type: Task
>Reporter: Devaspati Krishnatri
>Assignee: Devaspati Krishnatri
>Priority: Minor
>  Labels: pull-request-available
> Attachments: mvn_dependency_tree.txt
>
>
> Upgrade Nimbus-JOSE-JWT to 9.37.3 due to CVE-2023-52428



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28377) Add support for hive.output.file.extension to HCatStorer

2024-07-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28377:
--
Labels: pull-request-available  (was: )

> Add support for hive.output.file.extension to HCatStorer
> 
>
> Key: HIVE-28377
> URL: https://issues.apache.org/jira/browse/HIVE-28377
> Project: Hive
>  Issue Type: Improvement
>Reporter: Venkatasubrahmanian Narayanan
>Assignee: Venkatasubrahmanian Narayanan
>Priority: Minor
>  Labels: pull-request-available
>
> Hive supports custom file extensions for output files configured through the 
> hive.output.file.extension property, but HCatStorer doesn't support that 
> property or have a replacement.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28376) Remove unused Hive object from RelOptHiveTable

2024-07-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28376:
--
Labels: pull-request-available  (was: )

> Remove unused Hive object from RelOptHiveTable
> --
>
> Key: HIVE-28376
> URL: https://issues.apache.org/jira/browse/HIVE-28376
> Project: Hive
>  Issue Type: Task
>  Components: CBO
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>
> The 
> [Hive|https://github.com/apache/hive/blob/b18d5732b4f309fdc3b8226847c9c1ebcd2476fd/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java]
>  object is not used inside RelOptHiveTable so keeping a reference to it is 
> wasting memory and also complicates creation of RelOptHiveTable objects 
> (constructor parameter). 
> Moreover, the Hive objects have thread-local scope, so in general they 
> shouldn't be passed around because their lifecycle becomes harder to manage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28374) Iceberg: Handle change of default format-version

2024-07-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28374:
--
Labels: pull-request-available  (was: )

> Iceberg: Handle change of default format-version
> 
>
> Key: HIVE-28374
> URL: https://issues.apache.org/jira/browse/HIVE-28374
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Handle changes due to the change of the default format version to 2 in the 
> Iceberg library.
> One example: a table created with an explicitly defined format-version=2 is 
> MOR, but when the version is left unspecified the format version is still 2 
> while the table is COW.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28370) HMS's Authorizer for ALTER_TABLE event doesn't depend on HIVE_AUTHORIZATION_TABLES_ON_STORAGEHANDLERS

2024-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28370:
--
Labels: pull-request-available  (was: )

> HMS's Authorizer for ALTER_TABLE event doesn't depend on 
> HIVE_AUTHORIZATION_TABLES_ON_STORAGEHANDLERS
> -
>
> Key: HIVE-28370
> URL: https://issues.apache.org/jira/browse/HIVE-28370
> Project: Hive
>  Issue Type: Bug
>Reporter: Hongdan Zhu
>Assignee: Hongdan Zhu
>Priority: Major
>  Labels: pull-request-available
>
> When HIVE_AUTHORIZATION_TABLES_ON_STORAGEHANDLERS is set on both HS2 and HMS, 
> only HS2 authorization depends on it. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28373) fix-hadoop-catalog based table

2024-07-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28373:
--
Labels: pull-request-available  (was: )

> fix-hadoop-catalog based table
> --
>
> Key: HIVE-28373
> URL: https://issues.apache.org/jira/browse/HIVE-28373
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: yongzhi.shao
>Priority: Major
>  Labels: pull-request-available
>
> Since there are a lot of problems with hadoop_catalog, we submitted the 
> following PR to the iceberg community: 
> [core: Refactor the code of HadoopTableOptions by BsoBird · Pull Request 
> #10623 · apache/iceberg 
> (github.com)|https://github.com/apache/iceberg/pull/10623]
> With this PR, we can implement atomic operations based on HadoopCatalog.
> But this PR was not accepted by the iceberg community, and it seems that the 
> iceberg community is trying to remove support for HadoopCatalog.
> Since Hive itself supports a number of features based on hadoop_catalog 
> tables, can we merge this patch in Hive?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28372) No need to update partitions stats when renaming table

2024-07-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28372:
--
Labels: pull-request-available  (was: )

> No need to update partitions stats when renaming table
> --
>
> Key: HIVE-28372
> URL: https://issues.apache.org/jira/browse/HIVE-28372
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>
> After HIVE-27725, we no longer need to update partition stats when renaming a 
> table.
> This change can speed up the rename operation for partitioned tables with 
> many partition stats stored in HMS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28371) Optimize add partitions authorization in HiveMetaStore

2024-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28371:
--
Labels: pull-request-available  (was: )

> Optimize add partitions authorization in HiveMetaStore
> --
>
> Key: HIVE-28371
> URL: https://issues.apache.org/jira/browse/HIVE-28371
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>
> Currently the add_partitions() API sends all the partitions (new partitions 
> and existing partitions) that need to be added for authorization. Instead, we 
> can optimize this by sending only the new partitions for authorization.
> Impact: Alter table recover partitions collects all the available partitions 
> and sends them to the Metastore to check whether any new partitions can be 
> added. If all the partitions are sent for authorization irrespective of 
> whether they already exist, the authorization service unnecessarily spends 
> time authorizing already-existing partitions. This can be avoided by only 
> authorizing new partitions.
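The optimization amounts to a set difference before authorization; a minimal sketch with an invented helper (not the actual HMS code):

```java
// Illustrative sketch: authorize only partitions that do not already exist,
// rather than everything the client sent along with the request.
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class NewPartitionFilter {

    /** Partitions identified by their name, e.g. "ds=2024-07-12". */
    static List<String> partitionsToAuthorize(List<String> requested,
                                              Set<String> existing) {
        return requested.stream()
                .filter(name -> !existing.contains(name)) // keep only new ones
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> requested =
            List.of("ds=2024-07-10", "ds=2024-07-11", "ds=2024-07-12");
        Set<String> existing = Set.of("ds=2024-07-10", "ds=2024-07-11");
        // Only the genuinely new partition reaches the authorizer.
        System.out.println(partitionsToAuthorize(requested, existing));
    }
}
```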



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28369) LLAP proactive eviction fails with NullPointerException

2024-07-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28369:
--
Labels: pull-request-available  (was: )

> LLAP proactive eviction fails with NullPointerException
> ---
>
> Key: HIVE-28369
> URL: https://issues.apache.org/jira/browse/HIVE-28369
> Project: Hive
>  Issue Type: Bug
>Reporter: Seonggon Namgung
>Assignee: Seonggon Namgung
>Priority: Major
>  Labels: pull-request-available
>
> When hive.llap.io.encode.enabled is false, LLAP proactive eviction fails with 
> NullPointerException as follows:
> {code:java}
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.evictEntity(LlapIoImpl.java:313)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.evictEntity(LlapProtocolServerImpl.java:365)
>  ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapManagementProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:33214)
>  ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
>  ~[hadoop-common-3.3.6.jar:?]
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
>  ~[hadoop-common-3.3.6.jar:?]
> ...{code}
>  
> In fact, the three caches used by LlapIoImpl.evictEntity() may be null or may 
> throw UnsupportedOperationException, so we should check whether it is safe to 
> call markBuffersForProactiveEviction().
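A minimal sketch of the kind of guard the description suggests. The `Cache` interface and the method names below are illustrative stand-ins, not the actual LlapIoImpl fields:

```java
import java.util.Arrays;
import java.util.List;

public class EvictionGuard {
    // Hypothetical stand-in for an LLAP cache component.
    interface Cache {
        long markBuffersForProactiveEviction();
    }

    // Returns the number of buffers marked, skipping caches that are absent
    // or that do not support proactive eviction.
    static long evictEntitySafely(List<Cache> caches) {
        long marked = 0;
        for (Cache cache : caches) {
            if (cache == null) {
                continue; // cache not initialized, e.g. hive.llap.io.encode.enabled=false
            }
            try {
                marked += cache.markBuffersForProactiveEviction();
            } catch (UnsupportedOperationException e) {
                // this cache implementation does not support proactive eviction
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Cache ok = () -> 3;
        Cache unsupported = () -> { throw new UnsupportedOperationException(); };
        long marked = evictEntitySafely(Arrays.asList(ok, null, unsupported));
        System.out.println(marked); // prints 3
    }
}
```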



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28356) HMS’s Authorizer for the CREATE_TABLE event doesn’t handle HivePrivilegeObjectType.STORAGEHANDLER_URI

2024-07-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28356:
--
Labels: pull-request-available  (was: )

> HMS’s Authorizer for the CREATE_TABLE event doesn’t handle 
> HivePrivilegeObjectType.STORAGEHANDLER_URI
> -
>
> Key: HIVE-28356
> URL: https://issues.apache.org/jira/browse/HIVE-28356
> Project: Hive
>  Issue Type: Bug
>Reporter: Hongdan Zhu
>Assignee: Hongdan Zhu
>Priority: Major
>  Labels: pull-request-available
>
> HIVE-27322 fixed the authorization of the Iceberg storagehandler through 
> Ranger policies for HS2, but the same policy enforcement is missing on the 
> HMS side, allowing the user to use directly the HMS API or simply use 
> Spark-SQL to create a storagehandler based table without the ranger policies 
> checked.
> From Spark-SQL:
> {noformat}
> spark.sql("CREATE TABLE default.icespark1 (id int, txt string) USING iceberg 
> TBLPROPERTIES ('external.table.purge'='true')"){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28367) Bump org.xerial.snappy:snappy-java from 1.1.10.4 to 1.1.10.5

2024-07-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28367:
--
Labels: pull-request-available  (was: )

> Bump org.xerial.snappy:snappy-java from 1.1.10.4 to 1.1.10.5
> 
>
> Key: HIVE-28367
> URL: https://issues.apache.org/jira/browse/HIVE-28367
> Project: Hive
>  Issue Type: Bug
>Reporter: tanishqchugh
>Assignee: tanishqchugh
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28368) Iceberg: Unable to read PARTITIONS Metadata table

2024-07-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28368:
--
Labels: pull-request-available  (was: )

> Iceberg: Unable to read PARTITIONS Metadata table
> -
>
> Key: HIVE-28368
> URL: https://issues.apache.org/jira/browse/HIVE-28368
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Fails with
> {noformat}
> Caused by: java.lang.ClassCastException: java.time.LocalDateTime cannot be 
> cast to java.time.OffsetDateTime
>   at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampWithZoneObjectInspectorHive3.getPrimitiveJavaObject(IcebergTimestampWithZoneObjectInspectorHive3.java:60)
>   at 
> org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampWithZoneObjectInspectorHive3.getPrimitiveWritableObject(IcebergTimestampWithZoneObjectInspectorHive3.java:67)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:313)
>  
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
>   at 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
>  
> {noformat}
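The cast fails because java.time.LocalDateTime is not a subtype of java.time.OffsetDateTime; a LocalDateTime must be combined with an offset explicitly. A minimal illustration (using UTC here is an assumption for the example, not necessarily the offset the object inspector should apply):

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class TimestampConversion {
    public static void main(String[] args) {
        LocalDateTime local = LocalDateTime.of(2024, 7, 11, 12, 30);
        // (OffsetDateTime) local;  // would throw ClassCastException, as in the stack trace
        OffsetDateTime withZone = local.atOffset(ZoneOffset.UTC); // explicit conversion
        System.out.println(withZone); // 2024-07-11T12:30Z
    }
}
```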



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28364) Iceberg: Upgrade iceberg version to 1.5.2

2024-07-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28364:
--
Labels: pull-request-available  (was: )

> Iceberg: Upgrade iceberg version to 1.5.2
> -
>
> Key: HIVE-28364
> URL: https://issues.apache.org/jira/browse/HIVE-28364
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28350) Drop remote database succeeds but fails while deleting data under

2024-07-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28350:
--
Labels: pull-request-available  (was: )

> Drop remote database succeeds but fails while deleting data under
> -
>
> Key: HIVE-28350
> URL: https://issues.apache.org/jira/browse/HIVE-28350
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>
> Drop remote database operation succeeds but fails towards the end while 
> clearing data under the database's location because, while fetching the 
> database object via JDO, we don't seem to set the 'locationUri' field.
> {code:java}
> > drop database pg_hive_tests;
> INFO  : Compiling 
> command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86): 
> drop database pg_hive_tests
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86); 
> Time taken: 0.115 seconds
> INFO  : Executing 
> command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86): 
> drop database pg_hive_tests
> INFO  : Starting task [Stage-0:DDL] in serial mode
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> MetaException(message:java.lang.IllegalArgumentException: Can not create a 
> Path from a null string)
>     at org.apache.hadoop.hive.ql.metadata.Hive.dropDatabase(Hive.java:716) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:51)
>  ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:356) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:329) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:813) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:550) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:544) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190) 
> ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_232]
>     at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_232]
>     at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>  ~[hadoop-common-3.1.1.7.2.18.0-641.jar:?]
>     at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
>  ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[?:1.8.0_232]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
>     at 
> java.util.concurrent.ThreadPoolE

[jira] [Updated] (HIVE-28349) SHOW TABLES with invalid connector, giving 0 results, instead of failing

2024-07-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28349:
--
Labels: pull-request-available  (was: )

> SHOW TABLES with invalid connector, giving 0 results, instead of failing
> 
>
> Key: HIVE-28349
> URL: https://issues.apache.org/jira/browse/HIVE-28349
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Standalone Metastore
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SHOW TABLES with invalid connector, giving 0 results, instead of failing
> Steps to repro:
> {code:java}
> drop connector postgres_connector;
> create connector postgres_connector type 'postgres' url 
> 'jdbc:postgresql://1.1.1.1:31462' with DCPROPERTIES 
> ("hive.sql.dbcp.username"="root", "hive.sql.dbcp.password"="cloudera");
> drop database pg_hive_testing;
> create remote database pg_hive_testing using postgres_connector with 
> DBPROPERTIES ("connector.remoteDbName"="postgres");
> show tables in pg_hive_testing; {code}
> The last query gives 0 rows (not a failure).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27829) New command to display current connections on HS2 and HMS instances

2024-07-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27829:
--
Labels: pull-request-available  (was: )

> New command to display current connections on HS2 and HMS instances
> ---
>
> Key: HIVE-27829
> URL: https://issues.apache.org/jira/browse/HIVE-27829
> Project: Hive
>  Issue Type: New Feature
>  Components: Hive, HiveServer2, Standalone Metastore
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Riju Trivedi
>Priority: Major
>  Labels: pull-request-available
>
> We would need a command to list current connections to HS2/HMS instances
> It could look like {*}show processlist{*} (MySQL), {*}select * from 
> pg_stat_activity{*} (PostgreSQL), or {*}show compactions{*} (Hive), showing 
> the current connections to the HiveServer2/HMS instances.
> This command can help in troubleshooting issues with the Hive service: one 
> can see the load on a given HS2/HMS instance and identify inappropriate 
> connections to terminate.
>  
> We can even extend this command to show connections between an HMS instance 
> and backend database to troubleshoot issues between HMS and backend database



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28354) Rename NegativeLlapCliDriver to NegativeLlapCliConfig

2024-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28354:
--
Labels: newbie pull-request-available  (was: newbie)

> Rename NegativeLlapCliDriver to NegativeLlapCliConfig
> -
>
> Key: HIVE-28354
> URL: https://issues.apache.org/jira/browse/HIVE-28354
> Project: Hive
>  Issue Type: Bug
>Reporter: László Bodor
>Assignee: Zsolt Miskolczi
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 4.1.0
>
>
> https://github.com/apache/hive/blob/74b9c88aced9407351f6635769a4bd48214fca1e/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java#L364
> this is a config (extending an abstract one), not a driver, rename it to 
> avoid confusion



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28341) Iceberg: Change Major QB Full Table Compaction to compact partition by partition

2024-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28341:
--
Labels: hive iceberg pull-request-available  (was: hive iceberg)

> Iceberg: Change Major QB Full Table Compaction to compact partition by 
> partition
> 
>
> Key: HIVE-28341
> URL: https://issues.apache.org/jira/browse/HIVE-28341
> Project: Hive
>  Issue Type: Task
>  Components: Hive, Iceberg integration
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: hive, iceberg, pull-request-available
>
> Currently, Major compaction compacts a whole table in one step. If a table is 
> partitioned and has a lot of data, this operation can take a long time and 
> risks write conflicts at the commit stage. This can be improved to work 
> partition by partition. Also, for each partition it will create one snapshot 
> instead of the 2 snapshots (truncate+IOW) created now when compacting the 
> whole table in one step.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28363) Improve heuristics of FilterStatsRule without column stats

2024-07-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28363:
--
Labels: pull-request-available  (was: )

> Improve heuristics of FilterStatsRule without column stats
> --
>
> Key: HIVE-28363
> URL: https://issues.apache.org/jira/browse/HIVE-28363
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: pull-request-available
>
> HIVE-13287 gave a better estimation of the selectivity of IN operators, 
> especially when column stats are available. This ticket would try to improve 
> the case where column stats are unavailable.
>  
> This is an example. The table has ten rows and no column stats on `id`.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> DESCRIBE FORMATTED users id;
> ...
> ++-+
> |    column_property     |            value            |
> ++-+
> | col_name               | id                          |
> | data_type              | int                         |
> | min                    |                             |
> | max                    |                             |
> | num_nulls              |                             |
> | distinct_count         |                             |
> | avg_col_len            |                             |
> | max_col_len            |                             |
> | num_trues              |                             |
> | num_falses             |                             |
> | bit_vector             |                             |
> | comment                | from deserializer           |
> | COLUMN_STATS_ACCURATE  | {\"BASIC_STATS\":\"true\"}  |
> ++-+{code}
> With a single needle, the estimated number becomes 10 * 0.5 = 5 because of 
> the fallback heuristics.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> EXPLAIN SELECT * FROM users 
> WHERE id IN (1);
> ...
> |                 TableScan                          |
> |                   alias: users                     |
> |                   filterExpr: (id = 1) (type: boolean) |
> |                   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE |
> |                   Filter Operator                  |
> |                     predicate: (id = 1) (type: boolean) |
> |                     Statistics: Num rows: 5 Data size: 5 Basic stats: 
> COMPLETE Column stats: NONE | {code}
> With two or more needles, the size is estimated to be the original size: the 
> heuristics estimate it as min(10, 10 * 0.5 * N) = 10. However, I believe 
> users expect to observe some reduction when using IN.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> EXPLAIN SELECT * FROM users 
> WHERE id IN (1, 2);
> |                 TableScan                          |
> |                   alias: users                     |
> |                   filterExpr: (id) IN (1, 2) (type: boolean) |
> |                   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE |
> |                   Filter Operator                  |
> |                     predicate: (id) IN (1, 2) (type: boolean) |
> |                     Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE | {code}
>  
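The fallback heuristic described above can be sketched as follows. The numbers and the 0.5 selectivity factor come from the example in the description; this illustrates the current behaviour, not the actual Hive code:

```java
public class InFilterHeuristic {
    // Without column stats, estimated rows = min(numRows, numRows * 0.5 * numNeedles).
    static long estimateRows(long numRows, int numNeedles) {
        return Math.min(numRows, (long) (numRows * 0.5 * numNeedles));
    }

    public static void main(String[] args) {
        System.out.println(estimateRows(10, 1)); // 5  -> matches "id IN (1)"
        System.out.println(estimateRows(10, 2)); // 10 -> "id IN (1, 2)": no reduction at all
    }
}
```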



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28362) Fail to materialize a CTE with VOID

2024-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28362:
--
Labels: pull-request-available  (was: )

> Fail to materialize a CTE with VOID
> ---
>
> Key: HIVE-28362
> URL: https://issues.apache.org/jira/browse/HIVE-28362
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 4.0.0
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
>  Labels: pull-request-available
>
> CTE materialization fails when it includes a NULL literal.
> {code:java}
> set hive.optimize.cte.materialize.full.aggregate.only=false;
> set hive.optimize.cte.materialize.threshold=2;
> WITH x AS (SELECT null AS null_value)
> SELECT * FROM x UNION ALL SELECT * FROM x; {code}
> Error message.
> {code:java}
> org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT 
> creates a VOID type, please use CAST to specify the type, near field:  
> null_value
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8344)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8303)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7846)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11598)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11461)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12397)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12263)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:638)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13136)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1062)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2390)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2338)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2340)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2501)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2323)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12978)
>     at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13085)
>     at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465)
>     at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:332)
>     at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
>     at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:508) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion

2024-07-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28359:
--
Labels: pull-request-available  (was: )

> Discard old builds in Jenkins to avoid disk space exhaustion
> 
>
> Key: HIVE-28359
> URL: https://issues.apache.org/jira/browse/HIVE-28359
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: builds.txt
>
>
> Currently Jenkins retains the builds from all active branches/PRs. 
> {code:bash}
> for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls 
> -l $b | wc -l; done | sort -k2 -rn > builds.txt
> {code}
> Some PRs (e.g., 
> [PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/])
>  with an excessive number of builds (i.e., 66) can easily consume many GBs of 
> data (PR-5216 uses 13GB for the builds). The first build for PR-5216 was 
> saved on April 26, 2024 and it is now more than 2 months old.
> For master, we currently have all builds since January 2023 (previous builds 
> where manually removed as part of HIVE-28013). The builds for master occupy 
> currently 50GB of space.
> Due to the above the disk space (persistent volume) cannot be reclaimed and 
> currently it is almost full (91% /var/jenkins_home).
> {noformat}
> kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df"
> Filesystem 1K-blocks  Used Available Use% Mounted on
> overlay 98831908   4675004  94140520   5% /
> tmpfs  65536 0 65536   0% /dev
> tmpfs6645236 0   6645236   0% /sys/fs/cgroup
> /dev/sdb   308521792 278996208  29509200  91% /var/jenkins_home
> /dev/sda1   98831908   4675004  94140520   5% /etc/hosts
> shm65536 0 65536   0% /dev/shm
> tmpfs   1080112812  10801116   1% 
> /run/secrets/kubernetes.io/serviceaccount
> tmpfs6645236 0   6645236   0% /proc/acpi
> tmpfs6645236 0   6645236   0% /proc/scsi
> tmpfs6645236 0   6645236   0% /sys/firmware
> {noformat}
> Without a discard policy in place we are going to hit again HIVE-28013 or 
> other disk related issues pretty soon.
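One common way to cap retained builds in a declarative Jenkinsfile is the buildDiscarder option. A sketch, where the retention numbers are placeholders rather than the values chosen for Hive:

```groovy
pipeline {
    agent any
    options {
        // Keep only the most recent builds and artifacts per branch/PR.
        // numToKeepStr / artifactNumToKeepStr values below are placeholders.
        buildDiscarder(logRotator(numToKeepStr: '5', artifactNumToKeepStr: '3'))
    }
    stages {
        stage('Build') {
            steps {
                echo 'build'
            }
        }
    }
}
```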



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

