[jira] [Updated] (HIVE-28543) Previous snapshotId is stored in backend database for iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-28543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28543:
    Labels: pull-request-available  (was: )

> Previous snapshotId is stored in backend database for iceberg tables
> --------------------------------------------------------------------
>
>                 Key: HIVE-28543
>                 URL: https://issues.apache.org/jira/browse/HIVE-28543
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public (Viewable by anyone)
>          Components: Iceberg integration
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>              Labels: pull-request-available
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HIVE-28540) Special characters in user DN should be escaped when querying LDAP
[ https://issues.apache.org/jira/browse/HIVE-28540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28540:
    Labels: pull-request-available  (was: )

> Special characters in user DN should be escaped when querying LDAP
> ------------------------------------------------------------------
>
>                 Key: HIVE-28540
>                 URL: https://issues.apache.org/jira/browse/HIVE-28540
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public (Viewable by anyone)
>            Reporter: Zoltán Rátkai
>            Assignee: Zoltán Rátkai
>            Priority: Minor
>              Labels: pull-request-available
>
> When a user name contains a comma and Hive does not escape it properly when
> querying LDAP, the query fails.
> For example, if the given user DN is "user , name", the correctly escaped
> LDAP query should contain: "user \5c, name".
> More details here:
> https://datatracker.ietf.org/doc/html/rfc4515
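The escaping RFC 4515 requires can be sketched as follows. This is a minimal illustration, not Hive's actual fix; the class and method names are invented for the example. Special characters in a filter value are replaced by a backslash followed by the two-digit hex code of the character, so a backslash (e.g. one produced by escaping a comma inside a DN) becomes `\5c`.

```java
// Minimal sketch (illustrative names, not Hive code): RFC 4515-style escaping
// of a value before embedding it in an LDAP search filter.
public class LdapEscapeSketch {
    static String escapeFilterValue(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\5c"); break; // backslash, e.g. from "\," in a DN
                case '*':  sb.append("\\2a"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '\0': sb.append("\\00"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A DN with an escaped comma: the backslash itself must become \5c
        // when the DN is placed inside a filter.
        System.out.println(escapeFilterValue("user \\, name")); // user \5c, name
    }
}
```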
[jira] [Updated] (HIVE-27262) Hive metastore changelog for Authorization operations
[ https://issues.apache.org/jira/browse/HIVE-27262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-27262:
    Labels: pull-request-available  (was: )

> Hive metastore changelog for Authorization operations
> -----------------------------------------------------
>
>                 Key: HIVE-27262
>                 URL: https://issues.apache.org/jira/browse/HIVE-27262
>             Project: Hive
>          Issue Type: New Feature
>          Components: Standalone Metastore
>    Affects Versions: 3.1.2, 4.0.0-alpha-2
>            Reporter: Bharath Krishna
>            Priority: Major
>              Labels: pull-request-available
>
> IIUC, the Hive metastore doesn't provide a changelog (NOTIFICATION_LOG) for
> authorization operations like GRANT, REVOKE, etc.
> I also assume that in this case Hive Replication doesn't replicate these
> events, as they are not recorded as metastore events.
> Is there any reason these events are not captured, or is it just a missing
> feature?
[jira] [Updated] (HIVE-28542) OTEL: Implement OTEL Exporter to expose JVM details of HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-28542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28542:
    Labels: pull-request-available  (was: )

> OTEL: Implement OTEL Exporter to expose JVM details of HiveServer2
> ------------------------------------------------------------------
>
>                 Key: HIVE-28542
>                 URL: https://issues.apache.org/jira/browse/HIVE-28542
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>
[jira] [Updated] (HIVE-28541) Incorrectly treating materialized CTE as Table when privilege checking
[ https://issues.apache.org/jira/browse/HIVE-28541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28541:
    Labels: pull-request-available  (was: )

> Incorrectly treating materialized CTE as Table when privilege checking
> ----------------------------------------------------------------------
>
>                 Key: HIVE-28541
>                 URL: https://issues.apache.org/jira/browse/HIVE-28541
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public (Viewable by anyone)
>          Components: Logical Optimizer, Parser, Query Planning
>    Affects Versions: 2.3.5, 3.1.2
>            Reporter: shuaiqi.guo
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-28541.patch
>
> When generating the ReadEntity for doAuthorization(), a materialized CTE is
> parsed as a Table, as in the following SQL:
> {code:java}
> -- hive.security.authorization.enabled should be set to true.
> set hive.optimize.cte.materialize.threshold=1;
> with aaa as ( select 1) select * from aaa;
> {code}
> We then get the following exception:
> {code:java}
> Error: Error while compiling statement: FAILED: HiveAuthzPluginException Error getting object from metastore for Object [type=TABLE_OR_VIEW, name=test_db.aaa] (state=42000,code=4)
> {code}
[jira] [Updated] (HIVE-21481) MERGE correctness issues with null safe equality
[ https://issues.apache.org/jira/browse/HIVE-21481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-21481:
    Labels: pull-request-available  (was: )

> MERGE correctness issues with null safe equality
> ------------------------------------------------
>
>                 Key: HIVE-21481
>                 URL: https://issues.apache.org/jira/browse/HIVE-21481
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Vineet Garg
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>
> The way Hive currently generates the plan for a MERGE statement can lead to
> wrong results with null safe equality.
> To illustrate, consider the following reproducer:
> {code:sql}
> create table ttarget(s string, j int, flag string) stored as orc tblproperties("transactional"="true");
> truncate table ttarget;
> insert into ttarget values('not_null', 1, 'dont update'), (null, 2, 'update');
> create table tsource (i int);
> insert into tsource values(null),(2);
> {code}
> Let's say you have the following MERGE statement:
> {code:sql}
> explain merge into ttarget using tsource on i<=>j
> when matched THEN
>   UPDATE set flag='updated'
> when not matched THEN
>   INSERT VALUES('new', 1999, 'true');
> {code}
> With this MERGE, *ONLY ONE* row should match in the target and be updated.
> But due to the plan Hive currently generates, it ends up matching both rows.
> This is because the MERGE statement is rewritten into a RIGHT OUTER JOIN plus
> a FILTER corresponding to each branch.
> The part of the plan generated by Hive for this statement consists of:
> {noformat}
> Map 2
>     Map Operator Tree:
>         TableScan
>           alias: tsource
>           Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
>           Map Join Operator
>             condition map:
>                  Right Outer Join 0 to 1
>             keys:
>               0 j (type: int)
>               1 i (type: int)
>             nullSafes: [true]
>             outputColumnNames: _col0, _col1, _col5, _col6
>             input vertices:
>               0 Map 1
>             Statistics: Num rows: 1 Data size: 206 Basic stats: COMPLETE Column stats: NONE
>             HybridGraceHashJoin: true
>             Filter Operator
>               predicate: (_col6 IS NOT DISTINCT FROM _col1) (type: boolean)
>               Statistics: Num rows: 1 Data size: 206 Basic stats: COMPLETE Column stats: NONE
>               Select Operator
>                 expressions: _col5 (type: struct), _col0 (type: string), _col1 (type: int)
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 1 Data size: 206 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: struct)
>                   sort order: +
>                   Map-reduce partition columns: UDFToInteger(_col0) (type: int)
>                   Statistics: Num rows: 1 Data size: 206 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col1 (type: string), _col2 (type: int)
> {noformat}
> The result after the JOIN will be:
> {code:sql}
> select s,j,i from ttarget right outer join tsource on i<=>j;
> NULL	NULL	NULL
> NULL	NULL	2
> {code}
> On this result set the predicate {{(_col6 IS NOT DISTINCT FROM _col1)}} is
> true for both rows, resulting in both rows matching.
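The reason the residual filter passes the non-matching outer-join row can be seen from the semantics of null-safe equality (`<=>` / `IS NOT DISTINCT FROM`) alone. The sketch below is illustrative only (not Hive's vectorized implementation): two NULLs compare as equal, so the all-NULL row produced by the RIGHT OUTER JOIN also satisfies the filter.

```java
// Illustrative semantics of SQL null-safe equality, as applied by the
// residual Filter Operator in the plan above.
public class NullSafeEq {
    static boolean isNotDistinctFrom(Integer a, Integer b) {
        if (a == null && b == null) return true;  // NULL <=> NULL is TRUE
        if (a == null || b == null) return false; // NULL <=> x    is FALSE
        return a.equals(b);
    }

    public static void main(String[] args) {
        // The outer-join row (all columns NULL) passes the filter too,
        // which is the extra, unintended "match".
        System.out.println(isNotDistinctFrom(null, null)); // true
        System.out.println(isNotDistinctFrom(2, 2));       // true (intended match)
    }
}
```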
[jira] [Updated] (HIVE-28533) Fix compaction with custom pools
[ https://issues.apache.org/jira/browse/HIVE-28533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28533:
    Labels: pull-request-available  (was: )

> Fix compaction with custom pools
> --------------------------------
>
>                 Key: HIVE-28533
>                 URL: https://issues.apache.org/jira/browse/HIVE-28533
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public (Viewable by anyone)
>          Components: Hive
>            Reporter: Dmitriy Fingerman
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>
> Hive can assign compaction requests and workers to pools, as described here:
> https://cwiki.apache.org/confluence/display/Hive/Compaction+pooling
> However, a bug prevents this feature from working with non-default pools:
> requests remain stuck forever in the Initiating state.
[jira] [Updated] (HIVE-28534) Improve HMS Client Exception Handling for Hive-3
[ https://issues.apache.org/jira/browse/HIVE-28534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28534:
    Labels: pull-request-available  (was: )

> Improve HMS Client Exception Handling for Hive-3
> ------------------------------------------------
>
>                 Key: HIVE-28534
>                 URL: https://issues.apache.org/jira/browse/HIVE-28534
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 3.1.3
>            Reporter: Sercan Tekin
>            Assignee: Sercan Tekin
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.4
>
> When the HMS client fails to connect to the server due to a
> *TTransportException*, there is no issue with error reporting.
> However, when the failure is caused by an *IOException*, the exception
> object used for reporting remains null. As a result, the root cause is not
> captured, and end users encounter an unrelated NPE that masks the actual
> issue.
> {code:java}
> Exception in thread "main" java.lang.AssertionError: Unable to connect to HMS!
>         at TestHMS.main(TestHMS.java:20)
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.util.StringUtils.stringifyException(StringUtils.java:90)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:613)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:233)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:145)
>         at TestHMS.main(TestHMS.java:13)
> {code}
> The testing code that I used is as below:
> {code:java}
> import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
> import org.apache.hadoop.hive.conf.HiveConf;
> import java.util.List;
>
> public class TestHMS {
>     public static void main(String[] args) {
>         String HOSTNAME = "";
>         HiveConf hiveConf = new HiveConf();
>         hiveConf.setVar(HiveConf.ConfVars.METASTOREURIS, "thrift://" + HOSTNAME + ":9083");
>         hiveConf.setBoolVar(HiveConf.ConfVars.METASTORE_USE_THRIFT_SASL, true);
>         hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_METASTORE_USE_SSL, true);
>         try (HiveMetaStoreClient client = new HiveMetaStoreClient(hiveConf)) {
>             List<String> databases = client.getAllDatabases();
>             System.out.println("Available databases:");
>             for (String db : databases) {
>                 System.out.println(db);
>             }
>         } catch (Exception e) {
>             throw new AssertionError("Unable to connect to HMS!", e);
>         }
>     }
> }
> {code}
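The failure mode described above can be sketched in isolation. This is a simplified illustration with invented names (it is not `HiveMetaStoreClient.open()` itself): if the caught `IOException` is never assigned to the variable used for the final error report, the wrapper exception ends up with a null cause, and later code that stringifies that cause throws the unrelated NPE.

```java
import java.io.IOException;

// Sketch of the pattern (illustrative names, not Hive code): capture the
// caught IOException so the final report carries the real root cause.
public class ConnectSketch {
    static void connectOnce() throws IOException {
        throw new IOException("SSL handshake failed"); // simulate the failure
    }

    static void open() {
        Throwable lastException = null;
        try {
            connectOnce();
        } catch (IOException e) {
            lastException = e; // the fix: record the cause instead of leaving it null
        }
        if (lastException != null) {
            throw new RuntimeException("Could not connect to meta store", lastException);
        }
    }

    public static void main(String[] args) {
        try {
            open();
        } catch (RuntimeException e) {
            // With the cause captured, the real root cause is reported.
            System.out.println(e.getCause().getMessage());
        }
    }
}
```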
[jira] [Updated] (HIVE-28532) Map Join Reuse cache allows to share hashtables for different join types
[ https://issues.apache.org/jira/browse/HIVE-28532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28532:
    Labels: pull-request-available  (was: )

> Map Join Reuse cache allows to share hashtables for different join types
> ------------------------------------------------------------------------
>
>                 Key: HIVE-28532
>                 URL: https://issues.apache.org/jira/browse/HIVE-28532
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public (Viewable by anyone)
>          Components: Logical Optimizer
>    Affects Versions: 4.0.0
>            Reporter: Ramesh Kumar Thangarajan
>            Assignee: Ramesh Kumar Thangarajan
>            Priority: Major
>              Labels: pull-request-available
>
> The Map Join Reuse cache allows sharing hashtables between different join
> types. For example, take an outer join and an inner join: a hash table built
> for a non-outer join cannot be reused for an outer join, because an outer
> join can only accept the HASHMAP hash table kind, whereas other kinds such
> as HASHSET and HASH_MULTISET exist. Below is the exception raised when a
> hash table is shared between an outer join and an inner join. In certain
> cases we might even produce wrong results, since we expect a hash table of
> one type but get a hash table of another type.
> {code:java}
> Caused by: java.lang.ClassCastException: class org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMultiSetContainer cannot be cast to class org.apache.hadoop.hive.ql.exec.vector.mapjoin.hashtable.VectorMapJoinHashMap
> {code}
[jira] [Updated] (HIVE-28530) Fetched result from another query
[ https://issues.apache.org/jira/browse/HIVE-28530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28530:
    Labels: pull-request-available  (was: )

> Fetched result from another query
> ---------------------------------
>
>                 Key: HIVE-28530
>                 URL: https://issues.apache.org/jira/browse/HIVE-28530
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public (Viewable by anyone)
>          Components: HiveServer2
>    Affects Versions: 3.0.0
>            Reporter: Xiaomin Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> While running Hive load tests, we observed that Beeline can fetch the wrong
> query result: the result of another query running at the same time. We ruled
> out a load-balancing issue, because it happened on a single HiveServer2, and
> we found this issue only occurs when *hive.query.result.cached.enabled is
> false.*
> All test queries have the same format:
> {code:java}
> select concat('total record (test_$PID)=',count(*)) as count_record from t1t
> {code}
> We randomized the query by replacing $PID with the Beeline PID, and the test
> driver ran 10 Beeline clients concurrently. The table t1t is static and has
> a few rows, so the test driver can check whether the query result equals:
> total record (test_recon_mock_$PID)=2
> With the query result cache disabled, queries randomly got a wrong result,
> and it can always be reproduced. For example, the following two queries were
> running in parallel:
> {code:java}
> queryId=hive_20240701103742_ff1adb2d-e9eb-448d-990e-00ab371e9db6): select concat('total record (test_21535)=',count(*)) as count_record from t1t
> queryId=hive_20240701103742_9bdfff92-89e1-4bcd-88ea-bf73ba5fd93d): select concat('total record (test_21566)=',count(*)) as count_record from t1t
> {code}
> The second query was supposed to get this result:
> *total record (test_21566)=2*
> But Beeline actually got:
> *total record (test_21535)=2*
> There is no error in the HS2 log.
[jira] [Updated] (HIVE-28512) CREATE TABLE x LIKE retain whitelisted table properties
[ https://issues.apache.org/jira/browse/HIVE-28512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28512:
    Labels: pull-request-available  (was: )

> CREATE TABLE x LIKE retain whitelisted table properties
> -------------------------------------------------------
>
>                 Key: HIVE-28512
>                 URL: https://issues.apache.org/jira/browse/HIVE-28512
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Sai Hemanth Gantasala
>            Assignee: Sai Hemanth Gantasala
>            Priority: Major
>              Labels: pull-request-available
>
> It would be good to retain the properties listed in
> HiveConf.ConfVars.DDL_CTL_PARAMETERS_WHITELIST for CTLT queries. This is
> particularly useful for Avro-based tables, since the schema can evolve over
> time and the Avro schema is referenced in the avro.schema.url table property.
[jira] [Updated] (HIVE-28524) Iceberg: Major QB Compaction add sort order support
[ https://issues.apache.org/jira/browse/HIVE-28524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28524:
    Labels: pull-request-available  (was: )

> Iceberg: Major QB Compaction add sort order support
> ---------------------------------------------------
>
>                 Key: HIVE-28524
>                 URL: https://issues.apache.org/jira/browse/HIVE-28524
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public (Viewable by anyone)
>          Components: Hive
>            Reporter: Dmitriy Fingerman
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>
[jira] [Updated] (HIVE-28343) Iceberg: Major QB Compaction support filter in compaction request in OPTIMIZE TABLE command
[ https://issues.apache.org/jira/browse/HIVE-28343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28343:
    Labels: hive iceberg pull-request-available  (was: hive iceberg)

> Iceberg: Major QB Compaction support filter in compaction request in OPTIMIZE TABLE command
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28343
>                 URL: https://issues.apache.org/jira/browse/HIVE-28343
>             Project: Hive
>          Issue Type: Task
>          Components: Hive, Iceberg integration
>            Reporter: Dmitriy Fingerman
>            Assignee: Zoltán Rátkai
>            Priority: Major
>              Labels: hive, iceberg, pull-request-available
>
> Depends on: [HIVE-28342|https://issues.apache.org/jira/browse/HIVE-28342]
[jira] [Updated] (HIVE-28526) may produce null pointer when struct type value is null
[ https://issues.apache.org/jira/browse/HIVE-28526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28526:
    Labels: pull-request-available  (was: )

> may produce null pointer when struct type value is null
> -------------------------------------------------------
>
>                 Key: HIVE-28526
>                 URL: https://issues.apache.org/jira/browse/HIVE-28526
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public (Viewable by anyone)
>            Reporter: terrytlu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2024-09-18-18-38-53-494.png
>
> Reproduce:
> {code:sql}
> create table test_struct
> (
>     f1 string,
>     demo_struct struct,
>     datestr string
> );
> insert into test_struct(f1, datestr) select 'test_f1', 'datestr_1';
> {code}
> !image-2024-09-18-18-38-53-494.png|width=933,height=145!
[jira] [Updated] (HIVE-28523) Performance issues that may occur when tables or partitions are deleted
[ https://issues.apache.org/jira/browse/HIVE-28523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28523:
    Labels: pull-request-available  (was: )

> Performance issues that may occur when tables or partitions are deleted
> -----------------------------------------------------------------------
>
>                 Key: HIVE-28523
>                 URL: https://issues.apache.org/jira/browse/HIVE-28523
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public (Viewable by anyone)
>          Components: Standalone Metastore
>            Reporter: liux
>            Assignee: liux
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: ME1726238367718.jpg
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> 1. The traversal performed when deleting a table or partitions may have
> performance problems.
> Location: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
> {code:java}
> for (String partName : partNames) {
>   Path partPath = wh.getDnsPath(new Path(pathString));
> }
> {code}
> Assuming wh.getDnsPath takes about 10 ms per call, traversing 200,000
> partition objects takes about 33 minutes, which can cause large table
> deletions or partition drops to time out.
> 2. It is not necessary to execute wh.getDnsPath(new Path(pathString)) for
> every partition name; it only needs to run when the partition is not a
> subdirectory of the table directory.
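The optimization proposed in point 2 above can be sketched as follows. This is an illustration only, with invented names (it is not the actual HMSHandler code): the expensive per-partition path resolution is skipped for partitions that live under the table directory, and only performed for partitions with a non-default location.

```java
import java.util.Arrays;
import java.util.List;

// Sketch (illustrative names, not Hive code): resolve a partition path only
// when the partition is not a subdirectory of the table directory.
public class PartitionPathSketch {
    static int resolved = 0;

    // Stands in for the expensive wh.getDnsPath(new Path(pathString)) call.
    static String getDnsPath(String path) {
        resolved++;
        return path;
    }

    static void dropPartitionPaths(String tableDir, List<String> partPaths) {
        for (String p : partPaths) {
            if (!p.startsWith(tableDir + "/")) {
                getDnsPath(p); // only non-default-location partitions need resolution
            }
            // partitions under tableDir are removed with the table directory itself
        }
    }

    public static void main(String[] args) {
        dropPartitionPaths("/warehouse/t1",
            Arrays.asList("/warehouse/t1/ds=1", "/warehouse/t1/ds=2", "/elsewhere/ds=3"));
        System.out.println(resolved); // only the external partition was resolved
    }
}
```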
[jira] [Updated] (HIVE-28510) Iceberg: FanoutPositionOnlyDeleteWriter support
[ https://issues.apache.org/jira/browse/HIVE-28510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28510:
    Labels: pull-request-available  (was: )

> Iceberg: FanoutPositionOnlyDeleteWriter support
> -----------------------------------------------
>
>                 Key: HIVE-28510
>                 URL: https://issues.apache.org/jira/browse/HIVE-28510
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Denys Kuzmenko
>            Priority: Major
>              Labels: pull-request-available
>
> A position delete writer capable of writing to multiple specs and partitions
> when the incoming stream of deletes is not ordered.
[jira] [Updated] (HIVE-28522) Fix actions/upload-artifact
[ https://issues.apache.org/jira/browse/HIVE-28522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28522:
    Labels: pull-request-available  (was: )

> Fix actions/upload-artifact
> ---------------------------
>
>                 Key: HIVE-28522
>                 URL: https://issues.apache.org/jira/browse/HIVE-28522
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public (Viewable by anyone)
>            Reporter: Butao Zhang
>            Assignee: Butao Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> https://github.com/apache/hive/actions/runs/10827752595/job/30041603070
> https://github.com/apache/hive/actions/runs/10830075344/job/30049009968?pr=5444
> {code:java}
> Error: This request has been automatically failed because it uses a deprecated version of `actions/upload-artifact: v2`. Learn more: https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/
> {code}
[jira] [Updated] (HIVE-28520) Upgrade to datasketches 2.0.0
[ https://issues.apache.org/jira/browse/HIVE-28520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28520:
    Labels: pull-request-available  (was: )

> Upgrade to datasketches 2.0.0
> -----------------------------
>
>                 Key: HIVE-28520
>                 URL: https://issues.apache.org/jira/browse/HIVE-28520
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public (Viewable by anyone)
>            Reporter: Butao Zhang
>            Assignee: Butao Zhang
>            Priority: Minor
>              Labels: pull-request-available
>
> https://datasketches.apache.org/docs/Community/Downloads.html
> apache-datasketches-hive-2.0.0 has been released.
[jira] [Updated] (HIVE-28519) Upgrade Maven SureFire Plugin to latest version 3.5.0
[ https://issues.apache.org/jira/browse/HIVE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28519:
    Labels: pull-request-available  (was: )

> Upgrade Maven SureFire Plugin to latest version 3.5.0
> -----------------------------------------------------
>
>                 Key: HIVE-28519
>                 URL: https://issues.apache.org/jira/browse/HIVE-28519
>             Project: Hive
>          Issue Type: Improvement
>      Security Level: Public (Viewable by anyone)
>            Reporter: Indhumathi Muthumurugesh
>            Assignee: Indhumathi Muthumurugesh
>            Priority: Major
>              Labels: pull-request-available
>
[jira] [Updated] (HIVE-28517) Roaringbit version should be in sync with iceberg dependency required version
[ https://issues.apache.org/jira/browse/HIVE-28517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28517:
    Labels: pull-request-available  (was: )

> Roaringbit version should be in sync with iceberg dependency required version
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-28517
>                 URL: https://issues.apache.org/jira/browse/HIVE-28517
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: Raghav Aggarwal
>            Assignee: Raghav Aggarwal
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, the RoaringBitmap version 0.9.22 is defined in multiple places in
> pom.xml, while Iceberg 1.5.2 requires 1.0.1 at runtime. It is better to keep
> the versions in sync to prevent classpath issues. Also, whenever we upgrade
> the Iceberg version, the RoaringBitmap version should be upgraded in the
> parent pom.xml as well.
[jira] [Updated] (HIVE-28515) Iceberg: Concurrent queries fail during commit with ValidationException
[ https://issues.apache.org/jira/browse/HIVE-28515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28515:
    Labels: pull-request-available  (was: )

> Iceberg: Concurrent queries fail during commit with ValidationException
> -----------------------------------------------------------------------
>
>                 Key: HIVE-28515
>                 URL: https://issues.apache.org/jira/browse/HIVE-28515
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>
> {noformat}
> Caused by: org.apache.iceberg.exceptions.ValidationException: Cannot commit, missing data files: [file:/Users/ayushsaxena/code/hive/iceberg/iceberg-handler/target/tmp/hive7073916777566968859/external/customers/data/0-0-data-ayushsaxena_20240909232021_99fd025f-1e27-4541-ab3e-77c6f9905eb7-job_17259492220180_0001-6-1.parquet]
>         at org.apache.iceberg.MergingSnapshotProducer.validateDataFilesExist(MergingSnapshotProducer.java:751)
>         at org.apache.iceberg.BaseRowDelta.validate(BaseRowDelta.java:116)
>         at org.apache.iceberg.SnapshotProducer.apply(SnapshotProducer.java:233)
>         at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:384)
>         at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
>         at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
>         at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
>         at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
>         at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:382)
>         at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitWrite(HiveIcebergOutputCommitter.java:580)
>         at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.commitTable(HiveIcebergOutputCommitter.java:494)
>         at org.apache.iceberg.mr.hive.HiveIcebergOutputCommitter.lambda$commitJobs$4(HiveIcebergOutputCommitter.java:291)
> {noformat}
> Queries fail with {{ValidationException}} during commit even with the retry
> strategy configured for {{write_conflict}}.
[jira] [Updated] (HIVE-28495) Iceberg: Upgrade iceberg version to 1.6.1
[ https://issues.apache.org/jira/browse/HIVE-28495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28495:
    Labels: pull-request-available  (was: )

> Iceberg: Upgrade iceberg version to 1.6.1
> -----------------------------------------
>
>                 Key: HIVE-28495
>                 URL: https://issues.apache.org/jira/browse/HIVE-28495
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: Butao Zhang
>            Assignee: Butao Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> Upgrade the iceberg version to the latest 1.6.1 release.
[jira] [Updated] (HIVE-28502) Refactor method names that start with capital letters in PasswdAuthenticationProvider class
[ https://issues.apache.org/jira/browse/HIVE-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28502:
    Labels: pull-request-available  (was: )

> Refactor method names that start with capital letters in PasswdAuthenticationProvider class
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28502
>                 URL: https://issues.apache.org/jira/browse/HIVE-28502
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Dmitriy Fingerman
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>
[jira] [Updated] (HIVE-28503) Wrong results(NULL) when string concat operation with || operator for ORC file format when vectorization enabled
[ https://issues.apache.org/jira/browse/HIVE-28503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28503:
    Labels: pull-request-available  (was: )

> Wrong results(NULL) when string concat operation with || operator for ORC file format when vectorization enabled
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-28503
>                 URL: https://issues.apache.org/jira/browse/HIVE-28503
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Mahesh Raju Somalaraju
>            Assignee: Mahesh Raju Somalaraju
>            Priority: Major
>              Labels: pull-request-available
>
> Wrong results (NULL) when a string concat operation uses the || operator on
> the ORC file format with vectorization enabled.
> {code}
> set hive.query.results.cache.enabled=false;
> set hive.fetch.task.conversion=none;
> set hive.vectorized.execution.enabled=true;
> {code}
> The result is NULL when we do a concat operation with the || operator. It
> could not be reproduced locally; it reproduces on a cluster with more
> records. The input data should be a mix of NULL and NOT NULL values,
> something like:
> || column1 || column2 || column3 || count ||
> | NULL | NULL | NULL | 18000 |
> | G | L | A1 | 123932 |
> With the above configuration, create a table with the ORC file format and
> three string columns, insert data such that it has a mix of NULL and NOT
> NULL values, then perform the concat() with the || operator and insert a new
> row with the concat() results:
> {code:sql}
> select * from (
>   select t1.column1, t1.column2, t1.column3,
>          t1.column1 || t1.column2 || t1.column3 as VEH_MODEL_ID
>   from test_table t1
> ) t
> where VEH_MODEL_ID is NULL
>   and if(column1 is null,0,1)=1 AND if(column2 is null,0,1)=1 AND if(column3 is null,0,1)=1
> limit 1;
> {code}
> In the query above, {{t1.column1 || t1.column2 || t1.column3 as
> VEH_MODEL_ID}} returns NULL even though the input string values are not
> null:
> | t.VEH_MODEL_ID | t.column1 | t.column2 | t.column3 |
> | NULL | G | L | A2 |
>
> +Proposed solution as per code review:+
> +*Root cause:*+
> During the concat() operation in the *StringGroupConcatColCol* class, when
> the input batch vectors contain a mix of NULL and NOT NULL values, the
> output vector's null-related flags are not set correctly. Each value in a
> vector carries a flag saying whether it is NULL, but the whole-vector flag
> (outV.noNulls) is not always set. Parquet works without this flag,
> presumably because it checks each value instead of the whole-vector flag.
> +*Code snippet:*+ (*StringGroupConcatColCol.evaluate()* method)
> {code:java}
> if (inV1.noNulls && !inV2.noNulls) {
>   // one input can contain NULLs, so the output can contain NULLs
>   outV.noNulls = false;
>   ---
> } else if (!inV1.noNulls && inV2.noNulls) {
>   outV.noNulls = false;
>   ---
> } else if (!inV1.noNulls && !inV2.noNulls) {
>   // both inputs can contain NULLs, so the output can contain NULLs
>   outV.noNulls = false;
>   ---
> } else {
>   // there are no nulls in either input vector
>   outV.noNulls = true; // this has to be set true; this assignment is currently missed
>   // perform data operation
>   ---
> }
> {code}
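The flag logic in the proposed fix reduces to a single rule, sketched below. This is a simplified illustration, not the actual `StringGroupConcatColCol` code: the output vector can only be declared null-free (`noNulls = true`) when both input vectors are null-free, and the reported bug is that this branch left the flag unset.

```java
// Sketch (simplified, not Hive's vectorized operator): deriving the output
// vector's noNulls flag from the two input vectors' flags.
public class NoNullsFlagSketch {
    static boolean outputNoNulls(boolean in1NoNulls, boolean in2NoNulls) {
        // The output is guaranteed null-free only if both inputs are.
        return in1NoNulls && in2NoNulls;
    }

    public static void main(String[] args) {
        System.out.println(outputNoNulls(true, true));  // true: the branch the fix must set
        System.out.println(outputNoNulls(true, false)); // false: output may contain NULLs
    }
}
```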
[jira] [Updated] (HIVE-28500) fix: alterSchemaVersion
[ https://issues.apache.org/jira/browse/HIVE-28500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-28500:
    Labels: pull-request-available  (was: )

> fix: alterSchemaVersion
> -----------------------
>
>                 Key: HIVE-28500
>                 URL: https://issues.apache.org/jira/browse/HIVE-28500
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: KIM JI HYE
>            Priority: Major
>              Labels: pull-request-available
>
> Hello,
> Only the alterSchemaVersion method of ObjectStore does not perform
> rollbackTransaction() in its finally block when committed is false.
> It is currently set to commitTransaction(), but it seems appropriate to
> change it to rollbackTransaction().
> https://github.com/apache/hive/blob/3f6f940af3f60cc28834268e5d7f5612e3b13c30/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L13372
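The commit/rollback-in-finally pattern the report refers to can be sketched as follows. This is a simplified illustration with invented names (it is not the real `ObjectStore.alterSchemaVersion`): the `finally` block rolls back when the transaction was not committed, which is the behavior the report proposes instead of the current `commitTransaction()` call.

```java
// Sketch (illustrative names, not Hive code): roll back in finally when the
// transaction did not commit, rather than committing unconditionally.
public class TxnPatternSketch {
    boolean committed = false;
    boolean rolledBack = false;

    void openTransaction() {}
    void commitTransaction() { committed = true; }
    void rollbackTransaction() { rolledBack = true; }

    void alterSomething(boolean fail) {
        boolean success = false;
        try {
            openTransaction();
            if (fail) {
                throw new RuntimeException("alter failed"); // simulate a failure mid-transaction
            }
            commitTransaction();
            success = true;
        } catch (RuntimeException e) {
            // fall through to finally
        } finally {
            if (!success) {
                rollbackTransaction(); // the proposed fix: roll back, not commit
            }
        }
    }

    public static void main(String[] args) {
        TxnPatternSketch t = new TxnPatternSketch();
        t.alterSomething(true);
        System.out.println(t.rolledBack); // the failed transaction was rolled back
    }
}
```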
[jira] [Updated] (HIVE-28496) Address CVE-2020-28487 due to 4.20.0 version of vis.js
[ https://issues.apache.org/jira/browse/HIVE-28496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28496: -- Labels: pull-request-available (was: ) > Address CVE-2020-28487 due to 4.20.0 version of vis.js > -- > > Key: HIVE-28496 > URL: https://issues.apache.org/jira/browse/HIVE-28496 > Project: Hive > Issue Type: Improvement >Reporter: Kiran Velumuri >Assignee: Kiran Velumuri >Priority: Major > Labels: pull-request-available > > This is to address CVE-2020-28487 coming from version 4.20.0 of vis.js, via > the file vis.min.js. This file is used in the recently added Query plan > tab in the HiveServer2 web UI. > > The project vis.js has been split up into sub-projects (from version 5.0.0), > of which we only require the Network sub-project. This sub-project contains > both vis.Network and vis.Dataset, which are what we require from vis.min.js. > > Link to CVE-2020-28487: > http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2020-28487 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28497) Address CVE due to commons-codec:commons-codec:jar:1.11
[ https://issues.apache.org/jira/browse/HIVE-28497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28497: -- Labels: pull-request-available (was: ) > Address CVE due to commons-codec:commons-codec:jar:1.11 > --- > > Key: HIVE-28497 > URL: https://issues.apache.org/jira/browse/HIVE-28497 > Project: Hive > Issue Type: Improvement >Reporter: Kiran Velumuri >Assignee: Kiran Velumuri >Priority: Major > Labels: pull-request-available > > The vulnerability sonatype-2012-0050 comes from > commons-codec:commons-codec:jar:1.11 dependency in the > hive-standalone-metastore-server module. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28494) Iceberg: mvn build enables iceberg module by default
[ https://issues.apache.org/jira/browse/HIVE-28494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28494: -- Labels: pull-request-available (was: ) > Iceberg: mvn build enables iceberg module by default > > > Key: HIVE-28494 > URL: https://issues.apache.org/jira/browse/HIVE-28494 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Reporter: Butao Zhang >Assignee: Butao Zhang >Priority: Major > Labels: pull-request-available > > HIVE-25027 hid the iceberg module by default. IMO, we have put a lot of > effort into the iceberg module and it is more stable than before. We should > enable the iceberg module by default in the mvn build. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28492) Upgrade Janino version to 3.1.12
[ https://issues.apache.org/jira/browse/HIVE-28492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28492: -- Labels: pull-request-available (was: ) > Upgrade Janino version to 3.1.12 > > > Key: HIVE-28492 > URL: https://issues.apache.org/jira/browse/HIVE-28492 > Project: Hive > Issue Type: Improvement >Reporter: shivangi >Assignee: shivangi >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28490) SharedWorkOptimizer sometimes removes useful DPP sources.
[ https://issues.apache.org/jira/browse/HIVE-28490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28490: -- Labels: pull-request-available (was: ) > SharedWorkOptimizer sometimes removes useful DPP sources. > - > > Key: HIVE-28490 > URL: https://issues.apache.org/jira/browse/HIVE-28490 > Project: Hive > Issue Type: Improvement >Reporter: Seonggon Namgung >Assignee: Seonggon Namgung >Priority: Major > Labels: pull-request-available > Attachments: 3.StopRemovingRetainableDPP.pptx > > > The current SharedWorkOptimizer sometimes removes DPP sources that are not > invalidated. I found that findAscendantWorkOperators() returns a superset of > the ascendant operators, which causes incorrect DPP source removal. > Please check out the attached slides for a detailed explanation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28489) Partitioning the input data of Grouping Set GroupBy operator
[ https://issues.apache.org/jira/browse/HIVE-28489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28489: -- Labels: pull-request-available (was: ) > Partitioning the input data of Grouping Set GroupBy operator > > > Key: HIVE-28489 > URL: https://issues.apache.org/jira/browse/HIVE-28489 > Project: Hive > Issue Type: New Feature >Reporter: Seonggon Namgung >Assignee: Seonggon Namgung >Priority: Major > Labels: pull-request-available > Attachments: 2.PartitionDataBeforeGroupingSet.pptx > > > A GroupBy operator with grouping sets often emits too many rows, which becomes > the bottleneck of query execution. To reduce the number of output rows, this > JIRA proposes partitioning the input data of such a GroupBy operator. > Please check out the attached slides for a detailed explanation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28488) Merge adjacent union distinct
[ https://issues.apache.org/jira/browse/HIVE-28488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28488: -- Labels: pull-request-available (was: ) > Merge adjacent union distinct > - > > Key: HIVE-28488 > URL: https://issues.apache.org/jira/browse/HIVE-28488 > Project: Hive > Issue Type: Improvement >Reporter: Seonggon Namgung >Assignee: Seonggon Namgung >Priority: Major > Labels: pull-request-available > Attachments: 1.MergeAdjacentUnionDistinct.pptx > > > Hive currently compiles > "SELECT * FROM TBL1 UNION SELECT * FROM TBL2 UNION SELECT * FROM TBL3" > to > {code:java} > TS - GBY - RS > TS - GBY - RS - GBY - RS > TS - GBY - RS - GBY {code} > This can be optimized as follows: > {code:java} > TS - GBY - RS > TS - GBY - RS > TS - GBY - RS - GBY {code} > Please check out the attached slides for a detailed explanation and feel free > to ask any questions or share suggestions. Also, it would be great if someone > could suggest a better location for this optimization (e.g. SemanticAnalyzer, > Calcite, etc.). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28483) String date cast giving wrong result
[ https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28483: -- Labels: pull-request-available (was: ) > String date cast giving wrong result > > > Key: HIVE-28483 > URL: https://issues.apache.org/jira/browse/HIVE-28483 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Rátkai >Assignee: Zoltán Rátkai >Priority: Minor > Labels: pull-request-available > > Date conversion gives wrong results. For example: > select to_date('03-08-2024'); > Result: > +-+ > | _c0 | > +-+ > | 0003-08-20 | > +-+ > or: > select to_date(last_day(add_months(last_day('03-08-2024'), -1))) ; > Result: > +-+ > | _c0 | > +-+ > |0003-07-31 | > +-+ -- This message was sent by Atlassian Jira (v8.20.10#820010)
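For context, Hive's string-to-date conversion expects the ISO yyyy-MM-dd layout, so the leading '03' of '03-08-2024' ends up read as the year. A sketch in plain java.time (not Hive's parser) showing that the string is not a valid ISO date and how an explicit day-month-year pattern resolves it:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class DateParseDemo {
    public static void main(String[] args) {
        // With an explicit dd-MM-yyyy pattern the intent is unambiguous.
        DateTimeFormatter dmy = DateTimeFormatter.ofPattern("dd-MM-yyyy");
        System.out.println(LocalDate.parse("03-08-2024", dmy)); // 2024-08-03

        // A strict ISO parser rejects the string outright instead of
        // silently producing year 0003 like the lenient Hive cast.
        try {
            LocalDate.parse("03-08-2024"); // expects yyyy-MM-dd
        } catch (DateTimeParseException e) {
            System.out.println("not a valid ISO date");
        }
    }
}
```

The second query in the report compounds the same misparse: once '03-08-2024' becomes year 3, last_day/add_months operate on that wrong date.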
[jira] [Updated] (HIVE-28487) Outdated MetastoreSchemaTool class reference in schemaTool.sh
[ https://issues.apache.org/jira/browse/HIVE-28487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28487: -- Labels: pull-request-available (was: ) > Outdated MetastoreSchemaTool class reference in schemaTool.sh > - > > Key: HIVE-28487 > URL: https://issues.apache.org/jira/browse/HIVE-28487 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Affects Versions: 4.0.0 >Reporter: Sebastian Bernauer >Assignee: Sebastian Bernauer >Priority: Minor > Labels: pull-request-available > > In HIVE-21298 {{MetastoreSchemaTool}} was moved from > {{org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool}} to > {{{}org.apache.hadoop.hive.metastore.tools.schematool.MetastoreSchemaTool{}}}, > but it seems like {{schemaTool.sh}} was not updated. > > This results in the following error being raised when invoking the shell > script: > {code:java} > /stackable/apache-hive-metastore-4.0.0-bin $ bin/base --service schemaTool > Exception in thread "main" java.lang.ClassNotFoundException: > org.apache.hadoop.hive.metastore.tools.MetastoreSchemaTool > at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:594) > at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527) > at java.base/java.lang.Class.forName0(Native Method) > at java.base/java.lang.Class.forName(Class.java:398) > at org.apache.hadoop.util.RunJar.run(RunJar.java:321) > at org.apache.hadoop.util.RunJar.main(RunJar.java:241){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-12203) CBO (Calcite Return Path): groupby_grouping_id2.q returns wrong results
[ https://issues.apache.org/jira/browse/HIVE-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-12203: -- Labels: pull-request-available (was: ) > CBO (Calcite Return Path): groupby_grouping_id2.q returns wrong results > --- > > Key: HIVE-12203 > URL: https://issues.apache.org/jira/browse/HIVE-12203 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 2.0.0 >Reporter: Jesús Camacho Rodríguez >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > Attachments: HIVE-12203.patch > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-21298) Move Hive Schema Tool classes to their own package to have cleaner structure
[ https://issues.apache.org/jira/browse/HIVE-21298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-21298: -- Labels: pull-request-available (was: ) > Move Hive Schema Tool classes to their own package to have cleaner structure > - > > Key: HIVE-21298 > URL: https://issues.apache.org/jira/browse/HIVE-21298 > Project: Hive > Issue Type: Improvement >Reporter: Miklos Gergely >Assignee: Miklos Gergely >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-alpha-1 > > Attachments: HIVE-21298.01.patch, HIVE-21298.02.patch > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28484) SharedWorkOptimizer leaves residual unused operator tree that send DPP events to unknown operators
[ https://issues.apache.org/jira/browse/HIVE-28484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28484: -- Labels: pull-request-available (was: ) > SharedWorkOptimizer leaves residual unused operator tree that send DPP events > to unknown operators > -- > > Key: HIVE-28484 > URL: https://issues.apache.org/jira/browse/HIVE-28484 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Physical Optimizer >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28482) Iceberg: CTAS query failure while fetching URI for authorization
[ https://issues.apache.org/jira/browse/HIVE-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28482: -- Labels: pull-request-available (was: ) > Iceberg: CTAS query failure while fetching URI for authorization > > > Key: HIVE-28482 > URL: https://issues.apache.org/jira/browse/HIVE-28482 > Project: Hive > Issue Type: Bug >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > > When we perform CTAS query with the following configs set to true - > {code:java} > set hive.security.authorization.enabled=true; > set hive.security.authorization.tables.on.storagehandlers=true; > create table ctas_source stored by iceberg stored as orc as select * from > src;{code} > The following error trace is seen - > {code:java} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception > occurred while getting the URI from storage handler: null > at > org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.addHivePrivObject(CommandAuthorizerV2.java:213) > at > org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.getHivePrivObjects(CommandAuthorizerV2.java:152) > at > org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizerV2.doAuthorization(CommandAuthorizerV2.java:77) > at > org.apache.hadoop.hive.ql.security.authorization.command.CommandAuthorizer.doAuthorization(CommandAuthorizer.java:58) > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28411) Bucket Map Join on Iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-28411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28411: -- Labels: pull-request-available (was: ) > Bucket Map Join on Iceberg tables > - > > Key: HIVE-28411 > URL: https://issues.apache.org/jira/browse/HIVE-28411 > Project: Hive > Issue Type: Sub-task > Components: Iceberg integration, StorageHandler >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: pull-request-available > > Allow HiveIcebergStorageHandler or any other non-native tables to declare how > to physically bucket records so that Hive can enable Bucket Map Join for them. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28473) INSERT OVERWRITE LOCAL DIRECTORY writes staging files to wrong hdfs directory
[ https://issues.apache.org/jira/browse/HIVE-28473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28473: -- Labels: pull-request-available (was: ) > INSERT OVERWRITE LOCAL DIRECTORY writes staging files to wrong hdfs directory > - > > Key: HIVE-28473 > URL: https://issues.apache.org/jira/browse/HIVE-28473 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.3 > Environment: Hadoop 3.3.4 > HIVE 3.1.3 > mapreduce engine >Reporter: liang yu >Priority: Major > Labels: pull-request-available > > using HIVE 3.1.3; mr engine; HADOOP 3.3.4 > > *Description* > When I try to insert data into the local directory "/path/to/local", Hive > usually first creates an intermediate HDFS directory like > "hdfs:/session/execution/.staging-hive-xx", which is based on sessionId and > executionId. After that, it moves the results to the local filesystem at > "/path/to/local". > However, it is currently trying to create an intermediate HDFS directory at > "hdfs:/path/to/local/.staging-hive-xx", which incorrectly uses the local > filesystem path. This causes an error because it attempts to create a new > path starting from {{{}/root{}}}, where we don't have sufficient permissions. > > It can be reproduced by: > {code:java} > INSERT OVERWRITE LOCAL DIRECTORY "/path/to/local/dir" > select a > from table > group by a; {code} > > StackTrace: > {code:java} > RuntimeException: cannot create staging directory > "hdfs:/path/to/local/dir/.hive-staging-xx": > Permission denied: user=aaa, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x > {code} > > *ANALYSIS* > > In the function > _org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_ we do > the same for both _QBMetaData.DEST_LOCAL_FILE_ and > _QBMetaData.DEST_DFS_FILE_, and then we set the value > _ctx.getTempDirForInterimJobPath(dest_path).toString()_ to _statsTmpLoc_. 
> But for the local filesystem, dest_path is always totally different from the > paths of the HADOOP filesystem, so we get an exception because we cannot > create the HDFS directory without sufficient permissions. > > *SOLUTION* > > We should modify the function > _org.apache.hadoop.hive.ql.parse.SemanticAnalyzer#createFileSinkDesc_ to > treat _QBMetaData.DEST_LOCAL_FILE_ and _QBMetaData.DEST_DFS_FILE_ differently > by giving the value _ctx.getMRTmpPath().toString()_ to _statsTmpLoc_, to avoid > creating a wrong intermediate directory. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
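The proposed fix can be sketched as a branch on the destination type. `Ctx`, the constants, and the path strings below are hypothetical stand-ins for the real SemanticAnalyzer/QBMetaData types, kept only to show the shape of the change:

```java
// Sketch of the proposed branching: for a local destination, derive the
// stats temp location from the job's HDFS scratch dir instead of from the
// (local) destination path. Ctx is a hypothetical stand-in for Hive's Context.
public class StagingDirChoice {

    static final int DEST_DFS_FILE = 1;   // hypothetical constants mirroring
    static final int DEST_LOCAL_FILE = 2; // QBMetaData's destination types

    interface Ctx {
        String getTempDirForInterimJobPath(String destPath); // derived from dest
        String getMRTmpPath();                               // HDFS scratch dir
    }

    static String statsTmpLoc(Ctx ctx, int destType, String destPath) {
        // Deriving the staging dir from a *local* destination path produces
        // a bogus HDFS path like hdfs:/path/to/local/.hive-staging-xx.
        return destType == DEST_LOCAL_FILE
                ? ctx.getMRTmpPath()
                : ctx.getTempDirForInterimJobPath(destPath);
    }
}
```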
[jira] [Updated] (HIVE-28480) Disable SMB on partition hash generator mismatch across join branches in previous RS
[ https://issues.apache.org/jira/browse/HIVE-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28480: -- Labels: pull-request-available (was: ) > Disable SMB on partition hash generator mismatch across join branches in > previous RS > > > Key: HIVE-28480 > URL: https://issues.apache.org/jira/browse/HIVE-28480 > Project: Hive > Issue Type: Bug > Components: Query Planning >Reporter: Himanshu Mishra >Assignee: Himanshu Mishra >Priority: Major > Labels: pull-request-available > > As SMB replaces the last RS op of each joining branch and the JOIN op with > MERGEJOIN, we need to ensure that the RSs preceding those, in both branches, > partition using the same hash generator. > The hash code generator differs based on ReducerTraits.UNIFORM, i.e. > [ReduceSinkOperator#computeMurmurHash() or > ReduceSinkOperator#computeHashCode()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L340-L344], > leading to different hash codes for the same value. > Skip the SMB join in such cases. > h3. Reproduction: > Consider the following query, where the join would get converted to SMB. Auto > reducer parallelism is enabled, which ensures more than 1 reducer task. > > {code:java} > CREATE TABLE t_asj_18 (k STRING, v INT); > INSERT INTO t_asj_18 values ('a', 10), ('a', 10); > set hive.auto.convert.join=false; > set hive.tez.auto.reducer.parallelism=true; > EXPLAIN SELECT * FROM ( > SELECT k, COUNT(DISTINCT v), SUM(v) > FROM t_asj_18 GROUP BY k > ) a LEFT JOIN ( > SELECT k, COUNT(v) > FROM t_asj_18 GROUP BY k > ) b ON a.k = b.k; {code} > > > Expected result is: > > {code:java} > a 1 20 a 2 {code} > but on the master branch, it results in > > > {code:java} > a 1 20 NULLNULL {code} > > > Here, for COUNT(DISTINCT), the RS key is k, v while the partition key is still k. 
In > such a scenario the [reducer trait UNIFORM is not > set|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SetReducerParallelism.java#L99-L104]. > The hash code for "a" from the 2nd subquery is generated using murmurHash > (270516725) while the 1st is generated using bucketHash (1086686554), > resulting in rows with key "a" reaching different reducer tasks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
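The core of the bug — the same key hashed by two different functions landing on different reducers — can be sketched with toy hash functions. These are placeholders, not Hive's actual computeHashCode()/computeMurmurHash(); only the mismatch they produce matters:

```java
public class HashMismatchDemo {
    // Stand-in for ReduceSinkOperator#computeHashCode() (plain Java hash).
    static int plainHash(String key) { return key.hashCode(); }

    // Toy avalanche mix standing in for computeMurmurHash(); NOT real murmur.
    static int mixedHash(String key) {
        int h = key.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h;
    }

    // Reducer chosen by non-negative hash modulo the reducer count.
    static int reducer(int hash, int numReducers) {
        return Math.floorMod(hash, numReducers);
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Different hash functions can route the same key to different
        // reducers, so the merge-join branches never see matching rows.
        System.out.println(reducer(plainHash("a"), reducers));
        System.out.println(reducer(mixedHash("a"), reducers));
    }
}
```

This is exactly why one branch emitting murmurHash(270516725) and the other bucketHash(1086686554) for the same key "a" produces a NULL-padded join result.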
[jira] [Updated] (HIVE-28451) JDBC: TableName matcher fix in GenericJdbcDatabaseAccessor#addBoundaryToQuery
[ https://issues.apache.org/jira/browse/HIVE-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28451: -- Labels: pull-request-available (was: ) > JDBC: TableName matcher fix in GenericJdbcDatabaseAccessor#addBoundaryToQuery > - > > Key: HIVE-28451 > URL: https://issues.apache.org/jira/browse/HIVE-28451 > Project: Hive > Issue Type: Bug > Components: JDBC storage handler >Affects Versions: 4.0.0 >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > > {code} > Caught exception while trying to execute query\rjava.lang.RuntimeException: > Cannot find . in sql query SELECT … FROM > "".""\rat... > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28264) OOM/slow compilation when query contains SELECT clauses with nested expressions
[ https://issues.apache.org/jira/browse/HIVE-28264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28264: -- Labels: pull-request-available (was: ) > OOM/slow compilation when query contains SELECT clauses with nested > expressions > --- > > Key: HIVE-28264 > URL: https://issues.apache.org/jira/browse/HIVE-28264 > Project: Hive > Issue Type: Bug > Components: CBO, HiveServer2 >Affects Versions: 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > > {code:sql} > CREATE TABLE t0 (`title` string); > SELECT x10 from > (SELECT concat_ws('L10',x9, x9, x9, x9) as x10 from > (SELECT concat_ws('L9',x8, x8, x8, x8) as x9 from > (SELECT concat_ws('L8',x7, x7, x7, x7) as x8 from > (SELECT concat_ws('L7',x6, x6, x6, x6) as x7 from > (SELECT concat_ws('L6',x5, x5, x5, x5) as x6 from > (SELECT concat_ws('L5',x4, x4, x4, x4) as x5 from > (SELECT concat_ws('L4',x3, x3, x3, x3) as x4 from > (SELECT concat_ws('L3',x2, x2, x2, x2) as x3 > from > (SELECT concat_ws('L2',x1, x1, x1, x1) as > x2 from > (SELECT concat_ws('L1',x0, x0, x0, > x0) as x1 from > (SELECT concat_ws('L0',title, > title, title, title) as x0 from t0) t1) t2) t3) t4) t5) t6) t7) t8) t9) t10) t > WHERE x10 = 'Something'; > {code} > The query above fails with OOM when run with the TestMiniLlapLocalCliDriver > and the default max heap size configuration effective for tests (-Xmx2048m). 
> {noformat} > java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3332) > at > java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) > at > java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448) > at java.lang.StringBuilder.append(StringBuilder.java:136) > at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:152) > at org.apache.calcite.rex.RexCall.toString(RexCall.java:165) > at org.apache.calcite.rex.RexCall.appendOperands(RexCall.java:105) > at org.apache.calcite.rex.RexCall.computeDigest(RexCall.java:151) > at org.apache.calcite.rex.RexCall.toString(RexCall.java:165) > at java.lang.String.valueOf(String.java:2994) > at java.lang.StringBuilder.append(StringBuilder.java:131) > at > org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:90) > at > org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144) > at > org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246) > at > org.apache.calcite.rel.externalize.RelWriterImpl.explainInputs(RelWriterImpl.java:122) > at > org.apache.calcite.rel.externalize.RelWriterImpl.explain_(RelWriterImpl.java:116) > at > org.apache.calcite.rel.externalize.RelWriterImpl.done(RelWriterImpl.java:144) > at > org.apache.calcite.rel.AbstractRelNode.explain(AbstractRelNode.java:246) > at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2308) > at org.apache.calcite.plan.RelOptUtil.toString(RelOptUtil.java:2292) > at > org.apache.hadoop.hive.ql.optimizer.calcite.RuleEventLogger.ruleProductionSucceeded(RuleEventLogger.java:73) > at > org.apache.calcite.plan.MulticastRelOptListener.ruleProductionSucceeded(MulticastRelOptListener.java:68) > at > org.apache.calcite.plan.AbstractRelOptPlanner.notifyTransformation(AbstractRelOptPlanner.java:370) > at > org.apache.calcite.plan.hep.HepPlanner.applyTransformationResults(HepPlanner.java:702) > at 
org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:545) > at > org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407) > at > org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:271) > at > org.apache.calcite.plan.hep.HepInstruction$RuleCollection.execute(HepInstruction.java:74) > at > org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202) > at > org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2452) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.executeProgram(CalcitePlanner.java:2411) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
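The heap blowup is visible from the query's shape: each nesting level's textual digest embeds four copies of the level below it, so the digest length grows roughly as 4^n. A back-of-the-envelope sketch in plain Java (not Calcite's RexCall.computeDigest(); the literal lengths are taken from the query above):

```java
// Models the length of the textual digest Calcite builds for the nested
// concat_ws expression: each level wraps four copies of the inner digest.
public class DigestGrowth {
    static long digestLength(int levels) {
        // innermost expression: concat_ws('L0',title, title, title, title)
        long len = "concat_ws('L0',title, title, title, title)".length();
        for (int i = 1; i <= levels; i++) {
            // concat_ws('Li', x, x, x, x) -> 4 copies of the inner digest
            // plus fixed syntactic overhead per level
            len = 4 * len + "concat_ws('Lxx',, , , )".length();
        }
        return len;
    }

    public static void main(String[] args) {
        // 10 levels of nesting already need tens of millions of characters
        // for a single digest, and the planner recomputes digests repeatedly.
        System.out.println(digestLength(10));
    }
}
```

This is why the stack trace bottoms out in StringBuilder.append inside RexCall.computeDigest: the strings themselves, not the RexNode tree, exhaust the heap.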
[jira] [Updated] (HIVE-28460) Cleanup some dangling codes around the Metastore
[ https://issues.apache.org/jira/browse/HIVE-28460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28460: -- Labels: pull-request-available (was: ) > Cleanup some dangling codes around the Metastore > > > Key: HIVE-28460 > URL: https://issues.apache.org/jira/browse/HIVE-28460 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > > This jira is to track the work of: > 1. Determine the database product once instead of every time on ObjectStore > initialization; > 2. Extract the ObjectStore methods designed for HiveMetaTool to a standalone > class; > 3. Remove some dangling codes. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27080) Support project pushdown in JDBC storage handler even when filters are not pushed
[ https://issues.apache.org/jira/browse/HIVE-27080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27080: -- Labels: pull-request-available (was: ) > Support project pushdown in JDBC storage handler even when filters are not > pushed > - > > Key: HIVE-27080 > URL: https://issues.apache.org/jira/browse/HIVE-27080 > Project: Hive > Issue Type: Improvement > Components: CBO >Affects Versions: 4.0.0-alpha-2 >Reporter: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Attachments: jdbc_project_pushdown.q > > > {code:sql} > CREATE EXTERNAL TABLE book > ( > id int, > title varchar(20), > author int > ) > STORED BY > 'org.apache.hive.storage.jdbc.JdbcStorageHandler' > TBLPROPERTIES ( > "hive.sql.database.type" = "POSTGRES", > "hive.sql.jdbc.driver" = "org.postgresql.Driver", > "hive.sql.jdbc.url" = "jdbc:postgresql://localhost:5432/qtestDB", > "hive.sql.dbcp.username" = "qtestuser", > "hive.sql.dbcp.password" = "qtestpassword", > "hive.sql.table" = "book" > ); > {code} > {code:sql} > explain cbo select id from book where title = 'Les Miserables'; > {code} > {noformat} > CBO PLAN: > HiveJdbcConverter(convention=[JDBC.POSTGRES]) > JdbcProject(id=[$0]) > JdbcFilter(condition=[=($1, _UTF-16LE'Les Miserables')]) > JdbcHiveTableScan(table=[[default, book]], table:alias=[book]) > {noformat} > +Good case:+ Only the id column is fetched from the underlying database (see > JdbcProject) since it is necessary for the result. 
> {code:sql} > explain cbo select id from book where UPPER(title) = 'LES MISERABLES'; > {code} > {noformat} > CBO PLAN: > HiveProject(id=[$0]) > HiveFilter(condition=[=(CAST(UPPER($1)):VARCHAR(2147483647) CHARACTER SET > "UTF-16LE", _UTF-16LE'LES MISERABLES')]) > HiveProject(id=[$0], title=[$1], author=[$2]) > HiveJdbcConverter(convention=[JDBC.POSTGRES]) > JdbcHiveTableScan(table=[[default, book]], table:alias=[book]) > {noformat} > +Bad case:+ All table columns are fetched from the database although only id > and title are necessary; id cannot be dropped since it is in the result, and > title is needed by the HiveFilter since the UPPER operation was not pushed to > the DBMS. > The author column is not needed at all, so the plan should have a JdbcProject > with id and title on top of the JdbcHiveTableScan. > Although it doesn't seem like a big deal, in some cases tables are pretty wide > (more than 100 columns) while the queries rarely return all of them. > Improving project pushdown to handle such cases can give a major performance > boost. > Pushing the filter with UPPER to the JDBC storage handler is also a relevant > improvement, but it should be tracked under another ticket. > The problem can be reproduced by running: > {noformat} > mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=jdbc_project_pushdown.q > -Dtest.output.overwrite > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28457) HS2 WEBUI: LDAP authorization
[ https://issues.apache.org/jira/browse/HIVE-28457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28457: -- Labels: pull-request-available (was: ) > HS2 WEBUI: LDAP authorization > - > > Key: HIVE-28457 > URL: https://issues.apache.org/jira/browse/HIVE-28457 > Project: Hive > Issue Type: Sub-task > Components: Hive, HiveServer2 >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28456) ObjectStore updatePartitionColumnStatisticsInBatch can cause connection starvation
[ https://issues.apache.org/jira/browse/HIVE-28456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28456: -- Labels: pull-request-available (was: ) > ObjectStore updatePartitionColumnStatisticsInBatch can cause connection > starvation > --- > > Key: HIVE-28456 > URL: https://issues.apache.org/jira/browse/HIVE-28456 > Project: Hive > Issue Type: Bug >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > > Since HIVE-26419, we have a secondary connection pool for schema generation > and value generation operations; the size of this pool is 2. However, based > on the DataNucleus documentation on datanucleus.ConnectionFactory2, link: > [https://www.datanucleus.org/products/accessplatform_5_0/jdo/persistence.html] > the secondary pool also serves nontransactional connections, which makes the > ObjectStore updatePartitionColumnStatisticsInBatch request its connection > from this pool, as it doesn't open a transaction explicitly. If inserting or > updating the column statistics is slow, the pool quickly becomes unavailable > (it reaches its maximum size), and the ObjectStore could see "Connection is > not available, request timed out" in such a situation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
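The starvation mechanism can be modeled with a capacity-2 semaphore standing in for the secondary pool. This is a toy model; the real pool is managed by DataNucleus/DBCP, not a Semaphore:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolStarvationDemo {
    // Toy model of the secondary pool: capacity 2, borrowed both by value
    // generation and by nontransactional callers such as the batch
    // column-stats update described above.
    public static void main(String[] args) throws InterruptedException {
        Semaphore pool = new Semaphore(2);
        pool.acquire(); // slow column-stats update holds one connection
        pool.acquire(); // another nontransactional caller holds the second
        // A third caller cannot get a connection before its timeout, which
        // surfaces as "Connection is not available, request timed out".
        boolean got = pool.tryAcquire(100, TimeUnit.MILLISECONDS);
        System.out.println(got ? "acquired" : "request timed out");
    }
}
```

With only two slots, a single slow statistics write plus one other nontransactional borrow is enough to starve every subsequent caller.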
[jira] [Updated] (HIVE-28452) Iceberg: Cache delete files on executors
[ https://issues.apache.org/jira/browse/HIVE-28452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28452: -- Labels: pull-request-available (was: ) > Iceberg: Cache delete files on executors > > > Key: HIVE-28452 > URL: https://issues.apache.org/jira/browse/HIVE-28452 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Affects Versions: 4.0.0 >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28443) Add ifExists field to dropCatalogRequest
[ https://issues.apache.org/jira/browse/HIVE-28443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28443: -- Labels: pull-request-available (was: ) > Add ifExists field to dropCatalogRequest > > > Key: HIVE-28443 > URL: https://issues.apache.org/jira/browse/HIVE-28443 > Project: Hive > Issue Type: Bug >Reporter: Jintong Jiang >Assignee: Jintong Jiang >Priority: Major > Labels: pull-request-available > > Add ifExists field to dropCatalogRequest to not throw exception from server > if needed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28446) Convert some reserved words to non-reserved words
[ https://issues.apache.org/jira/browse/HIVE-28446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28446: -- Labels: pull-request-available (was: ) > Convert some reserved words to non-reserved words > - > > Key: HIVE-28446 > URL: https://issues.apache.org/jira/browse/HIVE-28446 > Project: Hive > Issue Type: Improvement > Components: Parser >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: pull-request-available > > We've missed listing some new keywords in the non-reserved list. > {code:java} > 0: jdbc:hive2://hive-hiveserver2:1/defaul> create table test (application > int); > Error: Error while compiling statement: FAILED: ParseException line 1:19 > cannot recognize input near 'application' 'int' ')' in column name or > constraint (state=42000,code=4) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28342) Iceberg: Major QB Compaction support filter in compaction request
[ https://issues.apache.org/jira/browse/HIVE-28342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28342: -- Labels: hive iceberg pull-request-available (was: hive iceberg) > Iceberg: Major QB Compaction support filter in compaction request > - > > Key: HIVE-28342 > URL: https://issues.apache.org/jira/browse/HIVE-28342 > Project: Hive > Issue Type: Task > Components: Hive, Iceberg integration >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > Labels: hive, iceberg, pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28442) Missing Column `ENGINE` in tables SYSDB.TAB_COL_STATS and SYSDB.PART_COL_STATS after upgrade from 3.1.0 to 4.1.0
[ https://issues.apache.org/jira/browse/HIVE-28442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28442: -- Labels: pull-request-available (was: ) > Missing Column `ENGINE` in tables SYSDB.TAB_COL_STATS and > SYSDB.PART_COL_STATS after upgrade from 3.1.0 to 4.1.0 > > > Key: HIVE-28442 > URL: https://issues.apache.org/jira/browse/HIVE-28442 > Project: Hive > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Assignee: Indhumathi Muthumurugesh >Priority: Major > Labels: pull-request-available > Attachments: image-2024-08-12-13-22-25-517.png, > image-2024-08-12-13-23-29-221.png, image-2024-08-12-13-24-29-328.png, > image-2024-08-12-13-25-25-731.png > > > Fresh Install: InitSchema > Tables SYSDB.TAB_COL_STATS and SYSDB.PART_COL_STATS > !image-2024-08-12-13-22-25-517.png|width=399,height=241! > Select * from the above tables: > !image-2024-08-12-13-23-29-221.png|width=439,height=265! > > Issue: > Upgrade Hive Schema from 3.2.0 to 4.1.0: > !image-2024-08-12-13-24-29-328.png|width=466,height=285! > Select * on the tables fails post upgrade > !image-2024-08-12-13-25-25-731.png|width=416,height=185! > > Looks like missed from HIVE-22046 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28441) NPE in ORC tables when hive.orc.splits.include.file.footer is enabled
[ https://issues.apache.org/jira/browse/HIVE-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28441: -- Labels: pull-request-available (was: ) > NPE in ORC tables when hive.orc.splits.include.file.footer is enabled > - > > Key: HIVE-28441 > URL: https://issues.apache.org/jira/browse/HIVE-28441 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 4.0.0 >Reporter: Raghav Aggarwal >Assignee: Raghav Aggarwal >Priority: Major > Labels: pull-request-available > > Steps to reproduce (tested on hive4 docker image): > {code:java} > set hive.orc.splits.include.file.footer=true; > set hive.fetch.task.conversion=none; > CREATE TABLE tbl (id INT, name STRING) STORED AS ORC; > INSERT INTO tbl VALUES (1, 'abc'); > SELECT * FROM tbl;{code} > Stacktrace: > {code:java} > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) > at > 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > Caused by: java.lang.NullPointerException > at org.apache.orc.impl.BufferChunk.<init>(BufferChunk.java:41) > at org.apache.orc.impl.OrcTail.<init>(OrcTail.java:56) > at org.apache.orc.impl.OrcTail.<init>(OrcTail.java:50) > at org.apache.hadoop.hive.ql.io.orc.OrcSplit.readFields(OrcSplit.java:230) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.readFields(HiveInputFormat.java:223) > at > org.apache.hadoop.mapred.split.TezGroupedSplit.readWrappedSplit(TezGroupedSplit.java:161) > at > org.apache.hadoop.mapred.split.TezGroupedSplit.readFields(TezGroupedSplit.java:132) > at > org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71) > at > org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42) > at > org.apache.tez.mapreduce.hadoop.MRInputHelpers.createOldFormatSplitFromUserPayload(MRInputHelpers.java:176) > at > org.apache.tez.mapreduce.lib.MRInputUtils.getOldSplitDetailsFromEvent(MRInputUtils.java:132) > at > org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(MRInput.java:693) > at org.apache.tez.mapreduce.input.MRInput.initFromEvent(MRInput.java:664) > at > org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(MRInputLegacy.java:150) > at > org.apache.tez.mapreduce.input.MRInputLegacy.init(MRInputLegacy.java:114) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(MapRecordProcessor.java:520) > at > 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:173) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:292) > ... 16 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28440) unblock hcatalog parquet project pushdown
[ https://issues.apache.org/jira/browse/HIVE-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28440: -- Labels: pull-request-available (was: ) > unblock hcatalog parquet project pushdown > - > > Key: HIVE-28440 > URL: https://issues.apache.org/jira/browse/HIVE-28440 > Project: Hive > Issue Type: Improvement > Components: HCatalog >Affects Versions: 4.0.0 >Reporter: Yi Zhang >Priority: Minor > Labels: pull-request-available > > For Pig jobs that use HCatLoader, projection pushdown is not in effect when > loading Parquet tables. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28436) Incorrect syntax in Hive schema file for table MIN_HISTORY_LEVEL
[ https://issues.apache.org/jira/browse/HIVE-28436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28436: -- Labels: pull-request-available (was: ) > Incorrect syntax in Hive schema file for table MIN_HISTORY_LEVEL > > > Key: HIVE-28436 > URL: https://issues.apache.org/jira/browse/HIVE-28436 > Project: Hive > Issue Type: Bug >Reporter: Indhumathi Muthumurugesh >Assignee: Indhumathi Muthumurugesh >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > CREATE EXTERNAL TABLE IF NOT EXISTS `MIN_HISTORY_LEVEL` ( `MHL_TXNID` bigint, > `MHL_MIN_OPEN_TXNID` bigint ) STORED BY > 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( > "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT > `MHL_TXNID`, `MHL_MIN_OPEN_TXNID`, FROM `MIN_HISTORY_LEVEL`" ) > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: > MetaException(message:org.apache.hadoop.hive.serde2.SerDeException > org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Error > while trying to get column names: You have an error in your SQL syntax; check > the manual that corresponds to your MySQL server version for the right syntax > to use near 'FROM `MIN_HISTORY_LEVEL` LIMIT 1' at line 1) > INFO : Compiling > command(queryId=hive_20240805174353_0c743113-4f53-4174-916f-4a2d2085888e): > CREATE EXTERNAL TABLE IF NOT EXISTS `MIN_HISTORY_LEVEL` ( `MHL_TXNID` bigint, > `MHL_MIN_OPEN_TXNID` bigint ) STORED BY > 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES ( > "hive.sql.database.type" = "METASTORE", "hive.sql.query" = "SELECT > `MHL_TXNID`, `MHL_MIN_OPEN_TXNID`, FROM `MIN_HISTORY_LEVEL`" ) > INFO : Concurrency mode is disabled, not creating a lock manager -- This message was sent by Atlassian Jira (v8.20.10#820010)
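The MySQL error ("near 'FROM `MIN_HISTORY_LEVEL`'") points at the stray comma before FROM in the generated {{hive.sql.query}}. A sketch of the corrected table property, assuming the trailing comma is the only problem:
{code:sql}
-- corrected query for the TBLPROPERTIES, with the stray comma removed:
"hive.sql.query" = "SELECT `MHL_TXNID`, `MHL_MIN_OPEN_TXNID` FROM `MIN_HISTORY_LEVEL`"
{code}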
[jira] [Updated] (HIVE-28439) Iceberg: Bucket partition transform with DECIMAL can throw NPE
[ https://issues.apache.org/jira/browse/HIVE-28439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28439: -- Labels: pull-request-available (was: ) > Iceberg: Bucket partition transform with DECIMAL can throw NPE > -- > > Key: HIVE-28439 > URL: https://issues.apache.org/jira/browse/HIVE-28439 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: pull-request-available > > Hive can fail when we bucket records by decimal columns. > {code:java} > CREATE TABLE test (c_decimal DECIMAL(38, 0)) PARTITIONED BY SPEC (bucket(8, > c_decimal)) STORED BY ICEBERG; > INSERT INTO test VALUES (CAST('5000441610525' AS DECIMAL(38, > 0))); {code} > Stacktrace > {code:java} > ERROR : Vertex failed, vertexName=Map 1, > vertexId=vertex_1722775255811_0004_1_00, diagnostics=[Task failed, > taskId=task_1722775255811_0004_1_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Error: Node: > yarn-nodemanager-2.yarn-nodemanager.zookage.svc.cluster.local/10.1.5.93 : > Error while running task ( failure ) : > attempt_1722775255811_0004_1_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > Hive Runtime Error while processing writable > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) > at > com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) > at > com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while > processing writable > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293) > ... 16 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing writable > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:569) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101) > ... 
19 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:384) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:94) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:888) > at > org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:133) > at > org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:45) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:110) > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDTFInline.process(GenericUDTFInline.java:64) > at > org.a
[jira] [Updated] (HIVE-28366) Iceberg: Concurrent Insert and IOW produce incorrect result
[ https://issues.apache.org/jira/browse/HIVE-28366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28366: -- Labels: pull-request-available (was: ) > Iceberg: Concurrent Insert and IOW produce incorrect result > > > Key: HIVE-28366 > URL: https://issues.apache.org/jira/browse/HIVE-28366 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Affects Versions: 4.0.0 >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > > 1. create a table and insert some data: > {code} > create table ice_t (i int, p int) partitioned by spec (truncate(10, i)) > stored by iceberg; > insert into ice_t values (1, 1), (2, 2); > insert into ice_t values (10, 10), (20, 20); > insert into ice_t values (40, 40), (30, 30); > {code} > Then concurrently execute the following jobs: > Job 1: > {code} > insert into ice_t select i*100, p*100 from ice_t; > {code} > Job 2: > {code} > insert overwrite ice_t select i+1, p+1 from ice_t; > {code} > If Job 1 finishes first, Job 2 still succeeds for me, and after that the > table content will be the following: > {code} > 2 2 > 3 3 > 11 11 > 21 21 > 31 31 > 41 41 > 100 100 > 200 200 > 1000 1000 > 2000 2000 > 3000 3000 > 4000 4000 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28438) Upgrade commons-dbcp2 and commons-pool2 to 2.12.0 to fix CVEs
[ https://issues.apache.org/jira/browse/HIVE-28438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28438: -- Labels: pull-request-available (was: ) > Upgrade commons-dbcp2 and commons-pool2 to 2.12.0 to fix CVEs > - > > Key: HIVE-28438 > URL: https://issues.apache.org/jira/browse/HIVE-28438 > Project: Hive > Issue Type: Improvement >Reporter: tanishqchugh >Assignee: tanishqchugh >Priority: Major > Labels: pull-request-available > > In the master branch, we are currently using commons-dbcp2 v2.9.0, which is > affected by the following CVEs: > [CVE-2022-45868|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-45868] > [CVE-2022-23221|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-23221] > [CVE-2021-42392|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-42392] > [CVE-2021-23463|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-23463] > Also, there are two different versions (2.10.0 & 2.11.1) of commons-pool2 > across the project, as commons-dbcp2 v2.9.0 requires commons-pool2 v2.10.0 as a > compile time dependency. > Upgrading both dependencies to v2.12.0 resolves the CVEs and makes the > commons-pool2 version consistent across the project, preventing potential > conflicts. -- This message was sent by Atlassian Jira (v8.20.10#820010)
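A minimal sketch of the bump in the root pom — the property names here are illustrative, not necessarily the ones Hive's pom actually uses:
{code:xml}
<properties>
  <!-- hypothetical version properties; pin both artifacts to the same release -->
  <commons-dbcp2.version>2.12.0</commons-dbcp2.version>
  <commons-pool2.version>2.12.0</commons-pool2.version>
</properties>
{code}
Managing both versions in one place keeps commons-pool2 consistent even where it is pulled in transitively by commons-dbcp2.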
[jira] [Updated] (HIVE-28431) Fix RexLiteral to ExprNode conversion if the literal is an empty string
[ https://issues.apache.org/jira/browse/HIVE-28431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28431: -- Labels: pull-request-available (was: ) > Fix RexLiteral to ExprNode conversion if the literal is an empty string > --- > > Key: HIVE-28431 > URL: https://issues.apache.org/jira/browse/HIVE-28431 > Project: Hive > Issue Type: Bug >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > > Currently, conversion from RexLiteral to ExprNode fails if the literal is an > empty string. This was introduced by > https://issues.apache.org/jira/browse/HIVE-23892. This causes CBO to fail: > the RexLiteral node will not be null, but the value within the RexLiteral > can still be empty. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28435) Upgrade cron-utils to 9.2.1
[ https://issues.apache.org/jira/browse/HIVE-28435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28435: -- Labels: pull-request-available (was: ) > Upgrade cron-utils to 9.2.1 > --- > > Key: HIVE-28435 > URL: https://issues.apache.org/jira/browse/HIVE-28435 > Project: Hive > Issue Type: Task >Reporter: tanishqchugh >Assignee: tanishqchugh >Priority: Major > Labels: pull-request-available > > Cron-utils v9.1.6 requires org.glassfish:javax.el v3.0.0 as a compile time > dependency. The javax.el artifact was moved to jakarta.el. All versions up to > and including 3.0.3 of the jakarta.el artifact are affected by > [CVE-2021-28170|https://nvd.nist.gov/vuln/detail/CVE-2021-28170] > Upgrade cron-utils to 9.2.1 to get rid of CVE-2021-28170, as this upgrade > removes the transitive usage of javax.el -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28434) Upgrade to tez 0.10.4
[ https://issues.apache.org/jira/browse/HIVE-28434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28434: -- Labels: pull-request-available (was: ) > Upgrade to tez 0.10.4 > - > > Key: HIVE-28434 > URL: https://issues.apache.org/jira/browse/HIVE-28434 > Project: Hive > Issue Type: Improvement >Reporter: László Bodor >Assignee: László Bodor >Priority: Major > Labels: pull-request-available > Fix For: 4.0.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28428) Map hash aggregation performance degradation
[ https://issues.apache.org/jira/browse/HIVE-28428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28428: -- Labels: pull-request-available (was: ) > Map hash aggregation performance degradation > - > > Key: HIVE-28428 > URL: https://issues.apache.org/jira/browse/HIVE-28428 > Project: Hive > Issue Type: Improvement >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Major > Labels: pull-request-available > Attachments: 2024-08-02 14.35.46.png, > image-2024-08-02-14-37-01-824.png, image-2024-08-02-14-38-45-459.png > > > The following ticket enabled map hash aggregation, but performance is worse > than when it is disabled. > https://issues.apache.org/jira/browse/HIVE-23356 > I found a few reasons for this. If there are a large number of keys, the > following log will be output in large volume, affecting performance. And, > this can also cause an OOM. > {code:java} > 2024-08-02 05:21:53,675 [INFO] [TezChild] |exec.GroupByOperator|: Hash Tbl > flush: #hash table = 171000 > 2024-08-02 05:21:53,713 [INFO] [TezChild] |exec.GroupByOperator|: Hash Table > flushed: new size = 153900 > {code} > By fixing this, we can improve performance as follows. > Before: > !image-2024-08-02-14-37-01-824.png! > After: > !2024-08-02 14.35.46.png! > And, currently the flush size is fixed, but performance can be improved by > changing it depending on the data: > !image-2024-08-02-14-38-45-459.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
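Until the logging itself is toned down, the per-flush INFO messages can be silenced operationally — a sketch in log4j2 properties syntax, with the logger name taken from the log lines above:
{code}
# hypothetical workaround: raise the level of the noisy flush logger
logger.gby.name = org.apache.hadoop.hive.ql.exec.GroupByOperator
logger.gby.level = WARN
{code}
This only reduces the logging overhead; the fixed flush-size behavior described above is unaffected.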
[jira] [Updated] (HIVE-28427) HivePreFilteringRule gets applied multiple times
[ https://issues.apache.org/jira/browse/HIVE-28427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28427: -- Labels: pull-request-available (was: ) > HivePreFilteringRule gets applied multiple times > > > Key: HIVE-28427 > URL: https://issues.apache.org/jira/browse/HIVE-28427 > Project: Hive > Issue Type: Bug >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Major > Labels: pull-request-available > > In the {{matches}} method of {{HivePreFilteringRule}}, we check if a node > has already been visited using the {{HiveRulesRegistry}}. This is done by > using a {{SetMultimap}}. Currently, we don't get the > same hash value for equivalent RelNodes, and because of this we visit similar > nodes multiple times even when they are present in the registry. Sometimes we > can also see infinite matching. > > Instead we can use a {{SetMultimap}} and store Strings. -- This message was sent by Atlassian Jira (v8.20.10#820010)
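The proposed direction can be sketched with plain JDK collections. This is an illustration of digest-based de-duplication only — the class and method names are hypothetical, not Hive's actual {{HiveRulesRegistry}} API:
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration: track visited nodes per rule by a stable string digest,
// so structurally equivalent RelNodes map to the same key even when their
// object identities (and identity hash codes) differ.
public class DigestRegistry {
    private final Map<String, Set<String>> visited = new HashMap<>();

    /** Records the pair and returns true only the first time it is seen. */
    public boolean firstVisit(String ruleName, String relDigest) {
        return visited.computeIfAbsent(ruleName, k -> new HashSet<>()).add(relDigest);
    }
}
{code}
A rule's {{matches}} method would then skip any node whose digest has already been registered, which also breaks the repeated/infinite matching described above.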
[jira] [Updated] (HIVE-28426) mysql schema upgrade fails
[ https://issues.apache.org/jira/browse/HIVE-28426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28426: -- Labels: pull-request-available (was: ) > mysql schema upgrade fails > -- > > Key: HIVE-28426 > URL: https://issues.apache.org/jira/browse/HIVE-28426 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Pravin Sinha >Assignee: Pravin Sinha >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28422) Iceberg: Added missing awaitility dependency in iceberg-catalog
[ https://issues.apache.org/jira/browse/HIVE-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28422: -- Labels: pull-request-available (was: ) > Iceberg: Added missing awaitility dependency in iceberg-catalog > --- > > Key: HIVE-28422 > URL: https://issues.apache.org/jira/browse/HIVE-28422 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Reporter: Butao Zhang >Assignee: Butao Zhang >Priority: Trivial > Labels: pull-request-available > > HIVE-28364 only added the *org.awaitility* dependency in the iceberg root pom, > but this dep is used by iceberg-catalog, so we need to add it in the > iceberg-catalog module. Otherwise, the IDE will regard *Awaitility* as > unrecognized code in *TestHiveTableConcurrency*, which may confuse people, > though the iceberg project still builds successfully. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28421) Iceberg: mvn test cannot run UTs in iceberg-catalog
[ https://issues.apache.org/jira/browse/HIVE-28421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28421: -- Labels: pull-request-available (was: ) > Iceberg: mvn test cannot run UTs in iceberg-catalog > - > > Key: HIVE-28421 > URL: https://issues.apache.org/jira/browse/HIVE-28421 > Project: Hive > Issue Type: Bug > Components: Iceberg integration >Reporter: Butao Zhang >Assignee: Butao Zhang >Priority: Major > Labels: pull-request-available > > mvn clean test -Dtest=TestHiveCommits -pl iceberg/iceberg-catalog -Piceberg > will print {*}Tests run: 0, Failures: 0, Errors: 0, Skipped: 0{*}, and > actually the test won't be run. > > {code:java} > For more information see > https://gradle.com/help/maven-extension-compile-avoidance. > [INFO] Loaded from the build cache, saving 1.185s > [INFO] > [INFO] --- maven-surefire-plugin:3.0.0-M4:test (default-test) @ > hive-iceberg-catalog --- > [INFO] > [INFO] --- > [INFO] T E S T S > [INFO] --- > [INFO] > [INFO] Results: > [INFO] > [INFO] Tests run: 0, Failures: 0, Errors: 0, Skipped: 0 > [INFO] > [INFO] > > [INFO] BUILD SUCCESS > [INFO] > > [INFO] Total time: 9.513 s > [INFO] Finished at: 2024-08-01T13:32:26+08:00 > [INFO] > > [INFO] 17 goals, 14 executed, 3 from cache, saving at least 2s > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28417) Bump log4j2 to 2.23.1 to facilitate the use of HiveServer2 JDBC Driver under GraalVM Native Image
[ https://issues.apache.org/jira/browse/HIVE-28417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28417: -- Labels: pull-request-available (was: ) > Bump log4j2 to 2.23.1 to facilitate the use of HiveServer2 JDBC Driver under > GraalVM Native Image > - > > Key: HIVE-28417 > URL: https://issues.apache.org/jira/browse/HIVE-28417 > Project: Hive > Issue Type: Improvement >Reporter: Qiheng He >Priority: Major > Labels: pull-request-available > > * Bump log4j2 to 2.23.1 to facilitate the use of HiveServer2 JDBC Driver > under GraalVM Native Image. > * apache/logging-log4j2 has eliminated various removed old JDK APIs that > prevented it from being used under GraalVM Native Image since `2.24.0`. See > [https://github.com/apache/logging-log4j2/issues/1539] . > - But apache/hive:4.0.0 is still using the old version of > apache/logging-log4j2, which means that in PRs such as > [https://github.com/apache/shardingsphere/pull/31526] , in order to execute > unit tests related to HiveServer2 JDBC Driver under GraalVM Native Image, I > have to manually exclude the dependency of Log4j2. This sounds like, > {code:xml} > <dependencies> > <dependency> > <groupId>org.apache.hive</groupId> > <artifactId>hive-jdbc</artifactId> > <version>4.0.0</version> > </dependency> > <dependency> > <groupId>org.apache.hive</groupId> > <artifactId>hive-service</artifactId> > <version>4.0.0</version> > <exclusions> > <exclusion> > <groupId>org.apache.hadoop</groupId> > <artifactId>hadoop-client-api</artifactId> > </exclusion> > <exclusion> > <groupId>org.apache.logging.log4j</groupId> > <artifactId>log4j-api</artifactId> > </exclusion> > <exclusion> > <groupId>org.apache.logging.log4j</groupId> > <artifactId>log4j-slf4j-impl</artifactId> > </exclusion> > <exclusion> > <groupId>org.slf4j</groupId> > <artifactId>slf4j-log4j12</artifactId> > </exclusion> > </exclusions> > </dependency> > <dependency> > <groupId>org.apache.hadoop</groupId> > <artifactId>hadoop-client-api</artifactId> > <version>3.3.6</version> > </dependency> > </dependencies> > {code} > - If `org.apache.logging.log4j:log4j-api` is not excluded, HiveServer2 JDBC > Driver cannot be used under GraalVM Native Image, and the log is similar to > the following. 
> {code:bash} > [INFO] Executing: > /home/linghengqian/TwinklingLiftWorks/git/public/shardingsphere/test/native/target/native-tests > --xml-output-dir > /home/linghengqian/TwinklingLiftWorks/git/public/shardingsphere/test/native/target/native-test-reports > > -Djunit.platform.listeners.uid.tracking.output.dir=/home/linghengqian/TwinklingLiftWorks/git/public/shardingsphere/test/native/target/test-ids > JUnit Platform on Native Image - report > > Failures (1): > JUnit Jupiter:HiveTest:assertShardingInLocalTransactions() > MethodSource [className = > 'org.apache.shardingsphere.test.natived.jdbc.databases.HiveTest', methodName > = 'assertShardingInLocalTransactions', methodParameterTypes = ''] > => java.lang.NoClassDefFoundError: Could not initialize class > org.apache.logging.log4j.LogManager > > org.apache.commons.logging.LogAdapter$Log4jLog.(LogAdapter.java:155) > > org.apache.commons.logging.LogAdapter$Log4jAdapter.createLog(LogAdapter.java:122) >org.apache.commons.logging.LogAdapter.createLog(LogAdapter.java:89) >org.apache.commons.logging.LogFactory.getLog(LogFactory.java:67) >org.apache.commons.logging.LogFactory.getLog(LogFactory.java:59) >org.apache.hadoop.fs.FileSystem.(FileSystem.java:135) >java.base@22.0.2/java.lang.Class.ensureInitialized(DynamicHub.java:599) >java.base@22.0.2/java.lang.Class.ensureInitialized(DynamicHub.java:599) >java.base@22.0.2/java.lang.Class.ensureInitialized(DynamicHub.java:599) > > org.apache.hadoop.hive.conf.valcoersion.JavaIOTmpdirVariableCoercion.(JavaIOTmpdirVariableCoercion.java:37) >[...] > {code} > - If the Apache/Hive side can improve the version of log4j2, then to use the > HiveServer2 JDBC Driver under the GraalVM Native Image, I only need to > provide the GraalVM Reachability Metadata of Log4j2 in the downstream project. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-23964) SemanticException in query 30 while generating logical plan
[ https://issues.apache.org/jira/browse/HIVE-23964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-23964: -- Labels: pull-request-available (was: ) > SemanticException in query 30 while generating logical plan > --- > > Key: HIVE-23964 > URL: https://issues.apache.org/jira/browse/HIVE-23964 > Project: Hive > Issue Type: Bug >Reporter: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Attachments: cbo_query30_stacktrace.txt > > > Invalid table alias or column reference 'c_last_review_date' is thrown when > running TPC-DS query 30 (cbo_query30.q, query30.q) on the metastore with the > partitioned TPC-DS 30TB dataset. > The respective stacktrace is attached to this case. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28402) Precommit tests fail with OOM when running split-19
[ https://issues.apache.org/jira/browse/HIVE-28402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28402: -- Labels: pull-request-available (was: ) > Precommit tests fail with OOM when running split-19 > --- > > Key: HIVE-28402 > URL: https://issues.apache.org/jira/browse/HIVE-28402 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > > The last 3 runs in master all fail with OOM when running split-19: > * > [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2233/pipeline] > * > [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2234/pipeline] > * > [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2235/pipeline] > {noformat} > [2024-07-25T05:57:46.816Z] [INFO] Running > org.apache.hadoop.hive.metastore.client.TestGetPartitions > [2024-07-25T06:00:23.926Z] Exception in thread "Thread-46" > java.lang.OutOfMemoryError: GC overhead limit exceeded > [2024-07-25T06:00:23.926Z]at > java.util.Arrays.copyOfRange(Arrays.java:3664) > [2024-07-25T06:00:23.926Z]at java.lang.String.(String.java:207) > [2024-07-25T06:00:23.926Z]at > java.io.BufferedReader.readLine(BufferedReader.java:356) > [2024-07-25T06:00:24.907Z]at > java.io.BufferedReader.readLine(BufferedReader.java:389) > [2024-07-25T06:00:24.907Z]at > org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:89) > [2024-07-25T06:01:46.664Z] [WARNING] ForkStarter IOException: GC overhead > limit exceeded. 
See the dump file > /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream > [2024-07-25T06:01:55.003Z] [INFO] Running > org.apache.hadoop.hive.metastore.TestFilterHooks > [2024-07-25T06:02:21.747Z] > [2024-07-25T06:02:21.748Z] Exception: java.lang.OutOfMemoryError thrown from > the UncaughtExceptionHandler in thread "Thread-49" > [2024-07-25T06:03:08.707Z] [WARNING] ForkStarter IOException: GC overhead > limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded. 
See the dump file > /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream > [2024-07-25T06:03:15.362Z] [ERROR] Error closing test event listener: > [2024-07-25T06:03:15.362Z] java.util.concurrent.CompletionException: > java.lang.OutOfMemoryError: GC overhead limit exceeded > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.CompletableFuture.encodeThrowable > (CompletableFuture.java:273) > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.CompletableFuture.completeThrowable > (CompletableFuture.java:280) > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.CompletableFuture$AsyncRun.run > (CompletableFuture.java:1643) > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > [2024-07-25T06:03:15.362Z] at java.lang.Thread.run (Thread.java:748) > [2024-07-25T06:03:15.362Z] Caused by: java.lang.OutOfMemoryError: GC overhead > limit exceeded > [2024-07-25T06:03:15.363Z] [ERROR] GC overhead limit exceeded -> [Help 1] > {noformat} > The OOM is also affecting PR runs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28409) Column lineage when creating view is missing if atlas HiveHook is set
[ https://issues.apache.org/jira/browse/HIVE-28409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28409: -- Labels: pull-request-available (was: ) > Column lineage when creating view is missing if atlas HiveHook is set > - > > Key: HIVE-28409 > URL: https://issues.apache.org/jira/browse/HIVE-28409 > Project: Hive > Issue Type: Bug > Components: lineage >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Labels: pull-request-available > > Column lineage info is collected by > {{{}org.apache.hadoop.hive.ql.optimizer.lineage.Generator{}}}. This is called > during Hive optimizations and view creation if one of these conditions is met: > {code:java} > hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_LINEAGE_INFO) > || > postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.PostExecutePrinter") > || > postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.LineageLogger") > || postExecHooks.contains("org.apache.atlas.hive.hook.HiveHook") > {code} > [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java#L78-L81] > and > [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L13226-L13228] > However HIVE-17125 introduced more conditions which affect only the > {{org.apache.atlas.hive.hook.HiveHook}} > [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java#L75-L86] > > Later HIVE-23244 changed the code that handles view creation. Since there are no > tests at all covering view creation when {{org.apache.atlas.hive.hook.HiveHook}} > is specified, the new code skips column lineage info collection. 
> The tests we have for testing column lineage info collection are using > [LineageLogger.java|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageLogger.java] > which doesn't have any restriction in the Generator so column lineage info > is always collected when LineageLogger is set. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28407) Alter Table Rename should not require create database privilege
[ https://issues.apache.org/jira/browse/HIVE-28407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28407: -- Labels: pull-request-available (was: ) > Alter Table Rename should not require create database privilege > --- > > Key: HIVE-28407 > URL: https://issues.apache.org/jira/browse/HIVE-28407 > Project: Hive > Issue Type: Bug > Components: Authorization, HiveServer2 >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > > Currently we add the database object to the list of privilege objects > required for authorization in the alter table rename set of queries. Ideally > we only need a create table permission on the database and not a create > database permission. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28404) Fix typo Overriden to Overridden
[ https://issues.apache.org/jira/browse/HIVE-28404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28404: -- Labels: pull-request-available (was: ) > Fix typo Overriden to Overridden > > > Key: HIVE-28404 > URL: https://issues.apache.org/jira/browse/HIVE-28404 > Project: Hive > Issue Type: Bug >Reporter: Caican Cai >Priority: Minor > Labels: pull-request-available > > fix typo Overriden to Overridden -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28405) Set default TimeUnit for hive.repl.cm.retain to DAYS in metastore configs
[ https://issues.apache.org/jira/browse/HIVE-28405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28405: -- Labels: pull-request-available (was: ) > Set default TimeUnit for hive.repl.cm.retain to DAYS in metastore configs > - > > Key: HIVE-28405 > URL: https://issues.apache.org/jira/browse/HIVE-28405 > Project: Hive > Issue Type: Improvement > Components: Metastore, Standalone Metastore >Reporter: Smruti Biswal >Assignee: Smruti Biswal >Priority: Minor > Labels: pull-request-available > > In HiveConf.java the default time unit for hive.repl.cm.retain is DAYS. > One could very easily get confused and set the value in DAYS under the metastore > configuration. It would be a good idea to keep the units in sync. -- This message was sent by Atlassian Jira (v8.20.10#820010)
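The unit mismatch described in HIVE-28405 can be illustrated with a small sketch: a bare number is interpreted with a configurable default unit, while an explicit suffix overrides it. The parser below is hypothetical and only mirrors the intent; Hive's actual HiveConf/MetastoreConf time-parsing code differs.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: parse a retention value, defaulting the unit when no
// suffix is given. Names are illustrative, not Hive's real parser.
public class TimeValueParser {
    static long toSeconds(String value, TimeUnit defaultUnit) {
        String v = value.trim().toLowerCase();
        if (v.endsWith("d")) {
            return TimeUnit.DAYS.toSeconds(Long.parseLong(v.substring(0, v.length() - 1)));
        }
        if (v.endsWith("s")) {
            return TimeUnit.SECONDS.toSeconds(Long.parseLong(v.substring(0, v.length() - 1)));
        }
        // Bare number: fall back to the default unit. If one config assumes
        // DAYS but the other assumes SECONDS, the same bare "1" means
        // 86400 seconds in one place and 1 second in the other.
        return defaultUnit.toSeconds(Long.parseLong(v));
    }
}
```

With a default of DAYS, a bare "1" resolves to 86400 seconds, while the same value under a SECONDS default resolves to 1 second — exactly the confusion that keeping the defaults in sync avoids.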
[jira] [Updated] (HIVE-28403) Delete redundant Javadoc for Hive
[ https://issues.apache.org/jira/browse/HIVE-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28403: -- Labels: pull-request-available (was: ) > Delete redundant Javadoc for Hive > - > > Key: HIVE-28403 > URL: https://issues.apache.org/jira/browse/HIVE-28403 > Project: Hive > Issue Type: Wish >Reporter: Caican Cai >Priority: Minor > Labels: pull-request-available > > Hive has some redundant Javadoc, but there are no comments in it. I think > some Javadoc can be deleted. > {code:java} > // Some comments here > /** >* >*/ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline
[ https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28401: -- Labels: pull-request-available (was: ) > Drop redundant XML test report post-processing from CI pipeline > --- > > Key: HIVE-28401 > URL: https://issues.apache.org/jira/browse/HIVE-28401 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > > The [Maven Surefire > plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin] > generates an XML report containing various information regarding the > execution of tests. In case of failures the system-out and system-err output > from the test is saved in the XML file. > The Jenkins pipeline has a post-processing > [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380] > that attempts to remove the system-out and system-err entries from the XML > files generated by Surefire for all tests that passed as an attempt to save > disk space in the Jenkins node. > {code:bash} > # removes all stdout and err for passed tests > xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d > 'testsuite/testcase/system-err[count(../failure)=0]' > {code} > This cleanup step is not necessary since Surefire (3.0.0-M4) is not storing > system-out and system-err for tests that passed. > Moreover, when the XML report file is large xmlstarlet chokes and throws a > "Huge input lookup" error that skips the remaining post-processing steps and > makes the build fail. 
> {noformat} > [2024-07-23T16:11:26.052Z] > ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2: > internal error: Huge input lookup > [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799 INFO > [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio > [2024-07-23T16:11:26.053Z] ^ > [2024-07-23T16:11:43.133Z] Recording test results > [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found. > script returned exit code 3 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28399) Improve the fetch size in HiveConnection
[ https://issues.apache.org/jira/browse/HIVE-28399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28399: -- Labels: pull-request-available (was: ) > Improve the fetch size in HiveConnection > > > Key: HIVE-28399 > URL: https://issues.apache.org/jira/browse/HIVE-28399 > Project: Hive > Issue Type: Improvement >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > > If the 4.x Hive Jdbc client connects to an older HS2 or other thrift > implementations, it might throw an IllegalStateException: > [https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L1253-L1258], > as the remote might not have set the property > hive.server2.thrift.resultset.default.fetch.size in the response of the > OpenSession request. It also introduces confusion about what the real fetch > size of the connection is: we have both initFetchSize and defaultFetchSize in > this HiveConnection, and HiveStatement checks initFetchSize, > defaultFetchSize and > HIVE_SERVER2_THRIFT_RESULTSET_DEFAULT_FETCH_SIZE.defaultIntVal to obtain the > real fetch size. We can consolidate them into one value in HiveConnection, so > every statement created from the connection uses this new fetch size. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
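A minimal sketch of the consolidation suggested in HIVE-28399: resolve one effective fetch size at connection time so statements never re-derive it from several fields. The names below are hypothetical, not the actual HiveConnection/HiveStatement members.

```java
// Hypothetical sketch: collapse the client-requested size, the
// server-advertised default, and a built-in fallback into a single value
// resolved once. Names are illustrative.
public class FetchSizeResolver {
    static final int FALLBACK_FETCH_SIZE = 1000; // assumed client-side default

    // First positive value wins: explicit client setting, then the
    // server-advertised default, then the hard-coded fallback.
    static int resolve(int clientRequested, int serverAdvertised) {
        if (clientRequested > 0) {
            return clientRequested;
        }
        if (serverAdvertised > 0) {
            return serverAdvertised;
        }
        return FALLBACK_FETCH_SIZE; // remote never sent the property
    }
}
```

Resolving once, with a graceful fallback instead of a thrown exception when the server omits the property, also sidesteps the IllegalStateException against older HS2 instances.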
[jira] [Updated] (HIVE-28360) Upgrade jersey to version 1.19.4,
[ https://issues.apache.org/jira/browse/HIVE-28360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28360: -- Labels: hive-4.0.1-must pull-request-available (was: hive-4.0.1-must) > Upgrade jersey to version 1.19.4, > - > > Key: HIVE-28360 > URL: https://issues.apache.org/jira/browse/HIVE-28360 > Project: Hive > Issue Type: Bug >Affects Versions: 3.1.3 >Reporter: lvyankui >Assignee: lvyankui >Priority: Major > Labels: hive-4.0.1-must, pull-request-available > Attachments: HIVE-28360.patch > > > Hive version: 3.1.3 > Hadoop version: 3.3.6 > After upgrading to Hadoop 3.3.6, the Hive WebHCat server fails to start > because of inconsistent versions of the Jersey JAR package. Hive HCat lacks > the jersey-server-1.19 jar. > > After upgrading to Hadoop 3.3.5+, Hadoop updates jersey to version > 1.19.4, which is inconsistent with the jersey version > in the Hive WebHCat server. As a result, the startup fails. To resolve this, > manually download a package and place it in > /usr/lib/hive-hcatalog/share/webhcat/svr/lib/ > Therefore, when packaging Hive, we need to specify the version of Jersey in > the Hive POM file to match the version of Jersey in Hadoop to avoid version > conflicts. 
> > Here is the error log > INFO | 18 Jul 2024 14:37:13,237 | org.eclipse.jetty.server.Server | > jetty-9.4.53.v20231009; built: 2023-10-09T12:29:09.265Z; git: > 27bde00a0b95a1d5bbee0eae7984f891d2d0f8c9; jvm 1.8.0_412-b08 > WARN | 18 Jul 2024 14:37:13,326 | > org.eclipse.jetty.server.handler.ContextHandler.ROOT | unavailable > com.sun.jersey.api.container.ContainerException: No WebApplication provider > is present > at > com.sun.jersey.spi.container.WebApplicationFactory.createWebApplication(WebApplicationFactory.java:69) > ~[jersey-server-1.19.4.jar:1.19.4] > at > com.sun.jersey.spi.container.servlet.ServletContainer.create(ServletContainer.java:412) > ~[jersey-servlet-1.19.jar:1.19] > at > com.sun.jersey.spi.container.servlet.ServletContainer$InternalWebComponent.create(ServletContainer.java:327) > ~[jersey-servlet-1.19.jar:1.19] > at > com.sun.jersey.spi.container.servlet.WebComponent.load(WebComponent.java:603) > ~[jersey-servlet-1.19.jar:1.19] > at > com.sun.jersey.spi.container.servlet.WebComponent.init(WebComponent.java:207) > ~[jersey-servlet-1.19.jar:1.19] > at > com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:394) > ~[jersey-servlet-1.19.jar:1.19] > at > com.sun.jersey.spi.container.servlet.ServletContainer.init(ServletContainer.java:577) > ~[jersey-servlet-1.19.jar:1.19] > at javax.servlet.GenericServlet.init(GenericServlet.java:244) > ~[javax.servlet-api-3.1.0.jar:3.1.0] > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28347) Make a UDAF 'collect_set' work with complex types, even when map-side aggregation is disabled.
[ https://issues.apache.org/jira/browse/HIVE-28347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28347: -- Labels: pull-request-available (was: ) > Make a UDAF 'collect_set' work with complex types, even when map-side > aggregation is disabled. > -- > > Key: HIVE-28347 > URL: https://issues.apache.org/jira/browse/HIVE-28347 > Project: Hive > Issue Type: Bug >Affects Versions: 2.4.0, 3.1.3, 4.0.0 >Reporter: Jeongdae Kim >Assignee: Jeongdae Kim >Priority: Minor > Labels: pull-request-available > > collect_set() (+ collect_list()) doesn't work with complex types, when > map-side aggregation is disabled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26473) Upgrade to Java17
[ https://issues.apache.org/jira/browse/HIVE-26473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26473: -- Labels: pull-request-available (was: ) > Upgrade to Java17 > - > > Key: HIVE-26473 > URL: https://issues.apache.org/jira/browse/HIVE-26473 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 4.0.0 >Reporter: dingwei2019 >Assignee: Akshat Mathur >Priority: Major > Labels: pull-request-available > > We know that JDK 11 is an LTS version, but its technical support will end in > September 2023. JDK 17 is the next-generation LTS version and will be supported at > least until 2026. > For G1GC, Java 17 is 8.66% faster than Java 11; for ParallelGC, the > improvement is 6.54%. If we upgrade to Java 17, we will get more performance > improvement than with Java 11. > > I suggest we upgrade Hive to support Java 17. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28375) Upgrade Nimbus-JOSE-JWT to 9.37.3 due to CVE-2023-52428
[ https://issues.apache.org/jira/browse/HIVE-28375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28375: -- Labels: pull-request-available (was: ) > Upgrade Nimbus-JOSE-JWT to 9.37.3 due to CVE-2023-52428 > --- > > Key: HIVE-28375 > URL: https://issues.apache.org/jira/browse/HIVE-28375 > Project: Hive > Issue Type: Task >Reporter: Devaspati Krishnatri >Assignee: Devaspati Krishnatri >Priority: Minor > Labels: pull-request-available > Attachments: mvn_dependency_tree.txt > > > Upgrade Nimbus-JOSE-JWT to 9.37.3 due to CVE-2023-52428 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28377) Add support for hive.output.file.extension to HCatStorer
[ https://issues.apache.org/jira/browse/HIVE-28377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28377: -- Labels: pull-request-available (was: ) > Add support for hive.output.file.extension to HCatStorer > > > Key: HIVE-28377 > URL: https://issues.apache.org/jira/browse/HIVE-28377 > Project: Hive > Issue Type: Improvement >Reporter: Venkatasubrahmanian Narayanan >Assignee: Venkatasubrahmanian Narayanan >Priority: Minor > Labels: pull-request-available > > Hive supports custom file extensions for output files configured through the > hive.output.file.extension property, but HCatStorer doesn't support that > property or have a replacement. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28376) Remove unused Hive object from RelOptHiveTable
[ https://issues.apache.org/jira/browse/HIVE-28376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28376: -- Labels: pull-request-available (was: ) > Remove unused Hive object from RelOptHiveTable > -- > > Key: HIVE-28376 > URL: https://issues.apache.org/jira/browse/HIVE-28376 > Project: Hive > Issue Type: Task > Components: CBO >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > > The > [Hive|https://github.com/apache/hive/blob/b18d5732b4f309fdc3b8226847c9c1ebcd2476fd/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java] > object is not used inside RelOptHiveTable so keeping a reference to it is > wasting memory and also complicates creation of RelOptHiveTable objects > (constructor parameter). > Moreover, the Hive objects have thread local scope so in general they > shouldn't be passed around cause their lifecycle becomes harder to manage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28374) Iceberg: Handle change of default format-version
[ https://issues.apache.org/jira/browse/HIVE-28374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28374: -- Labels: pull-request-available (was: ) > Iceberg: Handle change of default format-version > > > Key: HIVE-28374 > URL: https://issues.apache.org/jira/browse/HIVE-28374 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > > Handle changes due to the change of the default format version to 2 in the iceberg > lib. > One example: a table created with explicitly defined format-version=2 is MOR, but > when the version is unspecified the format version is still 2 while the table is COW -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28370) HMS's Authorizer for ALTER_TABLE event doesn't depend on HIVE_AUTHORIZATION_TABLES_ON_STORAGEHANDLERS
[ https://issues.apache.org/jira/browse/HIVE-28370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28370: -- Labels: pull-request-available (was: ) > HMS's Authorizer for ALTER_TABLE event doesn't depend on > HIVE_AUTHORIZATION_TABLES_ON_STORAGEHANDLERS > - > > Key: HIVE-28370 > URL: https://issues.apache.org/jira/browse/HIVE-28370 > Project: Hive > Issue Type: Bug >Reporter: Hongdan Zhu >Assignee: Hongdan Zhu >Priority: Major > Labels: pull-request-available > > When HIVE_AUTHORIZATION_TABLES_ON_STORAGEHANDLERS is set on both HS2 and HMS, > only HS2 authorization depends on it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28373) fix-hadoop-catalog based table
[ https://issues.apache.org/jira/browse/HIVE-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28373: -- Labels: pull-request-available (was: ) > fix-hadoop-catalog based table > -- > > Key: HIVE-28373 > URL: https://issues.apache.org/jira/browse/HIVE-28373 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Affects Versions: 4.0.0 >Reporter: yongzhi.shao >Priority: Major > Labels: pull-request-available > > Since there are a lot of problems with hadoop_catalog, we submitted the > following PR to the iceberg community: > [core:Refactor the code of HadoopTableOptions by BsoBird · Pull Request > #10623 · apache/iceberg > (github.com)|https://github.com/apache/iceberg/pull/10623] > With this PR, we can implement atomic operations based on hadoopcatalog. > But this PR was not accepted by the iceberg community. And it seems that the > iceberg community is trying to remove support for hadoopcatalog. > Since hive itself supports a number of features based on the hadoop_catalog > table, can we merge this patch in hive? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28372) No need to update partitions stats when renaming table
[ https://issues.apache.org/jira/browse/HIVE-28372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28372: -- Labels: pull-request-available (was: ) > No need to update partitions stats when renaming table > -- > > Key: HIVE-28372 > URL: https://issues.apache.org/jira/browse/HIVE-28372 > Project: Hive > Issue Type: Improvement >Reporter: Butao Zhang >Assignee: Butao Zhang >Priority: Major > Labels: pull-request-available > > After HIVE-27725, we no longer need to update partition stats when renaming a table. > This change can speed up the partitioned table rename operation when many > partition stats are stored in HMS. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28371) Optimize add partitions authorization in HiveMetaStore
[ https://issues.apache.org/jira/browse/HIVE-28371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28371: -- Labels: pull-request-available (was: ) > Optimize add partitions authorization in HiveMetaStore > -- > > Key: HIVE-28371 > URL: https://issues.apache.org/jira/browse/HIVE-28371 > Project: Hive > Issue Type: Improvement > Components: Standalone Metastore >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > > Currently add_partitions() api sends all the partitions (new partitions and > existing partitions) that need to be added for authorization, instead, we can > optimize this by sending only the new partitions for authorization. > Impact: Alter table recover partitions collects all the available partitions > and sends it to Metastore to check if any new partitions can be added. If all > the partitions are sent for authorization irrespective of whether it exists > or not, the Authorization service will unnecessarily spend time on > authorizing already existing partitions. This can be avoided by only > authorizing new partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
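The optimization described in HIVE-28371 amounts to a set difference before the authorization call. A rough sketch, using hypothetical names rather than the actual HiveMetaStore API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: authorize only partitions that do not already exist,
// so "alter table ... recover partitions" does not re-authorize known ones.
public class PartitionAuthFilter {
    static List<String> partitionsToAuthorize(List<String> candidates,
                                              Set<String> existing) {
        List<String> fresh = new ArrayList<>();
        for (String p : candidates) {
            if (!existing.contains(p)) {
                fresh.add(p); // only genuinely new partitions reach the authorizer
            }
        }
        return fresh;
    }
}
```

For a table with thousands of existing partitions and a handful of new ones, the authorization service then sees only the handful instead of the full candidate list.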
[jira] [Updated] (HIVE-28369) LLAP proactive eviction fails with NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-28369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28369: -- Labels: pull-request-available (was: ) > LLAP proactive eviction fails with NullPointerException > --- > > Key: HIVE-28369 > URL: https://issues.apache.org/jira/browse/HIVE-28369 > Project: Hive > Issue Type: Bug >Reporter: Seonggon Namgung >Assignee: Seonggon Namgung >Priority: Major > Labels: pull-request-available > > When hive.llap.io.encode.enabled is false, LLAP proactive eviction fails with > NullPointerException as follows: > {code:java} > java.lang.NullPointerException: null > at > org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.evictEntity(LlapIoImpl.java:313) > ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.evictEntity(LlapProtocolServerImpl.java:365) > ~[hive-llap-server-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT] > at > org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapManagementProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:33214) > ~[hive-exec-4.1.0-SNAPSHOT.jar:4.1.0-SNAPSHOT] > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484) > ~[hadoop-common-3.3.6.jar:?] > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595) > ~[hadoop-common-3.3.6.jar:?] > ...{code} > > In fact, 3 caches used by LlapIoImpl.evictEntity() may be null or throw > UnsupportedOperationException, so we should check whether it is safe to call > markBuffersForProactiveEviction() or not. -- This message was sent by Atlassian Jira (v8.20.10#820010)
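The fix suggested in the last paragraph of HIVE-28369 is essentially a null/capability guard around each cache. A sketch under assumed names (the real LLAP cache interfaces differ):

```java
// Hypothetical sketch: forward proactive eviction only to caches that exist
// and support it, instead of dereferencing them unconditionally.
public class ProactiveEvictionGuard {
    interface BufferCache {
        long markBuffersForProactiveEviction(String entity);
    }

    static long evictEntity(String entity, BufferCache... caches) {
        long marked = 0;
        for (BufferCache cache : caches) {
            if (cache == null) {
                continue; // e.g. null when hive.llap.io.encode.enabled=false
            }
            try {
                marked += cache.markBuffersForProactiveEviction(entity);
            } catch (UnsupportedOperationException e) {
                // this cache implementation opted out of proactive eviction
            }
        }
        return marked;
    }
}
```

With this shape, a disabled encode path simply contributes nothing to the eviction count rather than raising a NullPointerException through the RPC layer.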
[jira] [Updated] (HIVE-28356) HMS’s Authorizer for the CREATE_TABLE event doesn’t handle HivePrivilegeObjectType.STORAGEHANDLER_URI
[ https://issues.apache.org/jira/browse/HIVE-28356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28356: -- Labels: pull-request-available (was: ) > HMS’s Authorizer for the CREATE_TABLE event doesn’t handle > HivePrivilegeObjectType.STORAGEHANDLER_URI > - > > Key: HIVE-28356 > URL: https://issues.apache.org/jira/browse/HIVE-28356 > Project: Hive > Issue Type: Bug >Reporter: Hongdan Zhu >Assignee: Hongdan Zhu >Priority: Major > Labels: pull-request-available > > HIVE-27322 fixed the authorization of the Iceberg storagehandler through > Ranger policies for HS2, but the same policy enforcement is missing on the > HMS side, allowing the user to use directly the HMS API or simply use > Spark-SQL to create a storagehandler based table without the ranger policies > checked. > From Spark-SQL: > {noformat} > spark.sql("CREATE TABLE default.icespark1 (id int, txt string) USING iceberg > TBLPROPERTIES ('external.table.purge'='true')"){noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28367) Bump org.xerial.snappy:snappy-java from 1.1.10.4 to 1.1.10.5
[ https://issues.apache.org/jira/browse/HIVE-28367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28367: -- Labels: pull-request-available (was: ) > Bump org.xerial.snappy:snappy-java from 1.1.10.4 to 1.1.10.5 > > > Key: HIVE-28367 > URL: https://issues.apache.org/jira/browse/HIVE-28367 > Project: Hive > Issue Type: Bug >Reporter: tanishqchugh >Assignee: tanishqchugh >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28368) Iceberg: Unable to read PARTITIONS Metadata table
[ https://issues.apache.org/jira/browse/HIVE-28368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28368: -- Labels: pull-request-available (was: ) > Iceberg: Unable to read PARTITIONS Metadata table > - > > Key: HIVE-28368 > URL: https://issues.apache.org/jira/browse/HIVE-28368 > Project: Hive > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > > Fails with > {noformat} > Caused by: java.lang.ClassCastException: java.time.LocalDateTime cannot be > cast to java.time.OffsetDateTime > at > org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampWithZoneObjectInspectorHive3.getPrimitiveJavaObject(IcebergTimestampWithZoneObjectInspectorHive3.java:60) > at > org.apache.iceberg.mr.hive.serde.objectinspector.IcebergTimestampWithZoneObjectInspectorHive3.getPrimitiveWritableObject(IcebergTimestampWithZoneObjectInspectorHive3.java:67) > at > org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:313) > > at > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292) > at > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247) > > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28364) Iceberg: Upgrade iceberg version to 1.5.2
[ https://issues.apache.org/jira/browse/HIVE-28364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28364: -- Labels: pull-request-available (was: ) > Iceberg: Upgrade iceberg version to 1.5.2 > - > > Key: HIVE-28364 > URL: https://issues.apache.org/jira/browse/HIVE-28364 > Project: Hive > Issue Type: Task >Reporter: Denys Kuzmenko >Assignee: Denys Kuzmenko >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28350) Drop remote database succeeds but fails while deleting data under
[ https://issues.apache.org/jira/browse/HIVE-28350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28350: -- Labels: pull-request-available (was: ) > Drop remote database succeeds but fails while deleting data under > - > > Key: HIVE-28350 > URL: https://issues.apache.org/jira/browse/HIVE-28350 > Project: Hive > Issue Type: Sub-task > Components: Hive, Standalone Metastore >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > > Drop remote database operation succeeds but fails towards the end while > clearing data under the database's location because while fetching database > object via JDO we don't seem to set the 'locationUri' field. > {code:java} > > drop database pg_hive_tests; > INFO : Compiling > command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86): > drop database pg_hive_tests > INFO : Semantic Analysis Completed (retrial = false) > INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null) > INFO : Completed compiling > command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86); > Time taken: 0.115 seconds > INFO : Executing > command(queryId=hive_20240625161645_bbe11908-8d1c-46d7-9a02-1ef2091e1b86): > drop database pg_hive_tests > INFO : Starting task [Stage-0:DDL] in serial mode > ERROR : Failed > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:java.lang.IllegalArgumentException: Can not create a > Path from a null string) > at org.apache.hadoop.hive.ql.metadata.Hive.dropDatabase(Hive.java:716) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at > org.apache.hadoop.hive.ql.ddl.database.drop.DropDatabaseOperation.execute(DropDatabaseOperation.java:51) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at 
org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:356) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:329) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:813) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:550) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:544) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190) > ~[hive-exec-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235) > ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340) > ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_232] > at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_232] > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > ~[hadoop-common-3.1.1.7.2.18.0-641.jar:?] > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360) > ~[hive-service-3.1.3000.7.2.18.0-641.jar:3.1.3000.7.2.18.0-641] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_232] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_232] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232] > at > java.util.concurrent.ThreadPoolE
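The failure mode described above can be sketched outside of Hive (all names here are hypothetical stand-ins, not Hive's actual API): the database object fetched via JDO comes back without its `locationUri`, and constructing a path from a null location throws, aborting the drop midway. A guard on the missing location illustrates the shape of a fix.

```python
# Hypothetical sketch of the reported failure: `delete_data_under` stands in
# for Hadoop's `new Path(location)` + filesystem delete, which raises when
# the location is null.

def delete_data_under(location):
    if location is None:
        raise ValueError("Can not create a Path from a null string")
    return f"deleted {location}"

def drop_database(db):
    # Guarded variant: only attempt to clear data when a location is known,
    # instead of failing after the metadata drop already succeeded.
    if db.get("locationUri") is not None:
        return delete_data_under(db["locationUri"])
    return "metadata dropped; no locationUri, data left in place"

# A database fetched without its locationUri no longer aborts the drop:
print(drop_database({"name": "pg_hive_tests", "locationUri": None}))
```

The real fix would presumably populate `locationUri` during the JDO fetch rather than skip deletion, but the sketch shows why the null string reaches `Path` at all.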
[jira] [Updated] (HIVE-28349) SHOW TABLES with invalid connector, giving 0 results, instead of failing
[ https://issues.apache.org/jira/browse/HIVE-28349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28349: -- Labels: pull-request-available (was: ) > SHOW TABLES with invalid connector, giving 0 results, instead of failing > > > Key: HIVE-28349 > URL: https://issues.apache.org/jira/browse/HIVE-28349 > Project: Hive > Issue Type: Sub-task > Components: Hive, Standalone Metastore >Reporter: Sai Hemanth Gantasala >Assignee: Sai Hemanth Gantasala >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SHOW TABLES with invalid connector, giving 0 results, instead of failing > Steps to repro: > {code:java} > drop connector postgres_connector; > create connector postgres_connector type 'postgres' url > 'jdbc:postgresql://1.1.1.1:31462' with DCPROPERTIES > ("hive.sql.dbcp.username"="root", "hive.sql.dbcp.password"="cloudera"); > drop database pg_hive_testing; > create remote database pg_hive_testing using postgres_connector with > DBPROPERTIES ("connector.remoteDbName"="postgres"); > show tables in pg_hive_testing; {code} > The last query gives 0 rows (not a failure). -- This message was sent by Atlassian Jira (v8.20.10#820010)
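The fail-fast behavior the report asks for can be sketched in miniature (names and structures are hypothetical, not Hive's metastore API): when the connector backing a remote database is invalid, listing tables should raise rather than silently return an empty result.

```python
# Hypothetical sketch: SHOW TABLES against a remote database should surface
# a connector failure instead of returning 0 rows.

class ConnectorError(Exception):
    """Raised when a remote database's connector cannot be resolved."""

def show_tables(connectors, db):
    connector = connectors.get(db["connector"])
    if connector is None:
        # Behavior per the report today would be `return []` (0 rows, no
        # error); raising makes the misconfiguration visible to the user.
        raise ConnectorError(f"connector {db['connector']!r} is invalid")
    return connector["tables"]
```

With a valid connector the call returns the table list; with a dropped or misconfigured one it fails loudly, matching the expectation in the ticket.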
[jira] [Updated] (HIVE-27829) New command to display current connections on HS2 and HMS instances
[ https://issues.apache.org/jira/browse/HIVE-27829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27829: -- Labels: pull-request-available (was: ) > New command to display current connections on HS2 and HMS instances > --- > > Key: HIVE-27829 > URL: https://issues.apache.org/jira/browse/HIVE-27829 > Project: Hive > Issue Type: New Feature > Components: Hive, HiveServer2, Standalone Metastore >Reporter: Taraka Rama Rao Lethavadla >Assignee: Riju Trivedi >Priority: Major > Labels: pull-request-available > > We would need a command to list current connections to HS2/HMS instances. > It could be like {*}show processlist{*} (MySQL), {*}select * from > pg_stat_activity{*} (PostgreSQL), or {*}show compactions{*} (Hive), letting users see > current connections to the HiveServer2/HMS instances. > This command can help in troubleshooting issues with the Hive service. One can > gauge the load on a given HS2/HMS instance with this command and identify > inappropriate connections in order to terminate them. > > We can even extend this command to show connections between an HMS instance > and the backend database, to troubleshoot issues between HMS and the backend database. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28354) Rename NegativeLlapCliDriver to NegativeLlapCliConfig
[ https://issues.apache.org/jira/browse/HIVE-28354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28354: -- Labels: newbie pull-request-available (was: newbie) > Rename NegativeLlapCliDriver to NegativeLlapCliConfig > - > > Key: HIVE-28354 > URL: https://issues.apache.org/jira/browse/HIVE-28354 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: Zsolt Miskolczi >Priority: Major > Labels: newbie, pull-request-available > Fix For: 4.1.0 > > > https://github.com/apache/hive/blob/74b9c88aced9407351f6635769a4bd48214fca1e/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java#L364 > this is a config (extending an abstract one), not a driver, rename it to > avoid confusion -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28341) Iceberg: Change Major QB Full Table Compaction to compact partition by partition
[ https://issues.apache.org/jira/browse/HIVE-28341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28341: -- Labels: hive iceberg pull-request-available (was: hive iceberg) > Iceberg: Change Major QB Full Table Compaction to compact partition by > partition > > > Key: HIVE-28341 > URL: https://issues.apache.org/jira/browse/HIVE-28341 > Project: Hive > Issue Type: Task > Components: Hive, Iceberg integration >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > Labels: hive, iceberg, pull-request-available > > Currently, Major compaction compacts a whole table in one step. If a table is > partitioned and has a lot of data, this operation can take a long time and it > risks write conflicts at the commit stage. This can be improved to > work partition by partition. Also, for each partition it will create one > snapshot, instead of the 2 snapshots (truncate+IOW) created now when compacting > the whole table in one step. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28363) Improve heuristics of FilterStatsRule without column stats
[ https://issues.apache.org/jira/browse/HIVE-28363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28363: -- Labels: pull-request-available (was: ) > Improve heuristics of FilterStatsRule without column stats > -- > > Key: HIVE-28363 > URL: https://issues.apache.org/jira/browse/HIVE-28363 > Project: Hive > Issue Type: Improvement > Components: Statistics >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: pull-request-available > > HIVE-13287 gave a better estimation of the selectivity of IN operators, > especially when column stats are available. This ticket would try to improve > the case where column stats are unavailable. > > This is an example. The table has ten rows and no column stats on `id`. > {code:java} > 0: jdbc:hive2://hive-hiveserver2:1/defaul> DESCRIBE FORMATTED users id; > ... > ++-+ > | column_property | value | > ++-+ > | col_name | id | > | data_type | int | > | min | | > | max | | > | num_nulls | | > | distinct_count | | > | avg_col_len | | > | max_col_len | | > | num_trues | | > | num_falses | | > | bit_vector | | > | comment | from deserializer | > | COLUMN_STATS_ACCURATE | {\"BASIC_STATS\":\"true\"} | > ++-+{code} > With a single needle, the estimated number becomes 10 * 0.5 = 5 because of > the fallback heuristics. > {code:java} > 0: jdbc:hive2://hive-hiveserver2:1/defaul> EXPLAIN SELECT * FROM users > WHERE id IN (1); > ... > | TableScan | > | alias: users | > | filterExpr: (id = 1) (type: boolean) | > | Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE | > | Filter Operator | > | predicate: (id = 1) (type: boolean) | > | Statistics: Num rows: 5 Data size: 5 Basic stats: > COMPLETE Column stats: NONE | {code} > The size is estimated to be the original size with two or more needles. The > heuristics estimate the size as min(10, 10 * 0.5 * N) = 10. 
However, I > believe users expect to observe some reduction when using IN. > {code:java} > 0: jdbc:hive2://hive-hiveserver2:1/defaul> EXPLAIN SELECT * FROM users > WHERE id IN (1, 2); > | TableScan | > | alias: users | > | filterExpr: (id) IN (1, 2) (type: boolean) | > | Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE | > | Filter Operator | > | predicate: (id) IN (1, 2) (type: boolean) | > | Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE | {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
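The fallback heuristic described above can be sketched directly (a simplified model of the behavior reported here, not Hive's actual `FilterStatsRule` code): each IN needle is assumed to select half of the rows, and the product is capped at the original row count.

```python
# Sketch of the no-column-stats fallback: estimated rows for
# `col IN (v1, ..., vN)` is min(rows, rows * 0.5 * N).

def estimate_in_rows(num_rows, num_needles, selectivity=0.5):
    return min(num_rows, int(num_rows * selectivity * num_needles))

print(estimate_in_rows(10, 1))  # 5  -- matches the single-needle plan above
print(estimate_in_rows(10, 2))  # 10 -- no reduction once N >= 2
```

This makes the complaint concrete: for any two or more needles the cap dominates, so the filter is estimated not to reduce rows at all, which is the behavior the ticket wants to improve.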
[jira] [Updated] (HIVE-28362) Fail to materialize a CTE with VOID
[ https://issues.apache.org/jira/browse/HIVE-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28362: -- Labels: pull-request-available (was: ) > Fail to materialize a CTE with VOID > --- > > Key: HIVE-28362 > URL: https://issues.apache.org/jira/browse/HIVE-28362 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: pull-request-available > > CTE materialization fails when it includes a NULL literal. > {code:java} > set hive.optimize.cte.materialize.full.aggregate.only=false; > set hive.optimize.cte.materialize.threshold=2; > WITH x AS (SELECT null AS null_value) > SELECT * FROM x UNION ALL SELECT * FROM x; {code} > Error message. > {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT > creates a VOID type, please use CAST to specify the type, near field: > null_value > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8344) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8303) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7846) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11598) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11461) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12397) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12263) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:638) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13136) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) > at > 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1062) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2390) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2338) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2340) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2501) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2323) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12978) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13085) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:332) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:508) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
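As the error message itself suggests, a workaround until the bug is fixed is to give the NULL literal a concrete type with CAST (untested sketch; STRING is chosen arbitrarily):

```sql
SET hive.optimize.cte.materialize.full.aggregate.only=false;
SET hive.optimize.cte.materialize.threshold=2;

-- CAST replaces the VOID-typed literal so CTAS-based materialization
-- has a real column type to work with.
WITH x AS (SELECT CAST(null AS STRING) AS null_value)
SELECT * FROM x UNION ALL SELECT * FROM x;
```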
[jira] [Updated] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion
[ https://issues.apache.org/jira/browse/HIVE-28359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-28359: -- Labels: pull-request-available (was: ) > Discard old builds in Jenkins to avoid disk space exhaustion > > > Key: HIVE-28359 > URL: https://issues.apache.org/jira/browse/HIVE-28359 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Attachments: builds.txt > > > Currently Jenkins retains the builds from all active branches/PRs. > {code:bash} > for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls > -l $b | wc -l; done | sort -k2 -rn > builds.txt > {code} > Some PRs (e.g., > [PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/]) > with an excessive number of builds (i.e., 66) can easily consume many GBs of > data (PR-5216 uses 13GB for its builds). The first build for PR-5216 was > saved on April 26, 2024, and it is now more than 2 months old. > For master, we currently have all builds since January 2023 (previous builds > were manually removed as part of HIVE-28013). The builds for master currently occupy > 50GB of space. > Due to the above, the disk space (persistent volume) cannot be reclaimed and > it is currently almost full (91% /var/jenkins_home). 
> {noformat} > kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df" > Filesystem 1K-blocks Used Available Use% Mounted on > overlay 98831908 4675004 94140520 5% / > tmpfs 65536 0 65536 0% /dev > tmpfs 6645236 0 6645236 0% /sys/fs/cgroup > /dev/sdb 308521792 278996208 29509200 91% /var/jenkins_home > /dev/sda1 98831908 4675004 94140520 5% /etc/hosts > shm 65536 0 65536 0% /dev/shm > tmpfs 10801128 12 10801116 1% > /run/secrets/kubernetes.io/serviceaccount > tmpfs 6645236 0 6645236 0% /proc/acpi > tmpfs 6645236 0 6645236 0% /proc/scsi > tmpfs 6645236 0 6645236 0% /sys/firmware > {noformat} > Without a discard policy in place we are going to hit HIVE-28013 again, or > other disk-related issues, pretty soon. -- This message was sent by Atlassian Jira (v8.20.10#820010)
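A discard policy of the kind proposed here is typically expressed in a declarative Jenkinsfile via `buildDiscarder`/`logRotator` (a sketch only; the retention numbers below are illustrative, not what the ticket settles on):

```groovy
pipeline {
    agent any
    options {
        // Keep only recent builds so per-branch/PR disk usage stays bounded;
        // numToKeepStr limits build records, artifactNumToKeepStr limits
        // builds whose artifacts are retained.
        buildDiscarder(logRotator(numToKeepStr: '10', artifactNumToKeepStr: '5'))
    }
    stages {
        stage('Build') {
            steps { echo 'build steps go here' }
        }
    }
}
```

For multibranch jobs the same effect can be configured on the job/organization folder, so stale PR branches stop accumulating builds like the 66 noted above for PR-5216.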