[jira] [Updated] (HIVE-27428) CTAS fails with SemanticException when join subquery has complex type column and false filter predicate
[ https://issues.apache.org/jira/browse/HIVE-27428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naresh P R updated HIVE-27428: -- Description: Repro steps: {code:java} drop table if exists table1; drop table if exists table2; create table table1 (a string, b string); create table table2 (complex_column array<struct<values:array<string>>>); -- CTAS failing query create table table3 as with t1 as (select * from table1), t2 as (select * from table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code} Exception: {code:java} Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: t2.complex_column at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) at 
org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194) {code} was: Repro steps: {code:java} drop table if exists table1; drop table if exists table2; create table table1 (a string, b string); create table table2 (complex_column array<struct<values:array<string>>>); -- CTAS failing query create table table3 as with t1 as (select * from table1), t2 as (select * from table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code} Exception: {code:java} Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: t2.df0rrd_prod_wers_x at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300) at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at 
org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194) {code} > CTAS fails with SemanticException when join subquery has complex type column > and false filter predicate > --- > > Key: HIVE-27428 > URL: https://issues.apache.org/jira/browse/HIVE-27428 > Project: Hive > Issue Type: Bug >Reporter: Naresh P R >Priority: Major > > Repro steps: > {code:java} > drop table if exists table1; > drop table if exists table2; > create table table1 (a string, b string); > create table table2
[jira] [Created] (HIVE-27428) CTAS fails with SemanticException when join subquery has complex type column and false filter predicate
Naresh P R created HIVE-27428: - Summary: CTAS fails with SemanticException when join subquery has complex type column and false filter predicate Key: HIVE-27428 URL: https://issues.apache.org/jira/browse/HIVE-27428 Project: Hive Issue Type: Bug Reporter: Naresh P R Repro steps: {code:java} drop table if exists table1; drop table if exists table2; create table table1 (a string, b string); create table table2 (complex_column array<struct<values:array<string>>>); -- CTAS failing query create table table3 as with t1 as (select * from table1), t2 as (select * from table2 where 1=0) select t1.*, t2.* from t1 left join t2;{code} Exception: {code:java} Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT creates a VOID type, please use CAST to specify the type, near field: t2.df0rrd_prod_wers_x at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8171) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8129) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7822) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11248) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11120) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12050) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11916) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12730) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:722) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12831) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:442) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:300) at 
org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:194) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
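A possible workaround for the repro above, until the bug is fixed, is the one the error message itself suggests: CAST the void-typed columns coming from the always-empty side of the join. A sketch, not verified against a cluster; the complex type mirrors the repro schema as reconstructed above:

{code:java}
create table table3 as
with t1 as (select * from table1),
     t2 as (select * from table2 where 1=0)
select t1.*,
       cast(null as array<struct<values:array<string>>>) as complex_column
from t1 left join t2;
{code}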
[jira] [Work started] (HIVE-27427) Automatic rerunning of failed tests in Hive Pre-commit job
[ https://issues.apache.org/jira/browse/HIVE-27427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-27427 started by Dmitriy Fingerman. > Automatic rerunning of failed tests in Hive Pre-commit job > -- > > Key: HIVE-27427 > URL: https://issues.apache.org/jira/browse/HIVE-27427 > Project: Hive > Issue Type: Improvement >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > > It often happens that Hive unit tests fail during pre-commit which requires > rerunning the whole pre-commit job and creates hours of delays. Maven has the > ability to rerun failed tests. There is the following property in > maven-surefire-plugin which can be used for that: > {code:java} > rerunFailingTestsCount{code} > * [Dev mail > discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] > * [Rerun Failing > Tests|https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27427) Automatic rerunning of failed tests in Hive Pre-commit job
[ https://issues.apache.org/jira/browse/HIVE-27427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Fingerman updated HIVE-27427: - Description: It often happens that Hive unit tests fail during pre-commit which requires rerunning the whole pre-commit job and creates hours of delays. Maven has the ability to rerun failed tests. There is the following property in maven-surefire-plugin which can be used for that: {code:java} rerunFailingTestsCount{code} [Dev mail discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] [Rerun Failing Tests|https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html] was: It often happens that Hive unit tests fail during pre-commit which requires rerunning the whole pre-commit job and creates hours of delays. Maven has the ability to rerun failed tests. There is the following property in maven-surefire-plugin which can be used for that: {code:java} rerunFailingTestsCount{code} [Dev mail discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] [Rerun Failing Tests|[http://example.com|https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html]] > Automatic rerunning of failed tests in Hive Pre-commit job > -- > > Key: HIVE-27427 > URL: https://issues.apache.org/jira/browse/HIVE-27427 > Project: Hive > Issue Type: Improvement >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > > It often happens that Hive unit tests fail during pre-commit which requires > rerunning the whole pre-commit job and creates hours of delays. Maven has the > ability to rerun failed tests. 
There is the following property in > maven-surefire-plugin which can be used for that: > {code:java} > rerunFailingTestsCount{code} > > [Dev mail > discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] > [Rerun Failing > Tests|https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27427) Automatic rerunning of failed tests in Hive Pre-commit job
[ https://issues.apache.org/jira/browse/HIVE-27427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Fingerman updated HIVE-27427: - Description: It often happens that Hive unit tests fail during pre-commit which requires rerunning the whole pre-commit job and creates hours of delays. Maven has the ability to rerun failed tests. There is the following property in maven-surefire-plugin which can be used for that: {code:java} rerunFailingTestsCount{code} * [Dev mail discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] * [Rerun Failing Tests|https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html] was: It often happens that Hive unit tests fail during pre-commit which requires rerunning the whole pre-commit job and creates hours of delays. Maven has the ability to rerun failed tests. There is the following property in maven-surefire-plugin which can be used for that: {code:java} rerunFailingTestsCount{code} [Dev mail discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] [Rerun Failing Tests|https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html] > Automatic rerunning of failed tests in Hive Pre-commit job > -- > > Key: HIVE-27427 > URL: https://issues.apache.org/jira/browse/HIVE-27427 > Project: Hive > Issue Type: Improvement >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > > It often happens that Hive unit tests fail during pre-commit which requires > rerunning the whole pre-commit job and creates hours of delays. Maven has the > ability to rerun failed tests. 
There is the following property in > maven-surefire-plugin which can be used for that: > {code:java} > rerunFailingTestsCount{code} > * [Dev mail > discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] > * [Rerun Failing > Tests|https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27427) Automatic rerunning of failed tests in Hive Pre-commit job
[ https://issues.apache.org/jira/browse/HIVE-27427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Fingerman updated HIVE-27427: - Description: It often happens that Hive unit tests fail during pre-commit which requires rerunning the whole pre-commit job and creates hours of delays. Maven has the ability to rerun failed tests. There is the following property in maven-surefire-plugin which can be used for that: {code:java} rerunFailingTestsCount{code} [Dev mail discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] [Rerun Failing Tests|[http://example.com|https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html]] was: It often happens that Hive unit tests fail during pre-commit which requires rerunning the whole pre-commit job and creates hours of delays. What if we set Maven config to retry failed tests automatically X times? There is "rerunFailingTestsCount" property in maven-surefire-plugin which can be used for that. I would like to hear the feedback and if it is positive I could open a JIRA ticket and work on it. [Dev mail discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] > Automatic rerunning of failed tests in Hive Pre-commit job > -- > > Key: HIVE-27427 > URL: https://issues.apache.org/jira/browse/HIVE-27427 > Project: Hive > Issue Type: Improvement >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > > It often happens that Hive unit tests fail during pre-commit which requires > rerunning the whole pre-commit job and creates hours of delays. Maven has the > ability to rerun failed tests. 
There is the following property in > maven-surefire-plugin which can be used for that: > {code:java} > rerunFailingTestsCount{code} > > [Dev mail > discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] > [Rerun Failing > Tests|[http://example.com|https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html]] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27427) Automatic rerunning of failed tests in Hive Pre-commit job
[ https://issues.apache.org/jira/browse/HIVE-27427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Fingerman updated HIVE-27427: - Description: It often happens that Hive unit tests fail during pre-commit which requires rerunning the whole pre-commit job and creates hours of delays. What if we set Maven config to retry failed tests automatically X times? There is "rerunFailingTestsCount" property in maven-surefire-plugin which can be used for that. I would like to hear the feedback and if it is positive I could open a JIRA ticket and work on it. [Dev mail discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] was: It often happens that Hive unit tests fail during pre-commit which requires rerunning the whole pre-commit job and creates hours of delays. What if we set Maven config to retry failed tests automatically X times? There is "rerunFailingTestsCount" property in maven-surefire-plugin which can be used for that. I would like to hear the feedback and if it is positive I could open a JIRA ticket and work on it. [Dev mail discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] > Automatic rerunning of failed tests in Hive Pre-commit job > -- > > Key: HIVE-27427 > URL: https://issues.apache.org/jira/browse/HIVE-27427 > Project: Hive > Issue Type: Improvement >Reporter: Dmitriy Fingerman >Assignee: Dmitriy Fingerman >Priority: Major > > It often happens that Hive unit tests fail during pre-commit which requires > rerunning the whole pre-commit job and creates hours of delays. What if we > set Maven config to retry failed tests automatically X times? There is > "rerunFailingTestsCount" property in maven-surefire-plugin which can be used > for that. I would like to hear the feedback and if it is positive I could > open a JIRA ticket and work on it. 
> [Dev mail > discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27427) Automatic rerunning of failed tests in Hive Pre-commit job
Dmitriy Fingerman created HIVE-27427: Summary: Automatic rerunning of failed tests in Hive Pre-commit job Key: HIVE-27427 URL: https://issues.apache.org/jira/browse/HIVE-27427 Project: Hive Issue Type: Improvement Reporter: Dmitriy Fingerman Assignee: Dmitriy Fingerman It often happens that Hive unit tests fail during pre-commit which requires rerunning the whole pre-commit job and creates hours of delays. What if we set Maven config to retry failed tests automatically X times? There is "rerunFailingTestsCount" property in maven-surefire-plugin which can be used for that. I would like to hear the feedback and if it is positive I could open a JIRA ticket and work on it. [Dev mail discussion|https://lists.apache.org/thread/3vfw9b7wc35vr17zjzk1pq2jrgtkdvrq] -- This message was sent by Atlassian Jira (v8.20.10#820010)
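For reference, the surefire property discussed above can be wired in either via the plugin configuration or from the command line; a minimal sketch (the count of 2 and the plugin version handling are illustrative):

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <rerunFailingTestsCount>2</rerunFailingTestsCount>
  </configuration>
</plugin>
{code}

or, without touching the pom: {code:java} mvn test -Dsurefire.rerunFailingTestsCount=2 {code} Tests that fail and then pass on a rerun are reported by surefire as flakes rather than failures.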
[jira] [Resolved] (HIVE-27293) Vectorization: Incorrect results with nvl for ORC table
[ https://issues.apache.org/jira/browse/HIVE-27293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramesh Kumar Thangarajan resolved HIVE-27293. - Resolution: Fixed > Vectorization: Incorrect results with nvl for ORC table > --- > > Key: HIVE-27293 > URL: https://issues.apache.org/jira/browse/HIVE-27293 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0-alpha-2 >Reporter: Riju Trivedi >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > Attachments: esource.txt, vectorization_nvl.q > > > Attached repro.q file and data file used to reproduce the issue. > {code:java} > Insert overwrite table etarget > select mt.*, floor(rand() * 1) as bdata_no from (select nvl(np.client_id,' > '),nvl(np.id_enddate,cast(0 as decimal(10,0))),nvl(np.client_gender,' > '),nvl(np.birthday,cast(0 as decimal(10,0))),nvl(np.nationality,' > '),nvl(np.address_zipcode,' '),nvl(np.income,cast(0 as > decimal(15,2))),nvl(np.address,' '),nvl(np.part_date,cast(0 as int)) from > (select * from esource where part_date = 20230414) np) mt; > {code} > Outcome: > {code:java} > select client_id,birthday,income from etarget; > 15678 0 0.00 > 67891 19313 -1.00 > 12345 0 0.00{code} > Expected Result : > {code:java} > select client_id,birthday,income from etarget; > 12345 19613 -1.00 > 67891 19313 -1.00 > 15678 0 0.00{code} > Disabling hive.vectorized.use.vectorized.input.format produces correct output. -- This message was sent by Atlassian Jira (v8.20.10#820010)
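The workaround noted at the end of the description is a session-level setting; a sketch of how one would verify it (assuming the repro tables from the attachment exist):

{code:java}
-- workaround from the report: bypass the vectorized input format path
set hive.vectorized.use.vectorized.input.format=false;
select client_id, birthday, income from etarget;
{code}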
[jira] [Updated] (HIVE-27425) Upgrade Nimbus-JOSE-JWT to 9.24+ due to CVEs coming from json-smart
[ https://issues.apache.org/jira/browse/HIVE-27425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27425: -- Labels: pull-request-available (was: ) > Upgrade Nimbus-JOSE-JWT to 9.24+ due to CVEs coming from json-smart > --- > > Key: HIVE-27425 > URL: https://issues.apache.org/jira/browse/HIVE-27425 > Project: Hive > Issue Type: Task >Reporter: Devaspati Krishnatri >Assignee: Devaspati Krishnatri >Priority: Major > Labels: pull-request-available > > Nimbus-JOSE-JWT before 9.24 is using the vulnerable version of json-smart. > nimbus-jose-jwt has dropped the json-smart dependency completely with > nimbus-jose-jwt 9.24 and replaces it with *Gson 2.9.1 (shaded),* as seen in > the commit history here: > [https://bitbucket.org/connect2id/nimbus-jose-jwt/commits/tag/9.24]. > Json-smart before 2.4.9 is affected by CVE-2023-1370 > CVE-2023-1370 - [Json-smart]([https://netplex.github.io/json-smart/]) is a > performance focused, JSON processor lib. When reaching a '[' or '{' character > in the JSON input, the code parses an array or an object respectively. It was > discovered that the code does not have any limit to the nesting of such > arrays or objects. Since the parsing of nested arrays and objects is done > recursively, nesting too many of them can cause a stack exhaustion (stack > overflow) and crash the software. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27418) UNION ALL + ORDER BY ordinal works incorrectly for all const queries
[ https://issues.apache.org/jira/browse/HIVE-27418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730898#comment-17730898 ] zhangbutao commented on HIVE-27418: --- Hi [~csringhofer] , could you provide more info about your hive cluster env? e.g. Hive & Hadoop version. And what execution engine did you use for the test? Tez? or MR? > UNION ALL + ORDER BY ordinal works incorrectly for all const queries > > > Key: HIVE-27418 > URL: https://issues.apache.org/jira/browse/HIVE-27418 > Project: Hive > Issue Type: Bug >Reporter: Csaba Ringhofer >Priority: Major > > For the following query I get results in wrong order: > SELECT '1', 'b' UNION ALL SELECT '2', 'a' ORDER BY 2; > +--+--+ > | _c0 | _c1 | > +--+--+ > | 1| b| > | 2| a| > +--+--+ > I get correct results if: > - the column has an alias > - the same rows come from tables > - the UNION ALL part of the query is in a sub-query and ORDER BY is run on > the sub-query > Checked with PostgreSQL and Apache Impala and they apply ORDER BY correctly. > (also noted that the ordinal after ORDER BY is not checked, so it could be 20 > and Hive doesn't complain) -- This message was sent by Atlassian Jira (v8.20.10#820010)
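The sub-query workaround described in the report would look like this (a sketch; the aliases c0/c1 are illustrative):

{code:java}
select * from (
  select '1' as c0, 'b' as c1
  union all
  select '2', 'a'
) t
order by 2;
{code}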
[jira] [Commented] (HIVE-27332) Add retry backoff mechanism for abort cleanup
[ https://issues.apache.org/jira/browse/HIVE-27332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730892#comment-17730892 ] Sourabh Badhya commented on HIVE-27332: --- Thanks [~veghlaci05] and [~dkuzmenko] for the reviews. > Add retry backoff mechanism for abort cleanup > - > > Key: HIVE-27332 > URL: https://issues.apache.org/jira/browse/HIVE-27332 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > HIVE-27019 and HIVE-27020 added the functionality to directly clean data > directories from aborted transactions without using Initiator & Worker. > However, during the event of continuous failure during cleanup, the retry > mechanism is initiated every single time. We need to add a retry backoff > mechanism to control the time required to initiate retry again and not > continuously retry. > There are broadly 3 cases wherein retry due to abort cleanup is impacted - > *1. Abort cleanup on the table failed + Compaction on the table failed.* > *2. Abort cleanup on the table failed + Compaction on the table passed* > *3. Abort cleanup on the table failed + No compaction on the table.* > *Solution -* > *We reuse COMPACTION_QUEUE table to store the retry metadata -* > *Advantage: Most of the fields with respect to retry are present in > COMPACTION_QUEUE. Hence we can use the same for storing retry metadata. A > compaction type called ABORT_CLEANUP ('c') is introduced. The CQ_STATE will > remain ready for cleaning for such records.* > *Actions performed by TaskHandler in the case of failure -* > *AbortTxnCleaner -* > Action: Just add retry details in the queue table during the abort failure. > *CompactionCleaner -* > Action: If compaction on the same table is successful, delete the retry entry > in markCleaned when removing any TXN_COMPONENTS entries except when there are > no uncompacted aborts. 
We do not want to be in a situation where there is a > queue entry for a table but there is no record in TXN_COMPONENTS associated > with the same table. > *Advantage: Expecting no performance issues with this approach, since we > delete 1 record most of the time for the associated table/partition.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27332) Add retry backoff mechanism for abort cleanup
[ https://issues.apache.org/jira/browse/HIVE-27332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sourabh Badhya resolved HIVE-27332. --- Fix Version/s: 4.0.0 Resolution: Fixed > Add retry backoff mechanism for abort cleanup > - > > Key: HIVE-27332 > URL: https://issues.apache.org/jira/browse/HIVE-27332 > Project: Hive > Issue Type: Sub-task >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > HIVE-27019 and HIVE-27020 added the functionality to directly clean data > directories from aborted transactions without using Initiator & Worker. > However, during the event of continuous failure during cleanup, the retry > mechanism is initiated every single time. We need to add a retry backoff > mechanism to control the time required to initiate retry again and not > continuously retry. > There are broadly 3 cases wherein retry due to abort cleanup is impacted - > *1. Abort cleanup on the table failed + Compaction on the table failed.* > *2. Abort cleanup on the table failed + Compaction on the table passed* > *3. Abort cleanup on the table failed + No compaction on the table.* > *Solution -* > *We reuse COMPACTION_QUEUE table to store the retry metadata -* > *Advantage: Most of the fields with respect to retry are present in > COMPACTION_QUEUE. Hence we can use the same for storing retry metadata. A > compaction type called ABORT_CLEANUP ('c') is introduced. The CQ_STATE will > remain ready for cleaning for such records.* > *Actions performed by TaskHandler in the case of failure -* > *AbortTxnCleaner -* > Action: Just add retry details in the queue table during the abort failure. > *CompactionCleaner -* > Action: If compaction on the same table is successful, delete the retry entry > in markCleaned when removing any TXN_COMPONENTS entries except when there are > no uncompacted aborts. 
We do not want to be in a situation where there is a > queue entry for a table but there is no record in TXN_COMPONENTS associated > with the same table. > *Advantage: Expecting no performance issues with this approach, since we > delete 1 record most of the time for the associated table/partition.* -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27018) Move aborted transaction cleanup outside compaction process
[ https://issues.apache.org/jira/browse/HIVE-27018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sourabh Badhya resolved HIVE-27018. --- Fix Version/s: 4.0.0 Resolution: Fixed > Move aborted transaction cleanup outside compaction process > > > Key: HIVE-27018 > URL: https://issues.apache.org/jira/browse/HIVE-27018 > Project: Hive > Issue Type: Improvement >Reporter: Sourabh Badhya >Assignee: Sourabh Badhya >Priority: Major > Fix For: 4.0.0 > > > Aborted transactions processing is tightly integrated into the compaction > pipeline and consists of 3 main stages: Initiator, Compactor (Worker), > Cleaner. That could be simplified by doing all work on the Cleaner side. > *Potential Benefits -* > There are major advantages of implementing this on the cleaner side - > 1) Currently an aborted txn in the TXNS table blocks the cleaning of > TXN_TO_WRITE_ID table since nothing gets cleaned above MIN(aborted txnid) in > the current implementation. After implementing this on the cleaner side, the > cleaner regularly checks and cleans the aborted records in the TXN_COMPONENTS > table, which in turn makes the AcidTxnCleanerService clean the aborted txns > in TXNS table. > 2) Initiator and worker do not do anything on tables which contain only > aborted directories. It's the cleaner which removes the aborted directories > of the table. Hence all operations associated with the initiator and worker > for these tables are wasteful. These wasteful operations are avoided. > 3) DP writes which are aborted are skipped by the worker currently. Hence > once again the cleaner is the one deleting the aborted directories. All > operations associated with the initiator and worker for this entry are > wasteful. These wasteful operations are avoided. 
> *Proposed solution -* > *Implement logic to handle aborted transactions exclusively in Cleaner.* > Implement logic to fetch the TXN_COMPONENTS which are associated with > transactions in aborted state and send the required information to the > cleaner. Cleaner must clean up the aborted deltas/delete deltas by using the > aborted directories in the AcidState of the table/partition. > It is also better to separate entities which provide information of > compaction and abort cleanup to enhance code modularity. This can be done in > this way - > Cleaner can be divided into separate entities like - > *1) Handler* - This entity fetches the data from the metastore DB from > relevant tables and converts it into a request entity called CleaningRequest. > It would also do SQL operations post cleanup (postprocess). Every type of > cleaning request is provided by a separate handler. > *2) Filesystem remover* - This entity fetches the cleaning requests from > various handlers and deletes them according to the cleaning request. > *This division allows for dynamic extensibility of cleanup from multiple > handlers. Every handler is responsible for providing cleaning requests from a > specific source.* > The following solution is resilient i.e. in the event of abrupt metastore > shutdown, the cleaner can still see the relevant entries in the metastore DB > and retry the cleaning task for that entry. -- This message was sent by Atlassian Jira (v8.20.10#820010)
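The handler/remover division described above can be pictured with a small sketch. All names here (CleaningRequest, Handler, clean) are illustrative stand-ins for the ticket's design, not Hive's actual classes:

```java
import java.util.List;

public class CleanerSketch {
    // A cleaning request produced by a handler: obsolete paths for one table/partition.
    record CleaningRequest(String tableName, List<String> obsoleteDirs) {}

    // Each handler turns metastore state (compaction queue, aborted TXN_COMPONENTS, ...)
    // into cleaning requests, and runs its own post-cleanup bookkeeping.
    interface Handler {
        List<CleaningRequest> findReadyToClean();
        void postprocess(CleaningRequest done);
    }

    // The filesystem remover is handler-agnostic: it just executes requests.
    static int clean(List<Handler> handlers) {
        int removed = 0;
        for (Handler h : handlers) {
            for (CleaningRequest req : h.findReadyToClean()) {
                removed += req.obsoleteDirs().size(); // stand-in for actual FS deletes
                h.postprocess(req);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // A toy abort-cleanup handler offering one aborted delta directory.
        Handler abortHandler = new Handler() {
            public List<CleaningRequest> findReadyToClean() {
                return List.of(new CleaningRequest("t1", List.of("/warehouse/t1/delta_1_1")));
            }
            public void postprocess(CleaningRequest done) { /* delete TXN_COMPONENTS rows */ }
        };
        System.out.println(clean(List.of(abortHandler))); // prints 1
    }
}
```

New handlers (e.g. one for aborted transactions) then plug into the same remover without touching the deletion logic, which is the extensibility point the ticket argues for.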
[jira] [Updated] (HIVE-27426) Upgrade kryo version in iceberg module
[ https://issues.apache.org/jira/browse/HIVE-27426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27426: -- Labels: pull-request-available (was: ) > Upgrade kryo version in iceberg module > -- > > Key: HIVE-27426 > URL: https://issues.apache.org/jira/browse/HIVE-27426 > Project: Hive > Issue Type: Task >Reporter: Devaspati Krishnatri >Assignee: Devaspati Krishnatri >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27426) Upgrade kryo version in iceberg module
Devaspati Krishnatri created HIVE-27426: --- Summary: Upgrade kryo version in iceberg module Key: HIVE-27426 URL: https://issues.apache.org/jira/browse/HIVE-27426 Project: Hive Issue Type: Task Reporter: Devaspati Krishnatri Assignee: Devaspati Krishnatri -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27340) ThreadPool in HS2 over HTTP should respect the customized ThreadFactory
[ https://issues.apache.org/jira/browse/HIVE-27340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-27340: -- Labels: pull-request-available (was: ) > ThreadPool in HS2 over HTTP should respect the customized ThreadFactory > --- > > Key: HIVE-27340 > URL: https://issues.apache.org/jira/browse/HIVE-27340 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Zhihua Deng >Assignee: Zhihua Deng >Priority: Minor > Labels: pull-request-available > > In Jetty, ExecutorThreadPool will override the ThreadFactory of > ThreadPoolExecutor even though the ThreadPoolExecutor has already initialized > the ThreadFactory, > {code:java} > _executor.setThreadFactory(this::newThread); {code} > Need to ignore such action as we have injected a ThreadFactory into the > ThreadPoolExecutor. -- This message was sent by Atlassian Jira (v8.20.10#820010)
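For context on why the override matters: a plain JDK ThreadPoolExecutor keeps whatever ThreadFactory it was constructed with, so worker threads carry the injected names, daemon flags, etc.; the bug is that Jetty's ExecutorThreadPool replaces that factory. A minimal JDK-only sketch (the "hs2-http" prefix is illustrative, not HS2's real naming):

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedFactoryDemo {
    // Build a pool with an injected, naming ThreadFactory.
    static ThreadPoolExecutor newPool() {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory named = r -> {
            Thread t = new Thread(r, "hs2-http-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), named);
    }

    // The executor asks the injected factory for its worker thread,
    // so the task observes the custom name.
    static String firstWorkerName() {
        ThreadPoolExecutor pool = newPool();
        try {
            return pool.submit(() -> Thread.currentThread().getName()).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstWorkerName()); // prints hs2-http-1
    }
}
```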
[jira] [Updated] (HIVE-27425) Upgrade Nimbus-JOSE-JWT to 9.24+ due to CVEs coming from json-smart
[ https://issues.apache.org/jira/browse/HIVE-27425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaspati Krishnatri updated HIVE-27425:
----------------------------------------
Summary: Upgrade Nimbus-JOSE-JWT to 9.24+ due to CVEs coming from json-smart  (was: Upgrade Nimbus-JOSE-JWT to 9.24 due to CVEs coming from json-smart)

> Upgrade Nimbus-JOSE-JWT to 9.24+ due to CVEs coming from json-smart
> -------------------------------------------------------------------
>
> Key: HIVE-27425
> URL: https://issues.apache.org/jira/browse/HIVE-27425
> Project: Hive
> Issue Type: Task
> Reporter: Devaspati Krishnatri
> Assignee: Devaspati Krishnatri
> Priority: Major
>
> Nimbus-JOSE-JWT before 9.24 uses a vulnerable version of json-smart.
> nimbus-jose-jwt dropped the json-smart dependency completely in 9.24 and
> replaced it with *Gson 2.9.1 (shaded)*, as seen in the commit history here:
> [https://bitbucket.org/connect2id/nimbus-jose-jwt/commits/tag/9.24].
> Json-smart before 2.4.9 is affected by CVE-2023-1370.
> CVE-2023-1370: [Json-smart|https://netplex.github.io/json-smart/] is a
> performance-focused JSON processor lib. When reaching a '[' or '{' character
> in the JSON input, the code parses an array or an object respectively. It
> was discovered that the code does not place any limit on the nesting of
> such arrays or objects. Since nested arrays and objects are parsed
> recursively, nesting too many of them can cause stack exhaustion (stack
> overflow) and crash the software.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27425) Upgrade Nimbus-JOSE-JWT to 9.24 due to CVEs coming from json-smart
Devaspati Krishnatri created HIVE-27425:
----------------------------------------

Summary: Upgrade Nimbus-JOSE-JWT to 9.24 due to CVEs coming from json-smart
Key: HIVE-27425
URL: https://issues.apache.org/jira/browse/HIVE-27425
Project: Hive
Issue Type: Task
Reporter: Devaspati Krishnatri
Assignee: Devaspati Krishnatri

Nimbus-JOSE-JWT before 9.24 uses a vulnerable version of json-smart. nimbus-jose-jwt dropped the json-smart dependency completely in 9.24 and replaced it with *Gson 2.9.1 (shaded)*, as seen in the commit history here: [https://bitbucket.org/connect2id/nimbus-jose-jwt/commits/tag/9.24].

Json-smart before 2.4.9 is affected by CVE-2023-1370.

CVE-2023-1370: [Json-smart|https://netplex.github.io/json-smart/] is a performance-focused JSON processor lib. When reaching a '[' or '{' character in the JSON input, the code parses an array or an object respectively. It was discovered that the code does not place any limit on the nesting of such arrays or objects. Since nested arrays and objects are parsed recursively, nesting too many of them can cause stack exhaustion (stack overflow) and crash the software.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
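The recursion issue behind CVE-2023-1370, and the depth-limit style of mitigation that json-smart adopted in 2.4.9, can be illustrated with a toy bracket parser. This is not json-smart's code; the class name, method shape, and the limit value are all arbitrary choices for the sketch.

```java
// Illustrative only: a toy recursive descent over '[' nesting. A depth cap
// turns the would-be StackOverflowError into a controlled exception, which
// is the general shape of json-smart 2.4.9's fix (limit value is arbitrary).
final class NestingCheck {
    static final int MAX_DEPTH = 400;

    // Parses from `pos` (just past an opening '[') and returns the index
    // just past the matching ']'. Without the depth check, input like
    // "[[[[..." drives the recursion off the stack.
    static int parseArray(String s, int pos, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalArgumentException("nesting exceeds " + MAX_DEPTH);
        }
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c == '[') {
                pos = parseArray(s, pos + 1, depth + 1); // recurse into child
            } else if (c == ']') {
                return pos + 1; // end of this array
            } else {
                pos++; // skip scalars/commas for this sketch
            }
        }
        throw new IllegalArgumentException("unterminated array");
    }
}
```

An attacker-controlled input only needs to be long, not clever: tens of thousands of consecutive `[` characters are enough to exhaust a default JVM thread stack in an unbounded recursive parser.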
[jira] [Assigned] (HIVE-27424) Add mvn dependency:tree run in github actions
[ https://issues.apache.org/jira/browse/HIVE-27424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshat Mathur reassigned HIVE-27424:
------------------------------------
Assignee: Akshat Mathur

> Add mvn dependency:tree run in github actions
> ---------------------------------------------
>
> Key: HIVE-27424
> URL: https://issues.apache.org/jira/browse/HIVE-27424
> Project: Hive
> Issue Type: Improvement
> Reporter: Akshat Mathur
> Assignee: Akshat Mathur
> Priority: Major
>
> From the discussion on [#4396|https://github.com/apache/hive/pull/4396]
> Run mvn dependency:tree in github actions

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27424) Add mvn dependency:tree run in github actions
[ https://issues.apache.org/jira/browse/HIVE-27424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshat Mathur updated HIVE-27424:
---------------------------------
Description:
From the discussion on [#4396|https://github.com/apache/hive/pull/4396]
Run mvn dependency:tree in github actions

was:
From the discussion on [https://github.com/apache/hive/pull/4396|#4396]
Run mvn dependency:tree in github actions

> Add mvn dependency:tree run in github actions
> ---------------------------------------------
>
> Key: HIVE-27424
> URL: https://issues.apache.org/jira/browse/HIVE-27424
> Project: Hive
> Issue Type: Improvement
> Reporter: Akshat Mathur
> Priority: Major
>
> From the discussion on [#4396|https://github.com/apache/hive/pull/4396]
> Run mvn dependency:tree in github actions

--
This message was sent by Atlassian Jira (v8.20.10#820010)
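A GitHub Actions step for this could look roughly like the fragment below. This is a hypothetical sketch, not the workflow from the actual PR: the step names, output file, and artifact-upload choice are all assumptions.

```yaml
# Hypothetical workflow fragment; step names and file paths are assumptions.
- name: Print Maven dependency tree
  run: mvn -B dependency:tree -DoutputFile=dep-tree.txt -DappendOutput=true

- name: Upload dependency tree
  uses: actions/upload-artifact@v3
  with:
    name: dependency-tree
    path: dep-tree.txt
```

Writing the tree to a file and uploading it as an artifact keeps the (very long) output out of the job log while still making it available for dependency-conflict reviews like the one on #4396.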
[jira] [Updated] (HIVE-27424) Add mvn dependency:tree run in github actions
[ https://issues.apache.org/jira/browse/HIVE-27424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshat Mathur updated HIVE-27424:
---------------------------------
Description:
From the discussion on [https://github.com/apache/hive/pull/4396|#4396]
Run mvn dependency:tree in github actions

was:
From the discussion on [#https://github.com/apache/hive/pull/4396]
Run mvn dependency:tree in github actions

> Add mvn dependency:tree run in github actions
> ---------------------------------------------
>
> Key: HIVE-27424
> URL: https://issues.apache.org/jira/browse/HIVE-27424
> Project: Hive
> Issue Type: Improvement
> Reporter: Akshat Mathur
> Priority: Major
>
> From the discussion on [https://github.com/apache/hive/pull/4396|#4396]
> Run mvn dependency:tree in github actions

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-27424) Add mvn dependency:tree run in github actions
Akshat Mathur created HIVE-27424:
---------------------------------

Summary: Add mvn dependency:tree run in github actions
Key: HIVE-27424
URL: https://issues.apache.org/jira/browse/HIVE-27424
Project: Hive
Issue Type: Improvement
Reporter: Akshat Mathur

From the discussion on [#https://github.com/apache/hive/pull/4396]

Run mvn dependency:tree in github actions

--
This message was sent by Atlassian Jira (v8.20.10#820010)