[jira] [Updated] (HIVE-26166) Make website GDPR compliant and enable matomo analytics
[ https://issues.apache.org/jira/browse/HIVE-26166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-26166: --- Summary: Make website GDPR compliant and enable matomo analytics (was: Make website GDPR compliant) > Make website GDPR compliant and enable matomo analytics > --- > > Key: HIVE-26166 > URL: https://issues.apache.org/jira/browse/HIVE-26166 > Project: Hive > Issue Type: Task > Components: Website >Reporter: Stamatis Zampetakis >Assignee: Martijn Visser >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Per the email that was sent out from privacy we need to make the Hive website > GDPR compliant. > # The link to privacy policy needs to be updated from > [https://hive.apache.org/privacy_policy.html] to > [https://privacy.apache.org/policies/privacy-policy-public.html] > # The google analytics service must be removed -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26166) Make website GDPR compliant
[ https://issues.apache.org/jira/browse/HIVE-26166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-26166. Fix Version/s: 4.1.0 Assignee: Martijn Visser Resolution: Fixed Fixed in https://github.com/apache/hive-site/commit/4f50371c738ede37571c6ae5a244994f8f670c95. Thanks for the PR [~martijnvisser]!
[jira] [Commented] (HIVE-26166) Make website GDPR compliant
[ https://issues.apache.org/jira/browse/HIVE-26166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886912#comment-17886912 ] Stamatis Zampetakis commented on HIVE-26166: After HIVE-26565, the website is already GDPR compliant since Google Analytics has been removed and the privacy link is up to date. The PR#1 linked to this ticket is complementary work that enables analytics via the ASF Matomo instance running at https://analytics.apache.org/
[jira] [Created] (HIVE-28558) Drop HCatalog download page from the website
Stamatis Zampetakis created HIVE-28558: -- Summary: Drop HCatalog download page from the website Key: HIVE-28558 URL: https://issues.apache.org/jira/browse/HIVE-28558 Project: Hive Issue Type: Task Security Level: Public (Viewable by anyone) Components: Website Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The HCatalog download page (https://hive.apache.org/general/hcatalogdownloads/) exists mostly for historical reasons. It was probably useful back in 2013 to inform users about the merge of HCatalog into Hive, but for the past 10 years HCatalog has been released as part of Hive, so anyone using that module does not need to visit that obsolete page. Moreover, the presence of the HCatalog download page adds an additional level of indirection for users who want to download recent Hive releases.
[jira] [Commented] (HIVE-28551) Stale results when executing queries over recreated transactional tables
[ https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886366#comment-17886366 ] Stamatis Zampetakis commented on HIVE-28551: It seems that after HIVE-19820 there were changes in some .q.out files related to the invalidation of the query result cache (e.g., results_cache_truncate.q), so this bug could be a regression from that ticket. > Stale results when executing queries over recreated transactional tables > > > Key: HIVE-28551 > URL: https://issues.apache.org/jira/browse/HIVE-28551 > Project: Hive > Issue Type: Bug > Security Level: Public (Viewable by anyone) > Components: HiveServer2 > Affects Versions: 4.0.1 > Reporter: Stamatis Zampetakis > Assignee: Stamatis Zampetakis > Priority: Major > Attachments: results_cache_invalidation3.q > > > SQL queries return stale results from the cache when the tables involved in > the queries are dropped and then recreated with the same name. > The problem can be reproduced by executing the following sequence of queries. > {code:sql} > CREATE TABLE author (fname STRING) STORED AS ORC > TBLPROPERTIES('transactional'='true'); > INSERT INTO author VALUES ('Victor'); > SELECT fname FROM author; > DROP TABLE author; > CREATE TABLE author (fname STRING) STORED AS ORC > TBLPROPERTIES('transactional'='true'); > INSERT INTO author VALUES ('Alexander'); > SELECT fname FROM author; > {code} > The first execution of the SELECT query correctly returns "Victor" as a > result. > The second execution of the SELECT query incorrectly returns "Victor" while > it should return "Alexander". > The problem manifests only when hive.query.results.cache.enabled is > set to true.
[jira] [Commented] (HIVE-28551) Stale results when executing queries over recreated transactional tables
[ https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886365#comment-17886365 ] Stamatis Zampetakis commented on HIVE-28551: The problem can be reproduced on master (commit 6e261a32657d36185654dd05af7215fe33123878) by running [^results_cache_invalidation3.q] {noformat} mvn test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite -Dqfile=results_cache_invalidation3.q -pl itests/qtest -Pitests {noformat}
[jira] [Updated] (HIVE-28551) Stale results when executing queries over recreated transactional tables
[ https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-28551: --- Attachment: results_cache_invalidation3.q
[jira] [Commented] (HIVE-28551) Stale results when executing queries over recreated transactional tables
[ https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886357#comment-17886357 ] Stamatis Zampetakis commented on HIVE-28551: HIVE-19154 adds an event-based cache invalidation mechanism that is able to remove stale entries from the cache at certain intervals, but this is mostly a performance improvement and not a mechanism to guarantee correctness. The cache should never return stale entries for transactional tables.
[jira] [Commented] (HIVE-28551) Stale results when executing queries over recreated transactional tables
[ https://issues.apache.org/jira/browse/HIVE-28551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886351#comment-17886351 ] Stamatis Zampetakis commented on HIVE-28551: Based on the [initial design|https://issues.apache.org/jira/browse/HIVE-18513?focusedCommentId=16484227&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16484227] of the query result cache, dropping or altering a transactional table should result in automatic invalidation of the cached entries. It is OK to return stale results for non-transactional tables (when the hive.query.results.cache.nontransactional.tables.enabled property is true) but it is *not* OK to return stale results for transactional tables.
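The invalidation gap discussed in the comments above can be illustrated with a toy cache. The following is a hypothetical sketch (the `Entry`/`tableId` names are invented for illustration and are not Hive's actual implementation): a lookup that matches only on query text would happily serve "Victor" after the table is dropped and recreated, while tagging each entry with the id of the table incarnation it was computed from lets the cache reject such stale entries.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleCacheDemo {
    // Hypothetical sketch, not Hive's cache code: each cached result is
    // tagged with the id of the table incarnation it was computed from.
    static class Entry {
        final String result;
        final long tableId;
        Entry(String result, long tableId) { this.result = result; this.tableId = tableId; }
    }

    static final Map<String, Entry> cache = new HashMap<>();

    static void put(String query, String result, long tableId) {
        cache.put(query, new Entry(result, tableId));
    }

    static String lookup(String query, long currentTableId) {
        Entry e = cache.get(query);
        // Miss, or the entry belongs to a dropped incarnation of the table.
        return (e == null || e.tableId != currentTableId) ? null : e.result;
    }

    public static void main(String[] args) {
        put("SELECT fname FROM author", "Victor", 1L); // first CREATE TABLE
        // After DROP + CREATE the table gets a new id (2L), so the cached
        // "Victor" entry must not be served:
        System.out.println(lookup("SELECT fname FROM author", 2L)); // null
    }
}
```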
[jira] [Created] (HIVE-28551) Stale results when executing queries over recreated transactional tables
Stamatis Zampetakis created HIVE-28551: -- Summary: Stale results when executing queries over recreated transactional tables Key: HIVE-28551 URL: https://issues.apache.org/jira/browse/HIVE-28551 Project: Hive Issue Type: Bug Security Level: Public (Viewable by anyone) Components: HiveServer2 Affects Versions: 4.0.1 Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis SQL queries return stale results from the cache when the tables involved in the queries are dropped and then recreated with the same name. The problem can be reproduced by executing the following sequence of queries. {code:sql} CREATE TABLE author (fname STRING) STORED AS ORC TBLPROPERTIES('transactional'='true'); INSERT INTO author VALUES ('Victor'); SELECT fname FROM author; DROP TABLE author; CREATE TABLE author (fname STRING) STORED AS ORC TBLPROPERTIES('transactional'='true'); INSERT INTO author VALUES ('Alexander'); SELECT fname FROM author; {code} The first execution of the SELECT query correctly returns "Victor" as a result. The second execution of the SELECT query incorrectly returns "Victor" while it should return "Alexander". The problem manifests only when hive.query.results.cache.enabled is set to true.
[jira] [Commented] (HIVE-28538) Post JDK11: Local Test Failures for TestGenericUDFFromUnixTimeEvaluate and TestMiniLlapLocalCliDriver.udf_date_format.q
[ https://issues.apache.org/jira/browse/HIVE-28538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884908#comment-17884908 ] Stamatis Zampetakis commented on HIVE-28538: This means that Hive users who upgrade to JDK11 may observe behavior changes when using the SQL functions outlined. Many thanks for this analysis, it will be helpful for people doing upgrades. > Post JDK11: Local Test Failures for TestGenericUDFFromUnixTimeEvaluate and > TestMiniLlapLocalCliDriver.udf_date_format.q > --- > > Key: HIVE-28538 > URL: https://issues.apache.org/jira/browse/HIVE-28538 > Project: Hive > Issue Type: Bug > Security Level: Public (Viewable by anyone) > Reporter: shivangi > Assignee: shivangi > Priority: Major > > *Issue:* > Post JDK11, there are test failures occurring locally in the following test > classes: > # {{TestGenericUDFFromUnixTimeEvaluate}} > # {{TestMiniLlapLocalCliDriver.udf_date_format.q}} > *Error:* > * *from_unixtime(1689930780, -MM-dd HH:mmaa) sessionZone=Etc/GMT, > formatter=SIMPLE expected:<2023-07-21 09:13[AM]> but was:<2023-07-21 > 09:13[am]>* > *Root Cause Analysis (RCA):* > In JDK11, changes were made to the Java locale handling as documented in the > following issues: > * [JDK-8145136|https://bugs.openjdk.org/browse/JDK-8145136] > * [JDK-8211985|https://bugs.openjdk.org/browse/JDK-8211985] > These changes result in different locale behaviors. This issue does not occur > in Jenkins because the test cases are written according to the {{en_US}} > locale. However, when run locally, the locale may differ (e.g., > {{en_IN}}), causing discrepancies. For instance, {{en_IN}} would expect > {{am/pm}} while the tests expect {{AM/PM}}, leading to failures.
> *Solution:* > To ensure consistent behavior across both local and Jenkins environments, > pass the locale in {{}} to align both environments to > {{en_US.}} > This will ensure that all tests run with the {{en_US}} locale, mitigating > locale-related test failures. > > *Additional Notes:* > * There may be other test failures not yet captured due to different locales > in local environments. Ensure all tests run with the {{en_US}} locale to > identify and resolve any further issues.
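The locale sensitivity behind the AM/am mismatch above can be reproduced with a few lines of `java.time`. This is an illustrative sketch (the class and method names are invented): CLDR data, the JDK default since JDK 9 per JDK-8145136, defines the AM/PM strings per locale, so the same pattern can render differently outside `en_US`.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class AmPmLocaleDemo {
    // The same pattern is locale-sensitive: the CLDR locale data bundled
    // with the JDK defines the AM/PM marker strings per locale.
    static String format(Locale locale) {
        return LocalTime.of(9, 13).format(DateTimeFormatter.ofPattern("hh:mma", locale));
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.US)); // 09:13AM
        // Under other English locales the marker may be lowercase ("am"),
        // depending on the JDK's bundled CLDR version:
        System.out.println(format(Locale.forLanguageTag("en-IN")));
    }
}
```

Pinning the surefire JVM to `en_US` (as the proposed solution does) makes the formatter deterministic across environments.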
[jira] [Commented] (HIVE-28538) Post JDK11: Local Test Failures for TestGenericUDFFromUnixTimeEvaluate and TestMiniLlapLocalCliDriver.udf_date_format.q
[ https://issues.apache.org/jira/browse/HIVE-28538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884906#comment-17884906 ] Stamatis Zampetakis commented on HIVE-28538: This is similar to HIVE-28381. How are these two related?
[jira] [Commented] (HIVE-25351) stddev(), stddev_pop() with CBO enable returning null
[ https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884538#comment-17884538 ] Stamatis Zampetakis commented on HIVE-25351: It seems that this problem also affects the respective rule in Calcite (CALCITE-6080). If that's the case I would suggest to first tackle the problem there and then bring the changes in Hive. > stddev(), stddev_pop() with CBO enable returning null > - > > Key: HIVE-25351 > URL: https://issues.apache.org/jira/browse/HIVE-25351 > Project: Hive > Issue Type: Bug >Reporter: Ashish Sharma >Assignee: Jiandan Yang >Priority: Blocker > Labels: pull-request-available > > *script used to repro* > create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 > decimal(30,2)); > insert into cbo_test values ("00140006375905", 10230.72, > 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, > 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), > ("00140006375905", 10230.72, 10230.72, 10230.69), > ("00140006375905", 10230.72, 10230.72, 10230.69), > ("00140006375905", 10230.72, 10230.72, 10230.69); > select stddev(v1), stddev(v2), stddev(v3) from cbo_test; > *Enable CBO* > ++ > | Explain | > ++ > | Plan optimized by CBO. 
| > || > | Vertex dependency in root stage| > | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)| > || > | Stage-0| > | Fetch Operator | > | limit:-1 | > | Stage-1| > | Reducer 2 vectorized | > | File Output Operator [FS_13] | > | Select Operator [SEL_12] (rows=1 width=24) | > | Output:["_col0","_col1","_col2"] | > | Group By Operator [GBY_11] (rows=1 width=72) | > | > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"] > | > | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized | > | PARTITION_ONLY_SHUFFLE [RS_10] | > | Group By Operator [GBY_9] (rows=1 width=72) | > | > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"] > | > | Select Operator [SEL_8] (rows=6 width=232) | > | > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] | > | TableScan [TS_0] (rows=6 width=232) | > | default@cbo_test,cbo_test, ACID > table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] | > || > ++ > *Query Result* > _c0 _c1 _c2 > 0.0 NaN NaN > *Disable CBO* > ++ > | Explain | > ++ > | Vertex dependency in root stage| > | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)| > || > | Stage-0| > | Fetch Operator | > | limit:-1 | > | Stage-1| > | Reducer 2 vectorized | > | File Output Operator [FS_11] | > | Group By Operator [GBY_10] (rows=1 width=24) | > | > Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"] > | > | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized| > | PARTITION_ONLY_SHUFFLE [RS_9]| > | Group By Operator [GBY_8] (rows=1 width=240) | > | > Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"] > | > | Select Operator 
[SEL_7] (rows=6 width=232) | > | Outp
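The CBO plan quoted above rewrites stddev into `sum(x)`, `sum(x*x)` and `count` aggregates. The numerical hazard of that one-pass form can be sketched as follows; this is an illustrative sketch with invented names, not Hive's code, and it uses doubles where the report involves wide decimals:

```java
public class StddevCancellationDemo {
    // One-pass formula sum(x^2)/n - mean^2: it subtracts two nearly equal
    // large numbers, so for large, tightly clustered values the true
    // variance is lost to rounding. The result can even come out slightly
    // negative, and sqrt of a negative number is NaN, matching the NaN
    // results in the report.
    static double onePassVariance(double[] xs) {
        double sum = 0, sumSq = 0;
        for (double x : xs) { sum += x; sumSq += x * x; }
        double mean = sum / xs.length;
        return sumSq / xs.length - mean * mean;
    }

    // Two-pass formula: subtract the mean first, then square. The squared
    // deviations are small numbers, so no cancellation occurs.
    static double twoPassVariance(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        double mean = sum / xs.length;
        double acc = 0;
        for (double x : xs) acc += (x - mean) * (x - mean);
        return acc / xs.length;
    }

    public static void main(String[] args) {
        double[] xs = {1e9 + 1, 1e9 + 2};
        System.out.println("two-pass variance = " + twoPassVariance(xs)); // 0.25
        // The one-pass value is corrupted by cancellation; the 0.25 is lost:
        System.out.println("one-pass variance = " + onePassVariance(xs));
    }
}
```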
[jira] [Commented] (HIVE-28014) to_unix_timestamp udf produces inconsistent results in different jdk versions
[ https://issues.apache.org/jira/browse/HIVE-28014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883788#comment-17883788 ] Stamatis Zampetakis commented on HIVE-28014: I suppose that now that HIVE-28337 is fixed the first failure reported here in TestMetastoreUtils should no longer appear. > to_unix_timestamp udf produces inconsistent results in different jdk versions > - > > Key: HIVE-28014 > URL: https://issues.apache.org/jira/browse/HIVE-28014 > Project: Hive > Issue Type: Bug > Components: Hive > Affects Versions: 4.0.0-beta-1 > Reporter: Wechar > Assignee: Wechar > Priority: Major > > In HIVE-27999 we updated the CI docker image, which upgrades jdk8 from > *1.8.0_262-b19* to *1.8.0_392-b08*. This upgrade caused 3 timestamp-related > tests to fail: > *1. Testing / split-02 / PostProcess / > testTimestampToString[zoneId=Europe/Paris, timestamp=2417-03-26T02:08:43] – > org.apache.hadoop.hive.metastore.utils.TestMetaStoreUtils* > {code:bash} > Error > expected:<2417-03-26 0[2]:08:43> but was:<2417-03-26 0[3]:08:43> > Stacktrace > org.junit.ComparisonFailure: expected:<2417-03-26 0[2]:08:43> but > was:<2417-03-26 0[3]:08:43> > at > org.apache.hadoop.hive.metastore.utils.TestMetaStoreUtils.testTimestampToString(TestMetaStoreUtils.java:85) > {code} > *2. Testing / split-01 / PostProcess / testCliDriver[udf5] – > org.apache.hadoop.hive.cli.split24.TestMiniLlapLocalCliDriver* > {code:bash} > Error > Client Execution succeeded but contained differences (error code = 1) after > executing udf5.q > 263c263 > < 1400-11-08 07:35:34 > --- > > 1400-11-08 07:35:24 > 272c272 > < 1800-11-08 07:35:34 > --- > > 1800-11-08 07:35:24 > 434c434 > < 1399-12-31 23:35:34 > --- > > 1399-12-31 23:35:24 > 443c443 > < 1799-12-31 23:35:34 > --- > > 1799-12-31 23:35:24 > 452c452 > < 1899-12-31 23:35:34 > --- > > 1899-12-31 23:35:24 > {code} > *3. 
Testing / split-19 / PostProcess / testStringArg2 – > org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFToUnixTimestamp* > {code:bash} > Stacktrace > org.junit.ComparisonFailure: expected:<-17984790[40]0> but > was:<-17984790[39]0> > at org.junit.Assert.assertEquals(Assert.java:117) > at org.junit.Assert.assertEquals(Assert.java:146) > at > org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFToUnixTimestamp.runAndVerify(TestGenericUDFToUnixTimestamp.java:70) > at > org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFToUnixTimestamp.testStringArg2(TestGenericUDFToUnixTimestamp.java:167) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > {code} > It may be a JDK bug fixed in the newer release, because we can get the > same result from Spark: > {code:sql} > spark-sql> select to_unix_timestamp(to_timestamp("1400-02-01 00:00:00 ICT", > "-MM-dd HH:mm:ss z"), "US/Pacific"); > -17984790390 > {code}
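The 10-second drift in the reported epoch values (`-17984790400` vs `-17984790390`) is characteristic of tzdata refinements to pre-standardization "local mean time" offsets, which JDK updates bundle. A small sketch of where such offsets come from (illustrative, with an invented helper name; the exact offset printed depends on the tzdata shipped with the running JDK):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.zone.ZoneRules;

public class HistoricOffsetDemo {
    // For dates far before timezone standardization (e.g. year 1400), the
    // zone rules fall back to the local-mean-time offset derived from
    // longitude (roughly -07:53 for US/Pacific, not the familiar -08:00).
    // When tzdata revises these LMT values, derived epoch seconds shift.
    static int offsetSecondsAt1400(String zone) {
        ZoneRules rules = ZoneId.of(zone).getRules();
        return rules.getOffset(LocalDateTime.of(1400, 2, 1, 0, 0)).getTotalSeconds();
    }

    public static void main(String[] args) {
        System.out.println("US/Pacific offset in 1400: "
                + offsetSecondsAt1400("US/Pacific") + "s");
    }
}
```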
[jira] [Resolved] (HIVE-28337) Process timestamps at UTC timezone instead of local timezone in MetaStoreUtils
[ https://issues.apache.org/jira/browse/HIVE-28337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28337. Fix Version/s: 4.1.0 Resolution: Fixed Fixed in [https://github.com/apache/hive/commit/e31811bb7c6670ab1f725adde3aa2b012ca64415] Thanks for the PR [~kiranvelumuri] and for the review [~wechar]! > Process timestamps at UTC timezone instead of local timezone in MetaStoreUtils > -- > > Key: HIVE-28337 > URL: https://issues.apache.org/jira/browse/HIVE-28337 > Project: Hive > Issue Type: Bug > Reporter: Kiran Velumuri > Assignee: Kiran Velumuri > Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > Attachments: image-2024-06-18-12-42-05-646.png, > image-2024-06-18-12-42-31-472.png > > > Currently in MetaStoreUtils, the conversion to/from timestamp and string > makes use of LocalDateTime in the local time zone while processing > timestamps. This causes issues with representing timestamps *as mentioned > below*. Instead, while dealing with timestamps it is proposed to use > java.time.Instant to represent a point on the time-line, which overcomes > the issue with representing such timestamps. Accordingly, the test class for > MetaStoreUtils (TestMetaStoreUtils) has also been modified to account for > these changes. > +Failing scenario:+ > Timestamps in time-zones which observe daylight savings during which the > clock is set forward (typically 2:00 AM - 3:00 AM) > Example: 2417-03-26T02:08:43 in Europe/Paris is invalid, and would get > converted to 2417-03-26T03:08:43 by the Timestamp.valueOf() method, when instead > we want to represent the original timestamp without conversion. > This is happening because the timestamp is represented as a LocalDateTime in > TestMetaStoreUtils, which is independent of the time-zone of the timestamp. > This LocalDateTime timestamp, when combined with a time-zone, leads to an > invalid timestamp.
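The DST-gap shift described in the failing scenario can be demonstrated with a recent Europe/Paris transition instead of the year-2417 one from the report. This is an illustrative sketch (the class and method names are invented): 02:30 local time does not exist on 2024-03-31 in Europe/Paris, so zone-aware resolution pushes it forward by the gap, analogous to how `Timestamp.valueOf()` distorted the test value.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstGapDemo {
    static LocalTime resolve() {
        // 2024-03-31 02:30 falls inside the Europe/Paris spring-forward gap
        // (clocks jump from 02:00 to 03:00); ZonedDateTime.of resolves such
        // a non-existent local time by shifting it forward by the gap.
        LocalDateTime inGap = LocalDateTime.of(2024, 3, 31, 2, 30);
        return ZonedDateTime.of(inGap, ZoneId.of("Europe/Paris")).toLocalTime();
    }

    public static void main(String[] args) {
        System.out.println(resolve()); // 03:30
        // Representing the instant at UTC (as the fix does via
        // java.time.Instant) avoids any zone-dependent shifting.
    }
}
```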
[jira] [Resolved] (HIVE-28483) CAST string to date should return null when format is invalid
[ https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28483. Fix Version/s: 4.1.0 Resolution: Fixed Fixed in [https://github.com/apache/hive/commit/e87b5bb8f4bded30b17e46ee573151488c78d178]. Thanks for the PR [~zratkai]! The new behavior was also discussed in the mailing lists: https://lists.apache.org/thread/blo8ozrhmh1jq9c0oz8bhm39lpb95bbv > CAST string to date should return null when format is invalid > - > > Key: HIVE-28483 > URL: https://issues.apache.org/jira/browse/HIVE-28483 > Project: Hive > Issue Type: Bug > Reporter: Zoltán Rátkai > Assignee: Zoltán Rátkai > Priority: Minor > Labels: pull-request-available > Fix For: 4.1.0 > > > Date conversion gives wrong results, e.g. (1 row selected in 6.403 seconds): > select to_date('03-08-2024'); > Result: > +-+ > | _c0 | > +-+ > |0003-08-20 | > +-+ > or: > select to_date(last_day(add_months(last_day('03-08-2024'), -1))) ; > Result: > +-+ > | _c0 | > +-+ > |0003-07-31 | > +- > Here is my comparison with other database systems: > + > -- > PostgreSQL > -- > SELECT TO_DATE('03-08-2024','MMDD'); > invalid value "03-0" for "" DETAIL: Field requires 4 characters, but only > 2 could be parsed. HINT: If your source string is not fixed-width, try using > the "FM" modifier.
> SELECT TO_DATE('03-08-2024','DD-MM-'); > to_date > Sat, 03 Aug 2024 00:00:00 GMT > SELECT CAST('03-08-2024' AS date); > date > Fri, 08 Mar 2024 00:00:00 GMT > SELECT CAST('2024-08-03' AS date); > date > Sat, 03 Aug 2024 00:00:00 GMT > SELECT CAST('2024-08-03 T' AS date); > invalid input syntax for type date: "2024-08-03 T" LINE 1: SELECT > CAST('2024-08-03 T' AS date) ^ > SELECT CAST('2024-08-03T' AS date); > invalid input syntax for type date: "2024-08-03T" LINE 1: SELECT > CAST('2024-08-03T' AS date) ^ > SELECT CAST('2024-08-03T12:00:00' AS date); > date > Sat, 03 Aug 2024 00:00:00 GMT > SELECT CAST('2024-08-0312:00:00' AS date); > date/time field value out of range: "2024-08-0312:00:00" LINE 1: SELECT > CAST('2024-08-0312:00:00' AS date) ^ HINT: Perhaps you need a different > "datestyle" setting. > -- > -ORACLE--- > -- > select CAST('2024-08-03 12:00:00' AS date) from dual; > Output: > select CAST('2024-08-03 12:00:00' AS date) from dual > * > ERROR at line 1: > ORA-01861: literal does not match format string > - > select CAST('2024-08-03' AS date) from dual; > Output: > select CAST('2024-08-03' AS date) from dual > * > ERROR at line 1: > ORA-01861: literal does not match format string > - > SELECT TO_DATE('08/03/2024', 'MM/DD/') FROM DUAL; > Output: > TO_DATE(' > - > 03-AUG-24 > - > SELECT TO_DATE('2024-08-03', '-MM-DD') FROM DUAL; > Output: > TO_DATE(' > - > 03-AUG-24 > - > select CAST('03-08-2024' AS date) from dual; > Output: > select CAST('03-08-2024' AS date) from dual > * > ERROR at line 1: > ORA-01843: An invalid month was specified. 
> - > select CAST('2024-08-0312:00:00' AS date) from dual; > Output: > select CAST('2024-08-0312:00:00' AS date) from dual > * > ERROR at line 1: > ORA-01861: literal does not match format string > - > select CAST('10-AUG-24' AS date) from dual; > Output: > CAST('10- > - > 10-AUG-24 > - > select CAST('10-AUG-2024' AS date) from dual; > Output: > CAST('10- > - > 10-AUG-24 > - > select CAST('03-08-24' AS date) from dual; > Output: > select CAST('03-08-24' AS date) from dual > * > ERROR at line 1: > ORA-01843: An invalid month was specified. > > -- > select CAST('03-08-2024' AS date) from dual; > Output: > select CAST('03-08-2024' AS date) from dual > * > ERROR at line 1: > ORA-01843: An invalid month was specified. > > SELECT sysdate FROM DUAL; > Output: > SYSDATE > - > 10-SEP-24
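The agreed semantics (return NULL on malformed input rather than silently producing a mangled date like 0003-08-20) can be sketched with `java.time`'s strict ISO parser. This is an illustration only, with invented names, not Hive's `GenericUDF` implementation:

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class CastDateDemo {
    // Sketch of "CAST(string AS DATE) returns NULL when the format is
    // invalid": parse strictly, map any parse failure to null.
    static LocalDate castToDate(String s) {
        try {
            return LocalDate.parse(s); // strict ISO-8601 yyyy-MM-dd
        } catch (DateTimeParseException e) {
            return null; // malformed input -> NULL, not a mangled date
        }
    }

    public static void main(String[] args) {
        System.out.println(castToDate("2024-08-03")); // 2024-08-03
        System.out.println(castToDate("03-08-2024")); // null
    }
}
```

This mirrors the PostgreSQL/Oracle behavior quoted above, except that an error becomes NULL rather than a hard failure.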
[jira] [Updated] (HIVE-28483) CAST string to date should return null when format is invalid
[ https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-28483: --- Summary: CAST string to date should return null when format is invalid (was: String date cast giving wrong result) > CAST string to date should return null when format is invalid > - > > Key: HIVE-28483 > URL: https://issues.apache.org/jira/browse/HIVE-28483 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Rátkai >Assignee: Zoltán Rátkai >Priority: Minor > Labels: pull-request-available > > Date conversion gives a wrong result. Like: 1 row selected (6.403 seconds) > select to_date('03-08-2024'); > Result: > +-+ > | _c0 | > +-+ > |0003-08-20 | > +-+ > or: > select to_date(last_day(add_months(last_day('03-08-2024'), -1))) ; > Result: > +-+ > | _c0 | > +-+ > |0003-07-31 | > +- > Here is my comparison with other database systems: > + > -- > PostgreSQL > -- > SELECT TO_DATE('03-08-2024','YYYYMMDD'); > invalid value "03-0" for "YYYY" DETAIL: Field requires 4 characters, but only > 2 could be parsed. HINT: If your source string is not fixed-width, try using > the "FM" modifier. > SELECT TO_DATE('03-08-2024','DD-MM-YYYY'); > to_date > Sat, 03 Aug 2024 00:00:00 GMT > SELECT CAST('03-08-2024' AS date); > date > Fri, 08 Mar 2024 00:00:00 GMT > SELECT CAST('2024-08-03' AS date); > date > Sat, 03 Aug 2024 00:00:00 GMT > SELECT CAST('2024-08-03 T' AS date); > invalid input syntax for type date: "2024-08-03 T" LINE 1: SELECT > CAST('2024-08-03 T' AS date) ^ > SELECT CAST('2024-08-03T' AS date); > invalid input syntax for type date: "2024-08-03T" LINE 1: SELECT > CAST('2024-08-03T' AS date) ^ > SELECT CAST('2024-08-03T12:00:00' AS date); > date > Sat, 03 Aug 2024 00:00:00 GMT > SELECT CAST('2024-08-0312:00:00' AS date); > date/time field value out of range: "2024-08-0312:00:00" LINE 1: SELECT > CAST('2024-08-0312:00:00' AS date) ^ HINT: Perhaps you need a different > "datestyle" setting. 
> -- > -ORACLE--- > -- > select CAST('2024-08-03 12:00:00' AS date) from dual; > Output: > select CAST('2024-08-03 12:00:00' AS date) from dual > * > ERROR at line 1: > ORA-01861: literal does not match format string > - > select CAST('2024-08-03' AS date) from dual; > Output: > select CAST('2024-08-03' AS date) from dual > * > ERROR at line 1: > ORA-01861: literal does not match format string > - > SELECT TO_DATE('08/03/2024', 'MM/DD/YYYY') FROM DUAL; > Output: > TO_DATE(' > - > 03-AUG-24 > - > SELECT TO_DATE('2024-08-03', 'YYYY-MM-DD') FROM DUAL; > Output: > TO_DATE(' > - > 03-AUG-24 > - > select CAST('03-08-2024' AS date) from dual; > Output: > select CAST('03-08-2024' AS date) from dual > * > ERROR at line 1: > ORA-01843: An invalid month was specified. > - > select CAST('2024-08-0312:00:00' AS date) from dual; > Output: > select CAST('2024-08-0312:00:00' AS date) from dual > * > ERROR at line 1: > ORA-01861: literal does not match format string > - > select CAST('10-AUG-24' AS date) from dual; > Output: > CAST('10- > - > 10-AUG-24 > - > select CAST('10-AUG-2024' AS date) from dual; > Output: > CAST('10- > - > 10-AUG-24 > - > select CAST('03-08-24' AS date) from dual; > Output: > select CAST('03-08-24' AS date) from dual > * > ERROR at line 1: > ORA-01843: An invalid month was specified. > > -- > select CAST('03-08-2024' AS date) from dual; > Output: > select CAST('03-08-2024' AS date) from dual > * > ERROR at line 1: > ORA-01843: An invalid month was specified. > > SELECT sysdate FROM DUAL; > Output: > SYSDATE > - > 10-SEP-24 > SYSDATE > - > 10-SEP-24 > -- > -MYSQL-
[jira] [Commented] (HIVE-28483) String date cast giving wrong result
[ https://issues.apache.org/jira/browse/HIVE-28483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882943#comment-17882943 ] Stamatis Zampetakis commented on HIVE-28483: For the behavior of the CAST function across Hive versions, I added some more detailed tests in HIVE-27586. > String date cast giving wrong result > > > Key: HIVE-28483 > URL: https://issues.apache.org/jira/browse/HIVE-28483 > Project: Hive > Issue Type: Bug >Reporter: Zoltán Rátkai >Assignee: Zoltán Rátkai >Priority: Minor > Labels: pull-request-available > > Date conversion gives a wrong result. Like: 1 row selected (6.403 seconds) > select to_date('03-08-2024'); > Result: > +-+ > | _c0 | > +-+ > |0003-08-20 | > +-+ > or: > select to_date(last_day(add_months(last_day('03-08-2024'), -1))) ; > Result: > +-+ > | _c0 | > +-+ > |0003-07-31 | > +- > Here is my comparison with other database systems: > + > -- > PostgreSQL > -- > SELECT TO_DATE('03-08-2024','YYYYMMDD'); > invalid value "03-0" for "YYYY" DETAIL: Field requires 4 characters, but only > 2 could be parsed. HINT: If your source string is not fixed-width, try using > the "FM" modifier. > SELECT TO_DATE('03-08-2024','DD-MM-YYYY'); > to_date > Sat, 03 Aug 2024 00:00:00 GMT > SELECT CAST('03-08-2024' AS date); > date > Fri, 08 Mar 2024 00:00:00 GMT > SELECT CAST('2024-08-03' AS date); > date > Sat, 03 Aug 2024 00:00:00 GMT > SELECT CAST('2024-08-03 T' AS date); > invalid input syntax for type date: "2024-08-03 T" LINE 1: SELECT > CAST('2024-08-03 T' AS date) ^ > SELECT CAST('2024-08-03T' AS date); > invalid input syntax for type date: "2024-08-03T" LINE 1: SELECT > CAST('2024-08-03T' AS date) ^ > SELECT CAST('2024-08-03T12:00:00' AS date); > date > Sat, 03 Aug 2024 00:00:00 GMT > SELECT CAST('2024-08-0312:00:00' AS date); > date/time field value out of range: "2024-08-0312:00:00" LINE 1: SELECT > CAST('2024-08-0312:00:00' AS date) ^ HINT: Perhaps you need a different > "datestyle" setting. 
> -- > -ORACLE--- > -- > select CAST('2024-08-03 12:00:00' AS date) from dual; > Output: > select CAST('2024-08-03 12:00:00' AS date) from dual > * > ERROR at line 1: > ORA-01861: literal does not match format string > - > select CAST('2024-08-03' AS date) from dual; > Output: > select CAST('2024-08-03' AS date) from dual > * > ERROR at line 1: > ORA-01861: literal does not match format string > - > SELECT TO_DATE('08/03/2024', 'MM/DD/YYYY') FROM DUAL; > Output: > TO_DATE(' > - > 03-AUG-24 > - > SELECT TO_DATE('2024-08-03', 'YYYY-MM-DD') FROM DUAL; > Output: > TO_DATE(' > - > 03-AUG-24 > - > select CAST('03-08-2024' AS date) from dual; > Output: > select CAST('03-08-2024' AS date) from dual > * > ERROR at line 1: > ORA-01843: An invalid month was specified. > - > select CAST('2024-08-0312:00:00' AS date) from dual; > Output: > select CAST('2024-08-0312:00:00' AS date) from dual > * > ERROR at line 1: > ORA-01861: literal does not match format string > - > select CAST('10-AUG-24' AS date) from dual; > Output: > CAST('10- > - > 10-AUG-24 > - > select CAST('10-AUG-2024' AS date) from dual; > Output: > CAST('10- > - > 10-AUG-24 > - > select CAST('03-08-24' AS date) from dual; > Output: > select CAST('03-08-24' AS date) from dual > * > ERROR at line 1: > ORA-01843: An invalid month was specified. > > -- > select CAST('03-08-2024' AS date) from dual; > Output: > select CAST('03-08-2024' AS date) from dual > * > ERROR at line 1: > ORA-01843: An invalid month was specified. > > SELECT sysdate FROM DUAL; > Output: > SYSDATE > - > 10-SEP-24 > SYSDATE > - > 10-SEP-24 > -- > -MYSQL
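The strict behavior that PostgreSQL and Oracle show in the comparison above (rejecting any input with trailing or misplaced characters rather than producing a wrong date) can be approximated with Python's `datetime.strptime`, which likewise requires an exact format match. This is an illustrative sketch only, not how either engine is implemented, and `strict_cast_date` is a name invented here:

```python
from datetime import datetime

def strict_cast_date(value: str, fmt: str = "%Y-%m-%d"):
    """Parse value against an exact format, as PostgreSQL/Oracle CAST does:
    any mismatch or trailing characters raises ValueError instead of
    silently producing a wrong date."""
    return datetime.strptime(value, fmt).date()

print(strict_cast_date("2024-08-03"))              # 2024-08-03
print(strict_cast_date("03-08-2024", "%d-%m-%Y"))  # 2024-08-03 (day-first format)

# Each of these is rejected, mirroring the errors in the comparison above:
# "03-08-2024" does not match %Y (four year digits expected first),
# "2024-08-03T" and "2024-08-0312:00:00" leave unconverted trailing data.
for bad in ("03-08-2024", "2024-08-03T", "2024-08-0312:00:00"):
    try:
        strict_cast_date(bad)
    except ValueError as err:
        print(f"rejected {bad!r}: {err}")
```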
[jira] [Commented] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars
[ https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17882942#comment-17882942 ] Stamatis Zampetakis commented on HIVE-27586: In light of HIVE-28483, I performed a series of tests to document the behavior of parsing dates from strings across some major Hive versions. Date parsing appears in various places and may differ slightly across SQL functions, so in the tests that follow I only examined the results of CAST (V AS DATE), which is probably the most popular way of performing string-to-date conversions. For various SQL functions, the behavior of the vectorized and non-vectorized implementations is not aligned, so in the tests I included both variants. !cast_string_date_hive_versions.svg! The tests were performed using the script in the [^cast_as_date.q] file and were run using the following command. {noformat} mvn test -Dtest=TestCliDriver -Dqfile=cast_as_date.q -Phadoop-2 -Dtest.output.overwrite {noformat} Note that the hadoop-2 profile is necessary for building older versions of Hive. > Parse dates from strings ignoring trailing (potentialy) invalid chars > - > > Key: HIVE-27586 > URL: https://issues.apache.org/jira/browse/HIVE-27586 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: backwards-compatibility, pull-request-available > Fix For: 4.0.0 > > Attachments: cast_as_date.q, cast_string_date_hive_versions.pdf, > cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg > > > The goal of this ticket is to extract and return a valid date from a string > value when there is a valid date prefix in the string. > The following table contains a few illustrative examples highlighting what > happens now and what will happen after the proposed changes to ignore > trailing characters. 
HIVE-20007 introduced some behavior changes around this > area so the table also displays what was the Hive behavior before that change. > ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing > chars|| > |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03| > |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03| > |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03| > |4|03-08-2023|0009-02-12|null|0003-08-20| > |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03| > |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03| > |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03| > This change partially (see example 3 and 4) restores the behavior changes > introduced by HIVE-20007 and at the same time makes the current behavior of > handling trailing invalid chars more uniform. > This change will have an impact on various Hive SQL functions and operators > (+/-) that accept dates from string values. A partial list of affected > functions is outlined below: > * CAST (V AS DATE) > * CAST (V AS TIMESTAMP) > * TO_DATE > * DATE_ADD > * DATE_DIFF > * WEEKOFYEAR > * DAYOFWEEK > * TRUNC -- This message was sent by Atlassian Jira (v8.20.10#820010)
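The "Ignore trailing chars" column of the table above can be illustrated with a small sketch that extracts a year-month-day digit prefix and discards whatever follows. This mirrors the proposed semantics for examples 1-7 only; it is not Hive's actual parser, and `lenient_cast_date` is a name invented here:

```python
import re
from datetime import date

# Accept 1-4 year digits and 1-2 month/day digits at the start of the
# string; anything after the matched prefix is ignored.
_PREFIX = re.compile(r"^(\d{1,4})-(\d{1,2})-(\d{1,2})")

def lenient_cast_date(value: str):
    """Return the date encoded by a leading Y-M-D prefix, or None."""
    m = _PREFIX.match(value)
    if not m:
        return None
    try:
        return date(*map(int, m.groups()))
    except ValueError:  # prefix matched but is not a real date, e.g. month 13
        return None

print(lenient_cast_date("2023-08-03_16:02:00"))  # 2023-08-03 (example 1)
print(lenient_cast_date("2023-08-0316:02:00"))   # 2023-08-03 (example 3)
print(lenient_cast_date("03-08-2023"))           # 0003-08-20 (example 4)
print(lenient_cast_date("2023-08-03 GARBAGE"))   # 2023-08-03 (example 5)
```

Example 4 shows why `03-08-2023` yields `0003-08-20`: the parser reads year 3, month 8, then at most two day digits (20), and the trailing `23` is dropped.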
[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars
[ https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-27586: --- Attachment: cast_string_date_hive_versions.svg > Parse dates from strings ignoring trailing (potentialy) invalid chars > - > > Key: HIVE-27586 > URL: https://issues.apache.org/jira/browse/HIVE-27586 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: backwards-compatibility, pull-request-available > Fix For: 4.0.0 > > Attachments: cast_string_date_hive_versions.pdf, > cast_string_date_hive_versions.svg > > > The goal of this ticket is to extract and return a valid date from a string > value when there is a valid date prefix in the string. > The following table contains a few illustrative examples highlighting what > happens now and what will happen after the proposed changes to ignore > trailing characters. HIVE-20007 introduced some behavior changes around this > area so the table also displays what was the Hive behavior before that change. > ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing > chars|| > |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03| > |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03| > |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03| > |4|03-08-2023|0009-02-12|null|0003-08-20| > |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03| > |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03| > |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03| > This change partially (see example 3 and 4) restores the behavior changes > introduced by HIVE-20007 and at the same time makes the current behavior of > handling trailing invalid chars more uniform. > This change will have an impact on various Hive SQL functions and operators > (+/-) that accept dates from string values. 
A partial list of affected > functions is outlined below: > * CAST (V AS DATE) > * CAST (V AS TIMESTAMP) > * TO_DATE > * DATE_ADD > * DATE_DIFF > * WEEKOFYEAR > * DAYOFWEEK > * TRUNC -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars
[ https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-27586: --- Attachment: cast_as_date.q > Parse dates from strings ignoring trailing (potentialy) invalid chars > - > > Key: HIVE-27586 > URL: https://issues.apache.org/jira/browse/HIVE-27586 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: backwards-compatibility, pull-request-available > Fix For: 4.0.0 > > Attachments: cast_as_date.q, cast_string_date_hive_versions.pdf, > cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg > > > The goal of this ticket is to extract and return a valid date from a string > value when there is a valid date prefix in the string. > The following table contains a few illustrative examples highlighting what > happens now and what will happen after the proposed changes to ignore > trailing characters. HIVE-20007 introduced some behavior changes around this > area so the table also displays what was the Hive behavior before that change. > ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing > chars|| > |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03| > |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03| > |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03| > |4|03-08-2023|0009-02-12|null|0003-08-20| > |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03| > |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03| > |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03| > This change partially (see example 3 and 4) restores the behavior changes > introduced by HIVE-20007 and at the same time makes the current behavior of > handling trailing invalid chars more uniform. > This change will have an impact on various Hive SQL functions and operators > (+/-) that accept dates from string values. 
A partial list of affected > functions is outlined below: > * CAST (V AS DATE) > * CAST (V AS TIMESTAMP) > * TO_DATE > * DATE_ADD > * DATE_DIFF > * WEEKOFYEAR > * DAYOFWEEK > * TRUNC -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars
[ https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-27586: --- Attachment: cast_string_date_hive_versions.png > Parse dates from strings ignoring trailing (potentialy) invalid chars > - > > Key: HIVE-27586 > URL: https://issues.apache.org/jira/browse/HIVE-27586 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: backwards-compatibility, pull-request-available > Fix For: 4.0.0 > > Attachments: cast_string_date_hive_versions.pdf, > cast_string_date_hive_versions.png, cast_string_date_hive_versions.svg > > > The goal of this ticket is to extract and return a valid date from a string > value when there is a valid date prefix in the string. > The following table contains a few illustrative examples highlighting what > happens now and what will happen after the proposed changes to ignore > trailing characters. HIVE-20007 introduced some behavior changes around this > area so the table also displays what was the Hive behavior before that change. > ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing > chars|| > |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03| > |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03| > |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03| > |4|03-08-2023|0009-02-12|null|0003-08-20| > |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03| > |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03| > |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03| > This change partially (see example 3 and 4) restores the behavior changes > introduced by HIVE-20007 and at the same time makes the current behavior of > handling trailing invalid chars more uniform. > This change will have an impact on various Hive SQL functions and operators > (+/-) that accept dates from string values. 
A partial list of affected > functions is outlined below: > * CAST (V AS DATE) > * CAST (V AS TIMESTAMP) > * TO_DATE > * DATE_ADD > * DATE_DIFF > * WEEKOFYEAR > * DAYOFWEEK > * TRUNC -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27586) Parse dates from strings ignoring trailing (potentialy) invalid chars
[ https://issues.apache.org/jira/browse/HIVE-27586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-27586: --- Attachment: cast_string_date_hive_versions.pdf > Parse dates from strings ignoring trailing (potentialy) invalid chars > - > > Key: HIVE-27586 > URL: https://issues.apache.org/jira/browse/HIVE-27586 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0-beta-1 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: backwards-compatibility, pull-request-available > Fix For: 4.0.0 > > Attachments: cast_string_date_hive_versions.pdf > > > The goal of this ticket is to extract and return a valid date from a string > value when there is a valid date prefix in the string. > The following table contains a few illustrative examples highlighting what > happens now and what will happen after the proposed changes to ignore > trailing characters. HIVE-20007 introduced some behavior changes around this > area so the table also displays what was the Hive behavior before that change. > ||ID||String value||Before HIVE-20007||Current behavior||Ignore trailing > chars|| > |1|2023-08-03_16:02:00|2023-08-03|null|2023-08-03| > |2|2023-08-03-16:02:00|2023-08-03|null|2023-08-03| > |3|2023-08-0316:02:00|2024-06-11|null|2023-08-03| > |4|03-08-2023|0009-02-12|null|0003-08-20| > |5|2023-08-03 GARBAGE|2023-08-03|2023-08-03|2023-08-03| > |6|2023-08-03TGARBAGE|2023-08-03|2023-08-03|2023-08-03| > |7|2023-08-03_GARBAGE|2023-08-03|null|2023-08-03| > This change partially (see example 3 and 4) restores the behavior changes > introduced by HIVE-20007 and at the same time makes the current behavior of > handling trailing invalid chars more uniform. > This change will have an impact on various Hive SQL functions and operators > (+/-) that accept dates from string values. 
A partial list of affected > functions is outlined below: > * CAST (V AS DATE) > * CAST (V AS TIMESTAMP) > * TO_DATE > * DATE_ADD > * DATE_DIFF > * WEEKOFYEAR > * DAYOFWEEK > * TRUNC -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-28519) Upgrade Maven SureFire Plugin to latest version 3.5.0
[ https://issues.apache.org/jira/browse/HIVE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881049#comment-17881049 ] Stamatis Zampetakis edited comment on HIVE-28519 at 9/11/24 4:47 PM: - Upgrading Surefire is far from trivial. I am working on it as part of HIVE-26332. [~Indhumathi27] If you want to take over I can leave it to you but I would suggest you check HIVE-26332 and the history behind it. was (Author: zabetak): Upgrading Surefire is far from trivial. I am working on it as part of HIVE-26332. [~Indhumathi27] If you want to take over I can leave it to you but I would suggest you check HIVE-26332 and the history behind i.t > Upgrade Maven SureFire Plugin to latest version 3.5.0 > - > > Key: HIVE-28519 > URL: https://issues.apache.org/jira/browse/HIVE-28519 > Project: Hive > Issue Type: Improvement > Security Level: Public(Viewable by anyone) >Reporter: Indhumathi Muthumurugesh >Assignee: Indhumathi Muthumurugesh >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28519) Upgrade Maven SureFire Plugin to latest version 3.5.0
[ https://issues.apache.org/jira/browse/HIVE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881049#comment-17881049 ] Stamatis Zampetakis commented on HIVE-28519: Upgrading Surefire is far from trivial. I am working on it as part of HIVE-26332. [~Indhumathi27] If you want to take over I can leave it to you but I would suggest you check HIVE-26332 and the history behind it. > Upgrade Maven SureFire Plugin to latest version 3.5.0 > - > > Key: HIVE-28519 > URL: https://issues.apache.org/jira/browse/HIVE-28519 > Project: Hive > Issue Type: Improvement > Security Level: Public(Viewable by anyone) >Reporter: Indhumathi Muthumurugesh >Assignee: Indhumathi Muthumurugesh >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28499) Intellij Idea2024 can't import iceberg&hive checkstyle files
[ https://issues.apache.org/jira/browse/HIVE-28499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880889#comment-17880889 ] Stamatis Zampetakis commented on HIVE-28499: {noformat} find . -name "checkstyle.xml" ./standalone-metastore/checkstyle/checkstyle.xml ./checkstyle/checkstyle.xml ./iceberg/checkstyle/checkstyle.xml ./hcatalog/build-support/ant/checkstyle.xml ./storage-api/checkstyle/checkstyle.xml {noformat} > Intellij Idea2024 can't import iceberg&hive checkstyle files > > > Key: HIVE-28499 > URL: https://issues.apache.org/jira/browse/HIVE-28499 > Project: Hive > Issue Type: Improvement >Reporter: Butao Zhang >Priority: Major > Attachments: idea2024-checkstyle-8.28.jpg, > idea2024-checkstyle-tool-error.jpg, > idea2024-hive-checkstyle-8.28versiion-failed.jpg, > idea2024-import-iceberg-checkstyle_1.jpg, > idea2024-import-iceberg-checkstyle_2.jpg, > idea2024-import-iceberg-checkstyle_error_3.jpg, > idea2024-with-checkstyle-plugin-5.86.0.jpg, install_checkstyle_plugin.jpg > > > I upgraded my Intellij from 2022 to 2024 version, and i found that i can't > import iceberg&hive checkstyle files. > {code:java} > IntelliJ IDEA 2024.2.1 (Ultimate Edition) > Build #IU-242.21829.142, built on August 28, 2024{code} > Here are some screen shot & steps of my Intellij 2024: > 1. Install CheckStyle-IDEA plugin > !install_checkstyle_plugin.jpg! > 2. import hive-iceberg checkstyle files using Code Style setting > !idea2024-import-iceberg-checkstyle_1.jpg! > > import this file > [https://github.com/apache/hive/blob/master/iceberg/checkstyle/checkstyle.xml] > !idea2024-import-iceberg-checkstyle_2.jpg! > > 3. import checkstyle failed > !idea2024-import-iceberg-checkstyle_error_3.jpg! > > 4. 
Checkstyle tool also failed > {code:java} > com.puppycrawl.tools.checkstyle.api.CheckstyleException: > SuppressWithNearbyCommentFilter is not allowed as a child in Checker > at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:501) > at > com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:201) > at > org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:61) > at > org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:26) > at > org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.executeCommand(CheckstyleActionsImpl.java:116) > at > org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:60) > at > org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:51) > at > org.infernus.idea.checkstyle.checker.CheckerFactoryWorker.run(CheckerFactoryWorker.java:42) > {code} > !idea2024-checkstyle-tool-error.jpg! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
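For reference, the exception above is typically raised when SuppressWithNearbyCommentFilter is declared directly under the Checker root module; Checkstyle expects this filter to be nested inside TreeWalker. A minimal sketch of a valid placement follows (this is not the project's actual checkstyle.xml, and the property values are placeholders):

```xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
    "https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name="Checker">
  <module name="TreeWalker">
    <!-- Must be a child of TreeWalker, not of Checker; declaring it under
         Checker produces "SuppressWithNearbyCommentFilter is not allowed
         as a child in Checker". -->
    <module name="SuppressWithNearbyCommentFilter">
      <property name="commentFormat" value="CHECKSTYLE IGNORE (\w+)"/>
      <property name="checkFormat" value="$1"/>
    </module>
  </module>
</module>
```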
[jira] [Commented] (HIVE-24167) TPC-DS query 14 fails while generating plan for the filter
[ https://issues.apache.org/jira/browse/HIVE-24167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880320#comment-17880320 ] Stamatis Zampetakis commented on HIVE-24167: If the failures in PlanMapper: * cause plan changes, then existing tests should be able to capture this since the diff with the golden files will fail * do not cause plan changes, then addressing the failures may not be worth the effort thus I feel that we don't necessarily need a toggle but I don't feel strongly about it. Feel free to choose whatever you think is best. Skipping the failures is very different from what has been done in the previous PRs, so I think it is better to put it in a new PR instead of updating the existing ones. > TPC-DS query 14 fails while generating plan for the filter > -- > > Key: HIVE-24167 > URL: https://issues.apache.org/jira/browse/HIVE-24167 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Stamatis Zampetakis >Assignee: Shohei Okumiya >Priority: Major > Labels: hive-4.1.0-must, pull-request-available > > TPC-DS query 14 (cbo_query14.q and query4.q) fail with NPE on the metastore > with the partitioned TPC-DS 30TB dataset while generating the plan for the > filter. > The problem can be reproduced using the PR in HIVE-23965. > The current stacktrace shows that the NPE appears while trying to display the > debug message but even if this line didn't exist it would fail again later on. 
> {noformat} > java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10867) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlanForSubQueryPredicate(SemanticAnalyzer.java:3375) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFilterPlan(SemanticAnalyzer.java:3473) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10819) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11765) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11625) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11622) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11649) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11635) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12417) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:718) > at > 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12519) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:443) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301) > at > org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:171) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:301) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:173) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:414) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:363) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:357) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:129) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(R
[jira] [Commented] (HIVE-28499) Intellij Idea2024 can't import iceberg&hive checkstyle files
[ https://issues.apache.org/jira/browse/HIVE-28499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879482#comment-17879482 ] Stamatis Zampetakis commented on HIVE-28499: Are the two files so different? If not, then it may be worth a try to replace the iceberg one with the older one. Going forward, I think we should keep only one style. This way at least the new code will be correctly formatted. Having multiple styles for the same project is very strange. > Intellij Idea2024 can't import iceberg&hive checkstyle files > > > Key: HIVE-28499 > URL: https://issues.apache.org/jira/browse/HIVE-28499 > Project: Hive > Issue Type: Improvement >Reporter: Butao Zhang >Priority: Major > Attachments: idea2024-checkstyle-8.28.jpg, > idea2024-checkstyle-tool-error.jpg, > idea2024-hive-checkstyle-8.28versiion-failed.jpg, > idea2024-import-iceberg-checkstyle_1.jpg, > idea2024-import-iceberg-checkstyle_2.jpg, > idea2024-import-iceberg-checkstyle_error_3.jpg, > idea2024-with-checkstyle-plugin-5.86.0.jpg, install_checkstyle_plugin.jpg > > > I upgraded my Intellij from 2022 to 2024 version, and i found that i can't > import iceberg&hive checkstyle files. > {code:java} > IntelliJ IDEA 2024.2.1 (Ultimate Edition) > Build #IU-242.21829.142, built on August 28, 2024{code} > Here are some screen shot & steps of my Intellij 2024: > 1. Install CheckStyle-IDEA plugin > !install_checkstyle_plugin.jpg! > 2. import hive-iceberg checkstyle files using Code Style setting > !idea2024-import-iceberg-checkstyle_1.jpg! > > import this file > [https://github.com/apache/hive/blob/master/iceberg/checkstyle/checkstyle.xml] > !idea2024-import-iceberg-checkstyle_2.jpg! > > 3. import checkstyle failed > !idea2024-import-iceberg-checkstyle_error_3.jpg! > > 4. 
Checkstyle tool also failed > {code:java} > com.puppycrawl.tools.checkstyle.api.CheckstyleException: > SuppressWithNearbyCommentFilter is not allowed as a child in Checker > at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:501) > at > com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:201) > at > org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:61) > at > org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:26) > at > org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.executeCommand(CheckstyleActionsImpl.java:116) > at > org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:60) > at > org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:51) > at > org.infernus.idea.checkstyle.checker.CheckerFactoryWorker.run(CheckerFactoryWorker.java:42) > {code} > !idea2024-checkstyle-tool-error.jpg! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28499) Intellij Idea2024 can't import iceberg&hive checkstyle files
[ https://issues.apache.org/jira/browse/HIVE-28499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879233#comment-17879233 ] Stamatis Zampetakis commented on HIVE-28499: The checkstyle configuration for the whole project should be the same. I don't know how we ended up with multiple checkstyle files. If we can unify the configurations with minimal changes to the source code that would be ideal. > Intellij Idea2024 can't import iceberg&hive checkstyle files > > > Key: HIVE-28499 > URL: https://issues.apache.org/jira/browse/HIVE-28499 > Project: Hive > Issue Type: Improvement >Reporter: Butao Zhang >Priority: Major > Attachments: idea2024-checkstyle-8.28.jpg, > idea2024-checkstyle-tool-error.jpg, > idea2024-hive-checkstyle-8.28versiion-failed.jpg, > idea2024-import-iceberg-checkstyle_1.jpg, > idea2024-import-iceberg-checkstyle_2.jpg, > idea2024-import-iceberg-checkstyle_error_3.jpg, > idea2024-with-checkstyle-plugin-5.86.0.jpg, install_checkstyle_plugin.jpg > > > I upgraded my Intellij from 2022 to 2024 version, and i found that i can't > import iceberg&hive checkstyle files. > {code:java} > IntelliJ IDEA 2024.2.1 (Ultimate Edition) > Build #IU-242.21829.142, built on August 28, 2024{code} > Here are some screen shot & steps of my Intellij 2024: > 1. Install CheckStyle-IDEA plugin > !install_checkstyle_plugin.jpg! > 2. import hive-iceberg checkstyle files using Code Style setting > !idea2024-import-iceberg-checkstyle_1.jpg! > > import this file > [https://github.com/apache/hive/blob/master/iceberg/checkstyle/checkstyle.xml] > !idea2024-import-iceberg-checkstyle_2.jpg! > > 3. import checkstyle failed > !idea2024-import-iceberg-checkstyle_error_3.jpg! > > 4. 
Checkstyle tool also failed > {code:java} > com.puppycrawl.tools.checkstyle.api.CheckstyleException: > SuppressWithNearbyCommentFilter is not allowed as a child in Checker > at com.puppycrawl.tools.checkstyle.Checker.setupChild(Checker.java:501) > at > com.puppycrawl.tools.checkstyle.api.AutomaticBean.configure(AutomaticBean.java:201) > at > org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:61) > at > org.infernus.idea.checkstyle.service.cmd.OpCreateChecker.execute(OpCreateChecker.java:26) > at > org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.executeCommand(CheckstyleActionsImpl.java:116) > at > org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:60) > at > org.infernus.idea.checkstyle.service.CheckstyleActionsImpl.createChecker(CheckstyleActionsImpl.java:51) > at > org.infernus.idea.checkstyle.checker.CheckerFactoryWorker.run(CheckerFactoryWorker.java:42) > {code} > !idea2024-checkstyle-tool-error.jpg! > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
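The plugin error above ("SuppressWithNearbyCommentFilter is not allowed as a child in Checker") suggests the Checkstyle version bundled with the plugin only accepts that filter under TreeWalker, not directly under Checker — that reading of the message is an assumption, not something stated in the ticket. As a hypothetical diagnostic (not part of Hive or the CheckStyle-IDEA plugin), one could list where each module sits in a checkstyle.xml before importing it:

```python
# Hypothetical helper: report the parent of every <module> in a checkstyle
# config, to spot filters declared at a level the tool may reject.
import xml.etree.ElementTree as ET

def module_placements(xml_text: str):
    """Return (parent-module, child-module) name pairs for each <module> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    stack = [root]
    while stack:
        elem = stack.pop()
        for child in elem:
            if child.tag == "module":
                pairs.append((elem.get("name"), child.get("name")))
                stack.append(child)
    return pairs

# Minimal stand-in config mirroring the structure the error complains about
# (not the actual Hive/Iceberg checkstyle file):
sample = """
<module name="Checker">
  <module name="SuppressWithNearbyCommentFilter"/>
  <module name="TreeWalker">
    <module name="LineLength"/>
  </module>
</module>
"""

placements = module_placements(sample)
print(placements)
# The ("Checker", "SuppressWithNearbyCommentFilter") pair flags the filter
# sitting directly under Checker, i.e. the placement rejected in the report.
```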
[jira] [Commented] (HIVE-28494) Iceberg: mvn build enables iceberg module by default
[ https://issues.apache.org/jira/browse/HIVE-28494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878881#comment-17878881 ] Stamatis Zampetakis commented on HIVE-28494: [~zhangbutao] Isn't this a duplicate of HIVE-25998? > Iceberg: mvn build enables iceberg module by default > > > Key: HIVE-28494 > URL: https://issues.apache.org/jira/browse/HIVE-28494 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Reporter: Butao Zhang >Assignee: Butao Zhang >Priority: Major > Labels: pull-request-available > > HIVE-25027 hid the iceberg module by default. IMO, we have put lots of > effort into the iceberg module and it is more stable than before. We should > enable the iceberg module by default for mvn builds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26987) InvalidProtocolBufferException when reading column statistics from ORC files
[ https://issues.apache.org/jira/browse/HIVE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878852#comment-17878852 ] Stamatis Zampetakis commented on HIVE-26987: [~zhangbutao] We need to attempt to run the repro again and check if it passes in order to mark this as resolved. With ORC-1361 we will get an empty list instead of an exception but we may need to add some special handling for the empty list in Hive. > InvalidProtocolBufferException when reading column statistics from ORC files > > > Key: HIVE-26987 > URL: https://issues.apache.org/jira/browse/HIVE-26987 > Project: Hive > Issue Type: Bug > Components: HiveServer2, ORC >Affects Versions: 3.1.0, 4.0.0-alpha-2 >Reporter: Stamatis Zampetakis >Priority: Major > Attachments: data.csv.gz, orc_large_column_metadata.q > > > Any attempt to read an ORC file (query an ORC table) having a metadata > section with column statistics exceeding the hardcoded limit of 1GB > ([https://github.com/apache/orc/blob/2ff9001ddef082eaa30e21cbb034f266e0721664/java/core/src/java/org/apache/orc/impl/InStream.java#L41]) > leads to the following exception. > {noformat} > Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol > message was too large. May be malicious. Use > CodedInputStream.setSizeLimit() to increase the size limit. 
> at > com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:162) > at > com.google.protobuf.CodedInputStream$StreamDecoder.readRawBytesSlowPathOneChunk(CodedInputStream.java:2940) > at > com.google.protobuf.CodedInputStream$StreamDecoder.readBytesSlowPath(CodedInputStream.java:3021) > at > com.google.protobuf.CodedInputStream$StreamDecoder.readBytes(CodedInputStream.java:2432) > at org.apache.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1718) > at org.apache.orc.OrcProto$StringStatistics.<init>(OrcProto.java:1663) > at > org.apache.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1766) > at > org.apache.orc.OrcProto$StringStatistics$1.parsePartialFrom(OrcProto.java:1761) > at > com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409) > at org.apache.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:6552) > at org.apache.orc.OrcProto$ColumnStatistics.<init>(OrcProto.java:6468) > at > org.apache.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:6678) > at > org.apache.orc.OrcProto$ColumnStatistics$1.parsePartialFrom(OrcProto.java:6673) > at > com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409) > at > org.apache.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:19586) > at > org.apache.orc.OrcProto$StripeStatistics.<init>(OrcProto.java:19533) > at > org.apache.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:19622) > at > org.apache.orc.OrcProto$StripeStatistics$1.parsePartialFrom(OrcProto.java:19617) > at > com.google.protobuf.CodedInputStream$StreamDecoder.readMessage(CodedInputStream.java:2409) > at org.apache.orc.OrcProto$Metadata.<init>(OrcProto.java:20270) > at org.apache.orc.OrcProto$Metadata.<init>(OrcProto.java:20217) > at > org.apache.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:20306) > at > org.apache.orc.OrcProto$Metadata$1.parsePartialFrom(OrcProto.java:20301) > at > 
com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:86) > at > com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:91) > at > com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:48) > at org.apache.orc.OrcProto$Metadata.parseFrom(OrcProto.java:20438) > at > org.apache.orc.impl.ReaderImpl.deserializeStripeStats(ReaderImpl.java:1013) > at > org.apache.orc.impl.ReaderImpl.getVariantStripeStatistics(ReaderImpl.java:317) > at > org.apache.orc.impl.ReaderImpl.getStripeStatistics(ReaderImpl.java:1047) > at > org.apache.orc.impl.ReaderImpl.getStripeStatistics(ReaderImpl.java:1034) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1679) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1557) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2900(OrcInputFormat.java:1342) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1529) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputF
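The failure mode here is a hard size cap rather than corrupt data: protobuf's CodedInputStream refuses any message larger than its configured limit, and the reader hits that before decoding a single statistic. A minimal sketch of that guard (illustrative Python, not ORC or protobuf code; the 1 GiB figure comes from the hardcoded limit cited in the description):

```python
DEFAULT_SIZE_LIMIT = 1 << 30  # 1 GiB, mirroring the hardcoded limit in the report

def read_metadata(serialized: bytes, size_limit: int = DEFAULT_SIZE_LIMIT) -> bytes:
    """Refuse oversized payloads up front, like CodedInputStream's size check."""
    if len(serialized) > size_limit:
        raise ValueError("Protocol message was too large. May be malicious. "
                         "Use CodedInputStream.setSizeLimit() to increase the size limit.")
    return serialized  # a real reader would decode the column statistics here

# With a deliberately tiny limit, a 6-byte payload is rejected at 5 bytes:
try:
    read_metadata(b"stats!", size_limit=5)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True: the payload is refused regardless of its contents
```

This is why raising the limit (or, per ORC-1361, returning an empty statistics list) changes behavior without the file itself being malformed.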
[jira] [Commented] (HIVE-28408) Support ARRAY field access in CBO
[ https://issues.apache.org/jira/browse/HIVE-28408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875517#comment-17875517 ] Stamatis Zampetakis commented on HIVE-28408: To better understand the purpose of this ticket, let's use a simpler and more readable example. {code:sql} CREATE TABLE book ( bid int, title string, author struct<aid:int, name:string, addresses:array<struct<street:string, num:int, gcs:struct<latitude:double, longitude:double>>>> ) STORED AS PARQUET; INSERT INTO TABLE book VALUES ( 1, 'Les Miserables', named_struct('aid', 100, 'name', 'Victor-Hugo', 'addresses', array( named_struct('street', 'Avenue Champs-Elysees', 'num', 42, 'gcs', named_struct('latitude', 48.8701431D, 'longitude', 2.3051376D)), named_struct('street', 'Rue de Rivoli', 'num', 8, 'gcs', named_struct('latitude', 48.8554165D, 'longitude', 2.3582763D)) ))); SELECT author.addresses.gcs.latitude FROM book; {code} The query returns the following result. {noformat} [48.8701431,48.8554165] {noformat} Observe that "addresses" is a complex type: an ARRAY of STRUCT. The addresses.gcs.latitude expression aims to drill in/navigate/extract specific fields from the ARRAY while keeping the structure intact. This operation is similar to XPath and JSON navigational patterns and is not part of the SQL standard, although it is supported by some DBMSs. Currently, when a query contains such expressions, CBO fails and we fall back to the legacy optimizer. 
{noformat} org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Unexpected rexnode : org.apache.calcite.rex.RexFieldAccess at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createNestedColumnRefExpr(RexNodeExprFactory.java:629) at org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createNestedColumnRefExpr(RexNodeExprFactory.java:97) at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:903) at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1481) at org.apache.hadoop.hive.ql.lib.CostLessRuleDispatcher.dispatch(CostLessRuleDispatcher.java:66) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) at org.apache.hadoop.hive.ql.lib.ExpressionWalker.walk(ExpressionWalker.java:101) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) at org.apache.hadoop.hive.ql.parse.type.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:231) at org.apache.hadoop.hive.ql.parse.type.RexNodeTypeCheck.genExprNode(RexNodeTypeCheck.java:40) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genAllRexNode(CalcitePlanner.java:5376) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genRexNode(CalcitePlanner.java:5333) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.internalGenSelectLogicalPlan(CalcitePlanner.java:4660) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genSelectLogicalPlan(CalcitePlanner.java:4418) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.genLogicalPlan(CalcitePlanner.java:5087) at org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1629) at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner$CalcitePlannerAction.apply(CalcitePlanner.java:1572) at org.apache.calcite.tools.Frameworks.lambda$withPlanner$0(Frameworks.java:131) at org.apache.calcite.prepare.CalcitePrepareImpl.perform(CalcitePrepareImpl.java:914) at org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:180) at org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:126) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1324) {noformat} Since the physical layer of Hive is able to handle field access over ARRAY types, we have to find a CBO (RexNode) expression that allows us to express field access over ARRAY expressions. This can be achieved by introducing a new Hive-specific operator (e.g. COMPONENT_ACCESS) that takes an expression of an ARRAY type and alters its type so that we can perform field access as if it were a regular STRUCT. The CBO plan for the query above would look like the following: {noformat} CBO PLAN: HiveProject(latitude=[COMPONENT_ACCESS($2.addresses).gcs.latitude]) HiveTableScan(table=[[default, book]], table:alias=[book]) {noformat} The new operator acts mainly as syntactic sugar, so when we translate it back to AST we can treat it as a NOOP. A proof of concept using the COMPONENT_ACCESS operator is attached.
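The navigation semantics described above — field access distributing element-wise over an ARRAY while keeping its shape — can be sketched with plain Python data. The values are the book/address rows from the example; this illustrates the intended result only and is not Hive code:

```python
# The author struct from the example, modeled as a dict with a list of
# dicts for the ARRAY-of-STRUCT "addresses" field.
author = {
    "aid": 100,
    "name": "Victor-Hugo",
    "addresses": [
        {"street": "Avenue Champs-Elysees", "num": 42,
         "gcs": {"latitude": 48.8701431, "longitude": 2.3051376}},
        {"street": "Rue de Rivoli", "num": 8,
         "gcs": {"latitude": 48.8554165, "longitude": 2.3582763}},
    ],
}

def field_access(value, field):
    """Drill into a field; over a list, apply element-wise (keeps the ARRAY shape)."""
    if isinstance(value, list):
        return [field_access(v, field) for v in value]
    return value[field]

# author.addresses.gcs.latitude
latitudes = field_access(field_access(author["addresses"], "gcs"), "latitude")
print(latitudes)  # [48.8701431, 48.8554165] — matches the query result above
```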
[jira] [Updated] (HIVE-28408) Support ARRAY field access in CBO
[ https://issues.apache.org/jira/browse/HIVE-28408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-28408: --- Attachment: HIVE-28408.patch > Support ARRAY field access in CBO > - > > Key: HIVE-28408 > URL: https://issues.apache.org/jira/browse/HIVE-28408 > Project: Hive > Issue Type: Sub-task >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: CBO Fallback - Nested column pruning item.docx, > HIVE-28408.patch > > > fname=nested_column_pruning.q > {code:sql} > EXPLAIN > SELECT count(s1.f6), s5.f16.f18.f19 > FROM nested_tbl_1_n1 > GROUP BY s5.f16.f18.f19 > {code} > {noformat} > org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: > Unexpected rexnode : org.apache.calcite.rex.RexFieldAccess{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28408) Support ARRAY field access in CBO
[ https://issues.apache.org/jira/browse/HIVE-28408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-28408: --- Summary: Support ARRAY field access in CBO (was: Support for the nested filed access within Array datatype ) > Support ARRAY field access in CBO > - > > Key: HIVE-28408 > URL: https://issues.apache.org/jira/browse/HIVE-28408 > Project: Hive > Issue Type: Sub-task >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: CBO Fallback - Nested column pruning item.docx > > > fname=nested_column_pruning.q > {code:sql} > EXPLAIN > SELECT count(s1.f6), s5.f16.f18.f19 > FROM nested_tbl_1_n1 > GROUP BY s5.f16.f18.f19 > {code} > {noformat} > org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: > Unexpected rexnode : org.apache.calcite.rex.RexFieldAccess{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28455) Missing dependencies due to upgrade of maven-shade-plugin
[ https://issues.apache.org/jira/browse/HIVE-28455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875429#comment-17875429 ] Stamatis Zampetakis commented on HIVE-28455: It is a bad practice for a single module to publish multiple jars since there is no way to publish multiple pom files for the same module. In the presence of multiple jars we have to decide which is the main artifact that should be used in dependent components and fix things accordingly. > Missing dependencies due to upgrade of maven-shade-plugin > - > > Key: HIVE-28455 > URL: https://issues.apache.org/jira/browse/HIVE-28455 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0, 4.0.0-beta-1, 4.1.0 >Reporter: Kokila N >Assignee: Kokila N >Priority: Major > Labels: hive-4.0.1-must > > For, hive jdbc , we create two jars {{hive-jdbc}} and > {{hive-jdbc-standalone}} (shaded jar/uber jar). > *Reason for change in pom :* > Due to the changes in the maven code after version 3.2.4, when we create a > shaded jar ( {{{}hive-jdbc-standalone{}}}), {{dependency-reduced-pom.xml}} > is generated and dependencies that have been included into the uber JAR will > be removed from the {{dependencies}} section of the generated POM to avoid > duplication. This {{dependency-reduced-pom.xml}} is why the dependencies are > removed from the pom as its common for both {{hive-jdbc}} and > {{{}hive-jdbc-standalone{}}}. So, currently for hive-jdbc , the transitive > dependencies for it are not propagated. > Same applies to hive-beeline and hive-exec modules as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28445) Uber JAR of HiveServer2 JDBC Driver 4.1.0-SNAPSHOT is incompatible with `org.apache.zookeeper.zookeeper:3.9.2`
[ https://issues.apache.org/jira/browse/HIVE-28445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875085#comment-17875085 ] Stamatis Zampetakis commented on HIVE-28445: The Hive JDBC jar has way more things than necessary. Adding more things inside the shaded artifact is not the right approach. Optional or complementary features should be pluggable and not require entire libraries to be packaged in the same jar. From my perspective, the ideal thing would be to not have dependencies on zookeeper (or having them only as optional). > Uber JAR of HiveServer2 JDBC Driver 4.1.0-SNAPSHOT is incompatible with > `org.apache.zookeeper.zookeeper:3.9.2` > -- > > Key: HIVE-28445 > URL: https://issues.apache.org/jira/browse/HIVE-28445 > Project: Hive > Issue Type: Bug >Reporter: Qiheng He >Priority: Major > > - Uber JAR of HiveServer2 JDBC Driver 4.1.0-SNAPSHOT is incompatible with > `org.apache.zookeeper.zookeeper:3.9.2`. This is one of the findings from > https://github.com/apache/shardingsphere/pull/31526 . > - Just a simple compile. > {code:bash} > sdk install java 8.0.422-tem > sdk use java 8.0.422-tem > sdk install maven > git clone g...@github.com:apache/hive.git > cd ./hive/ > git reset --hard b09d76e68bfba6be19733d864b3207f95265d11f > mvn clean install -DskipTests -T1C > mvn clean package -pl packaging -DskipTests -Pdocker > cd ../ > {code} > - Introduce at will. > {code:xml} > <dependency> > <groupId>org.apache.hive</groupId> > <artifactId>hive-jdbc</artifactId> > <version>4.1.0-SNAPSHOT</version> > <classifier>standalone</classifier> > </dependency> > <dependency> > <groupId>org.apache.zookeeper</groupId> > <artifactId>zookeeper</artifactId> > <version>3.9.2</version> > </dependency> > <dependency> > <groupId>org.apache.curator</groupId> > <artifactId>curator-test</artifactId> > <version>5.7.0</version> > <scope>test</scope> > </dependency> > <dependency> > <groupId>org.junit.jupiter</groupId> > <artifactId>junit-jupiter</artifactId> > <version>5.10.3</version> > <scope>test</scope> > </dependency> > <dependency> > <groupId>org.awaitility</groupId> > <artifactId>awaitility</artifactId> > <version>4.2.0</version> > <scope>test</scope> > </dependency> > {code} > - Start a Zookeeper Server in the unit test. 
> {code:java} > import org.apache.curator.CuratorZookeeperClient; > import org.apache.curator.retry.ExponentialBackoffRetry; > import org.apache.curator.test.TestingServer; > import org.awaitility.Awaitility; > import org.junit.jupiter.api.Test; > import java.time.Duration; > public class ZookeeperTest { > @Test > void testZookeeper() throws Exception { > TestingServer testingServer = new TestingServer(); > try ( > CuratorZookeeperClient client = new > CuratorZookeeperClient(testingServer.getConnectString(), > 60 * 1000, 500, null, > new ExponentialBackoffRetry(500, 3, 500 * 3))) { > client.start(); > Awaitility.await().atMost(Duration.ofMillis(500 * > 60)).ignoreExceptions().until(client::isConnected); > } > } > } > {code} > - The following Error Log is obtained. > {code:bash} > [ERROR] 2024-08-14 13:35:55.349 [SyncThread:0] > o.a.z.server.ZooKeeperCriticalThread - Severe unrecoverable error, from > thread : SyncThread:0 > java.lang.NoSuchMethodError: 'long > org.apache.jute.OutputArchive.getDataSize()' > at > org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:291) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:592) > at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:672) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:181) > [ERROR] 2024-08-14 13:35:55.373 [zkservermainrunner] > o.a.zookeeper.server.ZooKeeperServer - Error updating DB > java.io.EOFException: null > at java.base/java.io.DataInputStream.readFully(DataInputStream.java:210) > at java.base/java.io.DataInputStream.readInt(DataInputStream.java:385) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96) > at > org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:67) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:725) > at > 
org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:743) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:711) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:687) > at > org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:646) > at > org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:466) > at > org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:453) > at > org.apache.zookeeper.server.persiste
[jira] [Commented] (HIVE-28455) Missing dependencies due to upgrade of maven-shade-plugin
[ https://issues.apache.org/jira/browse/HIVE-28455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875083#comment-17875083 ] Stamatis Zampetakis commented on HIVE-28455: When the dependencies are shaded into the main published jar the pom should not include them. That's the correct and expected behavior. > Missing dependencies due to upgrade of maven-shade-plugin > - > > Key: HIVE-28455 > URL: https://issues.apache.org/jira/browse/HIVE-28455 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 4.0.0, 4.0.0-beta-1, 4.1.0 >Reporter: Kokila N >Assignee: Kokila N >Priority: Major > Labels: hive-4.0.1-must > > For, hive jdbc , we create two jars {{hive-jdbc}} and > {{hive-jdbc-standalone}} (shaded jar/uber jar). > *Reason for change in pom :* > Due to the changes in the maven code after version 3.2.4, when we create a > shaded jar ( {{{}hive-jdbc-standalone{}}}), {{dependency-reduced-pom.xml}} > is generated and dependencies that have been included into the uber JAR will > be removed from the {{dependencies}} section of the generated POM to avoid > duplication. This {{dependency-reduced-pom.xml}} is why the dependencies are > removed from the pom as its common for both {{hive-jdbc}} and > {{{}hive-jdbc-standalone{}}}. So, currently for hive-jdbc , the transitive > dependencies for it are not propagated. > Same applies to hive-beeline and hive-exec modules as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28449) Infer constant types from columns before strict type validation
[ https://issues.apache.org/jira/browse/HIVE-28449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874909#comment-17874909 ] Stamatis Zampetakis commented on HIVE-28449: The table below summarizes the behavior changes before and after PR#5374 for a comparison expression of the form {{{}E1 op E2{}}}, where op is a comparison operator (=,<,<=,>=,>,!=), E1 is a column reference expression, E2 is a constant/literal holding a numeric value. ||Case||E1 type||E2 type||Before||After||Example expression|| |I|BIGINT|STRING|ERROR/WARN|OK|c_bigint = '9223372036854775807'| |II|DECIMAL|STRING|ERROR/WARN|OK|c_decimal_19_0 = '9223372036854775807'| |III|DOUBLE|BIGINT|ERROR/WARN|OK|c_double = 9223372036854775807| In a nutshell, the change will remove some compilation ERROR/WARNING messages for the above combinations but everything else (including the query plan) will remain unaltered. For cases I, and II, the ERROR/WARN message is misleading since there is no information/precision loss on any side of the comparison. For case III, the ERROR/WARN message is valid since the constant will be converted to DOUBLE and some digits will be truncated (Java long to double). The change in PR#5374 addresses the unintentional behavior changes introduced by HIVE-23100 but at the same time weakens strict type checking (case III) and complicates the semantics of the "hive.strict.checks.type.safety" property. Moreover, it leads to more behavior changes (between Hive 4.0.0 and Hive 4.1.0) which might not be received well by all users. Given that there are both pros and cons with the proposed changes here, I am more inclined to void this ticket and accept the existing behavior where strict type comparisons are done before any kind of type inference but I am fully open to other opinions as well. If the majority feels that the positives outweigh the negatives please leave a comment and review the PR. 
> Infer constant types from columns before strict type validation > --- > > Key: HIVE-28449 > URL: https://issues.apache.org/jira/browse/HIVE-28449 > Project: Hive > Issue Type: Improvement > Components: Query Planning > Environment: >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: backwards-compatibility > > HIVE-2249 introduced some [specialized type inference > logic|https://github.com/apache/hive/blob/5cbffb532a586226500abc498d6505722d62234d/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L972] > that kicks in when there are comparisons between columns and numeric > constant expressions. > Consider for instance a comparison between a BIGINT column and a STRING > constant. > {code:sql} > SELECT * FROM table WHERE c_bigint = '9223372036854775807' > {code} > The type derivation logic will attempt to convert the STRING constant to > BIGINT and evaluate the expression by comparing long values. > Currently (commit 5cbffb532a586226500abc498d6505722d62234d), the query above > throws the following ERROR/WARNING: > {noformat} > Comparing bigint and string may result in loss of information. > {noformat} > This is due to strict type checking (controlled via > hive.strict.checks.type.safety property) that is now applied before the > constant type inference logic described above. > In this case, the ERROR/WARNING is a bit misleading since there is no real > risk for losing precision/information since the STRING constant fits into a > BIGINT (Java long) and the whole comparison can be evaluated without > precision loss. > For quite some time, strict type checking was performed *after* constant type > inference (and not *before*) but the behavior was changed unintentionally by > HIVE-23100. > The goal of this change is to perform constant type inference before strict > type validation (behavior before HIVE-23100) to restore backward > compatibility and remove some unnecessary warnings/errors during compilation. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
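The asymmetry between cases I/II and case III in the comment above can be checked directly: 9223372036854775807 (the largest BIGINT) round-trips exactly through a string but not through a double, since an IEEE 754 double carries only 53 bits of mantissa. A quick Python check, using Python's int/float as stand-ins for Java's long/double:

```python
MAX_BIGINT = 9223372036854775807  # 2**63 - 1

# Cases I/II: the STRING constant converts to BIGINT with no information loss,
# so the strict-type ERROR/WARN there is misleading.
assert int('9223372036854775807') == MAX_BIGINT

# Case III: converting the BIGINT constant to DOUBLE truncates low-order
# digits, so the strict-type warning is justified in that case.
as_double = float(MAX_BIGINT)
print(as_double == MAX_BIGINT)   # False
print(int(as_double))            # 9223372036854775808 — off by one
```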
[jira] [Commented] (HIVE-19741) Update documentation to reflect list of reserved words
[ https://issues.apache.org/jira/browse/HIVE-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874858#comment-17874858 ] Stamatis Zampetakis commented on HIVE-19741: [~okumin] I created INFRA-26047 to request the account creation from the INFRA team with the following info: Display Name: Shohei Okumiya Username: okumin Email: m...@okumin.com > Update documentation to reflect list of reserved words > -- > > Key: HIVE-19741 > URL: https://issues.apache.org/jira/browse/HIVE-19741 > Project: Hive > Issue Type: Improvement > Components: Documentation >Reporter: Matt Burgess >Assignee: Shohei Okumiya >Priority: Minor > > The current list of non-reserved and reserved keywords is on the Hive wiki: > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords > However it does not match the list in code (see the lexer rules here): > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g > On particular example is the "application" keyword, which was discovered > while trying to create a table with a column named "application". > This Jira proposes to align the documentation with the current set of > non-reserved and reserved keywords. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-19741) Update documentation to reflect list of reserved words
[ https://issues.apache.org/jira/browse/HIVE-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874834#comment-17874834 ] Stamatis Zampetakis commented on HIVE-19741: [~okumin] Do you have a confluence JIRA id? If yes please share it and I will give you access to modify the wiki. > Update documentation to reflect list of reserved words > -- > > Key: HIVE-19741 > URL: https://issues.apache.org/jira/browse/HIVE-19741 > Project: Hive > Issue Type: Improvement > Components: Documentation >Reporter: Matt Burgess >Assignee: Shohei Okumiya >Priority: Minor > > The current list of non-reserved and reserved keywords is on the Hive wiki: > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords > However it does not match the list in code (see the lexer rules here): > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g > On particular example is the "application" keyword, which was discovered > while trying to create a table with a column named "application". > This Jira proposes to align the documentation with the current set of > non-reserved and reserved keywords. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28449) Infer constant types from columns before strict type validation
Stamatis Zampetakis created HIVE-28449: -- Summary: Infer constant types from columns before strict type validation Key: HIVE-28449 URL: https://issues.apache.org/jira/browse/HIVE-28449 Project: Hive Issue Type: Improvement Components: Query Planning Environment: Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis HIVE-2249 introduced some [specialized type inference logic|https://github.com/apache/hive/blob/5cbffb532a586226500abc498d6505722d62234d/ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java#L972] that kicks in when there are comparisons between columns and numeric constant expressions. Consider for instance a comparison between a BIGINT column and a STRING constant. {code:sql} SELECT * FROM table WHERE c_bigint = '9223372036854775807' {code} The type derivation logic will attempt to convert the STRING constant to BIGINT and evaluate the expression by comparing long values. Currently (commit 5cbffb532a586226500abc498d6505722d62234d), the query above throws the following ERROR/WARNING: {noformat} Comparing bigint and string may result in loss of information. {noformat} This is due to strict type checking (controlled via hive.strict.checks.type.safety property) that is now applied before the constant type inference logic described above. In this case, the ERROR/WARNING is a bit misleading since there is no real risk for losing precision/information since the STRING constant fits into a BIGINT (Java long) and the whole comparison can be evaluated without precision loss. For quite some time, strict type checking was performed *after* constant type inference (and not *before*) but the behavior was changed unintentionally by HIVE-23100. The goal of this change is to perform constant type inference before strict type validation (behavior before HIVE-23100) to restore backward compatibility and remove some unnecessary warnings/errors during compilation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
[ https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-24907: --- Affects Version/s: (was: 4.0.0) > Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY > --- > > Key: HIVE-24907 > URL: https://issues.apache.org/jira/browse/HIVE-24907 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.4.0, 3.2.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > > The following SQL query returns wrong results when run in TEZ/LLAP: > {code:sql} > SET hive.auto.convert.sortmerge.join=true; > CREATE TABLE tbl (key int,value int); > INSERT INTO tbl VALUES (1, 2000); > INSERT INTO tbl VALUES (2, 2001); > INSERT INTO tbl VALUES (3, 2005); > SELECT sub1.key, sub2.key > FROM > (SELECT a.key FROM tbl a GROUP BY a.key) sub1 > LEFT OUTER JOIN ( > SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key > UNION > SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 > ON sub1.key = sub2.key; > {code} > Actual results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|NULL| > |3|NULL| > Expected results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|2| > |3|3| > The test can be reproduced with {{TestMiniLlapLocalCliDriver}} or > {{TestMiniTezCliDriver}} in older versions of Hive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
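For reference, the expected result in the description can be re-derived by evaluating the query's relational steps over the three inserted rows; the sketch below (plain Python, not Hive) reproduces the "Expected results" table:

```python
tbl = [(1, 2000), (2, 2001), (3, 2005)]

sub1 = sorted({key for key, _ in tbl})                 # GROUP BY a.key
sub2 = ({k for k, v in tbl if v == 2001} |             # UNION of the two
        {k for k, v in tbl if v == 2005})              # filtered aggregates

# LEFT OUTER JOIN sub1 with sub2 ON sub1.key = sub2.key (keys are unique here)
rows = [(k, k if k in sub2 else None) for k in sub1]
print(rows)  # [(1, None), (2, 2), (3, 3)]
```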
[jira] [Resolved] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
[ https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-24907. Fix Version/s: 4.0.0 Resolution: Fixed > Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY > --- > > Key: HIVE-24907 > URL: https://issues.apache.org/jira/browse/HIVE-24907 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.4.0, 3.2.0, 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: 4.0.0 > > > The following SQL query returns wrong results when run in TEZ/LLAP: > {code:sql} > SET hive.auto.convert.sortmerge.join=true; > CREATE TABLE tbl (key int,value int); > INSERT INTO tbl VALUES (1, 2000); > INSERT INTO tbl VALUES (2, 2001); > INSERT INTO tbl VALUES (3, 2005); > SELECT sub1.key, sub2.key > FROM > (SELECT a.key FROM tbl a GROUP BY a.key) sub1 > LEFT OUTER JOIN ( > SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key > UNION > SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 > ON sub1.key = sub2.key; > {code} > Actual results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|NULL| > |3|NULL| > Expected results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|2| > |3|3| > The test can be reproduced with {{TestMiniLlapLocalCliDriver}} or > {{TestMiniTezCliDriver}} in older versions of Hive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
[ https://issues.apache.org/jira/browse/HIVE-24907 ] Stamatis Zampetakis deleted comment on HIVE-24907: was (Author: zabetak): Thanks for looking into this [~soumyakanti.das]. Indeed HIVE-27303 seems to be the right fix for the problem reported here so we can mark this as resolved. Since HIVE-27303 was fixed in 4.0.0 I will assign the same fix version to this ticket as well. > Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY > --- > > Key: HIVE-24907 > URL: https://issues.apache.org/jira/browse/HIVE-24907 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.4.0, 3.2.0, 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The following SQL query returns wrong results when run in TEZ/LLAP: > {code:sql} > SET hive.auto.convert.sortmerge.join=true; > CREATE TABLE tbl (key int,value int); > INSERT INTO tbl VALUES (1, 2000); > INSERT INTO tbl VALUES (2, 2001); > INSERT INTO tbl VALUES (3, 2005); > SELECT sub1.key, sub2.key > FROM > (SELECT a.key FROM tbl a GROUP BY a.key) sub1 > LEFT OUTER JOIN ( > SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key > UNION > SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 > ON sub1.key = sub2.key; > {code} > Actual results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|NULL| > |3|NULL| > Expected results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|2| > |3|3| > The test can be reproduced with {{TestMiniLlapLocalCliDriver}} or > {{TestMiniTezCliDriver}} in older versions of Hive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-27303) select query result is different when enable/disable mapjoin with UNION ALL
[ https://issues.apache.org/jira/browse/HIVE-27303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-27303: --- Fix Version/s: 4.0.0 > select query result is different when enable/disable mapjoin with UNION ALL > --- > > Key: HIVE-27303 > URL: https://issues.apache.org/jira/browse/HIVE-27303 > Project: Hive > Issue Type: Bug >Reporter: Mahesh Raju Somalaraju >Assignee: Mahesh Raju Somalaraju >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > select query result is different when enable/disable mapjoin with UNION ALL > Below are the reproduction steps. > As per the query, when map join is disabled it should not return (duplicate) rows. > The same works fine with map.join=true. > Expected result: Empty rows. > Problem: returning duplicate rows. > Steps: > -- > SET hive.server2.tez.queue.access.check=true; > SET tez.queue.name=default; > SET hive.query.results.cache.enabled=false; > SET hive.fetch.task.conversion=none; > SET hive.execution.engine=tez; > SET hive.stats.autogather=true; > SET hive.server2.enable.doAs=false; > SET hive.auto.convert.join=false; > drop table if exists hive1_tbl_data; > drop table if exists hive2_tbl_data; > drop table if exists hive3_tbl_data; > drop table if exists hive4_tbl_data; > CREATE EXTERNAL TABLE hive1_tbl_data (COLUMID string,COLUMN_FN > string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM > string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > TBLPROPERTIES ( > 'TRANSLATED_TO_EXTERNAL'='true', > 'bucketing_version'='2', > 'external.table.purge'='true', > 'parquet.compression'='SNAPPY'); > CREATE EXTERNAL TABLE hive2_tbl_data (COLUMID string,COLUMN_FN > string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM > string) > ROW 
FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > TBLPROPERTIES ( > 'TRANSLATED_TO_EXTERNAL'='true', > 'bucketing_version'='2', > 'external.table.purge'='true', > 'parquet.compression'='SNAPPY'); > CREATE EXTERNAL TABLE hive3_tbl_data (COLUMID string,COLUMN_FN > string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM > string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > TBLPROPERTIES ( > 'TRANSLATED_TO_EXTERNAL'='true', > 'bucketing_version'='2', > 'external.table.purge'='true', > 'parquet.compression'='SNAPPY'); > CREATE EXTERNAL TABLE hive4_tbl_data (COLUMID string,COLUMN_FN > string,COLUMN_LN string,EMAIL string,COL_UPDATED_DATE timestamp, PK_COLUM > string) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > TBLPROPERTIES ( > 'TRANSLATED_TO_EXTERNAL'='true', > 'bucketing_version'='2', > 'external.table.purge'='true', > 'parquet.compression'='SNAPPY'); > > insert into table hive1_tbl_data select > '1','john','doe','j...@hotmail.com','2014-01-01 12:01:02','4000-1'; > insert into table hive1_tbl_data select > '2','john','doe','j...@hotmail.com','2014-01-01 > 12:01:02','4000-1';insert into table hive2_tbl_data select > '1','john','do
[jira] [Commented] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
[ https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873845#comment-17873845 ] Stamatis Zampetakis commented on HIVE-24907: Thanks for looking into this [~soumyakanti.das]. Indeed HIVE-27303 seems to be the right fix for the problem reported here so we can mark this as resolved. Since HIVE-27303 was fixed in 4.0.0 I will assign the same fix version to this ticket as well. > Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY > --- > > Key: HIVE-24907 > URL: https://issues.apache.org/jira/browse/HIVE-24907 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.4.0, 3.2.0, 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The following SQL query returns wrong results when run in TEZ/LLAP: > {code:sql} > SET hive.auto.convert.sortmerge.join=true; > CREATE TABLE tbl (key int,value int); > INSERT INTO tbl VALUES (1, 2000); > INSERT INTO tbl VALUES (2, 2001); > INSERT INTO tbl VALUES (3, 2005); > SELECT sub1.key, sub2.key > FROM > (SELECT a.key FROM tbl a GROUP BY a.key) sub1 > LEFT OUTER JOIN ( > SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key > UNION > SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 > ON sub1.key = sub2.key; > {code} > Actual results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|NULL| > |3|NULL| > Expected results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|2| > |3|3| > The test can be reproduced with {{TestMiniLlapLocalCliDriver}} or > {{TestMiniTezCliDriver}} in older versions of Hive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-24907) Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY
[ https://issues.apache.org/jira/browse/HIVE-24907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17873846#comment-17873846 ] Stamatis Zampetakis commented on HIVE-24907: Thanks for looking into this [~soumyakanti.das]. Indeed HIVE-27303 seems to be the right fix for the problem reported here so we can mark this as resolved. Since HIVE-27303 was fixed in 4.0.0 I will assign the same fix version to this ticket as well. > Wrong results with LEFT JOIN and subqueries with UNION and GROUP BY > --- > > Key: HIVE-24907 > URL: https://issues.apache.org/jira/browse/HIVE-24907 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.4.0, 3.2.0, 4.0.0 >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The following SQL query returns wrong results when run in TEZ/LLAP: > {code:sql} > SET hive.auto.convert.sortmerge.join=true; > CREATE TABLE tbl (key int,value int); > INSERT INTO tbl VALUES (1, 2000); > INSERT INTO tbl VALUES (2, 2001); > INSERT INTO tbl VALUES (3, 2005); > SELECT sub1.key, sub2.key > FROM > (SELECT a.key FROM tbl a GROUP BY a.key) sub1 > LEFT OUTER JOIN ( > SELECT b.key FROM tbl b WHERE b.value = 2001 GROUP BY b.key > UNION > SELECT c.key FROM tbl c WHERE c.value = 2005 GROUP BY c.key) sub2 > ON sub1.key = sub2.key; > {code} > Actual results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|NULL| > |3|NULL| > Expected results: > ||SUB1.KEY||SUB2.KEY|| > |1|NULL| > |2|2| > |3|3| > The test can be reproduced with {{TestMiniLlapLocalCliDriver}} or > {{TestMiniTezCliDriver}} in older versions of Hive. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28423) The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing the requirement statement for `hive.server2.support.dynamic.service.discovery`
[ https://issues.apache.org/jira/browse/HIVE-28423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17872833#comment-17872833 ] Stamatis Zampetakis commented on HIVE-28423: Hey [~linghengqian], I gave you permissions to edit the wiki. Please check if you are able to modify the desired pages. Unfortunately there is no PR/review model for wiki changes so what you did here is perfect. I checked the proposed suggestions and they make sense to me. I cannot really test these at the moment but I am OK to update the wiki per your suggestions. Once you are done please mark this ticket as resolved. > The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing > the requirement statement for `hive.server2.support.dynamic.service.discovery` > - > > Key: HIVE-28423 > URL: https://issues.apache.org/jira/browse/HIVE-28423 > Project: Hive > Issue Type: Improvement >Reporter: Qiheng He >Assignee: Qiheng He >Priority: Major > > - The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing > the requirement statement for > *hive.server2.support.dynamic.service.discovery*. This is a documentation > issue I noticed at [https://github.com/dbeaver/dbeaver/issues/22777] , where > dbeaver contributors spent 6 months trying to figure out how to start > ZooKeeper Service Discovery on HiveServer2. > - > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLWhenZooKeeperServiceDiscoveryIsEnabled > describes ZooKeeper Service Discovery like this. > {code:bash} > ZooKeeper-based service discovery introduced in Hive 0.14.0 (HIVE-7935) > enables high availability and rolling upgrade for HiveServer2. A JDBC URL > that specifies <zookeeper quorum> needs to be used to make use of these > features. 
> With further changes in Hive 2.0.0 and 1.3.0 (unreleased, HIVE-11581), none > of the additional configuration parameters such as authentication mode, > transport mode, or SSL parameters need to be specified, as they are retrieved > from the ZooKeeper entries along with the hostname. > The JDBC connection URL: jdbc:hive2://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 . > The <zookeeper quorum> is the same as the value of hive.zookeeper.quorum > configuration parameter in hive-site.xml/hiveserver2-site.xml used by > HiveServer2. > {code} > - I did a test at https://github.com/linghengqian/hivesever2-v400-sd-test to > verify that setting *hive.zookeeper.quorum* only on HiveServer2 was not > enough. I found the *hive.server2.support.dynamic.service.discovery* property > defined in the *org.apache.hadoop.hive.conf.HiveConf* class in a > stackoverflow discussion. > - To verify this git, just execute the following shell. Related unit tests > occupy *2181*, *1*, *10002* ports to start Docker Container. > {code:bash} > sdk install java 22.0.2-graalce > sdk use java 22.0.2-graalce > git clone g...@github.com:linghengqian/hivesever2-v400-sd-test.git > cd ./hivesever2-v400-sd-test/ > docker compose -f ./docker-compose-lingh.yml pull > docker compose -f ./docker-compose-lingh.yml up -d > # ... Wait five seconds for HiveServer2 to finish initializing. > ./mvnw clean test > docker compose -f ./docker-compose-lingh.yml down > {code} > - I also searched for the keyword > *hive.server2.support.dynamic.service.discovery* in https://cwiki.apache.org/ > , but I could only find this property in the documentation page of the KNOX > project > https://cwiki.apache.org/confluence/display/KNOX/Dynamic+HA+Provider+Configuration > , which doesn't make sense from my perspective. > - From my perspective, it is reasonable to add the description of > *hive.server2.support.dynamic.service.discovery* properties to the > documentation of apache/hive:4.0.0. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
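For readers wiring up a client, the URL format quoted from the wiki can be sketched as a tiny helper. This is an illustrative snippet, not part of Hive: the function name and the quorum value are invented, and — as the issue stresses — hive.server2.support.dynamic.service.discovery=true must also be set on HiveServer2 for discovery to actually work.

```python
def service_discovery_url(quorum: str, namespace: str = "hiveserver2") -> str:
    # Assembles the connection string described in the wiki excerpt.
    # `quorum` must match the hive.zookeeper.quorum value configured on
    # the HiveServer2 side; `namespace` matches zooKeeperNamespace.
    return (f"jdbc:hive2://{quorum}/"
            f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}")

# Example with a hypothetical three-node ZooKeeper ensemble
url = service_discovery_url("zk1:2181,zk2:2181,zk3:2181")
assert url.startswith("jdbc:hive2://zk1:2181")
assert "serviceDiscoveryMode=zooKeeper" in url
```

The helper only demonstrates the URL shape; the actual discovery handshake is done by the Hive JDBC driver against the ZooKeeper ensemble.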
[jira] [Assigned] (HIVE-28423) The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing the requirement statement for `hive.server2.support.dynamic.service.discovery`
[ https://issues.apache.org/jira/browse/HIVE-28423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-28423: -- Assignee: Qiheng He > The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing > the requirement statement for `hive.server2.support.dynamic.service.discovery` > - > > Key: HIVE-28423 > URL: https://issues.apache.org/jira/browse/HIVE-28423 > Project: Hive > Issue Type: Improvement >Reporter: Qiheng He >Assignee: Qiheng He >Priority: Major > > - The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing > the requirement statement for > *hive.server2.support.dynamic.service.discovery*. This is a documentation > issue I noticed at [https://github.com/dbeaver/dbeaver/issues/22777] , where > dbeaver contributors spent 6 months trying to figure out how to start > ZooKeeper Service Discovery on HiveServer2. > - > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLWhenZooKeeperServiceDiscoveryIsEnabled > describes ZooKeeper Service Discovery like this. > {code:bash} > ZooKeeper-based service discovery introduced in Hive 0.14.0 (HIVE-7935) > enables high availability and rolling upgrade for HiveServer2. A JDBC URL > that specifies <zookeeper quorum> needs to be used to make use of these > features. > With further changes in Hive 2.0.0 and 1.3.0 (unreleased, HIVE-11581), none > of the additional configuration parameters such as authentication mode, > transport mode, or SSL parameters need to be specified, as they are retrieved > from the ZooKeeper entries along with the hostname. > The JDBC connection URL: jdbc:hive2://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 . > The <zookeeper quorum> is the same as the value of hive.zookeeper.quorum > configuration parameter in hive-site.xml/hiveserver2-site.xml used by > HiveServer2. 
> {code} > - I did a test at https://github.com/linghengqian/hivesever2-v400-sd-test to > verify that setting *hive.zookeeper.quorum* only on HiveServer2 was not > enough. I found the *hive.server2.support.dynamic.service.discovery* property > defined in the *org.apache.hadoop.hive.conf.HiveConf* class in a > stackoverflow discussion. > - To verify this git, just execute the following shell. Related unit tests > occupy *2181*, *1*, *10002* ports to start Docker Container. > {code:bash} > sdk install java 22.0.2-graalce > sdk use java 22.0.2-graalce > git clone g...@github.com:linghengqian/hivesever2-v400-sd-test.git > cd ./hivesever2-v400-sd-test/ > docker compose -f ./docker-compose-lingh.yml pull > docker compose -f ./docker-compose-lingh.yml up -d > # ... Wait five seconds for HiveServer2 to finish initializing. > ./mvnw clean test > docker compose -f ./docker-compose-lingh.yml down > {code} > - I also searched for the keyword > *hive.server2.support.dynamic.service.discovery* in https://cwiki.apache.org/ > , but I could only find this property in the documentation page of the KNOX > project > https://cwiki.apache.org/confluence/display/KNOX/Dynamic+HA+Provider+Configuration > , which doesn't make sense from my perspective. > - From my perspective, it is reasonable to add the description of > *hive.server2.support.dynamic.service.discovery* properties to the > documentation of apache/hive:4.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline
[ https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28401. Fix Version/s: 4.1.0 Resolution: Fixed Fixed in [https://github.com/apache/hive/commit/90165d76826439cbad38e10eb126e8710ffc1d28] Thanks for the review [~asolimando] ! > Drop redundant XML test report post-processing from CI pipeline > --- > > Key: HIVE-28401 > URL: https://issues.apache.org/jira/browse/HIVE-28401 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > The [Maven Surefire > plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin] > generates an XML report containing various information regarding the > execution of tests. In case of failures the system-out and system-err output > from the test is saved in the XML file. > The Jenkins pipeline has a post-processing > [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380] > that attempts to remove the system-out and system-err entries from the XML > files generated by Surefire for all tests that passed as an attempt to save > disk space in the Jenkins node. > {code:bash} > # removes all stdout and err for passed tests > xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d > 'testsuite/testcase/system-err[count(../failure)=0]' > {code} > This cleanup step is not necessary since Surefire (3.0.0-M4) is not storing > system-out and system-err for tests that passed. > Moreover, when the XML report file is large xmlstarlet chokes and throws a > "Huge input lookup" error that skips the remaining post-processing steps and > makes the build fail. 
> {noformat} > [2024-07-23T16:11:26.052Z] > ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2: > internal error: Huge input lookup > [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799 INFO > [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio > [2024-07-23T16:11:26.053Z] ^ > [2024-07-23T16:11:43.133Z] Recording test results > [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found. > script returned exit code 3 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
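The effect of the removed xmlstarlet step can be illustrated with a stdlib-only Python sketch that performs the same deletion. This is a paraphrase of the quoted XPath expressions for illustration, not code from the Hive repository, and the sample XML is invented:

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a Surefire report: one passed, one failed test
SAMPLE = """<testsuite>
  <testcase name="passed"><system-out>noise</system-out></testcase>
  <testcase name="failed"><failure/><system-out>useful</system-out></testcase>
</testsuite>"""

def strip_passed_output(xml_text: str) -> str:
    # Equivalent of: xmlstarlet ed -d 'testsuite/testcase/system-out[count(../failure)=0]'
    # (and the same for system-err): drop captured output only from
    # testcases that have no <failure> child.
    root = ET.fromstring(xml_text)
    for case in root.iter("testcase"):
        if case.find("failure") is None:
            for tag in ("system-out", "system-err"):
                for child in case.findall(tag):
                    case.remove(child)
    return ET.tostring(root, encoding="unicode")

cleaned = strip_passed_output(SAMPLE)
assert "noise" not in cleaned   # output of the passed test is removed
assert "useful" in cleaned      # output of the failed test is kept
```

Since Surefire 3.0.0-M4 no longer stores system-out/system-err for passed tests, this whole transformation became a no-op, which is why the step could be dropped.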
[jira] [Resolved] (HIVE-28376) Remove unused Hive object from RelOptHiveTable
[ https://issues.apache.org/jira/browse/HIVE-28376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28376. Fix Version/s: 4.1.0 Resolution: Fixed Fixed in [https://github.com/apache/hive/commit/59e8f0d9eac8fce6a0586ef1a3deef53a774c86a] Thanks for the reviews [~simhadri-g] and [~kokila19]! > Remove unused Hive object from RelOptHiveTable > -- > > Key: HIVE-28376 > URL: https://issues.apache.org/jira/browse/HIVE-28376 > Project: Hive > Issue Type: Task > Components: CBO >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > The > [Hive|https://github.com/apache/hive/blob/b18d5732b4f309fdc3b8226847c9c1ebcd2476fd/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java] > object is not used inside RelOptHiveTable so keeping a reference to it is > wasting memory and also complicates creation of RelOptHiveTable objects > (constructor parameter). > Moreover, the Hive objects have thread local scope so in general they > shouldn't be passed around because their lifecycle becomes harder to manage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
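The thread-local lifecycle concern mentioned above can be illustrated with a small Python sketch, where threading.local stands in for Hive's per-thread session object. The names are invented for the example; the point is only that thread-local state should be resolved at the point of use rather than captured in long-lived objects:

```python
import threading

session = threading.local()  # stands in for a per-thread session object

def get_session_value():
    # Safe pattern: resolve the thread-local where it is needed.
    return getattr(session, "value", None)

def worker(results):
    session.value = "worker-session"  # this thread's own copy
    results.append(get_session_value())

session.value = "main-session"
results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
t.join()

# Each thread only ever sees its own state; a reference captured in one
# thread and stored in a shared, long-lived object (like a plan node)
# would be stale or wrong for every other thread that touches it.
assert get_session_value() == "main-session"
assert results == ["worker-session"]
```

Dropping the reference from RelOptHiveTable follows the same reasoning: the planner object outlives (and is not tied to) any single thread's session.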
[jira] [Commented] (HIVE-28423) The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing the requirement statement for `hive.server2.support.dynamic.service.discovery`
[ https://issues.apache.org/jira/browse/HIVE-28423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870187#comment-17870187 ] Stamatis Zampetakis commented on HIVE-28423: [~linghengqian] can you clarify what exactly is the modification that you are proposing? If you want to contribute to the wiki yourself please create an account and give me your username so that I can give you the appropriate permissions. > The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing > the requirement statement for `hive.server2.support.dynamic.service.discovery` > - > > Key: HIVE-28423 > URL: https://issues.apache.org/jira/browse/HIVE-28423 > Project: Hive > Issue Type: Improvement >Reporter: Qiheng He >Priority: Major > > - The doc for enabling ZooKeeper Service Discovery on HiveServer2 is missing > the requirement statement for > *hive.server2.support.dynamic.service.discovery*. This is a documentation > issue I noticed at [https://github.com/dbeaver/dbeaver/issues/22777] , where > dbeaver contributors spent 6 months trying to figure out how to start > ZooKeeper Service Discovery on HiveServer2. > - > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-ConnectionURLWhenZooKeeperServiceDiscoveryIsEnabled > describes ZooKeeper Service Discovery like this. > {code:bash} > ZooKeeper-based service discovery introduced in Hive 0.14.0 (HIVE-7935) > enables high availability and rolling upgrade for HiveServer2. A JDBC URL > that specifies <zookeeper quorum> needs to be used to make use of these > features. > With further changes in Hive 2.0.0 and 1.3.0 (unreleased, HIVE-11581), none > of the additional configuration parameters such as authentication mode, > transport mode, or SSL parameters need to be specified, as they are retrieved > from the ZooKeeper entries along with the hostname. > The JDBC connection URL: jdbc:hive2://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2 . 
> The <zookeeper quorum> is the same as the value of hive.zookeeper.quorum > configuration parameter in hive-site.xml/hiveserver2-site.xml used by > HiveServer2. > {code} > - I did a test at https://github.com/linghengqian/hivesever2-v400-sd-test to > verify that setting *hive.zookeeper.quorum* only on HiveServer2 was not > enough. I found the *hive.server2.support.dynamic.service.discovery* property > defined in the *org.apache.hadoop.hive.conf.HiveConf* class in a > stackoverflow discussion. > - To verify this git, just execute the following shell. Related unit tests > occupy *2181*, *1*, *10002* ports to start Docker Container. > {code:bash} > sdk install java 22.0.2-graalce > sdk use java 22.0.2-graalce > git clone g...@github.com:linghengqian/hivesever2-v400-sd-test.git > cd ./hivesever2-v400-sd-test/ > docker compose -f ./docker-compose-lingh.yml pull > docker compose -f ./docker-compose-lingh.yml up -d > # ... Wait five seconds for HiveServer2 to finish initializing. > ./mvnw clean test > docker compose -f ./docker-compose-lingh.yml down > {code} > - I also searched for the keyword > *hive.server2.support.dynamic.service.discovery* in https://cwiki.apache.org/ > , but I could only find this property in the documentation page of the KNOX > project > https://cwiki.apache.org/confluence/display/KNOX/Dynamic+HA+Provider+Configuration > , which doesn't make sense from my perspective. > - From my perspective, it is reasonable to add the description of > *hive.server2.support.dynamic.service.discovery* properties to the > documentation of apache/hive:4.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28425) Document CAST FORMAT function in the wiki
Stamatis Zampetakis created HIVE-28425: -- Summary: Document CAST FORMAT function in the wiki Key: HIVE-28425 URL: https://issues.apache.org/jira/browse/HIVE-28425 Project: Hive Issue Type: Task Components: Documentation Reporter: Stamatis Zampetakis The CAST(<expr> AS <type> FORMAT <pattern>) function has been implemented in HIVE-21575 but does not appear in the respective page in the wiki: https://cwiki.apache.org/confluence/display/Hive/Hive+UDFs#HiveUDFs-TypeConversionFunctions -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline
[ https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869893#comment-17869893 ] Stamatis Zampetakis commented on HIVE-28401: Since we are removing code, it's not easy to find the right place for adding a comment. How about documenting it in the commit message? > Drop redundant XML test report post-processing from CI pipeline > --- > > Key: HIVE-28401 > URL: https://issues.apache.org/jira/browse/HIVE-28401 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > > The [Maven Surefire > plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin] > generates an XML report containing various information regarding the > execution of tests. In case of failures the system-out and system-err output > from the test is saved in the XML file. > The Jenkins pipeline has a post-processing > [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380] > that attempts to remove the system-out and system-err entries from the XML > files generated by Surefire for all tests that passed as an attempt to save > disk space in the Jenkins node. > {code:bash} > # removes all stdout and err for passed tests > xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d > 'testsuite/testcase/system-err[count(../failure)=0]' > {code} > This cleanup step is not necessary since Surefire (3.0.0-M4) is not storing > system-out and system-err for tests that passed. > Moreover, when the XML report file is large xmlstarlet chokes and throws a > "Huge input lookup" error that skips the remaining post-processing steps and > makes the build fail. 
> {noformat} > [2024-07-23T16:11:26.052Z] > ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2: > internal error: Huge input lookup > [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799 INFO > [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio > [2024-07-23T16:11:26.053Z] ^ > [2024-07-23T16:11:43.133Z] Recording test results > [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found. > script returned exit code 3 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28420) Incorrect URL to Hive mailing lists in HowToContribute Wiki page
[ https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-28420: --- Component/s: Documentation > Incorrect URL to Hive mailing lists in HowToContribute Wiki page > > > Key: HIVE-28420 > URL: https://issues.apache.org/jira/browse/HIVE-28420 > Project: Hive > Issue Type: Bug > Components: Documentation >Reporter: Qiheng He >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: Not Applicable > > > - HowToContribute Wiki page uses a non-existent URL. See > [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] . > {code:bash} > Stay Involved > Contributors should join the Hive mailing lists. In particular the dev list > (to join discussions of changes) and the user list (to help others). > {code} > - Clicking on `Hive mailing lists` takes me to > [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28420) HowToContribute Wiki page uses a non-existent URL
[ https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869883#comment-17869883 ] Stamatis Zampetakis commented on HIVE-28420: Thanks for reporting this [~linghengqian] . It is now fixed, please test and let me know how it looks. > HowToContribute Wiki page uses a non-existent URL > - > > Key: HIVE-28420 > URL: https://issues.apache.org/jira/browse/HIVE-28420 > Project: Hive > Issue Type: Bug >Reporter: Qiheng He >Assignee: Stamatis Zampetakis >Priority: Major > > - HowToContribute Wiki page uses a non-existent URL. See > [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] . > {code:bash} > Stay Involved > Contributors should join the Hive mailing lists. In particular the dev list > (to join discussions of changes) and the user list (to help others). > {code} > - Clicking on `Hive mailing lists` takes me to > [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28420) Incorrect URL to Hive mailing lists in HowToContribute Wiki page
[ https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28420. Fix Version/s: Not Applicable Resolution: Fixed > Incorrect URL to Hive mailing lists in HowToContribute Wiki page > > > Key: HIVE-28420 > URL: https://issues.apache.org/jira/browse/HIVE-28420 > Project: Hive > Issue Type: Bug >Reporter: Qiheng He >Assignee: Stamatis Zampetakis >Priority: Major > Fix For: Not Applicable > > > - HowToContribute Wiki page uses a non-existent URL. See > [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] . > {code:bash} > Stay Involved > Contributors should join the Hive mailing lists. In particular the dev list > (to join discussions of changes) and the user list (to help others). > {code} > - Clicking on `Hive mailing lists` takes me to > [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28420) Incorrect URL to Hive mailing lists in HowToContribute Wiki page
[ https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-28420: --- Summary: Incorrect URL to Hive mailing lists in HowToContribute Wiki page (was: HowToContribute Wiki page uses a non-existent URL) > Incorrect URL to Hive mailing lists in HowToContribute Wiki page > > > Key: HIVE-28420 > URL: https://issues.apache.org/jira/browse/HIVE-28420 > Project: Hive > Issue Type: Bug >Reporter: Qiheng He >Assignee: Stamatis Zampetakis >Priority: Major > > - HowToContribute Wiki page uses a non-existent URL. See > [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] . > {code:bash} > Stay Involved > Contributors should join the Hive mailing lists. In particular the dev list > (to join discussions of changes) and the user list (to help others). > {code} > - Clicking on `Hive mailing lists` takes me to > [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-28420) HowToContribute Wiki page uses a non-existent URL
[ https://issues.apache.org/jira/browse/HIVE-28420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-28420: -- Assignee: Stamatis Zampetakis > HowToContribute Wiki page uses a non-existent URL > - > > Key: HIVE-28420 > URL: https://issues.apache.org/jira/browse/HIVE-28420 > Project: Hive > Issue Type: Bug >Reporter: Qiheng He >Assignee: Stamatis Zampetakis >Priority: Major > > - HowToContribute Wiki page uses a non-existent URL. See > [https://cwiki.apache.org/confluence/display/Hive/HowToContribute] . > {code:bash} > Stay Involved > Contributors should join the Hive mailing lists. In particular the dev list > (to join discussions of changes) and the user list (to help others). > {code} > - Clicking on `Hive mailing lists` takes me to > [https://hadoop.apache.org/hive/mailing_lists.html] which is inaccessible. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline
[ https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869880#comment-17869880 ] Stamatis Zampetakis commented on HIVE-28401: I was checking the content of the test-results.tgz that was generated after running precommit tests for #5364 and it seems that a few .xml files still contain system-out and system-err entries. All of these entries correspond to tests that are *skipped*. This shows that the XML post-processing step is not completely redundant: if we remove it now, we will be storing a bit more information in the test results. However, the difference in archive size between [PR-5364|https://ci.hive.apache.org/job/hive-precommit/job/PR-5364/2/artifact/test-results.tgz] (21.75MB) and [master|https://ci.hive.apache.org/job/hive-precommit/job/master/2238/artifact/test-results.tgz] (20.35MB) is small. When compressed, the system-out/err entries from skipped tests consume about 6.5% (~1.5MB) of extra space. I feel that removing the custom logic at the cost of 1.5MB of extra space is an acceptable trade-off, given that we get rid of the "Huge input lookup" error. [~asolimando], since this information was not available when you initially reviewed the PR, I would like to hear your thoughts. Alternatively, to keep the behavior identical to before, we could wait and hope for SUREFIRE-2254 to be implemented, although I don't find that necessary. 
> Drop redundant XML test report post-processing from CI pipeline > --- > > Key: HIVE-28401 > URL: https://issues.apache.org/jira/browse/HIVE-28401 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > > The [Maven Surefire > plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin] > generates an XML report containing various information regarding the > execution of tests. In case of failures the system-out and system-err output > from the test is saved in the XML file. > The Jenkins pipeline has a post-processing > [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380] > that attempts to remove the system-out and system-err entries from the XML > files generated by Surefire for all tests that passed as an attempt to save > disk space in the Jenkins node. > {code:bash} > # removes all stdout and err for passed tests > xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d > 'testsuite/testcase/system-err[count(../failure)=0]' > {code} > This cleanup step is not necessary since Surefire (3.0.0-M4) is not storing > system-out and system-err for tests that passed. > Moreover, when the XML report file is large xmlstarlet chokes and throws a > "Huge input lookup" error that skips the remaining post-processing steps and > makes the build fail. > {noformat} > [2024-07-23T16:11:26.052Z] > ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2: > internal error: Huge input lookup > [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799 INFO > [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio > [2024-07-23T16:11:26.053Z] ^ > [2024-07-23T16:11:43.133Z] Recording test results > [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found. 
> script returned exit code 3 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
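The cleanup rule discussed in the messages above (strip system-out/system-err from passing tests, optionally from skipped ones, while keeping it for failures) can be sketched with the Python standard library. This is a hypothetical illustration, not code from the Hive pipeline; the function name and the `also_skipped` flag are made up for the example.

```python
# Sketch of the Surefire-report cleanup discussed above, using only the
# standard library. Mirrors the xmlstarlet step: drop <system-out> and
# <system-err> from test cases without a <failure>, with an optional
# switch to also strip them from skipped tests.
import xml.etree.ElementTree as ET

def strip_output(xml_text: str, also_skipped: bool = False) -> str:
    root = ET.fromstring(xml_text)
    for case in list(root.iter("testcase")):
        if case.find("failure") is not None:
            continue  # failing tests keep their captured output
        if case.find("skipped") is not None and not also_skipped:
            continue  # current behavior: skipped tests keep their output
        for tag in ("system-out", "system-err"):
            for elem in case.findall(tag):
                case.remove(elem)
    return ET.tostring(root, encoding="unicode")
```

Run over a report, this keeps the output of failed tests intact while passed-test entries disappear; with `also_skipped=True` it would also reclaim the skipped-test output measured in the comment above.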
[jira] [Created] (HIVE-28415) Disable Develocity build scans when not used
Stamatis Zampetakis created HIVE-28415: -- Summary: Disable Develocity build scans when not used Key: HIVE-28415 URL: https://issues.apache.org/jira/browse/HIVE-28415 Project: Hive Issue Type: Task Components: Build Infrastructure Reporter: Stamatis Zampetakis Develocity build scans were introduced by HIVE-28303 and now they run on every invocation of the mvn command. However, the results of the scans are only published in [ge.apache.org|https://ge.apache.org/scans?search.relativeStartTime=P28D&search.rootProjectNames=hive] by very specific [CI actions|https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/.github/workflows/build.yml#L31]. The build analysis adds a noticeable overhead to build times and resources (CPU & memory), so it shouldn't be active by default because most of the time it is not used. The analysis should only take place when we want to publish the result (CI action) or when it is explicitly requested by a developer. The build scans were also responsible for some OOM errors in CI (HIVE-28402) since they require more memory than a regular build. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28402) Precommit tests fail with OOM when running split-19
[ https://issues.apache.org/jira/browse/HIVE-28402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869325#comment-17869325 ] Stamatis Zampetakis commented on HIVE-28402: I can reproduce the problem locally by running the following command: {noformat} export MAVEN_OPTS=-Xmx2g mvn test -Dtest=TestExchangePartitions,TestAlterPartitions,TestFunctions,TestGetPartitions -s ~/.m2/settings.xml -Dtest.groups= {noformat} Using the ps command I could see that the main maven {{Launcher}} process was occupying 2.5GB of RSS memory before hitting an OOM. Note that the {{Launcher}} process is the one performing the build and analysis, not the one running the tests. {noformat} $ ps -eo pid,rss,cmd | grep Launcher 103750 2532640 /opt/jdks/jdk1.8.0_261/bin/java -Xmx2g -classpath /home/stamatis/Programs/apache-maven-3.6.3/boot/plexus-classworlds-2.6.0.jar -Dclassworlds.conf=/home/stamatis/Programs/apache-maven-3.6.3/bin/m2.conf -Dmaven.home=/home/stamatis/Programs/apache-maven-3.6.3 -Dlibrary.jansi.path=/home/stamatis/Programs/apache-maven-3.6.3/lib/jansi-native -Dmaven.multiModuleProjectDirectory=/home/stamatis/Projects/Apache/hive org.codehaus.plexus.classworlds.launcher.Launcher test -Dtest=TestExchangePartitions,TestAlterPartitions,TestFunctions,TestGetPartitions -s /home/stamatis/.m2/settings.xml -Dtest.groups= {noformat} Note that if the heap is not restricted via MAVEN_OPTS, the build passes since the heap is able to grow further. As Zhihua mentioned, the heap dump analysis shows that we have 2.4M instances of the com.gradle.scan.eventmodel.maven.MvnTestOutput_1_0 class which retain ~1.2GB of heap space. The majority of the space is occupied by the [message content|https://docs.gradle.com/enterprise/event-model-javadoc/com/gradle/scan/eventmodel/maven/MvnTestOutput_1_0.html] of the event. From the Javadoc of this class "An EventData that is published when a test writes to the standard output or standard error during test execution." 
we can infer that writes to standard out/err generate events and that, until they are published, these events remain in memory. This means that the output of tests which write a lot to system out/err accumulates in the heap. > Precommit tests fail with OOM when running split-19 > --- > > Key: HIVE-28402 > URL: https://issues.apache.org/jira/browse/HIVE-28402 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Attachments: image-2024-07-29-18-06-33-250.png, > image-2024-07-29-18-11-37-046.png, image-2024-07-29-18-17-58-271.png > > > The last 3 runs in master all fail with OOM when running split-19: > * > [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2233/pipeline] > * > [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2234/pipeline] > * > [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2235/pipeline] > {noformat} > [2024-07-25T05:57:46.816Z] [INFO] Running > org.apache.hadoop.hive.metastore.client.TestGetPartitions > [2024-07-25T06:00:23.926Z] Exception in thread "Thread-46" > java.lang.OutOfMemoryError: GC overhead limit exceeded > [2024-07-25T06:00:23.926Z]at > java.util.Arrays.copyOfRange(Arrays.java:3664) > [2024-07-25T06:00:23.926Z]at java.lang.String.(String.java:207) > [2024-07-25T06:00:23.926Z]at > java.io.BufferedReader.readLine(BufferedReader.java:356) > [2024-07-25T06:00:24.907Z]at > java.io.BufferedReader.readLine(BufferedReader.java:389) > [2024-07-25T06:00:24.907Z]at > org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:89) > [2024-07-25T06:01:46.664Z] [WARNING] ForkStarter IOException: GC overhead > limit exceeded. 
See the dump file > /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream > [2024-07-25T06:01:55.003Z] [INFO] Running > org.apache.hadoop.hive.metastore.TestFilterHooks > [2024-07-25T06:02:21.747Z] > [2024-07-25T06:02:21.748Z] Exception: java.lang.OutOfMemoryError thrown from > the UncaughtExceptionHandler in thread "Thread-49" > [2024-07-25T06:03:08.707Z] [WARNING] ForkStarter IOException: GC overhead > limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC
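A quick back-of-envelope check on the heap-dump figures in the comment above (2.4M MvnTestOutput_1_0 instances retaining ~1.2GB): the average retained size per event comes out to roughly half a kilobyte, i.e. each buffered write to system out/err carries only a short chunk of text and the problem is their sheer accumulation.

```python
# Average retained heap per Develocity test-output event, derived from
# the figures quoted in the heap-dump analysis above.
instances = 2_400_000            # MvnTestOutput_1_0 instances in the dump
retained_bytes = 1.2 * 1024**3   # ~1.2 GB retained in total
avg = retained_bytes / instances # average retained bytes per event
print(f"~{avg:.0f} bytes per event")
```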
[jira] [Commented] (HIVE-28402) Precommit tests fail with OOM when running split-19
[ https://issues.apache.org/jira/browse/HIVE-28402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869282#comment-17869282 ] Stamatis Zampetakis commented on HIVE-28402: Many thanks for working on this [~dengzh]. Can you please add a few details here about the root cause and the proposed solution for future reference. > Precommit tests fail with OOM when running split-19 > --- > > Key: HIVE-28402 > URL: https://issues.apache.org/jira/browse/HIVE-28402 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > > The last 3 runs in master all fail with OOM when running split-19: > * > [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2233/pipeline] > * > [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2234/pipeline] > * > [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2235/pipeline] > {noformat} > [2024-07-25T05:57:46.816Z] [INFO] Running > org.apache.hadoop.hive.metastore.client.TestGetPartitions > [2024-07-25T06:00:23.926Z] Exception in thread "Thread-46" > java.lang.OutOfMemoryError: GC overhead limit exceeded > [2024-07-25T06:00:23.926Z]at > java.util.Arrays.copyOfRange(Arrays.java:3664) > [2024-07-25T06:00:23.926Z]at java.lang.String.(String.java:207) > [2024-07-25T06:00:23.926Z]at > java.io.BufferedReader.readLine(BufferedReader.java:356) > [2024-07-25T06:00:24.907Z]at > java.io.BufferedReader.readLine(BufferedReader.java:389) > [2024-07-25T06:00:24.907Z]at > org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:89) > [2024-07-25T06:01:46.664Z] [WARNING] ForkStarter IOException: GC overhead > limit exceeded. 
See the dump file > /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream > [2024-07-25T06:01:55.003Z] [INFO] Running > org.apache.hadoop.hive.metastore.TestFilterHooks > [2024-07-25T06:02:21.747Z] > [2024-07-25T06:02:21.748Z] Exception: java.lang.OutOfMemoryError thrown from > the UncaughtExceptionHandler in thread "Thread-49" > [2024-07-25T06:03:08.707Z] [WARNING] ForkStarter IOException: GC overhead > limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded > [2024-07-25T06:03:08.707Z] GC overhead limit exceeded. 
See the dump file > /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream > [2024-07-25T06:03:15.362Z] [ERROR] Error closing test event listener: > [2024-07-25T06:03:15.362Z] java.util.concurrent.CompletionException: > java.lang.OutOfMemoryError: GC overhead limit exceeded > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.CompletableFuture.encodeThrowable > (CompletableFuture.java:273) > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.CompletableFuture.completeThrowable > (CompletableFuture.java:280) > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.CompletableFuture$AsyncRun.run > (CompletableFuture.java:1643) > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.ThreadPoolExecutor.runWorker > (ThreadPoolExecutor.java:1149) > [2024-07-25T06:03:15.362Z] at > java.util.concurrent.ThreadPoolExecutor$Worker.run > (ThreadPoolExecutor.java:624) > [2024-07-25T06:03:15.362Z] at java.lang.Thread.run (Thread.java:748) > [2024-07-25T06:03:15.362Z] Caused by: java.lang.OutOfMemoryError: GC overhead > limit exceeded > [2024-07-25T06:03:15.363Z] [ERROR] GC overhead limit exceeded -> [Help 1] > {noformat} > The OOM is also affecting PR runs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28403) Delete redundant Javadoc for Hive
[ https://issues.apache.org/jira/browse/HIVE-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868899#comment-17868899 ] Stamatis Zampetakis commented on HIVE-28403: We are happy to have you on board. Just to clarify, a JIRA is required for every (PR) contribution. I just wanted to highlight that we should be mindful when opening JIRAs/PRs and weigh the pros and cons. > Delete redundant Javadoc for Hive > - > > Key: HIVE-28403 > URL: https://issues.apache.org/jira/browse/HIVE-28403 > Project: Hive > Issue Type: Wish >Reporter: Caican Cai >Priority: Minor > Labels: pull-request-available > Fix For: Not Applicable > > > Hive has some redundant Javadoc, but there are no comments in it. I think > some Javadoc can be deleted. > {code:java} > // Some comments here > /** >* >*/ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28403) Delete redundant Javadoc for Hive
[ https://issues.apache.org/jira/browse/HIVE-28403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868897#comment-17868897 ] Stamatis Zampetakis commented on HIVE-28403: Hey [~caicancai], thanks for working on this. I really appreciate the time that you took in contributing this PR, but I feel that this kind of change has more negatives than positives for the project. +Negatives:+ * Consumes CI resources (runs are limited in Hive so this PR may block others from running) * Increases the likelihood of merge conflicts during backports * Consumes reviewers' time (for checking and merging) * Consumes contributors' time (they could spend their time on more impactful changes) * Additional JIRA/git/mailing list traffic +Positives:+ * Minor reduction in code size * Other? The above is my personal viewpoint on such contributions and it does not mean that everyone in the Hive community agrees. I tend to avoid merging such contributions because I have a limited amount of time and would like to focus on more impactful changes, but other reviewers may be willing to get this in. > Delete redundant Javadoc for Hive > - > > Key: HIVE-28403 > URL: https://issues.apache.org/jira/browse/HIVE-28403 > Project: Hive > Issue Type: Wish >Reporter: Caican Cai >Priority: Minor > Labels: pull-request-available > > Hive has some redundant Javadoc, but there are no comments in it. I think > some Javadoc can be deleted. > {code:java} > // Some comments here > /** >* >*/ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26332) Upgrade maven-surefire-plugin to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868612#comment-17868612 ] Stamatis Zampetakis commented on HIVE-26332: Hey [~michael-o], apologies for the delay. I saw that SUREFIRE-1934 was already released with 3.3.1 so I tested with that version. After setting {{enableOutErrElements}} to false, things seem to work fine in some tests that I performed locally (my personal opinion is that false should be the default, but this is another discussion). I have now updated the PR to upgrade to 3.3.1 and will see how the full test run goes. > Upgrade maven-surefire-plugin to 3.3.1 > -- > > Key: HIVE-26332 > URL: https://issues.apache.org/jira/browse/HIVE-26332 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently we use 3.0.0-M4 which was released in 2019. Since there have been > multiple bug fixes and improvements. > Worth mentioning that interaction with JUnit5 is much more mature as well and > this is one of the main reasons driving this upgrade. -- This message was sent by Atlassian Jira (v8.20.10#820010)
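For reference, the {{enableOutErrElements}} setting mentioned in the comment above corresponds to a plugin configuration along these lines. This is a sketch assuming the Surefire 3.3.1 parameter name; the actual Hive pom may wire the version and configuration differently.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>3.3.1</version>
  <configuration>
    <!-- do not embed captured system-out/system-err in the XML report -->
    <enableOutErrElements>false</enableOutErrElements>
  </configuration>
</plugin>
```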
[jira] [Updated] (HIVE-26332) Upgrade maven-surefire-plugin to 3.3.1
[ https://issues.apache.org/jira/browse/HIVE-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-26332: --- Summary: Upgrade maven-surefire-plugin to 3.3.1 (was: Upgrade maven-surefire-plugin to 3.2.5) > Upgrade maven-surefire-plugin to 3.3.1 > -- > > Key: HIVE-26332 > URL: https://issues.apache.org/jira/browse/HIVE-26332 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently we use 3.0.0-M4 which was released in 2019. Since there have been > multiple bug fixes and improvements. > Worth mentioning that interaction with JUnit5 is much more mature as well and > this is one of the main reasons driving this upgrade. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28402) Precommit tests fail with OOM when running split-19
Stamatis Zampetakis created HIVE-28402: -- Summary: Precommit tests fail with OOM when running split-19 Key: HIVE-28402 URL: https://issues.apache.org/jira/browse/HIVE-28402 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis The last 3 runs in master all fail with OOM when running split-19: * [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2233/pipeline] * [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2234/pipeline] * [https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/2235/pipeline] {noformat} [2024-07-25T05:57:46.816Z] [INFO] Running org.apache.hadoop.hive.metastore.client.TestGetPartitions [2024-07-25T06:00:23.926Z] Exception in thread "Thread-46" java.lang.OutOfMemoryError: GC overhead limit exceeded [2024-07-25T06:00:23.926Z] at java.util.Arrays.copyOfRange(Arrays.java:3664) [2024-07-25T06:00:23.926Z] at java.lang.String.(String.java:207) [2024-07-25T06:00:23.926Z] at java.io.BufferedReader.readLine(BufferedReader.java:356) [2024-07-25T06:00:24.907Z] at java.io.BufferedReader.readLine(BufferedReader.java:389) [2024-07-25T06:00:24.907Z] at org.apache.maven.surefire.shade.common.org.apache.maven.shared.utils.cli.StreamPumper.run(StreamPumper.java:89) [2024-07-25T06:01:46.664Z] [WARNING] ForkStarter IOException: GC overhead limit exceeded. 
See the dump file /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream [2024-07-25T06:01:55.003Z] [INFO] Running org.apache.hadoop.hive.metastore.TestFilterHooks [2024-07-25T06:02:21.747Z] [2024-07-25T06:02:21.748Z] Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-49" [2024-07-25T06:03:08.707Z] [WARNING] ForkStarter IOException: GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded [2024-07-25T06:03:08.707Z] GC overhead limit exceeded. 
See the dump file /home/jenkins/agent/workspace/hive-precommit_master/standalone-metastore/metastore-server/target/surefire-reports/2024-07-25T05-50-11_022-jvmRun1.dumpstream [2024-07-25T06:03:15.362Z] [ERROR] Error closing test event listener: [2024-07-25T06:03:15.362Z] java.util.concurrent.CompletionException: java.lang.OutOfMemoryError: GC overhead limit exceeded [2024-07-25T06:03:15.362Z] at java.util.concurrent.CompletableFuture.encodeThrowable (CompletableFuture.java:273) [2024-07-25T06:03:15.362Z] at java.util.concurrent.CompletableFuture.completeThrowable (CompletableFuture.java:280) [2024-07-25T06:03:15.362Z] at java.util.concurrent.CompletableFuture$AsyncRun.run (CompletableFuture.java:1643) [2024-07-25T06:03:15.362Z] at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) [2024-07-25T06:03:15.362Z] at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) [2024-07-25T06:03:15.362Z] at java.lang.Thread.run (Thread.java:748) [2024-07-25T06:03:15.362Z] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded [2024-07-25T06:03:15.363Z] [ERROR] GC overhead limit exceeded -> [Help 1] {noformat} The OOM is also affecting PR runs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26369) Hive Insert Overwrite causing Data duplication
[ https://issues.apache.org/jira/browse/HIVE-26369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868595#comment-17868595 ] Stamatis Zampetakis commented on HIVE-26369: [~pengbei] I haven't worked on this so don't know sorry. > Hive Insert Overwrite causing Data duplication > -- > > Key: HIVE-26369 > URL: https://issues.apache.org/jira/browse/HIVE-26369 > Project: Hive > Issue Type: Bug >Reporter: Jayram Kumar >Priority: Critical > > Hive Insert Overwrite is causing Data Duplication. When there is an exception > while writing the file and it gets retried, the existing state does not get > cleaned up. It causes duplication in output. > It happens when the following exception is triggered. > {code:java} > java.io.IOException: java.lang.reflect.InvocationTargetException > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:144) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:277) > at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:214) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at > scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList.hasNext(HiveBaseFunctionResultList.java:83) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > at > org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$12.apply(AsyncRDDActions.scala:127) > at > org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2185) > at > org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2185) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.GeneratedConstructorAccessor15.newInstance(Unknown > Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257) > ... 
20 more > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RetriableException): > Server too busy > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1500) > at org.apache.hadoop.ipc.Client.call(Client.java:1446) > at org.apache.hadoop.ipc.Client.call(Client.java:1356) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116) > at com.sun.proxy.$Proxy20.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:812) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > org.apache.hadoop.io.retry.
[jira] [Work started] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline
[ https://issues.apache.org/jira/browse/HIVE-28401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-28401 started by Stamatis Zampetakis. -- > Drop redundant XML test report post-processing from CI pipeline > --- > > Key: HIVE-28401 > URL: https://issues.apache.org/jira/browse/HIVE-28401 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The [Maven Surefire > plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin] > generates an XML report containing various information regarding the > execution of tests. In case of failures the system-out and system-err output > from the test is saved in the XML file. > The Jenkins pipeline has a post-processing > [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380] > that attempts to remove the system-out and system-err entries from the XML > files generated by Surefire for all tests that passed as an attempt to save > disk space in the Jenkins node. > {code:bash} > # removes all stdout and err for passed tests > xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d > 'testsuite/testcase/system-err[count(../failure)=0]' > {code} > This cleanup step is not necessary since Surefire (3.0.0-M4) is not storing > system-out and system-err for tests that passed. > Moreover, when the XML report file is large xmlstarlet chokes and throws a > "Huge input lookup" error that skips the remaining post-processing steps and > makes the build fail. 
> {noformat} > [2024-07-23T16:11:26.052Z] > ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2: > internal error: Huge input lookup > [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799 INFO > [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio > [2024-07-23T16:11:26.053Z] ^ > [2024-07-23T16:11:43.133Z] Recording test results > [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found. > script returned exit code 3 > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28401) Drop redundant XML test report post-processing from CI pipeline
Stamatis Zampetakis created HIVE-28401: -- Summary: Drop redundant XML test report post-processing from CI pipeline Key: HIVE-28401 URL: https://issues.apache.org/jira/browse/HIVE-28401 Project: Hive Issue Type: Task Components: Testing Infrastructure Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The [Maven Surefire plugin|https://maven.apache.org/surefire/maven-surefire-plugin/#maven-surefire-plugin] generates an XML report containing various information regarding the execution of tests. In case of failures, the system-out and system-err output from the test is saved in the XML file. The Jenkins pipeline has a post-processing [step|https://github.com/apache/hive/blob/78f577d73e5a49ca0f8f1dcae721f3980162872a/Jenkinsfile#L380] that attempts to remove the system-out and system-err entries from the XML files generated by Surefire for all tests that passed, in an attempt to save disk space on the Jenkins node. {code:bash} # removes all stdout and err for passed tests xmlstarlet ed -L -d 'testsuite/testcase/system-out[count(../failure)=0]' -d 'testsuite/testcase/system-err[count(../failure)=0]' {code} This cleanup step is no longer necessary since Surefire (as of 3.0.0-M4) does not store system-out and system-err for tests that passed. Moreover, when the XML report file is large, xmlstarlet chokes and throws a "Huge input lookup" error that skips the remaining post-processing steps and makes the build fail. {noformat} [2024-07-23T16:11:26.052Z] ./itests/qtest/target/surefire-reports/TEST-org.apache.hadoop.hive.cli.split31.TestMiniLlapLocalCliDriver.xml:53539.2: internal error: Huge input lookup [2024-07-23T16:11:26.053Z] 2024-07-23T09:02:51,799 INFO [734aa572-f1e1-4376-8c1c-9666c216e579 main] Sessio [2024-07-23T16:11:26.053Z] ^ [2024-07-23T16:11:43.133Z] Recording test results [2024-07-23T16:11:50.785Z] [Checks API] No suitable checks publisher found. script returned exit code 3 {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
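The effect of the xmlstarlet command above can be sketched in Python's standard library; this is an illustrative model (not the actual CI code), and the sample report layout is an assumption based on the Surefire XML structure:

```python
# Drop <system-out>/<system-err> from testcases that have no <failure> child,
# mimicking the XPath predicate count(../failure)=0 used by xmlstarlet.
import xml.etree.ElementTree as ET

report = """<testsuite>
  <testcase name="passes"><system-out>noise</system-out></testcase>
  <testcase name="fails"><failure/><system-out>useful</system-out></testcase>
</testsuite>"""

root = ET.fromstring(report)
for case in root.findall("testcase"):
    if case.find("failure") is None:  # passed test: discard captured output
        for tag in ("system-out", "system-err"):
            for child in case.findall(tag):
                case.remove(child)

# The failed test keeps its output; the passed test's output is removed.
print(ET.tostring(root, encoding="unicode"))
```

Unlike xmlstarlet, ElementTree streams nothing to disk here, so this sketch sidesteps the "Huge input lookup" limit, though it still loads the whole file into memory.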
[jira] [Commented] (HIVE-25952) Drop HiveRelMdPredicates::getPredicates(Project...) to use that of RelMdPredicates
[ https://issues.apache.org/jira/browse/HIVE-25952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868076#comment-17868076 ] Stamatis Zampetakis commented on HIVE-25952: I created a new PR (#5360) to gauge the impact on existing test cases. I would like to advance this work mainly to avoid potential wrong result issues due to the discrepancies outlined under HIVE-26733. [~asolimando] I feel that HIVE-25966 is redundant. Is this really a blocker for this ticket? If not, I guess we can close HIVE-25966 as won't fix. I like the analysis of differences in the description of this ticket. I agree with everything except the line describing the behavior for {{RexCall}} expressions. It seems that both in Hive and Calcite the arguments of the call are checked. In Hive the check is incomplete/wrong because only the last argument of the call determines the result, while in Calcite all call arguments must be constants. > Drop HiveRelMdPredicates::getPredicates(Project...) to use that of > RelMdPredicates > -- > > Key: HIVE-25952 > URL: https://issues.apache.org/jira/browse/HIVE-25952 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 4.0.0 >Reporter: Alessandro Solimando >Assignee: Alessandro Solimando >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > There are some differences in this method between Hive and Calcite; the idea > of this ticket is to unify the two methods, and then drop the override in > HiveRelMdPredicates in favour of the method of RelMdPredicates. 
> After applying HIVE-25966, the only difference is in the test for constant > expressions, which can be summarized as follows: > ||Expression Type||Is Constant for Hive?||Is Constant for Calcite?|| > |InputRef|False|False| > |Call|True if function is deterministic (arguments are not checked), false > otherwise|True if function is deterministic and all operands are constants, > false otherwise| > |CorrelatedVariable|False|False| > |LocalRef|False|False| > |Over|False|False| > |DynamicParameter|False|True| > |RangeRef|False|False| > |FieldAccess|False|Given expr.field, true if expr is constant, false > otherwise| -- This message was sent by Atlassian Jira (v8.20.10#820010)
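The discrepancy described in the comment above can be modeled with a hypothetical mini-checker (the names and expression encoding here are illustrative, not actual Hive/Calcite classes): for a deterministic call, the old Hive check effectively looks only at the last argument, while Calcite requires every operand to be constant.

```python
# Expressions are modeled as (kind, args) pairs for illustration only.
def hive_is_constant(expr):
    kind, args = expr
    if kind == "literal":
        return True
    if kind == "call":  # incomplete check: only the last argument decides
        return hive_is_constant(args[-1])
    return False  # inputref, correlated variable, over, etc.

def calcite_is_constant(expr):
    kind, args = expr
    if kind == "literal":
        return True
    if kind == "call":  # all operands must be constant
        return all(calcite_is_constant(a) for a in args)
    return False

# plus(inputref, literal): the Hive-style check wrongly reports constant,
# the Calcite-style check does not.
call = ("call", [("inputref", []), ("literal", [])])
print(hive_is_constant(call), calcite_is_constant(call))  # True False
```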
[jira] [Commented] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion
[ https://issues.apache.org/jira/browse/HIVE-28359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867257#comment-17867257 ] Stamatis Zampetakis commented on HIVE-28359: As of now, we retain only the last 5 builds for each PR for at most 2 months. For the master branch, we keep all builds for at least one year. > Discard old builds in Jenkins to avoid disk space exhaustion > > > Key: HIVE-28359 > URL: https://issues.apache.org/jira/browse/HIVE-28359 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > Attachments: builds.txt > > > Currently Jenkins retains the builds from all active branches/PRs. > {code:bash} > for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls > -l $b | wc -l; done | sort -k2 -rn > builds.txt > {code} > Some PRs (e.g., > [PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/]) > with an excessive number of builds (i.e., 66) can easily consume many GBs of > data (PR-5216 uses 13GB for the builds). The first build for PR-5216 was > saved on April 26, 2024 and it is now more than 2 months old. > For master, we currently have all builds since January 2023 (previous builds > were manually removed as part of HIVE-28013). The builds for master currently > occupy 50GB of space. > Due to the above, the disk space (persistent volume) cannot be reclaimed and > it is currently almost full (91% /var/jenkins_home). 
> {noformat} > kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df" > Filesystem 1K-blocks Used Available Use% Mounted on > overlay 98831908 4675004 94140520 5% / > tmpfs 65536 0 65536 0% /dev > tmpfs 6645236 0 6645236 0% /sys/fs/cgroup > /dev/sdb 308521792 278996208 29509200 91% /var/jenkins_home > /dev/sda1 98831908 4675004 94140520 5% /etc/hosts > shm 65536 0 65536 0% /dev/shm > tmpfs 10801128 12 10801116 1% > /run/secrets/kubernetes.io/serviceaccount > tmpfs 6645236 0 6645236 0% /proc/acpi > tmpfs 6645236 0 6645236 0% /proc/scsi > tmpfs 6645236 0 6645236 0% /sys/firmware > {noformat} > Without a discard policy in place we are going to hit HIVE-28013 again, or > other disk-related issues, pretty soon. -- This message was sent by Atlassian Jira (v8.20.10#820010)
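The shell one-liner quoted in the issue can be approximated in Python; this is a rough, illustrative equivalent (the directory layout below is a throwaway example, not the real Jenkins home):

```python
# Count entries under each "builds" directory and sort descending,
# like: find ... -name builds | ... | sort -k2 -rn
import os
import tempfile

def build_counts(jobs_root):
    counts = {}
    for dirpath, dirnames, _ in os.walk(jobs_root):
        if os.path.basename(dirpath) == "builds":
            counts[dirpath] = len(os.listdir(dirpath))
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Throwaway layout standing in for var/jenkins_home/jobs:
root = tempfile.mkdtemp()
for job, n in (("PR-5216", 3), ("master", 5)):
    d = os.path.join(root, job, "builds")
    os.makedirs(d)
    for i in range(n):
        os.mkdir(os.path.join(d, str(i + 1)))

print(build_counts(root))  # jobs with the most builds first
```

Note the original one-liner counts `ls -l` lines (which include a total line), so its numbers are off by one compared with a direct directory listing.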
[jira] [Resolved] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion
[ https://issues.apache.org/jira/browse/HIVE-28359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28359. Fix Version/s: 4.1.0 Resolution: Fixed Fixed in [4835968fcdf44ec759f91dbeafec71bf059de42e|https://github.com/apache/hive/commit/4835968fcdf44ec759f91dbeafec71bf059de42e]. Thanks for the review [~kgyrtkirk]! > Discard old builds in Jenkins to avoid disk space exhaustion > > > Key: HIVE-28359 > URL: https://issues.apache.org/jira/browse/HIVE-28359 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > Attachments: builds.txt > > > Currently Jenkins retains the builds from all active branches/PRs. > {code:bash} > for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls > -l $b | wc -l; done | sort -k2 -rn > builds.txt > {code} > Some PRs (e.g., > [PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/]) > with an excessive number of builds (i.e., 66) can easily consume many GBs of > data (PR-5216 uses 13GB for the builds). The first build for PR-5216 was > saved on April 26, 2024 and it is now more than 2 months old. > For master, we currently have all builds since January 2023 (previous builds > were manually removed as part of HIVE-28013). The builds for master currently > occupy 50GB of space. > Due to the above, the disk space (persistent volume) cannot be reclaimed and > it is currently almost full (91% /var/jenkins_home). 
> {noformat} > kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df" > Filesystem 1K-blocks Used Available Use% Mounted on > overlay 98831908 4675004 94140520 5% / > tmpfs 65536 0 65536 0% /dev > tmpfs 6645236 0 6645236 0% /sys/fs/cgroup > /dev/sdb 308521792 278996208 29509200 91% /var/jenkins_home > /dev/sda1 98831908 4675004 94140520 5% /etc/hosts > shm 65536 0 65536 0% /dev/shm > tmpfs 10801128 12 10801116 1% > /run/secrets/kubernetes.io/serviceaccount > tmpfs 6645236 0 6645236 0% /proc/acpi > tmpfs 6645236 0 6645236 0% /proc/scsi > tmpfs 6645236 0 6645236 0% /sys/firmware > {noformat} > Without a discard policy in place we are going to hit HIVE-28013 again, or > other disk-related issues, pretty soon. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28376) Remove unused Hive object from RelOptHiveTable
Stamatis Zampetakis created HIVE-28376: -- Summary: Remove unused Hive object from RelOptHiveTable Key: HIVE-28376 URL: https://issues.apache.org/jira/browse/HIVE-28376 Project: Hive Issue Type: Task Components: CBO Reporter: Stamatis Zampetakis Assignee: Stamatis Zampetakis The [Hive|https://github.com/apache/hive/blob/b18d5732b4f309fdc3b8226847c9c1ebcd2476fd/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java] object is not used inside RelOptHiveTable, so keeping a reference to it wastes memory and also complicates the creation of RelOptHiveTable objects (constructor parameter). Moreover, Hive objects have thread-local scope, so in general they shouldn't be passed around because their lifecycle becomes harder to manage. -- This message was sent by Atlassian Jira (v8.20.10#820010)
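The thread-local lifecycle hazard mentioned above can be illustrated generically (this sketch has nothing to do with Hive's actual code; the names are made up): state bound to one thread is simply invisible to another, so an object that captures a reference to thread-local state can outlive or miss the state it expects.

```python
# A value stored in a threading.local in the main thread does not exist
# from the perspective of a worker thread.
import threading

local = threading.local()
local.session = "opened-in-main"

seen = {}
def worker():
    # the worker thread gets a fresh, empty thread-local view
    seen["has_session"] = hasattr(local, "session")

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen)  # {'has_session': False}
```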
[jira] [Resolved] (HIVE-28314) Support non-boolean WHERE conditions in CBO
[ https://issues.apache.org/jira/browse/HIVE-28314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28314. Fix Version/s: 4.1.0 Resolution: Fixed Fixed in [b18d5732b4f309fdc3b8226847c9c1ebcd2476fd|https://github.com/apache/hive/commit/b18d5732b4f309fdc3b8226847c9c1ebcd2476fd]. Thanks for the PR [~soumyakanti.das]! > Support non-boolean WHERE conditions in CBO > --- > > Key: HIVE-28314 > URL: https://issues.apache.org/jira/browse/HIVE-28314 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > h3. Filter expression with non-boolean return type > fname=annotate_stats_filter.q > {code:sql} > explain select * from loc_orc where 'foo' > {code} > {noformat} > org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Filter > expression with non-boolean return type. > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
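For comparison with other engines (this aside uses SQLite, not Hive, purely as a runnable illustration; the table and data are made up): some databases implicitly coerce a non-boolean WHERE condition to a truth value instead of rejecting it.

```python
# SQLite converts the WHERE expression to a number: 'foo' -> 0 (false),
# '1' -> 1 (true). Hive with CBO used to reject such queries outright.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc (state TEXT)")
conn.execute("INSERT INTO loc VALUES ('OH'), ('CA')")

none = conn.execute("SELECT * FROM loc WHERE 'foo'").fetchall()
all_rows = conn.execute("SELECT * FROM loc WHERE '1'").fetchall()
print(len(none), len(all_rows))  # 0 2
```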
[jira] [Updated] (HIVE-28314) Support non-boolean WHERE conditions in CBO
[ https://issues.apache.org/jira/browse/HIVE-28314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-28314: --- Summary: Support non-boolean WHERE conditions in CBO (was: Support literals as filter expression with non-boolean return type) > Support non-boolean WHERE conditions in CBO > --- > > Key: HIVE-28314 > URL: https://issues.apache.org/jira/browse/HIVE-28314 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Major > Labels: pull-request-available > > h3. Filter expression with non-boolean return type > fname=annotate_stats_filter.q > {code:sql} > explain select * from loc_orc where 'foo' > {code} > {noformat} > org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: Filter > expression with non-boolean return type. > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28321) Support select alias in the having clause for CBO
[ https://issues.apache.org/jira/browse/HIVE-28321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866341#comment-17866341 ] Stamatis Zampetakis commented on HIVE-28321: This is essentially a revert of HIVE-8194. Unfortunately, there is not much background on why HIVE-8194 opted not to support aliases in the HAVING clause, apart from the fact that it is not standard behavior. At the moment various DBMSs support this feature, and there are various primitives in Calcite as well (CALCITE-1306), so it makes sense to make CBO handle this case. > Support select alias in the having clause for CBO > - > > Key: HIVE-28321 > URL: https://issues.apache.org/jira/browse/HIVE-28321 > Project: Hive > Issue Type: Sub-task >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Labels: pull-request-available > > fname=limit_pushdown_negative.q > {code:sql} > explain select value, sum(key) as sum from src group by value having sum > > 100 limit 20 > {code} > {noformat} > org.apache.hadoop.hive.ql.optimizer.calcite.CalciteSemanticException: > Encountered Select alias 'sum' in having clause 'sum > 100' This non standard > behavior is not supported with cbo on. Turn off cbo for these queries. > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
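As a quick illustration of the feature being requested, here is the same query shape on SQLite, one of the engines that already accepts a select alias in HAVING (the table and data below are made up):

```python
# GROUP BY value, then filter groups by the aliased aggregate "s".
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (key INTEGER, value TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [(90, "a"), (20, "a"), (5, "b")])

rows = conn.execute(
    "SELECT value, SUM(key) AS s FROM src GROUP BY value HAVING s > 100"
).fetchall()
print(rows)  # [('a', 110)]
```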
[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
[ https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866039#comment-17866039 ] Stamatis Zampetakis commented on HIVE-28339: The upgrade was initiated to address CVE-2024-23897 (among other CVEs) which affected ci.hive.apache.org. For more details, please check: https://lists.apache.org/thread/hrfo4x4tylpvf3q25ro6gys64cmcvyjz > Upgrade Jenkins version in CI from 2.332.3 to 2.452.2 > - > > Key: HIVE-28339 > URL: https://issues.apache.org/jira/browse/HIVE-28339 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > > The Jenkins version that is used in [https://ci.hive.apache.org/] is > currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] > which was released in 2022. > The latest stable version at the moment is > [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many > improvements, bug fixes, and CVE fixes. > The Dockerfile that is used to build the Jenkins image can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile] > The Kubernetes deployment files can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28362) Fail to materialize a CTE with VOID
[ https://issues.apache.org/jira/browse/HIVE-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863714#comment-17863714 ] Stamatis Zampetakis commented on HIVE-28362: If we can infer a concrete type from the overall context of the query then that would be a nice improvement and definitely worth contributing in this area. On the other hand, I consider type derivation a different topic from supporting VOID/NULL type in DDLs. > Fail to materialize a CTE with VOID > --- > > Key: HIVE-28362 > URL: https://issues.apache.org/jira/browse/HIVE-28362 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: pull-request-available > > CTE materialization fails when it includes a NULL literal. > {code:java} > set hive.optimize.cte.materialize.full.aggregate.only=false; > set hive.optimize.cte.materialize.threshold=2; > WITH x AS (SELECT null AS null_value) > SELECT * FROM x UNION ALL SELECT * FROM x; {code} > Error message. 
> {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT > creates a VOID type, please use CAST to specify the type, near field: > null_value > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8344) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8303) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7846) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11598) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11461) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12397) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12263) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:638) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13136) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1062) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2390) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2338) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2340) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2501) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2323) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12978) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13085) > at > 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:332) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:508) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
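The usual workaround suggested by the error message is to give the NULL an explicit type with CAST so the materialized schema is well defined. The runnable illustration below uses SQLite (the Hive form would be the same query with, e.g., CAST(NULL AS INT) inside the CTE):

```python
# Typing the NULL column lets the CTE be evaluated (and, in Hive's case,
# materialized) with a concrete schema instead of a VOID type.
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    "WITH x AS (SELECT CAST(NULL AS INT) AS null_value) "
    "SELECT * FROM x UNION ALL SELECT * FROM x"
).fetchall()
print(rows)  # [(None,), (None,)]
```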
[jira] [Commented] (HIVE-28362) Fail to materialize a CTE with VOID
[ https://issues.apache.org/jira/browse/HIVE-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863195#comment-17863195 ] Stamatis Zampetakis commented on HIVE-28362: Currently I don't think we support CTAS with VOID types so if we want to allow CTE materialization with VOID then we should treat CTAS and other cases first. > Fail to materialize a CTE with VOID > --- > > Key: HIVE-28362 > URL: https://issues.apache.org/jira/browse/HIVE-28362 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: pull-request-available > > CTE materialization fails when it includes a NULL literal. > {code:java} > set hive.optimize.cte.materialize.full.aggregate.only=false; > set hive.optimize.cte.materialize.threshold=2; > WITH x AS (SELECT null AS null_value) > SELECT * FROM x UNION ALL SELECT * FROM x; {code} > Error message. > {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT > creates a VOID type, please use CAST to specify the type, near field: > null_value > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8344) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8303) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7846) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11598) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11461) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12397) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12263) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:638) > at > 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13136) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1062) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2390) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2338) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2340) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2501) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2323) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12978) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13085) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:332) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:508) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28362) Fail to materialize a CTE with VOID
[ https://issues.apache.org/jira/browse/HIVE-28362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863194#comment-17863194 ] Stamatis Zampetakis commented on HIVE-28362: I think the behavior is expected and not a bug. There is no possible way to infer the type of the "null" column so we cannot pick the correct type for materializing it. Please check HIVE-11217 for more context. Even without materialization, the query is quite ambiguous since we cannot derive the type of the result. > Fail to materialize a CTE with VOID > --- > > Key: HIVE-28362 > URL: https://issues.apache.org/jira/browse/HIVE-28362 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Shohei Okumiya >Assignee: Shohei Okumiya >Priority: Major > Labels: pull-request-available > > CTE materialization fails when it includes a NULL literal. > {code:java} > set hive.optimize.cte.materialize.full.aggregate.only=false; > set hive.optimize.cte.materialize.threshold=2; > WITH x AS (SELECT null AS null_value) > SELECT * FROM x UNION ALL SELECT * FROM x; {code} > Error message. 
> {code:java} > org.apache.hadoop.hive.ql.parse.SemanticException: CREATE-TABLE-AS-SELECT > creates a VOID type, please use CAST to specify the type, near field: > null_value > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8344) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.deriveFileSinkColTypes(SemanticAnalyzer.java:8303) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7846) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11598) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:11461) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12397) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:12263) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:638) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13136) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.materializeCTE(CalcitePlanner.java:1062) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2390) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2338) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2340) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2501) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:2323) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12978) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13085) > at > 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:465) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:332) > at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:508) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26332) Upgrade maven-surefire-plugin to 3.2.5
[ https://issues.apache.org/jira/browse/HIVE-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17863155#comment-17863155 ] Stamatis Zampetakis commented on HIVE-26332: Currently our CI does not allow the use of https://repository.apache.org/snapshots/ so I can't test this widely. I know what needs to be changed but don't have the permissions to do so. I asked for the necessary privileges and am waiting to get them. > Upgrade maven-surefire-plugin to 3.2.5 > -- > > Key: HIVE-26332 > URL: https://issues.apache.org/jira/browse/HIVE-26332 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently we use 3.0.0-M4, which was released in 2019. Since then, there have > been multiple bug fixes and improvements. > Worth mentioning that the interaction with JUnit 5 is much more mature as well, > and this is one of the main reasons driving this upgrade. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26332) Upgrade maven-surefire-plugin to 3.2.5
[ https://issues.apache.org/jira/browse/HIVE-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862807#comment-17862807 ] Stamatis Zampetakis commented on HIVE-26332: I started a full test run using surefire 3.1.3-SNAPSHOT to test the state after resolving SUREFIRE-1934. Once the tests finish I will share the findings. > Upgrade maven-surefire-plugin to 3.2.5 > -- > > Key: HIVE-26332 > URL: https://issues.apache.org/jira/browse/HIVE-26332 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently we use 3.0.0-M4, which was released in 2019. Since then, there have > been multiple bug fixes and improvements. > Worth mentioning that the interaction with JUnit 5 is much more mature as well, > and this is one of the main reasons driving this upgrade. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
[ https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862763#comment-17862763 ] Stamatis Zampetakis commented on HIVE-28339: The newly upgraded Jenkins image is now available at https://hub.docker.com/repository/docker/apache/hive-ci-jenkins/ and was built from the Dockerfile present in PR#5331. Tomorrow, July 4, 2024, starting at 9:00 UTC, I will start the upgrade process. Basically, the main thing that needs to be done is to update the existing Jenkins Kubernetes deployment (deployment.apps/jenkins) and change the image to "apache/hive-ci-jenkins:lts-jdk21" instead of "kgyrtkirk/htk-jenkins". > Upgrade Jenkins version in CI from 2.332.3 to 2.452.2 > - > > Key: HIVE-28339 > URL: https://issues.apache.org/jira/browse/HIVE-28339 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > > The Jenkins version that is used in [https://ci.hive.apache.org/] is > currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] > which was released in 2022. > The latest stable version at the moment is > [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many > improvements, bug fixes, and CVE fixes. > The Dockerfile that is used to build the Jenkins image can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile] > The Kubernetes deployment files can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion
[ https://issues.apache.org/jira/browse/HIVE-28359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-28359 started by Stamatis Zampetakis. -- > Discard old builds in Jenkins to avoid disk space exhaustion > > > Key: HIVE-28359 > URL: https://issues.apache.org/jira/browse/HIVE-28359 > Project: Hive > Issue Type: Task > Components: Testing Infrastructure >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Attachments: builds.txt > > > Currently Jenkins retains the builds from all active branches/PRs. > {code:bash} > for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls > -l $b | wc -l; done | sort -k2 -rn > builds.txt > {code} > Some PRs (e.g., > [PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/]) > with an excessive number of builds (i.e., 66) can easily consume many GBs of > data (PR-5216 uses 13GB for the builds). The first build for PR-5216 was > saved on April 26, 2024 and it is now more than 2 months old. > For master, we currently have all builds since January 2023 (previous builds > were manually removed as part of HIVE-28013). The builds for master currently > occupy 50GB of space. > Due to the above, the disk space (persistent volume) cannot be reclaimed and > it is currently almost full (91% /var/jenkins_home). 
> {noformat} > kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df" > Filesystem 1K-blocks Used Available Use% Mounted on > overlay 98831908 4675004 94140520 5% / > tmpfs 65536 0 65536 0% /dev > tmpfs 6645236 0 6645236 0% /sys/fs/cgroup > /dev/sdb 308521792 278996208 29509200 91% /var/jenkins_home > /dev/sda1 98831908 4675004 94140520 5% /etc/hosts > shm 65536 0 65536 0% /dev/shm > tmpfs 10801128 12 10801116 1% > /run/secrets/kubernetes.io/serviceaccount > tmpfs 6645236 0 6645236 0% /proc/acpi > tmpfs 6645236 0 6645236 0% /proc/scsi > tmpfs 6645236 0 6645236 0% /sys/firmware > {noformat} > Without a discard policy in place we are going to hit HIVE-28013 again, or > other disk-related issues, pretty soon. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28359) Discard old builds in Jenkins to avoid disk space exhaustion
Stamatis Zampetakis created HIVE-28359:
--
Summary: Discard old builds in Jenkins to avoid disk space exhaustion
Key: HIVE-28359
URL: https://issues.apache.org/jira/browse/HIVE-28359
Project: Hive
Issue Type: Task
Components: Testing Infrastructure
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Attachments: builds.txt

Currently Jenkins retains the builds from all active branches/PRs.

{code:bash}
for b in `find var/jenkins_home/jobs -name "builds"`; do echo -n $b" " ; ls -l $b | wc -l; done | sort -k2 -rn > builds.txt
{code}

Some PRs (e.g., [PR-5216|https://ci.hive.apache.org/job/hive-precommit/view/change-requests/job/PR-5216/]) with an excessive number of builds (i.e., 66) can easily consume many GBs of data (PR-5216 uses 13GB for the builds). The first build for PR-5216 was saved on April 26, 2024 and it is now more than 2 months old.

For master, we currently have all builds since January 2023 (previous builds were manually removed as part of HIVE-28013). The builds for master currently occupy 50GB of space.

Due to the above, the disk space (persistent volume) cannot be reclaimed and it is currently almost full (91% /var/jenkins_home).

{noformat}
kubectl exec jenkins-6858ddb664-l4xfg -- bash -c "df"
Filesystem      1K-blocks      Used  Available Use% Mounted on
overlay          98831908   4675004   94140520   5% /
tmpfs               65536         0      65536   0% /dev
tmpfs             6645236         0    6645236   0% /sys/fs/cgroup
/dev/sdb        308521792 278996208   29509200  91% /var/jenkins_home
/dev/sda1        98831908   4675004   94140520   5% /etc/hosts
shm                 65536         0      65536   0% /dev/shm
tmpfs            10801128        12   10801116   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs             6645236         0    6645236   0% /proc/acpi
tmpfs             6645236         0    6645236   0% /proc/scsi
tmpfs             6645236         0    6645236   0% /sys/firmware
{noformat}

Without a discard policy in place we are going to hit HIVE-28013 again or run into other disk-related issues pretty soon.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
[ https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17861089#comment-17861089 ]

Stamatis Zampetakis commented on HIVE-28339:

I tested starting Jenkins from a vanilla Jenkins image (jenkins/jenkins:lts-jdk17) using the jenkins_home_backup.tar obtained previously.

{noformat}
tar -xvf jenkins_home_backup.tar
docker run -p 35000:8080 -v /home/stamatis/Issues/HIVE-28339/var/jenkins_home:/var/jenkins_home jenkins/jenkins:lts-jdk17
{noformat}

Unfortunately, Jenkins cannot start and there are many SEVERE errors due to the plugins and configuration that are present in the jenkins_home directory.

{noformat}
> docker logs CONTAINER_NAME 2>&1 | grep SEVERE
2024-07-01 08:04:00.498+ [id=32] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Mina SSHD API :: Core v2.12.1-101.v85b_e08b_780dd (mina-sshd-api-core)
2024-07-01 08:04:00.499+ [id=32] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin SSH server v3.322.v159e91f6a_550 (sshd)
2024-07-01 08:04:00.537+ [id=55] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Jenkins GIT server Plugin v1.11 (git-server)
2024-07-01 08:04:00.538+ [id=55] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Pipeline: Deprecated Groovy Libraries v588.v576c103a_ff86 (workflow-cps-global-lib)
2024-07-01 08:04:00.539+ [id=55] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Pipeline: Declarative v2.2064.v5eef7d0982b_e (pipeline-model-definition)
2024-07-01 08:04:00.540+ [id=55] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Pipeline implementation for Blue Ocean v1.25.5 (blueocean-pipeline-api-impl)
2024-07-01 08:04:00.541+ [id=55] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Bitbucket Pipeline for Blue Ocean v1.25.5 (blueocean-bitbucket-pipeline)
2024-07-01 08:04:00.542+ [id=38] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Events API for Blue Ocean v1.25.5 (blueocean-events)
2024-07-01 08:04:00.543+ [id=38] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Git Pipeline for Blue Ocean v1.25.5 (blueocean-git-pipeline)
2024-07-01 08:04:00.543+ [id=38] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin GitHub Pipeline for Blue Ocean v1.25.5 (blueocean-github-pipeline)
2024-07-01 08:04:00.545+ [id=33] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Blue Ocean Pipeline Editor v1.25.5 (blueocean-pipeline-editor)
2024-07-01 08:04:00.546+ [id=33] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Blue Ocean v1.25.5 (blueocean)
2024-07-01 08:04:00.603+ [id=45] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Docker Pipeline v1.28 (docker-workflow)
2024-07-01 08:04:00.704+ [id=45] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading plugin Matrix Authorization Strategy Plugin v3.2.2 (matrix-auth)
2024-07-01 08:04:04.340+ [id=46] SEVERE jenkins.InitReactorRunner$1#onTaskFailed: Failed Loading global config
2024-07-01 08:04:04.342+ [id=26] SEVERE hudson.util.BootFailure#publish: Failed to initialize Jenkins
{noformat}

So, replying to [my own previous comment|https://issues.apache.org/jira/browse/HIVE-28339?focusedCommentId=17860168&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17860168]: it is not possible to use a vanilla Jenkins image and we need to publish and maintain our own custom Jenkins images with all the necessary plugins installed.

I managed to make Jenkins start by slightly modifying the respective [Dockerfile|https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile] that we currently use in CI. I will raise an INFRA ticket to request that https://hub.docker.com/r/apache/hive-ci-jenkins/ be created so we can publish the image there.

I will also create a PR for apache/hive with the Dockerfile so that we have everything in the official Apache namespace.

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
> Issue Type: Task
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] which was released in 2022.
> The latest stable version at the moment is [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins file can be found here: [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here: [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Resolved] (HIVE-28340) Test concurrent JDBC connections with Kerberized cluster, impersonation, and HTTP transport
[ https://issues.apache.org/jira/browse/HIVE-28340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis resolved HIVE-28340.
Fix Version/s: 4.1.0
Resolution: Fixed

Fixed in https://github.com/apache/hive/commit/fe2e17c3ad4773a4b1066ac525f7de2a86572eca

Thanks for the review [~dengzh]!

> Test concurrent JDBC connections with Kerberized cluster, impersonation, and HTTP transport
> ---
>
> Key: HIVE-28340
> URL: https://issues.apache.org/jira/browse/HIVE-28340
> Project: Hive
> Issue Type: Test
> Components: HiveServer2
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> The new test case simulates a scenario with two JDBC clients doing the following in parallel:
> * client 1, continuously opens and closes connections (short-lived connection)
> * client 2, opens a connection, sends a fixed number of simple queries, closes the connection (long-lived connection)
> Since the clients are running in parallel we have one long-lived session in HS2 interleaved with many short ones.
> The test case aims to increase test coverage and guard against regressions in the presence of many interleaved HS2 sessions.
> In older versions, without HIVE-27201, this test fails (with the exception outlined below) when the cluster is Kerberized, and we are using HTTP transport mode with impersonation enabled.
> {noformat}
> javax.security.sasl.SaslException: GSS initiate failed
> at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_261]
> at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:96) ~[libthrift-0.16.0.jar:0.16.0]
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:238) ~[libthrift-0.16.0.jar:0.16.0]
> at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:39) ~[libthrift-0.16.0.jar:0.16.0]
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Resolved] (HIVE-28310) Disable hive.optimize.join.disjunctive.transitive.predicates.pushdown by default
[ https://issues.apache.org/jira/browse/HIVE-28310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis resolved HIVE-28310.
Fix Version/s: 4.1.0
Resolution: Fixed

Fixed in https://github.com/apache/hive/commit/a875a455867979758e24e51f97481f62ad80bc07

Thanks for the reviews [~asolimando] and [~kkasa]!

> Disable hive.optimize.join.disjunctive.transitive.predicates.pushdown by default
>
>
> Key: HIVE-28310
> URL: https://issues.apache.org/jira/browse/HIVE-28310
> Project: Hive
> Issue Type: Task
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> HIVE-25758 introduced hive.optimize.join.disjunctive.transitive.predicates.pushdown to conditionally limit some features of the HiveJoinPushTransitivePredicatesRule which are rather unsafe and can lead to HiveServer2 crashes (OOM, hangs, etc.).
> The property was initially set to true to retain the old behavior and prevent changes in performance for those queries that work fine as is. However, when the property is true there are various known cases/queries that can bring down HS2 completely. When this happens, debugging, finding the root cause, and turning off the property may require lots of effort from developers and users.
> In this ticket, we propose to disable the property by default and thus limit the optimizations performed by the rule (at least till a complete solution is found for the known problematic cases).
> This change favors HS2 stability at the expense of slight performance degradation in certain queries.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
[ https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860827#comment-17860827 ]

Stamatis Zampetakis commented on HIVE-28339:

The bulk of the stateful content that is maintained by Jenkins is located under the "/var/jenkins_home" directory.

{noformat}
kubectl exec jenkins-6858ddb664-sg6nl df
Filesystem      1K-blocks      Used  Available Use% Mounted on
overlay          98831908   4612016   94203508   5% /
tmpfs               65536         0      65536   0% /dev
tmpfs             6645236         0    6645236   0% /sys/fs/cgroup
/dev/sdb        308521792 279898320   28607088  91% /var/jenkins_home
/dev/sda1        98831908   4612016   94203508   5% /etc/hosts
shm                 65536         0      65536   0% /dev/shm
tmpfs            10801128        12   10801116   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs             6645236         0    6645236   0% /proc/acpi
tmpfs             6645236         0    6645236   0% /proc/scsi
tmpfs             6645236         0    6645236   0% /sys/firmware
{noformat}

As expected, the persistent volume used by the Jenkins pod is mounted to the "/var/jenkins_home" directory (see kubectl describe pod/jenkins-6858ddb664-sg6nl).

For testing purposes we need to obtain a backup of the jenkins_home directory and try to mount it to the new (upgraded) Jenkins image to ensure that everything will work smoothly. Currently, the jenkins_home directory is 280GB, which makes a complete local backup and testing impractical.

The majority of the disk space is occupied by the "jobs" directory, and in particular by archives that are kept for each build, test results, and log files for each run. These files are kept for archiving and diagnosability purposes so that users can consult the results of each build. However, they are not indispensable for the correct functionality of the Jenkins instance, so for the sake of our experiments we can exclude them from the backup. The command that was used to create the backup is given below.
{code:bash}
kubectl exec jenkins-6858ddb664-sg6nl -- tar cf - --exclude=junitResult.xml --exclude=*log* --exclude=archive --exclude=workflow --exclude=*git/objects* /var/jenkins_home > jenkins_home_backup.tar
{code}

The command took ~5 minutes to run and created an archive of 1.2GB. The exclusions refer to voluminous files that are nonessential for testing the upgrade.

I am now in the process of testing the new Jenkins image locally by mounting the unpacked jenkins_home_backup.tar directory to the /var/jenkins_home directory of the container.

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
> Issue Type: Task
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] which was released in 2022.
> The latest stable version at the moment is [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins file can be found here: [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here: [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
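As a sanity check, the behaviour of the --exclude patterns can be rehearsed on a throwaway directory before relying on the real backup; the layout and file names below are made up for illustration:

```shell
# Sketch: confirm that tar --exclude drops the voluminous files
# (junit results, logs, archived artifacts) while keeping configuration.
# The directory layout here is hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/jenkins_home/jobs/PR-1/builds/1/archive"
echo cfg   > "$tmp/jenkins_home/config.xml"
echo junit > "$tmp/jenkins_home/jobs/PR-1/builds/1/junitResult.xml"
echo log   > "$tmp/jenkins_home/jobs/PR-1/builds/1/log"
echo big   > "$tmp/jenkins_home/jobs/PR-1/builds/1/archive/artifact.bin"

# A subset of the exclusions used by the real backup command,
# quoted so the local shell does not expand the globs.
tar cf "$tmp/backup.tar" \
    --exclude=junitResult.xml --exclude='*log*' --exclude=archive \
    -C "$tmp" jenkins_home

tar tf "$tmp/backup.tar"   # config.xml survives; the excluded files do not
```

Note that GNU tar matches unanchored exclude patterns against any name component, which is why the bare `--exclude=archive` also drops the nested archive directory.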
[jira] [Assigned] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
[ https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-28339: -- Assignee: Stamatis Zampetakis > Upgrade Jenkins version in CI from 2.332.3 to 2.452.2 > - > > Key: HIVE-28339 > URL: https://issues.apache.org/jira/browse/HIVE-28339 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The Jenkins version that is used in [https://ci.hive.apache.org/] is > currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] > which was released in 2022. > The latest stable version at the moment is > [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many > improvements, bug and CVE fixes. > The Dockerfile that is used to build the Jenkins file can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile] > The Kubernetes deployment files can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
[ https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-28339 started by Stamatis Zampetakis. -- > Upgrade Jenkins version in CI from 2.332.3 to 2.452.2 > - > > Key: HIVE-28339 > URL: https://issues.apache.org/jira/browse/HIVE-28339 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > > The Jenkins version that is used in [https://ci.hive.apache.org/] is > currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] > which was released in 2022. > The latest stable version at the moment is > [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many > improvements, bug and CVE fixes. > The Dockerfile that is used to build the Jenkins file can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile] > The Kubernetes deployment files can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
[ https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860168#comment-17860168 ]

Stamatis Zampetakis commented on HIVE-28339:

Given that in the CI we are using persistent volumes, where all Jenkins configurations and plugins remain as is, I am trying to see if we really need to have and maintain a custom Jenkins image. Any thoughts [~abstractdog]?

> Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
> -
>
> Key: HIVE-28339
> URL: https://issues.apache.org/jira/browse/HIVE-28339
> Project: Hive
> Issue Type: Task
> Reporter: Stamatis Zampetakis
> Priority: Major
>
> The Jenkins version that is used in [https://ci.hive.apache.org/] is currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] which was released in 2022.
> The latest stable version at the moment is [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many improvements, bug and CVE fixes.
> The Dockerfile that is used to build the Jenkins file can be found here: [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile]
> The Kubernetes deployment files can be found here: [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Resolved] (HIVE-28345) Avoid redundant HiveConf creation in MiniHS2.Builder
[ https://issues.apache.org/jira/browse/HIVE-28345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis resolved HIVE-28345.
Fix Version/s: 4.1.0
Resolution: Fixed

Fixed in https://github.com/apache/hive/commit/633af371edf0823967da2dca50c3893855dab626

Thanks for the reviews [~okumin] [~simhadri-g]!

> Avoid redundant HiveConf creation in MiniHS2.Builder
>
>
> Key: HIVE-28345
> URL: https://issues.apache.org/jira/browse/HIVE-28345
> Project: Hive
> Issue Type: Improvement
> Components: Tests
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Every creation of a MiniHS2.Builder object triggers the creation of a [HiveConf object|https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L100]. In many cases this new configuration object is thrown away and replaced by another conf object via the [withConf method|https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L159].
> Creating a HiveConf object is computationally heavy, so for performance reasons it's best to avoid it if possible.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HIVE-28339) Upgrade Jenkins version in CI from 2.332.3 to 2.452.2
[ https://issues.apache.org/jira/browse/HIVE-28339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859862#comment-17859862 ] Stamatis Zampetakis commented on HIVE-28339: Here is a rough outline of the steps that I have in mind. # Build a new Jenkins image using the aforementioned Dockerfile # Push the image to some container registry: ## [https://hub.docker.com/r/kgyrtkirk/htk-jenkins] ## [https://hub.docker.com/r/apache/hive-ci-jenkins/] (Need to request a new Docker namespace from INFRA) ## [https://hub.docker.com/r/zabetak/hive-ci-jenkins/] # Take backups from existing Jenkins instance # Test/Start new Jenkins image locally (and ensure backups are working if necessary) # Send an email to dev@ about estimated downtime # Modify the kubernetes deployment file to point to new image (if necessary) # Restart the Jenkins pod (with or without backups) and hope for the best # Send an email when CI is operational Please add/remove/suggest others as you see fit. > Upgrade Jenkins version in CI from 2.332.3 to 2.452.2 > - > > Key: HIVE-28339 > URL: https://issues.apache.org/jira/browse/HIVE-28339 > Project: Hive > Issue Type: Task >Reporter: Stamatis Zampetakis >Priority: Major > > The Jenkins version that is used in [https://ci.hive.apache.org/] is > currently at [2.332.3|https://www.jenkins.io/changelog-stable/#v2.332.3] > which was released in 2022. > The latest stable version at the moment is > [2.452.2|https://www.jenkins.io/changelog-stable/#v2.452.2] and contains many > improvements, bug and CVE fixes. > The Dockerfile that is used to build the Jenkins file can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/blob/master/htk-jenkins/Dockerfile] > The Kubernetes deployment files can be found here: > [https://github.com/kgyrtkirk/hive-test-kube/tree/master/k8s] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28345) Avoid redundant HiveConf creation in MiniHS2.Builder
Stamatis Zampetakis created HIVE-28345:
--
Summary: Avoid redundant HiveConf creation in MiniHS2.Builder
Key: HIVE-28345
URL: https://issues.apache.org/jira/browse/HIVE-28345
Project: Hive
Issue Type: Improvement
Components: Tests
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis

Every creation of a MiniHS2.Builder object triggers the creation of a [HiveConf object|https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L100]. In many cases this new configuration object is thrown away and replaced by another conf object via the [withConf method|https://github.com/apache/hive/blob/1c9969a003b09abc851ae7e19631ad208d3b6066/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java#L159].

Creating a HiveConf object is computationally heavy, so for performance reasons it's best to avoid it if possible.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HIVE-28311) Backward compatibility of java.sql.Date and java.sql.Timestamp in hive-serde
[ https://issues.apache.org/jira/browse/HIVE-28311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856729#comment-17856729 ]

Stamatis Zampetakis commented on HIVE-28311:

[~wechar] Can you please add more details about the actual use case/setup that leads to this ClassCastException? It would be nice if we can ensure that the changes are backward compatible, but we should be mindful not to generate correctness problems in doing so.

> Backward compatibility of java.sql.Date and java.sql.Timestamp in hive-serde
>
>
> Key: HIVE-28311
> URL: https://issues.apache.org/jira/browse/HIVE-28311
> Project: Hive
> Issue Type: Bug
> Reporter: Wechar
> Assignee: Wechar
> Priority: Major
> Labels: pull-request-available
>
> HIVE-20007 introduced {{org.apache.hadoop.hive.common.type.Date}} and {{org.apache.hadoop.hive.common.type.Timestamp}} to replace {{java.sql.Date}} and {{java.sql.Timestamp}}.
> It's a huge improvement but it also produces incompatibility issues for clients without this update.
> {code:bash}
> Caused by: java.lang.ClassCastException: java.sql.Timestamp cannot be cast to org.apache.hadoop.hive.common.type.Timestamp
> at org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaTimestampObjectInspector.getPrimitiveWritableObject(JavaTimestampObjectInspector.java:33)
> at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getTimestamp(PrimitiveObjectInspectorUtils.java:1232)
> at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TimestampConverter.convert(PrimitiveObjectInspectorConverter.java:291)
> at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getConstantObjectInspector(ObjectInspectorUtils.java:1397)
> at org.apache.hadoop.hive.ql.plan.ExprNodeConstantDesc.getWritableObjectInspector(ExprNodeConstantDesc.java:93)
> at org.apache.hadoop.hive.ql.exec.ExprNodeConstantEvaluator.(ExprNodeConstantEvaluator.java:41)
> at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:49)
> at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.(ExprNodeGenericFuncEvaluator.java:101)
> at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:58)
> at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorFactory.get(ExprNodeEvaluatorFactory.java:43)
> at org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils.prepareExpr(PartExprEvalUtils.java:118)
> at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prunePartitionNames(PartitionPruner.java:551)
> at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:73)
> at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionNamesPrunedByExprNoTxn(ObjectStore.java:3606)
> at org.apache.hadoop.hive.metastore.ObjectStore.access$1000(ObjectStore.java:241)
> at org.apache.hadoop.hive.metastore.ObjectStore$16.getJdoResult(ObjectStore.java:4157)
> at org.apache.hadoop.hive.metastore.ObjectStore$16.getJdoResult(ObjectStore.java:4124)
> at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3913)
> ... 30 more
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HIVE-28337) TestMetaStoreUtils fails for invalid timestamps
[ https://issues.apache.org/jira/browse/HIVE-28337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856721#comment-17856721 ]

Stamatis Zampetakis commented on HIVE-28337:

The description of the ticket implies that there is a bug in the testing methodology but I would argue that the bug is in the production code instead. The test aims to ensure that the conversion from/to string does not alter the value in any way, no matter the timezone. The DATE and TIMESTAMP datatypes in standard SQL are timezone agnostic and the [same semantics are adopted in Hive|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types]. There are still various known issues in Hive for dates/timestamps that fall into the DST shift and the MetaStoreUtils API is probably affected.

> TestMetaStoreUtils fails for invalid timestamps
> ---
>
> Key: HIVE-28337
> URL: https://issues.apache.org/jira/browse/HIVE-28337
> Project: Hive
> Issue Type: Bug
> Reporter: Kiran Velumuri
> Assignee: Kiran Velumuri
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2024-06-18-12-42-05-646.png, image-2024-06-18-12-42-31-472.png
>
>
> The tests org.apache.hadoop.hive.metastore.utils.TestMetaStoreUtils#testTimestampToString and #testDateToString fail for invalid timestamps in the following cases:
> 1. Timestamps in time-zones which observe daylight saving time, during which the clock is set forward (typically 2:00 AM - 3:00 AM)
> Example: 2417-03-26T02:08:43 in Europe/Paris is invalid, and would get converted to 2417-03-26T03:08:43 by the Timestamp.valueOf() method
> This is happening due to representing the timestamp as a LocalDateTime in TestMetaStoreUtils, which is independent of the time-zone of the timestamp. This LocalDateTime timestamp, when combined with a time-zone, leads to an invalid timestamp.
> 2. Timestamps with year as ''
> Example: -01-07T22:44:36 is invalid and would get converted to 0001-01-07T22:44:36 by the Timestamp.valueOf() method
> Year '' is invalid and should not be included while generating the test cases.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Created] (HIVE-28340) Test concurrent JDBC connections with Kerberized cluster, impersonation, and HTTP transport
Stamatis Zampetakis created HIVE-28340:
--
Summary: Test concurrent JDBC connections with Kerberized cluster, impersonation, and HTTP transport
Key: HIVE-28340
URL: https://issues.apache.org/jira/browse/HIVE-28340
Project: Hive
Issue Type: Test
Components: HiveServer2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis

The new test case simulates a scenario with two JDBC clients doing the following in parallel:
* client 1, continuously opens and closes connections (short-lived connection)
* client 2, opens a connection, sends a fixed number of simple queries, closes the connection (long-lived connection)

Since the clients are running in parallel we have one long-lived session in HS2 interleaved with many short ones. The test case aims to increase test coverage and guard against regressions in the presence of many interleaved HS2 sessions.

In older versions, without HIVE-27201, this test fails (with the exception outlined below) when the cluster is Kerberized, and we are using HTTP transport mode with impersonation enabled.

{noformat}
javax.security.sasl.SaslException: GSS initiate failed
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_261]
at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:96) ~[libthrift-0.16.0.jar:0.16.0]
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:238) ~[libthrift-0.16.0.jar:0.16.0]
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:39) ~[libthrift-0.16.0.jar:0.16.0]
{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)